From: "Ethin Probst"
Date: Fri, 3 Sep 2021 02:47:52 -0500
Subject: Re: [edk2-devel] [RFC] Add parallel hash feature into CryptoPkg.BaseCryptLib.
To: edk2-devel-groups-io, "Yao, Jiewen"
Cc: Andrew Fish, "Kinney, Michael D", "Li, Zhihao", "Wang, Jian J", "Wu, Hao A", "Lu, XiaoyuX", "Jiang, Guomin", "Liming Gao (Byosoft address)", "Fu, Siyuan", "Wu, Yidong", "Li, Aaron"
References: <12E67558-0528-4623-969C-02F3A2559B51@apple.com>

I think another problem we need to consider is that, to my knowledge, the MP services do not allow for thread scheduling at all. You can run a callback on multiple processors, but that alone won't make the function you're calling any faster: each processor executes it independently of the others, so the function has to be written to determine which processor it is on and to pick its share of the work accordingly. This would also bring in the requirement for synchronization primitives like mutexes and locks. I'm not sure exactly how that could be accomplished without changing the API, or at least adding new functionality to it. I may be missing something and this may be possible, but last time I checked, UEFI did not contain a thread-based scheduler.
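
To make that concrete, here is a minimal sketch of what "the function works out which processor it is on" could look like. It is plain C, not EDK2 code: WORK_CONTEXT, WorkerProcedure, RunAll and WORKER_COUNT are hypothetical names, the byte sum only stands in for a hash, and the loop in RunAll is a sequential stand-in for dispatching the procedure to several processors. In firmware I would expect the index to come from something like the MP Services WhoAmI() call, but treat that as an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define WORKER_COUNT 4u  /* hypothetical number of processors taking part */

/* Shared context handed to every worker; all names are illustrative. */
typedef struct {
  const uint8_t *Data;
  size_t         Size;
  uint32_t       Partial[WORKER_COUNT];  /* one private result slot per worker */
} WORK_CONTEXT;

/* Stand-in for the real per-slice computation (a hash in the RFC). */
static uint32_t SumBytes(const uint8_t *Data, size_t Size)
{
  uint32_t Sum = 0;
  for (size_t i = 0; i < Size; i++) {
    Sum += Data[i];
  }
  return Sum;
}

/* What each processor would run: derive its slice from its own index. */
void WorkerProcedure(WORK_CONTEXT *Ctx, unsigned Index)
{
  size_t Chunk = (Ctx->Size + WORKER_COUNT - 1) / WORKER_COUNT;
  size_t Begin = (size_t)Index * Chunk;
  size_t End;

  if (Begin > Ctx->Size) {
    Begin = Ctx->Size;           /* more workers than data: empty slice */
  }
  End = Begin + Chunk;
  if (End > Ctx->Size) {
    End = Ctx->Size;             /* last slice may be short */
  }
  Ctx->Partial[Index] = SumBytes(Ctx->Data + Begin, End - Begin);
}

/* Sequential stand-in for "start the procedure everywhere and wait". */
uint32_t RunAll(const uint8_t *Data, size_t Size)
{
  WORK_CONTEXT Ctx = { Data, Size, { 0 } };
  uint32_t     Total = 0;

  for (unsigned i = 0; i < WORKER_COUNT; i++) {
    WorkerProcedure(&Ctx, i);
  }
  for (unsigned i = 0; i < WORKER_COUNT; i++) {
    Total += Ctx.Partial[i];     /* combine after all workers are done */
  }
  return Total;
}
```

Because every worker writes only to its own Partial[] slot, this particular shape needs no mutex; the one synchronization point left is waiting for all workers to finish before the results are combined.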

On Thu, Sep 2, 2021, 20:02 Yao, Jiewen <jiewen.yao@intel.com> wrote:
>
> Hi
>
> Comment on 2/3.
>
> I am not sure if the new function AuthenticateFmpImageWithParallelhash() is absolutely necessary.
>
> Why do you do the parallel hash before authentication and transfer the result to AuthenticateFmpImage?
>
> Why can we not do it inside of AuthenticateFmpImage?
>
> Ideally, we hope to hide *algorithm* from *business logic*.
>
> Do you have any POC link?
>
> Thank you
>
> Yao Jiewen
>
> From: Andrew Fish <afish@apple.com>
> Sent: Friday, September 3, 2021 7:16 AM
> To: edk2-devel-groups-io <devel@edk2.groups.io>; Kinney, Michael D <michael.d.kinney@intel.com>
> Cc: Li, Zhihao <zhihao.li@intel.com>; Yao, Jiewen <jiewen.yao@intel.com>; Wang, Jian J <jian.j.wang@intel.com>; Wu, Hao A <hao.a.wu@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>; Jiang, Guomin <guomin.jiang@intel.com>; gaoliming@byosoft.com.cn; Fu, Siyuan <siyuan.fu@intel.com>; Wu, Yidong <yidong.wu@intel.com>; Li, Aaron <aaron.li@intel.com>
> Subject: Re: [edk2-devel] [RFC] Add parallel hash feature into CryptoPkg.BaseCryptLib.
>
>> On Sep 2, 2021, at 8:50 AM, Michael D Kinney <michael.d.kinney@intel.com> wrote:
>>
>> Hi Zhihao,
>>
>> Is the result of the parallel hash identical to the current hash? If so, then can we simply have a new instance of the FmpAuthenticationLib and hide the ParallelHash256 digest inside the implementation of this new instance?
>>
>> I do not think BaseCryptLib should depend on CPU MP Services Protocol. Can the use of MP Services be moved up into the implementation of the new FmpAuthenticationLib? If new BASE compatible primitives need to be added to BaseCryptLib to support parallel hash, then those likely make sense.
>
> Mike,
>
> Stupid question, but BaseCryptLib seems to really be DxeCryptLib[1]? So are you worried about adding the dependency to this DXE lib? It depends on UefiRuntimeServicesTableLib. It looks like SysCall/TimerWrapper.c[2] uses gRT->GetTime(). It also looks like time() returns 0 if the time services are not available, so there is only a quality of service implication when it is used, but no Depex?
>
>>
>> How do you decide how many CPU threads to use?
>>
>
> If we end up splitting this up for “others” to handle the MP in DXE, PEI, or MM, then I think we probably need a more robust API set that abstracts breaking up the work and combining it back together. You would also need the worker functions to process the broken-up data on the APs. So I would imagine an API that splits the work: you pass in the number of APs (or APs + BSP) and you get N buffers out to process? Those buffers should describe the chunk to the worker function, and when the worker function is done, the get-the-answer function can calculate the results from that.
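
A rough sketch of the split / work / combine shape described above, in plain C rather than EDK2 code. Everything here is hypothetical (WORK_CHUNK, SplitWork, ProcessChunk, CombineResults), and the XOR checksum is only a placeholder for whatever the real worker would compute. The point is that the splitting and combining steps know nothing about how the chunks get scheduled, so DXE, PEI, or MM code could each run ProcessChunk however they like.

```c
#include <stddef.h>
#include <stdint.h>

/* Describes one piece of the input to a worker; hypothetical layout. */
typedef struct {
  const uint8_t *Data;
  size_t         Size;
  uint8_t        Result;  /* filled in by the worker */
} WORK_CHUNK;

/*
 * Split Size bytes into at most MaxChunks descriptors of near-equal size.
 * Returns the number of descriptors actually filled in.
 */
size_t SplitWork(const uint8_t *Data, size_t Size,
                 WORK_CHUNK *Chunks, size_t MaxChunks)
{
  size_t Count, Base, Extra, Offset = 0;

  if (MaxChunks == 0 || Size == 0) {
    return 0;
  }
  Count = (Size < MaxChunks) ? Size : MaxChunks;
  Base  = Size / Count;
  Extra = Size % Count;          /* the first Extra chunks get one more byte */
  for (size_t i = 0; i < Count; i++) {
    size_t Len = Base + ((i < Extra) ? 1 : 0);
    Chunks[i].Data   = Data + Offset;
    Chunks[i].Size   = Len;
    Chunks[i].Result = 0;
    Offset += Len;
  }
  return Count;
}

/* Worker: touches only its own descriptor, so it can run on any AP. */
void ProcessChunk(WORK_CHUNK *Chunk)
{
  uint8_t Acc = 0;
  for (size_t i = 0; i < Chunk->Size; i++) {
    Acc ^= Chunk->Data[i];       /* placeholder for the real computation */
  }
  Chunk->Result = Acc;
}

/* "Get the answer": fold the per-chunk results once all workers are done. */
uint8_t CombineResults(const WORK_CHUNK *Chunks, size_t Count)
{
  uint8_t Acc = 0;
  for (size_t i = 0; i < Count; i++) {
    Acc ^= Chunks[i].Result;
  }
  return Acc;
}
```

A caller would pass the number of available processors as MaxChunks, hand one descriptor to each of them, wait, and then call CombineResults.
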
>
> [1] We don’t have a Base implementation of BaseCryptLib.
>
> CryptoPkg/Library/BaseCryptLib/BaseCryptLib.inf
> LIBRARY_CLASS                  = BaseCryptLib|DXE_DRIVER DXE_CORE UEFI_APPLICATION UEFI_DRIVER
>
> CryptoPkg/Library/BaseCryptLib/PeiCryptLib.inf
> LIBRARY_CLASS                  = BaseCryptLib|PEIM PEI_CORE
>
> CryptoPkg/Library/BaseCryptLib/RuntimeCryptLib.inf
> LIBRARY_CLASS                  = BaseCryptLib|DXE_RUNTIME_DRIVER
>
> CryptoPkg/Library/BaseCryptLib/SmmCryptLib.inf
> LIBRARY_CLASS                  = BaseCryptLib|DXE_SMM_DRIVER SMM_CORE MM_STANDALONE
>
> [2] https://github.com/tianocore/edk2/blob/master/CryptoPkg/Library/BaseCryptLib/SysCall/TimerWrapper.c#L77
>
> Thanks,
>
> Andrew Fish
>
>> Thanks,
>>
>> Mike
>>
>> From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Li, Zhihao
>> Sent: Wednesday, September 1, 2021 6:38 PM
>> To: devel@edk2.groups.io
>> Cc: Yao, Jiewen <jiewen.yao@intel.com>; Wang, Jian J <jian.j.wang@intel.com>; Wu, Hao A <hao.a.wu@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>; Jiang, Guomin <guomin.jiang@intel.com>; gaoliming@byosoft.com.cn; Fu, Siyuan <siyuan.fu@intel.com>; Wu, Yidong <yidong.wu@intel.com>; Li, Aaron <aaron.li@intel.com>
>> Subject: [edk2-devel] [RFC] Add parallel hash feature into CryptoPkg.BaseCryptLib
>>
>> Hi, everyone.
>>
>> We want to add a new hash algorithm, cSHAKE256/ParallelHash256 as defined by NIST SP 800-185, into BaseCryptLib of CryptoPkg. This feature can be applied to digital authentication functions like Capsule Update. It uses multiple processors to calculate the image digest in parallel during update capsule authentication, which shortens the time that capsule authentication takes.
>>
>> Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=3596
>>
>> [Background]
>>
>> The intention of this change is to improve capsule authentication performance.
>>
>> Currently, a hash value is calculated from the image (usually with SHA-256), and the hash value is then signed with a certificate. The header, certificate, and image binary are sealed into the capsule. In the authentication phase, the program calculates the hash from the image binary in the capsule and then performs the authentication procedures.
>>
>> [Proposal]
>>
>> Now, we propose a new authentication flow, which first pre-calculates the ParallelHash256 digest of the image binary in parallel on multiple processors, then uses the ParallelHash256 digest (instead of the original image binary) in the subsequent SHA-256 hash for signing/authentication.
>>
>> Since the large image is compressed to a ParallelHash256 digest of only 256 bytes, the SHA-256 step takes less time.
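
The reason a block-wise digest parallelizes at all can be shown with a toy two-stage hash. This is only a structural sketch: ToyHash is FNV-1a, which is NOT cSHAKE256 and not secure, and BLOCK_SIZE, MAX_BLOCKS and ToyParallelDigest are made-up names. As far as I understand NIST SP 800-185, the real ParallelHash also encodes the block size, block count and output length into the final step, which is left out here.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4u   /* tiny on purpose so the example is easy to follow */
#define MAX_BLOCKS 64u

/* FNV-1a, used purely as a toy stand-in for the real hash function. */
static uint32_t ToyHash(const uint8_t *Data, size_t Size)
{
  uint32_t H = 2166136261u;
  for (size_t i = 0; i < Size; i++) {
    H ^= Data[i];
    H *= 16777619u;
  }
  return H;
}

/* Returns 0 on success, -1 if the input needs more than MAX_BLOCKS blocks. */
int ToyParallelDigest(const uint8_t *Image, size_t Size, uint32_t *Digest)
{
  uint32_t Inner[MAX_BLOCKS];
  size_t   Blocks = (Size + BLOCK_SIZE - 1) / BLOCK_SIZE;

  if (Blocks > MAX_BLOCKS) {
    return -1;
  }

  /*
   * Stage 1: every iteration reads only its own block and writes only its
   * own Inner[] slot, so the iterations could run on different processors.
   */
  for (size_t i = 0; i < Blocks; i++) {
    size_t Off = i * BLOCK_SIZE;
    size_t Len = (Size - Off < BLOCK_SIZE) ? (Size - Off) : BLOCK_SIZE;
    Inner[i] = ToyHash(Image + Off, Len);
  }

  /*
   * Stage 2: one short serial hash over the concatenated block digests.
   * (Hashing the raw words makes the toy result endianness-dependent.)
   */
  *Digest = ToyHash((const uint8_t *)Inner, Blocks * sizeof Inner[0]);
  return 0;
}
```

Only stage 2 is inherently serial, and its input grows with the number of blocks rather than with the image size.
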
>>
>> [Required Changes]
>>
>> Mainly in CryptoPkg, MdeModulePkg, SecurityPkg:
>>
>> 1. CryptoPkg: Add the new hash algorithm cSHAKE256/ParallelHash256 to BaseCryptLib. The ParallelHash function will consume the CPU MP Services Protocol; we are not sure if this is allowed in BaseCryptLib?
>>
>> 2. MdeModulePkg: Add a new authentication function, AuthenticateFmpImageWithParallelhash(), to FmpAuthenticationLib. This is because the original AuthenticateFmpImage() interface has only 4 parameters while the new one has 5. The 5th parameter is the ParallelHash256 digest described above. We do the parallel hash before authentication and pass the result to the AuthenticateFmpImage function as a parameter, so that the parallel hash is computed only once, externally, when there are multiple authentications, which saves more time.
>>
>> 3. SecurityPkg: Add new functions FmpAuthenticatedHandlerPkcs7WithParallelhash() and AuthenticateFmpImageWithParallelhash() to FmpAuthenticationLibPkcs7. This is because the original interfaces do not have the formal parameter (ParallelHash256 digest) we need. We do the parallel hash before authentication and pass the result to the AuthenticateFmpImage and FmpAuthenticatedHandlerPkcs7 functions as a parameter, so that the parallel hash is computed only once, externally, when there are multiple authentications, which saves more time.
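
The trade-off behind items 2 and 3, and behind the question of whether the digest could be computed inside AuthenticateFmpImage, can be sketched with a toy model. None of this is the real FmpAuthenticationLib interface: STATUS, Authenticate, AuthenticateWithDigest, ExpensiveDigest and the one-byte "key" check are hypothetical stand-ins, and the counter exists only to show how often the expensive step runs.

```c
#include <stddef.h>
#include <stdint.h>

typedef int STATUS;                   /* stand-in for a firmware status type */
#define STATUS_SUCCESS            0
#define STATUS_SECURITY_VIOLATION 1

unsigned gDigestRuns;                 /* how many times the digest was computed */

/* Placeholder for the expensive (parallel) digest of the whole image. */
uint32_t ExpensiveDigest(const uint8_t *Image, size_t Size)
{
  uint32_t Sum = 0;
  gDigestRuns++;
  for (size_t i = 0; i < Size; i++) {
    Sum += Image[i];
  }
  return Sum;
}

/* Toy check: the "key" matches when its first byte equals the low digest byte. */
static STATUS ToyVerify(uint32_t Digest, const uint8_t *Key, size_t KeySize)
{
  return (KeySize > 0 && Key[0] == (uint8_t)Digest)
           ? STATUS_SUCCESS : STATUS_SECURITY_VIOLATION;
}

/* Five-parameter shape: the caller supplies a digest computed beforehand. */
STATUS AuthenticateWithDigest(const uint8_t *Image, size_t ImageSize,
                              const uint8_t *Key, size_t KeySize,
                              uint32_t Digest)
{
  (void)Image;                        /* unused in this toy model */
  (void)ImageSize;
  return ToyVerify(Digest, Key, KeySize);
}

/* Four-parameter shape: the digest is an internal detail of the call. */
STATUS Authenticate(const uint8_t *Image, size_t ImageSize,
                    const uint8_t *Key, size_t KeySize)
{
  return AuthenticateWithDigest(Image, ImageSize, Key, KeySize,
                                ExpensiveDigest(Image, ImageSize));
}
```

In this model, checking one image against two keys through the four-parameter form computes the digest twice, while the five-parameter form lets the caller compute it once and reuse it. A four-parameter implementation could presumably get the same saving by caching the digest internally, which would keep the algorithm hidden from callers.
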
>>
>> Please let me know if you have any comments or concerns about this proposed change.
>>
>> Thanks for your time and feedback!
>>
>> Best regards,
>> Zhihao