* [PATCH v7 1/3] BaseTools: Remove COMMON section from the GCC discard list
       [not found] <20210720220646.659-1-christopher.zurcher@outlook.com>
@ 2021-07-20 22:06 ` Christopher Zurcher
  2021-07-21  1:11   ` 回复: " gaoliming
  2021-07-21 11:44   ` [edk2-devel] " Yao, Jiewen
  2021-07-20 22:06 ` [PATCH v7 2/3] CryptoPkg/OpensslLib: Add native instruction support for X64 Christopher Zurcher
  2021-07-20 22:06 ` [PATCH v7 3/3] CryptoPkg/OpensslLib: Commit the auto-generated assembly files " Christopher Zurcher
  2 siblings, 2 replies; 13+ messages in thread
From: Christopher Zurcher @ 2021-07-20 22:06 UTC (permalink / raw)
  To: devel; +Cc: Ard Biesheuvel, Bob Feng, Liming Gao

From: Christopher Zurcher <christopher.zurcher@microsoft.com>

BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507

The COMMON section is used by OpenSSL assembly-optimized crypto
functions. OpenSSL assembly code is auto-generated from the submodule
and cannot be modified to remove dependence on the COMMON section.
The default -fno-common compiler flag should still prevent variables
from being emitted into the COMMON section.

Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Bob Feng <bob.c.feng@intel.com>
Cc: Liming Gao <gaoliming@byosoft.com.cn>
Signed-off-by: Christopher Zurcher <christopher.zurcher@microsoft.com>
---
 BaseTools/Scripts/GccBase.lds | 1 -
 1 file changed, 1 deletion(-)

diff --git a/BaseTools/Scripts/GccBase.lds b/BaseTools/Scripts/GccBase.lds
index a9dd2138d4..83cebd29d5 100644
--- a/BaseTools/Scripts/GccBase.lds
+++ b/BaseTools/Scripts/GccBase.lds
@@ -74,6 +74,5 @@ SECTIONS {
     *(.dynamic)
     *(.hash .gnu.hash)
     *(.comment)
-    *(COMMON)
   }
 }
--
2.32.0.windows.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread
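For context, the COMMON "section" holds common symbols: tentative definitions such as a
file-scope "int foo;" with no initializer and no static or extern. A minimal sketch of what
-fno-common changes follows; the file and symbol names are invented for illustration and are
not taken from the patch:

    /* CommonDemo.c - a tentative definition: no initializer, not STATIC */
    int  gSharedCounter;

    int
    GetCounter (
      void
      )
    {
      return gSharedCounter;
    }

Compiled with -fcommon (the pre-GCC-10 default), nm reports the symbol as type 'C', a common
symbol the linker is free to merge; with -fno-common it becomes an ordinary .bss definition of
type 'B':

    $ gcc -c -fcommon    CommonDemo.c && nm CommonDemo.o | grep gSharedCounter
    0000000000000004 C gSharedCounter
    $ gcc -c -fno-common CommonDemo.c && nm CommonDemo.o | grep gSharedCounter
    0000000000000000 B gSharedCounter

With -fno-common in effect the C objects therefore contribute nothing to COMMON, which is the
commit message's argument for why dropping the *(COMMON) discard rule is safe for ordinary
EDK2 C code while unblocking the generated OpenSSL assembly.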
* 回复: [PATCH v7 1/3] BaseTools: Remove COMMON section from the GCC discard list 2021-07-20 22:06 ` [PATCH v7 1/3] BaseTools: Remove COMMON section from the GCC discard list Christopher Zurcher @ 2021-07-21 1:11 ` gaoliming 2021-07-21 1:14 ` [edk2-devel] " Christopher Zurcher 2021-07-21 11:44 ` [edk2-devel] " Yao, Jiewen 1 sibling, 1 reply; 13+ messages in thread From: gaoliming @ 2021-07-21 1:11 UTC (permalink / raw) To: christopher.zurcher, devel; +Cc: 'Ard Biesheuvel', 'Bob Feng' Christopher: Thanks for your update. Can you let me know which platform is verified with this change by GCC tool chain? Ovmf? Thanks Liming > -----邮件原件----- > 发件人: christopher.zurcher@outlook.com > <christopher.zurcher@outlook.com> > 发送时间: 2021年7月21日 6:07 > 收件人: devel@edk2.groups.io > 抄送: Ard Biesheuvel <ardb@kernel.org>; Bob Feng <bob.c.feng@intel.com>; > Liming Gao <gaoliming@byosoft.com.cn> > 主题: [PATCH v7 1/3] BaseTools: Remove COMMON section from the GCC > discard list > > From: Christopher Zurcher <christopher.zurcher@microsoft.com> > > BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507 > > The COMMON section is used by OpenSSL assembly-optimized crypto > functions. OpenSSL assembly code is auto-generated from the submodule > and cannot be modified to remove dependence on the COMMON section. > The default -fno-common compiler flag should still prevent variable from > being emitted into the COMMON section. > > Cc: Ard Biesheuvel <ardb@kernel.org> > Cc: Bob Feng <bob.c.feng@intel.com> > Cc: Liming Gao <gaoliming@byosoft.com.cn> > Signed-off-by: Christopher Zurcher <christopher.zurcher@microsoft.com> > --- > BaseTools/Scripts/GccBase.lds | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/BaseTools/Scripts/GccBase.lds b/BaseTools/Scripts/GccBase.lds > index a9dd2138d4..83cebd29d5 100644 > --- a/BaseTools/Scripts/GccBase.lds > +++ b/BaseTools/Scripts/GccBase.lds > @@ -74,6 +74,5 @@ SECTIONS { > *(.dynamic) > *(.hash .gnu.hash) > *(.comment) > - *(COMMON) > } > } > -- > 2.32.0.windows.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [edk2-devel] 回复: [PATCH v7 1/3] BaseTools: Remove COMMON section from the GCC discard list 2021-07-21 1:11 ` 回复: " gaoliming @ 2021-07-21 1:14 ` Christopher Zurcher 2021-07-21 1:46 ` 回复: " gaoliming 0 siblings, 1 reply; 13+ messages in thread From: Christopher Zurcher @ 2021-07-21 1:14 UTC (permalink / raw) To: devel@edk2.groups.io, gaoliming@byosoft.com.cn Cc: 'Ard Biesheuvel', 'Bob Feng' Yes, this was verified with OVMF. Thanks, Christopher Zurcher -----Original Message----- From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of gaoliming Sent: Tuesday, July 20, 2021 18:12 To: christopher.zurcher@outlook.com; devel@edk2.groups.io Cc: 'Ard Biesheuvel' <ardb@kernel.org>; 'Bob Feng' <bob.c.feng@intel.com> Subject: [edk2-devel] 回复: [PATCH v7 1/3] BaseTools: Remove COMMON section from the GCC discard list Christopher: Thanks for your update. Can you let me know which platform is verified with this change by GCC tool chain? Ovmf? Thanks Liming > -----邮件原件----- > 发件人: christopher.zurcher@outlook.com > <christopher.zurcher@outlook.com> > 发送时间: 2021年7月21日 6:07 > 收件人: devel@edk2.groups.io > 抄送: Ard Biesheuvel <ardb@kernel.org>; Bob Feng <bob.c.feng@intel.com>; > Liming Gao <gaoliming@byosoft.com.cn> > 主题: [PATCH v7 1/3] BaseTools: Remove COMMON section from the GCC > discard list > > From: Christopher Zurcher <christopher.zurcher@microsoft.com> > > BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507 > > The COMMON section is used by OpenSSL assembly-optimized crypto > functions. OpenSSL assembly code is auto-generated from the submodule > and cannot be modified to remove dependence on the COMMON section. > The default -fno-common compiler flag should still prevent variable > from being emitted into the COMMON section. > > Cc: Ard Biesheuvel <ardb@kernel.org> > Cc: Bob Feng <bob.c.feng@intel.com> > Cc: Liming Gao <gaoliming@byosoft.com.cn> > Signed-off-by: Christopher Zurcher <christopher.zurcher@microsoft.com> > --- > BaseTools/Scripts/GccBase.lds | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/BaseTools/Scripts/GccBase.lds > b/BaseTools/Scripts/GccBase.lds index a9dd2138d4..83cebd29d5 100644 > --- a/BaseTools/Scripts/GccBase.lds > +++ b/BaseTools/Scripts/GccBase.lds > @@ -74,6 +74,5 @@ SECTIONS { > *(.dynamic) > *(.hash .gnu.hash) > *(.comment) > - *(COMMON) > } > } > -- > 2.32.0.windows.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* 回复: [edk2-devel] 回复: [PATCH v7 1/3] BaseTools: Remove COMMON section from the GCC discard list 2021-07-21 1:14 ` [edk2-devel] " Christopher Zurcher @ 2021-07-21 1:46 ` gaoliming 0 siblings, 0 replies; 13+ messages in thread From: gaoliming @ 2021-07-21 1:46 UTC (permalink / raw) To: devel, christopher.zurcher; +Cc: 'Ard Biesheuvel', 'Bob Feng' Christopher: Based on current information, I am OK for this change. Reviewed-by: Liming Gao <gaoliming@byosoft.com.cn> Thanks Liming > -----邮件原件----- > 发件人: devel@edk2.groups.io <devel@edk2.groups.io> 代表 Christopher > Zurcher > 发送时间: 2021年7月21日 9:15 > 收件人: devel@edk2.groups.io; gaoliming@byosoft.com.cn > 抄送: 'Ard Biesheuvel' <ardb@kernel.org>; 'Bob Feng' <bob.c.feng@intel.com> > 主题: Re: [edk2-devel] 回复: [PATCH v7 1/3] BaseTools: Remove COMMON > section from the GCC discard list > > Yes, this was verified with OVMF. > > Thanks, > Christopher Zurcher > > -----Original Message----- > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of gaoliming > Sent: Tuesday, July 20, 2021 18:12 > To: christopher.zurcher@outlook.com; devel@edk2.groups.io > Cc: 'Ard Biesheuvel' <ardb@kernel.org>; 'Bob Feng' <bob.c.feng@intel.com> > Subject: [edk2-devel] 回复: [PATCH v7 1/3] BaseTools: Remove COMMON > section from the GCC discard list > > Christopher: > Thanks for your update. Can you let me know which platform is verified > with this change by GCC tool chain? Ovmf? > > Thanks > Liming > > -----邮件原件----- > > 发件人: christopher.zurcher@outlook.com > > <christopher.zurcher@outlook.com> > > 发送时间: 2021年7月21日 6:07 > > 收件人: devel@edk2.groups.io > > 抄送: Ard Biesheuvel <ardb@kernel.org>; Bob Feng > <bob.c.feng@intel.com>; > > Liming Gao <gaoliming@byosoft.com.cn> > > 主题: [PATCH v7 1/3] BaseTools: Remove COMMON section from the GCC > > discard list > > > > From: Christopher Zurcher <christopher.zurcher@microsoft.com> > > > > BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507 > > > > The COMMON section is used by OpenSSL assembly-optimized crypto > > functions. OpenSSL assembly code is auto-generated from the submodule > > and cannot be modified to remove dependence on the COMMON section. > > The default -fno-common compiler flag should still prevent variable > > from being emitted into the COMMON section. > > > > Cc: Ard Biesheuvel <ardb@kernel.org> > > Cc: Bob Feng <bob.c.feng@intel.com> > > Cc: Liming Gao <gaoliming@byosoft.com.cn> > > Signed-off-by: Christopher Zurcher <christopher.zurcher@microsoft.com> > > --- > > BaseTools/Scripts/GccBase.lds | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/BaseTools/Scripts/GccBase.lds > > b/BaseTools/Scripts/GccBase.lds index a9dd2138d4..83cebd29d5 100644 > > --- a/BaseTools/Scripts/GccBase.lds > > +++ b/BaseTools/Scripts/GccBase.lds > > @@ -74,6 +74,5 @@ SECTIONS { > > *(.dynamic) > > *(.hash .gnu.hash) > > *(.comment) > > - *(COMMON) > > } > > } > > -- > > 2.32.0.windows.1 > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [edk2-devel] [PATCH v7 1/3] BaseTools: Remove COMMON section from the GCC discard list 2021-07-20 22:06 ` [PATCH v7 1/3] BaseTools: Remove COMMON section from the GCC discard list Christopher Zurcher 2021-07-21 1:11 ` 回复: " gaoliming @ 2021-07-21 11:44 ` Yao, Jiewen 2021-08-04 12:26 ` Ard Biesheuvel 1 sibling, 1 reply; 13+ messages in thread From: Yao, Jiewen @ 2021-07-21 11:44 UTC (permalink / raw) To: devel@edk2.groups.io, christopher.zurcher@outlook.com Cc: Ard Biesheuvel, Feng, Bob C, Liming Gao Acked-by: Jiewen Yao <Jiewen.yao@intel.com> > -----Original Message----- > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Christopher > Zurcher > Sent: Wednesday, July 21, 2021 6:07 AM > To: devel@edk2.groups.io > Cc: Ard Biesheuvel <ardb@kernel.org>; Feng, Bob C <bob.c.feng@intel.com>; > Liming Gao <gaoliming@byosoft.com.cn> > Subject: [edk2-devel] [PATCH v7 1/3] BaseTools: Remove COMMON section > from the GCC discard list > > From: Christopher Zurcher <christopher.zurcher@microsoft.com> > > BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507 > > The COMMON section is used by OpenSSL assembly-optimized crypto > functions. OpenSSL assembly code is auto-generated from the submodule > and cannot be modified to remove dependence on the COMMON section. > The default -fno-common compiler flag should still prevent variable from > being emitted into the COMMON section. > > Cc: Ard Biesheuvel <ardb@kernel.org> > Cc: Bob Feng <bob.c.feng@intel.com> > Cc: Liming Gao <gaoliming@byosoft.com.cn> > Signed-off-by: Christopher Zurcher <christopher.zurcher@microsoft.com> > --- > BaseTools/Scripts/GccBase.lds | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/BaseTools/Scripts/GccBase.lds b/BaseTools/Scripts/GccBase.lds > index a9dd2138d4..83cebd29d5 100644 > --- a/BaseTools/Scripts/GccBase.lds > +++ b/BaseTools/Scripts/GccBase.lds > @@ -74,6 +74,5 @@ SECTIONS { > *(.dynamic) > *(.hash .gnu.hash) > *(.comment) > - *(COMMON) > } > } > -- > 2.32.0.windows.1 > > > > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [edk2-devel] [PATCH v7 1/3] BaseTools: Remove COMMON section from the GCC discard list 2021-07-21 11:44 ` [edk2-devel] " Yao, Jiewen @ 2021-08-04 12:26 ` Ard Biesheuvel 2021-08-05 5:04 ` 回复: " gaoliming 0 siblings, 1 reply; 13+ messages in thread From: Ard Biesheuvel @ 2021-08-04 12:26 UTC (permalink / raw) To: Yao, Jiewen Cc: devel@edk2.groups.io, christopher.zurcher@outlook.com, Feng, Bob C, Liming Gao On Wed, 21 Jul 2021 at 13:44, Yao, Jiewen <jiewen.yao@intel.com> wrote: > > Acked-by: Jiewen Yao <Jiewen.yao@intel.com> > I don't think this is a good idea tbh. We have already identified that EDK2 code often fails to use the STATIC keyword when possible for global variables, and that unrelated variables that happen to have the same name will be collapsed into the same storage unit in the program image. (see commit 214a3b79417f64bf2faae74af42c1b9d23f50dc8 for details) Was this considered? Is this no longer an issue? > > -----Original Message----- > > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Christopher > > Zurcher > > Sent: Wednesday, July 21, 2021 6:07 AM > > To: devel@edk2.groups.io > > Cc: Ard Biesheuvel <ardb@kernel.org>; Feng, Bob C <bob.c.feng@intel.com>; > > Liming Gao <gaoliming@byosoft.com.cn> > > Subject: [edk2-devel] [PATCH v7 1/3] BaseTools: Remove COMMON section > > from the GCC discard list > > > > From: Christopher Zurcher <christopher.zurcher@microsoft.com> > > > > BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507 > > > > The COMMON section is used by OpenSSL assembly-optimized crypto > > functions. OpenSSL assembly code is auto-generated from the submodule > > and cannot be modified to remove dependence on the COMMON section. > > The default -fno-common compiler flag should still prevent variable from > > being emitted into the COMMON section. > > > > Cc: Ard Biesheuvel <ardb@kernel.org> > > Cc: Bob Feng <bob.c.feng@intel.com> > > Cc: Liming Gao <gaoliming@byosoft.com.cn> > > Signed-off-by: Christopher Zurcher <christopher.zurcher@microsoft.com> > > --- > > BaseTools/Scripts/GccBase.lds | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/BaseTools/Scripts/GccBase.lds b/BaseTools/Scripts/GccBase.lds > > index a9dd2138d4..83cebd29d5 100644 > > --- a/BaseTools/Scripts/GccBase.lds > > +++ b/BaseTools/Scripts/GccBase.lds > > @@ -74,6 +74,5 @@ SECTIONS { > > *(.dynamic) > > *(.hash .gnu.hash) > > *(.comment) > > - *(COMMON) > > } > > } > > -- > > 2.32.0.windows.1 > > > > > > > > > > > ^ permalink raw reply [flat|nested] 13+ messages in thread
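The hazard Ard is pointing at (unrelated globals that omit STATIC getting silently merged) can
be reproduced with two files along these lines; the file and variable names are hypothetical:

    /* FileA.c - module-private state, STATIC accidentally omitted */
    UINTN  mPrivateState;

    /* FileB.c - an unrelated variable that happens to share the name */
    UINTN  mPrivateState;

If both compile to common symbols, the linker silently folds them into a single storage unit
and the two files corrupt each other's state at run time. With -fno-common each file instead
emits a normal .bss definition, and the link fails with "multiple definition of
`mPrivateState'", so the collision is caught at build time. The open question in this thread
is whether that compiler-side protection is considered sufficient once the linker-script rule
is gone.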
* 回复: [edk2-devel] [PATCH v7 1/3] BaseTools: Remove COMMON section from the GCC discard list 2021-08-04 12:26 ` Ard Biesheuvel @ 2021-08-05 5:04 ` gaoliming 2021-08-06 20:01 ` Christopher Zurcher 0 siblings, 1 reply; 13+ messages in thread From: gaoliming @ 2021-08-05 5:04 UTC (permalink / raw) To: 'Ard Biesheuvel', 'Yao, Jiewen' Cc: devel, christopher.zurcher, 'Feng, Bob C' Ard: Chris explains this change in https://edk2.groups.io/g/devel/message/77662. And, he also verifies the patch in OVMF with GCC5 tool chain. Thanks Liming > -----邮件原件----- > 发件人: Ard Biesheuvel <ardb@kernel.org> > 发送时间: 2021年8月4日 20:27 > 收件人: Yao, Jiewen <jiewen.yao@intel.com> > 抄送: devel@edk2.groups.io; christopher.zurcher@outlook.com; Feng, Bob C > <bob.c.feng@intel.com>; Liming Gao <gaoliming@byosoft.com.cn> > 主题: Re: [edk2-devel] [PATCH v7 1/3] BaseTools: Remove COMMON section > from the GCC discard list > > On Wed, 21 Jul 2021 at 13:44, Yao, Jiewen <jiewen.yao@intel.com> wrote: > > > > Acked-by: Jiewen Yao <Jiewen.yao@intel.com> > > > > I don't think this is a good idea tbh. We have already identified that > EDK2 code often fails to use the STATIC keyword when possible for > global variables, and that unrelated variables that happen to have the > same name will be collapsed into the same storage unit in the program > image. (see commit 214a3b79417f64bf2faae74af42c1b9d23f50dc8 for > details) > > Was this considered? Is this no longer an issue? > > > > > > -----Original Message----- > > > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of > Christopher > > > Zurcher > > > Sent: Wednesday, July 21, 2021 6:07 AM > > > To: devel@edk2.groups.io > > > Cc: Ard Biesheuvel <ardb@kernel.org>; Feng, Bob C > <bob.c.feng@intel.com>; > > > Liming Gao <gaoliming@byosoft.com.cn> > > > Subject: [edk2-devel] [PATCH v7 1/3] BaseTools: Remove COMMON > section > > > from the GCC discard list > > > > > > From: Christopher Zurcher <christopher.zurcher@microsoft.com> > > > > > > BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507 > > > > > > The COMMON section is used by OpenSSL assembly-optimized crypto > > > functions. OpenSSL assembly code is auto-generated from the submodule > > > and cannot be modified to remove dependence on the COMMON section. > > > The default -fno-common compiler flag should still prevent variable from > > > being emitted into the COMMON section. > > > > > > Cc: Ard Biesheuvel <ardb@kernel.org> > > > Cc: Bob Feng <bob.c.feng@intel.com> > > > Cc: Liming Gao <gaoliming@byosoft.com.cn> > > > Signed-off-by: Christopher Zurcher <christopher.zurcher@microsoft.com> > > > --- > > > BaseTools/Scripts/GccBase.lds | 1 - > > > 1 file changed, 1 deletion(-) > > > > > > diff --git a/BaseTools/Scripts/GccBase.lds > b/BaseTools/Scripts/GccBase.lds > > > index a9dd2138d4..83cebd29d5 100644 > > > --- a/BaseTools/Scripts/GccBase.lds > > > +++ b/BaseTools/Scripts/GccBase.lds > > > @@ -74,6 +74,5 @@ SECTIONS { > > > *(.dynamic) > > > *(.hash .gnu.hash) > > > *(.comment) > > > - *(COMMON) > > > } > > > } > > > -- > > > 2.32.0.windows.1 > > > > > > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [edk2-devel] [PATCH v7 1/3] BaseTools: Remove COMMON section from the GCC discard list 2021-08-05 5:04 ` 回复: " gaoliming @ 2021-08-06 20:01 ` Christopher Zurcher 0 siblings, 0 replies; 13+ messages in thread From: Christopher Zurcher @ 2021-08-06 20:01 UTC (permalink / raw) To: devel@edk2.groups.io, gaoliming@byosoft.com.cn, 'Ard Biesheuvel', 'Yao, Jiewen' Cc: 'Feng, Bob C' Ard, Is the removal of the COMMON section during the build not redundant to the -fno-common option? Do you expect cases where we will still see the undesired variable collisions? Thanks, Christopher Zurcher -----Original Message----- From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of gaoliming Sent: Wednesday, August 4, 2021 22:04 To: 'Ard Biesheuvel' <ardb@kernel.org>; 'Yao, Jiewen' <jiewen.yao@intel.com> Cc: devel@edk2.groups.io; christopher.zurcher@outlook.com; 'Feng, Bob C' <bob.c.feng@intel.com> Subject: 回复: [edk2-devel] [PATCH v7 1/3] BaseTools: Remove COMMON section from the GCC discard list Ard: Chris explains this change in https://edk2.groups.io/g/devel/message/77662. And, he also verifies the patch in OVMF with GCC5 tool chain. Thanks Liming > -----邮件原件----- > 发件人: Ard Biesheuvel <ardb@kernel.org> > 发送时间: 2021年8月4日 20:27 > 收件人: Yao, Jiewen <jiewen.yao@intel.com> > 抄送: devel@edk2.groups.io; christopher.zurcher@outlook.com; Feng, Bob C > <bob.c.feng@intel.com>; Liming Gao <gaoliming@byosoft.com.cn> > 主题: Re: [edk2-devel] [PATCH v7 1/3] BaseTools: Remove COMMON section > from the GCC discard list > > On Wed, 21 Jul 2021 at 13:44, Yao, Jiewen <jiewen.yao@intel.com> wrote: > > > > Acked-by: Jiewen Yao <Jiewen.yao@intel.com> > > > > I don't think this is a good idea tbh. We have already identified that > EDK2 code often fails to use the STATIC keyword when possible for > global variables, and that unrelated variables that happen to have the > same name will be collapsed into the same storage unit in the program > image. (see commit 214a3b79417f64bf2faae74af42c1b9d23f50dc8 for > details) > > Was this considered? Is this no longer an issue? > > > > > > -----Original Message----- > > > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of > Christopher > > > Zurcher > > > Sent: Wednesday, July 21, 2021 6:07 AM > > > To: devel@edk2.groups.io > > > Cc: Ard Biesheuvel <ardb@kernel.org>; Feng, Bob C > <bob.c.feng@intel.com>; > > > Liming Gao <gaoliming@byosoft.com.cn> > > > Subject: [edk2-devel] [PATCH v7 1/3] BaseTools: Remove COMMON > section > > > from the GCC discard list > > > > > > From: Christopher Zurcher <christopher.zurcher@microsoft.com> > > > > > > BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507 > > > > > > The COMMON section is used by OpenSSL assembly-optimized crypto > > > functions. OpenSSL assembly code is auto-generated from the > > > submodule and cannot be modified to remove dependence on the COMMON section. > > > The default -fno-common compiler flag should still prevent > > > variable from being emitted into the COMMON section. 
> > > > > > Cc: Ard Biesheuvel <ardb@kernel.org> > > > Cc: Bob Feng <bob.c.feng@intel.com> > > > Cc: Liming Gao <gaoliming@byosoft.com.cn> > > > Signed-off-by: Christopher Zurcher > > > <christopher.zurcher@microsoft.com> > > > --- > > > BaseTools/Scripts/GccBase.lds | 1 - > > > 1 file changed, 1 deletion(-) > > > > > > diff --git a/BaseTools/Scripts/GccBase.lds > b/BaseTools/Scripts/GccBase.lds > > > index a9dd2138d4..83cebd29d5 100644 > > > --- a/BaseTools/Scripts/GccBase.lds > > > +++ b/BaseTools/Scripts/GccBase.lds > > > @@ -74,6 +74,5 @@ SECTIONS { > > > *(.dynamic) > > > *(.hash .gnu.hash) > > > *(.comment) > > > - *(COMMON) > > > } > > > } > > > -- > > > 2.32.0.windows.1 > > > > > > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 13+ messages in thread
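The two mechanisms overlap but are not the same thing: -fno-common is a compiler option, so it
only affects objects built from C sources, while the *(COMMON) rule in GccBase.lds acts at
link time on whatever common symbols appear in the inputs, including ones created by assembler
directives, which no C compiler flag can influence. The OpenSSL capability vector is the case
that matters here: the generated cpuid assembly defines it (typically via a .comm directive,
i.e. as a COMMON symbol; worth confirming against the generated X64 files in patch 3/3), and
the C side only declares it. A rough sketch, with the helper function and bit position shown
purely for illustration:

    /* The generated assembly provides the definition; C code merely
       references it, so -fno-common never gets a say in where it lives. */
    extern unsigned int  OPENSSL_ia32cap_P[4];

    int
    AesniAvailable (
      void
      )
    {
      /* CPUID leaf-1 ECX bit 25 = AES-NI (illustrative check) */
      return (OPENSSL_ia32cap_P[1] & (1u << 25)) != 0;
    }

So the answer appears to be: not redundant. -fno-common keeps EDK2's own C code out of COMMON,
while removing the discard rule is what lets the assembly-provided definition survive into the
final image.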
* [PATCH v7 2/3] CryptoPkg/OpensslLib: Add native instruction support for X64 [not found] <20210720220646.659-1-christopher.zurcher@outlook.com> 2021-07-20 22:06 ` [PATCH v7 1/3] BaseTools: Remove COMMON section from the GCC discard list Christopher Zurcher @ 2021-07-20 22:06 ` Christopher Zurcher 2021-07-21 11:44 ` Yao, Jiewen 2021-07-20 22:06 ` [PATCH v7 3/3] CryptoPkg/OpensslLib: Commit the auto-generated assembly files " Christopher Zurcher 2 siblings, 1 reply; 13+ messages in thread From: Christopher Zurcher @ 2021-07-20 22:06 UTC (permalink / raw) To: devel; +Cc: Jiewen Yao, Jian J Wang, Xiaoyu Lu, Mike Kinney, Ard Biesheuvel From: Christopher Zurcher <christopher.zurcher@microsoft.com> BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507 Adding OpensslLibX64.inf and modifying process_files.pl to process this file and generate the necessary assembly files. Adding OpensslLibX64Gcc.inf to allow building with GCC toolchain. ApiHooks.c contains a stub function for a Windows API call. uefi-asm.conf contains the limited assembly configurations for OpenSSL. Cc: Jiewen Yao <jiewen.yao@intel.com> Cc: Jian J Wang <jian.j.wang@intel.com> Cc: Xiaoyu Lu <xiaoyux.lu@intel.com> Cc: Mike Kinney <michael.d.kinney@intel.com> Cc: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Christopher Zurcher <christopher.zurcher@microsoft.com> --- CryptoPkg/CryptoPkg.ci.yaml | 21 +- CryptoPkg/Library/Include/CrtLibSupport.h | 2 + CryptoPkg/Library/Include/openssl/opensslconf.h | 3 - CryptoPkg/Library/OpensslLib/OpensslLib.inf | 2 +- CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c | 44 ++ CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf | 2 +- CryptoPkg/Library/OpensslLib/OpensslLibX64.inf | 653 ++++++++++++++++++++ CryptoPkg/Library/OpensslLib/OpensslLibX64Gcc.inf | 653 ++++++++++++++++++++ CryptoPkg/Library/OpensslLib/UefiAsm.conf | 30 + CryptoPkg/Library/OpensslLib/X64/ApiHooks.c | 22 + CryptoPkg/Library/OpensslLib/process_files.pl | 241 ++++++-- 11 files changed, 1619 insertions(+), 54 deletions(-) diff --git a/CryptoPkg/CryptoPkg.ci.yaml b/CryptoPkg/CryptoPkg.ci.yaml index 5d7c340ae5..1448299073 100644 --- a/CryptoPkg/CryptoPkg.ci.yaml +++ b/CryptoPkg/CryptoPkg.ci.yaml @@ -7,7 +7,11 @@ ## { "LicenseCheck": { - "IgnoreFiles": [] + "IgnoreFiles": [ + # These directories contain auto-generated OpenSSL content + "Library/OpensslLib/X64", + "Library/OpensslLib/X64Gcc" + ] }, "EccCheck": { ## Exception sample looks like below: @@ -23,8 +27,13 @@ "Test/UnitTest", # This has OpenSSL interfaces that aren't UEFI spec compliant "Library/BaseCryptLib/SysCall/UnitTestHostCrtWrapper.c", - # this has OpenSSL interfaces that aren't UEFI spec compliant - "Library/OpensslLib/rand_pool.c" + # This has OpenSSL interfaces that aren't UEFI spec compliant + "Library/OpensslLib/rand_pool.c", + # This has OpenSSL interfaces that aren't UEFI spec compliant + "Library/Include/CrtLibSupport.h", + # These directories contain auto-generated OpenSSL content + "Library/OpensslLib/X64", + "Library/OpensslLib/X64Gcc" ] }, "CompilerPlugin": { @@ -51,7 +60,11 @@ }, "DscCompleteCheck": { "DscPath": "CryptoPkg.dsc", - "IgnoreInf": [] + "IgnoreInf": [ + # These are alternatives to OpensslLib.inf + "CryptoPkg/Library/OpensslLib/OpensslLibX64.inf", + "CryptoPkg/Library/OpensslLib/OpensslLibX64Gcc.inf" + ] }, "GuidCheck": { "IgnoreGuidName": [], diff --git a/CryptoPkg/Library/Include/CrtLibSupport.h b/CryptoPkg/Library/Include/CrtLibSupport.h index b1dff03bdc..17d7f29ba2 100644 --- a/CryptoPkg/Library/Include/CrtLibSupport.h +++ 
b/CryptoPkg/Library/Include/CrtLibSupport.h @@ -102,6 +102,7 @@ SPDX-License-Identifier: BSD-2-Clause-Patent // typedef UINTN size_t; typedef UINTN u_int; +typedef INTN ptrdiff_t; typedef INTN ssize_t; typedef INT32 time_t; typedef UINT8 __uint8_t; @@ -109,6 +110,7 @@ typedef UINT8 sa_family_t; typedef UINT8 u_char; typedef UINT32 uid_t; typedef UINT32 gid_t; +typedef CHAR16 wchar_t; // // File operations are not required for EFI building, diff --git a/CryptoPkg/Library/Include/openssl/opensslconf.h b/CryptoPkg/Library/Include/openssl/opensslconf.h index e5652be5ca..b8d59aebe8 100644 --- a/CryptoPkg/Library/Include/openssl/opensslconf.h +++ b/CryptoPkg/Library/Include/openssl/opensslconf.h @@ -112,9 +112,6 @@ extern "C" { #ifndef OPENSSL_NO_ASAN # define OPENSSL_NO_ASAN #endif -#ifndef OPENSSL_NO_ASM -# define OPENSSL_NO_ASM -#endif #ifndef OPENSSL_NO_ASYNC # define OPENSSL_NO_ASYNC #endif diff --git a/CryptoPkg/Library/OpensslLib/OpensslLib.inf b/CryptoPkg/Library/OpensslLib/OpensslLib.inf index b00bb74ce6..d84bde056a 100644 --- a/CryptoPkg/Library/OpensslLib/OpensslLib.inf +++ b/CryptoPkg/Library/OpensslLib/OpensslLib.inf @@ -16,7 +16,7 @@ VERSION_STRING = 1.0 LIBRARY_CLASS = OpensslLib DEFINE OPENSSL_PATH = openssl - DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE + DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -DOPENSSL_NO_ASM # # VALID_ARCHITECTURES = IA32 X64 ARM AARCH64 diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c b/CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c new file mode 100644 index 0000000000..74ae1ac20c --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c @@ -0,0 +1,44 @@ +/** @file + Constructor to initialize CPUID data for OpenSSL assembly operations. + +Copyright (c) 2020, Intel Corporation. All rights reserved.<BR> +SPDX-License-Identifier: BSD-2-Clause-Patent + +**/ + +#include <Uefi.h> + + +/** + An internal OpenSSL function which fetches a local copy of the hardware + capability flags. + +**/ +extern +VOID +OPENSSL_cpuid_setup ( + VOID + ); + +/** + Constructor routine for OpensslLib. + + The constructor calls an internal OpenSSL function which fetches a local copy + of the hardware capability flags, used to enable native crypto instructions. + + @param None + + @retval EFI_SUCCESS The construction succeeded. + +**/ +EFI_STATUS +EFIAPI +OpensslLibConstructor ( + VOID + ) +{ + OPENSSL_cpuid_setup (); + + return EFI_SUCCESS; +} + diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf b/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf index 3557711bd8..cdeed0d073 100644 --- a/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf +++ b/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf @@ -16,7 +16,7 @@ VERSION_STRING = 1.0 LIBRARY_CLASS = OpensslLib DEFINE OPENSSL_PATH = openssl - DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE + DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -DOPENSSL_NO_ASM # # VALID_ARCHITECTURES = IA32 X64 ARM AARCH64 diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibX64.inf b/CryptoPkg/Library/OpensslLib/OpensslLibX64.inf new file mode 100644 index 0000000000..b92feaf1bf --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/OpensslLibX64.inf @@ -0,0 +1,653 @@ +## @file +# This module provides OpenSSL Library implementation. 
+# +# Copyright (c) 2010 - 2020, Intel Corporation. All rights reserved.<BR> +# (C) Copyright 2020 Hewlett Packard Enterprise Development LP<BR> +# SPDX-License-Identifier: BSD-2-Clause-Patent +# +## + +[Defines] + INF_VERSION = 0x00010005 + BASE_NAME = OpensslLibX64 + MODULE_UNI_FILE = OpensslLib.uni + FILE_GUID = 18125E50-0117-4DD0-BE54-4784AD995FEF + MODULE_TYPE = BASE + VERSION_STRING = 1.0 + LIBRARY_CLASS = OpensslLib + DEFINE OPENSSL_PATH = openssl + DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE + DEFINE OPENSSL_FLAGS_CONFIG = -DOPENSSL_CPUID_OBJ -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM + CONSTRUCTOR = OpensslLibConstructor + +# +# VALID_ARCHITECTURES = X64 +# + +[Sources.X64] + OpensslLibConstructor.c + $(OPENSSL_PATH)/e_os.h + $(OPENSSL_PATH)/ms/uplink.h +# Autogenerated files list starts here + X64/crypto/aes/aesni-mb-x86_64.nasm + X64/crypto/aes/aesni-sha1-x86_64.nasm + X64/crypto/aes/aesni-sha256-x86_64.nasm + X64/crypto/aes/aesni-x86_64.nasm + X64/crypto/aes/vpaes-x86_64.nasm + X64/crypto/modes/aesni-gcm-x86_64.nasm + X64/crypto/modes/ghash-x86_64.nasm + X64/crypto/sha/sha1-mb-x86_64.nasm + X64/crypto/sha/sha1-x86_64.nasm + X64/crypto/sha/sha256-mb-x86_64.nasm + X64/crypto/sha/sha256-x86_64.nasm + X64/crypto/sha/sha512-x86_64.nasm + X64/crypto/x86_64cpuid.nasm + $(OPENSSL_PATH)/crypto/aes/aes_cbc.c + $(OPENSSL_PATH)/crypto/aes/aes_cfb.c + $(OPENSSL_PATH)/crypto/aes/aes_core.c + $(OPENSSL_PATH)/crypto/aes/aes_ige.c + $(OPENSSL_PATH)/crypto/aes/aes_misc.c + $(OPENSSL_PATH)/crypto/aes/aes_ofb.c + $(OPENSSL_PATH)/crypto/aes/aes_wrap.c + $(OPENSSL_PATH)/crypto/aria/aria.c + $(OPENSSL_PATH)/crypto/asn1/a_bitstr.c + $(OPENSSL_PATH)/crypto/asn1/a_d2i_fp.c + $(OPENSSL_PATH)/crypto/asn1/a_digest.c + $(OPENSSL_PATH)/crypto/asn1/a_dup.c + $(OPENSSL_PATH)/crypto/asn1/a_gentm.c + $(OPENSSL_PATH)/crypto/asn1/a_i2d_fp.c + $(OPENSSL_PATH)/crypto/asn1/a_int.c + $(OPENSSL_PATH)/crypto/asn1/a_mbstr.c + $(OPENSSL_PATH)/crypto/asn1/a_object.c + $(OPENSSL_PATH)/crypto/asn1/a_octet.c + $(OPENSSL_PATH)/crypto/asn1/a_print.c + $(OPENSSL_PATH)/crypto/asn1/a_sign.c + $(OPENSSL_PATH)/crypto/asn1/a_strex.c + $(OPENSSL_PATH)/crypto/asn1/a_strnid.c + $(OPENSSL_PATH)/crypto/asn1/a_time.c + $(OPENSSL_PATH)/crypto/asn1/a_type.c + $(OPENSSL_PATH)/crypto/asn1/a_utctm.c + $(OPENSSL_PATH)/crypto/asn1/a_utf8.c + $(OPENSSL_PATH)/crypto/asn1/a_verify.c + $(OPENSSL_PATH)/crypto/asn1/ameth_lib.c + $(OPENSSL_PATH)/crypto/asn1/asn1_err.c + $(OPENSSL_PATH)/crypto/asn1/asn1_gen.c + $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.c + $(OPENSSL_PATH)/crypto/asn1/asn1_lib.c + $(OPENSSL_PATH)/crypto/asn1/asn1_par.c + $(OPENSSL_PATH)/crypto/asn1/asn_mime.c + $(OPENSSL_PATH)/crypto/asn1/asn_moid.c + $(OPENSSL_PATH)/crypto/asn1/asn_mstbl.c + $(OPENSSL_PATH)/crypto/asn1/asn_pack.c + $(OPENSSL_PATH)/crypto/asn1/bio_asn1.c + $(OPENSSL_PATH)/crypto/asn1/bio_ndef.c + $(OPENSSL_PATH)/crypto/asn1/d2i_pr.c + $(OPENSSL_PATH)/crypto/asn1/d2i_pu.c + $(OPENSSL_PATH)/crypto/asn1/evp_asn1.c + $(OPENSSL_PATH)/crypto/asn1/f_int.c + $(OPENSSL_PATH)/crypto/asn1/f_string.c + $(OPENSSL_PATH)/crypto/asn1/i2d_pr.c + $(OPENSSL_PATH)/crypto/asn1/i2d_pu.c + $(OPENSSL_PATH)/crypto/asn1/n_pkey.c + $(OPENSSL_PATH)/crypto/asn1/nsseq.c + $(OPENSSL_PATH)/crypto/asn1/p5_pbe.c + $(OPENSSL_PATH)/crypto/asn1/p5_pbev2.c + $(OPENSSL_PATH)/crypto/asn1/p5_scrypt.c + $(OPENSSL_PATH)/crypto/asn1/p8_pkey.c + $(OPENSSL_PATH)/crypto/asn1/t_bitst.c + 
$(OPENSSL_PATH)/crypto/asn1/t_pkey.c + $(OPENSSL_PATH)/crypto/asn1/t_spki.c + $(OPENSSL_PATH)/crypto/asn1/tasn_dec.c + $(OPENSSL_PATH)/crypto/asn1/tasn_enc.c + $(OPENSSL_PATH)/crypto/asn1/tasn_fre.c + $(OPENSSL_PATH)/crypto/asn1/tasn_new.c + $(OPENSSL_PATH)/crypto/asn1/tasn_prn.c + $(OPENSSL_PATH)/crypto/asn1/tasn_scn.c + $(OPENSSL_PATH)/crypto/asn1/tasn_typ.c + $(OPENSSL_PATH)/crypto/asn1/tasn_utl.c + $(OPENSSL_PATH)/crypto/asn1/x_algor.c + $(OPENSSL_PATH)/crypto/asn1/x_bignum.c + $(OPENSSL_PATH)/crypto/asn1/x_info.c + $(OPENSSL_PATH)/crypto/asn1/x_int64.c + $(OPENSSL_PATH)/crypto/asn1/x_long.c + $(OPENSSL_PATH)/crypto/asn1/x_pkey.c + $(OPENSSL_PATH)/crypto/asn1/x_sig.c + $(OPENSSL_PATH)/crypto/asn1/x_spki.c + $(OPENSSL_PATH)/crypto/asn1/x_val.c + $(OPENSSL_PATH)/crypto/async/arch/async_null.c + $(OPENSSL_PATH)/crypto/async/arch/async_posix.c + $(OPENSSL_PATH)/crypto/async/arch/async_win.c + $(OPENSSL_PATH)/crypto/async/async.c + $(OPENSSL_PATH)/crypto/async/async_err.c + $(OPENSSL_PATH)/crypto/async/async_wait.c + $(OPENSSL_PATH)/crypto/bio/b_addr.c + $(OPENSSL_PATH)/crypto/bio/b_dump.c + $(OPENSSL_PATH)/crypto/bio/b_sock.c + $(OPENSSL_PATH)/crypto/bio/b_sock2.c + $(OPENSSL_PATH)/crypto/bio/bf_buff.c + $(OPENSSL_PATH)/crypto/bio/bf_lbuf.c + $(OPENSSL_PATH)/crypto/bio/bf_nbio.c + $(OPENSSL_PATH)/crypto/bio/bf_null.c + $(OPENSSL_PATH)/crypto/bio/bio_cb.c + $(OPENSSL_PATH)/crypto/bio/bio_err.c + $(OPENSSL_PATH)/crypto/bio/bio_lib.c + $(OPENSSL_PATH)/crypto/bio/bio_meth.c + $(OPENSSL_PATH)/crypto/bio/bss_acpt.c + $(OPENSSL_PATH)/crypto/bio/bss_bio.c + $(OPENSSL_PATH)/crypto/bio/bss_conn.c + $(OPENSSL_PATH)/crypto/bio/bss_dgram.c + $(OPENSSL_PATH)/crypto/bio/bss_fd.c + $(OPENSSL_PATH)/crypto/bio/bss_file.c + $(OPENSSL_PATH)/crypto/bio/bss_log.c + $(OPENSSL_PATH)/crypto/bio/bss_mem.c + $(OPENSSL_PATH)/crypto/bio/bss_null.c + $(OPENSSL_PATH)/crypto/bio/bss_sock.c + $(OPENSSL_PATH)/crypto/bn/bn_add.c + $(OPENSSL_PATH)/crypto/bn/bn_asm.c + $(OPENSSL_PATH)/crypto/bn/bn_blind.c + $(OPENSSL_PATH)/crypto/bn/bn_const.c + $(OPENSSL_PATH)/crypto/bn/bn_ctx.c + $(OPENSSL_PATH)/crypto/bn/bn_depr.c + $(OPENSSL_PATH)/crypto/bn/bn_dh.c + $(OPENSSL_PATH)/crypto/bn/bn_div.c + $(OPENSSL_PATH)/crypto/bn/bn_err.c + $(OPENSSL_PATH)/crypto/bn/bn_exp.c + $(OPENSSL_PATH)/crypto/bn/bn_exp2.c + $(OPENSSL_PATH)/crypto/bn/bn_gcd.c + $(OPENSSL_PATH)/crypto/bn/bn_gf2m.c + $(OPENSSL_PATH)/crypto/bn/bn_intern.c + $(OPENSSL_PATH)/crypto/bn/bn_kron.c + $(OPENSSL_PATH)/crypto/bn/bn_lib.c + $(OPENSSL_PATH)/crypto/bn/bn_mod.c + $(OPENSSL_PATH)/crypto/bn/bn_mont.c + $(OPENSSL_PATH)/crypto/bn/bn_mpi.c + $(OPENSSL_PATH)/crypto/bn/bn_mul.c + $(OPENSSL_PATH)/crypto/bn/bn_nist.c + $(OPENSSL_PATH)/crypto/bn/bn_prime.c + $(OPENSSL_PATH)/crypto/bn/bn_print.c + $(OPENSSL_PATH)/crypto/bn/bn_rand.c + $(OPENSSL_PATH)/crypto/bn/bn_recp.c + $(OPENSSL_PATH)/crypto/bn/bn_shift.c + $(OPENSSL_PATH)/crypto/bn/bn_sqr.c + $(OPENSSL_PATH)/crypto/bn/bn_sqrt.c + $(OPENSSL_PATH)/crypto/bn/bn_srp.c + $(OPENSSL_PATH)/crypto/bn/bn_word.c + $(OPENSSL_PATH)/crypto/bn/bn_x931p.c + $(OPENSSL_PATH)/crypto/buffer/buf_err.c + $(OPENSSL_PATH)/crypto/buffer/buffer.c + $(OPENSSL_PATH)/crypto/cmac/cm_ameth.c + $(OPENSSL_PATH)/crypto/cmac/cm_pmeth.c + $(OPENSSL_PATH)/crypto/cmac/cmac.c + $(OPENSSL_PATH)/crypto/comp/c_zlib.c + $(OPENSSL_PATH)/crypto/comp/comp_err.c + $(OPENSSL_PATH)/crypto/comp/comp_lib.c + $(OPENSSL_PATH)/crypto/conf/conf_api.c + $(OPENSSL_PATH)/crypto/conf/conf_def.c + $(OPENSSL_PATH)/crypto/conf/conf_err.c + $(OPENSSL_PATH)/crypto/conf/conf_lib.c + 
$(OPENSSL_PATH)/crypto/conf/conf_mall.c + $(OPENSSL_PATH)/crypto/conf/conf_mod.c + $(OPENSSL_PATH)/crypto/conf/conf_sap.c + $(OPENSSL_PATH)/crypto/conf/conf_ssl.c + $(OPENSSL_PATH)/crypto/cpt_err.c + $(OPENSSL_PATH)/crypto/cryptlib.c + $(OPENSSL_PATH)/crypto/ctype.c + $(OPENSSL_PATH)/crypto/cversion.c + $(OPENSSL_PATH)/crypto/dh/dh_ameth.c + $(OPENSSL_PATH)/crypto/dh/dh_asn1.c + $(OPENSSL_PATH)/crypto/dh/dh_check.c + $(OPENSSL_PATH)/crypto/dh/dh_depr.c + $(OPENSSL_PATH)/crypto/dh/dh_err.c + $(OPENSSL_PATH)/crypto/dh/dh_gen.c + $(OPENSSL_PATH)/crypto/dh/dh_kdf.c + $(OPENSSL_PATH)/crypto/dh/dh_key.c + $(OPENSSL_PATH)/crypto/dh/dh_lib.c + $(OPENSSL_PATH)/crypto/dh/dh_meth.c + $(OPENSSL_PATH)/crypto/dh/dh_pmeth.c + $(OPENSSL_PATH)/crypto/dh/dh_prn.c + $(OPENSSL_PATH)/crypto/dh/dh_rfc5114.c + $(OPENSSL_PATH)/crypto/dh/dh_rfc7919.c + $(OPENSSL_PATH)/crypto/dso/dso_dl.c + $(OPENSSL_PATH)/crypto/dso/dso_dlfcn.c + $(OPENSSL_PATH)/crypto/dso/dso_err.c + $(OPENSSL_PATH)/crypto/dso/dso_lib.c + $(OPENSSL_PATH)/crypto/dso/dso_openssl.c + $(OPENSSL_PATH)/crypto/dso/dso_vms.c + $(OPENSSL_PATH)/crypto/dso/dso_win32.c + $(OPENSSL_PATH)/crypto/ebcdic.c + $(OPENSSL_PATH)/crypto/err/err.c + $(OPENSSL_PATH)/crypto/err/err_prn.c + $(OPENSSL_PATH)/crypto/evp/bio_b64.c + $(OPENSSL_PATH)/crypto/evp/bio_enc.c + $(OPENSSL_PATH)/crypto/evp/bio_md.c + $(OPENSSL_PATH)/crypto/evp/bio_ok.c + $(OPENSSL_PATH)/crypto/evp/c_allc.c + $(OPENSSL_PATH)/crypto/evp/c_alld.c + $(OPENSSL_PATH)/crypto/evp/cmeth_lib.c + $(OPENSSL_PATH)/crypto/evp/digest.c + $(OPENSSL_PATH)/crypto/evp/e_aes.c + $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha1.c + $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha256.c + $(OPENSSL_PATH)/crypto/evp/e_aria.c + $(OPENSSL_PATH)/crypto/evp/e_bf.c + $(OPENSSL_PATH)/crypto/evp/e_camellia.c + $(OPENSSL_PATH)/crypto/evp/e_cast.c + $(OPENSSL_PATH)/crypto/evp/e_chacha20_poly1305.c + $(OPENSSL_PATH)/crypto/evp/e_des.c + $(OPENSSL_PATH)/crypto/evp/e_des3.c + $(OPENSSL_PATH)/crypto/evp/e_idea.c + $(OPENSSL_PATH)/crypto/evp/e_null.c + $(OPENSSL_PATH)/crypto/evp/e_old.c + $(OPENSSL_PATH)/crypto/evp/e_rc2.c + $(OPENSSL_PATH)/crypto/evp/e_rc4.c + $(OPENSSL_PATH)/crypto/evp/e_rc4_hmac_md5.c + $(OPENSSL_PATH)/crypto/evp/e_rc5.c + $(OPENSSL_PATH)/crypto/evp/e_seed.c + $(OPENSSL_PATH)/crypto/evp/e_sm4.c + $(OPENSSL_PATH)/crypto/evp/e_xcbc_d.c + $(OPENSSL_PATH)/crypto/evp/encode.c + $(OPENSSL_PATH)/crypto/evp/evp_cnf.c + $(OPENSSL_PATH)/crypto/evp/evp_enc.c + $(OPENSSL_PATH)/crypto/evp/evp_err.c + $(OPENSSL_PATH)/crypto/evp/evp_key.c + $(OPENSSL_PATH)/crypto/evp/evp_lib.c + $(OPENSSL_PATH)/crypto/evp/evp_pbe.c + $(OPENSSL_PATH)/crypto/evp/evp_pkey.c + $(OPENSSL_PATH)/crypto/evp/m_md2.c + $(OPENSSL_PATH)/crypto/evp/m_md4.c + $(OPENSSL_PATH)/crypto/evp/m_md5.c + $(OPENSSL_PATH)/crypto/evp/m_md5_sha1.c + $(OPENSSL_PATH)/crypto/evp/m_mdc2.c + $(OPENSSL_PATH)/crypto/evp/m_null.c + $(OPENSSL_PATH)/crypto/evp/m_ripemd.c + $(OPENSSL_PATH)/crypto/evp/m_sha1.c + $(OPENSSL_PATH)/crypto/evp/m_sha3.c + $(OPENSSL_PATH)/crypto/evp/m_sigver.c + $(OPENSSL_PATH)/crypto/evp/m_wp.c + $(OPENSSL_PATH)/crypto/evp/names.c + $(OPENSSL_PATH)/crypto/evp/p5_crpt.c + $(OPENSSL_PATH)/crypto/evp/p5_crpt2.c + $(OPENSSL_PATH)/crypto/evp/p_dec.c + $(OPENSSL_PATH)/crypto/evp/p_enc.c + $(OPENSSL_PATH)/crypto/evp/p_lib.c + $(OPENSSL_PATH)/crypto/evp/p_open.c + $(OPENSSL_PATH)/crypto/evp/p_seal.c + $(OPENSSL_PATH)/crypto/evp/p_sign.c + $(OPENSSL_PATH)/crypto/evp/p_verify.c + $(OPENSSL_PATH)/crypto/evp/pbe_scrypt.c + $(OPENSSL_PATH)/crypto/evp/pmeth_fn.c + 
$(OPENSSL_PATH)/crypto/evp/pmeth_gn.c + $(OPENSSL_PATH)/crypto/evp/pmeth_lib.c + $(OPENSSL_PATH)/crypto/ex_data.c + $(OPENSSL_PATH)/crypto/getenv.c + $(OPENSSL_PATH)/crypto/hmac/hm_ameth.c + $(OPENSSL_PATH)/crypto/hmac/hm_pmeth.c + $(OPENSSL_PATH)/crypto/hmac/hmac.c + $(OPENSSL_PATH)/crypto/init.c + $(OPENSSL_PATH)/crypto/kdf/hkdf.c + $(OPENSSL_PATH)/crypto/kdf/kdf_err.c + $(OPENSSL_PATH)/crypto/kdf/scrypt.c + $(OPENSSL_PATH)/crypto/kdf/tls1_prf.c + $(OPENSSL_PATH)/crypto/lhash/lh_stats.c + $(OPENSSL_PATH)/crypto/lhash/lhash.c + $(OPENSSL_PATH)/crypto/md5/md5_dgst.c + $(OPENSSL_PATH)/crypto/md5/md5_one.c + $(OPENSSL_PATH)/crypto/mem.c + $(OPENSSL_PATH)/crypto/mem_dbg.c + $(OPENSSL_PATH)/crypto/mem_sec.c + $(OPENSSL_PATH)/crypto/modes/cbc128.c + $(OPENSSL_PATH)/crypto/modes/ccm128.c + $(OPENSSL_PATH)/crypto/modes/cfb128.c + $(OPENSSL_PATH)/crypto/modes/ctr128.c + $(OPENSSL_PATH)/crypto/modes/cts128.c + $(OPENSSL_PATH)/crypto/modes/gcm128.c + $(OPENSSL_PATH)/crypto/modes/ocb128.c + $(OPENSSL_PATH)/crypto/modes/ofb128.c + $(OPENSSL_PATH)/crypto/modes/wrap128.c + $(OPENSSL_PATH)/crypto/modes/xts128.c + $(OPENSSL_PATH)/crypto/o_dir.c + $(OPENSSL_PATH)/crypto/o_fips.c + $(OPENSSL_PATH)/crypto/o_fopen.c + $(OPENSSL_PATH)/crypto/o_init.c + $(OPENSSL_PATH)/crypto/o_str.c + $(OPENSSL_PATH)/crypto/o_time.c + $(OPENSSL_PATH)/crypto/objects/o_names.c + $(OPENSSL_PATH)/crypto/objects/obj_dat.c + $(OPENSSL_PATH)/crypto/objects/obj_err.c + $(OPENSSL_PATH)/crypto/objects/obj_lib.c + $(OPENSSL_PATH)/crypto/objects/obj_xref.c + $(OPENSSL_PATH)/crypto/ocsp/ocsp_asn.c + $(OPENSSL_PATH)/crypto/ocsp/ocsp_cl.c + $(OPENSSL_PATH)/crypto/ocsp/ocsp_err.c + $(OPENSSL_PATH)/crypto/ocsp/ocsp_ext.c + $(OPENSSL_PATH)/crypto/ocsp/ocsp_ht.c + $(OPENSSL_PATH)/crypto/ocsp/ocsp_lib.c + $(OPENSSL_PATH)/crypto/ocsp/ocsp_prn.c + $(OPENSSL_PATH)/crypto/ocsp/ocsp_srv.c + $(OPENSSL_PATH)/crypto/ocsp/ocsp_vfy.c + $(OPENSSL_PATH)/crypto/ocsp/v3_ocsp.c + $(OPENSSL_PATH)/crypto/pem/pem_all.c + $(OPENSSL_PATH)/crypto/pem/pem_err.c + $(OPENSSL_PATH)/crypto/pem/pem_info.c + $(OPENSSL_PATH)/crypto/pem/pem_lib.c + $(OPENSSL_PATH)/crypto/pem/pem_oth.c + $(OPENSSL_PATH)/crypto/pem/pem_pk8.c + $(OPENSSL_PATH)/crypto/pem/pem_pkey.c + $(OPENSSL_PATH)/crypto/pem/pem_sign.c + $(OPENSSL_PATH)/crypto/pem/pem_x509.c + $(OPENSSL_PATH)/crypto/pem/pem_xaux.c + $(OPENSSL_PATH)/crypto/pem/pvkfmt.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_add.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_asn.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_attr.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_crpt.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_crt.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_decr.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_init.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_key.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_kiss.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_mutl.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_npas.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_p8d.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_p8e.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_sbag.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_utl.c + $(OPENSSL_PATH)/crypto/pkcs12/pk12err.c + $(OPENSSL_PATH)/crypto/pkcs7/bio_pk7.c + $(OPENSSL_PATH)/crypto/pkcs7/pk7_asn1.c + $(OPENSSL_PATH)/crypto/pkcs7/pk7_attr.c + $(OPENSSL_PATH)/crypto/pkcs7/pk7_doit.c + $(OPENSSL_PATH)/crypto/pkcs7/pk7_lib.c + $(OPENSSL_PATH)/crypto/pkcs7/pk7_mime.c + $(OPENSSL_PATH)/crypto/pkcs7/pk7_smime.c + $(OPENSSL_PATH)/crypto/pkcs7/pkcs7err.c + $(OPENSSL_PATH)/crypto/rand/drbg_ctr.c + $(OPENSSL_PATH)/crypto/rand/drbg_lib.c + $(OPENSSL_PATH)/crypto/rand/rand_egd.c + $(OPENSSL_PATH)/crypto/rand/rand_err.c + 
$(OPENSSL_PATH)/crypto/rand/rand_lib.c + $(OPENSSL_PATH)/crypto/rand/rand_unix.c + $(OPENSSL_PATH)/crypto/rand/rand_vms.c + $(OPENSSL_PATH)/crypto/rand/rand_win.c + $(OPENSSL_PATH)/crypto/rsa/rsa_ameth.c + $(OPENSSL_PATH)/crypto/rsa/rsa_asn1.c + $(OPENSSL_PATH)/crypto/rsa/rsa_chk.c + $(OPENSSL_PATH)/crypto/rsa/rsa_crpt.c + $(OPENSSL_PATH)/crypto/rsa/rsa_depr.c + $(OPENSSL_PATH)/crypto/rsa/rsa_err.c + $(OPENSSL_PATH)/crypto/rsa/rsa_gen.c + $(OPENSSL_PATH)/crypto/rsa/rsa_lib.c + $(OPENSSL_PATH)/crypto/rsa/rsa_meth.c + $(OPENSSL_PATH)/crypto/rsa/rsa_mp.c + $(OPENSSL_PATH)/crypto/rsa/rsa_none.c + $(OPENSSL_PATH)/crypto/rsa/rsa_oaep.c + $(OPENSSL_PATH)/crypto/rsa/rsa_ossl.c + $(OPENSSL_PATH)/crypto/rsa/rsa_pk1.c + $(OPENSSL_PATH)/crypto/rsa/rsa_pmeth.c + $(OPENSSL_PATH)/crypto/rsa/rsa_prn.c + $(OPENSSL_PATH)/crypto/rsa/rsa_pss.c + $(OPENSSL_PATH)/crypto/rsa/rsa_saos.c + $(OPENSSL_PATH)/crypto/rsa/rsa_sign.c + $(OPENSSL_PATH)/crypto/rsa/rsa_ssl.c + $(OPENSSL_PATH)/crypto/rsa/rsa_x931.c + $(OPENSSL_PATH)/crypto/rsa/rsa_x931g.c + $(OPENSSL_PATH)/crypto/sha/keccak1600.c + $(OPENSSL_PATH)/crypto/sha/sha1_one.c + $(OPENSSL_PATH)/crypto/sha/sha1dgst.c + $(OPENSSL_PATH)/crypto/sha/sha256.c + $(OPENSSL_PATH)/crypto/sha/sha512.c + $(OPENSSL_PATH)/crypto/siphash/siphash.c + $(OPENSSL_PATH)/crypto/siphash/siphash_ameth.c + $(OPENSSL_PATH)/crypto/siphash/siphash_pmeth.c + $(OPENSSL_PATH)/crypto/sm3/m_sm3.c + $(OPENSSL_PATH)/crypto/sm3/sm3.c + $(OPENSSL_PATH)/crypto/sm4/sm4.c + $(OPENSSL_PATH)/crypto/stack/stack.c + $(OPENSSL_PATH)/crypto/threads_none.c + $(OPENSSL_PATH)/crypto/threads_pthread.c + $(OPENSSL_PATH)/crypto/threads_win.c + $(OPENSSL_PATH)/crypto/txt_db/txt_db.c + $(OPENSSL_PATH)/crypto/ui/ui_err.c + $(OPENSSL_PATH)/crypto/ui/ui_lib.c + $(OPENSSL_PATH)/crypto/ui/ui_null.c + $(OPENSSL_PATH)/crypto/ui/ui_openssl.c + $(OPENSSL_PATH)/crypto/ui/ui_util.c + $(OPENSSL_PATH)/crypto/uid.c + $(OPENSSL_PATH)/crypto/x509/by_dir.c + $(OPENSSL_PATH)/crypto/x509/by_file.c + $(OPENSSL_PATH)/crypto/x509/t_crl.c + $(OPENSSL_PATH)/crypto/x509/t_req.c + $(OPENSSL_PATH)/crypto/x509/t_x509.c + $(OPENSSL_PATH)/crypto/x509/x509_att.c + $(OPENSSL_PATH)/crypto/x509/x509_cmp.c + $(OPENSSL_PATH)/crypto/x509/x509_d2.c + $(OPENSSL_PATH)/crypto/x509/x509_def.c + $(OPENSSL_PATH)/crypto/x509/x509_err.c + $(OPENSSL_PATH)/crypto/x509/x509_ext.c + $(OPENSSL_PATH)/crypto/x509/x509_lu.c + $(OPENSSL_PATH)/crypto/x509/x509_meth.c + $(OPENSSL_PATH)/crypto/x509/x509_obj.c + $(OPENSSL_PATH)/crypto/x509/x509_r2x.c + $(OPENSSL_PATH)/crypto/x509/x509_req.c + $(OPENSSL_PATH)/crypto/x509/x509_set.c + $(OPENSSL_PATH)/crypto/x509/x509_trs.c + $(OPENSSL_PATH)/crypto/x509/x509_txt.c + $(OPENSSL_PATH)/crypto/x509/x509_v3.c + $(OPENSSL_PATH)/crypto/x509/x509_vfy.c + $(OPENSSL_PATH)/crypto/x509/x509_vpm.c + $(OPENSSL_PATH)/crypto/x509/x509cset.c + $(OPENSSL_PATH)/crypto/x509/x509name.c + $(OPENSSL_PATH)/crypto/x509/x509rset.c + $(OPENSSL_PATH)/crypto/x509/x509spki.c + $(OPENSSL_PATH)/crypto/x509/x509type.c + $(OPENSSL_PATH)/crypto/x509/x_all.c + $(OPENSSL_PATH)/crypto/x509/x_attrib.c + $(OPENSSL_PATH)/crypto/x509/x_crl.c + $(OPENSSL_PATH)/crypto/x509/x_exten.c + $(OPENSSL_PATH)/crypto/x509/x_name.c + $(OPENSSL_PATH)/crypto/x509/x_pubkey.c + $(OPENSSL_PATH)/crypto/x509/x_req.c + $(OPENSSL_PATH)/crypto/x509/x_x509.c + $(OPENSSL_PATH)/crypto/x509/x_x509a.c + $(OPENSSL_PATH)/crypto/x509v3/pcy_cache.c + $(OPENSSL_PATH)/crypto/x509v3/pcy_data.c + $(OPENSSL_PATH)/crypto/x509v3/pcy_lib.c + $(OPENSSL_PATH)/crypto/x509v3/pcy_map.c + 
$(OPENSSL_PATH)/crypto/x509v3/pcy_node.c + $(OPENSSL_PATH)/crypto/x509v3/pcy_tree.c + $(OPENSSL_PATH)/crypto/x509v3/v3_addr.c + $(OPENSSL_PATH)/crypto/x509v3/v3_admis.c + $(OPENSSL_PATH)/crypto/x509v3/v3_akey.c + $(OPENSSL_PATH)/crypto/x509v3/v3_akeya.c + $(OPENSSL_PATH)/crypto/x509v3/v3_alt.c + $(OPENSSL_PATH)/crypto/x509v3/v3_asid.c + $(OPENSSL_PATH)/crypto/x509v3/v3_bcons.c + $(OPENSSL_PATH)/crypto/x509v3/v3_bitst.c + $(OPENSSL_PATH)/crypto/x509v3/v3_conf.c + $(OPENSSL_PATH)/crypto/x509v3/v3_cpols.c + $(OPENSSL_PATH)/crypto/x509v3/v3_crld.c + $(OPENSSL_PATH)/crypto/x509v3/v3_enum.c + $(OPENSSL_PATH)/crypto/x509v3/v3_extku.c + $(OPENSSL_PATH)/crypto/x509v3/v3_genn.c + $(OPENSSL_PATH)/crypto/x509v3/v3_ia5.c + $(OPENSSL_PATH)/crypto/x509v3/v3_info.c + $(OPENSSL_PATH)/crypto/x509v3/v3_int.c + $(OPENSSL_PATH)/crypto/x509v3/v3_lib.c + $(OPENSSL_PATH)/crypto/x509v3/v3_ncons.c + $(OPENSSL_PATH)/crypto/x509v3/v3_pci.c + $(OPENSSL_PATH)/crypto/x509v3/v3_pcia.c + $(OPENSSL_PATH)/crypto/x509v3/v3_pcons.c + $(OPENSSL_PATH)/crypto/x509v3/v3_pku.c + $(OPENSSL_PATH)/crypto/x509v3/v3_pmaps.c + $(OPENSSL_PATH)/crypto/x509v3/v3_prn.c + $(OPENSSL_PATH)/crypto/x509v3/v3_purp.c + $(OPENSSL_PATH)/crypto/x509v3/v3_skey.c + $(OPENSSL_PATH)/crypto/x509v3/v3_sxnet.c + $(OPENSSL_PATH)/crypto/x509v3/v3_tlsf.c + $(OPENSSL_PATH)/crypto/x509v3/v3_utl.c + $(OPENSSL_PATH)/crypto/x509v3/v3err.c + $(OPENSSL_PATH)/crypto/arm_arch.h + $(OPENSSL_PATH)/crypto/mips_arch.h + $(OPENSSL_PATH)/crypto/ppc_arch.h + $(OPENSSL_PATH)/crypto/s390x_arch.h + $(OPENSSL_PATH)/crypto/sparc_arch.h + $(OPENSSL_PATH)/crypto/vms_rms.h + $(OPENSSL_PATH)/crypto/aes/aes_local.h + $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.h + $(OPENSSL_PATH)/crypto/asn1/asn1_local.h + $(OPENSSL_PATH)/crypto/asn1/charmap.h + $(OPENSSL_PATH)/crypto/asn1/standard_methods.h + $(OPENSSL_PATH)/crypto/asn1/tbl_standard.h + $(OPENSSL_PATH)/crypto/async/async_local.h + $(OPENSSL_PATH)/crypto/async/arch/async_null.h + $(OPENSSL_PATH)/crypto/async/arch/async_posix.h + $(OPENSSL_PATH)/crypto/async/arch/async_win.h + $(OPENSSL_PATH)/crypto/bio/bio_local.h + $(OPENSSL_PATH)/crypto/bn/bn_local.h + $(OPENSSL_PATH)/crypto/bn/bn_prime.h + $(OPENSSL_PATH)/crypto/bn/rsaz_exp.h + $(OPENSSL_PATH)/crypto/comp/comp_local.h + $(OPENSSL_PATH)/crypto/conf/conf_def.h + $(OPENSSL_PATH)/crypto/conf/conf_local.h + $(OPENSSL_PATH)/crypto/dh/dh_local.h + $(OPENSSL_PATH)/crypto/dso/dso_local.h + $(OPENSSL_PATH)/crypto/evp/evp_local.h + $(OPENSSL_PATH)/crypto/hmac/hmac_local.h + $(OPENSSL_PATH)/crypto/lhash/lhash_local.h + $(OPENSSL_PATH)/crypto/md5/md5_local.h + $(OPENSSL_PATH)/crypto/modes/modes_local.h + $(OPENSSL_PATH)/crypto/objects/obj_dat.h + $(OPENSSL_PATH)/crypto/objects/obj_local.h + $(OPENSSL_PATH)/crypto/objects/obj_xref.h + $(OPENSSL_PATH)/crypto/ocsp/ocsp_local.h + $(OPENSSL_PATH)/crypto/pkcs12/p12_local.h + $(OPENSSL_PATH)/crypto/rand/rand_local.h + $(OPENSSL_PATH)/crypto/rsa/rsa_local.h + $(OPENSSL_PATH)/crypto/sha/sha_local.h + $(OPENSSL_PATH)/crypto/siphash/siphash_local.h + $(OPENSSL_PATH)/crypto/sm3/sm3_local.h + $(OPENSSL_PATH)/crypto/store/store_local.h + $(OPENSSL_PATH)/crypto/ui/ui_local.h + $(OPENSSL_PATH)/crypto/x509/x509_local.h + $(OPENSSL_PATH)/crypto/x509v3/ext_dat.h + $(OPENSSL_PATH)/crypto/x509v3/pcy_local.h + $(OPENSSL_PATH)/crypto/x509v3/standard_exts.h + $(OPENSSL_PATH)/crypto/x509v3/v3_admis.h + $(OPENSSL_PATH)/ssl/bio_ssl.c + $(OPENSSL_PATH)/ssl/d1_lib.c + $(OPENSSL_PATH)/ssl/d1_msg.c + $(OPENSSL_PATH)/ssl/d1_srtp.c + $(OPENSSL_PATH)/ssl/methods.c + 
$(OPENSSL_PATH)/ssl/packet.c + $(OPENSSL_PATH)/ssl/pqueue.c + $(OPENSSL_PATH)/ssl/record/dtls1_bitmap.c + $(OPENSSL_PATH)/ssl/record/rec_layer_d1.c + $(OPENSSL_PATH)/ssl/record/rec_layer_s3.c + $(OPENSSL_PATH)/ssl/record/ssl3_buffer.c + $(OPENSSL_PATH)/ssl/record/ssl3_record.c + $(OPENSSL_PATH)/ssl/record/ssl3_record_tls13.c + $(OPENSSL_PATH)/ssl/s3_cbc.c + $(OPENSSL_PATH)/ssl/s3_enc.c + $(OPENSSL_PATH)/ssl/s3_lib.c + $(OPENSSL_PATH)/ssl/s3_msg.c + $(OPENSSL_PATH)/ssl/ssl_asn1.c + $(OPENSSL_PATH)/ssl/ssl_cert.c + $(OPENSSL_PATH)/ssl/ssl_ciph.c + $(OPENSSL_PATH)/ssl/ssl_conf.c + $(OPENSSL_PATH)/ssl/ssl_err.c + $(OPENSSL_PATH)/ssl/ssl_init.c + $(OPENSSL_PATH)/ssl/ssl_lib.c + $(OPENSSL_PATH)/ssl/ssl_mcnf.c + $(OPENSSL_PATH)/ssl/ssl_rsa.c + $(OPENSSL_PATH)/ssl/ssl_sess.c + $(OPENSSL_PATH)/ssl/ssl_stat.c + $(OPENSSL_PATH)/ssl/ssl_txt.c + $(OPENSSL_PATH)/ssl/ssl_utst.c + $(OPENSSL_PATH)/ssl/statem/extensions.c + $(OPENSSL_PATH)/ssl/statem/extensions_clnt.c + $(OPENSSL_PATH)/ssl/statem/extensions_cust.c + $(OPENSSL_PATH)/ssl/statem/extensions_srvr.c + $(OPENSSL_PATH)/ssl/statem/statem.c + $(OPENSSL_PATH)/ssl/statem/statem_clnt.c + $(OPENSSL_PATH)/ssl/statem/statem_dtls.c + $(OPENSSL_PATH)/ssl/statem/statem_lib.c + $(OPENSSL_PATH)/ssl/statem/statem_srvr.c + $(OPENSSL_PATH)/ssl/t1_enc.c + $(OPENSSL_PATH)/ssl/t1_lib.c + $(OPENSSL_PATH)/ssl/t1_trce.c + $(OPENSSL_PATH)/ssl/tls13_enc.c + $(OPENSSL_PATH)/ssl/tls_srp.c + $(OPENSSL_PATH)/ssl/packet_local.h + $(OPENSSL_PATH)/ssl/ssl_cert_table.h + $(OPENSSL_PATH)/ssl/ssl_local.h + $(OPENSSL_PATH)/ssl/record/record.h + $(OPENSSL_PATH)/ssl/record/record_local.h + $(OPENSSL_PATH)/ssl/statem/statem.h + $(OPENSSL_PATH)/ssl/statem/statem_local.h +# Autogenerated files list ends here + buildinf.h + ossl_store.c + rand_pool.c + X64/ApiHooks.c + +[Packages] + MdePkg/MdePkg.dec + CryptoPkg/CryptoPkg.dec + +[LibraryClasses] + BaseLib + DebugLib + RngLib + PrintLib + +[BuildOptions] + # + # Disables the following Visual Studio compiler warnings brought by openssl source, + # so we do not break the build with /WX option: + # C4090: 'function' : different 'const' qualifiers + # C4132: 'object' : const object should be initialized (tls13_enc.c) + # C4210: nonstandard extension used: function given file scope + # C4244: conversion from type1 to type2, possible loss of data + # C4245: conversion from type1 to type2, signed/unsigned mismatch + # C4267: conversion from size_t to type, possible loss of data + # C4306: 'identifier' : conversion from 'type1' to 'type2' of greater size + # C4310: cast truncates constant value + # C4389: 'operator' : signed/unsigned mismatch (xxxx) + # C4700: uninitialized local variable 'name' used. (conf_sap.c(71)) + # C4702: unreachable code + # C4706: assignment within conditional expression + # C4819: The file contains a character that cannot be represented in the current code page + # + MSFT:*_*_X64_CC_FLAGS = -U_WIN32 -U_WIN64 -U_MSC_VER $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /wd4090 /wd4132 /wd4210 /wd4244 /wd4245 /wd4267 /wd4306 /wd4310 /wd4700 /wd4389 /wd4702 /wd4706 /wd4819 + + INTEL:*_*_X64_CC_FLAGS = -U_WIN32 -U_WIN64 -U_MSC_VER -U__ICC $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /w + + # + # Suppress the following build warnings in openssl so we don't break the build with -Werror + # -Werror=maybe-uninitialized: there exist some other paths for which the variable is not initialized. 
+ # -Werror=format: Check calls to printf and scanf, etc., to make sure that the arguments supplied have + # types appropriate to the format string specified. + # -Werror=unused-but-set-variable: Warn whenever a local variable is assigned to, but otherwise unused (aside from its declaration). + # + GCC:*_*_X64_CC_FLAGS = -UWIN32 -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) -Wno-error=maybe-uninitialized -Wno-error=format -Wno-format -Wno-error=unused-but-set-variable -DNO_MSABI_VA_FUNCS + + # suppress the following warnings in openssl so we don't break the build with warnings-as-errors: + # 1295: Deprecated declaration <entity> - give arg types + # 550: <entity> was set but never used + # 1293: assignment in condition + # 111: statement is unreachable (invariably "break;" after "return X;" in case statement) + # 68: integer conversion resulted in a change of sign ("if (Status == -1)") + # 177: <entity> was declared but never referenced + # 223: function <entity> declared implicitly + # 144: a value of type <type> cannot be used to initialize an entity of type <type> + # 513: a value of type <type> cannot be assigned to an entity of type <type> + # 188: enumerated type mixed with another type (i.e. passing an integer as an enum without a cast) + # 1296: Extended constant initialiser used + # 128: loop is not reachable - may be emitted inappropriately if code follows a conditional return + # from the function that evaluates to true at compile time + # 546: transfer of control bypasses initialization - may be emitted inappropriately if the uninitialized + # variable is never referenced after the jump + # 1: ignore "#1-D: last line of file ends without a newline" + # 3017: <entity> may be used before being set (NOTE: This was fixed in OpenSSL 1.1 HEAD with + # commit d9b8b89bec4480de3a10bdaf9425db371c19145b, and can be dropped then.) + XCODE:*_*_X64_CC_FLAGS = -mmmx -msse -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) -w -std=c99 -Wno-error=uninitialized diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibX64Gcc.inf b/CryptoPkg/Library/OpensslLib/OpensslLibX64Gcc.inf new file mode 100644 index 0000000000..4ffdd8cd06 --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/OpensslLibX64Gcc.inf @@ -0,0 +1,653 @@ +## @file +# This module provides OpenSSL Library implementation. +# +# Copyright (c) 2010 - 2020, Intel Corporation. 
All rights reserved.<BR> +# (C) Copyright 2020 Hewlett Packard Enterprise Development LP<BR> +# SPDX-License-Identifier: BSD-2-Clause-Patent +# +## + +[Defines] + INF_VERSION = 0x00010005 + BASE_NAME = OpensslLibX64Gcc + MODULE_UNI_FILE = OpensslLib.uni + FILE_GUID = DD90DB9D-6A3F-4F2B-87BF-A8F2BBEF982F + MODULE_TYPE = BASE + VERSION_STRING = 1.0 + LIBRARY_CLASS = OpensslLib + DEFINE OPENSSL_PATH = openssl + DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE + DEFINE OPENSSL_FLAGS_CONFIG = -DOPENSSL_CPUID_OBJ -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM + CONSTRUCTOR = OpensslLibConstructor + +# +# VALID_ARCHITECTURES = X64 +# + +[Sources.X64] + OpensslLibConstructor.c + $(OPENSSL_PATH)/e_os.h + $(OPENSSL_PATH)/ms/uplink.h +# Autogenerated files list starts here + X64Gcc/crypto/aes/aesni-mb-x86_64.S + X64Gcc/crypto/aes/aesni-sha1-x86_64.S + X64Gcc/crypto/aes/aesni-sha256-x86_64.S + X64Gcc/crypto/aes/aesni-x86_64.S + X64Gcc/crypto/aes/vpaes-x86_64.S + X64Gcc/crypto/modes/aesni-gcm-x86_64.S + X64Gcc/crypto/modes/ghash-x86_64.S + X64Gcc/crypto/sha/sha1-mb-x86_64.S + X64Gcc/crypto/sha/sha1-x86_64.S + X64Gcc/crypto/sha/sha256-mb-x86_64.S + X64Gcc/crypto/sha/sha256-x86_64.S + X64Gcc/crypto/sha/sha512-x86_64.S + X64Gcc/crypto/x86_64cpuid.S + $(OPENSSL_PATH)/crypto/aes/aes_cbc.c + $(OPENSSL_PATH)/crypto/aes/aes_cfb.c + $(OPENSSL_PATH)/crypto/aes/aes_core.c + $(OPENSSL_PATH)/crypto/aes/aes_ige.c + $(OPENSSL_PATH)/crypto/aes/aes_misc.c + $(OPENSSL_PATH)/crypto/aes/aes_ofb.c + $(OPENSSL_PATH)/crypto/aes/aes_wrap.c + $(OPENSSL_PATH)/crypto/aria/aria.c + $(OPENSSL_PATH)/crypto/asn1/a_bitstr.c + $(OPENSSL_PATH)/crypto/asn1/a_d2i_fp.c + $(OPENSSL_PATH)/crypto/asn1/a_digest.c + $(OPENSSL_PATH)/crypto/asn1/a_dup.c + $(OPENSSL_PATH)/crypto/asn1/a_gentm.c + $(OPENSSL_PATH)/crypto/asn1/a_i2d_fp.c + $(OPENSSL_PATH)/crypto/asn1/a_int.c + $(OPENSSL_PATH)/crypto/asn1/a_mbstr.c + $(OPENSSL_PATH)/crypto/asn1/a_object.c + $(OPENSSL_PATH)/crypto/asn1/a_octet.c + $(OPENSSL_PATH)/crypto/asn1/a_print.c + $(OPENSSL_PATH)/crypto/asn1/a_sign.c + $(OPENSSL_PATH)/crypto/asn1/a_strex.c + $(OPENSSL_PATH)/crypto/asn1/a_strnid.c + $(OPENSSL_PATH)/crypto/asn1/a_time.c + $(OPENSSL_PATH)/crypto/asn1/a_type.c + $(OPENSSL_PATH)/crypto/asn1/a_utctm.c + $(OPENSSL_PATH)/crypto/asn1/a_utf8.c + $(OPENSSL_PATH)/crypto/asn1/a_verify.c + $(OPENSSL_PATH)/crypto/asn1/ameth_lib.c + $(OPENSSL_PATH)/crypto/asn1/asn1_err.c + $(OPENSSL_PATH)/crypto/asn1/asn1_gen.c + $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.c + $(OPENSSL_PATH)/crypto/asn1/asn1_lib.c + $(OPENSSL_PATH)/crypto/asn1/asn1_par.c + $(OPENSSL_PATH)/crypto/asn1/asn_mime.c + $(OPENSSL_PATH)/crypto/asn1/asn_moid.c + $(OPENSSL_PATH)/crypto/asn1/asn_mstbl.c + $(OPENSSL_PATH)/crypto/asn1/asn_pack.c + $(OPENSSL_PATH)/crypto/asn1/bio_asn1.c + $(OPENSSL_PATH)/crypto/asn1/bio_ndef.c + $(OPENSSL_PATH)/crypto/asn1/d2i_pr.c + $(OPENSSL_PATH)/crypto/asn1/d2i_pu.c + $(OPENSSL_PATH)/crypto/asn1/evp_asn1.c + $(OPENSSL_PATH)/crypto/asn1/f_int.c + $(OPENSSL_PATH)/crypto/asn1/f_string.c + $(OPENSSL_PATH)/crypto/asn1/i2d_pr.c + $(OPENSSL_PATH)/crypto/asn1/i2d_pu.c + $(OPENSSL_PATH)/crypto/asn1/n_pkey.c + $(OPENSSL_PATH)/crypto/asn1/nsseq.c + $(OPENSSL_PATH)/crypto/asn1/p5_pbe.c + $(OPENSSL_PATH)/crypto/asn1/p5_pbev2.c + $(OPENSSL_PATH)/crypto/asn1/p5_scrypt.c + $(OPENSSL_PATH)/crypto/asn1/p8_pkey.c + $(OPENSSL_PATH)/crypto/asn1/t_bitst.c + $(OPENSSL_PATH)/crypto/asn1/t_pkey.c + $(OPENSSL_PATH)/crypto/asn1/t_spki.c + 
$(OPENSSL_PATH)/crypto/asn1/tasn_dec.c + $(OPENSSL_PATH)/crypto/asn1/tasn_enc.c + $(OPENSSL_PATH)/crypto/asn1/tasn_fre.c + $(OPENSSL_PATH)/crypto/asn1/tasn_new.c + $(OPENSSL_PATH)/crypto/asn1/tasn_prn.c + $(OPENSSL_PATH)/crypto/asn1/tasn_scn.c + $(OPENSSL_PATH)/crypto/asn1/tasn_typ.c + $(OPENSSL_PATH)/crypto/asn1/tasn_utl.c + $(OPENSSL_PATH)/crypto/asn1/x_algor.c + $(OPENSSL_PATH)/crypto/asn1/x_bignum.c + $(OPENSSL_PATH)/crypto/asn1/x_info.c + $(OPENSSL_PATH)/crypto/asn1/x_int64.c + $(OPENSSL_PATH)/crypto/asn1/x_long.c + $(OPENSSL_PATH)/crypto/asn1/x_pkey.c + $(OPENSSL_PATH)/crypto/asn1/x_sig.c + $(OPENSSL_PATH)/crypto/asn1/x_spki.c + $(OPENSSL_PATH)/crypto/asn1/x_val.c + $(OPENSSL_PATH)/crypto/async/arch/async_null.c + $(OPENSSL_PATH)/crypto/async/arch/async_posix.c + $(OPENSSL_PATH)/crypto/async/arch/async_win.c + $(OPENSSL_PATH)/crypto/async/async.c + $(OPENSSL_PATH)/crypto/async/async_err.c + $(OPENSSL_PATH)/crypto/async/async_wait.c + $(OPENSSL_PATH)/crypto/bio/b_addr.c + $(OPENSSL_PATH)/crypto/bio/b_dump.c + $(OPENSSL_PATH)/crypto/bio/b_sock.c + $(OPENSSL_PATH)/crypto/bio/b_sock2.c + $(OPENSSL_PATH)/crypto/bio/bf_buff.c + $(OPENSSL_PATH)/crypto/bio/bf_lbuf.c + $(OPENSSL_PATH)/crypto/bio/bf_nbio.c + $(OPENSSL_PATH)/crypto/bio/bf_null.c + $(OPENSSL_PATH)/crypto/bio/bio_cb.c + $(OPENSSL_PATH)/crypto/bio/bio_err.c + $(OPENSSL_PATH)/crypto/bio/bio_lib.c + $(OPENSSL_PATH)/crypto/bio/bio_meth.c + $(OPENSSL_PATH)/crypto/bio/bss_acpt.c + $(OPENSSL_PATH)/crypto/bio/bss_bio.c + $(OPENSSL_PATH)/crypto/bio/bss_conn.c + $(OPENSSL_PATH)/crypto/bio/bss_dgram.c + $(OPENSSL_PATH)/crypto/bio/bss_fd.c + $(OPENSSL_PATH)/crypto/bio/bss_file.c + $(OPENSSL_PATH)/crypto/bio/bss_log.c + $(OPENSSL_PATH)/crypto/bio/bss_mem.c + $(OPENSSL_PATH)/crypto/bio/bss_null.c + $(OPENSSL_PATH)/crypto/bio/bss_sock.c + $(OPENSSL_PATH)/crypto/bn/bn_add.c + $(OPENSSL_PATH)/crypto/bn/bn_asm.c + $(OPENSSL_PATH)/crypto/bn/bn_blind.c + $(OPENSSL_PATH)/crypto/bn/bn_const.c + $(OPENSSL_PATH)/crypto/bn/bn_ctx.c + $(OPENSSL_PATH)/crypto/bn/bn_depr.c + $(OPENSSL_PATH)/crypto/bn/bn_dh.c + $(OPENSSL_PATH)/crypto/bn/bn_div.c + $(OPENSSL_PATH)/crypto/bn/bn_err.c + $(OPENSSL_PATH)/crypto/bn/bn_exp.c + $(OPENSSL_PATH)/crypto/bn/bn_exp2.c + $(OPENSSL_PATH)/crypto/bn/bn_gcd.c + $(OPENSSL_PATH)/crypto/bn/bn_gf2m.c + $(OPENSSL_PATH)/crypto/bn/bn_intern.c + $(OPENSSL_PATH)/crypto/bn/bn_kron.c + $(OPENSSL_PATH)/crypto/bn/bn_lib.c + $(OPENSSL_PATH)/crypto/bn/bn_mod.c + $(OPENSSL_PATH)/crypto/bn/bn_mont.c + $(OPENSSL_PATH)/crypto/bn/bn_mpi.c + $(OPENSSL_PATH)/crypto/bn/bn_mul.c + $(OPENSSL_PATH)/crypto/bn/bn_nist.c + $(OPENSSL_PATH)/crypto/bn/bn_prime.c + $(OPENSSL_PATH)/crypto/bn/bn_print.c + $(OPENSSL_PATH)/crypto/bn/bn_rand.c + $(OPENSSL_PATH)/crypto/bn/bn_recp.c + $(OPENSSL_PATH)/crypto/bn/bn_shift.c + $(OPENSSL_PATH)/crypto/bn/bn_sqr.c + $(OPENSSL_PATH)/crypto/bn/bn_sqrt.c + $(OPENSSL_PATH)/crypto/bn/bn_srp.c + $(OPENSSL_PATH)/crypto/bn/bn_word.c + $(OPENSSL_PATH)/crypto/bn/bn_x931p.c + $(OPENSSL_PATH)/crypto/buffer/buf_err.c + $(OPENSSL_PATH)/crypto/buffer/buffer.c + $(OPENSSL_PATH)/crypto/cmac/cm_ameth.c + $(OPENSSL_PATH)/crypto/cmac/cm_pmeth.c + $(OPENSSL_PATH)/crypto/cmac/cmac.c + $(OPENSSL_PATH)/crypto/comp/c_zlib.c + $(OPENSSL_PATH)/crypto/comp/comp_err.c + $(OPENSSL_PATH)/crypto/comp/comp_lib.c + $(OPENSSL_PATH)/crypto/conf/conf_api.c + $(OPENSSL_PATH)/crypto/conf/conf_def.c + $(OPENSSL_PATH)/crypto/conf/conf_err.c + $(OPENSSL_PATH)/crypto/conf/conf_lib.c + $(OPENSSL_PATH)/crypto/conf/conf_mall.c + $(OPENSSL_PATH)/crypto/conf/conf_mod.c 
+ $(OPENSSL_PATH)/crypto/conf/conf_sap.c + $(OPENSSL_PATH)/crypto/conf/conf_ssl.c + $(OPENSSL_PATH)/crypto/cpt_err.c + $(OPENSSL_PATH)/crypto/cryptlib.c + $(OPENSSL_PATH)/crypto/ctype.c + $(OPENSSL_PATH)/crypto/cversion.c + $(OPENSSL_PATH)/crypto/dh/dh_ameth.c + $(OPENSSL_PATH)/crypto/dh/dh_asn1.c + $(OPENSSL_PATH)/crypto/dh/dh_check.c + $(OPENSSL_PATH)/crypto/dh/dh_depr.c + $(OPENSSL_PATH)/crypto/dh/dh_err.c + $(OPENSSL_PATH)/crypto/dh/dh_gen.c + $(OPENSSL_PATH)/crypto/dh/dh_kdf.c + $(OPENSSL_PATH)/crypto/dh/dh_key.c + $(OPENSSL_PATH)/crypto/dh/dh_lib.c + $(OPENSSL_PATH)/crypto/dh/dh_meth.c + $(OPENSSL_PATH)/crypto/dh/dh_pmeth.c + $(OPENSSL_PATH)/crypto/dh/dh_prn.c + $(OPENSSL_PATH)/crypto/dh/dh_rfc5114.c + $(OPENSSL_PATH)/crypto/dh/dh_rfc7919.c + $(OPENSSL_PATH)/crypto/dso/dso_dl.c + $(OPENSSL_PATH)/crypto/dso/dso_dlfcn.c + $(OPENSSL_PATH)/crypto/dso/dso_err.c + $(OPENSSL_PATH)/crypto/dso/dso_lib.c + $(OPENSSL_PATH)/crypto/dso/dso_openssl.c + $(OPENSSL_PATH)/crypto/dso/dso_vms.c + $(OPENSSL_PATH)/crypto/dso/dso_win32.c + $(OPENSSL_PATH)/crypto/ebcdic.c + $(OPENSSL_PATH)/crypto/err/err.c + $(OPENSSL_PATH)/crypto/err/err_prn.c + $(OPENSSL_PATH)/crypto/evp/bio_b64.c + $(OPENSSL_PATH)/crypto/evp/bio_enc.c + $(OPENSSL_PATH)/crypto/evp/bio_md.c + $(OPENSSL_PATH)/crypto/evp/bio_ok.c + $(OPENSSL_PATH)/crypto/evp/c_allc.c + $(OPENSSL_PATH)/crypto/evp/c_alld.c + $(OPENSSL_PATH)/crypto/evp/cmeth_lib.c + $(OPENSSL_PATH)/crypto/evp/digest.c + $(OPENSSL_PATH)/crypto/evp/e_aes.c + $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha1.c + $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha256.c + $(OPENSSL_PATH)/crypto/evp/e_aria.c + $(OPENSSL_PATH)/crypto/evp/e_bf.c + $(OPENSSL_PATH)/crypto/evp/e_camellia.c + $(OPENSSL_PATH)/crypto/evp/e_cast.c + $(OPENSSL_PATH)/crypto/evp/e_chacha20_poly1305.c + $(OPENSSL_PATH)/crypto/evp/e_des.c + $(OPENSSL_PATH)/crypto/evp/e_des3.c + $(OPENSSL_PATH)/crypto/evp/e_idea.c + $(OPENSSL_PATH)/crypto/evp/e_null.c + $(OPENSSL_PATH)/crypto/evp/e_old.c + $(OPENSSL_PATH)/crypto/evp/e_rc2.c + $(OPENSSL_PATH)/crypto/evp/e_rc4.c + $(OPENSSL_PATH)/crypto/evp/e_rc4_hmac_md5.c + $(OPENSSL_PATH)/crypto/evp/e_rc5.c + $(OPENSSL_PATH)/crypto/evp/e_seed.c + $(OPENSSL_PATH)/crypto/evp/e_sm4.c + $(OPENSSL_PATH)/crypto/evp/e_xcbc_d.c + $(OPENSSL_PATH)/crypto/evp/encode.c + $(OPENSSL_PATH)/crypto/evp/evp_cnf.c + $(OPENSSL_PATH)/crypto/evp/evp_enc.c + $(OPENSSL_PATH)/crypto/evp/evp_err.c + $(OPENSSL_PATH)/crypto/evp/evp_key.c + $(OPENSSL_PATH)/crypto/evp/evp_lib.c + $(OPENSSL_PATH)/crypto/evp/evp_pbe.c + $(OPENSSL_PATH)/crypto/evp/evp_pkey.c + $(OPENSSL_PATH)/crypto/evp/m_md2.c + $(OPENSSL_PATH)/crypto/evp/m_md4.c + $(OPENSSL_PATH)/crypto/evp/m_md5.c + $(OPENSSL_PATH)/crypto/evp/m_md5_sha1.c + $(OPENSSL_PATH)/crypto/evp/m_mdc2.c + $(OPENSSL_PATH)/crypto/evp/m_null.c + $(OPENSSL_PATH)/crypto/evp/m_ripemd.c + $(OPENSSL_PATH)/crypto/evp/m_sha1.c + $(OPENSSL_PATH)/crypto/evp/m_sha3.c + $(OPENSSL_PATH)/crypto/evp/m_sigver.c + $(OPENSSL_PATH)/crypto/evp/m_wp.c + $(OPENSSL_PATH)/crypto/evp/names.c + $(OPENSSL_PATH)/crypto/evp/p5_crpt.c + $(OPENSSL_PATH)/crypto/evp/p5_crpt2.c + $(OPENSSL_PATH)/crypto/evp/p_dec.c + $(OPENSSL_PATH)/crypto/evp/p_enc.c + $(OPENSSL_PATH)/crypto/evp/p_lib.c + $(OPENSSL_PATH)/crypto/evp/p_open.c + $(OPENSSL_PATH)/crypto/evp/p_seal.c + $(OPENSSL_PATH)/crypto/evp/p_sign.c + $(OPENSSL_PATH)/crypto/evp/p_verify.c + $(OPENSSL_PATH)/crypto/evp/pbe_scrypt.c + $(OPENSSL_PATH)/crypto/evp/pmeth_fn.c + $(OPENSSL_PATH)/crypto/evp/pmeth_gn.c + $(OPENSSL_PATH)/crypto/evp/pmeth_lib.c + 
$(OPENSSL_PATH)/crypto/ex_data.c + $(OPENSSL_PATH)/crypto/getenv.c + $(OPENSSL_PATH)/crypto/hmac/hm_ameth.c + $(OPENSSL_PATH)/crypto/hmac/hm_pmeth.c + $(OPENSSL_PATH)/crypto/hmac/hmac.c + $(OPENSSL_PATH)/crypto/init.c + $(OPENSSL_PATH)/crypto/kdf/hkdf.c + $(OPENSSL_PATH)/crypto/kdf/kdf_err.c + $(OPENSSL_PATH)/crypto/kdf/scrypt.c + $(OPENSSL_PATH)/crypto/kdf/tls1_prf.c + $(OPENSSL_PATH)/crypto/lhash/lh_stats.c + $(OPENSSL_PATH)/crypto/lhash/lhash.c + $(OPENSSL_PATH)/crypto/md5/md5_dgst.c + $(OPENSSL_PATH)/crypto/md5/md5_one.c + $(OPENSSL_PATH)/crypto/mem.c + $(OPENSSL_PATH)/crypto/mem_dbg.c + $(OPENSSL_PATH)/crypto/mem_sec.c + $(OPENSSL_PATH)/crypto/modes/cbc128.c + $(OPENSSL_PATH)/crypto/modes/ccm128.c + $(OPENSSL_PATH)/crypto/modes/cfb128.c + $(OPENSSL_PATH)/crypto/modes/ctr128.c + $(OPENSSL_PATH)/crypto/modes/cts128.c + $(OPENSSL_PATH)/crypto/modes/gcm128.c + $(OPENSSL_PATH)/crypto/modes/ocb128.c + $(OPENSSL_PATH)/crypto/modes/ofb128.c + $(OPENSSL_PATH)/crypto/modes/wrap128.c + $(OPENSSL_PATH)/crypto/modes/xts128.c + $(OPENSSL_PATH)/crypto/o_dir.c + $(OPENSSL_PATH)/crypto/o_fips.c + $(OPENSSL_PATH)/crypto/o_fopen.c + $(OPENSSL_PATH)/crypto/o_init.c + $(OPENSSL_PATH)/crypto/o_str.c + $(OPENSSL_PATH)/crypto/o_time.c + $(OPENSSL_PATH)/crypto/objects/o_names.c + $(OPENSSL_PATH)/crypto/objects/obj_dat.c + $(OPENSSL_PATH)/crypto/objects/obj_err.c + $(OPENSSL_PATH)/crypto/objects/obj_lib.c + $(OPENSSL_PATH)/crypto/objects/obj_xref.c + $(OPENSSL_PATH)/crypto/ocsp/ocsp_asn.c + $(OPENSSL_PATH)/crypto/ocsp/ocsp_cl.c + $(OPENSSL_PATH)/crypto/ocsp/ocsp_err.c + $(OPENSSL_PATH)/crypto/ocsp/ocsp_ext.c + $(OPENSSL_PATH)/crypto/ocsp/ocsp_ht.c + $(OPENSSL_PATH)/crypto/ocsp/ocsp_lib.c + $(OPENSSL_PATH)/crypto/ocsp/ocsp_prn.c + $(OPENSSL_PATH)/crypto/ocsp/ocsp_srv.c + $(OPENSSL_PATH)/crypto/ocsp/ocsp_vfy.c + $(OPENSSL_PATH)/crypto/ocsp/v3_ocsp.c + $(OPENSSL_PATH)/crypto/pem/pem_all.c + $(OPENSSL_PATH)/crypto/pem/pem_err.c + $(OPENSSL_PATH)/crypto/pem/pem_info.c + $(OPENSSL_PATH)/crypto/pem/pem_lib.c + $(OPENSSL_PATH)/crypto/pem/pem_oth.c + $(OPENSSL_PATH)/crypto/pem/pem_pk8.c + $(OPENSSL_PATH)/crypto/pem/pem_pkey.c + $(OPENSSL_PATH)/crypto/pem/pem_sign.c + $(OPENSSL_PATH)/crypto/pem/pem_x509.c + $(OPENSSL_PATH)/crypto/pem/pem_xaux.c + $(OPENSSL_PATH)/crypto/pem/pvkfmt.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_add.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_asn.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_attr.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_crpt.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_crt.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_decr.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_init.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_key.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_kiss.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_mutl.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_npas.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_p8d.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_p8e.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_sbag.c + $(OPENSSL_PATH)/crypto/pkcs12/p12_utl.c + $(OPENSSL_PATH)/crypto/pkcs12/pk12err.c + $(OPENSSL_PATH)/crypto/pkcs7/bio_pk7.c + $(OPENSSL_PATH)/crypto/pkcs7/pk7_asn1.c + $(OPENSSL_PATH)/crypto/pkcs7/pk7_attr.c + $(OPENSSL_PATH)/crypto/pkcs7/pk7_doit.c + $(OPENSSL_PATH)/crypto/pkcs7/pk7_lib.c + $(OPENSSL_PATH)/crypto/pkcs7/pk7_mime.c + $(OPENSSL_PATH)/crypto/pkcs7/pk7_smime.c + $(OPENSSL_PATH)/crypto/pkcs7/pkcs7err.c + $(OPENSSL_PATH)/crypto/rand/drbg_ctr.c + $(OPENSSL_PATH)/crypto/rand/drbg_lib.c + $(OPENSSL_PATH)/crypto/rand/rand_egd.c + $(OPENSSL_PATH)/crypto/rand/rand_err.c + $(OPENSSL_PATH)/crypto/rand/rand_lib.c + $(OPENSSL_PATH)/crypto/rand/rand_unix.c 
+ $(OPENSSL_PATH)/crypto/rand/rand_vms.c + $(OPENSSL_PATH)/crypto/rand/rand_win.c + $(OPENSSL_PATH)/crypto/rsa/rsa_ameth.c + $(OPENSSL_PATH)/crypto/rsa/rsa_asn1.c + $(OPENSSL_PATH)/crypto/rsa/rsa_chk.c + $(OPENSSL_PATH)/crypto/rsa/rsa_crpt.c + $(OPENSSL_PATH)/crypto/rsa/rsa_depr.c + $(OPENSSL_PATH)/crypto/rsa/rsa_err.c + $(OPENSSL_PATH)/crypto/rsa/rsa_gen.c + $(OPENSSL_PATH)/crypto/rsa/rsa_lib.c + $(OPENSSL_PATH)/crypto/rsa/rsa_meth.c + $(OPENSSL_PATH)/crypto/rsa/rsa_mp.c + $(OPENSSL_PATH)/crypto/rsa/rsa_none.c + $(OPENSSL_PATH)/crypto/rsa/rsa_oaep.c + $(OPENSSL_PATH)/crypto/rsa/rsa_ossl.c + $(OPENSSL_PATH)/crypto/rsa/rsa_pk1.c + $(OPENSSL_PATH)/crypto/rsa/rsa_pmeth.c + $(OPENSSL_PATH)/crypto/rsa/rsa_prn.c + $(OPENSSL_PATH)/crypto/rsa/rsa_pss.c + $(OPENSSL_PATH)/crypto/rsa/rsa_saos.c + $(OPENSSL_PATH)/crypto/rsa/rsa_sign.c + $(OPENSSL_PATH)/crypto/rsa/rsa_ssl.c + $(OPENSSL_PATH)/crypto/rsa/rsa_x931.c + $(OPENSSL_PATH)/crypto/rsa/rsa_x931g.c + $(OPENSSL_PATH)/crypto/sha/keccak1600.c + $(OPENSSL_PATH)/crypto/sha/sha1_one.c + $(OPENSSL_PATH)/crypto/sha/sha1dgst.c + $(OPENSSL_PATH)/crypto/sha/sha256.c + $(OPENSSL_PATH)/crypto/sha/sha512.c + $(OPENSSL_PATH)/crypto/siphash/siphash.c + $(OPENSSL_PATH)/crypto/siphash/siphash_ameth.c + $(OPENSSL_PATH)/crypto/siphash/siphash_pmeth.c + $(OPENSSL_PATH)/crypto/sm3/m_sm3.c + $(OPENSSL_PATH)/crypto/sm3/sm3.c + $(OPENSSL_PATH)/crypto/sm4/sm4.c + $(OPENSSL_PATH)/crypto/stack/stack.c + $(OPENSSL_PATH)/crypto/threads_none.c + $(OPENSSL_PATH)/crypto/threads_pthread.c + $(OPENSSL_PATH)/crypto/threads_win.c + $(OPENSSL_PATH)/crypto/txt_db/txt_db.c + $(OPENSSL_PATH)/crypto/ui/ui_err.c + $(OPENSSL_PATH)/crypto/ui/ui_lib.c + $(OPENSSL_PATH)/crypto/ui/ui_null.c + $(OPENSSL_PATH)/crypto/ui/ui_openssl.c + $(OPENSSL_PATH)/crypto/ui/ui_util.c + $(OPENSSL_PATH)/crypto/uid.c + $(OPENSSL_PATH)/crypto/x509/by_dir.c + $(OPENSSL_PATH)/crypto/x509/by_file.c + $(OPENSSL_PATH)/crypto/x509/t_crl.c + $(OPENSSL_PATH)/crypto/x509/t_req.c + $(OPENSSL_PATH)/crypto/x509/t_x509.c + $(OPENSSL_PATH)/crypto/x509/x509_att.c + $(OPENSSL_PATH)/crypto/x509/x509_cmp.c + $(OPENSSL_PATH)/crypto/x509/x509_d2.c + $(OPENSSL_PATH)/crypto/x509/x509_def.c + $(OPENSSL_PATH)/crypto/x509/x509_err.c + $(OPENSSL_PATH)/crypto/x509/x509_ext.c + $(OPENSSL_PATH)/crypto/x509/x509_lu.c + $(OPENSSL_PATH)/crypto/x509/x509_meth.c + $(OPENSSL_PATH)/crypto/x509/x509_obj.c + $(OPENSSL_PATH)/crypto/x509/x509_r2x.c + $(OPENSSL_PATH)/crypto/x509/x509_req.c + $(OPENSSL_PATH)/crypto/x509/x509_set.c + $(OPENSSL_PATH)/crypto/x509/x509_trs.c + $(OPENSSL_PATH)/crypto/x509/x509_txt.c + $(OPENSSL_PATH)/crypto/x509/x509_v3.c + $(OPENSSL_PATH)/crypto/x509/x509_vfy.c + $(OPENSSL_PATH)/crypto/x509/x509_vpm.c + $(OPENSSL_PATH)/crypto/x509/x509cset.c + $(OPENSSL_PATH)/crypto/x509/x509name.c + $(OPENSSL_PATH)/crypto/x509/x509rset.c + $(OPENSSL_PATH)/crypto/x509/x509spki.c + $(OPENSSL_PATH)/crypto/x509/x509type.c + $(OPENSSL_PATH)/crypto/x509/x_all.c + $(OPENSSL_PATH)/crypto/x509/x_attrib.c + $(OPENSSL_PATH)/crypto/x509/x_crl.c + $(OPENSSL_PATH)/crypto/x509/x_exten.c + $(OPENSSL_PATH)/crypto/x509/x_name.c + $(OPENSSL_PATH)/crypto/x509/x_pubkey.c + $(OPENSSL_PATH)/crypto/x509/x_req.c + $(OPENSSL_PATH)/crypto/x509/x_x509.c + $(OPENSSL_PATH)/crypto/x509/x_x509a.c + $(OPENSSL_PATH)/crypto/x509v3/pcy_cache.c + $(OPENSSL_PATH)/crypto/x509v3/pcy_data.c + $(OPENSSL_PATH)/crypto/x509v3/pcy_lib.c + $(OPENSSL_PATH)/crypto/x509v3/pcy_map.c + $(OPENSSL_PATH)/crypto/x509v3/pcy_node.c + $(OPENSSL_PATH)/crypto/x509v3/pcy_tree.c + 
$(OPENSSL_PATH)/crypto/x509v3/v3_addr.c + $(OPENSSL_PATH)/crypto/x509v3/v3_admis.c + $(OPENSSL_PATH)/crypto/x509v3/v3_akey.c + $(OPENSSL_PATH)/crypto/x509v3/v3_akeya.c + $(OPENSSL_PATH)/crypto/x509v3/v3_alt.c + $(OPENSSL_PATH)/crypto/x509v3/v3_asid.c + $(OPENSSL_PATH)/crypto/x509v3/v3_bcons.c + $(OPENSSL_PATH)/crypto/x509v3/v3_bitst.c + $(OPENSSL_PATH)/crypto/x509v3/v3_conf.c + $(OPENSSL_PATH)/crypto/x509v3/v3_cpols.c + $(OPENSSL_PATH)/crypto/x509v3/v3_crld.c + $(OPENSSL_PATH)/crypto/x509v3/v3_enum.c + $(OPENSSL_PATH)/crypto/x509v3/v3_extku.c + $(OPENSSL_PATH)/crypto/x509v3/v3_genn.c + $(OPENSSL_PATH)/crypto/x509v3/v3_ia5.c + $(OPENSSL_PATH)/crypto/x509v3/v3_info.c + $(OPENSSL_PATH)/crypto/x509v3/v3_int.c + $(OPENSSL_PATH)/crypto/x509v3/v3_lib.c + $(OPENSSL_PATH)/crypto/x509v3/v3_ncons.c + $(OPENSSL_PATH)/crypto/x509v3/v3_pci.c + $(OPENSSL_PATH)/crypto/x509v3/v3_pcia.c + $(OPENSSL_PATH)/crypto/x509v3/v3_pcons.c + $(OPENSSL_PATH)/crypto/x509v3/v3_pku.c + $(OPENSSL_PATH)/crypto/x509v3/v3_pmaps.c + $(OPENSSL_PATH)/crypto/x509v3/v3_prn.c + $(OPENSSL_PATH)/crypto/x509v3/v3_purp.c + $(OPENSSL_PATH)/crypto/x509v3/v3_skey.c + $(OPENSSL_PATH)/crypto/x509v3/v3_sxnet.c + $(OPENSSL_PATH)/crypto/x509v3/v3_tlsf.c + $(OPENSSL_PATH)/crypto/x509v3/v3_utl.c + $(OPENSSL_PATH)/crypto/x509v3/v3err.c + $(OPENSSL_PATH)/crypto/arm_arch.h + $(OPENSSL_PATH)/crypto/mips_arch.h + $(OPENSSL_PATH)/crypto/ppc_arch.h + $(OPENSSL_PATH)/crypto/s390x_arch.h + $(OPENSSL_PATH)/crypto/sparc_arch.h + $(OPENSSL_PATH)/crypto/vms_rms.h + $(OPENSSL_PATH)/crypto/aes/aes_local.h + $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.h + $(OPENSSL_PATH)/crypto/asn1/asn1_local.h + $(OPENSSL_PATH)/crypto/asn1/charmap.h + $(OPENSSL_PATH)/crypto/asn1/standard_methods.h + $(OPENSSL_PATH)/crypto/asn1/tbl_standard.h + $(OPENSSL_PATH)/crypto/async/async_local.h + $(OPENSSL_PATH)/crypto/async/arch/async_null.h + $(OPENSSL_PATH)/crypto/async/arch/async_posix.h + $(OPENSSL_PATH)/crypto/async/arch/async_win.h + $(OPENSSL_PATH)/crypto/bio/bio_local.h + $(OPENSSL_PATH)/crypto/bn/bn_local.h + $(OPENSSL_PATH)/crypto/bn/bn_prime.h + $(OPENSSL_PATH)/crypto/bn/rsaz_exp.h + $(OPENSSL_PATH)/crypto/comp/comp_local.h + $(OPENSSL_PATH)/crypto/conf/conf_def.h + $(OPENSSL_PATH)/crypto/conf/conf_local.h + $(OPENSSL_PATH)/crypto/dh/dh_local.h + $(OPENSSL_PATH)/crypto/dso/dso_local.h + $(OPENSSL_PATH)/crypto/evp/evp_local.h + $(OPENSSL_PATH)/crypto/hmac/hmac_local.h + $(OPENSSL_PATH)/crypto/lhash/lhash_local.h + $(OPENSSL_PATH)/crypto/md5/md5_local.h + $(OPENSSL_PATH)/crypto/modes/modes_local.h + $(OPENSSL_PATH)/crypto/objects/obj_dat.h + $(OPENSSL_PATH)/crypto/objects/obj_local.h + $(OPENSSL_PATH)/crypto/objects/obj_xref.h + $(OPENSSL_PATH)/crypto/ocsp/ocsp_local.h + $(OPENSSL_PATH)/crypto/pkcs12/p12_local.h + $(OPENSSL_PATH)/crypto/rand/rand_local.h + $(OPENSSL_PATH)/crypto/rsa/rsa_local.h + $(OPENSSL_PATH)/crypto/sha/sha_local.h + $(OPENSSL_PATH)/crypto/siphash/siphash_local.h + $(OPENSSL_PATH)/crypto/sm3/sm3_local.h + $(OPENSSL_PATH)/crypto/store/store_local.h + $(OPENSSL_PATH)/crypto/ui/ui_local.h + $(OPENSSL_PATH)/crypto/x509/x509_local.h + $(OPENSSL_PATH)/crypto/x509v3/ext_dat.h + $(OPENSSL_PATH)/crypto/x509v3/pcy_local.h + $(OPENSSL_PATH)/crypto/x509v3/standard_exts.h + $(OPENSSL_PATH)/crypto/x509v3/v3_admis.h + $(OPENSSL_PATH)/ssl/bio_ssl.c + $(OPENSSL_PATH)/ssl/d1_lib.c + $(OPENSSL_PATH)/ssl/d1_msg.c + $(OPENSSL_PATH)/ssl/d1_srtp.c + $(OPENSSL_PATH)/ssl/methods.c + $(OPENSSL_PATH)/ssl/packet.c + $(OPENSSL_PATH)/ssl/pqueue.c + 
$(OPENSSL_PATH)/ssl/record/dtls1_bitmap.c + $(OPENSSL_PATH)/ssl/record/rec_layer_d1.c + $(OPENSSL_PATH)/ssl/record/rec_layer_s3.c + $(OPENSSL_PATH)/ssl/record/ssl3_buffer.c + $(OPENSSL_PATH)/ssl/record/ssl3_record.c + $(OPENSSL_PATH)/ssl/record/ssl3_record_tls13.c + $(OPENSSL_PATH)/ssl/s3_cbc.c + $(OPENSSL_PATH)/ssl/s3_enc.c + $(OPENSSL_PATH)/ssl/s3_lib.c + $(OPENSSL_PATH)/ssl/s3_msg.c + $(OPENSSL_PATH)/ssl/ssl_asn1.c + $(OPENSSL_PATH)/ssl/ssl_cert.c + $(OPENSSL_PATH)/ssl/ssl_ciph.c + $(OPENSSL_PATH)/ssl/ssl_conf.c + $(OPENSSL_PATH)/ssl/ssl_err.c + $(OPENSSL_PATH)/ssl/ssl_init.c + $(OPENSSL_PATH)/ssl/ssl_lib.c + $(OPENSSL_PATH)/ssl/ssl_mcnf.c + $(OPENSSL_PATH)/ssl/ssl_rsa.c + $(OPENSSL_PATH)/ssl/ssl_sess.c + $(OPENSSL_PATH)/ssl/ssl_stat.c + $(OPENSSL_PATH)/ssl/ssl_txt.c + $(OPENSSL_PATH)/ssl/ssl_utst.c + $(OPENSSL_PATH)/ssl/statem/extensions.c + $(OPENSSL_PATH)/ssl/statem/extensions_clnt.c + $(OPENSSL_PATH)/ssl/statem/extensions_cust.c + $(OPENSSL_PATH)/ssl/statem/extensions_srvr.c + $(OPENSSL_PATH)/ssl/statem/statem.c + $(OPENSSL_PATH)/ssl/statem/statem_clnt.c + $(OPENSSL_PATH)/ssl/statem/statem_dtls.c + $(OPENSSL_PATH)/ssl/statem/statem_lib.c + $(OPENSSL_PATH)/ssl/statem/statem_srvr.c + $(OPENSSL_PATH)/ssl/t1_enc.c + $(OPENSSL_PATH)/ssl/t1_lib.c + $(OPENSSL_PATH)/ssl/t1_trce.c + $(OPENSSL_PATH)/ssl/tls13_enc.c + $(OPENSSL_PATH)/ssl/tls_srp.c + $(OPENSSL_PATH)/ssl/packet_local.h + $(OPENSSL_PATH)/ssl/ssl_cert_table.h + $(OPENSSL_PATH)/ssl/ssl_local.h + $(OPENSSL_PATH)/ssl/record/record.h + $(OPENSSL_PATH)/ssl/record/record_local.h + $(OPENSSL_PATH)/ssl/statem/statem.h + $(OPENSSL_PATH)/ssl/statem/statem_local.h +# Autogenerated files list ends here + buildinf.h + ossl_store.c + rand_pool.c + X64/ApiHooks.c + +[Packages] + MdePkg/MdePkg.dec + CryptoPkg/CryptoPkg.dec + +[LibraryClasses] + BaseLib + DebugLib + RngLib + PrintLib + +[BuildOptions] + # + # Disables the following Visual Studio compiler warnings brought by openssl source, + # so we do not break the build with /WX option: + # C4090: 'function' : different 'const' qualifiers + # C4132: 'object' : const object should be initialized (tls13_enc.c) + # C4210: nonstandard extension used: function given file scope + # C4244: conversion from type1 to type2, possible loss of data + # C4245: conversion from type1 to type2, signed/unsigned mismatch + # C4267: conversion from size_t to type, possible loss of data + # C4306: 'identifier' : conversion from 'type1' to 'type2' of greater size + # C4310: cast truncates constant value + # C4389: 'operator' : signed/unsigned mismatch (xxxx) + # C4700: uninitialized local variable 'name' used. (conf_sap.c(71)) + # C4702: unreachable code + # C4706: assignment within conditional expression + # C4819: The file contains a character that cannot be represented in the current code page + # + MSFT:*_*_X64_CC_FLAGS = -U_WIN32 -U_WIN64 -U_MSC_VER $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /wd4090 /wd4132 /wd4210 /wd4244 /wd4245 /wd4267 /wd4306 /wd4310 /wd4700 /wd4389 /wd4702 /wd4706 /wd4819 + + INTEL:*_*_X64_CC_FLAGS = -U_WIN32 -U_WIN64 -U_MSC_VER -U__ICC $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /w + + # + # Suppress the following build warnings in openssl so we don't break the build with -Werror + # -Werror=maybe-uninitialized: there exist some other paths for which the variable is not initialized. + # -Werror=format: Check calls to printf and scanf, etc., to make sure that the arguments supplied have + # types appropriate to the format string specified. 
+ # -Werror=unused-but-set-variable: Warn whenever a local variable is assigned to, but otherwise unused (aside from its declaration). + # + GCC:*_*_X64_CC_FLAGS = -UWIN32 -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) -Wno-error=maybe-uninitialized -Wno-error=format -Wno-format -Wno-error=unused-but-set-variable -DNO_MSABI_VA_FUNCS + + # suppress the following warnings in openssl so we don't break the build with warnings-as-errors: + # 1295: Deprecated declaration <entity> - give arg types + # 550: <entity> was set but never used + # 1293: assignment in condition + # 111: statement is unreachable (invariably "break;" after "return X;" in case statement) + # 68: integer conversion resulted in a change of sign ("if (Status == -1)") + # 177: <entity> was declared but never referenced + # 223: function <entity> declared implicitly + # 144: a value of type <type> cannot be used to initialize an entity of type <type> + # 513: a value of type <type> cannot be assigned to an entity of type <type> + # 188: enumerated type mixed with another type (i.e. passing an integer as an enum without a cast) + # 1296: Extended constant initialiser used + # 128: loop is not reachable - may be emitted inappropriately if code follows a conditional return + # from the function that evaluates to true at compile time + # 546: transfer of control bypasses initialization - may be emitted inappropriately if the uninitialized + # variable is never referenced after the jump + # 1: ignore "#1-D: last line of file ends without a newline" + # 3017: <entity> may be used before being set (NOTE: This was fixed in OpenSSL 1.1 HEAD with + # commit d9b8b89bec4480de3a10bdaf9425db371c19145b, and can be dropped then.) + XCODE:*_*_X64_CC_FLAGS = -mmmx -msse -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) -w -std=c99 -Wno-error=uninitialized diff --git a/CryptoPkg/Library/OpensslLib/UefiAsm.conf b/CryptoPkg/Library/OpensslLib/UefiAsm.conf new file mode 100644 index 0000000000..2c2978d696 --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/UefiAsm.conf @@ -0,0 +1,30 @@ +## -*- mode: perl; -*- +# UEFI assembly openssl configuration targets. +# +# Copyright (c) 2020, Intel Corporation. All rights reserved.<BR> +# +# SPDX-License-Identifier: BSD-2-Clause-Patent +# +## + +my %targets = ( +#### UEFI + "UEFI-x86_64" => { + perlasm_scheme => "nasm", + # inherit_from => [ "UEFI", asm("x86_64_asm") ], + inherit_from => [ "UEFI" ], + cpuid_asm_src => "x86_64cpuid.s", + aes_asm_src => "aes_core.c aes_cbc.c vpaes-x86_64.s aesni-x86_64.s aesni-sha1-x86_64.s aesni-sha256-x86_64.s aesni-mb-x86_64.s", + sha1_asm_src => "sha1-x86_64.s sha256-x86_64.s sha512-x86_64.s sha1-mb-x86_64.s sha256-mb-x86_64.s", + modes_asm_src => "ghash-x86_64.s aesni-gcm-x86_64.s", + }, + "UEFI-x86_64-GCC" => { + perlasm_scheme => "elf", + # inherit_from => [ "UEFI", asm("x86_64_asm") ], + inherit_from => [ "UEFI" ], + cpuid_asm_src => "x86_64cpuid.s", + aes_asm_src => "aes_core.c aes_cbc.c vpaes-x86_64.s aesni-x86_64.s aesni-sha1-x86_64.s aesni-sha256-x86_64.s aesni-mb-x86_64.s", + sha1_asm_src => "sha1-x86_64.s sha256-x86_64.s sha512-x86_64.s sha1-mb-x86_64.s sha256-mb-x86_64.s", + modes_asm_src => "ghash-x86_64.s aesni-gcm-x86_64.s", + }, +); diff --git a/CryptoPkg/Library/OpensslLib/X64/ApiHooks.c b/CryptoPkg/Library/OpensslLib/X64/ApiHooks.c new file mode 100644 index 0000000000..0c8043aa8e --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64/ApiHooks.c @@ -0,0 +1,22 @@ +/** @file + OpenSSL Library API hooks. + +Copyright (c) 2020, Intel Corporation. 
All rights reserved.<BR> +SPDX-License-Identifier: BSD-2-Clause-Patent + +**/ + +#include <Uefi.h> + +/** + Stub function for win64 API call. + +**/ +VOID * +__imp_RtlVirtualUnwind ( + VOID * Args + ) +{ + return NULL; +} + diff --git a/CryptoPkg/Library/OpensslLib/process_files.pl b/CryptoPkg/Library/OpensslLib/process_files.pl index 57ce195394..42bff05fa6 100755 --- a/CryptoPkg/Library/OpensslLib/process_files.pl +++ b/CryptoPkg/Library/OpensslLib/process_files.pl @@ -9,9 +9,65 @@ # do not need to do this, since the results are stored in the EDK2 # git repository for them. # +# Due to the script wrapping required to process the OpenSSL +# configuration data, each native architecture must be processed +# individually by the maintainer (in addition to the standard version): +# ./process_files.pl +# ./process_files.pl X64 +# ./process_files.pl [Arch] + use strict; use Cwd; use File::Copy; +use File::Basename; +use File::Path qw(make_path remove_tree); +use Text::Tabs; + +my $comment_character; + +# +# OpenSSL perlasm generator script does not transfer the copyright header +# +sub copy_license_header +{ + my @args = split / /, shift; #Separate args by spaces + my $source = $args[1]; #Source file is second (after "perl") + my $target = pop @args; #Target file is always last + chop ($target); #Remove newline char + + my $temp_file_name = "license.tmp"; + open (my $source_file, "<" . $source) || die $source; + open (my $target_file, "<" . $target) || die $target; + open (my $temp_file, ">" . $temp_file_name) || die $temp_file_name; + + #Add "generated file" warning + $source =~ s/^..//; #Remove leading "./" + print ($temp_file "$comment_character WARNING: do not edit!\r\n"); + print ($temp_file "$comment_character Generated from $source\r\n"); + print ($temp_file "$comment_character\r\n"); + + #Copy source file header to temp file + while (my $line = <$source_file>) { + next if ($line =~ /#!/); #Ignore shebang line + $line =~ s/#/$comment_character/; #Fix comment character for assembly + $line =~ s/\s+$/\r\n/; #Trim trailing whitepsace, fixup line endings + print ($temp_file $line); + last if ($line =~ /http/); #Last line of copyright header contains a web link + } + print ($temp_file "\r\n"); + #Retrieve generated assembly contents + while (my $line = <$target_file>) { + $line =~ s/\s+$/\r\n/; #Trim trailing whitepsace, fixup line endings + print ($temp_file expand ($line)); #expand() replaces tabs with spaces + } + + close ($source_file); + close ($target_file); + close ($temp_file); + + move ($temp_file_name, $target) || + die "Cannot replace \"" . $target . "\"!"; +} # # Find the openssl directory name for use lib. We have to do this @@ -21,10 +77,57 @@ use File::Copy; # my $inf_file; my $OPENSSL_PATH; +my $uefi_config; +my $extension; +my $arch; my @inf; BEGIN { $inf_file = "OpensslLib.inf"; + $uefi_config = "UEFI"; + $arch = shift; + + if (defined $arch) { + if (uc ($arch) eq "X64") { + $arch = "X64"; + $inf_file = "OpensslLibX64.inf"; + $uefi_config = "UEFI-x86_64"; + $extension = "nasm"; + $comment_character = ";"; + } elsif (uc ($arch) eq "X64GCC") { + $arch = "X64Gcc"; + $inf_file = "OpensslLibX64Gcc.inf"; + $uefi_config = "UEFI-x86_64-GCC"; + $extension = "S"; + $comment_character = "#"; + } else { + die "Unsupported architecture \"" . $arch . 
"\"!"; + } + if ($extension eq "nasm") { + if (`nasm -v 2>&1`) { + #Presence of nasm executable will trigger inclusion of AVX instructions + die "\nCannot run assembly generators with NASM in path!\n\n"; + } + } + + # Prepare assembly folder + if (-d $arch) { + opendir my $dir, $arch || + die "Cannot open assembly folder \"" . $arch . "\"!"; + while (defined (my $file = readdir $dir)) { + if (-d "$arch/$file") { + next if $file eq "."; + next if $file eq ".."; + remove_tree ("$arch/$file", {safe => 1}) || + die "Cannot clean assembly folder \"" . "$arch/$file" . "\"!"; + } + } + + } else { + mkdir $arch || + die "Cannot create assembly folder \"" . $arch . "\"!"; + } + } # Read the contents of the inf file open( FD, "<" . $inf_file ) || @@ -47,9 +150,9 @@ BEGIN { # Configure UEFI system( "./Configure", - "UEFI", + "--config=../UefiAsm.conf", + "$uefi_config", "no-afalgeng", - "no-asm", "no-async", "no-autoerrinit", "no-autoload-config", @@ -129,23 +232,53 @@ BEGIN { # Retrieve file lists from OpenSSL configdata # use configdata qw/%unified_info/; +use configdata qw/%config/; +use configdata qw/%target/; + +# +# Collect build flags from configdata +# +my $flags = ""; +foreach my $f (@{$config{lib_defines}}) { + $flags .= " -D$f"; +} my @cryptofilelist = (); my @sslfilelist = (); +my @asmfilelist = (); +my @asmbuild = (); foreach my $product ((@{$unified_info{libraries}}, @{$unified_info{engines}})) { foreach my $o (@{$unified_info{sources}->{$product}}) { foreach my $s (@{$unified_info{sources}->{$o}}) { - next if ($unified_info{generate}->{$s}); - next if $s =~ "crypto/bio/b_print.c"; - # No need to add unused files in UEFI. # So it can reduce porting time, compile time, library size. + next if $s =~ "crypto/bio/b_print.c"; next if $s =~ "crypto/rand/randfile.c"; next if $s =~ "crypto/store/"; next if $s =~ "crypto/err/err_all.c"; next if $s =~ "crypto/aes/aes_ecb.c"; + if ($unified_info{generate}->{$s}) { + if (defined $arch) { + my $buildstring = "perl"; + foreach my $arg (@{$unified_info{generate}->{$s}}) { + if ($arg =~ ".pl") { + $buildstring .= " ./openssl/$arg"; + } elsif ($arg =~ "PERLASM_SCHEME") { + $buildstring .= " $target{perlasm_scheme}"; + } elsif ($arg =~ "LIB_CFLAGS") { + $buildstring .= "$flags"; + } + } + ($s, my $path, undef) = fileparse($s, qr/\.[^.]*/); + $buildstring .= " ./$arch/$path$s.$extension"; + make_path ("./$arch/$path"); + push @asmbuild, "$buildstring\n"; + push @asmfilelist, " $arch/$path$s.$extension\r\n"; + } + next; + } if ($product =~ "libssl") { push @sslfilelist, ' $(OPENSSL_PATH)/' . $s . "\r\n"; next; @@ -183,15 +316,31 @@ foreach (@headers){ } +# +# Generate assembly files +# +if (@asmbuild) { + print "\n--> Generating assembly files ... "; + foreach my $buildstring (@asmbuild) { + system ("$buildstring"); + copy_license_header ($buildstring); + } + print "Done!"; +} + # # Update OpensslLib.inf with autogenerated file list # my @new_inf = (); my $subbing = 0; -print "\n--> Updating OpensslLib.inf ... "; +print "\n--> Updating $inf_file ... "; foreach (@inf) { + if ($_ =~ "DEFINE OPENSSL_FLAGS_CONFIG") { + push @new_inf, " DEFINE OPENSSL_FLAGS_CONFIG =" . $flags . 
"\r\n"; + next; + } if ( $_ =~ "# Autogenerated files list starts here" ) { - push @new_inf, $_, @cryptofilelist, @sslfilelist; + push @new_inf, $_, @asmfilelist, @cryptofilelist, @sslfilelist; $subbing = 1; next; } @@ -216,49 +365,51 @@ rename( $new_inf_file, $inf_file ) || die "rename $inf_file"; print "Done!"; -# -# Update OpensslLibCrypto.inf with auto-generated file list (no libssl) -# -$inf_file = "OpensslLibCrypto.inf"; - -# Read the contents of the inf file -@inf = (); -@new_inf = (); -open( FD, "<" . $inf_file ) || - die "Cannot open \"" . $inf_file . "\"!"; -@inf = (<FD>); -close(FD) || - die "Cannot close \"" . $inf_file . "\"!"; +if (!defined $arch) { + # + # Update OpensslLibCrypto.inf with auto-generated file list (no libssl) + # + $inf_file = "OpensslLibCrypto.inf"; -$subbing = 0; -print "\n--> Updating OpensslLibCrypto.inf ... "; -foreach (@inf) { - if ( $_ =~ "# Autogenerated files list starts here" ) { - push @new_inf, $_, @cryptofilelist; - $subbing = 1; - next; - } - if ( $_ =~ "# Autogenerated files list ends here" ) { - push @new_inf, $_; - $subbing = 0; - next; + # Read the contents of the inf file + @inf = (); + @new_inf = (); + open( FD, "<" . $inf_file ) || + die "Cannot open \"" . $inf_file . "\"!"; + @inf = (<FD>); + close(FD) || + die "Cannot close \"" . $inf_file . "\"!"; + + $subbing = 0; + print "\n--> Updating OpensslLibCrypto.inf ... "; + foreach (@inf) { + if ( $_ =~ "# Autogenerated files list starts here" ) { + push @new_inf, $_, @cryptofilelist; + $subbing = 1; + next; + } + if ( $_ =~ "# Autogenerated files list ends here" ) { + push @new_inf, $_; + $subbing = 0; + next; + } + + push @new_inf, $_ + unless ($subbing); } - push @new_inf, $_ - unless ($subbing); + $new_inf_file = $inf_file . ".new"; + open( FD, ">" . $new_inf_file ) || + die $new_inf_file; + print( FD @new_inf ) || + die $new_inf_file; + close(FD) || + die $new_inf_file; + rename( $new_inf_file, $inf_file ) || + die "rename $inf_file"; + print "Done!"; } -$new_inf_file = $inf_file . ".new"; -open( FD, ">" . $new_inf_file ) || - die $new_inf_file; -print( FD @new_inf ) || - die $new_inf_file; -close(FD) || - die $new_inf_file; -rename( $new_inf_file, $inf_file ) || - die "rename $inf_file"; -print "Done!"; - # # Copy opensslconf.h and dso_conf.h generated from OpenSSL Configuration # -- 2.32.0.windows.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v7 2/3] CryptoPkg/OpensslLib: Add native instruction support for X64 2021-07-20 22:06 ` [PATCH v7 2/3] CryptoPkg/OpensslLib: Add native instruction support for X64 Christopher Zurcher @ 2021-07-21 11:44 ` Yao, Jiewen 0 siblings, 0 replies; 13+ messages in thread From: Yao, Jiewen @ 2021-07-21 11:44 UTC (permalink / raw) To: christopher.zurcher@outlook.com, devel@edk2.groups.io Cc: Wang, Jian J, Lu, XiaoyuX, Kinney, Michael D, Ard Biesheuvel Reviewed-by: Jiewen Yao <Jiewen.yao@intel.com> > -----Original Message----- > From: christopher.zurcher@outlook.com <christopher.zurcher@outlook.com> > Sent: Wednesday, July 21, 2021 6:07 AM > To: devel@edk2.groups.io > Cc: Yao, Jiewen <jiewen.yao@intel.com>; Wang, Jian J <jian.j.wang@intel.com>; > Lu, XiaoyuX <xiaoyux.lu@intel.com>; Kinney, Michael D > <michael.d.kinney@intel.com>; Ard Biesheuvel <ardb@kernel.org> > Subject: [PATCH v7 2/3] CryptoPkg/OpensslLib: Add native instruction support > for X64 > > From: Christopher Zurcher <christopher.zurcher@microsoft.com> > > BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507 > > Adding OpensslLibX64.inf and modifying process_files.pl to process this > file and generate the necessary assembly files. > Adding OpensslLibX64Gcc.inf to allow building with GCC toolchain. > ApiHooks.c contains a stub function for a Windows API call. > uefi-asm.conf contains the limited assembly configurations for OpenSSL. > > Cc: Jiewen Yao <jiewen.yao@intel.com> > Cc: Jian J Wang <jian.j.wang@intel.com> > Cc: Xiaoyu Lu <xiaoyux.lu@intel.com> > Cc: Mike Kinney <michael.d.kinney@intel.com> > Cc: Ard Biesheuvel <ardb@kernel.org> > Signed-off-by: Christopher Zurcher <christopher.zurcher@microsoft.com> > --- > CryptoPkg/CryptoPkg.ci.yaml | 21 +- > CryptoPkg/Library/Include/CrtLibSupport.h | 2 + > CryptoPkg/Library/Include/openssl/opensslconf.h | 3 - > CryptoPkg/Library/OpensslLib/OpensslLib.inf | 2 +- > CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c | 44 ++ > CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf | 2 +- > CryptoPkg/Library/OpensslLib/OpensslLibX64.inf | 653 > ++++++++++++++++++++ > CryptoPkg/Library/OpensslLib/OpensslLibX64Gcc.inf | 653 > ++++++++++++++++++++ > CryptoPkg/Library/OpensslLib/UefiAsm.conf | 30 + > CryptoPkg/Library/OpensslLib/X64/ApiHooks.c | 22 + > CryptoPkg/Library/OpensslLib/process_files.pl | 241 ++++++-- > 11 files changed, 1619 insertions(+), 54 deletions(-) > > diff --git a/CryptoPkg/CryptoPkg.ci.yaml b/CryptoPkg/CryptoPkg.ci.yaml > index 5d7c340ae5..1448299073 100644 > --- a/CryptoPkg/CryptoPkg.ci.yaml > +++ b/CryptoPkg/CryptoPkg.ci.yaml > @@ -7,7 +7,11 @@ > ## > { > "LicenseCheck": { > - "IgnoreFiles": [] > + "IgnoreFiles": [ > + # These directories contain auto-generated OpenSSL content > + "Library/OpensslLib/X64", > + "Library/OpensslLib/X64Gcc" > + ] > }, > "EccCheck": { > ## Exception sample looks like below: > @@ -23,8 +27,13 @@ > "Test/UnitTest", > # This has OpenSSL interfaces that aren't UEFI spec compliant > "Library/BaseCryptLib/SysCall/UnitTestHostCrtWrapper.c", > - # this has OpenSSL interfaces that aren't UEFI spec compliant > - "Library/OpensslLib/rand_pool.c" > + # This has OpenSSL interfaces that aren't UEFI spec compliant > + "Library/OpensslLib/rand_pool.c", > + # This has OpenSSL interfaces that aren't UEFI spec compliant > + "Library/Include/CrtLibSupport.h", > + # These directories contain auto-generated OpenSSL content > + "Library/OpensslLib/X64", > + "Library/OpensslLib/X64Gcc" > ] > }, > "CompilerPlugin": { > @@ -51,7 +60,11 @@ > }, > 
"DscCompleteCheck": { > "DscPath": "CryptoPkg.dsc", > - "IgnoreInf": [] > + "IgnoreInf": [ > + # These are alternatives to OpensslLib.inf > + "CryptoPkg/Library/OpensslLib/OpensslLibX64.inf", > + "CryptoPkg/Library/OpensslLib/OpensslLibX64Gcc.inf" > + ] > }, > "GuidCheck": { > "IgnoreGuidName": [], > diff --git a/CryptoPkg/Library/Include/CrtLibSupport.h > b/CryptoPkg/Library/Include/CrtLibSupport.h > index b1dff03bdc..17d7f29ba2 100644 > --- a/CryptoPkg/Library/Include/CrtLibSupport.h > +++ b/CryptoPkg/Library/Include/CrtLibSupport.h > @@ -102,6 +102,7 @@ SPDX-License-Identifier: BSD-2-Clause-Patent > // > typedef UINTN size_t; > typedef UINTN u_int; > +typedef INTN ptrdiff_t; > typedef INTN ssize_t; > typedef INT32 time_t; > typedef UINT8 __uint8_t; > @@ -109,6 +110,7 @@ typedef UINT8 sa_family_t; > typedef UINT8 u_char; > typedef UINT32 uid_t; > typedef UINT32 gid_t; > +typedef CHAR16 wchar_t; > > // > // File operations are not required for EFI building, > diff --git a/CryptoPkg/Library/Include/openssl/opensslconf.h > b/CryptoPkg/Library/Include/openssl/opensslconf.h > index e5652be5ca..b8d59aebe8 100644 > --- a/CryptoPkg/Library/Include/openssl/opensslconf.h > +++ b/CryptoPkg/Library/Include/openssl/opensslconf.h > @@ -112,9 +112,6 @@ extern "C" { > #ifndef OPENSSL_NO_ASAN > # define OPENSSL_NO_ASAN > #endif > -#ifndef OPENSSL_NO_ASM > -# define OPENSSL_NO_ASM > -#endif > #ifndef OPENSSL_NO_ASYNC > # define OPENSSL_NO_ASYNC > #endif > diff --git a/CryptoPkg/Library/OpensslLib/OpensslLib.inf > b/CryptoPkg/Library/OpensslLib/OpensslLib.inf > index b00bb74ce6..d84bde056a 100644 > --- a/CryptoPkg/Library/OpensslLib/OpensslLib.inf > +++ b/CryptoPkg/Library/OpensslLib/OpensslLib.inf > @@ -16,7 +16,7 @@ > VERSION_STRING = 1.0 > LIBRARY_CLASS = OpensslLib > DEFINE OPENSSL_PATH = openssl > - DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT > -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE > + DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT > -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE - > DOPENSSL_NO_ASM > > # > # VALID_ARCHITECTURES = IA32 X64 ARM AARCH64 > diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c > b/CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c > new file mode 100644 > index 0000000000..74ae1ac20c > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c > @@ -0,0 +1,44 @@ > +/** @file > + Constructor to initialize CPUID data for OpenSSL assembly operations. > + > +Copyright (c) 2020, Intel Corporation. All rights reserved.<BR> > +SPDX-License-Identifier: BSD-2-Clause-Patent > + > +**/ > + > +#include <Uefi.h> > + > + > +/** > + An internal OpenSSL function which fetches a local copy of the hardware > + capability flags. > + > +**/ > +extern > +VOID > +OPENSSL_cpuid_setup ( > + VOID > + ); > + > +/** > + Constructor routine for OpensslLib. > + > + The constructor calls an internal OpenSSL function which fetches a local copy > + of the hardware capability flags, used to enable native crypto instructions. > + > + @param None > + > + @retval EFI_SUCCESS The construction succeeded. 
> + > +**/ > +EFI_STATUS > +EFIAPI > +OpensslLibConstructor ( > + VOID > + ) > +{ > + OPENSSL_cpuid_setup (); > + > + return EFI_SUCCESS; > +} > + > diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf > b/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf > index 3557711bd8..cdeed0d073 100644 > --- a/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf > +++ b/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf > @@ -16,7 +16,7 @@ > VERSION_STRING = 1.0 > LIBRARY_CLASS = OpensslLib > DEFINE OPENSSL_PATH = openssl > - DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT > -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE > + DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT > -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE - > DOPENSSL_NO_ASM > > # > # VALID_ARCHITECTURES = IA32 X64 ARM AARCH64 > diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibX64.inf > b/CryptoPkg/Library/OpensslLib/OpensslLibX64.inf > new file mode 100644 > index 0000000000..b92feaf1bf > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/OpensslLibX64.inf > @@ -0,0 +1,653 @@ > +## @file > +# This module provides OpenSSL Library implementation. > +# > +# Copyright (c) 2010 - 2020, Intel Corporation. All rights reserved.<BR> > +# (C) Copyright 2020 Hewlett Packard Enterprise Development LP<BR> > +# SPDX-License-Identifier: BSD-2-Clause-Patent > +# > +## > + > +[Defines] > + INF_VERSION = 0x00010005 > + BASE_NAME = OpensslLibX64 > + MODULE_UNI_FILE = OpensslLib.uni > + FILE_GUID = 18125E50-0117-4DD0-BE54-4784AD995FEF > + MODULE_TYPE = BASE > + VERSION_STRING = 1.0 > + LIBRARY_CLASS = OpensslLib > + DEFINE OPENSSL_PATH = openssl > + DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT > -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE > + DEFINE OPENSSL_FLAGS_CONFIG = -DOPENSSL_CPUID_OBJ -DSHA1_ASM - > DSHA256_ASM -DSHA512_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM > + CONSTRUCTOR = OpensslLibConstructor > + > +# > +# VALID_ARCHITECTURES = X64 > +# > + > +[Sources.X64] > + OpensslLibConstructor.c > + $(OPENSSL_PATH)/e_os.h > + $(OPENSSL_PATH)/ms/uplink.h > +# Autogenerated files list starts here > + X64/crypto/aes/aesni-mb-x86_64.nasm > + X64/crypto/aes/aesni-sha1-x86_64.nasm > + X64/crypto/aes/aesni-sha256-x86_64.nasm > + X64/crypto/aes/aesni-x86_64.nasm > + X64/crypto/aes/vpaes-x86_64.nasm > + X64/crypto/modes/aesni-gcm-x86_64.nasm > + X64/crypto/modes/ghash-x86_64.nasm > + X64/crypto/sha/sha1-mb-x86_64.nasm > + X64/crypto/sha/sha1-x86_64.nasm > + X64/crypto/sha/sha256-mb-x86_64.nasm > + X64/crypto/sha/sha256-x86_64.nasm > + X64/crypto/sha/sha512-x86_64.nasm > + X64/crypto/x86_64cpuid.nasm > + $(OPENSSL_PATH)/crypto/aes/aes_cbc.c > + $(OPENSSL_PATH)/crypto/aes/aes_cfb.c > + $(OPENSSL_PATH)/crypto/aes/aes_core.c > + $(OPENSSL_PATH)/crypto/aes/aes_ige.c > + $(OPENSSL_PATH)/crypto/aes/aes_misc.c > + $(OPENSSL_PATH)/crypto/aes/aes_ofb.c > + $(OPENSSL_PATH)/crypto/aes/aes_wrap.c > + $(OPENSSL_PATH)/crypto/aria/aria.c > + $(OPENSSL_PATH)/crypto/asn1/a_bitstr.c > + $(OPENSSL_PATH)/crypto/asn1/a_d2i_fp.c > + $(OPENSSL_PATH)/crypto/asn1/a_digest.c > + $(OPENSSL_PATH)/crypto/asn1/a_dup.c > + $(OPENSSL_PATH)/crypto/asn1/a_gentm.c > + $(OPENSSL_PATH)/crypto/asn1/a_i2d_fp.c > + $(OPENSSL_PATH)/crypto/asn1/a_int.c > + $(OPENSSL_PATH)/crypto/asn1/a_mbstr.c > + $(OPENSSL_PATH)/crypto/asn1/a_object.c > + $(OPENSSL_PATH)/crypto/asn1/a_octet.c > + $(OPENSSL_PATH)/crypto/asn1/a_print.c > + $(OPENSSL_PATH)/crypto/asn1/a_sign.c > + $(OPENSSL_PATH)/crypto/asn1/a_strex.c > + 
$(OPENSSL_PATH)/crypto/asn1/a_strnid.c > + $(OPENSSL_PATH)/crypto/asn1/a_time.c > + $(OPENSSL_PATH)/crypto/asn1/a_type.c > + $(OPENSSL_PATH)/crypto/asn1/a_utctm.c > + $(OPENSSL_PATH)/crypto/asn1/a_utf8.c > + $(OPENSSL_PATH)/crypto/asn1/a_verify.c > + $(OPENSSL_PATH)/crypto/asn1/ameth_lib.c > + $(OPENSSL_PATH)/crypto/asn1/asn1_err.c > + $(OPENSSL_PATH)/crypto/asn1/asn1_gen.c > + $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.c > + $(OPENSSL_PATH)/crypto/asn1/asn1_lib.c > + $(OPENSSL_PATH)/crypto/asn1/asn1_par.c > + $(OPENSSL_PATH)/crypto/asn1/asn_mime.c > + $(OPENSSL_PATH)/crypto/asn1/asn_moid.c > + $(OPENSSL_PATH)/crypto/asn1/asn_mstbl.c > + $(OPENSSL_PATH)/crypto/asn1/asn_pack.c > + $(OPENSSL_PATH)/crypto/asn1/bio_asn1.c > + $(OPENSSL_PATH)/crypto/asn1/bio_ndef.c > + $(OPENSSL_PATH)/crypto/asn1/d2i_pr.c > + $(OPENSSL_PATH)/crypto/asn1/d2i_pu.c > + $(OPENSSL_PATH)/crypto/asn1/evp_asn1.c > + $(OPENSSL_PATH)/crypto/asn1/f_int.c > + $(OPENSSL_PATH)/crypto/asn1/f_string.c > + $(OPENSSL_PATH)/crypto/asn1/i2d_pr.c > + $(OPENSSL_PATH)/crypto/asn1/i2d_pu.c > + $(OPENSSL_PATH)/crypto/asn1/n_pkey.c > + $(OPENSSL_PATH)/crypto/asn1/nsseq.c > + $(OPENSSL_PATH)/crypto/asn1/p5_pbe.c > + $(OPENSSL_PATH)/crypto/asn1/p5_pbev2.c > + $(OPENSSL_PATH)/crypto/asn1/p5_scrypt.c > + $(OPENSSL_PATH)/crypto/asn1/p8_pkey.c > + $(OPENSSL_PATH)/crypto/asn1/t_bitst.c > + $(OPENSSL_PATH)/crypto/asn1/t_pkey.c > + $(OPENSSL_PATH)/crypto/asn1/t_spki.c > + $(OPENSSL_PATH)/crypto/asn1/tasn_dec.c > + $(OPENSSL_PATH)/crypto/asn1/tasn_enc.c > + $(OPENSSL_PATH)/crypto/asn1/tasn_fre.c > + $(OPENSSL_PATH)/crypto/asn1/tasn_new.c > + $(OPENSSL_PATH)/crypto/asn1/tasn_prn.c > + $(OPENSSL_PATH)/crypto/asn1/tasn_scn.c > + $(OPENSSL_PATH)/crypto/asn1/tasn_typ.c > + $(OPENSSL_PATH)/crypto/asn1/tasn_utl.c > + $(OPENSSL_PATH)/crypto/asn1/x_algor.c > + $(OPENSSL_PATH)/crypto/asn1/x_bignum.c > + $(OPENSSL_PATH)/crypto/asn1/x_info.c > + $(OPENSSL_PATH)/crypto/asn1/x_int64.c > + $(OPENSSL_PATH)/crypto/asn1/x_long.c > + $(OPENSSL_PATH)/crypto/asn1/x_pkey.c > + $(OPENSSL_PATH)/crypto/asn1/x_sig.c > + $(OPENSSL_PATH)/crypto/asn1/x_spki.c > + $(OPENSSL_PATH)/crypto/asn1/x_val.c > + $(OPENSSL_PATH)/crypto/async/arch/async_null.c > + $(OPENSSL_PATH)/crypto/async/arch/async_posix.c > + $(OPENSSL_PATH)/crypto/async/arch/async_win.c > + $(OPENSSL_PATH)/crypto/async/async.c > + $(OPENSSL_PATH)/crypto/async/async_err.c > + $(OPENSSL_PATH)/crypto/async/async_wait.c > + $(OPENSSL_PATH)/crypto/bio/b_addr.c > + $(OPENSSL_PATH)/crypto/bio/b_dump.c > + $(OPENSSL_PATH)/crypto/bio/b_sock.c > + $(OPENSSL_PATH)/crypto/bio/b_sock2.c > + $(OPENSSL_PATH)/crypto/bio/bf_buff.c > + $(OPENSSL_PATH)/crypto/bio/bf_lbuf.c > + $(OPENSSL_PATH)/crypto/bio/bf_nbio.c > + $(OPENSSL_PATH)/crypto/bio/bf_null.c > + $(OPENSSL_PATH)/crypto/bio/bio_cb.c > + $(OPENSSL_PATH)/crypto/bio/bio_err.c > + $(OPENSSL_PATH)/crypto/bio/bio_lib.c > + $(OPENSSL_PATH)/crypto/bio/bio_meth.c > + $(OPENSSL_PATH)/crypto/bio/bss_acpt.c > + $(OPENSSL_PATH)/crypto/bio/bss_bio.c > + $(OPENSSL_PATH)/crypto/bio/bss_conn.c > + $(OPENSSL_PATH)/crypto/bio/bss_dgram.c > + $(OPENSSL_PATH)/crypto/bio/bss_fd.c > + $(OPENSSL_PATH)/crypto/bio/bss_file.c > + $(OPENSSL_PATH)/crypto/bio/bss_log.c > + $(OPENSSL_PATH)/crypto/bio/bss_mem.c > + $(OPENSSL_PATH)/crypto/bio/bss_null.c > + $(OPENSSL_PATH)/crypto/bio/bss_sock.c > + $(OPENSSL_PATH)/crypto/bn/bn_add.c > + $(OPENSSL_PATH)/crypto/bn/bn_asm.c > + $(OPENSSL_PATH)/crypto/bn/bn_blind.c > + $(OPENSSL_PATH)/crypto/bn/bn_const.c > + $(OPENSSL_PATH)/crypto/bn/bn_ctx.c > + 
$(OPENSSL_PATH)/crypto/bn/bn_depr.c > + $(OPENSSL_PATH)/crypto/bn/bn_dh.c > + $(OPENSSL_PATH)/crypto/bn/bn_div.c > + $(OPENSSL_PATH)/crypto/bn/bn_err.c > + $(OPENSSL_PATH)/crypto/bn/bn_exp.c > + $(OPENSSL_PATH)/crypto/bn/bn_exp2.c > + $(OPENSSL_PATH)/crypto/bn/bn_gcd.c > + $(OPENSSL_PATH)/crypto/bn/bn_gf2m.c > + $(OPENSSL_PATH)/crypto/bn/bn_intern.c > + $(OPENSSL_PATH)/crypto/bn/bn_kron.c > + $(OPENSSL_PATH)/crypto/bn/bn_lib.c > + $(OPENSSL_PATH)/crypto/bn/bn_mod.c > + $(OPENSSL_PATH)/crypto/bn/bn_mont.c > + $(OPENSSL_PATH)/crypto/bn/bn_mpi.c > + $(OPENSSL_PATH)/crypto/bn/bn_mul.c > + $(OPENSSL_PATH)/crypto/bn/bn_nist.c > + $(OPENSSL_PATH)/crypto/bn/bn_prime.c > + $(OPENSSL_PATH)/crypto/bn/bn_print.c > + $(OPENSSL_PATH)/crypto/bn/bn_rand.c > + $(OPENSSL_PATH)/crypto/bn/bn_recp.c > + $(OPENSSL_PATH)/crypto/bn/bn_shift.c > + $(OPENSSL_PATH)/crypto/bn/bn_sqr.c > + $(OPENSSL_PATH)/crypto/bn/bn_sqrt.c > + $(OPENSSL_PATH)/crypto/bn/bn_srp.c > + $(OPENSSL_PATH)/crypto/bn/bn_word.c > + $(OPENSSL_PATH)/crypto/bn/bn_x931p.c > + $(OPENSSL_PATH)/crypto/buffer/buf_err.c > + $(OPENSSL_PATH)/crypto/buffer/buffer.c > + $(OPENSSL_PATH)/crypto/cmac/cm_ameth.c > + $(OPENSSL_PATH)/crypto/cmac/cm_pmeth.c > + $(OPENSSL_PATH)/crypto/cmac/cmac.c > + $(OPENSSL_PATH)/crypto/comp/c_zlib.c > + $(OPENSSL_PATH)/crypto/comp/comp_err.c > + $(OPENSSL_PATH)/crypto/comp/comp_lib.c > + $(OPENSSL_PATH)/crypto/conf/conf_api.c > + $(OPENSSL_PATH)/crypto/conf/conf_def.c > + $(OPENSSL_PATH)/crypto/conf/conf_err.c > + $(OPENSSL_PATH)/crypto/conf/conf_lib.c > + $(OPENSSL_PATH)/crypto/conf/conf_mall.c > + $(OPENSSL_PATH)/crypto/conf/conf_mod.c > + $(OPENSSL_PATH)/crypto/conf/conf_sap.c > + $(OPENSSL_PATH)/crypto/conf/conf_ssl.c > + $(OPENSSL_PATH)/crypto/cpt_err.c > + $(OPENSSL_PATH)/crypto/cryptlib.c > + $(OPENSSL_PATH)/crypto/ctype.c > + $(OPENSSL_PATH)/crypto/cversion.c > + $(OPENSSL_PATH)/crypto/dh/dh_ameth.c > + $(OPENSSL_PATH)/crypto/dh/dh_asn1.c > + $(OPENSSL_PATH)/crypto/dh/dh_check.c > + $(OPENSSL_PATH)/crypto/dh/dh_depr.c > + $(OPENSSL_PATH)/crypto/dh/dh_err.c > + $(OPENSSL_PATH)/crypto/dh/dh_gen.c > + $(OPENSSL_PATH)/crypto/dh/dh_kdf.c > + $(OPENSSL_PATH)/crypto/dh/dh_key.c > + $(OPENSSL_PATH)/crypto/dh/dh_lib.c > + $(OPENSSL_PATH)/crypto/dh/dh_meth.c > + $(OPENSSL_PATH)/crypto/dh/dh_pmeth.c > + $(OPENSSL_PATH)/crypto/dh/dh_prn.c > + $(OPENSSL_PATH)/crypto/dh/dh_rfc5114.c > + $(OPENSSL_PATH)/crypto/dh/dh_rfc7919.c > + $(OPENSSL_PATH)/crypto/dso/dso_dl.c > + $(OPENSSL_PATH)/crypto/dso/dso_dlfcn.c > + $(OPENSSL_PATH)/crypto/dso/dso_err.c > + $(OPENSSL_PATH)/crypto/dso/dso_lib.c > + $(OPENSSL_PATH)/crypto/dso/dso_openssl.c > + $(OPENSSL_PATH)/crypto/dso/dso_vms.c > + $(OPENSSL_PATH)/crypto/dso/dso_win32.c > + $(OPENSSL_PATH)/crypto/ebcdic.c > + $(OPENSSL_PATH)/crypto/err/err.c > + $(OPENSSL_PATH)/crypto/err/err_prn.c > + $(OPENSSL_PATH)/crypto/evp/bio_b64.c > + $(OPENSSL_PATH)/crypto/evp/bio_enc.c > + $(OPENSSL_PATH)/crypto/evp/bio_md.c > + $(OPENSSL_PATH)/crypto/evp/bio_ok.c > + $(OPENSSL_PATH)/crypto/evp/c_allc.c > + $(OPENSSL_PATH)/crypto/evp/c_alld.c > + $(OPENSSL_PATH)/crypto/evp/cmeth_lib.c > + $(OPENSSL_PATH)/crypto/evp/digest.c > + $(OPENSSL_PATH)/crypto/evp/e_aes.c > + $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha1.c > + $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha256.c > + $(OPENSSL_PATH)/crypto/evp/e_aria.c > + $(OPENSSL_PATH)/crypto/evp/e_bf.c > + $(OPENSSL_PATH)/crypto/evp/e_camellia.c > + $(OPENSSL_PATH)/crypto/evp/e_cast.c > + $(OPENSSL_PATH)/crypto/evp/e_chacha20_poly1305.c > + 
$(OPENSSL_PATH)/crypto/evp/e_des.c > + $(OPENSSL_PATH)/crypto/evp/e_des3.c > + $(OPENSSL_PATH)/crypto/evp/e_idea.c > + $(OPENSSL_PATH)/crypto/evp/e_null.c > + $(OPENSSL_PATH)/crypto/evp/e_old.c > + $(OPENSSL_PATH)/crypto/evp/e_rc2.c > + $(OPENSSL_PATH)/crypto/evp/e_rc4.c > + $(OPENSSL_PATH)/crypto/evp/e_rc4_hmac_md5.c > + $(OPENSSL_PATH)/crypto/evp/e_rc5.c > + $(OPENSSL_PATH)/crypto/evp/e_seed.c > + $(OPENSSL_PATH)/crypto/evp/e_sm4.c > + $(OPENSSL_PATH)/crypto/evp/e_xcbc_d.c > + $(OPENSSL_PATH)/crypto/evp/encode.c > + $(OPENSSL_PATH)/crypto/evp/evp_cnf.c > + $(OPENSSL_PATH)/crypto/evp/evp_enc.c > + $(OPENSSL_PATH)/crypto/evp/evp_err.c > + $(OPENSSL_PATH)/crypto/evp/evp_key.c > + $(OPENSSL_PATH)/crypto/evp/evp_lib.c > + $(OPENSSL_PATH)/crypto/evp/evp_pbe.c > + $(OPENSSL_PATH)/crypto/evp/evp_pkey.c > + $(OPENSSL_PATH)/crypto/evp/m_md2.c > + $(OPENSSL_PATH)/crypto/evp/m_md4.c > + $(OPENSSL_PATH)/crypto/evp/m_md5.c > + $(OPENSSL_PATH)/crypto/evp/m_md5_sha1.c > + $(OPENSSL_PATH)/crypto/evp/m_mdc2.c > + $(OPENSSL_PATH)/crypto/evp/m_null.c > + $(OPENSSL_PATH)/crypto/evp/m_ripemd.c > + $(OPENSSL_PATH)/crypto/evp/m_sha1.c > + $(OPENSSL_PATH)/crypto/evp/m_sha3.c > + $(OPENSSL_PATH)/crypto/evp/m_sigver.c > + $(OPENSSL_PATH)/crypto/evp/m_wp.c > + $(OPENSSL_PATH)/crypto/evp/names.c > + $(OPENSSL_PATH)/crypto/evp/p5_crpt.c > + $(OPENSSL_PATH)/crypto/evp/p5_crpt2.c > + $(OPENSSL_PATH)/crypto/evp/p_dec.c > + $(OPENSSL_PATH)/crypto/evp/p_enc.c > + $(OPENSSL_PATH)/crypto/evp/p_lib.c > + $(OPENSSL_PATH)/crypto/evp/p_open.c > + $(OPENSSL_PATH)/crypto/evp/p_seal.c > + $(OPENSSL_PATH)/crypto/evp/p_sign.c > + $(OPENSSL_PATH)/crypto/evp/p_verify.c > + $(OPENSSL_PATH)/crypto/evp/pbe_scrypt.c > + $(OPENSSL_PATH)/crypto/evp/pmeth_fn.c > + $(OPENSSL_PATH)/crypto/evp/pmeth_gn.c > + $(OPENSSL_PATH)/crypto/evp/pmeth_lib.c > + $(OPENSSL_PATH)/crypto/ex_data.c > + $(OPENSSL_PATH)/crypto/getenv.c > + $(OPENSSL_PATH)/crypto/hmac/hm_ameth.c > + $(OPENSSL_PATH)/crypto/hmac/hm_pmeth.c > + $(OPENSSL_PATH)/crypto/hmac/hmac.c > + $(OPENSSL_PATH)/crypto/init.c > + $(OPENSSL_PATH)/crypto/kdf/hkdf.c > + $(OPENSSL_PATH)/crypto/kdf/kdf_err.c > + $(OPENSSL_PATH)/crypto/kdf/scrypt.c > + $(OPENSSL_PATH)/crypto/kdf/tls1_prf.c > + $(OPENSSL_PATH)/crypto/lhash/lh_stats.c > + $(OPENSSL_PATH)/crypto/lhash/lhash.c > + $(OPENSSL_PATH)/crypto/md5/md5_dgst.c > + $(OPENSSL_PATH)/crypto/md5/md5_one.c > + $(OPENSSL_PATH)/crypto/mem.c > + $(OPENSSL_PATH)/crypto/mem_dbg.c > + $(OPENSSL_PATH)/crypto/mem_sec.c > + $(OPENSSL_PATH)/crypto/modes/cbc128.c > + $(OPENSSL_PATH)/crypto/modes/ccm128.c > + $(OPENSSL_PATH)/crypto/modes/cfb128.c > + $(OPENSSL_PATH)/crypto/modes/ctr128.c > + $(OPENSSL_PATH)/crypto/modes/cts128.c > + $(OPENSSL_PATH)/crypto/modes/gcm128.c > + $(OPENSSL_PATH)/crypto/modes/ocb128.c > + $(OPENSSL_PATH)/crypto/modes/ofb128.c > + $(OPENSSL_PATH)/crypto/modes/wrap128.c > + $(OPENSSL_PATH)/crypto/modes/xts128.c > + $(OPENSSL_PATH)/crypto/o_dir.c > + $(OPENSSL_PATH)/crypto/o_fips.c > + $(OPENSSL_PATH)/crypto/o_fopen.c > + $(OPENSSL_PATH)/crypto/o_init.c > + $(OPENSSL_PATH)/crypto/o_str.c > + $(OPENSSL_PATH)/crypto/o_time.c > + $(OPENSSL_PATH)/crypto/objects/o_names.c > + $(OPENSSL_PATH)/crypto/objects/obj_dat.c > + $(OPENSSL_PATH)/crypto/objects/obj_err.c > + $(OPENSSL_PATH)/crypto/objects/obj_lib.c > + $(OPENSSL_PATH)/crypto/objects/obj_xref.c > + $(OPENSSL_PATH)/crypto/ocsp/ocsp_asn.c > + $(OPENSSL_PATH)/crypto/ocsp/ocsp_cl.c > + $(OPENSSL_PATH)/crypto/ocsp/ocsp_err.c > + $(OPENSSL_PATH)/crypto/ocsp/ocsp_ext.c > + 
$(OPENSSL_PATH)/crypto/ocsp/ocsp_ht.c > + $(OPENSSL_PATH)/crypto/ocsp/ocsp_lib.c > + $(OPENSSL_PATH)/crypto/ocsp/ocsp_prn.c > + $(OPENSSL_PATH)/crypto/ocsp/ocsp_srv.c > + $(OPENSSL_PATH)/crypto/ocsp/ocsp_vfy.c > + $(OPENSSL_PATH)/crypto/ocsp/v3_ocsp.c > + $(OPENSSL_PATH)/crypto/pem/pem_all.c > + $(OPENSSL_PATH)/crypto/pem/pem_err.c > + $(OPENSSL_PATH)/crypto/pem/pem_info.c > + $(OPENSSL_PATH)/crypto/pem/pem_lib.c > + $(OPENSSL_PATH)/crypto/pem/pem_oth.c > + $(OPENSSL_PATH)/crypto/pem/pem_pk8.c > + $(OPENSSL_PATH)/crypto/pem/pem_pkey.c > + $(OPENSSL_PATH)/crypto/pem/pem_sign.c > + $(OPENSSL_PATH)/crypto/pem/pem_x509.c > + $(OPENSSL_PATH)/crypto/pem/pem_xaux.c > + $(OPENSSL_PATH)/crypto/pem/pvkfmt.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_add.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_asn.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_attr.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_crpt.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_crt.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_decr.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_init.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_key.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_kiss.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_mutl.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_npas.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_p8d.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_p8e.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_sbag.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_utl.c > + $(OPENSSL_PATH)/crypto/pkcs12/pk12err.c > + $(OPENSSL_PATH)/crypto/pkcs7/bio_pk7.c > + $(OPENSSL_PATH)/crypto/pkcs7/pk7_asn1.c > + $(OPENSSL_PATH)/crypto/pkcs7/pk7_attr.c > + $(OPENSSL_PATH)/crypto/pkcs7/pk7_doit.c > + $(OPENSSL_PATH)/crypto/pkcs7/pk7_lib.c > + $(OPENSSL_PATH)/crypto/pkcs7/pk7_mime.c > + $(OPENSSL_PATH)/crypto/pkcs7/pk7_smime.c > + $(OPENSSL_PATH)/crypto/pkcs7/pkcs7err.c > + $(OPENSSL_PATH)/crypto/rand/drbg_ctr.c > + $(OPENSSL_PATH)/crypto/rand/drbg_lib.c > + $(OPENSSL_PATH)/crypto/rand/rand_egd.c > + $(OPENSSL_PATH)/crypto/rand/rand_err.c > + $(OPENSSL_PATH)/crypto/rand/rand_lib.c > + $(OPENSSL_PATH)/crypto/rand/rand_unix.c > + $(OPENSSL_PATH)/crypto/rand/rand_vms.c > + $(OPENSSL_PATH)/crypto/rand/rand_win.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_ameth.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_asn1.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_chk.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_crpt.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_depr.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_err.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_gen.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_lib.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_meth.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_mp.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_none.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_oaep.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_ossl.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_pk1.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_pmeth.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_prn.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_pss.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_saos.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_sign.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_ssl.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_x931.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_x931g.c > + $(OPENSSL_PATH)/crypto/sha/keccak1600.c > + $(OPENSSL_PATH)/crypto/sha/sha1_one.c > + $(OPENSSL_PATH)/crypto/sha/sha1dgst.c > + $(OPENSSL_PATH)/crypto/sha/sha256.c > + $(OPENSSL_PATH)/crypto/sha/sha512.c > + $(OPENSSL_PATH)/crypto/siphash/siphash.c > + $(OPENSSL_PATH)/crypto/siphash/siphash_ameth.c > + $(OPENSSL_PATH)/crypto/siphash/siphash_pmeth.c > + $(OPENSSL_PATH)/crypto/sm3/m_sm3.c > + $(OPENSSL_PATH)/crypto/sm3/sm3.c > + $(OPENSSL_PATH)/crypto/sm4/sm4.c > + $(OPENSSL_PATH)/crypto/stack/stack.c > + 
$(OPENSSL_PATH)/crypto/threads_none.c > + $(OPENSSL_PATH)/crypto/threads_pthread.c > + $(OPENSSL_PATH)/crypto/threads_win.c > + $(OPENSSL_PATH)/crypto/txt_db/txt_db.c > + $(OPENSSL_PATH)/crypto/ui/ui_err.c > + $(OPENSSL_PATH)/crypto/ui/ui_lib.c > + $(OPENSSL_PATH)/crypto/ui/ui_null.c > + $(OPENSSL_PATH)/crypto/ui/ui_openssl.c > + $(OPENSSL_PATH)/crypto/ui/ui_util.c > + $(OPENSSL_PATH)/crypto/uid.c > + $(OPENSSL_PATH)/crypto/x509/by_dir.c > + $(OPENSSL_PATH)/crypto/x509/by_file.c > + $(OPENSSL_PATH)/crypto/x509/t_crl.c > + $(OPENSSL_PATH)/crypto/x509/t_req.c > + $(OPENSSL_PATH)/crypto/x509/t_x509.c > + $(OPENSSL_PATH)/crypto/x509/x509_att.c > + $(OPENSSL_PATH)/crypto/x509/x509_cmp.c > + $(OPENSSL_PATH)/crypto/x509/x509_d2.c > + $(OPENSSL_PATH)/crypto/x509/x509_def.c > + $(OPENSSL_PATH)/crypto/x509/x509_err.c > + $(OPENSSL_PATH)/crypto/x509/x509_ext.c > + $(OPENSSL_PATH)/crypto/x509/x509_lu.c > + $(OPENSSL_PATH)/crypto/x509/x509_meth.c > + $(OPENSSL_PATH)/crypto/x509/x509_obj.c > + $(OPENSSL_PATH)/crypto/x509/x509_r2x.c > + $(OPENSSL_PATH)/crypto/x509/x509_req.c > + $(OPENSSL_PATH)/crypto/x509/x509_set.c > + $(OPENSSL_PATH)/crypto/x509/x509_trs.c > + $(OPENSSL_PATH)/crypto/x509/x509_txt.c > + $(OPENSSL_PATH)/crypto/x509/x509_v3.c > + $(OPENSSL_PATH)/crypto/x509/x509_vfy.c > + $(OPENSSL_PATH)/crypto/x509/x509_vpm.c > + $(OPENSSL_PATH)/crypto/x509/x509cset.c > + $(OPENSSL_PATH)/crypto/x509/x509name.c > + $(OPENSSL_PATH)/crypto/x509/x509rset.c > + $(OPENSSL_PATH)/crypto/x509/x509spki.c > + $(OPENSSL_PATH)/crypto/x509/x509type.c > + $(OPENSSL_PATH)/crypto/x509/x_all.c > + $(OPENSSL_PATH)/crypto/x509/x_attrib.c > + $(OPENSSL_PATH)/crypto/x509/x_crl.c > + $(OPENSSL_PATH)/crypto/x509/x_exten.c > + $(OPENSSL_PATH)/crypto/x509/x_name.c > + $(OPENSSL_PATH)/crypto/x509/x_pubkey.c > + $(OPENSSL_PATH)/crypto/x509/x_req.c > + $(OPENSSL_PATH)/crypto/x509/x_x509.c > + $(OPENSSL_PATH)/crypto/x509/x_x509a.c > + $(OPENSSL_PATH)/crypto/x509v3/pcy_cache.c > + $(OPENSSL_PATH)/crypto/x509v3/pcy_data.c > + $(OPENSSL_PATH)/crypto/x509v3/pcy_lib.c > + $(OPENSSL_PATH)/crypto/x509v3/pcy_map.c > + $(OPENSSL_PATH)/crypto/x509v3/pcy_node.c > + $(OPENSSL_PATH)/crypto/x509v3/pcy_tree.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_addr.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_admis.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_akey.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_akeya.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_alt.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_asid.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_bcons.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_bitst.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_conf.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_cpols.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_crld.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_enum.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_extku.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_genn.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_ia5.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_info.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_int.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_lib.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_ncons.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_pci.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_pcia.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_pcons.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_pku.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_pmaps.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_prn.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_purp.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_skey.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_sxnet.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_tlsf.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_utl.c > + $(OPENSSL_PATH)/crypto/x509v3/v3err.c 
> + $(OPENSSL_PATH)/crypto/arm_arch.h > + $(OPENSSL_PATH)/crypto/mips_arch.h > + $(OPENSSL_PATH)/crypto/ppc_arch.h > + $(OPENSSL_PATH)/crypto/s390x_arch.h > + $(OPENSSL_PATH)/crypto/sparc_arch.h > + $(OPENSSL_PATH)/crypto/vms_rms.h > + $(OPENSSL_PATH)/crypto/aes/aes_local.h > + $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.h > + $(OPENSSL_PATH)/crypto/asn1/asn1_local.h > + $(OPENSSL_PATH)/crypto/asn1/charmap.h > + $(OPENSSL_PATH)/crypto/asn1/standard_methods.h > + $(OPENSSL_PATH)/crypto/asn1/tbl_standard.h > + $(OPENSSL_PATH)/crypto/async/async_local.h > + $(OPENSSL_PATH)/crypto/async/arch/async_null.h > + $(OPENSSL_PATH)/crypto/async/arch/async_posix.h > + $(OPENSSL_PATH)/crypto/async/arch/async_win.h > + $(OPENSSL_PATH)/crypto/bio/bio_local.h > + $(OPENSSL_PATH)/crypto/bn/bn_local.h > + $(OPENSSL_PATH)/crypto/bn/bn_prime.h > + $(OPENSSL_PATH)/crypto/bn/rsaz_exp.h > + $(OPENSSL_PATH)/crypto/comp/comp_local.h > + $(OPENSSL_PATH)/crypto/conf/conf_def.h > + $(OPENSSL_PATH)/crypto/conf/conf_local.h > + $(OPENSSL_PATH)/crypto/dh/dh_local.h > + $(OPENSSL_PATH)/crypto/dso/dso_local.h > + $(OPENSSL_PATH)/crypto/evp/evp_local.h > + $(OPENSSL_PATH)/crypto/hmac/hmac_local.h > + $(OPENSSL_PATH)/crypto/lhash/lhash_local.h > + $(OPENSSL_PATH)/crypto/md5/md5_local.h > + $(OPENSSL_PATH)/crypto/modes/modes_local.h > + $(OPENSSL_PATH)/crypto/objects/obj_dat.h > + $(OPENSSL_PATH)/crypto/objects/obj_local.h > + $(OPENSSL_PATH)/crypto/objects/obj_xref.h > + $(OPENSSL_PATH)/crypto/ocsp/ocsp_local.h > + $(OPENSSL_PATH)/crypto/pkcs12/p12_local.h > + $(OPENSSL_PATH)/crypto/rand/rand_local.h > + $(OPENSSL_PATH)/crypto/rsa/rsa_local.h > + $(OPENSSL_PATH)/crypto/sha/sha_local.h > + $(OPENSSL_PATH)/crypto/siphash/siphash_local.h > + $(OPENSSL_PATH)/crypto/sm3/sm3_local.h > + $(OPENSSL_PATH)/crypto/store/store_local.h > + $(OPENSSL_PATH)/crypto/ui/ui_local.h > + $(OPENSSL_PATH)/crypto/x509/x509_local.h > + $(OPENSSL_PATH)/crypto/x509v3/ext_dat.h > + $(OPENSSL_PATH)/crypto/x509v3/pcy_local.h > + $(OPENSSL_PATH)/crypto/x509v3/standard_exts.h > + $(OPENSSL_PATH)/crypto/x509v3/v3_admis.h > + $(OPENSSL_PATH)/ssl/bio_ssl.c > + $(OPENSSL_PATH)/ssl/d1_lib.c > + $(OPENSSL_PATH)/ssl/d1_msg.c > + $(OPENSSL_PATH)/ssl/d1_srtp.c > + $(OPENSSL_PATH)/ssl/methods.c > + $(OPENSSL_PATH)/ssl/packet.c > + $(OPENSSL_PATH)/ssl/pqueue.c > + $(OPENSSL_PATH)/ssl/record/dtls1_bitmap.c > + $(OPENSSL_PATH)/ssl/record/rec_layer_d1.c > + $(OPENSSL_PATH)/ssl/record/rec_layer_s3.c > + $(OPENSSL_PATH)/ssl/record/ssl3_buffer.c > + $(OPENSSL_PATH)/ssl/record/ssl3_record.c > + $(OPENSSL_PATH)/ssl/record/ssl3_record_tls13.c > + $(OPENSSL_PATH)/ssl/s3_cbc.c > + $(OPENSSL_PATH)/ssl/s3_enc.c > + $(OPENSSL_PATH)/ssl/s3_lib.c > + $(OPENSSL_PATH)/ssl/s3_msg.c > + $(OPENSSL_PATH)/ssl/ssl_asn1.c > + $(OPENSSL_PATH)/ssl/ssl_cert.c > + $(OPENSSL_PATH)/ssl/ssl_ciph.c > + $(OPENSSL_PATH)/ssl/ssl_conf.c > + $(OPENSSL_PATH)/ssl/ssl_err.c > + $(OPENSSL_PATH)/ssl/ssl_init.c > + $(OPENSSL_PATH)/ssl/ssl_lib.c > + $(OPENSSL_PATH)/ssl/ssl_mcnf.c > + $(OPENSSL_PATH)/ssl/ssl_rsa.c > + $(OPENSSL_PATH)/ssl/ssl_sess.c > + $(OPENSSL_PATH)/ssl/ssl_stat.c > + $(OPENSSL_PATH)/ssl/ssl_txt.c > + $(OPENSSL_PATH)/ssl/ssl_utst.c > + $(OPENSSL_PATH)/ssl/statem/extensions.c > + $(OPENSSL_PATH)/ssl/statem/extensions_clnt.c > + $(OPENSSL_PATH)/ssl/statem/extensions_cust.c > + $(OPENSSL_PATH)/ssl/statem/extensions_srvr.c > + $(OPENSSL_PATH)/ssl/statem/statem.c > + $(OPENSSL_PATH)/ssl/statem/statem_clnt.c > + $(OPENSSL_PATH)/ssl/statem/statem_dtls.c > + $(OPENSSL_PATH)/ssl/statem/statem_lib.c > + 
$(OPENSSL_PATH)/ssl/statem/statem_srvr.c > + $(OPENSSL_PATH)/ssl/t1_enc.c > + $(OPENSSL_PATH)/ssl/t1_lib.c > + $(OPENSSL_PATH)/ssl/t1_trce.c > + $(OPENSSL_PATH)/ssl/tls13_enc.c > + $(OPENSSL_PATH)/ssl/tls_srp.c > + $(OPENSSL_PATH)/ssl/packet_local.h > + $(OPENSSL_PATH)/ssl/ssl_cert_table.h > + $(OPENSSL_PATH)/ssl/ssl_local.h > + $(OPENSSL_PATH)/ssl/record/record.h > + $(OPENSSL_PATH)/ssl/record/record_local.h > + $(OPENSSL_PATH)/ssl/statem/statem.h > + $(OPENSSL_PATH)/ssl/statem/statem_local.h > +# Autogenerated files list ends here > + buildinf.h > + ossl_store.c > + rand_pool.c > + X64/ApiHooks.c > + > +[Packages] > + MdePkg/MdePkg.dec > + CryptoPkg/CryptoPkg.dec > + > +[LibraryClasses] > + BaseLib > + DebugLib > + RngLib > + PrintLib > + > +[BuildOptions] > + # > + # Disables the following Visual Studio compiler warnings brought by openssl > source, > + # so we do not break the build with /WX option: > + # C4090: 'function' : different 'const' qualifiers > + # C4132: 'object' : const object should be initialized (tls13_enc.c) > + # C4210: nonstandard extension used: function given file scope > + # C4244: conversion from type1 to type2, possible loss of data > + # C4245: conversion from type1 to type2, signed/unsigned mismatch > + # C4267: conversion from size_t to type, possible loss of data > + # C4306: 'identifier' : conversion from 'type1' to 'type2' of greater size > + # C4310: cast truncates constant value > + # C4389: 'operator' : signed/unsigned mismatch (xxxx) > + # C4700: uninitialized local variable 'name' used. (conf_sap.c(71)) > + # C4702: unreachable code > + # C4706: assignment within conditional expression > + # C4819: The file contains a character that cannot be represented in the > current code page > + # > + MSFT:*_*_X64_CC_FLAGS = -U_WIN32 -U_WIN64 -U_MSC_VER > $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /wd4090 /wd4132 /wd4210 > /wd4244 /wd4245 /wd4267 /wd4306 /wd4310 /wd4700 /wd4389 /wd4702 > /wd4706 /wd4819 > + > + INTEL:*_*_X64_CC_FLAGS = -U_WIN32 -U_WIN64 -U_MSC_VER -U__ICC > $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /w > + > + # > + # Suppress the following build warnings in openssl so we don't break the build > with -Werror > + # -Werror=maybe-uninitialized: there exist some other paths for which the > variable is not initialized. > + # -Werror=format: Check calls to printf and scanf, etc., to make sure that the > arguments supplied have > + # types appropriate to the format string specified. > + # -Werror=unused-but-set-variable: Warn whenever a local variable is > assigned to, but otherwise unused (aside from its declaration). 
> + # > + GCC:*_*_X64_CC_FLAGS = -UWIN32 -U_WIN32 -U_WIN64 > $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) -Wno-error=maybe-uninitialized > -Wno-error=format -Wno-format -Wno-error=unused-but-set-variable - > DNO_MSABI_VA_FUNCS > + > + # suppress the following warnings in openssl so we don't break the build with > warnings-as-errors: > + # 1295: Deprecated declaration <entity> - give arg types > + # 550: <entity> was set but never used > + # 1293: assignment in condition > + # 111: statement is unreachable (invariably "break;" after "return X;" in case > statement) > + # 68: integer conversion resulted in a change of sign ("if (Status == -1)") > + # 177: <entity> was declared but never referenced > + # 223: function <entity> declared implicitly > + # 144: a value of type <type> cannot be used to initialize an entity of type > <type> > + # 513: a value of type <type> cannot be assigned to an entity of type <type> > + # 188: enumerated type mixed with another type (i.e. passing an integer as an > enum without a cast) > + # 1296: Extended constant initialiser used > + # 128: loop is not reachable - may be emitted inappropriately if code follows > a conditional return > + # from the function that evaluates to true at compile time > + # 546: transfer of control bypasses initialization - may be emitted > inappropriately if the uninitialized > + # variable is never referenced after the jump > + # 1: ignore "#1-D: last line of file ends without a newline" > + # 3017: <entity> may be used before being set (NOTE: This was fixed in > OpenSSL 1.1 HEAD with > + # commit d9b8b89bec4480de3a10bdaf9425db371c19145b, and can be > dropped then.) > + XCODE:*_*_X64_CC_FLAGS = -mmmx -msse -U_WIN32 -U_WIN64 > $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) -w -std=c99 -Wno- > error=uninitialized > diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibX64Gcc.inf > b/CryptoPkg/Library/OpensslLib/OpensslLibX64Gcc.inf > new file mode 100644 > index 0000000000..4ffdd8cd06 > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/OpensslLibX64Gcc.inf > @@ -0,0 +1,653 @@ > +## @file > +# This module provides OpenSSL Library implementation. > +# > +# Copyright (c) 2010 - 2020, Intel Corporation. 
All rights reserved.<BR> > +# (C) Copyright 2020 Hewlett Packard Enterprise Development LP<BR> > +# SPDX-License-Identifier: BSD-2-Clause-Patent > +# > +## > + > +[Defines] > + INF_VERSION = 0x00010005 > + BASE_NAME = OpensslLibX64Gcc > + MODULE_UNI_FILE = OpensslLib.uni > + FILE_GUID = DD90DB9D-6A3F-4F2B-87BF-A8F2BBEF982F > + MODULE_TYPE = BASE > + VERSION_STRING = 1.0 > + LIBRARY_CLASS = OpensslLib > + DEFINE OPENSSL_PATH = openssl > + DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT > -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE > + DEFINE OPENSSL_FLAGS_CONFIG = -DOPENSSL_CPUID_OBJ -DSHA1_ASM - > DSHA256_ASM -DSHA512_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM > + CONSTRUCTOR = OpensslLibConstructor > + > +# > +# VALID_ARCHITECTURES = X64 > +# > + > +[Sources.X64] > + OpensslLibConstructor.c > + $(OPENSSL_PATH)/e_os.h > + $(OPENSSL_PATH)/ms/uplink.h > +# Autogenerated files list starts here > + X64Gcc/crypto/aes/aesni-mb-x86_64.S > + X64Gcc/crypto/aes/aesni-sha1-x86_64.S > + X64Gcc/crypto/aes/aesni-sha256-x86_64.S > + X64Gcc/crypto/aes/aesni-x86_64.S > + X64Gcc/crypto/aes/vpaes-x86_64.S > + X64Gcc/crypto/modes/aesni-gcm-x86_64.S > + X64Gcc/crypto/modes/ghash-x86_64.S > + X64Gcc/crypto/sha/sha1-mb-x86_64.S > + X64Gcc/crypto/sha/sha1-x86_64.S > + X64Gcc/crypto/sha/sha256-mb-x86_64.S > + X64Gcc/crypto/sha/sha256-x86_64.S > + X64Gcc/crypto/sha/sha512-x86_64.S > + X64Gcc/crypto/x86_64cpuid.S > + $(OPENSSL_PATH)/crypto/aes/aes_cbc.c > + $(OPENSSL_PATH)/crypto/aes/aes_cfb.c > + $(OPENSSL_PATH)/crypto/aes/aes_core.c > + $(OPENSSL_PATH)/crypto/aes/aes_ige.c > + $(OPENSSL_PATH)/crypto/aes/aes_misc.c > + $(OPENSSL_PATH)/crypto/aes/aes_ofb.c > + $(OPENSSL_PATH)/crypto/aes/aes_wrap.c > + $(OPENSSL_PATH)/crypto/aria/aria.c > + $(OPENSSL_PATH)/crypto/asn1/a_bitstr.c > + $(OPENSSL_PATH)/crypto/asn1/a_d2i_fp.c > + $(OPENSSL_PATH)/crypto/asn1/a_digest.c > + $(OPENSSL_PATH)/crypto/asn1/a_dup.c > + $(OPENSSL_PATH)/crypto/asn1/a_gentm.c > + $(OPENSSL_PATH)/crypto/asn1/a_i2d_fp.c > + $(OPENSSL_PATH)/crypto/asn1/a_int.c > + $(OPENSSL_PATH)/crypto/asn1/a_mbstr.c > + $(OPENSSL_PATH)/crypto/asn1/a_object.c > + $(OPENSSL_PATH)/crypto/asn1/a_octet.c > + $(OPENSSL_PATH)/crypto/asn1/a_print.c > + $(OPENSSL_PATH)/crypto/asn1/a_sign.c > + $(OPENSSL_PATH)/crypto/asn1/a_strex.c > + $(OPENSSL_PATH)/crypto/asn1/a_strnid.c > + $(OPENSSL_PATH)/crypto/asn1/a_time.c > + $(OPENSSL_PATH)/crypto/asn1/a_type.c > + $(OPENSSL_PATH)/crypto/asn1/a_utctm.c > + $(OPENSSL_PATH)/crypto/asn1/a_utf8.c > + $(OPENSSL_PATH)/crypto/asn1/a_verify.c > + $(OPENSSL_PATH)/crypto/asn1/ameth_lib.c > + $(OPENSSL_PATH)/crypto/asn1/asn1_err.c > + $(OPENSSL_PATH)/crypto/asn1/asn1_gen.c > + $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.c > + $(OPENSSL_PATH)/crypto/asn1/asn1_lib.c > + $(OPENSSL_PATH)/crypto/asn1/asn1_par.c > + $(OPENSSL_PATH)/crypto/asn1/asn_mime.c > + $(OPENSSL_PATH)/crypto/asn1/asn_moid.c > + $(OPENSSL_PATH)/crypto/asn1/asn_mstbl.c > + $(OPENSSL_PATH)/crypto/asn1/asn_pack.c > + $(OPENSSL_PATH)/crypto/asn1/bio_asn1.c > + $(OPENSSL_PATH)/crypto/asn1/bio_ndef.c > + $(OPENSSL_PATH)/crypto/asn1/d2i_pr.c > + $(OPENSSL_PATH)/crypto/asn1/d2i_pu.c > + $(OPENSSL_PATH)/crypto/asn1/evp_asn1.c > + $(OPENSSL_PATH)/crypto/asn1/f_int.c > + $(OPENSSL_PATH)/crypto/asn1/f_string.c > + $(OPENSSL_PATH)/crypto/asn1/i2d_pr.c > + $(OPENSSL_PATH)/crypto/asn1/i2d_pu.c > + $(OPENSSL_PATH)/crypto/asn1/n_pkey.c > + $(OPENSSL_PATH)/crypto/asn1/nsseq.c > + $(OPENSSL_PATH)/crypto/asn1/p5_pbe.c > + $(OPENSSL_PATH)/crypto/asn1/p5_pbev2.c > + 
$(OPENSSL_PATH)/crypto/asn1/p5_scrypt.c > + $(OPENSSL_PATH)/crypto/asn1/p8_pkey.c > + $(OPENSSL_PATH)/crypto/asn1/t_bitst.c > + $(OPENSSL_PATH)/crypto/asn1/t_pkey.c > + $(OPENSSL_PATH)/crypto/asn1/t_spki.c > + $(OPENSSL_PATH)/crypto/asn1/tasn_dec.c > + $(OPENSSL_PATH)/crypto/asn1/tasn_enc.c > + $(OPENSSL_PATH)/crypto/asn1/tasn_fre.c > + $(OPENSSL_PATH)/crypto/asn1/tasn_new.c > + $(OPENSSL_PATH)/crypto/asn1/tasn_prn.c > + $(OPENSSL_PATH)/crypto/asn1/tasn_scn.c > + $(OPENSSL_PATH)/crypto/asn1/tasn_typ.c > + $(OPENSSL_PATH)/crypto/asn1/tasn_utl.c > + $(OPENSSL_PATH)/crypto/asn1/x_algor.c > + $(OPENSSL_PATH)/crypto/asn1/x_bignum.c > + $(OPENSSL_PATH)/crypto/asn1/x_info.c > + $(OPENSSL_PATH)/crypto/asn1/x_int64.c > + $(OPENSSL_PATH)/crypto/asn1/x_long.c > + $(OPENSSL_PATH)/crypto/asn1/x_pkey.c > + $(OPENSSL_PATH)/crypto/asn1/x_sig.c > + $(OPENSSL_PATH)/crypto/asn1/x_spki.c > + $(OPENSSL_PATH)/crypto/asn1/x_val.c > + $(OPENSSL_PATH)/crypto/async/arch/async_null.c > + $(OPENSSL_PATH)/crypto/async/arch/async_posix.c > + $(OPENSSL_PATH)/crypto/async/arch/async_win.c > + $(OPENSSL_PATH)/crypto/async/async.c > + $(OPENSSL_PATH)/crypto/async/async_err.c > + $(OPENSSL_PATH)/crypto/async/async_wait.c > + $(OPENSSL_PATH)/crypto/bio/b_addr.c > + $(OPENSSL_PATH)/crypto/bio/b_dump.c > + $(OPENSSL_PATH)/crypto/bio/b_sock.c > + $(OPENSSL_PATH)/crypto/bio/b_sock2.c > + $(OPENSSL_PATH)/crypto/bio/bf_buff.c > + $(OPENSSL_PATH)/crypto/bio/bf_lbuf.c > + $(OPENSSL_PATH)/crypto/bio/bf_nbio.c > + $(OPENSSL_PATH)/crypto/bio/bf_null.c > + $(OPENSSL_PATH)/crypto/bio/bio_cb.c > + $(OPENSSL_PATH)/crypto/bio/bio_err.c > + $(OPENSSL_PATH)/crypto/bio/bio_lib.c > + $(OPENSSL_PATH)/crypto/bio/bio_meth.c > + $(OPENSSL_PATH)/crypto/bio/bss_acpt.c > + $(OPENSSL_PATH)/crypto/bio/bss_bio.c > + $(OPENSSL_PATH)/crypto/bio/bss_conn.c > + $(OPENSSL_PATH)/crypto/bio/bss_dgram.c > + $(OPENSSL_PATH)/crypto/bio/bss_fd.c > + $(OPENSSL_PATH)/crypto/bio/bss_file.c > + $(OPENSSL_PATH)/crypto/bio/bss_log.c > + $(OPENSSL_PATH)/crypto/bio/bss_mem.c > + $(OPENSSL_PATH)/crypto/bio/bss_null.c > + $(OPENSSL_PATH)/crypto/bio/bss_sock.c > + $(OPENSSL_PATH)/crypto/bn/bn_add.c > + $(OPENSSL_PATH)/crypto/bn/bn_asm.c > + $(OPENSSL_PATH)/crypto/bn/bn_blind.c > + $(OPENSSL_PATH)/crypto/bn/bn_const.c > + $(OPENSSL_PATH)/crypto/bn/bn_ctx.c > + $(OPENSSL_PATH)/crypto/bn/bn_depr.c > + $(OPENSSL_PATH)/crypto/bn/bn_dh.c > + $(OPENSSL_PATH)/crypto/bn/bn_div.c > + $(OPENSSL_PATH)/crypto/bn/bn_err.c > + $(OPENSSL_PATH)/crypto/bn/bn_exp.c > + $(OPENSSL_PATH)/crypto/bn/bn_exp2.c > + $(OPENSSL_PATH)/crypto/bn/bn_gcd.c > + $(OPENSSL_PATH)/crypto/bn/bn_gf2m.c > + $(OPENSSL_PATH)/crypto/bn/bn_intern.c > + $(OPENSSL_PATH)/crypto/bn/bn_kron.c > + $(OPENSSL_PATH)/crypto/bn/bn_lib.c > + $(OPENSSL_PATH)/crypto/bn/bn_mod.c > + $(OPENSSL_PATH)/crypto/bn/bn_mont.c > + $(OPENSSL_PATH)/crypto/bn/bn_mpi.c > + $(OPENSSL_PATH)/crypto/bn/bn_mul.c > + $(OPENSSL_PATH)/crypto/bn/bn_nist.c > + $(OPENSSL_PATH)/crypto/bn/bn_prime.c > + $(OPENSSL_PATH)/crypto/bn/bn_print.c > + $(OPENSSL_PATH)/crypto/bn/bn_rand.c > + $(OPENSSL_PATH)/crypto/bn/bn_recp.c > + $(OPENSSL_PATH)/crypto/bn/bn_shift.c > + $(OPENSSL_PATH)/crypto/bn/bn_sqr.c > + $(OPENSSL_PATH)/crypto/bn/bn_sqrt.c > + $(OPENSSL_PATH)/crypto/bn/bn_srp.c > + $(OPENSSL_PATH)/crypto/bn/bn_word.c > + $(OPENSSL_PATH)/crypto/bn/bn_x931p.c > + $(OPENSSL_PATH)/crypto/buffer/buf_err.c > + $(OPENSSL_PATH)/crypto/buffer/buffer.c > + $(OPENSSL_PATH)/crypto/cmac/cm_ameth.c > + $(OPENSSL_PATH)/crypto/cmac/cm_pmeth.c > + 
$(OPENSSL_PATH)/crypto/cmac/cmac.c > + $(OPENSSL_PATH)/crypto/comp/c_zlib.c > + $(OPENSSL_PATH)/crypto/comp/comp_err.c > + $(OPENSSL_PATH)/crypto/comp/comp_lib.c > + $(OPENSSL_PATH)/crypto/conf/conf_api.c > + $(OPENSSL_PATH)/crypto/conf/conf_def.c > + $(OPENSSL_PATH)/crypto/conf/conf_err.c > + $(OPENSSL_PATH)/crypto/conf/conf_lib.c > + $(OPENSSL_PATH)/crypto/conf/conf_mall.c > + $(OPENSSL_PATH)/crypto/conf/conf_mod.c > + $(OPENSSL_PATH)/crypto/conf/conf_sap.c > + $(OPENSSL_PATH)/crypto/conf/conf_ssl.c > + $(OPENSSL_PATH)/crypto/cpt_err.c > + $(OPENSSL_PATH)/crypto/cryptlib.c > + $(OPENSSL_PATH)/crypto/ctype.c > + $(OPENSSL_PATH)/crypto/cversion.c > + $(OPENSSL_PATH)/crypto/dh/dh_ameth.c > + $(OPENSSL_PATH)/crypto/dh/dh_asn1.c > + $(OPENSSL_PATH)/crypto/dh/dh_check.c > + $(OPENSSL_PATH)/crypto/dh/dh_depr.c > + $(OPENSSL_PATH)/crypto/dh/dh_err.c > + $(OPENSSL_PATH)/crypto/dh/dh_gen.c > + $(OPENSSL_PATH)/crypto/dh/dh_kdf.c > + $(OPENSSL_PATH)/crypto/dh/dh_key.c > + $(OPENSSL_PATH)/crypto/dh/dh_lib.c > + $(OPENSSL_PATH)/crypto/dh/dh_meth.c > + $(OPENSSL_PATH)/crypto/dh/dh_pmeth.c > + $(OPENSSL_PATH)/crypto/dh/dh_prn.c > + $(OPENSSL_PATH)/crypto/dh/dh_rfc5114.c > + $(OPENSSL_PATH)/crypto/dh/dh_rfc7919.c > + $(OPENSSL_PATH)/crypto/dso/dso_dl.c > + $(OPENSSL_PATH)/crypto/dso/dso_dlfcn.c > + $(OPENSSL_PATH)/crypto/dso/dso_err.c > + $(OPENSSL_PATH)/crypto/dso/dso_lib.c > + $(OPENSSL_PATH)/crypto/dso/dso_openssl.c > + $(OPENSSL_PATH)/crypto/dso/dso_vms.c > + $(OPENSSL_PATH)/crypto/dso/dso_win32.c > + $(OPENSSL_PATH)/crypto/ebcdic.c > + $(OPENSSL_PATH)/crypto/err/err.c > + $(OPENSSL_PATH)/crypto/err/err_prn.c > + $(OPENSSL_PATH)/crypto/evp/bio_b64.c > + $(OPENSSL_PATH)/crypto/evp/bio_enc.c > + $(OPENSSL_PATH)/crypto/evp/bio_md.c > + $(OPENSSL_PATH)/crypto/evp/bio_ok.c > + $(OPENSSL_PATH)/crypto/evp/c_allc.c > + $(OPENSSL_PATH)/crypto/evp/c_alld.c > + $(OPENSSL_PATH)/crypto/evp/cmeth_lib.c > + $(OPENSSL_PATH)/crypto/evp/digest.c > + $(OPENSSL_PATH)/crypto/evp/e_aes.c > + $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha1.c > + $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha256.c > + $(OPENSSL_PATH)/crypto/evp/e_aria.c > + $(OPENSSL_PATH)/crypto/evp/e_bf.c > + $(OPENSSL_PATH)/crypto/evp/e_camellia.c > + $(OPENSSL_PATH)/crypto/evp/e_cast.c > + $(OPENSSL_PATH)/crypto/evp/e_chacha20_poly1305.c > + $(OPENSSL_PATH)/crypto/evp/e_des.c > + $(OPENSSL_PATH)/crypto/evp/e_des3.c > + $(OPENSSL_PATH)/crypto/evp/e_idea.c > + $(OPENSSL_PATH)/crypto/evp/e_null.c > + $(OPENSSL_PATH)/crypto/evp/e_old.c > + $(OPENSSL_PATH)/crypto/evp/e_rc2.c > + $(OPENSSL_PATH)/crypto/evp/e_rc4.c > + $(OPENSSL_PATH)/crypto/evp/e_rc4_hmac_md5.c > + $(OPENSSL_PATH)/crypto/evp/e_rc5.c > + $(OPENSSL_PATH)/crypto/evp/e_seed.c > + $(OPENSSL_PATH)/crypto/evp/e_sm4.c > + $(OPENSSL_PATH)/crypto/evp/e_xcbc_d.c > + $(OPENSSL_PATH)/crypto/evp/encode.c > + $(OPENSSL_PATH)/crypto/evp/evp_cnf.c > + $(OPENSSL_PATH)/crypto/evp/evp_enc.c > + $(OPENSSL_PATH)/crypto/evp/evp_err.c > + $(OPENSSL_PATH)/crypto/evp/evp_key.c > + $(OPENSSL_PATH)/crypto/evp/evp_lib.c > + $(OPENSSL_PATH)/crypto/evp/evp_pbe.c > + $(OPENSSL_PATH)/crypto/evp/evp_pkey.c > + $(OPENSSL_PATH)/crypto/evp/m_md2.c > + $(OPENSSL_PATH)/crypto/evp/m_md4.c > + $(OPENSSL_PATH)/crypto/evp/m_md5.c > + $(OPENSSL_PATH)/crypto/evp/m_md5_sha1.c > + $(OPENSSL_PATH)/crypto/evp/m_mdc2.c > + $(OPENSSL_PATH)/crypto/evp/m_null.c > + $(OPENSSL_PATH)/crypto/evp/m_ripemd.c > + $(OPENSSL_PATH)/crypto/evp/m_sha1.c > + $(OPENSSL_PATH)/crypto/evp/m_sha3.c > + $(OPENSSL_PATH)/crypto/evp/m_sigver.c > + 
$(OPENSSL_PATH)/crypto/evp/m_wp.c > + $(OPENSSL_PATH)/crypto/evp/names.c > + $(OPENSSL_PATH)/crypto/evp/p5_crpt.c > + $(OPENSSL_PATH)/crypto/evp/p5_crpt2.c > + $(OPENSSL_PATH)/crypto/evp/p_dec.c > + $(OPENSSL_PATH)/crypto/evp/p_enc.c > + $(OPENSSL_PATH)/crypto/evp/p_lib.c > + $(OPENSSL_PATH)/crypto/evp/p_open.c > + $(OPENSSL_PATH)/crypto/evp/p_seal.c > + $(OPENSSL_PATH)/crypto/evp/p_sign.c > + $(OPENSSL_PATH)/crypto/evp/p_verify.c > + $(OPENSSL_PATH)/crypto/evp/pbe_scrypt.c > + $(OPENSSL_PATH)/crypto/evp/pmeth_fn.c > + $(OPENSSL_PATH)/crypto/evp/pmeth_gn.c > + $(OPENSSL_PATH)/crypto/evp/pmeth_lib.c > + $(OPENSSL_PATH)/crypto/ex_data.c > + $(OPENSSL_PATH)/crypto/getenv.c > + $(OPENSSL_PATH)/crypto/hmac/hm_ameth.c > + $(OPENSSL_PATH)/crypto/hmac/hm_pmeth.c > + $(OPENSSL_PATH)/crypto/hmac/hmac.c > + $(OPENSSL_PATH)/crypto/init.c > + $(OPENSSL_PATH)/crypto/kdf/hkdf.c > + $(OPENSSL_PATH)/crypto/kdf/kdf_err.c > + $(OPENSSL_PATH)/crypto/kdf/scrypt.c > + $(OPENSSL_PATH)/crypto/kdf/tls1_prf.c > + $(OPENSSL_PATH)/crypto/lhash/lh_stats.c > + $(OPENSSL_PATH)/crypto/lhash/lhash.c > + $(OPENSSL_PATH)/crypto/md5/md5_dgst.c > + $(OPENSSL_PATH)/crypto/md5/md5_one.c > + $(OPENSSL_PATH)/crypto/mem.c > + $(OPENSSL_PATH)/crypto/mem_dbg.c > + $(OPENSSL_PATH)/crypto/mem_sec.c > + $(OPENSSL_PATH)/crypto/modes/cbc128.c > + $(OPENSSL_PATH)/crypto/modes/ccm128.c > + $(OPENSSL_PATH)/crypto/modes/cfb128.c > + $(OPENSSL_PATH)/crypto/modes/ctr128.c > + $(OPENSSL_PATH)/crypto/modes/cts128.c > + $(OPENSSL_PATH)/crypto/modes/gcm128.c > + $(OPENSSL_PATH)/crypto/modes/ocb128.c > + $(OPENSSL_PATH)/crypto/modes/ofb128.c > + $(OPENSSL_PATH)/crypto/modes/wrap128.c > + $(OPENSSL_PATH)/crypto/modes/xts128.c > + $(OPENSSL_PATH)/crypto/o_dir.c > + $(OPENSSL_PATH)/crypto/o_fips.c > + $(OPENSSL_PATH)/crypto/o_fopen.c > + $(OPENSSL_PATH)/crypto/o_init.c > + $(OPENSSL_PATH)/crypto/o_str.c > + $(OPENSSL_PATH)/crypto/o_time.c > + $(OPENSSL_PATH)/crypto/objects/o_names.c > + $(OPENSSL_PATH)/crypto/objects/obj_dat.c > + $(OPENSSL_PATH)/crypto/objects/obj_err.c > + $(OPENSSL_PATH)/crypto/objects/obj_lib.c > + $(OPENSSL_PATH)/crypto/objects/obj_xref.c > + $(OPENSSL_PATH)/crypto/ocsp/ocsp_asn.c > + $(OPENSSL_PATH)/crypto/ocsp/ocsp_cl.c > + $(OPENSSL_PATH)/crypto/ocsp/ocsp_err.c > + $(OPENSSL_PATH)/crypto/ocsp/ocsp_ext.c > + $(OPENSSL_PATH)/crypto/ocsp/ocsp_ht.c > + $(OPENSSL_PATH)/crypto/ocsp/ocsp_lib.c > + $(OPENSSL_PATH)/crypto/ocsp/ocsp_prn.c > + $(OPENSSL_PATH)/crypto/ocsp/ocsp_srv.c > + $(OPENSSL_PATH)/crypto/ocsp/ocsp_vfy.c > + $(OPENSSL_PATH)/crypto/ocsp/v3_ocsp.c > + $(OPENSSL_PATH)/crypto/pem/pem_all.c > + $(OPENSSL_PATH)/crypto/pem/pem_err.c > + $(OPENSSL_PATH)/crypto/pem/pem_info.c > + $(OPENSSL_PATH)/crypto/pem/pem_lib.c > + $(OPENSSL_PATH)/crypto/pem/pem_oth.c > + $(OPENSSL_PATH)/crypto/pem/pem_pk8.c > + $(OPENSSL_PATH)/crypto/pem/pem_pkey.c > + $(OPENSSL_PATH)/crypto/pem/pem_sign.c > + $(OPENSSL_PATH)/crypto/pem/pem_x509.c > + $(OPENSSL_PATH)/crypto/pem/pem_xaux.c > + $(OPENSSL_PATH)/crypto/pem/pvkfmt.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_add.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_asn.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_attr.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_crpt.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_crt.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_decr.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_init.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_key.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_kiss.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_mutl.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_npas.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_p8d.c > 
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_p8e.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_sbag.c > + $(OPENSSL_PATH)/crypto/pkcs12/p12_utl.c > + $(OPENSSL_PATH)/crypto/pkcs12/pk12err.c > + $(OPENSSL_PATH)/crypto/pkcs7/bio_pk7.c > + $(OPENSSL_PATH)/crypto/pkcs7/pk7_asn1.c > + $(OPENSSL_PATH)/crypto/pkcs7/pk7_attr.c > + $(OPENSSL_PATH)/crypto/pkcs7/pk7_doit.c > + $(OPENSSL_PATH)/crypto/pkcs7/pk7_lib.c > + $(OPENSSL_PATH)/crypto/pkcs7/pk7_mime.c > + $(OPENSSL_PATH)/crypto/pkcs7/pk7_smime.c > + $(OPENSSL_PATH)/crypto/pkcs7/pkcs7err.c > + $(OPENSSL_PATH)/crypto/rand/drbg_ctr.c > + $(OPENSSL_PATH)/crypto/rand/drbg_lib.c > + $(OPENSSL_PATH)/crypto/rand/rand_egd.c > + $(OPENSSL_PATH)/crypto/rand/rand_err.c > + $(OPENSSL_PATH)/crypto/rand/rand_lib.c > + $(OPENSSL_PATH)/crypto/rand/rand_unix.c > + $(OPENSSL_PATH)/crypto/rand/rand_vms.c > + $(OPENSSL_PATH)/crypto/rand/rand_win.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_ameth.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_asn1.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_chk.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_crpt.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_depr.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_err.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_gen.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_lib.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_meth.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_mp.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_none.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_oaep.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_ossl.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_pk1.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_pmeth.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_prn.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_pss.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_saos.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_sign.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_ssl.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_x931.c > + $(OPENSSL_PATH)/crypto/rsa/rsa_x931g.c > + $(OPENSSL_PATH)/crypto/sha/keccak1600.c > + $(OPENSSL_PATH)/crypto/sha/sha1_one.c > + $(OPENSSL_PATH)/crypto/sha/sha1dgst.c > + $(OPENSSL_PATH)/crypto/sha/sha256.c > + $(OPENSSL_PATH)/crypto/sha/sha512.c > + $(OPENSSL_PATH)/crypto/siphash/siphash.c > + $(OPENSSL_PATH)/crypto/siphash/siphash_ameth.c > + $(OPENSSL_PATH)/crypto/siphash/siphash_pmeth.c > + $(OPENSSL_PATH)/crypto/sm3/m_sm3.c > + $(OPENSSL_PATH)/crypto/sm3/sm3.c > + $(OPENSSL_PATH)/crypto/sm4/sm4.c > + $(OPENSSL_PATH)/crypto/stack/stack.c > + $(OPENSSL_PATH)/crypto/threads_none.c > + $(OPENSSL_PATH)/crypto/threads_pthread.c > + $(OPENSSL_PATH)/crypto/threads_win.c > + $(OPENSSL_PATH)/crypto/txt_db/txt_db.c > + $(OPENSSL_PATH)/crypto/ui/ui_err.c > + $(OPENSSL_PATH)/crypto/ui/ui_lib.c > + $(OPENSSL_PATH)/crypto/ui/ui_null.c > + $(OPENSSL_PATH)/crypto/ui/ui_openssl.c > + $(OPENSSL_PATH)/crypto/ui/ui_util.c > + $(OPENSSL_PATH)/crypto/uid.c > + $(OPENSSL_PATH)/crypto/x509/by_dir.c > + $(OPENSSL_PATH)/crypto/x509/by_file.c > + $(OPENSSL_PATH)/crypto/x509/t_crl.c > + $(OPENSSL_PATH)/crypto/x509/t_req.c > + $(OPENSSL_PATH)/crypto/x509/t_x509.c > + $(OPENSSL_PATH)/crypto/x509/x509_att.c > + $(OPENSSL_PATH)/crypto/x509/x509_cmp.c > + $(OPENSSL_PATH)/crypto/x509/x509_d2.c > + $(OPENSSL_PATH)/crypto/x509/x509_def.c > + $(OPENSSL_PATH)/crypto/x509/x509_err.c > + $(OPENSSL_PATH)/crypto/x509/x509_ext.c > + $(OPENSSL_PATH)/crypto/x509/x509_lu.c > + $(OPENSSL_PATH)/crypto/x509/x509_meth.c > + $(OPENSSL_PATH)/crypto/x509/x509_obj.c > + $(OPENSSL_PATH)/crypto/x509/x509_r2x.c > + $(OPENSSL_PATH)/crypto/x509/x509_req.c > + $(OPENSSL_PATH)/crypto/x509/x509_set.c > + $(OPENSSL_PATH)/crypto/x509/x509_trs.c > + $(OPENSSL_PATH)/crypto/x509/x509_txt.c > + $(OPENSSL_PATH)/crypto/x509/x509_v3.c > + 
$(OPENSSL_PATH)/crypto/x509/x509_vfy.c > + $(OPENSSL_PATH)/crypto/x509/x509_vpm.c > + $(OPENSSL_PATH)/crypto/x509/x509cset.c > + $(OPENSSL_PATH)/crypto/x509/x509name.c > + $(OPENSSL_PATH)/crypto/x509/x509rset.c > + $(OPENSSL_PATH)/crypto/x509/x509spki.c > + $(OPENSSL_PATH)/crypto/x509/x509type.c > + $(OPENSSL_PATH)/crypto/x509/x_all.c > + $(OPENSSL_PATH)/crypto/x509/x_attrib.c > + $(OPENSSL_PATH)/crypto/x509/x_crl.c > + $(OPENSSL_PATH)/crypto/x509/x_exten.c > + $(OPENSSL_PATH)/crypto/x509/x_name.c > + $(OPENSSL_PATH)/crypto/x509/x_pubkey.c > + $(OPENSSL_PATH)/crypto/x509/x_req.c > + $(OPENSSL_PATH)/crypto/x509/x_x509.c > + $(OPENSSL_PATH)/crypto/x509/x_x509a.c > + $(OPENSSL_PATH)/crypto/x509v3/pcy_cache.c > + $(OPENSSL_PATH)/crypto/x509v3/pcy_data.c > + $(OPENSSL_PATH)/crypto/x509v3/pcy_lib.c > + $(OPENSSL_PATH)/crypto/x509v3/pcy_map.c > + $(OPENSSL_PATH)/crypto/x509v3/pcy_node.c > + $(OPENSSL_PATH)/crypto/x509v3/pcy_tree.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_addr.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_admis.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_akey.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_akeya.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_alt.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_asid.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_bcons.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_bitst.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_conf.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_cpols.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_crld.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_enum.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_extku.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_genn.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_ia5.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_info.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_int.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_lib.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_ncons.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_pci.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_pcia.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_pcons.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_pku.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_pmaps.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_prn.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_purp.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_skey.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_sxnet.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_tlsf.c > + $(OPENSSL_PATH)/crypto/x509v3/v3_utl.c > + $(OPENSSL_PATH)/crypto/x509v3/v3err.c > + $(OPENSSL_PATH)/crypto/arm_arch.h > + $(OPENSSL_PATH)/crypto/mips_arch.h > + $(OPENSSL_PATH)/crypto/ppc_arch.h > + $(OPENSSL_PATH)/crypto/s390x_arch.h > + $(OPENSSL_PATH)/crypto/sparc_arch.h > + $(OPENSSL_PATH)/crypto/vms_rms.h > + $(OPENSSL_PATH)/crypto/aes/aes_local.h > + $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.h > + $(OPENSSL_PATH)/crypto/asn1/asn1_local.h > + $(OPENSSL_PATH)/crypto/asn1/charmap.h > + $(OPENSSL_PATH)/crypto/asn1/standard_methods.h > + $(OPENSSL_PATH)/crypto/asn1/tbl_standard.h > + $(OPENSSL_PATH)/crypto/async/async_local.h > + $(OPENSSL_PATH)/crypto/async/arch/async_null.h > + $(OPENSSL_PATH)/crypto/async/arch/async_posix.h > + $(OPENSSL_PATH)/crypto/async/arch/async_win.h > + $(OPENSSL_PATH)/crypto/bio/bio_local.h > + $(OPENSSL_PATH)/crypto/bn/bn_local.h > + $(OPENSSL_PATH)/crypto/bn/bn_prime.h > + $(OPENSSL_PATH)/crypto/bn/rsaz_exp.h > + $(OPENSSL_PATH)/crypto/comp/comp_local.h > + $(OPENSSL_PATH)/crypto/conf/conf_def.h > + $(OPENSSL_PATH)/crypto/conf/conf_local.h > + $(OPENSSL_PATH)/crypto/dh/dh_local.h > + $(OPENSSL_PATH)/crypto/dso/dso_local.h > + $(OPENSSL_PATH)/crypto/evp/evp_local.h > + $(OPENSSL_PATH)/crypto/hmac/hmac_local.h > + $(OPENSSL_PATH)/crypto/lhash/lhash_local.h > + 
$(OPENSSL_PATH)/crypto/md5/md5_local.h > + $(OPENSSL_PATH)/crypto/modes/modes_local.h > + $(OPENSSL_PATH)/crypto/objects/obj_dat.h > + $(OPENSSL_PATH)/crypto/objects/obj_local.h > + $(OPENSSL_PATH)/crypto/objects/obj_xref.h > + $(OPENSSL_PATH)/crypto/ocsp/ocsp_local.h > + $(OPENSSL_PATH)/crypto/pkcs12/p12_local.h > + $(OPENSSL_PATH)/crypto/rand/rand_local.h > + $(OPENSSL_PATH)/crypto/rsa/rsa_local.h > + $(OPENSSL_PATH)/crypto/sha/sha_local.h > + $(OPENSSL_PATH)/crypto/siphash/siphash_local.h > + $(OPENSSL_PATH)/crypto/sm3/sm3_local.h > + $(OPENSSL_PATH)/crypto/store/store_local.h > + $(OPENSSL_PATH)/crypto/ui/ui_local.h > + $(OPENSSL_PATH)/crypto/x509/x509_local.h > + $(OPENSSL_PATH)/crypto/x509v3/ext_dat.h > + $(OPENSSL_PATH)/crypto/x509v3/pcy_local.h > + $(OPENSSL_PATH)/crypto/x509v3/standard_exts.h > + $(OPENSSL_PATH)/crypto/x509v3/v3_admis.h > + $(OPENSSL_PATH)/ssl/bio_ssl.c > + $(OPENSSL_PATH)/ssl/d1_lib.c > + $(OPENSSL_PATH)/ssl/d1_msg.c > + $(OPENSSL_PATH)/ssl/d1_srtp.c > + $(OPENSSL_PATH)/ssl/methods.c > + $(OPENSSL_PATH)/ssl/packet.c > + $(OPENSSL_PATH)/ssl/pqueue.c > + $(OPENSSL_PATH)/ssl/record/dtls1_bitmap.c > + $(OPENSSL_PATH)/ssl/record/rec_layer_d1.c > + $(OPENSSL_PATH)/ssl/record/rec_layer_s3.c > + $(OPENSSL_PATH)/ssl/record/ssl3_buffer.c > + $(OPENSSL_PATH)/ssl/record/ssl3_record.c > + $(OPENSSL_PATH)/ssl/record/ssl3_record_tls13.c > + $(OPENSSL_PATH)/ssl/s3_cbc.c > + $(OPENSSL_PATH)/ssl/s3_enc.c > + $(OPENSSL_PATH)/ssl/s3_lib.c > + $(OPENSSL_PATH)/ssl/s3_msg.c > + $(OPENSSL_PATH)/ssl/ssl_asn1.c > + $(OPENSSL_PATH)/ssl/ssl_cert.c > + $(OPENSSL_PATH)/ssl/ssl_ciph.c > + $(OPENSSL_PATH)/ssl/ssl_conf.c > + $(OPENSSL_PATH)/ssl/ssl_err.c > + $(OPENSSL_PATH)/ssl/ssl_init.c > + $(OPENSSL_PATH)/ssl/ssl_lib.c > + $(OPENSSL_PATH)/ssl/ssl_mcnf.c > + $(OPENSSL_PATH)/ssl/ssl_rsa.c > + $(OPENSSL_PATH)/ssl/ssl_sess.c > + $(OPENSSL_PATH)/ssl/ssl_stat.c > + $(OPENSSL_PATH)/ssl/ssl_txt.c > + $(OPENSSL_PATH)/ssl/ssl_utst.c > + $(OPENSSL_PATH)/ssl/statem/extensions.c > + $(OPENSSL_PATH)/ssl/statem/extensions_clnt.c > + $(OPENSSL_PATH)/ssl/statem/extensions_cust.c > + $(OPENSSL_PATH)/ssl/statem/extensions_srvr.c > + $(OPENSSL_PATH)/ssl/statem/statem.c > + $(OPENSSL_PATH)/ssl/statem/statem_clnt.c > + $(OPENSSL_PATH)/ssl/statem/statem_dtls.c > + $(OPENSSL_PATH)/ssl/statem/statem_lib.c > + $(OPENSSL_PATH)/ssl/statem/statem_srvr.c > + $(OPENSSL_PATH)/ssl/t1_enc.c > + $(OPENSSL_PATH)/ssl/t1_lib.c > + $(OPENSSL_PATH)/ssl/t1_trce.c > + $(OPENSSL_PATH)/ssl/tls13_enc.c > + $(OPENSSL_PATH)/ssl/tls_srp.c > + $(OPENSSL_PATH)/ssl/packet_local.h > + $(OPENSSL_PATH)/ssl/ssl_cert_table.h > + $(OPENSSL_PATH)/ssl/ssl_local.h > + $(OPENSSL_PATH)/ssl/record/record.h > + $(OPENSSL_PATH)/ssl/record/record_local.h > + $(OPENSSL_PATH)/ssl/statem/statem.h > + $(OPENSSL_PATH)/ssl/statem/statem_local.h > +# Autogenerated files list ends here > + buildinf.h > + ossl_store.c > + rand_pool.c > + X64/ApiHooks.c > + > +[Packages] > + MdePkg/MdePkg.dec > + CryptoPkg/CryptoPkg.dec > + > +[LibraryClasses] > + BaseLib > + DebugLib > + RngLib > + PrintLib > + > +[BuildOptions] > + # > + # Disables the following Visual Studio compiler warnings brought by openssl > source, > + # so we do not break the build with /WX option: > + # C4090: 'function' : different 'const' qualifiers > + # C4132: 'object' : const object should be initialized (tls13_enc.c) > + # C4210: nonstandard extension used: function given file scope > + # C4244: conversion from type1 to type2, possible loss of data > + # C4245: conversion from type1 to type2, 
signed/unsigned mismatch > + # C4267: conversion from size_t to type, possible loss of data > + # C4306: 'identifier' : conversion from 'type1' to 'type2' of greater size > + # C4310: cast truncates constant value > + # C4389: 'operator' : signed/unsigned mismatch (xxxx) > + # C4700: uninitialized local variable 'name' used. (conf_sap.c(71)) > + # C4702: unreachable code > + # C4706: assignment within conditional expression > + # C4819: The file contains a character that cannot be represented in the > current code page > + # > + MSFT:*_*_X64_CC_FLAGS = -U_WIN32 -U_WIN64 -U_MSC_VER > $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /wd4090 /wd4132 /wd4210 > /wd4244 /wd4245 /wd4267 /wd4306 /wd4310 /wd4700 /wd4389 /wd4702 > /wd4706 /wd4819 > + > + INTEL:*_*_X64_CC_FLAGS = -U_WIN32 -U_WIN64 -U_MSC_VER -U__ICC > $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /w > + > + # > + # Suppress the following build warnings in openssl so we don't break the build > with -Werror > + # -Werror=maybe-uninitialized: there exist some other paths for which the > variable is not initialized. > + # -Werror=format: Check calls to printf and scanf, etc., to make sure that the > arguments supplied have > + # types appropriate to the format string specified. > + # -Werror=unused-but-set-variable: Warn whenever a local variable is > assigned to, but otherwise unused (aside from its declaration). > + # > + GCC:*_*_X64_CC_FLAGS = -UWIN32 -U_WIN32 -U_WIN64 > $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) -Wno-error=maybe-uninitialized > -Wno-error=format -Wno-format -Wno-error=unused-but-set-variable - > DNO_MSABI_VA_FUNCS > + > + # suppress the following warnings in openssl so we don't break the build with > warnings-as-errors: > + # 1295: Deprecated declaration <entity> - give arg types > + # 550: <entity> was set but never used > + # 1293: assignment in condition > + # 111: statement is unreachable (invariably "break;" after "return X;" in case > statement) > + # 68: integer conversion resulted in a change of sign ("if (Status == -1)") > + # 177: <entity> was declared but never referenced > + # 223: function <entity> declared implicitly > + # 144: a value of type <type> cannot be used to initialize an entity of type > <type> > + # 513: a value of type <type> cannot be assigned to an entity of type <type> > + # 188: enumerated type mixed with another type (i.e. passing an integer as an > enum without a cast) > + # 1296: Extended constant initialiser used > + # 128: loop is not reachable - may be emitted inappropriately if code follows > a conditional return > + # from the function that evaluates to true at compile time > + # 546: transfer of control bypasses initialization - may be emitted > inappropriately if the uninitialized > + # variable is never referenced after the jump > + # 1: ignore "#1-D: last line of file ends without a newline" > + # 3017: <entity> may be used before being set (NOTE: This was fixed in > OpenSSL 1.1 HEAD with > + # commit d9b8b89bec4480de3a10bdaf9425db371c19145b, and can be > dropped then.) > + XCODE:*_*_X64_CC_FLAGS = -mmmx -msse -U_WIN32 -U_WIN64 > $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) -w -std=c99 -Wno- > error=uninitialized > diff --git a/CryptoPkg/Library/OpensslLib/UefiAsm.conf > b/CryptoPkg/Library/OpensslLib/UefiAsm.conf > new file mode 100644 > index 0000000000..2c2978d696 > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/UefiAsm.conf > @@ -0,0 +1,30 @@ > +## -*- mode: perl; -*- > +# UEFI assembly openssl configuration targets. > +# > +# Copyright (c) 2020, Intel Corporation. 
All rights reserved.<BR> > +# > +# SPDX-License-Identifier: BSD-2-Clause-Patent > +# > +## > + > +my %targets = ( > +#### UEFI > + "UEFI-x86_64" => { > + perlasm_scheme => "nasm", > + # inherit_from => [ "UEFI", asm("x86_64_asm") ], > + inherit_from => [ "UEFI" ], > + cpuid_asm_src => "x86_64cpuid.s", > + aes_asm_src => "aes_core.c aes_cbc.c vpaes-x86_64.s aesni-x86_64.s > aesni-sha1-x86_64.s aesni-sha256-x86_64.s aesni-mb-x86_64.s", > + sha1_asm_src => "sha1-x86_64.s sha256-x86_64.s sha512-x86_64.s > sha1-mb-x86_64.s sha256-mb-x86_64.s", > + modes_asm_src => "ghash-x86_64.s aesni-gcm-x86_64.s", > + }, > + "UEFI-x86_64-GCC" => { > + perlasm_scheme => "elf", > + # inherit_from => [ "UEFI", asm("x86_64_asm") ], > + inherit_from => [ "UEFI" ], > + cpuid_asm_src => "x86_64cpuid.s", > + aes_asm_src => "aes_core.c aes_cbc.c vpaes-x86_64.s aesni-x86_64.s > aesni-sha1-x86_64.s aesni-sha256-x86_64.s aesni-mb-x86_64.s", > + sha1_asm_src => "sha1-x86_64.s sha256-x86_64.s sha512-x86_64.s > sha1-mb-x86_64.s sha256-mb-x86_64.s", > + modes_asm_src => "ghash-x86_64.s aesni-gcm-x86_64.s", > + }, > +); > diff --git a/CryptoPkg/Library/OpensslLib/X64/ApiHooks.c > b/CryptoPkg/Library/OpensslLib/X64/ApiHooks.c > new file mode 100644 > index 0000000000..0c8043aa8e > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64/ApiHooks.c > @@ -0,0 +1,22 @@ > +/** @file > + OpenSSL Library API hooks. > + > +Copyright (c) 2020, Intel Corporation. All rights reserved.<BR> > +SPDX-License-Identifier: BSD-2-Clause-Patent > + > +**/ > + > +#include <Uefi.h> > + > +/** > + Stub function for win64 API call. > + > +**/ > +VOID * > +__imp_RtlVirtualUnwind ( > + VOID * Args > + ) > +{ > + return NULL; > +} > + > diff --git a/CryptoPkg/Library/OpensslLib/process_files.pl > b/CryptoPkg/Library/OpensslLib/process_files.pl > index 57ce195394..42bff05fa6 100755 > --- a/CryptoPkg/Library/OpensslLib/process_files.pl > +++ b/CryptoPkg/Library/OpensslLib/process_files.pl > @@ -9,9 +9,65 @@ > # do not need to do this, since the results are stored in the EDK2 > # git repository for them. > # > +# Due to the script wrapping required to process the OpenSSL > +# configuration data, each native architecture must be processed > +# individually by the maintainer (in addition to the standard version): > +# ./process_files.pl > +# ./process_files.pl X64 > +# ./process_files.pl [Arch] > + > use strict; > use Cwd; > use File::Copy; > +use File::Basename; > +use File::Path qw(make_path remove_tree); > +use Text::Tabs; > + > +my $comment_character; > + > +# > +# OpenSSL perlasm generator script does not transfer the copyright header > +# > +sub copy_license_header > +{ > + my @args = split / /, shift; #Separate args by spaces > + my $source = $args[1]; #Source file is second (after "perl") > + my $target = pop @args; #Target file is always last > + chop ($target); #Remove newline char > + > + my $temp_file_name = "license.tmp"; > + open (my $source_file, "<" . $source) || die $source; > + open (my $target_file, "<" . $target) || die $target; > + open (my $temp_file, ">" . 
$temp_file_name) || die $temp_file_name; > + > + #Add "generated file" warning > + $source =~ s/^..//; #Remove leading "./" > + print ($temp_file "$comment_character WARNING: do not edit!\r\n"); > + print ($temp_file "$comment_character Generated from $source\r\n"); > + print ($temp_file "$comment_character\r\n"); > + > + #Copy source file header to temp file > + while (my $line = <$source_file>) { > + next if ($line =~ /#!/); #Ignore shebang line > + $line =~ s/#/$comment_character/; #Fix comment character for > assembly > + $line =~ s/\s+$/\r\n/; #Trim trailing whitepsace, fixup line endings > + print ($temp_file $line); > + last if ($line =~ /http/); #Last line of copyright header contains a web link > + } > + print ($temp_file "\r\n"); > + #Retrieve generated assembly contents > + while (my $line = <$target_file>) { > + $line =~ s/\s+$/\r\n/; #Trim trailing whitepsace, fixup line endings > + print ($temp_file expand ($line)); #expand() replaces tabs with spaces > + } > + > + close ($source_file); > + close ($target_file); > + close ($temp_file); > + > + move ($temp_file_name, $target) || > + die "Cannot replace \"" . $target . "\"!"; > +} > > # > # Find the openssl directory name for use lib. We have to do this > @@ -21,10 +77,57 @@ use File::Copy; > # > my $inf_file; > my $OPENSSL_PATH; > +my $uefi_config; > +my $extension; > +my $arch; > my @inf; > > BEGIN { > $inf_file = "OpensslLib.inf"; > + $uefi_config = "UEFI"; > + $arch = shift; > + > + if (defined $arch) { > + if (uc ($arch) eq "X64") { > + $arch = "X64"; > + $inf_file = "OpensslLibX64.inf"; > + $uefi_config = "UEFI-x86_64"; > + $extension = "nasm"; > + $comment_character = ";"; > + } elsif (uc ($arch) eq "X64GCC") { > + $arch = "X64Gcc"; > + $inf_file = "OpensslLibX64Gcc.inf"; > + $uefi_config = "UEFI-x86_64-GCC"; > + $extension = "S"; > + $comment_character = "#"; > + } else { > + die "Unsupported architecture \"" . $arch . "\"!"; > + } > + if ($extension eq "nasm") { > + if (`nasm -v 2>&1`) { > + #Presence of nasm executable will trigger inclusion of AVX instructions > + die "\nCannot run assembly generators with NASM in path!\n\n"; > + } > + } > + > + # Prepare assembly folder > + if (-d $arch) { > + opendir my $dir, $arch || > + die "Cannot open assembly folder \"" . $arch . "\"!"; > + while (defined (my $file = readdir $dir)) { > + if (-d "$arch/$file") { > + next if $file eq "."; > + next if $file eq ".."; > + remove_tree ("$arch/$file", {safe => 1}) || > + die "Cannot clean assembly folder \"" . "$arch/$file" . "\"!"; > + } > + } > + > + } else { > + mkdir $arch || > + die "Cannot create assembly folder \"" . $arch . "\"!"; > + } > + } > > # Read the contents of the inf file > open( FD, "<" . 
$inf_file ) || > @@ -47,9 +150,9 @@ BEGIN { > # Configure UEFI > system( > "./Configure", > - "UEFI", > + "--config=../UefiAsm.conf", > + "$uefi_config", > "no-afalgeng", > - "no-asm", > "no-async", > "no-autoerrinit", > "no-autoload-config", > @@ -129,23 +232,53 @@ BEGIN { > # Retrieve file lists from OpenSSL configdata > # > use configdata qw/%unified_info/; > +use configdata qw/%config/; > +use configdata qw/%target/; > + > +# > +# Collect build flags from configdata > +# > +my $flags = ""; > +foreach my $f (@{$config{lib_defines}}) { > + $flags .= " -D$f"; > +} > > my @cryptofilelist = (); > my @sslfilelist = (); > +my @asmfilelist = (); > +my @asmbuild = (); > foreach my $product ((@{$unified_info{libraries}}, > @{$unified_info{engines}})) { > foreach my $o (@{$unified_info{sources}->{$product}}) { > foreach my $s (@{$unified_info{sources}->{$o}}) { > - next if ($unified_info{generate}->{$s}); > - next if $s =~ "crypto/bio/b_print.c"; > - > # No need to add unused files in UEFI. > # So it can reduce porting time, compile time, library size. > + next if $s =~ "crypto/bio/b_print.c"; > next if $s =~ "crypto/rand/randfile.c"; > next if $s =~ "crypto/store/"; > next if $s =~ "crypto/err/err_all.c"; > next if $s =~ "crypto/aes/aes_ecb.c"; > > + if ($unified_info{generate}->{$s}) { > + if (defined $arch) { > + my $buildstring = "perl"; > + foreach my $arg (@{$unified_info{generate}->{$s}}) { > + if ($arg =~ ".pl") { > + $buildstring .= " ./openssl/$arg"; > + } elsif ($arg =~ "PERLASM_SCHEME") { > + $buildstring .= " $target{perlasm_scheme}"; > + } elsif ($arg =~ "LIB_CFLAGS") { > + $buildstring .= "$flags"; > + } > + } > + ($s, my $path, undef) = fileparse($s, qr/\.[^.]*/); > + $buildstring .= " ./$arch/$path$s.$extension"; > + make_path ("./$arch/$path"); > + push @asmbuild, "$buildstring\n"; > + push @asmfilelist, " $arch/$path$s.$extension\r\n"; > + } > + next; > + } > if ($product =~ "libssl") { > push @sslfilelist, ' $(OPENSSL_PATH)/' . $s . "\r\n"; > next; > @@ -183,15 +316,31 @@ foreach (@headers){ > } > > > +# > +# Generate assembly files > +# > +if (@asmbuild) { > + print "\n--> Generating assembly files ... "; > + foreach my $buildstring (@asmbuild) { > + system ("$buildstring"); > + copy_license_header ($buildstring); > + } > + print "Done!"; > +} > + > # > # Update OpensslLib.inf with autogenerated file list > # > my @new_inf = (); > my $subbing = 0; > -print "\n--> Updating OpensslLib.inf ... "; > +print "\n--> Updating $inf_file ... "; > foreach (@inf) { > + if ($_ =~ "DEFINE OPENSSL_FLAGS_CONFIG") { > + push @new_inf, " DEFINE OPENSSL_FLAGS_CONFIG =" . $flags . "\r\n"; > + next; > + } > if ( $_ =~ "# Autogenerated files list starts here" ) { > - push @new_inf, $_, @cryptofilelist, @sslfilelist; > + push @new_inf, $_, @asmfilelist, @cryptofilelist, @sslfilelist; > $subbing = 1; > next; > } > @@ -216,49 +365,51 @@ rename( $new_inf_file, $inf_file ) || > die "rename $inf_file"; > print "Done!"; > > -# > -# Update OpensslLibCrypto.inf with auto-generated file list (no libssl) > -# > -$inf_file = "OpensslLibCrypto.inf"; > - > -# Read the contents of the inf file > -@inf = (); > -@new_inf = (); > -open( FD, "<" . $inf_file ) || > - die "Cannot open \"" . $inf_file . "\"!"; > -@inf = (<FD>); > -close(FD) || > - die "Cannot close \"" . $inf_file . "\"!"; > +if (!defined $arch) { > + # > + # Update OpensslLibCrypto.inf with auto-generated file list (no libssl) > + # > + $inf_file = "OpensslLibCrypto.inf"; > > -$subbing = 0; > -print "\n--> Updating OpensslLibCrypto.inf ... 
"; > -foreach (@inf) { > - if ( $_ =~ "# Autogenerated files list starts here" ) { > - push @new_inf, $_, @cryptofilelist; > - $subbing = 1; > - next; > - } > - if ( $_ =~ "# Autogenerated files list ends here" ) { > - push @new_inf, $_; > - $subbing = 0; > - next; > + # Read the contents of the inf file > + @inf = (); > + @new_inf = (); > + open( FD, "<" . $inf_file ) || > + die "Cannot open \"" . $inf_file . "\"!"; > + @inf = (<FD>); > + close(FD) || > + die "Cannot close \"" . $inf_file . "\"!"; > + > + $subbing = 0; > + print "\n--> Updating OpensslLibCrypto.inf ... "; > + foreach (@inf) { > + if ( $_ =~ "# Autogenerated files list starts here" ) { > + push @new_inf, $_, @cryptofilelist; > + $subbing = 1; > + next; > + } > + if ( $_ =~ "# Autogenerated files list ends here" ) { > + push @new_inf, $_; > + $subbing = 0; > + next; > + } > + > + push @new_inf, $_ > + unless ($subbing); > } > > - push @new_inf, $_ > - unless ($subbing); > + $new_inf_file = $inf_file . ".new"; > + open( FD, ">" . $new_inf_file ) || > + die $new_inf_file; > + print( FD @new_inf ) || > + die $new_inf_file; > + close(FD) || > + die $new_inf_file; > + rename( $new_inf_file, $inf_file ) || > + die "rename $inf_file"; > + print "Done!"; > } > > -$new_inf_file = $inf_file . ".new"; > -open( FD, ">" . $new_inf_file ) || > - die $new_inf_file; > -print( FD @new_inf ) || > - die $new_inf_file; > -close(FD) || > - die $new_inf_file; > -rename( $new_inf_file, $inf_file ) || > - die "rename $inf_file"; > -print "Done!"; > - > # > # Copy opensslconf.h and dso_conf.h generated from OpenSSL Configuration > # > -- > 2.32.0.windows.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v7 3/3] CryptoPkg/OpensslLib: Commit the auto-generated assembly files for X64 [not found] <20210720220646.659-1-christopher.zurcher@outlook.com> 2021-07-20 22:06 ` [PATCH v7 1/3] BaseTools: Remove COMMON section from the GCC discard list Christopher Zurcher 2021-07-20 22:06 ` [PATCH v7 2/3] CryptoPkg/OpensslLib: Add native instruction support for X64 Christopher Zurcher @ 2021-07-20 22:06 ` Christopher Zurcher 2021-07-21 11:44 ` Yao, Jiewen 2 siblings, 1 reply; 13+ messages in thread From: Christopher Zurcher @ 2021-07-20 22:06 UTC (permalink / raw) To: devel; +Cc: Jiewen Yao, Jian J Wang, Xiaoyu Lu, Mike Kinney, Ard Biesheuvel From: Christopher Zurcher <christopher.zurcher@microsoft.com> BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507 Adding the auto-generated assembly files for X64 architectures. Cc: Jiewen Yao <jiewen.yao@intel.com> Cc: Jian J Wang <jian.j.wang@intel.com> Cc: Xiaoyu Lu <xiaoyux.lu@intel.com> Cc: Mike Kinney <michael.d.kinney@intel.com> Cc: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Christopher Zurcher <christopher.zurcher@microsoft.com> --- CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm | 732 +++ CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm | 1916 ++++++++ CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm | 78 + CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm | 5103 ++++++++++++++++++++ CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm | 1173 +++++ CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm | 34 + CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm | 1569 ++++++ CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm | 3137 ++++++++++++ CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm | 2884 +++++++++++ CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm | 3461 +++++++++++++ CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm | 3313 +++++++++++++ CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm | 1938 ++++++++ CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm | 491 ++ CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-mb-x86_64.S | 552 +++ CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha1-x86_64.S | 1719 +++++++ CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha256-x86_64.S | 69 + CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-x86_64.S | 4484 +++++++++++++++++ CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/vpaes-x86_64.S | 863 ++++ CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/aesni-gcm-x86_64.S | 29 + CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/ghash-x86_64.S | 1386 ++++++ CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-mb-x86_64.S | 2962 ++++++++++++ CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-x86_64.S | 2631 ++++++++++ CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-mb-x86_64.S | 3286 +++++++++++++ CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-x86_64.S | 3097 ++++++++++++ CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha512-x86_64.S | 1811 +++++++ CryptoPkg/Library/OpensslLib/X64Gcc/crypto/x86_64cpuid.S | 491 ++ 26 files changed, 49209 insertions(+) diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm new file mode 100644 index 0000000000..1a3ed1dd35 --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm @@ -0,0 +1,732 @@ +; WARNING: do not edit! 
+; Generated from openssl/crypto/aes/asm/aesni-mb-x86_64.pl +; +; Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. +; +; Licensed under the OpenSSL license (the "License"). You may not use +; this file except in compliance with the License. You can obtain a copy +; in the file LICENSE in the source distribution or at +; https://www.openssl.org/source/license.html + +default rel +%define XMMWORD +%define YMMWORD +%define ZMMWORD +section .text code align=64 + + +EXTERN OPENSSL_ia32cap_P + +global aesni_multi_cbc_encrypt + +ALIGN 32 +aesni_multi_cbc_encrypt: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_aesni_multi_cbc_encrypt: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + + + + mov rax,rsp + + push rbx + + push rbp + + push r12 + + push r13 + + push r14 + + push r15 + + lea rsp,[((-168))+rsp] + movaps XMMWORD[rsp],xmm6 + movaps XMMWORD[16+rsp],xmm7 + movaps XMMWORD[32+rsp],xmm8 + movaps XMMWORD[48+rsp],xmm9 + movaps XMMWORD[64+rsp],xmm10 + movaps XMMWORD[80+rsp],xmm11 + movaps XMMWORD[96+rsp],xmm12 + movaps XMMWORD[(-104)+rax],xmm13 + movaps XMMWORD[(-88)+rax],xmm14 + movaps XMMWORD[(-72)+rax],xmm15 + + + + + + + sub rsp,48 + and rsp,-64 + mov QWORD[16+rsp],rax + + +$L$enc4x_body: + movdqu xmm12,XMMWORD[rsi] + lea rsi,[120+rsi] + lea rdi,[80+rdi] + +$L$enc4x_loop_grande: + mov DWORD[24+rsp],edx + xor edx,edx + mov ecx,DWORD[((-64))+rdi] + mov r8,QWORD[((-80))+rdi] + cmp ecx,edx + mov r12,QWORD[((-72))+rdi] + cmovg edx,ecx + test ecx,ecx + movdqu xmm2,XMMWORD[((-56))+rdi] + mov DWORD[32+rsp],ecx + cmovle r8,rsp + mov ecx,DWORD[((-24))+rdi] + mov r9,QWORD[((-40))+rdi] + cmp ecx,edx + mov r13,QWORD[((-32))+rdi] + cmovg edx,ecx + test ecx,ecx + movdqu xmm3,XMMWORD[((-16))+rdi] + mov DWORD[36+rsp],ecx + cmovle r9,rsp + mov ecx,DWORD[16+rdi] + mov r10,QWORD[rdi] + cmp ecx,edx + mov r14,QWORD[8+rdi] + cmovg edx,ecx + test ecx,ecx + movdqu xmm4,XMMWORD[24+rdi] + mov DWORD[40+rsp],ecx + cmovle r10,rsp + mov ecx,DWORD[56+rdi] + mov r11,QWORD[40+rdi] + cmp ecx,edx + mov r15,QWORD[48+rdi] + cmovg edx,ecx + test ecx,ecx + movdqu xmm5,XMMWORD[64+rdi] + mov DWORD[44+rsp],ecx + cmovle r11,rsp + test edx,edx + jz NEAR $L$enc4x_done + + movups xmm1,XMMWORD[((16-120))+rsi] + pxor xmm2,xmm12 + movups xmm0,XMMWORD[((32-120))+rsi] + pxor xmm3,xmm12 + mov eax,DWORD[((240-120))+rsi] + pxor xmm4,xmm12 + movdqu xmm6,XMMWORD[r8] + pxor xmm5,xmm12 + movdqu xmm7,XMMWORD[r9] + pxor xmm2,xmm6 + movdqu xmm8,XMMWORD[r10] + pxor xmm3,xmm7 + movdqu xmm9,XMMWORD[r11] + pxor xmm4,xmm8 + pxor xmm5,xmm9 + movdqa xmm10,XMMWORD[32+rsp] + xor rbx,rbx + jmp NEAR $L$oop_enc4x + +ALIGN 32 +$L$oop_enc4x: + add rbx,16 + lea rbp,[16+rsp] + mov ecx,1 + sub rbp,rbx + +DB 102,15,56,220,209 + prefetcht0 [31+rbx*1+r8] + prefetcht0 [31+rbx*1+r9] +DB 102,15,56,220,217 + prefetcht0 [31+rbx*1+r10] + prefetcht0 [31+rbx*1+r10] +DB 102,15,56,220,225 +DB 102,15,56,220,233 + movups xmm1,XMMWORD[((48-120))+rsi] + cmp ecx,DWORD[32+rsp] +DB 102,15,56,220,208 +DB 102,15,56,220,216 +DB 102,15,56,220,224 + cmovge r8,rbp + cmovg r12,rbp +DB 102,15,56,220,232 + movups xmm0,XMMWORD[((-56))+rsi] + cmp ecx,DWORD[36+rsp] +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 + cmovge r9,rbp + cmovg r13,rbp +DB 102,15,56,220,233 + movups xmm1,XMMWORD[((-40))+rsi] + cmp ecx,DWORD[40+rsp] +DB 102,15,56,220,208 +DB 102,15,56,220,216 +DB 102,15,56,220,224 + cmovge r10,rbp + cmovg r14,rbp +DB 102,15,56,220,232 + movups xmm0,XMMWORD[((-24))+rsi] + cmp ecx,DWORD[44+rsp] +DB 102,15,56,220,209 +DB 
102,15,56,220,217 +DB 102,15,56,220,225 + cmovge r11,rbp + cmovg r15,rbp +DB 102,15,56,220,233 + movups xmm1,XMMWORD[((-8))+rsi] + movdqa xmm11,xmm10 +DB 102,15,56,220,208 + prefetcht0 [15+rbx*1+r12] + prefetcht0 [15+rbx*1+r13] +DB 102,15,56,220,216 + prefetcht0 [15+rbx*1+r14] + prefetcht0 [15+rbx*1+r15] +DB 102,15,56,220,224 +DB 102,15,56,220,232 + movups xmm0,XMMWORD[((128-120))+rsi] + pxor xmm12,xmm12 + +DB 102,15,56,220,209 + pcmpgtd xmm11,xmm12 + movdqu xmm12,XMMWORD[((-120))+rsi] +DB 102,15,56,220,217 + paddd xmm10,xmm11 + movdqa XMMWORD[32+rsp],xmm10 +DB 102,15,56,220,225 +DB 102,15,56,220,233 + movups xmm1,XMMWORD[((144-120))+rsi] + + cmp eax,11 + +DB 102,15,56,220,208 +DB 102,15,56,220,216 +DB 102,15,56,220,224 +DB 102,15,56,220,232 + movups xmm0,XMMWORD[((160-120))+rsi] + + jb NEAR $L$enc4x_tail + +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 +DB 102,15,56,220,233 + movups xmm1,XMMWORD[((176-120))+rsi] + +DB 102,15,56,220,208 +DB 102,15,56,220,216 +DB 102,15,56,220,224 +DB 102,15,56,220,232 + movups xmm0,XMMWORD[((192-120))+rsi] + + je NEAR $L$enc4x_tail + +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 +DB 102,15,56,220,233 + movups xmm1,XMMWORD[((208-120))+rsi] + +DB 102,15,56,220,208 +DB 102,15,56,220,216 +DB 102,15,56,220,224 +DB 102,15,56,220,232 + movups xmm0,XMMWORD[((224-120))+rsi] + jmp NEAR $L$enc4x_tail + +ALIGN 32 +$L$enc4x_tail: +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 +DB 102,15,56,220,233 + movdqu xmm6,XMMWORD[rbx*1+r8] + movdqu xmm1,XMMWORD[((16-120))+rsi] + +DB 102,15,56,221,208 + movdqu xmm7,XMMWORD[rbx*1+r9] + pxor xmm6,xmm12 +DB 102,15,56,221,216 + movdqu xmm8,XMMWORD[rbx*1+r10] + pxor xmm7,xmm12 +DB 102,15,56,221,224 + movdqu xmm9,XMMWORD[rbx*1+r11] + pxor xmm8,xmm12 +DB 102,15,56,221,232 + movdqu xmm0,XMMWORD[((32-120))+rsi] + pxor xmm9,xmm12 + + movups XMMWORD[(-16)+rbx*1+r12],xmm2 + pxor xmm2,xmm6 + movups XMMWORD[(-16)+rbx*1+r13],xmm3 + pxor xmm3,xmm7 + movups XMMWORD[(-16)+rbx*1+r14],xmm4 + pxor xmm4,xmm8 + movups XMMWORD[(-16)+rbx*1+r15],xmm5 + pxor xmm5,xmm9 + + dec edx + jnz NEAR $L$oop_enc4x + + mov rax,QWORD[16+rsp] + + mov edx,DWORD[24+rsp] + + + + + + + + + + + lea rdi,[160+rdi] + dec edx + jnz NEAR $L$enc4x_loop_grande + +$L$enc4x_done: + movaps xmm6,XMMWORD[((-216))+rax] + movaps xmm7,XMMWORD[((-200))+rax] + movaps xmm8,XMMWORD[((-184))+rax] + movaps xmm9,XMMWORD[((-168))+rax] + movaps xmm10,XMMWORD[((-152))+rax] + movaps xmm11,XMMWORD[((-136))+rax] + movaps xmm12,XMMWORD[((-120))+rax] + + + + mov r15,QWORD[((-48))+rax] + + mov r14,QWORD[((-40))+rax] + + mov r13,QWORD[((-32))+rax] + + mov r12,QWORD[((-24))+rax] + + mov rbp,QWORD[((-16))+rax] + + mov rbx,QWORD[((-8))+rax] + + lea rsp,[rax] + +$L$enc4x_epilogue: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_aesni_multi_cbc_encrypt: + +global aesni_multi_cbc_decrypt + +ALIGN 32 +aesni_multi_cbc_decrypt: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_aesni_multi_cbc_decrypt: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + + + + mov rax,rsp + + push rbx + + push rbp + + push r12 + + push r13 + + push r14 + + push r15 + + lea rsp,[((-168))+rsp] + movaps XMMWORD[rsp],xmm6 + movaps XMMWORD[16+rsp],xmm7 + movaps XMMWORD[32+rsp],xmm8 + movaps XMMWORD[48+rsp],xmm9 + movaps XMMWORD[64+rsp],xmm10 + movaps XMMWORD[80+rsp],xmm11 + movaps XMMWORD[96+rsp],xmm12 + movaps XMMWORD[(-104)+rax],xmm13 + movaps XMMWORD[(-88)+rax],xmm14 + movaps 
XMMWORD[(-72)+rax],xmm15 + + + + + + + sub rsp,48 + and rsp,-64 + mov QWORD[16+rsp],rax + + +$L$dec4x_body: + movdqu xmm12,XMMWORD[rsi] + lea rsi,[120+rsi] + lea rdi,[80+rdi] + +$L$dec4x_loop_grande: + mov DWORD[24+rsp],edx + xor edx,edx + mov ecx,DWORD[((-64))+rdi] + mov r8,QWORD[((-80))+rdi] + cmp ecx,edx + mov r12,QWORD[((-72))+rdi] + cmovg edx,ecx + test ecx,ecx + movdqu xmm6,XMMWORD[((-56))+rdi] + mov DWORD[32+rsp],ecx + cmovle r8,rsp + mov ecx,DWORD[((-24))+rdi] + mov r9,QWORD[((-40))+rdi] + cmp ecx,edx + mov r13,QWORD[((-32))+rdi] + cmovg edx,ecx + test ecx,ecx + movdqu xmm7,XMMWORD[((-16))+rdi] + mov DWORD[36+rsp],ecx + cmovle r9,rsp + mov ecx,DWORD[16+rdi] + mov r10,QWORD[rdi] + cmp ecx,edx + mov r14,QWORD[8+rdi] + cmovg edx,ecx + test ecx,ecx + movdqu xmm8,XMMWORD[24+rdi] + mov DWORD[40+rsp],ecx + cmovle r10,rsp + mov ecx,DWORD[56+rdi] + mov r11,QWORD[40+rdi] + cmp ecx,edx + mov r15,QWORD[48+rdi] + cmovg edx,ecx + test ecx,ecx + movdqu xmm9,XMMWORD[64+rdi] + mov DWORD[44+rsp],ecx + cmovle r11,rsp + test edx,edx + jz NEAR $L$dec4x_done + + movups xmm1,XMMWORD[((16-120))+rsi] + movups xmm0,XMMWORD[((32-120))+rsi] + mov eax,DWORD[((240-120))+rsi] + movdqu xmm2,XMMWORD[r8] + movdqu xmm3,XMMWORD[r9] + pxor xmm2,xmm12 + movdqu xmm4,XMMWORD[r10] + pxor xmm3,xmm12 + movdqu xmm5,XMMWORD[r11] + pxor xmm4,xmm12 + pxor xmm5,xmm12 + movdqa xmm10,XMMWORD[32+rsp] + xor rbx,rbx + jmp NEAR $L$oop_dec4x + +ALIGN 32 +$L$oop_dec4x: + add rbx,16 + lea rbp,[16+rsp] + mov ecx,1 + sub rbp,rbx + +DB 102,15,56,222,209 + prefetcht0 [31+rbx*1+r8] + prefetcht0 [31+rbx*1+r9] +DB 102,15,56,222,217 + prefetcht0 [31+rbx*1+r10] + prefetcht0 [31+rbx*1+r11] +DB 102,15,56,222,225 +DB 102,15,56,222,233 + movups xmm1,XMMWORD[((48-120))+rsi] + cmp ecx,DWORD[32+rsp] +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 + cmovge r8,rbp + cmovg r12,rbp +DB 102,15,56,222,232 + movups xmm0,XMMWORD[((-56))+rsi] + cmp ecx,DWORD[36+rsp] +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 + cmovge r9,rbp + cmovg r13,rbp +DB 102,15,56,222,233 + movups xmm1,XMMWORD[((-40))+rsi] + cmp ecx,DWORD[40+rsp] +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 + cmovge r10,rbp + cmovg r14,rbp +DB 102,15,56,222,232 + movups xmm0,XMMWORD[((-24))+rsi] + cmp ecx,DWORD[44+rsp] +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 + cmovge r11,rbp + cmovg r15,rbp +DB 102,15,56,222,233 + movups xmm1,XMMWORD[((-8))+rsi] + movdqa xmm11,xmm10 +DB 102,15,56,222,208 + prefetcht0 [15+rbx*1+r12] + prefetcht0 [15+rbx*1+r13] +DB 102,15,56,222,216 + prefetcht0 [15+rbx*1+r14] + prefetcht0 [15+rbx*1+r15] +DB 102,15,56,222,224 +DB 102,15,56,222,232 + movups xmm0,XMMWORD[((128-120))+rsi] + pxor xmm12,xmm12 + +DB 102,15,56,222,209 + pcmpgtd xmm11,xmm12 + movdqu xmm12,XMMWORD[((-120))+rsi] +DB 102,15,56,222,217 + paddd xmm10,xmm11 + movdqa XMMWORD[32+rsp],xmm10 +DB 102,15,56,222,225 +DB 102,15,56,222,233 + movups xmm1,XMMWORD[((144-120))+rsi] + + cmp eax,11 + +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 +DB 102,15,56,222,232 + movups xmm0,XMMWORD[((160-120))+rsi] + + jb NEAR $L$dec4x_tail + +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,222,233 + movups xmm1,XMMWORD[((176-120))+rsi] + +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 +DB 102,15,56,222,232 + movups xmm0,XMMWORD[((192-120))+rsi] + + je NEAR $L$dec4x_tail + +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,222,233 + movups 
xmm1,XMMWORD[((208-120))+rsi] + +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 +DB 102,15,56,222,232 + movups xmm0,XMMWORD[((224-120))+rsi] + jmp NEAR $L$dec4x_tail + +ALIGN 32 +$L$dec4x_tail: +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 + pxor xmm6,xmm0 + pxor xmm7,xmm0 +DB 102,15,56,222,233 + movdqu xmm1,XMMWORD[((16-120))+rsi] + pxor xmm8,xmm0 + pxor xmm9,xmm0 + movdqu xmm0,XMMWORD[((32-120))+rsi] + +DB 102,15,56,223,214 +DB 102,15,56,223,223 + movdqu xmm6,XMMWORD[((-16))+rbx*1+r8] + movdqu xmm7,XMMWORD[((-16))+rbx*1+r9] +DB 102,65,15,56,223,224 +DB 102,65,15,56,223,233 + movdqu xmm8,XMMWORD[((-16))+rbx*1+r10] + movdqu xmm9,XMMWORD[((-16))+rbx*1+r11] + + movups XMMWORD[(-16)+rbx*1+r12],xmm2 + movdqu xmm2,XMMWORD[rbx*1+r8] + movups XMMWORD[(-16)+rbx*1+r13],xmm3 + movdqu xmm3,XMMWORD[rbx*1+r9] + pxor xmm2,xmm12 + movups XMMWORD[(-16)+rbx*1+r14],xmm4 + movdqu xmm4,XMMWORD[rbx*1+r10] + pxor xmm3,xmm12 + movups XMMWORD[(-16)+rbx*1+r15],xmm5 + movdqu xmm5,XMMWORD[rbx*1+r11] + pxor xmm4,xmm12 + pxor xmm5,xmm12 + + dec edx + jnz NEAR $L$oop_dec4x + + mov rax,QWORD[16+rsp] + + mov edx,DWORD[24+rsp] + + lea rdi,[160+rdi] + dec edx + jnz NEAR $L$dec4x_loop_grande + +$L$dec4x_done: + movaps xmm6,XMMWORD[((-216))+rax] + movaps xmm7,XMMWORD[((-200))+rax] + movaps xmm8,XMMWORD[((-184))+rax] + movaps xmm9,XMMWORD[((-168))+rax] + movaps xmm10,XMMWORD[((-152))+rax] + movaps xmm11,XMMWORD[((-136))+rax] + movaps xmm12,XMMWORD[((-120))+rax] + + + + mov r15,QWORD[((-48))+rax] + + mov r14,QWORD[((-40))+rax] + + mov r13,QWORD[((-32))+rax] + + mov r12,QWORD[((-24))+rax] + + mov rbp,QWORD[((-16))+rax] + + mov rbx,QWORD[((-8))+rax] + + lea rsp,[rax] + +$L$dec4x_epilogue: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_aesni_multi_cbc_decrypt: +EXTERN __imp_RtlVirtualUnwind + +ALIGN 16 +se_handler: + push rsi + push rdi + push rbx + push rbp + push r12 + push r13 + push r14 + push r15 + pushfq + sub rsp,64 + + mov rax,QWORD[120+r8] + mov rbx,QWORD[248+r8] + + mov rsi,QWORD[8+r9] + mov r11,QWORD[56+r9] + + mov r10d,DWORD[r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jb NEAR $L$in_prologue + + mov rax,QWORD[152+r8] + + mov r10d,DWORD[4+r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jae NEAR $L$in_prologue + + mov rax,QWORD[16+rax] + + mov rbx,QWORD[((-8))+rax] + mov rbp,QWORD[((-16))+rax] + mov r12,QWORD[((-24))+rax] + mov r13,QWORD[((-32))+rax] + mov r14,QWORD[((-40))+rax] + mov r15,QWORD[((-48))+rax] + mov QWORD[144+r8],rbx + mov QWORD[160+r8],rbp + mov QWORD[216+r8],r12 + mov QWORD[224+r8],r13 + mov QWORD[232+r8],r14 + mov QWORD[240+r8],r15 + + lea rsi,[((-56-160))+rax] + lea rdi,[512+r8] + mov ecx,20 + DD 0xa548f3fc + +$L$in_prologue: + mov rdi,QWORD[8+rax] + mov rsi,QWORD[16+rax] + mov QWORD[152+r8],rax + mov QWORD[168+r8],rsi + mov QWORD[176+r8],rdi + + mov rdi,QWORD[40+r9] + mov rsi,r8 + mov ecx,154 + DD 0xa548f3fc + + mov rsi,r9 + xor rcx,rcx + mov rdx,QWORD[8+rsi] + mov r8,QWORD[rsi] + mov r9,QWORD[16+rsi] + mov r10,QWORD[40+rsi] + lea r11,[56+rsi] + lea r12,[24+rsi] + mov QWORD[32+rsp],r10 + mov QWORD[40+rsp],r11 + mov QWORD[48+rsp],r12 + mov QWORD[56+rsp],rcx + call QWORD[__imp_RtlVirtualUnwind] + + mov eax,1 + add rsp,64 + popfq + pop r15 + pop r14 + pop r13 + pop r12 + pop rbp + pop rbx + pop rdi + pop rsi + DB 0F3h,0C3h ;repret + + +section .pdata rdata align=4 +ALIGN 4 + DD $L$SEH_begin_aesni_multi_cbc_encrypt wrt ..imagebase + DD $L$SEH_end_aesni_multi_cbc_encrypt wrt ..imagebase + DD 
$L$SEH_info_aesni_multi_cbc_encrypt wrt ..imagebase + DD $L$SEH_begin_aesni_multi_cbc_decrypt wrt ..imagebase + DD $L$SEH_end_aesni_multi_cbc_decrypt wrt ..imagebase + DD $L$SEH_info_aesni_multi_cbc_decrypt wrt ..imagebase +section .xdata rdata align=8 +ALIGN 8 +$L$SEH_info_aesni_multi_cbc_encrypt: +DB 9,0,0,0 + DD se_handler wrt ..imagebase + DD $L$enc4x_body wrt ..imagebase,$L$enc4x_epilogue wrt ..imagebase +$L$SEH_info_aesni_multi_cbc_decrypt: +DB 9,0,0,0 + DD se_handler wrt ..imagebase + DD $L$dec4x_body wrt ..imagebase,$L$dec4x_epilogue wrt ..imagebase diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm new file mode 100644 index 0000000000..f4fd9ca50d --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm @@ -0,0 +1,1916 @@ +; WARNING: do not edit! +; Generated from openssl/crypto/aes/asm/aesni-sha1-x86_64.pl +; +; Copyright 2011-2020 The OpenSSL Project Authors. All Rights Reserved. +; +; Licensed under the OpenSSL license (the "License"). You may not use +; this file except in compliance with the License. You can obtain a copy +; in the file LICENSE in the source distribution or at +; https://www.openssl.org/source/license.html + +default rel +%define XMMWORD +%define YMMWORD +%define ZMMWORD +section .text code align=64 + +EXTERN OPENSSL_ia32cap_P + +global aesni_cbc_sha1_enc + +ALIGN 32 +aesni_cbc_sha1_enc: + + + mov r10d,DWORD[((OPENSSL_ia32cap_P+0))] + mov r11,QWORD[((OPENSSL_ia32cap_P+4))] + bt r11,61 + jc NEAR aesni_cbc_sha1_enc_shaext + jmp NEAR aesni_cbc_sha1_enc_ssse3 + DB 0F3h,0C3h ;repret + + + +ALIGN 32 +aesni_cbc_sha1_enc_ssse3: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_aesni_cbc_sha1_enc_ssse3: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + mov rcx,r9 + mov r8,QWORD[40+rsp] + mov r9,QWORD[48+rsp] + + + + mov r10,QWORD[56+rsp] + + + push rbx + + push rbp + + push r12 + + push r13 + + push r14 + + push r15 + + lea rsp,[((-264))+rsp] + + + + movaps XMMWORD[(96+0)+rsp],xmm6 + movaps XMMWORD[(96+16)+rsp],xmm7 + movaps XMMWORD[(96+32)+rsp],xmm8 + movaps XMMWORD[(96+48)+rsp],xmm9 + movaps XMMWORD[(96+64)+rsp],xmm10 + movaps XMMWORD[(96+80)+rsp],xmm11 + movaps XMMWORD[(96+96)+rsp],xmm12 + movaps XMMWORD[(96+112)+rsp],xmm13 + movaps XMMWORD[(96+128)+rsp],xmm14 + movaps XMMWORD[(96+144)+rsp],xmm15 +$L$prologue_ssse3: + mov r12,rdi + mov r13,rsi + mov r14,rdx + lea r15,[112+rcx] + movdqu xmm2,XMMWORD[r8] + mov QWORD[88+rsp],r8 + shl r14,6 + sub r13,r12 + mov r8d,DWORD[((240-112))+r15] + add r14,r10 + + lea r11,[K_XX_XX] + mov eax,DWORD[r9] + mov ebx,DWORD[4+r9] + mov ecx,DWORD[8+r9] + mov edx,DWORD[12+r9] + mov esi,ebx + mov ebp,DWORD[16+r9] + mov edi,ecx + xor edi,edx + and esi,edi + + movdqa xmm3,XMMWORD[64+r11] + movdqa xmm13,XMMWORD[r11] + movdqu xmm4,XMMWORD[r10] + movdqu xmm5,XMMWORD[16+r10] + movdqu xmm6,XMMWORD[32+r10] + movdqu xmm7,XMMWORD[48+r10] +DB 102,15,56,0,227 +DB 102,15,56,0,235 +DB 102,15,56,0,243 + add r10,64 + paddd xmm4,xmm13 +DB 102,15,56,0,251 + paddd xmm5,xmm13 + paddd xmm6,xmm13 + movdqa XMMWORD[rsp],xmm4 + psubd xmm4,xmm13 + movdqa XMMWORD[16+rsp],xmm5 + psubd xmm5,xmm13 + movdqa XMMWORD[32+rsp],xmm6 + psubd xmm6,xmm13 + movups xmm15,XMMWORD[((-112))+r15] + movups xmm0,XMMWORD[((16-112))+r15] + jmp NEAR $L$oop_ssse3 +ALIGN 32 +$L$oop_ssse3: + ror ebx,2 + movups xmm14,XMMWORD[r12] + xorps xmm14,xmm15 + xorps xmm2,xmm14 + movups xmm1,XMMWORD[((-80))+r15] +DB 102,15,56,220,208 + pshufd 
xmm8,xmm4,238 + xor esi,edx + movdqa xmm12,xmm7 + paddd xmm13,xmm7 + mov edi,eax + add ebp,DWORD[rsp] + punpcklqdq xmm8,xmm5 + xor ebx,ecx + rol eax,5 + add ebp,esi + psrldq xmm12,4 + and edi,ebx + xor ebx,ecx + pxor xmm8,xmm4 + add ebp,eax + ror eax,7 + pxor xmm12,xmm6 + xor edi,ecx + mov esi,ebp + add edx,DWORD[4+rsp] + pxor xmm8,xmm12 + xor eax,ebx + rol ebp,5 + movdqa XMMWORD[48+rsp],xmm13 + add edx,edi + movups xmm0,XMMWORD[((-64))+r15] +DB 102,15,56,220,209 + and esi,eax + movdqa xmm3,xmm8 + xor eax,ebx + add edx,ebp + ror ebp,7 + movdqa xmm12,xmm8 + xor esi,ebx + pslldq xmm3,12 + paddd xmm8,xmm8 + mov edi,edx + add ecx,DWORD[8+rsp] + psrld xmm12,31 + xor ebp,eax + rol edx,5 + add ecx,esi + movdqa xmm13,xmm3 + and edi,ebp + xor ebp,eax + psrld xmm3,30 + add ecx,edx + ror edx,7 + por xmm8,xmm12 + xor edi,eax + mov esi,ecx + add ebx,DWORD[12+rsp] + movups xmm1,XMMWORD[((-48))+r15] +DB 102,15,56,220,208 + pslld xmm13,2 + pxor xmm8,xmm3 + xor edx,ebp + movdqa xmm3,XMMWORD[r11] + rol ecx,5 + add ebx,edi + and esi,edx + pxor xmm8,xmm13 + xor edx,ebp + add ebx,ecx + ror ecx,7 + pshufd xmm9,xmm5,238 + xor esi,ebp + movdqa xmm13,xmm8 + paddd xmm3,xmm8 + mov edi,ebx + add eax,DWORD[16+rsp] + punpcklqdq xmm9,xmm6 + xor ecx,edx + rol ebx,5 + add eax,esi + psrldq xmm13,4 + and edi,ecx + xor ecx,edx + pxor xmm9,xmm5 + add eax,ebx + ror ebx,7 + movups xmm0,XMMWORD[((-32))+r15] +DB 102,15,56,220,209 + pxor xmm13,xmm7 + xor edi,edx + mov esi,eax + add ebp,DWORD[20+rsp] + pxor xmm9,xmm13 + xor ebx,ecx + rol eax,5 + movdqa XMMWORD[rsp],xmm3 + add ebp,edi + and esi,ebx + movdqa xmm12,xmm9 + xor ebx,ecx + add ebp,eax + ror eax,7 + movdqa xmm13,xmm9 + xor esi,ecx + pslldq xmm12,12 + paddd xmm9,xmm9 + mov edi,ebp + add edx,DWORD[24+rsp] + psrld xmm13,31 + xor eax,ebx + rol ebp,5 + add edx,esi + movups xmm1,XMMWORD[((-16))+r15] +DB 102,15,56,220,208 + movdqa xmm3,xmm12 + and edi,eax + xor eax,ebx + psrld xmm12,30 + add edx,ebp + ror ebp,7 + por xmm9,xmm13 + xor edi,ebx + mov esi,edx + add ecx,DWORD[28+rsp] + pslld xmm3,2 + pxor xmm9,xmm12 + xor ebp,eax + movdqa xmm12,XMMWORD[16+r11] + rol edx,5 + add ecx,edi + and esi,ebp + pxor xmm9,xmm3 + xor ebp,eax + add ecx,edx + ror edx,7 + pshufd xmm10,xmm6,238 + xor esi,eax + movdqa xmm3,xmm9 + paddd xmm12,xmm9 + mov edi,ecx + add ebx,DWORD[32+rsp] + movups xmm0,XMMWORD[r15] +DB 102,15,56,220,209 + punpcklqdq xmm10,xmm7 + xor edx,ebp + rol ecx,5 + add ebx,esi + psrldq xmm3,4 + and edi,edx + xor edx,ebp + pxor xmm10,xmm6 + add ebx,ecx + ror ecx,7 + pxor xmm3,xmm8 + xor edi,ebp + mov esi,ebx + add eax,DWORD[36+rsp] + pxor xmm10,xmm3 + xor ecx,edx + rol ebx,5 + movdqa XMMWORD[16+rsp],xmm12 + add eax,edi + and esi,ecx + movdqa xmm13,xmm10 + xor ecx,edx + add eax,ebx + ror ebx,7 + movups xmm1,XMMWORD[16+r15] +DB 102,15,56,220,208 + movdqa xmm3,xmm10 + xor esi,edx + pslldq xmm13,12 + paddd xmm10,xmm10 + mov edi,eax + add ebp,DWORD[40+rsp] + psrld xmm3,31 + xor ebx,ecx + rol eax,5 + add ebp,esi + movdqa xmm12,xmm13 + and edi,ebx + xor ebx,ecx + psrld xmm13,30 + add ebp,eax + ror eax,7 + por xmm10,xmm3 + xor edi,ecx + mov esi,ebp + add edx,DWORD[44+rsp] + pslld xmm12,2 + pxor xmm10,xmm13 + xor eax,ebx + movdqa xmm13,XMMWORD[16+r11] + rol ebp,5 + add edx,edi + movups xmm0,XMMWORD[32+r15] +DB 102,15,56,220,209 + and esi,eax + pxor xmm10,xmm12 + xor eax,ebx + add edx,ebp + ror ebp,7 + pshufd xmm11,xmm7,238 + xor esi,ebx + movdqa xmm12,xmm10 + paddd xmm13,xmm10 + mov edi,edx + add ecx,DWORD[48+rsp] + punpcklqdq xmm11,xmm8 + xor ebp,eax + rol edx,5 + add ecx,esi + psrldq xmm12,4 
+ and edi,ebp + xor ebp,eax + pxor xmm11,xmm7 + add ecx,edx + ror edx,7 + pxor xmm12,xmm9 + xor edi,eax + mov esi,ecx + add ebx,DWORD[52+rsp] + movups xmm1,XMMWORD[48+r15] +DB 102,15,56,220,208 + pxor xmm11,xmm12 + xor edx,ebp + rol ecx,5 + movdqa XMMWORD[32+rsp],xmm13 + add ebx,edi + and esi,edx + movdqa xmm3,xmm11 + xor edx,ebp + add ebx,ecx + ror ecx,7 + movdqa xmm12,xmm11 + xor esi,ebp + pslldq xmm3,12 + paddd xmm11,xmm11 + mov edi,ebx + add eax,DWORD[56+rsp] + psrld xmm12,31 + xor ecx,edx + rol ebx,5 + add eax,esi + movdqa xmm13,xmm3 + and edi,ecx + xor ecx,edx + psrld xmm3,30 + add eax,ebx + ror ebx,7 + cmp r8d,11 + jb NEAR $L$aesenclast1 + movups xmm0,XMMWORD[64+r15] +DB 102,15,56,220,209 + movups xmm1,XMMWORD[80+r15] +DB 102,15,56,220,208 + je NEAR $L$aesenclast1 + movups xmm0,XMMWORD[96+r15] +DB 102,15,56,220,209 + movups xmm1,XMMWORD[112+r15] +DB 102,15,56,220,208 +$L$aesenclast1: +DB 102,15,56,221,209 + movups xmm0,XMMWORD[((16-112))+r15] + por xmm11,xmm12 + xor edi,edx + mov esi,eax + add ebp,DWORD[60+rsp] + pslld xmm13,2 + pxor xmm11,xmm3 + xor ebx,ecx + movdqa xmm3,XMMWORD[16+r11] + rol eax,5 + add ebp,edi + and esi,ebx + pxor xmm11,xmm13 + pshufd xmm13,xmm10,238 + xor ebx,ecx + add ebp,eax + ror eax,7 + pxor xmm4,xmm8 + xor esi,ecx + mov edi,ebp + add edx,DWORD[rsp] + punpcklqdq xmm13,xmm11 + xor eax,ebx + rol ebp,5 + pxor xmm4,xmm5 + add edx,esi + movups xmm14,XMMWORD[16+r12] + xorps xmm14,xmm15 + movups XMMWORD[r13*1+r12],xmm2 + xorps xmm2,xmm14 + movups xmm1,XMMWORD[((-80))+r15] +DB 102,15,56,220,208 + and edi,eax + movdqa xmm12,xmm3 + xor eax,ebx + paddd xmm3,xmm11 + add edx,ebp + pxor xmm4,xmm13 + ror ebp,7 + xor edi,ebx + mov esi,edx + add ecx,DWORD[4+rsp] + movdqa xmm13,xmm4 + xor ebp,eax + rol edx,5 + movdqa XMMWORD[48+rsp],xmm3 + add ecx,edi + and esi,ebp + xor ebp,eax + pslld xmm4,2 + add ecx,edx + ror edx,7 + psrld xmm13,30 + xor esi,eax + mov edi,ecx + add ebx,DWORD[8+rsp] + movups xmm0,XMMWORD[((-64))+r15] +DB 102,15,56,220,209 + por xmm4,xmm13 + xor edx,ebp + rol ecx,5 + pshufd xmm3,xmm11,238 + add ebx,esi + and edi,edx + xor edx,ebp + add ebx,ecx + add eax,DWORD[12+rsp] + xor edi,ebp + mov esi,ebx + rol ebx,5 + add eax,edi + xor esi,edx + ror ecx,7 + add eax,ebx + pxor xmm5,xmm9 + add ebp,DWORD[16+rsp] + movups xmm1,XMMWORD[((-48))+r15] +DB 102,15,56,220,208 + xor esi,ecx + punpcklqdq xmm3,xmm4 + mov edi,eax + rol eax,5 + pxor xmm5,xmm6 + add ebp,esi + xor edi,ecx + movdqa xmm13,xmm12 + ror ebx,7 + paddd xmm12,xmm4 + add ebp,eax + pxor xmm5,xmm3 + add edx,DWORD[20+rsp] + xor edi,ebx + mov esi,ebp + rol ebp,5 + movdqa xmm3,xmm5 + add edx,edi + xor esi,ebx + movdqa XMMWORD[rsp],xmm12 + ror eax,7 + add edx,ebp + add ecx,DWORD[24+rsp] + pslld xmm5,2 + xor esi,eax + mov edi,edx + psrld xmm3,30 + rol edx,5 + add ecx,esi + movups xmm0,XMMWORD[((-32))+r15] +DB 102,15,56,220,209 + xor edi,eax + ror ebp,7 + por xmm5,xmm3 + add ecx,edx + add ebx,DWORD[28+rsp] + pshufd xmm12,xmm4,238 + xor edi,ebp + mov esi,ecx + rol ecx,5 + add ebx,edi + xor esi,ebp + ror edx,7 + add ebx,ecx + pxor xmm6,xmm10 + add eax,DWORD[32+rsp] + xor esi,edx + punpcklqdq xmm12,xmm5 + mov edi,ebx + rol ebx,5 + pxor xmm6,xmm7 + add eax,esi + xor edi,edx + movdqa xmm3,XMMWORD[32+r11] + ror ecx,7 + paddd xmm13,xmm5 + add eax,ebx + pxor xmm6,xmm12 + add ebp,DWORD[36+rsp] + movups xmm1,XMMWORD[((-16))+r15] +DB 102,15,56,220,208 + xor edi,ecx + mov esi,eax + rol eax,5 + movdqa xmm12,xmm6 + add ebp,edi + xor esi,ecx + movdqa XMMWORD[16+rsp],xmm13 + ror ebx,7 + add ebp,eax + add edx,DWORD[40+rsp] + pslld 
xmm6,2 + xor esi,ebx + mov edi,ebp + psrld xmm12,30 + rol ebp,5 + add edx,esi + xor edi,ebx + ror eax,7 + por xmm6,xmm12 + add edx,ebp + add ecx,DWORD[44+rsp] + pshufd xmm13,xmm5,238 + xor edi,eax + mov esi,edx + rol edx,5 + add ecx,edi + movups xmm0,XMMWORD[r15] +DB 102,15,56,220,209 + xor esi,eax + ror ebp,7 + add ecx,edx + pxor xmm7,xmm11 + add ebx,DWORD[48+rsp] + xor esi,ebp + punpcklqdq xmm13,xmm6 + mov edi,ecx + rol ecx,5 + pxor xmm7,xmm8 + add ebx,esi + xor edi,ebp + movdqa xmm12,xmm3 + ror edx,7 + paddd xmm3,xmm6 + add ebx,ecx + pxor xmm7,xmm13 + add eax,DWORD[52+rsp] + xor edi,edx + mov esi,ebx + rol ebx,5 + movdqa xmm13,xmm7 + add eax,edi + xor esi,edx + movdqa XMMWORD[32+rsp],xmm3 + ror ecx,7 + add eax,ebx + add ebp,DWORD[56+rsp] + movups xmm1,XMMWORD[16+r15] +DB 102,15,56,220,208 + pslld xmm7,2 + xor esi,ecx + mov edi,eax + psrld xmm13,30 + rol eax,5 + add ebp,esi + xor edi,ecx + ror ebx,7 + por xmm7,xmm13 + add ebp,eax + add edx,DWORD[60+rsp] + pshufd xmm3,xmm6,238 + xor edi,ebx + mov esi,ebp + rol ebp,5 + add edx,edi + xor esi,ebx + ror eax,7 + add edx,ebp + pxor xmm8,xmm4 + add ecx,DWORD[rsp] + xor esi,eax + punpcklqdq xmm3,xmm7 + mov edi,edx + rol edx,5 + pxor xmm8,xmm9 + add ecx,esi + movups xmm0,XMMWORD[32+r15] +DB 102,15,56,220,209 + xor edi,eax + movdqa xmm13,xmm12 + ror ebp,7 + paddd xmm12,xmm7 + add ecx,edx + pxor xmm8,xmm3 + add ebx,DWORD[4+rsp] + xor edi,ebp + mov esi,ecx + rol ecx,5 + movdqa xmm3,xmm8 + add ebx,edi + xor esi,ebp + movdqa XMMWORD[48+rsp],xmm12 + ror edx,7 + add ebx,ecx + add eax,DWORD[8+rsp] + pslld xmm8,2 + xor esi,edx + mov edi,ebx + psrld xmm3,30 + rol ebx,5 + add eax,esi + xor edi,edx + ror ecx,7 + por xmm8,xmm3 + add eax,ebx + add ebp,DWORD[12+rsp] + movups xmm1,XMMWORD[48+r15] +DB 102,15,56,220,208 + pshufd xmm12,xmm7,238 + xor edi,ecx + mov esi,eax + rol eax,5 + add ebp,edi + xor esi,ecx + ror ebx,7 + add ebp,eax + pxor xmm9,xmm5 + add edx,DWORD[16+rsp] + xor esi,ebx + punpcklqdq xmm12,xmm8 + mov edi,ebp + rol ebp,5 + pxor xmm9,xmm10 + add edx,esi + xor edi,ebx + movdqa xmm3,xmm13 + ror eax,7 + paddd xmm13,xmm8 + add edx,ebp + pxor xmm9,xmm12 + add ecx,DWORD[20+rsp] + xor edi,eax + mov esi,edx + rol edx,5 + movdqa xmm12,xmm9 + add ecx,edi + cmp r8d,11 + jb NEAR $L$aesenclast2 + movups xmm0,XMMWORD[64+r15] +DB 102,15,56,220,209 + movups xmm1,XMMWORD[80+r15] +DB 102,15,56,220,208 + je NEAR $L$aesenclast2 + movups xmm0,XMMWORD[96+r15] +DB 102,15,56,220,209 + movups xmm1,XMMWORD[112+r15] +DB 102,15,56,220,208 +$L$aesenclast2: +DB 102,15,56,221,209 + movups xmm0,XMMWORD[((16-112))+r15] + xor esi,eax + movdqa XMMWORD[rsp],xmm13 + ror ebp,7 + add ecx,edx + add ebx,DWORD[24+rsp] + pslld xmm9,2 + xor esi,ebp + mov edi,ecx + psrld xmm12,30 + rol ecx,5 + add ebx,esi + xor edi,ebp + ror edx,7 + por xmm9,xmm12 + add ebx,ecx + add eax,DWORD[28+rsp] + pshufd xmm13,xmm8,238 + ror ecx,7 + mov esi,ebx + xor edi,edx + rol ebx,5 + add eax,edi + xor esi,ecx + xor ecx,edx + add eax,ebx + pxor xmm10,xmm6 + add ebp,DWORD[32+rsp] + movups xmm14,XMMWORD[32+r12] + xorps xmm14,xmm15 + movups XMMWORD[16+r12*1+r13],xmm2 + xorps xmm2,xmm14 + movups xmm1,XMMWORD[((-80))+r15] +DB 102,15,56,220,208 + and esi,ecx + xor ecx,edx + ror ebx,7 + punpcklqdq xmm13,xmm9 + mov edi,eax + xor esi,ecx + pxor xmm10,xmm11 + rol eax,5 + add ebp,esi + movdqa xmm12,xmm3 + xor edi,ebx + paddd xmm3,xmm9 + xor ebx,ecx + pxor xmm10,xmm13 + add ebp,eax + add edx,DWORD[36+rsp] + and edi,ebx + xor ebx,ecx + ror eax,7 + movdqa xmm13,xmm10 + mov esi,ebp + xor edi,ebx + movdqa XMMWORD[16+rsp],xmm3 + rol 
ebp,5 + add edx,edi + movups xmm0,XMMWORD[((-64))+r15] +DB 102,15,56,220,209 + xor esi,eax + pslld xmm10,2 + xor eax,ebx + add edx,ebp + psrld xmm13,30 + add ecx,DWORD[40+rsp] + and esi,eax + xor eax,ebx + por xmm10,xmm13 + ror ebp,7 + mov edi,edx + xor esi,eax + rol edx,5 + pshufd xmm3,xmm9,238 + add ecx,esi + xor edi,ebp + xor ebp,eax + add ecx,edx + add ebx,DWORD[44+rsp] + and edi,ebp + xor ebp,eax + ror edx,7 + movups xmm1,XMMWORD[((-48))+r15] +DB 102,15,56,220,208 + mov esi,ecx + xor edi,ebp + rol ecx,5 + add ebx,edi + xor esi,edx + xor edx,ebp + add ebx,ecx + pxor xmm11,xmm7 + add eax,DWORD[48+rsp] + and esi,edx + xor edx,ebp + ror ecx,7 + punpcklqdq xmm3,xmm10 + mov edi,ebx + xor esi,edx + pxor xmm11,xmm4 + rol ebx,5 + add eax,esi + movdqa xmm13,XMMWORD[48+r11] + xor edi,ecx + paddd xmm12,xmm10 + xor ecx,edx + pxor xmm11,xmm3 + add eax,ebx + add ebp,DWORD[52+rsp] + movups xmm0,XMMWORD[((-32))+r15] +DB 102,15,56,220,209 + and edi,ecx + xor ecx,edx + ror ebx,7 + movdqa xmm3,xmm11 + mov esi,eax + xor edi,ecx + movdqa XMMWORD[32+rsp],xmm12 + rol eax,5 + add ebp,edi + xor esi,ebx + pslld xmm11,2 + xor ebx,ecx + add ebp,eax + psrld xmm3,30 + add edx,DWORD[56+rsp] + and esi,ebx + xor ebx,ecx + por xmm11,xmm3 + ror eax,7 + mov edi,ebp + xor esi,ebx + rol ebp,5 + pshufd xmm12,xmm10,238 + add edx,esi + movups xmm1,XMMWORD[((-16))+r15] +DB 102,15,56,220,208 + xor edi,eax + xor eax,ebx + add edx,ebp + add ecx,DWORD[60+rsp] + and edi,eax + xor eax,ebx + ror ebp,7 + mov esi,edx + xor edi,eax + rol edx,5 + add ecx,edi + xor esi,ebp + xor ebp,eax + add ecx,edx + pxor xmm4,xmm8 + add ebx,DWORD[rsp] + and esi,ebp + xor ebp,eax + ror edx,7 + movups xmm0,XMMWORD[r15] +DB 102,15,56,220,209 + punpcklqdq xmm12,xmm11 + mov edi,ecx + xor esi,ebp + pxor xmm4,xmm5 + rol ecx,5 + add ebx,esi + movdqa xmm3,xmm13 + xor edi,edx + paddd xmm13,xmm11 + xor edx,ebp + pxor xmm4,xmm12 + add ebx,ecx + add eax,DWORD[4+rsp] + and edi,edx + xor edx,ebp + ror ecx,7 + movdqa xmm12,xmm4 + mov esi,ebx + xor edi,edx + movdqa XMMWORD[48+rsp],xmm13 + rol ebx,5 + add eax,edi + xor esi,ecx + pslld xmm4,2 + xor ecx,edx + add eax,ebx + psrld xmm12,30 + add ebp,DWORD[8+rsp] + movups xmm1,XMMWORD[16+r15] +DB 102,15,56,220,208 + and esi,ecx + xor ecx,edx + por xmm4,xmm12 + ror ebx,7 + mov edi,eax + xor esi,ecx + rol eax,5 + pshufd xmm13,xmm11,238 + add ebp,esi + xor edi,ebx + xor ebx,ecx + add ebp,eax + add edx,DWORD[12+rsp] + and edi,ebx + xor ebx,ecx + ror eax,7 + mov esi,ebp + xor edi,ebx + rol ebp,5 + add edx,edi + movups xmm0,XMMWORD[32+r15] +DB 102,15,56,220,209 + xor esi,eax + xor eax,ebx + add edx,ebp + pxor xmm5,xmm9 + add ecx,DWORD[16+rsp] + and esi,eax + xor eax,ebx + ror ebp,7 + punpcklqdq xmm13,xmm4 + mov edi,edx + xor esi,eax + pxor xmm5,xmm6 + rol edx,5 + add ecx,esi + movdqa xmm12,xmm3 + xor edi,ebp + paddd xmm3,xmm4 + xor ebp,eax + pxor xmm5,xmm13 + add ecx,edx + add ebx,DWORD[20+rsp] + and edi,ebp + xor ebp,eax + ror edx,7 + movups xmm1,XMMWORD[48+r15] +DB 102,15,56,220,208 + movdqa xmm13,xmm5 + mov esi,ecx + xor edi,ebp + movdqa XMMWORD[rsp],xmm3 + rol ecx,5 + add ebx,edi + xor esi,edx + pslld xmm5,2 + xor edx,ebp + add ebx,ecx + psrld xmm13,30 + add eax,DWORD[24+rsp] + and esi,edx + xor edx,ebp + por xmm5,xmm13 + ror ecx,7 + mov edi,ebx + xor esi,edx + rol ebx,5 + pshufd xmm3,xmm4,238 + add eax,esi + xor edi,ecx + xor ecx,edx + add eax,ebx + add ebp,DWORD[28+rsp] + cmp r8d,11 + jb NEAR $L$aesenclast3 + movups xmm0,XMMWORD[64+r15] +DB 102,15,56,220,209 + movups xmm1,XMMWORD[80+r15] +DB 102,15,56,220,208 + je NEAR 
$L$aesenclast3 + movups xmm0,XMMWORD[96+r15] +DB 102,15,56,220,209 + movups xmm1,XMMWORD[112+r15] +DB 102,15,56,220,208 +$L$aesenclast3: +DB 102,15,56,221,209 + movups xmm0,XMMWORD[((16-112))+r15] + and edi,ecx + xor ecx,edx + ror ebx,7 + mov esi,eax + xor edi,ecx + rol eax,5 + add ebp,edi + xor esi,ebx + xor ebx,ecx + add ebp,eax + pxor xmm6,xmm10 + add edx,DWORD[32+rsp] + and esi,ebx + xor ebx,ecx + ror eax,7 + punpcklqdq xmm3,xmm5 + mov edi,ebp + xor esi,ebx + pxor xmm6,xmm7 + rol ebp,5 + add edx,esi + movups xmm14,XMMWORD[48+r12] + xorps xmm14,xmm15 + movups XMMWORD[32+r12*1+r13],xmm2 + xorps xmm2,xmm14 + movups xmm1,XMMWORD[((-80))+r15] +DB 102,15,56,220,208 + movdqa xmm13,xmm12 + xor edi,eax + paddd xmm12,xmm5 + xor eax,ebx + pxor xmm6,xmm3 + add edx,ebp + add ecx,DWORD[36+rsp] + and edi,eax + xor eax,ebx + ror ebp,7 + movdqa xmm3,xmm6 + mov esi,edx + xor edi,eax + movdqa XMMWORD[16+rsp],xmm12 + rol edx,5 + add ecx,edi + xor esi,ebp + pslld xmm6,2 + xor ebp,eax + add ecx,edx + psrld xmm3,30 + add ebx,DWORD[40+rsp] + and esi,ebp + xor ebp,eax + por xmm6,xmm3 + ror edx,7 + movups xmm0,XMMWORD[((-64))+r15] +DB 102,15,56,220,209 + mov edi,ecx + xor esi,ebp + rol ecx,5 + pshufd xmm12,xmm5,238 + add ebx,esi + xor edi,edx + xor edx,ebp + add ebx,ecx + add eax,DWORD[44+rsp] + and edi,edx + xor edx,ebp + ror ecx,7 + mov esi,ebx + xor edi,edx + rol ebx,5 + add eax,edi + xor esi,edx + add eax,ebx + pxor xmm7,xmm11 + add ebp,DWORD[48+rsp] + movups xmm1,XMMWORD[((-48))+r15] +DB 102,15,56,220,208 + xor esi,ecx + punpcklqdq xmm12,xmm6 + mov edi,eax + rol eax,5 + pxor xmm7,xmm8 + add ebp,esi + xor edi,ecx + movdqa xmm3,xmm13 + ror ebx,7 + paddd xmm13,xmm6 + add ebp,eax + pxor xmm7,xmm12 + add edx,DWORD[52+rsp] + xor edi,ebx + mov esi,ebp + rol ebp,5 + movdqa xmm12,xmm7 + add edx,edi + xor esi,ebx + movdqa XMMWORD[32+rsp],xmm13 + ror eax,7 + add edx,ebp + add ecx,DWORD[56+rsp] + pslld xmm7,2 + xor esi,eax + mov edi,edx + psrld xmm12,30 + rol edx,5 + add ecx,esi + movups xmm0,XMMWORD[((-32))+r15] +DB 102,15,56,220,209 + xor edi,eax + ror ebp,7 + por xmm7,xmm12 + add ecx,edx + add ebx,DWORD[60+rsp] + xor edi,ebp + mov esi,ecx + rol ecx,5 + add ebx,edi + xor esi,ebp + ror edx,7 + add ebx,ecx + add eax,DWORD[rsp] + xor esi,edx + mov edi,ebx + rol ebx,5 + paddd xmm3,xmm7 + add eax,esi + xor edi,edx + movdqa XMMWORD[48+rsp],xmm3 + ror ecx,7 + add eax,ebx + add ebp,DWORD[4+rsp] + movups xmm1,XMMWORD[((-16))+r15] +DB 102,15,56,220,208 + xor edi,ecx + mov esi,eax + rol eax,5 + add ebp,edi + xor esi,ecx + ror ebx,7 + add ebp,eax + add edx,DWORD[8+rsp] + xor esi,ebx + mov edi,ebp + rol ebp,5 + add edx,esi + xor edi,ebx + ror eax,7 + add edx,ebp + add ecx,DWORD[12+rsp] + xor edi,eax + mov esi,edx + rol edx,5 + add ecx,edi + movups xmm0,XMMWORD[r15] +DB 102,15,56,220,209 + xor esi,eax + ror ebp,7 + add ecx,edx + cmp r10,r14 + je NEAR $L$done_ssse3 + movdqa xmm3,XMMWORD[64+r11] + movdqa xmm13,XMMWORD[r11] + movdqu xmm4,XMMWORD[r10] + movdqu xmm5,XMMWORD[16+r10] + movdqu xmm6,XMMWORD[32+r10] + movdqu xmm7,XMMWORD[48+r10] +DB 102,15,56,0,227 + add r10,64 + add ebx,DWORD[16+rsp] + xor esi,ebp + mov edi,ecx +DB 102,15,56,0,235 + rol ecx,5 + add ebx,esi + xor edi,ebp + ror edx,7 + paddd xmm4,xmm13 + add ebx,ecx + add eax,DWORD[20+rsp] + xor edi,edx + mov esi,ebx + movdqa XMMWORD[rsp],xmm4 + rol ebx,5 + add eax,edi + xor esi,edx + ror ecx,7 + psubd xmm4,xmm13 + add eax,ebx + add ebp,DWORD[24+rsp] + movups xmm1,XMMWORD[16+r15] +DB 102,15,56,220,208 + xor esi,ecx + mov edi,eax + rol eax,5 + add ebp,esi + xor edi,ecx + ror 
ebx,7 + add ebp,eax + add edx,DWORD[28+rsp] + xor edi,ebx + mov esi,ebp + rol ebp,5 + add edx,edi + xor esi,ebx + ror eax,7 + add edx,ebp + add ecx,DWORD[32+rsp] + xor esi,eax + mov edi,edx +DB 102,15,56,0,243 + rol edx,5 + add ecx,esi + movups xmm0,XMMWORD[32+r15] +DB 102,15,56,220,209 + xor edi,eax + ror ebp,7 + paddd xmm5,xmm13 + add ecx,edx + add ebx,DWORD[36+rsp] + xor edi,ebp + mov esi,ecx + movdqa XMMWORD[16+rsp],xmm5 + rol ecx,5 + add ebx,edi + xor esi,ebp + ror edx,7 + psubd xmm5,xmm13 + add ebx,ecx + add eax,DWORD[40+rsp] + xor esi,edx + mov edi,ebx + rol ebx,5 + add eax,esi + xor edi,edx + ror ecx,7 + add eax,ebx + add ebp,DWORD[44+rsp] + movups xmm1,XMMWORD[48+r15] +DB 102,15,56,220,208 + xor edi,ecx + mov esi,eax + rol eax,5 + add ebp,edi + xor esi,ecx + ror ebx,7 + add ebp,eax + add edx,DWORD[48+rsp] + xor esi,ebx + mov edi,ebp +DB 102,15,56,0,251 + rol ebp,5 + add edx,esi + xor edi,ebx + ror eax,7 + paddd xmm6,xmm13 + add edx,ebp + add ecx,DWORD[52+rsp] + xor edi,eax + mov esi,edx + movdqa XMMWORD[32+rsp],xmm6 + rol edx,5 + add ecx,edi + cmp r8d,11 + jb NEAR $L$aesenclast4 + movups xmm0,XMMWORD[64+r15] +DB 102,15,56,220,209 + movups xmm1,XMMWORD[80+r15] +DB 102,15,56,220,208 + je NEAR $L$aesenclast4 + movups xmm0,XMMWORD[96+r15] +DB 102,15,56,220,209 + movups xmm1,XMMWORD[112+r15] +DB 102,15,56,220,208 +$L$aesenclast4: +DB 102,15,56,221,209 + movups xmm0,XMMWORD[((16-112))+r15] + xor esi,eax + ror ebp,7 + psubd xmm6,xmm13 + add ecx,edx + add ebx,DWORD[56+rsp] + xor esi,ebp + mov edi,ecx + rol ecx,5 + add ebx,esi + xor edi,ebp + ror edx,7 + add ebx,ecx + add eax,DWORD[60+rsp] + xor edi,edx + mov esi,ebx + rol ebx,5 + add eax,edi + ror ecx,7 + add eax,ebx + movups XMMWORD[48+r12*1+r13],xmm2 + lea r12,[64+r12] + + add eax,DWORD[r9] + add esi,DWORD[4+r9] + add ecx,DWORD[8+r9] + add edx,DWORD[12+r9] + mov DWORD[r9],eax + add ebp,DWORD[16+r9] + mov DWORD[4+r9],esi + mov ebx,esi + mov DWORD[8+r9],ecx + mov edi,ecx + mov DWORD[12+r9],edx + xor edi,edx + mov DWORD[16+r9],ebp + and esi,edi + jmp NEAR $L$oop_ssse3 + +$L$done_ssse3: + add ebx,DWORD[16+rsp] + xor esi,ebp + mov edi,ecx + rol ecx,5 + add ebx,esi + xor edi,ebp + ror edx,7 + add ebx,ecx + add eax,DWORD[20+rsp] + xor edi,edx + mov esi,ebx + rol ebx,5 + add eax,edi + xor esi,edx + ror ecx,7 + add eax,ebx + add ebp,DWORD[24+rsp] + movups xmm1,XMMWORD[16+r15] +DB 102,15,56,220,208 + xor esi,ecx + mov edi,eax + rol eax,5 + add ebp,esi + xor edi,ecx + ror ebx,7 + add ebp,eax + add edx,DWORD[28+rsp] + xor edi,ebx + mov esi,ebp + rol ebp,5 + add edx,edi + xor esi,ebx + ror eax,7 + add edx,ebp + add ecx,DWORD[32+rsp] + xor esi,eax + mov edi,edx + rol edx,5 + add ecx,esi + movups xmm0,XMMWORD[32+r15] +DB 102,15,56,220,209 + xor edi,eax + ror ebp,7 + add ecx,edx + add ebx,DWORD[36+rsp] + xor edi,ebp + mov esi,ecx + rol ecx,5 + add ebx,edi + xor esi,ebp + ror edx,7 + add ebx,ecx + add eax,DWORD[40+rsp] + xor esi,edx + mov edi,ebx + rol ebx,5 + add eax,esi + xor edi,edx + ror ecx,7 + add eax,ebx + add ebp,DWORD[44+rsp] + movups xmm1,XMMWORD[48+r15] +DB 102,15,56,220,208 + xor edi,ecx + mov esi,eax + rol eax,5 + add ebp,edi + xor esi,ecx + ror ebx,7 + add ebp,eax + add edx,DWORD[48+rsp] + xor esi,ebx + mov edi,ebp + rol ebp,5 + add edx,esi + xor edi,ebx + ror eax,7 + add edx,ebp + add ecx,DWORD[52+rsp] + xor edi,eax + mov esi,edx + rol edx,5 + add ecx,edi + cmp r8d,11 + jb NEAR $L$aesenclast5 + movups xmm0,XMMWORD[64+r15] +DB 102,15,56,220,209 + movups xmm1,XMMWORD[80+r15] +DB 102,15,56,220,208 + je NEAR $L$aesenclast5 + movups 
xmm0,XMMWORD[96+r15] +DB 102,15,56,220,209 + movups xmm1,XMMWORD[112+r15] +DB 102,15,56,220,208 +$L$aesenclast5: +DB 102,15,56,221,209 + movups xmm0,XMMWORD[((16-112))+r15] + xor esi,eax + ror ebp,7 + add ecx,edx + add ebx,DWORD[56+rsp] + xor esi,ebp + mov edi,ecx + rol ecx,5 + add ebx,esi + xor edi,ebp + ror edx,7 + add ebx,ecx + add eax,DWORD[60+rsp] + xor edi,edx + mov esi,ebx + rol ebx,5 + add eax,edi + ror ecx,7 + add eax,ebx + movups XMMWORD[48+r12*1+r13],xmm2 + mov r8,QWORD[88+rsp] + + add eax,DWORD[r9] + add esi,DWORD[4+r9] + add ecx,DWORD[8+r9] + mov DWORD[r9],eax + add edx,DWORD[12+r9] + mov DWORD[4+r9],esi + add ebp,DWORD[16+r9] + mov DWORD[8+r9],ecx + mov DWORD[12+r9],edx + mov DWORD[16+r9],ebp + movups XMMWORD[r8],xmm2 + movaps xmm6,XMMWORD[((96+0))+rsp] + movaps xmm7,XMMWORD[((96+16))+rsp] + movaps xmm8,XMMWORD[((96+32))+rsp] + movaps xmm9,XMMWORD[((96+48))+rsp] + movaps xmm10,XMMWORD[((96+64))+rsp] + movaps xmm11,XMMWORD[((96+80))+rsp] + movaps xmm12,XMMWORD[((96+96))+rsp] + movaps xmm13,XMMWORD[((96+112))+rsp] + movaps xmm14,XMMWORD[((96+128))+rsp] + movaps xmm15,XMMWORD[((96+144))+rsp] + lea rsi,[264+rsp] + + mov r15,QWORD[rsi] + + mov r14,QWORD[8+rsi] + + mov r13,QWORD[16+rsi] + + mov r12,QWORD[24+rsi] + + mov rbp,QWORD[32+rsi] + + mov rbx,QWORD[40+rsi] + + lea rsp,[48+rsi] + +$L$epilogue_ssse3: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_aesni_cbc_sha1_enc_ssse3: +ALIGN 64 +K_XX_XX: + DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999 + DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 + DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc + DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f +DB 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0 + +DB 65,69,83,78,73,45,67,66,67,43,83,72,65,49,32,115 +DB 116,105,116,99,104,32,102,111,114,32,120,56,54,95,54,52 +DB 44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32 +DB 60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111 +DB 114,103,62,0 +ALIGN 64 + +ALIGN 32 +aesni_cbc_sha1_enc_shaext: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_aesni_cbc_sha1_enc_shaext: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + mov rcx,r9 + mov r8,QWORD[40+rsp] + mov r9,QWORD[48+rsp] + + + + mov r10,QWORD[56+rsp] + lea rsp,[((-168))+rsp] + movaps XMMWORD[(-8-160)+rax],xmm6 + movaps XMMWORD[(-8-144)+rax],xmm7 + movaps XMMWORD[(-8-128)+rax],xmm8 + movaps XMMWORD[(-8-112)+rax],xmm9 + movaps XMMWORD[(-8-96)+rax],xmm10 + movaps XMMWORD[(-8-80)+rax],xmm11 + movaps XMMWORD[(-8-64)+rax],xmm12 + movaps XMMWORD[(-8-48)+rax],xmm13 + movaps XMMWORD[(-8-32)+rax],xmm14 + movaps XMMWORD[(-8-16)+rax],xmm15 +$L$prologue_shaext: + movdqu xmm8,XMMWORD[r9] + movd xmm9,DWORD[16+r9] + movdqa xmm7,XMMWORD[((K_XX_XX+80))] + + mov r11d,DWORD[240+rcx] + sub rsi,rdi + movups xmm15,XMMWORD[rcx] + movups xmm2,XMMWORD[r8] + movups xmm0,XMMWORD[16+rcx] + lea rcx,[112+rcx] + + pshufd xmm8,xmm8,27 + pshufd xmm9,xmm9,27 + jmp NEAR $L$oop_shaext + +ALIGN 16 +$L$oop_shaext: + movups xmm14,XMMWORD[rdi] + xorps xmm14,xmm15 + xorps xmm2,xmm14 + movups xmm1,XMMWORD[((-80))+rcx] +DB 102,15,56,220,208 + movdqu xmm3,XMMWORD[r10] + movdqa xmm12,xmm9 +DB 102,15,56,0,223 + movdqu xmm4,XMMWORD[16+r10] + movdqa xmm11,xmm8 + movups xmm0,XMMWORD[((-64))+rcx] +DB 102,15,56,220,209 +DB 102,15,56,0,231 + + paddd xmm9,xmm3 + movdqu xmm5,XMMWORD[32+r10] + lea r10,[64+r10] + pxor xmm3,xmm12 + movups xmm1,XMMWORD[((-48))+rcx] +DB 102,15,56,220,208 + pxor 
xmm3,xmm12 + movdqa xmm10,xmm8 +DB 102,15,56,0,239 +DB 69,15,58,204,193,0 +DB 68,15,56,200,212 + movups xmm0,XMMWORD[((-32))+rcx] +DB 102,15,56,220,209 +DB 15,56,201,220 + movdqu xmm6,XMMWORD[((-16))+r10] + movdqa xmm9,xmm8 +DB 102,15,56,0,247 + movups xmm1,XMMWORD[((-16))+rcx] +DB 102,15,56,220,208 +DB 69,15,58,204,194,0 +DB 68,15,56,200,205 + pxor xmm3,xmm5 +DB 15,56,201,229 + movups xmm0,XMMWORD[rcx] +DB 102,15,56,220,209 + movdqa xmm10,xmm8 +DB 69,15,58,204,193,0 +DB 68,15,56,200,214 + movups xmm1,XMMWORD[16+rcx] +DB 102,15,56,220,208 +DB 15,56,202,222 + pxor xmm4,xmm6 +DB 15,56,201,238 + movups xmm0,XMMWORD[32+rcx] +DB 102,15,56,220,209 + movdqa xmm9,xmm8 +DB 69,15,58,204,194,0 +DB 68,15,56,200,203 + movups xmm1,XMMWORD[48+rcx] +DB 102,15,56,220,208 +DB 15,56,202,227 + pxor xmm5,xmm3 +DB 15,56,201,243 + cmp r11d,11 + jb NEAR $L$aesenclast6 + movups xmm0,XMMWORD[64+rcx] +DB 102,15,56,220,209 + movups xmm1,XMMWORD[80+rcx] +DB 102,15,56,220,208 + je NEAR $L$aesenclast6 + movups xmm0,XMMWORD[96+rcx] +DB 102,15,56,220,209 + movups xmm1,XMMWORD[112+rcx] +DB 102,15,56,220,208 +$L$aesenclast6: +DB 102,15,56,221,209 + movups xmm0,XMMWORD[((16-112))+rcx] + movdqa xmm10,xmm8 +DB 69,15,58,204,193,0 +DB 68,15,56,200,212 + movups xmm14,XMMWORD[16+rdi] + xorps xmm14,xmm15 + movups XMMWORD[rdi*1+rsi],xmm2 + xorps xmm2,xmm14 + movups xmm1,XMMWORD[((-80))+rcx] +DB 102,15,56,220,208 +DB 15,56,202,236 + pxor xmm6,xmm4 +DB 15,56,201,220 + movups xmm0,XMMWORD[((-64))+rcx] +DB 102,15,56,220,209 + movdqa xmm9,xmm8 +DB 69,15,58,204,194,1 +DB 68,15,56,200,205 + movups xmm1,XMMWORD[((-48))+rcx] +DB 102,15,56,220,208 +DB 15,56,202,245 + pxor xmm3,xmm5 +DB 15,56,201,229 + movups xmm0,XMMWORD[((-32))+rcx] +DB 102,15,56,220,209 + movdqa xmm10,xmm8 +DB 69,15,58,204,193,1 +DB 68,15,56,200,214 + movups xmm1,XMMWORD[((-16))+rcx] +DB 102,15,56,220,208 +DB 15,56,202,222 + pxor xmm4,xmm6 +DB 15,56,201,238 + movups xmm0,XMMWORD[rcx] +DB 102,15,56,220,209 + movdqa xmm9,xmm8 +DB 69,15,58,204,194,1 +DB 68,15,56,200,203 + movups xmm1,XMMWORD[16+rcx] +DB 102,15,56,220,208 +DB 15,56,202,227 + pxor xmm5,xmm3 +DB 15,56,201,243 + movups xmm0,XMMWORD[32+rcx] +DB 102,15,56,220,209 + movdqa xmm10,xmm8 +DB 69,15,58,204,193,1 +DB 68,15,56,200,212 + movups xmm1,XMMWORD[48+rcx] +DB 102,15,56,220,208 +DB 15,56,202,236 + pxor xmm6,xmm4 +DB 15,56,201,220 + cmp r11d,11 + jb NEAR $L$aesenclast7 + movups xmm0,XMMWORD[64+rcx] +DB 102,15,56,220,209 + movups xmm1,XMMWORD[80+rcx] +DB 102,15,56,220,208 + je NEAR $L$aesenclast7 + movups xmm0,XMMWORD[96+rcx] +DB 102,15,56,220,209 + movups xmm1,XMMWORD[112+rcx] +DB 102,15,56,220,208 +$L$aesenclast7: +DB 102,15,56,221,209 + movups xmm0,XMMWORD[((16-112))+rcx] + movdqa xmm9,xmm8 +DB 69,15,58,204,194,1 +DB 68,15,56,200,205 + movups xmm14,XMMWORD[32+rdi] + xorps xmm14,xmm15 + movups XMMWORD[16+rdi*1+rsi],xmm2 + xorps xmm2,xmm14 + movups xmm1,XMMWORD[((-80))+rcx] +DB 102,15,56,220,208 +DB 15,56,202,245 + pxor xmm3,xmm5 +DB 15,56,201,229 + movups xmm0,XMMWORD[((-64))+rcx] +DB 102,15,56,220,209 + movdqa xmm10,xmm8 +DB 69,15,58,204,193,2 +DB 68,15,56,200,214 + movups xmm1,XMMWORD[((-48))+rcx] +DB 102,15,56,220,208 +DB 15,56,202,222 + pxor xmm4,xmm6 +DB 15,56,201,238 + movups xmm0,XMMWORD[((-32))+rcx] +DB 102,15,56,220,209 + movdqa xmm9,xmm8 +DB 69,15,58,204,194,2 +DB 68,15,56,200,203 + movups xmm1,XMMWORD[((-16))+rcx] +DB 102,15,56,220,208 +DB 15,56,202,227 + pxor xmm5,xmm3 +DB 15,56,201,243 + movups xmm0,XMMWORD[rcx] +DB 102,15,56,220,209 + movdqa xmm10,xmm8 +DB 69,15,58,204,193,2 +DB 68,15,56,200,212 + 
movups xmm1,XMMWORD[16+rcx] +DB 102,15,56,220,208 +DB 15,56,202,236 + pxor xmm6,xmm4 +DB 15,56,201,220 + movups xmm0,XMMWORD[32+rcx] +DB 102,15,56,220,209 + movdqa xmm9,xmm8 +DB 69,15,58,204,194,2 +DB 68,15,56,200,205 + movups xmm1,XMMWORD[48+rcx] +DB 102,15,56,220,208 +DB 15,56,202,245 + pxor xmm3,xmm5 +DB 15,56,201,229 + cmp r11d,11 + jb NEAR $L$aesenclast8 + movups xmm0,XMMWORD[64+rcx] +DB 102,15,56,220,209 + movups xmm1,XMMWORD[80+rcx] +DB 102,15,56,220,208 + je NEAR $L$aesenclast8 + movups xmm0,XMMWORD[96+rcx] +DB 102,15,56,220,209 + movups xmm1,XMMWORD[112+rcx] +DB 102,15,56,220,208 +$L$aesenclast8: +DB 102,15,56,221,209 + movups xmm0,XMMWORD[((16-112))+rcx] + movdqa xmm10,xmm8 +DB 69,15,58,204,193,2 +DB 68,15,56,200,214 + movups xmm14,XMMWORD[48+rdi] + xorps xmm14,xmm15 + movups XMMWORD[32+rdi*1+rsi],xmm2 + xorps xmm2,xmm14 + movups xmm1,XMMWORD[((-80))+rcx] +DB 102,15,56,220,208 +DB 15,56,202,222 + pxor xmm4,xmm6 +DB 15,56,201,238 + movups xmm0,XMMWORD[((-64))+rcx] +DB 102,15,56,220,209 + movdqa xmm9,xmm8 +DB 69,15,58,204,194,3 +DB 68,15,56,200,203 + movups xmm1,XMMWORD[((-48))+rcx] +DB 102,15,56,220,208 +DB 15,56,202,227 + pxor xmm5,xmm3 +DB 15,56,201,243 + movups xmm0,XMMWORD[((-32))+rcx] +DB 102,15,56,220,209 + movdqa xmm10,xmm8 +DB 69,15,58,204,193,3 +DB 68,15,56,200,212 +DB 15,56,202,236 + pxor xmm6,xmm4 + movups xmm1,XMMWORD[((-16))+rcx] +DB 102,15,56,220,208 + movdqa xmm9,xmm8 +DB 69,15,58,204,194,3 +DB 68,15,56,200,205 +DB 15,56,202,245 + movups xmm0,XMMWORD[rcx] +DB 102,15,56,220,209 + movdqa xmm5,xmm12 + movdqa xmm10,xmm8 +DB 69,15,58,204,193,3 +DB 68,15,56,200,214 + movups xmm1,XMMWORD[16+rcx] +DB 102,15,56,220,208 + movdqa xmm9,xmm8 +DB 69,15,58,204,194,3 +DB 68,15,56,200,205 + movups xmm0,XMMWORD[32+rcx] +DB 102,15,56,220,209 + movups xmm1,XMMWORD[48+rcx] +DB 102,15,56,220,208 + cmp r11d,11 + jb NEAR $L$aesenclast9 + movups xmm0,XMMWORD[64+rcx] +DB 102,15,56,220,209 + movups xmm1,XMMWORD[80+rcx] +DB 102,15,56,220,208 + je NEAR $L$aesenclast9 + movups xmm0,XMMWORD[96+rcx] +DB 102,15,56,220,209 + movups xmm1,XMMWORD[112+rcx] +DB 102,15,56,220,208 +$L$aesenclast9: +DB 102,15,56,221,209 + movups xmm0,XMMWORD[((16-112))+rcx] + dec rdx + + paddd xmm8,xmm11 + movups XMMWORD[48+rdi*1+rsi],xmm2 + lea rdi,[64+rdi] + jnz NEAR $L$oop_shaext + + pshufd xmm8,xmm8,27 + pshufd xmm9,xmm9,27 + movups XMMWORD[r8],xmm2 + movdqu XMMWORD[r9],xmm8 + movd DWORD[16+r9],xmm9 + movaps xmm6,XMMWORD[((-8-160))+rax] + movaps xmm7,XMMWORD[((-8-144))+rax] + movaps xmm8,XMMWORD[((-8-128))+rax] + movaps xmm9,XMMWORD[((-8-112))+rax] + movaps xmm10,XMMWORD[((-8-96))+rax] + movaps xmm11,XMMWORD[((-8-80))+rax] + movaps xmm12,XMMWORD[((-8-64))+rax] + movaps xmm13,XMMWORD[((-8-48))+rax] + movaps xmm14,XMMWORD[((-8-32))+rax] + movaps xmm15,XMMWORD[((-8-16))+rax] + mov rsp,rax +$L$epilogue_shaext: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_aesni_cbc_sha1_enc_shaext: +EXTERN __imp_RtlVirtualUnwind + +ALIGN 16 +ssse3_handler: + push rsi + push rdi + push rbx + push rbp + push r12 + push r13 + push r14 + push r15 + pushfq + sub rsp,64 + + mov rax,QWORD[120+r8] + mov rbx,QWORD[248+r8] + + mov rsi,QWORD[8+r9] + mov r11,QWORD[56+r9] + + mov r10d,DWORD[r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jb NEAR $L$common_seh_tail + + mov rax,QWORD[152+r8] + + mov r10d,DWORD[4+r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jae NEAR $L$common_seh_tail + lea r10,[aesni_cbc_sha1_enc_shaext] + cmp rbx,r10 + jb NEAR $L$seh_no_shaext + + lea rsi,[rax] + lea rdi,[512+r8] + mov 
ecx,20 + DD 0xa548f3fc + lea rax,[168+rax] + jmp NEAR $L$common_seh_tail +$L$seh_no_shaext: + lea rsi,[96+rax] + lea rdi,[512+r8] + mov ecx,20 + DD 0xa548f3fc + lea rax,[264+rax] + + mov r15,QWORD[rax] + mov r14,QWORD[8+rax] + mov r13,QWORD[16+rax] + mov r12,QWORD[24+rax] + mov rbp,QWORD[32+rax] + mov rbx,QWORD[40+rax] + lea rax,[48+rax] + mov QWORD[144+r8],rbx + mov QWORD[160+r8],rbp + mov QWORD[216+r8],r12 + mov QWORD[224+r8],r13 + mov QWORD[232+r8],r14 + mov QWORD[240+r8],r15 + +$L$common_seh_tail: + mov rdi,QWORD[8+rax] + mov rsi,QWORD[16+rax] + mov QWORD[152+r8],rax + mov QWORD[168+r8],rsi + mov QWORD[176+r8],rdi + + mov rdi,QWORD[40+r9] + mov rsi,r8 + mov ecx,154 + DD 0xa548f3fc + + mov rsi,r9 + xor rcx,rcx + mov rdx,QWORD[8+rsi] + mov r8,QWORD[rsi] + mov r9,QWORD[16+rsi] + mov r10,QWORD[40+rsi] + lea r11,[56+rsi] + lea r12,[24+rsi] + mov QWORD[32+rsp],r10 + mov QWORD[40+rsp],r11 + mov QWORD[48+rsp],r12 + mov QWORD[56+rsp],rcx + call QWORD[__imp_RtlVirtualUnwind] + + mov eax,1 + add rsp,64 + popfq + pop r15 + pop r14 + pop r13 + pop r12 + pop rbp + pop rbx + pop rdi + pop rsi + DB 0F3h,0C3h ;repret + + +section .pdata rdata align=4 +ALIGN 4 + DD $L$SEH_begin_aesni_cbc_sha1_enc_ssse3 wrt ..imagebase + DD $L$SEH_end_aesni_cbc_sha1_enc_ssse3 wrt ..imagebase + DD $L$SEH_info_aesni_cbc_sha1_enc_ssse3 wrt ..imagebase + DD $L$SEH_begin_aesni_cbc_sha1_enc_shaext wrt ..imagebase + DD $L$SEH_end_aesni_cbc_sha1_enc_shaext wrt ..imagebase + DD $L$SEH_info_aesni_cbc_sha1_enc_shaext wrt ..imagebase +section .xdata rdata align=8 +ALIGN 8 +$L$SEH_info_aesni_cbc_sha1_enc_ssse3: +DB 9,0,0,0 + DD ssse3_handler wrt ..imagebase + DD $L$prologue_ssse3 wrt ..imagebase,$L$epilogue_ssse3 wrt ..imagebase +$L$SEH_info_aesni_cbc_sha1_enc_shaext: +DB 9,0,0,0 + DD ssse3_handler wrt ..imagebase + DD $L$prologue_shaext wrt ..imagebase,$L$epilogue_shaext wrt ..imagebase diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm new file mode 100644 index 0000000000..f5c250b904 --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm @@ -0,0 +1,78 @@ +; WARNING: do not edit! +; Generated from openssl/crypto/aes/asm/aesni-sha256-x86_64.pl +; +; Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. +; +; Licensed under the OpenSSL license (the "License"). You may not use +; this file except in compliance with the License. 
You can obtain a copy +; in the file LICENSE in the source distribution or at +; https://www.openssl.org/source/license.html + +default rel +%define XMMWORD +%define YMMWORD +%define ZMMWORD +section .text code align=64 + + +EXTERN OPENSSL_ia32cap_P +global aesni_cbc_sha256_enc + +ALIGN 16 +aesni_cbc_sha256_enc: + + xor eax,eax + cmp rcx,0 + je NEAR $L$probe + ud2 +$L$probe: + DB 0F3h,0C3h ;repret + + + +ALIGN 64 + +K256: + DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 + DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 + DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 + DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 + DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 + DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 + DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 + DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 + DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc + DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc + DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da + DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da + DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 + DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 + DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 + DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 + DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 + DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 + DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 + DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 + DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 + DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 + DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 + DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 + DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 + DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 + DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 + DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 + DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 + DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 + DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 + DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 + + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f + DD 0,0,0,0,0,0,0,0,-1,-1,-1,-1 + DD 0,0,0,0,0,0,0,0 +DB 65,69,83,78,73,45,67,66,67,43,83,72,65,50,53,54 +DB 32,115,116,105,116,99,104,32,102,111,114,32,120,56,54,95 +DB 54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98 +DB 121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108 +DB 46,111,114,103,62,0 +ALIGN 64 diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm new file mode 100644 index 0000000000..57ee23ea8c --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm @@ -0,0 +1,5103 @@ +; WARNING: do not edit! +; Generated from openssl/crypto/aes/asm/aesni-x86_64.pl +; +; Copyright 2009-2020 The OpenSSL Project Authors. All Rights Reserved. +; +; Licensed under the OpenSSL license (the "License"). You may not use +; this file except in compliance with the License. 
You can obtain a copy +; in the file LICENSE in the source distribution or at +; https://www.openssl.org/source/license.html + +default rel +%define XMMWORD +%define YMMWORD +%define ZMMWORD +section .text code align=64 + +EXTERN OPENSSL_ia32cap_P +global aesni_encrypt + +ALIGN 16 +aesni_encrypt: + + movups xmm2,XMMWORD[rcx] + mov eax,DWORD[240+r8] + movups xmm0,XMMWORD[r8] + movups xmm1,XMMWORD[16+r8] + lea r8,[32+r8] + xorps xmm2,xmm0 +$L$oop_enc1_1: +DB 102,15,56,220,209 + dec eax + movups xmm1,XMMWORD[r8] + lea r8,[16+r8] + jnz NEAR $L$oop_enc1_1 +DB 102,15,56,221,209 + pxor xmm0,xmm0 + pxor xmm1,xmm1 + movups XMMWORD[rdx],xmm2 + pxor xmm2,xmm2 + DB 0F3h,0C3h ;repret + + + +global aesni_decrypt + +ALIGN 16 +aesni_decrypt: + + movups xmm2,XMMWORD[rcx] + mov eax,DWORD[240+r8] + movups xmm0,XMMWORD[r8] + movups xmm1,XMMWORD[16+r8] + lea r8,[32+r8] + xorps xmm2,xmm0 +$L$oop_dec1_2: +DB 102,15,56,222,209 + dec eax + movups xmm1,XMMWORD[r8] + lea r8,[16+r8] + jnz NEAR $L$oop_dec1_2 +DB 102,15,56,223,209 + pxor xmm0,xmm0 + pxor xmm1,xmm1 + movups XMMWORD[rdx],xmm2 + pxor xmm2,xmm2 + DB 0F3h,0C3h ;repret + + + +ALIGN 16 +_aesni_encrypt2: + + movups xmm0,XMMWORD[rcx] + shl eax,4 + movups xmm1,XMMWORD[16+rcx] + xorps xmm2,xmm0 + xorps xmm3,xmm0 + movups xmm0,XMMWORD[32+rcx] + lea rcx,[32+rax*1+rcx] + neg rax + add rax,16 + +$L$enc_loop2: +DB 102,15,56,220,209 +DB 102,15,56,220,217 + movups xmm1,XMMWORD[rax*1+rcx] + add rax,32 +DB 102,15,56,220,208 +DB 102,15,56,220,216 + movups xmm0,XMMWORD[((-16))+rax*1+rcx] + jnz NEAR $L$enc_loop2 + +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,221,208 +DB 102,15,56,221,216 + DB 0F3h,0C3h ;repret + + + +ALIGN 16 +_aesni_decrypt2: + + movups xmm0,XMMWORD[rcx] + shl eax,4 + movups xmm1,XMMWORD[16+rcx] + xorps xmm2,xmm0 + xorps xmm3,xmm0 + movups xmm0,XMMWORD[32+rcx] + lea rcx,[32+rax*1+rcx] + neg rax + add rax,16 + +$L$dec_loop2: +DB 102,15,56,222,209 +DB 102,15,56,222,217 + movups xmm1,XMMWORD[rax*1+rcx] + add rax,32 +DB 102,15,56,222,208 +DB 102,15,56,222,216 + movups xmm0,XMMWORD[((-16))+rax*1+rcx] + jnz NEAR $L$dec_loop2 + +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,223,208 +DB 102,15,56,223,216 + DB 0F3h,0C3h ;repret + + + +ALIGN 16 +_aesni_encrypt3: + + movups xmm0,XMMWORD[rcx] + shl eax,4 + movups xmm1,XMMWORD[16+rcx] + xorps xmm2,xmm0 + xorps xmm3,xmm0 + xorps xmm4,xmm0 + movups xmm0,XMMWORD[32+rcx] + lea rcx,[32+rax*1+rcx] + neg rax + add rax,16 + +$L$enc_loop3: +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 + movups xmm1,XMMWORD[rax*1+rcx] + add rax,32 +DB 102,15,56,220,208 +DB 102,15,56,220,216 +DB 102,15,56,220,224 + movups xmm0,XMMWORD[((-16))+rax*1+rcx] + jnz NEAR $L$enc_loop3 + +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 +DB 102,15,56,221,208 +DB 102,15,56,221,216 +DB 102,15,56,221,224 + DB 0F3h,0C3h ;repret + + + +ALIGN 16 +_aesni_decrypt3: + + movups xmm0,XMMWORD[rcx] + shl eax,4 + movups xmm1,XMMWORD[16+rcx] + xorps xmm2,xmm0 + xorps xmm3,xmm0 + xorps xmm4,xmm0 + movups xmm0,XMMWORD[32+rcx] + lea rcx,[32+rax*1+rcx] + neg rax + add rax,16 + +$L$dec_loop3: +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 + movups xmm1,XMMWORD[rax*1+rcx] + add rax,32 +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 + movups xmm0,XMMWORD[((-16))+rax*1+rcx] + jnz NEAR $L$dec_loop3 + +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,223,208 +DB 102,15,56,223,216 +DB 102,15,56,223,224 + DB 0F3h,0C3h ;repret + + + +ALIGN 16 
+_aesni_encrypt4: + + movups xmm0,XMMWORD[rcx] + shl eax,4 + movups xmm1,XMMWORD[16+rcx] + xorps xmm2,xmm0 + xorps xmm3,xmm0 + xorps xmm4,xmm0 + xorps xmm5,xmm0 + movups xmm0,XMMWORD[32+rcx] + lea rcx,[32+rax*1+rcx] + neg rax +DB 0x0f,0x1f,0x00 + add rax,16 + +$L$enc_loop4: +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 +DB 102,15,56,220,233 + movups xmm1,XMMWORD[rax*1+rcx] + add rax,32 +DB 102,15,56,220,208 +DB 102,15,56,220,216 +DB 102,15,56,220,224 +DB 102,15,56,220,232 + movups xmm0,XMMWORD[((-16))+rax*1+rcx] + jnz NEAR $L$enc_loop4 + +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 +DB 102,15,56,220,233 +DB 102,15,56,221,208 +DB 102,15,56,221,216 +DB 102,15,56,221,224 +DB 102,15,56,221,232 + DB 0F3h,0C3h ;repret + + + +ALIGN 16 +_aesni_decrypt4: + + movups xmm0,XMMWORD[rcx] + shl eax,4 + movups xmm1,XMMWORD[16+rcx] + xorps xmm2,xmm0 + xorps xmm3,xmm0 + xorps xmm4,xmm0 + xorps xmm5,xmm0 + movups xmm0,XMMWORD[32+rcx] + lea rcx,[32+rax*1+rcx] + neg rax +DB 0x0f,0x1f,0x00 + add rax,16 + +$L$dec_loop4: +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,222,233 + movups xmm1,XMMWORD[rax*1+rcx] + add rax,32 +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 +DB 102,15,56,222,232 + movups xmm0,XMMWORD[((-16))+rax*1+rcx] + jnz NEAR $L$dec_loop4 + +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,222,233 +DB 102,15,56,223,208 +DB 102,15,56,223,216 +DB 102,15,56,223,224 +DB 102,15,56,223,232 + DB 0F3h,0C3h ;repret + + + +ALIGN 16 +_aesni_encrypt6: + + movups xmm0,XMMWORD[rcx] + shl eax,4 + movups xmm1,XMMWORD[16+rcx] + xorps xmm2,xmm0 + pxor xmm3,xmm0 + pxor xmm4,xmm0 +DB 102,15,56,220,209 + lea rcx,[32+rax*1+rcx] + neg rax +DB 102,15,56,220,217 + pxor xmm5,xmm0 + pxor xmm6,xmm0 +DB 102,15,56,220,225 + pxor xmm7,xmm0 + movups xmm0,XMMWORD[rax*1+rcx] + add rax,16 + jmp NEAR $L$enc_loop6_enter +ALIGN 16 +$L$enc_loop6: +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 +$L$enc_loop6_enter: +DB 102,15,56,220,233 +DB 102,15,56,220,241 +DB 102,15,56,220,249 + movups xmm1,XMMWORD[rax*1+rcx] + add rax,32 +DB 102,15,56,220,208 +DB 102,15,56,220,216 +DB 102,15,56,220,224 +DB 102,15,56,220,232 +DB 102,15,56,220,240 +DB 102,15,56,220,248 + movups xmm0,XMMWORD[((-16))+rax*1+rcx] + jnz NEAR $L$enc_loop6 + +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 +DB 102,15,56,220,233 +DB 102,15,56,220,241 +DB 102,15,56,220,249 +DB 102,15,56,221,208 +DB 102,15,56,221,216 +DB 102,15,56,221,224 +DB 102,15,56,221,232 +DB 102,15,56,221,240 +DB 102,15,56,221,248 + DB 0F3h,0C3h ;repret + + + +ALIGN 16 +_aesni_decrypt6: + + movups xmm0,XMMWORD[rcx] + shl eax,4 + movups xmm1,XMMWORD[16+rcx] + xorps xmm2,xmm0 + pxor xmm3,xmm0 + pxor xmm4,xmm0 +DB 102,15,56,222,209 + lea rcx,[32+rax*1+rcx] + neg rax +DB 102,15,56,222,217 + pxor xmm5,xmm0 + pxor xmm6,xmm0 +DB 102,15,56,222,225 + pxor xmm7,xmm0 + movups xmm0,XMMWORD[rax*1+rcx] + add rax,16 + jmp NEAR $L$dec_loop6_enter +ALIGN 16 +$L$dec_loop6: +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +$L$dec_loop6_enter: +DB 102,15,56,222,233 +DB 102,15,56,222,241 +DB 102,15,56,222,249 + movups xmm1,XMMWORD[rax*1+rcx] + add rax,32 +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 +DB 102,15,56,222,232 +DB 102,15,56,222,240 +DB 102,15,56,222,248 + movups xmm0,XMMWORD[((-16))+rax*1+rcx] + jnz NEAR $L$dec_loop6 + +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,222,233 +DB 102,15,56,222,241 
+DB 102,15,56,222,249 +DB 102,15,56,223,208 +DB 102,15,56,223,216 +DB 102,15,56,223,224 +DB 102,15,56,223,232 +DB 102,15,56,223,240 +DB 102,15,56,223,248 + DB 0F3h,0C3h ;repret + + + +ALIGN 16 +_aesni_encrypt8: + + movups xmm0,XMMWORD[rcx] + shl eax,4 + movups xmm1,XMMWORD[16+rcx] + xorps xmm2,xmm0 + xorps xmm3,xmm0 + pxor xmm4,xmm0 + pxor xmm5,xmm0 + pxor xmm6,xmm0 + lea rcx,[32+rax*1+rcx] + neg rax +DB 102,15,56,220,209 + pxor xmm7,xmm0 + pxor xmm8,xmm0 +DB 102,15,56,220,217 + pxor xmm9,xmm0 + movups xmm0,XMMWORD[rax*1+rcx] + add rax,16 + jmp NEAR $L$enc_loop8_inner +ALIGN 16 +$L$enc_loop8: +DB 102,15,56,220,209 +DB 102,15,56,220,217 +$L$enc_loop8_inner: +DB 102,15,56,220,225 +DB 102,15,56,220,233 +DB 102,15,56,220,241 +DB 102,15,56,220,249 +DB 102,68,15,56,220,193 +DB 102,68,15,56,220,201 +$L$enc_loop8_enter: + movups xmm1,XMMWORD[rax*1+rcx] + add rax,32 +DB 102,15,56,220,208 +DB 102,15,56,220,216 +DB 102,15,56,220,224 +DB 102,15,56,220,232 +DB 102,15,56,220,240 +DB 102,15,56,220,248 +DB 102,68,15,56,220,192 +DB 102,68,15,56,220,200 + movups xmm0,XMMWORD[((-16))+rax*1+rcx] + jnz NEAR $L$enc_loop8 + +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 +DB 102,15,56,220,233 +DB 102,15,56,220,241 +DB 102,15,56,220,249 +DB 102,68,15,56,220,193 +DB 102,68,15,56,220,201 +DB 102,15,56,221,208 +DB 102,15,56,221,216 +DB 102,15,56,221,224 +DB 102,15,56,221,232 +DB 102,15,56,221,240 +DB 102,15,56,221,248 +DB 102,68,15,56,221,192 +DB 102,68,15,56,221,200 + DB 0F3h,0C3h ;repret + + + +ALIGN 16 +_aesni_decrypt8: + + movups xmm0,XMMWORD[rcx] + shl eax,4 + movups xmm1,XMMWORD[16+rcx] + xorps xmm2,xmm0 + xorps xmm3,xmm0 + pxor xmm4,xmm0 + pxor xmm5,xmm0 + pxor xmm6,xmm0 + lea rcx,[32+rax*1+rcx] + neg rax +DB 102,15,56,222,209 + pxor xmm7,xmm0 + pxor xmm8,xmm0 +DB 102,15,56,222,217 + pxor xmm9,xmm0 + movups xmm0,XMMWORD[rax*1+rcx] + add rax,16 + jmp NEAR $L$dec_loop8_inner +ALIGN 16 +$L$dec_loop8: +DB 102,15,56,222,209 +DB 102,15,56,222,217 +$L$dec_loop8_inner: +DB 102,15,56,222,225 +DB 102,15,56,222,233 +DB 102,15,56,222,241 +DB 102,15,56,222,249 +DB 102,68,15,56,222,193 +DB 102,68,15,56,222,201 +$L$dec_loop8_enter: + movups xmm1,XMMWORD[rax*1+rcx] + add rax,32 +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 +DB 102,15,56,222,232 +DB 102,15,56,222,240 +DB 102,15,56,222,248 +DB 102,68,15,56,222,192 +DB 102,68,15,56,222,200 + movups xmm0,XMMWORD[((-16))+rax*1+rcx] + jnz NEAR $L$dec_loop8 + +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,222,233 +DB 102,15,56,222,241 +DB 102,15,56,222,249 +DB 102,68,15,56,222,193 +DB 102,68,15,56,222,201 +DB 102,15,56,223,208 +DB 102,15,56,223,216 +DB 102,15,56,223,224 +DB 102,15,56,223,232 +DB 102,15,56,223,240 +DB 102,15,56,223,248 +DB 102,68,15,56,223,192 +DB 102,68,15,56,223,200 + DB 0F3h,0C3h ;repret + + +global aesni_ecb_encrypt + +ALIGN 16 +aesni_ecb_encrypt: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_aesni_ecb_encrypt: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + mov rcx,r9 + mov r8,QWORD[40+rsp] + + + + lea rsp,[((-88))+rsp] + movaps XMMWORD[rsp],xmm6 + movaps XMMWORD[16+rsp],xmm7 + movaps XMMWORD[32+rsp],xmm8 + movaps XMMWORD[48+rsp],xmm9 +$L$ecb_enc_body: + and rdx,-16 + jz NEAR $L$ecb_ret + + mov eax,DWORD[240+rcx] + movups xmm0,XMMWORD[rcx] + mov r11,rcx + mov r10d,eax + test r8d,r8d + jz NEAR $L$ecb_decrypt + + cmp rdx,0x80 + jb NEAR $L$ecb_enc_tail + + movdqu xmm2,XMMWORD[rdi] + movdqu xmm3,XMMWORD[16+rdi] + movdqu xmm4,XMMWORD[32+rdi] + movdqu 
xmm5,XMMWORD[48+rdi] + movdqu xmm6,XMMWORD[64+rdi] + movdqu xmm7,XMMWORD[80+rdi] + movdqu xmm8,XMMWORD[96+rdi] + movdqu xmm9,XMMWORD[112+rdi] + lea rdi,[128+rdi] + sub rdx,0x80 + jmp NEAR $L$ecb_enc_loop8_enter +ALIGN 16 +$L$ecb_enc_loop8: + movups XMMWORD[rsi],xmm2 + mov rcx,r11 + movdqu xmm2,XMMWORD[rdi] + mov eax,r10d + movups XMMWORD[16+rsi],xmm3 + movdqu xmm3,XMMWORD[16+rdi] + movups XMMWORD[32+rsi],xmm4 + movdqu xmm4,XMMWORD[32+rdi] + movups XMMWORD[48+rsi],xmm5 + movdqu xmm5,XMMWORD[48+rdi] + movups XMMWORD[64+rsi],xmm6 + movdqu xmm6,XMMWORD[64+rdi] + movups XMMWORD[80+rsi],xmm7 + movdqu xmm7,XMMWORD[80+rdi] + movups XMMWORD[96+rsi],xmm8 + movdqu xmm8,XMMWORD[96+rdi] + movups XMMWORD[112+rsi],xmm9 + lea rsi,[128+rsi] + movdqu xmm9,XMMWORD[112+rdi] + lea rdi,[128+rdi] +$L$ecb_enc_loop8_enter: + + call _aesni_encrypt8 + + sub rdx,0x80 + jnc NEAR $L$ecb_enc_loop8 + + movups XMMWORD[rsi],xmm2 + mov rcx,r11 + movups XMMWORD[16+rsi],xmm3 + mov eax,r10d + movups XMMWORD[32+rsi],xmm4 + movups XMMWORD[48+rsi],xmm5 + movups XMMWORD[64+rsi],xmm6 + movups XMMWORD[80+rsi],xmm7 + movups XMMWORD[96+rsi],xmm8 + movups XMMWORD[112+rsi],xmm9 + lea rsi,[128+rsi] + add rdx,0x80 + jz NEAR $L$ecb_ret + +$L$ecb_enc_tail: + movups xmm2,XMMWORD[rdi] + cmp rdx,0x20 + jb NEAR $L$ecb_enc_one + movups xmm3,XMMWORD[16+rdi] + je NEAR $L$ecb_enc_two + movups xmm4,XMMWORD[32+rdi] + cmp rdx,0x40 + jb NEAR $L$ecb_enc_three + movups xmm5,XMMWORD[48+rdi] + je NEAR $L$ecb_enc_four + movups xmm6,XMMWORD[64+rdi] + cmp rdx,0x60 + jb NEAR $L$ecb_enc_five + movups xmm7,XMMWORD[80+rdi] + je NEAR $L$ecb_enc_six + movdqu xmm8,XMMWORD[96+rdi] + xorps xmm9,xmm9 + call _aesni_encrypt8 + movups XMMWORD[rsi],xmm2 + movups XMMWORD[16+rsi],xmm3 + movups XMMWORD[32+rsi],xmm4 + movups XMMWORD[48+rsi],xmm5 + movups XMMWORD[64+rsi],xmm6 + movups XMMWORD[80+rsi],xmm7 + movups XMMWORD[96+rsi],xmm8 + jmp NEAR $L$ecb_ret +ALIGN 16 +$L$ecb_enc_one: + movups xmm0,XMMWORD[rcx] + movups xmm1,XMMWORD[16+rcx] + lea rcx,[32+rcx] + xorps xmm2,xmm0 +$L$oop_enc1_3: +DB 102,15,56,220,209 + dec eax + movups xmm1,XMMWORD[rcx] + lea rcx,[16+rcx] + jnz NEAR $L$oop_enc1_3 +DB 102,15,56,221,209 + movups XMMWORD[rsi],xmm2 + jmp NEAR $L$ecb_ret +ALIGN 16 +$L$ecb_enc_two: + call _aesni_encrypt2 + movups XMMWORD[rsi],xmm2 + movups XMMWORD[16+rsi],xmm3 + jmp NEAR $L$ecb_ret +ALIGN 16 +$L$ecb_enc_three: + call _aesni_encrypt3 + movups XMMWORD[rsi],xmm2 + movups XMMWORD[16+rsi],xmm3 + movups XMMWORD[32+rsi],xmm4 + jmp NEAR $L$ecb_ret +ALIGN 16 +$L$ecb_enc_four: + call _aesni_encrypt4 + movups XMMWORD[rsi],xmm2 + movups XMMWORD[16+rsi],xmm3 + movups XMMWORD[32+rsi],xmm4 + movups XMMWORD[48+rsi],xmm5 + jmp NEAR $L$ecb_ret +ALIGN 16 +$L$ecb_enc_five: + xorps xmm7,xmm7 + call _aesni_encrypt6 + movups XMMWORD[rsi],xmm2 + movups XMMWORD[16+rsi],xmm3 + movups XMMWORD[32+rsi],xmm4 + movups XMMWORD[48+rsi],xmm5 + movups XMMWORD[64+rsi],xmm6 + jmp NEAR $L$ecb_ret +ALIGN 16 +$L$ecb_enc_six: + call _aesni_encrypt6 + movups XMMWORD[rsi],xmm2 + movups XMMWORD[16+rsi],xmm3 + movups XMMWORD[32+rsi],xmm4 + movups XMMWORD[48+rsi],xmm5 + movups XMMWORD[64+rsi],xmm6 + movups XMMWORD[80+rsi],xmm7 + jmp NEAR $L$ecb_ret + +ALIGN 16 +$L$ecb_decrypt: + cmp rdx,0x80 + jb NEAR $L$ecb_dec_tail + + movdqu xmm2,XMMWORD[rdi] + movdqu xmm3,XMMWORD[16+rdi] + movdqu xmm4,XMMWORD[32+rdi] + movdqu xmm5,XMMWORD[48+rdi] + movdqu xmm6,XMMWORD[64+rdi] + movdqu xmm7,XMMWORD[80+rdi] + movdqu xmm8,XMMWORD[96+rdi] + movdqu xmm9,XMMWORD[112+rdi] + lea rdi,[128+rdi] + sub rdx,0x80 + jmp NEAR 
$L$ecb_dec_loop8_enter +ALIGN 16 +$L$ecb_dec_loop8: + movups XMMWORD[rsi],xmm2 + mov rcx,r11 + movdqu xmm2,XMMWORD[rdi] + mov eax,r10d + movups XMMWORD[16+rsi],xmm3 + movdqu xmm3,XMMWORD[16+rdi] + movups XMMWORD[32+rsi],xmm4 + movdqu xmm4,XMMWORD[32+rdi] + movups XMMWORD[48+rsi],xmm5 + movdqu xmm5,XMMWORD[48+rdi] + movups XMMWORD[64+rsi],xmm6 + movdqu xmm6,XMMWORD[64+rdi] + movups XMMWORD[80+rsi],xmm7 + movdqu xmm7,XMMWORD[80+rdi] + movups XMMWORD[96+rsi],xmm8 + movdqu xmm8,XMMWORD[96+rdi] + movups XMMWORD[112+rsi],xmm9 + lea rsi,[128+rsi] + movdqu xmm9,XMMWORD[112+rdi] + lea rdi,[128+rdi] +$L$ecb_dec_loop8_enter: + + call _aesni_decrypt8 + + movups xmm0,XMMWORD[r11] + sub rdx,0x80 + jnc NEAR $L$ecb_dec_loop8 + + movups XMMWORD[rsi],xmm2 + pxor xmm2,xmm2 + mov rcx,r11 + movups XMMWORD[16+rsi],xmm3 + pxor xmm3,xmm3 + mov eax,r10d + movups XMMWORD[32+rsi],xmm4 + pxor xmm4,xmm4 + movups XMMWORD[48+rsi],xmm5 + pxor xmm5,xmm5 + movups XMMWORD[64+rsi],xmm6 + pxor xmm6,xmm6 + movups XMMWORD[80+rsi],xmm7 + pxor xmm7,xmm7 + movups XMMWORD[96+rsi],xmm8 + pxor xmm8,xmm8 + movups XMMWORD[112+rsi],xmm9 + pxor xmm9,xmm9 + lea rsi,[128+rsi] + add rdx,0x80 + jz NEAR $L$ecb_ret + +$L$ecb_dec_tail: + movups xmm2,XMMWORD[rdi] + cmp rdx,0x20 + jb NEAR $L$ecb_dec_one + movups xmm3,XMMWORD[16+rdi] + je NEAR $L$ecb_dec_two + movups xmm4,XMMWORD[32+rdi] + cmp rdx,0x40 + jb NEAR $L$ecb_dec_three + movups xmm5,XMMWORD[48+rdi] + je NEAR $L$ecb_dec_four + movups xmm6,XMMWORD[64+rdi] + cmp rdx,0x60 + jb NEAR $L$ecb_dec_five + movups xmm7,XMMWORD[80+rdi] + je NEAR $L$ecb_dec_six + movups xmm8,XMMWORD[96+rdi] + movups xmm0,XMMWORD[rcx] + xorps xmm9,xmm9 + call _aesni_decrypt8 + movups XMMWORD[rsi],xmm2 + pxor xmm2,xmm2 + movups XMMWORD[16+rsi],xmm3 + pxor xmm3,xmm3 + movups XMMWORD[32+rsi],xmm4 + pxor xmm4,xmm4 + movups XMMWORD[48+rsi],xmm5 + pxor xmm5,xmm5 + movups XMMWORD[64+rsi],xmm6 + pxor xmm6,xmm6 + movups XMMWORD[80+rsi],xmm7 + pxor xmm7,xmm7 + movups XMMWORD[96+rsi],xmm8 + pxor xmm8,xmm8 + pxor xmm9,xmm9 + jmp NEAR $L$ecb_ret +ALIGN 16 +$L$ecb_dec_one: + movups xmm0,XMMWORD[rcx] + movups xmm1,XMMWORD[16+rcx] + lea rcx,[32+rcx] + xorps xmm2,xmm0 +$L$oop_dec1_4: +DB 102,15,56,222,209 + dec eax + movups xmm1,XMMWORD[rcx] + lea rcx,[16+rcx] + jnz NEAR $L$oop_dec1_4 +DB 102,15,56,223,209 + movups XMMWORD[rsi],xmm2 + pxor xmm2,xmm2 + jmp NEAR $L$ecb_ret +ALIGN 16 +$L$ecb_dec_two: + call _aesni_decrypt2 + movups XMMWORD[rsi],xmm2 + pxor xmm2,xmm2 + movups XMMWORD[16+rsi],xmm3 + pxor xmm3,xmm3 + jmp NEAR $L$ecb_ret +ALIGN 16 +$L$ecb_dec_three: + call _aesni_decrypt3 + movups XMMWORD[rsi],xmm2 + pxor xmm2,xmm2 + movups XMMWORD[16+rsi],xmm3 + pxor xmm3,xmm3 + movups XMMWORD[32+rsi],xmm4 + pxor xmm4,xmm4 + jmp NEAR $L$ecb_ret +ALIGN 16 +$L$ecb_dec_four: + call _aesni_decrypt4 + movups XMMWORD[rsi],xmm2 + pxor xmm2,xmm2 + movups XMMWORD[16+rsi],xmm3 + pxor xmm3,xmm3 + movups XMMWORD[32+rsi],xmm4 + pxor xmm4,xmm4 + movups XMMWORD[48+rsi],xmm5 + pxor xmm5,xmm5 + jmp NEAR $L$ecb_ret +ALIGN 16 +$L$ecb_dec_five: + xorps xmm7,xmm7 + call _aesni_decrypt6 + movups XMMWORD[rsi],xmm2 + pxor xmm2,xmm2 + movups XMMWORD[16+rsi],xmm3 + pxor xmm3,xmm3 + movups XMMWORD[32+rsi],xmm4 + pxor xmm4,xmm4 + movups XMMWORD[48+rsi],xmm5 + pxor xmm5,xmm5 + movups XMMWORD[64+rsi],xmm6 + pxor xmm6,xmm6 + pxor xmm7,xmm7 + jmp NEAR $L$ecb_ret +ALIGN 16 +$L$ecb_dec_six: + call _aesni_decrypt6 + movups XMMWORD[rsi],xmm2 + pxor xmm2,xmm2 + movups XMMWORD[16+rsi],xmm3 + pxor xmm3,xmm3 + movups XMMWORD[32+rsi],xmm4 + pxor xmm4,xmm4 + movups 
XMMWORD[48+rsi],xmm5 + pxor xmm5,xmm5 + movups XMMWORD[64+rsi],xmm6 + pxor xmm6,xmm6 + movups XMMWORD[80+rsi],xmm7 + pxor xmm7,xmm7 + +$L$ecb_ret: + xorps xmm0,xmm0 + pxor xmm1,xmm1 + movaps xmm6,XMMWORD[rsp] + movaps XMMWORD[rsp],xmm0 + movaps xmm7,XMMWORD[16+rsp] + movaps XMMWORD[16+rsp],xmm0 + movaps xmm8,XMMWORD[32+rsp] + movaps XMMWORD[32+rsp],xmm0 + movaps xmm9,XMMWORD[48+rsp] + movaps XMMWORD[48+rsp],xmm0 + lea rsp,[88+rsp] +$L$ecb_enc_ret: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_aesni_ecb_encrypt: +global aesni_ccm64_encrypt_blocks + +ALIGN 16 +aesni_ccm64_encrypt_blocks: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_aesni_ccm64_encrypt_blocks: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + mov rcx,r9 + mov r8,QWORD[40+rsp] + mov r9,QWORD[48+rsp] + + + + lea rsp,[((-88))+rsp] + movaps XMMWORD[rsp],xmm6 + movaps XMMWORD[16+rsp],xmm7 + movaps XMMWORD[32+rsp],xmm8 + movaps XMMWORD[48+rsp],xmm9 +$L$ccm64_enc_body: + mov eax,DWORD[240+rcx] + movdqu xmm6,XMMWORD[r8] + movdqa xmm9,XMMWORD[$L$increment64] + movdqa xmm7,XMMWORD[$L$bswap_mask] + + shl eax,4 + mov r10d,16 + lea r11,[rcx] + movdqu xmm3,XMMWORD[r9] + movdqa xmm2,xmm6 + lea rcx,[32+rax*1+rcx] +DB 102,15,56,0,247 + sub r10,rax + jmp NEAR $L$ccm64_enc_outer +ALIGN 16 +$L$ccm64_enc_outer: + movups xmm0,XMMWORD[r11] + mov rax,r10 + movups xmm8,XMMWORD[rdi] + + xorps xmm2,xmm0 + movups xmm1,XMMWORD[16+r11] + xorps xmm0,xmm8 + xorps xmm3,xmm0 + movups xmm0,XMMWORD[32+r11] + +$L$ccm64_enc2_loop: +DB 102,15,56,220,209 +DB 102,15,56,220,217 + movups xmm1,XMMWORD[rax*1+rcx] + add rax,32 +DB 102,15,56,220,208 +DB 102,15,56,220,216 + movups xmm0,XMMWORD[((-16))+rax*1+rcx] + jnz NEAR $L$ccm64_enc2_loop +DB 102,15,56,220,209 +DB 102,15,56,220,217 + paddq xmm6,xmm9 + dec rdx +DB 102,15,56,221,208 +DB 102,15,56,221,216 + + lea rdi,[16+rdi] + xorps xmm8,xmm2 + movdqa xmm2,xmm6 + movups XMMWORD[rsi],xmm8 +DB 102,15,56,0,215 + lea rsi,[16+rsi] + jnz NEAR $L$ccm64_enc_outer + + pxor xmm0,xmm0 + pxor xmm1,xmm1 + pxor xmm2,xmm2 + movups XMMWORD[r9],xmm3 + pxor xmm3,xmm3 + pxor xmm8,xmm8 + pxor xmm6,xmm6 + movaps xmm6,XMMWORD[rsp] + movaps XMMWORD[rsp],xmm0 + movaps xmm7,XMMWORD[16+rsp] + movaps XMMWORD[16+rsp],xmm0 + movaps xmm8,XMMWORD[32+rsp] + movaps XMMWORD[32+rsp],xmm0 + movaps xmm9,XMMWORD[48+rsp] + movaps XMMWORD[48+rsp],xmm0 + lea rsp,[88+rsp] +$L$ccm64_enc_ret: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_aesni_ccm64_encrypt_blocks: +global aesni_ccm64_decrypt_blocks + +ALIGN 16 +aesni_ccm64_decrypt_blocks: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_aesni_ccm64_decrypt_blocks: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + mov rcx,r9 + mov r8,QWORD[40+rsp] + mov r9,QWORD[48+rsp] + + + + lea rsp,[((-88))+rsp] + movaps XMMWORD[rsp],xmm6 + movaps XMMWORD[16+rsp],xmm7 + movaps XMMWORD[32+rsp],xmm8 + movaps XMMWORD[48+rsp],xmm9 +$L$ccm64_dec_body: + mov eax,DWORD[240+rcx] + movups xmm6,XMMWORD[r8] + movdqu xmm3,XMMWORD[r9] + movdqa xmm9,XMMWORD[$L$increment64] + movdqa xmm7,XMMWORD[$L$bswap_mask] + + movaps xmm2,xmm6 + mov r10d,eax + mov r11,rcx +DB 102,15,56,0,247 + movups xmm0,XMMWORD[rcx] + movups xmm1,XMMWORD[16+rcx] + lea rcx,[32+rcx] + xorps xmm2,xmm0 +$L$oop_enc1_5: +DB 102,15,56,220,209 + dec eax + movups xmm1,XMMWORD[rcx] + lea rcx,[16+rcx] + jnz NEAR $L$oop_enc1_5 +DB 102,15,56,221,209 + shl r10d,4 + mov eax,16 + movups xmm8,XMMWORD[rdi] + 
paddq xmm6,xmm9 + lea rdi,[16+rdi] + sub rax,r10 + lea rcx,[32+r10*1+r11] + mov r10,rax + jmp NEAR $L$ccm64_dec_outer +ALIGN 16 +$L$ccm64_dec_outer: + xorps xmm8,xmm2 + movdqa xmm2,xmm6 + movups XMMWORD[rsi],xmm8 + lea rsi,[16+rsi] +DB 102,15,56,0,215 + + sub rdx,1 + jz NEAR $L$ccm64_dec_break + + movups xmm0,XMMWORD[r11] + mov rax,r10 + movups xmm1,XMMWORD[16+r11] + xorps xmm8,xmm0 + xorps xmm2,xmm0 + xorps xmm3,xmm8 + movups xmm0,XMMWORD[32+r11] + jmp NEAR $L$ccm64_dec2_loop +ALIGN 16 +$L$ccm64_dec2_loop: +DB 102,15,56,220,209 +DB 102,15,56,220,217 + movups xmm1,XMMWORD[rax*1+rcx] + add rax,32 +DB 102,15,56,220,208 +DB 102,15,56,220,216 + movups xmm0,XMMWORD[((-16))+rax*1+rcx] + jnz NEAR $L$ccm64_dec2_loop + movups xmm8,XMMWORD[rdi] + paddq xmm6,xmm9 +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,221,208 +DB 102,15,56,221,216 + lea rdi,[16+rdi] + jmp NEAR $L$ccm64_dec_outer + +ALIGN 16 +$L$ccm64_dec_break: + + mov eax,DWORD[240+r11] + movups xmm0,XMMWORD[r11] + movups xmm1,XMMWORD[16+r11] + xorps xmm8,xmm0 + lea r11,[32+r11] + xorps xmm3,xmm8 +$L$oop_enc1_6: +DB 102,15,56,220,217 + dec eax + movups xmm1,XMMWORD[r11] + lea r11,[16+r11] + jnz NEAR $L$oop_enc1_6 +DB 102,15,56,221,217 + pxor xmm0,xmm0 + pxor xmm1,xmm1 + pxor xmm2,xmm2 + movups XMMWORD[r9],xmm3 + pxor xmm3,xmm3 + pxor xmm8,xmm8 + pxor xmm6,xmm6 + movaps xmm6,XMMWORD[rsp] + movaps XMMWORD[rsp],xmm0 + movaps xmm7,XMMWORD[16+rsp] + movaps XMMWORD[16+rsp],xmm0 + movaps xmm8,XMMWORD[32+rsp] + movaps XMMWORD[32+rsp],xmm0 + movaps xmm9,XMMWORD[48+rsp] + movaps XMMWORD[48+rsp],xmm0 + lea rsp,[88+rsp] +$L$ccm64_dec_ret: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_aesni_ccm64_decrypt_blocks: +global aesni_ctr32_encrypt_blocks + +ALIGN 16 +aesni_ctr32_encrypt_blocks: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_aesni_ctr32_encrypt_blocks: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + mov rcx,r9 + mov r8,QWORD[40+rsp] + + + + cmp rdx,1 + jne NEAR $L$ctr32_bulk + + + + movups xmm2,XMMWORD[r8] + movups xmm3,XMMWORD[rdi] + mov edx,DWORD[240+rcx] + movups xmm0,XMMWORD[rcx] + movups xmm1,XMMWORD[16+rcx] + lea rcx,[32+rcx] + xorps xmm2,xmm0 +$L$oop_enc1_7: +DB 102,15,56,220,209 + dec edx + movups xmm1,XMMWORD[rcx] + lea rcx,[16+rcx] + jnz NEAR $L$oop_enc1_7 +DB 102,15,56,221,209 + pxor xmm0,xmm0 + pxor xmm1,xmm1 + xorps xmm2,xmm3 + pxor xmm3,xmm3 + movups XMMWORD[rsi],xmm2 + xorps xmm2,xmm2 + jmp NEAR $L$ctr32_epilogue + +ALIGN 16 +$L$ctr32_bulk: + lea r11,[rsp] + + push rbp + + sub rsp,288 + and rsp,-16 + movaps XMMWORD[(-168)+r11],xmm6 + movaps XMMWORD[(-152)+r11],xmm7 + movaps XMMWORD[(-136)+r11],xmm8 + movaps XMMWORD[(-120)+r11],xmm9 + movaps XMMWORD[(-104)+r11],xmm10 + movaps XMMWORD[(-88)+r11],xmm11 + movaps XMMWORD[(-72)+r11],xmm12 + movaps XMMWORD[(-56)+r11],xmm13 + movaps XMMWORD[(-40)+r11],xmm14 + movaps XMMWORD[(-24)+r11],xmm15 +$L$ctr32_body: + + + + + movdqu xmm2,XMMWORD[r8] + movdqu xmm0,XMMWORD[rcx] + mov r8d,DWORD[12+r8] + pxor xmm2,xmm0 + mov ebp,DWORD[12+rcx] + movdqa XMMWORD[rsp],xmm2 + bswap r8d + movdqa xmm3,xmm2 + movdqa xmm4,xmm2 + movdqa xmm5,xmm2 + movdqa XMMWORD[64+rsp],xmm2 + movdqa XMMWORD[80+rsp],xmm2 + movdqa XMMWORD[96+rsp],xmm2 + mov r10,rdx + movdqa XMMWORD[112+rsp],xmm2 + + lea rax,[1+r8] + lea rdx,[2+r8] + bswap eax + bswap edx + xor eax,ebp + xor edx,ebp +DB 102,15,58,34,216,3 + lea rax,[3+r8] + movdqa XMMWORD[16+rsp],xmm3 +DB 102,15,58,34,226,3 + bswap eax + mov rdx,r10 + lea r10,[4+r8] + movdqa 
XMMWORD[32+rsp],xmm4 + xor eax,ebp + bswap r10d +DB 102,15,58,34,232,3 + xor r10d,ebp + movdqa XMMWORD[48+rsp],xmm5 + lea r9,[5+r8] + mov DWORD[((64+12))+rsp],r10d + bswap r9d + lea r10,[6+r8] + mov eax,DWORD[240+rcx] + xor r9d,ebp + bswap r10d + mov DWORD[((80+12))+rsp],r9d + xor r10d,ebp + lea r9,[7+r8] + mov DWORD[((96+12))+rsp],r10d + bswap r9d + mov r10d,DWORD[((OPENSSL_ia32cap_P+4))] + xor r9d,ebp + and r10d,71303168 + mov DWORD[((112+12))+rsp],r9d + + movups xmm1,XMMWORD[16+rcx] + + movdqa xmm6,XMMWORD[64+rsp] + movdqa xmm7,XMMWORD[80+rsp] + + cmp rdx,8 + jb NEAR $L$ctr32_tail + + sub rdx,6 + cmp r10d,4194304 + je NEAR $L$ctr32_6x + + lea rcx,[128+rcx] + sub rdx,2 + jmp NEAR $L$ctr32_loop8 + +ALIGN 16 +$L$ctr32_6x: + shl eax,4 + mov r10d,48 + bswap ebp + lea rcx,[32+rax*1+rcx] + sub r10,rax + jmp NEAR $L$ctr32_loop6 + +ALIGN 16 +$L$ctr32_loop6: + add r8d,6 + movups xmm0,XMMWORD[((-48))+r10*1+rcx] +DB 102,15,56,220,209 + mov eax,r8d + xor eax,ebp +DB 102,15,56,220,217 +DB 0x0f,0x38,0xf1,0x44,0x24,12 + lea eax,[1+r8] +DB 102,15,56,220,225 + xor eax,ebp +DB 0x0f,0x38,0xf1,0x44,0x24,28 +DB 102,15,56,220,233 + lea eax,[2+r8] + xor eax,ebp +DB 102,15,56,220,241 +DB 0x0f,0x38,0xf1,0x44,0x24,44 + lea eax,[3+r8] +DB 102,15,56,220,249 + movups xmm1,XMMWORD[((-32))+r10*1+rcx] + xor eax,ebp + +DB 102,15,56,220,208 +DB 0x0f,0x38,0xf1,0x44,0x24,60 + lea eax,[4+r8] +DB 102,15,56,220,216 + xor eax,ebp +DB 0x0f,0x38,0xf1,0x44,0x24,76 +DB 102,15,56,220,224 + lea eax,[5+r8] + xor eax,ebp +DB 102,15,56,220,232 +DB 0x0f,0x38,0xf1,0x44,0x24,92 + mov rax,r10 +DB 102,15,56,220,240 +DB 102,15,56,220,248 + movups xmm0,XMMWORD[((-16))+r10*1+rcx] + + call $L$enc_loop6 + + movdqu xmm8,XMMWORD[rdi] + movdqu xmm9,XMMWORD[16+rdi] + movdqu xmm10,XMMWORD[32+rdi] + movdqu xmm11,XMMWORD[48+rdi] + movdqu xmm12,XMMWORD[64+rdi] + movdqu xmm13,XMMWORD[80+rdi] + lea rdi,[96+rdi] + movups xmm1,XMMWORD[((-64))+r10*1+rcx] + pxor xmm8,xmm2 + movaps xmm2,XMMWORD[rsp] + pxor xmm9,xmm3 + movaps xmm3,XMMWORD[16+rsp] + pxor xmm10,xmm4 + movaps xmm4,XMMWORD[32+rsp] + pxor xmm11,xmm5 + movaps xmm5,XMMWORD[48+rsp] + pxor xmm12,xmm6 + movaps xmm6,XMMWORD[64+rsp] + pxor xmm13,xmm7 + movaps xmm7,XMMWORD[80+rsp] + movdqu XMMWORD[rsi],xmm8 + movdqu XMMWORD[16+rsi],xmm9 + movdqu XMMWORD[32+rsi],xmm10 + movdqu XMMWORD[48+rsi],xmm11 + movdqu XMMWORD[64+rsi],xmm12 + movdqu XMMWORD[80+rsi],xmm13 + lea rsi,[96+rsi] + + sub rdx,6 + jnc NEAR $L$ctr32_loop6 + + add rdx,6 + jz NEAR $L$ctr32_done + + lea eax,[((-48))+r10] + lea rcx,[((-80))+r10*1+rcx] + neg eax + shr eax,4 + jmp NEAR $L$ctr32_tail + +ALIGN 32 +$L$ctr32_loop8: + add r8d,8 + movdqa xmm8,XMMWORD[96+rsp] +DB 102,15,56,220,209 + mov r9d,r8d + movdqa xmm9,XMMWORD[112+rsp] +DB 102,15,56,220,217 + bswap r9d + movups xmm0,XMMWORD[((32-128))+rcx] +DB 102,15,56,220,225 + xor r9d,ebp + nop +DB 102,15,56,220,233 + mov DWORD[((0+12))+rsp],r9d + lea r9,[1+r8] +DB 102,15,56,220,241 +DB 102,15,56,220,249 +DB 102,68,15,56,220,193 +DB 102,68,15,56,220,201 + movups xmm1,XMMWORD[((48-128))+rcx] + bswap r9d +DB 102,15,56,220,208 +DB 102,15,56,220,216 + xor r9d,ebp +DB 0x66,0x90 +DB 102,15,56,220,224 +DB 102,15,56,220,232 + mov DWORD[((16+12))+rsp],r9d + lea r9,[2+r8] +DB 102,15,56,220,240 +DB 102,15,56,220,248 +DB 102,68,15,56,220,192 +DB 102,68,15,56,220,200 + movups xmm0,XMMWORD[((64-128))+rcx] + bswap r9d +DB 102,15,56,220,209 +DB 102,15,56,220,217 + xor r9d,ebp +DB 0x66,0x90 +DB 102,15,56,220,225 +DB 102,15,56,220,233 + mov DWORD[((32+12))+rsp],r9d + lea r9,[3+r8] +DB 102,15,56,220,241 +DB 
102,15,56,220,249 +DB 102,68,15,56,220,193 +DB 102,68,15,56,220,201 + movups xmm1,XMMWORD[((80-128))+rcx] + bswap r9d +DB 102,15,56,220,208 +DB 102,15,56,220,216 + xor r9d,ebp +DB 0x66,0x90 +DB 102,15,56,220,224 +DB 102,15,56,220,232 + mov DWORD[((48+12))+rsp],r9d + lea r9,[4+r8] +DB 102,15,56,220,240 +DB 102,15,56,220,248 +DB 102,68,15,56,220,192 +DB 102,68,15,56,220,200 + movups xmm0,XMMWORD[((96-128))+rcx] + bswap r9d +DB 102,15,56,220,209 +DB 102,15,56,220,217 + xor r9d,ebp +DB 0x66,0x90 +DB 102,15,56,220,225 +DB 102,15,56,220,233 + mov DWORD[((64+12))+rsp],r9d + lea r9,[5+r8] +DB 102,15,56,220,241 +DB 102,15,56,220,249 +DB 102,68,15,56,220,193 +DB 102,68,15,56,220,201 + movups xmm1,XMMWORD[((112-128))+rcx] + bswap r9d +DB 102,15,56,220,208 +DB 102,15,56,220,216 + xor r9d,ebp +DB 0x66,0x90 +DB 102,15,56,220,224 +DB 102,15,56,220,232 + mov DWORD[((80+12))+rsp],r9d + lea r9,[6+r8] +DB 102,15,56,220,240 +DB 102,15,56,220,248 +DB 102,68,15,56,220,192 +DB 102,68,15,56,220,200 + movups xmm0,XMMWORD[((128-128))+rcx] + bswap r9d +DB 102,15,56,220,209 +DB 102,15,56,220,217 + xor r9d,ebp +DB 0x66,0x90 +DB 102,15,56,220,225 +DB 102,15,56,220,233 + mov DWORD[((96+12))+rsp],r9d + lea r9,[7+r8] +DB 102,15,56,220,241 +DB 102,15,56,220,249 +DB 102,68,15,56,220,193 +DB 102,68,15,56,220,201 + movups xmm1,XMMWORD[((144-128))+rcx] + bswap r9d +DB 102,15,56,220,208 +DB 102,15,56,220,216 +DB 102,15,56,220,224 + xor r9d,ebp + movdqu xmm10,XMMWORD[rdi] +DB 102,15,56,220,232 + mov DWORD[((112+12))+rsp],r9d + cmp eax,11 +DB 102,15,56,220,240 +DB 102,15,56,220,248 +DB 102,68,15,56,220,192 +DB 102,68,15,56,220,200 + movups xmm0,XMMWORD[((160-128))+rcx] + + jb NEAR $L$ctr32_enc_done + +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 +DB 102,15,56,220,233 +DB 102,15,56,220,241 +DB 102,15,56,220,249 +DB 102,68,15,56,220,193 +DB 102,68,15,56,220,201 + movups xmm1,XMMWORD[((176-128))+rcx] + +DB 102,15,56,220,208 +DB 102,15,56,220,216 +DB 102,15,56,220,224 +DB 102,15,56,220,232 +DB 102,15,56,220,240 +DB 102,15,56,220,248 +DB 102,68,15,56,220,192 +DB 102,68,15,56,220,200 + movups xmm0,XMMWORD[((192-128))+rcx] + je NEAR $L$ctr32_enc_done + +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 +DB 102,15,56,220,233 +DB 102,15,56,220,241 +DB 102,15,56,220,249 +DB 102,68,15,56,220,193 +DB 102,68,15,56,220,201 + movups xmm1,XMMWORD[((208-128))+rcx] + +DB 102,15,56,220,208 +DB 102,15,56,220,216 +DB 102,15,56,220,224 +DB 102,15,56,220,232 +DB 102,15,56,220,240 +DB 102,15,56,220,248 +DB 102,68,15,56,220,192 +DB 102,68,15,56,220,200 + movups xmm0,XMMWORD[((224-128))+rcx] + jmp NEAR $L$ctr32_enc_done + +ALIGN 16 +$L$ctr32_enc_done: + movdqu xmm11,XMMWORD[16+rdi] + pxor xmm10,xmm0 + movdqu xmm12,XMMWORD[32+rdi] + pxor xmm11,xmm0 + movdqu xmm13,XMMWORD[48+rdi] + pxor xmm12,xmm0 + movdqu xmm14,XMMWORD[64+rdi] + pxor xmm13,xmm0 + movdqu xmm15,XMMWORD[80+rdi] + pxor xmm14,xmm0 + pxor xmm15,xmm0 +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 +DB 102,15,56,220,233 +DB 102,15,56,220,241 +DB 102,15,56,220,249 +DB 102,68,15,56,220,193 +DB 102,68,15,56,220,201 + movdqu xmm1,XMMWORD[96+rdi] + lea rdi,[128+rdi] + +DB 102,65,15,56,221,210 + pxor xmm1,xmm0 + movdqu xmm10,XMMWORD[((112-128))+rdi] +DB 102,65,15,56,221,219 + pxor xmm10,xmm0 + movdqa xmm11,XMMWORD[rsp] +DB 102,65,15,56,221,228 +DB 102,65,15,56,221,237 + movdqa xmm12,XMMWORD[16+rsp] + movdqa xmm13,XMMWORD[32+rsp] +DB 102,65,15,56,221,246 +DB 102,65,15,56,221,255 + movdqa xmm14,XMMWORD[48+rsp] + movdqa xmm15,XMMWORD[64+rsp] +DB 
102,68,15,56,221,193 + movdqa xmm0,XMMWORD[80+rsp] + movups xmm1,XMMWORD[((16-128))+rcx] +DB 102,69,15,56,221,202 + + movups XMMWORD[rsi],xmm2 + movdqa xmm2,xmm11 + movups XMMWORD[16+rsi],xmm3 + movdqa xmm3,xmm12 + movups XMMWORD[32+rsi],xmm4 + movdqa xmm4,xmm13 + movups XMMWORD[48+rsi],xmm5 + movdqa xmm5,xmm14 + movups XMMWORD[64+rsi],xmm6 + movdqa xmm6,xmm15 + movups XMMWORD[80+rsi],xmm7 + movdqa xmm7,xmm0 + movups XMMWORD[96+rsi],xmm8 + movups XMMWORD[112+rsi],xmm9 + lea rsi,[128+rsi] + + sub rdx,8 + jnc NEAR $L$ctr32_loop8 + + add rdx,8 + jz NEAR $L$ctr32_done + lea rcx,[((-128))+rcx] + +$L$ctr32_tail: + + + lea rcx,[16+rcx] + cmp rdx,4 + jb NEAR $L$ctr32_loop3 + je NEAR $L$ctr32_loop4 + + + shl eax,4 + movdqa xmm8,XMMWORD[96+rsp] + pxor xmm9,xmm9 + + movups xmm0,XMMWORD[16+rcx] +DB 102,15,56,220,209 +DB 102,15,56,220,217 + lea rcx,[((32-16))+rax*1+rcx] + neg rax +DB 102,15,56,220,225 + add rax,16 + movups xmm10,XMMWORD[rdi] +DB 102,15,56,220,233 +DB 102,15,56,220,241 + movups xmm11,XMMWORD[16+rdi] + movups xmm12,XMMWORD[32+rdi] +DB 102,15,56,220,249 +DB 102,68,15,56,220,193 + + call $L$enc_loop8_enter + + movdqu xmm13,XMMWORD[48+rdi] + pxor xmm2,xmm10 + movdqu xmm10,XMMWORD[64+rdi] + pxor xmm3,xmm11 + movdqu XMMWORD[rsi],xmm2 + pxor xmm4,xmm12 + movdqu XMMWORD[16+rsi],xmm3 + pxor xmm5,xmm13 + movdqu XMMWORD[32+rsi],xmm4 + pxor xmm6,xmm10 + movdqu XMMWORD[48+rsi],xmm5 + movdqu XMMWORD[64+rsi],xmm6 + cmp rdx,6 + jb NEAR $L$ctr32_done + + movups xmm11,XMMWORD[80+rdi] + xorps xmm7,xmm11 + movups XMMWORD[80+rsi],xmm7 + je NEAR $L$ctr32_done + + movups xmm12,XMMWORD[96+rdi] + xorps xmm8,xmm12 + movups XMMWORD[96+rsi],xmm8 + jmp NEAR $L$ctr32_done + +ALIGN 32 +$L$ctr32_loop4: +DB 102,15,56,220,209 + lea rcx,[16+rcx] + dec eax +DB 102,15,56,220,217 +DB 102,15,56,220,225 +DB 102,15,56,220,233 + movups xmm1,XMMWORD[rcx] + jnz NEAR $L$ctr32_loop4 +DB 102,15,56,221,209 +DB 102,15,56,221,217 + movups xmm10,XMMWORD[rdi] + movups xmm11,XMMWORD[16+rdi] +DB 102,15,56,221,225 +DB 102,15,56,221,233 + movups xmm12,XMMWORD[32+rdi] + movups xmm13,XMMWORD[48+rdi] + + xorps xmm2,xmm10 + movups XMMWORD[rsi],xmm2 + xorps xmm3,xmm11 + movups XMMWORD[16+rsi],xmm3 + pxor xmm4,xmm12 + movdqu XMMWORD[32+rsi],xmm4 + pxor xmm5,xmm13 + movdqu XMMWORD[48+rsi],xmm5 + jmp NEAR $L$ctr32_done + +ALIGN 32 +$L$ctr32_loop3: +DB 102,15,56,220,209 + lea rcx,[16+rcx] + dec eax +DB 102,15,56,220,217 +DB 102,15,56,220,225 + movups xmm1,XMMWORD[rcx] + jnz NEAR $L$ctr32_loop3 +DB 102,15,56,221,209 +DB 102,15,56,221,217 +DB 102,15,56,221,225 + + movups xmm10,XMMWORD[rdi] + xorps xmm2,xmm10 + movups XMMWORD[rsi],xmm2 + cmp rdx,2 + jb NEAR $L$ctr32_done + + movups xmm11,XMMWORD[16+rdi] + xorps xmm3,xmm11 + movups XMMWORD[16+rsi],xmm3 + je NEAR $L$ctr32_done + + movups xmm12,XMMWORD[32+rdi] + xorps xmm4,xmm12 + movups XMMWORD[32+rsi],xmm4 + +$L$ctr32_done: + xorps xmm0,xmm0 + xor ebp,ebp + pxor xmm1,xmm1 + pxor xmm2,xmm2 + pxor xmm3,xmm3 + pxor xmm4,xmm4 + pxor xmm5,xmm5 + movaps xmm6,XMMWORD[((-168))+r11] + movaps XMMWORD[(-168)+r11],xmm0 + movaps xmm7,XMMWORD[((-152))+r11] + movaps XMMWORD[(-152)+r11],xmm0 + movaps xmm8,XMMWORD[((-136))+r11] + movaps XMMWORD[(-136)+r11],xmm0 + movaps xmm9,XMMWORD[((-120))+r11] + movaps XMMWORD[(-120)+r11],xmm0 + movaps xmm10,XMMWORD[((-104))+r11] + movaps XMMWORD[(-104)+r11],xmm0 + movaps xmm11,XMMWORD[((-88))+r11] + movaps XMMWORD[(-88)+r11],xmm0 + movaps xmm12,XMMWORD[((-72))+r11] + movaps XMMWORD[(-72)+r11],xmm0 + movaps xmm13,XMMWORD[((-56))+r11] + movaps XMMWORD[(-56)+r11],xmm0 + movaps 
xmm14,XMMWORD[((-40))+r11] + movaps XMMWORD[(-40)+r11],xmm0 + movaps xmm15,XMMWORD[((-24))+r11] + movaps XMMWORD[(-24)+r11],xmm0 + movaps XMMWORD[rsp],xmm0 + movaps XMMWORD[16+rsp],xmm0 + movaps XMMWORD[32+rsp],xmm0 + movaps XMMWORD[48+rsp],xmm0 + movaps XMMWORD[64+rsp],xmm0 + movaps XMMWORD[80+rsp],xmm0 + movaps XMMWORD[96+rsp],xmm0 + movaps XMMWORD[112+rsp],xmm0 + mov rbp,QWORD[((-8))+r11] + + lea rsp,[r11] + +$L$ctr32_epilogue: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_aesni_ctr32_encrypt_blocks: +global aesni_xts_encrypt + +ALIGN 16 +aesni_xts_encrypt: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_aesni_xts_encrypt: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + mov rcx,r9 + mov r8,QWORD[40+rsp] + mov r9,QWORD[48+rsp] + + + + lea r11,[rsp] + + push rbp + + sub rsp,272 + and rsp,-16 + movaps XMMWORD[(-168)+r11],xmm6 + movaps XMMWORD[(-152)+r11],xmm7 + movaps XMMWORD[(-136)+r11],xmm8 + movaps XMMWORD[(-120)+r11],xmm9 + movaps XMMWORD[(-104)+r11],xmm10 + movaps XMMWORD[(-88)+r11],xmm11 + movaps XMMWORD[(-72)+r11],xmm12 + movaps XMMWORD[(-56)+r11],xmm13 + movaps XMMWORD[(-40)+r11],xmm14 + movaps XMMWORD[(-24)+r11],xmm15 +$L$xts_enc_body: + movups xmm2,XMMWORD[r9] + mov eax,DWORD[240+r8] + mov r10d,DWORD[240+rcx] + movups xmm0,XMMWORD[r8] + movups xmm1,XMMWORD[16+r8] + lea r8,[32+r8] + xorps xmm2,xmm0 +$L$oop_enc1_8: +DB 102,15,56,220,209 + dec eax + movups xmm1,XMMWORD[r8] + lea r8,[16+r8] + jnz NEAR $L$oop_enc1_8 +DB 102,15,56,221,209 + movups xmm0,XMMWORD[rcx] + mov rbp,rcx + mov eax,r10d + shl r10d,4 + mov r9,rdx + and rdx,-16 + + movups xmm1,XMMWORD[16+r10*1+rcx] + + movdqa xmm8,XMMWORD[$L$xts_magic] + movdqa xmm15,xmm2 + pshufd xmm9,xmm2,0x5f + pxor xmm1,xmm0 + movdqa xmm14,xmm9 + paddd xmm9,xmm9 + movdqa xmm10,xmm15 + psrad xmm14,31 + paddq xmm15,xmm15 + pand xmm14,xmm8 + pxor xmm10,xmm0 + pxor xmm15,xmm14 + movdqa xmm14,xmm9 + paddd xmm9,xmm9 + movdqa xmm11,xmm15 + psrad xmm14,31 + paddq xmm15,xmm15 + pand xmm14,xmm8 + pxor xmm11,xmm0 + pxor xmm15,xmm14 + movdqa xmm14,xmm9 + paddd xmm9,xmm9 + movdqa xmm12,xmm15 + psrad xmm14,31 + paddq xmm15,xmm15 + pand xmm14,xmm8 + pxor xmm12,xmm0 + pxor xmm15,xmm14 + movdqa xmm14,xmm9 + paddd xmm9,xmm9 + movdqa xmm13,xmm15 + psrad xmm14,31 + paddq xmm15,xmm15 + pand xmm14,xmm8 + pxor xmm13,xmm0 + pxor xmm15,xmm14 + movdqa xmm14,xmm15 + psrad xmm9,31 + paddq xmm15,xmm15 + pand xmm9,xmm8 + pxor xmm14,xmm0 + pxor xmm15,xmm9 + movaps XMMWORD[96+rsp],xmm1 + + sub rdx,16*6 + jc NEAR $L$xts_enc_short + + mov eax,16+96 + lea rcx,[32+r10*1+rbp] + sub rax,r10 + movups xmm1,XMMWORD[16+rbp] + mov r10,rax + lea r8,[$L$xts_magic] + jmp NEAR $L$xts_enc_grandloop + +ALIGN 32 +$L$xts_enc_grandloop: + movdqu xmm2,XMMWORD[rdi] + movdqa xmm8,xmm0 + movdqu xmm3,XMMWORD[16+rdi] + pxor xmm2,xmm10 + movdqu xmm4,XMMWORD[32+rdi] + pxor xmm3,xmm11 +DB 102,15,56,220,209 + movdqu xmm5,XMMWORD[48+rdi] + pxor xmm4,xmm12 +DB 102,15,56,220,217 + movdqu xmm6,XMMWORD[64+rdi] + pxor xmm5,xmm13 +DB 102,15,56,220,225 + movdqu xmm7,XMMWORD[80+rdi] + pxor xmm8,xmm15 + movdqa xmm9,XMMWORD[96+rsp] + pxor xmm6,xmm14 +DB 102,15,56,220,233 + movups xmm0,XMMWORD[32+rbp] + lea rdi,[96+rdi] + pxor xmm7,xmm8 + + pxor xmm10,xmm9 +DB 102,15,56,220,241 + pxor xmm11,xmm9 + movdqa XMMWORD[rsp],xmm10 +DB 102,15,56,220,249 + movups xmm1,XMMWORD[48+rbp] + pxor xmm12,xmm9 + +DB 102,15,56,220,208 + pxor xmm13,xmm9 + movdqa XMMWORD[16+rsp],xmm11 +DB 102,15,56,220,216 + pxor xmm14,xmm9 + movdqa 
XMMWORD[32+rsp],xmm12 +DB 102,15,56,220,224 +DB 102,15,56,220,232 + pxor xmm8,xmm9 + movdqa XMMWORD[64+rsp],xmm14 +DB 102,15,56,220,240 +DB 102,15,56,220,248 + movups xmm0,XMMWORD[64+rbp] + movdqa XMMWORD[80+rsp],xmm8 + pshufd xmm9,xmm15,0x5f + jmp NEAR $L$xts_enc_loop6 +ALIGN 32 +$L$xts_enc_loop6: +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 +DB 102,15,56,220,233 +DB 102,15,56,220,241 +DB 102,15,56,220,249 + movups xmm1,XMMWORD[((-64))+rax*1+rcx] + add rax,32 + +DB 102,15,56,220,208 +DB 102,15,56,220,216 +DB 102,15,56,220,224 +DB 102,15,56,220,232 +DB 102,15,56,220,240 +DB 102,15,56,220,248 + movups xmm0,XMMWORD[((-80))+rax*1+rcx] + jnz NEAR $L$xts_enc_loop6 + + movdqa xmm8,XMMWORD[r8] + movdqa xmm14,xmm9 + paddd xmm9,xmm9 +DB 102,15,56,220,209 + paddq xmm15,xmm15 + psrad xmm14,31 +DB 102,15,56,220,217 + pand xmm14,xmm8 + movups xmm10,XMMWORD[rbp] +DB 102,15,56,220,225 +DB 102,15,56,220,233 +DB 102,15,56,220,241 + pxor xmm15,xmm14 + movaps xmm11,xmm10 +DB 102,15,56,220,249 + movups xmm1,XMMWORD[((-64))+rcx] + + movdqa xmm14,xmm9 +DB 102,15,56,220,208 + paddd xmm9,xmm9 + pxor xmm10,xmm15 +DB 102,15,56,220,216 + psrad xmm14,31 + paddq xmm15,xmm15 +DB 102,15,56,220,224 +DB 102,15,56,220,232 + pand xmm14,xmm8 + movaps xmm12,xmm11 +DB 102,15,56,220,240 + pxor xmm15,xmm14 + movdqa xmm14,xmm9 +DB 102,15,56,220,248 + movups xmm0,XMMWORD[((-48))+rcx] + + paddd xmm9,xmm9 +DB 102,15,56,220,209 + pxor xmm11,xmm15 + psrad xmm14,31 +DB 102,15,56,220,217 + paddq xmm15,xmm15 + pand xmm14,xmm8 +DB 102,15,56,220,225 +DB 102,15,56,220,233 + movdqa XMMWORD[48+rsp],xmm13 + pxor xmm15,xmm14 +DB 102,15,56,220,241 + movaps xmm13,xmm12 + movdqa xmm14,xmm9 +DB 102,15,56,220,249 + movups xmm1,XMMWORD[((-32))+rcx] + + paddd xmm9,xmm9 +DB 102,15,56,220,208 + pxor xmm12,xmm15 + psrad xmm14,31 +DB 102,15,56,220,216 + paddq xmm15,xmm15 + pand xmm14,xmm8 +DB 102,15,56,220,224 +DB 102,15,56,220,232 +DB 102,15,56,220,240 + pxor xmm15,xmm14 + movaps xmm14,xmm13 +DB 102,15,56,220,248 + + movdqa xmm0,xmm9 + paddd xmm9,xmm9 +DB 102,15,56,220,209 + pxor xmm13,xmm15 + psrad xmm0,31 +DB 102,15,56,220,217 + paddq xmm15,xmm15 + pand xmm0,xmm8 +DB 102,15,56,220,225 +DB 102,15,56,220,233 + pxor xmm15,xmm0 + movups xmm0,XMMWORD[rbp] +DB 102,15,56,220,241 +DB 102,15,56,220,249 + movups xmm1,XMMWORD[16+rbp] + + pxor xmm14,xmm15 +DB 102,15,56,221,84,36,0 + psrad xmm9,31 + paddq xmm15,xmm15 +DB 102,15,56,221,92,36,16 +DB 102,15,56,221,100,36,32 + pand xmm9,xmm8 + mov rax,r10 +DB 102,15,56,221,108,36,48 +DB 102,15,56,221,116,36,64 +DB 102,15,56,221,124,36,80 + pxor xmm15,xmm9 + + lea rsi,[96+rsi] + movups XMMWORD[(-96)+rsi],xmm2 + movups XMMWORD[(-80)+rsi],xmm3 + movups XMMWORD[(-64)+rsi],xmm4 + movups XMMWORD[(-48)+rsi],xmm5 + movups XMMWORD[(-32)+rsi],xmm6 + movups XMMWORD[(-16)+rsi],xmm7 + sub rdx,16*6 + jnc NEAR $L$xts_enc_grandloop + + mov eax,16+96 + sub eax,r10d + mov rcx,rbp + shr eax,4 + +$L$xts_enc_short: + + mov r10d,eax + pxor xmm10,xmm0 + add rdx,16*6 + jz NEAR $L$xts_enc_done + + pxor xmm11,xmm0 + cmp rdx,0x20 + jb NEAR $L$xts_enc_one + pxor xmm12,xmm0 + je NEAR $L$xts_enc_two + + pxor xmm13,xmm0 + cmp rdx,0x40 + jb NEAR $L$xts_enc_three + pxor xmm14,xmm0 + je NEAR $L$xts_enc_four + + movdqu xmm2,XMMWORD[rdi] + movdqu xmm3,XMMWORD[16+rdi] + movdqu xmm4,XMMWORD[32+rdi] + pxor xmm2,xmm10 + movdqu xmm5,XMMWORD[48+rdi] + pxor xmm3,xmm11 + movdqu xmm6,XMMWORD[64+rdi] + lea rdi,[80+rdi] + pxor xmm4,xmm12 + pxor xmm5,xmm13 + pxor xmm6,xmm14 + pxor xmm7,xmm7 + + call _aesni_encrypt6 + + xorps xmm2,xmm10 + movdqa 
xmm10,xmm15 + xorps xmm3,xmm11 + xorps xmm4,xmm12 + movdqu XMMWORD[rsi],xmm2 + xorps xmm5,xmm13 + movdqu XMMWORD[16+rsi],xmm3 + xorps xmm6,xmm14 + movdqu XMMWORD[32+rsi],xmm4 + movdqu XMMWORD[48+rsi],xmm5 + movdqu XMMWORD[64+rsi],xmm6 + lea rsi,[80+rsi] + jmp NEAR $L$xts_enc_done + +ALIGN 16 +$L$xts_enc_one: + movups xmm2,XMMWORD[rdi] + lea rdi,[16+rdi] + xorps xmm2,xmm10 + movups xmm0,XMMWORD[rcx] + movups xmm1,XMMWORD[16+rcx] + lea rcx,[32+rcx] + xorps xmm2,xmm0 +$L$oop_enc1_9: +DB 102,15,56,220,209 + dec eax + movups xmm1,XMMWORD[rcx] + lea rcx,[16+rcx] + jnz NEAR $L$oop_enc1_9 +DB 102,15,56,221,209 + xorps xmm2,xmm10 + movdqa xmm10,xmm11 + movups XMMWORD[rsi],xmm2 + lea rsi,[16+rsi] + jmp NEAR $L$xts_enc_done + +ALIGN 16 +$L$xts_enc_two: + movups xmm2,XMMWORD[rdi] + movups xmm3,XMMWORD[16+rdi] + lea rdi,[32+rdi] + xorps xmm2,xmm10 + xorps xmm3,xmm11 + + call _aesni_encrypt2 + + xorps xmm2,xmm10 + movdqa xmm10,xmm12 + xorps xmm3,xmm11 + movups XMMWORD[rsi],xmm2 + movups XMMWORD[16+rsi],xmm3 + lea rsi,[32+rsi] + jmp NEAR $L$xts_enc_done + +ALIGN 16 +$L$xts_enc_three: + movups xmm2,XMMWORD[rdi] + movups xmm3,XMMWORD[16+rdi] + movups xmm4,XMMWORD[32+rdi] + lea rdi,[48+rdi] + xorps xmm2,xmm10 + xorps xmm3,xmm11 + xorps xmm4,xmm12 + + call _aesni_encrypt3 + + xorps xmm2,xmm10 + movdqa xmm10,xmm13 + xorps xmm3,xmm11 + xorps xmm4,xmm12 + movups XMMWORD[rsi],xmm2 + movups XMMWORD[16+rsi],xmm3 + movups XMMWORD[32+rsi],xmm4 + lea rsi,[48+rsi] + jmp NEAR $L$xts_enc_done + +ALIGN 16 +$L$xts_enc_four: + movups xmm2,XMMWORD[rdi] + movups xmm3,XMMWORD[16+rdi] + movups xmm4,XMMWORD[32+rdi] + xorps xmm2,xmm10 + movups xmm5,XMMWORD[48+rdi] + lea rdi,[64+rdi] + xorps xmm3,xmm11 + xorps xmm4,xmm12 + xorps xmm5,xmm13 + + call _aesni_encrypt4 + + pxor xmm2,xmm10 + movdqa xmm10,xmm14 + pxor xmm3,xmm11 + pxor xmm4,xmm12 + movdqu XMMWORD[rsi],xmm2 + pxor xmm5,xmm13 + movdqu XMMWORD[16+rsi],xmm3 + movdqu XMMWORD[32+rsi],xmm4 + movdqu XMMWORD[48+rsi],xmm5 + lea rsi,[64+rsi] + jmp NEAR $L$xts_enc_done + +ALIGN 16 +$L$xts_enc_done: + and r9,15 + jz NEAR $L$xts_enc_ret + mov rdx,r9 + +$L$xts_enc_steal: + movzx eax,BYTE[rdi] + movzx ecx,BYTE[((-16))+rsi] + lea rdi,[1+rdi] + mov BYTE[((-16))+rsi],al + mov BYTE[rsi],cl + lea rsi,[1+rsi] + sub rdx,1 + jnz NEAR $L$xts_enc_steal + + sub rsi,r9 + mov rcx,rbp + mov eax,r10d + + movups xmm2,XMMWORD[((-16))+rsi] + xorps xmm2,xmm10 + movups xmm0,XMMWORD[rcx] + movups xmm1,XMMWORD[16+rcx] + lea rcx,[32+rcx] + xorps xmm2,xmm0 +$L$oop_enc1_10: +DB 102,15,56,220,209 + dec eax + movups xmm1,XMMWORD[rcx] + lea rcx,[16+rcx] + jnz NEAR $L$oop_enc1_10 +DB 102,15,56,221,209 + xorps xmm2,xmm10 + movups XMMWORD[(-16)+rsi],xmm2 + +$L$xts_enc_ret: + xorps xmm0,xmm0 + pxor xmm1,xmm1 + pxor xmm2,xmm2 + pxor xmm3,xmm3 + pxor xmm4,xmm4 + pxor xmm5,xmm5 + movaps xmm6,XMMWORD[((-168))+r11] + movaps XMMWORD[(-168)+r11],xmm0 + movaps xmm7,XMMWORD[((-152))+r11] + movaps XMMWORD[(-152)+r11],xmm0 + movaps xmm8,XMMWORD[((-136))+r11] + movaps XMMWORD[(-136)+r11],xmm0 + movaps xmm9,XMMWORD[((-120))+r11] + movaps XMMWORD[(-120)+r11],xmm0 + movaps xmm10,XMMWORD[((-104))+r11] + movaps XMMWORD[(-104)+r11],xmm0 + movaps xmm11,XMMWORD[((-88))+r11] + movaps XMMWORD[(-88)+r11],xmm0 + movaps xmm12,XMMWORD[((-72))+r11] + movaps XMMWORD[(-72)+r11],xmm0 + movaps xmm13,XMMWORD[((-56))+r11] + movaps XMMWORD[(-56)+r11],xmm0 + movaps xmm14,XMMWORD[((-40))+r11] + movaps XMMWORD[(-40)+r11],xmm0 + movaps xmm15,XMMWORD[((-24))+r11] + movaps XMMWORD[(-24)+r11],xmm0 + movaps XMMWORD[rsp],xmm0 + movaps XMMWORD[16+rsp],xmm0 + 
movaps XMMWORD[32+rsp],xmm0 + movaps XMMWORD[48+rsp],xmm0 + movaps XMMWORD[64+rsp],xmm0 + movaps XMMWORD[80+rsp],xmm0 + movaps XMMWORD[96+rsp],xmm0 + mov rbp,QWORD[((-8))+r11] + + lea rsp,[r11] + +$L$xts_enc_epilogue: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_aesni_xts_encrypt: +global aesni_xts_decrypt + +ALIGN 16 +aesni_xts_decrypt: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_aesni_xts_decrypt: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + mov rcx,r9 + mov r8,QWORD[40+rsp] + mov r9,QWORD[48+rsp] + + + + lea r11,[rsp] + + push rbp + + sub rsp,272 + and rsp,-16 + movaps XMMWORD[(-168)+r11],xmm6 + movaps XMMWORD[(-152)+r11],xmm7 + movaps XMMWORD[(-136)+r11],xmm8 + movaps XMMWORD[(-120)+r11],xmm9 + movaps XMMWORD[(-104)+r11],xmm10 + movaps XMMWORD[(-88)+r11],xmm11 + movaps XMMWORD[(-72)+r11],xmm12 + movaps XMMWORD[(-56)+r11],xmm13 + movaps XMMWORD[(-40)+r11],xmm14 + movaps XMMWORD[(-24)+r11],xmm15 +$L$xts_dec_body: + movups xmm2,XMMWORD[r9] + mov eax,DWORD[240+r8] + mov r10d,DWORD[240+rcx] + movups xmm0,XMMWORD[r8] + movups xmm1,XMMWORD[16+r8] + lea r8,[32+r8] + xorps xmm2,xmm0 +$L$oop_enc1_11: +DB 102,15,56,220,209 + dec eax + movups xmm1,XMMWORD[r8] + lea r8,[16+r8] + jnz NEAR $L$oop_enc1_11 +DB 102,15,56,221,209 + xor eax,eax + test rdx,15 + setnz al + shl rax,4 + sub rdx,rax + + movups xmm0,XMMWORD[rcx] + mov rbp,rcx + mov eax,r10d + shl r10d,4 + mov r9,rdx + and rdx,-16 + + movups xmm1,XMMWORD[16+r10*1+rcx] + + movdqa xmm8,XMMWORD[$L$xts_magic] + movdqa xmm15,xmm2 + pshufd xmm9,xmm2,0x5f + pxor xmm1,xmm0 + movdqa xmm14,xmm9 + paddd xmm9,xmm9 + movdqa xmm10,xmm15 + psrad xmm14,31 + paddq xmm15,xmm15 + pand xmm14,xmm8 + pxor xmm10,xmm0 + pxor xmm15,xmm14 + movdqa xmm14,xmm9 + paddd xmm9,xmm9 + movdqa xmm11,xmm15 + psrad xmm14,31 + paddq xmm15,xmm15 + pand xmm14,xmm8 + pxor xmm11,xmm0 + pxor xmm15,xmm14 + movdqa xmm14,xmm9 + paddd xmm9,xmm9 + movdqa xmm12,xmm15 + psrad xmm14,31 + paddq xmm15,xmm15 + pand xmm14,xmm8 + pxor xmm12,xmm0 + pxor xmm15,xmm14 + movdqa xmm14,xmm9 + paddd xmm9,xmm9 + movdqa xmm13,xmm15 + psrad xmm14,31 + paddq xmm15,xmm15 + pand xmm14,xmm8 + pxor xmm13,xmm0 + pxor xmm15,xmm14 + movdqa xmm14,xmm15 + psrad xmm9,31 + paddq xmm15,xmm15 + pand xmm9,xmm8 + pxor xmm14,xmm0 + pxor xmm15,xmm9 + movaps XMMWORD[96+rsp],xmm1 + + sub rdx,16*6 + jc NEAR $L$xts_dec_short + + mov eax,16+96 + lea rcx,[32+r10*1+rbp] + sub rax,r10 + movups xmm1,XMMWORD[16+rbp] + mov r10,rax + lea r8,[$L$xts_magic] + jmp NEAR $L$xts_dec_grandloop + +ALIGN 32 +$L$xts_dec_grandloop: + movdqu xmm2,XMMWORD[rdi] + movdqa xmm8,xmm0 + movdqu xmm3,XMMWORD[16+rdi] + pxor xmm2,xmm10 + movdqu xmm4,XMMWORD[32+rdi] + pxor xmm3,xmm11 +DB 102,15,56,222,209 + movdqu xmm5,XMMWORD[48+rdi] + pxor xmm4,xmm12 +DB 102,15,56,222,217 + movdqu xmm6,XMMWORD[64+rdi] + pxor xmm5,xmm13 +DB 102,15,56,222,225 + movdqu xmm7,XMMWORD[80+rdi] + pxor xmm8,xmm15 + movdqa xmm9,XMMWORD[96+rsp] + pxor xmm6,xmm14 +DB 102,15,56,222,233 + movups xmm0,XMMWORD[32+rbp] + lea rdi,[96+rdi] + pxor xmm7,xmm8 + + pxor xmm10,xmm9 +DB 102,15,56,222,241 + pxor xmm11,xmm9 + movdqa XMMWORD[rsp],xmm10 +DB 102,15,56,222,249 + movups xmm1,XMMWORD[48+rbp] + pxor xmm12,xmm9 + +DB 102,15,56,222,208 + pxor xmm13,xmm9 + movdqa XMMWORD[16+rsp],xmm11 +DB 102,15,56,222,216 + pxor xmm14,xmm9 + movdqa XMMWORD[32+rsp],xmm12 +DB 102,15,56,222,224 +DB 102,15,56,222,232 + pxor xmm8,xmm9 + movdqa XMMWORD[64+rsp],xmm14 +DB 102,15,56,222,240 +DB 102,15,56,222,248 + movups 
xmm0,XMMWORD[64+rbp] + movdqa XMMWORD[80+rsp],xmm8 + pshufd xmm9,xmm15,0x5f + jmp NEAR $L$xts_dec_loop6 +ALIGN 32 +$L$xts_dec_loop6: +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,222,233 +DB 102,15,56,222,241 +DB 102,15,56,222,249 + movups xmm1,XMMWORD[((-64))+rax*1+rcx] + add rax,32 + +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 +DB 102,15,56,222,232 +DB 102,15,56,222,240 +DB 102,15,56,222,248 + movups xmm0,XMMWORD[((-80))+rax*1+rcx] + jnz NEAR $L$xts_dec_loop6 + + movdqa xmm8,XMMWORD[r8] + movdqa xmm14,xmm9 + paddd xmm9,xmm9 +DB 102,15,56,222,209 + paddq xmm15,xmm15 + psrad xmm14,31 +DB 102,15,56,222,217 + pand xmm14,xmm8 + movups xmm10,XMMWORD[rbp] +DB 102,15,56,222,225 +DB 102,15,56,222,233 +DB 102,15,56,222,241 + pxor xmm15,xmm14 + movaps xmm11,xmm10 +DB 102,15,56,222,249 + movups xmm1,XMMWORD[((-64))+rcx] + + movdqa xmm14,xmm9 +DB 102,15,56,222,208 + paddd xmm9,xmm9 + pxor xmm10,xmm15 +DB 102,15,56,222,216 + psrad xmm14,31 + paddq xmm15,xmm15 +DB 102,15,56,222,224 +DB 102,15,56,222,232 + pand xmm14,xmm8 + movaps xmm12,xmm11 +DB 102,15,56,222,240 + pxor xmm15,xmm14 + movdqa xmm14,xmm9 +DB 102,15,56,222,248 + movups xmm0,XMMWORD[((-48))+rcx] + + paddd xmm9,xmm9 +DB 102,15,56,222,209 + pxor xmm11,xmm15 + psrad xmm14,31 +DB 102,15,56,222,217 + paddq xmm15,xmm15 + pand xmm14,xmm8 +DB 102,15,56,222,225 +DB 102,15,56,222,233 + movdqa XMMWORD[48+rsp],xmm13 + pxor xmm15,xmm14 +DB 102,15,56,222,241 + movaps xmm13,xmm12 + movdqa xmm14,xmm9 +DB 102,15,56,222,249 + movups xmm1,XMMWORD[((-32))+rcx] + + paddd xmm9,xmm9 +DB 102,15,56,222,208 + pxor xmm12,xmm15 + psrad xmm14,31 +DB 102,15,56,222,216 + paddq xmm15,xmm15 + pand xmm14,xmm8 +DB 102,15,56,222,224 +DB 102,15,56,222,232 +DB 102,15,56,222,240 + pxor xmm15,xmm14 + movaps xmm14,xmm13 +DB 102,15,56,222,248 + + movdqa xmm0,xmm9 + paddd xmm9,xmm9 +DB 102,15,56,222,209 + pxor xmm13,xmm15 + psrad xmm0,31 +DB 102,15,56,222,217 + paddq xmm15,xmm15 + pand xmm0,xmm8 +DB 102,15,56,222,225 +DB 102,15,56,222,233 + pxor xmm15,xmm0 + movups xmm0,XMMWORD[rbp] +DB 102,15,56,222,241 +DB 102,15,56,222,249 + movups xmm1,XMMWORD[16+rbp] + + pxor xmm14,xmm15 +DB 102,15,56,223,84,36,0 + psrad xmm9,31 + paddq xmm15,xmm15 +DB 102,15,56,223,92,36,16 +DB 102,15,56,223,100,36,32 + pand xmm9,xmm8 + mov rax,r10 +DB 102,15,56,223,108,36,48 +DB 102,15,56,223,116,36,64 +DB 102,15,56,223,124,36,80 + pxor xmm15,xmm9 + + lea rsi,[96+rsi] + movups XMMWORD[(-96)+rsi],xmm2 + movups XMMWORD[(-80)+rsi],xmm3 + movups XMMWORD[(-64)+rsi],xmm4 + movups XMMWORD[(-48)+rsi],xmm5 + movups XMMWORD[(-32)+rsi],xmm6 + movups XMMWORD[(-16)+rsi],xmm7 + sub rdx,16*6 + jnc NEAR $L$xts_dec_grandloop + + mov eax,16+96 + sub eax,r10d + mov rcx,rbp + shr eax,4 + +$L$xts_dec_short: + + mov r10d,eax + pxor xmm10,xmm0 + pxor xmm11,xmm0 + add rdx,16*6 + jz NEAR $L$xts_dec_done + + pxor xmm12,xmm0 + cmp rdx,0x20 + jb NEAR $L$xts_dec_one + pxor xmm13,xmm0 + je NEAR $L$xts_dec_two + + pxor xmm14,xmm0 + cmp rdx,0x40 + jb NEAR $L$xts_dec_three + je NEAR $L$xts_dec_four + + movdqu xmm2,XMMWORD[rdi] + movdqu xmm3,XMMWORD[16+rdi] + movdqu xmm4,XMMWORD[32+rdi] + pxor xmm2,xmm10 + movdqu xmm5,XMMWORD[48+rdi] + pxor xmm3,xmm11 + movdqu xmm6,XMMWORD[64+rdi] + lea rdi,[80+rdi] + pxor xmm4,xmm12 + pxor xmm5,xmm13 + pxor xmm6,xmm14 + + call _aesni_decrypt6 + + xorps xmm2,xmm10 + xorps xmm3,xmm11 + xorps xmm4,xmm12 + movdqu XMMWORD[rsi],xmm2 + xorps xmm5,xmm13 + movdqu XMMWORD[16+rsi],xmm3 + xorps xmm6,xmm14 + movdqu XMMWORD[32+rsi],xmm4 + pxor xmm14,xmm14 + movdqu 
XMMWORD[48+rsi],xmm5 + pcmpgtd xmm14,xmm15 + movdqu XMMWORD[64+rsi],xmm6 + lea rsi,[80+rsi] + pshufd xmm11,xmm14,0x13 + and r9,15 + jz NEAR $L$xts_dec_ret + + movdqa xmm10,xmm15 + paddq xmm15,xmm15 + pand xmm11,xmm8 + pxor xmm11,xmm15 + jmp NEAR $L$xts_dec_done2 + +ALIGN 16 +$L$xts_dec_one: + movups xmm2,XMMWORD[rdi] + lea rdi,[16+rdi] + xorps xmm2,xmm10 + movups xmm0,XMMWORD[rcx] + movups xmm1,XMMWORD[16+rcx] + lea rcx,[32+rcx] + xorps xmm2,xmm0 +$L$oop_dec1_12: +DB 102,15,56,222,209 + dec eax + movups xmm1,XMMWORD[rcx] + lea rcx,[16+rcx] + jnz NEAR $L$oop_dec1_12 +DB 102,15,56,223,209 + xorps xmm2,xmm10 + movdqa xmm10,xmm11 + movups XMMWORD[rsi],xmm2 + movdqa xmm11,xmm12 + lea rsi,[16+rsi] + jmp NEAR $L$xts_dec_done + +ALIGN 16 +$L$xts_dec_two: + movups xmm2,XMMWORD[rdi] + movups xmm3,XMMWORD[16+rdi] + lea rdi,[32+rdi] + xorps xmm2,xmm10 + xorps xmm3,xmm11 + + call _aesni_decrypt2 + + xorps xmm2,xmm10 + movdqa xmm10,xmm12 + xorps xmm3,xmm11 + movdqa xmm11,xmm13 + movups XMMWORD[rsi],xmm2 + movups XMMWORD[16+rsi],xmm3 + lea rsi,[32+rsi] + jmp NEAR $L$xts_dec_done + +ALIGN 16 +$L$xts_dec_three: + movups xmm2,XMMWORD[rdi] + movups xmm3,XMMWORD[16+rdi] + movups xmm4,XMMWORD[32+rdi] + lea rdi,[48+rdi] + xorps xmm2,xmm10 + xorps xmm3,xmm11 + xorps xmm4,xmm12 + + call _aesni_decrypt3 + + xorps xmm2,xmm10 + movdqa xmm10,xmm13 + xorps xmm3,xmm11 + movdqa xmm11,xmm14 + xorps xmm4,xmm12 + movups XMMWORD[rsi],xmm2 + movups XMMWORD[16+rsi],xmm3 + movups XMMWORD[32+rsi],xmm4 + lea rsi,[48+rsi] + jmp NEAR $L$xts_dec_done + +ALIGN 16 +$L$xts_dec_four: + movups xmm2,XMMWORD[rdi] + movups xmm3,XMMWORD[16+rdi] + movups xmm4,XMMWORD[32+rdi] + xorps xmm2,xmm10 + movups xmm5,XMMWORD[48+rdi] + lea rdi,[64+rdi] + xorps xmm3,xmm11 + xorps xmm4,xmm12 + xorps xmm5,xmm13 + + call _aesni_decrypt4 + + pxor xmm2,xmm10 + movdqa xmm10,xmm14 + pxor xmm3,xmm11 + movdqa xmm11,xmm15 + pxor xmm4,xmm12 + movdqu XMMWORD[rsi],xmm2 + pxor xmm5,xmm13 + movdqu XMMWORD[16+rsi],xmm3 + movdqu XMMWORD[32+rsi],xmm4 + movdqu XMMWORD[48+rsi],xmm5 + lea rsi,[64+rsi] + jmp NEAR $L$xts_dec_done + +ALIGN 16 +$L$xts_dec_done: + and r9,15 + jz NEAR $L$xts_dec_ret +$L$xts_dec_done2: + mov rdx,r9 + mov rcx,rbp + mov eax,r10d + + movups xmm2,XMMWORD[rdi] + xorps xmm2,xmm11 + movups xmm0,XMMWORD[rcx] + movups xmm1,XMMWORD[16+rcx] + lea rcx,[32+rcx] + xorps xmm2,xmm0 +$L$oop_dec1_13: +DB 102,15,56,222,209 + dec eax + movups xmm1,XMMWORD[rcx] + lea rcx,[16+rcx] + jnz NEAR $L$oop_dec1_13 +DB 102,15,56,223,209 + xorps xmm2,xmm11 + movups XMMWORD[rsi],xmm2 + +$L$xts_dec_steal: + movzx eax,BYTE[16+rdi] + movzx ecx,BYTE[rsi] + lea rdi,[1+rdi] + mov BYTE[rsi],al + mov BYTE[16+rsi],cl + lea rsi,[1+rsi] + sub rdx,1 + jnz NEAR $L$xts_dec_steal + + sub rsi,r9 + mov rcx,rbp + mov eax,r10d + + movups xmm2,XMMWORD[rsi] + xorps xmm2,xmm10 + movups xmm0,XMMWORD[rcx] + movups xmm1,XMMWORD[16+rcx] + lea rcx,[32+rcx] + xorps xmm2,xmm0 +$L$oop_dec1_14: +DB 102,15,56,222,209 + dec eax + movups xmm1,XMMWORD[rcx] + lea rcx,[16+rcx] + jnz NEAR $L$oop_dec1_14 +DB 102,15,56,223,209 + xorps xmm2,xmm10 + movups XMMWORD[rsi],xmm2 + +$L$xts_dec_ret: + xorps xmm0,xmm0 + pxor xmm1,xmm1 + pxor xmm2,xmm2 + pxor xmm3,xmm3 + pxor xmm4,xmm4 + pxor xmm5,xmm5 + movaps xmm6,XMMWORD[((-168))+r11] + movaps XMMWORD[(-168)+r11],xmm0 + movaps xmm7,XMMWORD[((-152))+r11] + movaps XMMWORD[(-152)+r11],xmm0 + movaps xmm8,XMMWORD[((-136))+r11] + movaps XMMWORD[(-136)+r11],xmm0 + movaps xmm9,XMMWORD[((-120))+r11] + movaps XMMWORD[(-120)+r11],xmm0 + movaps xmm10,XMMWORD[((-104))+r11] + movaps 
XMMWORD[(-104)+r11],xmm0 + movaps xmm11,XMMWORD[((-88))+r11] + movaps XMMWORD[(-88)+r11],xmm0 + movaps xmm12,XMMWORD[((-72))+r11] + movaps XMMWORD[(-72)+r11],xmm0 + movaps xmm13,XMMWORD[((-56))+r11] + movaps XMMWORD[(-56)+r11],xmm0 + movaps xmm14,XMMWORD[((-40))+r11] + movaps XMMWORD[(-40)+r11],xmm0 + movaps xmm15,XMMWORD[((-24))+r11] + movaps XMMWORD[(-24)+r11],xmm0 + movaps XMMWORD[rsp],xmm0 + movaps XMMWORD[16+rsp],xmm0 + movaps XMMWORD[32+rsp],xmm0 + movaps XMMWORD[48+rsp],xmm0 + movaps XMMWORD[64+rsp],xmm0 + movaps XMMWORD[80+rsp],xmm0 + movaps XMMWORD[96+rsp],xmm0 + mov rbp,QWORD[((-8))+r11] + + lea rsp,[r11] + +$L$xts_dec_epilogue: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_aesni_xts_decrypt: +global aesni_ocb_encrypt + +ALIGN 32 +aesni_ocb_encrypt: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_aesni_ocb_encrypt: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + mov rcx,r9 + mov r8,QWORD[40+rsp] + mov r9,QWORD[48+rsp] + + + + lea rax,[rsp] + push rbx + + push rbp + + push r12 + + push r13 + + push r14 + + lea rsp,[((-160))+rsp] + movaps XMMWORD[rsp],xmm6 + movaps XMMWORD[16+rsp],xmm7 + movaps XMMWORD[32+rsp],xmm8 + movaps XMMWORD[48+rsp],xmm9 + movaps XMMWORD[64+rsp],xmm10 + movaps XMMWORD[80+rsp],xmm11 + movaps XMMWORD[96+rsp],xmm12 + movaps XMMWORD[112+rsp],xmm13 + movaps XMMWORD[128+rsp],xmm14 + movaps XMMWORD[144+rsp],xmm15 +$L$ocb_enc_body: + mov rbx,QWORD[56+rax] + mov rbp,QWORD[((56+8))+rax] + + mov r10d,DWORD[240+rcx] + mov r11,rcx + shl r10d,4 + movups xmm9,XMMWORD[rcx] + movups xmm1,XMMWORD[16+r10*1+rcx] + + movdqu xmm15,XMMWORD[r9] + pxor xmm9,xmm1 + pxor xmm15,xmm1 + + mov eax,16+32 + lea rcx,[32+r10*1+r11] + movups xmm1,XMMWORD[16+r11] + sub rax,r10 + mov r10,rax + + movdqu xmm10,XMMWORD[rbx] + movdqu xmm8,XMMWORD[rbp] + + test r8,1 + jnz NEAR $L$ocb_enc_odd + + bsf r12,r8 + add r8,1 + shl r12,4 + movdqu xmm7,XMMWORD[r12*1+rbx] + movdqu xmm2,XMMWORD[rdi] + lea rdi,[16+rdi] + + call __ocb_encrypt1 + + movdqa xmm15,xmm7 + movups XMMWORD[rsi],xmm2 + lea rsi,[16+rsi] + sub rdx,1 + jz NEAR $L$ocb_enc_done + +$L$ocb_enc_odd: + lea r12,[1+r8] + lea r13,[3+r8] + lea r14,[5+r8] + lea r8,[6+r8] + bsf r12,r12 + bsf r13,r13 + bsf r14,r14 + shl r12,4 + shl r13,4 + shl r14,4 + + sub rdx,6 + jc NEAR $L$ocb_enc_short + jmp NEAR $L$ocb_enc_grandloop + +ALIGN 32 +$L$ocb_enc_grandloop: + movdqu xmm2,XMMWORD[rdi] + movdqu xmm3,XMMWORD[16+rdi] + movdqu xmm4,XMMWORD[32+rdi] + movdqu xmm5,XMMWORD[48+rdi] + movdqu xmm6,XMMWORD[64+rdi] + movdqu xmm7,XMMWORD[80+rdi] + lea rdi,[96+rdi] + + call __ocb_encrypt6 + + movups XMMWORD[rsi],xmm2 + movups XMMWORD[16+rsi],xmm3 + movups XMMWORD[32+rsi],xmm4 + movups XMMWORD[48+rsi],xmm5 + movups XMMWORD[64+rsi],xmm6 + movups XMMWORD[80+rsi],xmm7 + lea rsi,[96+rsi] + sub rdx,6 + jnc NEAR $L$ocb_enc_grandloop + +$L$ocb_enc_short: + add rdx,6 + jz NEAR $L$ocb_enc_done + + movdqu xmm2,XMMWORD[rdi] + cmp rdx,2 + jb NEAR $L$ocb_enc_one + movdqu xmm3,XMMWORD[16+rdi] + je NEAR $L$ocb_enc_two + + movdqu xmm4,XMMWORD[32+rdi] + cmp rdx,4 + jb NEAR $L$ocb_enc_three + movdqu xmm5,XMMWORD[48+rdi] + je NEAR $L$ocb_enc_four + + movdqu xmm6,XMMWORD[64+rdi] + pxor xmm7,xmm7 + + call __ocb_encrypt6 + + movdqa xmm15,xmm14 + movups XMMWORD[rsi],xmm2 + movups XMMWORD[16+rsi],xmm3 + movups XMMWORD[32+rsi],xmm4 + movups XMMWORD[48+rsi],xmm5 + movups XMMWORD[64+rsi],xmm6 + + jmp NEAR $L$ocb_enc_done + +ALIGN 16 +$L$ocb_enc_one: + movdqa xmm7,xmm10 + + call __ocb_encrypt1 + + movdqa 
xmm15,xmm7 + movups XMMWORD[rsi],xmm2 + jmp NEAR $L$ocb_enc_done + +ALIGN 16 +$L$ocb_enc_two: + pxor xmm4,xmm4 + pxor xmm5,xmm5 + + call __ocb_encrypt4 + + movdqa xmm15,xmm11 + movups XMMWORD[rsi],xmm2 + movups XMMWORD[16+rsi],xmm3 + + jmp NEAR $L$ocb_enc_done + +ALIGN 16 +$L$ocb_enc_three: + pxor xmm5,xmm5 + + call __ocb_encrypt4 + + movdqa xmm15,xmm12 + movups XMMWORD[rsi],xmm2 + movups XMMWORD[16+rsi],xmm3 + movups XMMWORD[32+rsi],xmm4 + + jmp NEAR $L$ocb_enc_done + +ALIGN 16 +$L$ocb_enc_four: + call __ocb_encrypt4 + + movdqa xmm15,xmm13 + movups XMMWORD[rsi],xmm2 + movups XMMWORD[16+rsi],xmm3 + movups XMMWORD[32+rsi],xmm4 + movups XMMWORD[48+rsi],xmm5 + +$L$ocb_enc_done: + pxor xmm15,xmm0 + movdqu XMMWORD[rbp],xmm8 + movdqu XMMWORD[r9],xmm15 + + xorps xmm0,xmm0 + pxor xmm1,xmm1 + pxor xmm2,xmm2 + pxor xmm3,xmm3 + pxor xmm4,xmm4 + pxor xmm5,xmm5 + movaps xmm6,XMMWORD[rsp] + movaps XMMWORD[rsp],xmm0 + movaps xmm7,XMMWORD[16+rsp] + movaps XMMWORD[16+rsp],xmm0 + movaps xmm8,XMMWORD[32+rsp] + movaps XMMWORD[32+rsp],xmm0 + movaps xmm9,XMMWORD[48+rsp] + movaps XMMWORD[48+rsp],xmm0 + movaps xmm10,XMMWORD[64+rsp] + movaps XMMWORD[64+rsp],xmm0 + movaps xmm11,XMMWORD[80+rsp] + movaps XMMWORD[80+rsp],xmm0 + movaps xmm12,XMMWORD[96+rsp] + movaps XMMWORD[96+rsp],xmm0 + movaps xmm13,XMMWORD[112+rsp] + movaps XMMWORD[112+rsp],xmm0 + movaps xmm14,XMMWORD[128+rsp] + movaps XMMWORD[128+rsp],xmm0 + movaps xmm15,XMMWORD[144+rsp] + movaps XMMWORD[144+rsp],xmm0 + lea rax,[((160+40))+rsp] +$L$ocb_enc_pop: + mov r14,QWORD[((-40))+rax] + + mov r13,QWORD[((-32))+rax] + + mov r12,QWORD[((-24))+rax] + + mov rbp,QWORD[((-16))+rax] + + mov rbx,QWORD[((-8))+rax] + + lea rsp,[rax] + +$L$ocb_enc_epilogue: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_aesni_ocb_encrypt: + + +ALIGN 32 +__ocb_encrypt6: + + pxor xmm15,xmm9 + movdqu xmm11,XMMWORD[r12*1+rbx] + movdqa xmm12,xmm10 + movdqu xmm13,XMMWORD[r13*1+rbx] + movdqa xmm14,xmm10 + pxor xmm10,xmm15 + movdqu xmm15,XMMWORD[r14*1+rbx] + pxor xmm11,xmm10 + pxor xmm8,xmm2 + pxor xmm2,xmm10 + pxor xmm12,xmm11 + pxor xmm8,xmm3 + pxor xmm3,xmm11 + pxor xmm13,xmm12 + pxor xmm8,xmm4 + pxor xmm4,xmm12 + pxor xmm14,xmm13 + pxor xmm8,xmm5 + pxor xmm5,xmm13 + pxor xmm15,xmm14 + pxor xmm8,xmm6 + pxor xmm6,xmm14 + pxor xmm8,xmm7 + pxor xmm7,xmm15 + movups xmm0,XMMWORD[32+r11] + + lea r12,[1+r8] + lea r13,[3+r8] + lea r14,[5+r8] + add r8,6 + pxor xmm10,xmm9 + bsf r12,r12 + bsf r13,r13 + bsf r14,r14 + +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 +DB 102,15,56,220,233 + pxor xmm11,xmm9 + pxor xmm12,xmm9 +DB 102,15,56,220,241 + pxor xmm13,xmm9 + pxor xmm14,xmm9 +DB 102,15,56,220,249 + movups xmm1,XMMWORD[48+r11] + pxor xmm15,xmm9 + +DB 102,15,56,220,208 +DB 102,15,56,220,216 +DB 102,15,56,220,224 +DB 102,15,56,220,232 +DB 102,15,56,220,240 +DB 102,15,56,220,248 + movups xmm0,XMMWORD[64+r11] + shl r12,4 + shl r13,4 + jmp NEAR $L$ocb_enc_loop6 + +ALIGN 32 +$L$ocb_enc_loop6: +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 +DB 102,15,56,220,233 +DB 102,15,56,220,241 +DB 102,15,56,220,249 + movups xmm1,XMMWORD[rax*1+rcx] + add rax,32 + +DB 102,15,56,220,208 +DB 102,15,56,220,216 +DB 102,15,56,220,224 +DB 102,15,56,220,232 +DB 102,15,56,220,240 +DB 102,15,56,220,248 + movups xmm0,XMMWORD[((-16))+rax*1+rcx] + jnz NEAR $L$ocb_enc_loop6 + +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 +DB 102,15,56,220,233 +DB 102,15,56,220,241 +DB 102,15,56,220,249 + movups xmm1,XMMWORD[16+r11] + shl r14,4 
+ +DB 102,65,15,56,221,210 + movdqu xmm10,XMMWORD[rbx] + mov rax,r10 +DB 102,65,15,56,221,219 +DB 102,65,15,56,221,228 +DB 102,65,15,56,221,237 +DB 102,65,15,56,221,246 +DB 102,65,15,56,221,255 + DB 0F3h,0C3h ;repret + + + + +ALIGN 32 +__ocb_encrypt4: + + pxor xmm15,xmm9 + movdqu xmm11,XMMWORD[r12*1+rbx] + movdqa xmm12,xmm10 + movdqu xmm13,XMMWORD[r13*1+rbx] + pxor xmm10,xmm15 + pxor xmm11,xmm10 + pxor xmm8,xmm2 + pxor xmm2,xmm10 + pxor xmm12,xmm11 + pxor xmm8,xmm3 + pxor xmm3,xmm11 + pxor xmm13,xmm12 + pxor xmm8,xmm4 + pxor xmm4,xmm12 + pxor xmm8,xmm5 + pxor xmm5,xmm13 + movups xmm0,XMMWORD[32+r11] + + pxor xmm10,xmm9 + pxor xmm11,xmm9 + pxor xmm12,xmm9 + pxor xmm13,xmm9 + +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 +DB 102,15,56,220,233 + movups xmm1,XMMWORD[48+r11] + +DB 102,15,56,220,208 +DB 102,15,56,220,216 +DB 102,15,56,220,224 +DB 102,15,56,220,232 + movups xmm0,XMMWORD[64+r11] + jmp NEAR $L$ocb_enc_loop4 + +ALIGN 32 +$L$ocb_enc_loop4: +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 +DB 102,15,56,220,233 + movups xmm1,XMMWORD[rax*1+rcx] + add rax,32 + +DB 102,15,56,220,208 +DB 102,15,56,220,216 +DB 102,15,56,220,224 +DB 102,15,56,220,232 + movups xmm0,XMMWORD[((-16))+rax*1+rcx] + jnz NEAR $L$ocb_enc_loop4 + +DB 102,15,56,220,209 +DB 102,15,56,220,217 +DB 102,15,56,220,225 +DB 102,15,56,220,233 + movups xmm1,XMMWORD[16+r11] + mov rax,r10 + +DB 102,65,15,56,221,210 +DB 102,65,15,56,221,219 +DB 102,65,15,56,221,228 +DB 102,65,15,56,221,237 + DB 0F3h,0C3h ;repret + + + + +ALIGN 32 +__ocb_encrypt1: + + pxor xmm7,xmm15 + pxor xmm7,xmm9 + pxor xmm8,xmm2 + pxor xmm2,xmm7 + movups xmm0,XMMWORD[32+r11] + +DB 102,15,56,220,209 + movups xmm1,XMMWORD[48+r11] + pxor xmm7,xmm9 + +DB 102,15,56,220,208 + movups xmm0,XMMWORD[64+r11] + jmp NEAR $L$ocb_enc_loop1 + +ALIGN 32 +$L$ocb_enc_loop1: +DB 102,15,56,220,209 + movups xmm1,XMMWORD[rax*1+rcx] + add rax,32 + +DB 102,15,56,220,208 + movups xmm0,XMMWORD[((-16))+rax*1+rcx] + jnz NEAR $L$ocb_enc_loop1 + +DB 102,15,56,220,209 + movups xmm1,XMMWORD[16+r11] + mov rax,r10 + +DB 102,15,56,221,215 + DB 0F3h,0C3h ;repret + + + +global aesni_ocb_decrypt + +ALIGN 32 +aesni_ocb_decrypt: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_aesni_ocb_decrypt: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + mov rcx,r9 + mov r8,QWORD[40+rsp] + mov r9,QWORD[48+rsp] + + + + lea rax,[rsp] + push rbx + + push rbp + + push r12 + + push r13 + + push r14 + + lea rsp,[((-160))+rsp] + movaps XMMWORD[rsp],xmm6 + movaps XMMWORD[16+rsp],xmm7 + movaps XMMWORD[32+rsp],xmm8 + movaps XMMWORD[48+rsp],xmm9 + movaps XMMWORD[64+rsp],xmm10 + movaps XMMWORD[80+rsp],xmm11 + movaps XMMWORD[96+rsp],xmm12 + movaps XMMWORD[112+rsp],xmm13 + movaps XMMWORD[128+rsp],xmm14 + movaps XMMWORD[144+rsp],xmm15 +$L$ocb_dec_body: + mov rbx,QWORD[56+rax] + mov rbp,QWORD[((56+8))+rax] + + mov r10d,DWORD[240+rcx] + mov r11,rcx + shl r10d,4 + movups xmm9,XMMWORD[rcx] + movups xmm1,XMMWORD[16+r10*1+rcx] + + movdqu xmm15,XMMWORD[r9] + pxor xmm9,xmm1 + pxor xmm15,xmm1 + + mov eax,16+32 + lea rcx,[32+r10*1+r11] + movups xmm1,XMMWORD[16+r11] + sub rax,r10 + mov r10,rax + + movdqu xmm10,XMMWORD[rbx] + movdqu xmm8,XMMWORD[rbp] + + test r8,1 + jnz NEAR $L$ocb_dec_odd + + bsf r12,r8 + add r8,1 + shl r12,4 + movdqu xmm7,XMMWORD[r12*1+rbx] + movdqu xmm2,XMMWORD[rdi] + lea rdi,[16+rdi] + + call __ocb_decrypt1 + + movdqa xmm15,xmm7 + movups XMMWORD[rsi],xmm2 + xorps xmm8,xmm2 + lea rsi,[16+rsi] + sub rdx,1 + jz NEAR $L$ocb_dec_done + 
+$L$ocb_dec_odd: + lea r12,[1+r8] + lea r13,[3+r8] + lea r14,[5+r8] + lea r8,[6+r8] + bsf r12,r12 + bsf r13,r13 + bsf r14,r14 + shl r12,4 + shl r13,4 + shl r14,4 + + sub rdx,6 + jc NEAR $L$ocb_dec_short + jmp NEAR $L$ocb_dec_grandloop + +ALIGN 32 +$L$ocb_dec_grandloop: + movdqu xmm2,XMMWORD[rdi] + movdqu xmm3,XMMWORD[16+rdi] + movdqu xmm4,XMMWORD[32+rdi] + movdqu xmm5,XMMWORD[48+rdi] + movdqu xmm6,XMMWORD[64+rdi] + movdqu xmm7,XMMWORD[80+rdi] + lea rdi,[96+rdi] + + call __ocb_decrypt6 + + movups XMMWORD[rsi],xmm2 + pxor xmm8,xmm2 + movups XMMWORD[16+rsi],xmm3 + pxor xmm8,xmm3 + movups XMMWORD[32+rsi],xmm4 + pxor xmm8,xmm4 + movups XMMWORD[48+rsi],xmm5 + pxor xmm8,xmm5 + movups XMMWORD[64+rsi],xmm6 + pxor xmm8,xmm6 + movups XMMWORD[80+rsi],xmm7 + pxor xmm8,xmm7 + lea rsi,[96+rsi] + sub rdx,6 + jnc NEAR $L$ocb_dec_grandloop + +$L$ocb_dec_short: + add rdx,6 + jz NEAR $L$ocb_dec_done + + movdqu xmm2,XMMWORD[rdi] + cmp rdx,2 + jb NEAR $L$ocb_dec_one + movdqu xmm3,XMMWORD[16+rdi] + je NEAR $L$ocb_dec_two + + movdqu xmm4,XMMWORD[32+rdi] + cmp rdx,4 + jb NEAR $L$ocb_dec_three + movdqu xmm5,XMMWORD[48+rdi] + je NEAR $L$ocb_dec_four + + movdqu xmm6,XMMWORD[64+rdi] + pxor xmm7,xmm7 + + call __ocb_decrypt6 + + movdqa xmm15,xmm14 + movups XMMWORD[rsi],xmm2 + pxor xmm8,xmm2 + movups XMMWORD[16+rsi],xmm3 + pxor xmm8,xmm3 + movups XMMWORD[32+rsi],xmm4 + pxor xmm8,xmm4 + movups XMMWORD[48+rsi],xmm5 + pxor xmm8,xmm5 + movups XMMWORD[64+rsi],xmm6 + pxor xmm8,xmm6 + + jmp NEAR $L$ocb_dec_done + +ALIGN 16 +$L$ocb_dec_one: + movdqa xmm7,xmm10 + + call __ocb_decrypt1 + + movdqa xmm15,xmm7 + movups XMMWORD[rsi],xmm2 + xorps xmm8,xmm2 + jmp NEAR $L$ocb_dec_done + +ALIGN 16 +$L$ocb_dec_two: + pxor xmm4,xmm4 + pxor xmm5,xmm5 + + call __ocb_decrypt4 + + movdqa xmm15,xmm11 + movups XMMWORD[rsi],xmm2 + xorps xmm8,xmm2 + movups XMMWORD[16+rsi],xmm3 + xorps xmm8,xmm3 + + jmp NEAR $L$ocb_dec_done + +ALIGN 16 +$L$ocb_dec_three: + pxor xmm5,xmm5 + + call __ocb_decrypt4 + + movdqa xmm15,xmm12 + movups XMMWORD[rsi],xmm2 + xorps xmm8,xmm2 + movups XMMWORD[16+rsi],xmm3 + xorps xmm8,xmm3 + movups XMMWORD[32+rsi],xmm4 + xorps xmm8,xmm4 + + jmp NEAR $L$ocb_dec_done + +ALIGN 16 +$L$ocb_dec_four: + call __ocb_decrypt4 + + movdqa xmm15,xmm13 + movups XMMWORD[rsi],xmm2 + pxor xmm8,xmm2 + movups XMMWORD[16+rsi],xmm3 + pxor xmm8,xmm3 + movups XMMWORD[32+rsi],xmm4 + pxor xmm8,xmm4 + movups XMMWORD[48+rsi],xmm5 + pxor xmm8,xmm5 + +$L$ocb_dec_done: + pxor xmm15,xmm0 + movdqu XMMWORD[rbp],xmm8 + movdqu XMMWORD[r9],xmm15 + + xorps xmm0,xmm0 + pxor xmm1,xmm1 + pxor xmm2,xmm2 + pxor xmm3,xmm3 + pxor xmm4,xmm4 + pxor xmm5,xmm5 + movaps xmm6,XMMWORD[rsp] + movaps XMMWORD[rsp],xmm0 + movaps xmm7,XMMWORD[16+rsp] + movaps XMMWORD[16+rsp],xmm0 + movaps xmm8,XMMWORD[32+rsp] + movaps XMMWORD[32+rsp],xmm0 + movaps xmm9,XMMWORD[48+rsp] + movaps XMMWORD[48+rsp],xmm0 + movaps xmm10,XMMWORD[64+rsp] + movaps XMMWORD[64+rsp],xmm0 + movaps xmm11,XMMWORD[80+rsp] + movaps XMMWORD[80+rsp],xmm0 + movaps xmm12,XMMWORD[96+rsp] + movaps XMMWORD[96+rsp],xmm0 + movaps xmm13,XMMWORD[112+rsp] + movaps XMMWORD[112+rsp],xmm0 + movaps xmm14,XMMWORD[128+rsp] + movaps XMMWORD[128+rsp],xmm0 + movaps xmm15,XMMWORD[144+rsp] + movaps XMMWORD[144+rsp],xmm0 + lea rax,[((160+40))+rsp] +$L$ocb_dec_pop: + mov r14,QWORD[((-40))+rax] + + mov r13,QWORD[((-32))+rax] + + mov r12,QWORD[((-24))+rax] + + mov rbp,QWORD[((-16))+rax] + + mov rbx,QWORD[((-8))+rax] + + lea rsp,[rax] + +$L$ocb_dec_epilogue: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + 
+$L$SEH_end_aesni_ocb_decrypt: + + +ALIGN 32 +__ocb_decrypt6: + + pxor xmm15,xmm9 + movdqu xmm11,XMMWORD[r12*1+rbx] + movdqa xmm12,xmm10 + movdqu xmm13,XMMWORD[r13*1+rbx] + movdqa xmm14,xmm10 + pxor xmm10,xmm15 + movdqu xmm15,XMMWORD[r14*1+rbx] + pxor xmm11,xmm10 + pxor xmm2,xmm10 + pxor xmm12,xmm11 + pxor xmm3,xmm11 + pxor xmm13,xmm12 + pxor xmm4,xmm12 + pxor xmm14,xmm13 + pxor xmm5,xmm13 + pxor xmm15,xmm14 + pxor xmm6,xmm14 + pxor xmm7,xmm15 + movups xmm0,XMMWORD[32+r11] + + lea r12,[1+r8] + lea r13,[3+r8] + lea r14,[5+r8] + add r8,6 + pxor xmm10,xmm9 + bsf r12,r12 + bsf r13,r13 + bsf r14,r14 + +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,222,233 + pxor xmm11,xmm9 + pxor xmm12,xmm9 +DB 102,15,56,222,241 + pxor xmm13,xmm9 + pxor xmm14,xmm9 +DB 102,15,56,222,249 + movups xmm1,XMMWORD[48+r11] + pxor xmm15,xmm9 + +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 +DB 102,15,56,222,232 +DB 102,15,56,222,240 +DB 102,15,56,222,248 + movups xmm0,XMMWORD[64+r11] + shl r12,4 + shl r13,4 + jmp NEAR $L$ocb_dec_loop6 + +ALIGN 32 +$L$ocb_dec_loop6: +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,222,233 +DB 102,15,56,222,241 +DB 102,15,56,222,249 + movups xmm1,XMMWORD[rax*1+rcx] + add rax,32 + +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 +DB 102,15,56,222,232 +DB 102,15,56,222,240 +DB 102,15,56,222,248 + movups xmm0,XMMWORD[((-16))+rax*1+rcx] + jnz NEAR $L$ocb_dec_loop6 + +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,222,233 +DB 102,15,56,222,241 +DB 102,15,56,222,249 + movups xmm1,XMMWORD[16+r11] + shl r14,4 + +DB 102,65,15,56,223,210 + movdqu xmm10,XMMWORD[rbx] + mov rax,r10 +DB 102,65,15,56,223,219 +DB 102,65,15,56,223,228 +DB 102,65,15,56,223,237 +DB 102,65,15,56,223,246 +DB 102,65,15,56,223,255 + DB 0F3h,0C3h ;repret + + + + +ALIGN 32 +__ocb_decrypt4: + + pxor xmm15,xmm9 + movdqu xmm11,XMMWORD[r12*1+rbx] + movdqa xmm12,xmm10 + movdqu xmm13,XMMWORD[r13*1+rbx] + pxor xmm10,xmm15 + pxor xmm11,xmm10 + pxor xmm2,xmm10 + pxor xmm12,xmm11 + pxor xmm3,xmm11 + pxor xmm13,xmm12 + pxor xmm4,xmm12 + pxor xmm5,xmm13 + movups xmm0,XMMWORD[32+r11] + + pxor xmm10,xmm9 + pxor xmm11,xmm9 + pxor xmm12,xmm9 + pxor xmm13,xmm9 + +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,222,233 + movups xmm1,XMMWORD[48+r11] + +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 +DB 102,15,56,222,232 + movups xmm0,XMMWORD[64+r11] + jmp NEAR $L$ocb_dec_loop4 + +ALIGN 32 +$L$ocb_dec_loop4: +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,222,233 + movups xmm1,XMMWORD[rax*1+rcx] + add rax,32 + +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 +DB 102,15,56,222,232 + movups xmm0,XMMWORD[((-16))+rax*1+rcx] + jnz NEAR $L$ocb_dec_loop4 + +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,222,233 + movups xmm1,XMMWORD[16+r11] + mov rax,r10 + +DB 102,65,15,56,223,210 +DB 102,65,15,56,223,219 +DB 102,65,15,56,223,228 +DB 102,65,15,56,223,237 + DB 0F3h,0C3h ;repret + + + + +ALIGN 32 +__ocb_decrypt1: + + pxor xmm7,xmm15 + pxor xmm7,xmm9 + pxor xmm2,xmm7 + movups xmm0,XMMWORD[32+r11] + +DB 102,15,56,222,209 + movups xmm1,XMMWORD[48+r11] + pxor xmm7,xmm9 + +DB 102,15,56,222,208 + movups xmm0,XMMWORD[64+r11] + jmp NEAR $L$ocb_dec_loop1 + +ALIGN 32 +$L$ocb_dec_loop1: +DB 102,15,56,222,209 + movups xmm1,XMMWORD[rax*1+rcx] + add rax,32 + +DB 102,15,56,222,208 + movups 
xmm0,XMMWORD[((-16))+rax*1+rcx] + jnz NEAR $L$ocb_dec_loop1 + +DB 102,15,56,222,209 + movups xmm1,XMMWORD[16+r11] + mov rax,r10 + +DB 102,15,56,223,215 + DB 0F3h,0C3h ;repret + + +global aesni_cbc_encrypt + +ALIGN 16 +aesni_cbc_encrypt: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_aesni_cbc_encrypt: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + mov rcx,r9 + mov r8,QWORD[40+rsp] + mov r9,QWORD[48+rsp] + + + + test rdx,rdx + jz NEAR $L$cbc_ret + + mov r10d,DWORD[240+rcx] + mov r11,rcx + test r9d,r9d + jz NEAR $L$cbc_decrypt + + movups xmm2,XMMWORD[r8] + mov eax,r10d + cmp rdx,16 + jb NEAR $L$cbc_enc_tail + sub rdx,16 + jmp NEAR $L$cbc_enc_loop +ALIGN 16 +$L$cbc_enc_loop: + movups xmm3,XMMWORD[rdi] + lea rdi,[16+rdi] + + movups xmm0,XMMWORD[rcx] + movups xmm1,XMMWORD[16+rcx] + xorps xmm3,xmm0 + lea rcx,[32+rcx] + xorps xmm2,xmm3 +$L$oop_enc1_15: +DB 102,15,56,220,209 + dec eax + movups xmm1,XMMWORD[rcx] + lea rcx,[16+rcx] + jnz NEAR $L$oop_enc1_15 +DB 102,15,56,221,209 + mov eax,r10d + mov rcx,r11 + movups XMMWORD[rsi],xmm2 + lea rsi,[16+rsi] + sub rdx,16 + jnc NEAR $L$cbc_enc_loop + add rdx,16 + jnz NEAR $L$cbc_enc_tail + pxor xmm0,xmm0 + pxor xmm1,xmm1 + movups XMMWORD[r8],xmm2 + pxor xmm2,xmm2 + pxor xmm3,xmm3 + jmp NEAR $L$cbc_ret + +$L$cbc_enc_tail: + mov rcx,rdx + xchg rsi,rdi + DD 0x9066A4F3 + mov ecx,16 + sub rcx,rdx + xor eax,eax + DD 0x9066AAF3 + lea rdi,[((-16))+rdi] + mov eax,r10d + mov rsi,rdi + mov rcx,r11 + xor rdx,rdx + jmp NEAR $L$cbc_enc_loop + +ALIGN 16 +$L$cbc_decrypt: + cmp rdx,16 + jne NEAR $L$cbc_decrypt_bulk + + + + movdqu xmm2,XMMWORD[rdi] + movdqu xmm3,XMMWORD[r8] + movdqa xmm4,xmm2 + movups xmm0,XMMWORD[rcx] + movups xmm1,XMMWORD[16+rcx] + lea rcx,[32+rcx] + xorps xmm2,xmm0 +$L$oop_dec1_16: +DB 102,15,56,222,209 + dec r10d + movups xmm1,XMMWORD[rcx] + lea rcx,[16+rcx] + jnz NEAR $L$oop_dec1_16 +DB 102,15,56,223,209 + pxor xmm0,xmm0 + pxor xmm1,xmm1 + movdqu XMMWORD[r8],xmm4 + xorps xmm2,xmm3 + pxor xmm3,xmm3 + movups XMMWORD[rsi],xmm2 + pxor xmm2,xmm2 + jmp NEAR $L$cbc_ret +ALIGN 16 +$L$cbc_decrypt_bulk: + lea r11,[rsp] + + push rbp + + sub rsp,176 + and rsp,-16 + movaps XMMWORD[16+rsp],xmm6 + movaps XMMWORD[32+rsp],xmm7 + movaps XMMWORD[48+rsp],xmm8 + movaps XMMWORD[64+rsp],xmm9 + movaps XMMWORD[80+rsp],xmm10 + movaps XMMWORD[96+rsp],xmm11 + movaps XMMWORD[112+rsp],xmm12 + movaps XMMWORD[128+rsp],xmm13 + movaps XMMWORD[144+rsp],xmm14 + movaps XMMWORD[160+rsp],xmm15 +$L$cbc_decrypt_body: + mov rbp,rcx + movups xmm10,XMMWORD[r8] + mov eax,r10d + cmp rdx,0x50 + jbe NEAR $L$cbc_dec_tail + + movups xmm0,XMMWORD[rcx] + movdqu xmm2,XMMWORD[rdi] + movdqu xmm3,XMMWORD[16+rdi] + movdqa xmm11,xmm2 + movdqu xmm4,XMMWORD[32+rdi] + movdqa xmm12,xmm3 + movdqu xmm5,XMMWORD[48+rdi] + movdqa xmm13,xmm4 + movdqu xmm6,XMMWORD[64+rdi] + movdqa xmm14,xmm5 + movdqu xmm7,XMMWORD[80+rdi] + movdqa xmm15,xmm6 + mov r9d,DWORD[((OPENSSL_ia32cap_P+4))] + cmp rdx,0x70 + jbe NEAR $L$cbc_dec_six_or_seven + + and r9d,71303168 + sub rdx,0x50 + cmp r9d,4194304 + je NEAR $L$cbc_dec_loop6_enter + sub rdx,0x20 + lea rcx,[112+rcx] + jmp NEAR $L$cbc_dec_loop8_enter +ALIGN 16 +$L$cbc_dec_loop8: + movups XMMWORD[rsi],xmm9 + lea rsi,[16+rsi] +$L$cbc_dec_loop8_enter: + movdqu xmm8,XMMWORD[96+rdi] + pxor xmm2,xmm0 + movdqu xmm9,XMMWORD[112+rdi] + pxor xmm3,xmm0 + movups xmm1,XMMWORD[((16-112))+rcx] + pxor xmm4,xmm0 + mov rbp,-1 + cmp rdx,0x70 + pxor xmm5,xmm0 + pxor xmm6,xmm0 + pxor xmm7,xmm0 + pxor xmm8,xmm0 + +DB 102,15,56,222,209 + pxor xmm9,xmm0 + movups 
xmm0,XMMWORD[((32-112))+rcx] +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,222,233 +DB 102,15,56,222,241 +DB 102,15,56,222,249 +DB 102,68,15,56,222,193 + adc rbp,0 + and rbp,128 +DB 102,68,15,56,222,201 + add rbp,rdi + movups xmm1,XMMWORD[((48-112))+rcx] +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 +DB 102,15,56,222,232 +DB 102,15,56,222,240 +DB 102,15,56,222,248 +DB 102,68,15,56,222,192 +DB 102,68,15,56,222,200 + movups xmm0,XMMWORD[((64-112))+rcx] + nop +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,222,233 +DB 102,15,56,222,241 +DB 102,15,56,222,249 +DB 102,68,15,56,222,193 +DB 102,68,15,56,222,201 + movups xmm1,XMMWORD[((80-112))+rcx] + nop +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 +DB 102,15,56,222,232 +DB 102,15,56,222,240 +DB 102,15,56,222,248 +DB 102,68,15,56,222,192 +DB 102,68,15,56,222,200 + movups xmm0,XMMWORD[((96-112))+rcx] + nop +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,222,233 +DB 102,15,56,222,241 +DB 102,15,56,222,249 +DB 102,68,15,56,222,193 +DB 102,68,15,56,222,201 + movups xmm1,XMMWORD[((112-112))+rcx] + nop +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 +DB 102,15,56,222,232 +DB 102,15,56,222,240 +DB 102,15,56,222,248 +DB 102,68,15,56,222,192 +DB 102,68,15,56,222,200 + movups xmm0,XMMWORD[((128-112))+rcx] + nop +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,222,233 +DB 102,15,56,222,241 +DB 102,15,56,222,249 +DB 102,68,15,56,222,193 +DB 102,68,15,56,222,201 + movups xmm1,XMMWORD[((144-112))+rcx] + cmp eax,11 +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 +DB 102,15,56,222,232 +DB 102,15,56,222,240 +DB 102,15,56,222,248 +DB 102,68,15,56,222,192 +DB 102,68,15,56,222,200 + movups xmm0,XMMWORD[((160-112))+rcx] + jb NEAR $L$cbc_dec_done +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,222,233 +DB 102,15,56,222,241 +DB 102,15,56,222,249 +DB 102,68,15,56,222,193 +DB 102,68,15,56,222,201 + movups xmm1,XMMWORD[((176-112))+rcx] + nop +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 +DB 102,15,56,222,232 +DB 102,15,56,222,240 +DB 102,15,56,222,248 +DB 102,68,15,56,222,192 +DB 102,68,15,56,222,200 + movups xmm0,XMMWORD[((192-112))+rcx] + je NEAR $L$cbc_dec_done +DB 102,15,56,222,209 +DB 102,15,56,222,217 +DB 102,15,56,222,225 +DB 102,15,56,222,233 +DB 102,15,56,222,241 +DB 102,15,56,222,249 +DB 102,68,15,56,222,193 +DB 102,68,15,56,222,201 + movups xmm1,XMMWORD[((208-112))+rcx] + nop +DB 102,15,56,222,208 +DB 102,15,56,222,216 +DB 102,15,56,222,224 +DB 102,15,56,222,232 +DB 102,15,56,222,240 +DB 102,15,56,222,248 +DB 102,68,15,56,222,192 +DB 102,68,15,56,222,200 + movups xmm0,XMMWORD[((224-112))+rcx] + jmp NEAR $L$cbc_dec_done +ALIGN 16 +$L$cbc_dec_done: +DB 102,15,56,222,209 +DB 102,15,56,222,217 + pxor xmm10,xmm0 + pxor xmm11,xmm0 +DB 102,15,56,222,225 +DB 102,15,56,222,233 + pxor xmm12,xmm0 + pxor xmm13,xmm0 +DB 102,15,56,222,241 +DB 102,15,56,222,249 + pxor xmm14,xmm0 + pxor xmm15,xmm0 +DB 102,68,15,56,222,193 +DB 102,68,15,56,222,201 + movdqu xmm1,XMMWORD[80+rdi] + +DB 102,65,15,56,223,210 + movdqu xmm10,XMMWORD[96+rdi] + pxor xmm1,xmm0 +DB 102,65,15,56,223,219 + pxor xmm10,xmm0 + movdqu xmm0,XMMWORD[112+rdi] +DB 102,65,15,56,223,228 + lea rdi,[128+rdi] + movdqu xmm11,XMMWORD[rbp] +DB 102,65,15,56,223,237 +DB 102,65,15,56,223,246 + movdqu xmm12,XMMWORD[16+rbp] + movdqu xmm13,XMMWORD[32+rbp] +DB 102,65,15,56,223,255 +DB 
102,68,15,56,223,193 + movdqu xmm14,XMMWORD[48+rbp] + movdqu xmm15,XMMWORD[64+rbp] +DB 102,69,15,56,223,202 + movdqa xmm10,xmm0 + movdqu xmm1,XMMWORD[80+rbp] + movups xmm0,XMMWORD[((-112))+rcx] + + movups XMMWORD[rsi],xmm2 + movdqa xmm2,xmm11 + movups XMMWORD[16+rsi],xmm3 + movdqa xmm3,xmm12 + movups XMMWORD[32+rsi],xmm4 + movdqa xmm4,xmm13 + movups XMMWORD[48+rsi],xmm5 + movdqa xmm5,xmm14 + movups XMMWORD[64+rsi],xmm6 + movdqa xmm6,xmm15 + movups XMMWORD[80+rsi],xmm7 + movdqa xmm7,xmm1 + movups XMMWORD[96+rsi],xmm8 + lea rsi,[112+rsi] + + sub rdx,0x80 + ja NEAR $L$cbc_dec_loop8 + + movaps xmm2,xmm9 + lea rcx,[((-112))+rcx] + add rdx,0x70 + jle NEAR $L$cbc_dec_clear_tail_collected + movups XMMWORD[rsi],xmm9 + lea rsi,[16+rsi] + cmp rdx,0x50 + jbe NEAR $L$cbc_dec_tail + + movaps xmm2,xmm11 +$L$cbc_dec_six_or_seven: + cmp rdx,0x60 + ja NEAR $L$cbc_dec_seven + + movaps xmm8,xmm7 + call _aesni_decrypt6 + pxor xmm2,xmm10 + movaps xmm10,xmm8 + pxor xmm3,xmm11 + movdqu XMMWORD[rsi],xmm2 + pxor xmm4,xmm12 + movdqu XMMWORD[16+rsi],xmm3 + pxor xmm3,xmm3 + pxor xmm5,xmm13 + movdqu XMMWORD[32+rsi],xmm4 + pxor xmm4,xmm4 + pxor xmm6,xmm14 + movdqu XMMWORD[48+rsi],xmm5 + pxor xmm5,xmm5 + pxor xmm7,xmm15 + movdqu XMMWORD[64+rsi],xmm6 + pxor xmm6,xmm6 + lea rsi,[80+rsi] + movdqa xmm2,xmm7 + pxor xmm7,xmm7 + jmp NEAR $L$cbc_dec_tail_collected + +ALIGN 16 +$L$cbc_dec_seven: + movups xmm8,XMMWORD[96+rdi] + xorps xmm9,xmm9 + call _aesni_decrypt8 + movups xmm9,XMMWORD[80+rdi] + pxor xmm2,xmm10 + movups xmm10,XMMWORD[96+rdi] + pxor xmm3,xmm11 + movdqu XMMWORD[rsi],xmm2 + pxor xmm4,xmm12 + movdqu XMMWORD[16+rsi],xmm3 + pxor xmm3,xmm3 + pxor xmm5,xmm13 + movdqu XMMWORD[32+rsi],xmm4 + pxor xmm4,xmm4 + pxor xmm6,xmm14 + movdqu XMMWORD[48+rsi],xmm5 + pxor xmm5,xmm5 + pxor xmm7,xmm15 + movdqu XMMWORD[64+rsi],xmm6 + pxor xmm6,xmm6 + pxor xmm8,xmm9 + movdqu XMMWORD[80+rsi],xmm7 + pxor xmm7,xmm7 + lea rsi,[96+rsi] + movdqa xmm2,xmm8 + pxor xmm8,xmm8 + pxor xmm9,xmm9 + jmp NEAR $L$cbc_dec_tail_collected + +ALIGN 16 +$L$cbc_dec_loop6: + movups XMMWORD[rsi],xmm7 + lea rsi,[16+rsi] + movdqu xmm2,XMMWORD[rdi] + movdqu xmm3,XMMWORD[16+rdi] + movdqa xmm11,xmm2 + movdqu xmm4,XMMWORD[32+rdi] + movdqa xmm12,xmm3 + movdqu xmm5,XMMWORD[48+rdi] + movdqa xmm13,xmm4 + movdqu xmm6,XMMWORD[64+rdi] + movdqa xmm14,xmm5 + movdqu xmm7,XMMWORD[80+rdi] + movdqa xmm15,xmm6 +$L$cbc_dec_loop6_enter: + lea rdi,[96+rdi] + movdqa xmm8,xmm7 + + call _aesni_decrypt6 + + pxor xmm2,xmm10 + movdqa xmm10,xmm8 + pxor xmm3,xmm11 + movdqu XMMWORD[rsi],xmm2 + pxor xmm4,xmm12 + movdqu XMMWORD[16+rsi],xmm3 + pxor xmm5,xmm13 + movdqu XMMWORD[32+rsi],xmm4 + pxor xmm6,xmm14 + mov rcx,rbp + movdqu XMMWORD[48+rsi],xmm5 + pxor xmm7,xmm15 + mov eax,r10d + movdqu XMMWORD[64+rsi],xmm6 + lea rsi,[80+rsi] + sub rdx,0x60 + ja NEAR $L$cbc_dec_loop6 + + movdqa xmm2,xmm7 + add rdx,0x50 + jle NEAR $L$cbc_dec_clear_tail_collected + movups XMMWORD[rsi],xmm7 + lea rsi,[16+rsi] + +$L$cbc_dec_tail: + movups xmm2,XMMWORD[rdi] + sub rdx,0x10 + jbe NEAR $L$cbc_dec_one + + movups xmm3,XMMWORD[16+rdi] + movaps xmm11,xmm2 + sub rdx,0x10 + jbe NEAR $L$cbc_dec_two + + movups xmm4,XMMWORD[32+rdi] + movaps xmm12,xmm3 + sub rdx,0x10 + jbe NEAR $L$cbc_dec_three + + movups xmm5,XMMWORD[48+rdi] + movaps xmm13,xmm4 + sub rdx,0x10 + jbe NEAR $L$cbc_dec_four + + movups xmm6,XMMWORD[64+rdi] + movaps xmm14,xmm5 + movaps xmm15,xmm6 + xorps xmm7,xmm7 + call _aesni_decrypt6 + pxor xmm2,xmm10 + movaps xmm10,xmm15 + pxor xmm3,xmm11 + movdqu XMMWORD[rsi],xmm2 + pxor xmm4,xmm12 + movdqu XMMWORD[16+rsi],xmm3 
+ pxor xmm3,xmm3 + pxor xmm5,xmm13 + movdqu XMMWORD[32+rsi],xmm4 + pxor xmm4,xmm4 + pxor xmm6,xmm14 + movdqu XMMWORD[48+rsi],xmm5 + pxor xmm5,xmm5 + lea rsi,[64+rsi] + movdqa xmm2,xmm6 + pxor xmm6,xmm6 + pxor xmm7,xmm7 + sub rdx,0x10 + jmp NEAR $L$cbc_dec_tail_collected + +ALIGN 16 +$L$cbc_dec_one: + movaps xmm11,xmm2 + movups xmm0,XMMWORD[rcx] + movups xmm1,XMMWORD[16+rcx] + lea rcx,[32+rcx] + xorps xmm2,xmm0 +$L$oop_dec1_17: +DB 102,15,56,222,209 + dec eax + movups xmm1,XMMWORD[rcx] + lea rcx,[16+rcx] + jnz NEAR $L$oop_dec1_17 +DB 102,15,56,223,209 + xorps xmm2,xmm10 + movaps xmm10,xmm11 + jmp NEAR $L$cbc_dec_tail_collected +ALIGN 16 +$L$cbc_dec_two: + movaps xmm12,xmm3 + call _aesni_decrypt2 + pxor xmm2,xmm10 + movaps xmm10,xmm12 + pxor xmm3,xmm11 + movdqu XMMWORD[rsi],xmm2 + movdqa xmm2,xmm3 + pxor xmm3,xmm3 + lea rsi,[16+rsi] + jmp NEAR $L$cbc_dec_tail_collected +ALIGN 16 +$L$cbc_dec_three: + movaps xmm13,xmm4 + call _aesni_decrypt3 + pxor xmm2,xmm10 + movaps xmm10,xmm13 + pxor xmm3,xmm11 + movdqu XMMWORD[rsi],xmm2 + pxor xmm4,xmm12 + movdqu XMMWORD[16+rsi],xmm3 + pxor xmm3,xmm3 + movdqa xmm2,xmm4 + pxor xmm4,xmm4 + lea rsi,[32+rsi] + jmp NEAR $L$cbc_dec_tail_collected +ALIGN 16 +$L$cbc_dec_four: + movaps xmm14,xmm5 + call _aesni_decrypt4 + pxor xmm2,xmm10 + movaps xmm10,xmm14 + pxor xmm3,xmm11 + movdqu XMMWORD[rsi],xmm2 + pxor xmm4,xmm12 + movdqu XMMWORD[16+rsi],xmm3 + pxor xmm3,xmm3 + pxor xmm5,xmm13 + movdqu XMMWORD[32+rsi],xmm4 + pxor xmm4,xmm4 + movdqa xmm2,xmm5 + pxor xmm5,xmm5 + lea rsi,[48+rsi] + jmp NEAR $L$cbc_dec_tail_collected + +ALIGN 16 +$L$cbc_dec_clear_tail_collected: + pxor xmm3,xmm3 + pxor xmm4,xmm4 + pxor xmm5,xmm5 +$L$cbc_dec_tail_collected: + movups XMMWORD[r8],xmm10 + and rdx,15 + jnz NEAR $L$cbc_dec_tail_partial + movups XMMWORD[rsi],xmm2 + pxor xmm2,xmm2 + jmp NEAR $L$cbc_dec_ret +ALIGN 16 +$L$cbc_dec_tail_partial: + movaps XMMWORD[rsp],xmm2 + pxor xmm2,xmm2 + mov rcx,16 + mov rdi,rsi + sub rcx,rdx + lea rsi,[rsp] + DD 0x9066A4F3 + movdqa XMMWORD[rsp],xmm2 + +$L$cbc_dec_ret: + xorps xmm0,xmm0 + pxor xmm1,xmm1 + movaps xmm6,XMMWORD[16+rsp] + movaps XMMWORD[16+rsp],xmm0 + movaps xmm7,XMMWORD[32+rsp] + movaps XMMWORD[32+rsp],xmm0 + movaps xmm8,XMMWORD[48+rsp] + movaps XMMWORD[48+rsp],xmm0 + movaps xmm9,XMMWORD[64+rsp] + movaps XMMWORD[64+rsp],xmm0 + movaps xmm10,XMMWORD[80+rsp] + movaps XMMWORD[80+rsp],xmm0 + movaps xmm11,XMMWORD[96+rsp] + movaps XMMWORD[96+rsp],xmm0 + movaps xmm12,XMMWORD[112+rsp] + movaps XMMWORD[112+rsp],xmm0 + movaps xmm13,XMMWORD[128+rsp] + movaps XMMWORD[128+rsp],xmm0 + movaps xmm14,XMMWORD[144+rsp] + movaps XMMWORD[144+rsp],xmm0 + movaps xmm15,XMMWORD[160+rsp] + movaps XMMWORD[160+rsp],xmm0 + mov rbp,QWORD[((-8))+r11] + + lea rsp,[r11] + +$L$cbc_ret: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_aesni_cbc_encrypt: +global aesni_set_decrypt_key + +ALIGN 16 +aesni_set_decrypt_key: + +DB 0x48,0x83,0xEC,0x08 + + call __aesni_set_encrypt_key + shl edx,4 + test eax,eax + jnz NEAR $L$dec_key_ret + lea rcx,[16+rdx*1+r8] + + movups xmm0,XMMWORD[r8] + movups xmm1,XMMWORD[rcx] + movups XMMWORD[rcx],xmm0 + movups XMMWORD[r8],xmm1 + lea r8,[16+r8] + lea rcx,[((-16))+rcx] + +$L$dec_key_inverse: + movups xmm0,XMMWORD[r8] + movups xmm1,XMMWORD[rcx] +DB 102,15,56,219,192 +DB 102,15,56,219,201 + lea r8,[16+r8] + lea rcx,[((-16))+rcx] + movups XMMWORD[16+rcx],xmm0 + movups XMMWORD[(-16)+r8],xmm1 + cmp rcx,r8 + ja NEAR $L$dec_key_inverse + + movups xmm0,XMMWORD[r8] +DB 102,15,56,219,192 + pxor xmm1,xmm1 + 
movups XMMWORD[rcx],xmm0 + pxor xmm0,xmm0 +$L$dec_key_ret: + add rsp,8 + + DB 0F3h,0C3h ;repret + +$L$SEH_end_set_decrypt_key: + +global aesni_set_encrypt_key + +ALIGN 16 +aesni_set_encrypt_key: +__aesni_set_encrypt_key: + +DB 0x48,0x83,0xEC,0x08 + + mov rax,-1 + test rcx,rcx + jz NEAR $L$enc_key_ret + test r8,r8 + jz NEAR $L$enc_key_ret + + mov r10d,268437504 + movups xmm0,XMMWORD[rcx] + xorps xmm4,xmm4 + and r10d,DWORD[((OPENSSL_ia32cap_P+4))] + lea rax,[16+r8] + cmp edx,256 + je NEAR $L$14rounds + cmp edx,192 + je NEAR $L$12rounds + cmp edx,128 + jne NEAR $L$bad_keybits + +$L$10rounds: + mov edx,9 + cmp r10d,268435456 + je NEAR $L$10rounds_alt + + movups XMMWORD[r8],xmm0 +DB 102,15,58,223,200,1 + call $L$key_expansion_128_cold +DB 102,15,58,223,200,2 + call $L$key_expansion_128 +DB 102,15,58,223,200,4 + call $L$key_expansion_128 +DB 102,15,58,223,200,8 + call $L$key_expansion_128 +DB 102,15,58,223,200,16 + call $L$key_expansion_128 +DB 102,15,58,223,200,32 + call $L$key_expansion_128 +DB 102,15,58,223,200,64 + call $L$key_expansion_128 +DB 102,15,58,223,200,128 + call $L$key_expansion_128 +DB 102,15,58,223,200,27 + call $L$key_expansion_128 +DB 102,15,58,223,200,54 + call $L$key_expansion_128 + movups XMMWORD[rax],xmm0 + mov DWORD[80+rax],edx + xor eax,eax + jmp NEAR $L$enc_key_ret + +ALIGN 16 +$L$10rounds_alt: + movdqa xmm5,XMMWORD[$L$key_rotate] + mov r10d,8 + movdqa xmm4,XMMWORD[$L$key_rcon1] + movdqa xmm2,xmm0 + movdqu XMMWORD[r8],xmm0 + jmp NEAR $L$oop_key128 + +ALIGN 16 +$L$oop_key128: +DB 102,15,56,0,197 +DB 102,15,56,221,196 + pslld xmm4,1 + lea rax,[16+rax] + + movdqa xmm3,xmm2 + pslldq xmm2,4 + pxor xmm3,xmm2 + pslldq xmm2,4 + pxor xmm3,xmm2 + pslldq xmm2,4 + pxor xmm2,xmm3 + + pxor xmm0,xmm2 + movdqu XMMWORD[(-16)+rax],xmm0 + movdqa xmm2,xmm0 + + dec r10d + jnz NEAR $L$oop_key128 + + movdqa xmm4,XMMWORD[$L$key_rcon1b] + +DB 102,15,56,0,197 +DB 102,15,56,221,196 + pslld xmm4,1 + + movdqa xmm3,xmm2 + pslldq xmm2,4 + pxor xmm3,xmm2 + pslldq xmm2,4 + pxor xmm3,xmm2 + pslldq xmm2,4 + pxor xmm2,xmm3 + + pxor xmm0,xmm2 + movdqu XMMWORD[rax],xmm0 + + movdqa xmm2,xmm0 +DB 102,15,56,0,197 +DB 102,15,56,221,196 + + movdqa xmm3,xmm2 + pslldq xmm2,4 + pxor xmm3,xmm2 + pslldq xmm2,4 + pxor xmm3,xmm2 + pslldq xmm2,4 + pxor xmm2,xmm3 + + pxor xmm0,xmm2 + movdqu XMMWORD[16+rax],xmm0 + + mov DWORD[96+rax],edx + xor eax,eax + jmp NEAR $L$enc_key_ret + +ALIGN 16 +$L$12rounds: + movq xmm2,QWORD[16+rcx] + mov edx,11 + cmp r10d,268435456 + je NEAR $L$12rounds_alt + + movups XMMWORD[r8],xmm0 +DB 102,15,58,223,202,1 + call $L$key_expansion_192a_cold +DB 102,15,58,223,202,2 + call $L$key_expansion_192b +DB 102,15,58,223,202,4 + call $L$key_expansion_192a +DB 102,15,58,223,202,8 + call $L$key_expansion_192b +DB 102,15,58,223,202,16 + call $L$key_expansion_192a +DB 102,15,58,223,202,32 + call $L$key_expansion_192b +DB 102,15,58,223,202,64 + call $L$key_expansion_192a +DB 102,15,58,223,202,128 + call $L$key_expansion_192b + movups XMMWORD[rax],xmm0 + mov DWORD[48+rax],edx + xor rax,rax + jmp NEAR $L$enc_key_ret + +ALIGN 16 +$L$12rounds_alt: + movdqa xmm5,XMMWORD[$L$key_rotate192] + movdqa xmm4,XMMWORD[$L$key_rcon1] + mov r10d,8 + movdqu XMMWORD[r8],xmm0 + jmp NEAR $L$oop_key192 + +ALIGN 16 +$L$oop_key192: + movq QWORD[rax],xmm2 + movdqa xmm1,xmm2 +DB 102,15,56,0,213 +DB 102,15,56,221,212 + pslld xmm4,1 + lea rax,[24+rax] + + movdqa xmm3,xmm0 + pslldq xmm0,4 + pxor xmm3,xmm0 + pslldq xmm0,4 + pxor xmm3,xmm0 + pslldq xmm0,4 + pxor xmm0,xmm3 + + pshufd xmm3,xmm0,0xff + pxor xmm3,xmm1 + pslldq xmm1,4 + pxor 
xmm3,xmm1 + + pxor xmm0,xmm2 + pxor xmm2,xmm3 + movdqu XMMWORD[(-16)+rax],xmm0 + + dec r10d + jnz NEAR $L$oop_key192 + + mov DWORD[32+rax],edx + xor eax,eax + jmp NEAR $L$enc_key_ret + +ALIGN 16 +$L$14rounds: + movups xmm2,XMMWORD[16+rcx] + mov edx,13 + lea rax,[16+rax] + cmp r10d,268435456 + je NEAR $L$14rounds_alt + + movups XMMWORD[r8],xmm0 + movups XMMWORD[16+r8],xmm2 +DB 102,15,58,223,202,1 + call $L$key_expansion_256a_cold +DB 102,15,58,223,200,1 + call $L$key_expansion_256b +DB 102,15,58,223,202,2 + call $L$key_expansion_256a +DB 102,15,58,223,200,2 + call $L$key_expansion_256b +DB 102,15,58,223,202,4 + call $L$key_expansion_256a +DB 102,15,58,223,200,4 + call $L$key_expansion_256b +DB 102,15,58,223,202,8 + call $L$key_expansion_256a +DB 102,15,58,223,200,8 + call $L$key_expansion_256b +DB 102,15,58,223,202,16 + call $L$key_expansion_256a +DB 102,15,58,223,200,16 + call $L$key_expansion_256b +DB 102,15,58,223,202,32 + call $L$key_expansion_256a +DB 102,15,58,223,200,32 + call $L$key_expansion_256b +DB 102,15,58,223,202,64 + call $L$key_expansion_256a + movups XMMWORD[rax],xmm0 + mov DWORD[16+rax],edx + xor rax,rax + jmp NEAR $L$enc_key_ret + +ALIGN 16 +$L$14rounds_alt: + movdqa xmm5,XMMWORD[$L$key_rotate] + movdqa xmm4,XMMWORD[$L$key_rcon1] + mov r10d,7 + movdqu XMMWORD[r8],xmm0 + movdqa xmm1,xmm2 + movdqu XMMWORD[16+r8],xmm2 + jmp NEAR $L$oop_key256 + +ALIGN 16 +$L$oop_key256: +DB 102,15,56,0,213 +DB 102,15,56,221,212 + + movdqa xmm3,xmm0 + pslldq xmm0,4 + pxor xmm3,xmm0 + pslldq xmm0,4 + pxor xmm3,xmm0 + pslldq xmm0,4 + pxor xmm0,xmm3 + pslld xmm4,1 + + pxor xmm0,xmm2 + movdqu XMMWORD[rax],xmm0 + + dec r10d + jz NEAR $L$done_key256 + + pshufd xmm2,xmm0,0xff + pxor xmm3,xmm3 +DB 102,15,56,221,211 + + movdqa xmm3,xmm1 + pslldq xmm1,4 + pxor xmm3,xmm1 + pslldq xmm1,4 + pxor xmm3,xmm1 + pslldq xmm1,4 + pxor xmm1,xmm3 + + pxor xmm2,xmm1 + movdqu XMMWORD[16+rax],xmm2 + lea rax,[32+rax] + movdqa xmm1,xmm2 + + jmp NEAR $L$oop_key256 + +$L$done_key256: + mov DWORD[16+rax],edx + xor eax,eax + jmp NEAR $L$enc_key_ret + +ALIGN 16 +$L$bad_keybits: + mov rax,-2 +$L$enc_key_ret: + pxor xmm0,xmm0 + pxor xmm1,xmm1 + pxor xmm2,xmm2 + pxor xmm3,xmm3 + pxor xmm4,xmm4 + pxor xmm5,xmm5 + add rsp,8 + + DB 0F3h,0C3h ;repret +$L$SEH_end_set_encrypt_key: + +ALIGN 16 +$L$key_expansion_128: + movups XMMWORD[rax],xmm0 + lea rax,[16+rax] +$L$key_expansion_128_cold: + shufps xmm4,xmm0,16 + xorps xmm0,xmm4 + shufps xmm4,xmm0,140 + xorps xmm0,xmm4 + shufps xmm1,xmm1,255 + xorps xmm0,xmm1 + DB 0F3h,0C3h ;repret + +ALIGN 16 +$L$key_expansion_192a: + movups XMMWORD[rax],xmm0 + lea rax,[16+rax] +$L$key_expansion_192a_cold: + movaps xmm5,xmm2 +$L$key_expansion_192b_warm: + shufps xmm4,xmm0,16 + movdqa xmm3,xmm2 + xorps xmm0,xmm4 + shufps xmm4,xmm0,140 + pslldq xmm3,4 + xorps xmm0,xmm4 + pshufd xmm1,xmm1,85 + pxor xmm2,xmm3 + pxor xmm0,xmm1 + pshufd xmm3,xmm0,255 + pxor xmm2,xmm3 + DB 0F3h,0C3h ;repret + +ALIGN 16 +$L$key_expansion_192b: + movaps xmm3,xmm0 + shufps xmm5,xmm0,68 + movups XMMWORD[rax],xmm5 + shufps xmm3,xmm2,78 + movups XMMWORD[16+rax],xmm3 + lea rax,[32+rax] + jmp NEAR $L$key_expansion_192b_warm + +ALIGN 16 +$L$key_expansion_256a: + movups XMMWORD[rax],xmm2 + lea rax,[16+rax] +$L$key_expansion_256a_cold: + shufps xmm4,xmm0,16 + xorps xmm0,xmm4 + shufps xmm4,xmm0,140 + xorps xmm0,xmm4 + shufps xmm1,xmm1,255 + xorps xmm0,xmm1 + DB 0F3h,0C3h ;repret + +ALIGN 16 +$L$key_expansion_256b: + movups XMMWORD[rax],xmm0 + lea rax,[16+rax] + + shufps xmm4,xmm2,16 + xorps xmm2,xmm4 + shufps xmm4,xmm2,140 + xorps 
xmm2,xmm4 + shufps xmm1,xmm1,170 + xorps xmm2,xmm1 + DB 0F3h,0C3h ;repret + + + +ALIGN 64 +$L$bswap_mask: +DB 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0 +$L$increment32: + DD 6,6,6,0 +$L$increment64: + DD 1,0,0,0 +$L$xts_magic: + DD 0x87,0,1,0 +$L$increment1: +DB 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1 +$L$key_rotate: + DD 0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d +$L$key_rotate192: + DD 0x04070605,0x04070605,0x04070605,0x04070605 +$L$key_rcon1: + DD 1,1,1,1 +$L$key_rcon1b: + DD 0x1b,0x1b,0x1b,0x1b + +DB 65,69,83,32,102,111,114,32,73,110,116,101,108,32,65,69 +DB 83,45,78,73,44,32,67,82,89,80,84,79,71,65,77,83 +DB 32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115 +DB 115,108,46,111,114,103,62,0 +ALIGN 64 +EXTERN __imp_RtlVirtualUnwind + +ALIGN 16 +ecb_ccm64_se_handler: + push rsi + push rdi + push rbx + push rbp + push r12 + push r13 + push r14 + push r15 + pushfq + sub rsp,64 + + mov rax,QWORD[120+r8] + mov rbx,QWORD[248+r8] + + mov rsi,QWORD[8+r9] + mov r11,QWORD[56+r9] + + mov r10d,DWORD[r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jb NEAR $L$common_seh_tail + + mov rax,QWORD[152+r8] + + mov r10d,DWORD[4+r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jae NEAR $L$common_seh_tail + + lea rsi,[rax] + lea rdi,[512+r8] + mov ecx,8 + DD 0xa548f3fc + lea rax,[88+rax] + + jmp NEAR $L$common_seh_tail + + + +ALIGN 16 +ctr_xts_se_handler: + push rsi + push rdi + push rbx + push rbp + push r12 + push r13 + push r14 + push r15 + pushfq + sub rsp,64 + + mov rax,QWORD[120+r8] + mov rbx,QWORD[248+r8] + + mov rsi,QWORD[8+r9] + mov r11,QWORD[56+r9] + + mov r10d,DWORD[r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jb NEAR $L$common_seh_tail + + mov rax,QWORD[152+r8] + + mov r10d,DWORD[4+r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jae NEAR $L$common_seh_tail + + mov rax,QWORD[208+r8] + + lea rsi,[((-168))+rax] + lea rdi,[512+r8] + mov ecx,20 + DD 0xa548f3fc + + mov rbp,QWORD[((-8))+rax] + mov QWORD[160+r8],rbp + jmp NEAR $L$common_seh_tail + + + +ALIGN 16 +ocb_se_handler: + push rsi + push rdi + push rbx + push rbp + push r12 + push r13 + push r14 + push r15 + pushfq + sub rsp,64 + + mov rax,QWORD[120+r8] + mov rbx,QWORD[248+r8] + + mov rsi,QWORD[8+r9] + mov r11,QWORD[56+r9] + + mov r10d,DWORD[r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jb NEAR $L$common_seh_tail + + mov r10d,DWORD[4+r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jae NEAR $L$common_seh_tail + + mov r10d,DWORD[8+r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jae NEAR $L$ocb_no_xmm + + mov rax,QWORD[152+r8] + + lea rsi,[rax] + lea rdi,[512+r8] + mov ecx,20 + DD 0xa548f3fc + lea rax,[((160+40))+rax] + +$L$ocb_no_xmm: + mov rbx,QWORD[((-8))+rax] + mov rbp,QWORD[((-16))+rax] + mov r12,QWORD[((-24))+rax] + mov r13,QWORD[((-32))+rax] + mov r14,QWORD[((-40))+rax] + + mov QWORD[144+r8],rbx + mov QWORD[160+r8],rbp + mov QWORD[216+r8],r12 + mov QWORD[224+r8],r13 + mov QWORD[232+r8],r14 + + jmp NEAR $L$common_seh_tail + + +ALIGN 16 +cbc_se_handler: + push rsi + push rdi + push rbx + push rbp + push r12 + push r13 + push r14 + push r15 + pushfq + sub rsp,64 + + mov rax,QWORD[152+r8] + mov rbx,QWORD[248+r8] + + lea r10,[$L$cbc_decrypt_bulk] + cmp rbx,r10 + jb NEAR $L$common_seh_tail + + mov rax,QWORD[120+r8] + + lea r10,[$L$cbc_decrypt_body] + cmp rbx,r10 + jb NEAR $L$common_seh_tail + + mov rax,QWORD[152+r8] + + lea r10,[$L$cbc_ret] + cmp rbx,r10 + jae NEAR $L$common_seh_tail + + lea rsi,[16+rax] + lea rdi,[512+r8] + mov ecx,20 + DD 0xa548f3fc + + mov rax,QWORD[208+r8] + + mov rbp,QWORD[((-8))+rax] + mov QWORD[160+r8],rbp + +$L$common_seh_tail: + mov rdi,QWORD[8+rax] + mov 
rsi,QWORD[16+rax] + mov QWORD[152+r8],rax + mov QWORD[168+r8],rsi + mov QWORD[176+r8],rdi + + mov rdi,QWORD[40+r9] + mov rsi,r8 + mov ecx,154 + DD 0xa548f3fc + + mov rsi,r9 + xor rcx,rcx + mov rdx,QWORD[8+rsi] + mov r8,QWORD[rsi] + mov r9,QWORD[16+rsi] + mov r10,QWORD[40+rsi] + lea r11,[56+rsi] + lea r12,[24+rsi] + mov QWORD[32+rsp],r10 + mov QWORD[40+rsp],r11 + mov QWORD[48+rsp],r12 + mov QWORD[56+rsp],rcx + call QWORD[__imp_RtlVirtualUnwind] + + mov eax,1 + add rsp,64 + popfq + pop r15 + pop r14 + pop r13 + pop r12 + pop rbp + pop rbx + pop rdi + pop rsi + DB 0F3h,0C3h ;repret + + +section .pdata rdata align=4 +ALIGN 4 + DD $L$SEH_begin_aesni_ecb_encrypt wrt ..imagebase + DD $L$SEH_end_aesni_ecb_encrypt wrt ..imagebase + DD $L$SEH_info_ecb wrt ..imagebase + + DD $L$SEH_begin_aesni_ccm64_encrypt_blocks wrt ..imagebase + DD $L$SEH_end_aesni_ccm64_encrypt_blocks wrt ..imagebase + DD $L$SEH_info_ccm64_enc wrt ..imagebase + + DD $L$SEH_begin_aesni_ccm64_decrypt_blocks wrt ..imagebase + DD $L$SEH_end_aesni_ccm64_decrypt_blocks wrt ..imagebase + DD $L$SEH_info_ccm64_dec wrt ..imagebase + + DD $L$SEH_begin_aesni_ctr32_encrypt_blocks wrt ..imagebase + DD $L$SEH_end_aesni_ctr32_encrypt_blocks wrt ..imagebase + DD $L$SEH_info_ctr32 wrt ..imagebase + + DD $L$SEH_begin_aesni_xts_encrypt wrt ..imagebase + DD $L$SEH_end_aesni_xts_encrypt wrt ..imagebase + DD $L$SEH_info_xts_enc wrt ..imagebase + + DD $L$SEH_begin_aesni_xts_decrypt wrt ..imagebase + DD $L$SEH_end_aesni_xts_decrypt wrt ..imagebase + DD $L$SEH_info_xts_dec wrt ..imagebase + + DD $L$SEH_begin_aesni_ocb_encrypt wrt ..imagebase + DD $L$SEH_end_aesni_ocb_encrypt wrt ..imagebase + DD $L$SEH_info_ocb_enc wrt ..imagebase + + DD $L$SEH_begin_aesni_ocb_decrypt wrt ..imagebase + DD $L$SEH_end_aesni_ocb_decrypt wrt ..imagebase + DD $L$SEH_info_ocb_dec wrt ..imagebase + DD $L$SEH_begin_aesni_cbc_encrypt wrt ..imagebase + DD $L$SEH_end_aesni_cbc_encrypt wrt ..imagebase + DD $L$SEH_info_cbc wrt ..imagebase + + DD aesni_set_decrypt_key wrt ..imagebase + DD $L$SEH_end_set_decrypt_key wrt ..imagebase + DD $L$SEH_info_key wrt ..imagebase + + DD aesni_set_encrypt_key wrt ..imagebase + DD $L$SEH_end_set_encrypt_key wrt ..imagebase + DD $L$SEH_info_key wrt ..imagebase +section .xdata rdata align=8 +ALIGN 8 +$L$SEH_info_ecb: +DB 9,0,0,0 + DD ecb_ccm64_se_handler wrt ..imagebase + DD $L$ecb_enc_body wrt ..imagebase,$L$ecb_enc_ret wrt ..imagebase +$L$SEH_info_ccm64_enc: +DB 9,0,0,0 + DD ecb_ccm64_se_handler wrt ..imagebase + DD $L$ccm64_enc_body wrt ..imagebase,$L$ccm64_enc_ret wrt ..imagebase +$L$SEH_info_ccm64_dec: +DB 9,0,0,0 + DD ecb_ccm64_se_handler wrt ..imagebase + DD $L$ccm64_dec_body wrt ..imagebase,$L$ccm64_dec_ret wrt ..imagebase +$L$SEH_info_ctr32: +DB 9,0,0,0 + DD ctr_xts_se_handler wrt ..imagebase + DD $L$ctr32_body wrt ..imagebase,$L$ctr32_epilogue wrt ..imagebase +$L$SEH_info_xts_enc: +DB 9,0,0,0 + DD ctr_xts_se_handler wrt ..imagebase + DD $L$xts_enc_body wrt ..imagebase,$L$xts_enc_epilogue wrt ..imagebase +$L$SEH_info_xts_dec: +DB 9,0,0,0 + DD ctr_xts_se_handler wrt ..imagebase + DD $L$xts_dec_body wrt ..imagebase,$L$xts_dec_epilogue wrt ..imagebase +$L$SEH_info_ocb_enc: +DB 9,0,0,0 + DD ocb_se_handler wrt ..imagebase + DD $L$ocb_enc_body wrt ..imagebase,$L$ocb_enc_epilogue wrt ..imagebase + DD $L$ocb_enc_pop wrt ..imagebase + DD 0 +$L$SEH_info_ocb_dec: +DB 9,0,0,0 + DD ocb_se_handler wrt ..imagebase + DD $L$ocb_dec_body wrt ..imagebase,$L$ocb_dec_epilogue wrt ..imagebase + DD $L$ocb_dec_pop wrt ..imagebase + DD 0 +$L$SEH_info_cbc: +DB 9,0,0,0 
+ DD cbc_se_handler wrt ..imagebase +$L$SEH_info_key: +DB 0x01,0x04,0x01,0x00 +DB 0x04,0x02,0x00,0x00 diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm new file mode 100644 index 0000000000..1c911fa294 --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm @@ -0,0 +1,1173 @@ +; WARNING: do not edit! +; Generated from openssl/crypto/aes/asm/vpaes-x86_64.pl +; +; Copyright 2011-2020 The OpenSSL Project Authors. All Rights Reserved. +; +; Licensed under the OpenSSL license (the "License"). You may not use +; this file except in compliance with the License. You can obtain a copy +; in the file LICENSE in the source distribution or at +; https://www.openssl.org/source/license.html + +default rel +%define XMMWORD +%define YMMWORD +%define ZMMWORD +section .text code align=64 + + + + + + + + + + + + + + + + + + +ALIGN 16 +_vpaes_encrypt_core: + + mov r9,rdx + mov r11,16 + mov eax,DWORD[240+rdx] + movdqa xmm1,xmm9 + movdqa xmm2,XMMWORD[$L$k_ipt] + pandn xmm1,xmm0 + movdqu xmm5,XMMWORD[r9] + psrld xmm1,4 + pand xmm0,xmm9 +DB 102,15,56,0,208 + movdqa xmm0,XMMWORD[(($L$k_ipt+16))] +DB 102,15,56,0,193 + pxor xmm2,xmm5 + add r9,16 + pxor xmm0,xmm2 + lea r10,[$L$k_mc_backward] + jmp NEAR $L$enc_entry + +ALIGN 16 +$L$enc_loop: + + movdqa xmm4,xmm13 + movdqa xmm0,xmm12 +DB 102,15,56,0,226 +DB 102,15,56,0,195 + pxor xmm4,xmm5 + movdqa xmm5,xmm15 + pxor xmm0,xmm4 + movdqa xmm1,XMMWORD[((-64))+r10*1+r11] +DB 102,15,56,0,234 + movdqa xmm4,XMMWORD[r10*1+r11] + movdqa xmm2,xmm14 +DB 102,15,56,0,211 + movdqa xmm3,xmm0 + pxor xmm2,xmm5 +DB 102,15,56,0,193 + add r9,16 + pxor xmm0,xmm2 +DB 102,15,56,0,220 + add r11,16 + pxor xmm3,xmm0 +DB 102,15,56,0,193 + and r11,0x30 + sub rax,1 + pxor xmm0,xmm3 + +$L$enc_entry: + + movdqa xmm1,xmm9 + movdqa xmm5,xmm11 + pandn xmm1,xmm0 + psrld xmm1,4 + pand xmm0,xmm9 +DB 102,15,56,0,232 + movdqa xmm3,xmm10 + pxor xmm0,xmm1 +DB 102,15,56,0,217 + movdqa xmm4,xmm10 + pxor xmm3,xmm5 +DB 102,15,56,0,224 + movdqa xmm2,xmm10 + pxor xmm4,xmm5 +DB 102,15,56,0,211 + movdqa xmm3,xmm10 + pxor xmm2,xmm0 +DB 102,15,56,0,220 + movdqu xmm5,XMMWORD[r9] + pxor xmm3,xmm1 + jnz NEAR $L$enc_loop + + + movdqa xmm4,XMMWORD[((-96))+r10] + movdqa xmm0,XMMWORD[((-80))+r10] +DB 102,15,56,0,226 + pxor xmm4,xmm5 +DB 102,15,56,0,195 + movdqa xmm1,XMMWORD[64+r10*1+r11] + pxor xmm0,xmm4 +DB 102,15,56,0,193 + DB 0F3h,0C3h ;repret + + + + + + + + + +ALIGN 16 +_vpaes_decrypt_core: + + mov r9,rdx + mov eax,DWORD[240+rdx] + movdqa xmm1,xmm9 + movdqa xmm2,XMMWORD[$L$k_dipt] + pandn xmm1,xmm0 + mov r11,rax + psrld xmm1,4 + movdqu xmm5,XMMWORD[r9] + shl r11,4 + pand xmm0,xmm9 +DB 102,15,56,0,208 + movdqa xmm0,XMMWORD[(($L$k_dipt+16))] + xor r11,0x30 + lea r10,[$L$k_dsbd] +DB 102,15,56,0,193 + and r11,0x30 + pxor xmm2,xmm5 + movdqa xmm5,XMMWORD[(($L$k_mc_forward+48))] + pxor xmm0,xmm2 + add r9,16 + add r11,r10 + jmp NEAR $L$dec_entry + +ALIGN 16 +$L$dec_loop: + + + + movdqa xmm4,XMMWORD[((-32))+r10] + movdqa xmm1,XMMWORD[((-16))+r10] +DB 102,15,56,0,226 +DB 102,15,56,0,203 + pxor xmm0,xmm4 + movdqa xmm4,XMMWORD[r10] + pxor xmm0,xmm1 + movdqa xmm1,XMMWORD[16+r10] + +DB 102,15,56,0,226 +DB 102,15,56,0,197 +DB 102,15,56,0,203 + pxor xmm0,xmm4 + movdqa xmm4,XMMWORD[32+r10] + pxor xmm0,xmm1 + movdqa xmm1,XMMWORD[48+r10] + +DB 102,15,56,0,226 +DB 102,15,56,0,197 +DB 102,15,56,0,203 + pxor xmm0,xmm4 + movdqa xmm4,XMMWORD[64+r10] + pxor xmm0,xmm1 + movdqa xmm1,XMMWORD[80+r10] + +DB 102,15,56,0,226 +DB 
102,15,56,0,197 +DB 102,15,56,0,203 + pxor xmm0,xmm4 + add r9,16 +DB 102,15,58,15,237,12 + pxor xmm0,xmm1 + sub rax,1 + +$L$dec_entry: + + movdqa xmm1,xmm9 + pandn xmm1,xmm0 + movdqa xmm2,xmm11 + psrld xmm1,4 + pand xmm0,xmm9 +DB 102,15,56,0,208 + movdqa xmm3,xmm10 + pxor xmm0,xmm1 +DB 102,15,56,0,217 + movdqa xmm4,xmm10 + pxor xmm3,xmm2 +DB 102,15,56,0,224 + pxor xmm4,xmm2 + movdqa xmm2,xmm10 +DB 102,15,56,0,211 + movdqa xmm3,xmm10 + pxor xmm2,xmm0 +DB 102,15,56,0,220 + movdqu xmm0,XMMWORD[r9] + pxor xmm3,xmm1 + jnz NEAR $L$dec_loop + + + movdqa xmm4,XMMWORD[96+r10] +DB 102,15,56,0,226 + pxor xmm4,xmm0 + movdqa xmm0,XMMWORD[112+r10] + movdqa xmm2,XMMWORD[((-352))+r11] +DB 102,15,56,0,195 + pxor xmm0,xmm4 +DB 102,15,56,0,194 + DB 0F3h,0C3h ;repret + + + + + + + + + +ALIGN 16 +_vpaes_schedule_core: + + + + + + + call _vpaes_preheat + movdqa xmm8,XMMWORD[$L$k_rcon] + movdqu xmm0,XMMWORD[rdi] + + + movdqa xmm3,xmm0 + lea r11,[$L$k_ipt] + call _vpaes_schedule_transform + movdqa xmm7,xmm0 + + lea r10,[$L$k_sr] + test rcx,rcx + jnz NEAR $L$schedule_am_decrypting + + + movdqu XMMWORD[rdx],xmm0 + jmp NEAR $L$schedule_go + +$L$schedule_am_decrypting: + + movdqa xmm1,XMMWORD[r10*1+r8] +DB 102,15,56,0,217 + movdqu XMMWORD[rdx],xmm3 + xor r8,0x30 + +$L$schedule_go: + cmp esi,192 + ja NEAR $L$schedule_256 + je NEAR $L$schedule_192 + + + + + + + + + + +$L$schedule_128: + mov esi,10 + +$L$oop_schedule_128: + call _vpaes_schedule_round + dec rsi + jz NEAR $L$schedule_mangle_last + call _vpaes_schedule_mangle + jmp NEAR $L$oop_schedule_128 + + + + + + + + + + + + + + + + +ALIGN 16 +$L$schedule_192: + movdqu xmm0,XMMWORD[8+rdi] + call _vpaes_schedule_transform + movdqa xmm6,xmm0 + pxor xmm4,xmm4 + movhlps xmm6,xmm4 + mov esi,4 + +$L$oop_schedule_192: + call _vpaes_schedule_round +DB 102,15,58,15,198,8 + call _vpaes_schedule_mangle + call _vpaes_schedule_192_smear + call _vpaes_schedule_mangle + call _vpaes_schedule_round + dec rsi + jz NEAR $L$schedule_mangle_last + call _vpaes_schedule_mangle + call _vpaes_schedule_192_smear + jmp NEAR $L$oop_schedule_192 + + + + + + + + + + + +ALIGN 16 +$L$schedule_256: + movdqu xmm0,XMMWORD[16+rdi] + call _vpaes_schedule_transform + mov esi,7 + +$L$oop_schedule_256: + call _vpaes_schedule_mangle + movdqa xmm6,xmm0 + + + call _vpaes_schedule_round + dec rsi + jz NEAR $L$schedule_mangle_last + call _vpaes_schedule_mangle + + + pshufd xmm0,xmm0,0xFF + movdqa xmm5,xmm7 + movdqa xmm7,xmm6 + call _vpaes_schedule_low_round + movdqa xmm7,xmm5 + + jmp NEAR $L$oop_schedule_256 + + + + + + + + + + + + +ALIGN 16 +$L$schedule_mangle_last: + + lea r11,[$L$k_deskew] + test rcx,rcx + jnz NEAR $L$schedule_mangle_last_dec + + + movdqa xmm1,XMMWORD[r10*1+r8] +DB 102,15,56,0,193 + lea r11,[$L$k_opt] + add rdx,32 + +$L$schedule_mangle_last_dec: + add rdx,-16 + pxor xmm0,XMMWORD[$L$k_s63] + call _vpaes_schedule_transform + movdqu XMMWORD[rdx],xmm0 + + + pxor xmm0,xmm0 + pxor xmm1,xmm1 + pxor xmm2,xmm2 + pxor xmm3,xmm3 + pxor xmm4,xmm4 + pxor xmm5,xmm5 + pxor xmm6,xmm6 + pxor xmm7,xmm7 + DB 0F3h,0C3h ;repret + + + + + + + + + + + + + + + + + + +ALIGN 16 +_vpaes_schedule_192_smear: + + pshufd xmm1,xmm6,0x80 + pshufd xmm0,xmm7,0xFE + pxor xmm6,xmm1 + pxor xmm1,xmm1 + pxor xmm6,xmm0 + movdqa xmm0,xmm6 + movhlps xmm6,xmm1 + DB 0F3h,0C3h ;repret + + + + + + + + + + + + + + + + + + + + + + +ALIGN 16 +_vpaes_schedule_round: + + + pxor xmm1,xmm1 +DB 102,65,15,58,15,200,15 +DB 102,69,15,58,15,192,15 + pxor xmm7,xmm1 + + + pshufd xmm0,xmm0,0xFF +DB 102,15,58,15,192,1 + + + + +_vpaes_schedule_low_round: 
+ + movdqa xmm1,xmm7 + pslldq xmm7,4 + pxor xmm7,xmm1 + movdqa xmm1,xmm7 + pslldq xmm7,8 + pxor xmm7,xmm1 + pxor xmm7,XMMWORD[$L$k_s63] + + + movdqa xmm1,xmm9 + pandn xmm1,xmm0 + psrld xmm1,4 + pand xmm0,xmm9 + movdqa xmm2,xmm11 +DB 102,15,56,0,208 + pxor xmm0,xmm1 + movdqa xmm3,xmm10 +DB 102,15,56,0,217 + pxor xmm3,xmm2 + movdqa xmm4,xmm10 +DB 102,15,56,0,224 + pxor xmm4,xmm2 + movdqa xmm2,xmm10 +DB 102,15,56,0,211 + pxor xmm2,xmm0 + movdqa xmm3,xmm10 +DB 102,15,56,0,220 + pxor xmm3,xmm1 + movdqa xmm4,xmm13 +DB 102,15,56,0,226 + movdqa xmm0,xmm12 +DB 102,15,56,0,195 + pxor xmm0,xmm4 + + + pxor xmm0,xmm7 + movdqa xmm7,xmm0 + DB 0F3h,0C3h ;repret + + + + + + + + + + + + + +ALIGN 16 +_vpaes_schedule_transform: + + movdqa xmm1,xmm9 + pandn xmm1,xmm0 + psrld xmm1,4 + pand xmm0,xmm9 + movdqa xmm2,XMMWORD[r11] +DB 102,15,56,0,208 + movdqa xmm0,XMMWORD[16+r11] +DB 102,15,56,0,193 + pxor xmm0,xmm2 + DB 0F3h,0C3h ;repret + + + + + + + + + + + + + + + + + + + + + + + + + + + +ALIGN 16 +_vpaes_schedule_mangle: + + movdqa xmm4,xmm0 + movdqa xmm5,XMMWORD[$L$k_mc_forward] + test rcx,rcx + jnz NEAR $L$schedule_mangle_dec + + + add rdx,16 + pxor xmm4,XMMWORD[$L$k_s63] +DB 102,15,56,0,229 + movdqa xmm3,xmm4 +DB 102,15,56,0,229 + pxor xmm3,xmm4 +DB 102,15,56,0,229 + pxor xmm3,xmm4 + + jmp NEAR $L$schedule_mangle_both +ALIGN 16 +$L$schedule_mangle_dec: + + lea r11,[$L$k_dksd] + movdqa xmm1,xmm9 + pandn xmm1,xmm4 + psrld xmm1,4 + pand xmm4,xmm9 + + movdqa xmm2,XMMWORD[r11] +DB 102,15,56,0,212 + movdqa xmm3,XMMWORD[16+r11] +DB 102,15,56,0,217 + pxor xmm3,xmm2 +DB 102,15,56,0,221 + + movdqa xmm2,XMMWORD[32+r11] +DB 102,15,56,0,212 + pxor xmm2,xmm3 + movdqa xmm3,XMMWORD[48+r11] +DB 102,15,56,0,217 + pxor xmm3,xmm2 +DB 102,15,56,0,221 + + movdqa xmm2,XMMWORD[64+r11] +DB 102,15,56,0,212 + pxor xmm2,xmm3 + movdqa xmm3,XMMWORD[80+r11] +DB 102,15,56,0,217 + pxor xmm3,xmm2 +DB 102,15,56,0,221 + + movdqa xmm2,XMMWORD[96+r11] +DB 102,15,56,0,212 + pxor xmm2,xmm3 + movdqa xmm3,XMMWORD[112+r11] +DB 102,15,56,0,217 + pxor xmm3,xmm2 + + add rdx,-16 + +$L$schedule_mangle_both: + movdqa xmm1,XMMWORD[r10*1+r8] +DB 102,15,56,0,217 + add r8,-16 + and r8,0x30 + movdqu XMMWORD[rdx],xmm3 + DB 0F3h,0C3h ;repret + + + + + + +global vpaes_set_encrypt_key + +ALIGN 16 +vpaes_set_encrypt_key: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_vpaes_set_encrypt_key: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + + + + lea rsp,[((-184))+rsp] + movaps XMMWORD[16+rsp],xmm6 + movaps XMMWORD[32+rsp],xmm7 + movaps XMMWORD[48+rsp],xmm8 + movaps XMMWORD[64+rsp],xmm9 + movaps XMMWORD[80+rsp],xmm10 + movaps XMMWORD[96+rsp],xmm11 + movaps XMMWORD[112+rsp],xmm12 + movaps XMMWORD[128+rsp],xmm13 + movaps XMMWORD[144+rsp],xmm14 + movaps XMMWORD[160+rsp],xmm15 +$L$enc_key_body: + mov eax,esi + shr eax,5 + add eax,5 + mov DWORD[240+rdx],eax + + mov ecx,0 + mov r8d,0x30 + call _vpaes_schedule_core + movaps xmm6,XMMWORD[16+rsp] + movaps xmm7,XMMWORD[32+rsp] + movaps xmm8,XMMWORD[48+rsp] + movaps xmm9,XMMWORD[64+rsp] + movaps xmm10,XMMWORD[80+rsp] + movaps xmm11,XMMWORD[96+rsp] + movaps xmm12,XMMWORD[112+rsp] + movaps xmm13,XMMWORD[128+rsp] + movaps xmm14,XMMWORD[144+rsp] + movaps xmm15,XMMWORD[160+rsp] + lea rsp,[184+rsp] +$L$enc_key_epilogue: + xor eax,eax + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_vpaes_set_encrypt_key: + +global vpaes_set_decrypt_key + +ALIGN 16 +vpaes_set_decrypt_key: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp 
+$L$SEH_begin_vpaes_set_decrypt_key: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + + + + lea rsp,[((-184))+rsp] + movaps XMMWORD[16+rsp],xmm6 + movaps XMMWORD[32+rsp],xmm7 + movaps XMMWORD[48+rsp],xmm8 + movaps XMMWORD[64+rsp],xmm9 + movaps XMMWORD[80+rsp],xmm10 + movaps XMMWORD[96+rsp],xmm11 + movaps XMMWORD[112+rsp],xmm12 + movaps XMMWORD[128+rsp],xmm13 + movaps XMMWORD[144+rsp],xmm14 + movaps XMMWORD[160+rsp],xmm15 +$L$dec_key_body: + mov eax,esi + shr eax,5 + add eax,5 + mov DWORD[240+rdx],eax + shl eax,4 + lea rdx,[16+rax*1+rdx] + + mov ecx,1 + mov r8d,esi + shr r8d,1 + and r8d,32 + xor r8d,32 + call _vpaes_schedule_core + movaps xmm6,XMMWORD[16+rsp] + movaps xmm7,XMMWORD[32+rsp] + movaps xmm8,XMMWORD[48+rsp] + movaps xmm9,XMMWORD[64+rsp] + movaps xmm10,XMMWORD[80+rsp] + movaps xmm11,XMMWORD[96+rsp] + movaps xmm12,XMMWORD[112+rsp] + movaps xmm13,XMMWORD[128+rsp] + movaps xmm14,XMMWORD[144+rsp] + movaps xmm15,XMMWORD[160+rsp] + lea rsp,[184+rsp] +$L$dec_key_epilogue: + xor eax,eax + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_vpaes_set_decrypt_key: + +global vpaes_encrypt + +ALIGN 16 +vpaes_encrypt: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_vpaes_encrypt: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + + + + lea rsp,[((-184))+rsp] + movaps XMMWORD[16+rsp],xmm6 + movaps XMMWORD[32+rsp],xmm7 + movaps XMMWORD[48+rsp],xmm8 + movaps XMMWORD[64+rsp],xmm9 + movaps XMMWORD[80+rsp],xmm10 + movaps XMMWORD[96+rsp],xmm11 + movaps XMMWORD[112+rsp],xmm12 + movaps XMMWORD[128+rsp],xmm13 + movaps XMMWORD[144+rsp],xmm14 + movaps XMMWORD[160+rsp],xmm15 +$L$enc_body: + movdqu xmm0,XMMWORD[rdi] + call _vpaes_preheat + call _vpaes_encrypt_core + movdqu XMMWORD[rsi],xmm0 + movaps xmm6,XMMWORD[16+rsp] + movaps xmm7,XMMWORD[32+rsp] + movaps xmm8,XMMWORD[48+rsp] + movaps xmm9,XMMWORD[64+rsp] + movaps xmm10,XMMWORD[80+rsp] + movaps xmm11,XMMWORD[96+rsp] + movaps xmm12,XMMWORD[112+rsp] + movaps xmm13,XMMWORD[128+rsp] + movaps xmm14,XMMWORD[144+rsp] + movaps xmm15,XMMWORD[160+rsp] + lea rsp,[184+rsp] +$L$enc_epilogue: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_vpaes_encrypt: + +global vpaes_decrypt + +ALIGN 16 +vpaes_decrypt: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_vpaes_decrypt: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + + + + lea rsp,[((-184))+rsp] + movaps XMMWORD[16+rsp],xmm6 + movaps XMMWORD[32+rsp],xmm7 + movaps XMMWORD[48+rsp],xmm8 + movaps XMMWORD[64+rsp],xmm9 + movaps XMMWORD[80+rsp],xmm10 + movaps XMMWORD[96+rsp],xmm11 + movaps XMMWORD[112+rsp],xmm12 + movaps XMMWORD[128+rsp],xmm13 + movaps XMMWORD[144+rsp],xmm14 + movaps XMMWORD[160+rsp],xmm15 +$L$dec_body: + movdqu xmm0,XMMWORD[rdi] + call _vpaes_preheat + call _vpaes_decrypt_core + movdqu XMMWORD[rsi],xmm0 + movaps xmm6,XMMWORD[16+rsp] + movaps xmm7,XMMWORD[32+rsp] + movaps xmm8,XMMWORD[48+rsp] + movaps xmm9,XMMWORD[64+rsp] + movaps xmm10,XMMWORD[80+rsp] + movaps xmm11,XMMWORD[96+rsp] + movaps xmm12,XMMWORD[112+rsp] + movaps xmm13,XMMWORD[128+rsp] + movaps xmm14,XMMWORD[144+rsp] + movaps xmm15,XMMWORD[160+rsp] + lea rsp,[184+rsp] +$L$dec_epilogue: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_vpaes_decrypt: +global vpaes_cbc_encrypt + +ALIGN 16 +vpaes_cbc_encrypt: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_vpaes_cbc_encrypt: + mov rdi,rcx + mov 
rsi,rdx + mov rdx,r8 + mov rcx,r9 + mov r8,QWORD[40+rsp] + mov r9,QWORD[48+rsp] + + + + xchg rdx,rcx + sub rcx,16 + jc NEAR $L$cbc_abort + lea rsp,[((-184))+rsp] + movaps XMMWORD[16+rsp],xmm6 + movaps XMMWORD[32+rsp],xmm7 + movaps XMMWORD[48+rsp],xmm8 + movaps XMMWORD[64+rsp],xmm9 + movaps XMMWORD[80+rsp],xmm10 + movaps XMMWORD[96+rsp],xmm11 + movaps XMMWORD[112+rsp],xmm12 + movaps XMMWORD[128+rsp],xmm13 + movaps XMMWORD[144+rsp],xmm14 + movaps XMMWORD[160+rsp],xmm15 +$L$cbc_body: + movdqu xmm6,XMMWORD[r8] + sub rsi,rdi + call _vpaes_preheat + cmp r9d,0 + je NEAR $L$cbc_dec_loop + jmp NEAR $L$cbc_enc_loop +ALIGN 16 +$L$cbc_enc_loop: + movdqu xmm0,XMMWORD[rdi] + pxor xmm0,xmm6 + call _vpaes_encrypt_core + movdqa xmm6,xmm0 + movdqu XMMWORD[rdi*1+rsi],xmm0 + lea rdi,[16+rdi] + sub rcx,16 + jnc NEAR $L$cbc_enc_loop + jmp NEAR $L$cbc_done +ALIGN 16 +$L$cbc_dec_loop: + movdqu xmm0,XMMWORD[rdi] + movdqa xmm7,xmm0 + call _vpaes_decrypt_core + pxor xmm0,xmm6 + movdqa xmm6,xmm7 + movdqu XMMWORD[rdi*1+rsi],xmm0 + lea rdi,[16+rdi] + sub rcx,16 + jnc NEAR $L$cbc_dec_loop +$L$cbc_done: + movdqu XMMWORD[r8],xmm6 + movaps xmm6,XMMWORD[16+rsp] + movaps xmm7,XMMWORD[32+rsp] + movaps xmm8,XMMWORD[48+rsp] + movaps xmm9,XMMWORD[64+rsp] + movaps xmm10,XMMWORD[80+rsp] + movaps xmm11,XMMWORD[96+rsp] + movaps xmm12,XMMWORD[112+rsp] + movaps xmm13,XMMWORD[128+rsp] + movaps xmm14,XMMWORD[144+rsp] + movaps xmm15,XMMWORD[160+rsp] + lea rsp,[184+rsp] +$L$cbc_epilogue: +$L$cbc_abort: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_vpaes_cbc_encrypt: + + + + + + + +ALIGN 16 +_vpaes_preheat: + + lea r10,[$L$k_s0F] + movdqa xmm10,XMMWORD[((-32))+r10] + movdqa xmm11,XMMWORD[((-16))+r10] + movdqa xmm9,XMMWORD[r10] + movdqa xmm13,XMMWORD[48+r10] + movdqa xmm12,XMMWORD[64+r10] + movdqa xmm15,XMMWORD[80+r10] + movdqa xmm14,XMMWORD[96+r10] + DB 0F3h,0C3h ;repret + + + + + + + + +ALIGN 64 +_vpaes_consts: +$L$k_inv: + DQ 0x0E05060F0D080180,0x040703090A0B0C02 + DQ 0x01040A060F0B0780,0x030D0E0C02050809 + +$L$k_s0F: + DQ 0x0F0F0F0F0F0F0F0F,0x0F0F0F0F0F0F0F0F + +$L$k_ipt: + DQ 0xC2B2E8985A2A7000,0xCABAE09052227808 + DQ 0x4C01307D317C4D00,0xCD80B1FCB0FDCC81 + +$L$k_sb1: + DQ 0xB19BE18FCB503E00,0xA5DF7A6E142AF544 + DQ 0x3618D415FAE22300,0x3BF7CCC10D2ED9EF +$L$k_sb2: + DQ 0xE27A93C60B712400,0x5EB7E955BC982FCD + DQ 0x69EB88400AE12900,0xC2A163C8AB82234A +$L$k_sbo: + DQ 0xD0D26D176FBDC700,0x15AABF7AC502A878 + DQ 0xCFE474A55FBB6A00,0x8E1E90D1412B35FA + +$L$k_mc_forward: + DQ 0x0407060500030201,0x0C0F0E0D080B0A09 + DQ 0x080B0A0904070605,0x000302010C0F0E0D + DQ 0x0C0F0E0D080B0A09,0x0407060500030201 + DQ 0x000302010C0F0E0D,0x080B0A0904070605 + +$L$k_mc_backward: + DQ 0x0605040702010003,0x0E0D0C0F0A09080B + DQ 0x020100030E0D0C0F,0x0A09080B06050407 + DQ 0x0E0D0C0F0A09080B,0x0605040702010003 + DQ 0x0A09080B06050407,0x020100030E0D0C0F + +$L$k_sr: + DQ 0x0706050403020100,0x0F0E0D0C0B0A0908 + DQ 0x030E09040F0A0500,0x0B06010C07020D08 + DQ 0x0F060D040B020900,0x070E050C030A0108 + DQ 0x0B0E0104070A0D00,0x0306090C0F020508 + +$L$k_rcon: + DQ 0x1F8391B9AF9DEEB6,0x702A98084D7C7D81 + +$L$k_s63: + DQ 0x5B5B5B5B5B5B5B5B,0x5B5B5B5B5B5B5B5B + +$L$k_opt: + DQ 0xFF9F4929D6B66000,0xF7974121DEBE6808 + DQ 0x01EDBD5150BCEC00,0xE10D5DB1B05C0CE0 + +$L$k_deskew: + DQ 0x07E4A34047A4E300,0x1DFEB95A5DBEF91A + DQ 0x5F36B5DC83EA6900,0x2841C2ABF49D1E77 + + + + + +$L$k_dksd: + DQ 0xFEB91A5DA3E44700,0x0740E3A45A1DBEF9 + DQ 0x41C277F4B5368300,0x5FDC69EAAB289D1E +$L$k_dksb: + DQ 0x9A4FCA1F8550D500,0x03D653861CC94C99 + DQ 
0x115BEDA7B6FC4A00,0xD993256F7E3482C8 +$L$k_dkse: + DQ 0xD5031CCA1FC9D600,0x53859A4C994F5086 + DQ 0xA23196054FDC7BE8,0xCD5EF96A20B31487 +$L$k_dks9: + DQ 0xB6116FC87ED9A700,0x4AED933482255BFC + DQ 0x4576516227143300,0x8BB89FACE9DAFDCE + + + + + +$L$k_dipt: + DQ 0x0F505B040B545F00,0x154A411E114E451A + DQ 0x86E383E660056500,0x12771772F491F194 + +$L$k_dsb9: + DQ 0x851C03539A86D600,0xCAD51F504F994CC9 + DQ 0xC03B1789ECD74900,0x725E2C9EB2FBA565 +$L$k_dsbd: + DQ 0x7D57CCDFE6B1A200,0xF56E9B13882A4439 + DQ 0x3CE2FAF724C6CB00,0x2931180D15DEEFD3 +$L$k_dsbb: + DQ 0xD022649296B44200,0x602646F6B0F2D404 + DQ 0xC19498A6CD596700,0xF3FF0C3E3255AA6B +$L$k_dsbe: + DQ 0x46F2929626D4D000,0x2242600464B4F6B0 + DQ 0x0C55A6CDFFAAC100,0x9467F36B98593E32 +$L$k_dsbo: + DQ 0x1387EA537EF94000,0xC7AA6DB9D4943E2D + DQ 0x12D7560F93441D00,0xCA4B8159D8C58E9C +DB 86,101,99,116,111,114,32,80,101,114,109,117,116,97,116,105 +DB 111,110,32,65,69,83,32,102,111,114,32,120,56,54,95,54 +DB 52,47,83,83,83,69,51,44,32,77,105,107,101,32,72,97 +DB 109,98,117,114,103,32,40,83,116,97,110,102,111,114,100,32 +DB 85,110,105,118,101,114,115,105,116,121,41,0 +ALIGN 64 + +EXTERN __imp_RtlVirtualUnwind + +ALIGN 16 +se_handler: + push rsi + push rdi + push rbx + push rbp + push r12 + push r13 + push r14 + push r15 + pushfq + sub rsp,64 + + mov rax,QWORD[120+r8] + mov rbx,QWORD[248+r8] + + mov rsi,QWORD[8+r9] + mov r11,QWORD[56+r9] + + mov r10d,DWORD[r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jb NEAR $L$in_prologue + + mov rax,QWORD[152+r8] + + mov r10d,DWORD[4+r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jae NEAR $L$in_prologue + + lea rsi,[16+rax] + lea rdi,[512+r8] + mov ecx,20 + DD 0xa548f3fc + lea rax,[184+rax] + +$L$in_prologue: + mov rdi,QWORD[8+rax] + mov rsi,QWORD[16+rax] + mov QWORD[152+r8],rax + mov QWORD[168+r8],rsi + mov QWORD[176+r8],rdi + + mov rdi,QWORD[40+r9] + mov rsi,r8 + mov ecx,154 + DD 0xa548f3fc + + mov rsi,r9 + xor rcx,rcx + mov rdx,QWORD[8+rsi] + mov r8,QWORD[rsi] + mov r9,QWORD[16+rsi] + mov r10,QWORD[40+rsi] + lea r11,[56+rsi] + lea r12,[24+rsi] + mov QWORD[32+rsp],r10 + mov QWORD[40+rsp],r11 + mov QWORD[48+rsp],r12 + mov QWORD[56+rsp],rcx + call QWORD[__imp_RtlVirtualUnwind] + + mov eax,1 + add rsp,64 + popfq + pop r15 + pop r14 + pop r13 + pop r12 + pop rbp + pop rbx + pop rdi + pop rsi + DB 0F3h,0C3h ;repret + + +section .pdata rdata align=4 +ALIGN 4 + DD $L$SEH_begin_vpaes_set_encrypt_key wrt ..imagebase + DD $L$SEH_end_vpaes_set_encrypt_key wrt ..imagebase + DD $L$SEH_info_vpaes_set_encrypt_key wrt ..imagebase + + DD $L$SEH_begin_vpaes_set_decrypt_key wrt ..imagebase + DD $L$SEH_end_vpaes_set_decrypt_key wrt ..imagebase + DD $L$SEH_info_vpaes_set_decrypt_key wrt ..imagebase + + DD $L$SEH_begin_vpaes_encrypt wrt ..imagebase + DD $L$SEH_end_vpaes_encrypt wrt ..imagebase + DD $L$SEH_info_vpaes_encrypt wrt ..imagebase + + DD $L$SEH_begin_vpaes_decrypt wrt ..imagebase + DD $L$SEH_end_vpaes_decrypt wrt ..imagebase + DD $L$SEH_info_vpaes_decrypt wrt ..imagebase + + DD $L$SEH_begin_vpaes_cbc_encrypt wrt ..imagebase + DD $L$SEH_end_vpaes_cbc_encrypt wrt ..imagebase + DD $L$SEH_info_vpaes_cbc_encrypt wrt ..imagebase + +section .xdata rdata align=8 +ALIGN 8 +$L$SEH_info_vpaes_set_encrypt_key: +DB 9,0,0,0 + DD se_handler wrt ..imagebase + DD $L$enc_key_body wrt ..imagebase,$L$enc_key_epilogue wrt ..imagebase +$L$SEH_info_vpaes_set_decrypt_key: +DB 9,0,0,0 + DD se_handler wrt ..imagebase + DD $L$dec_key_body wrt ..imagebase,$L$dec_key_epilogue wrt ..imagebase +$L$SEH_info_vpaes_encrypt: +DB 9,0,0,0 + DD se_handler wrt ..imagebase + 
DD $L$enc_body wrt ..imagebase,$L$enc_epilogue wrt ..imagebase +$L$SEH_info_vpaes_decrypt: +DB 9,0,0,0 + DD se_handler wrt ..imagebase + DD $L$dec_body wrt ..imagebase,$L$dec_epilogue wrt ..imagebase +$L$SEH_info_vpaes_cbc_encrypt: +DB 9,0,0,0 + DD se_handler wrt ..imagebase + DD $L$cbc_body wrt ..imagebase,$L$cbc_epilogue wrt ..imagebase diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm new file mode 100644 index 0000000000..9e1a2d0a40 --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm @@ -0,0 +1,34 @@ +; WARNING: do not edit! +; Generated from openssl/crypto/modes/asm/aesni-gcm-x86_64.pl +; +; Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. +; +; Licensed under the OpenSSL license (the "License"). You may not use +; this file except in compliance with the License. You can obtain a copy +; in the file LICENSE in the source distribution or at +; https://www.openssl.org/source/license.html + +default rel +%define XMMWORD +%define YMMWORD +%define ZMMWORD +section .text code align=64 + + +global aesni_gcm_encrypt + +aesni_gcm_encrypt: + + xor eax,eax + DB 0F3h,0C3h ;repret + + + +global aesni_gcm_decrypt + +aesni_gcm_decrypt: + + xor eax,eax + DB 0F3h,0C3h ;repret + + diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm new file mode 100644 index 0000000000..60f283d5fb --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm @@ -0,0 +1,1569 @@ +; WARNING: do not edit! +; Generated from openssl/crypto/modes/asm/ghash-x86_64.pl +; +; Copyright 2010-2020 The OpenSSL Project Authors. All Rights Reserved. +; +; Licensed under the OpenSSL license (the "License"). You may not use +; this file except in compliance with the License. 
You can obtain a copy +; in the file LICENSE in the source distribution or at +; https://www.openssl.org/source/license.html + +default rel +%define XMMWORD +%define YMMWORD +%define ZMMWORD +section .text code align=64 + +EXTERN OPENSSL_ia32cap_P + +global gcm_gmult_4bit + +ALIGN 16 +gcm_gmult_4bit: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_gcm_gmult_4bit: + mov rdi,rcx + mov rsi,rdx + + + + push rbx + + push rbp + + push r12 + + push r13 + + push r14 + + push r15 + + sub rsp,280 + +$L$gmult_prologue: + + movzx r8,BYTE[15+rdi] + lea r11,[$L$rem_4bit] + xor rax,rax + xor rbx,rbx + mov al,r8b + mov bl,r8b + shl al,4 + mov rcx,14 + mov r8,QWORD[8+rax*1+rsi] + mov r9,QWORD[rax*1+rsi] + and bl,0xf0 + mov rdx,r8 + jmp NEAR $L$oop1 + +ALIGN 16 +$L$oop1: + shr r8,4 + and rdx,0xf + mov r10,r9 + mov al,BYTE[rcx*1+rdi] + shr r9,4 + xor r8,QWORD[8+rbx*1+rsi] + shl r10,60 + xor r9,QWORD[rbx*1+rsi] + mov bl,al + xor r9,QWORD[rdx*8+r11] + mov rdx,r8 + shl al,4 + xor r8,r10 + dec rcx + js NEAR $L$break1 + + shr r8,4 + and rdx,0xf + mov r10,r9 + shr r9,4 + xor r8,QWORD[8+rax*1+rsi] + shl r10,60 + xor r9,QWORD[rax*1+rsi] + and bl,0xf0 + xor r9,QWORD[rdx*8+r11] + mov rdx,r8 + xor r8,r10 + jmp NEAR $L$oop1 + +ALIGN 16 +$L$break1: + shr r8,4 + and rdx,0xf + mov r10,r9 + shr r9,4 + xor r8,QWORD[8+rax*1+rsi] + shl r10,60 + xor r9,QWORD[rax*1+rsi] + and bl,0xf0 + xor r9,QWORD[rdx*8+r11] + mov rdx,r8 + xor r8,r10 + + shr r8,4 + and rdx,0xf + mov r10,r9 + shr r9,4 + xor r8,QWORD[8+rbx*1+rsi] + shl r10,60 + xor r9,QWORD[rbx*1+rsi] + xor r8,r10 + xor r9,QWORD[rdx*8+r11] + + bswap r8 + bswap r9 + mov QWORD[8+rdi],r8 + mov QWORD[rdi],r9 + + lea rsi,[((280+48))+rsp] + + mov rbx,QWORD[((-8))+rsi] + + lea rsp,[rsi] + +$L$gmult_epilogue: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_gcm_gmult_4bit: +global gcm_ghash_4bit + +ALIGN 16 +gcm_ghash_4bit: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_gcm_ghash_4bit: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + mov rcx,r9 + + + + push rbx + + push rbp + + push r12 + + push r13 + + push r14 + + push r15 + + sub rsp,280 + +$L$ghash_prologue: + mov r14,rdx + mov r15,rcx + sub rsi,-128 + lea rbp,[((16+128))+rsp] + xor edx,edx + mov r8,QWORD[((0+0-128))+rsi] + mov rax,QWORD[((0+8-128))+rsi] + mov dl,al + shr rax,4 + mov r10,r8 + shr r8,4 + mov r9,QWORD[((16+0-128))+rsi] + shl dl,4 + mov rbx,QWORD[((16+8-128))+rsi] + shl r10,60 + mov BYTE[rsp],dl + or rax,r10 + mov dl,bl + shr rbx,4 + mov r10,r9 + shr r9,4 + mov QWORD[rbp],r8 + mov r8,QWORD[((32+0-128))+rsi] + shl dl,4 + mov QWORD[((0-128))+rbp],rax + mov rax,QWORD[((32+8-128))+rsi] + shl r10,60 + mov BYTE[1+rsp],dl + or rbx,r10 + mov dl,al + shr rax,4 + mov r10,r8 + shr r8,4 + mov QWORD[8+rbp],r9 + mov r9,QWORD[((48+0-128))+rsi] + shl dl,4 + mov QWORD[((8-128))+rbp],rbx + mov rbx,QWORD[((48+8-128))+rsi] + shl r10,60 + mov BYTE[2+rsp],dl + or rax,r10 + mov dl,bl + shr rbx,4 + mov r10,r9 + shr r9,4 + mov QWORD[16+rbp],r8 + mov r8,QWORD[((64+0-128))+rsi] + shl dl,4 + mov QWORD[((16-128))+rbp],rax + mov rax,QWORD[((64+8-128))+rsi] + shl r10,60 + mov BYTE[3+rsp],dl + or rbx,r10 + mov dl,al + shr rax,4 + mov r10,r8 + shr r8,4 + mov QWORD[24+rbp],r9 + mov r9,QWORD[((80+0-128))+rsi] + shl dl,4 + mov QWORD[((24-128))+rbp],rbx + mov rbx,QWORD[((80+8-128))+rsi] + shl r10,60 + mov BYTE[4+rsp],dl + or rax,r10 + mov dl,bl + shr rbx,4 + mov r10,r9 + shr r9,4 + mov QWORD[32+rbp],r8 + mov 
r8,QWORD[((96+0-128))+rsi] + shl dl,4 + mov QWORD[((32-128))+rbp],rax + mov rax,QWORD[((96+8-128))+rsi] + shl r10,60 + mov BYTE[5+rsp],dl + or rbx,r10 + mov dl,al + shr rax,4 + mov r10,r8 + shr r8,4 + mov QWORD[40+rbp],r9 + mov r9,QWORD[((112+0-128))+rsi] + shl dl,4 + mov QWORD[((40-128))+rbp],rbx + mov rbx,QWORD[((112+8-128))+rsi] + shl r10,60 + mov BYTE[6+rsp],dl + or rax,r10 + mov dl,bl + shr rbx,4 + mov r10,r9 + shr r9,4 + mov QWORD[48+rbp],r8 + mov r8,QWORD[((128+0-128))+rsi] + shl dl,4 + mov QWORD[((48-128))+rbp],rax + mov rax,QWORD[((128+8-128))+rsi] + shl r10,60 + mov BYTE[7+rsp],dl + or rbx,r10 + mov dl,al + shr rax,4 + mov r10,r8 + shr r8,4 + mov QWORD[56+rbp],r9 + mov r9,QWORD[((144+0-128))+rsi] + shl dl,4 + mov QWORD[((56-128))+rbp],rbx + mov rbx,QWORD[((144+8-128))+rsi] + shl r10,60 + mov BYTE[8+rsp],dl + or rax,r10 + mov dl,bl + shr rbx,4 + mov r10,r9 + shr r9,4 + mov QWORD[64+rbp],r8 + mov r8,QWORD[((160+0-128))+rsi] + shl dl,4 + mov QWORD[((64-128))+rbp],rax + mov rax,QWORD[((160+8-128))+rsi] + shl r10,60 + mov BYTE[9+rsp],dl + or rbx,r10 + mov dl,al + shr rax,4 + mov r10,r8 + shr r8,4 + mov QWORD[72+rbp],r9 + mov r9,QWORD[((176+0-128))+rsi] + shl dl,4 + mov QWORD[((72-128))+rbp],rbx + mov rbx,QWORD[((176+8-128))+rsi] + shl r10,60 + mov BYTE[10+rsp],dl + or rax,r10 + mov dl,bl + shr rbx,4 + mov r10,r9 + shr r9,4 + mov QWORD[80+rbp],r8 + mov r8,QWORD[((192+0-128))+rsi] + shl dl,4 + mov QWORD[((80-128))+rbp],rax + mov rax,QWORD[((192+8-128))+rsi] + shl r10,60 + mov BYTE[11+rsp],dl + or rbx,r10 + mov dl,al + shr rax,4 + mov r10,r8 + shr r8,4 + mov QWORD[88+rbp],r9 + mov r9,QWORD[((208+0-128))+rsi] + shl dl,4 + mov QWORD[((88-128))+rbp],rbx + mov rbx,QWORD[((208+8-128))+rsi] + shl r10,60 + mov BYTE[12+rsp],dl + or rax,r10 + mov dl,bl + shr rbx,4 + mov r10,r9 + shr r9,4 + mov QWORD[96+rbp],r8 + mov r8,QWORD[((224+0-128))+rsi] + shl dl,4 + mov QWORD[((96-128))+rbp],rax + mov rax,QWORD[((224+8-128))+rsi] + shl r10,60 + mov BYTE[13+rsp],dl + or rbx,r10 + mov dl,al + shr rax,4 + mov r10,r8 + shr r8,4 + mov QWORD[104+rbp],r9 + mov r9,QWORD[((240+0-128))+rsi] + shl dl,4 + mov QWORD[((104-128))+rbp],rbx + mov rbx,QWORD[((240+8-128))+rsi] + shl r10,60 + mov BYTE[14+rsp],dl + or rax,r10 + mov dl,bl + shr rbx,4 + mov r10,r9 + shr r9,4 + mov QWORD[112+rbp],r8 + shl dl,4 + mov QWORD[((112-128))+rbp],rax + shl r10,60 + mov BYTE[15+rsp],dl + or rbx,r10 + mov QWORD[120+rbp],r9 + mov QWORD[((120-128))+rbp],rbx + add rsi,-128 + mov r8,QWORD[8+rdi] + mov r9,QWORD[rdi] + add r15,r14 + lea r11,[$L$rem_8bit] + jmp NEAR $L$outer_loop +ALIGN 16 +$L$outer_loop: + xor r9,QWORD[r14] + mov rdx,QWORD[8+r14] + lea r14,[16+r14] + xor rdx,r8 + mov QWORD[rdi],r9 + mov QWORD[8+rdi],rdx + shr rdx,32 + xor rax,rax + rol edx,8 + mov al,dl + movzx ebx,dl + shl al,4 + shr ebx,4 + rol edx,8 + mov r8,QWORD[8+rax*1+rsi] + mov r9,QWORD[rax*1+rsi] + mov al,dl + movzx ecx,dl + shl al,4 + movzx r12,BYTE[rbx*1+rsp] + shr ecx,4 + xor r12,r8 + mov r10,r9 + shr r8,8 + movzx r12,r12b + shr r9,8 + xor r8,QWORD[((-128))+rbx*8+rbp] + shl r10,56 + xor r9,QWORD[rbx*8+rbp] + rol edx,8 + xor r8,QWORD[8+rax*1+rsi] + xor r9,QWORD[rax*1+rsi] + mov al,dl + xor r8,r10 + movzx r12,WORD[r12*2+r11] + movzx ebx,dl + shl al,4 + movzx r13,BYTE[rcx*1+rsp] + shr ebx,4 + shl r12,48 + xor r13,r8 + mov r10,r9 + xor r9,r12 + shr r8,8 + movzx r13,r13b + shr r9,8 + xor r8,QWORD[((-128))+rcx*8+rbp] + shl r10,56 + xor r9,QWORD[rcx*8+rbp] + rol edx,8 + xor r8,QWORD[8+rax*1+rsi] + xor r9,QWORD[rax*1+rsi] + mov al,dl + xor r8,r10 + movzx r13,WORD[r13*2+r11] 
+ movzx ecx,dl + shl al,4 + movzx r12,BYTE[rbx*1+rsp] + shr ecx,4 + shl r13,48 + xor r12,r8 + mov r10,r9 + xor r9,r13 + shr r8,8 + movzx r12,r12b + mov edx,DWORD[8+rdi] + shr r9,8 + xor r8,QWORD[((-128))+rbx*8+rbp] + shl r10,56 + xor r9,QWORD[rbx*8+rbp] + rol edx,8 + xor r8,QWORD[8+rax*1+rsi] + xor r9,QWORD[rax*1+rsi] + mov al,dl + xor r8,r10 + movzx r12,WORD[r12*2+r11] + movzx ebx,dl + shl al,4 + movzx r13,BYTE[rcx*1+rsp] + shr ebx,4 + shl r12,48 + xor r13,r8 + mov r10,r9 + xor r9,r12 + shr r8,8 + movzx r13,r13b + shr r9,8 + xor r8,QWORD[((-128))+rcx*8+rbp] + shl r10,56 + xor r9,QWORD[rcx*8+rbp] + rol edx,8 + xor r8,QWORD[8+rax*1+rsi] + xor r9,QWORD[rax*1+rsi] + mov al,dl + xor r8,r10 + movzx r13,WORD[r13*2+r11] + movzx ecx,dl + shl al,4 + movzx r12,BYTE[rbx*1+rsp] + shr ecx,4 + shl r13,48 + xor r12,r8 + mov r10,r9 + xor r9,r13 + shr r8,8 + movzx r12,r12b + shr r9,8 + xor r8,QWORD[((-128))+rbx*8+rbp] + shl r10,56 + xor r9,QWORD[rbx*8+rbp] + rol edx,8 + xor r8,QWORD[8+rax*1+rsi] + xor r9,QWORD[rax*1+rsi] + mov al,dl + xor r8,r10 + movzx r12,WORD[r12*2+r11] + movzx ebx,dl + shl al,4 + movzx r13,BYTE[rcx*1+rsp] + shr ebx,4 + shl r12,48 + xor r13,r8 + mov r10,r9 + xor r9,r12 + shr r8,8 + movzx r13,r13b + shr r9,8 + xor r8,QWORD[((-128))+rcx*8+rbp] + shl r10,56 + xor r9,QWORD[rcx*8+rbp] + rol edx,8 + xor r8,QWORD[8+rax*1+rsi] + xor r9,QWORD[rax*1+rsi] + mov al,dl + xor r8,r10 + movzx r13,WORD[r13*2+r11] + movzx ecx,dl + shl al,4 + movzx r12,BYTE[rbx*1+rsp] + shr ecx,4 + shl r13,48 + xor r12,r8 + mov r10,r9 + xor r9,r13 + shr r8,8 + movzx r12,r12b + mov edx,DWORD[4+rdi] + shr r9,8 + xor r8,QWORD[((-128))+rbx*8+rbp] + shl r10,56 + xor r9,QWORD[rbx*8+rbp] + rol edx,8 + xor r8,QWORD[8+rax*1+rsi] + xor r9,QWORD[rax*1+rsi] + mov al,dl + xor r8,r10 + movzx r12,WORD[r12*2+r11] + movzx ebx,dl + shl al,4 + movzx r13,BYTE[rcx*1+rsp] + shr ebx,4 + shl r12,48 + xor r13,r8 + mov r10,r9 + xor r9,r12 + shr r8,8 + movzx r13,r13b + shr r9,8 + xor r8,QWORD[((-128))+rcx*8+rbp] + shl r10,56 + xor r9,QWORD[rcx*8+rbp] + rol edx,8 + xor r8,QWORD[8+rax*1+rsi] + xor r9,QWORD[rax*1+rsi] + mov al,dl + xor r8,r10 + movzx r13,WORD[r13*2+r11] + movzx ecx,dl + shl al,4 + movzx r12,BYTE[rbx*1+rsp] + shr ecx,4 + shl r13,48 + xor r12,r8 + mov r10,r9 + xor r9,r13 + shr r8,8 + movzx r12,r12b + shr r9,8 + xor r8,QWORD[((-128))+rbx*8+rbp] + shl r10,56 + xor r9,QWORD[rbx*8+rbp] + rol edx,8 + xor r8,QWORD[8+rax*1+rsi] + xor r9,QWORD[rax*1+rsi] + mov al,dl + xor r8,r10 + movzx r12,WORD[r12*2+r11] + movzx ebx,dl + shl al,4 + movzx r13,BYTE[rcx*1+rsp] + shr ebx,4 + shl r12,48 + xor r13,r8 + mov r10,r9 + xor r9,r12 + shr r8,8 + movzx r13,r13b + shr r9,8 + xor r8,QWORD[((-128))+rcx*8+rbp] + shl r10,56 + xor r9,QWORD[rcx*8+rbp] + rol edx,8 + xor r8,QWORD[8+rax*1+rsi] + xor r9,QWORD[rax*1+rsi] + mov al,dl + xor r8,r10 + movzx r13,WORD[r13*2+r11] + movzx ecx,dl + shl al,4 + movzx r12,BYTE[rbx*1+rsp] + shr ecx,4 + shl r13,48 + xor r12,r8 + mov r10,r9 + xor r9,r13 + shr r8,8 + movzx r12,r12b + mov edx,DWORD[rdi] + shr r9,8 + xor r8,QWORD[((-128))+rbx*8+rbp] + shl r10,56 + xor r9,QWORD[rbx*8+rbp] + rol edx,8 + xor r8,QWORD[8+rax*1+rsi] + xor r9,QWORD[rax*1+rsi] + mov al,dl + xor r8,r10 + movzx r12,WORD[r12*2+r11] + movzx ebx,dl + shl al,4 + movzx r13,BYTE[rcx*1+rsp] + shr ebx,4 + shl r12,48 + xor r13,r8 + mov r10,r9 + xor r9,r12 + shr r8,8 + movzx r13,r13b + shr r9,8 + xor r8,QWORD[((-128))+rcx*8+rbp] + shl r10,56 + xor r9,QWORD[rcx*8+rbp] + rol edx,8 + xor r8,QWORD[8+rax*1+rsi] + xor r9,QWORD[rax*1+rsi] + mov al,dl + xor r8,r10 + movzx 
r13,WORD[r13*2+r11] + movzx ecx,dl + shl al,4 + movzx r12,BYTE[rbx*1+rsp] + shr ecx,4 + shl r13,48 + xor r12,r8 + mov r10,r9 + xor r9,r13 + shr r8,8 + movzx r12,r12b + shr r9,8 + xor r8,QWORD[((-128))+rbx*8+rbp] + shl r10,56 + xor r9,QWORD[rbx*8+rbp] + rol edx,8 + xor r8,QWORD[8+rax*1+rsi] + xor r9,QWORD[rax*1+rsi] + mov al,dl + xor r8,r10 + movzx r12,WORD[r12*2+r11] + movzx ebx,dl + shl al,4 + movzx r13,BYTE[rcx*1+rsp] + shr ebx,4 + shl r12,48 + xor r13,r8 + mov r10,r9 + xor r9,r12 + shr r8,8 + movzx r13,r13b + shr r9,8 + xor r8,QWORD[((-128))+rcx*8+rbp] + shl r10,56 + xor r9,QWORD[rcx*8+rbp] + rol edx,8 + xor r8,QWORD[8+rax*1+rsi] + xor r9,QWORD[rax*1+rsi] + mov al,dl + xor r8,r10 + movzx r13,WORD[r13*2+r11] + movzx ecx,dl + shl al,4 + movzx r12,BYTE[rbx*1+rsp] + and ecx,240 + shl r13,48 + xor r12,r8 + mov r10,r9 + xor r9,r13 + shr r8,8 + movzx r12,r12b + mov edx,DWORD[((-4))+rdi] + shr r9,8 + xor r8,QWORD[((-128))+rbx*8+rbp] + shl r10,56 + xor r9,QWORD[rbx*8+rbp] + movzx r12,WORD[r12*2+r11] + xor r8,QWORD[8+rax*1+rsi] + xor r9,QWORD[rax*1+rsi] + shl r12,48 + xor r8,r10 + xor r9,r12 + movzx r13,r8b + shr r8,4 + mov r10,r9 + shl r13b,4 + shr r9,4 + xor r8,QWORD[8+rcx*1+rsi] + movzx r13,WORD[r13*2+r11] + shl r10,60 + xor r9,QWORD[rcx*1+rsi] + xor r8,r10 + shl r13,48 + bswap r8 + xor r9,r13 + bswap r9 + cmp r14,r15 + jb NEAR $L$outer_loop + mov QWORD[8+rdi],r8 + mov QWORD[rdi],r9 + + lea rsi,[((280+48))+rsp] + + mov r15,QWORD[((-48))+rsi] + + mov r14,QWORD[((-40))+rsi] + + mov r13,QWORD[((-32))+rsi] + + mov r12,QWORD[((-24))+rsi] + + mov rbp,QWORD[((-16))+rsi] + + mov rbx,QWORD[((-8))+rsi] + + lea rsp,[rsi] + +$L$ghash_epilogue: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_gcm_ghash_4bit: +global gcm_init_clmul + +ALIGN 16 +gcm_init_clmul: + +$L$_init_clmul: +$L$SEH_begin_gcm_init_clmul: + +DB 0x48,0x83,0xec,0x18 +DB 0x0f,0x29,0x34,0x24 + movdqu xmm2,XMMWORD[rdx] + pshufd xmm2,xmm2,78 + + + pshufd xmm4,xmm2,255 + movdqa xmm3,xmm2 + psllq xmm2,1 + pxor xmm5,xmm5 + psrlq xmm3,63 + pcmpgtd xmm5,xmm4 + pslldq xmm3,8 + por xmm2,xmm3 + + + pand xmm5,XMMWORD[$L$0x1c2_polynomial] + pxor xmm2,xmm5 + + + pshufd xmm6,xmm2,78 + movdqa xmm0,xmm2 + pxor xmm6,xmm2 + movdqa xmm1,xmm0 + pshufd xmm3,xmm0,78 + pxor xmm3,xmm0 +DB 102,15,58,68,194,0 +DB 102,15,58,68,202,17 +DB 102,15,58,68,222,0 + pxor xmm3,xmm0 + pxor xmm3,xmm1 + + movdqa xmm4,xmm3 + psrldq xmm3,8 + pslldq xmm4,8 + pxor xmm1,xmm3 + pxor xmm0,xmm4 + + movdqa xmm4,xmm0 + movdqa xmm3,xmm0 + psllq xmm0,5 + pxor xmm3,xmm0 + psllq xmm0,1 + pxor xmm0,xmm3 + psllq xmm0,57 + movdqa xmm3,xmm0 + pslldq xmm0,8 + psrldq xmm3,8 + pxor xmm0,xmm4 + pxor xmm1,xmm3 + + + movdqa xmm4,xmm0 + psrlq xmm0,1 + pxor xmm1,xmm4 + pxor xmm4,xmm0 + psrlq xmm0,5 + pxor xmm0,xmm4 + psrlq xmm0,1 + pxor xmm0,xmm1 + pshufd xmm3,xmm2,78 + pshufd xmm4,xmm0,78 + pxor xmm3,xmm2 + movdqu XMMWORD[rcx],xmm2 + pxor xmm4,xmm0 + movdqu XMMWORD[16+rcx],xmm0 +DB 102,15,58,15,227,8 + movdqu XMMWORD[32+rcx],xmm4 + movdqa xmm1,xmm0 + pshufd xmm3,xmm0,78 + pxor xmm3,xmm0 +DB 102,15,58,68,194,0 +DB 102,15,58,68,202,17 +DB 102,15,58,68,222,0 + pxor xmm3,xmm0 + pxor xmm3,xmm1 + + movdqa xmm4,xmm3 + psrldq xmm3,8 + pslldq xmm4,8 + pxor xmm1,xmm3 + pxor xmm0,xmm4 + + movdqa xmm4,xmm0 + movdqa xmm3,xmm0 + psllq xmm0,5 + pxor xmm3,xmm0 + psllq xmm0,1 + pxor xmm0,xmm3 + psllq xmm0,57 + movdqa xmm3,xmm0 + pslldq xmm0,8 + psrldq xmm3,8 + pxor xmm0,xmm4 + pxor xmm1,xmm3 + + + movdqa xmm4,xmm0 + psrlq xmm0,1 + pxor xmm1,xmm4 + pxor xmm4,xmm0 + psrlq 
xmm0,5 + pxor xmm0,xmm4 + psrlq xmm0,1 + pxor xmm0,xmm1 + movdqa xmm5,xmm0 + movdqa xmm1,xmm0 + pshufd xmm3,xmm0,78 + pxor xmm3,xmm0 +DB 102,15,58,68,194,0 +DB 102,15,58,68,202,17 +DB 102,15,58,68,222,0 + pxor xmm3,xmm0 + pxor xmm3,xmm1 + + movdqa xmm4,xmm3 + psrldq xmm3,8 + pslldq xmm4,8 + pxor xmm1,xmm3 + pxor xmm0,xmm4 + + movdqa xmm4,xmm0 + movdqa xmm3,xmm0 + psllq xmm0,5 + pxor xmm3,xmm0 + psllq xmm0,1 + pxor xmm0,xmm3 + psllq xmm0,57 + movdqa xmm3,xmm0 + pslldq xmm0,8 + psrldq xmm3,8 + pxor xmm0,xmm4 + pxor xmm1,xmm3 + + + movdqa xmm4,xmm0 + psrlq xmm0,1 + pxor xmm1,xmm4 + pxor xmm4,xmm0 + psrlq xmm0,5 + pxor xmm0,xmm4 + psrlq xmm0,1 + pxor xmm0,xmm1 + pshufd xmm3,xmm5,78 + pshufd xmm4,xmm0,78 + pxor xmm3,xmm5 + movdqu XMMWORD[48+rcx],xmm5 + pxor xmm4,xmm0 + movdqu XMMWORD[64+rcx],xmm0 +DB 102,15,58,15,227,8 + movdqu XMMWORD[80+rcx],xmm4 + movaps xmm6,XMMWORD[rsp] + lea rsp,[24+rsp] +$L$SEH_end_gcm_init_clmul: + DB 0F3h,0C3h ;repret + + +global gcm_gmult_clmul + +ALIGN 16 +gcm_gmult_clmul: + +$L$_gmult_clmul: + movdqu xmm0,XMMWORD[rcx] + movdqa xmm5,XMMWORD[$L$bswap_mask] + movdqu xmm2,XMMWORD[rdx] + movdqu xmm4,XMMWORD[32+rdx] +DB 102,15,56,0,197 + movdqa xmm1,xmm0 + pshufd xmm3,xmm0,78 + pxor xmm3,xmm0 +DB 102,15,58,68,194,0 +DB 102,15,58,68,202,17 +DB 102,15,58,68,220,0 + pxor xmm3,xmm0 + pxor xmm3,xmm1 + + movdqa xmm4,xmm3 + psrldq xmm3,8 + pslldq xmm4,8 + pxor xmm1,xmm3 + pxor xmm0,xmm4 + + movdqa xmm4,xmm0 + movdqa xmm3,xmm0 + psllq xmm0,5 + pxor xmm3,xmm0 + psllq xmm0,1 + pxor xmm0,xmm3 + psllq xmm0,57 + movdqa xmm3,xmm0 + pslldq xmm0,8 + psrldq xmm3,8 + pxor xmm0,xmm4 + pxor xmm1,xmm3 + + + movdqa xmm4,xmm0 + psrlq xmm0,1 + pxor xmm1,xmm4 + pxor xmm4,xmm0 + psrlq xmm0,5 + pxor xmm0,xmm4 + psrlq xmm0,1 + pxor xmm0,xmm1 +DB 102,15,56,0,197 + movdqu XMMWORD[rcx],xmm0 + DB 0F3h,0C3h ;repret + + +global gcm_ghash_clmul + +ALIGN 32 +gcm_ghash_clmul: + +$L$_ghash_clmul: + lea rax,[((-136))+rsp] +$L$SEH_begin_gcm_ghash_clmul: + +DB 0x48,0x8d,0x60,0xe0 +DB 0x0f,0x29,0x70,0xe0 +DB 0x0f,0x29,0x78,0xf0 +DB 0x44,0x0f,0x29,0x00 +DB 0x44,0x0f,0x29,0x48,0x10 +DB 0x44,0x0f,0x29,0x50,0x20 +DB 0x44,0x0f,0x29,0x58,0x30 +DB 0x44,0x0f,0x29,0x60,0x40 +DB 0x44,0x0f,0x29,0x68,0x50 +DB 0x44,0x0f,0x29,0x70,0x60 +DB 0x44,0x0f,0x29,0x78,0x70 + movdqa xmm10,XMMWORD[$L$bswap_mask] + + movdqu xmm0,XMMWORD[rcx] + movdqu xmm2,XMMWORD[rdx] + movdqu xmm7,XMMWORD[32+rdx] +DB 102,65,15,56,0,194 + + sub r9,0x10 + jz NEAR $L$odd_tail + + movdqu xmm6,XMMWORD[16+rdx] + mov eax,DWORD[((OPENSSL_ia32cap_P+4))] + cmp r9,0x30 + jb NEAR $L$skip4x + + and eax,71303168 + cmp eax,4194304 + je NEAR $L$skip4x + + sub r9,0x30 + mov rax,0xA040608020C0E000 + movdqu xmm14,XMMWORD[48+rdx] + movdqu xmm15,XMMWORD[64+rdx] + + + + + movdqu xmm3,XMMWORD[48+r8] + movdqu xmm11,XMMWORD[32+r8] +DB 102,65,15,56,0,218 +DB 102,69,15,56,0,218 + movdqa xmm5,xmm3 + pshufd xmm4,xmm3,78 + pxor xmm4,xmm3 +DB 102,15,58,68,218,0 +DB 102,15,58,68,234,17 +DB 102,15,58,68,231,0 + + movdqa xmm13,xmm11 + pshufd xmm12,xmm11,78 + pxor xmm12,xmm11 +DB 102,68,15,58,68,222,0 +DB 102,68,15,58,68,238,17 +DB 102,68,15,58,68,231,16 + xorps xmm3,xmm11 + xorps xmm5,xmm13 + movups xmm7,XMMWORD[80+rdx] + xorps xmm4,xmm12 + + movdqu xmm11,XMMWORD[16+r8] + movdqu xmm8,XMMWORD[r8] +DB 102,69,15,56,0,218 +DB 102,69,15,56,0,194 + movdqa xmm13,xmm11 + pshufd xmm12,xmm11,78 + pxor xmm0,xmm8 + pxor xmm12,xmm11 +DB 102,69,15,58,68,222,0 + movdqa xmm1,xmm0 + pshufd xmm8,xmm0,78 + pxor xmm8,xmm0 +DB 102,69,15,58,68,238,17 +DB 102,68,15,58,68,231,0 + xorps xmm3,xmm11 + xorps 
xmm5,xmm13 + + lea r8,[64+r8] + sub r9,0x40 + jc NEAR $L$tail4x + + jmp NEAR $L$mod4_loop +ALIGN 32 +$L$mod4_loop: +DB 102,65,15,58,68,199,0 + xorps xmm4,xmm12 + movdqu xmm11,XMMWORD[48+r8] +DB 102,69,15,56,0,218 +DB 102,65,15,58,68,207,17 + xorps xmm0,xmm3 + movdqu xmm3,XMMWORD[32+r8] + movdqa xmm13,xmm11 +DB 102,68,15,58,68,199,16 + pshufd xmm12,xmm11,78 + xorps xmm1,xmm5 + pxor xmm12,xmm11 +DB 102,65,15,56,0,218 + movups xmm7,XMMWORD[32+rdx] + xorps xmm8,xmm4 +DB 102,68,15,58,68,218,0 + pshufd xmm4,xmm3,78 + + pxor xmm8,xmm0 + movdqa xmm5,xmm3 + pxor xmm8,xmm1 + pxor xmm4,xmm3 + movdqa xmm9,xmm8 +DB 102,68,15,58,68,234,17 + pslldq xmm8,8 + psrldq xmm9,8 + pxor xmm0,xmm8 + movdqa xmm8,XMMWORD[$L$7_mask] + pxor xmm1,xmm9 +DB 102,76,15,110,200 + + pand xmm8,xmm0 +DB 102,69,15,56,0,200 + pxor xmm9,xmm0 +DB 102,68,15,58,68,231,0 + psllq xmm9,57 + movdqa xmm8,xmm9 + pslldq xmm9,8 +DB 102,15,58,68,222,0 + psrldq xmm8,8 + pxor xmm0,xmm9 + pxor xmm1,xmm8 + movdqu xmm8,XMMWORD[r8] + + movdqa xmm9,xmm0 + psrlq xmm0,1 +DB 102,15,58,68,238,17 + xorps xmm3,xmm11 + movdqu xmm11,XMMWORD[16+r8] +DB 102,69,15,56,0,218 +DB 102,15,58,68,231,16 + xorps xmm5,xmm13 + movups xmm7,XMMWORD[80+rdx] +DB 102,69,15,56,0,194 + pxor xmm1,xmm9 + pxor xmm9,xmm0 + psrlq xmm0,5 + + movdqa xmm13,xmm11 + pxor xmm4,xmm12 + pshufd xmm12,xmm11,78 + pxor xmm0,xmm9 + pxor xmm1,xmm8 + pxor xmm12,xmm11 +DB 102,69,15,58,68,222,0 + psrlq xmm0,1 + pxor xmm0,xmm1 + movdqa xmm1,xmm0 +DB 102,69,15,58,68,238,17 + xorps xmm3,xmm11 + pshufd xmm8,xmm0,78 + pxor xmm8,xmm0 + +DB 102,68,15,58,68,231,0 + xorps xmm5,xmm13 + + lea r8,[64+r8] + sub r9,0x40 + jnc NEAR $L$mod4_loop + +$L$tail4x: +DB 102,65,15,58,68,199,0 +DB 102,65,15,58,68,207,17 +DB 102,68,15,58,68,199,16 + xorps xmm4,xmm12 + xorps xmm0,xmm3 + xorps xmm1,xmm5 + pxor xmm1,xmm0 + pxor xmm8,xmm4 + + pxor xmm8,xmm1 + pxor xmm1,xmm0 + + movdqa xmm9,xmm8 + psrldq xmm8,8 + pslldq xmm9,8 + pxor xmm1,xmm8 + pxor xmm0,xmm9 + + movdqa xmm4,xmm0 + movdqa xmm3,xmm0 + psllq xmm0,5 + pxor xmm3,xmm0 + psllq xmm0,1 + pxor xmm0,xmm3 + psllq xmm0,57 + movdqa xmm3,xmm0 + pslldq xmm0,8 + psrldq xmm3,8 + pxor xmm0,xmm4 + pxor xmm1,xmm3 + + + movdqa xmm4,xmm0 + psrlq xmm0,1 + pxor xmm1,xmm4 + pxor xmm4,xmm0 + psrlq xmm0,5 + pxor xmm0,xmm4 + psrlq xmm0,1 + pxor xmm0,xmm1 + add r9,0x40 + jz NEAR $L$done + movdqu xmm7,XMMWORD[32+rdx] + sub r9,0x10 + jz NEAR $L$odd_tail +$L$skip4x: + + + + + + movdqu xmm8,XMMWORD[r8] + movdqu xmm3,XMMWORD[16+r8] +DB 102,69,15,56,0,194 +DB 102,65,15,56,0,218 + pxor xmm0,xmm8 + + movdqa xmm5,xmm3 + pshufd xmm4,xmm3,78 + pxor xmm4,xmm3 +DB 102,15,58,68,218,0 +DB 102,15,58,68,234,17 +DB 102,15,58,68,231,0 + + lea r8,[32+r8] + nop + sub r9,0x20 + jbe NEAR $L$even_tail + nop + jmp NEAR $L$mod_loop + +ALIGN 32 +$L$mod_loop: + movdqa xmm1,xmm0 + movdqa xmm8,xmm4 + pshufd xmm4,xmm0,78 + pxor xmm4,xmm0 + +DB 102,15,58,68,198,0 +DB 102,15,58,68,206,17 +DB 102,15,58,68,231,16 + + pxor xmm0,xmm3 + pxor xmm1,xmm5 + movdqu xmm9,XMMWORD[r8] + pxor xmm8,xmm0 +DB 102,69,15,56,0,202 + movdqu xmm3,XMMWORD[16+r8] + + pxor xmm8,xmm1 + pxor xmm1,xmm9 + pxor xmm4,xmm8 +DB 102,65,15,56,0,218 + movdqa xmm8,xmm4 + psrldq xmm8,8 + pslldq xmm4,8 + pxor xmm1,xmm8 + pxor xmm0,xmm4 + + movdqa xmm5,xmm3 + + movdqa xmm9,xmm0 + movdqa xmm8,xmm0 + psllq xmm0,5 + pxor xmm8,xmm0 +DB 102,15,58,68,218,0 + psllq xmm0,1 + pxor xmm0,xmm8 + psllq xmm0,57 + movdqa xmm8,xmm0 + pslldq xmm0,8 + psrldq xmm8,8 + pxor xmm0,xmm9 + pshufd xmm4,xmm5,78 + pxor xmm1,xmm8 + pxor xmm4,xmm5 + + movdqa xmm9,xmm0 + psrlq xmm0,1 +DB 
102,15,58,68,234,17 + pxor xmm1,xmm9 + pxor xmm9,xmm0 + psrlq xmm0,5 + pxor xmm0,xmm9 + lea r8,[32+r8] + psrlq xmm0,1 +DB 102,15,58,68,231,0 + pxor xmm0,xmm1 + + sub r9,0x20 + ja NEAR $L$mod_loop + +$L$even_tail: + movdqa xmm1,xmm0 + movdqa xmm8,xmm4 + pshufd xmm4,xmm0,78 + pxor xmm4,xmm0 + +DB 102,15,58,68,198,0 +DB 102,15,58,68,206,17 +DB 102,15,58,68,231,16 + + pxor xmm0,xmm3 + pxor xmm1,xmm5 + pxor xmm8,xmm0 + pxor xmm8,xmm1 + pxor xmm4,xmm8 + movdqa xmm8,xmm4 + psrldq xmm8,8 + pslldq xmm4,8 + pxor xmm1,xmm8 + pxor xmm0,xmm4 + + movdqa xmm4,xmm0 + movdqa xmm3,xmm0 + psllq xmm0,5 + pxor xmm3,xmm0 + psllq xmm0,1 + pxor xmm0,xmm3 + psllq xmm0,57 + movdqa xmm3,xmm0 + pslldq xmm0,8 + psrldq xmm3,8 + pxor xmm0,xmm4 + pxor xmm1,xmm3 + + + movdqa xmm4,xmm0 + psrlq xmm0,1 + pxor xmm1,xmm4 + pxor xmm4,xmm0 + psrlq xmm0,5 + pxor xmm0,xmm4 + psrlq xmm0,1 + pxor xmm0,xmm1 + test r9,r9 + jnz NEAR $L$done + +$L$odd_tail: + movdqu xmm8,XMMWORD[r8] +DB 102,69,15,56,0,194 + pxor xmm0,xmm8 + movdqa xmm1,xmm0 + pshufd xmm3,xmm0,78 + pxor xmm3,xmm0 +DB 102,15,58,68,194,0 +DB 102,15,58,68,202,17 +DB 102,15,58,68,223,0 + pxor xmm3,xmm0 + pxor xmm3,xmm1 + + movdqa xmm4,xmm3 + psrldq xmm3,8 + pslldq xmm4,8 + pxor xmm1,xmm3 + pxor xmm0,xmm4 + + movdqa xmm4,xmm0 + movdqa xmm3,xmm0 + psllq xmm0,5 + pxor xmm3,xmm0 + psllq xmm0,1 + pxor xmm0,xmm3 + psllq xmm0,57 + movdqa xmm3,xmm0 + pslldq xmm0,8 + psrldq xmm3,8 + pxor xmm0,xmm4 + pxor xmm1,xmm3 + + + movdqa xmm4,xmm0 + psrlq xmm0,1 + pxor xmm1,xmm4 + pxor xmm4,xmm0 + psrlq xmm0,5 + pxor xmm0,xmm4 + psrlq xmm0,1 + pxor xmm0,xmm1 +$L$done: +DB 102,65,15,56,0,194 + movdqu XMMWORD[rcx],xmm0 + movaps xmm6,XMMWORD[rsp] + movaps xmm7,XMMWORD[16+rsp] + movaps xmm8,XMMWORD[32+rsp] + movaps xmm9,XMMWORD[48+rsp] + movaps xmm10,XMMWORD[64+rsp] + movaps xmm11,XMMWORD[80+rsp] + movaps xmm12,XMMWORD[96+rsp] + movaps xmm13,XMMWORD[112+rsp] + movaps xmm14,XMMWORD[128+rsp] + movaps xmm15,XMMWORD[144+rsp] + lea rsp,[168+rsp] +$L$SEH_end_gcm_ghash_clmul: + DB 0F3h,0C3h ;repret + + +global gcm_init_avx + +ALIGN 32 +gcm_init_avx: + + jmp NEAR $L$_init_clmul + + +global gcm_gmult_avx + +ALIGN 32 +gcm_gmult_avx: + + jmp NEAR $L$_gmult_clmul + + +global gcm_ghash_avx + +ALIGN 32 +gcm_ghash_avx: + + jmp NEAR $L$_ghash_clmul + + +ALIGN 64 +$L$bswap_mask: +DB 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0 +$L$0x1c2_polynomial: +DB 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0xc2 +$L$7_mask: + DD 7,0,7,0 +$L$7_mask_poly: + DD 7,0,450,0 +ALIGN 64 + +$L$rem_4bit: + DD 0,0,0,471859200,0,943718400,0,610271232 + DD 0,1887436800,0,1822425088,0,1220542464,0,1423966208 + DD 0,3774873600,0,4246732800,0,3644850176,0,3311403008 + DD 0,2441084928,0,2376073216,0,2847932416,0,3051356160 + +$L$rem_8bit: + DW 0x0000,0x01C2,0x0384,0x0246,0x0708,0x06CA,0x048C,0x054E + DW 0x0E10,0x0FD2,0x0D94,0x0C56,0x0918,0x08DA,0x0A9C,0x0B5E + DW 0x1C20,0x1DE2,0x1FA4,0x1E66,0x1B28,0x1AEA,0x18AC,0x196E + DW 0x1230,0x13F2,0x11B4,0x1076,0x1538,0x14FA,0x16BC,0x177E + DW 0x3840,0x3982,0x3BC4,0x3A06,0x3F48,0x3E8A,0x3CCC,0x3D0E + DW 0x3650,0x3792,0x35D4,0x3416,0x3158,0x309A,0x32DC,0x331E + DW 0x2460,0x25A2,0x27E4,0x2626,0x2368,0x22AA,0x20EC,0x212E + DW 0x2A70,0x2BB2,0x29F4,0x2836,0x2D78,0x2CBA,0x2EFC,0x2F3E + DW 0x7080,0x7142,0x7304,0x72C6,0x7788,0x764A,0x740C,0x75CE + DW 0x7E90,0x7F52,0x7D14,0x7CD6,0x7998,0x785A,0x7A1C,0x7BDE + DW 0x6CA0,0x6D62,0x6F24,0x6EE6,0x6BA8,0x6A6A,0x682C,0x69EE + DW 0x62B0,0x6372,0x6134,0x60F6,0x65B8,0x647A,0x663C,0x67FE + DW 0x48C0,0x4902,0x4B44,0x4A86,0x4FC8,0x4E0A,0x4C4C,0x4D8E + DW 
0x46D0,0x4712,0x4554,0x4496,0x41D8,0x401A,0x425C,0x439E + DW 0x54E0,0x5522,0x5764,0x56A6,0x53E8,0x522A,0x506C,0x51AE + DW 0x5AF0,0x5B32,0x5974,0x58B6,0x5DF8,0x5C3A,0x5E7C,0x5FBE + DW 0xE100,0xE0C2,0xE284,0xE346,0xE608,0xE7CA,0xE58C,0xE44E + DW 0xEF10,0xEED2,0xEC94,0xED56,0xE818,0xE9DA,0xEB9C,0xEA5E + DW 0xFD20,0xFCE2,0xFEA4,0xFF66,0xFA28,0xFBEA,0xF9AC,0xF86E + DW 0xF330,0xF2F2,0xF0B4,0xF176,0xF438,0xF5FA,0xF7BC,0xF67E + DW 0xD940,0xD882,0xDAC4,0xDB06,0xDE48,0xDF8A,0xDDCC,0xDC0E + DW 0xD750,0xD692,0xD4D4,0xD516,0xD058,0xD19A,0xD3DC,0xD21E + DW 0xC560,0xC4A2,0xC6E4,0xC726,0xC268,0xC3AA,0xC1EC,0xC02E + DW 0xCB70,0xCAB2,0xC8F4,0xC936,0xCC78,0xCDBA,0xCFFC,0xCE3E + DW 0x9180,0x9042,0x9204,0x93C6,0x9688,0x974A,0x950C,0x94CE + DW 0x9F90,0x9E52,0x9C14,0x9DD6,0x9898,0x995A,0x9B1C,0x9ADE + DW 0x8DA0,0x8C62,0x8E24,0x8FE6,0x8AA8,0x8B6A,0x892C,0x88EE + DW 0x83B0,0x8272,0x8034,0x81F6,0x84B8,0x857A,0x873C,0x86FE + DW 0xA9C0,0xA802,0xAA44,0xAB86,0xAEC8,0xAF0A,0xAD4C,0xAC8E + DW 0xA7D0,0xA612,0xA454,0xA596,0xA0D8,0xA11A,0xA35C,0xA29E + DW 0xB5E0,0xB422,0xB664,0xB7A6,0xB2E8,0xB32A,0xB16C,0xB0AE + DW 0xBBF0,0xBA32,0xB874,0xB9B6,0xBCF8,0xBD3A,0xBF7C,0xBEBE + +DB 71,72,65,83,72,32,102,111,114,32,120,56,54,95,54,52 +DB 44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32 +DB 60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111 +DB 114,103,62,0 +ALIGN 64 +EXTERN __imp_RtlVirtualUnwind + +ALIGN 16 +se_handler: + push rsi + push rdi + push rbx + push rbp + push r12 + push r13 + push r14 + push r15 + pushfq + sub rsp,64 + + mov rax,QWORD[120+r8] + mov rbx,QWORD[248+r8] + + mov rsi,QWORD[8+r9] + mov r11,QWORD[56+r9] + + mov r10d,DWORD[r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jb NEAR $L$in_prologue + + mov rax,QWORD[152+r8] + + mov r10d,DWORD[4+r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jae NEAR $L$in_prologue + + lea rax,[((48+280))+rax] + + mov rbx,QWORD[((-8))+rax] + mov rbp,QWORD[((-16))+rax] + mov r12,QWORD[((-24))+rax] + mov r13,QWORD[((-32))+rax] + mov r14,QWORD[((-40))+rax] + mov r15,QWORD[((-48))+rax] + mov QWORD[144+r8],rbx + mov QWORD[160+r8],rbp + mov QWORD[216+r8],r12 + mov QWORD[224+r8],r13 + mov QWORD[232+r8],r14 + mov QWORD[240+r8],r15 + +$L$in_prologue: + mov rdi,QWORD[8+rax] + mov rsi,QWORD[16+rax] + mov QWORD[152+r8],rax + mov QWORD[168+r8],rsi + mov QWORD[176+r8],rdi + + mov rdi,QWORD[40+r9] + mov rsi,r8 + mov ecx,154 + DD 0xa548f3fc + + mov rsi,r9 + xor rcx,rcx + mov rdx,QWORD[8+rsi] + mov r8,QWORD[rsi] + mov r9,QWORD[16+rsi] + mov r10,QWORD[40+rsi] + lea r11,[56+rsi] + lea r12,[24+rsi] + mov QWORD[32+rsp],r10 + mov QWORD[40+rsp],r11 + mov QWORD[48+rsp],r12 + mov QWORD[56+rsp],rcx + call QWORD[__imp_RtlVirtualUnwind] + + mov eax,1 + add rsp,64 + popfq + pop r15 + pop r14 + pop r13 + pop r12 + pop rbp + pop rbx + pop rdi + pop rsi + DB 0F3h,0C3h ;repret + + +section .pdata rdata align=4 +ALIGN 4 + DD $L$SEH_begin_gcm_gmult_4bit wrt ..imagebase + DD $L$SEH_end_gcm_gmult_4bit wrt ..imagebase + DD $L$SEH_info_gcm_gmult_4bit wrt ..imagebase + + DD $L$SEH_begin_gcm_ghash_4bit wrt ..imagebase + DD $L$SEH_end_gcm_ghash_4bit wrt ..imagebase + DD $L$SEH_info_gcm_ghash_4bit wrt ..imagebase + + DD $L$SEH_begin_gcm_init_clmul wrt ..imagebase + DD $L$SEH_end_gcm_init_clmul wrt ..imagebase + DD $L$SEH_info_gcm_init_clmul wrt ..imagebase + + DD $L$SEH_begin_gcm_ghash_clmul wrt ..imagebase + DD $L$SEH_end_gcm_ghash_clmul wrt ..imagebase + DD $L$SEH_info_gcm_ghash_clmul wrt ..imagebase +section .xdata rdata align=8 +ALIGN 8 +$L$SEH_info_gcm_gmult_4bit: +DB 9,0,0,0 + DD se_handler wrt ..imagebase + DD 
$L$gmult_prologue wrt ..imagebase,$L$gmult_epilogue wrt ..imagebase +$L$SEH_info_gcm_ghash_4bit: +DB 9,0,0,0 + DD se_handler wrt ..imagebase + DD $L$ghash_prologue wrt ..imagebase,$L$ghash_epilogue wrt ..imagebase +$L$SEH_info_gcm_init_clmul: +DB 0x01,0x08,0x03,0x00 +DB 0x08,0x68,0x00,0x00 +DB 0x04,0x22,0x00,0x00 +$L$SEH_info_gcm_ghash_clmul: +DB 0x01,0x33,0x16,0x00 +DB 0x33,0xf8,0x09,0x00 +DB 0x2e,0xe8,0x08,0x00 +DB 0x29,0xd8,0x07,0x00 +DB 0x24,0xc8,0x06,0x00 +DB 0x1f,0xb8,0x05,0x00 +DB 0x1a,0xa8,0x04,0x00 +DB 0x15,0x98,0x03,0x00 +DB 0x10,0x88,0x02,0x00 +DB 0x0c,0x78,0x01,0x00 +DB 0x08,0x68,0x00,0x00 +DB 0x04,0x01,0x15,0x00 diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm new file mode 100644 index 0000000000..f3b7b0e35e --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm @@ -0,0 +1,3137 @@ +; WARNING: do not edit! +; Generated from openssl/crypto/sha/asm/sha1-mb-x86_64.pl +; +; Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. +; +; Licensed under the OpenSSL license (the "License"). You may not use +; this file except in compliance with the License. You can obtain a copy +; in the file LICENSE in the source distribution or at +; https://www.openssl.org/source/license.html + +default rel +%define XMMWORD +%define YMMWORD +%define ZMMWORD +section .text code align=64 + + +EXTERN OPENSSL_ia32cap_P + +global sha1_multi_block + +ALIGN 32 +sha1_multi_block: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_sha1_multi_block: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + + + + mov rcx,QWORD[((OPENSSL_ia32cap_P+4))] + bt rcx,61 + jc NEAR _shaext_shortcut + mov rax,rsp + + push rbx + + push rbp + + lea rsp,[((-168))+rsp] + movaps XMMWORD[rsp],xmm6 + movaps XMMWORD[16+rsp],xmm7 + movaps XMMWORD[32+rsp],xmm8 + movaps XMMWORD[48+rsp],xmm9 + movaps XMMWORD[(-120)+rax],xmm10 + movaps XMMWORD[(-104)+rax],xmm11 + movaps XMMWORD[(-88)+rax],xmm12 + movaps XMMWORD[(-72)+rax],xmm13 + movaps XMMWORD[(-56)+rax],xmm14 + movaps XMMWORD[(-40)+rax],xmm15 + sub rsp,288 + and rsp,-256 + mov QWORD[272+rsp],rax + +$L$body: + lea rbp,[K_XX_XX] + lea rbx,[256+rsp] + +$L$oop_grande: + mov DWORD[280+rsp],edx + xor edx,edx + mov r8,QWORD[rsi] + mov ecx,DWORD[8+rsi] + cmp ecx,edx + cmovg edx,ecx + test ecx,ecx + mov DWORD[rbx],ecx + cmovle r8,rbp + mov r9,QWORD[16+rsi] + mov ecx,DWORD[24+rsi] + cmp ecx,edx + cmovg edx,ecx + test ecx,ecx + mov DWORD[4+rbx],ecx + cmovle r9,rbp + mov r10,QWORD[32+rsi] + mov ecx,DWORD[40+rsi] + cmp ecx,edx + cmovg edx,ecx + test ecx,ecx + mov DWORD[8+rbx],ecx + cmovle r10,rbp + mov r11,QWORD[48+rsi] + mov ecx,DWORD[56+rsi] + cmp ecx,edx + cmovg edx,ecx + test ecx,ecx + mov DWORD[12+rbx],ecx + cmovle r11,rbp + test edx,edx + jz NEAR $L$done + + movdqu xmm10,XMMWORD[rdi] + lea rax,[128+rsp] + movdqu xmm11,XMMWORD[32+rdi] + movdqu xmm12,XMMWORD[64+rdi] + movdqu xmm13,XMMWORD[96+rdi] + movdqu xmm14,XMMWORD[128+rdi] + movdqa xmm5,XMMWORD[96+rbp] + movdqa xmm15,XMMWORD[((-32))+rbp] + jmp NEAR $L$oop + +ALIGN 32 +$L$oop: + movd xmm0,DWORD[r8] + lea r8,[64+r8] + movd xmm2,DWORD[r9] + lea r9,[64+r9] + movd xmm3,DWORD[r10] + lea r10,[64+r10] + movd xmm4,DWORD[r11] + lea r11,[64+r11] + punpckldq xmm0,xmm3 + movd xmm1,DWORD[((-60))+r8] + punpckldq xmm2,xmm4 + movd xmm9,DWORD[((-60))+r9] + punpckldq xmm0,xmm2 + movd xmm8,DWORD[((-60))+r10] +DB 102,15,56,0,197 + movd xmm7,DWORD[((-60))+r11] + punpckldq xmm1,xmm8 + movdqa 
xmm8,xmm10 + paddd xmm14,xmm15 + punpckldq xmm9,xmm7 + movdqa xmm7,xmm11 + movdqa xmm6,xmm11 + pslld xmm8,5 + pandn xmm7,xmm13 + pand xmm6,xmm12 + punpckldq xmm1,xmm9 + movdqa xmm9,xmm10 + + movdqa XMMWORD[(0-128)+rax],xmm0 + paddd xmm14,xmm0 + movd xmm2,DWORD[((-56))+r8] + psrld xmm9,27 + pxor xmm6,xmm7 + movdqa xmm7,xmm11 + + por xmm8,xmm9 + movd xmm9,DWORD[((-56))+r9] + pslld xmm7,30 + paddd xmm14,xmm6 + + psrld xmm11,2 + paddd xmm14,xmm8 +DB 102,15,56,0,205 + movd xmm8,DWORD[((-56))+r10] + por xmm11,xmm7 + movd xmm7,DWORD[((-56))+r11] + punpckldq xmm2,xmm8 + movdqa xmm8,xmm14 + paddd xmm13,xmm15 + punpckldq xmm9,xmm7 + movdqa xmm7,xmm10 + movdqa xmm6,xmm10 + pslld xmm8,5 + pandn xmm7,xmm12 + pand xmm6,xmm11 + punpckldq xmm2,xmm9 + movdqa xmm9,xmm14 + + movdqa XMMWORD[(16-128)+rax],xmm1 + paddd xmm13,xmm1 + movd xmm3,DWORD[((-52))+r8] + psrld xmm9,27 + pxor xmm6,xmm7 + movdqa xmm7,xmm10 + + por xmm8,xmm9 + movd xmm9,DWORD[((-52))+r9] + pslld xmm7,30 + paddd xmm13,xmm6 + + psrld xmm10,2 + paddd xmm13,xmm8 +DB 102,15,56,0,213 + movd xmm8,DWORD[((-52))+r10] + por xmm10,xmm7 + movd xmm7,DWORD[((-52))+r11] + punpckldq xmm3,xmm8 + movdqa xmm8,xmm13 + paddd xmm12,xmm15 + punpckldq xmm9,xmm7 + movdqa xmm7,xmm14 + movdqa xmm6,xmm14 + pslld xmm8,5 + pandn xmm7,xmm11 + pand xmm6,xmm10 + punpckldq xmm3,xmm9 + movdqa xmm9,xmm13 + + movdqa XMMWORD[(32-128)+rax],xmm2 + paddd xmm12,xmm2 + movd xmm4,DWORD[((-48))+r8] + psrld xmm9,27 + pxor xmm6,xmm7 + movdqa xmm7,xmm14 + + por xmm8,xmm9 + movd xmm9,DWORD[((-48))+r9] + pslld xmm7,30 + paddd xmm12,xmm6 + + psrld xmm14,2 + paddd xmm12,xmm8 +DB 102,15,56,0,221 + movd xmm8,DWORD[((-48))+r10] + por xmm14,xmm7 + movd xmm7,DWORD[((-48))+r11] + punpckldq xmm4,xmm8 + movdqa xmm8,xmm12 + paddd xmm11,xmm15 + punpckldq xmm9,xmm7 + movdqa xmm7,xmm13 + movdqa xmm6,xmm13 + pslld xmm8,5 + pandn xmm7,xmm10 + pand xmm6,xmm14 + punpckldq xmm4,xmm9 + movdqa xmm9,xmm12 + + movdqa XMMWORD[(48-128)+rax],xmm3 + paddd xmm11,xmm3 + movd xmm0,DWORD[((-44))+r8] + psrld xmm9,27 + pxor xmm6,xmm7 + movdqa xmm7,xmm13 + + por xmm8,xmm9 + movd xmm9,DWORD[((-44))+r9] + pslld xmm7,30 + paddd xmm11,xmm6 + + psrld xmm13,2 + paddd xmm11,xmm8 +DB 102,15,56,0,229 + movd xmm8,DWORD[((-44))+r10] + por xmm13,xmm7 + movd xmm7,DWORD[((-44))+r11] + punpckldq xmm0,xmm8 + movdqa xmm8,xmm11 + paddd xmm10,xmm15 + punpckldq xmm9,xmm7 + movdqa xmm7,xmm12 + movdqa xmm6,xmm12 + pslld xmm8,5 + pandn xmm7,xmm14 + pand xmm6,xmm13 + punpckldq xmm0,xmm9 + movdqa xmm9,xmm11 + + movdqa XMMWORD[(64-128)+rax],xmm4 + paddd xmm10,xmm4 + movd xmm1,DWORD[((-40))+r8] + psrld xmm9,27 + pxor xmm6,xmm7 + movdqa xmm7,xmm12 + + por xmm8,xmm9 + movd xmm9,DWORD[((-40))+r9] + pslld xmm7,30 + paddd xmm10,xmm6 + + psrld xmm12,2 + paddd xmm10,xmm8 +DB 102,15,56,0,197 + movd xmm8,DWORD[((-40))+r10] + por xmm12,xmm7 + movd xmm7,DWORD[((-40))+r11] + punpckldq xmm1,xmm8 + movdqa xmm8,xmm10 + paddd xmm14,xmm15 + punpckldq xmm9,xmm7 + movdqa xmm7,xmm11 + movdqa xmm6,xmm11 + pslld xmm8,5 + pandn xmm7,xmm13 + pand xmm6,xmm12 + punpckldq xmm1,xmm9 + movdqa xmm9,xmm10 + + movdqa XMMWORD[(80-128)+rax],xmm0 + paddd xmm14,xmm0 + movd xmm2,DWORD[((-36))+r8] + psrld xmm9,27 + pxor xmm6,xmm7 + movdqa xmm7,xmm11 + + por xmm8,xmm9 + movd xmm9,DWORD[((-36))+r9] + pslld xmm7,30 + paddd xmm14,xmm6 + + psrld xmm11,2 + paddd xmm14,xmm8 +DB 102,15,56,0,205 + movd xmm8,DWORD[((-36))+r10] + por xmm11,xmm7 + movd xmm7,DWORD[((-36))+r11] + punpckldq xmm2,xmm8 + movdqa xmm8,xmm14 + paddd xmm13,xmm15 + punpckldq xmm9,xmm7 + movdqa xmm7,xmm10 + movdqa xmm6,xmm10 
+ pslld xmm8,5 + pandn xmm7,xmm12 + pand xmm6,xmm11 + punpckldq xmm2,xmm9 + movdqa xmm9,xmm14 + + movdqa XMMWORD[(96-128)+rax],xmm1 + paddd xmm13,xmm1 + movd xmm3,DWORD[((-32))+r8] + psrld xmm9,27 + pxor xmm6,xmm7 + movdqa xmm7,xmm10 + + por xmm8,xmm9 + movd xmm9,DWORD[((-32))+r9] + pslld xmm7,30 + paddd xmm13,xmm6 + + psrld xmm10,2 + paddd xmm13,xmm8 +DB 102,15,56,0,213 + movd xmm8,DWORD[((-32))+r10] + por xmm10,xmm7 + movd xmm7,DWORD[((-32))+r11] + punpckldq xmm3,xmm8 + movdqa xmm8,xmm13 + paddd xmm12,xmm15 + punpckldq xmm9,xmm7 + movdqa xmm7,xmm14 + movdqa xmm6,xmm14 + pslld xmm8,5 + pandn xmm7,xmm11 + pand xmm6,xmm10 + punpckldq xmm3,xmm9 + movdqa xmm9,xmm13 + + movdqa XMMWORD[(112-128)+rax],xmm2 + paddd xmm12,xmm2 + movd xmm4,DWORD[((-28))+r8] + psrld xmm9,27 + pxor xmm6,xmm7 + movdqa xmm7,xmm14 + + por xmm8,xmm9 + movd xmm9,DWORD[((-28))+r9] + pslld xmm7,30 + paddd xmm12,xmm6 + + psrld xmm14,2 + paddd xmm12,xmm8 +DB 102,15,56,0,221 + movd xmm8,DWORD[((-28))+r10] + por xmm14,xmm7 + movd xmm7,DWORD[((-28))+r11] + punpckldq xmm4,xmm8 + movdqa xmm8,xmm12 + paddd xmm11,xmm15 + punpckldq xmm9,xmm7 + movdqa xmm7,xmm13 + movdqa xmm6,xmm13 + pslld xmm8,5 + pandn xmm7,xmm10 + pand xmm6,xmm14 + punpckldq xmm4,xmm9 + movdqa xmm9,xmm12 + + movdqa XMMWORD[(128-128)+rax],xmm3 + paddd xmm11,xmm3 + movd xmm0,DWORD[((-24))+r8] + psrld xmm9,27 + pxor xmm6,xmm7 + movdqa xmm7,xmm13 + + por xmm8,xmm9 + movd xmm9,DWORD[((-24))+r9] + pslld xmm7,30 + paddd xmm11,xmm6 + + psrld xmm13,2 + paddd xmm11,xmm8 +DB 102,15,56,0,229 + movd xmm8,DWORD[((-24))+r10] + por xmm13,xmm7 + movd xmm7,DWORD[((-24))+r11] + punpckldq xmm0,xmm8 + movdqa xmm8,xmm11 + paddd xmm10,xmm15 + punpckldq xmm9,xmm7 + movdqa xmm7,xmm12 + movdqa xmm6,xmm12 + pslld xmm8,5 + pandn xmm7,xmm14 + pand xmm6,xmm13 + punpckldq xmm0,xmm9 + movdqa xmm9,xmm11 + + movdqa XMMWORD[(144-128)+rax],xmm4 + paddd xmm10,xmm4 + movd xmm1,DWORD[((-20))+r8] + psrld xmm9,27 + pxor xmm6,xmm7 + movdqa xmm7,xmm12 + + por xmm8,xmm9 + movd xmm9,DWORD[((-20))+r9] + pslld xmm7,30 + paddd xmm10,xmm6 + + psrld xmm12,2 + paddd xmm10,xmm8 +DB 102,15,56,0,197 + movd xmm8,DWORD[((-20))+r10] + por xmm12,xmm7 + movd xmm7,DWORD[((-20))+r11] + punpckldq xmm1,xmm8 + movdqa xmm8,xmm10 + paddd xmm14,xmm15 + punpckldq xmm9,xmm7 + movdqa xmm7,xmm11 + movdqa xmm6,xmm11 + pslld xmm8,5 + pandn xmm7,xmm13 + pand xmm6,xmm12 + punpckldq xmm1,xmm9 + movdqa xmm9,xmm10 + + movdqa XMMWORD[(160-128)+rax],xmm0 + paddd xmm14,xmm0 + movd xmm2,DWORD[((-16))+r8] + psrld xmm9,27 + pxor xmm6,xmm7 + movdqa xmm7,xmm11 + + por xmm8,xmm9 + movd xmm9,DWORD[((-16))+r9] + pslld xmm7,30 + paddd xmm14,xmm6 + + psrld xmm11,2 + paddd xmm14,xmm8 +DB 102,15,56,0,205 + movd xmm8,DWORD[((-16))+r10] + por xmm11,xmm7 + movd xmm7,DWORD[((-16))+r11] + punpckldq xmm2,xmm8 + movdqa xmm8,xmm14 + paddd xmm13,xmm15 + punpckldq xmm9,xmm7 + movdqa xmm7,xmm10 + movdqa xmm6,xmm10 + pslld xmm8,5 + pandn xmm7,xmm12 + pand xmm6,xmm11 + punpckldq xmm2,xmm9 + movdqa xmm9,xmm14 + + movdqa XMMWORD[(176-128)+rax],xmm1 + paddd xmm13,xmm1 + movd xmm3,DWORD[((-12))+r8] + psrld xmm9,27 + pxor xmm6,xmm7 + movdqa xmm7,xmm10 + + por xmm8,xmm9 + movd xmm9,DWORD[((-12))+r9] + pslld xmm7,30 + paddd xmm13,xmm6 + + psrld xmm10,2 + paddd xmm13,xmm8 +DB 102,15,56,0,213 + movd xmm8,DWORD[((-12))+r10] + por xmm10,xmm7 + movd xmm7,DWORD[((-12))+r11] + punpckldq xmm3,xmm8 + movdqa xmm8,xmm13 + paddd xmm12,xmm15 + punpckldq xmm9,xmm7 + movdqa xmm7,xmm14 + movdqa xmm6,xmm14 + pslld xmm8,5 + pandn xmm7,xmm11 + pand xmm6,xmm10 + punpckldq xmm3,xmm9 + movdqa 
xmm9,xmm13 + + movdqa XMMWORD[(192-128)+rax],xmm2 + paddd xmm12,xmm2 + movd xmm4,DWORD[((-8))+r8] + psrld xmm9,27 + pxor xmm6,xmm7 + movdqa xmm7,xmm14 + + por xmm8,xmm9 + movd xmm9,DWORD[((-8))+r9] + pslld xmm7,30 + paddd xmm12,xmm6 + + psrld xmm14,2 + paddd xmm12,xmm8 +DB 102,15,56,0,221 + movd xmm8,DWORD[((-8))+r10] + por xmm14,xmm7 + movd xmm7,DWORD[((-8))+r11] + punpckldq xmm4,xmm8 + movdqa xmm8,xmm12 + paddd xmm11,xmm15 + punpckldq xmm9,xmm7 + movdqa xmm7,xmm13 + movdqa xmm6,xmm13 + pslld xmm8,5 + pandn xmm7,xmm10 + pand xmm6,xmm14 + punpckldq xmm4,xmm9 + movdqa xmm9,xmm12 + + movdqa XMMWORD[(208-128)+rax],xmm3 + paddd xmm11,xmm3 + movd xmm0,DWORD[((-4))+r8] + psrld xmm9,27 + pxor xmm6,xmm7 + movdqa xmm7,xmm13 + + por xmm8,xmm9 + movd xmm9,DWORD[((-4))+r9] + pslld xmm7,30 + paddd xmm11,xmm6 + + psrld xmm13,2 + paddd xmm11,xmm8 +DB 102,15,56,0,229 + movd xmm8,DWORD[((-4))+r10] + por xmm13,xmm7 + movdqa xmm1,XMMWORD[((0-128))+rax] + movd xmm7,DWORD[((-4))+r11] + punpckldq xmm0,xmm8 + movdqa xmm8,xmm11 + paddd xmm10,xmm15 + punpckldq xmm9,xmm7 + movdqa xmm7,xmm12 + movdqa xmm6,xmm12 + pslld xmm8,5 + prefetcht0 [63+r8] + pandn xmm7,xmm14 + pand xmm6,xmm13 + punpckldq xmm0,xmm9 + movdqa xmm9,xmm11 + + movdqa XMMWORD[(224-128)+rax],xmm4 + paddd xmm10,xmm4 + psrld xmm9,27 + pxor xmm6,xmm7 + movdqa xmm7,xmm12 + prefetcht0 [63+r9] + + por xmm8,xmm9 + pslld xmm7,30 + paddd xmm10,xmm6 + prefetcht0 [63+r10] + + psrld xmm12,2 + paddd xmm10,xmm8 +DB 102,15,56,0,197 + prefetcht0 [63+r11] + por xmm12,xmm7 + movdqa xmm2,XMMWORD[((16-128))+rax] + pxor xmm1,xmm3 + movdqa xmm3,XMMWORD[((32-128))+rax] + + movdqa xmm8,xmm10 + pxor xmm1,XMMWORD[((128-128))+rax] + paddd xmm14,xmm15 + movdqa xmm7,xmm11 + pslld xmm8,5 + pxor xmm1,xmm3 + movdqa xmm6,xmm11 + pandn xmm7,xmm13 + movdqa xmm5,xmm1 + pand xmm6,xmm12 + movdqa xmm9,xmm10 + psrld xmm5,31 + paddd xmm1,xmm1 + + movdqa XMMWORD[(240-128)+rax],xmm0 + paddd xmm14,xmm0 + psrld xmm9,27 + pxor xmm6,xmm7 + + movdqa xmm7,xmm11 + por xmm8,xmm9 + pslld xmm7,30 + paddd xmm14,xmm6 + + psrld xmm11,2 + paddd xmm14,xmm8 + por xmm1,xmm5 + por xmm11,xmm7 + pxor xmm2,xmm4 + movdqa xmm4,XMMWORD[((48-128))+rax] + + movdqa xmm8,xmm14 + pxor xmm2,XMMWORD[((144-128))+rax] + paddd xmm13,xmm15 + movdqa xmm7,xmm10 + pslld xmm8,5 + pxor xmm2,xmm4 + movdqa xmm6,xmm10 + pandn xmm7,xmm12 + movdqa xmm5,xmm2 + pand xmm6,xmm11 + movdqa xmm9,xmm14 + psrld xmm5,31 + paddd xmm2,xmm2 + + movdqa XMMWORD[(0-128)+rax],xmm1 + paddd xmm13,xmm1 + psrld xmm9,27 + pxor xmm6,xmm7 + + movdqa xmm7,xmm10 + por xmm8,xmm9 + pslld xmm7,30 + paddd xmm13,xmm6 + + psrld xmm10,2 + paddd xmm13,xmm8 + por xmm2,xmm5 + por xmm10,xmm7 + pxor xmm3,xmm0 + movdqa xmm0,XMMWORD[((64-128))+rax] + + movdqa xmm8,xmm13 + pxor xmm3,XMMWORD[((160-128))+rax] + paddd xmm12,xmm15 + movdqa xmm7,xmm14 + pslld xmm8,5 + pxor xmm3,xmm0 + movdqa xmm6,xmm14 + pandn xmm7,xmm11 + movdqa xmm5,xmm3 + pand xmm6,xmm10 + movdqa xmm9,xmm13 + psrld xmm5,31 + paddd xmm3,xmm3 + + movdqa XMMWORD[(16-128)+rax],xmm2 + paddd xmm12,xmm2 + psrld xmm9,27 + pxor xmm6,xmm7 + + movdqa xmm7,xmm14 + por xmm8,xmm9 + pslld xmm7,30 + paddd xmm12,xmm6 + + psrld xmm14,2 + paddd xmm12,xmm8 + por xmm3,xmm5 + por xmm14,xmm7 + pxor xmm4,xmm1 + movdqa xmm1,XMMWORD[((80-128))+rax] + + movdqa xmm8,xmm12 + pxor xmm4,XMMWORD[((176-128))+rax] + paddd xmm11,xmm15 + movdqa xmm7,xmm13 + pslld xmm8,5 + pxor xmm4,xmm1 + movdqa xmm6,xmm13 + pandn xmm7,xmm10 + movdqa xmm5,xmm4 + pand xmm6,xmm14 + movdqa xmm9,xmm12 + psrld xmm5,31 + paddd xmm4,xmm4 + + movdqa 
XMMWORD[(32-128)+rax],xmm3 + paddd xmm11,xmm3 + psrld xmm9,27 + pxor xmm6,xmm7 + + movdqa xmm7,xmm13 + por xmm8,xmm9 + pslld xmm7,30 + paddd xmm11,xmm6 + + psrld xmm13,2 + paddd xmm11,xmm8 + por xmm4,xmm5 + por xmm13,xmm7 + pxor xmm0,xmm2 + movdqa xmm2,XMMWORD[((96-128))+rax] + + movdqa xmm8,xmm11 + pxor xmm0,XMMWORD[((192-128))+rax] + paddd xmm10,xmm15 + movdqa xmm7,xmm12 + pslld xmm8,5 + pxor xmm0,xmm2 + movdqa xmm6,xmm12 + pandn xmm7,xmm14 + movdqa xmm5,xmm0 + pand xmm6,xmm13 + movdqa xmm9,xmm11 + psrld xmm5,31 + paddd xmm0,xmm0 + + movdqa XMMWORD[(48-128)+rax],xmm4 + paddd xmm10,xmm4 + psrld xmm9,27 + pxor xmm6,xmm7 + + movdqa xmm7,xmm12 + por xmm8,xmm9 + pslld xmm7,30 + paddd xmm10,xmm6 + + psrld xmm12,2 + paddd xmm10,xmm8 + por xmm0,xmm5 + por xmm12,xmm7 + movdqa xmm15,XMMWORD[rbp] + pxor xmm1,xmm3 + movdqa xmm3,XMMWORD[((112-128))+rax] + + movdqa xmm8,xmm10 + movdqa xmm6,xmm13 + pxor xmm1,XMMWORD[((208-128))+rax] + paddd xmm14,xmm15 + pslld xmm8,5 + pxor xmm6,xmm11 + + movdqa xmm9,xmm10 + movdqa XMMWORD[(64-128)+rax],xmm0 + paddd xmm14,xmm0 + pxor xmm1,xmm3 + psrld xmm9,27 + pxor xmm6,xmm12 + movdqa xmm7,xmm11 + + pslld xmm7,30 + movdqa xmm5,xmm1 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm14,xmm6 + paddd xmm1,xmm1 + + psrld xmm11,2 + paddd xmm14,xmm8 + por xmm1,xmm5 + por xmm11,xmm7 + pxor xmm2,xmm4 + movdqa xmm4,XMMWORD[((128-128))+rax] + + movdqa xmm8,xmm14 + movdqa xmm6,xmm12 + pxor xmm2,XMMWORD[((224-128))+rax] + paddd xmm13,xmm15 + pslld xmm8,5 + pxor xmm6,xmm10 + + movdqa xmm9,xmm14 + movdqa XMMWORD[(80-128)+rax],xmm1 + paddd xmm13,xmm1 + pxor xmm2,xmm4 + psrld xmm9,27 + pxor xmm6,xmm11 + movdqa xmm7,xmm10 + + pslld xmm7,30 + movdqa xmm5,xmm2 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm13,xmm6 + paddd xmm2,xmm2 + + psrld xmm10,2 + paddd xmm13,xmm8 + por xmm2,xmm5 + por xmm10,xmm7 + pxor xmm3,xmm0 + movdqa xmm0,XMMWORD[((144-128))+rax] + + movdqa xmm8,xmm13 + movdqa xmm6,xmm11 + pxor xmm3,XMMWORD[((240-128))+rax] + paddd xmm12,xmm15 + pslld xmm8,5 + pxor xmm6,xmm14 + + movdqa xmm9,xmm13 + movdqa XMMWORD[(96-128)+rax],xmm2 + paddd xmm12,xmm2 + pxor xmm3,xmm0 + psrld xmm9,27 + pxor xmm6,xmm10 + movdqa xmm7,xmm14 + + pslld xmm7,30 + movdqa xmm5,xmm3 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm12,xmm6 + paddd xmm3,xmm3 + + psrld xmm14,2 + paddd xmm12,xmm8 + por xmm3,xmm5 + por xmm14,xmm7 + pxor xmm4,xmm1 + movdqa xmm1,XMMWORD[((160-128))+rax] + + movdqa xmm8,xmm12 + movdqa xmm6,xmm10 + pxor xmm4,XMMWORD[((0-128))+rax] + paddd xmm11,xmm15 + pslld xmm8,5 + pxor xmm6,xmm13 + + movdqa xmm9,xmm12 + movdqa XMMWORD[(112-128)+rax],xmm3 + paddd xmm11,xmm3 + pxor xmm4,xmm1 + psrld xmm9,27 + pxor xmm6,xmm14 + movdqa xmm7,xmm13 + + pslld xmm7,30 + movdqa xmm5,xmm4 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm11,xmm6 + paddd xmm4,xmm4 + + psrld xmm13,2 + paddd xmm11,xmm8 + por xmm4,xmm5 + por xmm13,xmm7 + pxor xmm0,xmm2 + movdqa xmm2,XMMWORD[((176-128))+rax] + + movdqa xmm8,xmm11 + movdqa xmm6,xmm14 + pxor xmm0,XMMWORD[((16-128))+rax] + paddd xmm10,xmm15 + pslld xmm8,5 + pxor xmm6,xmm12 + + movdqa xmm9,xmm11 + movdqa XMMWORD[(128-128)+rax],xmm4 + paddd xmm10,xmm4 + pxor xmm0,xmm2 + psrld xmm9,27 + pxor xmm6,xmm13 + movdqa xmm7,xmm12 + + pslld xmm7,30 + movdqa xmm5,xmm0 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm10,xmm6 + paddd xmm0,xmm0 + + psrld xmm12,2 + paddd xmm10,xmm8 + por xmm0,xmm5 + por xmm12,xmm7 + pxor xmm1,xmm3 + movdqa xmm3,XMMWORD[((192-128))+rax] + + movdqa xmm8,xmm10 + movdqa xmm6,xmm13 + pxor xmm1,XMMWORD[((32-128))+rax] + paddd xmm14,xmm15 + pslld xmm8,5 + pxor xmm6,xmm11 + + 
movdqa xmm9,xmm10 + movdqa XMMWORD[(144-128)+rax],xmm0 + paddd xmm14,xmm0 + pxor xmm1,xmm3 + psrld xmm9,27 + pxor xmm6,xmm12 + movdqa xmm7,xmm11 + + pslld xmm7,30 + movdqa xmm5,xmm1 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm14,xmm6 + paddd xmm1,xmm1 + + psrld xmm11,2 + paddd xmm14,xmm8 + por xmm1,xmm5 + por xmm11,xmm7 + pxor xmm2,xmm4 + movdqa xmm4,XMMWORD[((208-128))+rax] + + movdqa xmm8,xmm14 + movdqa xmm6,xmm12 + pxor xmm2,XMMWORD[((48-128))+rax] + paddd xmm13,xmm15 + pslld xmm8,5 + pxor xmm6,xmm10 + + movdqa xmm9,xmm14 + movdqa XMMWORD[(160-128)+rax],xmm1 + paddd xmm13,xmm1 + pxor xmm2,xmm4 + psrld xmm9,27 + pxor xmm6,xmm11 + movdqa xmm7,xmm10 + + pslld xmm7,30 + movdqa xmm5,xmm2 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm13,xmm6 + paddd xmm2,xmm2 + + psrld xmm10,2 + paddd xmm13,xmm8 + por xmm2,xmm5 + por xmm10,xmm7 + pxor xmm3,xmm0 + movdqa xmm0,XMMWORD[((224-128))+rax] + + movdqa xmm8,xmm13 + movdqa xmm6,xmm11 + pxor xmm3,XMMWORD[((64-128))+rax] + paddd xmm12,xmm15 + pslld xmm8,5 + pxor xmm6,xmm14 + + movdqa xmm9,xmm13 + movdqa XMMWORD[(176-128)+rax],xmm2 + paddd xmm12,xmm2 + pxor xmm3,xmm0 + psrld xmm9,27 + pxor xmm6,xmm10 + movdqa xmm7,xmm14 + + pslld xmm7,30 + movdqa xmm5,xmm3 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm12,xmm6 + paddd xmm3,xmm3 + + psrld xmm14,2 + paddd xmm12,xmm8 + por xmm3,xmm5 + por xmm14,xmm7 + pxor xmm4,xmm1 + movdqa xmm1,XMMWORD[((240-128))+rax] + + movdqa xmm8,xmm12 + movdqa xmm6,xmm10 + pxor xmm4,XMMWORD[((80-128))+rax] + paddd xmm11,xmm15 + pslld xmm8,5 + pxor xmm6,xmm13 + + movdqa xmm9,xmm12 + movdqa XMMWORD[(192-128)+rax],xmm3 + paddd xmm11,xmm3 + pxor xmm4,xmm1 + psrld xmm9,27 + pxor xmm6,xmm14 + movdqa xmm7,xmm13 + + pslld xmm7,30 + movdqa xmm5,xmm4 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm11,xmm6 + paddd xmm4,xmm4 + + psrld xmm13,2 + paddd xmm11,xmm8 + por xmm4,xmm5 + por xmm13,xmm7 + pxor xmm0,xmm2 + movdqa xmm2,XMMWORD[((0-128))+rax] + + movdqa xmm8,xmm11 + movdqa xmm6,xmm14 + pxor xmm0,XMMWORD[((96-128))+rax] + paddd xmm10,xmm15 + pslld xmm8,5 + pxor xmm6,xmm12 + + movdqa xmm9,xmm11 + movdqa XMMWORD[(208-128)+rax],xmm4 + paddd xmm10,xmm4 + pxor xmm0,xmm2 + psrld xmm9,27 + pxor xmm6,xmm13 + movdqa xmm7,xmm12 + + pslld xmm7,30 + movdqa xmm5,xmm0 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm10,xmm6 + paddd xmm0,xmm0 + + psrld xmm12,2 + paddd xmm10,xmm8 + por xmm0,xmm5 + por xmm12,xmm7 + pxor xmm1,xmm3 + movdqa xmm3,XMMWORD[((16-128))+rax] + + movdqa xmm8,xmm10 + movdqa xmm6,xmm13 + pxor xmm1,XMMWORD[((112-128))+rax] + paddd xmm14,xmm15 + pslld xmm8,5 + pxor xmm6,xmm11 + + movdqa xmm9,xmm10 + movdqa XMMWORD[(224-128)+rax],xmm0 + paddd xmm14,xmm0 + pxor xmm1,xmm3 + psrld xmm9,27 + pxor xmm6,xmm12 + movdqa xmm7,xmm11 + + pslld xmm7,30 + movdqa xmm5,xmm1 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm14,xmm6 + paddd xmm1,xmm1 + + psrld xmm11,2 + paddd xmm14,xmm8 + por xmm1,xmm5 + por xmm11,xmm7 + pxor xmm2,xmm4 + movdqa xmm4,XMMWORD[((32-128))+rax] + + movdqa xmm8,xmm14 + movdqa xmm6,xmm12 + pxor xmm2,XMMWORD[((128-128))+rax] + paddd xmm13,xmm15 + pslld xmm8,5 + pxor xmm6,xmm10 + + movdqa xmm9,xmm14 + movdqa XMMWORD[(240-128)+rax],xmm1 + paddd xmm13,xmm1 + pxor xmm2,xmm4 + psrld xmm9,27 + pxor xmm6,xmm11 + movdqa xmm7,xmm10 + + pslld xmm7,30 + movdqa xmm5,xmm2 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm13,xmm6 + paddd xmm2,xmm2 + + psrld xmm10,2 + paddd xmm13,xmm8 + por xmm2,xmm5 + por xmm10,xmm7 + pxor xmm3,xmm0 + movdqa xmm0,XMMWORD[((48-128))+rax] + + movdqa xmm8,xmm13 + movdqa xmm6,xmm11 + pxor xmm3,XMMWORD[((144-128))+rax] + paddd xmm12,xmm15 + pslld 
xmm8,5 + pxor xmm6,xmm14 + + movdqa xmm9,xmm13 + movdqa XMMWORD[(0-128)+rax],xmm2 + paddd xmm12,xmm2 + pxor xmm3,xmm0 + psrld xmm9,27 + pxor xmm6,xmm10 + movdqa xmm7,xmm14 + + pslld xmm7,30 + movdqa xmm5,xmm3 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm12,xmm6 + paddd xmm3,xmm3 + + psrld xmm14,2 + paddd xmm12,xmm8 + por xmm3,xmm5 + por xmm14,xmm7 + pxor xmm4,xmm1 + movdqa xmm1,XMMWORD[((64-128))+rax] + + movdqa xmm8,xmm12 + movdqa xmm6,xmm10 + pxor xmm4,XMMWORD[((160-128))+rax] + paddd xmm11,xmm15 + pslld xmm8,5 + pxor xmm6,xmm13 + + movdqa xmm9,xmm12 + movdqa XMMWORD[(16-128)+rax],xmm3 + paddd xmm11,xmm3 + pxor xmm4,xmm1 + psrld xmm9,27 + pxor xmm6,xmm14 + movdqa xmm7,xmm13 + + pslld xmm7,30 + movdqa xmm5,xmm4 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm11,xmm6 + paddd xmm4,xmm4 + + psrld xmm13,2 + paddd xmm11,xmm8 + por xmm4,xmm5 + por xmm13,xmm7 + pxor xmm0,xmm2 + movdqa xmm2,XMMWORD[((80-128))+rax] + + movdqa xmm8,xmm11 + movdqa xmm6,xmm14 + pxor xmm0,XMMWORD[((176-128))+rax] + paddd xmm10,xmm15 + pslld xmm8,5 + pxor xmm6,xmm12 + + movdqa xmm9,xmm11 + movdqa XMMWORD[(32-128)+rax],xmm4 + paddd xmm10,xmm4 + pxor xmm0,xmm2 + psrld xmm9,27 + pxor xmm6,xmm13 + movdqa xmm7,xmm12 + + pslld xmm7,30 + movdqa xmm5,xmm0 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm10,xmm6 + paddd xmm0,xmm0 + + psrld xmm12,2 + paddd xmm10,xmm8 + por xmm0,xmm5 + por xmm12,xmm7 + pxor xmm1,xmm3 + movdqa xmm3,XMMWORD[((96-128))+rax] + + movdqa xmm8,xmm10 + movdqa xmm6,xmm13 + pxor xmm1,XMMWORD[((192-128))+rax] + paddd xmm14,xmm15 + pslld xmm8,5 + pxor xmm6,xmm11 + + movdqa xmm9,xmm10 + movdqa XMMWORD[(48-128)+rax],xmm0 + paddd xmm14,xmm0 + pxor xmm1,xmm3 + psrld xmm9,27 + pxor xmm6,xmm12 + movdqa xmm7,xmm11 + + pslld xmm7,30 + movdqa xmm5,xmm1 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm14,xmm6 + paddd xmm1,xmm1 + + psrld xmm11,2 + paddd xmm14,xmm8 + por xmm1,xmm5 + por xmm11,xmm7 + pxor xmm2,xmm4 + movdqa xmm4,XMMWORD[((112-128))+rax] + + movdqa xmm8,xmm14 + movdqa xmm6,xmm12 + pxor xmm2,XMMWORD[((208-128))+rax] + paddd xmm13,xmm15 + pslld xmm8,5 + pxor xmm6,xmm10 + + movdqa xmm9,xmm14 + movdqa XMMWORD[(64-128)+rax],xmm1 + paddd xmm13,xmm1 + pxor xmm2,xmm4 + psrld xmm9,27 + pxor xmm6,xmm11 + movdqa xmm7,xmm10 + + pslld xmm7,30 + movdqa xmm5,xmm2 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm13,xmm6 + paddd xmm2,xmm2 + + psrld xmm10,2 + paddd xmm13,xmm8 + por xmm2,xmm5 + por xmm10,xmm7 + pxor xmm3,xmm0 + movdqa xmm0,XMMWORD[((128-128))+rax] + + movdqa xmm8,xmm13 + movdqa xmm6,xmm11 + pxor xmm3,XMMWORD[((224-128))+rax] + paddd xmm12,xmm15 + pslld xmm8,5 + pxor xmm6,xmm14 + + movdqa xmm9,xmm13 + movdqa XMMWORD[(80-128)+rax],xmm2 + paddd xmm12,xmm2 + pxor xmm3,xmm0 + psrld xmm9,27 + pxor xmm6,xmm10 + movdqa xmm7,xmm14 + + pslld xmm7,30 + movdqa xmm5,xmm3 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm12,xmm6 + paddd xmm3,xmm3 + + psrld xmm14,2 + paddd xmm12,xmm8 + por xmm3,xmm5 + por xmm14,xmm7 + pxor xmm4,xmm1 + movdqa xmm1,XMMWORD[((144-128))+rax] + + movdqa xmm8,xmm12 + movdqa xmm6,xmm10 + pxor xmm4,XMMWORD[((240-128))+rax] + paddd xmm11,xmm15 + pslld xmm8,5 + pxor xmm6,xmm13 + + movdqa xmm9,xmm12 + movdqa XMMWORD[(96-128)+rax],xmm3 + paddd xmm11,xmm3 + pxor xmm4,xmm1 + psrld xmm9,27 + pxor xmm6,xmm14 + movdqa xmm7,xmm13 + + pslld xmm7,30 + movdqa xmm5,xmm4 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm11,xmm6 + paddd xmm4,xmm4 + + psrld xmm13,2 + paddd xmm11,xmm8 + por xmm4,xmm5 + por xmm13,xmm7 + pxor xmm0,xmm2 + movdqa xmm2,XMMWORD[((160-128))+rax] + + movdqa xmm8,xmm11 + movdqa xmm6,xmm14 + pxor xmm0,XMMWORD[((0-128))+rax] + 
paddd xmm10,xmm15 + pslld xmm8,5 + pxor xmm6,xmm12 + + movdqa xmm9,xmm11 + movdqa XMMWORD[(112-128)+rax],xmm4 + paddd xmm10,xmm4 + pxor xmm0,xmm2 + psrld xmm9,27 + pxor xmm6,xmm13 + movdqa xmm7,xmm12 + + pslld xmm7,30 + movdqa xmm5,xmm0 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm10,xmm6 + paddd xmm0,xmm0 + + psrld xmm12,2 + paddd xmm10,xmm8 + por xmm0,xmm5 + por xmm12,xmm7 + movdqa xmm15,XMMWORD[32+rbp] + pxor xmm1,xmm3 + movdqa xmm3,XMMWORD[((176-128))+rax] + + movdqa xmm8,xmm10 + movdqa xmm7,xmm13 + pxor xmm1,XMMWORD[((16-128))+rax] + pxor xmm1,xmm3 + paddd xmm14,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm10 + pand xmm7,xmm12 + + movdqa xmm6,xmm13 + movdqa xmm5,xmm1 + psrld xmm9,27 + paddd xmm14,xmm7 + pxor xmm6,xmm12 + + movdqa XMMWORD[(128-128)+rax],xmm0 + paddd xmm14,xmm0 + por xmm8,xmm9 + psrld xmm5,31 + pand xmm6,xmm11 + movdqa xmm7,xmm11 + + pslld xmm7,30 + paddd xmm1,xmm1 + paddd xmm14,xmm6 + + psrld xmm11,2 + paddd xmm14,xmm8 + por xmm1,xmm5 + por xmm11,xmm7 + pxor xmm2,xmm4 + movdqa xmm4,XMMWORD[((192-128))+rax] + + movdqa xmm8,xmm14 + movdqa xmm7,xmm12 + pxor xmm2,XMMWORD[((32-128))+rax] + pxor xmm2,xmm4 + paddd xmm13,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm14 + pand xmm7,xmm11 + + movdqa xmm6,xmm12 + movdqa xmm5,xmm2 + psrld xmm9,27 + paddd xmm13,xmm7 + pxor xmm6,xmm11 + + movdqa XMMWORD[(144-128)+rax],xmm1 + paddd xmm13,xmm1 + por xmm8,xmm9 + psrld xmm5,31 + pand xmm6,xmm10 + movdqa xmm7,xmm10 + + pslld xmm7,30 + paddd xmm2,xmm2 + paddd xmm13,xmm6 + + psrld xmm10,2 + paddd xmm13,xmm8 + por xmm2,xmm5 + por xmm10,xmm7 + pxor xmm3,xmm0 + movdqa xmm0,XMMWORD[((208-128))+rax] + + movdqa xmm8,xmm13 + movdqa xmm7,xmm11 + pxor xmm3,XMMWORD[((48-128))+rax] + pxor xmm3,xmm0 + paddd xmm12,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm13 + pand xmm7,xmm10 + + movdqa xmm6,xmm11 + movdqa xmm5,xmm3 + psrld xmm9,27 + paddd xmm12,xmm7 + pxor xmm6,xmm10 + + movdqa XMMWORD[(160-128)+rax],xmm2 + paddd xmm12,xmm2 + por xmm8,xmm9 + psrld xmm5,31 + pand xmm6,xmm14 + movdqa xmm7,xmm14 + + pslld xmm7,30 + paddd xmm3,xmm3 + paddd xmm12,xmm6 + + psrld xmm14,2 + paddd xmm12,xmm8 + por xmm3,xmm5 + por xmm14,xmm7 + pxor xmm4,xmm1 + movdqa xmm1,XMMWORD[((224-128))+rax] + + movdqa xmm8,xmm12 + movdqa xmm7,xmm10 + pxor xmm4,XMMWORD[((64-128))+rax] + pxor xmm4,xmm1 + paddd xmm11,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm12 + pand xmm7,xmm14 + + movdqa xmm6,xmm10 + movdqa xmm5,xmm4 + psrld xmm9,27 + paddd xmm11,xmm7 + pxor xmm6,xmm14 + + movdqa XMMWORD[(176-128)+rax],xmm3 + paddd xmm11,xmm3 + por xmm8,xmm9 + psrld xmm5,31 + pand xmm6,xmm13 + movdqa xmm7,xmm13 + + pslld xmm7,30 + paddd xmm4,xmm4 + paddd xmm11,xmm6 + + psrld xmm13,2 + paddd xmm11,xmm8 + por xmm4,xmm5 + por xmm13,xmm7 + pxor xmm0,xmm2 + movdqa xmm2,XMMWORD[((240-128))+rax] + + movdqa xmm8,xmm11 + movdqa xmm7,xmm14 + pxor xmm0,XMMWORD[((80-128))+rax] + pxor xmm0,xmm2 + paddd xmm10,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm11 + pand xmm7,xmm13 + + movdqa xmm6,xmm14 + movdqa xmm5,xmm0 + psrld xmm9,27 + paddd xmm10,xmm7 + pxor xmm6,xmm13 + + movdqa XMMWORD[(192-128)+rax],xmm4 + paddd xmm10,xmm4 + por xmm8,xmm9 + psrld xmm5,31 + pand xmm6,xmm12 + movdqa xmm7,xmm12 + + pslld xmm7,30 + paddd xmm0,xmm0 + paddd xmm10,xmm6 + + psrld xmm12,2 + paddd xmm10,xmm8 + por xmm0,xmm5 + por xmm12,xmm7 + pxor xmm1,xmm3 + movdqa xmm3,XMMWORD[((0-128))+rax] + + movdqa xmm8,xmm10 + movdqa xmm7,xmm13 + pxor xmm1,XMMWORD[((96-128))+rax] + pxor xmm1,xmm3 + paddd xmm14,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm10 + pand xmm7,xmm12 + + movdqa xmm6,xmm13 + movdqa xmm5,xmm1 + psrld xmm9,27 + paddd 
xmm14,xmm7 + pxor xmm6,xmm12 + + movdqa XMMWORD[(208-128)+rax],xmm0 + paddd xmm14,xmm0 + por xmm8,xmm9 + psrld xmm5,31 + pand xmm6,xmm11 + movdqa xmm7,xmm11 + + pslld xmm7,30 + paddd xmm1,xmm1 + paddd xmm14,xmm6 + + psrld xmm11,2 + paddd xmm14,xmm8 + por xmm1,xmm5 + por xmm11,xmm7 + pxor xmm2,xmm4 + movdqa xmm4,XMMWORD[((16-128))+rax] + + movdqa xmm8,xmm14 + movdqa xmm7,xmm12 + pxor xmm2,XMMWORD[((112-128))+rax] + pxor xmm2,xmm4 + paddd xmm13,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm14 + pand xmm7,xmm11 + + movdqa xmm6,xmm12 + movdqa xmm5,xmm2 + psrld xmm9,27 + paddd xmm13,xmm7 + pxor xmm6,xmm11 + + movdqa XMMWORD[(224-128)+rax],xmm1 + paddd xmm13,xmm1 + por xmm8,xmm9 + psrld xmm5,31 + pand xmm6,xmm10 + movdqa xmm7,xmm10 + + pslld xmm7,30 + paddd xmm2,xmm2 + paddd xmm13,xmm6 + + psrld xmm10,2 + paddd xmm13,xmm8 + por xmm2,xmm5 + por xmm10,xmm7 + pxor xmm3,xmm0 + movdqa xmm0,XMMWORD[((32-128))+rax] + + movdqa xmm8,xmm13 + movdqa xmm7,xmm11 + pxor xmm3,XMMWORD[((128-128))+rax] + pxor xmm3,xmm0 + paddd xmm12,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm13 + pand xmm7,xmm10 + + movdqa xmm6,xmm11 + movdqa xmm5,xmm3 + psrld xmm9,27 + paddd xmm12,xmm7 + pxor xmm6,xmm10 + + movdqa XMMWORD[(240-128)+rax],xmm2 + paddd xmm12,xmm2 + por xmm8,xmm9 + psrld xmm5,31 + pand xmm6,xmm14 + movdqa xmm7,xmm14 + + pslld xmm7,30 + paddd xmm3,xmm3 + paddd xmm12,xmm6 + + psrld xmm14,2 + paddd xmm12,xmm8 + por xmm3,xmm5 + por xmm14,xmm7 + pxor xmm4,xmm1 + movdqa xmm1,XMMWORD[((48-128))+rax] + + movdqa xmm8,xmm12 + movdqa xmm7,xmm10 + pxor xmm4,XMMWORD[((144-128))+rax] + pxor xmm4,xmm1 + paddd xmm11,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm12 + pand xmm7,xmm14 + + movdqa xmm6,xmm10 + movdqa xmm5,xmm4 + psrld xmm9,27 + paddd xmm11,xmm7 + pxor xmm6,xmm14 + + movdqa XMMWORD[(0-128)+rax],xmm3 + paddd xmm11,xmm3 + por xmm8,xmm9 + psrld xmm5,31 + pand xmm6,xmm13 + movdqa xmm7,xmm13 + + pslld xmm7,30 + paddd xmm4,xmm4 + paddd xmm11,xmm6 + + psrld xmm13,2 + paddd xmm11,xmm8 + por xmm4,xmm5 + por xmm13,xmm7 + pxor xmm0,xmm2 + movdqa xmm2,XMMWORD[((64-128))+rax] + + movdqa xmm8,xmm11 + movdqa xmm7,xmm14 + pxor xmm0,XMMWORD[((160-128))+rax] + pxor xmm0,xmm2 + paddd xmm10,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm11 + pand xmm7,xmm13 + + movdqa xmm6,xmm14 + movdqa xmm5,xmm0 + psrld xmm9,27 + paddd xmm10,xmm7 + pxor xmm6,xmm13 + + movdqa XMMWORD[(16-128)+rax],xmm4 + paddd xmm10,xmm4 + por xmm8,xmm9 + psrld xmm5,31 + pand xmm6,xmm12 + movdqa xmm7,xmm12 + + pslld xmm7,30 + paddd xmm0,xmm0 + paddd xmm10,xmm6 + + psrld xmm12,2 + paddd xmm10,xmm8 + por xmm0,xmm5 + por xmm12,xmm7 + pxor xmm1,xmm3 + movdqa xmm3,XMMWORD[((80-128))+rax] + + movdqa xmm8,xmm10 + movdqa xmm7,xmm13 + pxor xmm1,XMMWORD[((176-128))+rax] + pxor xmm1,xmm3 + paddd xmm14,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm10 + pand xmm7,xmm12 + + movdqa xmm6,xmm13 + movdqa xmm5,xmm1 + psrld xmm9,27 + paddd xmm14,xmm7 + pxor xmm6,xmm12 + + movdqa XMMWORD[(32-128)+rax],xmm0 + paddd xmm14,xmm0 + por xmm8,xmm9 + psrld xmm5,31 + pand xmm6,xmm11 + movdqa xmm7,xmm11 + + pslld xmm7,30 + paddd xmm1,xmm1 + paddd xmm14,xmm6 + + psrld xmm11,2 + paddd xmm14,xmm8 + por xmm1,xmm5 + por xmm11,xmm7 + pxor xmm2,xmm4 + movdqa xmm4,XMMWORD[((96-128))+rax] + + movdqa xmm8,xmm14 + movdqa xmm7,xmm12 + pxor xmm2,XMMWORD[((192-128))+rax] + pxor xmm2,xmm4 + paddd xmm13,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm14 + pand xmm7,xmm11 + + movdqa xmm6,xmm12 + movdqa xmm5,xmm2 + psrld xmm9,27 + paddd xmm13,xmm7 + pxor xmm6,xmm11 + + movdqa XMMWORD[(48-128)+rax],xmm1 + paddd xmm13,xmm1 + por xmm8,xmm9 + psrld xmm5,31 + pand 
xmm6,xmm10 + movdqa xmm7,xmm10 + + pslld xmm7,30 + paddd xmm2,xmm2 + paddd xmm13,xmm6 + + psrld xmm10,2 + paddd xmm13,xmm8 + por xmm2,xmm5 + por xmm10,xmm7 + pxor xmm3,xmm0 + movdqa xmm0,XMMWORD[((112-128))+rax] + + movdqa xmm8,xmm13 + movdqa xmm7,xmm11 + pxor xmm3,XMMWORD[((208-128))+rax] + pxor xmm3,xmm0 + paddd xmm12,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm13 + pand xmm7,xmm10 + + movdqa xmm6,xmm11 + movdqa xmm5,xmm3 + psrld xmm9,27 + paddd xmm12,xmm7 + pxor xmm6,xmm10 + + movdqa XMMWORD[(64-128)+rax],xmm2 + paddd xmm12,xmm2 + por xmm8,xmm9 + psrld xmm5,31 + pand xmm6,xmm14 + movdqa xmm7,xmm14 + + pslld xmm7,30 + paddd xmm3,xmm3 + paddd xmm12,xmm6 + + psrld xmm14,2 + paddd xmm12,xmm8 + por xmm3,xmm5 + por xmm14,xmm7 + pxor xmm4,xmm1 + movdqa xmm1,XMMWORD[((128-128))+rax] + + movdqa xmm8,xmm12 + movdqa xmm7,xmm10 + pxor xmm4,XMMWORD[((224-128))+rax] + pxor xmm4,xmm1 + paddd xmm11,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm12 + pand xmm7,xmm14 + + movdqa xmm6,xmm10 + movdqa xmm5,xmm4 + psrld xmm9,27 + paddd xmm11,xmm7 + pxor xmm6,xmm14 + + movdqa XMMWORD[(80-128)+rax],xmm3 + paddd xmm11,xmm3 + por xmm8,xmm9 + psrld xmm5,31 + pand xmm6,xmm13 + movdqa xmm7,xmm13 + + pslld xmm7,30 + paddd xmm4,xmm4 + paddd xmm11,xmm6 + + psrld xmm13,2 + paddd xmm11,xmm8 + por xmm4,xmm5 + por xmm13,xmm7 + pxor xmm0,xmm2 + movdqa xmm2,XMMWORD[((144-128))+rax] + + movdqa xmm8,xmm11 + movdqa xmm7,xmm14 + pxor xmm0,XMMWORD[((240-128))+rax] + pxor xmm0,xmm2 + paddd xmm10,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm11 + pand xmm7,xmm13 + + movdqa xmm6,xmm14 + movdqa xmm5,xmm0 + psrld xmm9,27 + paddd xmm10,xmm7 + pxor xmm6,xmm13 + + movdqa XMMWORD[(96-128)+rax],xmm4 + paddd xmm10,xmm4 + por xmm8,xmm9 + psrld xmm5,31 + pand xmm6,xmm12 + movdqa xmm7,xmm12 + + pslld xmm7,30 + paddd xmm0,xmm0 + paddd xmm10,xmm6 + + psrld xmm12,2 + paddd xmm10,xmm8 + por xmm0,xmm5 + por xmm12,xmm7 + pxor xmm1,xmm3 + movdqa xmm3,XMMWORD[((160-128))+rax] + + movdqa xmm8,xmm10 + movdqa xmm7,xmm13 + pxor xmm1,XMMWORD[((0-128))+rax] + pxor xmm1,xmm3 + paddd xmm14,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm10 + pand xmm7,xmm12 + + movdqa xmm6,xmm13 + movdqa xmm5,xmm1 + psrld xmm9,27 + paddd xmm14,xmm7 + pxor xmm6,xmm12 + + movdqa XMMWORD[(112-128)+rax],xmm0 + paddd xmm14,xmm0 + por xmm8,xmm9 + psrld xmm5,31 + pand xmm6,xmm11 + movdqa xmm7,xmm11 + + pslld xmm7,30 + paddd xmm1,xmm1 + paddd xmm14,xmm6 + + psrld xmm11,2 + paddd xmm14,xmm8 + por xmm1,xmm5 + por xmm11,xmm7 + pxor xmm2,xmm4 + movdqa xmm4,XMMWORD[((176-128))+rax] + + movdqa xmm8,xmm14 + movdqa xmm7,xmm12 + pxor xmm2,XMMWORD[((16-128))+rax] + pxor xmm2,xmm4 + paddd xmm13,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm14 + pand xmm7,xmm11 + + movdqa xmm6,xmm12 + movdqa xmm5,xmm2 + psrld xmm9,27 + paddd xmm13,xmm7 + pxor xmm6,xmm11 + + movdqa XMMWORD[(128-128)+rax],xmm1 + paddd xmm13,xmm1 + por xmm8,xmm9 + psrld xmm5,31 + pand xmm6,xmm10 + movdqa xmm7,xmm10 + + pslld xmm7,30 + paddd xmm2,xmm2 + paddd xmm13,xmm6 + + psrld xmm10,2 + paddd xmm13,xmm8 + por xmm2,xmm5 + por xmm10,xmm7 + pxor xmm3,xmm0 + movdqa xmm0,XMMWORD[((192-128))+rax] + + movdqa xmm8,xmm13 + movdqa xmm7,xmm11 + pxor xmm3,XMMWORD[((32-128))+rax] + pxor xmm3,xmm0 + paddd xmm12,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm13 + pand xmm7,xmm10 + + movdqa xmm6,xmm11 + movdqa xmm5,xmm3 + psrld xmm9,27 + paddd xmm12,xmm7 + pxor xmm6,xmm10 + + movdqa XMMWORD[(144-128)+rax],xmm2 + paddd xmm12,xmm2 + por xmm8,xmm9 + psrld xmm5,31 + pand xmm6,xmm14 + movdqa xmm7,xmm14 + + pslld xmm7,30 + paddd xmm3,xmm3 + paddd xmm12,xmm6 + + psrld xmm14,2 + paddd xmm12,xmm8 + por 
xmm3,xmm5 + por xmm14,xmm7 + pxor xmm4,xmm1 + movdqa xmm1,XMMWORD[((208-128))+rax] + + movdqa xmm8,xmm12 + movdqa xmm7,xmm10 + pxor xmm4,XMMWORD[((48-128))+rax] + pxor xmm4,xmm1 + paddd xmm11,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm12 + pand xmm7,xmm14 + + movdqa xmm6,xmm10 + movdqa xmm5,xmm4 + psrld xmm9,27 + paddd xmm11,xmm7 + pxor xmm6,xmm14 + + movdqa XMMWORD[(160-128)+rax],xmm3 + paddd xmm11,xmm3 + por xmm8,xmm9 + psrld xmm5,31 + pand xmm6,xmm13 + movdqa xmm7,xmm13 + + pslld xmm7,30 + paddd xmm4,xmm4 + paddd xmm11,xmm6 + + psrld xmm13,2 + paddd xmm11,xmm8 + por xmm4,xmm5 + por xmm13,xmm7 + pxor xmm0,xmm2 + movdqa xmm2,XMMWORD[((224-128))+rax] + + movdqa xmm8,xmm11 + movdqa xmm7,xmm14 + pxor xmm0,XMMWORD[((64-128))+rax] + pxor xmm0,xmm2 + paddd xmm10,xmm15 + pslld xmm8,5 + movdqa xmm9,xmm11 + pand xmm7,xmm13 + + movdqa xmm6,xmm14 + movdqa xmm5,xmm0 + psrld xmm9,27 + paddd xmm10,xmm7 + pxor xmm6,xmm13 + + movdqa XMMWORD[(176-128)+rax],xmm4 + paddd xmm10,xmm4 + por xmm8,xmm9 + psrld xmm5,31 + pand xmm6,xmm12 + movdqa xmm7,xmm12 + + pslld xmm7,30 + paddd xmm0,xmm0 + paddd xmm10,xmm6 + + psrld xmm12,2 + paddd xmm10,xmm8 + por xmm0,xmm5 + por xmm12,xmm7 + movdqa xmm15,XMMWORD[64+rbp] + pxor xmm1,xmm3 + movdqa xmm3,XMMWORD[((240-128))+rax] + + movdqa xmm8,xmm10 + movdqa xmm6,xmm13 + pxor xmm1,XMMWORD[((80-128))+rax] + paddd xmm14,xmm15 + pslld xmm8,5 + pxor xmm6,xmm11 + + movdqa xmm9,xmm10 + movdqa XMMWORD[(192-128)+rax],xmm0 + paddd xmm14,xmm0 + pxor xmm1,xmm3 + psrld xmm9,27 + pxor xmm6,xmm12 + movdqa xmm7,xmm11 + + pslld xmm7,30 + movdqa xmm5,xmm1 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm14,xmm6 + paddd xmm1,xmm1 + + psrld xmm11,2 + paddd xmm14,xmm8 + por xmm1,xmm5 + por xmm11,xmm7 + pxor xmm2,xmm4 + movdqa xmm4,XMMWORD[((0-128))+rax] + + movdqa xmm8,xmm14 + movdqa xmm6,xmm12 + pxor xmm2,XMMWORD[((96-128))+rax] + paddd xmm13,xmm15 + pslld xmm8,5 + pxor xmm6,xmm10 + + movdqa xmm9,xmm14 + movdqa XMMWORD[(208-128)+rax],xmm1 + paddd xmm13,xmm1 + pxor xmm2,xmm4 + psrld xmm9,27 + pxor xmm6,xmm11 + movdqa xmm7,xmm10 + + pslld xmm7,30 + movdqa xmm5,xmm2 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm13,xmm6 + paddd xmm2,xmm2 + + psrld xmm10,2 + paddd xmm13,xmm8 + por xmm2,xmm5 + por xmm10,xmm7 + pxor xmm3,xmm0 + movdqa xmm0,XMMWORD[((16-128))+rax] + + movdqa xmm8,xmm13 + movdqa xmm6,xmm11 + pxor xmm3,XMMWORD[((112-128))+rax] + paddd xmm12,xmm15 + pslld xmm8,5 + pxor xmm6,xmm14 + + movdqa xmm9,xmm13 + movdqa XMMWORD[(224-128)+rax],xmm2 + paddd xmm12,xmm2 + pxor xmm3,xmm0 + psrld xmm9,27 + pxor xmm6,xmm10 + movdqa xmm7,xmm14 + + pslld xmm7,30 + movdqa xmm5,xmm3 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm12,xmm6 + paddd xmm3,xmm3 + + psrld xmm14,2 + paddd xmm12,xmm8 + por xmm3,xmm5 + por xmm14,xmm7 + pxor xmm4,xmm1 + movdqa xmm1,XMMWORD[((32-128))+rax] + + movdqa xmm8,xmm12 + movdqa xmm6,xmm10 + pxor xmm4,XMMWORD[((128-128))+rax] + paddd xmm11,xmm15 + pslld xmm8,5 + pxor xmm6,xmm13 + + movdqa xmm9,xmm12 + movdqa XMMWORD[(240-128)+rax],xmm3 + paddd xmm11,xmm3 + pxor xmm4,xmm1 + psrld xmm9,27 + pxor xmm6,xmm14 + movdqa xmm7,xmm13 + + pslld xmm7,30 + movdqa xmm5,xmm4 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm11,xmm6 + paddd xmm4,xmm4 + + psrld xmm13,2 + paddd xmm11,xmm8 + por xmm4,xmm5 + por xmm13,xmm7 + pxor xmm0,xmm2 + movdqa xmm2,XMMWORD[((48-128))+rax] + + movdqa xmm8,xmm11 + movdqa xmm6,xmm14 + pxor xmm0,XMMWORD[((144-128))+rax] + paddd xmm10,xmm15 + pslld xmm8,5 + pxor xmm6,xmm12 + + movdqa xmm9,xmm11 + movdqa XMMWORD[(0-128)+rax],xmm4 + paddd xmm10,xmm4 + pxor xmm0,xmm2 + psrld xmm9,27 + pxor 
xmm6,xmm13 + movdqa xmm7,xmm12 + + pslld xmm7,30 + movdqa xmm5,xmm0 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm10,xmm6 + paddd xmm0,xmm0 + + psrld xmm12,2 + paddd xmm10,xmm8 + por xmm0,xmm5 + por xmm12,xmm7 + pxor xmm1,xmm3 + movdqa xmm3,XMMWORD[((64-128))+rax] + + movdqa xmm8,xmm10 + movdqa xmm6,xmm13 + pxor xmm1,XMMWORD[((160-128))+rax] + paddd xmm14,xmm15 + pslld xmm8,5 + pxor xmm6,xmm11 + + movdqa xmm9,xmm10 + movdqa XMMWORD[(16-128)+rax],xmm0 + paddd xmm14,xmm0 + pxor xmm1,xmm3 + psrld xmm9,27 + pxor xmm6,xmm12 + movdqa xmm7,xmm11 + + pslld xmm7,30 + movdqa xmm5,xmm1 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm14,xmm6 + paddd xmm1,xmm1 + + psrld xmm11,2 + paddd xmm14,xmm8 + por xmm1,xmm5 + por xmm11,xmm7 + pxor xmm2,xmm4 + movdqa xmm4,XMMWORD[((80-128))+rax] + + movdqa xmm8,xmm14 + movdqa xmm6,xmm12 + pxor xmm2,XMMWORD[((176-128))+rax] + paddd xmm13,xmm15 + pslld xmm8,5 + pxor xmm6,xmm10 + + movdqa xmm9,xmm14 + movdqa XMMWORD[(32-128)+rax],xmm1 + paddd xmm13,xmm1 + pxor xmm2,xmm4 + psrld xmm9,27 + pxor xmm6,xmm11 + movdqa xmm7,xmm10 + + pslld xmm7,30 + movdqa xmm5,xmm2 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm13,xmm6 + paddd xmm2,xmm2 + + psrld xmm10,2 + paddd xmm13,xmm8 + por xmm2,xmm5 + por xmm10,xmm7 + pxor xmm3,xmm0 + movdqa xmm0,XMMWORD[((96-128))+rax] + + movdqa xmm8,xmm13 + movdqa xmm6,xmm11 + pxor xmm3,XMMWORD[((192-128))+rax] + paddd xmm12,xmm15 + pslld xmm8,5 + pxor xmm6,xmm14 + + movdqa xmm9,xmm13 + movdqa XMMWORD[(48-128)+rax],xmm2 + paddd xmm12,xmm2 + pxor xmm3,xmm0 + psrld xmm9,27 + pxor xmm6,xmm10 + movdqa xmm7,xmm14 + + pslld xmm7,30 + movdqa xmm5,xmm3 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm12,xmm6 + paddd xmm3,xmm3 + + psrld xmm14,2 + paddd xmm12,xmm8 + por xmm3,xmm5 + por xmm14,xmm7 + pxor xmm4,xmm1 + movdqa xmm1,XMMWORD[((112-128))+rax] + + movdqa xmm8,xmm12 + movdqa xmm6,xmm10 + pxor xmm4,XMMWORD[((208-128))+rax] + paddd xmm11,xmm15 + pslld xmm8,5 + pxor xmm6,xmm13 + + movdqa xmm9,xmm12 + movdqa XMMWORD[(64-128)+rax],xmm3 + paddd xmm11,xmm3 + pxor xmm4,xmm1 + psrld xmm9,27 + pxor xmm6,xmm14 + movdqa xmm7,xmm13 + + pslld xmm7,30 + movdqa xmm5,xmm4 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm11,xmm6 + paddd xmm4,xmm4 + + psrld xmm13,2 + paddd xmm11,xmm8 + por xmm4,xmm5 + por xmm13,xmm7 + pxor xmm0,xmm2 + movdqa xmm2,XMMWORD[((128-128))+rax] + + movdqa xmm8,xmm11 + movdqa xmm6,xmm14 + pxor xmm0,XMMWORD[((224-128))+rax] + paddd xmm10,xmm15 + pslld xmm8,5 + pxor xmm6,xmm12 + + movdqa xmm9,xmm11 + movdqa XMMWORD[(80-128)+rax],xmm4 + paddd xmm10,xmm4 + pxor xmm0,xmm2 + psrld xmm9,27 + pxor xmm6,xmm13 + movdqa xmm7,xmm12 + + pslld xmm7,30 + movdqa xmm5,xmm0 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm10,xmm6 + paddd xmm0,xmm0 + + psrld xmm12,2 + paddd xmm10,xmm8 + por xmm0,xmm5 + por xmm12,xmm7 + pxor xmm1,xmm3 + movdqa xmm3,XMMWORD[((144-128))+rax] + + movdqa xmm8,xmm10 + movdqa xmm6,xmm13 + pxor xmm1,XMMWORD[((240-128))+rax] + paddd xmm14,xmm15 + pslld xmm8,5 + pxor xmm6,xmm11 + + movdqa xmm9,xmm10 + movdqa XMMWORD[(96-128)+rax],xmm0 + paddd xmm14,xmm0 + pxor xmm1,xmm3 + psrld xmm9,27 + pxor xmm6,xmm12 + movdqa xmm7,xmm11 + + pslld xmm7,30 + movdqa xmm5,xmm1 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm14,xmm6 + paddd xmm1,xmm1 + + psrld xmm11,2 + paddd xmm14,xmm8 + por xmm1,xmm5 + por xmm11,xmm7 + pxor xmm2,xmm4 + movdqa xmm4,XMMWORD[((160-128))+rax] + + movdqa xmm8,xmm14 + movdqa xmm6,xmm12 + pxor xmm2,XMMWORD[((0-128))+rax] + paddd xmm13,xmm15 + pslld xmm8,5 + pxor xmm6,xmm10 + + movdqa xmm9,xmm14 + movdqa XMMWORD[(112-128)+rax],xmm1 + paddd xmm13,xmm1 + pxor 
xmm2,xmm4 + psrld xmm9,27 + pxor xmm6,xmm11 + movdqa xmm7,xmm10 + + pslld xmm7,30 + movdqa xmm5,xmm2 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm13,xmm6 + paddd xmm2,xmm2 + + psrld xmm10,2 + paddd xmm13,xmm8 + por xmm2,xmm5 + por xmm10,xmm7 + pxor xmm3,xmm0 + movdqa xmm0,XMMWORD[((176-128))+rax] + + movdqa xmm8,xmm13 + movdqa xmm6,xmm11 + pxor xmm3,XMMWORD[((16-128))+rax] + paddd xmm12,xmm15 + pslld xmm8,5 + pxor xmm6,xmm14 + + movdqa xmm9,xmm13 + paddd xmm12,xmm2 + pxor xmm3,xmm0 + psrld xmm9,27 + pxor xmm6,xmm10 + movdqa xmm7,xmm14 + + pslld xmm7,30 + movdqa xmm5,xmm3 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm12,xmm6 + paddd xmm3,xmm3 + + psrld xmm14,2 + paddd xmm12,xmm8 + por xmm3,xmm5 + por xmm14,xmm7 + pxor xmm4,xmm1 + movdqa xmm1,XMMWORD[((192-128))+rax] + + movdqa xmm8,xmm12 + movdqa xmm6,xmm10 + pxor xmm4,XMMWORD[((32-128))+rax] + paddd xmm11,xmm15 + pslld xmm8,5 + pxor xmm6,xmm13 + + movdqa xmm9,xmm12 + paddd xmm11,xmm3 + pxor xmm4,xmm1 + psrld xmm9,27 + pxor xmm6,xmm14 + movdqa xmm7,xmm13 + + pslld xmm7,30 + movdqa xmm5,xmm4 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm11,xmm6 + paddd xmm4,xmm4 + + psrld xmm13,2 + paddd xmm11,xmm8 + por xmm4,xmm5 + por xmm13,xmm7 + pxor xmm0,xmm2 + movdqa xmm2,XMMWORD[((208-128))+rax] + + movdqa xmm8,xmm11 + movdqa xmm6,xmm14 + pxor xmm0,XMMWORD[((48-128))+rax] + paddd xmm10,xmm15 + pslld xmm8,5 + pxor xmm6,xmm12 + + movdqa xmm9,xmm11 + paddd xmm10,xmm4 + pxor xmm0,xmm2 + psrld xmm9,27 + pxor xmm6,xmm13 + movdqa xmm7,xmm12 + + pslld xmm7,30 + movdqa xmm5,xmm0 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm10,xmm6 + paddd xmm0,xmm0 + + psrld xmm12,2 + paddd xmm10,xmm8 + por xmm0,xmm5 + por xmm12,xmm7 + pxor xmm1,xmm3 + movdqa xmm3,XMMWORD[((224-128))+rax] + + movdqa xmm8,xmm10 + movdqa xmm6,xmm13 + pxor xmm1,XMMWORD[((64-128))+rax] + paddd xmm14,xmm15 + pslld xmm8,5 + pxor xmm6,xmm11 + + movdqa xmm9,xmm10 + paddd xmm14,xmm0 + pxor xmm1,xmm3 + psrld xmm9,27 + pxor xmm6,xmm12 + movdqa xmm7,xmm11 + + pslld xmm7,30 + movdqa xmm5,xmm1 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm14,xmm6 + paddd xmm1,xmm1 + + psrld xmm11,2 + paddd xmm14,xmm8 + por xmm1,xmm5 + por xmm11,xmm7 + pxor xmm2,xmm4 + movdqa xmm4,XMMWORD[((240-128))+rax] + + movdqa xmm8,xmm14 + movdqa xmm6,xmm12 + pxor xmm2,XMMWORD[((80-128))+rax] + paddd xmm13,xmm15 + pslld xmm8,5 + pxor xmm6,xmm10 + + movdqa xmm9,xmm14 + paddd xmm13,xmm1 + pxor xmm2,xmm4 + psrld xmm9,27 + pxor xmm6,xmm11 + movdqa xmm7,xmm10 + + pslld xmm7,30 + movdqa xmm5,xmm2 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm13,xmm6 + paddd xmm2,xmm2 + + psrld xmm10,2 + paddd xmm13,xmm8 + por xmm2,xmm5 + por xmm10,xmm7 + pxor xmm3,xmm0 + movdqa xmm0,XMMWORD[((0-128))+rax] + + movdqa xmm8,xmm13 + movdqa xmm6,xmm11 + pxor xmm3,XMMWORD[((96-128))+rax] + paddd xmm12,xmm15 + pslld xmm8,5 + pxor xmm6,xmm14 + + movdqa xmm9,xmm13 + paddd xmm12,xmm2 + pxor xmm3,xmm0 + psrld xmm9,27 + pxor xmm6,xmm10 + movdqa xmm7,xmm14 + + pslld xmm7,30 + movdqa xmm5,xmm3 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm12,xmm6 + paddd xmm3,xmm3 + + psrld xmm14,2 + paddd xmm12,xmm8 + por xmm3,xmm5 + por xmm14,xmm7 + pxor xmm4,xmm1 + movdqa xmm1,XMMWORD[((16-128))+rax] + + movdqa xmm8,xmm12 + movdqa xmm6,xmm10 + pxor xmm4,XMMWORD[((112-128))+rax] + paddd xmm11,xmm15 + pslld xmm8,5 + pxor xmm6,xmm13 + + movdqa xmm9,xmm12 + paddd xmm11,xmm3 + pxor xmm4,xmm1 + psrld xmm9,27 + pxor xmm6,xmm14 + movdqa xmm7,xmm13 + + pslld xmm7,30 + movdqa xmm5,xmm4 + por xmm8,xmm9 + psrld xmm5,31 + paddd xmm11,xmm6 + paddd xmm4,xmm4 + + psrld xmm13,2 + paddd xmm11,xmm8 + por xmm4,xmm5 + por 
xmm13,xmm7 + movdqa xmm8,xmm11 + paddd xmm10,xmm15 + movdqa xmm6,xmm14 + pslld xmm8,5 + pxor xmm6,xmm12 + + movdqa xmm9,xmm11 + paddd xmm10,xmm4 + psrld xmm9,27 + movdqa xmm7,xmm12 + pxor xmm6,xmm13 + + pslld xmm7,30 + por xmm8,xmm9 + paddd xmm10,xmm6 + + psrld xmm12,2 + paddd xmm10,xmm8 + por xmm12,xmm7 + movdqa xmm0,XMMWORD[rbx] + mov ecx,1 + cmp ecx,DWORD[rbx] + pxor xmm8,xmm8 + cmovge r8,rbp + cmp ecx,DWORD[4+rbx] + movdqa xmm1,xmm0 + cmovge r9,rbp + cmp ecx,DWORD[8+rbx] + pcmpgtd xmm1,xmm8 + cmovge r10,rbp + cmp ecx,DWORD[12+rbx] + paddd xmm0,xmm1 + cmovge r11,rbp + + movdqu xmm6,XMMWORD[rdi] + pand xmm10,xmm1 + movdqu xmm7,XMMWORD[32+rdi] + pand xmm11,xmm1 + paddd xmm10,xmm6 + movdqu xmm8,XMMWORD[64+rdi] + pand xmm12,xmm1 + paddd xmm11,xmm7 + movdqu xmm9,XMMWORD[96+rdi] + pand xmm13,xmm1 + paddd xmm12,xmm8 + movdqu xmm5,XMMWORD[128+rdi] + pand xmm14,xmm1 + movdqu XMMWORD[rdi],xmm10 + paddd xmm13,xmm9 + movdqu XMMWORD[32+rdi],xmm11 + paddd xmm14,xmm5 + movdqu XMMWORD[64+rdi],xmm12 + movdqu XMMWORD[96+rdi],xmm13 + movdqu XMMWORD[128+rdi],xmm14 + + movdqa XMMWORD[rbx],xmm0 + movdqa xmm5,XMMWORD[96+rbp] + movdqa xmm15,XMMWORD[((-32))+rbp] + dec edx + jnz NEAR $L$oop + + mov edx,DWORD[280+rsp] + lea rdi,[16+rdi] + lea rsi,[64+rsi] + dec edx + jnz NEAR $L$oop_grande + +$L$done: + mov rax,QWORD[272+rsp] + + movaps xmm6,XMMWORD[((-184))+rax] + movaps xmm7,XMMWORD[((-168))+rax] + movaps xmm8,XMMWORD[((-152))+rax] + movaps xmm9,XMMWORD[((-136))+rax] + movaps xmm10,XMMWORD[((-120))+rax] + movaps xmm11,XMMWORD[((-104))+rax] + movaps xmm12,XMMWORD[((-88))+rax] + movaps xmm13,XMMWORD[((-72))+rax] + movaps xmm14,XMMWORD[((-56))+rax] + movaps xmm15,XMMWORD[((-40))+rax] + mov rbp,QWORD[((-16))+rax] + + mov rbx,QWORD[((-8))+rax] + + lea rsp,[rax] + +$L$epilogue: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_sha1_multi_block: + +ALIGN 32 +sha1_multi_block_shaext: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_sha1_multi_block_shaext: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + + + +_shaext_shortcut: + mov rax,rsp + + push rbx + + push rbp + + lea rsp,[((-168))+rsp] + movaps XMMWORD[rsp],xmm6 + movaps XMMWORD[16+rsp],xmm7 + movaps XMMWORD[32+rsp],xmm8 + movaps XMMWORD[48+rsp],xmm9 + movaps XMMWORD[(-120)+rax],xmm10 + movaps XMMWORD[(-104)+rax],xmm11 + movaps XMMWORD[(-88)+rax],xmm12 + movaps XMMWORD[(-72)+rax],xmm13 + movaps XMMWORD[(-56)+rax],xmm14 + movaps XMMWORD[(-40)+rax],xmm15 + sub rsp,288 + shl edx,1 + and rsp,-256 + lea rdi,[64+rdi] + mov QWORD[272+rsp],rax +$L$body_shaext: + lea rbx,[256+rsp] + movdqa xmm3,XMMWORD[((K_XX_XX+128))] + +$L$oop_grande_shaext: + mov DWORD[280+rsp],edx + xor edx,edx + mov r8,QWORD[rsi] + mov ecx,DWORD[8+rsi] + cmp ecx,edx + cmovg edx,ecx + test ecx,ecx + mov DWORD[rbx],ecx + cmovle r8,rsp + mov r9,QWORD[16+rsi] + mov ecx,DWORD[24+rsi] + cmp ecx,edx + cmovg edx,ecx + test ecx,ecx + mov DWORD[4+rbx],ecx + cmovle r9,rsp + test edx,edx + jz NEAR $L$done_shaext + + movq xmm0,QWORD[((0-64))+rdi] + movq xmm4,QWORD[((32-64))+rdi] + movq xmm5,QWORD[((64-64))+rdi] + movq xmm6,QWORD[((96-64))+rdi] + movq xmm7,QWORD[((128-64))+rdi] + + punpckldq xmm0,xmm4 + punpckldq xmm5,xmm6 + + movdqa xmm8,xmm0 + punpcklqdq xmm0,xmm5 + punpckhqdq xmm8,xmm5 + + pshufd xmm1,xmm7,63 + pshufd xmm9,xmm7,127 + pshufd xmm0,xmm0,27 + pshufd xmm8,xmm8,27 + jmp NEAR $L$oop_shaext + +ALIGN 32 +$L$oop_shaext: + movdqu xmm4,XMMWORD[r8] + movdqu xmm11,XMMWORD[r9] + movdqu xmm5,XMMWORD[16+r8] + movdqu 
xmm12,XMMWORD[16+r9] + movdqu xmm6,XMMWORD[32+r8] +DB 102,15,56,0,227 + movdqu xmm13,XMMWORD[32+r9] +DB 102,68,15,56,0,219 + movdqu xmm7,XMMWORD[48+r8] + lea r8,[64+r8] +DB 102,15,56,0,235 + movdqu xmm14,XMMWORD[48+r9] + lea r9,[64+r9] +DB 102,68,15,56,0,227 + + movdqa XMMWORD[80+rsp],xmm1 + paddd xmm1,xmm4 + movdqa XMMWORD[112+rsp],xmm9 + paddd xmm9,xmm11 + movdqa XMMWORD[64+rsp],xmm0 + movdqa xmm2,xmm0 + movdqa XMMWORD[96+rsp],xmm8 + movdqa xmm10,xmm8 +DB 15,58,204,193,0 +DB 15,56,200,213 +DB 69,15,58,204,193,0 +DB 69,15,56,200,212 +DB 102,15,56,0,243 + prefetcht0 [127+r8] +DB 15,56,201,229 +DB 102,68,15,56,0,235 + prefetcht0 [127+r9] +DB 69,15,56,201,220 + +DB 102,15,56,0,251 + movdqa xmm1,xmm0 +DB 102,68,15,56,0,243 + movdqa xmm9,xmm8 +DB 15,58,204,194,0 +DB 15,56,200,206 +DB 69,15,58,204,194,0 +DB 69,15,56,200,205 + pxor xmm4,xmm6 +DB 15,56,201,238 + pxor xmm11,xmm13 +DB 69,15,56,201,229 + movdqa xmm2,xmm0 + movdqa xmm10,xmm8 +DB 15,58,204,193,0 +DB 15,56,200,215 +DB 69,15,58,204,193,0 +DB 69,15,56,200,214 +DB 15,56,202,231 +DB 69,15,56,202,222 + pxor xmm5,xmm7 +DB 15,56,201,247 + pxor xmm12,xmm14 +DB 69,15,56,201,238 + movdqa xmm1,xmm0 + movdqa xmm9,xmm8 +DB 15,58,204,194,0 +DB 15,56,200,204 +DB 69,15,58,204,194,0 +DB 69,15,56,200,203 +DB 15,56,202,236 +DB 69,15,56,202,227 + pxor xmm6,xmm4 +DB 15,56,201,252 + pxor xmm13,xmm11 +DB 69,15,56,201,243 + movdqa xmm2,xmm0 + movdqa xmm10,xmm8 +DB 15,58,204,193,0 +DB 15,56,200,213 +DB 69,15,58,204,193,0 +DB 69,15,56,200,212 +DB 15,56,202,245 +DB 69,15,56,202,236 + pxor xmm7,xmm5 +DB 15,56,201,229 + pxor xmm14,xmm12 +DB 69,15,56,201,220 + movdqa xmm1,xmm0 + movdqa xmm9,xmm8 +DB 15,58,204,194,1 +DB 15,56,200,206 +DB 69,15,58,204,194,1 +DB 69,15,56,200,205 +DB 15,56,202,254 +DB 69,15,56,202,245 + pxor xmm4,xmm6 +DB 15,56,201,238 + pxor xmm11,xmm13 +DB 69,15,56,201,229 + movdqa xmm2,xmm0 + movdqa xmm10,xmm8 +DB 15,58,204,193,1 +DB 15,56,200,215 +DB 69,15,58,204,193,1 +DB 69,15,56,200,214 +DB 15,56,202,231 +DB 69,15,56,202,222 + pxor xmm5,xmm7 +DB 15,56,201,247 + pxor xmm12,xmm14 +DB 69,15,56,201,238 + movdqa xmm1,xmm0 + movdqa xmm9,xmm8 +DB 15,58,204,194,1 +DB 15,56,200,204 +DB 69,15,58,204,194,1 +DB 69,15,56,200,203 +DB 15,56,202,236 +DB 69,15,56,202,227 + pxor xmm6,xmm4 +DB 15,56,201,252 + pxor xmm13,xmm11 +DB 69,15,56,201,243 + movdqa xmm2,xmm0 + movdqa xmm10,xmm8 +DB 15,58,204,193,1 +DB 15,56,200,213 +DB 69,15,58,204,193,1 +DB 69,15,56,200,212 +DB 15,56,202,245 +DB 69,15,56,202,236 + pxor xmm7,xmm5 +DB 15,56,201,229 + pxor xmm14,xmm12 +DB 69,15,56,201,220 + movdqa xmm1,xmm0 + movdqa xmm9,xmm8 +DB 15,58,204,194,1 +DB 15,56,200,206 +DB 69,15,58,204,194,1 +DB 69,15,56,200,205 +DB 15,56,202,254 +DB 69,15,56,202,245 + pxor xmm4,xmm6 +DB 15,56,201,238 + pxor xmm11,xmm13 +DB 69,15,56,201,229 + movdqa xmm2,xmm0 + movdqa xmm10,xmm8 +DB 15,58,204,193,2 +DB 15,56,200,215 +DB 69,15,58,204,193,2 +DB 69,15,56,200,214 +DB 15,56,202,231 +DB 69,15,56,202,222 + pxor xmm5,xmm7 +DB 15,56,201,247 + pxor xmm12,xmm14 +DB 69,15,56,201,238 + movdqa xmm1,xmm0 + movdqa xmm9,xmm8 +DB 15,58,204,194,2 +DB 15,56,200,204 +DB 69,15,58,204,194,2 +DB 69,15,56,200,203 +DB 15,56,202,236 +DB 69,15,56,202,227 + pxor xmm6,xmm4 +DB 15,56,201,252 + pxor xmm13,xmm11 +DB 69,15,56,201,243 + movdqa xmm2,xmm0 + movdqa xmm10,xmm8 +DB 15,58,204,193,2 +DB 15,56,200,213 +DB 69,15,58,204,193,2 +DB 69,15,56,200,212 +DB 15,56,202,245 +DB 69,15,56,202,236 + pxor xmm7,xmm5 +DB 15,56,201,229 + pxor xmm14,xmm12 +DB 69,15,56,201,220 + movdqa xmm1,xmm0 + movdqa xmm9,xmm8 +DB 15,58,204,194,2 +DB 
15,56,200,206 +DB 69,15,58,204,194,2 +DB 69,15,56,200,205 +DB 15,56,202,254 +DB 69,15,56,202,245 + pxor xmm4,xmm6 +DB 15,56,201,238 + pxor xmm11,xmm13 +DB 69,15,56,201,229 + movdqa xmm2,xmm0 + movdqa xmm10,xmm8 +DB 15,58,204,193,2 +DB 15,56,200,215 +DB 69,15,58,204,193,2 +DB 69,15,56,200,214 +DB 15,56,202,231 +DB 69,15,56,202,222 + pxor xmm5,xmm7 +DB 15,56,201,247 + pxor xmm12,xmm14 +DB 69,15,56,201,238 + movdqa xmm1,xmm0 + movdqa xmm9,xmm8 +DB 15,58,204,194,3 +DB 15,56,200,204 +DB 69,15,58,204,194,3 +DB 69,15,56,200,203 +DB 15,56,202,236 +DB 69,15,56,202,227 + pxor xmm6,xmm4 +DB 15,56,201,252 + pxor xmm13,xmm11 +DB 69,15,56,201,243 + movdqa xmm2,xmm0 + movdqa xmm10,xmm8 +DB 15,58,204,193,3 +DB 15,56,200,213 +DB 69,15,58,204,193,3 +DB 69,15,56,200,212 +DB 15,56,202,245 +DB 69,15,56,202,236 + pxor xmm7,xmm5 + pxor xmm14,xmm12 + + mov ecx,1 + pxor xmm4,xmm4 + cmp ecx,DWORD[rbx] + cmovge r8,rsp + + movdqa xmm1,xmm0 + movdqa xmm9,xmm8 +DB 15,58,204,194,3 +DB 15,56,200,206 +DB 69,15,58,204,194,3 +DB 69,15,56,200,205 +DB 15,56,202,254 +DB 69,15,56,202,245 + + cmp ecx,DWORD[4+rbx] + cmovge r9,rsp + movq xmm6,QWORD[rbx] + + movdqa xmm2,xmm0 + movdqa xmm10,xmm8 +DB 15,58,204,193,3 +DB 15,56,200,215 +DB 69,15,58,204,193,3 +DB 69,15,56,200,214 + + pshufd xmm11,xmm6,0x00 + pshufd xmm12,xmm6,0x55 + movdqa xmm7,xmm6 + pcmpgtd xmm11,xmm4 + pcmpgtd xmm12,xmm4 + + movdqa xmm1,xmm0 + movdqa xmm9,xmm8 +DB 15,58,204,194,3 +DB 15,56,200,204 +DB 69,15,58,204,194,3 +DB 68,15,56,200,204 + + pcmpgtd xmm7,xmm4 + pand xmm0,xmm11 + pand xmm1,xmm11 + pand xmm8,xmm12 + pand xmm9,xmm12 + paddd xmm6,xmm7 + + paddd xmm0,XMMWORD[64+rsp] + paddd xmm1,XMMWORD[80+rsp] + paddd xmm8,XMMWORD[96+rsp] + paddd xmm9,XMMWORD[112+rsp] + + movq QWORD[rbx],xmm6 + dec edx + jnz NEAR $L$oop_shaext + + mov edx,DWORD[280+rsp] + + pshufd xmm0,xmm0,27 + pshufd xmm8,xmm8,27 + + movdqa xmm6,xmm0 + punpckldq xmm0,xmm8 + punpckhdq xmm6,xmm8 + punpckhdq xmm1,xmm9 + movq QWORD[(0-64)+rdi],xmm0 + psrldq xmm0,8 + movq QWORD[(64-64)+rdi],xmm6 + psrldq xmm6,8 + movq QWORD[(32-64)+rdi],xmm0 + psrldq xmm1,8 + movq QWORD[(96-64)+rdi],xmm6 + movq QWORD[(128-64)+rdi],xmm1 + + lea rdi,[8+rdi] + lea rsi,[32+rsi] + dec edx + jnz NEAR $L$oop_grande_shaext + +$L$done_shaext: + + movaps xmm6,XMMWORD[((-184))+rax] + movaps xmm7,XMMWORD[((-168))+rax] + movaps xmm8,XMMWORD[((-152))+rax] + movaps xmm9,XMMWORD[((-136))+rax] + movaps xmm10,XMMWORD[((-120))+rax] + movaps xmm11,XMMWORD[((-104))+rax] + movaps xmm12,XMMWORD[((-88))+rax] + movaps xmm13,XMMWORD[((-72))+rax] + movaps xmm14,XMMWORD[((-56))+rax] + movaps xmm15,XMMWORD[((-40))+rax] + mov rbp,QWORD[((-16))+rax] + + mov rbx,QWORD[((-8))+rax] + + lea rsp,[rax] + +$L$epilogue_shaext: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_sha1_multi_block_shaext: + +ALIGN 256 + DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999 + DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999 +K_XX_XX: + DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 + DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 + DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc + DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc + DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 + DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f +DB 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0 +DB 83,72,65,49,32,109,117,108,116,105,45,98,108,111,99,107 +DB 32,116,114,97,110,115,102,111,114,109,32,102,111,114,32,120 +DB 
56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77
+DB 83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110
+DB 115,115,108,46,111,114,103,62,0
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ mov rax,QWORD[272+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+
+ lea rsi,[((-24-160))+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_sha1_multi_block wrt ..imagebase
+ DD $L$SEH_end_sha1_multi_block wrt ..imagebase
+ DD $L$SEH_info_sha1_multi_block wrt ..imagebase
+ DD $L$SEH_begin_sha1_multi_block_shaext wrt ..imagebase
+ DD $L$SEH_end_sha1_multi_block_shaext wrt ..imagebase
+ DD $L$SEH_info_sha1_multi_block_shaext wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_sha1_multi_block:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$body wrt ..imagebase,$L$epilogue wrt ..imagebase
+$L$SEH_info_sha1_multi_block_shaext:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$body_shaext wrt ..imagebase,$L$epilogue_shaext wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm
new file mode 100644
index 0000000000..c6d68d348f
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm
@@ -0,0 +1,2884 @@
+; WARNING: do not edit!
+; Generated from openssl/crypto/sha/asm/sha1-x86_64.pl
+;
+; Copyright 2006-2020 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License.
You can obtain a copy +; in the file LICENSE in the source distribution or at +; https://www.openssl.org/source/license.html + +default rel +%define XMMWORD +%define YMMWORD +%define ZMMWORD +section .text code align=64 + +EXTERN OPENSSL_ia32cap_P + +global sha1_block_data_order + +ALIGN 16 +sha1_block_data_order: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_sha1_block_data_order: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + + + + mov r9d,DWORD[((OPENSSL_ia32cap_P+0))] + mov r8d,DWORD[((OPENSSL_ia32cap_P+4))] + mov r10d,DWORD[((OPENSSL_ia32cap_P+8))] + test r8d,512 + jz NEAR $L$ialu + test r10d,536870912 + jnz NEAR _shaext_shortcut + jmp NEAR _ssse3_shortcut + +ALIGN 16 +$L$ialu: + mov rax,rsp + + push rbx + + push rbp + + push r12 + + push r13 + + push r14 + + mov r8,rdi + sub rsp,72 + mov r9,rsi + and rsp,-64 + mov r10,rdx + mov QWORD[64+rsp],rax + +$L$prologue: + + mov esi,DWORD[r8] + mov edi,DWORD[4+r8] + mov r11d,DWORD[8+r8] + mov r12d,DWORD[12+r8] + mov r13d,DWORD[16+r8] + jmp NEAR $L$loop + +ALIGN 16 +$L$loop: + mov edx,DWORD[r9] + bswap edx + mov ebp,DWORD[4+r9] + mov eax,r12d + mov DWORD[rsp],edx + mov ecx,esi + bswap ebp + xor eax,r11d + rol ecx,5 + and eax,edi + lea r13d,[1518500249+r13*1+rdx] + add r13d,ecx + xor eax,r12d + rol edi,30 + add r13d,eax + mov r14d,DWORD[8+r9] + mov eax,r11d + mov DWORD[4+rsp],ebp + mov ecx,r13d + bswap r14d + xor eax,edi + rol ecx,5 + and eax,esi + lea r12d,[1518500249+r12*1+rbp] + add r12d,ecx + xor eax,r11d + rol esi,30 + add r12d,eax + mov edx,DWORD[12+r9] + mov eax,edi + mov DWORD[8+rsp],r14d + mov ecx,r12d + bswap edx + xor eax,esi + rol ecx,5 + and eax,r13d + lea r11d,[1518500249+r11*1+r14] + add r11d,ecx + xor eax,edi + rol r13d,30 + add r11d,eax + mov ebp,DWORD[16+r9] + mov eax,esi + mov DWORD[12+rsp],edx + mov ecx,r11d + bswap ebp + xor eax,r13d + rol ecx,5 + and eax,r12d + lea edi,[1518500249+rdi*1+rdx] + add edi,ecx + xor eax,esi + rol r12d,30 + add edi,eax + mov r14d,DWORD[20+r9] + mov eax,r13d + mov DWORD[16+rsp],ebp + mov ecx,edi + bswap r14d + xor eax,r12d + rol ecx,5 + and eax,r11d + lea esi,[1518500249+rsi*1+rbp] + add esi,ecx + xor eax,r13d + rol r11d,30 + add esi,eax + mov edx,DWORD[24+r9] + mov eax,r12d + mov DWORD[20+rsp],r14d + mov ecx,esi + bswap edx + xor eax,r11d + rol ecx,5 + and eax,edi + lea r13d,[1518500249+r13*1+r14] + add r13d,ecx + xor eax,r12d + rol edi,30 + add r13d,eax + mov ebp,DWORD[28+r9] + mov eax,r11d + mov DWORD[24+rsp],edx + mov ecx,r13d + bswap ebp + xor eax,edi + rol ecx,5 + and eax,esi + lea r12d,[1518500249+r12*1+rdx] + add r12d,ecx + xor eax,r11d + rol esi,30 + add r12d,eax + mov r14d,DWORD[32+r9] + mov eax,edi + mov DWORD[28+rsp],ebp + mov ecx,r12d + bswap r14d + xor eax,esi + rol ecx,5 + and eax,r13d + lea r11d,[1518500249+r11*1+rbp] + add r11d,ecx + xor eax,edi + rol r13d,30 + add r11d,eax + mov edx,DWORD[36+r9] + mov eax,esi + mov DWORD[32+rsp],r14d + mov ecx,r11d + bswap edx + xor eax,r13d + rol ecx,5 + and eax,r12d + lea edi,[1518500249+rdi*1+r14] + add edi,ecx + xor eax,esi + rol r12d,30 + add edi,eax + mov ebp,DWORD[40+r9] + mov eax,r13d + mov DWORD[36+rsp],edx + mov ecx,edi + bswap ebp + xor eax,r12d + rol ecx,5 + and eax,r11d + lea esi,[1518500249+rsi*1+rdx] + add esi,ecx + xor eax,r13d + rol r11d,30 + add esi,eax + mov r14d,DWORD[44+r9] + mov eax,r12d + mov DWORD[40+rsp],ebp + mov ecx,esi + bswap r14d + xor eax,r11d + rol ecx,5 + and eax,edi + lea r13d,[1518500249+r13*1+rbp] + add r13d,ecx + xor eax,r12d + rol edi,30 + add r13d,eax + mov 
edx,DWORD[48+r9] + mov eax,r11d + mov DWORD[44+rsp],r14d + mov ecx,r13d + bswap edx + xor eax,edi + rol ecx,5 + and eax,esi + lea r12d,[1518500249+r12*1+r14] + add r12d,ecx + xor eax,r11d + rol esi,30 + add r12d,eax + mov ebp,DWORD[52+r9] + mov eax,edi + mov DWORD[48+rsp],edx + mov ecx,r12d + bswap ebp + xor eax,esi + rol ecx,5 + and eax,r13d + lea r11d,[1518500249+r11*1+rdx] + add r11d,ecx + xor eax,edi + rol r13d,30 + add r11d,eax + mov r14d,DWORD[56+r9] + mov eax,esi + mov DWORD[52+rsp],ebp + mov ecx,r11d + bswap r14d + xor eax,r13d + rol ecx,5 + and eax,r12d + lea edi,[1518500249+rdi*1+rbp] + add edi,ecx + xor eax,esi + rol r12d,30 + add edi,eax + mov edx,DWORD[60+r9] + mov eax,r13d + mov DWORD[56+rsp],r14d + mov ecx,edi + bswap edx + xor eax,r12d + rol ecx,5 + and eax,r11d + lea esi,[1518500249+rsi*1+r14] + add esi,ecx + xor eax,r13d + rol r11d,30 + add esi,eax + xor ebp,DWORD[rsp] + mov eax,r12d + mov DWORD[60+rsp],edx + mov ecx,esi + xor ebp,DWORD[8+rsp] + xor eax,r11d + rol ecx,5 + xor ebp,DWORD[32+rsp] + and eax,edi + lea r13d,[1518500249+r13*1+rdx] + rol edi,30 + xor eax,r12d + add r13d,ecx + rol ebp,1 + add r13d,eax + xor r14d,DWORD[4+rsp] + mov eax,r11d + mov DWORD[rsp],ebp + mov ecx,r13d + xor r14d,DWORD[12+rsp] + xor eax,edi + rol ecx,5 + xor r14d,DWORD[36+rsp] + and eax,esi + lea r12d,[1518500249+r12*1+rbp] + rol esi,30 + xor eax,r11d + add r12d,ecx + rol r14d,1 + add r12d,eax + xor edx,DWORD[8+rsp] + mov eax,edi + mov DWORD[4+rsp],r14d + mov ecx,r12d + xor edx,DWORD[16+rsp] + xor eax,esi + rol ecx,5 + xor edx,DWORD[40+rsp] + and eax,r13d + lea r11d,[1518500249+r11*1+r14] + rol r13d,30 + xor eax,edi + add r11d,ecx + rol edx,1 + add r11d,eax + xor ebp,DWORD[12+rsp] + mov eax,esi + mov DWORD[8+rsp],edx + mov ecx,r11d + xor ebp,DWORD[20+rsp] + xor eax,r13d + rol ecx,5 + xor ebp,DWORD[44+rsp] + and eax,r12d + lea edi,[1518500249+rdi*1+rdx] + rol r12d,30 + xor eax,esi + add edi,ecx + rol ebp,1 + add edi,eax + xor r14d,DWORD[16+rsp] + mov eax,r13d + mov DWORD[12+rsp],ebp + mov ecx,edi + xor r14d,DWORD[24+rsp] + xor eax,r12d + rol ecx,5 + xor r14d,DWORD[48+rsp] + and eax,r11d + lea esi,[1518500249+rsi*1+rbp] + rol r11d,30 + xor eax,r13d + add esi,ecx + rol r14d,1 + add esi,eax + xor edx,DWORD[20+rsp] + mov eax,edi + mov DWORD[16+rsp],r14d + mov ecx,esi + xor edx,DWORD[28+rsp] + xor eax,r12d + rol ecx,5 + xor edx,DWORD[52+rsp] + lea r13d,[1859775393+r13*1+r14] + xor eax,r11d + add r13d,ecx + rol edi,30 + add r13d,eax + rol edx,1 + xor ebp,DWORD[24+rsp] + mov eax,esi + mov DWORD[20+rsp],edx + mov ecx,r13d + xor ebp,DWORD[32+rsp] + xor eax,r11d + rol ecx,5 + xor ebp,DWORD[56+rsp] + lea r12d,[1859775393+r12*1+rdx] + xor eax,edi + add r12d,ecx + rol esi,30 + add r12d,eax + rol ebp,1 + xor r14d,DWORD[28+rsp] + mov eax,r13d + mov DWORD[24+rsp],ebp + mov ecx,r12d + xor r14d,DWORD[36+rsp] + xor eax,edi + rol ecx,5 + xor r14d,DWORD[60+rsp] + lea r11d,[1859775393+r11*1+rbp] + xor eax,esi + add r11d,ecx + rol r13d,30 + add r11d,eax + rol r14d,1 + xor edx,DWORD[32+rsp] + mov eax,r12d + mov DWORD[28+rsp],r14d + mov ecx,r11d + xor edx,DWORD[40+rsp] + xor eax,esi + rol ecx,5 + xor edx,DWORD[rsp] + lea edi,[1859775393+rdi*1+r14] + xor eax,r13d + add edi,ecx + rol r12d,30 + add edi,eax + rol edx,1 + xor ebp,DWORD[36+rsp] + mov eax,r11d + mov DWORD[32+rsp],edx + mov ecx,edi + xor ebp,DWORD[44+rsp] + xor eax,r13d + rol ecx,5 + xor ebp,DWORD[4+rsp] + lea esi,[1859775393+rsi*1+rdx] + xor eax,r12d + add esi,ecx + rol r11d,30 + add esi,eax + rol ebp,1 + xor r14d,DWORD[40+rsp] + mov eax,edi + mov 
DWORD[36+rsp],ebp + mov ecx,esi + xor r14d,DWORD[48+rsp] + xor eax,r12d + rol ecx,5 + xor r14d,DWORD[8+rsp] + lea r13d,[1859775393+r13*1+rbp] + xor eax,r11d + add r13d,ecx + rol edi,30 + add r13d,eax + rol r14d,1 + xor edx,DWORD[44+rsp] + mov eax,esi + mov DWORD[40+rsp],r14d + mov ecx,r13d + xor edx,DWORD[52+rsp] + xor eax,r11d + rol ecx,5 + xor edx,DWORD[12+rsp] + lea r12d,[1859775393+r12*1+r14] + xor eax,edi + add r12d,ecx + rol esi,30 + add r12d,eax + rol edx,1 + xor ebp,DWORD[48+rsp] + mov eax,r13d + mov DWORD[44+rsp],edx + mov ecx,r12d + xor ebp,DWORD[56+rsp] + xor eax,edi + rol ecx,5 + xor ebp,DWORD[16+rsp] + lea r11d,[1859775393+r11*1+rdx] + xor eax,esi + add r11d,ecx + rol r13d,30 + add r11d,eax + rol ebp,1 + xor r14d,DWORD[52+rsp] + mov eax,r12d + mov DWORD[48+rsp],ebp + mov ecx,r11d + xor r14d,DWORD[60+rsp] + xor eax,esi + rol ecx,5 + xor r14d,DWORD[20+rsp] + lea edi,[1859775393+rdi*1+rbp] + xor eax,r13d + add edi,ecx + rol r12d,30 + add edi,eax + rol r14d,1 + xor edx,DWORD[56+rsp] + mov eax,r11d + mov DWORD[52+rsp],r14d + mov ecx,edi + xor edx,DWORD[rsp] + xor eax,r13d + rol ecx,5 + xor edx,DWORD[24+rsp] + lea esi,[1859775393+rsi*1+r14] + xor eax,r12d + add esi,ecx + rol r11d,30 + add esi,eax + rol edx,1 + xor ebp,DWORD[60+rsp] + mov eax,edi + mov DWORD[56+rsp],edx + mov ecx,esi + xor ebp,DWORD[4+rsp] + xor eax,r12d + rol ecx,5 + xor ebp,DWORD[28+rsp] + lea r13d,[1859775393+r13*1+rdx] + xor eax,r11d + add r13d,ecx + rol edi,30 + add r13d,eax + rol ebp,1 + xor r14d,DWORD[rsp] + mov eax,esi + mov DWORD[60+rsp],ebp + mov ecx,r13d + xor r14d,DWORD[8+rsp] + xor eax,r11d + rol ecx,5 + xor r14d,DWORD[32+rsp] + lea r12d,[1859775393+r12*1+rbp] + xor eax,edi + add r12d,ecx + rol esi,30 + add r12d,eax + rol r14d,1 + xor edx,DWORD[4+rsp] + mov eax,r13d + mov DWORD[rsp],r14d + mov ecx,r12d + xor edx,DWORD[12+rsp] + xor eax,edi + rol ecx,5 + xor edx,DWORD[36+rsp] + lea r11d,[1859775393+r11*1+r14] + xor eax,esi + add r11d,ecx + rol r13d,30 + add r11d,eax + rol edx,1 + xor ebp,DWORD[8+rsp] + mov eax,r12d + mov DWORD[4+rsp],edx + mov ecx,r11d + xor ebp,DWORD[16+rsp] + xor eax,esi + rol ecx,5 + xor ebp,DWORD[40+rsp] + lea edi,[1859775393+rdi*1+rdx] + xor eax,r13d + add edi,ecx + rol r12d,30 + add edi,eax + rol ebp,1 + xor r14d,DWORD[12+rsp] + mov eax,r11d + mov DWORD[8+rsp],ebp + mov ecx,edi + xor r14d,DWORD[20+rsp] + xor eax,r13d + rol ecx,5 + xor r14d,DWORD[44+rsp] + lea esi,[1859775393+rsi*1+rbp] + xor eax,r12d + add esi,ecx + rol r11d,30 + add esi,eax + rol r14d,1 + xor edx,DWORD[16+rsp] + mov eax,edi + mov DWORD[12+rsp],r14d + mov ecx,esi + xor edx,DWORD[24+rsp] + xor eax,r12d + rol ecx,5 + xor edx,DWORD[48+rsp] + lea r13d,[1859775393+r13*1+r14] + xor eax,r11d + add r13d,ecx + rol edi,30 + add r13d,eax + rol edx,1 + xor ebp,DWORD[20+rsp] + mov eax,esi + mov DWORD[16+rsp],edx + mov ecx,r13d + xor ebp,DWORD[28+rsp] + xor eax,r11d + rol ecx,5 + xor ebp,DWORD[52+rsp] + lea r12d,[1859775393+r12*1+rdx] + xor eax,edi + add r12d,ecx + rol esi,30 + add r12d,eax + rol ebp,1 + xor r14d,DWORD[24+rsp] + mov eax,r13d + mov DWORD[20+rsp],ebp + mov ecx,r12d + xor r14d,DWORD[32+rsp] + xor eax,edi + rol ecx,5 + xor r14d,DWORD[56+rsp] + lea r11d,[1859775393+r11*1+rbp] + xor eax,esi + add r11d,ecx + rol r13d,30 + add r11d,eax + rol r14d,1 + xor edx,DWORD[28+rsp] + mov eax,r12d + mov DWORD[24+rsp],r14d + mov ecx,r11d + xor edx,DWORD[36+rsp] + xor eax,esi + rol ecx,5 + xor edx,DWORD[60+rsp] + lea edi,[1859775393+rdi*1+r14] + xor eax,r13d + add edi,ecx + rol r12d,30 + add edi,eax + rol edx,1 + xor 
ebp,DWORD[32+rsp] + mov eax,r11d + mov DWORD[28+rsp],edx + mov ecx,edi + xor ebp,DWORD[40+rsp] + xor eax,r13d + rol ecx,5 + xor ebp,DWORD[rsp] + lea esi,[1859775393+rsi*1+rdx] + xor eax,r12d + add esi,ecx + rol r11d,30 + add esi,eax + rol ebp,1 + xor r14d,DWORD[36+rsp] + mov eax,r12d + mov DWORD[32+rsp],ebp + mov ebx,r12d + xor r14d,DWORD[44+rsp] + and eax,r11d + mov ecx,esi + xor r14d,DWORD[4+rsp] + lea r13d,[((-1894007588))+r13*1+rbp] + xor ebx,r11d + rol ecx,5 + add r13d,eax + rol r14d,1 + and ebx,edi + add r13d,ecx + rol edi,30 + add r13d,ebx + xor edx,DWORD[40+rsp] + mov eax,r11d + mov DWORD[36+rsp],r14d + mov ebx,r11d + xor edx,DWORD[48+rsp] + and eax,edi + mov ecx,r13d + xor edx,DWORD[8+rsp] + lea r12d,[((-1894007588))+r12*1+r14] + xor ebx,edi + rol ecx,5 + add r12d,eax + rol edx,1 + and ebx,esi + add r12d,ecx + rol esi,30 + add r12d,ebx + xor ebp,DWORD[44+rsp] + mov eax,edi + mov DWORD[40+rsp],edx + mov ebx,edi + xor ebp,DWORD[52+rsp] + and eax,esi + mov ecx,r12d + xor ebp,DWORD[12+rsp] + lea r11d,[((-1894007588))+r11*1+rdx] + xor ebx,esi + rol ecx,5 + add r11d,eax + rol ebp,1 + and ebx,r13d + add r11d,ecx + rol r13d,30 + add r11d,ebx + xor r14d,DWORD[48+rsp] + mov eax,esi + mov DWORD[44+rsp],ebp + mov ebx,esi + xor r14d,DWORD[56+rsp] + and eax,r13d + mov ecx,r11d + xor r14d,DWORD[16+rsp] + lea edi,[((-1894007588))+rdi*1+rbp] + xor ebx,r13d + rol ecx,5 + add edi,eax + rol r14d,1 + and ebx,r12d + add edi,ecx + rol r12d,30 + add edi,ebx + xor edx,DWORD[52+rsp] + mov eax,r13d + mov DWORD[48+rsp],r14d + mov ebx,r13d + xor edx,DWORD[60+rsp] + and eax,r12d + mov ecx,edi + xor edx,DWORD[20+rsp] + lea esi,[((-1894007588))+rsi*1+r14] + xor ebx,r12d + rol ecx,5 + add esi,eax + rol edx,1 + and ebx,r11d + add esi,ecx + rol r11d,30 + add esi,ebx + xor ebp,DWORD[56+rsp] + mov eax,r12d + mov DWORD[52+rsp],edx + mov ebx,r12d + xor ebp,DWORD[rsp] + and eax,r11d + mov ecx,esi + xor ebp,DWORD[24+rsp] + lea r13d,[((-1894007588))+r13*1+rdx] + xor ebx,r11d + rol ecx,5 + add r13d,eax + rol ebp,1 + and ebx,edi + add r13d,ecx + rol edi,30 + add r13d,ebx + xor r14d,DWORD[60+rsp] + mov eax,r11d + mov DWORD[56+rsp],ebp + mov ebx,r11d + xor r14d,DWORD[4+rsp] + and eax,edi + mov ecx,r13d + xor r14d,DWORD[28+rsp] + lea r12d,[((-1894007588))+r12*1+rbp] + xor ebx,edi + rol ecx,5 + add r12d,eax + rol r14d,1 + and ebx,esi + add r12d,ecx + rol esi,30 + add r12d,ebx + xor edx,DWORD[rsp] + mov eax,edi + mov DWORD[60+rsp],r14d + mov ebx,edi + xor edx,DWORD[8+rsp] + and eax,esi + mov ecx,r12d + xor edx,DWORD[32+rsp] + lea r11d,[((-1894007588))+r11*1+r14] + xor ebx,esi + rol ecx,5 + add r11d,eax + rol edx,1 + and ebx,r13d + add r11d,ecx + rol r13d,30 + add r11d,ebx + xor ebp,DWORD[4+rsp] + mov eax,esi + mov DWORD[rsp],edx + mov ebx,esi + xor ebp,DWORD[12+rsp] + and eax,r13d + mov ecx,r11d + xor ebp,DWORD[36+rsp] + lea edi,[((-1894007588))+rdi*1+rdx] + xor ebx,r13d + rol ecx,5 + add edi,eax + rol ebp,1 + and ebx,r12d + add edi,ecx + rol r12d,30 + add edi,ebx + xor r14d,DWORD[8+rsp] + mov eax,r13d + mov DWORD[4+rsp],ebp + mov ebx,r13d + xor r14d,DWORD[16+rsp] + and eax,r12d + mov ecx,edi + xor r14d,DWORD[40+rsp] + lea esi,[((-1894007588))+rsi*1+rbp] + xor ebx,r12d + rol ecx,5 + add esi,eax + rol r14d,1 + and ebx,r11d + add esi,ecx + rol r11d,30 + add esi,ebx + xor edx,DWORD[12+rsp] + mov eax,r12d + mov DWORD[8+rsp],r14d + mov ebx,r12d + xor edx,DWORD[20+rsp] + and eax,r11d + mov ecx,esi + xor edx,DWORD[44+rsp] + lea r13d,[((-1894007588))+r13*1+r14] + xor ebx,r11d + rol ecx,5 + add r13d,eax + rol edx,1 + and ebx,edi + add 
r13d,ecx + rol edi,30 + add r13d,ebx + xor ebp,DWORD[16+rsp] + mov eax,r11d + mov DWORD[12+rsp],edx + mov ebx,r11d + xor ebp,DWORD[24+rsp] + and eax,edi + mov ecx,r13d + xor ebp,DWORD[48+rsp] + lea r12d,[((-1894007588))+r12*1+rdx] + xor ebx,edi + rol ecx,5 + add r12d,eax + rol ebp,1 + and ebx,esi + add r12d,ecx + rol esi,30 + add r12d,ebx + xor r14d,DWORD[20+rsp] + mov eax,edi + mov DWORD[16+rsp],ebp + mov ebx,edi + xor r14d,DWORD[28+rsp] + and eax,esi + mov ecx,r12d + xor r14d,DWORD[52+rsp] + lea r11d,[((-1894007588))+r11*1+rbp] + xor ebx,esi + rol ecx,5 + add r11d,eax + rol r14d,1 + and ebx,r13d + add r11d,ecx + rol r13d,30 + add r11d,ebx + xor edx,DWORD[24+rsp] + mov eax,esi + mov DWORD[20+rsp],r14d + mov ebx,esi + xor edx,DWORD[32+rsp] + and eax,r13d + mov ecx,r11d + xor edx,DWORD[56+rsp] + lea edi,[((-1894007588))+rdi*1+r14] + xor ebx,r13d + rol ecx,5 + add edi,eax + rol edx,1 + and ebx,r12d + add edi,ecx + rol r12d,30 + add edi,ebx + xor ebp,DWORD[28+rsp] + mov eax,r13d + mov DWORD[24+rsp],edx + mov ebx,r13d + xor ebp,DWORD[36+rsp] + and eax,r12d + mov ecx,edi + xor ebp,DWORD[60+rsp] + lea esi,[((-1894007588))+rsi*1+rdx] + xor ebx,r12d + rol ecx,5 + add esi,eax + rol ebp,1 + and ebx,r11d + add esi,ecx + rol r11d,30 + add esi,ebx + xor r14d,DWORD[32+rsp] + mov eax,r12d + mov DWORD[28+rsp],ebp + mov ebx,r12d + xor r14d,DWORD[40+rsp] + and eax,r11d + mov ecx,esi + xor r14d,DWORD[rsp] + lea r13d,[((-1894007588))+r13*1+rbp] + xor ebx,r11d + rol ecx,5 + add r13d,eax + rol r14d,1 + and ebx,edi + add r13d,ecx + rol edi,30 + add r13d,ebx + xor edx,DWORD[36+rsp] + mov eax,r11d + mov DWORD[32+rsp],r14d + mov ebx,r11d + xor edx,DWORD[44+rsp] + and eax,edi + mov ecx,r13d + xor edx,DWORD[4+rsp] + lea r12d,[((-1894007588))+r12*1+r14] + xor ebx,edi + rol ecx,5 + add r12d,eax + rol edx,1 + and ebx,esi + add r12d,ecx + rol esi,30 + add r12d,ebx + xor ebp,DWORD[40+rsp] + mov eax,edi + mov DWORD[36+rsp],edx + mov ebx,edi + xor ebp,DWORD[48+rsp] + and eax,esi + mov ecx,r12d + xor ebp,DWORD[8+rsp] + lea r11d,[((-1894007588))+r11*1+rdx] + xor ebx,esi + rol ecx,5 + add r11d,eax + rol ebp,1 + and ebx,r13d + add r11d,ecx + rol r13d,30 + add r11d,ebx + xor r14d,DWORD[44+rsp] + mov eax,esi + mov DWORD[40+rsp],ebp + mov ebx,esi + xor r14d,DWORD[52+rsp] + and eax,r13d + mov ecx,r11d + xor r14d,DWORD[12+rsp] + lea edi,[((-1894007588))+rdi*1+rbp] + xor ebx,r13d + rol ecx,5 + add edi,eax + rol r14d,1 + and ebx,r12d + add edi,ecx + rol r12d,30 + add edi,ebx + xor edx,DWORD[48+rsp] + mov eax,r13d + mov DWORD[44+rsp],r14d + mov ebx,r13d + xor edx,DWORD[56+rsp] + and eax,r12d + mov ecx,edi + xor edx,DWORD[16+rsp] + lea esi,[((-1894007588))+rsi*1+r14] + xor ebx,r12d + rol ecx,5 + add esi,eax + rol edx,1 + and ebx,r11d + add esi,ecx + rol r11d,30 + add esi,ebx + xor ebp,DWORD[52+rsp] + mov eax,edi + mov DWORD[48+rsp],edx + mov ecx,esi + xor ebp,DWORD[60+rsp] + xor eax,r12d + rol ecx,5 + xor ebp,DWORD[20+rsp] + lea r13d,[((-899497514))+r13*1+rdx] + xor eax,r11d + add r13d,ecx + rol edi,30 + add r13d,eax + rol ebp,1 + xor r14d,DWORD[56+rsp] + mov eax,esi + mov DWORD[52+rsp],ebp + mov ecx,r13d + xor r14d,DWORD[rsp] + xor eax,r11d + rol ecx,5 + xor r14d,DWORD[24+rsp] + lea r12d,[((-899497514))+r12*1+rbp] + xor eax,edi + add r12d,ecx + rol esi,30 + add r12d,eax + rol r14d,1 + xor edx,DWORD[60+rsp] + mov eax,r13d + mov DWORD[56+rsp],r14d + mov ecx,r12d + xor edx,DWORD[4+rsp] + xor eax,edi + rol ecx,5 + xor edx,DWORD[28+rsp] + lea r11d,[((-899497514))+r11*1+r14] + xor eax,esi + add r11d,ecx + rol r13d,30 + add r11d,eax + rol 
edx,1 + xor ebp,DWORD[rsp] + mov eax,r12d + mov DWORD[60+rsp],edx + mov ecx,r11d + xor ebp,DWORD[8+rsp] + xor eax,esi + rol ecx,5 + xor ebp,DWORD[32+rsp] + lea edi,[((-899497514))+rdi*1+rdx] + xor eax,r13d + add edi,ecx + rol r12d,30 + add edi,eax + rol ebp,1 + xor r14d,DWORD[4+rsp] + mov eax,r11d + mov DWORD[rsp],ebp + mov ecx,edi + xor r14d,DWORD[12+rsp] + xor eax,r13d + rol ecx,5 + xor r14d,DWORD[36+rsp] + lea esi,[((-899497514))+rsi*1+rbp] + xor eax,r12d + add esi,ecx + rol r11d,30 + add esi,eax + rol r14d,1 + xor edx,DWORD[8+rsp] + mov eax,edi + mov DWORD[4+rsp],r14d + mov ecx,esi + xor edx,DWORD[16+rsp] + xor eax,r12d + rol ecx,5 + xor edx,DWORD[40+rsp] + lea r13d,[((-899497514))+r13*1+r14] + xor eax,r11d + add r13d,ecx + rol edi,30 + add r13d,eax + rol edx,1 + xor ebp,DWORD[12+rsp] + mov eax,esi + mov DWORD[8+rsp],edx + mov ecx,r13d + xor ebp,DWORD[20+rsp] + xor eax,r11d + rol ecx,5 + xor ebp,DWORD[44+rsp] + lea r12d,[((-899497514))+r12*1+rdx] + xor eax,edi + add r12d,ecx + rol esi,30 + add r12d,eax + rol ebp,1 + xor r14d,DWORD[16+rsp] + mov eax,r13d + mov DWORD[12+rsp],ebp + mov ecx,r12d + xor r14d,DWORD[24+rsp] + xor eax,edi + rol ecx,5 + xor r14d,DWORD[48+rsp] + lea r11d,[((-899497514))+r11*1+rbp] + xor eax,esi + add r11d,ecx + rol r13d,30 + add r11d,eax + rol r14d,1 + xor edx,DWORD[20+rsp] + mov eax,r12d + mov DWORD[16+rsp],r14d + mov ecx,r11d + xor edx,DWORD[28+rsp] + xor eax,esi + rol ecx,5 + xor edx,DWORD[52+rsp] + lea edi,[((-899497514))+rdi*1+r14] + xor eax,r13d + add edi,ecx + rol r12d,30 + add edi,eax + rol edx,1 + xor ebp,DWORD[24+rsp] + mov eax,r11d + mov DWORD[20+rsp],edx + mov ecx,edi + xor ebp,DWORD[32+rsp] + xor eax,r13d + rol ecx,5 + xor ebp,DWORD[56+rsp] + lea esi,[((-899497514))+rsi*1+rdx] + xor eax,r12d + add esi,ecx + rol r11d,30 + add esi,eax + rol ebp,1 + xor r14d,DWORD[28+rsp] + mov eax,edi + mov DWORD[24+rsp],ebp + mov ecx,esi + xor r14d,DWORD[36+rsp] + xor eax,r12d + rol ecx,5 + xor r14d,DWORD[60+rsp] + lea r13d,[((-899497514))+r13*1+rbp] + xor eax,r11d + add r13d,ecx + rol edi,30 + add r13d,eax + rol r14d,1 + xor edx,DWORD[32+rsp] + mov eax,esi + mov DWORD[28+rsp],r14d + mov ecx,r13d + xor edx,DWORD[40+rsp] + xor eax,r11d + rol ecx,5 + xor edx,DWORD[rsp] + lea r12d,[((-899497514))+r12*1+r14] + xor eax,edi + add r12d,ecx + rol esi,30 + add r12d,eax + rol edx,1 + xor ebp,DWORD[36+rsp] + mov eax,r13d + + mov ecx,r12d + xor ebp,DWORD[44+rsp] + xor eax,edi + rol ecx,5 + xor ebp,DWORD[4+rsp] + lea r11d,[((-899497514))+r11*1+rdx] + xor eax,esi + add r11d,ecx + rol r13d,30 + add r11d,eax + rol ebp,1 + xor r14d,DWORD[40+rsp] + mov eax,r12d + + mov ecx,r11d + xor r14d,DWORD[48+rsp] + xor eax,esi + rol ecx,5 + xor r14d,DWORD[8+rsp] + lea edi,[((-899497514))+rdi*1+rbp] + xor eax,r13d + add edi,ecx + rol r12d,30 + add edi,eax + rol r14d,1 + xor edx,DWORD[44+rsp] + mov eax,r11d + + mov ecx,edi + xor edx,DWORD[52+rsp] + xor eax,r13d + rol ecx,5 + xor edx,DWORD[12+rsp] + lea esi,[((-899497514))+rsi*1+r14] + xor eax,r12d + add esi,ecx + rol r11d,30 + add esi,eax + rol edx,1 + xor ebp,DWORD[48+rsp] + mov eax,edi + + mov ecx,esi + xor ebp,DWORD[56+rsp] + xor eax,r12d + rol ecx,5 + xor ebp,DWORD[16+rsp] + lea r13d,[((-899497514))+r13*1+rdx] + xor eax,r11d + add r13d,ecx + rol edi,30 + add r13d,eax + rol ebp,1 + xor r14d,DWORD[52+rsp] + mov eax,esi + + mov ecx,r13d + xor r14d,DWORD[60+rsp] + xor eax,r11d + rol ecx,5 + xor r14d,DWORD[20+rsp] + lea r12d,[((-899497514))+r12*1+rbp] + xor eax,edi + add r12d,ecx + rol esi,30 + add r12d,eax + rol r14d,1 + xor edx,DWORD[56+rsp] + mov 
eax,r13d + + mov ecx,r12d + xor edx,DWORD[rsp] + xor eax,edi + rol ecx,5 + xor edx,DWORD[24+rsp] + lea r11d,[((-899497514))+r11*1+r14] + xor eax,esi + add r11d,ecx + rol r13d,30 + add r11d,eax + rol edx,1 + xor ebp,DWORD[60+rsp] + mov eax,r12d + + mov ecx,r11d + xor ebp,DWORD[4+rsp] + xor eax,esi + rol ecx,5 + xor ebp,DWORD[28+rsp] + lea edi,[((-899497514))+rdi*1+rdx] + xor eax,r13d + add edi,ecx + rol r12d,30 + add edi,eax + rol ebp,1 + mov eax,r11d + mov ecx,edi + xor eax,r13d + lea esi,[((-899497514))+rsi*1+rbp] + rol ecx,5 + xor eax,r12d + add esi,ecx + rol r11d,30 + add esi,eax + add esi,DWORD[r8] + add edi,DWORD[4+r8] + add r11d,DWORD[8+r8] + add r12d,DWORD[12+r8] + add r13d,DWORD[16+r8] + mov DWORD[r8],esi + mov DWORD[4+r8],edi + mov DWORD[8+r8],r11d + mov DWORD[12+r8],r12d + mov DWORD[16+r8],r13d + + sub r10,1 + lea r9,[64+r9] + jnz NEAR $L$loop + + mov rsi,QWORD[64+rsp] + + mov r14,QWORD[((-40))+rsi] + + mov r13,QWORD[((-32))+rsi] + + mov r12,QWORD[((-24))+rsi] + + mov rbp,QWORD[((-16))+rsi] + + mov rbx,QWORD[((-8))+rsi] + + lea rsp,[rsi] + +$L$epilogue: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_sha1_block_data_order: + +ALIGN 32 +sha1_block_data_order_shaext: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_sha1_block_data_order_shaext: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + + +_shaext_shortcut: + + lea rsp,[((-72))+rsp] + movaps XMMWORD[(-8-64)+rax],xmm6 + movaps XMMWORD[(-8-48)+rax],xmm7 + movaps XMMWORD[(-8-32)+rax],xmm8 + movaps XMMWORD[(-8-16)+rax],xmm9 +$L$prologue_shaext: + movdqu xmm0,XMMWORD[rdi] + movd xmm1,DWORD[16+rdi] + movdqa xmm3,XMMWORD[((K_XX_XX+160))] + + movdqu xmm4,XMMWORD[rsi] + pshufd xmm0,xmm0,27 + movdqu xmm5,XMMWORD[16+rsi] + pshufd xmm1,xmm1,27 + movdqu xmm6,XMMWORD[32+rsi] +DB 102,15,56,0,227 + movdqu xmm7,XMMWORD[48+rsi] +DB 102,15,56,0,235 +DB 102,15,56,0,243 + movdqa xmm9,xmm1 +DB 102,15,56,0,251 + jmp NEAR $L$oop_shaext + +ALIGN 16 +$L$oop_shaext: + dec rdx + lea r8,[64+rsi] + paddd xmm1,xmm4 + cmovne rsi,r8 + movdqa xmm8,xmm0 +DB 15,56,201,229 + movdqa xmm2,xmm0 +DB 15,58,204,193,0 +DB 15,56,200,213 + pxor xmm4,xmm6 +DB 15,56,201,238 +DB 15,56,202,231 + + movdqa xmm1,xmm0 +DB 15,58,204,194,0 +DB 15,56,200,206 + pxor xmm5,xmm7 +DB 15,56,202,236 +DB 15,56,201,247 + movdqa xmm2,xmm0 +DB 15,58,204,193,0 +DB 15,56,200,215 + pxor xmm6,xmm4 +DB 15,56,201,252 +DB 15,56,202,245 + + movdqa xmm1,xmm0 +DB 15,58,204,194,0 +DB 15,56,200,204 + pxor xmm7,xmm5 +DB 15,56,202,254 +DB 15,56,201,229 + movdqa xmm2,xmm0 +DB 15,58,204,193,0 +DB 15,56,200,213 + pxor xmm4,xmm6 +DB 15,56,201,238 +DB 15,56,202,231 + + movdqa xmm1,xmm0 +DB 15,58,204,194,1 +DB 15,56,200,206 + pxor xmm5,xmm7 +DB 15,56,202,236 +DB 15,56,201,247 + movdqa xmm2,xmm0 +DB 15,58,204,193,1 +DB 15,56,200,215 + pxor xmm6,xmm4 +DB 15,56,201,252 +DB 15,56,202,245 + + movdqa xmm1,xmm0 +DB 15,58,204,194,1 +DB 15,56,200,204 + pxor xmm7,xmm5 +DB 15,56,202,254 +DB 15,56,201,229 + movdqa xmm2,xmm0 +DB 15,58,204,193,1 +DB 15,56,200,213 + pxor xmm4,xmm6 +DB 15,56,201,238 +DB 15,56,202,231 + + movdqa xmm1,xmm0 +DB 15,58,204,194,1 +DB 15,56,200,206 + pxor xmm5,xmm7 +DB 15,56,202,236 +DB 15,56,201,247 + movdqa xmm2,xmm0 +DB 15,58,204,193,2 +DB 15,56,200,215 + pxor xmm6,xmm4 +DB 15,56,201,252 +DB 15,56,202,245 + + movdqa xmm1,xmm0 +DB 15,58,204,194,2 +DB 15,56,200,204 + pxor xmm7,xmm5 +DB 15,56,202,254 +DB 15,56,201,229 + movdqa xmm2,xmm0 +DB 15,58,204,193,2 +DB 15,56,200,213 + pxor xmm4,xmm6 +DB 15,56,201,238 +DB 
15,56,202,231 + + movdqa xmm1,xmm0 +DB 15,58,204,194,2 +DB 15,56,200,206 + pxor xmm5,xmm7 +DB 15,56,202,236 +DB 15,56,201,247 + movdqa xmm2,xmm0 +DB 15,58,204,193,2 +DB 15,56,200,215 + pxor xmm6,xmm4 +DB 15,56,201,252 +DB 15,56,202,245 + + movdqa xmm1,xmm0 +DB 15,58,204,194,3 +DB 15,56,200,204 + pxor xmm7,xmm5 +DB 15,56,202,254 + movdqu xmm4,XMMWORD[rsi] + movdqa xmm2,xmm0 +DB 15,58,204,193,3 +DB 15,56,200,213 + movdqu xmm5,XMMWORD[16+rsi] +DB 102,15,56,0,227 + + movdqa xmm1,xmm0 +DB 15,58,204,194,3 +DB 15,56,200,206 + movdqu xmm6,XMMWORD[32+rsi] +DB 102,15,56,0,235 + + movdqa xmm2,xmm0 +DB 15,58,204,193,3 +DB 15,56,200,215 + movdqu xmm7,XMMWORD[48+rsi] +DB 102,15,56,0,243 + + movdqa xmm1,xmm0 +DB 15,58,204,194,3 +DB 65,15,56,200,201 +DB 102,15,56,0,251 + + paddd xmm0,xmm8 + movdqa xmm9,xmm1 + + jnz NEAR $L$oop_shaext + + pshufd xmm0,xmm0,27 + pshufd xmm1,xmm1,27 + movdqu XMMWORD[rdi],xmm0 + movd DWORD[16+rdi],xmm1 + movaps xmm6,XMMWORD[((-8-64))+rax] + movaps xmm7,XMMWORD[((-8-48))+rax] + movaps xmm8,XMMWORD[((-8-32))+rax] + movaps xmm9,XMMWORD[((-8-16))+rax] + mov rsp,rax +$L$epilogue_shaext: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_sha1_block_data_order_shaext: + +ALIGN 16 +sha1_block_data_order_ssse3: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_sha1_block_data_order_ssse3: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + + +_ssse3_shortcut: + + mov r11,rsp + + push rbx + + push rbp + + push r12 + + push r13 + + push r14 + + lea rsp,[((-160))+rsp] + movaps XMMWORD[(-40-96)+r11],xmm6 + movaps XMMWORD[(-40-80)+r11],xmm7 + movaps XMMWORD[(-40-64)+r11],xmm8 + movaps XMMWORD[(-40-48)+r11],xmm9 + movaps XMMWORD[(-40-32)+r11],xmm10 + movaps XMMWORD[(-40-16)+r11],xmm11 +$L$prologue_ssse3: + and rsp,-64 + mov r8,rdi + mov r9,rsi + mov r10,rdx + + shl r10,6 + add r10,r9 + lea r14,[((K_XX_XX+64))] + + mov eax,DWORD[r8] + mov ebx,DWORD[4+r8] + mov ecx,DWORD[8+r8] + mov edx,DWORD[12+r8] + mov esi,ebx + mov ebp,DWORD[16+r8] + mov edi,ecx + xor edi,edx + and esi,edi + + movdqa xmm6,XMMWORD[64+r14] + movdqa xmm9,XMMWORD[((-64))+r14] + movdqu xmm0,XMMWORD[r9] + movdqu xmm1,XMMWORD[16+r9] + movdqu xmm2,XMMWORD[32+r9] + movdqu xmm3,XMMWORD[48+r9] +DB 102,15,56,0,198 +DB 102,15,56,0,206 +DB 102,15,56,0,214 + add r9,64 + paddd xmm0,xmm9 +DB 102,15,56,0,222 + paddd xmm1,xmm9 + paddd xmm2,xmm9 + movdqa XMMWORD[rsp],xmm0 + psubd xmm0,xmm9 + movdqa XMMWORD[16+rsp],xmm1 + psubd xmm1,xmm9 + movdqa XMMWORD[32+rsp],xmm2 + psubd xmm2,xmm9 + jmp NEAR $L$oop_ssse3 +ALIGN 16 +$L$oop_ssse3: + ror ebx,2 + pshufd xmm4,xmm0,238 + xor esi,edx + movdqa xmm8,xmm3 + paddd xmm9,xmm3 + mov edi,eax + add ebp,DWORD[rsp] + punpcklqdq xmm4,xmm1 + xor ebx,ecx + rol eax,5 + add ebp,esi + psrldq xmm8,4 + and edi,ebx + xor ebx,ecx + pxor xmm4,xmm0 + add ebp,eax + ror eax,7 + pxor xmm8,xmm2 + xor edi,ecx + mov esi,ebp + add edx,DWORD[4+rsp] + pxor xmm4,xmm8 + xor eax,ebx + rol ebp,5 + movdqa XMMWORD[48+rsp],xmm9 + add edx,edi + and esi,eax + movdqa xmm10,xmm4 + xor eax,ebx + add edx,ebp + ror ebp,7 + movdqa xmm8,xmm4 + xor esi,ebx + pslldq xmm10,12 + paddd xmm4,xmm4 + mov edi,edx + add ecx,DWORD[8+rsp] + psrld xmm8,31 + xor ebp,eax + rol edx,5 + add ecx,esi + movdqa xmm9,xmm10 + and edi,ebp + xor ebp,eax + psrld xmm10,30 + add ecx,edx + ror edx,7 + por xmm4,xmm8 + xor edi,eax + mov esi,ecx + add ebx,DWORD[12+rsp] + pslld xmm9,2 + pxor xmm4,xmm10 + xor edx,ebp + movdqa xmm10,XMMWORD[((-64))+r14] + rol ecx,5 + add ebx,edi + and esi,edx + 
pxor xmm4,xmm9 + xor edx,ebp + add ebx,ecx + ror ecx,7 + pshufd xmm5,xmm1,238 + xor esi,ebp + movdqa xmm9,xmm4 + paddd xmm10,xmm4 + mov edi,ebx + add eax,DWORD[16+rsp] + punpcklqdq xmm5,xmm2 + xor ecx,edx + rol ebx,5 + add eax,esi + psrldq xmm9,4 + and edi,ecx + xor ecx,edx + pxor xmm5,xmm1 + add eax,ebx + ror ebx,7 + pxor xmm9,xmm3 + xor edi,edx + mov esi,eax + add ebp,DWORD[20+rsp] + pxor xmm5,xmm9 + xor ebx,ecx + rol eax,5 + movdqa XMMWORD[rsp],xmm10 + add ebp,edi + and esi,ebx + movdqa xmm8,xmm5 + xor ebx,ecx + add ebp,eax + ror eax,7 + movdqa xmm9,xmm5 + xor esi,ecx + pslldq xmm8,12 + paddd xmm5,xmm5 + mov edi,ebp + add edx,DWORD[24+rsp] + psrld xmm9,31 + xor eax,ebx + rol ebp,5 + add edx,esi + movdqa xmm10,xmm8 + and edi,eax + xor eax,ebx + psrld xmm8,30 + add edx,ebp + ror ebp,7 + por xmm5,xmm9 + xor edi,ebx + mov esi,edx + add ecx,DWORD[28+rsp] + pslld xmm10,2 + pxor xmm5,xmm8 + xor ebp,eax + movdqa xmm8,XMMWORD[((-32))+r14] + rol edx,5 + add ecx,edi + and esi,ebp + pxor xmm5,xmm10 + xor ebp,eax + add ecx,edx + ror edx,7 + pshufd xmm6,xmm2,238 + xor esi,eax + movdqa xmm10,xmm5 + paddd xmm8,xmm5 + mov edi,ecx + add ebx,DWORD[32+rsp] + punpcklqdq xmm6,xmm3 + xor edx,ebp + rol ecx,5 + add ebx,esi + psrldq xmm10,4 + and edi,edx + xor edx,ebp + pxor xmm6,xmm2 + add ebx,ecx + ror ecx,7 + pxor xmm10,xmm4 + xor edi,ebp + mov esi,ebx + add eax,DWORD[36+rsp] + pxor xmm6,xmm10 + xor ecx,edx + rol ebx,5 + movdqa XMMWORD[16+rsp],xmm8 + add eax,edi + and esi,ecx + movdqa xmm9,xmm6 + xor ecx,edx + add eax,ebx + ror ebx,7 + movdqa xmm10,xmm6 + xor esi,edx + pslldq xmm9,12 + paddd xmm6,xmm6 + mov edi,eax + add ebp,DWORD[40+rsp] + psrld xmm10,31 + xor ebx,ecx + rol eax,5 + add ebp,esi + movdqa xmm8,xmm9 + and edi,ebx + xor ebx,ecx + psrld xmm9,30 + add ebp,eax + ror eax,7 + por xmm6,xmm10 + xor edi,ecx + mov esi,ebp + add edx,DWORD[44+rsp] + pslld xmm8,2 + pxor xmm6,xmm9 + xor eax,ebx + movdqa xmm9,XMMWORD[((-32))+r14] + rol ebp,5 + add edx,edi + and esi,eax + pxor xmm6,xmm8 + xor eax,ebx + add edx,ebp + ror ebp,7 + pshufd xmm7,xmm3,238 + xor esi,ebx + movdqa xmm8,xmm6 + paddd xmm9,xmm6 + mov edi,edx + add ecx,DWORD[48+rsp] + punpcklqdq xmm7,xmm4 + xor ebp,eax + rol edx,5 + add ecx,esi + psrldq xmm8,4 + and edi,ebp + xor ebp,eax + pxor xmm7,xmm3 + add ecx,edx + ror edx,7 + pxor xmm8,xmm5 + xor edi,eax + mov esi,ecx + add ebx,DWORD[52+rsp] + pxor xmm7,xmm8 + xor edx,ebp + rol ecx,5 + movdqa XMMWORD[32+rsp],xmm9 + add ebx,edi + and esi,edx + movdqa xmm10,xmm7 + xor edx,ebp + add ebx,ecx + ror ecx,7 + movdqa xmm8,xmm7 + xor esi,ebp + pslldq xmm10,12 + paddd xmm7,xmm7 + mov edi,ebx + add eax,DWORD[56+rsp] + psrld xmm8,31 + xor ecx,edx + rol ebx,5 + add eax,esi + movdqa xmm9,xmm10 + and edi,ecx + xor ecx,edx + psrld xmm10,30 + add eax,ebx + ror ebx,7 + por xmm7,xmm8 + xor edi,edx + mov esi,eax + add ebp,DWORD[60+rsp] + pslld xmm9,2 + pxor xmm7,xmm10 + xor ebx,ecx + movdqa xmm10,XMMWORD[((-32))+r14] + rol eax,5 + add ebp,edi + and esi,ebx + pxor xmm7,xmm9 + pshufd xmm9,xmm6,238 + xor ebx,ecx + add ebp,eax + ror eax,7 + pxor xmm0,xmm4 + xor esi,ecx + mov edi,ebp + add edx,DWORD[rsp] + punpcklqdq xmm9,xmm7 + xor eax,ebx + rol ebp,5 + pxor xmm0,xmm1 + add edx,esi + and edi,eax + movdqa xmm8,xmm10 + xor eax,ebx + paddd xmm10,xmm7 + add edx,ebp + pxor xmm0,xmm9 + ror ebp,7 + xor edi,ebx + mov esi,edx + add ecx,DWORD[4+rsp] + movdqa xmm9,xmm0 + xor ebp,eax + rol edx,5 + movdqa XMMWORD[48+rsp],xmm10 + add ecx,edi + and esi,ebp + xor ebp,eax + pslld xmm0,2 + add ecx,edx + ror edx,7 + psrld xmm9,30 + xor esi,eax + 
mov edi,ecx + add ebx,DWORD[8+rsp] + por xmm0,xmm9 + xor edx,ebp + rol ecx,5 + pshufd xmm10,xmm7,238 + add ebx,esi + and edi,edx + xor edx,ebp + add ebx,ecx + add eax,DWORD[12+rsp] + xor edi,ebp + mov esi,ebx + rol ebx,5 + add eax,edi + xor esi,edx + ror ecx,7 + add eax,ebx + pxor xmm1,xmm5 + add ebp,DWORD[16+rsp] + xor esi,ecx + punpcklqdq xmm10,xmm0 + mov edi,eax + rol eax,5 + pxor xmm1,xmm2 + add ebp,esi + xor edi,ecx + movdqa xmm9,xmm8 + ror ebx,7 + paddd xmm8,xmm0 + add ebp,eax + pxor xmm1,xmm10 + add edx,DWORD[20+rsp] + xor edi,ebx + mov esi,ebp + rol ebp,5 + movdqa xmm10,xmm1 + add edx,edi + xor esi,ebx + movdqa XMMWORD[rsp],xmm8 + ror eax,7 + add edx,ebp + add ecx,DWORD[24+rsp] + pslld xmm1,2 + xor esi,eax + mov edi,edx + psrld xmm10,30 + rol edx,5 + add ecx,esi + xor edi,eax + ror ebp,7 + por xmm1,xmm10 + add ecx,edx + add ebx,DWORD[28+rsp] + pshufd xmm8,xmm0,238 + xor edi,ebp + mov esi,ecx + rol ecx,5 + add ebx,edi + xor esi,ebp + ror edx,7 + add ebx,ecx + pxor xmm2,xmm6 + add eax,DWORD[32+rsp] + xor esi,edx + punpcklqdq xmm8,xmm1 + mov edi,ebx + rol ebx,5 + pxor xmm2,xmm3 + add eax,esi + xor edi,edx + movdqa xmm10,XMMWORD[r14] + ror ecx,7 + paddd xmm9,xmm1 + add eax,ebx + pxor xmm2,xmm8 + add ebp,DWORD[36+rsp] + xor edi,ecx + mov esi,eax + rol eax,5 + movdqa xmm8,xmm2 + add ebp,edi + xor esi,ecx + movdqa XMMWORD[16+rsp],xmm9 + ror ebx,7 + add ebp,eax + add edx,DWORD[40+rsp] + pslld xmm2,2 + xor esi,ebx + mov edi,ebp + psrld xmm8,30 + rol ebp,5 + add edx,esi + xor edi,ebx + ror eax,7 + por xmm2,xmm8 + add edx,ebp + add ecx,DWORD[44+rsp] + pshufd xmm9,xmm1,238 + xor edi,eax + mov esi,edx + rol edx,5 + add ecx,edi + xor esi,eax + ror ebp,7 + add ecx,edx + pxor xmm3,xmm7 + add ebx,DWORD[48+rsp] + xor esi,ebp + punpcklqdq xmm9,xmm2 + mov edi,ecx + rol ecx,5 + pxor xmm3,xmm4 + add ebx,esi + xor edi,ebp + movdqa xmm8,xmm10 + ror edx,7 + paddd xmm10,xmm2 + add ebx,ecx + pxor xmm3,xmm9 + add eax,DWORD[52+rsp] + xor edi,edx + mov esi,ebx + rol ebx,5 + movdqa xmm9,xmm3 + add eax,edi + xor esi,edx + movdqa XMMWORD[32+rsp],xmm10 + ror ecx,7 + add eax,ebx + add ebp,DWORD[56+rsp] + pslld xmm3,2 + xor esi,ecx + mov edi,eax + psrld xmm9,30 + rol eax,5 + add ebp,esi + xor edi,ecx + ror ebx,7 + por xmm3,xmm9 + add ebp,eax + add edx,DWORD[60+rsp] + pshufd xmm10,xmm2,238 + xor edi,ebx + mov esi,ebp + rol ebp,5 + add edx,edi + xor esi,ebx + ror eax,7 + add edx,ebp + pxor xmm4,xmm0 + add ecx,DWORD[rsp] + xor esi,eax + punpcklqdq xmm10,xmm3 + mov edi,edx + rol edx,5 + pxor xmm4,xmm5 + add ecx,esi + xor edi,eax + movdqa xmm9,xmm8 + ror ebp,7 + paddd xmm8,xmm3 + add ecx,edx + pxor xmm4,xmm10 + add ebx,DWORD[4+rsp] + xor edi,ebp + mov esi,ecx + rol ecx,5 + movdqa xmm10,xmm4 + add ebx,edi + xor esi,ebp + movdqa XMMWORD[48+rsp],xmm8 + ror edx,7 + add ebx,ecx + add eax,DWORD[8+rsp] + pslld xmm4,2 + xor esi,edx + mov edi,ebx + psrld xmm10,30 + rol ebx,5 + add eax,esi + xor edi,edx + ror ecx,7 + por xmm4,xmm10 + add eax,ebx + add ebp,DWORD[12+rsp] + pshufd xmm8,xmm3,238 + xor edi,ecx + mov esi,eax + rol eax,5 + add ebp,edi + xor esi,ecx + ror ebx,7 + add ebp,eax + pxor xmm5,xmm1 + add edx,DWORD[16+rsp] + xor esi,ebx + punpcklqdq xmm8,xmm4 + mov edi,ebp + rol ebp,5 + pxor xmm5,xmm6 + add edx,esi + xor edi,ebx + movdqa xmm10,xmm9 + ror eax,7 + paddd xmm9,xmm4 + add edx,ebp + pxor xmm5,xmm8 + add ecx,DWORD[20+rsp] + xor edi,eax + mov esi,edx + rol edx,5 + movdqa xmm8,xmm5 + add ecx,edi + xor esi,eax + movdqa XMMWORD[rsp],xmm9 + ror ebp,7 + add ecx,edx + add ebx,DWORD[24+rsp] + pslld xmm5,2 + xor esi,ebp + mov 
edi,ecx + psrld xmm8,30 + rol ecx,5 + add ebx,esi + xor edi,ebp + ror edx,7 + por xmm5,xmm8 + add ebx,ecx + add eax,DWORD[28+rsp] + pshufd xmm9,xmm4,238 + ror ecx,7 + mov esi,ebx + xor edi,edx + rol ebx,5 + add eax,edi + xor esi,ecx + xor ecx,edx + add eax,ebx + pxor xmm6,xmm2 + add ebp,DWORD[32+rsp] + and esi,ecx + xor ecx,edx + ror ebx,7 + punpcklqdq xmm9,xmm5 + mov edi,eax + xor esi,ecx + pxor xmm6,xmm7 + rol eax,5 + add ebp,esi + movdqa xmm8,xmm10 + xor edi,ebx + paddd xmm10,xmm5 + xor ebx,ecx + pxor xmm6,xmm9 + add ebp,eax + add edx,DWORD[36+rsp] + and edi,ebx + xor ebx,ecx + ror eax,7 + movdqa xmm9,xmm6 + mov esi,ebp + xor edi,ebx + movdqa XMMWORD[16+rsp],xmm10 + rol ebp,5 + add edx,edi + xor esi,eax + pslld xmm6,2 + xor eax,ebx + add edx,ebp + psrld xmm9,30 + add ecx,DWORD[40+rsp] + and esi,eax + xor eax,ebx + por xmm6,xmm9 + ror ebp,7 + mov edi,edx + xor esi,eax + rol edx,5 + pshufd xmm10,xmm5,238 + add ecx,esi + xor edi,ebp + xor ebp,eax + add ecx,edx + add ebx,DWORD[44+rsp] + and edi,ebp + xor ebp,eax + ror edx,7 + mov esi,ecx + xor edi,ebp + rol ecx,5 + add ebx,edi + xor esi,edx + xor edx,ebp + add ebx,ecx + pxor xmm7,xmm3 + add eax,DWORD[48+rsp] + and esi,edx + xor edx,ebp + ror ecx,7 + punpcklqdq xmm10,xmm6 + mov edi,ebx + xor esi,edx + pxor xmm7,xmm0 + rol ebx,5 + add eax,esi + movdqa xmm9,XMMWORD[32+r14] + xor edi,ecx + paddd xmm8,xmm6 + xor ecx,edx + pxor xmm7,xmm10 + add eax,ebx + add ebp,DWORD[52+rsp] + and edi,ecx + xor ecx,edx + ror ebx,7 + movdqa xmm10,xmm7 + mov esi,eax + xor edi,ecx + movdqa XMMWORD[32+rsp],xmm8 + rol eax,5 + add ebp,edi + xor esi,ebx + pslld xmm7,2 + xor ebx,ecx + add ebp,eax + psrld xmm10,30 + add edx,DWORD[56+rsp] + and esi,ebx + xor ebx,ecx + por xmm7,xmm10 + ror eax,7 + mov edi,ebp + xor esi,ebx + rol ebp,5 + pshufd xmm8,xmm6,238 + add edx,esi + xor edi,eax + xor eax,ebx + add edx,ebp + add ecx,DWORD[60+rsp] + and edi,eax + xor eax,ebx + ror ebp,7 + mov esi,edx + xor edi,eax + rol edx,5 + add ecx,edi + xor esi,ebp + xor ebp,eax + add ecx,edx + pxor xmm0,xmm4 + add ebx,DWORD[rsp] + and esi,ebp + xor ebp,eax + ror edx,7 + punpcklqdq xmm8,xmm7 + mov edi,ecx + xor esi,ebp + pxor xmm0,xmm1 + rol ecx,5 + add ebx,esi + movdqa xmm10,xmm9 + xor edi,edx + paddd xmm9,xmm7 + xor edx,ebp + pxor xmm0,xmm8 + add ebx,ecx + add eax,DWORD[4+rsp] + and edi,edx + xor edx,ebp + ror ecx,7 + movdqa xmm8,xmm0 + mov esi,ebx + xor edi,edx + movdqa XMMWORD[48+rsp],xmm9 + rol ebx,5 + add eax,edi + xor esi,ecx + pslld xmm0,2 + xor ecx,edx + add eax,ebx + psrld xmm8,30 + add ebp,DWORD[8+rsp] + and esi,ecx + xor ecx,edx + por xmm0,xmm8 + ror ebx,7 + mov edi,eax + xor esi,ecx + rol eax,5 + pshufd xmm9,xmm7,238 + add ebp,esi + xor edi,ebx + xor ebx,ecx + add ebp,eax + add edx,DWORD[12+rsp] + and edi,ebx + xor ebx,ecx + ror eax,7 + mov esi,ebp + xor edi,ebx + rol ebp,5 + add edx,edi + xor esi,eax + xor eax,ebx + add edx,ebp + pxor xmm1,xmm5 + add ecx,DWORD[16+rsp] + and esi,eax + xor eax,ebx + ror ebp,7 + punpcklqdq xmm9,xmm0 + mov edi,edx + xor esi,eax + pxor xmm1,xmm2 + rol edx,5 + add ecx,esi + movdqa xmm8,xmm10 + xor edi,ebp + paddd xmm10,xmm0 + xor ebp,eax + pxor xmm1,xmm9 + add ecx,edx + add ebx,DWORD[20+rsp] + and edi,ebp + xor ebp,eax + ror edx,7 + movdqa xmm9,xmm1 + mov esi,ecx + xor edi,ebp + movdqa XMMWORD[rsp],xmm10 + rol ecx,5 + add ebx,edi + xor esi,edx + pslld xmm1,2 + xor edx,ebp + add ebx,ecx + psrld xmm9,30 + add eax,DWORD[24+rsp] + and esi,edx + xor edx,ebp + por xmm1,xmm9 + ror ecx,7 + mov edi,ebx + xor esi,edx + rol ebx,5 + pshufd xmm10,xmm0,238 + add 
eax,esi + xor edi,ecx + xor ecx,edx + add eax,ebx + add ebp,DWORD[28+rsp] + and edi,ecx + xor ecx,edx + ror ebx,7 + mov esi,eax + xor edi,ecx + rol eax,5 + add ebp,edi + xor esi,ebx + xor ebx,ecx + add ebp,eax + pxor xmm2,xmm6 + add edx,DWORD[32+rsp] + and esi,ebx + xor ebx,ecx + ror eax,7 + punpcklqdq xmm10,xmm1 + mov edi,ebp + xor esi,ebx + pxor xmm2,xmm3 + rol ebp,5 + add edx,esi + movdqa xmm9,xmm8 + xor edi,eax + paddd xmm8,xmm1 + xor eax,ebx + pxor xmm2,xmm10 + add edx,ebp + add ecx,DWORD[36+rsp] + and edi,eax + xor eax,ebx + ror ebp,7 + movdqa xmm10,xmm2 + mov esi,edx + xor edi,eax + movdqa XMMWORD[16+rsp],xmm8 + rol edx,5 + add ecx,edi + xor esi,ebp + pslld xmm2,2 + xor ebp,eax + add ecx,edx + psrld xmm10,30 + add ebx,DWORD[40+rsp] + and esi,ebp + xor ebp,eax + por xmm2,xmm10 + ror edx,7 + mov edi,ecx + xor esi,ebp + rol ecx,5 + pshufd xmm8,xmm1,238 + add ebx,esi + xor edi,edx + xor edx,ebp + add ebx,ecx + add eax,DWORD[44+rsp] + and edi,edx + xor edx,ebp + ror ecx,7 + mov esi,ebx + xor edi,edx + rol ebx,5 + add eax,edi + xor esi,edx + add eax,ebx + pxor xmm3,xmm7 + add ebp,DWORD[48+rsp] + xor esi,ecx + punpcklqdq xmm8,xmm2 + mov edi,eax + rol eax,5 + pxor xmm3,xmm4 + add ebp,esi + xor edi,ecx + movdqa xmm10,xmm9 + ror ebx,7 + paddd xmm9,xmm2 + add ebp,eax + pxor xmm3,xmm8 + add edx,DWORD[52+rsp] + xor edi,ebx + mov esi,ebp + rol ebp,5 + movdqa xmm8,xmm3 + add edx,edi + xor esi,ebx + movdqa XMMWORD[32+rsp],xmm9 + ror eax,7 + add edx,ebp + add ecx,DWORD[56+rsp] + pslld xmm3,2 + xor esi,eax + mov edi,edx + psrld xmm8,30 + rol edx,5 + add ecx,esi + xor edi,eax + ror ebp,7 + por xmm3,xmm8 + add ecx,edx + add ebx,DWORD[60+rsp] + xor edi,ebp + mov esi,ecx + rol ecx,5 + add ebx,edi + xor esi,ebp + ror edx,7 + add ebx,ecx + add eax,DWORD[rsp] + xor esi,edx + mov edi,ebx + rol ebx,5 + paddd xmm10,xmm3 + add eax,esi + xor edi,edx + movdqa XMMWORD[48+rsp],xmm10 + ror ecx,7 + add eax,ebx + add ebp,DWORD[4+rsp] + xor edi,ecx + mov esi,eax + rol eax,5 + add ebp,edi + xor esi,ecx + ror ebx,7 + add ebp,eax + add edx,DWORD[8+rsp] + xor esi,ebx + mov edi,ebp + rol ebp,5 + add edx,esi + xor edi,ebx + ror eax,7 + add edx,ebp + add ecx,DWORD[12+rsp] + xor edi,eax + mov esi,edx + rol edx,5 + add ecx,edi + xor esi,eax + ror ebp,7 + add ecx,edx + cmp r9,r10 + je NEAR $L$done_ssse3 + movdqa xmm6,XMMWORD[64+r14] + movdqa xmm9,XMMWORD[((-64))+r14] + movdqu xmm0,XMMWORD[r9] + movdqu xmm1,XMMWORD[16+r9] + movdqu xmm2,XMMWORD[32+r9] + movdqu xmm3,XMMWORD[48+r9] +DB 102,15,56,0,198 + add r9,64 + add ebx,DWORD[16+rsp] + xor esi,ebp + mov edi,ecx +DB 102,15,56,0,206 + rol ecx,5 + add ebx,esi + xor edi,ebp + ror edx,7 + paddd xmm0,xmm9 + add ebx,ecx + add eax,DWORD[20+rsp] + xor edi,edx + mov esi,ebx + movdqa XMMWORD[rsp],xmm0 + rol ebx,5 + add eax,edi + xor esi,edx + ror ecx,7 + psubd xmm0,xmm9 + add eax,ebx + add ebp,DWORD[24+rsp] + xor esi,ecx + mov edi,eax + rol eax,5 + add ebp,esi + xor edi,ecx + ror ebx,7 + add ebp,eax + add edx,DWORD[28+rsp] + xor edi,ebx + mov esi,ebp + rol ebp,5 + add edx,edi + xor esi,ebx + ror eax,7 + add edx,ebp + add ecx,DWORD[32+rsp] + xor esi,eax + mov edi,edx +DB 102,15,56,0,214 + rol edx,5 + add ecx,esi + xor edi,eax + ror ebp,7 + paddd xmm1,xmm9 + add ecx,edx + add ebx,DWORD[36+rsp] + xor edi,ebp + mov esi,ecx + movdqa XMMWORD[16+rsp],xmm1 + rol ecx,5 + add ebx,edi + xor esi,ebp + ror edx,7 + psubd xmm1,xmm9 + add ebx,ecx + add eax,DWORD[40+rsp] + xor esi,edx + mov edi,ebx + rol ebx,5 + add eax,esi + xor edi,edx + ror ecx,7 + add eax,ebx + add ebp,DWORD[44+rsp] + xor edi,ecx + mov 
esi,eax + rol eax,5 + add ebp,edi + xor esi,ecx + ror ebx,7 + add ebp,eax + add edx,DWORD[48+rsp] + xor esi,ebx + mov edi,ebp +DB 102,15,56,0,222 + rol ebp,5 + add edx,esi + xor edi,ebx + ror eax,7 + paddd xmm2,xmm9 + add edx,ebp + add ecx,DWORD[52+rsp] + xor edi,eax + mov esi,edx + movdqa XMMWORD[32+rsp],xmm2 + rol edx,5 + add ecx,edi + xor esi,eax + ror ebp,7 + psubd xmm2,xmm9 + add ecx,edx + add ebx,DWORD[56+rsp] + xor esi,ebp + mov edi,ecx + rol ecx,5 + add ebx,esi + xor edi,ebp + ror edx,7 + add ebx,ecx + add eax,DWORD[60+rsp] + xor edi,edx + mov esi,ebx + rol ebx,5 + add eax,edi + ror ecx,7 + add eax,ebx + add eax,DWORD[r8] + add esi,DWORD[4+r8] + add ecx,DWORD[8+r8] + add edx,DWORD[12+r8] + mov DWORD[r8],eax + add ebp,DWORD[16+r8] + mov DWORD[4+r8],esi + mov ebx,esi + mov DWORD[8+r8],ecx + mov edi,ecx + mov DWORD[12+r8],edx + xor edi,edx + mov DWORD[16+r8],ebp + and esi,edi + jmp NEAR $L$oop_ssse3 + +ALIGN 16 +$L$done_ssse3: + add ebx,DWORD[16+rsp] + xor esi,ebp + mov edi,ecx + rol ecx,5 + add ebx,esi + xor edi,ebp + ror edx,7 + add ebx,ecx + add eax,DWORD[20+rsp] + xor edi,edx + mov esi,ebx + rol ebx,5 + add eax,edi + xor esi,edx + ror ecx,7 + add eax,ebx + add ebp,DWORD[24+rsp] + xor esi,ecx + mov edi,eax + rol eax,5 + add ebp,esi + xor edi,ecx + ror ebx,7 + add ebp,eax + add edx,DWORD[28+rsp] + xor edi,ebx + mov esi,ebp + rol ebp,5 + add edx,edi + xor esi,ebx + ror eax,7 + add edx,ebp + add ecx,DWORD[32+rsp] + xor esi,eax + mov edi,edx + rol edx,5 + add ecx,esi + xor edi,eax + ror ebp,7 + add ecx,edx + add ebx,DWORD[36+rsp] + xor edi,ebp + mov esi,ecx + rol ecx,5 + add ebx,edi + xor esi,ebp + ror edx,7 + add ebx,ecx + add eax,DWORD[40+rsp] + xor esi,edx + mov edi,ebx + rol ebx,5 + add eax,esi + xor edi,edx + ror ecx,7 + add eax,ebx + add ebp,DWORD[44+rsp] + xor edi,ecx + mov esi,eax + rol eax,5 + add ebp,edi + xor esi,ecx + ror ebx,7 + add ebp,eax + add edx,DWORD[48+rsp] + xor esi,ebx + mov edi,ebp + rol ebp,5 + add edx,esi + xor edi,ebx + ror eax,7 + add edx,ebp + add ecx,DWORD[52+rsp] + xor edi,eax + mov esi,edx + rol edx,5 + add ecx,edi + xor esi,eax + ror ebp,7 + add ecx,edx + add ebx,DWORD[56+rsp] + xor esi,ebp + mov edi,ecx + rol ecx,5 + add ebx,esi + xor edi,ebp + ror edx,7 + add ebx,ecx + add eax,DWORD[60+rsp] + xor edi,edx + mov esi,ebx + rol ebx,5 + add eax,edi + ror ecx,7 + add eax,ebx + add eax,DWORD[r8] + add esi,DWORD[4+r8] + add ecx,DWORD[8+r8] + mov DWORD[r8],eax + add edx,DWORD[12+r8] + mov DWORD[4+r8],esi + add ebp,DWORD[16+r8] + mov DWORD[8+r8],ecx + mov DWORD[12+r8],edx + mov DWORD[16+r8],ebp + movaps xmm6,XMMWORD[((-40-96))+r11] + movaps xmm7,XMMWORD[((-40-80))+r11] + movaps xmm8,XMMWORD[((-40-64))+r11] + movaps xmm9,XMMWORD[((-40-48))+r11] + movaps xmm10,XMMWORD[((-40-32))+r11] + movaps xmm11,XMMWORD[((-40-16))+r11] + mov r14,QWORD[((-40))+r11] + + mov r13,QWORD[((-32))+r11] + + mov r12,QWORD[((-24))+r11] + + mov rbp,QWORD[((-16))+r11] + + mov rbx,QWORD[((-8))+r11] + + lea rsp,[r11] + +$L$epilogue_ssse3: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_sha1_block_data_order_ssse3: +ALIGN 64 +K_XX_XX: + DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999 + DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999 + DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 + DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 + DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc + DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc + DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 + DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 + DD 
0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f +DB 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0 +DB 83,72,65,49,32,98,108,111,99,107,32,116,114,97,110,115 +DB 102,111,114,109,32,102,111,114,32,120,56,54,95,54,52,44 +DB 32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60 +DB 97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114 +DB 103,62,0 +ALIGN 64 +EXTERN __imp_RtlVirtualUnwind + +ALIGN 16 +se_handler: + push rsi + push rdi + push rbx + push rbp + push r12 + push r13 + push r14 + push r15 + pushfq + sub rsp,64 + + mov rax,QWORD[120+r8] + mov rbx,QWORD[248+r8] + + lea r10,[$L$prologue] + cmp rbx,r10 + jb NEAR $L$common_seh_tail + + mov rax,QWORD[152+r8] + + lea r10,[$L$epilogue] + cmp rbx,r10 + jae NEAR $L$common_seh_tail + + mov rax,QWORD[64+rax] + + mov rbx,QWORD[((-8))+rax] + mov rbp,QWORD[((-16))+rax] + mov r12,QWORD[((-24))+rax] + mov r13,QWORD[((-32))+rax] + mov r14,QWORD[((-40))+rax] + mov QWORD[144+r8],rbx + mov QWORD[160+r8],rbp + mov QWORD[216+r8],r12 + mov QWORD[224+r8],r13 + mov QWORD[232+r8],r14 + + jmp NEAR $L$common_seh_tail + + +ALIGN 16 +shaext_handler: + push rsi + push rdi + push rbx + push rbp + push r12 + push r13 + push r14 + push r15 + pushfq + sub rsp,64 + + mov rax,QWORD[120+r8] + mov rbx,QWORD[248+r8] + + lea r10,[$L$prologue_shaext] + cmp rbx,r10 + jb NEAR $L$common_seh_tail + + lea r10,[$L$epilogue_shaext] + cmp rbx,r10 + jae NEAR $L$common_seh_tail + + lea rsi,[((-8-64))+rax] + lea rdi,[512+r8] + mov ecx,8 + DD 0xa548f3fc + + jmp NEAR $L$common_seh_tail + + +ALIGN 16 +ssse3_handler: + push rsi + push rdi + push rbx + push rbp + push r12 + push r13 + push r14 + push r15 + pushfq + sub rsp,64 + + mov rax,QWORD[120+r8] + mov rbx,QWORD[248+r8] + + mov rsi,QWORD[8+r9] + mov r11,QWORD[56+r9] + + mov r10d,DWORD[r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jb NEAR $L$common_seh_tail + + mov rax,QWORD[208+r8] + + mov r10d,DWORD[4+r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jae NEAR $L$common_seh_tail + + lea rsi,[((-40-96))+rax] + lea rdi,[512+r8] + mov ecx,12 + DD 0xa548f3fc + + mov rbx,QWORD[((-8))+rax] + mov rbp,QWORD[((-16))+rax] + mov r12,QWORD[((-24))+rax] + mov r13,QWORD[((-32))+rax] + mov r14,QWORD[((-40))+rax] + mov QWORD[144+r8],rbx + mov QWORD[160+r8],rbp + mov QWORD[216+r8],r12 + mov QWORD[224+r8],r13 + mov QWORD[232+r8],r14 + +$L$common_seh_tail: + mov rdi,QWORD[8+rax] + mov rsi,QWORD[16+rax] + mov QWORD[152+r8],rax + mov QWORD[168+r8],rsi + mov QWORD[176+r8],rdi + + mov rdi,QWORD[40+r9] + mov rsi,r8 + mov ecx,154 + DD 0xa548f3fc + + mov rsi,r9 + xor rcx,rcx + mov rdx,QWORD[8+rsi] + mov r8,QWORD[rsi] + mov r9,QWORD[16+rsi] + mov r10,QWORD[40+rsi] + lea r11,[56+rsi] + lea r12,[24+rsi] + mov QWORD[32+rsp],r10 + mov QWORD[40+rsp],r11 + mov QWORD[48+rsp],r12 + mov QWORD[56+rsp],rcx + call QWORD[__imp_RtlVirtualUnwind] + + mov eax,1 + add rsp,64 + popfq + pop r15 + pop r14 + pop r13 + pop r12 + pop rbp + pop rbx + pop rdi + pop rsi + DB 0F3h,0C3h ;repret + + +section .pdata rdata align=4 +ALIGN 4 + DD $L$SEH_begin_sha1_block_data_order wrt ..imagebase + DD $L$SEH_end_sha1_block_data_order wrt ..imagebase + DD $L$SEH_info_sha1_block_data_order wrt ..imagebase + DD $L$SEH_begin_sha1_block_data_order_shaext wrt ..imagebase + DD $L$SEH_end_sha1_block_data_order_shaext wrt ..imagebase + DD $L$SEH_info_sha1_block_data_order_shaext wrt ..imagebase + DD $L$SEH_begin_sha1_block_data_order_ssse3 wrt ..imagebase + DD $L$SEH_end_sha1_block_data_order_ssse3 wrt ..imagebase + DD 
$L$SEH_info_sha1_block_data_order_ssse3 wrt ..imagebase +section .xdata rdata align=8 +ALIGN 8 +$L$SEH_info_sha1_block_data_order: +DB 9,0,0,0 + DD se_handler wrt ..imagebase +$L$SEH_info_sha1_block_data_order_shaext: +DB 9,0,0,0 + DD shaext_handler wrt ..imagebase +$L$SEH_info_sha1_block_data_order_ssse3: +DB 9,0,0,0 + DD ssse3_handler wrt ..imagebase + DD $L$prologue_ssse3 wrt ..imagebase,$L$epilogue_ssse3 wrt ..imagebase diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm new file mode 100644 index 0000000000..7cd5eae85c --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm @@ -0,0 +1,3461 @@ +; WARNING: do not edit! +; Generated from openssl/crypto/sha/asm/sha256-mb-x86_64.pl +; +; Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. +; +; Licensed under the OpenSSL license (the "License"). You may not use +; this file except in compliance with the License. You can obtain a copy +; in the file LICENSE in the source distribution or at +; https://www.openssl.org/source/license.html + +default rel +%define XMMWORD +%define YMMWORD +%define ZMMWORD +section .text code align=64 + + +EXTERN OPENSSL_ia32cap_P + +global sha256_multi_block + +ALIGN 32 +sha256_multi_block: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_sha256_multi_block: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + + + + mov rcx,QWORD[((OPENSSL_ia32cap_P+4))] + bt rcx,61 + jc NEAR _shaext_shortcut + mov rax,rsp + + push rbx + + push rbp + + lea rsp,[((-168))+rsp] + movaps XMMWORD[rsp],xmm6 + movaps XMMWORD[16+rsp],xmm7 + movaps XMMWORD[32+rsp],xmm8 + movaps XMMWORD[48+rsp],xmm9 + movaps XMMWORD[(-120)+rax],xmm10 + movaps XMMWORD[(-104)+rax],xmm11 + movaps XMMWORD[(-88)+rax],xmm12 + movaps XMMWORD[(-72)+rax],xmm13 + movaps XMMWORD[(-56)+rax],xmm14 + movaps XMMWORD[(-40)+rax],xmm15 + sub rsp,288 + and rsp,-256 + mov QWORD[272+rsp],rax + +$L$body: + lea rbp,[((K256+128))] + lea rbx,[256+rsp] + lea rdi,[128+rdi] + +$L$oop_grande: + mov DWORD[280+rsp],edx + xor edx,edx + mov r8,QWORD[rsi] + mov ecx,DWORD[8+rsi] + cmp ecx,edx + cmovg edx,ecx + test ecx,ecx + mov DWORD[rbx],ecx + cmovle r8,rbp + mov r9,QWORD[16+rsi] + mov ecx,DWORD[24+rsi] + cmp ecx,edx + cmovg edx,ecx + test ecx,ecx + mov DWORD[4+rbx],ecx + cmovle r9,rbp + mov r10,QWORD[32+rsi] + mov ecx,DWORD[40+rsi] + cmp ecx,edx + cmovg edx,ecx + test ecx,ecx + mov DWORD[8+rbx],ecx + cmovle r10,rbp + mov r11,QWORD[48+rsi] + mov ecx,DWORD[56+rsi] + cmp ecx,edx + cmovg edx,ecx + test ecx,ecx + mov DWORD[12+rbx],ecx + cmovle r11,rbp + test edx,edx + jz NEAR $L$done + + movdqu xmm8,XMMWORD[((0-128))+rdi] + lea rax,[128+rsp] + movdqu xmm9,XMMWORD[((32-128))+rdi] + movdqu xmm10,XMMWORD[((64-128))+rdi] + movdqu xmm11,XMMWORD[((96-128))+rdi] + movdqu xmm12,XMMWORD[((128-128))+rdi] + movdqu xmm13,XMMWORD[((160-128))+rdi] + movdqu xmm14,XMMWORD[((192-128))+rdi] + movdqu xmm15,XMMWORD[((224-128))+rdi] + movdqu xmm6,XMMWORD[$L$pbswap] + jmp NEAR $L$oop + +ALIGN 32 +$L$oop: + movdqa xmm4,xmm10 + pxor xmm4,xmm9 + movd xmm5,DWORD[r8] + movd xmm0,DWORD[r9] + movd xmm1,DWORD[r10] + movd xmm2,DWORD[r11] + punpckldq xmm5,xmm1 + punpckldq xmm0,xmm2 + punpckldq xmm5,xmm0 + movdqa xmm7,xmm12 +DB 102,15,56,0,238 + movdqa xmm2,xmm12 + + psrld xmm7,6 + movdqa xmm1,xmm12 + pslld xmm2,7 + movdqa XMMWORD[(0-128)+rax],xmm5 + paddd xmm5,xmm15 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd 
xmm5,XMMWORD[((-128))+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm12 + + pxor xmm7,xmm2 + movdqa xmm3,xmm12 + pslld xmm2,26-21 + pandn xmm0,xmm14 + pand xmm3,xmm13 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm8 + pxor xmm7,xmm2 + movdqa xmm2,xmm8 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + movdqa xmm3,xmm9 + movdqa xmm7,xmm8 + pslld xmm2,10 + pxor xmm3,xmm8 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm4,xmm3 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm15,xmm9 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm15,xmm4 + paddd xmm11,xmm5 + pxor xmm7,xmm2 + + paddd xmm15,xmm5 + paddd xmm15,xmm7 + movd xmm5,DWORD[4+r8] + movd xmm0,DWORD[4+r9] + movd xmm1,DWORD[4+r10] + movd xmm2,DWORD[4+r11] + punpckldq xmm5,xmm1 + punpckldq xmm0,xmm2 + punpckldq xmm5,xmm0 + movdqa xmm7,xmm11 + + movdqa xmm2,xmm11 +DB 102,15,56,0,238 + psrld xmm7,6 + movdqa xmm1,xmm11 + pslld xmm2,7 + movdqa XMMWORD[(16-128)+rax],xmm5 + paddd xmm5,xmm14 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[((-96))+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm11 + + pxor xmm7,xmm2 + movdqa xmm4,xmm11 + pslld xmm2,26-21 + pandn xmm0,xmm13 + pand xmm4,xmm12 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm15 + pxor xmm7,xmm2 + movdqa xmm2,xmm15 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm4 + movdqa xmm4,xmm8 + movdqa xmm7,xmm15 + pslld xmm2,10 + pxor xmm4,xmm15 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm3,xmm4 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm14,xmm8 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm14,xmm3 + paddd xmm10,xmm5 + pxor xmm7,xmm2 + + paddd xmm14,xmm5 + paddd xmm14,xmm7 + movd xmm5,DWORD[8+r8] + movd xmm0,DWORD[8+r9] + movd xmm1,DWORD[8+r10] + movd xmm2,DWORD[8+r11] + punpckldq xmm5,xmm1 + punpckldq xmm0,xmm2 + punpckldq xmm5,xmm0 + movdqa xmm7,xmm10 +DB 102,15,56,0,238 + movdqa xmm2,xmm10 + + psrld xmm7,6 + movdqa xmm1,xmm10 + pslld xmm2,7 + movdqa XMMWORD[(32-128)+rax],xmm5 + paddd xmm5,xmm13 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[((-64))+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm10 + + pxor xmm7,xmm2 + movdqa xmm3,xmm10 + pslld xmm2,26-21 + pandn xmm0,xmm12 + pand xmm3,xmm11 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm14 + pxor xmm7,xmm2 + movdqa xmm2,xmm14 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + movdqa xmm3,xmm15 + movdqa xmm7,xmm14 + pslld xmm2,10 + pxor xmm3,xmm14 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm4,xmm3 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm13,xmm15 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm13,xmm4 + paddd xmm9,xmm5 + pxor xmm7,xmm2 + + paddd xmm13,xmm5 + paddd xmm13,xmm7 + movd xmm5,DWORD[12+r8] + movd xmm0,DWORD[12+r9] + movd xmm1,DWORD[12+r10] + movd xmm2,DWORD[12+r11] + punpckldq xmm5,xmm1 + punpckldq xmm0,xmm2 + punpckldq xmm5,xmm0 + movdqa xmm7,xmm9 + + movdqa xmm2,xmm9 +DB 102,15,56,0,238 + psrld xmm7,6 + movdqa xmm1,xmm9 + pslld xmm2,7 + movdqa XMMWORD[(48-128)+rax],xmm5 + paddd xmm5,xmm12 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[((-32))+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm9 + + pxor xmm7,xmm2 + movdqa xmm4,xmm9 + pslld xmm2,26-21 + pandn xmm0,xmm11 + pand xmm4,xmm10 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm13 + pxor xmm7,xmm2 + movdqa xmm2,xmm13 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm4 + movdqa xmm4,xmm14 + movdqa xmm7,xmm13 
+ pslld xmm2,10 + pxor xmm4,xmm13 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm3,xmm4 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm12,xmm14 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm12,xmm3 + paddd xmm8,xmm5 + pxor xmm7,xmm2 + + paddd xmm12,xmm5 + paddd xmm12,xmm7 + movd xmm5,DWORD[16+r8] + movd xmm0,DWORD[16+r9] + movd xmm1,DWORD[16+r10] + movd xmm2,DWORD[16+r11] + punpckldq xmm5,xmm1 + punpckldq xmm0,xmm2 + punpckldq xmm5,xmm0 + movdqa xmm7,xmm8 +DB 102,15,56,0,238 + movdqa xmm2,xmm8 + + psrld xmm7,6 + movdqa xmm1,xmm8 + pslld xmm2,7 + movdqa XMMWORD[(64-128)+rax],xmm5 + paddd xmm5,xmm11 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm8 + + pxor xmm7,xmm2 + movdqa xmm3,xmm8 + pslld xmm2,26-21 + pandn xmm0,xmm10 + pand xmm3,xmm9 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm12 + pxor xmm7,xmm2 + movdqa xmm2,xmm12 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + movdqa xmm3,xmm13 + movdqa xmm7,xmm12 + pslld xmm2,10 + pxor xmm3,xmm12 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm4,xmm3 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm11,xmm13 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm11,xmm4 + paddd xmm15,xmm5 + pxor xmm7,xmm2 + + paddd xmm11,xmm5 + paddd xmm11,xmm7 + movd xmm5,DWORD[20+r8] + movd xmm0,DWORD[20+r9] + movd xmm1,DWORD[20+r10] + movd xmm2,DWORD[20+r11] + punpckldq xmm5,xmm1 + punpckldq xmm0,xmm2 + punpckldq xmm5,xmm0 + movdqa xmm7,xmm15 + + movdqa xmm2,xmm15 +DB 102,15,56,0,238 + psrld xmm7,6 + movdqa xmm1,xmm15 + pslld xmm2,7 + movdqa XMMWORD[(80-128)+rax],xmm5 + paddd xmm5,xmm10 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[32+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm15 + + pxor xmm7,xmm2 + movdqa xmm4,xmm15 + pslld xmm2,26-21 + pandn xmm0,xmm9 + pand xmm4,xmm8 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm11 + pxor xmm7,xmm2 + movdqa xmm2,xmm11 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm4 + movdqa xmm4,xmm12 + movdqa xmm7,xmm11 + pslld xmm2,10 + pxor xmm4,xmm11 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm3,xmm4 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm10,xmm12 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm10,xmm3 + paddd xmm14,xmm5 + pxor xmm7,xmm2 + + paddd xmm10,xmm5 + paddd xmm10,xmm7 + movd xmm5,DWORD[24+r8] + movd xmm0,DWORD[24+r9] + movd xmm1,DWORD[24+r10] + movd xmm2,DWORD[24+r11] + punpckldq xmm5,xmm1 + punpckldq xmm0,xmm2 + punpckldq xmm5,xmm0 + movdqa xmm7,xmm14 +DB 102,15,56,0,238 + movdqa xmm2,xmm14 + + psrld xmm7,6 + movdqa xmm1,xmm14 + pslld xmm2,7 + movdqa XMMWORD[(96-128)+rax],xmm5 + paddd xmm5,xmm9 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[64+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm14 + + pxor xmm7,xmm2 + movdqa xmm3,xmm14 + pslld xmm2,26-21 + pandn xmm0,xmm8 + pand xmm3,xmm15 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm10 + pxor xmm7,xmm2 + movdqa xmm2,xmm10 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + movdqa xmm3,xmm11 + movdqa xmm7,xmm10 + pslld xmm2,10 + pxor xmm3,xmm10 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm4,xmm3 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm9,xmm11 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm9,xmm4 + paddd xmm13,xmm5 + pxor xmm7,xmm2 + + paddd xmm9,xmm5 + paddd xmm9,xmm7 + movd xmm5,DWORD[28+r8] + 
movd xmm0,DWORD[28+r9] + movd xmm1,DWORD[28+r10] + movd xmm2,DWORD[28+r11] + punpckldq xmm5,xmm1 + punpckldq xmm0,xmm2 + punpckldq xmm5,xmm0 + movdqa xmm7,xmm13 + + movdqa xmm2,xmm13 +DB 102,15,56,0,238 + psrld xmm7,6 + movdqa xmm1,xmm13 + pslld xmm2,7 + movdqa XMMWORD[(112-128)+rax],xmm5 + paddd xmm5,xmm8 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[96+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm13 + + pxor xmm7,xmm2 + movdqa xmm4,xmm13 + pslld xmm2,26-21 + pandn xmm0,xmm15 + pand xmm4,xmm14 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm9 + pxor xmm7,xmm2 + movdqa xmm2,xmm9 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm4 + movdqa xmm4,xmm10 + movdqa xmm7,xmm9 + pslld xmm2,10 + pxor xmm4,xmm9 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm3,xmm4 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm8,xmm10 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm8,xmm3 + paddd xmm12,xmm5 + pxor xmm7,xmm2 + + paddd xmm8,xmm5 + paddd xmm8,xmm7 + lea rbp,[256+rbp] + movd xmm5,DWORD[32+r8] + movd xmm0,DWORD[32+r9] + movd xmm1,DWORD[32+r10] + movd xmm2,DWORD[32+r11] + punpckldq xmm5,xmm1 + punpckldq xmm0,xmm2 + punpckldq xmm5,xmm0 + movdqa xmm7,xmm12 +DB 102,15,56,0,238 + movdqa xmm2,xmm12 + + psrld xmm7,6 + movdqa xmm1,xmm12 + pslld xmm2,7 + movdqa XMMWORD[(128-128)+rax],xmm5 + paddd xmm5,xmm15 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[((-128))+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm12 + + pxor xmm7,xmm2 + movdqa xmm3,xmm12 + pslld xmm2,26-21 + pandn xmm0,xmm14 + pand xmm3,xmm13 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm8 + pxor xmm7,xmm2 + movdqa xmm2,xmm8 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + movdqa xmm3,xmm9 + movdqa xmm7,xmm8 + pslld xmm2,10 + pxor xmm3,xmm8 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm4,xmm3 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm15,xmm9 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm15,xmm4 + paddd xmm11,xmm5 + pxor xmm7,xmm2 + + paddd xmm15,xmm5 + paddd xmm15,xmm7 + movd xmm5,DWORD[36+r8] + movd xmm0,DWORD[36+r9] + movd xmm1,DWORD[36+r10] + movd xmm2,DWORD[36+r11] + punpckldq xmm5,xmm1 + punpckldq xmm0,xmm2 + punpckldq xmm5,xmm0 + movdqa xmm7,xmm11 + + movdqa xmm2,xmm11 +DB 102,15,56,0,238 + psrld xmm7,6 + movdqa xmm1,xmm11 + pslld xmm2,7 + movdqa XMMWORD[(144-128)+rax],xmm5 + paddd xmm5,xmm14 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[((-96))+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm11 + + pxor xmm7,xmm2 + movdqa xmm4,xmm11 + pslld xmm2,26-21 + pandn xmm0,xmm13 + pand xmm4,xmm12 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm15 + pxor xmm7,xmm2 + movdqa xmm2,xmm15 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm4 + movdqa xmm4,xmm8 + movdqa xmm7,xmm15 + pslld xmm2,10 + pxor xmm4,xmm15 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm3,xmm4 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm14,xmm8 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm14,xmm3 + paddd xmm10,xmm5 + pxor xmm7,xmm2 + + paddd xmm14,xmm5 + paddd xmm14,xmm7 + movd xmm5,DWORD[40+r8] + movd xmm0,DWORD[40+r9] + movd xmm1,DWORD[40+r10] + movd xmm2,DWORD[40+r11] + punpckldq xmm5,xmm1 + punpckldq xmm0,xmm2 + punpckldq xmm5,xmm0 + movdqa xmm7,xmm10 +DB 102,15,56,0,238 + movdqa xmm2,xmm10 + + psrld xmm7,6 + movdqa xmm1,xmm10 + pslld xmm2,7 + movdqa XMMWORD[(160-128)+rax],xmm5 + paddd xmm5,xmm13 + + psrld 
xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[((-64))+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm10 + + pxor xmm7,xmm2 + movdqa xmm3,xmm10 + pslld xmm2,26-21 + pandn xmm0,xmm12 + pand xmm3,xmm11 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm14 + pxor xmm7,xmm2 + movdqa xmm2,xmm14 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + movdqa xmm3,xmm15 + movdqa xmm7,xmm14 + pslld xmm2,10 + pxor xmm3,xmm14 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm4,xmm3 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm13,xmm15 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm13,xmm4 + paddd xmm9,xmm5 + pxor xmm7,xmm2 + + paddd xmm13,xmm5 + paddd xmm13,xmm7 + movd xmm5,DWORD[44+r8] + movd xmm0,DWORD[44+r9] + movd xmm1,DWORD[44+r10] + movd xmm2,DWORD[44+r11] + punpckldq xmm5,xmm1 + punpckldq xmm0,xmm2 + punpckldq xmm5,xmm0 + movdqa xmm7,xmm9 + + movdqa xmm2,xmm9 +DB 102,15,56,0,238 + psrld xmm7,6 + movdqa xmm1,xmm9 + pslld xmm2,7 + movdqa XMMWORD[(176-128)+rax],xmm5 + paddd xmm5,xmm12 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[((-32))+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm9 + + pxor xmm7,xmm2 + movdqa xmm4,xmm9 + pslld xmm2,26-21 + pandn xmm0,xmm11 + pand xmm4,xmm10 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm13 + pxor xmm7,xmm2 + movdqa xmm2,xmm13 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm4 + movdqa xmm4,xmm14 + movdqa xmm7,xmm13 + pslld xmm2,10 + pxor xmm4,xmm13 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm3,xmm4 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm12,xmm14 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm12,xmm3 + paddd xmm8,xmm5 + pxor xmm7,xmm2 + + paddd xmm12,xmm5 + paddd xmm12,xmm7 + movd xmm5,DWORD[48+r8] + movd xmm0,DWORD[48+r9] + movd xmm1,DWORD[48+r10] + movd xmm2,DWORD[48+r11] + punpckldq xmm5,xmm1 + punpckldq xmm0,xmm2 + punpckldq xmm5,xmm0 + movdqa xmm7,xmm8 +DB 102,15,56,0,238 + movdqa xmm2,xmm8 + + psrld xmm7,6 + movdqa xmm1,xmm8 + pslld xmm2,7 + movdqa XMMWORD[(192-128)+rax],xmm5 + paddd xmm5,xmm11 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm8 + + pxor xmm7,xmm2 + movdqa xmm3,xmm8 + pslld xmm2,26-21 + pandn xmm0,xmm10 + pand xmm3,xmm9 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm12 + pxor xmm7,xmm2 + movdqa xmm2,xmm12 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + movdqa xmm3,xmm13 + movdqa xmm7,xmm12 + pslld xmm2,10 + pxor xmm3,xmm12 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm4,xmm3 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm11,xmm13 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm11,xmm4 + paddd xmm15,xmm5 + pxor xmm7,xmm2 + + paddd xmm11,xmm5 + paddd xmm11,xmm7 + movd xmm5,DWORD[52+r8] + movd xmm0,DWORD[52+r9] + movd xmm1,DWORD[52+r10] + movd xmm2,DWORD[52+r11] + punpckldq xmm5,xmm1 + punpckldq xmm0,xmm2 + punpckldq xmm5,xmm0 + movdqa xmm7,xmm15 + + movdqa xmm2,xmm15 +DB 102,15,56,0,238 + psrld xmm7,6 + movdqa xmm1,xmm15 + pslld xmm2,7 + movdqa XMMWORD[(208-128)+rax],xmm5 + paddd xmm5,xmm10 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[32+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm15 + + pxor xmm7,xmm2 + movdqa xmm4,xmm15 + pslld xmm2,26-21 + pandn xmm0,xmm9 + pand xmm4,xmm8 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm11 + pxor xmm7,xmm2 + movdqa xmm2,xmm11 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor 
xmm0,xmm4 + movdqa xmm4,xmm12 + movdqa xmm7,xmm11 + pslld xmm2,10 + pxor xmm4,xmm11 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm3,xmm4 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm10,xmm12 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm10,xmm3 + paddd xmm14,xmm5 + pxor xmm7,xmm2 + + paddd xmm10,xmm5 + paddd xmm10,xmm7 + movd xmm5,DWORD[56+r8] + movd xmm0,DWORD[56+r9] + movd xmm1,DWORD[56+r10] + movd xmm2,DWORD[56+r11] + punpckldq xmm5,xmm1 + punpckldq xmm0,xmm2 + punpckldq xmm5,xmm0 + movdqa xmm7,xmm14 +DB 102,15,56,0,238 + movdqa xmm2,xmm14 + + psrld xmm7,6 + movdqa xmm1,xmm14 + pslld xmm2,7 + movdqa XMMWORD[(224-128)+rax],xmm5 + paddd xmm5,xmm9 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[64+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm14 + + pxor xmm7,xmm2 + movdqa xmm3,xmm14 + pslld xmm2,26-21 + pandn xmm0,xmm8 + pand xmm3,xmm15 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm10 + pxor xmm7,xmm2 + movdqa xmm2,xmm10 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + movdqa xmm3,xmm11 + movdqa xmm7,xmm10 + pslld xmm2,10 + pxor xmm3,xmm10 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm4,xmm3 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm9,xmm11 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm9,xmm4 + paddd xmm13,xmm5 + pxor xmm7,xmm2 + + paddd xmm9,xmm5 + paddd xmm9,xmm7 + movd xmm5,DWORD[60+r8] + lea r8,[64+r8] + movd xmm0,DWORD[60+r9] + lea r9,[64+r9] + movd xmm1,DWORD[60+r10] + lea r10,[64+r10] + movd xmm2,DWORD[60+r11] + lea r11,[64+r11] + punpckldq xmm5,xmm1 + punpckldq xmm0,xmm2 + punpckldq xmm5,xmm0 + movdqa xmm7,xmm13 + + movdqa xmm2,xmm13 +DB 102,15,56,0,238 + psrld xmm7,6 + movdqa xmm1,xmm13 + pslld xmm2,7 + movdqa XMMWORD[(240-128)+rax],xmm5 + paddd xmm5,xmm8 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[96+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm13 + prefetcht0 [63+r8] + pxor xmm7,xmm2 + movdqa xmm4,xmm13 + pslld xmm2,26-21 + pandn xmm0,xmm15 + pand xmm4,xmm14 + pxor xmm7,xmm1 + + prefetcht0 [63+r9] + movdqa xmm1,xmm9 + pxor xmm7,xmm2 + movdqa xmm2,xmm9 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm4 + movdqa xmm4,xmm10 + movdqa xmm7,xmm9 + pslld xmm2,10 + pxor xmm4,xmm9 + + prefetcht0 [63+r10] + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm3,xmm4 + pxor xmm1,xmm7 + + prefetcht0 [63+r11] + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm8,xmm10 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm8,xmm3 + paddd xmm12,xmm5 + pxor xmm7,xmm2 + + paddd xmm8,xmm5 + paddd xmm8,xmm7 + lea rbp,[256+rbp] + movdqu xmm5,XMMWORD[((0-128))+rax] + mov ecx,3 + jmp NEAR $L$oop_16_xx +ALIGN 32 +$L$oop_16_xx: + movdqa xmm6,XMMWORD[((16-128))+rax] + paddd xmm5,XMMWORD[((144-128))+rax] + + movdqa xmm7,xmm6 + movdqa xmm1,xmm6 + psrld xmm7,3 + movdqa xmm2,xmm6 + + psrld xmm1,7 + movdqa xmm0,XMMWORD[((224-128))+rax] + pslld xmm2,14 + pxor xmm7,xmm1 + psrld xmm1,18-7 + movdqa xmm3,xmm0 + pxor xmm7,xmm2 + pslld xmm2,25-14 + pxor xmm7,xmm1 + psrld xmm0,10 + movdqa xmm1,xmm3 + + psrld xmm3,17 + pxor xmm7,xmm2 + pslld xmm1,13 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + psrld xmm3,19-17 + pxor xmm0,xmm1 + pslld xmm1,15-13 + pxor xmm0,xmm3 + pxor xmm0,xmm1 + paddd xmm5,xmm0 + movdqa xmm7,xmm12 + + movdqa xmm2,xmm12 + + psrld xmm7,6 + movdqa xmm1,xmm12 + pslld xmm2,7 + movdqa XMMWORD[(0-128)+rax],xmm5 + paddd xmm5,xmm15 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd 
xmm5,XMMWORD[((-128))+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm12 + + pxor xmm7,xmm2 + movdqa xmm3,xmm12 + pslld xmm2,26-21 + pandn xmm0,xmm14 + pand xmm3,xmm13 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm8 + pxor xmm7,xmm2 + movdqa xmm2,xmm8 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + movdqa xmm3,xmm9 + movdqa xmm7,xmm8 + pslld xmm2,10 + pxor xmm3,xmm8 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm4,xmm3 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm15,xmm9 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm15,xmm4 + paddd xmm11,xmm5 + pxor xmm7,xmm2 + + paddd xmm15,xmm5 + paddd xmm15,xmm7 + movdqa xmm5,XMMWORD[((32-128))+rax] + paddd xmm6,XMMWORD[((160-128))+rax] + + movdqa xmm7,xmm5 + movdqa xmm1,xmm5 + psrld xmm7,3 + movdqa xmm2,xmm5 + + psrld xmm1,7 + movdqa xmm0,XMMWORD[((240-128))+rax] + pslld xmm2,14 + pxor xmm7,xmm1 + psrld xmm1,18-7 + movdqa xmm4,xmm0 + pxor xmm7,xmm2 + pslld xmm2,25-14 + pxor xmm7,xmm1 + psrld xmm0,10 + movdqa xmm1,xmm4 + + psrld xmm4,17 + pxor xmm7,xmm2 + pslld xmm1,13 + paddd xmm6,xmm7 + pxor xmm0,xmm4 + psrld xmm4,19-17 + pxor xmm0,xmm1 + pslld xmm1,15-13 + pxor xmm0,xmm4 + pxor xmm0,xmm1 + paddd xmm6,xmm0 + movdqa xmm7,xmm11 + + movdqa xmm2,xmm11 + + psrld xmm7,6 + movdqa xmm1,xmm11 + pslld xmm2,7 + movdqa XMMWORD[(16-128)+rax],xmm6 + paddd xmm6,xmm14 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm6,XMMWORD[((-96))+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm11 + + pxor xmm7,xmm2 + movdqa xmm4,xmm11 + pslld xmm2,26-21 + pandn xmm0,xmm13 + pand xmm4,xmm12 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm15 + pxor xmm7,xmm2 + movdqa xmm2,xmm15 + psrld xmm1,2 + paddd xmm6,xmm7 + pxor xmm0,xmm4 + movdqa xmm4,xmm8 + movdqa xmm7,xmm15 + pslld xmm2,10 + pxor xmm4,xmm15 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm6,xmm0 + pslld xmm2,19-10 + pand xmm3,xmm4 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm14,xmm8 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm14,xmm3 + paddd xmm10,xmm6 + pxor xmm7,xmm2 + + paddd xmm14,xmm6 + paddd xmm14,xmm7 + movdqa xmm6,XMMWORD[((48-128))+rax] + paddd xmm5,XMMWORD[((176-128))+rax] + + movdqa xmm7,xmm6 + movdqa xmm1,xmm6 + psrld xmm7,3 + movdqa xmm2,xmm6 + + psrld xmm1,7 + movdqa xmm0,XMMWORD[((0-128))+rax] + pslld xmm2,14 + pxor xmm7,xmm1 + psrld xmm1,18-7 + movdqa xmm3,xmm0 + pxor xmm7,xmm2 + pslld xmm2,25-14 + pxor xmm7,xmm1 + psrld xmm0,10 + movdqa xmm1,xmm3 + + psrld xmm3,17 + pxor xmm7,xmm2 + pslld xmm1,13 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + psrld xmm3,19-17 + pxor xmm0,xmm1 + pslld xmm1,15-13 + pxor xmm0,xmm3 + pxor xmm0,xmm1 + paddd xmm5,xmm0 + movdqa xmm7,xmm10 + + movdqa xmm2,xmm10 + + psrld xmm7,6 + movdqa xmm1,xmm10 + pslld xmm2,7 + movdqa XMMWORD[(32-128)+rax],xmm5 + paddd xmm5,xmm13 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[((-64))+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm10 + + pxor xmm7,xmm2 + movdqa xmm3,xmm10 + pslld xmm2,26-21 + pandn xmm0,xmm12 + pand xmm3,xmm11 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm14 + pxor xmm7,xmm2 + movdqa xmm2,xmm14 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + movdqa xmm3,xmm15 + movdqa xmm7,xmm14 + pslld xmm2,10 + pxor xmm3,xmm14 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm4,xmm3 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm13,xmm15 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm13,xmm4 + paddd xmm9,xmm5 + pxor xmm7,xmm2 + + paddd xmm13,xmm5 + paddd 
xmm13,xmm7 + movdqa xmm5,XMMWORD[((64-128))+rax] + paddd xmm6,XMMWORD[((192-128))+rax] + + movdqa xmm7,xmm5 + movdqa xmm1,xmm5 + psrld xmm7,3 + movdqa xmm2,xmm5 + + psrld xmm1,7 + movdqa xmm0,XMMWORD[((16-128))+rax] + pslld xmm2,14 + pxor xmm7,xmm1 + psrld xmm1,18-7 + movdqa xmm4,xmm0 + pxor xmm7,xmm2 + pslld xmm2,25-14 + pxor xmm7,xmm1 + psrld xmm0,10 + movdqa xmm1,xmm4 + + psrld xmm4,17 + pxor xmm7,xmm2 + pslld xmm1,13 + paddd xmm6,xmm7 + pxor xmm0,xmm4 + psrld xmm4,19-17 + pxor xmm0,xmm1 + pslld xmm1,15-13 + pxor xmm0,xmm4 + pxor xmm0,xmm1 + paddd xmm6,xmm0 + movdqa xmm7,xmm9 + + movdqa xmm2,xmm9 + + psrld xmm7,6 + movdqa xmm1,xmm9 + pslld xmm2,7 + movdqa XMMWORD[(48-128)+rax],xmm6 + paddd xmm6,xmm12 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm6,XMMWORD[((-32))+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm9 + + pxor xmm7,xmm2 + movdqa xmm4,xmm9 + pslld xmm2,26-21 + pandn xmm0,xmm11 + pand xmm4,xmm10 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm13 + pxor xmm7,xmm2 + movdqa xmm2,xmm13 + psrld xmm1,2 + paddd xmm6,xmm7 + pxor xmm0,xmm4 + movdqa xmm4,xmm14 + movdqa xmm7,xmm13 + pslld xmm2,10 + pxor xmm4,xmm13 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm6,xmm0 + pslld xmm2,19-10 + pand xmm3,xmm4 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm12,xmm14 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm12,xmm3 + paddd xmm8,xmm6 + pxor xmm7,xmm2 + + paddd xmm12,xmm6 + paddd xmm12,xmm7 + movdqa xmm6,XMMWORD[((80-128))+rax] + paddd xmm5,XMMWORD[((208-128))+rax] + + movdqa xmm7,xmm6 + movdqa xmm1,xmm6 + psrld xmm7,3 + movdqa xmm2,xmm6 + + psrld xmm1,7 + movdqa xmm0,XMMWORD[((32-128))+rax] + pslld xmm2,14 + pxor xmm7,xmm1 + psrld xmm1,18-7 + movdqa xmm3,xmm0 + pxor xmm7,xmm2 + pslld xmm2,25-14 + pxor xmm7,xmm1 + psrld xmm0,10 + movdqa xmm1,xmm3 + + psrld xmm3,17 + pxor xmm7,xmm2 + pslld xmm1,13 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + psrld xmm3,19-17 + pxor xmm0,xmm1 + pslld xmm1,15-13 + pxor xmm0,xmm3 + pxor xmm0,xmm1 + paddd xmm5,xmm0 + movdqa xmm7,xmm8 + + movdqa xmm2,xmm8 + + psrld xmm7,6 + movdqa xmm1,xmm8 + pslld xmm2,7 + movdqa XMMWORD[(64-128)+rax],xmm5 + paddd xmm5,xmm11 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm8 + + pxor xmm7,xmm2 + movdqa xmm3,xmm8 + pslld xmm2,26-21 + pandn xmm0,xmm10 + pand xmm3,xmm9 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm12 + pxor xmm7,xmm2 + movdqa xmm2,xmm12 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + movdqa xmm3,xmm13 + movdqa xmm7,xmm12 + pslld xmm2,10 + pxor xmm3,xmm12 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm4,xmm3 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm11,xmm13 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm11,xmm4 + paddd xmm15,xmm5 + pxor xmm7,xmm2 + + paddd xmm11,xmm5 + paddd xmm11,xmm7 + movdqa xmm5,XMMWORD[((96-128))+rax] + paddd xmm6,XMMWORD[((224-128))+rax] + + movdqa xmm7,xmm5 + movdqa xmm1,xmm5 + psrld xmm7,3 + movdqa xmm2,xmm5 + + psrld xmm1,7 + movdqa xmm0,XMMWORD[((48-128))+rax] + pslld xmm2,14 + pxor xmm7,xmm1 + psrld xmm1,18-7 + movdqa xmm4,xmm0 + pxor xmm7,xmm2 + pslld xmm2,25-14 + pxor xmm7,xmm1 + psrld xmm0,10 + movdqa xmm1,xmm4 + + psrld xmm4,17 + pxor xmm7,xmm2 + pslld xmm1,13 + paddd xmm6,xmm7 + pxor xmm0,xmm4 + psrld xmm4,19-17 + pxor xmm0,xmm1 + pslld xmm1,15-13 + pxor xmm0,xmm4 + pxor xmm0,xmm1 + paddd xmm6,xmm0 + movdqa xmm7,xmm15 + + movdqa xmm2,xmm15 + + psrld xmm7,6 + movdqa xmm1,xmm15 + pslld xmm2,7 + movdqa 
XMMWORD[(80-128)+rax],xmm6 + paddd xmm6,xmm10 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm6,XMMWORD[32+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm15 + + pxor xmm7,xmm2 + movdqa xmm4,xmm15 + pslld xmm2,26-21 + pandn xmm0,xmm9 + pand xmm4,xmm8 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm11 + pxor xmm7,xmm2 + movdqa xmm2,xmm11 + psrld xmm1,2 + paddd xmm6,xmm7 + pxor xmm0,xmm4 + movdqa xmm4,xmm12 + movdqa xmm7,xmm11 + pslld xmm2,10 + pxor xmm4,xmm11 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm6,xmm0 + pslld xmm2,19-10 + pand xmm3,xmm4 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm10,xmm12 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm10,xmm3 + paddd xmm14,xmm6 + pxor xmm7,xmm2 + + paddd xmm10,xmm6 + paddd xmm10,xmm7 + movdqa xmm6,XMMWORD[((112-128))+rax] + paddd xmm5,XMMWORD[((240-128))+rax] + + movdqa xmm7,xmm6 + movdqa xmm1,xmm6 + psrld xmm7,3 + movdqa xmm2,xmm6 + + psrld xmm1,7 + movdqa xmm0,XMMWORD[((64-128))+rax] + pslld xmm2,14 + pxor xmm7,xmm1 + psrld xmm1,18-7 + movdqa xmm3,xmm0 + pxor xmm7,xmm2 + pslld xmm2,25-14 + pxor xmm7,xmm1 + psrld xmm0,10 + movdqa xmm1,xmm3 + + psrld xmm3,17 + pxor xmm7,xmm2 + pslld xmm1,13 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + psrld xmm3,19-17 + pxor xmm0,xmm1 + pslld xmm1,15-13 + pxor xmm0,xmm3 + pxor xmm0,xmm1 + paddd xmm5,xmm0 + movdqa xmm7,xmm14 + + movdqa xmm2,xmm14 + + psrld xmm7,6 + movdqa xmm1,xmm14 + pslld xmm2,7 + movdqa XMMWORD[(96-128)+rax],xmm5 + paddd xmm5,xmm9 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[64+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm14 + + pxor xmm7,xmm2 + movdqa xmm3,xmm14 + pslld xmm2,26-21 + pandn xmm0,xmm8 + pand xmm3,xmm15 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm10 + pxor xmm7,xmm2 + movdqa xmm2,xmm10 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + movdqa xmm3,xmm11 + movdqa xmm7,xmm10 + pslld xmm2,10 + pxor xmm3,xmm10 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm4,xmm3 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm9,xmm11 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm9,xmm4 + paddd xmm13,xmm5 + pxor xmm7,xmm2 + + paddd xmm9,xmm5 + paddd xmm9,xmm7 + movdqa xmm5,XMMWORD[((128-128))+rax] + paddd xmm6,XMMWORD[((0-128))+rax] + + movdqa xmm7,xmm5 + movdqa xmm1,xmm5 + psrld xmm7,3 + movdqa xmm2,xmm5 + + psrld xmm1,7 + movdqa xmm0,XMMWORD[((80-128))+rax] + pslld xmm2,14 + pxor xmm7,xmm1 + psrld xmm1,18-7 + movdqa xmm4,xmm0 + pxor xmm7,xmm2 + pslld xmm2,25-14 + pxor xmm7,xmm1 + psrld xmm0,10 + movdqa xmm1,xmm4 + + psrld xmm4,17 + pxor xmm7,xmm2 + pslld xmm1,13 + paddd xmm6,xmm7 + pxor xmm0,xmm4 + psrld xmm4,19-17 + pxor xmm0,xmm1 + pslld xmm1,15-13 + pxor xmm0,xmm4 + pxor xmm0,xmm1 + paddd xmm6,xmm0 + movdqa xmm7,xmm13 + + movdqa xmm2,xmm13 + + psrld xmm7,6 + movdqa xmm1,xmm13 + pslld xmm2,7 + movdqa XMMWORD[(112-128)+rax],xmm6 + paddd xmm6,xmm8 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm6,XMMWORD[96+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm13 + + pxor xmm7,xmm2 + movdqa xmm4,xmm13 + pslld xmm2,26-21 + pandn xmm0,xmm15 + pand xmm4,xmm14 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm9 + pxor xmm7,xmm2 + movdqa xmm2,xmm9 + psrld xmm1,2 + paddd xmm6,xmm7 + pxor xmm0,xmm4 + movdqa xmm4,xmm10 + movdqa xmm7,xmm9 + pslld xmm2,10 + pxor xmm4,xmm9 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm6,xmm0 + pslld xmm2,19-10 + pand xmm3,xmm4 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm8,xmm10 + pslld xmm2,30-19 + pxor 
xmm7,xmm1 + pxor xmm8,xmm3 + paddd xmm12,xmm6 + pxor xmm7,xmm2 + + paddd xmm8,xmm6 + paddd xmm8,xmm7 + lea rbp,[256+rbp] + movdqa xmm6,XMMWORD[((144-128))+rax] + paddd xmm5,XMMWORD[((16-128))+rax] + + movdqa xmm7,xmm6 + movdqa xmm1,xmm6 + psrld xmm7,3 + movdqa xmm2,xmm6 + + psrld xmm1,7 + movdqa xmm0,XMMWORD[((96-128))+rax] + pslld xmm2,14 + pxor xmm7,xmm1 + psrld xmm1,18-7 + movdqa xmm3,xmm0 + pxor xmm7,xmm2 + pslld xmm2,25-14 + pxor xmm7,xmm1 + psrld xmm0,10 + movdqa xmm1,xmm3 + + psrld xmm3,17 + pxor xmm7,xmm2 + pslld xmm1,13 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + psrld xmm3,19-17 + pxor xmm0,xmm1 + pslld xmm1,15-13 + pxor xmm0,xmm3 + pxor xmm0,xmm1 + paddd xmm5,xmm0 + movdqa xmm7,xmm12 + + movdqa xmm2,xmm12 + + psrld xmm7,6 + movdqa xmm1,xmm12 + pslld xmm2,7 + movdqa XMMWORD[(128-128)+rax],xmm5 + paddd xmm5,xmm15 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[((-128))+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm12 + + pxor xmm7,xmm2 + movdqa xmm3,xmm12 + pslld xmm2,26-21 + pandn xmm0,xmm14 + pand xmm3,xmm13 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm8 + pxor xmm7,xmm2 + movdqa xmm2,xmm8 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + movdqa xmm3,xmm9 + movdqa xmm7,xmm8 + pslld xmm2,10 + pxor xmm3,xmm8 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm4,xmm3 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm15,xmm9 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm15,xmm4 + paddd xmm11,xmm5 + pxor xmm7,xmm2 + + paddd xmm15,xmm5 + paddd xmm15,xmm7 + movdqa xmm5,XMMWORD[((160-128))+rax] + paddd xmm6,XMMWORD[((32-128))+rax] + + movdqa xmm7,xmm5 + movdqa xmm1,xmm5 + psrld xmm7,3 + movdqa xmm2,xmm5 + + psrld xmm1,7 + movdqa xmm0,XMMWORD[((112-128))+rax] + pslld xmm2,14 + pxor xmm7,xmm1 + psrld xmm1,18-7 + movdqa xmm4,xmm0 + pxor xmm7,xmm2 + pslld xmm2,25-14 + pxor xmm7,xmm1 + psrld xmm0,10 + movdqa xmm1,xmm4 + + psrld xmm4,17 + pxor xmm7,xmm2 + pslld xmm1,13 + paddd xmm6,xmm7 + pxor xmm0,xmm4 + psrld xmm4,19-17 + pxor xmm0,xmm1 + pslld xmm1,15-13 + pxor xmm0,xmm4 + pxor xmm0,xmm1 + paddd xmm6,xmm0 + movdqa xmm7,xmm11 + + movdqa xmm2,xmm11 + + psrld xmm7,6 + movdqa xmm1,xmm11 + pslld xmm2,7 + movdqa XMMWORD[(144-128)+rax],xmm6 + paddd xmm6,xmm14 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm6,XMMWORD[((-96))+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm11 + + pxor xmm7,xmm2 + movdqa xmm4,xmm11 + pslld xmm2,26-21 + pandn xmm0,xmm13 + pand xmm4,xmm12 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm15 + pxor xmm7,xmm2 + movdqa xmm2,xmm15 + psrld xmm1,2 + paddd xmm6,xmm7 + pxor xmm0,xmm4 + movdqa xmm4,xmm8 + movdqa xmm7,xmm15 + pslld xmm2,10 + pxor xmm4,xmm15 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm6,xmm0 + pslld xmm2,19-10 + pand xmm3,xmm4 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm14,xmm8 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm14,xmm3 + paddd xmm10,xmm6 + pxor xmm7,xmm2 + + paddd xmm14,xmm6 + paddd xmm14,xmm7 + movdqa xmm6,XMMWORD[((176-128))+rax] + paddd xmm5,XMMWORD[((48-128))+rax] + + movdqa xmm7,xmm6 + movdqa xmm1,xmm6 + psrld xmm7,3 + movdqa xmm2,xmm6 + + psrld xmm1,7 + movdqa xmm0,XMMWORD[((128-128))+rax] + pslld xmm2,14 + pxor xmm7,xmm1 + psrld xmm1,18-7 + movdqa xmm3,xmm0 + pxor xmm7,xmm2 + pslld xmm2,25-14 + pxor xmm7,xmm1 + psrld xmm0,10 + movdqa xmm1,xmm3 + + psrld xmm3,17 + pxor xmm7,xmm2 + pslld xmm1,13 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + psrld xmm3,19-17 + pxor xmm0,xmm1 + pslld xmm1,15-13 + pxor xmm0,xmm3 + pxor xmm0,xmm1 + 
paddd xmm5,xmm0 + movdqa xmm7,xmm10 + + movdqa xmm2,xmm10 + + psrld xmm7,6 + movdqa xmm1,xmm10 + pslld xmm2,7 + movdqa XMMWORD[(160-128)+rax],xmm5 + paddd xmm5,xmm13 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[((-64))+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm10 + + pxor xmm7,xmm2 + movdqa xmm3,xmm10 + pslld xmm2,26-21 + pandn xmm0,xmm12 + pand xmm3,xmm11 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm14 + pxor xmm7,xmm2 + movdqa xmm2,xmm14 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + movdqa xmm3,xmm15 + movdqa xmm7,xmm14 + pslld xmm2,10 + pxor xmm3,xmm14 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm4,xmm3 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm13,xmm15 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm13,xmm4 + paddd xmm9,xmm5 + pxor xmm7,xmm2 + + paddd xmm13,xmm5 + paddd xmm13,xmm7 + movdqa xmm5,XMMWORD[((192-128))+rax] + paddd xmm6,XMMWORD[((64-128))+rax] + + movdqa xmm7,xmm5 + movdqa xmm1,xmm5 + psrld xmm7,3 + movdqa xmm2,xmm5 + + psrld xmm1,7 + movdqa xmm0,XMMWORD[((144-128))+rax] + pslld xmm2,14 + pxor xmm7,xmm1 + psrld xmm1,18-7 + movdqa xmm4,xmm0 + pxor xmm7,xmm2 + pslld xmm2,25-14 + pxor xmm7,xmm1 + psrld xmm0,10 + movdqa xmm1,xmm4 + + psrld xmm4,17 + pxor xmm7,xmm2 + pslld xmm1,13 + paddd xmm6,xmm7 + pxor xmm0,xmm4 + psrld xmm4,19-17 + pxor xmm0,xmm1 + pslld xmm1,15-13 + pxor xmm0,xmm4 + pxor xmm0,xmm1 + paddd xmm6,xmm0 + movdqa xmm7,xmm9 + + movdqa xmm2,xmm9 + + psrld xmm7,6 + movdqa xmm1,xmm9 + pslld xmm2,7 + movdqa XMMWORD[(176-128)+rax],xmm6 + paddd xmm6,xmm12 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm6,XMMWORD[((-32))+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm9 + + pxor xmm7,xmm2 + movdqa xmm4,xmm9 + pslld xmm2,26-21 + pandn xmm0,xmm11 + pand xmm4,xmm10 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm13 + pxor xmm7,xmm2 + movdqa xmm2,xmm13 + psrld xmm1,2 + paddd xmm6,xmm7 + pxor xmm0,xmm4 + movdqa xmm4,xmm14 + movdqa xmm7,xmm13 + pslld xmm2,10 + pxor xmm4,xmm13 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm6,xmm0 + pslld xmm2,19-10 + pand xmm3,xmm4 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm12,xmm14 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm12,xmm3 + paddd xmm8,xmm6 + pxor xmm7,xmm2 + + paddd xmm12,xmm6 + paddd xmm12,xmm7 + movdqa xmm6,XMMWORD[((208-128))+rax] + paddd xmm5,XMMWORD[((80-128))+rax] + + movdqa xmm7,xmm6 + movdqa xmm1,xmm6 + psrld xmm7,3 + movdqa xmm2,xmm6 + + psrld xmm1,7 + movdqa xmm0,XMMWORD[((160-128))+rax] + pslld xmm2,14 + pxor xmm7,xmm1 + psrld xmm1,18-7 + movdqa xmm3,xmm0 + pxor xmm7,xmm2 + pslld xmm2,25-14 + pxor xmm7,xmm1 + psrld xmm0,10 + movdqa xmm1,xmm3 + + psrld xmm3,17 + pxor xmm7,xmm2 + pslld xmm1,13 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + psrld xmm3,19-17 + pxor xmm0,xmm1 + pslld xmm1,15-13 + pxor xmm0,xmm3 + pxor xmm0,xmm1 + paddd xmm5,xmm0 + movdqa xmm7,xmm8 + + movdqa xmm2,xmm8 + + psrld xmm7,6 + movdqa xmm1,xmm8 + pslld xmm2,7 + movdqa XMMWORD[(192-128)+rax],xmm5 + paddd xmm5,xmm11 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm8 + + pxor xmm7,xmm2 + movdqa xmm3,xmm8 + pslld xmm2,26-21 + pandn xmm0,xmm10 + pand xmm3,xmm9 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm12 + pxor xmm7,xmm2 + movdqa xmm2,xmm12 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + movdqa xmm3,xmm13 + movdqa xmm7,xmm12 + pslld xmm2,10 + pxor xmm3,xmm12 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld 
xmm2,19-10 + pand xmm4,xmm3 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm11,xmm13 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm11,xmm4 + paddd xmm15,xmm5 + pxor xmm7,xmm2 + + paddd xmm11,xmm5 + paddd xmm11,xmm7 + movdqa xmm5,XMMWORD[((224-128))+rax] + paddd xmm6,XMMWORD[((96-128))+rax] + + movdqa xmm7,xmm5 + movdqa xmm1,xmm5 + psrld xmm7,3 + movdqa xmm2,xmm5 + + psrld xmm1,7 + movdqa xmm0,XMMWORD[((176-128))+rax] + pslld xmm2,14 + pxor xmm7,xmm1 + psrld xmm1,18-7 + movdqa xmm4,xmm0 + pxor xmm7,xmm2 + pslld xmm2,25-14 + pxor xmm7,xmm1 + psrld xmm0,10 + movdqa xmm1,xmm4 + + psrld xmm4,17 + pxor xmm7,xmm2 + pslld xmm1,13 + paddd xmm6,xmm7 + pxor xmm0,xmm4 + psrld xmm4,19-17 + pxor xmm0,xmm1 + pslld xmm1,15-13 + pxor xmm0,xmm4 + pxor xmm0,xmm1 + paddd xmm6,xmm0 + movdqa xmm7,xmm15 + + movdqa xmm2,xmm15 + + psrld xmm7,6 + movdqa xmm1,xmm15 + pslld xmm2,7 + movdqa XMMWORD[(208-128)+rax],xmm6 + paddd xmm6,xmm10 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm6,XMMWORD[32+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm15 + + pxor xmm7,xmm2 + movdqa xmm4,xmm15 + pslld xmm2,26-21 + pandn xmm0,xmm9 + pand xmm4,xmm8 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm11 + pxor xmm7,xmm2 + movdqa xmm2,xmm11 + psrld xmm1,2 + paddd xmm6,xmm7 + pxor xmm0,xmm4 + movdqa xmm4,xmm12 + movdqa xmm7,xmm11 + pslld xmm2,10 + pxor xmm4,xmm11 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm6,xmm0 + pslld xmm2,19-10 + pand xmm3,xmm4 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm10,xmm12 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm10,xmm3 + paddd xmm14,xmm6 + pxor xmm7,xmm2 + + paddd xmm10,xmm6 + paddd xmm10,xmm7 + movdqa xmm6,XMMWORD[((240-128))+rax] + paddd xmm5,XMMWORD[((112-128))+rax] + + movdqa xmm7,xmm6 + movdqa xmm1,xmm6 + psrld xmm7,3 + movdqa xmm2,xmm6 + + psrld xmm1,7 + movdqa xmm0,XMMWORD[((192-128))+rax] + pslld xmm2,14 + pxor xmm7,xmm1 + psrld xmm1,18-7 + movdqa xmm3,xmm0 + pxor xmm7,xmm2 + pslld xmm2,25-14 + pxor xmm7,xmm1 + psrld xmm0,10 + movdqa xmm1,xmm3 + + psrld xmm3,17 + pxor xmm7,xmm2 + pslld xmm1,13 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + psrld xmm3,19-17 + pxor xmm0,xmm1 + pslld xmm1,15-13 + pxor xmm0,xmm3 + pxor xmm0,xmm1 + paddd xmm5,xmm0 + movdqa xmm7,xmm14 + + movdqa xmm2,xmm14 + + psrld xmm7,6 + movdqa xmm1,xmm14 + pslld xmm2,7 + movdqa XMMWORD[(224-128)+rax],xmm5 + paddd xmm5,xmm9 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm5,XMMWORD[64+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm14 + + pxor xmm7,xmm2 + movdqa xmm3,xmm14 + pslld xmm2,26-21 + pandn xmm0,xmm8 + pand xmm3,xmm15 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm10 + pxor xmm7,xmm2 + movdqa xmm2,xmm10 + psrld xmm1,2 + paddd xmm5,xmm7 + pxor xmm0,xmm3 + movdqa xmm3,xmm11 + movdqa xmm7,xmm10 + pslld xmm2,10 + pxor xmm3,xmm10 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm5,xmm0 + pslld xmm2,19-10 + pand xmm4,xmm3 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm9,xmm11 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm9,xmm4 + paddd xmm13,xmm5 + pxor xmm7,xmm2 + + paddd xmm9,xmm5 + paddd xmm9,xmm7 + movdqa xmm5,XMMWORD[((0-128))+rax] + paddd xmm6,XMMWORD[((128-128))+rax] + + movdqa xmm7,xmm5 + movdqa xmm1,xmm5 + psrld xmm7,3 + movdqa xmm2,xmm5 + + psrld xmm1,7 + movdqa xmm0,XMMWORD[((208-128))+rax] + pslld xmm2,14 + pxor xmm7,xmm1 + psrld xmm1,18-7 + movdqa xmm4,xmm0 + pxor xmm7,xmm2 + pslld xmm2,25-14 + pxor xmm7,xmm1 + psrld xmm0,10 + movdqa xmm1,xmm4 + + psrld xmm4,17 + pxor xmm7,xmm2 + pslld xmm1,13 + paddd xmm6,xmm7 + pxor 
xmm0,xmm4 + psrld xmm4,19-17 + pxor xmm0,xmm1 + pslld xmm1,15-13 + pxor xmm0,xmm4 + pxor xmm0,xmm1 + paddd xmm6,xmm0 + movdqa xmm7,xmm13 + + movdqa xmm2,xmm13 + + psrld xmm7,6 + movdqa xmm1,xmm13 + pslld xmm2,7 + movdqa XMMWORD[(240-128)+rax],xmm6 + paddd xmm6,xmm8 + + psrld xmm1,11 + pxor xmm7,xmm2 + pslld xmm2,21-7 + paddd xmm6,XMMWORD[96+rbp] + pxor xmm7,xmm1 + + psrld xmm1,25-11 + movdqa xmm0,xmm13 + + pxor xmm7,xmm2 + movdqa xmm4,xmm13 + pslld xmm2,26-21 + pandn xmm0,xmm15 + pand xmm4,xmm14 + pxor xmm7,xmm1 + + + movdqa xmm1,xmm9 + pxor xmm7,xmm2 + movdqa xmm2,xmm9 + psrld xmm1,2 + paddd xmm6,xmm7 + pxor xmm0,xmm4 + movdqa xmm4,xmm10 + movdqa xmm7,xmm9 + pslld xmm2,10 + pxor xmm4,xmm9 + + + psrld xmm7,13 + pxor xmm1,xmm2 + paddd xmm6,xmm0 + pslld xmm2,19-10 + pand xmm3,xmm4 + pxor xmm1,xmm7 + + + psrld xmm7,22-13 + pxor xmm1,xmm2 + movdqa xmm8,xmm10 + pslld xmm2,30-19 + pxor xmm7,xmm1 + pxor xmm8,xmm3 + paddd xmm12,xmm6 + pxor xmm7,xmm2 + + paddd xmm8,xmm6 + paddd xmm8,xmm7 + lea rbp,[256+rbp] + dec ecx + jnz NEAR $L$oop_16_xx + + mov ecx,1 + lea rbp,[((K256+128))] + + movdqa xmm7,XMMWORD[rbx] + cmp ecx,DWORD[rbx] + pxor xmm0,xmm0 + cmovge r8,rbp + cmp ecx,DWORD[4+rbx] + movdqa xmm6,xmm7 + cmovge r9,rbp + cmp ecx,DWORD[8+rbx] + pcmpgtd xmm6,xmm0 + cmovge r10,rbp + cmp ecx,DWORD[12+rbx] + paddd xmm7,xmm6 + cmovge r11,rbp + + movdqu xmm0,XMMWORD[((0-128))+rdi] + pand xmm8,xmm6 + movdqu xmm1,XMMWORD[((32-128))+rdi] + pand xmm9,xmm6 + movdqu xmm2,XMMWORD[((64-128))+rdi] + pand xmm10,xmm6 + movdqu xmm5,XMMWORD[((96-128))+rdi] + pand xmm11,xmm6 + paddd xmm8,xmm0 + movdqu xmm0,XMMWORD[((128-128))+rdi] + pand xmm12,xmm6 + paddd xmm9,xmm1 + movdqu xmm1,XMMWORD[((160-128))+rdi] + pand xmm13,xmm6 + paddd xmm10,xmm2 + movdqu xmm2,XMMWORD[((192-128))+rdi] + pand xmm14,xmm6 + paddd xmm11,xmm5 + movdqu xmm5,XMMWORD[((224-128))+rdi] + pand xmm15,xmm6 + paddd xmm12,xmm0 + paddd xmm13,xmm1 + movdqu XMMWORD[(0-128)+rdi],xmm8 + paddd xmm14,xmm2 + movdqu XMMWORD[(32-128)+rdi],xmm9 + paddd xmm15,xmm5 + movdqu XMMWORD[(64-128)+rdi],xmm10 + movdqu XMMWORD[(96-128)+rdi],xmm11 + movdqu XMMWORD[(128-128)+rdi],xmm12 + movdqu XMMWORD[(160-128)+rdi],xmm13 + movdqu XMMWORD[(192-128)+rdi],xmm14 + movdqu XMMWORD[(224-128)+rdi],xmm15 + + movdqa XMMWORD[rbx],xmm7 + movdqa xmm6,XMMWORD[$L$pbswap] + dec edx + jnz NEAR $L$oop + + mov edx,DWORD[280+rsp] + lea rdi,[16+rdi] + lea rsi,[64+rsi] + dec edx + jnz NEAR $L$oop_grande + +$L$done: + mov rax,QWORD[272+rsp] + + movaps xmm6,XMMWORD[((-184))+rax] + movaps xmm7,XMMWORD[((-168))+rax] + movaps xmm8,XMMWORD[((-152))+rax] + movaps xmm9,XMMWORD[((-136))+rax] + movaps xmm10,XMMWORD[((-120))+rax] + movaps xmm11,XMMWORD[((-104))+rax] + movaps xmm12,XMMWORD[((-88))+rax] + movaps xmm13,XMMWORD[((-72))+rax] + movaps xmm14,XMMWORD[((-56))+rax] + movaps xmm15,XMMWORD[((-40))+rax] + mov rbp,QWORD[((-16))+rax] + + mov rbx,QWORD[((-8))+rax] + + lea rsp,[rax] + +$L$epilogue: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_sha256_multi_block: + +ALIGN 32 +sha256_multi_block_shaext: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_sha256_multi_block_shaext: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + + + +_shaext_shortcut: + mov rax,rsp + + push rbx + + push rbp + + lea rsp,[((-168))+rsp] + movaps XMMWORD[rsp],xmm6 + movaps XMMWORD[16+rsp],xmm7 + movaps XMMWORD[32+rsp],xmm8 + movaps XMMWORD[48+rsp],xmm9 + movaps XMMWORD[(-120)+rax],xmm10 + movaps XMMWORD[(-104)+rax],xmm11 + movaps 
XMMWORD[(-88)+rax],xmm12 + movaps XMMWORD[(-72)+rax],xmm13 + movaps XMMWORD[(-56)+rax],xmm14 + movaps XMMWORD[(-40)+rax],xmm15 + sub rsp,288 + shl edx,1 + and rsp,-256 + lea rdi,[128+rdi] + mov QWORD[272+rsp],rax +$L$body_shaext: + lea rbx,[256+rsp] + lea rbp,[((K256_shaext+128))] + +$L$oop_grande_shaext: + mov DWORD[280+rsp],edx + xor edx,edx + mov r8,QWORD[rsi] + mov ecx,DWORD[8+rsi] + cmp ecx,edx + cmovg edx,ecx + test ecx,ecx + mov DWORD[rbx],ecx + cmovle r8,rsp + mov r9,QWORD[16+rsi] + mov ecx,DWORD[24+rsi] + cmp ecx,edx + cmovg edx,ecx + test ecx,ecx + mov DWORD[4+rbx],ecx + cmovle r9,rsp + test edx,edx + jz NEAR $L$done_shaext + + movq xmm12,QWORD[((0-128))+rdi] + movq xmm4,QWORD[((32-128))+rdi] + movq xmm13,QWORD[((64-128))+rdi] + movq xmm5,QWORD[((96-128))+rdi] + movq xmm8,QWORD[((128-128))+rdi] + movq xmm9,QWORD[((160-128))+rdi] + movq xmm10,QWORD[((192-128))+rdi] + movq xmm11,QWORD[((224-128))+rdi] + + punpckldq xmm12,xmm4 + punpckldq xmm13,xmm5 + punpckldq xmm8,xmm9 + punpckldq xmm10,xmm11 + movdqa xmm3,XMMWORD[((K256_shaext-16))] + + movdqa xmm14,xmm12 + movdqa xmm15,xmm13 + punpcklqdq xmm12,xmm8 + punpcklqdq xmm13,xmm10 + punpckhqdq xmm14,xmm8 + punpckhqdq xmm15,xmm10 + + pshufd xmm12,xmm12,27 + pshufd xmm13,xmm13,27 + pshufd xmm14,xmm14,27 + pshufd xmm15,xmm15,27 + jmp NEAR $L$oop_shaext + +ALIGN 32 +$L$oop_shaext: + movdqu xmm4,XMMWORD[r8] + movdqu xmm8,XMMWORD[r9] + movdqu xmm5,XMMWORD[16+r8] + movdqu xmm9,XMMWORD[16+r9] + movdqu xmm6,XMMWORD[32+r8] +DB 102,15,56,0,227 + movdqu xmm10,XMMWORD[32+r9] +DB 102,68,15,56,0,195 + movdqu xmm7,XMMWORD[48+r8] + lea r8,[64+r8] + movdqu xmm11,XMMWORD[48+r9] + lea r9,[64+r9] + + movdqa xmm0,XMMWORD[((0-128))+rbp] +DB 102,15,56,0,235 + paddd xmm0,xmm4 + pxor xmm4,xmm12 + movdqa xmm1,xmm0 + movdqa xmm2,XMMWORD[((0-128))+rbp] +DB 102,68,15,56,0,203 + paddd xmm2,xmm8 + movdqa XMMWORD[80+rsp],xmm13 +DB 69,15,56,203,236 + pxor xmm8,xmm14 + movdqa xmm0,xmm2 + movdqa XMMWORD[112+rsp],xmm15 +DB 69,15,56,203,254 + pshufd xmm0,xmm1,0x0e + pxor xmm4,xmm12 + movdqa XMMWORD[64+rsp],xmm12 +DB 69,15,56,203,229 + pshufd xmm0,xmm2,0x0e + pxor xmm8,xmm14 + movdqa XMMWORD[96+rsp],xmm14 + movdqa xmm1,XMMWORD[((16-128))+rbp] + paddd xmm1,xmm5 +DB 102,15,56,0,243 +DB 69,15,56,203,247 + + movdqa xmm0,xmm1 + movdqa xmm2,XMMWORD[((16-128))+rbp] + paddd xmm2,xmm9 +DB 69,15,56,203,236 + movdqa xmm0,xmm2 + prefetcht0 [127+r8] +DB 102,15,56,0,251 +DB 102,68,15,56,0,211 + prefetcht0 [127+r9] +DB 69,15,56,203,254 + pshufd xmm0,xmm1,0x0e +DB 102,68,15,56,0,219 +DB 15,56,204,229 +DB 69,15,56,203,229 + pshufd xmm0,xmm2,0x0e + movdqa xmm1,XMMWORD[((32-128))+rbp] + paddd xmm1,xmm6 +DB 69,15,56,203,247 + + movdqa xmm0,xmm1 + movdqa xmm2,XMMWORD[((32-128))+rbp] + paddd xmm2,xmm10 +DB 69,15,56,203,236 +DB 69,15,56,204,193 + movdqa xmm0,xmm2 + movdqa xmm3,xmm7 +DB 69,15,56,203,254 + pshufd xmm0,xmm1,0x0e +DB 102,15,58,15,222,4 + paddd xmm4,xmm3 + movdqa xmm3,xmm11 +DB 102,65,15,58,15,218,4 +DB 15,56,204,238 +DB 69,15,56,203,229 + pshufd xmm0,xmm2,0x0e + movdqa xmm1,XMMWORD[((48-128))+rbp] + paddd xmm1,xmm7 +DB 69,15,56,203,247 +DB 69,15,56,204,202 + + movdqa xmm0,xmm1 + movdqa xmm2,XMMWORD[((48-128))+rbp] + paddd xmm8,xmm3 + paddd xmm2,xmm11 +DB 15,56,205,231 +DB 69,15,56,203,236 + movdqa xmm0,xmm2 + movdqa xmm3,xmm4 +DB 102,15,58,15,223,4 +DB 69,15,56,203,254 +DB 69,15,56,205,195 + pshufd xmm0,xmm1,0x0e + paddd xmm5,xmm3 + movdqa xmm3,xmm8 +DB 102,65,15,58,15,219,4 +DB 15,56,204,247 +DB 69,15,56,203,229 + pshufd xmm0,xmm2,0x0e + movdqa xmm1,XMMWORD[((64-128))+rbp] + paddd 
xmm1,xmm4 +DB 69,15,56,203,247 +DB 69,15,56,204,211 + movdqa xmm0,xmm1 + movdqa xmm2,XMMWORD[((64-128))+rbp] + paddd xmm9,xmm3 + paddd xmm2,xmm8 +DB 15,56,205,236 +DB 69,15,56,203,236 + movdqa xmm0,xmm2 + movdqa xmm3,xmm5 +DB 102,15,58,15,220,4 +DB 69,15,56,203,254 +DB 69,15,56,205,200 + pshufd xmm0,xmm1,0x0e + paddd xmm6,xmm3 + movdqa xmm3,xmm9 +DB 102,65,15,58,15,216,4 +DB 15,56,204,252 +DB 69,15,56,203,229 + pshufd xmm0,xmm2,0x0e + movdqa xmm1,XMMWORD[((80-128))+rbp] + paddd xmm1,xmm5 +DB 69,15,56,203,247 +DB 69,15,56,204,216 + movdqa xmm0,xmm1 + movdqa xmm2,XMMWORD[((80-128))+rbp] + paddd xmm10,xmm3 + paddd xmm2,xmm9 +DB 15,56,205,245 +DB 69,15,56,203,236 + movdqa xmm0,xmm2 + movdqa xmm3,xmm6 +DB 102,15,58,15,221,4 +DB 69,15,56,203,254 +DB 69,15,56,205,209 + pshufd xmm0,xmm1,0x0e + paddd xmm7,xmm3 + movdqa xmm3,xmm10 +DB 102,65,15,58,15,217,4 +DB 15,56,204,229 +DB 69,15,56,203,229 + pshufd xmm0,xmm2,0x0e + movdqa xmm1,XMMWORD[((96-128))+rbp] + paddd xmm1,xmm6 +DB 69,15,56,203,247 +DB 69,15,56,204,193 + movdqa xmm0,xmm1 + movdqa xmm2,XMMWORD[((96-128))+rbp] + paddd xmm11,xmm3 + paddd xmm2,xmm10 +DB 15,56,205,254 +DB 69,15,56,203,236 + movdqa xmm0,xmm2 + movdqa xmm3,xmm7 +DB 102,15,58,15,222,4 +DB 69,15,56,203,254 +DB 69,15,56,205,218 + pshufd xmm0,xmm1,0x0e + paddd xmm4,xmm3 + movdqa xmm3,xmm11 +DB 102,65,15,58,15,218,4 +DB 15,56,204,238 +DB 69,15,56,203,229 + pshufd xmm0,xmm2,0x0e + movdqa xmm1,XMMWORD[((112-128))+rbp] + paddd xmm1,xmm7 +DB 69,15,56,203,247 +DB 69,15,56,204,202 + movdqa xmm0,xmm1 + movdqa xmm2,XMMWORD[((112-128))+rbp] + paddd xmm8,xmm3 + paddd xmm2,xmm11 +DB 15,56,205,231 +DB 69,15,56,203,236 + movdqa xmm0,xmm2 + movdqa xmm3,xmm4 +DB 102,15,58,15,223,4 +DB 69,15,56,203,254 +DB 69,15,56,205,195 + pshufd xmm0,xmm1,0x0e + paddd xmm5,xmm3 + movdqa xmm3,xmm8 +DB 102,65,15,58,15,219,4 +DB 15,56,204,247 +DB 69,15,56,203,229 + pshufd xmm0,xmm2,0x0e + movdqa xmm1,XMMWORD[((128-128))+rbp] + paddd xmm1,xmm4 +DB 69,15,56,203,247 +DB 69,15,56,204,211 + movdqa xmm0,xmm1 + movdqa xmm2,XMMWORD[((128-128))+rbp] + paddd xmm9,xmm3 + paddd xmm2,xmm8 +DB 15,56,205,236 +DB 69,15,56,203,236 + movdqa xmm0,xmm2 + movdqa xmm3,xmm5 +DB 102,15,58,15,220,4 +DB 69,15,56,203,254 +DB 69,15,56,205,200 + pshufd xmm0,xmm1,0x0e + paddd xmm6,xmm3 + movdqa xmm3,xmm9 +DB 102,65,15,58,15,216,4 +DB 15,56,204,252 +DB 69,15,56,203,229 + pshufd xmm0,xmm2,0x0e + movdqa xmm1,XMMWORD[((144-128))+rbp] + paddd xmm1,xmm5 +DB 69,15,56,203,247 +DB 69,15,56,204,216 + movdqa xmm0,xmm1 + movdqa xmm2,XMMWORD[((144-128))+rbp] + paddd xmm10,xmm3 + paddd xmm2,xmm9 +DB 15,56,205,245 +DB 69,15,56,203,236 + movdqa xmm0,xmm2 + movdqa xmm3,xmm6 +DB 102,15,58,15,221,4 +DB 69,15,56,203,254 +DB 69,15,56,205,209 + pshufd xmm0,xmm1,0x0e + paddd xmm7,xmm3 + movdqa xmm3,xmm10 +DB 102,65,15,58,15,217,4 +DB 15,56,204,229 +DB 69,15,56,203,229 + pshufd xmm0,xmm2,0x0e + movdqa xmm1,XMMWORD[((160-128))+rbp] + paddd xmm1,xmm6 +DB 69,15,56,203,247 +DB 69,15,56,204,193 + movdqa xmm0,xmm1 + movdqa xmm2,XMMWORD[((160-128))+rbp] + paddd xmm11,xmm3 + paddd xmm2,xmm10 +DB 15,56,205,254 +DB 69,15,56,203,236 + movdqa xmm0,xmm2 + movdqa xmm3,xmm7 +DB 102,15,58,15,222,4 +DB 69,15,56,203,254 +DB 69,15,56,205,218 + pshufd xmm0,xmm1,0x0e + paddd xmm4,xmm3 + movdqa xmm3,xmm11 +DB 102,65,15,58,15,218,4 +DB 15,56,204,238 +DB 69,15,56,203,229 + pshufd xmm0,xmm2,0x0e + movdqa xmm1,XMMWORD[((176-128))+rbp] + paddd xmm1,xmm7 +DB 69,15,56,203,247 +DB 69,15,56,204,202 + movdqa xmm0,xmm1 + movdqa xmm2,XMMWORD[((176-128))+rbp] + paddd xmm8,xmm3 + paddd xmm2,xmm11 +DB 
15,56,205,231 +DB 69,15,56,203,236 + movdqa xmm0,xmm2 + movdqa xmm3,xmm4 +DB 102,15,58,15,223,4 +DB 69,15,56,203,254 +DB 69,15,56,205,195 + pshufd xmm0,xmm1,0x0e + paddd xmm5,xmm3 + movdqa xmm3,xmm8 +DB 102,65,15,58,15,219,4 +DB 15,56,204,247 +DB 69,15,56,203,229 + pshufd xmm0,xmm2,0x0e + movdqa xmm1,XMMWORD[((192-128))+rbp] + paddd xmm1,xmm4 +DB 69,15,56,203,247 +DB 69,15,56,204,211 + movdqa xmm0,xmm1 + movdqa xmm2,XMMWORD[((192-128))+rbp] + paddd xmm9,xmm3 + paddd xmm2,xmm8 +DB 15,56,205,236 +DB 69,15,56,203,236 + movdqa xmm0,xmm2 + movdqa xmm3,xmm5 +DB 102,15,58,15,220,4 +DB 69,15,56,203,254 +DB 69,15,56,205,200 + pshufd xmm0,xmm1,0x0e + paddd xmm6,xmm3 + movdqa xmm3,xmm9 +DB 102,65,15,58,15,216,4 +DB 15,56,204,252 +DB 69,15,56,203,229 + pshufd xmm0,xmm2,0x0e + movdqa xmm1,XMMWORD[((208-128))+rbp] + paddd xmm1,xmm5 +DB 69,15,56,203,247 +DB 69,15,56,204,216 + movdqa xmm0,xmm1 + movdqa xmm2,XMMWORD[((208-128))+rbp] + paddd xmm10,xmm3 + paddd xmm2,xmm9 +DB 15,56,205,245 +DB 69,15,56,203,236 + movdqa xmm0,xmm2 + movdqa xmm3,xmm6 +DB 102,15,58,15,221,4 +DB 69,15,56,203,254 +DB 69,15,56,205,209 + pshufd xmm0,xmm1,0x0e + paddd xmm7,xmm3 + movdqa xmm3,xmm10 +DB 102,65,15,58,15,217,4 + nop +DB 69,15,56,203,229 + pshufd xmm0,xmm2,0x0e + movdqa xmm1,XMMWORD[((224-128))+rbp] + paddd xmm1,xmm6 +DB 69,15,56,203,247 + + movdqa xmm0,xmm1 + movdqa xmm2,XMMWORD[((224-128))+rbp] + paddd xmm11,xmm3 + paddd xmm2,xmm10 +DB 15,56,205,254 + nop +DB 69,15,56,203,236 + movdqa xmm0,xmm2 + mov ecx,1 + pxor xmm6,xmm6 +DB 69,15,56,203,254 +DB 69,15,56,205,218 + pshufd xmm0,xmm1,0x0e + movdqa xmm1,XMMWORD[((240-128))+rbp] + paddd xmm1,xmm7 + movq xmm7,QWORD[rbx] + nop +DB 69,15,56,203,229 + pshufd xmm0,xmm2,0x0e + movdqa xmm2,XMMWORD[((240-128))+rbp] + paddd xmm2,xmm11 +DB 69,15,56,203,247 + + movdqa xmm0,xmm1 + cmp ecx,DWORD[rbx] + cmovge r8,rsp + cmp ecx,DWORD[4+rbx] + cmovge r9,rsp + pshufd xmm9,xmm7,0x00 +DB 69,15,56,203,236 + movdqa xmm0,xmm2 + pshufd xmm10,xmm7,0x55 + movdqa xmm11,xmm7 +DB 69,15,56,203,254 + pshufd xmm0,xmm1,0x0e + pcmpgtd xmm9,xmm6 + pcmpgtd xmm10,xmm6 +DB 69,15,56,203,229 + pshufd xmm0,xmm2,0x0e + pcmpgtd xmm11,xmm6 + movdqa xmm3,XMMWORD[((K256_shaext-16))] +DB 69,15,56,203,247 + + pand xmm13,xmm9 + pand xmm15,xmm10 + pand xmm12,xmm9 + pand xmm14,xmm10 + paddd xmm11,xmm7 + + paddd xmm13,XMMWORD[80+rsp] + paddd xmm15,XMMWORD[112+rsp] + paddd xmm12,XMMWORD[64+rsp] + paddd xmm14,XMMWORD[96+rsp] + + movq QWORD[rbx],xmm11 + dec edx + jnz NEAR $L$oop_shaext + + mov edx,DWORD[280+rsp] + + pshufd xmm12,xmm12,27 + pshufd xmm13,xmm13,27 + pshufd xmm14,xmm14,27 + pshufd xmm15,xmm15,27 + + movdqa xmm5,xmm12 + movdqa xmm6,xmm13 + punpckldq xmm12,xmm14 + punpckhdq xmm5,xmm14 + punpckldq xmm13,xmm15 + punpckhdq xmm6,xmm15 + + movq QWORD[(0-128)+rdi],xmm12 + psrldq xmm12,8 + movq QWORD[(128-128)+rdi],xmm5 + psrldq xmm5,8 + movq QWORD[(32-128)+rdi],xmm12 + movq QWORD[(160-128)+rdi],xmm5 + + movq QWORD[(64-128)+rdi],xmm13 + psrldq xmm13,8 + movq QWORD[(192-128)+rdi],xmm6 + psrldq xmm6,8 + movq QWORD[(96-128)+rdi],xmm13 + movq QWORD[(224-128)+rdi],xmm6 + + lea rdi,[8+rdi] + lea rsi,[32+rsi] + dec edx + jnz NEAR $L$oop_grande_shaext + +$L$done_shaext: + + movaps xmm6,XMMWORD[((-184))+rax] + movaps xmm7,XMMWORD[((-168))+rax] + movaps xmm8,XMMWORD[((-152))+rax] + movaps xmm9,XMMWORD[((-136))+rax] + movaps xmm10,XMMWORD[((-120))+rax] + movaps xmm11,XMMWORD[((-104))+rax] + movaps xmm12,XMMWORD[((-88))+rax] + movaps xmm13,XMMWORD[((-72))+rax] + movaps xmm14,XMMWORD[((-56))+rax] + movaps xmm15,XMMWORD[((-40))+rax] + 
mov rbp,QWORD[((-16))+rax] + + mov rbx,QWORD[((-8))+rax] + + lea rsp,[rax] + +$L$epilogue_shaext: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_sha256_multi_block_shaext: +ALIGN 256 +K256: + DD 1116352408,1116352408,1116352408,1116352408 + DD 1116352408,1116352408,1116352408,1116352408 + DD 1899447441,1899447441,1899447441,1899447441 + DD 1899447441,1899447441,1899447441,1899447441 + DD 3049323471,3049323471,3049323471,3049323471 + DD 3049323471,3049323471,3049323471,3049323471 + DD 3921009573,3921009573,3921009573,3921009573 + DD 3921009573,3921009573,3921009573,3921009573 + DD 961987163,961987163,961987163,961987163 + DD 961987163,961987163,961987163,961987163 + DD 1508970993,1508970993,1508970993,1508970993 + DD 1508970993,1508970993,1508970993,1508970993 + DD 2453635748,2453635748,2453635748,2453635748 + DD 2453635748,2453635748,2453635748,2453635748 + DD 2870763221,2870763221,2870763221,2870763221 + DD 2870763221,2870763221,2870763221,2870763221 + DD 3624381080,3624381080,3624381080,3624381080 + DD 3624381080,3624381080,3624381080,3624381080 + DD 310598401,310598401,310598401,310598401 + DD 310598401,310598401,310598401,310598401 + DD 607225278,607225278,607225278,607225278 + DD 607225278,607225278,607225278,607225278 + DD 1426881987,1426881987,1426881987,1426881987 + DD 1426881987,1426881987,1426881987,1426881987 + DD 1925078388,1925078388,1925078388,1925078388 + DD 1925078388,1925078388,1925078388,1925078388 + DD 2162078206,2162078206,2162078206,2162078206 + DD 2162078206,2162078206,2162078206,2162078206 + DD 2614888103,2614888103,2614888103,2614888103 + DD 2614888103,2614888103,2614888103,2614888103 + DD 3248222580,3248222580,3248222580,3248222580 + DD 3248222580,3248222580,3248222580,3248222580 + DD 3835390401,3835390401,3835390401,3835390401 + DD 3835390401,3835390401,3835390401,3835390401 + DD 4022224774,4022224774,4022224774,4022224774 + DD 4022224774,4022224774,4022224774,4022224774 + DD 264347078,264347078,264347078,264347078 + DD 264347078,264347078,264347078,264347078 + DD 604807628,604807628,604807628,604807628 + DD 604807628,604807628,604807628,604807628 + DD 770255983,770255983,770255983,770255983 + DD 770255983,770255983,770255983,770255983 + DD 1249150122,1249150122,1249150122,1249150122 + DD 1249150122,1249150122,1249150122,1249150122 + DD 1555081692,1555081692,1555081692,1555081692 + DD 1555081692,1555081692,1555081692,1555081692 + DD 1996064986,1996064986,1996064986,1996064986 + DD 1996064986,1996064986,1996064986,1996064986 + DD 2554220882,2554220882,2554220882,2554220882 + DD 2554220882,2554220882,2554220882,2554220882 + DD 2821834349,2821834349,2821834349,2821834349 + DD 2821834349,2821834349,2821834349,2821834349 + DD 2952996808,2952996808,2952996808,2952996808 + DD 2952996808,2952996808,2952996808,2952996808 + DD 3210313671,3210313671,3210313671,3210313671 + DD 3210313671,3210313671,3210313671,3210313671 + DD 3336571891,3336571891,3336571891,3336571891 + DD 3336571891,3336571891,3336571891,3336571891 + DD 3584528711,3584528711,3584528711,3584528711 + DD 3584528711,3584528711,3584528711,3584528711 + DD 113926993,113926993,113926993,113926993 + DD 113926993,113926993,113926993,113926993 + DD 338241895,338241895,338241895,338241895 + DD 338241895,338241895,338241895,338241895 + DD 666307205,666307205,666307205,666307205 + DD 666307205,666307205,666307205,666307205 + DD 773529912,773529912,773529912,773529912 + DD 773529912,773529912,773529912,773529912 + DD 1294757372,1294757372,1294757372,1294757372 + DD 
1294757372,1294757372,1294757372,1294757372 + DD 1396182291,1396182291,1396182291,1396182291 + DD 1396182291,1396182291,1396182291,1396182291 + DD 1695183700,1695183700,1695183700,1695183700 + DD 1695183700,1695183700,1695183700,1695183700 + DD 1986661051,1986661051,1986661051,1986661051 + DD 1986661051,1986661051,1986661051,1986661051 + DD 2177026350,2177026350,2177026350,2177026350 + DD 2177026350,2177026350,2177026350,2177026350 + DD 2456956037,2456956037,2456956037,2456956037 + DD 2456956037,2456956037,2456956037,2456956037 + DD 2730485921,2730485921,2730485921,2730485921 + DD 2730485921,2730485921,2730485921,2730485921 + DD 2820302411,2820302411,2820302411,2820302411 + DD 2820302411,2820302411,2820302411,2820302411 + DD 3259730800,3259730800,3259730800,3259730800 + DD 3259730800,3259730800,3259730800,3259730800 + DD 3345764771,3345764771,3345764771,3345764771 + DD 3345764771,3345764771,3345764771,3345764771 + DD 3516065817,3516065817,3516065817,3516065817 + DD 3516065817,3516065817,3516065817,3516065817 + DD 3600352804,3600352804,3600352804,3600352804 + DD 3600352804,3600352804,3600352804,3600352804 + DD 4094571909,4094571909,4094571909,4094571909 + DD 4094571909,4094571909,4094571909,4094571909 + DD 275423344,275423344,275423344,275423344 + DD 275423344,275423344,275423344,275423344 + DD 430227734,430227734,430227734,430227734 + DD 430227734,430227734,430227734,430227734 + DD 506948616,506948616,506948616,506948616 + DD 506948616,506948616,506948616,506948616 + DD 659060556,659060556,659060556,659060556 + DD 659060556,659060556,659060556,659060556 + DD 883997877,883997877,883997877,883997877 + DD 883997877,883997877,883997877,883997877 + DD 958139571,958139571,958139571,958139571 + DD 958139571,958139571,958139571,958139571 + DD 1322822218,1322822218,1322822218,1322822218 + DD 1322822218,1322822218,1322822218,1322822218 + DD 1537002063,1537002063,1537002063,1537002063 + DD 1537002063,1537002063,1537002063,1537002063 + DD 1747873779,1747873779,1747873779,1747873779 + DD 1747873779,1747873779,1747873779,1747873779 + DD 1955562222,1955562222,1955562222,1955562222 + DD 1955562222,1955562222,1955562222,1955562222 + DD 2024104815,2024104815,2024104815,2024104815 + DD 2024104815,2024104815,2024104815,2024104815 + DD 2227730452,2227730452,2227730452,2227730452 + DD 2227730452,2227730452,2227730452,2227730452 + DD 2361852424,2361852424,2361852424,2361852424 + DD 2361852424,2361852424,2361852424,2361852424 + DD 2428436474,2428436474,2428436474,2428436474 + DD 2428436474,2428436474,2428436474,2428436474 + DD 2756734187,2756734187,2756734187,2756734187 + DD 2756734187,2756734187,2756734187,2756734187 + DD 3204031479,3204031479,3204031479,3204031479 + DD 3204031479,3204031479,3204031479,3204031479 + DD 3329325298,3329325298,3329325298,3329325298 + DD 3329325298,3329325298,3329325298,3329325298 +$L$pbswap: + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f +K256_shaext: + DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 + DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 + DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 + DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 + DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc + DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da + DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 + DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 + DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 + DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 + DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 + DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 + DD 
0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 + DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 + DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 + DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 +DB 83,72,65,50,53,54,32,109,117,108,116,105,45,98,108,111 +DB 99,107,32,116,114,97,110,115,102,111,114,109,32,102,111,114 +DB 32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71 +DB 65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112 +DB 101,110,115,115,108,46,111,114,103,62,0 +EXTERN __imp_RtlVirtualUnwind + +ALIGN 16 +se_handler: + push rsi + push rdi + push rbx + push rbp + push r12 + push r13 + push r14 + push r15 + pushfq + sub rsp,64 + + mov rax,QWORD[120+r8] + mov rbx,QWORD[248+r8] + + mov rsi,QWORD[8+r9] + mov r11,QWORD[56+r9] + + mov r10d,DWORD[r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jb NEAR $L$in_prologue + + mov rax,QWORD[152+r8] + + mov r10d,DWORD[4+r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jae NEAR $L$in_prologue + + mov rax,QWORD[272+rax] + + mov rbx,QWORD[((-8))+rax] + mov rbp,QWORD[((-16))+rax] + mov QWORD[144+r8],rbx + mov QWORD[160+r8],rbp + + lea rsi,[((-24-160))+rax] + lea rdi,[512+r8] + mov ecx,20 + DD 0xa548f3fc + +$L$in_prologue: + mov rdi,QWORD[8+rax] + mov rsi,QWORD[16+rax] + mov QWORD[152+r8],rax + mov QWORD[168+r8],rsi + mov QWORD[176+r8],rdi + + mov rdi,QWORD[40+r9] + mov rsi,r8 + mov ecx,154 + DD 0xa548f3fc + + mov rsi,r9 + xor rcx,rcx + mov rdx,QWORD[8+rsi] + mov r8,QWORD[rsi] + mov r9,QWORD[16+rsi] + mov r10,QWORD[40+rsi] + lea r11,[56+rsi] + lea r12,[24+rsi] + mov QWORD[32+rsp],r10 + mov QWORD[40+rsp],r11 + mov QWORD[48+rsp],r12 + mov QWORD[56+rsp],rcx + call QWORD[__imp_RtlVirtualUnwind] + + mov eax,1 + add rsp,64 + popfq + pop r15 + pop r14 + pop r13 + pop r12 + pop rbp + pop rbx + pop rdi + pop rsi + DB 0F3h,0C3h ;repret + +section .pdata rdata align=4 +ALIGN 4 + DD $L$SEH_begin_sha256_multi_block wrt ..imagebase + DD $L$SEH_end_sha256_multi_block wrt ..imagebase + DD $L$SEH_info_sha256_multi_block wrt ..imagebase + DD $L$SEH_begin_sha256_multi_block_shaext wrt ..imagebase + DD $L$SEH_end_sha256_multi_block_shaext wrt ..imagebase + DD $L$SEH_info_sha256_multi_block_shaext wrt ..imagebase +section .xdata rdata align=8 +ALIGN 8 +$L$SEH_info_sha256_multi_block: +DB 9,0,0,0 + DD se_handler wrt ..imagebase + DD $L$body wrt ..imagebase,$L$epilogue wrt ..imagebase +$L$SEH_info_sha256_multi_block_shaext: +DB 9,0,0,0 + DD se_handler wrt ..imagebase + DD $L$body_shaext wrt ..imagebase,$L$epilogue_shaext wrt ..imagebase diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm new file mode 100644 index 0000000000..70e49862a3 --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm @@ -0,0 +1,3313 @@ +; WARNING: do not edit! +; Generated from openssl/crypto/sha/asm/sha512-x86_64.pl +; +; Copyright 2005-2020 The OpenSSL Project Authors. All Rights Reserved. +; +; Licensed under the OpenSSL license (the "License"). You may not use +; this file except in compliance with the License. 
You can obtain a copy +; in the file LICENSE in the source distribution or at +; https://www.openssl.org/source/license.html + +default rel +%define XMMWORD +%define YMMWORD +%define ZMMWORD +section .text code align=64 + + +EXTERN OPENSSL_ia32cap_P +global sha256_block_data_order + +ALIGN 16 +sha256_block_data_order: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_sha256_block_data_order: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + + + + lea r11,[OPENSSL_ia32cap_P] + mov r9d,DWORD[r11] + mov r10d,DWORD[4+r11] + mov r11d,DWORD[8+r11] + test r11d,536870912 + jnz NEAR _shaext_shortcut + test r10d,512 + jnz NEAR $L$ssse3_shortcut + mov rax,rsp + + push rbx + + push rbp + + push r12 + + push r13 + + push r14 + + push r15 + + shl rdx,4 + sub rsp,16*4+4*8 + lea rdx,[rdx*4+rsi] + and rsp,-64 + mov QWORD[((64+0))+rsp],rdi + mov QWORD[((64+8))+rsp],rsi + mov QWORD[((64+16))+rsp],rdx + mov QWORD[88+rsp],rax + +$L$prologue: + + mov eax,DWORD[rdi] + mov ebx,DWORD[4+rdi] + mov ecx,DWORD[8+rdi] + mov edx,DWORD[12+rdi] + mov r8d,DWORD[16+rdi] + mov r9d,DWORD[20+rdi] + mov r10d,DWORD[24+rdi] + mov r11d,DWORD[28+rdi] + jmp NEAR $L$loop + +ALIGN 16 +$L$loop: + mov edi,ebx + lea rbp,[K256] + xor edi,ecx + mov r12d,DWORD[rsi] + mov r13d,r8d + mov r14d,eax + bswap r12d + ror r13d,14 + mov r15d,r9d + + xor r13d,r8d + ror r14d,9 + xor r15d,r10d + + mov DWORD[rsp],r12d + xor r14d,eax + and r15d,r8d + + ror r13d,5 + add r12d,r11d + xor r15d,r10d + + ror r14d,11 + xor r13d,r8d + add r12d,r15d + + mov r15d,eax + add r12d,DWORD[rbp] + xor r14d,eax + + xor r15d,ebx + ror r13d,6 + mov r11d,ebx + + and edi,r15d + ror r14d,2 + add r12d,r13d + + xor r11d,edi + add edx,r12d + add r11d,r12d + + lea rbp,[4+rbp] + add r11d,r14d + mov r12d,DWORD[4+rsi] + mov r13d,edx + mov r14d,r11d + bswap r12d + ror r13d,14 + mov edi,r8d + + xor r13d,edx + ror r14d,9 + xor edi,r9d + + mov DWORD[4+rsp],r12d + xor r14d,r11d + and edi,edx + + ror r13d,5 + add r12d,r10d + xor edi,r9d + + ror r14d,11 + xor r13d,edx + add r12d,edi + + mov edi,r11d + add r12d,DWORD[rbp] + xor r14d,r11d + + xor edi,eax + ror r13d,6 + mov r10d,eax + + and r15d,edi + ror r14d,2 + add r12d,r13d + + xor r10d,r15d + add ecx,r12d + add r10d,r12d + + lea rbp,[4+rbp] + add r10d,r14d + mov r12d,DWORD[8+rsi] + mov r13d,ecx + mov r14d,r10d + bswap r12d + ror r13d,14 + mov r15d,edx + + xor r13d,ecx + ror r14d,9 + xor r15d,r8d + + mov DWORD[8+rsp],r12d + xor r14d,r10d + and r15d,ecx + + ror r13d,5 + add r12d,r9d + xor r15d,r8d + + ror r14d,11 + xor r13d,ecx + add r12d,r15d + + mov r15d,r10d + add r12d,DWORD[rbp] + xor r14d,r10d + + xor r15d,r11d + ror r13d,6 + mov r9d,r11d + + and edi,r15d + ror r14d,2 + add r12d,r13d + + xor r9d,edi + add ebx,r12d + add r9d,r12d + + lea rbp,[4+rbp] + add r9d,r14d + mov r12d,DWORD[12+rsi] + mov r13d,ebx + mov r14d,r9d + bswap r12d + ror r13d,14 + mov edi,ecx + + xor r13d,ebx + ror r14d,9 + xor edi,edx + + mov DWORD[12+rsp],r12d + xor r14d,r9d + and edi,ebx + + ror r13d,5 + add r12d,r8d + xor edi,edx + + ror r14d,11 + xor r13d,ebx + add r12d,edi + + mov edi,r9d + add r12d,DWORD[rbp] + xor r14d,r9d + + xor edi,r10d + ror r13d,6 + mov r8d,r10d + + and r15d,edi + ror r14d,2 + add r12d,r13d + + xor r8d,r15d + add eax,r12d + add r8d,r12d + + lea rbp,[20+rbp] + add r8d,r14d + mov r12d,DWORD[16+rsi] + mov r13d,eax + mov r14d,r8d + bswap r12d + ror r13d,14 + mov r15d,ebx + + xor r13d,eax + ror r14d,9 + xor r15d,ecx + + mov DWORD[16+rsp],r12d + xor r14d,r8d + and r15d,eax + + ror r13d,5 + add r12d,edx + xor 
r15d,ecx + + ror r14d,11 + xor r13d,eax + add r12d,r15d + + mov r15d,r8d + add r12d,DWORD[rbp] + xor r14d,r8d + + xor r15d,r9d + ror r13d,6 + mov edx,r9d + + and edi,r15d + ror r14d,2 + add r12d,r13d + + xor edx,edi + add r11d,r12d + add edx,r12d + + lea rbp,[4+rbp] + add edx,r14d + mov r12d,DWORD[20+rsi] + mov r13d,r11d + mov r14d,edx + bswap r12d + ror r13d,14 + mov edi,eax + + xor r13d,r11d + ror r14d,9 + xor edi,ebx + + mov DWORD[20+rsp],r12d + xor r14d,edx + and edi,r11d + + ror r13d,5 + add r12d,ecx + xor edi,ebx + + ror r14d,11 + xor r13d,r11d + add r12d,edi + + mov edi,edx + add r12d,DWORD[rbp] + xor r14d,edx + + xor edi,r8d + ror r13d,6 + mov ecx,r8d + + and r15d,edi + ror r14d,2 + add r12d,r13d + + xor ecx,r15d + add r10d,r12d + add ecx,r12d + + lea rbp,[4+rbp] + add ecx,r14d + mov r12d,DWORD[24+rsi] + mov r13d,r10d + mov r14d,ecx + bswap r12d + ror r13d,14 + mov r15d,r11d + + xor r13d,r10d + ror r14d,9 + xor r15d,eax + + mov DWORD[24+rsp],r12d + xor r14d,ecx + and r15d,r10d + + ror r13d,5 + add r12d,ebx + xor r15d,eax + + ror r14d,11 + xor r13d,r10d + add r12d,r15d + + mov r15d,ecx + add r12d,DWORD[rbp] + xor r14d,ecx + + xor r15d,edx + ror r13d,6 + mov ebx,edx + + and edi,r15d + ror r14d,2 + add r12d,r13d + + xor ebx,edi + add r9d,r12d + add ebx,r12d + + lea rbp,[4+rbp] + add ebx,r14d + mov r12d,DWORD[28+rsi] + mov r13d,r9d + mov r14d,ebx + bswap r12d + ror r13d,14 + mov edi,r10d + + xor r13d,r9d + ror r14d,9 + xor edi,r11d + + mov DWORD[28+rsp],r12d + xor r14d,ebx + and edi,r9d + + ror r13d,5 + add r12d,eax + xor edi,r11d + + ror r14d,11 + xor r13d,r9d + add r12d,edi + + mov edi,ebx + add r12d,DWORD[rbp] + xor r14d,ebx + + xor edi,ecx + ror r13d,6 + mov eax,ecx + + and r15d,edi + ror r14d,2 + add r12d,r13d + + xor eax,r15d + add r8d,r12d + add eax,r12d + + lea rbp,[20+rbp] + add eax,r14d + mov r12d,DWORD[32+rsi] + mov r13d,r8d + mov r14d,eax + bswap r12d + ror r13d,14 + mov r15d,r9d + + xor r13d,r8d + ror r14d,9 + xor r15d,r10d + + mov DWORD[32+rsp],r12d + xor r14d,eax + and r15d,r8d + + ror r13d,5 + add r12d,r11d + xor r15d,r10d + + ror r14d,11 + xor r13d,r8d + add r12d,r15d + + mov r15d,eax + add r12d,DWORD[rbp] + xor r14d,eax + + xor r15d,ebx + ror r13d,6 + mov r11d,ebx + + and edi,r15d + ror r14d,2 + add r12d,r13d + + xor r11d,edi + add edx,r12d + add r11d,r12d + + lea rbp,[4+rbp] + add r11d,r14d + mov r12d,DWORD[36+rsi] + mov r13d,edx + mov r14d,r11d + bswap r12d + ror r13d,14 + mov edi,r8d + + xor r13d,edx + ror r14d,9 + xor edi,r9d + + mov DWORD[36+rsp],r12d + xor r14d,r11d + and edi,edx + + ror r13d,5 + add r12d,r10d + xor edi,r9d + + ror r14d,11 + xor r13d,edx + add r12d,edi + + mov edi,r11d + add r12d,DWORD[rbp] + xor r14d,r11d + + xor edi,eax + ror r13d,6 + mov r10d,eax + + and r15d,edi + ror r14d,2 + add r12d,r13d + + xor r10d,r15d + add ecx,r12d + add r10d,r12d + + lea rbp,[4+rbp] + add r10d,r14d + mov r12d,DWORD[40+rsi] + mov r13d,ecx + mov r14d,r10d + bswap r12d + ror r13d,14 + mov r15d,edx + + xor r13d,ecx + ror r14d,9 + xor r15d,r8d + + mov DWORD[40+rsp],r12d + xor r14d,r10d + and r15d,ecx + + ror r13d,5 + add r12d,r9d + xor r15d,r8d + + ror r14d,11 + xor r13d,ecx + add r12d,r15d + + mov r15d,r10d + add r12d,DWORD[rbp] + xor r14d,r10d + + xor r15d,r11d + ror r13d,6 + mov r9d,r11d + + and edi,r15d + ror r14d,2 + add r12d,r13d + + xor r9d,edi + add ebx,r12d + add r9d,r12d + + lea rbp,[4+rbp] + add r9d,r14d + mov r12d,DWORD[44+rsi] + mov r13d,ebx + mov r14d,r9d + bswap r12d + ror r13d,14 + mov edi,ecx + + xor r13d,ebx + ror r14d,9 + xor edi,edx + + mov 
DWORD[44+rsp],r12d + xor r14d,r9d + and edi,ebx + + ror r13d,5 + add r12d,r8d + xor edi,edx + + ror r14d,11 + xor r13d,ebx + add r12d,edi + + mov edi,r9d + add r12d,DWORD[rbp] + xor r14d,r9d + + xor edi,r10d + ror r13d,6 + mov r8d,r10d + + and r15d,edi + ror r14d,2 + add r12d,r13d + + xor r8d,r15d + add eax,r12d + add r8d,r12d + + lea rbp,[20+rbp] + add r8d,r14d + mov r12d,DWORD[48+rsi] + mov r13d,eax + mov r14d,r8d + bswap r12d + ror r13d,14 + mov r15d,ebx + + xor r13d,eax + ror r14d,9 + xor r15d,ecx + + mov DWORD[48+rsp],r12d + xor r14d,r8d + and r15d,eax + + ror r13d,5 + add r12d,edx + xor r15d,ecx + + ror r14d,11 + xor r13d,eax + add r12d,r15d + + mov r15d,r8d + add r12d,DWORD[rbp] + xor r14d,r8d + + xor r15d,r9d + ror r13d,6 + mov edx,r9d + + and edi,r15d + ror r14d,2 + add r12d,r13d + + xor edx,edi + add r11d,r12d + add edx,r12d + + lea rbp,[4+rbp] + add edx,r14d + mov r12d,DWORD[52+rsi] + mov r13d,r11d + mov r14d,edx + bswap r12d + ror r13d,14 + mov edi,eax + + xor r13d,r11d + ror r14d,9 + xor edi,ebx + + mov DWORD[52+rsp],r12d + xor r14d,edx + and edi,r11d + + ror r13d,5 + add r12d,ecx + xor edi,ebx + + ror r14d,11 + xor r13d,r11d + add r12d,edi + + mov edi,edx + add r12d,DWORD[rbp] + xor r14d,edx + + xor edi,r8d + ror r13d,6 + mov ecx,r8d + + and r15d,edi + ror r14d,2 + add r12d,r13d + + xor ecx,r15d + add r10d,r12d + add ecx,r12d + + lea rbp,[4+rbp] + add ecx,r14d + mov r12d,DWORD[56+rsi] + mov r13d,r10d + mov r14d,ecx + bswap r12d + ror r13d,14 + mov r15d,r11d + + xor r13d,r10d + ror r14d,9 + xor r15d,eax + + mov DWORD[56+rsp],r12d + xor r14d,ecx + and r15d,r10d + + ror r13d,5 + add r12d,ebx + xor r15d,eax + + ror r14d,11 + xor r13d,r10d + add r12d,r15d + + mov r15d,ecx + add r12d,DWORD[rbp] + xor r14d,ecx + + xor r15d,edx + ror r13d,6 + mov ebx,edx + + and edi,r15d + ror r14d,2 + add r12d,r13d + + xor ebx,edi + add r9d,r12d + add ebx,r12d + + lea rbp,[4+rbp] + add ebx,r14d + mov r12d,DWORD[60+rsi] + mov r13d,r9d + mov r14d,ebx + bswap r12d + ror r13d,14 + mov edi,r10d + + xor r13d,r9d + ror r14d,9 + xor edi,r11d + + mov DWORD[60+rsp],r12d + xor r14d,ebx + and edi,r9d + + ror r13d,5 + add r12d,eax + xor edi,r11d + + ror r14d,11 + xor r13d,r9d + add r12d,edi + + mov edi,ebx + add r12d,DWORD[rbp] + xor r14d,ebx + + xor edi,ecx + ror r13d,6 + mov eax,ecx + + and r15d,edi + ror r14d,2 + add r12d,r13d + + xor eax,r15d + add r8d,r12d + add eax,r12d + + lea rbp,[20+rbp] + jmp NEAR $L$rounds_16_xx +ALIGN 16 +$L$rounds_16_xx: + mov r13d,DWORD[4+rsp] + mov r15d,DWORD[56+rsp] + + mov r12d,r13d + ror r13d,11 + add eax,r14d + mov r14d,r15d + ror r15d,2 + + xor r13d,r12d + shr r12d,3 + ror r13d,7 + xor r15d,r14d + shr r14d,10 + + ror r15d,17 + xor r12d,r13d + xor r15d,r14d + add r12d,DWORD[36+rsp] + + add r12d,DWORD[rsp] + mov r13d,r8d + add r12d,r15d + mov r14d,eax + ror r13d,14 + mov r15d,r9d + + xor r13d,r8d + ror r14d,9 + xor r15d,r10d + + mov DWORD[rsp],r12d + xor r14d,eax + and r15d,r8d + + ror r13d,5 + add r12d,r11d + xor r15d,r10d + + ror r14d,11 + xor r13d,r8d + add r12d,r15d + + mov r15d,eax + add r12d,DWORD[rbp] + xor r14d,eax + + xor r15d,ebx + ror r13d,6 + mov r11d,ebx + + and edi,r15d + ror r14d,2 + add r12d,r13d + + xor r11d,edi + add edx,r12d + add r11d,r12d + + lea rbp,[4+rbp] + mov r13d,DWORD[8+rsp] + mov edi,DWORD[60+rsp] + + mov r12d,r13d + ror r13d,11 + add r11d,r14d + mov r14d,edi + ror edi,2 + + xor r13d,r12d + shr r12d,3 + ror r13d,7 + xor edi,r14d + shr r14d,10 + + ror edi,17 + xor r12d,r13d + xor edi,r14d + add r12d,DWORD[40+rsp] + + add r12d,DWORD[4+rsp] + mov 
r13d,edx + add r12d,edi + mov r14d,r11d + ror r13d,14 + mov edi,r8d + + xor r13d,edx + ror r14d,9 + xor edi,r9d + + mov DWORD[4+rsp],r12d + xor r14d,r11d + and edi,edx + + ror r13d,5 + add r12d,r10d + xor edi,r9d + + ror r14d,11 + xor r13d,edx + add r12d,edi + + mov edi,r11d + add r12d,DWORD[rbp] + xor r14d,r11d + + xor edi,eax + ror r13d,6 + mov r10d,eax + + and r15d,edi + ror r14d,2 + add r12d,r13d + + xor r10d,r15d + add ecx,r12d + add r10d,r12d + + lea rbp,[4+rbp] + mov r13d,DWORD[12+rsp] + mov r15d,DWORD[rsp] + + mov r12d,r13d + ror r13d,11 + add r10d,r14d + mov r14d,r15d + ror r15d,2 + + xor r13d,r12d + shr r12d,3 + ror r13d,7 + xor r15d,r14d + shr r14d,10 + + ror r15d,17 + xor r12d,r13d + xor r15d,r14d + add r12d,DWORD[44+rsp] + + add r12d,DWORD[8+rsp] + mov r13d,ecx + add r12d,r15d + mov r14d,r10d + ror r13d,14 + mov r15d,edx + + xor r13d,ecx + ror r14d,9 + xor r15d,r8d + + mov DWORD[8+rsp],r12d + xor r14d,r10d + and r15d,ecx + + ror r13d,5 + add r12d,r9d + xor r15d,r8d + + ror r14d,11 + xor r13d,ecx + add r12d,r15d + + mov r15d,r10d + add r12d,DWORD[rbp] + xor r14d,r10d + + xor r15d,r11d + ror r13d,6 + mov r9d,r11d + + and edi,r15d + ror r14d,2 + add r12d,r13d + + xor r9d,edi + add ebx,r12d + add r9d,r12d + + lea rbp,[4+rbp] + mov r13d,DWORD[16+rsp] + mov edi,DWORD[4+rsp] + + mov r12d,r13d + ror r13d,11 + add r9d,r14d + mov r14d,edi + ror edi,2 + + xor r13d,r12d + shr r12d,3 + ror r13d,7 + xor edi,r14d + shr r14d,10 + + ror edi,17 + xor r12d,r13d + xor edi,r14d + add r12d,DWORD[48+rsp] + + add r12d,DWORD[12+rsp] + mov r13d,ebx + add r12d,edi + mov r14d,r9d + ror r13d,14 + mov edi,ecx + + xor r13d,ebx + ror r14d,9 + xor edi,edx + + mov DWORD[12+rsp],r12d + xor r14d,r9d + and edi,ebx + + ror r13d,5 + add r12d,r8d + xor edi,edx + + ror r14d,11 + xor r13d,ebx + add r12d,edi + + mov edi,r9d + add r12d,DWORD[rbp] + xor r14d,r9d + + xor edi,r10d + ror r13d,6 + mov r8d,r10d + + and r15d,edi + ror r14d,2 + add r12d,r13d + + xor r8d,r15d + add eax,r12d + add r8d,r12d + + lea rbp,[20+rbp] + mov r13d,DWORD[20+rsp] + mov r15d,DWORD[8+rsp] + + mov r12d,r13d + ror r13d,11 + add r8d,r14d + mov r14d,r15d + ror r15d,2 + + xor r13d,r12d + shr r12d,3 + ror r13d,7 + xor r15d,r14d + shr r14d,10 + + ror r15d,17 + xor r12d,r13d + xor r15d,r14d + add r12d,DWORD[52+rsp] + + add r12d,DWORD[16+rsp] + mov r13d,eax + add r12d,r15d + mov r14d,r8d + ror r13d,14 + mov r15d,ebx + + xor r13d,eax + ror r14d,9 + xor r15d,ecx + + mov DWORD[16+rsp],r12d + xor r14d,r8d + and r15d,eax + + ror r13d,5 + add r12d,edx + xor r15d,ecx + + ror r14d,11 + xor r13d,eax + add r12d,r15d + + mov r15d,r8d + add r12d,DWORD[rbp] + xor r14d,r8d + + xor r15d,r9d + ror r13d,6 + mov edx,r9d + + and edi,r15d + ror r14d,2 + add r12d,r13d + + xor edx,edi + add r11d,r12d + add edx,r12d + + lea rbp,[4+rbp] + mov r13d,DWORD[24+rsp] + mov edi,DWORD[12+rsp] + + mov r12d,r13d + ror r13d,11 + add edx,r14d + mov r14d,edi + ror edi,2 + + xor r13d,r12d + shr r12d,3 + ror r13d,7 + xor edi,r14d + shr r14d,10 + + ror edi,17 + xor r12d,r13d + xor edi,r14d + add r12d,DWORD[56+rsp] + + add r12d,DWORD[20+rsp] + mov r13d,r11d + add r12d,edi + mov r14d,edx + ror r13d,14 + mov edi,eax + + xor r13d,r11d + ror r14d,9 + xor edi,ebx + + mov DWORD[20+rsp],r12d + xor r14d,edx + and edi,r11d + + ror r13d,5 + add r12d,ecx + xor edi,ebx + + ror r14d,11 + xor r13d,r11d + add r12d,edi + + mov edi,edx + add r12d,DWORD[rbp] + xor r14d,edx + + xor edi,r8d + ror r13d,6 + mov ecx,r8d + + and r15d,edi + ror r14d,2 + add r12d,r13d + + xor ecx,r15d + add r10d,r12d + add ecx,r12d + 
+ lea rbp,[4+rbp] + mov r13d,DWORD[28+rsp] + mov r15d,DWORD[16+rsp] + + mov r12d,r13d + ror r13d,11 + add ecx,r14d + mov r14d,r15d + ror r15d,2 + + xor r13d,r12d + shr r12d,3 + ror r13d,7 + xor r15d,r14d + shr r14d,10 + + ror r15d,17 + xor r12d,r13d + xor r15d,r14d + add r12d,DWORD[60+rsp] + + add r12d,DWORD[24+rsp] + mov r13d,r10d + add r12d,r15d + mov r14d,ecx + ror r13d,14 + mov r15d,r11d + + xor r13d,r10d + ror r14d,9 + xor r15d,eax + + mov DWORD[24+rsp],r12d + xor r14d,ecx + and r15d,r10d + + ror r13d,5 + add r12d,ebx + xor r15d,eax + + ror r14d,11 + xor r13d,r10d + add r12d,r15d + + mov r15d,ecx + add r12d,DWORD[rbp] + xor r14d,ecx + + xor r15d,edx + ror r13d,6 + mov ebx,edx + + and edi,r15d + ror r14d,2 + add r12d,r13d + + xor ebx,edi + add r9d,r12d + add ebx,r12d + + lea rbp,[4+rbp] + mov r13d,DWORD[32+rsp] + mov edi,DWORD[20+rsp] + + mov r12d,r13d + ror r13d,11 + add ebx,r14d + mov r14d,edi + ror edi,2 + + xor r13d,r12d + shr r12d,3 + ror r13d,7 + xor edi,r14d + shr r14d,10 + + ror edi,17 + xor r12d,r13d + xor edi,r14d + add r12d,DWORD[rsp] + + add r12d,DWORD[28+rsp] + mov r13d,r9d + add r12d,edi + mov r14d,ebx + ror r13d,14 + mov edi,r10d + + xor r13d,r9d + ror r14d,9 + xor edi,r11d + + mov DWORD[28+rsp],r12d + xor r14d,ebx + and edi,r9d + + ror r13d,5 + add r12d,eax + xor edi,r11d + + ror r14d,11 + xor r13d,r9d + add r12d,edi + + mov edi,ebx + add r12d,DWORD[rbp] + xor r14d,ebx + + xor edi,ecx + ror r13d,6 + mov eax,ecx + + and r15d,edi + ror r14d,2 + add r12d,r13d + + xor eax,r15d + add r8d,r12d + add eax,r12d + + lea rbp,[20+rbp] + mov r13d,DWORD[36+rsp] + mov r15d,DWORD[24+rsp] + + mov r12d,r13d + ror r13d,11 + add eax,r14d + mov r14d,r15d + ror r15d,2 + + xor r13d,r12d + shr r12d,3 + ror r13d,7 + xor r15d,r14d + shr r14d,10 + + ror r15d,17 + xor r12d,r13d + xor r15d,r14d + add r12d,DWORD[4+rsp] + + add r12d,DWORD[32+rsp] + mov r13d,r8d + add r12d,r15d + mov r14d,eax + ror r13d,14 + mov r15d,r9d + + xor r13d,r8d + ror r14d,9 + xor r15d,r10d + + mov DWORD[32+rsp],r12d + xor r14d,eax + and r15d,r8d + + ror r13d,5 + add r12d,r11d + xor r15d,r10d + + ror r14d,11 + xor r13d,r8d + add r12d,r15d + + mov r15d,eax + add r12d,DWORD[rbp] + xor r14d,eax + + xor r15d,ebx + ror r13d,6 + mov r11d,ebx + + and edi,r15d + ror r14d,2 + add r12d,r13d + + xor r11d,edi + add edx,r12d + add r11d,r12d + + lea rbp,[4+rbp] + mov r13d,DWORD[40+rsp] + mov edi,DWORD[28+rsp] + + mov r12d,r13d + ror r13d,11 + add r11d,r14d + mov r14d,edi + ror edi,2 + + xor r13d,r12d + shr r12d,3 + ror r13d,7 + xor edi,r14d + shr r14d,10 + + ror edi,17 + xor r12d,r13d + xor edi,r14d + add r12d,DWORD[8+rsp] + + add r12d,DWORD[36+rsp] + mov r13d,edx + add r12d,edi + mov r14d,r11d + ror r13d,14 + mov edi,r8d + + xor r13d,edx + ror r14d,9 + xor edi,r9d + + mov DWORD[36+rsp],r12d + xor r14d,r11d + and edi,edx + + ror r13d,5 + add r12d,r10d + xor edi,r9d + + ror r14d,11 + xor r13d,edx + add r12d,edi + + mov edi,r11d + add r12d,DWORD[rbp] + xor r14d,r11d + + xor edi,eax + ror r13d,6 + mov r10d,eax + + and r15d,edi + ror r14d,2 + add r12d,r13d + + xor r10d,r15d + add ecx,r12d + add r10d,r12d + + lea rbp,[4+rbp] + mov r13d,DWORD[44+rsp] + mov r15d,DWORD[32+rsp] + + mov r12d,r13d + ror r13d,11 + add r10d,r14d + mov r14d,r15d + ror r15d,2 + + xor r13d,r12d + shr r12d,3 + ror r13d,7 + xor r15d,r14d + shr r14d,10 + + ror r15d,17 + xor r12d,r13d + xor r15d,r14d + add r12d,DWORD[12+rsp] + + add r12d,DWORD[40+rsp] + mov r13d,ecx + add r12d,r15d + mov r14d,r10d + ror r13d,14 + mov r15d,edx + + xor r13d,ecx + ror r14d,9 + xor r15d,r8d + + 
mov DWORD[40+rsp],r12d + xor r14d,r10d + and r15d,ecx + + ror r13d,5 + add r12d,r9d + xor r15d,r8d + + ror r14d,11 + xor r13d,ecx + add r12d,r15d + + mov r15d,r10d + add r12d,DWORD[rbp] + xor r14d,r10d + + xor r15d,r11d + ror r13d,6 + mov r9d,r11d + + and edi,r15d + ror r14d,2 + add r12d,r13d + + xor r9d,edi + add ebx,r12d + add r9d,r12d + + lea rbp,[4+rbp] + mov r13d,DWORD[48+rsp] + mov edi,DWORD[36+rsp] + + mov r12d,r13d + ror r13d,11 + add r9d,r14d + mov r14d,edi + ror edi,2 + + xor r13d,r12d + shr r12d,3 + ror r13d,7 + xor edi,r14d + shr r14d,10 + + ror edi,17 + xor r12d,r13d + xor edi,r14d + add r12d,DWORD[16+rsp] + + add r12d,DWORD[44+rsp] + mov r13d,ebx + add r12d,edi + mov r14d,r9d + ror r13d,14 + mov edi,ecx + + xor r13d,ebx + ror r14d,9 + xor edi,edx + + mov DWORD[44+rsp],r12d + xor r14d,r9d + and edi,ebx + + ror r13d,5 + add r12d,r8d + xor edi,edx + + ror r14d,11 + xor r13d,ebx + add r12d,edi + + mov edi,r9d + add r12d,DWORD[rbp] + xor r14d,r9d + + xor edi,r10d + ror r13d,6 + mov r8d,r10d + + and r15d,edi + ror r14d,2 + add r12d,r13d + + xor r8d,r15d + add eax,r12d + add r8d,r12d + + lea rbp,[20+rbp] + mov r13d,DWORD[52+rsp] + mov r15d,DWORD[40+rsp] + + mov r12d,r13d + ror r13d,11 + add r8d,r14d + mov r14d,r15d + ror r15d,2 + + xor r13d,r12d + shr r12d,3 + ror r13d,7 + xor r15d,r14d + shr r14d,10 + + ror r15d,17 + xor r12d,r13d + xor r15d,r14d + add r12d,DWORD[20+rsp] + + add r12d,DWORD[48+rsp] + mov r13d,eax + add r12d,r15d + mov r14d,r8d + ror r13d,14 + mov r15d,ebx + + xor r13d,eax + ror r14d,9 + xor r15d,ecx + + mov DWORD[48+rsp],r12d + xor r14d,r8d + and r15d,eax + + ror r13d,5 + add r12d,edx + xor r15d,ecx + + ror r14d,11 + xor r13d,eax + add r12d,r15d + + mov r15d,r8d + add r12d,DWORD[rbp] + xor r14d,r8d + + xor r15d,r9d + ror r13d,6 + mov edx,r9d + + and edi,r15d + ror r14d,2 + add r12d,r13d + + xor edx,edi + add r11d,r12d + add edx,r12d + + lea rbp,[4+rbp] + mov r13d,DWORD[56+rsp] + mov edi,DWORD[44+rsp] + + mov r12d,r13d + ror r13d,11 + add edx,r14d + mov r14d,edi + ror edi,2 + + xor r13d,r12d + shr r12d,3 + ror r13d,7 + xor edi,r14d + shr r14d,10 + + ror edi,17 + xor r12d,r13d + xor edi,r14d + add r12d,DWORD[24+rsp] + + add r12d,DWORD[52+rsp] + mov r13d,r11d + add r12d,edi + mov r14d,edx + ror r13d,14 + mov edi,eax + + xor r13d,r11d + ror r14d,9 + xor edi,ebx + + mov DWORD[52+rsp],r12d + xor r14d,edx + and edi,r11d + + ror r13d,5 + add r12d,ecx + xor edi,ebx + + ror r14d,11 + xor r13d,r11d + add r12d,edi + + mov edi,edx + add r12d,DWORD[rbp] + xor r14d,edx + + xor edi,r8d + ror r13d,6 + mov ecx,r8d + + and r15d,edi + ror r14d,2 + add r12d,r13d + + xor ecx,r15d + add r10d,r12d + add ecx,r12d + + lea rbp,[4+rbp] + mov r13d,DWORD[60+rsp] + mov r15d,DWORD[48+rsp] + + mov r12d,r13d + ror r13d,11 + add ecx,r14d + mov r14d,r15d + ror r15d,2 + + xor r13d,r12d + shr r12d,3 + ror r13d,7 + xor r15d,r14d + shr r14d,10 + + ror r15d,17 + xor r12d,r13d + xor r15d,r14d + add r12d,DWORD[28+rsp] + + add r12d,DWORD[56+rsp] + mov r13d,r10d + add r12d,r15d + mov r14d,ecx + ror r13d,14 + mov r15d,r11d + + xor r13d,r10d + ror r14d,9 + xor r15d,eax + + mov DWORD[56+rsp],r12d + xor r14d,ecx + and r15d,r10d + + ror r13d,5 + add r12d,ebx + xor r15d,eax + + ror r14d,11 + xor r13d,r10d + add r12d,r15d + + mov r15d,ecx + add r12d,DWORD[rbp] + xor r14d,ecx + + xor r15d,edx + ror r13d,6 + mov ebx,edx + + and edi,r15d + ror r14d,2 + add r12d,r13d + + xor ebx,edi + add r9d,r12d + add ebx,r12d + + lea rbp,[4+rbp] + mov r13d,DWORD[rsp] + mov edi,DWORD[52+rsp] + + mov r12d,r13d + ror r13d,11 + add 
ebx,r14d + mov r14d,edi + ror edi,2 + + xor r13d,r12d + shr r12d,3 + ror r13d,7 + xor edi,r14d + shr r14d,10 + + ror edi,17 + xor r12d,r13d + xor edi,r14d + add r12d,DWORD[32+rsp] + + add r12d,DWORD[60+rsp] + mov r13d,r9d + add r12d,edi + mov r14d,ebx + ror r13d,14 + mov edi,r10d + + xor r13d,r9d + ror r14d,9 + xor edi,r11d + + mov DWORD[60+rsp],r12d + xor r14d,ebx + and edi,r9d + + ror r13d,5 + add r12d,eax + xor edi,r11d + + ror r14d,11 + xor r13d,r9d + add r12d,edi + + mov edi,ebx + add r12d,DWORD[rbp] + xor r14d,ebx + + xor edi,ecx + ror r13d,6 + mov eax,ecx + + and r15d,edi + ror r14d,2 + add r12d,r13d + + xor eax,r15d + add r8d,r12d + add eax,r12d + + lea rbp,[20+rbp] + cmp BYTE[3+rbp],0 + jnz NEAR $L$rounds_16_xx + + mov rdi,QWORD[((64+0))+rsp] + add eax,r14d + lea rsi,[64+rsi] + + add eax,DWORD[rdi] + add ebx,DWORD[4+rdi] + add ecx,DWORD[8+rdi] + add edx,DWORD[12+rdi] + add r8d,DWORD[16+rdi] + add r9d,DWORD[20+rdi] + add r10d,DWORD[24+rdi] + add r11d,DWORD[28+rdi] + + cmp rsi,QWORD[((64+16))+rsp] + + mov DWORD[rdi],eax + mov DWORD[4+rdi],ebx + mov DWORD[8+rdi],ecx + mov DWORD[12+rdi],edx + mov DWORD[16+rdi],r8d + mov DWORD[20+rdi],r9d + mov DWORD[24+rdi],r10d + mov DWORD[28+rdi],r11d + jb NEAR $L$loop + + mov rsi,QWORD[88+rsp] + + mov r15,QWORD[((-48))+rsi] + + mov r14,QWORD[((-40))+rsi] + + mov r13,QWORD[((-32))+rsi] + + mov r12,QWORD[((-24))+rsi] + + mov rbp,QWORD[((-16))+rsi] + + mov rbx,QWORD[((-8))+rsi] + + lea rsp,[rsi] + +$L$epilogue: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_sha256_block_data_order: +ALIGN 64 + +K256: + DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 + DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 + DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 + DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 + DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 + DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 + DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 + DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 + DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc + DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc + DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da + DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da + DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 + DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 + DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 + DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 + DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 + DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 + DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 + DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 + DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 + DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 + DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 + DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 + DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 + DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 + DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 + DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 + DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 + DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 + DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 + DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 + + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f + DD 0x03020100,0x0b0a0908,0xffffffff,0xffffffff + DD 0x03020100,0x0b0a0908,0xffffffff,0xffffffff + DD 0xffffffff,0xffffffff,0x03020100,0x0b0a0908 + DD 0xffffffff,0xffffffff,0x03020100,0x0b0a0908 +DB 83,72,65,50,53,54,32,98,108,111,99,107,32,116,114,97 +DB 
110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54 +DB 52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121 +DB 32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46 +DB 111,114,103,62,0 + +ALIGN 64 +sha256_block_data_order_shaext: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_sha256_block_data_order_shaext: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + + +_shaext_shortcut: + + lea rsp,[((-88))+rsp] + movaps XMMWORD[(-8-80)+rax],xmm6 + movaps XMMWORD[(-8-64)+rax],xmm7 + movaps XMMWORD[(-8-48)+rax],xmm8 + movaps XMMWORD[(-8-32)+rax],xmm9 + movaps XMMWORD[(-8-16)+rax],xmm10 +$L$prologue_shaext: + lea rcx,[((K256+128))] + movdqu xmm1,XMMWORD[rdi] + movdqu xmm2,XMMWORD[16+rdi] + movdqa xmm7,XMMWORD[((512-128))+rcx] + + pshufd xmm0,xmm1,0x1b + pshufd xmm1,xmm1,0xb1 + pshufd xmm2,xmm2,0x1b + movdqa xmm8,xmm7 +DB 102,15,58,15,202,8 + punpcklqdq xmm2,xmm0 + jmp NEAR $L$oop_shaext + +ALIGN 16 +$L$oop_shaext: + movdqu xmm3,XMMWORD[rsi] + movdqu xmm4,XMMWORD[16+rsi] + movdqu xmm5,XMMWORD[32+rsi] +DB 102,15,56,0,223 + movdqu xmm6,XMMWORD[48+rsi] + + movdqa xmm0,XMMWORD[((0-128))+rcx] + paddd xmm0,xmm3 +DB 102,15,56,0,231 + movdqa xmm10,xmm2 +DB 15,56,203,209 + pshufd xmm0,xmm0,0x0e + nop + movdqa xmm9,xmm1 +DB 15,56,203,202 + + movdqa xmm0,XMMWORD[((32-128))+rcx] + paddd xmm0,xmm4 +DB 102,15,56,0,239 +DB 15,56,203,209 + pshufd xmm0,xmm0,0x0e + lea rsi,[64+rsi] +DB 15,56,204,220 +DB 15,56,203,202 + + movdqa xmm0,XMMWORD[((64-128))+rcx] + paddd xmm0,xmm5 +DB 102,15,56,0,247 +DB 15,56,203,209 + pshufd xmm0,xmm0,0x0e + movdqa xmm7,xmm6 +DB 102,15,58,15,253,4 + nop + paddd xmm3,xmm7 +DB 15,56,204,229 +DB 15,56,203,202 + + movdqa xmm0,XMMWORD[((96-128))+rcx] + paddd xmm0,xmm6 +DB 15,56,205,222 +DB 15,56,203,209 + pshufd xmm0,xmm0,0x0e + movdqa xmm7,xmm3 +DB 102,15,58,15,254,4 + nop + paddd xmm4,xmm7 +DB 15,56,204,238 +DB 15,56,203,202 + movdqa xmm0,XMMWORD[((128-128))+rcx] + paddd xmm0,xmm3 +DB 15,56,205,227 +DB 15,56,203,209 + pshufd xmm0,xmm0,0x0e + movdqa xmm7,xmm4 +DB 102,15,58,15,251,4 + nop + paddd xmm5,xmm7 +DB 15,56,204,243 +DB 15,56,203,202 + movdqa xmm0,XMMWORD[((160-128))+rcx] + paddd xmm0,xmm4 +DB 15,56,205,236 +DB 15,56,203,209 + pshufd xmm0,xmm0,0x0e + movdqa xmm7,xmm5 +DB 102,15,58,15,252,4 + nop + paddd xmm6,xmm7 +DB 15,56,204,220 +DB 15,56,203,202 + movdqa xmm0,XMMWORD[((192-128))+rcx] + paddd xmm0,xmm5 +DB 15,56,205,245 +DB 15,56,203,209 + pshufd xmm0,xmm0,0x0e + movdqa xmm7,xmm6 +DB 102,15,58,15,253,4 + nop + paddd xmm3,xmm7 +DB 15,56,204,229 +DB 15,56,203,202 + movdqa xmm0,XMMWORD[((224-128))+rcx] + paddd xmm0,xmm6 +DB 15,56,205,222 +DB 15,56,203,209 + pshufd xmm0,xmm0,0x0e + movdqa xmm7,xmm3 +DB 102,15,58,15,254,4 + nop + paddd xmm4,xmm7 +DB 15,56,204,238 +DB 15,56,203,202 + movdqa xmm0,XMMWORD[((256-128))+rcx] + paddd xmm0,xmm3 +DB 15,56,205,227 +DB 15,56,203,209 + pshufd xmm0,xmm0,0x0e + movdqa xmm7,xmm4 +DB 102,15,58,15,251,4 + nop + paddd xmm5,xmm7 +DB 15,56,204,243 +DB 15,56,203,202 + movdqa xmm0,XMMWORD[((288-128))+rcx] + paddd xmm0,xmm4 +DB 15,56,205,236 +DB 15,56,203,209 + pshufd xmm0,xmm0,0x0e + movdqa xmm7,xmm5 +DB 102,15,58,15,252,4 + nop + paddd xmm6,xmm7 +DB 15,56,204,220 +DB 15,56,203,202 + movdqa xmm0,XMMWORD[((320-128))+rcx] + paddd xmm0,xmm5 +DB 15,56,205,245 +DB 15,56,203,209 + pshufd xmm0,xmm0,0x0e + movdqa xmm7,xmm6 +DB 102,15,58,15,253,4 + nop + paddd xmm3,xmm7 +DB 15,56,204,229 +DB 15,56,203,202 + movdqa xmm0,XMMWORD[((352-128))+rcx] + paddd xmm0,xmm6 +DB 15,56,205,222 +DB 15,56,203,209 + pshufd xmm0,xmm0,0x0e + 
movdqa xmm7,xmm3 +DB 102,15,58,15,254,4 + nop + paddd xmm4,xmm7 +DB 15,56,204,238 +DB 15,56,203,202 + movdqa xmm0,XMMWORD[((384-128))+rcx] + paddd xmm0,xmm3 +DB 15,56,205,227 +DB 15,56,203,209 + pshufd xmm0,xmm0,0x0e + movdqa xmm7,xmm4 +DB 102,15,58,15,251,4 + nop + paddd xmm5,xmm7 +DB 15,56,204,243 +DB 15,56,203,202 + movdqa xmm0,XMMWORD[((416-128))+rcx] + paddd xmm0,xmm4 +DB 15,56,205,236 +DB 15,56,203,209 + pshufd xmm0,xmm0,0x0e + movdqa xmm7,xmm5 +DB 102,15,58,15,252,4 +DB 15,56,203,202 + paddd xmm6,xmm7 + + movdqa xmm0,XMMWORD[((448-128))+rcx] + paddd xmm0,xmm5 +DB 15,56,203,209 + pshufd xmm0,xmm0,0x0e +DB 15,56,205,245 + movdqa xmm7,xmm8 +DB 15,56,203,202 + + movdqa xmm0,XMMWORD[((480-128))+rcx] + paddd xmm0,xmm6 + nop +DB 15,56,203,209 + pshufd xmm0,xmm0,0x0e + dec rdx + nop +DB 15,56,203,202 + + paddd xmm2,xmm10 + paddd xmm1,xmm9 + jnz NEAR $L$oop_shaext + + pshufd xmm2,xmm2,0xb1 + pshufd xmm7,xmm1,0x1b + pshufd xmm1,xmm1,0xb1 + punpckhqdq xmm1,xmm2 +DB 102,15,58,15,215,8 + + movdqu XMMWORD[rdi],xmm1 + movdqu XMMWORD[16+rdi],xmm2 + movaps xmm6,XMMWORD[((-8-80))+rax] + movaps xmm7,XMMWORD[((-8-64))+rax] + movaps xmm8,XMMWORD[((-8-48))+rax] + movaps xmm9,XMMWORD[((-8-32))+rax] + movaps xmm10,XMMWORD[((-8-16))+rax] + mov rsp,rax +$L$epilogue_shaext: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_sha256_block_data_order_shaext: + +ALIGN 64 +sha256_block_data_order_ssse3: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_sha256_block_data_order_ssse3: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + + + +$L$ssse3_shortcut: + mov rax,rsp + + push rbx + + push rbp + + push r12 + + push r13 + + push r14 + + push r15 + + shl rdx,4 + sub rsp,160 + lea rdx,[rdx*4+rsi] + and rsp,-64 + mov QWORD[((64+0))+rsp],rdi + mov QWORD[((64+8))+rsp],rsi + mov QWORD[((64+16))+rsp],rdx + mov QWORD[88+rsp],rax + + movaps XMMWORD[(64+32)+rsp],xmm6 + movaps XMMWORD[(64+48)+rsp],xmm7 + movaps XMMWORD[(64+64)+rsp],xmm8 + movaps XMMWORD[(64+80)+rsp],xmm9 +$L$prologue_ssse3: + + mov eax,DWORD[rdi] + mov ebx,DWORD[4+rdi] + mov ecx,DWORD[8+rdi] + mov edx,DWORD[12+rdi] + mov r8d,DWORD[16+rdi] + mov r9d,DWORD[20+rdi] + mov r10d,DWORD[24+rdi] + mov r11d,DWORD[28+rdi] + + + jmp NEAR $L$loop_ssse3 +ALIGN 16 +$L$loop_ssse3: + movdqa xmm7,XMMWORD[((K256+512))] + movdqu xmm0,XMMWORD[rsi] + movdqu xmm1,XMMWORD[16+rsi] + movdqu xmm2,XMMWORD[32+rsi] +DB 102,15,56,0,199 + movdqu xmm3,XMMWORD[48+rsi] + lea rbp,[K256] +DB 102,15,56,0,207 + movdqa xmm4,XMMWORD[rbp] + movdqa xmm5,XMMWORD[32+rbp] +DB 102,15,56,0,215 + paddd xmm4,xmm0 + movdqa xmm6,XMMWORD[64+rbp] +DB 102,15,56,0,223 + movdqa xmm7,XMMWORD[96+rbp] + paddd xmm5,xmm1 + paddd xmm6,xmm2 + paddd xmm7,xmm3 + movdqa XMMWORD[rsp],xmm4 + mov r14d,eax + movdqa XMMWORD[16+rsp],xmm5 + mov edi,ebx + movdqa XMMWORD[32+rsp],xmm6 + xor edi,ecx + movdqa XMMWORD[48+rsp],xmm7 + mov r13d,r8d + jmp NEAR $L$ssse3_00_47 + +ALIGN 16 +$L$ssse3_00_47: + sub rbp,-128 + ror r13d,14 + movdqa xmm4,xmm1 + mov eax,r14d + mov r12d,r9d + movdqa xmm7,xmm3 + ror r14d,9 + xor r13d,r8d + xor r12d,r10d + ror r13d,5 + xor r14d,eax +DB 102,15,58,15,224,4 + and r12d,r8d + xor r13d,r8d +DB 102,15,58,15,250,4 + add r11d,DWORD[rsp] + mov r15d,eax + xor r12d,r10d + ror r14d,11 + movdqa xmm5,xmm4 + xor r15d,ebx + add r11d,r12d + movdqa xmm6,xmm4 + ror r13d,6 + and edi,r15d + psrld xmm4,3 + xor r14d,eax + add r11d,r13d + xor edi,ebx + paddd xmm0,xmm7 + ror r14d,2 + add edx,r11d + psrld xmm6,7 + add r11d,edi + mov r13d,edx + 
pshufd xmm7,xmm3,250 + add r14d,r11d + ror r13d,14 + pslld xmm5,14 + mov r11d,r14d + mov r12d,r8d + pxor xmm4,xmm6 + ror r14d,9 + xor r13d,edx + xor r12d,r9d + ror r13d,5 + psrld xmm6,11 + xor r14d,r11d + pxor xmm4,xmm5 + and r12d,edx + xor r13d,edx + pslld xmm5,11 + add r10d,DWORD[4+rsp] + mov edi,r11d + pxor xmm4,xmm6 + xor r12d,r9d + ror r14d,11 + movdqa xmm6,xmm7 + xor edi,eax + add r10d,r12d + pxor xmm4,xmm5 + ror r13d,6 + and r15d,edi + xor r14d,r11d + psrld xmm7,10 + add r10d,r13d + xor r15d,eax + paddd xmm0,xmm4 + ror r14d,2 + add ecx,r10d + psrlq xmm6,17 + add r10d,r15d + mov r13d,ecx + add r14d,r10d + pxor xmm7,xmm6 + ror r13d,14 + mov r10d,r14d + mov r12d,edx + ror r14d,9 + psrlq xmm6,2 + xor r13d,ecx + xor r12d,r8d + pxor xmm7,xmm6 + ror r13d,5 + xor r14d,r10d + and r12d,ecx + pshufd xmm7,xmm7,128 + xor r13d,ecx + add r9d,DWORD[8+rsp] + mov r15d,r10d + psrldq xmm7,8 + xor r12d,r8d + ror r14d,11 + xor r15d,r11d + add r9d,r12d + ror r13d,6 + paddd xmm0,xmm7 + and edi,r15d + xor r14d,r10d + add r9d,r13d + pshufd xmm7,xmm0,80 + xor edi,r11d + ror r14d,2 + add ebx,r9d + movdqa xmm6,xmm7 + add r9d,edi + mov r13d,ebx + psrld xmm7,10 + add r14d,r9d + ror r13d,14 + psrlq xmm6,17 + mov r9d,r14d + mov r12d,ecx + pxor xmm7,xmm6 + ror r14d,9 + xor r13d,ebx + xor r12d,edx + ror r13d,5 + xor r14d,r9d + psrlq xmm6,2 + and r12d,ebx + xor r13d,ebx + add r8d,DWORD[12+rsp] + pxor xmm7,xmm6 + mov edi,r9d + xor r12d,edx + ror r14d,11 + pshufd xmm7,xmm7,8 + xor edi,r10d + add r8d,r12d + movdqa xmm6,XMMWORD[rbp] + ror r13d,6 + and r15d,edi + pslldq xmm7,8 + xor r14d,r9d + add r8d,r13d + xor r15d,r10d + paddd xmm0,xmm7 + ror r14d,2 + add eax,r8d + add r8d,r15d + paddd xmm6,xmm0 + mov r13d,eax + add r14d,r8d + movdqa XMMWORD[rsp],xmm6 + ror r13d,14 + movdqa xmm4,xmm2 + mov r8d,r14d + mov r12d,ebx + movdqa xmm7,xmm0 + ror r14d,9 + xor r13d,eax + xor r12d,ecx + ror r13d,5 + xor r14d,r8d +DB 102,15,58,15,225,4 + and r12d,eax + xor r13d,eax +DB 102,15,58,15,251,4 + add edx,DWORD[16+rsp] + mov r15d,r8d + xor r12d,ecx + ror r14d,11 + movdqa xmm5,xmm4 + xor r15d,r9d + add edx,r12d + movdqa xmm6,xmm4 + ror r13d,6 + and edi,r15d + psrld xmm4,3 + xor r14d,r8d + add edx,r13d + xor edi,r9d + paddd xmm1,xmm7 + ror r14d,2 + add r11d,edx + psrld xmm6,7 + add edx,edi + mov r13d,r11d + pshufd xmm7,xmm0,250 + add r14d,edx + ror r13d,14 + pslld xmm5,14 + mov edx,r14d + mov r12d,eax + pxor xmm4,xmm6 + ror r14d,9 + xor r13d,r11d + xor r12d,ebx + ror r13d,5 + psrld xmm6,11 + xor r14d,edx + pxor xmm4,xmm5 + and r12d,r11d + xor r13d,r11d + pslld xmm5,11 + add ecx,DWORD[20+rsp] + mov edi,edx + pxor xmm4,xmm6 + xor r12d,ebx + ror r14d,11 + movdqa xmm6,xmm7 + xor edi,r8d + add ecx,r12d + pxor xmm4,xmm5 + ror r13d,6 + and r15d,edi + xor r14d,edx + psrld xmm7,10 + add ecx,r13d + xor r15d,r8d + paddd xmm1,xmm4 + ror r14d,2 + add r10d,ecx + psrlq xmm6,17 + add ecx,r15d + mov r13d,r10d + add r14d,ecx + pxor xmm7,xmm6 + ror r13d,14 + mov ecx,r14d + mov r12d,r11d + ror r14d,9 + psrlq xmm6,2 + xor r13d,r10d + xor r12d,eax + pxor xmm7,xmm6 + ror r13d,5 + xor r14d,ecx + and r12d,r10d + pshufd xmm7,xmm7,128 + xor r13d,r10d + add ebx,DWORD[24+rsp] + mov r15d,ecx + psrldq xmm7,8 + xor r12d,eax + ror r14d,11 + xor r15d,edx + add ebx,r12d + ror r13d,6 + paddd xmm1,xmm7 + and edi,r15d + xor r14d,ecx + add ebx,r13d + pshufd xmm7,xmm1,80 + xor edi,edx + ror r14d,2 + add r9d,ebx + movdqa xmm6,xmm7 + add ebx,edi + mov r13d,r9d + psrld xmm7,10 + add r14d,ebx + ror r13d,14 + psrlq xmm6,17 + mov ebx,r14d + mov r12d,r10d + pxor xmm7,xmm6 + ror r14d,9 + 
xor r13d,r9d + xor r12d,r11d + ror r13d,5 + xor r14d,ebx + psrlq xmm6,2 + and r12d,r9d + xor r13d,r9d + add eax,DWORD[28+rsp] + pxor xmm7,xmm6 + mov edi,ebx + xor r12d,r11d + ror r14d,11 + pshufd xmm7,xmm7,8 + xor edi,ecx + add eax,r12d + movdqa xmm6,XMMWORD[32+rbp] + ror r13d,6 + and r15d,edi + pslldq xmm7,8 + xor r14d,ebx + add eax,r13d + xor r15d,ecx + paddd xmm1,xmm7 + ror r14d,2 + add r8d,eax + add eax,r15d + paddd xmm6,xmm1 + mov r13d,r8d + add r14d,eax + movdqa XMMWORD[16+rsp],xmm6 + ror r13d,14 + movdqa xmm4,xmm3 + mov eax,r14d + mov r12d,r9d + movdqa xmm7,xmm1 + ror r14d,9 + xor r13d,r8d + xor r12d,r10d + ror r13d,5 + xor r14d,eax +DB 102,15,58,15,226,4 + and r12d,r8d + xor r13d,r8d +DB 102,15,58,15,248,4 + add r11d,DWORD[32+rsp] + mov r15d,eax + xor r12d,r10d + ror r14d,11 + movdqa xmm5,xmm4 + xor r15d,ebx + add r11d,r12d + movdqa xmm6,xmm4 + ror r13d,6 + and edi,r15d + psrld xmm4,3 + xor r14d,eax + add r11d,r13d + xor edi,ebx + paddd xmm2,xmm7 + ror r14d,2 + add edx,r11d + psrld xmm6,7 + add r11d,edi + mov r13d,edx + pshufd xmm7,xmm1,250 + add r14d,r11d + ror r13d,14 + pslld xmm5,14 + mov r11d,r14d + mov r12d,r8d + pxor xmm4,xmm6 + ror r14d,9 + xor r13d,edx + xor r12d,r9d + ror r13d,5 + psrld xmm6,11 + xor r14d,r11d + pxor xmm4,xmm5 + and r12d,edx + xor r13d,edx + pslld xmm5,11 + add r10d,DWORD[36+rsp] + mov edi,r11d + pxor xmm4,xmm6 + xor r12d,r9d + ror r14d,11 + movdqa xmm6,xmm7 + xor edi,eax + add r10d,r12d + pxor xmm4,xmm5 + ror r13d,6 + and r15d,edi + xor r14d,r11d + psrld xmm7,10 + add r10d,r13d + xor r15d,eax + paddd xmm2,xmm4 + ror r14d,2 + add ecx,r10d + psrlq xmm6,17 + add r10d,r15d + mov r13d,ecx + add r14d,r10d + pxor xmm7,xmm6 + ror r13d,14 + mov r10d,r14d + mov r12d,edx + ror r14d,9 + psrlq xmm6,2 + xor r13d,ecx + xor r12d,r8d + pxor xmm7,xmm6 + ror r13d,5 + xor r14d,r10d + and r12d,ecx + pshufd xmm7,xmm7,128 + xor r13d,ecx + add r9d,DWORD[40+rsp] + mov r15d,r10d + psrldq xmm7,8 + xor r12d,r8d + ror r14d,11 + xor r15d,r11d + add r9d,r12d + ror r13d,6 + paddd xmm2,xmm7 + and edi,r15d + xor r14d,r10d + add r9d,r13d + pshufd xmm7,xmm2,80 + xor edi,r11d + ror r14d,2 + add ebx,r9d + movdqa xmm6,xmm7 + add r9d,edi + mov r13d,ebx + psrld xmm7,10 + add r14d,r9d + ror r13d,14 + psrlq xmm6,17 + mov r9d,r14d + mov r12d,ecx + pxor xmm7,xmm6 + ror r14d,9 + xor r13d,ebx + xor r12d,edx + ror r13d,5 + xor r14d,r9d + psrlq xmm6,2 + and r12d,ebx + xor r13d,ebx + add r8d,DWORD[44+rsp] + pxor xmm7,xmm6 + mov edi,r9d + xor r12d,edx + ror r14d,11 + pshufd xmm7,xmm7,8 + xor edi,r10d + add r8d,r12d + movdqa xmm6,XMMWORD[64+rbp] + ror r13d,6 + and r15d,edi + pslldq xmm7,8 + xor r14d,r9d + add r8d,r13d + xor r15d,r10d + paddd xmm2,xmm7 + ror r14d,2 + add eax,r8d + add r8d,r15d + paddd xmm6,xmm2 + mov r13d,eax + add r14d,r8d + movdqa XMMWORD[32+rsp],xmm6 + ror r13d,14 + movdqa xmm4,xmm0 + mov r8d,r14d + mov r12d,ebx + movdqa xmm7,xmm2 + ror r14d,9 + xor r13d,eax + xor r12d,ecx + ror r13d,5 + xor r14d,r8d +DB 102,15,58,15,227,4 + and r12d,eax + xor r13d,eax +DB 102,15,58,15,249,4 + add edx,DWORD[48+rsp] + mov r15d,r8d + xor r12d,ecx + ror r14d,11 + movdqa xmm5,xmm4 + xor r15d,r9d + add edx,r12d + movdqa xmm6,xmm4 + ror r13d,6 + and edi,r15d + psrld xmm4,3 + xor r14d,r8d + add edx,r13d + xor edi,r9d + paddd xmm3,xmm7 + ror r14d,2 + add r11d,edx + psrld xmm6,7 + add edx,edi + mov r13d,r11d + pshufd xmm7,xmm2,250 + add r14d,edx + ror r13d,14 + pslld xmm5,14 + mov edx,r14d + mov r12d,eax + pxor xmm4,xmm6 + ror r14d,9 + xor r13d,r11d + xor r12d,ebx + ror r13d,5 + psrld xmm6,11 + xor r14d,edx + 
pxor xmm4,xmm5 + and r12d,r11d + xor r13d,r11d + pslld xmm5,11 + add ecx,DWORD[52+rsp] + mov edi,edx + pxor xmm4,xmm6 + xor r12d,ebx + ror r14d,11 + movdqa xmm6,xmm7 + xor edi,r8d + add ecx,r12d + pxor xmm4,xmm5 + ror r13d,6 + and r15d,edi + xor r14d,edx + psrld xmm7,10 + add ecx,r13d + xor r15d,r8d + paddd xmm3,xmm4 + ror r14d,2 + add r10d,ecx + psrlq xmm6,17 + add ecx,r15d + mov r13d,r10d + add r14d,ecx + pxor xmm7,xmm6 + ror r13d,14 + mov ecx,r14d + mov r12d,r11d + ror r14d,9 + psrlq xmm6,2 + xor r13d,r10d + xor r12d,eax + pxor xmm7,xmm6 + ror r13d,5 + xor r14d,ecx + and r12d,r10d + pshufd xmm7,xmm7,128 + xor r13d,r10d + add ebx,DWORD[56+rsp] + mov r15d,ecx + psrldq xmm7,8 + xor r12d,eax + ror r14d,11 + xor r15d,edx + add ebx,r12d + ror r13d,6 + paddd xmm3,xmm7 + and edi,r15d + xor r14d,ecx + add ebx,r13d + pshufd xmm7,xmm3,80 + xor edi,edx + ror r14d,2 + add r9d,ebx + movdqa xmm6,xmm7 + add ebx,edi + mov r13d,r9d + psrld xmm7,10 + add r14d,ebx + ror r13d,14 + psrlq xmm6,17 + mov ebx,r14d + mov r12d,r10d + pxor xmm7,xmm6 + ror r14d,9 + xor r13d,r9d + xor r12d,r11d + ror r13d,5 + xor r14d,ebx + psrlq xmm6,2 + and r12d,r9d + xor r13d,r9d + add eax,DWORD[60+rsp] + pxor xmm7,xmm6 + mov edi,ebx + xor r12d,r11d + ror r14d,11 + pshufd xmm7,xmm7,8 + xor edi,ecx + add eax,r12d + movdqa xmm6,XMMWORD[96+rbp] + ror r13d,6 + and r15d,edi + pslldq xmm7,8 + xor r14d,ebx + add eax,r13d + xor r15d,ecx + paddd xmm3,xmm7 + ror r14d,2 + add r8d,eax + add eax,r15d + paddd xmm6,xmm3 + mov r13d,r8d + add r14d,eax + movdqa XMMWORD[48+rsp],xmm6 + cmp BYTE[131+rbp],0 + jne NEAR $L$ssse3_00_47 + ror r13d,14 + mov eax,r14d + mov r12d,r9d + ror r14d,9 + xor r13d,r8d + xor r12d,r10d + ror r13d,5 + xor r14d,eax + and r12d,r8d + xor r13d,r8d + add r11d,DWORD[rsp] + mov r15d,eax + xor r12d,r10d + ror r14d,11 + xor r15d,ebx + add r11d,r12d + ror r13d,6 + and edi,r15d + xor r14d,eax + add r11d,r13d + xor edi,ebx + ror r14d,2 + add edx,r11d + add r11d,edi + mov r13d,edx + add r14d,r11d + ror r13d,14 + mov r11d,r14d + mov r12d,r8d + ror r14d,9 + xor r13d,edx + xor r12d,r9d + ror r13d,5 + xor r14d,r11d + and r12d,edx + xor r13d,edx + add r10d,DWORD[4+rsp] + mov edi,r11d + xor r12d,r9d + ror r14d,11 + xor edi,eax + add r10d,r12d + ror r13d,6 + and r15d,edi + xor r14d,r11d + add r10d,r13d + xor r15d,eax + ror r14d,2 + add ecx,r10d + add r10d,r15d + mov r13d,ecx + add r14d,r10d + ror r13d,14 + mov r10d,r14d + mov r12d,edx + ror r14d,9 + xor r13d,ecx + xor r12d,r8d + ror r13d,5 + xor r14d,r10d + and r12d,ecx + xor r13d,ecx + add r9d,DWORD[8+rsp] + mov r15d,r10d + xor r12d,r8d + ror r14d,11 + xor r15d,r11d + add r9d,r12d + ror r13d,6 + and edi,r15d + xor r14d,r10d + add r9d,r13d + xor edi,r11d + ror r14d,2 + add ebx,r9d + add r9d,edi + mov r13d,ebx + add r14d,r9d + ror r13d,14 + mov r9d,r14d + mov r12d,ecx + ror r14d,9 + xor r13d,ebx + xor r12d,edx + ror r13d,5 + xor r14d,r9d + and r12d,ebx + xor r13d,ebx + add r8d,DWORD[12+rsp] + mov edi,r9d + xor r12d,edx + ror r14d,11 + xor edi,r10d + add r8d,r12d + ror r13d,6 + and r15d,edi + xor r14d,r9d + add r8d,r13d + xor r15d,r10d + ror r14d,2 + add eax,r8d + add r8d,r15d + mov r13d,eax + add r14d,r8d + ror r13d,14 + mov r8d,r14d + mov r12d,ebx + ror r14d,9 + xor r13d,eax + xor r12d,ecx + ror r13d,5 + xor r14d,r8d + and r12d,eax + xor r13d,eax + add edx,DWORD[16+rsp] + mov r15d,r8d + xor r12d,ecx + ror r14d,11 + xor r15d,r9d + add edx,r12d + ror r13d,6 + and edi,r15d + xor r14d,r8d + add edx,r13d + xor edi,r9d + ror r14d,2 + add r11d,edx + add edx,edi + mov r13d,r11d + add r14d,edx + 
ror r13d,14 + mov edx,r14d + mov r12d,eax + ror r14d,9 + xor r13d,r11d + xor r12d,ebx + ror r13d,5 + xor r14d,edx + and r12d,r11d + xor r13d,r11d + add ecx,DWORD[20+rsp] + mov edi,edx + xor r12d,ebx + ror r14d,11 + xor edi,r8d + add ecx,r12d + ror r13d,6 + and r15d,edi + xor r14d,edx + add ecx,r13d + xor r15d,r8d + ror r14d,2 + add r10d,ecx + add ecx,r15d + mov r13d,r10d + add r14d,ecx + ror r13d,14 + mov ecx,r14d + mov r12d,r11d + ror r14d,9 + xor r13d,r10d + xor r12d,eax + ror r13d,5 + xor r14d,ecx + and r12d,r10d + xor r13d,r10d + add ebx,DWORD[24+rsp] + mov r15d,ecx + xor r12d,eax + ror r14d,11 + xor r15d,edx + add ebx,r12d + ror r13d,6 + and edi,r15d + xor r14d,ecx + add ebx,r13d + xor edi,edx + ror r14d,2 + add r9d,ebx + add ebx,edi + mov r13d,r9d + add r14d,ebx + ror r13d,14 + mov ebx,r14d + mov r12d,r10d + ror r14d,9 + xor r13d,r9d + xor r12d,r11d + ror r13d,5 + xor r14d,ebx + and r12d,r9d + xor r13d,r9d + add eax,DWORD[28+rsp] + mov edi,ebx + xor r12d,r11d + ror r14d,11 + xor edi,ecx + add eax,r12d + ror r13d,6 + and r15d,edi + xor r14d,ebx + add eax,r13d + xor r15d,ecx + ror r14d,2 + add r8d,eax + add eax,r15d + mov r13d,r8d + add r14d,eax + ror r13d,14 + mov eax,r14d + mov r12d,r9d + ror r14d,9 + xor r13d,r8d + xor r12d,r10d + ror r13d,5 + xor r14d,eax + and r12d,r8d + xor r13d,r8d + add r11d,DWORD[32+rsp] + mov r15d,eax + xor r12d,r10d + ror r14d,11 + xor r15d,ebx + add r11d,r12d + ror r13d,6 + and edi,r15d + xor r14d,eax + add r11d,r13d + xor edi,ebx + ror r14d,2 + add edx,r11d + add r11d,edi + mov r13d,edx + add r14d,r11d + ror r13d,14 + mov r11d,r14d + mov r12d,r8d + ror r14d,9 + xor r13d,edx + xor r12d,r9d + ror r13d,5 + xor r14d,r11d + and r12d,edx + xor r13d,edx + add r10d,DWORD[36+rsp] + mov edi,r11d + xor r12d,r9d + ror r14d,11 + xor edi,eax + add r10d,r12d + ror r13d,6 + and r15d,edi + xor r14d,r11d + add r10d,r13d + xor r15d,eax + ror r14d,2 + add ecx,r10d + add r10d,r15d + mov r13d,ecx + add r14d,r10d + ror r13d,14 + mov r10d,r14d + mov r12d,edx + ror r14d,9 + xor r13d,ecx + xor r12d,r8d + ror r13d,5 + xor r14d,r10d + and r12d,ecx + xor r13d,ecx + add r9d,DWORD[40+rsp] + mov r15d,r10d + xor r12d,r8d + ror r14d,11 + xor r15d,r11d + add r9d,r12d + ror r13d,6 + and edi,r15d + xor r14d,r10d + add r9d,r13d + xor edi,r11d + ror r14d,2 + add ebx,r9d + add r9d,edi + mov r13d,ebx + add r14d,r9d + ror r13d,14 + mov r9d,r14d + mov r12d,ecx + ror r14d,9 + xor r13d,ebx + xor r12d,edx + ror r13d,5 + xor r14d,r9d + and r12d,ebx + xor r13d,ebx + add r8d,DWORD[44+rsp] + mov edi,r9d + xor r12d,edx + ror r14d,11 + xor edi,r10d + add r8d,r12d + ror r13d,6 + and r15d,edi + xor r14d,r9d + add r8d,r13d + xor r15d,r10d + ror r14d,2 + add eax,r8d + add r8d,r15d + mov r13d,eax + add r14d,r8d + ror r13d,14 + mov r8d,r14d + mov r12d,ebx + ror r14d,9 + xor r13d,eax + xor r12d,ecx + ror r13d,5 + xor r14d,r8d + and r12d,eax + xor r13d,eax + add edx,DWORD[48+rsp] + mov r15d,r8d + xor r12d,ecx + ror r14d,11 + xor r15d,r9d + add edx,r12d + ror r13d,6 + and edi,r15d + xor r14d,r8d + add edx,r13d + xor edi,r9d + ror r14d,2 + add r11d,edx + add edx,edi + mov r13d,r11d + add r14d,edx + ror r13d,14 + mov edx,r14d + mov r12d,eax + ror r14d,9 + xor r13d,r11d + xor r12d,ebx + ror r13d,5 + xor r14d,edx + and r12d,r11d + xor r13d,r11d + add ecx,DWORD[52+rsp] + mov edi,edx + xor r12d,ebx + ror r14d,11 + xor edi,r8d + add ecx,r12d + ror r13d,6 + and r15d,edi + xor r14d,edx + add ecx,r13d + xor r15d,r8d + ror r14d,2 + add r10d,ecx + add ecx,r15d + mov r13d,r10d + add r14d,ecx + ror r13d,14 + mov ecx,r14d + mov 
r12d,r11d + ror r14d,9 + xor r13d,r10d + xor r12d,eax + ror r13d,5 + xor r14d,ecx + and r12d,r10d + xor r13d,r10d + add ebx,DWORD[56+rsp] + mov r15d,ecx + xor r12d,eax + ror r14d,11 + xor r15d,edx + add ebx,r12d + ror r13d,6 + and edi,r15d + xor r14d,ecx + add ebx,r13d + xor edi,edx + ror r14d,2 + add r9d,ebx + add ebx,edi + mov r13d,r9d + add r14d,ebx + ror r13d,14 + mov ebx,r14d + mov r12d,r10d + ror r14d,9 + xor r13d,r9d + xor r12d,r11d + ror r13d,5 + xor r14d,ebx + and r12d,r9d + xor r13d,r9d + add eax,DWORD[60+rsp] + mov edi,ebx + xor r12d,r11d + ror r14d,11 + xor edi,ecx + add eax,r12d + ror r13d,6 + and r15d,edi + xor r14d,ebx + add eax,r13d + xor r15d,ecx + ror r14d,2 + add r8d,eax + add eax,r15d + mov r13d,r8d + add r14d,eax + mov rdi,QWORD[((64+0))+rsp] + mov eax,r14d + + add eax,DWORD[rdi] + lea rsi,[64+rsi] + add ebx,DWORD[4+rdi] + add ecx,DWORD[8+rdi] + add edx,DWORD[12+rdi] + add r8d,DWORD[16+rdi] + add r9d,DWORD[20+rdi] + add r10d,DWORD[24+rdi] + add r11d,DWORD[28+rdi] + + cmp rsi,QWORD[((64+16))+rsp] + + mov DWORD[rdi],eax + mov DWORD[4+rdi],ebx + mov DWORD[8+rdi],ecx + mov DWORD[12+rdi],edx + mov DWORD[16+rdi],r8d + mov DWORD[20+rdi],r9d + mov DWORD[24+rdi],r10d + mov DWORD[28+rdi],r11d + jb NEAR $L$loop_ssse3 + + mov rsi,QWORD[88+rsp] + + movaps xmm6,XMMWORD[((64+32))+rsp] + movaps xmm7,XMMWORD[((64+48))+rsp] + movaps xmm8,XMMWORD[((64+64))+rsp] + movaps xmm9,XMMWORD[((64+80))+rsp] + mov r15,QWORD[((-48))+rsi] + + mov r14,QWORD[((-40))+rsi] + + mov r13,QWORD[((-32))+rsi] + + mov r12,QWORD[((-24))+rsi] + + mov rbp,QWORD[((-16))+rsi] + + mov rbx,QWORD[((-8))+rsi] + + lea rsp,[rsi] + +$L$epilogue_ssse3: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_sha256_block_data_order_ssse3: +EXTERN __imp_RtlVirtualUnwind + +ALIGN 16 +se_handler: + push rsi + push rdi + push rbx + push rbp + push r12 + push r13 + push r14 + push r15 + pushfq + sub rsp,64 + + mov rax,QWORD[120+r8] + mov rbx,QWORD[248+r8] + + mov rsi,QWORD[8+r9] + mov r11,QWORD[56+r9] + + mov r10d,DWORD[r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jb NEAR $L$in_prologue + + mov rax,QWORD[152+r8] + + mov r10d,DWORD[4+r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jae NEAR $L$in_prologue + mov rsi,rax + mov rax,QWORD[((64+24))+rax] + + mov rbx,QWORD[((-8))+rax] + mov rbp,QWORD[((-16))+rax] + mov r12,QWORD[((-24))+rax] + mov r13,QWORD[((-32))+rax] + mov r14,QWORD[((-40))+rax] + mov r15,QWORD[((-48))+rax] + mov QWORD[144+r8],rbx + mov QWORD[160+r8],rbp + mov QWORD[216+r8],r12 + mov QWORD[224+r8],r13 + mov QWORD[232+r8],r14 + mov QWORD[240+r8],r15 + + lea r10,[$L$epilogue] + cmp rbx,r10 + jb NEAR $L$in_prologue + + lea rsi,[((64+32))+rsi] + lea rdi,[512+r8] + mov ecx,8 + DD 0xa548f3fc + +$L$in_prologue: + mov rdi,QWORD[8+rax] + mov rsi,QWORD[16+rax] + mov QWORD[152+r8],rax + mov QWORD[168+r8],rsi + mov QWORD[176+r8],rdi + + mov rdi,QWORD[40+r9] + mov rsi,r8 + mov ecx,154 + DD 0xa548f3fc + + mov rsi,r9 + xor rcx,rcx + mov rdx,QWORD[8+rsi] + mov r8,QWORD[rsi] + mov r9,QWORD[16+rsi] + mov r10,QWORD[40+rsi] + lea r11,[56+rsi] + lea r12,[24+rsi] + mov QWORD[32+rsp],r10 + mov QWORD[40+rsp],r11 + mov QWORD[48+rsp],r12 + mov QWORD[56+rsp],rcx + call QWORD[__imp_RtlVirtualUnwind] + + mov eax,1 + add rsp,64 + popfq + pop r15 + pop r14 + pop r13 + pop r12 + pop rbp + pop rbx + pop rdi + pop rsi + DB 0F3h,0C3h ;repret + + +ALIGN 16 +shaext_handler: + push rsi + push rdi + push rbx + push rbp + push r12 + push r13 + push r14 + push r15 + pushfq + sub rsp,64 + + mov rax,QWORD[120+r8] + mov 
rbx,QWORD[248+r8] + + lea r10,[$L$prologue_shaext] + cmp rbx,r10 + jb NEAR $L$in_prologue + + lea r10,[$L$epilogue_shaext] + cmp rbx,r10 + jae NEAR $L$in_prologue + + lea rsi,[((-8-80))+rax] + lea rdi,[512+r8] + mov ecx,10 + DD 0xa548f3fc + + jmp NEAR $L$in_prologue + +section .pdata rdata align=4 +ALIGN 4 + DD $L$SEH_begin_sha256_block_data_order wrt ..imagebase + DD $L$SEH_end_sha256_block_data_order wrt ..imagebase + DD $L$SEH_info_sha256_block_data_order wrt ..imagebase + DD $L$SEH_begin_sha256_block_data_order_shaext wrt ..imagebase + DD $L$SEH_end_sha256_block_data_order_shaext wrt ..imagebase + DD $L$SEH_info_sha256_block_data_order_shaext wrt ..imagebase + DD $L$SEH_begin_sha256_block_data_order_ssse3 wrt ..imagebase + DD $L$SEH_end_sha256_block_data_order_ssse3 wrt ..imagebase + DD $L$SEH_info_sha256_block_data_order_ssse3 wrt ..imagebase +section .xdata rdata align=8 +ALIGN 8 +$L$SEH_info_sha256_block_data_order: +DB 9,0,0,0 + DD se_handler wrt ..imagebase + DD $L$prologue wrt ..imagebase,$L$epilogue wrt ..imagebase +$L$SEH_info_sha256_block_data_order_shaext: +DB 9,0,0,0 + DD shaext_handler wrt ..imagebase +$L$SEH_info_sha256_block_data_order_ssse3: +DB 9,0,0,0 + DD se_handler wrt ..imagebase + DD $L$prologue_ssse3 wrt ..imagebase,$L$epilogue_ssse3 wrt ..imagebase diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm new file mode 100644 index 0000000000..c6397d4393 --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm @@ -0,0 +1,1938 @@ +; WARNING: do not edit! +; Generated from openssl/crypto/sha/asm/sha512-x86_64.pl +; +; Copyright 2005-2020 The OpenSSL Project Authors. All Rights Reserved. +; +; Licensed under the OpenSSL license (the "License"). You may not use +; this file except in compliance with the License. 
You can obtain a copy +; in the file LICENSE in the source distribution or at +; https://www.openssl.org/source/license.html + +default rel +%define XMMWORD +%define YMMWORD +%define ZMMWORD +section .text code align=64 + + +EXTERN OPENSSL_ia32cap_P +global sha512_block_data_order + +ALIGN 16 +sha512_block_data_order: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_sha512_block_data_order: + mov rdi,rcx + mov rsi,rdx + mov rdx,r8 + + + + mov rax,rsp + + push rbx + + push rbp + + push r12 + + push r13 + + push r14 + + push r15 + + shl rdx,4 + sub rsp,16*8+4*8 + lea rdx,[rdx*8+rsi] + and rsp,-64 + mov QWORD[((128+0))+rsp],rdi + mov QWORD[((128+8))+rsp],rsi + mov QWORD[((128+16))+rsp],rdx + mov QWORD[152+rsp],rax + +$L$prologue: + + mov rax,QWORD[rdi] + mov rbx,QWORD[8+rdi] + mov rcx,QWORD[16+rdi] + mov rdx,QWORD[24+rdi] + mov r8,QWORD[32+rdi] + mov r9,QWORD[40+rdi] + mov r10,QWORD[48+rdi] + mov r11,QWORD[56+rdi] + jmp NEAR $L$loop + +ALIGN 16 +$L$loop: + mov rdi,rbx + lea rbp,[K512] + xor rdi,rcx + mov r12,QWORD[rsi] + mov r13,r8 + mov r14,rax + bswap r12 + ror r13,23 + mov r15,r9 + + xor r13,r8 + ror r14,5 + xor r15,r10 + + mov QWORD[rsp],r12 + xor r14,rax + and r15,r8 + + ror r13,4 + add r12,r11 + xor r15,r10 + + ror r14,6 + xor r13,r8 + add r12,r15 + + mov r15,rax + add r12,QWORD[rbp] + xor r14,rax + + xor r15,rbx + ror r13,14 + mov r11,rbx + + and rdi,r15 + ror r14,28 + add r12,r13 + + xor r11,rdi + add rdx,r12 + add r11,r12 + + lea rbp,[8+rbp] + add r11,r14 + mov r12,QWORD[8+rsi] + mov r13,rdx + mov r14,r11 + bswap r12 + ror r13,23 + mov rdi,r8 + + xor r13,rdx + ror r14,5 + xor rdi,r9 + + mov QWORD[8+rsp],r12 + xor r14,r11 + and rdi,rdx + + ror r13,4 + add r12,r10 + xor rdi,r9 + + ror r14,6 + xor r13,rdx + add r12,rdi + + mov rdi,r11 + add r12,QWORD[rbp] + xor r14,r11 + + xor rdi,rax + ror r13,14 + mov r10,rax + + and r15,rdi + ror r14,28 + add r12,r13 + + xor r10,r15 + add rcx,r12 + add r10,r12 + + lea rbp,[24+rbp] + add r10,r14 + mov r12,QWORD[16+rsi] + mov r13,rcx + mov r14,r10 + bswap r12 + ror r13,23 + mov r15,rdx + + xor r13,rcx + ror r14,5 + xor r15,r8 + + mov QWORD[16+rsp],r12 + xor r14,r10 + and r15,rcx + + ror r13,4 + add r12,r9 + xor r15,r8 + + ror r14,6 + xor r13,rcx + add r12,r15 + + mov r15,r10 + add r12,QWORD[rbp] + xor r14,r10 + + xor r15,r11 + ror r13,14 + mov r9,r11 + + and rdi,r15 + ror r14,28 + add r12,r13 + + xor r9,rdi + add rbx,r12 + add r9,r12 + + lea rbp,[8+rbp] + add r9,r14 + mov r12,QWORD[24+rsi] + mov r13,rbx + mov r14,r9 + bswap r12 + ror r13,23 + mov rdi,rcx + + xor r13,rbx + ror r14,5 + xor rdi,rdx + + mov QWORD[24+rsp],r12 + xor r14,r9 + and rdi,rbx + + ror r13,4 + add r12,r8 + xor rdi,rdx + + ror r14,6 + xor r13,rbx + add r12,rdi + + mov rdi,r9 + add r12,QWORD[rbp] + xor r14,r9 + + xor rdi,r10 + ror r13,14 + mov r8,r10 + + and r15,rdi + ror r14,28 + add r12,r13 + + xor r8,r15 + add rax,r12 + add r8,r12 + + lea rbp,[24+rbp] + add r8,r14 + mov r12,QWORD[32+rsi] + mov r13,rax + mov r14,r8 + bswap r12 + ror r13,23 + mov r15,rbx + + xor r13,rax + ror r14,5 + xor r15,rcx + + mov QWORD[32+rsp],r12 + xor r14,r8 + and r15,rax + + ror r13,4 + add r12,rdx + xor r15,rcx + + ror r14,6 + xor r13,rax + add r12,r15 + + mov r15,r8 + add r12,QWORD[rbp] + xor r14,r8 + + xor r15,r9 + ror r13,14 + mov rdx,r9 + + and rdi,r15 + ror r14,28 + add r12,r13 + + xor rdx,rdi + add r11,r12 + add rdx,r12 + + lea rbp,[8+rbp] + add rdx,r14 + mov r12,QWORD[40+rsi] + mov r13,r11 + mov r14,rdx + bswap r12 + ror r13,23 + mov rdi,rax + + xor r13,r11 
+ ror r14,5 + xor rdi,rbx + + mov QWORD[40+rsp],r12 + xor r14,rdx + and rdi,r11 + + ror r13,4 + add r12,rcx + xor rdi,rbx + + ror r14,6 + xor r13,r11 + add r12,rdi + + mov rdi,rdx + add r12,QWORD[rbp] + xor r14,rdx + + xor rdi,r8 + ror r13,14 + mov rcx,r8 + + and r15,rdi + ror r14,28 + add r12,r13 + + xor rcx,r15 + add r10,r12 + add rcx,r12 + + lea rbp,[24+rbp] + add rcx,r14 + mov r12,QWORD[48+rsi] + mov r13,r10 + mov r14,rcx + bswap r12 + ror r13,23 + mov r15,r11 + + xor r13,r10 + ror r14,5 + xor r15,rax + + mov QWORD[48+rsp],r12 + xor r14,rcx + and r15,r10 + + ror r13,4 + add r12,rbx + xor r15,rax + + ror r14,6 + xor r13,r10 + add r12,r15 + + mov r15,rcx + add r12,QWORD[rbp] + xor r14,rcx + + xor r15,rdx + ror r13,14 + mov rbx,rdx + + and rdi,r15 + ror r14,28 + add r12,r13 + + xor rbx,rdi + add r9,r12 + add rbx,r12 + + lea rbp,[8+rbp] + add rbx,r14 + mov r12,QWORD[56+rsi] + mov r13,r9 + mov r14,rbx + bswap r12 + ror r13,23 + mov rdi,r10 + + xor r13,r9 + ror r14,5 + xor rdi,r11 + + mov QWORD[56+rsp],r12 + xor r14,rbx + and rdi,r9 + + ror r13,4 + add r12,rax + xor rdi,r11 + + ror r14,6 + xor r13,r9 + add r12,rdi + + mov rdi,rbx + add r12,QWORD[rbp] + xor r14,rbx + + xor rdi,rcx + ror r13,14 + mov rax,rcx + + and r15,rdi + ror r14,28 + add r12,r13 + + xor rax,r15 + add r8,r12 + add rax,r12 + + lea rbp,[24+rbp] + add rax,r14 + mov r12,QWORD[64+rsi] + mov r13,r8 + mov r14,rax + bswap r12 + ror r13,23 + mov r15,r9 + + xor r13,r8 + ror r14,5 + xor r15,r10 + + mov QWORD[64+rsp],r12 + xor r14,rax + and r15,r8 + + ror r13,4 + add r12,r11 + xor r15,r10 + + ror r14,6 + xor r13,r8 + add r12,r15 + + mov r15,rax + add r12,QWORD[rbp] + xor r14,rax + + xor r15,rbx + ror r13,14 + mov r11,rbx + + and rdi,r15 + ror r14,28 + add r12,r13 + + xor r11,rdi + add rdx,r12 + add r11,r12 + + lea rbp,[8+rbp] + add r11,r14 + mov r12,QWORD[72+rsi] + mov r13,rdx + mov r14,r11 + bswap r12 + ror r13,23 + mov rdi,r8 + + xor r13,rdx + ror r14,5 + xor rdi,r9 + + mov QWORD[72+rsp],r12 + xor r14,r11 + and rdi,rdx + + ror r13,4 + add r12,r10 + xor rdi,r9 + + ror r14,6 + xor r13,rdx + add r12,rdi + + mov rdi,r11 + add r12,QWORD[rbp] + xor r14,r11 + + xor rdi,rax + ror r13,14 + mov r10,rax + + and r15,rdi + ror r14,28 + add r12,r13 + + xor r10,r15 + add rcx,r12 + add r10,r12 + + lea rbp,[24+rbp] + add r10,r14 + mov r12,QWORD[80+rsi] + mov r13,rcx + mov r14,r10 + bswap r12 + ror r13,23 + mov r15,rdx + + xor r13,rcx + ror r14,5 + xor r15,r8 + + mov QWORD[80+rsp],r12 + xor r14,r10 + and r15,rcx + + ror r13,4 + add r12,r9 + xor r15,r8 + + ror r14,6 + xor r13,rcx + add r12,r15 + + mov r15,r10 + add r12,QWORD[rbp] + xor r14,r10 + + xor r15,r11 + ror r13,14 + mov r9,r11 + + and rdi,r15 + ror r14,28 + add r12,r13 + + xor r9,rdi + add rbx,r12 + add r9,r12 + + lea rbp,[8+rbp] + add r9,r14 + mov r12,QWORD[88+rsi] + mov r13,rbx + mov r14,r9 + bswap r12 + ror r13,23 + mov rdi,rcx + + xor r13,rbx + ror r14,5 + xor rdi,rdx + + mov QWORD[88+rsp],r12 + xor r14,r9 + and rdi,rbx + + ror r13,4 + add r12,r8 + xor rdi,rdx + + ror r14,6 + xor r13,rbx + add r12,rdi + + mov rdi,r9 + add r12,QWORD[rbp] + xor r14,r9 + + xor rdi,r10 + ror r13,14 + mov r8,r10 + + and r15,rdi + ror r14,28 + add r12,r13 + + xor r8,r15 + add rax,r12 + add r8,r12 + + lea rbp,[24+rbp] + add r8,r14 + mov r12,QWORD[96+rsi] + mov r13,rax + mov r14,r8 + bswap r12 + ror r13,23 + mov r15,rbx + + xor r13,rax + ror r14,5 + xor r15,rcx + + mov QWORD[96+rsp],r12 + xor r14,r8 + and r15,rax + + ror r13,4 + add r12,rdx + xor r15,rcx + + ror r14,6 + xor r13,rax + add r12,r15 + + mov r15,r8 + 
add r12,QWORD[rbp] + xor r14,r8 + + xor r15,r9 + ror r13,14 + mov rdx,r9 + + and rdi,r15 + ror r14,28 + add r12,r13 + + xor rdx,rdi + add r11,r12 + add rdx,r12 + + lea rbp,[8+rbp] + add rdx,r14 + mov r12,QWORD[104+rsi] + mov r13,r11 + mov r14,rdx + bswap r12 + ror r13,23 + mov rdi,rax + + xor r13,r11 + ror r14,5 + xor rdi,rbx + + mov QWORD[104+rsp],r12 + xor r14,rdx + and rdi,r11 + + ror r13,4 + add r12,rcx + xor rdi,rbx + + ror r14,6 + xor r13,r11 + add r12,rdi + + mov rdi,rdx + add r12,QWORD[rbp] + xor r14,rdx + + xor rdi,r8 + ror r13,14 + mov rcx,r8 + + and r15,rdi + ror r14,28 + add r12,r13 + + xor rcx,r15 + add r10,r12 + add rcx,r12 + + lea rbp,[24+rbp] + add rcx,r14 + mov r12,QWORD[112+rsi] + mov r13,r10 + mov r14,rcx + bswap r12 + ror r13,23 + mov r15,r11 + + xor r13,r10 + ror r14,5 + xor r15,rax + + mov QWORD[112+rsp],r12 + xor r14,rcx + and r15,r10 + + ror r13,4 + add r12,rbx + xor r15,rax + + ror r14,6 + xor r13,r10 + add r12,r15 + + mov r15,rcx + add r12,QWORD[rbp] + xor r14,rcx + + xor r15,rdx + ror r13,14 + mov rbx,rdx + + and rdi,r15 + ror r14,28 + add r12,r13 + + xor rbx,rdi + add r9,r12 + add rbx,r12 + + lea rbp,[8+rbp] + add rbx,r14 + mov r12,QWORD[120+rsi] + mov r13,r9 + mov r14,rbx + bswap r12 + ror r13,23 + mov rdi,r10 + + xor r13,r9 + ror r14,5 + xor rdi,r11 + + mov QWORD[120+rsp],r12 + xor r14,rbx + and rdi,r9 + + ror r13,4 + add r12,rax + xor rdi,r11 + + ror r14,6 + xor r13,r9 + add r12,rdi + + mov rdi,rbx + add r12,QWORD[rbp] + xor r14,rbx + + xor rdi,rcx + ror r13,14 + mov rax,rcx + + and r15,rdi + ror r14,28 + add r12,r13 + + xor rax,r15 + add r8,r12 + add rax,r12 + + lea rbp,[24+rbp] + jmp NEAR $L$rounds_16_xx +ALIGN 16 +$L$rounds_16_xx: + mov r13,QWORD[8+rsp] + mov r15,QWORD[112+rsp] + + mov r12,r13 + ror r13,7 + add rax,r14 + mov r14,r15 + ror r15,42 + + xor r13,r12 + shr r12,7 + ror r13,1 + xor r15,r14 + shr r14,6 + + ror r15,19 + xor r12,r13 + xor r15,r14 + add r12,QWORD[72+rsp] + + add r12,QWORD[rsp] + mov r13,r8 + add r12,r15 + mov r14,rax + ror r13,23 + mov r15,r9 + + xor r13,r8 + ror r14,5 + xor r15,r10 + + mov QWORD[rsp],r12 + xor r14,rax + and r15,r8 + + ror r13,4 + add r12,r11 + xor r15,r10 + + ror r14,6 + xor r13,r8 + add r12,r15 + + mov r15,rax + add r12,QWORD[rbp] + xor r14,rax + + xor r15,rbx + ror r13,14 + mov r11,rbx + + and rdi,r15 + ror r14,28 + add r12,r13 + + xor r11,rdi + add rdx,r12 + add r11,r12 + + lea rbp,[8+rbp] + mov r13,QWORD[16+rsp] + mov rdi,QWORD[120+rsp] + + mov r12,r13 + ror r13,7 + add r11,r14 + mov r14,rdi + ror rdi,42 + + xor r13,r12 + shr r12,7 + ror r13,1 + xor rdi,r14 + shr r14,6 + + ror rdi,19 + xor r12,r13 + xor rdi,r14 + add r12,QWORD[80+rsp] + + add r12,QWORD[8+rsp] + mov r13,rdx + add r12,rdi + mov r14,r11 + ror r13,23 + mov rdi,r8 + + xor r13,rdx + ror r14,5 + xor rdi,r9 + + mov QWORD[8+rsp],r12 + xor r14,r11 + and rdi,rdx + + ror r13,4 + add r12,r10 + xor rdi,r9 + + ror r14,6 + xor r13,rdx + add r12,rdi + + mov rdi,r11 + add r12,QWORD[rbp] + xor r14,r11 + + xor rdi,rax + ror r13,14 + mov r10,rax + + and r15,rdi + ror r14,28 + add r12,r13 + + xor r10,r15 + add rcx,r12 + add r10,r12 + + lea rbp,[24+rbp] + mov r13,QWORD[24+rsp] + mov r15,QWORD[rsp] + + mov r12,r13 + ror r13,7 + add r10,r14 + mov r14,r15 + ror r15,42 + + xor r13,r12 + shr r12,7 + ror r13,1 + xor r15,r14 + shr r14,6 + + ror r15,19 + xor r12,r13 + xor r15,r14 + add r12,QWORD[88+rsp] + + add r12,QWORD[16+rsp] + mov r13,rcx + add r12,r15 + mov r14,r10 + ror r13,23 + mov r15,rdx + + xor r13,rcx + ror r14,5 + xor r15,r8 + + mov QWORD[16+rsp],r12 + xor r14,r10 
+ and r15,rcx + + ror r13,4 + add r12,r9 + xor r15,r8 + + ror r14,6 + xor r13,rcx + add r12,r15 + + mov r15,r10 + add r12,QWORD[rbp] + xor r14,r10 + + xor r15,r11 + ror r13,14 + mov r9,r11 + + and rdi,r15 + ror r14,28 + add r12,r13 + + xor r9,rdi + add rbx,r12 + add r9,r12 + + lea rbp,[8+rbp] + mov r13,QWORD[32+rsp] + mov rdi,QWORD[8+rsp] + + mov r12,r13 + ror r13,7 + add r9,r14 + mov r14,rdi + ror rdi,42 + + xor r13,r12 + shr r12,7 + ror r13,1 + xor rdi,r14 + shr r14,6 + + ror rdi,19 + xor r12,r13 + xor rdi,r14 + add r12,QWORD[96+rsp] + + add r12,QWORD[24+rsp] + mov r13,rbx + add r12,rdi + mov r14,r9 + ror r13,23 + mov rdi,rcx + + xor r13,rbx + ror r14,5 + xor rdi,rdx + + mov QWORD[24+rsp],r12 + xor r14,r9 + and rdi,rbx + + ror r13,4 + add r12,r8 + xor rdi,rdx + + ror r14,6 + xor r13,rbx + add r12,rdi + + mov rdi,r9 + add r12,QWORD[rbp] + xor r14,r9 + + xor rdi,r10 + ror r13,14 + mov r8,r10 + + and r15,rdi + ror r14,28 + add r12,r13 + + xor r8,r15 + add rax,r12 + add r8,r12 + + lea rbp,[24+rbp] + mov r13,QWORD[40+rsp] + mov r15,QWORD[16+rsp] + + mov r12,r13 + ror r13,7 + add r8,r14 + mov r14,r15 + ror r15,42 + + xor r13,r12 + shr r12,7 + ror r13,1 + xor r15,r14 + shr r14,6 + + ror r15,19 + xor r12,r13 + xor r15,r14 + add r12,QWORD[104+rsp] + + add r12,QWORD[32+rsp] + mov r13,rax + add r12,r15 + mov r14,r8 + ror r13,23 + mov r15,rbx + + xor r13,rax + ror r14,5 + xor r15,rcx + + mov QWORD[32+rsp],r12 + xor r14,r8 + and r15,rax + + ror r13,4 + add r12,rdx + xor r15,rcx + + ror r14,6 + xor r13,rax + add r12,r15 + + mov r15,r8 + add r12,QWORD[rbp] + xor r14,r8 + + xor r15,r9 + ror r13,14 + mov rdx,r9 + + and rdi,r15 + ror r14,28 + add r12,r13 + + xor rdx,rdi + add r11,r12 + add rdx,r12 + + lea rbp,[8+rbp] + mov r13,QWORD[48+rsp] + mov rdi,QWORD[24+rsp] + + mov r12,r13 + ror r13,7 + add rdx,r14 + mov r14,rdi + ror rdi,42 + + xor r13,r12 + shr r12,7 + ror r13,1 + xor rdi,r14 + shr r14,6 + + ror rdi,19 + xor r12,r13 + xor rdi,r14 + add r12,QWORD[112+rsp] + + add r12,QWORD[40+rsp] + mov r13,r11 + add r12,rdi + mov r14,rdx + ror r13,23 + mov rdi,rax + + xor r13,r11 + ror r14,5 + xor rdi,rbx + + mov QWORD[40+rsp],r12 + xor r14,rdx + and rdi,r11 + + ror r13,4 + add r12,rcx + xor rdi,rbx + + ror r14,6 + xor r13,r11 + add r12,rdi + + mov rdi,rdx + add r12,QWORD[rbp] + xor r14,rdx + + xor rdi,r8 + ror r13,14 + mov rcx,r8 + + and r15,rdi + ror r14,28 + add r12,r13 + + xor rcx,r15 + add r10,r12 + add rcx,r12 + + lea rbp,[24+rbp] + mov r13,QWORD[56+rsp] + mov r15,QWORD[32+rsp] + + mov r12,r13 + ror r13,7 + add rcx,r14 + mov r14,r15 + ror r15,42 + + xor r13,r12 + shr r12,7 + ror r13,1 + xor r15,r14 + shr r14,6 + + ror r15,19 + xor r12,r13 + xor r15,r14 + add r12,QWORD[120+rsp] + + add r12,QWORD[48+rsp] + mov r13,r10 + add r12,r15 + mov r14,rcx + ror r13,23 + mov r15,r11 + + xor r13,r10 + ror r14,5 + xor r15,rax + + mov QWORD[48+rsp],r12 + xor r14,rcx + and r15,r10 + + ror r13,4 + add r12,rbx + xor r15,rax + + ror r14,6 + xor r13,r10 + add r12,r15 + + mov r15,rcx + add r12,QWORD[rbp] + xor r14,rcx + + xor r15,rdx + ror r13,14 + mov rbx,rdx + + and rdi,r15 + ror r14,28 + add r12,r13 + + xor rbx,rdi + add r9,r12 + add rbx,r12 + + lea rbp,[8+rbp] + mov r13,QWORD[64+rsp] + mov rdi,QWORD[40+rsp] + + mov r12,r13 + ror r13,7 + add rbx,r14 + mov r14,rdi + ror rdi,42 + + xor r13,r12 + shr r12,7 + ror r13,1 + xor rdi,r14 + shr r14,6 + + ror rdi,19 + xor r12,r13 + xor rdi,r14 + add r12,QWORD[rsp] + + add r12,QWORD[56+rsp] + mov r13,r9 + add r12,rdi + mov r14,rbx + ror r13,23 + mov rdi,r10 + + xor r13,r9 + ror r14,5 + 
xor rdi,r11 + + mov QWORD[56+rsp],r12 + xor r14,rbx + and rdi,r9 + + ror r13,4 + add r12,rax + xor rdi,r11 + + ror r14,6 + xor r13,r9 + add r12,rdi + + mov rdi,rbx + add r12,QWORD[rbp] + xor r14,rbx + + xor rdi,rcx + ror r13,14 + mov rax,rcx + + and r15,rdi + ror r14,28 + add r12,r13 + + xor rax,r15 + add r8,r12 + add rax,r12 + + lea rbp,[24+rbp] + mov r13,QWORD[72+rsp] + mov r15,QWORD[48+rsp] + + mov r12,r13 + ror r13,7 + add rax,r14 + mov r14,r15 + ror r15,42 + + xor r13,r12 + shr r12,7 + ror r13,1 + xor r15,r14 + shr r14,6 + + ror r15,19 + xor r12,r13 + xor r15,r14 + add r12,QWORD[8+rsp] + + add r12,QWORD[64+rsp] + mov r13,r8 + add r12,r15 + mov r14,rax + ror r13,23 + mov r15,r9 + + xor r13,r8 + ror r14,5 + xor r15,r10 + + mov QWORD[64+rsp],r12 + xor r14,rax + and r15,r8 + + ror r13,4 + add r12,r11 + xor r15,r10 + + ror r14,6 + xor r13,r8 + add r12,r15 + + mov r15,rax + add r12,QWORD[rbp] + xor r14,rax + + xor r15,rbx + ror r13,14 + mov r11,rbx + + and rdi,r15 + ror r14,28 + add r12,r13 + + xor r11,rdi + add rdx,r12 + add r11,r12 + + lea rbp,[8+rbp] + mov r13,QWORD[80+rsp] + mov rdi,QWORD[56+rsp] + + mov r12,r13 + ror r13,7 + add r11,r14 + mov r14,rdi + ror rdi,42 + + xor r13,r12 + shr r12,7 + ror r13,1 + xor rdi,r14 + shr r14,6 + + ror rdi,19 + xor r12,r13 + xor rdi,r14 + add r12,QWORD[16+rsp] + + add r12,QWORD[72+rsp] + mov r13,rdx + add r12,rdi + mov r14,r11 + ror r13,23 + mov rdi,r8 + + xor r13,rdx + ror r14,5 + xor rdi,r9 + + mov QWORD[72+rsp],r12 + xor r14,r11 + and rdi,rdx + + ror r13,4 + add r12,r10 + xor rdi,r9 + + ror r14,6 + xor r13,rdx + add r12,rdi + + mov rdi,r11 + add r12,QWORD[rbp] + xor r14,r11 + + xor rdi,rax + ror r13,14 + mov r10,rax + + and r15,rdi + ror r14,28 + add r12,r13 + + xor r10,r15 + add rcx,r12 + add r10,r12 + + lea rbp,[24+rbp] + mov r13,QWORD[88+rsp] + mov r15,QWORD[64+rsp] + + mov r12,r13 + ror r13,7 + add r10,r14 + mov r14,r15 + ror r15,42 + + xor r13,r12 + shr r12,7 + ror r13,1 + xor r15,r14 + shr r14,6 + + ror r15,19 + xor r12,r13 + xor r15,r14 + add r12,QWORD[24+rsp] + + add r12,QWORD[80+rsp] + mov r13,rcx + add r12,r15 + mov r14,r10 + ror r13,23 + mov r15,rdx + + xor r13,rcx + ror r14,5 + xor r15,r8 + + mov QWORD[80+rsp],r12 + xor r14,r10 + and r15,rcx + + ror r13,4 + add r12,r9 + xor r15,r8 + + ror r14,6 + xor r13,rcx + add r12,r15 + + mov r15,r10 + add r12,QWORD[rbp] + xor r14,r10 + + xor r15,r11 + ror r13,14 + mov r9,r11 + + and rdi,r15 + ror r14,28 + add r12,r13 + + xor r9,rdi + add rbx,r12 + add r9,r12 + + lea rbp,[8+rbp] + mov r13,QWORD[96+rsp] + mov rdi,QWORD[72+rsp] + + mov r12,r13 + ror r13,7 + add r9,r14 + mov r14,rdi + ror rdi,42 + + xor r13,r12 + shr r12,7 + ror r13,1 + xor rdi,r14 + shr r14,6 + + ror rdi,19 + xor r12,r13 + xor rdi,r14 + add r12,QWORD[32+rsp] + + add r12,QWORD[88+rsp] + mov r13,rbx + add r12,rdi + mov r14,r9 + ror r13,23 + mov rdi,rcx + + xor r13,rbx + ror r14,5 + xor rdi,rdx + + mov QWORD[88+rsp],r12 + xor r14,r9 + and rdi,rbx + + ror r13,4 + add r12,r8 + xor rdi,rdx + + ror r14,6 + xor r13,rbx + add r12,rdi + + mov rdi,r9 + add r12,QWORD[rbp] + xor r14,r9 + + xor rdi,r10 + ror r13,14 + mov r8,r10 + + and r15,rdi + ror r14,28 + add r12,r13 + + xor r8,r15 + add rax,r12 + add r8,r12 + + lea rbp,[24+rbp] + mov r13,QWORD[104+rsp] + mov r15,QWORD[80+rsp] + + mov r12,r13 + ror r13,7 + add r8,r14 + mov r14,r15 + ror r15,42 + + xor r13,r12 + shr r12,7 + ror r13,1 + xor r15,r14 + shr r14,6 + + ror r15,19 + xor r12,r13 + xor r15,r14 + add r12,QWORD[40+rsp] + + add r12,QWORD[96+rsp] + mov r13,rax + add r12,r15 + mov r14,r8 + ror 
r13,23 + mov r15,rbx + + xor r13,rax + ror r14,5 + xor r15,rcx + + mov QWORD[96+rsp],r12 + xor r14,r8 + and r15,rax + + ror r13,4 + add r12,rdx + xor r15,rcx + + ror r14,6 + xor r13,rax + add r12,r15 + + mov r15,r8 + add r12,QWORD[rbp] + xor r14,r8 + + xor r15,r9 + ror r13,14 + mov rdx,r9 + + and rdi,r15 + ror r14,28 + add r12,r13 + + xor rdx,rdi + add r11,r12 + add rdx,r12 + + lea rbp,[8+rbp] + mov r13,QWORD[112+rsp] + mov rdi,QWORD[88+rsp] + + mov r12,r13 + ror r13,7 + add rdx,r14 + mov r14,rdi + ror rdi,42 + + xor r13,r12 + shr r12,7 + ror r13,1 + xor rdi,r14 + shr r14,6 + + ror rdi,19 + xor r12,r13 + xor rdi,r14 + add r12,QWORD[48+rsp] + + add r12,QWORD[104+rsp] + mov r13,r11 + add r12,rdi + mov r14,rdx + ror r13,23 + mov rdi,rax + + xor r13,r11 + ror r14,5 + xor rdi,rbx + + mov QWORD[104+rsp],r12 + xor r14,rdx + and rdi,r11 + + ror r13,4 + add r12,rcx + xor rdi,rbx + + ror r14,6 + xor r13,r11 + add r12,rdi + + mov rdi,rdx + add r12,QWORD[rbp] + xor r14,rdx + + xor rdi,r8 + ror r13,14 + mov rcx,r8 + + and r15,rdi + ror r14,28 + add r12,r13 + + xor rcx,r15 + add r10,r12 + add rcx,r12 + + lea rbp,[24+rbp] + mov r13,QWORD[120+rsp] + mov r15,QWORD[96+rsp] + + mov r12,r13 + ror r13,7 + add rcx,r14 + mov r14,r15 + ror r15,42 + + xor r13,r12 + shr r12,7 + ror r13,1 + xor r15,r14 + shr r14,6 + + ror r15,19 + xor r12,r13 + xor r15,r14 + add r12,QWORD[56+rsp] + + add r12,QWORD[112+rsp] + mov r13,r10 + add r12,r15 + mov r14,rcx + ror r13,23 + mov r15,r11 + + xor r13,r10 + ror r14,5 + xor r15,rax + + mov QWORD[112+rsp],r12 + xor r14,rcx + and r15,r10 + + ror r13,4 + add r12,rbx + xor r15,rax + + ror r14,6 + xor r13,r10 + add r12,r15 + + mov r15,rcx + add r12,QWORD[rbp] + xor r14,rcx + + xor r15,rdx + ror r13,14 + mov rbx,rdx + + and rdi,r15 + ror r14,28 + add r12,r13 + + xor rbx,rdi + add r9,r12 + add rbx,r12 + + lea rbp,[8+rbp] + mov r13,QWORD[rsp] + mov rdi,QWORD[104+rsp] + + mov r12,r13 + ror r13,7 + add rbx,r14 + mov r14,rdi + ror rdi,42 + + xor r13,r12 + shr r12,7 + ror r13,1 + xor rdi,r14 + shr r14,6 + + ror rdi,19 + xor r12,r13 + xor rdi,r14 + add r12,QWORD[64+rsp] + + add r12,QWORD[120+rsp] + mov r13,r9 + add r12,rdi + mov r14,rbx + ror r13,23 + mov rdi,r10 + + xor r13,r9 + ror r14,5 + xor rdi,r11 + + mov QWORD[120+rsp],r12 + xor r14,rbx + and rdi,r9 + + ror r13,4 + add r12,rax + xor rdi,r11 + + ror r14,6 + xor r13,r9 + add r12,rdi + + mov rdi,rbx + add r12,QWORD[rbp] + xor r14,rbx + + xor rdi,rcx + ror r13,14 + mov rax,rcx + + and r15,rdi + ror r14,28 + add r12,r13 + + xor rax,r15 + add r8,r12 + add rax,r12 + + lea rbp,[24+rbp] + cmp BYTE[7+rbp],0 + jnz NEAR $L$rounds_16_xx + + mov rdi,QWORD[((128+0))+rsp] + add rax,r14 + lea rsi,[128+rsi] + + add rax,QWORD[rdi] + add rbx,QWORD[8+rdi] + add rcx,QWORD[16+rdi] + add rdx,QWORD[24+rdi] + add r8,QWORD[32+rdi] + add r9,QWORD[40+rdi] + add r10,QWORD[48+rdi] + add r11,QWORD[56+rdi] + + cmp rsi,QWORD[((128+16))+rsp] + + mov QWORD[rdi],rax + mov QWORD[8+rdi],rbx + mov QWORD[16+rdi],rcx + mov QWORD[24+rdi],rdx + mov QWORD[32+rdi],r8 + mov QWORD[40+rdi],r9 + mov QWORD[48+rdi],r10 + mov QWORD[56+rdi],r11 + jb NEAR $L$loop + + mov rsi,QWORD[152+rsp] + + mov r15,QWORD[((-48))+rsi] + + mov r14,QWORD[((-40))+rsi] + + mov r13,QWORD[((-32))+rsi] + + mov r12,QWORD[((-24))+rsi] + + mov rbp,QWORD[((-16))+rsi] + + mov rbx,QWORD[((-8))+rsi] + + lea rsp,[rsi] + +$L$epilogue: + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_sha512_block_data_order: +ALIGN 64 + +K512: + DQ 0x428a2f98d728ae22,0x7137449123ef65cd + DQ 
0x428a2f98d728ae22,0x7137449123ef65cd + DQ 0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc + DQ 0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc + DQ 0x3956c25bf348b538,0x59f111f1b605d019 + DQ 0x3956c25bf348b538,0x59f111f1b605d019 + DQ 0x923f82a4af194f9b,0xab1c5ed5da6d8118 + DQ 0x923f82a4af194f9b,0xab1c5ed5da6d8118 + DQ 0xd807aa98a3030242,0x12835b0145706fbe + DQ 0xd807aa98a3030242,0x12835b0145706fbe + DQ 0x243185be4ee4b28c,0x550c7dc3d5ffb4e2 + DQ 0x243185be4ee4b28c,0x550c7dc3d5ffb4e2 + DQ 0x72be5d74f27b896f,0x80deb1fe3b1696b1 + DQ 0x72be5d74f27b896f,0x80deb1fe3b1696b1 + DQ 0x9bdc06a725c71235,0xc19bf174cf692694 + DQ 0x9bdc06a725c71235,0xc19bf174cf692694 + DQ 0xe49b69c19ef14ad2,0xefbe4786384f25e3 + DQ 0xe49b69c19ef14ad2,0xefbe4786384f25e3 + DQ 0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65 + DQ 0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65 + DQ 0x2de92c6f592b0275,0x4a7484aa6ea6e483 + DQ 0x2de92c6f592b0275,0x4a7484aa6ea6e483 + DQ 0x5cb0a9dcbd41fbd4,0x76f988da831153b5 + DQ 0x5cb0a9dcbd41fbd4,0x76f988da831153b5 + DQ 0x983e5152ee66dfab,0xa831c66d2db43210 + DQ 0x983e5152ee66dfab,0xa831c66d2db43210 + DQ 0xb00327c898fb213f,0xbf597fc7beef0ee4 + DQ 0xb00327c898fb213f,0xbf597fc7beef0ee4 + DQ 0xc6e00bf33da88fc2,0xd5a79147930aa725 + DQ 0xc6e00bf33da88fc2,0xd5a79147930aa725 + DQ 0x06ca6351e003826f,0x142929670a0e6e70 + DQ 0x06ca6351e003826f,0x142929670a0e6e70 + DQ 0x27b70a8546d22ffc,0x2e1b21385c26c926 + DQ 0x27b70a8546d22ffc,0x2e1b21385c26c926 + DQ 0x4d2c6dfc5ac42aed,0x53380d139d95b3df + DQ 0x4d2c6dfc5ac42aed,0x53380d139d95b3df + DQ 0x650a73548baf63de,0x766a0abb3c77b2a8 + DQ 0x650a73548baf63de,0x766a0abb3c77b2a8 + DQ 0x81c2c92e47edaee6,0x92722c851482353b + DQ 0x81c2c92e47edaee6,0x92722c851482353b + DQ 0xa2bfe8a14cf10364,0xa81a664bbc423001 + DQ 0xa2bfe8a14cf10364,0xa81a664bbc423001 + DQ 0xc24b8b70d0f89791,0xc76c51a30654be30 + DQ 0xc24b8b70d0f89791,0xc76c51a30654be30 + DQ 0xd192e819d6ef5218,0xd69906245565a910 + DQ 0xd192e819d6ef5218,0xd69906245565a910 + DQ 0xf40e35855771202a,0x106aa07032bbd1b8 + DQ 0xf40e35855771202a,0x106aa07032bbd1b8 + DQ 0x19a4c116b8d2d0c8,0x1e376c085141ab53 + DQ 0x19a4c116b8d2d0c8,0x1e376c085141ab53 + DQ 0x2748774cdf8eeb99,0x34b0bcb5e19b48a8 + DQ 0x2748774cdf8eeb99,0x34b0bcb5e19b48a8 + DQ 0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb + DQ 0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb + DQ 0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3 + DQ 0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3 + DQ 0x748f82ee5defb2fc,0x78a5636f43172f60 + DQ 0x748f82ee5defb2fc,0x78a5636f43172f60 + DQ 0x84c87814a1f0ab72,0x8cc702081a6439ec + DQ 0x84c87814a1f0ab72,0x8cc702081a6439ec + DQ 0x90befffa23631e28,0xa4506cebde82bde9 + DQ 0x90befffa23631e28,0xa4506cebde82bde9 + DQ 0xbef9a3f7b2c67915,0xc67178f2e372532b + DQ 0xbef9a3f7b2c67915,0xc67178f2e372532b + DQ 0xca273eceea26619c,0xd186b8c721c0c207 + DQ 0xca273eceea26619c,0xd186b8c721c0c207 + DQ 0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178 + DQ 0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178 + DQ 0x06f067aa72176fba,0x0a637dc5a2c898a6 + DQ 0x06f067aa72176fba,0x0a637dc5a2c898a6 + DQ 0x113f9804bef90dae,0x1b710b35131c471b + DQ 0x113f9804bef90dae,0x1b710b35131c471b + DQ 0x28db77f523047d84,0x32caab7b40c72493 + DQ 0x28db77f523047d84,0x32caab7b40c72493 + DQ 0x3c9ebe0a15c9bebc,0x431d67c49c100d4c + DQ 0x3c9ebe0a15c9bebc,0x431d67c49c100d4c + DQ 0x4cc5d4becb3e42b6,0x597f299cfc657e2a + DQ 0x4cc5d4becb3e42b6,0x597f299cfc657e2a + DQ 0x5fcb6fab3ad6faec,0x6c44198c4a475817 + DQ 0x5fcb6fab3ad6faec,0x6c44198c4a475817 + + DQ 0x0001020304050607,0x08090a0b0c0d0e0f + DQ 0x0001020304050607,0x08090a0b0c0d0e0f +DB 83,72,65,53,49,50,32,98,108,111,99,107,32,116,114,97 +DB 
110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54 +DB 52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121 +DB 32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46 +DB 111,114,103,62,0 +EXTERN __imp_RtlVirtualUnwind + +ALIGN 16 +se_handler: + push rsi + push rdi + push rbx + push rbp + push r12 + push r13 + push r14 + push r15 + pushfq + sub rsp,64 + + mov rax,QWORD[120+r8] + mov rbx,QWORD[248+r8] + + mov rsi,QWORD[8+r9] + mov r11,QWORD[56+r9] + + mov r10d,DWORD[r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jb NEAR $L$in_prologue + + mov rax,QWORD[152+r8] + + mov r10d,DWORD[4+r11] + lea r10,[r10*1+rsi] + cmp rbx,r10 + jae NEAR $L$in_prologue + mov rsi,rax + mov rax,QWORD[((128+24))+rax] + + mov rbx,QWORD[((-8))+rax] + mov rbp,QWORD[((-16))+rax] + mov r12,QWORD[((-24))+rax] + mov r13,QWORD[((-32))+rax] + mov r14,QWORD[((-40))+rax] + mov r15,QWORD[((-48))+rax] + mov QWORD[144+r8],rbx + mov QWORD[160+r8],rbp + mov QWORD[216+r8],r12 + mov QWORD[224+r8],r13 + mov QWORD[232+r8],r14 + mov QWORD[240+r8],r15 + + lea r10,[$L$epilogue] + cmp rbx,r10 + jb NEAR $L$in_prologue + + lea rsi,[((128+32))+rsi] + lea rdi,[512+r8] + mov ecx,12 + DD 0xa548f3fc + +$L$in_prologue: + mov rdi,QWORD[8+rax] + mov rsi,QWORD[16+rax] + mov QWORD[152+r8],rax + mov QWORD[168+r8],rsi + mov QWORD[176+r8],rdi + + mov rdi,QWORD[40+r9] + mov rsi,r8 + mov ecx,154 + DD 0xa548f3fc + + mov rsi,r9 + xor rcx,rcx + mov rdx,QWORD[8+rsi] + mov r8,QWORD[rsi] + mov r9,QWORD[16+rsi] + mov r10,QWORD[40+rsi] + lea r11,[56+rsi] + lea r12,[24+rsi] + mov QWORD[32+rsp],r10 + mov QWORD[40+rsp],r11 + mov QWORD[48+rsp],r12 + mov QWORD[56+rsp],rcx + call QWORD[__imp_RtlVirtualUnwind] + + mov eax,1 + add rsp,64 + popfq + pop r15 + pop r14 + pop r13 + pop r12 + pop rbp + pop rbx + pop rdi + pop rsi + DB 0F3h,0C3h ;repret + +section .pdata rdata align=4 +ALIGN 4 + DD $L$SEH_begin_sha512_block_data_order wrt ..imagebase + DD $L$SEH_end_sha512_block_data_order wrt ..imagebase + DD $L$SEH_info_sha512_block_data_order wrt ..imagebase +section .xdata rdata align=8 +ALIGN 8 +$L$SEH_info_sha512_block_data_order: +DB 9,0,0,0 + DD se_handler wrt ..imagebase + DD $L$prologue wrt ..imagebase,$L$epilogue wrt ..imagebase diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm new file mode 100644 index 0000000000..2a3d5bcf72 --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm @@ -0,0 +1,491 @@ +; WARNING: do not edit! +; Generated from openssl/crypto/x86_64cpuid.pl +; +; Copyright 2005-2020 The OpenSSL Project Authors. All Rights Reserved. +; +; Licensed under the OpenSSL license (the "License"). You may not use +; this file except in compliance with the License. 
You can obtain a copy +; in the file LICENSE in the source distribution or at +; https://www.openssl.org/source/license.html + +default rel +%define XMMWORD +%define YMMWORD +%define ZMMWORD +EXTERN OPENSSL_cpuid_setup + +section .CRT$XCU rdata align=8 + DQ OPENSSL_cpuid_setup + + +common OPENSSL_ia32cap_P 16 + +section .text code align=64 + + +global OPENSSL_atomic_add + +ALIGN 16 +OPENSSL_atomic_add: + + mov eax,DWORD[rcx] +$L$spin: lea r8,[rax*1+rdx] +DB 0xf0 + cmpxchg DWORD[rcx],r8d + jne NEAR $L$spin + mov eax,r8d +DB 0x48,0x98 + DB 0F3h,0C3h ;repret + + + +global OPENSSL_rdtsc + +ALIGN 16 +OPENSSL_rdtsc: + + rdtsc + shl rdx,32 + or rax,rdx + DB 0F3h,0C3h ;repret + + + +global OPENSSL_ia32_cpuid + +ALIGN 16 +OPENSSL_ia32_cpuid: + mov QWORD[8+rsp],rdi ;WIN64 prologue + mov QWORD[16+rsp],rsi + mov rax,rsp +$L$SEH_begin_OPENSSL_ia32_cpuid: + mov rdi,rcx + + + + mov r8,rbx + + + xor eax,eax + mov QWORD[8+rdi],rax + cpuid + mov r11d,eax + + xor eax,eax + cmp ebx,0x756e6547 + setne al + mov r9d,eax + cmp edx,0x49656e69 + setne al + or r9d,eax + cmp ecx,0x6c65746e + setne al + or r9d,eax + jz NEAR $L$intel + + cmp ebx,0x68747541 + setne al + mov r10d,eax + cmp edx,0x69746E65 + setne al + or r10d,eax + cmp ecx,0x444D4163 + setne al + or r10d,eax + jnz NEAR $L$intel + + + mov eax,0x80000000 + cpuid + cmp eax,0x80000001 + jb NEAR $L$intel + mov r10d,eax + mov eax,0x80000001 + cpuid + or r9d,ecx + and r9d,0x00000801 + + cmp r10d,0x80000008 + jb NEAR $L$intel + + mov eax,0x80000008 + cpuid + movzx r10,cl + inc r10 + + mov eax,1 + cpuid + bt edx,28 + jnc NEAR $L$generic + shr ebx,16 + cmp bl,r10b + ja NEAR $L$generic + and edx,0xefffffff + jmp NEAR $L$generic + +$L$intel: + cmp r11d,4 + mov r10d,-1 + jb NEAR $L$nocacheinfo + + mov eax,4 + mov ecx,0 + cpuid + mov r10d,eax + shr r10d,14 + and r10d,0xfff + +$L$nocacheinfo: + mov eax,1 + cpuid + movd xmm0,eax + and edx,0xbfefffff + cmp r9d,0 + jne NEAR $L$notintel + or edx,0x40000000 + and ah,15 + cmp ah,15 + jne NEAR $L$notP4 + or edx,0x00100000 +$L$notP4: + cmp ah,6 + jne NEAR $L$notintel + and eax,0x0fff0ff0 + cmp eax,0x00050670 + je NEAR $L$knights + cmp eax,0x00080650 + jne NEAR $L$notintel +$L$knights: + and ecx,0xfbffffff + +$L$notintel: + bt edx,28 + jnc NEAR $L$generic + and edx,0xefffffff + cmp r10d,0 + je NEAR $L$generic + + or edx,0x10000000 + shr ebx,16 + cmp bl,1 + ja NEAR $L$generic + and edx,0xefffffff +$L$generic: + and r9d,0x00000800 + and ecx,0xfffff7ff + or r9d,ecx + + mov r10d,edx + + cmp r11d,7 + jb NEAR $L$no_extended_info + mov eax,7 + xor ecx,ecx + cpuid + bt r9d,26 + jc NEAR $L$notknights + and ebx,0xfff7ffff +$L$notknights: + movd eax,xmm0 + and eax,0x0fff0ff0 + cmp eax,0x00050650 + jne NEAR $L$notskylakex + and ebx,0xfffeffff + +$L$notskylakex: + mov DWORD[8+rdi],ebx + mov DWORD[12+rdi],ecx +$L$no_extended_info: + + bt r9d,27 + jnc NEAR $L$clear_avx + xor ecx,ecx +DB 0x0f,0x01,0xd0 + and eax,0xe6 + cmp eax,0xe6 + je NEAR $L$done + and DWORD[8+rdi],0x3fdeffff + + + + + and eax,6 + cmp eax,6 + je NEAR $L$done +$L$clear_avx: + mov eax,0xefffe7ff + and r9d,eax + mov eax,0x3fdeffdf + and DWORD[8+rdi],eax +$L$done: + shl r9,32 + mov eax,r10d + mov rbx,r8 + + or rax,r9 + mov rdi,QWORD[8+rsp] ;WIN64 epilogue + mov rsi,QWORD[16+rsp] + DB 0F3h,0C3h ;repret + +$L$SEH_end_OPENSSL_ia32_cpuid: + +global OPENSSL_cleanse + +ALIGN 16 +OPENSSL_cleanse: + + xor rax,rax + cmp rdx,15 + jae NEAR $L$ot + cmp rdx,0 + je NEAR $L$ret +$L$ittle: + mov BYTE[rcx],al + sub rdx,1 + lea rcx,[1+rcx] + jnz NEAR $L$ittle +$L$ret: + DB 0F3h,0C3h ;repret 
+ALIGN 16 +$L$ot: + test rcx,7 + jz NEAR $L$aligned + mov BYTE[rcx],al + lea rdx,[((-1))+rdx] + lea rcx,[1+rcx] + jmp NEAR $L$ot +$L$aligned: + mov QWORD[rcx],rax + lea rdx,[((-8))+rdx] + test rdx,-8 + lea rcx,[8+rcx] + jnz NEAR $L$aligned + cmp rdx,0 + jne NEAR $L$ittle + DB 0F3h,0C3h ;repret + + + +global CRYPTO_memcmp + +ALIGN 16 +CRYPTO_memcmp: + + xor rax,rax + xor r10,r10 + cmp r8,0 + je NEAR $L$no_data + cmp r8,16 + jne NEAR $L$oop_cmp + mov r10,QWORD[rcx] + mov r11,QWORD[8+rcx] + mov r8,1 + xor r10,QWORD[rdx] + xor r11,QWORD[8+rdx] + or r10,r11 + cmovnz rax,r8 + DB 0F3h,0C3h ;repret + +ALIGN 16 +$L$oop_cmp: + mov r10b,BYTE[rcx] + lea rcx,[1+rcx] + xor r10b,BYTE[rdx] + lea rdx,[1+rdx] + or al,r10b + dec r8 + jnz NEAR $L$oop_cmp + neg rax + shr rax,63 +$L$no_data: + DB 0F3h,0C3h ;repret + + +global OPENSSL_wipe_cpu + +ALIGN 16 +OPENSSL_wipe_cpu: + pxor xmm0,xmm0 + pxor xmm1,xmm1 + pxor xmm2,xmm2 + pxor xmm3,xmm3 + pxor xmm4,xmm4 + pxor xmm5,xmm5 + xor rcx,rcx + xor rdx,rdx + xor r8,r8 + xor r9,r9 + xor r10,r10 + xor r11,r11 + lea rax,[8+rsp] + DB 0F3h,0C3h ;repret + +global OPENSSL_instrument_bus + +ALIGN 16 +OPENSSL_instrument_bus: + + mov r10,rcx + mov rcx,rdx + mov r11,rdx + + rdtsc + mov r8d,eax + mov r9d,0 + clflush [r10] +DB 0xf0 + add DWORD[r10],r9d + jmp NEAR $L$oop +ALIGN 16 +$L$oop: rdtsc + mov edx,eax + sub eax,r8d + mov r8d,edx + mov r9d,eax + clflush [r10] +DB 0xf0 + add DWORD[r10],eax + lea r10,[4+r10] + sub rcx,1 + jnz NEAR $L$oop + + mov rax,r11 + DB 0F3h,0C3h ;repret + + + +global OPENSSL_instrument_bus2 + +ALIGN 16 +OPENSSL_instrument_bus2: + + mov r10,rcx + mov rcx,rdx + mov r11,r8 + mov QWORD[8+rsp],rcx + + rdtsc + mov r8d,eax + mov r9d,0 + + clflush [r10] +DB 0xf0 + add DWORD[r10],r9d + + rdtsc + mov edx,eax + sub eax,r8d + mov r8d,edx + mov r9d,eax +$L$oop2: + clflush [r10] +DB 0xf0 + add DWORD[r10],eax + + sub r11,1 + jz NEAR $L$done2 + + rdtsc + mov edx,eax + sub eax,r8d + mov r8d,edx + cmp eax,r9d + mov r9d,eax + mov edx,0 + setne dl + sub rcx,rdx + lea r10,[rdx*4+r10] + jnz NEAR $L$oop2 + +$L$done2: + mov rax,QWORD[8+rsp] + sub rax,rcx + DB 0F3h,0C3h ;repret + + +global OPENSSL_ia32_rdrand_bytes + +ALIGN 16 +OPENSSL_ia32_rdrand_bytes: + + xor rax,rax + cmp rdx,0 + je NEAR $L$done_rdrand_bytes + + mov r11,8 +$L$oop_rdrand_bytes: +DB 73,15,199,242 + jc NEAR $L$break_rdrand_bytes + dec r11 + jnz NEAR $L$oop_rdrand_bytes + jmp NEAR $L$done_rdrand_bytes + +ALIGN 16 +$L$break_rdrand_bytes: + cmp rdx,8 + jb NEAR $L$tail_rdrand_bytes + mov QWORD[rcx],r10 + lea rcx,[8+rcx] + add rax,8 + sub rdx,8 + jz NEAR $L$done_rdrand_bytes + mov r11,8 + jmp NEAR $L$oop_rdrand_bytes + +ALIGN 16 +$L$tail_rdrand_bytes: + mov BYTE[rcx],r10b + lea rcx,[1+rcx] + inc rax + shr r10,8 + dec rdx + jnz NEAR $L$tail_rdrand_bytes + +$L$done_rdrand_bytes: + xor r10,r10 + DB 0F3h,0C3h ;repret + + +global OPENSSL_ia32_rdseed_bytes + +ALIGN 16 +OPENSSL_ia32_rdseed_bytes: + + xor rax,rax + cmp rdx,0 + je NEAR $L$done_rdseed_bytes + + mov r11,8 +$L$oop_rdseed_bytes: +DB 73,15,199,250 + jc NEAR $L$break_rdseed_bytes + dec r11 + jnz NEAR $L$oop_rdseed_bytes + jmp NEAR $L$done_rdseed_bytes + +ALIGN 16 +$L$break_rdseed_bytes: + cmp rdx,8 + jb NEAR $L$tail_rdseed_bytes + mov QWORD[rcx],r10 + lea rcx,[8+rcx] + add rax,8 + sub rdx,8 + jz NEAR $L$done_rdseed_bytes + mov r11,8 + jmp NEAR $L$oop_rdseed_bytes + +ALIGN 16 +$L$tail_rdseed_bytes: + mov BYTE[rcx],r10b + lea rcx,[1+rcx] + inc rax + shr r10,8 + dec rdx + jnz NEAR $L$tail_rdseed_bytes + +$L$done_rdseed_bytes: + xor r10,r10 + DB 0F3h,0C3h ;repret + + 
diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-mb-x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-mb-x86_64.S new file mode 100644 index 0000000000..7749fd685a --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-mb-x86_64.S @@ -0,0 +1,552 @@ +# WARNING: do not edit! +# Generated from openssl/crypto/aes/asm/aesni-mb-x86_64.pl +# +# Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the OpenSSL license (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +.text + + + +.globl aesni_multi_cbc_encrypt +.type aesni_multi_cbc_encrypt,@function +.align 32 +aesni_multi_cbc_encrypt: +.cfi_startproc + movq %rsp,%rax +.cfi_def_cfa_register %rax + pushq %rbx +.cfi_offset %rbx,-16 + pushq %rbp +.cfi_offset %rbp,-24 + pushq %r12 +.cfi_offset %r12,-32 + pushq %r13 +.cfi_offset %r13,-40 + pushq %r14 +.cfi_offset %r14,-48 + pushq %r15 +.cfi_offset %r15,-56 + + + + + + + subq $48,%rsp + andq $-64,%rsp + movq %rax,16(%rsp) +.cfi_escape 0x0f,0x05,0x77,0x10,0x06,0x23,0x08 + +.Lenc4x_body: + movdqu (%rsi),%xmm12 + leaq 120(%rsi),%rsi + leaq 80(%rdi),%rdi + +.Lenc4x_loop_grande: + movl %edx,24(%rsp) + xorl %edx,%edx + movl -64(%rdi),%ecx + movq -80(%rdi),%r8 + cmpl %edx,%ecx + movq -72(%rdi),%r12 + cmovgl %ecx,%edx + testl %ecx,%ecx + movdqu -56(%rdi),%xmm2 + movl %ecx,32(%rsp) + cmovleq %rsp,%r8 + movl -24(%rdi),%ecx + movq -40(%rdi),%r9 + cmpl %edx,%ecx + movq -32(%rdi),%r13 + cmovgl %ecx,%edx + testl %ecx,%ecx + movdqu -16(%rdi),%xmm3 + movl %ecx,36(%rsp) + cmovleq %rsp,%r9 + movl 16(%rdi),%ecx + movq 0(%rdi),%r10 + cmpl %edx,%ecx + movq 8(%rdi),%r14 + cmovgl %ecx,%edx + testl %ecx,%ecx + movdqu 24(%rdi),%xmm4 + movl %ecx,40(%rsp) + cmovleq %rsp,%r10 + movl 56(%rdi),%ecx + movq 40(%rdi),%r11 + cmpl %edx,%ecx + movq 48(%rdi),%r15 + cmovgl %ecx,%edx + testl %ecx,%ecx + movdqu 64(%rdi),%xmm5 + movl %ecx,44(%rsp) + cmovleq %rsp,%r11 + testl %edx,%edx + jz .Lenc4x_done + + movups 16-120(%rsi),%xmm1 + pxor %xmm12,%xmm2 + movups 32-120(%rsi),%xmm0 + pxor %xmm12,%xmm3 + movl 240-120(%rsi),%eax + pxor %xmm12,%xmm4 + movdqu (%r8),%xmm6 + pxor %xmm12,%xmm5 + movdqu (%r9),%xmm7 + pxor %xmm6,%xmm2 + movdqu (%r10),%xmm8 + pxor %xmm7,%xmm3 + movdqu (%r11),%xmm9 + pxor %xmm8,%xmm4 + pxor %xmm9,%xmm5 + movdqa 32(%rsp),%xmm10 + xorq %rbx,%rbx + jmp .Loop_enc4x + +.align 32 +.Loop_enc4x: + addq $16,%rbx + leaq 16(%rsp),%rbp + movl $1,%ecx + subq %rbx,%rbp + +.byte 102,15,56,220,209 + prefetcht0 31(%r8,%rbx,1) + prefetcht0 31(%r9,%rbx,1) +.byte 102,15,56,220,217 + prefetcht0 31(%r10,%rbx,1) + prefetcht0 31(%r10,%rbx,1) +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 + movups 48-120(%rsi),%xmm1 + cmpl 32(%rsp),%ecx +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 +.byte 102,15,56,220,224 + cmovgeq %rbp,%r8 + cmovgq %rbp,%r12 +.byte 102,15,56,220,232 + movups -56(%rsi),%xmm0 + cmpl 36(%rsp),%ecx +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 + cmovgeq %rbp,%r9 + cmovgq %rbp,%r13 +.byte 102,15,56,220,233 + movups -40(%rsi),%xmm1 + cmpl 40(%rsp),%ecx +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 +.byte 102,15,56,220,224 + cmovgeq %rbp,%r10 + cmovgq %rbp,%r14 +.byte 102,15,56,220,232 + movups -24(%rsi),%xmm0 + cmpl 44(%rsp),%ecx +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 + cmovgeq %rbp,%r11 + cmovgq %rbp,%r15 +.byte 
102,15,56,220,233 + movups -8(%rsi),%xmm1 + movdqa %xmm10,%xmm11 +.byte 102,15,56,220,208 + prefetcht0 15(%r12,%rbx,1) + prefetcht0 15(%r13,%rbx,1) +.byte 102,15,56,220,216 + prefetcht0 15(%r14,%rbx,1) + prefetcht0 15(%r15,%rbx,1) +.byte 102,15,56,220,224 +.byte 102,15,56,220,232 + movups 128-120(%rsi),%xmm0 + pxor %xmm12,%xmm12 + +.byte 102,15,56,220,209 + pcmpgtd %xmm12,%xmm11 + movdqu -120(%rsi),%xmm12 +.byte 102,15,56,220,217 + paddd %xmm11,%xmm10 + movdqa %xmm10,32(%rsp) +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 + movups 144-120(%rsi),%xmm1 + + cmpl $11,%eax + +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 +.byte 102,15,56,220,224 +.byte 102,15,56,220,232 + movups 160-120(%rsi),%xmm0 + + jb .Lenc4x_tail + +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 + movups 176-120(%rsi),%xmm1 + +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 +.byte 102,15,56,220,224 +.byte 102,15,56,220,232 + movups 192-120(%rsi),%xmm0 + + je .Lenc4x_tail + +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 + movups 208-120(%rsi),%xmm1 + +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 +.byte 102,15,56,220,224 +.byte 102,15,56,220,232 + movups 224-120(%rsi),%xmm0 + jmp .Lenc4x_tail + +.align 32 +.Lenc4x_tail: +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 + movdqu (%r8,%rbx,1),%xmm6 + movdqu 16-120(%rsi),%xmm1 + +.byte 102,15,56,221,208 + movdqu (%r9,%rbx,1),%xmm7 + pxor %xmm12,%xmm6 +.byte 102,15,56,221,216 + movdqu (%r10,%rbx,1),%xmm8 + pxor %xmm12,%xmm7 +.byte 102,15,56,221,224 + movdqu (%r11,%rbx,1),%xmm9 + pxor %xmm12,%xmm8 +.byte 102,15,56,221,232 + movdqu 32-120(%rsi),%xmm0 + pxor %xmm12,%xmm9 + + movups %xmm2,-16(%r12,%rbx,1) + pxor %xmm6,%xmm2 + movups %xmm3,-16(%r13,%rbx,1) + pxor %xmm7,%xmm3 + movups %xmm4,-16(%r14,%rbx,1) + pxor %xmm8,%xmm4 + movups %xmm5,-16(%r15,%rbx,1) + pxor %xmm9,%xmm5 + + decl %edx + jnz .Loop_enc4x + + movq 16(%rsp),%rax +.cfi_def_cfa %rax,8 + movl 24(%rsp),%edx + + + + + + + + + + + leaq 160(%rdi),%rdi + decl %edx + jnz .Lenc4x_loop_grande + +.Lenc4x_done: + movq -48(%rax),%r15 +.cfi_restore %r15 + movq -40(%rax),%r14 +.cfi_restore %r14 + movq -32(%rax),%r13 +.cfi_restore %r13 + movq -24(%rax),%r12 +.cfi_restore %r12 + movq -16(%rax),%rbp +.cfi_restore %rbp + movq -8(%rax),%rbx +.cfi_restore %rbx + leaq (%rax),%rsp +.cfi_def_cfa_register %rsp +.Lenc4x_epilogue: + .byte 0xf3,0xc3 +.cfi_endproc +.size aesni_multi_cbc_encrypt,.-aesni_multi_cbc_encrypt + +.globl aesni_multi_cbc_decrypt +.type aesni_multi_cbc_decrypt,@function +.align 32 +aesni_multi_cbc_decrypt: +.cfi_startproc + movq %rsp,%rax +.cfi_def_cfa_register %rax + pushq %rbx +.cfi_offset %rbx,-16 + pushq %rbp +.cfi_offset %rbp,-24 + pushq %r12 +.cfi_offset %r12,-32 + pushq %r13 +.cfi_offset %r13,-40 + pushq %r14 +.cfi_offset %r14,-48 + pushq %r15 +.cfi_offset %r15,-56 + + + + + + + subq $48,%rsp + andq $-64,%rsp + movq %rax,16(%rsp) +.cfi_escape 0x0f,0x05,0x77,0x10,0x06,0x23,0x08 + +.Ldec4x_body: + movdqu (%rsi),%xmm12 + leaq 120(%rsi),%rsi + leaq 80(%rdi),%rdi + +.Ldec4x_loop_grande: + movl %edx,24(%rsp) + xorl %edx,%edx + movl -64(%rdi),%ecx + movq -80(%rdi),%r8 + cmpl %edx,%ecx + movq -72(%rdi),%r12 + cmovgl %ecx,%edx + testl %ecx,%ecx + movdqu -56(%rdi),%xmm6 + movl %ecx,32(%rsp) + cmovleq %rsp,%r8 + movl -24(%rdi),%ecx + movq -40(%rdi),%r9 + cmpl %edx,%ecx + movq -32(%rdi),%r13 + cmovgl %ecx,%edx + testl %ecx,%ecx + movdqu -16(%rdi),%xmm7 + movl 
%ecx,36(%rsp) + cmovleq %rsp,%r9 + movl 16(%rdi),%ecx + movq 0(%rdi),%r10 + cmpl %edx,%ecx + movq 8(%rdi),%r14 + cmovgl %ecx,%edx + testl %ecx,%ecx + movdqu 24(%rdi),%xmm8 + movl %ecx,40(%rsp) + cmovleq %rsp,%r10 + movl 56(%rdi),%ecx + movq 40(%rdi),%r11 + cmpl %edx,%ecx + movq 48(%rdi),%r15 + cmovgl %ecx,%edx + testl %ecx,%ecx + movdqu 64(%rdi),%xmm9 + movl %ecx,44(%rsp) + cmovleq %rsp,%r11 + testl %edx,%edx + jz .Ldec4x_done + + movups 16-120(%rsi),%xmm1 + movups 32-120(%rsi),%xmm0 + movl 240-120(%rsi),%eax + movdqu (%r8),%xmm2 + movdqu (%r9),%xmm3 + pxor %xmm12,%xmm2 + movdqu (%r10),%xmm4 + pxor %xmm12,%xmm3 + movdqu (%r11),%xmm5 + pxor %xmm12,%xmm4 + pxor %xmm12,%xmm5 + movdqa 32(%rsp),%xmm10 + xorq %rbx,%rbx + jmp .Loop_dec4x + +.align 32 +.Loop_dec4x: + addq $16,%rbx + leaq 16(%rsp),%rbp + movl $1,%ecx + subq %rbx,%rbp + +.byte 102,15,56,222,209 + prefetcht0 31(%r8,%rbx,1) + prefetcht0 31(%r9,%rbx,1) +.byte 102,15,56,222,217 + prefetcht0 31(%r10,%rbx,1) + prefetcht0 31(%r11,%rbx,1) +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 + movups 48-120(%rsi),%xmm1 + cmpl 32(%rsp),%ecx +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 + cmovgeq %rbp,%r8 + cmovgq %rbp,%r12 +.byte 102,15,56,222,232 + movups -56(%rsi),%xmm0 + cmpl 36(%rsp),%ecx +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 + cmovgeq %rbp,%r9 + cmovgq %rbp,%r13 +.byte 102,15,56,222,233 + movups -40(%rsi),%xmm1 + cmpl 40(%rsp),%ecx +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 + cmovgeq %rbp,%r10 + cmovgq %rbp,%r14 +.byte 102,15,56,222,232 + movups -24(%rsi),%xmm0 + cmpl 44(%rsp),%ecx +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 + cmovgeq %rbp,%r11 + cmovgq %rbp,%r15 +.byte 102,15,56,222,233 + movups -8(%rsi),%xmm1 + movdqa %xmm10,%xmm11 +.byte 102,15,56,222,208 + prefetcht0 15(%r12,%rbx,1) + prefetcht0 15(%r13,%rbx,1) +.byte 102,15,56,222,216 + prefetcht0 15(%r14,%rbx,1) + prefetcht0 15(%r15,%rbx,1) +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 + movups 128-120(%rsi),%xmm0 + pxor %xmm12,%xmm12 + +.byte 102,15,56,222,209 + pcmpgtd %xmm12,%xmm11 + movdqu -120(%rsi),%xmm12 +.byte 102,15,56,222,217 + paddd %xmm11,%xmm10 + movdqa %xmm10,32(%rsp) +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 + movups 144-120(%rsi),%xmm1 + + cmpl $11,%eax + +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 + movups 160-120(%rsi),%xmm0 + + jb .Ldec4x_tail + +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 + movups 176-120(%rsi),%xmm1 + +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 + movups 192-120(%rsi),%xmm0 + + je .Ldec4x_tail + +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 + movups 208-120(%rsi),%xmm1 + +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 + movups 224-120(%rsi),%xmm0 + jmp .Ldec4x_tail + +.align 32 +.Ldec4x_tail: +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 + pxor %xmm0,%xmm6 + pxor %xmm0,%xmm7 +.byte 102,15,56,222,233 + movdqu 16-120(%rsi),%xmm1 + pxor %xmm0,%xmm8 + pxor %xmm0,%xmm9 + movdqu 32-120(%rsi),%xmm0 + +.byte 102,15,56,223,214 +.byte 102,15,56,223,223 + movdqu -16(%r8,%rbx,1),%xmm6 + movdqu -16(%r9,%rbx,1),%xmm7 +.byte 102,65,15,56,223,224 +.byte 102,65,15,56,223,233 + movdqu -16(%r10,%rbx,1),%xmm8 + movdqu 
-16(%r11,%rbx,1),%xmm9 + + movups %xmm2,-16(%r12,%rbx,1) + movdqu (%r8,%rbx,1),%xmm2 + movups %xmm3,-16(%r13,%rbx,1) + movdqu (%r9,%rbx,1),%xmm3 + pxor %xmm12,%xmm2 + movups %xmm4,-16(%r14,%rbx,1) + movdqu (%r10,%rbx,1),%xmm4 + pxor %xmm12,%xmm3 + movups %xmm5,-16(%r15,%rbx,1) + movdqu (%r11,%rbx,1),%xmm5 + pxor %xmm12,%xmm4 + pxor %xmm12,%xmm5 + + decl %edx + jnz .Loop_dec4x + + movq 16(%rsp),%rax +.cfi_def_cfa %rax,8 + movl 24(%rsp),%edx + + leaq 160(%rdi),%rdi + decl %edx + jnz .Ldec4x_loop_grande + +.Ldec4x_done: + movq -48(%rax),%r15 +.cfi_restore %r15 + movq -40(%rax),%r14 +.cfi_restore %r14 + movq -32(%rax),%r13 +.cfi_restore %r13 + movq -24(%rax),%r12 +.cfi_restore %r12 + movq -16(%rax),%rbp +.cfi_restore %rbp + movq -8(%rax),%rbx +.cfi_restore %rbx + leaq (%rax),%rsp +.cfi_def_cfa_register %rsp +.Ldec4x_epilogue: + .byte 0xf3,0xc3 +.cfi_endproc +.size aesni_multi_cbc_decrypt,.-aesni_multi_cbc_decrypt diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha1-x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha1-x86_64.S new file mode 100644 index 0000000000..ab763a2eec --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha1-x86_64.S @@ -0,0 +1,1719 @@ +# WARNING: do not edit! +# Generated from openssl/crypto/aes/asm/aesni-sha1-x86_64.pl +# +# Copyright 2011-2020 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the OpenSSL license (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +.text + + +.globl aesni_cbc_sha1_enc +.type aesni_cbc_sha1_enc,@function +.align 32 +aesni_cbc_sha1_enc: +.cfi_startproc + + movl OPENSSL_ia32cap_P+0(%rip),%r10d + movq OPENSSL_ia32cap_P+4(%rip),%r11 + btq $61,%r11 + jc aesni_cbc_sha1_enc_shaext + jmp aesni_cbc_sha1_enc_ssse3 + .byte 0xf3,0xc3 +.cfi_endproc +.size aesni_cbc_sha1_enc,.-aesni_cbc_sha1_enc +.type aesni_cbc_sha1_enc_ssse3,@function +.align 32 +aesni_cbc_sha1_enc_ssse3: +.cfi_startproc + movq 8(%rsp),%r10 + + + pushq %rbx +.cfi_adjust_cfa_offset 8 +.cfi_offset %rbx,-16 + pushq %rbp +.cfi_adjust_cfa_offset 8 +.cfi_offset %rbp,-24 + pushq %r12 +.cfi_adjust_cfa_offset 8 +.cfi_offset %r12,-32 + pushq %r13 +.cfi_adjust_cfa_offset 8 +.cfi_offset %r13,-40 + pushq %r14 +.cfi_adjust_cfa_offset 8 +.cfi_offset %r14,-48 + pushq %r15 +.cfi_adjust_cfa_offset 8 +.cfi_offset %r15,-56 + leaq -104(%rsp),%rsp +.cfi_adjust_cfa_offset 104 + + + movq %rdi,%r12 + movq %rsi,%r13 + movq %rdx,%r14 + leaq 112(%rcx),%r15 + movdqu (%r8),%xmm2 + movq %r8,88(%rsp) + shlq $6,%r14 + subq %r12,%r13 + movl 240-112(%r15),%r8d + addq %r10,%r14 + + leaq K_XX_XX(%rip),%r11 + movl 0(%r9),%eax + movl 4(%r9),%ebx + movl 8(%r9),%ecx + movl 12(%r9),%edx + movl %ebx,%esi + movl 16(%r9),%ebp + movl %ecx,%edi + xorl %edx,%edi + andl %edi,%esi + + movdqa 64(%r11),%xmm3 + movdqa 0(%r11),%xmm13 + movdqu 0(%r10),%xmm4 + movdqu 16(%r10),%xmm5 + movdqu 32(%r10),%xmm6 + movdqu 48(%r10),%xmm7 +.byte 102,15,56,0,227 +.byte 102,15,56,0,235 +.byte 102,15,56,0,243 + addq $64,%r10 + paddd %xmm13,%xmm4 +.byte 102,15,56,0,251 + paddd %xmm13,%xmm5 + paddd %xmm13,%xmm6 + movdqa %xmm4,0(%rsp) + psubd %xmm13,%xmm4 + movdqa %xmm5,16(%rsp) + psubd %xmm13,%xmm5 + movdqa %xmm6,32(%rsp) + psubd %xmm13,%xmm6 + movups -112(%r15),%xmm15 + movups 16-112(%r15),%xmm0 + jmp .Loop_ssse3 +.align 32 +.Loop_ssse3: + rorl $2,%ebx + movups 0(%r12),%xmm14 + xorps %xmm15,%xmm14 + xorps %xmm14,%xmm2 
+ movups -80(%r15),%xmm1 +.byte 102,15,56,220,208 + pshufd $238,%xmm4,%xmm8 + xorl %edx,%esi + movdqa %xmm7,%xmm12 + paddd %xmm7,%xmm13 + movl %eax,%edi + addl 0(%rsp),%ebp + punpcklqdq %xmm5,%xmm8 + xorl %ecx,%ebx + roll $5,%eax + addl %esi,%ebp + psrldq $4,%xmm12 + andl %ebx,%edi + xorl %ecx,%ebx + pxor %xmm4,%xmm8 + addl %eax,%ebp + rorl $7,%eax + pxor %xmm6,%xmm12 + xorl %ecx,%edi + movl %ebp,%esi + addl 4(%rsp),%edx + pxor %xmm12,%xmm8 + xorl %ebx,%eax + roll $5,%ebp + movdqa %xmm13,48(%rsp) + addl %edi,%edx + movups -64(%r15),%xmm0 +.byte 102,15,56,220,209 + andl %eax,%esi + movdqa %xmm8,%xmm3 + xorl %ebx,%eax + addl %ebp,%edx + rorl $7,%ebp + movdqa %xmm8,%xmm12 + xorl %ebx,%esi + pslldq $12,%xmm3 + paddd %xmm8,%xmm8 + movl %edx,%edi + addl 8(%rsp),%ecx + psrld $31,%xmm12 + xorl %eax,%ebp + roll $5,%edx + addl %esi,%ecx + movdqa %xmm3,%xmm13 + andl %ebp,%edi + xorl %eax,%ebp + psrld $30,%xmm3 + addl %edx,%ecx + rorl $7,%edx + por %xmm12,%xmm8 + xorl %eax,%edi + movl %ecx,%esi + addl 12(%rsp),%ebx + movups -48(%r15),%xmm1 +.byte 102,15,56,220,208 + pslld $2,%xmm13 + pxor %xmm3,%xmm8 + xorl %ebp,%edx + movdqa 0(%r11),%xmm3 + roll $5,%ecx + addl %edi,%ebx + andl %edx,%esi + pxor %xmm13,%xmm8 + xorl %ebp,%edx + addl %ecx,%ebx + rorl $7,%ecx + pshufd $238,%xmm5,%xmm9 + xorl %ebp,%esi + movdqa %xmm8,%xmm13 + paddd %xmm8,%xmm3 + movl %ebx,%edi + addl 16(%rsp),%eax + punpcklqdq %xmm6,%xmm9 + xorl %edx,%ecx + roll $5,%ebx + addl %esi,%eax + psrldq $4,%xmm13 + andl %ecx,%edi + xorl %edx,%ecx + pxor %xmm5,%xmm9 + addl %ebx,%eax + rorl $7,%ebx + movups -32(%r15),%xmm0 +.byte 102,15,56,220,209 + pxor %xmm7,%xmm13 + xorl %edx,%edi + movl %eax,%esi + addl 20(%rsp),%ebp + pxor %xmm13,%xmm9 + xorl %ecx,%ebx + roll $5,%eax + movdqa %xmm3,0(%rsp) + addl %edi,%ebp + andl %ebx,%esi + movdqa %xmm9,%xmm12 + xorl %ecx,%ebx + addl %eax,%ebp + rorl $7,%eax + movdqa %xmm9,%xmm13 + xorl %ecx,%esi + pslldq $12,%xmm12 + paddd %xmm9,%xmm9 + movl %ebp,%edi + addl 24(%rsp),%edx + psrld $31,%xmm13 + xorl %ebx,%eax + roll $5,%ebp + addl %esi,%edx + movups -16(%r15),%xmm1 +.byte 102,15,56,220,208 + movdqa %xmm12,%xmm3 + andl %eax,%edi + xorl %ebx,%eax + psrld $30,%xmm12 + addl %ebp,%edx + rorl $7,%ebp + por %xmm13,%xmm9 + xorl %ebx,%edi + movl %edx,%esi + addl 28(%rsp),%ecx + pslld $2,%xmm3 + pxor %xmm12,%xmm9 + xorl %eax,%ebp + movdqa 16(%r11),%xmm12 + roll $5,%edx + addl %edi,%ecx + andl %ebp,%esi + pxor %xmm3,%xmm9 + xorl %eax,%ebp + addl %edx,%ecx + rorl $7,%edx + pshufd $238,%xmm6,%xmm10 + xorl %eax,%esi + movdqa %xmm9,%xmm3 + paddd %xmm9,%xmm12 + movl %ecx,%edi + addl 32(%rsp),%ebx + movups 0(%r15),%xmm0 +.byte 102,15,56,220,209 + punpcklqdq %xmm7,%xmm10 + xorl %ebp,%edx + roll $5,%ecx + addl %esi,%ebx + psrldq $4,%xmm3 + andl %edx,%edi + xorl %ebp,%edx + pxor %xmm6,%xmm10 + addl %ecx,%ebx + rorl $7,%ecx + pxor %xmm8,%xmm3 + xorl %ebp,%edi + movl %ebx,%esi + addl 36(%rsp),%eax + pxor %xmm3,%xmm10 + xorl %edx,%ecx + roll $5,%ebx + movdqa %xmm12,16(%rsp) + addl %edi,%eax + andl %ecx,%esi + movdqa %xmm10,%xmm13 + xorl %edx,%ecx + addl %ebx,%eax + rorl $7,%ebx + movups 16(%r15),%xmm1 +.byte 102,15,56,220,208 + movdqa %xmm10,%xmm3 + xorl %edx,%esi + pslldq $12,%xmm13 + paddd %xmm10,%xmm10 + movl %eax,%edi + addl 40(%rsp),%ebp + psrld $31,%xmm3 + xorl %ecx,%ebx + roll $5,%eax + addl %esi,%ebp + movdqa %xmm13,%xmm12 + andl %ebx,%edi + xorl %ecx,%ebx + psrld $30,%xmm13 + addl %eax,%ebp + rorl $7,%eax + por %xmm3,%xmm10 + xorl %ecx,%edi + movl %ebp,%esi + addl 44(%rsp),%edx + pslld $2,%xmm12 + pxor %xmm13,%xmm10 + xorl 
%ebx,%eax + movdqa 16(%r11),%xmm13 + roll $5,%ebp + addl %edi,%edx + movups 32(%r15),%xmm0 +.byte 102,15,56,220,209 + andl %eax,%esi + pxor %xmm12,%xmm10 + xorl %ebx,%eax + addl %ebp,%edx + rorl $7,%ebp + pshufd $238,%xmm7,%xmm11 + xorl %ebx,%esi + movdqa %xmm10,%xmm12 + paddd %xmm10,%xmm13 + movl %edx,%edi + addl 48(%rsp),%ecx + punpcklqdq %xmm8,%xmm11 + xorl %eax,%ebp + roll $5,%edx + addl %esi,%ecx + psrldq $4,%xmm12 + andl %ebp,%edi + xorl %eax,%ebp + pxor %xmm7,%xmm11 + addl %edx,%ecx + rorl $7,%edx + pxor %xmm9,%xmm12 + xorl %eax,%edi + movl %ecx,%esi + addl 52(%rsp),%ebx + movups 48(%r15),%xmm1 +.byte 102,15,56,220,208 + pxor %xmm12,%xmm11 + xorl %ebp,%edx + roll $5,%ecx + movdqa %xmm13,32(%rsp) + addl %edi,%ebx + andl %edx,%esi + movdqa %xmm11,%xmm3 + xorl %ebp,%edx + addl %ecx,%ebx + rorl $7,%ecx + movdqa %xmm11,%xmm12 + xorl %ebp,%esi + pslldq $12,%xmm3 + paddd %xmm11,%xmm11 + movl %ebx,%edi + addl 56(%rsp),%eax + psrld $31,%xmm12 + xorl %edx,%ecx + roll $5,%ebx + addl %esi,%eax + movdqa %xmm3,%xmm13 + andl %ecx,%edi + xorl %edx,%ecx + psrld $30,%xmm3 + addl %ebx,%eax + rorl $7,%ebx + cmpl $11,%r8d + jb .Laesenclast1 + movups 64(%r15),%xmm0 +.byte 102,15,56,220,209 + movups 80(%r15),%xmm1 +.byte 102,15,56,220,208 + je .Laesenclast1 + movups 96(%r15),%xmm0 +.byte 102,15,56,220,209 + movups 112(%r15),%xmm1 +.byte 102,15,56,220,208 +.Laesenclast1: +.byte 102,15,56,221,209 + movups 16-112(%r15),%xmm0 + por %xmm12,%xmm11 + xorl %edx,%edi + movl %eax,%esi + addl 60(%rsp),%ebp + pslld $2,%xmm13 + pxor %xmm3,%xmm11 + xorl %ecx,%ebx + movdqa 16(%r11),%xmm3 + roll $5,%eax + addl %edi,%ebp + andl %ebx,%esi + pxor %xmm13,%xmm11 + pshufd $238,%xmm10,%xmm13 + xorl %ecx,%ebx + addl %eax,%ebp + rorl $7,%eax + pxor %xmm8,%xmm4 + xorl %ecx,%esi + movl %ebp,%edi + addl 0(%rsp),%edx + punpcklqdq %xmm11,%xmm13 + xorl %ebx,%eax + roll $5,%ebp + pxor %xmm5,%xmm4 + addl %esi,%edx + movups 16(%r12),%xmm14 + xorps %xmm15,%xmm14 + movups %xmm2,0(%r12,%r13,1) + xorps %xmm14,%xmm2 + movups -80(%r15),%xmm1 +.byte 102,15,56,220,208 + andl %eax,%edi + movdqa %xmm3,%xmm12 + xorl %ebx,%eax + paddd %xmm11,%xmm3 + addl %ebp,%edx + pxor %xmm13,%xmm4 + rorl $7,%ebp + xorl %ebx,%edi + movl %edx,%esi + addl 4(%rsp),%ecx + movdqa %xmm4,%xmm13 + xorl %eax,%ebp + roll $5,%edx + movdqa %xmm3,48(%rsp) + addl %edi,%ecx + andl %ebp,%esi + xorl %eax,%ebp + pslld $2,%xmm4 + addl %edx,%ecx + rorl $7,%edx + psrld $30,%xmm13 + xorl %eax,%esi + movl %ecx,%edi + addl 8(%rsp),%ebx + movups -64(%r15),%xmm0 +.byte 102,15,56,220,209 + por %xmm13,%xmm4 + xorl %ebp,%edx + roll $5,%ecx + pshufd $238,%xmm11,%xmm3 + addl %esi,%ebx + andl %edx,%edi + xorl %ebp,%edx + addl %ecx,%ebx + addl 12(%rsp),%eax + xorl %ebp,%edi + movl %ebx,%esi + roll $5,%ebx + addl %edi,%eax + xorl %edx,%esi + rorl $7,%ecx + addl %ebx,%eax + pxor %xmm9,%xmm5 + addl 16(%rsp),%ebp + movups -48(%r15),%xmm1 +.byte 102,15,56,220,208 + xorl %ecx,%esi + punpcklqdq %xmm4,%xmm3 + movl %eax,%edi + roll $5,%eax + pxor %xmm6,%xmm5 + addl %esi,%ebp + xorl %ecx,%edi + movdqa %xmm12,%xmm13 + rorl $7,%ebx + paddd %xmm4,%xmm12 + addl %eax,%ebp + pxor %xmm3,%xmm5 + addl 20(%rsp),%edx + xorl %ebx,%edi + movl %ebp,%esi + roll $5,%ebp + movdqa %xmm5,%xmm3 + addl %edi,%edx + xorl %ebx,%esi + movdqa %xmm12,0(%rsp) + rorl $7,%eax + addl %ebp,%edx + addl 24(%rsp),%ecx + pslld $2,%xmm5 + xorl %eax,%esi + movl %edx,%edi + psrld $30,%xmm3 + roll $5,%edx + addl %esi,%ecx + movups -32(%r15),%xmm0 +.byte 102,15,56,220,209 + xorl %eax,%edi + rorl $7,%ebp + por %xmm3,%xmm5 + addl %edx,%ecx + addl 
28(%rsp),%ebx + pshufd $238,%xmm4,%xmm12 + xorl %ebp,%edi + movl %ecx,%esi + roll $5,%ecx + addl %edi,%ebx + xorl %ebp,%esi + rorl $7,%edx + addl %ecx,%ebx + pxor %xmm10,%xmm6 + addl 32(%rsp),%eax + xorl %edx,%esi + punpcklqdq %xmm5,%xmm12 + movl %ebx,%edi + roll $5,%ebx + pxor %xmm7,%xmm6 + addl %esi,%eax + xorl %edx,%edi + movdqa 32(%r11),%xmm3 + rorl $7,%ecx + paddd %xmm5,%xmm13 + addl %ebx,%eax + pxor %xmm12,%xmm6 + addl 36(%rsp),%ebp + movups -16(%r15),%xmm1 +.byte 102,15,56,220,208 + xorl %ecx,%edi + movl %eax,%esi + roll $5,%eax + movdqa %xmm6,%xmm12 + addl %edi,%ebp + xorl %ecx,%esi + movdqa %xmm13,16(%rsp) + rorl $7,%ebx + addl %eax,%ebp + addl 40(%rsp),%edx + pslld $2,%xmm6 + xorl %ebx,%esi + movl %ebp,%edi + psrld $30,%xmm12 + roll $5,%ebp + addl %esi,%edx + xorl %ebx,%edi + rorl $7,%eax + por %xmm12,%xmm6 + addl %ebp,%edx + addl 44(%rsp),%ecx + pshufd $238,%xmm5,%xmm13 + xorl %eax,%edi + movl %edx,%esi + roll $5,%edx + addl %edi,%ecx + movups 0(%r15),%xmm0 +.byte 102,15,56,220,209 + xorl %eax,%esi + rorl $7,%ebp + addl %edx,%ecx + pxor %xmm11,%xmm7 + addl 48(%rsp),%ebx + xorl %ebp,%esi + punpcklqdq %xmm6,%xmm13 + movl %ecx,%edi + roll $5,%ecx + pxor %xmm8,%xmm7 + addl %esi,%ebx + xorl %ebp,%edi + movdqa %xmm3,%xmm12 + rorl $7,%edx + paddd %xmm6,%xmm3 + addl %ecx,%ebx + pxor %xmm13,%xmm7 + addl 52(%rsp),%eax + xorl %edx,%edi + movl %ebx,%esi + roll $5,%ebx + movdqa %xmm7,%xmm13 + addl %edi,%eax + xorl %edx,%esi + movdqa %xmm3,32(%rsp) + rorl $7,%ecx + addl %ebx,%eax + addl 56(%rsp),%ebp + movups 16(%r15),%xmm1 +.byte 102,15,56,220,208 + pslld $2,%xmm7 + xorl %ecx,%esi + movl %eax,%edi + psrld $30,%xmm13 + roll $5,%eax + addl %esi,%ebp + xorl %ecx,%edi + rorl $7,%ebx + por %xmm13,%xmm7 + addl %eax,%ebp + addl 60(%rsp),%edx + pshufd $238,%xmm6,%xmm3 + xorl %ebx,%edi + movl %ebp,%esi + roll $5,%ebp + addl %edi,%edx + xorl %ebx,%esi + rorl $7,%eax + addl %ebp,%edx + pxor %xmm4,%xmm8 + addl 0(%rsp),%ecx + xorl %eax,%esi + punpcklqdq %xmm7,%xmm3 + movl %edx,%edi + roll $5,%edx + pxor %xmm9,%xmm8 + addl %esi,%ecx + movups 32(%r15),%xmm0 +.byte 102,15,56,220,209 + xorl %eax,%edi + movdqa %xmm12,%xmm13 + rorl $7,%ebp + paddd %xmm7,%xmm12 + addl %edx,%ecx + pxor %xmm3,%xmm8 + addl 4(%rsp),%ebx + xorl %ebp,%edi + movl %ecx,%esi + roll $5,%ecx + movdqa %xmm8,%xmm3 + addl %edi,%ebx + xorl %ebp,%esi + movdqa %xmm12,48(%rsp) + rorl $7,%edx + addl %ecx,%ebx + addl 8(%rsp),%eax + pslld $2,%xmm8 + xorl %edx,%esi + movl %ebx,%edi + psrld $30,%xmm3 + roll $5,%ebx + addl %esi,%eax + xorl %edx,%edi + rorl $7,%ecx + por %xmm3,%xmm8 + addl %ebx,%eax + addl 12(%rsp),%ebp + movups 48(%r15),%xmm1 +.byte 102,15,56,220,208 + pshufd $238,%xmm7,%xmm12 + xorl %ecx,%edi + movl %eax,%esi + roll $5,%eax + addl %edi,%ebp + xorl %ecx,%esi + rorl $7,%ebx + addl %eax,%ebp + pxor %xmm5,%xmm9 + addl 16(%rsp),%edx + xorl %ebx,%esi + punpcklqdq %xmm8,%xmm12 + movl %ebp,%edi + roll $5,%ebp + pxor %xmm10,%xmm9 + addl %esi,%edx + xorl %ebx,%edi + movdqa %xmm13,%xmm3 + rorl $7,%eax + paddd %xmm8,%xmm13 + addl %ebp,%edx + pxor %xmm12,%xmm9 + addl 20(%rsp),%ecx + xorl %eax,%edi + movl %edx,%esi + roll $5,%edx + movdqa %xmm9,%xmm12 + addl %edi,%ecx + cmpl $11,%r8d + jb .Laesenclast2 + movups 64(%r15),%xmm0 +.byte 102,15,56,220,209 + movups 80(%r15),%xmm1 +.byte 102,15,56,220,208 + je .Laesenclast2 + movups 96(%r15),%xmm0 +.byte 102,15,56,220,209 + movups 112(%r15),%xmm1 +.byte 102,15,56,220,208 +.Laesenclast2: +.byte 102,15,56,221,209 + movups 16-112(%r15),%xmm0 + xorl %eax,%esi + movdqa %xmm13,0(%rsp) + rorl $7,%ebp + addl 
%edx,%ecx + addl 24(%rsp),%ebx + pslld $2,%xmm9 + xorl %ebp,%esi + movl %ecx,%edi + psrld $30,%xmm12 + roll $5,%ecx + addl %esi,%ebx + xorl %ebp,%edi + rorl $7,%edx + por %xmm12,%xmm9 + addl %ecx,%ebx + addl 28(%rsp),%eax + pshufd $238,%xmm8,%xmm13 + rorl $7,%ecx + movl %ebx,%esi + xorl %edx,%edi + roll $5,%ebx + addl %edi,%eax + xorl %ecx,%esi + xorl %edx,%ecx + addl %ebx,%eax + pxor %xmm6,%xmm10 + addl 32(%rsp),%ebp + movups 32(%r12),%xmm14 + xorps %xmm15,%xmm14 + movups %xmm2,16(%r13,%r12,1) + xorps %xmm14,%xmm2 + movups -80(%r15),%xmm1 +.byte 102,15,56,220,208 + andl %ecx,%esi + xorl %edx,%ecx + rorl $7,%ebx + punpcklqdq %xmm9,%xmm13 + movl %eax,%edi + xorl %ecx,%esi + pxor %xmm11,%xmm10 + roll $5,%eax + addl %esi,%ebp + movdqa %xmm3,%xmm12 + xorl %ebx,%edi + paddd %xmm9,%xmm3 + xorl %ecx,%ebx + pxor %xmm13,%xmm10 + addl %eax,%ebp + addl 36(%rsp),%edx + andl %ebx,%edi + xorl %ecx,%ebx + rorl $7,%eax + movdqa %xmm10,%xmm13 + movl %ebp,%esi + xorl %ebx,%edi + movdqa %xmm3,16(%rsp) + roll $5,%ebp + addl %edi,%edx + movups -64(%r15),%xmm0 +.byte 102,15,56,220,209 + xorl %eax,%esi + pslld $2,%xmm10 + xorl %ebx,%eax + addl %ebp,%edx + psrld $30,%xmm13 + addl 40(%rsp),%ecx + andl %eax,%esi + xorl %ebx,%eax + por %xmm13,%xmm10 + rorl $7,%ebp + movl %edx,%edi + xorl %eax,%esi + roll $5,%edx + pshufd $238,%xmm9,%xmm3 + addl %esi,%ecx + xorl %ebp,%edi + xorl %eax,%ebp + addl %edx,%ecx + addl 44(%rsp),%ebx + andl %ebp,%edi + xorl %eax,%ebp + rorl $7,%edx + movups -48(%r15),%xmm1 +.byte 102,15,56,220,208 + movl %ecx,%esi + xorl %ebp,%edi + roll $5,%ecx + addl %edi,%ebx + xorl %edx,%esi + xorl %ebp,%edx + addl %ecx,%ebx + pxor %xmm7,%xmm11 + addl 48(%rsp),%eax + andl %edx,%esi + xorl %ebp,%edx + rorl $7,%ecx + punpcklqdq %xmm10,%xmm3 + movl %ebx,%edi + xorl %edx,%esi + pxor %xmm4,%xmm11 + roll $5,%ebx + addl %esi,%eax + movdqa 48(%r11),%xmm13 + xorl %ecx,%edi + paddd %xmm10,%xmm12 + xorl %edx,%ecx + pxor %xmm3,%xmm11 + addl %ebx,%eax + addl 52(%rsp),%ebp + movups -32(%r15),%xmm0 +.byte 102,15,56,220,209 + andl %ecx,%edi + xorl %edx,%ecx + rorl $7,%ebx + movdqa %xmm11,%xmm3 + movl %eax,%esi + xorl %ecx,%edi + movdqa %xmm12,32(%rsp) + roll $5,%eax + addl %edi,%ebp + xorl %ebx,%esi + pslld $2,%xmm11 + xorl %ecx,%ebx + addl %eax,%ebp + psrld $30,%xmm3 + addl 56(%rsp),%edx + andl %ebx,%esi + xorl %ecx,%ebx + por %xmm3,%xmm11 + rorl $7,%eax + movl %ebp,%edi + xorl %ebx,%esi + roll $5,%ebp + pshufd $238,%xmm10,%xmm12 + addl %esi,%edx + movups -16(%r15),%xmm1 +.byte 102,15,56,220,208 + xorl %eax,%edi + xorl %ebx,%eax + addl %ebp,%edx + addl 60(%rsp),%ecx + andl %eax,%edi + xorl %ebx,%eax + rorl $7,%ebp + movl %edx,%esi + xorl %eax,%edi + roll $5,%edx + addl %edi,%ecx + xorl %ebp,%esi + xorl %eax,%ebp + addl %edx,%ecx + pxor %xmm8,%xmm4 + addl 0(%rsp),%ebx + andl %ebp,%esi + xorl %eax,%ebp + rorl $7,%edx + movups 0(%r15),%xmm0 +.byte 102,15,56,220,209 + punpcklqdq %xmm11,%xmm12 + movl %ecx,%edi + xorl %ebp,%esi + pxor %xmm5,%xmm4 + roll $5,%ecx + addl %esi,%ebx + movdqa %xmm13,%xmm3 + xorl %edx,%edi + paddd %xmm11,%xmm13 + xorl %ebp,%edx + pxor %xmm12,%xmm4 + addl %ecx,%ebx + addl 4(%rsp),%eax + andl %edx,%edi + xorl %ebp,%edx + rorl $7,%ecx + movdqa %xmm4,%xmm12 + movl %ebx,%esi + xorl %edx,%edi + movdqa %xmm13,48(%rsp) + roll $5,%ebx + addl %edi,%eax + xorl %ecx,%esi + pslld $2,%xmm4 + xorl %edx,%ecx + addl %ebx,%eax + psrld $30,%xmm12 + addl 8(%rsp),%ebp + movups 16(%r15),%xmm1 +.byte 102,15,56,220,208 + andl %ecx,%esi + xorl %edx,%ecx + por %xmm12,%xmm4 + rorl $7,%ebx + movl %eax,%edi + xorl %ecx,%esi + 
roll $5,%eax + pshufd $238,%xmm11,%xmm13 + addl %esi,%ebp + xorl %ebx,%edi + xorl %ecx,%ebx + addl %eax,%ebp + addl 12(%rsp),%edx + andl %ebx,%edi + xorl %ecx,%ebx + rorl $7,%eax + movl %ebp,%esi + xorl %ebx,%edi + roll $5,%ebp + addl %edi,%edx + movups 32(%r15),%xmm0 +.byte 102,15,56,220,209 + xorl %eax,%esi + xorl %ebx,%eax + addl %ebp,%edx + pxor %xmm9,%xmm5 + addl 16(%rsp),%ecx + andl %eax,%esi + xorl %ebx,%eax + rorl $7,%ebp + punpcklqdq %xmm4,%xmm13 + movl %edx,%edi + xorl %eax,%esi + pxor %xmm6,%xmm5 + roll $5,%edx + addl %esi,%ecx + movdqa %xmm3,%xmm12 + xorl %ebp,%edi + paddd %xmm4,%xmm3 + xorl %eax,%ebp + pxor %xmm13,%xmm5 + addl %edx,%ecx + addl 20(%rsp),%ebx + andl %ebp,%edi + xorl %eax,%ebp + rorl $7,%edx + movups 48(%r15),%xmm1 +.byte 102,15,56,220,208 + movdqa %xmm5,%xmm13 + movl %ecx,%esi + xorl %ebp,%edi + movdqa %xmm3,0(%rsp) + roll $5,%ecx + addl %edi,%ebx + xorl %edx,%esi + pslld $2,%xmm5 + xorl %ebp,%edx + addl %ecx,%ebx + psrld $30,%xmm13 + addl 24(%rsp),%eax + andl %edx,%esi + xorl %ebp,%edx + por %xmm13,%xmm5 + rorl $7,%ecx + movl %ebx,%edi + xorl %edx,%esi + roll $5,%ebx + pshufd $238,%xmm4,%xmm3 + addl %esi,%eax + xorl %ecx,%edi + xorl %edx,%ecx + addl %ebx,%eax + addl 28(%rsp),%ebp + cmpl $11,%r8d + jb .Laesenclast3 + movups 64(%r15),%xmm0 +.byte 102,15,56,220,209 + movups 80(%r15),%xmm1 +.byte 102,15,56,220,208 + je .Laesenclast3 + movups 96(%r15),%xmm0 +.byte 102,15,56,220,209 + movups 112(%r15),%xmm1 +.byte 102,15,56,220,208 +.Laesenclast3: +.byte 102,15,56,221,209 + movups 16-112(%r15),%xmm0 + andl %ecx,%edi + xorl %edx,%ecx + rorl $7,%ebx + movl %eax,%esi + xorl %ecx,%edi + roll $5,%eax + addl %edi,%ebp + xorl %ebx,%esi + xorl %ecx,%ebx + addl %eax,%ebp + pxor %xmm10,%xmm6 + addl 32(%rsp),%edx + andl %ebx,%esi + xorl %ecx,%ebx + rorl $7,%eax + punpcklqdq %xmm5,%xmm3 + movl %ebp,%edi + xorl %ebx,%esi + pxor %xmm7,%xmm6 + roll $5,%ebp + addl %esi,%edx + movups 48(%r12),%xmm14 + xorps %xmm15,%xmm14 + movups %xmm2,32(%r13,%r12,1) + xorps %xmm14,%xmm2 + movups -80(%r15),%xmm1 +.byte 102,15,56,220,208 + movdqa %xmm12,%xmm13 + xorl %eax,%edi + paddd %xmm5,%xmm12 + xorl %ebx,%eax + pxor %xmm3,%xmm6 + addl %ebp,%edx + addl 36(%rsp),%ecx + andl %eax,%edi + xorl %ebx,%eax + rorl $7,%ebp + movdqa %xmm6,%xmm3 + movl %edx,%esi + xorl %eax,%edi + movdqa %xmm12,16(%rsp) + roll $5,%edx + addl %edi,%ecx + xorl %ebp,%esi + pslld $2,%xmm6 + xorl %eax,%ebp + addl %edx,%ecx + psrld $30,%xmm3 + addl 40(%rsp),%ebx + andl %ebp,%esi + xorl %eax,%ebp + por %xmm3,%xmm6 + rorl $7,%edx + movups -64(%r15),%xmm0 +.byte 102,15,56,220,209 + movl %ecx,%edi + xorl %ebp,%esi + roll $5,%ecx + pshufd $238,%xmm5,%xmm12 + addl %esi,%ebx + xorl %edx,%edi + xorl %ebp,%edx + addl %ecx,%ebx + addl 44(%rsp),%eax + andl %edx,%edi + xorl %ebp,%edx + rorl $7,%ecx + movl %ebx,%esi + xorl %edx,%edi + roll $5,%ebx + addl %edi,%eax + xorl %edx,%esi + addl %ebx,%eax + pxor %xmm11,%xmm7 + addl 48(%rsp),%ebp + movups -48(%r15),%xmm1 +.byte 102,15,56,220,208 + xorl %ecx,%esi + punpcklqdq %xmm6,%xmm12 + movl %eax,%edi + roll $5,%eax + pxor %xmm8,%xmm7 + addl %esi,%ebp + xorl %ecx,%edi + movdqa %xmm13,%xmm3 + rorl $7,%ebx + paddd %xmm6,%xmm13 + addl %eax,%ebp + pxor %xmm12,%xmm7 + addl 52(%rsp),%edx + xorl %ebx,%edi + movl %ebp,%esi + roll $5,%ebp + movdqa %xmm7,%xmm12 + addl %edi,%edx + xorl %ebx,%esi + movdqa %xmm13,32(%rsp) + rorl $7,%eax + addl %ebp,%edx + addl 56(%rsp),%ecx + pslld $2,%xmm7 + xorl %eax,%esi + movl %edx,%edi + psrld $30,%xmm12 + roll $5,%edx + addl %esi,%ecx + movups -32(%r15),%xmm0 +.byte 
102,15,56,220,209 + xorl %eax,%edi + rorl $7,%ebp + por %xmm12,%xmm7 + addl %edx,%ecx + addl 60(%rsp),%ebx + xorl %ebp,%edi + movl %ecx,%esi + roll $5,%ecx + addl %edi,%ebx + xorl %ebp,%esi + rorl $7,%edx + addl %ecx,%ebx + addl 0(%rsp),%eax + xorl %edx,%esi + movl %ebx,%edi + roll $5,%ebx + paddd %xmm7,%xmm3 + addl %esi,%eax + xorl %edx,%edi + movdqa %xmm3,48(%rsp) + rorl $7,%ecx + addl %ebx,%eax + addl 4(%rsp),%ebp + movups -16(%r15),%xmm1 +.byte 102,15,56,220,208 + xorl %ecx,%edi + movl %eax,%esi + roll $5,%eax + addl %edi,%ebp + xorl %ecx,%esi + rorl $7,%ebx + addl %eax,%ebp + addl 8(%rsp),%edx + xorl %ebx,%esi + movl %ebp,%edi + roll $5,%ebp + addl %esi,%edx + xorl %ebx,%edi + rorl $7,%eax + addl %ebp,%edx + addl 12(%rsp),%ecx + xorl %eax,%edi + movl %edx,%esi + roll $5,%edx + addl %edi,%ecx + movups 0(%r15),%xmm0 +.byte 102,15,56,220,209 + xorl %eax,%esi + rorl $7,%ebp + addl %edx,%ecx + cmpq %r14,%r10 + je .Ldone_ssse3 + movdqa 64(%r11),%xmm3 + movdqa 0(%r11),%xmm13 + movdqu 0(%r10),%xmm4 + movdqu 16(%r10),%xmm5 + movdqu 32(%r10),%xmm6 + movdqu 48(%r10),%xmm7 +.byte 102,15,56,0,227 + addq $64,%r10 + addl 16(%rsp),%ebx + xorl %ebp,%esi + movl %ecx,%edi +.byte 102,15,56,0,235 + roll $5,%ecx + addl %esi,%ebx + xorl %ebp,%edi + rorl $7,%edx + paddd %xmm13,%xmm4 + addl %ecx,%ebx + addl 20(%rsp),%eax + xorl %edx,%edi + movl %ebx,%esi + movdqa %xmm4,0(%rsp) + roll $5,%ebx + addl %edi,%eax + xorl %edx,%esi + rorl $7,%ecx + psubd %xmm13,%xmm4 + addl %ebx,%eax + addl 24(%rsp),%ebp + movups 16(%r15),%xmm1 +.byte 102,15,56,220,208 + xorl %ecx,%esi + movl %eax,%edi + roll $5,%eax + addl %esi,%ebp + xorl %ecx,%edi + rorl $7,%ebx + addl %eax,%ebp + addl 28(%rsp),%edx + xorl %ebx,%edi + movl %ebp,%esi + roll $5,%ebp + addl %edi,%edx + xorl %ebx,%esi + rorl $7,%eax + addl %ebp,%edx + addl 32(%rsp),%ecx + xorl %eax,%esi + movl %edx,%edi +.byte 102,15,56,0,243 + roll $5,%edx + addl %esi,%ecx + movups 32(%r15),%xmm0 +.byte 102,15,56,220,209 + xorl %eax,%edi + rorl $7,%ebp + paddd %xmm13,%xmm5 + addl %edx,%ecx + addl 36(%rsp),%ebx + xorl %ebp,%edi + movl %ecx,%esi + movdqa %xmm5,16(%rsp) + roll $5,%ecx + addl %edi,%ebx + xorl %ebp,%esi + rorl $7,%edx + psubd %xmm13,%xmm5 + addl %ecx,%ebx + addl 40(%rsp),%eax + xorl %edx,%esi + movl %ebx,%edi + roll $5,%ebx + addl %esi,%eax + xorl %edx,%edi + rorl $7,%ecx + addl %ebx,%eax + addl 44(%rsp),%ebp + movups 48(%r15),%xmm1 +.byte 102,15,56,220,208 + xorl %ecx,%edi + movl %eax,%esi + roll $5,%eax + addl %edi,%ebp + xorl %ecx,%esi + rorl $7,%ebx + addl %eax,%ebp + addl 48(%rsp),%edx + xorl %ebx,%esi + movl %ebp,%edi +.byte 102,15,56,0,251 + roll $5,%ebp + addl %esi,%edx + xorl %ebx,%edi + rorl $7,%eax + paddd %xmm13,%xmm6 + addl %ebp,%edx + addl 52(%rsp),%ecx + xorl %eax,%edi + movl %edx,%esi + movdqa %xmm6,32(%rsp) + roll $5,%edx + addl %edi,%ecx + cmpl $11,%r8d + jb .Laesenclast4 + movups 64(%r15),%xmm0 +.byte 102,15,56,220,209 + movups 80(%r15),%xmm1 +.byte 102,15,56,220,208 + je .Laesenclast4 + movups 96(%r15),%xmm0 +.byte 102,15,56,220,209 + movups 112(%r15),%xmm1 +.byte 102,15,56,220,208 +.Laesenclast4: +.byte 102,15,56,221,209 + movups 16-112(%r15),%xmm0 + xorl %eax,%esi + rorl $7,%ebp + psubd %xmm13,%xmm6 + addl %edx,%ecx + addl 56(%rsp),%ebx + xorl %ebp,%esi + movl %ecx,%edi + roll $5,%ecx + addl %esi,%ebx + xorl %ebp,%edi + rorl $7,%edx + addl %ecx,%ebx + addl 60(%rsp),%eax + xorl %edx,%edi + movl %ebx,%esi + roll $5,%ebx + addl %edi,%eax + rorl $7,%ecx + addl %ebx,%eax + movups %xmm2,48(%r13,%r12,1) + leaq 64(%r12),%r12 + + addl 0(%r9),%eax + addl 
4(%r9),%esi + addl 8(%r9),%ecx + addl 12(%r9),%edx + movl %eax,0(%r9) + addl 16(%r9),%ebp + movl %esi,4(%r9) + movl %esi,%ebx + movl %ecx,8(%r9) + movl %ecx,%edi + movl %edx,12(%r9) + xorl %edx,%edi + movl %ebp,16(%r9) + andl %edi,%esi + jmp .Loop_ssse3 + +.Ldone_ssse3: + addl 16(%rsp),%ebx + xorl %ebp,%esi + movl %ecx,%edi + roll $5,%ecx + addl %esi,%ebx + xorl %ebp,%edi + rorl $7,%edx + addl %ecx,%ebx + addl 20(%rsp),%eax + xorl %edx,%edi + movl %ebx,%esi + roll $5,%ebx + addl %edi,%eax + xorl %edx,%esi + rorl $7,%ecx + addl %ebx,%eax + addl 24(%rsp),%ebp + movups 16(%r15),%xmm1 +.byte 102,15,56,220,208 + xorl %ecx,%esi + movl %eax,%edi + roll $5,%eax + addl %esi,%ebp + xorl %ecx,%edi + rorl $7,%ebx + addl %eax,%ebp + addl 28(%rsp),%edx + xorl %ebx,%edi + movl %ebp,%esi + roll $5,%ebp + addl %edi,%edx + xorl %ebx,%esi + rorl $7,%eax + addl %ebp,%edx + addl 32(%rsp),%ecx + xorl %eax,%esi + movl %edx,%edi + roll $5,%edx + addl %esi,%ecx + movups 32(%r15),%xmm0 +.byte 102,15,56,220,209 + xorl %eax,%edi + rorl $7,%ebp + addl %edx,%ecx + addl 36(%rsp),%ebx + xorl %ebp,%edi + movl %ecx,%esi + roll $5,%ecx + addl %edi,%ebx + xorl %ebp,%esi + rorl $7,%edx + addl %ecx,%ebx + addl 40(%rsp),%eax + xorl %edx,%esi + movl %ebx,%edi + roll $5,%ebx + addl %esi,%eax + xorl %edx,%edi + rorl $7,%ecx + addl %ebx,%eax + addl 44(%rsp),%ebp + movups 48(%r15),%xmm1 +.byte 102,15,56,220,208 + xorl %ecx,%edi + movl %eax,%esi + roll $5,%eax + addl %edi,%ebp + xorl %ecx,%esi + rorl $7,%ebx + addl %eax,%ebp + addl 48(%rsp),%edx + xorl %ebx,%esi + movl %ebp,%edi + roll $5,%ebp + addl %esi,%edx + xorl %ebx,%edi + rorl $7,%eax + addl %ebp,%edx + addl 52(%rsp),%ecx + xorl %eax,%edi + movl %edx,%esi + roll $5,%edx + addl %edi,%ecx + cmpl $11,%r8d + jb .Laesenclast5 + movups 64(%r15),%xmm0 +.byte 102,15,56,220,209 + movups 80(%r15),%xmm1 +.byte 102,15,56,220,208 + je .Laesenclast5 + movups 96(%r15),%xmm0 +.byte 102,15,56,220,209 + movups 112(%r15),%xmm1 +.byte 102,15,56,220,208 +.Laesenclast5: +.byte 102,15,56,221,209 + movups 16-112(%r15),%xmm0 + xorl %eax,%esi + rorl $7,%ebp + addl %edx,%ecx + addl 56(%rsp),%ebx + xorl %ebp,%esi + movl %ecx,%edi + roll $5,%ecx + addl %esi,%ebx + xorl %ebp,%edi + rorl $7,%edx + addl %ecx,%ebx + addl 60(%rsp),%eax + xorl %edx,%edi + movl %ebx,%esi + roll $5,%ebx + addl %edi,%eax + rorl $7,%ecx + addl %ebx,%eax + movups %xmm2,48(%r13,%r12,1) + movq 88(%rsp),%r8 + + addl 0(%r9),%eax + addl 4(%r9),%esi + addl 8(%r9),%ecx + movl %eax,0(%r9) + addl 12(%r9),%edx + movl %esi,4(%r9) + addl 16(%r9),%ebp + movl %ecx,8(%r9) + movl %edx,12(%r9) + movl %ebp,16(%r9) + movups %xmm2,(%r8) + leaq 104(%rsp),%rsi +.cfi_def_cfa %rsi,56 + movq 0(%rsi),%r15 +.cfi_restore %r15 + movq 8(%rsi),%r14 +.cfi_restore %r14 + movq 16(%rsi),%r13 +.cfi_restore %r13 + movq 24(%rsi),%r12 +.cfi_restore %r12 + movq 32(%rsi),%rbp +.cfi_restore %rbp + movq 40(%rsi),%rbx +.cfi_restore %rbx + leaq 48(%rsi),%rsp +.cfi_def_cfa %rsp,8 +.Lepilogue_ssse3: + .byte 0xf3,0xc3 +.cfi_endproc +.size aesni_cbc_sha1_enc_ssse3,.-aesni_cbc_sha1_enc_ssse3 +.align 64 +K_XX_XX: +.long 0x5a827999,0x5a827999,0x5a827999,0x5a827999 +.long 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 +.long 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc +.long 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f +.byte 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0 + +.byte 
65,69,83,78,73,45,67,66,67,43,83,72,65,49,32,115,116,105,116,99,104,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0 +.align 64 +.type aesni_cbc_sha1_enc_shaext,@function +.align 32 +aesni_cbc_sha1_enc_shaext: +.cfi_startproc + movq 8(%rsp),%r10 + movdqu (%r9),%xmm8 + movd 16(%r9),%xmm9 + movdqa K_XX_XX+80(%rip),%xmm7 + + movl 240(%rcx),%r11d + subq %rdi,%rsi + movups (%rcx),%xmm15 + movups (%r8),%xmm2 + movups 16(%rcx),%xmm0 + leaq 112(%rcx),%rcx + + pshufd $27,%xmm8,%xmm8 + pshufd $27,%xmm9,%xmm9 + jmp .Loop_shaext + +.align 16 +.Loop_shaext: + movups 0(%rdi),%xmm14 + xorps %xmm15,%xmm14 + xorps %xmm14,%xmm2 + movups -80(%rcx),%xmm1 +.byte 102,15,56,220,208 + movdqu (%r10),%xmm3 + movdqa %xmm9,%xmm12 +.byte 102,15,56,0,223 + movdqu 16(%r10),%xmm4 + movdqa %xmm8,%xmm11 + movups -64(%rcx),%xmm0 +.byte 102,15,56,220,209 +.byte 102,15,56,0,231 + + paddd %xmm3,%xmm9 + movdqu 32(%r10),%xmm5 + leaq 64(%r10),%r10 + pxor %xmm12,%xmm3 + movups -48(%rcx),%xmm1 +.byte 102,15,56,220,208 + pxor %xmm12,%xmm3 + movdqa %xmm8,%xmm10 +.byte 102,15,56,0,239 +.byte 69,15,58,204,193,0 +.byte 68,15,56,200,212 + movups -32(%rcx),%xmm0 +.byte 102,15,56,220,209 +.byte 15,56,201,220 + movdqu -16(%r10),%xmm6 + movdqa %xmm8,%xmm9 +.byte 102,15,56,0,247 + movups -16(%rcx),%xmm1 +.byte 102,15,56,220,208 +.byte 69,15,58,204,194,0 +.byte 68,15,56,200,205 + pxor %xmm5,%xmm3 +.byte 15,56,201,229 + movups 0(%rcx),%xmm0 +.byte 102,15,56,220,209 + movdqa %xmm8,%xmm10 +.byte 69,15,58,204,193,0 +.byte 68,15,56,200,214 + movups 16(%rcx),%xmm1 +.byte 102,15,56,220,208 +.byte 15,56,202,222 + pxor %xmm6,%xmm4 +.byte 15,56,201,238 + movups 32(%rcx),%xmm0 +.byte 102,15,56,220,209 + movdqa %xmm8,%xmm9 +.byte 69,15,58,204,194,0 +.byte 68,15,56,200,203 + movups 48(%rcx),%xmm1 +.byte 102,15,56,220,208 +.byte 15,56,202,227 + pxor %xmm3,%xmm5 +.byte 15,56,201,243 + cmpl $11,%r11d + jb .Laesenclast6 + movups 64(%rcx),%xmm0 +.byte 102,15,56,220,209 + movups 80(%rcx),%xmm1 +.byte 102,15,56,220,208 + je .Laesenclast6 + movups 96(%rcx),%xmm0 +.byte 102,15,56,220,209 + movups 112(%rcx),%xmm1 +.byte 102,15,56,220,208 +.Laesenclast6: +.byte 102,15,56,221,209 + movups 16-112(%rcx),%xmm0 + movdqa %xmm8,%xmm10 +.byte 69,15,58,204,193,0 +.byte 68,15,56,200,212 + movups 16(%rdi),%xmm14 + xorps %xmm15,%xmm14 + movups %xmm2,0(%rsi,%rdi,1) + xorps %xmm14,%xmm2 + movups -80(%rcx),%xmm1 +.byte 102,15,56,220,208 +.byte 15,56,202,236 + pxor %xmm4,%xmm6 +.byte 15,56,201,220 + movups -64(%rcx),%xmm0 +.byte 102,15,56,220,209 + movdqa %xmm8,%xmm9 +.byte 69,15,58,204,194,1 +.byte 68,15,56,200,205 + movups -48(%rcx),%xmm1 +.byte 102,15,56,220,208 +.byte 15,56,202,245 + pxor %xmm5,%xmm3 +.byte 15,56,201,229 + movups -32(%rcx),%xmm0 +.byte 102,15,56,220,209 + movdqa %xmm8,%xmm10 +.byte 69,15,58,204,193,1 +.byte 68,15,56,200,214 + movups -16(%rcx),%xmm1 +.byte 102,15,56,220,208 +.byte 15,56,202,222 + pxor %xmm6,%xmm4 +.byte 15,56,201,238 + movups 0(%rcx),%xmm0 +.byte 102,15,56,220,209 + movdqa %xmm8,%xmm9 +.byte 69,15,58,204,194,1 +.byte 68,15,56,200,203 + movups 16(%rcx),%xmm1 +.byte 102,15,56,220,208 +.byte 15,56,202,227 + pxor %xmm3,%xmm5 +.byte 15,56,201,243 + movups 32(%rcx),%xmm0 +.byte 102,15,56,220,209 + movdqa %xmm8,%xmm10 +.byte 69,15,58,204,193,1 +.byte 68,15,56,200,212 + movups 48(%rcx),%xmm1 +.byte 102,15,56,220,208 +.byte 15,56,202,236 + pxor %xmm4,%xmm6 +.byte 15,56,201,220 + cmpl $11,%r11d + jb .Laesenclast7 + movups 64(%rcx),%xmm0 +.byte 
102,15,56,220,209 + movups 80(%rcx),%xmm1 +.byte 102,15,56,220,208 + je .Laesenclast7 + movups 96(%rcx),%xmm0 +.byte 102,15,56,220,209 + movups 112(%rcx),%xmm1 +.byte 102,15,56,220,208 +.Laesenclast7: +.byte 102,15,56,221,209 + movups 16-112(%rcx),%xmm0 + movdqa %xmm8,%xmm9 +.byte 69,15,58,204,194,1 +.byte 68,15,56,200,205 + movups 32(%rdi),%xmm14 + xorps %xmm15,%xmm14 + movups %xmm2,16(%rsi,%rdi,1) + xorps %xmm14,%xmm2 + movups -80(%rcx),%xmm1 +.byte 102,15,56,220,208 +.byte 15,56,202,245 + pxor %xmm5,%xmm3 +.byte 15,56,201,229 + movups -64(%rcx),%xmm0 +.byte 102,15,56,220,209 + movdqa %xmm8,%xmm10 +.byte 69,15,58,204,193,2 +.byte 68,15,56,200,214 + movups -48(%rcx),%xmm1 +.byte 102,15,56,220,208 +.byte 15,56,202,222 + pxor %xmm6,%xmm4 +.byte 15,56,201,238 + movups -32(%rcx),%xmm0 +.byte 102,15,56,220,209 + movdqa %xmm8,%xmm9 +.byte 69,15,58,204,194,2 +.byte 68,15,56,200,203 + movups -16(%rcx),%xmm1 +.byte 102,15,56,220,208 +.byte 15,56,202,227 + pxor %xmm3,%xmm5 +.byte 15,56,201,243 + movups 0(%rcx),%xmm0 +.byte 102,15,56,220,209 + movdqa %xmm8,%xmm10 +.byte 69,15,58,204,193,2 +.byte 68,15,56,200,212 + movups 16(%rcx),%xmm1 +.byte 102,15,56,220,208 +.byte 15,56,202,236 + pxor %xmm4,%xmm6 +.byte 15,56,201,220 + movups 32(%rcx),%xmm0 +.byte 102,15,56,220,209 + movdqa %xmm8,%xmm9 +.byte 69,15,58,204,194,2 +.byte 68,15,56,200,205 + movups 48(%rcx),%xmm1 +.byte 102,15,56,220,208 +.byte 15,56,202,245 + pxor %xmm5,%xmm3 +.byte 15,56,201,229 + cmpl $11,%r11d + jb .Laesenclast8 + movups 64(%rcx),%xmm0 +.byte 102,15,56,220,209 + movups 80(%rcx),%xmm1 +.byte 102,15,56,220,208 + je .Laesenclast8 + movups 96(%rcx),%xmm0 +.byte 102,15,56,220,209 + movups 112(%rcx),%xmm1 +.byte 102,15,56,220,208 +.Laesenclast8: +.byte 102,15,56,221,209 + movups 16-112(%rcx),%xmm0 + movdqa %xmm8,%xmm10 +.byte 69,15,58,204,193,2 +.byte 68,15,56,200,214 + movups 48(%rdi),%xmm14 + xorps %xmm15,%xmm14 + movups %xmm2,32(%rsi,%rdi,1) + xorps %xmm14,%xmm2 + movups -80(%rcx),%xmm1 +.byte 102,15,56,220,208 +.byte 15,56,202,222 + pxor %xmm6,%xmm4 +.byte 15,56,201,238 + movups -64(%rcx),%xmm0 +.byte 102,15,56,220,209 + movdqa %xmm8,%xmm9 +.byte 69,15,58,204,194,3 +.byte 68,15,56,200,203 + movups -48(%rcx),%xmm1 +.byte 102,15,56,220,208 +.byte 15,56,202,227 + pxor %xmm3,%xmm5 +.byte 15,56,201,243 + movups -32(%rcx),%xmm0 +.byte 102,15,56,220,209 + movdqa %xmm8,%xmm10 +.byte 69,15,58,204,193,3 +.byte 68,15,56,200,212 +.byte 15,56,202,236 + pxor %xmm4,%xmm6 + movups -16(%rcx),%xmm1 +.byte 102,15,56,220,208 + movdqa %xmm8,%xmm9 +.byte 69,15,58,204,194,3 +.byte 68,15,56,200,205 +.byte 15,56,202,245 + movups 0(%rcx),%xmm0 +.byte 102,15,56,220,209 + movdqa %xmm12,%xmm5 + movdqa %xmm8,%xmm10 +.byte 69,15,58,204,193,3 +.byte 68,15,56,200,214 + movups 16(%rcx),%xmm1 +.byte 102,15,56,220,208 + movdqa %xmm8,%xmm9 +.byte 69,15,58,204,194,3 +.byte 68,15,56,200,205 + movups 32(%rcx),%xmm0 +.byte 102,15,56,220,209 + movups 48(%rcx),%xmm1 +.byte 102,15,56,220,208 + cmpl $11,%r11d + jb .Laesenclast9 + movups 64(%rcx),%xmm0 +.byte 102,15,56,220,209 + movups 80(%rcx),%xmm1 +.byte 102,15,56,220,208 + je .Laesenclast9 + movups 96(%rcx),%xmm0 +.byte 102,15,56,220,209 + movups 112(%rcx),%xmm1 +.byte 102,15,56,220,208 +.Laesenclast9: +.byte 102,15,56,221,209 + movups 16-112(%rcx),%xmm0 + decq %rdx + + paddd %xmm11,%xmm8 + movups %xmm2,48(%rsi,%rdi,1) + leaq 64(%rdi),%rdi + jnz .Loop_shaext + + pshufd $27,%xmm8,%xmm8 + pshufd $27,%xmm9,%xmm9 + movups %xmm2,(%r8) + movdqu %xmm8,(%r9) + movd %xmm9,16(%r9) + .byte 0xf3,0xc3 +.cfi_endproc +.size 
aesni_cbc_sha1_enc_shaext,.-aesni_cbc_sha1_enc_shaext diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha256-x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha256-x86_64.S new file mode 100644 index 0000000000..e257169287 --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha256-x86_64.S @@ -0,0 +1,69 @@ +# WARNING: do not edit! +# Generated from openssl/crypto/aes/asm/aesni-sha256-x86_64.pl +# +# Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the OpenSSL license (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +.text + + +.globl aesni_cbc_sha256_enc +.type aesni_cbc_sha256_enc,@function +.align 16 +aesni_cbc_sha256_enc: +.cfi_startproc + xorl %eax,%eax + cmpq $0,%rdi + je .Lprobe + ud2 +.Lprobe: + .byte 0xf3,0xc3 +.cfi_endproc +.size aesni_cbc_sha256_enc,.-aesni_cbc_sha256_enc + +.align 64 +.type K256,@object +K256: +.long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 +.long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 +.long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 +.long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 +.long 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 +.long 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 +.long 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 +.long 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 +.long 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc +.long 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc +.long 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da +.long 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da +.long 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 +.long 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 +.long 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 +.long 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 +.long 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 +.long 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 +.long 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 +.long 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 +.long 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 +.long 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 +.long 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 +.long 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 +.long 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 +.long 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 +.long 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 +.long 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 +.long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 +.long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 +.long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 +.long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 + +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f +.long 0,0,0,0, 0,0,0,0, -1,-1,-1,-1 +.long 0,0,0,0, 0,0,0,0 +.byte 65,69,83,78,73,45,67,66,67,43,83,72,65,50,53,54,32,115,116,105,116,99,104,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0 +.align 64 diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-x86_64.S new file mode 100644 index 0000000000..2bdb5cf251 --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-x86_64.S @@ -0,0 +1,4484 @@ +# WARNING: do not edit! 
+# Generated from openssl/crypto/aes/asm/aesni-x86_64.pl +# +# Copyright 2009-2020 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the OpenSSL license (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +.text + +.globl aesni_encrypt +.type aesni_encrypt,@function +.align 16 +aesni_encrypt: +.cfi_startproc + movups (%rdi),%xmm2 + movl 240(%rdx),%eax + movups (%rdx),%xmm0 + movups 16(%rdx),%xmm1 + leaq 32(%rdx),%rdx + xorps %xmm0,%xmm2 +.Loop_enc1_1: +.byte 102,15,56,220,209 + decl %eax + movups (%rdx),%xmm1 + leaq 16(%rdx),%rdx + jnz .Loop_enc1_1 +.byte 102,15,56,221,209 + pxor %xmm0,%xmm0 + pxor %xmm1,%xmm1 + movups %xmm2,(%rsi) + pxor %xmm2,%xmm2 + .byte 0xf3,0xc3 +.cfi_endproc +.size aesni_encrypt,.-aesni_encrypt + +.globl aesni_decrypt +.type aesni_decrypt,@function +.align 16 +aesni_decrypt: +.cfi_startproc + movups (%rdi),%xmm2 + movl 240(%rdx),%eax + movups (%rdx),%xmm0 + movups 16(%rdx),%xmm1 + leaq 32(%rdx),%rdx + xorps %xmm0,%xmm2 +.Loop_dec1_2: +.byte 102,15,56,222,209 + decl %eax + movups (%rdx),%xmm1 + leaq 16(%rdx),%rdx + jnz .Loop_dec1_2 +.byte 102,15,56,223,209 + pxor %xmm0,%xmm0 + pxor %xmm1,%xmm1 + movups %xmm2,(%rsi) + pxor %xmm2,%xmm2 + .byte 0xf3,0xc3 +.cfi_endproc +.size aesni_decrypt, .-aesni_decrypt +.type _aesni_encrypt2,@function +.align 16 +_aesni_encrypt2: +.cfi_startproc + movups (%rcx),%xmm0 + shll $4,%eax + movups 16(%rcx),%xmm1 + xorps %xmm0,%xmm2 + xorps %xmm0,%xmm3 + movups 32(%rcx),%xmm0 + leaq 32(%rcx,%rax,1),%rcx + negq %rax + addq $16,%rax + +.Lenc_loop2: +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 + movups (%rcx,%rax,1),%xmm1 + addq $32,%rax +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 + movups -16(%rcx,%rax,1),%xmm0 + jnz .Lenc_loop2 + +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,221,208 +.byte 102,15,56,221,216 + .byte 0xf3,0xc3 +.cfi_endproc +.size _aesni_encrypt2,.-_aesni_encrypt2 +.type _aesni_decrypt2,@function +.align 16 +_aesni_decrypt2: +.cfi_startproc + movups (%rcx),%xmm0 + shll $4,%eax + movups 16(%rcx),%xmm1 + xorps %xmm0,%xmm2 + xorps %xmm0,%xmm3 + movups 32(%rcx),%xmm0 + leaq 32(%rcx,%rax,1),%rcx + negq %rax + addq $16,%rax + +.Ldec_loop2: +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 + movups (%rcx,%rax,1),%xmm1 + addq $32,%rax +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 + movups -16(%rcx,%rax,1),%xmm0 + jnz .Ldec_loop2 + +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,223,208 +.byte 102,15,56,223,216 + .byte 0xf3,0xc3 +.cfi_endproc +.size _aesni_decrypt2,.-_aesni_decrypt2 +.type _aesni_encrypt3,@function +.align 16 +_aesni_encrypt3: +.cfi_startproc + movups (%rcx),%xmm0 + shll $4,%eax + movups 16(%rcx),%xmm1 + xorps %xmm0,%xmm2 + xorps %xmm0,%xmm3 + xorps %xmm0,%xmm4 + movups 32(%rcx),%xmm0 + leaq 32(%rcx,%rax,1),%rcx + negq %rax + addq $16,%rax + +.Lenc_loop3: +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 + movups (%rcx,%rax,1),%xmm1 + addq $32,%rax +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 +.byte 102,15,56,220,224 + movups -16(%rcx,%rax,1),%xmm0 + jnz .Lenc_loop3 + +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.byte 102,15,56,221,208 +.byte 102,15,56,221,216 +.byte 102,15,56,221,224 + .byte 0xf3,0xc3 +.cfi_endproc +.size _aesni_encrypt3,.-_aesni_encrypt3 +.type _aesni_decrypt3,@function +.align 16 +_aesni_decrypt3: 
+.cfi_startproc + movups (%rcx),%xmm0 + shll $4,%eax + movups 16(%rcx),%xmm1 + xorps %xmm0,%xmm2 + xorps %xmm0,%xmm3 + xorps %xmm0,%xmm4 + movups 32(%rcx),%xmm0 + leaq 32(%rcx,%rax,1),%rcx + negq %rax + addq $16,%rax + +.Ldec_loop3: +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 + movups (%rcx,%rax,1),%xmm1 + addq $32,%rax +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 + movups -16(%rcx,%rax,1),%xmm0 + jnz .Ldec_loop3 + +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,223,208 +.byte 102,15,56,223,216 +.byte 102,15,56,223,224 + .byte 0xf3,0xc3 +.cfi_endproc +.size _aesni_decrypt3,.-_aesni_decrypt3 +.type _aesni_encrypt4,@function +.align 16 +_aesni_encrypt4: +.cfi_startproc + movups (%rcx),%xmm0 + shll $4,%eax + movups 16(%rcx),%xmm1 + xorps %xmm0,%xmm2 + xorps %xmm0,%xmm3 + xorps %xmm0,%xmm4 + xorps %xmm0,%xmm5 + movups 32(%rcx),%xmm0 + leaq 32(%rcx,%rax,1),%rcx + negq %rax +.byte 0x0f,0x1f,0x00 + addq $16,%rax + +.Lenc_loop4: +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 + movups (%rcx,%rax,1),%xmm1 + addq $32,%rax +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 +.byte 102,15,56,220,224 +.byte 102,15,56,220,232 + movups -16(%rcx,%rax,1),%xmm0 + jnz .Lenc_loop4 + +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 +.byte 102,15,56,221,208 +.byte 102,15,56,221,216 +.byte 102,15,56,221,224 +.byte 102,15,56,221,232 + .byte 0xf3,0xc3 +.cfi_endproc +.size _aesni_encrypt4,.-_aesni_encrypt4 +.type _aesni_decrypt4,@function +.align 16 +_aesni_decrypt4: +.cfi_startproc + movups (%rcx),%xmm0 + shll $4,%eax + movups 16(%rcx),%xmm1 + xorps %xmm0,%xmm2 + xorps %xmm0,%xmm3 + xorps %xmm0,%xmm4 + xorps %xmm0,%xmm5 + movups 32(%rcx),%xmm0 + leaq 32(%rcx,%rax,1),%rcx + negq %rax +.byte 0x0f,0x1f,0x00 + addq $16,%rax + +.Ldec_loop4: +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 + movups (%rcx,%rax,1),%xmm1 + addq $32,%rax +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 + movups -16(%rcx,%rax,1),%xmm0 + jnz .Ldec_loop4 + +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 +.byte 102,15,56,223,208 +.byte 102,15,56,223,216 +.byte 102,15,56,223,224 +.byte 102,15,56,223,232 + .byte 0xf3,0xc3 +.cfi_endproc +.size _aesni_decrypt4,.-_aesni_decrypt4 +.type _aesni_encrypt6,@function +.align 16 +_aesni_encrypt6: +.cfi_startproc + movups (%rcx),%xmm0 + shll $4,%eax + movups 16(%rcx),%xmm1 + xorps %xmm0,%xmm2 + pxor %xmm0,%xmm3 + pxor %xmm0,%xmm4 +.byte 102,15,56,220,209 + leaq 32(%rcx,%rax,1),%rcx + negq %rax +.byte 102,15,56,220,217 + pxor %xmm0,%xmm5 + pxor %xmm0,%xmm6 +.byte 102,15,56,220,225 + pxor %xmm0,%xmm7 + movups (%rcx,%rax,1),%xmm0 + addq $16,%rax + jmp .Lenc_loop6_enter +.align 16 +.Lenc_loop6: +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.Lenc_loop6_enter: +.byte 102,15,56,220,233 +.byte 102,15,56,220,241 +.byte 102,15,56,220,249 + movups (%rcx,%rax,1),%xmm1 + addq $32,%rax +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 +.byte 102,15,56,220,224 +.byte 102,15,56,220,232 +.byte 102,15,56,220,240 +.byte 102,15,56,220,248 + movups -16(%rcx,%rax,1),%xmm0 + jnz .Lenc_loop6 + +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 +.byte 102,15,56,220,241 +.byte 102,15,56,220,249 +.byte 
102,15,56,221,208 +.byte 102,15,56,221,216 +.byte 102,15,56,221,224 +.byte 102,15,56,221,232 +.byte 102,15,56,221,240 +.byte 102,15,56,221,248 + .byte 0xf3,0xc3 +.cfi_endproc +.size _aesni_encrypt6,.-_aesni_encrypt6 +.type _aesni_decrypt6,@function +.align 16 +_aesni_decrypt6: +.cfi_startproc + movups (%rcx),%xmm0 + shll $4,%eax + movups 16(%rcx),%xmm1 + xorps %xmm0,%xmm2 + pxor %xmm0,%xmm3 + pxor %xmm0,%xmm4 +.byte 102,15,56,222,209 + leaq 32(%rcx,%rax,1),%rcx + negq %rax +.byte 102,15,56,222,217 + pxor %xmm0,%xmm5 + pxor %xmm0,%xmm6 +.byte 102,15,56,222,225 + pxor %xmm0,%xmm7 + movups (%rcx,%rax,1),%xmm0 + addq $16,%rax + jmp .Ldec_loop6_enter +.align 16 +.Ldec_loop6: +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.Ldec_loop6_enter: +.byte 102,15,56,222,233 +.byte 102,15,56,222,241 +.byte 102,15,56,222,249 + movups (%rcx,%rax,1),%xmm1 + addq $32,%rax +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 +.byte 102,15,56,222,240 +.byte 102,15,56,222,248 + movups -16(%rcx,%rax,1),%xmm0 + jnz .Ldec_loop6 + +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 +.byte 102,15,56,222,241 +.byte 102,15,56,222,249 +.byte 102,15,56,223,208 +.byte 102,15,56,223,216 +.byte 102,15,56,223,224 +.byte 102,15,56,223,232 +.byte 102,15,56,223,240 +.byte 102,15,56,223,248 + .byte 0xf3,0xc3 +.cfi_endproc +.size _aesni_decrypt6,.-_aesni_decrypt6 +.type _aesni_encrypt8,@function +.align 16 +_aesni_encrypt8: +.cfi_startproc + movups (%rcx),%xmm0 + shll $4,%eax + movups 16(%rcx),%xmm1 + xorps %xmm0,%xmm2 + xorps %xmm0,%xmm3 + pxor %xmm0,%xmm4 + pxor %xmm0,%xmm5 + pxor %xmm0,%xmm6 + leaq 32(%rcx,%rax,1),%rcx + negq %rax +.byte 102,15,56,220,209 + pxor %xmm0,%xmm7 + pxor %xmm0,%xmm8 +.byte 102,15,56,220,217 + pxor %xmm0,%xmm9 + movups (%rcx,%rax,1),%xmm0 + addq $16,%rax + jmp .Lenc_loop8_inner +.align 16 +.Lenc_loop8: +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.Lenc_loop8_inner: +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 +.byte 102,15,56,220,241 +.byte 102,15,56,220,249 +.byte 102,68,15,56,220,193 +.byte 102,68,15,56,220,201 +.Lenc_loop8_enter: + movups (%rcx,%rax,1),%xmm1 + addq $32,%rax +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 +.byte 102,15,56,220,224 +.byte 102,15,56,220,232 +.byte 102,15,56,220,240 +.byte 102,15,56,220,248 +.byte 102,68,15,56,220,192 +.byte 102,68,15,56,220,200 + movups -16(%rcx,%rax,1),%xmm0 + jnz .Lenc_loop8 + +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 +.byte 102,15,56,220,241 +.byte 102,15,56,220,249 +.byte 102,68,15,56,220,193 +.byte 102,68,15,56,220,201 +.byte 102,15,56,221,208 +.byte 102,15,56,221,216 +.byte 102,15,56,221,224 +.byte 102,15,56,221,232 +.byte 102,15,56,221,240 +.byte 102,15,56,221,248 +.byte 102,68,15,56,221,192 +.byte 102,68,15,56,221,200 + .byte 0xf3,0xc3 +.cfi_endproc +.size _aesni_encrypt8,.-_aesni_encrypt8 +.type _aesni_decrypt8,@function +.align 16 +_aesni_decrypt8: +.cfi_startproc + movups (%rcx),%xmm0 + shll $4,%eax + movups 16(%rcx),%xmm1 + xorps %xmm0,%xmm2 + xorps %xmm0,%xmm3 + pxor %xmm0,%xmm4 + pxor %xmm0,%xmm5 + pxor %xmm0,%xmm6 + leaq 32(%rcx,%rax,1),%rcx + negq %rax +.byte 102,15,56,222,209 + pxor %xmm0,%xmm7 + pxor %xmm0,%xmm8 +.byte 102,15,56,222,217 + pxor %xmm0,%xmm9 + movups (%rcx,%rax,1),%xmm0 + addq $16,%rax + jmp .Ldec_loop8_inner +.align 16 +.Ldec_loop8: +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.Ldec_loop8_inner: +.byte 102,15,56,222,225 
+.byte 102,15,56,222,233 +.byte 102,15,56,222,241 +.byte 102,15,56,222,249 +.byte 102,68,15,56,222,193 +.byte 102,68,15,56,222,201 +.Ldec_loop8_enter: + movups (%rcx,%rax,1),%xmm1 + addq $32,%rax +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 +.byte 102,15,56,222,240 +.byte 102,15,56,222,248 +.byte 102,68,15,56,222,192 +.byte 102,68,15,56,222,200 + movups -16(%rcx,%rax,1),%xmm0 + jnz .Ldec_loop8 + +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 +.byte 102,15,56,222,241 +.byte 102,15,56,222,249 +.byte 102,68,15,56,222,193 +.byte 102,68,15,56,222,201 +.byte 102,15,56,223,208 +.byte 102,15,56,223,216 +.byte 102,15,56,223,224 +.byte 102,15,56,223,232 +.byte 102,15,56,223,240 +.byte 102,15,56,223,248 +.byte 102,68,15,56,223,192 +.byte 102,68,15,56,223,200 + .byte 0xf3,0xc3 +.cfi_endproc +.size _aesni_decrypt8,.-_aesni_decrypt8 +.globl aesni_ecb_encrypt +.type aesni_ecb_encrypt,@function +.align 16 +aesni_ecb_encrypt: +.cfi_startproc + andq $-16,%rdx + jz .Lecb_ret + + movl 240(%rcx),%eax + movups (%rcx),%xmm0 + movq %rcx,%r11 + movl %eax,%r10d + testl %r8d,%r8d + jz .Lecb_decrypt + + cmpq $0x80,%rdx + jb .Lecb_enc_tail + + movdqu (%rdi),%xmm2 + movdqu 16(%rdi),%xmm3 + movdqu 32(%rdi),%xmm4 + movdqu 48(%rdi),%xmm5 + movdqu 64(%rdi),%xmm6 + movdqu 80(%rdi),%xmm7 + movdqu 96(%rdi),%xmm8 + movdqu 112(%rdi),%xmm9 + leaq 128(%rdi),%rdi + subq $0x80,%rdx + jmp .Lecb_enc_loop8_enter +.align 16 +.Lecb_enc_loop8: + movups %xmm2,(%rsi) + movq %r11,%rcx + movdqu (%rdi),%xmm2 + movl %r10d,%eax + movups %xmm3,16(%rsi) + movdqu 16(%rdi),%xmm3 + movups %xmm4,32(%rsi) + movdqu 32(%rdi),%xmm4 + movups %xmm5,48(%rsi) + movdqu 48(%rdi),%xmm5 + movups %xmm6,64(%rsi) + movdqu 64(%rdi),%xmm6 + movups %xmm7,80(%rsi) + movdqu 80(%rdi),%xmm7 + movups %xmm8,96(%rsi) + movdqu 96(%rdi),%xmm8 + movups %xmm9,112(%rsi) + leaq 128(%rsi),%rsi + movdqu 112(%rdi),%xmm9 + leaq 128(%rdi),%rdi +.Lecb_enc_loop8_enter: + + call _aesni_encrypt8 + + subq $0x80,%rdx + jnc .Lecb_enc_loop8 + + movups %xmm2,(%rsi) + movq %r11,%rcx + movups %xmm3,16(%rsi) + movl %r10d,%eax + movups %xmm4,32(%rsi) + movups %xmm5,48(%rsi) + movups %xmm6,64(%rsi) + movups %xmm7,80(%rsi) + movups %xmm8,96(%rsi) + movups %xmm9,112(%rsi) + leaq 128(%rsi),%rsi + addq $0x80,%rdx + jz .Lecb_ret + +.Lecb_enc_tail: + movups (%rdi),%xmm2 + cmpq $0x20,%rdx + jb .Lecb_enc_one + movups 16(%rdi),%xmm3 + je .Lecb_enc_two + movups 32(%rdi),%xmm4 + cmpq $0x40,%rdx + jb .Lecb_enc_three + movups 48(%rdi),%xmm5 + je .Lecb_enc_four + movups 64(%rdi),%xmm6 + cmpq $0x60,%rdx + jb .Lecb_enc_five + movups 80(%rdi),%xmm7 + je .Lecb_enc_six + movdqu 96(%rdi),%xmm8 + xorps %xmm9,%xmm9 + call _aesni_encrypt8 + movups %xmm2,(%rsi) + movups %xmm3,16(%rsi) + movups %xmm4,32(%rsi) + movups %xmm5,48(%rsi) + movups %xmm6,64(%rsi) + movups %xmm7,80(%rsi) + movups %xmm8,96(%rsi) + jmp .Lecb_ret +.align 16 +.Lecb_enc_one: + movups (%rcx),%xmm0 + movups 16(%rcx),%xmm1 + leaq 32(%rcx),%rcx + xorps %xmm0,%xmm2 +.Loop_enc1_3: +.byte 102,15,56,220,209 + decl %eax + movups (%rcx),%xmm1 + leaq 16(%rcx),%rcx + jnz .Loop_enc1_3 +.byte 102,15,56,221,209 + movups %xmm2,(%rsi) + jmp .Lecb_ret +.align 16 +.Lecb_enc_two: + call _aesni_encrypt2 + movups %xmm2,(%rsi) + movups %xmm3,16(%rsi) + jmp .Lecb_ret +.align 16 +.Lecb_enc_three: + call _aesni_encrypt3 + movups %xmm2,(%rsi) + movups %xmm3,16(%rsi) + movups %xmm4,32(%rsi) + jmp .Lecb_ret +.align 16 +.Lecb_enc_four: + call _aesni_encrypt4 + movups 
%xmm2,(%rsi) + movups %xmm3,16(%rsi) + movups %xmm4,32(%rsi) + movups %xmm5,48(%rsi) + jmp .Lecb_ret +.align 16 +.Lecb_enc_five: + xorps %xmm7,%xmm7 + call _aesni_encrypt6 + movups %xmm2,(%rsi) + movups %xmm3,16(%rsi) + movups %xmm4,32(%rsi) + movups %xmm5,48(%rsi) + movups %xmm6,64(%rsi) + jmp .Lecb_ret +.align 16 +.Lecb_enc_six: + call _aesni_encrypt6 + movups %xmm2,(%rsi) + movups %xmm3,16(%rsi) + movups %xmm4,32(%rsi) + movups %xmm5,48(%rsi) + movups %xmm6,64(%rsi) + movups %xmm7,80(%rsi) + jmp .Lecb_ret + +.align 16 +.Lecb_decrypt: + cmpq $0x80,%rdx + jb .Lecb_dec_tail + + movdqu (%rdi),%xmm2 + movdqu 16(%rdi),%xmm3 + movdqu 32(%rdi),%xmm4 + movdqu 48(%rdi),%xmm5 + movdqu 64(%rdi),%xmm6 + movdqu 80(%rdi),%xmm7 + movdqu 96(%rdi),%xmm8 + movdqu 112(%rdi),%xmm9 + leaq 128(%rdi),%rdi + subq $0x80,%rdx + jmp .Lecb_dec_loop8_enter +.align 16 +.Lecb_dec_loop8: + movups %xmm2,(%rsi) + movq %r11,%rcx + movdqu (%rdi),%xmm2 + movl %r10d,%eax + movups %xmm3,16(%rsi) + movdqu 16(%rdi),%xmm3 + movups %xmm4,32(%rsi) + movdqu 32(%rdi),%xmm4 + movups %xmm5,48(%rsi) + movdqu 48(%rdi),%xmm5 + movups %xmm6,64(%rsi) + movdqu 64(%rdi),%xmm6 + movups %xmm7,80(%rsi) + movdqu 80(%rdi),%xmm7 + movups %xmm8,96(%rsi) + movdqu 96(%rdi),%xmm8 + movups %xmm9,112(%rsi) + leaq 128(%rsi),%rsi + movdqu 112(%rdi),%xmm9 + leaq 128(%rdi),%rdi +.Lecb_dec_loop8_enter: + + call _aesni_decrypt8 + + movups (%r11),%xmm0 + subq $0x80,%rdx + jnc .Lecb_dec_loop8 + + movups %xmm2,(%rsi) + pxor %xmm2,%xmm2 + movq %r11,%rcx + movups %xmm3,16(%rsi) + pxor %xmm3,%xmm3 + movl %r10d,%eax + movups %xmm4,32(%rsi) + pxor %xmm4,%xmm4 + movups %xmm5,48(%rsi) + pxor %xmm5,%xmm5 + movups %xmm6,64(%rsi) + pxor %xmm6,%xmm6 + movups %xmm7,80(%rsi) + pxor %xmm7,%xmm7 + movups %xmm8,96(%rsi) + pxor %xmm8,%xmm8 + movups %xmm9,112(%rsi) + pxor %xmm9,%xmm9 + leaq 128(%rsi),%rsi + addq $0x80,%rdx + jz .Lecb_ret + +.Lecb_dec_tail: + movups (%rdi),%xmm2 + cmpq $0x20,%rdx + jb .Lecb_dec_one + movups 16(%rdi),%xmm3 + je .Lecb_dec_two + movups 32(%rdi),%xmm4 + cmpq $0x40,%rdx + jb .Lecb_dec_three + movups 48(%rdi),%xmm5 + je .Lecb_dec_four + movups 64(%rdi),%xmm6 + cmpq $0x60,%rdx + jb .Lecb_dec_five + movups 80(%rdi),%xmm7 + je .Lecb_dec_six + movups 96(%rdi),%xmm8 + movups (%rcx),%xmm0 + xorps %xmm9,%xmm9 + call _aesni_decrypt8 + movups %xmm2,(%rsi) + pxor %xmm2,%xmm2 + movups %xmm3,16(%rsi) + pxor %xmm3,%xmm3 + movups %xmm4,32(%rsi) + pxor %xmm4,%xmm4 + movups %xmm5,48(%rsi) + pxor %xmm5,%xmm5 + movups %xmm6,64(%rsi) + pxor %xmm6,%xmm6 + movups %xmm7,80(%rsi) + pxor %xmm7,%xmm7 + movups %xmm8,96(%rsi) + pxor %xmm8,%xmm8 + pxor %xmm9,%xmm9 + jmp .Lecb_ret +.align 16 +.Lecb_dec_one: + movups (%rcx),%xmm0 + movups 16(%rcx),%xmm1 + leaq 32(%rcx),%rcx + xorps %xmm0,%xmm2 +.Loop_dec1_4: +.byte 102,15,56,222,209 + decl %eax + movups (%rcx),%xmm1 + leaq 16(%rcx),%rcx + jnz .Loop_dec1_4 +.byte 102,15,56,223,209 + movups %xmm2,(%rsi) + pxor %xmm2,%xmm2 + jmp .Lecb_ret +.align 16 +.Lecb_dec_two: + call _aesni_decrypt2 + movups %xmm2,(%rsi) + pxor %xmm2,%xmm2 + movups %xmm3,16(%rsi) + pxor %xmm3,%xmm3 + jmp .Lecb_ret +.align 16 +.Lecb_dec_three: + call _aesni_decrypt3 + movups %xmm2,(%rsi) + pxor %xmm2,%xmm2 + movups %xmm3,16(%rsi) + pxor %xmm3,%xmm3 + movups %xmm4,32(%rsi) + pxor %xmm4,%xmm4 + jmp .Lecb_ret +.align 16 +.Lecb_dec_four: + call _aesni_decrypt4 + movups %xmm2,(%rsi) + pxor %xmm2,%xmm2 + movups %xmm3,16(%rsi) + pxor %xmm3,%xmm3 + movups %xmm4,32(%rsi) + pxor %xmm4,%xmm4 + movups %xmm5,48(%rsi) + pxor %xmm5,%xmm5 + jmp .Lecb_ret +.align 16 
+.Lecb_dec_five: + xorps %xmm7,%xmm7 + call _aesni_decrypt6 + movups %xmm2,(%rsi) + pxor %xmm2,%xmm2 + movups %xmm3,16(%rsi) + pxor %xmm3,%xmm3 + movups %xmm4,32(%rsi) + pxor %xmm4,%xmm4 + movups %xmm5,48(%rsi) + pxor %xmm5,%xmm5 + movups %xmm6,64(%rsi) + pxor %xmm6,%xmm6 + pxor %xmm7,%xmm7 + jmp .Lecb_ret +.align 16 +.Lecb_dec_six: + call _aesni_decrypt6 + movups %xmm2,(%rsi) + pxor %xmm2,%xmm2 + movups %xmm3,16(%rsi) + pxor %xmm3,%xmm3 + movups %xmm4,32(%rsi) + pxor %xmm4,%xmm4 + movups %xmm5,48(%rsi) + pxor %xmm5,%xmm5 + movups %xmm6,64(%rsi) + pxor %xmm6,%xmm6 + movups %xmm7,80(%rsi) + pxor %xmm7,%xmm7 + +.Lecb_ret: + xorps %xmm0,%xmm0 + pxor %xmm1,%xmm1 + .byte 0xf3,0xc3 +.cfi_endproc +.size aesni_ecb_encrypt,.-aesni_ecb_encrypt +.globl aesni_ccm64_encrypt_blocks +.type aesni_ccm64_encrypt_blocks,@function +.align 16 +aesni_ccm64_encrypt_blocks: +.cfi_startproc + movl 240(%rcx),%eax + movdqu (%r8),%xmm6 + movdqa .Lincrement64(%rip),%xmm9 + movdqa .Lbswap_mask(%rip),%xmm7 + + shll $4,%eax + movl $16,%r10d + leaq 0(%rcx),%r11 + movdqu (%r9),%xmm3 + movdqa %xmm6,%xmm2 + leaq 32(%rcx,%rax,1),%rcx +.byte 102,15,56,0,247 + subq %rax,%r10 + jmp .Lccm64_enc_outer +.align 16 +.Lccm64_enc_outer: + movups (%r11),%xmm0 + movq %r10,%rax + movups (%rdi),%xmm8 + + xorps %xmm0,%xmm2 + movups 16(%r11),%xmm1 + xorps %xmm8,%xmm0 + xorps %xmm0,%xmm3 + movups 32(%r11),%xmm0 + +.Lccm64_enc2_loop: +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 + movups (%rcx,%rax,1),%xmm1 + addq $32,%rax +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 + movups -16(%rcx,%rax,1),%xmm0 + jnz .Lccm64_enc2_loop +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 + paddq %xmm9,%xmm6 + decq %rdx +.byte 102,15,56,221,208 +.byte 102,15,56,221,216 + + leaq 16(%rdi),%rdi + xorps %xmm2,%xmm8 + movdqa %xmm6,%xmm2 + movups %xmm8,(%rsi) +.byte 102,15,56,0,215 + leaq 16(%rsi),%rsi + jnz .Lccm64_enc_outer + + pxor %xmm0,%xmm0 + pxor %xmm1,%xmm1 + pxor %xmm2,%xmm2 + movups %xmm3,(%r9) + pxor %xmm3,%xmm3 + pxor %xmm8,%xmm8 + pxor %xmm6,%xmm6 + .byte 0xf3,0xc3 +.cfi_endproc +.size aesni_ccm64_encrypt_blocks,.-aesni_ccm64_encrypt_blocks +.globl aesni_ccm64_decrypt_blocks +.type aesni_ccm64_decrypt_blocks,@function +.align 16 +aesni_ccm64_decrypt_blocks: +.cfi_startproc + movl 240(%rcx),%eax + movups (%r8),%xmm6 + movdqu (%r9),%xmm3 + movdqa .Lincrement64(%rip),%xmm9 + movdqa .Lbswap_mask(%rip),%xmm7 + + movaps %xmm6,%xmm2 + movl %eax,%r10d + movq %rcx,%r11 +.byte 102,15,56,0,247 + movups (%rcx),%xmm0 + movups 16(%rcx),%xmm1 + leaq 32(%rcx),%rcx + xorps %xmm0,%xmm2 +.Loop_enc1_5: +.byte 102,15,56,220,209 + decl %eax + movups (%rcx),%xmm1 + leaq 16(%rcx),%rcx + jnz .Loop_enc1_5 +.byte 102,15,56,221,209 + shll $4,%r10d + movl $16,%eax + movups (%rdi),%xmm8 + paddq %xmm9,%xmm6 + leaq 16(%rdi),%rdi + subq %r10,%rax + leaq 32(%r11,%r10,1),%rcx + movq %rax,%r10 + jmp .Lccm64_dec_outer +.align 16 +.Lccm64_dec_outer: + xorps %xmm2,%xmm8 + movdqa %xmm6,%xmm2 + movups %xmm8,(%rsi) + leaq 16(%rsi),%rsi +.byte 102,15,56,0,215 + + subq $1,%rdx + jz .Lccm64_dec_break + + movups (%r11),%xmm0 + movq %r10,%rax + movups 16(%r11),%xmm1 + xorps %xmm0,%xmm8 + xorps %xmm0,%xmm2 + xorps %xmm8,%xmm3 + movups 32(%r11),%xmm0 + jmp .Lccm64_dec2_loop +.align 16 +.Lccm64_dec2_loop: +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 + movups (%rcx,%rax,1),%xmm1 + addq $32,%rax +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 + movups -16(%rcx,%rax,1),%xmm0 + jnz .Lccm64_dec2_loop + movups (%rdi),%xmm8 + paddq %xmm9,%xmm6 +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 
+.byte 102,15,56,221,208 +.byte 102,15,56,221,216 + leaq 16(%rdi),%rdi + jmp .Lccm64_dec_outer + +.align 16 +.Lccm64_dec_break: + + movl 240(%r11),%eax + movups (%r11),%xmm0 + movups 16(%r11),%xmm1 + xorps %xmm0,%xmm8 + leaq 32(%r11),%r11 + xorps %xmm8,%xmm3 +.Loop_enc1_6: +.byte 102,15,56,220,217 + decl %eax + movups (%r11),%xmm1 + leaq 16(%r11),%r11 + jnz .Loop_enc1_6 +.byte 102,15,56,221,217 + pxor %xmm0,%xmm0 + pxor %xmm1,%xmm1 + pxor %xmm2,%xmm2 + movups %xmm3,(%r9) + pxor %xmm3,%xmm3 + pxor %xmm8,%xmm8 + pxor %xmm6,%xmm6 + .byte 0xf3,0xc3 +.cfi_endproc +.size aesni_ccm64_decrypt_blocks,.-aesni_ccm64_decrypt_blocks +.globl aesni_ctr32_encrypt_blocks +.type aesni_ctr32_encrypt_blocks,@function +.align 16 +aesni_ctr32_encrypt_blocks: +.cfi_startproc + cmpq $1,%rdx + jne .Lctr32_bulk + + + + movups (%r8),%xmm2 + movups (%rdi),%xmm3 + movl 240(%rcx),%edx + movups (%rcx),%xmm0 + movups 16(%rcx),%xmm1 + leaq 32(%rcx),%rcx + xorps %xmm0,%xmm2 +.Loop_enc1_7: +.byte 102,15,56,220,209 + decl %edx + movups (%rcx),%xmm1 + leaq 16(%rcx),%rcx + jnz .Loop_enc1_7 +.byte 102,15,56,221,209 + pxor %xmm0,%xmm0 + pxor %xmm1,%xmm1 + xorps %xmm3,%xmm2 + pxor %xmm3,%xmm3 + movups %xmm2,(%rsi) + xorps %xmm2,%xmm2 + jmp .Lctr32_epilogue + +.align 16 +.Lctr32_bulk: + leaq (%rsp),%r11 +.cfi_def_cfa_register %r11 + pushq %rbp +.cfi_offset %rbp,-16 + subq $128,%rsp + andq $-16,%rsp + + + + + movdqu (%r8),%xmm2 + movdqu (%rcx),%xmm0 + movl 12(%r8),%r8d + pxor %xmm0,%xmm2 + movl 12(%rcx),%ebp + movdqa %xmm2,0(%rsp) + bswapl %r8d + movdqa %xmm2,%xmm3 + movdqa %xmm2,%xmm4 + movdqa %xmm2,%xmm5 + movdqa %xmm2,64(%rsp) + movdqa %xmm2,80(%rsp) + movdqa %xmm2,96(%rsp) + movq %rdx,%r10 + movdqa %xmm2,112(%rsp) + + leaq 1(%r8),%rax + leaq 2(%r8),%rdx + bswapl %eax + bswapl %edx + xorl %ebp,%eax + xorl %ebp,%edx +.byte 102,15,58,34,216,3 + leaq 3(%r8),%rax + movdqa %xmm3,16(%rsp) +.byte 102,15,58,34,226,3 + bswapl %eax + movq %r10,%rdx + leaq 4(%r8),%r10 + movdqa %xmm4,32(%rsp) + xorl %ebp,%eax + bswapl %r10d +.byte 102,15,58,34,232,3 + xorl %ebp,%r10d + movdqa %xmm5,48(%rsp) + leaq 5(%r8),%r9 + movl %r10d,64+12(%rsp) + bswapl %r9d + leaq 6(%r8),%r10 + movl 240(%rcx),%eax + xorl %ebp,%r9d + bswapl %r10d + movl %r9d,80+12(%rsp) + xorl %ebp,%r10d + leaq 7(%r8),%r9 + movl %r10d,96+12(%rsp) + bswapl %r9d + movl OPENSSL_ia32cap_P+4(%rip),%r10d + xorl %ebp,%r9d + andl $71303168,%r10d + movl %r9d,112+12(%rsp) + + movups 16(%rcx),%xmm1 + + movdqa 64(%rsp),%xmm6 + movdqa 80(%rsp),%xmm7 + + cmpq $8,%rdx + jb .Lctr32_tail + + subq $6,%rdx + cmpl $4194304,%r10d + je .Lctr32_6x + + leaq 128(%rcx),%rcx + subq $2,%rdx + jmp .Lctr32_loop8 + +.align 16 +.Lctr32_6x: + shll $4,%eax + movl $48,%r10d + bswapl %ebp + leaq 32(%rcx,%rax,1),%rcx + subq %rax,%r10 + jmp .Lctr32_loop6 + +.align 16 +.Lctr32_loop6: + addl $6,%r8d + movups -48(%rcx,%r10,1),%xmm0 +.byte 102,15,56,220,209 + movl %r8d,%eax + xorl %ebp,%eax +.byte 102,15,56,220,217 +.byte 0x0f,0x38,0xf1,0x44,0x24,12 + leal 1(%r8),%eax +.byte 102,15,56,220,225 + xorl %ebp,%eax +.byte 0x0f,0x38,0xf1,0x44,0x24,28 +.byte 102,15,56,220,233 + leal 2(%r8),%eax + xorl %ebp,%eax +.byte 102,15,56,220,241 +.byte 0x0f,0x38,0xf1,0x44,0x24,44 + leal 3(%r8),%eax +.byte 102,15,56,220,249 + movups -32(%rcx,%r10,1),%xmm1 + xorl %ebp,%eax + +.byte 102,15,56,220,208 +.byte 0x0f,0x38,0xf1,0x44,0x24,60 + leal 4(%r8),%eax +.byte 102,15,56,220,216 + xorl %ebp,%eax +.byte 0x0f,0x38,0xf1,0x44,0x24,76 +.byte 102,15,56,220,224 + leal 5(%r8),%eax + xorl %ebp,%eax +.byte 102,15,56,220,232 +.byte 0x0f,0x38,0xf1,0x44,0x24,92 
+ movq %r10,%rax +.byte 102,15,56,220,240 +.byte 102,15,56,220,248 + movups -16(%rcx,%r10,1),%xmm0 + + call .Lenc_loop6 + + movdqu (%rdi),%xmm8 + movdqu 16(%rdi),%xmm9 + movdqu 32(%rdi),%xmm10 + movdqu 48(%rdi),%xmm11 + movdqu 64(%rdi),%xmm12 + movdqu 80(%rdi),%xmm13 + leaq 96(%rdi),%rdi + movups -64(%rcx,%r10,1),%xmm1 + pxor %xmm2,%xmm8 + movaps 0(%rsp),%xmm2 + pxor %xmm3,%xmm9 + movaps 16(%rsp),%xmm3 + pxor %xmm4,%xmm10 + movaps 32(%rsp),%xmm4 + pxor %xmm5,%xmm11 + movaps 48(%rsp),%xmm5 + pxor %xmm6,%xmm12 + movaps 64(%rsp),%xmm6 + pxor %xmm7,%xmm13 + movaps 80(%rsp),%xmm7 + movdqu %xmm8,(%rsi) + movdqu %xmm9,16(%rsi) + movdqu %xmm10,32(%rsi) + movdqu %xmm11,48(%rsi) + movdqu %xmm12,64(%rsi) + movdqu %xmm13,80(%rsi) + leaq 96(%rsi),%rsi + + subq $6,%rdx + jnc .Lctr32_loop6 + + addq $6,%rdx + jz .Lctr32_done + + leal -48(%r10),%eax + leaq -80(%rcx,%r10,1),%rcx + negl %eax + shrl $4,%eax + jmp .Lctr32_tail + +.align 32 +.Lctr32_loop8: + addl $8,%r8d + movdqa 96(%rsp),%xmm8 +.byte 102,15,56,220,209 + movl %r8d,%r9d + movdqa 112(%rsp),%xmm9 +.byte 102,15,56,220,217 + bswapl %r9d + movups 32-128(%rcx),%xmm0 +.byte 102,15,56,220,225 + xorl %ebp,%r9d + nop +.byte 102,15,56,220,233 + movl %r9d,0+12(%rsp) + leaq 1(%r8),%r9 +.byte 102,15,56,220,241 +.byte 102,15,56,220,249 +.byte 102,68,15,56,220,193 +.byte 102,68,15,56,220,201 + movups 48-128(%rcx),%xmm1 + bswapl %r9d +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 + xorl %ebp,%r9d +.byte 0x66,0x90 +.byte 102,15,56,220,224 +.byte 102,15,56,220,232 + movl %r9d,16+12(%rsp) + leaq 2(%r8),%r9 +.byte 102,15,56,220,240 +.byte 102,15,56,220,248 +.byte 102,68,15,56,220,192 +.byte 102,68,15,56,220,200 + movups 64-128(%rcx),%xmm0 + bswapl %r9d +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 + xorl %ebp,%r9d +.byte 0x66,0x90 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 + movl %r9d,32+12(%rsp) + leaq 3(%r8),%r9 +.byte 102,15,56,220,241 +.byte 102,15,56,220,249 +.byte 102,68,15,56,220,193 +.byte 102,68,15,56,220,201 + movups 80-128(%rcx),%xmm1 + bswapl %r9d +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 + xorl %ebp,%r9d +.byte 0x66,0x90 +.byte 102,15,56,220,224 +.byte 102,15,56,220,232 + movl %r9d,48+12(%rsp) + leaq 4(%r8),%r9 +.byte 102,15,56,220,240 +.byte 102,15,56,220,248 +.byte 102,68,15,56,220,192 +.byte 102,68,15,56,220,200 + movups 96-128(%rcx),%xmm0 + bswapl %r9d +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 + xorl %ebp,%r9d +.byte 0x66,0x90 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 + movl %r9d,64+12(%rsp) + leaq 5(%r8),%r9 +.byte 102,15,56,220,241 +.byte 102,15,56,220,249 +.byte 102,68,15,56,220,193 +.byte 102,68,15,56,220,201 + movups 112-128(%rcx),%xmm1 + bswapl %r9d +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 + xorl %ebp,%r9d +.byte 0x66,0x90 +.byte 102,15,56,220,224 +.byte 102,15,56,220,232 + movl %r9d,80+12(%rsp) + leaq 6(%r8),%r9 +.byte 102,15,56,220,240 +.byte 102,15,56,220,248 +.byte 102,68,15,56,220,192 +.byte 102,68,15,56,220,200 + movups 128-128(%rcx),%xmm0 + bswapl %r9d +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 + xorl %ebp,%r9d +.byte 0x66,0x90 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 + movl %r9d,96+12(%rsp) + leaq 7(%r8),%r9 +.byte 102,15,56,220,241 +.byte 102,15,56,220,249 +.byte 102,68,15,56,220,193 +.byte 102,68,15,56,220,201 + movups 144-128(%rcx),%xmm1 + bswapl %r9d +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 +.byte 102,15,56,220,224 + xorl %ebp,%r9d + movdqu 0(%rdi),%xmm10 +.byte 102,15,56,220,232 + movl %r9d,112+12(%rsp) + cmpl $11,%eax +.byte 102,15,56,220,240 +.byte 
102,15,56,220,248 +.byte 102,68,15,56,220,192 +.byte 102,68,15,56,220,200 + movups 160-128(%rcx),%xmm0 + + jb .Lctr32_enc_done + +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 +.byte 102,15,56,220,241 +.byte 102,15,56,220,249 +.byte 102,68,15,56,220,193 +.byte 102,68,15,56,220,201 + movups 176-128(%rcx),%xmm1 + +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 +.byte 102,15,56,220,224 +.byte 102,15,56,220,232 +.byte 102,15,56,220,240 +.byte 102,15,56,220,248 +.byte 102,68,15,56,220,192 +.byte 102,68,15,56,220,200 + movups 192-128(%rcx),%xmm0 + je .Lctr32_enc_done + +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 +.byte 102,15,56,220,241 +.byte 102,15,56,220,249 +.byte 102,68,15,56,220,193 +.byte 102,68,15,56,220,201 + movups 208-128(%rcx),%xmm1 + +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 +.byte 102,15,56,220,224 +.byte 102,15,56,220,232 +.byte 102,15,56,220,240 +.byte 102,15,56,220,248 +.byte 102,68,15,56,220,192 +.byte 102,68,15,56,220,200 + movups 224-128(%rcx),%xmm0 + jmp .Lctr32_enc_done + +.align 16 +.Lctr32_enc_done: + movdqu 16(%rdi),%xmm11 + pxor %xmm0,%xmm10 + movdqu 32(%rdi),%xmm12 + pxor %xmm0,%xmm11 + movdqu 48(%rdi),%xmm13 + pxor %xmm0,%xmm12 + movdqu 64(%rdi),%xmm14 + pxor %xmm0,%xmm13 + movdqu 80(%rdi),%xmm15 + pxor %xmm0,%xmm14 + pxor %xmm0,%xmm15 +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 +.byte 102,15,56,220,241 +.byte 102,15,56,220,249 +.byte 102,68,15,56,220,193 +.byte 102,68,15,56,220,201 + movdqu 96(%rdi),%xmm1 + leaq 128(%rdi),%rdi + +.byte 102,65,15,56,221,210 + pxor %xmm0,%xmm1 + movdqu 112-128(%rdi),%xmm10 +.byte 102,65,15,56,221,219 + pxor %xmm0,%xmm10 + movdqa 0(%rsp),%xmm11 +.byte 102,65,15,56,221,228 +.byte 102,65,15,56,221,237 + movdqa 16(%rsp),%xmm12 + movdqa 32(%rsp),%xmm13 +.byte 102,65,15,56,221,246 +.byte 102,65,15,56,221,255 + movdqa 48(%rsp),%xmm14 + movdqa 64(%rsp),%xmm15 +.byte 102,68,15,56,221,193 + movdqa 80(%rsp),%xmm0 + movups 16-128(%rcx),%xmm1 +.byte 102,69,15,56,221,202 + + movups %xmm2,(%rsi) + movdqa %xmm11,%xmm2 + movups %xmm3,16(%rsi) + movdqa %xmm12,%xmm3 + movups %xmm4,32(%rsi) + movdqa %xmm13,%xmm4 + movups %xmm5,48(%rsi) + movdqa %xmm14,%xmm5 + movups %xmm6,64(%rsi) + movdqa %xmm15,%xmm6 + movups %xmm7,80(%rsi) + movdqa %xmm0,%xmm7 + movups %xmm8,96(%rsi) + movups %xmm9,112(%rsi) + leaq 128(%rsi),%rsi + + subq $8,%rdx + jnc .Lctr32_loop8 + + addq $8,%rdx + jz .Lctr32_done + leaq -128(%rcx),%rcx + +.Lctr32_tail: + + + leaq 16(%rcx),%rcx + cmpq $4,%rdx + jb .Lctr32_loop3 + je .Lctr32_loop4 + + + shll $4,%eax + movdqa 96(%rsp),%xmm8 + pxor %xmm9,%xmm9 + + movups 16(%rcx),%xmm0 +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 + leaq 32-16(%rcx,%rax,1),%rcx + negq %rax +.byte 102,15,56,220,225 + addq $16,%rax + movups (%rdi),%xmm10 +.byte 102,15,56,220,233 +.byte 102,15,56,220,241 + movups 16(%rdi),%xmm11 + movups 32(%rdi),%xmm12 +.byte 102,15,56,220,249 +.byte 102,68,15,56,220,193 + + call .Lenc_loop8_enter + + movdqu 48(%rdi),%xmm13 + pxor %xmm10,%xmm2 + movdqu 64(%rdi),%xmm10 + pxor %xmm11,%xmm3 + movdqu %xmm2,(%rsi) + pxor %xmm12,%xmm4 + movdqu %xmm3,16(%rsi) + pxor %xmm13,%xmm5 + movdqu %xmm4,32(%rsi) + pxor %xmm10,%xmm6 + movdqu %xmm5,48(%rsi) + movdqu %xmm6,64(%rsi) + cmpq $6,%rdx + jb .Lctr32_done + + movups 80(%rdi),%xmm11 + xorps %xmm11,%xmm7 + movups %xmm7,80(%rsi) + je .Lctr32_done + + movups 96(%rdi),%xmm12 + xorps %xmm12,%xmm8 + movups %xmm8,96(%rsi) + jmp 
.Lctr32_done + +.align 32 +.Lctr32_loop4: +.byte 102,15,56,220,209 + leaq 16(%rcx),%rcx + decl %eax +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 + movups (%rcx),%xmm1 + jnz .Lctr32_loop4 +.byte 102,15,56,221,209 +.byte 102,15,56,221,217 + movups (%rdi),%xmm10 + movups 16(%rdi),%xmm11 +.byte 102,15,56,221,225 +.byte 102,15,56,221,233 + movups 32(%rdi),%xmm12 + movups 48(%rdi),%xmm13 + + xorps %xmm10,%xmm2 + movups %xmm2,(%rsi) + xorps %xmm11,%xmm3 + movups %xmm3,16(%rsi) + pxor %xmm12,%xmm4 + movdqu %xmm4,32(%rsi) + pxor %xmm13,%xmm5 + movdqu %xmm5,48(%rsi) + jmp .Lctr32_done + +.align 32 +.Lctr32_loop3: +.byte 102,15,56,220,209 + leaq 16(%rcx),%rcx + decl %eax +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 + movups (%rcx),%xmm1 + jnz .Lctr32_loop3 +.byte 102,15,56,221,209 +.byte 102,15,56,221,217 +.byte 102,15,56,221,225 + + movups (%rdi),%xmm10 + xorps %xmm10,%xmm2 + movups %xmm2,(%rsi) + cmpq $2,%rdx + jb .Lctr32_done + + movups 16(%rdi),%xmm11 + xorps %xmm11,%xmm3 + movups %xmm3,16(%rsi) + je .Lctr32_done + + movups 32(%rdi),%xmm12 + xorps %xmm12,%xmm4 + movups %xmm4,32(%rsi) + +.Lctr32_done: + xorps %xmm0,%xmm0 + xorl %ebp,%ebp + pxor %xmm1,%xmm1 + pxor %xmm2,%xmm2 + pxor %xmm3,%xmm3 + pxor %xmm4,%xmm4 + pxor %xmm5,%xmm5 + pxor %xmm6,%xmm6 + pxor %xmm7,%xmm7 + movaps %xmm0,0(%rsp) + pxor %xmm8,%xmm8 + movaps %xmm0,16(%rsp) + pxor %xmm9,%xmm9 + movaps %xmm0,32(%rsp) + pxor %xmm10,%xmm10 + movaps %xmm0,48(%rsp) + pxor %xmm11,%xmm11 + movaps %xmm0,64(%rsp) + pxor %xmm12,%xmm12 + movaps %xmm0,80(%rsp) + pxor %xmm13,%xmm13 + movaps %xmm0,96(%rsp) + pxor %xmm14,%xmm14 + movaps %xmm0,112(%rsp) + pxor %xmm15,%xmm15 + movq -8(%r11),%rbp +.cfi_restore %rbp + leaq (%r11),%rsp +.cfi_def_cfa_register %rsp +.Lctr32_epilogue: + .byte 0xf3,0xc3 +.cfi_endproc +.size aesni_ctr32_encrypt_blocks,.-aesni_ctr32_encrypt_blocks +.globl aesni_xts_encrypt +.type aesni_xts_encrypt,@function +.align 16 +aesni_xts_encrypt: +.cfi_startproc + leaq (%rsp),%r11 +.cfi_def_cfa_register %r11 + pushq %rbp +.cfi_offset %rbp,-16 + subq $112,%rsp + andq $-16,%rsp + movups (%r9),%xmm2 + movl 240(%r8),%eax + movl 240(%rcx),%r10d + movups (%r8),%xmm0 + movups 16(%r8),%xmm1 + leaq 32(%r8),%r8 + xorps %xmm0,%xmm2 +.Loop_enc1_8: +.byte 102,15,56,220,209 + decl %eax + movups (%r8),%xmm1 + leaq 16(%r8),%r8 + jnz .Loop_enc1_8 +.byte 102,15,56,221,209 + movups (%rcx),%xmm0 + movq %rcx,%rbp + movl %r10d,%eax + shll $4,%r10d + movq %rdx,%r9 + andq $-16,%rdx + + movups 16(%rcx,%r10,1),%xmm1 + + movdqa .Lxts_magic(%rip),%xmm8 + movdqa %xmm2,%xmm15 + pshufd $0x5f,%xmm2,%xmm9 + pxor %xmm0,%xmm1 + movdqa %xmm9,%xmm14 + paddd %xmm9,%xmm9 + movdqa %xmm15,%xmm10 + psrad $31,%xmm14 + paddq %xmm15,%xmm15 + pand %xmm8,%xmm14 + pxor %xmm0,%xmm10 + pxor %xmm14,%xmm15 + movdqa %xmm9,%xmm14 + paddd %xmm9,%xmm9 + movdqa %xmm15,%xmm11 + psrad $31,%xmm14 + paddq %xmm15,%xmm15 + pand %xmm8,%xmm14 + pxor %xmm0,%xmm11 + pxor %xmm14,%xmm15 + movdqa %xmm9,%xmm14 + paddd %xmm9,%xmm9 + movdqa %xmm15,%xmm12 + psrad $31,%xmm14 + paddq %xmm15,%xmm15 + pand %xmm8,%xmm14 + pxor %xmm0,%xmm12 + pxor %xmm14,%xmm15 + movdqa %xmm9,%xmm14 + paddd %xmm9,%xmm9 + movdqa %xmm15,%xmm13 + psrad $31,%xmm14 + paddq %xmm15,%xmm15 + pand %xmm8,%xmm14 + pxor %xmm0,%xmm13 + pxor %xmm14,%xmm15 + movdqa %xmm15,%xmm14 + psrad $31,%xmm9 + paddq %xmm15,%xmm15 + pand %xmm8,%xmm9 + pxor %xmm0,%xmm14 + pxor %xmm9,%xmm15 + movaps %xmm1,96(%rsp) + + subq $96,%rdx + jc .Lxts_enc_short + + movl $16+96,%eax + leaq 32(%rbp,%r10,1),%rcx + subq %r10,%rax + movups 
16(%rbp),%xmm1 + movq %rax,%r10 + leaq .Lxts_magic(%rip),%r8 + jmp .Lxts_enc_grandloop + +.align 32 +.Lxts_enc_grandloop: + movdqu 0(%rdi),%xmm2 + movdqa %xmm0,%xmm8 + movdqu 16(%rdi),%xmm3 + pxor %xmm10,%xmm2 + movdqu 32(%rdi),%xmm4 + pxor %xmm11,%xmm3 +.byte 102,15,56,220,209 + movdqu 48(%rdi),%xmm5 + pxor %xmm12,%xmm4 +.byte 102,15,56,220,217 + movdqu 64(%rdi),%xmm6 + pxor %xmm13,%xmm5 +.byte 102,15,56,220,225 + movdqu 80(%rdi),%xmm7 + pxor %xmm15,%xmm8 + movdqa 96(%rsp),%xmm9 + pxor %xmm14,%xmm6 +.byte 102,15,56,220,233 + movups 32(%rbp),%xmm0 + leaq 96(%rdi),%rdi + pxor %xmm8,%xmm7 + + pxor %xmm9,%xmm10 +.byte 102,15,56,220,241 + pxor %xmm9,%xmm11 + movdqa %xmm10,0(%rsp) +.byte 102,15,56,220,249 + movups 48(%rbp),%xmm1 + pxor %xmm9,%xmm12 + +.byte 102,15,56,220,208 + pxor %xmm9,%xmm13 + movdqa %xmm11,16(%rsp) +.byte 102,15,56,220,216 + pxor %xmm9,%xmm14 + movdqa %xmm12,32(%rsp) +.byte 102,15,56,220,224 +.byte 102,15,56,220,232 + pxor %xmm9,%xmm8 + movdqa %xmm14,64(%rsp) +.byte 102,15,56,220,240 +.byte 102,15,56,220,248 + movups 64(%rbp),%xmm0 + movdqa %xmm8,80(%rsp) + pshufd $0x5f,%xmm15,%xmm9 + jmp .Lxts_enc_loop6 +.align 32 +.Lxts_enc_loop6: +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 +.byte 102,15,56,220,241 +.byte 102,15,56,220,249 + movups -64(%rcx,%rax,1),%xmm1 + addq $32,%rax + +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 +.byte 102,15,56,220,224 +.byte 102,15,56,220,232 +.byte 102,15,56,220,240 +.byte 102,15,56,220,248 + movups -80(%rcx,%rax,1),%xmm0 + jnz .Lxts_enc_loop6 + + movdqa (%r8),%xmm8 + movdqa %xmm9,%xmm14 + paddd %xmm9,%xmm9 +.byte 102,15,56,220,209 + paddq %xmm15,%xmm15 + psrad $31,%xmm14 +.byte 102,15,56,220,217 + pand %xmm8,%xmm14 + movups (%rbp),%xmm10 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 +.byte 102,15,56,220,241 + pxor %xmm14,%xmm15 + movaps %xmm10,%xmm11 +.byte 102,15,56,220,249 + movups -64(%rcx),%xmm1 + + movdqa %xmm9,%xmm14 +.byte 102,15,56,220,208 + paddd %xmm9,%xmm9 + pxor %xmm15,%xmm10 +.byte 102,15,56,220,216 + psrad $31,%xmm14 + paddq %xmm15,%xmm15 +.byte 102,15,56,220,224 +.byte 102,15,56,220,232 + pand %xmm8,%xmm14 + movaps %xmm11,%xmm12 +.byte 102,15,56,220,240 + pxor %xmm14,%xmm15 + movdqa %xmm9,%xmm14 +.byte 102,15,56,220,248 + movups -48(%rcx),%xmm0 + + paddd %xmm9,%xmm9 +.byte 102,15,56,220,209 + pxor %xmm15,%xmm11 + psrad $31,%xmm14 +.byte 102,15,56,220,217 + paddq %xmm15,%xmm15 + pand %xmm8,%xmm14 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 + movdqa %xmm13,48(%rsp) + pxor %xmm14,%xmm15 +.byte 102,15,56,220,241 + movaps %xmm12,%xmm13 + movdqa %xmm9,%xmm14 +.byte 102,15,56,220,249 + movups -32(%rcx),%xmm1 + + paddd %xmm9,%xmm9 +.byte 102,15,56,220,208 + pxor %xmm15,%xmm12 + psrad $31,%xmm14 +.byte 102,15,56,220,216 + paddq %xmm15,%xmm15 + pand %xmm8,%xmm14 +.byte 102,15,56,220,224 +.byte 102,15,56,220,232 +.byte 102,15,56,220,240 + pxor %xmm14,%xmm15 + movaps %xmm13,%xmm14 +.byte 102,15,56,220,248 + + movdqa %xmm9,%xmm0 + paddd %xmm9,%xmm9 +.byte 102,15,56,220,209 + pxor %xmm15,%xmm13 + psrad $31,%xmm0 +.byte 102,15,56,220,217 + paddq %xmm15,%xmm15 + pand %xmm8,%xmm0 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 + pxor %xmm0,%xmm15 + movups (%rbp),%xmm0 +.byte 102,15,56,220,241 +.byte 102,15,56,220,249 + movups 16(%rbp),%xmm1 + + pxor %xmm15,%xmm14 +.byte 102,15,56,221,84,36,0 + psrad $31,%xmm9 + paddq %xmm15,%xmm15 +.byte 102,15,56,221,92,36,16 +.byte 102,15,56,221,100,36,32 + pand %xmm8,%xmm9 + movq %r10,%rax +.byte 102,15,56,221,108,36,48 +.byte 
102,15,56,221,116,36,64 +.byte 102,15,56,221,124,36,80 + pxor %xmm9,%xmm15 + + leaq 96(%rsi),%rsi + movups %xmm2,-96(%rsi) + movups %xmm3,-80(%rsi) + movups %xmm4,-64(%rsi) + movups %xmm5,-48(%rsi) + movups %xmm6,-32(%rsi) + movups %xmm7,-16(%rsi) + subq $96,%rdx + jnc .Lxts_enc_grandloop + + movl $16+96,%eax + subl %r10d,%eax + movq %rbp,%rcx + shrl $4,%eax + +.Lxts_enc_short: + + movl %eax,%r10d + pxor %xmm0,%xmm10 + addq $96,%rdx + jz .Lxts_enc_done + + pxor %xmm0,%xmm11 + cmpq $0x20,%rdx + jb .Lxts_enc_one + pxor %xmm0,%xmm12 + je .Lxts_enc_two + + pxor %xmm0,%xmm13 + cmpq $0x40,%rdx + jb .Lxts_enc_three + pxor %xmm0,%xmm14 + je .Lxts_enc_four + + movdqu (%rdi),%xmm2 + movdqu 16(%rdi),%xmm3 + movdqu 32(%rdi),%xmm4 + pxor %xmm10,%xmm2 + movdqu 48(%rdi),%xmm5 + pxor %xmm11,%xmm3 + movdqu 64(%rdi),%xmm6 + leaq 80(%rdi),%rdi + pxor %xmm12,%xmm4 + pxor %xmm13,%xmm5 + pxor %xmm14,%xmm6 + pxor %xmm7,%xmm7 + + call _aesni_encrypt6 + + xorps %xmm10,%xmm2 + movdqa %xmm15,%xmm10 + xorps %xmm11,%xmm3 + xorps %xmm12,%xmm4 + movdqu %xmm2,(%rsi) + xorps %xmm13,%xmm5 + movdqu %xmm3,16(%rsi) + xorps %xmm14,%xmm6 + movdqu %xmm4,32(%rsi) + movdqu %xmm5,48(%rsi) + movdqu %xmm6,64(%rsi) + leaq 80(%rsi),%rsi + jmp .Lxts_enc_done + +.align 16 +.Lxts_enc_one: + movups (%rdi),%xmm2 + leaq 16(%rdi),%rdi + xorps %xmm10,%xmm2 + movups (%rcx),%xmm0 + movups 16(%rcx),%xmm1 + leaq 32(%rcx),%rcx + xorps %xmm0,%xmm2 +.Loop_enc1_9: +.byte 102,15,56,220,209 + decl %eax + movups (%rcx),%xmm1 + leaq 16(%rcx),%rcx + jnz .Loop_enc1_9 +.byte 102,15,56,221,209 + xorps %xmm10,%xmm2 + movdqa %xmm11,%xmm10 + movups %xmm2,(%rsi) + leaq 16(%rsi),%rsi + jmp .Lxts_enc_done + +.align 16 +.Lxts_enc_two: + movups (%rdi),%xmm2 + movups 16(%rdi),%xmm3 + leaq 32(%rdi),%rdi + xorps %xmm10,%xmm2 + xorps %xmm11,%xmm3 + + call _aesni_encrypt2 + + xorps %xmm10,%xmm2 + movdqa %xmm12,%xmm10 + xorps %xmm11,%xmm3 + movups %xmm2,(%rsi) + movups %xmm3,16(%rsi) + leaq 32(%rsi),%rsi + jmp .Lxts_enc_done + +.align 16 +.Lxts_enc_three: + movups (%rdi),%xmm2 + movups 16(%rdi),%xmm3 + movups 32(%rdi),%xmm4 + leaq 48(%rdi),%rdi + xorps %xmm10,%xmm2 + xorps %xmm11,%xmm3 + xorps %xmm12,%xmm4 + + call _aesni_encrypt3 + + xorps %xmm10,%xmm2 + movdqa %xmm13,%xmm10 + xorps %xmm11,%xmm3 + xorps %xmm12,%xmm4 + movups %xmm2,(%rsi) + movups %xmm3,16(%rsi) + movups %xmm4,32(%rsi) + leaq 48(%rsi),%rsi + jmp .Lxts_enc_done + +.align 16 +.Lxts_enc_four: + movups (%rdi),%xmm2 + movups 16(%rdi),%xmm3 + movups 32(%rdi),%xmm4 + xorps %xmm10,%xmm2 + movups 48(%rdi),%xmm5 + leaq 64(%rdi),%rdi + xorps %xmm11,%xmm3 + xorps %xmm12,%xmm4 + xorps %xmm13,%xmm5 + + call _aesni_encrypt4 + + pxor %xmm10,%xmm2 + movdqa %xmm14,%xmm10 + pxor %xmm11,%xmm3 + pxor %xmm12,%xmm4 + movdqu %xmm2,(%rsi) + pxor %xmm13,%xmm5 + movdqu %xmm3,16(%rsi) + movdqu %xmm4,32(%rsi) + movdqu %xmm5,48(%rsi) + leaq 64(%rsi),%rsi + jmp .Lxts_enc_done + +.align 16 +.Lxts_enc_done: + andq $15,%r9 + jz .Lxts_enc_ret + movq %r9,%rdx + +.Lxts_enc_steal: + movzbl (%rdi),%eax + movzbl -16(%rsi),%ecx + leaq 1(%rdi),%rdi + movb %al,-16(%rsi) + movb %cl,0(%rsi) + leaq 1(%rsi),%rsi + subq $1,%rdx + jnz .Lxts_enc_steal + + subq %r9,%rsi + movq %rbp,%rcx + movl %r10d,%eax + + movups -16(%rsi),%xmm2 + xorps %xmm10,%xmm2 + movups (%rcx),%xmm0 + movups 16(%rcx),%xmm1 + leaq 32(%rcx),%rcx + xorps %xmm0,%xmm2 +.Loop_enc1_10: +.byte 102,15,56,220,209 + decl %eax + movups (%rcx),%xmm1 + leaq 16(%rcx),%rcx + jnz .Loop_enc1_10 +.byte 102,15,56,221,209 + xorps %xmm10,%xmm2 + movups %xmm2,-16(%rsi) + +.Lxts_enc_ret: + xorps 
%xmm0,%xmm0 + pxor %xmm1,%xmm1 + pxor %xmm2,%xmm2 + pxor %xmm3,%xmm3 + pxor %xmm4,%xmm4 + pxor %xmm5,%xmm5 + pxor %xmm6,%xmm6 + pxor %xmm7,%xmm7 + movaps %xmm0,0(%rsp) + pxor %xmm8,%xmm8 + movaps %xmm0,16(%rsp) + pxor %xmm9,%xmm9 + movaps %xmm0,32(%rsp) + pxor %xmm10,%xmm10 + movaps %xmm0,48(%rsp) + pxor %xmm11,%xmm11 + movaps %xmm0,64(%rsp) + pxor %xmm12,%xmm12 + movaps %xmm0,80(%rsp) + pxor %xmm13,%xmm13 + movaps %xmm0,96(%rsp) + pxor %xmm14,%xmm14 + pxor %xmm15,%xmm15 + movq -8(%r11),%rbp +.cfi_restore %rbp + leaq (%r11),%rsp +.cfi_def_cfa_register %rsp +.Lxts_enc_epilogue: + .byte 0xf3,0xc3 +.cfi_endproc +.size aesni_xts_encrypt,.-aesni_xts_encrypt +.globl aesni_xts_decrypt +.type aesni_xts_decrypt,@function +.align 16 +aesni_xts_decrypt: +.cfi_startproc + leaq (%rsp),%r11 +.cfi_def_cfa_register %r11 + pushq %rbp +.cfi_offset %rbp,-16 + subq $112,%rsp + andq $-16,%rsp + movups (%r9),%xmm2 + movl 240(%r8),%eax + movl 240(%rcx),%r10d + movups (%r8),%xmm0 + movups 16(%r8),%xmm1 + leaq 32(%r8),%r8 + xorps %xmm0,%xmm2 +.Loop_enc1_11: +.byte 102,15,56,220,209 + decl %eax + movups (%r8),%xmm1 + leaq 16(%r8),%r8 + jnz .Loop_enc1_11 +.byte 102,15,56,221,209 + xorl %eax,%eax + testq $15,%rdx + setnz %al + shlq $4,%rax + subq %rax,%rdx + + movups (%rcx),%xmm0 + movq %rcx,%rbp + movl %r10d,%eax + shll $4,%r10d + movq %rdx,%r9 + andq $-16,%rdx + + movups 16(%rcx,%r10,1),%xmm1 + + movdqa .Lxts_magic(%rip),%xmm8 + movdqa %xmm2,%xmm15 + pshufd $0x5f,%xmm2,%xmm9 + pxor %xmm0,%xmm1 + movdqa %xmm9,%xmm14 + paddd %xmm9,%xmm9 + movdqa %xmm15,%xmm10 + psrad $31,%xmm14 + paddq %xmm15,%xmm15 + pand %xmm8,%xmm14 + pxor %xmm0,%xmm10 + pxor %xmm14,%xmm15 + movdqa %xmm9,%xmm14 + paddd %xmm9,%xmm9 + movdqa %xmm15,%xmm11 + psrad $31,%xmm14 + paddq %xmm15,%xmm15 + pand %xmm8,%xmm14 + pxor %xmm0,%xmm11 + pxor %xmm14,%xmm15 + movdqa %xmm9,%xmm14 + paddd %xmm9,%xmm9 + movdqa %xmm15,%xmm12 + psrad $31,%xmm14 + paddq %xmm15,%xmm15 + pand %xmm8,%xmm14 + pxor %xmm0,%xmm12 + pxor %xmm14,%xmm15 + movdqa %xmm9,%xmm14 + paddd %xmm9,%xmm9 + movdqa %xmm15,%xmm13 + psrad $31,%xmm14 + paddq %xmm15,%xmm15 + pand %xmm8,%xmm14 + pxor %xmm0,%xmm13 + pxor %xmm14,%xmm15 + movdqa %xmm15,%xmm14 + psrad $31,%xmm9 + paddq %xmm15,%xmm15 + pand %xmm8,%xmm9 + pxor %xmm0,%xmm14 + pxor %xmm9,%xmm15 + movaps %xmm1,96(%rsp) + + subq $96,%rdx + jc .Lxts_dec_short + + movl $16+96,%eax + leaq 32(%rbp,%r10,1),%rcx + subq %r10,%rax + movups 16(%rbp),%xmm1 + movq %rax,%r10 + leaq .Lxts_magic(%rip),%r8 + jmp .Lxts_dec_grandloop + +.align 32 +.Lxts_dec_grandloop: + movdqu 0(%rdi),%xmm2 + movdqa %xmm0,%xmm8 + movdqu 16(%rdi),%xmm3 + pxor %xmm10,%xmm2 + movdqu 32(%rdi),%xmm4 + pxor %xmm11,%xmm3 +.byte 102,15,56,222,209 + movdqu 48(%rdi),%xmm5 + pxor %xmm12,%xmm4 +.byte 102,15,56,222,217 + movdqu 64(%rdi),%xmm6 + pxor %xmm13,%xmm5 +.byte 102,15,56,222,225 + movdqu 80(%rdi),%xmm7 + pxor %xmm15,%xmm8 + movdqa 96(%rsp),%xmm9 + pxor %xmm14,%xmm6 +.byte 102,15,56,222,233 + movups 32(%rbp),%xmm0 + leaq 96(%rdi),%rdi + pxor %xmm8,%xmm7 + + pxor %xmm9,%xmm10 +.byte 102,15,56,222,241 + pxor %xmm9,%xmm11 + movdqa %xmm10,0(%rsp) +.byte 102,15,56,222,249 + movups 48(%rbp),%xmm1 + pxor %xmm9,%xmm12 + +.byte 102,15,56,222,208 + pxor %xmm9,%xmm13 + movdqa %xmm11,16(%rsp) +.byte 102,15,56,222,216 + pxor %xmm9,%xmm14 + movdqa %xmm12,32(%rsp) +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 + pxor %xmm9,%xmm8 + movdqa %xmm14,64(%rsp) +.byte 102,15,56,222,240 +.byte 102,15,56,222,248 + movups 64(%rbp),%xmm0 + movdqa %xmm8,80(%rsp) + pshufd $0x5f,%xmm15,%xmm9 + jmp 
.Lxts_dec_loop6 +.align 32 +.Lxts_dec_loop6: +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 +.byte 102,15,56,222,241 +.byte 102,15,56,222,249 + movups -64(%rcx,%rax,1),%xmm1 + addq $32,%rax + +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 +.byte 102,15,56,222,240 +.byte 102,15,56,222,248 + movups -80(%rcx,%rax,1),%xmm0 + jnz .Lxts_dec_loop6 + + movdqa (%r8),%xmm8 + movdqa %xmm9,%xmm14 + paddd %xmm9,%xmm9 +.byte 102,15,56,222,209 + paddq %xmm15,%xmm15 + psrad $31,%xmm14 +.byte 102,15,56,222,217 + pand %xmm8,%xmm14 + movups (%rbp),%xmm10 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 +.byte 102,15,56,222,241 + pxor %xmm14,%xmm15 + movaps %xmm10,%xmm11 +.byte 102,15,56,222,249 + movups -64(%rcx),%xmm1 + + movdqa %xmm9,%xmm14 +.byte 102,15,56,222,208 + paddd %xmm9,%xmm9 + pxor %xmm15,%xmm10 +.byte 102,15,56,222,216 + psrad $31,%xmm14 + paddq %xmm15,%xmm15 +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 + pand %xmm8,%xmm14 + movaps %xmm11,%xmm12 +.byte 102,15,56,222,240 + pxor %xmm14,%xmm15 + movdqa %xmm9,%xmm14 +.byte 102,15,56,222,248 + movups -48(%rcx),%xmm0 + + paddd %xmm9,%xmm9 +.byte 102,15,56,222,209 + pxor %xmm15,%xmm11 + psrad $31,%xmm14 +.byte 102,15,56,222,217 + paddq %xmm15,%xmm15 + pand %xmm8,%xmm14 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 + movdqa %xmm13,48(%rsp) + pxor %xmm14,%xmm15 +.byte 102,15,56,222,241 + movaps %xmm12,%xmm13 + movdqa %xmm9,%xmm14 +.byte 102,15,56,222,249 + movups -32(%rcx),%xmm1 + + paddd %xmm9,%xmm9 +.byte 102,15,56,222,208 + pxor %xmm15,%xmm12 + psrad $31,%xmm14 +.byte 102,15,56,222,216 + paddq %xmm15,%xmm15 + pand %xmm8,%xmm14 +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 +.byte 102,15,56,222,240 + pxor %xmm14,%xmm15 + movaps %xmm13,%xmm14 +.byte 102,15,56,222,248 + + movdqa %xmm9,%xmm0 + paddd %xmm9,%xmm9 +.byte 102,15,56,222,209 + pxor %xmm15,%xmm13 + psrad $31,%xmm0 +.byte 102,15,56,222,217 + paddq %xmm15,%xmm15 + pand %xmm8,%xmm0 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 + pxor %xmm0,%xmm15 + movups (%rbp),%xmm0 +.byte 102,15,56,222,241 +.byte 102,15,56,222,249 + movups 16(%rbp),%xmm1 + + pxor %xmm15,%xmm14 +.byte 102,15,56,223,84,36,0 + psrad $31,%xmm9 + paddq %xmm15,%xmm15 +.byte 102,15,56,223,92,36,16 +.byte 102,15,56,223,100,36,32 + pand %xmm8,%xmm9 + movq %r10,%rax +.byte 102,15,56,223,108,36,48 +.byte 102,15,56,223,116,36,64 +.byte 102,15,56,223,124,36,80 + pxor %xmm9,%xmm15 + + leaq 96(%rsi),%rsi + movups %xmm2,-96(%rsi) + movups %xmm3,-80(%rsi) + movups %xmm4,-64(%rsi) + movups %xmm5,-48(%rsi) + movups %xmm6,-32(%rsi) + movups %xmm7,-16(%rsi) + subq $96,%rdx + jnc .Lxts_dec_grandloop + + movl $16+96,%eax + subl %r10d,%eax + movq %rbp,%rcx + shrl $4,%eax + +.Lxts_dec_short: + + movl %eax,%r10d + pxor %xmm0,%xmm10 + pxor %xmm0,%xmm11 + addq $96,%rdx + jz .Lxts_dec_done + + pxor %xmm0,%xmm12 + cmpq $0x20,%rdx + jb .Lxts_dec_one + pxor %xmm0,%xmm13 + je .Lxts_dec_two + + pxor %xmm0,%xmm14 + cmpq $0x40,%rdx + jb .Lxts_dec_three + je .Lxts_dec_four + + movdqu (%rdi),%xmm2 + movdqu 16(%rdi),%xmm3 + movdqu 32(%rdi),%xmm4 + pxor %xmm10,%xmm2 + movdqu 48(%rdi),%xmm5 + pxor %xmm11,%xmm3 + movdqu 64(%rdi),%xmm6 + leaq 80(%rdi),%rdi + pxor %xmm12,%xmm4 + pxor %xmm13,%xmm5 + pxor %xmm14,%xmm6 + + call _aesni_decrypt6 + + xorps %xmm10,%xmm2 + xorps %xmm11,%xmm3 + xorps %xmm12,%xmm4 + movdqu %xmm2,(%rsi) + xorps %xmm13,%xmm5 + movdqu %xmm3,16(%rsi) + xorps %xmm14,%xmm6 + movdqu %xmm4,32(%rsi) + pxor %xmm14,%xmm14 + movdqu 
%xmm5,48(%rsi) + pcmpgtd %xmm15,%xmm14 + movdqu %xmm6,64(%rsi) + leaq 80(%rsi),%rsi + pshufd $0x13,%xmm14,%xmm11 + andq $15,%r9 + jz .Lxts_dec_ret + + movdqa %xmm15,%xmm10 + paddq %xmm15,%xmm15 + pand %xmm8,%xmm11 + pxor %xmm15,%xmm11 + jmp .Lxts_dec_done2 + +.align 16 +.Lxts_dec_one: + movups (%rdi),%xmm2 + leaq 16(%rdi),%rdi + xorps %xmm10,%xmm2 + movups (%rcx),%xmm0 + movups 16(%rcx),%xmm1 + leaq 32(%rcx),%rcx + xorps %xmm0,%xmm2 +.Loop_dec1_12: +.byte 102,15,56,222,209 + decl %eax + movups (%rcx),%xmm1 + leaq 16(%rcx),%rcx + jnz .Loop_dec1_12 +.byte 102,15,56,223,209 + xorps %xmm10,%xmm2 + movdqa %xmm11,%xmm10 + movups %xmm2,(%rsi) + movdqa %xmm12,%xmm11 + leaq 16(%rsi),%rsi + jmp .Lxts_dec_done + +.align 16 +.Lxts_dec_two: + movups (%rdi),%xmm2 + movups 16(%rdi),%xmm3 + leaq 32(%rdi),%rdi + xorps %xmm10,%xmm2 + xorps %xmm11,%xmm3 + + call _aesni_decrypt2 + + xorps %xmm10,%xmm2 + movdqa %xmm12,%xmm10 + xorps %xmm11,%xmm3 + movdqa %xmm13,%xmm11 + movups %xmm2,(%rsi) + movups %xmm3,16(%rsi) + leaq 32(%rsi),%rsi + jmp .Lxts_dec_done + +.align 16 +.Lxts_dec_three: + movups (%rdi),%xmm2 + movups 16(%rdi),%xmm3 + movups 32(%rdi),%xmm4 + leaq 48(%rdi),%rdi + xorps %xmm10,%xmm2 + xorps %xmm11,%xmm3 + xorps %xmm12,%xmm4 + + call _aesni_decrypt3 + + xorps %xmm10,%xmm2 + movdqa %xmm13,%xmm10 + xorps %xmm11,%xmm3 + movdqa %xmm14,%xmm11 + xorps %xmm12,%xmm4 + movups %xmm2,(%rsi) + movups %xmm3,16(%rsi) + movups %xmm4,32(%rsi) + leaq 48(%rsi),%rsi + jmp .Lxts_dec_done + +.align 16 +.Lxts_dec_four: + movups (%rdi),%xmm2 + movups 16(%rdi),%xmm3 + movups 32(%rdi),%xmm4 + xorps %xmm10,%xmm2 + movups 48(%rdi),%xmm5 + leaq 64(%rdi),%rdi + xorps %xmm11,%xmm3 + xorps %xmm12,%xmm4 + xorps %xmm13,%xmm5 + + call _aesni_decrypt4 + + pxor %xmm10,%xmm2 + movdqa %xmm14,%xmm10 + pxor %xmm11,%xmm3 + movdqa %xmm15,%xmm11 + pxor %xmm12,%xmm4 + movdqu %xmm2,(%rsi) + pxor %xmm13,%xmm5 + movdqu %xmm3,16(%rsi) + movdqu %xmm4,32(%rsi) + movdqu %xmm5,48(%rsi) + leaq 64(%rsi),%rsi + jmp .Lxts_dec_done + +.align 16 +.Lxts_dec_done: + andq $15,%r9 + jz .Lxts_dec_ret +.Lxts_dec_done2: + movq %r9,%rdx + movq %rbp,%rcx + movl %r10d,%eax + + movups (%rdi),%xmm2 + xorps %xmm11,%xmm2 + movups (%rcx),%xmm0 + movups 16(%rcx),%xmm1 + leaq 32(%rcx),%rcx + xorps %xmm0,%xmm2 +.Loop_dec1_13: +.byte 102,15,56,222,209 + decl %eax + movups (%rcx),%xmm1 + leaq 16(%rcx),%rcx + jnz .Loop_dec1_13 +.byte 102,15,56,223,209 + xorps %xmm11,%xmm2 + movups %xmm2,(%rsi) + +.Lxts_dec_steal: + movzbl 16(%rdi),%eax + movzbl (%rsi),%ecx + leaq 1(%rdi),%rdi + movb %al,(%rsi) + movb %cl,16(%rsi) + leaq 1(%rsi),%rsi + subq $1,%rdx + jnz .Lxts_dec_steal + + subq %r9,%rsi + movq %rbp,%rcx + movl %r10d,%eax + + movups (%rsi),%xmm2 + xorps %xmm10,%xmm2 + movups (%rcx),%xmm0 + movups 16(%rcx),%xmm1 + leaq 32(%rcx),%rcx + xorps %xmm0,%xmm2 +.Loop_dec1_14: +.byte 102,15,56,222,209 + decl %eax + movups (%rcx),%xmm1 + leaq 16(%rcx),%rcx + jnz .Loop_dec1_14 +.byte 102,15,56,223,209 + xorps %xmm10,%xmm2 + movups %xmm2,(%rsi) + +.Lxts_dec_ret: + xorps %xmm0,%xmm0 + pxor %xmm1,%xmm1 + pxor %xmm2,%xmm2 + pxor %xmm3,%xmm3 + pxor %xmm4,%xmm4 + pxor %xmm5,%xmm5 + pxor %xmm6,%xmm6 + pxor %xmm7,%xmm7 + movaps %xmm0,0(%rsp) + pxor %xmm8,%xmm8 + movaps %xmm0,16(%rsp) + pxor %xmm9,%xmm9 + movaps %xmm0,32(%rsp) + pxor %xmm10,%xmm10 + movaps %xmm0,48(%rsp) + pxor %xmm11,%xmm11 + movaps %xmm0,64(%rsp) + pxor %xmm12,%xmm12 + movaps %xmm0,80(%rsp) + pxor %xmm13,%xmm13 + movaps %xmm0,96(%rsp) + pxor %xmm14,%xmm14 + pxor %xmm15,%xmm15 + movq -8(%r11),%rbp +.cfi_restore %rbp + leaq 
(%r11),%rsp +.cfi_def_cfa_register %rsp +.Lxts_dec_epilogue: + .byte 0xf3,0xc3 +.cfi_endproc +.size aesni_xts_decrypt,.-aesni_xts_decrypt +.globl aesni_ocb_encrypt +.type aesni_ocb_encrypt,@function +.align 32 +aesni_ocb_encrypt: +.cfi_startproc + leaq (%rsp),%rax + pushq %rbx +.cfi_adjust_cfa_offset 8 +.cfi_offset %rbx,-16 + pushq %rbp +.cfi_adjust_cfa_offset 8 +.cfi_offset %rbp,-24 + pushq %r12 +.cfi_adjust_cfa_offset 8 +.cfi_offset %r12,-32 + pushq %r13 +.cfi_adjust_cfa_offset 8 +.cfi_offset %r13,-40 + pushq %r14 +.cfi_adjust_cfa_offset 8 +.cfi_offset %r14,-48 + movq 8(%rax),%rbx + movq 8+8(%rax),%rbp + + movl 240(%rcx),%r10d + movq %rcx,%r11 + shll $4,%r10d + movups (%rcx),%xmm9 + movups 16(%rcx,%r10,1),%xmm1 + + movdqu (%r9),%xmm15 + pxor %xmm1,%xmm9 + pxor %xmm1,%xmm15 + + movl $16+32,%eax + leaq 32(%r11,%r10,1),%rcx + movups 16(%r11),%xmm1 + subq %r10,%rax + movq %rax,%r10 + + movdqu (%rbx),%xmm10 + movdqu (%rbp),%xmm8 + + testq $1,%r8 + jnz .Locb_enc_odd + + bsfq %r8,%r12 + addq $1,%r8 + shlq $4,%r12 + movdqu (%rbx,%r12,1),%xmm7 + movdqu (%rdi),%xmm2 + leaq 16(%rdi),%rdi + + call __ocb_encrypt1 + + movdqa %xmm7,%xmm15 + movups %xmm2,(%rsi) + leaq 16(%rsi),%rsi + subq $1,%rdx + jz .Locb_enc_done + +.Locb_enc_odd: + leaq 1(%r8),%r12 + leaq 3(%r8),%r13 + leaq 5(%r8),%r14 + leaq 6(%r8),%r8 + bsfq %r12,%r12 + bsfq %r13,%r13 + bsfq %r14,%r14 + shlq $4,%r12 + shlq $4,%r13 + shlq $4,%r14 + + subq $6,%rdx + jc .Locb_enc_short + jmp .Locb_enc_grandloop + +.align 32 +.Locb_enc_grandloop: + movdqu 0(%rdi),%xmm2 + movdqu 16(%rdi),%xmm3 + movdqu 32(%rdi),%xmm4 + movdqu 48(%rdi),%xmm5 + movdqu 64(%rdi),%xmm6 + movdqu 80(%rdi),%xmm7 + leaq 96(%rdi),%rdi + + call __ocb_encrypt6 + + movups %xmm2,0(%rsi) + movups %xmm3,16(%rsi) + movups %xmm4,32(%rsi) + movups %xmm5,48(%rsi) + movups %xmm6,64(%rsi) + movups %xmm7,80(%rsi) + leaq 96(%rsi),%rsi + subq $6,%rdx + jnc .Locb_enc_grandloop + +.Locb_enc_short: + addq $6,%rdx + jz .Locb_enc_done + + movdqu 0(%rdi),%xmm2 + cmpq $2,%rdx + jb .Locb_enc_one + movdqu 16(%rdi),%xmm3 + je .Locb_enc_two + + movdqu 32(%rdi),%xmm4 + cmpq $4,%rdx + jb .Locb_enc_three + movdqu 48(%rdi),%xmm5 + je .Locb_enc_four + + movdqu 64(%rdi),%xmm6 + pxor %xmm7,%xmm7 + + call __ocb_encrypt6 + + movdqa %xmm14,%xmm15 + movups %xmm2,0(%rsi) + movups %xmm3,16(%rsi) + movups %xmm4,32(%rsi) + movups %xmm5,48(%rsi) + movups %xmm6,64(%rsi) + + jmp .Locb_enc_done + +.align 16 +.Locb_enc_one: + movdqa %xmm10,%xmm7 + + call __ocb_encrypt1 + + movdqa %xmm7,%xmm15 + movups %xmm2,0(%rsi) + jmp .Locb_enc_done + +.align 16 +.Locb_enc_two: + pxor %xmm4,%xmm4 + pxor %xmm5,%xmm5 + + call __ocb_encrypt4 + + movdqa %xmm11,%xmm15 + movups %xmm2,0(%rsi) + movups %xmm3,16(%rsi) + + jmp .Locb_enc_done + +.align 16 +.Locb_enc_three: + pxor %xmm5,%xmm5 + + call __ocb_encrypt4 + + movdqa %xmm12,%xmm15 + movups %xmm2,0(%rsi) + movups %xmm3,16(%rsi) + movups %xmm4,32(%rsi) + + jmp .Locb_enc_done + +.align 16 +.Locb_enc_four: + call __ocb_encrypt4 + + movdqa %xmm13,%xmm15 + movups %xmm2,0(%rsi) + movups %xmm3,16(%rsi) + movups %xmm4,32(%rsi) + movups %xmm5,48(%rsi) + +.Locb_enc_done: + pxor %xmm0,%xmm15 + movdqu %xmm8,(%rbp) + movdqu %xmm15,(%r9) + + xorps %xmm0,%xmm0 + pxor %xmm1,%xmm1 + pxor %xmm2,%xmm2 + pxor %xmm3,%xmm3 + pxor %xmm4,%xmm4 + pxor %xmm5,%xmm5 + pxor %xmm6,%xmm6 + pxor %xmm7,%xmm7 + pxor %xmm8,%xmm8 + pxor %xmm9,%xmm9 + pxor %xmm10,%xmm10 + pxor %xmm11,%xmm11 + pxor %xmm12,%xmm12 + pxor %xmm13,%xmm13 + pxor %xmm14,%xmm14 + pxor %xmm15,%xmm15 + leaq 40(%rsp),%rax +.cfi_def_cfa %rax,8 + movq 
-40(%rax),%r14 +.cfi_restore %r14 + movq -32(%rax),%r13 +.cfi_restore %r13 + movq -24(%rax),%r12 +.cfi_restore %r12 + movq -16(%rax),%rbp +.cfi_restore %rbp + movq -8(%rax),%rbx +.cfi_restore %rbx + leaq (%rax),%rsp +.cfi_def_cfa_register %rsp +.Locb_enc_epilogue: + .byte 0xf3,0xc3 +.cfi_endproc +.size aesni_ocb_encrypt,.-aesni_ocb_encrypt + +.type __ocb_encrypt6,@function +.align 32 +__ocb_encrypt6: +.cfi_startproc + pxor %xmm9,%xmm15 + movdqu (%rbx,%r12,1),%xmm11 + movdqa %xmm10,%xmm12 + movdqu (%rbx,%r13,1),%xmm13 + movdqa %xmm10,%xmm14 + pxor %xmm15,%xmm10 + movdqu (%rbx,%r14,1),%xmm15 + pxor %xmm10,%xmm11 + pxor %xmm2,%xmm8 + pxor %xmm10,%xmm2 + pxor %xmm11,%xmm12 + pxor %xmm3,%xmm8 + pxor %xmm11,%xmm3 + pxor %xmm12,%xmm13 + pxor %xmm4,%xmm8 + pxor %xmm12,%xmm4 + pxor %xmm13,%xmm14 + pxor %xmm5,%xmm8 + pxor %xmm13,%xmm5 + pxor %xmm14,%xmm15 + pxor %xmm6,%xmm8 + pxor %xmm14,%xmm6 + pxor %xmm7,%xmm8 + pxor %xmm15,%xmm7 + movups 32(%r11),%xmm0 + + leaq 1(%r8),%r12 + leaq 3(%r8),%r13 + leaq 5(%r8),%r14 + addq $6,%r8 + pxor %xmm9,%xmm10 + bsfq %r12,%r12 + bsfq %r13,%r13 + bsfq %r14,%r14 + +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 + pxor %xmm9,%xmm11 + pxor %xmm9,%xmm12 +.byte 102,15,56,220,241 + pxor %xmm9,%xmm13 + pxor %xmm9,%xmm14 +.byte 102,15,56,220,249 + movups 48(%r11),%xmm1 + pxor %xmm9,%xmm15 + +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 +.byte 102,15,56,220,224 +.byte 102,15,56,220,232 +.byte 102,15,56,220,240 +.byte 102,15,56,220,248 + movups 64(%r11),%xmm0 + shlq $4,%r12 + shlq $4,%r13 + jmp .Locb_enc_loop6 + +.align 32 +.Locb_enc_loop6: +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 +.byte 102,15,56,220,241 +.byte 102,15,56,220,249 + movups (%rcx,%rax,1),%xmm1 + addq $32,%rax + +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 +.byte 102,15,56,220,224 +.byte 102,15,56,220,232 +.byte 102,15,56,220,240 +.byte 102,15,56,220,248 + movups -16(%rcx,%rax,1),%xmm0 + jnz .Locb_enc_loop6 + +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 +.byte 102,15,56,220,241 +.byte 102,15,56,220,249 + movups 16(%r11),%xmm1 + shlq $4,%r14 + +.byte 102,65,15,56,221,210 + movdqu (%rbx),%xmm10 + movq %r10,%rax +.byte 102,65,15,56,221,219 +.byte 102,65,15,56,221,228 +.byte 102,65,15,56,221,237 +.byte 102,65,15,56,221,246 +.byte 102,65,15,56,221,255 + .byte 0xf3,0xc3 +.cfi_endproc +.size __ocb_encrypt6,.-__ocb_encrypt6 + +.type __ocb_encrypt4,@function +.align 32 +__ocb_encrypt4: +.cfi_startproc + pxor %xmm9,%xmm15 + movdqu (%rbx,%r12,1),%xmm11 + movdqa %xmm10,%xmm12 + movdqu (%rbx,%r13,1),%xmm13 + pxor %xmm15,%xmm10 + pxor %xmm10,%xmm11 + pxor %xmm2,%xmm8 + pxor %xmm10,%xmm2 + pxor %xmm11,%xmm12 + pxor %xmm3,%xmm8 + pxor %xmm11,%xmm3 + pxor %xmm12,%xmm13 + pxor %xmm4,%xmm8 + pxor %xmm12,%xmm4 + pxor %xmm5,%xmm8 + pxor %xmm13,%xmm5 + movups 32(%r11),%xmm0 + + pxor %xmm9,%xmm10 + pxor %xmm9,%xmm11 + pxor %xmm9,%xmm12 + pxor %xmm9,%xmm13 + +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 + movups 48(%r11),%xmm1 + +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 +.byte 102,15,56,220,224 +.byte 102,15,56,220,232 + movups 64(%r11),%xmm0 + jmp .Locb_enc_loop4 + +.align 32 +.Locb_enc_loop4: +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 + movups (%rcx,%rax,1),%xmm1 + addq $32,%rax + +.byte 102,15,56,220,208 +.byte 102,15,56,220,216 +.byte 
102,15,56,220,224 +.byte 102,15,56,220,232 + movups -16(%rcx,%rax,1),%xmm0 + jnz .Locb_enc_loop4 + +.byte 102,15,56,220,209 +.byte 102,15,56,220,217 +.byte 102,15,56,220,225 +.byte 102,15,56,220,233 + movups 16(%r11),%xmm1 + movq %r10,%rax + +.byte 102,65,15,56,221,210 +.byte 102,65,15,56,221,219 +.byte 102,65,15,56,221,228 +.byte 102,65,15,56,221,237 + .byte 0xf3,0xc3 +.cfi_endproc +.size __ocb_encrypt4,.-__ocb_encrypt4 + +.type __ocb_encrypt1,@function +.align 32 +__ocb_encrypt1: +.cfi_startproc + pxor %xmm15,%xmm7 + pxor %xmm9,%xmm7 + pxor %xmm2,%xmm8 + pxor %xmm7,%xmm2 + movups 32(%r11),%xmm0 + +.byte 102,15,56,220,209 + movups 48(%r11),%xmm1 + pxor %xmm9,%xmm7 + +.byte 102,15,56,220,208 + movups 64(%r11),%xmm0 + jmp .Locb_enc_loop1 + +.align 32 +.Locb_enc_loop1: +.byte 102,15,56,220,209 + movups (%rcx,%rax,1),%xmm1 + addq $32,%rax + +.byte 102,15,56,220,208 + movups -16(%rcx,%rax,1),%xmm0 + jnz .Locb_enc_loop1 + +.byte 102,15,56,220,209 + movups 16(%r11),%xmm1 + movq %r10,%rax + +.byte 102,15,56,221,215 + .byte 0xf3,0xc3 +.cfi_endproc +.size __ocb_encrypt1,.-__ocb_encrypt1 + +.globl aesni_ocb_decrypt +.type aesni_ocb_decrypt,@function +.align 32 +aesni_ocb_decrypt: +.cfi_startproc + leaq (%rsp),%rax + pushq %rbx +.cfi_adjust_cfa_offset 8 +.cfi_offset %rbx,-16 + pushq %rbp +.cfi_adjust_cfa_offset 8 +.cfi_offset %rbp,-24 + pushq %r12 +.cfi_adjust_cfa_offset 8 +.cfi_offset %r12,-32 + pushq %r13 +.cfi_adjust_cfa_offset 8 +.cfi_offset %r13,-40 + pushq %r14 +.cfi_adjust_cfa_offset 8 +.cfi_offset %r14,-48 + movq 8(%rax),%rbx + movq 8+8(%rax),%rbp + + movl 240(%rcx),%r10d + movq %rcx,%r11 + shll $4,%r10d + movups (%rcx),%xmm9 + movups 16(%rcx,%r10,1),%xmm1 + + movdqu (%r9),%xmm15 + pxor %xmm1,%xmm9 + pxor %xmm1,%xmm15 + + movl $16+32,%eax + leaq 32(%r11,%r10,1),%rcx + movups 16(%r11),%xmm1 + subq %r10,%rax + movq %rax,%r10 + + movdqu (%rbx),%xmm10 + movdqu (%rbp),%xmm8 + + testq $1,%r8 + jnz .Locb_dec_odd + + bsfq %r8,%r12 + addq $1,%r8 + shlq $4,%r12 + movdqu (%rbx,%r12,1),%xmm7 + movdqu (%rdi),%xmm2 + leaq 16(%rdi),%rdi + + call __ocb_decrypt1 + + movdqa %xmm7,%xmm15 + movups %xmm2,(%rsi) + xorps %xmm2,%xmm8 + leaq 16(%rsi),%rsi + subq $1,%rdx + jz .Locb_dec_done + +.Locb_dec_odd: + leaq 1(%r8),%r12 + leaq 3(%r8),%r13 + leaq 5(%r8),%r14 + leaq 6(%r8),%r8 + bsfq %r12,%r12 + bsfq %r13,%r13 + bsfq %r14,%r14 + shlq $4,%r12 + shlq $4,%r13 + shlq $4,%r14 + + subq $6,%rdx + jc .Locb_dec_short + jmp .Locb_dec_grandloop + +.align 32 +.Locb_dec_grandloop: + movdqu 0(%rdi),%xmm2 + movdqu 16(%rdi),%xmm3 + movdqu 32(%rdi),%xmm4 + movdqu 48(%rdi),%xmm5 + movdqu 64(%rdi),%xmm6 + movdqu 80(%rdi),%xmm7 + leaq 96(%rdi),%rdi + + call __ocb_decrypt6 + + movups %xmm2,0(%rsi) + pxor %xmm2,%xmm8 + movups %xmm3,16(%rsi) + pxor %xmm3,%xmm8 + movups %xmm4,32(%rsi) + pxor %xmm4,%xmm8 + movups %xmm5,48(%rsi) + pxor %xmm5,%xmm8 + movups %xmm6,64(%rsi) + pxor %xmm6,%xmm8 + movups %xmm7,80(%rsi) + pxor %xmm7,%xmm8 + leaq 96(%rsi),%rsi + subq $6,%rdx + jnc .Locb_dec_grandloop + +.Locb_dec_short: + addq $6,%rdx + jz .Locb_dec_done + + movdqu 0(%rdi),%xmm2 + cmpq $2,%rdx + jb .Locb_dec_one + movdqu 16(%rdi),%xmm3 + je .Locb_dec_two + + movdqu 32(%rdi),%xmm4 + cmpq $4,%rdx + jb .Locb_dec_three + movdqu 48(%rdi),%xmm5 + je .Locb_dec_four + + movdqu 64(%rdi),%xmm6 + pxor %xmm7,%xmm7 + + call __ocb_decrypt6 + + movdqa %xmm14,%xmm15 + movups %xmm2,0(%rsi) + pxor %xmm2,%xmm8 + movups %xmm3,16(%rsi) + pxor %xmm3,%xmm8 + movups %xmm4,32(%rsi) + pxor %xmm4,%xmm8 + movups %xmm5,48(%rsi) + pxor %xmm5,%xmm8 + movups %xmm6,64(%rsi) + 
pxor %xmm6,%xmm8 + + jmp .Locb_dec_done + +.align 16 +.Locb_dec_one: + movdqa %xmm10,%xmm7 + + call __ocb_decrypt1 + + movdqa %xmm7,%xmm15 + movups %xmm2,0(%rsi) + xorps %xmm2,%xmm8 + jmp .Locb_dec_done + +.align 16 +.Locb_dec_two: + pxor %xmm4,%xmm4 + pxor %xmm5,%xmm5 + + call __ocb_decrypt4 + + movdqa %xmm11,%xmm15 + movups %xmm2,0(%rsi) + xorps %xmm2,%xmm8 + movups %xmm3,16(%rsi) + xorps %xmm3,%xmm8 + + jmp .Locb_dec_done + +.align 16 +.Locb_dec_three: + pxor %xmm5,%xmm5 + + call __ocb_decrypt4 + + movdqa %xmm12,%xmm15 + movups %xmm2,0(%rsi) + xorps %xmm2,%xmm8 + movups %xmm3,16(%rsi) + xorps %xmm3,%xmm8 + movups %xmm4,32(%rsi) + xorps %xmm4,%xmm8 + + jmp .Locb_dec_done + +.align 16 +.Locb_dec_four: + call __ocb_decrypt4 + + movdqa %xmm13,%xmm15 + movups %xmm2,0(%rsi) + pxor %xmm2,%xmm8 + movups %xmm3,16(%rsi) + pxor %xmm3,%xmm8 + movups %xmm4,32(%rsi) + pxor %xmm4,%xmm8 + movups %xmm5,48(%rsi) + pxor %xmm5,%xmm8 + +.Locb_dec_done: + pxor %xmm0,%xmm15 + movdqu %xmm8,(%rbp) + movdqu %xmm15,(%r9) + + xorps %xmm0,%xmm0 + pxor %xmm1,%xmm1 + pxor %xmm2,%xmm2 + pxor %xmm3,%xmm3 + pxor %xmm4,%xmm4 + pxor %xmm5,%xmm5 + pxor %xmm6,%xmm6 + pxor %xmm7,%xmm7 + pxor %xmm8,%xmm8 + pxor %xmm9,%xmm9 + pxor %xmm10,%xmm10 + pxor %xmm11,%xmm11 + pxor %xmm12,%xmm12 + pxor %xmm13,%xmm13 + pxor %xmm14,%xmm14 + pxor %xmm15,%xmm15 + leaq 40(%rsp),%rax +.cfi_def_cfa %rax,8 + movq -40(%rax),%r14 +.cfi_restore %r14 + movq -32(%rax),%r13 +.cfi_restore %r13 + movq -24(%rax),%r12 +.cfi_restore %r12 + movq -16(%rax),%rbp +.cfi_restore %rbp + movq -8(%rax),%rbx +.cfi_restore %rbx + leaq (%rax),%rsp +.cfi_def_cfa_register %rsp +.Locb_dec_epilogue: + .byte 0xf3,0xc3 +.cfi_endproc +.size aesni_ocb_decrypt,.-aesni_ocb_decrypt + +.type __ocb_decrypt6,@function +.align 32 +__ocb_decrypt6: +.cfi_startproc + pxor %xmm9,%xmm15 + movdqu (%rbx,%r12,1),%xmm11 + movdqa %xmm10,%xmm12 + movdqu (%rbx,%r13,1),%xmm13 + movdqa %xmm10,%xmm14 + pxor %xmm15,%xmm10 + movdqu (%rbx,%r14,1),%xmm15 + pxor %xmm10,%xmm11 + pxor %xmm10,%xmm2 + pxor %xmm11,%xmm12 + pxor %xmm11,%xmm3 + pxor %xmm12,%xmm13 + pxor %xmm12,%xmm4 + pxor %xmm13,%xmm14 + pxor %xmm13,%xmm5 + pxor %xmm14,%xmm15 + pxor %xmm14,%xmm6 + pxor %xmm15,%xmm7 + movups 32(%r11),%xmm0 + + leaq 1(%r8),%r12 + leaq 3(%r8),%r13 + leaq 5(%r8),%r14 + addq $6,%r8 + pxor %xmm9,%xmm10 + bsfq %r12,%r12 + bsfq %r13,%r13 + bsfq %r14,%r14 + +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 + pxor %xmm9,%xmm11 + pxor %xmm9,%xmm12 +.byte 102,15,56,222,241 + pxor %xmm9,%xmm13 + pxor %xmm9,%xmm14 +.byte 102,15,56,222,249 + movups 48(%r11),%xmm1 + pxor %xmm9,%xmm15 + +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 +.byte 102,15,56,222,240 +.byte 102,15,56,222,248 + movups 64(%r11),%xmm0 + shlq $4,%r12 + shlq $4,%r13 + jmp .Locb_dec_loop6 + +.align 32 +.Locb_dec_loop6: +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 +.byte 102,15,56,222,241 +.byte 102,15,56,222,249 + movups (%rcx,%rax,1),%xmm1 + addq $32,%rax + +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 +.byte 102,15,56,222,240 +.byte 102,15,56,222,248 + movups -16(%rcx,%rax,1),%xmm0 + jnz .Locb_dec_loop6 + +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 +.byte 102,15,56,222,241 +.byte 102,15,56,222,249 + movups 16(%r11),%xmm1 + shlq $4,%r14 + +.byte 102,65,15,56,223,210 + movdqu (%rbx),%xmm10 + 
movq %r10,%rax +.byte 102,65,15,56,223,219 +.byte 102,65,15,56,223,228 +.byte 102,65,15,56,223,237 +.byte 102,65,15,56,223,246 +.byte 102,65,15,56,223,255 + .byte 0xf3,0xc3 +.cfi_endproc +.size __ocb_decrypt6,.-__ocb_decrypt6 + +.type __ocb_decrypt4,@function +.align 32 +__ocb_decrypt4: +.cfi_startproc + pxor %xmm9,%xmm15 + movdqu (%rbx,%r12,1),%xmm11 + movdqa %xmm10,%xmm12 + movdqu (%rbx,%r13,1),%xmm13 + pxor %xmm15,%xmm10 + pxor %xmm10,%xmm11 + pxor %xmm10,%xmm2 + pxor %xmm11,%xmm12 + pxor %xmm11,%xmm3 + pxor %xmm12,%xmm13 + pxor %xmm12,%xmm4 + pxor %xmm13,%xmm5 + movups 32(%r11),%xmm0 + + pxor %xmm9,%xmm10 + pxor %xmm9,%xmm11 + pxor %xmm9,%xmm12 + pxor %xmm9,%xmm13 + +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 + movups 48(%r11),%xmm1 + +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 + movups 64(%r11),%xmm0 + jmp .Locb_dec_loop4 + +.align 32 +.Locb_dec_loop4: +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 + movups (%rcx,%rax,1),%xmm1 + addq $32,%rax + +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 + movups -16(%rcx,%rax,1),%xmm0 + jnz .Locb_dec_loop4 + +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 + movups 16(%r11),%xmm1 + movq %r10,%rax + +.byte 102,65,15,56,223,210 +.byte 102,65,15,56,223,219 +.byte 102,65,15,56,223,228 +.byte 102,65,15,56,223,237 + .byte 0xf3,0xc3 +.cfi_endproc +.size __ocb_decrypt4,.-__ocb_decrypt4 + +.type __ocb_decrypt1,@function +.align 32 +__ocb_decrypt1: +.cfi_startproc + pxor %xmm15,%xmm7 + pxor %xmm9,%xmm7 + pxor %xmm7,%xmm2 + movups 32(%r11),%xmm0 + +.byte 102,15,56,222,209 + movups 48(%r11),%xmm1 + pxor %xmm9,%xmm7 + +.byte 102,15,56,222,208 + movups 64(%r11),%xmm0 + jmp .Locb_dec_loop1 + +.align 32 +.Locb_dec_loop1: +.byte 102,15,56,222,209 + movups (%rcx,%rax,1),%xmm1 + addq $32,%rax + +.byte 102,15,56,222,208 + movups -16(%rcx,%rax,1),%xmm0 + jnz .Locb_dec_loop1 + +.byte 102,15,56,222,209 + movups 16(%r11),%xmm1 + movq %r10,%rax + +.byte 102,15,56,223,215 + .byte 0xf3,0xc3 +.cfi_endproc +.size __ocb_decrypt1,.-__ocb_decrypt1 +.globl aesni_cbc_encrypt +.type aesni_cbc_encrypt,@function +.align 16 +aesni_cbc_encrypt: +.cfi_startproc + testq %rdx,%rdx + jz .Lcbc_ret + + movl 240(%rcx),%r10d + movq %rcx,%r11 + testl %r9d,%r9d + jz .Lcbc_decrypt + + movups (%r8),%xmm2 + movl %r10d,%eax + cmpq $16,%rdx + jb .Lcbc_enc_tail + subq $16,%rdx + jmp .Lcbc_enc_loop +.align 16 +.Lcbc_enc_loop: + movups (%rdi),%xmm3 + leaq 16(%rdi),%rdi + + movups (%rcx),%xmm0 + movups 16(%rcx),%xmm1 + xorps %xmm0,%xmm3 + leaq 32(%rcx),%rcx + xorps %xmm3,%xmm2 +.Loop_enc1_15: +.byte 102,15,56,220,209 + decl %eax + movups (%rcx),%xmm1 + leaq 16(%rcx),%rcx + jnz .Loop_enc1_15 +.byte 102,15,56,221,209 + movl %r10d,%eax + movq %r11,%rcx + movups %xmm2,0(%rsi) + leaq 16(%rsi),%rsi + subq $16,%rdx + jnc .Lcbc_enc_loop + addq $16,%rdx + jnz .Lcbc_enc_tail + pxor %xmm0,%xmm0 + pxor %xmm1,%xmm1 + movups %xmm2,(%r8) + pxor %xmm2,%xmm2 + pxor %xmm3,%xmm3 + jmp .Lcbc_ret + +.Lcbc_enc_tail: + movq %rdx,%rcx + xchgq %rdi,%rsi +.long 0x9066A4F3 + movl $16,%ecx + subq %rdx,%rcx + xorl %eax,%eax +.long 0x9066AAF3 + leaq -16(%rdi),%rdi + movl %r10d,%eax + movq %rdi,%rsi + movq %r11,%rcx + xorq %rdx,%rdx + jmp .Lcbc_enc_loop + +.align 16 +.Lcbc_decrypt: + cmpq $16,%rdx + jne .Lcbc_decrypt_bulk + + + + movdqu (%rdi),%xmm2 + movdqu (%r8),%xmm3 
+ movdqa %xmm2,%xmm4 + movups (%rcx),%xmm0 + movups 16(%rcx),%xmm1 + leaq 32(%rcx),%rcx + xorps %xmm0,%xmm2 +.Loop_dec1_16: +.byte 102,15,56,222,209 + decl %r10d + movups (%rcx),%xmm1 + leaq 16(%rcx),%rcx + jnz .Loop_dec1_16 +.byte 102,15,56,223,209 + pxor %xmm0,%xmm0 + pxor %xmm1,%xmm1 + movdqu %xmm4,(%r8) + xorps %xmm3,%xmm2 + pxor %xmm3,%xmm3 + movups %xmm2,(%rsi) + pxor %xmm2,%xmm2 + jmp .Lcbc_ret +.align 16 +.Lcbc_decrypt_bulk: + leaq (%rsp),%r11 +.cfi_def_cfa_register %r11 + pushq %rbp +.cfi_offset %rbp,-16 + subq $16,%rsp + andq $-16,%rsp + movq %rcx,%rbp + movups (%r8),%xmm10 + movl %r10d,%eax + cmpq $0x50,%rdx + jbe .Lcbc_dec_tail + + movups (%rcx),%xmm0 + movdqu 0(%rdi),%xmm2 + movdqu 16(%rdi),%xmm3 + movdqa %xmm2,%xmm11 + movdqu 32(%rdi),%xmm4 + movdqa %xmm3,%xmm12 + movdqu 48(%rdi),%xmm5 + movdqa %xmm4,%xmm13 + movdqu 64(%rdi),%xmm6 + movdqa %xmm5,%xmm14 + movdqu 80(%rdi),%xmm7 + movdqa %xmm6,%xmm15 + movl OPENSSL_ia32cap_P+4(%rip),%r9d + cmpq $0x70,%rdx + jbe .Lcbc_dec_six_or_seven + + andl $71303168,%r9d + subq $0x50,%rdx + cmpl $4194304,%r9d + je .Lcbc_dec_loop6_enter + subq $0x20,%rdx + leaq 112(%rcx),%rcx + jmp .Lcbc_dec_loop8_enter +.align 16 +.Lcbc_dec_loop8: + movups %xmm9,(%rsi) + leaq 16(%rsi),%rsi +.Lcbc_dec_loop8_enter: + movdqu 96(%rdi),%xmm8 + pxor %xmm0,%xmm2 + movdqu 112(%rdi),%xmm9 + pxor %xmm0,%xmm3 + movups 16-112(%rcx),%xmm1 + pxor %xmm0,%xmm4 + movq $-1,%rbp + cmpq $0x70,%rdx + pxor %xmm0,%xmm5 + pxor %xmm0,%xmm6 + pxor %xmm0,%xmm7 + pxor %xmm0,%xmm8 + +.byte 102,15,56,222,209 + pxor %xmm0,%xmm9 + movups 32-112(%rcx),%xmm0 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 +.byte 102,15,56,222,241 +.byte 102,15,56,222,249 +.byte 102,68,15,56,222,193 + adcq $0,%rbp + andq $128,%rbp +.byte 102,68,15,56,222,201 + addq %rdi,%rbp + movups 48-112(%rcx),%xmm1 +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 +.byte 102,15,56,222,240 +.byte 102,15,56,222,248 +.byte 102,68,15,56,222,192 +.byte 102,68,15,56,222,200 + movups 64-112(%rcx),%xmm0 + nop +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 +.byte 102,15,56,222,241 +.byte 102,15,56,222,249 +.byte 102,68,15,56,222,193 +.byte 102,68,15,56,222,201 + movups 80-112(%rcx),%xmm1 + nop +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 +.byte 102,15,56,222,240 +.byte 102,15,56,222,248 +.byte 102,68,15,56,222,192 +.byte 102,68,15,56,222,200 + movups 96-112(%rcx),%xmm0 + nop +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 +.byte 102,15,56,222,241 +.byte 102,15,56,222,249 +.byte 102,68,15,56,222,193 +.byte 102,68,15,56,222,201 + movups 112-112(%rcx),%xmm1 + nop +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 +.byte 102,15,56,222,240 +.byte 102,15,56,222,248 +.byte 102,68,15,56,222,192 +.byte 102,68,15,56,222,200 + movups 128-112(%rcx),%xmm0 + nop +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 +.byte 102,15,56,222,241 +.byte 102,15,56,222,249 +.byte 102,68,15,56,222,193 +.byte 102,68,15,56,222,201 + movups 144-112(%rcx),%xmm1 + cmpl $11,%eax +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 +.byte 102,15,56,222,240 +.byte 102,15,56,222,248 +.byte 102,68,15,56,222,192 +.byte 102,68,15,56,222,200 + movups 160-112(%rcx),%xmm0 + jb .Lcbc_dec_done +.byte 
102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 +.byte 102,15,56,222,241 +.byte 102,15,56,222,249 +.byte 102,68,15,56,222,193 +.byte 102,68,15,56,222,201 + movups 176-112(%rcx),%xmm1 + nop +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 +.byte 102,15,56,222,240 +.byte 102,15,56,222,248 +.byte 102,68,15,56,222,192 +.byte 102,68,15,56,222,200 + movups 192-112(%rcx),%xmm0 + je .Lcbc_dec_done +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 +.byte 102,15,56,222,241 +.byte 102,15,56,222,249 +.byte 102,68,15,56,222,193 +.byte 102,68,15,56,222,201 + movups 208-112(%rcx),%xmm1 + nop +.byte 102,15,56,222,208 +.byte 102,15,56,222,216 +.byte 102,15,56,222,224 +.byte 102,15,56,222,232 +.byte 102,15,56,222,240 +.byte 102,15,56,222,248 +.byte 102,68,15,56,222,192 +.byte 102,68,15,56,222,200 + movups 224-112(%rcx),%xmm0 + jmp .Lcbc_dec_done +.align 16 +.Lcbc_dec_done: +.byte 102,15,56,222,209 +.byte 102,15,56,222,217 + pxor %xmm0,%xmm10 + pxor %xmm0,%xmm11 +.byte 102,15,56,222,225 +.byte 102,15,56,222,233 + pxor %xmm0,%xmm12 + pxor %xmm0,%xmm13 +.byte 102,15,56,222,241 +.byte 102,15,56,222,249 + pxor %xmm0,%xmm14 + pxor %xmm0,%xmm15 +.byte 102,68,15,56,222,193 +.byte 102,68,15,56,222,201 + movdqu 80(%rdi),%xmm1 + +.byte 102,65,15,56,223,210 + movdqu 96(%rdi),%xmm10 + pxor %xmm0,%xmm1 +.byte 102,65,15,56,223,219 + pxor %xmm0,%xmm10 + movdqu 112(%rdi),%xmm0 +.byte 102,65,15,56,223,228 + leaq 128(%rdi),%rdi + movdqu 0(%rbp),%xmm11 +.byte 102,65,15,56,223,237 +.byte 102,65,15,56,223,246 + movdqu 16(%rbp),%xmm12 + movdqu 32(%rbp),%xmm13 +.byte 102,65,15,56,223,255 +.byte 102,68,15,56,223,193 + movdqu 48(%rbp),%xmm14 + movdqu 64(%rbp),%xmm15 +.byte 102,69,15,56,223,202 + movdqa %xmm0,%xmm10 + movdqu 80(%rbp),%xmm1 + movups -112(%rcx),%xmm0 + + movups %xmm2,(%rsi) + movdqa %xmm11,%xmm2 + movups %xmm3,16(%rsi) + movdqa %xmm12,%xmm3 + movups %xmm4,32(%rsi) + movdqa %xmm13,%xmm4 + movups %xmm5,48(%rsi) + movdqa %xmm14,%xmm5 + movups %xmm6,64(%rsi) + movdqa %xmm15,%xmm6 + movups %xmm7,80(%rsi) + movdqa %xmm1,%xmm7 + movups %xmm8,96(%rsi) + leaq 112(%rsi),%rsi + + subq $0x80,%rdx + ja .Lcbc_dec_loop8 + + movaps %xmm9,%xmm2 + leaq -112(%rcx),%rcx + addq $0x70,%rdx + jle .Lcbc_dec_clear_tail_collected + movups %xmm9,(%rsi) + leaq 16(%rsi),%rsi + cmpq $0x50,%rdx + jbe .Lcbc_dec_tail + + movaps %xmm11,%xmm2 +.Lcbc_dec_six_or_seven: + cmpq $0x60,%rdx + ja .Lcbc_dec_seven + + movaps %xmm7,%xmm8 + call _aesni_decrypt6 + pxor %xmm10,%xmm2 + movaps %xmm8,%xmm10 + pxor %xmm11,%xmm3 + movdqu %xmm2,(%rsi) + pxor %xmm12,%xmm4 + movdqu %xmm3,16(%rsi) + pxor %xmm3,%xmm3 + pxor %xmm13,%xmm5 + movdqu %xmm4,32(%rsi) + pxor %xmm4,%xmm4 + pxor %xmm14,%xmm6 + movdqu %xmm5,48(%rsi) + pxor %xmm5,%xmm5 + pxor %xmm15,%xmm7 + movdqu %xmm6,64(%rsi) + pxor %xmm6,%xmm6 + leaq 80(%rsi),%rsi + movdqa %xmm7,%xmm2 + pxor %xmm7,%xmm7 + jmp .Lcbc_dec_tail_collected + +.align 16 +.Lcbc_dec_seven: + movups 96(%rdi),%xmm8 + xorps %xmm9,%xmm9 + call _aesni_decrypt8 + movups 80(%rdi),%xmm9 + pxor %xmm10,%xmm2 + movups 96(%rdi),%xmm10 + pxor %xmm11,%xmm3 + movdqu %xmm2,(%rsi) + pxor %xmm12,%xmm4 + movdqu %xmm3,16(%rsi) + pxor %xmm3,%xmm3 + pxor %xmm13,%xmm5 + movdqu %xmm4,32(%rsi) + pxor %xmm4,%xmm4 + pxor %xmm14,%xmm6 + movdqu %xmm5,48(%rsi) + pxor %xmm5,%xmm5 + pxor %xmm15,%xmm7 + movdqu %xmm6,64(%rsi) + pxor %xmm6,%xmm6 + pxor %xmm9,%xmm8 + movdqu %xmm7,80(%rsi) + pxor %xmm7,%xmm7 + leaq 96(%rsi),%rsi + 
movdqa %xmm8,%xmm2 + pxor %xmm8,%xmm8 + pxor %xmm9,%xmm9 + jmp .Lcbc_dec_tail_collected + +.align 16 +.Lcbc_dec_loop6: + movups %xmm7,(%rsi) + leaq 16(%rsi),%rsi + movdqu 0(%rdi),%xmm2 + movdqu 16(%rdi),%xmm3 + movdqa %xmm2,%xmm11 + movdqu 32(%rdi),%xmm4 + movdqa %xmm3,%xmm12 + movdqu 48(%rdi),%xmm5 + movdqa %xmm4,%xmm13 + movdqu 64(%rdi),%xmm6 + movdqa %xmm5,%xmm14 + movdqu 80(%rdi),%xmm7 + movdqa %xmm6,%xmm15 +.Lcbc_dec_loop6_enter: + leaq 96(%rdi),%rdi + movdqa %xmm7,%xmm8 + + call _aesni_decrypt6 + + pxor %xmm10,%xmm2 + movdqa %xmm8,%xmm10 + pxor %xmm11,%xmm3 + movdqu %xmm2,(%rsi) + pxor %xmm12,%xmm4 + movdqu %xmm3,16(%rsi) + pxor %xmm13,%xmm5 + movdqu %xmm4,32(%rsi) + pxor %xmm14,%xmm6 + movq %rbp,%rcx + movdqu %xmm5,48(%rsi) + pxor %xmm15,%xmm7 + movl %r10d,%eax + movdqu %xmm6,64(%rsi) + leaq 80(%rsi),%rsi + subq $0x60,%rdx + ja .Lcbc_dec_loop6 + + movdqa %xmm7,%xmm2 + addq $0x50,%rdx + jle .Lcbc_dec_clear_tail_collected + movups %xmm7,(%rsi) + leaq 16(%rsi),%rsi + +.Lcbc_dec_tail: + movups (%rdi),%xmm2 + subq $0x10,%rdx + jbe .Lcbc_dec_one + + movups 16(%rdi),%xmm3 + movaps %xmm2,%xmm11 + subq $0x10,%rdx + jbe .Lcbc_dec_two + + movups 32(%rdi),%xmm4 + movaps %xmm3,%xmm12 + subq $0x10,%rdx + jbe .Lcbc_dec_three + + movups 48(%rdi),%xmm5 + movaps %xmm4,%xmm13 + subq $0x10,%rdx + jbe .Lcbc_dec_four + + movups 64(%rdi),%xmm6 + movaps %xmm5,%xmm14 + movaps %xmm6,%xmm15 + xorps %xmm7,%xmm7 + call _aesni_decrypt6 + pxor %xmm10,%xmm2 + movaps %xmm15,%xmm10 + pxor %xmm11,%xmm3 + movdqu %xmm2,(%rsi) + pxor %xmm12,%xmm4 + movdqu %xmm3,16(%rsi) + pxor %xmm3,%xmm3 + pxor %xmm13,%xmm5 + movdqu %xmm4,32(%rsi) + pxor %xmm4,%xmm4 + pxor %xmm14,%xmm6 + movdqu %xmm5,48(%rsi) + pxor %xmm5,%xmm5 + leaq 64(%rsi),%rsi + movdqa %xmm6,%xmm2 + pxor %xmm6,%xmm6 + pxor %xmm7,%xmm7 + subq $0x10,%rdx + jmp .Lcbc_dec_tail_collected + +.align 16 +.Lcbc_dec_one: + movaps %xmm2,%xmm11 + movups (%rcx),%xmm0 + movups 16(%rcx),%xmm1 + leaq 32(%rcx),%rcx + xorps %xmm0,%xmm2 +.Loop_dec1_17: +.byte 102,15,56,222,209 + decl %eax + movups (%rcx),%xmm1 + leaq 16(%rcx),%rcx + jnz .Loop_dec1_17 +.byte 102,15,56,223,209 + xorps %xmm10,%xmm2 + movaps %xmm11,%xmm10 + jmp .Lcbc_dec_tail_collected +.align 16 +.Lcbc_dec_two: + movaps %xmm3,%xmm12 + call _aesni_decrypt2 + pxor %xmm10,%xmm2 + movaps %xmm12,%xmm10 + pxor %xmm11,%xmm3 + movdqu %xmm2,(%rsi) + movdqa %xmm3,%xmm2 + pxor %xmm3,%xmm3 + leaq 16(%rsi),%rsi + jmp .Lcbc_dec_tail_collected +.align 16 +.Lcbc_dec_three: + movaps %xmm4,%xmm13 + call _aesni_decrypt3 + pxor %xmm10,%xmm2 + movaps %xmm13,%xmm10 + pxor %xmm11,%xmm3 + movdqu %xmm2,(%rsi) + pxor %xmm12,%xmm4 + movdqu %xmm3,16(%rsi) + pxor %xmm3,%xmm3 + movdqa %xmm4,%xmm2 + pxor %xmm4,%xmm4 + leaq 32(%rsi),%rsi + jmp .Lcbc_dec_tail_collected +.align 16 +.Lcbc_dec_four: + movaps %xmm5,%xmm14 + call _aesni_decrypt4 + pxor %xmm10,%xmm2 + movaps %xmm14,%xmm10 + pxor %xmm11,%xmm3 + movdqu %xmm2,(%rsi) + pxor %xmm12,%xmm4 + movdqu %xmm3,16(%rsi) + pxor %xmm3,%xmm3 + pxor %xmm13,%xmm5 + movdqu %xmm4,32(%rsi) + pxor %xmm4,%xmm4 + movdqa %xmm5,%xmm2 + pxor %xmm5,%xmm5 + leaq 48(%rsi),%rsi + jmp .Lcbc_dec_tail_collected + +.align 16 +.Lcbc_dec_clear_tail_collected: + pxor %xmm3,%xmm3 + pxor %xmm4,%xmm4 + pxor %xmm5,%xmm5 + pxor %xmm6,%xmm6 + pxor %xmm7,%xmm7 + pxor %xmm8,%xmm8 + pxor %xmm9,%xmm9 +.Lcbc_dec_tail_collected: + movups %xmm10,(%r8) + andq $15,%rdx + jnz .Lcbc_dec_tail_partial + movups %xmm2,(%rsi) + pxor %xmm2,%xmm2 + jmp .Lcbc_dec_ret +.align 16 +.Lcbc_dec_tail_partial: + movaps %xmm2,(%rsp) + pxor %xmm2,%xmm2 + movq 
$16,%rcx + movq %rsi,%rdi + subq %rdx,%rcx + leaq (%rsp),%rsi +.long 0x9066A4F3 + movdqa %xmm2,(%rsp) + +.Lcbc_dec_ret: + xorps %xmm0,%xmm0 + pxor %xmm1,%xmm1 + movq -8(%r11),%rbp +.cfi_restore %rbp + leaq (%r11),%rsp +.cfi_def_cfa_register %rsp +.Lcbc_ret: + .byte 0xf3,0xc3 +.cfi_endproc +.size aesni_cbc_encrypt,.-aesni_cbc_encrypt +.globl aesni_set_decrypt_key +.type aesni_set_decrypt_key,@function +.align 16 +aesni_set_decrypt_key: +.cfi_startproc +.byte 0x48,0x83,0xEC,0x08 +.cfi_adjust_cfa_offset 8 + call __aesni_set_encrypt_key + shll $4,%esi + testl %eax,%eax + jnz .Ldec_key_ret + leaq 16(%rdx,%rsi,1),%rdi + + movups (%rdx),%xmm0 + movups (%rdi),%xmm1 + movups %xmm0,(%rdi) + movups %xmm1,(%rdx) + leaq 16(%rdx),%rdx + leaq -16(%rdi),%rdi + +.Ldec_key_inverse: + movups (%rdx),%xmm0 + movups (%rdi),%xmm1 +.byte 102,15,56,219,192 +.byte 102,15,56,219,201 + leaq 16(%rdx),%rdx + leaq -16(%rdi),%rdi + movups %xmm0,16(%rdi) + movups %xmm1,-16(%rdx) + cmpq %rdx,%rdi + ja .Ldec_key_inverse + + movups (%rdx),%xmm0 +.byte 102,15,56,219,192 + pxor %xmm1,%xmm1 + movups %xmm0,(%rdi) + pxor %xmm0,%xmm0 +.Ldec_key_ret: + addq $8,%rsp +.cfi_adjust_cfa_offset -8 + .byte 0xf3,0xc3 +.cfi_endproc +.LSEH_end_set_decrypt_key: +.size aesni_set_decrypt_key,.-aesni_set_decrypt_key +.globl aesni_set_encrypt_key +.type aesni_set_encrypt_key,@function +.align 16 +aesni_set_encrypt_key: +__aesni_set_encrypt_key: +.cfi_startproc +.byte 0x48,0x83,0xEC,0x08 +.cfi_adjust_cfa_offset 8 + movq $-1,%rax + testq %rdi,%rdi + jz .Lenc_key_ret + testq %rdx,%rdx + jz .Lenc_key_ret + + movl $268437504,%r10d + movups (%rdi),%xmm0 + xorps %xmm4,%xmm4 + andl OPENSSL_ia32cap_P+4(%rip),%r10d + leaq 16(%rdx),%rax + cmpl $256,%esi + je .L14rounds + cmpl $192,%esi + je .L12rounds + cmpl $128,%esi + jne .Lbad_keybits + +.L10rounds: + movl $9,%esi + cmpl $268435456,%r10d + je .L10rounds_alt + + movups %xmm0,(%rdx) +.byte 102,15,58,223,200,1 + call .Lkey_expansion_128_cold +.byte 102,15,58,223,200,2 + call .Lkey_expansion_128 +.byte 102,15,58,223,200,4 + call .Lkey_expansion_128 +.byte 102,15,58,223,200,8 + call .Lkey_expansion_128 +.byte 102,15,58,223,200,16 + call .Lkey_expansion_128 +.byte 102,15,58,223,200,32 + call .Lkey_expansion_128 +.byte 102,15,58,223,200,64 + call .Lkey_expansion_128 +.byte 102,15,58,223,200,128 + call .Lkey_expansion_128 +.byte 102,15,58,223,200,27 + call .Lkey_expansion_128 +.byte 102,15,58,223,200,54 + call .Lkey_expansion_128 + movups %xmm0,(%rax) + movl %esi,80(%rax) + xorl %eax,%eax + jmp .Lenc_key_ret + +.align 16 +.L10rounds_alt: + movdqa .Lkey_rotate(%rip),%xmm5 + movl $8,%r10d + movdqa .Lkey_rcon1(%rip),%xmm4 + movdqa %xmm0,%xmm2 + movdqu %xmm0,(%rdx) + jmp .Loop_key128 + +.align 16 +.Loop_key128: +.byte 102,15,56,0,197 +.byte 102,15,56,221,196 + pslld $1,%xmm4 + leaq 16(%rax),%rax + + movdqa %xmm2,%xmm3 + pslldq $4,%xmm2 + pxor %xmm2,%xmm3 + pslldq $4,%xmm2 + pxor %xmm2,%xmm3 + pslldq $4,%xmm2 + pxor %xmm3,%xmm2 + + pxor %xmm2,%xmm0 + movdqu %xmm0,-16(%rax) + movdqa %xmm0,%xmm2 + + decl %r10d + jnz .Loop_key128 + + movdqa .Lkey_rcon1b(%rip),%xmm4 + +.byte 102,15,56,0,197 +.byte 102,15,56,221,196 + pslld $1,%xmm4 + + movdqa %xmm2,%xmm3 + pslldq $4,%xmm2 + pxor %xmm2,%xmm3 + pslldq $4,%xmm2 + pxor %xmm2,%xmm3 + pslldq $4,%xmm2 + pxor %xmm3,%xmm2 + + pxor %xmm2,%xmm0 + movdqu %xmm0,(%rax) + + movdqa %xmm0,%xmm2 +.byte 102,15,56,0,197 +.byte 102,15,56,221,196 + + movdqa %xmm2,%xmm3 + pslldq $4,%xmm2 + pxor %xmm2,%xmm3 + pslldq $4,%xmm2 + pxor %xmm2,%xmm3 + pslldq $4,%xmm2 + pxor %xmm3,%xmm2 + + pxor 
%xmm2,%xmm0 + movdqu %xmm0,16(%rax) + + movl %esi,96(%rax) + xorl %eax,%eax + jmp .Lenc_key_ret + +.align 16 +.L12rounds: + movq 16(%rdi),%xmm2 + movl $11,%esi + cmpl $268435456,%r10d + je .L12rounds_alt + + movups %xmm0,(%rdx) +.byte 102,15,58,223,202,1 + call .Lkey_expansion_192a_cold +.byte 102,15,58,223,202,2 + call .Lkey_expansion_192b +.byte 102,15,58,223,202,4 + call .Lkey_expansion_192a +.byte 102,15,58,223,202,8 + call .Lkey_expansion_192b +.byte 102,15,58,223,202,16 + call .Lkey_expansion_192a +.byte 102,15,58,223,202,32 + call .Lkey_expansion_192b +.byte 102,15,58,223,202,64 + call .Lkey_expansion_192a +.byte 102,15,58,223,202,128 + call .Lkey_expansion_192b + movups %xmm0,(%rax) + movl %esi,48(%rax) + xorq %rax,%rax + jmp .Lenc_key_ret + +.align 16 +.L12rounds_alt: + movdqa .Lkey_rotate192(%rip),%xmm5 + movdqa .Lkey_rcon1(%rip),%xmm4 + movl $8,%r10d + movdqu %xmm0,(%rdx) + jmp .Loop_key192 + +.align 16 +.Loop_key192: + movq %xmm2,0(%rax) + movdqa %xmm2,%xmm1 +.byte 102,15,56,0,213 +.byte 102,15,56,221,212 + pslld $1,%xmm4 + leaq 24(%rax),%rax + + movdqa %xmm0,%xmm3 + pslldq $4,%xmm0 + pxor %xmm0,%xmm3 + pslldq $4,%xmm0 + pxor %xmm0,%xmm3 + pslldq $4,%xmm0 + pxor %xmm3,%xmm0 + + pshufd $0xff,%xmm0,%xmm3 + pxor %xmm1,%xmm3 + pslldq $4,%xmm1 + pxor %xmm1,%xmm3 + + pxor %xmm2,%xmm0 + pxor %xmm3,%xmm2 + movdqu %xmm0,-16(%rax) + + decl %r10d + jnz .Loop_key192 + + movl %esi,32(%rax) + xorl %eax,%eax + jmp .Lenc_key_ret + +.align 16 +.L14rounds: + movups 16(%rdi),%xmm2 + movl $13,%esi + leaq 16(%rax),%rax + cmpl $268435456,%r10d + je .L14rounds_alt + + movups %xmm0,(%rdx) + movups %xmm2,16(%rdx) +.byte 102,15,58,223,202,1 + call .Lkey_expansion_256a_cold +.byte 102,15,58,223,200,1 + call .Lkey_expansion_256b +.byte 102,15,58,223,202,2 + call .Lkey_expansion_256a +.byte 102,15,58,223,200,2 + call .Lkey_expansion_256b +.byte 102,15,58,223,202,4 + call .Lkey_expansion_256a +.byte 102,15,58,223,200,4 + call .Lkey_expansion_256b +.byte 102,15,58,223,202,8 + call .Lkey_expansion_256a +.byte 102,15,58,223,200,8 + call .Lkey_expansion_256b +.byte 102,15,58,223,202,16 + call .Lkey_expansion_256a +.byte 102,15,58,223,200,16 + call .Lkey_expansion_256b +.byte 102,15,58,223,202,32 + call .Lkey_expansion_256a +.byte 102,15,58,223,200,32 + call .Lkey_expansion_256b +.byte 102,15,58,223,202,64 + call .Lkey_expansion_256a + movups %xmm0,(%rax) + movl %esi,16(%rax) + xorq %rax,%rax + jmp .Lenc_key_ret + +.align 16 +.L14rounds_alt: + movdqa .Lkey_rotate(%rip),%xmm5 + movdqa .Lkey_rcon1(%rip),%xmm4 + movl $7,%r10d + movdqu %xmm0,0(%rdx) + movdqa %xmm2,%xmm1 + movdqu %xmm2,16(%rdx) + jmp .Loop_key256 + +.align 16 +.Loop_key256: +.byte 102,15,56,0,213 +.byte 102,15,56,221,212 + + movdqa %xmm0,%xmm3 + pslldq $4,%xmm0 + pxor %xmm0,%xmm3 + pslldq $4,%xmm0 + pxor %xmm0,%xmm3 + pslldq $4,%xmm0 + pxor %xmm3,%xmm0 + pslld $1,%xmm4 + + pxor %xmm2,%xmm0 + movdqu %xmm0,(%rax) + + decl %r10d + jz .Ldone_key256 + + pshufd $0xff,%xmm0,%xmm2 + pxor %xmm3,%xmm3 +.byte 102,15,56,221,211 + + movdqa %xmm1,%xmm3 + pslldq $4,%xmm1 + pxor %xmm1,%xmm3 + pslldq $4,%xmm1 + pxor %xmm1,%xmm3 + pslldq $4,%xmm1 + pxor %xmm3,%xmm1 + + pxor %xmm1,%xmm2 + movdqu %xmm2,16(%rax) + leaq 32(%rax),%rax + movdqa %xmm2,%xmm1 + + jmp .Loop_key256 + +.Ldone_key256: + movl %esi,16(%rax) + xorl %eax,%eax + jmp .Lenc_key_ret + +.align 16 +.Lbad_keybits: + movq $-2,%rax +.Lenc_key_ret: + pxor %xmm0,%xmm0 + pxor %xmm1,%xmm1 + pxor %xmm2,%xmm2 + pxor %xmm3,%xmm3 + pxor %xmm4,%xmm4 + pxor %xmm5,%xmm5 + addq $8,%rsp +.cfi_adjust_cfa_offset -8 + .byte 
0xf3,0xc3 +.LSEH_end_set_encrypt_key: + +.align 16 +.Lkey_expansion_128: + movups %xmm0,(%rax) + leaq 16(%rax),%rax +.Lkey_expansion_128_cold: + shufps $16,%xmm0,%xmm4 + xorps %xmm4,%xmm0 + shufps $140,%xmm0,%xmm4 + xorps %xmm4,%xmm0 + shufps $255,%xmm1,%xmm1 + xorps %xmm1,%xmm0 + .byte 0xf3,0xc3 + +.align 16 +.Lkey_expansion_192a: + movups %xmm0,(%rax) + leaq 16(%rax),%rax +.Lkey_expansion_192a_cold: + movaps %xmm2,%xmm5 +.Lkey_expansion_192b_warm: + shufps $16,%xmm0,%xmm4 + movdqa %xmm2,%xmm3 + xorps %xmm4,%xmm0 + shufps $140,%xmm0,%xmm4 + pslldq $4,%xmm3 + xorps %xmm4,%xmm0 + pshufd $85,%xmm1,%xmm1 + pxor %xmm3,%xmm2 + pxor %xmm1,%xmm0 + pshufd $255,%xmm0,%xmm3 + pxor %xmm3,%xmm2 + .byte 0xf3,0xc3 + +.align 16 +.Lkey_expansion_192b: + movaps %xmm0,%xmm3 + shufps $68,%xmm0,%xmm5 + movups %xmm5,(%rax) + shufps $78,%xmm2,%xmm3 + movups %xmm3,16(%rax) + leaq 32(%rax),%rax + jmp .Lkey_expansion_192b_warm + +.align 16 +.Lkey_expansion_256a: + movups %xmm2,(%rax) + leaq 16(%rax),%rax +.Lkey_expansion_256a_cold: + shufps $16,%xmm0,%xmm4 + xorps %xmm4,%xmm0 + shufps $140,%xmm0,%xmm4 + xorps %xmm4,%xmm0 + shufps $255,%xmm1,%xmm1 + xorps %xmm1,%xmm0 + .byte 0xf3,0xc3 + +.align 16 +.Lkey_expansion_256b: + movups %xmm0,(%rax) + leaq 16(%rax),%rax + + shufps $16,%xmm2,%xmm4 + xorps %xmm4,%xmm2 + shufps $140,%xmm2,%xmm4 + xorps %xmm4,%xmm2 + shufps $170,%xmm1,%xmm1 + xorps %xmm1,%xmm2 + .byte 0xf3,0xc3 +.cfi_endproc +.size aesni_set_encrypt_key,.-aesni_set_encrypt_key +.size __aesni_set_encrypt_key,.-__aesni_set_encrypt_key +.align 64 +.Lbswap_mask: +.byte 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0 +.Lincrement32: +.long 6,6,6,0 +.Lincrement64: +.long 1,0,0,0 +.Lxts_magic: +.long 0x87,0,1,0 +.Lincrement1: +.byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1 +.Lkey_rotate: +.long 0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d +.Lkey_rotate192: +.long 0x04070605,0x04070605,0x04070605,0x04070605 +.Lkey_rcon1: +.long 1,1,1,1 +.Lkey_rcon1b: +.long 0x1b,0x1b,0x1b,0x1b + +.byte 65,69,83,32,102,111,114,32,73,110,116,101,108,32,65,69,83,45,78,73,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0 +.align 64 diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/vpaes-x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/vpaes-x86_64.S new file mode 100644 index 0000000000..982818f83b --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/vpaes-x86_64.S @@ -0,0 +1,863 @@ +# WARNING: do not edit! +# Generated from openssl/crypto/aes/asm/vpaes-x86_64.pl +# +# Copyright 2011-2020 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the OpenSSL license (the "License"). You may not use +# this file except in compliance with the License. 
You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +.text + + + + + + + + + + + + + + + + +.type _vpaes_encrypt_core,@function +.align 16 +_vpaes_encrypt_core: +.cfi_startproc + movq %rdx,%r9 + movq $16,%r11 + movl 240(%rdx),%eax + movdqa %xmm9,%xmm1 + movdqa .Lk_ipt(%rip),%xmm2 + pandn %xmm0,%xmm1 + movdqu (%r9),%xmm5 + psrld $4,%xmm1 + pand %xmm9,%xmm0 +.byte 102,15,56,0,208 + movdqa .Lk_ipt+16(%rip),%xmm0 +.byte 102,15,56,0,193 + pxor %xmm5,%xmm2 + addq $16,%r9 + pxor %xmm2,%xmm0 + leaq .Lk_mc_backward(%rip),%r10 + jmp .Lenc_entry + +.align 16 +.Lenc_loop: + + movdqa %xmm13,%xmm4 + movdqa %xmm12,%xmm0 +.byte 102,15,56,0,226 +.byte 102,15,56,0,195 + pxor %xmm5,%xmm4 + movdqa %xmm15,%xmm5 + pxor %xmm4,%xmm0 + movdqa -64(%r11,%r10,1),%xmm1 +.byte 102,15,56,0,234 + movdqa (%r11,%r10,1),%xmm4 + movdqa %xmm14,%xmm2 +.byte 102,15,56,0,211 + movdqa %xmm0,%xmm3 + pxor %xmm5,%xmm2 +.byte 102,15,56,0,193 + addq $16,%r9 + pxor %xmm2,%xmm0 +.byte 102,15,56,0,220 + addq $16,%r11 + pxor %xmm0,%xmm3 +.byte 102,15,56,0,193 + andq $0x30,%r11 + subq $1,%rax + pxor %xmm3,%xmm0 + +.Lenc_entry: + + movdqa %xmm9,%xmm1 + movdqa %xmm11,%xmm5 + pandn %xmm0,%xmm1 + psrld $4,%xmm1 + pand %xmm9,%xmm0 +.byte 102,15,56,0,232 + movdqa %xmm10,%xmm3 + pxor %xmm1,%xmm0 +.byte 102,15,56,0,217 + movdqa %xmm10,%xmm4 + pxor %xmm5,%xmm3 +.byte 102,15,56,0,224 + movdqa %xmm10,%xmm2 + pxor %xmm5,%xmm4 +.byte 102,15,56,0,211 + movdqa %xmm10,%xmm3 + pxor %xmm0,%xmm2 +.byte 102,15,56,0,220 + movdqu (%r9),%xmm5 + pxor %xmm1,%xmm3 + jnz .Lenc_loop + + + movdqa -96(%r10),%xmm4 + movdqa -80(%r10),%xmm0 +.byte 102,15,56,0,226 + pxor %xmm5,%xmm4 +.byte 102,15,56,0,195 + movdqa 64(%r11,%r10,1),%xmm1 + pxor %xmm4,%xmm0 +.byte 102,15,56,0,193 + .byte 0xf3,0xc3 +.cfi_endproc +.size _vpaes_encrypt_core,.-_vpaes_encrypt_core + + + + + + +.type _vpaes_decrypt_core,@function +.align 16 +_vpaes_decrypt_core: +.cfi_startproc + movq %rdx,%r9 + movl 240(%rdx),%eax + movdqa %xmm9,%xmm1 + movdqa .Lk_dipt(%rip),%xmm2 + pandn %xmm0,%xmm1 + movq %rax,%r11 + psrld $4,%xmm1 + movdqu (%r9),%xmm5 + shlq $4,%r11 + pand %xmm9,%xmm0 +.byte 102,15,56,0,208 + movdqa .Lk_dipt+16(%rip),%xmm0 + xorq $0x30,%r11 + leaq .Lk_dsbd(%rip),%r10 +.byte 102,15,56,0,193 + andq $0x30,%r11 + pxor %xmm5,%xmm2 + movdqa .Lk_mc_forward+48(%rip),%xmm5 + pxor %xmm2,%xmm0 + addq $16,%r9 + addq %r10,%r11 + jmp .Ldec_entry + +.align 16 +.Ldec_loop: + + + + movdqa -32(%r10),%xmm4 + movdqa -16(%r10),%xmm1 +.byte 102,15,56,0,226 +.byte 102,15,56,0,203 + pxor %xmm4,%xmm0 + movdqa 0(%r10),%xmm4 + pxor %xmm1,%xmm0 + movdqa 16(%r10),%xmm1 + +.byte 102,15,56,0,226 +.byte 102,15,56,0,197 +.byte 102,15,56,0,203 + pxor %xmm4,%xmm0 + movdqa 32(%r10),%xmm4 + pxor %xmm1,%xmm0 + movdqa 48(%r10),%xmm1 + +.byte 102,15,56,0,226 +.byte 102,15,56,0,197 +.byte 102,15,56,0,203 + pxor %xmm4,%xmm0 + movdqa 64(%r10),%xmm4 + pxor %xmm1,%xmm0 + movdqa 80(%r10),%xmm1 + +.byte 102,15,56,0,226 +.byte 102,15,56,0,197 +.byte 102,15,56,0,203 + pxor %xmm4,%xmm0 + addq $16,%r9 +.byte 102,15,58,15,237,12 + pxor %xmm1,%xmm0 + subq $1,%rax + +.Ldec_entry: + + movdqa %xmm9,%xmm1 + pandn %xmm0,%xmm1 + movdqa %xmm11,%xmm2 + psrld $4,%xmm1 + pand %xmm9,%xmm0 +.byte 102,15,56,0,208 + movdqa %xmm10,%xmm3 + pxor %xmm1,%xmm0 +.byte 102,15,56,0,217 + movdqa %xmm10,%xmm4 + pxor %xmm2,%xmm3 +.byte 102,15,56,0,224 + pxor %xmm2,%xmm4 + movdqa %xmm10,%xmm2 +.byte 102,15,56,0,211 + movdqa %xmm10,%xmm3 + pxor %xmm0,%xmm2 +.byte 102,15,56,0,220 + movdqu (%r9),%xmm0 
+ pxor %xmm1,%xmm3 + jnz .Ldec_loop + + + movdqa 96(%r10),%xmm4 +.byte 102,15,56,0,226 + pxor %xmm0,%xmm4 + movdqa 112(%r10),%xmm0 + movdqa -352(%r11),%xmm2 +.byte 102,15,56,0,195 + pxor %xmm4,%xmm0 +.byte 102,15,56,0,194 + .byte 0xf3,0xc3 +.cfi_endproc +.size _vpaes_decrypt_core,.-_vpaes_decrypt_core + + + + + + +.type _vpaes_schedule_core,@function +.align 16 +_vpaes_schedule_core: +.cfi_startproc + + + + + + call _vpaes_preheat + movdqa .Lk_rcon(%rip),%xmm8 + movdqu (%rdi),%xmm0 + + + movdqa %xmm0,%xmm3 + leaq .Lk_ipt(%rip),%r11 + call _vpaes_schedule_transform + movdqa %xmm0,%xmm7 + + leaq .Lk_sr(%rip),%r10 + testq %rcx,%rcx + jnz .Lschedule_am_decrypting + + + movdqu %xmm0,(%rdx) + jmp .Lschedule_go + +.Lschedule_am_decrypting: + + movdqa (%r8,%r10,1),%xmm1 +.byte 102,15,56,0,217 + movdqu %xmm3,(%rdx) + xorq $0x30,%r8 + +.Lschedule_go: + cmpl $192,%esi + ja .Lschedule_256 + je .Lschedule_192 + + + + + + + + + + +.Lschedule_128: + movl $10,%esi + +.Loop_schedule_128: + call _vpaes_schedule_round + decq %rsi + jz .Lschedule_mangle_last + call _vpaes_schedule_mangle + jmp .Loop_schedule_128 + + + + + + + + + + + + + + + + +.align 16 +.Lschedule_192: + movdqu 8(%rdi),%xmm0 + call _vpaes_schedule_transform + movdqa %xmm0,%xmm6 + pxor %xmm4,%xmm4 + movhlps %xmm4,%xmm6 + movl $4,%esi + +.Loop_schedule_192: + call _vpaes_schedule_round +.byte 102,15,58,15,198,8 + call _vpaes_schedule_mangle + call _vpaes_schedule_192_smear + call _vpaes_schedule_mangle + call _vpaes_schedule_round + decq %rsi + jz .Lschedule_mangle_last + call _vpaes_schedule_mangle + call _vpaes_schedule_192_smear + jmp .Loop_schedule_192 + + + + + + + + + + + +.align 16 +.Lschedule_256: + movdqu 16(%rdi),%xmm0 + call _vpaes_schedule_transform + movl $7,%esi + +.Loop_schedule_256: + call _vpaes_schedule_mangle + movdqa %xmm0,%xmm6 + + + call _vpaes_schedule_round + decq %rsi + jz .Lschedule_mangle_last + call _vpaes_schedule_mangle + + + pshufd $0xFF,%xmm0,%xmm0 + movdqa %xmm7,%xmm5 + movdqa %xmm6,%xmm7 + call _vpaes_schedule_low_round + movdqa %xmm5,%xmm7 + + jmp .Loop_schedule_256 + + + + + + + + + + + + +.align 16 +.Lschedule_mangle_last: + + leaq .Lk_deskew(%rip),%r11 + testq %rcx,%rcx + jnz .Lschedule_mangle_last_dec + + + movdqa (%r8,%r10,1),%xmm1 +.byte 102,15,56,0,193 + leaq .Lk_opt(%rip),%r11 + addq $32,%rdx + +.Lschedule_mangle_last_dec: + addq $-16,%rdx + pxor .Lk_s63(%rip),%xmm0 + call _vpaes_schedule_transform + movdqu %xmm0,(%rdx) + + + pxor %xmm0,%xmm0 + pxor %xmm1,%xmm1 + pxor %xmm2,%xmm2 + pxor %xmm3,%xmm3 + pxor %xmm4,%xmm4 + pxor %xmm5,%xmm5 + pxor %xmm6,%xmm6 + pxor %xmm7,%xmm7 + .byte 0xf3,0xc3 +.cfi_endproc +.size _vpaes_schedule_core,.-_vpaes_schedule_core + + + + + + + + + + + + + + + +.type _vpaes_schedule_192_smear,@function +.align 16 +_vpaes_schedule_192_smear: +.cfi_startproc + pshufd $0x80,%xmm6,%xmm1 + pshufd $0xFE,%xmm7,%xmm0 + pxor %xmm1,%xmm6 + pxor %xmm1,%xmm1 + pxor %xmm0,%xmm6 + movdqa %xmm6,%xmm0 + movhlps %xmm1,%xmm6 + .byte 0xf3,0xc3 +.cfi_endproc +.size _vpaes_schedule_192_smear,.-_vpaes_schedule_192_smear + + + + + + + + + + + + + + + + + + + +.type _vpaes_schedule_round,@function +.align 16 +_vpaes_schedule_round: +.cfi_startproc + + pxor %xmm1,%xmm1 +.byte 102,65,15,58,15,200,15 +.byte 102,69,15,58,15,192,15 + pxor %xmm1,%xmm7 + + + pshufd $0xFF,%xmm0,%xmm0 +.byte 102,15,58,15,192,1 + + + + +_vpaes_schedule_low_round: + + movdqa %xmm7,%xmm1 + pslldq $4,%xmm7 + pxor %xmm1,%xmm7 + movdqa %xmm7,%xmm1 + pslldq $8,%xmm7 + pxor %xmm1,%xmm7 + pxor .Lk_s63(%rip),%xmm7 + + + movdqa 
%xmm9,%xmm1 + pandn %xmm0,%xmm1 + psrld $4,%xmm1 + pand %xmm9,%xmm0 + movdqa %xmm11,%xmm2 +.byte 102,15,56,0,208 + pxor %xmm1,%xmm0 + movdqa %xmm10,%xmm3 +.byte 102,15,56,0,217 + pxor %xmm2,%xmm3 + movdqa %xmm10,%xmm4 +.byte 102,15,56,0,224 + pxor %xmm2,%xmm4 + movdqa %xmm10,%xmm2 +.byte 102,15,56,0,211 + pxor %xmm0,%xmm2 + movdqa %xmm10,%xmm3 +.byte 102,15,56,0,220 + pxor %xmm1,%xmm3 + movdqa %xmm13,%xmm4 +.byte 102,15,56,0,226 + movdqa %xmm12,%xmm0 +.byte 102,15,56,0,195 + pxor %xmm4,%xmm0 + + + pxor %xmm7,%xmm0 + movdqa %xmm0,%xmm7 + .byte 0xf3,0xc3 +.cfi_endproc +.size _vpaes_schedule_round,.-_vpaes_schedule_round + + + + + + + + + + +.type _vpaes_schedule_transform,@function +.align 16 +_vpaes_schedule_transform: +.cfi_startproc + movdqa %xmm9,%xmm1 + pandn %xmm0,%xmm1 + psrld $4,%xmm1 + pand %xmm9,%xmm0 + movdqa (%r11),%xmm2 +.byte 102,15,56,0,208 + movdqa 16(%r11),%xmm0 +.byte 102,15,56,0,193 + pxor %xmm2,%xmm0 + .byte 0xf3,0xc3 +.cfi_endproc +.size _vpaes_schedule_transform,.-_vpaes_schedule_transform + + + + + + + + + + + + + + + + + + + + + + + + +.type _vpaes_schedule_mangle,@function +.align 16 +_vpaes_schedule_mangle: +.cfi_startproc + movdqa %xmm0,%xmm4 + movdqa .Lk_mc_forward(%rip),%xmm5 + testq %rcx,%rcx + jnz .Lschedule_mangle_dec + + + addq $16,%rdx + pxor .Lk_s63(%rip),%xmm4 +.byte 102,15,56,0,229 + movdqa %xmm4,%xmm3 +.byte 102,15,56,0,229 + pxor %xmm4,%xmm3 +.byte 102,15,56,0,229 + pxor %xmm4,%xmm3 + + jmp .Lschedule_mangle_both +.align 16 +.Lschedule_mangle_dec: + + leaq .Lk_dksd(%rip),%r11 + movdqa %xmm9,%xmm1 + pandn %xmm4,%xmm1 + psrld $4,%xmm1 + pand %xmm9,%xmm4 + + movdqa 0(%r11),%xmm2 +.byte 102,15,56,0,212 + movdqa 16(%r11),%xmm3 +.byte 102,15,56,0,217 + pxor %xmm2,%xmm3 +.byte 102,15,56,0,221 + + movdqa 32(%r11),%xmm2 +.byte 102,15,56,0,212 + pxor %xmm3,%xmm2 + movdqa 48(%r11),%xmm3 +.byte 102,15,56,0,217 + pxor %xmm2,%xmm3 +.byte 102,15,56,0,221 + + movdqa 64(%r11),%xmm2 +.byte 102,15,56,0,212 + pxor %xmm3,%xmm2 + movdqa 80(%r11),%xmm3 +.byte 102,15,56,0,217 + pxor %xmm2,%xmm3 +.byte 102,15,56,0,221 + + movdqa 96(%r11),%xmm2 +.byte 102,15,56,0,212 + pxor %xmm3,%xmm2 + movdqa 112(%r11),%xmm3 +.byte 102,15,56,0,217 + pxor %xmm2,%xmm3 + + addq $-16,%rdx + +.Lschedule_mangle_both: + movdqa (%r8,%r10,1),%xmm1 +.byte 102,15,56,0,217 + addq $-16,%r8 + andq $0x30,%r8 + movdqu %xmm3,(%rdx) + .byte 0xf3,0xc3 +.cfi_endproc +.size _vpaes_schedule_mangle,.-_vpaes_schedule_mangle + + + + +.globl vpaes_set_encrypt_key +.type vpaes_set_encrypt_key,@function +.align 16 +vpaes_set_encrypt_key: +.cfi_startproc + movl %esi,%eax + shrl $5,%eax + addl $5,%eax + movl %eax,240(%rdx) + + movl $0,%ecx + movl $0x30,%r8d + call _vpaes_schedule_core + xorl %eax,%eax + .byte 0xf3,0xc3 +.cfi_endproc +.size vpaes_set_encrypt_key,.-vpaes_set_encrypt_key + +.globl vpaes_set_decrypt_key +.type vpaes_set_decrypt_key,@function +.align 16 +vpaes_set_decrypt_key: +.cfi_startproc + movl %esi,%eax + shrl $5,%eax + addl $5,%eax + movl %eax,240(%rdx) + shll $4,%eax + leaq 16(%rdx,%rax,1),%rdx + + movl $1,%ecx + movl %esi,%r8d + shrl $1,%r8d + andl $32,%r8d + xorl $32,%r8d + call _vpaes_schedule_core + xorl %eax,%eax + .byte 0xf3,0xc3 +.cfi_endproc +.size vpaes_set_decrypt_key,.-vpaes_set_decrypt_key + +.globl vpaes_encrypt +.type vpaes_encrypt,@function +.align 16 +vpaes_encrypt: +.cfi_startproc + movdqu (%rdi),%xmm0 + call _vpaes_preheat + call _vpaes_encrypt_core + movdqu %xmm0,(%rsi) + .byte 0xf3,0xc3 +.cfi_endproc +.size vpaes_encrypt,.-vpaes_encrypt + +.globl vpaes_decrypt +.type 
vpaes_decrypt,@function +.align 16 +vpaes_decrypt: +.cfi_startproc + movdqu (%rdi),%xmm0 + call _vpaes_preheat + call _vpaes_decrypt_core + movdqu %xmm0,(%rsi) + .byte 0xf3,0xc3 +.cfi_endproc +.size vpaes_decrypt,.-vpaes_decrypt +.globl vpaes_cbc_encrypt +.type vpaes_cbc_encrypt,@function +.align 16 +vpaes_cbc_encrypt: +.cfi_startproc + xchgq %rcx,%rdx + subq $16,%rcx + jc .Lcbc_abort + movdqu (%r8),%xmm6 + subq %rdi,%rsi + call _vpaes_preheat + cmpl $0,%r9d + je .Lcbc_dec_loop + jmp .Lcbc_enc_loop +.align 16 +.Lcbc_enc_loop: + movdqu (%rdi),%xmm0 + pxor %xmm6,%xmm0 + call _vpaes_encrypt_core + movdqa %xmm0,%xmm6 + movdqu %xmm0,(%rsi,%rdi,1) + leaq 16(%rdi),%rdi + subq $16,%rcx + jnc .Lcbc_enc_loop + jmp .Lcbc_done +.align 16 +.Lcbc_dec_loop: + movdqu (%rdi),%xmm0 + movdqa %xmm0,%xmm7 + call _vpaes_decrypt_core + pxor %xmm6,%xmm0 + movdqa %xmm7,%xmm6 + movdqu %xmm0,(%rsi,%rdi,1) + leaq 16(%rdi),%rdi + subq $16,%rcx + jnc .Lcbc_dec_loop +.Lcbc_done: + movdqu %xmm6,(%r8) +.Lcbc_abort: + .byte 0xf3,0xc3 +.cfi_endproc +.size vpaes_cbc_encrypt,.-vpaes_cbc_encrypt + + + + + + +.type _vpaes_preheat,@function +.align 16 +_vpaes_preheat: +.cfi_startproc + leaq .Lk_s0F(%rip),%r10 + movdqa -32(%r10),%xmm10 + movdqa -16(%r10),%xmm11 + movdqa 0(%r10),%xmm9 + movdqa 48(%r10),%xmm13 + movdqa 64(%r10),%xmm12 + movdqa 80(%r10),%xmm15 + movdqa 96(%r10),%xmm14 + .byte 0xf3,0xc3 +.cfi_endproc +.size _vpaes_preheat,.-_vpaes_preheat + + + + + +.type _vpaes_consts,@object +.align 64 +_vpaes_consts: +.Lk_inv: +.quad 0x0E05060F0D080180, 0x040703090A0B0C02 +.quad 0x01040A060F0B0780, 0x030D0E0C02050809 + +.Lk_s0F: +.quad 0x0F0F0F0F0F0F0F0F, 0x0F0F0F0F0F0F0F0F + +.Lk_ipt: +.quad 0xC2B2E8985A2A7000, 0xCABAE09052227808 +.quad 0x4C01307D317C4D00, 0xCD80B1FCB0FDCC81 + +.Lk_sb1: +.quad 0xB19BE18FCB503E00, 0xA5DF7A6E142AF544 +.quad 0x3618D415FAE22300, 0x3BF7CCC10D2ED9EF +.Lk_sb2: +.quad 0xE27A93C60B712400, 0x5EB7E955BC982FCD +.quad 0x69EB88400AE12900, 0xC2A163C8AB82234A +.Lk_sbo: +.quad 0xD0D26D176FBDC700, 0x15AABF7AC502A878 +.quad 0xCFE474A55FBB6A00, 0x8E1E90D1412B35FA + +.Lk_mc_forward: +.quad 0x0407060500030201, 0x0C0F0E0D080B0A09 +.quad 0x080B0A0904070605, 0x000302010C0F0E0D +.quad 0x0C0F0E0D080B0A09, 0x0407060500030201 +.quad 0x000302010C0F0E0D, 0x080B0A0904070605 + +.Lk_mc_backward: +.quad 0x0605040702010003, 0x0E0D0C0F0A09080B +.quad 0x020100030E0D0C0F, 0x0A09080B06050407 +.quad 0x0E0D0C0F0A09080B, 0x0605040702010003 +.quad 0x0A09080B06050407, 0x020100030E0D0C0F + +.Lk_sr: +.quad 0x0706050403020100, 0x0F0E0D0C0B0A0908 +.quad 0x030E09040F0A0500, 0x0B06010C07020D08 +.quad 0x0F060D040B020900, 0x070E050C030A0108 +.quad 0x0B0E0104070A0D00, 0x0306090C0F020508 + +.Lk_rcon: +.quad 0x1F8391B9AF9DEEB6, 0x702A98084D7C7D81 + +.Lk_s63: +.quad 0x5B5B5B5B5B5B5B5B, 0x5B5B5B5B5B5B5B5B + +.Lk_opt: +.quad 0xFF9F4929D6B66000, 0xF7974121DEBE6808 +.quad 0x01EDBD5150BCEC00, 0xE10D5DB1B05C0CE0 + +.Lk_deskew: +.quad 0x07E4A34047A4E300, 0x1DFEB95A5DBEF91A +.quad 0x5F36B5DC83EA6900, 0x2841C2ABF49D1E77 + + + + + +.Lk_dksd: +.quad 0xFEB91A5DA3E44700, 0x0740E3A45A1DBEF9 +.quad 0x41C277F4B5368300, 0x5FDC69EAAB289D1E +.Lk_dksb: +.quad 0x9A4FCA1F8550D500, 0x03D653861CC94C99 +.quad 0x115BEDA7B6FC4A00, 0xD993256F7E3482C8 +.Lk_dkse: +.quad 0xD5031CCA1FC9D600, 0x53859A4C994F5086 +.quad 0xA23196054FDC7BE8, 0xCD5EF96A20B31487 +.Lk_dks9: +.quad 0xB6116FC87ED9A700, 0x4AED933482255BFC +.quad 0x4576516227143300, 0x8BB89FACE9DAFDCE + + + + + +.Lk_dipt: +.quad 0x0F505B040B545F00, 0x154A411E114E451A +.quad 0x86E383E660056500, 0x12771772F491F194 + +.Lk_dsb9: 
+.quad 0x851C03539A86D600, 0xCAD51F504F994CC9 +.quad 0xC03B1789ECD74900, 0x725E2C9EB2FBA565 +.Lk_dsbd: +.quad 0x7D57CCDFE6B1A200, 0xF56E9B13882A4439 +.quad 0x3CE2FAF724C6CB00, 0x2931180D15DEEFD3 +.Lk_dsbb: +.quad 0xD022649296B44200, 0x602646F6B0F2D404 +.quad 0xC19498A6CD596700, 0xF3FF0C3E3255AA6B +.Lk_dsbe: +.quad 0x46F2929626D4D000, 0x2242600464B4F6B0 +.quad 0x0C55A6CDFFAAC100, 0x9467F36B98593E32 +.Lk_dsbo: +.quad 0x1387EA537EF94000, 0xC7AA6DB9D4943E2D +.quad 0x12D7560F93441D00, 0xCA4B8159D8C58E9C +.byte 86,101,99,116,111,114,32,80,101,114,109,117,116,97,116,105,111,110,32,65,69,83,32,102,111,114,32,120,56,54,95,54,52,47,83,83,83,69,51,44,32,77,105,107,101,32,72,97,109,98,117,114,103,32,40,83,116,97,110,102,111,114,100,32,85,110,105,118,101,114,115,105,116,121,41,0 +.align 64 +.size _vpaes_consts,.-_vpaes_consts diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/aesni-gcm-x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/aesni-gcm-x86_64.S new file mode 100644 index 0000000000..1201f3427a --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/aesni-gcm-x86_64.S @@ -0,0 +1,29 @@ +# WARNING: do not edit! +# Generated from openssl/crypto/modes/asm/aesni-gcm-x86_64.pl +# +# Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the OpenSSL license (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +.text + +.globl aesni_gcm_encrypt +.type aesni_gcm_encrypt,@function +aesni_gcm_encrypt: +.cfi_startproc + xorl %eax,%eax + .byte 0xf3,0xc3 +.cfi_endproc +.size aesni_gcm_encrypt,.-aesni_gcm_encrypt + +.globl aesni_gcm_decrypt +.type aesni_gcm_decrypt,@function +aesni_gcm_decrypt: +.cfi_startproc + xorl %eax,%eax + .byte 0xf3,0xc3 +.cfi_endproc +.size aesni_gcm_decrypt,.-aesni_gcm_decrypt diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/ghash-x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/ghash-x86_64.S new file mode 100644 index 0000000000..3fcaa4b2ef --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/ghash-x86_64.S @@ -0,0 +1,1386 @@ +# WARNING: do not edit! +# Generated from openssl/crypto/modes/asm/ghash-x86_64.pl +# +# Copyright 2010-2020 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the OpenSSL license (the "License"). You may not use +# this file except in compliance with the License. 
You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +.text + + +.globl gcm_gmult_4bit +.type gcm_gmult_4bit,@function +.align 16 +gcm_gmult_4bit: +.cfi_startproc + pushq %rbx +.cfi_adjust_cfa_offset 8 +.cfi_offset %rbx,-16 + pushq %rbp +.cfi_adjust_cfa_offset 8 +.cfi_offset %rbp,-24 + pushq %r12 +.cfi_adjust_cfa_offset 8 +.cfi_offset %r12,-32 + pushq %r13 +.cfi_adjust_cfa_offset 8 +.cfi_offset %r13,-40 + pushq %r14 +.cfi_adjust_cfa_offset 8 +.cfi_offset %r14,-48 + pushq %r15 +.cfi_adjust_cfa_offset 8 +.cfi_offset %r15,-56 + subq $280,%rsp +.cfi_adjust_cfa_offset 280 +.Lgmult_prologue: + + movzbq 15(%rdi),%r8 + leaq .Lrem_4bit(%rip),%r11 + xorq %rax,%rax + xorq %rbx,%rbx + movb %r8b,%al + movb %r8b,%bl + shlb $4,%al + movq $14,%rcx + movq 8(%rsi,%rax,1),%r8 + movq (%rsi,%rax,1),%r9 + andb $0xf0,%bl + movq %r8,%rdx + jmp .Loop1 + +.align 16 +.Loop1: + shrq $4,%r8 + andq $0xf,%rdx + movq %r9,%r10 + movb (%rdi,%rcx,1),%al + shrq $4,%r9 + xorq 8(%rsi,%rbx,1),%r8 + shlq $60,%r10 + xorq (%rsi,%rbx,1),%r9 + movb %al,%bl + xorq (%r11,%rdx,8),%r9 + movq %r8,%rdx + shlb $4,%al + xorq %r10,%r8 + decq %rcx + js .Lbreak1 + + shrq $4,%r8 + andq $0xf,%rdx + movq %r9,%r10 + shrq $4,%r9 + xorq 8(%rsi,%rax,1),%r8 + shlq $60,%r10 + xorq (%rsi,%rax,1),%r9 + andb $0xf0,%bl + xorq (%r11,%rdx,8),%r9 + movq %r8,%rdx + xorq %r10,%r8 + jmp .Loop1 + +.align 16 +.Lbreak1: + shrq $4,%r8 + andq $0xf,%rdx + movq %r9,%r10 + shrq $4,%r9 + xorq 8(%rsi,%rax,1),%r8 + shlq $60,%r10 + xorq (%rsi,%rax,1),%r9 + andb $0xf0,%bl + xorq (%r11,%rdx,8),%r9 + movq %r8,%rdx + xorq %r10,%r8 + + shrq $4,%r8 + andq $0xf,%rdx + movq %r9,%r10 + shrq $4,%r9 + xorq 8(%rsi,%rbx,1),%r8 + shlq $60,%r10 + xorq (%rsi,%rbx,1),%r9 + xorq %r10,%r8 + xorq (%r11,%rdx,8),%r9 + + bswapq %r8 + bswapq %r9 + movq %r8,8(%rdi) + movq %r9,(%rdi) + + leaq 280+48(%rsp),%rsi +.cfi_def_cfa %rsi,8 + movq -8(%rsi),%rbx +.cfi_restore %rbx + leaq (%rsi),%rsp +.cfi_def_cfa_register %rsp +.Lgmult_epilogue: + .byte 0xf3,0xc3 +.cfi_endproc +.size gcm_gmult_4bit,.-gcm_gmult_4bit +.globl gcm_ghash_4bit +.type gcm_ghash_4bit,@function +.align 16 +gcm_ghash_4bit: +.cfi_startproc + pushq %rbx +.cfi_adjust_cfa_offset 8 +.cfi_offset %rbx,-16 + pushq %rbp +.cfi_adjust_cfa_offset 8 +.cfi_offset %rbp,-24 + pushq %r12 +.cfi_adjust_cfa_offset 8 +.cfi_offset %r12,-32 + pushq %r13 +.cfi_adjust_cfa_offset 8 +.cfi_offset %r13,-40 + pushq %r14 +.cfi_adjust_cfa_offset 8 +.cfi_offset %r14,-48 + pushq %r15 +.cfi_adjust_cfa_offset 8 +.cfi_offset %r15,-56 + subq $280,%rsp +.cfi_adjust_cfa_offset 280 +.Lghash_prologue: + movq %rdx,%r14 + movq %rcx,%r15 + subq $-128,%rsi + leaq 16+128(%rsp),%rbp + xorl %edx,%edx + movq 0+0-128(%rsi),%r8 + movq 0+8-128(%rsi),%rax + movb %al,%dl + shrq $4,%rax + movq %r8,%r10 + shrq $4,%r8 + movq 16+0-128(%rsi),%r9 + shlb $4,%dl + movq 16+8-128(%rsi),%rbx + shlq $60,%r10 + movb %dl,0(%rsp) + orq %r10,%rax + movb %bl,%dl + shrq $4,%rbx + movq %r9,%r10 + shrq $4,%r9 + movq %r8,0(%rbp) + movq 32+0-128(%rsi),%r8 + shlb $4,%dl + movq %rax,0-128(%rbp) + movq 32+8-128(%rsi),%rax + shlq $60,%r10 + movb %dl,1(%rsp) + orq %r10,%rbx + movb %al,%dl + shrq $4,%rax + movq %r8,%r10 + shrq $4,%r8 + movq %r9,8(%rbp) + movq 48+0-128(%rsi),%r9 + shlb $4,%dl + movq %rbx,8-128(%rbp) + movq 48+8-128(%rsi),%rbx + shlq $60,%r10 + movb %dl,2(%rsp) + orq %r10,%rax + movb %bl,%dl + shrq $4,%rbx + movq %r9,%r10 + shrq $4,%r9 + movq %r8,16(%rbp) + movq 64+0-128(%rsi),%r8 + shlb $4,%dl + movq %rax,16-128(%rbp) + movq 
64+8-128(%rsi),%rax + shlq $60,%r10 + movb %dl,3(%rsp) + orq %r10,%rbx + movb %al,%dl + shrq $4,%rax + movq %r8,%r10 + shrq $4,%r8 + movq %r9,24(%rbp) + movq 80+0-128(%rsi),%r9 + shlb $4,%dl + movq %rbx,24-128(%rbp) + movq 80+8-128(%rsi),%rbx + shlq $60,%r10 + movb %dl,4(%rsp) + orq %r10,%rax + movb %bl,%dl + shrq $4,%rbx + movq %r9,%r10 + shrq $4,%r9 + movq %r8,32(%rbp) + movq 96+0-128(%rsi),%r8 + shlb $4,%dl + movq %rax,32-128(%rbp) + movq 96+8-128(%rsi),%rax + shlq $60,%r10 + movb %dl,5(%rsp) + orq %r10,%rbx + movb %al,%dl + shrq $4,%rax + movq %r8,%r10 + shrq $4,%r8 + movq %r9,40(%rbp) + movq 112+0-128(%rsi),%r9 + shlb $4,%dl + movq %rbx,40-128(%rbp) + movq 112+8-128(%rsi),%rbx + shlq $60,%r10 + movb %dl,6(%rsp) + orq %r10,%rax + movb %bl,%dl + shrq $4,%rbx + movq %r9,%r10 + shrq $4,%r9 + movq %r8,48(%rbp) + movq 128+0-128(%rsi),%r8 + shlb $4,%dl + movq %rax,48-128(%rbp) + movq 128+8-128(%rsi),%rax + shlq $60,%r10 + movb %dl,7(%rsp) + orq %r10,%rbx + movb %al,%dl + shrq $4,%rax + movq %r8,%r10 + shrq $4,%r8 + movq %r9,56(%rbp) + movq 144+0-128(%rsi),%r9 + shlb $4,%dl + movq %rbx,56-128(%rbp) + movq 144+8-128(%rsi),%rbx + shlq $60,%r10 + movb %dl,8(%rsp) + orq %r10,%rax + movb %bl,%dl + shrq $4,%rbx + movq %r9,%r10 + shrq $4,%r9 + movq %r8,64(%rbp) + movq 160+0-128(%rsi),%r8 + shlb $4,%dl + movq %rax,64-128(%rbp) + movq 160+8-128(%rsi),%rax + shlq $60,%r10 + movb %dl,9(%rsp) + orq %r10,%rbx + movb %al,%dl + shrq $4,%rax + movq %r8,%r10 + shrq $4,%r8 + movq %r9,72(%rbp) + movq 176+0-128(%rsi),%r9 + shlb $4,%dl + movq %rbx,72-128(%rbp) + movq 176+8-128(%rsi),%rbx + shlq $60,%r10 + movb %dl,10(%rsp) + orq %r10,%rax + movb %bl,%dl + shrq $4,%rbx + movq %r9,%r10 + shrq $4,%r9 + movq %r8,80(%rbp) + movq 192+0-128(%rsi),%r8 + shlb $4,%dl + movq %rax,80-128(%rbp) + movq 192+8-128(%rsi),%rax + shlq $60,%r10 + movb %dl,11(%rsp) + orq %r10,%rbx + movb %al,%dl + shrq $4,%rax + movq %r8,%r10 + shrq $4,%r8 + movq %r9,88(%rbp) + movq 208+0-128(%rsi),%r9 + shlb $4,%dl + movq %rbx,88-128(%rbp) + movq 208+8-128(%rsi),%rbx + shlq $60,%r10 + movb %dl,12(%rsp) + orq %r10,%rax + movb %bl,%dl + shrq $4,%rbx + movq %r9,%r10 + shrq $4,%r9 + movq %r8,96(%rbp) + movq 224+0-128(%rsi),%r8 + shlb $4,%dl + movq %rax,96-128(%rbp) + movq 224+8-128(%rsi),%rax + shlq $60,%r10 + movb %dl,13(%rsp) + orq %r10,%rbx + movb %al,%dl + shrq $4,%rax + movq %r8,%r10 + shrq $4,%r8 + movq %r9,104(%rbp) + movq 240+0-128(%rsi),%r9 + shlb $4,%dl + movq %rbx,104-128(%rbp) + movq 240+8-128(%rsi),%rbx + shlq $60,%r10 + movb %dl,14(%rsp) + orq %r10,%rax + movb %bl,%dl + shrq $4,%rbx + movq %r9,%r10 + shrq $4,%r9 + movq %r8,112(%rbp) + shlb $4,%dl + movq %rax,112-128(%rbp) + shlq $60,%r10 + movb %dl,15(%rsp) + orq %r10,%rbx + movq %r9,120(%rbp) + movq %rbx,120-128(%rbp) + addq $-128,%rsi + movq 8(%rdi),%r8 + movq 0(%rdi),%r9 + addq %r14,%r15 + leaq .Lrem_8bit(%rip),%r11 + jmp .Louter_loop +.align 16 +.Louter_loop: + xorq (%r14),%r9 + movq 8(%r14),%rdx + leaq 16(%r14),%r14 + xorq %r8,%rdx + movq %r9,(%rdi) + movq %rdx,8(%rdi) + shrq $32,%rdx + xorq %rax,%rax + roll $8,%edx + movb %dl,%al + movzbl %dl,%ebx + shlb $4,%al + shrl $4,%ebx + roll $8,%edx + movq 8(%rsi,%rax,1),%r8 + movq (%rsi,%rax,1),%r9 + movb %dl,%al + movzbl %dl,%ecx + shlb $4,%al + movzbq (%rsp,%rbx,1),%r12 + shrl $4,%ecx + xorq %r8,%r12 + movq %r9,%r10 + shrq $8,%r8 + movzbq %r12b,%r12 + shrq $8,%r9 + xorq -128(%rbp,%rbx,8),%r8 + shlq $56,%r10 + xorq (%rbp,%rbx,8),%r9 + roll $8,%edx + xorq 8(%rsi,%rax,1),%r8 + xorq (%rsi,%rax,1),%r9 + movb %dl,%al + xorq %r10,%r8 + movzwq 
(%r11,%r12,2),%r12 + movzbl %dl,%ebx + shlb $4,%al + movzbq (%rsp,%rcx,1),%r13 + shrl $4,%ebx + shlq $48,%r12 + xorq %r8,%r13 + movq %r9,%r10 + xorq %r12,%r9 + shrq $8,%r8 + movzbq %r13b,%r13 + shrq $8,%r9 + xorq -128(%rbp,%rcx,8),%r8 + shlq $56,%r10 + xorq (%rbp,%rcx,8),%r9 + roll $8,%edx + xorq 8(%rsi,%rax,1),%r8 + xorq (%rsi,%rax,1),%r9 + movb %dl,%al + xorq %r10,%r8 + movzwq (%r11,%r13,2),%r13 + movzbl %dl,%ecx + shlb $4,%al + movzbq (%rsp,%rbx,1),%r12 + shrl $4,%ecx + shlq $48,%r13 + xorq %r8,%r12 + movq %r9,%r10 + xorq %r13,%r9 + shrq $8,%r8 + movzbq %r12b,%r12 + movl 8(%rdi),%edx + shrq $8,%r9 + xorq -128(%rbp,%rbx,8),%r8 + shlq $56,%r10 + xorq (%rbp,%rbx,8),%r9 + roll $8,%edx + xorq 8(%rsi,%rax,1),%r8 + xorq (%rsi,%rax,1),%r9 + movb %dl,%al + xorq %r10,%r8 + movzwq (%r11,%r12,2),%r12 + movzbl %dl,%ebx + shlb $4,%al + movzbq (%rsp,%rcx,1),%r13 + shrl $4,%ebx + shlq $48,%r12 + xorq %r8,%r13 + movq %r9,%r10 + xorq %r12,%r9 + shrq $8,%r8 + movzbq %r13b,%r13 + shrq $8,%r9 + xorq -128(%rbp,%rcx,8),%r8 + shlq $56,%r10 + xorq (%rbp,%rcx,8),%r9 + roll $8,%edx + xorq 8(%rsi,%rax,1),%r8 + xorq (%rsi,%rax,1),%r9 + movb %dl,%al + xorq %r10,%r8 + movzwq (%r11,%r13,2),%r13 + movzbl %dl,%ecx + shlb $4,%al + movzbq (%rsp,%rbx,1),%r12 + shrl $4,%ecx + shlq $48,%r13 + xorq %r8,%r12 + movq %r9,%r10 + xorq %r13,%r9 + shrq $8,%r8 + movzbq %r12b,%r12 + shrq $8,%r9 + xorq -128(%rbp,%rbx,8),%r8 + shlq $56,%r10 + xorq (%rbp,%rbx,8),%r9 + roll $8,%edx + xorq 8(%rsi,%rax,1),%r8 + xorq (%rsi,%rax,1),%r9 + movb %dl,%al + xorq %r10,%r8 + movzwq (%r11,%r12,2),%r12 + movzbl %dl,%ebx + shlb $4,%al + movzbq (%rsp,%rcx,1),%r13 + shrl $4,%ebx + shlq $48,%r12 + xorq %r8,%r13 + movq %r9,%r10 + xorq %r12,%r9 + shrq $8,%r8 + movzbq %r13b,%r13 + shrq $8,%r9 + xorq -128(%rbp,%rcx,8),%r8 + shlq $56,%r10 + xorq (%rbp,%rcx,8),%r9 + roll $8,%edx + xorq 8(%rsi,%rax,1),%r8 + xorq (%rsi,%rax,1),%r9 + movb %dl,%al + xorq %r10,%r8 + movzwq (%r11,%r13,2),%r13 + movzbl %dl,%ecx + shlb $4,%al + movzbq (%rsp,%rbx,1),%r12 + shrl $4,%ecx + shlq $48,%r13 + xorq %r8,%r12 + movq %r9,%r10 + xorq %r13,%r9 + shrq $8,%r8 + movzbq %r12b,%r12 + movl 4(%rdi),%edx + shrq $8,%r9 + xorq -128(%rbp,%rbx,8),%r8 + shlq $56,%r10 + xorq (%rbp,%rbx,8),%r9 + roll $8,%edx + xorq 8(%rsi,%rax,1),%r8 + xorq (%rsi,%rax,1),%r9 + movb %dl,%al + xorq %r10,%r8 + movzwq (%r11,%r12,2),%r12 + movzbl %dl,%ebx + shlb $4,%al + movzbq (%rsp,%rcx,1),%r13 + shrl $4,%ebx + shlq $48,%r12 + xorq %r8,%r13 + movq %r9,%r10 + xorq %r12,%r9 + shrq $8,%r8 + movzbq %r13b,%r13 + shrq $8,%r9 + xorq -128(%rbp,%rcx,8),%r8 + shlq $56,%r10 + xorq (%rbp,%rcx,8),%r9 + roll $8,%edx + xorq 8(%rsi,%rax,1),%r8 + xorq (%rsi,%rax,1),%r9 + movb %dl,%al + xorq %r10,%r8 + movzwq (%r11,%r13,2),%r13 + movzbl %dl,%ecx + shlb $4,%al + movzbq (%rsp,%rbx,1),%r12 + shrl $4,%ecx + shlq $48,%r13 + xorq %r8,%r12 + movq %r9,%r10 + xorq %r13,%r9 + shrq $8,%r8 + movzbq %r12b,%r12 + shrq $8,%r9 + xorq -128(%rbp,%rbx,8),%r8 + shlq $56,%r10 + xorq (%rbp,%rbx,8),%r9 + roll $8,%edx + xorq 8(%rsi,%rax,1),%r8 + xorq (%rsi,%rax,1),%r9 + movb %dl,%al + xorq %r10,%r8 + movzwq (%r11,%r12,2),%r12 + movzbl %dl,%ebx + shlb $4,%al + movzbq (%rsp,%rcx,1),%r13 + shrl $4,%ebx + shlq $48,%r12 + xorq %r8,%r13 + movq %r9,%r10 + xorq %r12,%r9 + shrq $8,%r8 + movzbq %r13b,%r13 + shrq $8,%r9 + xorq -128(%rbp,%rcx,8),%r8 + shlq $56,%r10 + xorq (%rbp,%rcx,8),%r9 + roll $8,%edx + xorq 8(%rsi,%rax,1),%r8 + xorq (%rsi,%rax,1),%r9 + movb %dl,%al + xorq %r10,%r8 + movzwq (%r11,%r13,2),%r13 + movzbl %dl,%ecx + shlb $4,%al + movzbq 
(%rsp,%rbx,1),%r12 + shrl $4,%ecx + shlq $48,%r13 + xorq %r8,%r12 + movq %r9,%r10 + xorq %r13,%r9 + shrq $8,%r8 + movzbq %r12b,%r12 + movl 0(%rdi),%edx + shrq $8,%r9 + xorq -128(%rbp,%rbx,8),%r8 + shlq $56,%r10 + xorq (%rbp,%rbx,8),%r9 + roll $8,%edx + xorq 8(%rsi,%rax,1),%r8 + xorq (%rsi,%rax,1),%r9 + movb %dl,%al + xorq %r10,%r8 + movzwq (%r11,%r12,2),%r12 + movzbl %dl,%ebx + shlb $4,%al + movzbq (%rsp,%rcx,1),%r13 + shrl $4,%ebx + shlq $48,%r12 + xorq %r8,%r13 + movq %r9,%r10 + xorq %r12,%r9 + shrq $8,%r8 + movzbq %r13b,%r13 + shrq $8,%r9 + xorq -128(%rbp,%rcx,8),%r8 + shlq $56,%r10 + xorq (%rbp,%rcx,8),%r9 + roll $8,%edx + xorq 8(%rsi,%rax,1),%r8 + xorq (%rsi,%rax,1),%r9 + movb %dl,%al + xorq %r10,%r8 + movzwq (%r11,%r13,2),%r13 + movzbl %dl,%ecx + shlb $4,%al + movzbq (%rsp,%rbx,1),%r12 + shrl $4,%ecx + shlq $48,%r13 + xorq %r8,%r12 + movq %r9,%r10 + xorq %r13,%r9 + shrq $8,%r8 + movzbq %r12b,%r12 + shrq $8,%r9 + xorq -128(%rbp,%rbx,8),%r8 + shlq $56,%r10 + xorq (%rbp,%rbx,8),%r9 + roll $8,%edx + xorq 8(%rsi,%rax,1),%r8 + xorq (%rsi,%rax,1),%r9 + movb %dl,%al + xorq %r10,%r8 + movzwq (%r11,%r12,2),%r12 + movzbl %dl,%ebx + shlb $4,%al + movzbq (%rsp,%rcx,1),%r13 + shrl $4,%ebx + shlq $48,%r12 + xorq %r8,%r13 + movq %r9,%r10 + xorq %r12,%r9 + shrq $8,%r8 + movzbq %r13b,%r13 + shrq $8,%r9 + xorq -128(%rbp,%rcx,8),%r8 + shlq $56,%r10 + xorq (%rbp,%rcx,8),%r9 + roll $8,%edx + xorq 8(%rsi,%rax,1),%r8 + xorq (%rsi,%rax,1),%r9 + movb %dl,%al + xorq %r10,%r8 + movzwq (%r11,%r13,2),%r13 + movzbl %dl,%ecx + shlb $4,%al + movzbq (%rsp,%rbx,1),%r12 + andl $240,%ecx + shlq $48,%r13 + xorq %r8,%r12 + movq %r9,%r10 + xorq %r13,%r9 + shrq $8,%r8 + movzbq %r12b,%r12 + movl -4(%rdi),%edx + shrq $8,%r9 + xorq -128(%rbp,%rbx,8),%r8 + shlq $56,%r10 + xorq (%rbp,%rbx,8),%r9 + movzwq (%r11,%r12,2),%r12 + xorq 8(%rsi,%rax,1),%r8 + xorq (%rsi,%rax,1),%r9 + shlq $48,%r12 + xorq %r10,%r8 + xorq %r12,%r9 + movzbq %r8b,%r13 + shrq $4,%r8 + movq %r9,%r10 + shlb $4,%r13b + shrq $4,%r9 + xorq 8(%rsi,%rcx,1),%r8 + movzwq (%r11,%r13,2),%r13 + shlq $60,%r10 + xorq (%rsi,%rcx,1),%r9 + xorq %r10,%r8 + shlq $48,%r13 + bswapq %r8 + xorq %r13,%r9 + bswapq %r9 + cmpq %r15,%r14 + jb .Louter_loop + movq %r8,8(%rdi) + movq %r9,(%rdi) + + leaq 280+48(%rsp),%rsi +.cfi_def_cfa %rsi,8 + movq -48(%rsi),%r15 +.cfi_restore %r15 + movq -40(%rsi),%r14 +.cfi_restore %r14 + movq -32(%rsi),%r13 +.cfi_restore %r13 + movq -24(%rsi),%r12 +.cfi_restore %r12 + movq -16(%rsi),%rbp +.cfi_restore %rbp + movq -8(%rsi),%rbx +.cfi_restore %rbx + leaq 0(%rsi),%rsp +.cfi_def_cfa_register %rsp +.Lghash_epilogue: + .byte 0xf3,0xc3 +.cfi_endproc +.size gcm_ghash_4bit,.-gcm_ghash_4bit +.globl gcm_init_clmul +.type gcm_init_clmul,@function +.align 16 +gcm_init_clmul: +.cfi_startproc +.L_init_clmul: + movdqu (%rsi),%xmm2 + pshufd $78,%xmm2,%xmm2 + + + pshufd $255,%xmm2,%xmm4 + movdqa %xmm2,%xmm3 + psllq $1,%xmm2 + pxor %xmm5,%xmm5 + psrlq $63,%xmm3 + pcmpgtd %xmm4,%xmm5 + pslldq $8,%xmm3 + por %xmm3,%xmm2 + + + pand .L0x1c2_polynomial(%rip),%xmm5 + pxor %xmm5,%xmm2 + + + pshufd $78,%xmm2,%xmm6 + movdqa %xmm2,%xmm0 + pxor %xmm2,%xmm6 + movdqa %xmm0,%xmm1 + pshufd $78,%xmm0,%xmm3 + pxor %xmm0,%xmm3 +.byte 102,15,58,68,194,0 +.byte 102,15,58,68,202,17 +.byte 102,15,58,68,222,0 + pxor %xmm0,%xmm3 + pxor %xmm1,%xmm3 + + movdqa %xmm3,%xmm4 + psrldq $8,%xmm3 + pslldq $8,%xmm4 + pxor %xmm3,%xmm1 + pxor %xmm4,%xmm0 + + movdqa %xmm0,%xmm4 + movdqa %xmm0,%xmm3 + psllq $5,%xmm0 + pxor %xmm0,%xmm3 + psllq $1,%xmm0 + pxor %xmm3,%xmm0 + psllq $57,%xmm0 + movdqa %xmm0,%xmm3 
+ pslldq $8,%xmm0 + psrldq $8,%xmm3 + pxor %xmm4,%xmm0 + pxor %xmm3,%xmm1 + + + movdqa %xmm0,%xmm4 + psrlq $1,%xmm0 + pxor %xmm4,%xmm1 + pxor %xmm0,%xmm4 + psrlq $5,%xmm0 + pxor %xmm4,%xmm0 + psrlq $1,%xmm0 + pxor %xmm1,%xmm0 + pshufd $78,%xmm2,%xmm3 + pshufd $78,%xmm0,%xmm4 + pxor %xmm2,%xmm3 + movdqu %xmm2,0(%rdi) + pxor %xmm0,%xmm4 + movdqu %xmm0,16(%rdi) +.byte 102,15,58,15,227,8 + movdqu %xmm4,32(%rdi) + movdqa %xmm0,%xmm1 + pshufd $78,%xmm0,%xmm3 + pxor %xmm0,%xmm3 +.byte 102,15,58,68,194,0 +.byte 102,15,58,68,202,17 +.byte 102,15,58,68,222,0 + pxor %xmm0,%xmm3 + pxor %xmm1,%xmm3 + + movdqa %xmm3,%xmm4 + psrldq $8,%xmm3 + pslldq $8,%xmm4 + pxor %xmm3,%xmm1 + pxor %xmm4,%xmm0 + + movdqa %xmm0,%xmm4 + movdqa %xmm0,%xmm3 + psllq $5,%xmm0 + pxor %xmm0,%xmm3 + psllq $1,%xmm0 + pxor %xmm3,%xmm0 + psllq $57,%xmm0 + movdqa %xmm0,%xmm3 + pslldq $8,%xmm0 + psrldq $8,%xmm3 + pxor %xmm4,%xmm0 + pxor %xmm3,%xmm1 + + + movdqa %xmm0,%xmm4 + psrlq $1,%xmm0 + pxor %xmm4,%xmm1 + pxor %xmm0,%xmm4 + psrlq $5,%xmm0 + pxor %xmm4,%xmm0 + psrlq $1,%xmm0 + pxor %xmm1,%xmm0 + movdqa %xmm0,%xmm5 + movdqa %xmm0,%xmm1 + pshufd $78,%xmm0,%xmm3 + pxor %xmm0,%xmm3 +.byte 102,15,58,68,194,0 +.byte 102,15,58,68,202,17 +.byte 102,15,58,68,222,0 + pxor %xmm0,%xmm3 + pxor %xmm1,%xmm3 + + movdqa %xmm3,%xmm4 + psrldq $8,%xmm3 + pslldq $8,%xmm4 + pxor %xmm3,%xmm1 + pxor %xmm4,%xmm0 + + movdqa %xmm0,%xmm4 + movdqa %xmm0,%xmm3 + psllq $5,%xmm0 + pxor %xmm0,%xmm3 + psllq $1,%xmm0 + pxor %xmm3,%xmm0 + psllq $57,%xmm0 + movdqa %xmm0,%xmm3 + pslldq $8,%xmm0 + psrldq $8,%xmm3 + pxor %xmm4,%xmm0 + pxor %xmm3,%xmm1 + + + movdqa %xmm0,%xmm4 + psrlq $1,%xmm0 + pxor %xmm4,%xmm1 + pxor %xmm0,%xmm4 + psrlq $5,%xmm0 + pxor %xmm4,%xmm0 + psrlq $1,%xmm0 + pxor %xmm1,%xmm0 + pshufd $78,%xmm5,%xmm3 + pshufd $78,%xmm0,%xmm4 + pxor %xmm5,%xmm3 + movdqu %xmm5,48(%rdi) + pxor %xmm0,%xmm4 + movdqu %xmm0,64(%rdi) +.byte 102,15,58,15,227,8 + movdqu %xmm4,80(%rdi) + .byte 0xf3,0xc3 +.cfi_endproc +.size gcm_init_clmul,.-gcm_init_clmul +.globl gcm_gmult_clmul +.type gcm_gmult_clmul,@function +.align 16 +gcm_gmult_clmul: +.cfi_startproc +.L_gmult_clmul: + movdqu (%rdi),%xmm0 + movdqa .Lbswap_mask(%rip),%xmm5 + movdqu (%rsi),%xmm2 + movdqu 32(%rsi),%xmm4 +.byte 102,15,56,0,197 + movdqa %xmm0,%xmm1 + pshufd $78,%xmm0,%xmm3 + pxor %xmm0,%xmm3 +.byte 102,15,58,68,194,0 +.byte 102,15,58,68,202,17 +.byte 102,15,58,68,220,0 + pxor %xmm0,%xmm3 + pxor %xmm1,%xmm3 + + movdqa %xmm3,%xmm4 + psrldq $8,%xmm3 + pslldq $8,%xmm4 + pxor %xmm3,%xmm1 + pxor %xmm4,%xmm0 + + movdqa %xmm0,%xmm4 + movdqa %xmm0,%xmm3 + psllq $5,%xmm0 + pxor %xmm0,%xmm3 + psllq $1,%xmm0 + pxor %xmm3,%xmm0 + psllq $57,%xmm0 + movdqa %xmm0,%xmm3 + pslldq $8,%xmm0 + psrldq $8,%xmm3 + pxor %xmm4,%xmm0 + pxor %xmm3,%xmm1 + + + movdqa %xmm0,%xmm4 + psrlq $1,%xmm0 + pxor %xmm4,%xmm1 + pxor %xmm0,%xmm4 + psrlq $5,%xmm0 + pxor %xmm4,%xmm0 + psrlq $1,%xmm0 + pxor %xmm1,%xmm0 +.byte 102,15,56,0,197 + movdqu %xmm0,(%rdi) + .byte 0xf3,0xc3 +.cfi_endproc +.size gcm_gmult_clmul,.-gcm_gmult_clmul +.globl gcm_ghash_clmul +.type gcm_ghash_clmul,@function +.align 32 +gcm_ghash_clmul: +.cfi_startproc +.L_ghash_clmul: + movdqa .Lbswap_mask(%rip),%xmm10 + + movdqu (%rdi),%xmm0 + movdqu (%rsi),%xmm2 + movdqu 32(%rsi),%xmm7 +.byte 102,65,15,56,0,194 + + subq $0x10,%rcx + jz .Lodd_tail + + movdqu 16(%rsi),%xmm6 + movl OPENSSL_ia32cap_P+4(%rip),%eax + cmpq $0x30,%rcx + jb .Lskip4x + + andl $71303168,%eax + cmpl $4194304,%eax + je .Lskip4x + + subq $0x30,%rcx + movq $0xA040608020C0E000,%rax + movdqu 48(%rsi),%xmm14 + 
movdqu 64(%rsi),%xmm15 + + + + + movdqu 48(%rdx),%xmm3 + movdqu 32(%rdx),%xmm11 +.byte 102,65,15,56,0,218 +.byte 102,69,15,56,0,218 + movdqa %xmm3,%xmm5 + pshufd $78,%xmm3,%xmm4 + pxor %xmm3,%xmm4 +.byte 102,15,58,68,218,0 +.byte 102,15,58,68,234,17 +.byte 102,15,58,68,231,0 + + movdqa %xmm11,%xmm13 + pshufd $78,%xmm11,%xmm12 + pxor %xmm11,%xmm12 +.byte 102,68,15,58,68,222,0 +.byte 102,68,15,58,68,238,17 +.byte 102,68,15,58,68,231,16 + xorps %xmm11,%xmm3 + xorps %xmm13,%xmm5 + movups 80(%rsi),%xmm7 + xorps %xmm12,%xmm4 + + movdqu 16(%rdx),%xmm11 + movdqu 0(%rdx),%xmm8 +.byte 102,69,15,56,0,218 +.byte 102,69,15,56,0,194 + movdqa %xmm11,%xmm13 + pshufd $78,%xmm11,%xmm12 + pxor %xmm8,%xmm0 + pxor %xmm11,%xmm12 +.byte 102,69,15,58,68,222,0 + movdqa %xmm0,%xmm1 + pshufd $78,%xmm0,%xmm8 + pxor %xmm0,%xmm8 +.byte 102,69,15,58,68,238,17 +.byte 102,68,15,58,68,231,0 + xorps %xmm11,%xmm3 + xorps %xmm13,%xmm5 + + leaq 64(%rdx),%rdx + subq $0x40,%rcx + jc .Ltail4x + + jmp .Lmod4_loop +.align 32 +.Lmod4_loop: +.byte 102,65,15,58,68,199,0 + xorps %xmm12,%xmm4 + movdqu 48(%rdx),%xmm11 +.byte 102,69,15,56,0,218 +.byte 102,65,15,58,68,207,17 + xorps %xmm3,%xmm0 + movdqu 32(%rdx),%xmm3 + movdqa %xmm11,%xmm13 +.byte 102,68,15,58,68,199,16 + pshufd $78,%xmm11,%xmm12 + xorps %xmm5,%xmm1 + pxor %xmm11,%xmm12 +.byte 102,65,15,56,0,218 + movups 32(%rsi),%xmm7 + xorps %xmm4,%xmm8 +.byte 102,68,15,58,68,218,0 + pshufd $78,%xmm3,%xmm4 + + pxor %xmm0,%xmm8 + movdqa %xmm3,%xmm5 + pxor %xmm1,%xmm8 + pxor %xmm3,%xmm4 + movdqa %xmm8,%xmm9 +.byte 102,68,15,58,68,234,17 + pslldq $8,%xmm8 + psrldq $8,%xmm9 + pxor %xmm8,%xmm0 + movdqa .L7_mask(%rip),%xmm8 + pxor %xmm9,%xmm1 +.byte 102,76,15,110,200 + + pand %xmm0,%xmm8 +.byte 102,69,15,56,0,200 + pxor %xmm0,%xmm9 +.byte 102,68,15,58,68,231,0 + psllq $57,%xmm9 + movdqa %xmm9,%xmm8 + pslldq $8,%xmm9 +.byte 102,15,58,68,222,0 + psrldq $8,%xmm8 + pxor %xmm9,%xmm0 + pxor %xmm8,%xmm1 + movdqu 0(%rdx),%xmm8 + + movdqa %xmm0,%xmm9 + psrlq $1,%xmm0 +.byte 102,15,58,68,238,17 + xorps %xmm11,%xmm3 + movdqu 16(%rdx),%xmm11 +.byte 102,69,15,56,0,218 +.byte 102,15,58,68,231,16 + xorps %xmm13,%xmm5 + movups 80(%rsi),%xmm7 +.byte 102,69,15,56,0,194 + pxor %xmm9,%xmm1 + pxor %xmm0,%xmm9 + psrlq $5,%xmm0 + + movdqa %xmm11,%xmm13 + pxor %xmm12,%xmm4 + pshufd $78,%xmm11,%xmm12 + pxor %xmm9,%xmm0 + pxor %xmm8,%xmm1 + pxor %xmm11,%xmm12 +.byte 102,69,15,58,68,222,0 + psrlq $1,%xmm0 + pxor %xmm1,%xmm0 + movdqa %xmm0,%xmm1 +.byte 102,69,15,58,68,238,17 + xorps %xmm11,%xmm3 + pshufd $78,%xmm0,%xmm8 + pxor %xmm0,%xmm8 + +.byte 102,68,15,58,68,231,0 + xorps %xmm13,%xmm5 + + leaq 64(%rdx),%rdx + subq $0x40,%rcx + jnc .Lmod4_loop + +.Ltail4x: +.byte 102,65,15,58,68,199,0 +.byte 102,65,15,58,68,207,17 +.byte 102,68,15,58,68,199,16 + xorps %xmm12,%xmm4 + xorps %xmm3,%xmm0 + xorps %xmm5,%xmm1 + pxor %xmm0,%xmm1 + pxor %xmm4,%xmm8 + + pxor %xmm1,%xmm8 + pxor %xmm0,%xmm1 + + movdqa %xmm8,%xmm9 + psrldq $8,%xmm8 + pslldq $8,%xmm9 + pxor %xmm8,%xmm1 + pxor %xmm9,%xmm0 + + movdqa %xmm0,%xmm4 + movdqa %xmm0,%xmm3 + psllq $5,%xmm0 + pxor %xmm0,%xmm3 + psllq $1,%xmm0 + pxor %xmm3,%xmm0 + psllq $57,%xmm0 + movdqa %xmm0,%xmm3 + pslldq $8,%xmm0 + psrldq $8,%xmm3 + pxor %xmm4,%xmm0 + pxor %xmm3,%xmm1 + + + movdqa %xmm0,%xmm4 + psrlq $1,%xmm0 + pxor %xmm4,%xmm1 + pxor %xmm0,%xmm4 + psrlq $5,%xmm0 + pxor %xmm4,%xmm0 + psrlq $1,%xmm0 + pxor %xmm1,%xmm0 + addq $0x40,%rcx + jz .Ldone + movdqu 32(%rsi),%xmm7 + subq $0x10,%rcx + jz .Lodd_tail +.Lskip4x: + + + + + + movdqu (%rdx),%xmm8 + movdqu 16(%rdx),%xmm3 +.byte 
102,69,15,56,0,194 +.byte 102,65,15,56,0,218 + pxor %xmm8,%xmm0 + + movdqa %xmm3,%xmm5 + pshufd $78,%xmm3,%xmm4 + pxor %xmm3,%xmm4 +.byte 102,15,58,68,218,0 +.byte 102,15,58,68,234,17 +.byte 102,15,58,68,231,0 + + leaq 32(%rdx),%rdx + nop + subq $0x20,%rcx + jbe .Leven_tail + nop + jmp .Lmod_loop + +.align 32 +.Lmod_loop: + movdqa %xmm0,%xmm1 + movdqa %xmm4,%xmm8 + pshufd $78,%xmm0,%xmm4 + pxor %xmm0,%xmm4 + +.byte 102,15,58,68,198,0 +.byte 102,15,58,68,206,17 +.byte 102,15,58,68,231,16 + + pxor %xmm3,%xmm0 + pxor %xmm5,%xmm1 + movdqu (%rdx),%xmm9 + pxor %xmm0,%xmm8 +.byte 102,69,15,56,0,202 + movdqu 16(%rdx),%xmm3 + + pxor %xmm1,%xmm8 + pxor %xmm9,%xmm1 + pxor %xmm8,%xmm4 +.byte 102,65,15,56,0,218 + movdqa %xmm4,%xmm8 + psrldq $8,%xmm8 + pslldq $8,%xmm4 + pxor %xmm8,%xmm1 + pxor %xmm4,%xmm0 + + movdqa %xmm3,%xmm5 + + movdqa %xmm0,%xmm9 + movdqa %xmm0,%xmm8 + psllq $5,%xmm0 + pxor %xmm0,%xmm8 +.byte 102,15,58,68,218,0 + psllq $1,%xmm0 + pxor %xmm8,%xmm0 + psllq $57,%xmm0 + movdqa %xmm0,%xmm8 + pslldq $8,%xmm0 + psrldq $8,%xmm8 + pxor %xmm9,%xmm0 + pshufd $78,%xmm5,%xmm4 + pxor %xmm8,%xmm1 + pxor %xmm5,%xmm4 + + movdqa %xmm0,%xmm9 + psrlq $1,%xmm0 +.byte 102,15,58,68,234,17 + pxor %xmm9,%xmm1 + pxor %xmm0,%xmm9 + psrlq $5,%xmm0 + pxor %xmm9,%xmm0 + leaq 32(%rdx),%rdx + psrlq $1,%xmm0 +.byte 102,15,58,68,231,0 + pxor %xmm1,%xmm0 + + subq $0x20,%rcx + ja .Lmod_loop + +.Leven_tail: + movdqa %xmm0,%xmm1 + movdqa %xmm4,%xmm8 + pshufd $78,%xmm0,%xmm4 + pxor %xmm0,%xmm4 + +.byte 102,15,58,68,198,0 +.byte 102,15,58,68,206,17 +.byte 102,15,58,68,231,16 + + pxor %xmm3,%xmm0 + pxor %xmm5,%xmm1 + pxor %xmm0,%xmm8 + pxor %xmm1,%xmm8 + pxor %xmm8,%xmm4 + movdqa %xmm4,%xmm8 + psrldq $8,%xmm8 + pslldq $8,%xmm4 + pxor %xmm8,%xmm1 + pxor %xmm4,%xmm0 + + movdqa %xmm0,%xmm4 + movdqa %xmm0,%xmm3 + psllq $5,%xmm0 + pxor %xmm0,%xmm3 + psllq $1,%xmm0 + pxor %xmm3,%xmm0 + psllq $57,%xmm0 + movdqa %xmm0,%xmm3 + pslldq $8,%xmm0 + psrldq $8,%xmm3 + pxor %xmm4,%xmm0 + pxor %xmm3,%xmm1 + + + movdqa %xmm0,%xmm4 + psrlq $1,%xmm0 + pxor %xmm4,%xmm1 + pxor %xmm0,%xmm4 + psrlq $5,%xmm0 + pxor %xmm4,%xmm0 + psrlq $1,%xmm0 + pxor %xmm1,%xmm0 + testq %rcx,%rcx + jnz .Ldone + +.Lodd_tail: + movdqu (%rdx),%xmm8 +.byte 102,69,15,56,0,194 + pxor %xmm8,%xmm0 + movdqa %xmm0,%xmm1 + pshufd $78,%xmm0,%xmm3 + pxor %xmm0,%xmm3 +.byte 102,15,58,68,194,0 +.byte 102,15,58,68,202,17 +.byte 102,15,58,68,223,0 + pxor %xmm0,%xmm3 + pxor %xmm1,%xmm3 + + movdqa %xmm3,%xmm4 + psrldq $8,%xmm3 + pslldq $8,%xmm4 + pxor %xmm3,%xmm1 + pxor %xmm4,%xmm0 + + movdqa %xmm0,%xmm4 + movdqa %xmm0,%xmm3 + psllq $5,%xmm0 + pxor %xmm0,%xmm3 + psllq $1,%xmm0 + pxor %xmm3,%xmm0 + psllq $57,%xmm0 + movdqa %xmm0,%xmm3 + pslldq $8,%xmm0 + psrldq $8,%xmm3 + pxor %xmm4,%xmm0 + pxor %xmm3,%xmm1 + + + movdqa %xmm0,%xmm4 + psrlq $1,%xmm0 + pxor %xmm4,%xmm1 + pxor %xmm0,%xmm4 + psrlq $5,%xmm0 + pxor %xmm4,%xmm0 + psrlq $1,%xmm0 + pxor %xmm1,%xmm0 +.Ldone: +.byte 102,65,15,56,0,194 + movdqu %xmm0,(%rdi) + .byte 0xf3,0xc3 +.cfi_endproc +.size gcm_ghash_clmul,.-gcm_ghash_clmul +.globl gcm_init_avx +.type gcm_init_avx,@function +.align 32 +gcm_init_avx: +.cfi_startproc + jmp .L_init_clmul +.cfi_endproc +.size gcm_init_avx,.-gcm_init_avx +.globl gcm_gmult_avx +.type gcm_gmult_avx,@function +.align 32 +gcm_gmult_avx: +.cfi_startproc + jmp .L_gmult_clmul +.cfi_endproc +.size gcm_gmult_avx,.-gcm_gmult_avx +.globl gcm_ghash_avx +.type gcm_ghash_avx,@function +.align 32 +gcm_ghash_avx: +.cfi_startproc + jmp .L_ghash_clmul +.cfi_endproc +.size gcm_ghash_avx,.-gcm_ghash_avx +.align 64 
+.Lbswap_mask: +.byte 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0 +.L0x1c2_polynomial: +.byte 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0xc2 +.L7_mask: +.long 7,0,7,0 +.L7_mask_poly: +.long 7,0,450,0 +.align 64 +.type .Lrem_4bit,@object +.Lrem_4bit: +.long 0,0,0,471859200,0,943718400,0,610271232 +.long 0,1887436800,0,1822425088,0,1220542464,0,1423966208 +.long 0,3774873600,0,4246732800,0,3644850176,0,3311403008 +.long 0,2441084928,0,2376073216,0,2847932416,0,3051356160 +.type .Lrem_8bit,@object +.Lrem_8bit: +.value 0x0000,0x01C2,0x0384,0x0246,0x0708,0x06CA,0x048C,0x054E +.value 0x0E10,0x0FD2,0x0D94,0x0C56,0x0918,0x08DA,0x0A9C,0x0B5E +.value 0x1C20,0x1DE2,0x1FA4,0x1E66,0x1B28,0x1AEA,0x18AC,0x196E +.value 0x1230,0x13F2,0x11B4,0x1076,0x1538,0x14FA,0x16BC,0x177E +.value 0x3840,0x3982,0x3BC4,0x3A06,0x3F48,0x3E8A,0x3CCC,0x3D0E +.value 0x3650,0x3792,0x35D4,0x3416,0x3158,0x309A,0x32DC,0x331E +.value 0x2460,0x25A2,0x27E4,0x2626,0x2368,0x22AA,0x20EC,0x212E +.value 0x2A70,0x2BB2,0x29F4,0x2836,0x2D78,0x2CBA,0x2EFC,0x2F3E +.value 0x7080,0x7142,0x7304,0x72C6,0x7788,0x764A,0x740C,0x75CE +.value 0x7E90,0x7F52,0x7D14,0x7CD6,0x7998,0x785A,0x7A1C,0x7BDE +.value 0x6CA0,0x6D62,0x6F24,0x6EE6,0x6BA8,0x6A6A,0x682C,0x69EE +.value 0x62B0,0x6372,0x6134,0x60F6,0x65B8,0x647A,0x663C,0x67FE +.value 0x48C0,0x4902,0x4B44,0x4A86,0x4FC8,0x4E0A,0x4C4C,0x4D8E +.value 0x46D0,0x4712,0x4554,0x4496,0x41D8,0x401A,0x425C,0x439E +.value 0x54E0,0x5522,0x5764,0x56A6,0x53E8,0x522A,0x506C,0x51AE +.value 0x5AF0,0x5B32,0x5974,0x58B6,0x5DF8,0x5C3A,0x5E7C,0x5FBE +.value 0xE100,0xE0C2,0xE284,0xE346,0xE608,0xE7CA,0xE58C,0xE44E +.value 0xEF10,0xEED2,0xEC94,0xED56,0xE818,0xE9DA,0xEB9C,0xEA5E +.value 0xFD20,0xFCE2,0xFEA4,0xFF66,0xFA28,0xFBEA,0xF9AC,0xF86E +.value 0xF330,0xF2F2,0xF0B4,0xF176,0xF438,0xF5FA,0xF7BC,0xF67E +.value 0xD940,0xD882,0xDAC4,0xDB06,0xDE48,0xDF8A,0xDDCC,0xDC0E +.value 0xD750,0xD692,0xD4D4,0xD516,0xD058,0xD19A,0xD3DC,0xD21E +.value 0xC560,0xC4A2,0xC6E4,0xC726,0xC268,0xC3AA,0xC1EC,0xC02E +.value 0xCB70,0xCAB2,0xC8F4,0xC936,0xCC78,0xCDBA,0xCFFC,0xCE3E +.value 0x9180,0x9042,0x9204,0x93C6,0x9688,0x974A,0x950C,0x94CE +.value 0x9F90,0x9E52,0x9C14,0x9DD6,0x9898,0x995A,0x9B1C,0x9ADE +.value 0x8DA0,0x8C62,0x8E24,0x8FE6,0x8AA8,0x8B6A,0x892C,0x88EE +.value 0x83B0,0x8272,0x8034,0x81F6,0x84B8,0x857A,0x873C,0x86FE +.value 0xA9C0,0xA802,0xAA44,0xAB86,0xAEC8,0xAF0A,0xAD4C,0xAC8E +.value 0xA7D0,0xA612,0xA454,0xA596,0xA0D8,0xA11A,0xA35C,0xA29E +.value 0xB5E0,0xB422,0xB664,0xB7A6,0xB2E8,0xB32A,0xB16C,0xB0AE +.value 0xBBF0,0xBA32,0xB874,0xB9B6,0xBCF8,0xBD3A,0xBF7C,0xBEBE + +.byte 71,72,65,83,72,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0 +.align 64 diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-mb-x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-mb-x86_64.S new file mode 100644 index 0000000000..4572bc7227 --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-mb-x86_64.S @@ -0,0 +1,2962 @@ +# WARNING: do not edit! +# Generated from openssl/crypto/sha/asm/sha1-mb-x86_64.pl +# +# Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the OpenSSL license (the "License"). You may not use +# this file except in compliance with the License. 
You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +.text + + + +.globl sha1_multi_block +.type sha1_multi_block,@function +.align 32 +sha1_multi_block: +.cfi_startproc + movq OPENSSL_ia32cap_P+4(%rip),%rcx + btq $61,%rcx + jc _shaext_shortcut + movq %rsp,%rax +.cfi_def_cfa_register %rax + pushq %rbx +.cfi_offset %rbx,-16 + pushq %rbp +.cfi_offset %rbx,-24 + subq $288,%rsp + andq $-256,%rsp + movq %rax,272(%rsp) +.cfi_escape 0x0f,0x06,0x77,0x90,0x02,0x06,0x23,0x08 +.Lbody: + leaq K_XX_XX(%rip),%rbp + leaq 256(%rsp),%rbx + +.Loop_grande: + movl %edx,280(%rsp) + xorl %edx,%edx + movq 0(%rsi),%r8 + movl 8(%rsi),%ecx + cmpl %edx,%ecx + cmovgl %ecx,%edx + testl %ecx,%ecx + movl %ecx,0(%rbx) + cmovleq %rbp,%r8 + movq 16(%rsi),%r9 + movl 24(%rsi),%ecx + cmpl %edx,%ecx + cmovgl %ecx,%edx + testl %ecx,%ecx + movl %ecx,4(%rbx) + cmovleq %rbp,%r9 + movq 32(%rsi),%r10 + movl 40(%rsi),%ecx + cmpl %edx,%ecx + cmovgl %ecx,%edx + testl %ecx,%ecx + movl %ecx,8(%rbx) + cmovleq %rbp,%r10 + movq 48(%rsi),%r11 + movl 56(%rsi),%ecx + cmpl %edx,%ecx + cmovgl %ecx,%edx + testl %ecx,%ecx + movl %ecx,12(%rbx) + cmovleq %rbp,%r11 + testl %edx,%edx + jz .Ldone + + movdqu 0(%rdi),%xmm10 + leaq 128(%rsp),%rax + movdqu 32(%rdi),%xmm11 + movdqu 64(%rdi),%xmm12 + movdqu 96(%rdi),%xmm13 + movdqu 128(%rdi),%xmm14 + movdqa 96(%rbp),%xmm5 + movdqa -32(%rbp),%xmm15 + jmp .Loop + +.align 32 +.Loop: + movd (%r8),%xmm0 + leaq 64(%r8),%r8 + movd (%r9),%xmm2 + leaq 64(%r9),%r9 + movd (%r10),%xmm3 + leaq 64(%r10),%r10 + movd (%r11),%xmm4 + leaq 64(%r11),%r11 + punpckldq %xmm3,%xmm0 + movd -60(%r8),%xmm1 + punpckldq %xmm4,%xmm2 + movd -60(%r9),%xmm9 + punpckldq %xmm2,%xmm0 + movd -60(%r10),%xmm8 +.byte 102,15,56,0,197 + movd -60(%r11),%xmm7 + punpckldq %xmm8,%xmm1 + movdqa %xmm10,%xmm8 + paddd %xmm15,%xmm14 + punpckldq %xmm7,%xmm9 + movdqa %xmm11,%xmm7 + movdqa %xmm11,%xmm6 + pslld $5,%xmm8 + pandn %xmm13,%xmm7 + pand %xmm12,%xmm6 + punpckldq %xmm9,%xmm1 + movdqa %xmm10,%xmm9 + + movdqa %xmm0,0-128(%rax) + paddd %xmm0,%xmm14 + movd -56(%r8),%xmm2 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + movdqa %xmm11,%xmm7 + + por %xmm9,%xmm8 + movd -56(%r9),%xmm9 + pslld $30,%xmm7 + paddd %xmm6,%xmm14 + + psrld $2,%xmm11 + paddd %xmm8,%xmm14 +.byte 102,15,56,0,205 + movd -56(%r10),%xmm8 + por %xmm7,%xmm11 + movd -56(%r11),%xmm7 + punpckldq %xmm8,%xmm2 + movdqa %xmm14,%xmm8 + paddd %xmm15,%xmm13 + punpckldq %xmm7,%xmm9 + movdqa %xmm10,%xmm7 + movdqa %xmm10,%xmm6 + pslld $5,%xmm8 + pandn %xmm12,%xmm7 + pand %xmm11,%xmm6 + punpckldq %xmm9,%xmm2 + movdqa %xmm14,%xmm9 + + movdqa %xmm1,16-128(%rax) + paddd %xmm1,%xmm13 + movd -52(%r8),%xmm3 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + movdqa %xmm10,%xmm7 + + por %xmm9,%xmm8 + movd -52(%r9),%xmm9 + pslld $30,%xmm7 + paddd %xmm6,%xmm13 + + psrld $2,%xmm10 + paddd %xmm8,%xmm13 +.byte 102,15,56,0,213 + movd -52(%r10),%xmm8 + por %xmm7,%xmm10 + movd -52(%r11),%xmm7 + punpckldq %xmm8,%xmm3 + movdqa %xmm13,%xmm8 + paddd %xmm15,%xmm12 + punpckldq %xmm7,%xmm9 + movdqa %xmm14,%xmm7 + movdqa %xmm14,%xmm6 + pslld $5,%xmm8 + pandn %xmm11,%xmm7 + pand %xmm10,%xmm6 + punpckldq %xmm9,%xmm3 + movdqa %xmm13,%xmm9 + + movdqa %xmm2,32-128(%rax) + paddd %xmm2,%xmm12 + movd -48(%r8),%xmm4 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + movdqa %xmm14,%xmm7 + + por %xmm9,%xmm8 + movd -48(%r9),%xmm9 + pslld $30,%xmm7 + paddd %xmm6,%xmm12 + + psrld $2,%xmm14 + paddd %xmm8,%xmm12 +.byte 102,15,56,0,221 + movd -48(%r10),%xmm8 + por %xmm7,%xmm14 + movd -48(%r11),%xmm7 + 
punpckldq %xmm8,%xmm4 + movdqa %xmm12,%xmm8 + paddd %xmm15,%xmm11 + punpckldq %xmm7,%xmm9 + movdqa %xmm13,%xmm7 + movdqa %xmm13,%xmm6 + pslld $5,%xmm8 + pandn %xmm10,%xmm7 + pand %xmm14,%xmm6 + punpckldq %xmm9,%xmm4 + movdqa %xmm12,%xmm9 + + movdqa %xmm3,48-128(%rax) + paddd %xmm3,%xmm11 + movd -44(%r8),%xmm0 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + movdqa %xmm13,%xmm7 + + por %xmm9,%xmm8 + movd -44(%r9),%xmm9 + pslld $30,%xmm7 + paddd %xmm6,%xmm11 + + psrld $2,%xmm13 + paddd %xmm8,%xmm11 +.byte 102,15,56,0,229 + movd -44(%r10),%xmm8 + por %xmm7,%xmm13 + movd -44(%r11),%xmm7 + punpckldq %xmm8,%xmm0 + movdqa %xmm11,%xmm8 + paddd %xmm15,%xmm10 + punpckldq %xmm7,%xmm9 + movdqa %xmm12,%xmm7 + movdqa %xmm12,%xmm6 + pslld $5,%xmm8 + pandn %xmm14,%xmm7 + pand %xmm13,%xmm6 + punpckldq %xmm9,%xmm0 + movdqa %xmm11,%xmm9 + + movdqa %xmm4,64-128(%rax) + paddd %xmm4,%xmm10 + movd -40(%r8),%xmm1 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + movdqa %xmm12,%xmm7 + + por %xmm9,%xmm8 + movd -40(%r9),%xmm9 + pslld $30,%xmm7 + paddd %xmm6,%xmm10 + + psrld $2,%xmm12 + paddd %xmm8,%xmm10 +.byte 102,15,56,0,197 + movd -40(%r10),%xmm8 + por %xmm7,%xmm12 + movd -40(%r11),%xmm7 + punpckldq %xmm8,%xmm1 + movdqa %xmm10,%xmm8 + paddd %xmm15,%xmm14 + punpckldq %xmm7,%xmm9 + movdqa %xmm11,%xmm7 + movdqa %xmm11,%xmm6 + pslld $5,%xmm8 + pandn %xmm13,%xmm7 + pand %xmm12,%xmm6 + punpckldq %xmm9,%xmm1 + movdqa %xmm10,%xmm9 + + movdqa %xmm0,80-128(%rax) + paddd %xmm0,%xmm14 + movd -36(%r8),%xmm2 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + movdqa %xmm11,%xmm7 + + por %xmm9,%xmm8 + movd -36(%r9),%xmm9 + pslld $30,%xmm7 + paddd %xmm6,%xmm14 + + psrld $2,%xmm11 + paddd %xmm8,%xmm14 +.byte 102,15,56,0,205 + movd -36(%r10),%xmm8 + por %xmm7,%xmm11 + movd -36(%r11),%xmm7 + punpckldq %xmm8,%xmm2 + movdqa %xmm14,%xmm8 + paddd %xmm15,%xmm13 + punpckldq %xmm7,%xmm9 + movdqa %xmm10,%xmm7 + movdqa %xmm10,%xmm6 + pslld $5,%xmm8 + pandn %xmm12,%xmm7 + pand %xmm11,%xmm6 + punpckldq %xmm9,%xmm2 + movdqa %xmm14,%xmm9 + + movdqa %xmm1,96-128(%rax) + paddd %xmm1,%xmm13 + movd -32(%r8),%xmm3 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + movdqa %xmm10,%xmm7 + + por %xmm9,%xmm8 + movd -32(%r9),%xmm9 + pslld $30,%xmm7 + paddd %xmm6,%xmm13 + + psrld $2,%xmm10 + paddd %xmm8,%xmm13 +.byte 102,15,56,0,213 + movd -32(%r10),%xmm8 + por %xmm7,%xmm10 + movd -32(%r11),%xmm7 + punpckldq %xmm8,%xmm3 + movdqa %xmm13,%xmm8 + paddd %xmm15,%xmm12 + punpckldq %xmm7,%xmm9 + movdqa %xmm14,%xmm7 + movdqa %xmm14,%xmm6 + pslld $5,%xmm8 + pandn %xmm11,%xmm7 + pand %xmm10,%xmm6 + punpckldq %xmm9,%xmm3 + movdqa %xmm13,%xmm9 + + movdqa %xmm2,112-128(%rax) + paddd %xmm2,%xmm12 + movd -28(%r8),%xmm4 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + movdqa %xmm14,%xmm7 + + por %xmm9,%xmm8 + movd -28(%r9),%xmm9 + pslld $30,%xmm7 + paddd %xmm6,%xmm12 + + psrld $2,%xmm14 + paddd %xmm8,%xmm12 +.byte 102,15,56,0,221 + movd -28(%r10),%xmm8 + por %xmm7,%xmm14 + movd -28(%r11),%xmm7 + punpckldq %xmm8,%xmm4 + movdqa %xmm12,%xmm8 + paddd %xmm15,%xmm11 + punpckldq %xmm7,%xmm9 + movdqa %xmm13,%xmm7 + movdqa %xmm13,%xmm6 + pslld $5,%xmm8 + pandn %xmm10,%xmm7 + pand %xmm14,%xmm6 + punpckldq %xmm9,%xmm4 + movdqa %xmm12,%xmm9 + + movdqa %xmm3,128-128(%rax) + paddd %xmm3,%xmm11 + movd -24(%r8),%xmm0 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + movdqa %xmm13,%xmm7 + + por %xmm9,%xmm8 + movd -24(%r9),%xmm9 + pslld $30,%xmm7 + paddd %xmm6,%xmm11 + + psrld $2,%xmm13 + paddd %xmm8,%xmm11 +.byte 102,15,56,0,229 + movd -24(%r10),%xmm8 + por %xmm7,%xmm13 + movd -24(%r11),%xmm7 + punpckldq %xmm8,%xmm0 + movdqa %xmm11,%xmm8 + paddd 
%xmm15,%xmm10 + punpckldq %xmm7,%xmm9 + movdqa %xmm12,%xmm7 + movdqa %xmm12,%xmm6 + pslld $5,%xmm8 + pandn %xmm14,%xmm7 + pand %xmm13,%xmm6 + punpckldq %xmm9,%xmm0 + movdqa %xmm11,%xmm9 + + movdqa %xmm4,144-128(%rax) + paddd %xmm4,%xmm10 + movd -20(%r8),%xmm1 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + movdqa %xmm12,%xmm7 + + por %xmm9,%xmm8 + movd -20(%r9),%xmm9 + pslld $30,%xmm7 + paddd %xmm6,%xmm10 + + psrld $2,%xmm12 + paddd %xmm8,%xmm10 +.byte 102,15,56,0,197 + movd -20(%r10),%xmm8 + por %xmm7,%xmm12 + movd -20(%r11),%xmm7 + punpckldq %xmm8,%xmm1 + movdqa %xmm10,%xmm8 + paddd %xmm15,%xmm14 + punpckldq %xmm7,%xmm9 + movdqa %xmm11,%xmm7 + movdqa %xmm11,%xmm6 + pslld $5,%xmm8 + pandn %xmm13,%xmm7 + pand %xmm12,%xmm6 + punpckldq %xmm9,%xmm1 + movdqa %xmm10,%xmm9 + + movdqa %xmm0,160-128(%rax) + paddd %xmm0,%xmm14 + movd -16(%r8),%xmm2 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + movdqa %xmm11,%xmm7 + + por %xmm9,%xmm8 + movd -16(%r9),%xmm9 + pslld $30,%xmm7 + paddd %xmm6,%xmm14 + + psrld $2,%xmm11 + paddd %xmm8,%xmm14 +.byte 102,15,56,0,205 + movd -16(%r10),%xmm8 + por %xmm7,%xmm11 + movd -16(%r11),%xmm7 + punpckldq %xmm8,%xmm2 + movdqa %xmm14,%xmm8 + paddd %xmm15,%xmm13 + punpckldq %xmm7,%xmm9 + movdqa %xmm10,%xmm7 + movdqa %xmm10,%xmm6 + pslld $5,%xmm8 + pandn %xmm12,%xmm7 + pand %xmm11,%xmm6 + punpckldq %xmm9,%xmm2 + movdqa %xmm14,%xmm9 + + movdqa %xmm1,176-128(%rax) + paddd %xmm1,%xmm13 + movd -12(%r8),%xmm3 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + movdqa %xmm10,%xmm7 + + por %xmm9,%xmm8 + movd -12(%r9),%xmm9 + pslld $30,%xmm7 + paddd %xmm6,%xmm13 + + psrld $2,%xmm10 + paddd %xmm8,%xmm13 +.byte 102,15,56,0,213 + movd -12(%r10),%xmm8 + por %xmm7,%xmm10 + movd -12(%r11),%xmm7 + punpckldq %xmm8,%xmm3 + movdqa %xmm13,%xmm8 + paddd %xmm15,%xmm12 + punpckldq %xmm7,%xmm9 + movdqa %xmm14,%xmm7 + movdqa %xmm14,%xmm6 + pslld $5,%xmm8 + pandn %xmm11,%xmm7 + pand %xmm10,%xmm6 + punpckldq %xmm9,%xmm3 + movdqa %xmm13,%xmm9 + + movdqa %xmm2,192-128(%rax) + paddd %xmm2,%xmm12 + movd -8(%r8),%xmm4 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + movdqa %xmm14,%xmm7 + + por %xmm9,%xmm8 + movd -8(%r9),%xmm9 + pslld $30,%xmm7 + paddd %xmm6,%xmm12 + + psrld $2,%xmm14 + paddd %xmm8,%xmm12 +.byte 102,15,56,0,221 + movd -8(%r10),%xmm8 + por %xmm7,%xmm14 + movd -8(%r11),%xmm7 + punpckldq %xmm8,%xmm4 + movdqa %xmm12,%xmm8 + paddd %xmm15,%xmm11 + punpckldq %xmm7,%xmm9 + movdqa %xmm13,%xmm7 + movdqa %xmm13,%xmm6 + pslld $5,%xmm8 + pandn %xmm10,%xmm7 + pand %xmm14,%xmm6 + punpckldq %xmm9,%xmm4 + movdqa %xmm12,%xmm9 + + movdqa %xmm3,208-128(%rax) + paddd %xmm3,%xmm11 + movd -4(%r8),%xmm0 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + movdqa %xmm13,%xmm7 + + por %xmm9,%xmm8 + movd -4(%r9),%xmm9 + pslld $30,%xmm7 + paddd %xmm6,%xmm11 + + psrld $2,%xmm13 + paddd %xmm8,%xmm11 +.byte 102,15,56,0,229 + movd -4(%r10),%xmm8 + por %xmm7,%xmm13 + movdqa 0-128(%rax),%xmm1 + movd -4(%r11),%xmm7 + punpckldq %xmm8,%xmm0 + movdqa %xmm11,%xmm8 + paddd %xmm15,%xmm10 + punpckldq %xmm7,%xmm9 + movdqa %xmm12,%xmm7 + movdqa %xmm12,%xmm6 + pslld $5,%xmm8 + prefetcht0 63(%r8) + pandn %xmm14,%xmm7 + pand %xmm13,%xmm6 + punpckldq %xmm9,%xmm0 + movdqa %xmm11,%xmm9 + + movdqa %xmm4,224-128(%rax) + paddd %xmm4,%xmm10 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + movdqa %xmm12,%xmm7 + prefetcht0 63(%r9) + + por %xmm9,%xmm8 + pslld $30,%xmm7 + paddd %xmm6,%xmm10 + prefetcht0 63(%r10) + + psrld $2,%xmm12 + paddd %xmm8,%xmm10 +.byte 102,15,56,0,197 + prefetcht0 63(%r11) + por %xmm7,%xmm12 + movdqa 16-128(%rax),%xmm2 + pxor %xmm3,%xmm1 + movdqa 32-128(%rax),%xmm3 + + movdqa 
%xmm10,%xmm8 + pxor 128-128(%rax),%xmm1 + paddd %xmm15,%xmm14 + movdqa %xmm11,%xmm7 + pslld $5,%xmm8 + pxor %xmm3,%xmm1 + movdqa %xmm11,%xmm6 + pandn %xmm13,%xmm7 + movdqa %xmm1,%xmm5 + pand %xmm12,%xmm6 + movdqa %xmm10,%xmm9 + psrld $31,%xmm5 + paddd %xmm1,%xmm1 + + movdqa %xmm0,240-128(%rax) + paddd %xmm0,%xmm14 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + + movdqa %xmm11,%xmm7 + por %xmm9,%xmm8 + pslld $30,%xmm7 + paddd %xmm6,%xmm14 + + psrld $2,%xmm11 + paddd %xmm8,%xmm14 + por %xmm5,%xmm1 + por %xmm7,%xmm11 + pxor %xmm4,%xmm2 + movdqa 48-128(%rax),%xmm4 + + movdqa %xmm14,%xmm8 + pxor 144-128(%rax),%xmm2 + paddd %xmm15,%xmm13 + movdqa %xmm10,%xmm7 + pslld $5,%xmm8 + pxor %xmm4,%xmm2 + movdqa %xmm10,%xmm6 + pandn %xmm12,%xmm7 + movdqa %xmm2,%xmm5 + pand %xmm11,%xmm6 + movdqa %xmm14,%xmm9 + psrld $31,%xmm5 + paddd %xmm2,%xmm2 + + movdqa %xmm1,0-128(%rax) + paddd %xmm1,%xmm13 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + + movdqa %xmm10,%xmm7 + por %xmm9,%xmm8 + pslld $30,%xmm7 + paddd %xmm6,%xmm13 + + psrld $2,%xmm10 + paddd %xmm8,%xmm13 + por %xmm5,%xmm2 + por %xmm7,%xmm10 + pxor %xmm0,%xmm3 + movdqa 64-128(%rax),%xmm0 + + movdqa %xmm13,%xmm8 + pxor 160-128(%rax),%xmm3 + paddd %xmm15,%xmm12 + movdqa %xmm14,%xmm7 + pslld $5,%xmm8 + pxor %xmm0,%xmm3 + movdqa %xmm14,%xmm6 + pandn %xmm11,%xmm7 + movdqa %xmm3,%xmm5 + pand %xmm10,%xmm6 + movdqa %xmm13,%xmm9 + psrld $31,%xmm5 + paddd %xmm3,%xmm3 + + movdqa %xmm2,16-128(%rax) + paddd %xmm2,%xmm12 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + + movdqa %xmm14,%xmm7 + por %xmm9,%xmm8 + pslld $30,%xmm7 + paddd %xmm6,%xmm12 + + psrld $2,%xmm14 + paddd %xmm8,%xmm12 + por %xmm5,%xmm3 + por %xmm7,%xmm14 + pxor %xmm1,%xmm4 + movdqa 80-128(%rax),%xmm1 + + movdqa %xmm12,%xmm8 + pxor 176-128(%rax),%xmm4 + paddd %xmm15,%xmm11 + movdqa %xmm13,%xmm7 + pslld $5,%xmm8 + pxor %xmm1,%xmm4 + movdqa %xmm13,%xmm6 + pandn %xmm10,%xmm7 + movdqa %xmm4,%xmm5 + pand %xmm14,%xmm6 + movdqa %xmm12,%xmm9 + psrld $31,%xmm5 + paddd %xmm4,%xmm4 + + movdqa %xmm3,32-128(%rax) + paddd %xmm3,%xmm11 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + + movdqa %xmm13,%xmm7 + por %xmm9,%xmm8 + pslld $30,%xmm7 + paddd %xmm6,%xmm11 + + psrld $2,%xmm13 + paddd %xmm8,%xmm11 + por %xmm5,%xmm4 + por %xmm7,%xmm13 + pxor %xmm2,%xmm0 + movdqa 96-128(%rax),%xmm2 + + movdqa %xmm11,%xmm8 + pxor 192-128(%rax),%xmm0 + paddd %xmm15,%xmm10 + movdqa %xmm12,%xmm7 + pslld $5,%xmm8 + pxor %xmm2,%xmm0 + movdqa %xmm12,%xmm6 + pandn %xmm14,%xmm7 + movdqa %xmm0,%xmm5 + pand %xmm13,%xmm6 + movdqa %xmm11,%xmm9 + psrld $31,%xmm5 + paddd %xmm0,%xmm0 + + movdqa %xmm4,48-128(%rax) + paddd %xmm4,%xmm10 + psrld $27,%xmm9 + pxor %xmm7,%xmm6 + + movdqa %xmm12,%xmm7 + por %xmm9,%xmm8 + pslld $30,%xmm7 + paddd %xmm6,%xmm10 + + psrld $2,%xmm12 + paddd %xmm8,%xmm10 + por %xmm5,%xmm0 + por %xmm7,%xmm12 + movdqa 0(%rbp),%xmm15 + pxor %xmm3,%xmm1 + movdqa 112-128(%rax),%xmm3 + + movdqa %xmm10,%xmm8 + movdqa %xmm13,%xmm6 + pxor 208-128(%rax),%xmm1 + paddd %xmm15,%xmm14 + pslld $5,%xmm8 + pxor %xmm11,%xmm6 + + movdqa %xmm10,%xmm9 + movdqa %xmm0,64-128(%rax) + paddd %xmm0,%xmm14 + pxor %xmm3,%xmm1 + psrld $27,%xmm9 + pxor %xmm12,%xmm6 + movdqa %xmm11,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm1,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm14 + paddd %xmm1,%xmm1 + + psrld $2,%xmm11 + paddd %xmm8,%xmm14 + por %xmm5,%xmm1 + por %xmm7,%xmm11 + pxor %xmm4,%xmm2 + movdqa 128-128(%rax),%xmm4 + + movdqa %xmm14,%xmm8 + movdqa %xmm12,%xmm6 + pxor 224-128(%rax),%xmm2 + paddd %xmm15,%xmm13 + pslld $5,%xmm8 + pxor %xmm10,%xmm6 + + movdqa %xmm14,%xmm9 + movdqa 
%xmm1,80-128(%rax) + paddd %xmm1,%xmm13 + pxor %xmm4,%xmm2 + psrld $27,%xmm9 + pxor %xmm11,%xmm6 + movdqa %xmm10,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm2,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm13 + paddd %xmm2,%xmm2 + + psrld $2,%xmm10 + paddd %xmm8,%xmm13 + por %xmm5,%xmm2 + por %xmm7,%xmm10 + pxor %xmm0,%xmm3 + movdqa 144-128(%rax),%xmm0 + + movdqa %xmm13,%xmm8 + movdqa %xmm11,%xmm6 + pxor 240-128(%rax),%xmm3 + paddd %xmm15,%xmm12 + pslld $5,%xmm8 + pxor %xmm14,%xmm6 + + movdqa %xmm13,%xmm9 + movdqa %xmm2,96-128(%rax) + paddd %xmm2,%xmm12 + pxor %xmm0,%xmm3 + psrld $27,%xmm9 + pxor %xmm10,%xmm6 + movdqa %xmm14,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm3,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm12 + paddd %xmm3,%xmm3 + + psrld $2,%xmm14 + paddd %xmm8,%xmm12 + por %xmm5,%xmm3 + por %xmm7,%xmm14 + pxor %xmm1,%xmm4 + movdqa 160-128(%rax),%xmm1 + + movdqa %xmm12,%xmm8 + movdqa %xmm10,%xmm6 + pxor 0-128(%rax),%xmm4 + paddd %xmm15,%xmm11 + pslld $5,%xmm8 + pxor %xmm13,%xmm6 + + movdqa %xmm12,%xmm9 + movdqa %xmm3,112-128(%rax) + paddd %xmm3,%xmm11 + pxor %xmm1,%xmm4 + psrld $27,%xmm9 + pxor %xmm14,%xmm6 + movdqa %xmm13,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm4,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm11 + paddd %xmm4,%xmm4 + + psrld $2,%xmm13 + paddd %xmm8,%xmm11 + por %xmm5,%xmm4 + por %xmm7,%xmm13 + pxor %xmm2,%xmm0 + movdqa 176-128(%rax),%xmm2 + + movdqa %xmm11,%xmm8 + movdqa %xmm14,%xmm6 + pxor 16-128(%rax),%xmm0 + paddd %xmm15,%xmm10 + pslld $5,%xmm8 + pxor %xmm12,%xmm6 + + movdqa %xmm11,%xmm9 + movdqa %xmm4,128-128(%rax) + paddd %xmm4,%xmm10 + pxor %xmm2,%xmm0 + psrld $27,%xmm9 + pxor %xmm13,%xmm6 + movdqa %xmm12,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm0,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm10 + paddd %xmm0,%xmm0 + + psrld $2,%xmm12 + paddd %xmm8,%xmm10 + por %xmm5,%xmm0 + por %xmm7,%xmm12 + pxor %xmm3,%xmm1 + movdqa 192-128(%rax),%xmm3 + + movdqa %xmm10,%xmm8 + movdqa %xmm13,%xmm6 + pxor 32-128(%rax),%xmm1 + paddd %xmm15,%xmm14 + pslld $5,%xmm8 + pxor %xmm11,%xmm6 + + movdqa %xmm10,%xmm9 + movdqa %xmm0,144-128(%rax) + paddd %xmm0,%xmm14 + pxor %xmm3,%xmm1 + psrld $27,%xmm9 + pxor %xmm12,%xmm6 + movdqa %xmm11,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm1,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm14 + paddd %xmm1,%xmm1 + + psrld $2,%xmm11 + paddd %xmm8,%xmm14 + por %xmm5,%xmm1 + por %xmm7,%xmm11 + pxor %xmm4,%xmm2 + movdqa 208-128(%rax),%xmm4 + + movdqa %xmm14,%xmm8 + movdqa %xmm12,%xmm6 + pxor 48-128(%rax),%xmm2 + paddd %xmm15,%xmm13 + pslld $5,%xmm8 + pxor %xmm10,%xmm6 + + movdqa %xmm14,%xmm9 + movdqa %xmm1,160-128(%rax) + paddd %xmm1,%xmm13 + pxor %xmm4,%xmm2 + psrld $27,%xmm9 + pxor %xmm11,%xmm6 + movdqa %xmm10,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm2,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm13 + paddd %xmm2,%xmm2 + + psrld $2,%xmm10 + paddd %xmm8,%xmm13 + por %xmm5,%xmm2 + por %xmm7,%xmm10 + pxor %xmm0,%xmm3 + movdqa 224-128(%rax),%xmm0 + + movdqa %xmm13,%xmm8 + movdqa %xmm11,%xmm6 + pxor 64-128(%rax),%xmm3 + paddd %xmm15,%xmm12 + pslld $5,%xmm8 + pxor %xmm14,%xmm6 + + movdqa %xmm13,%xmm9 + movdqa %xmm2,176-128(%rax) + paddd %xmm2,%xmm12 + pxor %xmm0,%xmm3 + psrld $27,%xmm9 + pxor %xmm10,%xmm6 + movdqa %xmm14,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm3,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm12 + paddd %xmm3,%xmm3 + + psrld $2,%xmm14 + paddd %xmm8,%xmm12 + por %xmm5,%xmm3 + por %xmm7,%xmm14 + pxor %xmm1,%xmm4 + movdqa 240-128(%rax),%xmm1 + + movdqa %xmm12,%xmm8 
+ movdqa %xmm10,%xmm6 + pxor 80-128(%rax),%xmm4 + paddd %xmm15,%xmm11 + pslld $5,%xmm8 + pxor %xmm13,%xmm6 + + movdqa %xmm12,%xmm9 + movdqa %xmm3,192-128(%rax) + paddd %xmm3,%xmm11 + pxor %xmm1,%xmm4 + psrld $27,%xmm9 + pxor %xmm14,%xmm6 + movdqa %xmm13,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm4,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm11 + paddd %xmm4,%xmm4 + + psrld $2,%xmm13 + paddd %xmm8,%xmm11 + por %xmm5,%xmm4 + por %xmm7,%xmm13 + pxor %xmm2,%xmm0 + movdqa 0-128(%rax),%xmm2 + + movdqa %xmm11,%xmm8 + movdqa %xmm14,%xmm6 + pxor 96-128(%rax),%xmm0 + paddd %xmm15,%xmm10 + pslld $5,%xmm8 + pxor %xmm12,%xmm6 + + movdqa %xmm11,%xmm9 + movdqa %xmm4,208-128(%rax) + paddd %xmm4,%xmm10 + pxor %xmm2,%xmm0 + psrld $27,%xmm9 + pxor %xmm13,%xmm6 + movdqa %xmm12,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm0,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm10 + paddd %xmm0,%xmm0 + + psrld $2,%xmm12 + paddd %xmm8,%xmm10 + por %xmm5,%xmm0 + por %xmm7,%xmm12 + pxor %xmm3,%xmm1 + movdqa 16-128(%rax),%xmm3 + + movdqa %xmm10,%xmm8 + movdqa %xmm13,%xmm6 + pxor 112-128(%rax),%xmm1 + paddd %xmm15,%xmm14 + pslld $5,%xmm8 + pxor %xmm11,%xmm6 + + movdqa %xmm10,%xmm9 + movdqa %xmm0,224-128(%rax) + paddd %xmm0,%xmm14 + pxor %xmm3,%xmm1 + psrld $27,%xmm9 + pxor %xmm12,%xmm6 + movdqa %xmm11,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm1,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm14 + paddd %xmm1,%xmm1 + + psrld $2,%xmm11 + paddd %xmm8,%xmm14 + por %xmm5,%xmm1 + por %xmm7,%xmm11 + pxor %xmm4,%xmm2 + movdqa 32-128(%rax),%xmm4 + + movdqa %xmm14,%xmm8 + movdqa %xmm12,%xmm6 + pxor 128-128(%rax),%xmm2 + paddd %xmm15,%xmm13 + pslld $5,%xmm8 + pxor %xmm10,%xmm6 + + movdqa %xmm14,%xmm9 + movdqa %xmm1,240-128(%rax) + paddd %xmm1,%xmm13 + pxor %xmm4,%xmm2 + psrld $27,%xmm9 + pxor %xmm11,%xmm6 + movdqa %xmm10,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm2,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm13 + paddd %xmm2,%xmm2 + + psrld $2,%xmm10 + paddd %xmm8,%xmm13 + por %xmm5,%xmm2 + por %xmm7,%xmm10 + pxor %xmm0,%xmm3 + movdqa 48-128(%rax),%xmm0 + + movdqa %xmm13,%xmm8 + movdqa %xmm11,%xmm6 + pxor 144-128(%rax),%xmm3 + paddd %xmm15,%xmm12 + pslld $5,%xmm8 + pxor %xmm14,%xmm6 + + movdqa %xmm13,%xmm9 + movdqa %xmm2,0-128(%rax) + paddd %xmm2,%xmm12 + pxor %xmm0,%xmm3 + psrld $27,%xmm9 + pxor %xmm10,%xmm6 + movdqa %xmm14,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm3,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm12 + paddd %xmm3,%xmm3 + + psrld $2,%xmm14 + paddd %xmm8,%xmm12 + por %xmm5,%xmm3 + por %xmm7,%xmm14 + pxor %xmm1,%xmm4 + movdqa 64-128(%rax),%xmm1 + + movdqa %xmm12,%xmm8 + movdqa %xmm10,%xmm6 + pxor 160-128(%rax),%xmm4 + paddd %xmm15,%xmm11 + pslld $5,%xmm8 + pxor %xmm13,%xmm6 + + movdqa %xmm12,%xmm9 + movdqa %xmm3,16-128(%rax) + paddd %xmm3,%xmm11 + pxor %xmm1,%xmm4 + psrld $27,%xmm9 + pxor %xmm14,%xmm6 + movdqa %xmm13,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm4,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm11 + paddd %xmm4,%xmm4 + + psrld $2,%xmm13 + paddd %xmm8,%xmm11 + por %xmm5,%xmm4 + por %xmm7,%xmm13 + pxor %xmm2,%xmm0 + movdqa 80-128(%rax),%xmm2 + + movdqa %xmm11,%xmm8 + movdqa %xmm14,%xmm6 + pxor 176-128(%rax),%xmm0 + paddd %xmm15,%xmm10 + pslld $5,%xmm8 + pxor %xmm12,%xmm6 + + movdqa %xmm11,%xmm9 + movdqa %xmm4,32-128(%rax) + paddd %xmm4,%xmm10 + pxor %xmm2,%xmm0 + psrld $27,%xmm9 + pxor %xmm13,%xmm6 + movdqa %xmm12,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm0,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm10 + paddd %xmm0,%xmm0 + + psrld 
$2,%xmm12 + paddd %xmm8,%xmm10 + por %xmm5,%xmm0 + por %xmm7,%xmm12 + pxor %xmm3,%xmm1 + movdqa 96-128(%rax),%xmm3 + + movdqa %xmm10,%xmm8 + movdqa %xmm13,%xmm6 + pxor 192-128(%rax),%xmm1 + paddd %xmm15,%xmm14 + pslld $5,%xmm8 + pxor %xmm11,%xmm6 + + movdqa %xmm10,%xmm9 + movdqa %xmm0,48-128(%rax) + paddd %xmm0,%xmm14 + pxor %xmm3,%xmm1 + psrld $27,%xmm9 + pxor %xmm12,%xmm6 + movdqa %xmm11,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm1,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm14 + paddd %xmm1,%xmm1 + + psrld $2,%xmm11 + paddd %xmm8,%xmm14 + por %xmm5,%xmm1 + por %xmm7,%xmm11 + pxor %xmm4,%xmm2 + movdqa 112-128(%rax),%xmm4 + + movdqa %xmm14,%xmm8 + movdqa %xmm12,%xmm6 + pxor 208-128(%rax),%xmm2 + paddd %xmm15,%xmm13 + pslld $5,%xmm8 + pxor %xmm10,%xmm6 + + movdqa %xmm14,%xmm9 + movdqa %xmm1,64-128(%rax) + paddd %xmm1,%xmm13 + pxor %xmm4,%xmm2 + psrld $27,%xmm9 + pxor %xmm11,%xmm6 + movdqa %xmm10,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm2,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm13 + paddd %xmm2,%xmm2 + + psrld $2,%xmm10 + paddd %xmm8,%xmm13 + por %xmm5,%xmm2 + por %xmm7,%xmm10 + pxor %xmm0,%xmm3 + movdqa 128-128(%rax),%xmm0 + + movdqa %xmm13,%xmm8 + movdqa %xmm11,%xmm6 + pxor 224-128(%rax),%xmm3 + paddd %xmm15,%xmm12 + pslld $5,%xmm8 + pxor %xmm14,%xmm6 + + movdqa %xmm13,%xmm9 + movdqa %xmm2,80-128(%rax) + paddd %xmm2,%xmm12 + pxor %xmm0,%xmm3 + psrld $27,%xmm9 + pxor %xmm10,%xmm6 + movdqa %xmm14,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm3,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm12 + paddd %xmm3,%xmm3 + + psrld $2,%xmm14 + paddd %xmm8,%xmm12 + por %xmm5,%xmm3 + por %xmm7,%xmm14 + pxor %xmm1,%xmm4 + movdqa 144-128(%rax),%xmm1 + + movdqa %xmm12,%xmm8 + movdqa %xmm10,%xmm6 + pxor 240-128(%rax),%xmm4 + paddd %xmm15,%xmm11 + pslld $5,%xmm8 + pxor %xmm13,%xmm6 + + movdqa %xmm12,%xmm9 + movdqa %xmm3,96-128(%rax) + paddd %xmm3,%xmm11 + pxor %xmm1,%xmm4 + psrld $27,%xmm9 + pxor %xmm14,%xmm6 + movdqa %xmm13,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm4,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm11 + paddd %xmm4,%xmm4 + + psrld $2,%xmm13 + paddd %xmm8,%xmm11 + por %xmm5,%xmm4 + por %xmm7,%xmm13 + pxor %xmm2,%xmm0 + movdqa 160-128(%rax),%xmm2 + + movdqa %xmm11,%xmm8 + movdqa %xmm14,%xmm6 + pxor 0-128(%rax),%xmm0 + paddd %xmm15,%xmm10 + pslld $5,%xmm8 + pxor %xmm12,%xmm6 + + movdqa %xmm11,%xmm9 + movdqa %xmm4,112-128(%rax) + paddd %xmm4,%xmm10 + pxor %xmm2,%xmm0 + psrld $27,%xmm9 + pxor %xmm13,%xmm6 + movdqa %xmm12,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm0,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm10 + paddd %xmm0,%xmm0 + + psrld $2,%xmm12 + paddd %xmm8,%xmm10 + por %xmm5,%xmm0 + por %xmm7,%xmm12 + movdqa 32(%rbp),%xmm15 + pxor %xmm3,%xmm1 + movdqa 176-128(%rax),%xmm3 + + movdqa %xmm10,%xmm8 + movdqa %xmm13,%xmm7 + pxor 16-128(%rax),%xmm1 + pxor %xmm3,%xmm1 + paddd %xmm15,%xmm14 + pslld $5,%xmm8 + movdqa %xmm10,%xmm9 + pand %xmm12,%xmm7 + + movdqa %xmm13,%xmm6 + movdqa %xmm1,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm14 + pxor %xmm12,%xmm6 + + movdqa %xmm0,128-128(%rax) + paddd %xmm0,%xmm14 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm11,%xmm6 + movdqa %xmm11,%xmm7 + + pslld $30,%xmm7 + paddd %xmm1,%xmm1 + paddd %xmm6,%xmm14 + + psrld $2,%xmm11 + paddd %xmm8,%xmm14 + por %xmm5,%xmm1 + por %xmm7,%xmm11 + pxor %xmm4,%xmm2 + movdqa 192-128(%rax),%xmm4 + + movdqa %xmm14,%xmm8 + movdqa %xmm12,%xmm7 + pxor 32-128(%rax),%xmm2 + pxor %xmm4,%xmm2 + paddd %xmm15,%xmm13 + pslld $5,%xmm8 + movdqa %xmm14,%xmm9 + pand %xmm11,%xmm7 + + movdqa 
%xmm12,%xmm6 + movdqa %xmm2,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm13 + pxor %xmm11,%xmm6 + + movdqa %xmm1,144-128(%rax) + paddd %xmm1,%xmm13 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm10,%xmm6 + movdqa %xmm10,%xmm7 + + pslld $30,%xmm7 + paddd %xmm2,%xmm2 + paddd %xmm6,%xmm13 + + psrld $2,%xmm10 + paddd %xmm8,%xmm13 + por %xmm5,%xmm2 + por %xmm7,%xmm10 + pxor %xmm0,%xmm3 + movdqa 208-128(%rax),%xmm0 + + movdqa %xmm13,%xmm8 + movdqa %xmm11,%xmm7 + pxor 48-128(%rax),%xmm3 + pxor %xmm0,%xmm3 + paddd %xmm15,%xmm12 + pslld $5,%xmm8 + movdqa %xmm13,%xmm9 + pand %xmm10,%xmm7 + + movdqa %xmm11,%xmm6 + movdqa %xmm3,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm12 + pxor %xmm10,%xmm6 + + movdqa %xmm2,160-128(%rax) + paddd %xmm2,%xmm12 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm14,%xmm6 + movdqa %xmm14,%xmm7 + + pslld $30,%xmm7 + paddd %xmm3,%xmm3 + paddd %xmm6,%xmm12 + + psrld $2,%xmm14 + paddd %xmm8,%xmm12 + por %xmm5,%xmm3 + por %xmm7,%xmm14 + pxor %xmm1,%xmm4 + movdqa 224-128(%rax),%xmm1 + + movdqa %xmm12,%xmm8 + movdqa %xmm10,%xmm7 + pxor 64-128(%rax),%xmm4 + pxor %xmm1,%xmm4 + paddd %xmm15,%xmm11 + pslld $5,%xmm8 + movdqa %xmm12,%xmm9 + pand %xmm14,%xmm7 + + movdqa %xmm10,%xmm6 + movdqa %xmm4,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm11 + pxor %xmm14,%xmm6 + + movdqa %xmm3,176-128(%rax) + paddd %xmm3,%xmm11 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm13,%xmm6 + movdqa %xmm13,%xmm7 + + pslld $30,%xmm7 + paddd %xmm4,%xmm4 + paddd %xmm6,%xmm11 + + psrld $2,%xmm13 + paddd %xmm8,%xmm11 + por %xmm5,%xmm4 + por %xmm7,%xmm13 + pxor %xmm2,%xmm0 + movdqa 240-128(%rax),%xmm2 + + movdqa %xmm11,%xmm8 + movdqa %xmm14,%xmm7 + pxor 80-128(%rax),%xmm0 + pxor %xmm2,%xmm0 + paddd %xmm15,%xmm10 + pslld $5,%xmm8 + movdqa %xmm11,%xmm9 + pand %xmm13,%xmm7 + + movdqa %xmm14,%xmm6 + movdqa %xmm0,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm10 + pxor %xmm13,%xmm6 + + movdqa %xmm4,192-128(%rax) + paddd %xmm4,%xmm10 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm12,%xmm6 + movdqa %xmm12,%xmm7 + + pslld $30,%xmm7 + paddd %xmm0,%xmm0 + paddd %xmm6,%xmm10 + + psrld $2,%xmm12 + paddd %xmm8,%xmm10 + por %xmm5,%xmm0 + por %xmm7,%xmm12 + pxor %xmm3,%xmm1 + movdqa 0-128(%rax),%xmm3 + + movdqa %xmm10,%xmm8 + movdqa %xmm13,%xmm7 + pxor 96-128(%rax),%xmm1 + pxor %xmm3,%xmm1 + paddd %xmm15,%xmm14 + pslld $5,%xmm8 + movdqa %xmm10,%xmm9 + pand %xmm12,%xmm7 + + movdqa %xmm13,%xmm6 + movdqa %xmm1,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm14 + pxor %xmm12,%xmm6 + + movdqa %xmm0,208-128(%rax) + paddd %xmm0,%xmm14 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm11,%xmm6 + movdqa %xmm11,%xmm7 + + pslld $30,%xmm7 + paddd %xmm1,%xmm1 + paddd %xmm6,%xmm14 + + psrld $2,%xmm11 + paddd %xmm8,%xmm14 + por %xmm5,%xmm1 + por %xmm7,%xmm11 + pxor %xmm4,%xmm2 + movdqa 16-128(%rax),%xmm4 + + movdqa %xmm14,%xmm8 + movdqa %xmm12,%xmm7 + pxor 112-128(%rax),%xmm2 + pxor %xmm4,%xmm2 + paddd %xmm15,%xmm13 + pslld $5,%xmm8 + movdqa %xmm14,%xmm9 + pand %xmm11,%xmm7 + + movdqa %xmm12,%xmm6 + movdqa %xmm2,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm13 + pxor %xmm11,%xmm6 + + movdqa %xmm1,224-128(%rax) + paddd %xmm1,%xmm13 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm10,%xmm6 + movdqa %xmm10,%xmm7 + + pslld $30,%xmm7 + paddd %xmm2,%xmm2 + paddd %xmm6,%xmm13 + + psrld $2,%xmm10 + paddd %xmm8,%xmm13 + por %xmm5,%xmm2 + por %xmm7,%xmm10 + pxor %xmm0,%xmm3 + movdqa 32-128(%rax),%xmm0 + + movdqa %xmm13,%xmm8 + movdqa %xmm11,%xmm7 + pxor 128-128(%rax),%xmm3 + pxor %xmm0,%xmm3 + paddd %xmm15,%xmm12 + pslld $5,%xmm8 + movdqa %xmm13,%xmm9 + pand %xmm10,%xmm7 + + 
movdqa %xmm11,%xmm6 + movdqa %xmm3,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm12 + pxor %xmm10,%xmm6 + + movdqa %xmm2,240-128(%rax) + paddd %xmm2,%xmm12 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm14,%xmm6 + movdqa %xmm14,%xmm7 + + pslld $30,%xmm7 + paddd %xmm3,%xmm3 + paddd %xmm6,%xmm12 + + psrld $2,%xmm14 + paddd %xmm8,%xmm12 + por %xmm5,%xmm3 + por %xmm7,%xmm14 + pxor %xmm1,%xmm4 + movdqa 48-128(%rax),%xmm1 + + movdqa %xmm12,%xmm8 + movdqa %xmm10,%xmm7 + pxor 144-128(%rax),%xmm4 + pxor %xmm1,%xmm4 + paddd %xmm15,%xmm11 + pslld $5,%xmm8 + movdqa %xmm12,%xmm9 + pand %xmm14,%xmm7 + + movdqa %xmm10,%xmm6 + movdqa %xmm4,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm11 + pxor %xmm14,%xmm6 + + movdqa %xmm3,0-128(%rax) + paddd %xmm3,%xmm11 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm13,%xmm6 + movdqa %xmm13,%xmm7 + + pslld $30,%xmm7 + paddd %xmm4,%xmm4 + paddd %xmm6,%xmm11 + + psrld $2,%xmm13 + paddd %xmm8,%xmm11 + por %xmm5,%xmm4 + por %xmm7,%xmm13 + pxor %xmm2,%xmm0 + movdqa 64-128(%rax),%xmm2 + + movdqa %xmm11,%xmm8 + movdqa %xmm14,%xmm7 + pxor 160-128(%rax),%xmm0 + pxor %xmm2,%xmm0 + paddd %xmm15,%xmm10 + pslld $5,%xmm8 + movdqa %xmm11,%xmm9 + pand %xmm13,%xmm7 + + movdqa %xmm14,%xmm6 + movdqa %xmm0,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm10 + pxor %xmm13,%xmm6 + + movdqa %xmm4,16-128(%rax) + paddd %xmm4,%xmm10 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm12,%xmm6 + movdqa %xmm12,%xmm7 + + pslld $30,%xmm7 + paddd %xmm0,%xmm0 + paddd %xmm6,%xmm10 + + psrld $2,%xmm12 + paddd %xmm8,%xmm10 + por %xmm5,%xmm0 + por %xmm7,%xmm12 + pxor %xmm3,%xmm1 + movdqa 80-128(%rax),%xmm3 + + movdqa %xmm10,%xmm8 + movdqa %xmm13,%xmm7 + pxor 176-128(%rax),%xmm1 + pxor %xmm3,%xmm1 + paddd %xmm15,%xmm14 + pslld $5,%xmm8 + movdqa %xmm10,%xmm9 + pand %xmm12,%xmm7 + + movdqa %xmm13,%xmm6 + movdqa %xmm1,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm14 + pxor %xmm12,%xmm6 + + movdqa %xmm0,32-128(%rax) + paddd %xmm0,%xmm14 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm11,%xmm6 + movdqa %xmm11,%xmm7 + + pslld $30,%xmm7 + paddd %xmm1,%xmm1 + paddd %xmm6,%xmm14 + + psrld $2,%xmm11 + paddd %xmm8,%xmm14 + por %xmm5,%xmm1 + por %xmm7,%xmm11 + pxor %xmm4,%xmm2 + movdqa 96-128(%rax),%xmm4 + + movdqa %xmm14,%xmm8 + movdqa %xmm12,%xmm7 + pxor 192-128(%rax),%xmm2 + pxor %xmm4,%xmm2 + paddd %xmm15,%xmm13 + pslld $5,%xmm8 + movdqa %xmm14,%xmm9 + pand %xmm11,%xmm7 + + movdqa %xmm12,%xmm6 + movdqa %xmm2,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm13 + pxor %xmm11,%xmm6 + + movdqa %xmm1,48-128(%rax) + paddd %xmm1,%xmm13 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm10,%xmm6 + movdqa %xmm10,%xmm7 + + pslld $30,%xmm7 + paddd %xmm2,%xmm2 + paddd %xmm6,%xmm13 + + psrld $2,%xmm10 + paddd %xmm8,%xmm13 + por %xmm5,%xmm2 + por %xmm7,%xmm10 + pxor %xmm0,%xmm3 + movdqa 112-128(%rax),%xmm0 + + movdqa %xmm13,%xmm8 + movdqa %xmm11,%xmm7 + pxor 208-128(%rax),%xmm3 + pxor %xmm0,%xmm3 + paddd %xmm15,%xmm12 + pslld $5,%xmm8 + movdqa %xmm13,%xmm9 + pand %xmm10,%xmm7 + + movdqa %xmm11,%xmm6 + movdqa %xmm3,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm12 + pxor %xmm10,%xmm6 + + movdqa %xmm2,64-128(%rax) + paddd %xmm2,%xmm12 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm14,%xmm6 + movdqa %xmm14,%xmm7 + + pslld $30,%xmm7 + paddd %xmm3,%xmm3 + paddd %xmm6,%xmm12 + + psrld $2,%xmm14 + paddd %xmm8,%xmm12 + por %xmm5,%xmm3 + por %xmm7,%xmm14 + pxor %xmm1,%xmm4 + movdqa 128-128(%rax),%xmm1 + + movdqa %xmm12,%xmm8 + movdqa %xmm10,%xmm7 + pxor 224-128(%rax),%xmm4 + pxor %xmm1,%xmm4 + paddd %xmm15,%xmm11 + pslld $5,%xmm8 + movdqa %xmm12,%xmm9 + pand %xmm14,%xmm7 + + 
movdqa %xmm10,%xmm6 + movdqa %xmm4,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm11 + pxor %xmm14,%xmm6 + + movdqa %xmm3,80-128(%rax) + paddd %xmm3,%xmm11 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm13,%xmm6 + movdqa %xmm13,%xmm7 + + pslld $30,%xmm7 + paddd %xmm4,%xmm4 + paddd %xmm6,%xmm11 + + psrld $2,%xmm13 + paddd %xmm8,%xmm11 + por %xmm5,%xmm4 + por %xmm7,%xmm13 + pxor %xmm2,%xmm0 + movdqa 144-128(%rax),%xmm2 + + movdqa %xmm11,%xmm8 + movdqa %xmm14,%xmm7 + pxor 240-128(%rax),%xmm0 + pxor %xmm2,%xmm0 + paddd %xmm15,%xmm10 + pslld $5,%xmm8 + movdqa %xmm11,%xmm9 + pand %xmm13,%xmm7 + + movdqa %xmm14,%xmm6 + movdqa %xmm0,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm10 + pxor %xmm13,%xmm6 + + movdqa %xmm4,96-128(%rax) + paddd %xmm4,%xmm10 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm12,%xmm6 + movdqa %xmm12,%xmm7 + + pslld $30,%xmm7 + paddd %xmm0,%xmm0 + paddd %xmm6,%xmm10 + + psrld $2,%xmm12 + paddd %xmm8,%xmm10 + por %xmm5,%xmm0 + por %xmm7,%xmm12 + pxor %xmm3,%xmm1 + movdqa 160-128(%rax),%xmm3 + + movdqa %xmm10,%xmm8 + movdqa %xmm13,%xmm7 + pxor 0-128(%rax),%xmm1 + pxor %xmm3,%xmm1 + paddd %xmm15,%xmm14 + pslld $5,%xmm8 + movdqa %xmm10,%xmm9 + pand %xmm12,%xmm7 + + movdqa %xmm13,%xmm6 + movdqa %xmm1,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm14 + pxor %xmm12,%xmm6 + + movdqa %xmm0,112-128(%rax) + paddd %xmm0,%xmm14 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm11,%xmm6 + movdqa %xmm11,%xmm7 + + pslld $30,%xmm7 + paddd %xmm1,%xmm1 + paddd %xmm6,%xmm14 + + psrld $2,%xmm11 + paddd %xmm8,%xmm14 + por %xmm5,%xmm1 + por %xmm7,%xmm11 + pxor %xmm4,%xmm2 + movdqa 176-128(%rax),%xmm4 + + movdqa %xmm14,%xmm8 + movdqa %xmm12,%xmm7 + pxor 16-128(%rax),%xmm2 + pxor %xmm4,%xmm2 + paddd %xmm15,%xmm13 + pslld $5,%xmm8 + movdqa %xmm14,%xmm9 + pand %xmm11,%xmm7 + + movdqa %xmm12,%xmm6 + movdqa %xmm2,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm13 + pxor %xmm11,%xmm6 + + movdqa %xmm1,128-128(%rax) + paddd %xmm1,%xmm13 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm10,%xmm6 + movdqa %xmm10,%xmm7 + + pslld $30,%xmm7 + paddd %xmm2,%xmm2 + paddd %xmm6,%xmm13 + + psrld $2,%xmm10 + paddd %xmm8,%xmm13 + por %xmm5,%xmm2 + por %xmm7,%xmm10 + pxor %xmm0,%xmm3 + movdqa 192-128(%rax),%xmm0 + + movdqa %xmm13,%xmm8 + movdqa %xmm11,%xmm7 + pxor 32-128(%rax),%xmm3 + pxor %xmm0,%xmm3 + paddd %xmm15,%xmm12 + pslld $5,%xmm8 + movdqa %xmm13,%xmm9 + pand %xmm10,%xmm7 + + movdqa %xmm11,%xmm6 + movdqa %xmm3,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm12 + pxor %xmm10,%xmm6 + + movdqa %xmm2,144-128(%rax) + paddd %xmm2,%xmm12 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm14,%xmm6 + movdqa %xmm14,%xmm7 + + pslld $30,%xmm7 + paddd %xmm3,%xmm3 + paddd %xmm6,%xmm12 + + psrld $2,%xmm14 + paddd %xmm8,%xmm12 + por %xmm5,%xmm3 + por %xmm7,%xmm14 + pxor %xmm1,%xmm4 + movdqa 208-128(%rax),%xmm1 + + movdqa %xmm12,%xmm8 + movdqa %xmm10,%xmm7 + pxor 48-128(%rax),%xmm4 + pxor %xmm1,%xmm4 + paddd %xmm15,%xmm11 + pslld $5,%xmm8 + movdqa %xmm12,%xmm9 + pand %xmm14,%xmm7 + + movdqa %xmm10,%xmm6 + movdqa %xmm4,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm11 + pxor %xmm14,%xmm6 + + movdqa %xmm3,160-128(%rax) + paddd %xmm3,%xmm11 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm13,%xmm6 + movdqa %xmm13,%xmm7 + + pslld $30,%xmm7 + paddd %xmm4,%xmm4 + paddd %xmm6,%xmm11 + + psrld $2,%xmm13 + paddd %xmm8,%xmm11 + por %xmm5,%xmm4 + por %xmm7,%xmm13 + pxor %xmm2,%xmm0 + movdqa 224-128(%rax),%xmm2 + + movdqa %xmm11,%xmm8 + movdqa %xmm14,%xmm7 + pxor 64-128(%rax),%xmm0 + pxor %xmm2,%xmm0 + paddd %xmm15,%xmm10 + pslld $5,%xmm8 + movdqa %xmm11,%xmm9 + pand %xmm13,%xmm7 + 
+ movdqa %xmm14,%xmm6 + movdqa %xmm0,%xmm5 + psrld $27,%xmm9 + paddd %xmm7,%xmm10 + pxor %xmm13,%xmm6 + + movdqa %xmm4,176-128(%rax) + paddd %xmm4,%xmm10 + por %xmm9,%xmm8 + psrld $31,%xmm5 + pand %xmm12,%xmm6 + movdqa %xmm12,%xmm7 + + pslld $30,%xmm7 + paddd %xmm0,%xmm0 + paddd %xmm6,%xmm10 + + psrld $2,%xmm12 + paddd %xmm8,%xmm10 + por %xmm5,%xmm0 + por %xmm7,%xmm12 + movdqa 64(%rbp),%xmm15 + pxor %xmm3,%xmm1 + movdqa 240-128(%rax),%xmm3 + + movdqa %xmm10,%xmm8 + movdqa %xmm13,%xmm6 + pxor 80-128(%rax),%xmm1 + paddd %xmm15,%xmm14 + pslld $5,%xmm8 + pxor %xmm11,%xmm6 + + movdqa %xmm10,%xmm9 + movdqa %xmm0,192-128(%rax) + paddd %xmm0,%xmm14 + pxor %xmm3,%xmm1 + psrld $27,%xmm9 + pxor %xmm12,%xmm6 + movdqa %xmm11,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm1,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm14 + paddd %xmm1,%xmm1 + + psrld $2,%xmm11 + paddd %xmm8,%xmm14 + por %xmm5,%xmm1 + por %xmm7,%xmm11 + pxor %xmm4,%xmm2 + movdqa 0-128(%rax),%xmm4 + + movdqa %xmm14,%xmm8 + movdqa %xmm12,%xmm6 + pxor 96-128(%rax),%xmm2 + paddd %xmm15,%xmm13 + pslld $5,%xmm8 + pxor %xmm10,%xmm6 + + movdqa %xmm14,%xmm9 + movdqa %xmm1,208-128(%rax) + paddd %xmm1,%xmm13 + pxor %xmm4,%xmm2 + psrld $27,%xmm9 + pxor %xmm11,%xmm6 + movdqa %xmm10,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm2,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm13 + paddd %xmm2,%xmm2 + + psrld $2,%xmm10 + paddd %xmm8,%xmm13 + por %xmm5,%xmm2 + por %xmm7,%xmm10 + pxor %xmm0,%xmm3 + movdqa 16-128(%rax),%xmm0 + + movdqa %xmm13,%xmm8 + movdqa %xmm11,%xmm6 + pxor 112-128(%rax),%xmm3 + paddd %xmm15,%xmm12 + pslld $5,%xmm8 + pxor %xmm14,%xmm6 + + movdqa %xmm13,%xmm9 + movdqa %xmm2,224-128(%rax) + paddd %xmm2,%xmm12 + pxor %xmm0,%xmm3 + psrld $27,%xmm9 + pxor %xmm10,%xmm6 + movdqa %xmm14,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm3,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm12 + paddd %xmm3,%xmm3 + + psrld $2,%xmm14 + paddd %xmm8,%xmm12 + por %xmm5,%xmm3 + por %xmm7,%xmm14 + pxor %xmm1,%xmm4 + movdqa 32-128(%rax),%xmm1 + + movdqa %xmm12,%xmm8 + movdqa %xmm10,%xmm6 + pxor 128-128(%rax),%xmm4 + paddd %xmm15,%xmm11 + pslld $5,%xmm8 + pxor %xmm13,%xmm6 + + movdqa %xmm12,%xmm9 + movdqa %xmm3,240-128(%rax) + paddd %xmm3,%xmm11 + pxor %xmm1,%xmm4 + psrld $27,%xmm9 + pxor %xmm14,%xmm6 + movdqa %xmm13,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm4,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm11 + paddd %xmm4,%xmm4 + + psrld $2,%xmm13 + paddd %xmm8,%xmm11 + por %xmm5,%xmm4 + por %xmm7,%xmm13 + pxor %xmm2,%xmm0 + movdqa 48-128(%rax),%xmm2 + + movdqa %xmm11,%xmm8 + movdqa %xmm14,%xmm6 + pxor 144-128(%rax),%xmm0 + paddd %xmm15,%xmm10 + pslld $5,%xmm8 + pxor %xmm12,%xmm6 + + movdqa %xmm11,%xmm9 + movdqa %xmm4,0-128(%rax) + paddd %xmm4,%xmm10 + pxor %xmm2,%xmm0 + psrld $27,%xmm9 + pxor %xmm13,%xmm6 + movdqa %xmm12,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm0,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm10 + paddd %xmm0,%xmm0 + + psrld $2,%xmm12 + paddd %xmm8,%xmm10 + por %xmm5,%xmm0 + por %xmm7,%xmm12 + pxor %xmm3,%xmm1 + movdqa 64-128(%rax),%xmm3 + + movdqa %xmm10,%xmm8 + movdqa %xmm13,%xmm6 + pxor 160-128(%rax),%xmm1 + paddd %xmm15,%xmm14 + pslld $5,%xmm8 + pxor %xmm11,%xmm6 + + movdqa %xmm10,%xmm9 + movdqa %xmm0,16-128(%rax) + paddd %xmm0,%xmm14 + pxor %xmm3,%xmm1 + psrld $27,%xmm9 + pxor %xmm12,%xmm6 + movdqa %xmm11,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm1,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm14 + paddd %xmm1,%xmm1 + + psrld $2,%xmm11 + paddd %xmm8,%xmm14 + por %xmm5,%xmm1 + por 
%xmm7,%xmm11 + pxor %xmm4,%xmm2 + movdqa 80-128(%rax),%xmm4 + + movdqa %xmm14,%xmm8 + movdqa %xmm12,%xmm6 + pxor 176-128(%rax),%xmm2 + paddd %xmm15,%xmm13 + pslld $5,%xmm8 + pxor %xmm10,%xmm6 + + movdqa %xmm14,%xmm9 + movdqa %xmm1,32-128(%rax) + paddd %xmm1,%xmm13 + pxor %xmm4,%xmm2 + psrld $27,%xmm9 + pxor %xmm11,%xmm6 + movdqa %xmm10,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm2,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm13 + paddd %xmm2,%xmm2 + + psrld $2,%xmm10 + paddd %xmm8,%xmm13 + por %xmm5,%xmm2 + por %xmm7,%xmm10 + pxor %xmm0,%xmm3 + movdqa 96-128(%rax),%xmm0 + + movdqa %xmm13,%xmm8 + movdqa %xmm11,%xmm6 + pxor 192-128(%rax),%xmm3 + paddd %xmm15,%xmm12 + pslld $5,%xmm8 + pxor %xmm14,%xmm6 + + movdqa %xmm13,%xmm9 + movdqa %xmm2,48-128(%rax) + paddd %xmm2,%xmm12 + pxor %xmm0,%xmm3 + psrld $27,%xmm9 + pxor %xmm10,%xmm6 + movdqa %xmm14,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm3,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm12 + paddd %xmm3,%xmm3 + + psrld $2,%xmm14 + paddd %xmm8,%xmm12 + por %xmm5,%xmm3 + por %xmm7,%xmm14 + pxor %xmm1,%xmm4 + movdqa 112-128(%rax),%xmm1 + + movdqa %xmm12,%xmm8 + movdqa %xmm10,%xmm6 + pxor 208-128(%rax),%xmm4 + paddd %xmm15,%xmm11 + pslld $5,%xmm8 + pxor %xmm13,%xmm6 + + movdqa %xmm12,%xmm9 + movdqa %xmm3,64-128(%rax) + paddd %xmm3,%xmm11 + pxor %xmm1,%xmm4 + psrld $27,%xmm9 + pxor %xmm14,%xmm6 + movdqa %xmm13,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm4,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm11 + paddd %xmm4,%xmm4 + + psrld $2,%xmm13 + paddd %xmm8,%xmm11 + por %xmm5,%xmm4 + por %xmm7,%xmm13 + pxor %xmm2,%xmm0 + movdqa 128-128(%rax),%xmm2 + + movdqa %xmm11,%xmm8 + movdqa %xmm14,%xmm6 + pxor 224-128(%rax),%xmm0 + paddd %xmm15,%xmm10 + pslld $5,%xmm8 + pxor %xmm12,%xmm6 + + movdqa %xmm11,%xmm9 + movdqa %xmm4,80-128(%rax) + paddd %xmm4,%xmm10 + pxor %xmm2,%xmm0 + psrld $27,%xmm9 + pxor %xmm13,%xmm6 + movdqa %xmm12,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm0,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm10 + paddd %xmm0,%xmm0 + + psrld $2,%xmm12 + paddd %xmm8,%xmm10 + por %xmm5,%xmm0 + por %xmm7,%xmm12 + pxor %xmm3,%xmm1 + movdqa 144-128(%rax),%xmm3 + + movdqa %xmm10,%xmm8 + movdqa %xmm13,%xmm6 + pxor 240-128(%rax),%xmm1 + paddd %xmm15,%xmm14 + pslld $5,%xmm8 + pxor %xmm11,%xmm6 + + movdqa %xmm10,%xmm9 + movdqa %xmm0,96-128(%rax) + paddd %xmm0,%xmm14 + pxor %xmm3,%xmm1 + psrld $27,%xmm9 + pxor %xmm12,%xmm6 + movdqa %xmm11,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm1,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm14 + paddd %xmm1,%xmm1 + + psrld $2,%xmm11 + paddd %xmm8,%xmm14 + por %xmm5,%xmm1 + por %xmm7,%xmm11 + pxor %xmm4,%xmm2 + movdqa 160-128(%rax),%xmm4 + + movdqa %xmm14,%xmm8 + movdqa %xmm12,%xmm6 + pxor 0-128(%rax),%xmm2 + paddd %xmm15,%xmm13 + pslld $5,%xmm8 + pxor %xmm10,%xmm6 + + movdqa %xmm14,%xmm9 + movdqa %xmm1,112-128(%rax) + paddd %xmm1,%xmm13 + pxor %xmm4,%xmm2 + psrld $27,%xmm9 + pxor %xmm11,%xmm6 + movdqa %xmm10,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm2,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm13 + paddd %xmm2,%xmm2 + + psrld $2,%xmm10 + paddd %xmm8,%xmm13 + por %xmm5,%xmm2 + por %xmm7,%xmm10 + pxor %xmm0,%xmm3 + movdqa 176-128(%rax),%xmm0 + + movdqa %xmm13,%xmm8 + movdqa %xmm11,%xmm6 + pxor 16-128(%rax),%xmm3 + paddd %xmm15,%xmm12 + pslld $5,%xmm8 + pxor %xmm14,%xmm6 + + movdqa %xmm13,%xmm9 + paddd %xmm2,%xmm12 + pxor %xmm0,%xmm3 + psrld $27,%xmm9 + pxor %xmm10,%xmm6 + movdqa %xmm14,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm3,%xmm5 + por %xmm9,%xmm8 + psrld 
$31,%xmm5 + paddd %xmm6,%xmm12 + paddd %xmm3,%xmm3 + + psrld $2,%xmm14 + paddd %xmm8,%xmm12 + por %xmm5,%xmm3 + por %xmm7,%xmm14 + pxor %xmm1,%xmm4 + movdqa 192-128(%rax),%xmm1 + + movdqa %xmm12,%xmm8 + movdqa %xmm10,%xmm6 + pxor 32-128(%rax),%xmm4 + paddd %xmm15,%xmm11 + pslld $5,%xmm8 + pxor %xmm13,%xmm6 + + movdqa %xmm12,%xmm9 + paddd %xmm3,%xmm11 + pxor %xmm1,%xmm4 + psrld $27,%xmm9 + pxor %xmm14,%xmm6 + movdqa %xmm13,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm4,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm11 + paddd %xmm4,%xmm4 + + psrld $2,%xmm13 + paddd %xmm8,%xmm11 + por %xmm5,%xmm4 + por %xmm7,%xmm13 + pxor %xmm2,%xmm0 + movdqa 208-128(%rax),%xmm2 + + movdqa %xmm11,%xmm8 + movdqa %xmm14,%xmm6 + pxor 48-128(%rax),%xmm0 + paddd %xmm15,%xmm10 + pslld $5,%xmm8 + pxor %xmm12,%xmm6 + + movdqa %xmm11,%xmm9 + paddd %xmm4,%xmm10 + pxor %xmm2,%xmm0 + psrld $27,%xmm9 + pxor %xmm13,%xmm6 + movdqa %xmm12,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm0,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm10 + paddd %xmm0,%xmm0 + + psrld $2,%xmm12 + paddd %xmm8,%xmm10 + por %xmm5,%xmm0 + por %xmm7,%xmm12 + pxor %xmm3,%xmm1 + movdqa 224-128(%rax),%xmm3 + + movdqa %xmm10,%xmm8 + movdqa %xmm13,%xmm6 + pxor 64-128(%rax),%xmm1 + paddd %xmm15,%xmm14 + pslld $5,%xmm8 + pxor %xmm11,%xmm6 + + movdqa %xmm10,%xmm9 + paddd %xmm0,%xmm14 + pxor %xmm3,%xmm1 + psrld $27,%xmm9 + pxor %xmm12,%xmm6 + movdqa %xmm11,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm1,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm14 + paddd %xmm1,%xmm1 + + psrld $2,%xmm11 + paddd %xmm8,%xmm14 + por %xmm5,%xmm1 + por %xmm7,%xmm11 + pxor %xmm4,%xmm2 + movdqa 240-128(%rax),%xmm4 + + movdqa %xmm14,%xmm8 + movdqa %xmm12,%xmm6 + pxor 80-128(%rax),%xmm2 + paddd %xmm15,%xmm13 + pslld $5,%xmm8 + pxor %xmm10,%xmm6 + + movdqa %xmm14,%xmm9 + paddd %xmm1,%xmm13 + pxor %xmm4,%xmm2 + psrld $27,%xmm9 + pxor %xmm11,%xmm6 + movdqa %xmm10,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm2,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm13 + paddd %xmm2,%xmm2 + + psrld $2,%xmm10 + paddd %xmm8,%xmm13 + por %xmm5,%xmm2 + por %xmm7,%xmm10 + pxor %xmm0,%xmm3 + movdqa 0-128(%rax),%xmm0 + + movdqa %xmm13,%xmm8 + movdqa %xmm11,%xmm6 + pxor 96-128(%rax),%xmm3 + paddd %xmm15,%xmm12 + pslld $5,%xmm8 + pxor %xmm14,%xmm6 + + movdqa %xmm13,%xmm9 + paddd %xmm2,%xmm12 + pxor %xmm0,%xmm3 + psrld $27,%xmm9 + pxor %xmm10,%xmm6 + movdqa %xmm14,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm3,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm12 + paddd %xmm3,%xmm3 + + psrld $2,%xmm14 + paddd %xmm8,%xmm12 + por %xmm5,%xmm3 + por %xmm7,%xmm14 + pxor %xmm1,%xmm4 + movdqa 16-128(%rax),%xmm1 + + movdqa %xmm12,%xmm8 + movdqa %xmm10,%xmm6 + pxor 112-128(%rax),%xmm4 + paddd %xmm15,%xmm11 + pslld $5,%xmm8 + pxor %xmm13,%xmm6 + + movdqa %xmm12,%xmm9 + paddd %xmm3,%xmm11 + pxor %xmm1,%xmm4 + psrld $27,%xmm9 + pxor %xmm14,%xmm6 + movdqa %xmm13,%xmm7 + + pslld $30,%xmm7 + movdqa %xmm4,%xmm5 + por %xmm9,%xmm8 + psrld $31,%xmm5 + paddd %xmm6,%xmm11 + paddd %xmm4,%xmm4 + + psrld $2,%xmm13 + paddd %xmm8,%xmm11 + por %xmm5,%xmm4 + por %xmm7,%xmm13 + movdqa %xmm11,%xmm8 + paddd %xmm15,%xmm10 + movdqa %xmm14,%xmm6 + pslld $5,%xmm8 + pxor %xmm12,%xmm6 + + movdqa %xmm11,%xmm9 + paddd %xmm4,%xmm10 + psrld $27,%xmm9 + movdqa %xmm12,%xmm7 + pxor %xmm13,%xmm6 + + pslld $30,%xmm7 + por %xmm9,%xmm8 + paddd %xmm6,%xmm10 + + psrld $2,%xmm12 + paddd %xmm8,%xmm10 + por %xmm7,%xmm12 + movdqa (%rbx),%xmm0 + movl $1,%ecx + cmpl 0(%rbx),%ecx + pxor %xmm8,%xmm8 + cmovgeq %rbp,%r8 + cmpl 
4(%rbx),%ecx + movdqa %xmm0,%xmm1 + cmovgeq %rbp,%r9 + cmpl 8(%rbx),%ecx + pcmpgtd %xmm8,%xmm1 + cmovgeq %rbp,%r10 + cmpl 12(%rbx),%ecx + paddd %xmm1,%xmm0 + cmovgeq %rbp,%r11 + + movdqu 0(%rdi),%xmm6 + pand %xmm1,%xmm10 + movdqu 32(%rdi),%xmm7 + pand %xmm1,%xmm11 + paddd %xmm6,%xmm10 + movdqu 64(%rdi),%xmm8 + pand %xmm1,%xmm12 + paddd %xmm7,%xmm11 + movdqu 96(%rdi),%xmm9 + pand %xmm1,%xmm13 + paddd %xmm8,%xmm12 + movdqu 128(%rdi),%xmm5 + pand %xmm1,%xmm14 + movdqu %xmm10,0(%rdi) + paddd %xmm9,%xmm13 + movdqu %xmm11,32(%rdi) + paddd %xmm5,%xmm14 + movdqu %xmm12,64(%rdi) + movdqu %xmm13,96(%rdi) + movdqu %xmm14,128(%rdi) + + movdqa %xmm0,(%rbx) + movdqa 96(%rbp),%xmm5 + movdqa -32(%rbp),%xmm15 + decl %edx + jnz .Loop + + movl 280(%rsp),%edx + leaq 16(%rdi),%rdi + leaq 64(%rsi),%rsi + decl %edx + jnz .Loop_grande + +.Ldone: + movq 272(%rsp),%rax +.cfi_def_cfa %rax,8 + movq -16(%rax),%rbp +.cfi_restore %rbp + movq -8(%rax),%rbx +.cfi_restore %rbx + leaq (%rax),%rsp +.cfi_def_cfa_register %rsp +.Lepilogue: + .byte 0xf3,0xc3 +.cfi_endproc +.size sha1_multi_block,.-sha1_multi_block +.type sha1_multi_block_shaext,@function +.align 32 +sha1_multi_block_shaext: +.cfi_startproc +_shaext_shortcut: + movq %rsp,%rax +.cfi_def_cfa_register %rax + pushq %rbx +.cfi_offset %rbx,-16 + pushq %rbp +.cfi_offset %rbp,-24 + subq $288,%rsp + shll $1,%edx + andq $-256,%rsp + leaq 64(%rdi),%rdi + movq %rax,272(%rsp) +.Lbody_shaext: + leaq 256(%rsp),%rbx + movdqa K_XX_XX+128(%rip),%xmm3 + +.Loop_grande_shaext: + movl %edx,280(%rsp) + xorl %edx,%edx + movq 0(%rsi),%r8 + movl 8(%rsi),%ecx + cmpl %edx,%ecx + cmovgl %ecx,%edx + testl %ecx,%ecx + movl %ecx,0(%rbx) + cmovleq %rsp,%r8 + movq 16(%rsi),%r9 + movl 24(%rsi),%ecx + cmpl %edx,%ecx + cmovgl %ecx,%edx + testl %ecx,%ecx + movl %ecx,4(%rbx) + cmovleq %rsp,%r9 + testl %edx,%edx + jz .Ldone_shaext + + movq 0-64(%rdi),%xmm0 + movq 32-64(%rdi),%xmm4 + movq 64-64(%rdi),%xmm5 + movq 96-64(%rdi),%xmm6 + movq 128-64(%rdi),%xmm7 + + punpckldq %xmm4,%xmm0 + punpckldq %xmm6,%xmm5 + + movdqa %xmm0,%xmm8 + punpcklqdq %xmm5,%xmm0 + punpckhqdq %xmm5,%xmm8 + + pshufd $63,%xmm7,%xmm1 + pshufd $127,%xmm7,%xmm9 + pshufd $27,%xmm0,%xmm0 + pshufd $27,%xmm8,%xmm8 + jmp .Loop_shaext + +.align 32 +.Loop_shaext: + movdqu 0(%r8),%xmm4 + movdqu 0(%r9),%xmm11 + movdqu 16(%r8),%xmm5 + movdqu 16(%r9),%xmm12 + movdqu 32(%r8),%xmm6 +.byte 102,15,56,0,227 + movdqu 32(%r9),%xmm13 +.byte 102,68,15,56,0,219 + movdqu 48(%r8),%xmm7 + leaq 64(%r8),%r8 +.byte 102,15,56,0,235 + movdqu 48(%r9),%xmm14 + leaq 64(%r9),%r9 +.byte 102,68,15,56,0,227 + + movdqa %xmm1,80(%rsp) + paddd %xmm4,%xmm1 + movdqa %xmm9,112(%rsp) + paddd %xmm11,%xmm9 + movdqa %xmm0,64(%rsp) + movdqa %xmm0,%xmm2 + movdqa %xmm8,96(%rsp) + movdqa %xmm8,%xmm10 +.byte 15,58,204,193,0 +.byte 15,56,200,213 +.byte 69,15,58,204,193,0 +.byte 69,15,56,200,212 +.byte 102,15,56,0,243 + prefetcht0 127(%r8) +.byte 15,56,201,229 +.byte 102,68,15,56,0,235 + prefetcht0 127(%r9) +.byte 69,15,56,201,220 + +.byte 102,15,56,0,251 + movdqa %xmm0,%xmm1 +.byte 102,68,15,56,0,243 + movdqa %xmm8,%xmm9 +.byte 15,58,204,194,0 +.byte 15,56,200,206 +.byte 69,15,58,204,194,0 +.byte 69,15,56,200,205 + pxor %xmm6,%xmm4 +.byte 15,56,201,238 + pxor %xmm13,%xmm11 +.byte 69,15,56,201,229 + movdqa %xmm0,%xmm2 + movdqa %xmm8,%xmm10 +.byte 15,58,204,193,0 +.byte 15,56,200,215 +.byte 69,15,58,204,193,0 +.byte 69,15,56,200,214 +.byte 15,56,202,231 +.byte 69,15,56,202,222 + pxor %xmm7,%xmm5 +.byte 15,56,201,247 + pxor %xmm14,%xmm12 +.byte 69,15,56,201,238 + movdqa %xmm0,%xmm1 + 
movdqa %xmm8,%xmm9 +.byte 15,58,204,194,0 +.byte 15,56,200,204 +.byte 69,15,58,204,194,0 +.byte 69,15,56,200,203 +.byte 15,56,202,236 +.byte 69,15,56,202,227 + pxor %xmm4,%xmm6 +.byte 15,56,201,252 + pxor %xmm11,%xmm13 +.byte 69,15,56,201,243 + movdqa %xmm0,%xmm2 + movdqa %xmm8,%xmm10 +.byte 15,58,204,193,0 +.byte 15,56,200,213 +.byte 69,15,58,204,193,0 +.byte 69,15,56,200,212 +.byte 15,56,202,245 +.byte 69,15,56,202,236 + pxor %xmm5,%xmm7 +.byte 15,56,201,229 + pxor %xmm12,%xmm14 +.byte 69,15,56,201,220 + movdqa %xmm0,%xmm1 + movdqa %xmm8,%xmm9 +.byte 15,58,204,194,1 +.byte 15,56,200,206 +.byte 69,15,58,204,194,1 +.byte 69,15,56,200,205 +.byte 15,56,202,254 +.byte 69,15,56,202,245 + pxor %xmm6,%xmm4 +.byte 15,56,201,238 + pxor %xmm13,%xmm11 +.byte 69,15,56,201,229 + movdqa %xmm0,%xmm2 + movdqa %xmm8,%xmm10 +.byte 15,58,204,193,1 +.byte 15,56,200,215 +.byte 69,15,58,204,193,1 +.byte 69,15,56,200,214 +.byte 15,56,202,231 +.byte 69,15,56,202,222 + pxor %xmm7,%xmm5 +.byte 15,56,201,247 + pxor %xmm14,%xmm12 +.byte 69,15,56,201,238 + movdqa %xmm0,%xmm1 + movdqa %xmm8,%xmm9 +.byte 15,58,204,194,1 +.byte 15,56,200,204 +.byte 69,15,58,204,194,1 +.byte 69,15,56,200,203 +.byte 15,56,202,236 +.byte 69,15,56,202,227 + pxor %xmm4,%xmm6 +.byte 15,56,201,252 + pxor %xmm11,%xmm13 +.byte 69,15,56,201,243 + movdqa %xmm0,%xmm2 + movdqa %xmm8,%xmm10 +.byte 15,58,204,193,1 +.byte 15,56,200,213 +.byte 69,15,58,204,193,1 +.byte 69,15,56,200,212 +.byte 15,56,202,245 +.byte 69,15,56,202,236 + pxor %xmm5,%xmm7 +.byte 15,56,201,229 + pxor %xmm12,%xmm14 +.byte 69,15,56,201,220 + movdqa %xmm0,%xmm1 + movdqa %xmm8,%xmm9 +.byte 15,58,204,194,1 +.byte 15,56,200,206 +.byte 69,15,58,204,194,1 +.byte 69,15,56,200,205 +.byte 15,56,202,254 +.byte 69,15,56,202,245 + pxor %xmm6,%xmm4 +.byte 15,56,201,238 + pxor %xmm13,%xmm11 +.byte 69,15,56,201,229 + movdqa %xmm0,%xmm2 + movdqa %xmm8,%xmm10 +.byte 15,58,204,193,2 +.byte 15,56,200,215 +.byte 69,15,58,204,193,2 +.byte 69,15,56,200,214 +.byte 15,56,202,231 +.byte 69,15,56,202,222 + pxor %xmm7,%xmm5 +.byte 15,56,201,247 + pxor %xmm14,%xmm12 +.byte 69,15,56,201,238 + movdqa %xmm0,%xmm1 + movdqa %xmm8,%xmm9 +.byte 15,58,204,194,2 +.byte 15,56,200,204 +.byte 69,15,58,204,194,2 +.byte 69,15,56,200,203 +.byte 15,56,202,236 +.byte 69,15,56,202,227 + pxor %xmm4,%xmm6 +.byte 15,56,201,252 + pxor %xmm11,%xmm13 +.byte 69,15,56,201,243 + movdqa %xmm0,%xmm2 + movdqa %xmm8,%xmm10 +.byte 15,58,204,193,2 +.byte 15,56,200,213 +.byte 69,15,58,204,193,2 +.byte 69,15,56,200,212 +.byte 15,56,202,245 +.byte 69,15,56,202,236 + pxor %xmm5,%xmm7 +.byte 15,56,201,229 + pxor %xmm12,%xmm14 +.byte 69,15,56,201,220 + movdqa %xmm0,%xmm1 + movdqa %xmm8,%xmm9 +.byte 15,58,204,194,2 +.byte 15,56,200,206 +.byte 69,15,58,204,194,2 +.byte 69,15,56,200,205 +.byte 15,56,202,254 +.byte 69,15,56,202,245 + pxor %xmm6,%xmm4 +.byte 15,56,201,238 + pxor %xmm13,%xmm11 +.byte 69,15,56,201,229 + movdqa %xmm0,%xmm2 + movdqa %xmm8,%xmm10 +.byte 15,58,204,193,2 +.byte 15,56,200,215 +.byte 69,15,58,204,193,2 +.byte 69,15,56,200,214 +.byte 15,56,202,231 +.byte 69,15,56,202,222 + pxor %xmm7,%xmm5 +.byte 15,56,201,247 + pxor %xmm14,%xmm12 +.byte 69,15,56,201,238 + movdqa %xmm0,%xmm1 + movdqa %xmm8,%xmm9 +.byte 15,58,204,194,3 +.byte 15,56,200,204 +.byte 69,15,58,204,194,3 +.byte 69,15,56,200,203 +.byte 15,56,202,236 +.byte 69,15,56,202,227 + pxor %xmm4,%xmm6 +.byte 15,56,201,252 + pxor %xmm11,%xmm13 +.byte 69,15,56,201,243 + movdqa %xmm0,%xmm2 + movdqa %xmm8,%xmm10 +.byte 15,58,204,193,3 +.byte 15,56,200,213 +.byte 69,15,58,204,193,3 
+.byte 69,15,56,200,212 +.byte 15,56,202,245 +.byte 69,15,56,202,236 + pxor %xmm5,%xmm7 + pxor %xmm12,%xmm14 + + movl $1,%ecx + pxor %xmm4,%xmm4 + cmpl 0(%rbx),%ecx + cmovgeq %rsp,%r8 + + movdqa %xmm0,%xmm1 + movdqa %xmm8,%xmm9 +.byte 15,58,204,194,3 +.byte 15,56,200,206 +.byte 69,15,58,204,194,3 +.byte 69,15,56,200,205 +.byte 15,56,202,254 +.byte 69,15,56,202,245 + + cmpl 4(%rbx),%ecx + cmovgeq %rsp,%r9 + movq (%rbx),%xmm6 + + movdqa %xmm0,%xmm2 + movdqa %xmm8,%xmm10 +.byte 15,58,204,193,3 +.byte 15,56,200,215 +.byte 69,15,58,204,193,3 +.byte 69,15,56,200,214 + + pshufd $0x00,%xmm6,%xmm11 + pshufd $0x55,%xmm6,%xmm12 + movdqa %xmm6,%xmm7 + pcmpgtd %xmm4,%xmm11 + pcmpgtd %xmm4,%xmm12 + + movdqa %xmm0,%xmm1 + movdqa %xmm8,%xmm9 +.byte 15,58,204,194,3 +.byte 15,56,200,204 +.byte 69,15,58,204,194,3 +.byte 68,15,56,200,204 + + pcmpgtd %xmm4,%xmm7 + pand %xmm11,%xmm0 + pand %xmm11,%xmm1 + pand %xmm12,%xmm8 + pand %xmm12,%xmm9 + paddd %xmm7,%xmm6 + + paddd 64(%rsp),%xmm0 + paddd 80(%rsp),%xmm1 + paddd 96(%rsp),%xmm8 + paddd 112(%rsp),%xmm9 + + movq %xmm6,(%rbx) + decl %edx + jnz .Loop_shaext + + movl 280(%rsp),%edx + + pshufd $27,%xmm0,%xmm0 + pshufd $27,%xmm8,%xmm8 + + movdqa %xmm0,%xmm6 + punpckldq %xmm8,%xmm0 + punpckhdq %xmm8,%xmm6 + punpckhdq %xmm9,%xmm1 + movq %xmm0,0-64(%rdi) + psrldq $8,%xmm0 + movq %xmm6,64-64(%rdi) + psrldq $8,%xmm6 + movq %xmm0,32-64(%rdi) + psrldq $8,%xmm1 + movq %xmm6,96-64(%rdi) + movq %xmm1,128-64(%rdi) + + leaq 8(%rdi),%rdi + leaq 32(%rsi),%rsi + decl %edx + jnz .Loop_grande_shaext + +.Ldone_shaext: + + movq -16(%rax),%rbp +.cfi_restore %rbp + movq -8(%rax),%rbx +.cfi_restore %rbx + leaq (%rax),%rsp +.cfi_def_cfa_register %rsp +.Lepilogue_shaext: + .byte 0xf3,0xc3 +.cfi_endproc +.size sha1_multi_block_shaext,.-sha1_multi_block_shaext + +.align 256 +.long 0x5a827999,0x5a827999,0x5a827999,0x5a827999 +.long 0x5a827999,0x5a827999,0x5a827999,0x5a827999 +K_XX_XX: +.long 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 +.long 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 +.long 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc +.long 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc +.long 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 +.long 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f +.byte 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0 +.byte 83,72,65,49,32,109,117,108,116,105,45,98,108,111,99,107,32,116,114,97,110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0 diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-x86_64.S new file mode 100644 index 0000000000..0b59726ae4 --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-x86_64.S @@ -0,0 +1,2631 @@ +# WARNING: do not edit! +# Generated from openssl/crypto/sha/asm/sha1-x86_64.pl +# +# Copyright 2006-2020 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the OpenSSL license (the "License"). You may not use +# this file except in compliance with the License. 
You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +.text + + +.globl sha1_block_data_order +.type sha1_block_data_order,@function +.align 16 +sha1_block_data_order: +.cfi_startproc + movl OPENSSL_ia32cap_P+0(%rip),%r9d + movl OPENSSL_ia32cap_P+4(%rip),%r8d + movl OPENSSL_ia32cap_P+8(%rip),%r10d + testl $512,%r8d + jz .Lialu + testl $536870912,%r10d + jnz _shaext_shortcut + jmp _ssse3_shortcut + +.align 16 +.Lialu: + movq %rsp,%rax +.cfi_def_cfa_register %rax + pushq %rbx +.cfi_offset %rbx,-16 + pushq %rbp +.cfi_offset %rbp,-24 + pushq %r12 +.cfi_offset %r12,-32 + pushq %r13 +.cfi_offset %r13,-40 + pushq %r14 +.cfi_offset %r14,-48 + movq %rdi,%r8 + subq $72,%rsp + movq %rsi,%r9 + andq $-64,%rsp + movq %rdx,%r10 + movq %rax,64(%rsp) +.cfi_escape 0x0f,0x06,0x77,0xc0,0x00,0x06,0x23,0x08 +.Lprologue: + + movl 0(%r8),%esi + movl 4(%r8),%edi + movl 8(%r8),%r11d + movl 12(%r8),%r12d + movl 16(%r8),%r13d + jmp .Lloop + +.align 16 +.Lloop: + movl 0(%r9),%edx + bswapl %edx + movl 4(%r9),%ebp + movl %r12d,%eax + movl %edx,0(%rsp) + movl %esi,%ecx + bswapl %ebp + xorl %r11d,%eax + roll $5,%ecx + andl %edi,%eax + leal 1518500249(%rdx,%r13,1),%r13d + addl %ecx,%r13d + xorl %r12d,%eax + roll $30,%edi + addl %eax,%r13d + movl 8(%r9),%r14d + movl %r11d,%eax + movl %ebp,4(%rsp) + movl %r13d,%ecx + bswapl %r14d + xorl %edi,%eax + roll $5,%ecx + andl %esi,%eax + leal 1518500249(%rbp,%r12,1),%r12d + addl %ecx,%r12d + xorl %r11d,%eax + roll $30,%esi + addl %eax,%r12d + movl 12(%r9),%edx + movl %edi,%eax + movl %r14d,8(%rsp) + movl %r12d,%ecx + bswapl %edx + xorl %esi,%eax + roll $5,%ecx + andl %r13d,%eax + leal 1518500249(%r14,%r11,1),%r11d + addl %ecx,%r11d + xorl %edi,%eax + roll $30,%r13d + addl %eax,%r11d + movl 16(%r9),%ebp + movl %esi,%eax + movl %edx,12(%rsp) + movl %r11d,%ecx + bswapl %ebp + xorl %r13d,%eax + roll $5,%ecx + andl %r12d,%eax + leal 1518500249(%rdx,%rdi,1),%edi + addl %ecx,%edi + xorl %esi,%eax + roll $30,%r12d + addl %eax,%edi + movl 20(%r9),%r14d + movl %r13d,%eax + movl %ebp,16(%rsp) + movl %edi,%ecx + bswapl %r14d + xorl %r12d,%eax + roll $5,%ecx + andl %r11d,%eax + leal 1518500249(%rbp,%rsi,1),%esi + addl %ecx,%esi + xorl %r13d,%eax + roll $30,%r11d + addl %eax,%esi + movl 24(%r9),%edx + movl %r12d,%eax + movl %r14d,20(%rsp) + movl %esi,%ecx + bswapl %edx + xorl %r11d,%eax + roll $5,%ecx + andl %edi,%eax + leal 1518500249(%r14,%r13,1),%r13d + addl %ecx,%r13d + xorl %r12d,%eax + roll $30,%edi + addl %eax,%r13d + movl 28(%r9),%ebp + movl %r11d,%eax + movl %edx,24(%rsp) + movl %r13d,%ecx + bswapl %ebp + xorl %edi,%eax + roll $5,%ecx + andl %esi,%eax + leal 1518500249(%rdx,%r12,1),%r12d + addl %ecx,%r12d + xorl %r11d,%eax + roll $30,%esi + addl %eax,%r12d + movl 32(%r9),%r14d + movl %edi,%eax + movl %ebp,28(%rsp) + movl %r12d,%ecx + bswapl %r14d + xorl %esi,%eax + roll $5,%ecx + andl %r13d,%eax + leal 1518500249(%rbp,%r11,1),%r11d + addl %ecx,%r11d + xorl %edi,%eax + roll $30,%r13d + addl %eax,%r11d + movl 36(%r9),%edx + movl %esi,%eax + movl %r14d,32(%rsp) + movl %r11d,%ecx + bswapl %edx + xorl %r13d,%eax + roll $5,%ecx + andl %r12d,%eax + leal 1518500249(%r14,%rdi,1),%edi + addl %ecx,%edi + xorl %esi,%eax + roll $30,%r12d + addl %eax,%edi + movl 40(%r9),%ebp + movl %r13d,%eax + movl %edx,36(%rsp) + movl %edi,%ecx + bswapl %ebp + xorl %r12d,%eax + roll $5,%ecx + andl %r11d,%eax + leal 1518500249(%rdx,%rsi,1),%esi + addl %ecx,%esi + xorl %r13d,%eax + roll $30,%r11d + addl %eax,%esi + movl 44(%r9),%r14d + movl 
%r12d,%eax + movl %ebp,40(%rsp) + movl %esi,%ecx + bswapl %r14d + xorl %r11d,%eax + roll $5,%ecx + andl %edi,%eax + leal 1518500249(%rbp,%r13,1),%r13d + addl %ecx,%r13d + xorl %r12d,%eax + roll $30,%edi + addl %eax,%r13d + movl 48(%r9),%edx + movl %r11d,%eax + movl %r14d,44(%rsp) + movl %r13d,%ecx + bswapl %edx + xorl %edi,%eax + roll $5,%ecx + andl %esi,%eax + leal 1518500249(%r14,%r12,1),%r12d + addl %ecx,%r12d + xorl %r11d,%eax + roll $30,%esi + addl %eax,%r12d + movl 52(%r9),%ebp + movl %edi,%eax + movl %edx,48(%rsp) + movl %r12d,%ecx + bswapl %ebp + xorl %esi,%eax + roll $5,%ecx + andl %r13d,%eax + leal 1518500249(%rdx,%r11,1),%r11d + addl %ecx,%r11d + xorl %edi,%eax + roll $30,%r13d + addl %eax,%r11d + movl 56(%r9),%r14d + movl %esi,%eax + movl %ebp,52(%rsp) + movl %r11d,%ecx + bswapl %r14d + xorl %r13d,%eax + roll $5,%ecx + andl %r12d,%eax + leal 1518500249(%rbp,%rdi,1),%edi + addl %ecx,%edi + xorl %esi,%eax + roll $30,%r12d + addl %eax,%edi + movl 60(%r9),%edx + movl %r13d,%eax + movl %r14d,56(%rsp) + movl %edi,%ecx + bswapl %edx + xorl %r12d,%eax + roll $5,%ecx + andl %r11d,%eax + leal 1518500249(%r14,%rsi,1),%esi + addl %ecx,%esi + xorl %r13d,%eax + roll $30,%r11d + addl %eax,%esi + xorl 0(%rsp),%ebp + movl %r12d,%eax + movl %edx,60(%rsp) + movl %esi,%ecx + xorl 8(%rsp),%ebp + xorl %r11d,%eax + roll $5,%ecx + xorl 32(%rsp),%ebp + andl %edi,%eax + leal 1518500249(%rdx,%r13,1),%r13d + roll $30,%edi + xorl %r12d,%eax + addl %ecx,%r13d + roll $1,%ebp + addl %eax,%r13d + xorl 4(%rsp),%r14d + movl %r11d,%eax + movl %ebp,0(%rsp) + movl %r13d,%ecx + xorl 12(%rsp),%r14d + xorl %edi,%eax + roll $5,%ecx + xorl 36(%rsp),%r14d + andl %esi,%eax + leal 1518500249(%rbp,%r12,1),%r12d + roll $30,%esi + xorl %r11d,%eax + addl %ecx,%r12d + roll $1,%r14d + addl %eax,%r12d + xorl 8(%rsp),%edx + movl %edi,%eax + movl %r14d,4(%rsp) + movl %r12d,%ecx + xorl 16(%rsp),%edx + xorl %esi,%eax + roll $5,%ecx + xorl 40(%rsp),%edx + andl %r13d,%eax + leal 1518500249(%r14,%r11,1),%r11d + roll $30,%r13d + xorl %edi,%eax + addl %ecx,%r11d + roll $1,%edx + addl %eax,%r11d + xorl 12(%rsp),%ebp + movl %esi,%eax + movl %edx,8(%rsp) + movl %r11d,%ecx + xorl 20(%rsp),%ebp + xorl %r13d,%eax + roll $5,%ecx + xorl 44(%rsp),%ebp + andl %r12d,%eax + leal 1518500249(%rdx,%rdi,1),%edi + roll $30,%r12d + xorl %esi,%eax + addl %ecx,%edi + roll $1,%ebp + addl %eax,%edi + xorl 16(%rsp),%r14d + movl %r13d,%eax + movl %ebp,12(%rsp) + movl %edi,%ecx + xorl 24(%rsp),%r14d + xorl %r12d,%eax + roll $5,%ecx + xorl 48(%rsp),%r14d + andl %r11d,%eax + leal 1518500249(%rbp,%rsi,1),%esi + roll $30,%r11d + xorl %r13d,%eax + addl %ecx,%esi + roll $1,%r14d + addl %eax,%esi + xorl 20(%rsp),%edx + movl %edi,%eax + movl %r14d,16(%rsp) + movl %esi,%ecx + xorl 28(%rsp),%edx + xorl %r12d,%eax + roll $5,%ecx + xorl 52(%rsp),%edx + leal 1859775393(%r14,%r13,1),%r13d + xorl %r11d,%eax + addl %ecx,%r13d + roll $30,%edi + addl %eax,%r13d + roll $1,%edx + xorl 24(%rsp),%ebp + movl %esi,%eax + movl %edx,20(%rsp) + movl %r13d,%ecx + xorl 32(%rsp),%ebp + xorl %r11d,%eax + roll $5,%ecx + xorl 56(%rsp),%ebp + leal 1859775393(%rdx,%r12,1),%r12d + xorl %edi,%eax + addl %ecx,%r12d + roll $30,%esi + addl %eax,%r12d + roll $1,%ebp + xorl 28(%rsp),%r14d + movl %r13d,%eax + movl %ebp,24(%rsp) + movl %r12d,%ecx + xorl 36(%rsp),%r14d + xorl %edi,%eax + roll $5,%ecx + xorl 60(%rsp),%r14d + leal 1859775393(%rbp,%r11,1),%r11d + xorl %esi,%eax + addl %ecx,%r11d + roll $30,%r13d + addl %eax,%r11d + roll $1,%r14d + xorl 32(%rsp),%edx + movl %r12d,%eax + movl %r14d,28(%rsp) + 
movl %r11d,%ecx + xorl 40(%rsp),%edx + xorl %esi,%eax + roll $5,%ecx + xorl 0(%rsp),%edx + leal 1859775393(%r14,%rdi,1),%edi + xorl %r13d,%eax + addl %ecx,%edi + roll $30,%r12d + addl %eax,%edi + roll $1,%edx + xorl 36(%rsp),%ebp + movl %r11d,%eax + movl %edx,32(%rsp) + movl %edi,%ecx + xorl 44(%rsp),%ebp + xorl %r13d,%eax + roll $5,%ecx + xorl 4(%rsp),%ebp + leal 1859775393(%rdx,%rsi,1),%esi + xorl %r12d,%eax + addl %ecx,%esi + roll $30,%r11d + addl %eax,%esi + roll $1,%ebp + xorl 40(%rsp),%r14d + movl %edi,%eax + movl %ebp,36(%rsp) + movl %esi,%ecx + xorl 48(%rsp),%r14d + xorl %r12d,%eax + roll $5,%ecx + xorl 8(%rsp),%r14d + leal 1859775393(%rbp,%r13,1),%r13d + xorl %r11d,%eax + addl %ecx,%r13d + roll $30,%edi + addl %eax,%r13d + roll $1,%r14d + xorl 44(%rsp),%edx + movl %esi,%eax + movl %r14d,40(%rsp) + movl %r13d,%ecx + xorl 52(%rsp),%edx + xorl %r11d,%eax + roll $5,%ecx + xorl 12(%rsp),%edx + leal 1859775393(%r14,%r12,1),%r12d + xorl %edi,%eax + addl %ecx,%r12d + roll $30,%esi + addl %eax,%r12d + roll $1,%edx + xorl 48(%rsp),%ebp + movl %r13d,%eax + movl %edx,44(%rsp) + movl %r12d,%ecx + xorl 56(%rsp),%ebp + xorl %edi,%eax + roll $5,%ecx + xorl 16(%rsp),%ebp + leal 1859775393(%rdx,%r11,1),%r11d + xorl %esi,%eax + addl %ecx,%r11d + roll $30,%r13d + addl %eax,%r11d + roll $1,%ebp + xorl 52(%rsp),%r14d + movl %r12d,%eax + movl %ebp,48(%rsp) + movl %r11d,%ecx + xorl 60(%rsp),%r14d + xorl %esi,%eax + roll $5,%ecx + xorl 20(%rsp),%r14d + leal 1859775393(%rbp,%rdi,1),%edi + xorl %r13d,%eax + addl %ecx,%edi + roll $30,%r12d + addl %eax,%edi + roll $1,%r14d + xorl 56(%rsp),%edx + movl %r11d,%eax + movl %r14d,52(%rsp) + movl %edi,%ecx + xorl 0(%rsp),%edx + xorl %r13d,%eax + roll $5,%ecx + xorl 24(%rsp),%edx + leal 1859775393(%r14,%rsi,1),%esi + xorl %r12d,%eax + addl %ecx,%esi + roll $30,%r11d + addl %eax,%esi + roll $1,%edx + xorl 60(%rsp),%ebp + movl %edi,%eax + movl %edx,56(%rsp) + movl %esi,%ecx + xorl 4(%rsp),%ebp + xorl %r12d,%eax + roll $5,%ecx + xorl 28(%rsp),%ebp + leal 1859775393(%rdx,%r13,1),%r13d + xorl %r11d,%eax + addl %ecx,%r13d + roll $30,%edi + addl %eax,%r13d + roll $1,%ebp + xorl 0(%rsp),%r14d + movl %esi,%eax + movl %ebp,60(%rsp) + movl %r13d,%ecx + xorl 8(%rsp),%r14d + xorl %r11d,%eax + roll $5,%ecx + xorl 32(%rsp),%r14d + leal 1859775393(%rbp,%r12,1),%r12d + xorl %edi,%eax + addl %ecx,%r12d + roll $30,%esi + addl %eax,%r12d + roll $1,%r14d + xorl 4(%rsp),%edx + movl %r13d,%eax + movl %r14d,0(%rsp) + movl %r12d,%ecx + xorl 12(%rsp),%edx + xorl %edi,%eax + roll $5,%ecx + xorl 36(%rsp),%edx + leal 1859775393(%r14,%r11,1),%r11d + xorl %esi,%eax + addl %ecx,%r11d + roll $30,%r13d + addl %eax,%r11d + roll $1,%edx + xorl 8(%rsp),%ebp + movl %r12d,%eax + movl %edx,4(%rsp) + movl %r11d,%ecx + xorl 16(%rsp),%ebp + xorl %esi,%eax + roll $5,%ecx + xorl 40(%rsp),%ebp + leal 1859775393(%rdx,%rdi,1),%edi + xorl %r13d,%eax + addl %ecx,%edi + roll $30,%r12d + addl %eax,%edi + roll $1,%ebp + xorl 12(%rsp),%r14d + movl %r11d,%eax + movl %ebp,8(%rsp) + movl %edi,%ecx + xorl 20(%rsp),%r14d + xorl %r13d,%eax + roll $5,%ecx + xorl 44(%rsp),%r14d + leal 1859775393(%rbp,%rsi,1),%esi + xorl %r12d,%eax + addl %ecx,%esi + roll $30,%r11d + addl %eax,%esi + roll $1,%r14d + xorl 16(%rsp),%edx + movl %edi,%eax + movl %r14d,12(%rsp) + movl %esi,%ecx + xorl 24(%rsp),%edx + xorl %r12d,%eax + roll $5,%ecx + xorl 48(%rsp),%edx + leal 1859775393(%r14,%r13,1),%r13d + xorl %r11d,%eax + addl %ecx,%r13d + roll $30,%edi + addl %eax,%r13d + roll $1,%edx + xorl 20(%rsp),%ebp + movl %esi,%eax + movl %edx,16(%rsp) + 
movl %r13d,%ecx + xorl 28(%rsp),%ebp + xorl %r11d,%eax + roll $5,%ecx + xorl 52(%rsp),%ebp + leal 1859775393(%rdx,%r12,1),%r12d + xorl %edi,%eax + addl %ecx,%r12d + roll $30,%esi + addl %eax,%r12d + roll $1,%ebp + xorl 24(%rsp),%r14d + movl %r13d,%eax + movl %ebp,20(%rsp) + movl %r12d,%ecx + xorl 32(%rsp),%r14d + xorl %edi,%eax + roll $5,%ecx + xorl 56(%rsp),%r14d + leal 1859775393(%rbp,%r11,1),%r11d + xorl %esi,%eax + addl %ecx,%r11d + roll $30,%r13d + addl %eax,%r11d + roll $1,%r14d + xorl 28(%rsp),%edx + movl %r12d,%eax + movl %r14d,24(%rsp) + movl %r11d,%ecx + xorl 36(%rsp),%edx + xorl %esi,%eax + roll $5,%ecx + xorl 60(%rsp),%edx + leal 1859775393(%r14,%rdi,1),%edi + xorl %r13d,%eax + addl %ecx,%edi + roll $30,%r12d + addl %eax,%edi + roll $1,%edx + xorl 32(%rsp),%ebp + movl %r11d,%eax + movl %edx,28(%rsp) + movl %edi,%ecx + xorl 40(%rsp),%ebp + xorl %r13d,%eax + roll $5,%ecx + xorl 0(%rsp),%ebp + leal 1859775393(%rdx,%rsi,1),%esi + xorl %r12d,%eax + addl %ecx,%esi + roll $30,%r11d + addl %eax,%esi + roll $1,%ebp + xorl 36(%rsp),%r14d + movl %r12d,%eax + movl %ebp,32(%rsp) + movl %r12d,%ebx + xorl 44(%rsp),%r14d + andl %r11d,%eax + movl %esi,%ecx + xorl 4(%rsp),%r14d + leal -1894007588(%rbp,%r13,1),%r13d + xorl %r11d,%ebx + roll $5,%ecx + addl %eax,%r13d + roll $1,%r14d + andl %edi,%ebx + addl %ecx,%r13d + roll $30,%edi + addl %ebx,%r13d + xorl 40(%rsp),%edx + movl %r11d,%eax + movl %r14d,36(%rsp) + movl %r11d,%ebx + xorl 48(%rsp),%edx + andl %edi,%eax + movl %r13d,%ecx + xorl 8(%rsp),%edx + leal -1894007588(%r14,%r12,1),%r12d + xorl %edi,%ebx + roll $5,%ecx + addl %eax,%r12d + roll $1,%edx + andl %esi,%ebx + addl %ecx,%r12d + roll $30,%esi + addl %ebx,%r12d + xorl 44(%rsp),%ebp + movl %edi,%eax + movl %edx,40(%rsp) + movl %edi,%ebx + xorl 52(%rsp),%ebp + andl %esi,%eax + movl %r12d,%ecx + xorl 12(%rsp),%ebp + leal -1894007588(%rdx,%r11,1),%r11d + xorl %esi,%ebx + roll $5,%ecx + addl %eax,%r11d + roll $1,%ebp + andl %r13d,%ebx + addl %ecx,%r11d + roll $30,%r13d + addl %ebx,%r11d + xorl 48(%rsp),%r14d + movl %esi,%eax + movl %ebp,44(%rsp) + movl %esi,%ebx + xorl 56(%rsp),%r14d + andl %r13d,%eax + movl %r11d,%ecx + xorl 16(%rsp),%r14d + leal -1894007588(%rbp,%rdi,1),%edi + xorl %r13d,%ebx + roll $5,%ecx + addl %eax,%edi + roll $1,%r14d + andl %r12d,%ebx + addl %ecx,%edi + roll $30,%r12d + addl %ebx,%edi + xorl 52(%rsp),%edx + movl %r13d,%eax + movl %r14d,48(%rsp) + movl %r13d,%ebx + xorl 60(%rsp),%edx + andl %r12d,%eax + movl %edi,%ecx + xorl 20(%rsp),%edx + leal -1894007588(%r14,%rsi,1),%esi + xorl %r12d,%ebx + roll $5,%ecx + addl %eax,%esi + roll $1,%edx + andl %r11d,%ebx + addl %ecx,%esi + roll $30,%r11d + addl %ebx,%esi + xorl 56(%rsp),%ebp + movl %r12d,%eax + movl %edx,52(%rsp) + movl %r12d,%ebx + xorl 0(%rsp),%ebp + andl %r11d,%eax + movl %esi,%ecx + xorl 24(%rsp),%ebp + leal -1894007588(%rdx,%r13,1),%r13d + xorl %r11d,%ebx + roll $5,%ecx + addl %eax,%r13d + roll $1,%ebp + andl %edi,%ebx + addl %ecx,%r13d + roll $30,%edi + addl %ebx,%r13d + xorl 60(%rsp),%r14d + movl %r11d,%eax + movl %ebp,56(%rsp) + movl %r11d,%ebx + xorl 4(%rsp),%r14d + andl %edi,%eax + movl %r13d,%ecx + xorl 28(%rsp),%r14d + leal -1894007588(%rbp,%r12,1),%r12d + xorl %edi,%ebx + roll $5,%ecx + addl %eax,%r12d + roll $1,%r14d + andl %esi,%ebx + addl %ecx,%r12d + roll $30,%esi + addl %ebx,%r12d + xorl 0(%rsp),%edx + movl %edi,%eax + movl %r14d,60(%rsp) + movl %edi,%ebx + xorl 8(%rsp),%edx + andl %esi,%eax + movl %r12d,%ecx + xorl 32(%rsp),%edx + leal -1894007588(%r14,%r11,1),%r11d + xorl %esi,%ebx + roll $5,%ecx 
+ addl %eax,%r11d + roll $1,%edx + andl %r13d,%ebx + addl %ecx,%r11d + roll $30,%r13d + addl %ebx,%r11d + xorl 4(%rsp),%ebp + movl %esi,%eax + movl %edx,0(%rsp) + movl %esi,%ebx + xorl 12(%rsp),%ebp + andl %r13d,%eax + movl %r11d,%ecx + xorl 36(%rsp),%ebp + leal -1894007588(%rdx,%rdi,1),%edi + xorl %r13d,%ebx + roll $5,%ecx + addl %eax,%edi + roll $1,%ebp + andl %r12d,%ebx + addl %ecx,%edi + roll $30,%r12d + addl %ebx,%edi + xorl 8(%rsp),%r14d + movl %r13d,%eax + movl %ebp,4(%rsp) + movl %r13d,%ebx + xorl 16(%rsp),%r14d + andl %r12d,%eax + movl %edi,%ecx + xorl 40(%rsp),%r14d + leal -1894007588(%rbp,%rsi,1),%esi + xorl %r12d,%ebx + roll $5,%ecx + addl %eax,%esi + roll $1,%r14d + andl %r11d,%ebx + addl %ecx,%esi + roll $30,%r11d + addl %ebx,%esi + xorl 12(%rsp),%edx + movl %r12d,%eax + movl %r14d,8(%rsp) + movl %r12d,%ebx + xorl 20(%rsp),%edx + andl %r11d,%eax + movl %esi,%ecx + xorl 44(%rsp),%edx + leal -1894007588(%r14,%r13,1),%r13d + xorl %r11d,%ebx + roll $5,%ecx + addl %eax,%r13d + roll $1,%edx + andl %edi,%ebx + addl %ecx,%r13d + roll $30,%edi + addl %ebx,%r13d + xorl 16(%rsp),%ebp + movl %r11d,%eax + movl %edx,12(%rsp) + movl %r11d,%ebx + xorl 24(%rsp),%ebp + andl %edi,%eax + movl %r13d,%ecx + xorl 48(%rsp),%ebp + leal -1894007588(%rdx,%r12,1),%r12d + xorl %edi,%ebx + roll $5,%ecx + addl %eax,%r12d + roll $1,%ebp + andl %esi,%ebx + addl %ecx,%r12d + roll $30,%esi + addl %ebx,%r12d + xorl 20(%rsp),%r14d + movl %edi,%eax + movl %ebp,16(%rsp) + movl %edi,%ebx + xorl 28(%rsp),%r14d + andl %esi,%eax + movl %r12d,%ecx + xorl 52(%rsp),%r14d + leal -1894007588(%rbp,%r11,1),%r11d + xorl %esi,%ebx + roll $5,%ecx + addl %eax,%r11d + roll $1,%r14d + andl %r13d,%ebx + addl %ecx,%r11d + roll $30,%r13d + addl %ebx,%r11d + xorl 24(%rsp),%edx + movl %esi,%eax + movl %r14d,20(%rsp) + movl %esi,%ebx + xorl 32(%rsp),%edx + andl %r13d,%eax + movl %r11d,%ecx + xorl 56(%rsp),%edx + leal -1894007588(%r14,%rdi,1),%edi + xorl %r13d,%ebx + roll $5,%ecx + addl %eax,%edi + roll $1,%edx + andl %r12d,%ebx + addl %ecx,%edi + roll $30,%r12d + addl %ebx,%edi + xorl 28(%rsp),%ebp + movl %r13d,%eax + movl %edx,24(%rsp) + movl %r13d,%ebx + xorl 36(%rsp),%ebp + andl %r12d,%eax + movl %edi,%ecx + xorl 60(%rsp),%ebp + leal -1894007588(%rdx,%rsi,1),%esi + xorl %r12d,%ebx + roll $5,%ecx + addl %eax,%esi + roll $1,%ebp + andl %r11d,%ebx + addl %ecx,%esi + roll $30,%r11d + addl %ebx,%esi + xorl 32(%rsp),%r14d + movl %r12d,%eax + movl %ebp,28(%rsp) + movl %r12d,%ebx + xorl 40(%rsp),%r14d + andl %r11d,%eax + movl %esi,%ecx + xorl 0(%rsp),%r14d + leal -1894007588(%rbp,%r13,1),%r13d + xorl %r11d,%ebx + roll $5,%ecx + addl %eax,%r13d + roll $1,%r14d + andl %edi,%ebx + addl %ecx,%r13d + roll $30,%edi + addl %ebx,%r13d + xorl 36(%rsp),%edx + movl %r11d,%eax + movl %r14d,32(%rsp) + movl %r11d,%ebx + xorl 44(%rsp),%edx + andl %edi,%eax + movl %r13d,%ecx + xorl 4(%rsp),%edx + leal -1894007588(%r14,%r12,1),%r12d + xorl %edi,%ebx + roll $5,%ecx + addl %eax,%r12d + roll $1,%edx + andl %esi,%ebx + addl %ecx,%r12d + roll $30,%esi + addl %ebx,%r12d + xorl 40(%rsp),%ebp + movl %edi,%eax + movl %edx,36(%rsp) + movl %edi,%ebx + xorl 48(%rsp),%ebp + andl %esi,%eax + movl %r12d,%ecx + xorl 8(%rsp),%ebp + leal -1894007588(%rdx,%r11,1),%r11d + xorl %esi,%ebx + roll $5,%ecx + addl %eax,%r11d + roll $1,%ebp + andl %r13d,%ebx + addl %ecx,%r11d + roll $30,%r13d + addl %ebx,%r11d + xorl 44(%rsp),%r14d + movl %esi,%eax + movl %ebp,40(%rsp) + movl %esi,%ebx + xorl 52(%rsp),%r14d + andl %r13d,%eax + movl %r11d,%ecx + xorl 12(%rsp),%r14d + leal 
-1894007588(%rbp,%rdi,1),%edi + xorl %r13d,%ebx + roll $5,%ecx + addl %eax,%edi + roll $1,%r14d + andl %r12d,%ebx + addl %ecx,%edi + roll $30,%r12d + addl %ebx,%edi + xorl 48(%rsp),%edx + movl %r13d,%eax + movl %r14d,44(%rsp) + movl %r13d,%ebx + xorl 56(%rsp),%edx + andl %r12d,%eax + movl %edi,%ecx + xorl 16(%rsp),%edx + leal -1894007588(%r14,%rsi,1),%esi + xorl %r12d,%ebx + roll $5,%ecx + addl %eax,%esi + roll $1,%edx + andl %r11d,%ebx + addl %ecx,%esi + roll $30,%r11d + addl %ebx,%esi + xorl 52(%rsp),%ebp + movl %edi,%eax + movl %edx,48(%rsp) + movl %esi,%ecx + xorl 60(%rsp),%ebp + xorl %r12d,%eax + roll $5,%ecx + xorl 20(%rsp),%ebp + leal -899497514(%rdx,%r13,1),%r13d + xorl %r11d,%eax + addl %ecx,%r13d + roll $30,%edi + addl %eax,%r13d + roll $1,%ebp + xorl 56(%rsp),%r14d + movl %esi,%eax + movl %ebp,52(%rsp) + movl %r13d,%ecx + xorl 0(%rsp),%r14d + xorl %r11d,%eax + roll $5,%ecx + xorl 24(%rsp),%r14d + leal -899497514(%rbp,%r12,1),%r12d + xorl %edi,%eax + addl %ecx,%r12d + roll $30,%esi + addl %eax,%r12d + roll $1,%r14d + xorl 60(%rsp),%edx + movl %r13d,%eax + movl %r14d,56(%rsp) + movl %r12d,%ecx + xorl 4(%rsp),%edx + xorl %edi,%eax + roll $5,%ecx + xorl 28(%rsp),%edx + leal -899497514(%r14,%r11,1),%r11d + xorl %esi,%eax + addl %ecx,%r11d + roll $30,%r13d + addl %eax,%r11d + roll $1,%edx + xorl 0(%rsp),%ebp + movl %r12d,%eax + movl %edx,60(%rsp) + movl %r11d,%ecx + xorl 8(%rsp),%ebp + xorl %esi,%eax + roll $5,%ecx + xorl 32(%rsp),%ebp + leal -899497514(%rdx,%rdi,1),%edi + xorl %r13d,%eax + addl %ecx,%edi + roll $30,%r12d + addl %eax,%edi + roll $1,%ebp + xorl 4(%rsp),%r14d + movl %r11d,%eax + movl %ebp,0(%rsp) + movl %edi,%ecx + xorl 12(%rsp),%r14d + xorl %r13d,%eax + roll $5,%ecx + xorl 36(%rsp),%r14d + leal -899497514(%rbp,%rsi,1),%esi + xorl %r12d,%eax + addl %ecx,%esi + roll $30,%r11d + addl %eax,%esi + roll $1,%r14d + xorl 8(%rsp),%edx + movl %edi,%eax + movl %r14d,4(%rsp) + movl %esi,%ecx + xorl 16(%rsp),%edx + xorl %r12d,%eax + roll $5,%ecx + xorl 40(%rsp),%edx + leal -899497514(%r14,%r13,1),%r13d + xorl %r11d,%eax + addl %ecx,%r13d + roll $30,%edi + addl %eax,%r13d + roll $1,%edx + xorl 12(%rsp),%ebp + movl %esi,%eax + movl %edx,8(%rsp) + movl %r13d,%ecx + xorl 20(%rsp),%ebp + xorl %r11d,%eax + roll $5,%ecx + xorl 44(%rsp),%ebp + leal -899497514(%rdx,%r12,1),%r12d + xorl %edi,%eax + addl %ecx,%r12d + roll $30,%esi + addl %eax,%r12d + roll $1,%ebp + xorl 16(%rsp),%r14d + movl %r13d,%eax + movl %ebp,12(%rsp) + movl %r12d,%ecx + xorl 24(%rsp),%r14d + xorl %edi,%eax + roll $5,%ecx + xorl 48(%rsp),%r14d + leal -899497514(%rbp,%r11,1),%r11d + xorl %esi,%eax + addl %ecx,%r11d + roll $30,%r13d + addl %eax,%r11d + roll $1,%r14d + xorl 20(%rsp),%edx + movl %r12d,%eax + movl %r14d,16(%rsp) + movl %r11d,%ecx + xorl 28(%rsp),%edx + xorl %esi,%eax + roll $5,%ecx + xorl 52(%rsp),%edx + leal -899497514(%r14,%rdi,1),%edi + xorl %r13d,%eax + addl %ecx,%edi + roll $30,%r12d + addl %eax,%edi + roll $1,%edx + xorl 24(%rsp),%ebp + movl %r11d,%eax + movl %edx,20(%rsp) + movl %edi,%ecx + xorl 32(%rsp),%ebp + xorl %r13d,%eax + roll $5,%ecx + xorl 56(%rsp),%ebp + leal -899497514(%rdx,%rsi,1),%esi + xorl %r12d,%eax + addl %ecx,%esi + roll $30,%r11d + addl %eax,%esi + roll $1,%ebp + xorl 28(%rsp),%r14d + movl %edi,%eax + movl %ebp,24(%rsp) + movl %esi,%ecx + xorl 36(%rsp),%r14d + xorl %r12d,%eax + roll $5,%ecx + xorl 60(%rsp),%r14d + leal -899497514(%rbp,%r13,1),%r13d + xorl %r11d,%eax + addl %ecx,%r13d + roll $30,%edi + addl %eax,%r13d + roll $1,%r14d + xorl 32(%rsp),%edx + movl %esi,%eax + movl 
%r14d,28(%rsp) + movl %r13d,%ecx + xorl 40(%rsp),%edx + xorl %r11d,%eax + roll $5,%ecx + xorl 0(%rsp),%edx + leal -899497514(%r14,%r12,1),%r12d + xorl %edi,%eax + addl %ecx,%r12d + roll $30,%esi + addl %eax,%r12d + roll $1,%edx + xorl 36(%rsp),%ebp + movl %r13d,%eax + + movl %r12d,%ecx + xorl 44(%rsp),%ebp + xorl %edi,%eax + roll $5,%ecx + xorl 4(%rsp),%ebp + leal -899497514(%rdx,%r11,1),%r11d + xorl %esi,%eax + addl %ecx,%r11d + roll $30,%r13d + addl %eax,%r11d + roll $1,%ebp + xorl 40(%rsp),%r14d + movl %r12d,%eax + + movl %r11d,%ecx + xorl 48(%rsp),%r14d + xorl %esi,%eax + roll $5,%ecx + xorl 8(%rsp),%r14d + leal -899497514(%rbp,%rdi,1),%edi + xorl %r13d,%eax + addl %ecx,%edi + roll $30,%r12d + addl %eax,%edi + roll $1,%r14d + xorl 44(%rsp),%edx + movl %r11d,%eax + + movl %edi,%ecx + xorl 52(%rsp),%edx + xorl %r13d,%eax + roll $5,%ecx + xorl 12(%rsp),%edx + leal -899497514(%r14,%rsi,1),%esi + xorl %r12d,%eax + addl %ecx,%esi + roll $30,%r11d + addl %eax,%esi + roll $1,%edx + xorl 48(%rsp),%ebp + movl %edi,%eax + + movl %esi,%ecx + xorl 56(%rsp),%ebp + xorl %r12d,%eax + roll $5,%ecx + xorl 16(%rsp),%ebp + leal -899497514(%rdx,%r13,1),%r13d + xorl %r11d,%eax + addl %ecx,%r13d + roll $30,%edi + addl %eax,%r13d + roll $1,%ebp + xorl 52(%rsp),%r14d + movl %esi,%eax + + movl %r13d,%ecx + xorl 60(%rsp),%r14d + xorl %r11d,%eax + roll $5,%ecx + xorl 20(%rsp),%r14d + leal -899497514(%rbp,%r12,1),%r12d + xorl %edi,%eax + addl %ecx,%r12d + roll $30,%esi + addl %eax,%r12d + roll $1,%r14d + xorl 56(%rsp),%edx + movl %r13d,%eax + + movl %r12d,%ecx + xorl 0(%rsp),%edx + xorl %edi,%eax + roll $5,%ecx + xorl 24(%rsp),%edx + leal -899497514(%r14,%r11,1),%r11d + xorl %esi,%eax + addl %ecx,%r11d + roll $30,%r13d + addl %eax,%r11d + roll $1,%edx + xorl 60(%rsp),%ebp + movl %r12d,%eax + + movl %r11d,%ecx + xorl 4(%rsp),%ebp + xorl %esi,%eax + roll $5,%ecx + xorl 28(%rsp),%ebp + leal -899497514(%rdx,%rdi,1),%edi + xorl %r13d,%eax + addl %ecx,%edi + roll $30,%r12d + addl %eax,%edi + roll $1,%ebp + movl %r11d,%eax + movl %edi,%ecx + xorl %r13d,%eax + leal -899497514(%rbp,%rsi,1),%esi + roll $5,%ecx + xorl %r12d,%eax + addl %ecx,%esi + roll $30,%r11d + addl %eax,%esi + addl 0(%r8),%esi + addl 4(%r8),%edi + addl 8(%r8),%r11d + addl 12(%r8),%r12d + addl 16(%r8),%r13d + movl %esi,0(%r8) + movl %edi,4(%r8) + movl %r11d,8(%r8) + movl %r12d,12(%r8) + movl %r13d,16(%r8) + + subq $1,%r10 + leaq 64(%r9),%r9 + jnz .Lloop + + movq 64(%rsp),%rsi +.cfi_def_cfa %rsi,8 + movq -40(%rsi),%r14 +.cfi_restore %r14 + movq -32(%rsi),%r13 +.cfi_restore %r13 + movq -24(%rsi),%r12 +.cfi_restore %r12 + movq -16(%rsi),%rbp +.cfi_restore %rbp + movq -8(%rsi),%rbx +.cfi_restore %rbx + leaq (%rsi),%rsp +.cfi_def_cfa_register %rsp +.Lepilogue: + .byte 0xf3,0xc3 +.cfi_endproc +.size sha1_block_data_order,.-sha1_block_data_order +.type sha1_block_data_order_shaext,@function +.align 32 +sha1_block_data_order_shaext: +_shaext_shortcut: +.cfi_startproc + movdqu (%rdi),%xmm0 + movd 16(%rdi),%xmm1 + movdqa K_XX_XX+160(%rip),%xmm3 + + movdqu (%rsi),%xmm4 + pshufd $27,%xmm0,%xmm0 + movdqu 16(%rsi),%xmm5 + pshufd $27,%xmm1,%xmm1 + movdqu 32(%rsi),%xmm6 +.byte 102,15,56,0,227 + movdqu 48(%rsi),%xmm7 +.byte 102,15,56,0,235 +.byte 102,15,56,0,243 + movdqa %xmm1,%xmm9 +.byte 102,15,56,0,251 + jmp .Loop_shaext + +.align 16 +.Loop_shaext: + decq %rdx + leaq 64(%rsi),%r8 + paddd %xmm4,%xmm1 + cmovneq %r8,%rsi + movdqa %xmm0,%xmm8 +.byte 15,56,201,229 + movdqa %xmm0,%xmm2 +.byte 15,58,204,193,0 +.byte 15,56,200,213 + pxor %xmm6,%xmm4 +.byte 15,56,201,238 +.byte 
15,56,202,231 + + movdqa %xmm0,%xmm1 +.byte 15,58,204,194,0 +.byte 15,56,200,206 + pxor %xmm7,%xmm5 +.byte 15,56,202,236 +.byte 15,56,201,247 + movdqa %xmm0,%xmm2 +.byte 15,58,204,193,0 +.byte 15,56,200,215 + pxor %xmm4,%xmm6 +.byte 15,56,201,252 +.byte 15,56,202,245 + + movdqa %xmm0,%xmm1 +.byte 15,58,204,194,0 +.byte 15,56,200,204 + pxor %xmm5,%xmm7 +.byte 15,56,202,254 +.byte 15,56,201,229 + movdqa %xmm0,%xmm2 +.byte 15,58,204,193,0 +.byte 15,56,200,213 + pxor %xmm6,%xmm4 +.byte 15,56,201,238 +.byte 15,56,202,231 + + movdqa %xmm0,%xmm1 +.byte 15,58,204,194,1 +.byte 15,56,200,206 + pxor %xmm7,%xmm5 +.byte 15,56,202,236 +.byte 15,56,201,247 + movdqa %xmm0,%xmm2 +.byte 15,58,204,193,1 +.byte 15,56,200,215 + pxor %xmm4,%xmm6 +.byte 15,56,201,252 +.byte 15,56,202,245 + + movdqa %xmm0,%xmm1 +.byte 15,58,204,194,1 +.byte 15,56,200,204 + pxor %xmm5,%xmm7 +.byte 15,56,202,254 +.byte 15,56,201,229 + movdqa %xmm0,%xmm2 +.byte 15,58,204,193,1 +.byte 15,56,200,213 + pxor %xmm6,%xmm4 +.byte 15,56,201,238 +.byte 15,56,202,231 + + movdqa %xmm0,%xmm1 +.byte 15,58,204,194,1 +.byte 15,56,200,206 + pxor %xmm7,%xmm5 +.byte 15,56,202,236 +.byte 15,56,201,247 + movdqa %xmm0,%xmm2 +.byte 15,58,204,193,2 +.byte 15,56,200,215 + pxor %xmm4,%xmm6 +.byte 15,56,201,252 +.byte 15,56,202,245 + + movdqa %xmm0,%xmm1 +.byte 15,58,204,194,2 +.byte 15,56,200,204 + pxor %xmm5,%xmm7 +.byte 15,56,202,254 +.byte 15,56,201,229 + movdqa %xmm0,%xmm2 +.byte 15,58,204,193,2 +.byte 15,56,200,213 + pxor %xmm6,%xmm4 +.byte 15,56,201,238 +.byte 15,56,202,231 + + movdqa %xmm0,%xmm1 +.byte 15,58,204,194,2 +.byte 15,56,200,206 + pxor %xmm7,%xmm5 +.byte 15,56,202,236 +.byte 15,56,201,247 + movdqa %xmm0,%xmm2 +.byte 15,58,204,193,2 +.byte 15,56,200,215 + pxor %xmm4,%xmm6 +.byte 15,56,201,252 +.byte 15,56,202,245 + + movdqa %xmm0,%xmm1 +.byte 15,58,204,194,3 +.byte 15,56,200,204 + pxor %xmm5,%xmm7 +.byte 15,56,202,254 + movdqu (%rsi),%xmm4 + movdqa %xmm0,%xmm2 +.byte 15,58,204,193,3 +.byte 15,56,200,213 + movdqu 16(%rsi),%xmm5 +.byte 102,15,56,0,227 + + movdqa %xmm0,%xmm1 +.byte 15,58,204,194,3 +.byte 15,56,200,206 + movdqu 32(%rsi),%xmm6 +.byte 102,15,56,0,235 + + movdqa %xmm0,%xmm2 +.byte 15,58,204,193,3 +.byte 15,56,200,215 + movdqu 48(%rsi),%xmm7 +.byte 102,15,56,0,243 + + movdqa %xmm0,%xmm1 +.byte 15,58,204,194,3 +.byte 65,15,56,200,201 +.byte 102,15,56,0,251 + + paddd %xmm8,%xmm0 + movdqa %xmm1,%xmm9 + + jnz .Loop_shaext + + pshufd $27,%xmm0,%xmm0 + pshufd $27,%xmm1,%xmm1 + movdqu %xmm0,(%rdi) + movd %xmm1,16(%rdi) + .byte 0xf3,0xc3 +.cfi_endproc +.size sha1_block_data_order_shaext,.-sha1_block_data_order_shaext +.type sha1_block_data_order_ssse3,@function +.align 16 +sha1_block_data_order_ssse3: +_ssse3_shortcut: +.cfi_startproc + movq %rsp,%r11 +.cfi_def_cfa_register %r11 + pushq %rbx +.cfi_offset %rbx,-16 + pushq %rbp +.cfi_offset %rbp,-24 + pushq %r12 +.cfi_offset %r12,-32 + pushq %r13 +.cfi_offset %r13,-40 + pushq %r14 +.cfi_offset %r14,-48 + leaq -64(%rsp),%rsp + andq $-64,%rsp + movq %rdi,%r8 + movq %rsi,%r9 + movq %rdx,%r10 + + shlq $6,%r10 + addq %r9,%r10 + leaq K_XX_XX+64(%rip),%r14 + + movl 0(%r8),%eax + movl 4(%r8),%ebx + movl 8(%r8),%ecx + movl 12(%r8),%edx + movl %ebx,%esi + movl 16(%r8),%ebp + movl %ecx,%edi + xorl %edx,%edi + andl %edi,%esi + + movdqa 64(%r14),%xmm6 + movdqa -64(%r14),%xmm9 + movdqu 0(%r9),%xmm0 + movdqu 16(%r9),%xmm1 + movdqu 32(%r9),%xmm2 + movdqu 48(%r9),%xmm3 +.byte 102,15,56,0,198 +.byte 102,15,56,0,206 +.byte 102,15,56,0,214 + addq $64,%r9 + paddd %xmm9,%xmm0 +.byte 102,15,56,0,222 + paddd 
%xmm9,%xmm1 + paddd %xmm9,%xmm2 + movdqa %xmm0,0(%rsp) + psubd %xmm9,%xmm0 + movdqa %xmm1,16(%rsp) + psubd %xmm9,%xmm1 + movdqa %xmm2,32(%rsp) + psubd %xmm9,%xmm2 + jmp .Loop_ssse3 +.align 16 +.Loop_ssse3: + rorl $2,%ebx + pshufd $238,%xmm0,%xmm4 + xorl %edx,%esi + movdqa %xmm3,%xmm8 + paddd %xmm3,%xmm9 + movl %eax,%edi + addl 0(%rsp),%ebp + punpcklqdq %xmm1,%xmm4 + xorl %ecx,%ebx + roll $5,%eax + addl %esi,%ebp + psrldq $4,%xmm8 + andl %ebx,%edi + xorl %ecx,%ebx + pxor %xmm0,%xmm4 + addl %eax,%ebp + rorl $7,%eax + pxor %xmm2,%xmm8 + xorl %ecx,%edi + movl %ebp,%esi + addl 4(%rsp),%edx + pxor %xmm8,%xmm4 + xorl %ebx,%eax + roll $5,%ebp + movdqa %xmm9,48(%rsp) + addl %edi,%edx + andl %eax,%esi + movdqa %xmm4,%xmm10 + xorl %ebx,%eax + addl %ebp,%edx + rorl $7,%ebp + movdqa %xmm4,%xmm8 + xorl %ebx,%esi + pslldq $12,%xmm10 + paddd %xmm4,%xmm4 + movl %edx,%edi + addl 8(%rsp),%ecx + psrld $31,%xmm8 + xorl %eax,%ebp + roll $5,%edx + addl %esi,%ecx + movdqa %xmm10,%xmm9 + andl %ebp,%edi + xorl %eax,%ebp + psrld $30,%xmm10 + addl %edx,%ecx + rorl $7,%edx + por %xmm8,%xmm4 + xorl %eax,%edi + movl %ecx,%esi + addl 12(%rsp),%ebx + pslld $2,%xmm9 + pxor %xmm10,%xmm4 + xorl %ebp,%edx + movdqa -64(%r14),%xmm10 + roll $5,%ecx + addl %edi,%ebx + andl %edx,%esi + pxor %xmm9,%xmm4 + xorl %ebp,%edx + addl %ecx,%ebx + rorl $7,%ecx + pshufd $238,%xmm1,%xmm5 + xorl %ebp,%esi + movdqa %xmm4,%xmm9 + paddd %xmm4,%xmm10 + movl %ebx,%edi + addl 16(%rsp),%eax + punpcklqdq %xmm2,%xmm5 + xorl %edx,%ecx + roll $5,%ebx + addl %esi,%eax + psrldq $4,%xmm9 + andl %ecx,%edi + xorl %edx,%ecx + pxor %xmm1,%xmm5 + addl %ebx,%eax + rorl $7,%ebx + pxor %xmm3,%xmm9 + xorl %edx,%edi + movl %eax,%esi + addl 20(%rsp),%ebp + pxor %xmm9,%xmm5 + xorl %ecx,%ebx + roll $5,%eax + movdqa %xmm10,0(%rsp) + addl %edi,%ebp + andl %ebx,%esi + movdqa %xmm5,%xmm8 + xorl %ecx,%ebx + addl %eax,%ebp + rorl $7,%eax + movdqa %xmm5,%xmm9 + xorl %ecx,%esi + pslldq $12,%xmm8 + paddd %xmm5,%xmm5 + movl %ebp,%edi + addl 24(%rsp),%edx + psrld $31,%xmm9 + xorl %ebx,%eax + roll $5,%ebp + addl %esi,%edx + movdqa %xmm8,%xmm10 + andl %eax,%edi + xorl %ebx,%eax + psrld $30,%xmm8 + addl %ebp,%edx + rorl $7,%ebp + por %xmm9,%xmm5 + xorl %ebx,%edi + movl %edx,%esi + addl 28(%rsp),%ecx + pslld $2,%xmm10 + pxor %xmm8,%xmm5 + xorl %eax,%ebp + movdqa -32(%r14),%xmm8 + roll $5,%edx + addl %edi,%ecx + andl %ebp,%esi + pxor %xmm10,%xmm5 + xorl %eax,%ebp + addl %edx,%ecx + rorl $7,%edx + pshufd $238,%xmm2,%xmm6 + xorl %eax,%esi + movdqa %xmm5,%xmm10 + paddd %xmm5,%xmm8 + movl %ecx,%edi + addl 32(%rsp),%ebx + punpcklqdq %xmm3,%xmm6 + xorl %ebp,%edx + roll $5,%ecx + addl %esi,%ebx + psrldq $4,%xmm10 + andl %edx,%edi + xorl %ebp,%edx + pxor %xmm2,%xmm6 + addl %ecx,%ebx + rorl $7,%ecx + pxor %xmm4,%xmm10 + xorl %ebp,%edi + movl %ebx,%esi + addl 36(%rsp),%eax + pxor %xmm10,%xmm6 + xorl %edx,%ecx + roll $5,%ebx + movdqa %xmm8,16(%rsp) + addl %edi,%eax + andl %ecx,%esi + movdqa %xmm6,%xmm9 + xorl %edx,%ecx + addl %ebx,%eax + rorl $7,%ebx + movdqa %xmm6,%xmm10 + xorl %edx,%esi + pslldq $12,%xmm9 + paddd %xmm6,%xmm6 + movl %eax,%edi + addl 40(%rsp),%ebp + psrld $31,%xmm10 + xorl %ecx,%ebx + roll $5,%eax + addl %esi,%ebp + movdqa %xmm9,%xmm8 + andl %ebx,%edi + xorl %ecx,%ebx + psrld $30,%xmm9 + addl %eax,%ebp + rorl $7,%eax + por %xmm10,%xmm6 + xorl %ecx,%edi + movl %ebp,%esi + addl 44(%rsp),%edx + pslld $2,%xmm8 + pxor %xmm9,%xmm6 + xorl %ebx,%eax + movdqa -32(%r14),%xmm9 + roll $5,%ebp + addl %edi,%edx + andl %eax,%esi + pxor %xmm8,%xmm6 + xorl %ebx,%eax + addl %ebp,%edx + rorl 
$7,%ebp + pshufd $238,%xmm3,%xmm7 + xorl %ebx,%esi + movdqa %xmm6,%xmm8 + paddd %xmm6,%xmm9 + movl %edx,%edi + addl 48(%rsp),%ecx + punpcklqdq %xmm4,%xmm7 + xorl %eax,%ebp + roll $5,%edx + addl %esi,%ecx + psrldq $4,%xmm8 + andl %ebp,%edi + xorl %eax,%ebp + pxor %xmm3,%xmm7 + addl %edx,%ecx + rorl $7,%edx + pxor %xmm5,%xmm8 + xorl %eax,%edi + movl %ecx,%esi + addl 52(%rsp),%ebx + pxor %xmm8,%xmm7 + xorl %ebp,%edx + roll $5,%ecx + movdqa %xmm9,32(%rsp) + addl %edi,%ebx + andl %edx,%esi + movdqa %xmm7,%xmm10 + xorl %ebp,%edx + addl %ecx,%ebx + rorl $7,%ecx + movdqa %xmm7,%xmm8 + xorl %ebp,%esi + pslldq $12,%xmm10 + paddd %xmm7,%xmm7 + movl %ebx,%edi + addl 56(%rsp),%eax + psrld $31,%xmm8 + xorl %edx,%ecx + roll $5,%ebx + addl %esi,%eax + movdqa %xmm10,%xmm9 + andl %ecx,%edi + xorl %edx,%ecx + psrld $30,%xmm10 + addl %ebx,%eax + rorl $7,%ebx + por %xmm8,%xmm7 + xorl %edx,%edi + movl %eax,%esi + addl 60(%rsp),%ebp + pslld $2,%xmm9 + pxor %xmm10,%xmm7 + xorl %ecx,%ebx + movdqa -32(%r14),%xmm10 + roll $5,%eax + addl %edi,%ebp + andl %ebx,%esi + pxor %xmm9,%xmm7 + pshufd $238,%xmm6,%xmm9 + xorl %ecx,%ebx + addl %eax,%ebp + rorl $7,%eax + pxor %xmm4,%xmm0 + xorl %ecx,%esi + movl %ebp,%edi + addl 0(%rsp),%edx + punpcklqdq %xmm7,%xmm9 + xorl %ebx,%eax + roll $5,%ebp + pxor %xmm1,%xmm0 + addl %esi,%edx + andl %eax,%edi + movdqa %xmm10,%xmm8 + xorl %ebx,%eax + paddd %xmm7,%xmm10 + addl %ebp,%edx + pxor %xmm9,%xmm0 + rorl $7,%ebp + xorl %ebx,%edi + movl %edx,%esi + addl 4(%rsp),%ecx + movdqa %xmm0,%xmm9 + xorl %eax,%ebp + roll $5,%edx + movdqa %xmm10,48(%rsp) + addl %edi,%ecx + andl %ebp,%esi + xorl %eax,%ebp + pslld $2,%xmm0 + addl %edx,%ecx + rorl $7,%edx + psrld $30,%xmm9 + xorl %eax,%esi + movl %ecx,%edi + addl 8(%rsp),%ebx + por %xmm9,%xmm0 + xorl %ebp,%edx + roll $5,%ecx + pshufd $238,%xmm7,%xmm10 + addl %esi,%ebx + andl %edx,%edi + xorl %ebp,%edx + addl %ecx,%ebx + addl 12(%rsp),%eax + xorl %ebp,%edi + movl %ebx,%esi + roll $5,%ebx + addl %edi,%eax + xorl %edx,%esi + rorl $7,%ecx + addl %ebx,%eax + pxor %xmm5,%xmm1 + addl 16(%rsp),%ebp + xorl %ecx,%esi + punpcklqdq %xmm0,%xmm10 + movl %eax,%edi + roll $5,%eax + pxor %xmm2,%xmm1 + addl %esi,%ebp + xorl %ecx,%edi + movdqa %xmm8,%xmm9 + rorl $7,%ebx + paddd %xmm0,%xmm8 + addl %eax,%ebp + pxor %xmm10,%xmm1 + addl 20(%rsp),%edx + xorl %ebx,%edi + movl %ebp,%esi + roll $5,%ebp + movdqa %xmm1,%xmm10 + addl %edi,%edx + xorl %ebx,%esi + movdqa %xmm8,0(%rsp) + rorl $7,%eax + addl %ebp,%edx + addl 24(%rsp),%ecx + pslld $2,%xmm1 + xorl %eax,%esi + movl %edx,%edi + psrld $30,%xmm10 + roll $5,%edx + addl %esi,%ecx + xorl %eax,%edi + rorl $7,%ebp + por %xmm10,%xmm1 + addl %edx,%ecx + addl 28(%rsp),%ebx + pshufd $238,%xmm0,%xmm8 + xorl %ebp,%edi + movl %ecx,%esi + roll $5,%ecx + addl %edi,%ebx + xorl %ebp,%esi + rorl $7,%edx + addl %ecx,%ebx + pxor %xmm6,%xmm2 + addl 32(%rsp),%eax + xorl %edx,%esi + punpcklqdq %xmm1,%xmm8 + movl %ebx,%edi + roll $5,%ebx + pxor %xmm3,%xmm2 + addl %esi,%eax + xorl %edx,%edi + movdqa 0(%r14),%xmm10 + rorl $7,%ecx + paddd %xmm1,%xmm9 + addl %ebx,%eax + pxor %xmm8,%xmm2 + addl 36(%rsp),%ebp + xorl %ecx,%edi + movl %eax,%esi + roll $5,%eax + movdqa %xmm2,%xmm8 + addl %edi,%ebp + xorl %ecx,%esi + movdqa %xmm9,16(%rsp) + rorl $7,%ebx + addl %eax,%ebp + addl 40(%rsp),%edx + pslld $2,%xmm2 + xorl %ebx,%esi + movl %ebp,%edi + psrld $30,%xmm8 + roll $5,%ebp + addl %esi,%edx + xorl %ebx,%edi + rorl $7,%eax + por %xmm8,%xmm2 + addl %ebp,%edx + addl 44(%rsp),%ecx + pshufd $238,%xmm1,%xmm9 + xorl %eax,%edi + movl %edx,%esi + roll $5,%edx + 
addl %edi,%ecx + xorl %eax,%esi + rorl $7,%ebp + addl %edx,%ecx + pxor %xmm7,%xmm3 + addl 48(%rsp),%ebx + xorl %ebp,%esi + punpcklqdq %xmm2,%xmm9 + movl %ecx,%edi + roll $5,%ecx + pxor %xmm4,%xmm3 + addl %esi,%ebx + xorl %ebp,%edi + movdqa %xmm10,%xmm8 + rorl $7,%edx + paddd %xmm2,%xmm10 + addl %ecx,%ebx + pxor %xmm9,%xmm3 + addl 52(%rsp),%eax + xorl %edx,%edi + movl %ebx,%esi + roll $5,%ebx + movdqa %xmm3,%xmm9 + addl %edi,%eax + xorl %edx,%esi + movdqa %xmm10,32(%rsp) + rorl $7,%ecx + addl %ebx,%eax + addl 56(%rsp),%ebp + pslld $2,%xmm3 + xorl %ecx,%esi + movl %eax,%edi + psrld $30,%xmm9 + roll $5,%eax + addl %esi,%ebp + xorl %ecx,%edi + rorl $7,%ebx + por %xmm9,%xmm3 + addl %eax,%ebp + addl 60(%rsp),%edx + pshufd $238,%xmm2,%xmm10 + xorl %ebx,%edi + movl %ebp,%esi + roll $5,%ebp + addl %edi,%edx + xorl %ebx,%esi + rorl $7,%eax + addl %ebp,%edx + pxor %xmm0,%xmm4 + addl 0(%rsp),%ecx + xorl %eax,%esi + punpcklqdq %xmm3,%xmm10 + movl %edx,%edi + roll $5,%edx + pxor %xmm5,%xmm4 + addl %esi,%ecx + xorl %eax,%edi + movdqa %xmm8,%xmm9 + rorl $7,%ebp + paddd %xmm3,%xmm8 + addl %edx,%ecx + pxor %xmm10,%xmm4 + addl 4(%rsp),%ebx + xorl %ebp,%edi + movl %ecx,%esi + roll $5,%ecx + movdqa %xmm4,%xmm10 + addl %edi,%ebx + xorl %ebp,%esi + movdqa %xmm8,48(%rsp) + rorl $7,%edx + addl %ecx,%ebx + addl 8(%rsp),%eax + pslld $2,%xmm4 + xorl %edx,%esi + movl %ebx,%edi + psrld $30,%xmm10 + roll $5,%ebx + addl %esi,%eax + xorl %edx,%edi + rorl $7,%ecx + por %xmm10,%xmm4 + addl %ebx,%eax + addl 12(%rsp),%ebp + pshufd $238,%xmm3,%xmm8 + xorl %ecx,%edi + movl %eax,%esi + roll $5,%eax + addl %edi,%ebp + xorl %ecx,%esi + rorl $7,%ebx + addl %eax,%ebp + pxor %xmm1,%xmm5 + addl 16(%rsp),%edx + xorl %ebx,%esi + punpcklqdq %xmm4,%xmm8 + movl %ebp,%edi + roll $5,%ebp + pxor %xmm6,%xmm5 + addl %esi,%edx + xorl %ebx,%edi + movdqa %xmm9,%xmm10 + rorl $7,%eax + paddd %xmm4,%xmm9 + addl %ebp,%edx + pxor %xmm8,%xmm5 + addl 20(%rsp),%ecx + xorl %eax,%edi + movl %edx,%esi + roll $5,%edx + movdqa %xmm5,%xmm8 + addl %edi,%ecx + xorl %eax,%esi + movdqa %xmm9,0(%rsp) + rorl $7,%ebp + addl %edx,%ecx + addl 24(%rsp),%ebx + pslld $2,%xmm5 + xorl %ebp,%esi + movl %ecx,%edi + psrld $30,%xmm8 + roll $5,%ecx + addl %esi,%ebx + xorl %ebp,%edi + rorl $7,%edx + por %xmm8,%xmm5 + addl %ecx,%ebx + addl 28(%rsp),%eax + pshufd $238,%xmm4,%xmm9 + rorl $7,%ecx + movl %ebx,%esi + xorl %edx,%edi + roll $5,%ebx + addl %edi,%eax + xorl %ecx,%esi + xorl %edx,%ecx + addl %ebx,%eax + pxor %xmm2,%xmm6 + addl 32(%rsp),%ebp + andl %ecx,%esi + xorl %edx,%ecx + rorl $7,%ebx + punpcklqdq %xmm5,%xmm9 + movl %eax,%edi + xorl %ecx,%esi + pxor %xmm7,%xmm6 + roll $5,%eax + addl %esi,%ebp + movdqa %xmm10,%xmm8 + xorl %ebx,%edi + paddd %xmm5,%xmm10 + xorl %ecx,%ebx + pxor %xmm9,%xmm6 + addl %eax,%ebp + addl 36(%rsp),%edx + andl %ebx,%edi + xorl %ecx,%ebx + rorl $7,%eax + movdqa %xmm6,%xmm9 + movl %ebp,%esi + xorl %ebx,%edi + movdqa %xmm10,16(%rsp) + roll $5,%ebp + addl %edi,%edx + xorl %eax,%esi + pslld $2,%xmm6 + xorl %ebx,%eax + addl %ebp,%edx + psrld $30,%xmm9 + addl 40(%rsp),%ecx + andl %eax,%esi + xorl %ebx,%eax + por %xmm9,%xmm6 + rorl $7,%ebp + movl %edx,%edi + xorl %eax,%esi + roll $5,%edx + pshufd $238,%xmm5,%xmm10 + addl %esi,%ecx + xorl %ebp,%edi + xorl %eax,%ebp + addl %edx,%ecx + addl 44(%rsp),%ebx + andl %ebp,%edi + xorl %eax,%ebp + rorl $7,%edx + movl %ecx,%esi + xorl %ebp,%edi + roll $5,%ecx + addl %edi,%ebx + xorl %edx,%esi + xorl %ebp,%edx + addl %ecx,%ebx + pxor %xmm3,%xmm7 + addl 48(%rsp),%eax + andl %edx,%esi + xorl %ebp,%edx + rorl $7,%ecx + 
punpcklqdq %xmm6,%xmm10 + movl %ebx,%edi + xorl %edx,%esi + pxor %xmm0,%xmm7 + roll $5,%ebx + addl %esi,%eax + movdqa 32(%r14),%xmm9 + xorl %ecx,%edi + paddd %xmm6,%xmm8 + xorl %edx,%ecx + pxor %xmm10,%xmm7 + addl %ebx,%eax + addl 52(%rsp),%ebp + andl %ecx,%edi + xorl %edx,%ecx + rorl $7,%ebx + movdqa %xmm7,%xmm10 + movl %eax,%esi + xorl %ecx,%edi + movdqa %xmm8,32(%rsp) + roll $5,%eax + addl %edi,%ebp + xorl %ebx,%esi + pslld $2,%xmm7 + xorl %ecx,%ebx + addl %eax,%ebp + psrld $30,%xmm10 + addl 56(%rsp),%edx + andl %ebx,%esi + xorl %ecx,%ebx + por %xmm10,%xmm7 + rorl $7,%eax + movl %ebp,%edi + xorl %ebx,%esi + roll $5,%ebp + pshufd $238,%xmm6,%xmm8 + addl %esi,%edx + xorl %eax,%edi + xorl %ebx,%eax + addl %ebp,%edx + addl 60(%rsp),%ecx + andl %eax,%edi + xorl %ebx,%eax + rorl $7,%ebp + movl %edx,%esi + xorl %eax,%edi + roll $5,%edx + addl %edi,%ecx + xorl %ebp,%esi + xorl %eax,%ebp + addl %edx,%ecx + pxor %xmm4,%xmm0 + addl 0(%rsp),%ebx + andl %ebp,%esi + xorl %eax,%ebp + rorl $7,%edx + punpcklqdq %xmm7,%xmm8 + movl %ecx,%edi + xorl %ebp,%esi + pxor %xmm1,%xmm0 + roll $5,%ecx + addl %esi,%ebx + movdqa %xmm9,%xmm10 + xorl %edx,%edi + paddd %xmm7,%xmm9 + xorl %ebp,%edx + pxor %xmm8,%xmm0 + addl %ecx,%ebx + addl 4(%rsp),%eax + andl %edx,%edi + xorl %ebp,%edx + rorl $7,%ecx + movdqa %xmm0,%xmm8 + movl %ebx,%esi + xorl %edx,%edi + movdqa %xmm9,48(%rsp) + roll $5,%ebx + addl %edi,%eax + xorl %ecx,%esi + pslld $2,%xmm0 + xorl %edx,%ecx + addl %ebx,%eax + psrld $30,%xmm8 + addl 8(%rsp),%ebp + andl %ecx,%esi + xorl %edx,%ecx + por %xmm8,%xmm0 + rorl $7,%ebx + movl %eax,%edi + xorl %ecx,%esi + roll $5,%eax + pshufd $238,%xmm7,%xmm9 + addl %esi,%ebp + xorl %ebx,%edi + xorl %ecx,%ebx + addl %eax,%ebp + addl 12(%rsp),%edx + andl %ebx,%edi + xorl %ecx,%ebx + rorl $7,%eax + movl %ebp,%esi + xorl %ebx,%edi + roll $5,%ebp + addl %edi,%edx + xorl %eax,%esi + xorl %ebx,%eax + addl %ebp,%edx + pxor %xmm5,%xmm1 + addl 16(%rsp),%ecx + andl %eax,%esi + xorl %ebx,%eax + rorl $7,%ebp + punpcklqdq %xmm0,%xmm9 + movl %edx,%edi + xorl %eax,%esi + pxor %xmm2,%xmm1 + roll $5,%edx + addl %esi,%ecx + movdqa %xmm10,%xmm8 + xorl %ebp,%edi + paddd %xmm0,%xmm10 + xorl %eax,%ebp + pxor %xmm9,%xmm1 + addl %edx,%ecx + addl 20(%rsp),%ebx + andl %ebp,%edi + xorl %eax,%ebp + rorl $7,%edx + movdqa %xmm1,%xmm9 + movl %ecx,%esi + xorl %ebp,%edi + movdqa %xmm10,0(%rsp) + roll $5,%ecx + addl %edi,%ebx + xorl %edx,%esi + pslld $2,%xmm1 + xorl %ebp,%edx + addl %ecx,%ebx + psrld $30,%xmm9 + addl 24(%rsp),%eax + andl %edx,%esi + xorl %ebp,%edx + por %xmm9,%xmm1 + rorl $7,%ecx + movl %ebx,%edi + xorl %edx,%esi + roll $5,%ebx + pshufd $238,%xmm0,%xmm10 + addl %esi,%eax + xorl %ecx,%edi + xorl %edx,%ecx + addl %ebx,%eax + addl 28(%rsp),%ebp + andl %ecx,%edi + xorl %edx,%ecx + rorl $7,%ebx + movl %eax,%esi + xorl %ecx,%edi + roll $5,%eax + addl %edi,%ebp + xorl %ebx,%esi + xorl %ecx,%ebx + addl %eax,%ebp + pxor %xmm6,%xmm2 + addl 32(%rsp),%edx + andl %ebx,%esi + xorl %ecx,%ebx + rorl $7,%eax + punpcklqdq %xmm1,%xmm10 + movl %ebp,%edi + xorl %ebx,%esi + pxor %xmm3,%xmm2 + roll $5,%ebp + addl %esi,%edx + movdqa %xmm8,%xmm9 + xorl %eax,%edi + paddd %xmm1,%xmm8 + xorl %ebx,%eax + pxor %xmm10,%xmm2 + addl %ebp,%edx + addl 36(%rsp),%ecx + andl %eax,%edi + xorl %ebx,%eax + rorl $7,%ebp + movdqa %xmm2,%xmm10 + movl %edx,%esi + xorl %eax,%edi + movdqa %xmm8,16(%rsp) + roll $5,%edx + addl %edi,%ecx + xorl %ebp,%esi + pslld $2,%xmm2 + xorl %eax,%ebp + addl %edx,%ecx + psrld $30,%xmm10 + addl 40(%rsp),%ebx + andl %ebp,%esi + xorl %eax,%ebp + por 
%xmm10,%xmm2 + rorl $7,%edx + movl %ecx,%edi + xorl %ebp,%esi + roll $5,%ecx + pshufd $238,%xmm1,%xmm8 + addl %esi,%ebx + xorl %edx,%edi + xorl %ebp,%edx + addl %ecx,%ebx + addl 44(%rsp),%eax + andl %edx,%edi + xorl %ebp,%edx + rorl $7,%ecx + movl %ebx,%esi + xorl %edx,%edi + roll $5,%ebx + addl %edi,%eax + xorl %edx,%esi + addl %ebx,%eax + pxor %xmm7,%xmm3 + addl 48(%rsp),%ebp + xorl %ecx,%esi + punpcklqdq %xmm2,%xmm8 + movl %eax,%edi + roll $5,%eax + pxor %xmm4,%xmm3 + addl %esi,%ebp + xorl %ecx,%edi + movdqa %xmm9,%xmm10 + rorl $7,%ebx + paddd %xmm2,%xmm9 + addl %eax,%ebp + pxor %xmm8,%xmm3 + addl 52(%rsp),%edx + xorl %ebx,%edi + movl %ebp,%esi + roll $5,%ebp + movdqa %xmm3,%xmm8 + addl %edi,%edx + xorl %ebx,%esi + movdqa %xmm9,32(%rsp) + rorl $7,%eax + addl %ebp,%edx + addl 56(%rsp),%ecx + pslld $2,%xmm3 + xorl %eax,%esi + movl %edx,%edi + psrld $30,%xmm8 + roll $5,%edx + addl %esi,%ecx + xorl %eax,%edi + rorl $7,%ebp + por %xmm8,%xmm3 + addl %edx,%ecx + addl 60(%rsp),%ebx + xorl %ebp,%edi + movl %ecx,%esi + roll $5,%ecx + addl %edi,%ebx + xorl %ebp,%esi + rorl $7,%edx + addl %ecx,%ebx + addl 0(%rsp),%eax + xorl %edx,%esi + movl %ebx,%edi + roll $5,%ebx + paddd %xmm3,%xmm10 + addl %esi,%eax + xorl %edx,%edi + movdqa %xmm10,48(%rsp) + rorl $7,%ecx + addl %ebx,%eax + addl 4(%rsp),%ebp + xorl %ecx,%edi + movl %eax,%esi + roll $5,%eax + addl %edi,%ebp + xorl %ecx,%esi + rorl $7,%ebx + addl %eax,%ebp + addl 8(%rsp),%edx + xorl %ebx,%esi + movl %ebp,%edi + roll $5,%ebp + addl %esi,%edx + xorl %ebx,%edi + rorl $7,%eax + addl %ebp,%edx + addl 12(%rsp),%ecx + xorl %eax,%edi + movl %edx,%esi + roll $5,%edx + addl %edi,%ecx + xorl %eax,%esi + rorl $7,%ebp + addl %edx,%ecx + cmpq %r10,%r9 + je .Ldone_ssse3 + movdqa 64(%r14),%xmm6 + movdqa -64(%r14),%xmm9 + movdqu 0(%r9),%xmm0 + movdqu 16(%r9),%xmm1 + movdqu 32(%r9),%xmm2 + movdqu 48(%r9),%xmm3 +.byte 102,15,56,0,198 + addq $64,%r9 + addl 16(%rsp),%ebx + xorl %ebp,%esi + movl %ecx,%edi +.byte 102,15,56,0,206 + roll $5,%ecx + addl %esi,%ebx + xorl %ebp,%edi + rorl $7,%edx + paddd %xmm9,%xmm0 + addl %ecx,%ebx + addl 20(%rsp),%eax + xorl %edx,%edi + movl %ebx,%esi + movdqa %xmm0,0(%rsp) + roll $5,%ebx + addl %edi,%eax + xorl %edx,%esi + rorl $7,%ecx + psubd %xmm9,%xmm0 + addl %ebx,%eax + addl 24(%rsp),%ebp + xorl %ecx,%esi + movl %eax,%edi + roll $5,%eax + addl %esi,%ebp + xorl %ecx,%edi + rorl $7,%ebx + addl %eax,%ebp + addl 28(%rsp),%edx + xorl %ebx,%edi + movl %ebp,%esi + roll $5,%ebp + addl %edi,%edx + xorl %ebx,%esi + rorl $7,%eax + addl %ebp,%edx + addl 32(%rsp),%ecx + xorl %eax,%esi + movl %edx,%edi +.byte 102,15,56,0,214 + roll $5,%edx + addl %esi,%ecx + xorl %eax,%edi + rorl $7,%ebp + paddd %xmm9,%xmm1 + addl %edx,%ecx + addl 36(%rsp),%ebx + xorl %ebp,%edi + movl %ecx,%esi + movdqa %xmm1,16(%rsp) + roll $5,%ecx + addl %edi,%ebx + xorl %ebp,%esi + rorl $7,%edx + psubd %xmm9,%xmm1 + addl %ecx,%ebx + addl 40(%rsp),%eax + xorl %edx,%esi + movl %ebx,%edi + roll $5,%ebx + addl %esi,%eax + xorl %edx,%edi + rorl $7,%ecx + addl %ebx,%eax + addl 44(%rsp),%ebp + xorl %ecx,%edi + movl %eax,%esi + roll $5,%eax + addl %edi,%ebp + xorl %ecx,%esi + rorl $7,%ebx + addl %eax,%ebp + addl 48(%rsp),%edx + xorl %ebx,%esi + movl %ebp,%edi +.byte 102,15,56,0,222 + roll $5,%ebp + addl %esi,%edx + xorl %ebx,%edi + rorl $7,%eax + paddd %xmm9,%xmm2 + addl %ebp,%edx + addl 52(%rsp),%ecx + xorl %eax,%edi + movl %edx,%esi + movdqa %xmm2,32(%rsp) + roll $5,%edx + addl %edi,%ecx + xorl %eax,%esi + rorl $7,%ebp + psubd %xmm9,%xmm2 + addl %edx,%ecx + addl 56(%rsp),%ebx + xorl 
%ebp,%esi + movl %ecx,%edi + roll $5,%ecx + addl %esi,%ebx + xorl %ebp,%edi + rorl $7,%edx + addl %ecx,%ebx + addl 60(%rsp),%eax + xorl %edx,%edi + movl %ebx,%esi + roll $5,%ebx + addl %edi,%eax + rorl $7,%ecx + addl %ebx,%eax + addl 0(%r8),%eax + addl 4(%r8),%esi + addl 8(%r8),%ecx + addl 12(%r8),%edx + movl %eax,0(%r8) + addl 16(%r8),%ebp + movl %esi,4(%r8) + movl %esi,%ebx + movl %ecx,8(%r8) + movl %ecx,%edi + movl %edx,12(%r8) + xorl %edx,%edi + movl %ebp,16(%r8) + andl %edi,%esi + jmp .Loop_ssse3 + +.align 16 +.Ldone_ssse3: + addl 16(%rsp),%ebx + xorl %ebp,%esi + movl %ecx,%edi + roll $5,%ecx + addl %esi,%ebx + xorl %ebp,%edi + rorl $7,%edx + addl %ecx,%ebx + addl 20(%rsp),%eax + xorl %edx,%edi + movl %ebx,%esi + roll $5,%ebx + addl %edi,%eax + xorl %edx,%esi + rorl $7,%ecx + addl %ebx,%eax + addl 24(%rsp),%ebp + xorl %ecx,%esi + movl %eax,%edi + roll $5,%eax + addl %esi,%ebp + xorl %ecx,%edi + rorl $7,%ebx + addl %eax,%ebp + addl 28(%rsp),%edx + xorl %ebx,%edi + movl %ebp,%esi + roll $5,%ebp + addl %edi,%edx + xorl %ebx,%esi + rorl $7,%eax + addl %ebp,%edx + addl 32(%rsp),%ecx + xorl %eax,%esi + movl %edx,%edi + roll $5,%edx + addl %esi,%ecx + xorl %eax,%edi + rorl $7,%ebp + addl %edx,%ecx + addl 36(%rsp),%ebx + xorl %ebp,%edi + movl %ecx,%esi + roll $5,%ecx + addl %edi,%ebx + xorl %ebp,%esi + rorl $7,%edx + addl %ecx,%ebx + addl 40(%rsp),%eax + xorl %edx,%esi + movl %ebx,%edi + roll $5,%ebx + addl %esi,%eax + xorl %edx,%edi + rorl $7,%ecx + addl %ebx,%eax + addl 44(%rsp),%ebp + xorl %ecx,%edi + movl %eax,%esi + roll $5,%eax + addl %edi,%ebp + xorl %ecx,%esi + rorl $7,%ebx + addl %eax,%ebp + addl 48(%rsp),%edx + xorl %ebx,%esi + movl %ebp,%edi + roll $5,%ebp + addl %esi,%edx + xorl %ebx,%edi + rorl $7,%eax + addl %ebp,%edx + addl 52(%rsp),%ecx + xorl %eax,%edi + movl %edx,%esi + roll $5,%edx + addl %edi,%ecx + xorl %eax,%esi + rorl $7,%ebp + addl %edx,%ecx + addl 56(%rsp),%ebx + xorl %ebp,%esi + movl %ecx,%edi + roll $5,%ecx + addl %esi,%ebx + xorl %ebp,%edi + rorl $7,%edx + addl %ecx,%ebx + addl 60(%rsp),%eax + xorl %edx,%edi + movl %ebx,%esi + roll $5,%ebx + addl %edi,%eax + rorl $7,%ecx + addl %ebx,%eax + addl 0(%r8),%eax + addl 4(%r8),%esi + addl 8(%r8),%ecx + movl %eax,0(%r8) + addl 12(%r8),%edx + movl %esi,4(%r8) + addl 16(%r8),%ebp + movl %ecx,8(%r8) + movl %edx,12(%r8) + movl %ebp,16(%r8) + movq -40(%r11),%r14 +.cfi_restore %r14 + movq -32(%r11),%r13 +.cfi_restore %r13 + movq -24(%r11),%r12 +.cfi_restore %r12 + movq -16(%r11),%rbp +.cfi_restore %rbp + movq -8(%r11),%rbx +.cfi_restore %rbx + leaq (%r11),%rsp +.cfi_def_cfa_register %rsp +.Lepilogue_ssse3: + .byte 0xf3,0xc3 +.cfi_endproc +.size sha1_block_data_order_ssse3,.-sha1_block_data_order_ssse3 +.align 64 +K_XX_XX: +.long 0x5a827999,0x5a827999,0x5a827999,0x5a827999 +.long 0x5a827999,0x5a827999,0x5a827999,0x5a827999 +.long 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 +.long 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 +.long 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc +.long 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc +.long 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 +.long 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f +.byte 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0 +.byte 83,72,65,49,32,98,108,111,99,107,32,116,114,97,110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0 
+.align 64 diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-mb-x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-mb-x86_64.S new file mode 100644 index 0000000000..25dee488b8 --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-mb-x86_64.S @@ -0,0 +1,3286 @@ +# WARNING: do not edit! +# Generated from openssl/crypto/sha/asm/sha256-mb-x86_64.pl +# +# Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the OpenSSL license (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +.text + + + +.globl sha256_multi_block +.type sha256_multi_block,@function +.align 32 +sha256_multi_block: +.cfi_startproc + movq OPENSSL_ia32cap_P+4(%rip),%rcx + btq $61,%rcx + jc _shaext_shortcut + movq %rsp,%rax +.cfi_def_cfa_register %rax + pushq %rbx +.cfi_offset %rbx,-16 + pushq %rbp +.cfi_offset %rbp,-24 + subq $288,%rsp + andq $-256,%rsp + movq %rax,272(%rsp) +.cfi_escape 0x0f,0x06,0x77,0x90,0x02,0x06,0x23,0x08 +.Lbody: + leaq K256+128(%rip),%rbp + leaq 256(%rsp),%rbx + leaq 128(%rdi),%rdi + +.Loop_grande: + movl %edx,280(%rsp) + xorl %edx,%edx + movq 0(%rsi),%r8 + movl 8(%rsi),%ecx + cmpl %edx,%ecx + cmovgl %ecx,%edx + testl %ecx,%ecx + movl %ecx,0(%rbx) + cmovleq %rbp,%r8 + movq 16(%rsi),%r9 + movl 24(%rsi),%ecx + cmpl %edx,%ecx + cmovgl %ecx,%edx + testl %ecx,%ecx + movl %ecx,4(%rbx) + cmovleq %rbp,%r9 + movq 32(%rsi),%r10 + movl 40(%rsi),%ecx + cmpl %edx,%ecx + cmovgl %ecx,%edx + testl %ecx,%ecx + movl %ecx,8(%rbx) + cmovleq %rbp,%r10 + movq 48(%rsi),%r11 + movl 56(%rsi),%ecx + cmpl %edx,%ecx + cmovgl %ecx,%edx + testl %ecx,%ecx + movl %ecx,12(%rbx) + cmovleq %rbp,%r11 + testl %edx,%edx + jz .Ldone + + movdqu 0-128(%rdi),%xmm8 + leaq 128(%rsp),%rax + movdqu 32-128(%rdi),%xmm9 + movdqu 64-128(%rdi),%xmm10 + movdqu 96-128(%rdi),%xmm11 + movdqu 128-128(%rdi),%xmm12 + movdqu 160-128(%rdi),%xmm13 + movdqu 192-128(%rdi),%xmm14 + movdqu 224-128(%rdi),%xmm15 + movdqu .Lpbswap(%rip),%xmm6 + jmp .Loop + +.align 32 +.Loop: + movdqa %xmm10,%xmm4 + pxor %xmm9,%xmm4 + movd 0(%r8),%xmm5 + movd 0(%r9),%xmm0 + movd 0(%r10),%xmm1 + movd 0(%r11),%xmm2 + punpckldq %xmm1,%xmm5 + punpckldq %xmm2,%xmm0 + punpckldq %xmm0,%xmm5 + movdqa %xmm12,%xmm7 +.byte 102,15,56,0,238 + movdqa %xmm12,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm12,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,0-128(%rax) + paddd %xmm15,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd -128(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm12,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm12,%xmm3 + pslld $26-21,%xmm2 + pandn %xmm14,%xmm0 + pand %xmm13,%xmm3 + pxor %xmm1,%xmm7 + + + movdqa %xmm8,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm8,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + movdqa %xmm9,%xmm3 + movdqa %xmm8,%xmm7 + pslld $10,%xmm2 + pxor %xmm8,%xmm3 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm3,%xmm4 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm9,%xmm15 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm4,%xmm15 + paddd %xmm5,%xmm11 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm15 + paddd %xmm7,%xmm15 + movd 4(%r8),%xmm5 + movd 4(%r9),%xmm0 + movd 4(%r10),%xmm1 + movd 4(%r11),%xmm2 + punpckldq %xmm1,%xmm5 + punpckldq %xmm2,%xmm0 + punpckldq %xmm0,%xmm5 + movdqa %xmm11,%xmm7 + + movdqa %xmm11,%xmm2 +.byte 
102,15,56,0,238 + psrld $6,%xmm7 + movdqa %xmm11,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,16-128(%rax) + paddd %xmm14,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd -96(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm11,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm11,%xmm4 + pslld $26-21,%xmm2 + pandn %xmm13,%xmm0 + pand %xmm12,%xmm4 + pxor %xmm1,%xmm7 + + + movdqa %xmm15,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm15,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm4,%xmm0 + movdqa %xmm8,%xmm4 + movdqa %xmm15,%xmm7 + pslld $10,%xmm2 + pxor %xmm15,%xmm4 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm4,%xmm3 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm8,%xmm14 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm3,%xmm14 + paddd %xmm5,%xmm10 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm14 + paddd %xmm7,%xmm14 + movd 8(%r8),%xmm5 + movd 8(%r9),%xmm0 + movd 8(%r10),%xmm1 + movd 8(%r11),%xmm2 + punpckldq %xmm1,%xmm5 + punpckldq %xmm2,%xmm0 + punpckldq %xmm0,%xmm5 + movdqa %xmm10,%xmm7 +.byte 102,15,56,0,238 + movdqa %xmm10,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm10,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,32-128(%rax) + paddd %xmm13,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd -64(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm10,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm10,%xmm3 + pslld $26-21,%xmm2 + pandn %xmm12,%xmm0 + pand %xmm11,%xmm3 + pxor %xmm1,%xmm7 + + + movdqa %xmm14,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm14,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + movdqa %xmm15,%xmm3 + movdqa %xmm14,%xmm7 + pslld $10,%xmm2 + pxor %xmm14,%xmm3 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm3,%xmm4 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm15,%xmm13 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm4,%xmm13 + paddd %xmm5,%xmm9 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm13 + paddd %xmm7,%xmm13 + movd 12(%r8),%xmm5 + movd 12(%r9),%xmm0 + movd 12(%r10),%xmm1 + movd 12(%r11),%xmm2 + punpckldq %xmm1,%xmm5 + punpckldq %xmm2,%xmm0 + punpckldq %xmm0,%xmm5 + movdqa %xmm9,%xmm7 + + movdqa %xmm9,%xmm2 +.byte 102,15,56,0,238 + psrld $6,%xmm7 + movdqa %xmm9,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,48-128(%rax) + paddd %xmm12,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd -32(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm9,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm9,%xmm4 + pslld $26-21,%xmm2 + pandn %xmm11,%xmm0 + pand %xmm10,%xmm4 + pxor %xmm1,%xmm7 + + + movdqa %xmm13,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm13,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm4,%xmm0 + movdqa %xmm14,%xmm4 + movdqa %xmm13,%xmm7 + pslld $10,%xmm2 + pxor %xmm13,%xmm4 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm4,%xmm3 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm14,%xmm12 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm3,%xmm12 + paddd %xmm5,%xmm8 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm12 + paddd %xmm7,%xmm12 + movd 16(%r8),%xmm5 + movd 16(%r9),%xmm0 + movd 16(%r10),%xmm1 + movd 16(%r11),%xmm2 + punpckldq %xmm1,%xmm5 + punpckldq %xmm2,%xmm0 + punpckldq %xmm0,%xmm5 + movdqa %xmm8,%xmm7 +.byte 102,15,56,0,238 + movdqa %xmm8,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm8,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,64-128(%rax) + paddd %xmm11,%xmm5 + + psrld $11,%xmm1 + pxor 
%xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd 0(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm8,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm8,%xmm3 + pslld $26-21,%xmm2 + pandn %xmm10,%xmm0 + pand %xmm9,%xmm3 + pxor %xmm1,%xmm7 + + + movdqa %xmm12,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm12,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + movdqa %xmm13,%xmm3 + movdqa %xmm12,%xmm7 + pslld $10,%xmm2 + pxor %xmm12,%xmm3 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm3,%xmm4 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm13,%xmm11 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm4,%xmm11 + paddd %xmm5,%xmm15 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm11 + paddd %xmm7,%xmm11 + movd 20(%r8),%xmm5 + movd 20(%r9),%xmm0 + movd 20(%r10),%xmm1 + movd 20(%r11),%xmm2 + punpckldq %xmm1,%xmm5 + punpckldq %xmm2,%xmm0 + punpckldq %xmm0,%xmm5 + movdqa %xmm15,%xmm7 + + movdqa %xmm15,%xmm2 +.byte 102,15,56,0,238 + psrld $6,%xmm7 + movdqa %xmm15,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,80-128(%rax) + paddd %xmm10,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd 32(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm15,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm15,%xmm4 + pslld $26-21,%xmm2 + pandn %xmm9,%xmm0 + pand %xmm8,%xmm4 + pxor %xmm1,%xmm7 + + + movdqa %xmm11,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm11,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm4,%xmm0 + movdqa %xmm12,%xmm4 + movdqa %xmm11,%xmm7 + pslld $10,%xmm2 + pxor %xmm11,%xmm4 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm4,%xmm3 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm12,%xmm10 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm3,%xmm10 + paddd %xmm5,%xmm14 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm10 + paddd %xmm7,%xmm10 + movd 24(%r8),%xmm5 + movd 24(%r9),%xmm0 + movd 24(%r10),%xmm1 + movd 24(%r11),%xmm2 + punpckldq %xmm1,%xmm5 + punpckldq %xmm2,%xmm0 + punpckldq %xmm0,%xmm5 + movdqa %xmm14,%xmm7 +.byte 102,15,56,0,238 + movdqa %xmm14,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm14,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,96-128(%rax) + paddd %xmm9,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd 64(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm14,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm14,%xmm3 + pslld $26-21,%xmm2 + pandn %xmm8,%xmm0 + pand %xmm15,%xmm3 + pxor %xmm1,%xmm7 + + + movdqa %xmm10,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm10,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + movdqa %xmm11,%xmm3 + movdqa %xmm10,%xmm7 + pslld $10,%xmm2 + pxor %xmm10,%xmm3 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm3,%xmm4 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm11,%xmm9 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm4,%xmm9 + paddd %xmm5,%xmm13 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm9 + paddd %xmm7,%xmm9 + movd 28(%r8),%xmm5 + movd 28(%r9),%xmm0 + movd 28(%r10),%xmm1 + movd 28(%r11),%xmm2 + punpckldq %xmm1,%xmm5 + punpckldq %xmm2,%xmm0 + punpckldq %xmm0,%xmm5 + movdqa %xmm13,%xmm7 + + movdqa %xmm13,%xmm2 +.byte 102,15,56,0,238 + psrld $6,%xmm7 + movdqa %xmm13,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,112-128(%rax) + paddd %xmm8,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd 96(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm13,%xmm0 + + pxor %xmm2,%xmm7 + movdqa 
%xmm13,%xmm4 + pslld $26-21,%xmm2 + pandn %xmm15,%xmm0 + pand %xmm14,%xmm4 + pxor %xmm1,%xmm7 + + + movdqa %xmm9,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm9,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm4,%xmm0 + movdqa %xmm10,%xmm4 + movdqa %xmm9,%xmm7 + pslld $10,%xmm2 + pxor %xmm9,%xmm4 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm4,%xmm3 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm10,%xmm8 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm3,%xmm8 + paddd %xmm5,%xmm12 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm8 + paddd %xmm7,%xmm8 + leaq 256(%rbp),%rbp + movd 32(%r8),%xmm5 + movd 32(%r9),%xmm0 + movd 32(%r10),%xmm1 + movd 32(%r11),%xmm2 + punpckldq %xmm1,%xmm5 + punpckldq %xmm2,%xmm0 + punpckldq %xmm0,%xmm5 + movdqa %xmm12,%xmm7 +.byte 102,15,56,0,238 + movdqa %xmm12,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm12,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,128-128(%rax) + paddd %xmm15,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd -128(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm12,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm12,%xmm3 + pslld $26-21,%xmm2 + pandn %xmm14,%xmm0 + pand %xmm13,%xmm3 + pxor %xmm1,%xmm7 + + + movdqa %xmm8,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm8,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + movdqa %xmm9,%xmm3 + movdqa %xmm8,%xmm7 + pslld $10,%xmm2 + pxor %xmm8,%xmm3 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm3,%xmm4 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm9,%xmm15 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm4,%xmm15 + paddd %xmm5,%xmm11 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm15 + paddd %xmm7,%xmm15 + movd 36(%r8),%xmm5 + movd 36(%r9),%xmm0 + movd 36(%r10),%xmm1 + movd 36(%r11),%xmm2 + punpckldq %xmm1,%xmm5 + punpckldq %xmm2,%xmm0 + punpckldq %xmm0,%xmm5 + movdqa %xmm11,%xmm7 + + movdqa %xmm11,%xmm2 +.byte 102,15,56,0,238 + psrld $6,%xmm7 + movdqa %xmm11,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,144-128(%rax) + paddd %xmm14,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd -96(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm11,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm11,%xmm4 + pslld $26-21,%xmm2 + pandn %xmm13,%xmm0 + pand %xmm12,%xmm4 + pxor %xmm1,%xmm7 + + + movdqa %xmm15,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm15,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm4,%xmm0 + movdqa %xmm8,%xmm4 + movdqa %xmm15,%xmm7 + pslld $10,%xmm2 + pxor %xmm15,%xmm4 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm4,%xmm3 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm8,%xmm14 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm3,%xmm14 + paddd %xmm5,%xmm10 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm14 + paddd %xmm7,%xmm14 + movd 40(%r8),%xmm5 + movd 40(%r9),%xmm0 + movd 40(%r10),%xmm1 + movd 40(%r11),%xmm2 + punpckldq %xmm1,%xmm5 + punpckldq %xmm2,%xmm0 + punpckldq %xmm0,%xmm5 + movdqa %xmm10,%xmm7 +.byte 102,15,56,0,238 + movdqa %xmm10,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm10,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,160-128(%rax) + paddd %xmm13,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd -64(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm10,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm10,%xmm3 + pslld $26-21,%xmm2 + pandn %xmm12,%xmm0 + pand %xmm11,%xmm3 + pxor %xmm1,%xmm7 + + + movdqa %xmm14,%xmm1 + pxor 
%xmm2,%xmm7 + movdqa %xmm14,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + movdqa %xmm15,%xmm3 + movdqa %xmm14,%xmm7 + pslld $10,%xmm2 + pxor %xmm14,%xmm3 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm3,%xmm4 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm15,%xmm13 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm4,%xmm13 + paddd %xmm5,%xmm9 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm13 + paddd %xmm7,%xmm13 + movd 44(%r8),%xmm5 + movd 44(%r9),%xmm0 + movd 44(%r10),%xmm1 + movd 44(%r11),%xmm2 + punpckldq %xmm1,%xmm5 + punpckldq %xmm2,%xmm0 + punpckldq %xmm0,%xmm5 + movdqa %xmm9,%xmm7 + + movdqa %xmm9,%xmm2 +.byte 102,15,56,0,238 + psrld $6,%xmm7 + movdqa %xmm9,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,176-128(%rax) + paddd %xmm12,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd -32(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm9,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm9,%xmm4 + pslld $26-21,%xmm2 + pandn %xmm11,%xmm0 + pand %xmm10,%xmm4 + pxor %xmm1,%xmm7 + + + movdqa %xmm13,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm13,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm4,%xmm0 + movdqa %xmm14,%xmm4 + movdqa %xmm13,%xmm7 + pslld $10,%xmm2 + pxor %xmm13,%xmm4 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm4,%xmm3 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm14,%xmm12 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm3,%xmm12 + paddd %xmm5,%xmm8 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm12 + paddd %xmm7,%xmm12 + movd 48(%r8),%xmm5 + movd 48(%r9),%xmm0 + movd 48(%r10),%xmm1 + movd 48(%r11),%xmm2 + punpckldq %xmm1,%xmm5 + punpckldq %xmm2,%xmm0 + punpckldq %xmm0,%xmm5 + movdqa %xmm8,%xmm7 +.byte 102,15,56,0,238 + movdqa %xmm8,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm8,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,192-128(%rax) + paddd %xmm11,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd 0(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm8,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm8,%xmm3 + pslld $26-21,%xmm2 + pandn %xmm10,%xmm0 + pand %xmm9,%xmm3 + pxor %xmm1,%xmm7 + + + movdqa %xmm12,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm12,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + movdqa %xmm13,%xmm3 + movdqa %xmm12,%xmm7 + pslld $10,%xmm2 + pxor %xmm12,%xmm3 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm3,%xmm4 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm13,%xmm11 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm4,%xmm11 + paddd %xmm5,%xmm15 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm11 + paddd %xmm7,%xmm11 + movd 52(%r8),%xmm5 + movd 52(%r9),%xmm0 + movd 52(%r10),%xmm1 + movd 52(%r11),%xmm2 + punpckldq %xmm1,%xmm5 + punpckldq %xmm2,%xmm0 + punpckldq %xmm0,%xmm5 + movdqa %xmm15,%xmm7 + + movdqa %xmm15,%xmm2 +.byte 102,15,56,0,238 + psrld $6,%xmm7 + movdqa %xmm15,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,208-128(%rax) + paddd %xmm10,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd 32(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm15,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm15,%xmm4 + pslld $26-21,%xmm2 + pandn %xmm9,%xmm0 + pand %xmm8,%xmm4 + pxor %xmm1,%xmm7 + + + movdqa %xmm11,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm11,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm4,%xmm0 + movdqa %xmm12,%xmm4 + movdqa %xmm11,%xmm7 + pslld $10,%xmm2 
+ pxor %xmm11,%xmm4 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm4,%xmm3 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm12,%xmm10 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm3,%xmm10 + paddd %xmm5,%xmm14 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm10 + paddd %xmm7,%xmm10 + movd 56(%r8),%xmm5 + movd 56(%r9),%xmm0 + movd 56(%r10),%xmm1 + movd 56(%r11),%xmm2 + punpckldq %xmm1,%xmm5 + punpckldq %xmm2,%xmm0 + punpckldq %xmm0,%xmm5 + movdqa %xmm14,%xmm7 +.byte 102,15,56,0,238 + movdqa %xmm14,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm14,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,224-128(%rax) + paddd %xmm9,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd 64(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm14,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm14,%xmm3 + pslld $26-21,%xmm2 + pandn %xmm8,%xmm0 + pand %xmm15,%xmm3 + pxor %xmm1,%xmm7 + + + movdqa %xmm10,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm10,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + movdqa %xmm11,%xmm3 + movdqa %xmm10,%xmm7 + pslld $10,%xmm2 + pxor %xmm10,%xmm3 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm3,%xmm4 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm11,%xmm9 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm4,%xmm9 + paddd %xmm5,%xmm13 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm9 + paddd %xmm7,%xmm9 + movd 60(%r8),%xmm5 + leaq 64(%r8),%r8 + movd 60(%r9),%xmm0 + leaq 64(%r9),%r9 + movd 60(%r10),%xmm1 + leaq 64(%r10),%r10 + movd 60(%r11),%xmm2 + leaq 64(%r11),%r11 + punpckldq %xmm1,%xmm5 + punpckldq %xmm2,%xmm0 + punpckldq %xmm0,%xmm5 + movdqa %xmm13,%xmm7 + + movdqa %xmm13,%xmm2 +.byte 102,15,56,0,238 + psrld $6,%xmm7 + movdqa %xmm13,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,240-128(%rax) + paddd %xmm8,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd 96(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm13,%xmm0 + prefetcht0 63(%r8) + pxor %xmm2,%xmm7 + movdqa %xmm13,%xmm4 + pslld $26-21,%xmm2 + pandn %xmm15,%xmm0 + pand %xmm14,%xmm4 + pxor %xmm1,%xmm7 + + prefetcht0 63(%r9) + movdqa %xmm9,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm9,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm4,%xmm0 + movdqa %xmm10,%xmm4 + movdqa %xmm9,%xmm7 + pslld $10,%xmm2 + pxor %xmm9,%xmm4 + + prefetcht0 63(%r10) + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm4,%xmm3 + pxor %xmm7,%xmm1 + + prefetcht0 63(%r11) + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm10,%xmm8 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm3,%xmm8 + paddd %xmm5,%xmm12 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm8 + paddd %xmm7,%xmm8 + leaq 256(%rbp),%rbp + movdqu 0-128(%rax),%xmm5 + movl $3,%ecx + jmp .Loop_16_xx +.align 32 +.Loop_16_xx: + movdqa 16-128(%rax),%xmm6 + paddd 144-128(%rax),%xmm5 + + movdqa %xmm6,%xmm7 + movdqa %xmm6,%xmm1 + psrld $3,%xmm7 + movdqa %xmm6,%xmm2 + + psrld $7,%xmm1 + movdqa 224-128(%rax),%xmm0 + pslld $14,%xmm2 + pxor %xmm1,%xmm7 + psrld $18-7,%xmm1 + movdqa %xmm0,%xmm3 + pxor %xmm2,%xmm7 + pslld $25-14,%xmm2 + pxor %xmm1,%xmm7 + psrld $10,%xmm0 + movdqa %xmm3,%xmm1 + + psrld $17,%xmm3 + pxor %xmm2,%xmm7 + pslld $13,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + psrld $19-17,%xmm3 + pxor %xmm1,%xmm0 + pslld $15-13,%xmm1 + pxor %xmm3,%xmm0 + pxor %xmm1,%xmm0 + paddd %xmm0,%xmm5 + movdqa %xmm12,%xmm7 + + movdqa %xmm12,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm12,%xmm1 + pslld 
$7,%xmm2 + movdqa %xmm5,0-128(%rax) + paddd %xmm15,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd -128(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm12,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm12,%xmm3 + pslld $26-21,%xmm2 + pandn %xmm14,%xmm0 + pand %xmm13,%xmm3 + pxor %xmm1,%xmm7 + + + movdqa %xmm8,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm8,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + movdqa %xmm9,%xmm3 + movdqa %xmm8,%xmm7 + pslld $10,%xmm2 + pxor %xmm8,%xmm3 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm3,%xmm4 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm9,%xmm15 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm4,%xmm15 + paddd %xmm5,%xmm11 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm15 + paddd %xmm7,%xmm15 + movdqa 32-128(%rax),%xmm5 + paddd 160-128(%rax),%xmm6 + + movdqa %xmm5,%xmm7 + movdqa %xmm5,%xmm1 + psrld $3,%xmm7 + movdqa %xmm5,%xmm2 + + psrld $7,%xmm1 + movdqa 240-128(%rax),%xmm0 + pslld $14,%xmm2 + pxor %xmm1,%xmm7 + psrld $18-7,%xmm1 + movdqa %xmm0,%xmm4 + pxor %xmm2,%xmm7 + pslld $25-14,%xmm2 + pxor %xmm1,%xmm7 + psrld $10,%xmm0 + movdqa %xmm4,%xmm1 + + psrld $17,%xmm4 + pxor %xmm2,%xmm7 + pslld $13,%xmm1 + paddd %xmm7,%xmm6 + pxor %xmm4,%xmm0 + psrld $19-17,%xmm4 + pxor %xmm1,%xmm0 + pslld $15-13,%xmm1 + pxor %xmm4,%xmm0 + pxor %xmm1,%xmm0 + paddd %xmm0,%xmm6 + movdqa %xmm11,%xmm7 + + movdqa %xmm11,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm11,%xmm1 + pslld $7,%xmm2 + movdqa %xmm6,16-128(%rax) + paddd %xmm14,%xmm6 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd -96(%rbp),%xmm6 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm11,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm11,%xmm4 + pslld $26-21,%xmm2 + pandn %xmm13,%xmm0 + pand %xmm12,%xmm4 + pxor %xmm1,%xmm7 + + + movdqa %xmm15,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm15,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm6 + pxor %xmm4,%xmm0 + movdqa %xmm8,%xmm4 + movdqa %xmm15,%xmm7 + pslld $10,%xmm2 + pxor %xmm15,%xmm4 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm6 + pslld $19-10,%xmm2 + pand %xmm4,%xmm3 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm8,%xmm14 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm3,%xmm14 + paddd %xmm6,%xmm10 + pxor %xmm2,%xmm7 + + paddd %xmm6,%xmm14 + paddd %xmm7,%xmm14 + movdqa 48-128(%rax),%xmm6 + paddd 176-128(%rax),%xmm5 + + movdqa %xmm6,%xmm7 + movdqa %xmm6,%xmm1 + psrld $3,%xmm7 + movdqa %xmm6,%xmm2 + + psrld $7,%xmm1 + movdqa 0-128(%rax),%xmm0 + pslld $14,%xmm2 + pxor %xmm1,%xmm7 + psrld $18-7,%xmm1 + movdqa %xmm0,%xmm3 + pxor %xmm2,%xmm7 + pslld $25-14,%xmm2 + pxor %xmm1,%xmm7 + psrld $10,%xmm0 + movdqa %xmm3,%xmm1 + + psrld $17,%xmm3 + pxor %xmm2,%xmm7 + pslld $13,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + psrld $19-17,%xmm3 + pxor %xmm1,%xmm0 + pslld $15-13,%xmm1 + pxor %xmm3,%xmm0 + pxor %xmm1,%xmm0 + paddd %xmm0,%xmm5 + movdqa %xmm10,%xmm7 + + movdqa %xmm10,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm10,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,32-128(%rax) + paddd %xmm13,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd -64(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm10,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm10,%xmm3 + pslld $26-21,%xmm2 + pandn %xmm12,%xmm0 + pand %xmm11,%xmm3 + pxor %xmm1,%xmm7 + + + movdqa %xmm14,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm14,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + movdqa %xmm15,%xmm3 + movdqa 
%xmm14,%xmm7 + pslld $10,%xmm2 + pxor %xmm14,%xmm3 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm3,%xmm4 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm15,%xmm13 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm4,%xmm13 + paddd %xmm5,%xmm9 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm13 + paddd %xmm7,%xmm13 + movdqa 64-128(%rax),%xmm5 + paddd 192-128(%rax),%xmm6 + + movdqa %xmm5,%xmm7 + movdqa %xmm5,%xmm1 + psrld $3,%xmm7 + movdqa %xmm5,%xmm2 + + psrld $7,%xmm1 + movdqa 16-128(%rax),%xmm0 + pslld $14,%xmm2 + pxor %xmm1,%xmm7 + psrld $18-7,%xmm1 + movdqa %xmm0,%xmm4 + pxor %xmm2,%xmm7 + pslld $25-14,%xmm2 + pxor %xmm1,%xmm7 + psrld $10,%xmm0 + movdqa %xmm4,%xmm1 + + psrld $17,%xmm4 + pxor %xmm2,%xmm7 + pslld $13,%xmm1 + paddd %xmm7,%xmm6 + pxor %xmm4,%xmm0 + psrld $19-17,%xmm4 + pxor %xmm1,%xmm0 + pslld $15-13,%xmm1 + pxor %xmm4,%xmm0 + pxor %xmm1,%xmm0 + paddd %xmm0,%xmm6 + movdqa %xmm9,%xmm7 + + movdqa %xmm9,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm9,%xmm1 + pslld $7,%xmm2 + movdqa %xmm6,48-128(%rax) + paddd %xmm12,%xmm6 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd -32(%rbp),%xmm6 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm9,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm9,%xmm4 + pslld $26-21,%xmm2 + pandn %xmm11,%xmm0 + pand %xmm10,%xmm4 + pxor %xmm1,%xmm7 + + + movdqa %xmm13,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm13,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm6 + pxor %xmm4,%xmm0 + movdqa %xmm14,%xmm4 + movdqa %xmm13,%xmm7 + pslld $10,%xmm2 + pxor %xmm13,%xmm4 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm6 + pslld $19-10,%xmm2 + pand %xmm4,%xmm3 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm14,%xmm12 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm3,%xmm12 + paddd %xmm6,%xmm8 + pxor %xmm2,%xmm7 + + paddd %xmm6,%xmm12 + paddd %xmm7,%xmm12 + movdqa 80-128(%rax),%xmm6 + paddd 208-128(%rax),%xmm5 + + movdqa %xmm6,%xmm7 + movdqa %xmm6,%xmm1 + psrld $3,%xmm7 + movdqa %xmm6,%xmm2 + + psrld $7,%xmm1 + movdqa 32-128(%rax),%xmm0 + pslld $14,%xmm2 + pxor %xmm1,%xmm7 + psrld $18-7,%xmm1 + movdqa %xmm0,%xmm3 + pxor %xmm2,%xmm7 + pslld $25-14,%xmm2 + pxor %xmm1,%xmm7 + psrld $10,%xmm0 + movdqa %xmm3,%xmm1 + + psrld $17,%xmm3 + pxor %xmm2,%xmm7 + pslld $13,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + psrld $19-17,%xmm3 + pxor %xmm1,%xmm0 + pslld $15-13,%xmm1 + pxor %xmm3,%xmm0 + pxor %xmm1,%xmm0 + paddd %xmm0,%xmm5 + movdqa %xmm8,%xmm7 + + movdqa %xmm8,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm8,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,64-128(%rax) + paddd %xmm11,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd 0(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm8,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm8,%xmm3 + pslld $26-21,%xmm2 + pandn %xmm10,%xmm0 + pand %xmm9,%xmm3 + pxor %xmm1,%xmm7 + + + movdqa %xmm12,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm12,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + movdqa %xmm13,%xmm3 + movdqa %xmm12,%xmm7 + pslld $10,%xmm2 + pxor %xmm12,%xmm3 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm3,%xmm4 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm13,%xmm11 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm4,%xmm11 + paddd %xmm5,%xmm15 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm11 + paddd %xmm7,%xmm11 + movdqa 96-128(%rax),%xmm5 + paddd 224-128(%rax),%xmm6 + + movdqa %xmm5,%xmm7 + movdqa %xmm5,%xmm1 + psrld 
$3,%xmm7 + movdqa %xmm5,%xmm2 + + psrld $7,%xmm1 + movdqa 48-128(%rax),%xmm0 + pslld $14,%xmm2 + pxor %xmm1,%xmm7 + psrld $18-7,%xmm1 + movdqa %xmm0,%xmm4 + pxor %xmm2,%xmm7 + pslld $25-14,%xmm2 + pxor %xmm1,%xmm7 + psrld $10,%xmm0 + movdqa %xmm4,%xmm1 + + psrld $17,%xmm4 + pxor %xmm2,%xmm7 + pslld $13,%xmm1 + paddd %xmm7,%xmm6 + pxor %xmm4,%xmm0 + psrld $19-17,%xmm4 + pxor %xmm1,%xmm0 + pslld $15-13,%xmm1 + pxor %xmm4,%xmm0 + pxor %xmm1,%xmm0 + paddd %xmm0,%xmm6 + movdqa %xmm15,%xmm7 + + movdqa %xmm15,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm15,%xmm1 + pslld $7,%xmm2 + movdqa %xmm6,80-128(%rax) + paddd %xmm10,%xmm6 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd 32(%rbp),%xmm6 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm15,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm15,%xmm4 + pslld $26-21,%xmm2 + pandn %xmm9,%xmm0 + pand %xmm8,%xmm4 + pxor %xmm1,%xmm7 + + + movdqa %xmm11,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm11,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm6 + pxor %xmm4,%xmm0 + movdqa %xmm12,%xmm4 + movdqa %xmm11,%xmm7 + pslld $10,%xmm2 + pxor %xmm11,%xmm4 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm6 + pslld $19-10,%xmm2 + pand %xmm4,%xmm3 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm12,%xmm10 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm3,%xmm10 + paddd %xmm6,%xmm14 + pxor %xmm2,%xmm7 + + paddd %xmm6,%xmm10 + paddd %xmm7,%xmm10 + movdqa 112-128(%rax),%xmm6 + paddd 240-128(%rax),%xmm5 + + movdqa %xmm6,%xmm7 + movdqa %xmm6,%xmm1 + psrld $3,%xmm7 + movdqa %xmm6,%xmm2 + + psrld $7,%xmm1 + movdqa 64-128(%rax),%xmm0 + pslld $14,%xmm2 + pxor %xmm1,%xmm7 + psrld $18-7,%xmm1 + movdqa %xmm0,%xmm3 + pxor %xmm2,%xmm7 + pslld $25-14,%xmm2 + pxor %xmm1,%xmm7 + psrld $10,%xmm0 + movdqa %xmm3,%xmm1 + + psrld $17,%xmm3 + pxor %xmm2,%xmm7 + pslld $13,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + psrld $19-17,%xmm3 + pxor %xmm1,%xmm0 + pslld $15-13,%xmm1 + pxor %xmm3,%xmm0 + pxor %xmm1,%xmm0 + paddd %xmm0,%xmm5 + movdqa %xmm14,%xmm7 + + movdqa %xmm14,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm14,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,96-128(%rax) + paddd %xmm9,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd 64(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm14,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm14,%xmm3 + pslld $26-21,%xmm2 + pandn %xmm8,%xmm0 + pand %xmm15,%xmm3 + pxor %xmm1,%xmm7 + + + movdqa %xmm10,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm10,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + movdqa %xmm11,%xmm3 + movdqa %xmm10,%xmm7 + pslld $10,%xmm2 + pxor %xmm10,%xmm3 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm3,%xmm4 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm11,%xmm9 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm4,%xmm9 + paddd %xmm5,%xmm13 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm9 + paddd %xmm7,%xmm9 + movdqa 128-128(%rax),%xmm5 + paddd 0-128(%rax),%xmm6 + + movdqa %xmm5,%xmm7 + movdqa %xmm5,%xmm1 + psrld $3,%xmm7 + movdqa %xmm5,%xmm2 + + psrld $7,%xmm1 + movdqa 80-128(%rax),%xmm0 + pslld $14,%xmm2 + pxor %xmm1,%xmm7 + psrld $18-7,%xmm1 + movdqa %xmm0,%xmm4 + pxor %xmm2,%xmm7 + pslld $25-14,%xmm2 + pxor %xmm1,%xmm7 + psrld $10,%xmm0 + movdqa %xmm4,%xmm1 + + psrld $17,%xmm4 + pxor %xmm2,%xmm7 + pslld $13,%xmm1 + paddd %xmm7,%xmm6 + pxor %xmm4,%xmm0 + psrld $19-17,%xmm4 + pxor %xmm1,%xmm0 + pslld $15-13,%xmm1 + pxor %xmm4,%xmm0 + pxor %xmm1,%xmm0 + paddd %xmm0,%xmm6 + movdqa %xmm13,%xmm7 + 
+ movdqa %xmm13,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm13,%xmm1 + pslld $7,%xmm2 + movdqa %xmm6,112-128(%rax) + paddd %xmm8,%xmm6 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd 96(%rbp),%xmm6 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm13,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm13,%xmm4 + pslld $26-21,%xmm2 + pandn %xmm15,%xmm0 + pand %xmm14,%xmm4 + pxor %xmm1,%xmm7 + + + movdqa %xmm9,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm9,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm6 + pxor %xmm4,%xmm0 + movdqa %xmm10,%xmm4 + movdqa %xmm9,%xmm7 + pslld $10,%xmm2 + pxor %xmm9,%xmm4 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm6 + pslld $19-10,%xmm2 + pand %xmm4,%xmm3 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm10,%xmm8 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm3,%xmm8 + paddd %xmm6,%xmm12 + pxor %xmm2,%xmm7 + + paddd %xmm6,%xmm8 + paddd %xmm7,%xmm8 + leaq 256(%rbp),%rbp + movdqa 144-128(%rax),%xmm6 + paddd 16-128(%rax),%xmm5 + + movdqa %xmm6,%xmm7 + movdqa %xmm6,%xmm1 + psrld $3,%xmm7 + movdqa %xmm6,%xmm2 + + psrld $7,%xmm1 + movdqa 96-128(%rax),%xmm0 + pslld $14,%xmm2 + pxor %xmm1,%xmm7 + psrld $18-7,%xmm1 + movdqa %xmm0,%xmm3 + pxor %xmm2,%xmm7 + pslld $25-14,%xmm2 + pxor %xmm1,%xmm7 + psrld $10,%xmm0 + movdqa %xmm3,%xmm1 + + psrld $17,%xmm3 + pxor %xmm2,%xmm7 + pslld $13,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + psrld $19-17,%xmm3 + pxor %xmm1,%xmm0 + pslld $15-13,%xmm1 + pxor %xmm3,%xmm0 + pxor %xmm1,%xmm0 + paddd %xmm0,%xmm5 + movdqa %xmm12,%xmm7 + + movdqa %xmm12,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm12,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,128-128(%rax) + paddd %xmm15,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd -128(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm12,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm12,%xmm3 + pslld $26-21,%xmm2 + pandn %xmm14,%xmm0 + pand %xmm13,%xmm3 + pxor %xmm1,%xmm7 + + + movdqa %xmm8,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm8,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + movdqa %xmm9,%xmm3 + movdqa %xmm8,%xmm7 + pslld $10,%xmm2 + pxor %xmm8,%xmm3 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm3,%xmm4 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm9,%xmm15 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm4,%xmm15 + paddd %xmm5,%xmm11 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm15 + paddd %xmm7,%xmm15 + movdqa 160-128(%rax),%xmm5 + paddd 32-128(%rax),%xmm6 + + movdqa %xmm5,%xmm7 + movdqa %xmm5,%xmm1 + psrld $3,%xmm7 + movdqa %xmm5,%xmm2 + + psrld $7,%xmm1 + movdqa 112-128(%rax),%xmm0 + pslld $14,%xmm2 + pxor %xmm1,%xmm7 + psrld $18-7,%xmm1 + movdqa %xmm0,%xmm4 + pxor %xmm2,%xmm7 + pslld $25-14,%xmm2 + pxor %xmm1,%xmm7 + psrld $10,%xmm0 + movdqa %xmm4,%xmm1 + + psrld $17,%xmm4 + pxor %xmm2,%xmm7 + pslld $13,%xmm1 + paddd %xmm7,%xmm6 + pxor %xmm4,%xmm0 + psrld $19-17,%xmm4 + pxor %xmm1,%xmm0 + pslld $15-13,%xmm1 + pxor %xmm4,%xmm0 + pxor %xmm1,%xmm0 + paddd %xmm0,%xmm6 + movdqa %xmm11,%xmm7 + + movdqa %xmm11,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm11,%xmm1 + pslld $7,%xmm2 + movdqa %xmm6,144-128(%rax) + paddd %xmm14,%xmm6 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd -96(%rbp),%xmm6 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm11,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm11,%xmm4 + pslld $26-21,%xmm2 + pandn %xmm13,%xmm0 + pand %xmm12,%xmm4 + pxor %xmm1,%xmm7 + + + movdqa %xmm15,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm15,%xmm2 + 
psrld $2,%xmm1 + paddd %xmm7,%xmm6 + pxor %xmm4,%xmm0 + movdqa %xmm8,%xmm4 + movdqa %xmm15,%xmm7 + pslld $10,%xmm2 + pxor %xmm15,%xmm4 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm6 + pslld $19-10,%xmm2 + pand %xmm4,%xmm3 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm8,%xmm14 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm3,%xmm14 + paddd %xmm6,%xmm10 + pxor %xmm2,%xmm7 + + paddd %xmm6,%xmm14 + paddd %xmm7,%xmm14 + movdqa 176-128(%rax),%xmm6 + paddd 48-128(%rax),%xmm5 + + movdqa %xmm6,%xmm7 + movdqa %xmm6,%xmm1 + psrld $3,%xmm7 + movdqa %xmm6,%xmm2 + + psrld $7,%xmm1 + movdqa 128-128(%rax),%xmm0 + pslld $14,%xmm2 + pxor %xmm1,%xmm7 + psrld $18-7,%xmm1 + movdqa %xmm0,%xmm3 + pxor %xmm2,%xmm7 + pslld $25-14,%xmm2 + pxor %xmm1,%xmm7 + psrld $10,%xmm0 + movdqa %xmm3,%xmm1 + + psrld $17,%xmm3 + pxor %xmm2,%xmm7 + pslld $13,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + psrld $19-17,%xmm3 + pxor %xmm1,%xmm0 + pslld $15-13,%xmm1 + pxor %xmm3,%xmm0 + pxor %xmm1,%xmm0 + paddd %xmm0,%xmm5 + movdqa %xmm10,%xmm7 + + movdqa %xmm10,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm10,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,160-128(%rax) + paddd %xmm13,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd -64(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm10,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm10,%xmm3 + pslld $26-21,%xmm2 + pandn %xmm12,%xmm0 + pand %xmm11,%xmm3 + pxor %xmm1,%xmm7 + + + movdqa %xmm14,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm14,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + movdqa %xmm15,%xmm3 + movdqa %xmm14,%xmm7 + pslld $10,%xmm2 + pxor %xmm14,%xmm3 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm3,%xmm4 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm15,%xmm13 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm4,%xmm13 + paddd %xmm5,%xmm9 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm13 + paddd %xmm7,%xmm13 + movdqa 192-128(%rax),%xmm5 + paddd 64-128(%rax),%xmm6 + + movdqa %xmm5,%xmm7 + movdqa %xmm5,%xmm1 + psrld $3,%xmm7 + movdqa %xmm5,%xmm2 + + psrld $7,%xmm1 + movdqa 144-128(%rax),%xmm0 + pslld $14,%xmm2 + pxor %xmm1,%xmm7 + psrld $18-7,%xmm1 + movdqa %xmm0,%xmm4 + pxor %xmm2,%xmm7 + pslld $25-14,%xmm2 + pxor %xmm1,%xmm7 + psrld $10,%xmm0 + movdqa %xmm4,%xmm1 + + psrld $17,%xmm4 + pxor %xmm2,%xmm7 + pslld $13,%xmm1 + paddd %xmm7,%xmm6 + pxor %xmm4,%xmm0 + psrld $19-17,%xmm4 + pxor %xmm1,%xmm0 + pslld $15-13,%xmm1 + pxor %xmm4,%xmm0 + pxor %xmm1,%xmm0 + paddd %xmm0,%xmm6 + movdqa %xmm9,%xmm7 + + movdqa %xmm9,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm9,%xmm1 + pslld $7,%xmm2 + movdqa %xmm6,176-128(%rax) + paddd %xmm12,%xmm6 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd -32(%rbp),%xmm6 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm9,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm9,%xmm4 + pslld $26-21,%xmm2 + pandn %xmm11,%xmm0 + pand %xmm10,%xmm4 + pxor %xmm1,%xmm7 + + + movdqa %xmm13,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm13,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm6 + pxor %xmm4,%xmm0 + movdqa %xmm14,%xmm4 + movdqa %xmm13,%xmm7 + pslld $10,%xmm2 + pxor %xmm13,%xmm4 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm6 + pslld $19-10,%xmm2 + pand %xmm4,%xmm3 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm14,%xmm12 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm3,%xmm12 + paddd %xmm6,%xmm8 + pxor %xmm2,%xmm7 + + paddd %xmm6,%xmm12 + paddd %xmm7,%xmm12 + movdqa 
208-128(%rax),%xmm6 + paddd 80-128(%rax),%xmm5 + + movdqa %xmm6,%xmm7 + movdqa %xmm6,%xmm1 + psrld $3,%xmm7 + movdqa %xmm6,%xmm2 + + psrld $7,%xmm1 + movdqa 160-128(%rax),%xmm0 + pslld $14,%xmm2 + pxor %xmm1,%xmm7 + psrld $18-7,%xmm1 + movdqa %xmm0,%xmm3 + pxor %xmm2,%xmm7 + pslld $25-14,%xmm2 + pxor %xmm1,%xmm7 + psrld $10,%xmm0 + movdqa %xmm3,%xmm1 + + psrld $17,%xmm3 + pxor %xmm2,%xmm7 + pslld $13,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + psrld $19-17,%xmm3 + pxor %xmm1,%xmm0 + pslld $15-13,%xmm1 + pxor %xmm3,%xmm0 + pxor %xmm1,%xmm0 + paddd %xmm0,%xmm5 + movdqa %xmm8,%xmm7 + + movdqa %xmm8,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm8,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,192-128(%rax) + paddd %xmm11,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd 0(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm8,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm8,%xmm3 + pslld $26-21,%xmm2 + pandn %xmm10,%xmm0 + pand %xmm9,%xmm3 + pxor %xmm1,%xmm7 + + + movdqa %xmm12,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm12,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + movdqa %xmm13,%xmm3 + movdqa %xmm12,%xmm7 + pslld $10,%xmm2 + pxor %xmm12,%xmm3 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm3,%xmm4 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm13,%xmm11 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm4,%xmm11 + paddd %xmm5,%xmm15 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm11 + paddd %xmm7,%xmm11 + movdqa 224-128(%rax),%xmm5 + paddd 96-128(%rax),%xmm6 + + movdqa %xmm5,%xmm7 + movdqa %xmm5,%xmm1 + psrld $3,%xmm7 + movdqa %xmm5,%xmm2 + + psrld $7,%xmm1 + movdqa 176-128(%rax),%xmm0 + pslld $14,%xmm2 + pxor %xmm1,%xmm7 + psrld $18-7,%xmm1 + movdqa %xmm0,%xmm4 + pxor %xmm2,%xmm7 + pslld $25-14,%xmm2 + pxor %xmm1,%xmm7 + psrld $10,%xmm0 + movdqa %xmm4,%xmm1 + + psrld $17,%xmm4 + pxor %xmm2,%xmm7 + pslld $13,%xmm1 + paddd %xmm7,%xmm6 + pxor %xmm4,%xmm0 + psrld $19-17,%xmm4 + pxor %xmm1,%xmm0 + pslld $15-13,%xmm1 + pxor %xmm4,%xmm0 + pxor %xmm1,%xmm0 + paddd %xmm0,%xmm6 + movdqa %xmm15,%xmm7 + + movdqa %xmm15,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm15,%xmm1 + pslld $7,%xmm2 + movdqa %xmm6,208-128(%rax) + paddd %xmm10,%xmm6 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd 32(%rbp),%xmm6 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm15,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm15,%xmm4 + pslld $26-21,%xmm2 + pandn %xmm9,%xmm0 + pand %xmm8,%xmm4 + pxor %xmm1,%xmm7 + + + movdqa %xmm11,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm11,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm6 + pxor %xmm4,%xmm0 + movdqa %xmm12,%xmm4 + movdqa %xmm11,%xmm7 + pslld $10,%xmm2 + pxor %xmm11,%xmm4 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm6 + pslld $19-10,%xmm2 + pand %xmm4,%xmm3 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm12,%xmm10 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm3,%xmm10 + paddd %xmm6,%xmm14 + pxor %xmm2,%xmm7 + + paddd %xmm6,%xmm10 + paddd %xmm7,%xmm10 + movdqa 240-128(%rax),%xmm6 + paddd 112-128(%rax),%xmm5 + + movdqa %xmm6,%xmm7 + movdqa %xmm6,%xmm1 + psrld $3,%xmm7 + movdqa %xmm6,%xmm2 + + psrld $7,%xmm1 + movdqa 192-128(%rax),%xmm0 + pslld $14,%xmm2 + pxor %xmm1,%xmm7 + psrld $18-7,%xmm1 + movdqa %xmm0,%xmm3 + pxor %xmm2,%xmm7 + pslld $25-14,%xmm2 + pxor %xmm1,%xmm7 + psrld $10,%xmm0 + movdqa %xmm3,%xmm1 + + psrld $17,%xmm3 + pxor %xmm2,%xmm7 + pslld $13,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + psrld $19-17,%xmm3 + pxor %xmm1,%xmm0 
+ pslld $15-13,%xmm1 + pxor %xmm3,%xmm0 + pxor %xmm1,%xmm0 + paddd %xmm0,%xmm5 + movdqa %xmm14,%xmm7 + + movdqa %xmm14,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm14,%xmm1 + pslld $7,%xmm2 + movdqa %xmm5,224-128(%rax) + paddd %xmm9,%xmm5 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd 64(%rbp),%xmm5 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm14,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm14,%xmm3 + pslld $26-21,%xmm2 + pandn %xmm8,%xmm0 + pand %xmm15,%xmm3 + pxor %xmm1,%xmm7 + + + movdqa %xmm10,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm10,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm5 + pxor %xmm3,%xmm0 + movdqa %xmm11,%xmm3 + movdqa %xmm10,%xmm7 + pslld $10,%xmm2 + pxor %xmm10,%xmm3 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm5 + pslld $19-10,%xmm2 + pand %xmm3,%xmm4 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm11,%xmm9 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm4,%xmm9 + paddd %xmm5,%xmm13 + pxor %xmm2,%xmm7 + + paddd %xmm5,%xmm9 + paddd %xmm7,%xmm9 + movdqa 0-128(%rax),%xmm5 + paddd 128-128(%rax),%xmm6 + + movdqa %xmm5,%xmm7 + movdqa %xmm5,%xmm1 + psrld $3,%xmm7 + movdqa %xmm5,%xmm2 + + psrld $7,%xmm1 + movdqa 208-128(%rax),%xmm0 + pslld $14,%xmm2 + pxor %xmm1,%xmm7 + psrld $18-7,%xmm1 + movdqa %xmm0,%xmm4 + pxor %xmm2,%xmm7 + pslld $25-14,%xmm2 + pxor %xmm1,%xmm7 + psrld $10,%xmm0 + movdqa %xmm4,%xmm1 + + psrld $17,%xmm4 + pxor %xmm2,%xmm7 + pslld $13,%xmm1 + paddd %xmm7,%xmm6 + pxor %xmm4,%xmm0 + psrld $19-17,%xmm4 + pxor %xmm1,%xmm0 + pslld $15-13,%xmm1 + pxor %xmm4,%xmm0 + pxor %xmm1,%xmm0 + paddd %xmm0,%xmm6 + movdqa %xmm13,%xmm7 + + movdqa %xmm13,%xmm2 + + psrld $6,%xmm7 + movdqa %xmm13,%xmm1 + pslld $7,%xmm2 + movdqa %xmm6,240-128(%rax) + paddd %xmm8,%xmm6 + + psrld $11,%xmm1 + pxor %xmm2,%xmm7 + pslld $21-7,%xmm2 + paddd 96(%rbp),%xmm6 + pxor %xmm1,%xmm7 + + psrld $25-11,%xmm1 + movdqa %xmm13,%xmm0 + + pxor %xmm2,%xmm7 + movdqa %xmm13,%xmm4 + pslld $26-21,%xmm2 + pandn %xmm15,%xmm0 + pand %xmm14,%xmm4 + pxor %xmm1,%xmm7 + + + movdqa %xmm9,%xmm1 + pxor %xmm2,%xmm7 + movdqa %xmm9,%xmm2 + psrld $2,%xmm1 + paddd %xmm7,%xmm6 + pxor %xmm4,%xmm0 + movdqa %xmm10,%xmm4 + movdqa %xmm9,%xmm7 + pslld $10,%xmm2 + pxor %xmm9,%xmm4 + + + psrld $13,%xmm7 + pxor %xmm2,%xmm1 + paddd %xmm0,%xmm6 + pslld $19-10,%xmm2 + pand %xmm4,%xmm3 + pxor %xmm7,%xmm1 + + + psrld $22-13,%xmm7 + pxor %xmm2,%xmm1 + movdqa %xmm10,%xmm8 + pslld $30-19,%xmm2 + pxor %xmm1,%xmm7 + pxor %xmm3,%xmm8 + paddd %xmm6,%xmm12 + pxor %xmm2,%xmm7 + + paddd %xmm6,%xmm8 + paddd %xmm7,%xmm8 + leaq 256(%rbp),%rbp + decl %ecx + jnz .Loop_16_xx + + movl $1,%ecx + leaq K256+128(%rip),%rbp + + movdqa (%rbx),%xmm7 + cmpl 0(%rbx),%ecx + pxor %xmm0,%xmm0 + cmovgeq %rbp,%r8 + cmpl 4(%rbx),%ecx + movdqa %xmm7,%xmm6 + cmovgeq %rbp,%r9 + cmpl 8(%rbx),%ecx + pcmpgtd %xmm0,%xmm6 + cmovgeq %rbp,%r10 + cmpl 12(%rbx),%ecx + paddd %xmm6,%xmm7 + cmovgeq %rbp,%r11 + + movdqu 0-128(%rdi),%xmm0 + pand %xmm6,%xmm8 + movdqu 32-128(%rdi),%xmm1 + pand %xmm6,%xmm9 + movdqu 64-128(%rdi),%xmm2 + pand %xmm6,%xmm10 + movdqu 96-128(%rdi),%xmm5 + pand %xmm6,%xmm11 + paddd %xmm0,%xmm8 + movdqu 128-128(%rdi),%xmm0 + pand %xmm6,%xmm12 + paddd %xmm1,%xmm9 + movdqu 160-128(%rdi),%xmm1 + pand %xmm6,%xmm13 + paddd %xmm2,%xmm10 + movdqu 192-128(%rdi),%xmm2 + pand %xmm6,%xmm14 + paddd %xmm5,%xmm11 + movdqu 224-128(%rdi),%xmm5 + pand %xmm6,%xmm15 + paddd %xmm0,%xmm12 + paddd %xmm1,%xmm13 + movdqu %xmm8,0-128(%rdi) + paddd %xmm2,%xmm14 + movdqu %xmm9,32-128(%rdi) + paddd %xmm5,%xmm15 + movdqu 
%xmm10,64-128(%rdi) + movdqu %xmm11,96-128(%rdi) + movdqu %xmm12,128-128(%rdi) + movdqu %xmm13,160-128(%rdi) + movdqu %xmm14,192-128(%rdi) + movdqu %xmm15,224-128(%rdi) + + movdqa %xmm7,(%rbx) + movdqa .Lpbswap(%rip),%xmm6 + decl %edx + jnz .Loop + + movl 280(%rsp),%edx + leaq 16(%rdi),%rdi + leaq 64(%rsi),%rsi + decl %edx + jnz .Loop_grande + +.Ldone: + movq 272(%rsp),%rax +.cfi_def_cfa %rax,8 + movq -16(%rax),%rbp +.cfi_restore %rbp + movq -8(%rax),%rbx +.cfi_restore %rbx + leaq (%rax),%rsp +.cfi_def_cfa_register %rsp +.Lepilogue: + .byte 0xf3,0xc3 +.cfi_endproc +.size sha256_multi_block,.-sha256_multi_block +.type sha256_multi_block_shaext,@function +.align 32 +sha256_multi_block_shaext: +.cfi_startproc +_shaext_shortcut: + movq %rsp,%rax +.cfi_def_cfa_register %rax + pushq %rbx +.cfi_offset %rbx,-16 + pushq %rbp +.cfi_offset %rbp,-24 + subq $288,%rsp + shll $1,%edx + andq $-256,%rsp + leaq 128(%rdi),%rdi + movq %rax,272(%rsp) +.Lbody_shaext: + leaq 256(%rsp),%rbx + leaq K256_shaext+128(%rip),%rbp + +.Loop_grande_shaext: + movl %edx,280(%rsp) + xorl %edx,%edx + movq 0(%rsi),%r8 + movl 8(%rsi),%ecx + cmpl %edx,%ecx + cmovgl %ecx,%edx + testl %ecx,%ecx + movl %ecx,0(%rbx) + cmovleq %rsp,%r8 + movq 16(%rsi),%r9 + movl 24(%rsi),%ecx + cmpl %edx,%ecx + cmovgl %ecx,%edx + testl %ecx,%ecx + movl %ecx,4(%rbx) + cmovleq %rsp,%r9 + testl %edx,%edx + jz .Ldone_shaext + + movq 0-128(%rdi),%xmm12 + movq 32-128(%rdi),%xmm4 + movq 64-128(%rdi),%xmm13 + movq 96-128(%rdi),%xmm5 + movq 128-128(%rdi),%xmm8 + movq 160-128(%rdi),%xmm9 + movq 192-128(%rdi),%xmm10 + movq 224-128(%rdi),%xmm11 + + punpckldq %xmm4,%xmm12 + punpckldq %xmm5,%xmm13 + punpckldq %xmm9,%xmm8 + punpckldq %xmm11,%xmm10 + movdqa K256_shaext-16(%rip),%xmm3 + + movdqa %xmm12,%xmm14 + movdqa %xmm13,%xmm15 + punpcklqdq %xmm8,%xmm12 + punpcklqdq %xmm10,%xmm13 + punpckhqdq %xmm8,%xmm14 + punpckhqdq %xmm10,%xmm15 + + pshufd $27,%xmm12,%xmm12 + pshufd $27,%xmm13,%xmm13 + pshufd $27,%xmm14,%xmm14 + pshufd $27,%xmm15,%xmm15 + jmp .Loop_shaext + +.align 32 +.Loop_shaext: + movdqu 0(%r8),%xmm4 + movdqu 0(%r9),%xmm8 + movdqu 16(%r8),%xmm5 + movdqu 16(%r9),%xmm9 + movdqu 32(%r8),%xmm6 +.byte 102,15,56,0,227 + movdqu 32(%r9),%xmm10 +.byte 102,68,15,56,0,195 + movdqu 48(%r8),%xmm7 + leaq 64(%r8),%r8 + movdqu 48(%r9),%xmm11 + leaq 64(%r9),%r9 + + movdqa 0-128(%rbp),%xmm0 +.byte 102,15,56,0,235 + paddd %xmm4,%xmm0 + pxor %xmm12,%xmm4 + movdqa %xmm0,%xmm1 + movdqa 0-128(%rbp),%xmm2 +.byte 102,68,15,56,0,203 + paddd %xmm8,%xmm2 + movdqa %xmm13,80(%rsp) +.byte 69,15,56,203,236 + pxor %xmm14,%xmm8 + movdqa %xmm2,%xmm0 + movdqa %xmm15,112(%rsp) +.byte 69,15,56,203,254 + pshufd $0x0e,%xmm1,%xmm0 + pxor %xmm12,%xmm4 + movdqa %xmm12,64(%rsp) +.byte 69,15,56,203,229 + pshufd $0x0e,%xmm2,%xmm0 + pxor %xmm14,%xmm8 + movdqa %xmm14,96(%rsp) + movdqa 16-128(%rbp),%xmm1 + paddd %xmm5,%xmm1 +.byte 102,15,56,0,243 +.byte 69,15,56,203,247 + + movdqa %xmm1,%xmm0 + movdqa 16-128(%rbp),%xmm2 + paddd %xmm9,%xmm2 +.byte 69,15,56,203,236 + movdqa %xmm2,%xmm0 + prefetcht0 127(%r8) +.byte 102,15,56,0,251 +.byte 102,68,15,56,0,211 + prefetcht0 127(%r9) +.byte 69,15,56,203,254 + pshufd $0x0e,%xmm1,%xmm0 +.byte 102,68,15,56,0,219 +.byte 15,56,204,229 +.byte 69,15,56,203,229 + pshufd $0x0e,%xmm2,%xmm0 + movdqa 32-128(%rbp),%xmm1 + paddd %xmm6,%xmm1 +.byte 69,15,56,203,247 + + movdqa %xmm1,%xmm0 + movdqa 32-128(%rbp),%xmm2 + paddd %xmm10,%xmm2 +.byte 69,15,56,203,236 +.byte 69,15,56,204,193 + movdqa %xmm2,%xmm0 + movdqa %xmm7,%xmm3 +.byte 69,15,56,203,254 + pshufd $0x0e,%xmm1,%xmm0 
+.byte 102,15,58,15,222,4 + paddd %xmm3,%xmm4 + movdqa %xmm11,%xmm3 +.byte 102,65,15,58,15,218,4 +.byte 15,56,204,238 +.byte 69,15,56,203,229 + pshufd $0x0e,%xmm2,%xmm0 + movdqa 48-128(%rbp),%xmm1 + paddd %xmm7,%xmm1 +.byte 69,15,56,203,247 +.byte 69,15,56,204,202 + + movdqa %xmm1,%xmm0 + movdqa 48-128(%rbp),%xmm2 + paddd %xmm3,%xmm8 + paddd %xmm11,%xmm2 +.byte 15,56,205,231 +.byte 69,15,56,203,236 + movdqa %xmm2,%xmm0 + movdqa %xmm4,%xmm3 +.byte 102,15,58,15,223,4 +.byte 69,15,56,203,254 +.byte 69,15,56,205,195 + pshufd $0x0e,%xmm1,%xmm0 + paddd %xmm3,%xmm5 + movdqa %xmm8,%xmm3 +.byte 102,65,15,58,15,219,4 +.byte 15,56,204,247 +.byte 69,15,56,203,229 + pshufd $0x0e,%xmm2,%xmm0 + movdqa 64-128(%rbp),%xmm1 + paddd %xmm4,%xmm1 +.byte 69,15,56,203,247 +.byte 69,15,56,204,211 + movdqa %xmm1,%xmm0 + movdqa 64-128(%rbp),%xmm2 + paddd %xmm3,%xmm9 + paddd %xmm8,%xmm2 +.byte 15,56,205,236 +.byte 69,15,56,203,236 + movdqa %xmm2,%xmm0 + movdqa %xmm5,%xmm3 +.byte 102,15,58,15,220,4 +.byte 69,15,56,203,254 +.byte 69,15,56,205,200 + pshufd $0x0e,%xmm1,%xmm0 + paddd %xmm3,%xmm6 + movdqa %xmm9,%xmm3 +.byte 102,65,15,58,15,216,4 +.byte 15,56,204,252 +.byte 69,15,56,203,229 + pshufd $0x0e,%xmm2,%xmm0 + movdqa 80-128(%rbp),%xmm1 + paddd %xmm5,%xmm1 +.byte 69,15,56,203,247 +.byte 69,15,56,204,216 + movdqa %xmm1,%xmm0 + movdqa 80-128(%rbp),%xmm2 + paddd %xmm3,%xmm10 + paddd %xmm9,%xmm2 +.byte 15,56,205,245 +.byte 69,15,56,203,236 + movdqa %xmm2,%xmm0 + movdqa %xmm6,%xmm3 +.byte 102,15,58,15,221,4 +.byte 69,15,56,203,254 +.byte 69,15,56,205,209 + pshufd $0x0e,%xmm1,%xmm0 + paddd %xmm3,%xmm7 + movdqa %xmm10,%xmm3 +.byte 102,65,15,58,15,217,4 +.byte 15,56,204,229 +.byte 69,15,56,203,229 + pshufd $0x0e,%xmm2,%xmm0 + movdqa 96-128(%rbp),%xmm1 + paddd %xmm6,%xmm1 +.byte 69,15,56,203,247 +.byte 69,15,56,204,193 + movdqa %xmm1,%xmm0 + movdqa 96-128(%rbp),%xmm2 + paddd %xmm3,%xmm11 + paddd %xmm10,%xmm2 +.byte 15,56,205,254 +.byte 69,15,56,203,236 + movdqa %xmm2,%xmm0 + movdqa %xmm7,%xmm3 +.byte 102,15,58,15,222,4 +.byte 69,15,56,203,254 +.byte 69,15,56,205,218 + pshufd $0x0e,%xmm1,%xmm0 + paddd %xmm3,%xmm4 + movdqa %xmm11,%xmm3 +.byte 102,65,15,58,15,218,4 +.byte 15,56,204,238 +.byte 69,15,56,203,229 + pshufd $0x0e,%xmm2,%xmm0 + movdqa 112-128(%rbp),%xmm1 + paddd %xmm7,%xmm1 +.byte 69,15,56,203,247 +.byte 69,15,56,204,202 + movdqa %xmm1,%xmm0 + movdqa 112-128(%rbp),%xmm2 + paddd %xmm3,%xmm8 + paddd %xmm11,%xmm2 +.byte 15,56,205,231 +.byte 69,15,56,203,236 + movdqa %xmm2,%xmm0 + movdqa %xmm4,%xmm3 +.byte 102,15,58,15,223,4 +.byte 69,15,56,203,254 +.byte 69,15,56,205,195 + pshufd $0x0e,%xmm1,%xmm0 + paddd %xmm3,%xmm5 + movdqa %xmm8,%xmm3 +.byte 102,65,15,58,15,219,4 +.byte 15,56,204,247 +.byte 69,15,56,203,229 + pshufd $0x0e,%xmm2,%xmm0 + movdqa 128-128(%rbp),%xmm1 + paddd %xmm4,%xmm1 +.byte 69,15,56,203,247 +.byte 69,15,56,204,211 + movdqa %xmm1,%xmm0 + movdqa 128-128(%rbp),%xmm2 + paddd %xmm3,%xmm9 + paddd %xmm8,%xmm2 +.byte 15,56,205,236 +.byte 69,15,56,203,236 + movdqa %xmm2,%xmm0 + movdqa %xmm5,%xmm3 +.byte 102,15,58,15,220,4 +.byte 69,15,56,203,254 +.byte 69,15,56,205,200 + pshufd $0x0e,%xmm1,%xmm0 + paddd %xmm3,%xmm6 + movdqa %xmm9,%xmm3 +.byte 102,65,15,58,15,216,4 +.byte 15,56,204,252 +.byte 69,15,56,203,229 + pshufd $0x0e,%xmm2,%xmm0 + movdqa 144-128(%rbp),%xmm1 + paddd %xmm5,%xmm1 +.byte 69,15,56,203,247 +.byte 69,15,56,204,216 + movdqa %xmm1,%xmm0 + movdqa 144-128(%rbp),%xmm2 + paddd %xmm3,%xmm10 + paddd %xmm9,%xmm2 +.byte 15,56,205,245 +.byte 69,15,56,203,236 + movdqa %xmm2,%xmm0 + movdqa %xmm6,%xmm3 +.byte 
102,15,58,15,221,4 +.byte 69,15,56,203,254 +.byte 69,15,56,205,209 + pshufd $0x0e,%xmm1,%xmm0 + paddd %xmm3,%xmm7 + movdqa %xmm10,%xmm3 +.byte 102,65,15,58,15,217,4 +.byte 15,56,204,229 +.byte 69,15,56,203,229 + pshufd $0x0e,%xmm2,%xmm0 + movdqa 160-128(%rbp),%xmm1 + paddd %xmm6,%xmm1 +.byte 69,15,56,203,247 +.byte 69,15,56,204,193 + movdqa %xmm1,%xmm0 + movdqa 160-128(%rbp),%xmm2 + paddd %xmm3,%xmm11 + paddd %xmm10,%xmm2 +.byte 15,56,205,254 +.byte 69,15,56,203,236 + movdqa %xmm2,%xmm0 + movdqa %xmm7,%xmm3 +.byte 102,15,58,15,222,4 +.byte 69,15,56,203,254 +.byte 69,15,56,205,218 + pshufd $0x0e,%xmm1,%xmm0 + paddd %xmm3,%xmm4 + movdqa %xmm11,%xmm3 +.byte 102,65,15,58,15,218,4 +.byte 15,56,204,238 +.byte 69,15,56,203,229 + pshufd $0x0e,%xmm2,%xmm0 + movdqa 176-128(%rbp),%xmm1 + paddd %xmm7,%xmm1 +.byte 69,15,56,203,247 +.byte 69,15,56,204,202 + movdqa %xmm1,%xmm0 + movdqa 176-128(%rbp),%xmm2 + paddd %xmm3,%xmm8 + paddd %xmm11,%xmm2 +.byte 15,56,205,231 +.byte 69,15,56,203,236 + movdqa %xmm2,%xmm0 + movdqa %xmm4,%xmm3 +.byte 102,15,58,15,223,4 +.byte 69,15,56,203,254 +.byte 69,15,56,205,195 + pshufd $0x0e,%xmm1,%xmm0 + paddd %xmm3,%xmm5 + movdqa %xmm8,%xmm3 +.byte 102,65,15,58,15,219,4 +.byte 15,56,204,247 +.byte 69,15,56,203,229 + pshufd $0x0e,%xmm2,%xmm0 + movdqa 192-128(%rbp),%xmm1 + paddd %xmm4,%xmm1 +.byte 69,15,56,203,247 +.byte 69,15,56,204,211 + movdqa %xmm1,%xmm0 + movdqa 192-128(%rbp),%xmm2 + paddd %xmm3,%xmm9 + paddd %xmm8,%xmm2 +.byte 15,56,205,236 +.byte 69,15,56,203,236 + movdqa %xmm2,%xmm0 + movdqa %xmm5,%xmm3 +.byte 102,15,58,15,220,4 +.byte 69,15,56,203,254 +.byte 69,15,56,205,200 + pshufd $0x0e,%xmm1,%xmm0 + paddd %xmm3,%xmm6 + movdqa %xmm9,%xmm3 +.byte 102,65,15,58,15,216,4 +.byte 15,56,204,252 +.byte 69,15,56,203,229 + pshufd $0x0e,%xmm2,%xmm0 + movdqa 208-128(%rbp),%xmm1 + paddd %xmm5,%xmm1 +.byte 69,15,56,203,247 +.byte 69,15,56,204,216 + movdqa %xmm1,%xmm0 + movdqa 208-128(%rbp),%xmm2 + paddd %xmm3,%xmm10 + paddd %xmm9,%xmm2 +.byte 15,56,205,245 +.byte 69,15,56,203,236 + movdqa %xmm2,%xmm0 + movdqa %xmm6,%xmm3 +.byte 102,15,58,15,221,4 +.byte 69,15,56,203,254 +.byte 69,15,56,205,209 + pshufd $0x0e,%xmm1,%xmm0 + paddd %xmm3,%xmm7 + movdqa %xmm10,%xmm3 +.byte 102,65,15,58,15,217,4 + nop +.byte 69,15,56,203,229 + pshufd $0x0e,%xmm2,%xmm0 + movdqa 224-128(%rbp),%xmm1 + paddd %xmm6,%xmm1 +.byte 69,15,56,203,247 + + movdqa %xmm1,%xmm0 + movdqa 224-128(%rbp),%xmm2 + paddd %xmm3,%xmm11 + paddd %xmm10,%xmm2 +.byte 15,56,205,254 + nop +.byte 69,15,56,203,236 + movdqa %xmm2,%xmm0 + movl $1,%ecx + pxor %xmm6,%xmm6 +.byte 69,15,56,203,254 +.byte 69,15,56,205,218 + pshufd $0x0e,%xmm1,%xmm0 + movdqa 240-128(%rbp),%xmm1 + paddd %xmm7,%xmm1 + movq (%rbx),%xmm7 + nop +.byte 69,15,56,203,229 + pshufd $0x0e,%xmm2,%xmm0 + movdqa 240-128(%rbp),%xmm2 + paddd %xmm11,%xmm2 +.byte 69,15,56,203,247 + + movdqa %xmm1,%xmm0 + cmpl 0(%rbx),%ecx + cmovgeq %rsp,%r8 + cmpl 4(%rbx),%ecx + cmovgeq %rsp,%r9 + pshufd $0x00,%xmm7,%xmm9 +.byte 69,15,56,203,236 + movdqa %xmm2,%xmm0 + pshufd $0x55,%xmm7,%xmm10 + movdqa %xmm7,%xmm11 +.byte 69,15,56,203,254 + pshufd $0x0e,%xmm1,%xmm0 + pcmpgtd %xmm6,%xmm9 + pcmpgtd %xmm6,%xmm10 +.byte 69,15,56,203,229 + pshufd $0x0e,%xmm2,%xmm0 + pcmpgtd %xmm6,%xmm11 + movdqa K256_shaext-16(%rip),%xmm3 +.byte 69,15,56,203,247 + + pand %xmm9,%xmm13 + pand %xmm10,%xmm15 + pand %xmm9,%xmm12 + pand %xmm10,%xmm14 + paddd %xmm7,%xmm11 + + paddd 80(%rsp),%xmm13 + paddd 112(%rsp),%xmm15 + paddd 64(%rsp),%xmm12 + paddd 96(%rsp),%xmm14 + + movq %xmm11,(%rbx) + decl %edx + jnz .Loop_shaext 
+ + movl 280(%rsp),%edx + + pshufd $27,%xmm12,%xmm12 + pshufd $27,%xmm13,%xmm13 + pshufd $27,%xmm14,%xmm14 + pshufd $27,%xmm15,%xmm15 + + movdqa %xmm12,%xmm5 + movdqa %xmm13,%xmm6 + punpckldq %xmm14,%xmm12 + punpckhdq %xmm14,%xmm5 + punpckldq %xmm15,%xmm13 + punpckhdq %xmm15,%xmm6 + + movq %xmm12,0-128(%rdi) + psrldq $8,%xmm12 + movq %xmm5,128-128(%rdi) + psrldq $8,%xmm5 + movq %xmm12,32-128(%rdi) + movq %xmm5,160-128(%rdi) + + movq %xmm13,64-128(%rdi) + psrldq $8,%xmm13 + movq %xmm6,192-128(%rdi) + psrldq $8,%xmm6 + movq %xmm13,96-128(%rdi) + movq %xmm6,224-128(%rdi) + + leaq 8(%rdi),%rdi + leaq 32(%rsi),%rsi + decl %edx + jnz .Loop_grande_shaext + +.Ldone_shaext: + + movq -16(%rax),%rbp +.cfi_restore %rbp + movq -8(%rax),%rbx +.cfi_restore %rbx + leaq (%rax),%rsp +.cfi_def_cfa_register %rsp +.Lepilogue_shaext: + .byte 0xf3,0xc3 +.cfi_endproc +.size sha256_multi_block_shaext,.-sha256_multi_block_shaext +.align 256 +K256: +.long 1116352408,1116352408,1116352408,1116352408 +.long 1116352408,1116352408,1116352408,1116352408 +.long 1899447441,1899447441,1899447441,1899447441 +.long 1899447441,1899447441,1899447441,1899447441 +.long 3049323471,3049323471,3049323471,3049323471 +.long 3049323471,3049323471,3049323471,3049323471 +.long 3921009573,3921009573,3921009573,3921009573 +.long 3921009573,3921009573,3921009573,3921009573 +.long 961987163,961987163,961987163,961987163 +.long 961987163,961987163,961987163,961987163 +.long 1508970993,1508970993,1508970993,1508970993 +.long 1508970993,1508970993,1508970993,1508970993 +.long 2453635748,2453635748,2453635748,2453635748 +.long 2453635748,2453635748,2453635748,2453635748 +.long 2870763221,2870763221,2870763221,2870763221 +.long 2870763221,2870763221,2870763221,2870763221 +.long 3624381080,3624381080,3624381080,3624381080 +.long 3624381080,3624381080,3624381080,3624381080 +.long 310598401,310598401,310598401,310598401 +.long 310598401,310598401,310598401,310598401 +.long 607225278,607225278,607225278,607225278 +.long 607225278,607225278,607225278,607225278 +.long 1426881987,1426881987,1426881987,1426881987 +.long 1426881987,1426881987,1426881987,1426881987 +.long 1925078388,1925078388,1925078388,1925078388 +.long 1925078388,1925078388,1925078388,1925078388 +.long 2162078206,2162078206,2162078206,2162078206 +.long 2162078206,2162078206,2162078206,2162078206 +.long 2614888103,2614888103,2614888103,2614888103 +.long 2614888103,2614888103,2614888103,2614888103 +.long 3248222580,3248222580,3248222580,3248222580 +.long 3248222580,3248222580,3248222580,3248222580 +.long 3835390401,3835390401,3835390401,3835390401 +.long 3835390401,3835390401,3835390401,3835390401 +.long 4022224774,4022224774,4022224774,4022224774 +.long 4022224774,4022224774,4022224774,4022224774 +.long 264347078,264347078,264347078,264347078 +.long 264347078,264347078,264347078,264347078 +.long 604807628,604807628,604807628,604807628 +.long 604807628,604807628,604807628,604807628 +.long 770255983,770255983,770255983,770255983 +.long 770255983,770255983,770255983,770255983 +.long 1249150122,1249150122,1249150122,1249150122 +.long 1249150122,1249150122,1249150122,1249150122 +.long 1555081692,1555081692,1555081692,1555081692 +.long 1555081692,1555081692,1555081692,1555081692 +.long 1996064986,1996064986,1996064986,1996064986 +.long 1996064986,1996064986,1996064986,1996064986 +.long 2554220882,2554220882,2554220882,2554220882 +.long 2554220882,2554220882,2554220882,2554220882 +.long 2821834349,2821834349,2821834349,2821834349 +.long 2821834349,2821834349,2821834349,2821834349 +.long 
2952996808,2952996808,2952996808,2952996808 +.long 2952996808,2952996808,2952996808,2952996808 +.long 3210313671,3210313671,3210313671,3210313671 +.long 3210313671,3210313671,3210313671,3210313671 +.long 3336571891,3336571891,3336571891,3336571891 +.long 3336571891,3336571891,3336571891,3336571891 +.long 3584528711,3584528711,3584528711,3584528711 +.long 3584528711,3584528711,3584528711,3584528711 +.long 113926993,113926993,113926993,113926993 +.long 113926993,113926993,113926993,113926993 +.long 338241895,338241895,338241895,338241895 +.long 338241895,338241895,338241895,338241895 +.long 666307205,666307205,666307205,666307205 +.long 666307205,666307205,666307205,666307205 +.long 773529912,773529912,773529912,773529912 +.long 773529912,773529912,773529912,773529912 +.long 1294757372,1294757372,1294757372,1294757372 +.long 1294757372,1294757372,1294757372,1294757372 +.long 1396182291,1396182291,1396182291,1396182291 +.long 1396182291,1396182291,1396182291,1396182291 +.long 1695183700,1695183700,1695183700,1695183700 +.long 1695183700,1695183700,1695183700,1695183700 +.long 1986661051,1986661051,1986661051,1986661051 +.long 1986661051,1986661051,1986661051,1986661051 +.long 2177026350,2177026350,2177026350,2177026350 +.long 2177026350,2177026350,2177026350,2177026350 +.long 2456956037,2456956037,2456956037,2456956037 +.long 2456956037,2456956037,2456956037,2456956037 +.long 2730485921,2730485921,2730485921,2730485921 +.long 2730485921,2730485921,2730485921,2730485921 +.long 2820302411,2820302411,2820302411,2820302411 +.long 2820302411,2820302411,2820302411,2820302411 +.long 3259730800,3259730800,3259730800,3259730800 +.long 3259730800,3259730800,3259730800,3259730800 +.long 3345764771,3345764771,3345764771,3345764771 +.long 3345764771,3345764771,3345764771,3345764771 +.long 3516065817,3516065817,3516065817,3516065817 +.long 3516065817,3516065817,3516065817,3516065817 +.long 3600352804,3600352804,3600352804,3600352804 +.long 3600352804,3600352804,3600352804,3600352804 +.long 4094571909,4094571909,4094571909,4094571909 +.long 4094571909,4094571909,4094571909,4094571909 +.long 275423344,275423344,275423344,275423344 +.long 275423344,275423344,275423344,275423344 +.long 430227734,430227734,430227734,430227734 +.long 430227734,430227734,430227734,430227734 +.long 506948616,506948616,506948616,506948616 +.long 506948616,506948616,506948616,506948616 +.long 659060556,659060556,659060556,659060556 +.long 659060556,659060556,659060556,659060556 +.long 883997877,883997877,883997877,883997877 +.long 883997877,883997877,883997877,883997877 +.long 958139571,958139571,958139571,958139571 +.long 958139571,958139571,958139571,958139571 +.long 1322822218,1322822218,1322822218,1322822218 +.long 1322822218,1322822218,1322822218,1322822218 +.long 1537002063,1537002063,1537002063,1537002063 +.long 1537002063,1537002063,1537002063,1537002063 +.long 1747873779,1747873779,1747873779,1747873779 +.long 1747873779,1747873779,1747873779,1747873779 +.long 1955562222,1955562222,1955562222,1955562222 +.long 1955562222,1955562222,1955562222,1955562222 +.long 2024104815,2024104815,2024104815,2024104815 +.long 2024104815,2024104815,2024104815,2024104815 +.long 2227730452,2227730452,2227730452,2227730452 +.long 2227730452,2227730452,2227730452,2227730452 +.long 2361852424,2361852424,2361852424,2361852424 +.long 2361852424,2361852424,2361852424,2361852424 +.long 2428436474,2428436474,2428436474,2428436474 +.long 2428436474,2428436474,2428436474,2428436474 +.long 2756734187,2756734187,2756734187,2756734187 +.long 
2756734187,2756734187,2756734187,2756734187 +.long 3204031479,3204031479,3204031479,3204031479 +.long 3204031479,3204031479,3204031479,3204031479 +.long 3329325298,3329325298,3329325298,3329325298 +.long 3329325298,3329325298,3329325298,3329325298 +.Lpbswap: +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f +K256_shaext: +.long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 +.long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 +.long 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 +.long 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 +.long 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc +.long 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da +.long 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 +.long 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 +.long 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 +.long 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 +.long 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 +.long 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 +.long 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 +.long 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 +.long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 +.long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 +.byte 83,72,65,50,53,54,32,109,117,108,116,105,45,98,108,111,99,107,32,116,114,97,110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0 diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-x86_64.S new file mode 100644 index 0000000000..a5d3cf5068 --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-x86_64.S @@ -0,0 +1,3097 @@ +# WARNING: do not edit! +# Generated from openssl/crypto/sha/asm/sha512-x86_64.pl +# +# Copyright 2005-2020 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the OpenSSL license (the "License"). You may not use +# this file except in compliance with the License. 
You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +.text + + +.globl sha256_block_data_order +.type sha256_block_data_order,@function +.align 16 +sha256_block_data_order: +.cfi_startproc + leaq OPENSSL_ia32cap_P(%rip),%r11 + movl 0(%r11),%r9d + movl 4(%r11),%r10d + movl 8(%r11),%r11d + testl $536870912,%r11d + jnz _shaext_shortcut + testl $512,%r10d + jnz .Lssse3_shortcut + movq %rsp,%rax +.cfi_def_cfa_register %rax + pushq %rbx +.cfi_offset %rbx,-16 + pushq %rbp +.cfi_offset %rbp,-24 + pushq %r12 +.cfi_offset %r12,-32 + pushq %r13 +.cfi_offset %r13,-40 + pushq %r14 +.cfi_offset %r14,-48 + pushq %r15 +.cfi_offset %r15,-56 + shlq $4,%rdx + subq $64+32,%rsp + leaq (%rsi,%rdx,4),%rdx + andq $-64,%rsp + movq %rdi,64+0(%rsp) + movq %rsi,64+8(%rsp) + movq %rdx,64+16(%rsp) + movq %rax,88(%rsp) +.cfi_escape 0x0f,0x06,0x77,0xd8,0x00,0x06,0x23,0x08 +.Lprologue: + + movl 0(%rdi),%eax + movl 4(%rdi),%ebx + movl 8(%rdi),%ecx + movl 12(%rdi),%edx + movl 16(%rdi),%r8d + movl 20(%rdi),%r9d + movl 24(%rdi),%r10d + movl 28(%rdi),%r11d + jmp .Lloop + +.align 16 +.Lloop: + movl %ebx,%edi + leaq K256(%rip),%rbp + xorl %ecx,%edi + movl 0(%rsi),%r12d + movl %r8d,%r13d + movl %eax,%r14d + bswapl %r12d + rorl $14,%r13d + movl %r9d,%r15d + + xorl %r8d,%r13d + rorl $9,%r14d + xorl %r10d,%r15d + + movl %r12d,0(%rsp) + xorl %eax,%r14d + andl %r8d,%r15d + + rorl $5,%r13d + addl %r11d,%r12d + xorl %r10d,%r15d + + rorl $11,%r14d + xorl %r8d,%r13d + addl %r15d,%r12d + + movl %eax,%r15d + addl (%rbp),%r12d + xorl %eax,%r14d + + xorl %ebx,%r15d + rorl $6,%r13d + movl %ebx,%r11d + + andl %r15d,%edi + rorl $2,%r14d + addl %r13d,%r12d + + xorl %edi,%r11d + addl %r12d,%edx + addl %r12d,%r11d + + leaq 4(%rbp),%rbp + addl %r14d,%r11d + movl 4(%rsi),%r12d + movl %edx,%r13d + movl %r11d,%r14d + bswapl %r12d + rorl $14,%r13d + movl %r8d,%edi + + xorl %edx,%r13d + rorl $9,%r14d + xorl %r9d,%edi + + movl %r12d,4(%rsp) + xorl %r11d,%r14d + andl %edx,%edi + + rorl $5,%r13d + addl %r10d,%r12d + xorl %r9d,%edi + + rorl $11,%r14d + xorl %edx,%r13d + addl %edi,%r12d + + movl %r11d,%edi + addl (%rbp),%r12d + xorl %r11d,%r14d + + xorl %eax,%edi + rorl $6,%r13d + movl %eax,%r10d + + andl %edi,%r15d + rorl $2,%r14d + addl %r13d,%r12d + + xorl %r15d,%r10d + addl %r12d,%ecx + addl %r12d,%r10d + + leaq 4(%rbp),%rbp + addl %r14d,%r10d + movl 8(%rsi),%r12d + movl %ecx,%r13d + movl %r10d,%r14d + bswapl %r12d + rorl $14,%r13d + movl %edx,%r15d + + xorl %ecx,%r13d + rorl $9,%r14d + xorl %r8d,%r15d + + movl %r12d,8(%rsp) + xorl %r10d,%r14d + andl %ecx,%r15d + + rorl $5,%r13d + addl %r9d,%r12d + xorl %r8d,%r15d + + rorl $11,%r14d + xorl %ecx,%r13d + addl %r15d,%r12d + + movl %r10d,%r15d + addl (%rbp),%r12d + xorl %r10d,%r14d + + xorl %r11d,%r15d + rorl $6,%r13d + movl %r11d,%r9d + + andl %r15d,%edi + rorl $2,%r14d + addl %r13d,%r12d + + xorl %edi,%r9d + addl %r12d,%ebx + addl %r12d,%r9d + + leaq 4(%rbp),%rbp + addl %r14d,%r9d + movl 12(%rsi),%r12d + movl %ebx,%r13d + movl %r9d,%r14d + bswapl %r12d + rorl $14,%r13d + movl %ecx,%edi + + xorl %ebx,%r13d + rorl $9,%r14d + xorl %edx,%edi + + movl %r12d,12(%rsp) + xorl %r9d,%r14d + andl %ebx,%edi + + rorl $5,%r13d + addl %r8d,%r12d + xorl %edx,%edi + + rorl $11,%r14d + xorl %ebx,%r13d + addl %edi,%r12d + + movl %r9d,%edi + addl (%rbp),%r12d + xorl %r9d,%r14d + + xorl %r10d,%edi + rorl $6,%r13d + movl %r10d,%r8d + + andl %edi,%r15d + rorl $2,%r14d + addl %r13d,%r12d + + xorl %r15d,%r8d + addl %r12d,%eax + addl %r12d,%r8d + + leaq 
20(%rbp),%rbp + addl %r14d,%r8d + movl 16(%rsi),%r12d + movl %eax,%r13d + movl %r8d,%r14d + bswapl %r12d + rorl $14,%r13d + movl %ebx,%r15d + + xorl %eax,%r13d + rorl $9,%r14d + xorl %ecx,%r15d + + movl %r12d,16(%rsp) + xorl %r8d,%r14d + andl %eax,%r15d + + rorl $5,%r13d + addl %edx,%r12d + xorl %ecx,%r15d + + rorl $11,%r14d + xorl %eax,%r13d + addl %r15d,%r12d + + movl %r8d,%r15d + addl (%rbp),%r12d + xorl %r8d,%r14d + + xorl %r9d,%r15d + rorl $6,%r13d + movl %r9d,%edx + + andl %r15d,%edi + rorl $2,%r14d + addl %r13d,%r12d + + xorl %edi,%edx + addl %r12d,%r11d + addl %r12d,%edx + + leaq 4(%rbp),%rbp + addl %r14d,%edx + movl 20(%rsi),%r12d + movl %r11d,%r13d + movl %edx,%r14d + bswapl %r12d + rorl $14,%r13d + movl %eax,%edi + + xorl %r11d,%r13d + rorl $9,%r14d + xorl %ebx,%edi + + movl %r12d,20(%rsp) + xorl %edx,%r14d + andl %r11d,%edi + + rorl $5,%r13d + addl %ecx,%r12d + xorl %ebx,%edi + + rorl $11,%r14d + xorl %r11d,%r13d + addl %edi,%r12d + + movl %edx,%edi + addl (%rbp),%r12d + xorl %edx,%r14d + + xorl %r8d,%edi + rorl $6,%r13d + movl %r8d,%ecx + + andl %edi,%r15d + rorl $2,%r14d + addl %r13d,%r12d + + xorl %r15d,%ecx + addl %r12d,%r10d + addl %r12d,%ecx + + leaq 4(%rbp),%rbp + addl %r14d,%ecx + movl 24(%rsi),%r12d + movl %r10d,%r13d + movl %ecx,%r14d + bswapl %r12d + rorl $14,%r13d + movl %r11d,%r15d + + xorl %r10d,%r13d + rorl $9,%r14d + xorl %eax,%r15d + + movl %r12d,24(%rsp) + xorl %ecx,%r14d + andl %r10d,%r15d + + rorl $5,%r13d + addl %ebx,%r12d + xorl %eax,%r15d + + rorl $11,%r14d + xorl %r10d,%r13d + addl %r15d,%r12d + + movl %ecx,%r15d + addl (%rbp),%r12d + xorl %ecx,%r14d + + xorl %edx,%r15d + rorl $6,%r13d + movl %edx,%ebx + + andl %r15d,%edi + rorl $2,%r14d + addl %r13d,%r12d + + xorl %edi,%ebx + addl %r12d,%r9d + addl %r12d,%ebx + + leaq 4(%rbp),%rbp + addl %r14d,%ebx + movl 28(%rsi),%r12d + movl %r9d,%r13d + movl %ebx,%r14d + bswapl %r12d + rorl $14,%r13d + movl %r10d,%edi + + xorl %r9d,%r13d + rorl $9,%r14d + xorl %r11d,%edi + + movl %r12d,28(%rsp) + xorl %ebx,%r14d + andl %r9d,%edi + + rorl $5,%r13d + addl %eax,%r12d + xorl %r11d,%edi + + rorl $11,%r14d + xorl %r9d,%r13d + addl %edi,%r12d + + movl %ebx,%edi + addl (%rbp),%r12d + xorl %ebx,%r14d + + xorl %ecx,%edi + rorl $6,%r13d + movl %ecx,%eax + + andl %edi,%r15d + rorl $2,%r14d + addl %r13d,%r12d + + xorl %r15d,%eax + addl %r12d,%r8d + addl %r12d,%eax + + leaq 20(%rbp),%rbp + addl %r14d,%eax + movl 32(%rsi),%r12d + movl %r8d,%r13d + movl %eax,%r14d + bswapl %r12d + rorl $14,%r13d + movl %r9d,%r15d + + xorl %r8d,%r13d + rorl $9,%r14d + xorl %r10d,%r15d + + movl %r12d,32(%rsp) + xorl %eax,%r14d + andl %r8d,%r15d + + rorl $5,%r13d + addl %r11d,%r12d + xorl %r10d,%r15d + + rorl $11,%r14d + xorl %r8d,%r13d + addl %r15d,%r12d + + movl %eax,%r15d + addl (%rbp),%r12d + xorl %eax,%r14d + + xorl %ebx,%r15d + rorl $6,%r13d + movl %ebx,%r11d + + andl %r15d,%edi + rorl $2,%r14d + addl %r13d,%r12d + + xorl %edi,%r11d + addl %r12d,%edx + addl %r12d,%r11d + + leaq 4(%rbp),%rbp + addl %r14d,%r11d + movl 36(%rsi),%r12d + movl %edx,%r13d + movl %r11d,%r14d + bswapl %r12d + rorl $14,%r13d + movl %r8d,%edi + + xorl %edx,%r13d + rorl $9,%r14d + xorl %r9d,%edi + + movl %r12d,36(%rsp) + xorl %r11d,%r14d + andl %edx,%edi + + rorl $5,%r13d + addl %r10d,%r12d + xorl %r9d,%edi + + rorl $11,%r14d + xorl %edx,%r13d + addl %edi,%r12d + + movl %r11d,%edi + addl (%rbp),%r12d + xorl %r11d,%r14d + + xorl %eax,%edi + rorl $6,%r13d + movl %eax,%r10d + + andl %edi,%r15d + rorl $2,%r14d + addl %r13d,%r12d + + xorl %r15d,%r10d + addl %r12d,%ecx + addl 
%r12d,%r10d + + leaq 4(%rbp),%rbp + addl %r14d,%r10d + movl 40(%rsi),%r12d + movl %ecx,%r13d + movl %r10d,%r14d + bswapl %r12d + rorl $14,%r13d + movl %edx,%r15d + + xorl %ecx,%r13d + rorl $9,%r14d + xorl %r8d,%r15d + + movl %r12d,40(%rsp) + xorl %r10d,%r14d + andl %ecx,%r15d + + rorl $5,%r13d + addl %r9d,%r12d + xorl %r8d,%r15d + + rorl $11,%r14d + xorl %ecx,%r13d + addl %r15d,%r12d + + movl %r10d,%r15d + addl (%rbp),%r12d + xorl %r10d,%r14d + + xorl %r11d,%r15d + rorl $6,%r13d + movl %r11d,%r9d + + andl %r15d,%edi + rorl $2,%r14d + addl %r13d,%r12d + + xorl %edi,%r9d + addl %r12d,%ebx + addl %r12d,%r9d + + leaq 4(%rbp),%rbp + addl %r14d,%r9d + movl 44(%rsi),%r12d + movl %ebx,%r13d + movl %r9d,%r14d + bswapl %r12d + rorl $14,%r13d + movl %ecx,%edi + + xorl %ebx,%r13d + rorl $9,%r14d + xorl %edx,%edi + + movl %r12d,44(%rsp) + xorl %r9d,%r14d + andl %ebx,%edi + + rorl $5,%r13d + addl %r8d,%r12d + xorl %edx,%edi + + rorl $11,%r14d + xorl %ebx,%r13d + addl %edi,%r12d + + movl %r9d,%edi + addl (%rbp),%r12d + xorl %r9d,%r14d + + xorl %r10d,%edi + rorl $6,%r13d + movl %r10d,%r8d + + andl %edi,%r15d + rorl $2,%r14d + addl %r13d,%r12d + + xorl %r15d,%r8d + addl %r12d,%eax + addl %r12d,%r8d + + leaq 20(%rbp),%rbp + addl %r14d,%r8d + movl 48(%rsi),%r12d + movl %eax,%r13d + movl %r8d,%r14d + bswapl %r12d + rorl $14,%r13d + movl %ebx,%r15d + + xorl %eax,%r13d + rorl $9,%r14d + xorl %ecx,%r15d + + movl %r12d,48(%rsp) + xorl %r8d,%r14d + andl %eax,%r15d + + rorl $5,%r13d + addl %edx,%r12d + xorl %ecx,%r15d + + rorl $11,%r14d + xorl %eax,%r13d + addl %r15d,%r12d + + movl %r8d,%r15d + addl (%rbp),%r12d + xorl %r8d,%r14d + + xorl %r9d,%r15d + rorl $6,%r13d + movl %r9d,%edx + + andl %r15d,%edi + rorl $2,%r14d + addl %r13d,%r12d + + xorl %edi,%edx + addl %r12d,%r11d + addl %r12d,%edx + + leaq 4(%rbp),%rbp + addl %r14d,%edx + movl 52(%rsi),%r12d + movl %r11d,%r13d + movl %edx,%r14d + bswapl %r12d + rorl $14,%r13d + movl %eax,%edi + + xorl %r11d,%r13d + rorl $9,%r14d + xorl %ebx,%edi + + movl %r12d,52(%rsp) + xorl %edx,%r14d + andl %r11d,%edi + + rorl $5,%r13d + addl %ecx,%r12d + xorl %ebx,%edi + + rorl $11,%r14d + xorl %r11d,%r13d + addl %edi,%r12d + + movl %edx,%edi + addl (%rbp),%r12d + xorl %edx,%r14d + + xorl %r8d,%edi + rorl $6,%r13d + movl %r8d,%ecx + + andl %edi,%r15d + rorl $2,%r14d + addl %r13d,%r12d + + xorl %r15d,%ecx + addl %r12d,%r10d + addl %r12d,%ecx + + leaq 4(%rbp),%rbp + addl %r14d,%ecx + movl 56(%rsi),%r12d + movl %r10d,%r13d + movl %ecx,%r14d + bswapl %r12d + rorl $14,%r13d + movl %r11d,%r15d + + xorl %r10d,%r13d + rorl $9,%r14d + xorl %eax,%r15d + + movl %r12d,56(%rsp) + xorl %ecx,%r14d + andl %r10d,%r15d + + rorl $5,%r13d + addl %ebx,%r12d + xorl %eax,%r15d + + rorl $11,%r14d + xorl %r10d,%r13d + addl %r15d,%r12d + + movl %ecx,%r15d + addl (%rbp),%r12d + xorl %ecx,%r14d + + xorl %edx,%r15d + rorl $6,%r13d + movl %edx,%ebx + + andl %r15d,%edi + rorl $2,%r14d + addl %r13d,%r12d + + xorl %edi,%ebx + addl %r12d,%r9d + addl %r12d,%ebx + + leaq 4(%rbp),%rbp + addl %r14d,%ebx + movl 60(%rsi),%r12d + movl %r9d,%r13d + movl %ebx,%r14d + bswapl %r12d + rorl $14,%r13d + movl %r10d,%edi + + xorl %r9d,%r13d + rorl $9,%r14d + xorl %r11d,%edi + + movl %r12d,60(%rsp) + xorl %ebx,%r14d + andl %r9d,%edi + + rorl $5,%r13d + addl %eax,%r12d + xorl %r11d,%edi + + rorl $11,%r14d + xorl %r9d,%r13d + addl %edi,%r12d + + movl %ebx,%edi + addl (%rbp),%r12d + xorl %ebx,%r14d + + xorl %ecx,%edi + rorl $6,%r13d + movl %ecx,%eax + + andl %edi,%r15d + rorl $2,%r14d + addl %r13d,%r12d + + xorl %r15d,%eax + addl 
%r12d,%r8d + addl %r12d,%eax + + leaq 20(%rbp),%rbp + jmp .Lrounds_16_xx +.align 16 +.Lrounds_16_xx: + movl 4(%rsp),%r13d + movl 56(%rsp),%r15d + + movl %r13d,%r12d + rorl $11,%r13d + addl %r14d,%eax + movl %r15d,%r14d + rorl $2,%r15d + + xorl %r12d,%r13d + shrl $3,%r12d + rorl $7,%r13d + xorl %r14d,%r15d + shrl $10,%r14d + + rorl $17,%r15d + xorl %r13d,%r12d + xorl %r14d,%r15d + addl 36(%rsp),%r12d + + addl 0(%rsp),%r12d + movl %r8d,%r13d + addl %r15d,%r12d + movl %eax,%r14d + rorl $14,%r13d + movl %r9d,%r15d + + xorl %r8d,%r13d + rorl $9,%r14d + xorl %r10d,%r15d + + movl %r12d,0(%rsp) + xorl %eax,%r14d + andl %r8d,%r15d + + rorl $5,%r13d + addl %r11d,%r12d + xorl %r10d,%r15d + + rorl $11,%r14d + xorl %r8d,%r13d + addl %r15d,%r12d + + movl %eax,%r15d + addl (%rbp),%r12d + xorl %eax,%r14d + + xorl %ebx,%r15d + rorl $6,%r13d + movl %ebx,%r11d + + andl %r15d,%edi + rorl $2,%r14d + addl %r13d,%r12d + + xorl %edi,%r11d + addl %r12d,%edx + addl %r12d,%r11d + + leaq 4(%rbp),%rbp + movl 8(%rsp),%r13d + movl 60(%rsp),%edi + + movl %r13d,%r12d + rorl $11,%r13d + addl %r14d,%r11d + movl %edi,%r14d + rorl $2,%edi + + xorl %r12d,%r13d + shrl $3,%r12d + rorl $7,%r13d + xorl %r14d,%edi + shrl $10,%r14d + + rorl $17,%edi + xorl %r13d,%r12d + xorl %r14d,%edi + addl 40(%rsp),%r12d + + addl 4(%rsp),%r12d + movl %edx,%r13d + addl %edi,%r12d + movl %r11d,%r14d + rorl $14,%r13d + movl %r8d,%edi + + xorl %edx,%r13d + rorl $9,%r14d + xorl %r9d,%edi + + movl %r12d,4(%rsp) + xorl %r11d,%r14d + andl %edx,%edi + + rorl $5,%r13d + addl %r10d,%r12d + xorl %r9d,%edi + + rorl $11,%r14d + xorl %edx,%r13d + addl %edi,%r12d + + movl %r11d,%edi + addl (%rbp),%r12d + xorl %r11d,%r14d + + xorl %eax,%edi + rorl $6,%r13d + movl %eax,%r10d + + andl %edi,%r15d + rorl $2,%r14d + addl %r13d,%r12d + + xorl %r15d,%r10d + addl %r12d,%ecx + addl %r12d,%r10d + + leaq 4(%rbp),%rbp + movl 12(%rsp),%r13d + movl 0(%rsp),%r15d + + movl %r13d,%r12d + rorl $11,%r13d + addl %r14d,%r10d + movl %r15d,%r14d + rorl $2,%r15d + + xorl %r12d,%r13d + shrl $3,%r12d + rorl $7,%r13d + xorl %r14d,%r15d + shrl $10,%r14d + + rorl $17,%r15d + xorl %r13d,%r12d + xorl %r14d,%r15d + addl 44(%rsp),%r12d + + addl 8(%rsp),%r12d + movl %ecx,%r13d + addl %r15d,%r12d + movl %r10d,%r14d + rorl $14,%r13d + movl %edx,%r15d + + xorl %ecx,%r13d + rorl $9,%r14d + xorl %r8d,%r15d + + movl %r12d,8(%rsp) + xorl %r10d,%r14d + andl %ecx,%r15d + + rorl $5,%r13d + addl %r9d,%r12d + xorl %r8d,%r15d + + rorl $11,%r14d + xorl %ecx,%r13d + addl %r15d,%r12d + + movl %r10d,%r15d + addl (%rbp),%r12d + xorl %r10d,%r14d + + xorl %r11d,%r15d + rorl $6,%r13d + movl %r11d,%r9d + + andl %r15d,%edi + rorl $2,%r14d + addl %r13d,%r12d + + xorl %edi,%r9d + addl %r12d,%ebx + addl %r12d,%r9d + + leaq 4(%rbp),%rbp + movl 16(%rsp),%r13d + movl 4(%rsp),%edi + + movl %r13d,%r12d + rorl $11,%r13d + addl %r14d,%r9d + movl %edi,%r14d + rorl $2,%edi + + xorl %r12d,%r13d + shrl $3,%r12d + rorl $7,%r13d + xorl %r14d,%edi + shrl $10,%r14d + + rorl $17,%edi + xorl %r13d,%r12d + xorl %r14d,%edi + addl 48(%rsp),%r12d + + addl 12(%rsp),%r12d + movl %ebx,%r13d + addl %edi,%r12d + movl %r9d,%r14d + rorl $14,%r13d + movl %ecx,%edi + + xorl %ebx,%r13d + rorl $9,%r14d + xorl %edx,%edi + + movl %r12d,12(%rsp) + xorl %r9d,%r14d + andl %ebx,%edi + + rorl $5,%r13d + addl %r8d,%r12d + xorl %edx,%edi + + rorl $11,%r14d + xorl %ebx,%r13d + addl %edi,%r12d + + movl %r9d,%edi + addl (%rbp),%r12d + xorl %r9d,%r14d + + xorl %r10d,%edi + rorl $6,%r13d + movl %r10d,%r8d + + andl %edi,%r15d + rorl $2,%r14d + addl %r13d,%r12d + + 
xorl %r15d,%r8d + addl %r12d,%eax + addl %r12d,%r8d + + leaq 20(%rbp),%rbp + movl 20(%rsp),%r13d + movl 8(%rsp),%r15d + + movl %r13d,%r12d + rorl $11,%r13d + addl %r14d,%r8d + movl %r15d,%r14d + rorl $2,%r15d + + xorl %r12d,%r13d + shrl $3,%r12d + rorl $7,%r13d + xorl %r14d,%r15d + shrl $10,%r14d + + rorl $17,%r15d + xorl %r13d,%r12d + xorl %r14d,%r15d + addl 52(%rsp),%r12d + + addl 16(%rsp),%r12d + movl %eax,%r13d + addl %r15d,%r12d + movl %r8d,%r14d + rorl $14,%r13d + movl %ebx,%r15d + + xorl %eax,%r13d + rorl $9,%r14d + xorl %ecx,%r15d + + movl %r12d,16(%rsp) + xorl %r8d,%r14d + andl %eax,%r15d + + rorl $5,%r13d + addl %edx,%r12d + xorl %ecx,%r15d + + rorl $11,%r14d + xorl %eax,%r13d + addl %r15d,%r12d + + movl %r8d,%r15d + addl (%rbp),%r12d + xorl %r8d,%r14d + + xorl %r9d,%r15d + rorl $6,%r13d + movl %r9d,%edx + + andl %r15d,%edi + rorl $2,%r14d + addl %r13d,%r12d + + xorl %edi,%edx + addl %r12d,%r11d + addl %r12d,%edx + + leaq 4(%rbp),%rbp + movl 24(%rsp),%r13d + movl 12(%rsp),%edi + + movl %r13d,%r12d + rorl $11,%r13d + addl %r14d,%edx + movl %edi,%r14d + rorl $2,%edi + + xorl %r12d,%r13d + shrl $3,%r12d + rorl $7,%r13d + xorl %r14d,%edi + shrl $10,%r14d + + rorl $17,%edi + xorl %r13d,%r12d + xorl %r14d,%edi + addl 56(%rsp),%r12d + + addl 20(%rsp),%r12d + movl %r11d,%r13d + addl %edi,%r12d + movl %edx,%r14d + rorl $14,%r13d + movl %eax,%edi + + xorl %r11d,%r13d + rorl $9,%r14d + xorl %ebx,%edi + + movl %r12d,20(%rsp) + xorl %edx,%r14d + andl %r11d,%edi + + rorl $5,%r13d + addl %ecx,%r12d + xorl %ebx,%edi + + rorl $11,%r14d + xorl %r11d,%r13d + addl %edi,%r12d + + movl %edx,%edi + addl (%rbp),%r12d + xorl %edx,%r14d + + xorl %r8d,%edi + rorl $6,%r13d + movl %r8d,%ecx + + andl %edi,%r15d + rorl $2,%r14d + addl %r13d,%r12d + + xorl %r15d,%ecx + addl %r12d,%r10d + addl %r12d,%ecx + + leaq 4(%rbp),%rbp + movl 28(%rsp),%r13d + movl 16(%rsp),%r15d + + movl %r13d,%r12d + rorl $11,%r13d + addl %r14d,%ecx + movl %r15d,%r14d + rorl $2,%r15d + + xorl %r12d,%r13d + shrl $3,%r12d + rorl $7,%r13d + xorl %r14d,%r15d + shrl $10,%r14d + + rorl $17,%r15d + xorl %r13d,%r12d + xorl %r14d,%r15d + addl 60(%rsp),%r12d + + addl 24(%rsp),%r12d + movl %r10d,%r13d + addl %r15d,%r12d + movl %ecx,%r14d + rorl $14,%r13d + movl %r11d,%r15d + + xorl %r10d,%r13d + rorl $9,%r14d + xorl %eax,%r15d + + movl %r12d,24(%rsp) + xorl %ecx,%r14d + andl %r10d,%r15d + + rorl $5,%r13d + addl %ebx,%r12d + xorl %eax,%r15d + + rorl $11,%r14d + xorl %r10d,%r13d + addl %r15d,%r12d + + movl %ecx,%r15d + addl (%rbp),%r12d + xorl %ecx,%r14d + + xorl %edx,%r15d + rorl $6,%r13d + movl %edx,%ebx + + andl %r15d,%edi + rorl $2,%r14d + addl %r13d,%r12d + + xorl %edi,%ebx + addl %r12d,%r9d + addl %r12d,%ebx + + leaq 4(%rbp),%rbp + movl 32(%rsp),%r13d + movl 20(%rsp),%edi + + movl %r13d,%r12d + rorl $11,%r13d + addl %r14d,%ebx + movl %edi,%r14d + rorl $2,%edi + + xorl %r12d,%r13d + shrl $3,%r12d + rorl $7,%r13d + xorl %r14d,%edi + shrl $10,%r14d + + rorl $17,%edi + xorl %r13d,%r12d + xorl %r14d,%edi + addl 0(%rsp),%r12d + + addl 28(%rsp),%r12d + movl %r9d,%r13d + addl %edi,%r12d + movl %ebx,%r14d + rorl $14,%r13d + movl %r10d,%edi + + xorl %r9d,%r13d + rorl $9,%r14d + xorl %r11d,%edi + + movl %r12d,28(%rsp) + xorl %ebx,%r14d + andl %r9d,%edi + + rorl $5,%r13d + addl %eax,%r12d + xorl %r11d,%edi + + rorl $11,%r14d + xorl %r9d,%r13d + addl %edi,%r12d + + movl %ebx,%edi + addl (%rbp),%r12d + xorl %ebx,%r14d + + xorl %ecx,%edi + rorl $6,%r13d + movl %ecx,%eax + + andl %edi,%r15d + rorl $2,%r14d + addl %r13d,%r12d + + xorl %r15d,%eax + addl 
%r12d,%r8d + addl %r12d,%eax + + leaq 20(%rbp),%rbp + movl 36(%rsp),%r13d + movl 24(%rsp),%r15d + + movl %r13d,%r12d + rorl $11,%r13d + addl %r14d,%eax + movl %r15d,%r14d + rorl $2,%r15d + + xorl %r12d,%r13d + shrl $3,%r12d + rorl $7,%r13d + xorl %r14d,%r15d + shrl $10,%r14d + + rorl $17,%r15d + xorl %r13d,%r12d + xorl %r14d,%r15d + addl 4(%rsp),%r12d + + addl 32(%rsp),%r12d + movl %r8d,%r13d + addl %r15d,%r12d + movl %eax,%r14d + rorl $14,%r13d + movl %r9d,%r15d + + xorl %r8d,%r13d + rorl $9,%r14d + xorl %r10d,%r15d + + movl %r12d,32(%rsp) + xorl %eax,%r14d + andl %r8d,%r15d + + rorl $5,%r13d + addl %r11d,%r12d + xorl %r10d,%r15d + + rorl $11,%r14d + xorl %r8d,%r13d + addl %r15d,%r12d + + movl %eax,%r15d + addl (%rbp),%r12d + xorl %eax,%r14d + + xorl %ebx,%r15d + rorl $6,%r13d + movl %ebx,%r11d + + andl %r15d,%edi + rorl $2,%r14d + addl %r13d,%r12d + + xorl %edi,%r11d + addl %r12d,%edx + addl %r12d,%r11d + + leaq 4(%rbp),%rbp + movl 40(%rsp),%r13d + movl 28(%rsp),%edi + + movl %r13d,%r12d + rorl $11,%r13d + addl %r14d,%r11d + movl %edi,%r14d + rorl $2,%edi + + xorl %r12d,%r13d + shrl $3,%r12d + rorl $7,%r13d + xorl %r14d,%edi + shrl $10,%r14d + + rorl $17,%edi + xorl %r13d,%r12d + xorl %r14d,%edi + addl 8(%rsp),%r12d + + addl 36(%rsp),%r12d + movl %edx,%r13d + addl %edi,%r12d + movl %r11d,%r14d + rorl $14,%r13d + movl %r8d,%edi + + xorl %edx,%r13d + rorl $9,%r14d + xorl %r9d,%edi + + movl %r12d,36(%rsp) + xorl %r11d,%r14d + andl %edx,%edi + + rorl $5,%r13d + addl %r10d,%r12d + xorl %r9d,%edi + + rorl $11,%r14d + xorl %edx,%r13d + addl %edi,%r12d + + movl %r11d,%edi + addl (%rbp),%r12d + xorl %r11d,%r14d + + xorl %eax,%edi + rorl $6,%r13d + movl %eax,%r10d + + andl %edi,%r15d + rorl $2,%r14d + addl %r13d,%r12d + + xorl %r15d,%r10d + addl %r12d,%ecx + addl %r12d,%r10d + + leaq 4(%rbp),%rbp + movl 44(%rsp),%r13d + movl 32(%rsp),%r15d + + movl %r13d,%r12d + rorl $11,%r13d + addl %r14d,%r10d + movl %r15d,%r14d + rorl $2,%r15d + + xorl %r12d,%r13d + shrl $3,%r12d + rorl $7,%r13d + xorl %r14d,%r15d + shrl $10,%r14d + + rorl $17,%r15d + xorl %r13d,%r12d + xorl %r14d,%r15d + addl 12(%rsp),%r12d + + addl 40(%rsp),%r12d + movl %ecx,%r13d + addl %r15d,%r12d + movl %r10d,%r14d + rorl $14,%r13d + movl %edx,%r15d + + xorl %ecx,%r13d + rorl $9,%r14d + xorl %r8d,%r15d + + movl %r12d,40(%rsp) + xorl %r10d,%r14d + andl %ecx,%r15d + + rorl $5,%r13d + addl %r9d,%r12d + xorl %r8d,%r15d + + rorl $11,%r14d + xorl %ecx,%r13d + addl %r15d,%r12d + + movl %r10d,%r15d + addl (%rbp),%r12d + xorl %r10d,%r14d + + xorl %r11d,%r15d + rorl $6,%r13d + movl %r11d,%r9d + + andl %r15d,%edi + rorl $2,%r14d + addl %r13d,%r12d + + xorl %edi,%r9d + addl %r12d,%ebx + addl %r12d,%r9d + + leaq 4(%rbp),%rbp + movl 48(%rsp),%r13d + movl 36(%rsp),%edi + + movl %r13d,%r12d + rorl $11,%r13d + addl %r14d,%r9d + movl %edi,%r14d + rorl $2,%edi + + xorl %r12d,%r13d + shrl $3,%r12d + rorl $7,%r13d + xorl %r14d,%edi + shrl $10,%r14d + + rorl $17,%edi + xorl %r13d,%r12d + xorl %r14d,%edi + addl 16(%rsp),%r12d + + addl 44(%rsp),%r12d + movl %ebx,%r13d + addl %edi,%r12d + movl %r9d,%r14d + rorl $14,%r13d + movl %ecx,%edi + + xorl %ebx,%r13d + rorl $9,%r14d + xorl %edx,%edi + + movl %r12d,44(%rsp) + xorl %r9d,%r14d + andl %ebx,%edi + + rorl $5,%r13d + addl %r8d,%r12d + xorl %edx,%edi + + rorl $11,%r14d + xorl %ebx,%r13d + addl %edi,%r12d + + movl %r9d,%edi + addl (%rbp),%r12d + xorl %r9d,%r14d + + xorl %r10d,%edi + rorl $6,%r13d + movl %r10d,%r8d + + andl %edi,%r15d + rorl $2,%r14d + addl %r13d,%r12d + + xorl %r15d,%r8d + addl %r12d,%eax + addl 
%r12d,%r8d + + leaq 20(%rbp),%rbp + movl 52(%rsp),%r13d + movl 40(%rsp),%r15d + + movl %r13d,%r12d + rorl $11,%r13d + addl %r14d,%r8d + movl %r15d,%r14d + rorl $2,%r15d + + xorl %r12d,%r13d + shrl $3,%r12d + rorl $7,%r13d + xorl %r14d,%r15d + shrl $10,%r14d + + rorl $17,%r15d + xorl %r13d,%r12d + xorl %r14d,%r15d + addl 20(%rsp),%r12d + + addl 48(%rsp),%r12d + movl %eax,%r13d + addl %r15d,%r12d + movl %r8d,%r14d + rorl $14,%r13d + movl %ebx,%r15d + + xorl %eax,%r13d + rorl $9,%r14d + xorl %ecx,%r15d + + movl %r12d,48(%rsp) + xorl %r8d,%r14d + andl %eax,%r15d + + rorl $5,%r13d + addl %edx,%r12d + xorl %ecx,%r15d + + rorl $11,%r14d + xorl %eax,%r13d + addl %r15d,%r12d + + movl %r8d,%r15d + addl (%rbp),%r12d + xorl %r8d,%r14d + + xorl %r9d,%r15d + rorl $6,%r13d + movl %r9d,%edx + + andl %r15d,%edi + rorl $2,%r14d + addl %r13d,%r12d + + xorl %edi,%edx + addl %r12d,%r11d + addl %r12d,%edx + + leaq 4(%rbp),%rbp + movl 56(%rsp),%r13d + movl 44(%rsp),%edi + + movl %r13d,%r12d + rorl $11,%r13d + addl %r14d,%edx + movl %edi,%r14d + rorl $2,%edi + + xorl %r12d,%r13d + shrl $3,%r12d + rorl $7,%r13d + xorl %r14d,%edi + shrl $10,%r14d + + rorl $17,%edi + xorl %r13d,%r12d + xorl %r14d,%edi + addl 24(%rsp),%r12d + + addl 52(%rsp),%r12d + movl %r11d,%r13d + addl %edi,%r12d + movl %edx,%r14d + rorl $14,%r13d + movl %eax,%edi + + xorl %r11d,%r13d + rorl $9,%r14d + xorl %ebx,%edi + + movl %r12d,52(%rsp) + xorl %edx,%r14d + andl %r11d,%edi + + rorl $5,%r13d + addl %ecx,%r12d + xorl %ebx,%edi + + rorl $11,%r14d + xorl %r11d,%r13d + addl %edi,%r12d + + movl %edx,%edi + addl (%rbp),%r12d + xorl %edx,%r14d + + xorl %r8d,%edi + rorl $6,%r13d + movl %r8d,%ecx + + andl %edi,%r15d + rorl $2,%r14d + addl %r13d,%r12d + + xorl %r15d,%ecx + addl %r12d,%r10d + addl %r12d,%ecx + + leaq 4(%rbp),%rbp + movl 60(%rsp),%r13d + movl 48(%rsp),%r15d + + movl %r13d,%r12d + rorl $11,%r13d + addl %r14d,%ecx + movl %r15d,%r14d + rorl $2,%r15d + + xorl %r12d,%r13d + shrl $3,%r12d + rorl $7,%r13d + xorl %r14d,%r15d + shrl $10,%r14d + + rorl $17,%r15d + xorl %r13d,%r12d + xorl %r14d,%r15d + addl 28(%rsp),%r12d + + addl 56(%rsp),%r12d + movl %r10d,%r13d + addl %r15d,%r12d + movl %ecx,%r14d + rorl $14,%r13d + movl %r11d,%r15d + + xorl %r10d,%r13d + rorl $9,%r14d + xorl %eax,%r15d + + movl %r12d,56(%rsp) + xorl %ecx,%r14d + andl %r10d,%r15d + + rorl $5,%r13d + addl %ebx,%r12d + xorl %eax,%r15d + + rorl $11,%r14d + xorl %r10d,%r13d + addl %r15d,%r12d + + movl %ecx,%r15d + addl (%rbp),%r12d + xorl %ecx,%r14d + + xorl %edx,%r15d + rorl $6,%r13d + movl %edx,%ebx + + andl %r15d,%edi + rorl $2,%r14d + addl %r13d,%r12d + + xorl %edi,%ebx + addl %r12d,%r9d + addl %r12d,%ebx + + leaq 4(%rbp),%rbp + movl 0(%rsp),%r13d + movl 52(%rsp),%edi + + movl %r13d,%r12d + rorl $11,%r13d + addl %r14d,%ebx + movl %edi,%r14d + rorl $2,%edi + + xorl %r12d,%r13d + shrl $3,%r12d + rorl $7,%r13d + xorl %r14d,%edi + shrl $10,%r14d + + rorl $17,%edi + xorl %r13d,%r12d + xorl %r14d,%edi + addl 32(%rsp),%r12d + + addl 60(%rsp),%r12d + movl %r9d,%r13d + addl %edi,%r12d + movl %ebx,%r14d + rorl $14,%r13d + movl %r10d,%edi + + xorl %r9d,%r13d + rorl $9,%r14d + xorl %r11d,%edi + + movl %r12d,60(%rsp) + xorl %ebx,%r14d + andl %r9d,%edi + + rorl $5,%r13d + addl %eax,%r12d + xorl %r11d,%edi + + rorl $11,%r14d + xorl %r9d,%r13d + addl %edi,%r12d + + movl %ebx,%edi + addl (%rbp),%r12d + xorl %ebx,%r14d + + xorl %ecx,%edi + rorl $6,%r13d + movl %ecx,%eax + + andl %edi,%r15d + rorl $2,%r14d + addl %r13d,%r12d + + xorl %r15d,%eax + addl %r12d,%r8d + addl %r12d,%eax + + leaq 
20(%rbp),%rbp + cmpb $0,3(%rbp) + jnz .Lrounds_16_xx + + movq 64+0(%rsp),%rdi + addl %r14d,%eax + leaq 64(%rsi),%rsi + + addl 0(%rdi),%eax + addl 4(%rdi),%ebx + addl 8(%rdi),%ecx + addl 12(%rdi),%edx + addl 16(%rdi),%r8d + addl 20(%rdi),%r9d + addl 24(%rdi),%r10d + addl 28(%rdi),%r11d + + cmpq 64+16(%rsp),%rsi + + movl %eax,0(%rdi) + movl %ebx,4(%rdi) + movl %ecx,8(%rdi) + movl %edx,12(%rdi) + movl %r8d,16(%rdi) + movl %r9d,20(%rdi) + movl %r10d,24(%rdi) + movl %r11d,28(%rdi) + jb .Lloop + + movq 88(%rsp),%rsi +.cfi_def_cfa %rsi,8 + movq -48(%rsi),%r15 +.cfi_restore %r15 + movq -40(%rsi),%r14 +.cfi_restore %r14 + movq -32(%rsi),%r13 +.cfi_restore %r13 + movq -24(%rsi),%r12 +.cfi_restore %r12 + movq -16(%rsi),%rbp +.cfi_restore %rbp + movq -8(%rsi),%rbx +.cfi_restore %rbx + leaq (%rsi),%rsp +.cfi_def_cfa_register %rsp +.Lepilogue: + .byte 0xf3,0xc3 +.cfi_endproc +.size sha256_block_data_order,.-sha256_block_data_order +.align 64 +.type K256,@object +K256: +.long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 +.long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 +.long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 +.long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 +.long 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 +.long 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 +.long 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 +.long 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 +.long 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc +.long 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc +.long 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da +.long 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da +.long 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 +.long 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 +.long 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 +.long 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 +.long 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 +.long 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 +.long 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 +.long 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 +.long 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 +.long 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 +.long 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 +.long 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 +.long 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 +.long 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 +.long 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 +.long 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 +.long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 +.long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 +.long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 +.long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 + +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f +.long 0x03020100,0x0b0a0908,0xffffffff,0xffffffff +.long 0x03020100,0x0b0a0908,0xffffffff,0xffffffff +.long 0xffffffff,0xffffffff,0x03020100,0x0b0a0908 +.long 0xffffffff,0xffffffff,0x03020100,0x0b0a0908 +.byte 83,72,65,50,53,54,32,98,108,111,99,107,32,116,114,97,110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0 +.type sha256_block_data_order_shaext,@function +.align 64 +sha256_block_data_order_shaext: +_shaext_shortcut: +.cfi_startproc + leaq K256+128(%rip),%rcx + movdqu (%rdi),%xmm1 + movdqu 16(%rdi),%xmm2 + movdqa 512-128(%rcx),%xmm7 + + pshufd $0x1b,%xmm1,%xmm0 + pshufd $0xb1,%xmm1,%xmm1 + pshufd $0x1b,%xmm2,%xmm2 + movdqa %xmm7,%xmm8 +.byte 102,15,58,15,202,8 + punpcklqdq %xmm0,%xmm2 + jmp .Loop_shaext 
+ +.align 16 +.Loop_shaext: + movdqu (%rsi),%xmm3 + movdqu 16(%rsi),%xmm4 + movdqu 32(%rsi),%xmm5 +.byte 102,15,56,0,223 + movdqu 48(%rsi),%xmm6 + + movdqa 0-128(%rcx),%xmm0 + paddd %xmm3,%xmm0 +.byte 102,15,56,0,231 + movdqa %xmm2,%xmm10 +.byte 15,56,203,209 + pshufd $0x0e,%xmm0,%xmm0 + nop + movdqa %xmm1,%xmm9 +.byte 15,56,203,202 + + movdqa 32-128(%rcx),%xmm0 + paddd %xmm4,%xmm0 +.byte 102,15,56,0,239 +.byte 15,56,203,209 + pshufd $0x0e,%xmm0,%xmm0 + leaq 64(%rsi),%rsi +.byte 15,56,204,220 +.byte 15,56,203,202 + + movdqa 64-128(%rcx),%xmm0 + paddd %xmm5,%xmm0 +.byte 102,15,56,0,247 +.byte 15,56,203,209 + pshufd $0x0e,%xmm0,%xmm0 + movdqa %xmm6,%xmm7 +.byte 102,15,58,15,253,4 + nop + paddd %xmm7,%xmm3 +.byte 15,56,204,229 +.byte 15,56,203,202 + + movdqa 96-128(%rcx),%xmm0 + paddd %xmm6,%xmm0 +.byte 15,56,205,222 +.byte 15,56,203,209 + pshufd $0x0e,%xmm0,%xmm0 + movdqa %xmm3,%xmm7 +.byte 102,15,58,15,254,4 + nop + paddd %xmm7,%xmm4 +.byte 15,56,204,238 +.byte 15,56,203,202 + movdqa 128-128(%rcx),%xmm0 + paddd %xmm3,%xmm0 +.byte 15,56,205,227 +.byte 15,56,203,209 + pshufd $0x0e,%xmm0,%xmm0 + movdqa %xmm4,%xmm7 +.byte 102,15,58,15,251,4 + nop + paddd %xmm7,%xmm5 +.byte 15,56,204,243 +.byte 15,56,203,202 + movdqa 160-128(%rcx),%xmm0 + paddd %xmm4,%xmm0 +.byte 15,56,205,236 +.byte 15,56,203,209 + pshufd $0x0e,%xmm0,%xmm0 + movdqa %xmm5,%xmm7 +.byte 102,15,58,15,252,4 + nop + paddd %xmm7,%xmm6 +.byte 15,56,204,220 +.byte 15,56,203,202 + movdqa 192-128(%rcx),%xmm0 + paddd %xmm5,%xmm0 +.byte 15,56,205,245 +.byte 15,56,203,209 + pshufd $0x0e,%xmm0,%xmm0 + movdqa %xmm6,%xmm7 +.byte 102,15,58,15,253,4 + nop + paddd %xmm7,%xmm3 +.byte 15,56,204,229 +.byte 15,56,203,202 + movdqa 224-128(%rcx),%xmm0 + paddd %xmm6,%xmm0 +.byte 15,56,205,222 +.byte 15,56,203,209 + pshufd $0x0e,%xmm0,%xmm0 + movdqa %xmm3,%xmm7 +.byte 102,15,58,15,254,4 + nop + paddd %xmm7,%xmm4 +.byte 15,56,204,238 +.byte 15,56,203,202 + movdqa 256-128(%rcx),%xmm0 + paddd %xmm3,%xmm0 +.byte 15,56,205,227 +.byte 15,56,203,209 + pshufd $0x0e,%xmm0,%xmm0 + movdqa %xmm4,%xmm7 +.byte 102,15,58,15,251,4 + nop + paddd %xmm7,%xmm5 +.byte 15,56,204,243 +.byte 15,56,203,202 + movdqa 288-128(%rcx),%xmm0 + paddd %xmm4,%xmm0 +.byte 15,56,205,236 +.byte 15,56,203,209 + pshufd $0x0e,%xmm0,%xmm0 + movdqa %xmm5,%xmm7 +.byte 102,15,58,15,252,4 + nop + paddd %xmm7,%xmm6 +.byte 15,56,204,220 +.byte 15,56,203,202 + movdqa 320-128(%rcx),%xmm0 + paddd %xmm5,%xmm0 +.byte 15,56,205,245 +.byte 15,56,203,209 + pshufd $0x0e,%xmm0,%xmm0 + movdqa %xmm6,%xmm7 +.byte 102,15,58,15,253,4 + nop + paddd %xmm7,%xmm3 +.byte 15,56,204,229 +.byte 15,56,203,202 + movdqa 352-128(%rcx),%xmm0 + paddd %xmm6,%xmm0 +.byte 15,56,205,222 +.byte 15,56,203,209 + pshufd $0x0e,%xmm0,%xmm0 + movdqa %xmm3,%xmm7 +.byte 102,15,58,15,254,4 + nop + paddd %xmm7,%xmm4 +.byte 15,56,204,238 +.byte 15,56,203,202 + movdqa 384-128(%rcx),%xmm0 + paddd %xmm3,%xmm0 +.byte 15,56,205,227 +.byte 15,56,203,209 + pshufd $0x0e,%xmm0,%xmm0 + movdqa %xmm4,%xmm7 +.byte 102,15,58,15,251,4 + nop + paddd %xmm7,%xmm5 +.byte 15,56,204,243 +.byte 15,56,203,202 + movdqa 416-128(%rcx),%xmm0 + paddd %xmm4,%xmm0 +.byte 15,56,205,236 +.byte 15,56,203,209 + pshufd $0x0e,%xmm0,%xmm0 + movdqa %xmm5,%xmm7 +.byte 102,15,58,15,252,4 +.byte 15,56,203,202 + paddd %xmm7,%xmm6 + + movdqa 448-128(%rcx),%xmm0 + paddd %xmm5,%xmm0 +.byte 15,56,203,209 + pshufd $0x0e,%xmm0,%xmm0 +.byte 15,56,205,245 + movdqa %xmm8,%xmm7 +.byte 15,56,203,202 + + movdqa 480-128(%rcx),%xmm0 + paddd %xmm6,%xmm0 + nop +.byte 15,56,203,209 + pshufd 
$0x0e,%xmm0,%xmm0 + decq %rdx + nop +.byte 15,56,203,202 + + paddd %xmm10,%xmm2 + paddd %xmm9,%xmm1 + jnz .Loop_shaext + + pshufd $0xb1,%xmm2,%xmm2 + pshufd $0x1b,%xmm1,%xmm7 + pshufd $0xb1,%xmm1,%xmm1 + punpckhqdq %xmm2,%xmm1 +.byte 102,15,58,15,215,8 + + movdqu %xmm1,(%rdi) + movdqu %xmm2,16(%rdi) + .byte 0xf3,0xc3 +.cfi_endproc +.size sha256_block_data_order_shaext,.-sha256_block_data_order_shaext +.type sha256_block_data_order_ssse3,@function +.align 64 +sha256_block_data_order_ssse3: +.cfi_startproc +.Lssse3_shortcut: + movq %rsp,%rax +.cfi_def_cfa_register %rax + pushq %rbx +.cfi_offset %rbx,-16 + pushq %rbp +.cfi_offset %rbp,-24 + pushq %r12 +.cfi_offset %r12,-32 + pushq %r13 +.cfi_offset %r13,-40 + pushq %r14 +.cfi_offset %r14,-48 + pushq %r15 +.cfi_offset %r15,-56 + shlq $4,%rdx + subq $96,%rsp + leaq (%rsi,%rdx,4),%rdx + andq $-64,%rsp + movq %rdi,64+0(%rsp) + movq %rsi,64+8(%rsp) + movq %rdx,64+16(%rsp) + movq %rax,88(%rsp) +.cfi_escape 0x0f,0x06,0x77,0xd8,0x00,0x06,0x23,0x08 +.Lprologue_ssse3: + + movl 0(%rdi),%eax + movl 4(%rdi),%ebx + movl 8(%rdi),%ecx + movl 12(%rdi),%edx + movl 16(%rdi),%r8d + movl 20(%rdi),%r9d + movl 24(%rdi),%r10d + movl 28(%rdi),%r11d + + + jmp .Lloop_ssse3 +.align 16 +.Lloop_ssse3: + movdqa K256+512(%rip),%xmm7 + movdqu 0(%rsi),%xmm0 + movdqu 16(%rsi),%xmm1 + movdqu 32(%rsi),%xmm2 +.byte 102,15,56,0,199 + movdqu 48(%rsi),%xmm3 + leaq K256(%rip),%rbp +.byte 102,15,56,0,207 + movdqa 0(%rbp),%xmm4 + movdqa 32(%rbp),%xmm5 +.byte 102,15,56,0,215 + paddd %xmm0,%xmm4 + movdqa 64(%rbp),%xmm6 +.byte 102,15,56,0,223 + movdqa 96(%rbp),%xmm7 + paddd %xmm1,%xmm5 + paddd %xmm2,%xmm6 + paddd %xmm3,%xmm7 + movdqa %xmm4,0(%rsp) + movl %eax,%r14d + movdqa %xmm5,16(%rsp) + movl %ebx,%edi + movdqa %xmm6,32(%rsp) + xorl %ecx,%edi + movdqa %xmm7,48(%rsp) + movl %r8d,%r13d + jmp .Lssse3_00_47 + +.align 16 +.Lssse3_00_47: + subq $-128,%rbp + rorl $14,%r13d + movdqa %xmm1,%xmm4 + movl %r14d,%eax + movl %r9d,%r12d + movdqa %xmm3,%xmm7 + rorl $9,%r14d + xorl %r8d,%r13d + xorl %r10d,%r12d + rorl $5,%r13d + xorl %eax,%r14d +.byte 102,15,58,15,224,4 + andl %r8d,%r12d + xorl %r8d,%r13d +.byte 102,15,58,15,250,4 + addl 0(%rsp),%r11d + movl %eax,%r15d + xorl %r10d,%r12d + rorl $11,%r14d + movdqa %xmm4,%xmm5 + xorl %ebx,%r15d + addl %r12d,%r11d + movdqa %xmm4,%xmm6 + rorl $6,%r13d + andl %r15d,%edi + psrld $3,%xmm4 + xorl %eax,%r14d + addl %r13d,%r11d + xorl %ebx,%edi + paddd %xmm7,%xmm0 + rorl $2,%r14d + addl %r11d,%edx + psrld $7,%xmm6 + addl %edi,%r11d + movl %edx,%r13d + pshufd $250,%xmm3,%xmm7 + addl %r11d,%r14d + rorl $14,%r13d + pslld $14,%xmm5 + movl %r14d,%r11d + movl %r8d,%r12d + pxor %xmm6,%xmm4 + rorl $9,%r14d + xorl %edx,%r13d + xorl %r9d,%r12d + rorl $5,%r13d + psrld $11,%xmm6 + xorl %r11d,%r14d + pxor %xmm5,%xmm4 + andl %edx,%r12d + xorl %edx,%r13d + pslld $11,%xmm5 + addl 4(%rsp),%r10d + movl %r11d,%edi + pxor %xmm6,%xmm4 + xorl %r9d,%r12d + rorl $11,%r14d + movdqa %xmm7,%xmm6 + xorl %eax,%edi + addl %r12d,%r10d + pxor %xmm5,%xmm4 + rorl $6,%r13d + andl %edi,%r15d + xorl %r11d,%r14d + psrld $10,%xmm7 + addl %r13d,%r10d + xorl %eax,%r15d + paddd %xmm4,%xmm0 + rorl $2,%r14d + addl %r10d,%ecx + psrlq $17,%xmm6 + addl %r15d,%r10d + movl %ecx,%r13d + addl %r10d,%r14d + pxor %xmm6,%xmm7 + rorl $14,%r13d + movl %r14d,%r10d + movl %edx,%r12d + rorl $9,%r14d + psrlq $2,%xmm6 + xorl %ecx,%r13d + xorl %r8d,%r12d + pxor %xmm6,%xmm7 + rorl $5,%r13d + xorl %r10d,%r14d + andl %ecx,%r12d + pshufd $128,%xmm7,%xmm7 + xorl %ecx,%r13d + addl 8(%rsp),%r9d + movl %r10d,%r15d + psrldq 
$8,%xmm7 + xorl %r8d,%r12d + rorl $11,%r14d + xorl %r11d,%r15d + addl %r12d,%r9d + rorl $6,%r13d + paddd %xmm7,%xmm0 + andl %r15d,%edi + xorl %r10d,%r14d + addl %r13d,%r9d + pshufd $80,%xmm0,%xmm7 + xorl %r11d,%edi + rorl $2,%r14d + addl %r9d,%ebx + movdqa %xmm7,%xmm6 + addl %edi,%r9d + movl %ebx,%r13d + psrld $10,%xmm7 + addl %r9d,%r14d + rorl $14,%r13d + psrlq $17,%xmm6 + movl %r14d,%r9d + movl %ecx,%r12d + pxor %xmm6,%xmm7 + rorl $9,%r14d + xorl %ebx,%r13d + xorl %edx,%r12d + rorl $5,%r13d + xorl %r9d,%r14d + psrlq $2,%xmm6 + andl %ebx,%r12d + xorl %ebx,%r13d + addl 12(%rsp),%r8d + pxor %xmm6,%xmm7 + movl %r9d,%edi + xorl %edx,%r12d + rorl $11,%r14d + pshufd $8,%xmm7,%xmm7 + xorl %r10d,%edi + addl %r12d,%r8d + movdqa 0(%rbp),%xmm6 + rorl $6,%r13d + andl %edi,%r15d + pslldq $8,%xmm7 + xorl %r9d,%r14d + addl %r13d,%r8d + xorl %r10d,%r15d + paddd %xmm7,%xmm0 + rorl $2,%r14d + addl %r8d,%eax + addl %r15d,%r8d + paddd %xmm0,%xmm6 + movl %eax,%r13d + addl %r8d,%r14d + movdqa %xmm6,0(%rsp) + rorl $14,%r13d + movdqa %xmm2,%xmm4 + movl %r14d,%r8d + movl %ebx,%r12d + movdqa %xmm0,%xmm7 + rorl $9,%r14d + xorl %eax,%r13d + xorl %ecx,%r12d + rorl $5,%r13d + xorl %r8d,%r14d +.byte 102,15,58,15,225,4 + andl %eax,%r12d + xorl %eax,%r13d +.byte 102,15,58,15,251,4 + addl 16(%rsp),%edx + movl %r8d,%r15d + xorl %ecx,%r12d + rorl $11,%r14d + movdqa %xmm4,%xmm5 + xorl %r9d,%r15d + addl %r12d,%edx + movdqa %xmm4,%xmm6 + rorl $6,%r13d + andl %r15d,%edi + psrld $3,%xmm4 + xorl %r8d,%r14d + addl %r13d,%edx + xorl %r9d,%edi + paddd %xmm7,%xmm1 + rorl $2,%r14d + addl %edx,%r11d + psrld $7,%xmm6 + addl %edi,%edx + movl %r11d,%r13d + pshufd $250,%xmm0,%xmm7 + addl %edx,%r14d + rorl $14,%r13d + pslld $14,%xmm5 + movl %r14d,%edx + movl %eax,%r12d + pxor %xmm6,%xmm4 + rorl $9,%r14d + xorl %r11d,%r13d + xorl %ebx,%r12d + rorl $5,%r13d + psrld $11,%xmm6 + xorl %edx,%r14d + pxor %xmm5,%xmm4 + andl %r11d,%r12d + xorl %r11d,%r13d + pslld $11,%xmm5 + addl 20(%rsp),%ecx + movl %edx,%edi + pxor %xmm6,%xmm4 + xorl %ebx,%r12d + rorl $11,%r14d + movdqa %xmm7,%xmm6 + xorl %r8d,%edi + addl %r12d,%ecx + pxor %xmm5,%xmm4 + rorl $6,%r13d + andl %edi,%r15d + xorl %edx,%r14d + psrld $10,%xmm7 + addl %r13d,%ecx + xorl %r8d,%r15d + paddd %xmm4,%xmm1 + rorl $2,%r14d + addl %ecx,%r10d + psrlq $17,%xmm6 + addl %r15d,%ecx + movl %r10d,%r13d + addl %ecx,%r14d + pxor %xmm6,%xmm7 + rorl $14,%r13d + movl %r14d,%ecx + movl %r11d,%r12d + rorl $9,%r14d + psrlq $2,%xmm6 + xorl %r10d,%r13d + xorl %eax,%r12d + pxor %xmm6,%xmm7 + rorl $5,%r13d + xorl %ecx,%r14d + andl %r10d,%r12d + pshufd $128,%xmm7,%xmm7 + xorl %r10d,%r13d + addl 24(%rsp),%ebx + movl %ecx,%r15d + psrldq $8,%xmm7 + xorl %eax,%r12d + rorl $11,%r14d + xorl %edx,%r15d + addl %r12d,%ebx + rorl $6,%r13d + paddd %xmm7,%xmm1 + andl %r15d,%edi + xorl %ecx,%r14d + addl %r13d,%ebx + pshufd $80,%xmm1,%xmm7 + xorl %edx,%edi + rorl $2,%r14d + addl %ebx,%r9d + movdqa %xmm7,%xmm6 + addl %edi,%ebx + movl %r9d,%r13d + psrld $10,%xmm7 + addl %ebx,%r14d + rorl $14,%r13d + psrlq $17,%xmm6 + movl %r14d,%ebx + movl %r10d,%r12d + pxor %xmm6,%xmm7 + rorl $9,%r14d + xorl %r9d,%r13d + xorl %r11d,%r12d + rorl $5,%r13d + xorl %ebx,%r14d + psrlq $2,%xmm6 + andl %r9d,%r12d + xorl %r9d,%r13d + addl 28(%rsp),%eax + pxor %xmm6,%xmm7 + movl %ebx,%edi + xorl %r11d,%r12d + rorl $11,%r14d + pshufd $8,%xmm7,%xmm7 + xorl %ecx,%edi + addl %r12d,%eax + movdqa 32(%rbp),%xmm6 + rorl $6,%r13d + andl %edi,%r15d + pslldq $8,%xmm7 + xorl %ebx,%r14d + addl %r13d,%eax + xorl %ecx,%r15d + paddd %xmm7,%xmm1 + rorl $2,%r14d + addl 
%eax,%r8d + addl %r15d,%eax + paddd %xmm1,%xmm6 + movl %r8d,%r13d + addl %eax,%r14d + movdqa %xmm6,16(%rsp) + rorl $14,%r13d + movdqa %xmm3,%xmm4 + movl %r14d,%eax + movl %r9d,%r12d + movdqa %xmm1,%xmm7 + rorl $9,%r14d + xorl %r8d,%r13d + xorl %r10d,%r12d + rorl $5,%r13d + xorl %eax,%r14d +.byte 102,15,58,15,226,4 + andl %r8d,%r12d + xorl %r8d,%r13d +.byte 102,15,58,15,248,4 + addl 32(%rsp),%r11d + movl %eax,%r15d + xorl %r10d,%r12d + rorl $11,%r14d + movdqa %xmm4,%xmm5 + xorl %ebx,%r15d + addl %r12d,%r11d + movdqa %xmm4,%xmm6 + rorl $6,%r13d + andl %r15d,%edi + psrld $3,%xmm4 + xorl %eax,%r14d + addl %r13d,%r11d + xorl %ebx,%edi + paddd %xmm7,%xmm2 + rorl $2,%r14d + addl %r11d,%edx + psrld $7,%xmm6 + addl %edi,%r11d + movl %edx,%r13d + pshufd $250,%xmm1,%xmm7 + addl %r11d,%r14d + rorl $14,%r13d + pslld $14,%xmm5 + movl %r14d,%r11d + movl %r8d,%r12d + pxor %xmm6,%xmm4 + rorl $9,%r14d + xorl %edx,%r13d + xorl %r9d,%r12d + rorl $5,%r13d + psrld $11,%xmm6 + xorl %r11d,%r14d + pxor %xmm5,%xmm4 + andl %edx,%r12d + xorl %edx,%r13d + pslld $11,%xmm5 + addl 36(%rsp),%r10d + movl %r11d,%edi + pxor %xmm6,%xmm4 + xorl %r9d,%r12d + rorl $11,%r14d + movdqa %xmm7,%xmm6 + xorl %eax,%edi + addl %r12d,%r10d + pxor %xmm5,%xmm4 + rorl $6,%r13d + andl %edi,%r15d + xorl %r11d,%r14d + psrld $10,%xmm7 + addl %r13d,%r10d + xorl %eax,%r15d + paddd %xmm4,%xmm2 + rorl $2,%r14d + addl %r10d,%ecx + psrlq $17,%xmm6 + addl %r15d,%r10d + movl %ecx,%r13d + addl %r10d,%r14d + pxor %xmm6,%xmm7 + rorl $14,%r13d + movl %r14d,%r10d + movl %edx,%r12d + rorl $9,%r14d + psrlq $2,%xmm6 + xorl %ecx,%r13d + xorl %r8d,%r12d + pxor %xmm6,%xmm7 + rorl $5,%r13d + xorl %r10d,%r14d + andl %ecx,%r12d + pshufd $128,%xmm7,%xmm7 + xorl %ecx,%r13d + addl 40(%rsp),%r9d + movl %r10d,%r15d + psrldq $8,%xmm7 + xorl %r8d,%r12d + rorl $11,%r14d + xorl %r11d,%r15d + addl %r12d,%r9d + rorl $6,%r13d + paddd %xmm7,%xmm2 + andl %r15d,%edi + xorl %r10d,%r14d + addl %r13d,%r9d + pshufd $80,%xmm2,%xmm7 + xorl %r11d,%edi + rorl $2,%r14d + addl %r9d,%ebx + movdqa %xmm7,%xmm6 + addl %edi,%r9d + movl %ebx,%r13d + psrld $10,%xmm7 + addl %r9d,%r14d + rorl $14,%r13d + psrlq $17,%xmm6 + movl %r14d,%r9d + movl %ecx,%r12d + pxor %xmm6,%xmm7 + rorl $9,%r14d + xorl %ebx,%r13d + xorl %edx,%r12d + rorl $5,%r13d + xorl %r9d,%r14d + psrlq $2,%xmm6 + andl %ebx,%r12d + xorl %ebx,%r13d + addl 44(%rsp),%r8d + pxor %xmm6,%xmm7 + movl %r9d,%edi + xorl %edx,%r12d + rorl $11,%r14d + pshufd $8,%xmm7,%xmm7 + xorl %r10d,%edi + addl %r12d,%r8d + movdqa 64(%rbp),%xmm6 + rorl $6,%r13d + andl %edi,%r15d + pslldq $8,%xmm7 + xorl %r9d,%r14d + addl %r13d,%r8d + xorl %r10d,%r15d + paddd %xmm7,%xmm2 + rorl $2,%r14d + addl %r8d,%eax + addl %r15d,%r8d + paddd %xmm2,%xmm6 + movl %eax,%r13d + addl %r8d,%r14d + movdqa %xmm6,32(%rsp) + rorl $14,%r13d + movdqa %xmm0,%xmm4 + movl %r14d,%r8d + movl %ebx,%r12d + movdqa %xmm2,%xmm7 + rorl $9,%r14d + xorl %eax,%r13d + xorl %ecx,%r12d + rorl $5,%r13d + xorl %r8d,%r14d +.byte 102,15,58,15,227,4 + andl %eax,%r12d + xorl %eax,%r13d +.byte 102,15,58,15,249,4 + addl 48(%rsp),%edx + movl %r8d,%r15d + xorl %ecx,%r12d + rorl $11,%r14d + movdqa %xmm4,%xmm5 + xorl %r9d,%r15d + addl %r12d,%edx + movdqa %xmm4,%xmm6 + rorl $6,%r13d + andl %r15d,%edi + psrld $3,%xmm4 + xorl %r8d,%r14d + addl %r13d,%edx + xorl %r9d,%edi + paddd %xmm7,%xmm3 + rorl $2,%r14d + addl %edx,%r11d + psrld $7,%xmm6 + addl %edi,%edx + movl %r11d,%r13d + pshufd $250,%xmm2,%xmm7 + addl %edx,%r14d + rorl $14,%r13d + pslld $14,%xmm5 + movl %r14d,%edx + movl %eax,%r12d + pxor %xmm6,%xmm4 + rorl 
$9,%r14d + xorl %r11d,%r13d + xorl %ebx,%r12d + rorl $5,%r13d + psrld $11,%xmm6 + xorl %edx,%r14d + pxor %xmm5,%xmm4 + andl %r11d,%r12d + xorl %r11d,%r13d + pslld $11,%xmm5 + addl 52(%rsp),%ecx + movl %edx,%edi + pxor %xmm6,%xmm4 + xorl %ebx,%r12d + rorl $11,%r14d + movdqa %xmm7,%xmm6 + xorl %r8d,%edi + addl %r12d,%ecx + pxor %xmm5,%xmm4 + rorl $6,%r13d + andl %edi,%r15d + xorl %edx,%r14d + psrld $10,%xmm7 + addl %r13d,%ecx + xorl %r8d,%r15d + paddd %xmm4,%xmm3 + rorl $2,%r14d + addl %ecx,%r10d + psrlq $17,%xmm6 + addl %r15d,%ecx + movl %r10d,%r13d + addl %ecx,%r14d + pxor %xmm6,%xmm7 + rorl $14,%r13d + movl %r14d,%ecx + movl %r11d,%r12d + rorl $9,%r14d + psrlq $2,%xmm6 + xorl %r10d,%r13d + xorl %eax,%r12d + pxor %xmm6,%xmm7 + rorl $5,%r13d + xorl %ecx,%r14d + andl %r10d,%r12d + pshufd $128,%xmm7,%xmm7 + xorl %r10d,%r13d + addl 56(%rsp),%ebx + movl %ecx,%r15d + psrldq $8,%xmm7 + xorl %eax,%r12d + rorl $11,%r14d + xorl %edx,%r15d + addl %r12d,%ebx + rorl $6,%r13d + paddd %xmm7,%xmm3 + andl %r15d,%edi + xorl %ecx,%r14d + addl %r13d,%ebx + pshufd $80,%xmm3,%xmm7 + xorl %edx,%edi + rorl $2,%r14d + addl %ebx,%r9d + movdqa %xmm7,%xmm6 + addl %edi,%ebx + movl %r9d,%r13d + psrld $10,%xmm7 + addl %ebx,%r14d + rorl $14,%r13d + psrlq $17,%xmm6 + movl %r14d,%ebx + movl %r10d,%r12d + pxor %xmm6,%xmm7 + rorl $9,%r14d + xorl %r9d,%r13d + xorl %r11d,%r12d + rorl $5,%r13d + xorl %ebx,%r14d + psrlq $2,%xmm6 + andl %r9d,%r12d + xorl %r9d,%r13d + addl 60(%rsp),%eax + pxor %xmm6,%xmm7 + movl %ebx,%edi + xorl %r11d,%r12d + rorl $11,%r14d + pshufd $8,%xmm7,%xmm7 + xorl %ecx,%edi + addl %r12d,%eax + movdqa 96(%rbp),%xmm6 + rorl $6,%r13d + andl %edi,%r15d + pslldq $8,%xmm7 + xorl %ebx,%r14d + addl %r13d,%eax + xorl %ecx,%r15d + paddd %xmm7,%xmm3 + rorl $2,%r14d + addl %eax,%r8d + addl %r15d,%eax + paddd %xmm3,%xmm6 + movl %r8d,%r13d + addl %eax,%r14d + movdqa %xmm6,48(%rsp) + cmpb $0,131(%rbp) + jne .Lssse3_00_47 + rorl $14,%r13d + movl %r14d,%eax + movl %r9d,%r12d + rorl $9,%r14d + xorl %r8d,%r13d + xorl %r10d,%r12d + rorl $5,%r13d + xorl %eax,%r14d + andl %r8d,%r12d + xorl %r8d,%r13d + addl 0(%rsp),%r11d + movl %eax,%r15d + xorl %r10d,%r12d + rorl $11,%r14d + xorl %ebx,%r15d + addl %r12d,%r11d + rorl $6,%r13d + andl %r15d,%edi + xorl %eax,%r14d + addl %r13d,%r11d + xorl %ebx,%edi + rorl $2,%r14d + addl %r11d,%edx + addl %edi,%r11d + movl %edx,%r13d + addl %r11d,%r14d + rorl $14,%r13d + movl %r14d,%r11d + movl %r8d,%r12d + rorl $9,%r14d + xorl %edx,%r13d + xorl %r9d,%r12d + rorl $5,%r13d + xorl %r11d,%r14d + andl %edx,%r12d + xorl %edx,%r13d + addl 4(%rsp),%r10d + movl %r11d,%edi + xorl %r9d,%r12d + rorl $11,%r14d + xorl %eax,%edi + addl %r12d,%r10d + rorl $6,%r13d + andl %edi,%r15d + xorl %r11d,%r14d + addl %r13d,%r10d + xorl %eax,%r15d + rorl $2,%r14d + addl %r10d,%ecx + addl %r15d,%r10d + movl %ecx,%r13d + addl %r10d,%r14d + rorl $14,%r13d + movl %r14d,%r10d + movl %edx,%r12d + rorl $9,%r14d + xorl %ecx,%r13d + xorl %r8d,%r12d + rorl $5,%r13d + xorl %r10d,%r14d + andl %ecx,%r12d + xorl %ecx,%r13d + addl 8(%rsp),%r9d + movl %r10d,%r15d + xorl %r8d,%r12d + rorl $11,%r14d + xorl %r11d,%r15d + addl %r12d,%r9d + rorl $6,%r13d + andl %r15d,%edi + xorl %r10d,%r14d + addl %r13d,%r9d + xorl %r11d,%edi + rorl $2,%r14d + addl %r9d,%ebx + addl %edi,%r9d + movl %ebx,%r13d + addl %r9d,%r14d + rorl $14,%r13d + movl %r14d,%r9d + movl %ecx,%r12d + rorl $9,%r14d + xorl %ebx,%r13d + xorl %edx,%r12d + rorl $5,%r13d + xorl %r9d,%r14d + andl %ebx,%r12d + xorl %ebx,%r13d + addl 12(%rsp),%r8d + movl %r9d,%edi + xorl %edx,%r12d + rorl 
$11,%r14d + xorl %r10d,%edi + addl %r12d,%r8d + rorl $6,%r13d + andl %edi,%r15d + xorl %r9d,%r14d + addl %r13d,%r8d + xorl %r10d,%r15d + rorl $2,%r14d + addl %r8d,%eax + addl %r15d,%r8d + movl %eax,%r13d + addl %r8d,%r14d + rorl $14,%r13d + movl %r14d,%r8d + movl %ebx,%r12d + rorl $9,%r14d + xorl %eax,%r13d + xorl %ecx,%r12d + rorl $5,%r13d + xorl %r8d,%r14d + andl %eax,%r12d + xorl %eax,%r13d + addl 16(%rsp),%edx + movl %r8d,%r15d + xorl %ecx,%r12d + rorl $11,%r14d + xorl %r9d,%r15d + addl %r12d,%edx + rorl $6,%r13d + andl %r15d,%edi + xorl %r8d,%r14d + addl %r13d,%edx + xorl %r9d,%edi + rorl $2,%r14d + addl %edx,%r11d + addl %edi,%edx + movl %r11d,%r13d + addl %edx,%r14d + rorl $14,%r13d + movl %r14d,%edx + movl %eax,%r12d + rorl $9,%r14d + xorl %r11d,%r13d + xorl %ebx,%r12d + rorl $5,%r13d + xorl %edx,%r14d + andl %r11d,%r12d + xorl %r11d,%r13d + addl 20(%rsp),%ecx + movl %edx,%edi + xorl %ebx,%r12d + rorl $11,%r14d + xorl %r8d,%edi + addl %r12d,%ecx + rorl $6,%r13d + andl %edi,%r15d + xorl %edx,%r14d + addl %r13d,%ecx + xorl %r8d,%r15d + rorl $2,%r14d + addl %ecx,%r10d + addl %r15d,%ecx + movl %r10d,%r13d + addl %ecx,%r14d + rorl $14,%r13d + movl %r14d,%ecx + movl %r11d,%r12d + rorl $9,%r14d + xorl %r10d,%r13d + xorl %eax,%r12d + rorl $5,%r13d + xorl %ecx,%r14d + andl %r10d,%r12d + xorl %r10d,%r13d + addl 24(%rsp),%ebx + movl %ecx,%r15d + xorl %eax,%r12d + rorl $11,%r14d + xorl %edx,%r15d + addl %r12d,%ebx + rorl $6,%r13d + andl %r15d,%edi + xorl %ecx,%r14d + addl %r13d,%ebx + xorl %edx,%edi + rorl $2,%r14d + addl %ebx,%r9d + addl %edi,%ebx + movl %r9d,%r13d + addl %ebx,%r14d + rorl $14,%r13d + movl %r14d,%ebx + movl %r10d,%r12d + rorl $9,%r14d + xorl %r9d,%r13d + xorl %r11d,%r12d + rorl $5,%r13d + xorl %ebx,%r14d + andl %r9d,%r12d + xorl %r9d,%r13d + addl 28(%rsp),%eax + movl %ebx,%edi + xorl %r11d,%r12d + rorl $11,%r14d + xorl %ecx,%edi + addl %r12d,%eax + rorl $6,%r13d + andl %edi,%r15d + xorl %ebx,%r14d + addl %r13d,%eax + xorl %ecx,%r15d + rorl $2,%r14d + addl %eax,%r8d + addl %r15d,%eax + movl %r8d,%r13d + addl %eax,%r14d + rorl $14,%r13d + movl %r14d,%eax + movl %r9d,%r12d + rorl $9,%r14d + xorl %r8d,%r13d + xorl %r10d,%r12d + rorl $5,%r13d + xorl %eax,%r14d + andl %r8d,%r12d + xorl %r8d,%r13d + addl 32(%rsp),%r11d + movl %eax,%r15d + xorl %r10d,%r12d + rorl $11,%r14d + xorl %ebx,%r15d + addl %r12d,%r11d + rorl $6,%r13d + andl %r15d,%edi + xorl %eax,%r14d + addl %r13d,%r11d + xorl %ebx,%edi + rorl $2,%r14d + addl %r11d,%edx + addl %edi,%r11d + movl %edx,%r13d + addl %r11d,%r14d + rorl $14,%r13d + movl %r14d,%r11d + movl %r8d,%r12d + rorl $9,%r14d + xorl %edx,%r13d + xorl %r9d,%r12d + rorl $5,%r13d + xorl %r11d,%r14d + andl %edx,%r12d + xorl %edx,%r13d + addl 36(%rsp),%r10d + movl %r11d,%edi + xorl %r9d,%r12d + rorl $11,%r14d + xorl %eax,%edi + addl %r12d,%r10d + rorl $6,%r13d + andl %edi,%r15d + xorl %r11d,%r14d + addl %r13d,%r10d + xorl %eax,%r15d + rorl $2,%r14d + addl %r10d,%ecx + addl %r15d,%r10d + movl %ecx,%r13d + addl %r10d,%r14d + rorl $14,%r13d + movl %r14d,%r10d + movl %edx,%r12d + rorl $9,%r14d + xorl %ecx,%r13d + xorl %r8d,%r12d + rorl $5,%r13d + xorl %r10d,%r14d + andl %ecx,%r12d + xorl %ecx,%r13d + addl 40(%rsp),%r9d + movl %r10d,%r15d + xorl %r8d,%r12d + rorl $11,%r14d + xorl %r11d,%r15d + addl %r12d,%r9d + rorl $6,%r13d + andl %r15d,%edi + xorl %r10d,%r14d + addl %r13d,%r9d + xorl %r11d,%edi + rorl $2,%r14d + addl %r9d,%ebx + addl %edi,%r9d + movl %ebx,%r13d + addl %r9d,%r14d + rorl $14,%r13d + movl %r14d,%r9d + movl %ecx,%r12d + rorl $9,%r14d + xorl %ebx,%r13d + 
xorl %edx,%r12d + rorl $5,%r13d + xorl %r9d,%r14d + andl %ebx,%r12d + xorl %ebx,%r13d + addl 44(%rsp),%r8d + movl %r9d,%edi + xorl %edx,%r12d + rorl $11,%r14d + xorl %r10d,%edi + addl %r12d,%r8d + rorl $6,%r13d + andl %edi,%r15d + xorl %r9d,%r14d + addl %r13d,%r8d + xorl %r10d,%r15d + rorl $2,%r14d + addl %r8d,%eax + addl %r15d,%r8d + movl %eax,%r13d + addl %r8d,%r14d + rorl $14,%r13d + movl %r14d,%r8d + movl %ebx,%r12d + rorl $9,%r14d + xorl %eax,%r13d + xorl %ecx,%r12d + rorl $5,%r13d + xorl %r8d,%r14d + andl %eax,%r12d + xorl %eax,%r13d + addl 48(%rsp),%edx + movl %r8d,%r15d + xorl %ecx,%r12d + rorl $11,%r14d + xorl %r9d,%r15d + addl %r12d,%edx + rorl $6,%r13d + andl %r15d,%edi + xorl %r8d,%r14d + addl %r13d,%edx + xorl %r9d,%edi + rorl $2,%r14d + addl %edx,%r11d + addl %edi,%edx + movl %r11d,%r13d + addl %edx,%r14d + rorl $14,%r13d + movl %r14d,%edx + movl %eax,%r12d + rorl $9,%r14d + xorl %r11d,%r13d + xorl %ebx,%r12d + rorl $5,%r13d + xorl %edx,%r14d + andl %r11d,%r12d + xorl %r11d,%r13d + addl 52(%rsp),%ecx + movl %edx,%edi + xorl %ebx,%r12d + rorl $11,%r14d + xorl %r8d,%edi + addl %r12d,%ecx + rorl $6,%r13d + andl %edi,%r15d + xorl %edx,%r14d + addl %r13d,%ecx + xorl %r8d,%r15d + rorl $2,%r14d + addl %ecx,%r10d + addl %r15d,%ecx + movl %r10d,%r13d + addl %ecx,%r14d + rorl $14,%r13d + movl %r14d,%ecx + movl %r11d,%r12d + rorl $9,%r14d + xorl %r10d,%r13d + xorl %eax,%r12d + rorl $5,%r13d + xorl %ecx,%r14d + andl %r10d,%r12d + xorl %r10d,%r13d + addl 56(%rsp),%ebx + movl %ecx,%r15d + xorl %eax,%r12d + rorl $11,%r14d + xorl %edx,%r15d + addl %r12d,%ebx + rorl $6,%r13d + andl %r15d,%edi + xorl %ecx,%r14d + addl %r13d,%ebx + xorl %edx,%edi + rorl $2,%r14d + addl %ebx,%r9d + addl %edi,%ebx + movl %r9d,%r13d + addl %ebx,%r14d + rorl $14,%r13d + movl %r14d,%ebx + movl %r10d,%r12d + rorl $9,%r14d + xorl %r9d,%r13d + xorl %r11d,%r12d + rorl $5,%r13d + xorl %ebx,%r14d + andl %r9d,%r12d + xorl %r9d,%r13d + addl 60(%rsp),%eax + movl %ebx,%edi + xorl %r11d,%r12d + rorl $11,%r14d + xorl %ecx,%edi + addl %r12d,%eax + rorl $6,%r13d + andl %edi,%r15d + xorl %ebx,%r14d + addl %r13d,%eax + xorl %ecx,%r15d + rorl $2,%r14d + addl %eax,%r8d + addl %r15d,%eax + movl %r8d,%r13d + addl %eax,%r14d + movq 64+0(%rsp),%rdi + movl %r14d,%eax + + addl 0(%rdi),%eax + leaq 64(%rsi),%rsi + addl 4(%rdi),%ebx + addl 8(%rdi),%ecx + addl 12(%rdi),%edx + addl 16(%rdi),%r8d + addl 20(%rdi),%r9d + addl 24(%rdi),%r10d + addl 28(%rdi),%r11d + + cmpq 64+16(%rsp),%rsi + + movl %eax,0(%rdi) + movl %ebx,4(%rdi) + movl %ecx,8(%rdi) + movl %edx,12(%rdi) + movl %r8d,16(%rdi) + movl %r9d,20(%rdi) + movl %r10d,24(%rdi) + movl %r11d,28(%rdi) + jb .Lloop_ssse3 + + movq 88(%rsp),%rsi +.cfi_def_cfa %rsi,8 + movq -48(%rsi),%r15 +.cfi_restore %r15 + movq -40(%rsi),%r14 +.cfi_restore %r14 + movq -32(%rsi),%r13 +.cfi_restore %r13 + movq -24(%rsi),%r12 +.cfi_restore %r12 + movq -16(%rsi),%rbp +.cfi_restore %rbp + movq -8(%rsi),%rbx +.cfi_restore %rbx + leaq (%rsi),%rsp +.cfi_def_cfa_register %rsp +.Lepilogue_ssse3: + .byte 0xf3,0xc3 +.cfi_endproc +.size sha256_block_data_order_ssse3,.-sha256_block_data_order_ssse3 diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha512-x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha512-x86_64.S new file mode 100644 index 0000000000..11e67e5ba1 --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha512-x86_64.S @@ -0,0 +1,1811 @@ +# WARNING: do not edit! +# Generated from openssl/crypto/sha/asm/sha512-x86_64.pl +# +# Copyright 2005-2020 The OpenSSL Project Authors. 
All Rights Reserved. +# +# Licensed under the OpenSSL license (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +.text + + +.globl sha512_block_data_order +.type sha512_block_data_order,@function +.align 16 +sha512_block_data_order: +.cfi_startproc + movq %rsp,%rax +.cfi_def_cfa_register %rax + pushq %rbx +.cfi_offset %rbx,-16 + pushq %rbp +.cfi_offset %rbp,-24 + pushq %r12 +.cfi_offset %r12,-32 + pushq %r13 +.cfi_offset %r13,-40 + pushq %r14 +.cfi_offset %r14,-48 + pushq %r15 +.cfi_offset %r15,-56 + shlq $4,%rdx + subq $128+32,%rsp + leaq (%rsi,%rdx,8),%rdx + andq $-64,%rsp + movq %rdi,128+0(%rsp) + movq %rsi,128+8(%rsp) + movq %rdx,128+16(%rsp) + movq %rax,152(%rsp) +.cfi_escape 0x0f,0x06,0x77,0x98,0x01,0x06,0x23,0x08 +.Lprologue: + + movq 0(%rdi),%rax + movq 8(%rdi),%rbx + movq 16(%rdi),%rcx + movq 24(%rdi),%rdx + movq 32(%rdi),%r8 + movq 40(%rdi),%r9 + movq 48(%rdi),%r10 + movq 56(%rdi),%r11 + jmp .Lloop + +.align 16 +.Lloop: + movq %rbx,%rdi + leaq K512(%rip),%rbp + xorq %rcx,%rdi + movq 0(%rsi),%r12 + movq %r8,%r13 + movq %rax,%r14 + bswapq %r12 + rorq $23,%r13 + movq %r9,%r15 + + xorq %r8,%r13 + rorq $5,%r14 + xorq %r10,%r15 + + movq %r12,0(%rsp) + xorq %rax,%r14 + andq %r8,%r15 + + rorq $4,%r13 + addq %r11,%r12 + xorq %r10,%r15 + + rorq $6,%r14 + xorq %r8,%r13 + addq %r15,%r12 + + movq %rax,%r15 + addq (%rbp),%r12 + xorq %rax,%r14 + + xorq %rbx,%r15 + rorq $14,%r13 + movq %rbx,%r11 + + andq %r15,%rdi + rorq $28,%r14 + addq %r13,%r12 + + xorq %rdi,%r11 + addq %r12,%rdx + addq %r12,%r11 + + leaq 8(%rbp),%rbp + addq %r14,%r11 + movq 8(%rsi),%r12 + movq %rdx,%r13 + movq %r11,%r14 + bswapq %r12 + rorq $23,%r13 + movq %r8,%rdi + + xorq %rdx,%r13 + rorq $5,%r14 + xorq %r9,%rdi + + movq %r12,8(%rsp) + xorq %r11,%r14 + andq %rdx,%rdi + + rorq $4,%r13 + addq %r10,%r12 + xorq %r9,%rdi + + rorq $6,%r14 + xorq %rdx,%r13 + addq %rdi,%r12 + + movq %r11,%rdi + addq (%rbp),%r12 + xorq %r11,%r14 + + xorq %rax,%rdi + rorq $14,%r13 + movq %rax,%r10 + + andq %rdi,%r15 + rorq $28,%r14 + addq %r13,%r12 + + xorq %r15,%r10 + addq %r12,%rcx + addq %r12,%r10 + + leaq 24(%rbp),%rbp + addq %r14,%r10 + movq 16(%rsi),%r12 + movq %rcx,%r13 + movq %r10,%r14 + bswapq %r12 + rorq $23,%r13 + movq %rdx,%r15 + + xorq %rcx,%r13 + rorq $5,%r14 + xorq %r8,%r15 + + movq %r12,16(%rsp) + xorq %r10,%r14 + andq %rcx,%r15 + + rorq $4,%r13 + addq %r9,%r12 + xorq %r8,%r15 + + rorq $6,%r14 + xorq %rcx,%r13 + addq %r15,%r12 + + movq %r10,%r15 + addq (%rbp),%r12 + xorq %r10,%r14 + + xorq %r11,%r15 + rorq $14,%r13 + movq %r11,%r9 + + andq %r15,%rdi + rorq $28,%r14 + addq %r13,%r12 + + xorq %rdi,%r9 + addq %r12,%rbx + addq %r12,%r9 + + leaq 8(%rbp),%rbp + addq %r14,%r9 + movq 24(%rsi),%r12 + movq %rbx,%r13 + movq %r9,%r14 + bswapq %r12 + rorq $23,%r13 + movq %rcx,%rdi + + xorq %rbx,%r13 + rorq $5,%r14 + xorq %rdx,%rdi + + movq %r12,24(%rsp) + xorq %r9,%r14 + andq %rbx,%rdi + + rorq $4,%r13 + addq %r8,%r12 + xorq %rdx,%rdi + + rorq $6,%r14 + xorq %rbx,%r13 + addq %rdi,%r12 + + movq %r9,%rdi + addq (%rbp),%r12 + xorq %r9,%r14 + + xorq %r10,%rdi + rorq $14,%r13 + movq %r10,%r8 + + andq %rdi,%r15 + rorq $28,%r14 + addq %r13,%r12 + + xorq %r15,%r8 + addq %r12,%rax + addq %r12,%r8 + + leaq 24(%rbp),%rbp + addq %r14,%r8 + movq 32(%rsi),%r12 + movq %rax,%r13 + movq %r8,%r14 + bswapq %r12 + rorq $23,%r13 + movq %rbx,%r15 + + xorq %rax,%r13 + rorq $5,%r14 + xorq %rcx,%r15 + + movq 
%r12,32(%rsp) + xorq %r8,%r14 + andq %rax,%r15 + + rorq $4,%r13 + addq %rdx,%r12 + xorq %rcx,%r15 + + rorq $6,%r14 + xorq %rax,%r13 + addq %r15,%r12 + + movq %r8,%r15 + addq (%rbp),%r12 + xorq %r8,%r14 + + xorq %r9,%r15 + rorq $14,%r13 + movq %r9,%rdx + + andq %r15,%rdi + rorq $28,%r14 + addq %r13,%r12 + + xorq %rdi,%rdx + addq %r12,%r11 + addq %r12,%rdx + + leaq 8(%rbp),%rbp + addq %r14,%rdx + movq 40(%rsi),%r12 + movq %r11,%r13 + movq %rdx,%r14 + bswapq %r12 + rorq $23,%r13 + movq %rax,%rdi + + xorq %r11,%r13 + rorq $5,%r14 + xorq %rbx,%rdi + + movq %r12,40(%rsp) + xorq %rdx,%r14 + andq %r11,%rdi + + rorq $4,%r13 + addq %rcx,%r12 + xorq %rbx,%rdi + + rorq $6,%r14 + xorq %r11,%r13 + addq %rdi,%r12 + + movq %rdx,%rdi + addq (%rbp),%r12 + xorq %rdx,%r14 + + xorq %r8,%rdi + rorq $14,%r13 + movq %r8,%rcx + + andq %rdi,%r15 + rorq $28,%r14 + addq %r13,%r12 + + xorq %r15,%rcx + addq %r12,%r10 + addq %r12,%rcx + + leaq 24(%rbp),%rbp + addq %r14,%rcx + movq 48(%rsi),%r12 + movq %r10,%r13 + movq %rcx,%r14 + bswapq %r12 + rorq $23,%r13 + movq %r11,%r15 + + xorq %r10,%r13 + rorq $5,%r14 + xorq %rax,%r15 + + movq %r12,48(%rsp) + xorq %rcx,%r14 + andq %r10,%r15 + + rorq $4,%r13 + addq %rbx,%r12 + xorq %rax,%r15 + + rorq $6,%r14 + xorq %r10,%r13 + addq %r15,%r12 + + movq %rcx,%r15 + addq (%rbp),%r12 + xorq %rcx,%r14 + + xorq %rdx,%r15 + rorq $14,%r13 + movq %rdx,%rbx + + andq %r15,%rdi + rorq $28,%r14 + addq %r13,%r12 + + xorq %rdi,%rbx + addq %r12,%r9 + addq %r12,%rbx + + leaq 8(%rbp),%rbp + addq %r14,%rbx + movq 56(%rsi),%r12 + movq %r9,%r13 + movq %rbx,%r14 + bswapq %r12 + rorq $23,%r13 + movq %r10,%rdi + + xorq %r9,%r13 + rorq $5,%r14 + xorq %r11,%rdi + + movq %r12,56(%rsp) + xorq %rbx,%r14 + andq %r9,%rdi + + rorq $4,%r13 + addq %rax,%r12 + xorq %r11,%rdi + + rorq $6,%r14 + xorq %r9,%r13 + addq %rdi,%r12 + + movq %rbx,%rdi + addq (%rbp),%r12 + xorq %rbx,%r14 + + xorq %rcx,%rdi + rorq $14,%r13 + movq %rcx,%rax + + andq %rdi,%r15 + rorq $28,%r14 + addq %r13,%r12 + + xorq %r15,%rax + addq %r12,%r8 + addq %r12,%rax + + leaq 24(%rbp),%rbp + addq %r14,%rax + movq 64(%rsi),%r12 + movq %r8,%r13 + movq %rax,%r14 + bswapq %r12 + rorq $23,%r13 + movq %r9,%r15 + + xorq %r8,%r13 + rorq $5,%r14 + xorq %r10,%r15 + + movq %r12,64(%rsp) + xorq %rax,%r14 + andq %r8,%r15 + + rorq $4,%r13 + addq %r11,%r12 + xorq %r10,%r15 + + rorq $6,%r14 + xorq %r8,%r13 + addq %r15,%r12 + + movq %rax,%r15 + addq (%rbp),%r12 + xorq %rax,%r14 + + xorq %rbx,%r15 + rorq $14,%r13 + movq %rbx,%r11 + + andq %r15,%rdi + rorq $28,%r14 + addq %r13,%r12 + + xorq %rdi,%r11 + addq %r12,%rdx + addq %r12,%r11 + + leaq 8(%rbp),%rbp + addq %r14,%r11 + movq 72(%rsi),%r12 + movq %rdx,%r13 + movq %r11,%r14 + bswapq %r12 + rorq $23,%r13 + movq %r8,%rdi + + xorq %rdx,%r13 + rorq $5,%r14 + xorq %r9,%rdi + + movq %r12,72(%rsp) + xorq %r11,%r14 + andq %rdx,%rdi + + rorq $4,%r13 + addq %r10,%r12 + xorq %r9,%rdi + + rorq $6,%r14 + xorq %rdx,%r13 + addq %rdi,%r12 + + movq %r11,%rdi + addq (%rbp),%r12 + xorq %r11,%r14 + + xorq %rax,%rdi + rorq $14,%r13 + movq %rax,%r10 + + andq %rdi,%r15 + rorq $28,%r14 + addq %r13,%r12 + + xorq %r15,%r10 + addq %r12,%rcx + addq %r12,%r10 + + leaq 24(%rbp),%rbp + addq %r14,%r10 + movq 80(%rsi),%r12 + movq %rcx,%r13 + movq %r10,%r14 + bswapq %r12 + rorq $23,%r13 + movq %rdx,%r15 + + xorq %rcx,%r13 + rorq $5,%r14 + xorq %r8,%r15 + + movq %r12,80(%rsp) + xorq %r10,%r14 + andq %rcx,%r15 + + rorq $4,%r13 + addq %r9,%r12 + xorq %r8,%r15 + + rorq $6,%r14 + xorq %rcx,%r13 + addq %r15,%r12 + + movq %r10,%r15 + addq (%rbp),%r12 + xorq 
%r10,%r14 + + xorq %r11,%r15 + rorq $14,%r13 + movq %r11,%r9 + + andq %r15,%rdi + rorq $28,%r14 + addq %r13,%r12 + + xorq %rdi,%r9 + addq %r12,%rbx + addq %r12,%r9 + + leaq 8(%rbp),%rbp + addq %r14,%r9 + movq 88(%rsi),%r12 + movq %rbx,%r13 + movq %r9,%r14 + bswapq %r12 + rorq $23,%r13 + movq %rcx,%rdi + + xorq %rbx,%r13 + rorq $5,%r14 + xorq %rdx,%rdi + + movq %r12,88(%rsp) + xorq %r9,%r14 + andq %rbx,%rdi + + rorq $4,%r13 + addq %r8,%r12 + xorq %rdx,%rdi + + rorq $6,%r14 + xorq %rbx,%r13 + addq %rdi,%r12 + + movq %r9,%rdi + addq (%rbp),%r12 + xorq %r9,%r14 + + xorq %r10,%rdi + rorq $14,%r13 + movq %r10,%r8 + + andq %rdi,%r15 + rorq $28,%r14 + addq %r13,%r12 + + xorq %r15,%r8 + addq %r12,%rax + addq %r12,%r8 + + leaq 24(%rbp),%rbp + addq %r14,%r8 + movq 96(%rsi),%r12 + movq %rax,%r13 + movq %r8,%r14 + bswapq %r12 + rorq $23,%r13 + movq %rbx,%r15 + + xorq %rax,%r13 + rorq $5,%r14 + xorq %rcx,%r15 + + movq %r12,96(%rsp) + xorq %r8,%r14 + andq %rax,%r15 + + rorq $4,%r13 + addq %rdx,%r12 + xorq %rcx,%r15 + + rorq $6,%r14 + xorq %rax,%r13 + addq %r15,%r12 + + movq %r8,%r15 + addq (%rbp),%r12 + xorq %r8,%r14 + + xorq %r9,%r15 + rorq $14,%r13 + movq %r9,%rdx + + andq %r15,%rdi + rorq $28,%r14 + addq %r13,%r12 + + xorq %rdi,%rdx + addq %r12,%r11 + addq %r12,%rdx + + leaq 8(%rbp),%rbp + addq %r14,%rdx + movq 104(%rsi),%r12 + movq %r11,%r13 + movq %rdx,%r14 + bswapq %r12 + rorq $23,%r13 + movq %rax,%rdi + + xorq %r11,%r13 + rorq $5,%r14 + xorq %rbx,%rdi + + movq %r12,104(%rsp) + xorq %rdx,%r14 + andq %r11,%rdi + + rorq $4,%r13 + addq %rcx,%r12 + xorq %rbx,%rdi + + rorq $6,%r14 + xorq %r11,%r13 + addq %rdi,%r12 + + movq %rdx,%rdi + addq (%rbp),%r12 + xorq %rdx,%r14 + + xorq %r8,%rdi + rorq $14,%r13 + movq %r8,%rcx + + andq %rdi,%r15 + rorq $28,%r14 + addq %r13,%r12 + + xorq %r15,%rcx + addq %r12,%r10 + addq %r12,%rcx + + leaq 24(%rbp),%rbp + addq %r14,%rcx + movq 112(%rsi),%r12 + movq %r10,%r13 + movq %rcx,%r14 + bswapq %r12 + rorq $23,%r13 + movq %r11,%r15 + + xorq %r10,%r13 + rorq $5,%r14 + xorq %rax,%r15 + + movq %r12,112(%rsp) + xorq %rcx,%r14 + andq %r10,%r15 + + rorq $4,%r13 + addq %rbx,%r12 + xorq %rax,%r15 + + rorq $6,%r14 + xorq %r10,%r13 + addq %r15,%r12 + + movq %rcx,%r15 + addq (%rbp),%r12 + xorq %rcx,%r14 + + xorq %rdx,%r15 + rorq $14,%r13 + movq %rdx,%rbx + + andq %r15,%rdi + rorq $28,%r14 + addq %r13,%r12 + + xorq %rdi,%rbx + addq %r12,%r9 + addq %r12,%rbx + + leaq 8(%rbp),%rbp + addq %r14,%rbx + movq 120(%rsi),%r12 + movq %r9,%r13 + movq %rbx,%r14 + bswapq %r12 + rorq $23,%r13 + movq %r10,%rdi + + xorq %r9,%r13 + rorq $5,%r14 + xorq %r11,%rdi + + movq %r12,120(%rsp) + xorq %rbx,%r14 + andq %r9,%rdi + + rorq $4,%r13 + addq %rax,%r12 + xorq %r11,%rdi + + rorq $6,%r14 + xorq %r9,%r13 + addq %rdi,%r12 + + movq %rbx,%rdi + addq (%rbp),%r12 + xorq %rbx,%r14 + + xorq %rcx,%rdi + rorq $14,%r13 + movq %rcx,%rax + + andq %rdi,%r15 + rorq $28,%r14 + addq %r13,%r12 + + xorq %r15,%rax + addq %r12,%r8 + addq %r12,%rax + + leaq 24(%rbp),%rbp + jmp .Lrounds_16_xx +.align 16 +.Lrounds_16_xx: + movq 8(%rsp),%r13 + movq 112(%rsp),%r15 + + movq %r13,%r12 + rorq $7,%r13 + addq %r14,%rax + movq %r15,%r14 + rorq $42,%r15 + + xorq %r12,%r13 + shrq $7,%r12 + rorq $1,%r13 + xorq %r14,%r15 + shrq $6,%r14 + + rorq $19,%r15 + xorq %r13,%r12 + xorq %r14,%r15 + addq 72(%rsp),%r12 + + addq 0(%rsp),%r12 + movq %r8,%r13 + addq %r15,%r12 + movq %rax,%r14 + rorq $23,%r13 + movq %r9,%r15 + + xorq %r8,%r13 + rorq $5,%r14 + xorq %r10,%r15 + + movq %r12,0(%rsp) + xorq %rax,%r14 + andq %r8,%r15 + + rorq $4,%r13 + addq 
%r11,%r12 + xorq %r10,%r15 + + rorq $6,%r14 + xorq %r8,%r13 + addq %r15,%r12 + + movq %rax,%r15 + addq (%rbp),%r12 + xorq %rax,%r14 + + xorq %rbx,%r15 + rorq $14,%r13 + movq %rbx,%r11 + + andq %r15,%rdi + rorq $28,%r14 + addq %r13,%r12 + + xorq %rdi,%r11 + addq %r12,%rdx + addq %r12,%r11 + + leaq 8(%rbp),%rbp + movq 16(%rsp),%r13 + movq 120(%rsp),%rdi + + movq %r13,%r12 + rorq $7,%r13 + addq %r14,%r11 + movq %rdi,%r14 + rorq $42,%rdi + + xorq %r12,%r13 + shrq $7,%r12 + rorq $1,%r13 + xorq %r14,%rdi + shrq $6,%r14 + + rorq $19,%rdi + xorq %r13,%r12 + xorq %r14,%rdi + addq 80(%rsp),%r12 + + addq 8(%rsp),%r12 + movq %rdx,%r13 + addq %rdi,%r12 + movq %r11,%r14 + rorq $23,%r13 + movq %r8,%rdi + + xorq %rdx,%r13 + rorq $5,%r14 + xorq %r9,%rdi + + movq %r12,8(%rsp) + xorq %r11,%r14 + andq %rdx,%rdi + + rorq $4,%r13 + addq %r10,%r12 + xorq %r9,%rdi + + rorq $6,%r14 + xorq %rdx,%r13 + addq %rdi,%r12 + + movq %r11,%rdi + addq (%rbp),%r12 + xorq %r11,%r14 + + xorq %rax,%rdi + rorq $14,%r13 + movq %rax,%r10 + + andq %rdi,%r15 + rorq $28,%r14 + addq %r13,%r12 + + xorq %r15,%r10 + addq %r12,%rcx + addq %r12,%r10 + + leaq 24(%rbp),%rbp + movq 24(%rsp),%r13 + movq 0(%rsp),%r15 + + movq %r13,%r12 + rorq $7,%r13 + addq %r14,%r10 + movq %r15,%r14 + rorq $42,%r15 + + xorq %r12,%r13 + shrq $7,%r12 + rorq $1,%r13 + xorq %r14,%r15 + shrq $6,%r14 + + rorq $19,%r15 + xorq %r13,%r12 + xorq %r14,%r15 + addq 88(%rsp),%r12 + + addq 16(%rsp),%r12 + movq %rcx,%r13 + addq %r15,%r12 + movq %r10,%r14 + rorq $23,%r13 + movq %rdx,%r15 + + xorq %rcx,%r13 + rorq $5,%r14 + xorq %r8,%r15 + + movq %r12,16(%rsp) + xorq %r10,%r14 + andq %rcx,%r15 + + rorq $4,%r13 + addq %r9,%r12 + xorq %r8,%r15 + + rorq $6,%r14 + xorq %rcx,%r13 + addq %r15,%r12 + + movq %r10,%r15 + addq (%rbp),%r12 + xorq %r10,%r14 + + xorq %r11,%r15 + rorq $14,%r13 + movq %r11,%r9 + + andq %r15,%rdi + rorq $28,%r14 + addq %r13,%r12 + + xorq %rdi,%r9 + addq %r12,%rbx + addq %r12,%r9 + + leaq 8(%rbp),%rbp + movq 32(%rsp),%r13 + movq 8(%rsp),%rdi + + movq %r13,%r12 + rorq $7,%r13 + addq %r14,%r9 + movq %rdi,%r14 + rorq $42,%rdi + + xorq %r12,%r13 + shrq $7,%r12 + rorq $1,%r13 + xorq %r14,%rdi + shrq $6,%r14 + + rorq $19,%rdi + xorq %r13,%r12 + xorq %r14,%rdi + addq 96(%rsp),%r12 + + addq 24(%rsp),%r12 + movq %rbx,%r13 + addq %rdi,%r12 + movq %r9,%r14 + rorq $23,%r13 + movq %rcx,%rdi + + xorq %rbx,%r13 + rorq $5,%r14 + xorq %rdx,%rdi + + movq %r12,24(%rsp) + xorq %r9,%r14 + andq %rbx,%rdi + + rorq $4,%r13 + addq %r8,%r12 + xorq %rdx,%rdi + + rorq $6,%r14 + xorq %rbx,%r13 + addq %rdi,%r12 + + movq %r9,%rdi + addq (%rbp),%r12 + xorq %r9,%r14 + + xorq %r10,%rdi + rorq $14,%r13 + movq %r10,%r8 + + andq %rdi,%r15 + rorq $28,%r14 + addq %r13,%r12 + + xorq %r15,%r8 + addq %r12,%rax + addq %r12,%r8 + + leaq 24(%rbp),%rbp + movq 40(%rsp),%r13 + movq 16(%rsp),%r15 + + movq %r13,%r12 + rorq $7,%r13 + addq %r14,%r8 + movq %r15,%r14 + rorq $42,%r15 + + xorq %r12,%r13 + shrq $7,%r12 + rorq $1,%r13 + xorq %r14,%r15 + shrq $6,%r14 + + rorq $19,%r15 + xorq %r13,%r12 + xorq %r14,%r15 + addq 104(%rsp),%r12 + + addq 32(%rsp),%r12 + movq %rax,%r13 + addq %r15,%r12 + movq %r8,%r14 + rorq $23,%r13 + movq %rbx,%r15 + + xorq %rax,%r13 + rorq $5,%r14 + xorq %rcx,%r15 + + movq %r12,32(%rsp) + xorq %r8,%r14 + andq %rax,%r15 + + rorq $4,%r13 + addq %rdx,%r12 + xorq %rcx,%r15 + + rorq $6,%r14 + xorq %rax,%r13 + addq %r15,%r12 + + movq %r8,%r15 + addq (%rbp),%r12 + xorq %r8,%r14 + + xorq %r9,%r15 + rorq $14,%r13 + movq %r9,%rdx + + andq %r15,%rdi + rorq $28,%r14 + addq %r13,%r12 + + xorq %rdi,%rdx 
+ addq %r12,%r11 + addq %r12,%rdx + + leaq 8(%rbp),%rbp + movq 48(%rsp),%r13 + movq 24(%rsp),%rdi + + movq %r13,%r12 + rorq $7,%r13 + addq %r14,%rdx + movq %rdi,%r14 + rorq $42,%rdi + + xorq %r12,%r13 + shrq $7,%r12 + rorq $1,%r13 + xorq %r14,%rdi + shrq $6,%r14 + + rorq $19,%rdi + xorq %r13,%r12 + xorq %r14,%rdi + addq 112(%rsp),%r12 + + addq 40(%rsp),%r12 + movq %r11,%r13 + addq %rdi,%r12 + movq %rdx,%r14 + rorq $23,%r13 + movq %rax,%rdi + + xorq %r11,%r13 + rorq $5,%r14 + xorq %rbx,%rdi + + movq %r12,40(%rsp) + xorq %rdx,%r14 + andq %r11,%rdi + + rorq $4,%r13 + addq %rcx,%r12 + xorq %rbx,%rdi + + rorq $6,%r14 + xorq %r11,%r13 + addq %rdi,%r12 + + movq %rdx,%rdi + addq (%rbp),%r12 + xorq %rdx,%r14 + + xorq %r8,%rdi + rorq $14,%r13 + movq %r8,%rcx + + andq %rdi,%r15 + rorq $28,%r14 + addq %r13,%r12 + + xorq %r15,%rcx + addq %r12,%r10 + addq %r12,%rcx + + leaq 24(%rbp),%rbp + movq 56(%rsp),%r13 + movq 32(%rsp),%r15 + + movq %r13,%r12 + rorq $7,%r13 + addq %r14,%rcx + movq %r15,%r14 + rorq $42,%r15 + + xorq %r12,%r13 + shrq $7,%r12 + rorq $1,%r13 + xorq %r14,%r15 + shrq $6,%r14 + + rorq $19,%r15 + xorq %r13,%r12 + xorq %r14,%r15 + addq 120(%rsp),%r12 + + addq 48(%rsp),%r12 + movq %r10,%r13 + addq %r15,%r12 + movq %rcx,%r14 + rorq $23,%r13 + movq %r11,%r15 + + xorq %r10,%r13 + rorq $5,%r14 + xorq %rax,%r15 + + movq %r12,48(%rsp) + xorq %rcx,%r14 + andq %r10,%r15 + + rorq $4,%r13 + addq %rbx,%r12 + xorq %rax,%r15 + + rorq $6,%r14 + xorq %r10,%r13 + addq %r15,%r12 + + movq %rcx,%r15 + addq (%rbp),%r12 + xorq %rcx,%r14 + + xorq %rdx,%r15 + rorq $14,%r13 + movq %rdx,%rbx + + andq %r15,%rdi + rorq $28,%r14 + addq %r13,%r12 + + xorq %rdi,%rbx + addq %r12,%r9 + addq %r12,%rbx + + leaq 8(%rbp),%rbp + movq 64(%rsp),%r13 + movq 40(%rsp),%rdi + + movq %r13,%r12 + rorq $7,%r13 + addq %r14,%rbx + movq %rdi,%r14 + rorq $42,%rdi + + xorq %r12,%r13 + shrq $7,%r12 + rorq $1,%r13 + xorq %r14,%rdi + shrq $6,%r14 + + rorq $19,%rdi + xorq %r13,%r12 + xorq %r14,%rdi + addq 0(%rsp),%r12 + + addq 56(%rsp),%r12 + movq %r9,%r13 + addq %rdi,%r12 + movq %rbx,%r14 + rorq $23,%r13 + movq %r10,%rdi + + xorq %r9,%r13 + rorq $5,%r14 + xorq %r11,%rdi + + movq %r12,56(%rsp) + xorq %rbx,%r14 + andq %r9,%rdi + + rorq $4,%r13 + addq %rax,%r12 + xorq %r11,%rdi + + rorq $6,%r14 + xorq %r9,%r13 + addq %rdi,%r12 + + movq %rbx,%rdi + addq (%rbp),%r12 + xorq %rbx,%r14 + + xorq %rcx,%rdi + rorq $14,%r13 + movq %rcx,%rax + + andq %rdi,%r15 + rorq $28,%r14 + addq %r13,%r12 + + xorq %r15,%rax + addq %r12,%r8 + addq %r12,%rax + + leaq 24(%rbp),%rbp + movq 72(%rsp),%r13 + movq 48(%rsp),%r15 + + movq %r13,%r12 + rorq $7,%r13 + addq %r14,%rax + movq %r15,%r14 + rorq $42,%r15 + + xorq %r12,%r13 + shrq $7,%r12 + rorq $1,%r13 + xorq %r14,%r15 + shrq $6,%r14 + + rorq $19,%r15 + xorq %r13,%r12 + xorq %r14,%r15 + addq 8(%rsp),%r12 + + addq 64(%rsp),%r12 + movq %r8,%r13 + addq %r15,%r12 + movq %rax,%r14 + rorq $23,%r13 + movq %r9,%r15 + + xorq %r8,%r13 + rorq $5,%r14 + xorq %r10,%r15 + + movq %r12,64(%rsp) + xorq %rax,%r14 + andq %r8,%r15 + + rorq $4,%r13 + addq %r11,%r12 + xorq %r10,%r15 + + rorq $6,%r14 + xorq %r8,%r13 + addq %r15,%r12 + + movq %rax,%r15 + addq (%rbp),%r12 + xorq %rax,%r14 + + xorq %rbx,%r15 + rorq $14,%r13 + movq %rbx,%r11 + + andq %r15,%rdi + rorq $28,%r14 + addq %r13,%r12 + + xorq %rdi,%r11 + addq %r12,%rdx + addq %r12,%r11 + + leaq 8(%rbp),%rbp + movq 80(%rsp),%r13 + movq 56(%rsp),%rdi + + movq %r13,%r12 + rorq $7,%r13 + addq %r14,%r11 + movq %rdi,%r14 + rorq $42,%rdi + + xorq %r12,%r13 + shrq $7,%r12 + rorq $1,%r13 + xorq 
%r14,%rdi + shrq $6,%r14 + + rorq $19,%rdi + xorq %r13,%r12 + xorq %r14,%rdi + addq 16(%rsp),%r12 + + addq 72(%rsp),%r12 + movq %rdx,%r13 + addq %rdi,%r12 + movq %r11,%r14 + rorq $23,%r13 + movq %r8,%rdi + + xorq %rdx,%r13 + rorq $5,%r14 + xorq %r9,%rdi + + movq %r12,72(%rsp) + xorq %r11,%r14 + andq %rdx,%rdi + + rorq $4,%r13 + addq %r10,%r12 + xorq %r9,%rdi + + rorq $6,%r14 + xorq %rdx,%r13 + addq %rdi,%r12 + + movq %r11,%rdi + addq (%rbp),%r12 + xorq %r11,%r14 + + xorq %rax,%rdi + rorq $14,%r13 + movq %rax,%r10 + + andq %rdi,%r15 + rorq $28,%r14 + addq %r13,%r12 + + xorq %r15,%r10 + addq %r12,%rcx + addq %r12,%r10 + + leaq 24(%rbp),%rbp + movq 88(%rsp),%r13 + movq 64(%rsp),%r15 + + movq %r13,%r12 + rorq $7,%r13 + addq %r14,%r10 + movq %r15,%r14 + rorq $42,%r15 + + xorq %r12,%r13 + shrq $7,%r12 + rorq $1,%r13 + xorq %r14,%r15 + shrq $6,%r14 + + rorq $19,%r15 + xorq %r13,%r12 + xorq %r14,%r15 + addq 24(%rsp),%r12 + + addq 80(%rsp),%r12 + movq %rcx,%r13 + addq %r15,%r12 + movq %r10,%r14 + rorq $23,%r13 + movq %rdx,%r15 + + xorq %rcx,%r13 + rorq $5,%r14 + xorq %r8,%r15 + + movq %r12,80(%rsp) + xorq %r10,%r14 + andq %rcx,%r15 + + rorq $4,%r13 + addq %r9,%r12 + xorq %r8,%r15 + + rorq $6,%r14 + xorq %rcx,%r13 + addq %r15,%r12 + + movq %r10,%r15 + addq (%rbp),%r12 + xorq %r10,%r14 + + xorq %r11,%r15 + rorq $14,%r13 + movq %r11,%r9 + + andq %r15,%rdi + rorq $28,%r14 + addq %r13,%r12 + + xorq %rdi,%r9 + addq %r12,%rbx + addq %r12,%r9 + + leaq 8(%rbp),%rbp + movq 96(%rsp),%r13 + movq 72(%rsp),%rdi + + movq %r13,%r12 + rorq $7,%r13 + addq %r14,%r9 + movq %rdi,%r14 + rorq $42,%rdi + + xorq %r12,%r13 + shrq $7,%r12 + rorq $1,%r13 + xorq %r14,%rdi + shrq $6,%r14 + + rorq $19,%rdi + xorq %r13,%r12 + xorq %r14,%rdi + addq 32(%rsp),%r12 + + addq 88(%rsp),%r12 + movq %rbx,%r13 + addq %rdi,%r12 + movq %r9,%r14 + rorq $23,%r13 + movq %rcx,%rdi + + xorq %rbx,%r13 + rorq $5,%r14 + xorq %rdx,%rdi + + movq %r12,88(%rsp) + xorq %r9,%r14 + andq %rbx,%rdi + + rorq $4,%r13 + addq %r8,%r12 + xorq %rdx,%rdi + + rorq $6,%r14 + xorq %rbx,%r13 + addq %rdi,%r12 + + movq %r9,%rdi + addq (%rbp),%r12 + xorq %r9,%r14 + + xorq %r10,%rdi + rorq $14,%r13 + movq %r10,%r8 + + andq %rdi,%r15 + rorq $28,%r14 + addq %r13,%r12 + + xorq %r15,%r8 + addq %r12,%rax + addq %r12,%r8 + + leaq 24(%rbp),%rbp + movq 104(%rsp),%r13 + movq 80(%rsp),%r15 + + movq %r13,%r12 + rorq $7,%r13 + addq %r14,%r8 + movq %r15,%r14 + rorq $42,%r15 + + xorq %r12,%r13 + shrq $7,%r12 + rorq $1,%r13 + xorq %r14,%r15 + shrq $6,%r14 + + rorq $19,%r15 + xorq %r13,%r12 + xorq %r14,%r15 + addq 40(%rsp),%r12 + + addq 96(%rsp),%r12 + movq %rax,%r13 + addq %r15,%r12 + movq %r8,%r14 + rorq $23,%r13 + movq %rbx,%r15 + + xorq %rax,%r13 + rorq $5,%r14 + xorq %rcx,%r15 + + movq %r12,96(%rsp) + xorq %r8,%r14 + andq %rax,%r15 + + rorq $4,%r13 + addq %rdx,%r12 + xorq %rcx,%r15 + + rorq $6,%r14 + xorq %rax,%r13 + addq %r15,%r12 + + movq %r8,%r15 + addq (%rbp),%r12 + xorq %r8,%r14 + + xorq %r9,%r15 + rorq $14,%r13 + movq %r9,%rdx + + andq %r15,%rdi + rorq $28,%r14 + addq %r13,%r12 + + xorq %rdi,%rdx + addq %r12,%r11 + addq %r12,%rdx + + leaq 8(%rbp),%rbp + movq 112(%rsp),%r13 + movq 88(%rsp),%rdi + + movq %r13,%r12 + rorq $7,%r13 + addq %r14,%rdx + movq %rdi,%r14 + rorq $42,%rdi + + xorq %r12,%r13 + shrq $7,%r12 + rorq $1,%r13 + xorq %r14,%rdi + shrq $6,%r14 + + rorq $19,%rdi + xorq %r13,%r12 + xorq %r14,%rdi + addq 48(%rsp),%r12 + + addq 104(%rsp),%r12 + movq %r11,%r13 + addq %rdi,%r12 + movq %rdx,%r14 + rorq $23,%r13 + movq %rax,%rdi + + xorq %r11,%r13 + rorq $5,%r14 + xorq 
%rbx,%rdi + + movq %r12,104(%rsp) + xorq %rdx,%r14 + andq %r11,%rdi + + rorq $4,%r13 + addq %rcx,%r12 + xorq %rbx,%rdi + + rorq $6,%r14 + xorq %r11,%r13 + addq %rdi,%r12 + + movq %rdx,%rdi + addq (%rbp),%r12 + xorq %rdx,%r14 + + xorq %r8,%rdi + rorq $14,%r13 + movq %r8,%rcx + + andq %rdi,%r15 + rorq $28,%r14 + addq %r13,%r12 + + xorq %r15,%rcx + addq %r12,%r10 + addq %r12,%rcx + + leaq 24(%rbp),%rbp + movq 120(%rsp),%r13 + movq 96(%rsp),%r15 + + movq %r13,%r12 + rorq $7,%r13 + addq %r14,%rcx + movq %r15,%r14 + rorq $42,%r15 + + xorq %r12,%r13 + shrq $7,%r12 + rorq $1,%r13 + xorq %r14,%r15 + shrq $6,%r14 + + rorq $19,%r15 + xorq %r13,%r12 + xorq %r14,%r15 + addq 56(%rsp),%r12 + + addq 112(%rsp),%r12 + movq %r10,%r13 + addq %r15,%r12 + movq %rcx,%r14 + rorq $23,%r13 + movq %r11,%r15 + + xorq %r10,%r13 + rorq $5,%r14 + xorq %rax,%r15 + + movq %r12,112(%rsp) + xorq %rcx,%r14 + andq %r10,%r15 + + rorq $4,%r13 + addq %rbx,%r12 + xorq %rax,%r15 + + rorq $6,%r14 + xorq %r10,%r13 + addq %r15,%r12 + + movq %rcx,%r15 + addq (%rbp),%r12 + xorq %rcx,%r14 + + xorq %rdx,%r15 + rorq $14,%r13 + movq %rdx,%rbx + + andq %r15,%rdi + rorq $28,%r14 + addq %r13,%r12 + + xorq %rdi,%rbx + addq %r12,%r9 + addq %r12,%rbx + + leaq 8(%rbp),%rbp + movq 0(%rsp),%r13 + movq 104(%rsp),%rdi + + movq %r13,%r12 + rorq $7,%r13 + addq %r14,%rbx + movq %rdi,%r14 + rorq $42,%rdi + + xorq %r12,%r13 + shrq $7,%r12 + rorq $1,%r13 + xorq %r14,%rdi + shrq $6,%r14 + + rorq $19,%rdi + xorq %r13,%r12 + xorq %r14,%rdi + addq 64(%rsp),%r12 + + addq 120(%rsp),%r12 + movq %r9,%r13 + addq %rdi,%r12 + movq %rbx,%r14 + rorq $23,%r13 + movq %r10,%rdi + + xorq %r9,%r13 + rorq $5,%r14 + xorq %r11,%rdi + + movq %r12,120(%rsp) + xorq %rbx,%r14 + andq %r9,%rdi + + rorq $4,%r13 + addq %rax,%r12 + xorq %r11,%rdi + + rorq $6,%r14 + xorq %r9,%r13 + addq %rdi,%r12 + + movq %rbx,%rdi + addq (%rbp),%r12 + xorq %rbx,%r14 + + xorq %rcx,%rdi + rorq $14,%r13 + movq %rcx,%rax + + andq %rdi,%r15 + rorq $28,%r14 + addq %r13,%r12 + + xorq %r15,%rax + addq %r12,%r8 + addq %r12,%rax + + leaq 24(%rbp),%rbp + cmpb $0,7(%rbp) + jnz .Lrounds_16_xx + + movq 128+0(%rsp),%rdi + addq %r14,%rax + leaq 128(%rsi),%rsi + + addq 0(%rdi),%rax + addq 8(%rdi),%rbx + addq 16(%rdi),%rcx + addq 24(%rdi),%rdx + addq 32(%rdi),%r8 + addq 40(%rdi),%r9 + addq 48(%rdi),%r10 + addq 56(%rdi),%r11 + + cmpq 128+16(%rsp),%rsi + + movq %rax,0(%rdi) + movq %rbx,8(%rdi) + movq %rcx,16(%rdi) + movq %rdx,24(%rdi) + movq %r8,32(%rdi) + movq %r9,40(%rdi) + movq %r10,48(%rdi) + movq %r11,56(%rdi) + jb .Lloop + + movq 152(%rsp),%rsi +.cfi_def_cfa %rsi,8 + movq -48(%rsi),%r15 +.cfi_restore %r15 + movq -40(%rsi),%r14 +.cfi_restore %r14 + movq -32(%rsi),%r13 +.cfi_restore %r13 + movq -24(%rsi),%r12 +.cfi_restore %r12 + movq -16(%rsi),%rbp +.cfi_restore %rbp + movq -8(%rsi),%rbx +.cfi_restore %rbx + leaq (%rsi),%rsp +.cfi_def_cfa_register %rsp +.Lepilogue: + .byte 0xf3,0xc3 +.cfi_endproc +.size sha512_block_data_order,.-sha512_block_data_order +.align 64 +.type K512,@object +K512: +.quad 0x428a2f98d728ae22,0x7137449123ef65cd +.quad 0x428a2f98d728ae22,0x7137449123ef65cd +.quad 0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc +.quad 0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc +.quad 0x3956c25bf348b538,0x59f111f1b605d019 +.quad 0x3956c25bf348b538,0x59f111f1b605d019 +.quad 0x923f82a4af194f9b,0xab1c5ed5da6d8118 +.quad 0x923f82a4af194f9b,0xab1c5ed5da6d8118 +.quad 0xd807aa98a3030242,0x12835b0145706fbe +.quad 0xd807aa98a3030242,0x12835b0145706fbe +.quad 0x243185be4ee4b28c,0x550c7dc3d5ffb4e2 +.quad 
0x243185be4ee4b28c,0x550c7dc3d5ffb4e2 +.quad 0x72be5d74f27b896f,0x80deb1fe3b1696b1 +.quad 0x72be5d74f27b896f,0x80deb1fe3b1696b1 +.quad 0x9bdc06a725c71235,0xc19bf174cf692694 +.quad 0x9bdc06a725c71235,0xc19bf174cf692694 +.quad 0xe49b69c19ef14ad2,0xefbe4786384f25e3 +.quad 0xe49b69c19ef14ad2,0xefbe4786384f25e3 +.quad 0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65 +.quad 0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65 +.quad 0x2de92c6f592b0275,0x4a7484aa6ea6e483 +.quad 0x2de92c6f592b0275,0x4a7484aa6ea6e483 +.quad 0x5cb0a9dcbd41fbd4,0x76f988da831153b5 +.quad 0x5cb0a9dcbd41fbd4,0x76f988da831153b5 +.quad 0x983e5152ee66dfab,0xa831c66d2db43210 +.quad 0x983e5152ee66dfab,0xa831c66d2db43210 +.quad 0xb00327c898fb213f,0xbf597fc7beef0ee4 +.quad 0xb00327c898fb213f,0xbf597fc7beef0ee4 +.quad 0xc6e00bf33da88fc2,0xd5a79147930aa725 +.quad 0xc6e00bf33da88fc2,0xd5a79147930aa725 +.quad 0x06ca6351e003826f,0x142929670a0e6e70 +.quad 0x06ca6351e003826f,0x142929670a0e6e70 +.quad 0x27b70a8546d22ffc,0x2e1b21385c26c926 +.quad 0x27b70a8546d22ffc,0x2e1b21385c26c926 +.quad 0x4d2c6dfc5ac42aed,0x53380d139d95b3df +.quad 0x4d2c6dfc5ac42aed,0x53380d139d95b3df +.quad 0x650a73548baf63de,0x766a0abb3c77b2a8 +.quad 0x650a73548baf63de,0x766a0abb3c77b2a8 +.quad 0x81c2c92e47edaee6,0x92722c851482353b +.quad 0x81c2c92e47edaee6,0x92722c851482353b +.quad 0xa2bfe8a14cf10364,0xa81a664bbc423001 +.quad 0xa2bfe8a14cf10364,0xa81a664bbc423001 +.quad 0xc24b8b70d0f89791,0xc76c51a30654be30 +.quad 0xc24b8b70d0f89791,0xc76c51a30654be30 +.quad 0xd192e819d6ef5218,0xd69906245565a910 +.quad 0xd192e819d6ef5218,0xd69906245565a910 +.quad 0xf40e35855771202a,0x106aa07032bbd1b8 +.quad 0xf40e35855771202a,0x106aa07032bbd1b8 +.quad 0x19a4c116b8d2d0c8,0x1e376c085141ab53 +.quad 0x19a4c116b8d2d0c8,0x1e376c085141ab53 +.quad 0x2748774cdf8eeb99,0x34b0bcb5e19b48a8 +.quad 0x2748774cdf8eeb99,0x34b0bcb5e19b48a8 +.quad 0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb +.quad 0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb +.quad 0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3 +.quad 0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3 +.quad 0x748f82ee5defb2fc,0x78a5636f43172f60 +.quad 0x748f82ee5defb2fc,0x78a5636f43172f60 +.quad 0x84c87814a1f0ab72,0x8cc702081a6439ec +.quad 0x84c87814a1f0ab72,0x8cc702081a6439ec +.quad 0x90befffa23631e28,0xa4506cebde82bde9 +.quad 0x90befffa23631e28,0xa4506cebde82bde9 +.quad 0xbef9a3f7b2c67915,0xc67178f2e372532b +.quad 0xbef9a3f7b2c67915,0xc67178f2e372532b +.quad 0xca273eceea26619c,0xd186b8c721c0c207 +.quad 0xca273eceea26619c,0xd186b8c721c0c207 +.quad 0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178 +.quad 0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178 +.quad 0x06f067aa72176fba,0x0a637dc5a2c898a6 +.quad 0x06f067aa72176fba,0x0a637dc5a2c898a6 +.quad 0x113f9804bef90dae,0x1b710b35131c471b +.quad 0x113f9804bef90dae,0x1b710b35131c471b +.quad 0x28db77f523047d84,0x32caab7b40c72493 +.quad 0x28db77f523047d84,0x32caab7b40c72493 +.quad 0x3c9ebe0a15c9bebc,0x431d67c49c100d4c +.quad 0x3c9ebe0a15c9bebc,0x431d67c49c100d4c +.quad 0x4cc5d4becb3e42b6,0x597f299cfc657e2a +.quad 0x4cc5d4becb3e42b6,0x597f299cfc657e2a +.quad 0x5fcb6fab3ad6faec,0x6c44198c4a475817 +.quad 0x5fcb6fab3ad6faec,0x6c44198c4a475817 + +.quad 0x0001020304050607,0x08090a0b0c0d0e0f +.quad 0x0001020304050607,0x08090a0b0c0d0e0f +.byte 83,72,65,53,49,50,32,98,108,111,99,107,32,116,114,97,110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0 diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/x86_64cpuid.S 
b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/x86_64cpuid.S new file mode 100644 index 0000000000..cac5f8f32c --- /dev/null +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/x86_64cpuid.S @@ -0,0 +1,491 @@ +# WARNING: do not edit! +# Generated from openssl/crypto/x86_64cpuid.pl +# +# Copyright 2005-2020 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the OpenSSL license (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + + +.hidden OPENSSL_cpuid_setup +.section .init + call OPENSSL_cpuid_setup + +.hidden OPENSSL_ia32cap_P +.comm OPENSSL_ia32cap_P,16,4 + +.text + +.globl OPENSSL_atomic_add +.type OPENSSL_atomic_add,@function +.align 16 +OPENSSL_atomic_add: +.cfi_startproc + movl (%rdi),%eax +.Lspin: leaq (%rsi,%rax,1),%r8 +.byte 0xf0 + cmpxchgl %r8d,(%rdi) + jne .Lspin + movl %r8d,%eax +.byte 0x48,0x98 + .byte 0xf3,0xc3 +.cfi_endproc +.size OPENSSL_atomic_add,.-OPENSSL_atomic_add + +.globl OPENSSL_rdtsc +.type OPENSSL_rdtsc,@function +.align 16 +OPENSSL_rdtsc: +.cfi_startproc + rdtsc + shlq $32,%rdx + orq %rdx,%rax + .byte 0xf3,0xc3 +.cfi_endproc +.size OPENSSL_rdtsc,.-OPENSSL_rdtsc + +.globl OPENSSL_ia32_cpuid +.type OPENSSL_ia32_cpuid,@function +.align 16 +OPENSSL_ia32_cpuid: +.cfi_startproc + movq %rbx,%r8 +.cfi_register %rbx,%r8 + + xorl %eax,%eax + movq %rax,8(%rdi) + cpuid + movl %eax,%r11d + + xorl %eax,%eax + cmpl $0x756e6547,%ebx + setne %al + movl %eax,%r9d + cmpl $0x49656e69,%edx + setne %al + orl %eax,%r9d + cmpl $0x6c65746e,%ecx + setne %al + orl %eax,%r9d + jz .Lintel + + cmpl $0x68747541,%ebx + setne %al + movl %eax,%r10d + cmpl $0x69746E65,%edx + setne %al + orl %eax,%r10d + cmpl $0x444D4163,%ecx + setne %al + orl %eax,%r10d + jnz .Lintel + + + movl $0x80000000,%eax + cpuid + cmpl $0x80000001,%eax + jb .Lintel + movl %eax,%r10d + movl $0x80000001,%eax + cpuid + orl %ecx,%r9d + andl $0x00000801,%r9d + + cmpl $0x80000008,%r10d + jb .Lintel + + movl $0x80000008,%eax + cpuid + movzbq %cl,%r10 + incq %r10 + + movl $1,%eax + cpuid + btl $28,%edx + jnc .Lgeneric + shrl $16,%ebx + cmpb %r10b,%bl + ja .Lgeneric + andl $0xefffffff,%edx + jmp .Lgeneric + +.Lintel: + cmpl $4,%r11d + movl $-1,%r10d + jb .Lnocacheinfo + + movl $4,%eax + movl $0,%ecx + cpuid + movl %eax,%r10d + shrl $14,%r10d + andl $0xfff,%r10d + +.Lnocacheinfo: + movl $1,%eax + cpuid + movd %eax,%xmm0 + andl $0xbfefffff,%edx + cmpl $0,%r9d + jne .Lnotintel + orl $0x40000000,%edx + andb $15,%ah + cmpb $15,%ah + jne .LnotP4 + orl $0x00100000,%edx +.LnotP4: + cmpb $6,%ah + jne .Lnotintel + andl $0x0fff0ff0,%eax + cmpl $0x00050670,%eax + je .Lknights + cmpl $0x00080650,%eax + jne .Lnotintel +.Lknights: + andl $0xfbffffff,%ecx + +.Lnotintel: + btl $28,%edx + jnc .Lgeneric + andl $0xefffffff,%edx + cmpl $0,%r10d + je .Lgeneric + + orl $0x10000000,%edx + shrl $16,%ebx + cmpb $1,%bl + ja .Lgeneric + andl $0xefffffff,%edx +.Lgeneric: + andl $0x00000800,%r9d + andl $0xfffff7ff,%ecx + orl %ecx,%r9d + + movl %edx,%r10d + + cmpl $7,%r11d + jb .Lno_extended_info + movl $7,%eax + xorl %ecx,%ecx + cpuid + btl $26,%r9d + jc .Lnotknights + andl $0xfff7ffff,%ebx +.Lnotknights: + movd %xmm0,%eax + andl $0x0fff0ff0,%eax + cmpl $0x00050650,%eax + jne .Lnotskylakex + andl $0xfffeffff,%ebx + +.Lnotskylakex: + movl %ebx,8(%rdi) + movl %ecx,12(%rdi) +.Lno_extended_info: + + btl $27,%r9d + jnc .Lclear_avx + xorl %ecx,%ecx +.byte 0x0f,0x01,0xd0 + andl $0xe6,%eax + cmpl 
$0xe6,%eax + je .Ldone + andl $0x3fdeffff,8(%rdi) + + + + + andl $6,%eax + cmpl $6,%eax + je .Ldone +.Lclear_avx: + movl $0xefffe7ff,%eax + andl %eax,%r9d + movl $0x3fdeffdf,%eax + andl %eax,8(%rdi) +.Ldone: + shlq $32,%r9 + movl %r10d,%eax + movq %r8,%rbx +.cfi_restore %rbx + orq %r9,%rax + .byte 0xf3,0xc3 +.cfi_endproc +.size OPENSSL_ia32_cpuid,.-OPENSSL_ia32_cpuid + +.globl OPENSSL_cleanse +.type OPENSSL_cleanse,@function +.align 16 +OPENSSL_cleanse: +.cfi_startproc + xorq %rax,%rax + cmpq $15,%rsi + jae .Lot + cmpq $0,%rsi + je .Lret +.Little: + movb %al,(%rdi) + subq $1,%rsi + leaq 1(%rdi),%rdi + jnz .Little +.Lret: + .byte 0xf3,0xc3 +.align 16 +.Lot: + testq $7,%rdi + jz .Laligned + movb %al,(%rdi) + leaq -1(%rsi),%rsi + leaq 1(%rdi),%rdi + jmp .Lot +.Laligned: + movq %rax,(%rdi) + leaq -8(%rsi),%rsi + testq $-8,%rsi + leaq 8(%rdi),%rdi + jnz .Laligned + cmpq $0,%rsi + jne .Little + .byte 0xf3,0xc3 +.cfi_endproc +.size OPENSSL_cleanse,.-OPENSSL_cleanse + +.globl CRYPTO_memcmp +.type CRYPTO_memcmp,@function +.align 16 +CRYPTO_memcmp: +.cfi_startproc + xorq %rax,%rax + xorq %r10,%r10 + cmpq $0,%rdx + je .Lno_data + cmpq $16,%rdx + jne .Loop_cmp + movq (%rdi),%r10 + movq 8(%rdi),%r11 + movq $1,%rdx + xorq (%rsi),%r10 + xorq 8(%rsi),%r11 + orq %r11,%r10 + cmovnzq %rdx,%rax + .byte 0xf3,0xc3 + +.align 16 +.Loop_cmp: + movb (%rdi),%r10b + leaq 1(%rdi),%rdi + xorb (%rsi),%r10b + leaq 1(%rsi),%rsi + orb %r10b,%al + decq %rdx + jnz .Loop_cmp + negq %rax + shrq $63,%rax +.Lno_data: + .byte 0xf3,0xc3 +.cfi_endproc +.size CRYPTO_memcmp,.-CRYPTO_memcmp +.globl OPENSSL_wipe_cpu +.type OPENSSL_wipe_cpu,@function +.align 16 +OPENSSL_wipe_cpu: +.cfi_startproc + pxor %xmm0,%xmm0 + pxor %xmm1,%xmm1 + pxor %xmm2,%xmm2 + pxor %xmm3,%xmm3 + pxor %xmm4,%xmm4 + pxor %xmm5,%xmm5 + pxor %xmm6,%xmm6 + pxor %xmm7,%xmm7 + pxor %xmm8,%xmm8 + pxor %xmm9,%xmm9 + pxor %xmm10,%xmm10 + pxor %xmm11,%xmm11 + pxor %xmm12,%xmm12 + pxor %xmm13,%xmm13 + pxor %xmm14,%xmm14 + pxor %xmm15,%xmm15 + xorq %rcx,%rcx + xorq %rdx,%rdx + xorq %rsi,%rsi + xorq %rdi,%rdi + xorq %r8,%r8 + xorq %r9,%r9 + xorq %r10,%r10 + xorq %r11,%r11 + leaq 8(%rsp),%rax + .byte 0xf3,0xc3 +.cfi_endproc +.size OPENSSL_wipe_cpu,.-OPENSSL_wipe_cpu +.globl OPENSSL_instrument_bus +.type OPENSSL_instrument_bus,@function +.align 16 +OPENSSL_instrument_bus: +.cfi_startproc + movq %rdi,%r10 + movq %rsi,%rcx + movq %rsi,%r11 + + rdtsc + movl %eax,%r8d + movl $0,%r9d + clflush (%r10) +.byte 0xf0 + addl %r9d,(%r10) + jmp .Loop +.align 16 +.Loop: rdtsc + movl %eax,%edx + subl %r8d,%eax + movl %edx,%r8d + movl %eax,%r9d + clflush (%r10) +.byte 0xf0 + addl %eax,(%r10) + leaq 4(%r10),%r10 + subq $1,%rcx + jnz .Loop + + movq %r11,%rax + .byte 0xf3,0xc3 +.cfi_endproc +.size OPENSSL_instrument_bus,.-OPENSSL_instrument_bus + +.globl OPENSSL_instrument_bus2 +.type OPENSSL_instrument_bus2,@function +.align 16 +OPENSSL_instrument_bus2: +.cfi_startproc + movq %rdi,%r10 + movq %rsi,%rcx + movq %rdx,%r11 + movq %rcx,8(%rsp) + + rdtsc + movl %eax,%r8d + movl $0,%r9d + + clflush (%r10) +.byte 0xf0 + addl %r9d,(%r10) + + rdtsc + movl %eax,%edx + subl %r8d,%eax + movl %edx,%r8d + movl %eax,%r9d +.Loop2: + clflush (%r10) +.byte 0xf0 + addl %eax,(%r10) + + subq $1,%r11 + jz .Ldone2 + + rdtsc + movl %eax,%edx + subl %r8d,%eax + movl %edx,%r8d + cmpl %r9d,%eax + movl %eax,%r9d + movl $0,%edx + setne %dl + subq %rdx,%rcx + leaq (%r10,%rdx,4),%r10 + jnz .Loop2 + +.Ldone2: + movq 8(%rsp),%rax + subq %rcx,%rax + .byte 0xf3,0xc3 +.cfi_endproc +.size 
OPENSSL_instrument_bus2,.-OPENSSL_instrument_bus2 +.globl OPENSSL_ia32_rdrand_bytes +.type OPENSSL_ia32_rdrand_bytes,@function +.align 16 +OPENSSL_ia32_rdrand_bytes: +.cfi_startproc + xorq %rax,%rax + cmpq $0,%rsi + je .Ldone_rdrand_bytes + + movq $8,%r11 +.Loop_rdrand_bytes: +.byte 73,15,199,242 + jc .Lbreak_rdrand_bytes + decq %r11 + jnz .Loop_rdrand_bytes + jmp .Ldone_rdrand_bytes + +.align 16 +.Lbreak_rdrand_bytes: + cmpq $8,%rsi + jb .Ltail_rdrand_bytes + movq %r10,(%rdi) + leaq 8(%rdi),%rdi + addq $8,%rax + subq $8,%rsi + jz .Ldone_rdrand_bytes + movq $8,%r11 + jmp .Loop_rdrand_bytes + +.align 16 +.Ltail_rdrand_bytes: + movb %r10b,(%rdi) + leaq 1(%rdi),%rdi + incq %rax + shrq $8,%r10 + decq %rsi + jnz .Ltail_rdrand_bytes + +.Ldone_rdrand_bytes: + xorq %r10,%r10 + .byte 0xf3,0xc3 +.cfi_endproc +.size OPENSSL_ia32_rdrand_bytes,.-OPENSSL_ia32_rdrand_bytes +.globl OPENSSL_ia32_rdseed_bytes +.type OPENSSL_ia32_rdseed_bytes,@function +.align 16 +OPENSSL_ia32_rdseed_bytes: +.cfi_startproc + xorq %rax,%rax + cmpq $0,%rsi + je .Ldone_rdseed_bytes + + movq $8,%r11 +.Loop_rdseed_bytes: +.byte 73,15,199,250 + jc .Lbreak_rdseed_bytes + decq %r11 + jnz .Loop_rdseed_bytes + jmp .Ldone_rdseed_bytes + +.align 16 +.Lbreak_rdseed_bytes: + cmpq $8,%rsi + jb .Ltail_rdseed_bytes + movq %r10,(%rdi) + leaq 8(%rdi),%rdi + addq $8,%rax + subq $8,%rsi + jz .Ldone_rdseed_bytes + movq $8,%r11 + jmp .Loop_rdseed_bytes + +.align 16 +.Ltail_rdseed_bytes: + movb %r10b,(%rdi) + leaq 1(%rdi),%rdi + incq %rax + shrq $8,%r10 + decq %rsi + jnz .Ltail_rdseed_bytes + +.Ldone_rdseed_bytes: + xorq %r10,%r10 + .byte 0xf3,0xc3 +.cfi_endproc +.size OPENSSL_ia32_rdseed_bytes,.-OPENSSL_ia32_rdseed_bytes -- 2.32.0.windows.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
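Each of the generated files above records the perlasm script it was produced from (for example "Generated from openssl/crypto/x86_64cpuid.pl"). A minimal sketch of how one such file can be regenerated, assuming a checked-out openssl submodule and direct invocation of the perlasm script; the series drives generation through its own tooling, so the commands and output paths below are illustrative assumptions, not taken from the patches:

  # Assumption: OpenSSL 1.1.1-era perlasm scripts are invoked as
  #   perl <script>.pl <flavour> <output-file>
  # "elf" emits GAS/ELF syntax for the GCC toolchains (X64Gcc directory);
  # "nasm" emits NASM syntax with Win64 SEH data (X64 directory).
  perl openssl/crypto/x86_64cpuid.pl elf  x86_64cpuid.S
  perl openssl/crypto/x86_64cpuid.pl nasm x86_64cpuid.nasm
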
* Re: [PATCH v7 3/3] CryptoPkg/OpensslLib: Commit the auto-generated assembly files for X64 2021-07-20 22:06 ` [PATCH v7 3/3] CryptoPkg/OpensslLib: Commit the auto-generated assembly files " Christopher Zurcher @ 2021-07-21 11:44 ` Yao, Jiewen 2021-07-26 10:08 ` 回复: [edk2-devel] " gaoliming 0 siblings, 1 reply; 13+ messages in thread From: Yao, Jiewen @ 2021-07-21 11:44 UTC (permalink / raw) To: christopher.zurcher@outlook.com, devel@edk2.groups.io Cc: Wang, Jian J, Lu, XiaoyuX, Kinney, Michael D, Ard Biesheuvel Reviewed-by: Jiewen Yao <Jiewen.yao@intel.com> > -----Original Message----- > From: christopher.zurcher@outlook.com <christopher.zurcher@outlook.com> > Sent: Wednesday, July 21, 2021 6:07 AM > To: devel@edk2.groups.io > Cc: Yao, Jiewen <jiewen.yao@intel.com>; Wang, Jian J <jian.j.wang@intel.com>; > Lu, XiaoyuX <xiaoyux.lu@intel.com>; Kinney, Michael D > <michael.d.kinney@intel.com>; Ard Biesheuvel <ardb@kernel.org> > Subject: [PATCH v7 3/3] CryptoPkg/OpensslLib: Commit the auto-generated > assembly files for X64 > > From: Christopher Zurcher <christopher.zurcher@microsoft.com> > > BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507 > > Adding the auto-generated assembly files for X64 architectures. > > Cc: Jiewen Yao <jiewen.yao@intel.com> > Cc: Jian J Wang <jian.j.wang@intel.com> > Cc: Xiaoyu Lu <xiaoyux.lu@intel.com> > Cc: Mike Kinney <michael.d.kinney@intel.com> > Cc: Ard Biesheuvel <ardb@kernel.org> > Signed-off-by: Christopher Zurcher <christopher.zurcher@microsoft.com> > --- > CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm | 732 > +++ > CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm | > 1916 ++++++++ > CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm | > 78 + > CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm | 5103 > ++++++++++++++++++++ > CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm | 1173 > +++++ > CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm | > 34 + > CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm | > 1569 ++++++ > CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm | 3137 > ++++++++++++ > CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm | 2884 > +++++++++++ > CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm | > 3461 +++++++++++++ > CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm | 3313 > +++++++++++++ > CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm | 1938 > ++++++++ > CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm | 491 ++ > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-mb-x86_64.S | 552 > +++ > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha1-x86_64.S | 1719 > +++++++ > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha256-x86_64.S | 69 > + > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-x86_64.S | 4484 > +++++++++++++++++ > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/vpaes-x86_64.S | 863 > ++++ > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/aesni-gcm-x86_64.S | > 29 + > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/ghash-x86_64.S | 1386 > ++++++ > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-mb-x86_64.S | 2962 > ++++++++++++ > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-x86_64.S | 2631 > ++++++++++ > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-mb-x86_64.S | > 3286 +++++++++++++ > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-x86_64.S | 3097 > ++++++++++++ > 
CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha512-x86_64.S | 1811 > +++++++ > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/x86_64cpuid.S | 491 ++ > 26 files changed, 49209 insertions(+) > > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb- > x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb- > x86_64.nasm > new file mode 100644 > index 0000000000..1a3ed1dd35 > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm > @@ -0,0 +1,732 @@ > +; WARNING: do not edit! > +; Generated from openssl/crypto/aes/asm/aesni-mb-x86_64.pl > +; > +; Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. > +; > +; Licensed under the OpenSSL license (the "License"). You may not use > +; this file except in compliance with the License. You can obtain a copy > +; in the file LICENSE in the source distribution or at > +; https://www.openssl.org/source/license.html > + > +default rel > +%define XMMWORD > +%define YMMWORD > +%define ZMMWORD > +section .text code align=64 > + > + > +EXTERN OPENSSL_ia32cap_P > + > +global aesni_multi_cbc_encrypt > + > +ALIGN 32 > +aesni_multi_cbc_encrypt: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_aesni_multi_cbc_encrypt: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + > + > + > + mov rax,rsp > + > + push rbx > + > + push rbp > + > + push r12 > + > + push r13 > + > + push r14 > + > + push r15 > + > + lea rsp,[((-168))+rsp] > + movaps XMMWORD[rsp],xmm6 > + movaps XMMWORD[16+rsp],xmm7 > + movaps XMMWORD[32+rsp],xmm8 > + movaps XMMWORD[48+rsp],xmm9 > + movaps XMMWORD[64+rsp],xmm10 > + movaps XMMWORD[80+rsp],xmm11 > + movaps XMMWORD[96+rsp],xmm12 > + movaps XMMWORD[(-104)+rax],xmm13 > + movaps XMMWORD[(-88)+rax],xmm14 > + movaps XMMWORD[(-72)+rax],xmm15 > + > + > + > + > + > + > + sub rsp,48 > + and rsp,-64 > + mov QWORD[16+rsp],rax > + > + > +$L$enc4x_body: > + movdqu xmm12,XMMWORD[rsi] > + lea rsi,[120+rsi] > + lea rdi,[80+rdi] > + > +$L$enc4x_loop_grande: > + mov DWORD[24+rsp],edx > + xor edx,edx > + mov ecx,DWORD[((-64))+rdi] > + mov r8,QWORD[((-80))+rdi] > + cmp ecx,edx > + mov r12,QWORD[((-72))+rdi] > + cmovg edx,ecx > + test ecx,ecx > + movdqu xmm2,XMMWORD[((-56))+rdi] > + mov DWORD[32+rsp],ecx > + cmovle r8,rsp > + mov ecx,DWORD[((-24))+rdi] > + mov r9,QWORD[((-40))+rdi] > + cmp ecx,edx > + mov r13,QWORD[((-32))+rdi] > + cmovg edx,ecx > + test ecx,ecx > + movdqu xmm3,XMMWORD[((-16))+rdi] > + mov DWORD[36+rsp],ecx > + cmovle r9,rsp > + mov ecx,DWORD[16+rdi] > + mov r10,QWORD[rdi] > + cmp ecx,edx > + mov r14,QWORD[8+rdi] > + cmovg edx,ecx > + test ecx,ecx > + movdqu xmm4,XMMWORD[24+rdi] > + mov DWORD[40+rsp],ecx > + cmovle r10,rsp > + mov ecx,DWORD[56+rdi] > + mov r11,QWORD[40+rdi] > + cmp ecx,edx > + mov r15,QWORD[48+rdi] > + cmovg edx,ecx > + test ecx,ecx > + movdqu xmm5,XMMWORD[64+rdi] > + mov DWORD[44+rsp],ecx > + cmovle r11,rsp > + test edx,edx > + jz NEAR $L$enc4x_done > + > + movups xmm1,XMMWORD[((16-120))+rsi] > + pxor xmm2,xmm12 > + movups xmm0,XMMWORD[((32-120))+rsi] > + pxor xmm3,xmm12 > + mov eax,DWORD[((240-120))+rsi] > + pxor xmm4,xmm12 > + movdqu xmm6,XMMWORD[r8] > + pxor xmm5,xmm12 > + movdqu xmm7,XMMWORD[r9] > + pxor xmm2,xmm6 > + movdqu xmm8,XMMWORD[r10] > + pxor xmm3,xmm7 > + movdqu xmm9,XMMWORD[r11] > + pxor xmm4,xmm8 > + pxor xmm5,xmm9 > + movdqa xmm10,XMMWORD[32+rsp] > + xor rbx,rbx > + jmp NEAR $L$oop_enc4x > + > +ALIGN 32 > +$L$oop_enc4x: > + add rbx,16 > + lea rbp,[16+rsp] > + mov ecx,1 > + sub rbp,rbx > + > +DB 
102,15,56,220,209 > + prefetcht0 [31+rbx*1+r8] > + prefetcht0 [31+rbx*1+r9] > +DB 102,15,56,220,217 > + prefetcht0 [31+rbx*1+r10] > + prefetcht0 [31+rbx*1+r10] > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > + movups xmm1,XMMWORD[((48-120))+rsi] > + cmp ecx,DWORD[32+rsp] > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > +DB 102,15,56,220,224 > + cmovge r8,rbp > + cmovg r12,rbp > +DB 102,15,56,220,232 > + movups xmm0,XMMWORD[((-56))+rsi] > + cmp ecx,DWORD[36+rsp] > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > + cmovge r9,rbp > + cmovg r13,rbp > +DB 102,15,56,220,233 > + movups xmm1,XMMWORD[((-40))+rsi] > + cmp ecx,DWORD[40+rsp] > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > +DB 102,15,56,220,224 > + cmovge r10,rbp > + cmovg r14,rbp > +DB 102,15,56,220,232 > + movups xmm0,XMMWORD[((-24))+rsi] > + cmp ecx,DWORD[44+rsp] > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > + cmovge r11,rbp > + cmovg r15,rbp > +DB 102,15,56,220,233 > + movups xmm1,XMMWORD[((-8))+rsi] > + movdqa xmm11,xmm10 > +DB 102,15,56,220,208 > + prefetcht0 [15+rbx*1+r12] > + prefetcht0 [15+rbx*1+r13] > +DB 102,15,56,220,216 > + prefetcht0 [15+rbx*1+r14] > + prefetcht0 [15+rbx*1+r15] > +DB 102,15,56,220,224 > +DB 102,15,56,220,232 > + movups xmm0,XMMWORD[((128-120))+rsi] > + pxor xmm12,xmm12 > + > +DB 102,15,56,220,209 > + pcmpgtd xmm11,xmm12 > + movdqu xmm12,XMMWORD[((-120))+rsi] > +DB 102,15,56,220,217 > + paddd xmm10,xmm11 > + movdqa XMMWORD[32+rsp],xmm10 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > + movups xmm1,XMMWORD[((144-120))+rsi] > + > + cmp eax,11 > + > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > +DB 102,15,56,220,224 > +DB 102,15,56,220,232 > + movups xmm0,XMMWORD[((160-120))+rsi] > + > + jb NEAR $L$enc4x_tail > + > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > + movups xmm1,XMMWORD[((176-120))+rsi] > + > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > +DB 102,15,56,220,224 > +DB 102,15,56,220,232 > + movups xmm0,XMMWORD[((192-120))+rsi] > + > + je NEAR $L$enc4x_tail > + > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > + movups xmm1,XMMWORD[((208-120))+rsi] > + > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > +DB 102,15,56,220,224 > +DB 102,15,56,220,232 > + movups xmm0,XMMWORD[((224-120))+rsi] > + jmp NEAR $L$enc4x_tail > + > +ALIGN 32 > +$L$enc4x_tail: > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > + movdqu xmm6,XMMWORD[rbx*1+r8] > + movdqu xmm1,XMMWORD[((16-120))+rsi] > + > +DB 102,15,56,221,208 > + movdqu xmm7,XMMWORD[rbx*1+r9] > + pxor xmm6,xmm12 > +DB 102,15,56,221,216 > + movdqu xmm8,XMMWORD[rbx*1+r10] > + pxor xmm7,xmm12 > +DB 102,15,56,221,224 > + movdqu xmm9,XMMWORD[rbx*1+r11] > + pxor xmm8,xmm12 > +DB 102,15,56,221,232 > + movdqu xmm0,XMMWORD[((32-120))+rsi] > + pxor xmm9,xmm12 > + > + movups XMMWORD[(-16)+rbx*1+r12],xmm2 > + pxor xmm2,xmm6 > + movups XMMWORD[(-16)+rbx*1+r13],xmm3 > + pxor xmm3,xmm7 > + movups XMMWORD[(-16)+rbx*1+r14],xmm4 > + pxor xmm4,xmm8 > + movups XMMWORD[(-16)+rbx*1+r15],xmm5 > + pxor xmm5,xmm9 > + > + dec edx > + jnz NEAR $L$oop_enc4x > + > + mov rax,QWORD[16+rsp] > + > + mov edx,DWORD[24+rsp] > + > + > + > + > + > + > + > + > + > + > + lea rdi,[160+rdi] > + dec edx > + jnz NEAR $L$enc4x_loop_grande > + > +$L$enc4x_done: > + movaps xmm6,XMMWORD[((-216))+rax] > + movaps xmm7,XMMWORD[((-200))+rax] > + movaps xmm8,XMMWORD[((-184))+rax] > + movaps 
xmm9,XMMWORD[((-168))+rax] > + movaps xmm10,XMMWORD[((-152))+rax] > + movaps xmm11,XMMWORD[((-136))+rax] > + movaps xmm12,XMMWORD[((-120))+rax] > + > + > + > + mov r15,QWORD[((-48))+rax] > + > + mov r14,QWORD[((-40))+rax] > + > + mov r13,QWORD[((-32))+rax] > + > + mov r12,QWORD[((-24))+rax] > + > + mov rbp,QWORD[((-16))+rax] > + > + mov rbx,QWORD[((-8))+rax] > + > + lea rsp,[rax] > + > +$L$enc4x_epilogue: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_aesni_multi_cbc_encrypt: > + > +global aesni_multi_cbc_decrypt > + > +ALIGN 32 > +aesni_multi_cbc_decrypt: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_aesni_multi_cbc_decrypt: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + > + > + > + mov rax,rsp > + > + push rbx > + > + push rbp > + > + push r12 > + > + push r13 > + > + push r14 > + > + push r15 > + > + lea rsp,[((-168))+rsp] > + movaps XMMWORD[rsp],xmm6 > + movaps XMMWORD[16+rsp],xmm7 > + movaps XMMWORD[32+rsp],xmm8 > + movaps XMMWORD[48+rsp],xmm9 > + movaps XMMWORD[64+rsp],xmm10 > + movaps XMMWORD[80+rsp],xmm11 > + movaps XMMWORD[96+rsp],xmm12 > + movaps XMMWORD[(-104)+rax],xmm13 > + movaps XMMWORD[(-88)+rax],xmm14 > + movaps XMMWORD[(-72)+rax],xmm15 > + > + > + > + > + > + > + sub rsp,48 > + and rsp,-64 > + mov QWORD[16+rsp],rax > + > + > +$L$dec4x_body: > + movdqu xmm12,XMMWORD[rsi] > + lea rsi,[120+rsi] > + lea rdi,[80+rdi] > + > +$L$dec4x_loop_grande: > + mov DWORD[24+rsp],edx > + xor edx,edx > + mov ecx,DWORD[((-64))+rdi] > + mov r8,QWORD[((-80))+rdi] > + cmp ecx,edx > + mov r12,QWORD[((-72))+rdi] > + cmovg edx,ecx > + test ecx,ecx > + movdqu xmm6,XMMWORD[((-56))+rdi] > + mov DWORD[32+rsp],ecx > + cmovle r8,rsp > + mov ecx,DWORD[((-24))+rdi] > + mov r9,QWORD[((-40))+rdi] > + cmp ecx,edx > + mov r13,QWORD[((-32))+rdi] > + cmovg edx,ecx > + test ecx,ecx > + movdqu xmm7,XMMWORD[((-16))+rdi] > + mov DWORD[36+rsp],ecx > + cmovle r9,rsp > + mov ecx,DWORD[16+rdi] > + mov r10,QWORD[rdi] > + cmp ecx,edx > + mov r14,QWORD[8+rdi] > + cmovg edx,ecx > + test ecx,ecx > + movdqu xmm8,XMMWORD[24+rdi] > + mov DWORD[40+rsp],ecx > + cmovle r10,rsp > + mov ecx,DWORD[56+rdi] > + mov r11,QWORD[40+rdi] > + cmp ecx,edx > + mov r15,QWORD[48+rdi] > + cmovg edx,ecx > + test ecx,ecx > + movdqu xmm9,XMMWORD[64+rdi] > + mov DWORD[44+rsp],ecx > + cmovle r11,rsp > + test edx,edx > + jz NEAR $L$dec4x_done > + > + movups xmm1,XMMWORD[((16-120))+rsi] > + movups xmm0,XMMWORD[((32-120))+rsi] > + mov eax,DWORD[((240-120))+rsi] > + movdqu xmm2,XMMWORD[r8] > + movdqu xmm3,XMMWORD[r9] > + pxor xmm2,xmm12 > + movdqu xmm4,XMMWORD[r10] > + pxor xmm3,xmm12 > + movdqu xmm5,XMMWORD[r11] > + pxor xmm4,xmm12 > + pxor xmm5,xmm12 > + movdqa xmm10,XMMWORD[32+rsp] > + xor rbx,rbx > + jmp NEAR $L$oop_dec4x > + > +ALIGN 32 > +$L$oop_dec4x: > + add rbx,16 > + lea rbp,[16+rsp] > + mov ecx,1 > + sub rbp,rbx > + > +DB 102,15,56,222,209 > + prefetcht0 [31+rbx*1+r8] > + prefetcht0 [31+rbx*1+r9] > +DB 102,15,56,222,217 > + prefetcht0 [31+rbx*1+r10] > + prefetcht0 [31+rbx*1+r11] > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > + movups xmm1,XMMWORD[((48-120))+rsi] > + cmp ecx,DWORD[32+rsp] > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > + cmovge r8,rbp > + cmovg r12,rbp > +DB 102,15,56,222,232 > + movups xmm0,XMMWORD[((-56))+rsi] > + cmp ecx,DWORD[36+rsp] > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > + cmovge r9,rbp > + cmovg r13,rbp > +DB 102,15,56,222,233 > + movups 
xmm1,XMMWORD[((-40))+rsi] > + cmp ecx,DWORD[40+rsp] > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > + cmovge r10,rbp > + cmovg r14,rbp > +DB 102,15,56,222,232 > + movups xmm0,XMMWORD[((-24))+rsi] > + cmp ecx,DWORD[44+rsp] > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > + cmovge r11,rbp > + cmovg r15,rbp > +DB 102,15,56,222,233 > + movups xmm1,XMMWORD[((-8))+rsi] > + movdqa xmm11,xmm10 > +DB 102,15,56,222,208 > + prefetcht0 [15+rbx*1+r12] > + prefetcht0 [15+rbx*1+r13] > +DB 102,15,56,222,216 > + prefetcht0 [15+rbx*1+r14] > + prefetcht0 [15+rbx*1+r15] > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > + movups xmm0,XMMWORD[((128-120))+rsi] > + pxor xmm12,xmm12 > + > +DB 102,15,56,222,209 > + pcmpgtd xmm11,xmm12 > + movdqu xmm12,XMMWORD[((-120))+rsi] > +DB 102,15,56,222,217 > + paddd xmm10,xmm11 > + movdqa XMMWORD[32+rsp],xmm10 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > + movups xmm1,XMMWORD[((144-120))+rsi] > + > + cmp eax,11 > + > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > + movups xmm0,XMMWORD[((160-120))+rsi] > + > + jb NEAR $L$dec4x_tail > + > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > + movups xmm1,XMMWORD[((176-120))+rsi] > + > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > + movups xmm0,XMMWORD[((192-120))+rsi] > + > + je NEAR $L$dec4x_tail > + > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > + movups xmm1,XMMWORD[((208-120))+rsi] > + > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > + movups xmm0,XMMWORD[((224-120))+rsi] > + jmp NEAR $L$dec4x_tail > + > +ALIGN 32 > +$L$dec4x_tail: > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > + pxor xmm6,xmm0 > + pxor xmm7,xmm0 > +DB 102,15,56,222,233 > + movdqu xmm1,XMMWORD[((16-120))+rsi] > + pxor xmm8,xmm0 > + pxor xmm9,xmm0 > + movdqu xmm0,XMMWORD[((32-120))+rsi] > + > +DB 102,15,56,223,214 > +DB 102,15,56,223,223 > + movdqu xmm6,XMMWORD[((-16))+rbx*1+r8] > + movdqu xmm7,XMMWORD[((-16))+rbx*1+r9] > +DB 102,65,15,56,223,224 > +DB 102,65,15,56,223,233 > + movdqu xmm8,XMMWORD[((-16))+rbx*1+r10] > + movdqu xmm9,XMMWORD[((-16))+rbx*1+r11] > + > + movups XMMWORD[(-16)+rbx*1+r12],xmm2 > + movdqu xmm2,XMMWORD[rbx*1+r8] > + movups XMMWORD[(-16)+rbx*1+r13],xmm3 > + movdqu xmm3,XMMWORD[rbx*1+r9] > + pxor xmm2,xmm12 > + movups XMMWORD[(-16)+rbx*1+r14],xmm4 > + movdqu xmm4,XMMWORD[rbx*1+r10] > + pxor xmm3,xmm12 > + movups XMMWORD[(-16)+rbx*1+r15],xmm5 > + movdqu xmm5,XMMWORD[rbx*1+r11] > + pxor xmm4,xmm12 > + pxor xmm5,xmm12 > + > + dec edx > + jnz NEAR $L$oop_dec4x > + > + mov rax,QWORD[16+rsp] > + > + mov edx,DWORD[24+rsp] > + > + lea rdi,[160+rdi] > + dec edx > + jnz NEAR $L$dec4x_loop_grande > + > +$L$dec4x_done: > + movaps xmm6,XMMWORD[((-216))+rax] > + movaps xmm7,XMMWORD[((-200))+rax] > + movaps xmm8,XMMWORD[((-184))+rax] > + movaps xmm9,XMMWORD[((-168))+rax] > + movaps xmm10,XMMWORD[((-152))+rax] > + movaps xmm11,XMMWORD[((-136))+rax] > + movaps xmm12,XMMWORD[((-120))+rax] > + > + > + > + mov r15,QWORD[((-48))+rax] > + > + mov r14,QWORD[((-40))+rax] > + > + mov r13,QWORD[((-32))+rax] > + > + mov r12,QWORD[((-24))+rax] > + > + mov rbp,QWORD[((-16))+rax] > + > + mov rbx,QWORD[((-8))+rax] > + > + lea rsp,[rax] > + > +$L$dec4x_epilogue: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h 
;repret > + > +$L$SEH_end_aesni_multi_cbc_decrypt: > +EXTERN __imp_RtlVirtualUnwind > + > +ALIGN 16 > +se_handler: > + push rsi > + push rdi > + push rbx > + push rbp > + push r12 > + push r13 > + push r14 > + push r15 > + pushfq > + sub rsp,64 > + > + mov rax,QWORD[120+r8] > + mov rbx,QWORD[248+r8] > + > + mov rsi,QWORD[8+r9] > + mov r11,QWORD[56+r9] > + > + mov r10d,DWORD[r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jb NEAR $L$in_prologue > + > + mov rax,QWORD[152+r8] > + > + mov r10d,DWORD[4+r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jae NEAR $L$in_prologue > + > + mov rax,QWORD[16+rax] > + > + mov rbx,QWORD[((-8))+rax] > + mov rbp,QWORD[((-16))+rax] > + mov r12,QWORD[((-24))+rax] > + mov r13,QWORD[((-32))+rax] > + mov r14,QWORD[((-40))+rax] > + mov r15,QWORD[((-48))+rax] > + mov QWORD[144+r8],rbx > + mov QWORD[160+r8],rbp > + mov QWORD[216+r8],r12 > + mov QWORD[224+r8],r13 > + mov QWORD[232+r8],r14 > + mov QWORD[240+r8],r15 > + > + lea rsi,[((-56-160))+rax] > + lea rdi,[512+r8] > + mov ecx,20 > + DD 0xa548f3fc > + > +$L$in_prologue: > + mov rdi,QWORD[8+rax] > + mov rsi,QWORD[16+rax] > + mov QWORD[152+r8],rax > + mov QWORD[168+r8],rsi > + mov QWORD[176+r8],rdi > + > + mov rdi,QWORD[40+r9] > + mov rsi,r8 > + mov ecx,154 > + DD 0xa548f3fc > + > + mov rsi,r9 > + xor rcx,rcx > + mov rdx,QWORD[8+rsi] > + mov r8,QWORD[rsi] > + mov r9,QWORD[16+rsi] > + mov r10,QWORD[40+rsi] > + lea r11,[56+rsi] > + lea r12,[24+rsi] > + mov QWORD[32+rsp],r10 > + mov QWORD[40+rsp],r11 > + mov QWORD[48+rsp],r12 > + mov QWORD[56+rsp],rcx > + call QWORD[__imp_RtlVirtualUnwind] > + > + mov eax,1 > + add rsp,64 > + popfq > + pop r15 > + pop r14 > + pop r13 > + pop r12 > + pop rbp > + pop rbx > + pop rdi > + pop rsi > + DB 0F3h,0C3h ;repret > + > + > +section .pdata rdata align=4 > +ALIGN 4 > + DD $L$SEH_begin_aesni_multi_cbc_encrypt wrt ..imagebase > + DD $L$SEH_end_aesni_multi_cbc_encrypt wrt ..imagebase > + DD $L$SEH_info_aesni_multi_cbc_encrypt wrt ..imagebase > + DD $L$SEH_begin_aesni_multi_cbc_decrypt wrt ..imagebase > + DD $L$SEH_end_aesni_multi_cbc_decrypt wrt ..imagebase > + DD $L$SEH_info_aesni_multi_cbc_decrypt wrt ..imagebase > +section .xdata rdata align=8 > +ALIGN 8 > +$L$SEH_info_aesni_multi_cbc_encrypt: > +DB 9,0,0,0 > + DD se_handler wrt ..imagebase > + DD $L$enc4x_body wrt ..imagebase,$L$enc4x_epilogue > wrt ..imagebase > +$L$SEH_info_aesni_multi_cbc_decrypt: > +DB 9,0,0,0 > + DD se_handler wrt ..imagebase > + DD $L$dec4x_body wrt ..imagebase,$L$dec4x_epilogue > wrt ..imagebase > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1- > x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1- > x86_64.nasm > new file mode 100644 > index 0000000000..f4fd9ca50d > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm > @@ -0,0 +1,1916 @@ > +; WARNING: do not edit! > +; Generated from openssl/crypto/aes/asm/aesni-sha1-x86_64.pl > +; > +; Copyright 2011-2020 The OpenSSL Project Authors. All Rights Reserved. > +; > +; Licensed under the OpenSSL license (the "License"). You may not use > +; this file except in compliance with the License. 
You can obtain a copy > +; in the file LICENSE in the source distribution or at > +; https://www.openssl.org/source/license.html > + > +default rel > +%define XMMWORD > +%define YMMWORD > +%define ZMMWORD > +section .text code align=64 > + > +EXTERN OPENSSL_ia32cap_P > + > +global aesni_cbc_sha1_enc > + > +ALIGN 32 > +aesni_cbc_sha1_enc: > + > + > + mov r10d,DWORD[((OPENSSL_ia32cap_P+0))] > + mov r11,QWORD[((OPENSSL_ia32cap_P+4))] > + bt r11,61 > + jc NEAR aesni_cbc_sha1_enc_shaext > + jmp NEAR aesni_cbc_sha1_enc_ssse3 > + DB 0F3h,0C3h ;repret > + > + > + > +ALIGN 32 > +aesni_cbc_sha1_enc_ssse3: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_aesni_cbc_sha1_enc_ssse3: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + mov rcx,r9 > + mov r8,QWORD[40+rsp] > + mov r9,QWORD[48+rsp] > + > + > + > + mov r10,QWORD[56+rsp] > + > + > + push rbx > + > + push rbp > + > + push r12 > + > + push r13 > + > + push r14 > + > + push r15 > + > + lea rsp,[((-264))+rsp] > + > + > + > + movaps XMMWORD[(96+0)+rsp],xmm6 > + movaps XMMWORD[(96+16)+rsp],xmm7 > + movaps XMMWORD[(96+32)+rsp],xmm8 > + movaps XMMWORD[(96+48)+rsp],xmm9 > + movaps XMMWORD[(96+64)+rsp],xmm10 > + movaps XMMWORD[(96+80)+rsp],xmm11 > + movaps XMMWORD[(96+96)+rsp],xmm12 > + movaps XMMWORD[(96+112)+rsp],xmm13 > + movaps XMMWORD[(96+128)+rsp],xmm14 > + movaps XMMWORD[(96+144)+rsp],xmm15 > +$L$prologue_ssse3: > + mov r12,rdi > + mov r13,rsi > + mov r14,rdx > + lea r15,[112+rcx] > + movdqu xmm2,XMMWORD[r8] > + mov QWORD[88+rsp],r8 > + shl r14,6 > + sub r13,r12 > + mov r8d,DWORD[((240-112))+r15] > + add r14,r10 > + > + lea r11,[K_XX_XX] > + mov eax,DWORD[r9] > + mov ebx,DWORD[4+r9] > + mov ecx,DWORD[8+r9] > + mov edx,DWORD[12+r9] > + mov esi,ebx > + mov ebp,DWORD[16+r9] > + mov edi,ecx > + xor edi,edx > + and esi,edi > + > + movdqa xmm3,XMMWORD[64+r11] > + movdqa xmm13,XMMWORD[r11] > + movdqu xmm4,XMMWORD[r10] > + movdqu xmm5,XMMWORD[16+r10] > + movdqu xmm6,XMMWORD[32+r10] > + movdqu xmm7,XMMWORD[48+r10] > +DB 102,15,56,0,227 > +DB 102,15,56,0,235 > +DB 102,15,56,0,243 > + add r10,64 > + paddd xmm4,xmm13 > +DB 102,15,56,0,251 > + paddd xmm5,xmm13 > + paddd xmm6,xmm13 > + movdqa XMMWORD[rsp],xmm4 > + psubd xmm4,xmm13 > + movdqa XMMWORD[16+rsp],xmm5 > + psubd xmm5,xmm13 > + movdqa XMMWORD[32+rsp],xmm6 > + psubd xmm6,xmm13 > + movups xmm15,XMMWORD[((-112))+r15] > + movups xmm0,XMMWORD[((16-112))+r15] > + jmp NEAR $L$oop_ssse3 > +ALIGN 32 > +$L$oop_ssse3: > + ror ebx,2 > + movups xmm14,XMMWORD[r12] > + xorps xmm14,xmm15 > + xorps xmm2,xmm14 > + movups xmm1,XMMWORD[((-80))+r15] > +DB 102,15,56,220,208 > + pshufd xmm8,xmm4,238 > + xor esi,edx > + movdqa xmm12,xmm7 > + paddd xmm13,xmm7 > + mov edi,eax > + add ebp,DWORD[rsp] > + punpcklqdq xmm8,xmm5 > + xor ebx,ecx > + rol eax,5 > + add ebp,esi > + psrldq xmm12,4 > + and edi,ebx > + xor ebx,ecx > + pxor xmm8,xmm4 > + add ebp,eax > + ror eax,7 > + pxor xmm12,xmm6 > + xor edi,ecx > + mov esi,ebp > + add edx,DWORD[4+rsp] > + pxor xmm8,xmm12 > + xor eax,ebx > + rol ebp,5 > + movdqa XMMWORD[48+rsp],xmm13 > + add edx,edi > + movups xmm0,XMMWORD[((-64))+r15] > +DB 102,15,56,220,209 > + and esi,eax > + movdqa xmm3,xmm8 > + xor eax,ebx > + add edx,ebp > + ror ebp,7 > + movdqa xmm12,xmm8 > + xor esi,ebx > + pslldq xmm3,12 > + paddd xmm8,xmm8 > + mov edi,edx > + add ecx,DWORD[8+rsp] > + psrld xmm12,31 > + xor ebp,eax > + rol edx,5 > + add ecx,esi > + movdqa xmm13,xmm3 > + and edi,ebp > + xor ebp,eax > + psrld xmm3,30 > + add ecx,edx > + ror edx,7 > + por xmm8,xmm12 
> + xor edi,eax > + mov esi,ecx > + add ebx,DWORD[12+rsp] > + movups xmm1,XMMWORD[((-48))+r15] > +DB 102,15,56,220,208 > + pslld xmm13,2 > + pxor xmm8,xmm3 > + xor edx,ebp > + movdqa xmm3,XMMWORD[r11] > + rol ecx,5 > + add ebx,edi > + and esi,edx > + pxor xmm8,xmm13 > + xor edx,ebp > + add ebx,ecx > + ror ecx,7 > + pshufd xmm9,xmm5,238 > + xor esi,ebp > + movdqa xmm13,xmm8 > + paddd xmm3,xmm8 > + mov edi,ebx > + add eax,DWORD[16+rsp] > + punpcklqdq xmm9,xmm6 > + xor ecx,edx > + rol ebx,5 > + add eax,esi > + psrldq xmm13,4 > + and edi,ecx > + xor ecx,edx > + pxor xmm9,xmm5 > + add eax,ebx > + ror ebx,7 > + movups xmm0,XMMWORD[((-32))+r15] > +DB 102,15,56,220,209 > + pxor xmm13,xmm7 > + xor edi,edx > + mov esi,eax > + add ebp,DWORD[20+rsp] > + pxor xmm9,xmm13 > + xor ebx,ecx > + rol eax,5 > + movdqa XMMWORD[rsp],xmm3 > + add ebp,edi > + and esi,ebx > + movdqa xmm12,xmm9 > + xor ebx,ecx > + add ebp,eax > + ror eax,7 > + movdqa xmm13,xmm9 > + xor esi,ecx > + pslldq xmm12,12 > + paddd xmm9,xmm9 > + mov edi,ebp > + add edx,DWORD[24+rsp] > + psrld xmm13,31 > + xor eax,ebx > + rol ebp,5 > + add edx,esi > + movups xmm1,XMMWORD[((-16))+r15] > +DB 102,15,56,220,208 > + movdqa xmm3,xmm12 > + and edi,eax > + xor eax,ebx > + psrld xmm12,30 > + add edx,ebp > + ror ebp,7 > + por xmm9,xmm13 > + xor edi,ebx > + mov esi,edx > + add ecx,DWORD[28+rsp] > + pslld xmm3,2 > + pxor xmm9,xmm12 > + xor ebp,eax > + movdqa xmm12,XMMWORD[16+r11] > + rol edx,5 > + add ecx,edi > + and esi,ebp > + pxor xmm9,xmm3 > + xor ebp,eax > + add ecx,edx > + ror edx,7 > + pshufd xmm10,xmm6,238 > + xor esi,eax > + movdqa xmm3,xmm9 > + paddd xmm12,xmm9 > + mov edi,ecx > + add ebx,DWORD[32+rsp] > + movups xmm0,XMMWORD[r15] > +DB 102,15,56,220,209 > + punpcklqdq xmm10,xmm7 > + xor edx,ebp > + rol ecx,5 > + add ebx,esi > + psrldq xmm3,4 > + and edi,edx > + xor edx,ebp > + pxor xmm10,xmm6 > + add ebx,ecx > + ror ecx,7 > + pxor xmm3,xmm8 > + xor edi,ebp > + mov esi,ebx > + add eax,DWORD[36+rsp] > + pxor xmm10,xmm3 > + xor ecx,edx > + rol ebx,5 > + movdqa XMMWORD[16+rsp],xmm12 > + add eax,edi > + and esi,ecx > + movdqa xmm13,xmm10 > + xor ecx,edx > + add eax,ebx > + ror ebx,7 > + movups xmm1,XMMWORD[16+r15] > +DB 102,15,56,220,208 > + movdqa xmm3,xmm10 > + xor esi,edx > + pslldq xmm13,12 > + paddd xmm10,xmm10 > + mov edi,eax > + add ebp,DWORD[40+rsp] > + psrld xmm3,31 > + xor ebx,ecx > + rol eax,5 > + add ebp,esi > + movdqa xmm12,xmm13 > + and edi,ebx > + xor ebx,ecx > + psrld xmm13,30 > + add ebp,eax > + ror eax,7 > + por xmm10,xmm3 > + xor edi,ecx > + mov esi,ebp > + add edx,DWORD[44+rsp] > + pslld xmm12,2 > + pxor xmm10,xmm13 > + xor eax,ebx > + movdqa xmm13,XMMWORD[16+r11] > + rol ebp,5 > + add edx,edi > + movups xmm0,XMMWORD[32+r15] > +DB 102,15,56,220,209 > + and esi,eax > + pxor xmm10,xmm12 > + xor eax,ebx > + add edx,ebp > + ror ebp,7 > + pshufd xmm11,xmm7,238 > + xor esi,ebx > + movdqa xmm12,xmm10 > + paddd xmm13,xmm10 > + mov edi,edx > + add ecx,DWORD[48+rsp] > + punpcklqdq xmm11,xmm8 > + xor ebp,eax > + rol edx,5 > + add ecx,esi > + psrldq xmm12,4 > + and edi,ebp > + xor ebp,eax > + pxor xmm11,xmm7 > + add ecx,edx > + ror edx,7 > + pxor xmm12,xmm9 > + xor edi,eax > + mov esi,ecx > + add ebx,DWORD[52+rsp] > + movups xmm1,XMMWORD[48+r15] > +DB 102,15,56,220,208 > + pxor xmm11,xmm12 > + xor edx,ebp > + rol ecx,5 > + movdqa XMMWORD[32+rsp],xmm13 > + add ebx,edi > + and esi,edx > + movdqa xmm3,xmm11 > + xor edx,ebp > + add ebx,ecx > + ror ecx,7 > + movdqa xmm12,xmm11 > + xor esi,ebp > + pslldq xmm3,12 > + paddd xmm11,xmm11 > + mov 
edi,ebx > + add eax,DWORD[56+rsp] > + psrld xmm12,31 > + xor ecx,edx > + rol ebx,5 > + add eax,esi > + movdqa xmm13,xmm3 > + and edi,ecx > + xor ecx,edx > + psrld xmm3,30 > + add eax,ebx > + ror ebx,7 > + cmp r8d,11 > + jb NEAR $L$aesenclast1 > + movups xmm0,XMMWORD[64+r15] > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[80+r15] > +DB 102,15,56,220,208 > + je NEAR $L$aesenclast1 > + movups xmm0,XMMWORD[96+r15] > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[112+r15] > +DB 102,15,56,220,208 > +$L$aesenclast1: > +DB 102,15,56,221,209 > + movups xmm0,XMMWORD[((16-112))+r15] > + por xmm11,xmm12 > + xor edi,edx > + mov esi,eax > + add ebp,DWORD[60+rsp] > + pslld xmm13,2 > + pxor xmm11,xmm3 > + xor ebx,ecx > + movdqa xmm3,XMMWORD[16+r11] > + rol eax,5 > + add ebp,edi > + and esi,ebx > + pxor xmm11,xmm13 > + pshufd xmm13,xmm10,238 > + xor ebx,ecx > + add ebp,eax > + ror eax,7 > + pxor xmm4,xmm8 > + xor esi,ecx > + mov edi,ebp > + add edx,DWORD[rsp] > + punpcklqdq xmm13,xmm11 > + xor eax,ebx > + rol ebp,5 > + pxor xmm4,xmm5 > + add edx,esi > + movups xmm14,XMMWORD[16+r12] > + xorps xmm14,xmm15 > + movups XMMWORD[r13*1+r12],xmm2 > + xorps xmm2,xmm14 > + movups xmm1,XMMWORD[((-80))+r15] > +DB 102,15,56,220,208 > + and edi,eax > + movdqa xmm12,xmm3 > + xor eax,ebx > + paddd xmm3,xmm11 > + add edx,ebp > + pxor xmm4,xmm13 > + ror ebp,7 > + xor edi,ebx > + mov esi,edx > + add ecx,DWORD[4+rsp] > + movdqa xmm13,xmm4 > + xor ebp,eax > + rol edx,5 > + movdqa XMMWORD[48+rsp],xmm3 > + add ecx,edi > + and esi,ebp > + xor ebp,eax > + pslld xmm4,2 > + add ecx,edx > + ror edx,7 > + psrld xmm13,30 > + xor esi,eax > + mov edi,ecx > + add ebx,DWORD[8+rsp] > + movups xmm0,XMMWORD[((-64))+r15] > +DB 102,15,56,220,209 > + por xmm4,xmm13 > + xor edx,ebp > + rol ecx,5 > + pshufd xmm3,xmm11,238 > + add ebx,esi > + and edi,edx > + xor edx,ebp > + add ebx,ecx > + add eax,DWORD[12+rsp] > + xor edi,ebp > + mov esi,ebx > + rol ebx,5 > + add eax,edi > + xor esi,edx > + ror ecx,7 > + add eax,ebx > + pxor xmm5,xmm9 > + add ebp,DWORD[16+rsp] > + movups xmm1,XMMWORD[((-48))+r15] > +DB 102,15,56,220,208 > + xor esi,ecx > + punpcklqdq xmm3,xmm4 > + mov edi,eax > + rol eax,5 > + pxor xmm5,xmm6 > + add ebp,esi > + xor edi,ecx > + movdqa xmm13,xmm12 > + ror ebx,7 > + paddd xmm12,xmm4 > + add ebp,eax > + pxor xmm5,xmm3 > + add edx,DWORD[20+rsp] > + xor edi,ebx > + mov esi,ebp > + rol ebp,5 > + movdqa xmm3,xmm5 > + add edx,edi > + xor esi,ebx > + movdqa XMMWORD[rsp],xmm12 > + ror eax,7 > + add edx,ebp > + add ecx,DWORD[24+rsp] > + pslld xmm5,2 > + xor esi,eax > + mov edi,edx > + psrld xmm3,30 > + rol edx,5 > + add ecx,esi > + movups xmm0,XMMWORD[((-32))+r15] > +DB 102,15,56,220,209 > + xor edi,eax > + ror ebp,7 > + por xmm5,xmm3 > + add ecx,edx > + add ebx,DWORD[28+rsp] > + pshufd xmm12,xmm4,238 > + xor edi,ebp > + mov esi,ecx > + rol ecx,5 > + add ebx,edi > + xor esi,ebp > + ror edx,7 > + add ebx,ecx > + pxor xmm6,xmm10 > + add eax,DWORD[32+rsp] > + xor esi,edx > + punpcklqdq xmm12,xmm5 > + mov edi,ebx > + rol ebx,5 > + pxor xmm6,xmm7 > + add eax,esi > + xor edi,edx > + movdqa xmm3,XMMWORD[32+r11] > + ror ecx,7 > + paddd xmm13,xmm5 > + add eax,ebx > + pxor xmm6,xmm12 > + add ebp,DWORD[36+rsp] > + movups xmm1,XMMWORD[((-16))+r15] > +DB 102,15,56,220,208 > + xor edi,ecx > + mov esi,eax > + rol eax,5 > + movdqa xmm12,xmm6 > + add ebp,edi > + xor esi,ecx > + movdqa XMMWORD[16+rsp],xmm13 > + ror ebx,7 > + add ebp,eax > + add edx,DWORD[40+rsp] > + pslld xmm6,2 > + xor esi,ebx > + mov edi,ebp > + psrld xmm12,30 > + rol ebp,5 > + add edx,esi > + 
xor edi,ebx > + ror eax,7 > + por xmm6,xmm12 > + add edx,ebp > + add ecx,DWORD[44+rsp] > + pshufd xmm13,xmm5,238 > + xor edi,eax > + mov esi,edx > + rol edx,5 > + add ecx,edi > + movups xmm0,XMMWORD[r15] > +DB 102,15,56,220,209 > + xor esi,eax > + ror ebp,7 > + add ecx,edx > + pxor xmm7,xmm11 > + add ebx,DWORD[48+rsp] > + xor esi,ebp > + punpcklqdq xmm13,xmm6 > + mov edi,ecx > + rol ecx,5 > + pxor xmm7,xmm8 > + add ebx,esi > + xor edi,ebp > + movdqa xmm12,xmm3 > + ror edx,7 > + paddd xmm3,xmm6 > + add ebx,ecx > + pxor xmm7,xmm13 > + add eax,DWORD[52+rsp] > + xor edi,edx > + mov esi,ebx > + rol ebx,5 > + movdqa xmm13,xmm7 > + add eax,edi > + xor esi,edx > + movdqa XMMWORD[32+rsp],xmm3 > + ror ecx,7 > + add eax,ebx > + add ebp,DWORD[56+rsp] > + movups xmm1,XMMWORD[16+r15] > +DB 102,15,56,220,208 > + pslld xmm7,2 > + xor esi,ecx > + mov edi,eax > + psrld xmm13,30 > + rol eax,5 > + add ebp,esi > + xor edi,ecx > + ror ebx,7 > + por xmm7,xmm13 > + add ebp,eax > + add edx,DWORD[60+rsp] > + pshufd xmm3,xmm6,238 > + xor edi,ebx > + mov esi,ebp > + rol ebp,5 > + add edx,edi > + xor esi,ebx > + ror eax,7 > + add edx,ebp > + pxor xmm8,xmm4 > + add ecx,DWORD[rsp] > + xor esi,eax > + punpcklqdq xmm3,xmm7 > + mov edi,edx > + rol edx,5 > + pxor xmm8,xmm9 > + add ecx,esi > + movups xmm0,XMMWORD[32+r15] > +DB 102,15,56,220,209 > + xor edi,eax > + movdqa xmm13,xmm12 > + ror ebp,7 > + paddd xmm12,xmm7 > + add ecx,edx > + pxor xmm8,xmm3 > + add ebx,DWORD[4+rsp] > + xor edi,ebp > + mov esi,ecx > + rol ecx,5 > + movdqa xmm3,xmm8 > + add ebx,edi > + xor esi,ebp > + movdqa XMMWORD[48+rsp],xmm12 > + ror edx,7 > + add ebx,ecx > + add eax,DWORD[8+rsp] > + pslld xmm8,2 > + xor esi,edx > + mov edi,ebx > + psrld xmm3,30 > + rol ebx,5 > + add eax,esi > + xor edi,edx > + ror ecx,7 > + por xmm8,xmm3 > + add eax,ebx > + add ebp,DWORD[12+rsp] > + movups xmm1,XMMWORD[48+r15] > +DB 102,15,56,220,208 > + pshufd xmm12,xmm7,238 > + xor edi,ecx > + mov esi,eax > + rol eax,5 > + add ebp,edi > + xor esi,ecx > + ror ebx,7 > + add ebp,eax > + pxor xmm9,xmm5 > + add edx,DWORD[16+rsp] > + xor esi,ebx > + punpcklqdq xmm12,xmm8 > + mov edi,ebp > + rol ebp,5 > + pxor xmm9,xmm10 > + add edx,esi > + xor edi,ebx > + movdqa xmm3,xmm13 > + ror eax,7 > + paddd xmm13,xmm8 > + add edx,ebp > + pxor xmm9,xmm12 > + add ecx,DWORD[20+rsp] > + xor edi,eax > + mov esi,edx > + rol edx,5 > + movdqa xmm12,xmm9 > + add ecx,edi > + cmp r8d,11 > + jb NEAR $L$aesenclast2 > + movups xmm0,XMMWORD[64+r15] > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[80+r15] > +DB 102,15,56,220,208 > + je NEAR $L$aesenclast2 > + movups xmm0,XMMWORD[96+r15] > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[112+r15] > +DB 102,15,56,220,208 > +$L$aesenclast2: > +DB 102,15,56,221,209 > + movups xmm0,XMMWORD[((16-112))+r15] > + xor esi,eax > + movdqa XMMWORD[rsp],xmm13 > + ror ebp,7 > + add ecx,edx > + add ebx,DWORD[24+rsp] > + pslld xmm9,2 > + xor esi,ebp > + mov edi,ecx > + psrld xmm12,30 > + rol ecx,5 > + add ebx,esi > + xor edi,ebp > + ror edx,7 > + por xmm9,xmm12 > + add ebx,ecx > + add eax,DWORD[28+rsp] > + pshufd xmm13,xmm8,238 > + ror ecx,7 > + mov esi,ebx > + xor edi,edx > + rol ebx,5 > + add eax,edi > + xor esi,ecx > + xor ecx,edx > + add eax,ebx > + pxor xmm10,xmm6 > + add ebp,DWORD[32+rsp] > + movups xmm14,XMMWORD[32+r12] > + xorps xmm14,xmm15 > + movups XMMWORD[16+r12*1+r13],xmm2 > + xorps xmm2,xmm14 > + movups xmm1,XMMWORD[((-80))+r15] > +DB 102,15,56,220,208 > + and esi,ecx > + xor ecx,edx > + ror ebx,7 > + punpcklqdq xmm13,xmm9 > + mov edi,eax > + xor esi,ecx > + pxor 
xmm10,xmm11 > + rol eax,5 > + add ebp,esi > + movdqa xmm12,xmm3 > + xor edi,ebx > + paddd xmm3,xmm9 > + xor ebx,ecx > + pxor xmm10,xmm13 > + add ebp,eax > + add edx,DWORD[36+rsp] > + and edi,ebx > + xor ebx,ecx > + ror eax,7 > + movdqa xmm13,xmm10 > + mov esi,ebp > + xor edi,ebx > + movdqa XMMWORD[16+rsp],xmm3 > + rol ebp,5 > + add edx,edi > + movups xmm0,XMMWORD[((-64))+r15] > +DB 102,15,56,220,209 > + xor esi,eax > + pslld xmm10,2 > + xor eax,ebx > + add edx,ebp > + psrld xmm13,30 > + add ecx,DWORD[40+rsp] > + and esi,eax > + xor eax,ebx > + por xmm10,xmm13 > + ror ebp,7 > + mov edi,edx > + xor esi,eax > + rol edx,5 > + pshufd xmm3,xmm9,238 > + add ecx,esi > + xor edi,ebp > + xor ebp,eax > + add ecx,edx > + add ebx,DWORD[44+rsp] > + and edi,ebp > + xor ebp,eax > + ror edx,7 > + movups xmm1,XMMWORD[((-48))+r15] > +DB 102,15,56,220,208 > + mov esi,ecx > + xor edi,ebp > + rol ecx,5 > + add ebx,edi > + xor esi,edx > + xor edx,ebp > + add ebx,ecx > + pxor xmm11,xmm7 > + add eax,DWORD[48+rsp] > + and esi,edx > + xor edx,ebp > + ror ecx,7 > + punpcklqdq xmm3,xmm10 > + mov edi,ebx > + xor esi,edx > + pxor xmm11,xmm4 > + rol ebx,5 > + add eax,esi > + movdqa xmm13,XMMWORD[48+r11] > + xor edi,ecx > + paddd xmm12,xmm10 > + xor ecx,edx > + pxor xmm11,xmm3 > + add eax,ebx > + add ebp,DWORD[52+rsp] > + movups xmm0,XMMWORD[((-32))+r15] > +DB 102,15,56,220,209 > + and edi,ecx > + xor ecx,edx > + ror ebx,7 > + movdqa xmm3,xmm11 > + mov esi,eax > + xor edi,ecx > + movdqa XMMWORD[32+rsp],xmm12 > + rol eax,5 > + add ebp,edi > + xor esi,ebx > + pslld xmm11,2 > + xor ebx,ecx > + add ebp,eax > + psrld xmm3,30 > + add edx,DWORD[56+rsp] > + and esi,ebx > + xor ebx,ecx > + por xmm11,xmm3 > + ror eax,7 > + mov edi,ebp > + xor esi,ebx > + rol ebp,5 > + pshufd xmm12,xmm10,238 > + add edx,esi > + movups xmm1,XMMWORD[((-16))+r15] > +DB 102,15,56,220,208 > + xor edi,eax > + xor eax,ebx > + add edx,ebp > + add ecx,DWORD[60+rsp] > + and edi,eax > + xor eax,ebx > + ror ebp,7 > + mov esi,edx > + xor edi,eax > + rol edx,5 > + add ecx,edi > + xor esi,ebp > + xor ebp,eax > + add ecx,edx > + pxor xmm4,xmm8 > + add ebx,DWORD[rsp] > + and esi,ebp > + xor ebp,eax > + ror edx,7 > + movups xmm0,XMMWORD[r15] > +DB 102,15,56,220,209 > + punpcklqdq xmm12,xmm11 > + mov edi,ecx > + xor esi,ebp > + pxor xmm4,xmm5 > + rol ecx,5 > + add ebx,esi > + movdqa xmm3,xmm13 > + xor edi,edx > + paddd xmm13,xmm11 > + xor edx,ebp > + pxor xmm4,xmm12 > + add ebx,ecx > + add eax,DWORD[4+rsp] > + and edi,edx > + xor edx,ebp > + ror ecx,7 > + movdqa xmm12,xmm4 > + mov esi,ebx > + xor edi,edx > + movdqa XMMWORD[48+rsp],xmm13 > + rol ebx,5 > + add eax,edi > + xor esi,ecx > + pslld xmm4,2 > + xor ecx,edx > + add eax,ebx > + psrld xmm12,30 > + add ebp,DWORD[8+rsp] > + movups xmm1,XMMWORD[16+r15] > +DB 102,15,56,220,208 > + and esi,ecx > + xor ecx,edx > + por xmm4,xmm12 > + ror ebx,7 > + mov edi,eax > + xor esi,ecx > + rol eax,5 > + pshufd xmm13,xmm11,238 > + add ebp,esi > + xor edi,ebx > + xor ebx,ecx > + add ebp,eax > + add edx,DWORD[12+rsp] > + and edi,ebx > + xor ebx,ecx > + ror eax,7 > + mov esi,ebp > + xor edi,ebx > + rol ebp,5 > + add edx,edi > + movups xmm0,XMMWORD[32+r15] > +DB 102,15,56,220,209 > + xor esi,eax > + xor eax,ebx > + add edx,ebp > + pxor xmm5,xmm9 > + add ecx,DWORD[16+rsp] > + and esi,eax > + xor eax,ebx > + ror ebp,7 > + punpcklqdq xmm13,xmm4 > + mov edi,edx > + xor esi,eax > + pxor xmm5,xmm6 > + rol edx,5 > + add ecx,esi > + movdqa xmm12,xmm3 > + xor edi,ebp > + paddd xmm3,xmm4 > + xor ebp,eax > + pxor xmm5,xmm13 > + add ecx,edx > + 
add ebx,DWORD[20+rsp] > + and edi,ebp > + xor ebp,eax > + ror edx,7 > + movups xmm1,XMMWORD[48+r15] > +DB 102,15,56,220,208 > + movdqa xmm13,xmm5 > + mov esi,ecx > + xor edi,ebp > + movdqa XMMWORD[rsp],xmm3 > + rol ecx,5 > + add ebx,edi > + xor esi,edx > + pslld xmm5,2 > + xor edx,ebp > + add ebx,ecx > + psrld xmm13,30 > + add eax,DWORD[24+rsp] > + and esi,edx > + xor edx,ebp > + por xmm5,xmm13 > + ror ecx,7 > + mov edi,ebx > + xor esi,edx > + rol ebx,5 > + pshufd xmm3,xmm4,238 > + add eax,esi > + xor edi,ecx > + xor ecx,edx > + add eax,ebx > + add ebp,DWORD[28+rsp] > + cmp r8d,11 > + jb NEAR $L$aesenclast3 > + movups xmm0,XMMWORD[64+r15] > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[80+r15] > +DB 102,15,56,220,208 > + je NEAR $L$aesenclast3 > + movups xmm0,XMMWORD[96+r15] > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[112+r15] > +DB 102,15,56,220,208 > +$L$aesenclast3: > +DB 102,15,56,221,209 > + movups xmm0,XMMWORD[((16-112))+r15] > + and edi,ecx > + xor ecx,edx > + ror ebx,7 > + mov esi,eax > + xor edi,ecx > + rol eax,5 > + add ebp,edi > + xor esi,ebx > + xor ebx,ecx > + add ebp,eax > + pxor xmm6,xmm10 > + add edx,DWORD[32+rsp] > + and esi,ebx > + xor ebx,ecx > + ror eax,7 > + punpcklqdq xmm3,xmm5 > + mov edi,ebp > + xor esi,ebx > + pxor xmm6,xmm7 > + rol ebp,5 > + add edx,esi > + movups xmm14,XMMWORD[48+r12] > + xorps xmm14,xmm15 > + movups XMMWORD[32+r12*1+r13],xmm2 > + xorps xmm2,xmm14 > + movups xmm1,XMMWORD[((-80))+r15] > +DB 102,15,56,220,208 > + movdqa xmm13,xmm12 > + xor edi,eax > + paddd xmm12,xmm5 > + xor eax,ebx > + pxor xmm6,xmm3 > + add edx,ebp > + add ecx,DWORD[36+rsp] > + and edi,eax > + xor eax,ebx > + ror ebp,7 > + movdqa xmm3,xmm6 > + mov esi,edx > + xor edi,eax > + movdqa XMMWORD[16+rsp],xmm12 > + rol edx,5 > + add ecx,edi > + xor esi,ebp > + pslld xmm6,2 > + xor ebp,eax > + add ecx,edx > + psrld xmm3,30 > + add ebx,DWORD[40+rsp] > + and esi,ebp > + xor ebp,eax > + por xmm6,xmm3 > + ror edx,7 > + movups xmm0,XMMWORD[((-64))+r15] > +DB 102,15,56,220,209 > + mov edi,ecx > + xor esi,ebp > + rol ecx,5 > + pshufd xmm12,xmm5,238 > + add ebx,esi > + xor edi,edx > + xor edx,ebp > + add ebx,ecx > + add eax,DWORD[44+rsp] > + and edi,edx > + xor edx,ebp > + ror ecx,7 > + mov esi,ebx > + xor edi,edx > + rol ebx,5 > + add eax,edi > + xor esi,edx > + add eax,ebx > + pxor xmm7,xmm11 > + add ebp,DWORD[48+rsp] > + movups xmm1,XMMWORD[((-48))+r15] > +DB 102,15,56,220,208 > + xor esi,ecx > + punpcklqdq xmm12,xmm6 > + mov edi,eax > + rol eax,5 > + pxor xmm7,xmm8 > + add ebp,esi > + xor edi,ecx > + movdqa xmm3,xmm13 > + ror ebx,7 > + paddd xmm13,xmm6 > + add ebp,eax > + pxor xmm7,xmm12 > + add edx,DWORD[52+rsp] > + xor edi,ebx > + mov esi,ebp > + rol ebp,5 > + movdqa xmm12,xmm7 > + add edx,edi > + xor esi,ebx > + movdqa XMMWORD[32+rsp],xmm13 > + ror eax,7 > + add edx,ebp > + add ecx,DWORD[56+rsp] > + pslld xmm7,2 > + xor esi,eax > + mov edi,edx > + psrld xmm12,30 > + rol edx,5 > + add ecx,esi > + movups xmm0,XMMWORD[((-32))+r15] > +DB 102,15,56,220,209 > + xor edi,eax > + ror ebp,7 > + por xmm7,xmm12 > + add ecx,edx > + add ebx,DWORD[60+rsp] > + xor edi,ebp > + mov esi,ecx > + rol ecx,5 > + add ebx,edi > + xor esi,ebp > + ror edx,7 > + add ebx,ecx > + add eax,DWORD[rsp] > + xor esi,edx > + mov edi,ebx > + rol ebx,5 > + paddd xmm3,xmm7 > + add eax,esi > + xor edi,edx > + movdqa XMMWORD[48+rsp],xmm3 > + ror ecx,7 > + add eax,ebx > + add ebp,DWORD[4+rsp] > + movups xmm1,XMMWORD[((-16))+r15] > +DB 102,15,56,220,208 > + xor edi,ecx > + mov esi,eax > + rol eax,5 > + add ebp,edi > + xor 
esi,ecx > + ror ebx,7 > + add ebp,eax > + add edx,DWORD[8+rsp] > + xor esi,ebx > + mov edi,ebp > + rol ebp,5 > + add edx,esi > + xor edi,ebx > + ror eax,7 > + add edx,ebp > + add ecx,DWORD[12+rsp] > + xor edi,eax > + mov esi,edx > + rol edx,5 > + add ecx,edi > + movups xmm0,XMMWORD[r15] > +DB 102,15,56,220,209 > + xor esi,eax > + ror ebp,7 > + add ecx,edx > + cmp r10,r14 > + je NEAR $L$done_ssse3 > + movdqa xmm3,XMMWORD[64+r11] > + movdqa xmm13,XMMWORD[r11] > + movdqu xmm4,XMMWORD[r10] > + movdqu xmm5,XMMWORD[16+r10] > + movdqu xmm6,XMMWORD[32+r10] > + movdqu xmm7,XMMWORD[48+r10] > +DB 102,15,56,0,227 > + add r10,64 > + add ebx,DWORD[16+rsp] > + xor esi,ebp > + mov edi,ecx > +DB 102,15,56,0,235 > + rol ecx,5 > + add ebx,esi > + xor edi,ebp > + ror edx,7 > + paddd xmm4,xmm13 > + add ebx,ecx > + add eax,DWORD[20+rsp] > + xor edi,edx > + mov esi,ebx > + movdqa XMMWORD[rsp],xmm4 > + rol ebx,5 > + add eax,edi > + xor esi,edx > + ror ecx,7 > + psubd xmm4,xmm13 > + add eax,ebx > + add ebp,DWORD[24+rsp] > + movups xmm1,XMMWORD[16+r15] > +DB 102,15,56,220,208 > + xor esi,ecx > + mov edi,eax > + rol eax,5 > + add ebp,esi > + xor edi,ecx > + ror ebx,7 > + add ebp,eax > + add edx,DWORD[28+rsp] > + xor edi,ebx > + mov esi,ebp > + rol ebp,5 > + add edx,edi > + xor esi,ebx > + ror eax,7 > + add edx,ebp > + add ecx,DWORD[32+rsp] > + xor esi,eax > + mov edi,edx > +DB 102,15,56,0,243 > + rol edx,5 > + add ecx,esi > + movups xmm0,XMMWORD[32+r15] > +DB 102,15,56,220,209 > + xor edi,eax > + ror ebp,7 > + paddd xmm5,xmm13 > + add ecx,edx > + add ebx,DWORD[36+rsp] > + xor edi,ebp > + mov esi,ecx > + movdqa XMMWORD[16+rsp],xmm5 > + rol ecx,5 > + add ebx,edi > + xor esi,ebp > + ror edx,7 > + psubd xmm5,xmm13 > + add ebx,ecx > + add eax,DWORD[40+rsp] > + xor esi,edx > + mov edi,ebx > + rol ebx,5 > + add eax,esi > + xor edi,edx > + ror ecx,7 > + add eax,ebx > + add ebp,DWORD[44+rsp] > + movups xmm1,XMMWORD[48+r15] > +DB 102,15,56,220,208 > + xor edi,ecx > + mov esi,eax > + rol eax,5 > + add ebp,edi > + xor esi,ecx > + ror ebx,7 > + add ebp,eax > + add edx,DWORD[48+rsp] > + xor esi,ebx > + mov edi,ebp > +DB 102,15,56,0,251 > + rol ebp,5 > + add edx,esi > + xor edi,ebx > + ror eax,7 > + paddd xmm6,xmm13 > + add edx,ebp > + add ecx,DWORD[52+rsp] > + xor edi,eax > + mov esi,edx > + movdqa XMMWORD[32+rsp],xmm6 > + rol edx,5 > + add ecx,edi > + cmp r8d,11 > + jb NEAR $L$aesenclast4 > + movups xmm0,XMMWORD[64+r15] > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[80+r15] > +DB 102,15,56,220,208 > + je NEAR $L$aesenclast4 > + movups xmm0,XMMWORD[96+r15] > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[112+r15] > +DB 102,15,56,220,208 > +$L$aesenclast4: > +DB 102,15,56,221,209 > + movups xmm0,XMMWORD[((16-112))+r15] > + xor esi,eax > + ror ebp,7 > + psubd xmm6,xmm13 > + add ecx,edx > + add ebx,DWORD[56+rsp] > + xor esi,ebp > + mov edi,ecx > + rol ecx,5 > + add ebx,esi > + xor edi,ebp > + ror edx,7 > + add ebx,ecx > + add eax,DWORD[60+rsp] > + xor edi,edx > + mov esi,ebx > + rol ebx,5 > + add eax,edi > + ror ecx,7 > + add eax,ebx > + movups XMMWORD[48+r12*1+r13],xmm2 > + lea r12,[64+r12] > + > + add eax,DWORD[r9] > + add esi,DWORD[4+r9] > + add ecx,DWORD[8+r9] > + add edx,DWORD[12+r9] > + mov DWORD[r9],eax > + add ebp,DWORD[16+r9] > + mov DWORD[4+r9],esi > + mov ebx,esi > + mov DWORD[8+r9],ecx > + mov edi,ecx > + mov DWORD[12+r9],edx > + xor edi,edx > + mov DWORD[16+r9],ebp > + and esi,edi > + jmp NEAR $L$oop_ssse3 > + > +$L$done_ssse3: > + add ebx,DWORD[16+rsp] > + xor esi,ebp > + mov edi,ecx > + rol ecx,5 > + add ebx,esi > + 
xor edi,ebp > + ror edx,7 > + add ebx,ecx > + add eax,DWORD[20+rsp] > + xor edi,edx > + mov esi,ebx > + rol ebx,5 > + add eax,edi > + xor esi,edx > + ror ecx,7 > + add eax,ebx > + add ebp,DWORD[24+rsp] > + movups xmm1,XMMWORD[16+r15] > +DB 102,15,56,220,208 > + xor esi,ecx > + mov edi,eax > + rol eax,5 > + add ebp,esi > + xor edi,ecx > + ror ebx,7 > + add ebp,eax > + add edx,DWORD[28+rsp] > + xor edi,ebx > + mov esi,ebp > + rol ebp,5 > + add edx,edi > + xor esi,ebx > + ror eax,7 > + add edx,ebp > + add ecx,DWORD[32+rsp] > + xor esi,eax > + mov edi,edx > + rol edx,5 > + add ecx,esi > + movups xmm0,XMMWORD[32+r15] > +DB 102,15,56,220,209 > + xor edi,eax > + ror ebp,7 > + add ecx,edx > + add ebx,DWORD[36+rsp] > + xor edi,ebp > + mov esi,ecx > + rol ecx,5 > + add ebx,edi > + xor esi,ebp > + ror edx,7 > + add ebx,ecx > + add eax,DWORD[40+rsp] > + xor esi,edx > + mov edi,ebx > + rol ebx,5 > + add eax,esi > + xor edi,edx > + ror ecx,7 > + add eax,ebx > + add ebp,DWORD[44+rsp] > + movups xmm1,XMMWORD[48+r15] > +DB 102,15,56,220,208 > + xor edi,ecx > + mov esi,eax > + rol eax,5 > + add ebp,edi > + xor esi,ecx > + ror ebx,7 > + add ebp,eax > + add edx,DWORD[48+rsp] > + xor esi,ebx > + mov edi,ebp > + rol ebp,5 > + add edx,esi > + xor edi,ebx > + ror eax,7 > + add edx,ebp > + add ecx,DWORD[52+rsp] > + xor edi,eax > + mov esi,edx > + rol edx,5 > + add ecx,edi > + cmp r8d,11 > + jb NEAR $L$aesenclast5 > + movups xmm0,XMMWORD[64+r15] > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[80+r15] > +DB 102,15,56,220,208 > + je NEAR $L$aesenclast5 > + movups xmm0,XMMWORD[96+r15] > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[112+r15] > +DB 102,15,56,220,208 > +$L$aesenclast5: > +DB 102,15,56,221,209 > + movups xmm0,XMMWORD[((16-112))+r15] > + xor esi,eax > + ror ebp,7 > + add ecx,edx > + add ebx,DWORD[56+rsp] > + xor esi,ebp > + mov edi,ecx > + rol ecx,5 > + add ebx,esi > + xor edi,ebp > + ror edx,7 > + add ebx,ecx > + add eax,DWORD[60+rsp] > + xor edi,edx > + mov esi,ebx > + rol ebx,5 > + add eax,edi > + ror ecx,7 > + add eax,ebx > + movups XMMWORD[48+r12*1+r13],xmm2 > + mov r8,QWORD[88+rsp] > + > + add eax,DWORD[r9] > + add esi,DWORD[4+r9] > + add ecx,DWORD[8+r9] > + mov DWORD[r9],eax > + add edx,DWORD[12+r9] > + mov DWORD[4+r9],esi > + add ebp,DWORD[16+r9] > + mov DWORD[8+r9],ecx > + mov DWORD[12+r9],edx > + mov DWORD[16+r9],ebp > + movups XMMWORD[r8],xmm2 > + movaps xmm6,XMMWORD[((96+0))+rsp] > + movaps xmm7,XMMWORD[((96+16))+rsp] > + movaps xmm8,XMMWORD[((96+32))+rsp] > + movaps xmm9,XMMWORD[((96+48))+rsp] > + movaps xmm10,XMMWORD[((96+64))+rsp] > + movaps xmm11,XMMWORD[((96+80))+rsp] > + movaps xmm12,XMMWORD[((96+96))+rsp] > + movaps xmm13,XMMWORD[((96+112))+rsp] > + movaps xmm14,XMMWORD[((96+128))+rsp] > + movaps xmm15,XMMWORD[((96+144))+rsp] > + lea rsi,[264+rsp] > + > + mov r15,QWORD[rsi] > + > + mov r14,QWORD[8+rsi] > + > + mov r13,QWORD[16+rsi] > + > + mov r12,QWORD[24+rsi] > + > + mov rbp,QWORD[32+rsi] > + > + mov rbx,QWORD[40+rsi] > + > + lea rsp,[48+rsi] > + > +$L$epilogue_ssse3: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_aesni_cbc_sha1_enc_ssse3: > +ALIGN 64 > +K_XX_XX: > + DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999 > + DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > + DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > + DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > +DB 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0 > + > +DB 
65,69,83,78,73,45,67,66,67,43,83,72,65,49,32,115 > +DB 116,105,116,99,104,32,102,111,114,32,120,56,54,95,54,52 > +DB 44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32 > +DB 60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111 > +DB 114,103,62,0 > +ALIGN 64 > + > +ALIGN 32 > +aesni_cbc_sha1_enc_shaext: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_aesni_cbc_sha1_enc_shaext: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + mov rcx,r9 > + mov r8,QWORD[40+rsp] > + mov r9,QWORD[48+rsp] > + > + > + > + mov r10,QWORD[56+rsp] > + lea rsp,[((-168))+rsp] > + movaps XMMWORD[(-8-160)+rax],xmm6 > + movaps XMMWORD[(-8-144)+rax],xmm7 > + movaps XMMWORD[(-8-128)+rax],xmm8 > + movaps XMMWORD[(-8-112)+rax],xmm9 > + movaps XMMWORD[(-8-96)+rax],xmm10 > + movaps XMMWORD[(-8-80)+rax],xmm11 > + movaps XMMWORD[(-8-64)+rax],xmm12 > + movaps XMMWORD[(-8-48)+rax],xmm13 > + movaps XMMWORD[(-8-32)+rax],xmm14 > + movaps XMMWORD[(-8-16)+rax],xmm15 > +$L$prologue_shaext: > + movdqu xmm8,XMMWORD[r9] > + movd xmm9,DWORD[16+r9] > + movdqa xmm7,XMMWORD[((K_XX_XX+80))] > + > + mov r11d,DWORD[240+rcx] > + sub rsi,rdi > + movups xmm15,XMMWORD[rcx] > + movups xmm2,XMMWORD[r8] > + movups xmm0,XMMWORD[16+rcx] > + lea rcx,[112+rcx] > + > + pshufd xmm8,xmm8,27 > + pshufd xmm9,xmm9,27 > + jmp NEAR $L$oop_shaext > + > +ALIGN 16 > +$L$oop_shaext: > + movups xmm14,XMMWORD[rdi] > + xorps xmm14,xmm15 > + xorps xmm2,xmm14 > + movups xmm1,XMMWORD[((-80))+rcx] > +DB 102,15,56,220,208 > + movdqu xmm3,XMMWORD[r10] > + movdqa xmm12,xmm9 > +DB 102,15,56,0,223 > + movdqu xmm4,XMMWORD[16+r10] > + movdqa xmm11,xmm8 > + movups xmm0,XMMWORD[((-64))+rcx] > +DB 102,15,56,220,209 > +DB 102,15,56,0,231 > + > + paddd xmm9,xmm3 > + movdqu xmm5,XMMWORD[32+r10] > + lea r10,[64+r10] > + pxor xmm3,xmm12 > + movups xmm1,XMMWORD[((-48))+rcx] > +DB 102,15,56,220,208 > + pxor xmm3,xmm12 > + movdqa xmm10,xmm8 > +DB 102,15,56,0,239 > +DB 69,15,58,204,193,0 > +DB 68,15,56,200,212 > + movups xmm0,XMMWORD[((-32))+rcx] > +DB 102,15,56,220,209 > +DB 15,56,201,220 > + movdqu xmm6,XMMWORD[((-16))+r10] > + movdqa xmm9,xmm8 > +DB 102,15,56,0,247 > + movups xmm1,XMMWORD[((-16))+rcx] > +DB 102,15,56,220,208 > +DB 69,15,58,204,194,0 > +DB 68,15,56,200,205 > + pxor xmm3,xmm5 > +DB 15,56,201,229 > + movups xmm0,XMMWORD[rcx] > +DB 102,15,56,220,209 > + movdqa xmm10,xmm8 > +DB 69,15,58,204,193,0 > +DB 68,15,56,200,214 > + movups xmm1,XMMWORD[16+rcx] > +DB 102,15,56,220,208 > +DB 15,56,202,222 > + pxor xmm4,xmm6 > +DB 15,56,201,238 > + movups xmm0,XMMWORD[32+rcx] > +DB 102,15,56,220,209 > + movdqa xmm9,xmm8 > +DB 69,15,58,204,194,0 > +DB 68,15,56,200,203 > + movups xmm1,XMMWORD[48+rcx] > +DB 102,15,56,220,208 > +DB 15,56,202,227 > + pxor xmm5,xmm3 > +DB 15,56,201,243 > + cmp r11d,11 > + jb NEAR $L$aesenclast6 > + movups xmm0,XMMWORD[64+rcx] > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[80+rcx] > +DB 102,15,56,220,208 > + je NEAR $L$aesenclast6 > + movups xmm0,XMMWORD[96+rcx] > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[112+rcx] > +DB 102,15,56,220,208 > +$L$aesenclast6: > +DB 102,15,56,221,209 > + movups xmm0,XMMWORD[((16-112))+rcx] > + movdqa xmm10,xmm8 > +DB 69,15,58,204,193,0 > +DB 68,15,56,200,212 > + movups xmm14,XMMWORD[16+rdi] > + xorps xmm14,xmm15 > + movups XMMWORD[rdi*1+rsi],xmm2 > + xorps xmm2,xmm14 > + movups xmm1,XMMWORD[((-80))+rcx] > +DB 102,15,56,220,208 > +DB 15,56,202,236 > + pxor xmm6,xmm4 > +DB 15,56,201,220 > + movups xmm0,XMMWORD[((-64))+rcx] > +DB 102,15,56,220,209 > + movdqa xmm9,xmm8 > +DB 
69,15,58,204,194,1 > +DB 68,15,56,200,205 > + movups xmm1,XMMWORD[((-48))+rcx] > +DB 102,15,56,220,208 > +DB 15,56,202,245 > + pxor xmm3,xmm5 > +DB 15,56,201,229 > + movups xmm0,XMMWORD[((-32))+rcx] > +DB 102,15,56,220,209 > + movdqa xmm10,xmm8 > +DB 69,15,58,204,193,1 > +DB 68,15,56,200,214 > + movups xmm1,XMMWORD[((-16))+rcx] > +DB 102,15,56,220,208 > +DB 15,56,202,222 > + pxor xmm4,xmm6 > +DB 15,56,201,238 > + movups xmm0,XMMWORD[rcx] > +DB 102,15,56,220,209 > + movdqa xmm9,xmm8 > +DB 69,15,58,204,194,1 > +DB 68,15,56,200,203 > + movups xmm1,XMMWORD[16+rcx] > +DB 102,15,56,220,208 > +DB 15,56,202,227 > + pxor xmm5,xmm3 > +DB 15,56,201,243 > + movups xmm0,XMMWORD[32+rcx] > +DB 102,15,56,220,209 > + movdqa xmm10,xmm8 > +DB 69,15,58,204,193,1 > +DB 68,15,56,200,212 > + movups xmm1,XMMWORD[48+rcx] > +DB 102,15,56,220,208 > +DB 15,56,202,236 > + pxor xmm6,xmm4 > +DB 15,56,201,220 > + cmp r11d,11 > + jb NEAR $L$aesenclast7 > + movups xmm0,XMMWORD[64+rcx] > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[80+rcx] > +DB 102,15,56,220,208 > + je NEAR $L$aesenclast7 > + movups xmm0,XMMWORD[96+rcx] > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[112+rcx] > +DB 102,15,56,220,208 > +$L$aesenclast7: > +DB 102,15,56,221,209 > + movups xmm0,XMMWORD[((16-112))+rcx] > + movdqa xmm9,xmm8 > +DB 69,15,58,204,194,1 > +DB 68,15,56,200,205 > + movups xmm14,XMMWORD[32+rdi] > + xorps xmm14,xmm15 > + movups XMMWORD[16+rdi*1+rsi],xmm2 > + xorps xmm2,xmm14 > + movups xmm1,XMMWORD[((-80))+rcx] > +DB 102,15,56,220,208 > +DB 15,56,202,245 > + pxor xmm3,xmm5 > +DB 15,56,201,229 > + movups xmm0,XMMWORD[((-64))+rcx] > +DB 102,15,56,220,209 > + movdqa xmm10,xmm8 > +DB 69,15,58,204,193,2 > +DB 68,15,56,200,214 > + movups xmm1,XMMWORD[((-48))+rcx] > +DB 102,15,56,220,208 > +DB 15,56,202,222 > + pxor xmm4,xmm6 > +DB 15,56,201,238 > + movups xmm0,XMMWORD[((-32))+rcx] > +DB 102,15,56,220,209 > + movdqa xmm9,xmm8 > +DB 69,15,58,204,194,2 > +DB 68,15,56,200,203 > + movups xmm1,XMMWORD[((-16))+rcx] > +DB 102,15,56,220,208 > +DB 15,56,202,227 > + pxor xmm5,xmm3 > +DB 15,56,201,243 > + movups xmm0,XMMWORD[rcx] > +DB 102,15,56,220,209 > + movdqa xmm10,xmm8 > +DB 69,15,58,204,193,2 > +DB 68,15,56,200,212 > + movups xmm1,XMMWORD[16+rcx] > +DB 102,15,56,220,208 > +DB 15,56,202,236 > + pxor xmm6,xmm4 > +DB 15,56,201,220 > + movups xmm0,XMMWORD[32+rcx] > +DB 102,15,56,220,209 > + movdqa xmm9,xmm8 > +DB 69,15,58,204,194,2 > +DB 68,15,56,200,205 > + movups xmm1,XMMWORD[48+rcx] > +DB 102,15,56,220,208 > +DB 15,56,202,245 > + pxor xmm3,xmm5 > +DB 15,56,201,229 > + cmp r11d,11 > + jb NEAR $L$aesenclast8 > + movups xmm0,XMMWORD[64+rcx] > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[80+rcx] > +DB 102,15,56,220,208 > + je NEAR $L$aesenclast8 > + movups xmm0,XMMWORD[96+rcx] > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[112+rcx] > +DB 102,15,56,220,208 > +$L$aesenclast8: > +DB 102,15,56,221,209 > + movups xmm0,XMMWORD[((16-112))+rcx] > + movdqa xmm10,xmm8 > +DB 69,15,58,204,193,2 > +DB 68,15,56,200,214 > + movups xmm14,XMMWORD[48+rdi] > + xorps xmm14,xmm15 > + movups XMMWORD[32+rdi*1+rsi],xmm2 > + xorps xmm2,xmm14 > + movups xmm1,XMMWORD[((-80))+rcx] > +DB 102,15,56,220,208 > +DB 15,56,202,222 > + pxor xmm4,xmm6 > +DB 15,56,201,238 > + movups xmm0,XMMWORD[((-64))+rcx] > +DB 102,15,56,220,209 > + movdqa xmm9,xmm8 > +DB 69,15,58,204,194,3 > +DB 68,15,56,200,203 > + movups xmm1,XMMWORD[((-48))+rcx] > +DB 102,15,56,220,208 > +DB 15,56,202,227 > + pxor xmm5,xmm3 > +DB 15,56,201,243 > + movups xmm0,XMMWORD[((-32))+rcx] > +DB 102,15,56,220,209 > + movdqa 
xmm10,xmm8 > +DB 69,15,58,204,193,3 > +DB 68,15,56,200,212 > +DB 15,56,202,236 > + pxor xmm6,xmm4 > + movups xmm1,XMMWORD[((-16))+rcx] > +DB 102,15,56,220,208 > + movdqa xmm9,xmm8 > +DB 69,15,58,204,194,3 > +DB 68,15,56,200,205 > +DB 15,56,202,245 > + movups xmm0,XMMWORD[rcx] > +DB 102,15,56,220,209 > + movdqa xmm5,xmm12 > + movdqa xmm10,xmm8 > +DB 69,15,58,204,193,3 > +DB 68,15,56,200,214 > + movups xmm1,XMMWORD[16+rcx] > +DB 102,15,56,220,208 > + movdqa xmm9,xmm8 > +DB 69,15,58,204,194,3 > +DB 68,15,56,200,205 > + movups xmm0,XMMWORD[32+rcx] > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[48+rcx] > +DB 102,15,56,220,208 > + cmp r11d,11 > + jb NEAR $L$aesenclast9 > + movups xmm0,XMMWORD[64+rcx] > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[80+rcx] > +DB 102,15,56,220,208 > + je NEAR $L$aesenclast9 > + movups xmm0,XMMWORD[96+rcx] > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[112+rcx] > +DB 102,15,56,220,208 > +$L$aesenclast9: > +DB 102,15,56,221,209 > + movups xmm0,XMMWORD[((16-112))+rcx] > + dec rdx > + > + paddd xmm8,xmm11 > + movups XMMWORD[48+rdi*1+rsi],xmm2 > + lea rdi,[64+rdi] > + jnz NEAR $L$oop_shaext > + > + pshufd xmm8,xmm8,27 > + pshufd xmm9,xmm9,27 > + movups XMMWORD[r8],xmm2 > + movdqu XMMWORD[r9],xmm8 > + movd DWORD[16+r9],xmm9 > + movaps xmm6,XMMWORD[((-8-160))+rax] > + movaps xmm7,XMMWORD[((-8-144))+rax] > + movaps xmm8,XMMWORD[((-8-128))+rax] > + movaps xmm9,XMMWORD[((-8-112))+rax] > + movaps xmm10,XMMWORD[((-8-96))+rax] > + movaps xmm11,XMMWORD[((-8-80))+rax] > + movaps xmm12,XMMWORD[((-8-64))+rax] > + movaps xmm13,XMMWORD[((-8-48))+rax] > + movaps xmm14,XMMWORD[((-8-32))+rax] > + movaps xmm15,XMMWORD[((-8-16))+rax] > + mov rsp,rax > +$L$epilogue_shaext: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_aesni_cbc_sha1_enc_shaext: > +EXTERN __imp_RtlVirtualUnwind > + > +ALIGN 16 > +ssse3_handler: > + push rsi > + push rdi > + push rbx > + push rbp > + push r12 > + push r13 > + push r14 > + push r15 > + pushfq > + sub rsp,64 > + > + mov rax,QWORD[120+r8] > + mov rbx,QWORD[248+r8] > + > + mov rsi,QWORD[8+r9] > + mov r11,QWORD[56+r9] > + > + mov r10d,DWORD[r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jb NEAR $L$common_seh_tail > + > + mov rax,QWORD[152+r8] > + > + mov r10d,DWORD[4+r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jae NEAR $L$common_seh_tail > + lea r10,[aesni_cbc_sha1_enc_shaext] > + cmp rbx,r10 > + jb NEAR $L$seh_no_shaext > + > + lea rsi,[rax] > + lea rdi,[512+r8] > + mov ecx,20 > + DD 0xa548f3fc > + lea rax,[168+rax] > + jmp NEAR $L$common_seh_tail > +$L$seh_no_shaext: > + lea rsi,[96+rax] > + lea rdi,[512+r8] > + mov ecx,20 > + DD 0xa548f3fc > + lea rax,[264+rax] > + > + mov r15,QWORD[rax] > + mov r14,QWORD[8+rax] > + mov r13,QWORD[16+rax] > + mov r12,QWORD[24+rax] > + mov rbp,QWORD[32+rax] > + mov rbx,QWORD[40+rax] > + lea rax,[48+rax] > + mov QWORD[144+r8],rbx > + mov QWORD[160+r8],rbp > + mov QWORD[216+r8],r12 > + mov QWORD[224+r8],r13 > + mov QWORD[232+r8],r14 > + mov QWORD[240+r8],r15 > + > +$L$common_seh_tail: > + mov rdi,QWORD[8+rax] > + mov rsi,QWORD[16+rax] > + mov QWORD[152+r8],rax > + mov QWORD[168+r8],rsi > + mov QWORD[176+r8],rdi > + > + mov rdi,QWORD[40+r9] > + mov rsi,r8 > + mov ecx,154 > + DD 0xa548f3fc > + > + mov rsi,r9 > + xor rcx,rcx > + mov rdx,QWORD[8+rsi] > + mov r8,QWORD[rsi] > + mov r9,QWORD[16+rsi] > + mov r10,QWORD[40+rsi] > + lea r11,[56+rsi] > + lea r12,[24+rsi] > + mov QWORD[32+rsp],r10 > + mov QWORD[40+rsp],r11 > + mov QWORD[48+rsp],r12 > + mov 
QWORD[56+rsp],rcx > + call QWORD[__imp_RtlVirtualUnwind] > + > + mov eax,1 > + add rsp,64 > + popfq > + pop r15 > + pop r14 > + pop r13 > + pop r12 > + pop rbp > + pop rbx > + pop rdi > + pop rsi > + DB 0F3h,0C3h ;repret > + > + > +section .pdata rdata align=4 > +ALIGN 4 > + DD $L$SEH_begin_aesni_cbc_sha1_enc_ssse3 wrt ..imagebase > + DD $L$SEH_end_aesni_cbc_sha1_enc_ssse3 wrt ..imagebase > + DD $L$SEH_info_aesni_cbc_sha1_enc_ssse3 wrt ..imagebase > + DD $L$SEH_begin_aesni_cbc_sha1_enc_shaext wrt ..imagebase > + DD $L$SEH_end_aesni_cbc_sha1_enc_shaext wrt ..imagebase > + DD $L$SEH_info_aesni_cbc_sha1_enc_shaext wrt ..imagebase > +section .xdata rdata align=8 > +ALIGN 8 > +$L$SEH_info_aesni_cbc_sha1_enc_ssse3: > +DB 9,0,0,0 > + DD ssse3_handler wrt ..imagebase > + DD $L$prologue_ssse3 wrt ..imagebase,$L$epilogue_ssse3 > wrt ..imagebase > +$L$SEH_info_aesni_cbc_sha1_enc_shaext: > +DB 9,0,0,0 > + DD ssse3_handler wrt ..imagebase > + DD $L$prologue_shaext wrt ..imagebase,$L$epilogue_shaext > wrt ..imagebase > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256- > x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256- > x86_64.nasm > new file mode 100644 > index 0000000000..f5c250b904 > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256- > x86_64.nasm > @@ -0,0 +1,78 @@ > +; WARNING: do not edit! > +; Generated from openssl/crypto/aes/asm/aesni-sha256-x86_64.pl > +; > +; Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. > +; > +; Licensed under the OpenSSL license (the "License"). You may not use > +; this file except in compliance with the License. You can obtain a copy > +; in the file LICENSE in the source distribution or at > +; https://www.openssl.org/source/license.html > + > +default rel > +%define XMMWORD > +%define YMMWORD > +%define ZMMWORD > +section .text code align=64 > + > + > +EXTERN OPENSSL_ia32cap_P > +global aesni_cbc_sha256_enc > + > +ALIGN 16 > +aesni_cbc_sha256_enc: > + > + xor eax,eax > + cmp rcx,0 > + je NEAR $L$probe > + ud2 > +$L$probe: > + DB 0F3h,0C3h ;repret > + > + > + > +ALIGN 64 > + > +K256: > + DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 > + DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 > + DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 > + DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 > + DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 > + DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 > + DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 > + DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 > + DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc > + DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc > + DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da > + DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da > + DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 > + DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 > + DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 > + DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 > + DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 > + DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 > + DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 > + DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 > + DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 > + DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 > + DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 > + DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 > + DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 > + DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 > + DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 > + DD 
0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 > + DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 > + DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 > + DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 > + DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 > + > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > + DD 0,0,0,0,0,0,0,0,-1,-1,-1,-1 > + DD 0,0,0,0,0,0,0,0 > +DB 65,69,83,78,73,45,67,66,67,43,83,72,65,50,53,54 > +DB 32,115,116,105,116,99,104,32,102,111,114,32,120,56,54,95 > +DB 54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98 > +DB 121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108 > +DB 46,111,114,103,62,0 > +ALIGN 64 > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm > b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm > new file mode 100644 > index 0000000000..57ee23ea8c > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm > @@ -0,0 +1,5103 @@ > +; WARNING: do not edit! > +; Generated from openssl/crypto/aes/asm/aesni-x86_64.pl > +; > +; Copyright 2009-2020 The OpenSSL Project Authors. All Rights Reserved. > +; > +; Licensed under the OpenSSL license (the "License"). You may not use > +; this file except in compliance with the License. You can obtain a copy > +; in the file LICENSE in the source distribution or at > +; https://www.openssl.org/source/license.html > + > +default rel > +%define XMMWORD > +%define YMMWORD > +%define ZMMWORD > +section .text code align=64 > + > +EXTERN OPENSSL_ia32cap_P > +global aesni_encrypt > + > +ALIGN 16 > +aesni_encrypt: > + > + movups xmm2,XMMWORD[rcx] > + mov eax,DWORD[240+r8] > + movups xmm0,XMMWORD[r8] > + movups xmm1,XMMWORD[16+r8] > + lea r8,[32+r8] > + xorps xmm2,xmm0 > +$L$oop_enc1_1: > +DB 102,15,56,220,209 > + dec eax > + movups xmm1,XMMWORD[r8] > + lea r8,[16+r8] > + jnz NEAR $L$oop_enc1_1 > +DB 102,15,56,221,209 > + pxor xmm0,xmm0 > + pxor xmm1,xmm1 > + movups XMMWORD[rdx],xmm2 > + pxor xmm2,xmm2 > + DB 0F3h,0C3h ;repret > + > + > + > +global aesni_decrypt > + > +ALIGN 16 > +aesni_decrypt: > + > + movups xmm2,XMMWORD[rcx] > + mov eax,DWORD[240+r8] > + movups xmm0,XMMWORD[r8] > + movups xmm1,XMMWORD[16+r8] > + lea r8,[32+r8] > + xorps xmm2,xmm0 > +$L$oop_dec1_2: > +DB 102,15,56,222,209 > + dec eax > + movups xmm1,XMMWORD[r8] > + lea r8,[16+r8] > + jnz NEAR $L$oop_dec1_2 > +DB 102,15,56,223,209 > + pxor xmm0,xmm0 > + pxor xmm1,xmm1 > + movups XMMWORD[rdx],xmm2 > + pxor xmm2,xmm2 > + DB 0F3h,0C3h ;repret > + > + > + > +ALIGN 16 > +_aesni_encrypt2: > + > + movups xmm0,XMMWORD[rcx] > + shl eax,4 > + movups xmm1,XMMWORD[16+rcx] > + xorps xmm2,xmm0 > + xorps xmm3,xmm0 > + movups xmm0,XMMWORD[32+rcx] > + lea rcx,[32+rax*1+rcx] > + neg rax > + add rax,16 > + > +$L$enc_loop2: > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > + movups xmm1,XMMWORD[rax*1+rcx] > + add rax,32 > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > + jnz NEAR $L$enc_loop2 > + > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,221,208 > +DB 102,15,56,221,216 > + DB 0F3h,0C3h ;repret > + > + > + > +ALIGN 16 > +_aesni_decrypt2: > + > + movups xmm0,XMMWORD[rcx] > + shl eax,4 > + movups xmm1,XMMWORD[16+rcx] > + xorps xmm2,xmm0 > + xorps xmm3,xmm0 > + movups xmm0,XMMWORD[32+rcx] > + lea rcx,[32+rax*1+rcx] > + neg rax > + add rax,16 > + > +$L$dec_loop2: > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > + movups xmm1,XMMWORD[rax*1+rcx] > + add rax,32 > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > + 
movups xmm0,XMMWORD[((-16))+rax*1+rcx] > + jnz NEAR $L$dec_loop2 > + > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,223,208 > +DB 102,15,56,223,216 > + DB 0F3h,0C3h ;repret > + > + > + > +ALIGN 16 > +_aesni_encrypt3: > + > + movups xmm0,XMMWORD[rcx] > + shl eax,4 > + movups xmm1,XMMWORD[16+rcx] > + xorps xmm2,xmm0 > + xorps xmm3,xmm0 > + xorps xmm4,xmm0 > + movups xmm0,XMMWORD[32+rcx] > + lea rcx,[32+rax*1+rcx] > + neg rax > + add rax,16 > + > +$L$enc_loop3: > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > + movups xmm1,XMMWORD[rax*1+rcx] > + add rax,32 > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > +DB 102,15,56,220,224 > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > + jnz NEAR $L$enc_loop3 > + > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +DB 102,15,56,221,208 > +DB 102,15,56,221,216 > +DB 102,15,56,221,224 > + DB 0F3h,0C3h ;repret > + > + > + > +ALIGN 16 > +_aesni_decrypt3: > + > + movups xmm0,XMMWORD[rcx] > + shl eax,4 > + movups xmm1,XMMWORD[16+rcx] > + xorps xmm2,xmm0 > + xorps xmm3,xmm0 > + xorps xmm4,xmm0 > + movups xmm0,XMMWORD[32+rcx] > + lea rcx,[32+rax*1+rcx] > + neg rax > + add rax,16 > + > +$L$dec_loop3: > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > + movups xmm1,XMMWORD[rax*1+rcx] > + add rax,32 > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > + jnz NEAR $L$dec_loop3 > + > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,223,208 > +DB 102,15,56,223,216 > +DB 102,15,56,223,224 > + DB 0F3h,0C3h ;repret > + > + > + > +ALIGN 16 > +_aesni_encrypt4: > + > + movups xmm0,XMMWORD[rcx] > + shl eax,4 > + movups xmm1,XMMWORD[16+rcx] > + xorps xmm2,xmm0 > + xorps xmm3,xmm0 > + xorps xmm4,xmm0 > + xorps xmm5,xmm0 > + movups xmm0,XMMWORD[32+rcx] > + lea rcx,[32+rax*1+rcx] > + neg rax > +DB 0x0f,0x1f,0x00 > + add rax,16 > + > +$L$enc_loop4: > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > + movups xmm1,XMMWORD[rax*1+rcx] > + add rax,32 > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > +DB 102,15,56,220,224 > +DB 102,15,56,220,232 > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > + jnz NEAR $L$enc_loop4 > + > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > +DB 102,15,56,221,208 > +DB 102,15,56,221,216 > +DB 102,15,56,221,224 > +DB 102,15,56,221,232 > + DB 0F3h,0C3h ;repret > + > + > + > +ALIGN 16 > +_aesni_decrypt4: > + > + movups xmm0,XMMWORD[rcx] > + shl eax,4 > + movups xmm1,XMMWORD[16+rcx] > + xorps xmm2,xmm0 > + xorps xmm3,xmm0 > + xorps xmm4,xmm0 > + xorps xmm5,xmm0 > + movups xmm0,XMMWORD[32+rcx] > + lea rcx,[32+rax*1+rcx] > + neg rax > +DB 0x0f,0x1f,0x00 > + add rax,16 > + > +$L$dec_loop4: > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > + movups xmm1,XMMWORD[rax*1+rcx] > + add rax,32 > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > + jnz NEAR $L$dec_loop4 > + > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > +DB 102,15,56,223,208 > +DB 102,15,56,223,216 > +DB 102,15,56,223,224 > +DB 102,15,56,223,232 > + DB 0F3h,0C3h ;repret > + > + > + > +ALIGN 16 > +_aesni_encrypt6: > + > + movups xmm0,XMMWORD[rcx] > + shl eax,4 > + movups xmm1,XMMWORD[16+rcx] > + xorps xmm2,xmm0 > + pxor xmm3,xmm0 > + pxor xmm4,xmm0 > +DB 
102,15,56,220,209 > + lea rcx,[32+rax*1+rcx] > + neg rax > +DB 102,15,56,220,217 > + pxor xmm5,xmm0 > + pxor xmm6,xmm0 > +DB 102,15,56,220,225 > + pxor xmm7,xmm0 > + movups xmm0,XMMWORD[rax*1+rcx] > + add rax,16 > + jmp NEAR $L$enc_loop6_enter > +ALIGN 16 > +$L$enc_loop6: > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +$L$enc_loop6_enter: > +DB 102,15,56,220,233 > +DB 102,15,56,220,241 > +DB 102,15,56,220,249 > + movups xmm1,XMMWORD[rax*1+rcx] > + add rax,32 > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > +DB 102,15,56,220,224 > +DB 102,15,56,220,232 > +DB 102,15,56,220,240 > +DB 102,15,56,220,248 > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > + jnz NEAR $L$enc_loop6 > + > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > +DB 102,15,56,220,241 > +DB 102,15,56,220,249 > +DB 102,15,56,221,208 > +DB 102,15,56,221,216 > +DB 102,15,56,221,224 > +DB 102,15,56,221,232 > +DB 102,15,56,221,240 > +DB 102,15,56,221,248 > + DB 0F3h,0C3h ;repret > + > + > + > +ALIGN 16 > +_aesni_decrypt6: > + > + movups xmm0,XMMWORD[rcx] > + shl eax,4 > + movups xmm1,XMMWORD[16+rcx] > + xorps xmm2,xmm0 > + pxor xmm3,xmm0 > + pxor xmm4,xmm0 > +DB 102,15,56,222,209 > + lea rcx,[32+rax*1+rcx] > + neg rax > +DB 102,15,56,222,217 > + pxor xmm5,xmm0 > + pxor xmm6,xmm0 > +DB 102,15,56,222,225 > + pxor xmm7,xmm0 > + movups xmm0,XMMWORD[rax*1+rcx] > + add rax,16 > + jmp NEAR $L$dec_loop6_enter > +ALIGN 16 > +$L$dec_loop6: > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +$L$dec_loop6_enter: > +DB 102,15,56,222,233 > +DB 102,15,56,222,241 > +DB 102,15,56,222,249 > + movups xmm1,XMMWORD[rax*1+rcx] > + add rax,32 > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > +DB 102,15,56,222,240 > +DB 102,15,56,222,248 > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > + jnz NEAR $L$dec_loop6 > + > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > +DB 102,15,56,222,241 > +DB 102,15,56,222,249 > +DB 102,15,56,223,208 > +DB 102,15,56,223,216 > +DB 102,15,56,223,224 > +DB 102,15,56,223,232 > +DB 102,15,56,223,240 > +DB 102,15,56,223,248 > + DB 0F3h,0C3h ;repret > + > + > + > +ALIGN 16 > +_aesni_encrypt8: > + > + movups xmm0,XMMWORD[rcx] > + shl eax,4 > + movups xmm1,XMMWORD[16+rcx] > + xorps xmm2,xmm0 > + xorps xmm3,xmm0 > + pxor xmm4,xmm0 > + pxor xmm5,xmm0 > + pxor xmm6,xmm0 > + lea rcx,[32+rax*1+rcx] > + neg rax > +DB 102,15,56,220,209 > + pxor xmm7,xmm0 > + pxor xmm8,xmm0 > +DB 102,15,56,220,217 > + pxor xmm9,xmm0 > + movups xmm0,XMMWORD[rax*1+rcx] > + add rax,16 > + jmp NEAR $L$enc_loop8_inner > +ALIGN 16 > +$L$enc_loop8: > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +$L$enc_loop8_inner: > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > +DB 102,15,56,220,241 > +DB 102,15,56,220,249 > +DB 102,68,15,56,220,193 > +DB 102,68,15,56,220,201 > +$L$enc_loop8_enter: > + movups xmm1,XMMWORD[rax*1+rcx] > + add rax,32 > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > +DB 102,15,56,220,224 > +DB 102,15,56,220,232 > +DB 102,15,56,220,240 > +DB 102,15,56,220,248 > +DB 102,68,15,56,220,192 > +DB 102,68,15,56,220,200 > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > + jnz NEAR $L$enc_loop8 > + > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > +DB 102,15,56,220,241 > +DB 102,15,56,220,249 > +DB 102,68,15,56,220,193 > +DB 102,68,15,56,220,201 > +DB 102,15,56,221,208 > +DB 102,15,56,221,216 > +DB 102,15,56,221,224 > +DB 
102,15,56,221,232 > +DB 102,15,56,221,240 > +DB 102,15,56,221,248 > +DB 102,68,15,56,221,192 > +DB 102,68,15,56,221,200 > + DB 0F3h,0C3h ;repret > + > + > + > +ALIGN 16 > +_aesni_decrypt8: > + > + movups xmm0,XMMWORD[rcx] > + shl eax,4 > + movups xmm1,XMMWORD[16+rcx] > + xorps xmm2,xmm0 > + xorps xmm3,xmm0 > + pxor xmm4,xmm0 > + pxor xmm5,xmm0 > + pxor xmm6,xmm0 > + lea rcx,[32+rax*1+rcx] > + neg rax > +DB 102,15,56,222,209 > + pxor xmm7,xmm0 > + pxor xmm8,xmm0 > +DB 102,15,56,222,217 > + pxor xmm9,xmm0 > + movups xmm0,XMMWORD[rax*1+rcx] > + add rax,16 > + jmp NEAR $L$dec_loop8_inner > +ALIGN 16 > +$L$dec_loop8: > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +$L$dec_loop8_inner: > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > +DB 102,15,56,222,241 > +DB 102,15,56,222,249 > +DB 102,68,15,56,222,193 > +DB 102,68,15,56,222,201 > +$L$dec_loop8_enter: > + movups xmm1,XMMWORD[rax*1+rcx] > + add rax,32 > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > +DB 102,15,56,222,240 > +DB 102,15,56,222,248 > +DB 102,68,15,56,222,192 > +DB 102,68,15,56,222,200 > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > + jnz NEAR $L$dec_loop8 > + > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > +DB 102,15,56,222,241 > +DB 102,15,56,222,249 > +DB 102,68,15,56,222,193 > +DB 102,68,15,56,222,201 > +DB 102,15,56,223,208 > +DB 102,15,56,223,216 > +DB 102,15,56,223,224 > +DB 102,15,56,223,232 > +DB 102,15,56,223,240 > +DB 102,15,56,223,248 > +DB 102,68,15,56,223,192 > +DB 102,68,15,56,223,200 > + DB 0F3h,0C3h ;repret > + > + > +global aesni_ecb_encrypt > + > +ALIGN 16 > +aesni_ecb_encrypt: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_aesni_ecb_encrypt: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + mov rcx,r9 > + mov r8,QWORD[40+rsp] > + > + > + > + lea rsp,[((-88))+rsp] > + movaps XMMWORD[rsp],xmm6 > + movaps XMMWORD[16+rsp],xmm7 > + movaps XMMWORD[32+rsp],xmm8 > + movaps XMMWORD[48+rsp],xmm9 > +$L$ecb_enc_body: > + and rdx,-16 > + jz NEAR $L$ecb_ret > + > + mov eax,DWORD[240+rcx] > + movups xmm0,XMMWORD[rcx] > + mov r11,rcx > + mov r10d,eax > + test r8d,r8d > + jz NEAR $L$ecb_decrypt > + > + cmp rdx,0x80 > + jb NEAR $L$ecb_enc_tail > + > + movdqu xmm2,XMMWORD[rdi] > + movdqu xmm3,XMMWORD[16+rdi] > + movdqu xmm4,XMMWORD[32+rdi] > + movdqu xmm5,XMMWORD[48+rdi] > + movdqu xmm6,XMMWORD[64+rdi] > + movdqu xmm7,XMMWORD[80+rdi] > + movdqu xmm8,XMMWORD[96+rdi] > + movdqu xmm9,XMMWORD[112+rdi] > + lea rdi,[128+rdi] > + sub rdx,0x80 > + jmp NEAR $L$ecb_enc_loop8_enter > +ALIGN 16 > +$L$ecb_enc_loop8: > + movups XMMWORD[rsi],xmm2 > + mov rcx,r11 > + movdqu xmm2,XMMWORD[rdi] > + mov eax,r10d > + movups XMMWORD[16+rsi],xmm3 > + movdqu xmm3,XMMWORD[16+rdi] > + movups XMMWORD[32+rsi],xmm4 > + movdqu xmm4,XMMWORD[32+rdi] > + movups XMMWORD[48+rsi],xmm5 > + movdqu xmm5,XMMWORD[48+rdi] > + movups XMMWORD[64+rsi],xmm6 > + movdqu xmm6,XMMWORD[64+rdi] > + movups XMMWORD[80+rsi],xmm7 > + movdqu xmm7,XMMWORD[80+rdi] > + movups XMMWORD[96+rsi],xmm8 > + movdqu xmm8,XMMWORD[96+rdi] > + movups XMMWORD[112+rsi],xmm9 > + lea rsi,[128+rsi] > + movdqu xmm9,XMMWORD[112+rdi] > + lea rdi,[128+rdi] > +$L$ecb_enc_loop8_enter: > + > + call _aesni_encrypt8 > + > + sub rdx,0x80 > + jnc NEAR $L$ecb_enc_loop8 > + > + movups XMMWORD[rsi],xmm2 > + mov rcx,r11 > + movups XMMWORD[16+rsi],xmm3 > + mov eax,r10d > + movups XMMWORD[32+rsi],xmm4 > + movups XMMWORD[48+rsi],xmm5 > + movups XMMWORD[64+rsi],xmm6 > + 
movups XMMWORD[80+rsi],xmm7 > + movups XMMWORD[96+rsi],xmm8 > + movups XMMWORD[112+rsi],xmm9 > + lea rsi,[128+rsi] > + add rdx,0x80 > + jz NEAR $L$ecb_ret > + > +$L$ecb_enc_tail: > + movups xmm2,XMMWORD[rdi] > + cmp rdx,0x20 > + jb NEAR $L$ecb_enc_one > + movups xmm3,XMMWORD[16+rdi] > + je NEAR $L$ecb_enc_two > + movups xmm4,XMMWORD[32+rdi] > + cmp rdx,0x40 > + jb NEAR $L$ecb_enc_three > + movups xmm5,XMMWORD[48+rdi] > + je NEAR $L$ecb_enc_four > + movups xmm6,XMMWORD[64+rdi] > + cmp rdx,0x60 > + jb NEAR $L$ecb_enc_five > + movups xmm7,XMMWORD[80+rdi] > + je NEAR $L$ecb_enc_six > + movdqu xmm8,XMMWORD[96+rdi] > + xorps xmm9,xmm9 > + call _aesni_encrypt8 > + movups XMMWORD[rsi],xmm2 > + movups XMMWORD[16+rsi],xmm3 > + movups XMMWORD[32+rsi],xmm4 > + movups XMMWORD[48+rsi],xmm5 > + movups XMMWORD[64+rsi],xmm6 > + movups XMMWORD[80+rsi],xmm7 > + movups XMMWORD[96+rsi],xmm8 > + jmp NEAR $L$ecb_ret > +ALIGN 16 > +$L$ecb_enc_one: > + movups xmm0,XMMWORD[rcx] > + movups xmm1,XMMWORD[16+rcx] > + lea rcx,[32+rcx] > + xorps xmm2,xmm0 > +$L$oop_enc1_3: > +DB 102,15,56,220,209 > + dec eax > + movups xmm1,XMMWORD[rcx] > + lea rcx,[16+rcx] > + jnz NEAR $L$oop_enc1_3 > +DB 102,15,56,221,209 > + movups XMMWORD[rsi],xmm2 > + jmp NEAR $L$ecb_ret > +ALIGN 16 > +$L$ecb_enc_two: > + call _aesni_encrypt2 > + movups XMMWORD[rsi],xmm2 > + movups XMMWORD[16+rsi],xmm3 > + jmp NEAR $L$ecb_ret > +ALIGN 16 > +$L$ecb_enc_three: > + call _aesni_encrypt3 > + movups XMMWORD[rsi],xmm2 > + movups XMMWORD[16+rsi],xmm3 > + movups XMMWORD[32+rsi],xmm4 > + jmp NEAR $L$ecb_ret > +ALIGN 16 > +$L$ecb_enc_four: > + call _aesni_encrypt4 > + movups XMMWORD[rsi],xmm2 > + movups XMMWORD[16+rsi],xmm3 > + movups XMMWORD[32+rsi],xmm4 > + movups XMMWORD[48+rsi],xmm5 > + jmp NEAR $L$ecb_ret > +ALIGN 16 > +$L$ecb_enc_five: > + xorps xmm7,xmm7 > + call _aesni_encrypt6 > + movups XMMWORD[rsi],xmm2 > + movups XMMWORD[16+rsi],xmm3 > + movups XMMWORD[32+rsi],xmm4 > + movups XMMWORD[48+rsi],xmm5 > + movups XMMWORD[64+rsi],xmm6 > + jmp NEAR $L$ecb_ret > +ALIGN 16 > +$L$ecb_enc_six: > + call _aesni_encrypt6 > + movups XMMWORD[rsi],xmm2 > + movups XMMWORD[16+rsi],xmm3 > + movups XMMWORD[32+rsi],xmm4 > + movups XMMWORD[48+rsi],xmm5 > + movups XMMWORD[64+rsi],xmm6 > + movups XMMWORD[80+rsi],xmm7 > + jmp NEAR $L$ecb_ret > + > +ALIGN 16 > +$L$ecb_decrypt: > + cmp rdx,0x80 > + jb NEAR $L$ecb_dec_tail > + > + movdqu xmm2,XMMWORD[rdi] > + movdqu xmm3,XMMWORD[16+rdi] > + movdqu xmm4,XMMWORD[32+rdi] > + movdqu xmm5,XMMWORD[48+rdi] > + movdqu xmm6,XMMWORD[64+rdi] > + movdqu xmm7,XMMWORD[80+rdi] > + movdqu xmm8,XMMWORD[96+rdi] > + movdqu xmm9,XMMWORD[112+rdi] > + lea rdi,[128+rdi] > + sub rdx,0x80 > + jmp NEAR $L$ecb_dec_loop8_enter > +ALIGN 16 > +$L$ecb_dec_loop8: > + movups XMMWORD[rsi],xmm2 > + mov rcx,r11 > + movdqu xmm2,XMMWORD[rdi] > + mov eax,r10d > + movups XMMWORD[16+rsi],xmm3 > + movdqu xmm3,XMMWORD[16+rdi] > + movups XMMWORD[32+rsi],xmm4 > + movdqu xmm4,XMMWORD[32+rdi] > + movups XMMWORD[48+rsi],xmm5 > + movdqu xmm5,XMMWORD[48+rdi] > + movups XMMWORD[64+rsi],xmm6 > + movdqu xmm6,XMMWORD[64+rdi] > + movups XMMWORD[80+rsi],xmm7 > + movdqu xmm7,XMMWORD[80+rdi] > + movups XMMWORD[96+rsi],xmm8 > + movdqu xmm8,XMMWORD[96+rdi] > + movups XMMWORD[112+rsi],xmm9 > + lea rsi,[128+rsi] > + movdqu xmm9,XMMWORD[112+rdi] > + lea rdi,[128+rdi] > +$L$ecb_dec_loop8_enter: > + > + call _aesni_decrypt8 > + > + movups xmm0,XMMWORD[r11] > + sub rdx,0x80 > + jnc NEAR $L$ecb_dec_loop8 > + > + movups XMMWORD[rsi],xmm2 > + pxor xmm2,xmm2 > + mov rcx,r11 > + movups 
XMMWORD[16+rsi],xmm3 > + pxor xmm3,xmm3 > + mov eax,r10d > + movups XMMWORD[32+rsi],xmm4 > + pxor xmm4,xmm4 > + movups XMMWORD[48+rsi],xmm5 > + pxor xmm5,xmm5 > + movups XMMWORD[64+rsi],xmm6 > + pxor xmm6,xmm6 > + movups XMMWORD[80+rsi],xmm7 > + pxor xmm7,xmm7 > + movups XMMWORD[96+rsi],xmm8 > + pxor xmm8,xmm8 > + movups XMMWORD[112+rsi],xmm9 > + pxor xmm9,xmm9 > + lea rsi,[128+rsi] > + add rdx,0x80 > + jz NEAR $L$ecb_ret > + > +$L$ecb_dec_tail: > + movups xmm2,XMMWORD[rdi] > + cmp rdx,0x20 > + jb NEAR $L$ecb_dec_one > + movups xmm3,XMMWORD[16+rdi] > + je NEAR $L$ecb_dec_two > + movups xmm4,XMMWORD[32+rdi] > + cmp rdx,0x40 > + jb NEAR $L$ecb_dec_three > + movups xmm5,XMMWORD[48+rdi] > + je NEAR $L$ecb_dec_four > + movups xmm6,XMMWORD[64+rdi] > + cmp rdx,0x60 > + jb NEAR $L$ecb_dec_five > + movups xmm7,XMMWORD[80+rdi] > + je NEAR $L$ecb_dec_six > + movups xmm8,XMMWORD[96+rdi] > + movups xmm0,XMMWORD[rcx] > + xorps xmm9,xmm9 > + call _aesni_decrypt8 > + movups XMMWORD[rsi],xmm2 > + pxor xmm2,xmm2 > + movups XMMWORD[16+rsi],xmm3 > + pxor xmm3,xmm3 > + movups XMMWORD[32+rsi],xmm4 > + pxor xmm4,xmm4 > + movups XMMWORD[48+rsi],xmm5 > + pxor xmm5,xmm5 > + movups XMMWORD[64+rsi],xmm6 > + pxor xmm6,xmm6 > + movups XMMWORD[80+rsi],xmm7 > + pxor xmm7,xmm7 > + movups XMMWORD[96+rsi],xmm8 > + pxor xmm8,xmm8 > + pxor xmm9,xmm9 > + jmp NEAR $L$ecb_ret > +ALIGN 16 > +$L$ecb_dec_one: > + movups xmm0,XMMWORD[rcx] > + movups xmm1,XMMWORD[16+rcx] > + lea rcx,[32+rcx] > + xorps xmm2,xmm0 > +$L$oop_dec1_4: > +DB 102,15,56,222,209 > + dec eax > + movups xmm1,XMMWORD[rcx] > + lea rcx,[16+rcx] > + jnz NEAR $L$oop_dec1_4 > +DB 102,15,56,223,209 > + movups XMMWORD[rsi],xmm2 > + pxor xmm2,xmm2 > + jmp NEAR $L$ecb_ret > +ALIGN 16 > +$L$ecb_dec_two: > + call _aesni_decrypt2 > + movups XMMWORD[rsi],xmm2 > + pxor xmm2,xmm2 > + movups XMMWORD[16+rsi],xmm3 > + pxor xmm3,xmm3 > + jmp NEAR $L$ecb_ret > +ALIGN 16 > +$L$ecb_dec_three: > + call _aesni_decrypt3 > + movups XMMWORD[rsi],xmm2 > + pxor xmm2,xmm2 > + movups XMMWORD[16+rsi],xmm3 > + pxor xmm3,xmm3 > + movups XMMWORD[32+rsi],xmm4 > + pxor xmm4,xmm4 > + jmp NEAR $L$ecb_ret > +ALIGN 16 > +$L$ecb_dec_four: > + call _aesni_decrypt4 > + movups XMMWORD[rsi],xmm2 > + pxor xmm2,xmm2 > + movups XMMWORD[16+rsi],xmm3 > + pxor xmm3,xmm3 > + movups XMMWORD[32+rsi],xmm4 > + pxor xmm4,xmm4 > + movups XMMWORD[48+rsi],xmm5 > + pxor xmm5,xmm5 > + jmp NEAR $L$ecb_ret > +ALIGN 16 > +$L$ecb_dec_five: > + xorps xmm7,xmm7 > + call _aesni_decrypt6 > + movups XMMWORD[rsi],xmm2 > + pxor xmm2,xmm2 > + movups XMMWORD[16+rsi],xmm3 > + pxor xmm3,xmm3 > + movups XMMWORD[32+rsi],xmm4 > + pxor xmm4,xmm4 > + movups XMMWORD[48+rsi],xmm5 > + pxor xmm5,xmm5 > + movups XMMWORD[64+rsi],xmm6 > + pxor xmm6,xmm6 > + pxor xmm7,xmm7 > + jmp NEAR $L$ecb_ret > +ALIGN 16 > +$L$ecb_dec_six: > + call _aesni_decrypt6 > + movups XMMWORD[rsi],xmm2 > + pxor xmm2,xmm2 > + movups XMMWORD[16+rsi],xmm3 > + pxor xmm3,xmm3 > + movups XMMWORD[32+rsi],xmm4 > + pxor xmm4,xmm4 > + movups XMMWORD[48+rsi],xmm5 > + pxor xmm5,xmm5 > + movups XMMWORD[64+rsi],xmm6 > + pxor xmm6,xmm6 > + movups XMMWORD[80+rsi],xmm7 > + pxor xmm7,xmm7 > + > +$L$ecb_ret: > + xorps xmm0,xmm0 > + pxor xmm1,xmm1 > + movaps xmm6,XMMWORD[rsp] > + movaps XMMWORD[rsp],xmm0 > + movaps xmm7,XMMWORD[16+rsp] > + movaps XMMWORD[16+rsp],xmm0 > + movaps xmm8,XMMWORD[32+rsp] > + movaps XMMWORD[32+rsp],xmm0 > + movaps xmm9,XMMWORD[48+rsp] > + movaps XMMWORD[48+rsp],xmm0 > + lea rsp,[88+rsp] > +$L$ecb_enc_ret: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov 
rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_aesni_ecb_encrypt: > +global aesni_ccm64_encrypt_blocks > + > +ALIGN 16 > +aesni_ccm64_encrypt_blocks: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_aesni_ccm64_encrypt_blocks: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + mov rcx,r9 > + mov r8,QWORD[40+rsp] > + mov r9,QWORD[48+rsp] > + > + > + > + lea rsp,[((-88))+rsp] > + movaps XMMWORD[rsp],xmm6 > + movaps XMMWORD[16+rsp],xmm7 > + movaps XMMWORD[32+rsp],xmm8 > + movaps XMMWORD[48+rsp],xmm9 > +$L$ccm64_enc_body: > + mov eax,DWORD[240+rcx] > + movdqu xmm6,XMMWORD[r8] > + movdqa xmm9,XMMWORD[$L$increment64] > + movdqa xmm7,XMMWORD[$L$bswap_mask] > + > + shl eax,4 > + mov r10d,16 > + lea r11,[rcx] > + movdqu xmm3,XMMWORD[r9] > + movdqa xmm2,xmm6 > + lea rcx,[32+rax*1+rcx] > +DB 102,15,56,0,247 > + sub r10,rax > + jmp NEAR $L$ccm64_enc_outer > +ALIGN 16 > +$L$ccm64_enc_outer: > + movups xmm0,XMMWORD[r11] > + mov rax,r10 > + movups xmm8,XMMWORD[rdi] > + > + xorps xmm2,xmm0 > + movups xmm1,XMMWORD[16+r11] > + xorps xmm0,xmm8 > + xorps xmm3,xmm0 > + movups xmm0,XMMWORD[32+r11] > + > +$L$ccm64_enc2_loop: > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > + movups xmm1,XMMWORD[rax*1+rcx] > + add rax,32 > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > + jnz NEAR $L$ccm64_enc2_loop > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > + paddq xmm6,xmm9 > + dec rdx > +DB 102,15,56,221,208 > +DB 102,15,56,221,216 > + > + lea rdi,[16+rdi] > + xorps xmm8,xmm2 > + movdqa xmm2,xmm6 > + movups XMMWORD[rsi],xmm8 > +DB 102,15,56,0,215 > + lea rsi,[16+rsi] > + jnz NEAR $L$ccm64_enc_outer > + > + pxor xmm0,xmm0 > + pxor xmm1,xmm1 > + pxor xmm2,xmm2 > + movups XMMWORD[r9],xmm3 > + pxor xmm3,xmm3 > + pxor xmm8,xmm8 > + pxor xmm6,xmm6 > + movaps xmm6,XMMWORD[rsp] > + movaps XMMWORD[rsp],xmm0 > + movaps xmm7,XMMWORD[16+rsp] > + movaps XMMWORD[16+rsp],xmm0 > + movaps xmm8,XMMWORD[32+rsp] > + movaps XMMWORD[32+rsp],xmm0 > + movaps xmm9,XMMWORD[48+rsp] > + movaps XMMWORD[48+rsp],xmm0 > + lea rsp,[88+rsp] > +$L$ccm64_enc_ret: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_aesni_ccm64_encrypt_blocks: > +global aesni_ccm64_decrypt_blocks > + > +ALIGN 16 > +aesni_ccm64_decrypt_blocks: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_aesni_ccm64_decrypt_blocks: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + mov rcx,r9 > + mov r8,QWORD[40+rsp] > + mov r9,QWORD[48+rsp] > + > + > + > + lea rsp,[((-88))+rsp] > + movaps XMMWORD[rsp],xmm6 > + movaps XMMWORD[16+rsp],xmm7 > + movaps XMMWORD[32+rsp],xmm8 > + movaps XMMWORD[48+rsp],xmm9 > +$L$ccm64_dec_body: > + mov eax,DWORD[240+rcx] > + movups xmm6,XMMWORD[r8] > + movdqu xmm3,XMMWORD[r9] > + movdqa xmm9,XMMWORD[$L$increment64] > + movdqa xmm7,XMMWORD[$L$bswap_mask] > + > + movaps xmm2,xmm6 > + mov r10d,eax > + mov r11,rcx > +DB 102,15,56,0,247 > + movups xmm0,XMMWORD[rcx] > + movups xmm1,XMMWORD[16+rcx] > + lea rcx,[32+rcx] > + xorps xmm2,xmm0 > +$L$oop_enc1_5: > +DB 102,15,56,220,209 > + dec eax > + movups xmm1,XMMWORD[rcx] > + lea rcx,[16+rcx] > + jnz NEAR $L$oop_enc1_5 > +DB 102,15,56,221,209 > + shl r10d,4 > + mov eax,16 > + movups xmm8,XMMWORD[rdi] > + paddq xmm6,xmm9 > + lea rdi,[16+rdi] > + sub rax,r10 > + lea rcx,[32+r10*1+r11] > + mov r10,rax > + jmp NEAR $L$ccm64_dec_outer > +ALIGN 16 > +$L$ccm64_dec_outer: > + xorps xmm8,xmm2 > + movdqa xmm2,xmm6 > 
+ movups XMMWORD[rsi],xmm8 > + lea rsi,[16+rsi] > +DB 102,15,56,0,215 > + > + sub rdx,1 > + jz NEAR $L$ccm64_dec_break > + > + movups xmm0,XMMWORD[r11] > + mov rax,r10 > + movups xmm1,XMMWORD[16+r11] > + xorps xmm8,xmm0 > + xorps xmm2,xmm0 > + xorps xmm3,xmm8 > + movups xmm0,XMMWORD[32+r11] > + jmp NEAR $L$ccm64_dec2_loop > +ALIGN 16 > +$L$ccm64_dec2_loop: > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > + movups xmm1,XMMWORD[rax*1+rcx] > + add rax,32 > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > + jnz NEAR $L$ccm64_dec2_loop > + movups xmm8,XMMWORD[rdi] > + paddq xmm6,xmm9 > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,221,208 > +DB 102,15,56,221,216 > + lea rdi,[16+rdi] > + jmp NEAR $L$ccm64_dec_outer > + > +ALIGN 16 > +$L$ccm64_dec_break: > + > + mov eax,DWORD[240+r11] > + movups xmm0,XMMWORD[r11] > + movups xmm1,XMMWORD[16+r11] > + xorps xmm8,xmm0 > + lea r11,[32+r11] > + xorps xmm3,xmm8 > +$L$oop_enc1_6: > +DB 102,15,56,220,217 > + dec eax > + movups xmm1,XMMWORD[r11] > + lea r11,[16+r11] > + jnz NEAR $L$oop_enc1_6 > +DB 102,15,56,221,217 > + pxor xmm0,xmm0 > + pxor xmm1,xmm1 > + pxor xmm2,xmm2 > + movups XMMWORD[r9],xmm3 > + pxor xmm3,xmm3 > + pxor xmm8,xmm8 > + pxor xmm6,xmm6 > + movaps xmm6,XMMWORD[rsp] > + movaps XMMWORD[rsp],xmm0 > + movaps xmm7,XMMWORD[16+rsp] > + movaps XMMWORD[16+rsp],xmm0 > + movaps xmm8,XMMWORD[32+rsp] > + movaps XMMWORD[32+rsp],xmm0 > + movaps xmm9,XMMWORD[48+rsp] > + movaps XMMWORD[48+rsp],xmm0 > + lea rsp,[88+rsp] > +$L$ccm64_dec_ret: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_aesni_ccm64_decrypt_blocks: > +global aesni_ctr32_encrypt_blocks > + > +ALIGN 16 > +aesni_ctr32_encrypt_blocks: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_aesni_ctr32_encrypt_blocks: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + mov rcx,r9 > + mov r8,QWORD[40+rsp] > + > + > + > + cmp rdx,1 > + jne NEAR $L$ctr32_bulk > + > + > + > + movups xmm2,XMMWORD[r8] > + movups xmm3,XMMWORD[rdi] > + mov edx,DWORD[240+rcx] > + movups xmm0,XMMWORD[rcx] > + movups xmm1,XMMWORD[16+rcx] > + lea rcx,[32+rcx] > + xorps xmm2,xmm0 > +$L$oop_enc1_7: > +DB 102,15,56,220,209 > + dec edx > + movups xmm1,XMMWORD[rcx] > + lea rcx,[16+rcx] > + jnz NEAR $L$oop_enc1_7 > +DB 102,15,56,221,209 > + pxor xmm0,xmm0 > + pxor xmm1,xmm1 > + xorps xmm2,xmm3 > + pxor xmm3,xmm3 > + movups XMMWORD[rsi],xmm2 > + xorps xmm2,xmm2 > + jmp NEAR $L$ctr32_epilogue > + > +ALIGN 16 > +$L$ctr32_bulk: > + lea r11,[rsp] > + > + push rbp > + > + sub rsp,288 > + and rsp,-16 > + movaps XMMWORD[(-168)+r11],xmm6 > + movaps XMMWORD[(-152)+r11],xmm7 > + movaps XMMWORD[(-136)+r11],xmm8 > + movaps XMMWORD[(-120)+r11],xmm9 > + movaps XMMWORD[(-104)+r11],xmm10 > + movaps XMMWORD[(-88)+r11],xmm11 > + movaps XMMWORD[(-72)+r11],xmm12 > + movaps XMMWORD[(-56)+r11],xmm13 > + movaps XMMWORD[(-40)+r11],xmm14 > + movaps XMMWORD[(-24)+r11],xmm15 > +$L$ctr32_body: > + > + > + > + > + movdqu xmm2,XMMWORD[r8] > + movdqu xmm0,XMMWORD[rcx] > + mov r8d,DWORD[12+r8] > + pxor xmm2,xmm0 > + mov ebp,DWORD[12+rcx] > + movdqa XMMWORD[rsp],xmm2 > + bswap r8d > + movdqa xmm3,xmm2 > + movdqa xmm4,xmm2 > + movdqa xmm5,xmm2 > + movdqa XMMWORD[64+rsp],xmm2 > + movdqa XMMWORD[80+rsp],xmm2 > + movdqa XMMWORD[96+rsp],xmm2 > + mov r10,rdx > + movdqa XMMWORD[112+rsp],xmm2 > + > + lea rax,[1+r8] > + lea rdx,[2+r8] > + bswap eax > + bswap edx > + xor eax,ebp > + xor edx,ebp > +DB 
102,15,58,34,216,3 > + lea rax,[3+r8] > + movdqa XMMWORD[16+rsp],xmm3 > +DB 102,15,58,34,226,3 > + bswap eax > + mov rdx,r10 > + lea r10,[4+r8] > + movdqa XMMWORD[32+rsp],xmm4 > + xor eax,ebp > + bswap r10d > +DB 102,15,58,34,232,3 > + xor r10d,ebp > + movdqa XMMWORD[48+rsp],xmm5 > + lea r9,[5+r8] > + mov DWORD[((64+12))+rsp],r10d > + bswap r9d > + lea r10,[6+r8] > + mov eax,DWORD[240+rcx] > + xor r9d,ebp > + bswap r10d > + mov DWORD[((80+12))+rsp],r9d > + xor r10d,ebp > + lea r9,[7+r8] > + mov DWORD[((96+12))+rsp],r10d > + bswap r9d > + mov r10d,DWORD[((OPENSSL_ia32cap_P+4))] > + xor r9d,ebp > + and r10d,71303168 > + mov DWORD[((112+12))+rsp],r9d > + > + movups xmm1,XMMWORD[16+rcx] > + > + movdqa xmm6,XMMWORD[64+rsp] > + movdqa xmm7,XMMWORD[80+rsp] > + > + cmp rdx,8 > + jb NEAR $L$ctr32_tail > + > + sub rdx,6 > + cmp r10d,4194304 > + je NEAR $L$ctr32_6x > + > + lea rcx,[128+rcx] > + sub rdx,2 > + jmp NEAR $L$ctr32_loop8 > + > +ALIGN 16 > +$L$ctr32_6x: > + shl eax,4 > + mov r10d,48 > + bswap ebp > + lea rcx,[32+rax*1+rcx] > + sub r10,rax > + jmp NEAR $L$ctr32_loop6 > + > +ALIGN 16 > +$L$ctr32_loop6: > + add r8d,6 > + movups xmm0,XMMWORD[((-48))+r10*1+rcx] > +DB 102,15,56,220,209 > + mov eax,r8d > + xor eax,ebp > +DB 102,15,56,220,217 > +DB 0x0f,0x38,0xf1,0x44,0x24,12 > + lea eax,[1+r8] > +DB 102,15,56,220,225 > + xor eax,ebp > +DB 0x0f,0x38,0xf1,0x44,0x24,28 > +DB 102,15,56,220,233 > + lea eax,[2+r8] > + xor eax,ebp > +DB 102,15,56,220,241 > +DB 0x0f,0x38,0xf1,0x44,0x24,44 > + lea eax,[3+r8] > +DB 102,15,56,220,249 > + movups xmm1,XMMWORD[((-32))+r10*1+rcx] > + xor eax,ebp > + > +DB 102,15,56,220,208 > +DB 0x0f,0x38,0xf1,0x44,0x24,60 > + lea eax,[4+r8] > +DB 102,15,56,220,216 > + xor eax,ebp > +DB 0x0f,0x38,0xf1,0x44,0x24,76 > +DB 102,15,56,220,224 > + lea eax,[5+r8] > + xor eax,ebp > +DB 102,15,56,220,232 > +DB 0x0f,0x38,0xf1,0x44,0x24,92 > + mov rax,r10 > +DB 102,15,56,220,240 > +DB 102,15,56,220,248 > + movups xmm0,XMMWORD[((-16))+r10*1+rcx] > + > + call $L$enc_loop6 > + > + movdqu xmm8,XMMWORD[rdi] > + movdqu xmm9,XMMWORD[16+rdi] > + movdqu xmm10,XMMWORD[32+rdi] > + movdqu xmm11,XMMWORD[48+rdi] > + movdqu xmm12,XMMWORD[64+rdi] > + movdqu xmm13,XMMWORD[80+rdi] > + lea rdi,[96+rdi] > + movups xmm1,XMMWORD[((-64))+r10*1+rcx] > + pxor xmm8,xmm2 > + movaps xmm2,XMMWORD[rsp] > + pxor xmm9,xmm3 > + movaps xmm3,XMMWORD[16+rsp] > + pxor xmm10,xmm4 > + movaps xmm4,XMMWORD[32+rsp] > + pxor xmm11,xmm5 > + movaps xmm5,XMMWORD[48+rsp] > + pxor xmm12,xmm6 > + movaps xmm6,XMMWORD[64+rsp] > + pxor xmm13,xmm7 > + movaps xmm7,XMMWORD[80+rsp] > + movdqu XMMWORD[rsi],xmm8 > + movdqu XMMWORD[16+rsi],xmm9 > + movdqu XMMWORD[32+rsi],xmm10 > + movdqu XMMWORD[48+rsi],xmm11 > + movdqu XMMWORD[64+rsi],xmm12 > + movdqu XMMWORD[80+rsi],xmm13 > + lea rsi,[96+rsi] > + > + sub rdx,6 > + jnc NEAR $L$ctr32_loop6 > + > + add rdx,6 > + jz NEAR $L$ctr32_done > + > + lea eax,[((-48))+r10] > + lea rcx,[((-80))+r10*1+rcx] > + neg eax > + shr eax,4 > + jmp NEAR $L$ctr32_tail > + > +ALIGN 32 > +$L$ctr32_loop8: > + add r8d,8 > + movdqa xmm8,XMMWORD[96+rsp] > +DB 102,15,56,220,209 > + mov r9d,r8d > + movdqa xmm9,XMMWORD[112+rsp] > +DB 102,15,56,220,217 > + bswap r9d > + movups xmm0,XMMWORD[((32-128))+rcx] > +DB 102,15,56,220,225 > + xor r9d,ebp > + nop > +DB 102,15,56,220,233 > + mov DWORD[((0+12))+rsp],r9d > + lea r9,[1+r8] > +DB 102,15,56,220,241 > +DB 102,15,56,220,249 > +DB 102,68,15,56,220,193 > +DB 102,68,15,56,220,201 > + movups xmm1,XMMWORD[((48-128))+rcx] > + bswap r9d > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > + 
xor r9d,ebp > +DB 0x66,0x90 > +DB 102,15,56,220,224 > +DB 102,15,56,220,232 > + mov DWORD[((16+12))+rsp],r9d > + lea r9,[2+r8] > +DB 102,15,56,220,240 > +DB 102,15,56,220,248 > +DB 102,68,15,56,220,192 > +DB 102,68,15,56,220,200 > + movups xmm0,XMMWORD[((64-128))+rcx] > + bswap r9d > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > + xor r9d,ebp > +DB 0x66,0x90 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > + mov DWORD[((32+12))+rsp],r9d > + lea r9,[3+r8] > +DB 102,15,56,220,241 > +DB 102,15,56,220,249 > +DB 102,68,15,56,220,193 > +DB 102,68,15,56,220,201 > + movups xmm1,XMMWORD[((80-128))+rcx] > + bswap r9d > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > + xor r9d,ebp > +DB 0x66,0x90 > +DB 102,15,56,220,224 > +DB 102,15,56,220,232 > + mov DWORD[((48+12))+rsp],r9d > + lea r9,[4+r8] > +DB 102,15,56,220,240 > +DB 102,15,56,220,248 > +DB 102,68,15,56,220,192 > +DB 102,68,15,56,220,200 > + movups xmm0,XMMWORD[((96-128))+rcx] > + bswap r9d > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > + xor r9d,ebp > +DB 0x66,0x90 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > + mov DWORD[((64+12))+rsp],r9d > + lea r9,[5+r8] > +DB 102,15,56,220,241 > +DB 102,15,56,220,249 > +DB 102,68,15,56,220,193 > +DB 102,68,15,56,220,201 > + movups xmm1,XMMWORD[((112-128))+rcx] > + bswap r9d > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > + xor r9d,ebp > +DB 0x66,0x90 > +DB 102,15,56,220,224 > +DB 102,15,56,220,232 > + mov DWORD[((80+12))+rsp],r9d > + lea r9,[6+r8] > +DB 102,15,56,220,240 > +DB 102,15,56,220,248 > +DB 102,68,15,56,220,192 > +DB 102,68,15,56,220,200 > + movups xmm0,XMMWORD[((128-128))+rcx] > + bswap r9d > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > + xor r9d,ebp > +DB 0x66,0x90 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > + mov DWORD[((96+12))+rsp],r9d > + lea r9,[7+r8] > +DB 102,15,56,220,241 > +DB 102,15,56,220,249 > +DB 102,68,15,56,220,193 > +DB 102,68,15,56,220,201 > + movups xmm1,XMMWORD[((144-128))+rcx] > + bswap r9d > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > +DB 102,15,56,220,224 > + xor r9d,ebp > + movdqu xmm10,XMMWORD[rdi] > +DB 102,15,56,220,232 > + mov DWORD[((112+12))+rsp],r9d > + cmp eax,11 > +DB 102,15,56,220,240 > +DB 102,15,56,220,248 > +DB 102,68,15,56,220,192 > +DB 102,68,15,56,220,200 > + movups xmm0,XMMWORD[((160-128))+rcx] > + > + jb NEAR $L$ctr32_enc_done > + > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > +DB 102,15,56,220,241 > +DB 102,15,56,220,249 > +DB 102,68,15,56,220,193 > +DB 102,68,15,56,220,201 > + movups xmm1,XMMWORD[((176-128))+rcx] > + > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > +DB 102,15,56,220,224 > +DB 102,15,56,220,232 > +DB 102,15,56,220,240 > +DB 102,15,56,220,248 > +DB 102,68,15,56,220,192 > +DB 102,68,15,56,220,200 > + movups xmm0,XMMWORD[((192-128))+rcx] > + je NEAR $L$ctr32_enc_done > + > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > +DB 102,15,56,220,241 > +DB 102,15,56,220,249 > +DB 102,68,15,56,220,193 > +DB 102,68,15,56,220,201 > + movups xmm1,XMMWORD[((208-128))+rcx] > + > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > +DB 102,15,56,220,224 > +DB 102,15,56,220,232 > +DB 102,15,56,220,240 > +DB 102,15,56,220,248 > +DB 102,68,15,56,220,192 > +DB 102,68,15,56,220,200 > + movups xmm0,XMMWORD[((224-128))+rcx] > + jmp NEAR $L$ctr32_enc_done > + > +ALIGN 16 > +$L$ctr32_enc_done: > + movdqu xmm11,XMMWORD[16+rdi] > + pxor xmm10,xmm0 > + movdqu xmm12,XMMWORD[32+rdi] > + pxor xmm11,xmm0 > + movdqu xmm13,XMMWORD[48+rdi] > + pxor 
xmm12,xmm0 > + movdqu xmm14,XMMWORD[64+rdi] > + pxor xmm13,xmm0 > + movdqu xmm15,XMMWORD[80+rdi] > + pxor xmm14,xmm0 > + pxor xmm15,xmm0 > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > +DB 102,15,56,220,241 > +DB 102,15,56,220,249 > +DB 102,68,15,56,220,193 > +DB 102,68,15,56,220,201 > + movdqu xmm1,XMMWORD[96+rdi] > + lea rdi,[128+rdi] > + > +DB 102,65,15,56,221,210 > + pxor xmm1,xmm0 > + movdqu xmm10,XMMWORD[((112-128))+rdi] > +DB 102,65,15,56,221,219 > + pxor xmm10,xmm0 > + movdqa xmm11,XMMWORD[rsp] > +DB 102,65,15,56,221,228 > +DB 102,65,15,56,221,237 > + movdqa xmm12,XMMWORD[16+rsp] > + movdqa xmm13,XMMWORD[32+rsp] > +DB 102,65,15,56,221,246 > +DB 102,65,15,56,221,255 > + movdqa xmm14,XMMWORD[48+rsp] > + movdqa xmm15,XMMWORD[64+rsp] > +DB 102,68,15,56,221,193 > + movdqa xmm0,XMMWORD[80+rsp] > + movups xmm1,XMMWORD[((16-128))+rcx] > +DB 102,69,15,56,221,202 > + > + movups XMMWORD[rsi],xmm2 > + movdqa xmm2,xmm11 > + movups XMMWORD[16+rsi],xmm3 > + movdqa xmm3,xmm12 > + movups XMMWORD[32+rsi],xmm4 > + movdqa xmm4,xmm13 > + movups XMMWORD[48+rsi],xmm5 > + movdqa xmm5,xmm14 > + movups XMMWORD[64+rsi],xmm6 > + movdqa xmm6,xmm15 > + movups XMMWORD[80+rsi],xmm7 > + movdqa xmm7,xmm0 > + movups XMMWORD[96+rsi],xmm8 > + movups XMMWORD[112+rsi],xmm9 > + lea rsi,[128+rsi] > + > + sub rdx,8 > + jnc NEAR $L$ctr32_loop8 > + > + add rdx,8 > + jz NEAR $L$ctr32_done > + lea rcx,[((-128))+rcx] > + > +$L$ctr32_tail: > + > + > + lea rcx,[16+rcx] > + cmp rdx,4 > + jb NEAR $L$ctr32_loop3 > + je NEAR $L$ctr32_loop4 > + > + > + shl eax,4 > + movdqa xmm8,XMMWORD[96+rsp] > + pxor xmm9,xmm9 > + > + movups xmm0,XMMWORD[16+rcx] > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > + lea rcx,[((32-16))+rax*1+rcx] > + neg rax > +DB 102,15,56,220,225 > + add rax,16 > + movups xmm10,XMMWORD[rdi] > +DB 102,15,56,220,233 > +DB 102,15,56,220,241 > + movups xmm11,XMMWORD[16+rdi] > + movups xmm12,XMMWORD[32+rdi] > +DB 102,15,56,220,249 > +DB 102,68,15,56,220,193 > + > + call $L$enc_loop8_enter > + > + movdqu xmm13,XMMWORD[48+rdi] > + pxor xmm2,xmm10 > + movdqu xmm10,XMMWORD[64+rdi] > + pxor xmm3,xmm11 > + movdqu XMMWORD[rsi],xmm2 > + pxor xmm4,xmm12 > + movdqu XMMWORD[16+rsi],xmm3 > + pxor xmm5,xmm13 > + movdqu XMMWORD[32+rsi],xmm4 > + pxor xmm6,xmm10 > + movdqu XMMWORD[48+rsi],xmm5 > + movdqu XMMWORD[64+rsi],xmm6 > + cmp rdx,6 > + jb NEAR $L$ctr32_done > + > + movups xmm11,XMMWORD[80+rdi] > + xorps xmm7,xmm11 > + movups XMMWORD[80+rsi],xmm7 > + je NEAR $L$ctr32_done > + > + movups xmm12,XMMWORD[96+rdi] > + xorps xmm8,xmm12 > + movups XMMWORD[96+rsi],xmm8 > + jmp NEAR $L$ctr32_done > + > +ALIGN 32 > +$L$ctr32_loop4: > +DB 102,15,56,220,209 > + lea rcx,[16+rcx] > + dec eax > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > + movups xmm1,XMMWORD[rcx] > + jnz NEAR $L$ctr32_loop4 > +DB 102,15,56,221,209 > +DB 102,15,56,221,217 > + movups xmm10,XMMWORD[rdi] > + movups xmm11,XMMWORD[16+rdi] > +DB 102,15,56,221,225 > +DB 102,15,56,221,233 > + movups xmm12,XMMWORD[32+rdi] > + movups xmm13,XMMWORD[48+rdi] > + > + xorps xmm2,xmm10 > + movups XMMWORD[rsi],xmm2 > + xorps xmm3,xmm11 > + movups XMMWORD[16+rsi],xmm3 > + pxor xmm4,xmm12 > + movdqu XMMWORD[32+rsi],xmm4 > + pxor xmm5,xmm13 > + movdqu XMMWORD[48+rsi],xmm5 > + jmp NEAR $L$ctr32_done > + > +ALIGN 32 > +$L$ctr32_loop3: > +DB 102,15,56,220,209 > + lea rcx,[16+rcx] > + dec eax > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > + movups xmm1,XMMWORD[rcx] > + jnz NEAR $L$ctr32_loop3 > +DB 102,15,56,221,209 > +DB 
102,15,56,221,217 > +DB 102,15,56,221,225 > + > + movups xmm10,XMMWORD[rdi] > + xorps xmm2,xmm10 > + movups XMMWORD[rsi],xmm2 > + cmp rdx,2 > + jb NEAR $L$ctr32_done > + > + movups xmm11,XMMWORD[16+rdi] > + xorps xmm3,xmm11 > + movups XMMWORD[16+rsi],xmm3 > + je NEAR $L$ctr32_done > + > + movups xmm12,XMMWORD[32+rdi] > + xorps xmm4,xmm12 > + movups XMMWORD[32+rsi],xmm4 > + > +$L$ctr32_done: > + xorps xmm0,xmm0 > + xor ebp,ebp > + pxor xmm1,xmm1 > + pxor xmm2,xmm2 > + pxor xmm3,xmm3 > + pxor xmm4,xmm4 > + pxor xmm5,xmm5 > + movaps xmm6,XMMWORD[((-168))+r11] > + movaps XMMWORD[(-168)+r11],xmm0 > + movaps xmm7,XMMWORD[((-152))+r11] > + movaps XMMWORD[(-152)+r11],xmm0 > + movaps xmm8,XMMWORD[((-136))+r11] > + movaps XMMWORD[(-136)+r11],xmm0 > + movaps xmm9,XMMWORD[((-120))+r11] > + movaps XMMWORD[(-120)+r11],xmm0 > + movaps xmm10,XMMWORD[((-104))+r11] > + movaps XMMWORD[(-104)+r11],xmm0 > + movaps xmm11,XMMWORD[((-88))+r11] > + movaps XMMWORD[(-88)+r11],xmm0 > + movaps xmm12,XMMWORD[((-72))+r11] > + movaps XMMWORD[(-72)+r11],xmm0 > + movaps xmm13,XMMWORD[((-56))+r11] > + movaps XMMWORD[(-56)+r11],xmm0 > + movaps xmm14,XMMWORD[((-40))+r11] > + movaps XMMWORD[(-40)+r11],xmm0 > + movaps xmm15,XMMWORD[((-24))+r11] > + movaps XMMWORD[(-24)+r11],xmm0 > + movaps XMMWORD[rsp],xmm0 > + movaps XMMWORD[16+rsp],xmm0 > + movaps XMMWORD[32+rsp],xmm0 > + movaps XMMWORD[48+rsp],xmm0 > + movaps XMMWORD[64+rsp],xmm0 > + movaps XMMWORD[80+rsp],xmm0 > + movaps XMMWORD[96+rsp],xmm0 > + movaps XMMWORD[112+rsp],xmm0 > + mov rbp,QWORD[((-8))+r11] > + > + lea rsp,[r11] > + > +$L$ctr32_epilogue: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_aesni_ctr32_encrypt_blocks: > +global aesni_xts_encrypt > + > +ALIGN 16 > +aesni_xts_encrypt: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_aesni_xts_encrypt: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + mov rcx,r9 > + mov r8,QWORD[40+rsp] > + mov r9,QWORD[48+rsp] > + > + > + > + lea r11,[rsp] > + > + push rbp > + > + sub rsp,272 > + and rsp,-16 > + movaps XMMWORD[(-168)+r11],xmm6 > + movaps XMMWORD[(-152)+r11],xmm7 > + movaps XMMWORD[(-136)+r11],xmm8 > + movaps XMMWORD[(-120)+r11],xmm9 > + movaps XMMWORD[(-104)+r11],xmm10 > + movaps XMMWORD[(-88)+r11],xmm11 > + movaps XMMWORD[(-72)+r11],xmm12 > + movaps XMMWORD[(-56)+r11],xmm13 > + movaps XMMWORD[(-40)+r11],xmm14 > + movaps XMMWORD[(-24)+r11],xmm15 > +$L$xts_enc_body: > + movups xmm2,XMMWORD[r9] > + mov eax,DWORD[240+r8] > + mov r10d,DWORD[240+rcx] > + movups xmm0,XMMWORD[r8] > + movups xmm1,XMMWORD[16+r8] > + lea r8,[32+r8] > + xorps xmm2,xmm0 > +$L$oop_enc1_8: > +DB 102,15,56,220,209 > + dec eax > + movups xmm1,XMMWORD[r8] > + lea r8,[16+r8] > + jnz NEAR $L$oop_enc1_8 > +DB 102,15,56,221,209 > + movups xmm0,XMMWORD[rcx] > + mov rbp,rcx > + mov eax,r10d > + shl r10d,4 > + mov r9,rdx > + and rdx,-16 > + > + movups xmm1,XMMWORD[16+r10*1+rcx] > + > + movdqa xmm8,XMMWORD[$L$xts_magic] > + movdqa xmm15,xmm2 > + pshufd xmm9,xmm2,0x5f > + pxor xmm1,xmm0 > + movdqa xmm14,xmm9 > + paddd xmm9,xmm9 > + movdqa xmm10,xmm15 > + psrad xmm14,31 > + paddq xmm15,xmm15 > + pand xmm14,xmm8 > + pxor xmm10,xmm0 > + pxor xmm15,xmm14 > + movdqa xmm14,xmm9 > + paddd xmm9,xmm9 > + movdqa xmm11,xmm15 > + psrad xmm14,31 > + paddq xmm15,xmm15 > + pand xmm14,xmm8 > + pxor xmm11,xmm0 > + pxor xmm15,xmm14 > + movdqa xmm14,xmm9 > + paddd xmm9,xmm9 > + movdqa xmm12,xmm15 > + psrad xmm14,31 > + paddq xmm15,xmm15 > + pand xmm14,xmm8 > + pxor 
xmm12,xmm0 > + pxor xmm15,xmm14 > + movdqa xmm14,xmm9 > + paddd xmm9,xmm9 > + movdqa xmm13,xmm15 > + psrad xmm14,31 > + paddq xmm15,xmm15 > + pand xmm14,xmm8 > + pxor xmm13,xmm0 > + pxor xmm15,xmm14 > + movdqa xmm14,xmm15 > + psrad xmm9,31 > + paddq xmm15,xmm15 > + pand xmm9,xmm8 > + pxor xmm14,xmm0 > + pxor xmm15,xmm9 > + movaps XMMWORD[96+rsp],xmm1 > + > + sub rdx,16*6 > + jc NEAR $L$xts_enc_short > + > + mov eax,16+96 > + lea rcx,[32+r10*1+rbp] > + sub rax,r10 > + movups xmm1,XMMWORD[16+rbp] > + mov r10,rax > + lea r8,[$L$xts_magic] > + jmp NEAR $L$xts_enc_grandloop > + > +ALIGN 32 > +$L$xts_enc_grandloop: > + movdqu xmm2,XMMWORD[rdi] > + movdqa xmm8,xmm0 > + movdqu xmm3,XMMWORD[16+rdi] > + pxor xmm2,xmm10 > + movdqu xmm4,XMMWORD[32+rdi] > + pxor xmm3,xmm11 > +DB 102,15,56,220,209 > + movdqu xmm5,XMMWORD[48+rdi] > + pxor xmm4,xmm12 > +DB 102,15,56,220,217 > + movdqu xmm6,XMMWORD[64+rdi] > + pxor xmm5,xmm13 > +DB 102,15,56,220,225 > + movdqu xmm7,XMMWORD[80+rdi] > + pxor xmm8,xmm15 > + movdqa xmm9,XMMWORD[96+rsp] > + pxor xmm6,xmm14 > +DB 102,15,56,220,233 > + movups xmm0,XMMWORD[32+rbp] > + lea rdi,[96+rdi] > + pxor xmm7,xmm8 > + > + pxor xmm10,xmm9 > +DB 102,15,56,220,241 > + pxor xmm11,xmm9 > + movdqa XMMWORD[rsp],xmm10 > +DB 102,15,56,220,249 > + movups xmm1,XMMWORD[48+rbp] > + pxor xmm12,xmm9 > + > +DB 102,15,56,220,208 > + pxor xmm13,xmm9 > + movdqa XMMWORD[16+rsp],xmm11 > +DB 102,15,56,220,216 > + pxor xmm14,xmm9 > + movdqa XMMWORD[32+rsp],xmm12 > +DB 102,15,56,220,224 > +DB 102,15,56,220,232 > + pxor xmm8,xmm9 > + movdqa XMMWORD[64+rsp],xmm14 > +DB 102,15,56,220,240 > +DB 102,15,56,220,248 > + movups xmm0,XMMWORD[64+rbp] > + movdqa XMMWORD[80+rsp],xmm8 > + pshufd xmm9,xmm15,0x5f > + jmp NEAR $L$xts_enc_loop6 > +ALIGN 32 > +$L$xts_enc_loop6: > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > +DB 102,15,56,220,241 > +DB 102,15,56,220,249 > + movups xmm1,XMMWORD[((-64))+rax*1+rcx] > + add rax,32 > + > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > +DB 102,15,56,220,224 > +DB 102,15,56,220,232 > +DB 102,15,56,220,240 > +DB 102,15,56,220,248 > + movups xmm0,XMMWORD[((-80))+rax*1+rcx] > + jnz NEAR $L$xts_enc_loop6 > + > + movdqa xmm8,XMMWORD[r8] > + movdqa xmm14,xmm9 > + paddd xmm9,xmm9 > +DB 102,15,56,220,209 > + paddq xmm15,xmm15 > + psrad xmm14,31 > +DB 102,15,56,220,217 > + pand xmm14,xmm8 > + movups xmm10,XMMWORD[rbp] > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > +DB 102,15,56,220,241 > + pxor xmm15,xmm14 > + movaps xmm11,xmm10 > +DB 102,15,56,220,249 > + movups xmm1,XMMWORD[((-64))+rcx] > + > + movdqa xmm14,xmm9 > +DB 102,15,56,220,208 > + paddd xmm9,xmm9 > + pxor xmm10,xmm15 > +DB 102,15,56,220,216 > + psrad xmm14,31 > + paddq xmm15,xmm15 > +DB 102,15,56,220,224 > +DB 102,15,56,220,232 > + pand xmm14,xmm8 > + movaps xmm12,xmm11 > +DB 102,15,56,220,240 > + pxor xmm15,xmm14 > + movdqa xmm14,xmm9 > +DB 102,15,56,220,248 > + movups xmm0,XMMWORD[((-48))+rcx] > + > + paddd xmm9,xmm9 > +DB 102,15,56,220,209 > + pxor xmm11,xmm15 > + psrad xmm14,31 > +DB 102,15,56,220,217 > + paddq xmm15,xmm15 > + pand xmm14,xmm8 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > + movdqa XMMWORD[48+rsp],xmm13 > + pxor xmm15,xmm14 > +DB 102,15,56,220,241 > + movaps xmm13,xmm12 > + movdqa xmm14,xmm9 > +DB 102,15,56,220,249 > + movups xmm1,XMMWORD[((-32))+rcx] > + > + paddd xmm9,xmm9 > +DB 102,15,56,220,208 > + pxor xmm12,xmm15 > + psrad xmm14,31 > +DB 102,15,56,220,216 > + paddq xmm15,xmm15 > + pand xmm14,xmm8 > +DB 102,15,56,220,224 > +DB 
102,15,56,220,232 > +DB 102,15,56,220,240 > + pxor xmm15,xmm14 > + movaps xmm14,xmm13 > +DB 102,15,56,220,248 > + > + movdqa xmm0,xmm9 > + paddd xmm9,xmm9 > +DB 102,15,56,220,209 > + pxor xmm13,xmm15 > + psrad xmm0,31 > +DB 102,15,56,220,217 > + paddq xmm15,xmm15 > + pand xmm0,xmm8 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > + pxor xmm15,xmm0 > + movups xmm0,XMMWORD[rbp] > +DB 102,15,56,220,241 > +DB 102,15,56,220,249 > + movups xmm1,XMMWORD[16+rbp] > + > + pxor xmm14,xmm15 > +DB 102,15,56,221,84,36,0 > + psrad xmm9,31 > + paddq xmm15,xmm15 > +DB 102,15,56,221,92,36,16 > +DB 102,15,56,221,100,36,32 > + pand xmm9,xmm8 > + mov rax,r10 > +DB 102,15,56,221,108,36,48 > +DB 102,15,56,221,116,36,64 > +DB 102,15,56,221,124,36,80 > + pxor xmm15,xmm9 > + > + lea rsi,[96+rsi] > + movups XMMWORD[(-96)+rsi],xmm2 > + movups XMMWORD[(-80)+rsi],xmm3 > + movups XMMWORD[(-64)+rsi],xmm4 > + movups XMMWORD[(-48)+rsi],xmm5 > + movups XMMWORD[(-32)+rsi],xmm6 > + movups XMMWORD[(-16)+rsi],xmm7 > + sub rdx,16*6 > + jnc NEAR $L$xts_enc_grandloop > + > + mov eax,16+96 > + sub eax,r10d > + mov rcx,rbp > + shr eax,4 > + > +$L$xts_enc_short: > + > + mov r10d,eax > + pxor xmm10,xmm0 > + add rdx,16*6 > + jz NEAR $L$xts_enc_done > + > + pxor xmm11,xmm0 > + cmp rdx,0x20 > + jb NEAR $L$xts_enc_one > + pxor xmm12,xmm0 > + je NEAR $L$xts_enc_two > + > + pxor xmm13,xmm0 > + cmp rdx,0x40 > + jb NEAR $L$xts_enc_three > + pxor xmm14,xmm0 > + je NEAR $L$xts_enc_four > + > + movdqu xmm2,XMMWORD[rdi] > + movdqu xmm3,XMMWORD[16+rdi] > + movdqu xmm4,XMMWORD[32+rdi] > + pxor xmm2,xmm10 > + movdqu xmm5,XMMWORD[48+rdi] > + pxor xmm3,xmm11 > + movdqu xmm6,XMMWORD[64+rdi] > + lea rdi,[80+rdi] > + pxor xmm4,xmm12 > + pxor xmm5,xmm13 > + pxor xmm6,xmm14 > + pxor xmm7,xmm7 > + > + call _aesni_encrypt6 > + > + xorps xmm2,xmm10 > + movdqa xmm10,xmm15 > + xorps xmm3,xmm11 > + xorps xmm4,xmm12 > + movdqu XMMWORD[rsi],xmm2 > + xorps xmm5,xmm13 > + movdqu XMMWORD[16+rsi],xmm3 > + xorps xmm6,xmm14 > + movdqu XMMWORD[32+rsi],xmm4 > + movdqu XMMWORD[48+rsi],xmm5 > + movdqu XMMWORD[64+rsi],xmm6 > + lea rsi,[80+rsi] > + jmp NEAR $L$xts_enc_done > + > +ALIGN 16 > +$L$xts_enc_one: > + movups xmm2,XMMWORD[rdi] > + lea rdi,[16+rdi] > + xorps xmm2,xmm10 > + movups xmm0,XMMWORD[rcx] > + movups xmm1,XMMWORD[16+rcx] > + lea rcx,[32+rcx] > + xorps xmm2,xmm0 > +$L$oop_enc1_9: > +DB 102,15,56,220,209 > + dec eax > + movups xmm1,XMMWORD[rcx] > + lea rcx,[16+rcx] > + jnz NEAR $L$oop_enc1_9 > +DB 102,15,56,221,209 > + xorps xmm2,xmm10 > + movdqa xmm10,xmm11 > + movups XMMWORD[rsi],xmm2 > + lea rsi,[16+rsi] > + jmp NEAR $L$xts_enc_done > + > +ALIGN 16 > +$L$xts_enc_two: > + movups xmm2,XMMWORD[rdi] > + movups xmm3,XMMWORD[16+rdi] > + lea rdi,[32+rdi] > + xorps xmm2,xmm10 > + xorps xmm3,xmm11 > + > + call _aesni_encrypt2 > + > + xorps xmm2,xmm10 > + movdqa xmm10,xmm12 > + xorps xmm3,xmm11 > + movups XMMWORD[rsi],xmm2 > + movups XMMWORD[16+rsi],xmm3 > + lea rsi,[32+rsi] > + jmp NEAR $L$xts_enc_done > + > +ALIGN 16 > +$L$xts_enc_three: > + movups xmm2,XMMWORD[rdi] > + movups xmm3,XMMWORD[16+rdi] > + movups xmm4,XMMWORD[32+rdi] > + lea rdi,[48+rdi] > + xorps xmm2,xmm10 > + xorps xmm3,xmm11 > + xorps xmm4,xmm12 > + > + call _aesni_encrypt3 > + > + xorps xmm2,xmm10 > + movdqa xmm10,xmm13 > + xorps xmm3,xmm11 > + xorps xmm4,xmm12 > + movups XMMWORD[rsi],xmm2 > + movups XMMWORD[16+rsi],xmm3 > + movups XMMWORD[32+rsi],xmm4 > + lea rsi,[48+rsi] > + jmp NEAR $L$xts_enc_done > + > +ALIGN 16 > +$L$xts_enc_four: > + movups xmm2,XMMWORD[rdi] > + movups xmm3,XMMWORD[16+rdi] > 
+ movups xmm4,XMMWORD[32+rdi] > + xorps xmm2,xmm10 > + movups xmm5,XMMWORD[48+rdi] > + lea rdi,[64+rdi] > + xorps xmm3,xmm11 > + xorps xmm4,xmm12 > + xorps xmm5,xmm13 > + > + call _aesni_encrypt4 > + > + pxor xmm2,xmm10 > + movdqa xmm10,xmm14 > + pxor xmm3,xmm11 > + pxor xmm4,xmm12 > + movdqu XMMWORD[rsi],xmm2 > + pxor xmm5,xmm13 > + movdqu XMMWORD[16+rsi],xmm3 > + movdqu XMMWORD[32+rsi],xmm4 > + movdqu XMMWORD[48+rsi],xmm5 > + lea rsi,[64+rsi] > + jmp NEAR $L$xts_enc_done > + > +ALIGN 16 > +$L$xts_enc_done: > + and r9,15 > + jz NEAR $L$xts_enc_ret > + mov rdx,r9 > + > +$L$xts_enc_steal: > + movzx eax,BYTE[rdi] > + movzx ecx,BYTE[((-16))+rsi] > + lea rdi,[1+rdi] > + mov BYTE[((-16))+rsi],al > + mov BYTE[rsi],cl > + lea rsi,[1+rsi] > + sub rdx,1 > + jnz NEAR $L$xts_enc_steal > + > + sub rsi,r9 > + mov rcx,rbp > + mov eax,r10d > + > + movups xmm2,XMMWORD[((-16))+rsi] > + xorps xmm2,xmm10 > + movups xmm0,XMMWORD[rcx] > + movups xmm1,XMMWORD[16+rcx] > + lea rcx,[32+rcx] > + xorps xmm2,xmm0 > +$L$oop_enc1_10: > +DB 102,15,56,220,209 > + dec eax > + movups xmm1,XMMWORD[rcx] > + lea rcx,[16+rcx] > + jnz NEAR $L$oop_enc1_10 > +DB 102,15,56,221,209 > + xorps xmm2,xmm10 > + movups XMMWORD[(-16)+rsi],xmm2 > + > +$L$xts_enc_ret: > + xorps xmm0,xmm0 > + pxor xmm1,xmm1 > + pxor xmm2,xmm2 > + pxor xmm3,xmm3 > + pxor xmm4,xmm4 > + pxor xmm5,xmm5 > + movaps xmm6,XMMWORD[((-168))+r11] > + movaps XMMWORD[(-168)+r11],xmm0 > + movaps xmm7,XMMWORD[((-152))+r11] > + movaps XMMWORD[(-152)+r11],xmm0 > + movaps xmm8,XMMWORD[((-136))+r11] > + movaps XMMWORD[(-136)+r11],xmm0 > + movaps xmm9,XMMWORD[((-120))+r11] > + movaps XMMWORD[(-120)+r11],xmm0 > + movaps xmm10,XMMWORD[((-104))+r11] > + movaps XMMWORD[(-104)+r11],xmm0 > + movaps xmm11,XMMWORD[((-88))+r11] > + movaps XMMWORD[(-88)+r11],xmm0 > + movaps xmm12,XMMWORD[((-72))+r11] > + movaps XMMWORD[(-72)+r11],xmm0 > + movaps xmm13,XMMWORD[((-56))+r11] > + movaps XMMWORD[(-56)+r11],xmm0 > + movaps xmm14,XMMWORD[((-40))+r11] > + movaps XMMWORD[(-40)+r11],xmm0 > + movaps xmm15,XMMWORD[((-24))+r11] > + movaps XMMWORD[(-24)+r11],xmm0 > + movaps XMMWORD[rsp],xmm0 > + movaps XMMWORD[16+rsp],xmm0 > + movaps XMMWORD[32+rsp],xmm0 > + movaps XMMWORD[48+rsp],xmm0 > + movaps XMMWORD[64+rsp],xmm0 > + movaps XMMWORD[80+rsp],xmm0 > + movaps XMMWORD[96+rsp],xmm0 > + mov rbp,QWORD[((-8))+r11] > + > + lea rsp,[r11] > + > +$L$xts_enc_epilogue: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_aesni_xts_encrypt: > +global aesni_xts_decrypt > + > +ALIGN 16 > +aesni_xts_decrypt: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_aesni_xts_decrypt: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + mov rcx,r9 > + mov r8,QWORD[40+rsp] > + mov r9,QWORD[48+rsp] > + > + > + > + lea r11,[rsp] > + > + push rbp > + > + sub rsp,272 > + and rsp,-16 > + movaps XMMWORD[(-168)+r11],xmm6 > + movaps XMMWORD[(-152)+r11],xmm7 > + movaps XMMWORD[(-136)+r11],xmm8 > + movaps XMMWORD[(-120)+r11],xmm9 > + movaps XMMWORD[(-104)+r11],xmm10 > + movaps XMMWORD[(-88)+r11],xmm11 > + movaps XMMWORD[(-72)+r11],xmm12 > + movaps XMMWORD[(-56)+r11],xmm13 > + movaps XMMWORD[(-40)+r11],xmm14 > + movaps XMMWORD[(-24)+r11],xmm15 > +$L$xts_dec_body: > + movups xmm2,XMMWORD[r9] > + mov eax,DWORD[240+r8] > + mov r10d,DWORD[240+rcx] > + movups xmm0,XMMWORD[r8] > + movups xmm1,XMMWORD[16+r8] > + lea r8,[32+r8] > + xorps xmm2,xmm0 > +$L$oop_enc1_11: > +DB 102,15,56,220,209 > + dec eax > + movups xmm1,XMMWORD[r8] > + lea 
r8,[16+r8] > + jnz NEAR $L$oop_enc1_11 > +DB 102,15,56,221,209 > + xor eax,eax > + test rdx,15 > + setnz al > + shl rax,4 > + sub rdx,rax > + > + movups xmm0,XMMWORD[rcx] > + mov rbp,rcx > + mov eax,r10d > + shl r10d,4 > + mov r9,rdx > + and rdx,-16 > + > + movups xmm1,XMMWORD[16+r10*1+rcx] > + > + movdqa xmm8,XMMWORD[$L$xts_magic] > + movdqa xmm15,xmm2 > + pshufd xmm9,xmm2,0x5f > + pxor xmm1,xmm0 > + movdqa xmm14,xmm9 > + paddd xmm9,xmm9 > + movdqa xmm10,xmm15 > + psrad xmm14,31 > + paddq xmm15,xmm15 > + pand xmm14,xmm8 > + pxor xmm10,xmm0 > + pxor xmm15,xmm14 > + movdqa xmm14,xmm9 > + paddd xmm9,xmm9 > + movdqa xmm11,xmm15 > + psrad xmm14,31 > + paddq xmm15,xmm15 > + pand xmm14,xmm8 > + pxor xmm11,xmm0 > + pxor xmm15,xmm14 > + movdqa xmm14,xmm9 > + paddd xmm9,xmm9 > + movdqa xmm12,xmm15 > + psrad xmm14,31 > + paddq xmm15,xmm15 > + pand xmm14,xmm8 > + pxor xmm12,xmm0 > + pxor xmm15,xmm14 > + movdqa xmm14,xmm9 > + paddd xmm9,xmm9 > + movdqa xmm13,xmm15 > + psrad xmm14,31 > + paddq xmm15,xmm15 > + pand xmm14,xmm8 > + pxor xmm13,xmm0 > + pxor xmm15,xmm14 > + movdqa xmm14,xmm15 > + psrad xmm9,31 > + paddq xmm15,xmm15 > + pand xmm9,xmm8 > + pxor xmm14,xmm0 > + pxor xmm15,xmm9 > + movaps XMMWORD[96+rsp],xmm1 > + > + sub rdx,16*6 > + jc NEAR $L$xts_dec_short > + > + mov eax,16+96 > + lea rcx,[32+r10*1+rbp] > + sub rax,r10 > + movups xmm1,XMMWORD[16+rbp] > + mov r10,rax > + lea r8,[$L$xts_magic] > + jmp NEAR $L$xts_dec_grandloop > + > +ALIGN 32 > +$L$xts_dec_grandloop: > + movdqu xmm2,XMMWORD[rdi] > + movdqa xmm8,xmm0 > + movdqu xmm3,XMMWORD[16+rdi] > + pxor xmm2,xmm10 > + movdqu xmm4,XMMWORD[32+rdi] > + pxor xmm3,xmm11 > +DB 102,15,56,222,209 > + movdqu xmm5,XMMWORD[48+rdi] > + pxor xmm4,xmm12 > +DB 102,15,56,222,217 > + movdqu xmm6,XMMWORD[64+rdi] > + pxor xmm5,xmm13 > +DB 102,15,56,222,225 > + movdqu xmm7,XMMWORD[80+rdi] > + pxor xmm8,xmm15 > + movdqa xmm9,XMMWORD[96+rsp] > + pxor xmm6,xmm14 > +DB 102,15,56,222,233 > + movups xmm0,XMMWORD[32+rbp] > + lea rdi,[96+rdi] > + pxor xmm7,xmm8 > + > + pxor xmm10,xmm9 > +DB 102,15,56,222,241 > + pxor xmm11,xmm9 > + movdqa XMMWORD[rsp],xmm10 > +DB 102,15,56,222,249 > + movups xmm1,XMMWORD[48+rbp] > + pxor xmm12,xmm9 > + > +DB 102,15,56,222,208 > + pxor xmm13,xmm9 > + movdqa XMMWORD[16+rsp],xmm11 > +DB 102,15,56,222,216 > + pxor xmm14,xmm9 > + movdqa XMMWORD[32+rsp],xmm12 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > + pxor xmm8,xmm9 > + movdqa XMMWORD[64+rsp],xmm14 > +DB 102,15,56,222,240 > +DB 102,15,56,222,248 > + movups xmm0,XMMWORD[64+rbp] > + movdqa XMMWORD[80+rsp],xmm8 > + pshufd xmm9,xmm15,0x5f > + jmp NEAR $L$xts_dec_loop6 > +ALIGN 32 > +$L$xts_dec_loop6: > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > +DB 102,15,56,222,241 > +DB 102,15,56,222,249 > + movups xmm1,XMMWORD[((-64))+rax*1+rcx] > + add rax,32 > + > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > +DB 102,15,56,222,240 > +DB 102,15,56,222,248 > + movups xmm0,XMMWORD[((-80))+rax*1+rcx] > + jnz NEAR $L$xts_dec_loop6 > + > + movdqa xmm8,XMMWORD[r8] > + movdqa xmm14,xmm9 > + paddd xmm9,xmm9 > +DB 102,15,56,222,209 > + paddq xmm15,xmm15 > + psrad xmm14,31 > +DB 102,15,56,222,217 > + pand xmm14,xmm8 > + movups xmm10,XMMWORD[rbp] > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > +DB 102,15,56,222,241 > + pxor xmm15,xmm14 > + movaps xmm11,xmm10 > +DB 102,15,56,222,249 > + movups xmm1,XMMWORD[((-64))+rcx] > + > + movdqa xmm14,xmm9 > +DB 102,15,56,222,208 > + paddd xmm9,xmm9 > + pxor xmm10,xmm15 
> +DB 102,15,56,222,216 > + psrad xmm14,31 > + paddq xmm15,xmm15 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > + pand xmm14,xmm8 > + movaps xmm12,xmm11 > +DB 102,15,56,222,240 > + pxor xmm15,xmm14 > + movdqa xmm14,xmm9 > +DB 102,15,56,222,248 > + movups xmm0,XMMWORD[((-48))+rcx] > + > + paddd xmm9,xmm9 > +DB 102,15,56,222,209 > + pxor xmm11,xmm15 > + psrad xmm14,31 > +DB 102,15,56,222,217 > + paddq xmm15,xmm15 > + pand xmm14,xmm8 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > + movdqa XMMWORD[48+rsp],xmm13 > + pxor xmm15,xmm14 > +DB 102,15,56,222,241 > + movaps xmm13,xmm12 > + movdqa xmm14,xmm9 > +DB 102,15,56,222,249 > + movups xmm1,XMMWORD[((-32))+rcx] > + > + paddd xmm9,xmm9 > +DB 102,15,56,222,208 > + pxor xmm12,xmm15 > + psrad xmm14,31 > +DB 102,15,56,222,216 > + paddq xmm15,xmm15 > + pand xmm14,xmm8 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > +DB 102,15,56,222,240 > + pxor xmm15,xmm14 > + movaps xmm14,xmm13 > +DB 102,15,56,222,248 > + > + movdqa xmm0,xmm9 > + paddd xmm9,xmm9 > +DB 102,15,56,222,209 > + pxor xmm13,xmm15 > + psrad xmm0,31 > +DB 102,15,56,222,217 > + paddq xmm15,xmm15 > + pand xmm0,xmm8 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > + pxor xmm15,xmm0 > + movups xmm0,XMMWORD[rbp] > +DB 102,15,56,222,241 > +DB 102,15,56,222,249 > + movups xmm1,XMMWORD[16+rbp] > + > + pxor xmm14,xmm15 > +DB 102,15,56,223,84,36,0 > + psrad xmm9,31 > + paddq xmm15,xmm15 > +DB 102,15,56,223,92,36,16 > +DB 102,15,56,223,100,36,32 > + pand xmm9,xmm8 > + mov rax,r10 > +DB 102,15,56,223,108,36,48 > +DB 102,15,56,223,116,36,64 > +DB 102,15,56,223,124,36,80 > + pxor xmm15,xmm9 > + > + lea rsi,[96+rsi] > + movups XMMWORD[(-96)+rsi],xmm2 > + movups XMMWORD[(-80)+rsi],xmm3 > + movups XMMWORD[(-64)+rsi],xmm4 > + movups XMMWORD[(-48)+rsi],xmm5 > + movups XMMWORD[(-32)+rsi],xmm6 > + movups XMMWORD[(-16)+rsi],xmm7 > + sub rdx,16*6 > + jnc NEAR $L$xts_dec_grandloop > + > + mov eax,16+96 > + sub eax,r10d > + mov rcx,rbp > + shr eax,4 > + > +$L$xts_dec_short: > + > + mov r10d,eax > + pxor xmm10,xmm0 > + pxor xmm11,xmm0 > + add rdx,16*6 > + jz NEAR $L$xts_dec_done > + > + pxor xmm12,xmm0 > + cmp rdx,0x20 > + jb NEAR $L$xts_dec_one > + pxor xmm13,xmm0 > + je NEAR $L$xts_dec_two > + > + pxor xmm14,xmm0 > + cmp rdx,0x40 > + jb NEAR $L$xts_dec_three > + je NEAR $L$xts_dec_four > + > + movdqu xmm2,XMMWORD[rdi] > + movdqu xmm3,XMMWORD[16+rdi] > + movdqu xmm4,XMMWORD[32+rdi] > + pxor xmm2,xmm10 > + movdqu xmm5,XMMWORD[48+rdi] > + pxor xmm3,xmm11 > + movdqu xmm6,XMMWORD[64+rdi] > + lea rdi,[80+rdi] > + pxor xmm4,xmm12 > + pxor xmm5,xmm13 > + pxor xmm6,xmm14 > + > + call _aesni_decrypt6 > + > + xorps xmm2,xmm10 > + xorps xmm3,xmm11 > + xorps xmm4,xmm12 > + movdqu XMMWORD[rsi],xmm2 > + xorps xmm5,xmm13 > + movdqu XMMWORD[16+rsi],xmm3 > + xorps xmm6,xmm14 > + movdqu XMMWORD[32+rsi],xmm4 > + pxor xmm14,xmm14 > + movdqu XMMWORD[48+rsi],xmm5 > + pcmpgtd xmm14,xmm15 > + movdqu XMMWORD[64+rsi],xmm6 > + lea rsi,[80+rsi] > + pshufd xmm11,xmm14,0x13 > + and r9,15 > + jz NEAR $L$xts_dec_ret > + > + movdqa xmm10,xmm15 > + paddq xmm15,xmm15 > + pand xmm11,xmm8 > + pxor xmm11,xmm15 > + jmp NEAR $L$xts_dec_done2 > + > +ALIGN 16 > +$L$xts_dec_one: > + movups xmm2,XMMWORD[rdi] > + lea rdi,[16+rdi] > + xorps xmm2,xmm10 > + movups xmm0,XMMWORD[rcx] > + movups xmm1,XMMWORD[16+rcx] > + lea rcx,[32+rcx] > + xorps xmm2,xmm0 > +$L$oop_dec1_12: > +DB 102,15,56,222,209 > + dec eax > + movups xmm1,XMMWORD[rcx] > + lea rcx,[16+rcx] > + jnz NEAR $L$oop_dec1_12 > +DB 102,15,56,223,209 > + xorps xmm2,xmm10 > + movdqa xmm10,xmm11 > + 
movups XMMWORD[rsi],xmm2 > + movdqa xmm11,xmm12 > + lea rsi,[16+rsi] > + jmp NEAR $L$xts_dec_done > + > +ALIGN 16 > +$L$xts_dec_two: > + movups xmm2,XMMWORD[rdi] > + movups xmm3,XMMWORD[16+rdi] > + lea rdi,[32+rdi] > + xorps xmm2,xmm10 > + xorps xmm3,xmm11 > + > + call _aesni_decrypt2 > + > + xorps xmm2,xmm10 > + movdqa xmm10,xmm12 > + xorps xmm3,xmm11 > + movdqa xmm11,xmm13 > + movups XMMWORD[rsi],xmm2 > + movups XMMWORD[16+rsi],xmm3 > + lea rsi,[32+rsi] > + jmp NEAR $L$xts_dec_done > + > +ALIGN 16 > +$L$xts_dec_three: > + movups xmm2,XMMWORD[rdi] > + movups xmm3,XMMWORD[16+rdi] > + movups xmm4,XMMWORD[32+rdi] > + lea rdi,[48+rdi] > + xorps xmm2,xmm10 > + xorps xmm3,xmm11 > + xorps xmm4,xmm12 > + > + call _aesni_decrypt3 > + > + xorps xmm2,xmm10 > + movdqa xmm10,xmm13 > + xorps xmm3,xmm11 > + movdqa xmm11,xmm14 > + xorps xmm4,xmm12 > + movups XMMWORD[rsi],xmm2 > + movups XMMWORD[16+rsi],xmm3 > + movups XMMWORD[32+rsi],xmm4 > + lea rsi,[48+rsi] > + jmp NEAR $L$xts_dec_done > + > +ALIGN 16 > +$L$xts_dec_four: > + movups xmm2,XMMWORD[rdi] > + movups xmm3,XMMWORD[16+rdi] > + movups xmm4,XMMWORD[32+rdi] > + xorps xmm2,xmm10 > + movups xmm5,XMMWORD[48+rdi] > + lea rdi,[64+rdi] > + xorps xmm3,xmm11 > + xorps xmm4,xmm12 > + xorps xmm5,xmm13 > + > + call _aesni_decrypt4 > + > + pxor xmm2,xmm10 > + movdqa xmm10,xmm14 > + pxor xmm3,xmm11 > + movdqa xmm11,xmm15 > + pxor xmm4,xmm12 > + movdqu XMMWORD[rsi],xmm2 > + pxor xmm5,xmm13 > + movdqu XMMWORD[16+rsi],xmm3 > + movdqu XMMWORD[32+rsi],xmm4 > + movdqu XMMWORD[48+rsi],xmm5 > + lea rsi,[64+rsi] > + jmp NEAR $L$xts_dec_done > + > +ALIGN 16 > +$L$xts_dec_done: > + and r9,15 > + jz NEAR $L$xts_dec_ret > +$L$xts_dec_done2: > + mov rdx,r9 > + mov rcx,rbp > + mov eax,r10d > + > + movups xmm2,XMMWORD[rdi] > + xorps xmm2,xmm11 > + movups xmm0,XMMWORD[rcx] > + movups xmm1,XMMWORD[16+rcx] > + lea rcx,[32+rcx] > + xorps xmm2,xmm0 > +$L$oop_dec1_13: > +DB 102,15,56,222,209 > + dec eax > + movups xmm1,XMMWORD[rcx] > + lea rcx,[16+rcx] > + jnz NEAR $L$oop_dec1_13 > +DB 102,15,56,223,209 > + xorps xmm2,xmm11 > + movups XMMWORD[rsi],xmm2 > + > +$L$xts_dec_steal: > + movzx eax,BYTE[16+rdi] > + movzx ecx,BYTE[rsi] > + lea rdi,[1+rdi] > + mov BYTE[rsi],al > + mov BYTE[16+rsi],cl > + lea rsi,[1+rsi] > + sub rdx,1 > + jnz NEAR $L$xts_dec_steal > + > + sub rsi,r9 > + mov rcx,rbp > + mov eax,r10d > + > + movups xmm2,XMMWORD[rsi] > + xorps xmm2,xmm10 > + movups xmm0,XMMWORD[rcx] > + movups xmm1,XMMWORD[16+rcx] > + lea rcx,[32+rcx] > + xorps xmm2,xmm0 > +$L$oop_dec1_14: > +DB 102,15,56,222,209 > + dec eax > + movups xmm1,XMMWORD[rcx] > + lea rcx,[16+rcx] > + jnz NEAR $L$oop_dec1_14 > +DB 102,15,56,223,209 > + xorps xmm2,xmm10 > + movups XMMWORD[rsi],xmm2 > + > +$L$xts_dec_ret: > + xorps xmm0,xmm0 > + pxor xmm1,xmm1 > + pxor xmm2,xmm2 > + pxor xmm3,xmm3 > + pxor xmm4,xmm4 > + pxor xmm5,xmm5 > + movaps xmm6,XMMWORD[((-168))+r11] > + movaps XMMWORD[(-168)+r11],xmm0 > + movaps xmm7,XMMWORD[((-152))+r11] > + movaps XMMWORD[(-152)+r11],xmm0 > + movaps xmm8,XMMWORD[((-136))+r11] > + movaps XMMWORD[(-136)+r11],xmm0 > + movaps xmm9,XMMWORD[((-120))+r11] > + movaps XMMWORD[(-120)+r11],xmm0 > + movaps xmm10,XMMWORD[((-104))+r11] > + movaps XMMWORD[(-104)+r11],xmm0 > + movaps xmm11,XMMWORD[((-88))+r11] > + movaps XMMWORD[(-88)+r11],xmm0 > + movaps xmm12,XMMWORD[((-72))+r11] > + movaps XMMWORD[(-72)+r11],xmm0 > + movaps xmm13,XMMWORD[((-56))+r11] > + movaps XMMWORD[(-56)+r11],xmm0 > + movaps xmm14,XMMWORD[((-40))+r11] > + movaps XMMWORD[(-40)+r11],xmm0 > + movaps 
xmm15,XMMWORD[((-24))+r11] > + movaps XMMWORD[(-24)+r11],xmm0 > + movaps XMMWORD[rsp],xmm0 > + movaps XMMWORD[16+rsp],xmm0 > + movaps XMMWORD[32+rsp],xmm0 > + movaps XMMWORD[48+rsp],xmm0 > + movaps XMMWORD[64+rsp],xmm0 > + movaps XMMWORD[80+rsp],xmm0 > + movaps XMMWORD[96+rsp],xmm0 > + mov rbp,QWORD[((-8))+r11] > + > + lea rsp,[r11] > + > +$L$xts_dec_epilogue: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_aesni_xts_decrypt: > +global aesni_ocb_encrypt > + > +ALIGN 32 > +aesni_ocb_encrypt: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_aesni_ocb_encrypt: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + mov rcx,r9 > + mov r8,QWORD[40+rsp] > + mov r9,QWORD[48+rsp] > + > + > + > + lea rax,[rsp] > + push rbx > + > + push rbp > + > + push r12 > + > + push r13 > + > + push r14 > + > + lea rsp,[((-160))+rsp] > + movaps XMMWORD[rsp],xmm6 > + movaps XMMWORD[16+rsp],xmm7 > + movaps XMMWORD[32+rsp],xmm8 > + movaps XMMWORD[48+rsp],xmm9 > + movaps XMMWORD[64+rsp],xmm10 > + movaps XMMWORD[80+rsp],xmm11 > + movaps XMMWORD[96+rsp],xmm12 > + movaps XMMWORD[112+rsp],xmm13 > + movaps XMMWORD[128+rsp],xmm14 > + movaps XMMWORD[144+rsp],xmm15 > +$L$ocb_enc_body: > + mov rbx,QWORD[56+rax] > + mov rbp,QWORD[((56+8))+rax] > + > + mov r10d,DWORD[240+rcx] > + mov r11,rcx > + shl r10d,4 > + movups xmm9,XMMWORD[rcx] > + movups xmm1,XMMWORD[16+r10*1+rcx] > + > + movdqu xmm15,XMMWORD[r9] > + pxor xmm9,xmm1 > + pxor xmm15,xmm1 > + > + mov eax,16+32 > + lea rcx,[32+r10*1+r11] > + movups xmm1,XMMWORD[16+r11] > + sub rax,r10 > + mov r10,rax > + > + movdqu xmm10,XMMWORD[rbx] > + movdqu xmm8,XMMWORD[rbp] > + > + test r8,1 > + jnz NEAR $L$ocb_enc_odd > + > + bsf r12,r8 > + add r8,1 > + shl r12,4 > + movdqu xmm7,XMMWORD[r12*1+rbx] > + movdqu xmm2,XMMWORD[rdi] > + lea rdi,[16+rdi] > + > + call __ocb_encrypt1 > + > + movdqa xmm15,xmm7 > + movups XMMWORD[rsi],xmm2 > + lea rsi,[16+rsi] > + sub rdx,1 > + jz NEAR $L$ocb_enc_done > + > +$L$ocb_enc_odd: > + lea r12,[1+r8] > + lea r13,[3+r8] > + lea r14,[5+r8] > + lea r8,[6+r8] > + bsf r12,r12 > + bsf r13,r13 > + bsf r14,r14 > + shl r12,4 > + shl r13,4 > + shl r14,4 > + > + sub rdx,6 > + jc NEAR $L$ocb_enc_short > + jmp NEAR $L$ocb_enc_grandloop > + > +ALIGN 32 > +$L$ocb_enc_grandloop: > + movdqu xmm2,XMMWORD[rdi] > + movdqu xmm3,XMMWORD[16+rdi] > + movdqu xmm4,XMMWORD[32+rdi] > + movdqu xmm5,XMMWORD[48+rdi] > + movdqu xmm6,XMMWORD[64+rdi] > + movdqu xmm7,XMMWORD[80+rdi] > + lea rdi,[96+rdi] > + > + call __ocb_encrypt6 > + > + movups XMMWORD[rsi],xmm2 > + movups XMMWORD[16+rsi],xmm3 > + movups XMMWORD[32+rsi],xmm4 > + movups XMMWORD[48+rsi],xmm5 > + movups XMMWORD[64+rsi],xmm6 > + movups XMMWORD[80+rsi],xmm7 > + lea rsi,[96+rsi] > + sub rdx,6 > + jnc NEAR $L$ocb_enc_grandloop > + > +$L$ocb_enc_short: > + add rdx,6 > + jz NEAR $L$ocb_enc_done > + > + movdqu xmm2,XMMWORD[rdi] > + cmp rdx,2 > + jb NEAR $L$ocb_enc_one > + movdqu xmm3,XMMWORD[16+rdi] > + je NEAR $L$ocb_enc_two > + > + movdqu xmm4,XMMWORD[32+rdi] > + cmp rdx,4 > + jb NEAR $L$ocb_enc_three > + movdqu xmm5,XMMWORD[48+rdi] > + je NEAR $L$ocb_enc_four > + > + movdqu xmm6,XMMWORD[64+rdi] > + pxor xmm7,xmm7 > + > + call __ocb_encrypt6 > + > + movdqa xmm15,xmm14 > + movups XMMWORD[rsi],xmm2 > + movups XMMWORD[16+rsi],xmm3 > + movups XMMWORD[32+rsi],xmm4 > + movups XMMWORD[48+rsi],xmm5 > + movups XMMWORD[64+rsi],xmm6 > + > + jmp NEAR $L$ocb_enc_done > + > +ALIGN 16 > +$L$ocb_enc_one: > + movdqa xmm7,xmm10 > + > + call 
__ocb_encrypt1 > + > + movdqa xmm15,xmm7 > + movups XMMWORD[rsi],xmm2 > + jmp NEAR $L$ocb_enc_done > + > +ALIGN 16 > +$L$ocb_enc_two: > + pxor xmm4,xmm4 > + pxor xmm5,xmm5 > + > + call __ocb_encrypt4 > + > + movdqa xmm15,xmm11 > + movups XMMWORD[rsi],xmm2 > + movups XMMWORD[16+rsi],xmm3 > + > + jmp NEAR $L$ocb_enc_done > + > +ALIGN 16 > +$L$ocb_enc_three: > + pxor xmm5,xmm5 > + > + call __ocb_encrypt4 > + > + movdqa xmm15,xmm12 > + movups XMMWORD[rsi],xmm2 > + movups XMMWORD[16+rsi],xmm3 > + movups XMMWORD[32+rsi],xmm4 > + > + jmp NEAR $L$ocb_enc_done > + > +ALIGN 16 > +$L$ocb_enc_four: > + call __ocb_encrypt4 > + > + movdqa xmm15,xmm13 > + movups XMMWORD[rsi],xmm2 > + movups XMMWORD[16+rsi],xmm3 > + movups XMMWORD[32+rsi],xmm4 > + movups XMMWORD[48+rsi],xmm5 > + > +$L$ocb_enc_done: > + pxor xmm15,xmm0 > + movdqu XMMWORD[rbp],xmm8 > + movdqu XMMWORD[r9],xmm15 > + > + xorps xmm0,xmm0 > + pxor xmm1,xmm1 > + pxor xmm2,xmm2 > + pxor xmm3,xmm3 > + pxor xmm4,xmm4 > + pxor xmm5,xmm5 > + movaps xmm6,XMMWORD[rsp] > + movaps XMMWORD[rsp],xmm0 > + movaps xmm7,XMMWORD[16+rsp] > + movaps XMMWORD[16+rsp],xmm0 > + movaps xmm8,XMMWORD[32+rsp] > + movaps XMMWORD[32+rsp],xmm0 > + movaps xmm9,XMMWORD[48+rsp] > + movaps XMMWORD[48+rsp],xmm0 > + movaps xmm10,XMMWORD[64+rsp] > + movaps XMMWORD[64+rsp],xmm0 > + movaps xmm11,XMMWORD[80+rsp] > + movaps XMMWORD[80+rsp],xmm0 > + movaps xmm12,XMMWORD[96+rsp] > + movaps XMMWORD[96+rsp],xmm0 > + movaps xmm13,XMMWORD[112+rsp] > + movaps XMMWORD[112+rsp],xmm0 > + movaps xmm14,XMMWORD[128+rsp] > + movaps XMMWORD[128+rsp],xmm0 > + movaps xmm15,XMMWORD[144+rsp] > + movaps XMMWORD[144+rsp],xmm0 > + lea rax,[((160+40))+rsp] > +$L$ocb_enc_pop: > + mov r14,QWORD[((-40))+rax] > + > + mov r13,QWORD[((-32))+rax] > + > + mov r12,QWORD[((-24))+rax] > + > + mov rbp,QWORD[((-16))+rax] > + > + mov rbx,QWORD[((-8))+rax] > + > + lea rsp,[rax] > + > +$L$ocb_enc_epilogue: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_aesni_ocb_encrypt: > + > + > +ALIGN 32 > +__ocb_encrypt6: > + > + pxor xmm15,xmm9 > + movdqu xmm11,XMMWORD[r12*1+rbx] > + movdqa xmm12,xmm10 > + movdqu xmm13,XMMWORD[r13*1+rbx] > + movdqa xmm14,xmm10 > + pxor xmm10,xmm15 > + movdqu xmm15,XMMWORD[r14*1+rbx] > + pxor xmm11,xmm10 > + pxor xmm8,xmm2 > + pxor xmm2,xmm10 > + pxor xmm12,xmm11 > + pxor xmm8,xmm3 > + pxor xmm3,xmm11 > + pxor xmm13,xmm12 > + pxor xmm8,xmm4 > + pxor xmm4,xmm12 > + pxor xmm14,xmm13 > + pxor xmm8,xmm5 > + pxor xmm5,xmm13 > + pxor xmm15,xmm14 > + pxor xmm8,xmm6 > + pxor xmm6,xmm14 > + pxor xmm8,xmm7 > + pxor xmm7,xmm15 > + movups xmm0,XMMWORD[32+r11] > + > + lea r12,[1+r8] > + lea r13,[3+r8] > + lea r14,[5+r8] > + add r8,6 > + pxor xmm10,xmm9 > + bsf r12,r12 > + bsf r13,r13 > + bsf r14,r14 > + > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > + pxor xmm11,xmm9 > + pxor xmm12,xmm9 > +DB 102,15,56,220,241 > + pxor xmm13,xmm9 > + pxor xmm14,xmm9 > +DB 102,15,56,220,249 > + movups xmm1,XMMWORD[48+r11] > + pxor xmm15,xmm9 > + > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > +DB 102,15,56,220,224 > +DB 102,15,56,220,232 > +DB 102,15,56,220,240 > +DB 102,15,56,220,248 > + movups xmm0,XMMWORD[64+r11] > + shl r12,4 > + shl r13,4 > + jmp NEAR $L$ocb_enc_loop6 > + > +ALIGN 32 > +$L$ocb_enc_loop6: > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > +DB 102,15,56,220,241 > +DB 102,15,56,220,249 > + movups xmm1,XMMWORD[rax*1+rcx] > + add rax,32 > + > +DB 
102,15,56,220,208 > +DB 102,15,56,220,216 > +DB 102,15,56,220,224 > +DB 102,15,56,220,232 > +DB 102,15,56,220,240 > +DB 102,15,56,220,248 > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > + jnz NEAR $L$ocb_enc_loop6 > + > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > +DB 102,15,56,220,241 > +DB 102,15,56,220,249 > + movups xmm1,XMMWORD[16+r11] > + shl r14,4 > + > +DB 102,65,15,56,221,210 > + movdqu xmm10,XMMWORD[rbx] > + mov rax,r10 > +DB 102,65,15,56,221,219 > +DB 102,65,15,56,221,228 > +DB 102,65,15,56,221,237 > +DB 102,65,15,56,221,246 > +DB 102,65,15,56,221,255 > + DB 0F3h,0C3h ;repret > + > + > + > + > +ALIGN 32 > +__ocb_encrypt4: > + > + pxor xmm15,xmm9 > + movdqu xmm11,XMMWORD[r12*1+rbx] > + movdqa xmm12,xmm10 > + movdqu xmm13,XMMWORD[r13*1+rbx] > + pxor xmm10,xmm15 > + pxor xmm11,xmm10 > + pxor xmm8,xmm2 > + pxor xmm2,xmm10 > + pxor xmm12,xmm11 > + pxor xmm8,xmm3 > + pxor xmm3,xmm11 > + pxor xmm13,xmm12 > + pxor xmm8,xmm4 > + pxor xmm4,xmm12 > + pxor xmm8,xmm5 > + pxor xmm5,xmm13 > + movups xmm0,XMMWORD[32+r11] > + > + pxor xmm10,xmm9 > + pxor xmm11,xmm9 > + pxor xmm12,xmm9 > + pxor xmm13,xmm9 > + > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > + movups xmm1,XMMWORD[48+r11] > + > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > +DB 102,15,56,220,224 > +DB 102,15,56,220,232 > + movups xmm0,XMMWORD[64+r11] > + jmp NEAR $L$ocb_enc_loop4 > + > +ALIGN 32 > +$L$ocb_enc_loop4: > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > + movups xmm1,XMMWORD[rax*1+rcx] > + add rax,32 > + > +DB 102,15,56,220,208 > +DB 102,15,56,220,216 > +DB 102,15,56,220,224 > +DB 102,15,56,220,232 > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > + jnz NEAR $L$ocb_enc_loop4 > + > +DB 102,15,56,220,209 > +DB 102,15,56,220,217 > +DB 102,15,56,220,225 > +DB 102,15,56,220,233 > + movups xmm1,XMMWORD[16+r11] > + mov rax,r10 > + > +DB 102,65,15,56,221,210 > +DB 102,65,15,56,221,219 > +DB 102,65,15,56,221,228 > +DB 102,65,15,56,221,237 > + DB 0F3h,0C3h ;repret > + > + > + > + > +ALIGN 32 > +__ocb_encrypt1: > + > + pxor xmm7,xmm15 > + pxor xmm7,xmm9 > + pxor xmm8,xmm2 > + pxor xmm2,xmm7 > + movups xmm0,XMMWORD[32+r11] > + > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[48+r11] > + pxor xmm7,xmm9 > + > +DB 102,15,56,220,208 > + movups xmm0,XMMWORD[64+r11] > + jmp NEAR $L$ocb_enc_loop1 > + > +ALIGN 32 > +$L$ocb_enc_loop1: > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[rax*1+rcx] > + add rax,32 > + > +DB 102,15,56,220,208 > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > + jnz NEAR $L$ocb_enc_loop1 > + > +DB 102,15,56,220,209 > + movups xmm1,XMMWORD[16+r11] > + mov rax,r10 > + > +DB 102,15,56,221,215 > + DB 0F3h,0C3h ;repret > + > + > + > +global aesni_ocb_decrypt > + > +ALIGN 32 > +aesni_ocb_decrypt: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_aesni_ocb_decrypt: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + mov rcx,r9 > + mov r8,QWORD[40+rsp] > + mov r9,QWORD[48+rsp] > + > + > + > + lea rax,[rsp] > + push rbx > + > + push rbp > + > + push r12 > + > + push r13 > + > + push r14 > + > + lea rsp,[((-160))+rsp] > + movaps XMMWORD[rsp],xmm6 > + movaps XMMWORD[16+rsp],xmm7 > + movaps XMMWORD[32+rsp],xmm8 > + movaps XMMWORD[48+rsp],xmm9 > + movaps XMMWORD[64+rsp],xmm10 > + movaps XMMWORD[80+rsp],xmm11 > + movaps XMMWORD[96+rsp],xmm12 > + movaps XMMWORD[112+rsp],xmm13 > + movaps XMMWORD[128+rsp],xmm14 > + movaps XMMWORD[144+rsp],xmm15 > 
+$L$ocb_dec_body: > + mov rbx,QWORD[56+rax] > + mov rbp,QWORD[((56+8))+rax] > + > + mov r10d,DWORD[240+rcx] > + mov r11,rcx > + shl r10d,4 > + movups xmm9,XMMWORD[rcx] > + movups xmm1,XMMWORD[16+r10*1+rcx] > + > + movdqu xmm15,XMMWORD[r9] > + pxor xmm9,xmm1 > + pxor xmm15,xmm1 > + > + mov eax,16+32 > + lea rcx,[32+r10*1+r11] > + movups xmm1,XMMWORD[16+r11] > + sub rax,r10 > + mov r10,rax > + > + movdqu xmm10,XMMWORD[rbx] > + movdqu xmm8,XMMWORD[rbp] > + > + test r8,1 > + jnz NEAR $L$ocb_dec_odd > + > + bsf r12,r8 > + add r8,1 > + shl r12,4 > + movdqu xmm7,XMMWORD[r12*1+rbx] > + movdqu xmm2,XMMWORD[rdi] > + lea rdi,[16+rdi] > + > + call __ocb_decrypt1 > + > + movdqa xmm15,xmm7 > + movups XMMWORD[rsi],xmm2 > + xorps xmm8,xmm2 > + lea rsi,[16+rsi] > + sub rdx,1 > + jz NEAR $L$ocb_dec_done > + > +$L$ocb_dec_odd: > + lea r12,[1+r8] > + lea r13,[3+r8] > + lea r14,[5+r8] > + lea r8,[6+r8] > + bsf r12,r12 > + bsf r13,r13 > + bsf r14,r14 > + shl r12,4 > + shl r13,4 > + shl r14,4 > + > + sub rdx,6 > + jc NEAR $L$ocb_dec_short > + jmp NEAR $L$ocb_dec_grandloop > + > +ALIGN 32 > +$L$ocb_dec_grandloop: > + movdqu xmm2,XMMWORD[rdi] > + movdqu xmm3,XMMWORD[16+rdi] > + movdqu xmm4,XMMWORD[32+rdi] > + movdqu xmm5,XMMWORD[48+rdi] > + movdqu xmm6,XMMWORD[64+rdi] > + movdqu xmm7,XMMWORD[80+rdi] > + lea rdi,[96+rdi] > + > + call __ocb_decrypt6 > + > + movups XMMWORD[rsi],xmm2 > + pxor xmm8,xmm2 > + movups XMMWORD[16+rsi],xmm3 > + pxor xmm8,xmm3 > + movups XMMWORD[32+rsi],xmm4 > + pxor xmm8,xmm4 > + movups XMMWORD[48+rsi],xmm5 > + pxor xmm8,xmm5 > + movups XMMWORD[64+rsi],xmm6 > + pxor xmm8,xmm6 > + movups XMMWORD[80+rsi],xmm7 > + pxor xmm8,xmm7 > + lea rsi,[96+rsi] > + sub rdx,6 > + jnc NEAR $L$ocb_dec_grandloop > + > +$L$ocb_dec_short: > + add rdx,6 > + jz NEAR $L$ocb_dec_done > + > + movdqu xmm2,XMMWORD[rdi] > + cmp rdx,2 > + jb NEAR $L$ocb_dec_one > + movdqu xmm3,XMMWORD[16+rdi] > + je NEAR $L$ocb_dec_two > + > + movdqu xmm4,XMMWORD[32+rdi] > + cmp rdx,4 > + jb NEAR $L$ocb_dec_three > + movdqu xmm5,XMMWORD[48+rdi] > + je NEAR $L$ocb_dec_four > + > + movdqu xmm6,XMMWORD[64+rdi] > + pxor xmm7,xmm7 > + > + call __ocb_decrypt6 > + > + movdqa xmm15,xmm14 > + movups XMMWORD[rsi],xmm2 > + pxor xmm8,xmm2 > + movups XMMWORD[16+rsi],xmm3 > + pxor xmm8,xmm3 > + movups XMMWORD[32+rsi],xmm4 > + pxor xmm8,xmm4 > + movups XMMWORD[48+rsi],xmm5 > + pxor xmm8,xmm5 > + movups XMMWORD[64+rsi],xmm6 > + pxor xmm8,xmm6 > + > + jmp NEAR $L$ocb_dec_done > + > +ALIGN 16 > +$L$ocb_dec_one: > + movdqa xmm7,xmm10 > + > + call __ocb_decrypt1 > + > + movdqa xmm15,xmm7 > + movups XMMWORD[rsi],xmm2 > + xorps xmm8,xmm2 > + jmp NEAR $L$ocb_dec_done > + > +ALIGN 16 > +$L$ocb_dec_two: > + pxor xmm4,xmm4 > + pxor xmm5,xmm5 > + > + call __ocb_decrypt4 > + > + movdqa xmm15,xmm11 > + movups XMMWORD[rsi],xmm2 > + xorps xmm8,xmm2 > + movups XMMWORD[16+rsi],xmm3 > + xorps xmm8,xmm3 > + > + jmp NEAR $L$ocb_dec_done > + > +ALIGN 16 > +$L$ocb_dec_three: > + pxor xmm5,xmm5 > + > + call __ocb_decrypt4 > + > + movdqa xmm15,xmm12 > + movups XMMWORD[rsi],xmm2 > + xorps xmm8,xmm2 > + movups XMMWORD[16+rsi],xmm3 > + xorps xmm8,xmm3 > + movups XMMWORD[32+rsi],xmm4 > + xorps xmm8,xmm4 > + > + jmp NEAR $L$ocb_dec_done > + > +ALIGN 16 > +$L$ocb_dec_four: > + call __ocb_decrypt4 > + > + movdqa xmm15,xmm13 > + movups XMMWORD[rsi],xmm2 > + pxor xmm8,xmm2 > + movups XMMWORD[16+rsi],xmm3 > + pxor xmm8,xmm3 > + movups XMMWORD[32+rsi],xmm4 > + pxor xmm8,xmm4 > + movups XMMWORD[48+rsi],xmm5 > + pxor xmm8,xmm5 > + > +$L$ocb_dec_done: > + pxor xmm15,xmm0 > + movdqu 
XMMWORD[rbp],xmm8 > + movdqu XMMWORD[r9],xmm15 > + > + xorps xmm0,xmm0 > + pxor xmm1,xmm1 > + pxor xmm2,xmm2 > + pxor xmm3,xmm3 > + pxor xmm4,xmm4 > + pxor xmm5,xmm5 > + movaps xmm6,XMMWORD[rsp] > + movaps XMMWORD[rsp],xmm0 > + movaps xmm7,XMMWORD[16+rsp] > + movaps XMMWORD[16+rsp],xmm0 > + movaps xmm8,XMMWORD[32+rsp] > + movaps XMMWORD[32+rsp],xmm0 > + movaps xmm9,XMMWORD[48+rsp] > + movaps XMMWORD[48+rsp],xmm0 > + movaps xmm10,XMMWORD[64+rsp] > + movaps XMMWORD[64+rsp],xmm0 > + movaps xmm11,XMMWORD[80+rsp] > + movaps XMMWORD[80+rsp],xmm0 > + movaps xmm12,XMMWORD[96+rsp] > + movaps XMMWORD[96+rsp],xmm0 > + movaps xmm13,XMMWORD[112+rsp] > + movaps XMMWORD[112+rsp],xmm0 > + movaps xmm14,XMMWORD[128+rsp] > + movaps XMMWORD[128+rsp],xmm0 > + movaps xmm15,XMMWORD[144+rsp] > + movaps XMMWORD[144+rsp],xmm0 > + lea rax,[((160+40))+rsp] > +$L$ocb_dec_pop: > + mov r14,QWORD[((-40))+rax] > + > + mov r13,QWORD[((-32))+rax] > + > + mov r12,QWORD[((-24))+rax] > + > + mov rbp,QWORD[((-16))+rax] > + > + mov rbx,QWORD[((-8))+rax] > + > + lea rsp,[rax] > + > +$L$ocb_dec_epilogue: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_aesni_ocb_decrypt: > + > + > +ALIGN 32 > +__ocb_decrypt6: > + > + pxor xmm15,xmm9 > + movdqu xmm11,XMMWORD[r12*1+rbx] > + movdqa xmm12,xmm10 > + movdqu xmm13,XMMWORD[r13*1+rbx] > + movdqa xmm14,xmm10 > + pxor xmm10,xmm15 > + movdqu xmm15,XMMWORD[r14*1+rbx] > + pxor xmm11,xmm10 > + pxor xmm2,xmm10 > + pxor xmm12,xmm11 > + pxor xmm3,xmm11 > + pxor xmm13,xmm12 > + pxor xmm4,xmm12 > + pxor xmm14,xmm13 > + pxor xmm5,xmm13 > + pxor xmm15,xmm14 > + pxor xmm6,xmm14 > + pxor xmm7,xmm15 > + movups xmm0,XMMWORD[32+r11] > + > + lea r12,[1+r8] > + lea r13,[3+r8] > + lea r14,[5+r8] > + add r8,6 > + pxor xmm10,xmm9 > + bsf r12,r12 > + bsf r13,r13 > + bsf r14,r14 > + > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > + pxor xmm11,xmm9 > + pxor xmm12,xmm9 > +DB 102,15,56,222,241 > + pxor xmm13,xmm9 > + pxor xmm14,xmm9 > +DB 102,15,56,222,249 > + movups xmm1,XMMWORD[48+r11] > + pxor xmm15,xmm9 > + > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > +DB 102,15,56,222,240 > +DB 102,15,56,222,248 > + movups xmm0,XMMWORD[64+r11] > + shl r12,4 > + shl r13,4 > + jmp NEAR $L$ocb_dec_loop6 > + > +ALIGN 32 > +$L$ocb_dec_loop6: > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > +DB 102,15,56,222,241 > +DB 102,15,56,222,249 > + movups xmm1,XMMWORD[rax*1+rcx] > + add rax,32 > + > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > +DB 102,15,56,222,240 > +DB 102,15,56,222,248 > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > + jnz NEAR $L$ocb_dec_loop6 > + > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > +DB 102,15,56,222,241 > +DB 102,15,56,222,249 > + movups xmm1,XMMWORD[16+r11] > + shl r14,4 > + > +DB 102,65,15,56,223,210 > + movdqu xmm10,XMMWORD[rbx] > + mov rax,r10 > +DB 102,65,15,56,223,219 > +DB 102,65,15,56,223,228 > +DB 102,65,15,56,223,237 > +DB 102,65,15,56,223,246 > +DB 102,65,15,56,223,255 > + DB 0F3h,0C3h ;repret > + > + > + > + > +ALIGN 32 > +__ocb_decrypt4: > + > + pxor xmm15,xmm9 > + movdqu xmm11,XMMWORD[r12*1+rbx] > + movdqa xmm12,xmm10 > + movdqu xmm13,XMMWORD[r13*1+rbx] > + pxor xmm10,xmm15 > + pxor xmm11,xmm10 > + pxor xmm2,xmm10 > + pxor xmm12,xmm11 > + pxor xmm3,xmm11 > + pxor xmm13,xmm12 > + pxor 
xmm4,xmm12 > + pxor xmm5,xmm13 > + movups xmm0,XMMWORD[32+r11] > + > + pxor xmm10,xmm9 > + pxor xmm11,xmm9 > + pxor xmm12,xmm9 > + pxor xmm13,xmm9 > + > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > + movups xmm1,XMMWORD[48+r11] > + > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > + movups xmm0,XMMWORD[64+r11] > + jmp NEAR $L$ocb_dec_loop4 > + > +ALIGN 32 > +$L$ocb_dec_loop4: > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > + movups xmm1,XMMWORD[rax*1+rcx] > + add rax,32 > + > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > + jnz NEAR $L$ocb_dec_loop4 > + > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > + movups xmm1,XMMWORD[16+r11] > + mov rax,r10 > + > +DB 102,65,15,56,223,210 > +DB 102,65,15,56,223,219 > +DB 102,65,15,56,223,228 > +DB 102,65,15,56,223,237 > + DB 0F3h,0C3h ;repret > + > + > + > + > +ALIGN 32 > +__ocb_decrypt1: > + > + pxor xmm7,xmm15 > + pxor xmm7,xmm9 > + pxor xmm2,xmm7 > + movups xmm0,XMMWORD[32+r11] > + > +DB 102,15,56,222,209 > + movups xmm1,XMMWORD[48+r11] > + pxor xmm7,xmm9 > + > +DB 102,15,56,222,208 > + movups xmm0,XMMWORD[64+r11] > + jmp NEAR $L$ocb_dec_loop1 > + > +ALIGN 32 > +$L$ocb_dec_loop1: > +DB 102,15,56,222,209 > + movups xmm1,XMMWORD[rax*1+rcx] > + add rax,32 > + > +DB 102,15,56,222,208 > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > + jnz NEAR $L$ocb_dec_loop1 > + > +DB 102,15,56,222,209 > + movups xmm1,XMMWORD[16+r11] > + mov rax,r10 > + > +DB 102,15,56,223,215 > + DB 0F3h,0C3h ;repret > + > + > +global aesni_cbc_encrypt > + > +ALIGN 16 > +aesni_cbc_encrypt: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_aesni_cbc_encrypt: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + mov rcx,r9 > + mov r8,QWORD[40+rsp] > + mov r9,QWORD[48+rsp] > + > + > + > + test rdx,rdx > + jz NEAR $L$cbc_ret > + > + mov r10d,DWORD[240+rcx] > + mov r11,rcx > + test r9d,r9d > + jz NEAR $L$cbc_decrypt > + > + movups xmm2,XMMWORD[r8] > + mov eax,r10d > + cmp rdx,16 > + jb NEAR $L$cbc_enc_tail > + sub rdx,16 > + jmp NEAR $L$cbc_enc_loop > +ALIGN 16 > +$L$cbc_enc_loop: > + movups xmm3,XMMWORD[rdi] > + lea rdi,[16+rdi] > + > + movups xmm0,XMMWORD[rcx] > + movups xmm1,XMMWORD[16+rcx] > + xorps xmm3,xmm0 > + lea rcx,[32+rcx] > + xorps xmm2,xmm3 > +$L$oop_enc1_15: > +DB 102,15,56,220,209 > + dec eax > + movups xmm1,XMMWORD[rcx] > + lea rcx,[16+rcx] > + jnz NEAR $L$oop_enc1_15 > +DB 102,15,56,221,209 > + mov eax,r10d > + mov rcx,r11 > + movups XMMWORD[rsi],xmm2 > + lea rsi,[16+rsi] > + sub rdx,16 > + jnc NEAR $L$cbc_enc_loop > + add rdx,16 > + jnz NEAR $L$cbc_enc_tail > + pxor xmm0,xmm0 > + pxor xmm1,xmm1 > + movups XMMWORD[r8],xmm2 > + pxor xmm2,xmm2 > + pxor xmm3,xmm3 > + jmp NEAR $L$cbc_ret > + > +$L$cbc_enc_tail: > + mov rcx,rdx > + xchg rsi,rdi > + DD 0x9066A4F3 > + mov ecx,16 > + sub rcx,rdx > + xor eax,eax > + DD 0x9066AAF3 > + lea rdi,[((-16))+rdi] > + mov eax,r10d > + mov rsi,rdi > + mov rcx,r11 > + xor rdx,rdx > + jmp NEAR $L$cbc_enc_loop > + > +ALIGN 16 > +$L$cbc_decrypt: > + cmp rdx,16 > + jne NEAR $L$cbc_decrypt_bulk > + > + > + > + movdqu xmm2,XMMWORD[rdi] > + movdqu xmm3,XMMWORD[r8] > + movdqa xmm4,xmm2 > + movups xmm0,XMMWORD[rcx] > + movups xmm1,XMMWORD[16+rcx] > + lea rcx,[32+rcx] > + xorps xmm2,xmm0 > +$L$oop_dec1_16: > +DB 
102,15,56,222,209 > + dec r10d > + movups xmm1,XMMWORD[rcx] > + lea rcx,[16+rcx] > + jnz NEAR $L$oop_dec1_16 > +DB 102,15,56,223,209 > + pxor xmm0,xmm0 > + pxor xmm1,xmm1 > + movdqu XMMWORD[r8],xmm4 > + xorps xmm2,xmm3 > + pxor xmm3,xmm3 > + movups XMMWORD[rsi],xmm2 > + pxor xmm2,xmm2 > + jmp NEAR $L$cbc_ret > +ALIGN 16 > +$L$cbc_decrypt_bulk: > + lea r11,[rsp] > + > + push rbp > + > + sub rsp,176 > + and rsp,-16 > + movaps XMMWORD[16+rsp],xmm6 > + movaps XMMWORD[32+rsp],xmm7 > + movaps XMMWORD[48+rsp],xmm8 > + movaps XMMWORD[64+rsp],xmm9 > + movaps XMMWORD[80+rsp],xmm10 > + movaps XMMWORD[96+rsp],xmm11 > + movaps XMMWORD[112+rsp],xmm12 > + movaps XMMWORD[128+rsp],xmm13 > + movaps XMMWORD[144+rsp],xmm14 > + movaps XMMWORD[160+rsp],xmm15 > +$L$cbc_decrypt_body: > + mov rbp,rcx > + movups xmm10,XMMWORD[r8] > + mov eax,r10d > + cmp rdx,0x50 > + jbe NEAR $L$cbc_dec_tail > + > + movups xmm0,XMMWORD[rcx] > + movdqu xmm2,XMMWORD[rdi] > + movdqu xmm3,XMMWORD[16+rdi] > + movdqa xmm11,xmm2 > + movdqu xmm4,XMMWORD[32+rdi] > + movdqa xmm12,xmm3 > + movdqu xmm5,XMMWORD[48+rdi] > + movdqa xmm13,xmm4 > + movdqu xmm6,XMMWORD[64+rdi] > + movdqa xmm14,xmm5 > + movdqu xmm7,XMMWORD[80+rdi] > + movdqa xmm15,xmm6 > + mov r9d,DWORD[((OPENSSL_ia32cap_P+4))] > + cmp rdx,0x70 > + jbe NEAR $L$cbc_dec_six_or_seven > + > + and r9d,71303168 > + sub rdx,0x50 > + cmp r9d,4194304 > + je NEAR $L$cbc_dec_loop6_enter > + sub rdx,0x20 > + lea rcx,[112+rcx] > + jmp NEAR $L$cbc_dec_loop8_enter > +ALIGN 16 > +$L$cbc_dec_loop8: > + movups XMMWORD[rsi],xmm9 > + lea rsi,[16+rsi] > +$L$cbc_dec_loop8_enter: > + movdqu xmm8,XMMWORD[96+rdi] > + pxor xmm2,xmm0 > + movdqu xmm9,XMMWORD[112+rdi] > + pxor xmm3,xmm0 > + movups xmm1,XMMWORD[((16-112))+rcx] > + pxor xmm4,xmm0 > + mov rbp,-1 > + cmp rdx,0x70 > + pxor xmm5,xmm0 > + pxor xmm6,xmm0 > + pxor xmm7,xmm0 > + pxor xmm8,xmm0 > + > +DB 102,15,56,222,209 > + pxor xmm9,xmm0 > + movups xmm0,XMMWORD[((32-112))+rcx] > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > +DB 102,15,56,222,241 > +DB 102,15,56,222,249 > +DB 102,68,15,56,222,193 > + adc rbp,0 > + and rbp,128 > +DB 102,68,15,56,222,201 > + add rbp,rdi > + movups xmm1,XMMWORD[((48-112))+rcx] > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > +DB 102,15,56,222,240 > +DB 102,15,56,222,248 > +DB 102,68,15,56,222,192 > +DB 102,68,15,56,222,200 > + movups xmm0,XMMWORD[((64-112))+rcx] > + nop > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > +DB 102,15,56,222,241 > +DB 102,15,56,222,249 > +DB 102,68,15,56,222,193 > +DB 102,68,15,56,222,201 > + movups xmm1,XMMWORD[((80-112))+rcx] > + nop > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > +DB 102,15,56,222,240 > +DB 102,15,56,222,248 > +DB 102,68,15,56,222,192 > +DB 102,68,15,56,222,200 > + movups xmm0,XMMWORD[((96-112))+rcx] > + nop > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > +DB 102,15,56,222,241 > +DB 102,15,56,222,249 > +DB 102,68,15,56,222,193 > +DB 102,68,15,56,222,201 > + movups xmm1,XMMWORD[((112-112))+rcx] > + nop > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > +DB 102,15,56,222,240 > +DB 102,15,56,222,248 > +DB 102,68,15,56,222,192 > +DB 102,68,15,56,222,200 > + movups xmm0,XMMWORD[((128-112))+rcx] > + nop > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > +DB 
102,15,56,222,241 > +DB 102,15,56,222,249 > +DB 102,68,15,56,222,193 > +DB 102,68,15,56,222,201 > + movups xmm1,XMMWORD[((144-112))+rcx] > + cmp eax,11 > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > +DB 102,15,56,222,240 > +DB 102,15,56,222,248 > +DB 102,68,15,56,222,192 > +DB 102,68,15,56,222,200 > + movups xmm0,XMMWORD[((160-112))+rcx] > + jb NEAR $L$cbc_dec_done > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > +DB 102,15,56,222,241 > +DB 102,15,56,222,249 > +DB 102,68,15,56,222,193 > +DB 102,68,15,56,222,201 > + movups xmm1,XMMWORD[((176-112))+rcx] > + nop > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > +DB 102,15,56,222,240 > +DB 102,15,56,222,248 > +DB 102,68,15,56,222,192 > +DB 102,68,15,56,222,200 > + movups xmm0,XMMWORD[((192-112))+rcx] > + je NEAR $L$cbc_dec_done > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > +DB 102,15,56,222,241 > +DB 102,15,56,222,249 > +DB 102,68,15,56,222,193 > +DB 102,68,15,56,222,201 > + movups xmm1,XMMWORD[((208-112))+rcx] > + nop > +DB 102,15,56,222,208 > +DB 102,15,56,222,216 > +DB 102,15,56,222,224 > +DB 102,15,56,222,232 > +DB 102,15,56,222,240 > +DB 102,15,56,222,248 > +DB 102,68,15,56,222,192 > +DB 102,68,15,56,222,200 > + movups xmm0,XMMWORD[((224-112))+rcx] > + jmp NEAR $L$cbc_dec_done > +ALIGN 16 > +$L$cbc_dec_done: > +DB 102,15,56,222,209 > +DB 102,15,56,222,217 > + pxor xmm10,xmm0 > + pxor xmm11,xmm0 > +DB 102,15,56,222,225 > +DB 102,15,56,222,233 > + pxor xmm12,xmm0 > + pxor xmm13,xmm0 > +DB 102,15,56,222,241 > +DB 102,15,56,222,249 > + pxor xmm14,xmm0 > + pxor xmm15,xmm0 > +DB 102,68,15,56,222,193 > +DB 102,68,15,56,222,201 > + movdqu xmm1,XMMWORD[80+rdi] > + > +DB 102,65,15,56,223,210 > + movdqu xmm10,XMMWORD[96+rdi] > + pxor xmm1,xmm0 > +DB 102,65,15,56,223,219 > + pxor xmm10,xmm0 > + movdqu xmm0,XMMWORD[112+rdi] > +DB 102,65,15,56,223,228 > + lea rdi,[128+rdi] > + movdqu xmm11,XMMWORD[rbp] > +DB 102,65,15,56,223,237 > +DB 102,65,15,56,223,246 > + movdqu xmm12,XMMWORD[16+rbp] > + movdqu xmm13,XMMWORD[32+rbp] > +DB 102,65,15,56,223,255 > +DB 102,68,15,56,223,193 > + movdqu xmm14,XMMWORD[48+rbp] > + movdqu xmm15,XMMWORD[64+rbp] > +DB 102,69,15,56,223,202 > + movdqa xmm10,xmm0 > + movdqu xmm1,XMMWORD[80+rbp] > + movups xmm0,XMMWORD[((-112))+rcx] > + > + movups XMMWORD[rsi],xmm2 > + movdqa xmm2,xmm11 > + movups XMMWORD[16+rsi],xmm3 > + movdqa xmm3,xmm12 > + movups XMMWORD[32+rsi],xmm4 > + movdqa xmm4,xmm13 > + movups XMMWORD[48+rsi],xmm5 > + movdqa xmm5,xmm14 > + movups XMMWORD[64+rsi],xmm6 > + movdqa xmm6,xmm15 > + movups XMMWORD[80+rsi],xmm7 > + movdqa xmm7,xmm1 > + movups XMMWORD[96+rsi],xmm8 > + lea rsi,[112+rsi] > + > + sub rdx,0x80 > + ja NEAR $L$cbc_dec_loop8 > + > + movaps xmm2,xmm9 > + lea rcx,[((-112))+rcx] > + add rdx,0x70 > + jle NEAR $L$cbc_dec_clear_tail_collected > + movups XMMWORD[rsi],xmm9 > + lea rsi,[16+rsi] > + cmp rdx,0x50 > + jbe NEAR $L$cbc_dec_tail > + > + movaps xmm2,xmm11 > +$L$cbc_dec_six_or_seven: > + cmp rdx,0x60 > + ja NEAR $L$cbc_dec_seven > + > + movaps xmm8,xmm7 > + call _aesni_decrypt6 > + pxor xmm2,xmm10 > + movaps xmm10,xmm8 > + pxor xmm3,xmm11 > + movdqu XMMWORD[rsi],xmm2 > + pxor xmm4,xmm12 > + movdqu XMMWORD[16+rsi],xmm3 > + pxor xmm3,xmm3 > + pxor xmm5,xmm13 > + movdqu XMMWORD[32+rsi],xmm4 > + pxor xmm4,xmm4 > + pxor xmm6,xmm14 > + movdqu XMMWORD[48+rsi],xmm5 > + pxor xmm5,xmm5 > + pxor xmm7,xmm15 > + movdqu 
XMMWORD[64+rsi],xmm6 > + pxor xmm6,xmm6 > + lea rsi,[80+rsi] > + movdqa xmm2,xmm7 > + pxor xmm7,xmm7 > + jmp NEAR $L$cbc_dec_tail_collected > + > +ALIGN 16 > +$L$cbc_dec_seven: > + movups xmm8,XMMWORD[96+rdi] > + xorps xmm9,xmm9 > + call _aesni_decrypt8 > + movups xmm9,XMMWORD[80+rdi] > + pxor xmm2,xmm10 > + movups xmm10,XMMWORD[96+rdi] > + pxor xmm3,xmm11 > + movdqu XMMWORD[rsi],xmm2 > + pxor xmm4,xmm12 > + movdqu XMMWORD[16+rsi],xmm3 > + pxor xmm3,xmm3 > + pxor xmm5,xmm13 > + movdqu XMMWORD[32+rsi],xmm4 > + pxor xmm4,xmm4 > + pxor xmm6,xmm14 > + movdqu XMMWORD[48+rsi],xmm5 > + pxor xmm5,xmm5 > + pxor xmm7,xmm15 > + movdqu XMMWORD[64+rsi],xmm6 > + pxor xmm6,xmm6 > + pxor xmm8,xmm9 > + movdqu XMMWORD[80+rsi],xmm7 > + pxor xmm7,xmm7 > + lea rsi,[96+rsi] > + movdqa xmm2,xmm8 > + pxor xmm8,xmm8 > + pxor xmm9,xmm9 > + jmp NEAR $L$cbc_dec_tail_collected > + > +ALIGN 16 > +$L$cbc_dec_loop6: > + movups XMMWORD[rsi],xmm7 > + lea rsi,[16+rsi] > + movdqu xmm2,XMMWORD[rdi] > + movdqu xmm3,XMMWORD[16+rdi] > + movdqa xmm11,xmm2 > + movdqu xmm4,XMMWORD[32+rdi] > + movdqa xmm12,xmm3 > + movdqu xmm5,XMMWORD[48+rdi] > + movdqa xmm13,xmm4 > + movdqu xmm6,XMMWORD[64+rdi] > + movdqa xmm14,xmm5 > + movdqu xmm7,XMMWORD[80+rdi] > + movdqa xmm15,xmm6 > +$L$cbc_dec_loop6_enter: > + lea rdi,[96+rdi] > + movdqa xmm8,xmm7 > + > + call _aesni_decrypt6 > + > + pxor xmm2,xmm10 > + movdqa xmm10,xmm8 > + pxor xmm3,xmm11 > + movdqu XMMWORD[rsi],xmm2 > + pxor xmm4,xmm12 > + movdqu XMMWORD[16+rsi],xmm3 > + pxor xmm5,xmm13 > + movdqu XMMWORD[32+rsi],xmm4 > + pxor xmm6,xmm14 > + mov rcx,rbp > + movdqu XMMWORD[48+rsi],xmm5 > + pxor xmm7,xmm15 > + mov eax,r10d > + movdqu XMMWORD[64+rsi],xmm6 > + lea rsi,[80+rsi] > + sub rdx,0x60 > + ja NEAR $L$cbc_dec_loop6 > + > + movdqa xmm2,xmm7 > + add rdx,0x50 > + jle NEAR $L$cbc_dec_clear_tail_collected > + movups XMMWORD[rsi],xmm7 > + lea rsi,[16+rsi] > + > +$L$cbc_dec_tail: > + movups xmm2,XMMWORD[rdi] > + sub rdx,0x10 > + jbe NEAR $L$cbc_dec_one > + > + movups xmm3,XMMWORD[16+rdi] > + movaps xmm11,xmm2 > + sub rdx,0x10 > + jbe NEAR $L$cbc_dec_two > + > + movups xmm4,XMMWORD[32+rdi] > + movaps xmm12,xmm3 > + sub rdx,0x10 > + jbe NEAR $L$cbc_dec_three > + > + movups xmm5,XMMWORD[48+rdi] > + movaps xmm13,xmm4 > + sub rdx,0x10 > + jbe NEAR $L$cbc_dec_four > + > + movups xmm6,XMMWORD[64+rdi] > + movaps xmm14,xmm5 > + movaps xmm15,xmm6 > + xorps xmm7,xmm7 > + call _aesni_decrypt6 > + pxor xmm2,xmm10 > + movaps xmm10,xmm15 > + pxor xmm3,xmm11 > + movdqu XMMWORD[rsi],xmm2 > + pxor xmm4,xmm12 > + movdqu XMMWORD[16+rsi],xmm3 > + pxor xmm3,xmm3 > + pxor xmm5,xmm13 > + movdqu XMMWORD[32+rsi],xmm4 > + pxor xmm4,xmm4 > + pxor xmm6,xmm14 > + movdqu XMMWORD[48+rsi],xmm5 > + pxor xmm5,xmm5 > + lea rsi,[64+rsi] > + movdqa xmm2,xmm6 > + pxor xmm6,xmm6 > + pxor xmm7,xmm7 > + sub rdx,0x10 > + jmp NEAR $L$cbc_dec_tail_collected > + > +ALIGN 16 > +$L$cbc_dec_one: > + movaps xmm11,xmm2 > + movups xmm0,XMMWORD[rcx] > + movups xmm1,XMMWORD[16+rcx] > + lea rcx,[32+rcx] > + xorps xmm2,xmm0 > +$L$oop_dec1_17: > +DB 102,15,56,222,209 > + dec eax > + movups xmm1,XMMWORD[rcx] > + lea rcx,[16+rcx] > + jnz NEAR $L$oop_dec1_17 > +DB 102,15,56,223,209 > + xorps xmm2,xmm10 > + movaps xmm10,xmm11 > + jmp NEAR $L$cbc_dec_tail_collected > +ALIGN 16 > +$L$cbc_dec_two: > + movaps xmm12,xmm3 > + call _aesni_decrypt2 > + pxor xmm2,xmm10 > + movaps xmm10,xmm12 > + pxor xmm3,xmm11 > + movdqu XMMWORD[rsi],xmm2 > + movdqa xmm2,xmm3 > + pxor xmm3,xmm3 > + lea rsi,[16+rsi] > + jmp NEAR $L$cbc_dec_tail_collected > +ALIGN 16 > 
+$L$cbc_dec_three: > + movaps xmm13,xmm4 > + call _aesni_decrypt3 > + pxor xmm2,xmm10 > + movaps xmm10,xmm13 > + pxor xmm3,xmm11 > + movdqu XMMWORD[rsi],xmm2 > + pxor xmm4,xmm12 > + movdqu XMMWORD[16+rsi],xmm3 > + pxor xmm3,xmm3 > + movdqa xmm2,xmm4 > + pxor xmm4,xmm4 > + lea rsi,[32+rsi] > + jmp NEAR $L$cbc_dec_tail_collected > +ALIGN 16 > +$L$cbc_dec_four: > + movaps xmm14,xmm5 > + call _aesni_decrypt4 > + pxor xmm2,xmm10 > + movaps xmm10,xmm14 > + pxor xmm3,xmm11 > + movdqu XMMWORD[rsi],xmm2 > + pxor xmm4,xmm12 > + movdqu XMMWORD[16+rsi],xmm3 > + pxor xmm3,xmm3 > + pxor xmm5,xmm13 > + movdqu XMMWORD[32+rsi],xmm4 > + pxor xmm4,xmm4 > + movdqa xmm2,xmm5 > + pxor xmm5,xmm5 > + lea rsi,[48+rsi] > + jmp NEAR $L$cbc_dec_tail_collected > + > +ALIGN 16 > +$L$cbc_dec_clear_tail_collected: > + pxor xmm3,xmm3 > + pxor xmm4,xmm4 > + pxor xmm5,xmm5 > +$L$cbc_dec_tail_collected: > + movups XMMWORD[r8],xmm10 > + and rdx,15 > + jnz NEAR $L$cbc_dec_tail_partial > + movups XMMWORD[rsi],xmm2 > + pxor xmm2,xmm2 > + jmp NEAR $L$cbc_dec_ret > +ALIGN 16 > +$L$cbc_dec_tail_partial: > + movaps XMMWORD[rsp],xmm2 > + pxor xmm2,xmm2 > + mov rcx,16 > + mov rdi,rsi > + sub rcx,rdx > + lea rsi,[rsp] > + DD 0x9066A4F3 > + movdqa XMMWORD[rsp],xmm2 > + > +$L$cbc_dec_ret: > + xorps xmm0,xmm0 > + pxor xmm1,xmm1 > + movaps xmm6,XMMWORD[16+rsp] > + movaps XMMWORD[16+rsp],xmm0 > + movaps xmm7,XMMWORD[32+rsp] > + movaps XMMWORD[32+rsp],xmm0 > + movaps xmm8,XMMWORD[48+rsp] > + movaps XMMWORD[48+rsp],xmm0 > + movaps xmm9,XMMWORD[64+rsp] > + movaps XMMWORD[64+rsp],xmm0 > + movaps xmm10,XMMWORD[80+rsp] > + movaps XMMWORD[80+rsp],xmm0 > + movaps xmm11,XMMWORD[96+rsp] > + movaps XMMWORD[96+rsp],xmm0 > + movaps xmm12,XMMWORD[112+rsp] > + movaps XMMWORD[112+rsp],xmm0 > + movaps xmm13,XMMWORD[128+rsp] > + movaps XMMWORD[128+rsp],xmm0 > + movaps xmm14,XMMWORD[144+rsp] > + movaps XMMWORD[144+rsp],xmm0 > + movaps xmm15,XMMWORD[160+rsp] > + movaps XMMWORD[160+rsp],xmm0 > + mov rbp,QWORD[((-8))+r11] > + > + lea rsp,[r11] > + > +$L$cbc_ret: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_aesni_cbc_encrypt: > +global aesni_set_decrypt_key > + > +ALIGN 16 > +aesni_set_decrypt_key: > + > +DB 0x48,0x83,0xEC,0x08 > + > + call __aesni_set_encrypt_key > + shl edx,4 > + test eax,eax > + jnz NEAR $L$dec_key_ret > + lea rcx,[16+rdx*1+r8] > + > + movups xmm0,XMMWORD[r8] > + movups xmm1,XMMWORD[rcx] > + movups XMMWORD[rcx],xmm0 > + movups XMMWORD[r8],xmm1 > + lea r8,[16+r8] > + lea rcx,[((-16))+rcx] > + > +$L$dec_key_inverse: > + movups xmm0,XMMWORD[r8] > + movups xmm1,XMMWORD[rcx] > +DB 102,15,56,219,192 > +DB 102,15,56,219,201 > + lea r8,[16+r8] > + lea rcx,[((-16))+rcx] > + movups XMMWORD[16+rcx],xmm0 > + movups XMMWORD[(-16)+r8],xmm1 > + cmp rcx,r8 > + ja NEAR $L$dec_key_inverse > + > + movups xmm0,XMMWORD[r8] > +DB 102,15,56,219,192 > + pxor xmm1,xmm1 > + movups XMMWORD[rcx],xmm0 > + pxor xmm0,xmm0 > +$L$dec_key_ret: > + add rsp,8 > + > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_set_decrypt_key: > + > +global aesni_set_encrypt_key > + > +ALIGN 16 > +aesni_set_encrypt_key: > +__aesni_set_encrypt_key: > + > +DB 0x48,0x83,0xEC,0x08 > + > + mov rax,-1 > + test rcx,rcx > + jz NEAR $L$enc_key_ret > + test r8,r8 > + jz NEAR $L$enc_key_ret > + > + mov r10d,268437504 > + movups xmm0,XMMWORD[rcx] > + xorps xmm4,xmm4 > + and r10d,DWORD[((OPENSSL_ia32cap_P+4))] > + lea rax,[16+r8] > + cmp edx,256 > + je NEAR $L$14rounds > + cmp edx,192 > + je NEAR $L$12rounds > + cmp edx,128 > + jne NEAR 
$L$bad_keybits > + > +$L$10rounds: > + mov edx,9 > + cmp r10d,268435456 > + je NEAR $L$10rounds_alt > + > + movups XMMWORD[r8],xmm0 > +DB 102,15,58,223,200,1 > + call $L$key_expansion_128_cold > +DB 102,15,58,223,200,2 > + call $L$key_expansion_128 > +DB 102,15,58,223,200,4 > + call $L$key_expansion_128 > +DB 102,15,58,223,200,8 > + call $L$key_expansion_128 > +DB 102,15,58,223,200,16 > + call $L$key_expansion_128 > +DB 102,15,58,223,200,32 > + call $L$key_expansion_128 > +DB 102,15,58,223,200,64 > + call $L$key_expansion_128 > +DB 102,15,58,223,200,128 > + call $L$key_expansion_128 > +DB 102,15,58,223,200,27 > + call $L$key_expansion_128 > +DB 102,15,58,223,200,54 > + call $L$key_expansion_128 > + movups XMMWORD[rax],xmm0 > + mov DWORD[80+rax],edx > + xor eax,eax > + jmp NEAR $L$enc_key_ret > + > +ALIGN 16 > +$L$10rounds_alt: > + movdqa xmm5,XMMWORD[$L$key_rotate] > + mov r10d,8 > + movdqa xmm4,XMMWORD[$L$key_rcon1] > + movdqa xmm2,xmm0 > + movdqu XMMWORD[r8],xmm0 > + jmp NEAR $L$oop_key128 > + > +ALIGN 16 > +$L$oop_key128: > +DB 102,15,56,0,197 > +DB 102,15,56,221,196 > + pslld xmm4,1 > + lea rax,[16+rax] > + > + movdqa xmm3,xmm2 > + pslldq xmm2,4 > + pxor xmm3,xmm2 > + pslldq xmm2,4 > + pxor xmm3,xmm2 > + pslldq xmm2,4 > + pxor xmm2,xmm3 > + > + pxor xmm0,xmm2 > + movdqu XMMWORD[(-16)+rax],xmm0 > + movdqa xmm2,xmm0 > + > + dec r10d > + jnz NEAR $L$oop_key128 > + > + movdqa xmm4,XMMWORD[$L$key_rcon1b] > + > +DB 102,15,56,0,197 > +DB 102,15,56,221,196 > + pslld xmm4,1 > + > + movdqa xmm3,xmm2 > + pslldq xmm2,4 > + pxor xmm3,xmm2 > + pslldq xmm2,4 > + pxor xmm3,xmm2 > + pslldq xmm2,4 > + pxor xmm2,xmm3 > + > + pxor xmm0,xmm2 > + movdqu XMMWORD[rax],xmm0 > + > + movdqa xmm2,xmm0 > +DB 102,15,56,0,197 > +DB 102,15,56,221,196 > + > + movdqa xmm3,xmm2 > + pslldq xmm2,4 > + pxor xmm3,xmm2 > + pslldq xmm2,4 > + pxor xmm3,xmm2 > + pslldq xmm2,4 > + pxor xmm2,xmm3 > + > + pxor xmm0,xmm2 > + movdqu XMMWORD[16+rax],xmm0 > + > + mov DWORD[96+rax],edx > + xor eax,eax > + jmp NEAR $L$enc_key_ret > + > +ALIGN 16 > +$L$12rounds: > + movq xmm2,QWORD[16+rcx] > + mov edx,11 > + cmp r10d,268435456 > + je NEAR $L$12rounds_alt > + > + movups XMMWORD[r8],xmm0 > +DB 102,15,58,223,202,1 > + call $L$key_expansion_192a_cold > +DB 102,15,58,223,202,2 > + call $L$key_expansion_192b > +DB 102,15,58,223,202,4 > + call $L$key_expansion_192a > +DB 102,15,58,223,202,8 > + call $L$key_expansion_192b > +DB 102,15,58,223,202,16 > + call $L$key_expansion_192a > +DB 102,15,58,223,202,32 > + call $L$key_expansion_192b > +DB 102,15,58,223,202,64 > + call $L$key_expansion_192a > +DB 102,15,58,223,202,128 > + call $L$key_expansion_192b > + movups XMMWORD[rax],xmm0 > + mov DWORD[48+rax],edx > + xor rax,rax > + jmp NEAR $L$enc_key_ret > + > +ALIGN 16 > +$L$12rounds_alt: > + movdqa xmm5,XMMWORD[$L$key_rotate192] > + movdqa xmm4,XMMWORD[$L$key_rcon1] > + mov r10d,8 > + movdqu XMMWORD[r8],xmm0 > + jmp NEAR $L$oop_key192 > + > +ALIGN 16 > +$L$oop_key192: > + movq QWORD[rax],xmm2 > + movdqa xmm1,xmm2 > +DB 102,15,56,0,213 > +DB 102,15,56,221,212 > + pslld xmm4,1 > + lea rax,[24+rax] > + > + movdqa xmm3,xmm0 > + pslldq xmm0,4 > + pxor xmm3,xmm0 > + pslldq xmm0,4 > + pxor xmm3,xmm0 > + pslldq xmm0,4 > + pxor xmm0,xmm3 > + > + pshufd xmm3,xmm0,0xff > + pxor xmm3,xmm1 > + pslldq xmm1,4 > + pxor xmm3,xmm1 > + > + pxor xmm0,xmm2 > + pxor xmm2,xmm3 > + movdqu XMMWORD[(-16)+rax],xmm0 > + > + dec r10d > + jnz NEAR $L$oop_key192 > + > + mov DWORD[32+rax],edx > + xor eax,eax > + jmp NEAR $L$enc_key_ret > + > +ALIGN 16 > +$L$14rounds: > + movups 
xmm2,XMMWORD[16+rcx] > + mov edx,13 > + lea rax,[16+rax] > + cmp r10d,268435456 > + je NEAR $L$14rounds_alt > + > + movups XMMWORD[r8],xmm0 > + movups XMMWORD[16+r8],xmm2 > +DB 102,15,58,223,202,1 > + call $L$key_expansion_256a_cold > +DB 102,15,58,223,200,1 > + call $L$key_expansion_256b > +DB 102,15,58,223,202,2 > + call $L$key_expansion_256a > +DB 102,15,58,223,200,2 > + call $L$key_expansion_256b > +DB 102,15,58,223,202,4 > + call $L$key_expansion_256a > +DB 102,15,58,223,200,4 > + call $L$key_expansion_256b > +DB 102,15,58,223,202,8 > + call $L$key_expansion_256a > +DB 102,15,58,223,200,8 > + call $L$key_expansion_256b > +DB 102,15,58,223,202,16 > + call $L$key_expansion_256a > +DB 102,15,58,223,200,16 > + call $L$key_expansion_256b > +DB 102,15,58,223,202,32 > + call $L$key_expansion_256a > +DB 102,15,58,223,200,32 > + call $L$key_expansion_256b > +DB 102,15,58,223,202,64 > + call $L$key_expansion_256a > + movups XMMWORD[rax],xmm0 > + mov DWORD[16+rax],edx > + xor rax,rax > + jmp NEAR $L$enc_key_ret > + > +ALIGN 16 > +$L$14rounds_alt: > + movdqa xmm5,XMMWORD[$L$key_rotate] > + movdqa xmm4,XMMWORD[$L$key_rcon1] > + mov r10d,7 > + movdqu XMMWORD[r8],xmm0 > + movdqa xmm1,xmm2 > + movdqu XMMWORD[16+r8],xmm2 > + jmp NEAR $L$oop_key256 > + > +ALIGN 16 > +$L$oop_key256: > +DB 102,15,56,0,213 > +DB 102,15,56,221,212 > + > + movdqa xmm3,xmm0 > + pslldq xmm0,4 > + pxor xmm3,xmm0 > + pslldq xmm0,4 > + pxor xmm3,xmm0 > + pslldq xmm0,4 > + pxor xmm0,xmm3 > + pslld xmm4,1 > + > + pxor xmm0,xmm2 > + movdqu XMMWORD[rax],xmm0 > + > + dec r10d > + jz NEAR $L$done_key256 > + > + pshufd xmm2,xmm0,0xff > + pxor xmm3,xmm3 > +DB 102,15,56,221,211 > + > + movdqa xmm3,xmm1 > + pslldq xmm1,4 > + pxor xmm3,xmm1 > + pslldq xmm1,4 > + pxor xmm3,xmm1 > + pslldq xmm1,4 > + pxor xmm1,xmm3 > + > + pxor xmm2,xmm1 > + movdqu XMMWORD[16+rax],xmm2 > + lea rax,[32+rax] > + movdqa xmm1,xmm2 > + > + jmp NEAR $L$oop_key256 > + > +$L$done_key256: > + mov DWORD[16+rax],edx > + xor eax,eax > + jmp NEAR $L$enc_key_ret > + > +ALIGN 16 > +$L$bad_keybits: > + mov rax,-2 > +$L$enc_key_ret: > + pxor xmm0,xmm0 > + pxor xmm1,xmm1 > + pxor xmm2,xmm2 > + pxor xmm3,xmm3 > + pxor xmm4,xmm4 > + pxor xmm5,xmm5 > + add rsp,8 > + > + DB 0F3h,0C3h ;repret > +$L$SEH_end_set_encrypt_key: > + > +ALIGN 16 > +$L$key_expansion_128: > + movups XMMWORD[rax],xmm0 > + lea rax,[16+rax] > +$L$key_expansion_128_cold: > + shufps xmm4,xmm0,16 > + xorps xmm0,xmm4 > + shufps xmm4,xmm0,140 > + xorps xmm0,xmm4 > + shufps xmm1,xmm1,255 > + xorps xmm0,xmm1 > + DB 0F3h,0C3h ;repret > + > +ALIGN 16 > +$L$key_expansion_192a: > + movups XMMWORD[rax],xmm0 > + lea rax,[16+rax] > +$L$key_expansion_192a_cold: > + movaps xmm5,xmm2 > +$L$key_expansion_192b_warm: > + shufps xmm4,xmm0,16 > + movdqa xmm3,xmm2 > + xorps xmm0,xmm4 > + shufps xmm4,xmm0,140 > + pslldq xmm3,4 > + xorps xmm0,xmm4 > + pshufd xmm1,xmm1,85 > + pxor xmm2,xmm3 > + pxor xmm0,xmm1 > + pshufd xmm3,xmm0,255 > + pxor xmm2,xmm3 > + DB 0F3h,0C3h ;repret > + > +ALIGN 16 > +$L$key_expansion_192b: > + movaps xmm3,xmm0 > + shufps xmm5,xmm0,68 > + movups XMMWORD[rax],xmm5 > + shufps xmm3,xmm2,78 > + movups XMMWORD[16+rax],xmm3 > + lea rax,[32+rax] > + jmp NEAR $L$key_expansion_192b_warm > + > +ALIGN 16 > +$L$key_expansion_256a: > + movups XMMWORD[rax],xmm2 > + lea rax,[16+rax] > +$L$key_expansion_256a_cold: > + shufps xmm4,xmm0,16 > + xorps xmm0,xmm4 > + shufps xmm4,xmm0,140 > + xorps xmm0,xmm4 > + shufps xmm1,xmm1,255 > + xorps xmm0,xmm1 > + DB 0F3h,0C3h ;repret > + > +ALIGN 16 > +$L$key_expansion_256b: > + movups 
XMMWORD[rax],xmm0 > + lea rax,[16+rax] > + > + shufps xmm4,xmm2,16 > + xorps xmm2,xmm4 > + shufps xmm4,xmm2,140 > + xorps xmm2,xmm4 > + shufps xmm1,xmm1,170 > + xorps xmm2,xmm1 > + DB 0F3h,0C3h ;repret > + > + > + > +ALIGN 64 > +$L$bswap_mask: > +DB 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0 > +$L$increment32: > + DD 6,6,6,0 > +$L$increment64: > + DD 1,0,0,0 > +$L$xts_magic: > + DD 0x87,0,1,0 > +$L$increment1: > +DB 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1 > +$L$key_rotate: > + DD 0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d > +$L$key_rotate192: > + DD 0x04070605,0x04070605,0x04070605,0x04070605 > +$L$key_rcon1: > + DD 1,1,1,1 > +$L$key_rcon1b: > + DD 0x1b,0x1b,0x1b,0x1b > + > +DB 65,69,83,32,102,111,114,32,73,110,116,101,108,32,65,69 > +DB 83,45,78,73,44,32,67,82,89,80,84,79,71,65,77,83 > +DB 32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115 > +DB 115,108,46,111,114,103,62,0 > +ALIGN 64 > +EXTERN __imp_RtlVirtualUnwind > + > +ALIGN 16 > +ecb_ccm64_se_handler: > + push rsi > + push rdi > + push rbx > + push rbp > + push r12 > + push r13 > + push r14 > + push r15 > + pushfq > + sub rsp,64 > + > + mov rax,QWORD[120+r8] > + mov rbx,QWORD[248+r8] > + > + mov rsi,QWORD[8+r9] > + mov r11,QWORD[56+r9] > + > + mov r10d,DWORD[r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jb NEAR $L$common_seh_tail > + > + mov rax,QWORD[152+r8] > + > + mov r10d,DWORD[4+r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jae NEAR $L$common_seh_tail > + > + lea rsi,[rax] > + lea rdi,[512+r8] > + mov ecx,8 > + DD 0xa548f3fc > + lea rax,[88+rax] > + > + jmp NEAR $L$common_seh_tail > + > + > + > +ALIGN 16 > +ctr_xts_se_handler: > + push rsi > + push rdi > + push rbx > + push rbp > + push r12 > + push r13 > + push r14 > + push r15 > + pushfq > + sub rsp,64 > + > + mov rax,QWORD[120+r8] > + mov rbx,QWORD[248+r8] > + > + mov rsi,QWORD[8+r9] > + mov r11,QWORD[56+r9] > + > + mov r10d,DWORD[r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jb NEAR $L$common_seh_tail > + > + mov rax,QWORD[152+r8] > + > + mov r10d,DWORD[4+r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jae NEAR $L$common_seh_tail > + > + mov rax,QWORD[208+r8] > + > + lea rsi,[((-168))+rax] > + lea rdi,[512+r8] > + mov ecx,20 > + DD 0xa548f3fc > + > + mov rbp,QWORD[((-8))+rax] > + mov QWORD[160+r8],rbp > + jmp NEAR $L$common_seh_tail > + > + > + > +ALIGN 16 > +ocb_se_handler: > + push rsi > + push rdi > + push rbx > + push rbp > + push r12 > + push r13 > + push r14 > + push r15 > + pushfq > + sub rsp,64 > + > + mov rax,QWORD[120+r8] > + mov rbx,QWORD[248+r8] > + > + mov rsi,QWORD[8+r9] > + mov r11,QWORD[56+r9] > + > + mov r10d,DWORD[r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jb NEAR $L$common_seh_tail > + > + mov r10d,DWORD[4+r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jae NEAR $L$common_seh_tail > + > + mov r10d,DWORD[8+r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jae NEAR $L$ocb_no_xmm > + > + mov rax,QWORD[152+r8] > + > + lea rsi,[rax] > + lea rdi,[512+r8] > + mov ecx,20 > + DD 0xa548f3fc > + lea rax,[((160+40))+rax] > + > +$L$ocb_no_xmm: > + mov rbx,QWORD[((-8))+rax] > + mov rbp,QWORD[((-16))+rax] > + mov r12,QWORD[((-24))+rax] > + mov r13,QWORD[((-32))+rax] > + mov r14,QWORD[((-40))+rax] > + > + mov QWORD[144+r8],rbx > + mov QWORD[160+r8],rbp > + mov QWORD[216+r8],r12 > + mov QWORD[224+r8],r13 > + mov QWORD[232+r8],r14 > + > + jmp NEAR $L$common_seh_tail > + > + > +ALIGN 16 > +cbc_se_handler: > + push rsi > + push rdi > + push rbx > + push rbp > + push r12 > + push r13 > + push r14 > + push r15 > + pushfq > + sub rsp,64 > + > + mov rax,QWORD[152+r8] 
> + mov rbx,QWORD[248+r8] > + > + lea r10,[$L$cbc_decrypt_bulk] > + cmp rbx,r10 > + jb NEAR $L$common_seh_tail > + > + mov rax,QWORD[120+r8] > + > + lea r10,[$L$cbc_decrypt_body] > + cmp rbx,r10 > + jb NEAR $L$common_seh_tail > + > + mov rax,QWORD[152+r8] > + > + lea r10,[$L$cbc_ret] > + cmp rbx,r10 > + jae NEAR $L$common_seh_tail > + > + lea rsi,[16+rax] > + lea rdi,[512+r8] > + mov ecx,20 > + DD 0xa548f3fc > + > + mov rax,QWORD[208+r8] > + > + mov rbp,QWORD[((-8))+rax] > + mov QWORD[160+r8],rbp > + > +$L$common_seh_tail: > + mov rdi,QWORD[8+rax] > + mov rsi,QWORD[16+rax] > + mov QWORD[152+r8],rax > + mov QWORD[168+r8],rsi > + mov QWORD[176+r8],rdi > + > + mov rdi,QWORD[40+r9] > + mov rsi,r8 > + mov ecx,154 > + DD 0xa548f3fc > + > + mov rsi,r9 > + xor rcx,rcx > + mov rdx,QWORD[8+rsi] > + mov r8,QWORD[rsi] > + mov r9,QWORD[16+rsi] > + mov r10,QWORD[40+rsi] > + lea r11,[56+rsi] > + lea r12,[24+rsi] > + mov QWORD[32+rsp],r10 > + mov QWORD[40+rsp],r11 > + mov QWORD[48+rsp],r12 > + mov QWORD[56+rsp],rcx > + call QWORD[__imp_RtlVirtualUnwind] > + > + mov eax,1 > + add rsp,64 > + popfq > + pop r15 > + pop r14 > + pop r13 > + pop r12 > + pop rbp > + pop rbx > + pop rdi > + pop rsi > + DB 0F3h,0C3h ;repret > + > + > +section .pdata rdata align=4 > +ALIGN 4 > + DD $L$SEH_begin_aesni_ecb_encrypt wrt ..imagebase > + DD $L$SEH_end_aesni_ecb_encrypt wrt ..imagebase > + DD $L$SEH_info_ecb wrt ..imagebase > + > + DD $L$SEH_begin_aesni_ccm64_encrypt_blocks wrt ..imagebase > + DD $L$SEH_end_aesni_ccm64_encrypt_blocks wrt ..imagebase > + DD $L$SEH_info_ccm64_enc wrt ..imagebase > + > + DD $L$SEH_begin_aesni_ccm64_decrypt_blocks wrt ..imagebase > + DD $L$SEH_end_aesni_ccm64_decrypt_blocks wrt ..imagebase > + DD $L$SEH_info_ccm64_dec wrt ..imagebase > + > + DD $L$SEH_begin_aesni_ctr32_encrypt_blocks wrt ..imagebase > + DD $L$SEH_end_aesni_ctr32_encrypt_blocks wrt ..imagebase > + DD $L$SEH_info_ctr32 wrt ..imagebase > + > + DD $L$SEH_begin_aesni_xts_encrypt wrt ..imagebase > + DD $L$SEH_end_aesni_xts_encrypt wrt ..imagebase > + DD $L$SEH_info_xts_enc wrt ..imagebase > + > + DD $L$SEH_begin_aesni_xts_decrypt wrt ..imagebase > + DD $L$SEH_end_aesni_xts_decrypt wrt ..imagebase > + DD $L$SEH_info_xts_dec wrt ..imagebase > + > + DD $L$SEH_begin_aesni_ocb_encrypt wrt ..imagebase > + DD $L$SEH_end_aesni_ocb_encrypt wrt ..imagebase > + DD $L$SEH_info_ocb_enc wrt ..imagebase > + > + DD $L$SEH_begin_aesni_ocb_decrypt wrt ..imagebase > + DD $L$SEH_end_aesni_ocb_decrypt wrt ..imagebase > + DD $L$SEH_info_ocb_dec wrt ..imagebase > + DD $L$SEH_begin_aesni_cbc_encrypt wrt ..imagebase > + DD $L$SEH_end_aesni_cbc_encrypt wrt ..imagebase > + DD $L$SEH_info_cbc wrt ..imagebase > + > + DD aesni_set_decrypt_key wrt ..imagebase > + DD $L$SEH_end_set_decrypt_key wrt ..imagebase > + DD $L$SEH_info_key wrt ..imagebase > + > + DD aesni_set_encrypt_key wrt ..imagebase > + DD $L$SEH_end_set_encrypt_key wrt ..imagebase > + DD $L$SEH_info_key wrt ..imagebase > +section .xdata rdata align=8 > +ALIGN 8 > +$L$SEH_info_ecb: > +DB 9,0,0,0 > + DD ecb_ccm64_se_handler wrt ..imagebase > + DD $L$ecb_enc_body wrt ..imagebase,$L$ecb_enc_ret wrt ..imagebase > +$L$SEH_info_ccm64_enc: > +DB 9,0,0,0 > + DD ecb_ccm64_se_handler wrt ..imagebase > + DD $L$ccm64_enc_body wrt ..imagebase,$L$ccm64_enc_ret > wrt ..imagebase > +$L$SEH_info_ccm64_dec: > +DB 9,0,0,0 > + DD ecb_ccm64_se_handler wrt ..imagebase > + DD $L$ccm64_dec_body wrt ..imagebase,$L$ccm64_dec_ret > wrt ..imagebase > +$L$SEH_info_ctr32: > +DB 9,0,0,0 > + DD ctr_xts_se_handler wrt ..imagebase > + 
DD $L$ctr32_body wrt ..imagebase,$L$ctr32_epilogue wrt ..imagebase > +$L$SEH_info_xts_enc: > +DB 9,0,0,0 > + DD ctr_xts_se_handler wrt ..imagebase > + DD $L$xts_enc_body wrt ..imagebase,$L$xts_enc_epilogue > wrt ..imagebase > +$L$SEH_info_xts_dec: > +DB 9,0,0,0 > + DD ctr_xts_se_handler wrt ..imagebase > + DD $L$xts_dec_body wrt ..imagebase,$L$xts_dec_epilogue > wrt ..imagebase > +$L$SEH_info_ocb_enc: > +DB 9,0,0,0 > + DD ocb_se_handler wrt ..imagebase > + DD $L$ocb_enc_body wrt ..imagebase,$L$ocb_enc_epilogue > wrt ..imagebase > + DD $L$ocb_enc_pop wrt ..imagebase > + DD 0 > +$L$SEH_info_ocb_dec: > +DB 9,0,0,0 > + DD ocb_se_handler wrt ..imagebase > + DD $L$ocb_dec_body wrt ..imagebase,$L$ocb_dec_epilogue > wrt ..imagebase > + DD $L$ocb_dec_pop wrt ..imagebase > + DD 0 > +$L$SEH_info_cbc: > +DB 9,0,0,0 > + DD cbc_se_handler wrt ..imagebase > +$L$SEH_info_key: > +DB 0x01,0x04,0x01,0x00 > +DB 0x04,0x02,0x00,0x00 > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm > b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm > new file mode 100644 > index 0000000000..1c911fa294 > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm > @@ -0,0 +1,1173 @@ > +; WARNING: do not edit! > +; Generated from openssl/crypto/aes/asm/vpaes-x86_64.pl > +; > +; Copyright 2011-2020 The OpenSSL Project Authors. All Rights Reserved. > +; > +; Licensed under the OpenSSL license (the "License"). You may not use > +; this file except in compliance with the License. You can obtain a copy > +; in the file LICENSE in the source distribution or at > +; https://www.openssl.org/source/license.html > + > +default rel > +%define XMMWORD > +%define YMMWORD > +%define ZMMWORD > +section .text code align=64 > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > +ALIGN 16 > +_vpaes_encrypt_core: > + > + mov r9,rdx > + mov r11,16 > + mov eax,DWORD[240+rdx] > + movdqa xmm1,xmm9 > + movdqa xmm2,XMMWORD[$L$k_ipt] > + pandn xmm1,xmm0 > + movdqu xmm5,XMMWORD[r9] > + psrld xmm1,4 > + pand xmm0,xmm9 > +DB 102,15,56,0,208 > + movdqa xmm0,XMMWORD[(($L$k_ipt+16))] > +DB 102,15,56,0,193 > + pxor xmm2,xmm5 > + add r9,16 > + pxor xmm0,xmm2 > + lea r10,[$L$k_mc_backward] > + jmp NEAR $L$enc_entry > + > +ALIGN 16 > +$L$enc_loop: > + > + movdqa xmm4,xmm13 > + movdqa xmm0,xmm12 > +DB 102,15,56,0,226 > +DB 102,15,56,0,195 > + pxor xmm4,xmm5 > + movdqa xmm5,xmm15 > + pxor xmm0,xmm4 > + movdqa xmm1,XMMWORD[((-64))+r10*1+r11] > +DB 102,15,56,0,234 > + movdqa xmm4,XMMWORD[r10*1+r11] > + movdqa xmm2,xmm14 > +DB 102,15,56,0,211 > + movdqa xmm3,xmm0 > + pxor xmm2,xmm5 > +DB 102,15,56,0,193 > + add r9,16 > + pxor xmm0,xmm2 > +DB 102,15,56,0,220 > + add r11,16 > + pxor xmm3,xmm0 > +DB 102,15,56,0,193 > + and r11,0x30 > + sub rax,1 > + pxor xmm0,xmm3 > + > +$L$enc_entry: > + > + movdqa xmm1,xmm9 > + movdqa xmm5,xmm11 > + pandn xmm1,xmm0 > + psrld xmm1,4 > + pand xmm0,xmm9 > +DB 102,15,56,0,232 > + movdqa xmm3,xmm10 > + pxor xmm0,xmm1 > +DB 102,15,56,0,217 > + movdqa xmm4,xmm10 > + pxor xmm3,xmm5 > +DB 102,15,56,0,224 > + movdqa xmm2,xmm10 > + pxor xmm4,xmm5 > +DB 102,15,56,0,211 > + movdqa xmm3,xmm10 > + pxor xmm2,xmm0 > +DB 102,15,56,0,220 > + movdqu xmm5,XMMWORD[r9] > + pxor xmm3,xmm1 > + jnz NEAR $L$enc_loop > + > + > + movdqa xmm4,XMMWORD[((-96))+r10] > + movdqa xmm0,XMMWORD[((-80))+r10] > +DB 102,15,56,0,226 > + pxor xmm4,xmm5 > +DB 102,15,56,0,195 > + movdqa xmm1,XMMWORD[64+r10*1+r11] > + pxor xmm0,xmm4 > +DB 102,15,56,0,193 > + DB 0F3h,0C3h ;repret > + > + > + > + > + 
> + > + > + > + > +ALIGN 16 > +_vpaes_decrypt_core: > + > + mov r9,rdx > + mov eax,DWORD[240+rdx] > + movdqa xmm1,xmm9 > + movdqa xmm2,XMMWORD[$L$k_dipt] > + pandn xmm1,xmm0 > + mov r11,rax > + psrld xmm1,4 > + movdqu xmm5,XMMWORD[r9] > + shl r11,4 > + pand xmm0,xmm9 > +DB 102,15,56,0,208 > + movdqa xmm0,XMMWORD[(($L$k_dipt+16))] > + xor r11,0x30 > + lea r10,[$L$k_dsbd] > +DB 102,15,56,0,193 > + and r11,0x30 > + pxor xmm2,xmm5 > + movdqa xmm5,XMMWORD[(($L$k_mc_forward+48))] > + pxor xmm0,xmm2 > + add r9,16 > + add r11,r10 > + jmp NEAR $L$dec_entry > + > +ALIGN 16 > +$L$dec_loop: > + > + > + > + movdqa xmm4,XMMWORD[((-32))+r10] > + movdqa xmm1,XMMWORD[((-16))+r10] > +DB 102,15,56,0,226 > +DB 102,15,56,0,203 > + pxor xmm0,xmm4 > + movdqa xmm4,XMMWORD[r10] > + pxor xmm0,xmm1 > + movdqa xmm1,XMMWORD[16+r10] > + > +DB 102,15,56,0,226 > +DB 102,15,56,0,197 > +DB 102,15,56,0,203 > + pxor xmm0,xmm4 > + movdqa xmm4,XMMWORD[32+r10] > + pxor xmm0,xmm1 > + movdqa xmm1,XMMWORD[48+r10] > + > +DB 102,15,56,0,226 > +DB 102,15,56,0,197 > +DB 102,15,56,0,203 > + pxor xmm0,xmm4 > + movdqa xmm4,XMMWORD[64+r10] > + pxor xmm0,xmm1 > + movdqa xmm1,XMMWORD[80+r10] > + > +DB 102,15,56,0,226 > +DB 102,15,56,0,197 > +DB 102,15,56,0,203 > + pxor xmm0,xmm4 > + add r9,16 > +DB 102,15,58,15,237,12 > + pxor xmm0,xmm1 > + sub rax,1 > + > +$L$dec_entry: > + > + movdqa xmm1,xmm9 > + pandn xmm1,xmm0 > + movdqa xmm2,xmm11 > + psrld xmm1,4 > + pand xmm0,xmm9 > +DB 102,15,56,0,208 > + movdqa xmm3,xmm10 > + pxor xmm0,xmm1 > +DB 102,15,56,0,217 > + movdqa xmm4,xmm10 > + pxor xmm3,xmm2 > +DB 102,15,56,0,224 > + pxor xmm4,xmm2 > + movdqa xmm2,xmm10 > +DB 102,15,56,0,211 > + movdqa xmm3,xmm10 > + pxor xmm2,xmm0 > +DB 102,15,56,0,220 > + movdqu xmm0,XMMWORD[r9] > + pxor xmm3,xmm1 > + jnz NEAR $L$dec_loop > + > + > + movdqa xmm4,XMMWORD[96+r10] > +DB 102,15,56,0,226 > + pxor xmm4,xmm0 > + movdqa xmm0,XMMWORD[112+r10] > + movdqa xmm2,XMMWORD[((-352))+r11] > +DB 102,15,56,0,195 > + pxor xmm0,xmm4 > +DB 102,15,56,0,194 > + DB 0F3h,0C3h ;repret > + > + > + > + > + > + > + > + > + > +ALIGN 16 > +_vpaes_schedule_core: > + > + > + > + > + > + > + call _vpaes_preheat > + movdqa xmm8,XMMWORD[$L$k_rcon] > + movdqu xmm0,XMMWORD[rdi] > + > + > + movdqa xmm3,xmm0 > + lea r11,[$L$k_ipt] > + call _vpaes_schedule_transform > + movdqa xmm7,xmm0 > + > + lea r10,[$L$k_sr] > + test rcx,rcx > + jnz NEAR $L$schedule_am_decrypting > + > + > + movdqu XMMWORD[rdx],xmm0 > + jmp NEAR $L$schedule_go > + > +$L$schedule_am_decrypting: > + > + movdqa xmm1,XMMWORD[r10*1+r8] > +DB 102,15,56,0,217 > + movdqu XMMWORD[rdx],xmm3 > + xor r8,0x30 > + > +$L$schedule_go: > + cmp esi,192 > + ja NEAR $L$schedule_256 > + je NEAR $L$schedule_192 > + > + > + > + > + > + > + > + > + > + > +$L$schedule_128: > + mov esi,10 > + > +$L$oop_schedule_128: > + call _vpaes_schedule_round > + dec rsi > + jz NEAR $L$schedule_mangle_last > + call _vpaes_schedule_mangle > + jmp NEAR $L$oop_schedule_128 > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > +ALIGN 16 > +$L$schedule_192: > + movdqu xmm0,XMMWORD[8+rdi] > + call _vpaes_schedule_transform > + movdqa xmm6,xmm0 > + pxor xmm4,xmm4 > + movhlps xmm6,xmm4 > + mov esi,4 > + > +$L$oop_schedule_192: > + call _vpaes_schedule_round > +DB 102,15,58,15,198,8 > + call _vpaes_schedule_mangle > + call _vpaes_schedule_192_smear > + call _vpaes_schedule_mangle > + call _vpaes_schedule_round > + dec rsi > + jz NEAR $L$schedule_mangle_last > + call _vpaes_schedule_mangle > + call _vpaes_schedule_192_smear > + jmp NEAR $L$oop_schedule_192 > 
+ > + > + > + > + > + > + > + > + > + > + > +ALIGN 16 > +$L$schedule_256: > + movdqu xmm0,XMMWORD[16+rdi] > + call _vpaes_schedule_transform > + mov esi,7 > + > +$L$oop_schedule_256: > + call _vpaes_schedule_mangle > + movdqa xmm6,xmm0 > + > + > + call _vpaes_schedule_round > + dec rsi > + jz NEAR $L$schedule_mangle_last > + call _vpaes_schedule_mangle > + > + > + pshufd xmm0,xmm0,0xFF > + movdqa xmm5,xmm7 > + movdqa xmm7,xmm6 > + call _vpaes_schedule_low_round > + movdqa xmm7,xmm5 > + > + jmp NEAR $L$oop_schedule_256 > + > + > + > + > + > + > + > + > + > + > + > + > +ALIGN 16 > +$L$schedule_mangle_last: > + > + lea r11,[$L$k_deskew] > + test rcx,rcx > + jnz NEAR $L$schedule_mangle_last_dec > + > + > + movdqa xmm1,XMMWORD[r10*1+r8] > +DB 102,15,56,0,193 > + lea r11,[$L$k_opt] > + add rdx,32 > + > +$L$schedule_mangle_last_dec: > + add rdx,-16 > + pxor xmm0,XMMWORD[$L$k_s63] > + call _vpaes_schedule_transform > + movdqu XMMWORD[rdx],xmm0 > + > + > + pxor xmm0,xmm0 > + pxor xmm1,xmm1 > + pxor xmm2,xmm2 > + pxor xmm3,xmm3 > + pxor xmm4,xmm4 > + pxor xmm5,xmm5 > + pxor xmm6,xmm6 > + pxor xmm7,xmm7 > + DB 0F3h,0C3h ;repret > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > +ALIGN 16 > +_vpaes_schedule_192_smear: > + > + pshufd xmm1,xmm6,0x80 > + pshufd xmm0,xmm7,0xFE > + pxor xmm6,xmm1 > + pxor xmm1,xmm1 > + pxor xmm6,xmm0 > + movdqa xmm0,xmm6 > + movhlps xmm6,xmm1 > + DB 0F3h,0C3h ;repret > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > +ALIGN 16 > +_vpaes_schedule_round: > + > + > + pxor xmm1,xmm1 > +DB 102,65,15,58,15,200,15 > +DB 102,69,15,58,15,192,15 > + pxor xmm7,xmm1 > + > + > + pshufd xmm0,xmm0,0xFF > +DB 102,15,58,15,192,1 > + > + > + > + > +_vpaes_schedule_low_round: > + > + movdqa xmm1,xmm7 > + pslldq xmm7,4 > + pxor xmm7,xmm1 > + movdqa xmm1,xmm7 > + pslldq xmm7,8 > + pxor xmm7,xmm1 > + pxor xmm7,XMMWORD[$L$k_s63] > + > + > + movdqa xmm1,xmm9 > + pandn xmm1,xmm0 > + psrld xmm1,4 > + pand xmm0,xmm9 > + movdqa xmm2,xmm11 > +DB 102,15,56,0,208 > + pxor xmm0,xmm1 > + movdqa xmm3,xmm10 > +DB 102,15,56,0,217 > + pxor xmm3,xmm2 > + movdqa xmm4,xmm10 > +DB 102,15,56,0,224 > + pxor xmm4,xmm2 > + movdqa xmm2,xmm10 > +DB 102,15,56,0,211 > + pxor xmm2,xmm0 > + movdqa xmm3,xmm10 > +DB 102,15,56,0,220 > + pxor xmm3,xmm1 > + movdqa xmm4,xmm13 > +DB 102,15,56,0,226 > + movdqa xmm0,xmm12 > +DB 102,15,56,0,195 > + pxor xmm0,xmm4 > + > + > + pxor xmm0,xmm7 > + movdqa xmm7,xmm0 > + DB 0F3h,0C3h ;repret > + > + > + > + > + > + > + > + > + > + > + > + > + > +ALIGN 16 > +_vpaes_schedule_transform: > + > + movdqa xmm1,xmm9 > + pandn xmm1,xmm0 > + psrld xmm1,4 > + pand xmm0,xmm9 > + movdqa xmm2,XMMWORD[r11] > +DB 102,15,56,0,208 > + movdqa xmm0,XMMWORD[16+r11] > +DB 102,15,56,0,193 > + pxor xmm0,xmm2 > + DB 0F3h,0C3h ;repret > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > +ALIGN 16 > +_vpaes_schedule_mangle: > + > + movdqa xmm4,xmm0 > + movdqa xmm5,XMMWORD[$L$k_mc_forward] > + test rcx,rcx > + jnz NEAR $L$schedule_mangle_dec > + > + > + add rdx,16 > + pxor xmm4,XMMWORD[$L$k_s63] > +DB 102,15,56,0,229 > + movdqa xmm3,xmm4 > +DB 102,15,56,0,229 > + pxor xmm3,xmm4 > +DB 102,15,56,0,229 > + pxor xmm3,xmm4 > + > + jmp NEAR $L$schedule_mangle_both > +ALIGN 16 > +$L$schedule_mangle_dec: > + > + lea r11,[$L$k_dksd] > + movdqa xmm1,xmm9 > + pandn xmm1,xmm4 > + psrld xmm1,4 > + pand xmm4,xmm9 > + > + movdqa xmm2,XMMWORD[r11] > +DB 102,15,56,0,212 > + movdqa xmm3,XMMWORD[16+r11] > +DB 
102,15,56,0,217 > + pxor xmm3,xmm2 > +DB 102,15,56,0,221 > + > + movdqa xmm2,XMMWORD[32+r11] > +DB 102,15,56,0,212 > + pxor xmm2,xmm3 > + movdqa xmm3,XMMWORD[48+r11] > +DB 102,15,56,0,217 > + pxor xmm3,xmm2 > +DB 102,15,56,0,221 > + > + movdqa xmm2,XMMWORD[64+r11] > +DB 102,15,56,0,212 > + pxor xmm2,xmm3 > + movdqa xmm3,XMMWORD[80+r11] > +DB 102,15,56,0,217 > + pxor xmm3,xmm2 > +DB 102,15,56,0,221 > + > + movdqa xmm2,XMMWORD[96+r11] > +DB 102,15,56,0,212 > + pxor xmm2,xmm3 > + movdqa xmm3,XMMWORD[112+r11] > +DB 102,15,56,0,217 > + pxor xmm3,xmm2 > + > + add rdx,-16 > + > +$L$schedule_mangle_both: > + movdqa xmm1,XMMWORD[r10*1+r8] > +DB 102,15,56,0,217 > + add r8,-16 > + and r8,0x30 > + movdqu XMMWORD[rdx],xmm3 > + DB 0F3h,0C3h ;repret > + > + > + > + > + > + > +global vpaes_set_encrypt_key > + > +ALIGN 16 > +vpaes_set_encrypt_key: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_vpaes_set_encrypt_key: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + > + > + > + lea rsp,[((-184))+rsp] > + movaps XMMWORD[16+rsp],xmm6 > + movaps XMMWORD[32+rsp],xmm7 > + movaps XMMWORD[48+rsp],xmm8 > + movaps XMMWORD[64+rsp],xmm9 > + movaps XMMWORD[80+rsp],xmm10 > + movaps XMMWORD[96+rsp],xmm11 > + movaps XMMWORD[112+rsp],xmm12 > + movaps XMMWORD[128+rsp],xmm13 > + movaps XMMWORD[144+rsp],xmm14 > + movaps XMMWORD[160+rsp],xmm15 > +$L$enc_key_body: > + mov eax,esi > + shr eax,5 > + add eax,5 > + mov DWORD[240+rdx],eax > + > + mov ecx,0 > + mov r8d,0x30 > + call _vpaes_schedule_core > + movaps xmm6,XMMWORD[16+rsp] > + movaps xmm7,XMMWORD[32+rsp] > + movaps xmm8,XMMWORD[48+rsp] > + movaps xmm9,XMMWORD[64+rsp] > + movaps xmm10,XMMWORD[80+rsp] > + movaps xmm11,XMMWORD[96+rsp] > + movaps xmm12,XMMWORD[112+rsp] > + movaps xmm13,XMMWORD[128+rsp] > + movaps xmm14,XMMWORD[144+rsp] > + movaps xmm15,XMMWORD[160+rsp] > + lea rsp,[184+rsp] > +$L$enc_key_epilogue: > + xor eax,eax > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_vpaes_set_encrypt_key: > + > +global vpaes_set_decrypt_key > + > +ALIGN 16 > +vpaes_set_decrypt_key: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_vpaes_set_decrypt_key: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + > + > + > + lea rsp,[((-184))+rsp] > + movaps XMMWORD[16+rsp],xmm6 > + movaps XMMWORD[32+rsp],xmm7 > + movaps XMMWORD[48+rsp],xmm8 > + movaps XMMWORD[64+rsp],xmm9 > + movaps XMMWORD[80+rsp],xmm10 > + movaps XMMWORD[96+rsp],xmm11 > + movaps XMMWORD[112+rsp],xmm12 > + movaps XMMWORD[128+rsp],xmm13 > + movaps XMMWORD[144+rsp],xmm14 > + movaps XMMWORD[160+rsp],xmm15 > +$L$dec_key_body: > + mov eax,esi > + shr eax,5 > + add eax,5 > + mov DWORD[240+rdx],eax > + shl eax,4 > + lea rdx,[16+rax*1+rdx] > + > + mov ecx,1 > + mov r8d,esi > + shr r8d,1 > + and r8d,32 > + xor r8d,32 > + call _vpaes_schedule_core > + movaps xmm6,XMMWORD[16+rsp] > + movaps xmm7,XMMWORD[32+rsp] > + movaps xmm8,XMMWORD[48+rsp] > + movaps xmm9,XMMWORD[64+rsp] > + movaps xmm10,XMMWORD[80+rsp] > + movaps xmm11,XMMWORD[96+rsp] > + movaps xmm12,XMMWORD[112+rsp] > + movaps xmm13,XMMWORD[128+rsp] > + movaps xmm14,XMMWORD[144+rsp] > + movaps xmm15,XMMWORD[160+rsp] > + lea rsp,[184+rsp] > +$L$dec_key_epilogue: > + xor eax,eax > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_vpaes_set_decrypt_key: > + > +global vpaes_encrypt > + > +ALIGN 16 > +vpaes_encrypt: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov 
QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_vpaes_encrypt: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + > + > + > + lea rsp,[((-184))+rsp] > + movaps XMMWORD[16+rsp],xmm6 > + movaps XMMWORD[32+rsp],xmm7 > + movaps XMMWORD[48+rsp],xmm8 > + movaps XMMWORD[64+rsp],xmm9 > + movaps XMMWORD[80+rsp],xmm10 > + movaps XMMWORD[96+rsp],xmm11 > + movaps XMMWORD[112+rsp],xmm12 > + movaps XMMWORD[128+rsp],xmm13 > + movaps XMMWORD[144+rsp],xmm14 > + movaps XMMWORD[160+rsp],xmm15 > +$L$enc_body: > + movdqu xmm0,XMMWORD[rdi] > + call _vpaes_preheat > + call _vpaes_encrypt_core > + movdqu XMMWORD[rsi],xmm0 > + movaps xmm6,XMMWORD[16+rsp] > + movaps xmm7,XMMWORD[32+rsp] > + movaps xmm8,XMMWORD[48+rsp] > + movaps xmm9,XMMWORD[64+rsp] > + movaps xmm10,XMMWORD[80+rsp] > + movaps xmm11,XMMWORD[96+rsp] > + movaps xmm12,XMMWORD[112+rsp] > + movaps xmm13,XMMWORD[128+rsp] > + movaps xmm14,XMMWORD[144+rsp] > + movaps xmm15,XMMWORD[160+rsp] > + lea rsp,[184+rsp] > +$L$enc_epilogue: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_vpaes_encrypt: > + > +global vpaes_decrypt > + > +ALIGN 16 > +vpaes_decrypt: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_vpaes_decrypt: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + > + > + > + lea rsp,[((-184))+rsp] > + movaps XMMWORD[16+rsp],xmm6 > + movaps XMMWORD[32+rsp],xmm7 > + movaps XMMWORD[48+rsp],xmm8 > + movaps XMMWORD[64+rsp],xmm9 > + movaps XMMWORD[80+rsp],xmm10 > + movaps XMMWORD[96+rsp],xmm11 > + movaps XMMWORD[112+rsp],xmm12 > + movaps XMMWORD[128+rsp],xmm13 > + movaps XMMWORD[144+rsp],xmm14 > + movaps XMMWORD[160+rsp],xmm15 > +$L$dec_body: > + movdqu xmm0,XMMWORD[rdi] > + call _vpaes_preheat > + call _vpaes_decrypt_core > + movdqu XMMWORD[rsi],xmm0 > + movaps xmm6,XMMWORD[16+rsp] > + movaps xmm7,XMMWORD[32+rsp] > + movaps xmm8,XMMWORD[48+rsp] > + movaps xmm9,XMMWORD[64+rsp] > + movaps xmm10,XMMWORD[80+rsp] > + movaps xmm11,XMMWORD[96+rsp] > + movaps xmm12,XMMWORD[112+rsp] > + movaps xmm13,XMMWORD[128+rsp] > + movaps xmm14,XMMWORD[144+rsp] > + movaps xmm15,XMMWORD[160+rsp] > + lea rsp,[184+rsp] > +$L$dec_epilogue: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_vpaes_decrypt: > +global vpaes_cbc_encrypt > + > +ALIGN 16 > +vpaes_cbc_encrypt: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_vpaes_cbc_encrypt: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + mov rcx,r9 > + mov r8,QWORD[40+rsp] > + mov r9,QWORD[48+rsp] > + > + > + > + xchg rdx,rcx > + sub rcx,16 > + jc NEAR $L$cbc_abort > + lea rsp,[((-184))+rsp] > + movaps XMMWORD[16+rsp],xmm6 > + movaps XMMWORD[32+rsp],xmm7 > + movaps XMMWORD[48+rsp],xmm8 > + movaps XMMWORD[64+rsp],xmm9 > + movaps XMMWORD[80+rsp],xmm10 > + movaps XMMWORD[96+rsp],xmm11 > + movaps XMMWORD[112+rsp],xmm12 > + movaps XMMWORD[128+rsp],xmm13 > + movaps XMMWORD[144+rsp],xmm14 > + movaps XMMWORD[160+rsp],xmm15 > +$L$cbc_body: > + movdqu xmm6,XMMWORD[r8] > + sub rsi,rdi > + call _vpaes_preheat > + cmp r9d,0 > + je NEAR $L$cbc_dec_loop > + jmp NEAR $L$cbc_enc_loop > +ALIGN 16 > +$L$cbc_enc_loop: > + movdqu xmm0,XMMWORD[rdi] > + pxor xmm0,xmm6 > + call _vpaes_encrypt_core > + movdqa xmm6,xmm0 > + movdqu XMMWORD[rdi*1+rsi],xmm0 > + lea rdi,[16+rdi] > + sub rcx,16 > + jnc NEAR $L$cbc_enc_loop > + jmp NEAR $L$cbc_done > +ALIGN 16 > +$L$cbc_dec_loop: > + movdqu xmm0,XMMWORD[rdi] > + movdqa xmm7,xmm0 > + call 
_vpaes_decrypt_core > + pxor xmm0,xmm6 > + movdqa xmm6,xmm7 > + movdqu XMMWORD[rdi*1+rsi],xmm0 > + lea rdi,[16+rdi] > + sub rcx,16 > + jnc NEAR $L$cbc_dec_loop > +$L$cbc_done: > + movdqu XMMWORD[r8],xmm6 > + movaps xmm6,XMMWORD[16+rsp] > + movaps xmm7,XMMWORD[32+rsp] > + movaps xmm8,XMMWORD[48+rsp] > + movaps xmm9,XMMWORD[64+rsp] > + movaps xmm10,XMMWORD[80+rsp] > + movaps xmm11,XMMWORD[96+rsp] > + movaps xmm12,XMMWORD[112+rsp] > + movaps xmm13,XMMWORD[128+rsp] > + movaps xmm14,XMMWORD[144+rsp] > + movaps xmm15,XMMWORD[160+rsp] > + lea rsp,[184+rsp] > +$L$cbc_epilogue: > +$L$cbc_abort: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_vpaes_cbc_encrypt: > + > + > + > + > + > + > + > +ALIGN 16 > +_vpaes_preheat: > + > + lea r10,[$L$k_s0F] > + movdqa xmm10,XMMWORD[((-32))+r10] > + movdqa xmm11,XMMWORD[((-16))+r10] > + movdqa xmm9,XMMWORD[r10] > + movdqa xmm13,XMMWORD[48+r10] > + movdqa xmm12,XMMWORD[64+r10] > + movdqa xmm15,XMMWORD[80+r10] > + movdqa xmm14,XMMWORD[96+r10] > + DB 0F3h,0C3h ;repret > + > + > + > + > + > + > + > + > +ALIGN 64 > +_vpaes_consts: > +$L$k_inv: > + DQ 0x0E05060F0D080180,0x040703090A0B0C02 > + DQ 0x01040A060F0B0780,0x030D0E0C02050809 > + > +$L$k_s0F: > + DQ 0x0F0F0F0F0F0F0F0F,0x0F0F0F0F0F0F0F0F > + > +$L$k_ipt: > + DQ 0xC2B2E8985A2A7000,0xCABAE09052227808 > + DQ 0x4C01307D317C4D00,0xCD80B1FCB0FDCC81 > + > +$L$k_sb1: > + DQ 0xB19BE18FCB503E00,0xA5DF7A6E142AF544 > + DQ 0x3618D415FAE22300,0x3BF7CCC10D2ED9EF > +$L$k_sb2: > + DQ 0xE27A93C60B712400,0x5EB7E955BC982FCD > + DQ 0x69EB88400AE12900,0xC2A163C8AB82234A > +$L$k_sbo: > + DQ 0xD0D26D176FBDC700,0x15AABF7AC502A878 > + DQ 0xCFE474A55FBB6A00,0x8E1E90D1412B35FA > + > +$L$k_mc_forward: > + DQ 0x0407060500030201,0x0C0F0E0D080B0A09 > + DQ 0x080B0A0904070605,0x000302010C0F0E0D > + DQ 0x0C0F0E0D080B0A09,0x0407060500030201 > + DQ 0x000302010C0F0E0D,0x080B0A0904070605 > + > +$L$k_mc_backward: > + DQ 0x0605040702010003,0x0E0D0C0F0A09080B > + DQ 0x020100030E0D0C0F,0x0A09080B06050407 > + DQ 0x0E0D0C0F0A09080B,0x0605040702010003 > + DQ 0x0A09080B06050407,0x020100030E0D0C0F > + > +$L$k_sr: > + DQ 0x0706050403020100,0x0F0E0D0C0B0A0908 > + DQ 0x030E09040F0A0500,0x0B06010C07020D08 > + DQ 0x0F060D040B020900,0x070E050C030A0108 > + DQ 0x0B0E0104070A0D00,0x0306090C0F020508 > + > +$L$k_rcon: > + DQ 0x1F8391B9AF9DEEB6,0x702A98084D7C7D81 > + > +$L$k_s63: > + DQ 0x5B5B5B5B5B5B5B5B,0x5B5B5B5B5B5B5B5B > + > +$L$k_opt: > + DQ 0xFF9F4929D6B66000,0xF7974121DEBE6808 > + DQ 0x01EDBD5150BCEC00,0xE10D5DB1B05C0CE0 > + > +$L$k_deskew: > + DQ 0x07E4A34047A4E300,0x1DFEB95A5DBEF91A > + DQ 0x5F36B5DC83EA6900,0x2841C2ABF49D1E77 > + > + > + > + > + > +$L$k_dksd: > + DQ 0xFEB91A5DA3E44700,0x0740E3A45A1DBEF9 > + DQ 0x41C277F4B5368300,0x5FDC69EAAB289D1E > +$L$k_dksb: > + DQ 0x9A4FCA1F8550D500,0x03D653861CC94C99 > + DQ 0x115BEDA7B6FC4A00,0xD993256F7E3482C8 > +$L$k_dkse: > + DQ 0xD5031CCA1FC9D600,0x53859A4C994F5086 > + DQ 0xA23196054FDC7BE8,0xCD5EF96A20B31487 > +$L$k_dks9: > + DQ 0xB6116FC87ED9A700,0x4AED933482255BFC > + DQ 0x4576516227143300,0x8BB89FACE9DAFDCE > + > + > + > + > + > +$L$k_dipt: > + DQ 0x0F505B040B545F00,0x154A411E114E451A > + DQ 0x86E383E660056500,0x12771772F491F194 > + > +$L$k_dsb9: > + DQ 0x851C03539A86D600,0xCAD51F504F994CC9 > + DQ 0xC03B1789ECD74900,0x725E2C9EB2FBA565 > +$L$k_dsbd: > + DQ 0x7D57CCDFE6B1A200,0xF56E9B13882A4439 > + DQ 0x3CE2FAF724C6CB00,0x2931180D15DEEFD3 > +$L$k_dsbb: > + DQ 0xD022649296B44200,0x602646F6B0F2D404 > + DQ 0xC19498A6CD596700,0xF3FF0C3E3255AA6B > +$L$k_dsbe: 
> + DQ 0x46F2929626D4D000,0x2242600464B4F6B0 > + DQ 0x0C55A6CDFFAAC100,0x9467F36B98593E32 > +$L$k_dsbo: > + DQ 0x1387EA537EF94000,0xC7AA6DB9D4943E2D > + DQ 0x12D7560F93441D00,0xCA4B8159D8C58E9C > +DB 86,101,99,116,111,114,32,80,101,114,109,117,116,97,116,105 > +DB 111,110,32,65,69,83,32,102,111,114,32,120,56,54,95,54 > +DB 52,47,83,83,83,69,51,44,32,77,105,107,101,32,72,97 > +DB 109,98,117,114,103,32,40,83,116,97,110,102,111,114,100,32 > +DB 85,110,105,118,101,114,115,105,116,121,41,0 > +ALIGN 64 > + > +EXTERN __imp_RtlVirtualUnwind > + > +ALIGN 16 > +se_handler: > + push rsi > + push rdi > + push rbx > + push rbp > + push r12 > + push r13 > + push r14 > + push r15 > + pushfq > + sub rsp,64 > + > + mov rax,QWORD[120+r8] > + mov rbx,QWORD[248+r8] > + > + mov rsi,QWORD[8+r9] > + mov r11,QWORD[56+r9] > + > + mov r10d,DWORD[r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jb NEAR $L$in_prologue > + > + mov rax,QWORD[152+r8] > + > + mov r10d,DWORD[4+r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jae NEAR $L$in_prologue > + > + lea rsi,[16+rax] > + lea rdi,[512+r8] > + mov ecx,20 > + DD 0xa548f3fc > + lea rax,[184+rax] > + > +$L$in_prologue: > + mov rdi,QWORD[8+rax] > + mov rsi,QWORD[16+rax] > + mov QWORD[152+r8],rax > + mov QWORD[168+r8],rsi > + mov QWORD[176+r8],rdi > + > + mov rdi,QWORD[40+r9] > + mov rsi,r8 > + mov ecx,154 > + DD 0xa548f3fc > + > + mov rsi,r9 > + xor rcx,rcx > + mov rdx,QWORD[8+rsi] > + mov r8,QWORD[rsi] > + mov r9,QWORD[16+rsi] > + mov r10,QWORD[40+rsi] > + lea r11,[56+rsi] > + lea r12,[24+rsi] > + mov QWORD[32+rsp],r10 > + mov QWORD[40+rsp],r11 > + mov QWORD[48+rsp],r12 > + mov QWORD[56+rsp],rcx > + call QWORD[__imp_RtlVirtualUnwind] > + > + mov eax,1 > + add rsp,64 > + popfq > + pop r15 > + pop r14 > + pop r13 > + pop r12 > + pop rbp > + pop rbx > + pop rdi > + pop rsi > + DB 0F3h,0C3h ;repret > + > + > +section .pdata rdata align=4 > +ALIGN 4 > + DD $L$SEH_begin_vpaes_set_encrypt_key wrt ..imagebase > + DD $L$SEH_end_vpaes_set_encrypt_key wrt ..imagebase > + DD $L$SEH_info_vpaes_set_encrypt_key wrt ..imagebase > + > + DD $L$SEH_begin_vpaes_set_decrypt_key wrt ..imagebase > + DD $L$SEH_end_vpaes_set_decrypt_key wrt ..imagebase > + DD $L$SEH_info_vpaes_set_decrypt_key wrt ..imagebase > + > + DD $L$SEH_begin_vpaes_encrypt wrt ..imagebase > + DD $L$SEH_end_vpaes_encrypt wrt ..imagebase > + DD $L$SEH_info_vpaes_encrypt wrt ..imagebase > + > + DD $L$SEH_begin_vpaes_decrypt wrt ..imagebase > + DD $L$SEH_end_vpaes_decrypt wrt ..imagebase > + DD $L$SEH_info_vpaes_decrypt wrt ..imagebase > + > + DD $L$SEH_begin_vpaes_cbc_encrypt wrt ..imagebase > + DD $L$SEH_end_vpaes_cbc_encrypt wrt ..imagebase > + DD $L$SEH_info_vpaes_cbc_encrypt wrt ..imagebase > + > +section .xdata rdata align=8 > +ALIGN 8 > +$L$SEH_info_vpaes_set_encrypt_key: > +DB 9,0,0,0 > + DD se_handler wrt ..imagebase > + DD $L$enc_key_body wrt ..imagebase,$L$enc_key_epilogue > wrt ..imagebase > +$L$SEH_info_vpaes_set_decrypt_key: > +DB 9,0,0,0 > + DD se_handler wrt ..imagebase > + DD $L$dec_key_body wrt ..imagebase,$L$dec_key_epilogue > wrt ..imagebase > +$L$SEH_info_vpaes_encrypt: > +DB 9,0,0,0 > + DD se_handler wrt ..imagebase > + DD $L$enc_body wrt ..imagebase,$L$enc_epilogue wrt ..imagebase > +$L$SEH_info_vpaes_decrypt: > +DB 9,0,0,0 > + DD se_handler wrt ..imagebase > + DD $L$dec_body wrt ..imagebase,$L$dec_epilogue wrt ..imagebase > +$L$SEH_info_vpaes_cbc_encrypt: > +DB 9,0,0,0 > + DD se_handler wrt ..imagebase > + DD $L$cbc_body wrt ..imagebase,$L$cbc_epilogue wrt ..imagebase > diff --git 
a/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm- > x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm- > x86_64.nasm > new file mode 100644 > index 0000000000..9e1a2d0a40 > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm- > x86_64.nasm > @@ -0,0 +1,34 @@ > +; WARNING: do not edit! > +; Generated from openssl/crypto/modes/asm/aesni-gcm-x86_64.pl > +; > +; Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. > +; > +; Licensed under the OpenSSL license (the "License"). You may not use > +; this file except in compliance with the License. You can obtain a copy > +; in the file LICENSE in the source distribution or at > +; https://www.openssl.org/source/license.html > + > +default rel > +%define XMMWORD > +%define YMMWORD > +%define ZMMWORD > +section .text code align=64 > + > + > +global aesni_gcm_encrypt > + > +aesni_gcm_encrypt: > + > + xor eax,eax > + DB 0F3h,0C3h ;repret > + > + > + > +global aesni_gcm_decrypt > + > +aesni_gcm_decrypt: > + > + xor eax,eax > + DB 0F3h,0C3h ;repret > + > + > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash- > x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash- > x86_64.nasm > new file mode 100644 > index 0000000000..60f283d5fb > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm > @@ -0,0 +1,1569 @@ > +; WARNING: do not edit! > +; Generated from openssl/crypto/modes/asm/ghash-x86_64.pl > +; > +; Copyright 2010-2020 The OpenSSL Project Authors. All Rights Reserved. > +; > +; Licensed under the OpenSSL license (the "License"). You may not use > +; this file except in compliance with the License. You can obtain a copy > +; in the file LICENSE in the source distribution or at > +; https://www.openssl.org/source/license.html > + > +default rel > +%define XMMWORD > +%define YMMWORD > +%define ZMMWORD > +section .text code align=64 > + > +EXTERN OPENSSL_ia32cap_P > + > +global gcm_gmult_4bit > + > +ALIGN 16 > +gcm_gmult_4bit: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_gcm_gmult_4bit: > + mov rdi,rcx > + mov rsi,rdx > + > + > + > + push rbx > + > + push rbp > + > + push r12 > + > + push r13 > + > + push r14 > + > + push r15 > + > + sub rsp,280 > + > +$L$gmult_prologue: > + > + movzx r8,BYTE[15+rdi] > + lea r11,[$L$rem_4bit] > + xor rax,rax > + xor rbx,rbx > + mov al,r8b > + mov bl,r8b > + shl al,4 > + mov rcx,14 > + mov r8,QWORD[8+rax*1+rsi] > + mov r9,QWORD[rax*1+rsi] > + and bl,0xf0 > + mov rdx,r8 > + jmp NEAR $L$oop1 > + > +ALIGN 16 > +$L$oop1: > + shr r8,4 > + and rdx,0xf > + mov r10,r9 > + mov al,BYTE[rcx*1+rdi] > + shr r9,4 > + xor r8,QWORD[8+rbx*1+rsi] > + shl r10,60 > + xor r9,QWORD[rbx*1+rsi] > + mov bl,al > + xor r9,QWORD[rdx*8+r11] > + mov rdx,r8 > + shl al,4 > + xor r8,r10 > + dec rcx > + js NEAR $L$break1 > + > + shr r8,4 > + and rdx,0xf > + mov r10,r9 > + shr r9,4 > + xor r8,QWORD[8+rax*1+rsi] > + shl r10,60 > + xor r9,QWORD[rax*1+rsi] > + and bl,0xf0 > + xor r9,QWORD[rdx*8+r11] > + mov rdx,r8 > + xor r8,r10 > + jmp NEAR $L$oop1 > + > +ALIGN 16 > +$L$break1: > + shr r8,4 > + and rdx,0xf > + mov r10,r9 > + shr r9,4 > + xor r8,QWORD[8+rax*1+rsi] > + shl r10,60 > + xor r9,QWORD[rax*1+rsi] > + and bl,0xf0 > + xor r9,QWORD[rdx*8+r11] > + mov rdx,r8 > + xor r8,r10 > + > + shr r8,4 > + and rdx,0xf > + mov r10,r9 > + shr r9,4 > + xor r8,QWORD[8+rbx*1+rsi] > + shl r10,60 > + xor r9,QWORD[rbx*1+rsi] > + xor r8,r10 > + xor r9,QWORD[rdx*8+r11] > + > + bswap r8 > + 
bswap r9 > + mov QWORD[8+rdi],r8 > + mov QWORD[rdi],r9 > + > + lea rsi,[((280+48))+rsp] > + > + mov rbx,QWORD[((-8))+rsi] > + > + lea rsp,[rsi] > + > +$L$gmult_epilogue: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_gcm_gmult_4bit: > +global gcm_ghash_4bit > + > +ALIGN 16 > +gcm_ghash_4bit: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_gcm_ghash_4bit: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + mov rcx,r9 > + > + > + > + push rbx > + > + push rbp > + > + push r12 > + > + push r13 > + > + push r14 > + > + push r15 > + > + sub rsp,280 > + > +$L$ghash_prologue: > + mov r14,rdx > + mov r15,rcx > + sub rsi,-128 > + lea rbp,[((16+128))+rsp] > + xor edx,edx > + mov r8,QWORD[((0+0-128))+rsi] > + mov rax,QWORD[((0+8-128))+rsi] > + mov dl,al > + shr rax,4 > + mov r10,r8 > + shr r8,4 > + mov r9,QWORD[((16+0-128))+rsi] > + shl dl,4 > + mov rbx,QWORD[((16+8-128))+rsi] > + shl r10,60 > + mov BYTE[rsp],dl > + or rax,r10 > + mov dl,bl > + shr rbx,4 > + mov r10,r9 > + shr r9,4 > + mov QWORD[rbp],r8 > + mov r8,QWORD[((32+0-128))+rsi] > + shl dl,4 > + mov QWORD[((0-128))+rbp],rax > + mov rax,QWORD[((32+8-128))+rsi] > + shl r10,60 > + mov BYTE[1+rsp],dl > + or rbx,r10 > + mov dl,al > + shr rax,4 > + mov r10,r8 > + shr r8,4 > + mov QWORD[8+rbp],r9 > + mov r9,QWORD[((48+0-128))+rsi] > + shl dl,4 > + mov QWORD[((8-128))+rbp],rbx > + mov rbx,QWORD[((48+8-128))+rsi] > + shl r10,60 > + mov BYTE[2+rsp],dl > + or rax,r10 > + mov dl,bl > + shr rbx,4 > + mov r10,r9 > + shr r9,4 > + mov QWORD[16+rbp],r8 > + mov r8,QWORD[((64+0-128))+rsi] > + shl dl,4 > + mov QWORD[((16-128))+rbp],rax > + mov rax,QWORD[((64+8-128))+rsi] > + shl r10,60 > + mov BYTE[3+rsp],dl > + or rbx,r10 > + mov dl,al > + shr rax,4 > + mov r10,r8 > + shr r8,4 > + mov QWORD[24+rbp],r9 > + mov r9,QWORD[((80+0-128))+rsi] > + shl dl,4 > + mov QWORD[((24-128))+rbp],rbx > + mov rbx,QWORD[((80+8-128))+rsi] > + shl r10,60 > + mov BYTE[4+rsp],dl > + or rax,r10 > + mov dl,bl > + shr rbx,4 > + mov r10,r9 > + shr r9,4 > + mov QWORD[32+rbp],r8 > + mov r8,QWORD[((96+0-128))+rsi] > + shl dl,4 > + mov QWORD[((32-128))+rbp],rax > + mov rax,QWORD[((96+8-128))+rsi] > + shl r10,60 > + mov BYTE[5+rsp],dl > + or rbx,r10 > + mov dl,al > + shr rax,4 > + mov r10,r8 > + shr r8,4 > + mov QWORD[40+rbp],r9 > + mov r9,QWORD[((112+0-128))+rsi] > + shl dl,4 > + mov QWORD[((40-128))+rbp],rbx > + mov rbx,QWORD[((112+8-128))+rsi] > + shl r10,60 > + mov BYTE[6+rsp],dl > + or rax,r10 > + mov dl,bl > + shr rbx,4 > + mov r10,r9 > + shr r9,4 > + mov QWORD[48+rbp],r8 > + mov r8,QWORD[((128+0-128))+rsi] > + shl dl,4 > + mov QWORD[((48-128))+rbp],rax > + mov rax,QWORD[((128+8-128))+rsi] > + shl r10,60 > + mov BYTE[7+rsp],dl > + or rbx,r10 > + mov dl,al > + shr rax,4 > + mov r10,r8 > + shr r8,4 > + mov QWORD[56+rbp],r9 > + mov r9,QWORD[((144+0-128))+rsi] > + shl dl,4 > + mov QWORD[((56-128))+rbp],rbx > + mov rbx,QWORD[((144+8-128))+rsi] > + shl r10,60 > + mov BYTE[8+rsp],dl > + or rax,r10 > + mov dl,bl > + shr rbx,4 > + mov r10,r9 > + shr r9,4 > + mov QWORD[64+rbp],r8 > + mov r8,QWORD[((160+0-128))+rsi] > + shl dl,4 > + mov QWORD[((64-128))+rbp],rax > + mov rax,QWORD[((160+8-128))+rsi] > + shl r10,60 > + mov BYTE[9+rsp],dl > + or rbx,r10 > + mov dl,al > + shr rax,4 > + mov r10,r8 > + shr r8,4 > + mov QWORD[72+rbp],r9 > + mov r9,QWORD[((176+0-128))+rsi] > + shl dl,4 > + mov QWORD[((72-128))+rbp],rbx > + mov rbx,QWORD[((176+8-128))+rsi] > + shl r10,60 > + mov BYTE[10+rsp],dl > 
+ or rax,r10 > + mov dl,bl > + shr rbx,4 > + mov r10,r9 > + shr r9,4 > + mov QWORD[80+rbp],r8 > + mov r8,QWORD[((192+0-128))+rsi] > + shl dl,4 > + mov QWORD[((80-128))+rbp],rax > + mov rax,QWORD[((192+8-128))+rsi] > + shl r10,60 > + mov BYTE[11+rsp],dl > + or rbx,r10 > + mov dl,al > + shr rax,4 > + mov r10,r8 > + shr r8,4 > + mov QWORD[88+rbp],r9 > + mov r9,QWORD[((208+0-128))+rsi] > + shl dl,4 > + mov QWORD[((88-128))+rbp],rbx > + mov rbx,QWORD[((208+8-128))+rsi] > + shl r10,60 > + mov BYTE[12+rsp],dl > + or rax,r10 > + mov dl,bl > + shr rbx,4 > + mov r10,r9 > + shr r9,4 > + mov QWORD[96+rbp],r8 > + mov r8,QWORD[((224+0-128))+rsi] > + shl dl,4 > + mov QWORD[((96-128))+rbp],rax > + mov rax,QWORD[((224+8-128))+rsi] > + shl r10,60 > + mov BYTE[13+rsp],dl > + or rbx,r10 > + mov dl,al > + shr rax,4 > + mov r10,r8 > + shr r8,4 > + mov QWORD[104+rbp],r9 > + mov r9,QWORD[((240+0-128))+rsi] > + shl dl,4 > + mov QWORD[((104-128))+rbp],rbx > + mov rbx,QWORD[((240+8-128))+rsi] > + shl r10,60 > + mov BYTE[14+rsp],dl > + or rax,r10 > + mov dl,bl > + shr rbx,4 > + mov r10,r9 > + shr r9,4 > + mov QWORD[112+rbp],r8 > + shl dl,4 > + mov QWORD[((112-128))+rbp],rax > + shl r10,60 > + mov BYTE[15+rsp],dl > + or rbx,r10 > + mov QWORD[120+rbp],r9 > + mov QWORD[((120-128))+rbp],rbx > + add rsi,-128 > + mov r8,QWORD[8+rdi] > + mov r9,QWORD[rdi] > + add r15,r14 > + lea r11,[$L$rem_8bit] > + jmp NEAR $L$outer_loop > +ALIGN 16 > +$L$outer_loop: > + xor r9,QWORD[r14] > + mov rdx,QWORD[8+r14] > + lea r14,[16+r14] > + xor rdx,r8 > + mov QWORD[rdi],r9 > + mov QWORD[8+rdi],rdx > + shr rdx,32 > + xor rax,rax > + rol edx,8 > + mov al,dl > + movzx ebx,dl > + shl al,4 > + shr ebx,4 > + rol edx,8 > + mov r8,QWORD[8+rax*1+rsi] > + mov r9,QWORD[rax*1+rsi] > + mov al,dl > + movzx ecx,dl > + shl al,4 > + movzx r12,BYTE[rbx*1+rsp] > + shr ecx,4 > + xor r12,r8 > + mov r10,r9 > + shr r8,8 > + movzx r12,r12b > + shr r9,8 > + xor r8,QWORD[((-128))+rbx*8+rbp] > + shl r10,56 > + xor r9,QWORD[rbx*8+rbp] > + rol edx,8 > + xor r8,QWORD[8+rax*1+rsi] > + xor r9,QWORD[rax*1+rsi] > + mov al,dl > + xor r8,r10 > + movzx r12,WORD[r12*2+r11] > + movzx ebx,dl > + shl al,4 > + movzx r13,BYTE[rcx*1+rsp] > + shr ebx,4 > + shl r12,48 > + xor r13,r8 > + mov r10,r9 > + xor r9,r12 > + shr r8,8 > + movzx r13,r13b > + shr r9,8 > + xor r8,QWORD[((-128))+rcx*8+rbp] > + shl r10,56 > + xor r9,QWORD[rcx*8+rbp] > + rol edx,8 > + xor r8,QWORD[8+rax*1+rsi] > + xor r9,QWORD[rax*1+rsi] > + mov al,dl > + xor r8,r10 > + movzx r13,WORD[r13*2+r11] > + movzx ecx,dl > + shl al,4 > + movzx r12,BYTE[rbx*1+rsp] > + shr ecx,4 > + shl r13,48 > + xor r12,r8 > + mov r10,r9 > + xor r9,r13 > + shr r8,8 > + movzx r12,r12b > + mov edx,DWORD[8+rdi] > + shr r9,8 > + xor r8,QWORD[((-128))+rbx*8+rbp] > + shl r10,56 > + xor r9,QWORD[rbx*8+rbp] > + rol edx,8 > + xor r8,QWORD[8+rax*1+rsi] > + xor r9,QWORD[rax*1+rsi] > + mov al,dl > + xor r8,r10 > + movzx r12,WORD[r12*2+r11] > + movzx ebx,dl > + shl al,4 > + movzx r13,BYTE[rcx*1+rsp] > + shr ebx,4 > + shl r12,48 > + xor r13,r8 > + mov r10,r9 > + xor r9,r12 > + shr r8,8 > + movzx r13,r13b > + shr r9,8 > + xor r8,QWORD[((-128))+rcx*8+rbp] > + shl r10,56 > + xor r9,QWORD[rcx*8+rbp] > + rol edx,8 > + xor r8,QWORD[8+rax*1+rsi] > + xor r9,QWORD[rax*1+rsi] > + mov al,dl > + xor r8,r10 > + movzx r13,WORD[r13*2+r11] > + movzx ecx,dl > + shl al,4 > + movzx r12,BYTE[rbx*1+rsp] > + shr ecx,4 > + shl r13,48 > + xor r12,r8 > + mov r10,r9 > + xor r9,r13 > + shr r8,8 > + movzx r12,r12b > + shr r9,8 > + xor r8,QWORD[((-128))+rbx*8+rbp] > + shl r10,56 > + xor 
r9,QWORD[rbx*8+rbp] > + rol edx,8 > + xor r8,QWORD[8+rax*1+rsi] > + xor r9,QWORD[rax*1+rsi] > + mov al,dl > + xor r8,r10 > + movzx r12,WORD[r12*2+r11] > + movzx ebx,dl > + shl al,4 > + movzx r13,BYTE[rcx*1+rsp] > + shr ebx,4 > + shl r12,48 > + xor r13,r8 > + mov r10,r9 > + xor r9,r12 > + shr r8,8 > + movzx r13,r13b > + shr r9,8 > + xor r8,QWORD[((-128))+rcx*8+rbp] > + shl r10,56 > + xor r9,QWORD[rcx*8+rbp] > + rol edx,8 > + xor r8,QWORD[8+rax*1+rsi] > + xor r9,QWORD[rax*1+rsi] > + mov al,dl > + xor r8,r10 > + movzx r13,WORD[r13*2+r11] > + movzx ecx,dl > + shl al,4 > + movzx r12,BYTE[rbx*1+rsp] > + shr ecx,4 > + shl r13,48 > + xor r12,r8 > + mov r10,r9 > + xor r9,r13 > + shr r8,8 > + movzx r12,r12b > + mov edx,DWORD[4+rdi] > + shr r9,8 > + xor r8,QWORD[((-128))+rbx*8+rbp] > + shl r10,56 > + xor r9,QWORD[rbx*8+rbp] > + rol edx,8 > + xor r8,QWORD[8+rax*1+rsi] > + xor r9,QWORD[rax*1+rsi] > + mov al,dl > + xor r8,r10 > + movzx r12,WORD[r12*2+r11] > + movzx ebx,dl > + shl al,4 > + movzx r13,BYTE[rcx*1+rsp] > + shr ebx,4 > + shl r12,48 > + xor r13,r8 > + mov r10,r9 > + xor r9,r12 > + shr r8,8 > + movzx r13,r13b > + shr r9,8 > + xor r8,QWORD[((-128))+rcx*8+rbp] > + shl r10,56 > + xor r9,QWORD[rcx*8+rbp] > + rol edx,8 > + xor r8,QWORD[8+rax*1+rsi] > + xor r9,QWORD[rax*1+rsi] > + mov al,dl > + xor r8,r10 > + movzx r13,WORD[r13*2+r11] > + movzx ecx,dl > + shl al,4 > + movzx r12,BYTE[rbx*1+rsp] > + shr ecx,4 > + shl r13,48 > + xor r12,r8 > + mov r10,r9 > + xor r9,r13 > + shr r8,8 > + movzx r12,r12b > + shr r9,8 > + xor r8,QWORD[((-128))+rbx*8+rbp] > + shl r10,56 > + xor r9,QWORD[rbx*8+rbp] > + rol edx,8 > + xor r8,QWORD[8+rax*1+rsi] > + xor r9,QWORD[rax*1+rsi] > + mov al,dl > + xor r8,r10 > + movzx r12,WORD[r12*2+r11] > + movzx ebx,dl > + shl al,4 > + movzx r13,BYTE[rcx*1+rsp] > + shr ebx,4 > + shl r12,48 > + xor r13,r8 > + mov r10,r9 > + xor r9,r12 > + shr r8,8 > + movzx r13,r13b > + shr r9,8 > + xor r8,QWORD[((-128))+rcx*8+rbp] > + shl r10,56 > + xor r9,QWORD[rcx*8+rbp] > + rol edx,8 > + xor r8,QWORD[8+rax*1+rsi] > + xor r9,QWORD[rax*1+rsi] > + mov al,dl > + xor r8,r10 > + movzx r13,WORD[r13*2+r11] > + movzx ecx,dl > + shl al,4 > + movzx r12,BYTE[rbx*1+rsp] > + shr ecx,4 > + shl r13,48 > + xor r12,r8 > + mov r10,r9 > + xor r9,r13 > + shr r8,8 > + movzx r12,r12b > + mov edx,DWORD[rdi] > + shr r9,8 > + xor r8,QWORD[((-128))+rbx*8+rbp] > + shl r10,56 > + xor r9,QWORD[rbx*8+rbp] > + rol edx,8 > + xor r8,QWORD[8+rax*1+rsi] > + xor r9,QWORD[rax*1+rsi] > + mov al,dl > + xor r8,r10 > + movzx r12,WORD[r12*2+r11] > + movzx ebx,dl > + shl al,4 > + movzx r13,BYTE[rcx*1+rsp] > + shr ebx,4 > + shl r12,48 > + xor r13,r8 > + mov r10,r9 > + xor r9,r12 > + shr r8,8 > + movzx r13,r13b > + shr r9,8 > + xor r8,QWORD[((-128))+rcx*8+rbp] > + shl r10,56 > + xor r9,QWORD[rcx*8+rbp] > + rol edx,8 > + xor r8,QWORD[8+rax*1+rsi] > + xor r9,QWORD[rax*1+rsi] > + mov al,dl > + xor r8,r10 > + movzx r13,WORD[r13*2+r11] > + movzx ecx,dl > + shl al,4 > + movzx r12,BYTE[rbx*1+rsp] > + shr ecx,4 > + shl r13,48 > + xor r12,r8 > + mov r10,r9 > + xor r9,r13 > + shr r8,8 > + movzx r12,r12b > + shr r9,8 > + xor r8,QWORD[((-128))+rbx*8+rbp] > + shl r10,56 > + xor r9,QWORD[rbx*8+rbp] > + rol edx,8 > + xor r8,QWORD[8+rax*1+rsi] > + xor r9,QWORD[rax*1+rsi] > + mov al,dl > + xor r8,r10 > + movzx r12,WORD[r12*2+r11] > + movzx ebx,dl > + shl al,4 > + movzx r13,BYTE[rcx*1+rsp] > + shr ebx,4 > + shl r12,48 > + xor r13,r8 > + mov r10,r9 > + xor r9,r12 > + shr r8,8 > + movzx r13,r13b > + shr r9,8 > + xor r8,QWORD[((-128))+rcx*8+rbp] > + shl r10,56 > + 
xor r9,QWORD[rcx*8+rbp] > + rol edx,8 > + xor r8,QWORD[8+rax*1+rsi] > + xor r9,QWORD[rax*1+rsi] > + mov al,dl > + xor r8,r10 > + movzx r13,WORD[r13*2+r11] > + movzx ecx,dl > + shl al,4 > + movzx r12,BYTE[rbx*1+rsp] > + and ecx,240 > + shl r13,48 > + xor r12,r8 > + mov r10,r9 > + xor r9,r13 > + shr r8,8 > + movzx r12,r12b > + mov edx,DWORD[((-4))+rdi] > + shr r9,8 > + xor r8,QWORD[((-128))+rbx*8+rbp] > + shl r10,56 > + xor r9,QWORD[rbx*8+rbp] > + movzx r12,WORD[r12*2+r11] > + xor r8,QWORD[8+rax*1+rsi] > + xor r9,QWORD[rax*1+rsi] > + shl r12,48 > + xor r8,r10 > + xor r9,r12 > + movzx r13,r8b > + shr r8,4 > + mov r10,r9 > + shl r13b,4 > + shr r9,4 > + xor r8,QWORD[8+rcx*1+rsi] > + movzx r13,WORD[r13*2+r11] > + shl r10,60 > + xor r9,QWORD[rcx*1+rsi] > + xor r8,r10 > + shl r13,48 > + bswap r8 > + xor r9,r13 > + bswap r9 > + cmp r14,r15 > + jb NEAR $L$outer_loop > + mov QWORD[8+rdi],r8 > + mov QWORD[rdi],r9 > + > + lea rsi,[((280+48))+rsp] > + > + mov r15,QWORD[((-48))+rsi] > + > + mov r14,QWORD[((-40))+rsi] > + > + mov r13,QWORD[((-32))+rsi] > + > + mov r12,QWORD[((-24))+rsi] > + > + mov rbp,QWORD[((-16))+rsi] > + > + mov rbx,QWORD[((-8))+rsi] > + > + lea rsp,[rsi] > + > +$L$ghash_epilogue: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_gcm_ghash_4bit: > +global gcm_init_clmul > + > +ALIGN 16 > +gcm_init_clmul: > + > +$L$_init_clmul: > +$L$SEH_begin_gcm_init_clmul: > + > +DB 0x48,0x83,0xec,0x18 > +DB 0x0f,0x29,0x34,0x24 > + movdqu xmm2,XMMWORD[rdx] > + pshufd xmm2,xmm2,78 > + > + > + pshufd xmm4,xmm2,255 > + movdqa xmm3,xmm2 > + psllq xmm2,1 > + pxor xmm5,xmm5 > + psrlq xmm3,63 > + pcmpgtd xmm5,xmm4 > + pslldq xmm3,8 > + por xmm2,xmm3 > + > + > + pand xmm5,XMMWORD[$L$0x1c2_polynomial] > + pxor xmm2,xmm5 > + > + > + pshufd xmm6,xmm2,78 > + movdqa xmm0,xmm2 > + pxor xmm6,xmm2 > + movdqa xmm1,xmm0 > + pshufd xmm3,xmm0,78 > + pxor xmm3,xmm0 > +DB 102,15,58,68,194,0 > +DB 102,15,58,68,202,17 > +DB 102,15,58,68,222,0 > + pxor xmm3,xmm0 > + pxor xmm3,xmm1 > + > + movdqa xmm4,xmm3 > + psrldq xmm3,8 > + pslldq xmm4,8 > + pxor xmm1,xmm3 > + pxor xmm0,xmm4 > + > + movdqa xmm4,xmm0 > + movdqa xmm3,xmm0 > + psllq xmm0,5 > + pxor xmm3,xmm0 > + psllq xmm0,1 > + pxor xmm0,xmm3 > + psllq xmm0,57 > + movdqa xmm3,xmm0 > + pslldq xmm0,8 > + psrldq xmm3,8 > + pxor xmm0,xmm4 > + pxor xmm1,xmm3 > + > + > + movdqa xmm4,xmm0 > + psrlq xmm0,1 > + pxor xmm1,xmm4 > + pxor xmm4,xmm0 > + psrlq xmm0,5 > + pxor xmm0,xmm4 > + psrlq xmm0,1 > + pxor xmm0,xmm1 > + pshufd xmm3,xmm2,78 > + pshufd xmm4,xmm0,78 > + pxor xmm3,xmm2 > + movdqu XMMWORD[rcx],xmm2 > + pxor xmm4,xmm0 > + movdqu XMMWORD[16+rcx],xmm0 > +DB 102,15,58,15,227,8 > + movdqu XMMWORD[32+rcx],xmm4 > + movdqa xmm1,xmm0 > + pshufd xmm3,xmm0,78 > + pxor xmm3,xmm0 > +DB 102,15,58,68,194,0 > +DB 102,15,58,68,202,17 > +DB 102,15,58,68,222,0 > + pxor xmm3,xmm0 > + pxor xmm3,xmm1 > + > + movdqa xmm4,xmm3 > + psrldq xmm3,8 > + pslldq xmm4,8 > + pxor xmm1,xmm3 > + pxor xmm0,xmm4 > + > + movdqa xmm4,xmm0 > + movdqa xmm3,xmm0 > + psllq xmm0,5 > + pxor xmm3,xmm0 > + psllq xmm0,1 > + pxor xmm0,xmm3 > + psllq xmm0,57 > + movdqa xmm3,xmm0 > + pslldq xmm0,8 > + psrldq xmm3,8 > + pxor xmm0,xmm4 > + pxor xmm1,xmm3 > + > + > + movdqa xmm4,xmm0 > + psrlq xmm0,1 > + pxor xmm1,xmm4 > + pxor xmm4,xmm0 > + psrlq xmm0,5 > + pxor xmm0,xmm4 > + psrlq xmm0,1 > + pxor xmm0,xmm1 > + movdqa xmm5,xmm0 > + movdqa xmm1,xmm0 > + pshufd xmm3,xmm0,78 > + pxor xmm3,xmm0 > +DB 102,15,58,68,194,0 > +DB 102,15,58,68,202,17 > +DB 
102,15,58,68,222,0 > + pxor xmm3,xmm0 > + pxor xmm3,xmm1 > + > + movdqa xmm4,xmm3 > + psrldq xmm3,8 > + pslldq xmm4,8 > + pxor xmm1,xmm3 > + pxor xmm0,xmm4 > + > + movdqa xmm4,xmm0 > + movdqa xmm3,xmm0 > + psllq xmm0,5 > + pxor xmm3,xmm0 > + psllq xmm0,1 > + pxor xmm0,xmm3 > + psllq xmm0,57 > + movdqa xmm3,xmm0 > + pslldq xmm0,8 > + psrldq xmm3,8 > + pxor xmm0,xmm4 > + pxor xmm1,xmm3 > + > + > + movdqa xmm4,xmm0 > + psrlq xmm0,1 > + pxor xmm1,xmm4 > + pxor xmm4,xmm0 > + psrlq xmm0,5 > + pxor xmm0,xmm4 > + psrlq xmm0,1 > + pxor xmm0,xmm1 > + pshufd xmm3,xmm5,78 > + pshufd xmm4,xmm0,78 > + pxor xmm3,xmm5 > + movdqu XMMWORD[48+rcx],xmm5 > + pxor xmm4,xmm0 > + movdqu XMMWORD[64+rcx],xmm0 > +DB 102,15,58,15,227,8 > + movdqu XMMWORD[80+rcx],xmm4 > + movaps xmm6,XMMWORD[rsp] > + lea rsp,[24+rsp] > +$L$SEH_end_gcm_init_clmul: > + DB 0F3h,0C3h ;repret > + > + > +global gcm_gmult_clmul > + > +ALIGN 16 > +gcm_gmult_clmul: > + > +$L$_gmult_clmul: > + movdqu xmm0,XMMWORD[rcx] > + movdqa xmm5,XMMWORD[$L$bswap_mask] > + movdqu xmm2,XMMWORD[rdx] > + movdqu xmm4,XMMWORD[32+rdx] > +DB 102,15,56,0,197 > + movdqa xmm1,xmm0 > + pshufd xmm3,xmm0,78 > + pxor xmm3,xmm0 > +DB 102,15,58,68,194,0 > +DB 102,15,58,68,202,17 > +DB 102,15,58,68,220,0 > + pxor xmm3,xmm0 > + pxor xmm3,xmm1 > + > + movdqa xmm4,xmm3 > + psrldq xmm3,8 > + pslldq xmm4,8 > + pxor xmm1,xmm3 > + pxor xmm0,xmm4 > + > + movdqa xmm4,xmm0 > + movdqa xmm3,xmm0 > + psllq xmm0,5 > + pxor xmm3,xmm0 > + psllq xmm0,1 > + pxor xmm0,xmm3 > + psllq xmm0,57 > + movdqa xmm3,xmm0 > + pslldq xmm0,8 > + psrldq xmm3,8 > + pxor xmm0,xmm4 > + pxor xmm1,xmm3 > + > + > + movdqa xmm4,xmm0 > + psrlq xmm0,1 > + pxor xmm1,xmm4 > + pxor xmm4,xmm0 > + psrlq xmm0,5 > + pxor xmm0,xmm4 > + psrlq xmm0,1 > + pxor xmm0,xmm1 > +DB 102,15,56,0,197 > + movdqu XMMWORD[rcx],xmm0 > + DB 0F3h,0C3h ;repret > + > + > +global gcm_ghash_clmul > + > +ALIGN 32 > +gcm_ghash_clmul: > + > +$L$_ghash_clmul: > + lea rax,[((-136))+rsp] > +$L$SEH_begin_gcm_ghash_clmul: > + > +DB 0x48,0x8d,0x60,0xe0 > +DB 0x0f,0x29,0x70,0xe0 > +DB 0x0f,0x29,0x78,0xf0 > +DB 0x44,0x0f,0x29,0x00 > +DB 0x44,0x0f,0x29,0x48,0x10 > +DB 0x44,0x0f,0x29,0x50,0x20 > +DB 0x44,0x0f,0x29,0x58,0x30 > +DB 0x44,0x0f,0x29,0x60,0x40 > +DB 0x44,0x0f,0x29,0x68,0x50 > +DB 0x44,0x0f,0x29,0x70,0x60 > +DB 0x44,0x0f,0x29,0x78,0x70 > + movdqa xmm10,XMMWORD[$L$bswap_mask] > + > + movdqu xmm0,XMMWORD[rcx] > + movdqu xmm2,XMMWORD[rdx] > + movdqu xmm7,XMMWORD[32+rdx] > +DB 102,65,15,56,0,194 > + > + sub r9,0x10 > + jz NEAR $L$odd_tail > + > + movdqu xmm6,XMMWORD[16+rdx] > + mov eax,DWORD[((OPENSSL_ia32cap_P+4))] > + cmp r9,0x30 > + jb NEAR $L$skip4x > + > + and eax,71303168 > + cmp eax,4194304 > + je NEAR $L$skip4x > + > + sub r9,0x30 > + mov rax,0xA040608020C0E000 > + movdqu xmm14,XMMWORD[48+rdx] > + movdqu xmm15,XMMWORD[64+rdx] > + > + > + > + > + movdqu xmm3,XMMWORD[48+r8] > + movdqu xmm11,XMMWORD[32+r8] > +DB 102,65,15,56,0,218 > +DB 102,69,15,56,0,218 > + movdqa xmm5,xmm3 > + pshufd xmm4,xmm3,78 > + pxor xmm4,xmm3 > +DB 102,15,58,68,218,0 > +DB 102,15,58,68,234,17 > +DB 102,15,58,68,231,0 > + > + movdqa xmm13,xmm11 > + pshufd xmm12,xmm11,78 > + pxor xmm12,xmm11 > +DB 102,68,15,58,68,222,0 > +DB 102,68,15,58,68,238,17 > +DB 102,68,15,58,68,231,16 > + xorps xmm3,xmm11 > + xorps xmm5,xmm13 > + movups xmm7,XMMWORD[80+rdx] > + xorps xmm4,xmm12 > + > + movdqu xmm11,XMMWORD[16+r8] > + movdqu xmm8,XMMWORD[r8] > +DB 102,69,15,56,0,218 > +DB 102,69,15,56,0,194 > + movdqa xmm13,xmm11 > + pshufd xmm12,xmm11,78 > + pxor xmm0,xmm8 > + pxor xmm12,xmm11 > 
+DB 102,69,15,58,68,222,0 > + movdqa xmm1,xmm0 > + pshufd xmm8,xmm0,78 > + pxor xmm8,xmm0 > +DB 102,69,15,58,68,238,17 > +DB 102,68,15,58,68,231,0 > + xorps xmm3,xmm11 > + xorps xmm5,xmm13 > + > + lea r8,[64+r8] > + sub r9,0x40 > + jc NEAR $L$tail4x > + > + jmp NEAR $L$mod4_loop > +ALIGN 32 > +$L$mod4_loop: > +DB 102,65,15,58,68,199,0 > + xorps xmm4,xmm12 > + movdqu xmm11,XMMWORD[48+r8] > +DB 102,69,15,56,0,218 > +DB 102,65,15,58,68,207,17 > + xorps xmm0,xmm3 > + movdqu xmm3,XMMWORD[32+r8] > + movdqa xmm13,xmm11 > +DB 102,68,15,58,68,199,16 > + pshufd xmm12,xmm11,78 > + xorps xmm1,xmm5 > + pxor xmm12,xmm11 > +DB 102,65,15,56,0,218 > + movups xmm7,XMMWORD[32+rdx] > + xorps xmm8,xmm4 > +DB 102,68,15,58,68,218,0 > + pshufd xmm4,xmm3,78 > + > + pxor xmm8,xmm0 > + movdqa xmm5,xmm3 > + pxor xmm8,xmm1 > + pxor xmm4,xmm3 > + movdqa xmm9,xmm8 > +DB 102,68,15,58,68,234,17 > + pslldq xmm8,8 > + psrldq xmm9,8 > + pxor xmm0,xmm8 > + movdqa xmm8,XMMWORD[$L$7_mask] > + pxor xmm1,xmm9 > +DB 102,76,15,110,200 > + > + pand xmm8,xmm0 > +DB 102,69,15,56,0,200 > + pxor xmm9,xmm0 > +DB 102,68,15,58,68,231,0 > + psllq xmm9,57 > + movdqa xmm8,xmm9 > + pslldq xmm9,8 > +DB 102,15,58,68,222,0 > + psrldq xmm8,8 > + pxor xmm0,xmm9 > + pxor xmm1,xmm8 > + movdqu xmm8,XMMWORD[r8] > + > + movdqa xmm9,xmm0 > + psrlq xmm0,1 > +DB 102,15,58,68,238,17 > + xorps xmm3,xmm11 > + movdqu xmm11,XMMWORD[16+r8] > +DB 102,69,15,56,0,218 > +DB 102,15,58,68,231,16 > + xorps xmm5,xmm13 > + movups xmm7,XMMWORD[80+rdx] > +DB 102,69,15,56,0,194 > + pxor xmm1,xmm9 > + pxor xmm9,xmm0 > + psrlq xmm0,5 > + > + movdqa xmm13,xmm11 > + pxor xmm4,xmm12 > + pshufd xmm12,xmm11,78 > + pxor xmm0,xmm9 > + pxor xmm1,xmm8 > + pxor xmm12,xmm11 > +DB 102,69,15,58,68,222,0 > + psrlq xmm0,1 > + pxor xmm0,xmm1 > + movdqa xmm1,xmm0 > +DB 102,69,15,58,68,238,17 > + xorps xmm3,xmm11 > + pshufd xmm8,xmm0,78 > + pxor xmm8,xmm0 > + > +DB 102,68,15,58,68,231,0 > + xorps xmm5,xmm13 > + > + lea r8,[64+r8] > + sub r9,0x40 > + jnc NEAR $L$mod4_loop > + > +$L$tail4x: > +DB 102,65,15,58,68,199,0 > +DB 102,65,15,58,68,207,17 > +DB 102,68,15,58,68,199,16 > + xorps xmm4,xmm12 > + xorps xmm0,xmm3 > + xorps xmm1,xmm5 > + pxor xmm1,xmm0 > + pxor xmm8,xmm4 > + > + pxor xmm8,xmm1 > + pxor xmm1,xmm0 > + > + movdqa xmm9,xmm8 > + psrldq xmm8,8 > + pslldq xmm9,8 > + pxor xmm1,xmm8 > + pxor xmm0,xmm9 > + > + movdqa xmm4,xmm0 > + movdqa xmm3,xmm0 > + psllq xmm0,5 > + pxor xmm3,xmm0 > + psllq xmm0,1 > + pxor xmm0,xmm3 > + psllq xmm0,57 > + movdqa xmm3,xmm0 > + pslldq xmm0,8 > + psrldq xmm3,8 > + pxor xmm0,xmm4 > + pxor xmm1,xmm3 > + > + > + movdqa xmm4,xmm0 > + psrlq xmm0,1 > + pxor xmm1,xmm4 > + pxor xmm4,xmm0 > + psrlq xmm0,5 > + pxor xmm0,xmm4 > + psrlq xmm0,1 > + pxor xmm0,xmm1 > + add r9,0x40 > + jz NEAR $L$done > + movdqu xmm7,XMMWORD[32+rdx] > + sub r9,0x10 > + jz NEAR $L$odd_tail > +$L$skip4x: > + > + > + > + > + > + movdqu xmm8,XMMWORD[r8] > + movdqu xmm3,XMMWORD[16+r8] > +DB 102,69,15,56,0,194 > +DB 102,65,15,56,0,218 > + pxor xmm0,xmm8 > + > + movdqa xmm5,xmm3 > + pshufd xmm4,xmm3,78 > + pxor xmm4,xmm3 > +DB 102,15,58,68,218,0 > +DB 102,15,58,68,234,17 > +DB 102,15,58,68,231,0 > + > + lea r8,[32+r8] > + nop > + sub r9,0x20 > + jbe NEAR $L$even_tail > + nop > + jmp NEAR $L$mod_loop > + > +ALIGN 32 > +$L$mod_loop: > + movdqa xmm1,xmm0 > + movdqa xmm8,xmm4 > + pshufd xmm4,xmm0,78 > + pxor xmm4,xmm0 > + > +DB 102,15,58,68,198,0 > +DB 102,15,58,68,206,17 > +DB 102,15,58,68,231,16 > + > + pxor xmm0,xmm3 > + pxor xmm1,xmm5 > + movdqu xmm9,XMMWORD[r8] > + pxor xmm8,xmm0 > +DB 
102,69,15,56,0,202 > + movdqu xmm3,XMMWORD[16+r8] > + > + pxor xmm8,xmm1 > + pxor xmm1,xmm9 > + pxor xmm4,xmm8 > +DB 102,65,15,56,0,218 > + movdqa xmm8,xmm4 > + psrldq xmm8,8 > + pslldq xmm4,8 > + pxor xmm1,xmm8 > + pxor xmm0,xmm4 > + > + movdqa xmm5,xmm3 > + > + movdqa xmm9,xmm0 > + movdqa xmm8,xmm0 > + psllq xmm0,5 > + pxor xmm8,xmm0 > +DB 102,15,58,68,218,0 > + psllq xmm0,1 > + pxor xmm0,xmm8 > + psllq xmm0,57 > + movdqa xmm8,xmm0 > + pslldq xmm0,8 > + psrldq xmm8,8 > + pxor xmm0,xmm9 > + pshufd xmm4,xmm5,78 > + pxor xmm1,xmm8 > + pxor xmm4,xmm5 > + > + movdqa xmm9,xmm0 > + psrlq xmm0,1 > +DB 102,15,58,68,234,17 > + pxor xmm1,xmm9 > + pxor xmm9,xmm0 > + psrlq xmm0,5 > + pxor xmm0,xmm9 > + lea r8,[32+r8] > + psrlq xmm0,1 > +DB 102,15,58,68,231,0 > + pxor xmm0,xmm1 > + > + sub r9,0x20 > + ja NEAR $L$mod_loop > + > +$L$even_tail: > + movdqa xmm1,xmm0 > + movdqa xmm8,xmm4 > + pshufd xmm4,xmm0,78 > + pxor xmm4,xmm0 > + > +DB 102,15,58,68,198,0 > +DB 102,15,58,68,206,17 > +DB 102,15,58,68,231,16 > + > + pxor xmm0,xmm3 > + pxor xmm1,xmm5 > + pxor xmm8,xmm0 > + pxor xmm8,xmm1 > + pxor xmm4,xmm8 > + movdqa xmm8,xmm4 > + psrldq xmm8,8 > + pslldq xmm4,8 > + pxor xmm1,xmm8 > + pxor xmm0,xmm4 > + > + movdqa xmm4,xmm0 > + movdqa xmm3,xmm0 > + psllq xmm0,5 > + pxor xmm3,xmm0 > + psllq xmm0,1 > + pxor xmm0,xmm3 > + psllq xmm0,57 > + movdqa xmm3,xmm0 > + pslldq xmm0,8 > + psrldq xmm3,8 > + pxor xmm0,xmm4 > + pxor xmm1,xmm3 > + > + > + movdqa xmm4,xmm0 > + psrlq xmm0,1 > + pxor xmm1,xmm4 > + pxor xmm4,xmm0 > + psrlq xmm0,5 > + pxor xmm0,xmm4 > + psrlq xmm0,1 > + pxor xmm0,xmm1 > + test r9,r9 > + jnz NEAR $L$done > + > +$L$odd_tail: > + movdqu xmm8,XMMWORD[r8] > +DB 102,69,15,56,0,194 > + pxor xmm0,xmm8 > + movdqa xmm1,xmm0 > + pshufd xmm3,xmm0,78 > + pxor xmm3,xmm0 > +DB 102,15,58,68,194,0 > +DB 102,15,58,68,202,17 > +DB 102,15,58,68,223,0 > + pxor xmm3,xmm0 > + pxor xmm3,xmm1 > + > + movdqa xmm4,xmm3 > + psrldq xmm3,8 > + pslldq xmm4,8 > + pxor xmm1,xmm3 > + pxor xmm0,xmm4 > + > + movdqa xmm4,xmm0 > + movdqa xmm3,xmm0 > + psllq xmm0,5 > + pxor xmm3,xmm0 > + psllq xmm0,1 > + pxor xmm0,xmm3 > + psllq xmm0,57 > + movdqa xmm3,xmm0 > + pslldq xmm0,8 > + psrldq xmm3,8 > + pxor xmm0,xmm4 > + pxor xmm1,xmm3 > + > + > + movdqa xmm4,xmm0 > + psrlq xmm0,1 > + pxor xmm1,xmm4 > + pxor xmm4,xmm0 > + psrlq xmm0,5 > + pxor xmm0,xmm4 > + psrlq xmm0,1 > + pxor xmm0,xmm1 > +$L$done: > +DB 102,65,15,56,0,194 > + movdqu XMMWORD[rcx],xmm0 > + movaps xmm6,XMMWORD[rsp] > + movaps xmm7,XMMWORD[16+rsp] > + movaps xmm8,XMMWORD[32+rsp] > + movaps xmm9,XMMWORD[48+rsp] > + movaps xmm10,XMMWORD[64+rsp] > + movaps xmm11,XMMWORD[80+rsp] > + movaps xmm12,XMMWORD[96+rsp] > + movaps xmm13,XMMWORD[112+rsp] > + movaps xmm14,XMMWORD[128+rsp] > + movaps xmm15,XMMWORD[144+rsp] > + lea rsp,[168+rsp] > +$L$SEH_end_gcm_ghash_clmul: > + DB 0F3h,0C3h ;repret > + > + > +global gcm_init_avx > + > +ALIGN 32 > +gcm_init_avx: > + > + jmp NEAR $L$_init_clmul > + > + > +global gcm_gmult_avx > + > +ALIGN 32 > +gcm_gmult_avx: > + > + jmp NEAR $L$_gmult_clmul > + > + > +global gcm_ghash_avx > + > +ALIGN 32 > +gcm_ghash_avx: > + > + jmp NEAR $L$_ghash_clmul > + > + > +ALIGN 64 > +$L$bswap_mask: > +DB 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0 > +$L$0x1c2_polynomial: > +DB 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0xc2 > +$L$7_mask: > + DD 7,0,7,0 > +$L$7_mask_poly: > + DD 7,0,450,0 > +ALIGN 64 > + > +$L$rem_4bit: > + DD 0,0,0,471859200,0,943718400,0,610271232 > + DD 0,1887436800,0,1822425088,0,1220542464,0,1423966208 > + DD 0,3774873600,0,4246732800,0,3644850176,0,3311403008 
> + DD 0,2441084928,0,2376073216,0,2847932416,0,3051356160 > + > +$L$rem_8bit: > + DW 0x0000,0x01C2,0x0384,0x0246,0x0708,0x06CA,0x048C,0x054E > + DW 0x0E10,0x0FD2,0x0D94,0x0C56,0x0918,0x08DA,0x0A9C,0x0B5E > + DW 0x1C20,0x1DE2,0x1FA4,0x1E66,0x1B28,0x1AEA,0x18AC,0x196E > + DW 0x1230,0x13F2,0x11B4,0x1076,0x1538,0x14FA,0x16BC,0x177E > + DW 0x3840,0x3982,0x3BC4,0x3A06,0x3F48,0x3E8A,0x3CCC,0x3D0E > + DW 0x3650,0x3792,0x35D4,0x3416,0x3158,0x309A,0x32DC,0x331E > + DW 0x2460,0x25A2,0x27E4,0x2626,0x2368,0x22AA,0x20EC,0x212E > + DW 0x2A70,0x2BB2,0x29F4,0x2836,0x2D78,0x2CBA,0x2EFC,0x2F3E > + DW 0x7080,0x7142,0x7304,0x72C6,0x7788,0x764A,0x740C,0x75CE > + DW 0x7E90,0x7F52,0x7D14,0x7CD6,0x7998,0x785A,0x7A1C,0x7BDE > + DW 0x6CA0,0x6D62,0x6F24,0x6EE6,0x6BA8,0x6A6A,0x682C,0x69EE > + DW 0x62B0,0x6372,0x6134,0x60F6,0x65B8,0x647A,0x663C,0x67FE > + DW 0x48C0,0x4902,0x4B44,0x4A86,0x4FC8,0x4E0A,0x4C4C,0x4D8E > + DW 0x46D0,0x4712,0x4554,0x4496,0x41D8,0x401A,0x425C,0x439E > + DW 0x54E0,0x5522,0x5764,0x56A6,0x53E8,0x522A,0x506C,0x51AE > + DW 0x5AF0,0x5B32,0x5974,0x58B6,0x5DF8,0x5C3A,0x5E7C,0x5FBE > + DW 0xE100,0xE0C2,0xE284,0xE346,0xE608,0xE7CA,0xE58C,0xE44E > + DW 0xEF10,0xEED2,0xEC94,0xED56,0xE818,0xE9DA,0xEB9C,0xEA5E > + DW 0xFD20,0xFCE2,0xFEA4,0xFF66,0xFA28,0xFBEA,0xF9AC,0xF86E > + DW 0xF330,0xF2F2,0xF0B4,0xF176,0xF438,0xF5FA,0xF7BC,0xF67E > + DW 0xD940,0xD882,0xDAC4,0xDB06,0xDE48,0xDF8A,0xDDCC,0xDC0E > + DW 0xD750,0xD692,0xD4D4,0xD516,0xD058,0xD19A,0xD3DC,0xD21E > + DW 0xC560,0xC4A2,0xC6E4,0xC726,0xC268,0xC3AA,0xC1EC,0xC02E > + DW 0xCB70,0xCAB2,0xC8F4,0xC936,0xCC78,0xCDBA,0xCFFC,0xCE3E > + DW 0x9180,0x9042,0x9204,0x93C6,0x9688,0x974A,0x950C,0x94CE > + DW 0x9F90,0x9E52,0x9C14,0x9DD6,0x9898,0x995A,0x9B1C,0x9ADE > + DW 0x8DA0,0x8C62,0x8E24,0x8FE6,0x8AA8,0x8B6A,0x892C,0x88EE > + DW 0x83B0,0x8272,0x8034,0x81F6,0x84B8,0x857A,0x873C,0x86FE > + DW 0xA9C0,0xA802,0xAA44,0xAB86,0xAEC8,0xAF0A,0xAD4C,0xAC8E > + DW 0xA7D0,0xA612,0xA454,0xA596,0xA0D8,0xA11A,0xA35C,0xA29E > + DW 0xB5E0,0xB422,0xB664,0xB7A6,0xB2E8,0xB32A,0xB16C,0xB0AE > + DW 0xBBF0,0xBA32,0xB874,0xB9B6,0xBCF8,0xBD3A,0xBF7C,0xBEBE > + > +DB 71,72,65,83,72,32,102,111,114,32,120,56,54,95,54,52 > +DB 44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32 > +DB 60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111 > +DB 114,103,62,0 > +ALIGN 64 > +EXTERN __imp_RtlVirtualUnwind > + > +ALIGN 16 > +se_handler: > + push rsi > + push rdi > + push rbx > + push rbp > + push r12 > + push r13 > + push r14 > + push r15 > + pushfq > + sub rsp,64 > + > + mov rax,QWORD[120+r8] > + mov rbx,QWORD[248+r8] > + > + mov rsi,QWORD[8+r9] > + mov r11,QWORD[56+r9] > + > + mov r10d,DWORD[r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jb NEAR $L$in_prologue > + > + mov rax,QWORD[152+r8] > + > + mov r10d,DWORD[4+r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jae NEAR $L$in_prologue > + > + lea rax,[((48+280))+rax] > + > + mov rbx,QWORD[((-8))+rax] > + mov rbp,QWORD[((-16))+rax] > + mov r12,QWORD[((-24))+rax] > + mov r13,QWORD[((-32))+rax] > + mov r14,QWORD[((-40))+rax] > + mov r15,QWORD[((-48))+rax] > + mov QWORD[144+r8],rbx > + mov QWORD[160+r8],rbp > + mov QWORD[216+r8],r12 > + mov QWORD[224+r8],r13 > + mov QWORD[232+r8],r14 > + mov QWORD[240+r8],r15 > + > +$L$in_prologue: > + mov rdi,QWORD[8+rax] > + mov rsi,QWORD[16+rax] > + mov QWORD[152+r8],rax > + mov QWORD[168+r8],rsi > + mov QWORD[176+r8],rdi > + > + mov rdi,QWORD[40+r9] > + mov rsi,r8 > + mov ecx,154 > + DD 0xa548f3fc > + > + mov rsi,r9 > + xor rcx,rcx > + mov rdx,QWORD[8+rsi] > + mov r8,QWORD[rsi] > + mov r9,QWORD[16+rsi] > + 
mov r10,QWORD[40+rsi] > + lea r11,[56+rsi] > + lea r12,[24+rsi] > + mov QWORD[32+rsp],r10 > + mov QWORD[40+rsp],r11 > + mov QWORD[48+rsp],r12 > + mov QWORD[56+rsp],rcx > + call QWORD[__imp_RtlVirtualUnwind] > + > + mov eax,1 > + add rsp,64 > + popfq > + pop r15 > + pop r14 > + pop r13 > + pop r12 > + pop rbp > + pop rbx > + pop rdi > + pop rsi > + DB 0F3h,0C3h ;repret > + > + > +section .pdata rdata align=4 > +ALIGN 4 > + DD $L$SEH_begin_gcm_gmult_4bit wrt ..imagebase > + DD $L$SEH_end_gcm_gmult_4bit wrt ..imagebase > + DD $L$SEH_info_gcm_gmult_4bit wrt ..imagebase > + > + DD $L$SEH_begin_gcm_ghash_4bit wrt ..imagebase > + DD $L$SEH_end_gcm_ghash_4bit wrt ..imagebase > + DD $L$SEH_info_gcm_ghash_4bit wrt ..imagebase > + > + DD $L$SEH_begin_gcm_init_clmul wrt ..imagebase > + DD $L$SEH_end_gcm_init_clmul wrt ..imagebase > + DD $L$SEH_info_gcm_init_clmul wrt ..imagebase > + > + DD $L$SEH_begin_gcm_ghash_clmul wrt ..imagebase > + DD $L$SEH_end_gcm_ghash_clmul wrt ..imagebase > + DD $L$SEH_info_gcm_ghash_clmul wrt ..imagebase > +section .xdata rdata align=8 > +ALIGN 8 > +$L$SEH_info_gcm_gmult_4bit: > +DB 9,0,0,0 > + DD se_handler wrt ..imagebase > + DD $L$gmult_prologue wrt ..imagebase,$L$gmult_epilogue > wrt ..imagebase > +$L$SEH_info_gcm_ghash_4bit: > +DB 9,0,0,0 > + DD se_handler wrt ..imagebase > + DD $L$ghash_prologue wrt ..imagebase,$L$ghash_epilogue > wrt ..imagebase > +$L$SEH_info_gcm_init_clmul: > +DB 0x01,0x08,0x03,0x00 > +DB 0x08,0x68,0x00,0x00 > +DB 0x04,0x22,0x00,0x00 > +$L$SEH_info_gcm_ghash_clmul: > +DB 0x01,0x33,0x16,0x00 > +DB 0x33,0xf8,0x09,0x00 > +DB 0x2e,0xe8,0x08,0x00 > +DB 0x29,0xd8,0x07,0x00 > +DB 0x24,0xc8,0x06,0x00 > +DB 0x1f,0xb8,0x05,0x00 > +DB 0x1a,0xa8,0x04,0x00 > +DB 0x15,0x98,0x03,0x00 > +DB 0x10,0x88,0x02,0x00 > +DB 0x0c,0x78,0x01,0x00 > +DB 0x08,0x68,0x00,0x00 > +DB 0x04,0x01,0x15,0x00 > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb- > x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb- > x86_64.nasm > new file mode 100644 > index 0000000000..f3b7b0e35e > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm > @@ -0,0 +1,3137 @@ > +; WARNING: do not edit! > +; Generated from openssl/crypto/sha/asm/sha1-mb-x86_64.pl > +; > +; Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. > +; > +; Licensed under the OpenSSL license (the "License"). You may not use > +; this file except in compliance with the License. 
You can obtain a copy > +; in the file LICENSE in the source distribution or at > +; https://www.openssl.org/source/license.html > + > +default rel > +%define XMMWORD > +%define YMMWORD > +%define ZMMWORD > +section .text code align=64 > + > + > +EXTERN OPENSSL_ia32cap_P > + > +global sha1_multi_block > + > +ALIGN 32 > +sha1_multi_block: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_sha1_multi_block: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + > + > + > + mov rcx,QWORD[((OPENSSL_ia32cap_P+4))] > + bt rcx,61 > + jc NEAR _shaext_shortcut > + mov rax,rsp > + > + push rbx > + > + push rbp > + > + lea rsp,[((-168))+rsp] > + movaps XMMWORD[rsp],xmm6 > + movaps XMMWORD[16+rsp],xmm7 > + movaps XMMWORD[32+rsp],xmm8 > + movaps XMMWORD[48+rsp],xmm9 > + movaps XMMWORD[(-120)+rax],xmm10 > + movaps XMMWORD[(-104)+rax],xmm11 > + movaps XMMWORD[(-88)+rax],xmm12 > + movaps XMMWORD[(-72)+rax],xmm13 > + movaps XMMWORD[(-56)+rax],xmm14 > + movaps XMMWORD[(-40)+rax],xmm15 > + sub rsp,288 > + and rsp,-256 > + mov QWORD[272+rsp],rax > + > +$L$body: > + lea rbp,[K_XX_XX] > + lea rbx,[256+rsp] > + > +$L$oop_grande: > + mov DWORD[280+rsp],edx > + xor edx,edx > + mov r8,QWORD[rsi] > + mov ecx,DWORD[8+rsi] > + cmp ecx,edx > + cmovg edx,ecx > + test ecx,ecx > + mov DWORD[rbx],ecx > + cmovle r8,rbp > + mov r9,QWORD[16+rsi] > + mov ecx,DWORD[24+rsi] > + cmp ecx,edx > + cmovg edx,ecx > + test ecx,ecx > + mov DWORD[4+rbx],ecx > + cmovle r9,rbp > + mov r10,QWORD[32+rsi] > + mov ecx,DWORD[40+rsi] > + cmp ecx,edx > + cmovg edx,ecx > + test ecx,ecx > + mov DWORD[8+rbx],ecx > + cmovle r10,rbp > + mov r11,QWORD[48+rsi] > + mov ecx,DWORD[56+rsi] > + cmp ecx,edx > + cmovg edx,ecx > + test ecx,ecx > + mov DWORD[12+rbx],ecx > + cmovle r11,rbp > + test edx,edx > + jz NEAR $L$done > + > + movdqu xmm10,XMMWORD[rdi] > + lea rax,[128+rsp] > + movdqu xmm11,XMMWORD[32+rdi] > + movdqu xmm12,XMMWORD[64+rdi] > + movdqu xmm13,XMMWORD[96+rdi] > + movdqu xmm14,XMMWORD[128+rdi] > + movdqa xmm5,XMMWORD[96+rbp] > + movdqa xmm15,XMMWORD[((-32))+rbp] > + jmp NEAR $L$oop > + > +ALIGN 32 > +$L$oop: > + movd xmm0,DWORD[r8] > + lea r8,[64+r8] > + movd xmm2,DWORD[r9] > + lea r9,[64+r9] > + movd xmm3,DWORD[r10] > + lea r10,[64+r10] > + movd xmm4,DWORD[r11] > + lea r11,[64+r11] > + punpckldq xmm0,xmm3 > + movd xmm1,DWORD[((-60))+r8] > + punpckldq xmm2,xmm4 > + movd xmm9,DWORD[((-60))+r9] > + punpckldq xmm0,xmm2 > + movd xmm8,DWORD[((-60))+r10] > +DB 102,15,56,0,197 > + movd xmm7,DWORD[((-60))+r11] > + punpckldq xmm1,xmm8 > + movdqa xmm8,xmm10 > + paddd xmm14,xmm15 > + punpckldq xmm9,xmm7 > + movdqa xmm7,xmm11 > + movdqa xmm6,xmm11 > + pslld xmm8,5 > + pandn xmm7,xmm13 > + pand xmm6,xmm12 > + punpckldq xmm1,xmm9 > + movdqa xmm9,xmm10 > + > + movdqa XMMWORD[(0-128)+rax],xmm0 > + paddd xmm14,xmm0 > + movd xmm2,DWORD[((-56))+r8] > + psrld xmm9,27 > + pxor xmm6,xmm7 > + movdqa xmm7,xmm11 > + > + por xmm8,xmm9 > + movd xmm9,DWORD[((-56))+r9] > + pslld xmm7,30 > + paddd xmm14,xmm6 > + > + psrld xmm11,2 > + paddd xmm14,xmm8 > +DB 102,15,56,0,205 > + movd xmm8,DWORD[((-56))+r10] > + por xmm11,xmm7 > + movd xmm7,DWORD[((-56))+r11] > + punpckldq xmm2,xmm8 > + movdqa xmm8,xmm14 > + paddd xmm13,xmm15 > + punpckldq xmm9,xmm7 > + movdqa xmm7,xmm10 > + movdqa xmm6,xmm10 > + pslld xmm8,5 > + pandn xmm7,xmm12 > + pand xmm6,xmm11 > + punpckldq xmm2,xmm9 > + movdqa xmm9,xmm14 > + > + movdqa XMMWORD[(16-128)+rax],xmm1 > + paddd xmm13,xmm1 > + movd xmm3,DWORD[((-52))+r8] > + psrld xmm9,27 > + pxor xmm6,xmm7 > + movdqa 
xmm7,xmm10 > + > + por xmm8,xmm9 > + movd xmm9,DWORD[((-52))+r9] > + pslld xmm7,30 > + paddd xmm13,xmm6 > + > + psrld xmm10,2 > + paddd xmm13,xmm8 > +DB 102,15,56,0,213 > + movd xmm8,DWORD[((-52))+r10] > + por xmm10,xmm7 > + movd xmm7,DWORD[((-52))+r11] > + punpckldq xmm3,xmm8 > + movdqa xmm8,xmm13 > + paddd xmm12,xmm15 > + punpckldq xmm9,xmm7 > + movdqa xmm7,xmm14 > + movdqa xmm6,xmm14 > + pslld xmm8,5 > + pandn xmm7,xmm11 > + pand xmm6,xmm10 > + punpckldq xmm3,xmm9 > + movdqa xmm9,xmm13 > + > + movdqa XMMWORD[(32-128)+rax],xmm2 > + paddd xmm12,xmm2 > + movd xmm4,DWORD[((-48))+r8] > + psrld xmm9,27 > + pxor xmm6,xmm7 > + movdqa xmm7,xmm14 > + > + por xmm8,xmm9 > + movd xmm9,DWORD[((-48))+r9] > + pslld xmm7,30 > + paddd xmm12,xmm6 > + > + psrld xmm14,2 > + paddd xmm12,xmm8 > +DB 102,15,56,0,221 > + movd xmm8,DWORD[((-48))+r10] > + por xmm14,xmm7 > + movd xmm7,DWORD[((-48))+r11] > + punpckldq xmm4,xmm8 > + movdqa xmm8,xmm12 > + paddd xmm11,xmm15 > + punpckldq xmm9,xmm7 > + movdqa xmm7,xmm13 > + movdqa xmm6,xmm13 > + pslld xmm8,5 > + pandn xmm7,xmm10 > + pand xmm6,xmm14 > + punpckldq xmm4,xmm9 > + movdqa xmm9,xmm12 > + > + movdqa XMMWORD[(48-128)+rax],xmm3 > + paddd xmm11,xmm3 > + movd xmm0,DWORD[((-44))+r8] > + psrld xmm9,27 > + pxor xmm6,xmm7 > + movdqa xmm7,xmm13 > + > + por xmm8,xmm9 > + movd xmm9,DWORD[((-44))+r9] > + pslld xmm7,30 > + paddd xmm11,xmm6 > + > + psrld xmm13,2 > + paddd xmm11,xmm8 > +DB 102,15,56,0,229 > + movd xmm8,DWORD[((-44))+r10] > + por xmm13,xmm7 > + movd xmm7,DWORD[((-44))+r11] > + punpckldq xmm0,xmm8 > + movdqa xmm8,xmm11 > + paddd xmm10,xmm15 > + punpckldq xmm9,xmm7 > + movdqa xmm7,xmm12 > + movdqa xmm6,xmm12 > + pslld xmm8,5 > + pandn xmm7,xmm14 > + pand xmm6,xmm13 > + punpckldq xmm0,xmm9 > + movdqa xmm9,xmm11 > + > + movdqa XMMWORD[(64-128)+rax],xmm4 > + paddd xmm10,xmm4 > + movd xmm1,DWORD[((-40))+r8] > + psrld xmm9,27 > + pxor xmm6,xmm7 > + movdqa xmm7,xmm12 > + > + por xmm8,xmm9 > + movd xmm9,DWORD[((-40))+r9] > + pslld xmm7,30 > + paddd xmm10,xmm6 > + > + psrld xmm12,2 > + paddd xmm10,xmm8 > +DB 102,15,56,0,197 > + movd xmm8,DWORD[((-40))+r10] > + por xmm12,xmm7 > + movd xmm7,DWORD[((-40))+r11] > + punpckldq xmm1,xmm8 > + movdqa xmm8,xmm10 > + paddd xmm14,xmm15 > + punpckldq xmm9,xmm7 > + movdqa xmm7,xmm11 > + movdqa xmm6,xmm11 > + pslld xmm8,5 > + pandn xmm7,xmm13 > + pand xmm6,xmm12 > + punpckldq xmm1,xmm9 > + movdqa xmm9,xmm10 > + > + movdqa XMMWORD[(80-128)+rax],xmm0 > + paddd xmm14,xmm0 > + movd xmm2,DWORD[((-36))+r8] > + psrld xmm9,27 > + pxor xmm6,xmm7 > + movdqa xmm7,xmm11 > + > + por xmm8,xmm9 > + movd xmm9,DWORD[((-36))+r9] > + pslld xmm7,30 > + paddd xmm14,xmm6 > + > + psrld xmm11,2 > + paddd xmm14,xmm8 > +DB 102,15,56,0,205 > + movd xmm8,DWORD[((-36))+r10] > + por xmm11,xmm7 > + movd xmm7,DWORD[((-36))+r11] > + punpckldq xmm2,xmm8 > + movdqa xmm8,xmm14 > + paddd xmm13,xmm15 > + punpckldq xmm9,xmm7 > + movdqa xmm7,xmm10 > + movdqa xmm6,xmm10 > + pslld xmm8,5 > + pandn xmm7,xmm12 > + pand xmm6,xmm11 > + punpckldq xmm2,xmm9 > + movdqa xmm9,xmm14 > + > + movdqa XMMWORD[(96-128)+rax],xmm1 > + paddd xmm13,xmm1 > + movd xmm3,DWORD[((-32))+r8] > + psrld xmm9,27 > + pxor xmm6,xmm7 > + movdqa xmm7,xmm10 > + > + por xmm8,xmm9 > + movd xmm9,DWORD[((-32))+r9] > + pslld xmm7,30 > + paddd xmm13,xmm6 > + > + psrld xmm10,2 > + paddd xmm13,xmm8 > +DB 102,15,56,0,213 > + movd xmm8,DWORD[((-32))+r10] > + por xmm10,xmm7 > + movd xmm7,DWORD[((-32))+r11] > + punpckldq xmm3,xmm8 > + movdqa xmm8,xmm13 > + paddd xmm12,xmm15 > + punpckldq xmm9,xmm7 > + movdqa xmm7,xmm14 > 
+ movdqa xmm6,xmm14 > + pslld xmm8,5 > + pandn xmm7,xmm11 > + pand xmm6,xmm10 > + punpckldq xmm3,xmm9 > + movdqa xmm9,xmm13 > + > + movdqa XMMWORD[(112-128)+rax],xmm2 > + paddd xmm12,xmm2 > + movd xmm4,DWORD[((-28))+r8] > + psrld xmm9,27 > + pxor xmm6,xmm7 > + movdqa xmm7,xmm14 > + > + por xmm8,xmm9 > + movd xmm9,DWORD[((-28))+r9] > + pslld xmm7,30 > + paddd xmm12,xmm6 > + > + psrld xmm14,2 > + paddd xmm12,xmm8 > +DB 102,15,56,0,221 > + movd xmm8,DWORD[((-28))+r10] > + por xmm14,xmm7 > + movd xmm7,DWORD[((-28))+r11] > + punpckldq xmm4,xmm8 > + movdqa xmm8,xmm12 > + paddd xmm11,xmm15 > + punpckldq xmm9,xmm7 > + movdqa xmm7,xmm13 > + movdqa xmm6,xmm13 > + pslld xmm8,5 > + pandn xmm7,xmm10 > + pand xmm6,xmm14 > + punpckldq xmm4,xmm9 > + movdqa xmm9,xmm12 > + > + movdqa XMMWORD[(128-128)+rax],xmm3 > + paddd xmm11,xmm3 > + movd xmm0,DWORD[((-24))+r8] > + psrld xmm9,27 > + pxor xmm6,xmm7 > + movdqa xmm7,xmm13 > + > + por xmm8,xmm9 > + movd xmm9,DWORD[((-24))+r9] > + pslld xmm7,30 > + paddd xmm11,xmm6 > + > + psrld xmm13,2 > + paddd xmm11,xmm8 > +DB 102,15,56,0,229 > + movd xmm8,DWORD[((-24))+r10] > + por xmm13,xmm7 > + movd xmm7,DWORD[((-24))+r11] > + punpckldq xmm0,xmm8 > + movdqa xmm8,xmm11 > + paddd xmm10,xmm15 > + punpckldq xmm9,xmm7 > + movdqa xmm7,xmm12 > + movdqa xmm6,xmm12 > + pslld xmm8,5 > + pandn xmm7,xmm14 > + pand xmm6,xmm13 > + punpckldq xmm0,xmm9 > + movdqa xmm9,xmm11 > + > + movdqa XMMWORD[(144-128)+rax],xmm4 > + paddd xmm10,xmm4 > + movd xmm1,DWORD[((-20))+r8] > + psrld xmm9,27 > + pxor xmm6,xmm7 > + movdqa xmm7,xmm12 > + > + por xmm8,xmm9 > + movd xmm9,DWORD[((-20))+r9] > + pslld xmm7,30 > + paddd xmm10,xmm6 > + > + psrld xmm12,2 > + paddd xmm10,xmm8 > +DB 102,15,56,0,197 > + movd xmm8,DWORD[((-20))+r10] > + por xmm12,xmm7 > + movd xmm7,DWORD[((-20))+r11] > + punpckldq xmm1,xmm8 > + movdqa xmm8,xmm10 > + paddd xmm14,xmm15 > + punpckldq xmm9,xmm7 > + movdqa xmm7,xmm11 > + movdqa xmm6,xmm11 > + pslld xmm8,5 > + pandn xmm7,xmm13 > + pand xmm6,xmm12 > + punpckldq xmm1,xmm9 > + movdqa xmm9,xmm10 > + > + movdqa XMMWORD[(160-128)+rax],xmm0 > + paddd xmm14,xmm0 > + movd xmm2,DWORD[((-16))+r8] > + psrld xmm9,27 > + pxor xmm6,xmm7 > + movdqa xmm7,xmm11 > + > + por xmm8,xmm9 > + movd xmm9,DWORD[((-16))+r9] > + pslld xmm7,30 > + paddd xmm14,xmm6 > + > + psrld xmm11,2 > + paddd xmm14,xmm8 > +DB 102,15,56,0,205 > + movd xmm8,DWORD[((-16))+r10] > + por xmm11,xmm7 > + movd xmm7,DWORD[((-16))+r11] > + punpckldq xmm2,xmm8 > + movdqa xmm8,xmm14 > + paddd xmm13,xmm15 > + punpckldq xmm9,xmm7 > + movdqa xmm7,xmm10 > + movdqa xmm6,xmm10 > + pslld xmm8,5 > + pandn xmm7,xmm12 > + pand xmm6,xmm11 > + punpckldq xmm2,xmm9 > + movdqa xmm9,xmm14 > + > + movdqa XMMWORD[(176-128)+rax],xmm1 > + paddd xmm13,xmm1 > + movd xmm3,DWORD[((-12))+r8] > + psrld xmm9,27 > + pxor xmm6,xmm7 > + movdqa xmm7,xmm10 > + > + por xmm8,xmm9 > + movd xmm9,DWORD[((-12))+r9] > + pslld xmm7,30 > + paddd xmm13,xmm6 > + > + psrld xmm10,2 > + paddd xmm13,xmm8 > +DB 102,15,56,0,213 > + movd xmm8,DWORD[((-12))+r10] > + por xmm10,xmm7 > + movd xmm7,DWORD[((-12))+r11] > + punpckldq xmm3,xmm8 > + movdqa xmm8,xmm13 > + paddd xmm12,xmm15 > + punpckldq xmm9,xmm7 > + movdqa xmm7,xmm14 > + movdqa xmm6,xmm14 > + pslld xmm8,5 > + pandn xmm7,xmm11 > + pand xmm6,xmm10 > + punpckldq xmm3,xmm9 > + movdqa xmm9,xmm13 > + > + movdqa XMMWORD[(192-128)+rax],xmm2 > + paddd xmm12,xmm2 > + movd xmm4,DWORD[((-8))+r8] > + psrld xmm9,27 > + pxor xmm6,xmm7 > + movdqa xmm7,xmm14 > + > + por xmm8,xmm9 > + movd xmm9,DWORD[((-8))+r9] > + pslld xmm7,30 > + paddd 
xmm12,xmm6 > + > + psrld xmm14,2 > + paddd xmm12,xmm8 > +DB 102,15,56,0,221 > + movd xmm8,DWORD[((-8))+r10] > + por xmm14,xmm7 > + movd xmm7,DWORD[((-8))+r11] > + punpckldq xmm4,xmm8 > + movdqa xmm8,xmm12 > + paddd xmm11,xmm15 > + punpckldq xmm9,xmm7 > + movdqa xmm7,xmm13 > + movdqa xmm6,xmm13 > + pslld xmm8,5 > + pandn xmm7,xmm10 > + pand xmm6,xmm14 > + punpckldq xmm4,xmm9 > + movdqa xmm9,xmm12 > + > + movdqa XMMWORD[(208-128)+rax],xmm3 > + paddd xmm11,xmm3 > + movd xmm0,DWORD[((-4))+r8] > + psrld xmm9,27 > + pxor xmm6,xmm7 > + movdqa xmm7,xmm13 > + > + por xmm8,xmm9 > + movd xmm9,DWORD[((-4))+r9] > + pslld xmm7,30 > + paddd xmm11,xmm6 > + > + psrld xmm13,2 > + paddd xmm11,xmm8 > +DB 102,15,56,0,229 > + movd xmm8,DWORD[((-4))+r10] > + por xmm13,xmm7 > + movdqa xmm1,XMMWORD[((0-128))+rax] > + movd xmm7,DWORD[((-4))+r11] > + punpckldq xmm0,xmm8 > + movdqa xmm8,xmm11 > + paddd xmm10,xmm15 > + punpckldq xmm9,xmm7 > + movdqa xmm7,xmm12 > + movdqa xmm6,xmm12 > + pslld xmm8,5 > + prefetcht0 [63+r8] > + pandn xmm7,xmm14 > + pand xmm6,xmm13 > + punpckldq xmm0,xmm9 > + movdqa xmm9,xmm11 > + > + movdqa XMMWORD[(224-128)+rax],xmm4 > + paddd xmm10,xmm4 > + psrld xmm9,27 > + pxor xmm6,xmm7 > + movdqa xmm7,xmm12 > + prefetcht0 [63+r9] > + > + por xmm8,xmm9 > + pslld xmm7,30 > + paddd xmm10,xmm6 > + prefetcht0 [63+r10] > + > + psrld xmm12,2 > + paddd xmm10,xmm8 > +DB 102,15,56,0,197 > + prefetcht0 [63+r11] > + por xmm12,xmm7 > + movdqa xmm2,XMMWORD[((16-128))+rax] > + pxor xmm1,xmm3 > + movdqa xmm3,XMMWORD[((32-128))+rax] > + > + movdqa xmm8,xmm10 > + pxor xmm1,XMMWORD[((128-128))+rax] > + paddd xmm14,xmm15 > + movdqa xmm7,xmm11 > + pslld xmm8,5 > + pxor xmm1,xmm3 > + movdqa xmm6,xmm11 > + pandn xmm7,xmm13 > + movdqa xmm5,xmm1 > + pand xmm6,xmm12 > + movdqa xmm9,xmm10 > + psrld xmm5,31 > + paddd xmm1,xmm1 > + > + movdqa XMMWORD[(240-128)+rax],xmm0 > + paddd xmm14,xmm0 > + psrld xmm9,27 > + pxor xmm6,xmm7 > + > + movdqa xmm7,xmm11 > + por xmm8,xmm9 > + pslld xmm7,30 > + paddd xmm14,xmm6 > + > + psrld xmm11,2 > + paddd xmm14,xmm8 > + por xmm1,xmm5 > + por xmm11,xmm7 > + pxor xmm2,xmm4 > + movdqa xmm4,XMMWORD[((48-128))+rax] > + > + movdqa xmm8,xmm14 > + pxor xmm2,XMMWORD[((144-128))+rax] > + paddd xmm13,xmm15 > + movdqa xmm7,xmm10 > + pslld xmm8,5 > + pxor xmm2,xmm4 > + movdqa xmm6,xmm10 > + pandn xmm7,xmm12 > + movdqa xmm5,xmm2 > + pand xmm6,xmm11 > + movdqa xmm9,xmm14 > + psrld xmm5,31 > + paddd xmm2,xmm2 > + > + movdqa XMMWORD[(0-128)+rax],xmm1 > + paddd xmm13,xmm1 > + psrld xmm9,27 > + pxor xmm6,xmm7 > + > + movdqa xmm7,xmm10 > + por xmm8,xmm9 > + pslld xmm7,30 > + paddd xmm13,xmm6 > + > + psrld xmm10,2 > + paddd xmm13,xmm8 > + por xmm2,xmm5 > + por xmm10,xmm7 > + pxor xmm3,xmm0 > + movdqa xmm0,XMMWORD[((64-128))+rax] > + > + movdqa xmm8,xmm13 > + pxor xmm3,XMMWORD[((160-128))+rax] > + paddd xmm12,xmm15 > + movdqa xmm7,xmm14 > + pslld xmm8,5 > + pxor xmm3,xmm0 > + movdqa xmm6,xmm14 > + pandn xmm7,xmm11 > + movdqa xmm5,xmm3 > + pand xmm6,xmm10 > + movdqa xmm9,xmm13 > + psrld xmm5,31 > + paddd xmm3,xmm3 > + > + movdqa XMMWORD[(16-128)+rax],xmm2 > + paddd xmm12,xmm2 > + psrld xmm9,27 > + pxor xmm6,xmm7 > + > + movdqa xmm7,xmm14 > + por xmm8,xmm9 > + pslld xmm7,30 > + paddd xmm12,xmm6 > + > + psrld xmm14,2 > + paddd xmm12,xmm8 > + por xmm3,xmm5 > + por xmm14,xmm7 > + pxor xmm4,xmm1 > + movdqa xmm1,XMMWORD[((80-128))+rax] > + > + movdqa xmm8,xmm12 > + pxor xmm4,XMMWORD[((176-128))+rax] > + paddd xmm11,xmm15 > + movdqa xmm7,xmm13 > + pslld xmm8,5 > + pxor xmm4,xmm1 > + movdqa xmm6,xmm13 > + pandn xmm7,xmm10 
> + movdqa xmm5,xmm4 > + pand xmm6,xmm14 > + movdqa xmm9,xmm12 > + psrld xmm5,31 > + paddd xmm4,xmm4 > + > + movdqa XMMWORD[(32-128)+rax],xmm3 > + paddd xmm11,xmm3 > + psrld xmm9,27 > + pxor xmm6,xmm7 > + > + movdqa xmm7,xmm13 > + por xmm8,xmm9 > + pslld xmm7,30 > + paddd xmm11,xmm6 > + > + psrld xmm13,2 > + paddd xmm11,xmm8 > + por xmm4,xmm5 > + por xmm13,xmm7 > + pxor xmm0,xmm2 > + movdqa xmm2,XMMWORD[((96-128))+rax] > + > + movdqa xmm8,xmm11 > + pxor xmm0,XMMWORD[((192-128))+rax] > + paddd xmm10,xmm15 > + movdqa xmm7,xmm12 > + pslld xmm8,5 > + pxor xmm0,xmm2 > + movdqa xmm6,xmm12 > + pandn xmm7,xmm14 > + movdqa xmm5,xmm0 > + pand xmm6,xmm13 > + movdqa xmm9,xmm11 > + psrld xmm5,31 > + paddd xmm0,xmm0 > + > + movdqa XMMWORD[(48-128)+rax],xmm4 > + paddd xmm10,xmm4 > + psrld xmm9,27 > + pxor xmm6,xmm7 > + > + movdqa xmm7,xmm12 > + por xmm8,xmm9 > + pslld xmm7,30 > + paddd xmm10,xmm6 > + > + psrld xmm12,2 > + paddd xmm10,xmm8 > + por xmm0,xmm5 > + por xmm12,xmm7 > + movdqa xmm15,XMMWORD[rbp] > + pxor xmm1,xmm3 > + movdqa xmm3,XMMWORD[((112-128))+rax] > + > + movdqa xmm8,xmm10 > + movdqa xmm6,xmm13 > + pxor xmm1,XMMWORD[((208-128))+rax] > + paddd xmm14,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm11 > + > + movdqa xmm9,xmm10 > + movdqa XMMWORD[(64-128)+rax],xmm0 > + paddd xmm14,xmm0 > + pxor xmm1,xmm3 > + psrld xmm9,27 > + pxor xmm6,xmm12 > + movdqa xmm7,xmm11 > + > + pslld xmm7,30 > + movdqa xmm5,xmm1 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm14,xmm6 > + paddd xmm1,xmm1 > + > + psrld xmm11,2 > + paddd xmm14,xmm8 > + por xmm1,xmm5 > + por xmm11,xmm7 > + pxor xmm2,xmm4 > + movdqa xmm4,XMMWORD[((128-128))+rax] > + > + movdqa xmm8,xmm14 > + movdqa xmm6,xmm12 > + pxor xmm2,XMMWORD[((224-128))+rax] > + paddd xmm13,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm10 > + > + movdqa xmm9,xmm14 > + movdqa XMMWORD[(80-128)+rax],xmm1 > + paddd xmm13,xmm1 > + pxor xmm2,xmm4 > + psrld xmm9,27 > + pxor xmm6,xmm11 > + movdqa xmm7,xmm10 > + > + pslld xmm7,30 > + movdqa xmm5,xmm2 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm13,xmm6 > + paddd xmm2,xmm2 > + > + psrld xmm10,2 > + paddd xmm13,xmm8 > + por xmm2,xmm5 > + por xmm10,xmm7 > + pxor xmm3,xmm0 > + movdqa xmm0,XMMWORD[((144-128))+rax] > + > + movdqa xmm8,xmm13 > + movdqa xmm6,xmm11 > + pxor xmm3,XMMWORD[((240-128))+rax] > + paddd xmm12,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm14 > + > + movdqa xmm9,xmm13 > + movdqa XMMWORD[(96-128)+rax],xmm2 > + paddd xmm12,xmm2 > + pxor xmm3,xmm0 > + psrld xmm9,27 > + pxor xmm6,xmm10 > + movdqa xmm7,xmm14 > + > + pslld xmm7,30 > + movdqa xmm5,xmm3 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm12,xmm6 > + paddd xmm3,xmm3 > + > + psrld xmm14,2 > + paddd xmm12,xmm8 > + por xmm3,xmm5 > + por xmm14,xmm7 > + pxor xmm4,xmm1 > + movdqa xmm1,XMMWORD[((160-128))+rax] > + > + movdqa xmm8,xmm12 > + movdqa xmm6,xmm10 > + pxor xmm4,XMMWORD[((0-128))+rax] > + paddd xmm11,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm13 > + > + movdqa xmm9,xmm12 > + movdqa XMMWORD[(112-128)+rax],xmm3 > + paddd xmm11,xmm3 > + pxor xmm4,xmm1 > + psrld xmm9,27 > + pxor xmm6,xmm14 > + movdqa xmm7,xmm13 > + > + pslld xmm7,30 > + movdqa xmm5,xmm4 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm11,xmm6 > + paddd xmm4,xmm4 > + > + psrld xmm13,2 > + paddd xmm11,xmm8 > + por xmm4,xmm5 > + por xmm13,xmm7 > + pxor xmm0,xmm2 > + movdqa xmm2,XMMWORD[((176-128))+rax] > + > + movdqa xmm8,xmm11 > + movdqa xmm6,xmm14 > + pxor xmm0,XMMWORD[((16-128))+rax] > + paddd xmm10,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm12 > + > + movdqa xmm9,xmm11 > + movdqa XMMWORD[(128-128)+rax],xmm4 
> + paddd xmm10,xmm4 > + pxor xmm0,xmm2 > + psrld xmm9,27 > + pxor xmm6,xmm13 > + movdqa xmm7,xmm12 > + > + pslld xmm7,30 > + movdqa xmm5,xmm0 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm10,xmm6 > + paddd xmm0,xmm0 > + > + psrld xmm12,2 > + paddd xmm10,xmm8 > + por xmm0,xmm5 > + por xmm12,xmm7 > + pxor xmm1,xmm3 > + movdqa xmm3,XMMWORD[((192-128))+rax] > + > + movdqa xmm8,xmm10 > + movdqa xmm6,xmm13 > + pxor xmm1,XMMWORD[((32-128))+rax] > + paddd xmm14,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm11 > + > + movdqa xmm9,xmm10 > + movdqa XMMWORD[(144-128)+rax],xmm0 > + paddd xmm14,xmm0 > + pxor xmm1,xmm3 > + psrld xmm9,27 > + pxor xmm6,xmm12 > + movdqa xmm7,xmm11 > + > + pslld xmm7,30 > + movdqa xmm5,xmm1 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm14,xmm6 > + paddd xmm1,xmm1 > + > + psrld xmm11,2 > + paddd xmm14,xmm8 > + por xmm1,xmm5 > + por xmm11,xmm7 > + pxor xmm2,xmm4 > + movdqa xmm4,XMMWORD[((208-128))+rax] > + > + movdqa xmm8,xmm14 > + movdqa xmm6,xmm12 > + pxor xmm2,XMMWORD[((48-128))+rax] > + paddd xmm13,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm10 > + > + movdqa xmm9,xmm14 > + movdqa XMMWORD[(160-128)+rax],xmm1 > + paddd xmm13,xmm1 > + pxor xmm2,xmm4 > + psrld xmm9,27 > + pxor xmm6,xmm11 > + movdqa xmm7,xmm10 > + > + pslld xmm7,30 > + movdqa xmm5,xmm2 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm13,xmm6 > + paddd xmm2,xmm2 > + > + psrld xmm10,2 > + paddd xmm13,xmm8 > + por xmm2,xmm5 > + por xmm10,xmm7 > + pxor xmm3,xmm0 > + movdqa xmm0,XMMWORD[((224-128))+rax] > + > + movdqa xmm8,xmm13 > + movdqa xmm6,xmm11 > + pxor xmm3,XMMWORD[((64-128))+rax] > + paddd xmm12,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm14 > + > + movdqa xmm9,xmm13 > + movdqa XMMWORD[(176-128)+rax],xmm2 > + paddd xmm12,xmm2 > + pxor xmm3,xmm0 > + psrld xmm9,27 > + pxor xmm6,xmm10 > + movdqa xmm7,xmm14 > + > + pslld xmm7,30 > + movdqa xmm5,xmm3 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm12,xmm6 > + paddd xmm3,xmm3 > + > + psrld xmm14,2 > + paddd xmm12,xmm8 > + por xmm3,xmm5 > + por xmm14,xmm7 > + pxor xmm4,xmm1 > + movdqa xmm1,XMMWORD[((240-128))+rax] > + > + movdqa xmm8,xmm12 > + movdqa xmm6,xmm10 > + pxor xmm4,XMMWORD[((80-128))+rax] > + paddd xmm11,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm13 > + > + movdqa xmm9,xmm12 > + movdqa XMMWORD[(192-128)+rax],xmm3 > + paddd xmm11,xmm3 > + pxor xmm4,xmm1 > + psrld xmm9,27 > + pxor xmm6,xmm14 > + movdqa xmm7,xmm13 > + > + pslld xmm7,30 > + movdqa xmm5,xmm4 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm11,xmm6 > + paddd xmm4,xmm4 > + > + psrld xmm13,2 > + paddd xmm11,xmm8 > + por xmm4,xmm5 > + por xmm13,xmm7 > + pxor xmm0,xmm2 > + movdqa xmm2,XMMWORD[((0-128))+rax] > + > + movdqa xmm8,xmm11 > + movdqa xmm6,xmm14 > + pxor xmm0,XMMWORD[((96-128))+rax] > + paddd xmm10,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm12 > + > + movdqa xmm9,xmm11 > + movdqa XMMWORD[(208-128)+rax],xmm4 > + paddd xmm10,xmm4 > + pxor xmm0,xmm2 > + psrld xmm9,27 > + pxor xmm6,xmm13 > + movdqa xmm7,xmm12 > + > + pslld xmm7,30 > + movdqa xmm5,xmm0 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm10,xmm6 > + paddd xmm0,xmm0 > + > + psrld xmm12,2 > + paddd xmm10,xmm8 > + por xmm0,xmm5 > + por xmm12,xmm7 > + pxor xmm1,xmm3 > + movdqa xmm3,XMMWORD[((16-128))+rax] > + > + movdqa xmm8,xmm10 > + movdqa xmm6,xmm13 > + pxor xmm1,XMMWORD[((112-128))+rax] > + paddd xmm14,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm11 > + > + movdqa xmm9,xmm10 > + movdqa XMMWORD[(224-128)+rax],xmm0 > + paddd xmm14,xmm0 > + pxor xmm1,xmm3 > + psrld xmm9,27 > + pxor xmm6,xmm12 > + movdqa xmm7,xmm11 > + > + pslld xmm7,30 > + movdqa 
xmm5,xmm1 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm14,xmm6 > + paddd xmm1,xmm1 > + > + psrld xmm11,2 > + paddd xmm14,xmm8 > + por xmm1,xmm5 > + por xmm11,xmm7 > + pxor xmm2,xmm4 > + movdqa xmm4,XMMWORD[((32-128))+rax] > + > + movdqa xmm8,xmm14 > + movdqa xmm6,xmm12 > + pxor xmm2,XMMWORD[((128-128))+rax] > + paddd xmm13,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm10 > + > + movdqa xmm9,xmm14 > + movdqa XMMWORD[(240-128)+rax],xmm1 > + paddd xmm13,xmm1 > + pxor xmm2,xmm4 > + psrld xmm9,27 > + pxor xmm6,xmm11 > + movdqa xmm7,xmm10 > + > + pslld xmm7,30 > + movdqa xmm5,xmm2 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm13,xmm6 > + paddd xmm2,xmm2 > + > + psrld xmm10,2 > + paddd xmm13,xmm8 > + por xmm2,xmm5 > + por xmm10,xmm7 > + pxor xmm3,xmm0 > + movdqa xmm0,XMMWORD[((48-128))+rax] > + > + movdqa xmm8,xmm13 > + movdqa xmm6,xmm11 > + pxor xmm3,XMMWORD[((144-128))+rax] > + paddd xmm12,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm14 > + > + movdqa xmm9,xmm13 > + movdqa XMMWORD[(0-128)+rax],xmm2 > + paddd xmm12,xmm2 > + pxor xmm3,xmm0 > + psrld xmm9,27 > + pxor xmm6,xmm10 > + movdqa xmm7,xmm14 > + > + pslld xmm7,30 > + movdqa xmm5,xmm3 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm12,xmm6 > + paddd xmm3,xmm3 > + > + psrld xmm14,2 > + paddd xmm12,xmm8 > + por xmm3,xmm5 > + por xmm14,xmm7 > + pxor xmm4,xmm1 > + movdqa xmm1,XMMWORD[((64-128))+rax] > + > + movdqa xmm8,xmm12 > + movdqa xmm6,xmm10 > + pxor xmm4,XMMWORD[((160-128))+rax] > + paddd xmm11,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm13 > + > + movdqa xmm9,xmm12 > + movdqa XMMWORD[(16-128)+rax],xmm3 > + paddd xmm11,xmm3 > + pxor xmm4,xmm1 > + psrld xmm9,27 > + pxor xmm6,xmm14 > + movdqa xmm7,xmm13 > + > + pslld xmm7,30 > + movdqa xmm5,xmm4 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm11,xmm6 > + paddd xmm4,xmm4 > + > + psrld xmm13,2 > + paddd xmm11,xmm8 > + por xmm4,xmm5 > + por xmm13,xmm7 > + pxor xmm0,xmm2 > + movdqa xmm2,XMMWORD[((80-128))+rax] > + > + movdqa xmm8,xmm11 > + movdqa xmm6,xmm14 > + pxor xmm0,XMMWORD[((176-128))+rax] > + paddd xmm10,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm12 > + > + movdqa xmm9,xmm11 > + movdqa XMMWORD[(32-128)+rax],xmm4 > + paddd xmm10,xmm4 > + pxor xmm0,xmm2 > + psrld xmm9,27 > + pxor xmm6,xmm13 > + movdqa xmm7,xmm12 > + > + pslld xmm7,30 > + movdqa xmm5,xmm0 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm10,xmm6 > + paddd xmm0,xmm0 > + > + psrld xmm12,2 > + paddd xmm10,xmm8 > + por xmm0,xmm5 > + por xmm12,xmm7 > + pxor xmm1,xmm3 > + movdqa xmm3,XMMWORD[((96-128))+rax] > + > + movdqa xmm8,xmm10 > + movdqa xmm6,xmm13 > + pxor xmm1,XMMWORD[((192-128))+rax] > + paddd xmm14,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm11 > + > + movdqa xmm9,xmm10 > + movdqa XMMWORD[(48-128)+rax],xmm0 > + paddd xmm14,xmm0 > + pxor xmm1,xmm3 > + psrld xmm9,27 > + pxor xmm6,xmm12 > + movdqa xmm7,xmm11 > + > + pslld xmm7,30 > + movdqa xmm5,xmm1 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm14,xmm6 > + paddd xmm1,xmm1 > + > + psrld xmm11,2 > + paddd xmm14,xmm8 > + por xmm1,xmm5 > + por xmm11,xmm7 > + pxor xmm2,xmm4 > + movdqa xmm4,XMMWORD[((112-128))+rax] > + > + movdqa xmm8,xmm14 > + movdqa xmm6,xmm12 > + pxor xmm2,XMMWORD[((208-128))+rax] > + paddd xmm13,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm10 > + > + movdqa xmm9,xmm14 > + movdqa XMMWORD[(64-128)+rax],xmm1 > + paddd xmm13,xmm1 > + pxor xmm2,xmm4 > + psrld xmm9,27 > + pxor xmm6,xmm11 > + movdqa xmm7,xmm10 > + > + pslld xmm7,30 > + movdqa xmm5,xmm2 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm13,xmm6 > + paddd xmm2,xmm2 > + > + psrld xmm10,2 > + paddd xmm13,xmm8 > + por 
xmm2,xmm5 > + por xmm10,xmm7 > + pxor xmm3,xmm0 > + movdqa xmm0,XMMWORD[((128-128))+rax] > + > + movdqa xmm8,xmm13 > + movdqa xmm6,xmm11 > + pxor xmm3,XMMWORD[((224-128))+rax] > + paddd xmm12,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm14 > + > + movdqa xmm9,xmm13 > + movdqa XMMWORD[(80-128)+rax],xmm2 > + paddd xmm12,xmm2 > + pxor xmm3,xmm0 > + psrld xmm9,27 > + pxor xmm6,xmm10 > + movdqa xmm7,xmm14 > + > + pslld xmm7,30 > + movdqa xmm5,xmm3 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm12,xmm6 > + paddd xmm3,xmm3 > + > + psrld xmm14,2 > + paddd xmm12,xmm8 > + por xmm3,xmm5 > + por xmm14,xmm7 > + pxor xmm4,xmm1 > + movdqa xmm1,XMMWORD[((144-128))+rax] > + > + movdqa xmm8,xmm12 > + movdqa xmm6,xmm10 > + pxor xmm4,XMMWORD[((240-128))+rax] > + paddd xmm11,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm13 > + > + movdqa xmm9,xmm12 > + movdqa XMMWORD[(96-128)+rax],xmm3 > + paddd xmm11,xmm3 > + pxor xmm4,xmm1 > + psrld xmm9,27 > + pxor xmm6,xmm14 > + movdqa xmm7,xmm13 > + > + pslld xmm7,30 > + movdqa xmm5,xmm4 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm11,xmm6 > + paddd xmm4,xmm4 > + > + psrld xmm13,2 > + paddd xmm11,xmm8 > + por xmm4,xmm5 > + por xmm13,xmm7 > + pxor xmm0,xmm2 > + movdqa xmm2,XMMWORD[((160-128))+rax] > + > + movdqa xmm8,xmm11 > + movdqa xmm6,xmm14 > + pxor xmm0,XMMWORD[((0-128))+rax] > + paddd xmm10,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm12 > + > + movdqa xmm9,xmm11 > + movdqa XMMWORD[(112-128)+rax],xmm4 > + paddd xmm10,xmm4 > + pxor xmm0,xmm2 > + psrld xmm9,27 > + pxor xmm6,xmm13 > + movdqa xmm7,xmm12 > + > + pslld xmm7,30 > + movdqa xmm5,xmm0 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm10,xmm6 > + paddd xmm0,xmm0 > + > + psrld xmm12,2 > + paddd xmm10,xmm8 > + por xmm0,xmm5 > + por xmm12,xmm7 > + movdqa xmm15,XMMWORD[32+rbp] > + pxor xmm1,xmm3 > + movdqa xmm3,XMMWORD[((176-128))+rax] > + > + movdqa xmm8,xmm10 > + movdqa xmm7,xmm13 > + pxor xmm1,XMMWORD[((16-128))+rax] > + pxor xmm1,xmm3 > + paddd xmm14,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm10 > + pand xmm7,xmm12 > + > + movdqa xmm6,xmm13 > + movdqa xmm5,xmm1 > + psrld xmm9,27 > + paddd xmm14,xmm7 > + pxor xmm6,xmm12 > + > + movdqa XMMWORD[(128-128)+rax],xmm0 > + paddd xmm14,xmm0 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm11 > + movdqa xmm7,xmm11 > + > + pslld xmm7,30 > + paddd xmm1,xmm1 > + paddd xmm14,xmm6 > + > + psrld xmm11,2 > + paddd xmm14,xmm8 > + por xmm1,xmm5 > + por xmm11,xmm7 > + pxor xmm2,xmm4 > + movdqa xmm4,XMMWORD[((192-128))+rax] > + > + movdqa xmm8,xmm14 > + movdqa xmm7,xmm12 > + pxor xmm2,XMMWORD[((32-128))+rax] > + pxor xmm2,xmm4 > + paddd xmm13,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm14 > + pand xmm7,xmm11 > + > + movdqa xmm6,xmm12 > + movdqa xmm5,xmm2 > + psrld xmm9,27 > + paddd xmm13,xmm7 > + pxor xmm6,xmm11 > + > + movdqa XMMWORD[(144-128)+rax],xmm1 > + paddd xmm13,xmm1 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm10 > + movdqa xmm7,xmm10 > + > + pslld xmm7,30 > + paddd xmm2,xmm2 > + paddd xmm13,xmm6 > + > + psrld xmm10,2 > + paddd xmm13,xmm8 > + por xmm2,xmm5 > + por xmm10,xmm7 > + pxor xmm3,xmm0 > + movdqa xmm0,XMMWORD[((208-128))+rax] > + > + movdqa xmm8,xmm13 > + movdqa xmm7,xmm11 > + pxor xmm3,XMMWORD[((48-128))+rax] > + pxor xmm3,xmm0 > + paddd xmm12,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm13 > + pand xmm7,xmm10 > + > + movdqa xmm6,xmm11 > + movdqa xmm5,xmm3 > + psrld xmm9,27 > + paddd xmm12,xmm7 > + pxor xmm6,xmm10 > + > + movdqa XMMWORD[(160-128)+rax],xmm2 > + paddd xmm12,xmm2 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm14 > + movdqa xmm7,xmm14 > + > + pslld 
xmm7,30 > + paddd xmm3,xmm3 > + paddd xmm12,xmm6 > + > + psrld xmm14,2 > + paddd xmm12,xmm8 > + por xmm3,xmm5 > + por xmm14,xmm7 > + pxor xmm4,xmm1 > + movdqa xmm1,XMMWORD[((224-128))+rax] > + > + movdqa xmm8,xmm12 > + movdqa xmm7,xmm10 > + pxor xmm4,XMMWORD[((64-128))+rax] > + pxor xmm4,xmm1 > + paddd xmm11,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm12 > + pand xmm7,xmm14 > + > + movdqa xmm6,xmm10 > + movdqa xmm5,xmm4 > + psrld xmm9,27 > + paddd xmm11,xmm7 > + pxor xmm6,xmm14 > + > + movdqa XMMWORD[(176-128)+rax],xmm3 > + paddd xmm11,xmm3 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm13 > + movdqa xmm7,xmm13 > + > + pslld xmm7,30 > + paddd xmm4,xmm4 > + paddd xmm11,xmm6 > + > + psrld xmm13,2 > + paddd xmm11,xmm8 > + por xmm4,xmm5 > + por xmm13,xmm7 > + pxor xmm0,xmm2 > + movdqa xmm2,XMMWORD[((240-128))+rax] > + > + movdqa xmm8,xmm11 > + movdqa xmm7,xmm14 > + pxor xmm0,XMMWORD[((80-128))+rax] > + pxor xmm0,xmm2 > + paddd xmm10,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm11 > + pand xmm7,xmm13 > + > + movdqa xmm6,xmm14 > + movdqa xmm5,xmm0 > + psrld xmm9,27 > + paddd xmm10,xmm7 > + pxor xmm6,xmm13 > + > + movdqa XMMWORD[(192-128)+rax],xmm4 > + paddd xmm10,xmm4 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm12 > + movdqa xmm7,xmm12 > + > + pslld xmm7,30 > + paddd xmm0,xmm0 > + paddd xmm10,xmm6 > + > + psrld xmm12,2 > + paddd xmm10,xmm8 > + por xmm0,xmm5 > + por xmm12,xmm7 > + pxor xmm1,xmm3 > + movdqa xmm3,XMMWORD[((0-128))+rax] > + > + movdqa xmm8,xmm10 > + movdqa xmm7,xmm13 > + pxor xmm1,XMMWORD[((96-128))+rax] > + pxor xmm1,xmm3 > + paddd xmm14,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm10 > + pand xmm7,xmm12 > + > + movdqa xmm6,xmm13 > + movdqa xmm5,xmm1 > + psrld xmm9,27 > + paddd xmm14,xmm7 > + pxor xmm6,xmm12 > + > + movdqa XMMWORD[(208-128)+rax],xmm0 > + paddd xmm14,xmm0 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm11 > + movdqa xmm7,xmm11 > + > + pslld xmm7,30 > + paddd xmm1,xmm1 > + paddd xmm14,xmm6 > + > + psrld xmm11,2 > + paddd xmm14,xmm8 > + por xmm1,xmm5 > + por xmm11,xmm7 > + pxor xmm2,xmm4 > + movdqa xmm4,XMMWORD[((16-128))+rax] > + > + movdqa xmm8,xmm14 > + movdqa xmm7,xmm12 > + pxor xmm2,XMMWORD[((112-128))+rax] > + pxor xmm2,xmm4 > + paddd xmm13,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm14 > + pand xmm7,xmm11 > + > + movdqa xmm6,xmm12 > + movdqa xmm5,xmm2 > + psrld xmm9,27 > + paddd xmm13,xmm7 > + pxor xmm6,xmm11 > + > + movdqa XMMWORD[(224-128)+rax],xmm1 > + paddd xmm13,xmm1 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm10 > + movdqa xmm7,xmm10 > + > + pslld xmm7,30 > + paddd xmm2,xmm2 > + paddd xmm13,xmm6 > + > + psrld xmm10,2 > + paddd xmm13,xmm8 > + por xmm2,xmm5 > + por xmm10,xmm7 > + pxor xmm3,xmm0 > + movdqa xmm0,XMMWORD[((32-128))+rax] > + > + movdqa xmm8,xmm13 > + movdqa xmm7,xmm11 > + pxor xmm3,XMMWORD[((128-128))+rax] > + pxor xmm3,xmm0 > + paddd xmm12,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm13 > + pand xmm7,xmm10 > + > + movdqa xmm6,xmm11 > + movdqa xmm5,xmm3 > + psrld xmm9,27 > + paddd xmm12,xmm7 > + pxor xmm6,xmm10 > + > + movdqa XMMWORD[(240-128)+rax],xmm2 > + paddd xmm12,xmm2 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm14 > + movdqa xmm7,xmm14 > + > + pslld xmm7,30 > + paddd xmm3,xmm3 > + paddd xmm12,xmm6 > + > + psrld xmm14,2 > + paddd xmm12,xmm8 > + por xmm3,xmm5 > + por xmm14,xmm7 > + pxor xmm4,xmm1 > + movdqa xmm1,XMMWORD[((48-128))+rax] > + > + movdqa xmm8,xmm12 > + movdqa xmm7,xmm10 > + pxor xmm4,XMMWORD[((144-128))+rax] > + pxor xmm4,xmm1 > + paddd xmm11,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm12 > + pand 
xmm7,xmm14 > + > + movdqa xmm6,xmm10 > + movdqa xmm5,xmm4 > + psrld xmm9,27 > + paddd xmm11,xmm7 > + pxor xmm6,xmm14 > + > + movdqa XMMWORD[(0-128)+rax],xmm3 > + paddd xmm11,xmm3 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm13 > + movdqa xmm7,xmm13 > + > + pslld xmm7,30 > + paddd xmm4,xmm4 > + paddd xmm11,xmm6 > + > + psrld xmm13,2 > + paddd xmm11,xmm8 > + por xmm4,xmm5 > + por xmm13,xmm7 > + pxor xmm0,xmm2 > + movdqa xmm2,XMMWORD[((64-128))+rax] > + > + movdqa xmm8,xmm11 > + movdqa xmm7,xmm14 > + pxor xmm0,XMMWORD[((160-128))+rax] > + pxor xmm0,xmm2 > + paddd xmm10,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm11 > + pand xmm7,xmm13 > + > + movdqa xmm6,xmm14 > + movdqa xmm5,xmm0 > + psrld xmm9,27 > + paddd xmm10,xmm7 > + pxor xmm6,xmm13 > + > + movdqa XMMWORD[(16-128)+rax],xmm4 > + paddd xmm10,xmm4 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm12 > + movdqa xmm7,xmm12 > + > + pslld xmm7,30 > + paddd xmm0,xmm0 > + paddd xmm10,xmm6 > + > + psrld xmm12,2 > + paddd xmm10,xmm8 > + por xmm0,xmm5 > + por xmm12,xmm7 > + pxor xmm1,xmm3 > + movdqa xmm3,XMMWORD[((80-128))+rax] > + > + movdqa xmm8,xmm10 > + movdqa xmm7,xmm13 > + pxor xmm1,XMMWORD[((176-128))+rax] > + pxor xmm1,xmm3 > + paddd xmm14,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm10 > + pand xmm7,xmm12 > + > + movdqa xmm6,xmm13 > + movdqa xmm5,xmm1 > + psrld xmm9,27 > + paddd xmm14,xmm7 > + pxor xmm6,xmm12 > + > + movdqa XMMWORD[(32-128)+rax],xmm0 > + paddd xmm14,xmm0 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm11 > + movdqa xmm7,xmm11 > + > + pslld xmm7,30 > + paddd xmm1,xmm1 > + paddd xmm14,xmm6 > + > + psrld xmm11,2 > + paddd xmm14,xmm8 > + por xmm1,xmm5 > + por xmm11,xmm7 > + pxor xmm2,xmm4 > + movdqa xmm4,XMMWORD[((96-128))+rax] > + > + movdqa xmm8,xmm14 > + movdqa xmm7,xmm12 > + pxor xmm2,XMMWORD[((192-128))+rax] > + pxor xmm2,xmm4 > + paddd xmm13,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm14 > + pand xmm7,xmm11 > + > + movdqa xmm6,xmm12 > + movdqa xmm5,xmm2 > + psrld xmm9,27 > + paddd xmm13,xmm7 > + pxor xmm6,xmm11 > + > + movdqa XMMWORD[(48-128)+rax],xmm1 > + paddd xmm13,xmm1 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm10 > + movdqa xmm7,xmm10 > + > + pslld xmm7,30 > + paddd xmm2,xmm2 > + paddd xmm13,xmm6 > + > + psrld xmm10,2 > + paddd xmm13,xmm8 > + por xmm2,xmm5 > + por xmm10,xmm7 > + pxor xmm3,xmm0 > + movdqa xmm0,XMMWORD[((112-128))+rax] > + > + movdqa xmm8,xmm13 > + movdqa xmm7,xmm11 > + pxor xmm3,XMMWORD[((208-128))+rax] > + pxor xmm3,xmm0 > + paddd xmm12,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm13 > + pand xmm7,xmm10 > + > + movdqa xmm6,xmm11 > + movdqa xmm5,xmm3 > + psrld xmm9,27 > + paddd xmm12,xmm7 > + pxor xmm6,xmm10 > + > + movdqa XMMWORD[(64-128)+rax],xmm2 > + paddd xmm12,xmm2 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm14 > + movdqa xmm7,xmm14 > + > + pslld xmm7,30 > + paddd xmm3,xmm3 > + paddd xmm12,xmm6 > + > + psrld xmm14,2 > + paddd xmm12,xmm8 > + por xmm3,xmm5 > + por xmm14,xmm7 > + pxor xmm4,xmm1 > + movdqa xmm1,XMMWORD[((128-128))+rax] > + > + movdqa xmm8,xmm12 > + movdqa xmm7,xmm10 > + pxor xmm4,XMMWORD[((224-128))+rax] > + pxor xmm4,xmm1 > + paddd xmm11,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm12 > + pand xmm7,xmm14 > + > + movdqa xmm6,xmm10 > + movdqa xmm5,xmm4 > + psrld xmm9,27 > + paddd xmm11,xmm7 > + pxor xmm6,xmm14 > + > + movdqa XMMWORD[(80-128)+rax],xmm3 > + paddd xmm11,xmm3 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm13 > + movdqa xmm7,xmm13 > + > + pslld xmm7,30 > + paddd xmm4,xmm4 > + paddd xmm11,xmm6 > + > + psrld xmm13,2 > + paddd xmm11,xmm8 > + por 
xmm4,xmm5 > + por xmm13,xmm7 > + pxor xmm0,xmm2 > + movdqa xmm2,XMMWORD[((144-128))+rax] > + > + movdqa xmm8,xmm11 > + movdqa xmm7,xmm14 > + pxor xmm0,XMMWORD[((240-128))+rax] > + pxor xmm0,xmm2 > + paddd xmm10,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm11 > + pand xmm7,xmm13 > + > + movdqa xmm6,xmm14 > + movdqa xmm5,xmm0 > + psrld xmm9,27 > + paddd xmm10,xmm7 > + pxor xmm6,xmm13 > + > + movdqa XMMWORD[(96-128)+rax],xmm4 > + paddd xmm10,xmm4 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm12 > + movdqa xmm7,xmm12 > + > + pslld xmm7,30 > + paddd xmm0,xmm0 > + paddd xmm10,xmm6 > + > + psrld xmm12,2 > + paddd xmm10,xmm8 > + por xmm0,xmm5 > + por xmm12,xmm7 > + pxor xmm1,xmm3 > + movdqa xmm3,XMMWORD[((160-128))+rax] > + > + movdqa xmm8,xmm10 > + movdqa xmm7,xmm13 > + pxor xmm1,XMMWORD[((0-128))+rax] > + pxor xmm1,xmm3 > + paddd xmm14,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm10 > + pand xmm7,xmm12 > + > + movdqa xmm6,xmm13 > + movdqa xmm5,xmm1 > + psrld xmm9,27 > + paddd xmm14,xmm7 > + pxor xmm6,xmm12 > + > + movdqa XMMWORD[(112-128)+rax],xmm0 > + paddd xmm14,xmm0 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm11 > + movdqa xmm7,xmm11 > + > + pslld xmm7,30 > + paddd xmm1,xmm1 > + paddd xmm14,xmm6 > + > + psrld xmm11,2 > + paddd xmm14,xmm8 > + por xmm1,xmm5 > + por xmm11,xmm7 > + pxor xmm2,xmm4 > + movdqa xmm4,XMMWORD[((176-128))+rax] > + > + movdqa xmm8,xmm14 > + movdqa xmm7,xmm12 > + pxor xmm2,XMMWORD[((16-128))+rax] > + pxor xmm2,xmm4 > + paddd xmm13,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm14 > + pand xmm7,xmm11 > + > + movdqa xmm6,xmm12 > + movdqa xmm5,xmm2 > + psrld xmm9,27 > + paddd xmm13,xmm7 > + pxor xmm6,xmm11 > + > + movdqa XMMWORD[(128-128)+rax],xmm1 > + paddd xmm13,xmm1 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm10 > + movdqa xmm7,xmm10 > + > + pslld xmm7,30 > + paddd xmm2,xmm2 > + paddd xmm13,xmm6 > + > + psrld xmm10,2 > + paddd xmm13,xmm8 > + por xmm2,xmm5 > + por xmm10,xmm7 > + pxor xmm3,xmm0 > + movdqa xmm0,XMMWORD[((192-128))+rax] > + > + movdqa xmm8,xmm13 > + movdqa xmm7,xmm11 > + pxor xmm3,XMMWORD[((32-128))+rax] > + pxor xmm3,xmm0 > + paddd xmm12,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm13 > + pand xmm7,xmm10 > + > + movdqa xmm6,xmm11 > + movdqa xmm5,xmm3 > + psrld xmm9,27 > + paddd xmm12,xmm7 > + pxor xmm6,xmm10 > + > + movdqa XMMWORD[(144-128)+rax],xmm2 > + paddd xmm12,xmm2 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm14 > + movdqa xmm7,xmm14 > + > + pslld xmm7,30 > + paddd xmm3,xmm3 > + paddd xmm12,xmm6 > + > + psrld xmm14,2 > + paddd xmm12,xmm8 > + por xmm3,xmm5 > + por xmm14,xmm7 > + pxor xmm4,xmm1 > + movdqa xmm1,XMMWORD[((208-128))+rax] > + > + movdqa xmm8,xmm12 > + movdqa xmm7,xmm10 > + pxor xmm4,XMMWORD[((48-128))+rax] > + pxor xmm4,xmm1 > + paddd xmm11,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm12 > + pand xmm7,xmm14 > + > + movdqa xmm6,xmm10 > + movdqa xmm5,xmm4 > + psrld xmm9,27 > + paddd xmm11,xmm7 > + pxor xmm6,xmm14 > + > + movdqa XMMWORD[(160-128)+rax],xmm3 > + paddd xmm11,xmm3 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm13 > + movdqa xmm7,xmm13 > + > + pslld xmm7,30 > + paddd xmm4,xmm4 > + paddd xmm11,xmm6 > + > + psrld xmm13,2 > + paddd xmm11,xmm8 > + por xmm4,xmm5 > + por xmm13,xmm7 > + pxor xmm0,xmm2 > + movdqa xmm2,XMMWORD[((224-128))+rax] > + > + movdqa xmm8,xmm11 > + movdqa xmm7,xmm14 > + pxor xmm0,XMMWORD[((64-128))+rax] > + pxor xmm0,xmm2 > + paddd xmm10,xmm15 > + pslld xmm8,5 > + movdqa xmm9,xmm11 > + pand xmm7,xmm13 > + > + movdqa xmm6,xmm14 > + movdqa xmm5,xmm0 > + psrld xmm9,27 > + paddd xmm10,xmm7 > + pxor 
xmm6,xmm13 > + > + movdqa XMMWORD[(176-128)+rax],xmm4 > + paddd xmm10,xmm4 > + por xmm8,xmm9 > + psrld xmm5,31 > + pand xmm6,xmm12 > + movdqa xmm7,xmm12 > + > + pslld xmm7,30 > + paddd xmm0,xmm0 > + paddd xmm10,xmm6 > + > + psrld xmm12,2 > + paddd xmm10,xmm8 > + por xmm0,xmm5 > + por xmm12,xmm7 > + movdqa xmm15,XMMWORD[64+rbp] > + pxor xmm1,xmm3 > + movdqa xmm3,XMMWORD[((240-128))+rax] > + > + movdqa xmm8,xmm10 > + movdqa xmm6,xmm13 > + pxor xmm1,XMMWORD[((80-128))+rax] > + paddd xmm14,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm11 > + > + movdqa xmm9,xmm10 > + movdqa XMMWORD[(192-128)+rax],xmm0 > + paddd xmm14,xmm0 > + pxor xmm1,xmm3 > + psrld xmm9,27 > + pxor xmm6,xmm12 > + movdqa xmm7,xmm11 > + > + pslld xmm7,30 > + movdqa xmm5,xmm1 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm14,xmm6 > + paddd xmm1,xmm1 > + > + psrld xmm11,2 > + paddd xmm14,xmm8 > + por xmm1,xmm5 > + por xmm11,xmm7 > + pxor xmm2,xmm4 > + movdqa xmm4,XMMWORD[((0-128))+rax] > + > + movdqa xmm8,xmm14 > + movdqa xmm6,xmm12 > + pxor xmm2,XMMWORD[((96-128))+rax] > + paddd xmm13,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm10 > + > + movdqa xmm9,xmm14 > + movdqa XMMWORD[(208-128)+rax],xmm1 > + paddd xmm13,xmm1 > + pxor xmm2,xmm4 > + psrld xmm9,27 > + pxor xmm6,xmm11 > + movdqa xmm7,xmm10 > + > + pslld xmm7,30 > + movdqa xmm5,xmm2 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm13,xmm6 > + paddd xmm2,xmm2 > + > + psrld xmm10,2 > + paddd xmm13,xmm8 > + por xmm2,xmm5 > + por xmm10,xmm7 > + pxor xmm3,xmm0 > + movdqa xmm0,XMMWORD[((16-128))+rax] > + > + movdqa xmm8,xmm13 > + movdqa xmm6,xmm11 > + pxor xmm3,XMMWORD[((112-128))+rax] > + paddd xmm12,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm14 > + > + movdqa xmm9,xmm13 > + movdqa XMMWORD[(224-128)+rax],xmm2 > + paddd xmm12,xmm2 > + pxor xmm3,xmm0 > + psrld xmm9,27 > + pxor xmm6,xmm10 > + movdqa xmm7,xmm14 > + > + pslld xmm7,30 > + movdqa xmm5,xmm3 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm12,xmm6 > + paddd xmm3,xmm3 > + > + psrld xmm14,2 > + paddd xmm12,xmm8 > + por xmm3,xmm5 > + por xmm14,xmm7 > + pxor xmm4,xmm1 > + movdqa xmm1,XMMWORD[((32-128))+rax] > + > + movdqa xmm8,xmm12 > + movdqa xmm6,xmm10 > + pxor xmm4,XMMWORD[((128-128))+rax] > + paddd xmm11,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm13 > + > + movdqa xmm9,xmm12 > + movdqa XMMWORD[(240-128)+rax],xmm3 > + paddd xmm11,xmm3 > + pxor xmm4,xmm1 > + psrld xmm9,27 > + pxor xmm6,xmm14 > + movdqa xmm7,xmm13 > + > + pslld xmm7,30 > + movdqa xmm5,xmm4 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm11,xmm6 > + paddd xmm4,xmm4 > + > + psrld xmm13,2 > + paddd xmm11,xmm8 > + por xmm4,xmm5 > + por xmm13,xmm7 > + pxor xmm0,xmm2 > + movdqa xmm2,XMMWORD[((48-128))+rax] > + > + movdqa xmm8,xmm11 > + movdqa xmm6,xmm14 > + pxor xmm0,XMMWORD[((144-128))+rax] > + paddd xmm10,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm12 > + > + movdqa xmm9,xmm11 > + movdqa XMMWORD[(0-128)+rax],xmm4 > + paddd xmm10,xmm4 > + pxor xmm0,xmm2 > + psrld xmm9,27 > + pxor xmm6,xmm13 > + movdqa xmm7,xmm12 > + > + pslld xmm7,30 > + movdqa xmm5,xmm0 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm10,xmm6 > + paddd xmm0,xmm0 > + > + psrld xmm12,2 > + paddd xmm10,xmm8 > + por xmm0,xmm5 > + por xmm12,xmm7 > + pxor xmm1,xmm3 > + movdqa xmm3,XMMWORD[((64-128))+rax] > + > + movdqa xmm8,xmm10 > + movdqa xmm6,xmm13 > + pxor xmm1,XMMWORD[((160-128))+rax] > + paddd xmm14,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm11 > + > + movdqa xmm9,xmm10 > + movdqa XMMWORD[(16-128)+rax],xmm0 > + paddd xmm14,xmm0 > + pxor xmm1,xmm3 > + psrld xmm9,27 > + pxor xmm6,xmm12 > + movdqa xmm7,xmm11 > + > + 
pslld xmm7,30 > + movdqa xmm5,xmm1 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm14,xmm6 > + paddd xmm1,xmm1 > + > + psrld xmm11,2 > + paddd xmm14,xmm8 > + por xmm1,xmm5 > + por xmm11,xmm7 > + pxor xmm2,xmm4 > + movdqa xmm4,XMMWORD[((80-128))+rax] > + > + movdqa xmm8,xmm14 > + movdqa xmm6,xmm12 > + pxor xmm2,XMMWORD[((176-128))+rax] > + paddd xmm13,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm10 > + > + movdqa xmm9,xmm14 > + movdqa XMMWORD[(32-128)+rax],xmm1 > + paddd xmm13,xmm1 > + pxor xmm2,xmm4 > + psrld xmm9,27 > + pxor xmm6,xmm11 > + movdqa xmm7,xmm10 > + > + pslld xmm7,30 > + movdqa xmm5,xmm2 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm13,xmm6 > + paddd xmm2,xmm2 > + > + psrld xmm10,2 > + paddd xmm13,xmm8 > + por xmm2,xmm5 > + por xmm10,xmm7 > + pxor xmm3,xmm0 > + movdqa xmm0,XMMWORD[((96-128))+rax] > + > + movdqa xmm8,xmm13 > + movdqa xmm6,xmm11 > + pxor xmm3,XMMWORD[((192-128))+rax] > + paddd xmm12,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm14 > + > + movdqa xmm9,xmm13 > + movdqa XMMWORD[(48-128)+rax],xmm2 > + paddd xmm12,xmm2 > + pxor xmm3,xmm0 > + psrld xmm9,27 > + pxor xmm6,xmm10 > + movdqa xmm7,xmm14 > + > + pslld xmm7,30 > + movdqa xmm5,xmm3 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm12,xmm6 > + paddd xmm3,xmm3 > + > + psrld xmm14,2 > + paddd xmm12,xmm8 > + por xmm3,xmm5 > + por xmm14,xmm7 > + pxor xmm4,xmm1 > + movdqa xmm1,XMMWORD[((112-128))+rax] > + > + movdqa xmm8,xmm12 > + movdqa xmm6,xmm10 > + pxor xmm4,XMMWORD[((208-128))+rax] > + paddd xmm11,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm13 > + > + movdqa xmm9,xmm12 > + movdqa XMMWORD[(64-128)+rax],xmm3 > + paddd xmm11,xmm3 > + pxor xmm4,xmm1 > + psrld xmm9,27 > + pxor xmm6,xmm14 > + movdqa xmm7,xmm13 > + > + pslld xmm7,30 > + movdqa xmm5,xmm4 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm11,xmm6 > + paddd xmm4,xmm4 > + > + psrld xmm13,2 > + paddd xmm11,xmm8 > + por xmm4,xmm5 > + por xmm13,xmm7 > + pxor xmm0,xmm2 > + movdqa xmm2,XMMWORD[((128-128))+rax] > + > + movdqa xmm8,xmm11 > + movdqa xmm6,xmm14 > + pxor xmm0,XMMWORD[((224-128))+rax] > + paddd xmm10,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm12 > + > + movdqa xmm9,xmm11 > + movdqa XMMWORD[(80-128)+rax],xmm4 > + paddd xmm10,xmm4 > + pxor xmm0,xmm2 > + psrld xmm9,27 > + pxor xmm6,xmm13 > + movdqa xmm7,xmm12 > + > + pslld xmm7,30 > + movdqa xmm5,xmm0 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm10,xmm6 > + paddd xmm0,xmm0 > + > + psrld xmm12,2 > + paddd xmm10,xmm8 > + por xmm0,xmm5 > + por xmm12,xmm7 > + pxor xmm1,xmm3 > + movdqa xmm3,XMMWORD[((144-128))+rax] > + > + movdqa xmm8,xmm10 > + movdqa xmm6,xmm13 > + pxor xmm1,XMMWORD[((240-128))+rax] > + paddd xmm14,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm11 > + > + movdqa xmm9,xmm10 > + movdqa XMMWORD[(96-128)+rax],xmm0 > + paddd xmm14,xmm0 > + pxor xmm1,xmm3 > + psrld xmm9,27 > + pxor xmm6,xmm12 > + movdqa xmm7,xmm11 > + > + pslld xmm7,30 > + movdqa xmm5,xmm1 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm14,xmm6 > + paddd xmm1,xmm1 > + > + psrld xmm11,2 > + paddd xmm14,xmm8 > + por xmm1,xmm5 > + por xmm11,xmm7 > + pxor xmm2,xmm4 > + movdqa xmm4,XMMWORD[((160-128))+rax] > + > + movdqa xmm8,xmm14 > + movdqa xmm6,xmm12 > + pxor xmm2,XMMWORD[((0-128))+rax] > + paddd xmm13,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm10 > + > + movdqa xmm9,xmm14 > + movdqa XMMWORD[(112-128)+rax],xmm1 > + paddd xmm13,xmm1 > + pxor xmm2,xmm4 > + psrld xmm9,27 > + pxor xmm6,xmm11 > + movdqa xmm7,xmm10 > + > + pslld xmm7,30 > + movdqa xmm5,xmm2 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm13,xmm6 > + paddd xmm2,xmm2 > + > + psrld xmm10,2 > + 
paddd xmm13,xmm8 > + por xmm2,xmm5 > + por xmm10,xmm7 > + pxor xmm3,xmm0 > + movdqa xmm0,XMMWORD[((176-128))+rax] > + > + movdqa xmm8,xmm13 > + movdqa xmm6,xmm11 > + pxor xmm3,XMMWORD[((16-128))+rax] > + paddd xmm12,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm14 > + > + movdqa xmm9,xmm13 > + paddd xmm12,xmm2 > + pxor xmm3,xmm0 > + psrld xmm9,27 > + pxor xmm6,xmm10 > + movdqa xmm7,xmm14 > + > + pslld xmm7,30 > + movdqa xmm5,xmm3 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm12,xmm6 > + paddd xmm3,xmm3 > + > + psrld xmm14,2 > + paddd xmm12,xmm8 > + por xmm3,xmm5 > + por xmm14,xmm7 > + pxor xmm4,xmm1 > + movdqa xmm1,XMMWORD[((192-128))+rax] > + > + movdqa xmm8,xmm12 > + movdqa xmm6,xmm10 > + pxor xmm4,XMMWORD[((32-128))+rax] > + paddd xmm11,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm13 > + > + movdqa xmm9,xmm12 > + paddd xmm11,xmm3 > + pxor xmm4,xmm1 > + psrld xmm9,27 > + pxor xmm6,xmm14 > + movdqa xmm7,xmm13 > + > + pslld xmm7,30 > + movdqa xmm5,xmm4 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm11,xmm6 > + paddd xmm4,xmm4 > + > + psrld xmm13,2 > + paddd xmm11,xmm8 > + por xmm4,xmm5 > + por xmm13,xmm7 > + pxor xmm0,xmm2 > + movdqa xmm2,XMMWORD[((208-128))+rax] > + > + movdqa xmm8,xmm11 > + movdqa xmm6,xmm14 > + pxor xmm0,XMMWORD[((48-128))+rax] > + paddd xmm10,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm12 > + > + movdqa xmm9,xmm11 > + paddd xmm10,xmm4 > + pxor xmm0,xmm2 > + psrld xmm9,27 > + pxor xmm6,xmm13 > + movdqa xmm7,xmm12 > + > + pslld xmm7,30 > + movdqa xmm5,xmm0 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm10,xmm6 > + paddd xmm0,xmm0 > + > + psrld xmm12,2 > + paddd xmm10,xmm8 > + por xmm0,xmm5 > + por xmm12,xmm7 > + pxor xmm1,xmm3 > + movdqa xmm3,XMMWORD[((224-128))+rax] > + > + movdqa xmm8,xmm10 > + movdqa xmm6,xmm13 > + pxor xmm1,XMMWORD[((64-128))+rax] > + paddd xmm14,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm11 > + > + movdqa xmm9,xmm10 > + paddd xmm14,xmm0 > + pxor xmm1,xmm3 > + psrld xmm9,27 > + pxor xmm6,xmm12 > + movdqa xmm7,xmm11 > + > + pslld xmm7,30 > + movdqa xmm5,xmm1 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm14,xmm6 > + paddd xmm1,xmm1 > + > + psrld xmm11,2 > + paddd xmm14,xmm8 > + por xmm1,xmm5 > + por xmm11,xmm7 > + pxor xmm2,xmm4 > + movdqa xmm4,XMMWORD[((240-128))+rax] > + > + movdqa xmm8,xmm14 > + movdqa xmm6,xmm12 > + pxor xmm2,XMMWORD[((80-128))+rax] > + paddd xmm13,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm10 > + > + movdqa xmm9,xmm14 > + paddd xmm13,xmm1 > + pxor xmm2,xmm4 > + psrld xmm9,27 > + pxor xmm6,xmm11 > + movdqa xmm7,xmm10 > + > + pslld xmm7,30 > + movdqa xmm5,xmm2 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm13,xmm6 > + paddd xmm2,xmm2 > + > + psrld xmm10,2 > + paddd xmm13,xmm8 > + por xmm2,xmm5 > + por xmm10,xmm7 > + pxor xmm3,xmm0 > + movdqa xmm0,XMMWORD[((0-128))+rax] > + > + movdqa xmm8,xmm13 > + movdqa xmm6,xmm11 > + pxor xmm3,XMMWORD[((96-128))+rax] > + paddd xmm12,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm14 > + > + movdqa xmm9,xmm13 > + paddd xmm12,xmm2 > + pxor xmm3,xmm0 > + psrld xmm9,27 > + pxor xmm6,xmm10 > + movdqa xmm7,xmm14 > + > + pslld xmm7,30 > + movdqa xmm5,xmm3 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm12,xmm6 > + paddd xmm3,xmm3 > + > + psrld xmm14,2 > + paddd xmm12,xmm8 > + por xmm3,xmm5 > + por xmm14,xmm7 > + pxor xmm4,xmm1 > + movdqa xmm1,XMMWORD[((16-128))+rax] > + > + movdqa xmm8,xmm12 > + movdqa xmm6,xmm10 > + pxor xmm4,XMMWORD[((112-128))+rax] > + paddd xmm11,xmm15 > + pslld xmm8,5 > + pxor xmm6,xmm13 > + > + movdqa xmm9,xmm12 > + paddd xmm11,xmm3 > + pxor xmm4,xmm1 > + psrld xmm9,27 > + pxor xmm6,xmm14 > + 
movdqa xmm7,xmm13 > + > + pslld xmm7,30 > + movdqa xmm5,xmm4 > + por xmm8,xmm9 > + psrld xmm5,31 > + paddd xmm11,xmm6 > + paddd xmm4,xmm4 > + > + psrld xmm13,2 > + paddd xmm11,xmm8 > + por xmm4,xmm5 > + por xmm13,xmm7 > + movdqa xmm8,xmm11 > + paddd xmm10,xmm15 > + movdqa xmm6,xmm14 > + pslld xmm8,5 > + pxor xmm6,xmm12 > + > + movdqa xmm9,xmm11 > + paddd xmm10,xmm4 > + psrld xmm9,27 > + movdqa xmm7,xmm12 > + pxor xmm6,xmm13 > + > + pslld xmm7,30 > + por xmm8,xmm9 > + paddd xmm10,xmm6 > + > + psrld xmm12,2 > + paddd xmm10,xmm8 > + por xmm12,xmm7 > + movdqa xmm0,XMMWORD[rbx] > + mov ecx,1 > + cmp ecx,DWORD[rbx] > + pxor xmm8,xmm8 > + cmovge r8,rbp > + cmp ecx,DWORD[4+rbx] > + movdqa xmm1,xmm0 > + cmovge r9,rbp > + cmp ecx,DWORD[8+rbx] > + pcmpgtd xmm1,xmm8 > + cmovge r10,rbp > + cmp ecx,DWORD[12+rbx] > + paddd xmm0,xmm1 > + cmovge r11,rbp > + > + movdqu xmm6,XMMWORD[rdi] > + pand xmm10,xmm1 > + movdqu xmm7,XMMWORD[32+rdi] > + pand xmm11,xmm1 > + paddd xmm10,xmm6 > + movdqu xmm8,XMMWORD[64+rdi] > + pand xmm12,xmm1 > + paddd xmm11,xmm7 > + movdqu xmm9,XMMWORD[96+rdi] > + pand xmm13,xmm1 > + paddd xmm12,xmm8 > + movdqu xmm5,XMMWORD[128+rdi] > + pand xmm14,xmm1 > + movdqu XMMWORD[rdi],xmm10 > + paddd xmm13,xmm9 > + movdqu XMMWORD[32+rdi],xmm11 > + paddd xmm14,xmm5 > + movdqu XMMWORD[64+rdi],xmm12 > + movdqu XMMWORD[96+rdi],xmm13 > + movdqu XMMWORD[128+rdi],xmm14 > + > + movdqa XMMWORD[rbx],xmm0 > + movdqa xmm5,XMMWORD[96+rbp] > + movdqa xmm15,XMMWORD[((-32))+rbp] > + dec edx > + jnz NEAR $L$oop > + > + mov edx,DWORD[280+rsp] > + lea rdi,[16+rdi] > + lea rsi,[64+rsi] > + dec edx > + jnz NEAR $L$oop_grande > + > +$L$done: > + mov rax,QWORD[272+rsp] > + > + movaps xmm6,XMMWORD[((-184))+rax] > + movaps xmm7,XMMWORD[((-168))+rax] > + movaps xmm8,XMMWORD[((-152))+rax] > + movaps xmm9,XMMWORD[((-136))+rax] > + movaps xmm10,XMMWORD[((-120))+rax] > + movaps xmm11,XMMWORD[((-104))+rax] > + movaps xmm12,XMMWORD[((-88))+rax] > + movaps xmm13,XMMWORD[((-72))+rax] > + movaps xmm14,XMMWORD[((-56))+rax] > + movaps xmm15,XMMWORD[((-40))+rax] > + mov rbp,QWORD[((-16))+rax] > + > + mov rbx,QWORD[((-8))+rax] > + > + lea rsp,[rax] > + > +$L$epilogue: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_sha1_multi_block: > + > +ALIGN 32 > +sha1_multi_block_shaext: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_sha1_multi_block_shaext: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + > + > + > +_shaext_shortcut: > + mov rax,rsp > + > + push rbx > + > + push rbp > + > + lea rsp,[((-168))+rsp] > + movaps XMMWORD[rsp],xmm6 > + movaps XMMWORD[16+rsp],xmm7 > + movaps XMMWORD[32+rsp],xmm8 > + movaps XMMWORD[48+rsp],xmm9 > + movaps XMMWORD[(-120)+rax],xmm10 > + movaps XMMWORD[(-104)+rax],xmm11 > + movaps XMMWORD[(-88)+rax],xmm12 > + movaps XMMWORD[(-72)+rax],xmm13 > + movaps XMMWORD[(-56)+rax],xmm14 > + movaps XMMWORD[(-40)+rax],xmm15 > + sub rsp,288 > + shl edx,1 > + and rsp,-256 > + lea rdi,[64+rdi] > + mov QWORD[272+rsp],rax > +$L$body_shaext: > + lea rbx,[256+rsp] > + movdqa xmm3,XMMWORD[((K_XX_XX+128))] > + > +$L$oop_grande_shaext: > + mov DWORD[280+rsp],edx > + xor edx,edx > + mov r8,QWORD[rsi] > + mov ecx,DWORD[8+rsi] > + cmp ecx,edx > + cmovg edx,ecx > + test ecx,ecx > + mov DWORD[rbx],ecx > + cmovle r8,rsp > + mov r9,QWORD[16+rsi] > + mov ecx,DWORD[24+rsi] > + cmp ecx,edx > + cmovg edx,ecx > + test ecx,ecx > + mov DWORD[4+rbx],ecx > + cmovle r9,rsp > + test edx,edx > + jz NEAR $L$done_shaext > + > + movq 
xmm0,QWORD[((0-64))+rdi] > + movq xmm4,QWORD[((32-64))+rdi] > + movq xmm5,QWORD[((64-64))+rdi] > + movq xmm6,QWORD[((96-64))+rdi] > + movq xmm7,QWORD[((128-64))+rdi] > + > + punpckldq xmm0,xmm4 > + punpckldq xmm5,xmm6 > + > + movdqa xmm8,xmm0 > + punpcklqdq xmm0,xmm5 > + punpckhqdq xmm8,xmm5 > + > + pshufd xmm1,xmm7,63 > + pshufd xmm9,xmm7,127 > + pshufd xmm0,xmm0,27 > + pshufd xmm8,xmm8,27 > + jmp NEAR $L$oop_shaext > + > +ALIGN 32 > +$L$oop_shaext: > + movdqu xmm4,XMMWORD[r8] > + movdqu xmm11,XMMWORD[r9] > + movdqu xmm5,XMMWORD[16+r8] > + movdqu xmm12,XMMWORD[16+r9] > + movdqu xmm6,XMMWORD[32+r8] > +DB 102,15,56,0,227 > + movdqu xmm13,XMMWORD[32+r9] > +DB 102,68,15,56,0,219 > + movdqu xmm7,XMMWORD[48+r8] > + lea r8,[64+r8] > +DB 102,15,56,0,235 > + movdqu xmm14,XMMWORD[48+r9] > + lea r9,[64+r9] > +DB 102,68,15,56,0,227 > + > + movdqa XMMWORD[80+rsp],xmm1 > + paddd xmm1,xmm4 > + movdqa XMMWORD[112+rsp],xmm9 > + paddd xmm9,xmm11 > + movdqa XMMWORD[64+rsp],xmm0 > + movdqa xmm2,xmm0 > + movdqa XMMWORD[96+rsp],xmm8 > + movdqa xmm10,xmm8 > +DB 15,58,204,193,0 > +DB 15,56,200,213 > +DB 69,15,58,204,193,0 > +DB 69,15,56,200,212 > +DB 102,15,56,0,243 > + prefetcht0 [127+r8] > +DB 15,56,201,229 > +DB 102,68,15,56,0,235 > + prefetcht0 [127+r9] > +DB 69,15,56,201,220 > + > +DB 102,15,56,0,251 > + movdqa xmm1,xmm0 > +DB 102,68,15,56,0,243 > + movdqa xmm9,xmm8 > +DB 15,58,204,194,0 > +DB 15,56,200,206 > +DB 69,15,58,204,194,0 > +DB 69,15,56,200,205 > + pxor xmm4,xmm6 > +DB 15,56,201,238 > + pxor xmm11,xmm13 > +DB 69,15,56,201,229 > + movdqa xmm2,xmm0 > + movdqa xmm10,xmm8 > +DB 15,58,204,193,0 > +DB 15,56,200,215 > +DB 69,15,58,204,193,0 > +DB 69,15,56,200,214 > +DB 15,56,202,231 > +DB 69,15,56,202,222 > + pxor xmm5,xmm7 > +DB 15,56,201,247 > + pxor xmm12,xmm14 > +DB 69,15,56,201,238 > + movdqa xmm1,xmm0 > + movdqa xmm9,xmm8 > +DB 15,58,204,194,0 > +DB 15,56,200,204 > +DB 69,15,58,204,194,0 > +DB 69,15,56,200,203 > +DB 15,56,202,236 > +DB 69,15,56,202,227 > + pxor xmm6,xmm4 > +DB 15,56,201,252 > + pxor xmm13,xmm11 > +DB 69,15,56,201,243 > + movdqa xmm2,xmm0 > + movdqa xmm10,xmm8 > +DB 15,58,204,193,0 > +DB 15,56,200,213 > +DB 69,15,58,204,193,0 > +DB 69,15,56,200,212 > +DB 15,56,202,245 > +DB 69,15,56,202,236 > + pxor xmm7,xmm5 > +DB 15,56,201,229 > + pxor xmm14,xmm12 > +DB 69,15,56,201,220 > + movdqa xmm1,xmm0 > + movdqa xmm9,xmm8 > +DB 15,58,204,194,1 > +DB 15,56,200,206 > +DB 69,15,58,204,194,1 > +DB 69,15,56,200,205 > +DB 15,56,202,254 > +DB 69,15,56,202,245 > + pxor xmm4,xmm6 > +DB 15,56,201,238 > + pxor xmm11,xmm13 > +DB 69,15,56,201,229 > + movdqa xmm2,xmm0 > + movdqa xmm10,xmm8 > +DB 15,58,204,193,1 > +DB 15,56,200,215 > +DB 69,15,58,204,193,1 > +DB 69,15,56,200,214 > +DB 15,56,202,231 > +DB 69,15,56,202,222 > + pxor xmm5,xmm7 > +DB 15,56,201,247 > + pxor xmm12,xmm14 > +DB 69,15,56,201,238 > + movdqa xmm1,xmm0 > + movdqa xmm9,xmm8 > +DB 15,58,204,194,1 > +DB 15,56,200,204 > +DB 69,15,58,204,194,1 > +DB 69,15,56,200,203 > +DB 15,56,202,236 > +DB 69,15,56,202,227 > + pxor xmm6,xmm4 > +DB 15,56,201,252 > + pxor xmm13,xmm11 > +DB 69,15,56,201,243 > + movdqa xmm2,xmm0 > + movdqa xmm10,xmm8 > +DB 15,58,204,193,1 > +DB 15,56,200,213 > +DB 69,15,58,204,193,1 > +DB 69,15,56,200,212 > +DB 15,56,202,245 > +DB 69,15,56,202,236 > + pxor xmm7,xmm5 > +DB 15,56,201,229 > + pxor xmm14,xmm12 > +DB 69,15,56,201,220 > + movdqa xmm1,xmm0 > + movdqa xmm9,xmm8 > +DB 15,58,204,194,1 > +DB 15,56,200,206 > +DB 69,15,58,204,194,1 > +DB 69,15,56,200,205 > +DB 15,56,202,254 > +DB 69,15,56,202,245 > + pxor xmm4,xmm6 > +DB 
15,56,201,238 > + pxor xmm11,xmm13 > +DB 69,15,56,201,229 > + movdqa xmm2,xmm0 > + movdqa xmm10,xmm8 > +DB 15,58,204,193,2 > +DB 15,56,200,215 > +DB 69,15,58,204,193,2 > +DB 69,15,56,200,214 > +DB 15,56,202,231 > +DB 69,15,56,202,222 > + pxor xmm5,xmm7 > +DB 15,56,201,247 > + pxor xmm12,xmm14 > +DB 69,15,56,201,238 > + movdqa xmm1,xmm0 > + movdqa xmm9,xmm8 > +DB 15,58,204,194,2 > +DB 15,56,200,204 > +DB 69,15,58,204,194,2 > +DB 69,15,56,200,203 > +DB 15,56,202,236 > +DB 69,15,56,202,227 > + pxor xmm6,xmm4 > +DB 15,56,201,252 > + pxor xmm13,xmm11 > +DB 69,15,56,201,243 > + movdqa xmm2,xmm0 > + movdqa xmm10,xmm8 > +DB 15,58,204,193,2 > +DB 15,56,200,213 > +DB 69,15,58,204,193,2 > +DB 69,15,56,200,212 > +DB 15,56,202,245 > +DB 69,15,56,202,236 > + pxor xmm7,xmm5 > +DB 15,56,201,229 > + pxor xmm14,xmm12 > +DB 69,15,56,201,220 > + movdqa xmm1,xmm0 > + movdqa xmm9,xmm8 > +DB 15,58,204,194,2 > +DB 15,56,200,206 > +DB 69,15,58,204,194,2 > +DB 69,15,56,200,205 > +DB 15,56,202,254 > +DB 69,15,56,202,245 > + pxor xmm4,xmm6 > +DB 15,56,201,238 > + pxor xmm11,xmm13 > +DB 69,15,56,201,229 > + movdqa xmm2,xmm0 > + movdqa xmm10,xmm8 > +DB 15,58,204,193,2 > +DB 15,56,200,215 > +DB 69,15,58,204,193,2 > +DB 69,15,56,200,214 > +DB 15,56,202,231 > +DB 69,15,56,202,222 > + pxor xmm5,xmm7 > +DB 15,56,201,247 > + pxor xmm12,xmm14 > +DB 69,15,56,201,238 > + movdqa xmm1,xmm0 > + movdqa xmm9,xmm8 > +DB 15,58,204,194,3 > +DB 15,56,200,204 > +DB 69,15,58,204,194,3 > +DB 69,15,56,200,203 > +DB 15,56,202,236 > +DB 69,15,56,202,227 > + pxor xmm6,xmm4 > +DB 15,56,201,252 > + pxor xmm13,xmm11 > +DB 69,15,56,201,243 > + movdqa xmm2,xmm0 > + movdqa xmm10,xmm8 > +DB 15,58,204,193,3 > +DB 15,56,200,213 > +DB 69,15,58,204,193,3 > +DB 69,15,56,200,212 > +DB 15,56,202,245 > +DB 69,15,56,202,236 > + pxor xmm7,xmm5 > + pxor xmm14,xmm12 > + > + mov ecx,1 > + pxor xmm4,xmm4 > + cmp ecx,DWORD[rbx] > + cmovge r8,rsp > + > + movdqa xmm1,xmm0 > + movdqa xmm9,xmm8 > +DB 15,58,204,194,3 > +DB 15,56,200,206 > +DB 69,15,58,204,194,3 > +DB 69,15,56,200,205 > +DB 15,56,202,254 > +DB 69,15,56,202,245 > + > + cmp ecx,DWORD[4+rbx] > + cmovge r9,rsp > + movq xmm6,QWORD[rbx] > + > + movdqa xmm2,xmm0 > + movdqa xmm10,xmm8 > +DB 15,58,204,193,3 > +DB 15,56,200,215 > +DB 69,15,58,204,193,3 > +DB 69,15,56,200,214 > + > + pshufd xmm11,xmm6,0x00 > + pshufd xmm12,xmm6,0x55 > + movdqa xmm7,xmm6 > + pcmpgtd xmm11,xmm4 > + pcmpgtd xmm12,xmm4 > + > + movdqa xmm1,xmm0 > + movdqa xmm9,xmm8 > +DB 15,58,204,194,3 > +DB 15,56,200,204 > +DB 69,15,58,204,194,3 > +DB 68,15,56,200,204 > + > + pcmpgtd xmm7,xmm4 > + pand xmm0,xmm11 > + pand xmm1,xmm11 > + pand xmm8,xmm12 > + pand xmm9,xmm12 > + paddd xmm6,xmm7 > + > + paddd xmm0,XMMWORD[64+rsp] > + paddd xmm1,XMMWORD[80+rsp] > + paddd xmm8,XMMWORD[96+rsp] > + paddd xmm9,XMMWORD[112+rsp] > + > + movq QWORD[rbx],xmm6 > + dec edx > + jnz NEAR $L$oop_shaext > + > + mov edx,DWORD[280+rsp] > + > + pshufd xmm0,xmm0,27 > + pshufd xmm8,xmm8,27 > + > + movdqa xmm6,xmm0 > + punpckldq xmm0,xmm8 > + punpckhdq xmm6,xmm8 > + punpckhdq xmm1,xmm9 > + movq QWORD[(0-64)+rdi],xmm0 > + psrldq xmm0,8 > + movq QWORD[(64-64)+rdi],xmm6 > + psrldq xmm6,8 > + movq QWORD[(32-64)+rdi],xmm0 > + psrldq xmm1,8 > + movq QWORD[(96-64)+rdi],xmm6 > + movq QWORD[(128-64)+rdi],xmm1 > + > + lea rdi,[8+rdi] > + lea rsi,[32+rsi] > + dec edx > + jnz NEAR $L$oop_grande_shaext > + > +$L$done_shaext: > + > + movaps xmm6,XMMWORD[((-184))+rax] > + movaps xmm7,XMMWORD[((-168))+rax] > + movaps xmm8,XMMWORD[((-152))+rax] > + movaps xmm9,XMMWORD[((-136))+rax] > + movaps 
xmm10,XMMWORD[((-120))+rax] > + movaps xmm11,XMMWORD[((-104))+rax] > + movaps xmm12,XMMWORD[((-88))+rax] > + movaps xmm13,XMMWORD[((-72))+rax] > + movaps xmm14,XMMWORD[((-56))+rax] > + movaps xmm15,XMMWORD[((-40))+rax] > + mov rbp,QWORD[((-16))+rax] > + > + mov rbx,QWORD[((-8))+rax] > + > + lea rsp,[rax] > + > +$L$epilogue_shaext: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_sha1_multi_block_shaext: > + > +ALIGN 256 > + DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999 > + DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999 > +K_XX_XX: > + DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > + DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > + DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > + DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > + DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > + DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > +DB 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0 > +DB 83,72,65,49,32,109,117,108,116,105,45,98,108,111,99,107 > +DB 32,116,114,97,110,115,102,111,114,109,32,102,111,114,32,120 > +DB 56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77 > +DB 83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110 > +DB 115,115,108,46,111,114,103,62,0 > +EXTERN __imp_RtlVirtualUnwind > + > +ALIGN 16 > +se_handler: > + push rsi > + push rdi > + push rbx > + push rbp > + push r12 > + push r13 > + push r14 > + push r15 > + pushfq > + sub rsp,64 > + > + mov rax,QWORD[120+r8] > + mov rbx,QWORD[248+r8] > + > + mov rsi,QWORD[8+r9] > + mov r11,QWORD[56+r9] > + > + mov r10d,DWORD[r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jb NEAR $L$in_prologue > + > + mov rax,QWORD[152+r8] > + > + mov r10d,DWORD[4+r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jae NEAR $L$in_prologue > + > + mov rax,QWORD[272+rax] > + > + mov rbx,QWORD[((-8))+rax] > + mov rbp,QWORD[((-16))+rax] > + mov QWORD[144+r8],rbx > + mov QWORD[160+r8],rbp > + > + lea rsi,[((-24-160))+rax] > + lea rdi,[512+r8] > + mov ecx,20 > + DD 0xa548f3fc > + > +$L$in_prologue: > + mov rdi,QWORD[8+rax] > + mov rsi,QWORD[16+rax] > + mov QWORD[152+r8],rax > + mov QWORD[168+r8],rsi > + mov QWORD[176+r8],rdi > + > + mov rdi,QWORD[40+r9] > + mov rsi,r8 > + mov ecx,154 > + DD 0xa548f3fc > + > + mov rsi,r9 > + xor rcx,rcx > + mov rdx,QWORD[8+rsi] > + mov r8,QWORD[rsi] > + mov r9,QWORD[16+rsi] > + mov r10,QWORD[40+rsi] > + lea r11,[56+rsi] > + lea r12,[24+rsi] > + mov QWORD[32+rsp],r10 > + mov QWORD[40+rsp],r11 > + mov QWORD[48+rsp],r12 > + mov QWORD[56+rsp],rcx > + call QWORD[__imp_RtlVirtualUnwind] > + > + mov eax,1 > + add rsp,64 > + popfq > + pop r15 > + pop r14 > + pop r13 > + pop r12 > + pop rbp > + pop rbx > + pop rdi > + pop rsi > + DB 0F3h,0C3h ;repret > + > +section .pdata rdata align=4 > +ALIGN 4 > + DD $L$SEH_begin_sha1_multi_block wrt ..imagebase > + DD $L$SEH_end_sha1_multi_block wrt ..imagebase > + DD $L$SEH_info_sha1_multi_block wrt ..imagebase > + DD $L$SEH_begin_sha1_multi_block_shaext wrt ..imagebase > + DD $L$SEH_end_sha1_multi_block_shaext wrt ..imagebase > + DD $L$SEH_info_sha1_multi_block_shaext wrt ..imagebase > +section .xdata rdata align=8 > +ALIGN 8 > +$L$SEH_info_sha1_multi_block: > +DB 9,0,0,0 > + DD se_handler wrt ..imagebase > + DD $L$body wrt ..imagebase,$L$epilogue wrt ..imagebase > +$L$SEH_info_sha1_multi_block_shaext: > +DB 9,0,0,0 > + DD se_handler wrt ..imagebase > + DD $L$body_shaext wrt ..imagebase,$L$epilogue_shaext > wrt 
..imagebase > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm > b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm > new file mode 100644 > index 0000000000..c6d68d348f > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm > @@ -0,0 +1,2884 @@ > +; WARNING: do not edit! > +; Generated from openssl/crypto/sha/asm/sha1-x86_64.pl > +; > +; Copyright 2006-2020 The OpenSSL Project Authors. All Rights Reserved. > +; > +; Licensed under the OpenSSL license (the "License"). You may not use > +; this file except in compliance with the License. You can obtain a copy > +; in the file LICENSE in the source distribution or at > +; https://www.openssl.org/source/license.html > + > +default rel > +%define XMMWORD > +%define YMMWORD > +%define ZMMWORD > +section .text code align=64 > + > +EXTERN OPENSSL_ia32cap_P > + > +global sha1_block_data_order > + > +ALIGN 16 > +sha1_block_data_order: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_sha1_block_data_order: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + > + > + > + mov r9d,DWORD[((OPENSSL_ia32cap_P+0))] > + mov r8d,DWORD[((OPENSSL_ia32cap_P+4))] > + mov r10d,DWORD[((OPENSSL_ia32cap_P+8))] > + test r8d,512 > + jz NEAR $L$ialu > + test r10d,536870912 > + jnz NEAR _shaext_shortcut > + jmp NEAR _ssse3_shortcut > + > +ALIGN 16 > +$L$ialu: > + mov rax,rsp > + > + push rbx > + > + push rbp > + > + push r12 > + > + push r13 > + > + push r14 > + > + mov r8,rdi > + sub rsp,72 > + mov r9,rsi > + and rsp,-64 > + mov r10,rdx > + mov QWORD[64+rsp],rax > + > +$L$prologue: > + > + mov esi,DWORD[r8] > + mov edi,DWORD[4+r8] > + mov r11d,DWORD[8+r8] > + mov r12d,DWORD[12+r8] > + mov r13d,DWORD[16+r8] > + jmp NEAR $L$loop > + > +ALIGN 16 > +$L$loop: > + mov edx,DWORD[r9] > + bswap edx > + mov ebp,DWORD[4+r9] > + mov eax,r12d > + mov DWORD[rsp],edx > + mov ecx,esi > + bswap ebp > + xor eax,r11d > + rol ecx,5 > + and eax,edi > + lea r13d,[1518500249+r13*1+rdx] > + add r13d,ecx > + xor eax,r12d > + rol edi,30 > + add r13d,eax > + mov r14d,DWORD[8+r9] > + mov eax,r11d > + mov DWORD[4+rsp],ebp > + mov ecx,r13d > + bswap r14d > + xor eax,edi > + rol ecx,5 > + and eax,esi > + lea r12d,[1518500249+r12*1+rbp] > + add r12d,ecx > + xor eax,r11d > + rol esi,30 > + add r12d,eax > + mov edx,DWORD[12+r9] > + mov eax,edi > + mov DWORD[8+rsp],r14d > + mov ecx,r12d > + bswap edx > + xor eax,esi > + rol ecx,5 > + and eax,r13d > + lea r11d,[1518500249+r11*1+r14] > + add r11d,ecx > + xor eax,edi > + rol r13d,30 > + add r11d,eax > + mov ebp,DWORD[16+r9] > + mov eax,esi > + mov DWORD[12+rsp],edx > + mov ecx,r11d > + bswap ebp > + xor eax,r13d > + rol ecx,5 > + and eax,r12d > + lea edi,[1518500249+rdi*1+rdx] > + add edi,ecx > + xor eax,esi > + rol r12d,30 > + add edi,eax > + mov r14d,DWORD[20+r9] > + mov eax,r13d > + mov DWORD[16+rsp],ebp > + mov ecx,edi > + bswap r14d > + xor eax,r12d > + rol ecx,5 > + and eax,r11d > + lea esi,[1518500249+rsi*1+rbp] > + add esi,ecx > + xor eax,r13d > + rol r11d,30 > + add esi,eax > + mov edx,DWORD[24+r9] > + mov eax,r12d > + mov DWORD[20+rsp],r14d > + mov ecx,esi > + bswap edx > + xor eax,r11d > + rol ecx,5 > + and eax,edi > + lea r13d,[1518500249+r13*1+r14] > + add r13d,ecx > + xor eax,r12d > + rol edi,30 > + add r13d,eax > + mov ebp,DWORD[28+r9] > + mov eax,r11d > + mov DWORD[24+rsp],edx > + mov ecx,r13d > + bswap ebp > + xor eax,edi > + rol ecx,5 > + and eax,esi > + lea r12d,[1518500249+r12*1+rdx] > + add r12d,ecx > + xor eax,r11d > + 
rol esi,30 > + add r12d,eax > + mov r14d,DWORD[32+r9] > + mov eax,edi > + mov DWORD[28+rsp],ebp > + mov ecx,r12d > + bswap r14d > + xor eax,esi > + rol ecx,5 > + and eax,r13d > + lea r11d,[1518500249+r11*1+rbp] > + add r11d,ecx > + xor eax,edi > + rol r13d,30 > + add r11d,eax > + mov edx,DWORD[36+r9] > + mov eax,esi > + mov DWORD[32+rsp],r14d > + mov ecx,r11d > + bswap edx > + xor eax,r13d > + rol ecx,5 > + and eax,r12d > + lea edi,[1518500249+rdi*1+r14] > + add edi,ecx > + xor eax,esi > + rol r12d,30 > + add edi,eax > + mov ebp,DWORD[40+r9] > + mov eax,r13d > + mov DWORD[36+rsp],edx > + mov ecx,edi > + bswap ebp > + xor eax,r12d > + rol ecx,5 > + and eax,r11d > + lea esi,[1518500249+rsi*1+rdx] > + add esi,ecx > + xor eax,r13d > + rol r11d,30 > + add esi,eax > + mov r14d,DWORD[44+r9] > + mov eax,r12d > + mov DWORD[40+rsp],ebp > + mov ecx,esi > + bswap r14d > + xor eax,r11d > + rol ecx,5 > + and eax,edi > + lea r13d,[1518500249+r13*1+rbp] > + add r13d,ecx > + xor eax,r12d > + rol edi,30 > + add r13d,eax > + mov edx,DWORD[48+r9] > + mov eax,r11d > + mov DWORD[44+rsp],r14d > + mov ecx,r13d > + bswap edx > + xor eax,edi > + rol ecx,5 > + and eax,esi > + lea r12d,[1518500249+r12*1+r14] > + add r12d,ecx > + xor eax,r11d > + rol esi,30 > + add r12d,eax > + mov ebp,DWORD[52+r9] > + mov eax,edi > + mov DWORD[48+rsp],edx > + mov ecx,r12d > + bswap ebp > + xor eax,esi > + rol ecx,5 > + and eax,r13d > + lea r11d,[1518500249+r11*1+rdx] > + add r11d,ecx > + xor eax,edi > + rol r13d,30 > + add r11d,eax > + mov r14d,DWORD[56+r9] > + mov eax,esi > + mov DWORD[52+rsp],ebp > + mov ecx,r11d > + bswap r14d > + xor eax,r13d > + rol ecx,5 > + and eax,r12d > + lea edi,[1518500249+rdi*1+rbp] > + add edi,ecx > + xor eax,esi > + rol r12d,30 > + add edi,eax > + mov edx,DWORD[60+r9] > + mov eax,r13d > + mov DWORD[56+rsp],r14d > + mov ecx,edi > + bswap edx > + xor eax,r12d > + rol ecx,5 > + and eax,r11d > + lea esi,[1518500249+rsi*1+r14] > + add esi,ecx > + xor eax,r13d > + rol r11d,30 > + add esi,eax > + xor ebp,DWORD[rsp] > + mov eax,r12d > + mov DWORD[60+rsp],edx > + mov ecx,esi > + xor ebp,DWORD[8+rsp] > + xor eax,r11d > + rol ecx,5 > + xor ebp,DWORD[32+rsp] > + and eax,edi > + lea r13d,[1518500249+r13*1+rdx] > + rol edi,30 > + xor eax,r12d > + add r13d,ecx > + rol ebp,1 > + add r13d,eax > + xor r14d,DWORD[4+rsp] > + mov eax,r11d > + mov DWORD[rsp],ebp > + mov ecx,r13d > + xor r14d,DWORD[12+rsp] > + xor eax,edi > + rol ecx,5 > + xor r14d,DWORD[36+rsp] > + and eax,esi > + lea r12d,[1518500249+r12*1+rbp] > + rol esi,30 > + xor eax,r11d > + add r12d,ecx > + rol r14d,1 > + add r12d,eax > + xor edx,DWORD[8+rsp] > + mov eax,edi > + mov DWORD[4+rsp],r14d > + mov ecx,r12d > + xor edx,DWORD[16+rsp] > + xor eax,esi > + rol ecx,5 > + xor edx,DWORD[40+rsp] > + and eax,r13d > + lea r11d,[1518500249+r11*1+r14] > + rol r13d,30 > + xor eax,edi > + add r11d,ecx > + rol edx,1 > + add r11d,eax > + xor ebp,DWORD[12+rsp] > + mov eax,esi > + mov DWORD[8+rsp],edx > + mov ecx,r11d > + xor ebp,DWORD[20+rsp] > + xor eax,r13d > + rol ecx,5 > + xor ebp,DWORD[44+rsp] > + and eax,r12d > + lea edi,[1518500249+rdi*1+rdx] > + rol r12d,30 > + xor eax,esi > + add edi,ecx > + rol ebp,1 > + add edi,eax > + xor r14d,DWORD[16+rsp] > + mov eax,r13d > + mov DWORD[12+rsp],ebp > + mov ecx,edi > + xor r14d,DWORD[24+rsp] > + xor eax,r12d > + rol ecx,5 > + xor r14d,DWORD[48+rsp] > + and eax,r11d > + lea esi,[1518500249+rsi*1+rbp] > + rol r11d,30 > + xor eax,r13d > + add esi,ecx > + rol r14d,1 > + add esi,eax > + xor edx,DWORD[20+rsp] > + mov eax,edi > + mov 
DWORD[16+rsp],r14d > + mov ecx,esi > + xor edx,DWORD[28+rsp] > + xor eax,r12d > + rol ecx,5 > + xor edx,DWORD[52+rsp] > + lea r13d,[1859775393+r13*1+r14] > + xor eax,r11d > + add r13d,ecx > + rol edi,30 > + add r13d,eax > + rol edx,1 > + xor ebp,DWORD[24+rsp] > + mov eax,esi > + mov DWORD[20+rsp],edx > + mov ecx,r13d > + xor ebp,DWORD[32+rsp] > + xor eax,r11d > + rol ecx,5 > + xor ebp,DWORD[56+rsp] > + lea r12d,[1859775393+r12*1+rdx] > + xor eax,edi > + add r12d,ecx > + rol esi,30 > + add r12d,eax > + rol ebp,1 > + xor r14d,DWORD[28+rsp] > + mov eax,r13d > + mov DWORD[24+rsp],ebp > + mov ecx,r12d > + xor r14d,DWORD[36+rsp] > + xor eax,edi > + rol ecx,5 > + xor r14d,DWORD[60+rsp] > + lea r11d,[1859775393+r11*1+rbp] > + xor eax,esi > + add r11d,ecx > + rol r13d,30 > + add r11d,eax > + rol r14d,1 > + xor edx,DWORD[32+rsp] > + mov eax,r12d > + mov DWORD[28+rsp],r14d > + mov ecx,r11d > + xor edx,DWORD[40+rsp] > + xor eax,esi > + rol ecx,5 > + xor edx,DWORD[rsp] > + lea edi,[1859775393+rdi*1+r14] > + xor eax,r13d > + add edi,ecx > + rol r12d,30 > + add edi,eax > + rol edx,1 > + xor ebp,DWORD[36+rsp] > + mov eax,r11d > + mov DWORD[32+rsp],edx > + mov ecx,edi > + xor ebp,DWORD[44+rsp] > + xor eax,r13d > + rol ecx,5 > + xor ebp,DWORD[4+rsp] > + lea esi,[1859775393+rsi*1+rdx] > + xor eax,r12d > + add esi,ecx > + rol r11d,30 > + add esi,eax > + rol ebp,1 > + xor r14d,DWORD[40+rsp] > + mov eax,edi > + mov DWORD[36+rsp],ebp > + mov ecx,esi > + xor r14d,DWORD[48+rsp] > + xor eax,r12d > + rol ecx,5 > + xor r14d,DWORD[8+rsp] > + lea r13d,[1859775393+r13*1+rbp] > + xor eax,r11d > + add r13d,ecx > + rol edi,30 > + add r13d,eax > + rol r14d,1 > + xor edx,DWORD[44+rsp] > + mov eax,esi > + mov DWORD[40+rsp],r14d > + mov ecx,r13d > + xor edx,DWORD[52+rsp] > + xor eax,r11d > + rol ecx,5 > + xor edx,DWORD[12+rsp] > + lea r12d,[1859775393+r12*1+r14] > + xor eax,edi > + add r12d,ecx > + rol esi,30 > + add r12d,eax > + rol edx,1 > + xor ebp,DWORD[48+rsp] > + mov eax,r13d > + mov DWORD[44+rsp],edx > + mov ecx,r12d > + xor ebp,DWORD[56+rsp] > + xor eax,edi > + rol ecx,5 > + xor ebp,DWORD[16+rsp] > + lea r11d,[1859775393+r11*1+rdx] > + xor eax,esi > + add r11d,ecx > + rol r13d,30 > + add r11d,eax > + rol ebp,1 > + xor r14d,DWORD[52+rsp] > + mov eax,r12d > + mov DWORD[48+rsp],ebp > + mov ecx,r11d > + xor r14d,DWORD[60+rsp] > + xor eax,esi > + rol ecx,5 > + xor r14d,DWORD[20+rsp] > + lea edi,[1859775393+rdi*1+rbp] > + xor eax,r13d > + add edi,ecx > + rol r12d,30 > + add edi,eax > + rol r14d,1 > + xor edx,DWORD[56+rsp] > + mov eax,r11d > + mov DWORD[52+rsp],r14d > + mov ecx,edi > + xor edx,DWORD[rsp] > + xor eax,r13d > + rol ecx,5 > + xor edx,DWORD[24+rsp] > + lea esi,[1859775393+rsi*1+r14] > + xor eax,r12d > + add esi,ecx > + rol r11d,30 > + add esi,eax > + rol edx,1 > + xor ebp,DWORD[60+rsp] > + mov eax,edi > + mov DWORD[56+rsp],edx > + mov ecx,esi > + xor ebp,DWORD[4+rsp] > + xor eax,r12d > + rol ecx,5 > + xor ebp,DWORD[28+rsp] > + lea r13d,[1859775393+r13*1+rdx] > + xor eax,r11d > + add r13d,ecx > + rol edi,30 > + add r13d,eax > + rol ebp,1 > + xor r14d,DWORD[rsp] > + mov eax,esi > + mov DWORD[60+rsp],ebp > + mov ecx,r13d > + xor r14d,DWORD[8+rsp] > + xor eax,r11d > + rol ecx,5 > + xor r14d,DWORD[32+rsp] > + lea r12d,[1859775393+r12*1+rbp] > + xor eax,edi > + add r12d,ecx > + rol esi,30 > + add r12d,eax > + rol r14d,1 > + xor edx,DWORD[4+rsp] > + mov eax,r13d > + mov DWORD[rsp],r14d > + mov ecx,r12d > + xor edx,DWORD[12+rsp] > + xor eax,edi > + rol ecx,5 > + xor edx,DWORD[36+rsp] > + lea r11d,[1859775393+r11*1+r14] > 
+ xor eax,esi > + add r11d,ecx > + rol r13d,30 > + add r11d,eax > + rol edx,1 > + xor ebp,DWORD[8+rsp] > + mov eax,r12d > + mov DWORD[4+rsp],edx > + mov ecx,r11d > + xor ebp,DWORD[16+rsp] > + xor eax,esi > + rol ecx,5 > + xor ebp,DWORD[40+rsp] > + lea edi,[1859775393+rdi*1+rdx] > + xor eax,r13d > + add edi,ecx > + rol r12d,30 > + add edi,eax > + rol ebp,1 > + xor r14d,DWORD[12+rsp] > + mov eax,r11d > + mov DWORD[8+rsp],ebp > + mov ecx,edi > + xor r14d,DWORD[20+rsp] > + xor eax,r13d > + rol ecx,5 > + xor r14d,DWORD[44+rsp] > + lea esi,[1859775393+rsi*1+rbp] > + xor eax,r12d > + add esi,ecx > + rol r11d,30 > + add esi,eax > + rol r14d,1 > + xor edx,DWORD[16+rsp] > + mov eax,edi > + mov DWORD[12+rsp],r14d > + mov ecx,esi > + xor edx,DWORD[24+rsp] > + xor eax,r12d > + rol ecx,5 > + xor edx,DWORD[48+rsp] > + lea r13d,[1859775393+r13*1+r14] > + xor eax,r11d > + add r13d,ecx > + rol edi,30 > + add r13d,eax > + rol edx,1 > + xor ebp,DWORD[20+rsp] > + mov eax,esi > + mov DWORD[16+rsp],edx > + mov ecx,r13d > + xor ebp,DWORD[28+rsp] > + xor eax,r11d > + rol ecx,5 > + xor ebp,DWORD[52+rsp] > + lea r12d,[1859775393+r12*1+rdx] > + xor eax,edi > + add r12d,ecx > + rol esi,30 > + add r12d,eax > + rol ebp,1 > + xor r14d,DWORD[24+rsp] > + mov eax,r13d > + mov DWORD[20+rsp],ebp > + mov ecx,r12d > + xor r14d,DWORD[32+rsp] > + xor eax,edi > + rol ecx,5 > + xor r14d,DWORD[56+rsp] > + lea r11d,[1859775393+r11*1+rbp] > + xor eax,esi > + add r11d,ecx > + rol r13d,30 > + add r11d,eax > + rol r14d,1 > + xor edx,DWORD[28+rsp] > + mov eax,r12d > + mov DWORD[24+rsp],r14d > + mov ecx,r11d > + xor edx,DWORD[36+rsp] > + xor eax,esi > + rol ecx,5 > + xor edx,DWORD[60+rsp] > + lea edi,[1859775393+rdi*1+r14] > + xor eax,r13d > + add edi,ecx > + rol r12d,30 > + add edi,eax > + rol edx,1 > + xor ebp,DWORD[32+rsp] > + mov eax,r11d > + mov DWORD[28+rsp],edx > + mov ecx,edi > + xor ebp,DWORD[40+rsp] > + xor eax,r13d > + rol ecx,5 > + xor ebp,DWORD[rsp] > + lea esi,[1859775393+rsi*1+rdx] > + xor eax,r12d > + add esi,ecx > + rol r11d,30 > + add esi,eax > + rol ebp,1 > + xor r14d,DWORD[36+rsp] > + mov eax,r12d > + mov DWORD[32+rsp],ebp > + mov ebx,r12d > + xor r14d,DWORD[44+rsp] > + and eax,r11d > + mov ecx,esi > + xor r14d,DWORD[4+rsp] > + lea r13d,[((-1894007588))+r13*1+rbp] > + xor ebx,r11d > + rol ecx,5 > + add r13d,eax > + rol r14d,1 > + and ebx,edi > + add r13d,ecx > + rol edi,30 > + add r13d,ebx > + xor edx,DWORD[40+rsp] > + mov eax,r11d > + mov DWORD[36+rsp],r14d > + mov ebx,r11d > + xor edx,DWORD[48+rsp] > + and eax,edi > + mov ecx,r13d > + xor edx,DWORD[8+rsp] > + lea r12d,[((-1894007588))+r12*1+r14] > + xor ebx,edi > + rol ecx,5 > + add r12d,eax > + rol edx,1 > + and ebx,esi > + add r12d,ecx > + rol esi,30 > + add r12d,ebx > + xor ebp,DWORD[44+rsp] > + mov eax,edi > + mov DWORD[40+rsp],edx > + mov ebx,edi > + xor ebp,DWORD[52+rsp] > + and eax,esi > + mov ecx,r12d > + xor ebp,DWORD[12+rsp] > + lea r11d,[((-1894007588))+r11*1+rdx] > + xor ebx,esi > + rol ecx,5 > + add r11d,eax > + rol ebp,1 > + and ebx,r13d > + add r11d,ecx > + rol r13d,30 > + add r11d,ebx > + xor r14d,DWORD[48+rsp] > + mov eax,esi > + mov DWORD[44+rsp],ebp > + mov ebx,esi > + xor r14d,DWORD[56+rsp] > + and eax,r13d > + mov ecx,r11d > + xor r14d,DWORD[16+rsp] > + lea edi,[((-1894007588))+rdi*1+rbp] > + xor ebx,r13d > + rol ecx,5 > + add edi,eax > + rol r14d,1 > + and ebx,r12d > + add edi,ecx > + rol r12d,30 > + add edi,ebx > + xor edx,DWORD[52+rsp] > + mov eax,r13d > + mov DWORD[48+rsp],r14d > + mov ebx,r13d > + xor edx,DWORD[60+rsp] > + and eax,r12d > + 
mov ecx,edi > + xor edx,DWORD[20+rsp] > + lea esi,[((-1894007588))+rsi*1+r14] > + xor ebx,r12d > + rol ecx,5 > + add esi,eax > + rol edx,1 > + and ebx,r11d > + add esi,ecx > + rol r11d,30 > + add esi,ebx > + xor ebp,DWORD[56+rsp] > + mov eax,r12d > + mov DWORD[52+rsp],edx > + mov ebx,r12d > + xor ebp,DWORD[rsp] > + and eax,r11d > + mov ecx,esi > + xor ebp,DWORD[24+rsp] > + lea r13d,[((-1894007588))+r13*1+rdx] > + xor ebx,r11d > + rol ecx,5 > + add r13d,eax > + rol ebp,1 > + and ebx,edi > + add r13d,ecx > + rol edi,30 > + add r13d,ebx > + xor r14d,DWORD[60+rsp] > + mov eax,r11d > + mov DWORD[56+rsp],ebp > + mov ebx,r11d > + xor r14d,DWORD[4+rsp] > + and eax,edi > + mov ecx,r13d > + xor r14d,DWORD[28+rsp] > + lea r12d,[((-1894007588))+r12*1+rbp] > + xor ebx,edi > + rol ecx,5 > + add r12d,eax > + rol r14d,1 > + and ebx,esi > + add r12d,ecx > + rol esi,30 > + add r12d,ebx > + xor edx,DWORD[rsp] > + mov eax,edi > + mov DWORD[60+rsp],r14d > + mov ebx,edi > + xor edx,DWORD[8+rsp] > + and eax,esi > + mov ecx,r12d > + xor edx,DWORD[32+rsp] > + lea r11d,[((-1894007588))+r11*1+r14] > + xor ebx,esi > + rol ecx,5 > + add r11d,eax > + rol edx,1 > + and ebx,r13d > + add r11d,ecx > + rol r13d,30 > + add r11d,ebx > + xor ebp,DWORD[4+rsp] > + mov eax,esi > + mov DWORD[rsp],edx > + mov ebx,esi > + xor ebp,DWORD[12+rsp] > + and eax,r13d > + mov ecx,r11d > + xor ebp,DWORD[36+rsp] > + lea edi,[((-1894007588))+rdi*1+rdx] > + xor ebx,r13d > + rol ecx,5 > + add edi,eax > + rol ebp,1 > + and ebx,r12d > + add edi,ecx > + rol r12d,30 > + add edi,ebx > + xor r14d,DWORD[8+rsp] > + mov eax,r13d > + mov DWORD[4+rsp],ebp > + mov ebx,r13d > + xor r14d,DWORD[16+rsp] > + and eax,r12d > + mov ecx,edi > + xor r14d,DWORD[40+rsp] > + lea esi,[((-1894007588))+rsi*1+rbp] > + xor ebx,r12d > + rol ecx,5 > + add esi,eax > + rol r14d,1 > + and ebx,r11d > + add esi,ecx > + rol r11d,30 > + add esi,ebx > + xor edx,DWORD[12+rsp] > + mov eax,r12d > + mov DWORD[8+rsp],r14d > + mov ebx,r12d > + xor edx,DWORD[20+rsp] > + and eax,r11d > + mov ecx,esi > + xor edx,DWORD[44+rsp] > + lea r13d,[((-1894007588))+r13*1+r14] > + xor ebx,r11d > + rol ecx,5 > + add r13d,eax > + rol edx,1 > + and ebx,edi > + add r13d,ecx > + rol edi,30 > + add r13d,ebx > + xor ebp,DWORD[16+rsp] > + mov eax,r11d > + mov DWORD[12+rsp],edx > + mov ebx,r11d > + xor ebp,DWORD[24+rsp] > + and eax,edi > + mov ecx,r13d > + xor ebp,DWORD[48+rsp] > + lea r12d,[((-1894007588))+r12*1+rdx] > + xor ebx,edi > + rol ecx,5 > + add r12d,eax > + rol ebp,1 > + and ebx,esi > + add r12d,ecx > + rol esi,30 > + add r12d,ebx > + xor r14d,DWORD[20+rsp] > + mov eax,edi > + mov DWORD[16+rsp],ebp > + mov ebx,edi > + xor r14d,DWORD[28+rsp] > + and eax,esi > + mov ecx,r12d > + xor r14d,DWORD[52+rsp] > + lea r11d,[((-1894007588))+r11*1+rbp] > + xor ebx,esi > + rol ecx,5 > + add r11d,eax > + rol r14d,1 > + and ebx,r13d > + add r11d,ecx > + rol r13d,30 > + add r11d,ebx > + xor edx,DWORD[24+rsp] > + mov eax,esi > + mov DWORD[20+rsp],r14d > + mov ebx,esi > + xor edx,DWORD[32+rsp] > + and eax,r13d > + mov ecx,r11d > + xor edx,DWORD[56+rsp] > + lea edi,[((-1894007588))+rdi*1+r14] > + xor ebx,r13d > + rol ecx,5 > + add edi,eax > + rol edx,1 > + and ebx,r12d > + add edi,ecx > + rol r12d,30 > + add edi,ebx > + xor ebp,DWORD[28+rsp] > + mov eax,r13d > + mov DWORD[24+rsp],edx > + mov ebx,r13d > + xor ebp,DWORD[36+rsp] > + and eax,r12d > + mov ecx,edi > + xor ebp,DWORD[60+rsp] > + lea esi,[((-1894007588))+rsi*1+rdx] > + xor ebx,r12d > + rol ecx,5 > + add esi,eax > + rol ebp,1 > + and ebx,r11d > + add esi,ecx > + 
rol r11d,30 > + add esi,ebx > + xor r14d,DWORD[32+rsp] > + mov eax,r12d > + mov DWORD[28+rsp],ebp > + mov ebx,r12d > + xor r14d,DWORD[40+rsp] > + and eax,r11d > + mov ecx,esi > + xor r14d,DWORD[rsp] > + lea r13d,[((-1894007588))+r13*1+rbp] > + xor ebx,r11d > + rol ecx,5 > + add r13d,eax > + rol r14d,1 > + and ebx,edi > + add r13d,ecx > + rol edi,30 > + add r13d,ebx > + xor edx,DWORD[36+rsp] > + mov eax,r11d > + mov DWORD[32+rsp],r14d > + mov ebx,r11d > + xor edx,DWORD[44+rsp] > + and eax,edi > + mov ecx,r13d > + xor edx,DWORD[4+rsp] > + lea r12d,[((-1894007588))+r12*1+r14] > + xor ebx,edi > + rol ecx,5 > + add r12d,eax > + rol edx,1 > + and ebx,esi > + add r12d,ecx > + rol esi,30 > + add r12d,ebx > + xor ebp,DWORD[40+rsp] > + mov eax,edi > + mov DWORD[36+rsp],edx > + mov ebx,edi > + xor ebp,DWORD[48+rsp] > + and eax,esi > + mov ecx,r12d > + xor ebp,DWORD[8+rsp] > + lea r11d,[((-1894007588))+r11*1+rdx] > + xor ebx,esi > + rol ecx,5 > + add r11d,eax > + rol ebp,1 > + and ebx,r13d > + add r11d,ecx > + rol r13d,30 > + add r11d,ebx > + xor r14d,DWORD[44+rsp] > + mov eax,esi > + mov DWORD[40+rsp],ebp > + mov ebx,esi > + xor r14d,DWORD[52+rsp] > + and eax,r13d > + mov ecx,r11d > + xor r14d,DWORD[12+rsp] > + lea edi,[((-1894007588))+rdi*1+rbp] > + xor ebx,r13d > + rol ecx,5 > + add edi,eax > + rol r14d,1 > + and ebx,r12d > + add edi,ecx > + rol r12d,30 > + add edi,ebx > + xor edx,DWORD[48+rsp] > + mov eax,r13d > + mov DWORD[44+rsp],r14d > + mov ebx,r13d > + xor edx,DWORD[56+rsp] > + and eax,r12d > + mov ecx,edi > + xor edx,DWORD[16+rsp] > + lea esi,[((-1894007588))+rsi*1+r14] > + xor ebx,r12d > + rol ecx,5 > + add esi,eax > + rol edx,1 > + and ebx,r11d > + add esi,ecx > + rol r11d,30 > + add esi,ebx > + xor ebp,DWORD[52+rsp] > + mov eax,edi > + mov DWORD[48+rsp],edx > + mov ecx,esi > + xor ebp,DWORD[60+rsp] > + xor eax,r12d > + rol ecx,5 > + xor ebp,DWORD[20+rsp] > + lea r13d,[((-899497514))+r13*1+rdx] > + xor eax,r11d > + add r13d,ecx > + rol edi,30 > + add r13d,eax > + rol ebp,1 > + xor r14d,DWORD[56+rsp] > + mov eax,esi > + mov DWORD[52+rsp],ebp > + mov ecx,r13d > + xor r14d,DWORD[rsp] > + xor eax,r11d > + rol ecx,5 > + xor r14d,DWORD[24+rsp] > + lea r12d,[((-899497514))+r12*1+rbp] > + xor eax,edi > + add r12d,ecx > + rol esi,30 > + add r12d,eax > + rol r14d,1 > + xor edx,DWORD[60+rsp] > + mov eax,r13d > + mov DWORD[56+rsp],r14d > + mov ecx,r12d > + xor edx,DWORD[4+rsp] > + xor eax,edi > + rol ecx,5 > + xor edx,DWORD[28+rsp] > + lea r11d,[((-899497514))+r11*1+r14] > + xor eax,esi > + add r11d,ecx > + rol r13d,30 > + add r11d,eax > + rol edx,1 > + xor ebp,DWORD[rsp] > + mov eax,r12d > + mov DWORD[60+rsp],edx > + mov ecx,r11d > + xor ebp,DWORD[8+rsp] > + xor eax,esi > + rol ecx,5 > + xor ebp,DWORD[32+rsp] > + lea edi,[((-899497514))+rdi*1+rdx] > + xor eax,r13d > + add edi,ecx > + rol r12d,30 > + add edi,eax > + rol ebp,1 > + xor r14d,DWORD[4+rsp] > + mov eax,r11d > + mov DWORD[rsp],ebp > + mov ecx,edi > + xor r14d,DWORD[12+rsp] > + xor eax,r13d > + rol ecx,5 > + xor r14d,DWORD[36+rsp] > + lea esi,[((-899497514))+rsi*1+rbp] > + xor eax,r12d > + add esi,ecx > + rol r11d,30 > + add esi,eax > + rol r14d,1 > + xor edx,DWORD[8+rsp] > + mov eax,edi > + mov DWORD[4+rsp],r14d > + mov ecx,esi > + xor edx,DWORD[16+rsp] > + xor eax,r12d > + rol ecx,5 > + xor edx,DWORD[40+rsp] > + lea r13d,[((-899497514))+r13*1+r14] > + xor eax,r11d > + add r13d,ecx > + rol edi,30 > + add r13d,eax > + rol edx,1 > + xor ebp,DWORD[12+rsp] > + mov eax,esi > + mov DWORD[8+rsp],edx > + mov ecx,r13d > + xor ebp,DWORD[20+rsp] > + 
xor eax,r11d > + rol ecx,5 > + xor ebp,DWORD[44+rsp] > + lea r12d,[((-899497514))+r12*1+rdx] > + xor eax,edi > + add r12d,ecx > + rol esi,30 > + add r12d,eax > + rol ebp,1 > + xor r14d,DWORD[16+rsp] > + mov eax,r13d > + mov DWORD[12+rsp],ebp > + mov ecx,r12d > + xor r14d,DWORD[24+rsp] > + xor eax,edi > + rol ecx,5 > + xor r14d,DWORD[48+rsp] > + lea r11d,[((-899497514))+r11*1+rbp] > + xor eax,esi > + add r11d,ecx > + rol r13d,30 > + add r11d,eax > + rol r14d,1 > + xor edx,DWORD[20+rsp] > + mov eax,r12d > + mov DWORD[16+rsp],r14d > + mov ecx,r11d > + xor edx,DWORD[28+rsp] > + xor eax,esi > + rol ecx,5 > + xor edx,DWORD[52+rsp] > + lea edi,[((-899497514))+rdi*1+r14] > + xor eax,r13d > + add edi,ecx > + rol r12d,30 > + add edi,eax > + rol edx,1 > + xor ebp,DWORD[24+rsp] > + mov eax,r11d > + mov DWORD[20+rsp],edx > + mov ecx,edi > + xor ebp,DWORD[32+rsp] > + xor eax,r13d > + rol ecx,5 > + xor ebp,DWORD[56+rsp] > + lea esi,[((-899497514))+rsi*1+rdx] > + xor eax,r12d > + add esi,ecx > + rol r11d,30 > + add esi,eax > + rol ebp,1 > + xor r14d,DWORD[28+rsp] > + mov eax,edi > + mov DWORD[24+rsp],ebp > + mov ecx,esi > + xor r14d,DWORD[36+rsp] > + xor eax,r12d > + rol ecx,5 > + xor r14d,DWORD[60+rsp] > + lea r13d,[((-899497514))+r13*1+rbp] > + xor eax,r11d > + add r13d,ecx > + rol edi,30 > + add r13d,eax > + rol r14d,1 > + xor edx,DWORD[32+rsp] > + mov eax,esi > + mov DWORD[28+rsp],r14d > + mov ecx,r13d > + xor edx,DWORD[40+rsp] > + xor eax,r11d > + rol ecx,5 > + xor edx,DWORD[rsp] > + lea r12d,[((-899497514))+r12*1+r14] > + xor eax,edi > + add r12d,ecx > + rol esi,30 > + add r12d,eax > + rol edx,1 > + xor ebp,DWORD[36+rsp] > + mov eax,r13d > + > + mov ecx,r12d > + xor ebp,DWORD[44+rsp] > + xor eax,edi > + rol ecx,5 > + xor ebp,DWORD[4+rsp] > + lea r11d,[((-899497514))+r11*1+rdx] > + xor eax,esi > + add r11d,ecx > + rol r13d,30 > + add r11d,eax > + rol ebp,1 > + xor r14d,DWORD[40+rsp] > + mov eax,r12d > + > + mov ecx,r11d > + xor r14d,DWORD[48+rsp] > + xor eax,esi > + rol ecx,5 > + xor r14d,DWORD[8+rsp] > + lea edi,[((-899497514))+rdi*1+rbp] > + xor eax,r13d > + add edi,ecx > + rol r12d,30 > + add edi,eax > + rol r14d,1 > + xor edx,DWORD[44+rsp] > + mov eax,r11d > + > + mov ecx,edi > + xor edx,DWORD[52+rsp] > + xor eax,r13d > + rol ecx,5 > + xor edx,DWORD[12+rsp] > + lea esi,[((-899497514))+rsi*1+r14] > + xor eax,r12d > + add esi,ecx > + rol r11d,30 > + add esi,eax > + rol edx,1 > + xor ebp,DWORD[48+rsp] > + mov eax,edi > + > + mov ecx,esi > + xor ebp,DWORD[56+rsp] > + xor eax,r12d > + rol ecx,5 > + xor ebp,DWORD[16+rsp] > + lea r13d,[((-899497514))+r13*1+rdx] > + xor eax,r11d > + add r13d,ecx > + rol edi,30 > + add r13d,eax > + rol ebp,1 > + xor r14d,DWORD[52+rsp] > + mov eax,esi > + > + mov ecx,r13d > + xor r14d,DWORD[60+rsp] > + xor eax,r11d > + rol ecx,5 > + xor r14d,DWORD[20+rsp] > + lea r12d,[((-899497514))+r12*1+rbp] > + xor eax,edi > + add r12d,ecx > + rol esi,30 > + add r12d,eax > + rol r14d,1 > + xor edx,DWORD[56+rsp] > + mov eax,r13d > + > + mov ecx,r12d > + xor edx,DWORD[rsp] > + xor eax,edi > + rol ecx,5 > + xor edx,DWORD[24+rsp] > + lea r11d,[((-899497514))+r11*1+r14] > + xor eax,esi > + add r11d,ecx > + rol r13d,30 > + add r11d,eax > + rol edx,1 > + xor ebp,DWORD[60+rsp] > + mov eax,r12d > + > + mov ecx,r11d > + xor ebp,DWORD[4+rsp] > + xor eax,esi > + rol ecx,5 > + xor ebp,DWORD[28+rsp] > + lea edi,[((-899497514))+rdi*1+rdx] > + xor eax,r13d > + add edi,ecx > + rol r12d,30 > + add edi,eax > + rol ebp,1 > + mov eax,r11d > + mov ecx,edi > + xor eax,r13d > + lea 
esi,[((-899497514))+rsi*1+rbp] > + rol ecx,5 > + xor eax,r12d > + add esi,ecx > + rol r11d,30 > + add esi,eax > + add esi,DWORD[r8] > + add edi,DWORD[4+r8] > + add r11d,DWORD[8+r8] > + add r12d,DWORD[12+r8] > + add r13d,DWORD[16+r8] > + mov DWORD[r8],esi > + mov DWORD[4+r8],edi > + mov DWORD[8+r8],r11d > + mov DWORD[12+r8],r12d > + mov DWORD[16+r8],r13d > + > + sub r10,1 > + lea r9,[64+r9] > + jnz NEAR $L$loop > + > + mov rsi,QWORD[64+rsp] > + > + mov r14,QWORD[((-40))+rsi] > + > + mov r13,QWORD[((-32))+rsi] > + > + mov r12,QWORD[((-24))+rsi] > + > + mov rbp,QWORD[((-16))+rsi] > + > + mov rbx,QWORD[((-8))+rsi] > + > + lea rsp,[rsi] > + > +$L$epilogue: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_sha1_block_data_order: > + > +ALIGN 32 > +sha1_block_data_order_shaext: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_sha1_block_data_order_shaext: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + > + > +_shaext_shortcut: > + > + lea rsp,[((-72))+rsp] > + movaps XMMWORD[(-8-64)+rax],xmm6 > + movaps XMMWORD[(-8-48)+rax],xmm7 > + movaps XMMWORD[(-8-32)+rax],xmm8 > + movaps XMMWORD[(-8-16)+rax],xmm9 > +$L$prologue_shaext: > + movdqu xmm0,XMMWORD[rdi] > + movd xmm1,DWORD[16+rdi] > + movdqa xmm3,XMMWORD[((K_XX_XX+160))] > + > + movdqu xmm4,XMMWORD[rsi] > + pshufd xmm0,xmm0,27 > + movdqu xmm5,XMMWORD[16+rsi] > + pshufd xmm1,xmm1,27 > + movdqu xmm6,XMMWORD[32+rsi] > +DB 102,15,56,0,227 > + movdqu xmm7,XMMWORD[48+rsi] > +DB 102,15,56,0,235 > +DB 102,15,56,0,243 > + movdqa xmm9,xmm1 > +DB 102,15,56,0,251 > + jmp NEAR $L$oop_shaext > + > +ALIGN 16 > +$L$oop_shaext: > + dec rdx > + lea r8,[64+rsi] > + paddd xmm1,xmm4 > + cmovne rsi,r8 > + movdqa xmm8,xmm0 > +DB 15,56,201,229 > + movdqa xmm2,xmm0 > +DB 15,58,204,193,0 > +DB 15,56,200,213 > + pxor xmm4,xmm6 > +DB 15,56,201,238 > +DB 15,56,202,231 > + > + movdqa xmm1,xmm0 > +DB 15,58,204,194,0 > +DB 15,56,200,206 > + pxor xmm5,xmm7 > +DB 15,56,202,236 > +DB 15,56,201,247 > + movdqa xmm2,xmm0 > +DB 15,58,204,193,0 > +DB 15,56,200,215 > + pxor xmm6,xmm4 > +DB 15,56,201,252 > +DB 15,56,202,245 > + > + movdqa xmm1,xmm0 > +DB 15,58,204,194,0 > +DB 15,56,200,204 > + pxor xmm7,xmm5 > +DB 15,56,202,254 > +DB 15,56,201,229 > + movdqa xmm2,xmm0 > +DB 15,58,204,193,0 > +DB 15,56,200,213 > + pxor xmm4,xmm6 > +DB 15,56,201,238 > +DB 15,56,202,231 > + > + movdqa xmm1,xmm0 > +DB 15,58,204,194,1 > +DB 15,56,200,206 > + pxor xmm5,xmm7 > +DB 15,56,202,236 > +DB 15,56,201,247 > + movdqa xmm2,xmm0 > +DB 15,58,204,193,1 > +DB 15,56,200,215 > + pxor xmm6,xmm4 > +DB 15,56,201,252 > +DB 15,56,202,245 > + > + movdqa xmm1,xmm0 > +DB 15,58,204,194,1 > +DB 15,56,200,204 > + pxor xmm7,xmm5 > +DB 15,56,202,254 > +DB 15,56,201,229 > + movdqa xmm2,xmm0 > +DB 15,58,204,193,1 > +DB 15,56,200,213 > + pxor xmm4,xmm6 > +DB 15,56,201,238 > +DB 15,56,202,231 > + > + movdqa xmm1,xmm0 > +DB 15,58,204,194,1 > +DB 15,56,200,206 > + pxor xmm5,xmm7 > +DB 15,56,202,236 > +DB 15,56,201,247 > + movdqa xmm2,xmm0 > +DB 15,58,204,193,2 > +DB 15,56,200,215 > + pxor xmm6,xmm4 > +DB 15,56,201,252 > +DB 15,56,202,245 > + > + movdqa xmm1,xmm0 > +DB 15,58,204,194,2 > +DB 15,56,200,204 > + pxor xmm7,xmm5 > +DB 15,56,202,254 > +DB 15,56,201,229 > + movdqa xmm2,xmm0 > +DB 15,58,204,193,2 > +DB 15,56,200,213 > + pxor xmm4,xmm6 > +DB 15,56,201,238 > +DB 15,56,202,231 > + > + movdqa xmm1,xmm0 > +DB 15,58,204,194,2 > +DB 15,56,200,206 > + pxor xmm5,xmm7 > +DB 15,56,202,236 > +DB 15,56,201,247 > + movdqa 
xmm2,xmm0 > +DB 15,58,204,193,2 > +DB 15,56,200,215 > + pxor xmm6,xmm4 > +DB 15,56,201,252 > +DB 15,56,202,245 > + > + movdqa xmm1,xmm0 > +DB 15,58,204,194,3 > +DB 15,56,200,204 > + pxor xmm7,xmm5 > +DB 15,56,202,254 > + movdqu xmm4,XMMWORD[rsi] > + movdqa xmm2,xmm0 > +DB 15,58,204,193,3 > +DB 15,56,200,213 > + movdqu xmm5,XMMWORD[16+rsi] > +DB 102,15,56,0,227 > + > + movdqa xmm1,xmm0 > +DB 15,58,204,194,3 > +DB 15,56,200,206 > + movdqu xmm6,XMMWORD[32+rsi] > +DB 102,15,56,0,235 > + > + movdqa xmm2,xmm0 > +DB 15,58,204,193,3 > +DB 15,56,200,215 > + movdqu xmm7,XMMWORD[48+rsi] > +DB 102,15,56,0,243 > + > + movdqa xmm1,xmm0 > +DB 15,58,204,194,3 > +DB 65,15,56,200,201 > +DB 102,15,56,0,251 > + > + paddd xmm0,xmm8 > + movdqa xmm9,xmm1 > + > + jnz NEAR $L$oop_shaext > + > + pshufd xmm0,xmm0,27 > + pshufd xmm1,xmm1,27 > + movdqu XMMWORD[rdi],xmm0 > + movd DWORD[16+rdi],xmm1 > + movaps xmm6,XMMWORD[((-8-64))+rax] > + movaps xmm7,XMMWORD[((-8-48))+rax] > + movaps xmm8,XMMWORD[((-8-32))+rax] > + movaps xmm9,XMMWORD[((-8-16))+rax] > + mov rsp,rax > +$L$epilogue_shaext: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_sha1_block_data_order_shaext: > + > +ALIGN 16 > +sha1_block_data_order_ssse3: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_sha1_block_data_order_ssse3: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + > + > +_ssse3_shortcut: > + > + mov r11,rsp > + > + push rbx > + > + push rbp > + > + push r12 > + > + push r13 > + > + push r14 > + > + lea rsp,[((-160))+rsp] > + movaps XMMWORD[(-40-96)+r11],xmm6 > + movaps XMMWORD[(-40-80)+r11],xmm7 > + movaps XMMWORD[(-40-64)+r11],xmm8 > + movaps XMMWORD[(-40-48)+r11],xmm9 > + movaps XMMWORD[(-40-32)+r11],xmm10 > + movaps XMMWORD[(-40-16)+r11],xmm11 > +$L$prologue_ssse3: > + and rsp,-64 > + mov r8,rdi > + mov r9,rsi > + mov r10,rdx > + > + shl r10,6 > + add r10,r9 > + lea r14,[((K_XX_XX+64))] > + > + mov eax,DWORD[r8] > + mov ebx,DWORD[4+r8] > + mov ecx,DWORD[8+r8] > + mov edx,DWORD[12+r8] > + mov esi,ebx > + mov ebp,DWORD[16+r8] > + mov edi,ecx > + xor edi,edx > + and esi,edi > + > + movdqa xmm6,XMMWORD[64+r14] > + movdqa xmm9,XMMWORD[((-64))+r14] > + movdqu xmm0,XMMWORD[r9] > + movdqu xmm1,XMMWORD[16+r9] > + movdqu xmm2,XMMWORD[32+r9] > + movdqu xmm3,XMMWORD[48+r9] > +DB 102,15,56,0,198 > +DB 102,15,56,0,206 > +DB 102,15,56,0,214 > + add r9,64 > + paddd xmm0,xmm9 > +DB 102,15,56,0,222 > + paddd xmm1,xmm9 > + paddd xmm2,xmm9 > + movdqa XMMWORD[rsp],xmm0 > + psubd xmm0,xmm9 > + movdqa XMMWORD[16+rsp],xmm1 > + psubd xmm1,xmm9 > + movdqa XMMWORD[32+rsp],xmm2 > + psubd xmm2,xmm9 > + jmp NEAR $L$oop_ssse3 > +ALIGN 16 > +$L$oop_ssse3: > + ror ebx,2 > + pshufd xmm4,xmm0,238 > + xor esi,edx > + movdqa xmm8,xmm3 > + paddd xmm9,xmm3 > + mov edi,eax > + add ebp,DWORD[rsp] > + punpcklqdq xmm4,xmm1 > + xor ebx,ecx > + rol eax,5 > + add ebp,esi > + psrldq xmm8,4 > + and edi,ebx > + xor ebx,ecx > + pxor xmm4,xmm0 > + add ebp,eax > + ror eax,7 > + pxor xmm8,xmm2 > + xor edi,ecx > + mov esi,ebp > + add edx,DWORD[4+rsp] > + pxor xmm4,xmm8 > + xor eax,ebx > + rol ebp,5 > + movdqa XMMWORD[48+rsp],xmm9 > + add edx,edi > + and esi,eax > + movdqa xmm10,xmm4 > + xor eax,ebx > + add edx,ebp > + ror ebp,7 > + movdqa xmm8,xmm4 > + xor esi,ebx > + pslldq xmm10,12 > + paddd xmm4,xmm4 > + mov edi,edx > + add ecx,DWORD[8+rsp] > + psrld xmm8,31 > + xor ebp,eax > + rol edx,5 > + add ecx,esi > + movdqa xmm9,xmm10 > + and edi,ebp > + xor ebp,eax > + psrld xmm10,30 > + 
add ecx,edx > + ror edx,7 > + por xmm4,xmm8 > + xor edi,eax > + mov esi,ecx > + add ebx,DWORD[12+rsp] > + pslld xmm9,2 > + pxor xmm4,xmm10 > + xor edx,ebp > + movdqa xmm10,XMMWORD[((-64))+r14] > + rol ecx,5 > + add ebx,edi > + and esi,edx > + pxor xmm4,xmm9 > + xor edx,ebp > + add ebx,ecx > + ror ecx,7 > + pshufd xmm5,xmm1,238 > + xor esi,ebp > + movdqa xmm9,xmm4 > + paddd xmm10,xmm4 > + mov edi,ebx > + add eax,DWORD[16+rsp] > + punpcklqdq xmm5,xmm2 > + xor ecx,edx > + rol ebx,5 > + add eax,esi > + psrldq xmm9,4 > + and edi,ecx > + xor ecx,edx > + pxor xmm5,xmm1 > + add eax,ebx > + ror ebx,7 > + pxor xmm9,xmm3 > + xor edi,edx > + mov esi,eax > + add ebp,DWORD[20+rsp] > + pxor xmm5,xmm9 > + xor ebx,ecx > + rol eax,5 > + movdqa XMMWORD[rsp],xmm10 > + add ebp,edi > + and esi,ebx > + movdqa xmm8,xmm5 > + xor ebx,ecx > + add ebp,eax > + ror eax,7 > + movdqa xmm9,xmm5 > + xor esi,ecx > + pslldq xmm8,12 > + paddd xmm5,xmm5 > + mov edi,ebp > + add edx,DWORD[24+rsp] > + psrld xmm9,31 > + xor eax,ebx > + rol ebp,5 > + add edx,esi > + movdqa xmm10,xmm8 > + and edi,eax > + xor eax,ebx > + psrld xmm8,30 > + add edx,ebp > + ror ebp,7 > + por xmm5,xmm9 > + xor edi,ebx > + mov esi,edx > + add ecx,DWORD[28+rsp] > + pslld xmm10,2 > + pxor xmm5,xmm8 > + xor ebp,eax > + movdqa xmm8,XMMWORD[((-32))+r14] > + rol edx,5 > + add ecx,edi > + and esi,ebp > + pxor xmm5,xmm10 > + xor ebp,eax > + add ecx,edx > + ror edx,7 > + pshufd xmm6,xmm2,238 > + xor esi,eax > + movdqa xmm10,xmm5 > + paddd xmm8,xmm5 > + mov edi,ecx > + add ebx,DWORD[32+rsp] > + punpcklqdq xmm6,xmm3 > + xor edx,ebp > + rol ecx,5 > + add ebx,esi > + psrldq xmm10,4 > + and edi,edx > + xor edx,ebp > + pxor xmm6,xmm2 > + add ebx,ecx > + ror ecx,7 > + pxor xmm10,xmm4 > + xor edi,ebp > + mov esi,ebx > + add eax,DWORD[36+rsp] > + pxor xmm6,xmm10 > + xor ecx,edx > + rol ebx,5 > + movdqa XMMWORD[16+rsp],xmm8 > + add eax,edi > + and esi,ecx > + movdqa xmm9,xmm6 > + xor ecx,edx > + add eax,ebx > + ror ebx,7 > + movdqa xmm10,xmm6 > + xor esi,edx > + pslldq xmm9,12 > + paddd xmm6,xmm6 > + mov edi,eax > + add ebp,DWORD[40+rsp] > + psrld xmm10,31 > + xor ebx,ecx > + rol eax,5 > + add ebp,esi > + movdqa xmm8,xmm9 > + and edi,ebx > + xor ebx,ecx > + psrld xmm9,30 > + add ebp,eax > + ror eax,7 > + por xmm6,xmm10 > + xor edi,ecx > + mov esi,ebp > + add edx,DWORD[44+rsp] > + pslld xmm8,2 > + pxor xmm6,xmm9 > + xor eax,ebx > + movdqa xmm9,XMMWORD[((-32))+r14] > + rol ebp,5 > + add edx,edi > + and esi,eax > + pxor xmm6,xmm8 > + xor eax,ebx > + add edx,ebp > + ror ebp,7 > + pshufd xmm7,xmm3,238 > + xor esi,ebx > + movdqa xmm8,xmm6 > + paddd xmm9,xmm6 > + mov edi,edx > + add ecx,DWORD[48+rsp] > + punpcklqdq xmm7,xmm4 > + xor ebp,eax > + rol edx,5 > + add ecx,esi > + psrldq xmm8,4 > + and edi,ebp > + xor ebp,eax > + pxor xmm7,xmm3 > + add ecx,edx > + ror edx,7 > + pxor xmm8,xmm5 > + xor edi,eax > + mov esi,ecx > + add ebx,DWORD[52+rsp] > + pxor xmm7,xmm8 > + xor edx,ebp > + rol ecx,5 > + movdqa XMMWORD[32+rsp],xmm9 > + add ebx,edi > + and esi,edx > + movdqa xmm10,xmm7 > + xor edx,ebp > + add ebx,ecx > + ror ecx,7 > + movdqa xmm8,xmm7 > + xor esi,ebp > + pslldq xmm10,12 > + paddd xmm7,xmm7 > + mov edi,ebx > + add eax,DWORD[56+rsp] > + psrld xmm8,31 > + xor ecx,edx > + rol ebx,5 > + add eax,esi > + movdqa xmm9,xmm10 > + and edi,ecx > + xor ecx,edx > + psrld xmm10,30 > + add eax,ebx > + ror ebx,7 > + por xmm7,xmm8 > + xor edi,edx > + mov esi,eax > + add ebp,DWORD[60+rsp] > + pslld xmm9,2 > + pxor xmm7,xmm10 > + xor ebx,ecx > + movdqa xmm10,XMMWORD[((-32))+r14] > + rol eax,5 > 
+ add ebp,edi > + and esi,ebx > + pxor xmm7,xmm9 > + pshufd xmm9,xmm6,238 > + xor ebx,ecx > + add ebp,eax > + ror eax,7 > + pxor xmm0,xmm4 > + xor esi,ecx > + mov edi,ebp > + add edx,DWORD[rsp] > + punpcklqdq xmm9,xmm7 > + xor eax,ebx > + rol ebp,5 > + pxor xmm0,xmm1 > + add edx,esi > + and edi,eax > + movdqa xmm8,xmm10 > + xor eax,ebx > + paddd xmm10,xmm7 > + add edx,ebp > + pxor xmm0,xmm9 > + ror ebp,7 > + xor edi,ebx > + mov esi,edx > + add ecx,DWORD[4+rsp] > + movdqa xmm9,xmm0 > + xor ebp,eax > + rol edx,5 > + movdqa XMMWORD[48+rsp],xmm10 > + add ecx,edi > + and esi,ebp > + xor ebp,eax > + pslld xmm0,2 > + add ecx,edx > + ror edx,7 > + psrld xmm9,30 > + xor esi,eax > + mov edi,ecx > + add ebx,DWORD[8+rsp] > + por xmm0,xmm9 > + xor edx,ebp > + rol ecx,5 > + pshufd xmm10,xmm7,238 > + add ebx,esi > + and edi,edx > + xor edx,ebp > + add ebx,ecx > + add eax,DWORD[12+rsp] > + xor edi,ebp > + mov esi,ebx > + rol ebx,5 > + add eax,edi > + xor esi,edx > + ror ecx,7 > + add eax,ebx > + pxor xmm1,xmm5 > + add ebp,DWORD[16+rsp] > + xor esi,ecx > + punpcklqdq xmm10,xmm0 > + mov edi,eax > + rol eax,5 > + pxor xmm1,xmm2 > + add ebp,esi > + xor edi,ecx > + movdqa xmm9,xmm8 > + ror ebx,7 > + paddd xmm8,xmm0 > + add ebp,eax > + pxor xmm1,xmm10 > + add edx,DWORD[20+rsp] > + xor edi,ebx > + mov esi,ebp > + rol ebp,5 > + movdqa xmm10,xmm1 > + add edx,edi > + xor esi,ebx > + movdqa XMMWORD[rsp],xmm8 > + ror eax,7 > + add edx,ebp > + add ecx,DWORD[24+rsp] > + pslld xmm1,2 > + xor esi,eax > + mov edi,edx > + psrld xmm10,30 > + rol edx,5 > + add ecx,esi > + xor edi,eax > + ror ebp,7 > + por xmm1,xmm10 > + add ecx,edx > + add ebx,DWORD[28+rsp] > + pshufd xmm8,xmm0,238 > + xor edi,ebp > + mov esi,ecx > + rol ecx,5 > + add ebx,edi > + xor esi,ebp > + ror edx,7 > + add ebx,ecx > + pxor xmm2,xmm6 > + add eax,DWORD[32+rsp] > + xor esi,edx > + punpcklqdq xmm8,xmm1 > + mov edi,ebx > + rol ebx,5 > + pxor xmm2,xmm3 > + add eax,esi > + xor edi,edx > + movdqa xmm10,XMMWORD[r14] > + ror ecx,7 > + paddd xmm9,xmm1 > + add eax,ebx > + pxor xmm2,xmm8 > + add ebp,DWORD[36+rsp] > + xor edi,ecx > + mov esi,eax > + rol eax,5 > + movdqa xmm8,xmm2 > + add ebp,edi > + xor esi,ecx > + movdqa XMMWORD[16+rsp],xmm9 > + ror ebx,7 > + add ebp,eax > + add edx,DWORD[40+rsp] > + pslld xmm2,2 > + xor esi,ebx > + mov edi,ebp > + psrld xmm8,30 > + rol ebp,5 > + add edx,esi > + xor edi,ebx > + ror eax,7 > + por xmm2,xmm8 > + add edx,ebp > + add ecx,DWORD[44+rsp] > + pshufd xmm9,xmm1,238 > + xor edi,eax > + mov esi,edx > + rol edx,5 > + add ecx,edi > + xor esi,eax > + ror ebp,7 > + add ecx,edx > + pxor xmm3,xmm7 > + add ebx,DWORD[48+rsp] > + xor esi,ebp > + punpcklqdq xmm9,xmm2 > + mov edi,ecx > + rol ecx,5 > + pxor xmm3,xmm4 > + add ebx,esi > + xor edi,ebp > + movdqa xmm8,xmm10 > + ror edx,7 > + paddd xmm10,xmm2 > + add ebx,ecx > + pxor xmm3,xmm9 > + add eax,DWORD[52+rsp] > + xor edi,edx > + mov esi,ebx > + rol ebx,5 > + movdqa xmm9,xmm3 > + add eax,edi > + xor esi,edx > + movdqa XMMWORD[32+rsp],xmm10 > + ror ecx,7 > + add eax,ebx > + add ebp,DWORD[56+rsp] > + pslld xmm3,2 > + xor esi,ecx > + mov edi,eax > + psrld xmm9,30 > + rol eax,5 > + add ebp,esi > + xor edi,ecx > + ror ebx,7 > + por xmm3,xmm9 > + add ebp,eax > + add edx,DWORD[60+rsp] > + pshufd xmm10,xmm2,238 > + xor edi,ebx > + mov esi,ebp > + rol ebp,5 > + add edx,edi > + xor esi,ebx > + ror eax,7 > + add edx,ebp > + pxor xmm4,xmm0 > + add ecx,DWORD[rsp] > + xor esi,eax > + punpcklqdq xmm10,xmm3 > + mov edi,edx > + rol edx,5 > + pxor xmm4,xmm5 > + add ecx,esi > + xor edi,eax > + movdqa 
xmm9,xmm8 > + ror ebp,7 > + paddd xmm8,xmm3 > + add ecx,edx > + pxor xmm4,xmm10 > + add ebx,DWORD[4+rsp] > + xor edi,ebp > + mov esi,ecx > + rol ecx,5 > + movdqa xmm10,xmm4 > + add ebx,edi > + xor esi,ebp > + movdqa XMMWORD[48+rsp],xmm8 > + ror edx,7 > + add ebx,ecx > + add eax,DWORD[8+rsp] > + pslld xmm4,2 > + xor esi,edx > + mov edi,ebx > + psrld xmm10,30 > + rol ebx,5 > + add eax,esi > + xor edi,edx > + ror ecx,7 > + por xmm4,xmm10 > + add eax,ebx > + add ebp,DWORD[12+rsp] > + pshufd xmm8,xmm3,238 > + xor edi,ecx > + mov esi,eax > + rol eax,5 > + add ebp,edi > + xor esi,ecx > + ror ebx,7 > + add ebp,eax > + pxor xmm5,xmm1 > + add edx,DWORD[16+rsp] > + xor esi,ebx > + punpcklqdq xmm8,xmm4 > + mov edi,ebp > + rol ebp,5 > + pxor xmm5,xmm6 > + add edx,esi > + xor edi,ebx > + movdqa xmm10,xmm9 > + ror eax,7 > + paddd xmm9,xmm4 > + add edx,ebp > + pxor xmm5,xmm8 > + add ecx,DWORD[20+rsp] > + xor edi,eax > + mov esi,edx > + rol edx,5 > + movdqa xmm8,xmm5 > + add ecx,edi > + xor esi,eax > + movdqa XMMWORD[rsp],xmm9 > + ror ebp,7 > + add ecx,edx > + add ebx,DWORD[24+rsp] > + pslld xmm5,2 > + xor esi,ebp > + mov edi,ecx > + psrld xmm8,30 > + rol ecx,5 > + add ebx,esi > + xor edi,ebp > + ror edx,7 > + por xmm5,xmm8 > + add ebx,ecx > + add eax,DWORD[28+rsp] > + pshufd xmm9,xmm4,238 > + ror ecx,7 > + mov esi,ebx > + xor edi,edx > + rol ebx,5 > + add eax,edi > + xor esi,ecx > + xor ecx,edx > + add eax,ebx > + pxor xmm6,xmm2 > + add ebp,DWORD[32+rsp] > + and esi,ecx > + xor ecx,edx > + ror ebx,7 > + punpcklqdq xmm9,xmm5 > + mov edi,eax > + xor esi,ecx > + pxor xmm6,xmm7 > + rol eax,5 > + add ebp,esi > + movdqa xmm8,xmm10 > + xor edi,ebx > + paddd xmm10,xmm5 > + xor ebx,ecx > + pxor xmm6,xmm9 > + add ebp,eax > + add edx,DWORD[36+rsp] > + and edi,ebx > + xor ebx,ecx > + ror eax,7 > + movdqa xmm9,xmm6 > + mov esi,ebp > + xor edi,ebx > + movdqa XMMWORD[16+rsp],xmm10 > + rol ebp,5 > + add edx,edi > + xor esi,eax > + pslld xmm6,2 > + xor eax,ebx > + add edx,ebp > + psrld xmm9,30 > + add ecx,DWORD[40+rsp] > + and esi,eax > + xor eax,ebx > + por xmm6,xmm9 > + ror ebp,7 > + mov edi,edx > + xor esi,eax > + rol edx,5 > + pshufd xmm10,xmm5,238 > + add ecx,esi > + xor edi,ebp > + xor ebp,eax > + add ecx,edx > + add ebx,DWORD[44+rsp] > + and edi,ebp > + xor ebp,eax > + ror edx,7 > + mov esi,ecx > + xor edi,ebp > + rol ecx,5 > + add ebx,edi > + xor esi,edx > + xor edx,ebp > + add ebx,ecx > + pxor xmm7,xmm3 > + add eax,DWORD[48+rsp] > + and esi,edx > + xor edx,ebp > + ror ecx,7 > + punpcklqdq xmm10,xmm6 > + mov edi,ebx > + xor esi,edx > + pxor xmm7,xmm0 > + rol ebx,5 > + add eax,esi > + movdqa xmm9,XMMWORD[32+r14] > + xor edi,ecx > + paddd xmm8,xmm6 > + xor ecx,edx > + pxor xmm7,xmm10 > + add eax,ebx > + add ebp,DWORD[52+rsp] > + and edi,ecx > + xor ecx,edx > + ror ebx,7 > + movdqa xmm10,xmm7 > + mov esi,eax > + xor edi,ecx > + movdqa XMMWORD[32+rsp],xmm8 > + rol eax,5 > + add ebp,edi > + xor esi,ebx > + pslld xmm7,2 > + xor ebx,ecx > + add ebp,eax > + psrld xmm10,30 > + add edx,DWORD[56+rsp] > + and esi,ebx > + xor ebx,ecx > + por xmm7,xmm10 > + ror eax,7 > + mov edi,ebp > + xor esi,ebx > + rol ebp,5 > + pshufd xmm8,xmm6,238 > + add edx,esi > + xor edi,eax > + xor eax,ebx > + add edx,ebp > + add ecx,DWORD[60+rsp] > + and edi,eax > + xor eax,ebx > + ror ebp,7 > + mov esi,edx > + xor edi,eax > + rol edx,5 > + add ecx,edi > + xor esi,ebp > + xor ebp,eax > + add ecx,edx > + pxor xmm0,xmm4 > + add ebx,DWORD[rsp] > + and esi,ebp > + xor ebp,eax > + ror edx,7 > + punpcklqdq xmm8,xmm7 > + mov edi,ecx > + xor esi,ebp > + 
pxor xmm0,xmm1 > + rol ecx,5 > + add ebx,esi > + movdqa xmm10,xmm9 > + xor edi,edx > + paddd xmm9,xmm7 > + xor edx,ebp > + pxor xmm0,xmm8 > + add ebx,ecx > + add eax,DWORD[4+rsp] > + and edi,edx > + xor edx,ebp > + ror ecx,7 > + movdqa xmm8,xmm0 > + mov esi,ebx > + xor edi,edx > + movdqa XMMWORD[48+rsp],xmm9 > + rol ebx,5 > + add eax,edi > + xor esi,ecx > + pslld xmm0,2 > + xor ecx,edx > + add eax,ebx > + psrld xmm8,30 > + add ebp,DWORD[8+rsp] > + and esi,ecx > + xor ecx,edx > + por xmm0,xmm8 > + ror ebx,7 > + mov edi,eax > + xor esi,ecx > + rol eax,5 > + pshufd xmm9,xmm7,238 > + add ebp,esi > + xor edi,ebx > + xor ebx,ecx > + add ebp,eax > + add edx,DWORD[12+rsp] > + and edi,ebx > + xor ebx,ecx > + ror eax,7 > + mov esi,ebp > + xor edi,ebx > + rol ebp,5 > + add edx,edi > + xor esi,eax > + xor eax,ebx > + add edx,ebp > + pxor xmm1,xmm5 > + add ecx,DWORD[16+rsp] > + and esi,eax > + xor eax,ebx > + ror ebp,7 > + punpcklqdq xmm9,xmm0 > + mov edi,edx > + xor esi,eax > + pxor xmm1,xmm2 > + rol edx,5 > + add ecx,esi > + movdqa xmm8,xmm10 > + xor edi,ebp > + paddd xmm10,xmm0 > + xor ebp,eax > + pxor xmm1,xmm9 > + add ecx,edx > + add ebx,DWORD[20+rsp] > + and edi,ebp > + xor ebp,eax > + ror edx,7 > + movdqa xmm9,xmm1 > + mov esi,ecx > + xor edi,ebp > + movdqa XMMWORD[rsp],xmm10 > + rol ecx,5 > + add ebx,edi > + xor esi,edx > + pslld xmm1,2 > + xor edx,ebp > + add ebx,ecx > + psrld xmm9,30 > + add eax,DWORD[24+rsp] > + and esi,edx > + xor edx,ebp > + por xmm1,xmm9 > + ror ecx,7 > + mov edi,ebx > + xor esi,edx > + rol ebx,5 > + pshufd xmm10,xmm0,238 > + add eax,esi > + xor edi,ecx > + xor ecx,edx > + add eax,ebx > + add ebp,DWORD[28+rsp] > + and edi,ecx > + xor ecx,edx > + ror ebx,7 > + mov esi,eax > + xor edi,ecx > + rol eax,5 > + add ebp,edi > + xor esi,ebx > + xor ebx,ecx > + add ebp,eax > + pxor xmm2,xmm6 > + add edx,DWORD[32+rsp] > + and esi,ebx > + xor ebx,ecx > + ror eax,7 > + punpcklqdq xmm10,xmm1 > + mov edi,ebp > + xor esi,ebx > + pxor xmm2,xmm3 > + rol ebp,5 > + add edx,esi > + movdqa xmm9,xmm8 > + xor edi,eax > + paddd xmm8,xmm1 > + xor eax,ebx > + pxor xmm2,xmm10 > + add edx,ebp > + add ecx,DWORD[36+rsp] > + and edi,eax > + xor eax,ebx > + ror ebp,7 > + movdqa xmm10,xmm2 > + mov esi,edx > + xor edi,eax > + movdqa XMMWORD[16+rsp],xmm8 > + rol edx,5 > + add ecx,edi > + xor esi,ebp > + pslld xmm2,2 > + xor ebp,eax > + add ecx,edx > + psrld xmm10,30 > + add ebx,DWORD[40+rsp] > + and esi,ebp > + xor ebp,eax > + por xmm2,xmm10 > + ror edx,7 > + mov edi,ecx > + xor esi,ebp > + rol ecx,5 > + pshufd xmm8,xmm1,238 > + add ebx,esi > + xor edi,edx > + xor edx,ebp > + add ebx,ecx > + add eax,DWORD[44+rsp] > + and edi,edx > + xor edx,ebp > + ror ecx,7 > + mov esi,ebx > + xor edi,edx > + rol ebx,5 > + add eax,edi > + xor esi,edx > + add eax,ebx > + pxor xmm3,xmm7 > + add ebp,DWORD[48+rsp] > + xor esi,ecx > + punpcklqdq xmm8,xmm2 > + mov edi,eax > + rol eax,5 > + pxor xmm3,xmm4 > + add ebp,esi > + xor edi,ecx > + movdqa xmm10,xmm9 > + ror ebx,7 > + paddd xmm9,xmm2 > + add ebp,eax > + pxor xmm3,xmm8 > + add edx,DWORD[52+rsp] > + xor edi,ebx > + mov esi,ebp > + rol ebp,5 > + movdqa xmm8,xmm3 > + add edx,edi > + xor esi,ebx > + movdqa XMMWORD[32+rsp],xmm9 > + ror eax,7 > + add edx,ebp > + add ecx,DWORD[56+rsp] > + pslld xmm3,2 > + xor esi,eax > + mov edi,edx > + psrld xmm8,30 > + rol edx,5 > + add ecx,esi > + xor edi,eax > + ror ebp,7 > + por xmm3,xmm8 > + add ecx,edx > + add ebx,DWORD[60+rsp] > + xor edi,ebp > + mov esi,ecx > + rol ecx,5 > + add ebx,edi > + xor esi,ebp > + ror edx,7 > + add ebx,ecx > + 
add eax,DWORD[rsp] > + xor esi,edx > + mov edi,ebx > + rol ebx,5 > + paddd xmm10,xmm3 > + add eax,esi > + xor edi,edx > + movdqa XMMWORD[48+rsp],xmm10 > + ror ecx,7 > + add eax,ebx > + add ebp,DWORD[4+rsp] > + xor edi,ecx > + mov esi,eax > + rol eax,5 > + add ebp,edi > + xor esi,ecx > + ror ebx,7 > + add ebp,eax > + add edx,DWORD[8+rsp] > + xor esi,ebx > + mov edi,ebp > + rol ebp,5 > + add edx,esi > + xor edi,ebx > + ror eax,7 > + add edx,ebp > + add ecx,DWORD[12+rsp] > + xor edi,eax > + mov esi,edx > + rol edx,5 > + add ecx,edi > + xor esi,eax > + ror ebp,7 > + add ecx,edx > + cmp r9,r10 > + je NEAR $L$done_ssse3 > + movdqa xmm6,XMMWORD[64+r14] > + movdqa xmm9,XMMWORD[((-64))+r14] > + movdqu xmm0,XMMWORD[r9] > + movdqu xmm1,XMMWORD[16+r9] > + movdqu xmm2,XMMWORD[32+r9] > + movdqu xmm3,XMMWORD[48+r9] > +DB 102,15,56,0,198 > + add r9,64 > + add ebx,DWORD[16+rsp] > + xor esi,ebp > + mov edi,ecx > +DB 102,15,56,0,206 > + rol ecx,5 > + add ebx,esi > + xor edi,ebp > + ror edx,7 > + paddd xmm0,xmm9 > + add ebx,ecx > + add eax,DWORD[20+rsp] > + xor edi,edx > + mov esi,ebx > + movdqa XMMWORD[rsp],xmm0 > + rol ebx,5 > + add eax,edi > + xor esi,edx > + ror ecx,7 > + psubd xmm0,xmm9 > + add eax,ebx > + add ebp,DWORD[24+rsp] > + xor esi,ecx > + mov edi,eax > + rol eax,5 > + add ebp,esi > + xor edi,ecx > + ror ebx,7 > + add ebp,eax > + add edx,DWORD[28+rsp] > + xor edi,ebx > + mov esi,ebp > + rol ebp,5 > + add edx,edi > + xor esi,ebx > + ror eax,7 > + add edx,ebp > + add ecx,DWORD[32+rsp] > + xor esi,eax > + mov edi,edx > +DB 102,15,56,0,214 > + rol edx,5 > + add ecx,esi > + xor edi,eax > + ror ebp,7 > + paddd xmm1,xmm9 > + add ecx,edx > + add ebx,DWORD[36+rsp] > + xor edi,ebp > + mov esi,ecx > + movdqa XMMWORD[16+rsp],xmm1 > + rol ecx,5 > + add ebx,edi > + xor esi,ebp > + ror edx,7 > + psubd xmm1,xmm9 > + add ebx,ecx > + add eax,DWORD[40+rsp] > + xor esi,edx > + mov edi,ebx > + rol ebx,5 > + add eax,esi > + xor edi,edx > + ror ecx,7 > + add eax,ebx > + add ebp,DWORD[44+rsp] > + xor edi,ecx > + mov esi,eax > + rol eax,5 > + add ebp,edi > + xor esi,ecx > + ror ebx,7 > + add ebp,eax > + add edx,DWORD[48+rsp] > + xor esi,ebx > + mov edi,ebp > +DB 102,15,56,0,222 > + rol ebp,5 > + add edx,esi > + xor edi,ebx > + ror eax,7 > + paddd xmm2,xmm9 > + add edx,ebp > + add ecx,DWORD[52+rsp] > + xor edi,eax > + mov esi,edx > + movdqa XMMWORD[32+rsp],xmm2 > + rol edx,5 > + add ecx,edi > + xor esi,eax > + ror ebp,7 > + psubd xmm2,xmm9 > + add ecx,edx > + add ebx,DWORD[56+rsp] > + xor esi,ebp > + mov edi,ecx > + rol ecx,5 > + add ebx,esi > + xor edi,ebp > + ror edx,7 > + add ebx,ecx > + add eax,DWORD[60+rsp] > + xor edi,edx > + mov esi,ebx > + rol ebx,5 > + add eax,edi > + ror ecx,7 > + add eax,ebx > + add eax,DWORD[r8] > + add esi,DWORD[4+r8] > + add ecx,DWORD[8+r8] > + add edx,DWORD[12+r8] > + mov DWORD[r8],eax > + add ebp,DWORD[16+r8] > + mov DWORD[4+r8],esi > + mov ebx,esi > + mov DWORD[8+r8],ecx > + mov edi,ecx > + mov DWORD[12+r8],edx > + xor edi,edx > + mov DWORD[16+r8],ebp > + and esi,edi > + jmp NEAR $L$oop_ssse3 > + > +ALIGN 16 > +$L$done_ssse3: > + add ebx,DWORD[16+rsp] > + xor esi,ebp > + mov edi,ecx > + rol ecx,5 > + add ebx,esi > + xor edi,ebp > + ror edx,7 > + add ebx,ecx > + add eax,DWORD[20+rsp] > + xor edi,edx > + mov esi,ebx > + rol ebx,5 > + add eax,edi > + xor esi,edx > + ror ecx,7 > + add eax,ebx > + add ebp,DWORD[24+rsp] > + xor esi,ecx > + mov edi,eax > + rol eax,5 > + add ebp,esi > + xor edi,ecx > + ror ebx,7 > + add ebp,eax > + add edx,DWORD[28+rsp] > + xor edi,ebx > + mov esi,ebp > + rol 
ebp,5 > + add edx,edi > + xor esi,ebx > + ror eax,7 > + add edx,ebp > + add ecx,DWORD[32+rsp] > + xor esi,eax > + mov edi,edx > + rol edx,5 > + add ecx,esi > + xor edi,eax > + ror ebp,7 > + add ecx,edx > + add ebx,DWORD[36+rsp] > + xor edi,ebp > + mov esi,ecx > + rol ecx,5 > + add ebx,edi > + xor esi,ebp > + ror edx,7 > + add ebx,ecx > + add eax,DWORD[40+rsp] > + xor esi,edx > + mov edi,ebx > + rol ebx,5 > + add eax,esi > + xor edi,edx > + ror ecx,7 > + add eax,ebx > + add ebp,DWORD[44+rsp] > + xor edi,ecx > + mov esi,eax > + rol eax,5 > + add ebp,edi > + xor esi,ecx > + ror ebx,7 > + add ebp,eax > + add edx,DWORD[48+rsp] > + xor esi,ebx > + mov edi,ebp > + rol ebp,5 > + add edx,esi > + xor edi,ebx > + ror eax,7 > + add edx,ebp > + add ecx,DWORD[52+rsp] > + xor edi,eax > + mov esi,edx > + rol edx,5 > + add ecx,edi > + xor esi,eax > + ror ebp,7 > + add ecx,edx > + add ebx,DWORD[56+rsp] > + xor esi,ebp > + mov edi,ecx > + rol ecx,5 > + add ebx,esi > + xor edi,ebp > + ror edx,7 > + add ebx,ecx > + add eax,DWORD[60+rsp] > + xor edi,edx > + mov esi,ebx > + rol ebx,5 > + add eax,edi > + ror ecx,7 > + add eax,ebx > + add eax,DWORD[r8] > + add esi,DWORD[4+r8] > + add ecx,DWORD[8+r8] > + mov DWORD[r8],eax > + add edx,DWORD[12+r8] > + mov DWORD[4+r8],esi > + add ebp,DWORD[16+r8] > + mov DWORD[8+r8],ecx > + mov DWORD[12+r8],edx > + mov DWORD[16+r8],ebp > + movaps xmm6,XMMWORD[((-40-96))+r11] > + movaps xmm7,XMMWORD[((-40-80))+r11] > + movaps xmm8,XMMWORD[((-40-64))+r11] > + movaps xmm9,XMMWORD[((-40-48))+r11] > + movaps xmm10,XMMWORD[((-40-32))+r11] > + movaps xmm11,XMMWORD[((-40-16))+r11] > + mov r14,QWORD[((-40))+r11] > + > + mov r13,QWORD[((-32))+r11] > + > + mov r12,QWORD[((-24))+r11] > + > + mov rbp,QWORD[((-16))+r11] > + > + mov rbx,QWORD[((-8))+r11] > + > + lea rsp,[r11] > + > +$L$epilogue_ssse3: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_sha1_block_data_order_ssse3: > +ALIGN 64 > +K_XX_XX: > + DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999 > + DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999 > + DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > + DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > + DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > + DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > + DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > + DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > +DB 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0 > +DB 83,72,65,49,32,98,108,111,99,107,32,116,114,97,110,115 > +DB 102,111,114,109,32,102,111,114,32,120,56,54,95,54,52,44 > +DB 32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60 > +DB 97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114 > +DB 103,62,0 > +ALIGN 64 > +EXTERN __imp_RtlVirtualUnwind > + > +ALIGN 16 > +se_handler: > + push rsi > + push rdi > + push rbx > + push rbp > + push r12 > + push r13 > + push r14 > + push r15 > + pushfq > + sub rsp,64 > + > + mov rax,QWORD[120+r8] > + mov rbx,QWORD[248+r8] > + > + lea r10,[$L$prologue] > + cmp rbx,r10 > + jb NEAR $L$common_seh_tail > + > + mov rax,QWORD[152+r8] > + > + lea r10,[$L$epilogue] > + cmp rbx,r10 > + jae NEAR $L$common_seh_tail > + > + mov rax,QWORD[64+rax] > + > + mov rbx,QWORD[((-8))+rax] > + mov rbp,QWORD[((-16))+rax] > + mov r12,QWORD[((-24))+rax] > + mov r13,QWORD[((-32))+rax] > + mov r14,QWORD[((-40))+rax] > + mov QWORD[144+r8],rbx > + mov QWORD[160+r8],rbp > + mov QWORD[216+r8],r12 > + mov 
QWORD[224+r8],r13 > + mov QWORD[232+r8],r14 > + > + jmp NEAR $L$common_seh_tail > + > + > +ALIGN 16 > +shaext_handler: > + push rsi > + push rdi > + push rbx > + push rbp > + push r12 > + push r13 > + push r14 > + push r15 > + pushfq > + sub rsp,64 > + > + mov rax,QWORD[120+r8] > + mov rbx,QWORD[248+r8] > + > + lea r10,[$L$prologue_shaext] > + cmp rbx,r10 > + jb NEAR $L$common_seh_tail > + > + lea r10,[$L$epilogue_shaext] > + cmp rbx,r10 > + jae NEAR $L$common_seh_tail > + > + lea rsi,[((-8-64))+rax] > + lea rdi,[512+r8] > + mov ecx,8 > + DD 0xa548f3fc > + > + jmp NEAR $L$common_seh_tail > + > + > +ALIGN 16 > +ssse3_handler: > + push rsi > + push rdi > + push rbx > + push rbp > + push r12 > + push r13 > + push r14 > + push r15 > + pushfq > + sub rsp,64 > + > + mov rax,QWORD[120+r8] > + mov rbx,QWORD[248+r8] > + > + mov rsi,QWORD[8+r9] > + mov r11,QWORD[56+r9] > + > + mov r10d,DWORD[r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jb NEAR $L$common_seh_tail > + > + mov rax,QWORD[208+r8] > + > + mov r10d,DWORD[4+r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jae NEAR $L$common_seh_tail > + > + lea rsi,[((-40-96))+rax] > + lea rdi,[512+r8] > + mov ecx,12 > + DD 0xa548f3fc > + > + mov rbx,QWORD[((-8))+rax] > + mov rbp,QWORD[((-16))+rax] > + mov r12,QWORD[((-24))+rax] > + mov r13,QWORD[((-32))+rax] > + mov r14,QWORD[((-40))+rax] > + mov QWORD[144+r8],rbx > + mov QWORD[160+r8],rbp > + mov QWORD[216+r8],r12 > + mov QWORD[224+r8],r13 > + mov QWORD[232+r8],r14 > + > +$L$common_seh_tail: > + mov rdi,QWORD[8+rax] > + mov rsi,QWORD[16+rax] > + mov QWORD[152+r8],rax > + mov QWORD[168+r8],rsi > + mov QWORD[176+r8],rdi > + > + mov rdi,QWORD[40+r9] > + mov rsi,r8 > + mov ecx,154 > + DD 0xa548f3fc > + > + mov rsi,r9 > + xor rcx,rcx > + mov rdx,QWORD[8+rsi] > + mov r8,QWORD[rsi] > + mov r9,QWORD[16+rsi] > + mov r10,QWORD[40+rsi] > + lea r11,[56+rsi] > + lea r12,[24+rsi] > + mov QWORD[32+rsp],r10 > + mov QWORD[40+rsp],r11 > + mov QWORD[48+rsp],r12 > + mov QWORD[56+rsp],rcx > + call QWORD[__imp_RtlVirtualUnwind] > + > + mov eax,1 > + add rsp,64 > + popfq > + pop r15 > + pop r14 > + pop r13 > + pop r12 > + pop rbp > + pop rbx > + pop rdi > + pop rsi > + DB 0F3h,0C3h ;repret > + > + > +section .pdata rdata align=4 > +ALIGN 4 > + DD $L$SEH_begin_sha1_block_data_order wrt ..imagebase > + DD $L$SEH_end_sha1_block_data_order wrt ..imagebase > + DD $L$SEH_info_sha1_block_data_order wrt ..imagebase > + DD $L$SEH_begin_sha1_block_data_order_shaext wrt ..imagebase > + DD $L$SEH_end_sha1_block_data_order_shaext wrt ..imagebase > + DD $L$SEH_info_sha1_block_data_order_shaext wrt ..imagebase > + DD $L$SEH_begin_sha1_block_data_order_ssse3 wrt ..imagebase > + DD $L$SEH_end_sha1_block_data_order_ssse3 wrt ..imagebase > + DD $L$SEH_info_sha1_block_data_order_ssse3 wrt ..imagebase > +section .xdata rdata align=8 > +ALIGN 8 > +$L$SEH_info_sha1_block_data_order: > +DB 9,0,0,0 > + DD se_handler wrt ..imagebase > +$L$SEH_info_sha1_block_data_order_shaext: > +DB 9,0,0,0 > + DD shaext_handler wrt ..imagebase > +$L$SEH_info_sha1_block_data_order_ssse3: > +DB 9,0,0,0 > + DD ssse3_handler wrt ..imagebase > + DD $L$prologue_ssse3 wrt ..imagebase,$L$epilogue_ssse3 > wrt ..imagebase > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb- > x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb- > x86_64.nasm > new file mode 100644 > index 0000000000..7cd5eae85c > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm > @@ -0,0 +1,3461 @@ > +; WARNING: do not edit! 
> +; Generated from openssl/crypto/sha/asm/sha256-mb-x86_64.pl > +; > +; Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. > +; > +; Licensed under the OpenSSL license (the "License"). You may not use > +; this file except in compliance with the License. You can obtain a copy > +; in the file LICENSE in the source distribution or at > +; https://www.openssl.org/source/license.html > + > +default rel > +%define XMMWORD > +%define YMMWORD > +%define ZMMWORD > +section .text code align=64 > + > + > +EXTERN OPENSSL_ia32cap_P > + > +global sha256_multi_block > + > +ALIGN 32 > +sha256_multi_block: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_sha256_multi_block: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + > + > + > + mov rcx,QWORD[((OPENSSL_ia32cap_P+4))] > + bt rcx,61 > + jc NEAR _shaext_shortcut > + mov rax,rsp > + > + push rbx > + > + push rbp > + > + lea rsp,[((-168))+rsp] > + movaps XMMWORD[rsp],xmm6 > + movaps XMMWORD[16+rsp],xmm7 > + movaps XMMWORD[32+rsp],xmm8 > + movaps XMMWORD[48+rsp],xmm9 > + movaps XMMWORD[(-120)+rax],xmm10 > + movaps XMMWORD[(-104)+rax],xmm11 > + movaps XMMWORD[(-88)+rax],xmm12 > + movaps XMMWORD[(-72)+rax],xmm13 > + movaps XMMWORD[(-56)+rax],xmm14 > + movaps XMMWORD[(-40)+rax],xmm15 > + sub rsp,288 > + and rsp,-256 > + mov QWORD[272+rsp],rax > + > +$L$body: > + lea rbp,[((K256+128))] > + lea rbx,[256+rsp] > + lea rdi,[128+rdi] > + > +$L$oop_grande: > + mov DWORD[280+rsp],edx > + xor edx,edx > + mov r8,QWORD[rsi] > + mov ecx,DWORD[8+rsi] > + cmp ecx,edx > + cmovg edx,ecx > + test ecx,ecx > + mov DWORD[rbx],ecx > + cmovle r8,rbp > + mov r9,QWORD[16+rsi] > + mov ecx,DWORD[24+rsi] > + cmp ecx,edx > + cmovg edx,ecx > + test ecx,ecx > + mov DWORD[4+rbx],ecx > + cmovle r9,rbp > + mov r10,QWORD[32+rsi] > + mov ecx,DWORD[40+rsi] > + cmp ecx,edx > + cmovg edx,ecx > + test ecx,ecx > + mov DWORD[8+rbx],ecx > + cmovle r10,rbp > + mov r11,QWORD[48+rsi] > + mov ecx,DWORD[56+rsi] > + cmp ecx,edx > + cmovg edx,ecx > + test ecx,ecx > + mov DWORD[12+rbx],ecx > + cmovle r11,rbp > + test edx,edx > + jz NEAR $L$done > + > + movdqu xmm8,XMMWORD[((0-128))+rdi] > + lea rax,[128+rsp] > + movdqu xmm9,XMMWORD[((32-128))+rdi] > + movdqu xmm10,XMMWORD[((64-128))+rdi] > + movdqu xmm11,XMMWORD[((96-128))+rdi] > + movdqu xmm12,XMMWORD[((128-128))+rdi] > + movdqu xmm13,XMMWORD[((160-128))+rdi] > + movdqu xmm14,XMMWORD[((192-128))+rdi] > + movdqu xmm15,XMMWORD[((224-128))+rdi] > + movdqu xmm6,XMMWORD[$L$pbswap] > + jmp NEAR $L$oop > + > +ALIGN 32 > +$L$oop: > + movdqa xmm4,xmm10 > + pxor xmm4,xmm9 > + movd xmm5,DWORD[r8] > + movd xmm0,DWORD[r9] > + movd xmm1,DWORD[r10] > + movd xmm2,DWORD[r11] > + punpckldq xmm5,xmm1 > + punpckldq xmm0,xmm2 > + punpckldq xmm5,xmm0 > + movdqa xmm7,xmm12 > +DB 102,15,56,0,238 > + movdqa xmm2,xmm12 > + > + psrld xmm7,6 > + movdqa xmm1,xmm12 > + pslld xmm2,7 > + movdqa XMMWORD[(0-128)+rax],xmm5 > + paddd xmm5,xmm15 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[((-128))+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm12 > + > + pxor xmm7,xmm2 > + movdqa xmm3,xmm12 > + pslld xmm2,26-21 > + pandn xmm0,xmm14 > + pand xmm3,xmm13 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm8 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm8 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + movdqa xmm3,xmm9 > + movdqa xmm7,xmm8 > + pslld xmm2,10 > + pxor xmm3,xmm8 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + 
pand xmm4,xmm3 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm15,xmm9 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm15,xmm4 > + paddd xmm11,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm15,xmm5 > + paddd xmm15,xmm7 > + movd xmm5,DWORD[4+r8] > + movd xmm0,DWORD[4+r9] > + movd xmm1,DWORD[4+r10] > + movd xmm2,DWORD[4+r11] > + punpckldq xmm5,xmm1 > + punpckldq xmm0,xmm2 > + punpckldq xmm5,xmm0 > + movdqa xmm7,xmm11 > + > + movdqa xmm2,xmm11 > +DB 102,15,56,0,238 > + psrld xmm7,6 > + movdqa xmm1,xmm11 > + pslld xmm2,7 > + movdqa XMMWORD[(16-128)+rax],xmm5 > + paddd xmm5,xmm14 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[((-96))+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm11 > + > + pxor xmm7,xmm2 > + movdqa xmm4,xmm11 > + pslld xmm2,26-21 > + pandn xmm0,xmm13 > + pand xmm4,xmm12 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm15 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm15 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm4 > + movdqa xmm4,xmm8 > + movdqa xmm7,xmm15 > + pslld xmm2,10 > + pxor xmm4,xmm15 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm3,xmm4 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm14,xmm8 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm14,xmm3 > + paddd xmm10,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm14,xmm5 > + paddd xmm14,xmm7 > + movd xmm5,DWORD[8+r8] > + movd xmm0,DWORD[8+r9] > + movd xmm1,DWORD[8+r10] > + movd xmm2,DWORD[8+r11] > + punpckldq xmm5,xmm1 > + punpckldq xmm0,xmm2 > + punpckldq xmm5,xmm0 > + movdqa xmm7,xmm10 > +DB 102,15,56,0,238 > + movdqa xmm2,xmm10 > + > + psrld xmm7,6 > + movdqa xmm1,xmm10 > + pslld xmm2,7 > + movdqa XMMWORD[(32-128)+rax],xmm5 > + paddd xmm5,xmm13 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[((-64))+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm10 > + > + pxor xmm7,xmm2 > + movdqa xmm3,xmm10 > + pslld xmm2,26-21 > + pandn xmm0,xmm12 > + pand xmm3,xmm11 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm14 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm14 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + movdqa xmm3,xmm15 > + movdqa xmm7,xmm14 > + pslld xmm2,10 > + pxor xmm3,xmm14 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm4,xmm3 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm13,xmm15 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm13,xmm4 > + paddd xmm9,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm13,xmm5 > + paddd xmm13,xmm7 > + movd xmm5,DWORD[12+r8] > + movd xmm0,DWORD[12+r9] > + movd xmm1,DWORD[12+r10] > + movd xmm2,DWORD[12+r11] > + punpckldq xmm5,xmm1 > + punpckldq xmm0,xmm2 > + punpckldq xmm5,xmm0 > + movdqa xmm7,xmm9 > + > + movdqa xmm2,xmm9 > +DB 102,15,56,0,238 > + psrld xmm7,6 > + movdqa xmm1,xmm9 > + pslld xmm2,7 > + movdqa XMMWORD[(48-128)+rax],xmm5 > + paddd xmm5,xmm12 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[((-32))+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm9 > + > + pxor xmm7,xmm2 > + movdqa xmm4,xmm9 > + pslld xmm2,26-21 > + pandn xmm0,xmm11 > + pand xmm4,xmm10 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm13 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm13 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm4 > + movdqa xmm4,xmm14 > + movdqa xmm7,xmm13 > + pslld xmm2,10 > + pxor xmm4,xmm13 > + > + > + psrld xmm7,13 > + pxor 
xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm3,xmm4 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm12,xmm14 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm12,xmm3 > + paddd xmm8,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm12,xmm5 > + paddd xmm12,xmm7 > + movd xmm5,DWORD[16+r8] > + movd xmm0,DWORD[16+r9] > + movd xmm1,DWORD[16+r10] > + movd xmm2,DWORD[16+r11] > + punpckldq xmm5,xmm1 > + punpckldq xmm0,xmm2 > + punpckldq xmm5,xmm0 > + movdqa xmm7,xmm8 > +DB 102,15,56,0,238 > + movdqa xmm2,xmm8 > + > + psrld xmm7,6 > + movdqa xmm1,xmm8 > + pslld xmm2,7 > + movdqa XMMWORD[(64-128)+rax],xmm5 > + paddd xmm5,xmm11 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm8 > + > + pxor xmm7,xmm2 > + movdqa xmm3,xmm8 > + pslld xmm2,26-21 > + pandn xmm0,xmm10 > + pand xmm3,xmm9 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm12 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm12 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + movdqa xmm3,xmm13 > + movdqa xmm7,xmm12 > + pslld xmm2,10 > + pxor xmm3,xmm12 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm4,xmm3 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm11,xmm13 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm11,xmm4 > + paddd xmm15,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm11,xmm5 > + paddd xmm11,xmm7 > + movd xmm5,DWORD[20+r8] > + movd xmm0,DWORD[20+r9] > + movd xmm1,DWORD[20+r10] > + movd xmm2,DWORD[20+r11] > + punpckldq xmm5,xmm1 > + punpckldq xmm0,xmm2 > + punpckldq xmm5,xmm0 > + movdqa xmm7,xmm15 > + > + movdqa xmm2,xmm15 > +DB 102,15,56,0,238 > + psrld xmm7,6 > + movdqa xmm1,xmm15 > + pslld xmm2,7 > + movdqa XMMWORD[(80-128)+rax],xmm5 > + paddd xmm5,xmm10 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[32+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm15 > + > + pxor xmm7,xmm2 > + movdqa xmm4,xmm15 > + pslld xmm2,26-21 > + pandn xmm0,xmm9 > + pand xmm4,xmm8 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm11 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm11 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm4 > + movdqa xmm4,xmm12 > + movdqa xmm7,xmm11 > + pslld xmm2,10 > + pxor xmm4,xmm11 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm3,xmm4 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm10,xmm12 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm10,xmm3 > + paddd xmm14,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm10,xmm5 > + paddd xmm10,xmm7 > + movd xmm5,DWORD[24+r8] > + movd xmm0,DWORD[24+r9] > + movd xmm1,DWORD[24+r10] > + movd xmm2,DWORD[24+r11] > + punpckldq xmm5,xmm1 > + punpckldq xmm0,xmm2 > + punpckldq xmm5,xmm0 > + movdqa xmm7,xmm14 > +DB 102,15,56,0,238 > + movdqa xmm2,xmm14 > + > + psrld xmm7,6 > + movdqa xmm1,xmm14 > + pslld xmm2,7 > + movdqa XMMWORD[(96-128)+rax],xmm5 > + paddd xmm5,xmm9 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[64+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm14 > + > + pxor xmm7,xmm2 > + movdqa xmm3,xmm14 > + pslld xmm2,26-21 > + pandn xmm0,xmm8 > + pand xmm3,xmm15 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm10 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm10 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + movdqa xmm3,xmm11 > + movdqa xmm7,xmm10 > + pslld xmm2,10 > + pxor xmm3,xmm10 
> + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm4,xmm3 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm9,xmm11 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm9,xmm4 > + paddd xmm13,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm9,xmm5 > + paddd xmm9,xmm7 > + movd xmm5,DWORD[28+r8] > + movd xmm0,DWORD[28+r9] > + movd xmm1,DWORD[28+r10] > + movd xmm2,DWORD[28+r11] > + punpckldq xmm5,xmm1 > + punpckldq xmm0,xmm2 > + punpckldq xmm5,xmm0 > + movdqa xmm7,xmm13 > + > + movdqa xmm2,xmm13 > +DB 102,15,56,0,238 > + psrld xmm7,6 > + movdqa xmm1,xmm13 > + pslld xmm2,7 > + movdqa XMMWORD[(112-128)+rax],xmm5 > + paddd xmm5,xmm8 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[96+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm13 > + > + pxor xmm7,xmm2 > + movdqa xmm4,xmm13 > + pslld xmm2,26-21 > + pandn xmm0,xmm15 > + pand xmm4,xmm14 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm9 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm9 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm4 > + movdqa xmm4,xmm10 > + movdqa xmm7,xmm9 > + pslld xmm2,10 > + pxor xmm4,xmm9 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm3,xmm4 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm8,xmm10 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm8,xmm3 > + paddd xmm12,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm8,xmm5 > + paddd xmm8,xmm7 > + lea rbp,[256+rbp] > + movd xmm5,DWORD[32+r8] > + movd xmm0,DWORD[32+r9] > + movd xmm1,DWORD[32+r10] > + movd xmm2,DWORD[32+r11] > + punpckldq xmm5,xmm1 > + punpckldq xmm0,xmm2 > + punpckldq xmm5,xmm0 > + movdqa xmm7,xmm12 > +DB 102,15,56,0,238 > + movdqa xmm2,xmm12 > + > + psrld xmm7,6 > + movdqa xmm1,xmm12 > + pslld xmm2,7 > + movdqa XMMWORD[(128-128)+rax],xmm5 > + paddd xmm5,xmm15 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[((-128))+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm12 > + > + pxor xmm7,xmm2 > + movdqa xmm3,xmm12 > + pslld xmm2,26-21 > + pandn xmm0,xmm14 > + pand xmm3,xmm13 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm8 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm8 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + movdqa xmm3,xmm9 > + movdqa xmm7,xmm8 > + pslld xmm2,10 > + pxor xmm3,xmm8 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm4,xmm3 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm15,xmm9 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm15,xmm4 > + paddd xmm11,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm15,xmm5 > + paddd xmm15,xmm7 > + movd xmm5,DWORD[36+r8] > + movd xmm0,DWORD[36+r9] > + movd xmm1,DWORD[36+r10] > + movd xmm2,DWORD[36+r11] > + punpckldq xmm5,xmm1 > + punpckldq xmm0,xmm2 > + punpckldq xmm5,xmm0 > + movdqa xmm7,xmm11 > + > + movdqa xmm2,xmm11 > +DB 102,15,56,0,238 > + psrld xmm7,6 > + movdqa xmm1,xmm11 > + pslld xmm2,7 > + movdqa XMMWORD[(144-128)+rax],xmm5 > + paddd xmm5,xmm14 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[((-96))+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm11 > + > + pxor xmm7,xmm2 > + movdqa xmm4,xmm11 > + pslld xmm2,26-21 > + pandn xmm0,xmm13 > + pand xmm4,xmm12 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm15 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm15 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm4 > + movdqa 
xmm4,xmm8 > + movdqa xmm7,xmm15 > + pslld xmm2,10 > + pxor xmm4,xmm15 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm3,xmm4 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm14,xmm8 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm14,xmm3 > + paddd xmm10,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm14,xmm5 > + paddd xmm14,xmm7 > + movd xmm5,DWORD[40+r8] > + movd xmm0,DWORD[40+r9] > + movd xmm1,DWORD[40+r10] > + movd xmm2,DWORD[40+r11] > + punpckldq xmm5,xmm1 > + punpckldq xmm0,xmm2 > + punpckldq xmm5,xmm0 > + movdqa xmm7,xmm10 > +DB 102,15,56,0,238 > + movdqa xmm2,xmm10 > + > + psrld xmm7,6 > + movdqa xmm1,xmm10 > + pslld xmm2,7 > + movdqa XMMWORD[(160-128)+rax],xmm5 > + paddd xmm5,xmm13 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[((-64))+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm10 > + > + pxor xmm7,xmm2 > + movdqa xmm3,xmm10 > + pslld xmm2,26-21 > + pandn xmm0,xmm12 > + pand xmm3,xmm11 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm14 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm14 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + movdqa xmm3,xmm15 > + movdqa xmm7,xmm14 > + pslld xmm2,10 > + pxor xmm3,xmm14 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm4,xmm3 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm13,xmm15 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm13,xmm4 > + paddd xmm9,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm13,xmm5 > + paddd xmm13,xmm7 > + movd xmm5,DWORD[44+r8] > + movd xmm0,DWORD[44+r9] > + movd xmm1,DWORD[44+r10] > + movd xmm2,DWORD[44+r11] > + punpckldq xmm5,xmm1 > + punpckldq xmm0,xmm2 > + punpckldq xmm5,xmm0 > + movdqa xmm7,xmm9 > + > + movdqa xmm2,xmm9 > +DB 102,15,56,0,238 > + psrld xmm7,6 > + movdqa xmm1,xmm9 > + pslld xmm2,7 > + movdqa XMMWORD[(176-128)+rax],xmm5 > + paddd xmm5,xmm12 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[((-32))+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm9 > + > + pxor xmm7,xmm2 > + movdqa xmm4,xmm9 > + pslld xmm2,26-21 > + pandn xmm0,xmm11 > + pand xmm4,xmm10 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm13 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm13 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm4 > + movdqa xmm4,xmm14 > + movdqa xmm7,xmm13 > + pslld xmm2,10 > + pxor xmm4,xmm13 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm3,xmm4 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm12,xmm14 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm12,xmm3 > + paddd xmm8,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm12,xmm5 > + paddd xmm12,xmm7 > + movd xmm5,DWORD[48+r8] > + movd xmm0,DWORD[48+r9] > + movd xmm1,DWORD[48+r10] > + movd xmm2,DWORD[48+r11] > + punpckldq xmm5,xmm1 > + punpckldq xmm0,xmm2 > + punpckldq xmm5,xmm0 > + movdqa xmm7,xmm8 > +DB 102,15,56,0,238 > + movdqa xmm2,xmm8 > + > + psrld xmm7,6 > + movdqa xmm1,xmm8 > + pslld xmm2,7 > + movdqa XMMWORD[(192-128)+rax],xmm5 > + paddd xmm5,xmm11 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm8 > + > + pxor xmm7,xmm2 > + movdqa xmm3,xmm8 > + pslld xmm2,26-21 > + pandn xmm0,xmm10 > + pand xmm3,xmm9 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm12 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm12 > + psrld xmm1,2 > + 
paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + movdqa xmm3,xmm13 > + movdqa xmm7,xmm12 > + pslld xmm2,10 > + pxor xmm3,xmm12 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm4,xmm3 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm11,xmm13 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm11,xmm4 > + paddd xmm15,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm11,xmm5 > + paddd xmm11,xmm7 > + movd xmm5,DWORD[52+r8] > + movd xmm0,DWORD[52+r9] > + movd xmm1,DWORD[52+r10] > + movd xmm2,DWORD[52+r11] > + punpckldq xmm5,xmm1 > + punpckldq xmm0,xmm2 > + punpckldq xmm5,xmm0 > + movdqa xmm7,xmm15 > + > + movdqa xmm2,xmm15 > +DB 102,15,56,0,238 > + psrld xmm7,6 > + movdqa xmm1,xmm15 > + pslld xmm2,7 > + movdqa XMMWORD[(208-128)+rax],xmm5 > + paddd xmm5,xmm10 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[32+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm15 > + > + pxor xmm7,xmm2 > + movdqa xmm4,xmm15 > + pslld xmm2,26-21 > + pandn xmm0,xmm9 > + pand xmm4,xmm8 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm11 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm11 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm4 > + movdqa xmm4,xmm12 > + movdqa xmm7,xmm11 > + pslld xmm2,10 > + pxor xmm4,xmm11 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm3,xmm4 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm10,xmm12 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm10,xmm3 > + paddd xmm14,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm10,xmm5 > + paddd xmm10,xmm7 > + movd xmm5,DWORD[56+r8] > + movd xmm0,DWORD[56+r9] > + movd xmm1,DWORD[56+r10] > + movd xmm2,DWORD[56+r11] > + punpckldq xmm5,xmm1 > + punpckldq xmm0,xmm2 > + punpckldq xmm5,xmm0 > + movdqa xmm7,xmm14 > +DB 102,15,56,0,238 > + movdqa xmm2,xmm14 > + > + psrld xmm7,6 > + movdqa xmm1,xmm14 > + pslld xmm2,7 > + movdqa XMMWORD[(224-128)+rax],xmm5 > + paddd xmm5,xmm9 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[64+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm14 > + > + pxor xmm7,xmm2 > + movdqa xmm3,xmm14 > + pslld xmm2,26-21 > + pandn xmm0,xmm8 > + pand xmm3,xmm15 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm10 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm10 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + movdqa xmm3,xmm11 > + movdqa xmm7,xmm10 > + pslld xmm2,10 > + pxor xmm3,xmm10 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm4,xmm3 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm9,xmm11 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm9,xmm4 > + paddd xmm13,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm9,xmm5 > + paddd xmm9,xmm7 > + movd xmm5,DWORD[60+r8] > + lea r8,[64+r8] > + movd xmm0,DWORD[60+r9] > + lea r9,[64+r9] > + movd xmm1,DWORD[60+r10] > + lea r10,[64+r10] > + movd xmm2,DWORD[60+r11] > + lea r11,[64+r11] > + punpckldq xmm5,xmm1 > + punpckldq xmm0,xmm2 > + punpckldq xmm5,xmm0 > + movdqa xmm7,xmm13 > + > + movdqa xmm2,xmm13 > +DB 102,15,56,0,238 > + psrld xmm7,6 > + movdqa xmm1,xmm13 > + pslld xmm2,7 > + movdqa XMMWORD[(240-128)+rax],xmm5 > + paddd xmm5,xmm8 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[96+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm13 > + prefetcht0 [63+r8] > + pxor xmm7,xmm2 > + movdqa xmm4,xmm13 > + pslld xmm2,26-21 > + pandn 
xmm0,xmm15 > + pand xmm4,xmm14 > + pxor xmm7,xmm1 > + > + prefetcht0 [63+r9] > + movdqa xmm1,xmm9 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm9 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm4 > + movdqa xmm4,xmm10 > + movdqa xmm7,xmm9 > + pslld xmm2,10 > + pxor xmm4,xmm9 > + > + prefetcht0 [63+r10] > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm3,xmm4 > + pxor xmm1,xmm7 > + > + prefetcht0 [63+r11] > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm8,xmm10 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm8,xmm3 > + paddd xmm12,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm8,xmm5 > + paddd xmm8,xmm7 > + lea rbp,[256+rbp] > + movdqu xmm5,XMMWORD[((0-128))+rax] > + mov ecx,3 > + jmp NEAR $L$oop_16_xx > +ALIGN 32 > +$L$oop_16_xx: > + movdqa xmm6,XMMWORD[((16-128))+rax] > + paddd xmm5,XMMWORD[((144-128))+rax] > + > + movdqa xmm7,xmm6 > + movdqa xmm1,xmm6 > + psrld xmm7,3 > + movdqa xmm2,xmm6 > + > + psrld xmm1,7 > + movdqa xmm0,XMMWORD[((224-128))+rax] > + pslld xmm2,14 > + pxor xmm7,xmm1 > + psrld xmm1,18-7 > + movdqa xmm3,xmm0 > + pxor xmm7,xmm2 > + pslld xmm2,25-14 > + pxor xmm7,xmm1 > + psrld xmm0,10 > + movdqa xmm1,xmm3 > + > + psrld xmm3,17 > + pxor xmm7,xmm2 > + pslld xmm1,13 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + psrld xmm3,19-17 > + pxor xmm0,xmm1 > + pslld xmm1,15-13 > + pxor xmm0,xmm3 > + pxor xmm0,xmm1 > + paddd xmm5,xmm0 > + movdqa xmm7,xmm12 > + > + movdqa xmm2,xmm12 > + > + psrld xmm7,6 > + movdqa xmm1,xmm12 > + pslld xmm2,7 > + movdqa XMMWORD[(0-128)+rax],xmm5 > + paddd xmm5,xmm15 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[((-128))+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm12 > + > + pxor xmm7,xmm2 > + movdqa xmm3,xmm12 > + pslld xmm2,26-21 > + pandn xmm0,xmm14 > + pand xmm3,xmm13 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm8 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm8 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + movdqa xmm3,xmm9 > + movdqa xmm7,xmm8 > + pslld xmm2,10 > + pxor xmm3,xmm8 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm4,xmm3 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm15,xmm9 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm15,xmm4 > + paddd xmm11,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm15,xmm5 > + paddd xmm15,xmm7 > + movdqa xmm5,XMMWORD[((32-128))+rax] > + paddd xmm6,XMMWORD[((160-128))+rax] > + > + movdqa xmm7,xmm5 > + movdqa xmm1,xmm5 > + psrld xmm7,3 > + movdqa xmm2,xmm5 > + > + psrld xmm1,7 > + movdqa xmm0,XMMWORD[((240-128))+rax] > + pslld xmm2,14 > + pxor xmm7,xmm1 > + psrld xmm1,18-7 > + movdqa xmm4,xmm0 > + pxor xmm7,xmm2 > + pslld xmm2,25-14 > + pxor xmm7,xmm1 > + psrld xmm0,10 > + movdqa xmm1,xmm4 > + > + psrld xmm4,17 > + pxor xmm7,xmm2 > + pslld xmm1,13 > + paddd xmm6,xmm7 > + pxor xmm0,xmm4 > + psrld xmm4,19-17 > + pxor xmm0,xmm1 > + pslld xmm1,15-13 > + pxor xmm0,xmm4 > + pxor xmm0,xmm1 > + paddd xmm6,xmm0 > + movdqa xmm7,xmm11 > + > + movdqa xmm2,xmm11 > + > + psrld xmm7,6 > + movdqa xmm1,xmm11 > + pslld xmm2,7 > + movdqa XMMWORD[(16-128)+rax],xmm6 > + paddd xmm6,xmm14 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm6,XMMWORD[((-96))+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm11 > + > + pxor xmm7,xmm2 > + movdqa xmm4,xmm11 > + pslld xmm2,26-21 > + pandn xmm0,xmm13 > + pand xmm4,xmm12 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm15 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm15 > + 
psrld xmm1,2 > + paddd xmm6,xmm7 > + pxor xmm0,xmm4 > + movdqa xmm4,xmm8 > + movdqa xmm7,xmm15 > + pslld xmm2,10 > + pxor xmm4,xmm15 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm6,xmm0 > + pslld xmm2,19-10 > + pand xmm3,xmm4 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm14,xmm8 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm14,xmm3 > + paddd xmm10,xmm6 > + pxor xmm7,xmm2 > + > + paddd xmm14,xmm6 > + paddd xmm14,xmm7 > + movdqa xmm6,XMMWORD[((48-128))+rax] > + paddd xmm5,XMMWORD[((176-128))+rax] > + > + movdqa xmm7,xmm6 > + movdqa xmm1,xmm6 > + psrld xmm7,3 > + movdqa xmm2,xmm6 > + > + psrld xmm1,7 > + movdqa xmm0,XMMWORD[((0-128))+rax] > + pslld xmm2,14 > + pxor xmm7,xmm1 > + psrld xmm1,18-7 > + movdqa xmm3,xmm0 > + pxor xmm7,xmm2 > + pslld xmm2,25-14 > + pxor xmm7,xmm1 > + psrld xmm0,10 > + movdqa xmm1,xmm3 > + > + psrld xmm3,17 > + pxor xmm7,xmm2 > + pslld xmm1,13 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + psrld xmm3,19-17 > + pxor xmm0,xmm1 > + pslld xmm1,15-13 > + pxor xmm0,xmm3 > + pxor xmm0,xmm1 > + paddd xmm5,xmm0 > + movdqa xmm7,xmm10 > + > + movdqa xmm2,xmm10 > + > + psrld xmm7,6 > + movdqa xmm1,xmm10 > + pslld xmm2,7 > + movdqa XMMWORD[(32-128)+rax],xmm5 > + paddd xmm5,xmm13 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[((-64))+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm10 > + > + pxor xmm7,xmm2 > + movdqa xmm3,xmm10 > + pslld xmm2,26-21 > + pandn xmm0,xmm12 > + pand xmm3,xmm11 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm14 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm14 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + movdqa xmm3,xmm15 > + movdqa xmm7,xmm14 > + pslld xmm2,10 > + pxor xmm3,xmm14 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm4,xmm3 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm13,xmm15 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm13,xmm4 > + paddd xmm9,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm13,xmm5 > + paddd xmm13,xmm7 > + movdqa xmm5,XMMWORD[((64-128))+rax] > + paddd xmm6,XMMWORD[((192-128))+rax] > + > + movdqa xmm7,xmm5 > + movdqa xmm1,xmm5 > + psrld xmm7,3 > + movdqa xmm2,xmm5 > + > + psrld xmm1,7 > + movdqa xmm0,XMMWORD[((16-128))+rax] > + pslld xmm2,14 > + pxor xmm7,xmm1 > + psrld xmm1,18-7 > + movdqa xmm4,xmm0 > + pxor xmm7,xmm2 > + pslld xmm2,25-14 > + pxor xmm7,xmm1 > + psrld xmm0,10 > + movdqa xmm1,xmm4 > + > + psrld xmm4,17 > + pxor xmm7,xmm2 > + pslld xmm1,13 > + paddd xmm6,xmm7 > + pxor xmm0,xmm4 > + psrld xmm4,19-17 > + pxor xmm0,xmm1 > + pslld xmm1,15-13 > + pxor xmm0,xmm4 > + pxor xmm0,xmm1 > + paddd xmm6,xmm0 > + movdqa xmm7,xmm9 > + > + movdqa xmm2,xmm9 > + > + psrld xmm7,6 > + movdqa xmm1,xmm9 > + pslld xmm2,7 > + movdqa XMMWORD[(48-128)+rax],xmm6 > + paddd xmm6,xmm12 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm6,XMMWORD[((-32))+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm9 > + > + pxor xmm7,xmm2 > + movdqa xmm4,xmm9 > + pslld xmm2,26-21 > + pandn xmm0,xmm11 > + pand xmm4,xmm10 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm13 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm13 > + psrld xmm1,2 > + paddd xmm6,xmm7 > + pxor xmm0,xmm4 > + movdqa xmm4,xmm14 > + movdqa xmm7,xmm13 > + pslld xmm2,10 > + pxor xmm4,xmm13 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm6,xmm0 > + pslld xmm2,19-10 > + pand xmm3,xmm4 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + 
movdqa xmm12,xmm14 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm12,xmm3 > + paddd xmm8,xmm6 > + pxor xmm7,xmm2 > + > + paddd xmm12,xmm6 > + paddd xmm12,xmm7 > + movdqa xmm6,XMMWORD[((80-128))+rax] > + paddd xmm5,XMMWORD[((208-128))+rax] > + > + movdqa xmm7,xmm6 > + movdqa xmm1,xmm6 > + psrld xmm7,3 > + movdqa xmm2,xmm6 > + > + psrld xmm1,7 > + movdqa xmm0,XMMWORD[((32-128))+rax] > + pslld xmm2,14 > + pxor xmm7,xmm1 > + psrld xmm1,18-7 > + movdqa xmm3,xmm0 > + pxor xmm7,xmm2 > + pslld xmm2,25-14 > + pxor xmm7,xmm1 > + psrld xmm0,10 > + movdqa xmm1,xmm3 > + > + psrld xmm3,17 > + pxor xmm7,xmm2 > + pslld xmm1,13 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + psrld xmm3,19-17 > + pxor xmm0,xmm1 > + pslld xmm1,15-13 > + pxor xmm0,xmm3 > + pxor xmm0,xmm1 > + paddd xmm5,xmm0 > + movdqa xmm7,xmm8 > + > + movdqa xmm2,xmm8 > + > + psrld xmm7,6 > + movdqa xmm1,xmm8 > + pslld xmm2,7 > + movdqa XMMWORD[(64-128)+rax],xmm5 > + paddd xmm5,xmm11 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm8 > + > + pxor xmm7,xmm2 > + movdqa xmm3,xmm8 > + pslld xmm2,26-21 > + pandn xmm0,xmm10 > + pand xmm3,xmm9 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm12 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm12 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + movdqa xmm3,xmm13 > + movdqa xmm7,xmm12 > + pslld xmm2,10 > + pxor xmm3,xmm12 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm4,xmm3 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm11,xmm13 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm11,xmm4 > + paddd xmm15,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm11,xmm5 > + paddd xmm11,xmm7 > + movdqa xmm5,XMMWORD[((96-128))+rax] > + paddd xmm6,XMMWORD[((224-128))+rax] > + > + movdqa xmm7,xmm5 > + movdqa xmm1,xmm5 > + psrld xmm7,3 > + movdqa xmm2,xmm5 > + > + psrld xmm1,7 > + movdqa xmm0,XMMWORD[((48-128))+rax] > + pslld xmm2,14 > + pxor xmm7,xmm1 > + psrld xmm1,18-7 > + movdqa xmm4,xmm0 > + pxor xmm7,xmm2 > + pslld xmm2,25-14 > + pxor xmm7,xmm1 > + psrld xmm0,10 > + movdqa xmm1,xmm4 > + > + psrld xmm4,17 > + pxor xmm7,xmm2 > + pslld xmm1,13 > + paddd xmm6,xmm7 > + pxor xmm0,xmm4 > + psrld xmm4,19-17 > + pxor xmm0,xmm1 > + pslld xmm1,15-13 > + pxor xmm0,xmm4 > + pxor xmm0,xmm1 > + paddd xmm6,xmm0 > + movdqa xmm7,xmm15 > + > + movdqa xmm2,xmm15 > + > + psrld xmm7,6 > + movdqa xmm1,xmm15 > + pslld xmm2,7 > + movdqa XMMWORD[(80-128)+rax],xmm6 > + paddd xmm6,xmm10 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm6,XMMWORD[32+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm15 > + > + pxor xmm7,xmm2 > + movdqa xmm4,xmm15 > + pslld xmm2,26-21 > + pandn xmm0,xmm9 > + pand xmm4,xmm8 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm11 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm11 > + psrld xmm1,2 > + paddd xmm6,xmm7 > + pxor xmm0,xmm4 > + movdqa xmm4,xmm12 > + movdqa xmm7,xmm11 > + pslld xmm2,10 > + pxor xmm4,xmm11 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm6,xmm0 > + pslld xmm2,19-10 > + pand xmm3,xmm4 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm10,xmm12 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm10,xmm3 > + paddd xmm14,xmm6 > + pxor xmm7,xmm2 > + > + paddd xmm10,xmm6 > + paddd xmm10,xmm7 > + movdqa xmm6,XMMWORD[((112-128))+rax] > + paddd xmm5,XMMWORD[((240-128))+rax] > + > + movdqa xmm7,xmm6 > + movdqa xmm1,xmm6 > + psrld xmm7,3 > + movdqa 
xmm2,xmm6 > + > + psrld xmm1,7 > + movdqa xmm0,XMMWORD[((64-128))+rax] > + pslld xmm2,14 > + pxor xmm7,xmm1 > + psrld xmm1,18-7 > + movdqa xmm3,xmm0 > + pxor xmm7,xmm2 > + pslld xmm2,25-14 > + pxor xmm7,xmm1 > + psrld xmm0,10 > + movdqa xmm1,xmm3 > + > + psrld xmm3,17 > + pxor xmm7,xmm2 > + pslld xmm1,13 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + psrld xmm3,19-17 > + pxor xmm0,xmm1 > + pslld xmm1,15-13 > + pxor xmm0,xmm3 > + pxor xmm0,xmm1 > + paddd xmm5,xmm0 > + movdqa xmm7,xmm14 > + > + movdqa xmm2,xmm14 > + > + psrld xmm7,6 > + movdqa xmm1,xmm14 > + pslld xmm2,7 > + movdqa XMMWORD[(96-128)+rax],xmm5 > + paddd xmm5,xmm9 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[64+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm14 > + > + pxor xmm7,xmm2 > + movdqa xmm3,xmm14 > + pslld xmm2,26-21 > + pandn xmm0,xmm8 > + pand xmm3,xmm15 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm10 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm10 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + movdqa xmm3,xmm11 > + movdqa xmm7,xmm10 > + pslld xmm2,10 > + pxor xmm3,xmm10 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm4,xmm3 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm9,xmm11 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm9,xmm4 > + paddd xmm13,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm9,xmm5 > + paddd xmm9,xmm7 > + movdqa xmm5,XMMWORD[((128-128))+rax] > + paddd xmm6,XMMWORD[((0-128))+rax] > + > + movdqa xmm7,xmm5 > + movdqa xmm1,xmm5 > + psrld xmm7,3 > + movdqa xmm2,xmm5 > + > + psrld xmm1,7 > + movdqa xmm0,XMMWORD[((80-128))+rax] > + pslld xmm2,14 > + pxor xmm7,xmm1 > + psrld xmm1,18-7 > + movdqa xmm4,xmm0 > + pxor xmm7,xmm2 > + pslld xmm2,25-14 > + pxor xmm7,xmm1 > + psrld xmm0,10 > + movdqa xmm1,xmm4 > + > + psrld xmm4,17 > + pxor xmm7,xmm2 > + pslld xmm1,13 > + paddd xmm6,xmm7 > + pxor xmm0,xmm4 > + psrld xmm4,19-17 > + pxor xmm0,xmm1 > + pslld xmm1,15-13 > + pxor xmm0,xmm4 > + pxor xmm0,xmm1 > + paddd xmm6,xmm0 > + movdqa xmm7,xmm13 > + > + movdqa xmm2,xmm13 > + > + psrld xmm7,6 > + movdqa xmm1,xmm13 > + pslld xmm2,7 > + movdqa XMMWORD[(112-128)+rax],xmm6 > + paddd xmm6,xmm8 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm6,XMMWORD[96+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm13 > + > + pxor xmm7,xmm2 > + movdqa xmm4,xmm13 > + pslld xmm2,26-21 > + pandn xmm0,xmm15 > + pand xmm4,xmm14 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm9 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm9 > + psrld xmm1,2 > + paddd xmm6,xmm7 > + pxor xmm0,xmm4 > + movdqa xmm4,xmm10 > + movdqa xmm7,xmm9 > + pslld xmm2,10 > + pxor xmm4,xmm9 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm6,xmm0 > + pslld xmm2,19-10 > + pand xmm3,xmm4 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm8,xmm10 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm8,xmm3 > + paddd xmm12,xmm6 > + pxor xmm7,xmm2 > + > + paddd xmm8,xmm6 > + paddd xmm8,xmm7 > + lea rbp,[256+rbp] > + movdqa xmm6,XMMWORD[((144-128))+rax] > + paddd xmm5,XMMWORD[((16-128))+rax] > + > + movdqa xmm7,xmm6 > + movdqa xmm1,xmm6 > + psrld xmm7,3 > + movdqa xmm2,xmm6 > + > + psrld xmm1,7 > + movdqa xmm0,XMMWORD[((96-128))+rax] > + pslld xmm2,14 > + pxor xmm7,xmm1 > + psrld xmm1,18-7 > + movdqa xmm3,xmm0 > + pxor xmm7,xmm2 > + pslld xmm2,25-14 > + pxor xmm7,xmm1 > + psrld xmm0,10 > + movdqa xmm1,xmm3 > + > + psrld xmm3,17 > + pxor xmm7,xmm2 > + pslld xmm1,13 > + 
paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + psrld xmm3,19-17 > + pxor xmm0,xmm1 > + pslld xmm1,15-13 > + pxor xmm0,xmm3 > + pxor xmm0,xmm1 > + paddd xmm5,xmm0 > + movdqa xmm7,xmm12 > + > + movdqa xmm2,xmm12 > + > + psrld xmm7,6 > + movdqa xmm1,xmm12 > + pslld xmm2,7 > + movdqa XMMWORD[(128-128)+rax],xmm5 > + paddd xmm5,xmm15 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[((-128))+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm12 > + > + pxor xmm7,xmm2 > + movdqa xmm3,xmm12 > + pslld xmm2,26-21 > + pandn xmm0,xmm14 > + pand xmm3,xmm13 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm8 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm8 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + movdqa xmm3,xmm9 > + movdqa xmm7,xmm8 > + pslld xmm2,10 > + pxor xmm3,xmm8 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm4,xmm3 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm15,xmm9 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm15,xmm4 > + paddd xmm11,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm15,xmm5 > + paddd xmm15,xmm7 > + movdqa xmm5,XMMWORD[((160-128))+rax] > + paddd xmm6,XMMWORD[((32-128))+rax] > + > + movdqa xmm7,xmm5 > + movdqa xmm1,xmm5 > + psrld xmm7,3 > + movdqa xmm2,xmm5 > + > + psrld xmm1,7 > + movdqa xmm0,XMMWORD[((112-128))+rax] > + pslld xmm2,14 > + pxor xmm7,xmm1 > + psrld xmm1,18-7 > + movdqa xmm4,xmm0 > + pxor xmm7,xmm2 > + pslld xmm2,25-14 > + pxor xmm7,xmm1 > + psrld xmm0,10 > + movdqa xmm1,xmm4 > + > + psrld xmm4,17 > + pxor xmm7,xmm2 > + pslld xmm1,13 > + paddd xmm6,xmm7 > + pxor xmm0,xmm4 > + psrld xmm4,19-17 > + pxor xmm0,xmm1 > + pslld xmm1,15-13 > + pxor xmm0,xmm4 > + pxor xmm0,xmm1 > + paddd xmm6,xmm0 > + movdqa xmm7,xmm11 > + > + movdqa xmm2,xmm11 > + > + psrld xmm7,6 > + movdqa xmm1,xmm11 > + pslld xmm2,7 > + movdqa XMMWORD[(144-128)+rax],xmm6 > + paddd xmm6,xmm14 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm6,XMMWORD[((-96))+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm11 > + > + pxor xmm7,xmm2 > + movdqa xmm4,xmm11 > + pslld xmm2,26-21 > + pandn xmm0,xmm13 > + pand xmm4,xmm12 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm15 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm15 > + psrld xmm1,2 > + paddd xmm6,xmm7 > + pxor xmm0,xmm4 > + movdqa xmm4,xmm8 > + movdqa xmm7,xmm15 > + pslld xmm2,10 > + pxor xmm4,xmm15 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm6,xmm0 > + pslld xmm2,19-10 > + pand xmm3,xmm4 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm14,xmm8 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm14,xmm3 > + paddd xmm10,xmm6 > + pxor xmm7,xmm2 > + > + paddd xmm14,xmm6 > + paddd xmm14,xmm7 > + movdqa xmm6,XMMWORD[((176-128))+rax] > + paddd xmm5,XMMWORD[((48-128))+rax] > + > + movdqa xmm7,xmm6 > + movdqa xmm1,xmm6 > + psrld xmm7,3 > + movdqa xmm2,xmm6 > + > + psrld xmm1,7 > + movdqa xmm0,XMMWORD[((128-128))+rax] > + pslld xmm2,14 > + pxor xmm7,xmm1 > + psrld xmm1,18-7 > + movdqa xmm3,xmm0 > + pxor xmm7,xmm2 > + pslld xmm2,25-14 > + pxor xmm7,xmm1 > + psrld xmm0,10 > + movdqa xmm1,xmm3 > + > + psrld xmm3,17 > + pxor xmm7,xmm2 > + pslld xmm1,13 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + psrld xmm3,19-17 > + pxor xmm0,xmm1 > + pslld xmm1,15-13 > + pxor xmm0,xmm3 > + pxor xmm0,xmm1 > + paddd xmm5,xmm0 > + movdqa xmm7,xmm10 > + > + movdqa xmm2,xmm10 > + > + psrld xmm7,6 > + movdqa xmm1,xmm10 > + pslld xmm2,7 > + movdqa XMMWORD[(160-128)+rax],xmm5 > + 
paddd xmm5,xmm13 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[((-64))+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm10 > + > + pxor xmm7,xmm2 > + movdqa xmm3,xmm10 > + pslld xmm2,26-21 > + pandn xmm0,xmm12 > + pand xmm3,xmm11 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm14 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm14 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + movdqa xmm3,xmm15 > + movdqa xmm7,xmm14 > + pslld xmm2,10 > + pxor xmm3,xmm14 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm4,xmm3 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm13,xmm15 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm13,xmm4 > + paddd xmm9,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm13,xmm5 > + paddd xmm13,xmm7 > + movdqa xmm5,XMMWORD[((192-128))+rax] > + paddd xmm6,XMMWORD[((64-128))+rax] > + > + movdqa xmm7,xmm5 > + movdqa xmm1,xmm5 > + psrld xmm7,3 > + movdqa xmm2,xmm5 > + > + psrld xmm1,7 > + movdqa xmm0,XMMWORD[((144-128))+rax] > + pslld xmm2,14 > + pxor xmm7,xmm1 > + psrld xmm1,18-7 > + movdqa xmm4,xmm0 > + pxor xmm7,xmm2 > + pslld xmm2,25-14 > + pxor xmm7,xmm1 > + psrld xmm0,10 > + movdqa xmm1,xmm4 > + > + psrld xmm4,17 > + pxor xmm7,xmm2 > + pslld xmm1,13 > + paddd xmm6,xmm7 > + pxor xmm0,xmm4 > + psrld xmm4,19-17 > + pxor xmm0,xmm1 > + pslld xmm1,15-13 > + pxor xmm0,xmm4 > + pxor xmm0,xmm1 > + paddd xmm6,xmm0 > + movdqa xmm7,xmm9 > + > + movdqa xmm2,xmm9 > + > + psrld xmm7,6 > + movdqa xmm1,xmm9 > + pslld xmm2,7 > + movdqa XMMWORD[(176-128)+rax],xmm6 > + paddd xmm6,xmm12 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm6,XMMWORD[((-32))+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm9 > + > + pxor xmm7,xmm2 > + movdqa xmm4,xmm9 > + pslld xmm2,26-21 > + pandn xmm0,xmm11 > + pand xmm4,xmm10 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm13 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm13 > + psrld xmm1,2 > + paddd xmm6,xmm7 > + pxor xmm0,xmm4 > + movdqa xmm4,xmm14 > + movdqa xmm7,xmm13 > + pslld xmm2,10 > + pxor xmm4,xmm13 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm6,xmm0 > + pslld xmm2,19-10 > + pand xmm3,xmm4 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm12,xmm14 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm12,xmm3 > + paddd xmm8,xmm6 > + pxor xmm7,xmm2 > + > + paddd xmm12,xmm6 > + paddd xmm12,xmm7 > + movdqa xmm6,XMMWORD[((208-128))+rax] > + paddd xmm5,XMMWORD[((80-128))+rax] > + > + movdqa xmm7,xmm6 > + movdqa xmm1,xmm6 > + psrld xmm7,3 > + movdqa xmm2,xmm6 > + > + psrld xmm1,7 > + movdqa xmm0,XMMWORD[((160-128))+rax] > + pslld xmm2,14 > + pxor xmm7,xmm1 > + psrld xmm1,18-7 > + movdqa xmm3,xmm0 > + pxor xmm7,xmm2 > + pslld xmm2,25-14 > + pxor xmm7,xmm1 > + psrld xmm0,10 > + movdqa xmm1,xmm3 > + > + psrld xmm3,17 > + pxor xmm7,xmm2 > + pslld xmm1,13 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + psrld xmm3,19-17 > + pxor xmm0,xmm1 > + pslld xmm1,15-13 > + pxor xmm0,xmm3 > + pxor xmm0,xmm1 > + paddd xmm5,xmm0 > + movdqa xmm7,xmm8 > + > + movdqa xmm2,xmm8 > + > + psrld xmm7,6 > + movdqa xmm1,xmm8 > + pslld xmm2,7 > + movdqa XMMWORD[(192-128)+rax],xmm5 > + paddd xmm5,xmm11 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm8 > + > + pxor xmm7,xmm2 > + movdqa xmm3,xmm8 > + pslld xmm2,26-21 > + pandn xmm0,xmm10 > + pand xmm3,xmm9 > + pxor xmm7,xmm1 > + > + > + 
movdqa xmm1,xmm12 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm12 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + movdqa xmm3,xmm13 > + movdqa xmm7,xmm12 > + pslld xmm2,10 > + pxor xmm3,xmm12 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm4,xmm3 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm11,xmm13 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm11,xmm4 > + paddd xmm15,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm11,xmm5 > + paddd xmm11,xmm7 > + movdqa xmm5,XMMWORD[((224-128))+rax] > + paddd xmm6,XMMWORD[((96-128))+rax] > + > + movdqa xmm7,xmm5 > + movdqa xmm1,xmm5 > + psrld xmm7,3 > + movdqa xmm2,xmm5 > + > + psrld xmm1,7 > + movdqa xmm0,XMMWORD[((176-128))+rax] > + pslld xmm2,14 > + pxor xmm7,xmm1 > + psrld xmm1,18-7 > + movdqa xmm4,xmm0 > + pxor xmm7,xmm2 > + pslld xmm2,25-14 > + pxor xmm7,xmm1 > + psrld xmm0,10 > + movdqa xmm1,xmm4 > + > + psrld xmm4,17 > + pxor xmm7,xmm2 > + pslld xmm1,13 > + paddd xmm6,xmm7 > + pxor xmm0,xmm4 > + psrld xmm4,19-17 > + pxor xmm0,xmm1 > + pslld xmm1,15-13 > + pxor xmm0,xmm4 > + pxor xmm0,xmm1 > + paddd xmm6,xmm0 > + movdqa xmm7,xmm15 > + > + movdqa xmm2,xmm15 > + > + psrld xmm7,6 > + movdqa xmm1,xmm15 > + pslld xmm2,7 > + movdqa XMMWORD[(208-128)+rax],xmm6 > + paddd xmm6,xmm10 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm6,XMMWORD[32+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm15 > + > + pxor xmm7,xmm2 > + movdqa xmm4,xmm15 > + pslld xmm2,26-21 > + pandn xmm0,xmm9 > + pand xmm4,xmm8 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm11 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm11 > + psrld xmm1,2 > + paddd xmm6,xmm7 > + pxor xmm0,xmm4 > + movdqa xmm4,xmm12 > + movdqa xmm7,xmm11 > + pslld xmm2,10 > + pxor xmm4,xmm11 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm6,xmm0 > + pslld xmm2,19-10 > + pand xmm3,xmm4 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm10,xmm12 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm10,xmm3 > + paddd xmm14,xmm6 > + pxor xmm7,xmm2 > + > + paddd xmm10,xmm6 > + paddd xmm10,xmm7 > + movdqa xmm6,XMMWORD[((240-128))+rax] > + paddd xmm5,XMMWORD[((112-128))+rax] > + > + movdqa xmm7,xmm6 > + movdqa xmm1,xmm6 > + psrld xmm7,3 > + movdqa xmm2,xmm6 > + > + psrld xmm1,7 > + movdqa xmm0,XMMWORD[((192-128))+rax] > + pslld xmm2,14 > + pxor xmm7,xmm1 > + psrld xmm1,18-7 > + movdqa xmm3,xmm0 > + pxor xmm7,xmm2 > + pslld xmm2,25-14 > + pxor xmm7,xmm1 > + psrld xmm0,10 > + movdqa xmm1,xmm3 > + > + psrld xmm3,17 > + pxor xmm7,xmm2 > + pslld xmm1,13 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + psrld xmm3,19-17 > + pxor xmm0,xmm1 > + pslld xmm1,15-13 > + pxor xmm0,xmm3 > + pxor xmm0,xmm1 > + paddd xmm5,xmm0 > + movdqa xmm7,xmm14 > + > + movdqa xmm2,xmm14 > + > + psrld xmm7,6 > + movdqa xmm1,xmm14 > + pslld xmm2,7 > + movdqa XMMWORD[(224-128)+rax],xmm5 > + paddd xmm5,xmm9 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm5,XMMWORD[64+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm14 > + > + pxor xmm7,xmm2 > + movdqa xmm3,xmm14 > + pslld xmm2,26-21 > + pandn xmm0,xmm8 > + pand xmm3,xmm15 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm10 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm10 > + psrld xmm1,2 > + paddd xmm5,xmm7 > + pxor xmm0,xmm3 > + movdqa xmm3,xmm11 > + movdqa xmm7,xmm10 > + pslld xmm2,10 > + pxor xmm3,xmm10 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm5,xmm0 > + pslld xmm2,19-10 > + pand xmm4,xmm3 > + pxor 
xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm9,xmm11 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm9,xmm4 > + paddd xmm13,xmm5 > + pxor xmm7,xmm2 > + > + paddd xmm9,xmm5 > + paddd xmm9,xmm7 > + movdqa xmm5,XMMWORD[((0-128))+rax] > + paddd xmm6,XMMWORD[((128-128))+rax] > + > + movdqa xmm7,xmm5 > + movdqa xmm1,xmm5 > + psrld xmm7,3 > + movdqa xmm2,xmm5 > + > + psrld xmm1,7 > + movdqa xmm0,XMMWORD[((208-128))+rax] > + pslld xmm2,14 > + pxor xmm7,xmm1 > + psrld xmm1,18-7 > + movdqa xmm4,xmm0 > + pxor xmm7,xmm2 > + pslld xmm2,25-14 > + pxor xmm7,xmm1 > + psrld xmm0,10 > + movdqa xmm1,xmm4 > + > + psrld xmm4,17 > + pxor xmm7,xmm2 > + pslld xmm1,13 > + paddd xmm6,xmm7 > + pxor xmm0,xmm4 > + psrld xmm4,19-17 > + pxor xmm0,xmm1 > + pslld xmm1,15-13 > + pxor xmm0,xmm4 > + pxor xmm0,xmm1 > + paddd xmm6,xmm0 > + movdqa xmm7,xmm13 > + > + movdqa xmm2,xmm13 > + > + psrld xmm7,6 > + movdqa xmm1,xmm13 > + pslld xmm2,7 > + movdqa XMMWORD[(240-128)+rax],xmm6 > + paddd xmm6,xmm8 > + > + psrld xmm1,11 > + pxor xmm7,xmm2 > + pslld xmm2,21-7 > + paddd xmm6,XMMWORD[96+rbp] > + pxor xmm7,xmm1 > + > + psrld xmm1,25-11 > + movdqa xmm0,xmm13 > + > + pxor xmm7,xmm2 > + movdqa xmm4,xmm13 > + pslld xmm2,26-21 > + pandn xmm0,xmm15 > + pand xmm4,xmm14 > + pxor xmm7,xmm1 > + > + > + movdqa xmm1,xmm9 > + pxor xmm7,xmm2 > + movdqa xmm2,xmm9 > + psrld xmm1,2 > + paddd xmm6,xmm7 > + pxor xmm0,xmm4 > + movdqa xmm4,xmm10 > + movdqa xmm7,xmm9 > + pslld xmm2,10 > + pxor xmm4,xmm9 > + > + > + psrld xmm7,13 > + pxor xmm1,xmm2 > + paddd xmm6,xmm0 > + pslld xmm2,19-10 > + pand xmm3,xmm4 > + pxor xmm1,xmm7 > + > + > + psrld xmm7,22-13 > + pxor xmm1,xmm2 > + movdqa xmm8,xmm10 > + pslld xmm2,30-19 > + pxor xmm7,xmm1 > + pxor xmm8,xmm3 > + paddd xmm12,xmm6 > + pxor xmm7,xmm2 > + > + paddd xmm8,xmm6 > + paddd xmm8,xmm7 > + lea rbp,[256+rbp] > + dec ecx > + jnz NEAR $L$oop_16_xx > + > + mov ecx,1 > + lea rbp,[((K256+128))] > + > + movdqa xmm7,XMMWORD[rbx] > + cmp ecx,DWORD[rbx] > + pxor xmm0,xmm0 > + cmovge r8,rbp > + cmp ecx,DWORD[4+rbx] > + movdqa xmm6,xmm7 > + cmovge r9,rbp > + cmp ecx,DWORD[8+rbx] > + pcmpgtd xmm6,xmm0 > + cmovge r10,rbp > + cmp ecx,DWORD[12+rbx] > + paddd xmm7,xmm6 > + cmovge r11,rbp > + > + movdqu xmm0,XMMWORD[((0-128))+rdi] > + pand xmm8,xmm6 > + movdqu xmm1,XMMWORD[((32-128))+rdi] > + pand xmm9,xmm6 > + movdqu xmm2,XMMWORD[((64-128))+rdi] > + pand xmm10,xmm6 > + movdqu xmm5,XMMWORD[((96-128))+rdi] > + pand xmm11,xmm6 > + paddd xmm8,xmm0 > + movdqu xmm0,XMMWORD[((128-128))+rdi] > + pand xmm12,xmm6 > + paddd xmm9,xmm1 > + movdqu xmm1,XMMWORD[((160-128))+rdi] > + pand xmm13,xmm6 > + paddd xmm10,xmm2 > + movdqu xmm2,XMMWORD[((192-128))+rdi] > + pand xmm14,xmm6 > + paddd xmm11,xmm5 > + movdqu xmm5,XMMWORD[((224-128))+rdi] > + pand xmm15,xmm6 > + paddd xmm12,xmm0 > + paddd xmm13,xmm1 > + movdqu XMMWORD[(0-128)+rdi],xmm8 > + paddd xmm14,xmm2 > + movdqu XMMWORD[(32-128)+rdi],xmm9 > + paddd xmm15,xmm5 > + movdqu XMMWORD[(64-128)+rdi],xmm10 > + movdqu XMMWORD[(96-128)+rdi],xmm11 > + movdqu XMMWORD[(128-128)+rdi],xmm12 > + movdqu XMMWORD[(160-128)+rdi],xmm13 > + movdqu XMMWORD[(192-128)+rdi],xmm14 > + movdqu XMMWORD[(224-128)+rdi],xmm15 > + > + movdqa XMMWORD[rbx],xmm7 > + movdqa xmm6,XMMWORD[$L$pbswap] > + dec edx > + jnz NEAR $L$oop > + > + mov edx,DWORD[280+rsp] > + lea rdi,[16+rdi] > + lea rsi,[64+rsi] > + dec edx > + jnz NEAR $L$oop_grande > + > +$L$done: > + mov rax,QWORD[272+rsp] > + > + movaps xmm6,XMMWORD[((-184))+rax] > + movaps xmm7,XMMWORD[((-168))+rax] > + movaps 
xmm8,XMMWORD[((-152))+rax] > + movaps xmm9,XMMWORD[((-136))+rax] > + movaps xmm10,XMMWORD[((-120))+rax] > + movaps xmm11,XMMWORD[((-104))+rax] > + movaps xmm12,XMMWORD[((-88))+rax] > + movaps xmm13,XMMWORD[((-72))+rax] > + movaps xmm14,XMMWORD[((-56))+rax] > + movaps xmm15,XMMWORD[((-40))+rax] > + mov rbp,QWORD[((-16))+rax] > + > + mov rbx,QWORD[((-8))+rax] > + > + lea rsp,[rax] > + > +$L$epilogue: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_sha256_multi_block: > + > +ALIGN 32 > +sha256_multi_block_shaext: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_sha256_multi_block_shaext: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + > + > + > +_shaext_shortcut: > + mov rax,rsp > + > + push rbx > + > + push rbp > + > + lea rsp,[((-168))+rsp] > + movaps XMMWORD[rsp],xmm6 > + movaps XMMWORD[16+rsp],xmm7 > + movaps XMMWORD[32+rsp],xmm8 > + movaps XMMWORD[48+rsp],xmm9 > + movaps XMMWORD[(-120)+rax],xmm10 > + movaps XMMWORD[(-104)+rax],xmm11 > + movaps XMMWORD[(-88)+rax],xmm12 > + movaps XMMWORD[(-72)+rax],xmm13 > + movaps XMMWORD[(-56)+rax],xmm14 > + movaps XMMWORD[(-40)+rax],xmm15 > + sub rsp,288 > + shl edx,1 > + and rsp,-256 > + lea rdi,[128+rdi] > + mov QWORD[272+rsp],rax > +$L$body_shaext: > + lea rbx,[256+rsp] > + lea rbp,[((K256_shaext+128))] > + > +$L$oop_grande_shaext: > + mov DWORD[280+rsp],edx > + xor edx,edx > + mov r8,QWORD[rsi] > + mov ecx,DWORD[8+rsi] > + cmp ecx,edx > + cmovg edx,ecx > + test ecx,ecx > + mov DWORD[rbx],ecx > + cmovle r8,rsp > + mov r9,QWORD[16+rsi] > + mov ecx,DWORD[24+rsi] > + cmp ecx,edx > + cmovg edx,ecx > + test ecx,ecx > + mov DWORD[4+rbx],ecx > + cmovle r9,rsp > + test edx,edx > + jz NEAR $L$done_shaext > + > + movq xmm12,QWORD[((0-128))+rdi] > + movq xmm4,QWORD[((32-128))+rdi] > + movq xmm13,QWORD[((64-128))+rdi] > + movq xmm5,QWORD[((96-128))+rdi] > + movq xmm8,QWORD[((128-128))+rdi] > + movq xmm9,QWORD[((160-128))+rdi] > + movq xmm10,QWORD[((192-128))+rdi] > + movq xmm11,QWORD[((224-128))+rdi] > + > + punpckldq xmm12,xmm4 > + punpckldq xmm13,xmm5 > + punpckldq xmm8,xmm9 > + punpckldq xmm10,xmm11 > + movdqa xmm3,XMMWORD[((K256_shaext-16))] > + > + movdqa xmm14,xmm12 > + movdqa xmm15,xmm13 > + punpcklqdq xmm12,xmm8 > + punpcklqdq xmm13,xmm10 > + punpckhqdq xmm14,xmm8 > + punpckhqdq xmm15,xmm10 > + > + pshufd xmm12,xmm12,27 > + pshufd xmm13,xmm13,27 > + pshufd xmm14,xmm14,27 > + pshufd xmm15,xmm15,27 > + jmp NEAR $L$oop_shaext > + > +ALIGN 32 > +$L$oop_shaext: > + movdqu xmm4,XMMWORD[r8] > + movdqu xmm8,XMMWORD[r9] > + movdqu xmm5,XMMWORD[16+r8] > + movdqu xmm9,XMMWORD[16+r9] > + movdqu xmm6,XMMWORD[32+r8] > +DB 102,15,56,0,227 > + movdqu xmm10,XMMWORD[32+r9] > +DB 102,68,15,56,0,195 > + movdqu xmm7,XMMWORD[48+r8] > + lea r8,[64+r8] > + movdqu xmm11,XMMWORD[48+r9] > + lea r9,[64+r9] > + > + movdqa xmm0,XMMWORD[((0-128))+rbp] > +DB 102,15,56,0,235 > + paddd xmm0,xmm4 > + pxor xmm4,xmm12 > + movdqa xmm1,xmm0 > + movdqa xmm2,XMMWORD[((0-128))+rbp] > +DB 102,68,15,56,0,203 > + paddd xmm2,xmm8 > + movdqa XMMWORD[80+rsp],xmm13 > +DB 69,15,56,203,236 > + pxor xmm8,xmm14 > + movdqa xmm0,xmm2 > + movdqa XMMWORD[112+rsp],xmm15 > +DB 69,15,56,203,254 > + pshufd xmm0,xmm1,0x0e > + pxor xmm4,xmm12 > + movdqa XMMWORD[64+rsp],xmm12 > +DB 69,15,56,203,229 > + pshufd xmm0,xmm2,0x0e > + pxor xmm8,xmm14 > + movdqa XMMWORD[96+rsp],xmm14 > + movdqa xmm1,XMMWORD[((16-128))+rbp] > + paddd xmm1,xmm5 > +DB 102,15,56,0,243 > +DB 69,15,56,203,247 > + > + movdqa xmm0,xmm1 
> + movdqa xmm2,XMMWORD[((16-128))+rbp] > + paddd xmm2,xmm9 > +DB 69,15,56,203,236 > + movdqa xmm0,xmm2 > + prefetcht0 [127+r8] > +DB 102,15,56,0,251 > +DB 102,68,15,56,0,211 > + prefetcht0 [127+r9] > +DB 69,15,56,203,254 > + pshufd xmm0,xmm1,0x0e > +DB 102,68,15,56,0,219 > +DB 15,56,204,229 > +DB 69,15,56,203,229 > + pshufd xmm0,xmm2,0x0e > + movdqa xmm1,XMMWORD[((32-128))+rbp] > + paddd xmm1,xmm6 > +DB 69,15,56,203,247 > + > + movdqa xmm0,xmm1 > + movdqa xmm2,XMMWORD[((32-128))+rbp] > + paddd xmm2,xmm10 > +DB 69,15,56,203,236 > +DB 69,15,56,204,193 > + movdqa xmm0,xmm2 > + movdqa xmm3,xmm7 > +DB 69,15,56,203,254 > + pshufd xmm0,xmm1,0x0e > +DB 102,15,58,15,222,4 > + paddd xmm4,xmm3 > + movdqa xmm3,xmm11 > +DB 102,65,15,58,15,218,4 > +DB 15,56,204,238 > +DB 69,15,56,203,229 > + pshufd xmm0,xmm2,0x0e > + movdqa xmm1,XMMWORD[((48-128))+rbp] > + paddd xmm1,xmm7 > +DB 69,15,56,203,247 > +DB 69,15,56,204,202 > + > + movdqa xmm0,xmm1 > + movdqa xmm2,XMMWORD[((48-128))+rbp] > + paddd xmm8,xmm3 > + paddd xmm2,xmm11 > +DB 15,56,205,231 > +DB 69,15,56,203,236 > + movdqa xmm0,xmm2 > + movdqa xmm3,xmm4 > +DB 102,15,58,15,223,4 > +DB 69,15,56,203,254 > +DB 69,15,56,205,195 > + pshufd xmm0,xmm1,0x0e > + paddd xmm5,xmm3 > + movdqa xmm3,xmm8 > +DB 102,65,15,58,15,219,4 > +DB 15,56,204,247 > +DB 69,15,56,203,229 > + pshufd xmm0,xmm2,0x0e > + movdqa xmm1,XMMWORD[((64-128))+rbp] > + paddd xmm1,xmm4 > +DB 69,15,56,203,247 > +DB 69,15,56,204,211 > + movdqa xmm0,xmm1 > + movdqa xmm2,XMMWORD[((64-128))+rbp] > + paddd xmm9,xmm3 > + paddd xmm2,xmm8 > +DB 15,56,205,236 > +DB 69,15,56,203,236 > + movdqa xmm0,xmm2 > + movdqa xmm3,xmm5 > +DB 102,15,58,15,220,4 > +DB 69,15,56,203,254 > +DB 69,15,56,205,200 > + pshufd xmm0,xmm1,0x0e > + paddd xmm6,xmm3 > + movdqa xmm3,xmm9 > +DB 102,65,15,58,15,216,4 > +DB 15,56,204,252 > +DB 69,15,56,203,229 > + pshufd xmm0,xmm2,0x0e > + movdqa xmm1,XMMWORD[((80-128))+rbp] > + paddd xmm1,xmm5 > +DB 69,15,56,203,247 > +DB 69,15,56,204,216 > + movdqa xmm0,xmm1 > + movdqa xmm2,XMMWORD[((80-128))+rbp] > + paddd xmm10,xmm3 > + paddd xmm2,xmm9 > +DB 15,56,205,245 > +DB 69,15,56,203,236 > + movdqa xmm0,xmm2 > + movdqa xmm3,xmm6 > +DB 102,15,58,15,221,4 > +DB 69,15,56,203,254 > +DB 69,15,56,205,209 > + pshufd xmm0,xmm1,0x0e > + paddd xmm7,xmm3 > + movdqa xmm3,xmm10 > +DB 102,65,15,58,15,217,4 > +DB 15,56,204,229 > +DB 69,15,56,203,229 > + pshufd xmm0,xmm2,0x0e > + movdqa xmm1,XMMWORD[((96-128))+rbp] > + paddd xmm1,xmm6 > +DB 69,15,56,203,247 > +DB 69,15,56,204,193 > + movdqa xmm0,xmm1 > + movdqa xmm2,XMMWORD[((96-128))+rbp] > + paddd xmm11,xmm3 > + paddd xmm2,xmm10 > +DB 15,56,205,254 > +DB 69,15,56,203,236 > + movdqa xmm0,xmm2 > + movdqa xmm3,xmm7 > +DB 102,15,58,15,222,4 > +DB 69,15,56,203,254 > +DB 69,15,56,205,218 > + pshufd xmm0,xmm1,0x0e > + paddd xmm4,xmm3 > + movdqa xmm3,xmm11 > +DB 102,65,15,58,15,218,4 > +DB 15,56,204,238 > +DB 69,15,56,203,229 > + pshufd xmm0,xmm2,0x0e > + movdqa xmm1,XMMWORD[((112-128))+rbp] > + paddd xmm1,xmm7 > +DB 69,15,56,203,247 > +DB 69,15,56,204,202 > + movdqa xmm0,xmm1 > + movdqa xmm2,XMMWORD[((112-128))+rbp] > + paddd xmm8,xmm3 > + paddd xmm2,xmm11 > +DB 15,56,205,231 > +DB 69,15,56,203,236 > + movdqa xmm0,xmm2 > + movdqa xmm3,xmm4 > +DB 102,15,58,15,223,4 > +DB 69,15,56,203,254 > +DB 69,15,56,205,195 > + pshufd xmm0,xmm1,0x0e > + paddd xmm5,xmm3 > + movdqa xmm3,xmm8 > +DB 102,65,15,58,15,219,4 > +DB 15,56,204,247 > +DB 69,15,56,203,229 > + pshufd xmm0,xmm2,0x0e > + movdqa xmm1,XMMWORD[((128-128))+rbp] > + paddd xmm1,xmm4 > +DB 69,15,56,203,247 > +DB 
69,15,56,204,211 > + movdqa xmm0,xmm1 > + movdqa xmm2,XMMWORD[((128-128))+rbp] > + paddd xmm9,xmm3 > + paddd xmm2,xmm8 > +DB 15,56,205,236 > +DB 69,15,56,203,236 > + movdqa xmm0,xmm2 > + movdqa xmm3,xmm5 > +DB 102,15,58,15,220,4 > +DB 69,15,56,203,254 > +DB 69,15,56,205,200 > + pshufd xmm0,xmm1,0x0e > + paddd xmm6,xmm3 > + movdqa xmm3,xmm9 > +DB 102,65,15,58,15,216,4 > +DB 15,56,204,252 > +DB 69,15,56,203,229 > + pshufd xmm0,xmm2,0x0e > + movdqa xmm1,XMMWORD[((144-128))+rbp] > + paddd xmm1,xmm5 > +DB 69,15,56,203,247 > +DB 69,15,56,204,216 > + movdqa xmm0,xmm1 > + movdqa xmm2,XMMWORD[((144-128))+rbp] > + paddd xmm10,xmm3 > + paddd xmm2,xmm9 > +DB 15,56,205,245 > +DB 69,15,56,203,236 > + movdqa xmm0,xmm2 > + movdqa xmm3,xmm6 > +DB 102,15,58,15,221,4 > +DB 69,15,56,203,254 > +DB 69,15,56,205,209 > + pshufd xmm0,xmm1,0x0e > + paddd xmm7,xmm3 > + movdqa xmm3,xmm10 > +DB 102,65,15,58,15,217,4 > +DB 15,56,204,229 > +DB 69,15,56,203,229 > + pshufd xmm0,xmm2,0x0e > + movdqa xmm1,XMMWORD[((160-128))+rbp] > + paddd xmm1,xmm6 > +DB 69,15,56,203,247 > +DB 69,15,56,204,193 > + movdqa xmm0,xmm1 > + movdqa xmm2,XMMWORD[((160-128))+rbp] > + paddd xmm11,xmm3 > + paddd xmm2,xmm10 > +DB 15,56,205,254 > +DB 69,15,56,203,236 > + movdqa xmm0,xmm2 > + movdqa xmm3,xmm7 > +DB 102,15,58,15,222,4 > +DB 69,15,56,203,254 > +DB 69,15,56,205,218 > + pshufd xmm0,xmm1,0x0e > + paddd xmm4,xmm3 > + movdqa xmm3,xmm11 > +DB 102,65,15,58,15,218,4 > +DB 15,56,204,238 > +DB 69,15,56,203,229 > + pshufd xmm0,xmm2,0x0e > + movdqa xmm1,XMMWORD[((176-128))+rbp] > + paddd xmm1,xmm7 > +DB 69,15,56,203,247 > +DB 69,15,56,204,202 > + movdqa xmm0,xmm1 > + movdqa xmm2,XMMWORD[((176-128))+rbp] > + paddd xmm8,xmm3 > + paddd xmm2,xmm11 > +DB 15,56,205,231 > +DB 69,15,56,203,236 > + movdqa xmm0,xmm2 > + movdqa xmm3,xmm4 > +DB 102,15,58,15,223,4 > +DB 69,15,56,203,254 > +DB 69,15,56,205,195 > + pshufd xmm0,xmm1,0x0e > + paddd xmm5,xmm3 > + movdqa xmm3,xmm8 > +DB 102,65,15,58,15,219,4 > +DB 15,56,204,247 > +DB 69,15,56,203,229 > + pshufd xmm0,xmm2,0x0e > + movdqa xmm1,XMMWORD[((192-128))+rbp] > + paddd xmm1,xmm4 > +DB 69,15,56,203,247 > +DB 69,15,56,204,211 > + movdqa xmm0,xmm1 > + movdqa xmm2,XMMWORD[((192-128))+rbp] > + paddd xmm9,xmm3 > + paddd xmm2,xmm8 > +DB 15,56,205,236 > +DB 69,15,56,203,236 > + movdqa xmm0,xmm2 > + movdqa xmm3,xmm5 > +DB 102,15,58,15,220,4 > +DB 69,15,56,203,254 > +DB 69,15,56,205,200 > + pshufd xmm0,xmm1,0x0e > + paddd xmm6,xmm3 > + movdqa xmm3,xmm9 > +DB 102,65,15,58,15,216,4 > +DB 15,56,204,252 > +DB 69,15,56,203,229 > + pshufd xmm0,xmm2,0x0e > + movdqa xmm1,XMMWORD[((208-128))+rbp] > + paddd xmm1,xmm5 > +DB 69,15,56,203,247 > +DB 69,15,56,204,216 > + movdqa xmm0,xmm1 > + movdqa xmm2,XMMWORD[((208-128))+rbp] > + paddd xmm10,xmm3 > + paddd xmm2,xmm9 > +DB 15,56,205,245 > +DB 69,15,56,203,236 > + movdqa xmm0,xmm2 > + movdqa xmm3,xmm6 > +DB 102,15,58,15,221,4 > +DB 69,15,56,203,254 > +DB 69,15,56,205,209 > + pshufd xmm0,xmm1,0x0e > + paddd xmm7,xmm3 > + movdqa xmm3,xmm10 > +DB 102,65,15,58,15,217,4 > + nop > +DB 69,15,56,203,229 > + pshufd xmm0,xmm2,0x0e > + movdqa xmm1,XMMWORD[((224-128))+rbp] > + paddd xmm1,xmm6 > +DB 69,15,56,203,247 > + > + movdqa xmm0,xmm1 > + movdqa xmm2,XMMWORD[((224-128))+rbp] > + paddd xmm11,xmm3 > + paddd xmm2,xmm10 > +DB 15,56,205,254 > + nop > +DB 69,15,56,203,236 > + movdqa xmm0,xmm2 > + mov ecx,1 > + pxor xmm6,xmm6 > +DB 69,15,56,203,254 > +DB 69,15,56,205,218 > + pshufd xmm0,xmm1,0x0e > + movdqa xmm1,XMMWORD[((240-128))+rbp] > + paddd xmm1,xmm7 > + movq xmm7,QWORD[rbx] > + nop > +DB 
69,15,56,203,229 > + pshufd xmm0,xmm2,0x0e > + movdqa xmm2,XMMWORD[((240-128))+rbp] > + paddd xmm2,xmm11 > +DB 69,15,56,203,247 > + > + movdqa xmm0,xmm1 > + cmp ecx,DWORD[rbx] > + cmovge r8,rsp > + cmp ecx,DWORD[4+rbx] > + cmovge r9,rsp > + pshufd xmm9,xmm7,0x00 > +DB 69,15,56,203,236 > + movdqa xmm0,xmm2 > + pshufd xmm10,xmm7,0x55 > + movdqa xmm11,xmm7 > +DB 69,15,56,203,254 > + pshufd xmm0,xmm1,0x0e > + pcmpgtd xmm9,xmm6 > + pcmpgtd xmm10,xmm6 > +DB 69,15,56,203,229 > + pshufd xmm0,xmm2,0x0e > + pcmpgtd xmm11,xmm6 > + movdqa xmm3,XMMWORD[((K256_shaext-16))] > +DB 69,15,56,203,247 > + > + pand xmm13,xmm9 > + pand xmm15,xmm10 > + pand xmm12,xmm9 > + pand xmm14,xmm10 > + paddd xmm11,xmm7 > + > + paddd xmm13,XMMWORD[80+rsp] > + paddd xmm15,XMMWORD[112+rsp] > + paddd xmm12,XMMWORD[64+rsp] > + paddd xmm14,XMMWORD[96+rsp] > + > + movq QWORD[rbx],xmm11 > + dec edx > + jnz NEAR $L$oop_shaext > + > + mov edx,DWORD[280+rsp] > + > + pshufd xmm12,xmm12,27 > + pshufd xmm13,xmm13,27 > + pshufd xmm14,xmm14,27 > + pshufd xmm15,xmm15,27 > + > + movdqa xmm5,xmm12 > + movdqa xmm6,xmm13 > + punpckldq xmm12,xmm14 > + punpckhdq xmm5,xmm14 > + punpckldq xmm13,xmm15 > + punpckhdq xmm6,xmm15 > + > + movq QWORD[(0-128)+rdi],xmm12 > + psrldq xmm12,8 > + movq QWORD[(128-128)+rdi],xmm5 > + psrldq xmm5,8 > + movq QWORD[(32-128)+rdi],xmm12 > + movq QWORD[(160-128)+rdi],xmm5 > + > + movq QWORD[(64-128)+rdi],xmm13 > + psrldq xmm13,8 > + movq QWORD[(192-128)+rdi],xmm6 > + psrldq xmm6,8 > + movq QWORD[(96-128)+rdi],xmm13 > + movq QWORD[(224-128)+rdi],xmm6 > + > + lea rdi,[8+rdi] > + lea rsi,[32+rsi] > + dec edx > + jnz NEAR $L$oop_grande_shaext > + > +$L$done_shaext: > + > + movaps xmm6,XMMWORD[((-184))+rax] > + movaps xmm7,XMMWORD[((-168))+rax] > + movaps xmm8,XMMWORD[((-152))+rax] > + movaps xmm9,XMMWORD[((-136))+rax] > + movaps xmm10,XMMWORD[((-120))+rax] > + movaps xmm11,XMMWORD[((-104))+rax] > + movaps xmm12,XMMWORD[((-88))+rax] > + movaps xmm13,XMMWORD[((-72))+rax] > + movaps xmm14,XMMWORD[((-56))+rax] > + movaps xmm15,XMMWORD[((-40))+rax] > + mov rbp,QWORD[((-16))+rax] > + > + mov rbx,QWORD[((-8))+rax] > + > + lea rsp,[rax] > + > +$L$epilogue_shaext: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_sha256_multi_block_shaext: > +ALIGN 256 > +K256: > + DD 1116352408,1116352408,1116352408,1116352408 > + DD 1116352408,1116352408,1116352408,1116352408 > + DD 1899447441,1899447441,1899447441,1899447441 > + DD 1899447441,1899447441,1899447441,1899447441 > + DD 3049323471,3049323471,3049323471,3049323471 > + DD 3049323471,3049323471,3049323471,3049323471 > + DD 3921009573,3921009573,3921009573,3921009573 > + DD 3921009573,3921009573,3921009573,3921009573 > + DD 961987163,961987163,961987163,961987163 > + DD 961987163,961987163,961987163,961987163 > + DD 1508970993,1508970993,1508970993,1508970993 > + DD 1508970993,1508970993,1508970993,1508970993 > + DD 2453635748,2453635748,2453635748,2453635748 > + DD 2453635748,2453635748,2453635748,2453635748 > + DD 2870763221,2870763221,2870763221,2870763221 > + DD 2870763221,2870763221,2870763221,2870763221 > + DD 3624381080,3624381080,3624381080,3624381080 > + DD 3624381080,3624381080,3624381080,3624381080 > + DD 310598401,310598401,310598401,310598401 > + DD 310598401,310598401,310598401,310598401 > + DD 607225278,607225278,607225278,607225278 > + DD 607225278,607225278,607225278,607225278 > + DD 1426881987,1426881987,1426881987,1426881987 > + DD 1426881987,1426881987,1426881987,1426881987 > + DD 
1925078388,1925078388,1925078388,1925078388 > + DD 1925078388,1925078388,1925078388,1925078388 > + DD 2162078206,2162078206,2162078206,2162078206 > + DD 2162078206,2162078206,2162078206,2162078206 > + DD 2614888103,2614888103,2614888103,2614888103 > + DD 2614888103,2614888103,2614888103,2614888103 > + DD 3248222580,3248222580,3248222580,3248222580 > + DD 3248222580,3248222580,3248222580,3248222580 > + DD 3835390401,3835390401,3835390401,3835390401 > + DD 3835390401,3835390401,3835390401,3835390401 > + DD 4022224774,4022224774,4022224774,4022224774 > + DD 4022224774,4022224774,4022224774,4022224774 > + DD 264347078,264347078,264347078,264347078 > + DD 264347078,264347078,264347078,264347078 > + DD 604807628,604807628,604807628,604807628 > + DD 604807628,604807628,604807628,604807628 > + DD 770255983,770255983,770255983,770255983 > + DD 770255983,770255983,770255983,770255983 > + DD 1249150122,1249150122,1249150122,1249150122 > + DD 1249150122,1249150122,1249150122,1249150122 > + DD 1555081692,1555081692,1555081692,1555081692 > + DD 1555081692,1555081692,1555081692,1555081692 > + DD 1996064986,1996064986,1996064986,1996064986 > + DD 1996064986,1996064986,1996064986,1996064986 > + DD 2554220882,2554220882,2554220882,2554220882 > + DD 2554220882,2554220882,2554220882,2554220882 > + DD 2821834349,2821834349,2821834349,2821834349 > + DD 2821834349,2821834349,2821834349,2821834349 > + DD 2952996808,2952996808,2952996808,2952996808 > + DD 2952996808,2952996808,2952996808,2952996808 > + DD 3210313671,3210313671,3210313671,3210313671 > + DD 3210313671,3210313671,3210313671,3210313671 > + DD 3336571891,3336571891,3336571891,3336571891 > + DD 3336571891,3336571891,3336571891,3336571891 > + DD 3584528711,3584528711,3584528711,3584528711 > + DD 3584528711,3584528711,3584528711,3584528711 > + DD 113926993,113926993,113926993,113926993 > + DD 113926993,113926993,113926993,113926993 > + DD 338241895,338241895,338241895,338241895 > + DD 338241895,338241895,338241895,338241895 > + DD 666307205,666307205,666307205,666307205 > + DD 666307205,666307205,666307205,666307205 > + DD 773529912,773529912,773529912,773529912 > + DD 773529912,773529912,773529912,773529912 > + DD 1294757372,1294757372,1294757372,1294757372 > + DD 1294757372,1294757372,1294757372,1294757372 > + DD 1396182291,1396182291,1396182291,1396182291 > + DD 1396182291,1396182291,1396182291,1396182291 > + DD 1695183700,1695183700,1695183700,1695183700 > + DD 1695183700,1695183700,1695183700,1695183700 > + DD 1986661051,1986661051,1986661051,1986661051 > + DD 1986661051,1986661051,1986661051,1986661051 > + DD 2177026350,2177026350,2177026350,2177026350 > + DD 2177026350,2177026350,2177026350,2177026350 > + DD 2456956037,2456956037,2456956037,2456956037 > + DD 2456956037,2456956037,2456956037,2456956037 > + DD 2730485921,2730485921,2730485921,2730485921 > + DD 2730485921,2730485921,2730485921,2730485921 > + DD 2820302411,2820302411,2820302411,2820302411 > + DD 2820302411,2820302411,2820302411,2820302411 > + DD 3259730800,3259730800,3259730800,3259730800 > + DD 3259730800,3259730800,3259730800,3259730800 > + DD 3345764771,3345764771,3345764771,3345764771 > + DD 3345764771,3345764771,3345764771,3345764771 > + DD 3516065817,3516065817,3516065817,3516065817 > + DD 3516065817,3516065817,3516065817,3516065817 > + DD 3600352804,3600352804,3600352804,3600352804 > + DD 3600352804,3600352804,3600352804,3600352804 > + DD 4094571909,4094571909,4094571909,4094571909 > + DD 4094571909,4094571909,4094571909,4094571909 > + DD 275423344,275423344,275423344,275423344 
> + DD 275423344,275423344,275423344,275423344 > + DD 430227734,430227734,430227734,430227734 > + DD 430227734,430227734,430227734,430227734 > + DD 506948616,506948616,506948616,506948616 > + DD 506948616,506948616,506948616,506948616 > + DD 659060556,659060556,659060556,659060556 > + DD 659060556,659060556,659060556,659060556 > + DD 883997877,883997877,883997877,883997877 > + DD 883997877,883997877,883997877,883997877 > + DD 958139571,958139571,958139571,958139571 > + DD 958139571,958139571,958139571,958139571 > + DD 1322822218,1322822218,1322822218,1322822218 > + DD 1322822218,1322822218,1322822218,1322822218 > + DD 1537002063,1537002063,1537002063,1537002063 > + DD 1537002063,1537002063,1537002063,1537002063 > + DD 1747873779,1747873779,1747873779,1747873779 > + DD 1747873779,1747873779,1747873779,1747873779 > + DD 1955562222,1955562222,1955562222,1955562222 > + DD 1955562222,1955562222,1955562222,1955562222 > + DD 2024104815,2024104815,2024104815,2024104815 > + DD 2024104815,2024104815,2024104815,2024104815 > + DD 2227730452,2227730452,2227730452,2227730452 > + DD 2227730452,2227730452,2227730452,2227730452 > + DD 2361852424,2361852424,2361852424,2361852424 > + DD 2361852424,2361852424,2361852424,2361852424 > + DD 2428436474,2428436474,2428436474,2428436474 > + DD 2428436474,2428436474,2428436474,2428436474 > + DD 2756734187,2756734187,2756734187,2756734187 > + DD 2756734187,2756734187,2756734187,2756734187 > + DD 3204031479,3204031479,3204031479,3204031479 > + DD 3204031479,3204031479,3204031479,3204031479 > + DD 3329325298,3329325298,3329325298,3329325298 > + DD 3329325298,3329325298,3329325298,3329325298 > +$L$pbswap: > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > +K256_shaext: > + DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 > + DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 > + DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 > + DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 > + DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc > + DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da > + DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 > + DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 > + DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 > + DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 > + DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 > + DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 > + DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 > + DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 > + DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 > + DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 > +DB 83,72,65,50,53,54,32,109,117,108,116,105,45,98,108,111 > +DB 99,107,32,116,114,97,110,115,102,111,114,109,32,102,111,114 > +DB 32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71 > +DB 65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112 > +DB 101,110,115,115,108,46,111,114,103,62,0 > +EXTERN __imp_RtlVirtualUnwind > + > +ALIGN 16 > +se_handler: > + push rsi > + push rdi > + push rbx > + push rbp > + push r12 > + push r13 > + push r14 > + push r15 > + pushfq > + sub rsp,64 > + > + mov rax,QWORD[120+r8] > + mov rbx,QWORD[248+r8] > + > + mov rsi,QWORD[8+r9] > + mov r11,QWORD[56+r9] > + > + mov r10d,DWORD[r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jb NEAR $L$in_prologue > + > + mov rax,QWORD[152+r8] > + > + mov r10d,DWORD[4+r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jae NEAR $L$in_prologue > + > + mov rax,QWORD[272+rax] > + > + mov rbx,QWORD[((-8))+rax] > + mov rbp,QWORD[((-16))+rax] > + mov QWORD[144+r8],rbx > + mov QWORD[160+r8],rbp > + 
> + lea rsi,[((-24-160))+rax] > + lea rdi,[512+r8] > + mov ecx,20 > + DD 0xa548f3fc > + > +$L$in_prologue: > + mov rdi,QWORD[8+rax] > + mov rsi,QWORD[16+rax] > + mov QWORD[152+r8],rax > + mov QWORD[168+r8],rsi > + mov QWORD[176+r8],rdi > + > + mov rdi,QWORD[40+r9] > + mov rsi,r8 > + mov ecx,154 > + DD 0xa548f3fc > + > + mov rsi,r9 > + xor rcx,rcx > + mov rdx,QWORD[8+rsi] > + mov r8,QWORD[rsi] > + mov r9,QWORD[16+rsi] > + mov r10,QWORD[40+rsi] > + lea r11,[56+rsi] > + lea r12,[24+rsi] > + mov QWORD[32+rsp],r10 > + mov QWORD[40+rsp],r11 > + mov QWORD[48+rsp],r12 > + mov QWORD[56+rsp],rcx > + call QWORD[__imp_RtlVirtualUnwind] > + > + mov eax,1 > + add rsp,64 > + popfq > + pop r15 > + pop r14 > + pop r13 > + pop r12 > + pop rbp > + pop rbx > + pop rdi > + pop rsi > + DB 0F3h,0C3h ;repret > + > +section .pdata rdata align=4 > +ALIGN 4 > + DD $L$SEH_begin_sha256_multi_block wrt ..imagebase > + DD $L$SEH_end_sha256_multi_block wrt ..imagebase > + DD $L$SEH_info_sha256_multi_block wrt ..imagebase > + DD $L$SEH_begin_sha256_multi_block_shaext wrt ..imagebase > + DD $L$SEH_end_sha256_multi_block_shaext wrt ..imagebase > + DD $L$SEH_info_sha256_multi_block_shaext wrt ..imagebase > +section .xdata rdata align=8 > +ALIGN 8 > +$L$SEH_info_sha256_multi_block: > +DB 9,0,0,0 > + DD se_handler wrt ..imagebase > + DD $L$body wrt ..imagebase,$L$epilogue wrt ..imagebase > +$L$SEH_info_sha256_multi_block_shaext: > +DB 9,0,0,0 > + DD se_handler wrt ..imagebase > + DD $L$body_shaext wrt ..imagebase,$L$epilogue_shaext > wrt ..imagebase > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm > b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm > new file mode 100644 > index 0000000000..70e49862a3 > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm > @@ -0,0 +1,3313 @@ > +; WARNING: do not edit! > +; Generated from openssl/crypto/sha/asm/sha512-x86_64.pl > +; > +; Copyright 2005-2020 The OpenSSL Project Authors. All Rights Reserved. > +; > +; Licensed under the OpenSSL license (the "License"). You may not use > +; this file except in compliance with the License. 
You can obtain a copy > +; in the file LICENSE in the source distribution or at > +; https://www.openssl.org/source/license.html > + > +default rel > +%define XMMWORD > +%define YMMWORD > +%define ZMMWORD > +section .text code align=64 > + > + > +EXTERN OPENSSL_ia32cap_P > +global sha256_block_data_order > + > +ALIGN 16 > +sha256_block_data_order: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_sha256_block_data_order: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + > + > + > + lea r11,[OPENSSL_ia32cap_P] > + mov r9d,DWORD[r11] > + mov r10d,DWORD[4+r11] > + mov r11d,DWORD[8+r11] > + test r11d,536870912 > + jnz NEAR _shaext_shortcut > + test r10d,512 > + jnz NEAR $L$ssse3_shortcut > + mov rax,rsp > + > + push rbx > + > + push rbp > + > + push r12 > + > + push r13 > + > + push r14 > + > + push r15 > + > + shl rdx,4 > + sub rsp,16*4+4*8 > + lea rdx,[rdx*4+rsi] > + and rsp,-64 > + mov QWORD[((64+0))+rsp],rdi > + mov QWORD[((64+8))+rsp],rsi > + mov QWORD[((64+16))+rsp],rdx > + mov QWORD[88+rsp],rax > + > +$L$prologue: > + > + mov eax,DWORD[rdi] > + mov ebx,DWORD[4+rdi] > + mov ecx,DWORD[8+rdi] > + mov edx,DWORD[12+rdi] > + mov r8d,DWORD[16+rdi] > + mov r9d,DWORD[20+rdi] > + mov r10d,DWORD[24+rdi] > + mov r11d,DWORD[28+rdi] > + jmp NEAR $L$loop > + > +ALIGN 16 > +$L$loop: > + mov edi,ebx > + lea rbp,[K256] > + xor edi,ecx > + mov r12d,DWORD[rsi] > + mov r13d,r8d > + mov r14d,eax > + bswap r12d > + ror r13d,14 > + mov r15d,r9d > + > + xor r13d,r8d > + ror r14d,9 > + xor r15d,r10d > + > + mov DWORD[rsp],r12d > + xor r14d,eax > + and r15d,r8d > + > + ror r13d,5 > + add r12d,r11d > + xor r15d,r10d > + > + ror r14d,11 > + xor r13d,r8d > + add r12d,r15d > + > + mov r15d,eax > + add r12d,DWORD[rbp] > + xor r14d,eax > + > + xor r15d,ebx > + ror r13d,6 > + mov r11d,ebx > + > + and edi,r15d > + ror r14d,2 > + add r12d,r13d > + > + xor r11d,edi > + add edx,r12d > + add r11d,r12d > + > + lea rbp,[4+rbp] > + add r11d,r14d > + mov r12d,DWORD[4+rsi] > + mov r13d,edx > + mov r14d,r11d > + bswap r12d > + ror r13d,14 > + mov edi,r8d > + > + xor r13d,edx > + ror r14d,9 > + xor edi,r9d > + > + mov DWORD[4+rsp],r12d > + xor r14d,r11d > + and edi,edx > + > + ror r13d,5 > + add r12d,r10d > + xor edi,r9d > + > + ror r14d,11 > + xor r13d,edx > + add r12d,edi > + > + mov edi,r11d > + add r12d,DWORD[rbp] > + xor r14d,r11d > + > + xor edi,eax > + ror r13d,6 > + mov r10d,eax > + > + and r15d,edi > + ror r14d,2 > + add r12d,r13d > + > + xor r10d,r15d > + add ecx,r12d > + add r10d,r12d > + > + lea rbp,[4+rbp] > + add r10d,r14d > + mov r12d,DWORD[8+rsi] > + mov r13d,ecx > + mov r14d,r10d > + bswap r12d > + ror r13d,14 > + mov r15d,edx > + > + xor r13d,ecx > + ror r14d,9 > + xor r15d,r8d > + > + mov DWORD[8+rsp],r12d > + xor r14d,r10d > + and r15d,ecx > + > + ror r13d,5 > + add r12d,r9d > + xor r15d,r8d > + > + ror r14d,11 > + xor r13d,ecx > + add r12d,r15d > + > + mov r15d,r10d > + add r12d,DWORD[rbp] > + xor r14d,r10d > + > + xor r15d,r11d > + ror r13d,6 > + mov r9d,r11d > + > + and edi,r15d > + ror r14d,2 > + add r12d,r13d > + > + xor r9d,edi > + add ebx,r12d > + add r9d,r12d > + > + lea rbp,[4+rbp] > + add r9d,r14d > + mov r12d,DWORD[12+rsi] > + mov r13d,ebx > + mov r14d,r9d > + bswap r12d > + ror r13d,14 > + mov edi,ecx > + > + xor r13d,ebx > + ror r14d,9 > + xor edi,edx > + > + mov DWORD[12+rsp],r12d > + xor r14d,r9d > + and edi,ebx > + > + ror r13d,5 > + add r12d,r8d > + xor edi,edx > + > + ror r14d,11 > + xor r13d,ebx > + add r12d,edi > + > + mov edi,r9d > + add 
r12d,DWORD[rbp] > + xor r14d,r9d > + > + xor edi,r10d > + ror r13d,6 > + mov r8d,r10d > + > + and r15d,edi > + ror r14d,2 > + add r12d,r13d > + > + xor r8d,r15d > + add eax,r12d > + add r8d,r12d > + > + lea rbp,[20+rbp] > + add r8d,r14d > + mov r12d,DWORD[16+rsi] > + mov r13d,eax > + mov r14d,r8d > + bswap r12d > + ror r13d,14 > + mov r15d,ebx > + > + xor r13d,eax > + ror r14d,9 > + xor r15d,ecx > + > + mov DWORD[16+rsp],r12d > + xor r14d,r8d > + and r15d,eax > + > + ror r13d,5 > + add r12d,edx > + xor r15d,ecx > + > + ror r14d,11 > + xor r13d,eax > + add r12d,r15d > + > + mov r15d,r8d > + add r12d,DWORD[rbp] > + xor r14d,r8d > + > + xor r15d,r9d > + ror r13d,6 > + mov edx,r9d > + > + and edi,r15d > + ror r14d,2 > + add r12d,r13d > + > + xor edx,edi > + add r11d,r12d > + add edx,r12d > + > + lea rbp,[4+rbp] > + add edx,r14d > + mov r12d,DWORD[20+rsi] > + mov r13d,r11d > + mov r14d,edx > + bswap r12d > + ror r13d,14 > + mov edi,eax > + > + xor r13d,r11d > + ror r14d,9 > + xor edi,ebx > + > + mov DWORD[20+rsp],r12d > + xor r14d,edx > + and edi,r11d > + > + ror r13d,5 > + add r12d,ecx > + xor edi,ebx > + > + ror r14d,11 > + xor r13d,r11d > + add r12d,edi > + > + mov edi,edx > + add r12d,DWORD[rbp] > + xor r14d,edx > + > + xor edi,r8d > + ror r13d,6 > + mov ecx,r8d > + > + and r15d,edi > + ror r14d,2 > + add r12d,r13d > + > + xor ecx,r15d > + add r10d,r12d > + add ecx,r12d > + > + lea rbp,[4+rbp] > + add ecx,r14d > + mov r12d,DWORD[24+rsi] > + mov r13d,r10d > + mov r14d,ecx > + bswap r12d > + ror r13d,14 > + mov r15d,r11d > + > + xor r13d,r10d > + ror r14d,9 > + xor r15d,eax > + > + mov DWORD[24+rsp],r12d > + xor r14d,ecx > + and r15d,r10d > + > + ror r13d,5 > + add r12d,ebx > + xor r15d,eax > + > + ror r14d,11 > + xor r13d,r10d > + add r12d,r15d > + > + mov r15d,ecx > + add r12d,DWORD[rbp] > + xor r14d,ecx > + > + xor r15d,edx > + ror r13d,6 > + mov ebx,edx > + > + and edi,r15d > + ror r14d,2 > + add r12d,r13d > + > + xor ebx,edi > + add r9d,r12d > + add ebx,r12d > + > + lea rbp,[4+rbp] > + add ebx,r14d > + mov r12d,DWORD[28+rsi] > + mov r13d,r9d > + mov r14d,ebx > + bswap r12d > + ror r13d,14 > + mov edi,r10d > + > + xor r13d,r9d > + ror r14d,9 > + xor edi,r11d > + > + mov DWORD[28+rsp],r12d > + xor r14d,ebx > + and edi,r9d > + > + ror r13d,5 > + add r12d,eax > + xor edi,r11d > + > + ror r14d,11 > + xor r13d,r9d > + add r12d,edi > + > + mov edi,ebx > + add r12d,DWORD[rbp] > + xor r14d,ebx > + > + xor edi,ecx > + ror r13d,6 > + mov eax,ecx > + > + and r15d,edi > + ror r14d,2 > + add r12d,r13d > + > + xor eax,r15d > + add r8d,r12d > + add eax,r12d > + > + lea rbp,[20+rbp] > + add eax,r14d > + mov r12d,DWORD[32+rsi] > + mov r13d,r8d > + mov r14d,eax > + bswap r12d > + ror r13d,14 > + mov r15d,r9d > + > + xor r13d,r8d > + ror r14d,9 > + xor r15d,r10d > + > + mov DWORD[32+rsp],r12d > + xor r14d,eax > + and r15d,r8d > + > + ror r13d,5 > + add r12d,r11d > + xor r15d,r10d > + > + ror r14d,11 > + xor r13d,r8d > + add r12d,r15d > + > + mov r15d,eax > + add r12d,DWORD[rbp] > + xor r14d,eax > + > + xor r15d,ebx > + ror r13d,6 > + mov r11d,ebx > + > + and edi,r15d > + ror r14d,2 > + add r12d,r13d > + > + xor r11d,edi > + add edx,r12d > + add r11d,r12d > + > + lea rbp,[4+rbp] > + add r11d,r14d > + mov r12d,DWORD[36+rsi] > + mov r13d,edx > + mov r14d,r11d > + bswap r12d > + ror r13d,14 > + mov edi,r8d > + > + xor r13d,edx > + ror r14d,9 > + xor edi,r9d > + > + mov DWORD[36+rsp],r12d > + xor r14d,r11d > + and edi,edx > + > + ror r13d,5 > + add r12d,r10d > + xor edi,r9d > + > + ror r14d,11 > + xor r13d,edx > 
+ add r12d,edi > + > + mov edi,r11d > + add r12d,DWORD[rbp] > + xor r14d,r11d > + > + xor edi,eax > + ror r13d,6 > + mov r10d,eax > + > + and r15d,edi > + ror r14d,2 > + add r12d,r13d > + > + xor r10d,r15d > + add ecx,r12d > + add r10d,r12d > + > + lea rbp,[4+rbp] > + add r10d,r14d > + mov r12d,DWORD[40+rsi] > + mov r13d,ecx > + mov r14d,r10d > + bswap r12d > + ror r13d,14 > + mov r15d,edx > + > + xor r13d,ecx > + ror r14d,9 > + xor r15d,r8d > + > + mov DWORD[40+rsp],r12d > + xor r14d,r10d > + and r15d,ecx > + > + ror r13d,5 > + add r12d,r9d > + xor r15d,r8d > + > + ror r14d,11 > + xor r13d,ecx > + add r12d,r15d > + > + mov r15d,r10d > + add r12d,DWORD[rbp] > + xor r14d,r10d > + > + xor r15d,r11d > + ror r13d,6 > + mov r9d,r11d > + > + and edi,r15d > + ror r14d,2 > + add r12d,r13d > + > + xor r9d,edi > + add ebx,r12d > + add r9d,r12d > + > + lea rbp,[4+rbp] > + add r9d,r14d > + mov r12d,DWORD[44+rsi] > + mov r13d,ebx > + mov r14d,r9d > + bswap r12d > + ror r13d,14 > + mov edi,ecx > + > + xor r13d,ebx > + ror r14d,9 > + xor edi,edx > + > + mov DWORD[44+rsp],r12d > + xor r14d,r9d > + and edi,ebx > + > + ror r13d,5 > + add r12d,r8d > + xor edi,edx > + > + ror r14d,11 > + xor r13d,ebx > + add r12d,edi > + > + mov edi,r9d > + add r12d,DWORD[rbp] > + xor r14d,r9d > + > + xor edi,r10d > + ror r13d,6 > + mov r8d,r10d > + > + and r15d,edi > + ror r14d,2 > + add r12d,r13d > + > + xor r8d,r15d > + add eax,r12d > + add r8d,r12d > + > + lea rbp,[20+rbp] > + add r8d,r14d > + mov r12d,DWORD[48+rsi] > + mov r13d,eax > + mov r14d,r8d > + bswap r12d > + ror r13d,14 > + mov r15d,ebx > + > + xor r13d,eax > + ror r14d,9 > + xor r15d,ecx > + > + mov DWORD[48+rsp],r12d > + xor r14d,r8d > + and r15d,eax > + > + ror r13d,5 > + add r12d,edx > + xor r15d,ecx > + > + ror r14d,11 > + xor r13d,eax > + add r12d,r15d > + > + mov r15d,r8d > + add r12d,DWORD[rbp] > + xor r14d,r8d > + > + xor r15d,r9d > + ror r13d,6 > + mov edx,r9d > + > + and edi,r15d > + ror r14d,2 > + add r12d,r13d > + > + xor edx,edi > + add r11d,r12d > + add edx,r12d > + > + lea rbp,[4+rbp] > + add edx,r14d > + mov r12d,DWORD[52+rsi] > + mov r13d,r11d > + mov r14d,edx > + bswap r12d > + ror r13d,14 > + mov edi,eax > + > + xor r13d,r11d > + ror r14d,9 > + xor edi,ebx > + > + mov DWORD[52+rsp],r12d > + xor r14d,edx > + and edi,r11d > + > + ror r13d,5 > + add r12d,ecx > + xor edi,ebx > + > + ror r14d,11 > + xor r13d,r11d > + add r12d,edi > + > + mov edi,edx > + add r12d,DWORD[rbp] > + xor r14d,edx > + > + xor edi,r8d > + ror r13d,6 > + mov ecx,r8d > + > + and r15d,edi > + ror r14d,2 > + add r12d,r13d > + > + xor ecx,r15d > + add r10d,r12d > + add ecx,r12d > + > + lea rbp,[4+rbp] > + add ecx,r14d > + mov r12d,DWORD[56+rsi] > + mov r13d,r10d > + mov r14d,ecx > + bswap r12d > + ror r13d,14 > + mov r15d,r11d > + > + xor r13d,r10d > + ror r14d,9 > + xor r15d,eax > + > + mov DWORD[56+rsp],r12d > + xor r14d,ecx > + and r15d,r10d > + > + ror r13d,5 > + add r12d,ebx > + xor r15d,eax > + > + ror r14d,11 > + xor r13d,r10d > + add r12d,r15d > + > + mov r15d,ecx > + add r12d,DWORD[rbp] > + xor r14d,ecx > + > + xor r15d,edx > + ror r13d,6 > + mov ebx,edx > + > + and edi,r15d > + ror r14d,2 > + add r12d,r13d > + > + xor ebx,edi > + add r9d,r12d > + add ebx,r12d > + > + lea rbp,[4+rbp] > + add ebx,r14d > + mov r12d,DWORD[60+rsi] > + mov r13d,r9d > + mov r14d,ebx > + bswap r12d > + ror r13d,14 > + mov edi,r10d > + > + xor r13d,r9d > + ror r14d,9 > + xor edi,r11d > + > + mov DWORD[60+rsp],r12d > + xor r14d,ebx > + and edi,r9d > + > + ror r13d,5 > + add r12d,eax > + xor 
edi,r11d > + > + ror r14d,11 > + xor r13d,r9d > + add r12d,edi > + > + mov edi,ebx > + add r12d,DWORD[rbp] > + xor r14d,ebx > + > + xor edi,ecx > + ror r13d,6 > + mov eax,ecx > + > + and r15d,edi > + ror r14d,2 > + add r12d,r13d > + > + xor eax,r15d > + add r8d,r12d > + add eax,r12d > + > + lea rbp,[20+rbp] > + jmp NEAR $L$rounds_16_xx > +ALIGN 16 > +$L$rounds_16_xx: > + mov r13d,DWORD[4+rsp] > + mov r15d,DWORD[56+rsp] > + > + mov r12d,r13d > + ror r13d,11 > + add eax,r14d > + mov r14d,r15d > + ror r15d,2 > + > + xor r13d,r12d > + shr r12d,3 > + ror r13d,7 > + xor r15d,r14d > + shr r14d,10 > + > + ror r15d,17 > + xor r12d,r13d > + xor r15d,r14d > + add r12d,DWORD[36+rsp] > + > + add r12d,DWORD[rsp] > + mov r13d,r8d > + add r12d,r15d > + mov r14d,eax > + ror r13d,14 > + mov r15d,r9d > + > + xor r13d,r8d > + ror r14d,9 > + xor r15d,r10d > + > + mov DWORD[rsp],r12d > + xor r14d,eax > + and r15d,r8d > + > + ror r13d,5 > + add r12d,r11d > + xor r15d,r10d > + > + ror r14d,11 > + xor r13d,r8d > + add r12d,r15d > + > + mov r15d,eax > + add r12d,DWORD[rbp] > + xor r14d,eax > + > + xor r15d,ebx > + ror r13d,6 > + mov r11d,ebx > + > + and edi,r15d > + ror r14d,2 > + add r12d,r13d > + > + xor r11d,edi > + add edx,r12d > + add r11d,r12d > + > + lea rbp,[4+rbp] > + mov r13d,DWORD[8+rsp] > + mov edi,DWORD[60+rsp] > + > + mov r12d,r13d > + ror r13d,11 > + add r11d,r14d > + mov r14d,edi > + ror edi,2 > + > + xor r13d,r12d > + shr r12d,3 > + ror r13d,7 > + xor edi,r14d > + shr r14d,10 > + > + ror edi,17 > + xor r12d,r13d > + xor edi,r14d > + add r12d,DWORD[40+rsp] > + > + add r12d,DWORD[4+rsp] > + mov r13d,edx > + add r12d,edi > + mov r14d,r11d > + ror r13d,14 > + mov edi,r8d > + > + xor r13d,edx > + ror r14d,9 > + xor edi,r9d > + > + mov DWORD[4+rsp],r12d > + xor r14d,r11d > + and edi,edx > + > + ror r13d,5 > + add r12d,r10d > + xor edi,r9d > + > + ror r14d,11 > + xor r13d,edx > + add r12d,edi > + > + mov edi,r11d > + add r12d,DWORD[rbp] > + xor r14d,r11d > + > + xor edi,eax > + ror r13d,6 > + mov r10d,eax > + > + and r15d,edi > + ror r14d,2 > + add r12d,r13d > + > + xor r10d,r15d > + add ecx,r12d > + add r10d,r12d > + > + lea rbp,[4+rbp] > + mov r13d,DWORD[12+rsp] > + mov r15d,DWORD[rsp] > + > + mov r12d,r13d > + ror r13d,11 > + add r10d,r14d > + mov r14d,r15d > + ror r15d,2 > + > + xor r13d,r12d > + shr r12d,3 > + ror r13d,7 > + xor r15d,r14d > + shr r14d,10 > + > + ror r15d,17 > + xor r12d,r13d > + xor r15d,r14d > + add r12d,DWORD[44+rsp] > + > + add r12d,DWORD[8+rsp] > + mov r13d,ecx > + add r12d,r15d > + mov r14d,r10d > + ror r13d,14 > + mov r15d,edx > + > + xor r13d,ecx > + ror r14d,9 > + xor r15d,r8d > + > + mov DWORD[8+rsp],r12d > + xor r14d,r10d > + and r15d,ecx > + > + ror r13d,5 > + add r12d,r9d > + xor r15d,r8d > + > + ror r14d,11 > + xor r13d,ecx > + add r12d,r15d > + > + mov r15d,r10d > + add r12d,DWORD[rbp] > + xor r14d,r10d > + > + xor r15d,r11d > + ror r13d,6 > + mov r9d,r11d > + > + and edi,r15d > + ror r14d,2 > + add r12d,r13d > + > + xor r9d,edi > + add ebx,r12d > + add r9d,r12d > + > + lea rbp,[4+rbp] > + mov r13d,DWORD[16+rsp] > + mov edi,DWORD[4+rsp] > + > + mov r12d,r13d > + ror r13d,11 > + add r9d,r14d > + mov r14d,edi > + ror edi,2 > + > + xor r13d,r12d > + shr r12d,3 > + ror r13d,7 > + xor edi,r14d > + shr r14d,10 > + > + ror edi,17 > + xor r12d,r13d > + xor edi,r14d > + add r12d,DWORD[48+rsp] > + > + add r12d,DWORD[12+rsp] > + mov r13d,ebx > + add r12d,edi > + mov r14d,r9d > + ror r13d,14 > + mov edi,ecx > + > + xor r13d,ebx > + ror r14d,9 > + xor edi,edx > + > + mov 
DWORD[12+rsp],r12d > + xor r14d,r9d > + and edi,ebx > + > + ror r13d,5 > + add r12d,r8d > + xor edi,edx > + > + ror r14d,11 > + xor r13d,ebx > + add r12d,edi > + > + mov edi,r9d > + add r12d,DWORD[rbp] > + xor r14d,r9d > + > + xor edi,r10d > + ror r13d,6 > + mov r8d,r10d > + > + and r15d,edi > + ror r14d,2 > + add r12d,r13d > + > + xor r8d,r15d > + add eax,r12d > + add r8d,r12d > + > + lea rbp,[20+rbp] > + mov r13d,DWORD[20+rsp] > + mov r15d,DWORD[8+rsp] > + > + mov r12d,r13d > + ror r13d,11 > + add r8d,r14d > + mov r14d,r15d > + ror r15d,2 > + > + xor r13d,r12d > + shr r12d,3 > + ror r13d,7 > + xor r15d,r14d > + shr r14d,10 > + > + ror r15d,17 > + xor r12d,r13d > + xor r15d,r14d > + add r12d,DWORD[52+rsp] > + > + add r12d,DWORD[16+rsp] > + mov r13d,eax > + add r12d,r15d > + mov r14d,r8d > + ror r13d,14 > + mov r15d,ebx > + > + xor r13d,eax > + ror r14d,9 > + xor r15d,ecx > + > + mov DWORD[16+rsp],r12d > + xor r14d,r8d > + and r15d,eax > + > + ror r13d,5 > + add r12d,edx > + xor r15d,ecx > + > + ror r14d,11 > + xor r13d,eax > + add r12d,r15d > + > + mov r15d,r8d > + add r12d,DWORD[rbp] > + xor r14d,r8d > + > + xor r15d,r9d > + ror r13d,6 > + mov edx,r9d > + > + and edi,r15d > + ror r14d,2 > + add r12d,r13d > + > + xor edx,edi > + add r11d,r12d > + add edx,r12d > + > + lea rbp,[4+rbp] > + mov r13d,DWORD[24+rsp] > + mov edi,DWORD[12+rsp] > + > + mov r12d,r13d > + ror r13d,11 > + add edx,r14d > + mov r14d,edi > + ror edi,2 > + > + xor r13d,r12d > + shr r12d,3 > + ror r13d,7 > + xor edi,r14d > + shr r14d,10 > + > + ror edi,17 > + xor r12d,r13d > + xor edi,r14d > + add r12d,DWORD[56+rsp] > + > + add r12d,DWORD[20+rsp] > + mov r13d,r11d > + add r12d,edi > + mov r14d,edx > + ror r13d,14 > + mov edi,eax > + > + xor r13d,r11d > + ror r14d,9 > + xor edi,ebx > + > + mov DWORD[20+rsp],r12d > + xor r14d,edx > + and edi,r11d > + > + ror r13d,5 > + add r12d,ecx > + xor edi,ebx > + > + ror r14d,11 > + xor r13d,r11d > + add r12d,edi > + > + mov edi,edx > + add r12d,DWORD[rbp] > + xor r14d,edx > + > + xor edi,r8d > + ror r13d,6 > + mov ecx,r8d > + > + and r15d,edi > + ror r14d,2 > + add r12d,r13d > + > + xor ecx,r15d > + add r10d,r12d > + add ecx,r12d > + > + lea rbp,[4+rbp] > + mov r13d,DWORD[28+rsp] > + mov r15d,DWORD[16+rsp] > + > + mov r12d,r13d > + ror r13d,11 > + add ecx,r14d > + mov r14d,r15d > + ror r15d,2 > + > + xor r13d,r12d > + shr r12d,3 > + ror r13d,7 > + xor r15d,r14d > + shr r14d,10 > + > + ror r15d,17 > + xor r12d,r13d > + xor r15d,r14d > + add r12d,DWORD[60+rsp] > + > + add r12d,DWORD[24+rsp] > + mov r13d,r10d > + add r12d,r15d > + mov r14d,ecx > + ror r13d,14 > + mov r15d,r11d > + > + xor r13d,r10d > + ror r14d,9 > + xor r15d,eax > + > + mov DWORD[24+rsp],r12d > + xor r14d,ecx > + and r15d,r10d > + > + ror r13d,5 > + add r12d,ebx > + xor r15d,eax > + > + ror r14d,11 > + xor r13d,r10d > + add r12d,r15d > + > + mov r15d,ecx > + add r12d,DWORD[rbp] > + xor r14d,ecx > + > + xor r15d,edx > + ror r13d,6 > + mov ebx,edx > + > + and edi,r15d > + ror r14d,2 > + add r12d,r13d > + > + xor ebx,edi > + add r9d,r12d > + add ebx,r12d > + > + lea rbp,[4+rbp] > + mov r13d,DWORD[32+rsp] > + mov edi,DWORD[20+rsp] > + > + mov r12d,r13d > + ror r13d,11 > + add ebx,r14d > + mov r14d,edi > + ror edi,2 > + > + xor r13d,r12d > + shr r12d,3 > + ror r13d,7 > + xor edi,r14d > + shr r14d,10 > + > + ror edi,17 > + xor r12d,r13d > + xor edi,r14d > + add r12d,DWORD[rsp] > + > + add r12d,DWORD[28+rsp] > + mov r13d,r9d > + add r12d,edi > + mov r14d,ebx > + ror r13d,14 > + mov edi,r10d > + > + xor r13d,r9d > + ror r14d,9 > 
+ xor edi,r11d > + > + mov DWORD[28+rsp],r12d > + xor r14d,ebx > + and edi,r9d > + > + ror r13d,5 > + add r12d,eax > + xor edi,r11d > + > + ror r14d,11 > + xor r13d,r9d > + add r12d,edi > + > + mov edi,ebx > + add r12d,DWORD[rbp] > + xor r14d,ebx > + > + xor edi,ecx > + ror r13d,6 > + mov eax,ecx > + > + and r15d,edi > + ror r14d,2 > + add r12d,r13d > + > + xor eax,r15d > + add r8d,r12d > + add eax,r12d > + > + lea rbp,[20+rbp] > + mov r13d,DWORD[36+rsp] > + mov r15d,DWORD[24+rsp] > + > + mov r12d,r13d > + ror r13d,11 > + add eax,r14d > + mov r14d,r15d > + ror r15d,2 > + > + xor r13d,r12d > + shr r12d,3 > + ror r13d,7 > + xor r15d,r14d > + shr r14d,10 > + > + ror r15d,17 > + xor r12d,r13d > + xor r15d,r14d > + add r12d,DWORD[4+rsp] > + > + add r12d,DWORD[32+rsp] > + mov r13d,r8d > + add r12d,r15d > + mov r14d,eax > + ror r13d,14 > + mov r15d,r9d > + > + xor r13d,r8d > + ror r14d,9 > + xor r15d,r10d > + > + mov DWORD[32+rsp],r12d > + xor r14d,eax > + and r15d,r8d > + > + ror r13d,5 > + add r12d,r11d > + xor r15d,r10d > + > + ror r14d,11 > + xor r13d,r8d > + add r12d,r15d > + > + mov r15d,eax > + add r12d,DWORD[rbp] > + xor r14d,eax > + > + xor r15d,ebx > + ror r13d,6 > + mov r11d,ebx > + > + and edi,r15d > + ror r14d,2 > + add r12d,r13d > + > + xor r11d,edi > + add edx,r12d > + add r11d,r12d > + > + lea rbp,[4+rbp] > + mov r13d,DWORD[40+rsp] > + mov edi,DWORD[28+rsp] > + > + mov r12d,r13d > + ror r13d,11 > + add r11d,r14d > + mov r14d,edi > + ror edi,2 > + > + xor r13d,r12d > + shr r12d,3 > + ror r13d,7 > + xor edi,r14d > + shr r14d,10 > + > + ror edi,17 > + xor r12d,r13d > + xor edi,r14d > + add r12d,DWORD[8+rsp] > + > + add r12d,DWORD[36+rsp] > + mov r13d,edx > + add r12d,edi > + mov r14d,r11d > + ror r13d,14 > + mov edi,r8d > + > + xor r13d,edx > + ror r14d,9 > + xor edi,r9d > + > + mov DWORD[36+rsp],r12d > + xor r14d,r11d > + and edi,edx > + > + ror r13d,5 > + add r12d,r10d > + xor edi,r9d > + > + ror r14d,11 > + xor r13d,edx > + add r12d,edi > + > + mov edi,r11d > + add r12d,DWORD[rbp] > + xor r14d,r11d > + > + xor edi,eax > + ror r13d,6 > + mov r10d,eax > + > + and r15d,edi > + ror r14d,2 > + add r12d,r13d > + > + xor r10d,r15d > + add ecx,r12d > + add r10d,r12d > + > + lea rbp,[4+rbp] > + mov r13d,DWORD[44+rsp] > + mov r15d,DWORD[32+rsp] > + > + mov r12d,r13d > + ror r13d,11 > + add r10d,r14d > + mov r14d,r15d > + ror r15d,2 > + > + xor r13d,r12d > + shr r12d,3 > + ror r13d,7 > + xor r15d,r14d > + shr r14d,10 > + > + ror r15d,17 > + xor r12d,r13d > + xor r15d,r14d > + add r12d,DWORD[12+rsp] > + > + add r12d,DWORD[40+rsp] > + mov r13d,ecx > + add r12d,r15d > + mov r14d,r10d > + ror r13d,14 > + mov r15d,edx > + > + xor r13d,ecx > + ror r14d,9 > + xor r15d,r8d > + > + mov DWORD[40+rsp],r12d > + xor r14d,r10d > + and r15d,ecx > + > + ror r13d,5 > + add r12d,r9d > + xor r15d,r8d > + > + ror r14d,11 > + xor r13d,ecx > + add r12d,r15d > + > + mov r15d,r10d > + add r12d,DWORD[rbp] > + xor r14d,r10d > + > + xor r15d,r11d > + ror r13d,6 > + mov r9d,r11d > + > + and edi,r15d > + ror r14d,2 > + add r12d,r13d > + > + xor r9d,edi > + add ebx,r12d > + add r9d,r12d > + > + lea rbp,[4+rbp] > + mov r13d,DWORD[48+rsp] > + mov edi,DWORD[36+rsp] > + > + mov r12d,r13d > + ror r13d,11 > + add r9d,r14d > + mov r14d,edi > + ror edi,2 > + > + xor r13d,r12d > + shr r12d,3 > + ror r13d,7 > + xor edi,r14d > + shr r14d,10 > + > + ror edi,17 > + xor r12d,r13d > + xor edi,r14d > + add r12d,DWORD[16+rsp] > + > + add r12d,DWORD[44+rsp] > + mov r13d,ebx > + add r12d,edi > + mov r14d,r9d > + ror r13d,14 > + mov edi,ecx 
> + > + xor r13d,ebx > + ror r14d,9 > + xor edi,edx > + > + mov DWORD[44+rsp],r12d > + xor r14d,r9d > + and edi,ebx > + > + ror r13d,5 > + add r12d,r8d > + xor edi,edx > + > + ror r14d,11 > + xor r13d,ebx > + add r12d,edi > + > + mov edi,r9d > + add r12d,DWORD[rbp] > + xor r14d,r9d > + > + xor edi,r10d > + ror r13d,6 > + mov r8d,r10d > + > + and r15d,edi > + ror r14d,2 > + add r12d,r13d > + > + xor r8d,r15d > + add eax,r12d > + add r8d,r12d > + > + lea rbp,[20+rbp] > + mov r13d,DWORD[52+rsp] > + mov r15d,DWORD[40+rsp] > + > + mov r12d,r13d > + ror r13d,11 > + add r8d,r14d > + mov r14d,r15d > + ror r15d,2 > + > + xor r13d,r12d > + shr r12d,3 > + ror r13d,7 > + xor r15d,r14d > + shr r14d,10 > + > + ror r15d,17 > + xor r12d,r13d > + xor r15d,r14d > + add r12d,DWORD[20+rsp] > + > + add r12d,DWORD[48+rsp] > + mov r13d,eax > + add r12d,r15d > + mov r14d,r8d > + ror r13d,14 > + mov r15d,ebx > + > + xor r13d,eax > + ror r14d,9 > + xor r15d,ecx > + > + mov DWORD[48+rsp],r12d > + xor r14d,r8d > + and r15d,eax > + > + ror r13d,5 > + add r12d,edx > + xor r15d,ecx > + > + ror r14d,11 > + xor r13d,eax > + add r12d,r15d > + > + mov r15d,r8d > + add r12d,DWORD[rbp] > + xor r14d,r8d > + > + xor r15d,r9d > + ror r13d,6 > + mov edx,r9d > + > + and edi,r15d > + ror r14d,2 > + add r12d,r13d > + > + xor edx,edi > + add r11d,r12d > + add edx,r12d > + > + lea rbp,[4+rbp] > + mov r13d,DWORD[56+rsp] > + mov edi,DWORD[44+rsp] > + > + mov r12d,r13d > + ror r13d,11 > + add edx,r14d > + mov r14d,edi > + ror edi,2 > + > + xor r13d,r12d > + shr r12d,3 > + ror r13d,7 > + xor edi,r14d > + shr r14d,10 > + > + ror edi,17 > + xor r12d,r13d > + xor edi,r14d > + add r12d,DWORD[24+rsp] > + > + add r12d,DWORD[52+rsp] > + mov r13d,r11d > + add r12d,edi > + mov r14d,edx > + ror r13d,14 > + mov edi,eax > + > + xor r13d,r11d > + ror r14d,9 > + xor edi,ebx > + > + mov DWORD[52+rsp],r12d > + xor r14d,edx > + and edi,r11d > + > + ror r13d,5 > + add r12d,ecx > + xor edi,ebx > + > + ror r14d,11 > + xor r13d,r11d > + add r12d,edi > + > + mov edi,edx > + add r12d,DWORD[rbp] > + xor r14d,edx > + > + xor edi,r8d > + ror r13d,6 > + mov ecx,r8d > + > + and r15d,edi > + ror r14d,2 > + add r12d,r13d > + > + xor ecx,r15d > + add r10d,r12d > + add ecx,r12d > + > + lea rbp,[4+rbp] > + mov r13d,DWORD[60+rsp] > + mov r15d,DWORD[48+rsp] > + > + mov r12d,r13d > + ror r13d,11 > + add ecx,r14d > + mov r14d,r15d > + ror r15d,2 > + > + xor r13d,r12d > + shr r12d,3 > + ror r13d,7 > + xor r15d,r14d > + shr r14d,10 > + > + ror r15d,17 > + xor r12d,r13d > + xor r15d,r14d > + add r12d,DWORD[28+rsp] > + > + add r12d,DWORD[56+rsp] > + mov r13d,r10d > + add r12d,r15d > + mov r14d,ecx > + ror r13d,14 > + mov r15d,r11d > + > + xor r13d,r10d > + ror r14d,9 > + xor r15d,eax > + > + mov DWORD[56+rsp],r12d > + xor r14d,ecx > + and r15d,r10d > + > + ror r13d,5 > + add r12d,ebx > + xor r15d,eax > + > + ror r14d,11 > + xor r13d,r10d > + add r12d,r15d > + > + mov r15d,ecx > + add r12d,DWORD[rbp] > + xor r14d,ecx > + > + xor r15d,edx > + ror r13d,6 > + mov ebx,edx > + > + and edi,r15d > + ror r14d,2 > + add r12d,r13d > + > + xor ebx,edi > + add r9d,r12d > + add ebx,r12d > + > + lea rbp,[4+rbp] > + mov r13d,DWORD[rsp] > + mov edi,DWORD[52+rsp] > + > + mov r12d,r13d > + ror r13d,11 > + add ebx,r14d > + mov r14d,edi > + ror edi,2 > + > + xor r13d,r12d > + shr r12d,3 > + ror r13d,7 > + xor edi,r14d > + shr r14d,10 > + > + ror edi,17 > + xor r12d,r13d > + xor edi,r14d > + add r12d,DWORD[32+rsp] > + > + add r12d,DWORD[60+rsp] > + mov r13d,r9d > + add r12d,edi > + mov r14d,ebx > + 
ror r13d,14 > + mov edi,r10d > + > + xor r13d,r9d > + ror r14d,9 > + xor edi,r11d > + > + mov DWORD[60+rsp],r12d > + xor r14d,ebx > + and edi,r9d > + > + ror r13d,5 > + add r12d,eax > + xor edi,r11d > + > + ror r14d,11 > + xor r13d,r9d > + add r12d,edi > + > + mov edi,ebx > + add r12d,DWORD[rbp] > + xor r14d,ebx > + > + xor edi,ecx > + ror r13d,6 > + mov eax,ecx > + > + and r15d,edi > + ror r14d,2 > + add r12d,r13d > + > + xor eax,r15d > + add r8d,r12d > + add eax,r12d > + > + lea rbp,[20+rbp] > + cmp BYTE[3+rbp],0 > + jnz NEAR $L$rounds_16_xx > + > + mov rdi,QWORD[((64+0))+rsp] > + add eax,r14d > + lea rsi,[64+rsi] > + > + add eax,DWORD[rdi] > + add ebx,DWORD[4+rdi] > + add ecx,DWORD[8+rdi] > + add edx,DWORD[12+rdi] > + add r8d,DWORD[16+rdi] > + add r9d,DWORD[20+rdi] > + add r10d,DWORD[24+rdi] > + add r11d,DWORD[28+rdi] > + > + cmp rsi,QWORD[((64+16))+rsp] > + > + mov DWORD[rdi],eax > + mov DWORD[4+rdi],ebx > + mov DWORD[8+rdi],ecx > + mov DWORD[12+rdi],edx > + mov DWORD[16+rdi],r8d > + mov DWORD[20+rdi],r9d > + mov DWORD[24+rdi],r10d > + mov DWORD[28+rdi],r11d > + jb NEAR $L$loop > + > + mov rsi,QWORD[88+rsp] > + > + mov r15,QWORD[((-48))+rsi] > + > + mov r14,QWORD[((-40))+rsi] > + > + mov r13,QWORD[((-32))+rsi] > + > + mov r12,QWORD[((-24))+rsi] > + > + mov rbp,QWORD[((-16))+rsi] > + > + mov rbx,QWORD[((-8))+rsi] > + > + lea rsp,[rsi] > + > +$L$epilogue: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_sha256_block_data_order: > +ALIGN 64 > + > +K256: > + DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 > + DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 > + DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 > + DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 > + DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 > + DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 > + DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 > + DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 > + DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc > + DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc > + DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da > + DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da > + DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 > + DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 > + DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 > + DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 > + DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 > + DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 > + DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 > + DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 > + DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 > + DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 > + DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 > + DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 > + DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 > + DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 > + DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 > + DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 > + DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 > + DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 > + DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 > + DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 > + > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > + DD 0x03020100,0x0b0a0908,0xffffffff,0xffffffff > + DD 0x03020100,0x0b0a0908,0xffffffff,0xffffffff > + DD 0xffffffff,0xffffffff,0x03020100,0x0b0a0908 > + DD 0xffffffff,0xffffffff,0x03020100,0x0b0a0908 > +DB 83,72,65,50,53,54,32,98,108,111,99,107,32,116,114,97 > +DB 
110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54 > +DB 52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121 > +DB 32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46 > +DB 111,114,103,62,0 > + > +ALIGN 64 > +sha256_block_data_order_shaext: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_sha256_block_data_order_shaext: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + > + > +_shaext_shortcut: > + > + lea rsp,[((-88))+rsp] > + movaps XMMWORD[(-8-80)+rax],xmm6 > + movaps XMMWORD[(-8-64)+rax],xmm7 > + movaps XMMWORD[(-8-48)+rax],xmm8 > + movaps XMMWORD[(-8-32)+rax],xmm9 > + movaps XMMWORD[(-8-16)+rax],xmm10 > +$L$prologue_shaext: > + lea rcx,[((K256+128))] > + movdqu xmm1,XMMWORD[rdi] > + movdqu xmm2,XMMWORD[16+rdi] > + movdqa xmm7,XMMWORD[((512-128))+rcx] > + > + pshufd xmm0,xmm1,0x1b > + pshufd xmm1,xmm1,0xb1 > + pshufd xmm2,xmm2,0x1b > + movdqa xmm8,xmm7 > +DB 102,15,58,15,202,8 > + punpcklqdq xmm2,xmm0 > + jmp NEAR $L$oop_shaext > + > +ALIGN 16 > +$L$oop_shaext: > + movdqu xmm3,XMMWORD[rsi] > + movdqu xmm4,XMMWORD[16+rsi] > + movdqu xmm5,XMMWORD[32+rsi] > +DB 102,15,56,0,223 > + movdqu xmm6,XMMWORD[48+rsi] > + > + movdqa xmm0,XMMWORD[((0-128))+rcx] > + paddd xmm0,xmm3 > +DB 102,15,56,0,231 > + movdqa xmm10,xmm2 > +DB 15,56,203,209 > + pshufd xmm0,xmm0,0x0e > + nop > + movdqa xmm9,xmm1 > +DB 15,56,203,202 > + > + movdqa xmm0,XMMWORD[((32-128))+rcx] > + paddd xmm0,xmm4 > +DB 102,15,56,0,239 > +DB 15,56,203,209 > + pshufd xmm0,xmm0,0x0e > + lea rsi,[64+rsi] > +DB 15,56,204,220 > +DB 15,56,203,202 > + > + movdqa xmm0,XMMWORD[((64-128))+rcx] > + paddd xmm0,xmm5 > +DB 102,15,56,0,247 > +DB 15,56,203,209 > + pshufd xmm0,xmm0,0x0e > + movdqa xmm7,xmm6 > +DB 102,15,58,15,253,4 > + nop > + paddd xmm3,xmm7 > +DB 15,56,204,229 > +DB 15,56,203,202 > + > + movdqa xmm0,XMMWORD[((96-128))+rcx] > + paddd xmm0,xmm6 > +DB 15,56,205,222 > +DB 15,56,203,209 > + pshufd xmm0,xmm0,0x0e > + movdqa xmm7,xmm3 > +DB 102,15,58,15,254,4 > + nop > + paddd xmm4,xmm7 > +DB 15,56,204,238 > +DB 15,56,203,202 > + movdqa xmm0,XMMWORD[((128-128))+rcx] > + paddd xmm0,xmm3 > +DB 15,56,205,227 > +DB 15,56,203,209 > + pshufd xmm0,xmm0,0x0e > + movdqa xmm7,xmm4 > +DB 102,15,58,15,251,4 > + nop > + paddd xmm5,xmm7 > +DB 15,56,204,243 > +DB 15,56,203,202 > + movdqa xmm0,XMMWORD[((160-128))+rcx] > + paddd xmm0,xmm4 > +DB 15,56,205,236 > +DB 15,56,203,209 > + pshufd xmm0,xmm0,0x0e > + movdqa xmm7,xmm5 > +DB 102,15,58,15,252,4 > + nop > + paddd xmm6,xmm7 > +DB 15,56,204,220 > +DB 15,56,203,202 > + movdqa xmm0,XMMWORD[((192-128))+rcx] > + paddd xmm0,xmm5 > +DB 15,56,205,245 > +DB 15,56,203,209 > + pshufd xmm0,xmm0,0x0e > + movdqa xmm7,xmm6 > +DB 102,15,58,15,253,4 > + nop > + paddd xmm3,xmm7 > +DB 15,56,204,229 > +DB 15,56,203,202 > + movdqa xmm0,XMMWORD[((224-128))+rcx] > + paddd xmm0,xmm6 > +DB 15,56,205,222 > +DB 15,56,203,209 > + pshufd xmm0,xmm0,0x0e > + movdqa xmm7,xmm3 > +DB 102,15,58,15,254,4 > + nop > + paddd xmm4,xmm7 > +DB 15,56,204,238 > +DB 15,56,203,202 > + movdqa xmm0,XMMWORD[((256-128))+rcx] > + paddd xmm0,xmm3 > +DB 15,56,205,227 > +DB 15,56,203,209 > + pshufd xmm0,xmm0,0x0e > + movdqa xmm7,xmm4 > +DB 102,15,58,15,251,4 > + nop > + paddd xmm5,xmm7 > +DB 15,56,204,243 > +DB 15,56,203,202 > + movdqa xmm0,XMMWORD[((288-128))+rcx] > + paddd xmm0,xmm4 > +DB 15,56,205,236 > +DB 15,56,203,209 > + pshufd xmm0,xmm0,0x0e > + movdqa xmm7,xmm5 > +DB 102,15,58,15,252,4 > + nop > + paddd xmm6,xmm7 > +DB 15,56,204,220 > +DB 15,56,203,202 > + movdqa 
xmm0,XMMWORD[((320-128))+rcx] > + paddd xmm0,xmm5 > +DB 15,56,205,245 > +DB 15,56,203,209 > + pshufd xmm0,xmm0,0x0e > + movdqa xmm7,xmm6 > +DB 102,15,58,15,253,4 > + nop > + paddd xmm3,xmm7 > +DB 15,56,204,229 > +DB 15,56,203,202 > + movdqa xmm0,XMMWORD[((352-128))+rcx] > + paddd xmm0,xmm6 > +DB 15,56,205,222 > +DB 15,56,203,209 > + pshufd xmm0,xmm0,0x0e > + movdqa xmm7,xmm3 > +DB 102,15,58,15,254,4 > + nop > + paddd xmm4,xmm7 > +DB 15,56,204,238 > +DB 15,56,203,202 > + movdqa xmm0,XMMWORD[((384-128))+rcx] > + paddd xmm0,xmm3 > +DB 15,56,205,227 > +DB 15,56,203,209 > + pshufd xmm0,xmm0,0x0e > + movdqa xmm7,xmm4 > +DB 102,15,58,15,251,4 > + nop > + paddd xmm5,xmm7 > +DB 15,56,204,243 > +DB 15,56,203,202 > + movdqa xmm0,XMMWORD[((416-128))+rcx] > + paddd xmm0,xmm4 > +DB 15,56,205,236 > +DB 15,56,203,209 > + pshufd xmm0,xmm0,0x0e > + movdqa xmm7,xmm5 > +DB 102,15,58,15,252,4 > +DB 15,56,203,202 > + paddd xmm6,xmm7 > + > + movdqa xmm0,XMMWORD[((448-128))+rcx] > + paddd xmm0,xmm5 > +DB 15,56,203,209 > + pshufd xmm0,xmm0,0x0e > +DB 15,56,205,245 > + movdqa xmm7,xmm8 > +DB 15,56,203,202 > + > + movdqa xmm0,XMMWORD[((480-128))+rcx] > + paddd xmm0,xmm6 > + nop > +DB 15,56,203,209 > + pshufd xmm0,xmm0,0x0e > + dec rdx > + nop > +DB 15,56,203,202 > + > + paddd xmm2,xmm10 > + paddd xmm1,xmm9 > + jnz NEAR $L$oop_shaext > + > + pshufd xmm2,xmm2,0xb1 > + pshufd xmm7,xmm1,0x1b > + pshufd xmm1,xmm1,0xb1 > + punpckhqdq xmm1,xmm2 > +DB 102,15,58,15,215,8 > + > + movdqu XMMWORD[rdi],xmm1 > + movdqu XMMWORD[16+rdi],xmm2 > + movaps xmm6,XMMWORD[((-8-80))+rax] > + movaps xmm7,XMMWORD[((-8-64))+rax] > + movaps xmm8,XMMWORD[((-8-48))+rax] > + movaps xmm9,XMMWORD[((-8-32))+rax] > + movaps xmm10,XMMWORD[((-8-16))+rax] > + mov rsp,rax > +$L$epilogue_shaext: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_sha256_block_data_order_shaext: > + > +ALIGN 64 > +sha256_block_data_order_ssse3: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_sha256_block_data_order_ssse3: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + > + > + > +$L$ssse3_shortcut: > + mov rax,rsp > + > + push rbx > + > + push rbp > + > + push r12 > + > + push r13 > + > + push r14 > + > + push r15 > + > + shl rdx,4 > + sub rsp,160 > + lea rdx,[rdx*4+rsi] > + and rsp,-64 > + mov QWORD[((64+0))+rsp],rdi > + mov QWORD[((64+8))+rsp],rsi > + mov QWORD[((64+16))+rsp],rdx > + mov QWORD[88+rsp],rax > + > + movaps XMMWORD[(64+32)+rsp],xmm6 > + movaps XMMWORD[(64+48)+rsp],xmm7 > + movaps XMMWORD[(64+64)+rsp],xmm8 > + movaps XMMWORD[(64+80)+rsp],xmm9 > +$L$prologue_ssse3: > + > + mov eax,DWORD[rdi] > + mov ebx,DWORD[4+rdi] > + mov ecx,DWORD[8+rdi] > + mov edx,DWORD[12+rdi] > + mov r8d,DWORD[16+rdi] > + mov r9d,DWORD[20+rdi] > + mov r10d,DWORD[24+rdi] > + mov r11d,DWORD[28+rdi] > + > + > + jmp NEAR $L$loop_ssse3 > +ALIGN 16 > +$L$loop_ssse3: > + movdqa xmm7,XMMWORD[((K256+512))] > + movdqu xmm0,XMMWORD[rsi] > + movdqu xmm1,XMMWORD[16+rsi] > + movdqu xmm2,XMMWORD[32+rsi] > +DB 102,15,56,0,199 > + movdqu xmm3,XMMWORD[48+rsi] > + lea rbp,[K256] > +DB 102,15,56,0,207 > + movdqa xmm4,XMMWORD[rbp] > + movdqa xmm5,XMMWORD[32+rbp] > +DB 102,15,56,0,215 > + paddd xmm4,xmm0 > + movdqa xmm6,XMMWORD[64+rbp] > +DB 102,15,56,0,223 > + movdqa xmm7,XMMWORD[96+rbp] > + paddd xmm5,xmm1 > + paddd xmm6,xmm2 > + paddd xmm7,xmm3 > + movdqa XMMWORD[rsp],xmm4 > + mov r14d,eax > + movdqa XMMWORD[16+rsp],xmm5 > + mov edi,ebx > + movdqa XMMWORD[32+rsp],xmm6 > + xor edi,ecx > + movdqa 
XMMWORD[48+rsp],xmm7 > + mov r13d,r8d > + jmp NEAR $L$ssse3_00_47 > + > +ALIGN 16 > +$L$ssse3_00_47: > + sub rbp,-128 > + ror r13d,14 > + movdqa xmm4,xmm1 > + mov eax,r14d > + mov r12d,r9d > + movdqa xmm7,xmm3 > + ror r14d,9 > + xor r13d,r8d > + xor r12d,r10d > + ror r13d,5 > + xor r14d,eax > +DB 102,15,58,15,224,4 > + and r12d,r8d > + xor r13d,r8d > +DB 102,15,58,15,250,4 > + add r11d,DWORD[rsp] > + mov r15d,eax > + xor r12d,r10d > + ror r14d,11 > + movdqa xmm5,xmm4 > + xor r15d,ebx > + add r11d,r12d > + movdqa xmm6,xmm4 > + ror r13d,6 > + and edi,r15d > + psrld xmm4,3 > + xor r14d,eax > + add r11d,r13d > + xor edi,ebx > + paddd xmm0,xmm7 > + ror r14d,2 > + add edx,r11d > + psrld xmm6,7 > + add r11d,edi > + mov r13d,edx > + pshufd xmm7,xmm3,250 > + add r14d,r11d > + ror r13d,14 > + pslld xmm5,14 > + mov r11d,r14d > + mov r12d,r8d > + pxor xmm4,xmm6 > + ror r14d,9 > + xor r13d,edx > + xor r12d,r9d > + ror r13d,5 > + psrld xmm6,11 > + xor r14d,r11d > + pxor xmm4,xmm5 > + and r12d,edx > + xor r13d,edx > + pslld xmm5,11 > + add r10d,DWORD[4+rsp] > + mov edi,r11d > + pxor xmm4,xmm6 > + xor r12d,r9d > + ror r14d,11 > + movdqa xmm6,xmm7 > + xor edi,eax > + add r10d,r12d > + pxor xmm4,xmm5 > + ror r13d,6 > + and r15d,edi > + xor r14d,r11d > + psrld xmm7,10 > + add r10d,r13d > + xor r15d,eax > + paddd xmm0,xmm4 > + ror r14d,2 > + add ecx,r10d > + psrlq xmm6,17 > + add r10d,r15d > + mov r13d,ecx > + add r14d,r10d > + pxor xmm7,xmm6 > + ror r13d,14 > + mov r10d,r14d > + mov r12d,edx > + ror r14d,9 > + psrlq xmm6,2 > + xor r13d,ecx > + xor r12d,r8d > + pxor xmm7,xmm6 > + ror r13d,5 > + xor r14d,r10d > + and r12d,ecx > + pshufd xmm7,xmm7,128 > + xor r13d,ecx > + add r9d,DWORD[8+rsp] > + mov r15d,r10d > + psrldq xmm7,8 > + xor r12d,r8d > + ror r14d,11 > + xor r15d,r11d > + add r9d,r12d > + ror r13d,6 > + paddd xmm0,xmm7 > + and edi,r15d > + xor r14d,r10d > + add r9d,r13d > + pshufd xmm7,xmm0,80 > + xor edi,r11d > + ror r14d,2 > + add ebx,r9d > + movdqa xmm6,xmm7 > + add r9d,edi > + mov r13d,ebx > + psrld xmm7,10 > + add r14d,r9d > + ror r13d,14 > + psrlq xmm6,17 > + mov r9d,r14d > + mov r12d,ecx > + pxor xmm7,xmm6 > + ror r14d,9 > + xor r13d,ebx > + xor r12d,edx > + ror r13d,5 > + xor r14d,r9d > + psrlq xmm6,2 > + and r12d,ebx > + xor r13d,ebx > + add r8d,DWORD[12+rsp] > + pxor xmm7,xmm6 > + mov edi,r9d > + xor r12d,edx > + ror r14d,11 > + pshufd xmm7,xmm7,8 > + xor edi,r10d > + add r8d,r12d > + movdqa xmm6,XMMWORD[rbp] > + ror r13d,6 > + and r15d,edi > + pslldq xmm7,8 > + xor r14d,r9d > + add r8d,r13d > + xor r15d,r10d > + paddd xmm0,xmm7 > + ror r14d,2 > + add eax,r8d > + add r8d,r15d > + paddd xmm6,xmm0 > + mov r13d,eax > + add r14d,r8d > + movdqa XMMWORD[rsp],xmm6 > + ror r13d,14 > + movdqa xmm4,xmm2 > + mov r8d,r14d > + mov r12d,ebx > + movdqa xmm7,xmm0 > + ror r14d,9 > + xor r13d,eax > + xor r12d,ecx > + ror r13d,5 > + xor r14d,r8d > +DB 102,15,58,15,225,4 > + and r12d,eax > + xor r13d,eax > +DB 102,15,58,15,251,4 > + add edx,DWORD[16+rsp] > + mov r15d,r8d > + xor r12d,ecx > + ror r14d,11 > + movdqa xmm5,xmm4 > + xor r15d,r9d > + add edx,r12d > + movdqa xmm6,xmm4 > + ror r13d,6 > + and edi,r15d > + psrld xmm4,3 > + xor r14d,r8d > + add edx,r13d > + xor edi,r9d > + paddd xmm1,xmm7 > + ror r14d,2 > + add r11d,edx > + psrld xmm6,7 > + add edx,edi > + mov r13d,r11d > + pshufd xmm7,xmm0,250 > + add r14d,edx > + ror r13d,14 > + pslld xmm5,14 > + mov edx,r14d > + mov r12d,eax > + pxor xmm4,xmm6 > + ror r14d,9 > + xor r13d,r11d > + xor r12d,ebx > + ror r13d,5 > + psrld xmm6,11 > + xor r14d,edx > + pxor 
xmm4,xmm5 > + and r12d,r11d > + xor r13d,r11d > + pslld xmm5,11 > + add ecx,DWORD[20+rsp] > + mov edi,edx > + pxor xmm4,xmm6 > + xor r12d,ebx > + ror r14d,11 > + movdqa xmm6,xmm7 > + xor edi,r8d > + add ecx,r12d > + pxor xmm4,xmm5 > + ror r13d,6 > + and r15d,edi > + xor r14d,edx > + psrld xmm7,10 > + add ecx,r13d > + xor r15d,r8d > + paddd xmm1,xmm4 > + ror r14d,2 > + add r10d,ecx > + psrlq xmm6,17 > + add ecx,r15d > + mov r13d,r10d > + add r14d,ecx > + pxor xmm7,xmm6 > + ror r13d,14 > + mov ecx,r14d > + mov r12d,r11d > + ror r14d,9 > + psrlq xmm6,2 > + xor r13d,r10d > + xor r12d,eax > + pxor xmm7,xmm6 > + ror r13d,5 > + xor r14d,ecx > + and r12d,r10d > + pshufd xmm7,xmm7,128 > + xor r13d,r10d > + add ebx,DWORD[24+rsp] > + mov r15d,ecx > + psrldq xmm7,8 > + xor r12d,eax > + ror r14d,11 > + xor r15d,edx > + add ebx,r12d > + ror r13d,6 > + paddd xmm1,xmm7 > + and edi,r15d > + xor r14d,ecx > + add ebx,r13d > + pshufd xmm7,xmm1,80 > + xor edi,edx > + ror r14d,2 > + add r9d,ebx > + movdqa xmm6,xmm7 > + add ebx,edi > + mov r13d,r9d > + psrld xmm7,10 > + add r14d,ebx > + ror r13d,14 > + psrlq xmm6,17 > + mov ebx,r14d > + mov r12d,r10d > + pxor xmm7,xmm6 > + ror r14d,9 > + xor r13d,r9d > + xor r12d,r11d > + ror r13d,5 > + xor r14d,ebx > + psrlq xmm6,2 > + and r12d,r9d > + xor r13d,r9d > + add eax,DWORD[28+rsp] > + pxor xmm7,xmm6 > + mov edi,ebx > + xor r12d,r11d > + ror r14d,11 > + pshufd xmm7,xmm7,8 > + xor edi,ecx > + add eax,r12d > + movdqa xmm6,XMMWORD[32+rbp] > + ror r13d,6 > + and r15d,edi > + pslldq xmm7,8 > + xor r14d,ebx > + add eax,r13d > + xor r15d,ecx > + paddd xmm1,xmm7 > + ror r14d,2 > + add r8d,eax > + add eax,r15d > + paddd xmm6,xmm1 > + mov r13d,r8d > + add r14d,eax > + movdqa XMMWORD[16+rsp],xmm6 > + ror r13d,14 > + movdqa xmm4,xmm3 > + mov eax,r14d > + mov r12d,r9d > + movdqa xmm7,xmm1 > + ror r14d,9 > + xor r13d,r8d > + xor r12d,r10d > + ror r13d,5 > + xor r14d,eax > +DB 102,15,58,15,226,4 > + and r12d,r8d > + xor r13d,r8d > +DB 102,15,58,15,248,4 > + add r11d,DWORD[32+rsp] > + mov r15d,eax > + xor r12d,r10d > + ror r14d,11 > + movdqa xmm5,xmm4 > + xor r15d,ebx > + add r11d,r12d > + movdqa xmm6,xmm4 > + ror r13d,6 > + and edi,r15d > + psrld xmm4,3 > + xor r14d,eax > + add r11d,r13d > + xor edi,ebx > + paddd xmm2,xmm7 > + ror r14d,2 > + add edx,r11d > + psrld xmm6,7 > + add r11d,edi > + mov r13d,edx > + pshufd xmm7,xmm1,250 > + add r14d,r11d > + ror r13d,14 > + pslld xmm5,14 > + mov r11d,r14d > + mov r12d,r8d > + pxor xmm4,xmm6 > + ror r14d,9 > + xor r13d,edx > + xor r12d,r9d > + ror r13d,5 > + psrld xmm6,11 > + xor r14d,r11d > + pxor xmm4,xmm5 > + and r12d,edx > + xor r13d,edx > + pslld xmm5,11 > + add r10d,DWORD[36+rsp] > + mov edi,r11d > + pxor xmm4,xmm6 > + xor r12d,r9d > + ror r14d,11 > + movdqa xmm6,xmm7 > + xor edi,eax > + add r10d,r12d > + pxor xmm4,xmm5 > + ror r13d,6 > + and r15d,edi > + xor r14d,r11d > + psrld xmm7,10 > + add r10d,r13d > + xor r15d,eax > + paddd xmm2,xmm4 > + ror r14d,2 > + add ecx,r10d > + psrlq xmm6,17 > + add r10d,r15d > + mov r13d,ecx > + add r14d,r10d > + pxor xmm7,xmm6 > + ror r13d,14 > + mov r10d,r14d > + mov r12d,edx > + ror r14d,9 > + psrlq xmm6,2 > + xor r13d,ecx > + xor r12d,r8d > + pxor xmm7,xmm6 > + ror r13d,5 > + xor r14d,r10d > + and r12d,ecx > + pshufd xmm7,xmm7,128 > + xor r13d,ecx > + add r9d,DWORD[40+rsp] > + mov r15d,r10d > + psrldq xmm7,8 > + xor r12d,r8d > + ror r14d,11 > + xor r15d,r11d > + add r9d,r12d > + ror r13d,6 > + paddd xmm2,xmm7 > + and edi,r15d > + xor r14d,r10d > + add r9d,r13d > + pshufd xmm7,xmm2,80 > + xor edi,r11d 
> + ror r14d,2 > + add ebx,r9d > + movdqa xmm6,xmm7 > + add r9d,edi > + mov r13d,ebx > + psrld xmm7,10 > + add r14d,r9d > + ror r13d,14 > + psrlq xmm6,17 > + mov r9d,r14d > + mov r12d,ecx > + pxor xmm7,xmm6 > + ror r14d,9 > + xor r13d,ebx > + xor r12d,edx > + ror r13d,5 > + xor r14d,r9d > + psrlq xmm6,2 > + and r12d,ebx > + xor r13d,ebx > + add r8d,DWORD[44+rsp] > + pxor xmm7,xmm6 > + mov edi,r9d > + xor r12d,edx > + ror r14d,11 > + pshufd xmm7,xmm7,8 > + xor edi,r10d > + add r8d,r12d > + movdqa xmm6,XMMWORD[64+rbp] > + ror r13d,6 > + and r15d,edi > + pslldq xmm7,8 > + xor r14d,r9d > + add r8d,r13d > + xor r15d,r10d > + paddd xmm2,xmm7 > + ror r14d,2 > + add eax,r8d > + add r8d,r15d > + paddd xmm6,xmm2 > + mov r13d,eax > + add r14d,r8d > + movdqa XMMWORD[32+rsp],xmm6 > + ror r13d,14 > + movdqa xmm4,xmm0 > + mov r8d,r14d > + mov r12d,ebx > + movdqa xmm7,xmm2 > + ror r14d,9 > + xor r13d,eax > + xor r12d,ecx > + ror r13d,5 > + xor r14d,r8d > +DB 102,15,58,15,227,4 > + and r12d,eax > + xor r13d,eax > +DB 102,15,58,15,249,4 > + add edx,DWORD[48+rsp] > + mov r15d,r8d > + xor r12d,ecx > + ror r14d,11 > + movdqa xmm5,xmm4 > + xor r15d,r9d > + add edx,r12d > + movdqa xmm6,xmm4 > + ror r13d,6 > + and edi,r15d > + psrld xmm4,3 > + xor r14d,r8d > + add edx,r13d > + xor edi,r9d > + paddd xmm3,xmm7 > + ror r14d,2 > + add r11d,edx > + psrld xmm6,7 > + add edx,edi > + mov r13d,r11d > + pshufd xmm7,xmm2,250 > + add r14d,edx > + ror r13d,14 > + pslld xmm5,14 > + mov edx,r14d > + mov r12d,eax > + pxor xmm4,xmm6 > + ror r14d,9 > + xor r13d,r11d > + xor r12d,ebx > + ror r13d,5 > + psrld xmm6,11 > + xor r14d,edx > + pxor xmm4,xmm5 > + and r12d,r11d > + xor r13d,r11d > + pslld xmm5,11 > + add ecx,DWORD[52+rsp] > + mov edi,edx > + pxor xmm4,xmm6 > + xor r12d,ebx > + ror r14d,11 > + movdqa xmm6,xmm7 > + xor edi,r8d > + add ecx,r12d > + pxor xmm4,xmm5 > + ror r13d,6 > + and r15d,edi > + xor r14d,edx > + psrld xmm7,10 > + add ecx,r13d > + xor r15d,r8d > + paddd xmm3,xmm4 > + ror r14d,2 > + add r10d,ecx > + psrlq xmm6,17 > + add ecx,r15d > + mov r13d,r10d > + add r14d,ecx > + pxor xmm7,xmm6 > + ror r13d,14 > + mov ecx,r14d > + mov r12d,r11d > + ror r14d,9 > + psrlq xmm6,2 > + xor r13d,r10d > + xor r12d,eax > + pxor xmm7,xmm6 > + ror r13d,5 > + xor r14d,ecx > + and r12d,r10d > + pshufd xmm7,xmm7,128 > + xor r13d,r10d > + add ebx,DWORD[56+rsp] > + mov r15d,ecx > + psrldq xmm7,8 > + xor r12d,eax > + ror r14d,11 > + xor r15d,edx > + add ebx,r12d > + ror r13d,6 > + paddd xmm3,xmm7 > + and edi,r15d > + xor r14d,ecx > + add ebx,r13d > + pshufd xmm7,xmm3,80 > + xor edi,edx > + ror r14d,2 > + add r9d,ebx > + movdqa xmm6,xmm7 > + add ebx,edi > + mov r13d,r9d > + psrld xmm7,10 > + add r14d,ebx > + ror r13d,14 > + psrlq xmm6,17 > + mov ebx,r14d > + mov r12d,r10d > + pxor xmm7,xmm6 > + ror r14d,9 > + xor r13d,r9d > + xor r12d,r11d > + ror r13d,5 > + xor r14d,ebx > + psrlq xmm6,2 > + and r12d,r9d > + xor r13d,r9d > + add eax,DWORD[60+rsp] > + pxor xmm7,xmm6 > + mov edi,ebx > + xor r12d,r11d > + ror r14d,11 > + pshufd xmm7,xmm7,8 > + xor edi,ecx > + add eax,r12d > + movdqa xmm6,XMMWORD[96+rbp] > + ror r13d,6 > + and r15d,edi > + pslldq xmm7,8 > + xor r14d,ebx > + add eax,r13d > + xor r15d,ecx > + paddd xmm3,xmm7 > + ror r14d,2 > + add r8d,eax > + add eax,r15d > + paddd xmm6,xmm3 > + mov r13d,r8d > + add r14d,eax > + movdqa XMMWORD[48+rsp],xmm6 > + cmp BYTE[131+rbp],0 > + jne NEAR $L$ssse3_00_47 > + ror r13d,14 > + mov eax,r14d > + mov r12d,r9d > + ror r14d,9 > + xor r13d,r8d > + xor r12d,r10d > + ror r13d,5 > + xor r14d,eax > + and 
r12d,r8d > + xor r13d,r8d > + add r11d,DWORD[rsp] > + mov r15d,eax > + xor r12d,r10d > + ror r14d,11 > + xor r15d,ebx > + add r11d,r12d > + ror r13d,6 > + and edi,r15d > + xor r14d,eax > + add r11d,r13d > + xor edi,ebx > + ror r14d,2 > + add edx,r11d > + add r11d,edi > + mov r13d,edx > + add r14d,r11d > + ror r13d,14 > + mov r11d,r14d > + mov r12d,r8d > + ror r14d,9 > + xor r13d,edx > + xor r12d,r9d > + ror r13d,5 > + xor r14d,r11d > + and r12d,edx > + xor r13d,edx > + add r10d,DWORD[4+rsp] > + mov edi,r11d > + xor r12d,r9d > + ror r14d,11 > + xor edi,eax > + add r10d,r12d > + ror r13d,6 > + and r15d,edi > + xor r14d,r11d > + add r10d,r13d > + xor r15d,eax > + ror r14d,2 > + add ecx,r10d > + add r10d,r15d > + mov r13d,ecx > + add r14d,r10d > + ror r13d,14 > + mov r10d,r14d > + mov r12d,edx > + ror r14d,9 > + xor r13d,ecx > + xor r12d,r8d > + ror r13d,5 > + xor r14d,r10d > + and r12d,ecx > + xor r13d,ecx > + add r9d,DWORD[8+rsp] > + mov r15d,r10d > + xor r12d,r8d > + ror r14d,11 > + xor r15d,r11d > + add r9d,r12d > + ror r13d,6 > + and edi,r15d > + xor r14d,r10d > + add r9d,r13d > + xor edi,r11d > + ror r14d,2 > + add ebx,r9d > + add r9d,edi > + mov r13d,ebx > + add r14d,r9d > + ror r13d,14 > + mov r9d,r14d > + mov r12d,ecx > + ror r14d,9 > + xor r13d,ebx > + xor r12d,edx > + ror r13d,5 > + xor r14d,r9d > + and r12d,ebx > + xor r13d,ebx > + add r8d,DWORD[12+rsp] > + mov edi,r9d > + xor r12d,edx > + ror r14d,11 > + xor edi,r10d > + add r8d,r12d > + ror r13d,6 > + and r15d,edi > + xor r14d,r9d > + add r8d,r13d > + xor r15d,r10d > + ror r14d,2 > + add eax,r8d > + add r8d,r15d > + mov r13d,eax > + add r14d,r8d > + ror r13d,14 > + mov r8d,r14d > + mov r12d,ebx > + ror r14d,9 > + xor r13d,eax > + xor r12d,ecx > + ror r13d,5 > + xor r14d,r8d > + and r12d,eax > + xor r13d,eax > + add edx,DWORD[16+rsp] > + mov r15d,r8d > + xor r12d,ecx > + ror r14d,11 > + xor r15d,r9d > + add edx,r12d > + ror r13d,6 > + and edi,r15d > + xor r14d,r8d > + add edx,r13d > + xor edi,r9d > + ror r14d,2 > + add r11d,edx > + add edx,edi > + mov r13d,r11d > + add r14d,edx > + ror r13d,14 > + mov edx,r14d > + mov r12d,eax > + ror r14d,9 > + xor r13d,r11d > + xor r12d,ebx > + ror r13d,5 > + xor r14d,edx > + and r12d,r11d > + xor r13d,r11d > + add ecx,DWORD[20+rsp] > + mov edi,edx > + xor r12d,ebx > + ror r14d,11 > + xor edi,r8d > + add ecx,r12d > + ror r13d,6 > + and r15d,edi > + xor r14d,edx > + add ecx,r13d > + xor r15d,r8d > + ror r14d,2 > + add r10d,ecx > + add ecx,r15d > + mov r13d,r10d > + add r14d,ecx > + ror r13d,14 > + mov ecx,r14d > + mov r12d,r11d > + ror r14d,9 > + xor r13d,r10d > + xor r12d,eax > + ror r13d,5 > + xor r14d,ecx > + and r12d,r10d > + xor r13d,r10d > + add ebx,DWORD[24+rsp] > + mov r15d,ecx > + xor r12d,eax > + ror r14d,11 > + xor r15d,edx > + add ebx,r12d > + ror r13d,6 > + and edi,r15d > + xor r14d,ecx > + add ebx,r13d > + xor edi,edx > + ror r14d,2 > + add r9d,ebx > + add ebx,edi > + mov r13d,r9d > + add r14d,ebx > + ror r13d,14 > + mov ebx,r14d > + mov r12d,r10d > + ror r14d,9 > + xor r13d,r9d > + xor r12d,r11d > + ror r13d,5 > + xor r14d,ebx > + and r12d,r9d > + xor r13d,r9d > + add eax,DWORD[28+rsp] > + mov edi,ebx > + xor r12d,r11d > + ror r14d,11 > + xor edi,ecx > + add eax,r12d > + ror r13d,6 > + and r15d,edi > + xor r14d,ebx > + add eax,r13d > + xor r15d,ecx > + ror r14d,2 > + add r8d,eax > + add eax,r15d > + mov r13d,r8d > + add r14d,eax > + ror r13d,14 > + mov eax,r14d > + mov r12d,r9d > + ror r14d,9 > + xor r13d,r8d > + xor r12d,r10d > + ror r13d,5 > + xor r14d,eax > + and r12d,r8d > + 
xor r13d,r8d > + add r11d,DWORD[32+rsp] > + mov r15d,eax > + xor r12d,r10d > + ror r14d,11 > + xor r15d,ebx > + add r11d,r12d > + ror r13d,6 > + and edi,r15d > + xor r14d,eax > + add r11d,r13d > + xor edi,ebx > + ror r14d,2 > + add edx,r11d > + add r11d,edi > + mov r13d,edx > + add r14d,r11d > + ror r13d,14 > + mov r11d,r14d > + mov r12d,r8d > + ror r14d,9 > + xor r13d,edx > + xor r12d,r9d > + ror r13d,5 > + xor r14d,r11d > + and r12d,edx > + xor r13d,edx > + add r10d,DWORD[36+rsp] > + mov edi,r11d > + xor r12d,r9d > + ror r14d,11 > + xor edi,eax > + add r10d,r12d > + ror r13d,6 > + and r15d,edi > + xor r14d,r11d > + add r10d,r13d > + xor r15d,eax > + ror r14d,2 > + add ecx,r10d > + add r10d,r15d > + mov r13d,ecx > + add r14d,r10d > + ror r13d,14 > + mov r10d,r14d > + mov r12d,edx > + ror r14d,9 > + xor r13d,ecx > + xor r12d,r8d > + ror r13d,5 > + xor r14d,r10d > + and r12d,ecx > + xor r13d,ecx > + add r9d,DWORD[40+rsp] > + mov r15d,r10d > + xor r12d,r8d > + ror r14d,11 > + xor r15d,r11d > + add r9d,r12d > + ror r13d,6 > + and edi,r15d > + xor r14d,r10d > + add r9d,r13d > + xor edi,r11d > + ror r14d,2 > + add ebx,r9d > + add r9d,edi > + mov r13d,ebx > + add r14d,r9d > + ror r13d,14 > + mov r9d,r14d > + mov r12d,ecx > + ror r14d,9 > + xor r13d,ebx > + xor r12d,edx > + ror r13d,5 > + xor r14d,r9d > + and r12d,ebx > + xor r13d,ebx > + add r8d,DWORD[44+rsp] > + mov edi,r9d > + xor r12d,edx > + ror r14d,11 > + xor edi,r10d > + add r8d,r12d > + ror r13d,6 > + and r15d,edi > + xor r14d,r9d > + add r8d,r13d > + xor r15d,r10d > + ror r14d,2 > + add eax,r8d > + add r8d,r15d > + mov r13d,eax > + add r14d,r8d > + ror r13d,14 > + mov r8d,r14d > + mov r12d,ebx > + ror r14d,9 > + xor r13d,eax > + xor r12d,ecx > + ror r13d,5 > + xor r14d,r8d > + and r12d,eax > + xor r13d,eax > + add edx,DWORD[48+rsp] > + mov r15d,r8d > + xor r12d,ecx > + ror r14d,11 > + xor r15d,r9d > + add edx,r12d > + ror r13d,6 > + and edi,r15d > + xor r14d,r8d > + add edx,r13d > + xor edi,r9d > + ror r14d,2 > + add r11d,edx > + add edx,edi > + mov r13d,r11d > + add r14d,edx > + ror r13d,14 > + mov edx,r14d > + mov r12d,eax > + ror r14d,9 > + xor r13d,r11d > + xor r12d,ebx > + ror r13d,5 > + xor r14d,edx > + and r12d,r11d > + xor r13d,r11d > + add ecx,DWORD[52+rsp] > + mov edi,edx > + xor r12d,ebx > + ror r14d,11 > + xor edi,r8d > + add ecx,r12d > + ror r13d,6 > + and r15d,edi > + xor r14d,edx > + add ecx,r13d > + xor r15d,r8d > + ror r14d,2 > + add r10d,ecx > + add ecx,r15d > + mov r13d,r10d > + add r14d,ecx > + ror r13d,14 > + mov ecx,r14d > + mov r12d,r11d > + ror r14d,9 > + xor r13d,r10d > + xor r12d,eax > + ror r13d,5 > + xor r14d,ecx > + and r12d,r10d > + xor r13d,r10d > + add ebx,DWORD[56+rsp] > + mov r15d,ecx > + xor r12d,eax > + ror r14d,11 > + xor r15d,edx > + add ebx,r12d > + ror r13d,6 > + and edi,r15d > + xor r14d,ecx > + add ebx,r13d > + xor edi,edx > + ror r14d,2 > + add r9d,ebx > + add ebx,edi > + mov r13d,r9d > + add r14d,ebx > + ror r13d,14 > + mov ebx,r14d > + mov r12d,r10d > + ror r14d,9 > + xor r13d,r9d > + xor r12d,r11d > + ror r13d,5 > + xor r14d,ebx > + and r12d,r9d > + xor r13d,r9d > + add eax,DWORD[60+rsp] > + mov edi,ebx > + xor r12d,r11d > + ror r14d,11 > + xor edi,ecx > + add eax,r12d > + ror r13d,6 > + and r15d,edi > + xor r14d,ebx > + add eax,r13d > + xor r15d,ecx > + ror r14d,2 > + add r8d,eax > + add eax,r15d > + mov r13d,r8d > + add r14d,eax > + mov rdi,QWORD[((64+0))+rsp] > + mov eax,r14d > + > + add eax,DWORD[rdi] > + lea rsi,[64+rsi] > + add ebx,DWORD[4+rdi] > + add ecx,DWORD[8+rdi] > + add 
edx,DWORD[12+rdi] > + add r8d,DWORD[16+rdi] > + add r9d,DWORD[20+rdi] > + add r10d,DWORD[24+rdi] > + add r11d,DWORD[28+rdi] > + > + cmp rsi,QWORD[((64+16))+rsp] > + > + mov DWORD[rdi],eax > + mov DWORD[4+rdi],ebx > + mov DWORD[8+rdi],ecx > + mov DWORD[12+rdi],edx > + mov DWORD[16+rdi],r8d > + mov DWORD[20+rdi],r9d > + mov DWORD[24+rdi],r10d > + mov DWORD[28+rdi],r11d > + jb NEAR $L$loop_ssse3 > + > + mov rsi,QWORD[88+rsp] > + > + movaps xmm6,XMMWORD[((64+32))+rsp] > + movaps xmm7,XMMWORD[((64+48))+rsp] > + movaps xmm8,XMMWORD[((64+64))+rsp] > + movaps xmm9,XMMWORD[((64+80))+rsp] > + mov r15,QWORD[((-48))+rsi] > + > + mov r14,QWORD[((-40))+rsi] > + > + mov r13,QWORD[((-32))+rsi] > + > + mov r12,QWORD[((-24))+rsi] > + > + mov rbp,QWORD[((-16))+rsi] > + > + mov rbx,QWORD[((-8))+rsi] > + > + lea rsp,[rsi] > + > +$L$epilogue_ssse3: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_sha256_block_data_order_ssse3: > +EXTERN __imp_RtlVirtualUnwind > + > +ALIGN 16 > +se_handler: > + push rsi > + push rdi > + push rbx > + push rbp > + push r12 > + push r13 > + push r14 > + push r15 > + pushfq > + sub rsp,64 > + > + mov rax,QWORD[120+r8] > + mov rbx,QWORD[248+r8] > + > + mov rsi,QWORD[8+r9] > + mov r11,QWORD[56+r9] > + > + mov r10d,DWORD[r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jb NEAR $L$in_prologue > + > + mov rax,QWORD[152+r8] > + > + mov r10d,DWORD[4+r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jae NEAR $L$in_prologue > + mov rsi,rax > + mov rax,QWORD[((64+24))+rax] > + > + mov rbx,QWORD[((-8))+rax] > + mov rbp,QWORD[((-16))+rax] > + mov r12,QWORD[((-24))+rax] > + mov r13,QWORD[((-32))+rax] > + mov r14,QWORD[((-40))+rax] > + mov r15,QWORD[((-48))+rax] > + mov QWORD[144+r8],rbx > + mov QWORD[160+r8],rbp > + mov QWORD[216+r8],r12 > + mov QWORD[224+r8],r13 > + mov QWORD[232+r8],r14 > + mov QWORD[240+r8],r15 > + > + lea r10,[$L$epilogue] > + cmp rbx,r10 > + jb NEAR $L$in_prologue > + > + lea rsi,[((64+32))+rsi] > + lea rdi,[512+r8] > + mov ecx,8 > + DD 0xa548f3fc > + > +$L$in_prologue: > + mov rdi,QWORD[8+rax] > + mov rsi,QWORD[16+rax] > + mov QWORD[152+r8],rax > + mov QWORD[168+r8],rsi > + mov QWORD[176+r8],rdi > + > + mov rdi,QWORD[40+r9] > + mov rsi,r8 > + mov ecx,154 > + DD 0xa548f3fc > + > + mov rsi,r9 > + xor rcx,rcx > + mov rdx,QWORD[8+rsi] > + mov r8,QWORD[rsi] > + mov r9,QWORD[16+rsi] > + mov r10,QWORD[40+rsi] > + lea r11,[56+rsi] > + lea r12,[24+rsi] > + mov QWORD[32+rsp],r10 > + mov QWORD[40+rsp],r11 > + mov QWORD[48+rsp],r12 > + mov QWORD[56+rsp],rcx > + call QWORD[__imp_RtlVirtualUnwind] > + > + mov eax,1 > + add rsp,64 > + popfq > + pop r15 > + pop r14 > + pop r13 > + pop r12 > + pop rbp > + pop rbx > + pop rdi > + pop rsi > + DB 0F3h,0C3h ;repret > + > + > +ALIGN 16 > +shaext_handler: > + push rsi > + push rdi > + push rbx > + push rbp > + push r12 > + push r13 > + push r14 > + push r15 > + pushfq > + sub rsp,64 > + > + mov rax,QWORD[120+r8] > + mov rbx,QWORD[248+r8] > + > + lea r10,[$L$prologue_shaext] > + cmp rbx,r10 > + jb NEAR $L$in_prologue > + > + lea r10,[$L$epilogue_shaext] > + cmp rbx,r10 > + jae NEAR $L$in_prologue > + > + lea rsi,[((-8-80))+rax] > + lea rdi,[512+r8] > + mov ecx,10 > + DD 0xa548f3fc > + > + jmp NEAR $L$in_prologue > + > +section .pdata rdata align=4 > +ALIGN 4 > + DD $L$SEH_begin_sha256_block_data_order wrt ..imagebase > + DD $L$SEH_end_sha256_block_data_order wrt ..imagebase > + DD $L$SEH_info_sha256_block_data_order wrt ..imagebase > + DD $L$SEH_begin_sha256_block_data_order_shaext wrt 
..imagebase > + DD $L$SEH_end_sha256_block_data_order_shaext wrt ..imagebase > + DD $L$SEH_info_sha256_block_data_order_shaext wrt ..imagebase > + DD $L$SEH_begin_sha256_block_data_order_ssse3 wrt ..imagebase > + DD $L$SEH_end_sha256_block_data_order_ssse3 wrt ..imagebase > + DD $L$SEH_info_sha256_block_data_order_ssse3 wrt ..imagebase > +section .xdata rdata align=8 > +ALIGN 8 > +$L$SEH_info_sha256_block_data_order: > +DB 9,0,0,0 > + DD se_handler wrt ..imagebase > + DD $L$prologue wrt ..imagebase,$L$epilogue wrt ..imagebase > +$L$SEH_info_sha256_block_data_order_shaext: > +DB 9,0,0,0 > + DD shaext_handler wrt ..imagebase > +$L$SEH_info_sha256_block_data_order_ssse3: > +DB 9,0,0,0 > + DD se_handler wrt ..imagebase > + DD $L$prologue_ssse3 wrt ..imagebase,$L$epilogue_ssse3 > wrt ..imagebase > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm > b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm > new file mode 100644 > index 0000000000..c6397d4393 > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm > @@ -0,0 +1,1938 @@ > +; WARNING: do not edit! > +; Generated from openssl/crypto/sha/asm/sha512-x86_64.pl > +; > +; Copyright 2005-2020 The OpenSSL Project Authors. All Rights Reserved. > +; > +; Licensed under the OpenSSL license (the "License"). You may not use > +; this file except in compliance with the License. You can obtain a copy > +; in the file LICENSE in the source distribution or at > +; https://www.openssl.org/source/license.html > + > +default rel > +%define XMMWORD > +%define YMMWORD > +%define ZMMWORD > +section .text code align=64 > + > + > +EXTERN OPENSSL_ia32cap_P > +global sha512_block_data_order > + > +ALIGN 16 > +sha512_block_data_order: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_sha512_block_data_order: > + mov rdi,rcx > + mov rsi,rdx > + mov rdx,r8 > + > + > + > + mov rax,rsp > + > + push rbx > + > + push rbp > + > + push r12 > + > + push r13 > + > + push r14 > + > + push r15 > + > + shl rdx,4 > + sub rsp,16*8+4*8 > + lea rdx,[rdx*8+rsi] > + and rsp,-64 > + mov QWORD[((128+0))+rsp],rdi > + mov QWORD[((128+8))+rsp],rsi > + mov QWORD[((128+16))+rsp],rdx > + mov QWORD[152+rsp],rax > + > +$L$prologue: > + > + mov rax,QWORD[rdi] > + mov rbx,QWORD[8+rdi] > + mov rcx,QWORD[16+rdi] > + mov rdx,QWORD[24+rdi] > + mov r8,QWORD[32+rdi] > + mov r9,QWORD[40+rdi] > + mov r10,QWORD[48+rdi] > + mov r11,QWORD[56+rdi] > + jmp NEAR $L$loop > + > +ALIGN 16 > +$L$loop: > + mov rdi,rbx > + lea rbp,[K512] > + xor rdi,rcx > + mov r12,QWORD[rsi] > + mov r13,r8 > + mov r14,rax > + bswap r12 > + ror r13,23 > + mov r15,r9 > + > + xor r13,r8 > + ror r14,5 > + xor r15,r10 > + > + mov QWORD[rsp],r12 > + xor r14,rax > + and r15,r8 > + > + ror r13,4 > + add r12,r11 > + xor r15,r10 > + > + ror r14,6 > + xor r13,r8 > + add r12,r15 > + > + mov r15,rax > + add r12,QWORD[rbp] > + xor r14,rax > + > + xor r15,rbx > + ror r13,14 > + mov r11,rbx > + > + and rdi,r15 > + ror r14,28 > + add r12,r13 > + > + xor r11,rdi > + add rdx,r12 > + add r11,r12 > + > + lea rbp,[8+rbp] > + add r11,r14 > + mov r12,QWORD[8+rsi] > + mov r13,rdx > + mov r14,r11 > + bswap r12 > + ror r13,23 > + mov rdi,r8 > + > + xor r13,rdx > + ror r14,5 > + xor rdi,r9 > + > + mov QWORD[8+rsp],r12 > + xor r14,r11 > + and rdi,rdx > + > + ror r13,4 > + add r12,r10 > + xor rdi,r9 > + > + ror r14,6 > + xor r13,rdx > + add r12,rdi > + > + mov rdi,r11 > + add r12,QWORD[rbp] > + xor r14,r11 > + > + xor rdi,rax > + ror r13,14 > 
+ mov r10,rax > + > + and r15,rdi > + ror r14,28 > + add r12,r13 > + > + xor r10,r15 > + add rcx,r12 > + add r10,r12 > + > + lea rbp,[24+rbp] > + add r10,r14 > + mov r12,QWORD[16+rsi] > + mov r13,rcx > + mov r14,r10 > + bswap r12 > + ror r13,23 > + mov r15,rdx > + > + xor r13,rcx > + ror r14,5 > + xor r15,r8 > + > + mov QWORD[16+rsp],r12 > + xor r14,r10 > + and r15,rcx > + > + ror r13,4 > + add r12,r9 > + xor r15,r8 > + > + ror r14,6 > + xor r13,rcx > + add r12,r15 > + > + mov r15,r10 > + add r12,QWORD[rbp] > + xor r14,r10 > + > + xor r15,r11 > + ror r13,14 > + mov r9,r11 > + > + and rdi,r15 > + ror r14,28 > + add r12,r13 > + > + xor r9,rdi > + add rbx,r12 > + add r9,r12 > + > + lea rbp,[8+rbp] > + add r9,r14 > + mov r12,QWORD[24+rsi] > + mov r13,rbx > + mov r14,r9 > + bswap r12 > + ror r13,23 > + mov rdi,rcx > + > + xor r13,rbx > + ror r14,5 > + xor rdi,rdx > + > + mov QWORD[24+rsp],r12 > + xor r14,r9 > + and rdi,rbx > + > + ror r13,4 > + add r12,r8 > + xor rdi,rdx > + > + ror r14,6 > + xor r13,rbx > + add r12,rdi > + > + mov rdi,r9 > + add r12,QWORD[rbp] > + xor r14,r9 > + > + xor rdi,r10 > + ror r13,14 > + mov r8,r10 > + > + and r15,rdi > + ror r14,28 > + add r12,r13 > + > + xor r8,r15 > + add rax,r12 > + add r8,r12 > + > + lea rbp,[24+rbp] > + add r8,r14 > + mov r12,QWORD[32+rsi] > + mov r13,rax > + mov r14,r8 > + bswap r12 > + ror r13,23 > + mov r15,rbx > + > + xor r13,rax > + ror r14,5 > + xor r15,rcx > + > + mov QWORD[32+rsp],r12 > + xor r14,r8 > + and r15,rax > + > + ror r13,4 > + add r12,rdx > + xor r15,rcx > + > + ror r14,6 > + xor r13,rax > + add r12,r15 > + > + mov r15,r8 > + add r12,QWORD[rbp] > + xor r14,r8 > + > + xor r15,r9 > + ror r13,14 > + mov rdx,r9 > + > + and rdi,r15 > + ror r14,28 > + add r12,r13 > + > + xor rdx,rdi > + add r11,r12 > + add rdx,r12 > + > + lea rbp,[8+rbp] > + add rdx,r14 > + mov r12,QWORD[40+rsi] > + mov r13,r11 > + mov r14,rdx > + bswap r12 > + ror r13,23 > + mov rdi,rax > + > + xor r13,r11 > + ror r14,5 > + xor rdi,rbx > + > + mov QWORD[40+rsp],r12 > + xor r14,rdx > + and rdi,r11 > + > + ror r13,4 > + add r12,rcx > + xor rdi,rbx > + > + ror r14,6 > + xor r13,r11 > + add r12,rdi > + > + mov rdi,rdx > + add r12,QWORD[rbp] > + xor r14,rdx > + > + xor rdi,r8 > + ror r13,14 > + mov rcx,r8 > + > + and r15,rdi > + ror r14,28 > + add r12,r13 > + > + xor rcx,r15 > + add r10,r12 > + add rcx,r12 > + > + lea rbp,[24+rbp] > + add rcx,r14 > + mov r12,QWORD[48+rsi] > + mov r13,r10 > + mov r14,rcx > + bswap r12 > + ror r13,23 > + mov r15,r11 > + > + xor r13,r10 > + ror r14,5 > + xor r15,rax > + > + mov QWORD[48+rsp],r12 > + xor r14,rcx > + and r15,r10 > + > + ror r13,4 > + add r12,rbx > + xor r15,rax > + > + ror r14,6 > + xor r13,r10 > + add r12,r15 > + > + mov r15,rcx > + add r12,QWORD[rbp] > + xor r14,rcx > + > + xor r15,rdx > + ror r13,14 > + mov rbx,rdx > + > + and rdi,r15 > + ror r14,28 > + add r12,r13 > + > + xor rbx,rdi > + add r9,r12 > + add rbx,r12 > + > + lea rbp,[8+rbp] > + add rbx,r14 > + mov r12,QWORD[56+rsi] > + mov r13,r9 > + mov r14,rbx > + bswap r12 > + ror r13,23 > + mov rdi,r10 > + > + xor r13,r9 > + ror r14,5 > + xor rdi,r11 > + > + mov QWORD[56+rsp],r12 > + xor r14,rbx > + and rdi,r9 > + > + ror r13,4 > + add r12,rax > + xor rdi,r11 > + > + ror r14,6 > + xor r13,r9 > + add r12,rdi > + > + mov rdi,rbx > + add r12,QWORD[rbp] > + xor r14,rbx > + > + xor rdi,rcx > + ror r13,14 > + mov rax,rcx > + > + and r15,rdi > + ror r14,28 > + add r12,r13 > + > + xor rax,r15 > + add r8,r12 > + add rax,r12 > + > + lea rbp,[24+rbp] > + add rax,r14 > + mov 
r12,QWORD[64+rsi] > + mov r13,r8 > + mov r14,rax > + bswap r12 > + ror r13,23 > + mov r15,r9 > + > + xor r13,r8 > + ror r14,5 > + xor r15,r10 > + > + mov QWORD[64+rsp],r12 > + xor r14,rax > + and r15,r8 > + > + ror r13,4 > + add r12,r11 > + xor r15,r10 > + > + ror r14,6 > + xor r13,r8 > + add r12,r15 > + > + mov r15,rax > + add r12,QWORD[rbp] > + xor r14,rax > + > + xor r15,rbx > + ror r13,14 > + mov r11,rbx > + > + and rdi,r15 > + ror r14,28 > + add r12,r13 > + > + xor r11,rdi > + add rdx,r12 > + add r11,r12 > + > + lea rbp,[8+rbp] > + add r11,r14 > + mov r12,QWORD[72+rsi] > + mov r13,rdx > + mov r14,r11 > + bswap r12 > + ror r13,23 > + mov rdi,r8 > + > + xor r13,rdx > + ror r14,5 > + xor rdi,r9 > + > + mov QWORD[72+rsp],r12 > + xor r14,r11 > + and rdi,rdx > + > + ror r13,4 > + add r12,r10 > + xor rdi,r9 > + > + ror r14,6 > + xor r13,rdx > + add r12,rdi > + > + mov rdi,r11 > + add r12,QWORD[rbp] > + xor r14,r11 > + > + xor rdi,rax > + ror r13,14 > + mov r10,rax > + > + and r15,rdi > + ror r14,28 > + add r12,r13 > + > + xor r10,r15 > + add rcx,r12 > + add r10,r12 > + > + lea rbp,[24+rbp] > + add r10,r14 > + mov r12,QWORD[80+rsi] > + mov r13,rcx > + mov r14,r10 > + bswap r12 > + ror r13,23 > + mov r15,rdx > + > + xor r13,rcx > + ror r14,5 > + xor r15,r8 > + > + mov QWORD[80+rsp],r12 > + xor r14,r10 > + and r15,rcx > + > + ror r13,4 > + add r12,r9 > + xor r15,r8 > + > + ror r14,6 > + xor r13,rcx > + add r12,r15 > + > + mov r15,r10 > + add r12,QWORD[rbp] > + xor r14,r10 > + > + xor r15,r11 > + ror r13,14 > + mov r9,r11 > + > + and rdi,r15 > + ror r14,28 > + add r12,r13 > + > + xor r9,rdi > + add rbx,r12 > + add r9,r12 > + > + lea rbp,[8+rbp] > + add r9,r14 > + mov r12,QWORD[88+rsi] > + mov r13,rbx > + mov r14,r9 > + bswap r12 > + ror r13,23 > + mov rdi,rcx > + > + xor r13,rbx > + ror r14,5 > + xor rdi,rdx > + > + mov QWORD[88+rsp],r12 > + xor r14,r9 > + and rdi,rbx > + > + ror r13,4 > + add r12,r8 > + xor rdi,rdx > + > + ror r14,6 > + xor r13,rbx > + add r12,rdi > + > + mov rdi,r9 > + add r12,QWORD[rbp] > + xor r14,r9 > + > + xor rdi,r10 > + ror r13,14 > + mov r8,r10 > + > + and r15,rdi > + ror r14,28 > + add r12,r13 > + > + xor r8,r15 > + add rax,r12 > + add r8,r12 > + > + lea rbp,[24+rbp] > + add r8,r14 > + mov r12,QWORD[96+rsi] > + mov r13,rax > + mov r14,r8 > + bswap r12 > + ror r13,23 > + mov r15,rbx > + > + xor r13,rax > + ror r14,5 > + xor r15,rcx > + > + mov QWORD[96+rsp],r12 > + xor r14,r8 > + and r15,rax > + > + ror r13,4 > + add r12,rdx > + xor r15,rcx > + > + ror r14,6 > + xor r13,rax > + add r12,r15 > + > + mov r15,r8 > + add r12,QWORD[rbp] > + xor r14,r8 > + > + xor r15,r9 > + ror r13,14 > + mov rdx,r9 > + > + and rdi,r15 > + ror r14,28 > + add r12,r13 > + > + xor rdx,rdi > + add r11,r12 > + add rdx,r12 > + > + lea rbp,[8+rbp] > + add rdx,r14 > + mov r12,QWORD[104+rsi] > + mov r13,r11 > + mov r14,rdx > + bswap r12 > + ror r13,23 > + mov rdi,rax > + > + xor r13,r11 > + ror r14,5 > + xor rdi,rbx > + > + mov QWORD[104+rsp],r12 > + xor r14,rdx > + and rdi,r11 > + > + ror r13,4 > + add r12,rcx > + xor rdi,rbx > + > + ror r14,6 > + xor r13,r11 > + add r12,rdi > + > + mov rdi,rdx > + add r12,QWORD[rbp] > + xor r14,rdx > + > + xor rdi,r8 > + ror r13,14 > + mov rcx,r8 > + > + and r15,rdi > + ror r14,28 > + add r12,r13 > + > + xor rcx,r15 > + add r10,r12 > + add rcx,r12 > + > + lea rbp,[24+rbp] > + add rcx,r14 > + mov r12,QWORD[112+rsi] > + mov r13,r10 > + mov r14,rcx > + bswap r12 > + ror r13,23 > + mov r15,r11 > + > + xor r13,r10 > + ror r14,5 > + xor r15,rax > + > + mov 
QWORD[112+rsp],r12 > + xor r14,rcx > + and r15,r10 > + > + ror r13,4 > + add r12,rbx > + xor r15,rax > + > + ror r14,6 > + xor r13,r10 > + add r12,r15 > + > + mov r15,rcx > + add r12,QWORD[rbp] > + xor r14,rcx > + > + xor r15,rdx > + ror r13,14 > + mov rbx,rdx > + > + and rdi,r15 > + ror r14,28 > + add r12,r13 > + > + xor rbx,rdi > + add r9,r12 > + add rbx,r12 > + > + lea rbp,[8+rbp] > + add rbx,r14 > + mov r12,QWORD[120+rsi] > + mov r13,r9 > + mov r14,rbx > + bswap r12 > + ror r13,23 > + mov rdi,r10 > + > + xor r13,r9 > + ror r14,5 > + xor rdi,r11 > + > + mov QWORD[120+rsp],r12 > + xor r14,rbx > + and rdi,r9 > + > + ror r13,4 > + add r12,rax > + xor rdi,r11 > + > + ror r14,6 > + xor r13,r9 > + add r12,rdi > + > + mov rdi,rbx > + add r12,QWORD[rbp] > + xor r14,rbx > + > + xor rdi,rcx > + ror r13,14 > + mov rax,rcx > + > + and r15,rdi > + ror r14,28 > + add r12,r13 > + > + xor rax,r15 > + add r8,r12 > + add rax,r12 > + > + lea rbp,[24+rbp] > + jmp NEAR $L$rounds_16_xx > +ALIGN 16 > +$L$rounds_16_xx: > + mov r13,QWORD[8+rsp] > + mov r15,QWORD[112+rsp] > + > + mov r12,r13 > + ror r13,7 > + add rax,r14 > + mov r14,r15 > + ror r15,42 > + > + xor r13,r12 > + shr r12,7 > + ror r13,1 > + xor r15,r14 > + shr r14,6 > + > + ror r15,19 > + xor r12,r13 > + xor r15,r14 > + add r12,QWORD[72+rsp] > + > + add r12,QWORD[rsp] > + mov r13,r8 > + add r12,r15 > + mov r14,rax > + ror r13,23 > + mov r15,r9 > + > + xor r13,r8 > + ror r14,5 > + xor r15,r10 > + > + mov QWORD[rsp],r12 > + xor r14,rax > + and r15,r8 > + > + ror r13,4 > + add r12,r11 > + xor r15,r10 > + > + ror r14,6 > + xor r13,r8 > + add r12,r15 > + > + mov r15,rax > + add r12,QWORD[rbp] > + xor r14,rax > + > + xor r15,rbx > + ror r13,14 > + mov r11,rbx > + > + and rdi,r15 > + ror r14,28 > + add r12,r13 > + > + xor r11,rdi > + add rdx,r12 > + add r11,r12 > + > + lea rbp,[8+rbp] > + mov r13,QWORD[16+rsp] > + mov rdi,QWORD[120+rsp] > + > + mov r12,r13 > + ror r13,7 > + add r11,r14 > + mov r14,rdi > + ror rdi,42 > + > + xor r13,r12 > + shr r12,7 > + ror r13,1 > + xor rdi,r14 > + shr r14,6 > + > + ror rdi,19 > + xor r12,r13 > + xor rdi,r14 > + add r12,QWORD[80+rsp] > + > + add r12,QWORD[8+rsp] > + mov r13,rdx > + add r12,rdi > + mov r14,r11 > + ror r13,23 > + mov rdi,r8 > + > + xor r13,rdx > + ror r14,5 > + xor rdi,r9 > + > + mov QWORD[8+rsp],r12 > + xor r14,r11 > + and rdi,rdx > + > + ror r13,4 > + add r12,r10 > + xor rdi,r9 > + > + ror r14,6 > + xor r13,rdx > + add r12,rdi > + > + mov rdi,r11 > + add r12,QWORD[rbp] > + xor r14,r11 > + > + xor rdi,rax > + ror r13,14 > + mov r10,rax > + > + and r15,rdi > + ror r14,28 > + add r12,r13 > + > + xor r10,r15 > + add rcx,r12 > + add r10,r12 > + > + lea rbp,[24+rbp] > + mov r13,QWORD[24+rsp] > + mov r15,QWORD[rsp] > + > + mov r12,r13 > + ror r13,7 > + add r10,r14 > + mov r14,r15 > + ror r15,42 > + > + xor r13,r12 > + shr r12,7 > + ror r13,1 > + xor r15,r14 > + shr r14,6 > + > + ror r15,19 > + xor r12,r13 > + xor r15,r14 > + add r12,QWORD[88+rsp] > + > + add r12,QWORD[16+rsp] > + mov r13,rcx > + add r12,r15 > + mov r14,r10 > + ror r13,23 > + mov r15,rdx > + > + xor r13,rcx > + ror r14,5 > + xor r15,r8 > + > + mov QWORD[16+rsp],r12 > + xor r14,r10 > + and r15,rcx > + > + ror r13,4 > + add r12,r9 > + xor r15,r8 > + > + ror r14,6 > + xor r13,rcx > + add r12,r15 > + > + mov r15,r10 > + add r12,QWORD[rbp] > + xor r14,r10 > + > + xor r15,r11 > + ror r13,14 > + mov r9,r11 > + > + and rdi,r15 > + ror r14,28 > + add r12,r13 > + > + xor r9,rdi > + add rbx,r12 > + add r9,r12 > + > + lea rbp,[8+rbp] > + mov r13,QWORD[32+rsp] 
> + mov rdi,QWORD[8+rsp] > + > + mov r12,r13 > + ror r13,7 > + add r9,r14 > + mov r14,rdi > + ror rdi,42 > + > + xor r13,r12 > + shr r12,7 > + ror r13,1 > + xor rdi,r14 > + shr r14,6 > + > + ror rdi,19 > + xor r12,r13 > + xor rdi,r14 > + add r12,QWORD[96+rsp] > + > + add r12,QWORD[24+rsp] > + mov r13,rbx > + add r12,rdi > + mov r14,r9 > + ror r13,23 > + mov rdi,rcx > + > + xor r13,rbx > + ror r14,5 > + xor rdi,rdx > + > + mov QWORD[24+rsp],r12 > + xor r14,r9 > + and rdi,rbx > + > + ror r13,4 > + add r12,r8 > + xor rdi,rdx > + > + ror r14,6 > + xor r13,rbx > + add r12,rdi > + > + mov rdi,r9 > + add r12,QWORD[rbp] > + xor r14,r9 > + > + xor rdi,r10 > + ror r13,14 > + mov r8,r10 > + > + and r15,rdi > + ror r14,28 > + add r12,r13 > + > + xor r8,r15 > + add rax,r12 > + add r8,r12 > + > + lea rbp,[24+rbp] > + mov r13,QWORD[40+rsp] > + mov r15,QWORD[16+rsp] > + > + mov r12,r13 > + ror r13,7 > + add r8,r14 > + mov r14,r15 > + ror r15,42 > + > + xor r13,r12 > + shr r12,7 > + ror r13,1 > + xor r15,r14 > + shr r14,6 > + > + ror r15,19 > + xor r12,r13 > + xor r15,r14 > + add r12,QWORD[104+rsp] > + > + add r12,QWORD[32+rsp] > + mov r13,rax > + add r12,r15 > + mov r14,r8 > + ror r13,23 > + mov r15,rbx > + > + xor r13,rax > + ror r14,5 > + xor r15,rcx > + > + mov QWORD[32+rsp],r12 > + xor r14,r8 > + and r15,rax > + > + ror r13,4 > + add r12,rdx > + xor r15,rcx > + > + ror r14,6 > + xor r13,rax > + add r12,r15 > + > + mov r15,r8 > + add r12,QWORD[rbp] > + xor r14,r8 > + > + xor r15,r9 > + ror r13,14 > + mov rdx,r9 > + > + and rdi,r15 > + ror r14,28 > + add r12,r13 > + > + xor rdx,rdi > + add r11,r12 > + add rdx,r12 > + > + lea rbp,[8+rbp] > + mov r13,QWORD[48+rsp] > + mov rdi,QWORD[24+rsp] > + > + mov r12,r13 > + ror r13,7 > + add rdx,r14 > + mov r14,rdi > + ror rdi,42 > + > + xor r13,r12 > + shr r12,7 > + ror r13,1 > + xor rdi,r14 > + shr r14,6 > + > + ror rdi,19 > + xor r12,r13 > + xor rdi,r14 > + add r12,QWORD[112+rsp] > + > + add r12,QWORD[40+rsp] > + mov r13,r11 > + add r12,rdi > + mov r14,rdx > + ror r13,23 > + mov rdi,rax > + > + xor r13,r11 > + ror r14,5 > + xor rdi,rbx > + > + mov QWORD[40+rsp],r12 > + xor r14,rdx > + and rdi,r11 > + > + ror r13,4 > + add r12,rcx > + xor rdi,rbx > + > + ror r14,6 > + xor r13,r11 > + add r12,rdi > + > + mov rdi,rdx > + add r12,QWORD[rbp] > + xor r14,rdx > + > + xor rdi,r8 > + ror r13,14 > + mov rcx,r8 > + > + and r15,rdi > + ror r14,28 > + add r12,r13 > + > + xor rcx,r15 > + add r10,r12 > + add rcx,r12 > + > + lea rbp,[24+rbp] > + mov r13,QWORD[56+rsp] > + mov r15,QWORD[32+rsp] > + > + mov r12,r13 > + ror r13,7 > + add rcx,r14 > + mov r14,r15 > + ror r15,42 > + > + xor r13,r12 > + shr r12,7 > + ror r13,1 > + xor r15,r14 > + shr r14,6 > + > + ror r15,19 > + xor r12,r13 > + xor r15,r14 > + add r12,QWORD[120+rsp] > + > + add r12,QWORD[48+rsp] > + mov r13,r10 > + add r12,r15 > + mov r14,rcx > + ror r13,23 > + mov r15,r11 > + > + xor r13,r10 > + ror r14,5 > + xor r15,rax > + > + mov QWORD[48+rsp],r12 > + xor r14,rcx > + and r15,r10 > + > + ror r13,4 > + add r12,rbx > + xor r15,rax > + > + ror r14,6 > + xor r13,r10 > + add r12,r15 > + > + mov r15,rcx > + add r12,QWORD[rbp] > + xor r14,rcx > + > + xor r15,rdx > + ror r13,14 > + mov rbx,rdx > + > + and rdi,r15 > + ror r14,28 > + add r12,r13 > + > + xor rbx,rdi > + add r9,r12 > + add rbx,r12 > + > + lea rbp,[8+rbp] > + mov r13,QWORD[64+rsp] > + mov rdi,QWORD[40+rsp] > + > + mov r12,r13 > + ror r13,7 > + add rbx,r14 > + mov r14,rdi > + ror rdi,42 > + > + xor r13,r12 > + shr r12,7 > + ror r13,1 > + xor rdi,r14 > + shr r14,6 
> + > + ror rdi,19 > + xor r12,r13 > + xor rdi,r14 > + add r12,QWORD[rsp] > + > + add r12,QWORD[56+rsp] > + mov r13,r9 > + add r12,rdi > + mov r14,rbx > + ror r13,23 > + mov rdi,r10 > + > + xor r13,r9 > + ror r14,5 > + xor rdi,r11 > + > + mov QWORD[56+rsp],r12 > + xor r14,rbx > + and rdi,r9 > + > + ror r13,4 > + add r12,rax > + xor rdi,r11 > + > + ror r14,6 > + xor r13,r9 > + add r12,rdi > + > + mov rdi,rbx > + add r12,QWORD[rbp] > + xor r14,rbx > + > + xor rdi,rcx > + ror r13,14 > + mov rax,rcx > + > + and r15,rdi > + ror r14,28 > + add r12,r13 > + > + xor rax,r15 > + add r8,r12 > + add rax,r12 > + > + lea rbp,[24+rbp] > + mov r13,QWORD[72+rsp] > + mov r15,QWORD[48+rsp] > + > + mov r12,r13 > + ror r13,7 > + add rax,r14 > + mov r14,r15 > + ror r15,42 > + > + xor r13,r12 > + shr r12,7 > + ror r13,1 > + xor r15,r14 > + shr r14,6 > + > + ror r15,19 > + xor r12,r13 > + xor r15,r14 > + add r12,QWORD[8+rsp] > + > + add r12,QWORD[64+rsp] > + mov r13,r8 > + add r12,r15 > + mov r14,rax > + ror r13,23 > + mov r15,r9 > + > + xor r13,r8 > + ror r14,5 > + xor r15,r10 > + > + mov QWORD[64+rsp],r12 > + xor r14,rax > + and r15,r8 > + > + ror r13,4 > + add r12,r11 > + xor r15,r10 > + > + ror r14,6 > + xor r13,r8 > + add r12,r15 > + > + mov r15,rax > + add r12,QWORD[rbp] > + xor r14,rax > + > + xor r15,rbx > + ror r13,14 > + mov r11,rbx > + > + and rdi,r15 > + ror r14,28 > + add r12,r13 > + > + xor r11,rdi > + add rdx,r12 > + add r11,r12 > + > + lea rbp,[8+rbp] > + mov r13,QWORD[80+rsp] > + mov rdi,QWORD[56+rsp] > + > + mov r12,r13 > + ror r13,7 > + add r11,r14 > + mov r14,rdi > + ror rdi,42 > + > + xor r13,r12 > + shr r12,7 > + ror r13,1 > + xor rdi,r14 > + shr r14,6 > + > + ror rdi,19 > + xor r12,r13 > + xor rdi,r14 > + add r12,QWORD[16+rsp] > + > + add r12,QWORD[72+rsp] > + mov r13,rdx > + add r12,rdi > + mov r14,r11 > + ror r13,23 > + mov rdi,r8 > + > + xor r13,rdx > + ror r14,5 > + xor rdi,r9 > + > + mov QWORD[72+rsp],r12 > + xor r14,r11 > + and rdi,rdx > + > + ror r13,4 > + add r12,r10 > + xor rdi,r9 > + > + ror r14,6 > + xor r13,rdx > + add r12,rdi > + > + mov rdi,r11 > + add r12,QWORD[rbp] > + xor r14,r11 > + > + xor rdi,rax > + ror r13,14 > + mov r10,rax > + > + and r15,rdi > + ror r14,28 > + add r12,r13 > + > + xor r10,r15 > + add rcx,r12 > + add r10,r12 > + > + lea rbp,[24+rbp] > + mov r13,QWORD[88+rsp] > + mov r15,QWORD[64+rsp] > + > + mov r12,r13 > + ror r13,7 > + add r10,r14 > + mov r14,r15 > + ror r15,42 > + > + xor r13,r12 > + shr r12,7 > + ror r13,1 > + xor r15,r14 > + shr r14,6 > + > + ror r15,19 > + xor r12,r13 > + xor r15,r14 > + add r12,QWORD[24+rsp] > + > + add r12,QWORD[80+rsp] > + mov r13,rcx > + add r12,r15 > + mov r14,r10 > + ror r13,23 > + mov r15,rdx > + > + xor r13,rcx > + ror r14,5 > + xor r15,r8 > + > + mov QWORD[80+rsp],r12 > + xor r14,r10 > + and r15,rcx > + > + ror r13,4 > + add r12,r9 > + xor r15,r8 > + > + ror r14,6 > + xor r13,rcx > + add r12,r15 > + > + mov r15,r10 > + add r12,QWORD[rbp] > + xor r14,r10 > + > + xor r15,r11 > + ror r13,14 > + mov r9,r11 > + > + and rdi,r15 > + ror r14,28 > + add r12,r13 > + > + xor r9,rdi > + add rbx,r12 > + add r9,r12 > + > + lea rbp,[8+rbp] > + mov r13,QWORD[96+rsp] > + mov rdi,QWORD[72+rsp] > + > + mov r12,r13 > + ror r13,7 > + add r9,r14 > + mov r14,rdi > + ror rdi,42 > + > + xor r13,r12 > + shr r12,7 > + ror r13,1 > + xor rdi,r14 > + shr r14,6 > + > + ror rdi,19 > + xor r12,r13 > + xor rdi,r14 > + add r12,QWORD[32+rsp] > + > + add r12,QWORD[88+rsp] > + mov r13,rbx > + add r12,rdi > + mov r14,r9 > + ror r13,23 > + mov rdi,rcx > + > + 
xor r13,rbx > + ror r14,5 > + xor rdi,rdx > + > + mov QWORD[88+rsp],r12 > + xor r14,r9 > + and rdi,rbx > + > + ror r13,4 > + add r12,r8 > + xor rdi,rdx > + > + ror r14,6 > + xor r13,rbx > + add r12,rdi > + > + mov rdi,r9 > + add r12,QWORD[rbp] > + xor r14,r9 > + > + xor rdi,r10 > + ror r13,14 > + mov r8,r10 > + > + and r15,rdi > + ror r14,28 > + add r12,r13 > + > + xor r8,r15 > + add rax,r12 > + add r8,r12 > + > + lea rbp,[24+rbp] > + mov r13,QWORD[104+rsp] > + mov r15,QWORD[80+rsp] > + > + mov r12,r13 > + ror r13,7 > + add r8,r14 > + mov r14,r15 > + ror r15,42 > + > + xor r13,r12 > + shr r12,7 > + ror r13,1 > + xor r15,r14 > + shr r14,6 > + > + ror r15,19 > + xor r12,r13 > + xor r15,r14 > + add r12,QWORD[40+rsp] > + > + add r12,QWORD[96+rsp] > + mov r13,rax > + add r12,r15 > + mov r14,r8 > + ror r13,23 > + mov r15,rbx > + > + xor r13,rax > + ror r14,5 > + xor r15,rcx > + > + mov QWORD[96+rsp],r12 > + xor r14,r8 > + and r15,rax > + > + ror r13,4 > + add r12,rdx > + xor r15,rcx > + > + ror r14,6 > + xor r13,rax > + add r12,r15 > + > + mov r15,r8 > + add r12,QWORD[rbp] > + xor r14,r8 > + > + xor r15,r9 > + ror r13,14 > + mov rdx,r9 > + > + and rdi,r15 > + ror r14,28 > + add r12,r13 > + > + xor rdx,rdi > + add r11,r12 > + add rdx,r12 > + > + lea rbp,[8+rbp] > + mov r13,QWORD[112+rsp] > + mov rdi,QWORD[88+rsp] > + > + mov r12,r13 > + ror r13,7 > + add rdx,r14 > + mov r14,rdi > + ror rdi,42 > + > + xor r13,r12 > + shr r12,7 > + ror r13,1 > + xor rdi,r14 > + shr r14,6 > + > + ror rdi,19 > + xor r12,r13 > + xor rdi,r14 > + add r12,QWORD[48+rsp] > + > + add r12,QWORD[104+rsp] > + mov r13,r11 > + add r12,rdi > + mov r14,rdx > + ror r13,23 > + mov rdi,rax > + > + xor r13,r11 > + ror r14,5 > + xor rdi,rbx > + > + mov QWORD[104+rsp],r12 > + xor r14,rdx > + and rdi,r11 > + > + ror r13,4 > + add r12,rcx > + xor rdi,rbx > + > + ror r14,6 > + xor r13,r11 > + add r12,rdi > + > + mov rdi,rdx > + add r12,QWORD[rbp] > + xor r14,rdx > + > + xor rdi,r8 > + ror r13,14 > + mov rcx,r8 > + > + and r15,rdi > + ror r14,28 > + add r12,r13 > + > + xor rcx,r15 > + add r10,r12 > + add rcx,r12 > + > + lea rbp,[24+rbp] > + mov r13,QWORD[120+rsp] > + mov r15,QWORD[96+rsp] > + > + mov r12,r13 > + ror r13,7 > + add rcx,r14 > + mov r14,r15 > + ror r15,42 > + > + xor r13,r12 > + shr r12,7 > + ror r13,1 > + xor r15,r14 > + shr r14,6 > + > + ror r15,19 > + xor r12,r13 > + xor r15,r14 > + add r12,QWORD[56+rsp] > + > + add r12,QWORD[112+rsp] > + mov r13,r10 > + add r12,r15 > + mov r14,rcx > + ror r13,23 > + mov r15,r11 > + > + xor r13,r10 > + ror r14,5 > + xor r15,rax > + > + mov QWORD[112+rsp],r12 > + xor r14,rcx > + and r15,r10 > + > + ror r13,4 > + add r12,rbx > + xor r15,rax > + > + ror r14,6 > + xor r13,r10 > + add r12,r15 > + > + mov r15,rcx > + add r12,QWORD[rbp] > + xor r14,rcx > + > + xor r15,rdx > + ror r13,14 > + mov rbx,rdx > + > + and rdi,r15 > + ror r14,28 > + add r12,r13 > + > + xor rbx,rdi > + add r9,r12 > + add rbx,r12 > + > + lea rbp,[8+rbp] > + mov r13,QWORD[rsp] > + mov rdi,QWORD[104+rsp] > + > + mov r12,r13 > + ror r13,7 > + add rbx,r14 > + mov r14,rdi > + ror rdi,42 > + > + xor r13,r12 > + shr r12,7 > + ror r13,1 > + xor rdi,r14 > + shr r14,6 > + > + ror rdi,19 > + xor r12,r13 > + xor rdi,r14 > + add r12,QWORD[64+rsp] > + > + add r12,QWORD[120+rsp] > + mov r13,r9 > + add r12,rdi > + mov r14,rbx > + ror r13,23 > + mov rdi,r10 > + > + xor r13,r9 > + ror r14,5 > + xor rdi,r11 > + > + mov QWORD[120+rsp],r12 > + xor r14,rbx > + and rdi,r9 > + > + ror r13,4 > + add r12,rax > + xor rdi,r11 > + > + ror r14,6 > + xor 
r13,r9 > + add r12,rdi > + > + mov rdi,rbx > + add r12,QWORD[rbp] > + xor r14,rbx > + > + xor rdi,rcx > + ror r13,14 > + mov rax,rcx > + > + and r15,rdi > + ror r14,28 > + add r12,r13 > + > + xor rax,r15 > + add r8,r12 > + add rax,r12 > + > + lea rbp,[24+rbp] > + cmp BYTE[7+rbp],0 > + jnz NEAR $L$rounds_16_xx > + > + mov rdi,QWORD[((128+0))+rsp] > + add rax,r14 > + lea rsi,[128+rsi] > + > + add rax,QWORD[rdi] > + add rbx,QWORD[8+rdi] > + add rcx,QWORD[16+rdi] > + add rdx,QWORD[24+rdi] > + add r8,QWORD[32+rdi] > + add r9,QWORD[40+rdi] > + add r10,QWORD[48+rdi] > + add r11,QWORD[56+rdi] > + > + cmp rsi,QWORD[((128+16))+rsp] > + > + mov QWORD[rdi],rax > + mov QWORD[8+rdi],rbx > + mov QWORD[16+rdi],rcx > + mov QWORD[24+rdi],rdx > + mov QWORD[32+rdi],r8 > + mov QWORD[40+rdi],r9 > + mov QWORD[48+rdi],r10 > + mov QWORD[56+rdi],r11 > + jb NEAR $L$loop > + > + mov rsi,QWORD[152+rsp] > + > + mov r15,QWORD[((-48))+rsi] > + > + mov r14,QWORD[((-40))+rsi] > + > + mov r13,QWORD[((-32))+rsi] > + > + mov r12,QWORD[((-24))+rsi] > + > + mov rbp,QWORD[((-16))+rsi] > + > + mov rbx,QWORD[((-8))+rsi] > + > + lea rsp,[rsi] > + > +$L$epilogue: > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_sha512_block_data_order: > +ALIGN 64 > + > +K512: > + DQ 0x428a2f98d728ae22,0x7137449123ef65cd > + DQ 0x428a2f98d728ae22,0x7137449123ef65cd > + DQ 0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc > + DQ 0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc > + DQ 0x3956c25bf348b538,0x59f111f1b605d019 > + DQ 0x3956c25bf348b538,0x59f111f1b605d019 > + DQ 0x923f82a4af194f9b,0xab1c5ed5da6d8118 > + DQ 0x923f82a4af194f9b,0xab1c5ed5da6d8118 > + DQ 0xd807aa98a3030242,0x12835b0145706fbe > + DQ 0xd807aa98a3030242,0x12835b0145706fbe > + DQ 0x243185be4ee4b28c,0x550c7dc3d5ffb4e2 > + DQ 0x243185be4ee4b28c,0x550c7dc3d5ffb4e2 > + DQ 0x72be5d74f27b896f,0x80deb1fe3b1696b1 > + DQ 0x72be5d74f27b896f,0x80deb1fe3b1696b1 > + DQ 0x9bdc06a725c71235,0xc19bf174cf692694 > + DQ 0x9bdc06a725c71235,0xc19bf174cf692694 > + DQ 0xe49b69c19ef14ad2,0xefbe4786384f25e3 > + DQ 0xe49b69c19ef14ad2,0xefbe4786384f25e3 > + DQ 0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65 > + DQ 0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65 > + DQ 0x2de92c6f592b0275,0x4a7484aa6ea6e483 > + DQ 0x2de92c6f592b0275,0x4a7484aa6ea6e483 > + DQ 0x5cb0a9dcbd41fbd4,0x76f988da831153b5 > + DQ 0x5cb0a9dcbd41fbd4,0x76f988da831153b5 > + DQ 0x983e5152ee66dfab,0xa831c66d2db43210 > + DQ 0x983e5152ee66dfab,0xa831c66d2db43210 > + DQ 0xb00327c898fb213f,0xbf597fc7beef0ee4 > + DQ 0xb00327c898fb213f,0xbf597fc7beef0ee4 > + DQ 0xc6e00bf33da88fc2,0xd5a79147930aa725 > + DQ 0xc6e00bf33da88fc2,0xd5a79147930aa725 > + DQ 0x06ca6351e003826f,0x142929670a0e6e70 > + DQ 0x06ca6351e003826f,0x142929670a0e6e70 > + DQ 0x27b70a8546d22ffc,0x2e1b21385c26c926 > + DQ 0x27b70a8546d22ffc,0x2e1b21385c26c926 > + DQ 0x4d2c6dfc5ac42aed,0x53380d139d95b3df > + DQ 0x4d2c6dfc5ac42aed,0x53380d139d95b3df > + DQ 0x650a73548baf63de,0x766a0abb3c77b2a8 > + DQ 0x650a73548baf63de,0x766a0abb3c77b2a8 > + DQ 0x81c2c92e47edaee6,0x92722c851482353b > + DQ 0x81c2c92e47edaee6,0x92722c851482353b > + DQ 0xa2bfe8a14cf10364,0xa81a664bbc423001 > + DQ 0xa2bfe8a14cf10364,0xa81a664bbc423001 > + DQ 0xc24b8b70d0f89791,0xc76c51a30654be30 > + DQ 0xc24b8b70d0f89791,0xc76c51a30654be30 > + DQ 0xd192e819d6ef5218,0xd69906245565a910 > + DQ 0xd192e819d6ef5218,0xd69906245565a910 > + DQ 0xf40e35855771202a,0x106aa07032bbd1b8 > + DQ 0xf40e35855771202a,0x106aa07032bbd1b8 > + DQ 0x19a4c116b8d2d0c8,0x1e376c085141ab53 > + DQ 0x19a4c116b8d2d0c8,0x1e376c085141ab53 > + DQ 
0x2748774cdf8eeb99,0x34b0bcb5e19b48a8 > + DQ 0x2748774cdf8eeb99,0x34b0bcb5e19b48a8 > + DQ 0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb > + DQ 0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb > + DQ 0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3 > + DQ 0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3 > + DQ 0x748f82ee5defb2fc,0x78a5636f43172f60 > + DQ 0x748f82ee5defb2fc,0x78a5636f43172f60 > + DQ 0x84c87814a1f0ab72,0x8cc702081a6439ec > + DQ 0x84c87814a1f0ab72,0x8cc702081a6439ec > + DQ 0x90befffa23631e28,0xa4506cebde82bde9 > + DQ 0x90befffa23631e28,0xa4506cebde82bde9 > + DQ 0xbef9a3f7b2c67915,0xc67178f2e372532b > + DQ 0xbef9a3f7b2c67915,0xc67178f2e372532b > + DQ 0xca273eceea26619c,0xd186b8c721c0c207 > + DQ 0xca273eceea26619c,0xd186b8c721c0c207 > + DQ 0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178 > + DQ 0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178 > + DQ 0x06f067aa72176fba,0x0a637dc5a2c898a6 > + DQ 0x06f067aa72176fba,0x0a637dc5a2c898a6 > + DQ 0x113f9804bef90dae,0x1b710b35131c471b > + DQ 0x113f9804bef90dae,0x1b710b35131c471b > + DQ 0x28db77f523047d84,0x32caab7b40c72493 > + DQ 0x28db77f523047d84,0x32caab7b40c72493 > + DQ 0x3c9ebe0a15c9bebc,0x431d67c49c100d4c > + DQ 0x3c9ebe0a15c9bebc,0x431d67c49c100d4c > + DQ 0x4cc5d4becb3e42b6,0x597f299cfc657e2a > + DQ 0x4cc5d4becb3e42b6,0x597f299cfc657e2a > + DQ 0x5fcb6fab3ad6faec,0x6c44198c4a475817 > + DQ 0x5fcb6fab3ad6faec,0x6c44198c4a475817 > + > + DQ 0x0001020304050607,0x08090a0b0c0d0e0f > + DQ 0x0001020304050607,0x08090a0b0c0d0e0f > +DB 83,72,65,53,49,50,32,98,108,111,99,107,32,116,114,97 > +DB 110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54 > +DB 52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121 > +DB 32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46 > +DB 111,114,103,62,0 > +EXTERN __imp_RtlVirtualUnwind > + > +ALIGN 16 > +se_handler: > + push rsi > + push rdi > + push rbx > + push rbp > + push r12 > + push r13 > + push r14 > + push r15 > + pushfq > + sub rsp,64 > + > + mov rax,QWORD[120+r8] > + mov rbx,QWORD[248+r8] > + > + mov rsi,QWORD[8+r9] > + mov r11,QWORD[56+r9] > + > + mov r10d,DWORD[r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jb NEAR $L$in_prologue > + > + mov rax,QWORD[152+r8] > + > + mov r10d,DWORD[4+r11] > + lea r10,[r10*1+rsi] > + cmp rbx,r10 > + jae NEAR $L$in_prologue > + mov rsi,rax > + mov rax,QWORD[((128+24))+rax] > + > + mov rbx,QWORD[((-8))+rax] > + mov rbp,QWORD[((-16))+rax] > + mov r12,QWORD[((-24))+rax] > + mov r13,QWORD[((-32))+rax] > + mov r14,QWORD[((-40))+rax] > + mov r15,QWORD[((-48))+rax] > + mov QWORD[144+r8],rbx > + mov QWORD[160+r8],rbp > + mov QWORD[216+r8],r12 > + mov QWORD[224+r8],r13 > + mov QWORD[232+r8],r14 > + mov QWORD[240+r8],r15 > + > + lea r10,[$L$epilogue] > + cmp rbx,r10 > + jb NEAR $L$in_prologue > + > + lea rsi,[((128+32))+rsi] > + lea rdi,[512+r8] > + mov ecx,12 > + DD 0xa548f3fc > + > +$L$in_prologue: > + mov rdi,QWORD[8+rax] > + mov rsi,QWORD[16+rax] > + mov QWORD[152+r8],rax > + mov QWORD[168+r8],rsi > + mov QWORD[176+r8],rdi > + > + mov rdi,QWORD[40+r9] > + mov rsi,r8 > + mov ecx,154 > + DD 0xa548f3fc > + > + mov rsi,r9 > + xor rcx,rcx > + mov rdx,QWORD[8+rsi] > + mov r8,QWORD[rsi] > + mov r9,QWORD[16+rsi] > + mov r10,QWORD[40+rsi] > + lea r11,[56+rsi] > + lea r12,[24+rsi] > + mov QWORD[32+rsp],r10 > + mov QWORD[40+rsp],r11 > + mov QWORD[48+rsp],r12 > + mov QWORD[56+rsp],rcx > + call QWORD[__imp_RtlVirtualUnwind] > + > + mov eax,1 > + add rsp,64 > + popfq > + pop r15 > + pop r14 > + pop r13 > + pop r12 > + pop rbp > + pop rbx > + pop rdi > + pop rsi > + DB 0F3h,0C3h ;repret > + > +section .pdata rdata align=4 > +ALIGN 4 > + DD 
$L$SEH_begin_sha512_block_data_order wrt ..imagebase > + DD $L$SEH_end_sha512_block_data_order wrt ..imagebase > + DD $L$SEH_info_sha512_block_data_order wrt ..imagebase > +section .xdata rdata align=8 > +ALIGN 8 > +$L$SEH_info_sha512_block_data_order: > +DB 9,0,0,0 > + DD se_handler wrt ..imagebase > + DD $L$prologue wrt ..imagebase,$L$epilogue wrt ..imagebase > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm > b/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm > new file mode 100644 > index 0000000000..2a3d5bcf72 > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm > @@ -0,0 +1,491 @@ > +; WARNING: do not edit! > +; Generated from openssl/crypto/x86_64cpuid.pl > +; > +; Copyright 2005-2020 The OpenSSL Project Authors. All Rights Reserved. > +; > +; Licensed under the OpenSSL license (the "License"). You may not use > +; this file except in compliance with the License. You can obtain a copy > +; in the file LICENSE in the source distribution or at > +; https://www.openssl.org/source/license.html > + > +default rel > +%define XMMWORD > +%define YMMWORD > +%define ZMMWORD > +EXTERN OPENSSL_cpuid_setup > + > +section .CRT$XCU rdata align=8 > + DQ OPENSSL_cpuid_setup > + > + > +common OPENSSL_ia32cap_P 16 > + > +section .text code align=64 > + > + > +global OPENSSL_atomic_add > + > +ALIGN 16 > +OPENSSL_atomic_add: > + > + mov eax,DWORD[rcx] > +$L$spin: lea r8,[rax*1+rdx] > +DB 0xf0 > + cmpxchg DWORD[rcx],r8d > + jne NEAR $L$spin > + mov eax,r8d > +DB 0x48,0x98 > + DB 0F3h,0C3h ;repret > + > + > + > +global OPENSSL_rdtsc > + > +ALIGN 16 > +OPENSSL_rdtsc: > + > + rdtsc > + shl rdx,32 > + or rax,rdx > + DB 0F3h,0C3h ;repret > + > + > + > +global OPENSSL_ia32_cpuid > + > +ALIGN 16 > +OPENSSL_ia32_cpuid: > + mov QWORD[8+rsp],rdi ;WIN64 prologue > + mov QWORD[16+rsp],rsi > + mov rax,rsp > +$L$SEH_begin_OPENSSL_ia32_cpuid: > + mov rdi,rcx > + > + > + > + mov r8,rbx > + > + > + xor eax,eax > + mov QWORD[8+rdi],rax > + cpuid > + mov r11d,eax > + > + xor eax,eax > + cmp ebx,0x756e6547 > + setne al > + mov r9d,eax > + cmp edx,0x49656e69 > + setne al > + or r9d,eax > + cmp ecx,0x6c65746e > + setne al > + or r9d,eax > + jz NEAR $L$intel > + > + cmp ebx,0x68747541 > + setne al > + mov r10d,eax > + cmp edx,0x69746E65 > + setne al > + or r10d,eax > + cmp ecx,0x444D4163 > + setne al > + or r10d,eax > + jnz NEAR $L$intel > + > + > + mov eax,0x80000000 > + cpuid > + cmp eax,0x80000001 > + jb NEAR $L$intel > + mov r10d,eax > + mov eax,0x80000001 > + cpuid > + or r9d,ecx > + and r9d,0x00000801 > + > + cmp r10d,0x80000008 > + jb NEAR $L$intel > + > + mov eax,0x80000008 > + cpuid > + movzx r10,cl > + inc r10 > + > + mov eax,1 > + cpuid > + bt edx,28 > + jnc NEAR $L$generic > + shr ebx,16 > + cmp bl,r10b > + ja NEAR $L$generic > + and edx,0xefffffff > + jmp NEAR $L$generic > + > +$L$intel: > + cmp r11d,4 > + mov r10d,-1 > + jb NEAR $L$nocacheinfo > + > + mov eax,4 > + mov ecx,0 > + cpuid > + mov r10d,eax > + shr r10d,14 > + and r10d,0xfff > + > +$L$nocacheinfo: > + mov eax,1 > + cpuid > + movd xmm0,eax > + and edx,0xbfefffff > + cmp r9d,0 > + jne NEAR $L$notintel > + or edx,0x40000000 > + and ah,15 > + cmp ah,15 > + jne NEAR $L$notP4 > + or edx,0x00100000 > +$L$notP4: > + cmp ah,6 > + jne NEAR $L$notintel > + and eax,0x0fff0ff0 > + cmp eax,0x00050670 > + je NEAR $L$knights > + cmp eax,0x00080650 > + jne NEAR $L$notintel > +$L$knights: > + and ecx,0xfbffffff > + > +$L$notintel: > + bt edx,28 > + jnc NEAR $L$generic > + and edx,0xefffffff > + cmp r10d,0 > + je 
NEAR $L$generic > + > + or edx,0x10000000 > + shr ebx,16 > + cmp bl,1 > + ja NEAR $L$generic > + and edx,0xefffffff > +$L$generic: > + and r9d,0x00000800 > + and ecx,0xfffff7ff > + or r9d,ecx > + > + mov r10d,edx > + > + cmp r11d,7 > + jb NEAR $L$no_extended_info > + mov eax,7 > + xor ecx,ecx > + cpuid > + bt r9d,26 > + jc NEAR $L$notknights > + and ebx,0xfff7ffff > +$L$notknights: > + movd eax,xmm0 > + and eax,0x0fff0ff0 > + cmp eax,0x00050650 > + jne NEAR $L$notskylakex > + and ebx,0xfffeffff > + > +$L$notskylakex: > + mov DWORD[8+rdi],ebx > + mov DWORD[12+rdi],ecx > +$L$no_extended_info: > + > + bt r9d,27 > + jnc NEAR $L$clear_avx > + xor ecx,ecx > +DB 0x0f,0x01,0xd0 > + and eax,0xe6 > + cmp eax,0xe6 > + je NEAR $L$done > + and DWORD[8+rdi],0x3fdeffff > + > + > + > + > + and eax,6 > + cmp eax,6 > + je NEAR $L$done > +$L$clear_avx: > + mov eax,0xefffe7ff > + and r9d,eax > + mov eax,0x3fdeffdf > + and DWORD[8+rdi],eax > +$L$done: > + shl r9,32 > + mov eax,r10d > + mov rbx,r8 > + > + or rax,r9 > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > + mov rsi,QWORD[16+rsp] > + DB 0F3h,0C3h ;repret > + > +$L$SEH_end_OPENSSL_ia32_cpuid: > + > +global OPENSSL_cleanse > + > +ALIGN 16 > +OPENSSL_cleanse: > + > + xor rax,rax > + cmp rdx,15 > + jae NEAR $L$ot > + cmp rdx,0 > + je NEAR $L$ret > +$L$ittle: > + mov BYTE[rcx],al > + sub rdx,1 > + lea rcx,[1+rcx] > + jnz NEAR $L$ittle > +$L$ret: > + DB 0F3h,0C3h ;repret > +ALIGN 16 > +$L$ot: > + test rcx,7 > + jz NEAR $L$aligned > + mov BYTE[rcx],al > + lea rdx,[((-1))+rdx] > + lea rcx,[1+rcx] > + jmp NEAR $L$ot > +$L$aligned: > + mov QWORD[rcx],rax > + lea rdx,[((-8))+rdx] > + test rdx,-8 > + lea rcx,[8+rcx] > + jnz NEAR $L$aligned > + cmp rdx,0 > + jne NEAR $L$ittle > + DB 0F3h,0C3h ;repret > + > + > + > +global CRYPTO_memcmp > + > +ALIGN 16 > +CRYPTO_memcmp: > + > + xor rax,rax > + xor r10,r10 > + cmp r8,0 > + je NEAR $L$no_data > + cmp r8,16 > + jne NEAR $L$oop_cmp > + mov r10,QWORD[rcx] > + mov r11,QWORD[8+rcx] > + mov r8,1 > + xor r10,QWORD[rdx] > + xor r11,QWORD[8+rdx] > + or r10,r11 > + cmovnz rax,r8 > + DB 0F3h,0C3h ;repret > + > +ALIGN 16 > +$L$oop_cmp: > + mov r10b,BYTE[rcx] > + lea rcx,[1+rcx] > + xor r10b,BYTE[rdx] > + lea rdx,[1+rdx] > + or al,r10b > + dec r8 > + jnz NEAR $L$oop_cmp > + neg rax > + shr rax,63 > +$L$no_data: > + DB 0F3h,0C3h ;repret > + > + > +global OPENSSL_wipe_cpu > + > +ALIGN 16 > +OPENSSL_wipe_cpu: > + pxor xmm0,xmm0 > + pxor xmm1,xmm1 > + pxor xmm2,xmm2 > + pxor xmm3,xmm3 > + pxor xmm4,xmm4 > + pxor xmm5,xmm5 > + xor rcx,rcx > + xor rdx,rdx > + xor r8,r8 > + xor r9,r9 > + xor r10,r10 > + xor r11,r11 > + lea rax,[8+rsp] > + DB 0F3h,0C3h ;repret > + > +global OPENSSL_instrument_bus > + > +ALIGN 16 > +OPENSSL_instrument_bus: > + > + mov r10,rcx > + mov rcx,rdx > + mov r11,rdx > + > + rdtsc > + mov r8d,eax > + mov r9d,0 > + clflush [r10] > +DB 0xf0 > + add DWORD[r10],r9d > + jmp NEAR $L$oop > +ALIGN 16 > +$L$oop: rdtsc > + mov edx,eax > + sub eax,r8d > + mov r8d,edx > + mov r9d,eax > + clflush [r10] > +DB 0xf0 > + add DWORD[r10],eax > + lea r10,[4+r10] > + sub rcx,1 > + jnz NEAR $L$oop > + > + mov rax,r11 > + DB 0F3h,0C3h ;repret > + > + > + > +global OPENSSL_instrument_bus2 > + > +ALIGN 16 > +OPENSSL_instrument_bus2: > + > + mov r10,rcx > + mov rcx,rdx > + mov r11,r8 > + mov QWORD[8+rsp],rcx > + > + rdtsc > + mov r8d,eax > + mov r9d,0 > + > + clflush [r10] > +DB 0xf0 > + add DWORD[r10],r9d > + > + rdtsc > + mov edx,eax > + sub eax,r8d > + mov r8d,edx > + mov r9d,eax > +$L$oop2: > + clflush [r10] > +DB 0xf0 > + add DWORD[r10],eax > + 
> + sub r11,1 > + jz NEAR $L$done2 > + > + rdtsc > + mov edx,eax > + sub eax,r8d > + mov r8d,edx > + cmp eax,r9d > + mov r9d,eax > + mov edx,0 > + setne dl > + sub rcx,rdx > + lea r10,[rdx*4+r10] > + jnz NEAR $L$oop2 > + > +$L$done2: > + mov rax,QWORD[8+rsp] > + sub rax,rcx > + DB 0F3h,0C3h ;repret > + > + > +global OPENSSL_ia32_rdrand_bytes > + > +ALIGN 16 > +OPENSSL_ia32_rdrand_bytes: > + > + xor rax,rax > + cmp rdx,0 > + je NEAR $L$done_rdrand_bytes > + > + mov r11,8 > +$L$oop_rdrand_bytes: > +DB 73,15,199,242 > + jc NEAR $L$break_rdrand_bytes > + dec r11 > + jnz NEAR $L$oop_rdrand_bytes > + jmp NEAR $L$done_rdrand_bytes > + > +ALIGN 16 > +$L$break_rdrand_bytes: > + cmp rdx,8 > + jb NEAR $L$tail_rdrand_bytes > + mov QWORD[rcx],r10 > + lea rcx,[8+rcx] > + add rax,8 > + sub rdx,8 > + jz NEAR $L$done_rdrand_bytes > + mov r11,8 > + jmp NEAR $L$oop_rdrand_bytes > + > +ALIGN 16 > +$L$tail_rdrand_bytes: > + mov BYTE[rcx],r10b > + lea rcx,[1+rcx] > + inc rax > + shr r10,8 > + dec rdx > + jnz NEAR $L$tail_rdrand_bytes > + > +$L$done_rdrand_bytes: > + xor r10,r10 > + DB 0F3h,0C3h ;repret > + > + > +global OPENSSL_ia32_rdseed_bytes > + > +ALIGN 16 > +OPENSSL_ia32_rdseed_bytes: > + > + xor rax,rax > + cmp rdx,0 > + je NEAR $L$done_rdseed_bytes > + > + mov r11,8 > +$L$oop_rdseed_bytes: > +DB 73,15,199,250 > + jc NEAR $L$break_rdseed_bytes > + dec r11 > + jnz NEAR $L$oop_rdseed_bytes > + jmp NEAR $L$done_rdseed_bytes > + > +ALIGN 16 > +$L$break_rdseed_bytes: > + cmp rdx,8 > + jb NEAR $L$tail_rdseed_bytes > + mov QWORD[rcx],r10 > + lea rcx,[8+rcx] > + add rax,8 > + sub rdx,8 > + jz NEAR $L$done_rdseed_bytes > + mov r11,8 > + jmp NEAR $L$oop_rdseed_bytes > + > +ALIGN 16 > +$L$tail_rdseed_bytes: > + mov BYTE[rcx],r10b > + lea rcx,[1+rcx] > + inc rax > + shr r10,8 > + dec rdx > + jnz NEAR $L$tail_rdseed_bytes > + > +$L$done_rdseed_bytes: > + xor r10,r10 > + DB 0F3h,0C3h ;repret > + > + > diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-mb- > x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-mb- > x86_64.S > new file mode 100644 > index 0000000000..7749fd685a > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-mb-x86_64.S > @@ -0,0 +1,552 @@ > +# WARNING: do not edit! > +# Generated from openssl/crypto/aes/asm/aesni-mb-x86_64.pl > +# > +# Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. > +# > +# Licensed under the OpenSSL license (the "License"). You may not use > +# this file except in compliance with the License. 
You can obtain a copy > +# in the file LICENSE in the source distribution or at > +# https://www.openssl.org/source/license.html > + > +.text > + > + > + > +.globl aesni_multi_cbc_encrypt > +.type aesni_multi_cbc_encrypt,@function > +.align 32 > +aesni_multi_cbc_encrypt: > +.cfi_startproc > + movq %rsp,%rax > +.cfi_def_cfa_register %rax > + pushq %rbx > +.cfi_offset %rbx,-16 > + pushq %rbp > +.cfi_offset %rbp,-24 > + pushq %r12 > +.cfi_offset %r12,-32 > + pushq %r13 > +.cfi_offset %r13,-40 > + pushq %r14 > +.cfi_offset %r14,-48 > + pushq %r15 > +.cfi_offset %r15,-56 > + > + > + > + > + > + > + subq $48,%rsp > + andq $-64,%rsp > + movq %rax,16(%rsp) > +.cfi_escape 0x0f,0x05,0x77,0x10,0x06,0x23,0x08 > + > +.Lenc4x_body: > + movdqu (%rsi),%xmm12 > + leaq 120(%rsi),%rsi > + leaq 80(%rdi),%rdi > + > +.Lenc4x_loop_grande: > + movl %edx,24(%rsp) > + xorl %edx,%edx > + movl -64(%rdi),%ecx > + movq -80(%rdi),%r8 > + cmpl %edx,%ecx > + movq -72(%rdi),%r12 > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movdqu -56(%rdi),%xmm2 > + movl %ecx,32(%rsp) > + cmovleq %rsp,%r8 > + movl -24(%rdi),%ecx > + movq -40(%rdi),%r9 > + cmpl %edx,%ecx > + movq -32(%rdi),%r13 > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movdqu -16(%rdi),%xmm3 > + movl %ecx,36(%rsp) > + cmovleq %rsp,%r9 > + movl 16(%rdi),%ecx > + movq 0(%rdi),%r10 > + cmpl %edx,%ecx > + movq 8(%rdi),%r14 > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movdqu 24(%rdi),%xmm4 > + movl %ecx,40(%rsp) > + cmovleq %rsp,%r10 > + movl 56(%rdi),%ecx > + movq 40(%rdi),%r11 > + cmpl %edx,%ecx > + movq 48(%rdi),%r15 > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movdqu 64(%rdi),%xmm5 > + movl %ecx,44(%rsp) > + cmovleq %rsp,%r11 > + testl %edx,%edx > + jz .Lenc4x_done > + > + movups 16-120(%rsi),%xmm1 > + pxor %xmm12,%xmm2 > + movups 32-120(%rsi),%xmm0 > + pxor %xmm12,%xmm3 > + movl 240-120(%rsi),%eax > + pxor %xmm12,%xmm4 > + movdqu (%r8),%xmm6 > + pxor %xmm12,%xmm5 > + movdqu (%r9),%xmm7 > + pxor %xmm6,%xmm2 > + movdqu (%r10),%xmm8 > + pxor %xmm7,%xmm3 > + movdqu (%r11),%xmm9 > + pxor %xmm8,%xmm4 > + pxor %xmm9,%xmm5 > + movdqa 32(%rsp),%xmm10 > + xorq %rbx,%rbx > + jmp .Loop_enc4x > + > +.align 32 > +.Loop_enc4x: > + addq $16,%rbx > + leaq 16(%rsp),%rbp > + movl $1,%ecx > + subq %rbx,%rbp > + > +.byte 102,15,56,220,209 > + prefetcht0 31(%r8,%rbx,1) > + prefetcht0 31(%r9,%rbx,1) > +.byte 102,15,56,220,217 > + prefetcht0 31(%r10,%rbx,1) > + prefetcht0 31(%r10,%rbx,1) > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > + movups 48-120(%rsi),%xmm1 > + cmpl 32(%rsp),%ecx > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > +.byte 102,15,56,220,224 > + cmovgeq %rbp,%r8 > + cmovgq %rbp,%r12 > +.byte 102,15,56,220,232 > + movups -56(%rsi),%xmm0 > + cmpl 36(%rsp),%ecx > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > + cmovgeq %rbp,%r9 > + cmovgq %rbp,%r13 > +.byte 102,15,56,220,233 > + movups -40(%rsi),%xmm1 > + cmpl 40(%rsp),%ecx > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > +.byte 102,15,56,220,224 > + cmovgeq %rbp,%r10 > + cmovgq %rbp,%r14 > +.byte 102,15,56,220,232 > + movups -24(%rsi),%xmm0 > + cmpl 44(%rsp),%ecx > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > + cmovgeq %rbp,%r11 > + cmovgq %rbp,%r15 > +.byte 102,15,56,220,233 > + movups -8(%rsi),%xmm1 > + movdqa %xmm10,%xmm11 > +.byte 102,15,56,220,208 > + prefetcht0 15(%r12,%rbx,1) > + prefetcht0 15(%r13,%rbx,1) > +.byte 102,15,56,220,216 > + prefetcht0 15(%r14,%rbx,1) > + prefetcht0 15(%r15,%rbx,1) > +.byte 102,15,56,220,224 > +.byte 
102,15,56,220,232 > + movups 128-120(%rsi),%xmm0 > + pxor %xmm12,%xmm12 > + > +.byte 102,15,56,220,209 > + pcmpgtd %xmm12,%xmm11 > + movdqu -120(%rsi),%xmm12 > +.byte 102,15,56,220,217 > + paddd %xmm11,%xmm10 > + movdqa %xmm10,32(%rsp) > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > + movups 144-120(%rsi),%xmm1 > + > + cmpl $11,%eax > + > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > +.byte 102,15,56,220,224 > +.byte 102,15,56,220,232 > + movups 160-120(%rsi),%xmm0 > + > + jb .Lenc4x_tail > + > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > + movups 176-120(%rsi),%xmm1 > + > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > +.byte 102,15,56,220,224 > +.byte 102,15,56,220,232 > + movups 192-120(%rsi),%xmm0 > + > + je .Lenc4x_tail > + > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > + movups 208-120(%rsi),%xmm1 > + > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > +.byte 102,15,56,220,224 > +.byte 102,15,56,220,232 > + movups 224-120(%rsi),%xmm0 > + jmp .Lenc4x_tail > + > +.align 32 > +.Lenc4x_tail: > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > + movdqu (%r8,%rbx,1),%xmm6 > + movdqu 16-120(%rsi),%xmm1 > + > +.byte 102,15,56,221,208 > + movdqu (%r9,%rbx,1),%xmm7 > + pxor %xmm12,%xmm6 > +.byte 102,15,56,221,216 > + movdqu (%r10,%rbx,1),%xmm8 > + pxor %xmm12,%xmm7 > +.byte 102,15,56,221,224 > + movdqu (%r11,%rbx,1),%xmm9 > + pxor %xmm12,%xmm8 > +.byte 102,15,56,221,232 > + movdqu 32-120(%rsi),%xmm0 > + pxor %xmm12,%xmm9 > + > + movups %xmm2,-16(%r12,%rbx,1) > + pxor %xmm6,%xmm2 > + movups %xmm3,-16(%r13,%rbx,1) > + pxor %xmm7,%xmm3 > + movups %xmm4,-16(%r14,%rbx,1) > + pxor %xmm8,%xmm4 > + movups %xmm5,-16(%r15,%rbx,1) > + pxor %xmm9,%xmm5 > + > + decl %edx > + jnz .Loop_enc4x > + > + movq 16(%rsp),%rax > +.cfi_def_cfa %rax,8 > + movl 24(%rsp),%edx > + > + > + > + > + > + > + > + > + > + > + leaq 160(%rdi),%rdi > + decl %edx > + jnz .Lenc4x_loop_grande > + > +.Lenc4x_done: > + movq -48(%rax),%r15 > +.cfi_restore %r15 > + movq -40(%rax),%r14 > +.cfi_restore %r14 > + movq -32(%rax),%r13 > +.cfi_restore %r13 > + movq -24(%rax),%r12 > +.cfi_restore %r12 > + movq -16(%rax),%rbp > +.cfi_restore %rbp > + movq -8(%rax),%rbx > +.cfi_restore %rbx > + leaq (%rax),%rsp > +.cfi_def_cfa_register %rsp > +.Lenc4x_epilogue: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size aesni_multi_cbc_encrypt,.-aesni_multi_cbc_encrypt > + > +.globl aesni_multi_cbc_decrypt > +.type aesni_multi_cbc_decrypt,@function > +.align 32 > +aesni_multi_cbc_decrypt: > +.cfi_startproc > + movq %rsp,%rax > +.cfi_def_cfa_register %rax > + pushq %rbx > +.cfi_offset %rbx,-16 > + pushq %rbp > +.cfi_offset %rbp,-24 > + pushq %r12 > +.cfi_offset %r12,-32 > + pushq %r13 > +.cfi_offset %r13,-40 > + pushq %r14 > +.cfi_offset %r14,-48 > + pushq %r15 > +.cfi_offset %r15,-56 > + > + > + > + > + > + > + subq $48,%rsp > + andq $-64,%rsp > + movq %rax,16(%rsp) > +.cfi_escape 0x0f,0x05,0x77,0x10,0x06,0x23,0x08 > + > +.Ldec4x_body: > + movdqu (%rsi),%xmm12 > + leaq 120(%rsi),%rsi > + leaq 80(%rdi),%rdi > + > +.Ldec4x_loop_grande: > + movl %edx,24(%rsp) > + xorl %edx,%edx > + movl -64(%rdi),%ecx > + movq -80(%rdi),%r8 > + cmpl %edx,%ecx > + movq -72(%rdi),%r12 > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movdqu -56(%rdi),%xmm6 > + movl %ecx,32(%rsp) > + cmovleq %rsp,%r8 > + movl -24(%rdi),%ecx > + movq -40(%rdi),%r9 > + cmpl %edx,%ecx > + movq 
-32(%rdi),%r13 > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movdqu -16(%rdi),%xmm7 > + movl %ecx,36(%rsp) > + cmovleq %rsp,%r9 > + movl 16(%rdi),%ecx > + movq 0(%rdi),%r10 > + cmpl %edx,%ecx > + movq 8(%rdi),%r14 > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movdqu 24(%rdi),%xmm8 > + movl %ecx,40(%rsp) > + cmovleq %rsp,%r10 > + movl 56(%rdi),%ecx > + movq 40(%rdi),%r11 > + cmpl %edx,%ecx > + movq 48(%rdi),%r15 > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movdqu 64(%rdi),%xmm9 > + movl %ecx,44(%rsp) > + cmovleq %rsp,%r11 > + testl %edx,%edx > + jz .Ldec4x_done > + > + movups 16-120(%rsi),%xmm1 > + movups 32-120(%rsi),%xmm0 > + movl 240-120(%rsi),%eax > + movdqu (%r8),%xmm2 > + movdqu (%r9),%xmm3 > + pxor %xmm12,%xmm2 > + movdqu (%r10),%xmm4 > + pxor %xmm12,%xmm3 > + movdqu (%r11),%xmm5 > + pxor %xmm12,%xmm4 > + pxor %xmm12,%xmm5 > + movdqa 32(%rsp),%xmm10 > + xorq %rbx,%rbx > + jmp .Loop_dec4x > + > +.align 32 > +.Loop_dec4x: > + addq $16,%rbx > + leaq 16(%rsp),%rbp > + movl $1,%ecx > + subq %rbx,%rbp > + > +.byte 102,15,56,222,209 > + prefetcht0 31(%r8,%rbx,1) > + prefetcht0 31(%r9,%rbx,1) > +.byte 102,15,56,222,217 > + prefetcht0 31(%r10,%rbx,1) > + prefetcht0 31(%r11,%rbx,1) > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > + movups 48-120(%rsi),%xmm1 > + cmpl 32(%rsp),%ecx > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > + cmovgeq %rbp,%r8 > + cmovgq %rbp,%r12 > +.byte 102,15,56,222,232 > + movups -56(%rsi),%xmm0 > + cmpl 36(%rsp),%ecx > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > + cmovgeq %rbp,%r9 > + cmovgq %rbp,%r13 > +.byte 102,15,56,222,233 > + movups -40(%rsi),%xmm1 > + cmpl 40(%rsp),%ecx > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > + cmovgeq %rbp,%r10 > + cmovgq %rbp,%r14 > +.byte 102,15,56,222,232 > + movups -24(%rsi),%xmm0 > + cmpl 44(%rsp),%ecx > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > + cmovgeq %rbp,%r11 > + cmovgq %rbp,%r15 > +.byte 102,15,56,222,233 > + movups -8(%rsi),%xmm1 > + movdqa %xmm10,%xmm11 > +.byte 102,15,56,222,208 > + prefetcht0 15(%r12,%rbx,1) > + prefetcht0 15(%r13,%rbx,1) > +.byte 102,15,56,222,216 > + prefetcht0 15(%r14,%rbx,1) > + prefetcht0 15(%r15,%rbx,1) > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > + movups 128-120(%rsi),%xmm0 > + pxor %xmm12,%xmm12 > + > +.byte 102,15,56,222,209 > + pcmpgtd %xmm12,%xmm11 > + movdqu -120(%rsi),%xmm12 > +.byte 102,15,56,222,217 > + paddd %xmm11,%xmm10 > + movdqa %xmm10,32(%rsp) > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > + movups 144-120(%rsi),%xmm1 > + > + cmpl $11,%eax > + > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > + movups 160-120(%rsi),%xmm0 > + > + jb .Ldec4x_tail > + > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > + movups 176-120(%rsi),%xmm1 > + > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > + movups 192-120(%rsi),%xmm0 > + > + je .Ldec4x_tail > + > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > + movups 208-120(%rsi),%xmm1 > + > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > + movups 224-120(%rsi),%xmm0 > + jmp .Ldec4x_tail > + > +.align 32 > +.Ldec4x_tail: > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 
102,15,56,222,225 > + pxor %xmm0,%xmm6 > + pxor %xmm0,%xmm7 > +.byte 102,15,56,222,233 > + movdqu 16-120(%rsi),%xmm1 > + pxor %xmm0,%xmm8 > + pxor %xmm0,%xmm9 > + movdqu 32-120(%rsi),%xmm0 > + > +.byte 102,15,56,223,214 > +.byte 102,15,56,223,223 > + movdqu -16(%r8,%rbx,1),%xmm6 > + movdqu -16(%r9,%rbx,1),%xmm7 > +.byte 102,65,15,56,223,224 > +.byte 102,65,15,56,223,233 > + movdqu -16(%r10,%rbx,1),%xmm8 > + movdqu -16(%r11,%rbx,1),%xmm9 > + > + movups %xmm2,-16(%r12,%rbx,1) > + movdqu (%r8,%rbx,1),%xmm2 > + movups %xmm3,-16(%r13,%rbx,1) > + movdqu (%r9,%rbx,1),%xmm3 > + pxor %xmm12,%xmm2 > + movups %xmm4,-16(%r14,%rbx,1) > + movdqu (%r10,%rbx,1),%xmm4 > + pxor %xmm12,%xmm3 > + movups %xmm5,-16(%r15,%rbx,1) > + movdqu (%r11,%rbx,1),%xmm5 > + pxor %xmm12,%xmm4 > + pxor %xmm12,%xmm5 > + > + decl %edx > + jnz .Loop_dec4x > + > + movq 16(%rsp),%rax > +.cfi_def_cfa %rax,8 > + movl 24(%rsp),%edx > + > + leaq 160(%rdi),%rdi > + decl %edx > + jnz .Ldec4x_loop_grande > + > +.Ldec4x_done: > + movq -48(%rax),%r15 > +.cfi_restore %r15 > + movq -40(%rax),%r14 > +.cfi_restore %r14 > + movq -32(%rax),%r13 > +.cfi_restore %r13 > + movq -24(%rax),%r12 > +.cfi_restore %r12 > + movq -16(%rax),%rbp > +.cfi_restore %rbp > + movq -8(%rax),%rbx > +.cfi_restore %rbx > + leaq (%rax),%rsp > +.cfi_def_cfa_register %rsp > +.Ldec4x_epilogue: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size aesni_multi_cbc_decrypt,.-aesni_multi_cbc_decrypt > diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha1- > x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha1- > x86_64.S > new file mode 100644 > index 0000000000..ab763a2eec > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha1-x86_64.S > @@ -0,0 +1,1719 @@ > +# WARNING: do not edit! > +# Generated from openssl/crypto/aes/asm/aesni-sha1-x86_64.pl > +# > +# Copyright 2011-2020 The OpenSSL Project Authors. All Rights Reserved. > +# > +# Licensed under the OpenSSL license (the "License"). You may not use > +# this file except in compliance with the License. 
You can obtain a copy > +# in the file LICENSE in the source distribution or at > +# https://www.openssl.org/source/license.html > + > +.text > + > + > +.globl aesni_cbc_sha1_enc > +.type aesni_cbc_sha1_enc,@function > +.align 32 > +aesni_cbc_sha1_enc: > +.cfi_startproc > + > + movl OPENSSL_ia32cap_P+0(%rip),%r10d > + movq OPENSSL_ia32cap_P+4(%rip),%r11 > + btq $61,%r11 > + jc aesni_cbc_sha1_enc_shaext > + jmp aesni_cbc_sha1_enc_ssse3 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size aesni_cbc_sha1_enc,.-aesni_cbc_sha1_enc > +.type aesni_cbc_sha1_enc_ssse3,@function > +.align 32 > +aesni_cbc_sha1_enc_ssse3: > +.cfi_startproc > + movq 8(%rsp),%r10 > + > + > + pushq %rbx > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %rbx,-16 > + pushq %rbp > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %rbp,-24 > + pushq %r12 > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %r12,-32 > + pushq %r13 > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %r13,-40 > + pushq %r14 > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %r14,-48 > + pushq %r15 > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %r15,-56 > + leaq -104(%rsp),%rsp > +.cfi_adjust_cfa_offset 104 > + > + > + movq %rdi,%r12 > + movq %rsi,%r13 > + movq %rdx,%r14 > + leaq 112(%rcx),%r15 > + movdqu (%r8),%xmm2 > + movq %r8,88(%rsp) > + shlq $6,%r14 > + subq %r12,%r13 > + movl 240-112(%r15),%r8d > + addq %r10,%r14 > + > + leaq K_XX_XX(%rip),%r11 > + movl 0(%r9),%eax > + movl 4(%r9),%ebx > + movl 8(%r9),%ecx > + movl 12(%r9),%edx > + movl %ebx,%esi > + movl 16(%r9),%ebp > + movl %ecx,%edi > + xorl %edx,%edi > + andl %edi,%esi > + > + movdqa 64(%r11),%xmm3 > + movdqa 0(%r11),%xmm13 > + movdqu 0(%r10),%xmm4 > + movdqu 16(%r10),%xmm5 > + movdqu 32(%r10),%xmm6 > + movdqu 48(%r10),%xmm7 > +.byte 102,15,56,0,227 > +.byte 102,15,56,0,235 > +.byte 102,15,56,0,243 > + addq $64,%r10 > + paddd %xmm13,%xmm4 > +.byte 102,15,56,0,251 > + paddd %xmm13,%xmm5 > + paddd %xmm13,%xmm6 > + movdqa %xmm4,0(%rsp) > + psubd %xmm13,%xmm4 > + movdqa %xmm5,16(%rsp) > + psubd %xmm13,%xmm5 > + movdqa %xmm6,32(%rsp) > + psubd %xmm13,%xmm6 > + movups -112(%r15),%xmm15 > + movups 16-112(%r15),%xmm0 > + jmp .Loop_ssse3 > +.align 32 > +.Loop_ssse3: > + rorl $2,%ebx > + movups 0(%r12),%xmm14 > + xorps %xmm15,%xmm14 > + xorps %xmm14,%xmm2 > + movups -80(%r15),%xmm1 > +.byte 102,15,56,220,208 > + pshufd $238,%xmm4,%xmm8 > + xorl %edx,%esi > + movdqa %xmm7,%xmm12 > + paddd %xmm7,%xmm13 > + movl %eax,%edi > + addl 0(%rsp),%ebp > + punpcklqdq %xmm5,%xmm8 > + xorl %ecx,%ebx > + roll $5,%eax > + addl %esi,%ebp > + psrldq $4,%xmm12 > + andl %ebx,%edi > + xorl %ecx,%ebx > + pxor %xmm4,%xmm8 > + addl %eax,%ebp > + rorl $7,%eax > + pxor %xmm6,%xmm12 > + xorl %ecx,%edi > + movl %ebp,%esi > + addl 4(%rsp),%edx > + pxor %xmm12,%xmm8 > + xorl %ebx,%eax > + roll $5,%ebp > + movdqa %xmm13,48(%rsp) > + addl %edi,%edx > + movups -64(%r15),%xmm0 > +.byte 102,15,56,220,209 > + andl %eax,%esi > + movdqa %xmm8,%xmm3 > + xorl %ebx,%eax > + addl %ebp,%edx > + rorl $7,%ebp > + movdqa %xmm8,%xmm12 > + xorl %ebx,%esi > + pslldq $12,%xmm3 > + paddd %xmm8,%xmm8 > + movl %edx,%edi > + addl 8(%rsp),%ecx > + psrld $31,%xmm12 > + xorl %eax,%ebp > + roll $5,%edx > + addl %esi,%ecx > + movdqa %xmm3,%xmm13 > + andl %ebp,%edi > + xorl %eax,%ebp > + psrld $30,%xmm3 > + addl %edx,%ecx > + rorl $7,%edx > + por %xmm12,%xmm8 > + xorl %eax,%edi > + movl %ecx,%esi > + addl 12(%rsp),%ebx > + movups -48(%r15),%xmm1 > +.byte 102,15,56,220,208 > + pslld $2,%xmm13 > + pxor %xmm3,%xmm8 > + xorl %ebp,%edx > + movdqa 0(%r11),%xmm3 > + roll $5,%ecx > + addl %edi,%ebx > + andl 
%edx,%esi > + pxor %xmm13,%xmm8 > + xorl %ebp,%edx > + addl %ecx,%ebx > + rorl $7,%ecx > + pshufd $238,%xmm5,%xmm9 > + xorl %ebp,%esi > + movdqa %xmm8,%xmm13 > + paddd %xmm8,%xmm3 > + movl %ebx,%edi > + addl 16(%rsp),%eax > + punpcklqdq %xmm6,%xmm9 > + xorl %edx,%ecx > + roll $5,%ebx > + addl %esi,%eax > + psrldq $4,%xmm13 > + andl %ecx,%edi > + xorl %edx,%ecx > + pxor %xmm5,%xmm9 > + addl %ebx,%eax > + rorl $7,%ebx > + movups -32(%r15),%xmm0 > +.byte 102,15,56,220,209 > + pxor %xmm7,%xmm13 > + xorl %edx,%edi > + movl %eax,%esi > + addl 20(%rsp),%ebp > + pxor %xmm13,%xmm9 > + xorl %ecx,%ebx > + roll $5,%eax > + movdqa %xmm3,0(%rsp) > + addl %edi,%ebp > + andl %ebx,%esi > + movdqa %xmm9,%xmm12 > + xorl %ecx,%ebx > + addl %eax,%ebp > + rorl $7,%eax > + movdqa %xmm9,%xmm13 > + xorl %ecx,%esi > + pslldq $12,%xmm12 > + paddd %xmm9,%xmm9 > + movl %ebp,%edi > + addl 24(%rsp),%edx > + psrld $31,%xmm13 > + xorl %ebx,%eax > + roll $5,%ebp > + addl %esi,%edx > + movups -16(%r15),%xmm1 > +.byte 102,15,56,220,208 > + movdqa %xmm12,%xmm3 > + andl %eax,%edi > + xorl %ebx,%eax > + psrld $30,%xmm12 > + addl %ebp,%edx > + rorl $7,%ebp > + por %xmm13,%xmm9 > + xorl %ebx,%edi > + movl %edx,%esi > + addl 28(%rsp),%ecx > + pslld $2,%xmm3 > + pxor %xmm12,%xmm9 > + xorl %eax,%ebp > + movdqa 16(%r11),%xmm12 > + roll $5,%edx > + addl %edi,%ecx > + andl %ebp,%esi > + pxor %xmm3,%xmm9 > + xorl %eax,%ebp > + addl %edx,%ecx > + rorl $7,%edx > + pshufd $238,%xmm6,%xmm10 > + xorl %eax,%esi > + movdqa %xmm9,%xmm3 > + paddd %xmm9,%xmm12 > + movl %ecx,%edi > + addl 32(%rsp),%ebx > + movups 0(%r15),%xmm0 > +.byte 102,15,56,220,209 > + punpcklqdq %xmm7,%xmm10 > + xorl %ebp,%edx > + roll $5,%ecx > + addl %esi,%ebx > + psrldq $4,%xmm3 > + andl %edx,%edi > + xorl %ebp,%edx > + pxor %xmm6,%xmm10 > + addl %ecx,%ebx > + rorl $7,%ecx > + pxor %xmm8,%xmm3 > + xorl %ebp,%edi > + movl %ebx,%esi > + addl 36(%rsp),%eax > + pxor %xmm3,%xmm10 > + xorl %edx,%ecx > + roll $5,%ebx > + movdqa %xmm12,16(%rsp) > + addl %edi,%eax > + andl %ecx,%esi > + movdqa %xmm10,%xmm13 > + xorl %edx,%ecx > + addl %ebx,%eax > + rorl $7,%ebx > + movups 16(%r15),%xmm1 > +.byte 102,15,56,220,208 > + movdqa %xmm10,%xmm3 > + xorl %edx,%esi > + pslldq $12,%xmm13 > + paddd %xmm10,%xmm10 > + movl %eax,%edi > + addl 40(%rsp),%ebp > + psrld $31,%xmm3 > + xorl %ecx,%ebx > + roll $5,%eax > + addl %esi,%ebp > + movdqa %xmm13,%xmm12 > + andl %ebx,%edi > + xorl %ecx,%ebx > + psrld $30,%xmm13 > + addl %eax,%ebp > + rorl $7,%eax > + por %xmm3,%xmm10 > + xorl %ecx,%edi > + movl %ebp,%esi > + addl 44(%rsp),%edx > + pslld $2,%xmm12 > + pxor %xmm13,%xmm10 > + xorl %ebx,%eax > + movdqa 16(%r11),%xmm13 > + roll $5,%ebp > + addl %edi,%edx > + movups 32(%r15),%xmm0 > +.byte 102,15,56,220,209 > + andl %eax,%esi > + pxor %xmm12,%xmm10 > + xorl %ebx,%eax > + addl %ebp,%edx > + rorl $7,%ebp > + pshufd $238,%xmm7,%xmm11 > + xorl %ebx,%esi > + movdqa %xmm10,%xmm12 > + paddd %xmm10,%xmm13 > + movl %edx,%edi > + addl 48(%rsp),%ecx > + punpcklqdq %xmm8,%xmm11 > + xorl %eax,%ebp > + roll $5,%edx > + addl %esi,%ecx > + psrldq $4,%xmm12 > + andl %ebp,%edi > + xorl %eax,%ebp > + pxor %xmm7,%xmm11 > + addl %edx,%ecx > + rorl $7,%edx > + pxor %xmm9,%xmm12 > + xorl %eax,%edi > + movl %ecx,%esi > + addl 52(%rsp),%ebx > + movups 48(%r15),%xmm1 > +.byte 102,15,56,220,208 > + pxor %xmm12,%xmm11 > + xorl %ebp,%edx > + roll $5,%ecx > + movdqa %xmm13,32(%rsp) > + addl %edi,%ebx > + andl %edx,%esi > + movdqa %xmm11,%xmm3 > + xorl %ebp,%edx > + addl %ecx,%ebx > + rorl $7,%ecx > + movdqa %xmm11,%xmm12 > + xorl 
%ebp,%esi > + pslldq $12,%xmm3 > + paddd %xmm11,%xmm11 > + movl %ebx,%edi > + addl 56(%rsp),%eax > + psrld $31,%xmm12 > + xorl %edx,%ecx > + roll $5,%ebx > + addl %esi,%eax > + movdqa %xmm3,%xmm13 > + andl %ecx,%edi > + xorl %edx,%ecx > + psrld $30,%xmm3 > + addl %ebx,%eax > + rorl $7,%ebx > + cmpl $11,%r8d > + jb .Laesenclast1 > + movups 64(%r15),%xmm0 > +.byte 102,15,56,220,209 > + movups 80(%r15),%xmm1 > +.byte 102,15,56,220,208 > + je .Laesenclast1 > + movups 96(%r15),%xmm0 > +.byte 102,15,56,220,209 > + movups 112(%r15),%xmm1 > +.byte 102,15,56,220,208 > +.Laesenclast1: > +.byte 102,15,56,221,209 > + movups 16-112(%r15),%xmm0 > + por %xmm12,%xmm11 > + xorl %edx,%edi > + movl %eax,%esi > + addl 60(%rsp),%ebp > + pslld $2,%xmm13 > + pxor %xmm3,%xmm11 > + xorl %ecx,%ebx > + movdqa 16(%r11),%xmm3 > + roll $5,%eax > + addl %edi,%ebp > + andl %ebx,%esi > + pxor %xmm13,%xmm11 > + pshufd $238,%xmm10,%xmm13 > + xorl %ecx,%ebx > + addl %eax,%ebp > + rorl $7,%eax > + pxor %xmm8,%xmm4 > + xorl %ecx,%esi > + movl %ebp,%edi > + addl 0(%rsp),%edx > + punpcklqdq %xmm11,%xmm13 > + xorl %ebx,%eax > + roll $5,%ebp > + pxor %xmm5,%xmm4 > + addl %esi,%edx > + movups 16(%r12),%xmm14 > + xorps %xmm15,%xmm14 > + movups %xmm2,0(%r12,%r13,1) > + xorps %xmm14,%xmm2 > + movups -80(%r15),%xmm1 > +.byte 102,15,56,220,208 > + andl %eax,%edi > + movdqa %xmm3,%xmm12 > + xorl %ebx,%eax > + paddd %xmm11,%xmm3 > + addl %ebp,%edx > + pxor %xmm13,%xmm4 > + rorl $7,%ebp > + xorl %ebx,%edi > + movl %edx,%esi > + addl 4(%rsp),%ecx > + movdqa %xmm4,%xmm13 > + xorl %eax,%ebp > + roll $5,%edx > + movdqa %xmm3,48(%rsp) > + addl %edi,%ecx > + andl %ebp,%esi > + xorl %eax,%ebp > + pslld $2,%xmm4 > + addl %edx,%ecx > + rorl $7,%edx > + psrld $30,%xmm13 > + xorl %eax,%esi > + movl %ecx,%edi > + addl 8(%rsp),%ebx > + movups -64(%r15),%xmm0 > +.byte 102,15,56,220,209 > + por %xmm13,%xmm4 > + xorl %ebp,%edx > + roll $5,%ecx > + pshufd $238,%xmm11,%xmm3 > + addl %esi,%ebx > + andl %edx,%edi > + xorl %ebp,%edx > + addl %ecx,%ebx > + addl 12(%rsp),%eax > + xorl %ebp,%edi > + movl %ebx,%esi > + roll $5,%ebx > + addl %edi,%eax > + xorl %edx,%esi > + rorl $7,%ecx > + addl %ebx,%eax > + pxor %xmm9,%xmm5 > + addl 16(%rsp),%ebp > + movups -48(%r15),%xmm1 > +.byte 102,15,56,220,208 > + xorl %ecx,%esi > + punpcklqdq %xmm4,%xmm3 > + movl %eax,%edi > + roll $5,%eax > + pxor %xmm6,%xmm5 > + addl %esi,%ebp > + xorl %ecx,%edi > + movdqa %xmm12,%xmm13 > + rorl $7,%ebx > + paddd %xmm4,%xmm12 > + addl %eax,%ebp > + pxor %xmm3,%xmm5 > + addl 20(%rsp),%edx > + xorl %ebx,%edi > + movl %ebp,%esi > + roll $5,%ebp > + movdqa %xmm5,%xmm3 > + addl %edi,%edx > + xorl %ebx,%esi > + movdqa %xmm12,0(%rsp) > + rorl $7,%eax > + addl %ebp,%edx > + addl 24(%rsp),%ecx > + pslld $2,%xmm5 > + xorl %eax,%esi > + movl %edx,%edi > + psrld $30,%xmm3 > + roll $5,%edx > + addl %esi,%ecx > + movups -32(%r15),%xmm0 > +.byte 102,15,56,220,209 > + xorl %eax,%edi > + rorl $7,%ebp > + por %xmm3,%xmm5 > + addl %edx,%ecx > + addl 28(%rsp),%ebx > + pshufd $238,%xmm4,%xmm12 > + xorl %ebp,%edi > + movl %ecx,%esi > + roll $5,%ecx > + addl %edi,%ebx > + xorl %ebp,%esi > + rorl $7,%edx > + addl %ecx,%ebx > + pxor %xmm10,%xmm6 > + addl 32(%rsp),%eax > + xorl %edx,%esi > + punpcklqdq %xmm5,%xmm12 > + movl %ebx,%edi > + roll $5,%ebx > + pxor %xmm7,%xmm6 > + addl %esi,%eax > + xorl %edx,%edi > + movdqa 32(%r11),%xmm3 > + rorl $7,%ecx > + paddd %xmm5,%xmm13 > + addl %ebx,%eax > + pxor %xmm12,%xmm6 > + addl 36(%rsp),%ebp > + movups -16(%r15),%xmm1 > +.byte 102,15,56,220,208 > + xorl %ecx,%edi > + 
movl %eax,%esi > + roll $5,%eax > + movdqa %xmm6,%xmm12 > + addl %edi,%ebp > + xorl %ecx,%esi > + movdqa %xmm13,16(%rsp) > + rorl $7,%ebx > + addl %eax,%ebp > + addl 40(%rsp),%edx > + pslld $2,%xmm6 > + xorl %ebx,%esi > + movl %ebp,%edi > + psrld $30,%xmm12 > + roll $5,%ebp > + addl %esi,%edx > + xorl %ebx,%edi > + rorl $7,%eax > + por %xmm12,%xmm6 > + addl %ebp,%edx > + addl 44(%rsp),%ecx > + pshufd $238,%xmm5,%xmm13 > + xorl %eax,%edi > + movl %edx,%esi > + roll $5,%edx > + addl %edi,%ecx > + movups 0(%r15),%xmm0 > +.byte 102,15,56,220,209 > + xorl %eax,%esi > + rorl $7,%ebp > + addl %edx,%ecx > + pxor %xmm11,%xmm7 > + addl 48(%rsp),%ebx > + xorl %ebp,%esi > + punpcklqdq %xmm6,%xmm13 > + movl %ecx,%edi > + roll $5,%ecx > + pxor %xmm8,%xmm7 > + addl %esi,%ebx > + xorl %ebp,%edi > + movdqa %xmm3,%xmm12 > + rorl $7,%edx > + paddd %xmm6,%xmm3 > + addl %ecx,%ebx > + pxor %xmm13,%xmm7 > + addl 52(%rsp),%eax > + xorl %edx,%edi > + movl %ebx,%esi > + roll $5,%ebx > + movdqa %xmm7,%xmm13 > + addl %edi,%eax > + xorl %edx,%esi > + movdqa %xmm3,32(%rsp) > + rorl $7,%ecx > + addl %ebx,%eax > + addl 56(%rsp),%ebp > + movups 16(%r15),%xmm1 > +.byte 102,15,56,220,208 > + pslld $2,%xmm7 > + xorl %ecx,%esi > + movl %eax,%edi > + psrld $30,%xmm13 > + roll $5,%eax > + addl %esi,%ebp > + xorl %ecx,%edi > + rorl $7,%ebx > + por %xmm13,%xmm7 > + addl %eax,%ebp > + addl 60(%rsp),%edx > + pshufd $238,%xmm6,%xmm3 > + xorl %ebx,%edi > + movl %ebp,%esi > + roll $5,%ebp > + addl %edi,%edx > + xorl %ebx,%esi > + rorl $7,%eax > + addl %ebp,%edx > + pxor %xmm4,%xmm8 > + addl 0(%rsp),%ecx > + xorl %eax,%esi > + punpcklqdq %xmm7,%xmm3 > + movl %edx,%edi > + roll $5,%edx > + pxor %xmm9,%xmm8 > + addl %esi,%ecx > + movups 32(%r15),%xmm0 > +.byte 102,15,56,220,209 > + xorl %eax,%edi > + movdqa %xmm12,%xmm13 > + rorl $7,%ebp > + paddd %xmm7,%xmm12 > + addl %edx,%ecx > + pxor %xmm3,%xmm8 > + addl 4(%rsp),%ebx > + xorl %ebp,%edi > + movl %ecx,%esi > + roll $5,%ecx > + movdqa %xmm8,%xmm3 > + addl %edi,%ebx > + xorl %ebp,%esi > + movdqa %xmm12,48(%rsp) > + rorl $7,%edx > + addl %ecx,%ebx > + addl 8(%rsp),%eax > + pslld $2,%xmm8 > + xorl %edx,%esi > + movl %ebx,%edi > + psrld $30,%xmm3 > + roll $5,%ebx > + addl %esi,%eax > + xorl %edx,%edi > + rorl $7,%ecx > + por %xmm3,%xmm8 > + addl %ebx,%eax > + addl 12(%rsp),%ebp > + movups 48(%r15),%xmm1 > +.byte 102,15,56,220,208 > + pshufd $238,%xmm7,%xmm12 > + xorl %ecx,%edi > + movl %eax,%esi > + roll $5,%eax > + addl %edi,%ebp > + xorl %ecx,%esi > + rorl $7,%ebx > + addl %eax,%ebp > + pxor %xmm5,%xmm9 > + addl 16(%rsp),%edx > + xorl %ebx,%esi > + punpcklqdq %xmm8,%xmm12 > + movl %ebp,%edi > + roll $5,%ebp > + pxor %xmm10,%xmm9 > + addl %esi,%edx > + xorl %ebx,%edi > + movdqa %xmm13,%xmm3 > + rorl $7,%eax > + paddd %xmm8,%xmm13 > + addl %ebp,%edx > + pxor %xmm12,%xmm9 > + addl 20(%rsp),%ecx > + xorl %eax,%edi > + movl %edx,%esi > + roll $5,%edx > + movdqa %xmm9,%xmm12 > + addl %edi,%ecx > + cmpl $11,%r8d > + jb .Laesenclast2 > + movups 64(%r15),%xmm0 > +.byte 102,15,56,220,209 > + movups 80(%r15),%xmm1 > +.byte 102,15,56,220,208 > + je .Laesenclast2 > + movups 96(%r15),%xmm0 > +.byte 102,15,56,220,209 > + movups 112(%r15),%xmm1 > +.byte 102,15,56,220,208 > +.Laesenclast2: > +.byte 102,15,56,221,209 > + movups 16-112(%r15),%xmm0 > + xorl %eax,%esi > + movdqa %xmm13,0(%rsp) > + rorl $7,%ebp > + addl %edx,%ecx > + addl 24(%rsp),%ebx > + pslld $2,%xmm9 > + xorl %ebp,%esi > + movl %ecx,%edi > + psrld $30,%xmm12 > + roll $5,%ecx > + addl %esi,%ebx > + xorl %ebp,%edi > + rorl $7,%edx > + por 
%xmm12,%xmm9 > + addl %ecx,%ebx > + addl 28(%rsp),%eax > + pshufd $238,%xmm8,%xmm13 > + rorl $7,%ecx > + movl %ebx,%esi > + xorl %edx,%edi > + roll $5,%ebx > + addl %edi,%eax > + xorl %ecx,%esi > + xorl %edx,%ecx > + addl %ebx,%eax > + pxor %xmm6,%xmm10 > + addl 32(%rsp),%ebp > + movups 32(%r12),%xmm14 > + xorps %xmm15,%xmm14 > + movups %xmm2,16(%r13,%r12,1) > + xorps %xmm14,%xmm2 > + movups -80(%r15),%xmm1 > +.byte 102,15,56,220,208 > + andl %ecx,%esi > + xorl %edx,%ecx > + rorl $7,%ebx > + punpcklqdq %xmm9,%xmm13 > + movl %eax,%edi > + xorl %ecx,%esi > + pxor %xmm11,%xmm10 > + roll $5,%eax > + addl %esi,%ebp > + movdqa %xmm3,%xmm12 > + xorl %ebx,%edi > + paddd %xmm9,%xmm3 > + xorl %ecx,%ebx > + pxor %xmm13,%xmm10 > + addl %eax,%ebp > + addl 36(%rsp),%edx > + andl %ebx,%edi > + xorl %ecx,%ebx > + rorl $7,%eax > + movdqa %xmm10,%xmm13 > + movl %ebp,%esi > + xorl %ebx,%edi > + movdqa %xmm3,16(%rsp) > + roll $5,%ebp > + addl %edi,%edx > + movups -64(%r15),%xmm0 > +.byte 102,15,56,220,209 > + xorl %eax,%esi > + pslld $2,%xmm10 > + xorl %ebx,%eax > + addl %ebp,%edx > + psrld $30,%xmm13 > + addl 40(%rsp),%ecx > + andl %eax,%esi > + xorl %ebx,%eax > + por %xmm13,%xmm10 > + rorl $7,%ebp > + movl %edx,%edi > + xorl %eax,%esi > + roll $5,%edx > + pshufd $238,%xmm9,%xmm3 > + addl %esi,%ecx > + xorl %ebp,%edi > + xorl %eax,%ebp > + addl %edx,%ecx > + addl 44(%rsp),%ebx > + andl %ebp,%edi > + xorl %eax,%ebp > + rorl $7,%edx > + movups -48(%r15),%xmm1 > +.byte 102,15,56,220,208 > + movl %ecx,%esi > + xorl %ebp,%edi > + roll $5,%ecx > + addl %edi,%ebx > + xorl %edx,%esi > + xorl %ebp,%edx > + addl %ecx,%ebx > + pxor %xmm7,%xmm11 > + addl 48(%rsp),%eax > + andl %edx,%esi > + xorl %ebp,%edx > + rorl $7,%ecx > + punpcklqdq %xmm10,%xmm3 > + movl %ebx,%edi > + xorl %edx,%esi > + pxor %xmm4,%xmm11 > + roll $5,%ebx > + addl %esi,%eax > + movdqa 48(%r11),%xmm13 > + xorl %ecx,%edi > + paddd %xmm10,%xmm12 > + xorl %edx,%ecx > + pxor %xmm3,%xmm11 > + addl %ebx,%eax > + addl 52(%rsp),%ebp > + movups -32(%r15),%xmm0 > +.byte 102,15,56,220,209 > + andl %ecx,%edi > + xorl %edx,%ecx > + rorl $7,%ebx > + movdqa %xmm11,%xmm3 > + movl %eax,%esi > + xorl %ecx,%edi > + movdqa %xmm12,32(%rsp) > + roll $5,%eax > + addl %edi,%ebp > + xorl %ebx,%esi > + pslld $2,%xmm11 > + xorl %ecx,%ebx > + addl %eax,%ebp > + psrld $30,%xmm3 > + addl 56(%rsp),%edx > + andl %ebx,%esi > + xorl %ecx,%ebx > + por %xmm3,%xmm11 > + rorl $7,%eax > + movl %ebp,%edi > + xorl %ebx,%esi > + roll $5,%ebp > + pshufd $238,%xmm10,%xmm12 > + addl %esi,%edx > + movups -16(%r15),%xmm1 > +.byte 102,15,56,220,208 > + xorl %eax,%edi > + xorl %ebx,%eax > + addl %ebp,%edx > + addl 60(%rsp),%ecx > + andl %eax,%edi > + xorl %ebx,%eax > + rorl $7,%ebp > + movl %edx,%esi > + xorl %eax,%edi > + roll $5,%edx > + addl %edi,%ecx > + xorl %ebp,%esi > + xorl %eax,%ebp > + addl %edx,%ecx > + pxor %xmm8,%xmm4 > + addl 0(%rsp),%ebx > + andl %ebp,%esi > + xorl %eax,%ebp > + rorl $7,%edx > + movups 0(%r15),%xmm0 > +.byte 102,15,56,220,209 > + punpcklqdq %xmm11,%xmm12 > + movl %ecx,%edi > + xorl %ebp,%esi > + pxor %xmm5,%xmm4 > + roll $5,%ecx > + addl %esi,%ebx > + movdqa %xmm13,%xmm3 > + xorl %edx,%edi > + paddd %xmm11,%xmm13 > + xorl %ebp,%edx > + pxor %xmm12,%xmm4 > + addl %ecx,%ebx > + addl 4(%rsp),%eax > + andl %edx,%edi > + xorl %ebp,%edx > + rorl $7,%ecx > + movdqa %xmm4,%xmm12 > + movl %ebx,%esi > + xorl %edx,%edi > + movdqa %xmm13,48(%rsp) > + roll $5,%ebx > + addl %edi,%eax > + xorl %ecx,%esi > + pslld $2,%xmm4 > + xorl %edx,%ecx > + addl %ebx,%eax > + psrld $30,%xmm12 > 
+ addl 8(%rsp),%ebp > + movups 16(%r15),%xmm1 > +.byte 102,15,56,220,208 > + andl %ecx,%esi > + xorl %edx,%ecx > + por %xmm12,%xmm4 > + rorl $7,%ebx > + movl %eax,%edi > + xorl %ecx,%esi > + roll $5,%eax > + pshufd $238,%xmm11,%xmm13 > + addl %esi,%ebp > + xorl %ebx,%edi > + xorl %ecx,%ebx > + addl %eax,%ebp > + addl 12(%rsp),%edx > + andl %ebx,%edi > + xorl %ecx,%ebx > + rorl $7,%eax > + movl %ebp,%esi > + xorl %ebx,%edi > + roll $5,%ebp > + addl %edi,%edx > + movups 32(%r15),%xmm0 > +.byte 102,15,56,220,209 > + xorl %eax,%esi > + xorl %ebx,%eax > + addl %ebp,%edx > + pxor %xmm9,%xmm5 > + addl 16(%rsp),%ecx > + andl %eax,%esi > + xorl %ebx,%eax > + rorl $7,%ebp > + punpcklqdq %xmm4,%xmm13 > + movl %edx,%edi > + xorl %eax,%esi > + pxor %xmm6,%xmm5 > + roll $5,%edx > + addl %esi,%ecx > + movdqa %xmm3,%xmm12 > + xorl %ebp,%edi > + paddd %xmm4,%xmm3 > + xorl %eax,%ebp > + pxor %xmm13,%xmm5 > + addl %edx,%ecx > + addl 20(%rsp),%ebx > + andl %ebp,%edi > + xorl %eax,%ebp > + rorl $7,%edx > + movups 48(%r15),%xmm1 > +.byte 102,15,56,220,208 > + movdqa %xmm5,%xmm13 > + movl %ecx,%esi > + xorl %ebp,%edi > + movdqa %xmm3,0(%rsp) > + roll $5,%ecx > + addl %edi,%ebx > + xorl %edx,%esi > + pslld $2,%xmm5 > + xorl %ebp,%edx > + addl %ecx,%ebx > + psrld $30,%xmm13 > + addl 24(%rsp),%eax > + andl %edx,%esi > + xorl %ebp,%edx > + por %xmm13,%xmm5 > + rorl $7,%ecx > + movl %ebx,%edi > + xorl %edx,%esi > + roll $5,%ebx > + pshufd $238,%xmm4,%xmm3 > + addl %esi,%eax > + xorl %ecx,%edi > + xorl %edx,%ecx > + addl %ebx,%eax > + addl 28(%rsp),%ebp > + cmpl $11,%r8d > + jb .Laesenclast3 > + movups 64(%r15),%xmm0 > +.byte 102,15,56,220,209 > + movups 80(%r15),%xmm1 > +.byte 102,15,56,220,208 > + je .Laesenclast3 > + movups 96(%r15),%xmm0 > +.byte 102,15,56,220,209 > + movups 112(%r15),%xmm1 > +.byte 102,15,56,220,208 > +.Laesenclast3: > +.byte 102,15,56,221,209 > + movups 16-112(%r15),%xmm0 > + andl %ecx,%edi > + xorl %edx,%ecx > + rorl $7,%ebx > + movl %eax,%esi > + xorl %ecx,%edi > + roll $5,%eax > + addl %edi,%ebp > + xorl %ebx,%esi > + xorl %ecx,%ebx > + addl %eax,%ebp > + pxor %xmm10,%xmm6 > + addl 32(%rsp),%edx > + andl %ebx,%esi > + xorl %ecx,%ebx > + rorl $7,%eax > + punpcklqdq %xmm5,%xmm3 > + movl %ebp,%edi > + xorl %ebx,%esi > + pxor %xmm7,%xmm6 > + roll $5,%ebp > + addl %esi,%edx > + movups 48(%r12),%xmm14 > + xorps %xmm15,%xmm14 > + movups %xmm2,32(%r13,%r12,1) > + xorps %xmm14,%xmm2 > + movups -80(%r15),%xmm1 > +.byte 102,15,56,220,208 > + movdqa %xmm12,%xmm13 > + xorl %eax,%edi > + paddd %xmm5,%xmm12 > + xorl %ebx,%eax > + pxor %xmm3,%xmm6 > + addl %ebp,%edx > + addl 36(%rsp),%ecx > + andl %eax,%edi > + xorl %ebx,%eax > + rorl $7,%ebp > + movdqa %xmm6,%xmm3 > + movl %edx,%esi > + xorl %eax,%edi > + movdqa %xmm12,16(%rsp) > + roll $5,%edx > + addl %edi,%ecx > + xorl %ebp,%esi > + pslld $2,%xmm6 > + xorl %eax,%ebp > + addl %edx,%ecx > + psrld $30,%xmm3 > + addl 40(%rsp),%ebx > + andl %ebp,%esi > + xorl %eax,%ebp > + por %xmm3,%xmm6 > + rorl $7,%edx > + movups -64(%r15),%xmm0 > +.byte 102,15,56,220,209 > + movl %ecx,%edi > + xorl %ebp,%esi > + roll $5,%ecx > + pshufd $238,%xmm5,%xmm12 > + addl %esi,%ebx > + xorl %edx,%edi > + xorl %ebp,%edx > + addl %ecx,%ebx > + addl 44(%rsp),%eax > + andl %edx,%edi > + xorl %ebp,%edx > + rorl $7,%ecx > + movl %ebx,%esi > + xorl %edx,%edi > + roll $5,%ebx > + addl %edi,%eax > + xorl %edx,%esi > + addl %ebx,%eax > + pxor %xmm11,%xmm7 > + addl 48(%rsp),%ebp > + movups -48(%r15),%xmm1 > +.byte 102,15,56,220,208 > + xorl %ecx,%esi > + punpcklqdq %xmm6,%xmm12 > + movl 
%eax,%edi > + roll $5,%eax > + pxor %xmm8,%xmm7 > + addl %esi,%ebp > + xorl %ecx,%edi > + movdqa %xmm13,%xmm3 > + rorl $7,%ebx > + paddd %xmm6,%xmm13 > + addl %eax,%ebp > + pxor %xmm12,%xmm7 > + addl 52(%rsp),%edx > + xorl %ebx,%edi > + movl %ebp,%esi > + roll $5,%ebp > + movdqa %xmm7,%xmm12 > + addl %edi,%edx > + xorl %ebx,%esi > + movdqa %xmm13,32(%rsp) > + rorl $7,%eax > + addl %ebp,%edx > + addl 56(%rsp),%ecx > + pslld $2,%xmm7 > + xorl %eax,%esi > + movl %edx,%edi > + psrld $30,%xmm12 > + roll $5,%edx > + addl %esi,%ecx > + movups -32(%r15),%xmm0 > +.byte 102,15,56,220,209 > + xorl %eax,%edi > + rorl $7,%ebp > + por %xmm12,%xmm7 > + addl %edx,%ecx > + addl 60(%rsp),%ebx > + xorl %ebp,%edi > + movl %ecx,%esi > + roll $5,%ecx > + addl %edi,%ebx > + xorl %ebp,%esi > + rorl $7,%edx > + addl %ecx,%ebx > + addl 0(%rsp),%eax > + xorl %edx,%esi > + movl %ebx,%edi > + roll $5,%ebx > + paddd %xmm7,%xmm3 > + addl %esi,%eax > + xorl %edx,%edi > + movdqa %xmm3,48(%rsp) > + rorl $7,%ecx > + addl %ebx,%eax > + addl 4(%rsp),%ebp > + movups -16(%r15),%xmm1 > +.byte 102,15,56,220,208 > + xorl %ecx,%edi > + movl %eax,%esi > + roll $5,%eax > + addl %edi,%ebp > + xorl %ecx,%esi > + rorl $7,%ebx > + addl %eax,%ebp > + addl 8(%rsp),%edx > + xorl %ebx,%esi > + movl %ebp,%edi > + roll $5,%ebp > + addl %esi,%edx > + xorl %ebx,%edi > + rorl $7,%eax > + addl %ebp,%edx > + addl 12(%rsp),%ecx > + xorl %eax,%edi > + movl %edx,%esi > + roll $5,%edx > + addl %edi,%ecx > + movups 0(%r15),%xmm0 > +.byte 102,15,56,220,209 > + xorl %eax,%esi > + rorl $7,%ebp > + addl %edx,%ecx > + cmpq %r14,%r10 > + je .Ldone_ssse3 > + movdqa 64(%r11),%xmm3 > + movdqa 0(%r11),%xmm13 > + movdqu 0(%r10),%xmm4 > + movdqu 16(%r10),%xmm5 > + movdqu 32(%r10),%xmm6 > + movdqu 48(%r10),%xmm7 > +.byte 102,15,56,0,227 > + addq $64,%r10 > + addl 16(%rsp),%ebx > + xorl %ebp,%esi > + movl %ecx,%edi > +.byte 102,15,56,0,235 > + roll $5,%ecx > + addl %esi,%ebx > + xorl %ebp,%edi > + rorl $7,%edx > + paddd %xmm13,%xmm4 > + addl %ecx,%ebx > + addl 20(%rsp),%eax > + xorl %edx,%edi > + movl %ebx,%esi > + movdqa %xmm4,0(%rsp) > + roll $5,%ebx > + addl %edi,%eax > + xorl %edx,%esi > + rorl $7,%ecx > + psubd %xmm13,%xmm4 > + addl %ebx,%eax > + addl 24(%rsp),%ebp > + movups 16(%r15),%xmm1 > +.byte 102,15,56,220,208 > + xorl %ecx,%esi > + movl %eax,%edi > + roll $5,%eax > + addl %esi,%ebp > + xorl %ecx,%edi > + rorl $7,%ebx > + addl %eax,%ebp > + addl 28(%rsp),%edx > + xorl %ebx,%edi > + movl %ebp,%esi > + roll $5,%ebp > + addl %edi,%edx > + xorl %ebx,%esi > + rorl $7,%eax > + addl %ebp,%edx > + addl 32(%rsp),%ecx > + xorl %eax,%esi > + movl %edx,%edi > +.byte 102,15,56,0,243 > + roll $5,%edx > + addl %esi,%ecx > + movups 32(%r15),%xmm0 > +.byte 102,15,56,220,209 > + xorl %eax,%edi > + rorl $7,%ebp > + paddd %xmm13,%xmm5 > + addl %edx,%ecx > + addl 36(%rsp),%ebx > + xorl %ebp,%edi > + movl %ecx,%esi > + movdqa %xmm5,16(%rsp) > + roll $5,%ecx > + addl %edi,%ebx > + xorl %ebp,%esi > + rorl $7,%edx > + psubd %xmm13,%xmm5 > + addl %ecx,%ebx > + addl 40(%rsp),%eax > + xorl %edx,%esi > + movl %ebx,%edi > + roll $5,%ebx > + addl %esi,%eax > + xorl %edx,%edi > + rorl $7,%ecx > + addl %ebx,%eax > + addl 44(%rsp),%ebp > + movups 48(%r15),%xmm1 > +.byte 102,15,56,220,208 > + xorl %ecx,%edi > + movl %eax,%esi > + roll $5,%eax > + addl %edi,%ebp > + xorl %ecx,%esi > + rorl $7,%ebx > + addl %eax,%ebp > + addl 48(%rsp),%edx > + xorl %ebx,%esi > + movl %ebp,%edi > +.byte 102,15,56,0,251 > + roll $5,%ebp > + addl %esi,%edx > + xorl %ebx,%edi > + rorl $7,%eax > + paddd 
%xmm13,%xmm6 > + addl %ebp,%edx > + addl 52(%rsp),%ecx > + xorl %eax,%edi > + movl %edx,%esi > + movdqa %xmm6,32(%rsp) > + roll $5,%edx > + addl %edi,%ecx > + cmpl $11,%r8d > + jb .Laesenclast4 > + movups 64(%r15),%xmm0 > +.byte 102,15,56,220,209 > + movups 80(%r15),%xmm1 > +.byte 102,15,56,220,208 > + je .Laesenclast4 > + movups 96(%r15),%xmm0 > +.byte 102,15,56,220,209 > + movups 112(%r15),%xmm1 > +.byte 102,15,56,220,208 > +.Laesenclast4: > +.byte 102,15,56,221,209 > + movups 16-112(%r15),%xmm0 > + xorl %eax,%esi > + rorl $7,%ebp > + psubd %xmm13,%xmm6 > + addl %edx,%ecx > + addl 56(%rsp),%ebx > + xorl %ebp,%esi > + movl %ecx,%edi > + roll $5,%ecx > + addl %esi,%ebx > + xorl %ebp,%edi > + rorl $7,%edx > + addl %ecx,%ebx > + addl 60(%rsp),%eax > + xorl %edx,%edi > + movl %ebx,%esi > + roll $5,%ebx > + addl %edi,%eax > + rorl $7,%ecx > + addl %ebx,%eax > + movups %xmm2,48(%r13,%r12,1) > + leaq 64(%r12),%r12 > + > + addl 0(%r9),%eax > + addl 4(%r9),%esi > + addl 8(%r9),%ecx > + addl 12(%r9),%edx > + movl %eax,0(%r9) > + addl 16(%r9),%ebp > + movl %esi,4(%r9) > + movl %esi,%ebx > + movl %ecx,8(%r9) > + movl %ecx,%edi > + movl %edx,12(%r9) > + xorl %edx,%edi > + movl %ebp,16(%r9) > + andl %edi,%esi > + jmp .Loop_ssse3 > + > +.Ldone_ssse3: > + addl 16(%rsp),%ebx > + xorl %ebp,%esi > + movl %ecx,%edi > + roll $5,%ecx > + addl %esi,%ebx > + xorl %ebp,%edi > + rorl $7,%edx > + addl %ecx,%ebx > + addl 20(%rsp),%eax > + xorl %edx,%edi > + movl %ebx,%esi > + roll $5,%ebx > + addl %edi,%eax > + xorl %edx,%esi > + rorl $7,%ecx > + addl %ebx,%eax > + addl 24(%rsp),%ebp > + movups 16(%r15),%xmm1 > +.byte 102,15,56,220,208 > + xorl %ecx,%esi > + movl %eax,%edi > + roll $5,%eax > + addl %esi,%ebp > + xorl %ecx,%edi > + rorl $7,%ebx > + addl %eax,%ebp > + addl 28(%rsp),%edx > + xorl %ebx,%edi > + movl %ebp,%esi > + roll $5,%ebp > + addl %edi,%edx > + xorl %ebx,%esi > + rorl $7,%eax > + addl %ebp,%edx > + addl 32(%rsp),%ecx > + xorl %eax,%esi > + movl %edx,%edi > + roll $5,%edx > + addl %esi,%ecx > + movups 32(%r15),%xmm0 > +.byte 102,15,56,220,209 > + xorl %eax,%edi > + rorl $7,%ebp > + addl %edx,%ecx > + addl 36(%rsp),%ebx > + xorl %ebp,%edi > + movl %ecx,%esi > + roll $5,%ecx > + addl %edi,%ebx > + xorl %ebp,%esi > + rorl $7,%edx > + addl %ecx,%ebx > + addl 40(%rsp),%eax > + xorl %edx,%esi > + movl %ebx,%edi > + roll $5,%ebx > + addl %esi,%eax > + xorl %edx,%edi > + rorl $7,%ecx > + addl %ebx,%eax > + addl 44(%rsp),%ebp > + movups 48(%r15),%xmm1 > +.byte 102,15,56,220,208 > + xorl %ecx,%edi > + movl %eax,%esi > + roll $5,%eax > + addl %edi,%ebp > + xorl %ecx,%esi > + rorl $7,%ebx > + addl %eax,%ebp > + addl 48(%rsp),%edx > + xorl %ebx,%esi > + movl %ebp,%edi > + roll $5,%ebp > + addl %esi,%edx > + xorl %ebx,%edi > + rorl $7,%eax > + addl %ebp,%edx > + addl 52(%rsp),%ecx > + xorl %eax,%edi > + movl %edx,%esi > + roll $5,%edx > + addl %edi,%ecx > + cmpl $11,%r8d > + jb .Laesenclast5 > + movups 64(%r15),%xmm0 > +.byte 102,15,56,220,209 > + movups 80(%r15),%xmm1 > +.byte 102,15,56,220,208 > + je .Laesenclast5 > + movups 96(%r15),%xmm0 > +.byte 102,15,56,220,209 > + movups 112(%r15),%xmm1 > +.byte 102,15,56,220,208 > +.Laesenclast5: > +.byte 102,15,56,221,209 > + movups 16-112(%r15),%xmm0 > + xorl %eax,%esi > + rorl $7,%ebp > + addl %edx,%ecx > + addl 56(%rsp),%ebx > + xorl %ebp,%esi > + movl %ecx,%edi > + roll $5,%ecx > + addl %esi,%ebx > + xorl %ebp,%edi > + rorl $7,%edx > + addl %ecx,%ebx > + addl 60(%rsp),%eax > + xorl %edx,%edi > + movl %ebx,%esi > + roll $5,%ebx > + addl %edi,%eax > + rorl $7,%ecx > + 
addl %ebx,%eax > + movups %xmm2,48(%r13,%r12,1) > + movq 88(%rsp),%r8 > + > + addl 0(%r9),%eax > + addl 4(%r9),%esi > + addl 8(%r9),%ecx > + movl %eax,0(%r9) > + addl 12(%r9),%edx > + movl %esi,4(%r9) > + addl 16(%r9),%ebp > + movl %ecx,8(%r9) > + movl %edx,12(%r9) > + movl %ebp,16(%r9) > + movups %xmm2,(%r8) > + leaq 104(%rsp),%rsi > +.cfi_def_cfa %rsi,56 > + movq 0(%rsi),%r15 > +.cfi_restore %r15 > + movq 8(%rsi),%r14 > +.cfi_restore %r14 > + movq 16(%rsi),%r13 > +.cfi_restore %r13 > + movq 24(%rsi),%r12 > +.cfi_restore %r12 > + movq 32(%rsi),%rbp > +.cfi_restore %rbp > + movq 40(%rsi),%rbx > +.cfi_restore %rbx > + leaq 48(%rsi),%rsp > +.cfi_def_cfa %rsp,8 > +.Lepilogue_ssse3: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size aesni_cbc_sha1_enc_ssse3,.-aesni_cbc_sha1_enc_ssse3 > +.align 64 > +K_XX_XX: > +.long 0x5a827999,0x5a827999,0x5a827999,0x5a827999 > +.long 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > +.long 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > +.long 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > +.byte 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0 > + > +.byte > 65,69,83,78,73,45,67,66,67,43,83,72,65,49,32,115,116,105,116,99,104,32,102, > 111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121, > 32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62, > 0 > +.align 64 > +.type aesni_cbc_sha1_enc_shaext,@function > +.align 32 > +aesni_cbc_sha1_enc_shaext: > +.cfi_startproc > + movq 8(%rsp),%r10 > + movdqu (%r9),%xmm8 > + movd 16(%r9),%xmm9 > + movdqa K_XX_XX+80(%rip),%xmm7 > + > + movl 240(%rcx),%r11d > + subq %rdi,%rsi > + movups (%rcx),%xmm15 > + movups (%r8),%xmm2 > + movups 16(%rcx),%xmm0 > + leaq 112(%rcx),%rcx > + > + pshufd $27,%xmm8,%xmm8 > + pshufd $27,%xmm9,%xmm9 > + jmp .Loop_shaext > + > +.align 16 > +.Loop_shaext: > + movups 0(%rdi),%xmm14 > + xorps %xmm15,%xmm14 > + xorps %xmm14,%xmm2 > + movups -80(%rcx),%xmm1 > +.byte 102,15,56,220,208 > + movdqu (%r10),%xmm3 > + movdqa %xmm9,%xmm12 > +.byte 102,15,56,0,223 > + movdqu 16(%r10),%xmm4 > + movdqa %xmm8,%xmm11 > + movups -64(%rcx),%xmm0 > +.byte 102,15,56,220,209 > +.byte 102,15,56,0,231 > + > + paddd %xmm3,%xmm9 > + movdqu 32(%r10),%xmm5 > + leaq 64(%r10),%r10 > + pxor %xmm12,%xmm3 > + movups -48(%rcx),%xmm1 > +.byte 102,15,56,220,208 > + pxor %xmm12,%xmm3 > + movdqa %xmm8,%xmm10 > +.byte 102,15,56,0,239 > +.byte 69,15,58,204,193,0 > +.byte 68,15,56,200,212 > + movups -32(%rcx),%xmm0 > +.byte 102,15,56,220,209 > +.byte 15,56,201,220 > + movdqu -16(%r10),%xmm6 > + movdqa %xmm8,%xmm9 > +.byte 102,15,56,0,247 > + movups -16(%rcx),%xmm1 > +.byte 102,15,56,220,208 > +.byte 69,15,58,204,194,0 > +.byte 68,15,56,200,205 > + pxor %xmm5,%xmm3 > +.byte 15,56,201,229 > + movups 0(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movdqa %xmm8,%xmm10 > +.byte 69,15,58,204,193,0 > +.byte 68,15,56,200,214 > + movups 16(%rcx),%xmm1 > +.byte 102,15,56,220,208 > +.byte 15,56,202,222 > + pxor %xmm6,%xmm4 > +.byte 15,56,201,238 > + movups 32(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movdqa %xmm8,%xmm9 > +.byte 69,15,58,204,194,0 > +.byte 68,15,56,200,203 > + movups 48(%rcx),%xmm1 > +.byte 102,15,56,220,208 > +.byte 15,56,202,227 > + pxor %xmm3,%xmm5 > +.byte 15,56,201,243 > + cmpl $11,%r11d > + jb .Laesenclast6 > + movups 64(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movups 80(%rcx),%xmm1 > +.byte 102,15,56,220,208 > + je .Laesenclast6 > + movups 96(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movups 112(%rcx),%xmm1 > +.byte 
102,15,56,220,208 > +.Laesenclast6: > +.byte 102,15,56,221,209 > + movups 16-112(%rcx),%xmm0 > + movdqa %xmm8,%xmm10 > +.byte 69,15,58,204,193,0 > +.byte 68,15,56,200,212 > + movups 16(%rdi),%xmm14 > + xorps %xmm15,%xmm14 > + movups %xmm2,0(%rsi,%rdi,1) > + xorps %xmm14,%xmm2 > + movups -80(%rcx),%xmm1 > +.byte 102,15,56,220,208 > +.byte 15,56,202,236 > + pxor %xmm4,%xmm6 > +.byte 15,56,201,220 > + movups -64(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movdqa %xmm8,%xmm9 > +.byte 69,15,58,204,194,1 > +.byte 68,15,56,200,205 > + movups -48(%rcx),%xmm1 > +.byte 102,15,56,220,208 > +.byte 15,56,202,245 > + pxor %xmm5,%xmm3 > +.byte 15,56,201,229 > + movups -32(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movdqa %xmm8,%xmm10 > +.byte 69,15,58,204,193,1 > +.byte 68,15,56,200,214 > + movups -16(%rcx),%xmm1 > +.byte 102,15,56,220,208 > +.byte 15,56,202,222 > + pxor %xmm6,%xmm4 > +.byte 15,56,201,238 > + movups 0(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movdqa %xmm8,%xmm9 > +.byte 69,15,58,204,194,1 > +.byte 68,15,56,200,203 > + movups 16(%rcx),%xmm1 > +.byte 102,15,56,220,208 > +.byte 15,56,202,227 > + pxor %xmm3,%xmm5 > +.byte 15,56,201,243 > + movups 32(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movdqa %xmm8,%xmm10 > +.byte 69,15,58,204,193,1 > +.byte 68,15,56,200,212 > + movups 48(%rcx),%xmm1 > +.byte 102,15,56,220,208 > +.byte 15,56,202,236 > + pxor %xmm4,%xmm6 > +.byte 15,56,201,220 > + cmpl $11,%r11d > + jb .Laesenclast7 > + movups 64(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movups 80(%rcx),%xmm1 > +.byte 102,15,56,220,208 > + je .Laesenclast7 > + movups 96(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movups 112(%rcx),%xmm1 > +.byte 102,15,56,220,208 > +.Laesenclast7: > +.byte 102,15,56,221,209 > + movups 16-112(%rcx),%xmm0 > + movdqa %xmm8,%xmm9 > +.byte 69,15,58,204,194,1 > +.byte 68,15,56,200,205 > + movups 32(%rdi),%xmm14 > + xorps %xmm15,%xmm14 > + movups %xmm2,16(%rsi,%rdi,1) > + xorps %xmm14,%xmm2 > + movups -80(%rcx),%xmm1 > +.byte 102,15,56,220,208 > +.byte 15,56,202,245 > + pxor %xmm5,%xmm3 > +.byte 15,56,201,229 > + movups -64(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movdqa %xmm8,%xmm10 > +.byte 69,15,58,204,193,2 > +.byte 68,15,56,200,214 > + movups -48(%rcx),%xmm1 > +.byte 102,15,56,220,208 > +.byte 15,56,202,222 > + pxor %xmm6,%xmm4 > +.byte 15,56,201,238 > + movups -32(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movdqa %xmm8,%xmm9 > +.byte 69,15,58,204,194,2 > +.byte 68,15,56,200,203 > + movups -16(%rcx),%xmm1 > +.byte 102,15,56,220,208 > +.byte 15,56,202,227 > + pxor %xmm3,%xmm5 > +.byte 15,56,201,243 > + movups 0(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movdqa %xmm8,%xmm10 > +.byte 69,15,58,204,193,2 > +.byte 68,15,56,200,212 > + movups 16(%rcx),%xmm1 > +.byte 102,15,56,220,208 > +.byte 15,56,202,236 > + pxor %xmm4,%xmm6 > +.byte 15,56,201,220 > + movups 32(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movdqa %xmm8,%xmm9 > +.byte 69,15,58,204,194,2 > +.byte 68,15,56,200,205 > + movups 48(%rcx),%xmm1 > +.byte 102,15,56,220,208 > +.byte 15,56,202,245 > + pxor %xmm5,%xmm3 > +.byte 15,56,201,229 > + cmpl $11,%r11d > + jb .Laesenclast8 > + movups 64(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movups 80(%rcx),%xmm1 > +.byte 102,15,56,220,208 > + je .Laesenclast8 > + movups 96(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movups 112(%rcx),%xmm1 > +.byte 102,15,56,220,208 > +.Laesenclast8: > +.byte 102,15,56,221,209 > + movups 16-112(%rcx),%xmm0 > + movdqa %xmm8,%xmm10 > +.byte 69,15,58,204,193,2 > +.byte 68,15,56,200,214 > + movups 48(%rdi),%xmm14 > + xorps %xmm15,%xmm14 > + movups 
%xmm2,32(%rsi,%rdi,1) > + xorps %xmm14,%xmm2 > + movups -80(%rcx),%xmm1 > +.byte 102,15,56,220,208 > +.byte 15,56,202,222 > + pxor %xmm6,%xmm4 > +.byte 15,56,201,238 > + movups -64(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movdqa %xmm8,%xmm9 > +.byte 69,15,58,204,194,3 > +.byte 68,15,56,200,203 > + movups -48(%rcx),%xmm1 > +.byte 102,15,56,220,208 > +.byte 15,56,202,227 > + pxor %xmm3,%xmm5 > +.byte 15,56,201,243 > + movups -32(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movdqa %xmm8,%xmm10 > +.byte 69,15,58,204,193,3 > +.byte 68,15,56,200,212 > +.byte 15,56,202,236 > + pxor %xmm4,%xmm6 > + movups -16(%rcx),%xmm1 > +.byte 102,15,56,220,208 > + movdqa %xmm8,%xmm9 > +.byte 69,15,58,204,194,3 > +.byte 68,15,56,200,205 > +.byte 15,56,202,245 > + movups 0(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movdqa %xmm12,%xmm5 > + movdqa %xmm8,%xmm10 > +.byte 69,15,58,204,193,3 > +.byte 68,15,56,200,214 > + movups 16(%rcx),%xmm1 > +.byte 102,15,56,220,208 > + movdqa %xmm8,%xmm9 > +.byte 69,15,58,204,194,3 > +.byte 68,15,56,200,205 > + movups 32(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movups 48(%rcx),%xmm1 > +.byte 102,15,56,220,208 > + cmpl $11,%r11d > + jb .Laesenclast9 > + movups 64(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movups 80(%rcx),%xmm1 > +.byte 102,15,56,220,208 > + je .Laesenclast9 > + movups 96(%rcx),%xmm0 > +.byte 102,15,56,220,209 > + movups 112(%rcx),%xmm1 > +.byte 102,15,56,220,208 > +.Laesenclast9: > +.byte 102,15,56,221,209 > + movups 16-112(%rcx),%xmm0 > + decq %rdx > + > + paddd %xmm11,%xmm8 > + movups %xmm2,48(%rsi,%rdi,1) > + leaq 64(%rdi),%rdi > + jnz .Loop_shaext > + > + pshufd $27,%xmm8,%xmm8 > + pshufd $27,%xmm9,%xmm9 > + movups %xmm2,(%r8) > + movdqu %xmm8,(%r9) > + movd %xmm9,16(%r9) > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size aesni_cbc_sha1_enc_shaext,.-aesni_cbc_sha1_enc_shaext > diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha256- > x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha256- > x86_64.S > new file mode 100644 > index 0000000000..e257169287 > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha256-x86_64.S > @@ -0,0 +1,69 @@ > +# WARNING: do not edit! > +# Generated from openssl/crypto/aes/asm/aesni-sha256-x86_64.pl > +# > +# Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. > +# > +# Licensed under the OpenSSL license (the "License"). You may not use > +# this file except in compliance with the License. 
You can obtain a copy
> +# in the file LICENSE in the source distribution or at
> +# https://www.openssl.org/source/license.html
> +
> +.text
> +
> +
> +.globl aesni_cbc_sha256_enc
> +.type aesni_cbc_sha256_enc,@function
> +.align 16
> +aesni_cbc_sha256_enc:
> +.cfi_startproc
> + xorl %eax,%eax
> + cmpq $0,%rdi
> + je .Lprobe
> + ud2
> +.Lprobe:
> + .byte 0xf3,0xc3
> +.cfi_endproc
> +.size aesni_cbc_sha256_enc,.-aesni_cbc_sha256_enc
> +
> +.align 64
> +.type K256,@object
> +K256:
> +.long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
> +.long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
> +.long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
> +.long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
> +.long 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
> +.long 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
> +.long 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
> +.long 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
> +.long 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
> +.long 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
> +.long 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
> +.long 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
> +.long 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
> +.long 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
> +.long 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
> +.long 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
> +.long 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
> +.long 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
> +.long 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
> +.long 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
> +.long 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
> +.long 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
> +.long 0xd192e819,0xd6990624,0xf40e3585,0x106aa070
> +.long 0xd192e819,0xd6990624,0xf40e3585,0x106aa070
> +.long 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
> +.long 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
> +.long 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
> +.long 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
> +.long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
> +.long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
> +.long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
> +.long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
> +
> +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
> +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
> +.long 0,0,0,0, 0,0,0,0, -1,-1,-1,-1
> +.long 0,0,0,0, 0,0,0,0
> +.byte 65,69,83,78,73,45,67,66,67,43,83,72,65,50,53,54,32,115,116,105,116,99,104,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
> +.align 64
> diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-x86_64.S
> new file mode 100644
> index 0000000000..2bdb5cf251
> --- /dev/null
> +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-x86_64.S
> @@ -0,0 +1,4484 @@
> +# WARNING: do not edit!
> +# Generated from openssl/crypto/aes/asm/aesni-x86_64.pl
> +#
> +# Copyright 2009-2020 The OpenSSL Project Authors. All Rights Reserved.
> +#
> +# Licensed under the OpenSSL license (the "License"). You may not use
> +# this file except in compliance with the License.
You can obtain a copy > +# in the file LICENSE in the source distribution or at > +# https://www.openssl.org/source/license.html > + > +.text > + > +.globl aesni_encrypt > +.type aesni_encrypt,@function > +.align 16 > +aesni_encrypt: > +.cfi_startproc > + movups (%rdi),%xmm2 > + movl 240(%rdx),%eax > + movups (%rdx),%xmm0 > + movups 16(%rdx),%xmm1 > + leaq 32(%rdx),%rdx > + xorps %xmm0,%xmm2 > +.Loop_enc1_1: > +.byte 102,15,56,220,209 > + decl %eax > + movups (%rdx),%xmm1 > + leaq 16(%rdx),%rdx > + jnz .Loop_enc1_1 > +.byte 102,15,56,221,209 > + pxor %xmm0,%xmm0 > + pxor %xmm1,%xmm1 > + movups %xmm2,(%rsi) > + pxor %xmm2,%xmm2 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size aesni_encrypt,.-aesni_encrypt > + > +.globl aesni_decrypt > +.type aesni_decrypt,@function > +.align 16 > +aesni_decrypt: > +.cfi_startproc > + movups (%rdi),%xmm2 > + movl 240(%rdx),%eax > + movups (%rdx),%xmm0 > + movups 16(%rdx),%xmm1 > + leaq 32(%rdx),%rdx > + xorps %xmm0,%xmm2 > +.Loop_dec1_2: > +.byte 102,15,56,222,209 > + decl %eax > + movups (%rdx),%xmm1 > + leaq 16(%rdx),%rdx > + jnz .Loop_dec1_2 > +.byte 102,15,56,223,209 > + pxor %xmm0,%xmm0 > + pxor %xmm1,%xmm1 > + movups %xmm2,(%rsi) > + pxor %xmm2,%xmm2 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size aesni_decrypt, .-aesni_decrypt > +.type _aesni_encrypt2,@function > +.align 16 > +_aesni_encrypt2: > +.cfi_startproc > + movups (%rcx),%xmm0 > + shll $4,%eax > + movups 16(%rcx),%xmm1 > + xorps %xmm0,%xmm2 > + xorps %xmm0,%xmm3 > + movups 32(%rcx),%xmm0 > + leaq 32(%rcx,%rax,1),%rcx > + negq %rax > + addq $16,%rax > + > +.Lenc_loop2: > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > + movups (%rcx,%rax,1),%xmm1 > + addq $32,%rax > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > + movups -16(%rcx,%rax,1),%xmm0 > + jnz .Lenc_loop2 > + > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,221,208 > +.byte 102,15,56,221,216 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size _aesni_encrypt2,.-_aesni_encrypt2 > +.type _aesni_decrypt2,@function > +.align 16 > +_aesni_decrypt2: > +.cfi_startproc > + movups (%rcx),%xmm0 > + shll $4,%eax > + movups 16(%rcx),%xmm1 > + xorps %xmm0,%xmm2 > + xorps %xmm0,%xmm3 > + movups 32(%rcx),%xmm0 > + leaq 32(%rcx,%rax,1),%rcx > + negq %rax > + addq $16,%rax > + > +.Ldec_loop2: > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > + movups (%rcx,%rax,1),%xmm1 > + addq $32,%rax > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > + movups -16(%rcx,%rax,1),%xmm0 > + jnz .Ldec_loop2 > + > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,223,208 > +.byte 102,15,56,223,216 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size _aesni_decrypt2,.-_aesni_decrypt2 > +.type _aesni_encrypt3,@function > +.align 16 > +_aesni_encrypt3: > +.cfi_startproc > + movups (%rcx),%xmm0 > + shll $4,%eax > + movups 16(%rcx),%xmm1 > + xorps %xmm0,%xmm2 > + xorps %xmm0,%xmm3 > + xorps %xmm0,%xmm4 > + movups 32(%rcx),%xmm0 > + leaq 32(%rcx,%rax,1),%rcx > + negq %rax > + addq $16,%rax > + > +.Lenc_loop3: > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > + movups (%rcx,%rax,1),%xmm1 > + addq $32,%rax > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > +.byte 102,15,56,220,224 > + movups -16(%rcx,%rax,1),%xmm0 > + jnz .Lenc_loop3 > + > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > +.byte 102,15,56,221,208 > +.byte 102,15,56,221,216 > +.byte 102,15,56,221,224 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size _aesni_encrypt3,.-_aesni_encrypt3 > +.type 
_aesni_decrypt3,@function > +.align 16 > +_aesni_decrypt3: > +.cfi_startproc > + movups (%rcx),%xmm0 > + shll $4,%eax > + movups 16(%rcx),%xmm1 > + xorps %xmm0,%xmm2 > + xorps %xmm0,%xmm3 > + xorps %xmm0,%xmm4 > + movups 32(%rcx),%xmm0 > + leaq 32(%rcx,%rax,1),%rcx > + negq %rax > + addq $16,%rax > + > +.Ldec_loop3: > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > + movups (%rcx,%rax,1),%xmm1 > + addq $32,%rax > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > + movups -16(%rcx,%rax,1),%xmm0 > + jnz .Ldec_loop3 > + > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,223,208 > +.byte 102,15,56,223,216 > +.byte 102,15,56,223,224 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size _aesni_decrypt3,.-_aesni_decrypt3 > +.type _aesni_encrypt4,@function > +.align 16 > +_aesni_encrypt4: > +.cfi_startproc > + movups (%rcx),%xmm0 > + shll $4,%eax > + movups 16(%rcx),%xmm1 > + xorps %xmm0,%xmm2 > + xorps %xmm0,%xmm3 > + xorps %xmm0,%xmm4 > + xorps %xmm0,%xmm5 > + movups 32(%rcx),%xmm0 > + leaq 32(%rcx,%rax,1),%rcx > + negq %rax > +.byte 0x0f,0x1f,0x00 > + addq $16,%rax > + > +.Lenc_loop4: > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > + movups (%rcx,%rax,1),%xmm1 > + addq $32,%rax > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > +.byte 102,15,56,220,224 > +.byte 102,15,56,220,232 > + movups -16(%rcx,%rax,1),%xmm0 > + jnz .Lenc_loop4 > + > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > +.byte 102,15,56,221,208 > +.byte 102,15,56,221,216 > +.byte 102,15,56,221,224 > +.byte 102,15,56,221,232 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size _aesni_encrypt4,.-_aesni_encrypt4 > +.type _aesni_decrypt4,@function > +.align 16 > +_aesni_decrypt4: > +.cfi_startproc > + movups (%rcx),%xmm0 > + shll $4,%eax > + movups 16(%rcx),%xmm1 > + xorps %xmm0,%xmm2 > + xorps %xmm0,%xmm3 > + xorps %xmm0,%xmm4 > + xorps %xmm0,%xmm5 > + movups 32(%rcx),%xmm0 > + leaq 32(%rcx,%rax,1),%rcx > + negq %rax > +.byte 0x0f,0x1f,0x00 > + addq $16,%rax > + > +.Ldec_loop4: > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > + movups (%rcx,%rax,1),%xmm1 > + addq $32,%rax > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > + movups -16(%rcx,%rax,1),%xmm0 > + jnz .Ldec_loop4 > + > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > +.byte 102,15,56,223,208 > +.byte 102,15,56,223,216 > +.byte 102,15,56,223,224 > +.byte 102,15,56,223,232 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size _aesni_decrypt4,.-_aesni_decrypt4 > +.type _aesni_encrypt6,@function > +.align 16 > +_aesni_encrypt6: > +.cfi_startproc > + movups (%rcx),%xmm0 > + shll $4,%eax > + movups 16(%rcx),%xmm1 > + xorps %xmm0,%xmm2 > + pxor %xmm0,%xmm3 > + pxor %xmm0,%xmm4 > +.byte 102,15,56,220,209 > + leaq 32(%rcx,%rax,1),%rcx > + negq %rax > +.byte 102,15,56,220,217 > + pxor %xmm0,%xmm5 > + pxor %xmm0,%xmm6 > +.byte 102,15,56,220,225 > + pxor %xmm0,%xmm7 > + movups (%rcx,%rax,1),%xmm0 > + addq $16,%rax > + jmp .Lenc_loop6_enter > +.align 16 > +.Lenc_loop6: > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > +.Lenc_loop6_enter: > +.byte 102,15,56,220,233 > +.byte 102,15,56,220,241 > +.byte 102,15,56,220,249 > + movups (%rcx,%rax,1),%xmm1 > + addq $32,%rax > 
+.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > +.byte 102,15,56,220,224 > +.byte 102,15,56,220,232 > +.byte 102,15,56,220,240 > +.byte 102,15,56,220,248 > + movups -16(%rcx,%rax,1),%xmm0 > + jnz .Lenc_loop6 > + > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > +.byte 102,15,56,220,241 > +.byte 102,15,56,220,249 > +.byte 102,15,56,221,208 > +.byte 102,15,56,221,216 > +.byte 102,15,56,221,224 > +.byte 102,15,56,221,232 > +.byte 102,15,56,221,240 > +.byte 102,15,56,221,248 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size _aesni_encrypt6,.-_aesni_encrypt6 > +.type _aesni_decrypt6,@function > +.align 16 > +_aesni_decrypt6: > +.cfi_startproc > + movups (%rcx),%xmm0 > + shll $4,%eax > + movups 16(%rcx),%xmm1 > + xorps %xmm0,%xmm2 > + pxor %xmm0,%xmm3 > + pxor %xmm0,%xmm4 > +.byte 102,15,56,222,209 > + leaq 32(%rcx,%rax,1),%rcx > + negq %rax > +.byte 102,15,56,222,217 > + pxor %xmm0,%xmm5 > + pxor %xmm0,%xmm6 > +.byte 102,15,56,222,225 > + pxor %xmm0,%xmm7 > + movups (%rcx,%rax,1),%xmm0 > + addq $16,%rax > + jmp .Ldec_loop6_enter > +.align 16 > +.Ldec_loop6: > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.Ldec_loop6_enter: > +.byte 102,15,56,222,233 > +.byte 102,15,56,222,241 > +.byte 102,15,56,222,249 > + movups (%rcx,%rax,1),%xmm1 > + addq $32,%rax > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > +.byte 102,15,56,222,240 > +.byte 102,15,56,222,248 > + movups -16(%rcx,%rax,1),%xmm0 > + jnz .Ldec_loop6 > + > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > +.byte 102,15,56,222,241 > +.byte 102,15,56,222,249 > +.byte 102,15,56,223,208 > +.byte 102,15,56,223,216 > +.byte 102,15,56,223,224 > +.byte 102,15,56,223,232 > +.byte 102,15,56,223,240 > +.byte 102,15,56,223,248 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size _aesni_decrypt6,.-_aesni_decrypt6 > +.type _aesni_encrypt8,@function > +.align 16 > +_aesni_encrypt8: > +.cfi_startproc > + movups (%rcx),%xmm0 > + shll $4,%eax > + movups 16(%rcx),%xmm1 > + xorps %xmm0,%xmm2 > + xorps %xmm0,%xmm3 > + pxor %xmm0,%xmm4 > + pxor %xmm0,%xmm5 > + pxor %xmm0,%xmm6 > + leaq 32(%rcx,%rax,1),%rcx > + negq %rax > +.byte 102,15,56,220,209 > + pxor %xmm0,%xmm7 > + pxor %xmm0,%xmm8 > +.byte 102,15,56,220,217 > + pxor %xmm0,%xmm9 > + movups (%rcx,%rax,1),%xmm0 > + addq $16,%rax > + jmp .Lenc_loop8_inner > +.align 16 > +.Lenc_loop8: > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.Lenc_loop8_inner: > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > +.byte 102,15,56,220,241 > +.byte 102,15,56,220,249 > +.byte 102,68,15,56,220,193 > +.byte 102,68,15,56,220,201 > +.Lenc_loop8_enter: > + movups (%rcx,%rax,1),%xmm1 > + addq $32,%rax > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > +.byte 102,15,56,220,224 > +.byte 102,15,56,220,232 > +.byte 102,15,56,220,240 > +.byte 102,15,56,220,248 > +.byte 102,68,15,56,220,192 > +.byte 102,68,15,56,220,200 > + movups -16(%rcx,%rax,1),%xmm0 > + jnz .Lenc_loop8 > + > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > +.byte 102,15,56,220,241 > +.byte 102,15,56,220,249 > +.byte 102,68,15,56,220,193 > +.byte 102,68,15,56,220,201 > +.byte 102,15,56,221,208 > +.byte 102,15,56,221,216 > +.byte 102,15,56,221,224 > +.byte 102,15,56,221,232 > +.byte 102,15,56,221,240 > +.byte 102,15,56,221,248 > +.byte 102,68,15,56,221,192 > +.byte 102,68,15,56,221,200 > + 
.byte 0xf3,0xc3 > +.cfi_endproc > +.size _aesni_encrypt8,.-_aesni_encrypt8 > +.type _aesni_decrypt8,@function > +.align 16 > +_aesni_decrypt8: > +.cfi_startproc > + movups (%rcx),%xmm0 > + shll $4,%eax > + movups 16(%rcx),%xmm1 > + xorps %xmm0,%xmm2 > + xorps %xmm0,%xmm3 > + pxor %xmm0,%xmm4 > + pxor %xmm0,%xmm5 > + pxor %xmm0,%xmm6 > + leaq 32(%rcx,%rax,1),%rcx > + negq %rax > +.byte 102,15,56,222,209 > + pxor %xmm0,%xmm7 > + pxor %xmm0,%xmm8 > +.byte 102,15,56,222,217 > + pxor %xmm0,%xmm9 > + movups (%rcx,%rax,1),%xmm0 > + addq $16,%rax > + jmp .Ldec_loop8_inner > +.align 16 > +.Ldec_loop8: > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.Ldec_loop8_inner: > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > +.byte 102,15,56,222,241 > +.byte 102,15,56,222,249 > +.byte 102,68,15,56,222,193 > +.byte 102,68,15,56,222,201 > +.Ldec_loop8_enter: > + movups (%rcx,%rax,1),%xmm1 > + addq $32,%rax > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > +.byte 102,15,56,222,240 > +.byte 102,15,56,222,248 > +.byte 102,68,15,56,222,192 > +.byte 102,68,15,56,222,200 > + movups -16(%rcx,%rax,1),%xmm0 > + jnz .Ldec_loop8 > + > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > +.byte 102,15,56,222,241 > +.byte 102,15,56,222,249 > +.byte 102,68,15,56,222,193 > +.byte 102,68,15,56,222,201 > +.byte 102,15,56,223,208 > +.byte 102,15,56,223,216 > +.byte 102,15,56,223,224 > +.byte 102,15,56,223,232 > +.byte 102,15,56,223,240 > +.byte 102,15,56,223,248 > +.byte 102,68,15,56,223,192 > +.byte 102,68,15,56,223,200 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size _aesni_decrypt8,.-_aesni_decrypt8 > +.globl aesni_ecb_encrypt > +.type aesni_ecb_encrypt,@function > +.align 16 > +aesni_ecb_encrypt: > +.cfi_startproc > + andq $-16,%rdx > + jz .Lecb_ret > + > + movl 240(%rcx),%eax > + movups (%rcx),%xmm0 > + movq %rcx,%r11 > + movl %eax,%r10d > + testl %r8d,%r8d > + jz .Lecb_decrypt > + > + cmpq $0x80,%rdx > + jb .Lecb_enc_tail > + > + movdqu (%rdi),%xmm2 > + movdqu 16(%rdi),%xmm3 > + movdqu 32(%rdi),%xmm4 > + movdqu 48(%rdi),%xmm5 > + movdqu 64(%rdi),%xmm6 > + movdqu 80(%rdi),%xmm7 > + movdqu 96(%rdi),%xmm8 > + movdqu 112(%rdi),%xmm9 > + leaq 128(%rdi),%rdi > + subq $0x80,%rdx > + jmp .Lecb_enc_loop8_enter > +.align 16 > +.Lecb_enc_loop8: > + movups %xmm2,(%rsi) > + movq %r11,%rcx > + movdqu (%rdi),%xmm2 > + movl %r10d,%eax > + movups %xmm3,16(%rsi) > + movdqu 16(%rdi),%xmm3 > + movups %xmm4,32(%rsi) > + movdqu 32(%rdi),%xmm4 > + movups %xmm5,48(%rsi) > + movdqu 48(%rdi),%xmm5 > + movups %xmm6,64(%rsi) > + movdqu 64(%rdi),%xmm6 > + movups %xmm7,80(%rsi) > + movdqu 80(%rdi),%xmm7 > + movups %xmm8,96(%rsi) > + movdqu 96(%rdi),%xmm8 > + movups %xmm9,112(%rsi) > + leaq 128(%rsi),%rsi > + movdqu 112(%rdi),%xmm9 > + leaq 128(%rdi),%rdi > +.Lecb_enc_loop8_enter: > + > + call _aesni_encrypt8 > + > + subq $0x80,%rdx > + jnc .Lecb_enc_loop8 > + > + movups %xmm2,(%rsi) > + movq %r11,%rcx > + movups %xmm3,16(%rsi) > + movl %r10d,%eax > + movups %xmm4,32(%rsi) > + movups %xmm5,48(%rsi) > + movups %xmm6,64(%rsi) > + movups %xmm7,80(%rsi) > + movups %xmm8,96(%rsi) > + movups %xmm9,112(%rsi) > + leaq 128(%rsi),%rsi > + addq $0x80,%rdx > + jz .Lecb_ret > + > +.Lecb_enc_tail: > + movups (%rdi),%xmm2 > + cmpq $0x20,%rdx > + jb .Lecb_enc_one > + movups 16(%rdi),%xmm3 > + je .Lecb_enc_two > + movups 32(%rdi),%xmm4 > + cmpq $0x40,%rdx > + jb .Lecb_enc_three > + movups 48(%rdi),%xmm5 > + je .Lecb_enc_four > + movups 
64(%rdi),%xmm6 > + cmpq $0x60,%rdx > + jb .Lecb_enc_five > + movups 80(%rdi),%xmm7 > + je .Lecb_enc_six > + movdqu 96(%rdi),%xmm8 > + xorps %xmm9,%xmm9 > + call _aesni_encrypt8 > + movups %xmm2,(%rsi) > + movups %xmm3,16(%rsi) > + movups %xmm4,32(%rsi) > + movups %xmm5,48(%rsi) > + movups %xmm6,64(%rsi) > + movups %xmm7,80(%rsi) > + movups %xmm8,96(%rsi) > + jmp .Lecb_ret > +.align 16 > +.Lecb_enc_one: > + movups (%rcx),%xmm0 > + movups 16(%rcx),%xmm1 > + leaq 32(%rcx),%rcx > + xorps %xmm0,%xmm2 > +.Loop_enc1_3: > +.byte 102,15,56,220,209 > + decl %eax > + movups (%rcx),%xmm1 > + leaq 16(%rcx),%rcx > + jnz .Loop_enc1_3 > +.byte 102,15,56,221,209 > + movups %xmm2,(%rsi) > + jmp .Lecb_ret > +.align 16 > +.Lecb_enc_two: > + call _aesni_encrypt2 > + movups %xmm2,(%rsi) > + movups %xmm3,16(%rsi) > + jmp .Lecb_ret > +.align 16 > +.Lecb_enc_three: > + call _aesni_encrypt3 > + movups %xmm2,(%rsi) > + movups %xmm3,16(%rsi) > + movups %xmm4,32(%rsi) > + jmp .Lecb_ret > +.align 16 > +.Lecb_enc_four: > + call _aesni_encrypt4 > + movups %xmm2,(%rsi) > + movups %xmm3,16(%rsi) > + movups %xmm4,32(%rsi) > + movups %xmm5,48(%rsi) > + jmp .Lecb_ret > +.align 16 > +.Lecb_enc_five: > + xorps %xmm7,%xmm7 > + call _aesni_encrypt6 > + movups %xmm2,(%rsi) > + movups %xmm3,16(%rsi) > + movups %xmm4,32(%rsi) > + movups %xmm5,48(%rsi) > + movups %xmm6,64(%rsi) > + jmp .Lecb_ret > +.align 16 > +.Lecb_enc_six: > + call _aesni_encrypt6 > + movups %xmm2,(%rsi) > + movups %xmm3,16(%rsi) > + movups %xmm4,32(%rsi) > + movups %xmm5,48(%rsi) > + movups %xmm6,64(%rsi) > + movups %xmm7,80(%rsi) > + jmp .Lecb_ret > + > +.align 16 > +.Lecb_decrypt: > + cmpq $0x80,%rdx > + jb .Lecb_dec_tail > + > + movdqu (%rdi),%xmm2 > + movdqu 16(%rdi),%xmm3 > + movdqu 32(%rdi),%xmm4 > + movdqu 48(%rdi),%xmm5 > + movdqu 64(%rdi),%xmm6 > + movdqu 80(%rdi),%xmm7 > + movdqu 96(%rdi),%xmm8 > + movdqu 112(%rdi),%xmm9 > + leaq 128(%rdi),%rdi > + subq $0x80,%rdx > + jmp .Lecb_dec_loop8_enter > +.align 16 > +.Lecb_dec_loop8: > + movups %xmm2,(%rsi) > + movq %r11,%rcx > + movdqu (%rdi),%xmm2 > + movl %r10d,%eax > + movups %xmm3,16(%rsi) > + movdqu 16(%rdi),%xmm3 > + movups %xmm4,32(%rsi) > + movdqu 32(%rdi),%xmm4 > + movups %xmm5,48(%rsi) > + movdqu 48(%rdi),%xmm5 > + movups %xmm6,64(%rsi) > + movdqu 64(%rdi),%xmm6 > + movups %xmm7,80(%rsi) > + movdqu 80(%rdi),%xmm7 > + movups %xmm8,96(%rsi) > + movdqu 96(%rdi),%xmm8 > + movups %xmm9,112(%rsi) > + leaq 128(%rsi),%rsi > + movdqu 112(%rdi),%xmm9 > + leaq 128(%rdi),%rdi > +.Lecb_dec_loop8_enter: > + > + call _aesni_decrypt8 > + > + movups (%r11),%xmm0 > + subq $0x80,%rdx > + jnc .Lecb_dec_loop8 > + > + movups %xmm2,(%rsi) > + pxor %xmm2,%xmm2 > + movq %r11,%rcx > + movups %xmm3,16(%rsi) > + pxor %xmm3,%xmm3 > + movl %r10d,%eax > + movups %xmm4,32(%rsi) > + pxor %xmm4,%xmm4 > + movups %xmm5,48(%rsi) > + pxor %xmm5,%xmm5 > + movups %xmm6,64(%rsi) > + pxor %xmm6,%xmm6 > + movups %xmm7,80(%rsi) > + pxor %xmm7,%xmm7 > + movups %xmm8,96(%rsi) > + pxor %xmm8,%xmm8 > + movups %xmm9,112(%rsi) > + pxor %xmm9,%xmm9 > + leaq 128(%rsi),%rsi > + addq $0x80,%rdx > + jz .Lecb_ret > + > +.Lecb_dec_tail: > + movups (%rdi),%xmm2 > + cmpq $0x20,%rdx > + jb .Lecb_dec_one > + movups 16(%rdi),%xmm3 > + je .Lecb_dec_two > + movups 32(%rdi),%xmm4 > + cmpq $0x40,%rdx > + jb .Lecb_dec_three > + movups 48(%rdi),%xmm5 > + je .Lecb_dec_four > + movups 64(%rdi),%xmm6 > + cmpq $0x60,%rdx > + jb .Lecb_dec_five > + movups 80(%rdi),%xmm7 > + je .Lecb_dec_six > + movups 96(%rdi),%xmm8 > + movups (%rcx),%xmm0 > + xorps %xmm9,%xmm9 > + call 
_aesni_decrypt8 > + movups %xmm2,(%rsi) > + pxor %xmm2,%xmm2 > + movups %xmm3,16(%rsi) > + pxor %xmm3,%xmm3 > + movups %xmm4,32(%rsi) > + pxor %xmm4,%xmm4 > + movups %xmm5,48(%rsi) > + pxor %xmm5,%xmm5 > + movups %xmm6,64(%rsi) > + pxor %xmm6,%xmm6 > + movups %xmm7,80(%rsi) > + pxor %xmm7,%xmm7 > + movups %xmm8,96(%rsi) > + pxor %xmm8,%xmm8 > + pxor %xmm9,%xmm9 > + jmp .Lecb_ret > +.align 16 > +.Lecb_dec_one: > + movups (%rcx),%xmm0 > + movups 16(%rcx),%xmm1 > + leaq 32(%rcx),%rcx > + xorps %xmm0,%xmm2 > +.Loop_dec1_4: > +.byte 102,15,56,222,209 > + decl %eax > + movups (%rcx),%xmm1 > + leaq 16(%rcx),%rcx > + jnz .Loop_dec1_4 > +.byte 102,15,56,223,209 > + movups %xmm2,(%rsi) > + pxor %xmm2,%xmm2 > + jmp .Lecb_ret > +.align 16 > +.Lecb_dec_two: > + call _aesni_decrypt2 > + movups %xmm2,(%rsi) > + pxor %xmm2,%xmm2 > + movups %xmm3,16(%rsi) > + pxor %xmm3,%xmm3 > + jmp .Lecb_ret > +.align 16 > +.Lecb_dec_three: > + call _aesni_decrypt3 > + movups %xmm2,(%rsi) > + pxor %xmm2,%xmm2 > + movups %xmm3,16(%rsi) > + pxor %xmm3,%xmm3 > + movups %xmm4,32(%rsi) > + pxor %xmm4,%xmm4 > + jmp .Lecb_ret > +.align 16 > +.Lecb_dec_four: > + call _aesni_decrypt4 > + movups %xmm2,(%rsi) > + pxor %xmm2,%xmm2 > + movups %xmm3,16(%rsi) > + pxor %xmm3,%xmm3 > + movups %xmm4,32(%rsi) > + pxor %xmm4,%xmm4 > + movups %xmm5,48(%rsi) > + pxor %xmm5,%xmm5 > + jmp .Lecb_ret > +.align 16 > +.Lecb_dec_five: > + xorps %xmm7,%xmm7 > + call _aesni_decrypt6 > + movups %xmm2,(%rsi) > + pxor %xmm2,%xmm2 > + movups %xmm3,16(%rsi) > + pxor %xmm3,%xmm3 > + movups %xmm4,32(%rsi) > + pxor %xmm4,%xmm4 > + movups %xmm5,48(%rsi) > + pxor %xmm5,%xmm5 > + movups %xmm6,64(%rsi) > + pxor %xmm6,%xmm6 > + pxor %xmm7,%xmm7 > + jmp .Lecb_ret > +.align 16 > +.Lecb_dec_six: > + call _aesni_decrypt6 > + movups %xmm2,(%rsi) > + pxor %xmm2,%xmm2 > + movups %xmm3,16(%rsi) > + pxor %xmm3,%xmm3 > + movups %xmm4,32(%rsi) > + pxor %xmm4,%xmm4 > + movups %xmm5,48(%rsi) > + pxor %xmm5,%xmm5 > + movups %xmm6,64(%rsi) > + pxor %xmm6,%xmm6 > + movups %xmm7,80(%rsi) > + pxor %xmm7,%xmm7 > + > +.Lecb_ret: > + xorps %xmm0,%xmm0 > + pxor %xmm1,%xmm1 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size aesni_ecb_encrypt,.-aesni_ecb_encrypt > +.globl aesni_ccm64_encrypt_blocks > +.type aesni_ccm64_encrypt_blocks,@function > +.align 16 > +aesni_ccm64_encrypt_blocks: > +.cfi_startproc > + movl 240(%rcx),%eax > + movdqu (%r8),%xmm6 > + movdqa .Lincrement64(%rip),%xmm9 > + movdqa .Lbswap_mask(%rip),%xmm7 > + > + shll $4,%eax > + movl $16,%r10d > + leaq 0(%rcx),%r11 > + movdqu (%r9),%xmm3 > + movdqa %xmm6,%xmm2 > + leaq 32(%rcx,%rax,1),%rcx > +.byte 102,15,56,0,247 > + subq %rax,%r10 > + jmp .Lccm64_enc_outer > +.align 16 > +.Lccm64_enc_outer: > + movups (%r11),%xmm0 > + movq %r10,%rax > + movups (%rdi),%xmm8 > + > + xorps %xmm0,%xmm2 > + movups 16(%r11),%xmm1 > + xorps %xmm8,%xmm0 > + xorps %xmm0,%xmm3 > + movups 32(%r11),%xmm0 > + > +.Lccm64_enc2_loop: > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > + movups (%rcx,%rax,1),%xmm1 > + addq $32,%rax > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > + movups -16(%rcx,%rax,1),%xmm0 > + jnz .Lccm64_enc2_loop > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > + paddq %xmm9,%xmm6 > + decq %rdx > +.byte 102,15,56,221,208 > +.byte 102,15,56,221,216 > + > + leaq 16(%rdi),%rdi > + xorps %xmm2,%xmm8 > + movdqa %xmm6,%xmm2 > + movups %xmm8,(%rsi) > +.byte 102,15,56,0,215 > + leaq 16(%rsi),%rsi > + jnz .Lccm64_enc_outer > + > + pxor %xmm0,%xmm0 > + pxor %xmm1,%xmm1 > + pxor %xmm2,%xmm2 > + movups %xmm3,(%r9) > + pxor 
%xmm3,%xmm3 > + pxor %xmm8,%xmm8 > + pxor %xmm6,%xmm6 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size aesni_ccm64_encrypt_blocks,.-aesni_ccm64_encrypt_blocks > +.globl aesni_ccm64_decrypt_blocks > +.type aesni_ccm64_decrypt_blocks,@function > +.align 16 > +aesni_ccm64_decrypt_blocks: > +.cfi_startproc > + movl 240(%rcx),%eax > + movups (%r8),%xmm6 > + movdqu (%r9),%xmm3 > + movdqa .Lincrement64(%rip),%xmm9 > + movdqa .Lbswap_mask(%rip),%xmm7 > + > + movaps %xmm6,%xmm2 > + movl %eax,%r10d > + movq %rcx,%r11 > +.byte 102,15,56,0,247 > + movups (%rcx),%xmm0 > + movups 16(%rcx),%xmm1 > + leaq 32(%rcx),%rcx > + xorps %xmm0,%xmm2 > +.Loop_enc1_5: > +.byte 102,15,56,220,209 > + decl %eax > + movups (%rcx),%xmm1 > + leaq 16(%rcx),%rcx > + jnz .Loop_enc1_5 > +.byte 102,15,56,221,209 > + shll $4,%r10d > + movl $16,%eax > + movups (%rdi),%xmm8 > + paddq %xmm9,%xmm6 > + leaq 16(%rdi),%rdi > + subq %r10,%rax > + leaq 32(%r11,%r10,1),%rcx > + movq %rax,%r10 > + jmp .Lccm64_dec_outer > +.align 16 > +.Lccm64_dec_outer: > + xorps %xmm2,%xmm8 > + movdqa %xmm6,%xmm2 > + movups %xmm8,(%rsi) > + leaq 16(%rsi),%rsi > +.byte 102,15,56,0,215 > + > + subq $1,%rdx > + jz .Lccm64_dec_break > + > + movups (%r11),%xmm0 > + movq %r10,%rax > + movups 16(%r11),%xmm1 > + xorps %xmm0,%xmm8 > + xorps %xmm0,%xmm2 > + xorps %xmm8,%xmm3 > + movups 32(%r11),%xmm0 > + jmp .Lccm64_dec2_loop > +.align 16 > +.Lccm64_dec2_loop: > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > + movups (%rcx,%rax,1),%xmm1 > + addq $32,%rax > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > + movups -16(%rcx,%rax,1),%xmm0 > + jnz .Lccm64_dec2_loop > + movups (%rdi),%xmm8 > + paddq %xmm9,%xmm6 > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,221,208 > +.byte 102,15,56,221,216 > + leaq 16(%rdi),%rdi > + jmp .Lccm64_dec_outer > + > +.align 16 > +.Lccm64_dec_break: > + > + movl 240(%r11),%eax > + movups (%r11),%xmm0 > + movups 16(%r11),%xmm1 > + xorps %xmm0,%xmm8 > + leaq 32(%r11),%r11 > + xorps %xmm8,%xmm3 > +.Loop_enc1_6: > +.byte 102,15,56,220,217 > + decl %eax > + movups (%r11),%xmm1 > + leaq 16(%r11),%r11 > + jnz .Loop_enc1_6 > +.byte 102,15,56,221,217 > + pxor %xmm0,%xmm0 > + pxor %xmm1,%xmm1 > + pxor %xmm2,%xmm2 > + movups %xmm3,(%r9) > + pxor %xmm3,%xmm3 > + pxor %xmm8,%xmm8 > + pxor %xmm6,%xmm6 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size aesni_ccm64_decrypt_blocks,.-aesni_ccm64_decrypt_blocks > +.globl aesni_ctr32_encrypt_blocks > +.type aesni_ctr32_encrypt_blocks,@function > +.align 16 > +aesni_ctr32_encrypt_blocks: > +.cfi_startproc > + cmpq $1,%rdx > + jne .Lctr32_bulk > + > + > + > + movups (%r8),%xmm2 > + movups (%rdi),%xmm3 > + movl 240(%rcx),%edx > + movups (%rcx),%xmm0 > + movups 16(%rcx),%xmm1 > + leaq 32(%rcx),%rcx > + xorps %xmm0,%xmm2 > +.Loop_enc1_7: > +.byte 102,15,56,220,209 > + decl %edx > + movups (%rcx),%xmm1 > + leaq 16(%rcx),%rcx > + jnz .Loop_enc1_7 > +.byte 102,15,56,221,209 > + pxor %xmm0,%xmm0 > + pxor %xmm1,%xmm1 > + xorps %xmm3,%xmm2 > + pxor %xmm3,%xmm3 > + movups %xmm2,(%rsi) > + xorps %xmm2,%xmm2 > + jmp .Lctr32_epilogue > + > +.align 16 > +.Lctr32_bulk: > + leaq (%rsp),%r11 > +.cfi_def_cfa_register %r11 > + pushq %rbp > +.cfi_offset %rbp,-16 > + subq $128,%rsp > + andq $-16,%rsp > + > + > + > + > + movdqu (%r8),%xmm2 > + movdqu (%rcx),%xmm0 > + movl 12(%r8),%r8d > + pxor %xmm0,%xmm2 > + movl 12(%rcx),%ebp > + movdqa %xmm2,0(%rsp) > + bswapl %r8d > + movdqa %xmm2,%xmm3 > + movdqa %xmm2,%xmm4 > + movdqa %xmm2,%xmm5 > + movdqa %xmm2,64(%rsp) > + movdqa %xmm2,80(%rsp) > + movdqa 
%xmm2,96(%rsp) > + movq %rdx,%r10 > + movdqa %xmm2,112(%rsp) > + > + leaq 1(%r8),%rax > + leaq 2(%r8),%rdx > + bswapl %eax > + bswapl %edx > + xorl %ebp,%eax > + xorl %ebp,%edx > +.byte 102,15,58,34,216,3 > + leaq 3(%r8),%rax > + movdqa %xmm3,16(%rsp) > +.byte 102,15,58,34,226,3 > + bswapl %eax > + movq %r10,%rdx > + leaq 4(%r8),%r10 > + movdqa %xmm4,32(%rsp) > + xorl %ebp,%eax > + bswapl %r10d > +.byte 102,15,58,34,232,3 > + xorl %ebp,%r10d > + movdqa %xmm5,48(%rsp) > + leaq 5(%r8),%r9 > + movl %r10d,64+12(%rsp) > + bswapl %r9d > + leaq 6(%r8),%r10 > + movl 240(%rcx),%eax > + xorl %ebp,%r9d > + bswapl %r10d > + movl %r9d,80+12(%rsp) > + xorl %ebp,%r10d > + leaq 7(%r8),%r9 > + movl %r10d,96+12(%rsp) > + bswapl %r9d > + movl OPENSSL_ia32cap_P+4(%rip),%r10d > + xorl %ebp,%r9d > + andl $71303168,%r10d > + movl %r9d,112+12(%rsp) > + > + movups 16(%rcx),%xmm1 > + > + movdqa 64(%rsp),%xmm6 > + movdqa 80(%rsp),%xmm7 > + > + cmpq $8,%rdx > + jb .Lctr32_tail > + > + subq $6,%rdx > + cmpl $4194304,%r10d > + je .Lctr32_6x > + > + leaq 128(%rcx),%rcx > + subq $2,%rdx > + jmp .Lctr32_loop8 > + > +.align 16 > +.Lctr32_6x: > + shll $4,%eax > + movl $48,%r10d > + bswapl %ebp > + leaq 32(%rcx,%rax,1),%rcx > + subq %rax,%r10 > + jmp .Lctr32_loop6 > + > +.align 16 > +.Lctr32_loop6: > + addl $6,%r8d > + movups -48(%rcx,%r10,1),%xmm0 > +.byte 102,15,56,220,209 > + movl %r8d,%eax > + xorl %ebp,%eax > +.byte 102,15,56,220,217 > +.byte 0x0f,0x38,0xf1,0x44,0x24,12 > + leal 1(%r8),%eax > +.byte 102,15,56,220,225 > + xorl %ebp,%eax > +.byte 0x0f,0x38,0xf1,0x44,0x24,28 > +.byte 102,15,56,220,233 > + leal 2(%r8),%eax > + xorl %ebp,%eax > +.byte 102,15,56,220,241 > +.byte 0x0f,0x38,0xf1,0x44,0x24,44 > + leal 3(%r8),%eax > +.byte 102,15,56,220,249 > + movups -32(%rcx,%r10,1),%xmm1 > + xorl %ebp,%eax > + > +.byte 102,15,56,220,208 > +.byte 0x0f,0x38,0xf1,0x44,0x24,60 > + leal 4(%r8),%eax > +.byte 102,15,56,220,216 > + xorl %ebp,%eax > +.byte 0x0f,0x38,0xf1,0x44,0x24,76 > +.byte 102,15,56,220,224 > + leal 5(%r8),%eax > + xorl %ebp,%eax > +.byte 102,15,56,220,232 > +.byte 0x0f,0x38,0xf1,0x44,0x24,92 > + movq %r10,%rax > +.byte 102,15,56,220,240 > +.byte 102,15,56,220,248 > + movups -16(%rcx,%r10,1),%xmm0 > + > + call .Lenc_loop6 > + > + movdqu (%rdi),%xmm8 > + movdqu 16(%rdi),%xmm9 > + movdqu 32(%rdi),%xmm10 > + movdqu 48(%rdi),%xmm11 > + movdqu 64(%rdi),%xmm12 > + movdqu 80(%rdi),%xmm13 > + leaq 96(%rdi),%rdi > + movups -64(%rcx,%r10,1),%xmm1 > + pxor %xmm2,%xmm8 > + movaps 0(%rsp),%xmm2 > + pxor %xmm3,%xmm9 > + movaps 16(%rsp),%xmm3 > + pxor %xmm4,%xmm10 > + movaps 32(%rsp),%xmm4 > + pxor %xmm5,%xmm11 > + movaps 48(%rsp),%xmm5 > + pxor %xmm6,%xmm12 > + movaps 64(%rsp),%xmm6 > + pxor %xmm7,%xmm13 > + movaps 80(%rsp),%xmm7 > + movdqu %xmm8,(%rsi) > + movdqu %xmm9,16(%rsi) > + movdqu %xmm10,32(%rsi) > + movdqu %xmm11,48(%rsi) > + movdqu %xmm12,64(%rsi) > + movdqu %xmm13,80(%rsi) > + leaq 96(%rsi),%rsi > + > + subq $6,%rdx > + jnc .Lctr32_loop6 > + > + addq $6,%rdx > + jz .Lctr32_done > + > + leal -48(%r10),%eax > + leaq -80(%rcx,%r10,1),%rcx > + negl %eax > + shrl $4,%eax > + jmp .Lctr32_tail > + > +.align 32 > +.Lctr32_loop8: > + addl $8,%r8d > + movdqa 96(%rsp),%xmm8 > +.byte 102,15,56,220,209 > + movl %r8d,%r9d > + movdqa 112(%rsp),%xmm9 > +.byte 102,15,56,220,217 > + bswapl %r9d > + movups 32-128(%rcx),%xmm0 > +.byte 102,15,56,220,225 > + xorl %ebp,%r9d > + nop > +.byte 102,15,56,220,233 > + movl %r9d,0+12(%rsp) > + leaq 1(%r8),%r9 > +.byte 102,15,56,220,241 > +.byte 102,15,56,220,249 > +.byte 102,68,15,56,220,193 > 
+.byte 102,68,15,56,220,201 > + movups 48-128(%rcx),%xmm1 > + bswapl %r9d > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > + xorl %ebp,%r9d > +.byte 0x66,0x90 > +.byte 102,15,56,220,224 > +.byte 102,15,56,220,232 > + movl %r9d,16+12(%rsp) > + leaq 2(%r8),%r9 > +.byte 102,15,56,220,240 > +.byte 102,15,56,220,248 > +.byte 102,68,15,56,220,192 > +.byte 102,68,15,56,220,200 > + movups 64-128(%rcx),%xmm0 > + bswapl %r9d > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > + xorl %ebp,%r9d > +.byte 0x66,0x90 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > + movl %r9d,32+12(%rsp) > + leaq 3(%r8),%r9 > +.byte 102,15,56,220,241 > +.byte 102,15,56,220,249 > +.byte 102,68,15,56,220,193 > +.byte 102,68,15,56,220,201 > + movups 80-128(%rcx),%xmm1 > + bswapl %r9d > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > + xorl %ebp,%r9d > +.byte 0x66,0x90 > +.byte 102,15,56,220,224 > +.byte 102,15,56,220,232 > + movl %r9d,48+12(%rsp) > + leaq 4(%r8),%r9 > +.byte 102,15,56,220,240 > +.byte 102,15,56,220,248 > +.byte 102,68,15,56,220,192 > +.byte 102,68,15,56,220,200 > + movups 96-128(%rcx),%xmm0 > + bswapl %r9d > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > + xorl %ebp,%r9d > +.byte 0x66,0x90 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > + movl %r9d,64+12(%rsp) > + leaq 5(%r8),%r9 > +.byte 102,15,56,220,241 > +.byte 102,15,56,220,249 > +.byte 102,68,15,56,220,193 > +.byte 102,68,15,56,220,201 > + movups 112-128(%rcx),%xmm1 > + bswapl %r9d > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > + xorl %ebp,%r9d > +.byte 0x66,0x90 > +.byte 102,15,56,220,224 > +.byte 102,15,56,220,232 > + movl %r9d,80+12(%rsp) > + leaq 6(%r8),%r9 > +.byte 102,15,56,220,240 > +.byte 102,15,56,220,248 > +.byte 102,68,15,56,220,192 > +.byte 102,68,15,56,220,200 > + movups 128-128(%rcx),%xmm0 > + bswapl %r9d > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > + xorl %ebp,%r9d > +.byte 0x66,0x90 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > + movl %r9d,96+12(%rsp) > + leaq 7(%r8),%r9 > +.byte 102,15,56,220,241 > +.byte 102,15,56,220,249 > +.byte 102,68,15,56,220,193 > +.byte 102,68,15,56,220,201 > + movups 144-128(%rcx),%xmm1 > + bswapl %r9d > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > +.byte 102,15,56,220,224 > + xorl %ebp,%r9d > + movdqu 0(%rdi),%xmm10 > +.byte 102,15,56,220,232 > + movl %r9d,112+12(%rsp) > + cmpl $11,%eax > +.byte 102,15,56,220,240 > +.byte 102,15,56,220,248 > +.byte 102,68,15,56,220,192 > +.byte 102,68,15,56,220,200 > + movups 160-128(%rcx),%xmm0 > + > + jb .Lctr32_enc_done > + > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > +.byte 102,15,56,220,241 > +.byte 102,15,56,220,249 > +.byte 102,68,15,56,220,193 > +.byte 102,68,15,56,220,201 > + movups 176-128(%rcx),%xmm1 > + > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > +.byte 102,15,56,220,224 > +.byte 102,15,56,220,232 > +.byte 102,15,56,220,240 > +.byte 102,15,56,220,248 > +.byte 102,68,15,56,220,192 > +.byte 102,68,15,56,220,200 > + movups 192-128(%rcx),%xmm0 > + je .Lctr32_enc_done > + > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > +.byte 102,15,56,220,241 > +.byte 102,15,56,220,249 > +.byte 102,68,15,56,220,193 > +.byte 102,68,15,56,220,201 > + movups 208-128(%rcx),%xmm1 > + > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > +.byte 102,15,56,220,224 > +.byte 102,15,56,220,232 > +.byte 102,15,56,220,240 > +.byte 102,15,56,220,248 > +.byte 102,68,15,56,220,192 > 
+.byte 102,68,15,56,220,200 > + movups 224-128(%rcx),%xmm0 > + jmp .Lctr32_enc_done > + > +.align 16 > +.Lctr32_enc_done: > + movdqu 16(%rdi),%xmm11 > + pxor %xmm0,%xmm10 > + movdqu 32(%rdi),%xmm12 > + pxor %xmm0,%xmm11 > + movdqu 48(%rdi),%xmm13 > + pxor %xmm0,%xmm12 > + movdqu 64(%rdi),%xmm14 > + pxor %xmm0,%xmm13 > + movdqu 80(%rdi),%xmm15 > + pxor %xmm0,%xmm14 > + pxor %xmm0,%xmm15 > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > +.byte 102,15,56,220,241 > +.byte 102,15,56,220,249 > +.byte 102,68,15,56,220,193 > +.byte 102,68,15,56,220,201 > + movdqu 96(%rdi),%xmm1 > + leaq 128(%rdi),%rdi > + > +.byte 102,65,15,56,221,210 > + pxor %xmm0,%xmm1 > + movdqu 112-128(%rdi),%xmm10 > +.byte 102,65,15,56,221,219 > + pxor %xmm0,%xmm10 > + movdqa 0(%rsp),%xmm11 > +.byte 102,65,15,56,221,228 > +.byte 102,65,15,56,221,237 > + movdqa 16(%rsp),%xmm12 > + movdqa 32(%rsp),%xmm13 > +.byte 102,65,15,56,221,246 > +.byte 102,65,15,56,221,255 > + movdqa 48(%rsp),%xmm14 > + movdqa 64(%rsp),%xmm15 > +.byte 102,68,15,56,221,193 > + movdqa 80(%rsp),%xmm0 > + movups 16-128(%rcx),%xmm1 > +.byte 102,69,15,56,221,202 > + > + movups %xmm2,(%rsi) > + movdqa %xmm11,%xmm2 > + movups %xmm3,16(%rsi) > + movdqa %xmm12,%xmm3 > + movups %xmm4,32(%rsi) > + movdqa %xmm13,%xmm4 > + movups %xmm5,48(%rsi) > + movdqa %xmm14,%xmm5 > + movups %xmm6,64(%rsi) > + movdqa %xmm15,%xmm6 > + movups %xmm7,80(%rsi) > + movdqa %xmm0,%xmm7 > + movups %xmm8,96(%rsi) > + movups %xmm9,112(%rsi) > + leaq 128(%rsi),%rsi > + > + subq $8,%rdx > + jnc .Lctr32_loop8 > + > + addq $8,%rdx > + jz .Lctr32_done > + leaq -128(%rcx),%rcx > + > +.Lctr32_tail: > + > + > + leaq 16(%rcx),%rcx > + cmpq $4,%rdx > + jb .Lctr32_loop3 > + je .Lctr32_loop4 > + > + > + shll $4,%eax > + movdqa 96(%rsp),%xmm8 > + pxor %xmm9,%xmm9 > + > + movups 16(%rcx),%xmm0 > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > + leaq 32-16(%rcx,%rax,1),%rcx > + negq %rax > +.byte 102,15,56,220,225 > + addq $16,%rax > + movups (%rdi),%xmm10 > +.byte 102,15,56,220,233 > +.byte 102,15,56,220,241 > + movups 16(%rdi),%xmm11 > + movups 32(%rdi),%xmm12 > +.byte 102,15,56,220,249 > +.byte 102,68,15,56,220,193 > + > + call .Lenc_loop8_enter > + > + movdqu 48(%rdi),%xmm13 > + pxor %xmm10,%xmm2 > + movdqu 64(%rdi),%xmm10 > + pxor %xmm11,%xmm3 > + movdqu %xmm2,(%rsi) > + pxor %xmm12,%xmm4 > + movdqu %xmm3,16(%rsi) > + pxor %xmm13,%xmm5 > + movdqu %xmm4,32(%rsi) > + pxor %xmm10,%xmm6 > + movdqu %xmm5,48(%rsi) > + movdqu %xmm6,64(%rsi) > + cmpq $6,%rdx > + jb .Lctr32_done > + > + movups 80(%rdi),%xmm11 > + xorps %xmm11,%xmm7 > + movups %xmm7,80(%rsi) > + je .Lctr32_done > + > + movups 96(%rdi),%xmm12 > + xorps %xmm12,%xmm8 > + movups %xmm8,96(%rsi) > + jmp .Lctr32_done > + > +.align 32 > +.Lctr32_loop4: > +.byte 102,15,56,220,209 > + leaq 16(%rcx),%rcx > + decl %eax > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > + movups (%rcx),%xmm1 > + jnz .Lctr32_loop4 > +.byte 102,15,56,221,209 > +.byte 102,15,56,221,217 > + movups (%rdi),%xmm10 > + movups 16(%rdi),%xmm11 > +.byte 102,15,56,221,225 > +.byte 102,15,56,221,233 > + movups 32(%rdi),%xmm12 > + movups 48(%rdi),%xmm13 > + > + xorps %xmm10,%xmm2 > + movups %xmm2,(%rsi) > + xorps %xmm11,%xmm3 > + movups %xmm3,16(%rsi) > + pxor %xmm12,%xmm4 > + movdqu %xmm4,32(%rsi) > + pxor %xmm13,%xmm5 > + movdqu %xmm5,48(%rsi) > + jmp .Lctr32_done > + > +.align 32 > +.Lctr32_loop3: > +.byte 102,15,56,220,209 > + leaq 16(%rcx),%rcx > + decl %eax > +.byte 102,15,56,220,217 
> +.byte 102,15,56,220,225 > + movups (%rcx),%xmm1 > + jnz .Lctr32_loop3 > +.byte 102,15,56,221,209 > +.byte 102,15,56,221,217 > +.byte 102,15,56,221,225 > + > + movups (%rdi),%xmm10 > + xorps %xmm10,%xmm2 > + movups %xmm2,(%rsi) > + cmpq $2,%rdx > + jb .Lctr32_done > + > + movups 16(%rdi),%xmm11 > + xorps %xmm11,%xmm3 > + movups %xmm3,16(%rsi) > + je .Lctr32_done > + > + movups 32(%rdi),%xmm12 > + xorps %xmm12,%xmm4 > + movups %xmm4,32(%rsi) > + > +.Lctr32_done: > + xorps %xmm0,%xmm0 > + xorl %ebp,%ebp > + pxor %xmm1,%xmm1 > + pxor %xmm2,%xmm2 > + pxor %xmm3,%xmm3 > + pxor %xmm4,%xmm4 > + pxor %xmm5,%xmm5 > + pxor %xmm6,%xmm6 > + pxor %xmm7,%xmm7 > + movaps %xmm0,0(%rsp) > + pxor %xmm8,%xmm8 > + movaps %xmm0,16(%rsp) > + pxor %xmm9,%xmm9 > + movaps %xmm0,32(%rsp) > + pxor %xmm10,%xmm10 > + movaps %xmm0,48(%rsp) > + pxor %xmm11,%xmm11 > + movaps %xmm0,64(%rsp) > + pxor %xmm12,%xmm12 > + movaps %xmm0,80(%rsp) > + pxor %xmm13,%xmm13 > + movaps %xmm0,96(%rsp) > + pxor %xmm14,%xmm14 > + movaps %xmm0,112(%rsp) > + pxor %xmm15,%xmm15 > + movq -8(%r11),%rbp > +.cfi_restore %rbp > + leaq (%r11),%rsp > +.cfi_def_cfa_register %rsp > +.Lctr32_epilogue: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size aesni_ctr32_encrypt_blocks,.-aesni_ctr32_encrypt_blocks > +.globl aesni_xts_encrypt > +.type aesni_xts_encrypt,@function > +.align 16 > +aesni_xts_encrypt: > +.cfi_startproc > + leaq (%rsp),%r11 > +.cfi_def_cfa_register %r11 > + pushq %rbp > +.cfi_offset %rbp,-16 > + subq $112,%rsp > + andq $-16,%rsp > + movups (%r9),%xmm2 > + movl 240(%r8),%eax > + movl 240(%rcx),%r10d > + movups (%r8),%xmm0 > + movups 16(%r8),%xmm1 > + leaq 32(%r8),%r8 > + xorps %xmm0,%xmm2 > +.Loop_enc1_8: > +.byte 102,15,56,220,209 > + decl %eax > + movups (%r8),%xmm1 > + leaq 16(%r8),%r8 > + jnz .Loop_enc1_8 > +.byte 102,15,56,221,209 > + movups (%rcx),%xmm0 > + movq %rcx,%rbp > + movl %r10d,%eax > + shll $4,%r10d > + movq %rdx,%r9 > + andq $-16,%rdx > + > + movups 16(%rcx,%r10,1),%xmm1 > + > + movdqa .Lxts_magic(%rip),%xmm8 > + movdqa %xmm2,%xmm15 > + pshufd $0x5f,%xmm2,%xmm9 > + pxor %xmm0,%xmm1 > + movdqa %xmm9,%xmm14 > + paddd %xmm9,%xmm9 > + movdqa %xmm15,%xmm10 > + psrad $31,%xmm14 > + paddq %xmm15,%xmm15 > + pand %xmm8,%xmm14 > + pxor %xmm0,%xmm10 > + pxor %xmm14,%xmm15 > + movdqa %xmm9,%xmm14 > + paddd %xmm9,%xmm9 > + movdqa %xmm15,%xmm11 > + psrad $31,%xmm14 > + paddq %xmm15,%xmm15 > + pand %xmm8,%xmm14 > + pxor %xmm0,%xmm11 > + pxor %xmm14,%xmm15 > + movdqa %xmm9,%xmm14 > + paddd %xmm9,%xmm9 > + movdqa %xmm15,%xmm12 > + psrad $31,%xmm14 > + paddq %xmm15,%xmm15 > + pand %xmm8,%xmm14 > + pxor %xmm0,%xmm12 > + pxor %xmm14,%xmm15 > + movdqa %xmm9,%xmm14 > + paddd %xmm9,%xmm9 > + movdqa %xmm15,%xmm13 > + psrad $31,%xmm14 > + paddq %xmm15,%xmm15 > + pand %xmm8,%xmm14 > + pxor %xmm0,%xmm13 > + pxor %xmm14,%xmm15 > + movdqa %xmm15,%xmm14 > + psrad $31,%xmm9 > + paddq %xmm15,%xmm15 > + pand %xmm8,%xmm9 > + pxor %xmm0,%xmm14 > + pxor %xmm9,%xmm15 > + movaps %xmm1,96(%rsp) > + > + subq $96,%rdx > + jc .Lxts_enc_short > + > + movl $16+96,%eax > + leaq 32(%rbp,%r10,1),%rcx > + subq %r10,%rax > + movups 16(%rbp),%xmm1 > + movq %rax,%r10 > + leaq .Lxts_magic(%rip),%r8 > + jmp .Lxts_enc_grandloop > + > +.align 32 > +.Lxts_enc_grandloop: > + movdqu 0(%rdi),%xmm2 > + movdqa %xmm0,%xmm8 > + movdqu 16(%rdi),%xmm3 > + pxor %xmm10,%xmm2 > + movdqu 32(%rdi),%xmm4 > + pxor %xmm11,%xmm3 > +.byte 102,15,56,220,209 > + movdqu 48(%rdi),%xmm5 > + pxor %xmm12,%xmm4 > +.byte 102,15,56,220,217 > + movdqu 64(%rdi),%xmm6 > + pxor %xmm13,%xmm5 > +.byte 
102,15,56,220,225 > + movdqu 80(%rdi),%xmm7 > + pxor %xmm15,%xmm8 > + movdqa 96(%rsp),%xmm9 > + pxor %xmm14,%xmm6 > +.byte 102,15,56,220,233 > + movups 32(%rbp),%xmm0 > + leaq 96(%rdi),%rdi > + pxor %xmm8,%xmm7 > + > + pxor %xmm9,%xmm10 > +.byte 102,15,56,220,241 > + pxor %xmm9,%xmm11 > + movdqa %xmm10,0(%rsp) > +.byte 102,15,56,220,249 > + movups 48(%rbp),%xmm1 > + pxor %xmm9,%xmm12 > + > +.byte 102,15,56,220,208 > + pxor %xmm9,%xmm13 > + movdqa %xmm11,16(%rsp) > +.byte 102,15,56,220,216 > + pxor %xmm9,%xmm14 > + movdqa %xmm12,32(%rsp) > +.byte 102,15,56,220,224 > +.byte 102,15,56,220,232 > + pxor %xmm9,%xmm8 > + movdqa %xmm14,64(%rsp) > +.byte 102,15,56,220,240 > +.byte 102,15,56,220,248 > + movups 64(%rbp),%xmm0 > + movdqa %xmm8,80(%rsp) > + pshufd $0x5f,%xmm15,%xmm9 > + jmp .Lxts_enc_loop6 > +.align 32 > +.Lxts_enc_loop6: > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > +.byte 102,15,56,220,241 > +.byte 102,15,56,220,249 > + movups -64(%rcx,%rax,1),%xmm1 > + addq $32,%rax > + > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > +.byte 102,15,56,220,224 > +.byte 102,15,56,220,232 > +.byte 102,15,56,220,240 > +.byte 102,15,56,220,248 > + movups -80(%rcx,%rax,1),%xmm0 > + jnz .Lxts_enc_loop6 > + > + movdqa (%r8),%xmm8 > + movdqa %xmm9,%xmm14 > + paddd %xmm9,%xmm9 > +.byte 102,15,56,220,209 > + paddq %xmm15,%xmm15 > + psrad $31,%xmm14 > +.byte 102,15,56,220,217 > + pand %xmm8,%xmm14 > + movups (%rbp),%xmm10 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > +.byte 102,15,56,220,241 > + pxor %xmm14,%xmm15 > + movaps %xmm10,%xmm11 > +.byte 102,15,56,220,249 > + movups -64(%rcx),%xmm1 > + > + movdqa %xmm9,%xmm14 > +.byte 102,15,56,220,208 > + paddd %xmm9,%xmm9 > + pxor %xmm15,%xmm10 > +.byte 102,15,56,220,216 > + psrad $31,%xmm14 > + paddq %xmm15,%xmm15 > +.byte 102,15,56,220,224 > +.byte 102,15,56,220,232 > + pand %xmm8,%xmm14 > + movaps %xmm11,%xmm12 > +.byte 102,15,56,220,240 > + pxor %xmm14,%xmm15 > + movdqa %xmm9,%xmm14 > +.byte 102,15,56,220,248 > + movups -48(%rcx),%xmm0 > + > + paddd %xmm9,%xmm9 > +.byte 102,15,56,220,209 > + pxor %xmm15,%xmm11 > + psrad $31,%xmm14 > +.byte 102,15,56,220,217 > + paddq %xmm15,%xmm15 > + pand %xmm8,%xmm14 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > + movdqa %xmm13,48(%rsp) > + pxor %xmm14,%xmm15 > +.byte 102,15,56,220,241 > + movaps %xmm12,%xmm13 > + movdqa %xmm9,%xmm14 > +.byte 102,15,56,220,249 > + movups -32(%rcx),%xmm1 > + > + paddd %xmm9,%xmm9 > +.byte 102,15,56,220,208 > + pxor %xmm15,%xmm12 > + psrad $31,%xmm14 > +.byte 102,15,56,220,216 > + paddq %xmm15,%xmm15 > + pand %xmm8,%xmm14 > +.byte 102,15,56,220,224 > +.byte 102,15,56,220,232 > +.byte 102,15,56,220,240 > + pxor %xmm14,%xmm15 > + movaps %xmm13,%xmm14 > +.byte 102,15,56,220,248 > + > + movdqa %xmm9,%xmm0 > + paddd %xmm9,%xmm9 > +.byte 102,15,56,220,209 > + pxor %xmm15,%xmm13 > + psrad $31,%xmm0 > +.byte 102,15,56,220,217 > + paddq %xmm15,%xmm15 > + pand %xmm8,%xmm0 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > + pxor %xmm0,%xmm15 > + movups (%rbp),%xmm0 > +.byte 102,15,56,220,241 > +.byte 102,15,56,220,249 > + movups 16(%rbp),%xmm1 > + > + pxor %xmm15,%xmm14 > +.byte 102,15,56,221,84,36,0 > + psrad $31,%xmm9 > + paddq %xmm15,%xmm15 > +.byte 102,15,56,221,92,36,16 > +.byte 102,15,56,221,100,36,32 > + pand %xmm8,%xmm9 > + movq %r10,%rax > +.byte 102,15,56,221,108,36,48 > +.byte 102,15,56,221,116,36,64 > +.byte 102,15,56,221,124,36,80 > + pxor %xmm9,%xmm15 > + > + leaq 96(%rsi),%rsi > + movups 
%xmm2,-96(%rsi) > + movups %xmm3,-80(%rsi) > + movups %xmm4,-64(%rsi) > + movups %xmm5,-48(%rsi) > + movups %xmm6,-32(%rsi) > + movups %xmm7,-16(%rsi) > + subq $96,%rdx > + jnc .Lxts_enc_grandloop > + > + movl $16+96,%eax > + subl %r10d,%eax > + movq %rbp,%rcx > + shrl $4,%eax > + > +.Lxts_enc_short: > + > + movl %eax,%r10d > + pxor %xmm0,%xmm10 > + addq $96,%rdx > + jz .Lxts_enc_done > + > + pxor %xmm0,%xmm11 > + cmpq $0x20,%rdx > + jb .Lxts_enc_one > + pxor %xmm0,%xmm12 > + je .Lxts_enc_two > + > + pxor %xmm0,%xmm13 > + cmpq $0x40,%rdx > + jb .Lxts_enc_three > + pxor %xmm0,%xmm14 > + je .Lxts_enc_four > + > + movdqu (%rdi),%xmm2 > + movdqu 16(%rdi),%xmm3 > + movdqu 32(%rdi),%xmm4 > + pxor %xmm10,%xmm2 > + movdqu 48(%rdi),%xmm5 > + pxor %xmm11,%xmm3 > + movdqu 64(%rdi),%xmm6 > + leaq 80(%rdi),%rdi > + pxor %xmm12,%xmm4 > + pxor %xmm13,%xmm5 > + pxor %xmm14,%xmm6 > + pxor %xmm7,%xmm7 > + > + call _aesni_encrypt6 > + > + xorps %xmm10,%xmm2 > + movdqa %xmm15,%xmm10 > + xorps %xmm11,%xmm3 > + xorps %xmm12,%xmm4 > + movdqu %xmm2,(%rsi) > + xorps %xmm13,%xmm5 > + movdqu %xmm3,16(%rsi) > + xorps %xmm14,%xmm6 > + movdqu %xmm4,32(%rsi) > + movdqu %xmm5,48(%rsi) > + movdqu %xmm6,64(%rsi) > + leaq 80(%rsi),%rsi > + jmp .Lxts_enc_done > + > +.align 16 > +.Lxts_enc_one: > + movups (%rdi),%xmm2 > + leaq 16(%rdi),%rdi > + xorps %xmm10,%xmm2 > + movups (%rcx),%xmm0 > + movups 16(%rcx),%xmm1 > + leaq 32(%rcx),%rcx > + xorps %xmm0,%xmm2 > +.Loop_enc1_9: > +.byte 102,15,56,220,209 > + decl %eax > + movups (%rcx),%xmm1 > + leaq 16(%rcx),%rcx > + jnz .Loop_enc1_9 > +.byte 102,15,56,221,209 > + xorps %xmm10,%xmm2 > + movdqa %xmm11,%xmm10 > + movups %xmm2,(%rsi) > + leaq 16(%rsi),%rsi > + jmp .Lxts_enc_done > + > +.align 16 > +.Lxts_enc_two: > + movups (%rdi),%xmm2 > + movups 16(%rdi),%xmm3 > + leaq 32(%rdi),%rdi > + xorps %xmm10,%xmm2 > + xorps %xmm11,%xmm3 > + > + call _aesni_encrypt2 > + > + xorps %xmm10,%xmm2 > + movdqa %xmm12,%xmm10 > + xorps %xmm11,%xmm3 > + movups %xmm2,(%rsi) > + movups %xmm3,16(%rsi) > + leaq 32(%rsi),%rsi > + jmp .Lxts_enc_done > + > +.align 16 > +.Lxts_enc_three: > + movups (%rdi),%xmm2 > + movups 16(%rdi),%xmm3 > + movups 32(%rdi),%xmm4 > + leaq 48(%rdi),%rdi > + xorps %xmm10,%xmm2 > + xorps %xmm11,%xmm3 > + xorps %xmm12,%xmm4 > + > + call _aesni_encrypt3 > + > + xorps %xmm10,%xmm2 > + movdqa %xmm13,%xmm10 > + xorps %xmm11,%xmm3 > + xorps %xmm12,%xmm4 > + movups %xmm2,(%rsi) > + movups %xmm3,16(%rsi) > + movups %xmm4,32(%rsi) > + leaq 48(%rsi),%rsi > + jmp .Lxts_enc_done > + > +.align 16 > +.Lxts_enc_four: > + movups (%rdi),%xmm2 > + movups 16(%rdi),%xmm3 > + movups 32(%rdi),%xmm4 > + xorps %xmm10,%xmm2 > + movups 48(%rdi),%xmm5 > + leaq 64(%rdi),%rdi > + xorps %xmm11,%xmm3 > + xorps %xmm12,%xmm4 > + xorps %xmm13,%xmm5 > + > + call _aesni_encrypt4 > + > + pxor %xmm10,%xmm2 > + movdqa %xmm14,%xmm10 > + pxor %xmm11,%xmm3 > + pxor %xmm12,%xmm4 > + movdqu %xmm2,(%rsi) > + pxor %xmm13,%xmm5 > + movdqu %xmm3,16(%rsi) > + movdqu %xmm4,32(%rsi) > + movdqu %xmm5,48(%rsi) > + leaq 64(%rsi),%rsi > + jmp .Lxts_enc_done > + > +.align 16 > +.Lxts_enc_done: > + andq $15,%r9 > + jz .Lxts_enc_ret > + movq %r9,%rdx > + > +.Lxts_enc_steal: > + movzbl (%rdi),%eax > + movzbl -16(%rsi),%ecx > + leaq 1(%rdi),%rdi > + movb %al,-16(%rsi) > + movb %cl,0(%rsi) > + leaq 1(%rsi),%rsi > + subq $1,%rdx > + jnz .Lxts_enc_steal > + > + subq %r9,%rsi > + movq %rbp,%rcx > + movl %r10d,%eax > + > + movups -16(%rsi),%xmm2 > + xorps %xmm10,%xmm2 > + movups (%rcx),%xmm0 > + movups 16(%rcx),%xmm1 > + leaq 32(%rcx),%rcx > 
+ xorps %xmm0,%xmm2 > +.Loop_enc1_10: > +.byte 102,15,56,220,209 > + decl %eax > + movups (%rcx),%xmm1 > + leaq 16(%rcx),%rcx > + jnz .Loop_enc1_10 > +.byte 102,15,56,221,209 > + xorps %xmm10,%xmm2 > + movups %xmm2,-16(%rsi) > + > +.Lxts_enc_ret: > + xorps %xmm0,%xmm0 > + pxor %xmm1,%xmm1 > + pxor %xmm2,%xmm2 > + pxor %xmm3,%xmm3 > + pxor %xmm4,%xmm4 > + pxor %xmm5,%xmm5 > + pxor %xmm6,%xmm6 > + pxor %xmm7,%xmm7 > + movaps %xmm0,0(%rsp) > + pxor %xmm8,%xmm8 > + movaps %xmm0,16(%rsp) > + pxor %xmm9,%xmm9 > + movaps %xmm0,32(%rsp) > + pxor %xmm10,%xmm10 > + movaps %xmm0,48(%rsp) > + pxor %xmm11,%xmm11 > + movaps %xmm0,64(%rsp) > + pxor %xmm12,%xmm12 > + movaps %xmm0,80(%rsp) > + pxor %xmm13,%xmm13 > + movaps %xmm0,96(%rsp) > + pxor %xmm14,%xmm14 > + pxor %xmm15,%xmm15 > + movq -8(%r11),%rbp > +.cfi_restore %rbp > + leaq (%r11),%rsp > +.cfi_def_cfa_register %rsp > +.Lxts_enc_epilogue: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size aesni_xts_encrypt,.-aesni_xts_encrypt > +.globl aesni_xts_decrypt > +.type aesni_xts_decrypt,@function > +.align 16 > +aesni_xts_decrypt: > +.cfi_startproc > + leaq (%rsp),%r11 > +.cfi_def_cfa_register %r11 > + pushq %rbp > +.cfi_offset %rbp,-16 > + subq $112,%rsp > + andq $-16,%rsp > + movups (%r9),%xmm2 > + movl 240(%r8),%eax > + movl 240(%rcx),%r10d > + movups (%r8),%xmm0 > + movups 16(%r8),%xmm1 > + leaq 32(%r8),%r8 > + xorps %xmm0,%xmm2 > +.Loop_enc1_11: > +.byte 102,15,56,220,209 > + decl %eax > + movups (%r8),%xmm1 > + leaq 16(%r8),%r8 > + jnz .Loop_enc1_11 > +.byte 102,15,56,221,209 > + xorl %eax,%eax > + testq $15,%rdx > + setnz %al > + shlq $4,%rax > + subq %rax,%rdx > + > + movups (%rcx),%xmm0 > + movq %rcx,%rbp > + movl %r10d,%eax > + shll $4,%r10d > + movq %rdx,%r9 > + andq $-16,%rdx > + > + movups 16(%rcx,%r10,1),%xmm1 > + > + movdqa .Lxts_magic(%rip),%xmm8 > + movdqa %xmm2,%xmm15 > + pshufd $0x5f,%xmm2,%xmm9 > + pxor %xmm0,%xmm1 > + movdqa %xmm9,%xmm14 > + paddd %xmm9,%xmm9 > + movdqa %xmm15,%xmm10 > + psrad $31,%xmm14 > + paddq %xmm15,%xmm15 > + pand %xmm8,%xmm14 > + pxor %xmm0,%xmm10 > + pxor %xmm14,%xmm15 > + movdqa %xmm9,%xmm14 > + paddd %xmm9,%xmm9 > + movdqa %xmm15,%xmm11 > + psrad $31,%xmm14 > + paddq %xmm15,%xmm15 > + pand %xmm8,%xmm14 > + pxor %xmm0,%xmm11 > + pxor %xmm14,%xmm15 > + movdqa %xmm9,%xmm14 > + paddd %xmm9,%xmm9 > + movdqa %xmm15,%xmm12 > + psrad $31,%xmm14 > + paddq %xmm15,%xmm15 > + pand %xmm8,%xmm14 > + pxor %xmm0,%xmm12 > + pxor %xmm14,%xmm15 > + movdqa %xmm9,%xmm14 > + paddd %xmm9,%xmm9 > + movdqa %xmm15,%xmm13 > + psrad $31,%xmm14 > + paddq %xmm15,%xmm15 > + pand %xmm8,%xmm14 > + pxor %xmm0,%xmm13 > + pxor %xmm14,%xmm15 > + movdqa %xmm15,%xmm14 > + psrad $31,%xmm9 > + paddq %xmm15,%xmm15 > + pand %xmm8,%xmm9 > + pxor %xmm0,%xmm14 > + pxor %xmm9,%xmm15 > + movaps %xmm1,96(%rsp) > + > + subq $96,%rdx > + jc .Lxts_dec_short > + > + movl $16+96,%eax > + leaq 32(%rbp,%r10,1),%rcx > + subq %r10,%rax > + movups 16(%rbp),%xmm1 > + movq %rax,%r10 > + leaq .Lxts_magic(%rip),%r8 > + jmp .Lxts_dec_grandloop > + > +.align 32 > +.Lxts_dec_grandloop: > + movdqu 0(%rdi),%xmm2 > + movdqa %xmm0,%xmm8 > + movdqu 16(%rdi),%xmm3 > + pxor %xmm10,%xmm2 > + movdqu 32(%rdi),%xmm4 > + pxor %xmm11,%xmm3 > +.byte 102,15,56,222,209 > + movdqu 48(%rdi),%xmm5 > + pxor %xmm12,%xmm4 > +.byte 102,15,56,222,217 > + movdqu 64(%rdi),%xmm6 > + pxor %xmm13,%xmm5 > +.byte 102,15,56,222,225 > + movdqu 80(%rdi),%xmm7 > + pxor %xmm15,%xmm8 > + movdqa 96(%rsp),%xmm9 > + pxor %xmm14,%xmm6 > +.byte 102,15,56,222,233 > + movups 32(%rbp),%xmm0 > + leaq 96(%rdi),%rdi > + 
pxor %xmm8,%xmm7 > + > + pxor %xmm9,%xmm10 > +.byte 102,15,56,222,241 > + pxor %xmm9,%xmm11 > + movdqa %xmm10,0(%rsp) > +.byte 102,15,56,222,249 > + movups 48(%rbp),%xmm1 > + pxor %xmm9,%xmm12 > + > +.byte 102,15,56,222,208 > + pxor %xmm9,%xmm13 > + movdqa %xmm11,16(%rsp) > +.byte 102,15,56,222,216 > + pxor %xmm9,%xmm14 > + movdqa %xmm12,32(%rsp) > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > + pxor %xmm9,%xmm8 > + movdqa %xmm14,64(%rsp) > +.byte 102,15,56,222,240 > +.byte 102,15,56,222,248 > + movups 64(%rbp),%xmm0 > + movdqa %xmm8,80(%rsp) > + pshufd $0x5f,%xmm15,%xmm9 > + jmp .Lxts_dec_loop6 > +.align 32 > +.Lxts_dec_loop6: > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > +.byte 102,15,56,222,241 > +.byte 102,15,56,222,249 > + movups -64(%rcx,%rax,1),%xmm1 > + addq $32,%rax > + > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > +.byte 102,15,56,222,240 > +.byte 102,15,56,222,248 > + movups -80(%rcx,%rax,1),%xmm0 > + jnz .Lxts_dec_loop6 > + > + movdqa (%r8),%xmm8 > + movdqa %xmm9,%xmm14 > + paddd %xmm9,%xmm9 > +.byte 102,15,56,222,209 > + paddq %xmm15,%xmm15 > + psrad $31,%xmm14 > +.byte 102,15,56,222,217 > + pand %xmm8,%xmm14 > + movups (%rbp),%xmm10 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > +.byte 102,15,56,222,241 > + pxor %xmm14,%xmm15 > + movaps %xmm10,%xmm11 > +.byte 102,15,56,222,249 > + movups -64(%rcx),%xmm1 > + > + movdqa %xmm9,%xmm14 > +.byte 102,15,56,222,208 > + paddd %xmm9,%xmm9 > + pxor %xmm15,%xmm10 > +.byte 102,15,56,222,216 > + psrad $31,%xmm14 > + paddq %xmm15,%xmm15 > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > + pand %xmm8,%xmm14 > + movaps %xmm11,%xmm12 > +.byte 102,15,56,222,240 > + pxor %xmm14,%xmm15 > + movdqa %xmm9,%xmm14 > +.byte 102,15,56,222,248 > + movups -48(%rcx),%xmm0 > + > + paddd %xmm9,%xmm9 > +.byte 102,15,56,222,209 > + pxor %xmm15,%xmm11 > + psrad $31,%xmm14 > +.byte 102,15,56,222,217 > + paddq %xmm15,%xmm15 > + pand %xmm8,%xmm14 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > + movdqa %xmm13,48(%rsp) > + pxor %xmm14,%xmm15 > +.byte 102,15,56,222,241 > + movaps %xmm12,%xmm13 > + movdqa %xmm9,%xmm14 > +.byte 102,15,56,222,249 > + movups -32(%rcx),%xmm1 > + > + paddd %xmm9,%xmm9 > +.byte 102,15,56,222,208 > + pxor %xmm15,%xmm12 > + psrad $31,%xmm14 > +.byte 102,15,56,222,216 > + paddq %xmm15,%xmm15 > + pand %xmm8,%xmm14 > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > +.byte 102,15,56,222,240 > + pxor %xmm14,%xmm15 > + movaps %xmm13,%xmm14 > +.byte 102,15,56,222,248 > + > + movdqa %xmm9,%xmm0 > + paddd %xmm9,%xmm9 > +.byte 102,15,56,222,209 > + pxor %xmm15,%xmm13 > + psrad $31,%xmm0 > +.byte 102,15,56,222,217 > + paddq %xmm15,%xmm15 > + pand %xmm8,%xmm0 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > + pxor %xmm0,%xmm15 > + movups (%rbp),%xmm0 > +.byte 102,15,56,222,241 > +.byte 102,15,56,222,249 > + movups 16(%rbp),%xmm1 > + > + pxor %xmm15,%xmm14 > +.byte 102,15,56,223,84,36,0 > + psrad $31,%xmm9 > + paddq %xmm15,%xmm15 > +.byte 102,15,56,223,92,36,16 > +.byte 102,15,56,223,100,36,32 > + pand %xmm8,%xmm9 > + movq %r10,%rax > +.byte 102,15,56,223,108,36,48 > +.byte 102,15,56,223,116,36,64 > +.byte 102,15,56,223,124,36,80 > + pxor %xmm9,%xmm15 > + > + leaq 96(%rsi),%rsi > + movups %xmm2,-96(%rsi) > + movups %xmm3,-80(%rsi) > + movups %xmm4,-64(%rsi) > + movups %xmm5,-48(%rsi) > + movups %xmm6,-32(%rsi) > + movups %xmm7,-16(%rsi) > + subq $96,%rdx > + jnc .Lxts_dec_grandloop > + > 
+ movl $16+96,%eax > + subl %r10d,%eax > + movq %rbp,%rcx > + shrl $4,%eax > + > +.Lxts_dec_short: > + > + movl %eax,%r10d > + pxor %xmm0,%xmm10 > + pxor %xmm0,%xmm11 > + addq $96,%rdx > + jz .Lxts_dec_done > + > + pxor %xmm0,%xmm12 > + cmpq $0x20,%rdx > + jb .Lxts_dec_one > + pxor %xmm0,%xmm13 > + je .Lxts_dec_two > + > + pxor %xmm0,%xmm14 > + cmpq $0x40,%rdx > + jb .Lxts_dec_three > + je .Lxts_dec_four > + > + movdqu (%rdi),%xmm2 > + movdqu 16(%rdi),%xmm3 > + movdqu 32(%rdi),%xmm4 > + pxor %xmm10,%xmm2 > + movdqu 48(%rdi),%xmm5 > + pxor %xmm11,%xmm3 > + movdqu 64(%rdi),%xmm6 > + leaq 80(%rdi),%rdi > + pxor %xmm12,%xmm4 > + pxor %xmm13,%xmm5 > + pxor %xmm14,%xmm6 > + > + call _aesni_decrypt6 > + > + xorps %xmm10,%xmm2 > + xorps %xmm11,%xmm3 > + xorps %xmm12,%xmm4 > + movdqu %xmm2,(%rsi) > + xorps %xmm13,%xmm5 > + movdqu %xmm3,16(%rsi) > + xorps %xmm14,%xmm6 > + movdqu %xmm4,32(%rsi) > + pxor %xmm14,%xmm14 > + movdqu %xmm5,48(%rsi) > + pcmpgtd %xmm15,%xmm14 > + movdqu %xmm6,64(%rsi) > + leaq 80(%rsi),%rsi > + pshufd $0x13,%xmm14,%xmm11 > + andq $15,%r9 > + jz .Lxts_dec_ret > + > + movdqa %xmm15,%xmm10 > + paddq %xmm15,%xmm15 > + pand %xmm8,%xmm11 > + pxor %xmm15,%xmm11 > + jmp .Lxts_dec_done2 > + > +.align 16 > +.Lxts_dec_one: > + movups (%rdi),%xmm2 > + leaq 16(%rdi),%rdi > + xorps %xmm10,%xmm2 > + movups (%rcx),%xmm0 > + movups 16(%rcx),%xmm1 > + leaq 32(%rcx),%rcx > + xorps %xmm0,%xmm2 > +.Loop_dec1_12: > +.byte 102,15,56,222,209 > + decl %eax > + movups (%rcx),%xmm1 > + leaq 16(%rcx),%rcx > + jnz .Loop_dec1_12 > +.byte 102,15,56,223,209 > + xorps %xmm10,%xmm2 > + movdqa %xmm11,%xmm10 > + movups %xmm2,(%rsi) > + movdqa %xmm12,%xmm11 > + leaq 16(%rsi),%rsi > + jmp .Lxts_dec_done > + > +.align 16 > +.Lxts_dec_two: > + movups (%rdi),%xmm2 > + movups 16(%rdi),%xmm3 > + leaq 32(%rdi),%rdi > + xorps %xmm10,%xmm2 > + xorps %xmm11,%xmm3 > + > + call _aesni_decrypt2 > + > + xorps %xmm10,%xmm2 > + movdqa %xmm12,%xmm10 > + xorps %xmm11,%xmm3 > + movdqa %xmm13,%xmm11 > + movups %xmm2,(%rsi) > + movups %xmm3,16(%rsi) > + leaq 32(%rsi),%rsi > + jmp .Lxts_dec_done > + > +.align 16 > +.Lxts_dec_three: > + movups (%rdi),%xmm2 > + movups 16(%rdi),%xmm3 > + movups 32(%rdi),%xmm4 > + leaq 48(%rdi),%rdi > + xorps %xmm10,%xmm2 > + xorps %xmm11,%xmm3 > + xorps %xmm12,%xmm4 > + > + call _aesni_decrypt3 > + > + xorps %xmm10,%xmm2 > + movdqa %xmm13,%xmm10 > + xorps %xmm11,%xmm3 > + movdqa %xmm14,%xmm11 > + xorps %xmm12,%xmm4 > + movups %xmm2,(%rsi) > + movups %xmm3,16(%rsi) > + movups %xmm4,32(%rsi) > + leaq 48(%rsi),%rsi > + jmp .Lxts_dec_done > + > +.align 16 > +.Lxts_dec_four: > + movups (%rdi),%xmm2 > + movups 16(%rdi),%xmm3 > + movups 32(%rdi),%xmm4 > + xorps %xmm10,%xmm2 > + movups 48(%rdi),%xmm5 > + leaq 64(%rdi),%rdi > + xorps %xmm11,%xmm3 > + xorps %xmm12,%xmm4 > + xorps %xmm13,%xmm5 > + > + call _aesni_decrypt4 > + > + pxor %xmm10,%xmm2 > + movdqa %xmm14,%xmm10 > + pxor %xmm11,%xmm3 > + movdqa %xmm15,%xmm11 > + pxor %xmm12,%xmm4 > + movdqu %xmm2,(%rsi) > + pxor %xmm13,%xmm5 > + movdqu %xmm3,16(%rsi) > + movdqu %xmm4,32(%rsi) > + movdqu %xmm5,48(%rsi) > + leaq 64(%rsi),%rsi > + jmp .Lxts_dec_done > + > +.align 16 > +.Lxts_dec_done: > + andq $15,%r9 > + jz .Lxts_dec_ret > +.Lxts_dec_done2: > + movq %r9,%rdx > + movq %rbp,%rcx > + movl %r10d,%eax > + > + movups (%rdi),%xmm2 > + xorps %xmm11,%xmm2 > + movups (%rcx),%xmm0 > + movups 16(%rcx),%xmm1 > + leaq 32(%rcx),%rcx > + xorps %xmm0,%xmm2 > +.Loop_dec1_13: > +.byte 102,15,56,222,209 > + decl %eax > + movups (%rcx),%xmm1 > + leaq 16(%rcx),%rcx > + jnz 
.Loop_dec1_13 > +.byte 102,15,56,223,209 > + xorps %xmm11,%xmm2 > + movups %xmm2,(%rsi) > + > +.Lxts_dec_steal: > + movzbl 16(%rdi),%eax > + movzbl (%rsi),%ecx > + leaq 1(%rdi),%rdi > + movb %al,(%rsi) > + movb %cl,16(%rsi) > + leaq 1(%rsi),%rsi > + subq $1,%rdx > + jnz .Lxts_dec_steal > + > + subq %r9,%rsi > + movq %rbp,%rcx > + movl %r10d,%eax > + > + movups (%rsi),%xmm2 > + xorps %xmm10,%xmm2 > + movups (%rcx),%xmm0 > + movups 16(%rcx),%xmm1 > + leaq 32(%rcx),%rcx > + xorps %xmm0,%xmm2 > +.Loop_dec1_14: > +.byte 102,15,56,222,209 > + decl %eax > + movups (%rcx),%xmm1 > + leaq 16(%rcx),%rcx > + jnz .Loop_dec1_14 > +.byte 102,15,56,223,209 > + xorps %xmm10,%xmm2 > + movups %xmm2,(%rsi) > + > +.Lxts_dec_ret: > + xorps %xmm0,%xmm0 > + pxor %xmm1,%xmm1 > + pxor %xmm2,%xmm2 > + pxor %xmm3,%xmm3 > + pxor %xmm4,%xmm4 > + pxor %xmm5,%xmm5 > + pxor %xmm6,%xmm6 > + pxor %xmm7,%xmm7 > + movaps %xmm0,0(%rsp) > + pxor %xmm8,%xmm8 > + movaps %xmm0,16(%rsp) > + pxor %xmm9,%xmm9 > + movaps %xmm0,32(%rsp) > + pxor %xmm10,%xmm10 > + movaps %xmm0,48(%rsp) > + pxor %xmm11,%xmm11 > + movaps %xmm0,64(%rsp) > + pxor %xmm12,%xmm12 > + movaps %xmm0,80(%rsp) > + pxor %xmm13,%xmm13 > + movaps %xmm0,96(%rsp) > + pxor %xmm14,%xmm14 > + pxor %xmm15,%xmm15 > + movq -8(%r11),%rbp > +.cfi_restore %rbp > + leaq (%r11),%rsp > +.cfi_def_cfa_register %rsp > +.Lxts_dec_epilogue: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size aesni_xts_decrypt,.-aesni_xts_decrypt > +.globl aesni_ocb_encrypt > +.type aesni_ocb_encrypt,@function > +.align 32 > +aesni_ocb_encrypt: > +.cfi_startproc > + leaq (%rsp),%rax > + pushq %rbx > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %rbx,-16 > + pushq %rbp > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %rbp,-24 > + pushq %r12 > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %r12,-32 > + pushq %r13 > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %r13,-40 > + pushq %r14 > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %r14,-48 > + movq 8(%rax),%rbx > + movq 8+8(%rax),%rbp > + > + movl 240(%rcx),%r10d > + movq %rcx,%r11 > + shll $4,%r10d > + movups (%rcx),%xmm9 > + movups 16(%rcx,%r10,1),%xmm1 > + > + movdqu (%r9),%xmm15 > + pxor %xmm1,%xmm9 > + pxor %xmm1,%xmm15 > + > + movl $16+32,%eax > + leaq 32(%r11,%r10,1),%rcx > + movups 16(%r11),%xmm1 > + subq %r10,%rax > + movq %rax,%r10 > + > + movdqu (%rbx),%xmm10 > + movdqu (%rbp),%xmm8 > + > + testq $1,%r8 > + jnz .Locb_enc_odd > + > + bsfq %r8,%r12 > + addq $1,%r8 > + shlq $4,%r12 > + movdqu (%rbx,%r12,1),%xmm7 > + movdqu (%rdi),%xmm2 > + leaq 16(%rdi),%rdi > + > + call __ocb_encrypt1 > + > + movdqa %xmm7,%xmm15 > + movups %xmm2,(%rsi) > + leaq 16(%rsi),%rsi > + subq $1,%rdx > + jz .Locb_enc_done > + > +.Locb_enc_odd: > + leaq 1(%r8),%r12 > + leaq 3(%r8),%r13 > + leaq 5(%r8),%r14 > + leaq 6(%r8),%r8 > + bsfq %r12,%r12 > + bsfq %r13,%r13 > + bsfq %r14,%r14 > + shlq $4,%r12 > + shlq $4,%r13 > + shlq $4,%r14 > + > + subq $6,%rdx > + jc .Locb_enc_short > + jmp .Locb_enc_grandloop > + > +.align 32 > +.Locb_enc_grandloop: > + movdqu 0(%rdi),%xmm2 > + movdqu 16(%rdi),%xmm3 > + movdqu 32(%rdi),%xmm4 > + movdqu 48(%rdi),%xmm5 > + movdqu 64(%rdi),%xmm6 > + movdqu 80(%rdi),%xmm7 > + leaq 96(%rdi),%rdi > + > + call __ocb_encrypt6 > + > + movups %xmm2,0(%rsi) > + movups %xmm3,16(%rsi) > + movups %xmm4,32(%rsi) > + movups %xmm5,48(%rsi) > + movups %xmm6,64(%rsi) > + movups %xmm7,80(%rsi) > + leaq 96(%rsi),%rsi > + subq $6,%rdx > + jnc .Locb_enc_grandloop > + > +.Locb_enc_short: > + addq $6,%rdx > + jz .Locb_enc_done > + > + movdqu 0(%rdi),%xmm2 > + cmpq $2,%rdx > + jb .Locb_enc_one > + movdqu 
16(%rdi),%xmm3 > + je .Locb_enc_two > + > + movdqu 32(%rdi),%xmm4 > + cmpq $4,%rdx > + jb .Locb_enc_three > + movdqu 48(%rdi),%xmm5 > + je .Locb_enc_four > + > + movdqu 64(%rdi),%xmm6 > + pxor %xmm7,%xmm7 > + > + call __ocb_encrypt6 > + > + movdqa %xmm14,%xmm15 > + movups %xmm2,0(%rsi) > + movups %xmm3,16(%rsi) > + movups %xmm4,32(%rsi) > + movups %xmm5,48(%rsi) > + movups %xmm6,64(%rsi) > + > + jmp .Locb_enc_done > + > +.align 16 > +.Locb_enc_one: > + movdqa %xmm10,%xmm7 > + > + call __ocb_encrypt1 > + > + movdqa %xmm7,%xmm15 > + movups %xmm2,0(%rsi) > + jmp .Locb_enc_done > + > +.align 16 > +.Locb_enc_two: > + pxor %xmm4,%xmm4 > + pxor %xmm5,%xmm5 > + > + call __ocb_encrypt4 > + > + movdqa %xmm11,%xmm15 > + movups %xmm2,0(%rsi) > + movups %xmm3,16(%rsi) > + > + jmp .Locb_enc_done > + > +.align 16 > +.Locb_enc_three: > + pxor %xmm5,%xmm5 > + > + call __ocb_encrypt4 > + > + movdqa %xmm12,%xmm15 > + movups %xmm2,0(%rsi) > + movups %xmm3,16(%rsi) > + movups %xmm4,32(%rsi) > + > + jmp .Locb_enc_done > + > +.align 16 > +.Locb_enc_four: > + call __ocb_encrypt4 > + > + movdqa %xmm13,%xmm15 > + movups %xmm2,0(%rsi) > + movups %xmm3,16(%rsi) > + movups %xmm4,32(%rsi) > + movups %xmm5,48(%rsi) > + > +.Locb_enc_done: > + pxor %xmm0,%xmm15 > + movdqu %xmm8,(%rbp) > + movdqu %xmm15,(%r9) > + > + xorps %xmm0,%xmm0 > + pxor %xmm1,%xmm1 > + pxor %xmm2,%xmm2 > + pxor %xmm3,%xmm3 > + pxor %xmm4,%xmm4 > + pxor %xmm5,%xmm5 > + pxor %xmm6,%xmm6 > + pxor %xmm7,%xmm7 > + pxor %xmm8,%xmm8 > + pxor %xmm9,%xmm9 > + pxor %xmm10,%xmm10 > + pxor %xmm11,%xmm11 > + pxor %xmm12,%xmm12 > + pxor %xmm13,%xmm13 > + pxor %xmm14,%xmm14 > + pxor %xmm15,%xmm15 > + leaq 40(%rsp),%rax > +.cfi_def_cfa %rax,8 > + movq -40(%rax),%r14 > +.cfi_restore %r14 > + movq -32(%rax),%r13 > +.cfi_restore %r13 > + movq -24(%rax),%r12 > +.cfi_restore %r12 > + movq -16(%rax),%rbp > +.cfi_restore %rbp > + movq -8(%rax),%rbx > +.cfi_restore %rbx > + leaq (%rax),%rsp > +.cfi_def_cfa_register %rsp > +.Locb_enc_epilogue: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size aesni_ocb_encrypt,.-aesni_ocb_encrypt > + > +.type __ocb_encrypt6,@function > +.align 32 > +__ocb_encrypt6: > +.cfi_startproc > + pxor %xmm9,%xmm15 > + movdqu (%rbx,%r12,1),%xmm11 > + movdqa %xmm10,%xmm12 > + movdqu (%rbx,%r13,1),%xmm13 > + movdqa %xmm10,%xmm14 > + pxor %xmm15,%xmm10 > + movdqu (%rbx,%r14,1),%xmm15 > + pxor %xmm10,%xmm11 > + pxor %xmm2,%xmm8 > + pxor %xmm10,%xmm2 > + pxor %xmm11,%xmm12 > + pxor %xmm3,%xmm8 > + pxor %xmm11,%xmm3 > + pxor %xmm12,%xmm13 > + pxor %xmm4,%xmm8 > + pxor %xmm12,%xmm4 > + pxor %xmm13,%xmm14 > + pxor %xmm5,%xmm8 > + pxor %xmm13,%xmm5 > + pxor %xmm14,%xmm15 > + pxor %xmm6,%xmm8 > + pxor %xmm14,%xmm6 > + pxor %xmm7,%xmm8 > + pxor %xmm15,%xmm7 > + movups 32(%r11),%xmm0 > + > + leaq 1(%r8),%r12 > + leaq 3(%r8),%r13 > + leaq 5(%r8),%r14 > + addq $6,%r8 > + pxor %xmm9,%xmm10 > + bsfq %r12,%r12 > + bsfq %r13,%r13 > + bsfq %r14,%r14 > + > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > + pxor %xmm9,%xmm11 > + pxor %xmm9,%xmm12 > +.byte 102,15,56,220,241 > + pxor %xmm9,%xmm13 > + pxor %xmm9,%xmm14 > +.byte 102,15,56,220,249 > + movups 48(%r11),%xmm1 > + pxor %xmm9,%xmm15 > + > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > +.byte 102,15,56,220,224 > +.byte 102,15,56,220,232 > +.byte 102,15,56,220,240 > +.byte 102,15,56,220,248 > + movups 64(%r11),%xmm0 > + shlq $4,%r12 > + shlq $4,%r13 > + jmp .Locb_enc_loop6 > + > +.align 32 > +.Locb_enc_loop6: > +.byte 102,15,56,220,209 > +.byte 
102,15,56,220,217 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > +.byte 102,15,56,220,241 > +.byte 102,15,56,220,249 > + movups (%rcx,%rax,1),%xmm1 > + addq $32,%rax > + > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > +.byte 102,15,56,220,224 > +.byte 102,15,56,220,232 > +.byte 102,15,56,220,240 > +.byte 102,15,56,220,248 > + movups -16(%rcx,%rax,1),%xmm0 > + jnz .Locb_enc_loop6 > + > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > +.byte 102,15,56,220,241 > +.byte 102,15,56,220,249 > + movups 16(%r11),%xmm1 > + shlq $4,%r14 > + > +.byte 102,65,15,56,221,210 > + movdqu (%rbx),%xmm10 > + movq %r10,%rax > +.byte 102,65,15,56,221,219 > +.byte 102,65,15,56,221,228 > +.byte 102,65,15,56,221,237 > +.byte 102,65,15,56,221,246 > +.byte 102,65,15,56,221,255 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size __ocb_encrypt6,.-__ocb_encrypt6 > + > +.type __ocb_encrypt4,@function > +.align 32 > +__ocb_encrypt4: > +.cfi_startproc > + pxor %xmm9,%xmm15 > + movdqu (%rbx,%r12,1),%xmm11 > + movdqa %xmm10,%xmm12 > + movdqu (%rbx,%r13,1),%xmm13 > + pxor %xmm15,%xmm10 > + pxor %xmm10,%xmm11 > + pxor %xmm2,%xmm8 > + pxor %xmm10,%xmm2 > + pxor %xmm11,%xmm12 > + pxor %xmm3,%xmm8 > + pxor %xmm11,%xmm3 > + pxor %xmm12,%xmm13 > + pxor %xmm4,%xmm8 > + pxor %xmm12,%xmm4 > + pxor %xmm5,%xmm8 > + pxor %xmm13,%xmm5 > + movups 32(%r11),%xmm0 > + > + pxor %xmm9,%xmm10 > + pxor %xmm9,%xmm11 > + pxor %xmm9,%xmm12 > + pxor %xmm9,%xmm13 > + > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > + movups 48(%r11),%xmm1 > + > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > +.byte 102,15,56,220,224 > +.byte 102,15,56,220,232 > + movups 64(%r11),%xmm0 > + jmp .Locb_enc_loop4 > + > +.align 32 > +.Locb_enc_loop4: > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > + movups (%rcx,%rax,1),%xmm1 > + addq $32,%rax > + > +.byte 102,15,56,220,208 > +.byte 102,15,56,220,216 > +.byte 102,15,56,220,224 > +.byte 102,15,56,220,232 > + movups -16(%rcx,%rax,1),%xmm0 > + jnz .Locb_enc_loop4 > + > +.byte 102,15,56,220,209 > +.byte 102,15,56,220,217 > +.byte 102,15,56,220,225 > +.byte 102,15,56,220,233 > + movups 16(%r11),%xmm1 > + movq %r10,%rax > + > +.byte 102,65,15,56,221,210 > +.byte 102,65,15,56,221,219 > +.byte 102,65,15,56,221,228 > +.byte 102,65,15,56,221,237 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size __ocb_encrypt4,.-__ocb_encrypt4 > + > +.type __ocb_encrypt1,@function > +.align 32 > +__ocb_encrypt1: > +.cfi_startproc > + pxor %xmm15,%xmm7 > + pxor %xmm9,%xmm7 > + pxor %xmm2,%xmm8 > + pxor %xmm7,%xmm2 > + movups 32(%r11),%xmm0 > + > +.byte 102,15,56,220,209 > + movups 48(%r11),%xmm1 > + pxor %xmm9,%xmm7 > + > +.byte 102,15,56,220,208 > + movups 64(%r11),%xmm0 > + jmp .Locb_enc_loop1 > + > +.align 32 > +.Locb_enc_loop1: > +.byte 102,15,56,220,209 > + movups (%rcx,%rax,1),%xmm1 > + addq $32,%rax > + > +.byte 102,15,56,220,208 > + movups -16(%rcx,%rax,1),%xmm0 > + jnz .Locb_enc_loop1 > + > +.byte 102,15,56,220,209 > + movups 16(%r11),%xmm1 > + movq %r10,%rax > + > +.byte 102,15,56,221,215 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size __ocb_encrypt1,.-__ocb_encrypt1 > + > +.globl aesni_ocb_decrypt > +.type aesni_ocb_decrypt,@function > +.align 32 > +aesni_ocb_decrypt: > +.cfi_startproc > + leaq (%rsp),%rax > + pushq %rbx > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %rbx,-16 > + pushq %rbp > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %rbp,-24 > + pushq %r12 > 
+.cfi_adjust_cfa_offset 8 > +.cfi_offset %r12,-32 > + pushq %r13 > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %r13,-40 > + pushq %r14 > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %r14,-48 > + movq 8(%rax),%rbx > + movq 8+8(%rax),%rbp > + > + movl 240(%rcx),%r10d > + movq %rcx,%r11 > + shll $4,%r10d > + movups (%rcx),%xmm9 > + movups 16(%rcx,%r10,1),%xmm1 > + > + movdqu (%r9),%xmm15 > + pxor %xmm1,%xmm9 > + pxor %xmm1,%xmm15 > + > + movl $16+32,%eax > + leaq 32(%r11,%r10,1),%rcx > + movups 16(%r11),%xmm1 > + subq %r10,%rax > + movq %rax,%r10 > + > + movdqu (%rbx),%xmm10 > + movdqu (%rbp),%xmm8 > + > + testq $1,%r8 > + jnz .Locb_dec_odd > + > + bsfq %r8,%r12 > + addq $1,%r8 > + shlq $4,%r12 > + movdqu (%rbx,%r12,1),%xmm7 > + movdqu (%rdi),%xmm2 > + leaq 16(%rdi),%rdi > + > + call __ocb_decrypt1 > + > + movdqa %xmm7,%xmm15 > + movups %xmm2,(%rsi) > + xorps %xmm2,%xmm8 > + leaq 16(%rsi),%rsi > + subq $1,%rdx > + jz .Locb_dec_done > + > +.Locb_dec_odd: > + leaq 1(%r8),%r12 > + leaq 3(%r8),%r13 > + leaq 5(%r8),%r14 > + leaq 6(%r8),%r8 > + bsfq %r12,%r12 > + bsfq %r13,%r13 > + bsfq %r14,%r14 > + shlq $4,%r12 > + shlq $4,%r13 > + shlq $4,%r14 > + > + subq $6,%rdx > + jc .Locb_dec_short > + jmp .Locb_dec_grandloop > + > +.align 32 > +.Locb_dec_grandloop: > + movdqu 0(%rdi),%xmm2 > + movdqu 16(%rdi),%xmm3 > + movdqu 32(%rdi),%xmm4 > + movdqu 48(%rdi),%xmm5 > + movdqu 64(%rdi),%xmm6 > + movdqu 80(%rdi),%xmm7 > + leaq 96(%rdi),%rdi > + > + call __ocb_decrypt6 > + > + movups %xmm2,0(%rsi) > + pxor %xmm2,%xmm8 > + movups %xmm3,16(%rsi) > + pxor %xmm3,%xmm8 > + movups %xmm4,32(%rsi) > + pxor %xmm4,%xmm8 > + movups %xmm5,48(%rsi) > + pxor %xmm5,%xmm8 > + movups %xmm6,64(%rsi) > + pxor %xmm6,%xmm8 > + movups %xmm7,80(%rsi) > + pxor %xmm7,%xmm8 > + leaq 96(%rsi),%rsi > + subq $6,%rdx > + jnc .Locb_dec_grandloop > + > +.Locb_dec_short: > + addq $6,%rdx > + jz .Locb_dec_done > + > + movdqu 0(%rdi),%xmm2 > + cmpq $2,%rdx > + jb .Locb_dec_one > + movdqu 16(%rdi),%xmm3 > + je .Locb_dec_two > + > + movdqu 32(%rdi),%xmm4 > + cmpq $4,%rdx > + jb .Locb_dec_three > + movdqu 48(%rdi),%xmm5 > + je .Locb_dec_four > + > + movdqu 64(%rdi),%xmm6 > + pxor %xmm7,%xmm7 > + > + call __ocb_decrypt6 > + > + movdqa %xmm14,%xmm15 > + movups %xmm2,0(%rsi) > + pxor %xmm2,%xmm8 > + movups %xmm3,16(%rsi) > + pxor %xmm3,%xmm8 > + movups %xmm4,32(%rsi) > + pxor %xmm4,%xmm8 > + movups %xmm5,48(%rsi) > + pxor %xmm5,%xmm8 > + movups %xmm6,64(%rsi) > + pxor %xmm6,%xmm8 > + > + jmp .Locb_dec_done > + > +.align 16 > +.Locb_dec_one: > + movdqa %xmm10,%xmm7 > + > + call __ocb_decrypt1 > + > + movdqa %xmm7,%xmm15 > + movups %xmm2,0(%rsi) > + xorps %xmm2,%xmm8 > + jmp .Locb_dec_done > + > +.align 16 > +.Locb_dec_two: > + pxor %xmm4,%xmm4 > + pxor %xmm5,%xmm5 > + > + call __ocb_decrypt4 > + > + movdqa %xmm11,%xmm15 > + movups %xmm2,0(%rsi) > + xorps %xmm2,%xmm8 > + movups %xmm3,16(%rsi) > + xorps %xmm3,%xmm8 > + > + jmp .Locb_dec_done > + > +.align 16 > +.Locb_dec_three: > + pxor %xmm5,%xmm5 > + > + call __ocb_decrypt4 > + > + movdqa %xmm12,%xmm15 > + movups %xmm2,0(%rsi) > + xorps %xmm2,%xmm8 > + movups %xmm3,16(%rsi) > + xorps %xmm3,%xmm8 > + movups %xmm4,32(%rsi) > + xorps %xmm4,%xmm8 > + > + jmp .Locb_dec_done > + > +.align 16 > +.Locb_dec_four: > + call __ocb_decrypt4 > + > + movdqa %xmm13,%xmm15 > + movups %xmm2,0(%rsi) > + pxor %xmm2,%xmm8 > + movups %xmm3,16(%rsi) > + pxor %xmm3,%xmm8 > + movups %xmm4,32(%rsi) > + pxor %xmm4,%xmm8 > + movups %xmm5,48(%rsi) > + pxor %xmm5,%xmm8 > + > +.Locb_dec_done: > + pxor %xmm0,%xmm15 > + movdqu 
%xmm8,(%rbp) > + movdqu %xmm15,(%r9) > + > + xorps %xmm0,%xmm0 > + pxor %xmm1,%xmm1 > + pxor %xmm2,%xmm2 > + pxor %xmm3,%xmm3 > + pxor %xmm4,%xmm4 > + pxor %xmm5,%xmm5 > + pxor %xmm6,%xmm6 > + pxor %xmm7,%xmm7 > + pxor %xmm8,%xmm8 > + pxor %xmm9,%xmm9 > + pxor %xmm10,%xmm10 > + pxor %xmm11,%xmm11 > + pxor %xmm12,%xmm12 > + pxor %xmm13,%xmm13 > + pxor %xmm14,%xmm14 > + pxor %xmm15,%xmm15 > + leaq 40(%rsp),%rax > +.cfi_def_cfa %rax,8 > + movq -40(%rax),%r14 > +.cfi_restore %r14 > + movq -32(%rax),%r13 > +.cfi_restore %r13 > + movq -24(%rax),%r12 > +.cfi_restore %r12 > + movq -16(%rax),%rbp > +.cfi_restore %rbp > + movq -8(%rax),%rbx > +.cfi_restore %rbx > + leaq (%rax),%rsp > +.cfi_def_cfa_register %rsp > +.Locb_dec_epilogue: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size aesni_ocb_decrypt,.-aesni_ocb_decrypt > + > +.type __ocb_decrypt6,@function > +.align 32 > +__ocb_decrypt6: > +.cfi_startproc > + pxor %xmm9,%xmm15 > + movdqu (%rbx,%r12,1),%xmm11 > + movdqa %xmm10,%xmm12 > + movdqu (%rbx,%r13,1),%xmm13 > + movdqa %xmm10,%xmm14 > + pxor %xmm15,%xmm10 > + movdqu (%rbx,%r14,1),%xmm15 > + pxor %xmm10,%xmm11 > + pxor %xmm10,%xmm2 > + pxor %xmm11,%xmm12 > + pxor %xmm11,%xmm3 > + pxor %xmm12,%xmm13 > + pxor %xmm12,%xmm4 > + pxor %xmm13,%xmm14 > + pxor %xmm13,%xmm5 > + pxor %xmm14,%xmm15 > + pxor %xmm14,%xmm6 > + pxor %xmm15,%xmm7 > + movups 32(%r11),%xmm0 > + > + leaq 1(%r8),%r12 > + leaq 3(%r8),%r13 > + leaq 5(%r8),%r14 > + addq $6,%r8 > + pxor %xmm9,%xmm10 > + bsfq %r12,%r12 > + bsfq %r13,%r13 > + bsfq %r14,%r14 > + > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > + pxor %xmm9,%xmm11 > + pxor %xmm9,%xmm12 > +.byte 102,15,56,222,241 > + pxor %xmm9,%xmm13 > + pxor %xmm9,%xmm14 > +.byte 102,15,56,222,249 > + movups 48(%r11),%xmm1 > + pxor %xmm9,%xmm15 > + > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > +.byte 102,15,56,222,240 > +.byte 102,15,56,222,248 > + movups 64(%r11),%xmm0 > + shlq $4,%r12 > + shlq $4,%r13 > + jmp .Locb_dec_loop6 > + > +.align 32 > +.Locb_dec_loop6: > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > +.byte 102,15,56,222,241 > +.byte 102,15,56,222,249 > + movups (%rcx,%rax,1),%xmm1 > + addq $32,%rax > + > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > +.byte 102,15,56,222,240 > +.byte 102,15,56,222,248 > + movups -16(%rcx,%rax,1),%xmm0 > + jnz .Locb_dec_loop6 > + > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > +.byte 102,15,56,222,241 > +.byte 102,15,56,222,249 > + movups 16(%r11),%xmm1 > + shlq $4,%r14 > + > +.byte 102,65,15,56,223,210 > + movdqu (%rbx),%xmm10 > + movq %r10,%rax > +.byte 102,65,15,56,223,219 > +.byte 102,65,15,56,223,228 > +.byte 102,65,15,56,223,237 > +.byte 102,65,15,56,223,246 > +.byte 102,65,15,56,223,255 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size __ocb_decrypt6,.-__ocb_decrypt6 > + > +.type __ocb_decrypt4,@function > +.align 32 > +__ocb_decrypt4: > +.cfi_startproc > + pxor %xmm9,%xmm15 > + movdqu (%rbx,%r12,1),%xmm11 > + movdqa %xmm10,%xmm12 > + movdqu (%rbx,%r13,1),%xmm13 > + pxor %xmm15,%xmm10 > + pxor %xmm10,%xmm11 > + pxor %xmm10,%xmm2 > + pxor %xmm11,%xmm12 > + pxor %xmm11,%xmm3 > + pxor %xmm12,%xmm13 > + pxor %xmm12,%xmm4 > + pxor %xmm13,%xmm5 > + movups 32(%r11),%xmm0 > + > + pxor %xmm9,%xmm10 > + pxor %xmm9,%xmm11 > + pxor %xmm9,%xmm12 > + pxor 
%xmm9,%xmm13 > + > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > + movups 48(%r11),%xmm1 > + > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > + movups 64(%r11),%xmm0 > + jmp .Locb_dec_loop4 > + > +.align 32 > +.Locb_dec_loop4: > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > + movups (%rcx,%rax,1),%xmm1 > + addq $32,%rax > + > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > + movups -16(%rcx,%rax,1),%xmm0 > + jnz .Locb_dec_loop4 > + > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > + movups 16(%r11),%xmm1 > + movq %r10,%rax > + > +.byte 102,65,15,56,223,210 > +.byte 102,65,15,56,223,219 > +.byte 102,65,15,56,223,228 > +.byte 102,65,15,56,223,237 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size __ocb_decrypt4,.-__ocb_decrypt4 > + > +.type __ocb_decrypt1,@function > +.align 32 > +__ocb_decrypt1: > +.cfi_startproc > + pxor %xmm15,%xmm7 > + pxor %xmm9,%xmm7 > + pxor %xmm7,%xmm2 > + movups 32(%r11),%xmm0 > + > +.byte 102,15,56,222,209 > + movups 48(%r11),%xmm1 > + pxor %xmm9,%xmm7 > + > +.byte 102,15,56,222,208 > + movups 64(%r11),%xmm0 > + jmp .Locb_dec_loop1 > + > +.align 32 > +.Locb_dec_loop1: > +.byte 102,15,56,222,209 > + movups (%rcx,%rax,1),%xmm1 > + addq $32,%rax > + > +.byte 102,15,56,222,208 > + movups -16(%rcx,%rax,1),%xmm0 > + jnz .Locb_dec_loop1 > + > +.byte 102,15,56,222,209 > + movups 16(%r11),%xmm1 > + movq %r10,%rax > + > +.byte 102,15,56,223,215 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size __ocb_decrypt1,.-__ocb_decrypt1 > +.globl aesni_cbc_encrypt > +.type aesni_cbc_encrypt,@function > +.align 16 > +aesni_cbc_encrypt: > +.cfi_startproc > + testq %rdx,%rdx > + jz .Lcbc_ret > + > + movl 240(%rcx),%r10d > + movq %rcx,%r11 > + testl %r9d,%r9d > + jz .Lcbc_decrypt > + > + movups (%r8),%xmm2 > + movl %r10d,%eax > + cmpq $16,%rdx > + jb .Lcbc_enc_tail > + subq $16,%rdx > + jmp .Lcbc_enc_loop > +.align 16 > +.Lcbc_enc_loop: > + movups (%rdi),%xmm3 > + leaq 16(%rdi),%rdi > + > + movups (%rcx),%xmm0 > + movups 16(%rcx),%xmm1 > + xorps %xmm0,%xmm3 > + leaq 32(%rcx),%rcx > + xorps %xmm3,%xmm2 > +.Loop_enc1_15: > +.byte 102,15,56,220,209 > + decl %eax > + movups (%rcx),%xmm1 > + leaq 16(%rcx),%rcx > + jnz .Loop_enc1_15 > +.byte 102,15,56,221,209 > + movl %r10d,%eax > + movq %r11,%rcx > + movups %xmm2,0(%rsi) > + leaq 16(%rsi),%rsi > + subq $16,%rdx > + jnc .Lcbc_enc_loop > + addq $16,%rdx > + jnz .Lcbc_enc_tail > + pxor %xmm0,%xmm0 > + pxor %xmm1,%xmm1 > + movups %xmm2,(%r8) > + pxor %xmm2,%xmm2 > + pxor %xmm3,%xmm3 > + jmp .Lcbc_ret > + > +.Lcbc_enc_tail: > + movq %rdx,%rcx > + xchgq %rdi,%rsi > +.long 0x9066A4F3 > + movl $16,%ecx > + subq %rdx,%rcx > + xorl %eax,%eax > +.long 0x9066AAF3 > + leaq -16(%rdi),%rdi > + movl %r10d,%eax > + movq %rdi,%rsi > + movq %r11,%rcx > + xorq %rdx,%rdx > + jmp .Lcbc_enc_loop > + > +.align 16 > +.Lcbc_decrypt: > + cmpq $16,%rdx > + jne .Lcbc_decrypt_bulk > + > + > + > + movdqu (%rdi),%xmm2 > + movdqu (%r8),%xmm3 > + movdqa %xmm2,%xmm4 > + movups (%rcx),%xmm0 > + movups 16(%rcx),%xmm1 > + leaq 32(%rcx),%rcx > + xorps %xmm0,%xmm2 > +.Loop_dec1_16: > +.byte 102,15,56,222,209 > + decl %r10d > + movups (%rcx),%xmm1 > + leaq 16(%rcx),%rcx > + jnz .Loop_dec1_16 > +.byte 102,15,56,223,209 > + pxor %xmm0,%xmm0 > + pxor %xmm1,%xmm1 > + movdqu %xmm4,(%r8) > + xorps 
%xmm3,%xmm2 > + pxor %xmm3,%xmm3 > + movups %xmm2,(%rsi) > + pxor %xmm2,%xmm2 > + jmp .Lcbc_ret > +.align 16 > +.Lcbc_decrypt_bulk: > + leaq (%rsp),%r11 > +.cfi_def_cfa_register %r11 > + pushq %rbp > +.cfi_offset %rbp,-16 > + subq $16,%rsp > + andq $-16,%rsp > + movq %rcx,%rbp > + movups (%r8),%xmm10 > + movl %r10d,%eax > + cmpq $0x50,%rdx > + jbe .Lcbc_dec_tail > + > + movups (%rcx),%xmm0 > + movdqu 0(%rdi),%xmm2 > + movdqu 16(%rdi),%xmm3 > + movdqa %xmm2,%xmm11 > + movdqu 32(%rdi),%xmm4 > + movdqa %xmm3,%xmm12 > + movdqu 48(%rdi),%xmm5 > + movdqa %xmm4,%xmm13 > + movdqu 64(%rdi),%xmm6 > + movdqa %xmm5,%xmm14 > + movdqu 80(%rdi),%xmm7 > + movdqa %xmm6,%xmm15 > + movl OPENSSL_ia32cap_P+4(%rip),%r9d > + cmpq $0x70,%rdx > + jbe .Lcbc_dec_six_or_seven > + > + andl $71303168,%r9d > + subq $0x50,%rdx > + cmpl $4194304,%r9d > + je .Lcbc_dec_loop6_enter > + subq $0x20,%rdx > + leaq 112(%rcx),%rcx > + jmp .Lcbc_dec_loop8_enter > +.align 16 > +.Lcbc_dec_loop8: > + movups %xmm9,(%rsi) > + leaq 16(%rsi),%rsi > +.Lcbc_dec_loop8_enter: > + movdqu 96(%rdi),%xmm8 > + pxor %xmm0,%xmm2 > + movdqu 112(%rdi),%xmm9 > + pxor %xmm0,%xmm3 > + movups 16-112(%rcx),%xmm1 > + pxor %xmm0,%xmm4 > + movq $-1,%rbp > + cmpq $0x70,%rdx > + pxor %xmm0,%xmm5 > + pxor %xmm0,%xmm6 > + pxor %xmm0,%xmm7 > + pxor %xmm0,%xmm8 > + > +.byte 102,15,56,222,209 > + pxor %xmm0,%xmm9 > + movups 32-112(%rcx),%xmm0 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > +.byte 102,15,56,222,241 > +.byte 102,15,56,222,249 > +.byte 102,68,15,56,222,193 > + adcq $0,%rbp > + andq $128,%rbp > +.byte 102,68,15,56,222,201 > + addq %rdi,%rbp > + movups 48-112(%rcx),%xmm1 > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > +.byte 102,15,56,222,240 > +.byte 102,15,56,222,248 > +.byte 102,68,15,56,222,192 > +.byte 102,68,15,56,222,200 > + movups 64-112(%rcx),%xmm0 > + nop > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > +.byte 102,15,56,222,241 > +.byte 102,15,56,222,249 > +.byte 102,68,15,56,222,193 > +.byte 102,68,15,56,222,201 > + movups 80-112(%rcx),%xmm1 > + nop > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > +.byte 102,15,56,222,240 > +.byte 102,15,56,222,248 > +.byte 102,68,15,56,222,192 > +.byte 102,68,15,56,222,200 > + movups 96-112(%rcx),%xmm0 > + nop > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > +.byte 102,15,56,222,241 > +.byte 102,15,56,222,249 > +.byte 102,68,15,56,222,193 > +.byte 102,68,15,56,222,201 > + movups 112-112(%rcx),%xmm1 > + nop > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > +.byte 102,15,56,222,240 > +.byte 102,15,56,222,248 > +.byte 102,68,15,56,222,192 > +.byte 102,68,15,56,222,200 > + movups 128-112(%rcx),%xmm0 > + nop > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > +.byte 102,15,56,222,241 > +.byte 102,15,56,222,249 > +.byte 102,68,15,56,222,193 > +.byte 102,68,15,56,222,201 > + movups 144-112(%rcx),%xmm1 > + cmpl $11,%eax > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > +.byte 102,15,56,222,240 > +.byte 102,15,56,222,248 > +.byte 102,68,15,56,222,192 > +.byte 102,68,15,56,222,200 > + movups 160-112(%rcx),%xmm0 > + jb .Lcbc_dec_done > +.byte 102,15,56,222,209 > +.byte 
102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > +.byte 102,15,56,222,241 > +.byte 102,15,56,222,249 > +.byte 102,68,15,56,222,193 > +.byte 102,68,15,56,222,201 > + movups 176-112(%rcx),%xmm1 > + nop > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > +.byte 102,15,56,222,240 > +.byte 102,15,56,222,248 > +.byte 102,68,15,56,222,192 > +.byte 102,68,15,56,222,200 > + movups 192-112(%rcx),%xmm0 > + je .Lcbc_dec_done > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > +.byte 102,15,56,222,241 > +.byte 102,15,56,222,249 > +.byte 102,68,15,56,222,193 > +.byte 102,68,15,56,222,201 > + movups 208-112(%rcx),%xmm1 > + nop > +.byte 102,15,56,222,208 > +.byte 102,15,56,222,216 > +.byte 102,15,56,222,224 > +.byte 102,15,56,222,232 > +.byte 102,15,56,222,240 > +.byte 102,15,56,222,248 > +.byte 102,68,15,56,222,192 > +.byte 102,68,15,56,222,200 > + movups 224-112(%rcx),%xmm0 > + jmp .Lcbc_dec_done > +.align 16 > +.Lcbc_dec_done: > +.byte 102,15,56,222,209 > +.byte 102,15,56,222,217 > + pxor %xmm0,%xmm10 > + pxor %xmm0,%xmm11 > +.byte 102,15,56,222,225 > +.byte 102,15,56,222,233 > + pxor %xmm0,%xmm12 > + pxor %xmm0,%xmm13 > +.byte 102,15,56,222,241 > +.byte 102,15,56,222,249 > + pxor %xmm0,%xmm14 > + pxor %xmm0,%xmm15 > +.byte 102,68,15,56,222,193 > +.byte 102,68,15,56,222,201 > + movdqu 80(%rdi),%xmm1 > + > +.byte 102,65,15,56,223,210 > + movdqu 96(%rdi),%xmm10 > + pxor %xmm0,%xmm1 > +.byte 102,65,15,56,223,219 > + pxor %xmm0,%xmm10 > + movdqu 112(%rdi),%xmm0 > +.byte 102,65,15,56,223,228 > + leaq 128(%rdi),%rdi > + movdqu 0(%rbp),%xmm11 > +.byte 102,65,15,56,223,237 > +.byte 102,65,15,56,223,246 > + movdqu 16(%rbp),%xmm12 > + movdqu 32(%rbp),%xmm13 > +.byte 102,65,15,56,223,255 > +.byte 102,68,15,56,223,193 > + movdqu 48(%rbp),%xmm14 > + movdqu 64(%rbp),%xmm15 > +.byte 102,69,15,56,223,202 > + movdqa %xmm0,%xmm10 > + movdqu 80(%rbp),%xmm1 > + movups -112(%rcx),%xmm0 > + > + movups %xmm2,(%rsi) > + movdqa %xmm11,%xmm2 > + movups %xmm3,16(%rsi) > + movdqa %xmm12,%xmm3 > + movups %xmm4,32(%rsi) > + movdqa %xmm13,%xmm4 > + movups %xmm5,48(%rsi) > + movdqa %xmm14,%xmm5 > + movups %xmm6,64(%rsi) > + movdqa %xmm15,%xmm6 > + movups %xmm7,80(%rsi) > + movdqa %xmm1,%xmm7 > + movups %xmm8,96(%rsi) > + leaq 112(%rsi),%rsi > + > + subq $0x80,%rdx > + ja .Lcbc_dec_loop8 > + > + movaps %xmm9,%xmm2 > + leaq -112(%rcx),%rcx > + addq $0x70,%rdx > + jle .Lcbc_dec_clear_tail_collected > + movups %xmm9,(%rsi) > + leaq 16(%rsi),%rsi > + cmpq $0x50,%rdx > + jbe .Lcbc_dec_tail > + > + movaps %xmm11,%xmm2 > +.Lcbc_dec_six_or_seven: > + cmpq $0x60,%rdx > + ja .Lcbc_dec_seven > + > + movaps %xmm7,%xmm8 > + call _aesni_decrypt6 > + pxor %xmm10,%xmm2 > + movaps %xmm8,%xmm10 > + pxor %xmm11,%xmm3 > + movdqu %xmm2,(%rsi) > + pxor %xmm12,%xmm4 > + movdqu %xmm3,16(%rsi) > + pxor %xmm3,%xmm3 > + pxor %xmm13,%xmm5 > + movdqu %xmm4,32(%rsi) > + pxor %xmm4,%xmm4 > + pxor %xmm14,%xmm6 > + movdqu %xmm5,48(%rsi) > + pxor %xmm5,%xmm5 > + pxor %xmm15,%xmm7 > + movdqu %xmm6,64(%rsi) > + pxor %xmm6,%xmm6 > + leaq 80(%rsi),%rsi > + movdqa %xmm7,%xmm2 > + pxor %xmm7,%xmm7 > + jmp .Lcbc_dec_tail_collected > + > +.align 16 > +.Lcbc_dec_seven: > + movups 96(%rdi),%xmm8 > + xorps %xmm9,%xmm9 > + call _aesni_decrypt8 > + movups 80(%rdi),%xmm9 > + pxor %xmm10,%xmm2 > + movups 96(%rdi),%xmm10 > + pxor %xmm11,%xmm3 > + movdqu %xmm2,(%rsi) > + pxor %xmm12,%xmm4 > + movdqu %xmm3,16(%rsi) > + pxor %xmm3,%xmm3 > + pxor 
%xmm13,%xmm5 > + movdqu %xmm4,32(%rsi) > + pxor %xmm4,%xmm4 > + pxor %xmm14,%xmm6 > + movdqu %xmm5,48(%rsi) > + pxor %xmm5,%xmm5 > + pxor %xmm15,%xmm7 > + movdqu %xmm6,64(%rsi) > + pxor %xmm6,%xmm6 > + pxor %xmm9,%xmm8 > + movdqu %xmm7,80(%rsi) > + pxor %xmm7,%xmm7 > + leaq 96(%rsi),%rsi > + movdqa %xmm8,%xmm2 > + pxor %xmm8,%xmm8 > + pxor %xmm9,%xmm9 > + jmp .Lcbc_dec_tail_collected > + > +.align 16 > +.Lcbc_dec_loop6: > + movups %xmm7,(%rsi) > + leaq 16(%rsi),%rsi > + movdqu 0(%rdi),%xmm2 > + movdqu 16(%rdi),%xmm3 > + movdqa %xmm2,%xmm11 > + movdqu 32(%rdi),%xmm4 > + movdqa %xmm3,%xmm12 > + movdqu 48(%rdi),%xmm5 > + movdqa %xmm4,%xmm13 > + movdqu 64(%rdi),%xmm6 > + movdqa %xmm5,%xmm14 > + movdqu 80(%rdi),%xmm7 > + movdqa %xmm6,%xmm15 > +.Lcbc_dec_loop6_enter: > + leaq 96(%rdi),%rdi > + movdqa %xmm7,%xmm8 > + > + call _aesni_decrypt6 > + > + pxor %xmm10,%xmm2 > + movdqa %xmm8,%xmm10 > + pxor %xmm11,%xmm3 > + movdqu %xmm2,(%rsi) > + pxor %xmm12,%xmm4 > + movdqu %xmm3,16(%rsi) > + pxor %xmm13,%xmm5 > + movdqu %xmm4,32(%rsi) > + pxor %xmm14,%xmm6 > + movq %rbp,%rcx > + movdqu %xmm5,48(%rsi) > + pxor %xmm15,%xmm7 > + movl %r10d,%eax > + movdqu %xmm6,64(%rsi) > + leaq 80(%rsi),%rsi > + subq $0x60,%rdx > + ja .Lcbc_dec_loop6 > + > + movdqa %xmm7,%xmm2 > + addq $0x50,%rdx > + jle .Lcbc_dec_clear_tail_collected > + movups %xmm7,(%rsi) > + leaq 16(%rsi),%rsi > + > +.Lcbc_dec_tail: > + movups (%rdi),%xmm2 > + subq $0x10,%rdx > + jbe .Lcbc_dec_one > + > + movups 16(%rdi),%xmm3 > + movaps %xmm2,%xmm11 > + subq $0x10,%rdx > + jbe .Lcbc_dec_two > + > + movups 32(%rdi),%xmm4 > + movaps %xmm3,%xmm12 > + subq $0x10,%rdx > + jbe .Lcbc_dec_three > + > + movups 48(%rdi),%xmm5 > + movaps %xmm4,%xmm13 > + subq $0x10,%rdx > + jbe .Lcbc_dec_four > + > + movups 64(%rdi),%xmm6 > + movaps %xmm5,%xmm14 > + movaps %xmm6,%xmm15 > + xorps %xmm7,%xmm7 > + call _aesni_decrypt6 > + pxor %xmm10,%xmm2 > + movaps %xmm15,%xmm10 > + pxor %xmm11,%xmm3 > + movdqu %xmm2,(%rsi) > + pxor %xmm12,%xmm4 > + movdqu %xmm3,16(%rsi) > + pxor %xmm3,%xmm3 > + pxor %xmm13,%xmm5 > + movdqu %xmm4,32(%rsi) > + pxor %xmm4,%xmm4 > + pxor %xmm14,%xmm6 > + movdqu %xmm5,48(%rsi) > + pxor %xmm5,%xmm5 > + leaq 64(%rsi),%rsi > + movdqa %xmm6,%xmm2 > + pxor %xmm6,%xmm6 > + pxor %xmm7,%xmm7 > + subq $0x10,%rdx > + jmp .Lcbc_dec_tail_collected > + > +.align 16 > +.Lcbc_dec_one: > + movaps %xmm2,%xmm11 > + movups (%rcx),%xmm0 > + movups 16(%rcx),%xmm1 > + leaq 32(%rcx),%rcx > + xorps %xmm0,%xmm2 > +.Loop_dec1_17: > +.byte 102,15,56,222,209 > + decl %eax > + movups (%rcx),%xmm1 > + leaq 16(%rcx),%rcx > + jnz .Loop_dec1_17 > +.byte 102,15,56,223,209 > + xorps %xmm10,%xmm2 > + movaps %xmm11,%xmm10 > + jmp .Lcbc_dec_tail_collected > +.align 16 > +.Lcbc_dec_two: > + movaps %xmm3,%xmm12 > + call _aesni_decrypt2 > + pxor %xmm10,%xmm2 > + movaps %xmm12,%xmm10 > + pxor %xmm11,%xmm3 > + movdqu %xmm2,(%rsi) > + movdqa %xmm3,%xmm2 > + pxor %xmm3,%xmm3 > + leaq 16(%rsi),%rsi > + jmp .Lcbc_dec_tail_collected > +.align 16 > +.Lcbc_dec_three: > + movaps %xmm4,%xmm13 > + call _aesni_decrypt3 > + pxor %xmm10,%xmm2 > + movaps %xmm13,%xmm10 > + pxor %xmm11,%xmm3 > + movdqu %xmm2,(%rsi) > + pxor %xmm12,%xmm4 > + movdqu %xmm3,16(%rsi) > + pxor %xmm3,%xmm3 > + movdqa %xmm4,%xmm2 > + pxor %xmm4,%xmm4 > + leaq 32(%rsi),%rsi > + jmp .Lcbc_dec_tail_collected > +.align 16 > +.Lcbc_dec_four: > + movaps %xmm5,%xmm14 > + call _aesni_decrypt4 > + pxor %xmm10,%xmm2 > + movaps %xmm14,%xmm10 > + pxor %xmm11,%xmm3 > + movdqu %xmm2,(%rsi) > + pxor %xmm12,%xmm4 > + movdqu %xmm3,16(%rsi) > + 
pxor %xmm3,%xmm3 > + pxor %xmm13,%xmm5 > + movdqu %xmm4,32(%rsi) > + pxor %xmm4,%xmm4 > + movdqa %xmm5,%xmm2 > + pxor %xmm5,%xmm5 > + leaq 48(%rsi),%rsi > + jmp .Lcbc_dec_tail_collected > + > +.align 16 > +.Lcbc_dec_clear_tail_collected: > + pxor %xmm3,%xmm3 > + pxor %xmm4,%xmm4 > + pxor %xmm5,%xmm5 > + pxor %xmm6,%xmm6 > + pxor %xmm7,%xmm7 > + pxor %xmm8,%xmm8 > + pxor %xmm9,%xmm9 > +.Lcbc_dec_tail_collected: > + movups %xmm10,(%r8) > + andq $15,%rdx > + jnz .Lcbc_dec_tail_partial > + movups %xmm2,(%rsi) > + pxor %xmm2,%xmm2 > + jmp .Lcbc_dec_ret > +.align 16 > +.Lcbc_dec_tail_partial: > + movaps %xmm2,(%rsp) > + pxor %xmm2,%xmm2 > + movq $16,%rcx > + movq %rsi,%rdi > + subq %rdx,%rcx > + leaq (%rsp),%rsi > +.long 0x9066A4F3 > + movdqa %xmm2,(%rsp) > + > +.Lcbc_dec_ret: > + xorps %xmm0,%xmm0 > + pxor %xmm1,%xmm1 > + movq -8(%r11),%rbp > +.cfi_restore %rbp > + leaq (%r11),%rsp > +.cfi_def_cfa_register %rsp > +.Lcbc_ret: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size aesni_cbc_encrypt,.-aesni_cbc_encrypt > +.globl aesni_set_decrypt_key > +.type aesni_set_decrypt_key,@function > +.align 16 > +aesni_set_decrypt_key: > +.cfi_startproc > +.byte 0x48,0x83,0xEC,0x08 > +.cfi_adjust_cfa_offset 8 > + call __aesni_set_encrypt_key > + shll $4,%esi > + testl %eax,%eax > + jnz .Ldec_key_ret > + leaq 16(%rdx,%rsi,1),%rdi > + > + movups (%rdx),%xmm0 > + movups (%rdi),%xmm1 > + movups %xmm0,(%rdi) > + movups %xmm1,(%rdx) > + leaq 16(%rdx),%rdx > + leaq -16(%rdi),%rdi > + > +.Ldec_key_inverse: > + movups (%rdx),%xmm0 > + movups (%rdi),%xmm1 > +.byte 102,15,56,219,192 > +.byte 102,15,56,219,201 > + leaq 16(%rdx),%rdx > + leaq -16(%rdi),%rdi > + movups %xmm0,16(%rdi) > + movups %xmm1,-16(%rdx) > + cmpq %rdx,%rdi > + ja .Ldec_key_inverse > + > + movups (%rdx),%xmm0 > +.byte 102,15,56,219,192 > + pxor %xmm1,%xmm1 > + movups %xmm0,(%rdi) > + pxor %xmm0,%xmm0 > +.Ldec_key_ret: > + addq $8,%rsp > +.cfi_adjust_cfa_offset -8 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.LSEH_end_set_decrypt_key: > +.size aesni_set_decrypt_key,.-aesni_set_decrypt_key > +.globl aesni_set_encrypt_key > +.type aesni_set_encrypt_key,@function > +.align 16 > +aesni_set_encrypt_key: > +__aesni_set_encrypt_key: > +.cfi_startproc > +.byte 0x48,0x83,0xEC,0x08 > +.cfi_adjust_cfa_offset 8 > + movq $-1,%rax > + testq %rdi,%rdi > + jz .Lenc_key_ret > + testq %rdx,%rdx > + jz .Lenc_key_ret > + > + movl $268437504,%r10d > + movups (%rdi),%xmm0 > + xorps %xmm4,%xmm4 > + andl OPENSSL_ia32cap_P+4(%rip),%r10d > + leaq 16(%rdx),%rax > + cmpl $256,%esi > + je .L14rounds > + cmpl $192,%esi > + je .L12rounds > + cmpl $128,%esi > + jne .Lbad_keybits > + > +.L10rounds: > + movl $9,%esi > + cmpl $268435456,%r10d > + je .L10rounds_alt > + > + movups %xmm0,(%rdx) > +.byte 102,15,58,223,200,1 > + call .Lkey_expansion_128_cold > +.byte 102,15,58,223,200,2 > + call .Lkey_expansion_128 > +.byte 102,15,58,223,200,4 > + call .Lkey_expansion_128 > +.byte 102,15,58,223,200,8 > + call .Lkey_expansion_128 > +.byte 102,15,58,223,200,16 > + call .Lkey_expansion_128 > +.byte 102,15,58,223,200,32 > + call .Lkey_expansion_128 > +.byte 102,15,58,223,200,64 > + call .Lkey_expansion_128 > +.byte 102,15,58,223,200,128 > + call .Lkey_expansion_128 > +.byte 102,15,58,223,200,27 > + call .Lkey_expansion_128 > +.byte 102,15,58,223,200,54 > + call .Lkey_expansion_128 > + movups %xmm0,(%rax) > + movl %esi,80(%rax) > + xorl %eax,%eax > + jmp .Lenc_key_ret > + > +.align 16 > +.L10rounds_alt: > + movdqa .Lkey_rotate(%rip),%xmm5 > + movl $8,%r10d > + movdqa .Lkey_rcon1(%rip),%xmm4 > + movdqa 
%xmm0,%xmm2 > + movdqu %xmm0,(%rdx) > + jmp .Loop_key128 > + > +.align 16 > +.Loop_key128: > +.byte 102,15,56,0,197 > +.byte 102,15,56,221,196 > + pslld $1,%xmm4 > + leaq 16(%rax),%rax > + > + movdqa %xmm2,%xmm3 > + pslldq $4,%xmm2 > + pxor %xmm2,%xmm3 > + pslldq $4,%xmm2 > + pxor %xmm2,%xmm3 > + pslldq $4,%xmm2 > + pxor %xmm3,%xmm2 > + > + pxor %xmm2,%xmm0 > + movdqu %xmm0,-16(%rax) > + movdqa %xmm0,%xmm2 > + > + decl %r10d > + jnz .Loop_key128 > + > + movdqa .Lkey_rcon1b(%rip),%xmm4 > + > +.byte 102,15,56,0,197 > +.byte 102,15,56,221,196 > + pslld $1,%xmm4 > + > + movdqa %xmm2,%xmm3 > + pslldq $4,%xmm2 > + pxor %xmm2,%xmm3 > + pslldq $4,%xmm2 > + pxor %xmm2,%xmm3 > + pslldq $4,%xmm2 > + pxor %xmm3,%xmm2 > + > + pxor %xmm2,%xmm0 > + movdqu %xmm0,(%rax) > + > + movdqa %xmm0,%xmm2 > +.byte 102,15,56,0,197 > +.byte 102,15,56,221,196 > + > + movdqa %xmm2,%xmm3 > + pslldq $4,%xmm2 > + pxor %xmm2,%xmm3 > + pslldq $4,%xmm2 > + pxor %xmm2,%xmm3 > + pslldq $4,%xmm2 > + pxor %xmm3,%xmm2 > + > + pxor %xmm2,%xmm0 > + movdqu %xmm0,16(%rax) > + > + movl %esi,96(%rax) > + xorl %eax,%eax > + jmp .Lenc_key_ret > + > +.align 16 > +.L12rounds: > + movq 16(%rdi),%xmm2 > + movl $11,%esi > + cmpl $268435456,%r10d > + je .L12rounds_alt > + > + movups %xmm0,(%rdx) > +.byte 102,15,58,223,202,1 > + call .Lkey_expansion_192a_cold > +.byte 102,15,58,223,202,2 > + call .Lkey_expansion_192b > +.byte 102,15,58,223,202,4 > + call .Lkey_expansion_192a > +.byte 102,15,58,223,202,8 > + call .Lkey_expansion_192b > +.byte 102,15,58,223,202,16 > + call .Lkey_expansion_192a > +.byte 102,15,58,223,202,32 > + call .Lkey_expansion_192b > +.byte 102,15,58,223,202,64 > + call .Lkey_expansion_192a > +.byte 102,15,58,223,202,128 > + call .Lkey_expansion_192b > + movups %xmm0,(%rax) > + movl %esi,48(%rax) > + xorq %rax,%rax > + jmp .Lenc_key_ret > + > +.align 16 > +.L12rounds_alt: > + movdqa .Lkey_rotate192(%rip),%xmm5 > + movdqa .Lkey_rcon1(%rip),%xmm4 > + movl $8,%r10d > + movdqu %xmm0,(%rdx) > + jmp .Loop_key192 > + > +.align 16 > +.Loop_key192: > + movq %xmm2,0(%rax) > + movdqa %xmm2,%xmm1 > +.byte 102,15,56,0,213 > +.byte 102,15,56,221,212 > + pslld $1,%xmm4 > + leaq 24(%rax),%rax > + > + movdqa %xmm0,%xmm3 > + pslldq $4,%xmm0 > + pxor %xmm0,%xmm3 > + pslldq $4,%xmm0 > + pxor %xmm0,%xmm3 > + pslldq $4,%xmm0 > + pxor %xmm3,%xmm0 > + > + pshufd $0xff,%xmm0,%xmm3 > + pxor %xmm1,%xmm3 > + pslldq $4,%xmm1 > + pxor %xmm1,%xmm3 > + > + pxor %xmm2,%xmm0 > + pxor %xmm3,%xmm2 > + movdqu %xmm0,-16(%rax) > + > + decl %r10d > + jnz .Loop_key192 > + > + movl %esi,32(%rax) > + xorl %eax,%eax > + jmp .Lenc_key_ret > + > +.align 16 > +.L14rounds: > + movups 16(%rdi),%xmm2 > + movl $13,%esi > + leaq 16(%rax),%rax > + cmpl $268435456,%r10d > + je .L14rounds_alt > + > + movups %xmm0,(%rdx) > + movups %xmm2,16(%rdx) > +.byte 102,15,58,223,202,1 > + call .Lkey_expansion_256a_cold > +.byte 102,15,58,223,200,1 > + call .Lkey_expansion_256b > +.byte 102,15,58,223,202,2 > + call .Lkey_expansion_256a > +.byte 102,15,58,223,200,2 > + call .Lkey_expansion_256b > +.byte 102,15,58,223,202,4 > + call .Lkey_expansion_256a > +.byte 102,15,58,223,200,4 > + call .Lkey_expansion_256b > +.byte 102,15,58,223,202,8 > + call .Lkey_expansion_256a > +.byte 102,15,58,223,200,8 > + call .Lkey_expansion_256b > +.byte 102,15,58,223,202,16 > + call .Lkey_expansion_256a > +.byte 102,15,58,223,200,16 > + call .Lkey_expansion_256b > +.byte 102,15,58,223,202,32 > + call .Lkey_expansion_256a > +.byte 102,15,58,223,200,32 > + call .Lkey_expansion_256b > +.byte 102,15,58,223,202,64 > 
+ call .Lkey_expansion_256a > + movups %xmm0,(%rax) > + movl %esi,16(%rax) > + xorq %rax,%rax > + jmp .Lenc_key_ret > + > +.align 16 > +.L14rounds_alt: > + movdqa .Lkey_rotate(%rip),%xmm5 > + movdqa .Lkey_rcon1(%rip),%xmm4 > + movl $7,%r10d > + movdqu %xmm0,0(%rdx) > + movdqa %xmm2,%xmm1 > + movdqu %xmm2,16(%rdx) > + jmp .Loop_key256 > + > +.align 16 > +.Loop_key256: > +.byte 102,15,56,0,213 > +.byte 102,15,56,221,212 > + > + movdqa %xmm0,%xmm3 > + pslldq $4,%xmm0 > + pxor %xmm0,%xmm3 > + pslldq $4,%xmm0 > + pxor %xmm0,%xmm3 > + pslldq $4,%xmm0 > + pxor %xmm3,%xmm0 > + pslld $1,%xmm4 > + > + pxor %xmm2,%xmm0 > + movdqu %xmm0,(%rax) > + > + decl %r10d > + jz .Ldone_key256 > + > + pshufd $0xff,%xmm0,%xmm2 > + pxor %xmm3,%xmm3 > +.byte 102,15,56,221,211 > + > + movdqa %xmm1,%xmm3 > + pslldq $4,%xmm1 > + pxor %xmm1,%xmm3 > + pslldq $4,%xmm1 > + pxor %xmm1,%xmm3 > + pslldq $4,%xmm1 > + pxor %xmm3,%xmm1 > + > + pxor %xmm1,%xmm2 > + movdqu %xmm2,16(%rax) > + leaq 32(%rax),%rax > + movdqa %xmm2,%xmm1 > + > + jmp .Loop_key256 > + > +.Ldone_key256: > + movl %esi,16(%rax) > + xorl %eax,%eax > + jmp .Lenc_key_ret > + > +.align 16 > +.Lbad_keybits: > + movq $-2,%rax > +.Lenc_key_ret: > + pxor %xmm0,%xmm0 > + pxor %xmm1,%xmm1 > + pxor %xmm2,%xmm2 > + pxor %xmm3,%xmm3 > + pxor %xmm4,%xmm4 > + pxor %xmm5,%xmm5 > + addq $8,%rsp > +.cfi_adjust_cfa_offset -8 > + .byte 0xf3,0xc3 > +.LSEH_end_set_encrypt_key: > + > +.align 16 > +.Lkey_expansion_128: > + movups %xmm0,(%rax) > + leaq 16(%rax),%rax > +.Lkey_expansion_128_cold: > + shufps $16,%xmm0,%xmm4 > + xorps %xmm4,%xmm0 > + shufps $140,%xmm0,%xmm4 > + xorps %xmm4,%xmm0 > + shufps $255,%xmm1,%xmm1 > + xorps %xmm1,%xmm0 > + .byte 0xf3,0xc3 > + > +.align 16 > +.Lkey_expansion_192a: > + movups %xmm0,(%rax) > + leaq 16(%rax),%rax > +.Lkey_expansion_192a_cold: > + movaps %xmm2,%xmm5 > +.Lkey_expansion_192b_warm: > + shufps $16,%xmm0,%xmm4 > + movdqa %xmm2,%xmm3 > + xorps %xmm4,%xmm0 > + shufps $140,%xmm0,%xmm4 > + pslldq $4,%xmm3 > + xorps %xmm4,%xmm0 > + pshufd $85,%xmm1,%xmm1 > + pxor %xmm3,%xmm2 > + pxor %xmm1,%xmm0 > + pshufd $255,%xmm0,%xmm3 > + pxor %xmm3,%xmm2 > + .byte 0xf3,0xc3 > + > +.align 16 > +.Lkey_expansion_192b: > + movaps %xmm0,%xmm3 > + shufps $68,%xmm0,%xmm5 > + movups %xmm5,(%rax) > + shufps $78,%xmm2,%xmm3 > + movups %xmm3,16(%rax) > + leaq 32(%rax),%rax > + jmp .Lkey_expansion_192b_warm > + > +.align 16 > +.Lkey_expansion_256a: > + movups %xmm2,(%rax) > + leaq 16(%rax),%rax > +.Lkey_expansion_256a_cold: > + shufps $16,%xmm0,%xmm4 > + xorps %xmm4,%xmm0 > + shufps $140,%xmm0,%xmm4 > + xorps %xmm4,%xmm0 > + shufps $255,%xmm1,%xmm1 > + xorps %xmm1,%xmm0 > + .byte 0xf3,0xc3 > + > +.align 16 > +.Lkey_expansion_256b: > + movups %xmm0,(%rax) > + leaq 16(%rax),%rax > + > + shufps $16,%xmm2,%xmm4 > + xorps %xmm4,%xmm2 > + shufps $140,%xmm2,%xmm4 > + xorps %xmm4,%xmm2 > + shufps $170,%xmm1,%xmm1 > + xorps %xmm1,%xmm2 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size aesni_set_encrypt_key,.-aesni_set_encrypt_key > +.size __aesni_set_encrypt_key,.-__aesni_set_encrypt_key > +.align 64 > +.Lbswap_mask: > +.byte 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0 > +.Lincrement32: > +.long 6,6,6,0 > +.Lincrement64: > +.long 1,0,0,0 > +.Lxts_magic: > +.long 0x87,0,1,0 > +.Lincrement1: > +.byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1 > +.Lkey_rotate: > +.long 0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d > +.Lkey_rotate192: > +.long 0x04070605,0x04070605,0x04070605,0x04070605 > +.Lkey_rcon1: > +.long 1,1,1,1 > +.Lkey_rcon1b: > +.long 0x1b,0x1b,0x1b,0x1b > + > +.byte > 
65,69,83,32,102,111,114,32,73,110,116,101,108,32,65,69,83,45,78,73,44,32,67 > ,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,1 > 01,110,115,115,108,46,111,114,103,62,0 > +.align 64 > diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/vpaes-x86_64.S > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/vpaes-x86_64.S > new file mode 100644 > index 0000000000..982818f83b > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/vpaes-x86_64.S > @@ -0,0 +1,863 @@ > +# WARNING: do not edit! > +# Generated from openssl/crypto/aes/asm/vpaes-x86_64.pl > +# > +# Copyright 2011-2020 The OpenSSL Project Authors. All Rights Reserved. > +# > +# Licensed under the OpenSSL license (the "License"). You may not use > +# this file except in compliance with the License. You can obtain a copy > +# in the file LICENSE in the source distribution or at > +# https://www.openssl.org/source/license.html > + > +.text > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > +.type _vpaes_encrypt_core,@function > +.align 16 > +_vpaes_encrypt_core: > +.cfi_startproc > + movq %rdx,%r9 > + movq $16,%r11 > + movl 240(%rdx),%eax > + movdqa %xmm9,%xmm1 > + movdqa .Lk_ipt(%rip),%xmm2 > + pandn %xmm0,%xmm1 > + movdqu (%r9),%xmm5 > + psrld $4,%xmm1 > + pand %xmm9,%xmm0 > +.byte 102,15,56,0,208 > + movdqa .Lk_ipt+16(%rip),%xmm0 > +.byte 102,15,56,0,193 > + pxor %xmm5,%xmm2 > + addq $16,%r9 > + pxor %xmm2,%xmm0 > + leaq .Lk_mc_backward(%rip),%r10 > + jmp .Lenc_entry > + > +.align 16 > +.Lenc_loop: > + > + movdqa %xmm13,%xmm4 > + movdqa %xmm12,%xmm0 > +.byte 102,15,56,0,226 > +.byte 102,15,56,0,195 > + pxor %xmm5,%xmm4 > + movdqa %xmm15,%xmm5 > + pxor %xmm4,%xmm0 > + movdqa -64(%r11,%r10,1),%xmm1 > +.byte 102,15,56,0,234 > + movdqa (%r11,%r10,1),%xmm4 > + movdqa %xmm14,%xmm2 > +.byte 102,15,56,0,211 > + movdqa %xmm0,%xmm3 > + pxor %xmm5,%xmm2 > +.byte 102,15,56,0,193 > + addq $16,%r9 > + pxor %xmm2,%xmm0 > +.byte 102,15,56,0,220 > + addq $16,%r11 > + pxor %xmm0,%xmm3 > +.byte 102,15,56,0,193 > + andq $0x30,%r11 > + subq $1,%rax > + pxor %xmm3,%xmm0 > + > +.Lenc_entry: > + > + movdqa %xmm9,%xmm1 > + movdqa %xmm11,%xmm5 > + pandn %xmm0,%xmm1 > + psrld $4,%xmm1 > + pand %xmm9,%xmm0 > +.byte 102,15,56,0,232 > + movdqa %xmm10,%xmm3 > + pxor %xmm1,%xmm0 > +.byte 102,15,56,0,217 > + movdqa %xmm10,%xmm4 > + pxor %xmm5,%xmm3 > +.byte 102,15,56,0,224 > + movdqa %xmm10,%xmm2 > + pxor %xmm5,%xmm4 > +.byte 102,15,56,0,211 > + movdqa %xmm10,%xmm3 > + pxor %xmm0,%xmm2 > +.byte 102,15,56,0,220 > + movdqu (%r9),%xmm5 > + pxor %xmm1,%xmm3 > + jnz .Lenc_loop > + > + > + movdqa -96(%r10),%xmm4 > + movdqa -80(%r10),%xmm0 > +.byte 102,15,56,0,226 > + pxor %xmm5,%xmm4 > +.byte 102,15,56,0,195 > + movdqa 64(%r11,%r10,1),%xmm1 > + pxor %xmm4,%xmm0 > +.byte 102,15,56,0,193 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size _vpaes_encrypt_core,.-_vpaes_encrypt_core > + > + > + > + > + > + > +.type _vpaes_decrypt_core,@function > +.align 16 > +_vpaes_decrypt_core: > +.cfi_startproc > + movq %rdx,%r9 > + movl 240(%rdx),%eax > + movdqa %xmm9,%xmm1 > + movdqa .Lk_dipt(%rip),%xmm2 > + pandn %xmm0,%xmm1 > + movq %rax,%r11 > + psrld $4,%xmm1 > + movdqu (%r9),%xmm5 > + shlq $4,%r11 > + pand %xmm9,%xmm0 > +.byte 102,15,56,0,208 > + movdqa .Lk_dipt+16(%rip),%xmm0 > + xorq $0x30,%r11 > + leaq .Lk_dsbd(%rip),%r10 > +.byte 102,15,56,0,193 > + andq $0x30,%r11 > + pxor %xmm5,%xmm2 > + movdqa .Lk_mc_forward+48(%rip),%xmm5 > + pxor %xmm2,%xmm0 > + addq $16,%r9 > + addq %r10,%r11 > + jmp .Ldec_entry > + > +.align 16 > 
+.Ldec_loop: > + > + > + > + movdqa -32(%r10),%xmm4 > + movdqa -16(%r10),%xmm1 > +.byte 102,15,56,0,226 > +.byte 102,15,56,0,203 > + pxor %xmm4,%xmm0 > + movdqa 0(%r10),%xmm4 > + pxor %xmm1,%xmm0 > + movdqa 16(%r10),%xmm1 > + > +.byte 102,15,56,0,226 > +.byte 102,15,56,0,197 > +.byte 102,15,56,0,203 > + pxor %xmm4,%xmm0 > + movdqa 32(%r10),%xmm4 > + pxor %xmm1,%xmm0 > + movdqa 48(%r10),%xmm1 > + > +.byte 102,15,56,0,226 > +.byte 102,15,56,0,197 > +.byte 102,15,56,0,203 > + pxor %xmm4,%xmm0 > + movdqa 64(%r10),%xmm4 > + pxor %xmm1,%xmm0 > + movdqa 80(%r10),%xmm1 > + > +.byte 102,15,56,0,226 > +.byte 102,15,56,0,197 > +.byte 102,15,56,0,203 > + pxor %xmm4,%xmm0 > + addq $16,%r9 > +.byte 102,15,58,15,237,12 > + pxor %xmm1,%xmm0 > + subq $1,%rax > + > +.Ldec_entry: > + > + movdqa %xmm9,%xmm1 > + pandn %xmm0,%xmm1 > + movdqa %xmm11,%xmm2 > + psrld $4,%xmm1 > + pand %xmm9,%xmm0 > +.byte 102,15,56,0,208 > + movdqa %xmm10,%xmm3 > + pxor %xmm1,%xmm0 > +.byte 102,15,56,0,217 > + movdqa %xmm10,%xmm4 > + pxor %xmm2,%xmm3 > +.byte 102,15,56,0,224 > + pxor %xmm2,%xmm4 > + movdqa %xmm10,%xmm2 > +.byte 102,15,56,0,211 > + movdqa %xmm10,%xmm3 > + pxor %xmm0,%xmm2 > +.byte 102,15,56,0,220 > + movdqu (%r9),%xmm0 > + pxor %xmm1,%xmm3 > + jnz .Ldec_loop > + > + > + movdqa 96(%r10),%xmm4 > +.byte 102,15,56,0,226 > + pxor %xmm0,%xmm4 > + movdqa 112(%r10),%xmm0 > + movdqa -352(%r11),%xmm2 > +.byte 102,15,56,0,195 > + pxor %xmm4,%xmm0 > +.byte 102,15,56,0,194 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size _vpaes_decrypt_core,.-_vpaes_decrypt_core > + > + > + > + > + > + > +.type _vpaes_schedule_core,@function > +.align 16 > +_vpaes_schedule_core: > +.cfi_startproc > + > + > + > + > + > + call _vpaes_preheat > + movdqa .Lk_rcon(%rip),%xmm8 > + movdqu (%rdi),%xmm0 > + > + > + movdqa %xmm0,%xmm3 > + leaq .Lk_ipt(%rip),%r11 > + call _vpaes_schedule_transform > + movdqa %xmm0,%xmm7 > + > + leaq .Lk_sr(%rip),%r10 > + testq %rcx,%rcx > + jnz .Lschedule_am_decrypting > + > + > + movdqu %xmm0,(%rdx) > + jmp .Lschedule_go > + > +.Lschedule_am_decrypting: > + > + movdqa (%r8,%r10,1),%xmm1 > +.byte 102,15,56,0,217 > + movdqu %xmm3,(%rdx) > + xorq $0x30,%r8 > + > +.Lschedule_go: > + cmpl $192,%esi > + ja .Lschedule_256 > + je .Lschedule_192 > + > + > + > + > + > + > + > + > + > + > +.Lschedule_128: > + movl $10,%esi > + > +.Loop_schedule_128: > + call _vpaes_schedule_round > + decq %rsi > + jz .Lschedule_mangle_last > + call _vpaes_schedule_mangle > + jmp .Loop_schedule_128 > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > +.align 16 > +.Lschedule_192: > + movdqu 8(%rdi),%xmm0 > + call _vpaes_schedule_transform > + movdqa %xmm0,%xmm6 > + pxor %xmm4,%xmm4 > + movhlps %xmm4,%xmm6 > + movl $4,%esi > + > +.Loop_schedule_192: > + call _vpaes_schedule_round > +.byte 102,15,58,15,198,8 > + call _vpaes_schedule_mangle > + call _vpaes_schedule_192_smear > + call _vpaes_schedule_mangle > + call _vpaes_schedule_round > + decq %rsi > + jz .Lschedule_mangle_last > + call _vpaes_schedule_mangle > + call _vpaes_schedule_192_smear > + jmp .Loop_schedule_192 > + > + > + > + > + > + > + > + > + > + > + > +.align 16 > +.Lschedule_256: > + movdqu 16(%rdi),%xmm0 > + call _vpaes_schedule_transform > + movl $7,%esi > + > +.Loop_schedule_256: > + call _vpaes_schedule_mangle > + movdqa %xmm0,%xmm6 > + > + > + call _vpaes_schedule_round > + decq %rsi > + jz .Lschedule_mangle_last > + call _vpaes_schedule_mangle > + > + > + pshufd $0xFF,%xmm0,%xmm0 > + movdqa %xmm7,%xmm5 > + movdqa %xmm6,%xmm7 > + call _vpaes_schedule_low_round > + movdqa 
%xmm5,%xmm7 > + > + jmp .Loop_schedule_256 > + > + > + > + > + > + > + > + > + > + > + > + > +.align 16 > +.Lschedule_mangle_last: > + > + leaq .Lk_deskew(%rip),%r11 > + testq %rcx,%rcx > + jnz .Lschedule_mangle_last_dec > + > + > + movdqa (%r8,%r10,1),%xmm1 > +.byte 102,15,56,0,193 > + leaq .Lk_opt(%rip),%r11 > + addq $32,%rdx > + > +.Lschedule_mangle_last_dec: > + addq $-16,%rdx > + pxor .Lk_s63(%rip),%xmm0 > + call _vpaes_schedule_transform > + movdqu %xmm0,(%rdx) > + > + > + pxor %xmm0,%xmm0 > + pxor %xmm1,%xmm1 > + pxor %xmm2,%xmm2 > + pxor %xmm3,%xmm3 > + pxor %xmm4,%xmm4 > + pxor %xmm5,%xmm5 > + pxor %xmm6,%xmm6 > + pxor %xmm7,%xmm7 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size _vpaes_schedule_core,.-_vpaes_schedule_core > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > +.type _vpaes_schedule_192_smear,@function > +.align 16 > +_vpaes_schedule_192_smear: > +.cfi_startproc > + pshufd $0x80,%xmm6,%xmm1 > + pshufd $0xFE,%xmm7,%xmm0 > + pxor %xmm1,%xmm6 > + pxor %xmm1,%xmm1 > + pxor %xmm0,%xmm6 > + movdqa %xmm6,%xmm0 > + movhlps %xmm1,%xmm6 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size _vpaes_schedule_192_smear,.-_vpaes_schedule_192_smear > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > +.type _vpaes_schedule_round,@function > +.align 16 > +_vpaes_schedule_round: > +.cfi_startproc > + > + pxor %xmm1,%xmm1 > +.byte 102,65,15,58,15,200,15 > +.byte 102,69,15,58,15,192,15 > + pxor %xmm1,%xmm7 > + > + > + pshufd $0xFF,%xmm0,%xmm0 > +.byte 102,15,58,15,192,1 > + > + > + > + > +_vpaes_schedule_low_round: > + > + movdqa %xmm7,%xmm1 > + pslldq $4,%xmm7 > + pxor %xmm1,%xmm7 > + movdqa %xmm7,%xmm1 > + pslldq $8,%xmm7 > + pxor %xmm1,%xmm7 > + pxor .Lk_s63(%rip),%xmm7 > + > + > + movdqa %xmm9,%xmm1 > + pandn %xmm0,%xmm1 > + psrld $4,%xmm1 > + pand %xmm9,%xmm0 > + movdqa %xmm11,%xmm2 > +.byte 102,15,56,0,208 > + pxor %xmm1,%xmm0 > + movdqa %xmm10,%xmm3 > +.byte 102,15,56,0,217 > + pxor %xmm2,%xmm3 > + movdqa %xmm10,%xmm4 > +.byte 102,15,56,0,224 > + pxor %xmm2,%xmm4 > + movdqa %xmm10,%xmm2 > +.byte 102,15,56,0,211 > + pxor %xmm0,%xmm2 > + movdqa %xmm10,%xmm3 > +.byte 102,15,56,0,220 > + pxor %xmm1,%xmm3 > + movdqa %xmm13,%xmm4 > +.byte 102,15,56,0,226 > + movdqa %xmm12,%xmm0 > +.byte 102,15,56,0,195 > + pxor %xmm4,%xmm0 > + > + > + pxor %xmm7,%xmm0 > + movdqa %xmm0,%xmm7 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size _vpaes_schedule_round,.-_vpaes_schedule_round > + > + > + > + > + > + > + > + > + > + > +.type _vpaes_schedule_transform,@function > +.align 16 > +_vpaes_schedule_transform: > +.cfi_startproc > + movdqa %xmm9,%xmm1 > + pandn %xmm0,%xmm1 > + psrld $4,%xmm1 > + pand %xmm9,%xmm0 > + movdqa (%r11),%xmm2 > +.byte 102,15,56,0,208 > + movdqa 16(%r11),%xmm0 > +.byte 102,15,56,0,193 > + pxor %xmm2,%xmm0 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size _vpaes_schedule_transform,.-_vpaes_schedule_transform > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > +.type _vpaes_schedule_mangle,@function > +.align 16 > +_vpaes_schedule_mangle: > +.cfi_startproc > + movdqa %xmm0,%xmm4 > + movdqa .Lk_mc_forward(%rip),%xmm5 > + testq %rcx,%rcx > + jnz .Lschedule_mangle_dec > + > + > + addq $16,%rdx > + pxor .Lk_s63(%rip),%xmm4 > +.byte 102,15,56,0,229 > + movdqa %xmm4,%xmm3 > +.byte 102,15,56,0,229 > + pxor %xmm4,%xmm3 > +.byte 102,15,56,0,229 > + pxor %xmm4,%xmm3 > + > + jmp .Lschedule_mangle_both > +.align 16 > +.Lschedule_mangle_dec: > + > + leaq .Lk_dksd(%rip),%r11 > + movdqa %xmm9,%xmm1 > + pandn %xmm4,%xmm1 > + psrld $4,%xmm1 
> + pand %xmm9,%xmm4 > + > + movdqa 0(%r11),%xmm2 > +.byte 102,15,56,0,212 > + movdqa 16(%r11),%xmm3 > +.byte 102,15,56,0,217 > + pxor %xmm2,%xmm3 > +.byte 102,15,56,0,221 > + > + movdqa 32(%r11),%xmm2 > +.byte 102,15,56,0,212 > + pxor %xmm3,%xmm2 > + movdqa 48(%r11),%xmm3 > +.byte 102,15,56,0,217 > + pxor %xmm2,%xmm3 > +.byte 102,15,56,0,221 > + > + movdqa 64(%r11),%xmm2 > +.byte 102,15,56,0,212 > + pxor %xmm3,%xmm2 > + movdqa 80(%r11),%xmm3 > +.byte 102,15,56,0,217 > + pxor %xmm2,%xmm3 > +.byte 102,15,56,0,221 > + > + movdqa 96(%r11),%xmm2 > +.byte 102,15,56,0,212 > + pxor %xmm3,%xmm2 > + movdqa 112(%r11),%xmm3 > +.byte 102,15,56,0,217 > + pxor %xmm2,%xmm3 > + > + addq $-16,%rdx > + > +.Lschedule_mangle_both: > + movdqa (%r8,%r10,1),%xmm1 > +.byte 102,15,56,0,217 > + addq $-16,%r8 > + andq $0x30,%r8 > + movdqu %xmm3,(%rdx) > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size _vpaes_schedule_mangle,.-_vpaes_schedule_mangle > + > + > + > + > +.globl vpaes_set_encrypt_key > +.type vpaes_set_encrypt_key,@function > +.align 16 > +vpaes_set_encrypt_key: > +.cfi_startproc > + movl %esi,%eax > + shrl $5,%eax > + addl $5,%eax > + movl %eax,240(%rdx) > + > + movl $0,%ecx > + movl $0x30,%r8d > + call _vpaes_schedule_core > + xorl %eax,%eax > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size vpaes_set_encrypt_key,.-vpaes_set_encrypt_key > + > +.globl vpaes_set_decrypt_key > +.type vpaes_set_decrypt_key,@function > +.align 16 > +vpaes_set_decrypt_key: > +.cfi_startproc > + movl %esi,%eax > + shrl $5,%eax > + addl $5,%eax > + movl %eax,240(%rdx) > + shll $4,%eax > + leaq 16(%rdx,%rax,1),%rdx > + > + movl $1,%ecx > + movl %esi,%r8d > + shrl $1,%r8d > + andl $32,%r8d > + xorl $32,%r8d > + call _vpaes_schedule_core > + xorl %eax,%eax > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size vpaes_set_decrypt_key,.-vpaes_set_decrypt_key > + > +.globl vpaes_encrypt > +.type vpaes_encrypt,@function > +.align 16 > +vpaes_encrypt: > +.cfi_startproc > + movdqu (%rdi),%xmm0 > + call _vpaes_preheat > + call _vpaes_encrypt_core > + movdqu %xmm0,(%rsi) > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size vpaes_encrypt,.-vpaes_encrypt > + > +.globl vpaes_decrypt > +.type vpaes_decrypt,@function > +.align 16 > +vpaes_decrypt: > +.cfi_startproc > + movdqu (%rdi),%xmm0 > + call _vpaes_preheat > + call _vpaes_decrypt_core > + movdqu %xmm0,(%rsi) > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size vpaes_decrypt,.-vpaes_decrypt > +.globl vpaes_cbc_encrypt > +.type vpaes_cbc_encrypt,@function > +.align 16 > +vpaes_cbc_encrypt: > +.cfi_startproc > + xchgq %rcx,%rdx > + subq $16,%rcx > + jc .Lcbc_abort > + movdqu (%r8),%xmm6 > + subq %rdi,%rsi > + call _vpaes_preheat > + cmpl $0,%r9d > + je .Lcbc_dec_loop > + jmp .Lcbc_enc_loop > +.align 16 > +.Lcbc_enc_loop: > + movdqu (%rdi),%xmm0 > + pxor %xmm6,%xmm0 > + call _vpaes_encrypt_core > + movdqa %xmm0,%xmm6 > + movdqu %xmm0,(%rsi,%rdi,1) > + leaq 16(%rdi),%rdi > + subq $16,%rcx > + jnc .Lcbc_enc_loop > + jmp .Lcbc_done > +.align 16 > +.Lcbc_dec_loop: > + movdqu (%rdi),%xmm0 > + movdqa %xmm0,%xmm7 > + call _vpaes_decrypt_core > + pxor %xmm6,%xmm0 > + movdqa %xmm7,%xmm6 > + movdqu %xmm0,(%rsi,%rdi,1) > + leaq 16(%rdi),%rdi > + subq $16,%rcx > + jnc .Lcbc_dec_loop > +.Lcbc_done: > + movdqu %xmm6,(%r8) > +.Lcbc_abort: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size vpaes_cbc_encrypt,.-vpaes_cbc_encrypt > + > + > + > + > + > + > +.type _vpaes_preheat,@function > +.align 16 > +_vpaes_preheat: > +.cfi_startproc > + leaq .Lk_s0F(%rip),%r10 > + movdqa -32(%r10),%xmm10 > + movdqa -16(%r10),%xmm11 > + movdqa 0(%r10),%xmm9 > + movdqa 
48(%r10),%xmm13 > + movdqa 64(%r10),%xmm12 > + movdqa 80(%r10),%xmm15 > + movdqa 96(%r10),%xmm14 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size _vpaes_preheat,.-_vpaes_preheat > + > + > + > + > + > +.type _vpaes_consts,@object > +.align 64 > +_vpaes_consts: > +.Lk_inv: > +.quad 0x0E05060F0D080180, 0x040703090A0B0C02 > +.quad 0x01040A060F0B0780, 0x030D0E0C02050809 > + > +.Lk_s0F: > +.quad 0x0F0F0F0F0F0F0F0F, 0x0F0F0F0F0F0F0F0F > + > +.Lk_ipt: > +.quad 0xC2B2E8985A2A7000, 0xCABAE09052227808 > +.quad 0x4C01307D317C4D00, 0xCD80B1FCB0FDCC81 > + > +.Lk_sb1: > +.quad 0xB19BE18FCB503E00, 0xA5DF7A6E142AF544 > +.quad 0x3618D415FAE22300, 0x3BF7CCC10D2ED9EF > +.Lk_sb2: > +.quad 0xE27A93C60B712400, 0x5EB7E955BC982FCD > +.quad 0x69EB88400AE12900, 0xC2A163C8AB82234A > +.Lk_sbo: > +.quad 0xD0D26D176FBDC700, 0x15AABF7AC502A878 > +.quad 0xCFE474A55FBB6A00, 0x8E1E90D1412B35FA > + > +.Lk_mc_forward: > +.quad 0x0407060500030201, 0x0C0F0E0D080B0A09 > +.quad 0x080B0A0904070605, 0x000302010C0F0E0D > +.quad 0x0C0F0E0D080B0A09, 0x0407060500030201 > +.quad 0x000302010C0F0E0D, 0x080B0A0904070605 > + > +.Lk_mc_backward: > +.quad 0x0605040702010003, 0x0E0D0C0F0A09080B > +.quad 0x020100030E0D0C0F, 0x0A09080B06050407 > +.quad 0x0E0D0C0F0A09080B, 0x0605040702010003 > +.quad 0x0A09080B06050407, 0x020100030E0D0C0F > + > +.Lk_sr: > +.quad 0x0706050403020100, 0x0F0E0D0C0B0A0908 > +.quad 0x030E09040F0A0500, 0x0B06010C07020D08 > +.quad 0x0F060D040B020900, 0x070E050C030A0108 > +.quad 0x0B0E0104070A0D00, 0x0306090C0F020508 > + > +.Lk_rcon: > +.quad 0x1F8391B9AF9DEEB6, 0x702A98084D7C7D81 > + > +.Lk_s63: > +.quad 0x5B5B5B5B5B5B5B5B, 0x5B5B5B5B5B5B5B5B > + > +.Lk_opt: > +.quad 0xFF9F4929D6B66000, 0xF7974121DEBE6808 > +.quad 0x01EDBD5150BCEC00, 0xE10D5DB1B05C0CE0 > + > +.Lk_deskew: > +.quad 0x07E4A34047A4E300, 0x1DFEB95A5DBEF91A > +.quad 0x5F36B5DC83EA6900, 0x2841C2ABF49D1E77 > + > + > + > + > + > +.Lk_dksd: > +.quad 0xFEB91A5DA3E44700, 0x0740E3A45A1DBEF9 > +.quad 0x41C277F4B5368300, 0x5FDC69EAAB289D1E > +.Lk_dksb: > +.quad 0x9A4FCA1F8550D500, 0x03D653861CC94C99 > +.quad 0x115BEDA7B6FC4A00, 0xD993256F7E3482C8 > +.Lk_dkse: > +.quad 0xD5031CCA1FC9D600, 0x53859A4C994F5086 > +.quad 0xA23196054FDC7BE8, 0xCD5EF96A20B31487 > +.Lk_dks9: > +.quad 0xB6116FC87ED9A700, 0x4AED933482255BFC > +.quad 0x4576516227143300, 0x8BB89FACE9DAFDCE > + > + > + > + > + > +.Lk_dipt: > +.quad 0x0F505B040B545F00, 0x154A411E114E451A > +.quad 0x86E383E660056500, 0x12771772F491F194 > + > +.Lk_dsb9: > +.quad 0x851C03539A86D600, 0xCAD51F504F994CC9 > +.quad 0xC03B1789ECD74900, 0x725E2C9EB2FBA565 > +.Lk_dsbd: > +.quad 0x7D57CCDFE6B1A200, 0xF56E9B13882A4439 > +.quad 0x3CE2FAF724C6CB00, 0x2931180D15DEEFD3 > +.Lk_dsbb: > +.quad 0xD022649296B44200, 0x602646F6B0F2D404 > +.quad 0xC19498A6CD596700, 0xF3FF0C3E3255AA6B > +.Lk_dsbe: > +.quad 0x46F2929626D4D000, 0x2242600464B4F6B0 > +.quad 0x0C55A6CDFFAAC100, 0x9467F36B98593E32 > +.Lk_dsbo: > +.quad 0x1387EA537EF94000, 0xC7AA6DB9D4943E2D > +.quad 0x12D7560F93441D00, 0xCA4B8159D8C58E9C > +.byte > 86,101,99,116,111,114,32,80,101,114,109,117,116,97,116,105,111,110,32,65,6 > 9,83,32,102,111,114,32,120,56,54,95,54,52,47,83,83,83,69,51,44,32,77,105,10 > 7,101,32,72,97,109,98,117,114,103,32,40,83,116,97,110,102,111,114,100,32,8 > 5,110,105,118,101,114,115,105,116,121,41,0 > +.align 64 > +.size _vpaes_consts,.-_vpaes_consts > diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/aesni-gcm- > x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/aesni-gcm- > x86_64.S > new file mode 100644 > index 0000000000..1201f3427a > --- 
/dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/aesni-gcm- > x86_64.S > @@ -0,0 +1,29 @@ > +# WARNING: do not edit! > +# Generated from openssl/crypto/modes/asm/aesni-gcm-x86_64.pl > +# > +# Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. > +# > +# Licensed under the OpenSSL license (the "License"). You may not use > +# this file except in compliance with the License. You can obtain a copy > +# in the file LICENSE in the source distribution or at > +# https://www.openssl.org/source/license.html > + > +.text > + > +.globl aesni_gcm_encrypt > +.type aesni_gcm_encrypt,@function > +aesni_gcm_encrypt: > +.cfi_startproc > + xorl %eax,%eax > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size aesni_gcm_encrypt,.-aesni_gcm_encrypt > + > +.globl aesni_gcm_decrypt > +.type aesni_gcm_decrypt,@function > +aesni_gcm_decrypt: > +.cfi_startproc > + xorl %eax,%eax > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size aesni_gcm_decrypt,.-aesni_gcm_decrypt > diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/ghash- > x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/ghash- > x86_64.S > new file mode 100644 > index 0000000000..3fcaa4b2ef > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/ghash-x86_64.S > @@ -0,0 +1,1386 @@ > +# WARNING: do not edit! > +# Generated from openssl/crypto/modes/asm/ghash-x86_64.pl > +# > +# Copyright 2010-2020 The OpenSSL Project Authors. All Rights Reserved. > +# > +# Licensed under the OpenSSL license (the "License"). You may not use > +# this file except in compliance with the License. You can obtain a copy > +# in the file LICENSE in the source distribution or at > +# https://www.openssl.org/source/license.html > + > +.text > + > + > +.globl gcm_gmult_4bit > +.type gcm_gmult_4bit,@function > +.align 16 > +gcm_gmult_4bit: > +.cfi_startproc > + pushq %rbx > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %rbx,-16 > + pushq %rbp > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %rbp,-24 > + pushq %r12 > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %r12,-32 > + pushq %r13 > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %r13,-40 > + pushq %r14 > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %r14,-48 > + pushq %r15 > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %r15,-56 > + subq $280,%rsp > +.cfi_adjust_cfa_offset 280 > +.Lgmult_prologue: > + > + movzbq 15(%rdi),%r8 > + leaq .Lrem_4bit(%rip),%r11 > + xorq %rax,%rax > + xorq %rbx,%rbx > + movb %r8b,%al > + movb %r8b,%bl > + shlb $4,%al > + movq $14,%rcx > + movq 8(%rsi,%rax,1),%r8 > + movq (%rsi,%rax,1),%r9 > + andb $0xf0,%bl > + movq %r8,%rdx > + jmp .Loop1 > + > +.align 16 > +.Loop1: > + shrq $4,%r8 > + andq $0xf,%rdx > + movq %r9,%r10 > + movb (%rdi,%rcx,1),%al > + shrq $4,%r9 > + xorq 8(%rsi,%rbx,1),%r8 > + shlq $60,%r10 > + xorq (%rsi,%rbx,1),%r9 > + movb %al,%bl > + xorq (%r11,%rdx,8),%r9 > + movq %r8,%rdx > + shlb $4,%al > + xorq %r10,%r8 > + decq %rcx > + js .Lbreak1 > + > + shrq $4,%r8 > + andq $0xf,%rdx > + movq %r9,%r10 > + shrq $4,%r9 > + xorq 8(%rsi,%rax,1),%r8 > + shlq $60,%r10 > + xorq (%rsi,%rax,1),%r9 > + andb $0xf0,%bl > + xorq (%r11,%rdx,8),%r9 > + movq %r8,%rdx > + xorq %r10,%r8 > + jmp .Loop1 > + > +.align 16 > +.Lbreak1: > + shrq $4,%r8 > + andq $0xf,%rdx > + movq %r9,%r10 > + shrq $4,%r9 > + xorq 8(%rsi,%rax,1),%r8 > + shlq $60,%r10 > + xorq (%rsi,%rax,1),%r9 > + andb $0xf0,%bl > + xorq (%r11,%rdx,8),%r9 > + movq %r8,%rdx > + xorq %r10,%r8 > + > + shrq $4,%r8 > + andq $0xf,%rdx > + movq %r9,%r10 > + shrq $4,%r9 > + xorq 8(%rsi,%rbx,1),%r8 > + shlq $60,%r10 > + xorq 
(%rsi,%rbx,1),%r9 > + xorq %r10,%r8 > + xorq (%r11,%rdx,8),%r9 > + > + bswapq %r8 > + bswapq %r9 > + movq %r8,8(%rdi) > + movq %r9,(%rdi) > + > + leaq 280+48(%rsp),%rsi > +.cfi_def_cfa %rsi,8 > + movq -8(%rsi),%rbx > +.cfi_restore %rbx > + leaq (%rsi),%rsp > +.cfi_def_cfa_register %rsp > +.Lgmult_epilogue: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size gcm_gmult_4bit,.-gcm_gmult_4bit > +.globl gcm_ghash_4bit > +.type gcm_ghash_4bit,@function > +.align 16 > +gcm_ghash_4bit: > +.cfi_startproc > + pushq %rbx > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %rbx,-16 > + pushq %rbp > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %rbp,-24 > + pushq %r12 > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %r12,-32 > + pushq %r13 > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %r13,-40 > + pushq %r14 > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %r14,-48 > + pushq %r15 > +.cfi_adjust_cfa_offset 8 > +.cfi_offset %r15,-56 > + subq $280,%rsp > +.cfi_adjust_cfa_offset 280 > +.Lghash_prologue: > + movq %rdx,%r14 > + movq %rcx,%r15 > + subq $-128,%rsi > + leaq 16+128(%rsp),%rbp > + xorl %edx,%edx > + movq 0+0-128(%rsi),%r8 > + movq 0+8-128(%rsi),%rax > + movb %al,%dl > + shrq $4,%rax > + movq %r8,%r10 > + shrq $4,%r8 > + movq 16+0-128(%rsi),%r9 > + shlb $4,%dl > + movq 16+8-128(%rsi),%rbx > + shlq $60,%r10 > + movb %dl,0(%rsp) > + orq %r10,%rax > + movb %bl,%dl > + shrq $4,%rbx > + movq %r9,%r10 > + shrq $4,%r9 > + movq %r8,0(%rbp) > + movq 32+0-128(%rsi),%r8 > + shlb $4,%dl > + movq %rax,0-128(%rbp) > + movq 32+8-128(%rsi),%rax > + shlq $60,%r10 > + movb %dl,1(%rsp) > + orq %r10,%rbx > + movb %al,%dl > + shrq $4,%rax > + movq %r8,%r10 > + shrq $4,%r8 > + movq %r9,8(%rbp) > + movq 48+0-128(%rsi),%r9 > + shlb $4,%dl > + movq %rbx,8-128(%rbp) > + movq 48+8-128(%rsi),%rbx > + shlq $60,%r10 > + movb %dl,2(%rsp) > + orq %r10,%rax > + movb %bl,%dl > + shrq $4,%rbx > + movq %r9,%r10 > + shrq $4,%r9 > + movq %r8,16(%rbp) > + movq 64+0-128(%rsi),%r8 > + shlb $4,%dl > + movq %rax,16-128(%rbp) > + movq 64+8-128(%rsi),%rax > + shlq $60,%r10 > + movb %dl,3(%rsp) > + orq %r10,%rbx > + movb %al,%dl > + shrq $4,%rax > + movq %r8,%r10 > + shrq $4,%r8 > + movq %r9,24(%rbp) > + movq 80+0-128(%rsi),%r9 > + shlb $4,%dl > + movq %rbx,24-128(%rbp) > + movq 80+8-128(%rsi),%rbx > + shlq $60,%r10 > + movb %dl,4(%rsp) > + orq %r10,%rax > + movb %bl,%dl > + shrq $4,%rbx > + movq %r9,%r10 > + shrq $4,%r9 > + movq %r8,32(%rbp) > + movq 96+0-128(%rsi),%r8 > + shlb $4,%dl > + movq %rax,32-128(%rbp) > + movq 96+8-128(%rsi),%rax > + shlq $60,%r10 > + movb %dl,5(%rsp) > + orq %r10,%rbx > + movb %al,%dl > + shrq $4,%rax > + movq %r8,%r10 > + shrq $4,%r8 > + movq %r9,40(%rbp) > + movq 112+0-128(%rsi),%r9 > + shlb $4,%dl > + movq %rbx,40-128(%rbp) > + movq 112+8-128(%rsi),%rbx > + shlq $60,%r10 > + movb %dl,6(%rsp) > + orq %r10,%rax > + movb %bl,%dl > + shrq $4,%rbx > + movq %r9,%r10 > + shrq $4,%r9 > + movq %r8,48(%rbp) > + movq 128+0-128(%rsi),%r8 > + shlb $4,%dl > + movq %rax,48-128(%rbp) > + movq 128+8-128(%rsi),%rax > + shlq $60,%r10 > + movb %dl,7(%rsp) > + orq %r10,%rbx > + movb %al,%dl > + shrq $4,%rax > + movq %r8,%r10 > + shrq $4,%r8 > + movq %r9,56(%rbp) > + movq 144+0-128(%rsi),%r9 > + shlb $4,%dl > + movq %rbx,56-128(%rbp) > + movq 144+8-128(%rsi),%rbx > + shlq $60,%r10 > + movb %dl,8(%rsp) > + orq %r10,%rax > + movb %bl,%dl > + shrq $4,%rbx > + movq %r9,%r10 > + shrq $4,%r9 > + movq %r8,64(%rbp) > + movq 160+0-128(%rsi),%r8 > + shlb $4,%dl > + movq %rax,64-128(%rbp) > + movq 160+8-128(%rsi),%rax > + shlq $60,%r10 > + movb %dl,9(%rsp) > + orq %r10,%rbx > + 
movb %al,%dl > + shrq $4,%rax > + movq %r8,%r10 > + shrq $4,%r8 > + movq %r9,72(%rbp) > + movq 176+0-128(%rsi),%r9 > + shlb $4,%dl > + movq %rbx,72-128(%rbp) > + movq 176+8-128(%rsi),%rbx > + shlq $60,%r10 > + movb %dl,10(%rsp) > + orq %r10,%rax > + movb %bl,%dl > + shrq $4,%rbx > + movq %r9,%r10 > + shrq $4,%r9 > + movq %r8,80(%rbp) > + movq 192+0-128(%rsi),%r8 > + shlb $4,%dl > + movq %rax,80-128(%rbp) > + movq 192+8-128(%rsi),%rax > + shlq $60,%r10 > + movb %dl,11(%rsp) > + orq %r10,%rbx > + movb %al,%dl > + shrq $4,%rax > + movq %r8,%r10 > + shrq $4,%r8 > + movq %r9,88(%rbp) > + movq 208+0-128(%rsi),%r9 > + shlb $4,%dl > + movq %rbx,88-128(%rbp) > + movq 208+8-128(%rsi),%rbx > + shlq $60,%r10 > + movb %dl,12(%rsp) > + orq %r10,%rax > + movb %bl,%dl > + shrq $4,%rbx > + movq %r9,%r10 > + shrq $4,%r9 > + movq %r8,96(%rbp) > + movq 224+0-128(%rsi),%r8 > + shlb $4,%dl > + movq %rax,96-128(%rbp) > + movq 224+8-128(%rsi),%rax > + shlq $60,%r10 > + movb %dl,13(%rsp) > + orq %r10,%rbx > + movb %al,%dl > + shrq $4,%rax > + movq %r8,%r10 > + shrq $4,%r8 > + movq %r9,104(%rbp) > + movq 240+0-128(%rsi),%r9 > + shlb $4,%dl > + movq %rbx,104-128(%rbp) > + movq 240+8-128(%rsi),%rbx > + shlq $60,%r10 > + movb %dl,14(%rsp) > + orq %r10,%rax > + movb %bl,%dl > + shrq $4,%rbx > + movq %r9,%r10 > + shrq $4,%r9 > + movq %r8,112(%rbp) > + shlb $4,%dl > + movq %rax,112-128(%rbp) > + shlq $60,%r10 > + movb %dl,15(%rsp) > + orq %r10,%rbx > + movq %r9,120(%rbp) > + movq %rbx,120-128(%rbp) > + addq $-128,%rsi > + movq 8(%rdi),%r8 > + movq 0(%rdi),%r9 > + addq %r14,%r15 > + leaq .Lrem_8bit(%rip),%r11 > + jmp .Louter_loop > +.align 16 > +.Louter_loop: > + xorq (%r14),%r9 > + movq 8(%r14),%rdx > + leaq 16(%r14),%r14 > + xorq %r8,%rdx > + movq %r9,(%rdi) > + movq %rdx,8(%rdi) > + shrq $32,%rdx > + xorq %rax,%rax > + roll $8,%edx > + movb %dl,%al > + movzbl %dl,%ebx > + shlb $4,%al > + shrl $4,%ebx > + roll $8,%edx > + movq 8(%rsi,%rax,1),%r8 > + movq (%rsi,%rax,1),%r9 > + movb %dl,%al > + movzbl %dl,%ecx > + shlb $4,%al > + movzbq (%rsp,%rbx,1),%r12 > + shrl $4,%ecx > + xorq %r8,%r12 > + movq %r9,%r10 > + shrq $8,%r8 > + movzbq %r12b,%r12 > + shrq $8,%r9 > + xorq -128(%rbp,%rbx,8),%r8 > + shlq $56,%r10 > + xorq (%rbp,%rbx,8),%r9 > + roll $8,%edx > + xorq 8(%rsi,%rax,1),%r8 > + xorq (%rsi,%rax,1),%r9 > + movb %dl,%al > + xorq %r10,%r8 > + movzwq (%r11,%r12,2),%r12 > + movzbl %dl,%ebx > + shlb $4,%al > + movzbq (%rsp,%rcx,1),%r13 > + shrl $4,%ebx > + shlq $48,%r12 > + xorq %r8,%r13 > + movq %r9,%r10 > + xorq %r12,%r9 > + shrq $8,%r8 > + movzbq %r13b,%r13 > + shrq $8,%r9 > + xorq -128(%rbp,%rcx,8),%r8 > + shlq $56,%r10 > + xorq (%rbp,%rcx,8),%r9 > + roll $8,%edx > + xorq 8(%rsi,%rax,1),%r8 > + xorq (%rsi,%rax,1),%r9 > + movb %dl,%al > + xorq %r10,%r8 > + movzwq (%r11,%r13,2),%r13 > + movzbl %dl,%ecx > + shlb $4,%al > + movzbq (%rsp,%rbx,1),%r12 > + shrl $4,%ecx > + shlq $48,%r13 > + xorq %r8,%r12 > + movq %r9,%r10 > + xorq %r13,%r9 > + shrq $8,%r8 > + movzbq %r12b,%r12 > + movl 8(%rdi),%edx > + shrq $8,%r9 > + xorq -128(%rbp,%rbx,8),%r8 > + shlq $56,%r10 > + xorq (%rbp,%rbx,8),%r9 > + roll $8,%edx > + xorq 8(%rsi,%rax,1),%r8 > + xorq (%rsi,%rax,1),%r9 > + movb %dl,%al > + xorq %r10,%r8 > + movzwq (%r11,%r12,2),%r12 > + movzbl %dl,%ebx > + shlb $4,%al > + movzbq (%rsp,%rcx,1),%r13 > + shrl $4,%ebx > + shlq $48,%r12 > + xorq %r8,%r13 > + movq %r9,%r10 > + xorq %r12,%r9 > + shrq $8,%r8 > + movzbq %r13b,%r13 > + shrq $8,%r9 > + xorq -128(%rbp,%rcx,8),%r8 > + shlq $56,%r10 > + xorq (%rbp,%rcx,8),%r9 > + roll $8,%edx > + 
xorq 8(%rsi,%rax,1),%r8 > + xorq (%rsi,%rax,1),%r9 > + movb %dl,%al > + xorq %r10,%r8 > + movzwq (%r11,%r13,2),%r13 > + movzbl %dl,%ecx > + shlb $4,%al > + movzbq (%rsp,%rbx,1),%r12 > + shrl $4,%ecx > + shlq $48,%r13 > + xorq %r8,%r12 > + movq %r9,%r10 > + xorq %r13,%r9 > + shrq $8,%r8 > + movzbq %r12b,%r12 > + shrq $8,%r9 > + xorq -128(%rbp,%rbx,8),%r8 > + shlq $56,%r10 > + xorq (%rbp,%rbx,8),%r9 > + roll $8,%edx > + xorq 8(%rsi,%rax,1),%r8 > + xorq (%rsi,%rax,1),%r9 > + movb %dl,%al > + xorq %r10,%r8 > + movzwq (%r11,%r12,2),%r12 > + movzbl %dl,%ebx > + shlb $4,%al > + movzbq (%rsp,%rcx,1),%r13 > + shrl $4,%ebx > + shlq $48,%r12 > + xorq %r8,%r13 > + movq %r9,%r10 > + xorq %r12,%r9 > + shrq $8,%r8 > + movzbq %r13b,%r13 > + shrq $8,%r9 > + xorq -128(%rbp,%rcx,8),%r8 > + shlq $56,%r10 > + xorq (%rbp,%rcx,8),%r9 > + roll $8,%edx > + xorq 8(%rsi,%rax,1),%r8 > + xorq (%rsi,%rax,1),%r9 > + movb %dl,%al > + xorq %r10,%r8 > + movzwq (%r11,%r13,2),%r13 > + movzbl %dl,%ecx > + shlb $4,%al > + movzbq (%rsp,%rbx,1),%r12 > + shrl $4,%ecx > + shlq $48,%r13 > + xorq %r8,%r12 > + movq %r9,%r10 > + xorq %r13,%r9 > + shrq $8,%r8 > + movzbq %r12b,%r12 > + movl 4(%rdi),%edx > + shrq $8,%r9 > + xorq -128(%rbp,%rbx,8),%r8 > + shlq $56,%r10 > + xorq (%rbp,%rbx,8),%r9 > + roll $8,%edx > + xorq 8(%rsi,%rax,1),%r8 > + xorq (%rsi,%rax,1),%r9 > + movb %dl,%al > + xorq %r10,%r8 > + movzwq (%r11,%r12,2),%r12 > + movzbl %dl,%ebx > + shlb $4,%al > + movzbq (%rsp,%rcx,1),%r13 > + shrl $4,%ebx > + shlq $48,%r12 > + xorq %r8,%r13 > + movq %r9,%r10 > + xorq %r12,%r9 > + shrq $8,%r8 > + movzbq %r13b,%r13 > + shrq $8,%r9 > + xorq -128(%rbp,%rcx,8),%r8 > + shlq $56,%r10 > + xorq (%rbp,%rcx,8),%r9 > + roll $8,%edx > + xorq 8(%rsi,%rax,1),%r8 > + xorq (%rsi,%rax,1),%r9 > + movb %dl,%al > + xorq %r10,%r8 > + movzwq (%r11,%r13,2),%r13 > + movzbl %dl,%ecx > + shlb $4,%al > + movzbq (%rsp,%rbx,1),%r12 > + shrl $4,%ecx > + shlq $48,%r13 > + xorq %r8,%r12 > + movq %r9,%r10 > + xorq %r13,%r9 > + shrq $8,%r8 > + movzbq %r12b,%r12 > + shrq $8,%r9 > + xorq -128(%rbp,%rbx,8),%r8 > + shlq $56,%r10 > + xorq (%rbp,%rbx,8),%r9 > + roll $8,%edx > + xorq 8(%rsi,%rax,1),%r8 > + xorq (%rsi,%rax,1),%r9 > + movb %dl,%al > + xorq %r10,%r8 > + movzwq (%r11,%r12,2),%r12 > + movzbl %dl,%ebx > + shlb $4,%al > + movzbq (%rsp,%rcx,1),%r13 > + shrl $4,%ebx > + shlq $48,%r12 > + xorq %r8,%r13 > + movq %r9,%r10 > + xorq %r12,%r9 > + shrq $8,%r8 > + movzbq %r13b,%r13 > + shrq $8,%r9 > + xorq -128(%rbp,%rcx,8),%r8 > + shlq $56,%r10 > + xorq (%rbp,%rcx,8),%r9 > + roll $8,%edx > + xorq 8(%rsi,%rax,1),%r8 > + xorq (%rsi,%rax,1),%r9 > + movb %dl,%al > + xorq %r10,%r8 > + movzwq (%r11,%r13,2),%r13 > + movzbl %dl,%ecx > + shlb $4,%al > + movzbq (%rsp,%rbx,1),%r12 > + shrl $4,%ecx > + shlq $48,%r13 > + xorq %r8,%r12 > + movq %r9,%r10 > + xorq %r13,%r9 > + shrq $8,%r8 > + movzbq %r12b,%r12 > + movl 0(%rdi),%edx > + shrq $8,%r9 > + xorq -128(%rbp,%rbx,8),%r8 > + shlq $56,%r10 > + xorq (%rbp,%rbx,8),%r9 > + roll $8,%edx > + xorq 8(%rsi,%rax,1),%r8 > + xorq (%rsi,%rax,1),%r9 > + movb %dl,%al > + xorq %r10,%r8 > + movzwq (%r11,%r12,2),%r12 > + movzbl %dl,%ebx > + shlb $4,%al > + movzbq (%rsp,%rcx,1),%r13 > + shrl $4,%ebx > + shlq $48,%r12 > + xorq %r8,%r13 > + movq %r9,%r10 > + xorq %r12,%r9 > + shrq $8,%r8 > + movzbq %r13b,%r13 > + shrq $8,%r9 > + xorq -128(%rbp,%rcx,8),%r8 > + shlq $56,%r10 > + xorq (%rbp,%rcx,8),%r9 > + roll $8,%edx > + xorq 8(%rsi,%rax,1),%r8 > + xorq (%rsi,%rax,1),%r9 > + movb %dl,%al > + xorq %r10,%r8 > + movzwq (%r11,%r13,2),%r13 > + movzbl 
%dl,%ecx > + shlb $4,%al > + movzbq (%rsp,%rbx,1),%r12 > + shrl $4,%ecx > + shlq $48,%r13 > + xorq %r8,%r12 > + movq %r9,%r10 > + xorq %r13,%r9 > + shrq $8,%r8 > + movzbq %r12b,%r12 > + shrq $8,%r9 > + xorq -128(%rbp,%rbx,8),%r8 > + shlq $56,%r10 > + xorq (%rbp,%rbx,8),%r9 > + roll $8,%edx > + xorq 8(%rsi,%rax,1),%r8 > + xorq (%rsi,%rax,1),%r9 > + movb %dl,%al > + xorq %r10,%r8 > + movzwq (%r11,%r12,2),%r12 > + movzbl %dl,%ebx > + shlb $4,%al > + movzbq (%rsp,%rcx,1),%r13 > + shrl $4,%ebx > + shlq $48,%r12 > + xorq %r8,%r13 > + movq %r9,%r10 > + xorq %r12,%r9 > + shrq $8,%r8 > + movzbq %r13b,%r13 > + shrq $8,%r9 > + xorq -128(%rbp,%rcx,8),%r8 > + shlq $56,%r10 > + xorq (%rbp,%rcx,8),%r9 > + roll $8,%edx > + xorq 8(%rsi,%rax,1),%r8 > + xorq (%rsi,%rax,1),%r9 > + movb %dl,%al > + xorq %r10,%r8 > + movzwq (%r11,%r13,2),%r13 > + movzbl %dl,%ecx > + shlb $4,%al > + movzbq (%rsp,%rbx,1),%r12 > + andl $240,%ecx > + shlq $48,%r13 > + xorq %r8,%r12 > + movq %r9,%r10 > + xorq %r13,%r9 > + shrq $8,%r8 > + movzbq %r12b,%r12 > + movl -4(%rdi),%edx > + shrq $8,%r9 > + xorq -128(%rbp,%rbx,8),%r8 > + shlq $56,%r10 > + xorq (%rbp,%rbx,8),%r9 > + movzwq (%r11,%r12,2),%r12 > + xorq 8(%rsi,%rax,1),%r8 > + xorq (%rsi,%rax,1),%r9 > + shlq $48,%r12 > + xorq %r10,%r8 > + xorq %r12,%r9 > + movzbq %r8b,%r13 > + shrq $4,%r8 > + movq %r9,%r10 > + shlb $4,%r13b > + shrq $4,%r9 > + xorq 8(%rsi,%rcx,1),%r8 > + movzwq (%r11,%r13,2),%r13 > + shlq $60,%r10 > + xorq (%rsi,%rcx,1),%r9 > + xorq %r10,%r8 > + shlq $48,%r13 > + bswapq %r8 > + xorq %r13,%r9 > + bswapq %r9 > + cmpq %r15,%r14 > + jb .Louter_loop > + movq %r8,8(%rdi) > + movq %r9,(%rdi) > + > + leaq 280+48(%rsp),%rsi > +.cfi_def_cfa %rsi,8 > + movq -48(%rsi),%r15 > +.cfi_restore %r15 > + movq -40(%rsi),%r14 > +.cfi_restore %r14 > + movq -32(%rsi),%r13 > +.cfi_restore %r13 > + movq -24(%rsi),%r12 > +.cfi_restore %r12 > + movq -16(%rsi),%rbp > +.cfi_restore %rbp > + movq -8(%rsi),%rbx > +.cfi_restore %rbx > + leaq 0(%rsi),%rsp > +.cfi_def_cfa_register %rsp > +.Lghash_epilogue: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size gcm_ghash_4bit,.-gcm_ghash_4bit > +.globl gcm_init_clmul > +.type gcm_init_clmul,@function > +.align 16 > +gcm_init_clmul: > +.cfi_startproc > +.L_init_clmul: > + movdqu (%rsi),%xmm2 > + pshufd $78,%xmm2,%xmm2 > + > + > + pshufd $255,%xmm2,%xmm4 > + movdqa %xmm2,%xmm3 > + psllq $1,%xmm2 > + pxor %xmm5,%xmm5 > + psrlq $63,%xmm3 > + pcmpgtd %xmm4,%xmm5 > + pslldq $8,%xmm3 > + por %xmm3,%xmm2 > + > + > + pand .L0x1c2_polynomial(%rip),%xmm5 > + pxor %xmm5,%xmm2 > + > + > + pshufd $78,%xmm2,%xmm6 > + movdqa %xmm2,%xmm0 > + pxor %xmm2,%xmm6 > + movdqa %xmm0,%xmm1 > + pshufd $78,%xmm0,%xmm3 > + pxor %xmm0,%xmm3 > +.byte 102,15,58,68,194,0 > +.byte 102,15,58,68,202,17 > +.byte 102,15,58,68,222,0 > + pxor %xmm0,%xmm3 > + pxor %xmm1,%xmm3 > + > + movdqa %xmm3,%xmm4 > + psrldq $8,%xmm3 > + pslldq $8,%xmm4 > + pxor %xmm3,%xmm1 > + pxor %xmm4,%xmm0 > + > + movdqa %xmm0,%xmm4 > + movdqa %xmm0,%xmm3 > + psllq $5,%xmm0 > + pxor %xmm0,%xmm3 > + psllq $1,%xmm0 > + pxor %xmm3,%xmm0 > + psllq $57,%xmm0 > + movdqa %xmm0,%xmm3 > + pslldq $8,%xmm0 > + psrldq $8,%xmm3 > + pxor %xmm4,%xmm0 > + pxor %xmm3,%xmm1 > + > + > + movdqa %xmm0,%xmm4 > + psrlq $1,%xmm0 > + pxor %xmm4,%xmm1 > + pxor %xmm0,%xmm4 > + psrlq $5,%xmm0 > + pxor %xmm4,%xmm0 > + psrlq $1,%xmm0 > + pxor %xmm1,%xmm0 > + pshufd $78,%xmm2,%xmm3 > + pshufd $78,%xmm0,%xmm4 > + pxor %xmm2,%xmm3 > + movdqu %xmm2,0(%rdi) > + pxor %xmm0,%xmm4 > + movdqu %xmm0,16(%rdi) > +.byte 102,15,58,15,227,8 > + movdqu %xmm4,32(%rdi) 
> + movdqa %xmm0,%xmm1 > + pshufd $78,%xmm0,%xmm3 > + pxor %xmm0,%xmm3 > +.byte 102,15,58,68,194,0 > +.byte 102,15,58,68,202,17 > +.byte 102,15,58,68,222,0 > + pxor %xmm0,%xmm3 > + pxor %xmm1,%xmm3 > + > + movdqa %xmm3,%xmm4 > + psrldq $8,%xmm3 > + pslldq $8,%xmm4 > + pxor %xmm3,%xmm1 > + pxor %xmm4,%xmm0 > + > + movdqa %xmm0,%xmm4 > + movdqa %xmm0,%xmm3 > + psllq $5,%xmm0 > + pxor %xmm0,%xmm3 > + psllq $1,%xmm0 > + pxor %xmm3,%xmm0 > + psllq $57,%xmm0 > + movdqa %xmm0,%xmm3 > + pslldq $8,%xmm0 > + psrldq $8,%xmm3 > + pxor %xmm4,%xmm0 > + pxor %xmm3,%xmm1 > + > + > + movdqa %xmm0,%xmm4 > + psrlq $1,%xmm0 > + pxor %xmm4,%xmm1 > + pxor %xmm0,%xmm4 > + psrlq $5,%xmm0 > + pxor %xmm4,%xmm0 > + psrlq $1,%xmm0 > + pxor %xmm1,%xmm0 > + movdqa %xmm0,%xmm5 > + movdqa %xmm0,%xmm1 > + pshufd $78,%xmm0,%xmm3 > + pxor %xmm0,%xmm3 > +.byte 102,15,58,68,194,0 > +.byte 102,15,58,68,202,17 > +.byte 102,15,58,68,222,0 > + pxor %xmm0,%xmm3 > + pxor %xmm1,%xmm3 > + > + movdqa %xmm3,%xmm4 > + psrldq $8,%xmm3 > + pslldq $8,%xmm4 > + pxor %xmm3,%xmm1 > + pxor %xmm4,%xmm0 > + > + movdqa %xmm0,%xmm4 > + movdqa %xmm0,%xmm3 > + psllq $5,%xmm0 > + pxor %xmm0,%xmm3 > + psllq $1,%xmm0 > + pxor %xmm3,%xmm0 > + psllq $57,%xmm0 > + movdqa %xmm0,%xmm3 > + pslldq $8,%xmm0 > + psrldq $8,%xmm3 > + pxor %xmm4,%xmm0 > + pxor %xmm3,%xmm1 > + > + > + movdqa %xmm0,%xmm4 > + psrlq $1,%xmm0 > + pxor %xmm4,%xmm1 > + pxor %xmm0,%xmm4 > + psrlq $5,%xmm0 > + pxor %xmm4,%xmm0 > + psrlq $1,%xmm0 > + pxor %xmm1,%xmm0 > + pshufd $78,%xmm5,%xmm3 > + pshufd $78,%xmm0,%xmm4 > + pxor %xmm5,%xmm3 > + movdqu %xmm5,48(%rdi) > + pxor %xmm0,%xmm4 > + movdqu %xmm0,64(%rdi) > +.byte 102,15,58,15,227,8 > + movdqu %xmm4,80(%rdi) > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size gcm_init_clmul,.-gcm_init_clmul > +.globl gcm_gmult_clmul > +.type gcm_gmult_clmul,@function > +.align 16 > +gcm_gmult_clmul: > +.cfi_startproc > +.L_gmult_clmul: > + movdqu (%rdi),%xmm0 > + movdqa .Lbswap_mask(%rip),%xmm5 > + movdqu (%rsi),%xmm2 > + movdqu 32(%rsi),%xmm4 > +.byte 102,15,56,0,197 > + movdqa %xmm0,%xmm1 > + pshufd $78,%xmm0,%xmm3 > + pxor %xmm0,%xmm3 > +.byte 102,15,58,68,194,0 > +.byte 102,15,58,68,202,17 > +.byte 102,15,58,68,220,0 > + pxor %xmm0,%xmm3 > + pxor %xmm1,%xmm3 > + > + movdqa %xmm3,%xmm4 > + psrldq $8,%xmm3 > + pslldq $8,%xmm4 > + pxor %xmm3,%xmm1 > + pxor %xmm4,%xmm0 > + > + movdqa %xmm0,%xmm4 > + movdqa %xmm0,%xmm3 > + psllq $5,%xmm0 > + pxor %xmm0,%xmm3 > + psllq $1,%xmm0 > + pxor %xmm3,%xmm0 > + psllq $57,%xmm0 > + movdqa %xmm0,%xmm3 > + pslldq $8,%xmm0 > + psrldq $8,%xmm3 > + pxor %xmm4,%xmm0 > + pxor %xmm3,%xmm1 > + > + > + movdqa %xmm0,%xmm4 > + psrlq $1,%xmm0 > + pxor %xmm4,%xmm1 > + pxor %xmm0,%xmm4 > + psrlq $5,%xmm0 > + pxor %xmm4,%xmm0 > + psrlq $1,%xmm0 > + pxor %xmm1,%xmm0 > +.byte 102,15,56,0,197 > + movdqu %xmm0,(%rdi) > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size gcm_gmult_clmul,.-gcm_gmult_clmul > +.globl gcm_ghash_clmul > +.type gcm_ghash_clmul,@function > +.align 32 > +gcm_ghash_clmul: > +.cfi_startproc > +.L_ghash_clmul: > + movdqa .Lbswap_mask(%rip),%xmm10 > + > + movdqu (%rdi),%xmm0 > + movdqu (%rsi),%xmm2 > + movdqu 32(%rsi),%xmm7 > +.byte 102,65,15,56,0,194 > + > + subq $0x10,%rcx > + jz .Lodd_tail > + > + movdqu 16(%rsi),%xmm6 > + movl OPENSSL_ia32cap_P+4(%rip),%eax > + cmpq $0x30,%rcx > + jb .Lskip4x > + > + andl $71303168,%eax > + cmpl $4194304,%eax > + je .Lskip4x > + > + subq $0x30,%rcx > + movq $0xA040608020C0E000,%rax > + movdqu 48(%rsi),%xmm14 > + movdqu 64(%rsi),%xmm15 > + > + > + > + > + movdqu 48(%rdx),%xmm3 > + movdqu 
32(%rdx),%xmm11 > +.byte 102,65,15,56,0,218 > +.byte 102,69,15,56,0,218 > + movdqa %xmm3,%xmm5 > + pshufd $78,%xmm3,%xmm4 > + pxor %xmm3,%xmm4 > +.byte 102,15,58,68,218,0 > +.byte 102,15,58,68,234,17 > +.byte 102,15,58,68,231,0 > + > + movdqa %xmm11,%xmm13 > + pshufd $78,%xmm11,%xmm12 > + pxor %xmm11,%xmm12 > +.byte 102,68,15,58,68,222,0 > +.byte 102,68,15,58,68,238,17 > +.byte 102,68,15,58,68,231,16 > + xorps %xmm11,%xmm3 > + xorps %xmm13,%xmm5 > + movups 80(%rsi),%xmm7 > + xorps %xmm12,%xmm4 > + > + movdqu 16(%rdx),%xmm11 > + movdqu 0(%rdx),%xmm8 > +.byte 102,69,15,56,0,218 > +.byte 102,69,15,56,0,194 > + movdqa %xmm11,%xmm13 > + pshufd $78,%xmm11,%xmm12 > + pxor %xmm8,%xmm0 > + pxor %xmm11,%xmm12 > +.byte 102,69,15,58,68,222,0 > + movdqa %xmm0,%xmm1 > + pshufd $78,%xmm0,%xmm8 > + pxor %xmm0,%xmm8 > +.byte 102,69,15,58,68,238,17 > +.byte 102,68,15,58,68,231,0 > + xorps %xmm11,%xmm3 > + xorps %xmm13,%xmm5 > + > + leaq 64(%rdx),%rdx > + subq $0x40,%rcx > + jc .Ltail4x > + > + jmp .Lmod4_loop > +.align 32 > +.Lmod4_loop: > +.byte 102,65,15,58,68,199,0 > + xorps %xmm12,%xmm4 > + movdqu 48(%rdx),%xmm11 > +.byte 102,69,15,56,0,218 > +.byte 102,65,15,58,68,207,17 > + xorps %xmm3,%xmm0 > + movdqu 32(%rdx),%xmm3 > + movdqa %xmm11,%xmm13 > +.byte 102,68,15,58,68,199,16 > + pshufd $78,%xmm11,%xmm12 > + xorps %xmm5,%xmm1 > + pxor %xmm11,%xmm12 > +.byte 102,65,15,56,0,218 > + movups 32(%rsi),%xmm7 > + xorps %xmm4,%xmm8 > +.byte 102,68,15,58,68,218,0 > + pshufd $78,%xmm3,%xmm4 > + > + pxor %xmm0,%xmm8 > + movdqa %xmm3,%xmm5 > + pxor %xmm1,%xmm8 > + pxor %xmm3,%xmm4 > + movdqa %xmm8,%xmm9 > +.byte 102,68,15,58,68,234,17 > + pslldq $8,%xmm8 > + psrldq $8,%xmm9 > + pxor %xmm8,%xmm0 > + movdqa .L7_mask(%rip),%xmm8 > + pxor %xmm9,%xmm1 > +.byte 102,76,15,110,200 > + > + pand %xmm0,%xmm8 > +.byte 102,69,15,56,0,200 > + pxor %xmm0,%xmm9 > +.byte 102,68,15,58,68,231,0 > + psllq $57,%xmm9 > + movdqa %xmm9,%xmm8 > + pslldq $8,%xmm9 > +.byte 102,15,58,68,222,0 > + psrldq $8,%xmm8 > + pxor %xmm9,%xmm0 > + pxor %xmm8,%xmm1 > + movdqu 0(%rdx),%xmm8 > + > + movdqa %xmm0,%xmm9 > + psrlq $1,%xmm0 > +.byte 102,15,58,68,238,17 > + xorps %xmm11,%xmm3 > + movdqu 16(%rdx),%xmm11 > +.byte 102,69,15,56,0,218 > +.byte 102,15,58,68,231,16 > + xorps %xmm13,%xmm5 > + movups 80(%rsi),%xmm7 > +.byte 102,69,15,56,0,194 > + pxor %xmm9,%xmm1 > + pxor %xmm0,%xmm9 > + psrlq $5,%xmm0 > + > + movdqa %xmm11,%xmm13 > + pxor %xmm12,%xmm4 > + pshufd $78,%xmm11,%xmm12 > + pxor %xmm9,%xmm0 > + pxor %xmm8,%xmm1 > + pxor %xmm11,%xmm12 > +.byte 102,69,15,58,68,222,0 > + psrlq $1,%xmm0 > + pxor %xmm1,%xmm0 > + movdqa %xmm0,%xmm1 > +.byte 102,69,15,58,68,238,17 > + xorps %xmm11,%xmm3 > + pshufd $78,%xmm0,%xmm8 > + pxor %xmm0,%xmm8 > + > +.byte 102,68,15,58,68,231,0 > + xorps %xmm13,%xmm5 > + > + leaq 64(%rdx),%rdx > + subq $0x40,%rcx > + jnc .Lmod4_loop > + > +.Ltail4x: > +.byte 102,65,15,58,68,199,0 > +.byte 102,65,15,58,68,207,17 > +.byte 102,68,15,58,68,199,16 > + xorps %xmm12,%xmm4 > + xorps %xmm3,%xmm0 > + xorps %xmm5,%xmm1 > + pxor %xmm0,%xmm1 > + pxor %xmm4,%xmm8 > + > + pxor %xmm1,%xmm8 > + pxor %xmm0,%xmm1 > + > + movdqa %xmm8,%xmm9 > + psrldq $8,%xmm8 > + pslldq $8,%xmm9 > + pxor %xmm8,%xmm1 > + pxor %xmm9,%xmm0 > + > + movdqa %xmm0,%xmm4 > + movdqa %xmm0,%xmm3 > + psllq $5,%xmm0 > + pxor %xmm0,%xmm3 > + psllq $1,%xmm0 > + pxor %xmm3,%xmm0 > + psllq $57,%xmm0 > + movdqa %xmm0,%xmm3 > + pslldq $8,%xmm0 > + psrldq $8,%xmm3 > + pxor %xmm4,%xmm0 > + pxor %xmm3,%xmm1 > + > + > + movdqa %xmm0,%xmm4 > + psrlq $1,%xmm0 > + pxor %xmm4,%xmm1 > + 
pxor %xmm0,%xmm4 > + psrlq $5,%xmm0 > + pxor %xmm4,%xmm0 > + psrlq $1,%xmm0 > + pxor %xmm1,%xmm0 > + addq $0x40,%rcx > + jz .Ldone > + movdqu 32(%rsi),%xmm7 > + subq $0x10,%rcx > + jz .Lodd_tail > +.Lskip4x: > + > + > + > + > + > + movdqu (%rdx),%xmm8 > + movdqu 16(%rdx),%xmm3 > +.byte 102,69,15,56,0,194 > +.byte 102,65,15,56,0,218 > + pxor %xmm8,%xmm0 > + > + movdqa %xmm3,%xmm5 > + pshufd $78,%xmm3,%xmm4 > + pxor %xmm3,%xmm4 > +.byte 102,15,58,68,218,0 > +.byte 102,15,58,68,234,17 > +.byte 102,15,58,68,231,0 > + > + leaq 32(%rdx),%rdx > + nop > + subq $0x20,%rcx > + jbe .Leven_tail > + nop > + jmp .Lmod_loop > + > +.align 32 > +.Lmod_loop: > + movdqa %xmm0,%xmm1 > + movdqa %xmm4,%xmm8 > + pshufd $78,%xmm0,%xmm4 > + pxor %xmm0,%xmm4 > + > +.byte 102,15,58,68,198,0 > +.byte 102,15,58,68,206,17 > +.byte 102,15,58,68,231,16 > + > + pxor %xmm3,%xmm0 > + pxor %xmm5,%xmm1 > + movdqu (%rdx),%xmm9 > + pxor %xmm0,%xmm8 > +.byte 102,69,15,56,0,202 > + movdqu 16(%rdx),%xmm3 > + > + pxor %xmm1,%xmm8 > + pxor %xmm9,%xmm1 > + pxor %xmm8,%xmm4 > +.byte 102,65,15,56,0,218 > + movdqa %xmm4,%xmm8 > + psrldq $8,%xmm8 > + pslldq $8,%xmm4 > + pxor %xmm8,%xmm1 > + pxor %xmm4,%xmm0 > + > + movdqa %xmm3,%xmm5 > + > + movdqa %xmm0,%xmm9 > + movdqa %xmm0,%xmm8 > + psllq $5,%xmm0 > + pxor %xmm0,%xmm8 > +.byte 102,15,58,68,218,0 > + psllq $1,%xmm0 > + pxor %xmm8,%xmm0 > + psllq $57,%xmm0 > + movdqa %xmm0,%xmm8 > + pslldq $8,%xmm0 > + psrldq $8,%xmm8 > + pxor %xmm9,%xmm0 > + pshufd $78,%xmm5,%xmm4 > + pxor %xmm8,%xmm1 > + pxor %xmm5,%xmm4 > + > + movdqa %xmm0,%xmm9 > + psrlq $1,%xmm0 > +.byte 102,15,58,68,234,17 > + pxor %xmm9,%xmm1 > + pxor %xmm0,%xmm9 > + psrlq $5,%xmm0 > + pxor %xmm9,%xmm0 > + leaq 32(%rdx),%rdx > + psrlq $1,%xmm0 > +.byte 102,15,58,68,231,0 > + pxor %xmm1,%xmm0 > + > + subq $0x20,%rcx > + ja .Lmod_loop > + > +.Leven_tail: > + movdqa %xmm0,%xmm1 > + movdqa %xmm4,%xmm8 > + pshufd $78,%xmm0,%xmm4 > + pxor %xmm0,%xmm4 > + > +.byte 102,15,58,68,198,0 > +.byte 102,15,58,68,206,17 > +.byte 102,15,58,68,231,16 > + > + pxor %xmm3,%xmm0 > + pxor %xmm5,%xmm1 > + pxor %xmm0,%xmm8 > + pxor %xmm1,%xmm8 > + pxor %xmm8,%xmm4 > + movdqa %xmm4,%xmm8 > + psrldq $8,%xmm8 > + pslldq $8,%xmm4 > + pxor %xmm8,%xmm1 > + pxor %xmm4,%xmm0 > + > + movdqa %xmm0,%xmm4 > + movdqa %xmm0,%xmm3 > + psllq $5,%xmm0 > + pxor %xmm0,%xmm3 > + psllq $1,%xmm0 > + pxor %xmm3,%xmm0 > + psllq $57,%xmm0 > + movdqa %xmm0,%xmm3 > + pslldq $8,%xmm0 > + psrldq $8,%xmm3 > + pxor %xmm4,%xmm0 > + pxor %xmm3,%xmm1 > + > + > + movdqa %xmm0,%xmm4 > + psrlq $1,%xmm0 > + pxor %xmm4,%xmm1 > + pxor %xmm0,%xmm4 > + psrlq $5,%xmm0 > + pxor %xmm4,%xmm0 > + psrlq $1,%xmm0 > + pxor %xmm1,%xmm0 > + testq %rcx,%rcx > + jnz .Ldone > + > +.Lodd_tail: > + movdqu (%rdx),%xmm8 > +.byte 102,69,15,56,0,194 > + pxor %xmm8,%xmm0 > + movdqa %xmm0,%xmm1 > + pshufd $78,%xmm0,%xmm3 > + pxor %xmm0,%xmm3 > +.byte 102,15,58,68,194,0 > +.byte 102,15,58,68,202,17 > +.byte 102,15,58,68,223,0 > + pxor %xmm0,%xmm3 > + pxor %xmm1,%xmm3 > + > + movdqa %xmm3,%xmm4 > + psrldq $8,%xmm3 > + pslldq $8,%xmm4 > + pxor %xmm3,%xmm1 > + pxor %xmm4,%xmm0 > + > + movdqa %xmm0,%xmm4 > + movdqa %xmm0,%xmm3 > + psllq $5,%xmm0 > + pxor %xmm0,%xmm3 > + psllq $1,%xmm0 > + pxor %xmm3,%xmm0 > + psllq $57,%xmm0 > + movdqa %xmm0,%xmm3 > + pslldq $8,%xmm0 > + psrldq $8,%xmm3 > + pxor %xmm4,%xmm0 > + pxor %xmm3,%xmm1 > + > + > + movdqa %xmm0,%xmm4 > + psrlq $1,%xmm0 > + pxor %xmm4,%xmm1 > + pxor %xmm0,%xmm4 > + psrlq $5,%xmm0 > + pxor %xmm4,%xmm0 > + psrlq $1,%xmm0 > + pxor %xmm1,%xmm0 > +.Ldone: > +.byte 
102,65,15,56,0,194 > + movdqu %xmm0,(%rdi) > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size gcm_ghash_clmul,.-gcm_ghash_clmul > +.globl gcm_init_avx > +.type gcm_init_avx,@function > +.align 32 > +gcm_init_avx: > +.cfi_startproc > + jmp .L_init_clmul > +.cfi_endproc > +.size gcm_init_avx,.-gcm_init_avx > +.globl gcm_gmult_avx > +.type gcm_gmult_avx,@function > +.align 32 > +gcm_gmult_avx: > +.cfi_startproc > + jmp .L_gmult_clmul > +.cfi_endproc > +.size gcm_gmult_avx,.-gcm_gmult_avx > +.globl gcm_ghash_avx > +.type gcm_ghash_avx,@function > +.align 32 > +gcm_ghash_avx: > +.cfi_startproc > + jmp .L_ghash_clmul > +.cfi_endproc > +.size gcm_ghash_avx,.-gcm_ghash_avx > +.align 64 > +.Lbswap_mask: > +.byte 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0 > +.L0x1c2_polynomial: > +.byte 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0xc2 > +.L7_mask: > +.long 7,0,7,0 > +.L7_mask_poly: > +.long 7,0,450,0 > +.align 64 > +.type .Lrem_4bit,@object > +.Lrem_4bit: > +.long 0,0,0,471859200,0,943718400,0,610271232 > +.long 0,1887436800,0,1822425088,0,1220542464,0,1423966208 > +.long 0,3774873600,0,4246732800,0,3644850176,0,3311403008 > +.long 0,2441084928,0,2376073216,0,2847932416,0,3051356160 > +.type .Lrem_8bit,@object > +.Lrem_8bit: > +.value 0x0000,0x01C2,0x0384,0x0246,0x0708,0x06CA,0x048C,0x054E > +.value 0x0E10,0x0FD2,0x0D94,0x0C56,0x0918,0x08DA,0x0A9C,0x0B5E > +.value 0x1C20,0x1DE2,0x1FA4,0x1E66,0x1B28,0x1AEA,0x18AC,0x196E > +.value 0x1230,0x13F2,0x11B4,0x1076,0x1538,0x14FA,0x16BC,0x177E > +.value 0x3840,0x3982,0x3BC4,0x3A06,0x3F48,0x3E8A,0x3CCC,0x3D0E > +.value 0x3650,0x3792,0x35D4,0x3416,0x3158,0x309A,0x32DC,0x331E > +.value 0x2460,0x25A2,0x27E4,0x2626,0x2368,0x22AA,0x20EC,0x212E > +.value 0x2A70,0x2BB2,0x29F4,0x2836,0x2D78,0x2CBA,0x2EFC,0x2F3E > +.value 0x7080,0x7142,0x7304,0x72C6,0x7788,0x764A,0x740C,0x75CE > +.value 0x7E90,0x7F52,0x7D14,0x7CD6,0x7998,0x785A,0x7A1C,0x7BDE > +.value 0x6CA0,0x6D62,0x6F24,0x6EE6,0x6BA8,0x6A6A,0x682C,0x69EE > +.value 0x62B0,0x6372,0x6134,0x60F6,0x65B8,0x647A,0x663C,0x67FE > +.value 0x48C0,0x4902,0x4B44,0x4A86,0x4FC8,0x4E0A,0x4C4C,0x4D8E > +.value 0x46D0,0x4712,0x4554,0x4496,0x41D8,0x401A,0x425C,0x439E > +.value 0x54E0,0x5522,0x5764,0x56A6,0x53E8,0x522A,0x506C,0x51AE > +.value 0x5AF0,0x5B32,0x5974,0x58B6,0x5DF8,0x5C3A,0x5E7C,0x5FBE > +.value 0xE100,0xE0C2,0xE284,0xE346,0xE608,0xE7CA,0xE58C,0xE44E > +.value 0xEF10,0xEED2,0xEC94,0xED56,0xE818,0xE9DA,0xEB9C,0xEA5E > +.value 0xFD20,0xFCE2,0xFEA4,0xFF66,0xFA28,0xFBEA,0xF9AC,0xF86E > +.value 0xF330,0xF2F2,0xF0B4,0xF176,0xF438,0xF5FA,0xF7BC,0xF67E > +.value 0xD940,0xD882,0xDAC4,0xDB06,0xDE48,0xDF8A,0xDDCC,0xDC0E > +.value 0xD750,0xD692,0xD4D4,0xD516,0xD058,0xD19A,0xD3DC,0xD21E > +.value 0xC560,0xC4A2,0xC6E4,0xC726,0xC268,0xC3AA,0xC1EC,0xC02E > +.value 0xCB70,0xCAB2,0xC8F4,0xC936,0xCC78,0xCDBA,0xCFFC,0xCE3E > +.value 0x9180,0x9042,0x9204,0x93C6,0x9688,0x974A,0x950C,0x94CE > +.value 0x9F90,0x9E52,0x9C14,0x9DD6,0x9898,0x995A,0x9B1C,0x9ADE > +.value 0x8DA0,0x8C62,0x8E24,0x8FE6,0x8AA8,0x8B6A,0x892C,0x88EE > +.value 0x83B0,0x8272,0x8034,0x81F6,0x84B8,0x857A,0x873C,0x86FE > +.value 0xA9C0,0xA802,0xAA44,0xAB86,0xAEC8,0xAF0A,0xAD4C,0xAC8E > +.value 0xA7D0,0xA612,0xA454,0xA596,0xA0D8,0xA11A,0xA35C,0xA29E > +.value 0xB5E0,0xB422,0xB664,0xB7A6,0xB2E8,0xB32A,0xB16C,0xB0AE > +.value 0xBBF0,0xBA32,0xB874,0xB9B6,0xBCF8,0xBD3A,0xBF7C,0xBEBE > + > +.byte > 71,72,65,83,72,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79, > 71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115, > 108,46,111,114,103,62,0 > +.align 64 > diff --git 
a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-mb-x86_64.S > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-mb-x86_64.S > new file mode 100644 > index 0000000000..4572bc7227 > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-mb-x86_64.S > @@ -0,0 +1,2962 @@ > +# WARNING: do not edit! > +# Generated from openssl/crypto/sha/asm/sha1-mb-x86_64.pl > +# > +# Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. > +# > +# Licensed under the OpenSSL license (the "License"). You may not use > +# this file except in compliance with the License. You can obtain a copy > +# in the file LICENSE in the source distribution or at > +# https://www.openssl.org/source/license.html > + > +.text > + > + > + > +.globl sha1_multi_block > +.type sha1_multi_block,@function > +.align 32 > +sha1_multi_block: > +.cfi_startproc > + movq OPENSSL_ia32cap_P+4(%rip),%rcx > + btq $61,%rcx > + jc _shaext_shortcut > + movq %rsp,%rax > +.cfi_def_cfa_register %rax > + pushq %rbx > +.cfi_offset %rbx,-16 > + pushq %rbp > +.cfi_offset %rbx,-24 > + subq $288,%rsp > + andq $-256,%rsp > + movq %rax,272(%rsp) > +.cfi_escape 0x0f,0x06,0x77,0x90,0x02,0x06,0x23,0x08 > +.Lbody: > + leaq K_XX_XX(%rip),%rbp > + leaq 256(%rsp),%rbx > + > +.Loop_grande: > + movl %edx,280(%rsp) > + xorl %edx,%edx > + movq 0(%rsi),%r8 > + movl 8(%rsi),%ecx > + cmpl %edx,%ecx > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movl %ecx,0(%rbx) > + cmovleq %rbp,%r8 > + movq 16(%rsi),%r9 > + movl 24(%rsi),%ecx > + cmpl %edx,%ecx > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movl %ecx,4(%rbx) > + cmovleq %rbp,%r9 > + movq 32(%rsi),%r10 > + movl 40(%rsi),%ecx > + cmpl %edx,%ecx > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movl %ecx,8(%rbx) > + cmovleq %rbp,%r10 > + movq 48(%rsi),%r11 > + movl 56(%rsi),%ecx > + cmpl %edx,%ecx > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movl %ecx,12(%rbx) > + cmovleq %rbp,%r11 > + testl %edx,%edx > + jz .Ldone > + > + movdqu 0(%rdi),%xmm10 > + leaq 128(%rsp),%rax > + movdqu 32(%rdi),%xmm11 > + movdqu 64(%rdi),%xmm12 > + movdqu 96(%rdi),%xmm13 > + movdqu 128(%rdi),%xmm14 > + movdqa 96(%rbp),%xmm5 > + movdqa -32(%rbp),%xmm15 > + jmp .Loop > + > +.align 32 > +.Loop: > + movd (%r8),%xmm0 > + leaq 64(%r8),%r8 > + movd (%r9),%xmm2 > + leaq 64(%r9),%r9 > + movd (%r10),%xmm3 > + leaq 64(%r10),%r10 > + movd (%r11),%xmm4 > + leaq 64(%r11),%r11 > + punpckldq %xmm3,%xmm0 > + movd -60(%r8),%xmm1 > + punpckldq %xmm4,%xmm2 > + movd -60(%r9),%xmm9 > + punpckldq %xmm2,%xmm0 > + movd -60(%r10),%xmm8 > +.byte 102,15,56,0,197 > + movd -60(%r11),%xmm7 > + punpckldq %xmm8,%xmm1 > + movdqa %xmm10,%xmm8 > + paddd %xmm15,%xmm14 > + punpckldq %xmm7,%xmm9 > + movdqa %xmm11,%xmm7 > + movdqa %xmm11,%xmm6 > + pslld $5,%xmm8 > + pandn %xmm13,%xmm7 > + pand %xmm12,%xmm6 > + punpckldq %xmm9,%xmm1 > + movdqa %xmm10,%xmm9 > + > + movdqa %xmm0,0-128(%rax) > + paddd %xmm0,%xmm14 > + movd -56(%r8),%xmm2 > + psrld $27,%xmm9 > + pxor %xmm7,%xmm6 > + movdqa %xmm11,%xmm7 > + > + por %xmm9,%xmm8 > + movd -56(%r9),%xmm9 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm14 > + > + psrld $2,%xmm11 > + paddd %xmm8,%xmm14 > +.byte 102,15,56,0,205 > + movd -56(%r10),%xmm8 > + por %xmm7,%xmm11 > + movd -56(%r11),%xmm7 > + punpckldq %xmm8,%xmm2 > + movdqa %xmm14,%xmm8 > + paddd %xmm15,%xmm13 > + punpckldq %xmm7,%xmm9 > + movdqa %xmm10,%xmm7 > + movdqa %xmm10,%xmm6 > + pslld $5,%xmm8 > + pandn %xmm12,%xmm7 > + pand %xmm11,%xmm6 > + punpckldq %xmm9,%xmm2 > + movdqa %xmm14,%xmm9 > + > + movdqa %xmm1,16-128(%rax) > + paddd %xmm1,%xmm13 > + movd 
-52(%r8),%xmm3 > + psrld $27,%xmm9 > + pxor %xmm7,%xmm6 > + movdqa %xmm10,%xmm7 > + > + por %xmm9,%xmm8 > + movd -52(%r9),%xmm9 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm13 > + > + psrld $2,%xmm10 > + paddd %xmm8,%xmm13 > +.byte 102,15,56,0,213 > + movd -52(%r10),%xmm8 > + por %xmm7,%xmm10 > + movd -52(%r11),%xmm7 > + punpckldq %xmm8,%xmm3 > + movdqa %xmm13,%xmm8 > + paddd %xmm15,%xmm12 > + punpckldq %xmm7,%xmm9 > + movdqa %xmm14,%xmm7 > + movdqa %xmm14,%xmm6 > + pslld $5,%xmm8 > + pandn %xmm11,%xmm7 > + pand %xmm10,%xmm6 > + punpckldq %xmm9,%xmm3 > + movdqa %xmm13,%xmm9 > + > + movdqa %xmm2,32-128(%rax) > + paddd %xmm2,%xmm12 > + movd -48(%r8),%xmm4 > + psrld $27,%xmm9 > + pxor %xmm7,%xmm6 > + movdqa %xmm14,%xmm7 > + > + por %xmm9,%xmm8 > + movd -48(%r9),%xmm9 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm12 > + > + psrld $2,%xmm14 > + paddd %xmm8,%xmm12 > +.byte 102,15,56,0,221 > + movd -48(%r10),%xmm8 > + por %xmm7,%xmm14 > + movd -48(%r11),%xmm7 > + punpckldq %xmm8,%xmm4 > + movdqa %xmm12,%xmm8 > + paddd %xmm15,%xmm11 > + punpckldq %xmm7,%xmm9 > + movdqa %xmm13,%xmm7 > + movdqa %xmm13,%xmm6 > + pslld $5,%xmm8 > + pandn %xmm10,%xmm7 > + pand %xmm14,%xmm6 > + punpckldq %xmm9,%xmm4 > + movdqa %xmm12,%xmm9 > + > + movdqa %xmm3,48-128(%rax) > + paddd %xmm3,%xmm11 > + movd -44(%r8),%xmm0 > + psrld $27,%xmm9 > + pxor %xmm7,%xmm6 > + movdqa %xmm13,%xmm7 > + > + por %xmm9,%xmm8 > + movd -44(%r9),%xmm9 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm11 > + > + psrld $2,%xmm13 > + paddd %xmm8,%xmm11 > +.byte 102,15,56,0,229 > + movd -44(%r10),%xmm8 > + por %xmm7,%xmm13 > + movd -44(%r11),%xmm7 > + punpckldq %xmm8,%xmm0 > + movdqa %xmm11,%xmm8 > + paddd %xmm15,%xmm10 > + punpckldq %xmm7,%xmm9 > + movdqa %xmm12,%xmm7 > + movdqa %xmm12,%xmm6 > + pslld $5,%xmm8 > + pandn %xmm14,%xmm7 > + pand %xmm13,%xmm6 > + punpckldq %xmm9,%xmm0 > + movdqa %xmm11,%xmm9 > + > + movdqa %xmm4,64-128(%rax) > + paddd %xmm4,%xmm10 > + movd -40(%r8),%xmm1 > + psrld $27,%xmm9 > + pxor %xmm7,%xmm6 > + movdqa %xmm12,%xmm7 > + > + por %xmm9,%xmm8 > + movd -40(%r9),%xmm9 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm10 > + > + psrld $2,%xmm12 > + paddd %xmm8,%xmm10 > +.byte 102,15,56,0,197 > + movd -40(%r10),%xmm8 > + por %xmm7,%xmm12 > + movd -40(%r11),%xmm7 > + punpckldq %xmm8,%xmm1 > + movdqa %xmm10,%xmm8 > + paddd %xmm15,%xmm14 > + punpckldq %xmm7,%xmm9 > + movdqa %xmm11,%xmm7 > + movdqa %xmm11,%xmm6 > + pslld $5,%xmm8 > + pandn %xmm13,%xmm7 > + pand %xmm12,%xmm6 > + punpckldq %xmm9,%xmm1 > + movdqa %xmm10,%xmm9 > + > + movdqa %xmm0,80-128(%rax) > + paddd %xmm0,%xmm14 > + movd -36(%r8),%xmm2 > + psrld $27,%xmm9 > + pxor %xmm7,%xmm6 > + movdqa %xmm11,%xmm7 > + > + por %xmm9,%xmm8 > + movd -36(%r9),%xmm9 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm14 > + > + psrld $2,%xmm11 > + paddd %xmm8,%xmm14 > +.byte 102,15,56,0,205 > + movd -36(%r10),%xmm8 > + por %xmm7,%xmm11 > + movd -36(%r11),%xmm7 > + punpckldq %xmm8,%xmm2 > + movdqa %xmm14,%xmm8 > + paddd %xmm15,%xmm13 > + punpckldq %xmm7,%xmm9 > + movdqa %xmm10,%xmm7 > + movdqa %xmm10,%xmm6 > + pslld $5,%xmm8 > + pandn %xmm12,%xmm7 > + pand %xmm11,%xmm6 > + punpckldq %xmm9,%xmm2 > + movdqa %xmm14,%xmm9 > + > + movdqa %xmm1,96-128(%rax) > + paddd %xmm1,%xmm13 > + movd -32(%r8),%xmm3 > + psrld $27,%xmm9 > + pxor %xmm7,%xmm6 > + movdqa %xmm10,%xmm7 > + > + por %xmm9,%xmm8 > + movd -32(%r9),%xmm9 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm13 > + > + psrld $2,%xmm10 > + paddd %xmm8,%xmm13 > +.byte 102,15,56,0,213 > + movd -32(%r10),%xmm8 > + por %xmm7,%xmm10 > + movd -32(%r11),%xmm7 > + punpckldq %xmm8,%xmm3 > + 
movdqa %xmm13,%xmm8 > + paddd %xmm15,%xmm12 > + punpckldq %xmm7,%xmm9 > + movdqa %xmm14,%xmm7 > + movdqa %xmm14,%xmm6 > + pslld $5,%xmm8 > + pandn %xmm11,%xmm7 > + pand %xmm10,%xmm6 > + punpckldq %xmm9,%xmm3 > + movdqa %xmm13,%xmm9 > + > + movdqa %xmm2,112-128(%rax) > + paddd %xmm2,%xmm12 > + movd -28(%r8),%xmm4 > + psrld $27,%xmm9 > + pxor %xmm7,%xmm6 > + movdqa %xmm14,%xmm7 > + > + por %xmm9,%xmm8 > + movd -28(%r9),%xmm9 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm12 > + > + psrld $2,%xmm14 > + paddd %xmm8,%xmm12 > +.byte 102,15,56,0,221 > + movd -28(%r10),%xmm8 > + por %xmm7,%xmm14 > + movd -28(%r11),%xmm7 > + punpckldq %xmm8,%xmm4 > + movdqa %xmm12,%xmm8 > + paddd %xmm15,%xmm11 > + punpckldq %xmm7,%xmm9 > + movdqa %xmm13,%xmm7 > + movdqa %xmm13,%xmm6 > + pslld $5,%xmm8 > + pandn %xmm10,%xmm7 > + pand %xmm14,%xmm6 > + punpckldq %xmm9,%xmm4 > + movdqa %xmm12,%xmm9 > + > + movdqa %xmm3,128-128(%rax) > + paddd %xmm3,%xmm11 > + movd -24(%r8),%xmm0 > + psrld $27,%xmm9 > + pxor %xmm7,%xmm6 > + movdqa %xmm13,%xmm7 > + > + por %xmm9,%xmm8 > + movd -24(%r9),%xmm9 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm11 > + > + psrld $2,%xmm13 > + paddd %xmm8,%xmm11 > +.byte 102,15,56,0,229 > + movd -24(%r10),%xmm8 > + por %xmm7,%xmm13 > + movd -24(%r11),%xmm7 > + punpckldq %xmm8,%xmm0 > + movdqa %xmm11,%xmm8 > + paddd %xmm15,%xmm10 > + punpckldq %xmm7,%xmm9 > + movdqa %xmm12,%xmm7 > + movdqa %xmm12,%xmm6 > + pslld $5,%xmm8 > + pandn %xmm14,%xmm7 > + pand %xmm13,%xmm6 > + punpckldq %xmm9,%xmm0 > + movdqa %xmm11,%xmm9 > + > + movdqa %xmm4,144-128(%rax) > + paddd %xmm4,%xmm10 > + movd -20(%r8),%xmm1 > + psrld $27,%xmm9 > + pxor %xmm7,%xmm6 > + movdqa %xmm12,%xmm7 > + > + por %xmm9,%xmm8 > + movd -20(%r9),%xmm9 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm10 > + > + psrld $2,%xmm12 > + paddd %xmm8,%xmm10 > +.byte 102,15,56,0,197 > + movd -20(%r10),%xmm8 > + por %xmm7,%xmm12 > + movd -20(%r11),%xmm7 > + punpckldq %xmm8,%xmm1 > + movdqa %xmm10,%xmm8 > + paddd %xmm15,%xmm14 > + punpckldq %xmm7,%xmm9 > + movdqa %xmm11,%xmm7 > + movdqa %xmm11,%xmm6 > + pslld $5,%xmm8 > + pandn %xmm13,%xmm7 > + pand %xmm12,%xmm6 > + punpckldq %xmm9,%xmm1 > + movdqa %xmm10,%xmm9 > + > + movdqa %xmm0,160-128(%rax) > + paddd %xmm0,%xmm14 > + movd -16(%r8),%xmm2 > + psrld $27,%xmm9 > + pxor %xmm7,%xmm6 > + movdqa %xmm11,%xmm7 > + > + por %xmm9,%xmm8 > + movd -16(%r9),%xmm9 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm14 > + > + psrld $2,%xmm11 > + paddd %xmm8,%xmm14 > +.byte 102,15,56,0,205 > + movd -16(%r10),%xmm8 > + por %xmm7,%xmm11 > + movd -16(%r11),%xmm7 > + punpckldq %xmm8,%xmm2 > + movdqa %xmm14,%xmm8 > + paddd %xmm15,%xmm13 > + punpckldq %xmm7,%xmm9 > + movdqa %xmm10,%xmm7 > + movdqa %xmm10,%xmm6 > + pslld $5,%xmm8 > + pandn %xmm12,%xmm7 > + pand %xmm11,%xmm6 > + punpckldq %xmm9,%xmm2 > + movdqa %xmm14,%xmm9 > + > + movdqa %xmm1,176-128(%rax) > + paddd %xmm1,%xmm13 > + movd -12(%r8),%xmm3 > + psrld $27,%xmm9 > + pxor %xmm7,%xmm6 > + movdqa %xmm10,%xmm7 > + > + por %xmm9,%xmm8 > + movd -12(%r9),%xmm9 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm13 > + > + psrld $2,%xmm10 > + paddd %xmm8,%xmm13 > +.byte 102,15,56,0,213 > + movd -12(%r10),%xmm8 > + por %xmm7,%xmm10 > + movd -12(%r11),%xmm7 > + punpckldq %xmm8,%xmm3 > + movdqa %xmm13,%xmm8 > + paddd %xmm15,%xmm12 > + punpckldq %xmm7,%xmm9 > + movdqa %xmm14,%xmm7 > + movdqa %xmm14,%xmm6 > + pslld $5,%xmm8 > + pandn %xmm11,%xmm7 > + pand %xmm10,%xmm6 > + punpckldq %xmm9,%xmm3 > + movdqa %xmm13,%xmm9 > + > + movdqa %xmm2,192-128(%rax) > + paddd %xmm2,%xmm12 > + movd -8(%r8),%xmm4 > + psrld $27,%xmm9 > + 
pxor %xmm7,%xmm6 > + movdqa %xmm14,%xmm7 > + > + por %xmm9,%xmm8 > + movd -8(%r9),%xmm9 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm12 > + > + psrld $2,%xmm14 > + paddd %xmm8,%xmm12 > +.byte 102,15,56,0,221 > + movd -8(%r10),%xmm8 > + por %xmm7,%xmm14 > + movd -8(%r11),%xmm7 > + punpckldq %xmm8,%xmm4 > + movdqa %xmm12,%xmm8 > + paddd %xmm15,%xmm11 > + punpckldq %xmm7,%xmm9 > + movdqa %xmm13,%xmm7 > + movdqa %xmm13,%xmm6 > + pslld $5,%xmm8 > + pandn %xmm10,%xmm7 > + pand %xmm14,%xmm6 > + punpckldq %xmm9,%xmm4 > + movdqa %xmm12,%xmm9 > + > + movdqa %xmm3,208-128(%rax) > + paddd %xmm3,%xmm11 > + movd -4(%r8),%xmm0 > + psrld $27,%xmm9 > + pxor %xmm7,%xmm6 > + movdqa %xmm13,%xmm7 > + > + por %xmm9,%xmm8 > + movd -4(%r9),%xmm9 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm11 > + > + psrld $2,%xmm13 > + paddd %xmm8,%xmm11 > +.byte 102,15,56,0,229 > + movd -4(%r10),%xmm8 > + por %xmm7,%xmm13 > + movdqa 0-128(%rax),%xmm1 > + movd -4(%r11),%xmm7 > + punpckldq %xmm8,%xmm0 > + movdqa %xmm11,%xmm8 > + paddd %xmm15,%xmm10 > + punpckldq %xmm7,%xmm9 > + movdqa %xmm12,%xmm7 > + movdqa %xmm12,%xmm6 > + pslld $5,%xmm8 > + prefetcht0 63(%r8) > + pandn %xmm14,%xmm7 > + pand %xmm13,%xmm6 > + punpckldq %xmm9,%xmm0 > + movdqa %xmm11,%xmm9 > + > + movdqa %xmm4,224-128(%rax) > + paddd %xmm4,%xmm10 > + psrld $27,%xmm9 > + pxor %xmm7,%xmm6 > + movdqa %xmm12,%xmm7 > + prefetcht0 63(%r9) > + > + por %xmm9,%xmm8 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm10 > + prefetcht0 63(%r10) > + > + psrld $2,%xmm12 > + paddd %xmm8,%xmm10 > +.byte 102,15,56,0,197 > + prefetcht0 63(%r11) > + por %xmm7,%xmm12 > + movdqa 16-128(%rax),%xmm2 > + pxor %xmm3,%xmm1 > + movdqa 32-128(%rax),%xmm3 > + > + movdqa %xmm10,%xmm8 > + pxor 128-128(%rax),%xmm1 > + paddd %xmm15,%xmm14 > + movdqa %xmm11,%xmm7 > + pslld $5,%xmm8 > + pxor %xmm3,%xmm1 > + movdqa %xmm11,%xmm6 > + pandn %xmm13,%xmm7 > + movdqa %xmm1,%xmm5 > + pand %xmm12,%xmm6 > + movdqa %xmm10,%xmm9 > + psrld $31,%xmm5 > + paddd %xmm1,%xmm1 > + > + movdqa %xmm0,240-128(%rax) > + paddd %xmm0,%xmm14 > + psrld $27,%xmm9 > + pxor %xmm7,%xmm6 > + > + movdqa %xmm11,%xmm7 > + por %xmm9,%xmm8 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm14 > + > + psrld $2,%xmm11 > + paddd %xmm8,%xmm14 > + por %xmm5,%xmm1 > + por %xmm7,%xmm11 > + pxor %xmm4,%xmm2 > + movdqa 48-128(%rax),%xmm4 > + > + movdqa %xmm14,%xmm8 > + pxor 144-128(%rax),%xmm2 > + paddd %xmm15,%xmm13 > + movdqa %xmm10,%xmm7 > + pslld $5,%xmm8 > + pxor %xmm4,%xmm2 > + movdqa %xmm10,%xmm6 > + pandn %xmm12,%xmm7 > + movdqa %xmm2,%xmm5 > + pand %xmm11,%xmm6 > + movdqa %xmm14,%xmm9 > + psrld $31,%xmm5 > + paddd %xmm2,%xmm2 > + > + movdqa %xmm1,0-128(%rax) > + paddd %xmm1,%xmm13 > + psrld $27,%xmm9 > + pxor %xmm7,%xmm6 > + > + movdqa %xmm10,%xmm7 > + por %xmm9,%xmm8 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm13 > + > + psrld $2,%xmm10 > + paddd %xmm8,%xmm13 > + por %xmm5,%xmm2 > + por %xmm7,%xmm10 > + pxor %xmm0,%xmm3 > + movdqa 64-128(%rax),%xmm0 > + > + movdqa %xmm13,%xmm8 > + pxor 160-128(%rax),%xmm3 > + paddd %xmm15,%xmm12 > + movdqa %xmm14,%xmm7 > + pslld $5,%xmm8 > + pxor %xmm0,%xmm3 > + movdqa %xmm14,%xmm6 > + pandn %xmm11,%xmm7 > + movdqa %xmm3,%xmm5 > + pand %xmm10,%xmm6 > + movdqa %xmm13,%xmm9 > + psrld $31,%xmm5 > + paddd %xmm3,%xmm3 > + > + movdqa %xmm2,16-128(%rax) > + paddd %xmm2,%xmm12 > + psrld $27,%xmm9 > + pxor %xmm7,%xmm6 > + > + movdqa %xmm14,%xmm7 > + por %xmm9,%xmm8 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm12 > + > + psrld $2,%xmm14 > + paddd %xmm8,%xmm12 > + por %xmm5,%xmm3 > + por %xmm7,%xmm14 > + pxor %xmm1,%xmm4 > + movdqa 80-128(%rax),%xmm1 > 
+ > + movdqa %xmm12,%xmm8 > + pxor 176-128(%rax),%xmm4 > + paddd %xmm15,%xmm11 > + movdqa %xmm13,%xmm7 > + pslld $5,%xmm8 > + pxor %xmm1,%xmm4 > + movdqa %xmm13,%xmm6 > + pandn %xmm10,%xmm7 > + movdqa %xmm4,%xmm5 > + pand %xmm14,%xmm6 > + movdqa %xmm12,%xmm9 > + psrld $31,%xmm5 > + paddd %xmm4,%xmm4 > + > + movdqa %xmm3,32-128(%rax) > + paddd %xmm3,%xmm11 > + psrld $27,%xmm9 > + pxor %xmm7,%xmm6 > + > + movdqa %xmm13,%xmm7 > + por %xmm9,%xmm8 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm11 > + > + psrld $2,%xmm13 > + paddd %xmm8,%xmm11 > + por %xmm5,%xmm4 > + por %xmm7,%xmm13 > + pxor %xmm2,%xmm0 > + movdqa 96-128(%rax),%xmm2 > + > + movdqa %xmm11,%xmm8 > + pxor 192-128(%rax),%xmm0 > + paddd %xmm15,%xmm10 > + movdqa %xmm12,%xmm7 > + pslld $5,%xmm8 > + pxor %xmm2,%xmm0 > + movdqa %xmm12,%xmm6 > + pandn %xmm14,%xmm7 > + movdqa %xmm0,%xmm5 > + pand %xmm13,%xmm6 > + movdqa %xmm11,%xmm9 > + psrld $31,%xmm5 > + paddd %xmm0,%xmm0 > + > + movdqa %xmm4,48-128(%rax) > + paddd %xmm4,%xmm10 > + psrld $27,%xmm9 > + pxor %xmm7,%xmm6 > + > + movdqa %xmm12,%xmm7 > + por %xmm9,%xmm8 > + pslld $30,%xmm7 > + paddd %xmm6,%xmm10 > + > + psrld $2,%xmm12 > + paddd %xmm8,%xmm10 > + por %xmm5,%xmm0 > + por %xmm7,%xmm12 > + movdqa 0(%rbp),%xmm15 > + pxor %xmm3,%xmm1 > + movdqa 112-128(%rax),%xmm3 > + > + movdqa %xmm10,%xmm8 > + movdqa %xmm13,%xmm6 > + pxor 208-128(%rax),%xmm1 > + paddd %xmm15,%xmm14 > + pslld $5,%xmm8 > + pxor %xmm11,%xmm6 > + > + movdqa %xmm10,%xmm9 > + movdqa %xmm0,64-128(%rax) > + paddd %xmm0,%xmm14 > + pxor %xmm3,%xmm1 > + psrld $27,%xmm9 > + pxor %xmm12,%xmm6 > + movdqa %xmm11,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm1,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm14 > + paddd %xmm1,%xmm1 > + > + psrld $2,%xmm11 > + paddd %xmm8,%xmm14 > + por %xmm5,%xmm1 > + por %xmm7,%xmm11 > + pxor %xmm4,%xmm2 > + movdqa 128-128(%rax),%xmm4 > + > + movdqa %xmm14,%xmm8 > + movdqa %xmm12,%xmm6 > + pxor 224-128(%rax),%xmm2 > + paddd %xmm15,%xmm13 > + pslld $5,%xmm8 > + pxor %xmm10,%xmm6 > + > + movdqa %xmm14,%xmm9 > + movdqa %xmm1,80-128(%rax) > + paddd %xmm1,%xmm13 > + pxor %xmm4,%xmm2 > + psrld $27,%xmm9 > + pxor %xmm11,%xmm6 > + movdqa %xmm10,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm2,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm13 > + paddd %xmm2,%xmm2 > + > + psrld $2,%xmm10 > + paddd %xmm8,%xmm13 > + por %xmm5,%xmm2 > + por %xmm7,%xmm10 > + pxor %xmm0,%xmm3 > + movdqa 144-128(%rax),%xmm0 > + > + movdqa %xmm13,%xmm8 > + movdqa %xmm11,%xmm6 > + pxor 240-128(%rax),%xmm3 > + paddd %xmm15,%xmm12 > + pslld $5,%xmm8 > + pxor %xmm14,%xmm6 > + > + movdqa %xmm13,%xmm9 > + movdqa %xmm2,96-128(%rax) > + paddd %xmm2,%xmm12 > + pxor %xmm0,%xmm3 > + psrld $27,%xmm9 > + pxor %xmm10,%xmm6 > + movdqa %xmm14,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm3,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm12 > + paddd %xmm3,%xmm3 > + > + psrld $2,%xmm14 > + paddd %xmm8,%xmm12 > + por %xmm5,%xmm3 > + por %xmm7,%xmm14 > + pxor %xmm1,%xmm4 > + movdqa 160-128(%rax),%xmm1 > + > + movdqa %xmm12,%xmm8 > + movdqa %xmm10,%xmm6 > + pxor 0-128(%rax),%xmm4 > + paddd %xmm15,%xmm11 > + pslld $5,%xmm8 > + pxor %xmm13,%xmm6 > + > + movdqa %xmm12,%xmm9 > + movdqa %xmm3,112-128(%rax) > + paddd %xmm3,%xmm11 > + pxor %xmm1,%xmm4 > + psrld $27,%xmm9 > + pxor %xmm14,%xmm6 > + movdqa %xmm13,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm4,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm11 > + paddd %xmm4,%xmm4 > + > + psrld $2,%xmm13 > + paddd %xmm8,%xmm11 > + por 
%xmm5,%xmm4 > + por %xmm7,%xmm13 > + pxor %xmm2,%xmm0 > + movdqa 176-128(%rax),%xmm2 > + > + movdqa %xmm11,%xmm8 > + movdqa %xmm14,%xmm6 > + pxor 16-128(%rax),%xmm0 > + paddd %xmm15,%xmm10 > + pslld $5,%xmm8 > + pxor %xmm12,%xmm6 > + > + movdqa %xmm11,%xmm9 > + movdqa %xmm4,128-128(%rax) > + paddd %xmm4,%xmm10 > + pxor %xmm2,%xmm0 > + psrld $27,%xmm9 > + pxor %xmm13,%xmm6 > + movdqa %xmm12,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm0,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm10 > + paddd %xmm0,%xmm0 > + > + psrld $2,%xmm12 > + paddd %xmm8,%xmm10 > + por %xmm5,%xmm0 > + por %xmm7,%xmm12 > + pxor %xmm3,%xmm1 > + movdqa 192-128(%rax),%xmm3 > + > + movdqa %xmm10,%xmm8 > + movdqa %xmm13,%xmm6 > + pxor 32-128(%rax),%xmm1 > + paddd %xmm15,%xmm14 > + pslld $5,%xmm8 > + pxor %xmm11,%xmm6 > + > + movdqa %xmm10,%xmm9 > + movdqa %xmm0,144-128(%rax) > + paddd %xmm0,%xmm14 > + pxor %xmm3,%xmm1 > + psrld $27,%xmm9 > + pxor %xmm12,%xmm6 > + movdqa %xmm11,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm1,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm14 > + paddd %xmm1,%xmm1 > + > + psrld $2,%xmm11 > + paddd %xmm8,%xmm14 > + por %xmm5,%xmm1 > + por %xmm7,%xmm11 > + pxor %xmm4,%xmm2 > + movdqa 208-128(%rax),%xmm4 > + > + movdqa %xmm14,%xmm8 > + movdqa %xmm12,%xmm6 > + pxor 48-128(%rax),%xmm2 > + paddd %xmm15,%xmm13 > + pslld $5,%xmm8 > + pxor %xmm10,%xmm6 > + > + movdqa %xmm14,%xmm9 > + movdqa %xmm1,160-128(%rax) > + paddd %xmm1,%xmm13 > + pxor %xmm4,%xmm2 > + psrld $27,%xmm9 > + pxor %xmm11,%xmm6 > + movdqa %xmm10,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm2,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm13 > + paddd %xmm2,%xmm2 > + > + psrld $2,%xmm10 > + paddd %xmm8,%xmm13 > + por %xmm5,%xmm2 > + por %xmm7,%xmm10 > + pxor %xmm0,%xmm3 > + movdqa 224-128(%rax),%xmm0 > + > + movdqa %xmm13,%xmm8 > + movdqa %xmm11,%xmm6 > + pxor 64-128(%rax),%xmm3 > + paddd %xmm15,%xmm12 > + pslld $5,%xmm8 > + pxor %xmm14,%xmm6 > + > + movdqa %xmm13,%xmm9 > + movdqa %xmm2,176-128(%rax) > + paddd %xmm2,%xmm12 > + pxor %xmm0,%xmm3 > + psrld $27,%xmm9 > + pxor %xmm10,%xmm6 > + movdqa %xmm14,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm3,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm12 > + paddd %xmm3,%xmm3 > + > + psrld $2,%xmm14 > + paddd %xmm8,%xmm12 > + por %xmm5,%xmm3 > + por %xmm7,%xmm14 > + pxor %xmm1,%xmm4 > + movdqa 240-128(%rax),%xmm1 > + > + movdqa %xmm12,%xmm8 > + movdqa %xmm10,%xmm6 > + pxor 80-128(%rax),%xmm4 > + paddd %xmm15,%xmm11 > + pslld $5,%xmm8 > + pxor %xmm13,%xmm6 > + > + movdqa %xmm12,%xmm9 > + movdqa %xmm3,192-128(%rax) > + paddd %xmm3,%xmm11 > + pxor %xmm1,%xmm4 > + psrld $27,%xmm9 > + pxor %xmm14,%xmm6 > + movdqa %xmm13,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm4,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm11 > + paddd %xmm4,%xmm4 > + > + psrld $2,%xmm13 > + paddd %xmm8,%xmm11 > + por %xmm5,%xmm4 > + por %xmm7,%xmm13 > + pxor %xmm2,%xmm0 > + movdqa 0-128(%rax),%xmm2 > + > + movdqa %xmm11,%xmm8 > + movdqa %xmm14,%xmm6 > + pxor 96-128(%rax),%xmm0 > + paddd %xmm15,%xmm10 > + pslld $5,%xmm8 > + pxor %xmm12,%xmm6 > + > + movdqa %xmm11,%xmm9 > + movdqa %xmm4,208-128(%rax) > + paddd %xmm4,%xmm10 > + pxor %xmm2,%xmm0 > + psrld $27,%xmm9 > + pxor %xmm13,%xmm6 > + movdqa %xmm12,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm0,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm10 > + paddd %xmm0,%xmm0 > + > + psrld $2,%xmm12 > + paddd %xmm8,%xmm10 > + por %xmm5,%xmm0 > + por %xmm7,%xmm12 > + 
pxor %xmm3,%xmm1 > + movdqa 16-128(%rax),%xmm3 > + > + movdqa %xmm10,%xmm8 > + movdqa %xmm13,%xmm6 > + pxor 112-128(%rax),%xmm1 > + paddd %xmm15,%xmm14 > + pslld $5,%xmm8 > + pxor %xmm11,%xmm6 > + > + movdqa %xmm10,%xmm9 > + movdqa %xmm0,224-128(%rax) > + paddd %xmm0,%xmm14 > + pxor %xmm3,%xmm1 > + psrld $27,%xmm9 > + pxor %xmm12,%xmm6 > + movdqa %xmm11,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm1,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm14 > + paddd %xmm1,%xmm1 > + > + psrld $2,%xmm11 > + paddd %xmm8,%xmm14 > + por %xmm5,%xmm1 > + por %xmm7,%xmm11 > + pxor %xmm4,%xmm2 > + movdqa 32-128(%rax),%xmm4 > + > + movdqa %xmm14,%xmm8 > + movdqa %xmm12,%xmm6 > + pxor 128-128(%rax),%xmm2 > + paddd %xmm15,%xmm13 > + pslld $5,%xmm8 > + pxor %xmm10,%xmm6 > + > + movdqa %xmm14,%xmm9 > + movdqa %xmm1,240-128(%rax) > + paddd %xmm1,%xmm13 > + pxor %xmm4,%xmm2 > + psrld $27,%xmm9 > + pxor %xmm11,%xmm6 > + movdqa %xmm10,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm2,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm13 > + paddd %xmm2,%xmm2 > + > + psrld $2,%xmm10 > + paddd %xmm8,%xmm13 > + por %xmm5,%xmm2 > + por %xmm7,%xmm10 > + pxor %xmm0,%xmm3 > + movdqa 48-128(%rax),%xmm0 > + > + movdqa %xmm13,%xmm8 > + movdqa %xmm11,%xmm6 > + pxor 144-128(%rax),%xmm3 > + paddd %xmm15,%xmm12 > + pslld $5,%xmm8 > + pxor %xmm14,%xmm6 > + > + movdqa %xmm13,%xmm9 > + movdqa %xmm2,0-128(%rax) > + paddd %xmm2,%xmm12 > + pxor %xmm0,%xmm3 > + psrld $27,%xmm9 > + pxor %xmm10,%xmm6 > + movdqa %xmm14,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm3,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm12 > + paddd %xmm3,%xmm3 > + > + psrld $2,%xmm14 > + paddd %xmm8,%xmm12 > + por %xmm5,%xmm3 > + por %xmm7,%xmm14 > + pxor %xmm1,%xmm4 > + movdqa 64-128(%rax),%xmm1 > + > + movdqa %xmm12,%xmm8 > + movdqa %xmm10,%xmm6 > + pxor 160-128(%rax),%xmm4 > + paddd %xmm15,%xmm11 > + pslld $5,%xmm8 > + pxor %xmm13,%xmm6 > + > + movdqa %xmm12,%xmm9 > + movdqa %xmm3,16-128(%rax) > + paddd %xmm3,%xmm11 > + pxor %xmm1,%xmm4 > + psrld $27,%xmm9 > + pxor %xmm14,%xmm6 > + movdqa %xmm13,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm4,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm11 > + paddd %xmm4,%xmm4 > + > + psrld $2,%xmm13 > + paddd %xmm8,%xmm11 > + por %xmm5,%xmm4 > + por %xmm7,%xmm13 > + pxor %xmm2,%xmm0 > + movdqa 80-128(%rax),%xmm2 > + > + movdqa %xmm11,%xmm8 > + movdqa %xmm14,%xmm6 > + pxor 176-128(%rax),%xmm0 > + paddd %xmm15,%xmm10 > + pslld $5,%xmm8 > + pxor %xmm12,%xmm6 > + > + movdqa %xmm11,%xmm9 > + movdqa %xmm4,32-128(%rax) > + paddd %xmm4,%xmm10 > + pxor %xmm2,%xmm0 > + psrld $27,%xmm9 > + pxor %xmm13,%xmm6 > + movdqa %xmm12,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm0,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm10 > + paddd %xmm0,%xmm0 > + > + psrld $2,%xmm12 > + paddd %xmm8,%xmm10 > + por %xmm5,%xmm0 > + por %xmm7,%xmm12 > + pxor %xmm3,%xmm1 > + movdqa 96-128(%rax),%xmm3 > + > + movdqa %xmm10,%xmm8 > + movdqa %xmm13,%xmm6 > + pxor 192-128(%rax),%xmm1 > + paddd %xmm15,%xmm14 > + pslld $5,%xmm8 > + pxor %xmm11,%xmm6 > + > + movdqa %xmm10,%xmm9 > + movdqa %xmm0,48-128(%rax) > + paddd %xmm0,%xmm14 > + pxor %xmm3,%xmm1 > + psrld $27,%xmm9 > + pxor %xmm12,%xmm6 > + movdqa %xmm11,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm1,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm14 > + paddd %xmm1,%xmm1 > + > + psrld $2,%xmm11 > + paddd %xmm8,%xmm14 > + por %xmm5,%xmm1 > + por %xmm7,%xmm11 > + pxor %xmm4,%xmm2 > + movdqa 
112-128(%rax),%xmm4 > + > + movdqa %xmm14,%xmm8 > + movdqa %xmm12,%xmm6 > + pxor 208-128(%rax),%xmm2 > + paddd %xmm15,%xmm13 > + pslld $5,%xmm8 > + pxor %xmm10,%xmm6 > + > + movdqa %xmm14,%xmm9 > + movdqa %xmm1,64-128(%rax) > + paddd %xmm1,%xmm13 > + pxor %xmm4,%xmm2 > + psrld $27,%xmm9 > + pxor %xmm11,%xmm6 > + movdqa %xmm10,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm2,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm13 > + paddd %xmm2,%xmm2 > + > + psrld $2,%xmm10 > + paddd %xmm8,%xmm13 > + por %xmm5,%xmm2 > + por %xmm7,%xmm10 > + pxor %xmm0,%xmm3 > + movdqa 128-128(%rax),%xmm0 > + > + movdqa %xmm13,%xmm8 > + movdqa %xmm11,%xmm6 > + pxor 224-128(%rax),%xmm3 > + paddd %xmm15,%xmm12 > + pslld $5,%xmm8 > + pxor %xmm14,%xmm6 > + > + movdqa %xmm13,%xmm9 > + movdqa %xmm2,80-128(%rax) > + paddd %xmm2,%xmm12 > + pxor %xmm0,%xmm3 > + psrld $27,%xmm9 > + pxor %xmm10,%xmm6 > + movdqa %xmm14,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm3,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm12 > + paddd %xmm3,%xmm3 > + > + psrld $2,%xmm14 > + paddd %xmm8,%xmm12 > + por %xmm5,%xmm3 > + por %xmm7,%xmm14 > + pxor %xmm1,%xmm4 > + movdqa 144-128(%rax),%xmm1 > + > + movdqa %xmm12,%xmm8 > + movdqa %xmm10,%xmm6 > + pxor 240-128(%rax),%xmm4 > + paddd %xmm15,%xmm11 > + pslld $5,%xmm8 > + pxor %xmm13,%xmm6 > + > + movdqa %xmm12,%xmm9 > + movdqa %xmm3,96-128(%rax) > + paddd %xmm3,%xmm11 > + pxor %xmm1,%xmm4 > + psrld $27,%xmm9 > + pxor %xmm14,%xmm6 > + movdqa %xmm13,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm4,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm11 > + paddd %xmm4,%xmm4 > + > + psrld $2,%xmm13 > + paddd %xmm8,%xmm11 > + por %xmm5,%xmm4 > + por %xmm7,%xmm13 > + pxor %xmm2,%xmm0 > + movdqa 160-128(%rax),%xmm2 > + > + movdqa %xmm11,%xmm8 > + movdqa %xmm14,%xmm6 > + pxor 0-128(%rax),%xmm0 > + paddd %xmm15,%xmm10 > + pslld $5,%xmm8 > + pxor %xmm12,%xmm6 > + > + movdqa %xmm11,%xmm9 > + movdqa %xmm4,112-128(%rax) > + paddd %xmm4,%xmm10 > + pxor %xmm2,%xmm0 > + psrld $27,%xmm9 > + pxor %xmm13,%xmm6 > + movdqa %xmm12,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm0,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm10 > + paddd %xmm0,%xmm0 > + > + psrld $2,%xmm12 > + paddd %xmm8,%xmm10 > + por %xmm5,%xmm0 > + por %xmm7,%xmm12 > + movdqa 32(%rbp),%xmm15 > + pxor %xmm3,%xmm1 > + movdqa 176-128(%rax),%xmm3 > + > + movdqa %xmm10,%xmm8 > + movdqa %xmm13,%xmm7 > + pxor 16-128(%rax),%xmm1 > + pxor %xmm3,%xmm1 > + paddd %xmm15,%xmm14 > + pslld $5,%xmm8 > + movdqa %xmm10,%xmm9 > + pand %xmm12,%xmm7 > + > + movdqa %xmm13,%xmm6 > + movdqa %xmm1,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm14 > + pxor %xmm12,%xmm6 > + > + movdqa %xmm0,128-128(%rax) > + paddd %xmm0,%xmm14 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm11,%xmm6 > + movdqa %xmm11,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm1,%xmm1 > + paddd %xmm6,%xmm14 > + > + psrld $2,%xmm11 > + paddd %xmm8,%xmm14 > + por %xmm5,%xmm1 > + por %xmm7,%xmm11 > + pxor %xmm4,%xmm2 > + movdqa 192-128(%rax),%xmm4 > + > + movdqa %xmm14,%xmm8 > + movdqa %xmm12,%xmm7 > + pxor 32-128(%rax),%xmm2 > + pxor %xmm4,%xmm2 > + paddd %xmm15,%xmm13 > + pslld $5,%xmm8 > + movdqa %xmm14,%xmm9 > + pand %xmm11,%xmm7 > + > + movdqa %xmm12,%xmm6 > + movdqa %xmm2,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm13 > + pxor %xmm11,%xmm6 > + > + movdqa %xmm1,144-128(%rax) > + paddd %xmm1,%xmm13 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm10,%xmm6 > + movdqa %xmm10,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm2,%xmm2 > + paddd 
%xmm6,%xmm13 > + > + psrld $2,%xmm10 > + paddd %xmm8,%xmm13 > + por %xmm5,%xmm2 > + por %xmm7,%xmm10 > + pxor %xmm0,%xmm3 > + movdqa 208-128(%rax),%xmm0 > + > + movdqa %xmm13,%xmm8 > + movdqa %xmm11,%xmm7 > + pxor 48-128(%rax),%xmm3 > + pxor %xmm0,%xmm3 > + paddd %xmm15,%xmm12 > + pslld $5,%xmm8 > + movdqa %xmm13,%xmm9 > + pand %xmm10,%xmm7 > + > + movdqa %xmm11,%xmm6 > + movdqa %xmm3,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm12 > + pxor %xmm10,%xmm6 > + > + movdqa %xmm2,160-128(%rax) > + paddd %xmm2,%xmm12 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm14,%xmm6 > + movdqa %xmm14,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm3,%xmm3 > + paddd %xmm6,%xmm12 > + > + psrld $2,%xmm14 > + paddd %xmm8,%xmm12 > + por %xmm5,%xmm3 > + por %xmm7,%xmm14 > + pxor %xmm1,%xmm4 > + movdqa 224-128(%rax),%xmm1 > + > + movdqa %xmm12,%xmm8 > + movdqa %xmm10,%xmm7 > + pxor 64-128(%rax),%xmm4 > + pxor %xmm1,%xmm4 > + paddd %xmm15,%xmm11 > + pslld $5,%xmm8 > + movdqa %xmm12,%xmm9 > + pand %xmm14,%xmm7 > + > + movdqa %xmm10,%xmm6 > + movdqa %xmm4,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm11 > + pxor %xmm14,%xmm6 > + > + movdqa %xmm3,176-128(%rax) > + paddd %xmm3,%xmm11 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm13,%xmm6 > + movdqa %xmm13,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm4,%xmm4 > + paddd %xmm6,%xmm11 > + > + psrld $2,%xmm13 > + paddd %xmm8,%xmm11 > + por %xmm5,%xmm4 > + por %xmm7,%xmm13 > + pxor %xmm2,%xmm0 > + movdqa 240-128(%rax),%xmm2 > + > + movdqa %xmm11,%xmm8 > + movdqa %xmm14,%xmm7 > + pxor 80-128(%rax),%xmm0 > + pxor %xmm2,%xmm0 > + paddd %xmm15,%xmm10 > + pslld $5,%xmm8 > + movdqa %xmm11,%xmm9 > + pand %xmm13,%xmm7 > + > + movdqa %xmm14,%xmm6 > + movdqa %xmm0,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm10 > + pxor %xmm13,%xmm6 > + > + movdqa %xmm4,192-128(%rax) > + paddd %xmm4,%xmm10 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm12,%xmm6 > + movdqa %xmm12,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm0,%xmm0 > + paddd %xmm6,%xmm10 > + > + psrld $2,%xmm12 > + paddd %xmm8,%xmm10 > + por %xmm5,%xmm0 > + por %xmm7,%xmm12 > + pxor %xmm3,%xmm1 > + movdqa 0-128(%rax),%xmm3 > + > + movdqa %xmm10,%xmm8 > + movdqa %xmm13,%xmm7 > + pxor 96-128(%rax),%xmm1 > + pxor %xmm3,%xmm1 > + paddd %xmm15,%xmm14 > + pslld $5,%xmm8 > + movdqa %xmm10,%xmm9 > + pand %xmm12,%xmm7 > + > + movdqa %xmm13,%xmm6 > + movdqa %xmm1,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm14 > + pxor %xmm12,%xmm6 > + > + movdqa %xmm0,208-128(%rax) > + paddd %xmm0,%xmm14 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm11,%xmm6 > + movdqa %xmm11,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm1,%xmm1 > + paddd %xmm6,%xmm14 > + > + psrld $2,%xmm11 > + paddd %xmm8,%xmm14 > + por %xmm5,%xmm1 > + por %xmm7,%xmm11 > + pxor %xmm4,%xmm2 > + movdqa 16-128(%rax),%xmm4 > + > + movdqa %xmm14,%xmm8 > + movdqa %xmm12,%xmm7 > + pxor 112-128(%rax),%xmm2 > + pxor %xmm4,%xmm2 > + paddd %xmm15,%xmm13 > + pslld $5,%xmm8 > + movdqa %xmm14,%xmm9 > + pand %xmm11,%xmm7 > + > + movdqa %xmm12,%xmm6 > + movdqa %xmm2,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm13 > + pxor %xmm11,%xmm6 > + > + movdqa %xmm1,224-128(%rax) > + paddd %xmm1,%xmm13 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm10,%xmm6 > + movdqa %xmm10,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm2,%xmm2 > + paddd %xmm6,%xmm13 > + > + psrld $2,%xmm10 > + paddd %xmm8,%xmm13 > + por %xmm5,%xmm2 > + por %xmm7,%xmm10 > + pxor %xmm0,%xmm3 > + movdqa 32-128(%rax),%xmm0 > + > + movdqa %xmm13,%xmm8 > + movdqa %xmm11,%xmm7 > + pxor 128-128(%rax),%xmm3 > + pxor %xmm0,%xmm3 > + 
paddd %xmm15,%xmm12 > + pslld $5,%xmm8 > + movdqa %xmm13,%xmm9 > + pand %xmm10,%xmm7 > + > + movdqa %xmm11,%xmm6 > + movdqa %xmm3,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm12 > + pxor %xmm10,%xmm6 > + > + movdqa %xmm2,240-128(%rax) > + paddd %xmm2,%xmm12 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm14,%xmm6 > + movdqa %xmm14,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm3,%xmm3 > + paddd %xmm6,%xmm12 > + > + psrld $2,%xmm14 > + paddd %xmm8,%xmm12 > + por %xmm5,%xmm3 > + por %xmm7,%xmm14 > + pxor %xmm1,%xmm4 > + movdqa 48-128(%rax),%xmm1 > + > + movdqa %xmm12,%xmm8 > + movdqa %xmm10,%xmm7 > + pxor 144-128(%rax),%xmm4 > + pxor %xmm1,%xmm4 > + paddd %xmm15,%xmm11 > + pslld $5,%xmm8 > + movdqa %xmm12,%xmm9 > + pand %xmm14,%xmm7 > + > + movdqa %xmm10,%xmm6 > + movdqa %xmm4,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm11 > + pxor %xmm14,%xmm6 > + > + movdqa %xmm3,0-128(%rax) > + paddd %xmm3,%xmm11 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm13,%xmm6 > + movdqa %xmm13,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm4,%xmm4 > + paddd %xmm6,%xmm11 > + > + psrld $2,%xmm13 > + paddd %xmm8,%xmm11 > + por %xmm5,%xmm4 > + por %xmm7,%xmm13 > + pxor %xmm2,%xmm0 > + movdqa 64-128(%rax),%xmm2 > + > + movdqa %xmm11,%xmm8 > + movdqa %xmm14,%xmm7 > + pxor 160-128(%rax),%xmm0 > + pxor %xmm2,%xmm0 > + paddd %xmm15,%xmm10 > + pslld $5,%xmm8 > + movdqa %xmm11,%xmm9 > + pand %xmm13,%xmm7 > + > + movdqa %xmm14,%xmm6 > + movdqa %xmm0,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm10 > + pxor %xmm13,%xmm6 > + > + movdqa %xmm4,16-128(%rax) > + paddd %xmm4,%xmm10 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm12,%xmm6 > + movdqa %xmm12,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm0,%xmm0 > + paddd %xmm6,%xmm10 > + > + psrld $2,%xmm12 > + paddd %xmm8,%xmm10 > + por %xmm5,%xmm0 > + por %xmm7,%xmm12 > + pxor %xmm3,%xmm1 > + movdqa 80-128(%rax),%xmm3 > + > + movdqa %xmm10,%xmm8 > + movdqa %xmm13,%xmm7 > + pxor 176-128(%rax),%xmm1 > + pxor %xmm3,%xmm1 > + paddd %xmm15,%xmm14 > + pslld $5,%xmm8 > + movdqa %xmm10,%xmm9 > + pand %xmm12,%xmm7 > + > + movdqa %xmm13,%xmm6 > + movdqa %xmm1,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm14 > + pxor %xmm12,%xmm6 > + > + movdqa %xmm0,32-128(%rax) > + paddd %xmm0,%xmm14 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm11,%xmm6 > + movdqa %xmm11,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm1,%xmm1 > + paddd %xmm6,%xmm14 > + > + psrld $2,%xmm11 > + paddd %xmm8,%xmm14 > + por %xmm5,%xmm1 > + por %xmm7,%xmm11 > + pxor %xmm4,%xmm2 > + movdqa 96-128(%rax),%xmm4 > + > + movdqa %xmm14,%xmm8 > + movdqa %xmm12,%xmm7 > + pxor 192-128(%rax),%xmm2 > + pxor %xmm4,%xmm2 > + paddd %xmm15,%xmm13 > + pslld $5,%xmm8 > + movdqa %xmm14,%xmm9 > + pand %xmm11,%xmm7 > + > + movdqa %xmm12,%xmm6 > + movdqa %xmm2,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm13 > + pxor %xmm11,%xmm6 > + > + movdqa %xmm1,48-128(%rax) > + paddd %xmm1,%xmm13 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm10,%xmm6 > + movdqa %xmm10,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm2,%xmm2 > + paddd %xmm6,%xmm13 > + > + psrld $2,%xmm10 > + paddd %xmm8,%xmm13 > + por %xmm5,%xmm2 > + por %xmm7,%xmm10 > + pxor %xmm0,%xmm3 > + movdqa 112-128(%rax),%xmm0 > + > + movdqa %xmm13,%xmm8 > + movdqa %xmm11,%xmm7 > + pxor 208-128(%rax),%xmm3 > + pxor %xmm0,%xmm3 > + paddd %xmm15,%xmm12 > + pslld $5,%xmm8 > + movdqa %xmm13,%xmm9 > + pand %xmm10,%xmm7 > + > + movdqa %xmm11,%xmm6 > + movdqa %xmm3,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm12 > + pxor %xmm10,%xmm6 > + > + movdqa %xmm2,64-128(%rax) > + paddd %xmm2,%xmm12 > + 
por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm14,%xmm6 > + movdqa %xmm14,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm3,%xmm3 > + paddd %xmm6,%xmm12 > + > + psrld $2,%xmm14 > + paddd %xmm8,%xmm12 > + por %xmm5,%xmm3 > + por %xmm7,%xmm14 > + pxor %xmm1,%xmm4 > + movdqa 128-128(%rax),%xmm1 > + > + movdqa %xmm12,%xmm8 > + movdqa %xmm10,%xmm7 > + pxor 224-128(%rax),%xmm4 > + pxor %xmm1,%xmm4 > + paddd %xmm15,%xmm11 > + pslld $5,%xmm8 > + movdqa %xmm12,%xmm9 > + pand %xmm14,%xmm7 > + > + movdqa %xmm10,%xmm6 > + movdqa %xmm4,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm11 > + pxor %xmm14,%xmm6 > + > + movdqa %xmm3,80-128(%rax) > + paddd %xmm3,%xmm11 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm13,%xmm6 > + movdqa %xmm13,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm4,%xmm4 > + paddd %xmm6,%xmm11 > + > + psrld $2,%xmm13 > + paddd %xmm8,%xmm11 > + por %xmm5,%xmm4 > + por %xmm7,%xmm13 > + pxor %xmm2,%xmm0 > + movdqa 144-128(%rax),%xmm2 > + > + movdqa %xmm11,%xmm8 > + movdqa %xmm14,%xmm7 > + pxor 240-128(%rax),%xmm0 > + pxor %xmm2,%xmm0 > + paddd %xmm15,%xmm10 > + pslld $5,%xmm8 > + movdqa %xmm11,%xmm9 > + pand %xmm13,%xmm7 > + > + movdqa %xmm14,%xmm6 > + movdqa %xmm0,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm10 > + pxor %xmm13,%xmm6 > + > + movdqa %xmm4,96-128(%rax) > + paddd %xmm4,%xmm10 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm12,%xmm6 > + movdqa %xmm12,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm0,%xmm0 > + paddd %xmm6,%xmm10 > + > + psrld $2,%xmm12 > + paddd %xmm8,%xmm10 > + por %xmm5,%xmm0 > + por %xmm7,%xmm12 > + pxor %xmm3,%xmm1 > + movdqa 160-128(%rax),%xmm3 > + > + movdqa %xmm10,%xmm8 > + movdqa %xmm13,%xmm7 > + pxor 0-128(%rax),%xmm1 > + pxor %xmm3,%xmm1 > + paddd %xmm15,%xmm14 > + pslld $5,%xmm8 > + movdqa %xmm10,%xmm9 > + pand %xmm12,%xmm7 > + > + movdqa %xmm13,%xmm6 > + movdqa %xmm1,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm14 > + pxor %xmm12,%xmm6 > + > + movdqa %xmm0,112-128(%rax) > + paddd %xmm0,%xmm14 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm11,%xmm6 > + movdqa %xmm11,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm1,%xmm1 > + paddd %xmm6,%xmm14 > + > + psrld $2,%xmm11 > + paddd %xmm8,%xmm14 > + por %xmm5,%xmm1 > + por %xmm7,%xmm11 > + pxor %xmm4,%xmm2 > + movdqa 176-128(%rax),%xmm4 > + > + movdqa %xmm14,%xmm8 > + movdqa %xmm12,%xmm7 > + pxor 16-128(%rax),%xmm2 > + pxor %xmm4,%xmm2 > + paddd %xmm15,%xmm13 > + pslld $5,%xmm8 > + movdqa %xmm14,%xmm9 > + pand %xmm11,%xmm7 > + > + movdqa %xmm12,%xmm6 > + movdqa %xmm2,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm13 > + pxor %xmm11,%xmm6 > + > + movdqa %xmm1,128-128(%rax) > + paddd %xmm1,%xmm13 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm10,%xmm6 > + movdqa %xmm10,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm2,%xmm2 > + paddd %xmm6,%xmm13 > + > + psrld $2,%xmm10 > + paddd %xmm8,%xmm13 > + por %xmm5,%xmm2 > + por %xmm7,%xmm10 > + pxor %xmm0,%xmm3 > + movdqa 192-128(%rax),%xmm0 > + > + movdqa %xmm13,%xmm8 > + movdqa %xmm11,%xmm7 > + pxor 32-128(%rax),%xmm3 > + pxor %xmm0,%xmm3 > + paddd %xmm15,%xmm12 > + pslld $5,%xmm8 > + movdqa %xmm13,%xmm9 > + pand %xmm10,%xmm7 > + > + movdqa %xmm11,%xmm6 > + movdqa %xmm3,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm12 > + pxor %xmm10,%xmm6 > + > + movdqa %xmm2,144-128(%rax) > + paddd %xmm2,%xmm12 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm14,%xmm6 > + movdqa %xmm14,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm3,%xmm3 > + paddd %xmm6,%xmm12 > + > + psrld $2,%xmm14 > + paddd %xmm8,%xmm12 > + por %xmm5,%xmm3 > + por %xmm7,%xmm14 > + pxor %xmm1,%xmm4 
> + movdqa 208-128(%rax),%xmm1 > + > + movdqa %xmm12,%xmm8 > + movdqa %xmm10,%xmm7 > + pxor 48-128(%rax),%xmm4 > + pxor %xmm1,%xmm4 > + paddd %xmm15,%xmm11 > + pslld $5,%xmm8 > + movdqa %xmm12,%xmm9 > + pand %xmm14,%xmm7 > + > + movdqa %xmm10,%xmm6 > + movdqa %xmm4,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm11 > + pxor %xmm14,%xmm6 > + > + movdqa %xmm3,160-128(%rax) > + paddd %xmm3,%xmm11 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm13,%xmm6 > + movdqa %xmm13,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm4,%xmm4 > + paddd %xmm6,%xmm11 > + > + psrld $2,%xmm13 > + paddd %xmm8,%xmm11 > + por %xmm5,%xmm4 > + por %xmm7,%xmm13 > + pxor %xmm2,%xmm0 > + movdqa 224-128(%rax),%xmm2 > + > + movdqa %xmm11,%xmm8 > + movdqa %xmm14,%xmm7 > + pxor 64-128(%rax),%xmm0 > + pxor %xmm2,%xmm0 > + paddd %xmm15,%xmm10 > + pslld $5,%xmm8 > + movdqa %xmm11,%xmm9 > + pand %xmm13,%xmm7 > + > + movdqa %xmm14,%xmm6 > + movdqa %xmm0,%xmm5 > + psrld $27,%xmm9 > + paddd %xmm7,%xmm10 > + pxor %xmm13,%xmm6 > + > + movdqa %xmm4,176-128(%rax) > + paddd %xmm4,%xmm10 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + pand %xmm12,%xmm6 > + movdqa %xmm12,%xmm7 > + > + pslld $30,%xmm7 > + paddd %xmm0,%xmm0 > + paddd %xmm6,%xmm10 > + > + psrld $2,%xmm12 > + paddd %xmm8,%xmm10 > + por %xmm5,%xmm0 > + por %xmm7,%xmm12 > + movdqa 64(%rbp),%xmm15 > + pxor %xmm3,%xmm1 > + movdqa 240-128(%rax),%xmm3 > + > + movdqa %xmm10,%xmm8 > + movdqa %xmm13,%xmm6 > + pxor 80-128(%rax),%xmm1 > + paddd %xmm15,%xmm14 > + pslld $5,%xmm8 > + pxor %xmm11,%xmm6 > + > + movdqa %xmm10,%xmm9 > + movdqa %xmm0,192-128(%rax) > + paddd %xmm0,%xmm14 > + pxor %xmm3,%xmm1 > + psrld $27,%xmm9 > + pxor %xmm12,%xmm6 > + movdqa %xmm11,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm1,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm14 > + paddd %xmm1,%xmm1 > + > + psrld $2,%xmm11 > + paddd %xmm8,%xmm14 > + por %xmm5,%xmm1 > + por %xmm7,%xmm11 > + pxor %xmm4,%xmm2 > + movdqa 0-128(%rax),%xmm4 > + > + movdqa %xmm14,%xmm8 > + movdqa %xmm12,%xmm6 > + pxor 96-128(%rax),%xmm2 > + paddd %xmm15,%xmm13 > + pslld $5,%xmm8 > + pxor %xmm10,%xmm6 > + > + movdqa %xmm14,%xmm9 > + movdqa %xmm1,208-128(%rax) > + paddd %xmm1,%xmm13 > + pxor %xmm4,%xmm2 > + psrld $27,%xmm9 > + pxor %xmm11,%xmm6 > + movdqa %xmm10,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm2,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm13 > + paddd %xmm2,%xmm2 > + > + psrld $2,%xmm10 > + paddd %xmm8,%xmm13 > + por %xmm5,%xmm2 > + por %xmm7,%xmm10 > + pxor %xmm0,%xmm3 > + movdqa 16-128(%rax),%xmm0 > + > + movdqa %xmm13,%xmm8 > + movdqa %xmm11,%xmm6 > + pxor 112-128(%rax),%xmm3 > + paddd %xmm15,%xmm12 > + pslld $5,%xmm8 > + pxor %xmm14,%xmm6 > + > + movdqa %xmm13,%xmm9 > + movdqa %xmm2,224-128(%rax) > + paddd %xmm2,%xmm12 > + pxor %xmm0,%xmm3 > + psrld $27,%xmm9 > + pxor %xmm10,%xmm6 > + movdqa %xmm14,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm3,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm12 > + paddd %xmm3,%xmm3 > + > + psrld $2,%xmm14 > + paddd %xmm8,%xmm12 > + por %xmm5,%xmm3 > + por %xmm7,%xmm14 > + pxor %xmm1,%xmm4 > + movdqa 32-128(%rax),%xmm1 > + > + movdqa %xmm12,%xmm8 > + movdqa %xmm10,%xmm6 > + pxor 128-128(%rax),%xmm4 > + paddd %xmm15,%xmm11 > + pslld $5,%xmm8 > + pxor %xmm13,%xmm6 > + > + movdqa %xmm12,%xmm9 > + movdqa %xmm3,240-128(%rax) > + paddd %xmm3,%xmm11 > + pxor %xmm1,%xmm4 > + psrld $27,%xmm9 > + pxor %xmm14,%xmm6 > + movdqa %xmm13,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm4,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd 
%xmm6,%xmm11 > + paddd %xmm4,%xmm4 > + > + psrld $2,%xmm13 > + paddd %xmm8,%xmm11 > + por %xmm5,%xmm4 > + por %xmm7,%xmm13 > + pxor %xmm2,%xmm0 > + movdqa 48-128(%rax),%xmm2 > + > + movdqa %xmm11,%xmm8 > + movdqa %xmm14,%xmm6 > + pxor 144-128(%rax),%xmm0 > + paddd %xmm15,%xmm10 > + pslld $5,%xmm8 > + pxor %xmm12,%xmm6 > + > + movdqa %xmm11,%xmm9 > + movdqa %xmm4,0-128(%rax) > + paddd %xmm4,%xmm10 > + pxor %xmm2,%xmm0 > + psrld $27,%xmm9 > + pxor %xmm13,%xmm6 > + movdqa %xmm12,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm0,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm10 > + paddd %xmm0,%xmm0 > + > + psrld $2,%xmm12 > + paddd %xmm8,%xmm10 > + por %xmm5,%xmm0 > + por %xmm7,%xmm12 > + pxor %xmm3,%xmm1 > + movdqa 64-128(%rax),%xmm3 > + > + movdqa %xmm10,%xmm8 > + movdqa %xmm13,%xmm6 > + pxor 160-128(%rax),%xmm1 > + paddd %xmm15,%xmm14 > + pslld $5,%xmm8 > + pxor %xmm11,%xmm6 > + > + movdqa %xmm10,%xmm9 > + movdqa %xmm0,16-128(%rax) > + paddd %xmm0,%xmm14 > + pxor %xmm3,%xmm1 > + psrld $27,%xmm9 > + pxor %xmm12,%xmm6 > + movdqa %xmm11,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm1,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm14 > + paddd %xmm1,%xmm1 > + > + psrld $2,%xmm11 > + paddd %xmm8,%xmm14 > + por %xmm5,%xmm1 > + por %xmm7,%xmm11 > + pxor %xmm4,%xmm2 > + movdqa 80-128(%rax),%xmm4 > + > + movdqa %xmm14,%xmm8 > + movdqa %xmm12,%xmm6 > + pxor 176-128(%rax),%xmm2 > + paddd %xmm15,%xmm13 > + pslld $5,%xmm8 > + pxor %xmm10,%xmm6 > + > + movdqa %xmm14,%xmm9 > + movdqa %xmm1,32-128(%rax) > + paddd %xmm1,%xmm13 > + pxor %xmm4,%xmm2 > + psrld $27,%xmm9 > + pxor %xmm11,%xmm6 > + movdqa %xmm10,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm2,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm13 > + paddd %xmm2,%xmm2 > + > + psrld $2,%xmm10 > + paddd %xmm8,%xmm13 > + por %xmm5,%xmm2 > + por %xmm7,%xmm10 > + pxor %xmm0,%xmm3 > + movdqa 96-128(%rax),%xmm0 > + > + movdqa %xmm13,%xmm8 > + movdqa %xmm11,%xmm6 > + pxor 192-128(%rax),%xmm3 > + paddd %xmm15,%xmm12 > + pslld $5,%xmm8 > + pxor %xmm14,%xmm6 > + > + movdqa %xmm13,%xmm9 > + movdqa %xmm2,48-128(%rax) > + paddd %xmm2,%xmm12 > + pxor %xmm0,%xmm3 > + psrld $27,%xmm9 > + pxor %xmm10,%xmm6 > + movdqa %xmm14,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm3,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm12 > + paddd %xmm3,%xmm3 > + > + psrld $2,%xmm14 > + paddd %xmm8,%xmm12 > + por %xmm5,%xmm3 > + por %xmm7,%xmm14 > + pxor %xmm1,%xmm4 > + movdqa 112-128(%rax),%xmm1 > + > + movdqa %xmm12,%xmm8 > + movdqa %xmm10,%xmm6 > + pxor 208-128(%rax),%xmm4 > + paddd %xmm15,%xmm11 > + pslld $5,%xmm8 > + pxor %xmm13,%xmm6 > + > + movdqa %xmm12,%xmm9 > + movdqa %xmm3,64-128(%rax) > + paddd %xmm3,%xmm11 > + pxor %xmm1,%xmm4 > + psrld $27,%xmm9 > + pxor %xmm14,%xmm6 > + movdqa %xmm13,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm4,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm11 > + paddd %xmm4,%xmm4 > + > + psrld $2,%xmm13 > + paddd %xmm8,%xmm11 > + por %xmm5,%xmm4 > + por %xmm7,%xmm13 > + pxor %xmm2,%xmm0 > + movdqa 128-128(%rax),%xmm2 > + > + movdqa %xmm11,%xmm8 > + movdqa %xmm14,%xmm6 > + pxor 224-128(%rax),%xmm0 > + paddd %xmm15,%xmm10 > + pslld $5,%xmm8 > + pxor %xmm12,%xmm6 > + > + movdqa %xmm11,%xmm9 > + movdqa %xmm4,80-128(%rax) > + paddd %xmm4,%xmm10 > + pxor %xmm2,%xmm0 > + psrld $27,%xmm9 > + pxor %xmm13,%xmm6 > + movdqa %xmm12,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm0,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm10 > + paddd %xmm0,%xmm0 > + > + 
psrld $2,%xmm12 > + paddd %xmm8,%xmm10 > + por %xmm5,%xmm0 > + por %xmm7,%xmm12 > + pxor %xmm3,%xmm1 > + movdqa 144-128(%rax),%xmm3 > + > + movdqa %xmm10,%xmm8 > + movdqa %xmm13,%xmm6 > + pxor 240-128(%rax),%xmm1 > + paddd %xmm15,%xmm14 > + pslld $5,%xmm8 > + pxor %xmm11,%xmm6 > + > + movdqa %xmm10,%xmm9 > + movdqa %xmm0,96-128(%rax) > + paddd %xmm0,%xmm14 > + pxor %xmm3,%xmm1 > + psrld $27,%xmm9 > + pxor %xmm12,%xmm6 > + movdqa %xmm11,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm1,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm14 > + paddd %xmm1,%xmm1 > + > + psrld $2,%xmm11 > + paddd %xmm8,%xmm14 > + por %xmm5,%xmm1 > + por %xmm7,%xmm11 > + pxor %xmm4,%xmm2 > + movdqa 160-128(%rax),%xmm4 > + > + movdqa %xmm14,%xmm8 > + movdqa %xmm12,%xmm6 > + pxor 0-128(%rax),%xmm2 > + paddd %xmm15,%xmm13 > + pslld $5,%xmm8 > + pxor %xmm10,%xmm6 > + > + movdqa %xmm14,%xmm9 > + movdqa %xmm1,112-128(%rax) > + paddd %xmm1,%xmm13 > + pxor %xmm4,%xmm2 > + psrld $27,%xmm9 > + pxor %xmm11,%xmm6 > + movdqa %xmm10,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm2,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm13 > + paddd %xmm2,%xmm2 > + > + psrld $2,%xmm10 > + paddd %xmm8,%xmm13 > + por %xmm5,%xmm2 > + por %xmm7,%xmm10 > + pxor %xmm0,%xmm3 > + movdqa 176-128(%rax),%xmm0 > + > + movdqa %xmm13,%xmm8 > + movdqa %xmm11,%xmm6 > + pxor 16-128(%rax),%xmm3 > + paddd %xmm15,%xmm12 > + pslld $5,%xmm8 > + pxor %xmm14,%xmm6 > + > + movdqa %xmm13,%xmm9 > + paddd %xmm2,%xmm12 > + pxor %xmm0,%xmm3 > + psrld $27,%xmm9 > + pxor %xmm10,%xmm6 > + movdqa %xmm14,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm3,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm12 > + paddd %xmm3,%xmm3 > + > + psrld $2,%xmm14 > + paddd %xmm8,%xmm12 > + por %xmm5,%xmm3 > + por %xmm7,%xmm14 > + pxor %xmm1,%xmm4 > + movdqa 192-128(%rax),%xmm1 > + > + movdqa %xmm12,%xmm8 > + movdqa %xmm10,%xmm6 > + pxor 32-128(%rax),%xmm4 > + paddd %xmm15,%xmm11 > + pslld $5,%xmm8 > + pxor %xmm13,%xmm6 > + > + movdqa %xmm12,%xmm9 > + paddd %xmm3,%xmm11 > + pxor %xmm1,%xmm4 > + psrld $27,%xmm9 > + pxor %xmm14,%xmm6 > + movdqa %xmm13,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm4,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm11 > + paddd %xmm4,%xmm4 > + > + psrld $2,%xmm13 > + paddd %xmm8,%xmm11 > + por %xmm5,%xmm4 > + por %xmm7,%xmm13 > + pxor %xmm2,%xmm0 > + movdqa 208-128(%rax),%xmm2 > + > + movdqa %xmm11,%xmm8 > + movdqa %xmm14,%xmm6 > + pxor 48-128(%rax),%xmm0 > + paddd %xmm15,%xmm10 > + pslld $5,%xmm8 > + pxor %xmm12,%xmm6 > + > + movdqa %xmm11,%xmm9 > + paddd %xmm4,%xmm10 > + pxor %xmm2,%xmm0 > + psrld $27,%xmm9 > + pxor %xmm13,%xmm6 > + movdqa %xmm12,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm0,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm10 > + paddd %xmm0,%xmm0 > + > + psrld $2,%xmm12 > + paddd %xmm8,%xmm10 > + por %xmm5,%xmm0 > + por %xmm7,%xmm12 > + pxor %xmm3,%xmm1 > + movdqa 224-128(%rax),%xmm3 > + > + movdqa %xmm10,%xmm8 > + movdqa %xmm13,%xmm6 > + pxor 64-128(%rax),%xmm1 > + paddd %xmm15,%xmm14 > + pslld $5,%xmm8 > + pxor %xmm11,%xmm6 > + > + movdqa %xmm10,%xmm9 > + paddd %xmm0,%xmm14 > + pxor %xmm3,%xmm1 > + psrld $27,%xmm9 > + pxor %xmm12,%xmm6 > + movdqa %xmm11,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm1,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm14 > + paddd %xmm1,%xmm1 > + > + psrld $2,%xmm11 > + paddd %xmm8,%xmm14 > + por %xmm5,%xmm1 > + por %xmm7,%xmm11 > + pxor %xmm4,%xmm2 > + movdqa 240-128(%rax),%xmm4 > + > + movdqa %xmm14,%xmm8 > + 
movdqa %xmm12,%xmm6 > + pxor 80-128(%rax),%xmm2 > + paddd %xmm15,%xmm13 > + pslld $5,%xmm8 > + pxor %xmm10,%xmm6 > + > + movdqa %xmm14,%xmm9 > + paddd %xmm1,%xmm13 > + pxor %xmm4,%xmm2 > + psrld $27,%xmm9 > + pxor %xmm11,%xmm6 > + movdqa %xmm10,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm2,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm13 > + paddd %xmm2,%xmm2 > + > + psrld $2,%xmm10 > + paddd %xmm8,%xmm13 > + por %xmm5,%xmm2 > + por %xmm7,%xmm10 > + pxor %xmm0,%xmm3 > + movdqa 0-128(%rax),%xmm0 > + > + movdqa %xmm13,%xmm8 > + movdqa %xmm11,%xmm6 > + pxor 96-128(%rax),%xmm3 > + paddd %xmm15,%xmm12 > + pslld $5,%xmm8 > + pxor %xmm14,%xmm6 > + > + movdqa %xmm13,%xmm9 > + paddd %xmm2,%xmm12 > + pxor %xmm0,%xmm3 > + psrld $27,%xmm9 > + pxor %xmm10,%xmm6 > + movdqa %xmm14,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm3,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm12 > + paddd %xmm3,%xmm3 > + > + psrld $2,%xmm14 > + paddd %xmm8,%xmm12 > + por %xmm5,%xmm3 > + por %xmm7,%xmm14 > + pxor %xmm1,%xmm4 > + movdqa 16-128(%rax),%xmm1 > + > + movdqa %xmm12,%xmm8 > + movdqa %xmm10,%xmm6 > + pxor 112-128(%rax),%xmm4 > + paddd %xmm15,%xmm11 > + pslld $5,%xmm8 > + pxor %xmm13,%xmm6 > + > + movdqa %xmm12,%xmm9 > + paddd %xmm3,%xmm11 > + pxor %xmm1,%xmm4 > + psrld $27,%xmm9 > + pxor %xmm14,%xmm6 > + movdqa %xmm13,%xmm7 > + > + pslld $30,%xmm7 > + movdqa %xmm4,%xmm5 > + por %xmm9,%xmm8 > + psrld $31,%xmm5 > + paddd %xmm6,%xmm11 > + paddd %xmm4,%xmm4 > + > + psrld $2,%xmm13 > + paddd %xmm8,%xmm11 > + por %xmm5,%xmm4 > + por %xmm7,%xmm13 > + movdqa %xmm11,%xmm8 > + paddd %xmm15,%xmm10 > + movdqa %xmm14,%xmm6 > + pslld $5,%xmm8 > + pxor %xmm12,%xmm6 > + > + movdqa %xmm11,%xmm9 > + paddd %xmm4,%xmm10 > + psrld $27,%xmm9 > + movdqa %xmm12,%xmm7 > + pxor %xmm13,%xmm6 > + > + pslld $30,%xmm7 > + por %xmm9,%xmm8 > + paddd %xmm6,%xmm10 > + > + psrld $2,%xmm12 > + paddd %xmm8,%xmm10 > + por %xmm7,%xmm12 > + movdqa (%rbx),%xmm0 > + movl $1,%ecx > + cmpl 0(%rbx),%ecx > + pxor %xmm8,%xmm8 > + cmovgeq %rbp,%r8 > + cmpl 4(%rbx),%ecx > + movdqa %xmm0,%xmm1 > + cmovgeq %rbp,%r9 > + cmpl 8(%rbx),%ecx > + pcmpgtd %xmm8,%xmm1 > + cmovgeq %rbp,%r10 > + cmpl 12(%rbx),%ecx > + paddd %xmm1,%xmm0 > + cmovgeq %rbp,%r11 > + > + movdqu 0(%rdi),%xmm6 > + pand %xmm1,%xmm10 > + movdqu 32(%rdi),%xmm7 > + pand %xmm1,%xmm11 > + paddd %xmm6,%xmm10 > + movdqu 64(%rdi),%xmm8 > + pand %xmm1,%xmm12 > + paddd %xmm7,%xmm11 > + movdqu 96(%rdi),%xmm9 > + pand %xmm1,%xmm13 > + paddd %xmm8,%xmm12 > + movdqu 128(%rdi),%xmm5 > + pand %xmm1,%xmm14 > + movdqu %xmm10,0(%rdi) > + paddd %xmm9,%xmm13 > + movdqu %xmm11,32(%rdi) > + paddd %xmm5,%xmm14 > + movdqu %xmm12,64(%rdi) > + movdqu %xmm13,96(%rdi) > + movdqu %xmm14,128(%rdi) > + > + movdqa %xmm0,(%rbx) > + movdqa 96(%rbp),%xmm5 > + movdqa -32(%rbp),%xmm15 > + decl %edx > + jnz .Loop > + > + movl 280(%rsp),%edx > + leaq 16(%rdi),%rdi > + leaq 64(%rsi),%rsi > + decl %edx > + jnz .Loop_grande > + > +.Ldone: > + movq 272(%rsp),%rax > +.cfi_def_cfa %rax,8 > + movq -16(%rax),%rbp > +.cfi_restore %rbp > + movq -8(%rax),%rbx > +.cfi_restore %rbx > + leaq (%rax),%rsp > +.cfi_def_cfa_register %rsp > +.Lepilogue: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size sha1_multi_block,.-sha1_multi_block > +.type sha1_multi_block_shaext,@function > +.align 32 > +sha1_multi_block_shaext: > +.cfi_startproc > +_shaext_shortcut: > + movq %rsp,%rax > +.cfi_def_cfa_register %rax > + pushq %rbx > +.cfi_offset %rbx,-16 > + pushq %rbp > +.cfi_offset %rbp,-24 > + subq $288,%rsp > + shll $1,%edx > + 
andq $-256,%rsp > + leaq 64(%rdi),%rdi > + movq %rax,272(%rsp) > +.Lbody_shaext: > + leaq 256(%rsp),%rbx > + movdqa K_XX_XX+128(%rip),%xmm3 > + > +.Loop_grande_shaext: > + movl %edx,280(%rsp) > + xorl %edx,%edx > + movq 0(%rsi),%r8 > + movl 8(%rsi),%ecx > + cmpl %edx,%ecx > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movl %ecx,0(%rbx) > + cmovleq %rsp,%r8 > + movq 16(%rsi),%r9 > + movl 24(%rsi),%ecx > + cmpl %edx,%ecx > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movl %ecx,4(%rbx) > + cmovleq %rsp,%r9 > + testl %edx,%edx > + jz .Ldone_shaext > + > + movq 0-64(%rdi),%xmm0 > + movq 32-64(%rdi),%xmm4 > + movq 64-64(%rdi),%xmm5 > + movq 96-64(%rdi),%xmm6 > + movq 128-64(%rdi),%xmm7 > + > + punpckldq %xmm4,%xmm0 > + punpckldq %xmm6,%xmm5 > + > + movdqa %xmm0,%xmm8 > + punpcklqdq %xmm5,%xmm0 > + punpckhqdq %xmm5,%xmm8 > + > + pshufd $63,%xmm7,%xmm1 > + pshufd $127,%xmm7,%xmm9 > + pshufd $27,%xmm0,%xmm0 > + pshufd $27,%xmm8,%xmm8 > + jmp .Loop_shaext > + > +.align 32 > +.Loop_shaext: > + movdqu 0(%r8),%xmm4 > + movdqu 0(%r9),%xmm11 > + movdqu 16(%r8),%xmm5 > + movdqu 16(%r9),%xmm12 > + movdqu 32(%r8),%xmm6 > +.byte 102,15,56,0,227 > + movdqu 32(%r9),%xmm13 > +.byte 102,68,15,56,0,219 > + movdqu 48(%r8),%xmm7 > + leaq 64(%r8),%r8 > +.byte 102,15,56,0,235 > + movdqu 48(%r9),%xmm14 > + leaq 64(%r9),%r9 > +.byte 102,68,15,56,0,227 > + > + movdqa %xmm1,80(%rsp) > + paddd %xmm4,%xmm1 > + movdqa %xmm9,112(%rsp) > + paddd %xmm11,%xmm9 > + movdqa %xmm0,64(%rsp) > + movdqa %xmm0,%xmm2 > + movdqa %xmm8,96(%rsp) > + movdqa %xmm8,%xmm10 > +.byte 15,58,204,193,0 > +.byte 15,56,200,213 > +.byte 69,15,58,204,193,0 > +.byte 69,15,56,200,212 > +.byte 102,15,56,0,243 > + prefetcht0 127(%r8) > +.byte 15,56,201,229 > +.byte 102,68,15,56,0,235 > + prefetcht0 127(%r9) > +.byte 69,15,56,201,220 > + > +.byte 102,15,56,0,251 > + movdqa %xmm0,%xmm1 > +.byte 102,68,15,56,0,243 > + movdqa %xmm8,%xmm9 > +.byte 15,58,204,194,0 > +.byte 15,56,200,206 > +.byte 69,15,58,204,194,0 > +.byte 69,15,56,200,205 > + pxor %xmm6,%xmm4 > +.byte 15,56,201,238 > + pxor %xmm13,%xmm11 > +.byte 69,15,56,201,229 > + movdqa %xmm0,%xmm2 > + movdqa %xmm8,%xmm10 > +.byte 15,58,204,193,0 > +.byte 15,56,200,215 > +.byte 69,15,58,204,193,0 > +.byte 69,15,56,200,214 > +.byte 15,56,202,231 > +.byte 69,15,56,202,222 > + pxor %xmm7,%xmm5 > +.byte 15,56,201,247 > + pxor %xmm14,%xmm12 > +.byte 69,15,56,201,238 > + movdqa %xmm0,%xmm1 > + movdqa %xmm8,%xmm9 > +.byte 15,58,204,194,0 > +.byte 15,56,200,204 > +.byte 69,15,58,204,194,0 > +.byte 69,15,56,200,203 > +.byte 15,56,202,236 > +.byte 69,15,56,202,227 > + pxor %xmm4,%xmm6 > +.byte 15,56,201,252 > + pxor %xmm11,%xmm13 > +.byte 69,15,56,201,243 > + movdqa %xmm0,%xmm2 > + movdqa %xmm8,%xmm10 > +.byte 15,58,204,193,0 > +.byte 15,56,200,213 > +.byte 69,15,58,204,193,0 > +.byte 69,15,56,200,212 > +.byte 15,56,202,245 > +.byte 69,15,56,202,236 > + pxor %xmm5,%xmm7 > +.byte 15,56,201,229 > + pxor %xmm12,%xmm14 > +.byte 69,15,56,201,220 > + movdqa %xmm0,%xmm1 > + movdqa %xmm8,%xmm9 > +.byte 15,58,204,194,1 > +.byte 15,56,200,206 > +.byte 69,15,58,204,194,1 > +.byte 69,15,56,200,205 > +.byte 15,56,202,254 > +.byte 69,15,56,202,245 > + pxor %xmm6,%xmm4 > +.byte 15,56,201,238 > + pxor %xmm13,%xmm11 > +.byte 69,15,56,201,229 > + movdqa %xmm0,%xmm2 > + movdqa %xmm8,%xmm10 > +.byte 15,58,204,193,1 > +.byte 15,56,200,215 > +.byte 69,15,58,204,193,1 > +.byte 69,15,56,200,214 > +.byte 15,56,202,231 > +.byte 69,15,56,202,222 > + pxor %xmm7,%xmm5 > +.byte 15,56,201,247 > + pxor %xmm14,%xmm12 > +.byte 69,15,56,201,238 > + movdqa 
%xmm0,%xmm1 > + movdqa %xmm8,%xmm9 > +.byte 15,58,204,194,1 > +.byte 15,56,200,204 > +.byte 69,15,58,204,194,1 > +.byte 69,15,56,200,203 > +.byte 15,56,202,236 > +.byte 69,15,56,202,227 > + pxor %xmm4,%xmm6 > +.byte 15,56,201,252 > + pxor %xmm11,%xmm13 > +.byte 69,15,56,201,243 > + movdqa %xmm0,%xmm2 > + movdqa %xmm8,%xmm10 > +.byte 15,58,204,193,1 > +.byte 15,56,200,213 > +.byte 69,15,58,204,193,1 > +.byte 69,15,56,200,212 > +.byte 15,56,202,245 > +.byte 69,15,56,202,236 > + pxor %xmm5,%xmm7 > +.byte 15,56,201,229 > + pxor %xmm12,%xmm14 > +.byte 69,15,56,201,220 > + movdqa %xmm0,%xmm1 > + movdqa %xmm8,%xmm9 > +.byte 15,58,204,194,1 > +.byte 15,56,200,206 > +.byte 69,15,58,204,194,1 > +.byte 69,15,56,200,205 > +.byte 15,56,202,254 > +.byte 69,15,56,202,245 > + pxor %xmm6,%xmm4 > +.byte 15,56,201,238 > + pxor %xmm13,%xmm11 > +.byte 69,15,56,201,229 > + movdqa %xmm0,%xmm2 > + movdqa %xmm8,%xmm10 > +.byte 15,58,204,193,2 > +.byte 15,56,200,215 > +.byte 69,15,58,204,193,2 > +.byte 69,15,56,200,214 > +.byte 15,56,202,231 > +.byte 69,15,56,202,222 > + pxor %xmm7,%xmm5 > +.byte 15,56,201,247 > + pxor %xmm14,%xmm12 > +.byte 69,15,56,201,238 > + movdqa %xmm0,%xmm1 > + movdqa %xmm8,%xmm9 > +.byte 15,58,204,194,2 > +.byte 15,56,200,204 > +.byte 69,15,58,204,194,2 > +.byte 69,15,56,200,203 > +.byte 15,56,202,236 > +.byte 69,15,56,202,227 > + pxor %xmm4,%xmm6 > +.byte 15,56,201,252 > + pxor %xmm11,%xmm13 > +.byte 69,15,56,201,243 > + movdqa %xmm0,%xmm2 > + movdqa %xmm8,%xmm10 > +.byte 15,58,204,193,2 > +.byte 15,56,200,213 > +.byte 69,15,58,204,193,2 > +.byte 69,15,56,200,212 > +.byte 15,56,202,245 > +.byte 69,15,56,202,236 > + pxor %xmm5,%xmm7 > +.byte 15,56,201,229 > + pxor %xmm12,%xmm14 > +.byte 69,15,56,201,220 > + movdqa %xmm0,%xmm1 > + movdqa %xmm8,%xmm9 > +.byte 15,58,204,194,2 > +.byte 15,56,200,206 > +.byte 69,15,58,204,194,2 > +.byte 69,15,56,200,205 > +.byte 15,56,202,254 > +.byte 69,15,56,202,245 > + pxor %xmm6,%xmm4 > +.byte 15,56,201,238 > + pxor %xmm13,%xmm11 > +.byte 69,15,56,201,229 > + movdqa %xmm0,%xmm2 > + movdqa %xmm8,%xmm10 > +.byte 15,58,204,193,2 > +.byte 15,56,200,215 > +.byte 69,15,58,204,193,2 > +.byte 69,15,56,200,214 > +.byte 15,56,202,231 > +.byte 69,15,56,202,222 > + pxor %xmm7,%xmm5 > +.byte 15,56,201,247 > + pxor %xmm14,%xmm12 > +.byte 69,15,56,201,238 > + movdqa %xmm0,%xmm1 > + movdqa %xmm8,%xmm9 > +.byte 15,58,204,194,3 > +.byte 15,56,200,204 > +.byte 69,15,58,204,194,3 > +.byte 69,15,56,200,203 > +.byte 15,56,202,236 > +.byte 69,15,56,202,227 > + pxor %xmm4,%xmm6 > +.byte 15,56,201,252 > + pxor %xmm11,%xmm13 > +.byte 69,15,56,201,243 > + movdqa %xmm0,%xmm2 > + movdqa %xmm8,%xmm10 > +.byte 15,58,204,193,3 > +.byte 15,56,200,213 > +.byte 69,15,58,204,193,3 > +.byte 69,15,56,200,212 > +.byte 15,56,202,245 > +.byte 69,15,56,202,236 > + pxor %xmm5,%xmm7 > + pxor %xmm12,%xmm14 > + > + movl $1,%ecx > + pxor %xmm4,%xmm4 > + cmpl 0(%rbx),%ecx > + cmovgeq %rsp,%r8 > + > + movdqa %xmm0,%xmm1 > + movdqa %xmm8,%xmm9 > +.byte 15,58,204,194,3 > +.byte 15,56,200,206 > +.byte 69,15,58,204,194,3 > +.byte 69,15,56,200,205 > +.byte 15,56,202,254 > +.byte 69,15,56,202,245 > + > + cmpl 4(%rbx),%ecx > + cmovgeq %rsp,%r9 > + movq (%rbx),%xmm6 > + > + movdqa %xmm0,%xmm2 > + movdqa %xmm8,%xmm10 > +.byte 15,58,204,193,3 > +.byte 15,56,200,215 > +.byte 69,15,58,204,193,3 > +.byte 69,15,56,200,214 > + > + pshufd $0x00,%xmm6,%xmm11 > + pshufd $0x55,%xmm6,%xmm12 > + movdqa %xmm6,%xmm7 > + pcmpgtd %xmm4,%xmm11 > + pcmpgtd %xmm4,%xmm12 > + > + movdqa %xmm0,%xmm1 > + movdqa %xmm8,%xmm9 > +.byte 
15,58,204,194,3 > +.byte 15,56,200,204 > +.byte 69,15,58,204,194,3 > +.byte 68,15,56,200,204 > + > + pcmpgtd %xmm4,%xmm7 > + pand %xmm11,%xmm0 > + pand %xmm11,%xmm1 > + pand %xmm12,%xmm8 > + pand %xmm12,%xmm9 > + paddd %xmm7,%xmm6 > + > + paddd 64(%rsp),%xmm0 > + paddd 80(%rsp),%xmm1 > + paddd 96(%rsp),%xmm8 > + paddd 112(%rsp),%xmm9 > + > + movq %xmm6,(%rbx) > + decl %edx > + jnz .Loop_shaext > + > + movl 280(%rsp),%edx > + > + pshufd $27,%xmm0,%xmm0 > + pshufd $27,%xmm8,%xmm8 > + > + movdqa %xmm0,%xmm6 > + punpckldq %xmm8,%xmm0 > + punpckhdq %xmm8,%xmm6 > + punpckhdq %xmm9,%xmm1 > + movq %xmm0,0-64(%rdi) > + psrldq $8,%xmm0 > + movq %xmm6,64-64(%rdi) > + psrldq $8,%xmm6 > + movq %xmm0,32-64(%rdi) > + psrldq $8,%xmm1 > + movq %xmm6,96-64(%rdi) > + movq %xmm1,128-64(%rdi) > + > + leaq 8(%rdi),%rdi > + leaq 32(%rsi),%rsi > + decl %edx > + jnz .Loop_grande_shaext > + > +.Ldone_shaext: > + > + movq -16(%rax),%rbp > +.cfi_restore %rbp > + movq -8(%rax),%rbx > +.cfi_restore %rbx > + leaq (%rax),%rsp > +.cfi_def_cfa_register %rsp > +.Lepilogue_shaext: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size sha1_multi_block_shaext,.-sha1_multi_block_shaext > + > +.align 256 > +.long 0x5a827999,0x5a827999,0x5a827999,0x5a827999 > +.long 0x5a827999,0x5a827999,0x5a827999,0x5a827999 > +K_XX_XX: > +.long 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > +.long 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > +.long 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > +.long 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > +.long 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > +.long 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > +.byte 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0 > +.byte > 83,72,65,49,32,109,117,108,116,105,45,98,108,111,99,107,32,116,114,97,110, > 115,102,111,114,109,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80, > 84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,11 > 5,115,108,46,111,114,103,62,0 > diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-x86_64.S > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-x86_64.S > new file mode 100644 > index 0000000000..0b59726ae4 > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-x86_64.S > @@ -0,0 +1,2631 @@ > +# WARNING: do not edit! > +# Generated from openssl/crypto/sha/asm/sha1-x86_64.pl > +# > +# Copyright 2006-2020 The OpenSSL Project Authors. All Rights Reserved. > +# > +# Licensed under the OpenSSL license (the "License"). You may not use > +# this file except in compliance with the License. 
You can obtain a copy > +# in the file LICENSE in the source distribution or at > +# https://www.openssl.org/source/license.html > + > +.text > + > + > +.globl sha1_block_data_order > +.type sha1_block_data_order,@function > +.align 16 > +sha1_block_data_order: > +.cfi_startproc > + movl OPENSSL_ia32cap_P+0(%rip),%r9d > + movl OPENSSL_ia32cap_P+4(%rip),%r8d > + movl OPENSSL_ia32cap_P+8(%rip),%r10d > + testl $512,%r8d > + jz .Lialu > + testl $536870912,%r10d > + jnz _shaext_shortcut > + jmp _ssse3_shortcut > + > +.align 16 > +.Lialu: > + movq %rsp,%rax > +.cfi_def_cfa_register %rax > + pushq %rbx > +.cfi_offset %rbx,-16 > + pushq %rbp > +.cfi_offset %rbp,-24 > + pushq %r12 > +.cfi_offset %r12,-32 > + pushq %r13 > +.cfi_offset %r13,-40 > + pushq %r14 > +.cfi_offset %r14,-48 > + movq %rdi,%r8 > + subq $72,%rsp > + movq %rsi,%r9 > + andq $-64,%rsp > + movq %rdx,%r10 > + movq %rax,64(%rsp) > +.cfi_escape 0x0f,0x06,0x77,0xc0,0x00,0x06,0x23,0x08 > +.Lprologue: > + > + movl 0(%r8),%esi > + movl 4(%r8),%edi > + movl 8(%r8),%r11d > + movl 12(%r8),%r12d > + movl 16(%r8),%r13d > + jmp .Lloop > + > +.align 16 > +.Lloop: > + movl 0(%r9),%edx > + bswapl %edx > + movl 4(%r9),%ebp > + movl %r12d,%eax > + movl %edx,0(%rsp) > + movl %esi,%ecx > + bswapl %ebp > + xorl %r11d,%eax > + roll $5,%ecx > + andl %edi,%eax > + leal 1518500249(%rdx,%r13,1),%r13d > + addl %ecx,%r13d > + xorl %r12d,%eax > + roll $30,%edi > + addl %eax,%r13d > + movl 8(%r9),%r14d > + movl %r11d,%eax > + movl %ebp,4(%rsp) > + movl %r13d,%ecx > + bswapl %r14d > + xorl %edi,%eax > + roll $5,%ecx > + andl %esi,%eax > + leal 1518500249(%rbp,%r12,1),%r12d > + addl %ecx,%r12d > + xorl %r11d,%eax > + roll $30,%esi > + addl %eax,%r12d > + movl 12(%r9),%edx > + movl %edi,%eax > + movl %r14d,8(%rsp) > + movl %r12d,%ecx > + bswapl %edx > + xorl %esi,%eax > + roll $5,%ecx > + andl %r13d,%eax > + leal 1518500249(%r14,%r11,1),%r11d > + addl %ecx,%r11d > + xorl %edi,%eax > + roll $30,%r13d > + addl %eax,%r11d > + movl 16(%r9),%ebp > + movl %esi,%eax > + movl %edx,12(%rsp) > + movl %r11d,%ecx > + bswapl %ebp > + xorl %r13d,%eax > + roll $5,%ecx > + andl %r12d,%eax > + leal 1518500249(%rdx,%rdi,1),%edi > + addl %ecx,%edi > + xorl %esi,%eax > + roll $30,%r12d > + addl %eax,%edi > + movl 20(%r9),%r14d > + movl %r13d,%eax > + movl %ebp,16(%rsp) > + movl %edi,%ecx > + bswapl %r14d > + xorl %r12d,%eax > + roll $5,%ecx > + andl %r11d,%eax > + leal 1518500249(%rbp,%rsi,1),%esi > + addl %ecx,%esi > + xorl %r13d,%eax > + roll $30,%r11d > + addl %eax,%esi > + movl 24(%r9),%edx > + movl %r12d,%eax > + movl %r14d,20(%rsp) > + movl %esi,%ecx > + bswapl %edx > + xorl %r11d,%eax > + roll $5,%ecx > + andl %edi,%eax > + leal 1518500249(%r14,%r13,1),%r13d > + addl %ecx,%r13d > + xorl %r12d,%eax > + roll $30,%edi > + addl %eax,%r13d > + movl 28(%r9),%ebp > + movl %r11d,%eax > + movl %edx,24(%rsp) > + movl %r13d,%ecx > + bswapl %ebp > + xorl %edi,%eax > + roll $5,%ecx > + andl %esi,%eax > + leal 1518500249(%rdx,%r12,1),%r12d > + addl %ecx,%r12d > + xorl %r11d,%eax > + roll $30,%esi > + addl %eax,%r12d > + movl 32(%r9),%r14d > + movl %edi,%eax > + movl %ebp,28(%rsp) > + movl %r12d,%ecx > + bswapl %r14d > + xorl %esi,%eax > + roll $5,%ecx > + andl %r13d,%eax > + leal 1518500249(%rbp,%r11,1),%r11d > + addl %ecx,%r11d > + xorl %edi,%eax > + roll $30,%r13d > + addl %eax,%r11d > + movl 36(%r9),%edx > + movl %esi,%eax > + movl %r14d,32(%rsp) > + movl %r11d,%ecx > + bswapl %edx > + xorl %r13d,%eax > + roll $5,%ecx > + andl %r12d,%eax > + leal 1518500249(%r14,%rdi,1),%edi > + addl 
%ecx,%edi > + xorl %esi,%eax > + roll $30,%r12d > + addl %eax,%edi > + movl 40(%r9),%ebp > + movl %r13d,%eax > + movl %edx,36(%rsp) > + movl %edi,%ecx > + bswapl %ebp > + xorl %r12d,%eax > + roll $5,%ecx > + andl %r11d,%eax > + leal 1518500249(%rdx,%rsi,1),%esi > + addl %ecx,%esi > + xorl %r13d,%eax > + roll $30,%r11d > + addl %eax,%esi > + movl 44(%r9),%r14d > + movl %r12d,%eax > + movl %ebp,40(%rsp) > + movl %esi,%ecx > + bswapl %r14d > + xorl %r11d,%eax > + roll $5,%ecx > + andl %edi,%eax > + leal 1518500249(%rbp,%r13,1),%r13d > + addl %ecx,%r13d > + xorl %r12d,%eax > + roll $30,%edi > + addl %eax,%r13d > + movl 48(%r9),%edx > + movl %r11d,%eax > + movl %r14d,44(%rsp) > + movl %r13d,%ecx > + bswapl %edx > + xorl %edi,%eax > + roll $5,%ecx > + andl %esi,%eax > + leal 1518500249(%r14,%r12,1),%r12d > + addl %ecx,%r12d > + xorl %r11d,%eax > + roll $30,%esi > + addl %eax,%r12d > + movl 52(%r9),%ebp > + movl %edi,%eax > + movl %edx,48(%rsp) > + movl %r12d,%ecx > + bswapl %ebp > + xorl %esi,%eax > + roll $5,%ecx > + andl %r13d,%eax > + leal 1518500249(%rdx,%r11,1),%r11d > + addl %ecx,%r11d > + xorl %edi,%eax > + roll $30,%r13d > + addl %eax,%r11d > + movl 56(%r9),%r14d > + movl %esi,%eax > + movl %ebp,52(%rsp) > + movl %r11d,%ecx > + bswapl %r14d > + xorl %r13d,%eax > + roll $5,%ecx > + andl %r12d,%eax > + leal 1518500249(%rbp,%rdi,1),%edi > + addl %ecx,%edi > + xorl %esi,%eax > + roll $30,%r12d > + addl %eax,%edi > + movl 60(%r9),%edx > + movl %r13d,%eax > + movl %r14d,56(%rsp) > + movl %edi,%ecx > + bswapl %edx > + xorl %r12d,%eax > + roll $5,%ecx > + andl %r11d,%eax > + leal 1518500249(%r14,%rsi,1),%esi > + addl %ecx,%esi > + xorl %r13d,%eax > + roll $30,%r11d > + addl %eax,%esi > + xorl 0(%rsp),%ebp > + movl %r12d,%eax > + movl %edx,60(%rsp) > + movl %esi,%ecx > + xorl 8(%rsp),%ebp > + xorl %r11d,%eax > + roll $5,%ecx > + xorl 32(%rsp),%ebp > + andl %edi,%eax > + leal 1518500249(%rdx,%r13,1),%r13d > + roll $30,%edi > + xorl %r12d,%eax > + addl %ecx,%r13d > + roll $1,%ebp > + addl %eax,%r13d > + xorl 4(%rsp),%r14d > + movl %r11d,%eax > + movl %ebp,0(%rsp) > + movl %r13d,%ecx > + xorl 12(%rsp),%r14d > + xorl %edi,%eax > + roll $5,%ecx > + xorl 36(%rsp),%r14d > + andl %esi,%eax > + leal 1518500249(%rbp,%r12,1),%r12d > + roll $30,%esi > + xorl %r11d,%eax > + addl %ecx,%r12d > + roll $1,%r14d > + addl %eax,%r12d > + xorl 8(%rsp),%edx > + movl %edi,%eax > + movl %r14d,4(%rsp) > + movl %r12d,%ecx > + xorl 16(%rsp),%edx > + xorl %esi,%eax > + roll $5,%ecx > + xorl 40(%rsp),%edx > + andl %r13d,%eax > + leal 1518500249(%r14,%r11,1),%r11d > + roll $30,%r13d > + xorl %edi,%eax > + addl %ecx,%r11d > + roll $1,%edx > + addl %eax,%r11d > + xorl 12(%rsp),%ebp > + movl %esi,%eax > + movl %edx,8(%rsp) > + movl %r11d,%ecx > + xorl 20(%rsp),%ebp > + xorl %r13d,%eax > + roll $5,%ecx > + xorl 44(%rsp),%ebp > + andl %r12d,%eax > + leal 1518500249(%rdx,%rdi,1),%edi > + roll $30,%r12d > + xorl %esi,%eax > + addl %ecx,%edi > + roll $1,%ebp > + addl %eax,%edi > + xorl 16(%rsp),%r14d > + movl %r13d,%eax > + movl %ebp,12(%rsp) > + movl %edi,%ecx > + xorl 24(%rsp),%r14d > + xorl %r12d,%eax > + roll $5,%ecx > + xorl 48(%rsp),%r14d > + andl %r11d,%eax > + leal 1518500249(%rbp,%rsi,1),%esi > + roll $30,%r11d > + xorl %r13d,%eax > + addl %ecx,%esi > + roll $1,%r14d > + addl %eax,%esi > + xorl 20(%rsp),%edx > + movl %edi,%eax > + movl %r14d,16(%rsp) > + movl %esi,%ecx > + xorl 28(%rsp),%edx > + xorl %r12d,%eax > + roll $5,%ecx > + xorl 52(%rsp),%edx > + leal 1859775393(%r14,%r13,1),%r13d > + xorl %r11d,%eax > + addl 
%ecx,%r13d > + roll $30,%edi > + addl %eax,%r13d > + roll $1,%edx > + xorl 24(%rsp),%ebp > + movl %esi,%eax > + movl %edx,20(%rsp) > + movl %r13d,%ecx > + xorl 32(%rsp),%ebp > + xorl %r11d,%eax > + roll $5,%ecx > + xorl 56(%rsp),%ebp > + leal 1859775393(%rdx,%r12,1),%r12d > + xorl %edi,%eax > + addl %ecx,%r12d > + roll $30,%esi > + addl %eax,%r12d > + roll $1,%ebp > + xorl 28(%rsp),%r14d > + movl %r13d,%eax > + movl %ebp,24(%rsp) > + movl %r12d,%ecx > + xorl 36(%rsp),%r14d > + xorl %edi,%eax > + roll $5,%ecx > + xorl 60(%rsp),%r14d > + leal 1859775393(%rbp,%r11,1),%r11d > + xorl %esi,%eax > + addl %ecx,%r11d > + roll $30,%r13d > + addl %eax,%r11d > + roll $1,%r14d > + xorl 32(%rsp),%edx > + movl %r12d,%eax > + movl %r14d,28(%rsp) > + movl %r11d,%ecx > + xorl 40(%rsp),%edx > + xorl %esi,%eax > + roll $5,%ecx > + xorl 0(%rsp),%edx > + leal 1859775393(%r14,%rdi,1),%edi > + xorl %r13d,%eax > + addl %ecx,%edi > + roll $30,%r12d > + addl %eax,%edi > + roll $1,%edx > + xorl 36(%rsp),%ebp > + movl %r11d,%eax > + movl %edx,32(%rsp) > + movl %edi,%ecx > + xorl 44(%rsp),%ebp > + xorl %r13d,%eax > + roll $5,%ecx > + xorl 4(%rsp),%ebp > + leal 1859775393(%rdx,%rsi,1),%esi > + xorl %r12d,%eax > + addl %ecx,%esi > + roll $30,%r11d > + addl %eax,%esi > + roll $1,%ebp > + xorl 40(%rsp),%r14d > + movl %edi,%eax > + movl %ebp,36(%rsp) > + movl %esi,%ecx > + xorl 48(%rsp),%r14d > + xorl %r12d,%eax > + roll $5,%ecx > + xorl 8(%rsp),%r14d > + leal 1859775393(%rbp,%r13,1),%r13d > + xorl %r11d,%eax > + addl %ecx,%r13d > + roll $30,%edi > + addl %eax,%r13d > + roll $1,%r14d > + xorl 44(%rsp),%edx > + movl %esi,%eax > + movl %r14d,40(%rsp) > + movl %r13d,%ecx > + xorl 52(%rsp),%edx > + xorl %r11d,%eax > + roll $5,%ecx > + xorl 12(%rsp),%edx > + leal 1859775393(%r14,%r12,1),%r12d > + xorl %edi,%eax > + addl %ecx,%r12d > + roll $30,%esi > + addl %eax,%r12d > + roll $1,%edx > + xorl 48(%rsp),%ebp > + movl %r13d,%eax > + movl %edx,44(%rsp) > + movl %r12d,%ecx > + xorl 56(%rsp),%ebp > + xorl %edi,%eax > + roll $5,%ecx > + xorl 16(%rsp),%ebp > + leal 1859775393(%rdx,%r11,1),%r11d > + xorl %esi,%eax > + addl %ecx,%r11d > + roll $30,%r13d > + addl %eax,%r11d > + roll $1,%ebp > + xorl 52(%rsp),%r14d > + movl %r12d,%eax > + movl %ebp,48(%rsp) > + movl %r11d,%ecx > + xorl 60(%rsp),%r14d > + xorl %esi,%eax > + roll $5,%ecx > + xorl 20(%rsp),%r14d > + leal 1859775393(%rbp,%rdi,1),%edi > + xorl %r13d,%eax > + addl %ecx,%edi > + roll $30,%r12d > + addl %eax,%edi > + roll $1,%r14d > + xorl 56(%rsp),%edx > + movl %r11d,%eax > + movl %r14d,52(%rsp) > + movl %edi,%ecx > + xorl 0(%rsp),%edx > + xorl %r13d,%eax > + roll $5,%ecx > + xorl 24(%rsp),%edx > + leal 1859775393(%r14,%rsi,1),%esi > + xorl %r12d,%eax > + addl %ecx,%esi > + roll $30,%r11d > + addl %eax,%esi > + roll $1,%edx > + xorl 60(%rsp),%ebp > + movl %edi,%eax > + movl %edx,56(%rsp) > + movl %esi,%ecx > + xorl 4(%rsp),%ebp > + xorl %r12d,%eax > + roll $5,%ecx > + xorl 28(%rsp),%ebp > + leal 1859775393(%rdx,%r13,1),%r13d > + xorl %r11d,%eax > + addl %ecx,%r13d > + roll $30,%edi > + addl %eax,%r13d > + roll $1,%ebp > + xorl 0(%rsp),%r14d > + movl %esi,%eax > + movl %ebp,60(%rsp) > + movl %r13d,%ecx > + xorl 8(%rsp),%r14d > + xorl %r11d,%eax > + roll $5,%ecx > + xorl 32(%rsp),%r14d > + leal 1859775393(%rbp,%r12,1),%r12d > + xorl %edi,%eax > + addl %ecx,%r12d > + roll $30,%esi > + addl %eax,%r12d > + roll $1,%r14d > + xorl 4(%rsp),%edx > + movl %r13d,%eax > + movl %r14d,0(%rsp) > + movl %r12d,%ecx > + xorl 12(%rsp),%edx > + xorl %edi,%eax > + roll $5,%ecx > + xorl 36(%rsp),%edx 
> + leal 1859775393(%r14,%r11,1),%r11d > + xorl %esi,%eax > + addl %ecx,%r11d > + roll $30,%r13d > + addl %eax,%r11d > + roll $1,%edx > + xorl 8(%rsp),%ebp > + movl %r12d,%eax > + movl %edx,4(%rsp) > + movl %r11d,%ecx > + xorl 16(%rsp),%ebp > + xorl %esi,%eax > + roll $5,%ecx > + xorl 40(%rsp),%ebp > + leal 1859775393(%rdx,%rdi,1),%edi > + xorl %r13d,%eax > + addl %ecx,%edi > + roll $30,%r12d > + addl %eax,%edi > + roll $1,%ebp > + xorl 12(%rsp),%r14d > + movl %r11d,%eax > + movl %ebp,8(%rsp) > + movl %edi,%ecx > + xorl 20(%rsp),%r14d > + xorl %r13d,%eax > + roll $5,%ecx > + xorl 44(%rsp),%r14d > + leal 1859775393(%rbp,%rsi,1),%esi > + xorl %r12d,%eax > + addl %ecx,%esi > + roll $30,%r11d > + addl %eax,%esi > + roll $1,%r14d > + xorl 16(%rsp),%edx > + movl %edi,%eax > + movl %r14d,12(%rsp) > + movl %esi,%ecx > + xorl 24(%rsp),%edx > + xorl %r12d,%eax > + roll $5,%ecx > + xorl 48(%rsp),%edx > + leal 1859775393(%r14,%r13,1),%r13d > + xorl %r11d,%eax > + addl %ecx,%r13d > + roll $30,%edi > + addl %eax,%r13d > + roll $1,%edx > + xorl 20(%rsp),%ebp > + movl %esi,%eax > + movl %edx,16(%rsp) > + movl %r13d,%ecx > + xorl 28(%rsp),%ebp > + xorl %r11d,%eax > + roll $5,%ecx > + xorl 52(%rsp),%ebp > + leal 1859775393(%rdx,%r12,1),%r12d > + xorl %edi,%eax > + addl %ecx,%r12d > + roll $30,%esi > + addl %eax,%r12d > + roll $1,%ebp > + xorl 24(%rsp),%r14d > + movl %r13d,%eax > + movl %ebp,20(%rsp) > + movl %r12d,%ecx > + xorl 32(%rsp),%r14d > + xorl %edi,%eax > + roll $5,%ecx > + xorl 56(%rsp),%r14d > + leal 1859775393(%rbp,%r11,1),%r11d > + xorl %esi,%eax > + addl %ecx,%r11d > + roll $30,%r13d > + addl %eax,%r11d > + roll $1,%r14d > + xorl 28(%rsp),%edx > + movl %r12d,%eax > + movl %r14d,24(%rsp) > + movl %r11d,%ecx > + xorl 36(%rsp),%edx > + xorl %esi,%eax > + roll $5,%ecx > + xorl 60(%rsp),%edx > + leal 1859775393(%r14,%rdi,1),%edi > + xorl %r13d,%eax > + addl %ecx,%edi > + roll $30,%r12d > + addl %eax,%edi > + roll $1,%edx > + xorl 32(%rsp),%ebp > + movl %r11d,%eax > + movl %edx,28(%rsp) > + movl %edi,%ecx > + xorl 40(%rsp),%ebp > + xorl %r13d,%eax > + roll $5,%ecx > + xorl 0(%rsp),%ebp > + leal 1859775393(%rdx,%rsi,1),%esi > + xorl %r12d,%eax > + addl %ecx,%esi > + roll $30,%r11d > + addl %eax,%esi > + roll $1,%ebp > + xorl 36(%rsp),%r14d > + movl %r12d,%eax > + movl %ebp,32(%rsp) > + movl %r12d,%ebx > + xorl 44(%rsp),%r14d > + andl %r11d,%eax > + movl %esi,%ecx > + xorl 4(%rsp),%r14d > + leal -1894007588(%rbp,%r13,1),%r13d > + xorl %r11d,%ebx > + roll $5,%ecx > + addl %eax,%r13d > + roll $1,%r14d > + andl %edi,%ebx > + addl %ecx,%r13d > + roll $30,%edi > + addl %ebx,%r13d > + xorl 40(%rsp),%edx > + movl %r11d,%eax > + movl %r14d,36(%rsp) > + movl %r11d,%ebx > + xorl 48(%rsp),%edx > + andl %edi,%eax > + movl %r13d,%ecx > + xorl 8(%rsp),%edx > + leal -1894007588(%r14,%r12,1),%r12d > + xorl %edi,%ebx > + roll $5,%ecx > + addl %eax,%r12d > + roll $1,%edx > + andl %esi,%ebx > + addl %ecx,%r12d > + roll $30,%esi > + addl %ebx,%r12d > + xorl 44(%rsp),%ebp > + movl %edi,%eax > + movl %edx,40(%rsp) > + movl %edi,%ebx > + xorl 52(%rsp),%ebp > + andl %esi,%eax > + movl %r12d,%ecx > + xorl 12(%rsp),%ebp > + leal -1894007588(%rdx,%r11,1),%r11d > + xorl %esi,%ebx > + roll $5,%ecx > + addl %eax,%r11d > + roll $1,%ebp > + andl %r13d,%ebx > + addl %ecx,%r11d > + roll $30,%r13d > + addl %ebx,%r11d > + xorl 48(%rsp),%r14d > + movl %esi,%eax > + movl %ebp,44(%rsp) > + movl %esi,%ebx > + xorl 56(%rsp),%r14d > + andl %r13d,%eax > + movl %r11d,%ecx > + xorl 16(%rsp),%r14d > + leal -1894007588(%rbp,%rdi,1),%edi > + xorl 
%r13d,%ebx > + roll $5,%ecx > + addl %eax,%edi > + roll $1,%r14d > + andl %r12d,%ebx > + addl %ecx,%edi > + roll $30,%r12d > + addl %ebx,%edi > + xorl 52(%rsp),%edx > + movl %r13d,%eax > + movl %r14d,48(%rsp) > + movl %r13d,%ebx > + xorl 60(%rsp),%edx > + andl %r12d,%eax > + movl %edi,%ecx > + xorl 20(%rsp),%edx > + leal -1894007588(%r14,%rsi,1),%esi > + xorl %r12d,%ebx > + roll $5,%ecx > + addl %eax,%esi > + roll $1,%edx > + andl %r11d,%ebx > + addl %ecx,%esi > + roll $30,%r11d > + addl %ebx,%esi > + xorl 56(%rsp),%ebp > + movl %r12d,%eax > + movl %edx,52(%rsp) > + movl %r12d,%ebx > + xorl 0(%rsp),%ebp > + andl %r11d,%eax > + movl %esi,%ecx > + xorl 24(%rsp),%ebp > + leal -1894007588(%rdx,%r13,1),%r13d > + xorl %r11d,%ebx > + roll $5,%ecx > + addl %eax,%r13d > + roll $1,%ebp > + andl %edi,%ebx > + addl %ecx,%r13d > + roll $30,%edi > + addl %ebx,%r13d > + xorl 60(%rsp),%r14d > + movl %r11d,%eax > + movl %ebp,56(%rsp) > + movl %r11d,%ebx > + xorl 4(%rsp),%r14d > + andl %edi,%eax > + movl %r13d,%ecx > + xorl 28(%rsp),%r14d > + leal -1894007588(%rbp,%r12,1),%r12d > + xorl %edi,%ebx > + roll $5,%ecx > + addl %eax,%r12d > + roll $1,%r14d > + andl %esi,%ebx > + addl %ecx,%r12d > + roll $30,%esi > + addl %ebx,%r12d > + xorl 0(%rsp),%edx > + movl %edi,%eax > + movl %r14d,60(%rsp) > + movl %edi,%ebx > + xorl 8(%rsp),%edx > + andl %esi,%eax > + movl %r12d,%ecx > + xorl 32(%rsp),%edx > + leal -1894007588(%r14,%r11,1),%r11d > + xorl %esi,%ebx > + roll $5,%ecx > + addl %eax,%r11d > + roll $1,%edx > + andl %r13d,%ebx > + addl %ecx,%r11d > + roll $30,%r13d > + addl %ebx,%r11d > + xorl 4(%rsp),%ebp > + movl %esi,%eax > + movl %edx,0(%rsp) > + movl %esi,%ebx > + xorl 12(%rsp),%ebp > + andl %r13d,%eax > + movl %r11d,%ecx > + xorl 36(%rsp),%ebp > + leal -1894007588(%rdx,%rdi,1),%edi > + xorl %r13d,%ebx > + roll $5,%ecx > + addl %eax,%edi > + roll $1,%ebp > + andl %r12d,%ebx > + addl %ecx,%edi > + roll $30,%r12d > + addl %ebx,%edi > + xorl 8(%rsp),%r14d > + movl %r13d,%eax > + movl %ebp,4(%rsp) > + movl %r13d,%ebx > + xorl 16(%rsp),%r14d > + andl %r12d,%eax > + movl %edi,%ecx > + xorl 40(%rsp),%r14d > + leal -1894007588(%rbp,%rsi,1),%esi > + xorl %r12d,%ebx > + roll $5,%ecx > + addl %eax,%esi > + roll $1,%r14d > + andl %r11d,%ebx > + addl %ecx,%esi > + roll $30,%r11d > + addl %ebx,%esi > + xorl 12(%rsp),%edx > + movl %r12d,%eax > + movl %r14d,8(%rsp) > + movl %r12d,%ebx > + xorl 20(%rsp),%edx > + andl %r11d,%eax > + movl %esi,%ecx > + xorl 44(%rsp),%edx > + leal -1894007588(%r14,%r13,1),%r13d > + xorl %r11d,%ebx > + roll $5,%ecx > + addl %eax,%r13d > + roll $1,%edx > + andl %edi,%ebx > + addl %ecx,%r13d > + roll $30,%edi > + addl %ebx,%r13d > + xorl 16(%rsp),%ebp > + movl %r11d,%eax > + movl %edx,12(%rsp) > + movl %r11d,%ebx > + xorl 24(%rsp),%ebp > + andl %edi,%eax > + movl %r13d,%ecx > + xorl 48(%rsp),%ebp > + leal -1894007588(%rdx,%r12,1),%r12d > + xorl %edi,%ebx > + roll $5,%ecx > + addl %eax,%r12d > + roll $1,%ebp > + andl %esi,%ebx > + addl %ecx,%r12d > + roll $30,%esi > + addl %ebx,%r12d > + xorl 20(%rsp),%r14d > + movl %edi,%eax > + movl %ebp,16(%rsp) > + movl %edi,%ebx > + xorl 28(%rsp),%r14d > + andl %esi,%eax > + movl %r12d,%ecx > + xorl 52(%rsp),%r14d > + leal -1894007588(%rbp,%r11,1),%r11d > + xorl %esi,%ebx > + roll $5,%ecx > + addl %eax,%r11d > + roll $1,%r14d > + andl %r13d,%ebx > + addl %ecx,%r11d > + roll $30,%r13d > + addl %ebx,%r11d > + xorl 24(%rsp),%edx > + movl %esi,%eax > + movl %r14d,20(%rsp) > + movl %esi,%ebx > + xorl 32(%rsp),%edx > + andl %r13d,%eax > + movl %r11d,%ecx > + xorl 
56(%rsp),%edx > + leal -1894007588(%r14,%rdi,1),%edi > + xorl %r13d,%ebx > + roll $5,%ecx > + addl %eax,%edi > + roll $1,%edx > + andl %r12d,%ebx > + addl %ecx,%edi > + roll $30,%r12d > + addl %ebx,%edi > + xorl 28(%rsp),%ebp > + movl %r13d,%eax > + movl %edx,24(%rsp) > + movl %r13d,%ebx > + xorl 36(%rsp),%ebp > + andl %r12d,%eax > + movl %edi,%ecx > + xorl 60(%rsp),%ebp > + leal -1894007588(%rdx,%rsi,1),%esi > + xorl %r12d,%ebx > + roll $5,%ecx > + addl %eax,%esi > + roll $1,%ebp > + andl %r11d,%ebx > + addl %ecx,%esi > + roll $30,%r11d > + addl %ebx,%esi > + xorl 32(%rsp),%r14d > + movl %r12d,%eax > + movl %ebp,28(%rsp) > + movl %r12d,%ebx > + xorl 40(%rsp),%r14d > + andl %r11d,%eax > + movl %esi,%ecx > + xorl 0(%rsp),%r14d > + leal -1894007588(%rbp,%r13,1),%r13d > + xorl %r11d,%ebx > + roll $5,%ecx > + addl %eax,%r13d > + roll $1,%r14d > + andl %edi,%ebx > + addl %ecx,%r13d > + roll $30,%edi > + addl %ebx,%r13d > + xorl 36(%rsp),%edx > + movl %r11d,%eax > + movl %r14d,32(%rsp) > + movl %r11d,%ebx > + xorl 44(%rsp),%edx > + andl %edi,%eax > + movl %r13d,%ecx > + xorl 4(%rsp),%edx > + leal -1894007588(%r14,%r12,1),%r12d > + xorl %edi,%ebx > + roll $5,%ecx > + addl %eax,%r12d > + roll $1,%edx > + andl %esi,%ebx > + addl %ecx,%r12d > + roll $30,%esi > + addl %ebx,%r12d > + xorl 40(%rsp),%ebp > + movl %edi,%eax > + movl %edx,36(%rsp) > + movl %edi,%ebx > + xorl 48(%rsp),%ebp > + andl %esi,%eax > + movl %r12d,%ecx > + xorl 8(%rsp),%ebp > + leal -1894007588(%rdx,%r11,1),%r11d > + xorl %esi,%ebx > + roll $5,%ecx > + addl %eax,%r11d > + roll $1,%ebp > + andl %r13d,%ebx > + addl %ecx,%r11d > + roll $30,%r13d > + addl %ebx,%r11d > + xorl 44(%rsp),%r14d > + movl %esi,%eax > + movl %ebp,40(%rsp) > + movl %esi,%ebx > + xorl 52(%rsp),%r14d > + andl %r13d,%eax > + movl %r11d,%ecx > + xorl 12(%rsp),%r14d > + leal -1894007588(%rbp,%rdi,1),%edi > + xorl %r13d,%ebx > + roll $5,%ecx > + addl %eax,%edi > + roll $1,%r14d > + andl %r12d,%ebx > + addl %ecx,%edi > + roll $30,%r12d > + addl %ebx,%edi > + xorl 48(%rsp),%edx > + movl %r13d,%eax > + movl %r14d,44(%rsp) > + movl %r13d,%ebx > + xorl 56(%rsp),%edx > + andl %r12d,%eax > + movl %edi,%ecx > + xorl 16(%rsp),%edx > + leal -1894007588(%r14,%rsi,1),%esi > + xorl %r12d,%ebx > + roll $5,%ecx > + addl %eax,%esi > + roll $1,%edx > + andl %r11d,%ebx > + addl %ecx,%esi > + roll $30,%r11d > + addl %ebx,%esi > + xorl 52(%rsp),%ebp > + movl %edi,%eax > + movl %edx,48(%rsp) > + movl %esi,%ecx > + xorl 60(%rsp),%ebp > + xorl %r12d,%eax > + roll $5,%ecx > + xorl 20(%rsp),%ebp > + leal -899497514(%rdx,%r13,1),%r13d > + xorl %r11d,%eax > + addl %ecx,%r13d > + roll $30,%edi > + addl %eax,%r13d > + roll $1,%ebp > + xorl 56(%rsp),%r14d > + movl %esi,%eax > + movl %ebp,52(%rsp) > + movl %r13d,%ecx > + xorl 0(%rsp),%r14d > + xorl %r11d,%eax > + roll $5,%ecx > + xorl 24(%rsp),%r14d > + leal -899497514(%rbp,%r12,1),%r12d > + xorl %edi,%eax > + addl %ecx,%r12d > + roll $30,%esi > + addl %eax,%r12d > + roll $1,%r14d > + xorl 60(%rsp),%edx > + movl %r13d,%eax > + movl %r14d,56(%rsp) > + movl %r12d,%ecx > + xorl 4(%rsp),%edx > + xorl %edi,%eax > + roll $5,%ecx > + xorl 28(%rsp),%edx > + leal -899497514(%r14,%r11,1),%r11d > + xorl %esi,%eax > + addl %ecx,%r11d > + roll $30,%r13d > + addl %eax,%r11d > + roll $1,%edx > + xorl 0(%rsp),%ebp > + movl %r12d,%eax > + movl %edx,60(%rsp) > + movl %r11d,%ecx > + xorl 8(%rsp),%ebp > + xorl %esi,%eax > + roll $5,%ecx > + xorl 32(%rsp),%ebp > + leal -899497514(%rdx,%rdi,1),%edi > + xorl %r13d,%eax > + addl %ecx,%edi > + roll $30,%r12d > + addl 
%eax,%edi > + roll $1,%ebp > + xorl 4(%rsp),%r14d > + movl %r11d,%eax > + movl %ebp,0(%rsp) > + movl %edi,%ecx > + xorl 12(%rsp),%r14d > + xorl %r13d,%eax > + roll $5,%ecx > + xorl 36(%rsp),%r14d > + leal -899497514(%rbp,%rsi,1),%esi > + xorl %r12d,%eax > + addl %ecx,%esi > + roll $30,%r11d > + addl %eax,%esi > + roll $1,%r14d > + xorl 8(%rsp),%edx > + movl %edi,%eax > + movl %r14d,4(%rsp) > + movl %esi,%ecx > + xorl 16(%rsp),%edx > + xorl %r12d,%eax > + roll $5,%ecx > + xorl 40(%rsp),%edx > + leal -899497514(%r14,%r13,1),%r13d > + xorl %r11d,%eax > + addl %ecx,%r13d > + roll $30,%edi > + addl %eax,%r13d > + roll $1,%edx > + xorl 12(%rsp),%ebp > + movl %esi,%eax > + movl %edx,8(%rsp) > + movl %r13d,%ecx > + xorl 20(%rsp),%ebp > + xorl %r11d,%eax > + roll $5,%ecx > + xorl 44(%rsp),%ebp > + leal -899497514(%rdx,%r12,1),%r12d > + xorl %edi,%eax > + addl %ecx,%r12d > + roll $30,%esi > + addl %eax,%r12d > + roll $1,%ebp > + xorl 16(%rsp),%r14d > + movl %r13d,%eax > + movl %ebp,12(%rsp) > + movl %r12d,%ecx > + xorl 24(%rsp),%r14d > + xorl %edi,%eax > + roll $5,%ecx > + xorl 48(%rsp),%r14d > + leal -899497514(%rbp,%r11,1),%r11d > + xorl %esi,%eax > + addl %ecx,%r11d > + roll $30,%r13d > + addl %eax,%r11d > + roll $1,%r14d > + xorl 20(%rsp),%edx > + movl %r12d,%eax > + movl %r14d,16(%rsp) > + movl %r11d,%ecx > + xorl 28(%rsp),%edx > + xorl %esi,%eax > + roll $5,%ecx > + xorl 52(%rsp),%edx > + leal -899497514(%r14,%rdi,1),%edi > + xorl %r13d,%eax > + addl %ecx,%edi > + roll $30,%r12d > + addl %eax,%edi > + roll $1,%edx > + xorl 24(%rsp),%ebp > + movl %r11d,%eax > + movl %edx,20(%rsp) > + movl %edi,%ecx > + xorl 32(%rsp),%ebp > + xorl %r13d,%eax > + roll $5,%ecx > + xorl 56(%rsp),%ebp > + leal -899497514(%rdx,%rsi,1),%esi > + xorl %r12d,%eax > + addl %ecx,%esi > + roll $30,%r11d > + addl %eax,%esi > + roll $1,%ebp > + xorl 28(%rsp),%r14d > + movl %edi,%eax > + movl %ebp,24(%rsp) > + movl %esi,%ecx > + xorl 36(%rsp),%r14d > + xorl %r12d,%eax > + roll $5,%ecx > + xorl 60(%rsp),%r14d > + leal -899497514(%rbp,%r13,1),%r13d > + xorl %r11d,%eax > + addl %ecx,%r13d > + roll $30,%edi > + addl %eax,%r13d > + roll $1,%r14d > + xorl 32(%rsp),%edx > + movl %esi,%eax > + movl %r14d,28(%rsp) > + movl %r13d,%ecx > + xorl 40(%rsp),%edx > + xorl %r11d,%eax > + roll $5,%ecx > + xorl 0(%rsp),%edx > + leal -899497514(%r14,%r12,1),%r12d > + xorl %edi,%eax > + addl %ecx,%r12d > + roll $30,%esi > + addl %eax,%r12d > + roll $1,%edx > + xorl 36(%rsp),%ebp > + movl %r13d,%eax > + > + movl %r12d,%ecx > + xorl 44(%rsp),%ebp > + xorl %edi,%eax > + roll $5,%ecx > + xorl 4(%rsp),%ebp > + leal -899497514(%rdx,%r11,1),%r11d > + xorl %esi,%eax > + addl %ecx,%r11d > + roll $30,%r13d > + addl %eax,%r11d > + roll $1,%ebp > + xorl 40(%rsp),%r14d > + movl %r12d,%eax > + > + movl %r11d,%ecx > + xorl 48(%rsp),%r14d > + xorl %esi,%eax > + roll $5,%ecx > + xorl 8(%rsp),%r14d > + leal -899497514(%rbp,%rdi,1),%edi > + xorl %r13d,%eax > + addl %ecx,%edi > + roll $30,%r12d > + addl %eax,%edi > + roll $1,%r14d > + xorl 44(%rsp),%edx > + movl %r11d,%eax > + > + movl %edi,%ecx > + xorl 52(%rsp),%edx > + xorl %r13d,%eax > + roll $5,%ecx > + xorl 12(%rsp),%edx > + leal -899497514(%r14,%rsi,1),%esi > + xorl %r12d,%eax > + addl %ecx,%esi > + roll $30,%r11d > + addl %eax,%esi > + roll $1,%edx > + xorl 48(%rsp),%ebp > + movl %edi,%eax > + > + movl %esi,%ecx > + xorl 56(%rsp),%ebp > + xorl %r12d,%eax > + roll $5,%ecx > + xorl 16(%rsp),%ebp > + leal -899497514(%rdx,%r13,1),%r13d > + xorl %r11d,%eax > + addl %ecx,%r13d > + roll $30,%edi > + addl %eax,%r13d 
> + roll $1,%ebp > + xorl 52(%rsp),%r14d > + movl %esi,%eax > + > + movl %r13d,%ecx > + xorl 60(%rsp),%r14d > + xorl %r11d,%eax > + roll $5,%ecx > + xorl 20(%rsp),%r14d > + leal -899497514(%rbp,%r12,1),%r12d > + xorl %edi,%eax > + addl %ecx,%r12d > + roll $30,%esi > + addl %eax,%r12d > + roll $1,%r14d > + xorl 56(%rsp),%edx > + movl %r13d,%eax > + > + movl %r12d,%ecx > + xorl 0(%rsp),%edx > + xorl %edi,%eax > + roll $5,%ecx > + xorl 24(%rsp),%edx > + leal -899497514(%r14,%r11,1),%r11d > + xorl %esi,%eax > + addl %ecx,%r11d > + roll $30,%r13d > + addl %eax,%r11d > + roll $1,%edx > + xorl 60(%rsp),%ebp > + movl %r12d,%eax > + > + movl %r11d,%ecx > + xorl 4(%rsp),%ebp > + xorl %esi,%eax > + roll $5,%ecx > + xorl 28(%rsp),%ebp > + leal -899497514(%rdx,%rdi,1),%edi > + xorl %r13d,%eax > + addl %ecx,%edi > + roll $30,%r12d > + addl %eax,%edi > + roll $1,%ebp > + movl %r11d,%eax > + movl %edi,%ecx > + xorl %r13d,%eax > + leal -899497514(%rbp,%rsi,1),%esi > + roll $5,%ecx > + xorl %r12d,%eax > + addl %ecx,%esi > + roll $30,%r11d > + addl %eax,%esi > + addl 0(%r8),%esi > + addl 4(%r8),%edi > + addl 8(%r8),%r11d > + addl 12(%r8),%r12d > + addl 16(%r8),%r13d > + movl %esi,0(%r8) > + movl %edi,4(%r8) > + movl %r11d,8(%r8) > + movl %r12d,12(%r8) > + movl %r13d,16(%r8) > + > + subq $1,%r10 > + leaq 64(%r9),%r9 > + jnz .Lloop > + > + movq 64(%rsp),%rsi > +.cfi_def_cfa %rsi,8 > + movq -40(%rsi),%r14 > +.cfi_restore %r14 > + movq -32(%rsi),%r13 > +.cfi_restore %r13 > + movq -24(%rsi),%r12 > +.cfi_restore %r12 > + movq -16(%rsi),%rbp > +.cfi_restore %rbp > + movq -8(%rsi),%rbx > +.cfi_restore %rbx > + leaq (%rsi),%rsp > +.cfi_def_cfa_register %rsp > +.Lepilogue: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size sha1_block_data_order,.-sha1_block_data_order > +.type sha1_block_data_order_shaext,@function > +.align 32 > +sha1_block_data_order_shaext: > +_shaext_shortcut: > +.cfi_startproc > + movdqu (%rdi),%xmm0 > + movd 16(%rdi),%xmm1 > + movdqa K_XX_XX+160(%rip),%xmm3 > + > + movdqu (%rsi),%xmm4 > + pshufd $27,%xmm0,%xmm0 > + movdqu 16(%rsi),%xmm5 > + pshufd $27,%xmm1,%xmm1 > + movdqu 32(%rsi),%xmm6 > +.byte 102,15,56,0,227 > + movdqu 48(%rsi),%xmm7 > +.byte 102,15,56,0,235 > +.byte 102,15,56,0,243 > + movdqa %xmm1,%xmm9 > +.byte 102,15,56,0,251 > + jmp .Loop_shaext > + > +.align 16 > +.Loop_shaext: > + decq %rdx > + leaq 64(%rsi),%r8 > + paddd %xmm4,%xmm1 > + cmovneq %r8,%rsi > + movdqa %xmm0,%xmm8 > +.byte 15,56,201,229 > + movdqa %xmm0,%xmm2 > +.byte 15,58,204,193,0 > +.byte 15,56,200,213 > + pxor %xmm6,%xmm4 > +.byte 15,56,201,238 > +.byte 15,56,202,231 > + > + movdqa %xmm0,%xmm1 > +.byte 15,58,204,194,0 > +.byte 15,56,200,206 > + pxor %xmm7,%xmm5 > +.byte 15,56,202,236 > +.byte 15,56,201,247 > + movdqa %xmm0,%xmm2 > +.byte 15,58,204,193,0 > +.byte 15,56,200,215 > + pxor %xmm4,%xmm6 > +.byte 15,56,201,252 > +.byte 15,56,202,245 > + > + movdqa %xmm0,%xmm1 > +.byte 15,58,204,194,0 > +.byte 15,56,200,204 > + pxor %xmm5,%xmm7 > +.byte 15,56,202,254 > +.byte 15,56,201,229 > + movdqa %xmm0,%xmm2 > +.byte 15,58,204,193,0 > +.byte 15,56,200,213 > + pxor %xmm6,%xmm4 > +.byte 15,56,201,238 > +.byte 15,56,202,231 > + > + movdqa %xmm0,%xmm1 > +.byte 15,58,204,194,1 > +.byte 15,56,200,206 > + pxor %xmm7,%xmm5 > +.byte 15,56,202,236 > +.byte 15,56,201,247 > + movdqa %xmm0,%xmm2 > +.byte 15,58,204,193,1 > +.byte 15,56,200,215 > + pxor %xmm4,%xmm6 > +.byte 15,56,201,252 > +.byte 15,56,202,245 > + > + movdqa %xmm0,%xmm1 > +.byte 15,58,204,194,1 > +.byte 15,56,200,204 > + pxor %xmm5,%xmm7 > +.byte 15,56,202,254 > +.byte 
15,56,201,229 > + movdqa %xmm0,%xmm2 > +.byte 15,58,204,193,1 > +.byte 15,56,200,213 > + pxor %xmm6,%xmm4 > +.byte 15,56,201,238 > +.byte 15,56,202,231 > + > + movdqa %xmm0,%xmm1 > +.byte 15,58,204,194,1 > +.byte 15,56,200,206 > + pxor %xmm7,%xmm5 > +.byte 15,56,202,236 > +.byte 15,56,201,247 > + movdqa %xmm0,%xmm2 > +.byte 15,58,204,193,2 > +.byte 15,56,200,215 > + pxor %xmm4,%xmm6 > +.byte 15,56,201,252 > +.byte 15,56,202,245 > + > + movdqa %xmm0,%xmm1 > +.byte 15,58,204,194,2 > +.byte 15,56,200,204 > + pxor %xmm5,%xmm7 > +.byte 15,56,202,254 > +.byte 15,56,201,229 > + movdqa %xmm0,%xmm2 > +.byte 15,58,204,193,2 > +.byte 15,56,200,213 > + pxor %xmm6,%xmm4 > +.byte 15,56,201,238 > +.byte 15,56,202,231 > + > + movdqa %xmm0,%xmm1 > +.byte 15,58,204,194,2 > +.byte 15,56,200,206 > + pxor %xmm7,%xmm5 > +.byte 15,56,202,236 > +.byte 15,56,201,247 > + movdqa %xmm0,%xmm2 > +.byte 15,58,204,193,2 > +.byte 15,56,200,215 > + pxor %xmm4,%xmm6 > +.byte 15,56,201,252 > +.byte 15,56,202,245 > + > + movdqa %xmm0,%xmm1 > +.byte 15,58,204,194,3 > +.byte 15,56,200,204 > + pxor %xmm5,%xmm7 > +.byte 15,56,202,254 > + movdqu (%rsi),%xmm4 > + movdqa %xmm0,%xmm2 > +.byte 15,58,204,193,3 > +.byte 15,56,200,213 > + movdqu 16(%rsi),%xmm5 > +.byte 102,15,56,0,227 > + > + movdqa %xmm0,%xmm1 > +.byte 15,58,204,194,3 > +.byte 15,56,200,206 > + movdqu 32(%rsi),%xmm6 > +.byte 102,15,56,0,235 > + > + movdqa %xmm0,%xmm2 > +.byte 15,58,204,193,3 > +.byte 15,56,200,215 > + movdqu 48(%rsi),%xmm7 > +.byte 102,15,56,0,243 > + > + movdqa %xmm0,%xmm1 > +.byte 15,58,204,194,3 > +.byte 65,15,56,200,201 > +.byte 102,15,56,0,251 > + > + paddd %xmm8,%xmm0 > + movdqa %xmm1,%xmm9 > + > + jnz .Loop_shaext > + > + pshufd $27,%xmm0,%xmm0 > + pshufd $27,%xmm1,%xmm1 > + movdqu %xmm0,(%rdi) > + movd %xmm1,16(%rdi) > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size sha1_block_data_order_shaext,.-sha1_block_data_order_shaext > +.type sha1_block_data_order_ssse3,@function > +.align 16 > +sha1_block_data_order_ssse3: > +_ssse3_shortcut: > +.cfi_startproc > + movq %rsp,%r11 > +.cfi_def_cfa_register %r11 > + pushq %rbx > +.cfi_offset %rbx,-16 > + pushq %rbp > +.cfi_offset %rbp,-24 > + pushq %r12 > +.cfi_offset %r12,-32 > + pushq %r13 > +.cfi_offset %r13,-40 > + pushq %r14 > +.cfi_offset %r14,-48 > + leaq -64(%rsp),%rsp > + andq $-64,%rsp > + movq %rdi,%r8 > + movq %rsi,%r9 > + movq %rdx,%r10 > + > + shlq $6,%r10 > + addq %r9,%r10 > + leaq K_XX_XX+64(%rip),%r14 > + > + movl 0(%r8),%eax > + movl 4(%r8),%ebx > + movl 8(%r8),%ecx > + movl 12(%r8),%edx > + movl %ebx,%esi > + movl 16(%r8),%ebp > + movl %ecx,%edi > + xorl %edx,%edi > + andl %edi,%esi > + > + movdqa 64(%r14),%xmm6 > + movdqa -64(%r14),%xmm9 > + movdqu 0(%r9),%xmm0 > + movdqu 16(%r9),%xmm1 > + movdqu 32(%r9),%xmm2 > + movdqu 48(%r9),%xmm3 > +.byte 102,15,56,0,198 > +.byte 102,15,56,0,206 > +.byte 102,15,56,0,214 > + addq $64,%r9 > + paddd %xmm9,%xmm0 > +.byte 102,15,56,0,222 > + paddd %xmm9,%xmm1 > + paddd %xmm9,%xmm2 > + movdqa %xmm0,0(%rsp) > + psubd %xmm9,%xmm0 > + movdqa %xmm1,16(%rsp) > + psubd %xmm9,%xmm1 > + movdqa %xmm2,32(%rsp) > + psubd %xmm9,%xmm2 > + jmp .Loop_ssse3 > +.align 16 > +.Loop_ssse3: > + rorl $2,%ebx > + pshufd $238,%xmm0,%xmm4 > + xorl %edx,%esi > + movdqa %xmm3,%xmm8 > + paddd %xmm3,%xmm9 > + movl %eax,%edi > + addl 0(%rsp),%ebp > + punpcklqdq %xmm1,%xmm4 > + xorl %ecx,%ebx > + roll $5,%eax > + addl %esi,%ebp > + psrldq $4,%xmm8 > + andl %ebx,%edi > + xorl %ecx,%ebx > + pxor %xmm0,%xmm4 > + addl %eax,%ebp > + rorl $7,%eax > + pxor %xmm2,%xmm8 > + xorl %ecx,%edi > + movl 
%ebp,%esi > + addl 4(%rsp),%edx > + pxor %xmm8,%xmm4 > + xorl %ebx,%eax > + roll $5,%ebp > + movdqa %xmm9,48(%rsp) > + addl %edi,%edx > + andl %eax,%esi > + movdqa %xmm4,%xmm10 > + xorl %ebx,%eax > + addl %ebp,%edx > + rorl $7,%ebp > + movdqa %xmm4,%xmm8 > + xorl %ebx,%esi > + pslldq $12,%xmm10 > + paddd %xmm4,%xmm4 > + movl %edx,%edi > + addl 8(%rsp),%ecx > + psrld $31,%xmm8 > + xorl %eax,%ebp > + roll $5,%edx > + addl %esi,%ecx > + movdqa %xmm10,%xmm9 > + andl %ebp,%edi > + xorl %eax,%ebp > + psrld $30,%xmm10 > + addl %edx,%ecx > + rorl $7,%edx > + por %xmm8,%xmm4 > + xorl %eax,%edi > + movl %ecx,%esi > + addl 12(%rsp),%ebx > + pslld $2,%xmm9 > + pxor %xmm10,%xmm4 > + xorl %ebp,%edx > + movdqa -64(%r14),%xmm10 > + roll $5,%ecx > + addl %edi,%ebx > + andl %edx,%esi > + pxor %xmm9,%xmm4 > + xorl %ebp,%edx > + addl %ecx,%ebx > + rorl $7,%ecx > + pshufd $238,%xmm1,%xmm5 > + xorl %ebp,%esi > + movdqa %xmm4,%xmm9 > + paddd %xmm4,%xmm10 > + movl %ebx,%edi > + addl 16(%rsp),%eax > + punpcklqdq %xmm2,%xmm5 > + xorl %edx,%ecx > + roll $5,%ebx > + addl %esi,%eax > + psrldq $4,%xmm9 > + andl %ecx,%edi > + xorl %edx,%ecx > + pxor %xmm1,%xmm5 > + addl %ebx,%eax > + rorl $7,%ebx > + pxor %xmm3,%xmm9 > + xorl %edx,%edi > + movl %eax,%esi > + addl 20(%rsp),%ebp > + pxor %xmm9,%xmm5 > + xorl %ecx,%ebx > + roll $5,%eax > + movdqa %xmm10,0(%rsp) > + addl %edi,%ebp > + andl %ebx,%esi > + movdqa %xmm5,%xmm8 > + xorl %ecx,%ebx > + addl %eax,%ebp > + rorl $7,%eax > + movdqa %xmm5,%xmm9 > + xorl %ecx,%esi > + pslldq $12,%xmm8 > + paddd %xmm5,%xmm5 > + movl %ebp,%edi > + addl 24(%rsp),%edx > + psrld $31,%xmm9 > + xorl %ebx,%eax > + roll $5,%ebp > + addl %esi,%edx > + movdqa %xmm8,%xmm10 > + andl %eax,%edi > + xorl %ebx,%eax > + psrld $30,%xmm8 > + addl %ebp,%edx > + rorl $7,%ebp > + por %xmm9,%xmm5 > + xorl %ebx,%edi > + movl %edx,%esi > + addl 28(%rsp),%ecx > + pslld $2,%xmm10 > + pxor %xmm8,%xmm5 > + xorl %eax,%ebp > + movdqa -32(%r14),%xmm8 > + roll $5,%edx > + addl %edi,%ecx > + andl %ebp,%esi > + pxor %xmm10,%xmm5 > + xorl %eax,%ebp > + addl %edx,%ecx > + rorl $7,%edx > + pshufd $238,%xmm2,%xmm6 > + xorl %eax,%esi > + movdqa %xmm5,%xmm10 > + paddd %xmm5,%xmm8 > + movl %ecx,%edi > + addl 32(%rsp),%ebx > + punpcklqdq %xmm3,%xmm6 > + xorl %ebp,%edx > + roll $5,%ecx > + addl %esi,%ebx > + psrldq $4,%xmm10 > + andl %edx,%edi > + xorl %ebp,%edx > + pxor %xmm2,%xmm6 > + addl %ecx,%ebx > + rorl $7,%ecx > + pxor %xmm4,%xmm10 > + xorl %ebp,%edi > + movl %ebx,%esi > + addl 36(%rsp),%eax > + pxor %xmm10,%xmm6 > + xorl %edx,%ecx > + roll $5,%ebx > + movdqa %xmm8,16(%rsp) > + addl %edi,%eax > + andl %ecx,%esi > + movdqa %xmm6,%xmm9 > + xorl %edx,%ecx > + addl %ebx,%eax > + rorl $7,%ebx > + movdqa %xmm6,%xmm10 > + xorl %edx,%esi > + pslldq $12,%xmm9 > + paddd %xmm6,%xmm6 > + movl %eax,%edi > + addl 40(%rsp),%ebp > + psrld $31,%xmm10 > + xorl %ecx,%ebx > + roll $5,%eax > + addl %esi,%ebp > + movdqa %xmm9,%xmm8 > + andl %ebx,%edi > + xorl %ecx,%ebx > + psrld $30,%xmm9 > + addl %eax,%ebp > + rorl $7,%eax > + por %xmm10,%xmm6 > + xorl %ecx,%edi > + movl %ebp,%esi > + addl 44(%rsp),%edx > + pslld $2,%xmm8 > + pxor %xmm9,%xmm6 > + xorl %ebx,%eax > + movdqa -32(%r14),%xmm9 > + roll $5,%ebp > + addl %edi,%edx > + andl %eax,%esi > + pxor %xmm8,%xmm6 > + xorl %ebx,%eax > + addl %ebp,%edx > + rorl $7,%ebp > + pshufd $238,%xmm3,%xmm7 > + xorl %ebx,%esi > + movdqa %xmm6,%xmm8 > + paddd %xmm6,%xmm9 > + movl %edx,%edi > + addl 48(%rsp),%ecx > + punpcklqdq %xmm4,%xmm7 > + xorl %eax,%ebp > + roll $5,%edx > + addl %esi,%ecx > + psrldq 
$4,%xmm8 > + andl %ebp,%edi > + xorl %eax,%ebp > + pxor %xmm3,%xmm7 > + addl %edx,%ecx > + rorl $7,%edx > + pxor %xmm5,%xmm8 > + xorl %eax,%edi > + movl %ecx,%esi > + addl 52(%rsp),%ebx > + pxor %xmm8,%xmm7 > + xorl %ebp,%edx > + roll $5,%ecx > + movdqa %xmm9,32(%rsp) > + addl %edi,%ebx > + andl %edx,%esi > + movdqa %xmm7,%xmm10 > + xorl %ebp,%edx > + addl %ecx,%ebx > + rorl $7,%ecx > + movdqa %xmm7,%xmm8 > + xorl %ebp,%esi > + pslldq $12,%xmm10 > + paddd %xmm7,%xmm7 > + movl %ebx,%edi > + addl 56(%rsp),%eax > + psrld $31,%xmm8 > + xorl %edx,%ecx > + roll $5,%ebx > + addl %esi,%eax > + movdqa %xmm10,%xmm9 > + andl %ecx,%edi > + xorl %edx,%ecx > + psrld $30,%xmm10 > + addl %ebx,%eax > + rorl $7,%ebx > + por %xmm8,%xmm7 > + xorl %edx,%edi > + movl %eax,%esi > + addl 60(%rsp),%ebp > + pslld $2,%xmm9 > + pxor %xmm10,%xmm7 > + xorl %ecx,%ebx > + movdqa -32(%r14),%xmm10 > + roll $5,%eax > + addl %edi,%ebp > + andl %ebx,%esi > + pxor %xmm9,%xmm7 > + pshufd $238,%xmm6,%xmm9 > + xorl %ecx,%ebx > + addl %eax,%ebp > + rorl $7,%eax > + pxor %xmm4,%xmm0 > + xorl %ecx,%esi > + movl %ebp,%edi > + addl 0(%rsp),%edx > + punpcklqdq %xmm7,%xmm9 > + xorl %ebx,%eax > + roll $5,%ebp > + pxor %xmm1,%xmm0 > + addl %esi,%edx > + andl %eax,%edi > + movdqa %xmm10,%xmm8 > + xorl %ebx,%eax > + paddd %xmm7,%xmm10 > + addl %ebp,%edx > + pxor %xmm9,%xmm0 > + rorl $7,%ebp > + xorl %ebx,%edi > + movl %edx,%esi > + addl 4(%rsp),%ecx > + movdqa %xmm0,%xmm9 > + xorl %eax,%ebp > + roll $5,%edx > + movdqa %xmm10,48(%rsp) > + addl %edi,%ecx > + andl %ebp,%esi > + xorl %eax,%ebp > + pslld $2,%xmm0 > + addl %edx,%ecx > + rorl $7,%edx > + psrld $30,%xmm9 > + xorl %eax,%esi > + movl %ecx,%edi > + addl 8(%rsp),%ebx > + por %xmm9,%xmm0 > + xorl %ebp,%edx > + roll $5,%ecx > + pshufd $238,%xmm7,%xmm10 > + addl %esi,%ebx > + andl %edx,%edi > + xorl %ebp,%edx > + addl %ecx,%ebx > + addl 12(%rsp),%eax > + xorl %ebp,%edi > + movl %ebx,%esi > + roll $5,%ebx > + addl %edi,%eax > + xorl %edx,%esi > + rorl $7,%ecx > + addl %ebx,%eax > + pxor %xmm5,%xmm1 > + addl 16(%rsp),%ebp > + xorl %ecx,%esi > + punpcklqdq %xmm0,%xmm10 > + movl %eax,%edi > + roll $5,%eax > + pxor %xmm2,%xmm1 > + addl %esi,%ebp > + xorl %ecx,%edi > + movdqa %xmm8,%xmm9 > + rorl $7,%ebx > + paddd %xmm0,%xmm8 > + addl %eax,%ebp > + pxor %xmm10,%xmm1 > + addl 20(%rsp),%edx > + xorl %ebx,%edi > + movl %ebp,%esi > + roll $5,%ebp > + movdqa %xmm1,%xmm10 > + addl %edi,%edx > + xorl %ebx,%esi > + movdqa %xmm8,0(%rsp) > + rorl $7,%eax > + addl %ebp,%edx > + addl 24(%rsp),%ecx > + pslld $2,%xmm1 > + xorl %eax,%esi > + movl %edx,%edi > + psrld $30,%xmm10 > + roll $5,%edx > + addl %esi,%ecx > + xorl %eax,%edi > + rorl $7,%ebp > + por %xmm10,%xmm1 > + addl %edx,%ecx > + addl 28(%rsp),%ebx > + pshufd $238,%xmm0,%xmm8 > + xorl %ebp,%edi > + movl %ecx,%esi > + roll $5,%ecx > + addl %edi,%ebx > + xorl %ebp,%esi > + rorl $7,%edx > + addl %ecx,%ebx > + pxor %xmm6,%xmm2 > + addl 32(%rsp),%eax > + xorl %edx,%esi > + punpcklqdq %xmm1,%xmm8 > + movl %ebx,%edi > + roll $5,%ebx > + pxor %xmm3,%xmm2 > + addl %esi,%eax > + xorl %edx,%edi > + movdqa 0(%r14),%xmm10 > + rorl $7,%ecx > + paddd %xmm1,%xmm9 > + addl %ebx,%eax > + pxor %xmm8,%xmm2 > + addl 36(%rsp),%ebp > + xorl %ecx,%edi > + movl %eax,%esi > + roll $5,%eax > + movdqa %xmm2,%xmm8 > + addl %edi,%ebp > + xorl %ecx,%esi > + movdqa %xmm9,16(%rsp) > + rorl $7,%ebx > + addl %eax,%ebp > + addl 40(%rsp),%edx > + pslld $2,%xmm2 > + xorl %ebx,%esi > + movl %ebp,%edi > + psrld $30,%xmm8 > + roll $5,%ebp > + addl %esi,%edx > + xorl %ebx,%edi > + rorl 
$7,%eax > + por %xmm8,%xmm2 > + addl %ebp,%edx > + addl 44(%rsp),%ecx > + pshufd $238,%xmm1,%xmm9 > + xorl %eax,%edi > + movl %edx,%esi > + roll $5,%edx > + addl %edi,%ecx > + xorl %eax,%esi > + rorl $7,%ebp > + addl %edx,%ecx > + pxor %xmm7,%xmm3 > + addl 48(%rsp),%ebx > + xorl %ebp,%esi > + punpcklqdq %xmm2,%xmm9 > + movl %ecx,%edi > + roll $5,%ecx > + pxor %xmm4,%xmm3 > + addl %esi,%ebx > + xorl %ebp,%edi > + movdqa %xmm10,%xmm8 > + rorl $7,%edx > + paddd %xmm2,%xmm10 > + addl %ecx,%ebx > + pxor %xmm9,%xmm3 > + addl 52(%rsp),%eax > + xorl %edx,%edi > + movl %ebx,%esi > + roll $5,%ebx > + movdqa %xmm3,%xmm9 > + addl %edi,%eax > + xorl %edx,%esi > + movdqa %xmm10,32(%rsp) > + rorl $7,%ecx > + addl %ebx,%eax > + addl 56(%rsp),%ebp > + pslld $2,%xmm3 > + xorl %ecx,%esi > + movl %eax,%edi > + psrld $30,%xmm9 > + roll $5,%eax > + addl %esi,%ebp > + xorl %ecx,%edi > + rorl $7,%ebx > + por %xmm9,%xmm3 > + addl %eax,%ebp > + addl 60(%rsp),%edx > + pshufd $238,%xmm2,%xmm10 > + xorl %ebx,%edi > + movl %ebp,%esi > + roll $5,%ebp > + addl %edi,%edx > + xorl %ebx,%esi > + rorl $7,%eax > + addl %ebp,%edx > + pxor %xmm0,%xmm4 > + addl 0(%rsp),%ecx > + xorl %eax,%esi > + punpcklqdq %xmm3,%xmm10 > + movl %edx,%edi > + roll $5,%edx > + pxor %xmm5,%xmm4 > + addl %esi,%ecx > + xorl %eax,%edi > + movdqa %xmm8,%xmm9 > + rorl $7,%ebp > + paddd %xmm3,%xmm8 > + addl %edx,%ecx > + pxor %xmm10,%xmm4 > + addl 4(%rsp),%ebx > + xorl %ebp,%edi > + movl %ecx,%esi > + roll $5,%ecx > + movdqa %xmm4,%xmm10 > + addl %edi,%ebx > + xorl %ebp,%esi > + movdqa %xmm8,48(%rsp) > + rorl $7,%edx > + addl %ecx,%ebx > + addl 8(%rsp),%eax > + pslld $2,%xmm4 > + xorl %edx,%esi > + movl %ebx,%edi > + psrld $30,%xmm10 > + roll $5,%ebx > + addl %esi,%eax > + xorl %edx,%edi > + rorl $7,%ecx > + por %xmm10,%xmm4 > + addl %ebx,%eax > + addl 12(%rsp),%ebp > + pshufd $238,%xmm3,%xmm8 > + xorl %ecx,%edi > + movl %eax,%esi > + roll $5,%eax > + addl %edi,%ebp > + xorl %ecx,%esi > + rorl $7,%ebx > + addl %eax,%ebp > + pxor %xmm1,%xmm5 > + addl 16(%rsp),%edx > + xorl %ebx,%esi > + punpcklqdq %xmm4,%xmm8 > + movl %ebp,%edi > + roll $5,%ebp > + pxor %xmm6,%xmm5 > + addl %esi,%edx > + xorl %ebx,%edi > + movdqa %xmm9,%xmm10 > + rorl $7,%eax > + paddd %xmm4,%xmm9 > + addl %ebp,%edx > + pxor %xmm8,%xmm5 > + addl 20(%rsp),%ecx > + xorl %eax,%edi > + movl %edx,%esi > + roll $5,%edx > + movdqa %xmm5,%xmm8 > + addl %edi,%ecx > + xorl %eax,%esi > + movdqa %xmm9,0(%rsp) > + rorl $7,%ebp > + addl %edx,%ecx > + addl 24(%rsp),%ebx > + pslld $2,%xmm5 > + xorl %ebp,%esi > + movl %ecx,%edi > + psrld $30,%xmm8 > + roll $5,%ecx > + addl %esi,%ebx > + xorl %ebp,%edi > + rorl $7,%edx > + por %xmm8,%xmm5 > + addl %ecx,%ebx > + addl 28(%rsp),%eax > + pshufd $238,%xmm4,%xmm9 > + rorl $7,%ecx > + movl %ebx,%esi > + xorl %edx,%edi > + roll $5,%ebx > + addl %edi,%eax > + xorl %ecx,%esi > + xorl %edx,%ecx > + addl %ebx,%eax > + pxor %xmm2,%xmm6 > + addl 32(%rsp),%ebp > + andl %ecx,%esi > + xorl %edx,%ecx > + rorl $7,%ebx > + punpcklqdq %xmm5,%xmm9 > + movl %eax,%edi > + xorl %ecx,%esi > + pxor %xmm7,%xmm6 > + roll $5,%eax > + addl %esi,%ebp > + movdqa %xmm10,%xmm8 > + xorl %ebx,%edi > + paddd %xmm5,%xmm10 > + xorl %ecx,%ebx > + pxor %xmm9,%xmm6 > + addl %eax,%ebp > + addl 36(%rsp),%edx > + andl %ebx,%edi > + xorl %ecx,%ebx > + rorl $7,%eax > + movdqa %xmm6,%xmm9 > + movl %ebp,%esi > + xorl %ebx,%edi > + movdqa %xmm10,16(%rsp) > + roll $5,%ebp > + addl %edi,%edx > + xorl %eax,%esi > + pslld $2,%xmm6 > + xorl %ebx,%eax > + addl %ebp,%edx > + psrld $30,%xmm9 > + addl 
40(%rsp),%ecx > + andl %eax,%esi > + xorl %ebx,%eax > + por %xmm9,%xmm6 > + rorl $7,%ebp > + movl %edx,%edi > + xorl %eax,%esi > + roll $5,%edx > + pshufd $238,%xmm5,%xmm10 > + addl %esi,%ecx > + xorl %ebp,%edi > + xorl %eax,%ebp > + addl %edx,%ecx > + addl 44(%rsp),%ebx > + andl %ebp,%edi > + xorl %eax,%ebp > + rorl $7,%edx > + movl %ecx,%esi > + xorl %ebp,%edi > + roll $5,%ecx > + addl %edi,%ebx > + xorl %edx,%esi > + xorl %ebp,%edx > + addl %ecx,%ebx > + pxor %xmm3,%xmm7 > + addl 48(%rsp),%eax > + andl %edx,%esi > + xorl %ebp,%edx > + rorl $7,%ecx > + punpcklqdq %xmm6,%xmm10 > + movl %ebx,%edi > + xorl %edx,%esi > + pxor %xmm0,%xmm7 > + roll $5,%ebx > + addl %esi,%eax > + movdqa 32(%r14),%xmm9 > + xorl %ecx,%edi > + paddd %xmm6,%xmm8 > + xorl %edx,%ecx > + pxor %xmm10,%xmm7 > + addl %ebx,%eax > + addl 52(%rsp),%ebp > + andl %ecx,%edi > + xorl %edx,%ecx > + rorl $7,%ebx > + movdqa %xmm7,%xmm10 > + movl %eax,%esi > + xorl %ecx,%edi > + movdqa %xmm8,32(%rsp) > + roll $5,%eax > + addl %edi,%ebp > + xorl %ebx,%esi > + pslld $2,%xmm7 > + xorl %ecx,%ebx > + addl %eax,%ebp > + psrld $30,%xmm10 > + addl 56(%rsp),%edx > + andl %ebx,%esi > + xorl %ecx,%ebx > + por %xmm10,%xmm7 > + rorl $7,%eax > + movl %ebp,%edi > + xorl %ebx,%esi > + roll $5,%ebp > + pshufd $238,%xmm6,%xmm8 > + addl %esi,%edx > + xorl %eax,%edi > + xorl %ebx,%eax > + addl %ebp,%edx > + addl 60(%rsp),%ecx > + andl %eax,%edi > + xorl %ebx,%eax > + rorl $7,%ebp > + movl %edx,%esi > + xorl %eax,%edi > + roll $5,%edx > + addl %edi,%ecx > + xorl %ebp,%esi > + xorl %eax,%ebp > + addl %edx,%ecx > + pxor %xmm4,%xmm0 > + addl 0(%rsp),%ebx > + andl %ebp,%esi > + xorl %eax,%ebp > + rorl $7,%edx > + punpcklqdq %xmm7,%xmm8 > + movl %ecx,%edi > + xorl %ebp,%esi > + pxor %xmm1,%xmm0 > + roll $5,%ecx > + addl %esi,%ebx > + movdqa %xmm9,%xmm10 > + xorl %edx,%edi > + paddd %xmm7,%xmm9 > + xorl %ebp,%edx > + pxor %xmm8,%xmm0 > + addl %ecx,%ebx > + addl 4(%rsp),%eax > + andl %edx,%edi > + xorl %ebp,%edx > + rorl $7,%ecx > + movdqa %xmm0,%xmm8 > + movl %ebx,%esi > + xorl %edx,%edi > + movdqa %xmm9,48(%rsp) > + roll $5,%ebx > + addl %edi,%eax > + xorl %ecx,%esi > + pslld $2,%xmm0 > + xorl %edx,%ecx > + addl %ebx,%eax > + psrld $30,%xmm8 > + addl 8(%rsp),%ebp > + andl %ecx,%esi > + xorl %edx,%ecx > + por %xmm8,%xmm0 > + rorl $7,%ebx > + movl %eax,%edi > + xorl %ecx,%esi > + roll $5,%eax > + pshufd $238,%xmm7,%xmm9 > + addl %esi,%ebp > + xorl %ebx,%edi > + xorl %ecx,%ebx > + addl %eax,%ebp > + addl 12(%rsp),%edx > + andl %ebx,%edi > + xorl %ecx,%ebx > + rorl $7,%eax > + movl %ebp,%esi > + xorl %ebx,%edi > + roll $5,%ebp > + addl %edi,%edx > + xorl %eax,%esi > + xorl %ebx,%eax > + addl %ebp,%edx > + pxor %xmm5,%xmm1 > + addl 16(%rsp),%ecx > + andl %eax,%esi > + xorl %ebx,%eax > + rorl $7,%ebp > + punpcklqdq %xmm0,%xmm9 > + movl %edx,%edi > + xorl %eax,%esi > + pxor %xmm2,%xmm1 > + roll $5,%edx > + addl %esi,%ecx > + movdqa %xmm10,%xmm8 > + xorl %ebp,%edi > + paddd %xmm0,%xmm10 > + xorl %eax,%ebp > + pxor %xmm9,%xmm1 > + addl %edx,%ecx > + addl 20(%rsp),%ebx > + andl %ebp,%edi > + xorl %eax,%ebp > + rorl $7,%edx > + movdqa %xmm1,%xmm9 > + movl %ecx,%esi > + xorl %ebp,%edi > + movdqa %xmm10,0(%rsp) > + roll $5,%ecx > + addl %edi,%ebx > + xorl %edx,%esi > + pslld $2,%xmm1 > + xorl %ebp,%edx > + addl %ecx,%ebx > + psrld $30,%xmm9 > + addl 24(%rsp),%eax > + andl %edx,%esi > + xorl %ebp,%edx > + por %xmm9,%xmm1 > + rorl $7,%ecx > + movl %ebx,%edi > + xorl %edx,%esi > + roll $5,%ebx > + pshufd $238,%xmm0,%xmm10 > + addl %esi,%eax > + xorl %ecx,%edi > + xorl 
%edx,%ecx > + addl %ebx,%eax > + addl 28(%rsp),%ebp > + andl %ecx,%edi > + xorl %edx,%ecx > + rorl $7,%ebx > + movl %eax,%esi > + xorl %ecx,%edi > + roll $5,%eax > + addl %edi,%ebp > + xorl %ebx,%esi > + xorl %ecx,%ebx > + addl %eax,%ebp > + pxor %xmm6,%xmm2 > + addl 32(%rsp),%edx > + andl %ebx,%esi > + xorl %ecx,%ebx > + rorl $7,%eax > + punpcklqdq %xmm1,%xmm10 > + movl %ebp,%edi > + xorl %ebx,%esi > + pxor %xmm3,%xmm2 > + roll $5,%ebp > + addl %esi,%edx > + movdqa %xmm8,%xmm9 > + xorl %eax,%edi > + paddd %xmm1,%xmm8 > + xorl %ebx,%eax > + pxor %xmm10,%xmm2 > + addl %ebp,%edx > + addl 36(%rsp),%ecx > + andl %eax,%edi > + xorl %ebx,%eax > + rorl $7,%ebp > + movdqa %xmm2,%xmm10 > + movl %edx,%esi > + xorl %eax,%edi > + movdqa %xmm8,16(%rsp) > + roll $5,%edx > + addl %edi,%ecx > + xorl %ebp,%esi > + pslld $2,%xmm2 > + xorl %eax,%ebp > + addl %edx,%ecx > + psrld $30,%xmm10 > + addl 40(%rsp),%ebx > + andl %ebp,%esi > + xorl %eax,%ebp > + por %xmm10,%xmm2 > + rorl $7,%edx > + movl %ecx,%edi > + xorl %ebp,%esi > + roll $5,%ecx > + pshufd $238,%xmm1,%xmm8 > + addl %esi,%ebx > + xorl %edx,%edi > + xorl %ebp,%edx > + addl %ecx,%ebx > + addl 44(%rsp),%eax > + andl %edx,%edi > + xorl %ebp,%edx > + rorl $7,%ecx > + movl %ebx,%esi > + xorl %edx,%edi > + roll $5,%ebx > + addl %edi,%eax > + xorl %edx,%esi > + addl %ebx,%eax > + pxor %xmm7,%xmm3 > + addl 48(%rsp),%ebp > + xorl %ecx,%esi > + punpcklqdq %xmm2,%xmm8 > + movl %eax,%edi > + roll $5,%eax > + pxor %xmm4,%xmm3 > + addl %esi,%ebp > + xorl %ecx,%edi > + movdqa %xmm9,%xmm10 > + rorl $7,%ebx > + paddd %xmm2,%xmm9 > + addl %eax,%ebp > + pxor %xmm8,%xmm3 > + addl 52(%rsp),%edx > + xorl %ebx,%edi > + movl %ebp,%esi > + roll $5,%ebp > + movdqa %xmm3,%xmm8 > + addl %edi,%edx > + xorl %ebx,%esi > + movdqa %xmm9,32(%rsp) > + rorl $7,%eax > + addl %ebp,%edx > + addl 56(%rsp),%ecx > + pslld $2,%xmm3 > + xorl %eax,%esi > + movl %edx,%edi > + psrld $30,%xmm8 > + roll $5,%edx > + addl %esi,%ecx > + xorl %eax,%edi > + rorl $7,%ebp > + por %xmm8,%xmm3 > + addl %edx,%ecx > + addl 60(%rsp),%ebx > + xorl %ebp,%edi > + movl %ecx,%esi > + roll $5,%ecx > + addl %edi,%ebx > + xorl %ebp,%esi > + rorl $7,%edx > + addl %ecx,%ebx > + addl 0(%rsp),%eax > + xorl %edx,%esi > + movl %ebx,%edi > + roll $5,%ebx > + paddd %xmm3,%xmm10 > + addl %esi,%eax > + xorl %edx,%edi > + movdqa %xmm10,48(%rsp) > + rorl $7,%ecx > + addl %ebx,%eax > + addl 4(%rsp),%ebp > + xorl %ecx,%edi > + movl %eax,%esi > + roll $5,%eax > + addl %edi,%ebp > + xorl %ecx,%esi > + rorl $7,%ebx > + addl %eax,%ebp > + addl 8(%rsp),%edx > + xorl %ebx,%esi > + movl %ebp,%edi > + roll $5,%ebp > + addl %esi,%edx > + xorl %ebx,%edi > + rorl $7,%eax > + addl %ebp,%edx > + addl 12(%rsp),%ecx > + xorl %eax,%edi > + movl %edx,%esi > + roll $5,%edx > + addl %edi,%ecx > + xorl %eax,%esi > + rorl $7,%ebp > + addl %edx,%ecx > + cmpq %r10,%r9 > + je .Ldone_ssse3 > + movdqa 64(%r14),%xmm6 > + movdqa -64(%r14),%xmm9 > + movdqu 0(%r9),%xmm0 > + movdqu 16(%r9),%xmm1 > + movdqu 32(%r9),%xmm2 > + movdqu 48(%r9),%xmm3 > +.byte 102,15,56,0,198 > + addq $64,%r9 > + addl 16(%rsp),%ebx > + xorl %ebp,%esi > + movl %ecx,%edi > +.byte 102,15,56,0,206 > + roll $5,%ecx > + addl %esi,%ebx > + xorl %ebp,%edi > + rorl $7,%edx > + paddd %xmm9,%xmm0 > + addl %ecx,%ebx > + addl 20(%rsp),%eax > + xorl %edx,%edi > + movl %ebx,%esi > + movdqa %xmm0,0(%rsp) > + roll $5,%ebx > + addl %edi,%eax > + xorl %edx,%esi > + rorl $7,%ecx > + psubd %xmm9,%xmm0 > + addl %ebx,%eax > + addl 24(%rsp),%ebp > + xorl %ecx,%esi > + movl %eax,%edi > + roll $5,%eax > + addl 
%esi,%ebp > + xorl %ecx,%edi > + rorl $7,%ebx > + addl %eax,%ebp > + addl 28(%rsp),%edx > + xorl %ebx,%edi > + movl %ebp,%esi > + roll $5,%ebp > + addl %edi,%edx > + xorl %ebx,%esi > + rorl $7,%eax > + addl %ebp,%edx > + addl 32(%rsp),%ecx > + xorl %eax,%esi > + movl %edx,%edi > +.byte 102,15,56,0,214 > + roll $5,%edx > + addl %esi,%ecx > + xorl %eax,%edi > + rorl $7,%ebp > + paddd %xmm9,%xmm1 > + addl %edx,%ecx > + addl 36(%rsp),%ebx > + xorl %ebp,%edi > + movl %ecx,%esi > + movdqa %xmm1,16(%rsp) > + roll $5,%ecx > + addl %edi,%ebx > + xorl %ebp,%esi > + rorl $7,%edx > + psubd %xmm9,%xmm1 > + addl %ecx,%ebx > + addl 40(%rsp),%eax > + xorl %edx,%esi > + movl %ebx,%edi > + roll $5,%ebx > + addl %esi,%eax > + xorl %edx,%edi > + rorl $7,%ecx > + addl %ebx,%eax > + addl 44(%rsp),%ebp > + xorl %ecx,%edi > + movl %eax,%esi > + roll $5,%eax > + addl %edi,%ebp > + xorl %ecx,%esi > + rorl $7,%ebx > + addl %eax,%ebp > + addl 48(%rsp),%edx > + xorl %ebx,%esi > + movl %ebp,%edi > +.byte 102,15,56,0,222 > + roll $5,%ebp > + addl %esi,%edx > + xorl %ebx,%edi > + rorl $7,%eax > + paddd %xmm9,%xmm2 > + addl %ebp,%edx > + addl 52(%rsp),%ecx > + xorl %eax,%edi > + movl %edx,%esi > + movdqa %xmm2,32(%rsp) > + roll $5,%edx > + addl %edi,%ecx > + xorl %eax,%esi > + rorl $7,%ebp > + psubd %xmm9,%xmm2 > + addl %edx,%ecx > + addl 56(%rsp),%ebx > + xorl %ebp,%esi > + movl %ecx,%edi > + roll $5,%ecx > + addl %esi,%ebx > + xorl %ebp,%edi > + rorl $7,%edx > + addl %ecx,%ebx > + addl 60(%rsp),%eax > + xorl %edx,%edi > + movl %ebx,%esi > + roll $5,%ebx > + addl %edi,%eax > + rorl $7,%ecx > + addl %ebx,%eax > + addl 0(%r8),%eax > + addl 4(%r8),%esi > + addl 8(%r8),%ecx > + addl 12(%r8),%edx > + movl %eax,0(%r8) > + addl 16(%r8),%ebp > + movl %esi,4(%r8) > + movl %esi,%ebx > + movl %ecx,8(%r8) > + movl %ecx,%edi > + movl %edx,12(%r8) > + xorl %edx,%edi > + movl %ebp,16(%r8) > + andl %edi,%esi > + jmp .Loop_ssse3 > + > +.align 16 > +.Ldone_ssse3: > + addl 16(%rsp),%ebx > + xorl %ebp,%esi > + movl %ecx,%edi > + roll $5,%ecx > + addl %esi,%ebx > + xorl %ebp,%edi > + rorl $7,%edx > + addl %ecx,%ebx > + addl 20(%rsp),%eax > + xorl %edx,%edi > + movl %ebx,%esi > + roll $5,%ebx > + addl %edi,%eax > + xorl %edx,%esi > + rorl $7,%ecx > + addl %ebx,%eax > + addl 24(%rsp),%ebp > + xorl %ecx,%esi > + movl %eax,%edi > + roll $5,%eax > + addl %esi,%ebp > + xorl %ecx,%edi > + rorl $7,%ebx > + addl %eax,%ebp > + addl 28(%rsp),%edx > + xorl %ebx,%edi > + movl %ebp,%esi > + roll $5,%ebp > + addl %edi,%edx > + xorl %ebx,%esi > + rorl $7,%eax > + addl %ebp,%edx > + addl 32(%rsp),%ecx > + xorl %eax,%esi > + movl %edx,%edi > + roll $5,%edx > + addl %esi,%ecx > + xorl %eax,%edi > + rorl $7,%ebp > + addl %edx,%ecx > + addl 36(%rsp),%ebx > + xorl %ebp,%edi > + movl %ecx,%esi > + roll $5,%ecx > + addl %edi,%ebx > + xorl %ebp,%esi > + rorl $7,%edx > + addl %ecx,%ebx > + addl 40(%rsp),%eax > + xorl %edx,%esi > + movl %ebx,%edi > + roll $5,%ebx > + addl %esi,%eax > + xorl %edx,%edi > + rorl $7,%ecx > + addl %ebx,%eax > + addl 44(%rsp),%ebp > + xorl %ecx,%edi > + movl %eax,%esi > + roll $5,%eax > + addl %edi,%ebp > + xorl %ecx,%esi > + rorl $7,%ebx > + addl %eax,%ebp > + addl 48(%rsp),%edx > + xorl %ebx,%esi > + movl %ebp,%edi > + roll $5,%ebp > + addl %esi,%edx > + xorl %ebx,%edi > + rorl $7,%eax > + addl %ebp,%edx > + addl 52(%rsp),%ecx > + xorl %eax,%edi > + movl %edx,%esi > + roll $5,%edx > + addl %edi,%ecx > + xorl %eax,%esi > + rorl $7,%ebp > + addl %edx,%ecx > + addl 56(%rsp),%ebx > + xorl %ebp,%esi > + movl %ecx,%edi > + roll $5,%ecx > + addl 
%esi,%ebx > + xorl %ebp,%edi > + rorl $7,%edx > + addl %ecx,%ebx > + addl 60(%rsp),%eax > + xorl %edx,%edi > + movl %ebx,%esi > + roll $5,%ebx > + addl %edi,%eax > + rorl $7,%ecx > + addl %ebx,%eax > + addl 0(%r8),%eax > + addl 4(%r8),%esi > + addl 8(%r8),%ecx > + movl %eax,0(%r8) > + addl 12(%r8),%edx > + movl %esi,4(%r8) > + addl 16(%r8),%ebp > + movl %ecx,8(%r8) > + movl %edx,12(%r8) > + movl %ebp,16(%r8) > + movq -40(%r11),%r14 > +.cfi_restore %r14 > + movq -32(%r11),%r13 > +.cfi_restore %r13 > + movq -24(%r11),%r12 > +.cfi_restore %r12 > + movq -16(%r11),%rbp > +.cfi_restore %rbp > + movq -8(%r11),%rbx > +.cfi_restore %rbx > + leaq (%r11),%rsp > +.cfi_def_cfa_register %rsp > +.Lepilogue_ssse3: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size sha1_block_data_order_ssse3,.-sha1_block_data_order_ssse3 > +.align 64 > +K_XX_XX: > +.long 0x5a827999,0x5a827999,0x5a827999,0x5a827999 > +.long 0x5a827999,0x5a827999,0x5a827999,0x5a827999 > +.long 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > +.long 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > +.long 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > +.long 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > +.long 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > +.long 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > +.byte 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0 > +.byte > 83,72,65,49,32,98,108,111,99,107,32,116,114,97,110,115,102,111,114,109,32, > 102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98, > 121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,10 > 3,62,0 > +.align 64 > diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-mb- > x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-mb- > x86_64.S > new file mode 100644 > index 0000000000..25dee488b8 > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-mb-x86_64.S > @@ -0,0 +1,3286 @@ > +# WARNING: do not edit! > +# Generated from openssl/crypto/sha/asm/sha256-mb-x86_64.pl > +# > +# Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. > +# > +# Licensed under the OpenSSL license (the "License"). You may not use > +# this file except in compliance with the License. 
You can obtain a copy > +# in the file LICENSE in the source distribution or at > +# https://www.openssl.org/source/license.html > + > +.text > + > + > + > +.globl sha256_multi_block > +.type sha256_multi_block,@function > +.align 32 > +sha256_multi_block: > +.cfi_startproc > + movq OPENSSL_ia32cap_P+4(%rip),%rcx > + btq $61,%rcx > + jc _shaext_shortcut > + movq %rsp,%rax > +.cfi_def_cfa_register %rax > + pushq %rbx > +.cfi_offset %rbx,-16 > + pushq %rbp > +.cfi_offset %rbp,-24 > + subq $288,%rsp > + andq $-256,%rsp > + movq %rax,272(%rsp) > +.cfi_escape 0x0f,0x06,0x77,0x90,0x02,0x06,0x23,0x08 > +.Lbody: > + leaq K256+128(%rip),%rbp > + leaq 256(%rsp),%rbx > + leaq 128(%rdi),%rdi > + > +.Loop_grande: > + movl %edx,280(%rsp) > + xorl %edx,%edx > + movq 0(%rsi),%r8 > + movl 8(%rsi),%ecx > + cmpl %edx,%ecx > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movl %ecx,0(%rbx) > + cmovleq %rbp,%r8 > + movq 16(%rsi),%r9 > + movl 24(%rsi),%ecx > + cmpl %edx,%ecx > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movl %ecx,4(%rbx) > + cmovleq %rbp,%r9 > + movq 32(%rsi),%r10 > + movl 40(%rsi),%ecx > + cmpl %edx,%ecx > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movl %ecx,8(%rbx) > + cmovleq %rbp,%r10 > + movq 48(%rsi),%r11 > + movl 56(%rsi),%ecx > + cmpl %edx,%ecx > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movl %ecx,12(%rbx) > + cmovleq %rbp,%r11 > + testl %edx,%edx > + jz .Ldone > + > + movdqu 0-128(%rdi),%xmm8 > + leaq 128(%rsp),%rax > + movdqu 32-128(%rdi),%xmm9 > + movdqu 64-128(%rdi),%xmm10 > + movdqu 96-128(%rdi),%xmm11 > + movdqu 128-128(%rdi),%xmm12 > + movdqu 160-128(%rdi),%xmm13 > + movdqu 192-128(%rdi),%xmm14 > + movdqu 224-128(%rdi),%xmm15 > + movdqu .Lpbswap(%rip),%xmm6 > + jmp .Loop > + > +.align 32 > +.Loop: > + movdqa %xmm10,%xmm4 > + pxor %xmm9,%xmm4 > + movd 0(%r8),%xmm5 > + movd 0(%r9),%xmm0 > + movd 0(%r10),%xmm1 > + movd 0(%r11),%xmm2 > + punpckldq %xmm1,%xmm5 > + punpckldq %xmm2,%xmm0 > + punpckldq %xmm0,%xmm5 > + movdqa %xmm12,%xmm7 > +.byte 102,15,56,0,238 > + movdqa %xmm12,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm12,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,0-128(%rax) > + paddd %xmm15,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd -128(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm12,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm12,%xmm3 > + pslld $26-21,%xmm2 > + pandn %xmm14,%xmm0 > + pand %xmm13,%xmm3 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm8,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm8,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + movdqa %xmm9,%xmm3 > + movdqa %xmm8,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm8,%xmm3 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm3,%xmm4 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm9,%xmm15 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm4,%xmm15 > + paddd %xmm5,%xmm11 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm15 > + paddd %xmm7,%xmm15 > + movd 4(%r8),%xmm5 > + movd 4(%r9),%xmm0 > + movd 4(%r10),%xmm1 > + movd 4(%r11),%xmm2 > + punpckldq %xmm1,%xmm5 > + punpckldq %xmm2,%xmm0 > + punpckldq %xmm0,%xmm5 > + movdqa %xmm11,%xmm7 > + > + movdqa %xmm11,%xmm2 > +.byte 102,15,56,0,238 > + psrld $6,%xmm7 > + movdqa %xmm11,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,16-128(%rax) > + paddd %xmm14,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd -96(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa 
%xmm11,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm11,%xmm4 > + pslld $26-21,%xmm2 > + pandn %xmm13,%xmm0 > + pand %xmm12,%xmm4 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm15,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm15,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm4,%xmm0 > + movdqa %xmm8,%xmm4 > + movdqa %xmm15,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm15,%xmm4 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm4,%xmm3 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm8,%xmm14 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm3,%xmm14 > + paddd %xmm5,%xmm10 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm14 > + paddd %xmm7,%xmm14 > + movd 8(%r8),%xmm5 > + movd 8(%r9),%xmm0 > + movd 8(%r10),%xmm1 > + movd 8(%r11),%xmm2 > + punpckldq %xmm1,%xmm5 > + punpckldq %xmm2,%xmm0 > + punpckldq %xmm0,%xmm5 > + movdqa %xmm10,%xmm7 > +.byte 102,15,56,0,238 > + movdqa %xmm10,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm10,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,32-128(%rax) > + paddd %xmm13,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd -64(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm10,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm10,%xmm3 > + pslld $26-21,%xmm2 > + pandn %xmm12,%xmm0 > + pand %xmm11,%xmm3 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm14,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm14,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + movdqa %xmm15,%xmm3 > + movdqa %xmm14,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm14,%xmm3 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm3,%xmm4 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm15,%xmm13 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm4,%xmm13 > + paddd %xmm5,%xmm9 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm13 > + paddd %xmm7,%xmm13 > + movd 12(%r8),%xmm5 > + movd 12(%r9),%xmm0 > + movd 12(%r10),%xmm1 > + movd 12(%r11),%xmm2 > + punpckldq %xmm1,%xmm5 > + punpckldq %xmm2,%xmm0 > + punpckldq %xmm0,%xmm5 > + movdqa %xmm9,%xmm7 > + > + movdqa %xmm9,%xmm2 > +.byte 102,15,56,0,238 > + psrld $6,%xmm7 > + movdqa %xmm9,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,48-128(%rax) > + paddd %xmm12,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd -32(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm9,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm9,%xmm4 > + pslld $26-21,%xmm2 > + pandn %xmm11,%xmm0 > + pand %xmm10,%xmm4 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm13,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm13,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm4,%xmm0 > + movdqa %xmm14,%xmm4 > + movdqa %xmm13,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm13,%xmm4 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm4,%xmm3 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm14,%xmm12 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm3,%xmm12 > + paddd %xmm5,%xmm8 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm12 > + paddd %xmm7,%xmm12 > + movd 16(%r8),%xmm5 > + movd 16(%r9),%xmm0 > + movd 16(%r10),%xmm1 > + movd 16(%r11),%xmm2 > + punpckldq %xmm1,%xmm5 > + punpckldq %xmm2,%xmm0 > + punpckldq %xmm0,%xmm5 > + movdqa %xmm8,%xmm7 > +.byte 102,15,56,0,238 > + movdqa %xmm8,%xmm2 > + > + psrld $6,%xmm7 > + movdqa 
%xmm8,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,64-128(%rax) > + paddd %xmm11,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd 0(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm8,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm8,%xmm3 > + pslld $26-21,%xmm2 > + pandn %xmm10,%xmm0 > + pand %xmm9,%xmm3 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm12,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm12,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + movdqa %xmm13,%xmm3 > + movdqa %xmm12,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm12,%xmm3 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm3,%xmm4 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm13,%xmm11 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm4,%xmm11 > + paddd %xmm5,%xmm15 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm11 > + paddd %xmm7,%xmm11 > + movd 20(%r8),%xmm5 > + movd 20(%r9),%xmm0 > + movd 20(%r10),%xmm1 > + movd 20(%r11),%xmm2 > + punpckldq %xmm1,%xmm5 > + punpckldq %xmm2,%xmm0 > + punpckldq %xmm0,%xmm5 > + movdqa %xmm15,%xmm7 > + > + movdqa %xmm15,%xmm2 > +.byte 102,15,56,0,238 > + psrld $6,%xmm7 > + movdqa %xmm15,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,80-128(%rax) > + paddd %xmm10,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd 32(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm15,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm15,%xmm4 > + pslld $26-21,%xmm2 > + pandn %xmm9,%xmm0 > + pand %xmm8,%xmm4 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm11,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm11,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm4,%xmm0 > + movdqa %xmm12,%xmm4 > + movdqa %xmm11,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm11,%xmm4 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm4,%xmm3 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm12,%xmm10 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm3,%xmm10 > + paddd %xmm5,%xmm14 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm10 > + paddd %xmm7,%xmm10 > + movd 24(%r8),%xmm5 > + movd 24(%r9),%xmm0 > + movd 24(%r10),%xmm1 > + movd 24(%r11),%xmm2 > + punpckldq %xmm1,%xmm5 > + punpckldq %xmm2,%xmm0 > + punpckldq %xmm0,%xmm5 > + movdqa %xmm14,%xmm7 > +.byte 102,15,56,0,238 > + movdqa %xmm14,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm14,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,96-128(%rax) > + paddd %xmm9,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd 64(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm14,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm14,%xmm3 > + pslld $26-21,%xmm2 > + pandn %xmm8,%xmm0 > + pand %xmm15,%xmm3 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm10,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm10,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + movdqa %xmm11,%xmm3 > + movdqa %xmm10,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm10,%xmm3 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm3,%xmm4 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm11,%xmm9 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm4,%xmm9 > + paddd %xmm5,%xmm13 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm9 > + paddd %xmm7,%xmm9 > + movd 28(%r8),%xmm5 > + movd 
28(%r9),%xmm0 > + movd 28(%r10),%xmm1 > + movd 28(%r11),%xmm2 > + punpckldq %xmm1,%xmm5 > + punpckldq %xmm2,%xmm0 > + punpckldq %xmm0,%xmm5 > + movdqa %xmm13,%xmm7 > + > + movdqa %xmm13,%xmm2 > +.byte 102,15,56,0,238 > + psrld $6,%xmm7 > + movdqa %xmm13,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,112-128(%rax) > + paddd %xmm8,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd 96(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm13,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm13,%xmm4 > + pslld $26-21,%xmm2 > + pandn %xmm15,%xmm0 > + pand %xmm14,%xmm4 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm9,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm9,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm4,%xmm0 > + movdqa %xmm10,%xmm4 > + movdqa %xmm9,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm9,%xmm4 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm4,%xmm3 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm10,%xmm8 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm3,%xmm8 > + paddd %xmm5,%xmm12 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm8 > + paddd %xmm7,%xmm8 > + leaq 256(%rbp),%rbp > + movd 32(%r8),%xmm5 > + movd 32(%r9),%xmm0 > + movd 32(%r10),%xmm1 > + movd 32(%r11),%xmm2 > + punpckldq %xmm1,%xmm5 > + punpckldq %xmm2,%xmm0 > + punpckldq %xmm0,%xmm5 > + movdqa %xmm12,%xmm7 > +.byte 102,15,56,0,238 > + movdqa %xmm12,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm12,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,128-128(%rax) > + paddd %xmm15,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd -128(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm12,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm12,%xmm3 > + pslld $26-21,%xmm2 > + pandn %xmm14,%xmm0 > + pand %xmm13,%xmm3 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm8,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm8,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + movdqa %xmm9,%xmm3 > + movdqa %xmm8,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm8,%xmm3 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm3,%xmm4 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm9,%xmm15 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm4,%xmm15 > + paddd %xmm5,%xmm11 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm15 > + paddd %xmm7,%xmm15 > + movd 36(%r8),%xmm5 > + movd 36(%r9),%xmm0 > + movd 36(%r10),%xmm1 > + movd 36(%r11),%xmm2 > + punpckldq %xmm1,%xmm5 > + punpckldq %xmm2,%xmm0 > + punpckldq %xmm0,%xmm5 > + movdqa %xmm11,%xmm7 > + > + movdqa %xmm11,%xmm2 > +.byte 102,15,56,0,238 > + psrld $6,%xmm7 > + movdqa %xmm11,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,144-128(%rax) > + paddd %xmm14,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd -96(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm11,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm11,%xmm4 > + pslld $26-21,%xmm2 > + pandn %xmm13,%xmm0 > + pand %xmm12,%xmm4 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm15,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm15,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm4,%xmm0 > + movdqa %xmm8,%xmm4 > + movdqa %xmm15,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm15,%xmm4 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm4,%xmm3 > + pxor %xmm7,%xmm1 > + > 
+ > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm8,%xmm14 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm3,%xmm14 > + paddd %xmm5,%xmm10 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm14 > + paddd %xmm7,%xmm14 > + movd 40(%r8),%xmm5 > + movd 40(%r9),%xmm0 > + movd 40(%r10),%xmm1 > + movd 40(%r11),%xmm2 > + punpckldq %xmm1,%xmm5 > + punpckldq %xmm2,%xmm0 > + punpckldq %xmm0,%xmm5 > + movdqa %xmm10,%xmm7 > +.byte 102,15,56,0,238 > + movdqa %xmm10,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm10,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,160-128(%rax) > + paddd %xmm13,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd -64(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm10,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm10,%xmm3 > + pslld $26-21,%xmm2 > + pandn %xmm12,%xmm0 > + pand %xmm11,%xmm3 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm14,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm14,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + movdqa %xmm15,%xmm3 > + movdqa %xmm14,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm14,%xmm3 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm3,%xmm4 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm15,%xmm13 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm4,%xmm13 > + paddd %xmm5,%xmm9 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm13 > + paddd %xmm7,%xmm13 > + movd 44(%r8),%xmm5 > + movd 44(%r9),%xmm0 > + movd 44(%r10),%xmm1 > + movd 44(%r11),%xmm2 > + punpckldq %xmm1,%xmm5 > + punpckldq %xmm2,%xmm0 > + punpckldq %xmm0,%xmm5 > + movdqa %xmm9,%xmm7 > + > + movdqa %xmm9,%xmm2 > +.byte 102,15,56,0,238 > + psrld $6,%xmm7 > + movdqa %xmm9,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,176-128(%rax) > + paddd %xmm12,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd -32(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm9,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm9,%xmm4 > + pslld $26-21,%xmm2 > + pandn %xmm11,%xmm0 > + pand %xmm10,%xmm4 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm13,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm13,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm4,%xmm0 > + movdqa %xmm14,%xmm4 > + movdqa %xmm13,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm13,%xmm4 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm4,%xmm3 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm14,%xmm12 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm3,%xmm12 > + paddd %xmm5,%xmm8 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm12 > + paddd %xmm7,%xmm12 > + movd 48(%r8),%xmm5 > + movd 48(%r9),%xmm0 > + movd 48(%r10),%xmm1 > + movd 48(%r11),%xmm2 > + punpckldq %xmm1,%xmm5 > + punpckldq %xmm2,%xmm0 > + punpckldq %xmm0,%xmm5 > + movdqa %xmm8,%xmm7 > +.byte 102,15,56,0,238 > + movdqa %xmm8,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm8,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,192-128(%rax) > + paddd %xmm11,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd 0(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm8,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm8,%xmm3 > + pslld $26-21,%xmm2 > + pandn %xmm10,%xmm0 > + pand %xmm9,%xmm3 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm12,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm12,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor 
%xmm3,%xmm0 > + movdqa %xmm13,%xmm3 > + movdqa %xmm12,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm12,%xmm3 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm3,%xmm4 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm13,%xmm11 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm4,%xmm11 > + paddd %xmm5,%xmm15 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm11 > + paddd %xmm7,%xmm11 > + movd 52(%r8),%xmm5 > + movd 52(%r9),%xmm0 > + movd 52(%r10),%xmm1 > + movd 52(%r11),%xmm2 > + punpckldq %xmm1,%xmm5 > + punpckldq %xmm2,%xmm0 > + punpckldq %xmm0,%xmm5 > + movdqa %xmm15,%xmm7 > + > + movdqa %xmm15,%xmm2 > +.byte 102,15,56,0,238 > + psrld $6,%xmm7 > + movdqa %xmm15,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,208-128(%rax) > + paddd %xmm10,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd 32(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm15,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm15,%xmm4 > + pslld $26-21,%xmm2 > + pandn %xmm9,%xmm0 > + pand %xmm8,%xmm4 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm11,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm11,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm4,%xmm0 > + movdqa %xmm12,%xmm4 > + movdqa %xmm11,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm11,%xmm4 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm4,%xmm3 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm12,%xmm10 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm3,%xmm10 > + paddd %xmm5,%xmm14 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm10 > + paddd %xmm7,%xmm10 > + movd 56(%r8),%xmm5 > + movd 56(%r9),%xmm0 > + movd 56(%r10),%xmm1 > + movd 56(%r11),%xmm2 > + punpckldq %xmm1,%xmm5 > + punpckldq %xmm2,%xmm0 > + punpckldq %xmm0,%xmm5 > + movdqa %xmm14,%xmm7 > +.byte 102,15,56,0,238 > + movdqa %xmm14,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm14,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,224-128(%rax) > + paddd %xmm9,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd 64(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm14,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm14,%xmm3 > + pslld $26-21,%xmm2 > + pandn %xmm8,%xmm0 > + pand %xmm15,%xmm3 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm10,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm10,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + movdqa %xmm11,%xmm3 > + movdqa %xmm10,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm10,%xmm3 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm3,%xmm4 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm11,%xmm9 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm4,%xmm9 > + paddd %xmm5,%xmm13 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm9 > + paddd %xmm7,%xmm9 > + movd 60(%r8),%xmm5 > + leaq 64(%r8),%r8 > + movd 60(%r9),%xmm0 > + leaq 64(%r9),%r9 > + movd 60(%r10),%xmm1 > + leaq 64(%r10),%r10 > + movd 60(%r11),%xmm2 > + leaq 64(%r11),%r11 > + punpckldq %xmm1,%xmm5 > + punpckldq %xmm2,%xmm0 > + punpckldq %xmm0,%xmm5 > + movdqa %xmm13,%xmm7 > + > + movdqa %xmm13,%xmm2 > +.byte 102,15,56,0,238 > + psrld $6,%xmm7 > + movdqa %xmm13,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,240-128(%rax) > + paddd %xmm8,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd 96(%rbp),%xmm5 > + 
pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm13,%xmm0 > + prefetcht0 63(%r8) > + pxor %xmm2,%xmm7 > + movdqa %xmm13,%xmm4 > + pslld $26-21,%xmm2 > + pandn %xmm15,%xmm0 > + pand %xmm14,%xmm4 > + pxor %xmm1,%xmm7 > + > + prefetcht0 63(%r9) > + movdqa %xmm9,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm9,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm4,%xmm0 > + movdqa %xmm10,%xmm4 > + movdqa %xmm9,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm9,%xmm4 > + > + prefetcht0 63(%r10) > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm4,%xmm3 > + pxor %xmm7,%xmm1 > + > + prefetcht0 63(%r11) > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm10,%xmm8 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm3,%xmm8 > + paddd %xmm5,%xmm12 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm8 > + paddd %xmm7,%xmm8 > + leaq 256(%rbp),%rbp > + movdqu 0-128(%rax),%xmm5 > + movl $3,%ecx > + jmp .Loop_16_xx > +.align 32 > +.Loop_16_xx: > + movdqa 16-128(%rax),%xmm6 > + paddd 144-128(%rax),%xmm5 > + > + movdqa %xmm6,%xmm7 > + movdqa %xmm6,%xmm1 > + psrld $3,%xmm7 > + movdqa %xmm6,%xmm2 > + > + psrld $7,%xmm1 > + movdqa 224-128(%rax),%xmm0 > + pslld $14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $18-7,%xmm1 > + movdqa %xmm0,%xmm3 > + pxor %xmm2,%xmm7 > + pslld $25-14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $10,%xmm0 > + movdqa %xmm3,%xmm1 > + > + psrld $17,%xmm3 > + pxor %xmm2,%xmm7 > + pslld $13,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + psrld $19-17,%xmm3 > + pxor %xmm1,%xmm0 > + pslld $15-13,%xmm1 > + pxor %xmm3,%xmm0 > + pxor %xmm1,%xmm0 > + paddd %xmm0,%xmm5 > + movdqa %xmm12,%xmm7 > + > + movdqa %xmm12,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm12,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,0-128(%rax) > + paddd %xmm15,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd -128(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm12,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm12,%xmm3 > + pslld $26-21,%xmm2 > + pandn %xmm14,%xmm0 > + pand %xmm13,%xmm3 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm8,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm8,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + movdqa %xmm9,%xmm3 > + movdqa %xmm8,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm8,%xmm3 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm3,%xmm4 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm9,%xmm15 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm4,%xmm15 > + paddd %xmm5,%xmm11 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm15 > + paddd %xmm7,%xmm15 > + movdqa 32-128(%rax),%xmm5 > + paddd 160-128(%rax),%xmm6 > + > + movdqa %xmm5,%xmm7 > + movdqa %xmm5,%xmm1 > + psrld $3,%xmm7 > + movdqa %xmm5,%xmm2 > + > + psrld $7,%xmm1 > + movdqa 240-128(%rax),%xmm0 > + pslld $14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $18-7,%xmm1 > + movdqa %xmm0,%xmm4 > + pxor %xmm2,%xmm7 > + pslld $25-14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $10,%xmm0 > + movdqa %xmm4,%xmm1 > + > + psrld $17,%xmm4 > + pxor %xmm2,%xmm7 > + pslld $13,%xmm1 > + paddd %xmm7,%xmm6 > + pxor %xmm4,%xmm0 > + psrld $19-17,%xmm4 > + pxor %xmm1,%xmm0 > + pslld $15-13,%xmm1 > + pxor %xmm4,%xmm0 > + pxor %xmm1,%xmm0 > + paddd %xmm0,%xmm6 > + movdqa %xmm11,%xmm7 > + > + movdqa %xmm11,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm11,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm6,16-128(%rax) > + paddd %xmm14,%xmm6 > + > + psrld $11,%xmm1 > + pxor 
%xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd -96(%rbp),%xmm6 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm11,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm11,%xmm4 > + pslld $26-21,%xmm2 > + pandn %xmm13,%xmm0 > + pand %xmm12,%xmm4 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm15,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm15,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm6 > + pxor %xmm4,%xmm0 > + movdqa %xmm8,%xmm4 > + movdqa %xmm15,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm15,%xmm4 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm6 > + pslld $19-10,%xmm2 > + pand %xmm4,%xmm3 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm8,%xmm14 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm3,%xmm14 > + paddd %xmm6,%xmm10 > + pxor %xmm2,%xmm7 > + > + paddd %xmm6,%xmm14 > + paddd %xmm7,%xmm14 > + movdqa 48-128(%rax),%xmm6 > + paddd 176-128(%rax),%xmm5 > + > + movdqa %xmm6,%xmm7 > + movdqa %xmm6,%xmm1 > + psrld $3,%xmm7 > + movdqa %xmm6,%xmm2 > + > + psrld $7,%xmm1 > + movdqa 0-128(%rax),%xmm0 > + pslld $14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $18-7,%xmm1 > + movdqa %xmm0,%xmm3 > + pxor %xmm2,%xmm7 > + pslld $25-14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $10,%xmm0 > + movdqa %xmm3,%xmm1 > + > + psrld $17,%xmm3 > + pxor %xmm2,%xmm7 > + pslld $13,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + psrld $19-17,%xmm3 > + pxor %xmm1,%xmm0 > + pslld $15-13,%xmm1 > + pxor %xmm3,%xmm0 > + pxor %xmm1,%xmm0 > + paddd %xmm0,%xmm5 > + movdqa %xmm10,%xmm7 > + > + movdqa %xmm10,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm10,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,32-128(%rax) > + paddd %xmm13,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd -64(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm10,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm10,%xmm3 > + pslld $26-21,%xmm2 > + pandn %xmm12,%xmm0 > + pand %xmm11,%xmm3 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm14,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm14,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + movdqa %xmm15,%xmm3 > + movdqa %xmm14,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm14,%xmm3 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm3,%xmm4 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm15,%xmm13 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm4,%xmm13 > + paddd %xmm5,%xmm9 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm13 > + paddd %xmm7,%xmm13 > + movdqa 64-128(%rax),%xmm5 > + paddd 192-128(%rax),%xmm6 > + > + movdqa %xmm5,%xmm7 > + movdqa %xmm5,%xmm1 > + psrld $3,%xmm7 > + movdqa %xmm5,%xmm2 > + > + psrld $7,%xmm1 > + movdqa 16-128(%rax),%xmm0 > + pslld $14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $18-7,%xmm1 > + movdqa %xmm0,%xmm4 > + pxor %xmm2,%xmm7 > + pslld $25-14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $10,%xmm0 > + movdqa %xmm4,%xmm1 > + > + psrld $17,%xmm4 > + pxor %xmm2,%xmm7 > + pslld $13,%xmm1 > + paddd %xmm7,%xmm6 > + pxor %xmm4,%xmm0 > + psrld $19-17,%xmm4 > + pxor %xmm1,%xmm0 > + pslld $15-13,%xmm1 > + pxor %xmm4,%xmm0 > + pxor %xmm1,%xmm0 > + paddd %xmm0,%xmm6 > + movdqa %xmm9,%xmm7 > + > + movdqa %xmm9,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm9,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm6,48-128(%rax) > + paddd %xmm12,%xmm6 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd -32(%rbp),%xmm6 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm9,%xmm0 > 
+ > + pxor %xmm2,%xmm7 > + movdqa %xmm9,%xmm4 > + pslld $26-21,%xmm2 > + pandn %xmm11,%xmm0 > + pand %xmm10,%xmm4 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm13,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm13,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm6 > + pxor %xmm4,%xmm0 > + movdqa %xmm14,%xmm4 > + movdqa %xmm13,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm13,%xmm4 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm6 > + pslld $19-10,%xmm2 > + pand %xmm4,%xmm3 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm14,%xmm12 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm3,%xmm12 > + paddd %xmm6,%xmm8 > + pxor %xmm2,%xmm7 > + > + paddd %xmm6,%xmm12 > + paddd %xmm7,%xmm12 > + movdqa 80-128(%rax),%xmm6 > + paddd 208-128(%rax),%xmm5 > + > + movdqa %xmm6,%xmm7 > + movdqa %xmm6,%xmm1 > + psrld $3,%xmm7 > + movdqa %xmm6,%xmm2 > + > + psrld $7,%xmm1 > + movdqa 32-128(%rax),%xmm0 > + pslld $14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $18-7,%xmm1 > + movdqa %xmm0,%xmm3 > + pxor %xmm2,%xmm7 > + pslld $25-14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $10,%xmm0 > + movdqa %xmm3,%xmm1 > + > + psrld $17,%xmm3 > + pxor %xmm2,%xmm7 > + pslld $13,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + psrld $19-17,%xmm3 > + pxor %xmm1,%xmm0 > + pslld $15-13,%xmm1 > + pxor %xmm3,%xmm0 > + pxor %xmm1,%xmm0 > + paddd %xmm0,%xmm5 > + movdqa %xmm8,%xmm7 > + > + movdqa %xmm8,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm8,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,64-128(%rax) > + paddd %xmm11,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd 0(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm8,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm8,%xmm3 > + pslld $26-21,%xmm2 > + pandn %xmm10,%xmm0 > + pand %xmm9,%xmm3 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm12,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm12,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + movdqa %xmm13,%xmm3 > + movdqa %xmm12,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm12,%xmm3 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm3,%xmm4 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm13,%xmm11 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm4,%xmm11 > + paddd %xmm5,%xmm15 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm11 > + paddd %xmm7,%xmm11 > + movdqa 96-128(%rax),%xmm5 > + paddd 224-128(%rax),%xmm6 > + > + movdqa %xmm5,%xmm7 > + movdqa %xmm5,%xmm1 > + psrld $3,%xmm7 > + movdqa %xmm5,%xmm2 > + > + psrld $7,%xmm1 > + movdqa 48-128(%rax),%xmm0 > + pslld $14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $18-7,%xmm1 > + movdqa %xmm0,%xmm4 > + pxor %xmm2,%xmm7 > + pslld $25-14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $10,%xmm0 > + movdqa %xmm4,%xmm1 > + > + psrld $17,%xmm4 > + pxor %xmm2,%xmm7 > + pslld $13,%xmm1 > + paddd %xmm7,%xmm6 > + pxor %xmm4,%xmm0 > + psrld $19-17,%xmm4 > + pxor %xmm1,%xmm0 > + pslld $15-13,%xmm1 > + pxor %xmm4,%xmm0 > + pxor %xmm1,%xmm0 > + paddd %xmm0,%xmm6 > + movdqa %xmm15,%xmm7 > + > + movdqa %xmm15,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm15,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm6,80-128(%rax) > + paddd %xmm10,%xmm6 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd 32(%rbp),%xmm6 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm15,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm15,%xmm4 > + pslld $26-21,%xmm2 > + pandn %xmm9,%xmm0 > + pand %xmm8,%xmm4 > + pxor %xmm1,%xmm7 > + 
> + > + movdqa %xmm11,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm11,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm6 > + pxor %xmm4,%xmm0 > + movdqa %xmm12,%xmm4 > + movdqa %xmm11,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm11,%xmm4 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm6 > + pslld $19-10,%xmm2 > + pand %xmm4,%xmm3 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm12,%xmm10 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm3,%xmm10 > + paddd %xmm6,%xmm14 > + pxor %xmm2,%xmm7 > + > + paddd %xmm6,%xmm10 > + paddd %xmm7,%xmm10 > + movdqa 112-128(%rax),%xmm6 > + paddd 240-128(%rax),%xmm5 > + > + movdqa %xmm6,%xmm7 > + movdqa %xmm6,%xmm1 > + psrld $3,%xmm7 > + movdqa %xmm6,%xmm2 > + > + psrld $7,%xmm1 > + movdqa 64-128(%rax),%xmm0 > + pslld $14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $18-7,%xmm1 > + movdqa %xmm0,%xmm3 > + pxor %xmm2,%xmm7 > + pslld $25-14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $10,%xmm0 > + movdqa %xmm3,%xmm1 > + > + psrld $17,%xmm3 > + pxor %xmm2,%xmm7 > + pslld $13,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + psrld $19-17,%xmm3 > + pxor %xmm1,%xmm0 > + pslld $15-13,%xmm1 > + pxor %xmm3,%xmm0 > + pxor %xmm1,%xmm0 > + paddd %xmm0,%xmm5 > + movdqa %xmm14,%xmm7 > + > + movdqa %xmm14,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm14,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,96-128(%rax) > + paddd %xmm9,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd 64(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm14,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm14,%xmm3 > + pslld $26-21,%xmm2 > + pandn %xmm8,%xmm0 > + pand %xmm15,%xmm3 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm10,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm10,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + movdqa %xmm11,%xmm3 > + movdqa %xmm10,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm10,%xmm3 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm3,%xmm4 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm11,%xmm9 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm4,%xmm9 > + paddd %xmm5,%xmm13 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm9 > + paddd %xmm7,%xmm9 > + movdqa 128-128(%rax),%xmm5 > + paddd 0-128(%rax),%xmm6 > + > + movdqa %xmm5,%xmm7 > + movdqa %xmm5,%xmm1 > + psrld $3,%xmm7 > + movdqa %xmm5,%xmm2 > + > + psrld $7,%xmm1 > + movdqa 80-128(%rax),%xmm0 > + pslld $14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $18-7,%xmm1 > + movdqa %xmm0,%xmm4 > + pxor %xmm2,%xmm7 > + pslld $25-14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $10,%xmm0 > + movdqa %xmm4,%xmm1 > + > + psrld $17,%xmm4 > + pxor %xmm2,%xmm7 > + pslld $13,%xmm1 > + paddd %xmm7,%xmm6 > + pxor %xmm4,%xmm0 > + psrld $19-17,%xmm4 > + pxor %xmm1,%xmm0 > + pslld $15-13,%xmm1 > + pxor %xmm4,%xmm0 > + pxor %xmm1,%xmm0 > + paddd %xmm0,%xmm6 > + movdqa %xmm13,%xmm7 > + > + movdqa %xmm13,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm13,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm6,112-128(%rax) > + paddd %xmm8,%xmm6 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd 96(%rbp),%xmm6 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm13,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm13,%xmm4 > + pslld $26-21,%xmm2 > + pandn %xmm15,%xmm0 > + pand %xmm14,%xmm4 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm9,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm9,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm6 > + pxor %xmm4,%xmm0 > 
+ movdqa %xmm10,%xmm4 > + movdqa %xmm9,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm9,%xmm4 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm6 > + pslld $19-10,%xmm2 > + pand %xmm4,%xmm3 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm10,%xmm8 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm3,%xmm8 > + paddd %xmm6,%xmm12 > + pxor %xmm2,%xmm7 > + > + paddd %xmm6,%xmm8 > + paddd %xmm7,%xmm8 > + leaq 256(%rbp),%rbp > + movdqa 144-128(%rax),%xmm6 > + paddd 16-128(%rax),%xmm5 > + > + movdqa %xmm6,%xmm7 > + movdqa %xmm6,%xmm1 > + psrld $3,%xmm7 > + movdqa %xmm6,%xmm2 > + > + psrld $7,%xmm1 > + movdqa 96-128(%rax),%xmm0 > + pslld $14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $18-7,%xmm1 > + movdqa %xmm0,%xmm3 > + pxor %xmm2,%xmm7 > + pslld $25-14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $10,%xmm0 > + movdqa %xmm3,%xmm1 > + > + psrld $17,%xmm3 > + pxor %xmm2,%xmm7 > + pslld $13,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + psrld $19-17,%xmm3 > + pxor %xmm1,%xmm0 > + pslld $15-13,%xmm1 > + pxor %xmm3,%xmm0 > + pxor %xmm1,%xmm0 > + paddd %xmm0,%xmm5 > + movdqa %xmm12,%xmm7 > + > + movdqa %xmm12,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm12,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,128-128(%rax) > + paddd %xmm15,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd -128(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm12,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm12,%xmm3 > + pslld $26-21,%xmm2 > + pandn %xmm14,%xmm0 > + pand %xmm13,%xmm3 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm8,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm8,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + movdqa %xmm9,%xmm3 > + movdqa %xmm8,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm8,%xmm3 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm3,%xmm4 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm9,%xmm15 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm4,%xmm15 > + paddd %xmm5,%xmm11 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm15 > + paddd %xmm7,%xmm15 > + movdqa 160-128(%rax),%xmm5 > + paddd 32-128(%rax),%xmm6 > + > + movdqa %xmm5,%xmm7 > + movdqa %xmm5,%xmm1 > + psrld $3,%xmm7 > + movdqa %xmm5,%xmm2 > + > + psrld $7,%xmm1 > + movdqa 112-128(%rax),%xmm0 > + pslld $14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $18-7,%xmm1 > + movdqa %xmm0,%xmm4 > + pxor %xmm2,%xmm7 > + pslld $25-14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $10,%xmm0 > + movdqa %xmm4,%xmm1 > + > + psrld $17,%xmm4 > + pxor %xmm2,%xmm7 > + pslld $13,%xmm1 > + paddd %xmm7,%xmm6 > + pxor %xmm4,%xmm0 > + psrld $19-17,%xmm4 > + pxor %xmm1,%xmm0 > + pslld $15-13,%xmm1 > + pxor %xmm4,%xmm0 > + pxor %xmm1,%xmm0 > + paddd %xmm0,%xmm6 > + movdqa %xmm11,%xmm7 > + > + movdqa %xmm11,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm11,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm6,144-128(%rax) > + paddd %xmm14,%xmm6 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd -96(%rbp),%xmm6 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm11,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm11,%xmm4 > + pslld $26-21,%xmm2 > + pandn %xmm13,%xmm0 > + pand %xmm12,%xmm4 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm15,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm15,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm6 > + pxor %xmm4,%xmm0 > + movdqa %xmm8,%xmm4 > + movdqa %xmm15,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm15,%xmm4 > + > + > + psrld 
$13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm6 > + pslld $19-10,%xmm2 > + pand %xmm4,%xmm3 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm8,%xmm14 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm3,%xmm14 > + paddd %xmm6,%xmm10 > + pxor %xmm2,%xmm7 > + > + paddd %xmm6,%xmm14 > + paddd %xmm7,%xmm14 > + movdqa 176-128(%rax),%xmm6 > + paddd 48-128(%rax),%xmm5 > + > + movdqa %xmm6,%xmm7 > + movdqa %xmm6,%xmm1 > + psrld $3,%xmm7 > + movdqa %xmm6,%xmm2 > + > + psrld $7,%xmm1 > + movdqa 128-128(%rax),%xmm0 > + pslld $14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $18-7,%xmm1 > + movdqa %xmm0,%xmm3 > + pxor %xmm2,%xmm7 > + pslld $25-14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $10,%xmm0 > + movdqa %xmm3,%xmm1 > + > + psrld $17,%xmm3 > + pxor %xmm2,%xmm7 > + pslld $13,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + psrld $19-17,%xmm3 > + pxor %xmm1,%xmm0 > + pslld $15-13,%xmm1 > + pxor %xmm3,%xmm0 > + pxor %xmm1,%xmm0 > + paddd %xmm0,%xmm5 > + movdqa %xmm10,%xmm7 > + > + movdqa %xmm10,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm10,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,160-128(%rax) > + paddd %xmm13,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd -64(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm10,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm10,%xmm3 > + pslld $26-21,%xmm2 > + pandn %xmm12,%xmm0 > + pand %xmm11,%xmm3 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm14,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm14,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + movdqa %xmm15,%xmm3 > + movdqa %xmm14,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm14,%xmm3 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm3,%xmm4 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm15,%xmm13 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm4,%xmm13 > + paddd %xmm5,%xmm9 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm13 > + paddd %xmm7,%xmm13 > + movdqa 192-128(%rax),%xmm5 > + paddd 64-128(%rax),%xmm6 > + > + movdqa %xmm5,%xmm7 > + movdqa %xmm5,%xmm1 > + psrld $3,%xmm7 > + movdqa %xmm5,%xmm2 > + > + psrld $7,%xmm1 > + movdqa 144-128(%rax),%xmm0 > + pslld $14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $18-7,%xmm1 > + movdqa %xmm0,%xmm4 > + pxor %xmm2,%xmm7 > + pslld $25-14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $10,%xmm0 > + movdqa %xmm4,%xmm1 > + > + psrld $17,%xmm4 > + pxor %xmm2,%xmm7 > + pslld $13,%xmm1 > + paddd %xmm7,%xmm6 > + pxor %xmm4,%xmm0 > + psrld $19-17,%xmm4 > + pxor %xmm1,%xmm0 > + pslld $15-13,%xmm1 > + pxor %xmm4,%xmm0 > + pxor %xmm1,%xmm0 > + paddd %xmm0,%xmm6 > + movdqa %xmm9,%xmm7 > + > + movdqa %xmm9,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm9,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm6,176-128(%rax) > + paddd %xmm12,%xmm6 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd -32(%rbp),%xmm6 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm9,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm9,%xmm4 > + pslld $26-21,%xmm2 > + pandn %xmm11,%xmm0 > + pand %xmm10,%xmm4 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm13,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm13,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm6 > + pxor %xmm4,%xmm0 > + movdqa %xmm14,%xmm4 > + movdqa %xmm13,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm13,%xmm4 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm6 > + pslld $19-10,%xmm2 > + pand %xmm4,%xmm3 > + pxor %xmm7,%xmm1 > + > + > + 
psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm14,%xmm12 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm3,%xmm12 > + paddd %xmm6,%xmm8 > + pxor %xmm2,%xmm7 > + > + paddd %xmm6,%xmm12 > + paddd %xmm7,%xmm12 > + movdqa 208-128(%rax),%xmm6 > + paddd 80-128(%rax),%xmm5 > + > + movdqa %xmm6,%xmm7 > + movdqa %xmm6,%xmm1 > + psrld $3,%xmm7 > + movdqa %xmm6,%xmm2 > + > + psrld $7,%xmm1 > + movdqa 160-128(%rax),%xmm0 > + pslld $14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $18-7,%xmm1 > + movdqa %xmm0,%xmm3 > + pxor %xmm2,%xmm7 > + pslld $25-14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $10,%xmm0 > + movdqa %xmm3,%xmm1 > + > + psrld $17,%xmm3 > + pxor %xmm2,%xmm7 > + pslld $13,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + psrld $19-17,%xmm3 > + pxor %xmm1,%xmm0 > + pslld $15-13,%xmm1 > + pxor %xmm3,%xmm0 > + pxor %xmm1,%xmm0 > + paddd %xmm0,%xmm5 > + movdqa %xmm8,%xmm7 > + > + movdqa %xmm8,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm8,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,192-128(%rax) > + paddd %xmm11,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd 0(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm8,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm8,%xmm3 > + pslld $26-21,%xmm2 > + pandn %xmm10,%xmm0 > + pand %xmm9,%xmm3 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm12,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm12,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + movdqa %xmm13,%xmm3 > + movdqa %xmm12,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm12,%xmm3 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm3,%xmm4 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm13,%xmm11 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm4,%xmm11 > + paddd %xmm5,%xmm15 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm11 > + paddd %xmm7,%xmm11 > + movdqa 224-128(%rax),%xmm5 > + paddd 96-128(%rax),%xmm6 > + > + movdqa %xmm5,%xmm7 > + movdqa %xmm5,%xmm1 > + psrld $3,%xmm7 > + movdqa %xmm5,%xmm2 > + > + psrld $7,%xmm1 > + movdqa 176-128(%rax),%xmm0 > + pslld $14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $18-7,%xmm1 > + movdqa %xmm0,%xmm4 > + pxor %xmm2,%xmm7 > + pslld $25-14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $10,%xmm0 > + movdqa %xmm4,%xmm1 > + > + psrld $17,%xmm4 > + pxor %xmm2,%xmm7 > + pslld $13,%xmm1 > + paddd %xmm7,%xmm6 > + pxor %xmm4,%xmm0 > + psrld $19-17,%xmm4 > + pxor %xmm1,%xmm0 > + pslld $15-13,%xmm1 > + pxor %xmm4,%xmm0 > + pxor %xmm1,%xmm0 > + paddd %xmm0,%xmm6 > + movdqa %xmm15,%xmm7 > + > + movdqa %xmm15,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm15,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm6,208-128(%rax) > + paddd %xmm10,%xmm6 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd 32(%rbp),%xmm6 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm15,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm15,%xmm4 > + pslld $26-21,%xmm2 > + pandn %xmm9,%xmm0 > + pand %xmm8,%xmm4 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm11,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm11,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm6 > + pxor %xmm4,%xmm0 > + movdqa %xmm12,%xmm4 > + movdqa %xmm11,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm11,%xmm4 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm6 > + pslld $19-10,%xmm2 > + pand %xmm4,%xmm3 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm12,%xmm10 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm3,%xmm10 > + 
paddd %xmm6,%xmm14 > + pxor %xmm2,%xmm7 > + > + paddd %xmm6,%xmm10 > + paddd %xmm7,%xmm10 > + movdqa 240-128(%rax),%xmm6 > + paddd 112-128(%rax),%xmm5 > + > + movdqa %xmm6,%xmm7 > + movdqa %xmm6,%xmm1 > + psrld $3,%xmm7 > + movdqa %xmm6,%xmm2 > + > + psrld $7,%xmm1 > + movdqa 192-128(%rax),%xmm0 > + pslld $14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $18-7,%xmm1 > + movdqa %xmm0,%xmm3 > + pxor %xmm2,%xmm7 > + pslld $25-14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $10,%xmm0 > + movdqa %xmm3,%xmm1 > + > + psrld $17,%xmm3 > + pxor %xmm2,%xmm7 > + pslld $13,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + psrld $19-17,%xmm3 > + pxor %xmm1,%xmm0 > + pslld $15-13,%xmm1 > + pxor %xmm3,%xmm0 > + pxor %xmm1,%xmm0 > + paddd %xmm0,%xmm5 > + movdqa %xmm14,%xmm7 > + > + movdqa %xmm14,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm14,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm5,224-128(%rax) > + paddd %xmm9,%xmm5 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd 64(%rbp),%xmm5 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm14,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm14,%xmm3 > + pslld $26-21,%xmm2 > + pandn %xmm8,%xmm0 > + pand %xmm15,%xmm3 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm10,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm10,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm5 > + pxor %xmm3,%xmm0 > + movdqa %xmm11,%xmm3 > + movdqa %xmm10,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm10,%xmm3 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm5 > + pslld $19-10,%xmm2 > + pand %xmm3,%xmm4 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm11,%xmm9 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm4,%xmm9 > + paddd %xmm5,%xmm13 > + pxor %xmm2,%xmm7 > + > + paddd %xmm5,%xmm9 > + paddd %xmm7,%xmm9 > + movdqa 0-128(%rax),%xmm5 > + paddd 128-128(%rax),%xmm6 > + > + movdqa %xmm5,%xmm7 > + movdqa %xmm5,%xmm1 > + psrld $3,%xmm7 > + movdqa %xmm5,%xmm2 > + > + psrld $7,%xmm1 > + movdqa 208-128(%rax),%xmm0 > + pslld $14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $18-7,%xmm1 > + movdqa %xmm0,%xmm4 > + pxor %xmm2,%xmm7 > + pslld $25-14,%xmm2 > + pxor %xmm1,%xmm7 > + psrld $10,%xmm0 > + movdqa %xmm4,%xmm1 > + > + psrld $17,%xmm4 > + pxor %xmm2,%xmm7 > + pslld $13,%xmm1 > + paddd %xmm7,%xmm6 > + pxor %xmm4,%xmm0 > + psrld $19-17,%xmm4 > + pxor %xmm1,%xmm0 > + pslld $15-13,%xmm1 > + pxor %xmm4,%xmm0 > + pxor %xmm1,%xmm0 > + paddd %xmm0,%xmm6 > + movdqa %xmm13,%xmm7 > + > + movdqa %xmm13,%xmm2 > + > + psrld $6,%xmm7 > + movdqa %xmm13,%xmm1 > + pslld $7,%xmm2 > + movdqa %xmm6,240-128(%rax) > + paddd %xmm8,%xmm6 > + > + psrld $11,%xmm1 > + pxor %xmm2,%xmm7 > + pslld $21-7,%xmm2 > + paddd 96(%rbp),%xmm6 > + pxor %xmm1,%xmm7 > + > + psrld $25-11,%xmm1 > + movdqa %xmm13,%xmm0 > + > + pxor %xmm2,%xmm7 > + movdqa %xmm13,%xmm4 > + pslld $26-21,%xmm2 > + pandn %xmm15,%xmm0 > + pand %xmm14,%xmm4 > + pxor %xmm1,%xmm7 > + > + > + movdqa %xmm9,%xmm1 > + pxor %xmm2,%xmm7 > + movdqa %xmm9,%xmm2 > + psrld $2,%xmm1 > + paddd %xmm7,%xmm6 > + pxor %xmm4,%xmm0 > + movdqa %xmm10,%xmm4 > + movdqa %xmm9,%xmm7 > + pslld $10,%xmm2 > + pxor %xmm9,%xmm4 > + > + > + psrld $13,%xmm7 > + pxor %xmm2,%xmm1 > + paddd %xmm0,%xmm6 > + pslld $19-10,%xmm2 > + pand %xmm4,%xmm3 > + pxor %xmm7,%xmm1 > + > + > + psrld $22-13,%xmm7 > + pxor %xmm2,%xmm1 > + movdqa %xmm10,%xmm8 > + pslld $30-19,%xmm2 > + pxor %xmm1,%xmm7 > + pxor %xmm3,%xmm8 > + paddd %xmm6,%xmm12 > + pxor %xmm2,%xmm7 > + > + paddd %xmm6,%xmm8 > + paddd %xmm7,%xmm8 > + leaq 256(%rbp),%rbp > + decl %ecx > + jnz 
.Loop_16_xx > + > + movl $1,%ecx > + leaq K256+128(%rip),%rbp > + > + movdqa (%rbx),%xmm7 > + cmpl 0(%rbx),%ecx > + pxor %xmm0,%xmm0 > + cmovgeq %rbp,%r8 > + cmpl 4(%rbx),%ecx > + movdqa %xmm7,%xmm6 > + cmovgeq %rbp,%r9 > + cmpl 8(%rbx),%ecx > + pcmpgtd %xmm0,%xmm6 > + cmovgeq %rbp,%r10 > + cmpl 12(%rbx),%ecx > + paddd %xmm6,%xmm7 > + cmovgeq %rbp,%r11 > + > + movdqu 0-128(%rdi),%xmm0 > + pand %xmm6,%xmm8 > + movdqu 32-128(%rdi),%xmm1 > + pand %xmm6,%xmm9 > + movdqu 64-128(%rdi),%xmm2 > + pand %xmm6,%xmm10 > + movdqu 96-128(%rdi),%xmm5 > + pand %xmm6,%xmm11 > + paddd %xmm0,%xmm8 > + movdqu 128-128(%rdi),%xmm0 > + pand %xmm6,%xmm12 > + paddd %xmm1,%xmm9 > + movdqu 160-128(%rdi),%xmm1 > + pand %xmm6,%xmm13 > + paddd %xmm2,%xmm10 > + movdqu 192-128(%rdi),%xmm2 > + pand %xmm6,%xmm14 > + paddd %xmm5,%xmm11 > + movdqu 224-128(%rdi),%xmm5 > + pand %xmm6,%xmm15 > + paddd %xmm0,%xmm12 > + paddd %xmm1,%xmm13 > + movdqu %xmm8,0-128(%rdi) > + paddd %xmm2,%xmm14 > + movdqu %xmm9,32-128(%rdi) > + paddd %xmm5,%xmm15 > + movdqu %xmm10,64-128(%rdi) > + movdqu %xmm11,96-128(%rdi) > + movdqu %xmm12,128-128(%rdi) > + movdqu %xmm13,160-128(%rdi) > + movdqu %xmm14,192-128(%rdi) > + movdqu %xmm15,224-128(%rdi) > + > + movdqa %xmm7,(%rbx) > + movdqa .Lpbswap(%rip),%xmm6 > + decl %edx > + jnz .Loop > + > + movl 280(%rsp),%edx > + leaq 16(%rdi),%rdi > + leaq 64(%rsi),%rsi > + decl %edx > + jnz .Loop_grande > + > +.Ldone: > + movq 272(%rsp),%rax > +.cfi_def_cfa %rax,8 > + movq -16(%rax),%rbp > +.cfi_restore %rbp > + movq -8(%rax),%rbx > +.cfi_restore %rbx > + leaq (%rax),%rsp > +.cfi_def_cfa_register %rsp > +.Lepilogue: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size sha256_multi_block,.-sha256_multi_block > +.type sha256_multi_block_shaext,@function > +.align 32 > +sha256_multi_block_shaext: > +.cfi_startproc > +_shaext_shortcut: > + movq %rsp,%rax > +.cfi_def_cfa_register %rax > + pushq %rbx > +.cfi_offset %rbx,-16 > + pushq %rbp > +.cfi_offset %rbp,-24 > + subq $288,%rsp > + shll $1,%edx > + andq $-256,%rsp > + leaq 128(%rdi),%rdi > + movq %rax,272(%rsp) > +.Lbody_shaext: > + leaq 256(%rsp),%rbx > + leaq K256_shaext+128(%rip),%rbp > + > +.Loop_grande_shaext: > + movl %edx,280(%rsp) > + xorl %edx,%edx > + movq 0(%rsi),%r8 > + movl 8(%rsi),%ecx > + cmpl %edx,%ecx > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movl %ecx,0(%rbx) > + cmovleq %rsp,%r8 > + movq 16(%rsi),%r9 > + movl 24(%rsi),%ecx > + cmpl %edx,%ecx > + cmovgl %ecx,%edx > + testl %ecx,%ecx > + movl %ecx,4(%rbx) > + cmovleq %rsp,%r9 > + testl %edx,%edx > + jz .Ldone_shaext > + > + movq 0-128(%rdi),%xmm12 > + movq 32-128(%rdi),%xmm4 > + movq 64-128(%rdi),%xmm13 > + movq 96-128(%rdi),%xmm5 > + movq 128-128(%rdi),%xmm8 > + movq 160-128(%rdi),%xmm9 > + movq 192-128(%rdi),%xmm10 > + movq 224-128(%rdi),%xmm11 > + > + punpckldq %xmm4,%xmm12 > + punpckldq %xmm5,%xmm13 > + punpckldq %xmm9,%xmm8 > + punpckldq %xmm11,%xmm10 > + movdqa K256_shaext-16(%rip),%xmm3 > + > + movdqa %xmm12,%xmm14 > + movdqa %xmm13,%xmm15 > + punpcklqdq %xmm8,%xmm12 > + punpcklqdq %xmm10,%xmm13 > + punpckhqdq %xmm8,%xmm14 > + punpckhqdq %xmm10,%xmm15 > + > + pshufd $27,%xmm12,%xmm12 > + pshufd $27,%xmm13,%xmm13 > + pshufd $27,%xmm14,%xmm14 > + pshufd $27,%xmm15,%xmm15 > + jmp .Loop_shaext > + > +.align 32 > +.Loop_shaext: > + movdqu 0(%r8),%xmm4 > + movdqu 0(%r9),%xmm8 > + movdqu 16(%r8),%xmm5 > + movdqu 16(%r9),%xmm9 > + movdqu 32(%r8),%xmm6 > +.byte 102,15,56,0,227 > + movdqu 32(%r9),%xmm10 > +.byte 102,68,15,56,0,195 > + movdqu 48(%r8),%xmm7 > + leaq 64(%r8),%r8 > + movdqu 48(%r9),%xmm11 > + 
leaq 64(%r9),%r9 > + > + movdqa 0-128(%rbp),%xmm0 > +.byte 102,15,56,0,235 > + paddd %xmm4,%xmm0 > + pxor %xmm12,%xmm4 > + movdqa %xmm0,%xmm1 > + movdqa 0-128(%rbp),%xmm2 > +.byte 102,68,15,56,0,203 > + paddd %xmm8,%xmm2 > + movdqa %xmm13,80(%rsp) > +.byte 69,15,56,203,236 > + pxor %xmm14,%xmm8 > + movdqa %xmm2,%xmm0 > + movdqa %xmm15,112(%rsp) > +.byte 69,15,56,203,254 > + pshufd $0x0e,%xmm1,%xmm0 > + pxor %xmm12,%xmm4 > + movdqa %xmm12,64(%rsp) > +.byte 69,15,56,203,229 > + pshufd $0x0e,%xmm2,%xmm0 > + pxor %xmm14,%xmm8 > + movdqa %xmm14,96(%rsp) > + movdqa 16-128(%rbp),%xmm1 > + paddd %xmm5,%xmm1 > +.byte 102,15,56,0,243 > +.byte 69,15,56,203,247 > + > + movdqa %xmm1,%xmm0 > + movdqa 16-128(%rbp),%xmm2 > + paddd %xmm9,%xmm2 > +.byte 69,15,56,203,236 > + movdqa %xmm2,%xmm0 > + prefetcht0 127(%r8) > +.byte 102,15,56,0,251 > +.byte 102,68,15,56,0,211 > + prefetcht0 127(%r9) > +.byte 69,15,56,203,254 > + pshufd $0x0e,%xmm1,%xmm0 > +.byte 102,68,15,56,0,219 > +.byte 15,56,204,229 > +.byte 69,15,56,203,229 > + pshufd $0x0e,%xmm2,%xmm0 > + movdqa 32-128(%rbp),%xmm1 > + paddd %xmm6,%xmm1 > +.byte 69,15,56,203,247 > + > + movdqa %xmm1,%xmm0 > + movdqa 32-128(%rbp),%xmm2 > + paddd %xmm10,%xmm2 > +.byte 69,15,56,203,236 > +.byte 69,15,56,204,193 > + movdqa %xmm2,%xmm0 > + movdqa %xmm7,%xmm3 > +.byte 69,15,56,203,254 > + pshufd $0x0e,%xmm1,%xmm0 > +.byte 102,15,58,15,222,4 > + paddd %xmm3,%xmm4 > + movdqa %xmm11,%xmm3 > +.byte 102,65,15,58,15,218,4 > +.byte 15,56,204,238 > +.byte 69,15,56,203,229 > + pshufd $0x0e,%xmm2,%xmm0 > + movdqa 48-128(%rbp),%xmm1 > + paddd %xmm7,%xmm1 > +.byte 69,15,56,203,247 > +.byte 69,15,56,204,202 > + > + movdqa %xmm1,%xmm0 > + movdqa 48-128(%rbp),%xmm2 > + paddd %xmm3,%xmm8 > + paddd %xmm11,%xmm2 > +.byte 15,56,205,231 > +.byte 69,15,56,203,236 > + movdqa %xmm2,%xmm0 > + movdqa %xmm4,%xmm3 > +.byte 102,15,58,15,223,4 > +.byte 69,15,56,203,254 > +.byte 69,15,56,205,195 > + pshufd $0x0e,%xmm1,%xmm0 > + paddd %xmm3,%xmm5 > + movdqa %xmm8,%xmm3 > +.byte 102,65,15,58,15,219,4 > +.byte 15,56,204,247 > +.byte 69,15,56,203,229 > + pshufd $0x0e,%xmm2,%xmm0 > + movdqa 64-128(%rbp),%xmm1 > + paddd %xmm4,%xmm1 > +.byte 69,15,56,203,247 > +.byte 69,15,56,204,211 > + movdqa %xmm1,%xmm0 > + movdqa 64-128(%rbp),%xmm2 > + paddd %xmm3,%xmm9 > + paddd %xmm8,%xmm2 > +.byte 15,56,205,236 > +.byte 69,15,56,203,236 > + movdqa %xmm2,%xmm0 > + movdqa %xmm5,%xmm3 > +.byte 102,15,58,15,220,4 > +.byte 69,15,56,203,254 > +.byte 69,15,56,205,200 > + pshufd $0x0e,%xmm1,%xmm0 > + paddd %xmm3,%xmm6 > + movdqa %xmm9,%xmm3 > +.byte 102,65,15,58,15,216,4 > +.byte 15,56,204,252 > +.byte 69,15,56,203,229 > + pshufd $0x0e,%xmm2,%xmm0 > + movdqa 80-128(%rbp),%xmm1 > + paddd %xmm5,%xmm1 > +.byte 69,15,56,203,247 > +.byte 69,15,56,204,216 > + movdqa %xmm1,%xmm0 > + movdqa 80-128(%rbp),%xmm2 > + paddd %xmm3,%xmm10 > + paddd %xmm9,%xmm2 > +.byte 15,56,205,245 > +.byte 69,15,56,203,236 > + movdqa %xmm2,%xmm0 > + movdqa %xmm6,%xmm3 > +.byte 102,15,58,15,221,4 > +.byte 69,15,56,203,254 > +.byte 69,15,56,205,209 > + pshufd $0x0e,%xmm1,%xmm0 > + paddd %xmm3,%xmm7 > + movdqa %xmm10,%xmm3 > +.byte 102,65,15,58,15,217,4 > +.byte 15,56,204,229 > +.byte 69,15,56,203,229 > + pshufd $0x0e,%xmm2,%xmm0 > + movdqa 96-128(%rbp),%xmm1 > + paddd %xmm6,%xmm1 > +.byte 69,15,56,203,247 > +.byte 69,15,56,204,193 > + movdqa %xmm1,%xmm0 > + movdqa 96-128(%rbp),%xmm2 > + paddd %xmm3,%xmm11 > + paddd %xmm10,%xmm2 > +.byte 15,56,205,254 > +.byte 69,15,56,203,236 > + movdqa %xmm2,%xmm0 > + movdqa %xmm7,%xmm3 > +.byte 102,15,58,15,222,4 > 
+.byte 69,15,56,203,254 > +.byte 69,15,56,205,218 > + pshufd $0x0e,%xmm1,%xmm0 > + paddd %xmm3,%xmm4 > + movdqa %xmm11,%xmm3 > +.byte 102,65,15,58,15,218,4 > +.byte 15,56,204,238 > +.byte 69,15,56,203,229 > + pshufd $0x0e,%xmm2,%xmm0 > + movdqa 112-128(%rbp),%xmm1 > + paddd %xmm7,%xmm1 > +.byte 69,15,56,203,247 > +.byte 69,15,56,204,202 > + movdqa %xmm1,%xmm0 > + movdqa 112-128(%rbp),%xmm2 > + paddd %xmm3,%xmm8 > + paddd %xmm11,%xmm2 > +.byte 15,56,205,231 > +.byte 69,15,56,203,236 > + movdqa %xmm2,%xmm0 > + movdqa %xmm4,%xmm3 > +.byte 102,15,58,15,223,4 > +.byte 69,15,56,203,254 > +.byte 69,15,56,205,195 > + pshufd $0x0e,%xmm1,%xmm0 > + paddd %xmm3,%xmm5 > + movdqa %xmm8,%xmm3 > +.byte 102,65,15,58,15,219,4 > +.byte 15,56,204,247 > +.byte 69,15,56,203,229 > + pshufd $0x0e,%xmm2,%xmm0 > + movdqa 128-128(%rbp),%xmm1 > + paddd %xmm4,%xmm1 > +.byte 69,15,56,203,247 > +.byte 69,15,56,204,211 > + movdqa %xmm1,%xmm0 > + movdqa 128-128(%rbp),%xmm2 > + paddd %xmm3,%xmm9 > + paddd %xmm8,%xmm2 > +.byte 15,56,205,236 > +.byte 69,15,56,203,236 > + movdqa %xmm2,%xmm0 > + movdqa %xmm5,%xmm3 > +.byte 102,15,58,15,220,4 > +.byte 69,15,56,203,254 > +.byte 69,15,56,205,200 > + pshufd $0x0e,%xmm1,%xmm0 > + paddd %xmm3,%xmm6 > + movdqa %xmm9,%xmm3 > +.byte 102,65,15,58,15,216,4 > +.byte 15,56,204,252 > +.byte 69,15,56,203,229 > + pshufd $0x0e,%xmm2,%xmm0 > + movdqa 144-128(%rbp),%xmm1 > + paddd %xmm5,%xmm1 > +.byte 69,15,56,203,247 > +.byte 69,15,56,204,216 > + movdqa %xmm1,%xmm0 > + movdqa 144-128(%rbp),%xmm2 > + paddd %xmm3,%xmm10 > + paddd %xmm9,%xmm2 > +.byte 15,56,205,245 > +.byte 69,15,56,203,236 > + movdqa %xmm2,%xmm0 > + movdqa %xmm6,%xmm3 > +.byte 102,15,58,15,221,4 > +.byte 69,15,56,203,254 > +.byte 69,15,56,205,209 > + pshufd $0x0e,%xmm1,%xmm0 > + paddd %xmm3,%xmm7 > + movdqa %xmm10,%xmm3 > +.byte 102,65,15,58,15,217,4 > +.byte 15,56,204,229 > +.byte 69,15,56,203,229 > + pshufd $0x0e,%xmm2,%xmm0 > + movdqa 160-128(%rbp),%xmm1 > + paddd %xmm6,%xmm1 > +.byte 69,15,56,203,247 > +.byte 69,15,56,204,193 > + movdqa %xmm1,%xmm0 > + movdqa 160-128(%rbp),%xmm2 > + paddd %xmm3,%xmm11 > + paddd %xmm10,%xmm2 > +.byte 15,56,205,254 > +.byte 69,15,56,203,236 > + movdqa %xmm2,%xmm0 > + movdqa %xmm7,%xmm3 > +.byte 102,15,58,15,222,4 > +.byte 69,15,56,203,254 > +.byte 69,15,56,205,218 > + pshufd $0x0e,%xmm1,%xmm0 > + paddd %xmm3,%xmm4 > + movdqa %xmm11,%xmm3 > +.byte 102,65,15,58,15,218,4 > +.byte 15,56,204,238 > +.byte 69,15,56,203,229 > + pshufd $0x0e,%xmm2,%xmm0 > + movdqa 176-128(%rbp),%xmm1 > + paddd %xmm7,%xmm1 > +.byte 69,15,56,203,247 > +.byte 69,15,56,204,202 > + movdqa %xmm1,%xmm0 > + movdqa 176-128(%rbp),%xmm2 > + paddd %xmm3,%xmm8 > + paddd %xmm11,%xmm2 > +.byte 15,56,205,231 > +.byte 69,15,56,203,236 > + movdqa %xmm2,%xmm0 > + movdqa %xmm4,%xmm3 > +.byte 102,15,58,15,223,4 > +.byte 69,15,56,203,254 > +.byte 69,15,56,205,195 > + pshufd $0x0e,%xmm1,%xmm0 > + paddd %xmm3,%xmm5 > + movdqa %xmm8,%xmm3 > +.byte 102,65,15,58,15,219,4 > +.byte 15,56,204,247 > +.byte 69,15,56,203,229 > + pshufd $0x0e,%xmm2,%xmm0 > + movdqa 192-128(%rbp),%xmm1 > + paddd %xmm4,%xmm1 > +.byte 69,15,56,203,247 > +.byte 69,15,56,204,211 > + movdqa %xmm1,%xmm0 > + movdqa 192-128(%rbp),%xmm2 > + paddd %xmm3,%xmm9 > + paddd %xmm8,%xmm2 > +.byte 15,56,205,236 > +.byte 69,15,56,203,236 > + movdqa %xmm2,%xmm0 > + movdqa %xmm5,%xmm3 > +.byte 102,15,58,15,220,4 > +.byte 69,15,56,203,254 > +.byte 69,15,56,205,200 > + pshufd $0x0e,%xmm1,%xmm0 > + paddd %xmm3,%xmm6 > + movdqa %xmm9,%xmm3 > +.byte 102,65,15,58,15,216,4 > +.byte 15,56,204,252 > 
+.byte 69,15,56,203,229 > + pshufd $0x0e,%xmm2,%xmm0 > + movdqa 208-128(%rbp),%xmm1 > + paddd %xmm5,%xmm1 > +.byte 69,15,56,203,247 > +.byte 69,15,56,204,216 > + movdqa %xmm1,%xmm0 > + movdqa 208-128(%rbp),%xmm2 > + paddd %xmm3,%xmm10 > + paddd %xmm9,%xmm2 > +.byte 15,56,205,245 > +.byte 69,15,56,203,236 > + movdqa %xmm2,%xmm0 > + movdqa %xmm6,%xmm3 > +.byte 102,15,58,15,221,4 > +.byte 69,15,56,203,254 > +.byte 69,15,56,205,209 > + pshufd $0x0e,%xmm1,%xmm0 > + paddd %xmm3,%xmm7 > + movdqa %xmm10,%xmm3 > +.byte 102,65,15,58,15,217,4 > + nop > +.byte 69,15,56,203,229 > + pshufd $0x0e,%xmm2,%xmm0 > + movdqa 224-128(%rbp),%xmm1 > + paddd %xmm6,%xmm1 > +.byte 69,15,56,203,247 > + > + movdqa %xmm1,%xmm0 > + movdqa 224-128(%rbp),%xmm2 > + paddd %xmm3,%xmm11 > + paddd %xmm10,%xmm2 > +.byte 15,56,205,254 > + nop > +.byte 69,15,56,203,236 > + movdqa %xmm2,%xmm0 > + movl $1,%ecx > + pxor %xmm6,%xmm6 > +.byte 69,15,56,203,254 > +.byte 69,15,56,205,218 > + pshufd $0x0e,%xmm1,%xmm0 > + movdqa 240-128(%rbp),%xmm1 > + paddd %xmm7,%xmm1 > + movq (%rbx),%xmm7 > + nop > +.byte 69,15,56,203,229 > + pshufd $0x0e,%xmm2,%xmm0 > + movdqa 240-128(%rbp),%xmm2 > + paddd %xmm11,%xmm2 > +.byte 69,15,56,203,247 > + > + movdqa %xmm1,%xmm0 > + cmpl 0(%rbx),%ecx > + cmovgeq %rsp,%r8 > + cmpl 4(%rbx),%ecx > + cmovgeq %rsp,%r9 > + pshufd $0x00,%xmm7,%xmm9 > +.byte 69,15,56,203,236 > + movdqa %xmm2,%xmm0 > + pshufd $0x55,%xmm7,%xmm10 > + movdqa %xmm7,%xmm11 > +.byte 69,15,56,203,254 > + pshufd $0x0e,%xmm1,%xmm0 > + pcmpgtd %xmm6,%xmm9 > + pcmpgtd %xmm6,%xmm10 > +.byte 69,15,56,203,229 > + pshufd $0x0e,%xmm2,%xmm0 > + pcmpgtd %xmm6,%xmm11 > + movdqa K256_shaext-16(%rip),%xmm3 > +.byte 69,15,56,203,247 > + > + pand %xmm9,%xmm13 > + pand %xmm10,%xmm15 > + pand %xmm9,%xmm12 > + pand %xmm10,%xmm14 > + paddd %xmm7,%xmm11 > + > + paddd 80(%rsp),%xmm13 > + paddd 112(%rsp),%xmm15 > + paddd 64(%rsp),%xmm12 > + paddd 96(%rsp),%xmm14 > + > + movq %xmm11,(%rbx) > + decl %edx > + jnz .Loop_shaext > + > + movl 280(%rsp),%edx > + > + pshufd $27,%xmm12,%xmm12 > + pshufd $27,%xmm13,%xmm13 > + pshufd $27,%xmm14,%xmm14 > + pshufd $27,%xmm15,%xmm15 > + > + movdqa %xmm12,%xmm5 > + movdqa %xmm13,%xmm6 > + punpckldq %xmm14,%xmm12 > + punpckhdq %xmm14,%xmm5 > + punpckldq %xmm15,%xmm13 > + punpckhdq %xmm15,%xmm6 > + > + movq %xmm12,0-128(%rdi) > + psrldq $8,%xmm12 > + movq %xmm5,128-128(%rdi) > + psrldq $8,%xmm5 > + movq %xmm12,32-128(%rdi) > + movq %xmm5,160-128(%rdi) > + > + movq %xmm13,64-128(%rdi) > + psrldq $8,%xmm13 > + movq %xmm6,192-128(%rdi) > + psrldq $8,%xmm6 > + movq %xmm13,96-128(%rdi) > + movq %xmm6,224-128(%rdi) > + > + leaq 8(%rdi),%rdi > + leaq 32(%rsi),%rsi > + decl %edx > + jnz .Loop_grande_shaext > + > +.Ldone_shaext: > + > + movq -16(%rax),%rbp > +.cfi_restore %rbp > + movq -8(%rax),%rbx > +.cfi_restore %rbx > + leaq (%rax),%rsp > +.cfi_def_cfa_register %rsp > +.Lepilogue_shaext: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size sha256_multi_block_shaext,.-sha256_multi_block_shaext > +.align 256 > +K256: > +.long 1116352408,1116352408,1116352408,1116352408 > +.long 1116352408,1116352408,1116352408,1116352408 > +.long 1899447441,1899447441,1899447441,1899447441 > +.long 1899447441,1899447441,1899447441,1899447441 > +.long 3049323471,3049323471,3049323471,3049323471 > +.long 3049323471,3049323471,3049323471,3049323471 > +.long 3921009573,3921009573,3921009573,3921009573 > +.long 3921009573,3921009573,3921009573,3921009573 > +.long 961987163,961987163,961987163,961987163 > +.long 961987163,961987163,961987163,961987163 > +.long 
1508970993,1508970993,1508970993,1508970993 > +.long 1508970993,1508970993,1508970993,1508970993 > +.long 2453635748,2453635748,2453635748,2453635748 > +.long 2453635748,2453635748,2453635748,2453635748 > +.long 2870763221,2870763221,2870763221,2870763221 > +.long 2870763221,2870763221,2870763221,2870763221 > +.long 3624381080,3624381080,3624381080,3624381080 > +.long 3624381080,3624381080,3624381080,3624381080 > +.long 310598401,310598401,310598401,310598401 > +.long 310598401,310598401,310598401,310598401 > +.long 607225278,607225278,607225278,607225278 > +.long 607225278,607225278,607225278,607225278 > +.long 1426881987,1426881987,1426881987,1426881987 > +.long 1426881987,1426881987,1426881987,1426881987 > +.long 1925078388,1925078388,1925078388,1925078388 > +.long 1925078388,1925078388,1925078388,1925078388 > +.long 2162078206,2162078206,2162078206,2162078206 > +.long 2162078206,2162078206,2162078206,2162078206 > +.long 2614888103,2614888103,2614888103,2614888103 > +.long 2614888103,2614888103,2614888103,2614888103 > +.long 3248222580,3248222580,3248222580,3248222580 > +.long 3248222580,3248222580,3248222580,3248222580 > +.long 3835390401,3835390401,3835390401,3835390401 > +.long 3835390401,3835390401,3835390401,3835390401 > +.long 4022224774,4022224774,4022224774,4022224774 > +.long 4022224774,4022224774,4022224774,4022224774 > +.long 264347078,264347078,264347078,264347078 > +.long 264347078,264347078,264347078,264347078 > +.long 604807628,604807628,604807628,604807628 > +.long 604807628,604807628,604807628,604807628 > +.long 770255983,770255983,770255983,770255983 > +.long 770255983,770255983,770255983,770255983 > +.long 1249150122,1249150122,1249150122,1249150122 > +.long 1249150122,1249150122,1249150122,1249150122 > +.long 1555081692,1555081692,1555081692,1555081692 > +.long 1555081692,1555081692,1555081692,1555081692 > +.long 1996064986,1996064986,1996064986,1996064986 > +.long 1996064986,1996064986,1996064986,1996064986 > +.long 2554220882,2554220882,2554220882,2554220882 > +.long 2554220882,2554220882,2554220882,2554220882 > +.long 2821834349,2821834349,2821834349,2821834349 > +.long 2821834349,2821834349,2821834349,2821834349 > +.long 2952996808,2952996808,2952996808,2952996808 > +.long 2952996808,2952996808,2952996808,2952996808 > +.long 3210313671,3210313671,3210313671,3210313671 > +.long 3210313671,3210313671,3210313671,3210313671 > +.long 3336571891,3336571891,3336571891,3336571891 > +.long 3336571891,3336571891,3336571891,3336571891 > +.long 3584528711,3584528711,3584528711,3584528711 > +.long 3584528711,3584528711,3584528711,3584528711 > +.long 113926993,113926993,113926993,113926993 > +.long 113926993,113926993,113926993,113926993 > +.long 338241895,338241895,338241895,338241895 > +.long 338241895,338241895,338241895,338241895 > +.long 666307205,666307205,666307205,666307205 > +.long 666307205,666307205,666307205,666307205 > +.long 773529912,773529912,773529912,773529912 > +.long 773529912,773529912,773529912,773529912 > +.long 1294757372,1294757372,1294757372,1294757372 > +.long 1294757372,1294757372,1294757372,1294757372 > +.long 1396182291,1396182291,1396182291,1396182291 > +.long 1396182291,1396182291,1396182291,1396182291 > +.long 1695183700,1695183700,1695183700,1695183700 > +.long 1695183700,1695183700,1695183700,1695183700 > +.long 1986661051,1986661051,1986661051,1986661051 > +.long 1986661051,1986661051,1986661051,1986661051 > +.long 2177026350,2177026350,2177026350,2177026350 > +.long 2177026350,2177026350,2177026350,2177026350 > +.long 
2456956037,2456956037,2456956037,2456956037 > +.long 2456956037,2456956037,2456956037,2456956037 > +.long 2730485921,2730485921,2730485921,2730485921 > +.long 2730485921,2730485921,2730485921,2730485921 > +.long 2820302411,2820302411,2820302411,2820302411 > +.long 2820302411,2820302411,2820302411,2820302411 > +.long 3259730800,3259730800,3259730800,3259730800 > +.long 3259730800,3259730800,3259730800,3259730800 > +.long 3345764771,3345764771,3345764771,3345764771 > +.long 3345764771,3345764771,3345764771,3345764771 > +.long 3516065817,3516065817,3516065817,3516065817 > +.long 3516065817,3516065817,3516065817,3516065817 > +.long 3600352804,3600352804,3600352804,3600352804 > +.long 3600352804,3600352804,3600352804,3600352804 > +.long 4094571909,4094571909,4094571909,4094571909 > +.long 4094571909,4094571909,4094571909,4094571909 > +.long 275423344,275423344,275423344,275423344 > +.long 275423344,275423344,275423344,275423344 > +.long 430227734,430227734,430227734,430227734 > +.long 430227734,430227734,430227734,430227734 > +.long 506948616,506948616,506948616,506948616 > +.long 506948616,506948616,506948616,506948616 > +.long 659060556,659060556,659060556,659060556 > +.long 659060556,659060556,659060556,659060556 > +.long 883997877,883997877,883997877,883997877 > +.long 883997877,883997877,883997877,883997877 > +.long 958139571,958139571,958139571,958139571 > +.long 958139571,958139571,958139571,958139571 > +.long 1322822218,1322822218,1322822218,1322822218 > +.long 1322822218,1322822218,1322822218,1322822218 > +.long 1537002063,1537002063,1537002063,1537002063 > +.long 1537002063,1537002063,1537002063,1537002063 > +.long 1747873779,1747873779,1747873779,1747873779 > +.long 1747873779,1747873779,1747873779,1747873779 > +.long 1955562222,1955562222,1955562222,1955562222 > +.long 1955562222,1955562222,1955562222,1955562222 > +.long 2024104815,2024104815,2024104815,2024104815 > +.long 2024104815,2024104815,2024104815,2024104815 > +.long 2227730452,2227730452,2227730452,2227730452 > +.long 2227730452,2227730452,2227730452,2227730452 > +.long 2361852424,2361852424,2361852424,2361852424 > +.long 2361852424,2361852424,2361852424,2361852424 > +.long 2428436474,2428436474,2428436474,2428436474 > +.long 2428436474,2428436474,2428436474,2428436474 > +.long 2756734187,2756734187,2756734187,2756734187 > +.long 2756734187,2756734187,2756734187,2756734187 > +.long 3204031479,3204031479,3204031479,3204031479 > +.long 3204031479,3204031479,3204031479,3204031479 > +.long 3329325298,3329325298,3329325298,3329325298 > +.long 3329325298,3329325298,3329325298,3329325298 > +.Lpbswap: > +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > +K256_shaext: > +.long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 > +.long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 > +.long 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 > +.long 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 > +.long 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc > +.long 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da > +.long 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 > +.long 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 > +.long 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 > +.long 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 > +.long 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 > +.long 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 > +.long 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 > +.long 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 > +.long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 > +.long 
0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 > +.byte > 83,72,65,50,53,54,32,109,117,108,116,105,45,98,108,111,99,107,32,116,114,9 > 7,110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82, > 89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,1 > 10,115,115,108,46,111,114,103,62,0 > diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-x86_64.S > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-x86_64.S > new file mode 100644 > index 0000000000..a5d3cf5068 > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-x86_64.S > @@ -0,0 +1,3097 @@ > +# WARNING: do not edit! > +# Generated from openssl/crypto/sha/asm/sha512-x86_64.pl > +# > +# Copyright 2005-2020 The OpenSSL Project Authors. All Rights Reserved. > +# > +# Licensed under the OpenSSL license (the "License"). You may not use > +# this file except in compliance with the License. You can obtain a copy > +# in the file LICENSE in the source distribution or at > +# https://www.openssl.org/source/license.html > + > +.text > + > + > +.globl sha256_block_data_order > +.type sha256_block_data_order,@function > +.align 16 > +sha256_block_data_order: > +.cfi_startproc > + leaq OPENSSL_ia32cap_P(%rip),%r11 > + movl 0(%r11),%r9d > + movl 4(%r11),%r10d > + movl 8(%r11),%r11d > + testl $536870912,%r11d > + jnz _shaext_shortcut > + testl $512,%r10d > + jnz .Lssse3_shortcut > + movq %rsp,%rax > +.cfi_def_cfa_register %rax > + pushq %rbx > +.cfi_offset %rbx,-16 > + pushq %rbp > +.cfi_offset %rbp,-24 > + pushq %r12 > +.cfi_offset %r12,-32 > + pushq %r13 > +.cfi_offset %r13,-40 > + pushq %r14 > +.cfi_offset %r14,-48 > + pushq %r15 > +.cfi_offset %r15,-56 > + shlq $4,%rdx > + subq $64+32,%rsp > + leaq (%rsi,%rdx,4),%rdx > + andq $-64,%rsp > + movq %rdi,64+0(%rsp) > + movq %rsi,64+8(%rsp) > + movq %rdx,64+16(%rsp) > + movq %rax,88(%rsp) > +.cfi_escape 0x0f,0x06,0x77,0xd8,0x00,0x06,0x23,0x08 > +.Lprologue: > + > + movl 0(%rdi),%eax > + movl 4(%rdi),%ebx > + movl 8(%rdi),%ecx > + movl 12(%rdi),%edx > + movl 16(%rdi),%r8d > + movl 20(%rdi),%r9d > + movl 24(%rdi),%r10d > + movl 28(%rdi),%r11d > + jmp .Lloop > + > +.align 16 > +.Lloop: > + movl %ebx,%edi > + leaq K256(%rip),%rbp > + xorl %ecx,%edi > + movl 0(%rsi),%r12d > + movl %r8d,%r13d > + movl %eax,%r14d > + bswapl %r12d > + rorl $14,%r13d > + movl %r9d,%r15d > + > + xorl %r8d,%r13d > + rorl $9,%r14d > + xorl %r10d,%r15d > + > + movl %r12d,0(%rsp) > + xorl %eax,%r14d > + andl %r8d,%r15d > + > + rorl $5,%r13d > + addl %r11d,%r12d > + xorl %r10d,%r15d > + > + rorl $11,%r14d > + xorl %r8d,%r13d > + addl %r15d,%r12d > + > + movl %eax,%r15d > + addl (%rbp),%r12d > + xorl %eax,%r14d > + > + xorl %ebx,%r15d > + rorl $6,%r13d > + movl %ebx,%r11d > + > + andl %r15d,%edi > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %edi,%r11d > + addl %r12d,%edx > + addl %r12d,%r11d > + > + leaq 4(%rbp),%rbp > + addl %r14d,%r11d > + movl 4(%rsi),%r12d > + movl %edx,%r13d > + movl %r11d,%r14d > + bswapl %r12d > + rorl $14,%r13d > + movl %r8d,%edi > + > + xorl %edx,%r13d > + rorl $9,%r14d > + xorl %r9d,%edi > + > + movl %r12d,4(%rsp) > + xorl %r11d,%r14d > + andl %edx,%edi > + > + rorl $5,%r13d > + addl %r10d,%r12d > + xorl %r9d,%edi > + > + rorl $11,%r14d > + xorl %edx,%r13d > + addl %edi,%r12d > + > + movl %r11d,%edi > + addl (%rbp),%r12d > + xorl %r11d,%r14d > + > + xorl %eax,%edi > + rorl $6,%r13d > + movl %eax,%r10d > + > + andl %edi,%r15d > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %r15d,%r10d > + addl %r12d,%ecx > + addl 
%r12d,%r10d > + > + leaq 4(%rbp),%rbp > + addl %r14d,%r10d > + movl 8(%rsi),%r12d > + movl %ecx,%r13d > + movl %r10d,%r14d > + bswapl %r12d > + rorl $14,%r13d > + movl %edx,%r15d > + > + xorl %ecx,%r13d > + rorl $9,%r14d > + xorl %r8d,%r15d > + > + movl %r12d,8(%rsp) > + xorl %r10d,%r14d > + andl %ecx,%r15d > + > + rorl $5,%r13d > + addl %r9d,%r12d > + xorl %r8d,%r15d > + > + rorl $11,%r14d > + xorl %ecx,%r13d > + addl %r15d,%r12d > + > + movl %r10d,%r15d > + addl (%rbp),%r12d > + xorl %r10d,%r14d > + > + xorl %r11d,%r15d > + rorl $6,%r13d > + movl %r11d,%r9d > + > + andl %r15d,%edi > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %edi,%r9d > + addl %r12d,%ebx > + addl %r12d,%r9d > + > + leaq 4(%rbp),%rbp > + addl %r14d,%r9d > + movl 12(%rsi),%r12d > + movl %ebx,%r13d > + movl %r9d,%r14d > + bswapl %r12d > + rorl $14,%r13d > + movl %ecx,%edi > + > + xorl %ebx,%r13d > + rorl $9,%r14d > + xorl %edx,%edi > + > + movl %r12d,12(%rsp) > + xorl %r9d,%r14d > + andl %ebx,%edi > + > + rorl $5,%r13d > + addl %r8d,%r12d > + xorl %edx,%edi > + > + rorl $11,%r14d > + xorl %ebx,%r13d > + addl %edi,%r12d > + > + movl %r9d,%edi > + addl (%rbp),%r12d > + xorl %r9d,%r14d > + > + xorl %r10d,%edi > + rorl $6,%r13d > + movl %r10d,%r8d > + > + andl %edi,%r15d > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %r15d,%r8d > + addl %r12d,%eax > + addl %r12d,%r8d > + > + leaq 20(%rbp),%rbp > + addl %r14d,%r8d > + movl 16(%rsi),%r12d > + movl %eax,%r13d > + movl %r8d,%r14d > + bswapl %r12d > + rorl $14,%r13d > + movl %ebx,%r15d > + > + xorl %eax,%r13d > + rorl $9,%r14d > + xorl %ecx,%r15d > + > + movl %r12d,16(%rsp) > + xorl %r8d,%r14d > + andl %eax,%r15d > + > + rorl $5,%r13d > + addl %edx,%r12d > + xorl %ecx,%r15d > + > + rorl $11,%r14d > + xorl %eax,%r13d > + addl %r15d,%r12d > + > + movl %r8d,%r15d > + addl (%rbp),%r12d > + xorl %r8d,%r14d > + > + xorl %r9d,%r15d > + rorl $6,%r13d > + movl %r9d,%edx > + > + andl %r15d,%edi > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %edi,%edx > + addl %r12d,%r11d > + addl %r12d,%edx > + > + leaq 4(%rbp),%rbp > + addl %r14d,%edx > + movl 20(%rsi),%r12d > + movl %r11d,%r13d > + movl %edx,%r14d > + bswapl %r12d > + rorl $14,%r13d > + movl %eax,%edi > + > + xorl %r11d,%r13d > + rorl $9,%r14d > + xorl %ebx,%edi > + > + movl %r12d,20(%rsp) > + xorl %edx,%r14d > + andl %r11d,%edi > + > + rorl $5,%r13d > + addl %ecx,%r12d > + xorl %ebx,%edi > + > + rorl $11,%r14d > + xorl %r11d,%r13d > + addl %edi,%r12d > + > + movl %edx,%edi > + addl (%rbp),%r12d > + xorl %edx,%r14d > + > + xorl %r8d,%edi > + rorl $6,%r13d > + movl %r8d,%ecx > + > + andl %edi,%r15d > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %r15d,%ecx > + addl %r12d,%r10d > + addl %r12d,%ecx > + > + leaq 4(%rbp),%rbp > + addl %r14d,%ecx > + movl 24(%rsi),%r12d > + movl %r10d,%r13d > + movl %ecx,%r14d > + bswapl %r12d > + rorl $14,%r13d > + movl %r11d,%r15d > + > + xorl %r10d,%r13d > + rorl $9,%r14d > + xorl %eax,%r15d > + > + movl %r12d,24(%rsp) > + xorl %ecx,%r14d > + andl %r10d,%r15d > + > + rorl $5,%r13d > + addl %ebx,%r12d > + xorl %eax,%r15d > + > + rorl $11,%r14d > + xorl %r10d,%r13d > + addl %r15d,%r12d > + > + movl %ecx,%r15d > + addl (%rbp),%r12d > + xorl %ecx,%r14d > + > + xorl %edx,%r15d > + rorl $6,%r13d > + movl %edx,%ebx > + > + andl %r15d,%edi > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %edi,%ebx > + addl %r12d,%r9d > + addl %r12d,%ebx > + > + leaq 4(%rbp),%rbp > + addl %r14d,%ebx > + movl 28(%rsi),%r12d > + movl %r9d,%r13d > + movl %ebx,%r14d > + bswapl %r12d > + rorl $14,%r13d > + movl 
%r10d,%edi > + > + xorl %r9d,%r13d > + rorl $9,%r14d > + xorl %r11d,%edi > + > + movl %r12d,28(%rsp) > + xorl %ebx,%r14d > + andl %r9d,%edi > + > + rorl $5,%r13d > + addl %eax,%r12d > + xorl %r11d,%edi > + > + rorl $11,%r14d > + xorl %r9d,%r13d > + addl %edi,%r12d > + > + movl %ebx,%edi > + addl (%rbp),%r12d > + xorl %ebx,%r14d > + > + xorl %ecx,%edi > + rorl $6,%r13d > + movl %ecx,%eax > + > + andl %edi,%r15d > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %r15d,%eax > + addl %r12d,%r8d > + addl %r12d,%eax > + > + leaq 20(%rbp),%rbp > + addl %r14d,%eax > + movl 32(%rsi),%r12d > + movl %r8d,%r13d > + movl %eax,%r14d > + bswapl %r12d > + rorl $14,%r13d > + movl %r9d,%r15d > + > + xorl %r8d,%r13d > + rorl $9,%r14d > + xorl %r10d,%r15d > + > + movl %r12d,32(%rsp) > + xorl %eax,%r14d > + andl %r8d,%r15d > + > + rorl $5,%r13d > + addl %r11d,%r12d > + xorl %r10d,%r15d > + > + rorl $11,%r14d > + xorl %r8d,%r13d > + addl %r15d,%r12d > + > + movl %eax,%r15d > + addl (%rbp),%r12d > + xorl %eax,%r14d > + > + xorl %ebx,%r15d > + rorl $6,%r13d > + movl %ebx,%r11d > + > + andl %r15d,%edi > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %edi,%r11d > + addl %r12d,%edx > + addl %r12d,%r11d > + > + leaq 4(%rbp),%rbp > + addl %r14d,%r11d > + movl 36(%rsi),%r12d > + movl %edx,%r13d > + movl %r11d,%r14d > + bswapl %r12d > + rorl $14,%r13d > + movl %r8d,%edi > + > + xorl %edx,%r13d > + rorl $9,%r14d > + xorl %r9d,%edi > + > + movl %r12d,36(%rsp) > + xorl %r11d,%r14d > + andl %edx,%edi > + > + rorl $5,%r13d > + addl %r10d,%r12d > + xorl %r9d,%edi > + > + rorl $11,%r14d > + xorl %edx,%r13d > + addl %edi,%r12d > + > + movl %r11d,%edi > + addl (%rbp),%r12d > + xorl %r11d,%r14d > + > + xorl %eax,%edi > + rorl $6,%r13d > + movl %eax,%r10d > + > + andl %edi,%r15d > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %r15d,%r10d > + addl %r12d,%ecx > + addl %r12d,%r10d > + > + leaq 4(%rbp),%rbp > + addl %r14d,%r10d > + movl 40(%rsi),%r12d > + movl %ecx,%r13d > + movl %r10d,%r14d > + bswapl %r12d > + rorl $14,%r13d > + movl %edx,%r15d > + > + xorl %ecx,%r13d > + rorl $9,%r14d > + xorl %r8d,%r15d > + > + movl %r12d,40(%rsp) > + xorl %r10d,%r14d > + andl %ecx,%r15d > + > + rorl $5,%r13d > + addl %r9d,%r12d > + xorl %r8d,%r15d > + > + rorl $11,%r14d > + xorl %ecx,%r13d > + addl %r15d,%r12d > + > + movl %r10d,%r15d > + addl (%rbp),%r12d > + xorl %r10d,%r14d > + > + xorl %r11d,%r15d > + rorl $6,%r13d > + movl %r11d,%r9d > + > + andl %r15d,%edi > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %edi,%r9d > + addl %r12d,%ebx > + addl %r12d,%r9d > + > + leaq 4(%rbp),%rbp > + addl %r14d,%r9d > + movl 44(%rsi),%r12d > + movl %ebx,%r13d > + movl %r9d,%r14d > + bswapl %r12d > + rorl $14,%r13d > + movl %ecx,%edi > + > + xorl %ebx,%r13d > + rorl $9,%r14d > + xorl %edx,%edi > + > + movl %r12d,44(%rsp) > + xorl %r9d,%r14d > + andl %ebx,%edi > + > + rorl $5,%r13d > + addl %r8d,%r12d > + xorl %edx,%edi > + > + rorl $11,%r14d > + xorl %ebx,%r13d > + addl %edi,%r12d > + > + movl %r9d,%edi > + addl (%rbp),%r12d > + xorl %r9d,%r14d > + > + xorl %r10d,%edi > + rorl $6,%r13d > + movl %r10d,%r8d > + > + andl %edi,%r15d > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %r15d,%r8d > + addl %r12d,%eax > + addl %r12d,%r8d > + > + leaq 20(%rbp),%rbp > + addl %r14d,%r8d > + movl 48(%rsi),%r12d > + movl %eax,%r13d > + movl %r8d,%r14d > + bswapl %r12d > + rorl $14,%r13d > + movl %ebx,%r15d > + > + xorl %eax,%r13d > + rorl $9,%r14d > + xorl %ecx,%r15d > + > + movl %r12d,48(%rsp) > + xorl %r8d,%r14d > + andl %eax,%r15d > + > + rorl $5,%r13d > + 
addl %edx,%r12d > + xorl %ecx,%r15d > + > + rorl $11,%r14d > + xorl %eax,%r13d > + addl %r15d,%r12d > + > + movl %r8d,%r15d > + addl (%rbp),%r12d > + xorl %r8d,%r14d > + > + xorl %r9d,%r15d > + rorl $6,%r13d > + movl %r9d,%edx > + > + andl %r15d,%edi > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %edi,%edx > + addl %r12d,%r11d > + addl %r12d,%edx > + > + leaq 4(%rbp),%rbp > + addl %r14d,%edx > + movl 52(%rsi),%r12d > + movl %r11d,%r13d > + movl %edx,%r14d > + bswapl %r12d > + rorl $14,%r13d > + movl %eax,%edi > + > + xorl %r11d,%r13d > + rorl $9,%r14d > + xorl %ebx,%edi > + > + movl %r12d,52(%rsp) > + xorl %edx,%r14d > + andl %r11d,%edi > + > + rorl $5,%r13d > + addl %ecx,%r12d > + xorl %ebx,%edi > + > + rorl $11,%r14d > + xorl %r11d,%r13d > + addl %edi,%r12d > + > + movl %edx,%edi > + addl (%rbp),%r12d > + xorl %edx,%r14d > + > + xorl %r8d,%edi > + rorl $6,%r13d > + movl %r8d,%ecx > + > + andl %edi,%r15d > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %r15d,%ecx > + addl %r12d,%r10d > + addl %r12d,%ecx > + > + leaq 4(%rbp),%rbp > + addl %r14d,%ecx > + movl 56(%rsi),%r12d > + movl %r10d,%r13d > + movl %ecx,%r14d > + bswapl %r12d > + rorl $14,%r13d > + movl %r11d,%r15d > + > + xorl %r10d,%r13d > + rorl $9,%r14d > + xorl %eax,%r15d > + > + movl %r12d,56(%rsp) > + xorl %ecx,%r14d > + andl %r10d,%r15d > + > + rorl $5,%r13d > + addl %ebx,%r12d > + xorl %eax,%r15d > + > + rorl $11,%r14d > + xorl %r10d,%r13d > + addl %r15d,%r12d > + > + movl %ecx,%r15d > + addl (%rbp),%r12d > + xorl %ecx,%r14d > + > + xorl %edx,%r15d > + rorl $6,%r13d > + movl %edx,%ebx > + > + andl %r15d,%edi > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %edi,%ebx > + addl %r12d,%r9d > + addl %r12d,%ebx > + > + leaq 4(%rbp),%rbp > + addl %r14d,%ebx > + movl 60(%rsi),%r12d > + movl %r9d,%r13d > + movl %ebx,%r14d > + bswapl %r12d > + rorl $14,%r13d > + movl %r10d,%edi > + > + xorl %r9d,%r13d > + rorl $9,%r14d > + xorl %r11d,%edi > + > + movl %r12d,60(%rsp) > + xorl %ebx,%r14d > + andl %r9d,%edi > + > + rorl $5,%r13d > + addl %eax,%r12d > + xorl %r11d,%edi > + > + rorl $11,%r14d > + xorl %r9d,%r13d > + addl %edi,%r12d > + > + movl %ebx,%edi > + addl (%rbp),%r12d > + xorl %ebx,%r14d > + > + xorl %ecx,%edi > + rorl $6,%r13d > + movl %ecx,%eax > + > + andl %edi,%r15d > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %r15d,%eax > + addl %r12d,%r8d > + addl %r12d,%eax > + > + leaq 20(%rbp),%rbp > + jmp .Lrounds_16_xx > +.align 16 > +.Lrounds_16_xx: > + movl 4(%rsp),%r13d > + movl 56(%rsp),%r15d > + > + movl %r13d,%r12d > + rorl $11,%r13d > + addl %r14d,%eax > + movl %r15d,%r14d > + rorl $2,%r15d > + > + xorl %r12d,%r13d > + shrl $3,%r12d > + rorl $7,%r13d > + xorl %r14d,%r15d > + shrl $10,%r14d > + > + rorl $17,%r15d > + xorl %r13d,%r12d > + xorl %r14d,%r15d > + addl 36(%rsp),%r12d > + > + addl 0(%rsp),%r12d > + movl %r8d,%r13d > + addl %r15d,%r12d > + movl %eax,%r14d > + rorl $14,%r13d > + movl %r9d,%r15d > + > + xorl %r8d,%r13d > + rorl $9,%r14d > + xorl %r10d,%r15d > + > + movl %r12d,0(%rsp) > + xorl %eax,%r14d > + andl %r8d,%r15d > + > + rorl $5,%r13d > + addl %r11d,%r12d > + xorl %r10d,%r15d > + > + rorl $11,%r14d > + xorl %r8d,%r13d > + addl %r15d,%r12d > + > + movl %eax,%r15d > + addl (%rbp),%r12d > + xorl %eax,%r14d > + > + xorl %ebx,%r15d > + rorl $6,%r13d > + movl %ebx,%r11d > + > + andl %r15d,%edi > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %edi,%r11d > + addl %r12d,%edx > + addl %r12d,%r11d > + > + leaq 4(%rbp),%rbp > + movl 8(%rsp),%r13d > + movl 60(%rsp),%edi > + > + movl %r13d,%r12d > + rorl 
$11,%r13d > + addl %r14d,%r11d > + movl %edi,%r14d > + rorl $2,%edi > + > + xorl %r12d,%r13d > + shrl $3,%r12d > + rorl $7,%r13d > + xorl %r14d,%edi > + shrl $10,%r14d > + > + rorl $17,%edi > + xorl %r13d,%r12d > + xorl %r14d,%edi > + addl 40(%rsp),%r12d > + > + addl 4(%rsp),%r12d > + movl %edx,%r13d > + addl %edi,%r12d > + movl %r11d,%r14d > + rorl $14,%r13d > + movl %r8d,%edi > + > + xorl %edx,%r13d > + rorl $9,%r14d > + xorl %r9d,%edi > + > + movl %r12d,4(%rsp) > + xorl %r11d,%r14d > + andl %edx,%edi > + > + rorl $5,%r13d > + addl %r10d,%r12d > + xorl %r9d,%edi > + > + rorl $11,%r14d > + xorl %edx,%r13d > + addl %edi,%r12d > + > + movl %r11d,%edi > + addl (%rbp),%r12d > + xorl %r11d,%r14d > + > + xorl %eax,%edi > + rorl $6,%r13d > + movl %eax,%r10d > + > + andl %edi,%r15d > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %r15d,%r10d > + addl %r12d,%ecx > + addl %r12d,%r10d > + > + leaq 4(%rbp),%rbp > + movl 12(%rsp),%r13d > + movl 0(%rsp),%r15d > + > + movl %r13d,%r12d > + rorl $11,%r13d > + addl %r14d,%r10d > + movl %r15d,%r14d > + rorl $2,%r15d > + > + xorl %r12d,%r13d > + shrl $3,%r12d > + rorl $7,%r13d > + xorl %r14d,%r15d > + shrl $10,%r14d > + > + rorl $17,%r15d > + xorl %r13d,%r12d > + xorl %r14d,%r15d > + addl 44(%rsp),%r12d > + > + addl 8(%rsp),%r12d > + movl %ecx,%r13d > + addl %r15d,%r12d > + movl %r10d,%r14d > + rorl $14,%r13d > + movl %edx,%r15d > + > + xorl %ecx,%r13d > + rorl $9,%r14d > + xorl %r8d,%r15d > + > + movl %r12d,8(%rsp) > + xorl %r10d,%r14d > + andl %ecx,%r15d > + > + rorl $5,%r13d > + addl %r9d,%r12d > + xorl %r8d,%r15d > + > + rorl $11,%r14d > + xorl %ecx,%r13d > + addl %r15d,%r12d > + > + movl %r10d,%r15d > + addl (%rbp),%r12d > + xorl %r10d,%r14d > + > + xorl %r11d,%r15d > + rorl $6,%r13d > + movl %r11d,%r9d > + > + andl %r15d,%edi > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %edi,%r9d > + addl %r12d,%ebx > + addl %r12d,%r9d > + > + leaq 4(%rbp),%rbp > + movl 16(%rsp),%r13d > + movl 4(%rsp),%edi > + > + movl %r13d,%r12d > + rorl $11,%r13d > + addl %r14d,%r9d > + movl %edi,%r14d > + rorl $2,%edi > + > + xorl %r12d,%r13d > + shrl $3,%r12d > + rorl $7,%r13d > + xorl %r14d,%edi > + shrl $10,%r14d > + > + rorl $17,%edi > + xorl %r13d,%r12d > + xorl %r14d,%edi > + addl 48(%rsp),%r12d > + > + addl 12(%rsp),%r12d > + movl %ebx,%r13d > + addl %edi,%r12d > + movl %r9d,%r14d > + rorl $14,%r13d > + movl %ecx,%edi > + > + xorl %ebx,%r13d > + rorl $9,%r14d > + xorl %edx,%edi > + > + movl %r12d,12(%rsp) > + xorl %r9d,%r14d > + andl %ebx,%edi > + > + rorl $5,%r13d > + addl %r8d,%r12d > + xorl %edx,%edi > + > + rorl $11,%r14d > + xorl %ebx,%r13d > + addl %edi,%r12d > + > + movl %r9d,%edi > + addl (%rbp),%r12d > + xorl %r9d,%r14d > + > + xorl %r10d,%edi > + rorl $6,%r13d > + movl %r10d,%r8d > + > + andl %edi,%r15d > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %r15d,%r8d > + addl %r12d,%eax > + addl %r12d,%r8d > + > + leaq 20(%rbp),%rbp > + movl 20(%rsp),%r13d > + movl 8(%rsp),%r15d > + > + movl %r13d,%r12d > + rorl $11,%r13d > + addl %r14d,%r8d > + movl %r15d,%r14d > + rorl $2,%r15d > + > + xorl %r12d,%r13d > + shrl $3,%r12d > + rorl $7,%r13d > + xorl %r14d,%r15d > + shrl $10,%r14d > + > + rorl $17,%r15d > + xorl %r13d,%r12d > + xorl %r14d,%r15d > + addl 52(%rsp),%r12d > + > + addl 16(%rsp),%r12d > + movl %eax,%r13d > + addl %r15d,%r12d > + movl %r8d,%r14d > + rorl $14,%r13d > + movl %ebx,%r15d > + > + xorl %eax,%r13d > + rorl $9,%r14d > + xorl %ecx,%r15d > + > + movl %r12d,16(%rsp) > + xorl %r8d,%r14d > + andl %eax,%r15d > + > + rorl $5,%r13d > + addl 
%edx,%r12d > + xorl %ecx,%r15d > + > + rorl $11,%r14d > + xorl %eax,%r13d > + addl %r15d,%r12d > + > + movl %r8d,%r15d > + addl (%rbp),%r12d > + xorl %r8d,%r14d > + > + xorl %r9d,%r15d > + rorl $6,%r13d > + movl %r9d,%edx > + > + andl %r15d,%edi > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %edi,%edx > + addl %r12d,%r11d > + addl %r12d,%edx > + > + leaq 4(%rbp),%rbp > + movl 24(%rsp),%r13d > + movl 12(%rsp),%edi > + > + movl %r13d,%r12d > + rorl $11,%r13d > + addl %r14d,%edx > + movl %edi,%r14d > + rorl $2,%edi > + > + xorl %r12d,%r13d > + shrl $3,%r12d > + rorl $7,%r13d > + xorl %r14d,%edi > + shrl $10,%r14d > + > + rorl $17,%edi > + xorl %r13d,%r12d > + xorl %r14d,%edi > + addl 56(%rsp),%r12d > + > + addl 20(%rsp),%r12d > + movl %r11d,%r13d > + addl %edi,%r12d > + movl %edx,%r14d > + rorl $14,%r13d > + movl %eax,%edi > + > + xorl %r11d,%r13d > + rorl $9,%r14d > + xorl %ebx,%edi > + > + movl %r12d,20(%rsp) > + xorl %edx,%r14d > + andl %r11d,%edi > + > + rorl $5,%r13d > + addl %ecx,%r12d > + xorl %ebx,%edi > + > + rorl $11,%r14d > + xorl %r11d,%r13d > + addl %edi,%r12d > + > + movl %edx,%edi > + addl (%rbp),%r12d > + xorl %edx,%r14d > + > + xorl %r8d,%edi > + rorl $6,%r13d > + movl %r8d,%ecx > + > + andl %edi,%r15d > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %r15d,%ecx > + addl %r12d,%r10d > + addl %r12d,%ecx > + > + leaq 4(%rbp),%rbp > + movl 28(%rsp),%r13d > + movl 16(%rsp),%r15d > + > + movl %r13d,%r12d > + rorl $11,%r13d > + addl %r14d,%ecx > + movl %r15d,%r14d > + rorl $2,%r15d > + > + xorl %r12d,%r13d > + shrl $3,%r12d > + rorl $7,%r13d > + xorl %r14d,%r15d > + shrl $10,%r14d > + > + rorl $17,%r15d > + xorl %r13d,%r12d > + xorl %r14d,%r15d > + addl 60(%rsp),%r12d > + > + addl 24(%rsp),%r12d > + movl %r10d,%r13d > + addl %r15d,%r12d > + movl %ecx,%r14d > + rorl $14,%r13d > + movl %r11d,%r15d > + > + xorl %r10d,%r13d > + rorl $9,%r14d > + xorl %eax,%r15d > + > + movl %r12d,24(%rsp) > + xorl %ecx,%r14d > + andl %r10d,%r15d > + > + rorl $5,%r13d > + addl %ebx,%r12d > + xorl %eax,%r15d > + > + rorl $11,%r14d > + xorl %r10d,%r13d > + addl %r15d,%r12d > + > + movl %ecx,%r15d > + addl (%rbp),%r12d > + xorl %ecx,%r14d > + > + xorl %edx,%r15d > + rorl $6,%r13d > + movl %edx,%ebx > + > + andl %r15d,%edi > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %edi,%ebx > + addl %r12d,%r9d > + addl %r12d,%ebx > + > + leaq 4(%rbp),%rbp > + movl 32(%rsp),%r13d > + movl 20(%rsp),%edi > + > + movl %r13d,%r12d > + rorl $11,%r13d > + addl %r14d,%ebx > + movl %edi,%r14d > + rorl $2,%edi > + > + xorl %r12d,%r13d > + shrl $3,%r12d > + rorl $7,%r13d > + xorl %r14d,%edi > + shrl $10,%r14d > + > + rorl $17,%edi > + xorl %r13d,%r12d > + xorl %r14d,%edi > + addl 0(%rsp),%r12d > + > + addl 28(%rsp),%r12d > + movl %r9d,%r13d > + addl %edi,%r12d > + movl %ebx,%r14d > + rorl $14,%r13d > + movl %r10d,%edi > + > + xorl %r9d,%r13d > + rorl $9,%r14d > + xorl %r11d,%edi > + > + movl %r12d,28(%rsp) > + xorl %ebx,%r14d > + andl %r9d,%edi > + > + rorl $5,%r13d > + addl %eax,%r12d > + xorl %r11d,%edi > + > + rorl $11,%r14d > + xorl %r9d,%r13d > + addl %edi,%r12d > + > + movl %ebx,%edi > + addl (%rbp),%r12d > + xorl %ebx,%r14d > + > + xorl %ecx,%edi > + rorl $6,%r13d > + movl %ecx,%eax > + > + andl %edi,%r15d > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %r15d,%eax > + addl %r12d,%r8d > + addl %r12d,%eax > + > + leaq 20(%rbp),%rbp > + movl 36(%rsp),%r13d > + movl 24(%rsp),%r15d > + > + movl %r13d,%r12d > + rorl $11,%r13d > + addl %r14d,%eax > + movl %r15d,%r14d > + rorl $2,%r15d > + > + xorl %r12d,%r13d > + 
shrl $3,%r12d > + rorl $7,%r13d > + xorl %r14d,%r15d > + shrl $10,%r14d > + > + rorl $17,%r15d > + xorl %r13d,%r12d > + xorl %r14d,%r15d > + addl 4(%rsp),%r12d > + > + addl 32(%rsp),%r12d > + movl %r8d,%r13d > + addl %r15d,%r12d > + movl %eax,%r14d > + rorl $14,%r13d > + movl %r9d,%r15d > + > + xorl %r8d,%r13d > + rorl $9,%r14d > + xorl %r10d,%r15d > + > + movl %r12d,32(%rsp) > + xorl %eax,%r14d > + andl %r8d,%r15d > + > + rorl $5,%r13d > + addl %r11d,%r12d > + xorl %r10d,%r15d > + > + rorl $11,%r14d > + xorl %r8d,%r13d > + addl %r15d,%r12d > + > + movl %eax,%r15d > + addl (%rbp),%r12d > + xorl %eax,%r14d > + > + xorl %ebx,%r15d > + rorl $6,%r13d > + movl %ebx,%r11d > + > + andl %r15d,%edi > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %edi,%r11d > + addl %r12d,%edx > + addl %r12d,%r11d > + > + leaq 4(%rbp),%rbp > + movl 40(%rsp),%r13d > + movl 28(%rsp),%edi > + > + movl %r13d,%r12d > + rorl $11,%r13d > + addl %r14d,%r11d > + movl %edi,%r14d > + rorl $2,%edi > + > + xorl %r12d,%r13d > + shrl $3,%r12d > + rorl $7,%r13d > + xorl %r14d,%edi > + shrl $10,%r14d > + > + rorl $17,%edi > + xorl %r13d,%r12d > + xorl %r14d,%edi > + addl 8(%rsp),%r12d > + > + addl 36(%rsp),%r12d > + movl %edx,%r13d > + addl %edi,%r12d > + movl %r11d,%r14d > + rorl $14,%r13d > + movl %r8d,%edi > + > + xorl %edx,%r13d > + rorl $9,%r14d > + xorl %r9d,%edi > + > + movl %r12d,36(%rsp) > + xorl %r11d,%r14d > + andl %edx,%edi > + > + rorl $5,%r13d > + addl %r10d,%r12d > + xorl %r9d,%edi > + > + rorl $11,%r14d > + xorl %edx,%r13d > + addl %edi,%r12d > + > + movl %r11d,%edi > + addl (%rbp),%r12d > + xorl %r11d,%r14d > + > + xorl %eax,%edi > + rorl $6,%r13d > + movl %eax,%r10d > + > + andl %edi,%r15d > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %r15d,%r10d > + addl %r12d,%ecx > + addl %r12d,%r10d > + > + leaq 4(%rbp),%rbp > + movl 44(%rsp),%r13d > + movl 32(%rsp),%r15d > + > + movl %r13d,%r12d > + rorl $11,%r13d > + addl %r14d,%r10d > + movl %r15d,%r14d > + rorl $2,%r15d > + > + xorl %r12d,%r13d > + shrl $3,%r12d > + rorl $7,%r13d > + xorl %r14d,%r15d > + shrl $10,%r14d > + > + rorl $17,%r15d > + xorl %r13d,%r12d > + xorl %r14d,%r15d > + addl 12(%rsp),%r12d > + > + addl 40(%rsp),%r12d > + movl %ecx,%r13d > + addl %r15d,%r12d > + movl %r10d,%r14d > + rorl $14,%r13d > + movl %edx,%r15d > + > + xorl %ecx,%r13d > + rorl $9,%r14d > + xorl %r8d,%r15d > + > + movl %r12d,40(%rsp) > + xorl %r10d,%r14d > + andl %ecx,%r15d > + > + rorl $5,%r13d > + addl %r9d,%r12d > + xorl %r8d,%r15d > + > + rorl $11,%r14d > + xorl %ecx,%r13d > + addl %r15d,%r12d > + > + movl %r10d,%r15d > + addl (%rbp),%r12d > + xorl %r10d,%r14d > + > + xorl %r11d,%r15d > + rorl $6,%r13d > + movl %r11d,%r9d > + > + andl %r15d,%edi > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %edi,%r9d > + addl %r12d,%ebx > + addl %r12d,%r9d > + > + leaq 4(%rbp),%rbp > + movl 48(%rsp),%r13d > + movl 36(%rsp),%edi > + > + movl %r13d,%r12d > + rorl $11,%r13d > + addl %r14d,%r9d > + movl %edi,%r14d > + rorl $2,%edi > + > + xorl %r12d,%r13d > + shrl $3,%r12d > + rorl $7,%r13d > + xorl %r14d,%edi > + shrl $10,%r14d > + > + rorl $17,%edi > + xorl %r13d,%r12d > + xorl %r14d,%edi > + addl 16(%rsp),%r12d > + > + addl 44(%rsp),%r12d > + movl %ebx,%r13d > + addl %edi,%r12d > + movl %r9d,%r14d > + rorl $14,%r13d > + movl %ecx,%edi > + > + xorl %ebx,%r13d > + rorl $9,%r14d > + xorl %edx,%edi > + > + movl %r12d,44(%rsp) > + xorl %r9d,%r14d > + andl %ebx,%edi > + > + rorl $5,%r13d > + addl %r8d,%r12d > + xorl %edx,%edi > + > + rorl $11,%r14d > + xorl %ebx,%r13d > + addl %edi,%r12d > 
+ > + movl %r9d,%edi > + addl (%rbp),%r12d > + xorl %r9d,%r14d > + > + xorl %r10d,%edi > + rorl $6,%r13d > + movl %r10d,%r8d > + > + andl %edi,%r15d > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %r15d,%r8d > + addl %r12d,%eax > + addl %r12d,%r8d > + > + leaq 20(%rbp),%rbp > + movl 52(%rsp),%r13d > + movl 40(%rsp),%r15d > + > + movl %r13d,%r12d > + rorl $11,%r13d > + addl %r14d,%r8d > + movl %r15d,%r14d > + rorl $2,%r15d > + > + xorl %r12d,%r13d > + shrl $3,%r12d > + rorl $7,%r13d > + xorl %r14d,%r15d > + shrl $10,%r14d > + > + rorl $17,%r15d > + xorl %r13d,%r12d > + xorl %r14d,%r15d > + addl 20(%rsp),%r12d > + > + addl 48(%rsp),%r12d > + movl %eax,%r13d > + addl %r15d,%r12d > + movl %r8d,%r14d > + rorl $14,%r13d > + movl %ebx,%r15d > + > + xorl %eax,%r13d > + rorl $9,%r14d > + xorl %ecx,%r15d > + > + movl %r12d,48(%rsp) > + xorl %r8d,%r14d > + andl %eax,%r15d > + > + rorl $5,%r13d > + addl %edx,%r12d > + xorl %ecx,%r15d > + > + rorl $11,%r14d > + xorl %eax,%r13d > + addl %r15d,%r12d > + > + movl %r8d,%r15d > + addl (%rbp),%r12d > + xorl %r8d,%r14d > + > + xorl %r9d,%r15d > + rorl $6,%r13d > + movl %r9d,%edx > + > + andl %r15d,%edi > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %edi,%edx > + addl %r12d,%r11d > + addl %r12d,%edx > + > + leaq 4(%rbp),%rbp > + movl 56(%rsp),%r13d > + movl 44(%rsp),%edi > + > + movl %r13d,%r12d > + rorl $11,%r13d > + addl %r14d,%edx > + movl %edi,%r14d > + rorl $2,%edi > + > + xorl %r12d,%r13d > + shrl $3,%r12d > + rorl $7,%r13d > + xorl %r14d,%edi > + shrl $10,%r14d > + > + rorl $17,%edi > + xorl %r13d,%r12d > + xorl %r14d,%edi > + addl 24(%rsp),%r12d > + > + addl 52(%rsp),%r12d > + movl %r11d,%r13d > + addl %edi,%r12d > + movl %edx,%r14d > + rorl $14,%r13d > + movl %eax,%edi > + > + xorl %r11d,%r13d > + rorl $9,%r14d > + xorl %ebx,%edi > + > + movl %r12d,52(%rsp) > + xorl %edx,%r14d > + andl %r11d,%edi > + > + rorl $5,%r13d > + addl %ecx,%r12d > + xorl %ebx,%edi > + > + rorl $11,%r14d > + xorl %r11d,%r13d > + addl %edi,%r12d > + > + movl %edx,%edi > + addl (%rbp),%r12d > + xorl %edx,%r14d > + > + xorl %r8d,%edi > + rorl $6,%r13d > + movl %r8d,%ecx > + > + andl %edi,%r15d > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %r15d,%ecx > + addl %r12d,%r10d > + addl %r12d,%ecx > + > + leaq 4(%rbp),%rbp > + movl 60(%rsp),%r13d > + movl 48(%rsp),%r15d > + > + movl %r13d,%r12d > + rorl $11,%r13d > + addl %r14d,%ecx > + movl %r15d,%r14d > + rorl $2,%r15d > + > + xorl %r12d,%r13d > + shrl $3,%r12d > + rorl $7,%r13d > + xorl %r14d,%r15d > + shrl $10,%r14d > + > + rorl $17,%r15d > + xorl %r13d,%r12d > + xorl %r14d,%r15d > + addl 28(%rsp),%r12d > + > + addl 56(%rsp),%r12d > + movl %r10d,%r13d > + addl %r15d,%r12d > + movl %ecx,%r14d > + rorl $14,%r13d > + movl %r11d,%r15d > + > + xorl %r10d,%r13d > + rorl $9,%r14d > + xorl %eax,%r15d > + > + movl %r12d,56(%rsp) > + xorl %ecx,%r14d > + andl %r10d,%r15d > + > + rorl $5,%r13d > + addl %ebx,%r12d > + xorl %eax,%r15d > + > + rorl $11,%r14d > + xorl %r10d,%r13d > + addl %r15d,%r12d > + > + movl %ecx,%r15d > + addl (%rbp),%r12d > + xorl %ecx,%r14d > + > + xorl %edx,%r15d > + rorl $6,%r13d > + movl %edx,%ebx > + > + andl %r15d,%edi > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %edi,%ebx > + addl %r12d,%r9d > + addl %r12d,%ebx > + > + leaq 4(%rbp),%rbp > + movl 0(%rsp),%r13d > + movl 52(%rsp),%edi > + > + movl %r13d,%r12d > + rorl $11,%r13d > + addl %r14d,%ebx > + movl %edi,%r14d > + rorl $2,%edi > + > + xorl %r12d,%r13d > + shrl $3,%r12d > + rorl $7,%r13d > + xorl %r14d,%edi > + shrl $10,%r14d > + > + rorl 
$17,%edi > + xorl %r13d,%r12d > + xorl %r14d,%edi > + addl 32(%rsp),%r12d > + > + addl 60(%rsp),%r12d > + movl %r9d,%r13d > + addl %edi,%r12d > + movl %ebx,%r14d > + rorl $14,%r13d > + movl %r10d,%edi > + > + xorl %r9d,%r13d > + rorl $9,%r14d > + xorl %r11d,%edi > + > + movl %r12d,60(%rsp) > + xorl %ebx,%r14d > + andl %r9d,%edi > + > + rorl $5,%r13d > + addl %eax,%r12d > + xorl %r11d,%edi > + > + rorl $11,%r14d > + xorl %r9d,%r13d > + addl %edi,%r12d > + > + movl %ebx,%edi > + addl (%rbp),%r12d > + xorl %ebx,%r14d > + > + xorl %ecx,%edi > + rorl $6,%r13d > + movl %ecx,%eax > + > + andl %edi,%r15d > + rorl $2,%r14d > + addl %r13d,%r12d > + > + xorl %r15d,%eax > + addl %r12d,%r8d > + addl %r12d,%eax > + > + leaq 20(%rbp),%rbp > + cmpb $0,3(%rbp) > + jnz .Lrounds_16_xx > + > + movq 64+0(%rsp),%rdi > + addl %r14d,%eax > + leaq 64(%rsi),%rsi > + > + addl 0(%rdi),%eax > + addl 4(%rdi),%ebx > + addl 8(%rdi),%ecx > + addl 12(%rdi),%edx > + addl 16(%rdi),%r8d > + addl 20(%rdi),%r9d > + addl 24(%rdi),%r10d > + addl 28(%rdi),%r11d > + > + cmpq 64+16(%rsp),%rsi > + > + movl %eax,0(%rdi) > + movl %ebx,4(%rdi) > + movl %ecx,8(%rdi) > + movl %edx,12(%rdi) > + movl %r8d,16(%rdi) > + movl %r9d,20(%rdi) > + movl %r10d,24(%rdi) > + movl %r11d,28(%rdi) > + jb .Lloop > + > + movq 88(%rsp),%rsi > +.cfi_def_cfa %rsi,8 > + movq -48(%rsi),%r15 > +.cfi_restore %r15 > + movq -40(%rsi),%r14 > +.cfi_restore %r14 > + movq -32(%rsi),%r13 > +.cfi_restore %r13 > + movq -24(%rsi),%r12 > +.cfi_restore %r12 > + movq -16(%rsi),%rbp > +.cfi_restore %rbp > + movq -8(%rsi),%rbx > +.cfi_restore %rbx > + leaq (%rsi),%rsp > +.cfi_def_cfa_register %rsp > +.Lepilogue: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size sha256_block_data_order,.-sha256_block_data_order > +.align 64 > +.type K256,@object > +K256: > +.long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 > +.long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 > +.long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 > +.long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 > +.long 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 > +.long 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 > +.long 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 > +.long 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 > +.long 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc > +.long 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc > +.long 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da > +.long 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da > +.long 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 > +.long 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 > +.long 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 > +.long 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 > +.long 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 > +.long 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 > +.long 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 > +.long 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 > +.long 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 > +.long 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 > +.long 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 > +.long 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 > +.long 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 > +.long 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 > +.long 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 > +.long 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 > +.long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 > +.long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 > +.long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 > +.long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 > + > +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > +.long 
0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > +.long 0x03020100,0x0b0a0908,0xffffffff,0xffffffff > +.long 0x03020100,0x0b0a0908,0xffffffff,0xffffffff > +.long 0xffffffff,0xffffffff,0x03020100,0x0b0a0908 > +.long 0xffffffff,0xffffffff,0x03020100,0x0b0a0908 > +.byte > 83,72,65,50,53,54,32,98,108,111,99,107,32,116,114,97,110,115,102,111,114,1 > 09,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83, > 32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,1 > 14,103,62,0 > +.type sha256_block_data_order_shaext,@function > +.align 64 > +sha256_block_data_order_shaext: > +_shaext_shortcut: > +.cfi_startproc > + leaq K256+128(%rip),%rcx > + movdqu (%rdi),%xmm1 > + movdqu 16(%rdi),%xmm2 > + movdqa 512-128(%rcx),%xmm7 > + > + pshufd $0x1b,%xmm1,%xmm0 > + pshufd $0xb1,%xmm1,%xmm1 > + pshufd $0x1b,%xmm2,%xmm2 > + movdqa %xmm7,%xmm8 > +.byte 102,15,58,15,202,8 > + punpcklqdq %xmm0,%xmm2 > + jmp .Loop_shaext > + > +.align 16 > +.Loop_shaext: > + movdqu (%rsi),%xmm3 > + movdqu 16(%rsi),%xmm4 > + movdqu 32(%rsi),%xmm5 > +.byte 102,15,56,0,223 > + movdqu 48(%rsi),%xmm6 > + > + movdqa 0-128(%rcx),%xmm0 > + paddd %xmm3,%xmm0 > +.byte 102,15,56,0,231 > + movdqa %xmm2,%xmm10 > +.byte 15,56,203,209 > + pshufd $0x0e,%xmm0,%xmm0 > + nop > + movdqa %xmm1,%xmm9 > +.byte 15,56,203,202 > + > + movdqa 32-128(%rcx),%xmm0 > + paddd %xmm4,%xmm0 > +.byte 102,15,56,0,239 > +.byte 15,56,203,209 > + pshufd $0x0e,%xmm0,%xmm0 > + leaq 64(%rsi),%rsi > +.byte 15,56,204,220 > +.byte 15,56,203,202 > + > + movdqa 64-128(%rcx),%xmm0 > + paddd %xmm5,%xmm0 > +.byte 102,15,56,0,247 > +.byte 15,56,203,209 > + pshufd $0x0e,%xmm0,%xmm0 > + movdqa %xmm6,%xmm7 > +.byte 102,15,58,15,253,4 > + nop > + paddd %xmm7,%xmm3 > +.byte 15,56,204,229 > +.byte 15,56,203,202 > + > + movdqa 96-128(%rcx),%xmm0 > + paddd %xmm6,%xmm0 > +.byte 15,56,205,222 > +.byte 15,56,203,209 > + pshufd $0x0e,%xmm0,%xmm0 > + movdqa %xmm3,%xmm7 > +.byte 102,15,58,15,254,4 > + nop > + paddd %xmm7,%xmm4 > +.byte 15,56,204,238 > +.byte 15,56,203,202 > + movdqa 128-128(%rcx),%xmm0 > + paddd %xmm3,%xmm0 > +.byte 15,56,205,227 > +.byte 15,56,203,209 > + pshufd $0x0e,%xmm0,%xmm0 > + movdqa %xmm4,%xmm7 > +.byte 102,15,58,15,251,4 > + nop > + paddd %xmm7,%xmm5 > +.byte 15,56,204,243 > +.byte 15,56,203,202 > + movdqa 160-128(%rcx),%xmm0 > + paddd %xmm4,%xmm0 > +.byte 15,56,205,236 > +.byte 15,56,203,209 > + pshufd $0x0e,%xmm0,%xmm0 > + movdqa %xmm5,%xmm7 > +.byte 102,15,58,15,252,4 > + nop > + paddd %xmm7,%xmm6 > +.byte 15,56,204,220 > +.byte 15,56,203,202 > + movdqa 192-128(%rcx),%xmm0 > + paddd %xmm5,%xmm0 > +.byte 15,56,205,245 > +.byte 15,56,203,209 > + pshufd $0x0e,%xmm0,%xmm0 > + movdqa %xmm6,%xmm7 > +.byte 102,15,58,15,253,4 > + nop > + paddd %xmm7,%xmm3 > +.byte 15,56,204,229 > +.byte 15,56,203,202 > + movdqa 224-128(%rcx),%xmm0 > + paddd %xmm6,%xmm0 > +.byte 15,56,205,222 > +.byte 15,56,203,209 > + pshufd $0x0e,%xmm0,%xmm0 > + movdqa %xmm3,%xmm7 > +.byte 102,15,58,15,254,4 > + nop > + paddd %xmm7,%xmm4 > +.byte 15,56,204,238 > +.byte 15,56,203,202 > + movdqa 256-128(%rcx),%xmm0 > + paddd %xmm3,%xmm0 > +.byte 15,56,205,227 > +.byte 15,56,203,209 > + pshufd $0x0e,%xmm0,%xmm0 > + movdqa %xmm4,%xmm7 > +.byte 102,15,58,15,251,4 > + nop > + paddd %xmm7,%xmm5 > +.byte 15,56,204,243 > +.byte 15,56,203,202 > + movdqa 288-128(%rcx),%xmm0 > + paddd %xmm4,%xmm0 > +.byte 15,56,205,236 > +.byte 15,56,203,209 > + pshufd $0x0e,%xmm0,%xmm0 > + movdqa %xmm5,%xmm7 > +.byte 102,15,58,15,252,4 > + nop > + paddd %xmm7,%xmm6 > +.byte 15,56,204,220 > 
+.byte 15,56,203,202 > + movdqa 320-128(%rcx),%xmm0 > + paddd %xmm5,%xmm0 > +.byte 15,56,205,245 > +.byte 15,56,203,209 > + pshufd $0x0e,%xmm0,%xmm0 > + movdqa %xmm6,%xmm7 > +.byte 102,15,58,15,253,4 > + nop > + paddd %xmm7,%xmm3 > +.byte 15,56,204,229 > +.byte 15,56,203,202 > + movdqa 352-128(%rcx),%xmm0 > + paddd %xmm6,%xmm0 > +.byte 15,56,205,222 > +.byte 15,56,203,209 > + pshufd $0x0e,%xmm0,%xmm0 > + movdqa %xmm3,%xmm7 > +.byte 102,15,58,15,254,4 > + nop > + paddd %xmm7,%xmm4 > +.byte 15,56,204,238 > +.byte 15,56,203,202 > + movdqa 384-128(%rcx),%xmm0 > + paddd %xmm3,%xmm0 > +.byte 15,56,205,227 > +.byte 15,56,203,209 > + pshufd $0x0e,%xmm0,%xmm0 > + movdqa %xmm4,%xmm7 > +.byte 102,15,58,15,251,4 > + nop > + paddd %xmm7,%xmm5 > +.byte 15,56,204,243 > +.byte 15,56,203,202 > + movdqa 416-128(%rcx),%xmm0 > + paddd %xmm4,%xmm0 > +.byte 15,56,205,236 > +.byte 15,56,203,209 > + pshufd $0x0e,%xmm0,%xmm0 > + movdqa %xmm5,%xmm7 > +.byte 102,15,58,15,252,4 > +.byte 15,56,203,202 > + paddd %xmm7,%xmm6 > + > + movdqa 448-128(%rcx),%xmm0 > + paddd %xmm5,%xmm0 > +.byte 15,56,203,209 > + pshufd $0x0e,%xmm0,%xmm0 > +.byte 15,56,205,245 > + movdqa %xmm8,%xmm7 > +.byte 15,56,203,202 > + > + movdqa 480-128(%rcx),%xmm0 > + paddd %xmm6,%xmm0 > + nop > +.byte 15,56,203,209 > + pshufd $0x0e,%xmm0,%xmm0 > + decq %rdx > + nop > +.byte 15,56,203,202 > + > + paddd %xmm10,%xmm2 > + paddd %xmm9,%xmm1 > + jnz .Loop_shaext > + > + pshufd $0xb1,%xmm2,%xmm2 > + pshufd $0x1b,%xmm1,%xmm7 > + pshufd $0xb1,%xmm1,%xmm1 > + punpckhqdq %xmm2,%xmm1 > +.byte 102,15,58,15,215,8 > + > + movdqu %xmm1,(%rdi) > + movdqu %xmm2,16(%rdi) > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size sha256_block_data_order_shaext,.-sha256_block_data_order_shaext > +.type sha256_block_data_order_ssse3,@function > +.align 64 > +sha256_block_data_order_ssse3: > +.cfi_startproc > +.Lssse3_shortcut: > + movq %rsp,%rax > +.cfi_def_cfa_register %rax > + pushq %rbx > +.cfi_offset %rbx,-16 > + pushq %rbp > +.cfi_offset %rbp,-24 > + pushq %r12 > +.cfi_offset %r12,-32 > + pushq %r13 > +.cfi_offset %r13,-40 > + pushq %r14 > +.cfi_offset %r14,-48 > + pushq %r15 > +.cfi_offset %r15,-56 > + shlq $4,%rdx > + subq $96,%rsp > + leaq (%rsi,%rdx,4),%rdx > + andq $-64,%rsp > + movq %rdi,64+0(%rsp) > + movq %rsi,64+8(%rsp) > + movq %rdx,64+16(%rsp) > + movq %rax,88(%rsp) > +.cfi_escape 0x0f,0x06,0x77,0xd8,0x00,0x06,0x23,0x08 > +.Lprologue_ssse3: > + > + movl 0(%rdi),%eax > + movl 4(%rdi),%ebx > + movl 8(%rdi),%ecx > + movl 12(%rdi),%edx > + movl 16(%rdi),%r8d > + movl 20(%rdi),%r9d > + movl 24(%rdi),%r10d > + movl 28(%rdi),%r11d > + > + > + jmp .Lloop_ssse3 > +.align 16 > +.Lloop_ssse3: > + movdqa K256+512(%rip),%xmm7 > + movdqu 0(%rsi),%xmm0 > + movdqu 16(%rsi),%xmm1 > + movdqu 32(%rsi),%xmm2 > +.byte 102,15,56,0,199 > + movdqu 48(%rsi),%xmm3 > + leaq K256(%rip),%rbp > +.byte 102,15,56,0,207 > + movdqa 0(%rbp),%xmm4 > + movdqa 32(%rbp),%xmm5 > +.byte 102,15,56,0,215 > + paddd %xmm0,%xmm4 > + movdqa 64(%rbp),%xmm6 > +.byte 102,15,56,0,223 > + movdqa 96(%rbp),%xmm7 > + paddd %xmm1,%xmm5 > + paddd %xmm2,%xmm6 > + paddd %xmm3,%xmm7 > + movdqa %xmm4,0(%rsp) > + movl %eax,%r14d > + movdqa %xmm5,16(%rsp) > + movl %ebx,%edi > + movdqa %xmm6,32(%rsp) > + xorl %ecx,%edi > + movdqa %xmm7,48(%rsp) > + movl %r8d,%r13d > + jmp .Lssse3_00_47 > + > +.align 16 > +.Lssse3_00_47: > + subq $-128,%rbp > + rorl $14,%r13d > + movdqa %xmm1,%xmm4 > + movl %r14d,%eax > + movl %r9d,%r12d > + movdqa %xmm3,%xmm7 > + rorl $9,%r14d > + xorl %r8d,%r13d > + xorl %r10d,%r12d > + rorl $5,%r13d > + xorl 
%eax,%r14d > +.byte 102,15,58,15,224,4 > + andl %r8d,%r12d > + xorl %r8d,%r13d > +.byte 102,15,58,15,250,4 > + addl 0(%rsp),%r11d > + movl %eax,%r15d > + xorl %r10d,%r12d > + rorl $11,%r14d > + movdqa %xmm4,%xmm5 > + xorl %ebx,%r15d > + addl %r12d,%r11d > + movdqa %xmm4,%xmm6 > + rorl $6,%r13d > + andl %r15d,%edi > + psrld $3,%xmm4 > + xorl %eax,%r14d > + addl %r13d,%r11d > + xorl %ebx,%edi > + paddd %xmm7,%xmm0 > + rorl $2,%r14d > + addl %r11d,%edx > + psrld $7,%xmm6 > + addl %edi,%r11d > + movl %edx,%r13d > + pshufd $250,%xmm3,%xmm7 > + addl %r11d,%r14d > + rorl $14,%r13d > + pslld $14,%xmm5 > + movl %r14d,%r11d > + movl %r8d,%r12d > + pxor %xmm6,%xmm4 > + rorl $9,%r14d > + xorl %edx,%r13d > + xorl %r9d,%r12d > + rorl $5,%r13d > + psrld $11,%xmm6 > + xorl %r11d,%r14d > + pxor %xmm5,%xmm4 > + andl %edx,%r12d > + xorl %edx,%r13d > + pslld $11,%xmm5 > + addl 4(%rsp),%r10d > + movl %r11d,%edi > + pxor %xmm6,%xmm4 > + xorl %r9d,%r12d > + rorl $11,%r14d > + movdqa %xmm7,%xmm6 > + xorl %eax,%edi > + addl %r12d,%r10d > + pxor %xmm5,%xmm4 > + rorl $6,%r13d > + andl %edi,%r15d > + xorl %r11d,%r14d > + psrld $10,%xmm7 > + addl %r13d,%r10d > + xorl %eax,%r15d > + paddd %xmm4,%xmm0 > + rorl $2,%r14d > + addl %r10d,%ecx > + psrlq $17,%xmm6 > + addl %r15d,%r10d > + movl %ecx,%r13d > + addl %r10d,%r14d > + pxor %xmm6,%xmm7 > + rorl $14,%r13d > + movl %r14d,%r10d > + movl %edx,%r12d > + rorl $9,%r14d > + psrlq $2,%xmm6 > + xorl %ecx,%r13d > + xorl %r8d,%r12d > + pxor %xmm6,%xmm7 > + rorl $5,%r13d > + xorl %r10d,%r14d > + andl %ecx,%r12d > + pshufd $128,%xmm7,%xmm7 > + xorl %ecx,%r13d > + addl 8(%rsp),%r9d > + movl %r10d,%r15d > + psrldq $8,%xmm7 > + xorl %r8d,%r12d > + rorl $11,%r14d > + xorl %r11d,%r15d > + addl %r12d,%r9d > + rorl $6,%r13d > + paddd %xmm7,%xmm0 > + andl %r15d,%edi > + xorl %r10d,%r14d > + addl %r13d,%r9d > + pshufd $80,%xmm0,%xmm7 > + xorl %r11d,%edi > + rorl $2,%r14d > + addl %r9d,%ebx > + movdqa %xmm7,%xmm6 > + addl %edi,%r9d > + movl %ebx,%r13d > + psrld $10,%xmm7 > + addl %r9d,%r14d > + rorl $14,%r13d > + psrlq $17,%xmm6 > + movl %r14d,%r9d > + movl %ecx,%r12d > + pxor %xmm6,%xmm7 > + rorl $9,%r14d > + xorl %ebx,%r13d > + xorl %edx,%r12d > + rorl $5,%r13d > + xorl %r9d,%r14d > + psrlq $2,%xmm6 > + andl %ebx,%r12d > + xorl %ebx,%r13d > + addl 12(%rsp),%r8d > + pxor %xmm6,%xmm7 > + movl %r9d,%edi > + xorl %edx,%r12d > + rorl $11,%r14d > + pshufd $8,%xmm7,%xmm7 > + xorl %r10d,%edi > + addl %r12d,%r8d > + movdqa 0(%rbp),%xmm6 > + rorl $6,%r13d > + andl %edi,%r15d > + pslldq $8,%xmm7 > + xorl %r9d,%r14d > + addl %r13d,%r8d > + xorl %r10d,%r15d > + paddd %xmm7,%xmm0 > + rorl $2,%r14d > + addl %r8d,%eax > + addl %r15d,%r8d > + paddd %xmm0,%xmm6 > + movl %eax,%r13d > + addl %r8d,%r14d > + movdqa %xmm6,0(%rsp) > + rorl $14,%r13d > + movdqa %xmm2,%xmm4 > + movl %r14d,%r8d > + movl %ebx,%r12d > + movdqa %xmm0,%xmm7 > + rorl $9,%r14d > + xorl %eax,%r13d > + xorl %ecx,%r12d > + rorl $5,%r13d > + xorl %r8d,%r14d > +.byte 102,15,58,15,225,4 > + andl %eax,%r12d > + xorl %eax,%r13d > +.byte 102,15,58,15,251,4 > + addl 16(%rsp),%edx > + movl %r8d,%r15d > + xorl %ecx,%r12d > + rorl $11,%r14d > + movdqa %xmm4,%xmm5 > + xorl %r9d,%r15d > + addl %r12d,%edx > + movdqa %xmm4,%xmm6 > + rorl $6,%r13d > + andl %r15d,%edi > + psrld $3,%xmm4 > + xorl %r8d,%r14d > + addl %r13d,%edx > + xorl %r9d,%edi > + paddd %xmm7,%xmm1 > + rorl $2,%r14d > + addl %edx,%r11d > + psrld $7,%xmm6 > + addl %edi,%edx > + movl %r11d,%r13d > + pshufd $250,%xmm0,%xmm7 > + addl %edx,%r14d > + rorl $14,%r13d > + pslld $14,%xmm5 > + movl 
%r14d,%edx > + movl %eax,%r12d > + pxor %xmm6,%xmm4 > + rorl $9,%r14d > + xorl %r11d,%r13d > + xorl %ebx,%r12d > + rorl $5,%r13d > + psrld $11,%xmm6 > + xorl %edx,%r14d > + pxor %xmm5,%xmm4 > + andl %r11d,%r12d > + xorl %r11d,%r13d > + pslld $11,%xmm5 > + addl 20(%rsp),%ecx > + movl %edx,%edi > + pxor %xmm6,%xmm4 > + xorl %ebx,%r12d > + rorl $11,%r14d > + movdqa %xmm7,%xmm6 > + xorl %r8d,%edi > + addl %r12d,%ecx > + pxor %xmm5,%xmm4 > + rorl $6,%r13d > + andl %edi,%r15d > + xorl %edx,%r14d > + psrld $10,%xmm7 > + addl %r13d,%ecx > + xorl %r8d,%r15d > + paddd %xmm4,%xmm1 > + rorl $2,%r14d > + addl %ecx,%r10d > + psrlq $17,%xmm6 > + addl %r15d,%ecx > + movl %r10d,%r13d > + addl %ecx,%r14d > + pxor %xmm6,%xmm7 > + rorl $14,%r13d > + movl %r14d,%ecx > + movl %r11d,%r12d > + rorl $9,%r14d > + psrlq $2,%xmm6 > + xorl %r10d,%r13d > + xorl %eax,%r12d > + pxor %xmm6,%xmm7 > + rorl $5,%r13d > + xorl %ecx,%r14d > + andl %r10d,%r12d > + pshufd $128,%xmm7,%xmm7 > + xorl %r10d,%r13d > + addl 24(%rsp),%ebx > + movl %ecx,%r15d > + psrldq $8,%xmm7 > + xorl %eax,%r12d > + rorl $11,%r14d > + xorl %edx,%r15d > + addl %r12d,%ebx > + rorl $6,%r13d > + paddd %xmm7,%xmm1 > + andl %r15d,%edi > + xorl %ecx,%r14d > + addl %r13d,%ebx > + pshufd $80,%xmm1,%xmm7 > + xorl %edx,%edi > + rorl $2,%r14d > + addl %ebx,%r9d > + movdqa %xmm7,%xmm6 > + addl %edi,%ebx > + movl %r9d,%r13d > + psrld $10,%xmm7 > + addl %ebx,%r14d > + rorl $14,%r13d > + psrlq $17,%xmm6 > + movl %r14d,%ebx > + movl %r10d,%r12d > + pxor %xmm6,%xmm7 > + rorl $9,%r14d > + xorl %r9d,%r13d > + xorl %r11d,%r12d > + rorl $5,%r13d > + xorl %ebx,%r14d > + psrlq $2,%xmm6 > + andl %r9d,%r12d > + xorl %r9d,%r13d > + addl 28(%rsp),%eax > + pxor %xmm6,%xmm7 > + movl %ebx,%edi > + xorl %r11d,%r12d > + rorl $11,%r14d > + pshufd $8,%xmm7,%xmm7 > + xorl %ecx,%edi > + addl %r12d,%eax > + movdqa 32(%rbp),%xmm6 > + rorl $6,%r13d > + andl %edi,%r15d > + pslldq $8,%xmm7 > + xorl %ebx,%r14d > + addl %r13d,%eax > + xorl %ecx,%r15d > + paddd %xmm7,%xmm1 > + rorl $2,%r14d > + addl %eax,%r8d > + addl %r15d,%eax > + paddd %xmm1,%xmm6 > + movl %r8d,%r13d > + addl %eax,%r14d > + movdqa %xmm6,16(%rsp) > + rorl $14,%r13d > + movdqa %xmm3,%xmm4 > + movl %r14d,%eax > + movl %r9d,%r12d > + movdqa %xmm1,%xmm7 > + rorl $9,%r14d > + xorl %r8d,%r13d > + xorl %r10d,%r12d > + rorl $5,%r13d > + xorl %eax,%r14d > +.byte 102,15,58,15,226,4 > + andl %r8d,%r12d > + xorl %r8d,%r13d > +.byte 102,15,58,15,248,4 > + addl 32(%rsp),%r11d > + movl %eax,%r15d > + xorl %r10d,%r12d > + rorl $11,%r14d > + movdqa %xmm4,%xmm5 > + xorl %ebx,%r15d > + addl %r12d,%r11d > + movdqa %xmm4,%xmm6 > + rorl $6,%r13d > + andl %r15d,%edi > + psrld $3,%xmm4 > + xorl %eax,%r14d > + addl %r13d,%r11d > + xorl %ebx,%edi > + paddd %xmm7,%xmm2 > + rorl $2,%r14d > + addl %r11d,%edx > + psrld $7,%xmm6 > + addl %edi,%r11d > + movl %edx,%r13d > + pshufd $250,%xmm1,%xmm7 > + addl %r11d,%r14d > + rorl $14,%r13d > + pslld $14,%xmm5 > + movl %r14d,%r11d > + movl %r8d,%r12d > + pxor %xmm6,%xmm4 > + rorl $9,%r14d > + xorl %edx,%r13d > + xorl %r9d,%r12d > + rorl $5,%r13d > + psrld $11,%xmm6 > + xorl %r11d,%r14d > + pxor %xmm5,%xmm4 > + andl %edx,%r12d > + xorl %edx,%r13d > + pslld $11,%xmm5 > + addl 36(%rsp),%r10d > + movl %r11d,%edi > + pxor %xmm6,%xmm4 > + xorl %r9d,%r12d > + rorl $11,%r14d > + movdqa %xmm7,%xmm6 > + xorl %eax,%edi > + addl %r12d,%r10d > + pxor %xmm5,%xmm4 > + rorl $6,%r13d > + andl %edi,%r15d > + xorl %r11d,%r14d > + psrld $10,%xmm7 > + addl %r13d,%r10d > + xorl %eax,%r15d > + paddd %xmm4,%xmm2 > + rorl $2,%r14d > + 
addl %r10d,%ecx > + psrlq $17,%xmm6 > + addl %r15d,%r10d > + movl %ecx,%r13d > + addl %r10d,%r14d > + pxor %xmm6,%xmm7 > + rorl $14,%r13d > + movl %r14d,%r10d > + movl %edx,%r12d > + rorl $9,%r14d > + psrlq $2,%xmm6 > + xorl %ecx,%r13d > + xorl %r8d,%r12d > + pxor %xmm6,%xmm7 > + rorl $5,%r13d > + xorl %r10d,%r14d > + andl %ecx,%r12d > + pshufd $128,%xmm7,%xmm7 > + xorl %ecx,%r13d > + addl 40(%rsp),%r9d > + movl %r10d,%r15d > + psrldq $8,%xmm7 > + xorl %r8d,%r12d > + rorl $11,%r14d > + xorl %r11d,%r15d > + addl %r12d,%r9d > + rorl $6,%r13d > + paddd %xmm7,%xmm2 > + andl %r15d,%edi > + xorl %r10d,%r14d > + addl %r13d,%r9d > + pshufd $80,%xmm2,%xmm7 > + xorl %r11d,%edi > + rorl $2,%r14d > + addl %r9d,%ebx > + movdqa %xmm7,%xmm6 > + addl %edi,%r9d > + movl %ebx,%r13d > + psrld $10,%xmm7 > + addl %r9d,%r14d > + rorl $14,%r13d > + psrlq $17,%xmm6 > + movl %r14d,%r9d > + movl %ecx,%r12d > + pxor %xmm6,%xmm7 > + rorl $9,%r14d > + xorl %ebx,%r13d > + xorl %edx,%r12d > + rorl $5,%r13d > + xorl %r9d,%r14d > + psrlq $2,%xmm6 > + andl %ebx,%r12d > + xorl %ebx,%r13d > + addl 44(%rsp),%r8d > + pxor %xmm6,%xmm7 > + movl %r9d,%edi > + xorl %edx,%r12d > + rorl $11,%r14d > + pshufd $8,%xmm7,%xmm7 > + xorl %r10d,%edi > + addl %r12d,%r8d > + movdqa 64(%rbp),%xmm6 > + rorl $6,%r13d > + andl %edi,%r15d > + pslldq $8,%xmm7 > + xorl %r9d,%r14d > + addl %r13d,%r8d > + xorl %r10d,%r15d > + paddd %xmm7,%xmm2 > + rorl $2,%r14d > + addl %r8d,%eax > + addl %r15d,%r8d > + paddd %xmm2,%xmm6 > + movl %eax,%r13d > + addl %r8d,%r14d > + movdqa %xmm6,32(%rsp) > + rorl $14,%r13d > + movdqa %xmm0,%xmm4 > + movl %r14d,%r8d > + movl %ebx,%r12d > + movdqa %xmm2,%xmm7 > + rorl $9,%r14d > + xorl %eax,%r13d > + xorl %ecx,%r12d > + rorl $5,%r13d > + xorl %r8d,%r14d > +.byte 102,15,58,15,227,4 > + andl %eax,%r12d > + xorl %eax,%r13d > +.byte 102,15,58,15,249,4 > + addl 48(%rsp),%edx > + movl %r8d,%r15d > + xorl %ecx,%r12d > + rorl $11,%r14d > + movdqa %xmm4,%xmm5 > + xorl %r9d,%r15d > + addl %r12d,%edx > + movdqa %xmm4,%xmm6 > + rorl $6,%r13d > + andl %r15d,%edi > + psrld $3,%xmm4 > + xorl %r8d,%r14d > + addl %r13d,%edx > + xorl %r9d,%edi > + paddd %xmm7,%xmm3 > + rorl $2,%r14d > + addl %edx,%r11d > + psrld $7,%xmm6 > + addl %edi,%edx > + movl %r11d,%r13d > + pshufd $250,%xmm2,%xmm7 > + addl %edx,%r14d > + rorl $14,%r13d > + pslld $14,%xmm5 > + movl %r14d,%edx > + movl %eax,%r12d > + pxor %xmm6,%xmm4 > + rorl $9,%r14d > + xorl %r11d,%r13d > + xorl %ebx,%r12d > + rorl $5,%r13d > + psrld $11,%xmm6 > + xorl %edx,%r14d > + pxor %xmm5,%xmm4 > + andl %r11d,%r12d > + xorl %r11d,%r13d > + pslld $11,%xmm5 > + addl 52(%rsp),%ecx > + movl %edx,%edi > + pxor %xmm6,%xmm4 > + xorl %ebx,%r12d > + rorl $11,%r14d > + movdqa %xmm7,%xmm6 > + xorl %r8d,%edi > + addl %r12d,%ecx > + pxor %xmm5,%xmm4 > + rorl $6,%r13d > + andl %edi,%r15d > + xorl %edx,%r14d > + psrld $10,%xmm7 > + addl %r13d,%ecx > + xorl %r8d,%r15d > + paddd %xmm4,%xmm3 > + rorl $2,%r14d > + addl %ecx,%r10d > + psrlq $17,%xmm6 > + addl %r15d,%ecx > + movl %r10d,%r13d > + addl %ecx,%r14d > + pxor %xmm6,%xmm7 > + rorl $14,%r13d > + movl %r14d,%ecx > + movl %r11d,%r12d > + rorl $9,%r14d > + psrlq $2,%xmm6 > + xorl %r10d,%r13d > + xorl %eax,%r12d > + pxor %xmm6,%xmm7 > + rorl $5,%r13d > + xorl %ecx,%r14d > + andl %r10d,%r12d > + pshufd $128,%xmm7,%xmm7 > + xorl %r10d,%r13d > + addl 56(%rsp),%ebx > + movl %ecx,%r15d > + psrldq $8,%xmm7 > + xorl %eax,%r12d > + rorl $11,%r14d > + xorl %edx,%r15d > + addl %r12d,%ebx > + rorl $6,%r13d > + paddd %xmm7,%xmm3 > + andl %r15d,%edi > + xorl %ecx,%r14d > 
+ addl %r13d,%ebx > + pshufd $80,%xmm3,%xmm7 > + xorl %edx,%edi > + rorl $2,%r14d > + addl %ebx,%r9d > + movdqa %xmm7,%xmm6 > + addl %edi,%ebx > + movl %r9d,%r13d > + psrld $10,%xmm7 > + addl %ebx,%r14d > + rorl $14,%r13d > + psrlq $17,%xmm6 > + movl %r14d,%ebx > + movl %r10d,%r12d > + pxor %xmm6,%xmm7 > + rorl $9,%r14d > + xorl %r9d,%r13d > + xorl %r11d,%r12d > + rorl $5,%r13d > + xorl %ebx,%r14d > + psrlq $2,%xmm6 > + andl %r9d,%r12d > + xorl %r9d,%r13d > + addl 60(%rsp),%eax > + pxor %xmm6,%xmm7 > + movl %ebx,%edi > + xorl %r11d,%r12d > + rorl $11,%r14d > + pshufd $8,%xmm7,%xmm7 > + xorl %ecx,%edi > + addl %r12d,%eax > + movdqa 96(%rbp),%xmm6 > + rorl $6,%r13d > + andl %edi,%r15d > + pslldq $8,%xmm7 > + xorl %ebx,%r14d > + addl %r13d,%eax > + xorl %ecx,%r15d > + paddd %xmm7,%xmm3 > + rorl $2,%r14d > + addl %eax,%r8d > + addl %r15d,%eax > + paddd %xmm3,%xmm6 > + movl %r8d,%r13d > + addl %eax,%r14d > + movdqa %xmm6,48(%rsp) > + cmpb $0,131(%rbp) > + jne .Lssse3_00_47 > + rorl $14,%r13d > + movl %r14d,%eax > + movl %r9d,%r12d > + rorl $9,%r14d > + xorl %r8d,%r13d > + xorl %r10d,%r12d > + rorl $5,%r13d > + xorl %eax,%r14d > + andl %r8d,%r12d > + xorl %r8d,%r13d > + addl 0(%rsp),%r11d > + movl %eax,%r15d > + xorl %r10d,%r12d > + rorl $11,%r14d > + xorl %ebx,%r15d > + addl %r12d,%r11d > + rorl $6,%r13d > + andl %r15d,%edi > + xorl %eax,%r14d > + addl %r13d,%r11d > + xorl %ebx,%edi > + rorl $2,%r14d > + addl %r11d,%edx > + addl %edi,%r11d > + movl %edx,%r13d > + addl %r11d,%r14d > + rorl $14,%r13d > + movl %r14d,%r11d > + movl %r8d,%r12d > + rorl $9,%r14d > + xorl %edx,%r13d > + xorl %r9d,%r12d > + rorl $5,%r13d > + xorl %r11d,%r14d > + andl %edx,%r12d > + xorl %edx,%r13d > + addl 4(%rsp),%r10d > + movl %r11d,%edi > + xorl %r9d,%r12d > + rorl $11,%r14d > + xorl %eax,%edi > + addl %r12d,%r10d > + rorl $6,%r13d > + andl %edi,%r15d > + xorl %r11d,%r14d > + addl %r13d,%r10d > + xorl %eax,%r15d > + rorl $2,%r14d > + addl %r10d,%ecx > + addl %r15d,%r10d > + movl %ecx,%r13d > + addl %r10d,%r14d > + rorl $14,%r13d > + movl %r14d,%r10d > + movl %edx,%r12d > + rorl $9,%r14d > + xorl %ecx,%r13d > + xorl %r8d,%r12d > + rorl $5,%r13d > + xorl %r10d,%r14d > + andl %ecx,%r12d > + xorl %ecx,%r13d > + addl 8(%rsp),%r9d > + movl %r10d,%r15d > + xorl %r8d,%r12d > + rorl $11,%r14d > + xorl %r11d,%r15d > + addl %r12d,%r9d > + rorl $6,%r13d > + andl %r15d,%edi > + xorl %r10d,%r14d > + addl %r13d,%r9d > + xorl %r11d,%edi > + rorl $2,%r14d > + addl %r9d,%ebx > + addl %edi,%r9d > + movl %ebx,%r13d > + addl %r9d,%r14d > + rorl $14,%r13d > + movl %r14d,%r9d > + movl %ecx,%r12d > + rorl $9,%r14d > + xorl %ebx,%r13d > + xorl %edx,%r12d > + rorl $5,%r13d > + xorl %r9d,%r14d > + andl %ebx,%r12d > + xorl %ebx,%r13d > + addl 12(%rsp),%r8d > + movl %r9d,%edi > + xorl %edx,%r12d > + rorl $11,%r14d > + xorl %r10d,%edi > + addl %r12d,%r8d > + rorl $6,%r13d > + andl %edi,%r15d > + xorl %r9d,%r14d > + addl %r13d,%r8d > + xorl %r10d,%r15d > + rorl $2,%r14d > + addl %r8d,%eax > + addl %r15d,%r8d > + movl %eax,%r13d > + addl %r8d,%r14d > + rorl $14,%r13d > + movl %r14d,%r8d > + movl %ebx,%r12d > + rorl $9,%r14d > + xorl %eax,%r13d > + xorl %ecx,%r12d > + rorl $5,%r13d > + xorl %r8d,%r14d > + andl %eax,%r12d > + xorl %eax,%r13d > + addl 16(%rsp),%edx > + movl %r8d,%r15d > + xorl %ecx,%r12d > + rorl $11,%r14d > + xorl %r9d,%r15d > + addl %r12d,%edx > + rorl $6,%r13d > + andl %r15d,%edi > + xorl %r8d,%r14d > + addl %r13d,%edx > + xorl %r9d,%edi > + rorl $2,%r14d > + addl %edx,%r11d > + addl %edi,%edx > + movl %r11d,%r13d > + addl 
%edx,%r14d > + rorl $14,%r13d > + movl %r14d,%edx > + movl %eax,%r12d > + rorl $9,%r14d > + xorl %r11d,%r13d > + xorl %ebx,%r12d > + rorl $5,%r13d > + xorl %edx,%r14d > + andl %r11d,%r12d > + xorl %r11d,%r13d > + addl 20(%rsp),%ecx > + movl %edx,%edi > + xorl %ebx,%r12d > + rorl $11,%r14d > + xorl %r8d,%edi > + addl %r12d,%ecx > + rorl $6,%r13d > + andl %edi,%r15d > + xorl %edx,%r14d > + addl %r13d,%ecx > + xorl %r8d,%r15d > + rorl $2,%r14d > + addl %ecx,%r10d > + addl %r15d,%ecx > + movl %r10d,%r13d > + addl %ecx,%r14d > + rorl $14,%r13d > + movl %r14d,%ecx > + movl %r11d,%r12d > + rorl $9,%r14d > + xorl %r10d,%r13d > + xorl %eax,%r12d > + rorl $5,%r13d > + xorl %ecx,%r14d > + andl %r10d,%r12d > + xorl %r10d,%r13d > + addl 24(%rsp),%ebx > + movl %ecx,%r15d > + xorl %eax,%r12d > + rorl $11,%r14d > + xorl %edx,%r15d > + addl %r12d,%ebx > + rorl $6,%r13d > + andl %r15d,%edi > + xorl %ecx,%r14d > + addl %r13d,%ebx > + xorl %edx,%edi > + rorl $2,%r14d > + addl %ebx,%r9d > + addl %edi,%ebx > + movl %r9d,%r13d > + addl %ebx,%r14d > + rorl $14,%r13d > + movl %r14d,%ebx > + movl %r10d,%r12d > + rorl $9,%r14d > + xorl %r9d,%r13d > + xorl %r11d,%r12d > + rorl $5,%r13d > + xorl %ebx,%r14d > + andl %r9d,%r12d > + xorl %r9d,%r13d > + addl 28(%rsp),%eax > + movl %ebx,%edi > + xorl %r11d,%r12d > + rorl $11,%r14d > + xorl %ecx,%edi > + addl %r12d,%eax > + rorl $6,%r13d > + andl %edi,%r15d > + xorl %ebx,%r14d > + addl %r13d,%eax > + xorl %ecx,%r15d > + rorl $2,%r14d > + addl %eax,%r8d > + addl %r15d,%eax > + movl %r8d,%r13d > + addl %eax,%r14d > + rorl $14,%r13d > + movl %r14d,%eax > + movl %r9d,%r12d > + rorl $9,%r14d > + xorl %r8d,%r13d > + xorl %r10d,%r12d > + rorl $5,%r13d > + xorl %eax,%r14d > + andl %r8d,%r12d > + xorl %r8d,%r13d > + addl 32(%rsp),%r11d > + movl %eax,%r15d > + xorl %r10d,%r12d > + rorl $11,%r14d > + xorl %ebx,%r15d > + addl %r12d,%r11d > + rorl $6,%r13d > + andl %r15d,%edi > + xorl %eax,%r14d > + addl %r13d,%r11d > + xorl %ebx,%edi > + rorl $2,%r14d > + addl %r11d,%edx > + addl %edi,%r11d > + movl %edx,%r13d > + addl %r11d,%r14d > + rorl $14,%r13d > + movl %r14d,%r11d > + movl %r8d,%r12d > + rorl $9,%r14d > + xorl %edx,%r13d > + xorl %r9d,%r12d > + rorl $5,%r13d > + xorl %r11d,%r14d > + andl %edx,%r12d > + xorl %edx,%r13d > + addl 36(%rsp),%r10d > + movl %r11d,%edi > + xorl %r9d,%r12d > + rorl $11,%r14d > + xorl %eax,%edi > + addl %r12d,%r10d > + rorl $6,%r13d > + andl %edi,%r15d > + xorl %r11d,%r14d > + addl %r13d,%r10d > + xorl %eax,%r15d > + rorl $2,%r14d > + addl %r10d,%ecx > + addl %r15d,%r10d > + movl %ecx,%r13d > + addl %r10d,%r14d > + rorl $14,%r13d > + movl %r14d,%r10d > + movl %edx,%r12d > + rorl $9,%r14d > + xorl %ecx,%r13d > + xorl %r8d,%r12d > + rorl $5,%r13d > + xorl %r10d,%r14d > + andl %ecx,%r12d > + xorl %ecx,%r13d > + addl 40(%rsp),%r9d > + movl %r10d,%r15d > + xorl %r8d,%r12d > + rorl $11,%r14d > + xorl %r11d,%r15d > + addl %r12d,%r9d > + rorl $6,%r13d > + andl %r15d,%edi > + xorl %r10d,%r14d > + addl %r13d,%r9d > + xorl %r11d,%edi > + rorl $2,%r14d > + addl %r9d,%ebx > + addl %edi,%r9d > + movl %ebx,%r13d > + addl %r9d,%r14d > + rorl $14,%r13d > + movl %r14d,%r9d > + movl %ecx,%r12d > + rorl $9,%r14d > + xorl %ebx,%r13d > + xorl %edx,%r12d > + rorl $5,%r13d > + xorl %r9d,%r14d > + andl %ebx,%r12d > + xorl %ebx,%r13d > + addl 44(%rsp),%r8d > + movl %r9d,%edi > + xorl %edx,%r12d > + rorl $11,%r14d > + xorl %r10d,%edi > + addl %r12d,%r8d > + rorl $6,%r13d > + andl %edi,%r15d > + xorl %r9d,%r14d > + addl %r13d,%r8d > + xorl %r10d,%r15d > + rorl $2,%r14d > + addl 
%r8d,%eax > + addl %r15d,%r8d > + movl %eax,%r13d > + addl %r8d,%r14d > + rorl $14,%r13d > + movl %r14d,%r8d > + movl %ebx,%r12d > + rorl $9,%r14d > + xorl %eax,%r13d > + xorl %ecx,%r12d > + rorl $5,%r13d > + xorl %r8d,%r14d > + andl %eax,%r12d > + xorl %eax,%r13d > + addl 48(%rsp),%edx > + movl %r8d,%r15d > + xorl %ecx,%r12d > + rorl $11,%r14d > + xorl %r9d,%r15d > + addl %r12d,%edx > + rorl $6,%r13d > + andl %r15d,%edi > + xorl %r8d,%r14d > + addl %r13d,%edx > + xorl %r9d,%edi > + rorl $2,%r14d > + addl %edx,%r11d > + addl %edi,%edx > + movl %r11d,%r13d > + addl %edx,%r14d > + rorl $14,%r13d > + movl %r14d,%edx > + movl %eax,%r12d > + rorl $9,%r14d > + xorl %r11d,%r13d > + xorl %ebx,%r12d > + rorl $5,%r13d > + xorl %edx,%r14d > + andl %r11d,%r12d > + xorl %r11d,%r13d > + addl 52(%rsp),%ecx > + movl %edx,%edi > + xorl %ebx,%r12d > + rorl $11,%r14d > + xorl %r8d,%edi > + addl %r12d,%ecx > + rorl $6,%r13d > + andl %edi,%r15d > + xorl %edx,%r14d > + addl %r13d,%ecx > + xorl %r8d,%r15d > + rorl $2,%r14d > + addl %ecx,%r10d > + addl %r15d,%ecx > + movl %r10d,%r13d > + addl %ecx,%r14d > + rorl $14,%r13d > + movl %r14d,%ecx > + movl %r11d,%r12d > + rorl $9,%r14d > + xorl %r10d,%r13d > + xorl %eax,%r12d > + rorl $5,%r13d > + xorl %ecx,%r14d > + andl %r10d,%r12d > + xorl %r10d,%r13d > + addl 56(%rsp),%ebx > + movl %ecx,%r15d > + xorl %eax,%r12d > + rorl $11,%r14d > + xorl %edx,%r15d > + addl %r12d,%ebx > + rorl $6,%r13d > + andl %r15d,%edi > + xorl %ecx,%r14d > + addl %r13d,%ebx > + xorl %edx,%edi > + rorl $2,%r14d > + addl %ebx,%r9d > + addl %edi,%ebx > + movl %r9d,%r13d > + addl %ebx,%r14d > + rorl $14,%r13d > + movl %r14d,%ebx > + movl %r10d,%r12d > + rorl $9,%r14d > + xorl %r9d,%r13d > + xorl %r11d,%r12d > + rorl $5,%r13d > + xorl %ebx,%r14d > + andl %r9d,%r12d > + xorl %r9d,%r13d > + addl 60(%rsp),%eax > + movl %ebx,%edi > + xorl %r11d,%r12d > + rorl $11,%r14d > + xorl %ecx,%edi > + addl %r12d,%eax > + rorl $6,%r13d > + andl %edi,%r15d > + xorl %ebx,%r14d > + addl %r13d,%eax > + xorl %ecx,%r15d > + rorl $2,%r14d > + addl %eax,%r8d > + addl %r15d,%eax > + movl %r8d,%r13d > + addl %eax,%r14d > + movq 64+0(%rsp),%rdi > + movl %r14d,%eax > + > + addl 0(%rdi),%eax > + leaq 64(%rsi),%rsi > + addl 4(%rdi),%ebx > + addl 8(%rdi),%ecx > + addl 12(%rdi),%edx > + addl 16(%rdi),%r8d > + addl 20(%rdi),%r9d > + addl 24(%rdi),%r10d > + addl 28(%rdi),%r11d > + > + cmpq 64+16(%rsp),%rsi > + > + movl %eax,0(%rdi) > + movl %ebx,4(%rdi) > + movl %ecx,8(%rdi) > + movl %edx,12(%rdi) > + movl %r8d,16(%rdi) > + movl %r9d,20(%rdi) > + movl %r10d,24(%rdi) > + movl %r11d,28(%rdi) > + jb .Lloop_ssse3 > + > + movq 88(%rsp),%rsi > +.cfi_def_cfa %rsi,8 > + movq -48(%rsi),%r15 > +.cfi_restore %r15 > + movq -40(%rsi),%r14 > +.cfi_restore %r14 > + movq -32(%rsi),%r13 > +.cfi_restore %r13 > + movq -24(%rsi),%r12 > +.cfi_restore %r12 > + movq -16(%rsi),%rbp > +.cfi_restore %rbp > + movq -8(%rsi),%rbx > +.cfi_restore %rbx > + leaq (%rsi),%rsp > +.cfi_def_cfa_register %rsp > +.Lepilogue_ssse3: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size sha256_block_data_order_ssse3,.-sha256_block_data_order_ssse3 > diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha512-x86_64.S > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha512-x86_64.S > new file mode 100644 > index 0000000000..11e67e5ba1 > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha512-x86_64.S > @@ -0,0 +1,1811 @@ > +# WARNING: do not edit! 
> +# Generated from openssl/crypto/sha/asm/sha512-x86_64.pl > +# > +# Copyright 2005-2020 The OpenSSL Project Authors. All Rights Reserved. > +# > +# Licensed under the OpenSSL license (the "License"). You may not use > +# this file except in compliance with the License. You can obtain a copy > +# in the file LICENSE in the source distribution or at > +# https://www.openssl.org/source/license.html > + > +.text > + > + > +.globl sha512_block_data_order > +.type sha512_block_data_order,@function > +.align 16 > +sha512_block_data_order: > +.cfi_startproc > + movq %rsp,%rax > +.cfi_def_cfa_register %rax > + pushq %rbx > +.cfi_offset %rbx,-16 > + pushq %rbp > +.cfi_offset %rbp,-24 > + pushq %r12 > +.cfi_offset %r12,-32 > + pushq %r13 > +.cfi_offset %r13,-40 > + pushq %r14 > +.cfi_offset %r14,-48 > + pushq %r15 > +.cfi_offset %r15,-56 > + shlq $4,%rdx > + subq $128+32,%rsp > + leaq (%rsi,%rdx,8),%rdx > + andq $-64,%rsp > + movq %rdi,128+0(%rsp) > + movq %rsi,128+8(%rsp) > + movq %rdx,128+16(%rsp) > + movq %rax,152(%rsp) > +.cfi_escape 0x0f,0x06,0x77,0x98,0x01,0x06,0x23,0x08 > +.Lprologue: > + > + movq 0(%rdi),%rax > + movq 8(%rdi),%rbx > + movq 16(%rdi),%rcx > + movq 24(%rdi),%rdx > + movq 32(%rdi),%r8 > + movq 40(%rdi),%r9 > + movq 48(%rdi),%r10 > + movq 56(%rdi),%r11 > + jmp .Lloop > + > +.align 16 > +.Lloop: > + movq %rbx,%rdi > + leaq K512(%rip),%rbp > + xorq %rcx,%rdi > + movq 0(%rsi),%r12 > + movq %r8,%r13 > + movq %rax,%r14 > + bswapq %r12 > + rorq $23,%r13 > + movq %r9,%r15 > + > + xorq %r8,%r13 > + rorq $5,%r14 > + xorq %r10,%r15 > + > + movq %r12,0(%rsp) > + xorq %rax,%r14 > + andq %r8,%r15 > + > + rorq $4,%r13 > + addq %r11,%r12 > + xorq %r10,%r15 > + > + rorq $6,%r14 > + xorq %r8,%r13 > + addq %r15,%r12 > + > + movq %rax,%r15 > + addq (%rbp),%r12 > + xorq %rax,%r14 > + > + xorq %rbx,%r15 > + rorq $14,%r13 > + movq %rbx,%r11 > + > + andq %r15,%rdi > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %rdi,%r11 > + addq %r12,%rdx > + addq %r12,%r11 > + > + leaq 8(%rbp),%rbp > + addq %r14,%r11 > + movq 8(%rsi),%r12 > + movq %rdx,%r13 > + movq %r11,%r14 > + bswapq %r12 > + rorq $23,%r13 > + movq %r8,%rdi > + > + xorq %rdx,%r13 > + rorq $5,%r14 > + xorq %r9,%rdi > + > + movq %r12,8(%rsp) > + xorq %r11,%r14 > + andq %rdx,%rdi > + > + rorq $4,%r13 > + addq %r10,%r12 > + xorq %r9,%rdi > + > + rorq $6,%r14 > + xorq %rdx,%r13 > + addq %rdi,%r12 > + > + movq %r11,%rdi > + addq (%rbp),%r12 > + xorq %r11,%r14 > + > + xorq %rax,%rdi > + rorq $14,%r13 > + movq %rax,%r10 > + > + andq %rdi,%r15 > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %r15,%r10 > + addq %r12,%rcx > + addq %r12,%r10 > + > + leaq 24(%rbp),%rbp > + addq %r14,%r10 > + movq 16(%rsi),%r12 > + movq %rcx,%r13 > + movq %r10,%r14 > + bswapq %r12 > + rorq $23,%r13 > + movq %rdx,%r15 > + > + xorq %rcx,%r13 > + rorq $5,%r14 > + xorq %r8,%r15 > + > + movq %r12,16(%rsp) > + xorq %r10,%r14 > + andq %rcx,%r15 > + > + rorq $4,%r13 > + addq %r9,%r12 > + xorq %r8,%r15 > + > + rorq $6,%r14 > + xorq %rcx,%r13 > + addq %r15,%r12 > + > + movq %r10,%r15 > + addq (%rbp),%r12 > + xorq %r10,%r14 > + > + xorq %r11,%r15 > + rorq $14,%r13 > + movq %r11,%r9 > + > + andq %r15,%rdi > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %rdi,%r9 > + addq %r12,%rbx > + addq %r12,%r9 > + > + leaq 8(%rbp),%rbp > + addq %r14,%r9 > + movq 24(%rsi),%r12 > + movq %rbx,%r13 > + movq %r9,%r14 > + bswapq %r12 > + rorq $23,%r13 > + movq %rcx,%rdi > + > + xorq %rbx,%r13 > + rorq $5,%r14 > + xorq %rdx,%rdi > + > + movq %r12,24(%rsp) > + xorq %r9,%r14 > + andq %rbx,%rdi > + > + 
rorq $4,%r13 > + addq %r8,%r12 > + xorq %rdx,%rdi > + > + rorq $6,%r14 > + xorq %rbx,%r13 > + addq %rdi,%r12 > + > + movq %r9,%rdi > + addq (%rbp),%r12 > + xorq %r9,%r14 > + > + xorq %r10,%rdi > + rorq $14,%r13 > + movq %r10,%r8 > + > + andq %rdi,%r15 > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %r15,%r8 > + addq %r12,%rax > + addq %r12,%r8 > + > + leaq 24(%rbp),%rbp > + addq %r14,%r8 > + movq 32(%rsi),%r12 > + movq %rax,%r13 > + movq %r8,%r14 > + bswapq %r12 > + rorq $23,%r13 > + movq %rbx,%r15 > + > + xorq %rax,%r13 > + rorq $5,%r14 > + xorq %rcx,%r15 > + > + movq %r12,32(%rsp) > + xorq %r8,%r14 > + andq %rax,%r15 > + > + rorq $4,%r13 > + addq %rdx,%r12 > + xorq %rcx,%r15 > + > + rorq $6,%r14 > + xorq %rax,%r13 > + addq %r15,%r12 > + > + movq %r8,%r15 > + addq (%rbp),%r12 > + xorq %r8,%r14 > + > + xorq %r9,%r15 > + rorq $14,%r13 > + movq %r9,%rdx > + > + andq %r15,%rdi > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %rdi,%rdx > + addq %r12,%r11 > + addq %r12,%rdx > + > + leaq 8(%rbp),%rbp > + addq %r14,%rdx > + movq 40(%rsi),%r12 > + movq %r11,%r13 > + movq %rdx,%r14 > + bswapq %r12 > + rorq $23,%r13 > + movq %rax,%rdi > + > + xorq %r11,%r13 > + rorq $5,%r14 > + xorq %rbx,%rdi > + > + movq %r12,40(%rsp) > + xorq %rdx,%r14 > + andq %r11,%rdi > + > + rorq $4,%r13 > + addq %rcx,%r12 > + xorq %rbx,%rdi > + > + rorq $6,%r14 > + xorq %r11,%r13 > + addq %rdi,%r12 > + > + movq %rdx,%rdi > + addq (%rbp),%r12 > + xorq %rdx,%r14 > + > + xorq %r8,%rdi > + rorq $14,%r13 > + movq %r8,%rcx > + > + andq %rdi,%r15 > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %r15,%rcx > + addq %r12,%r10 > + addq %r12,%rcx > + > + leaq 24(%rbp),%rbp > + addq %r14,%rcx > + movq 48(%rsi),%r12 > + movq %r10,%r13 > + movq %rcx,%r14 > + bswapq %r12 > + rorq $23,%r13 > + movq %r11,%r15 > + > + xorq %r10,%r13 > + rorq $5,%r14 > + xorq %rax,%r15 > + > + movq %r12,48(%rsp) > + xorq %rcx,%r14 > + andq %r10,%r15 > + > + rorq $4,%r13 > + addq %rbx,%r12 > + xorq %rax,%r15 > + > + rorq $6,%r14 > + xorq %r10,%r13 > + addq %r15,%r12 > + > + movq %rcx,%r15 > + addq (%rbp),%r12 > + xorq %rcx,%r14 > + > + xorq %rdx,%r15 > + rorq $14,%r13 > + movq %rdx,%rbx > + > + andq %r15,%rdi > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %rdi,%rbx > + addq %r12,%r9 > + addq %r12,%rbx > + > + leaq 8(%rbp),%rbp > + addq %r14,%rbx > + movq 56(%rsi),%r12 > + movq %r9,%r13 > + movq %rbx,%r14 > + bswapq %r12 > + rorq $23,%r13 > + movq %r10,%rdi > + > + xorq %r9,%r13 > + rorq $5,%r14 > + xorq %r11,%rdi > + > + movq %r12,56(%rsp) > + xorq %rbx,%r14 > + andq %r9,%rdi > + > + rorq $4,%r13 > + addq %rax,%r12 > + xorq %r11,%rdi > + > + rorq $6,%r14 > + xorq %r9,%r13 > + addq %rdi,%r12 > + > + movq %rbx,%rdi > + addq (%rbp),%r12 > + xorq %rbx,%r14 > + > + xorq %rcx,%rdi > + rorq $14,%r13 > + movq %rcx,%rax > + > + andq %rdi,%r15 > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %r15,%rax > + addq %r12,%r8 > + addq %r12,%rax > + > + leaq 24(%rbp),%rbp > + addq %r14,%rax > + movq 64(%rsi),%r12 > + movq %r8,%r13 > + movq %rax,%r14 > + bswapq %r12 > + rorq $23,%r13 > + movq %r9,%r15 > + > + xorq %r8,%r13 > + rorq $5,%r14 > + xorq %r10,%r15 > + > + movq %r12,64(%rsp) > + xorq %rax,%r14 > + andq %r8,%r15 > + > + rorq $4,%r13 > + addq %r11,%r12 > + xorq %r10,%r15 > + > + rorq $6,%r14 > + xorq %r8,%r13 > + addq %r15,%r12 > + > + movq %rax,%r15 > + addq (%rbp),%r12 > + xorq %rax,%r14 > + > + xorq %rbx,%r15 > + rorq $14,%r13 > + movq %rbx,%r11 > + > + andq %r15,%rdi > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %rdi,%r11 > + addq %r12,%rdx > + addq 
%r12,%r11 > + > + leaq 8(%rbp),%rbp > + addq %r14,%r11 > + movq 72(%rsi),%r12 > + movq %rdx,%r13 > + movq %r11,%r14 > + bswapq %r12 > + rorq $23,%r13 > + movq %r8,%rdi > + > + xorq %rdx,%r13 > + rorq $5,%r14 > + xorq %r9,%rdi > + > + movq %r12,72(%rsp) > + xorq %r11,%r14 > + andq %rdx,%rdi > + > + rorq $4,%r13 > + addq %r10,%r12 > + xorq %r9,%rdi > + > + rorq $6,%r14 > + xorq %rdx,%r13 > + addq %rdi,%r12 > + > + movq %r11,%rdi > + addq (%rbp),%r12 > + xorq %r11,%r14 > + > + xorq %rax,%rdi > + rorq $14,%r13 > + movq %rax,%r10 > + > + andq %rdi,%r15 > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %r15,%r10 > + addq %r12,%rcx > + addq %r12,%r10 > + > + leaq 24(%rbp),%rbp > + addq %r14,%r10 > + movq 80(%rsi),%r12 > + movq %rcx,%r13 > + movq %r10,%r14 > + bswapq %r12 > + rorq $23,%r13 > + movq %rdx,%r15 > + > + xorq %rcx,%r13 > + rorq $5,%r14 > + xorq %r8,%r15 > + > + movq %r12,80(%rsp) > + xorq %r10,%r14 > + andq %rcx,%r15 > + > + rorq $4,%r13 > + addq %r9,%r12 > + xorq %r8,%r15 > + > + rorq $6,%r14 > + xorq %rcx,%r13 > + addq %r15,%r12 > + > + movq %r10,%r15 > + addq (%rbp),%r12 > + xorq %r10,%r14 > + > + xorq %r11,%r15 > + rorq $14,%r13 > + movq %r11,%r9 > + > + andq %r15,%rdi > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %rdi,%r9 > + addq %r12,%rbx > + addq %r12,%r9 > + > + leaq 8(%rbp),%rbp > + addq %r14,%r9 > + movq 88(%rsi),%r12 > + movq %rbx,%r13 > + movq %r9,%r14 > + bswapq %r12 > + rorq $23,%r13 > + movq %rcx,%rdi > + > + xorq %rbx,%r13 > + rorq $5,%r14 > + xorq %rdx,%rdi > + > + movq %r12,88(%rsp) > + xorq %r9,%r14 > + andq %rbx,%rdi > + > + rorq $4,%r13 > + addq %r8,%r12 > + xorq %rdx,%rdi > + > + rorq $6,%r14 > + xorq %rbx,%r13 > + addq %rdi,%r12 > + > + movq %r9,%rdi > + addq (%rbp),%r12 > + xorq %r9,%r14 > + > + xorq %r10,%rdi > + rorq $14,%r13 > + movq %r10,%r8 > + > + andq %rdi,%r15 > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %r15,%r8 > + addq %r12,%rax > + addq %r12,%r8 > + > + leaq 24(%rbp),%rbp > + addq %r14,%r8 > + movq 96(%rsi),%r12 > + movq %rax,%r13 > + movq %r8,%r14 > + bswapq %r12 > + rorq $23,%r13 > + movq %rbx,%r15 > + > + xorq %rax,%r13 > + rorq $5,%r14 > + xorq %rcx,%r15 > + > + movq %r12,96(%rsp) > + xorq %r8,%r14 > + andq %rax,%r15 > + > + rorq $4,%r13 > + addq %rdx,%r12 > + xorq %rcx,%r15 > + > + rorq $6,%r14 > + xorq %rax,%r13 > + addq %r15,%r12 > + > + movq %r8,%r15 > + addq (%rbp),%r12 > + xorq %r8,%r14 > + > + xorq %r9,%r15 > + rorq $14,%r13 > + movq %r9,%rdx > + > + andq %r15,%rdi > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %rdi,%rdx > + addq %r12,%r11 > + addq %r12,%rdx > + > + leaq 8(%rbp),%rbp > + addq %r14,%rdx > + movq 104(%rsi),%r12 > + movq %r11,%r13 > + movq %rdx,%r14 > + bswapq %r12 > + rorq $23,%r13 > + movq %rax,%rdi > + > + xorq %r11,%r13 > + rorq $5,%r14 > + xorq %rbx,%rdi > + > + movq %r12,104(%rsp) > + xorq %rdx,%r14 > + andq %r11,%rdi > + > + rorq $4,%r13 > + addq %rcx,%r12 > + xorq %rbx,%rdi > + > + rorq $6,%r14 > + xorq %r11,%r13 > + addq %rdi,%r12 > + > + movq %rdx,%rdi > + addq (%rbp),%r12 > + xorq %rdx,%r14 > + > + xorq %r8,%rdi > + rorq $14,%r13 > + movq %r8,%rcx > + > + andq %rdi,%r15 > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %r15,%rcx > + addq %r12,%r10 > + addq %r12,%rcx > + > + leaq 24(%rbp),%rbp > + addq %r14,%rcx > + movq 112(%rsi),%r12 > + movq %r10,%r13 > + movq %rcx,%r14 > + bswapq %r12 > + rorq $23,%r13 > + movq %r11,%r15 > + > + xorq %r10,%r13 > + rorq $5,%r14 > + xorq %rax,%r15 > + > + movq %r12,112(%rsp) > + xorq %rcx,%r14 > + andq %r10,%r15 > + > + rorq $4,%r13 > + addq %rbx,%r12 > + xorq 
%rax,%r15 > + > + rorq $6,%r14 > + xorq %r10,%r13 > + addq %r15,%r12 > + > + movq %rcx,%r15 > + addq (%rbp),%r12 > + xorq %rcx,%r14 > + > + xorq %rdx,%r15 > + rorq $14,%r13 > + movq %rdx,%rbx > + > + andq %r15,%rdi > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %rdi,%rbx > + addq %r12,%r9 > + addq %r12,%rbx > + > + leaq 8(%rbp),%rbp > + addq %r14,%rbx > + movq 120(%rsi),%r12 > + movq %r9,%r13 > + movq %rbx,%r14 > + bswapq %r12 > + rorq $23,%r13 > + movq %r10,%rdi > + > + xorq %r9,%r13 > + rorq $5,%r14 > + xorq %r11,%rdi > + > + movq %r12,120(%rsp) > + xorq %rbx,%r14 > + andq %r9,%rdi > + > + rorq $4,%r13 > + addq %rax,%r12 > + xorq %r11,%rdi > + > + rorq $6,%r14 > + xorq %r9,%r13 > + addq %rdi,%r12 > + > + movq %rbx,%rdi > + addq (%rbp),%r12 > + xorq %rbx,%r14 > + > + xorq %rcx,%rdi > + rorq $14,%r13 > + movq %rcx,%rax > + > + andq %rdi,%r15 > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %r15,%rax > + addq %r12,%r8 > + addq %r12,%rax > + > + leaq 24(%rbp),%rbp > + jmp .Lrounds_16_xx > +.align 16 > +.Lrounds_16_xx: > + movq 8(%rsp),%r13 > + movq 112(%rsp),%r15 > + > + movq %r13,%r12 > + rorq $7,%r13 > + addq %r14,%rax > + movq %r15,%r14 > + rorq $42,%r15 > + > + xorq %r12,%r13 > + shrq $7,%r12 > + rorq $1,%r13 > + xorq %r14,%r15 > + shrq $6,%r14 > + > + rorq $19,%r15 > + xorq %r13,%r12 > + xorq %r14,%r15 > + addq 72(%rsp),%r12 > + > + addq 0(%rsp),%r12 > + movq %r8,%r13 > + addq %r15,%r12 > + movq %rax,%r14 > + rorq $23,%r13 > + movq %r9,%r15 > + > + xorq %r8,%r13 > + rorq $5,%r14 > + xorq %r10,%r15 > + > + movq %r12,0(%rsp) > + xorq %rax,%r14 > + andq %r8,%r15 > + > + rorq $4,%r13 > + addq %r11,%r12 > + xorq %r10,%r15 > + > + rorq $6,%r14 > + xorq %r8,%r13 > + addq %r15,%r12 > + > + movq %rax,%r15 > + addq (%rbp),%r12 > + xorq %rax,%r14 > + > + xorq %rbx,%r15 > + rorq $14,%r13 > + movq %rbx,%r11 > + > + andq %r15,%rdi > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %rdi,%r11 > + addq %r12,%rdx > + addq %r12,%r11 > + > + leaq 8(%rbp),%rbp > + movq 16(%rsp),%r13 > + movq 120(%rsp),%rdi > + > + movq %r13,%r12 > + rorq $7,%r13 > + addq %r14,%r11 > + movq %rdi,%r14 > + rorq $42,%rdi > + > + xorq %r12,%r13 > + shrq $7,%r12 > + rorq $1,%r13 > + xorq %r14,%rdi > + shrq $6,%r14 > + > + rorq $19,%rdi > + xorq %r13,%r12 > + xorq %r14,%rdi > + addq 80(%rsp),%r12 > + > + addq 8(%rsp),%r12 > + movq %rdx,%r13 > + addq %rdi,%r12 > + movq %r11,%r14 > + rorq $23,%r13 > + movq %r8,%rdi > + > + xorq %rdx,%r13 > + rorq $5,%r14 > + xorq %r9,%rdi > + > + movq %r12,8(%rsp) > + xorq %r11,%r14 > + andq %rdx,%rdi > + > + rorq $4,%r13 > + addq %r10,%r12 > + xorq %r9,%rdi > + > + rorq $6,%r14 > + xorq %rdx,%r13 > + addq %rdi,%r12 > + > + movq %r11,%rdi > + addq (%rbp),%r12 > + xorq %r11,%r14 > + > + xorq %rax,%rdi > + rorq $14,%r13 > + movq %rax,%r10 > + > + andq %rdi,%r15 > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %r15,%r10 > + addq %r12,%rcx > + addq %r12,%r10 > + > + leaq 24(%rbp),%rbp > + movq 24(%rsp),%r13 > + movq 0(%rsp),%r15 > + > + movq %r13,%r12 > + rorq $7,%r13 > + addq %r14,%r10 > + movq %r15,%r14 > + rorq $42,%r15 > + > + xorq %r12,%r13 > + shrq $7,%r12 > + rorq $1,%r13 > + xorq %r14,%r15 > + shrq $6,%r14 > + > + rorq $19,%r15 > + xorq %r13,%r12 > + xorq %r14,%r15 > + addq 88(%rsp),%r12 > + > + addq 16(%rsp),%r12 > + movq %rcx,%r13 > + addq %r15,%r12 > + movq %r10,%r14 > + rorq $23,%r13 > + movq %rdx,%r15 > + > + xorq %rcx,%r13 > + rorq $5,%r14 > + xorq %r8,%r15 > + > + movq %r12,16(%rsp) > + xorq %r10,%r14 > + andq %rcx,%r15 > + > + rorq $4,%r13 > + addq %r9,%r12 > + xorq %r8,%r15 > + > + 
rorq $6,%r14 > + xorq %rcx,%r13 > + addq %r15,%r12 > + > + movq %r10,%r15 > + addq (%rbp),%r12 > + xorq %r10,%r14 > + > + xorq %r11,%r15 > + rorq $14,%r13 > + movq %r11,%r9 > + > + andq %r15,%rdi > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %rdi,%r9 > + addq %r12,%rbx > + addq %r12,%r9 > + > + leaq 8(%rbp),%rbp > + movq 32(%rsp),%r13 > + movq 8(%rsp),%rdi > + > + movq %r13,%r12 > + rorq $7,%r13 > + addq %r14,%r9 > + movq %rdi,%r14 > + rorq $42,%rdi > + > + xorq %r12,%r13 > + shrq $7,%r12 > + rorq $1,%r13 > + xorq %r14,%rdi > + shrq $6,%r14 > + > + rorq $19,%rdi > + xorq %r13,%r12 > + xorq %r14,%rdi > + addq 96(%rsp),%r12 > + > + addq 24(%rsp),%r12 > + movq %rbx,%r13 > + addq %rdi,%r12 > + movq %r9,%r14 > + rorq $23,%r13 > + movq %rcx,%rdi > + > + xorq %rbx,%r13 > + rorq $5,%r14 > + xorq %rdx,%rdi > + > + movq %r12,24(%rsp) > + xorq %r9,%r14 > + andq %rbx,%rdi > + > + rorq $4,%r13 > + addq %r8,%r12 > + xorq %rdx,%rdi > + > + rorq $6,%r14 > + xorq %rbx,%r13 > + addq %rdi,%r12 > + > + movq %r9,%rdi > + addq (%rbp),%r12 > + xorq %r9,%r14 > + > + xorq %r10,%rdi > + rorq $14,%r13 > + movq %r10,%r8 > + > + andq %rdi,%r15 > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %r15,%r8 > + addq %r12,%rax > + addq %r12,%r8 > + > + leaq 24(%rbp),%rbp > + movq 40(%rsp),%r13 > + movq 16(%rsp),%r15 > + > + movq %r13,%r12 > + rorq $7,%r13 > + addq %r14,%r8 > + movq %r15,%r14 > + rorq $42,%r15 > + > + xorq %r12,%r13 > + shrq $7,%r12 > + rorq $1,%r13 > + xorq %r14,%r15 > + shrq $6,%r14 > + > + rorq $19,%r15 > + xorq %r13,%r12 > + xorq %r14,%r15 > + addq 104(%rsp),%r12 > + > + addq 32(%rsp),%r12 > + movq %rax,%r13 > + addq %r15,%r12 > + movq %r8,%r14 > + rorq $23,%r13 > + movq %rbx,%r15 > + > + xorq %rax,%r13 > + rorq $5,%r14 > + xorq %rcx,%r15 > + > + movq %r12,32(%rsp) > + xorq %r8,%r14 > + andq %rax,%r15 > + > + rorq $4,%r13 > + addq %rdx,%r12 > + xorq %rcx,%r15 > + > + rorq $6,%r14 > + xorq %rax,%r13 > + addq %r15,%r12 > + > + movq %r8,%r15 > + addq (%rbp),%r12 > + xorq %r8,%r14 > + > + xorq %r9,%r15 > + rorq $14,%r13 > + movq %r9,%rdx > + > + andq %r15,%rdi > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %rdi,%rdx > + addq %r12,%r11 > + addq %r12,%rdx > + > + leaq 8(%rbp),%rbp > + movq 48(%rsp),%r13 > + movq 24(%rsp),%rdi > + > + movq %r13,%r12 > + rorq $7,%r13 > + addq %r14,%rdx > + movq %rdi,%r14 > + rorq $42,%rdi > + > + xorq %r12,%r13 > + shrq $7,%r12 > + rorq $1,%r13 > + xorq %r14,%rdi > + shrq $6,%r14 > + > + rorq $19,%rdi > + xorq %r13,%r12 > + xorq %r14,%rdi > + addq 112(%rsp),%r12 > + > + addq 40(%rsp),%r12 > + movq %r11,%r13 > + addq %rdi,%r12 > + movq %rdx,%r14 > + rorq $23,%r13 > + movq %rax,%rdi > + > + xorq %r11,%r13 > + rorq $5,%r14 > + xorq %rbx,%rdi > + > + movq %r12,40(%rsp) > + xorq %rdx,%r14 > + andq %r11,%rdi > + > + rorq $4,%r13 > + addq %rcx,%r12 > + xorq %rbx,%rdi > + > + rorq $6,%r14 > + xorq %r11,%r13 > + addq %rdi,%r12 > + > + movq %rdx,%rdi > + addq (%rbp),%r12 > + xorq %rdx,%r14 > + > + xorq %r8,%rdi > + rorq $14,%r13 > + movq %r8,%rcx > + > + andq %rdi,%r15 > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %r15,%rcx > + addq %r12,%r10 > + addq %r12,%rcx > + > + leaq 24(%rbp),%rbp > + movq 56(%rsp),%r13 > + movq 32(%rsp),%r15 > + > + movq %r13,%r12 > + rorq $7,%r13 > + addq %r14,%rcx > + movq %r15,%r14 > + rorq $42,%r15 > + > + xorq %r12,%r13 > + shrq $7,%r12 > + rorq $1,%r13 > + xorq %r14,%r15 > + shrq $6,%r14 > + > + rorq $19,%r15 > + xorq %r13,%r12 > + xorq %r14,%r15 > + addq 120(%rsp),%r12 > + > + addq 48(%rsp),%r12 > + movq %r10,%r13 > + addq %r15,%r12 > + movq 
%rcx,%r14 > + rorq $23,%r13 > + movq %r11,%r15 > + > + xorq %r10,%r13 > + rorq $5,%r14 > + xorq %rax,%r15 > + > + movq %r12,48(%rsp) > + xorq %rcx,%r14 > + andq %r10,%r15 > + > + rorq $4,%r13 > + addq %rbx,%r12 > + xorq %rax,%r15 > + > + rorq $6,%r14 > + xorq %r10,%r13 > + addq %r15,%r12 > + > + movq %rcx,%r15 > + addq (%rbp),%r12 > + xorq %rcx,%r14 > + > + xorq %rdx,%r15 > + rorq $14,%r13 > + movq %rdx,%rbx > + > + andq %r15,%rdi > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %rdi,%rbx > + addq %r12,%r9 > + addq %r12,%rbx > + > + leaq 8(%rbp),%rbp > + movq 64(%rsp),%r13 > + movq 40(%rsp),%rdi > + > + movq %r13,%r12 > + rorq $7,%r13 > + addq %r14,%rbx > + movq %rdi,%r14 > + rorq $42,%rdi > + > + xorq %r12,%r13 > + shrq $7,%r12 > + rorq $1,%r13 > + xorq %r14,%rdi > + shrq $6,%r14 > + > + rorq $19,%rdi > + xorq %r13,%r12 > + xorq %r14,%rdi > + addq 0(%rsp),%r12 > + > + addq 56(%rsp),%r12 > + movq %r9,%r13 > + addq %rdi,%r12 > + movq %rbx,%r14 > + rorq $23,%r13 > + movq %r10,%rdi > + > + xorq %r9,%r13 > + rorq $5,%r14 > + xorq %r11,%rdi > + > + movq %r12,56(%rsp) > + xorq %rbx,%r14 > + andq %r9,%rdi > + > + rorq $4,%r13 > + addq %rax,%r12 > + xorq %r11,%rdi > + > + rorq $6,%r14 > + xorq %r9,%r13 > + addq %rdi,%r12 > + > + movq %rbx,%rdi > + addq (%rbp),%r12 > + xorq %rbx,%r14 > + > + xorq %rcx,%rdi > + rorq $14,%r13 > + movq %rcx,%rax > + > + andq %rdi,%r15 > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %r15,%rax > + addq %r12,%r8 > + addq %r12,%rax > + > + leaq 24(%rbp),%rbp > + movq 72(%rsp),%r13 > + movq 48(%rsp),%r15 > + > + movq %r13,%r12 > + rorq $7,%r13 > + addq %r14,%rax > + movq %r15,%r14 > + rorq $42,%r15 > + > + xorq %r12,%r13 > + shrq $7,%r12 > + rorq $1,%r13 > + xorq %r14,%r15 > + shrq $6,%r14 > + > + rorq $19,%r15 > + xorq %r13,%r12 > + xorq %r14,%r15 > + addq 8(%rsp),%r12 > + > + addq 64(%rsp),%r12 > + movq %r8,%r13 > + addq %r15,%r12 > + movq %rax,%r14 > + rorq $23,%r13 > + movq %r9,%r15 > + > + xorq %r8,%r13 > + rorq $5,%r14 > + xorq %r10,%r15 > + > + movq %r12,64(%rsp) > + xorq %rax,%r14 > + andq %r8,%r15 > + > + rorq $4,%r13 > + addq %r11,%r12 > + xorq %r10,%r15 > + > + rorq $6,%r14 > + xorq %r8,%r13 > + addq %r15,%r12 > + > + movq %rax,%r15 > + addq (%rbp),%r12 > + xorq %rax,%r14 > + > + xorq %rbx,%r15 > + rorq $14,%r13 > + movq %rbx,%r11 > + > + andq %r15,%rdi > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %rdi,%r11 > + addq %r12,%rdx > + addq %r12,%r11 > + > + leaq 8(%rbp),%rbp > + movq 80(%rsp),%r13 > + movq 56(%rsp),%rdi > + > + movq %r13,%r12 > + rorq $7,%r13 > + addq %r14,%r11 > + movq %rdi,%r14 > + rorq $42,%rdi > + > + xorq %r12,%r13 > + shrq $7,%r12 > + rorq $1,%r13 > + xorq %r14,%rdi > + shrq $6,%r14 > + > + rorq $19,%rdi > + xorq %r13,%r12 > + xorq %r14,%rdi > + addq 16(%rsp),%r12 > + > + addq 72(%rsp),%r12 > + movq %rdx,%r13 > + addq %rdi,%r12 > + movq %r11,%r14 > + rorq $23,%r13 > + movq %r8,%rdi > + > + xorq %rdx,%r13 > + rorq $5,%r14 > + xorq %r9,%rdi > + > + movq %r12,72(%rsp) > + xorq %r11,%r14 > + andq %rdx,%rdi > + > + rorq $4,%r13 > + addq %r10,%r12 > + xorq %r9,%rdi > + > + rorq $6,%r14 > + xorq %rdx,%r13 > + addq %rdi,%r12 > + > + movq %r11,%rdi > + addq (%rbp),%r12 > + xorq %r11,%r14 > + > + xorq %rax,%rdi > + rorq $14,%r13 > + movq %rax,%r10 > + > + andq %rdi,%r15 > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %r15,%r10 > + addq %r12,%rcx > + addq %r12,%r10 > + > + leaq 24(%rbp),%rbp > + movq 88(%rsp),%r13 > + movq 64(%rsp),%r15 > + > + movq %r13,%r12 > + rorq $7,%r13 > + addq %r14,%r10 > + movq %r15,%r14 > + rorq $42,%r15 > + > + xorq 
%r12,%r13 > + shrq $7,%r12 > + rorq $1,%r13 > + xorq %r14,%r15 > + shrq $6,%r14 > + > + rorq $19,%r15 > + xorq %r13,%r12 > + xorq %r14,%r15 > + addq 24(%rsp),%r12 > + > + addq 80(%rsp),%r12 > + movq %rcx,%r13 > + addq %r15,%r12 > + movq %r10,%r14 > + rorq $23,%r13 > + movq %rdx,%r15 > + > + xorq %rcx,%r13 > + rorq $5,%r14 > + xorq %r8,%r15 > + > + movq %r12,80(%rsp) > + xorq %r10,%r14 > + andq %rcx,%r15 > + > + rorq $4,%r13 > + addq %r9,%r12 > + xorq %r8,%r15 > + > + rorq $6,%r14 > + xorq %rcx,%r13 > + addq %r15,%r12 > + > + movq %r10,%r15 > + addq (%rbp),%r12 > + xorq %r10,%r14 > + > + xorq %r11,%r15 > + rorq $14,%r13 > + movq %r11,%r9 > + > + andq %r15,%rdi > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %rdi,%r9 > + addq %r12,%rbx > + addq %r12,%r9 > + > + leaq 8(%rbp),%rbp > + movq 96(%rsp),%r13 > + movq 72(%rsp),%rdi > + > + movq %r13,%r12 > + rorq $7,%r13 > + addq %r14,%r9 > + movq %rdi,%r14 > + rorq $42,%rdi > + > + xorq %r12,%r13 > + shrq $7,%r12 > + rorq $1,%r13 > + xorq %r14,%rdi > + shrq $6,%r14 > + > + rorq $19,%rdi > + xorq %r13,%r12 > + xorq %r14,%rdi > + addq 32(%rsp),%r12 > + > + addq 88(%rsp),%r12 > + movq %rbx,%r13 > + addq %rdi,%r12 > + movq %r9,%r14 > + rorq $23,%r13 > + movq %rcx,%rdi > + > + xorq %rbx,%r13 > + rorq $5,%r14 > + xorq %rdx,%rdi > + > + movq %r12,88(%rsp) > + xorq %r9,%r14 > + andq %rbx,%rdi > + > + rorq $4,%r13 > + addq %r8,%r12 > + xorq %rdx,%rdi > + > + rorq $6,%r14 > + xorq %rbx,%r13 > + addq %rdi,%r12 > + > + movq %r9,%rdi > + addq (%rbp),%r12 > + xorq %r9,%r14 > + > + xorq %r10,%rdi > + rorq $14,%r13 > + movq %r10,%r8 > + > + andq %rdi,%r15 > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %r15,%r8 > + addq %r12,%rax > + addq %r12,%r8 > + > + leaq 24(%rbp),%rbp > + movq 104(%rsp),%r13 > + movq 80(%rsp),%r15 > + > + movq %r13,%r12 > + rorq $7,%r13 > + addq %r14,%r8 > + movq %r15,%r14 > + rorq $42,%r15 > + > + xorq %r12,%r13 > + shrq $7,%r12 > + rorq $1,%r13 > + xorq %r14,%r15 > + shrq $6,%r14 > + > + rorq $19,%r15 > + xorq %r13,%r12 > + xorq %r14,%r15 > + addq 40(%rsp),%r12 > + > + addq 96(%rsp),%r12 > + movq %rax,%r13 > + addq %r15,%r12 > + movq %r8,%r14 > + rorq $23,%r13 > + movq %rbx,%r15 > + > + xorq %rax,%r13 > + rorq $5,%r14 > + xorq %rcx,%r15 > + > + movq %r12,96(%rsp) > + xorq %r8,%r14 > + andq %rax,%r15 > + > + rorq $4,%r13 > + addq %rdx,%r12 > + xorq %rcx,%r15 > + > + rorq $6,%r14 > + xorq %rax,%r13 > + addq %r15,%r12 > + > + movq %r8,%r15 > + addq (%rbp),%r12 > + xorq %r8,%r14 > + > + xorq %r9,%r15 > + rorq $14,%r13 > + movq %r9,%rdx > + > + andq %r15,%rdi > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %rdi,%rdx > + addq %r12,%r11 > + addq %r12,%rdx > + > + leaq 8(%rbp),%rbp > + movq 112(%rsp),%r13 > + movq 88(%rsp),%rdi > + > + movq %r13,%r12 > + rorq $7,%r13 > + addq %r14,%rdx > + movq %rdi,%r14 > + rorq $42,%rdi > + > + xorq %r12,%r13 > + shrq $7,%r12 > + rorq $1,%r13 > + xorq %r14,%rdi > + shrq $6,%r14 > + > + rorq $19,%rdi > + xorq %r13,%r12 > + xorq %r14,%rdi > + addq 48(%rsp),%r12 > + > + addq 104(%rsp),%r12 > + movq %r11,%r13 > + addq %rdi,%r12 > + movq %rdx,%r14 > + rorq $23,%r13 > + movq %rax,%rdi > + > + xorq %r11,%r13 > + rorq $5,%r14 > + xorq %rbx,%rdi > + > + movq %r12,104(%rsp) > + xorq %rdx,%r14 > + andq %r11,%rdi > + > + rorq $4,%r13 > + addq %rcx,%r12 > + xorq %rbx,%rdi > + > + rorq $6,%r14 > + xorq %r11,%r13 > + addq %rdi,%r12 > + > + movq %rdx,%rdi > + addq (%rbp),%r12 > + xorq %rdx,%r14 > + > + xorq %r8,%rdi > + rorq $14,%r13 > + movq %r8,%rcx > + > + andq %rdi,%r15 > + rorq $28,%r14 > + addq %r13,%r12 > + > + 
xorq %r15,%rcx > + addq %r12,%r10 > + addq %r12,%rcx > + > + leaq 24(%rbp),%rbp > + movq 120(%rsp),%r13 > + movq 96(%rsp),%r15 > + > + movq %r13,%r12 > + rorq $7,%r13 > + addq %r14,%rcx > + movq %r15,%r14 > + rorq $42,%r15 > + > + xorq %r12,%r13 > + shrq $7,%r12 > + rorq $1,%r13 > + xorq %r14,%r15 > + shrq $6,%r14 > + > + rorq $19,%r15 > + xorq %r13,%r12 > + xorq %r14,%r15 > + addq 56(%rsp),%r12 > + > + addq 112(%rsp),%r12 > + movq %r10,%r13 > + addq %r15,%r12 > + movq %rcx,%r14 > + rorq $23,%r13 > + movq %r11,%r15 > + > + xorq %r10,%r13 > + rorq $5,%r14 > + xorq %rax,%r15 > + > + movq %r12,112(%rsp) > + xorq %rcx,%r14 > + andq %r10,%r15 > + > + rorq $4,%r13 > + addq %rbx,%r12 > + xorq %rax,%r15 > + > + rorq $6,%r14 > + xorq %r10,%r13 > + addq %r15,%r12 > + > + movq %rcx,%r15 > + addq (%rbp),%r12 > + xorq %rcx,%r14 > + > + xorq %rdx,%r15 > + rorq $14,%r13 > + movq %rdx,%rbx > + > + andq %r15,%rdi > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %rdi,%rbx > + addq %r12,%r9 > + addq %r12,%rbx > + > + leaq 8(%rbp),%rbp > + movq 0(%rsp),%r13 > + movq 104(%rsp),%rdi > + > + movq %r13,%r12 > + rorq $7,%r13 > + addq %r14,%rbx > + movq %rdi,%r14 > + rorq $42,%rdi > + > + xorq %r12,%r13 > + shrq $7,%r12 > + rorq $1,%r13 > + xorq %r14,%rdi > + shrq $6,%r14 > + > + rorq $19,%rdi > + xorq %r13,%r12 > + xorq %r14,%rdi > + addq 64(%rsp),%r12 > + > + addq 120(%rsp),%r12 > + movq %r9,%r13 > + addq %rdi,%r12 > + movq %rbx,%r14 > + rorq $23,%r13 > + movq %r10,%rdi > + > + xorq %r9,%r13 > + rorq $5,%r14 > + xorq %r11,%rdi > + > + movq %r12,120(%rsp) > + xorq %rbx,%r14 > + andq %r9,%rdi > + > + rorq $4,%r13 > + addq %rax,%r12 > + xorq %r11,%rdi > + > + rorq $6,%r14 > + xorq %r9,%r13 > + addq %rdi,%r12 > + > + movq %rbx,%rdi > + addq (%rbp),%r12 > + xorq %rbx,%r14 > + > + xorq %rcx,%rdi > + rorq $14,%r13 > + movq %rcx,%rax > + > + andq %rdi,%r15 > + rorq $28,%r14 > + addq %r13,%r12 > + > + xorq %r15,%rax > + addq %r12,%r8 > + addq %r12,%rax > + > + leaq 24(%rbp),%rbp > + cmpb $0,7(%rbp) > + jnz .Lrounds_16_xx > + > + movq 128+0(%rsp),%rdi > + addq %r14,%rax > + leaq 128(%rsi),%rsi > + > + addq 0(%rdi),%rax > + addq 8(%rdi),%rbx > + addq 16(%rdi),%rcx > + addq 24(%rdi),%rdx > + addq 32(%rdi),%r8 > + addq 40(%rdi),%r9 > + addq 48(%rdi),%r10 > + addq 56(%rdi),%r11 > + > + cmpq 128+16(%rsp),%rsi > + > + movq %rax,0(%rdi) > + movq %rbx,8(%rdi) > + movq %rcx,16(%rdi) > + movq %rdx,24(%rdi) > + movq %r8,32(%rdi) > + movq %r9,40(%rdi) > + movq %r10,48(%rdi) > + movq %r11,56(%rdi) > + jb .Lloop > + > + movq 152(%rsp),%rsi > +.cfi_def_cfa %rsi,8 > + movq -48(%rsi),%r15 > +.cfi_restore %r15 > + movq -40(%rsi),%r14 > +.cfi_restore %r14 > + movq -32(%rsi),%r13 > +.cfi_restore %r13 > + movq -24(%rsi),%r12 > +.cfi_restore %r12 > + movq -16(%rsi),%rbp > +.cfi_restore %rbp > + movq -8(%rsi),%rbx > +.cfi_restore %rbx > + leaq (%rsi),%rsp > +.cfi_def_cfa_register %rsp > +.Lepilogue: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size sha512_block_data_order,.-sha512_block_data_order > +.align 64 > +.type K512,@object > +K512: > +.quad 0x428a2f98d728ae22,0x7137449123ef65cd > +.quad 0x428a2f98d728ae22,0x7137449123ef65cd > +.quad 0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc > +.quad 0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc > +.quad 0x3956c25bf348b538,0x59f111f1b605d019 > +.quad 0x3956c25bf348b538,0x59f111f1b605d019 > +.quad 0x923f82a4af194f9b,0xab1c5ed5da6d8118 > +.quad 0x923f82a4af194f9b,0xab1c5ed5da6d8118 > +.quad 0xd807aa98a3030242,0x12835b0145706fbe > +.quad 0xd807aa98a3030242,0x12835b0145706fbe > +.quad 0x243185be4ee4b28c,0x550c7dc3d5ffb4e2 > 
+.quad 0x243185be4ee4b28c,0x550c7dc3d5ffb4e2 > +.quad 0x72be5d74f27b896f,0x80deb1fe3b1696b1 > +.quad 0x72be5d74f27b896f,0x80deb1fe3b1696b1 > +.quad 0x9bdc06a725c71235,0xc19bf174cf692694 > +.quad 0x9bdc06a725c71235,0xc19bf174cf692694 > +.quad 0xe49b69c19ef14ad2,0xefbe4786384f25e3 > +.quad 0xe49b69c19ef14ad2,0xefbe4786384f25e3 > +.quad 0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65 > +.quad 0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65 > +.quad 0x2de92c6f592b0275,0x4a7484aa6ea6e483 > +.quad 0x2de92c6f592b0275,0x4a7484aa6ea6e483 > +.quad 0x5cb0a9dcbd41fbd4,0x76f988da831153b5 > +.quad 0x5cb0a9dcbd41fbd4,0x76f988da831153b5 > +.quad 0x983e5152ee66dfab,0xa831c66d2db43210 > +.quad 0x983e5152ee66dfab,0xa831c66d2db43210 > +.quad 0xb00327c898fb213f,0xbf597fc7beef0ee4 > +.quad 0xb00327c898fb213f,0xbf597fc7beef0ee4 > +.quad 0xc6e00bf33da88fc2,0xd5a79147930aa725 > +.quad 0xc6e00bf33da88fc2,0xd5a79147930aa725 > +.quad 0x06ca6351e003826f,0x142929670a0e6e70 > +.quad 0x06ca6351e003826f,0x142929670a0e6e70 > +.quad 0x27b70a8546d22ffc,0x2e1b21385c26c926 > +.quad 0x27b70a8546d22ffc,0x2e1b21385c26c926 > +.quad 0x4d2c6dfc5ac42aed,0x53380d139d95b3df > +.quad 0x4d2c6dfc5ac42aed,0x53380d139d95b3df > +.quad 0x650a73548baf63de,0x766a0abb3c77b2a8 > +.quad 0x650a73548baf63de,0x766a0abb3c77b2a8 > +.quad 0x81c2c92e47edaee6,0x92722c851482353b > +.quad 0x81c2c92e47edaee6,0x92722c851482353b > +.quad 0xa2bfe8a14cf10364,0xa81a664bbc423001 > +.quad 0xa2bfe8a14cf10364,0xa81a664bbc423001 > +.quad 0xc24b8b70d0f89791,0xc76c51a30654be30 > +.quad 0xc24b8b70d0f89791,0xc76c51a30654be30 > +.quad 0xd192e819d6ef5218,0xd69906245565a910 > +.quad 0xd192e819d6ef5218,0xd69906245565a910 > +.quad 0xf40e35855771202a,0x106aa07032bbd1b8 > +.quad 0xf40e35855771202a,0x106aa07032bbd1b8 > +.quad 0x19a4c116b8d2d0c8,0x1e376c085141ab53 > +.quad 0x19a4c116b8d2d0c8,0x1e376c085141ab53 > +.quad 0x2748774cdf8eeb99,0x34b0bcb5e19b48a8 > +.quad 0x2748774cdf8eeb99,0x34b0bcb5e19b48a8 > +.quad 0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb > +.quad 0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb > +.quad 0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3 > +.quad 0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3 > +.quad 0x748f82ee5defb2fc,0x78a5636f43172f60 > +.quad 0x748f82ee5defb2fc,0x78a5636f43172f60 > +.quad 0x84c87814a1f0ab72,0x8cc702081a6439ec > +.quad 0x84c87814a1f0ab72,0x8cc702081a6439ec > +.quad 0x90befffa23631e28,0xa4506cebde82bde9 > +.quad 0x90befffa23631e28,0xa4506cebde82bde9 > +.quad 0xbef9a3f7b2c67915,0xc67178f2e372532b > +.quad 0xbef9a3f7b2c67915,0xc67178f2e372532b > +.quad 0xca273eceea26619c,0xd186b8c721c0c207 > +.quad 0xca273eceea26619c,0xd186b8c721c0c207 > +.quad 0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178 > +.quad 0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178 > +.quad 0x06f067aa72176fba,0x0a637dc5a2c898a6 > +.quad 0x06f067aa72176fba,0x0a637dc5a2c898a6 > +.quad 0x113f9804bef90dae,0x1b710b35131c471b > +.quad 0x113f9804bef90dae,0x1b710b35131c471b > +.quad 0x28db77f523047d84,0x32caab7b40c72493 > +.quad 0x28db77f523047d84,0x32caab7b40c72493 > +.quad 0x3c9ebe0a15c9bebc,0x431d67c49c100d4c > +.quad 0x3c9ebe0a15c9bebc,0x431d67c49c100d4c > +.quad 0x4cc5d4becb3e42b6,0x597f299cfc657e2a > +.quad 0x4cc5d4becb3e42b6,0x597f299cfc657e2a > +.quad 0x5fcb6fab3ad6faec,0x6c44198c4a475817 > +.quad 0x5fcb6fab3ad6faec,0x6c44198c4a475817 > + > +.quad 0x0001020304050607,0x08090a0b0c0d0e0f > +.quad 0x0001020304050607,0x08090a0b0c0d0e0f > +.byte > 83,72,65,53,49,50,32,98,108,111,99,107,32,116,114,97,110,115,102,111,114,1 > 09,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83, > 
32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,1 > 14,103,62,0 > diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/x86_64cpuid.S > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/x86_64cpuid.S > new file mode 100644 > index 0000000000..cac5f8f32c > --- /dev/null > +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/x86_64cpuid.S > @@ -0,0 +1,491 @@ > +# WARNING: do not edit! > +# Generated from openssl/crypto/x86_64cpuid.pl > +# > +# Copyright 2005-2020 The OpenSSL Project Authors. All Rights Reserved. > +# > +# Licensed under the OpenSSL license (the "License"). You may not use > +# this file except in compliance with the License. You can obtain a copy > +# in the file LICENSE in the source distribution or at > +# https://www.openssl.org/source/license.html > + > + > +.hidden OPENSSL_cpuid_setup > +.section .init > + call OPENSSL_cpuid_setup > + > +.hidden OPENSSL_ia32cap_P > +.comm OPENSSL_ia32cap_P,16,4 > + > +.text > + > +.globl OPENSSL_atomic_add > +.type OPENSSL_atomic_add,@function > +.align 16 > +OPENSSL_atomic_add: > +.cfi_startproc > + movl (%rdi),%eax > +.Lspin: leaq (%rsi,%rax,1),%r8 > +.byte 0xf0 > + cmpxchgl %r8d,(%rdi) > + jne .Lspin > + movl %r8d,%eax > +.byte 0x48,0x98 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size OPENSSL_atomic_add,.-OPENSSL_atomic_add > + > +.globl OPENSSL_rdtsc > +.type OPENSSL_rdtsc,@function > +.align 16 > +OPENSSL_rdtsc: > +.cfi_startproc > + rdtsc > + shlq $32,%rdx > + orq %rdx,%rax > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size OPENSSL_rdtsc,.-OPENSSL_rdtsc > + > +.globl OPENSSL_ia32_cpuid > +.type OPENSSL_ia32_cpuid,@function > +.align 16 > +OPENSSL_ia32_cpuid: > +.cfi_startproc > + movq %rbx,%r8 > +.cfi_register %rbx,%r8 > + > + xorl %eax,%eax > + movq %rax,8(%rdi) > + cpuid > + movl %eax,%r11d > + > + xorl %eax,%eax > + cmpl $0x756e6547,%ebx > + setne %al > + movl %eax,%r9d > + cmpl $0x49656e69,%edx > + setne %al > + orl %eax,%r9d > + cmpl $0x6c65746e,%ecx > + setne %al > + orl %eax,%r9d > + jz .Lintel > + > + cmpl $0x68747541,%ebx > + setne %al > + movl %eax,%r10d > + cmpl $0x69746E65,%edx > + setne %al > + orl %eax,%r10d > + cmpl $0x444D4163,%ecx > + setne %al > + orl %eax,%r10d > + jnz .Lintel > + > + > + movl $0x80000000,%eax > + cpuid > + cmpl $0x80000001,%eax > + jb .Lintel > + movl %eax,%r10d > + movl $0x80000001,%eax > + cpuid > + orl %ecx,%r9d > + andl $0x00000801,%r9d > + > + cmpl $0x80000008,%r10d > + jb .Lintel > + > + movl $0x80000008,%eax > + cpuid > + movzbq %cl,%r10 > + incq %r10 > + > + movl $1,%eax > + cpuid > + btl $28,%edx > + jnc .Lgeneric > + shrl $16,%ebx > + cmpb %r10b,%bl > + ja .Lgeneric > + andl $0xefffffff,%edx > + jmp .Lgeneric > + > +.Lintel: > + cmpl $4,%r11d > + movl $-1,%r10d > + jb .Lnocacheinfo > + > + movl $4,%eax > + movl $0,%ecx > + cpuid > + movl %eax,%r10d > + shrl $14,%r10d > + andl $0xfff,%r10d > + > +.Lnocacheinfo: > + movl $1,%eax > + cpuid > + movd %eax,%xmm0 > + andl $0xbfefffff,%edx > + cmpl $0,%r9d > + jne .Lnotintel > + orl $0x40000000,%edx > + andb $15,%ah > + cmpb $15,%ah > + jne .LnotP4 > + orl $0x00100000,%edx > +.LnotP4: > + cmpb $6,%ah > + jne .Lnotintel > + andl $0x0fff0ff0,%eax > + cmpl $0x00050670,%eax > + je .Lknights > + cmpl $0x00080650,%eax > + jne .Lnotintel > +.Lknights: > + andl $0xfbffffff,%ecx > + > +.Lnotintel: > + btl $28,%edx > + jnc .Lgeneric > + andl $0xefffffff,%edx > + cmpl $0,%r10d > + je .Lgeneric > + > + orl $0x10000000,%edx > + shrl $16,%ebx > + cmpb $1,%bl > + ja .Lgeneric > + andl $0xefffffff,%edx > +.Lgeneric: > + andl $0x00000800,%r9d > + 
andl $0xfffff7ff,%ecx > + orl %ecx,%r9d > + > + movl %edx,%r10d > + > + cmpl $7,%r11d > + jb .Lno_extended_info > + movl $7,%eax > + xorl %ecx,%ecx > + cpuid > + btl $26,%r9d > + jc .Lnotknights > + andl $0xfff7ffff,%ebx > +.Lnotknights: > + movd %xmm0,%eax > + andl $0x0fff0ff0,%eax > + cmpl $0x00050650,%eax > + jne .Lnotskylakex > + andl $0xfffeffff,%ebx > + > +.Lnotskylakex: > + movl %ebx,8(%rdi) > + movl %ecx,12(%rdi) > +.Lno_extended_info: > + > + btl $27,%r9d > + jnc .Lclear_avx > + xorl %ecx,%ecx > +.byte 0x0f,0x01,0xd0 > + andl $0xe6,%eax > + cmpl $0xe6,%eax > + je .Ldone > + andl $0x3fdeffff,8(%rdi) > + > + > + > + > + andl $6,%eax > + cmpl $6,%eax > + je .Ldone > +.Lclear_avx: > + movl $0xefffe7ff,%eax > + andl %eax,%r9d > + movl $0x3fdeffdf,%eax > + andl %eax,8(%rdi) > +.Ldone: > + shlq $32,%r9 > + movl %r10d,%eax > + movq %r8,%rbx > +.cfi_restore %rbx > + orq %r9,%rax > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size OPENSSL_ia32_cpuid,.-OPENSSL_ia32_cpuid > + > +.globl OPENSSL_cleanse > +.type OPENSSL_cleanse,@function > +.align 16 > +OPENSSL_cleanse: > +.cfi_startproc > + xorq %rax,%rax > + cmpq $15,%rsi > + jae .Lot > + cmpq $0,%rsi > + je .Lret > +.Little: > + movb %al,(%rdi) > + subq $1,%rsi > + leaq 1(%rdi),%rdi > + jnz .Little > +.Lret: > + .byte 0xf3,0xc3 > +.align 16 > +.Lot: > + testq $7,%rdi > + jz .Laligned > + movb %al,(%rdi) > + leaq -1(%rsi),%rsi > + leaq 1(%rdi),%rdi > + jmp .Lot > +.Laligned: > + movq %rax,(%rdi) > + leaq -8(%rsi),%rsi > + testq $-8,%rsi > + leaq 8(%rdi),%rdi > + jnz .Laligned > + cmpq $0,%rsi > + jne .Little > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size OPENSSL_cleanse,.-OPENSSL_cleanse > + > +.globl CRYPTO_memcmp > +.type CRYPTO_memcmp,@function > +.align 16 > +CRYPTO_memcmp: > +.cfi_startproc > + xorq %rax,%rax > + xorq %r10,%r10 > + cmpq $0,%rdx > + je .Lno_data > + cmpq $16,%rdx > + jne .Loop_cmp > + movq (%rdi),%r10 > + movq 8(%rdi),%r11 > + movq $1,%rdx > + xorq (%rsi),%r10 > + xorq 8(%rsi),%r11 > + orq %r11,%r10 > + cmovnzq %rdx,%rax > + .byte 0xf3,0xc3 > + > +.align 16 > +.Loop_cmp: > + movb (%rdi),%r10b > + leaq 1(%rdi),%rdi > + xorb (%rsi),%r10b > + leaq 1(%rsi),%rsi > + orb %r10b,%al > + decq %rdx > + jnz .Loop_cmp > + negq %rax > + shrq $63,%rax > +.Lno_data: > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size CRYPTO_memcmp,.-CRYPTO_memcmp > +.globl OPENSSL_wipe_cpu > +.type OPENSSL_wipe_cpu,@function > +.align 16 > +OPENSSL_wipe_cpu: > +.cfi_startproc > + pxor %xmm0,%xmm0 > + pxor %xmm1,%xmm1 > + pxor %xmm2,%xmm2 > + pxor %xmm3,%xmm3 > + pxor %xmm4,%xmm4 > + pxor %xmm5,%xmm5 > + pxor %xmm6,%xmm6 > + pxor %xmm7,%xmm7 > + pxor %xmm8,%xmm8 > + pxor %xmm9,%xmm9 > + pxor %xmm10,%xmm10 > + pxor %xmm11,%xmm11 > + pxor %xmm12,%xmm12 > + pxor %xmm13,%xmm13 > + pxor %xmm14,%xmm14 > + pxor %xmm15,%xmm15 > + xorq %rcx,%rcx > + xorq %rdx,%rdx > + xorq %rsi,%rsi > + xorq %rdi,%rdi > + xorq %r8,%r8 > + xorq %r9,%r9 > + xorq %r10,%r10 > + xorq %r11,%r11 > + leaq 8(%rsp),%rax > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size OPENSSL_wipe_cpu,.-OPENSSL_wipe_cpu > +.globl OPENSSL_instrument_bus > +.type OPENSSL_instrument_bus,@function > +.align 16 > +OPENSSL_instrument_bus: > +.cfi_startproc > + movq %rdi,%r10 > + movq %rsi,%rcx > + movq %rsi,%r11 > + > + rdtsc > + movl %eax,%r8d > + movl $0,%r9d > + clflush (%r10) > +.byte 0xf0 > + addl %r9d,(%r10) > + jmp .Loop > +.align 16 > +.Loop: rdtsc > + movl %eax,%edx > + subl %r8d,%eax > + movl %edx,%r8d > + movl %eax,%r9d > + clflush (%r10) > +.byte 0xf0 > + addl %eax,(%r10) > + leaq 4(%r10),%r10 > + subq $1,%rcx > + jnz 
.Loop > + > + movq %r11,%rax > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size OPENSSL_instrument_bus,.-OPENSSL_instrument_bus > + > +.globl OPENSSL_instrument_bus2 > +.type OPENSSL_instrument_bus2,@function > +.align 16 > +OPENSSL_instrument_bus2: > +.cfi_startproc > + movq %rdi,%r10 > + movq %rsi,%rcx > + movq %rdx,%r11 > + movq %rcx,8(%rsp) > + > + rdtsc > + movl %eax,%r8d > + movl $0,%r9d > + > + clflush (%r10) > +.byte 0xf0 > + addl %r9d,(%r10) > + > + rdtsc > + movl %eax,%edx > + subl %r8d,%eax > + movl %edx,%r8d > + movl %eax,%r9d > +.Loop2: > + clflush (%r10) > +.byte 0xf0 > + addl %eax,(%r10) > + > + subq $1,%r11 > + jz .Ldone2 > + > + rdtsc > + movl %eax,%edx > + subl %r8d,%eax > + movl %edx,%r8d > + cmpl %r9d,%eax > + movl %eax,%r9d > + movl $0,%edx > + setne %dl > + subq %rdx,%rcx > + leaq (%r10,%rdx,4),%r10 > + jnz .Loop2 > + > +.Ldone2: > + movq 8(%rsp),%rax > + subq %rcx,%rax > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size OPENSSL_instrument_bus2,.-OPENSSL_instrument_bus2 > +.globl OPENSSL_ia32_rdrand_bytes > +.type OPENSSL_ia32_rdrand_bytes,@function > +.align 16 > +OPENSSL_ia32_rdrand_bytes: > +.cfi_startproc > + xorq %rax,%rax > + cmpq $0,%rsi > + je .Ldone_rdrand_bytes > + > + movq $8,%r11 > +.Loop_rdrand_bytes: > +.byte 73,15,199,242 > + jc .Lbreak_rdrand_bytes > + decq %r11 > + jnz .Loop_rdrand_bytes > + jmp .Ldone_rdrand_bytes > + > +.align 16 > +.Lbreak_rdrand_bytes: > + cmpq $8,%rsi > + jb .Ltail_rdrand_bytes > + movq %r10,(%rdi) > + leaq 8(%rdi),%rdi > + addq $8,%rax > + subq $8,%rsi > + jz .Ldone_rdrand_bytes > + movq $8,%r11 > + jmp .Loop_rdrand_bytes > + > +.align 16 > +.Ltail_rdrand_bytes: > + movb %r10b,(%rdi) > + leaq 1(%rdi),%rdi > + incq %rax > + shrq $8,%r10 > + decq %rsi > + jnz .Ltail_rdrand_bytes > + > +.Ldone_rdrand_bytes: > + xorq %r10,%r10 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size OPENSSL_ia32_rdrand_bytes,.-OPENSSL_ia32_rdrand_bytes > +.globl OPENSSL_ia32_rdseed_bytes > +.type OPENSSL_ia32_rdseed_bytes,@function > +.align 16 > +OPENSSL_ia32_rdseed_bytes: > +.cfi_startproc > + xorq %rax,%rax > + cmpq $0,%rsi > + je .Ldone_rdseed_bytes > + > + movq $8,%r11 > +.Loop_rdseed_bytes: > +.byte 73,15,199,250 > + jc .Lbreak_rdseed_bytes > + decq %r11 > + jnz .Loop_rdseed_bytes > + jmp .Ldone_rdseed_bytes > + > +.align 16 > +.Lbreak_rdseed_bytes: > + cmpq $8,%rsi > + jb .Ltail_rdseed_bytes > + movq %r10,(%rdi) > + leaq 8(%rdi),%rdi > + addq $8,%rax > + subq $8,%rsi > + jz .Ldone_rdseed_bytes > + movq $8,%r11 > + jmp .Loop_rdseed_bytes > + > +.align 16 > +.Ltail_rdseed_bytes: > + movb %r10b,(%rdi) > + leaq 1(%rdi),%rdi > + incq %rax > + shrq $8,%r10 > + decq %rsi > + jnz .Ltail_rdseed_bytes > + > +.Ldone_rdseed_bytes: > + xorq %r10,%r10 > + .byte 0xf3,0xc3 > +.cfi_endproc > +.size OPENSSL_ia32_rdseed_bytes,.-OPENSSL_ia32_rdseed_bytes > -- > 2.32.0.windows.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [edk2-devel] [PATCH v7 3/3] CryptoPkg/OpensslLib: Commit the auto-generated assembly files for X64
  2021-07-21 11:44   ` Yao, Jiewen
@ 2021-07-26 10:08     ` gaoliming
  0 siblings, 0 replies; 13+ messages in thread
From: gaoliming @ 2021-07-26 10:08 UTC (permalink / raw)
  To: devel, jiewen.yao, christopher.zurcher
  Cc: 'Wang, Jian J', 'Lu, XiaoyuX', 'Kinney, Michael D', 'Ard Biesheuvel'

Hi all,

I have merged this patch set into edk2 as commit range
332632abf3eb23fe7fcb0601bc715ba829b33e79..147f34b56ce0e2e18285ef7d0695753ac0aa5085

Thanks
Liming

> -----Original Message-----
> From: devel@edk2.groups.io <devel@edk2.groups.io> on behalf of Yao, Jiewen
> Sent: July 21, 2021 19:45
> To: christopher.zurcher@outlook.com; devel@edk2.groups.io
> Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>;
>     Kinney, Michael D <michael.d.kinney@intel.com>; Ard Biesheuvel <ardb@kernel.org>
> Subject: Re: [edk2-devel] [PATCH v7 3/3] CryptoPkg/OpensslLib: Commit the
>     auto-generated assembly files for X64
>
> Reviewed-by: Jiewen Yao <Jiewen.yao@intel.com>
>
> > -----Original Message-----
> > From: christopher.zurcher@outlook.com <christopher.zurcher@outlook.com>
> > Sent: Wednesday, July 21, 2021 6:07 AM
> > To: devel@edk2.groups.io
> > Cc: Yao, Jiewen <jiewen.yao@intel.com>; Wang, Jian J <jian.j.wang@intel.com>;
> >     Lu, XiaoyuX <xiaoyux.lu@intel.com>; Kinney, Michael D <michael.d.kinney@intel.com>;
> >     Ard Biesheuvel <ardb@kernel.org>
> > Subject: [PATCH v7 3/3] CryptoPkg/OpensslLib: Commit the auto-generated
> >     assembly files for X64
> >
> > From: Christopher Zurcher <christopher.zurcher@microsoft.com>
> >
> > BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507
> >
> > Adding the auto-generated assembly files for X64 architectures.
> >
> > Cc: Jiewen Yao <jiewen.yao@intel.com>
> > Cc: Jian J Wang <jian.j.wang@intel.com>
> > Cc: Xiaoyu Lu <xiaoyux.lu@intel.com>
> > Cc: Mike Kinney <michael.d.kinney@intel.com>
> > Cc: Ard Biesheuvel <ardb@kernel.org>
> > Signed-off-by: Christopher Zurcher <christopher.zurcher@microsoft.com>
> > ---
> >  CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm     |  732 +++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm   | 1916 ++++++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm |   78 +
> >  CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm        | 5103 ++++++++++++++++++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm        | 1173 +++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm  |   34 +
> >  CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm      | 1569 ++++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm      | 3137 ++++++++++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm         | 2884 +++++++++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm    | 3461 +++++++++++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm       | 3313 +++++++++++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm       | 1938 ++++++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm             |  491 ++
> >  CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-mb-x86_64.S     |  552 +++
> >  CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha1-x86_64.S   | 1719 +++++++
> >  CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha256-x86_64.S |   69 +
> >  CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-x86_64.S        | 4484 +++++++++++++++++
> CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/vpaes-x86_64.S > | 863 > > ++++ > > > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/aesni-gcm-x86_64.S > | > > 29 + > > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/ghash-x86_64.S > | 1386 > > ++++++ > > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-mb-x86_64.S > | 2962 > > ++++++++++++ > > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-x86_64.S > | 2631 > > ++++++++++ > > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-mb-x86_64.S > | > > 3286 +++++++++++++ > > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-x86_64.S > | 3097 > > ++++++++++++ > > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha512-x86_64.S > | 1811 > > +++++++ > > CryptoPkg/Library/OpensslLib/X64Gcc/crypto/x86_64cpuid.S > | 491 ++ > > 26 files changed, 49209 insertions(+) > > > > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb- > > x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb- > > x86_64.nasm > > new file mode 100644 > > index 0000000000..1a3ed1dd35 > > --- /dev/null > > +++ > b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm > > @@ -0,0 +1,732 @@ > > +; WARNING: do not edit! > > +; Generated from openssl/crypto/aes/asm/aesni-mb-x86_64.pl > > +; > > +; Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. > > +; > > +; Licensed under the OpenSSL license (the "License"). You may not use > > +; this file except in compliance with the License. You can obtain a copy > > +; in the file LICENSE in the source distribution or at > > +; https://www.openssl.org/source/license.html > > + > > +default rel > > +%define XMMWORD > > +%define YMMWORD > > +%define ZMMWORD > > +section .text code align=64 > > + > > + > > +EXTERN OPENSSL_ia32cap_P > > + > > +global aesni_multi_cbc_encrypt > > + > > +ALIGN 32 > > +aesni_multi_cbc_encrypt: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_aesni_multi_cbc_encrypt: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + > > + > > + > > + mov rax,rsp > > + > > + push rbx > > + > > + push rbp > > + > > + push r12 > > + > > + push r13 > > + > > + push r14 > > + > > + push r15 > > + > > + lea rsp,[((-168))+rsp] > > + movaps XMMWORD[rsp],xmm6 > > + movaps XMMWORD[16+rsp],xmm7 > > + movaps XMMWORD[32+rsp],xmm8 > > + movaps XMMWORD[48+rsp],xmm9 > > + movaps XMMWORD[64+rsp],xmm10 > > + movaps XMMWORD[80+rsp],xmm11 > > + movaps XMMWORD[96+rsp],xmm12 > > + movaps XMMWORD[(-104)+rax],xmm13 > > + movaps XMMWORD[(-88)+rax],xmm14 > > + movaps XMMWORD[(-72)+rax],xmm15 > > + > > + > > + > > + > > + > > + > > + sub rsp,48 > > + and rsp,-64 > > + mov QWORD[16+rsp],rax > > + > > + > > +$L$enc4x_body: > > + movdqu xmm12,XMMWORD[rsi] > > + lea rsi,[120+rsi] > > + lea rdi,[80+rdi] > > + > > +$L$enc4x_loop_grande: > > + mov DWORD[24+rsp],edx > > + xor edx,edx > > + mov ecx,DWORD[((-64))+rdi] > > + mov r8,QWORD[((-80))+rdi] > > + cmp ecx,edx > > + mov r12,QWORD[((-72))+rdi] > > + cmovg edx,ecx > > + test ecx,ecx > > + movdqu xmm2,XMMWORD[((-56))+rdi] > > + mov DWORD[32+rsp],ecx > > + cmovle r8,rsp > > + mov ecx,DWORD[((-24))+rdi] > > + mov r9,QWORD[((-40))+rdi] > > + cmp ecx,edx > > + mov r13,QWORD[((-32))+rdi] > > + cmovg edx,ecx > > + test ecx,ecx > > + movdqu xmm3,XMMWORD[((-16))+rdi] > > + mov DWORD[36+rsp],ecx > > + cmovle r9,rsp > > + mov ecx,DWORD[16+rdi] > > + mov r10,QWORD[rdi] > > + cmp ecx,edx > > + mov r14,QWORD[8+rdi] > > + cmovg edx,ecx > > + test ecx,ecx > > + movdqu xmm4,XMMWORD[24+rdi] > > + 
mov DWORD[40+rsp],ecx > > + cmovle r10,rsp > > + mov ecx,DWORD[56+rdi] > > + mov r11,QWORD[40+rdi] > > + cmp ecx,edx > > + mov r15,QWORD[48+rdi] > > + cmovg edx,ecx > > + test ecx,ecx > > + movdqu xmm5,XMMWORD[64+rdi] > > + mov DWORD[44+rsp],ecx > > + cmovle r11,rsp > > + test edx,edx > > + jz NEAR $L$enc4x_done > > + > > + movups xmm1,XMMWORD[((16-120))+rsi] > > + pxor xmm2,xmm12 > > + movups xmm0,XMMWORD[((32-120))+rsi] > > + pxor xmm3,xmm12 > > + mov eax,DWORD[((240-120))+rsi] > > + pxor xmm4,xmm12 > > + movdqu xmm6,XMMWORD[r8] > > + pxor xmm5,xmm12 > > + movdqu xmm7,XMMWORD[r9] > > + pxor xmm2,xmm6 > > + movdqu xmm8,XMMWORD[r10] > > + pxor xmm3,xmm7 > > + movdqu xmm9,XMMWORD[r11] > > + pxor xmm4,xmm8 > > + pxor xmm5,xmm9 > > + movdqa xmm10,XMMWORD[32+rsp] > > + xor rbx,rbx > > + jmp NEAR $L$oop_enc4x > > + > > +ALIGN 32 > > +$L$oop_enc4x: > > + add rbx,16 > > + lea rbp,[16+rsp] > > + mov ecx,1 > > + sub rbp,rbx > > + > > +DB 102,15,56,220,209 > > + prefetcht0 [31+rbx*1+r8] > > + prefetcht0 [31+rbx*1+r9] > > +DB 102,15,56,220,217 > > + prefetcht0 [31+rbx*1+r10] > > + prefetcht0 [31+rbx*1+r10] > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > + movups xmm1,XMMWORD[((48-120))+rsi] > > + cmp ecx,DWORD[32+rsp] > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > +DB 102,15,56,220,224 > > + cmovge r8,rbp > > + cmovg r12,rbp > > +DB 102,15,56,220,232 > > + movups xmm0,XMMWORD[((-56))+rsi] > > + cmp ecx,DWORD[36+rsp] > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > + cmovge r9,rbp > > + cmovg r13,rbp > > +DB 102,15,56,220,233 > > + movups xmm1,XMMWORD[((-40))+rsi] > > + cmp ecx,DWORD[40+rsp] > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > +DB 102,15,56,220,224 > > + cmovge r10,rbp > > + cmovg r14,rbp > > +DB 102,15,56,220,232 > > + movups xmm0,XMMWORD[((-24))+rsi] > > + cmp ecx,DWORD[44+rsp] > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > + cmovge r11,rbp > > + cmovg r15,rbp > > +DB 102,15,56,220,233 > > + movups xmm1,XMMWORD[((-8))+rsi] > > + movdqa xmm11,xmm10 > > +DB 102,15,56,220,208 > > + prefetcht0 [15+rbx*1+r12] > > + prefetcht0 [15+rbx*1+r13] > > +DB 102,15,56,220,216 > > + prefetcht0 [15+rbx*1+r14] > > + prefetcht0 [15+rbx*1+r15] > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > + movups xmm0,XMMWORD[((128-120))+rsi] > > + pxor xmm12,xmm12 > > + > > +DB 102,15,56,220,209 > > + pcmpgtd xmm11,xmm12 > > + movdqu xmm12,XMMWORD[((-120))+rsi] > > +DB 102,15,56,220,217 > > + paddd xmm10,xmm11 > > + movdqa XMMWORD[32+rsp],xmm10 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > + movups xmm1,XMMWORD[((144-120))+rsi] > > + > > + cmp eax,11 > > + > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > + movups xmm0,XMMWORD[((160-120))+rsi] > > + > > + jb NEAR $L$enc4x_tail > > + > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > + movups xmm1,XMMWORD[((176-120))+rsi] > > + > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > + movups xmm0,XMMWORD[((192-120))+rsi] > > + > > + je NEAR $L$enc4x_tail > > + > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > + movups xmm1,XMMWORD[((208-120))+rsi] > > + > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > + movups xmm0,XMMWORD[((224-120))+rsi] > > + jmp NEAR $L$enc4x_tail > > + > > 
+ALIGN 32 > > +$L$enc4x_tail: > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > + movdqu xmm6,XMMWORD[rbx*1+r8] > > + movdqu xmm1,XMMWORD[((16-120))+rsi] > > + > > +DB 102,15,56,221,208 > > + movdqu xmm7,XMMWORD[rbx*1+r9] > > + pxor xmm6,xmm12 > > +DB 102,15,56,221,216 > > + movdqu xmm8,XMMWORD[rbx*1+r10] > > + pxor xmm7,xmm12 > > +DB 102,15,56,221,224 > > + movdqu xmm9,XMMWORD[rbx*1+r11] > > + pxor xmm8,xmm12 > > +DB 102,15,56,221,232 > > + movdqu xmm0,XMMWORD[((32-120))+rsi] > > + pxor xmm9,xmm12 > > + > > + movups XMMWORD[(-16)+rbx*1+r12],xmm2 > > + pxor xmm2,xmm6 > > + movups XMMWORD[(-16)+rbx*1+r13],xmm3 > > + pxor xmm3,xmm7 > > + movups XMMWORD[(-16)+rbx*1+r14],xmm4 > > + pxor xmm4,xmm8 > > + movups XMMWORD[(-16)+rbx*1+r15],xmm5 > > + pxor xmm5,xmm9 > > + > > + dec edx > > + jnz NEAR $L$oop_enc4x > > + > > + mov rax,QWORD[16+rsp] > > + > > + mov edx,DWORD[24+rsp] > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + lea rdi,[160+rdi] > > + dec edx > > + jnz NEAR $L$enc4x_loop_grande > > + > > +$L$enc4x_done: > > + movaps xmm6,XMMWORD[((-216))+rax] > > + movaps xmm7,XMMWORD[((-200))+rax] > > + movaps xmm8,XMMWORD[((-184))+rax] > > + movaps xmm9,XMMWORD[((-168))+rax] > > + movaps xmm10,XMMWORD[((-152))+rax] > > + movaps xmm11,XMMWORD[((-136))+rax] > > + movaps xmm12,XMMWORD[((-120))+rax] > > + > > + > > + > > + mov r15,QWORD[((-48))+rax] > > + > > + mov r14,QWORD[((-40))+rax] > > + > > + mov r13,QWORD[((-32))+rax] > > + > > + mov r12,QWORD[((-24))+rax] > > + > > + mov rbp,QWORD[((-16))+rax] > > + > > + mov rbx,QWORD[((-8))+rax] > > + > > + lea rsp,[rax] > > + > > +$L$enc4x_epilogue: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_aesni_multi_cbc_encrypt: > > + > > +global aesni_multi_cbc_decrypt > > + > > +ALIGN 32 > > +aesni_multi_cbc_decrypt: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_aesni_multi_cbc_decrypt: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + > > + > > + > > + mov rax,rsp > > + > > + push rbx > > + > > + push rbp > > + > > + push r12 > > + > > + push r13 > > + > > + push r14 > > + > > + push r15 > > + > > + lea rsp,[((-168))+rsp] > > + movaps XMMWORD[rsp],xmm6 > > + movaps XMMWORD[16+rsp],xmm7 > > + movaps XMMWORD[32+rsp],xmm8 > > + movaps XMMWORD[48+rsp],xmm9 > > + movaps XMMWORD[64+rsp],xmm10 > > + movaps XMMWORD[80+rsp],xmm11 > > + movaps XMMWORD[96+rsp],xmm12 > > + movaps XMMWORD[(-104)+rax],xmm13 > > + movaps XMMWORD[(-88)+rax],xmm14 > > + movaps XMMWORD[(-72)+rax],xmm15 > > + > > + > > + > > + > > + > > + > > + sub rsp,48 > > + and rsp,-64 > > + mov QWORD[16+rsp],rax > > + > > + > > +$L$dec4x_body: > > + movdqu xmm12,XMMWORD[rsi] > > + lea rsi,[120+rsi] > > + lea rdi,[80+rdi] > > + > > +$L$dec4x_loop_grande: > > + mov DWORD[24+rsp],edx > > + xor edx,edx > > + mov ecx,DWORD[((-64))+rdi] > > + mov r8,QWORD[((-80))+rdi] > > + cmp ecx,edx > > + mov r12,QWORD[((-72))+rdi] > > + cmovg edx,ecx > > + test ecx,ecx > > + movdqu xmm6,XMMWORD[((-56))+rdi] > > + mov DWORD[32+rsp],ecx > > + cmovle r8,rsp > > + mov ecx,DWORD[((-24))+rdi] > > + mov r9,QWORD[((-40))+rdi] > > + cmp ecx,edx > > + mov r13,QWORD[((-32))+rdi] > > + cmovg edx,ecx > > + test ecx,ecx > > + movdqu xmm7,XMMWORD[((-16))+rdi] > > + mov DWORD[36+rsp],ecx > > + cmovle r9,rsp > > + mov ecx,DWORD[16+rdi] > > + mov r10,QWORD[rdi] > > + cmp ecx,edx > > + mov r14,QWORD[8+rdi] > > + cmovg edx,ecx > > + 
test ecx,ecx > > + movdqu xmm8,XMMWORD[24+rdi] > > + mov DWORD[40+rsp],ecx > > + cmovle r10,rsp > > + mov ecx,DWORD[56+rdi] > > + mov r11,QWORD[40+rdi] > > + cmp ecx,edx > > + mov r15,QWORD[48+rdi] > > + cmovg edx,ecx > > + test ecx,ecx > > + movdqu xmm9,XMMWORD[64+rdi] > > + mov DWORD[44+rsp],ecx > > + cmovle r11,rsp > > + test edx,edx > > + jz NEAR $L$dec4x_done > > + > > + movups xmm1,XMMWORD[((16-120))+rsi] > > + movups xmm0,XMMWORD[((32-120))+rsi] > > + mov eax,DWORD[((240-120))+rsi] > > + movdqu xmm2,XMMWORD[r8] > > + movdqu xmm3,XMMWORD[r9] > > + pxor xmm2,xmm12 > > + movdqu xmm4,XMMWORD[r10] > > + pxor xmm3,xmm12 > > + movdqu xmm5,XMMWORD[r11] > > + pxor xmm4,xmm12 > > + pxor xmm5,xmm12 > > + movdqa xmm10,XMMWORD[32+rsp] > > + xor rbx,rbx > > + jmp NEAR $L$oop_dec4x > > + > > +ALIGN 32 > > +$L$oop_dec4x: > > + add rbx,16 > > + lea rbp,[16+rsp] > > + mov ecx,1 > > + sub rbp,rbx > > + > > +DB 102,15,56,222,209 > > + prefetcht0 [31+rbx*1+r8] > > + prefetcht0 [31+rbx*1+r9] > > +DB 102,15,56,222,217 > > + prefetcht0 [31+rbx*1+r10] > > + prefetcht0 [31+rbx*1+r11] > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > + movups xmm1,XMMWORD[((48-120))+rsi] > > + cmp ecx,DWORD[32+rsp] > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > + cmovge r8,rbp > > + cmovg r12,rbp > > +DB 102,15,56,222,232 > > + movups xmm0,XMMWORD[((-56))+rsi] > > + cmp ecx,DWORD[36+rsp] > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > + cmovge r9,rbp > > + cmovg r13,rbp > > +DB 102,15,56,222,233 > > + movups xmm1,XMMWORD[((-40))+rsi] > > + cmp ecx,DWORD[40+rsp] > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > + cmovge r10,rbp > > + cmovg r14,rbp > > +DB 102,15,56,222,232 > > + movups xmm0,XMMWORD[((-24))+rsi] > > + cmp ecx,DWORD[44+rsp] > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > + cmovge r11,rbp > > + cmovg r15,rbp > > +DB 102,15,56,222,233 > > + movups xmm1,XMMWORD[((-8))+rsi] > > + movdqa xmm11,xmm10 > > +DB 102,15,56,222,208 > > + prefetcht0 [15+rbx*1+r12] > > + prefetcht0 [15+rbx*1+r13] > > +DB 102,15,56,222,216 > > + prefetcht0 [15+rbx*1+r14] > > + prefetcht0 [15+rbx*1+r15] > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > + movups xmm0,XMMWORD[((128-120))+rsi] > > + pxor xmm12,xmm12 > > + > > +DB 102,15,56,222,209 > > + pcmpgtd xmm11,xmm12 > > + movdqu xmm12,XMMWORD[((-120))+rsi] > > +DB 102,15,56,222,217 > > + paddd xmm10,xmm11 > > + movdqa XMMWORD[32+rsp],xmm10 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > + movups xmm1,XMMWORD[((144-120))+rsi] > > + > > + cmp eax,11 > > + > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > + movups xmm0,XMMWORD[((160-120))+rsi] > > + > > + jb NEAR $L$dec4x_tail > > + > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > + movups xmm1,XMMWORD[((176-120))+rsi] > > + > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > + movups xmm0,XMMWORD[((192-120))+rsi] > > + > > + je NEAR $L$dec4x_tail > > + > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > + movups xmm1,XMMWORD[((208-120))+rsi] > > + > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > + movups xmm0,XMMWORD[((224-120))+rsi] > > + jmp NEAR $L$dec4x_tail > > + > > +ALIGN 32 > > +$L$dec4x_tail: > > 
+DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > + pxor xmm6,xmm0 > > + pxor xmm7,xmm0 > > +DB 102,15,56,222,233 > > + movdqu xmm1,XMMWORD[((16-120))+rsi] > > + pxor xmm8,xmm0 > > + pxor xmm9,xmm0 > > + movdqu xmm0,XMMWORD[((32-120))+rsi] > > + > > +DB 102,15,56,223,214 > > +DB 102,15,56,223,223 > > + movdqu xmm6,XMMWORD[((-16))+rbx*1+r8] > > + movdqu xmm7,XMMWORD[((-16))+rbx*1+r9] > > +DB 102,65,15,56,223,224 > > +DB 102,65,15,56,223,233 > > + movdqu xmm8,XMMWORD[((-16))+rbx*1+r10] > > + movdqu xmm9,XMMWORD[((-16))+rbx*1+r11] > > + > > + movups XMMWORD[(-16)+rbx*1+r12],xmm2 > > + movdqu xmm2,XMMWORD[rbx*1+r8] > > + movups XMMWORD[(-16)+rbx*1+r13],xmm3 > > + movdqu xmm3,XMMWORD[rbx*1+r9] > > + pxor xmm2,xmm12 > > + movups XMMWORD[(-16)+rbx*1+r14],xmm4 > > + movdqu xmm4,XMMWORD[rbx*1+r10] > > + pxor xmm3,xmm12 > > + movups XMMWORD[(-16)+rbx*1+r15],xmm5 > > + movdqu xmm5,XMMWORD[rbx*1+r11] > > + pxor xmm4,xmm12 > > + pxor xmm5,xmm12 > > + > > + dec edx > > + jnz NEAR $L$oop_dec4x > > + > > + mov rax,QWORD[16+rsp] > > + > > + mov edx,DWORD[24+rsp] > > + > > + lea rdi,[160+rdi] > > + dec edx > > + jnz NEAR $L$dec4x_loop_grande > > + > > +$L$dec4x_done: > > + movaps xmm6,XMMWORD[((-216))+rax] > > + movaps xmm7,XMMWORD[((-200))+rax] > > + movaps xmm8,XMMWORD[((-184))+rax] > > + movaps xmm9,XMMWORD[((-168))+rax] > > + movaps xmm10,XMMWORD[((-152))+rax] > > + movaps xmm11,XMMWORD[((-136))+rax] > > + movaps xmm12,XMMWORD[((-120))+rax] > > + > > + > > + > > + mov r15,QWORD[((-48))+rax] > > + > > + mov r14,QWORD[((-40))+rax] > > + > > + mov r13,QWORD[((-32))+rax] > > + > > + mov r12,QWORD[((-24))+rax] > > + > > + mov rbp,QWORD[((-16))+rax] > > + > > + mov rbx,QWORD[((-8))+rax] > > + > > + lea rsp,[rax] > > + > > +$L$dec4x_epilogue: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_aesni_multi_cbc_decrypt: > > +EXTERN __imp_RtlVirtualUnwind > > + > > +ALIGN 16 > > +se_handler: > > + push rsi > > + push rdi > > + push rbx > > + push rbp > > + push r12 > > + push r13 > > + push r14 > > + push r15 > > + pushfq > > + sub rsp,64 > > + > > + mov rax,QWORD[120+r8] > > + mov rbx,QWORD[248+r8] > > + > > + mov rsi,QWORD[8+r9] > > + mov r11,QWORD[56+r9] > > + > > + mov r10d,DWORD[r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jb NEAR $L$in_prologue > > + > > + mov rax,QWORD[152+r8] > > + > > + mov r10d,DWORD[4+r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jae NEAR $L$in_prologue > > + > > + mov rax,QWORD[16+rax] > > + > > + mov rbx,QWORD[((-8))+rax] > > + mov rbp,QWORD[((-16))+rax] > > + mov r12,QWORD[((-24))+rax] > > + mov r13,QWORD[((-32))+rax] > > + mov r14,QWORD[((-40))+rax] > > + mov r15,QWORD[((-48))+rax] > > + mov QWORD[144+r8],rbx > > + mov QWORD[160+r8],rbp > > + mov QWORD[216+r8],r12 > > + mov QWORD[224+r8],r13 > > + mov QWORD[232+r8],r14 > > + mov QWORD[240+r8],r15 > > + > > + lea rsi,[((-56-160))+rax] > > + lea rdi,[512+r8] > > + mov ecx,20 > > + DD 0xa548f3fc > > + > > +$L$in_prologue: > > + mov rdi,QWORD[8+rax] > > + mov rsi,QWORD[16+rax] > > + mov QWORD[152+r8],rax > > + mov QWORD[168+r8],rsi > > + mov QWORD[176+r8],rdi > > + > > + mov rdi,QWORD[40+r9] > > + mov rsi,r8 > > + mov ecx,154 > > + DD 0xa548f3fc > > + > > + mov rsi,r9 > > + xor rcx,rcx > > + mov rdx,QWORD[8+rsi] > > + mov r8,QWORD[rsi] > > + mov r9,QWORD[16+rsi] > > + mov r10,QWORD[40+rsi] > > + lea r11,[56+rsi] > > + lea r12,[24+rsi] > > + mov QWORD[32+rsp],r10 > > + mov QWORD[40+rsp],r11 > > + mov QWORD[48+rsp],r12 > > + 
mov QWORD[56+rsp],rcx > > + call QWORD[__imp_RtlVirtualUnwind] > > + > > + mov eax,1 > > + add rsp,64 > > + popfq > > + pop r15 > > + pop r14 > > + pop r13 > > + pop r12 > > + pop rbp > > + pop rbx > > + pop rdi > > + pop rsi > > + DB 0F3h,0C3h ;repret > > + > > + > > +section .pdata rdata align=4 > > +ALIGN 4 > > + DD $L$SEH_begin_aesni_multi_cbc_encrypt > wrt ..imagebase > > + DD $L$SEH_end_aesni_multi_cbc_encrypt > wrt ..imagebase > > + DD $L$SEH_info_aesni_multi_cbc_encrypt > wrt ..imagebase > > + DD $L$SEH_begin_aesni_multi_cbc_decrypt > wrt ..imagebase > > + DD $L$SEH_end_aesni_multi_cbc_decrypt > wrt ..imagebase > > + DD $L$SEH_info_aesni_multi_cbc_decrypt > wrt ..imagebase > > +section .xdata rdata align=8 > > +ALIGN 8 > > +$L$SEH_info_aesni_multi_cbc_encrypt: > > +DB 9,0,0,0 > > + DD se_handler wrt ..imagebase > > + DD $L$enc4x_body wrt ..imagebase,$L$enc4x_epilogue > > wrt ..imagebase > > +$L$SEH_info_aesni_multi_cbc_decrypt: > > +DB 9,0,0,0 > > + DD se_handler wrt ..imagebase > > + DD $L$dec4x_body wrt ..imagebase,$L$dec4x_epilogue > > wrt ..imagebase > > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1- > > x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1- > > x86_64.nasm > > new file mode 100644 > > index 0000000000..f4fd9ca50d > > --- /dev/null > > +++ > b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm > > @@ -0,0 +1,1916 @@ > > +; WARNING: do not edit! > > +; Generated from openssl/crypto/aes/asm/aesni-sha1-x86_64.pl > > +; > > +; Copyright 2011-2020 The OpenSSL Project Authors. All Rights Reserved. > > +; > > +; Licensed under the OpenSSL license (the "License"). You may not use > > +; this file except in compliance with the License. You can obtain a copy > > +; in the file LICENSE in the source distribution or at > > +; https://www.openssl.org/source/license.html > > + > > +default rel > > +%define XMMWORD > > +%define YMMWORD > > +%define ZMMWORD > > +section .text code align=64 > > + > > +EXTERN OPENSSL_ia32cap_P > > + > > +global aesni_cbc_sha1_enc > > + > > +ALIGN 32 > > +aesni_cbc_sha1_enc: > > + > > + > > + mov r10d,DWORD[((OPENSSL_ia32cap_P+0))] > > + mov r11,QWORD[((OPENSSL_ia32cap_P+4))] > > + bt r11,61 > > + jc NEAR aesni_cbc_sha1_enc_shaext > > + jmp NEAR aesni_cbc_sha1_enc_ssse3 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +ALIGN 32 > > +aesni_cbc_sha1_enc_ssse3: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_aesni_cbc_sha1_enc_ssse3: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + mov rcx,r9 > > + mov r8,QWORD[40+rsp] > > + mov r9,QWORD[48+rsp] > > + > > + > > + > > + mov r10,QWORD[56+rsp] > > + > > + > > + push rbx > > + > > + push rbp > > + > > + push r12 > > + > > + push r13 > > + > > + push r14 > > + > > + push r15 > > + > > + lea rsp,[((-264))+rsp] > > + > > + > > + > > + movaps XMMWORD[(96+0)+rsp],xmm6 > > + movaps XMMWORD[(96+16)+rsp],xmm7 > > + movaps XMMWORD[(96+32)+rsp],xmm8 > > + movaps XMMWORD[(96+48)+rsp],xmm9 > > + movaps XMMWORD[(96+64)+rsp],xmm10 > > + movaps XMMWORD[(96+80)+rsp],xmm11 > > + movaps XMMWORD[(96+96)+rsp],xmm12 > > + movaps XMMWORD[(96+112)+rsp],xmm13 > > + movaps XMMWORD[(96+128)+rsp],xmm14 > > + movaps XMMWORD[(96+144)+rsp],xmm15 > > +$L$prologue_ssse3: > > + mov r12,rdi > > + mov r13,rsi > > + mov r14,rdx > > + lea r15,[112+rcx] > > + movdqu xmm2,XMMWORD[r8] > > + mov QWORD[88+rsp],r8 > > + shl r14,6 > > + sub r13,r12 > > + mov r8d,DWORD[((240-112))+r15] > > + add r14,r10 > > + > > + lea 
r11,[K_XX_XX] > > + mov eax,DWORD[r9] > > + mov ebx,DWORD[4+r9] > > + mov ecx,DWORD[8+r9] > > + mov edx,DWORD[12+r9] > > + mov esi,ebx > > + mov ebp,DWORD[16+r9] > > + mov edi,ecx > > + xor edi,edx > > + and esi,edi > > + > > + movdqa xmm3,XMMWORD[64+r11] > > + movdqa xmm13,XMMWORD[r11] > > + movdqu xmm4,XMMWORD[r10] > > + movdqu xmm5,XMMWORD[16+r10] > > + movdqu xmm6,XMMWORD[32+r10] > > + movdqu xmm7,XMMWORD[48+r10] > > +DB 102,15,56,0,227 > > +DB 102,15,56,0,235 > > +DB 102,15,56,0,243 > > + add r10,64 > > + paddd xmm4,xmm13 > > +DB 102,15,56,0,251 > > + paddd xmm5,xmm13 > > + paddd xmm6,xmm13 > > + movdqa XMMWORD[rsp],xmm4 > > + psubd xmm4,xmm13 > > + movdqa XMMWORD[16+rsp],xmm5 > > + psubd xmm5,xmm13 > > + movdqa XMMWORD[32+rsp],xmm6 > > + psubd xmm6,xmm13 > > + movups xmm15,XMMWORD[((-112))+r15] > > + movups xmm0,XMMWORD[((16-112))+r15] > > + jmp NEAR $L$oop_ssse3 > > +ALIGN 32 > > +$L$oop_ssse3: > > + ror ebx,2 > > + movups xmm14,XMMWORD[r12] > > + xorps xmm14,xmm15 > > + xorps xmm2,xmm14 > > + movups xmm1,XMMWORD[((-80))+r15] > > +DB 102,15,56,220,208 > > + pshufd xmm8,xmm4,238 > > + xor esi,edx > > + movdqa xmm12,xmm7 > > + paddd xmm13,xmm7 > > + mov edi,eax > > + add ebp,DWORD[rsp] > > + punpcklqdq xmm8,xmm5 > > + xor ebx,ecx > > + rol eax,5 > > + add ebp,esi > > + psrldq xmm12,4 > > + and edi,ebx > > + xor ebx,ecx > > + pxor xmm8,xmm4 > > + add ebp,eax > > + ror eax,7 > > + pxor xmm12,xmm6 > > + xor edi,ecx > > + mov esi,ebp > > + add edx,DWORD[4+rsp] > > + pxor xmm8,xmm12 > > + xor eax,ebx > > + rol ebp,5 > > + movdqa XMMWORD[48+rsp],xmm13 > > + add edx,edi > > + movups xmm0,XMMWORD[((-64))+r15] > > +DB 102,15,56,220,209 > > + and esi,eax > > + movdqa xmm3,xmm8 > > + xor eax,ebx > > + add edx,ebp > > + ror ebp,7 > > + movdqa xmm12,xmm8 > > + xor esi,ebx > > + pslldq xmm3,12 > > + paddd xmm8,xmm8 > > + mov edi,edx > > + add ecx,DWORD[8+rsp] > > + psrld xmm12,31 > > + xor ebp,eax > > + rol edx,5 > > + add ecx,esi > > + movdqa xmm13,xmm3 > > + and edi,ebp > > + xor ebp,eax > > + psrld xmm3,30 > > + add ecx,edx > > + ror edx,7 > > + por xmm8,xmm12 > > + xor edi,eax > > + mov esi,ecx > > + add ebx,DWORD[12+rsp] > > + movups xmm1,XMMWORD[((-48))+r15] > > +DB 102,15,56,220,208 > > + pslld xmm13,2 > > + pxor xmm8,xmm3 > > + xor edx,ebp > > + movdqa xmm3,XMMWORD[r11] > > + rol ecx,5 > > + add ebx,edi > > + and esi,edx > > + pxor xmm8,xmm13 > > + xor edx,ebp > > + add ebx,ecx > > + ror ecx,7 > > + pshufd xmm9,xmm5,238 > > + xor esi,ebp > > + movdqa xmm13,xmm8 > > + paddd xmm3,xmm8 > > + mov edi,ebx > > + add eax,DWORD[16+rsp] > > + punpcklqdq xmm9,xmm6 > > + xor ecx,edx > > + rol ebx,5 > > + add eax,esi > > + psrldq xmm13,4 > > + and edi,ecx > > + xor ecx,edx > > + pxor xmm9,xmm5 > > + add eax,ebx > > + ror ebx,7 > > + movups xmm0,XMMWORD[((-32))+r15] > > +DB 102,15,56,220,209 > > + pxor xmm13,xmm7 > > + xor edi,edx > > + mov esi,eax > > + add ebp,DWORD[20+rsp] > > + pxor xmm9,xmm13 > > + xor ebx,ecx > > + rol eax,5 > > + movdqa XMMWORD[rsp],xmm3 > > + add ebp,edi > > + and esi,ebx > > + movdqa xmm12,xmm9 > > + xor ebx,ecx > > + add ebp,eax > > + ror eax,7 > > + movdqa xmm13,xmm9 > > + xor esi,ecx > > + pslldq xmm12,12 > > + paddd xmm9,xmm9 > > + mov edi,ebp > > + add edx,DWORD[24+rsp] > > + psrld xmm13,31 > > + xor eax,ebx > > + rol ebp,5 > > + add edx,esi > > + movups xmm1,XMMWORD[((-16))+r15] > > +DB 102,15,56,220,208 > > + movdqa xmm3,xmm12 > > + and edi,eax > > + xor eax,ebx > > + psrld xmm12,30 > > + add edx,ebp > > + ror ebp,7 > > + por xmm9,xmm13 > > + xor edi,ebx > > + mov 
esi,edx > > + add ecx,DWORD[28+rsp] > > + pslld xmm3,2 > > + pxor xmm9,xmm12 > > + xor ebp,eax > > + movdqa xmm12,XMMWORD[16+r11] > > + rol edx,5 > > + add ecx,edi > > + and esi,ebp > > + pxor xmm9,xmm3 > > + xor ebp,eax > > + add ecx,edx > > + ror edx,7 > > + pshufd xmm10,xmm6,238 > > + xor esi,eax > > + movdqa xmm3,xmm9 > > + paddd xmm12,xmm9 > > + mov edi,ecx > > + add ebx,DWORD[32+rsp] > > + movups xmm0,XMMWORD[r15] > > +DB 102,15,56,220,209 > > + punpcklqdq xmm10,xmm7 > > + xor edx,ebp > > + rol ecx,5 > > + add ebx,esi > > + psrldq xmm3,4 > > + and edi,edx > > + xor edx,ebp > > + pxor xmm10,xmm6 > > + add ebx,ecx > > + ror ecx,7 > > + pxor xmm3,xmm8 > > + xor edi,ebp > > + mov esi,ebx > > + add eax,DWORD[36+rsp] > > + pxor xmm10,xmm3 > > + xor ecx,edx > > + rol ebx,5 > > + movdqa XMMWORD[16+rsp],xmm12 > > + add eax,edi > > + and esi,ecx > > + movdqa xmm13,xmm10 > > + xor ecx,edx > > + add eax,ebx > > + ror ebx,7 > > + movups xmm1,XMMWORD[16+r15] > > +DB 102,15,56,220,208 > > + movdqa xmm3,xmm10 > > + xor esi,edx > > + pslldq xmm13,12 > > + paddd xmm10,xmm10 > > + mov edi,eax > > + add ebp,DWORD[40+rsp] > > + psrld xmm3,31 > > + xor ebx,ecx > > + rol eax,5 > > + add ebp,esi > > + movdqa xmm12,xmm13 > > + and edi,ebx > > + xor ebx,ecx > > + psrld xmm13,30 > > + add ebp,eax > > + ror eax,7 > > + por xmm10,xmm3 > > + xor edi,ecx > > + mov esi,ebp > > + add edx,DWORD[44+rsp] > > + pslld xmm12,2 > > + pxor xmm10,xmm13 > > + xor eax,ebx > > + movdqa xmm13,XMMWORD[16+r11] > > + rol ebp,5 > > + add edx,edi > > + movups xmm0,XMMWORD[32+r15] > > +DB 102,15,56,220,209 > > + and esi,eax > > + pxor xmm10,xmm12 > > + xor eax,ebx > > + add edx,ebp > > + ror ebp,7 > > + pshufd xmm11,xmm7,238 > > + xor esi,ebx > > + movdqa xmm12,xmm10 > > + paddd xmm13,xmm10 > > + mov edi,edx > > + add ecx,DWORD[48+rsp] > > + punpcklqdq xmm11,xmm8 > > + xor ebp,eax > > + rol edx,5 > > + add ecx,esi > > + psrldq xmm12,4 > > + and edi,ebp > > + xor ebp,eax > > + pxor xmm11,xmm7 > > + add ecx,edx > > + ror edx,7 > > + pxor xmm12,xmm9 > > + xor edi,eax > > + mov esi,ecx > > + add ebx,DWORD[52+rsp] > > + movups xmm1,XMMWORD[48+r15] > > +DB 102,15,56,220,208 > > + pxor xmm11,xmm12 > > + xor edx,ebp > > + rol ecx,5 > > + movdqa XMMWORD[32+rsp],xmm13 > > + add ebx,edi > > + and esi,edx > > + movdqa xmm3,xmm11 > > + xor edx,ebp > > + add ebx,ecx > > + ror ecx,7 > > + movdqa xmm12,xmm11 > > + xor esi,ebp > > + pslldq xmm3,12 > > + paddd xmm11,xmm11 > > + mov edi,ebx > > + add eax,DWORD[56+rsp] > > + psrld xmm12,31 > > + xor ecx,edx > > + rol ebx,5 > > + add eax,esi > > + movdqa xmm13,xmm3 > > + and edi,ecx > > + xor ecx,edx > > + psrld xmm3,30 > > + add eax,ebx > > + ror ebx,7 > > + cmp r8d,11 > > + jb NEAR $L$aesenclast1 > > + movups xmm0,XMMWORD[64+r15] > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[80+r15] > > +DB 102,15,56,220,208 > > + je NEAR $L$aesenclast1 > > + movups xmm0,XMMWORD[96+r15] > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[112+r15] > > +DB 102,15,56,220,208 > > +$L$aesenclast1: > > +DB 102,15,56,221,209 > > + movups xmm0,XMMWORD[((16-112))+r15] > > + por xmm11,xmm12 > > + xor edi,edx > > + mov esi,eax > > + add ebp,DWORD[60+rsp] > > + pslld xmm13,2 > > + pxor xmm11,xmm3 > > + xor ebx,ecx > > + movdqa xmm3,XMMWORD[16+r11] > > + rol eax,5 > > + add ebp,edi > > + and esi,ebx > > + pxor xmm11,xmm13 > > + pshufd xmm13,xmm10,238 > > + xor ebx,ecx > > + add ebp,eax > > + ror eax,7 > > + pxor xmm4,xmm8 > > + xor esi,ecx > > + mov edi,ebp > > + add edx,DWORD[rsp] > > + punpcklqdq xmm13,xmm11 > > + xor 
eax,ebx > > + rol ebp,5 > > + pxor xmm4,xmm5 > > + add edx,esi > > + movups xmm14,XMMWORD[16+r12] > > + xorps xmm14,xmm15 > > + movups XMMWORD[r13*1+r12],xmm2 > > + xorps xmm2,xmm14 > > + movups xmm1,XMMWORD[((-80))+r15] > > +DB 102,15,56,220,208 > > + and edi,eax > > + movdqa xmm12,xmm3 > > + xor eax,ebx > > + paddd xmm3,xmm11 > > + add edx,ebp > > + pxor xmm4,xmm13 > > + ror ebp,7 > > + xor edi,ebx > > + mov esi,edx > > + add ecx,DWORD[4+rsp] > > + movdqa xmm13,xmm4 > > + xor ebp,eax > > + rol edx,5 > > + movdqa XMMWORD[48+rsp],xmm3 > > + add ecx,edi > > + and esi,ebp > > + xor ebp,eax > > + pslld xmm4,2 > > + add ecx,edx > > + ror edx,7 > > + psrld xmm13,30 > > + xor esi,eax > > + mov edi,ecx > > + add ebx,DWORD[8+rsp] > > + movups xmm0,XMMWORD[((-64))+r15] > > +DB 102,15,56,220,209 > > + por xmm4,xmm13 > > + xor edx,ebp > > + rol ecx,5 > > + pshufd xmm3,xmm11,238 > > + add ebx,esi > > + and edi,edx > > + xor edx,ebp > > + add ebx,ecx > > + add eax,DWORD[12+rsp] > > + xor edi,ebp > > + mov esi,ebx > > + rol ebx,5 > > + add eax,edi > > + xor esi,edx > > + ror ecx,7 > > + add eax,ebx > > + pxor xmm5,xmm9 > > + add ebp,DWORD[16+rsp] > > + movups xmm1,XMMWORD[((-48))+r15] > > +DB 102,15,56,220,208 > > + xor esi,ecx > > + punpcklqdq xmm3,xmm4 > > + mov edi,eax > > + rol eax,5 > > + pxor xmm5,xmm6 > > + add ebp,esi > > + xor edi,ecx > > + movdqa xmm13,xmm12 > > + ror ebx,7 > > + paddd xmm12,xmm4 > > + add ebp,eax > > + pxor xmm5,xmm3 > > + add edx,DWORD[20+rsp] > > + xor edi,ebx > > + mov esi,ebp > > + rol ebp,5 > > + movdqa xmm3,xmm5 > > + add edx,edi > > + xor esi,ebx > > + movdqa XMMWORD[rsp],xmm12 > > + ror eax,7 > > + add edx,ebp > > + add ecx,DWORD[24+rsp] > > + pslld xmm5,2 > > + xor esi,eax > > + mov edi,edx > > + psrld xmm3,30 > > + rol edx,5 > > + add ecx,esi > > + movups xmm0,XMMWORD[((-32))+r15] > > +DB 102,15,56,220,209 > > + xor edi,eax > > + ror ebp,7 > > + por xmm5,xmm3 > > + add ecx,edx > > + add ebx,DWORD[28+rsp] > > + pshufd xmm12,xmm4,238 > > + xor edi,ebp > > + mov esi,ecx > > + rol ecx,5 > > + add ebx,edi > > + xor esi,ebp > > + ror edx,7 > > + add ebx,ecx > > + pxor xmm6,xmm10 > > + add eax,DWORD[32+rsp] > > + xor esi,edx > > + punpcklqdq xmm12,xmm5 > > + mov edi,ebx > > + rol ebx,5 > > + pxor xmm6,xmm7 > > + add eax,esi > > + xor edi,edx > > + movdqa xmm3,XMMWORD[32+r11] > > + ror ecx,7 > > + paddd xmm13,xmm5 > > + add eax,ebx > > + pxor xmm6,xmm12 > > + add ebp,DWORD[36+rsp] > > + movups xmm1,XMMWORD[((-16))+r15] > > +DB 102,15,56,220,208 > > + xor edi,ecx > > + mov esi,eax > > + rol eax,5 > > + movdqa xmm12,xmm6 > > + add ebp,edi > > + xor esi,ecx > > + movdqa XMMWORD[16+rsp],xmm13 > > + ror ebx,7 > > + add ebp,eax > > + add edx,DWORD[40+rsp] > > + pslld xmm6,2 > > + xor esi,ebx > > + mov edi,ebp > > + psrld xmm12,30 > > + rol ebp,5 > > + add edx,esi > > + xor edi,ebx > > + ror eax,7 > > + por xmm6,xmm12 > > + add edx,ebp > > + add ecx,DWORD[44+rsp] > > + pshufd xmm13,xmm5,238 > > + xor edi,eax > > + mov esi,edx > > + rol edx,5 > > + add ecx,edi > > + movups xmm0,XMMWORD[r15] > > +DB 102,15,56,220,209 > > + xor esi,eax > > + ror ebp,7 > > + add ecx,edx > > + pxor xmm7,xmm11 > > + add ebx,DWORD[48+rsp] > > + xor esi,ebp > > + punpcklqdq xmm13,xmm6 > > + mov edi,ecx > > + rol ecx,5 > > + pxor xmm7,xmm8 > > + add ebx,esi > > + xor edi,ebp > > + movdqa xmm12,xmm3 > > + ror edx,7 > > + paddd xmm3,xmm6 > > + add ebx,ecx > > + pxor xmm7,xmm13 > > + add eax,DWORD[52+rsp] > > + xor edi,edx > > + mov esi,ebx > > + rol ebx,5 > > + movdqa xmm13,xmm7 > > + add eax,edi > > + xor 
esi,edx > > + movdqa XMMWORD[32+rsp],xmm3 > > + ror ecx,7 > > + add eax,ebx > > + add ebp,DWORD[56+rsp] > > + movups xmm1,XMMWORD[16+r15] > > +DB 102,15,56,220,208 > > + pslld xmm7,2 > > + xor esi,ecx > > + mov edi,eax > > + psrld xmm13,30 > > + rol eax,5 > > + add ebp,esi > > + xor edi,ecx > > + ror ebx,7 > > + por xmm7,xmm13 > > + add ebp,eax > > + add edx,DWORD[60+rsp] > > + pshufd xmm3,xmm6,238 > > + xor edi,ebx > > + mov esi,ebp > > + rol ebp,5 > > + add edx,edi > > + xor esi,ebx > > + ror eax,7 > > + add edx,ebp > > + pxor xmm8,xmm4 > > + add ecx,DWORD[rsp] > > + xor esi,eax > > + punpcklqdq xmm3,xmm7 > > + mov edi,edx > > + rol edx,5 > > + pxor xmm8,xmm9 > > + add ecx,esi > > + movups xmm0,XMMWORD[32+r15] > > +DB 102,15,56,220,209 > > + xor edi,eax > > + movdqa xmm13,xmm12 > > + ror ebp,7 > > + paddd xmm12,xmm7 > > + add ecx,edx > > + pxor xmm8,xmm3 > > + add ebx,DWORD[4+rsp] > > + xor edi,ebp > > + mov esi,ecx > > + rol ecx,5 > > + movdqa xmm3,xmm8 > > + add ebx,edi > > + xor esi,ebp > > + movdqa XMMWORD[48+rsp],xmm12 > > + ror edx,7 > > + add ebx,ecx > > + add eax,DWORD[8+rsp] > > + pslld xmm8,2 > > + xor esi,edx > > + mov edi,ebx > > + psrld xmm3,30 > > + rol ebx,5 > > + add eax,esi > > + xor edi,edx > > + ror ecx,7 > > + por xmm8,xmm3 > > + add eax,ebx > > + add ebp,DWORD[12+rsp] > > + movups xmm1,XMMWORD[48+r15] > > +DB 102,15,56,220,208 > > + pshufd xmm12,xmm7,238 > > + xor edi,ecx > > + mov esi,eax > > + rol eax,5 > > + add ebp,edi > > + xor esi,ecx > > + ror ebx,7 > > + add ebp,eax > > + pxor xmm9,xmm5 > > + add edx,DWORD[16+rsp] > > + xor esi,ebx > > + punpcklqdq xmm12,xmm8 > > + mov edi,ebp > > + rol ebp,5 > > + pxor xmm9,xmm10 > > + add edx,esi > > + xor edi,ebx > > + movdqa xmm3,xmm13 > > + ror eax,7 > > + paddd xmm13,xmm8 > > + add edx,ebp > > + pxor xmm9,xmm12 > > + add ecx,DWORD[20+rsp] > > + xor edi,eax > > + mov esi,edx > > + rol edx,5 > > + movdqa xmm12,xmm9 > > + add ecx,edi > > + cmp r8d,11 > > + jb NEAR $L$aesenclast2 > > + movups xmm0,XMMWORD[64+r15] > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[80+r15] > > +DB 102,15,56,220,208 > > + je NEAR $L$aesenclast2 > > + movups xmm0,XMMWORD[96+r15] > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[112+r15] > > +DB 102,15,56,220,208 > > +$L$aesenclast2: > > +DB 102,15,56,221,209 > > + movups xmm0,XMMWORD[((16-112))+r15] > > + xor esi,eax > > + movdqa XMMWORD[rsp],xmm13 > > + ror ebp,7 > > + add ecx,edx > > + add ebx,DWORD[24+rsp] > > + pslld xmm9,2 > > + xor esi,ebp > > + mov edi,ecx > > + psrld xmm12,30 > > + rol ecx,5 > > + add ebx,esi > > + xor edi,ebp > > + ror edx,7 > > + por xmm9,xmm12 > > + add ebx,ecx > > + add eax,DWORD[28+rsp] > > + pshufd xmm13,xmm8,238 > > + ror ecx,7 > > + mov esi,ebx > > + xor edi,edx > > + rol ebx,5 > > + add eax,edi > > + xor esi,ecx > > + xor ecx,edx > > + add eax,ebx > > + pxor xmm10,xmm6 > > + add ebp,DWORD[32+rsp] > > + movups xmm14,XMMWORD[32+r12] > > + xorps xmm14,xmm15 > > + movups XMMWORD[16+r12*1+r13],xmm2 > > + xorps xmm2,xmm14 > > + movups xmm1,XMMWORD[((-80))+r15] > > +DB 102,15,56,220,208 > > + and esi,ecx > > + xor ecx,edx > > + ror ebx,7 > > + punpcklqdq xmm13,xmm9 > > + mov edi,eax > > + xor esi,ecx > > + pxor xmm10,xmm11 > > + rol eax,5 > > + add ebp,esi > > + movdqa xmm12,xmm3 > > + xor edi,ebx > > + paddd xmm3,xmm9 > > + xor ebx,ecx > > + pxor xmm10,xmm13 > > + add ebp,eax > > + add edx,DWORD[36+rsp] > > + and edi,ebx > > + xor ebx,ecx > > + ror eax,7 > > + movdqa xmm13,xmm10 > > + mov esi,ebp > > + xor edi,ebx > > + movdqa XMMWORD[16+rsp],xmm3 > > + rol ebp,5 
> > + add edx,edi > > + movups xmm0,XMMWORD[((-64))+r15] > > +DB 102,15,56,220,209 > > + xor esi,eax > > + pslld xmm10,2 > > + xor eax,ebx > > + add edx,ebp > > + psrld xmm13,30 > > + add ecx,DWORD[40+rsp] > > + and esi,eax > > + xor eax,ebx > > + por xmm10,xmm13 > > + ror ebp,7 > > + mov edi,edx > > + xor esi,eax > > + rol edx,5 > > + pshufd xmm3,xmm9,238 > > + add ecx,esi > > + xor edi,ebp > > + xor ebp,eax > > + add ecx,edx > > + add ebx,DWORD[44+rsp] > > + and edi,ebp > > + xor ebp,eax > > + ror edx,7 > > + movups xmm1,XMMWORD[((-48))+r15] > > +DB 102,15,56,220,208 > > + mov esi,ecx > > + xor edi,ebp > > + rol ecx,5 > > + add ebx,edi > > + xor esi,edx > > + xor edx,ebp > > + add ebx,ecx > > + pxor xmm11,xmm7 > > + add eax,DWORD[48+rsp] > > + and esi,edx > > + xor edx,ebp > > + ror ecx,7 > > + punpcklqdq xmm3,xmm10 > > + mov edi,ebx > > + xor esi,edx > > + pxor xmm11,xmm4 > > + rol ebx,5 > > + add eax,esi > > + movdqa xmm13,XMMWORD[48+r11] > > + xor edi,ecx > > + paddd xmm12,xmm10 > > + xor ecx,edx > > + pxor xmm11,xmm3 > > + add eax,ebx > > + add ebp,DWORD[52+rsp] > > + movups xmm0,XMMWORD[((-32))+r15] > > +DB 102,15,56,220,209 > > + and edi,ecx > > + xor ecx,edx > > + ror ebx,7 > > + movdqa xmm3,xmm11 > > + mov esi,eax > > + xor edi,ecx > > + movdqa XMMWORD[32+rsp],xmm12 > > + rol eax,5 > > + add ebp,edi > > + xor esi,ebx > > + pslld xmm11,2 > > + xor ebx,ecx > > + add ebp,eax > > + psrld xmm3,30 > > + add edx,DWORD[56+rsp] > > + and esi,ebx > > + xor ebx,ecx > > + por xmm11,xmm3 > > + ror eax,7 > > + mov edi,ebp > > + xor esi,ebx > > + rol ebp,5 > > + pshufd xmm12,xmm10,238 > > + add edx,esi > > + movups xmm1,XMMWORD[((-16))+r15] > > +DB 102,15,56,220,208 > > + xor edi,eax > > + xor eax,ebx > > + add edx,ebp > > + add ecx,DWORD[60+rsp] > > + and edi,eax > > + xor eax,ebx > > + ror ebp,7 > > + mov esi,edx > > + xor edi,eax > > + rol edx,5 > > + add ecx,edi > > + xor esi,ebp > > + xor ebp,eax > > + add ecx,edx > > + pxor xmm4,xmm8 > > + add ebx,DWORD[rsp] > > + and esi,ebp > > + xor ebp,eax > > + ror edx,7 > > + movups xmm0,XMMWORD[r15] > > +DB 102,15,56,220,209 > > + punpcklqdq xmm12,xmm11 > > + mov edi,ecx > > + xor esi,ebp > > + pxor xmm4,xmm5 > > + rol ecx,5 > > + add ebx,esi > > + movdqa xmm3,xmm13 > > + xor edi,edx > > + paddd xmm13,xmm11 > > + xor edx,ebp > > + pxor xmm4,xmm12 > > + add ebx,ecx > > + add eax,DWORD[4+rsp] > > + and edi,edx > > + xor edx,ebp > > + ror ecx,7 > > + movdqa xmm12,xmm4 > > + mov esi,ebx > > + xor edi,edx > > + movdqa XMMWORD[48+rsp],xmm13 > > + rol ebx,5 > > + add eax,edi > > + xor esi,ecx > > + pslld xmm4,2 > > + xor ecx,edx > > + add eax,ebx > > + psrld xmm12,30 > > + add ebp,DWORD[8+rsp] > > + movups xmm1,XMMWORD[16+r15] > > +DB 102,15,56,220,208 > > + and esi,ecx > > + xor ecx,edx > > + por xmm4,xmm12 > > + ror ebx,7 > > + mov edi,eax > > + xor esi,ecx > > + rol eax,5 > > + pshufd xmm13,xmm11,238 > > + add ebp,esi > > + xor edi,ebx > > + xor ebx,ecx > > + add ebp,eax > > + add edx,DWORD[12+rsp] > > + and edi,ebx > > + xor ebx,ecx > > + ror eax,7 > > + mov esi,ebp > > + xor edi,ebx > > + rol ebp,5 > > + add edx,edi > > + movups xmm0,XMMWORD[32+r15] > > +DB 102,15,56,220,209 > > + xor esi,eax > > + xor eax,ebx > > + add edx,ebp > > + pxor xmm5,xmm9 > > + add ecx,DWORD[16+rsp] > > + and esi,eax > > + xor eax,ebx > > + ror ebp,7 > > + punpcklqdq xmm13,xmm4 > > + mov edi,edx > > + xor esi,eax > > + pxor xmm5,xmm6 > > + rol edx,5 > > + add ecx,esi > > + movdqa xmm12,xmm3 > > + xor edi,ebp > > + paddd xmm3,xmm4 > > + xor ebp,eax > > + pxor xmm5,xmm13 > > 
+ add ecx,edx > > + add ebx,DWORD[20+rsp] > > + and edi,ebp > > + xor ebp,eax > > + ror edx,7 > > + movups xmm1,XMMWORD[48+r15] > > +DB 102,15,56,220,208 > > + movdqa xmm13,xmm5 > > + mov esi,ecx > > + xor edi,ebp > > + movdqa XMMWORD[rsp],xmm3 > > + rol ecx,5 > > + add ebx,edi > > + xor esi,edx > > + pslld xmm5,2 > > + xor edx,ebp > > + add ebx,ecx > > + psrld xmm13,30 > > + add eax,DWORD[24+rsp] > > + and esi,edx > > + xor edx,ebp > > + por xmm5,xmm13 > > + ror ecx,7 > > + mov edi,ebx > > + xor esi,edx > > + rol ebx,5 > > + pshufd xmm3,xmm4,238 > > + add eax,esi > > + xor edi,ecx > > + xor ecx,edx > > + add eax,ebx > > + add ebp,DWORD[28+rsp] > > + cmp r8d,11 > > + jb NEAR $L$aesenclast3 > > + movups xmm0,XMMWORD[64+r15] > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[80+r15] > > +DB 102,15,56,220,208 > > + je NEAR $L$aesenclast3 > > + movups xmm0,XMMWORD[96+r15] > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[112+r15] > > +DB 102,15,56,220,208 > > +$L$aesenclast3: > > +DB 102,15,56,221,209 > > + movups xmm0,XMMWORD[((16-112))+r15] > > + and edi,ecx > > + xor ecx,edx > > + ror ebx,7 > > + mov esi,eax > > + xor edi,ecx > > + rol eax,5 > > + add ebp,edi > > + xor esi,ebx > > + xor ebx,ecx > > + add ebp,eax > > + pxor xmm6,xmm10 > > + add edx,DWORD[32+rsp] > > + and esi,ebx > > + xor ebx,ecx > > + ror eax,7 > > + punpcklqdq xmm3,xmm5 > > + mov edi,ebp > > + xor esi,ebx > > + pxor xmm6,xmm7 > > + rol ebp,5 > > + add edx,esi > > + movups xmm14,XMMWORD[48+r12] > > + xorps xmm14,xmm15 > > + movups XMMWORD[32+r12*1+r13],xmm2 > > + xorps xmm2,xmm14 > > + movups xmm1,XMMWORD[((-80))+r15] > > +DB 102,15,56,220,208 > > + movdqa xmm13,xmm12 > > + xor edi,eax > > + paddd xmm12,xmm5 > > + xor eax,ebx > > + pxor xmm6,xmm3 > > + add edx,ebp > > + add ecx,DWORD[36+rsp] > > + and edi,eax > > + xor eax,ebx > > + ror ebp,7 > > + movdqa xmm3,xmm6 > > + mov esi,edx > > + xor edi,eax > > + movdqa XMMWORD[16+rsp],xmm12 > > + rol edx,5 > > + add ecx,edi > > + xor esi,ebp > > + pslld xmm6,2 > > + xor ebp,eax > > + add ecx,edx > > + psrld xmm3,30 > > + add ebx,DWORD[40+rsp] > > + and esi,ebp > > + xor ebp,eax > > + por xmm6,xmm3 > > + ror edx,7 > > + movups xmm0,XMMWORD[((-64))+r15] > > +DB 102,15,56,220,209 > > + mov edi,ecx > > + xor esi,ebp > > + rol ecx,5 > > + pshufd xmm12,xmm5,238 > > + add ebx,esi > > + xor edi,edx > > + xor edx,ebp > > + add ebx,ecx > > + add eax,DWORD[44+rsp] > > + and edi,edx > > + xor edx,ebp > > + ror ecx,7 > > + mov esi,ebx > > + xor edi,edx > > + rol ebx,5 > > + add eax,edi > > + xor esi,edx > > + add eax,ebx > > + pxor xmm7,xmm11 > > + add ebp,DWORD[48+rsp] > > + movups xmm1,XMMWORD[((-48))+r15] > > +DB 102,15,56,220,208 > > + xor esi,ecx > > + punpcklqdq xmm12,xmm6 > > + mov edi,eax > > + rol eax,5 > > + pxor xmm7,xmm8 > > + add ebp,esi > > + xor edi,ecx > > + movdqa xmm3,xmm13 > > + ror ebx,7 > > + paddd xmm13,xmm6 > > + add ebp,eax > > + pxor xmm7,xmm12 > > + add edx,DWORD[52+rsp] > > + xor edi,ebx > > + mov esi,ebp > > + rol ebp,5 > > + movdqa xmm12,xmm7 > > + add edx,edi > > + xor esi,ebx > > + movdqa XMMWORD[32+rsp],xmm13 > > + ror eax,7 > > + add edx,ebp > > + add ecx,DWORD[56+rsp] > > + pslld xmm7,2 > > + xor esi,eax > > + mov edi,edx > > + psrld xmm12,30 > > + rol edx,5 > > + add ecx,esi > > + movups xmm0,XMMWORD[((-32))+r15] > > +DB 102,15,56,220,209 > > + xor edi,eax > > + ror ebp,7 > > + por xmm7,xmm12 > > + add ecx,edx > > + add ebx,DWORD[60+rsp] > > + xor edi,ebp > > + mov esi,ecx > > + rol ecx,5 > > + add ebx,edi > > + xor esi,ebp > > + ror edx,7 > > + add 
ebx,ecx > > + add eax,DWORD[rsp] > > + xor esi,edx > > + mov edi,ebx > > + rol ebx,5 > > + paddd xmm3,xmm7 > > + add eax,esi > > + xor edi,edx > > + movdqa XMMWORD[48+rsp],xmm3 > > + ror ecx,7 > > + add eax,ebx > > + add ebp,DWORD[4+rsp] > > + movups xmm1,XMMWORD[((-16))+r15] > > +DB 102,15,56,220,208 > > + xor edi,ecx > > + mov esi,eax > > + rol eax,5 > > + add ebp,edi > > + xor esi,ecx > > + ror ebx,7 > > + add ebp,eax > > + add edx,DWORD[8+rsp] > > + xor esi,ebx > > + mov edi,ebp > > + rol ebp,5 > > + add edx,esi > > + xor edi,ebx > > + ror eax,7 > > + add edx,ebp > > + add ecx,DWORD[12+rsp] > > + xor edi,eax > > + mov esi,edx > > + rol edx,5 > > + add ecx,edi > > + movups xmm0,XMMWORD[r15] > > +DB 102,15,56,220,209 > > + xor esi,eax > > + ror ebp,7 > > + add ecx,edx > > + cmp r10,r14 > > + je NEAR $L$done_ssse3 > > + movdqa xmm3,XMMWORD[64+r11] > > + movdqa xmm13,XMMWORD[r11] > > + movdqu xmm4,XMMWORD[r10] > > + movdqu xmm5,XMMWORD[16+r10] > > + movdqu xmm6,XMMWORD[32+r10] > > + movdqu xmm7,XMMWORD[48+r10] > > +DB 102,15,56,0,227 > > + add r10,64 > > + add ebx,DWORD[16+rsp] > > + xor esi,ebp > > + mov edi,ecx > > +DB 102,15,56,0,235 > > + rol ecx,5 > > + add ebx,esi > > + xor edi,ebp > > + ror edx,7 > > + paddd xmm4,xmm13 > > + add ebx,ecx > > + add eax,DWORD[20+rsp] > > + xor edi,edx > > + mov esi,ebx > > + movdqa XMMWORD[rsp],xmm4 > > + rol ebx,5 > > + add eax,edi > > + xor esi,edx > > + ror ecx,7 > > + psubd xmm4,xmm13 > > + add eax,ebx > > + add ebp,DWORD[24+rsp] > > + movups xmm1,XMMWORD[16+r15] > > +DB 102,15,56,220,208 > > + xor esi,ecx > > + mov edi,eax > > + rol eax,5 > > + add ebp,esi > > + xor edi,ecx > > + ror ebx,7 > > + add ebp,eax > > + add edx,DWORD[28+rsp] > > + xor edi,ebx > > + mov esi,ebp > > + rol ebp,5 > > + add edx,edi > > + xor esi,ebx > > + ror eax,7 > > + add edx,ebp > > + add ecx,DWORD[32+rsp] > > + xor esi,eax > > + mov edi,edx > > +DB 102,15,56,0,243 > > + rol edx,5 > > + add ecx,esi > > + movups xmm0,XMMWORD[32+r15] > > +DB 102,15,56,220,209 > > + xor edi,eax > > + ror ebp,7 > > + paddd xmm5,xmm13 > > + add ecx,edx > > + add ebx,DWORD[36+rsp] > > + xor edi,ebp > > + mov esi,ecx > > + movdqa XMMWORD[16+rsp],xmm5 > > + rol ecx,5 > > + add ebx,edi > > + xor esi,ebp > > + ror edx,7 > > + psubd xmm5,xmm13 > > + add ebx,ecx > > + add eax,DWORD[40+rsp] > > + xor esi,edx > > + mov edi,ebx > > + rol ebx,5 > > + add eax,esi > > + xor edi,edx > > + ror ecx,7 > > + add eax,ebx > > + add ebp,DWORD[44+rsp] > > + movups xmm1,XMMWORD[48+r15] > > +DB 102,15,56,220,208 > > + xor edi,ecx > > + mov esi,eax > > + rol eax,5 > > + add ebp,edi > > + xor esi,ecx > > + ror ebx,7 > > + add ebp,eax > > + add edx,DWORD[48+rsp] > > + xor esi,ebx > > + mov edi,ebp > > +DB 102,15,56,0,251 > > + rol ebp,5 > > + add edx,esi > > + xor edi,ebx > > + ror eax,7 > > + paddd xmm6,xmm13 > > + add edx,ebp > > + add ecx,DWORD[52+rsp] > > + xor edi,eax > > + mov esi,edx > > + movdqa XMMWORD[32+rsp],xmm6 > > + rol edx,5 > > + add ecx,edi > > + cmp r8d,11 > > + jb NEAR $L$aesenclast4 > > + movups xmm0,XMMWORD[64+r15] > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[80+r15] > > +DB 102,15,56,220,208 > > + je NEAR $L$aesenclast4 > > + movups xmm0,XMMWORD[96+r15] > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[112+r15] > > +DB 102,15,56,220,208 > > +$L$aesenclast4: > > +DB 102,15,56,221,209 > > + movups xmm0,XMMWORD[((16-112))+r15] > > + xor esi,eax > > + ror ebp,7 > > + psubd xmm6,xmm13 > > + add ecx,edx > > + add ebx,DWORD[56+rsp] > > + xor esi,ebp > > + mov edi,ecx > > + rol ecx,5 > > + add 
ebx,esi > > + xor edi,ebp > > + ror edx,7 > > + add ebx,ecx > > + add eax,DWORD[60+rsp] > > + xor edi,edx > > + mov esi,ebx > > + rol ebx,5 > > + add eax,edi > > + ror ecx,7 > > + add eax,ebx > > + movups XMMWORD[48+r12*1+r13],xmm2 > > + lea r12,[64+r12] > > + > > + add eax,DWORD[r9] > > + add esi,DWORD[4+r9] > > + add ecx,DWORD[8+r9] > > + add edx,DWORD[12+r9] > > + mov DWORD[r9],eax > > + add ebp,DWORD[16+r9] > > + mov DWORD[4+r9],esi > > + mov ebx,esi > > + mov DWORD[8+r9],ecx > > + mov edi,ecx > > + mov DWORD[12+r9],edx > > + xor edi,edx > > + mov DWORD[16+r9],ebp > > + and esi,edi > > + jmp NEAR $L$oop_ssse3 > > + > > +$L$done_ssse3: > > + add ebx,DWORD[16+rsp] > > + xor esi,ebp > > + mov edi,ecx > > + rol ecx,5 > > + add ebx,esi > > + xor edi,ebp > > + ror edx,7 > > + add ebx,ecx > > + add eax,DWORD[20+rsp] > > + xor edi,edx > > + mov esi,ebx > > + rol ebx,5 > > + add eax,edi > > + xor esi,edx > > + ror ecx,7 > > + add eax,ebx > > + add ebp,DWORD[24+rsp] > > + movups xmm1,XMMWORD[16+r15] > > +DB 102,15,56,220,208 > > + xor esi,ecx > > + mov edi,eax > > + rol eax,5 > > + add ebp,esi > > + xor edi,ecx > > + ror ebx,7 > > + add ebp,eax > > + add edx,DWORD[28+rsp] > > + xor edi,ebx > > + mov esi,ebp > > + rol ebp,5 > > + add edx,edi > > + xor esi,ebx > > + ror eax,7 > > + add edx,ebp > > + add ecx,DWORD[32+rsp] > > + xor esi,eax > > + mov edi,edx > > + rol edx,5 > > + add ecx,esi > > + movups xmm0,XMMWORD[32+r15] > > +DB 102,15,56,220,209 > > + xor edi,eax > > + ror ebp,7 > > + add ecx,edx > > + add ebx,DWORD[36+rsp] > > + xor edi,ebp > > + mov esi,ecx > > + rol ecx,5 > > + add ebx,edi > > + xor esi,ebp > > + ror edx,7 > > + add ebx,ecx > > + add eax,DWORD[40+rsp] > > + xor esi,edx > > + mov edi,ebx > > + rol ebx,5 > > + add eax,esi > > + xor edi,edx > > + ror ecx,7 > > + add eax,ebx > > + add ebp,DWORD[44+rsp] > > + movups xmm1,XMMWORD[48+r15] > > +DB 102,15,56,220,208 > > + xor edi,ecx > > + mov esi,eax > > + rol eax,5 > > + add ebp,edi > > + xor esi,ecx > > + ror ebx,7 > > + add ebp,eax > > + add edx,DWORD[48+rsp] > > + xor esi,ebx > > + mov edi,ebp > > + rol ebp,5 > > + add edx,esi > > + xor edi,ebx > > + ror eax,7 > > + add edx,ebp > > + add ecx,DWORD[52+rsp] > > + xor edi,eax > > + mov esi,edx > > + rol edx,5 > > + add ecx,edi > > + cmp r8d,11 > > + jb NEAR $L$aesenclast5 > > + movups xmm0,XMMWORD[64+r15] > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[80+r15] > > +DB 102,15,56,220,208 > > + je NEAR $L$aesenclast5 > > + movups xmm0,XMMWORD[96+r15] > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[112+r15] > > +DB 102,15,56,220,208 > > +$L$aesenclast5: > > +DB 102,15,56,221,209 > > + movups xmm0,XMMWORD[((16-112))+r15] > > + xor esi,eax > > + ror ebp,7 > > + add ecx,edx > > + add ebx,DWORD[56+rsp] > > + xor esi,ebp > > + mov edi,ecx > > + rol ecx,5 > > + add ebx,esi > > + xor edi,ebp > > + ror edx,7 > > + add ebx,ecx > > + add eax,DWORD[60+rsp] > > + xor edi,edx > > + mov esi,ebx > > + rol ebx,5 > > + add eax,edi > > + ror ecx,7 > > + add eax,ebx > > + movups XMMWORD[48+r12*1+r13],xmm2 > > + mov r8,QWORD[88+rsp] > > + > > + add eax,DWORD[r9] > > + add esi,DWORD[4+r9] > > + add ecx,DWORD[8+r9] > > + mov DWORD[r9],eax > > + add edx,DWORD[12+r9] > > + mov DWORD[4+r9],esi > > + add ebp,DWORD[16+r9] > > + mov DWORD[8+r9],ecx > > + mov DWORD[12+r9],edx > > + mov DWORD[16+r9],ebp > > + movups XMMWORD[r8],xmm2 > > + movaps xmm6,XMMWORD[((96+0))+rsp] > > + movaps xmm7,XMMWORD[((96+16))+rsp] > > + movaps xmm8,XMMWORD[((96+32))+rsp] > > + movaps xmm9,XMMWORD[((96+48))+rsp] > > + movaps 
xmm10,XMMWORD[((96+64))+rsp] > > + movaps xmm11,XMMWORD[((96+80))+rsp] > > + movaps xmm12,XMMWORD[((96+96))+rsp] > > + movaps xmm13,XMMWORD[((96+112))+rsp] > > + movaps xmm14,XMMWORD[((96+128))+rsp] > > + movaps xmm15,XMMWORD[((96+144))+rsp] > > + lea rsi,[264+rsp] > > + > > + mov r15,QWORD[rsi] > > + > > + mov r14,QWORD[8+rsi] > > + > > + mov r13,QWORD[16+rsi] > > + > > + mov r12,QWORD[24+rsi] > > + > > + mov rbp,QWORD[32+rsi] > > + > > + mov rbx,QWORD[40+rsi] > > + > > + lea rsp,[48+rsi] > > + > > +$L$epilogue_ssse3: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_aesni_cbc_sha1_enc_ssse3: > > +ALIGN 64 > > +K_XX_XX: > > + DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999 > > + DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > > + DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > > + DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > +DB > 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0 > > + > > +DB 65,69,83,78,73,45,67,66,67,43,83,72,65,49,32,115 > > +DB 116,105,116,99,104,32,102,111,114,32,120,56,54,95,54,52 > > +DB 44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32 > > +DB 60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111 > > +DB 114,103,62,0 > > +ALIGN 64 > > + > > +ALIGN 32 > > +aesni_cbc_sha1_enc_shaext: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_aesni_cbc_sha1_enc_shaext: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + mov rcx,r9 > > + mov r8,QWORD[40+rsp] > > + mov r9,QWORD[48+rsp] > > + > > + > > + > > + mov r10,QWORD[56+rsp] > > + lea rsp,[((-168))+rsp] > > + movaps XMMWORD[(-8-160)+rax],xmm6 > > + movaps XMMWORD[(-8-144)+rax],xmm7 > > + movaps XMMWORD[(-8-128)+rax],xmm8 > > + movaps XMMWORD[(-8-112)+rax],xmm9 > > + movaps XMMWORD[(-8-96)+rax],xmm10 > > + movaps XMMWORD[(-8-80)+rax],xmm11 > > + movaps XMMWORD[(-8-64)+rax],xmm12 > > + movaps XMMWORD[(-8-48)+rax],xmm13 > > + movaps XMMWORD[(-8-32)+rax],xmm14 > > + movaps XMMWORD[(-8-16)+rax],xmm15 > > +$L$prologue_shaext: > > + movdqu xmm8,XMMWORD[r9] > > + movd xmm9,DWORD[16+r9] > > + movdqa xmm7,XMMWORD[((K_XX_XX+80))] > > + > > + mov r11d,DWORD[240+rcx] > > + sub rsi,rdi > > + movups xmm15,XMMWORD[rcx] > > + movups xmm2,XMMWORD[r8] > > + movups xmm0,XMMWORD[16+rcx] > > + lea rcx,[112+rcx] > > + > > + pshufd xmm8,xmm8,27 > > + pshufd xmm9,xmm9,27 > > + jmp NEAR $L$oop_shaext > > + > > +ALIGN 16 > > +$L$oop_shaext: > > + movups xmm14,XMMWORD[rdi] > > + xorps xmm14,xmm15 > > + xorps xmm2,xmm14 > > + movups xmm1,XMMWORD[((-80))+rcx] > > +DB 102,15,56,220,208 > > + movdqu xmm3,XMMWORD[r10] > > + movdqa xmm12,xmm9 > > +DB 102,15,56,0,223 > > + movdqu xmm4,XMMWORD[16+r10] > > + movdqa xmm11,xmm8 > > + movups xmm0,XMMWORD[((-64))+rcx] > > +DB 102,15,56,220,209 > > +DB 102,15,56,0,231 > > + > > + paddd xmm9,xmm3 > > + movdqu xmm5,XMMWORD[32+r10] > > + lea r10,[64+r10] > > + pxor xmm3,xmm12 > > + movups xmm1,XMMWORD[((-48))+rcx] > > +DB 102,15,56,220,208 > > + pxor xmm3,xmm12 > > + movdqa xmm10,xmm8 > > +DB 102,15,56,0,239 > > +DB 69,15,58,204,193,0 > > +DB 68,15,56,200,212 > > + movups xmm0,XMMWORD[((-32))+rcx] > > +DB 102,15,56,220,209 > > +DB 15,56,201,220 > > + movdqu xmm6,XMMWORD[((-16))+r10] > > + movdqa xmm9,xmm8 > > +DB 102,15,56,0,247 > > + movups xmm1,XMMWORD[((-16))+rcx] > > +DB 102,15,56,220,208 > > +DB 69,15,58,204,194,0 > > +DB 68,15,56,200,205 > > + pxor xmm3,xmm5 > > +DB 15,56,201,229 > > + movups 
xmm0,XMMWORD[rcx] > > +DB 102,15,56,220,209 > > + movdqa xmm10,xmm8 > > +DB 69,15,58,204,193,0 > > +DB 68,15,56,200,214 > > + movups xmm1,XMMWORD[16+rcx] > > +DB 102,15,56,220,208 > > +DB 15,56,202,222 > > + pxor xmm4,xmm6 > > +DB 15,56,201,238 > > + movups xmm0,XMMWORD[32+rcx] > > +DB 102,15,56,220,209 > > + movdqa xmm9,xmm8 > > +DB 69,15,58,204,194,0 > > +DB 68,15,56,200,203 > > + movups xmm1,XMMWORD[48+rcx] > > +DB 102,15,56,220,208 > > +DB 15,56,202,227 > > + pxor xmm5,xmm3 > > +DB 15,56,201,243 > > + cmp r11d,11 > > + jb NEAR $L$aesenclast6 > > + movups xmm0,XMMWORD[64+rcx] > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[80+rcx] > > +DB 102,15,56,220,208 > > + je NEAR $L$aesenclast6 > > + movups xmm0,XMMWORD[96+rcx] > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[112+rcx] > > +DB 102,15,56,220,208 > > +$L$aesenclast6: > > +DB 102,15,56,221,209 > > + movups xmm0,XMMWORD[((16-112))+rcx] > > + movdqa xmm10,xmm8 > > +DB 69,15,58,204,193,0 > > +DB 68,15,56,200,212 > > + movups xmm14,XMMWORD[16+rdi] > > + xorps xmm14,xmm15 > > + movups XMMWORD[rdi*1+rsi],xmm2 > > + xorps xmm2,xmm14 > > + movups xmm1,XMMWORD[((-80))+rcx] > > +DB 102,15,56,220,208 > > +DB 15,56,202,236 > > + pxor xmm6,xmm4 > > +DB 15,56,201,220 > > + movups xmm0,XMMWORD[((-64))+rcx] > > +DB 102,15,56,220,209 > > + movdqa xmm9,xmm8 > > +DB 69,15,58,204,194,1 > > +DB 68,15,56,200,205 > > + movups xmm1,XMMWORD[((-48))+rcx] > > +DB 102,15,56,220,208 > > +DB 15,56,202,245 > > + pxor xmm3,xmm5 > > +DB 15,56,201,229 > > + movups xmm0,XMMWORD[((-32))+rcx] > > +DB 102,15,56,220,209 > > + movdqa xmm10,xmm8 > > +DB 69,15,58,204,193,1 > > +DB 68,15,56,200,214 > > + movups xmm1,XMMWORD[((-16))+rcx] > > +DB 102,15,56,220,208 > > +DB 15,56,202,222 > > + pxor xmm4,xmm6 > > +DB 15,56,201,238 > > + movups xmm0,XMMWORD[rcx] > > +DB 102,15,56,220,209 > > + movdqa xmm9,xmm8 > > +DB 69,15,58,204,194,1 > > +DB 68,15,56,200,203 > > + movups xmm1,XMMWORD[16+rcx] > > +DB 102,15,56,220,208 > > +DB 15,56,202,227 > > + pxor xmm5,xmm3 > > +DB 15,56,201,243 > > + movups xmm0,XMMWORD[32+rcx] > > +DB 102,15,56,220,209 > > + movdqa xmm10,xmm8 > > +DB 69,15,58,204,193,1 > > +DB 68,15,56,200,212 > > + movups xmm1,XMMWORD[48+rcx] > > +DB 102,15,56,220,208 > > +DB 15,56,202,236 > > + pxor xmm6,xmm4 > > +DB 15,56,201,220 > > + cmp r11d,11 > > + jb NEAR $L$aesenclast7 > > + movups xmm0,XMMWORD[64+rcx] > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[80+rcx] > > +DB 102,15,56,220,208 > > + je NEAR $L$aesenclast7 > > + movups xmm0,XMMWORD[96+rcx] > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[112+rcx] > > +DB 102,15,56,220,208 > > +$L$aesenclast7: > > +DB 102,15,56,221,209 > > + movups xmm0,XMMWORD[((16-112))+rcx] > > + movdqa xmm9,xmm8 > > +DB 69,15,58,204,194,1 > > +DB 68,15,56,200,205 > > + movups xmm14,XMMWORD[32+rdi] > > + xorps xmm14,xmm15 > > + movups XMMWORD[16+rdi*1+rsi],xmm2 > > + xorps xmm2,xmm14 > > + movups xmm1,XMMWORD[((-80))+rcx] > > +DB 102,15,56,220,208 > > +DB 15,56,202,245 > > + pxor xmm3,xmm5 > > +DB 15,56,201,229 > > + movups xmm0,XMMWORD[((-64))+rcx] > > +DB 102,15,56,220,209 > > + movdqa xmm10,xmm8 > > +DB 69,15,58,204,193,2 > > +DB 68,15,56,200,214 > > + movups xmm1,XMMWORD[((-48))+rcx] > > +DB 102,15,56,220,208 > > +DB 15,56,202,222 > > + pxor xmm4,xmm6 > > +DB 15,56,201,238 > > + movups xmm0,XMMWORD[((-32))+rcx] > > +DB 102,15,56,220,209 > > + movdqa xmm9,xmm8 > > +DB 69,15,58,204,194,2 > > +DB 68,15,56,200,203 > > + movups xmm1,XMMWORD[((-16))+rcx] > > +DB 102,15,56,220,208 > > +DB 15,56,202,227 > > + pxor xmm5,xmm3 > > +DB 
15,56,201,243 > > + movups xmm0,XMMWORD[rcx] > > +DB 102,15,56,220,209 > > + movdqa xmm10,xmm8 > > +DB 69,15,58,204,193,2 > > +DB 68,15,56,200,212 > > + movups xmm1,XMMWORD[16+rcx] > > +DB 102,15,56,220,208 > > +DB 15,56,202,236 > > + pxor xmm6,xmm4 > > +DB 15,56,201,220 > > + movups xmm0,XMMWORD[32+rcx] > > +DB 102,15,56,220,209 > > + movdqa xmm9,xmm8 > > +DB 69,15,58,204,194,2 > > +DB 68,15,56,200,205 > > + movups xmm1,XMMWORD[48+rcx] > > +DB 102,15,56,220,208 > > +DB 15,56,202,245 > > + pxor xmm3,xmm5 > > +DB 15,56,201,229 > > + cmp r11d,11 > > + jb NEAR $L$aesenclast8 > > + movups xmm0,XMMWORD[64+rcx] > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[80+rcx] > > +DB 102,15,56,220,208 > > + je NEAR $L$aesenclast8 > > + movups xmm0,XMMWORD[96+rcx] > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[112+rcx] > > +DB 102,15,56,220,208 > > +$L$aesenclast8: > > +DB 102,15,56,221,209 > > + movups xmm0,XMMWORD[((16-112))+rcx] > > + movdqa xmm10,xmm8 > > +DB 69,15,58,204,193,2 > > +DB 68,15,56,200,214 > > + movups xmm14,XMMWORD[48+rdi] > > + xorps xmm14,xmm15 > > + movups XMMWORD[32+rdi*1+rsi],xmm2 > > + xorps xmm2,xmm14 > > + movups xmm1,XMMWORD[((-80))+rcx] > > +DB 102,15,56,220,208 > > +DB 15,56,202,222 > > + pxor xmm4,xmm6 > > +DB 15,56,201,238 > > + movups xmm0,XMMWORD[((-64))+rcx] > > +DB 102,15,56,220,209 > > + movdqa xmm9,xmm8 > > +DB 69,15,58,204,194,3 > > +DB 68,15,56,200,203 > > + movups xmm1,XMMWORD[((-48))+rcx] > > +DB 102,15,56,220,208 > > +DB 15,56,202,227 > > + pxor xmm5,xmm3 > > +DB 15,56,201,243 > > + movups xmm0,XMMWORD[((-32))+rcx] > > +DB 102,15,56,220,209 > > + movdqa xmm10,xmm8 > > +DB 69,15,58,204,193,3 > > +DB 68,15,56,200,212 > > +DB 15,56,202,236 > > + pxor xmm6,xmm4 > > + movups xmm1,XMMWORD[((-16))+rcx] > > +DB 102,15,56,220,208 > > + movdqa xmm9,xmm8 > > +DB 69,15,58,204,194,3 > > +DB 68,15,56,200,205 > > +DB 15,56,202,245 > > + movups xmm0,XMMWORD[rcx] > > +DB 102,15,56,220,209 > > + movdqa xmm5,xmm12 > > + movdqa xmm10,xmm8 > > +DB 69,15,58,204,193,3 > > +DB 68,15,56,200,214 > > + movups xmm1,XMMWORD[16+rcx] > > +DB 102,15,56,220,208 > > + movdqa xmm9,xmm8 > > +DB 69,15,58,204,194,3 > > +DB 68,15,56,200,205 > > + movups xmm0,XMMWORD[32+rcx] > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[48+rcx] > > +DB 102,15,56,220,208 > > + cmp r11d,11 > > + jb NEAR $L$aesenclast9 > > + movups xmm0,XMMWORD[64+rcx] > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[80+rcx] > > +DB 102,15,56,220,208 > > + je NEAR $L$aesenclast9 > > + movups xmm0,XMMWORD[96+rcx] > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[112+rcx] > > +DB 102,15,56,220,208 > > +$L$aesenclast9: > > +DB 102,15,56,221,209 > > + movups xmm0,XMMWORD[((16-112))+rcx] > > + dec rdx > > + > > + paddd xmm8,xmm11 > > + movups XMMWORD[48+rdi*1+rsi],xmm2 > > + lea rdi,[64+rdi] > > + jnz NEAR $L$oop_shaext > > + > > + pshufd xmm8,xmm8,27 > > + pshufd xmm9,xmm9,27 > > + movups XMMWORD[r8],xmm2 > > + movdqu XMMWORD[r9],xmm8 > > + movd DWORD[16+r9],xmm9 > > + movaps xmm6,XMMWORD[((-8-160))+rax] > > + movaps xmm7,XMMWORD[((-8-144))+rax] > > + movaps xmm8,XMMWORD[((-8-128))+rax] > > + movaps xmm9,XMMWORD[((-8-112))+rax] > > + movaps xmm10,XMMWORD[((-8-96))+rax] > > + movaps xmm11,XMMWORD[((-8-80))+rax] > > + movaps xmm12,XMMWORD[((-8-64))+rax] > > + movaps xmm13,XMMWORD[((-8-48))+rax] > > + movaps xmm14,XMMWORD[((-8-32))+rax] > > + movaps xmm15,XMMWORD[((-8-16))+rax] > > + mov rsp,rax > > +$L$epilogue_shaext: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > 
+$L$SEH_end_aesni_cbc_sha1_enc_shaext: > > +EXTERN __imp_RtlVirtualUnwind > > + > > +ALIGN 16 > > +ssse3_handler: > > + push rsi > > + push rdi > > + push rbx > > + push rbp > > + push r12 > > + push r13 > > + push r14 > > + push r15 > > + pushfq > > + sub rsp,64 > > + > > + mov rax,QWORD[120+r8] > > + mov rbx,QWORD[248+r8] > > + > > + mov rsi,QWORD[8+r9] > > + mov r11,QWORD[56+r9] > > + > > + mov r10d,DWORD[r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jb NEAR $L$common_seh_tail > > + > > + mov rax,QWORD[152+r8] > > + > > + mov r10d,DWORD[4+r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jae NEAR $L$common_seh_tail > > + lea r10,[aesni_cbc_sha1_enc_shaext] > > + cmp rbx,r10 > > + jb NEAR $L$seh_no_shaext > > + > > + lea rsi,[rax] > > + lea rdi,[512+r8] > > + mov ecx,20 > > + DD 0xa548f3fc > > + lea rax,[168+rax] > > + jmp NEAR $L$common_seh_tail > > +$L$seh_no_shaext: > > + lea rsi,[96+rax] > > + lea rdi,[512+r8] > > + mov ecx,20 > > + DD 0xa548f3fc > > + lea rax,[264+rax] > > + > > + mov r15,QWORD[rax] > > + mov r14,QWORD[8+rax] > > + mov r13,QWORD[16+rax] > > + mov r12,QWORD[24+rax] > > + mov rbp,QWORD[32+rax] > > + mov rbx,QWORD[40+rax] > > + lea rax,[48+rax] > > + mov QWORD[144+r8],rbx > > + mov QWORD[160+r8],rbp > > + mov QWORD[216+r8],r12 > > + mov QWORD[224+r8],r13 > > + mov QWORD[232+r8],r14 > > + mov QWORD[240+r8],r15 > > + > > +$L$common_seh_tail: > > + mov rdi,QWORD[8+rax] > > + mov rsi,QWORD[16+rax] > > + mov QWORD[152+r8],rax > > + mov QWORD[168+r8],rsi > > + mov QWORD[176+r8],rdi > > + > > + mov rdi,QWORD[40+r9] > > + mov rsi,r8 > > + mov ecx,154 > > + DD 0xa548f3fc > > + > > + mov rsi,r9 > > + xor rcx,rcx > > + mov rdx,QWORD[8+rsi] > > + mov r8,QWORD[rsi] > > + mov r9,QWORD[16+rsi] > > + mov r10,QWORD[40+rsi] > > + lea r11,[56+rsi] > > + lea r12,[24+rsi] > > + mov QWORD[32+rsp],r10 > > + mov QWORD[40+rsp],r11 > > + mov QWORD[48+rsp],r12 > > + mov QWORD[56+rsp],rcx > > + call QWORD[__imp_RtlVirtualUnwind] > > + > > + mov eax,1 > > + add rsp,64 > > + popfq > > + pop r15 > > + pop r14 > > + pop r13 > > + pop r12 > > + pop rbp > > + pop rbx > > + pop rdi > > + pop rsi > > + DB 0F3h,0C3h ;repret > > + > > + > > +section .pdata rdata align=4 > > +ALIGN 4 > > + DD $L$SEH_begin_aesni_cbc_sha1_enc_ssse3 > wrt ..imagebase > > + DD $L$SEH_end_aesni_cbc_sha1_enc_ssse3 > wrt ..imagebase > > + DD $L$SEH_info_aesni_cbc_sha1_enc_ssse3 > wrt ..imagebase > > + DD $L$SEH_begin_aesni_cbc_sha1_enc_shaext > wrt ..imagebase > > + DD $L$SEH_end_aesni_cbc_sha1_enc_shaext > wrt ..imagebase > > + DD $L$SEH_info_aesni_cbc_sha1_enc_shaext > wrt ..imagebase > > +section .xdata rdata align=8 > > +ALIGN 8 > > +$L$SEH_info_aesni_cbc_sha1_enc_ssse3: > > +DB 9,0,0,0 > > + DD ssse3_handler wrt ..imagebase > > + DD $L$prologue_ssse3 > wrt ..imagebase,$L$epilogue_ssse3 > > wrt ..imagebase > > +$L$SEH_info_aesni_cbc_sha1_enc_shaext: > > +DB 9,0,0,0 > > + DD ssse3_handler wrt ..imagebase > > + DD $L$prologue_shaext > wrt ..imagebase,$L$epilogue_shaext > > wrt ..imagebase > > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256- > > x86_64.nasm > b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256- > > x86_64.nasm > > new file mode 100644 > > index 0000000000..f5c250b904 > > --- /dev/null > > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256- > > x86_64.nasm > > @@ -0,0 +1,78 @@ > > +; WARNING: do not edit! > > +; Generated from openssl/crypto/aes/asm/aesni-sha256-x86_64.pl > > +; > > +; Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. 
> > +; > > +; Licensed under the OpenSSL license (the "License"). You may not use > > +; this file except in compliance with the License. You can obtain a copy > > +; in the file LICENSE in the source distribution or at > > +; https://www.openssl.org/source/license.html > > + > > +default rel > > +%define XMMWORD > > +%define YMMWORD > > +%define ZMMWORD > > +section .text code align=64 > > + > > + > > +EXTERN OPENSSL_ia32cap_P > > +global aesni_cbc_sha256_enc > > + > > +ALIGN 16 > > +aesni_cbc_sha256_enc: > > + > > + xor eax,eax > > + cmp rcx,0 > > + je NEAR $L$probe > > + ud2 > > +$L$probe: > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +ALIGN 64 > > + > > +K256: > > + DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 > > + DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 > > + DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 > > + DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 > > + DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 > > + DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 > > + DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 > > + DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 > > + DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc > > + DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc > > + DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da > > + DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da > > + DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 > > + DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 > > + DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 > > + DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 > > + DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 > > + DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 > > + DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 > > + DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 > > + DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 > > + DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 > > + DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 > > + DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 > > + DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 > > + DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 > > + DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 > > + DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 > > + DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 > > + DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 > > + DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 > > + DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 > > + > > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > + DD 0,0,0,0,0,0,0,0,-1,-1,-1,-1 > > + DD 0,0,0,0,0,0,0,0 > > +DB 65,69,83,78,73,45,67,66,67,43,83,72,65,50,53,54 > > +DB 32,115,116,105,116,99,104,32,102,111,114,32,120,56,54,95 > > +DB 54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98 > > +DB 121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108 > > +DB 46,111,114,103,62,0 > > +ALIGN 64 > > diff --git > a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm > > b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm > > new file mode 100644 > > index 0000000000..57ee23ea8c > > --- /dev/null > > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm > > @@ -0,0 +1,5103 @@ > > +; WARNING: do not edit! > > +; Generated from openssl/crypto/aes/asm/aesni-x86_64.pl > > +; > > +; Copyright 2009-2020 The OpenSSL Project Authors. All Rights Reserved. > > +; > > +; Licensed under the OpenSSL license (the "License"). You may not use > > +; this file except in compliance with the License. 
You can obtain a copy > > +; in the file LICENSE in the source distribution or at > > +; https://www.openssl.org/source/license.html > > + > > +default rel > > +%define XMMWORD > > +%define YMMWORD > > +%define ZMMWORD > > +section .text code align=64 > > + > > +EXTERN OPENSSL_ia32cap_P > > +global aesni_encrypt > > + > > +ALIGN 16 > > +aesni_encrypt: > > + > > + movups xmm2,XMMWORD[rcx] > > + mov eax,DWORD[240+r8] > > + movups xmm0,XMMWORD[r8] > > + movups xmm1,XMMWORD[16+r8] > > + lea r8,[32+r8] > > + xorps xmm2,xmm0 > > +$L$oop_enc1_1: > > +DB 102,15,56,220,209 > > + dec eax > > + movups xmm1,XMMWORD[r8] > > + lea r8,[16+r8] > > + jnz NEAR $L$oop_enc1_1 > > +DB 102,15,56,221,209 > > + pxor xmm0,xmm0 > > + pxor xmm1,xmm1 > > + movups XMMWORD[rdx],xmm2 > > + pxor xmm2,xmm2 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +global aesni_decrypt > > + > > +ALIGN 16 > > +aesni_decrypt: > > + > > + movups xmm2,XMMWORD[rcx] > > + mov eax,DWORD[240+r8] > > + movups xmm0,XMMWORD[r8] > > + movups xmm1,XMMWORD[16+r8] > > + lea r8,[32+r8] > > + xorps xmm2,xmm0 > > +$L$oop_dec1_2: > > +DB 102,15,56,222,209 > > + dec eax > > + movups xmm1,XMMWORD[r8] > > + lea r8,[16+r8] > > + jnz NEAR $L$oop_dec1_2 > > +DB 102,15,56,223,209 > > + pxor xmm0,xmm0 > > + pxor xmm1,xmm1 > > + movups XMMWORD[rdx],xmm2 > > + pxor xmm2,xmm2 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +ALIGN 16 > > +_aesni_encrypt2: > > + > > + movups xmm0,XMMWORD[rcx] > > + shl eax,4 > > + movups xmm1,XMMWORD[16+rcx] > > + xorps xmm2,xmm0 > > + xorps xmm3,xmm0 > > + movups xmm0,XMMWORD[32+rcx] > > + lea rcx,[32+rax*1+rcx] > > + neg rax > > + add rax,16 > > + > > +$L$enc_loop2: > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > + movups xmm1,XMMWORD[rax*1+rcx] > > + add rax,32 > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > > + jnz NEAR $L$enc_loop2 > > + > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,221,208 > > +DB 102,15,56,221,216 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +ALIGN 16 > > +_aesni_decrypt2: > > + > > + movups xmm0,XMMWORD[rcx] > > + shl eax,4 > > + movups xmm1,XMMWORD[16+rcx] > > + xorps xmm2,xmm0 > > + xorps xmm3,xmm0 > > + movups xmm0,XMMWORD[32+rcx] > > + lea rcx,[32+rax*1+rcx] > > + neg rax > > + add rax,16 > > + > > +$L$dec_loop2: > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > + movups xmm1,XMMWORD[rax*1+rcx] > > + add rax,32 > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > > + jnz NEAR $L$dec_loop2 > > + > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,223,208 > > +DB 102,15,56,223,216 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +ALIGN 16 > > +_aesni_encrypt3: > > + > > + movups xmm0,XMMWORD[rcx] > > + shl eax,4 > > + movups xmm1,XMMWORD[16+rcx] > > + xorps xmm2,xmm0 > > + xorps xmm3,xmm0 > > + xorps xmm4,xmm0 > > + movups xmm0,XMMWORD[32+rcx] > > + lea rcx,[32+rax*1+rcx] > > + neg rax > > + add rax,16 > > + > > +$L$enc_loop3: > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > + movups xmm1,XMMWORD[rax*1+rcx] > > + add rax,32 > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > +DB 102,15,56,220,224 > > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > > + jnz NEAR $L$enc_loop3 > > + > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +DB 102,15,56,221,208 > > +DB 102,15,56,221,216 > > +DB 102,15,56,221,224 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +ALIGN 16 > > +_aesni_decrypt3: > > 
+ > > + movups xmm0,XMMWORD[rcx] > > + shl eax,4 > > + movups xmm1,XMMWORD[16+rcx] > > + xorps xmm2,xmm0 > > + xorps xmm3,xmm0 > > + xorps xmm4,xmm0 > > + movups xmm0,XMMWORD[32+rcx] > > + lea rcx,[32+rax*1+rcx] > > + neg rax > > + add rax,16 > > + > > +$L$dec_loop3: > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > + movups xmm1,XMMWORD[rax*1+rcx] > > + add rax,32 > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > > + jnz NEAR $L$dec_loop3 > > + > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,223,208 > > +DB 102,15,56,223,216 > > +DB 102,15,56,223,224 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +ALIGN 16 > > +_aesni_encrypt4: > > + > > + movups xmm0,XMMWORD[rcx] > > + shl eax,4 > > + movups xmm1,XMMWORD[16+rcx] > > + xorps xmm2,xmm0 > > + xorps xmm3,xmm0 > > + xorps xmm4,xmm0 > > + xorps xmm5,xmm0 > > + movups xmm0,XMMWORD[32+rcx] > > + lea rcx,[32+rax*1+rcx] > > + neg rax > > +DB 0x0f,0x1f,0x00 > > + add rax,16 > > + > > +$L$enc_loop4: > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > + movups xmm1,XMMWORD[rax*1+rcx] > > + add rax,32 > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > > + jnz NEAR $L$enc_loop4 > > + > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > +DB 102,15,56,221,208 > > +DB 102,15,56,221,216 > > +DB 102,15,56,221,224 > > +DB 102,15,56,221,232 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +ALIGN 16 > > +_aesni_decrypt4: > > + > > + movups xmm0,XMMWORD[rcx] > > + shl eax,4 > > + movups xmm1,XMMWORD[16+rcx] > > + xorps xmm2,xmm0 > > + xorps xmm3,xmm0 > > + xorps xmm4,xmm0 > > + xorps xmm5,xmm0 > > + movups xmm0,XMMWORD[32+rcx] > > + lea rcx,[32+rax*1+rcx] > > + neg rax > > +DB 0x0f,0x1f,0x00 > > + add rax,16 > > + > > +$L$dec_loop4: > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > + movups xmm1,XMMWORD[rax*1+rcx] > > + add rax,32 > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > > + jnz NEAR $L$dec_loop4 > > + > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > +DB 102,15,56,223,208 > > +DB 102,15,56,223,216 > > +DB 102,15,56,223,224 > > +DB 102,15,56,223,232 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +ALIGN 16 > > +_aesni_encrypt6: > > + > > + movups xmm0,XMMWORD[rcx] > > + shl eax,4 > > + movups xmm1,XMMWORD[16+rcx] > > + xorps xmm2,xmm0 > > + pxor xmm3,xmm0 > > + pxor xmm4,xmm0 > > +DB 102,15,56,220,209 > > + lea rcx,[32+rax*1+rcx] > > + neg rax > > +DB 102,15,56,220,217 > > + pxor xmm5,xmm0 > > + pxor xmm6,xmm0 > > +DB 102,15,56,220,225 > > + pxor xmm7,xmm0 > > + movups xmm0,XMMWORD[rax*1+rcx] > > + add rax,16 > > + jmp NEAR $L$enc_loop6_enter > > +ALIGN 16 > > +$L$enc_loop6: > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +$L$enc_loop6_enter: > > +DB 102,15,56,220,233 > > +DB 102,15,56,220,241 > > +DB 102,15,56,220,249 > > + movups xmm1,XMMWORD[rax*1+rcx] > > + add rax,32 > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > +DB 102,15,56,220,240 > > +DB 102,15,56,220,248 > > + movups 
xmm0,XMMWORD[((-16))+rax*1+rcx] > > + jnz NEAR $L$enc_loop6 > > + > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > +DB 102,15,56,220,241 > > +DB 102,15,56,220,249 > > +DB 102,15,56,221,208 > > +DB 102,15,56,221,216 > > +DB 102,15,56,221,224 > > +DB 102,15,56,221,232 > > +DB 102,15,56,221,240 > > +DB 102,15,56,221,248 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +ALIGN 16 > > +_aesni_decrypt6: > > + > > + movups xmm0,XMMWORD[rcx] > > + shl eax,4 > > + movups xmm1,XMMWORD[16+rcx] > > + xorps xmm2,xmm0 > > + pxor xmm3,xmm0 > > + pxor xmm4,xmm0 > > +DB 102,15,56,222,209 > > + lea rcx,[32+rax*1+rcx] > > + neg rax > > +DB 102,15,56,222,217 > > + pxor xmm5,xmm0 > > + pxor xmm6,xmm0 > > +DB 102,15,56,222,225 > > + pxor xmm7,xmm0 > > + movups xmm0,XMMWORD[rax*1+rcx] > > + add rax,16 > > + jmp NEAR $L$dec_loop6_enter > > +ALIGN 16 > > +$L$dec_loop6: > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +$L$dec_loop6_enter: > > +DB 102,15,56,222,233 > > +DB 102,15,56,222,241 > > +DB 102,15,56,222,249 > > + movups xmm1,XMMWORD[rax*1+rcx] > > + add rax,32 > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > +DB 102,15,56,222,240 > > +DB 102,15,56,222,248 > > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > > + jnz NEAR $L$dec_loop6 > > + > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > +DB 102,15,56,222,241 > > +DB 102,15,56,222,249 > > +DB 102,15,56,223,208 > > +DB 102,15,56,223,216 > > +DB 102,15,56,223,224 > > +DB 102,15,56,223,232 > > +DB 102,15,56,223,240 > > +DB 102,15,56,223,248 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +ALIGN 16 > > +_aesni_encrypt8: > > + > > + movups xmm0,XMMWORD[rcx] > > + shl eax,4 > > + movups xmm1,XMMWORD[16+rcx] > > + xorps xmm2,xmm0 > > + xorps xmm3,xmm0 > > + pxor xmm4,xmm0 > > + pxor xmm5,xmm0 > > + pxor xmm6,xmm0 > > + lea rcx,[32+rax*1+rcx] > > + neg rax > > +DB 102,15,56,220,209 > > + pxor xmm7,xmm0 > > + pxor xmm8,xmm0 > > +DB 102,15,56,220,217 > > + pxor xmm9,xmm0 > > + movups xmm0,XMMWORD[rax*1+rcx] > > + add rax,16 > > + jmp NEAR $L$enc_loop8_inner > > +ALIGN 16 > > +$L$enc_loop8: > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +$L$enc_loop8_inner: > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > +DB 102,15,56,220,241 > > +DB 102,15,56,220,249 > > +DB 102,68,15,56,220,193 > > +DB 102,68,15,56,220,201 > > +$L$enc_loop8_enter: > > + movups xmm1,XMMWORD[rax*1+rcx] > > + add rax,32 > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > +DB 102,15,56,220,240 > > +DB 102,15,56,220,248 > > +DB 102,68,15,56,220,192 > > +DB 102,68,15,56,220,200 > > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > > + jnz NEAR $L$enc_loop8 > > + > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > +DB 102,15,56,220,241 > > +DB 102,15,56,220,249 > > +DB 102,68,15,56,220,193 > > +DB 102,68,15,56,220,201 > > +DB 102,15,56,221,208 > > +DB 102,15,56,221,216 > > +DB 102,15,56,221,224 > > +DB 102,15,56,221,232 > > +DB 102,15,56,221,240 > > +DB 102,15,56,221,248 > > +DB 102,68,15,56,221,192 > > +DB 102,68,15,56,221,200 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +ALIGN 16 > > +_aesni_decrypt8: > > + > > + movups xmm0,XMMWORD[rcx] > > + shl eax,4 > > + movups xmm1,XMMWORD[16+rcx] > > + xorps xmm2,xmm0 > > + xorps xmm3,xmm0 > > + pxor xmm4,xmm0 > > + pxor xmm5,xmm0 > > + pxor 
xmm6,xmm0 > > + lea rcx,[32+rax*1+rcx] > > + neg rax > > +DB 102,15,56,222,209 > > + pxor xmm7,xmm0 > > + pxor xmm8,xmm0 > > +DB 102,15,56,222,217 > > + pxor xmm9,xmm0 > > + movups xmm0,XMMWORD[rax*1+rcx] > > + add rax,16 > > + jmp NEAR $L$dec_loop8_inner > > +ALIGN 16 > > +$L$dec_loop8: > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +$L$dec_loop8_inner: > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > +DB 102,15,56,222,241 > > +DB 102,15,56,222,249 > > +DB 102,68,15,56,222,193 > > +DB 102,68,15,56,222,201 > > +$L$dec_loop8_enter: > > + movups xmm1,XMMWORD[rax*1+rcx] > > + add rax,32 > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > +DB 102,15,56,222,240 > > +DB 102,15,56,222,248 > > +DB 102,68,15,56,222,192 > > +DB 102,68,15,56,222,200 > > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > > + jnz NEAR $L$dec_loop8 > > + > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > +DB 102,15,56,222,241 > > +DB 102,15,56,222,249 > > +DB 102,68,15,56,222,193 > > +DB 102,68,15,56,222,201 > > +DB 102,15,56,223,208 > > +DB 102,15,56,223,216 > > +DB 102,15,56,223,224 > > +DB 102,15,56,223,232 > > +DB 102,15,56,223,240 > > +DB 102,15,56,223,248 > > +DB 102,68,15,56,223,192 > > +DB 102,68,15,56,223,200 > > + DB 0F3h,0C3h ;repret > > + > > + > > +global aesni_ecb_encrypt > > + > > +ALIGN 16 > > +aesni_ecb_encrypt: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_aesni_ecb_encrypt: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + mov rcx,r9 > > + mov r8,QWORD[40+rsp] > > + > > + > > + > > + lea rsp,[((-88))+rsp] > > + movaps XMMWORD[rsp],xmm6 > > + movaps XMMWORD[16+rsp],xmm7 > > + movaps XMMWORD[32+rsp],xmm8 > > + movaps XMMWORD[48+rsp],xmm9 > > +$L$ecb_enc_body: > > + and rdx,-16 > > + jz NEAR $L$ecb_ret > > + > > + mov eax,DWORD[240+rcx] > > + movups xmm0,XMMWORD[rcx] > > + mov r11,rcx > > + mov r10d,eax > > + test r8d,r8d > > + jz NEAR $L$ecb_decrypt > > + > > + cmp rdx,0x80 > > + jb NEAR $L$ecb_enc_tail > > + > > + movdqu xmm2,XMMWORD[rdi] > > + movdqu xmm3,XMMWORD[16+rdi] > > + movdqu xmm4,XMMWORD[32+rdi] > > + movdqu xmm5,XMMWORD[48+rdi] > > + movdqu xmm6,XMMWORD[64+rdi] > > + movdqu xmm7,XMMWORD[80+rdi] > > + movdqu xmm8,XMMWORD[96+rdi] > > + movdqu xmm9,XMMWORD[112+rdi] > > + lea rdi,[128+rdi] > > + sub rdx,0x80 > > + jmp NEAR $L$ecb_enc_loop8_enter > > +ALIGN 16 > > +$L$ecb_enc_loop8: > > + movups XMMWORD[rsi],xmm2 > > + mov rcx,r11 > > + movdqu xmm2,XMMWORD[rdi] > > + mov eax,r10d > > + movups XMMWORD[16+rsi],xmm3 > > + movdqu xmm3,XMMWORD[16+rdi] > > + movups XMMWORD[32+rsi],xmm4 > > + movdqu xmm4,XMMWORD[32+rdi] > > + movups XMMWORD[48+rsi],xmm5 > > + movdqu xmm5,XMMWORD[48+rdi] > > + movups XMMWORD[64+rsi],xmm6 > > + movdqu xmm6,XMMWORD[64+rdi] > > + movups XMMWORD[80+rsi],xmm7 > > + movdqu xmm7,XMMWORD[80+rdi] > > + movups XMMWORD[96+rsi],xmm8 > > + movdqu xmm8,XMMWORD[96+rdi] > > + movups XMMWORD[112+rsi],xmm9 > > + lea rsi,[128+rsi] > > + movdqu xmm9,XMMWORD[112+rdi] > > + lea rdi,[128+rdi] > > +$L$ecb_enc_loop8_enter: > > + > > + call _aesni_encrypt8 > > + > > + sub rdx,0x80 > > + jnc NEAR $L$ecb_enc_loop8 > > + > > + movups XMMWORD[rsi],xmm2 > > + mov rcx,r11 > > + movups XMMWORD[16+rsi],xmm3 > > + mov eax,r10d > > + movups XMMWORD[32+rsi],xmm4 > > + movups XMMWORD[48+rsi],xmm5 > > + movups XMMWORD[64+rsi],xmm6 > > + movups XMMWORD[80+rsi],xmm7 > > + movups XMMWORD[96+rsi],xmm8 > > + movups 
XMMWORD[112+rsi],xmm9 > > + lea rsi,[128+rsi] > > + add rdx,0x80 > > + jz NEAR $L$ecb_ret > > + > > +$L$ecb_enc_tail: > > + movups xmm2,XMMWORD[rdi] > > + cmp rdx,0x20 > > + jb NEAR $L$ecb_enc_one > > + movups xmm3,XMMWORD[16+rdi] > > + je NEAR $L$ecb_enc_two > > + movups xmm4,XMMWORD[32+rdi] > > + cmp rdx,0x40 > > + jb NEAR $L$ecb_enc_three > > + movups xmm5,XMMWORD[48+rdi] > > + je NEAR $L$ecb_enc_four > > + movups xmm6,XMMWORD[64+rdi] > > + cmp rdx,0x60 > > + jb NEAR $L$ecb_enc_five > > + movups xmm7,XMMWORD[80+rdi] > > + je NEAR $L$ecb_enc_six > > + movdqu xmm8,XMMWORD[96+rdi] > > + xorps xmm9,xmm9 > > + call _aesni_encrypt8 > > + movups XMMWORD[rsi],xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + movups XMMWORD[48+rsi],xmm5 > > + movups XMMWORD[64+rsi],xmm6 > > + movups XMMWORD[80+rsi],xmm7 > > + movups XMMWORD[96+rsi],xmm8 > > + jmp NEAR $L$ecb_ret > > +ALIGN 16 > > +$L$ecb_enc_one: > > + movups xmm0,XMMWORD[rcx] > > + movups xmm1,XMMWORD[16+rcx] > > + lea rcx,[32+rcx] > > + xorps xmm2,xmm0 > > +$L$oop_enc1_3: > > +DB 102,15,56,220,209 > > + dec eax > > + movups xmm1,XMMWORD[rcx] > > + lea rcx,[16+rcx] > > + jnz NEAR $L$oop_enc1_3 > > +DB 102,15,56,221,209 > > + movups XMMWORD[rsi],xmm2 > > + jmp NEAR $L$ecb_ret > > +ALIGN 16 > > +$L$ecb_enc_two: > > + call _aesni_encrypt2 > > + movups XMMWORD[rsi],xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + jmp NEAR $L$ecb_ret > > +ALIGN 16 > > +$L$ecb_enc_three: > > + call _aesni_encrypt3 > > + movups XMMWORD[rsi],xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + jmp NEAR $L$ecb_ret > > +ALIGN 16 > > +$L$ecb_enc_four: > > + call _aesni_encrypt4 > > + movups XMMWORD[rsi],xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + movups XMMWORD[48+rsi],xmm5 > > + jmp NEAR $L$ecb_ret > > +ALIGN 16 > > +$L$ecb_enc_five: > > + xorps xmm7,xmm7 > > + call _aesni_encrypt6 > > + movups XMMWORD[rsi],xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + movups XMMWORD[48+rsi],xmm5 > > + movups XMMWORD[64+rsi],xmm6 > > + jmp NEAR $L$ecb_ret > > +ALIGN 16 > > +$L$ecb_enc_six: > > + call _aesni_encrypt6 > > + movups XMMWORD[rsi],xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + movups XMMWORD[48+rsi],xmm5 > > + movups XMMWORD[64+rsi],xmm6 > > + movups XMMWORD[80+rsi],xmm7 > > + jmp NEAR $L$ecb_ret > > + > > +ALIGN 16 > > +$L$ecb_decrypt: > > + cmp rdx,0x80 > > + jb NEAR $L$ecb_dec_tail > > + > > + movdqu xmm2,XMMWORD[rdi] > > + movdqu xmm3,XMMWORD[16+rdi] > > + movdqu xmm4,XMMWORD[32+rdi] > > + movdqu xmm5,XMMWORD[48+rdi] > > + movdqu xmm6,XMMWORD[64+rdi] > > + movdqu xmm7,XMMWORD[80+rdi] > > + movdqu xmm8,XMMWORD[96+rdi] > > + movdqu xmm9,XMMWORD[112+rdi] > > + lea rdi,[128+rdi] > > + sub rdx,0x80 > > + jmp NEAR $L$ecb_dec_loop8_enter > > +ALIGN 16 > > +$L$ecb_dec_loop8: > > + movups XMMWORD[rsi],xmm2 > > + mov rcx,r11 > > + movdqu xmm2,XMMWORD[rdi] > > + mov eax,r10d > > + movups XMMWORD[16+rsi],xmm3 > > + movdqu xmm3,XMMWORD[16+rdi] > > + movups XMMWORD[32+rsi],xmm4 > > + movdqu xmm4,XMMWORD[32+rdi] > > + movups XMMWORD[48+rsi],xmm5 > > + movdqu xmm5,XMMWORD[48+rdi] > > + movups XMMWORD[64+rsi],xmm6 > > + movdqu xmm6,XMMWORD[64+rdi] > > + movups XMMWORD[80+rsi],xmm7 > > + movdqu xmm7,XMMWORD[80+rdi] > > + movups XMMWORD[96+rsi],xmm8 > > + movdqu xmm8,XMMWORD[96+rdi] > > + movups XMMWORD[112+rsi],xmm9 > > + lea rsi,[128+rsi] > > + movdqu xmm9,XMMWORD[112+rdi] > > + lea rdi,[128+rdi] > > +$L$ecb_dec_loop8_enter: > > + > > + call 
_aesni_decrypt8 > > + > > + movups xmm0,XMMWORD[r11] > > + sub rdx,0x80 > > + jnc NEAR $L$ecb_dec_loop8 > > + > > + movups XMMWORD[rsi],xmm2 > > + pxor xmm2,xmm2 > > + mov rcx,r11 > > + movups XMMWORD[16+rsi],xmm3 > > + pxor xmm3,xmm3 > > + mov eax,r10d > > + movups XMMWORD[32+rsi],xmm4 > > + pxor xmm4,xmm4 > > + movups XMMWORD[48+rsi],xmm5 > > + pxor xmm5,xmm5 > > + movups XMMWORD[64+rsi],xmm6 > > + pxor xmm6,xmm6 > > + movups XMMWORD[80+rsi],xmm7 > > + pxor xmm7,xmm7 > > + movups XMMWORD[96+rsi],xmm8 > > + pxor xmm8,xmm8 > > + movups XMMWORD[112+rsi],xmm9 > > + pxor xmm9,xmm9 > > + lea rsi,[128+rsi] > > + add rdx,0x80 > > + jz NEAR $L$ecb_ret > > + > > +$L$ecb_dec_tail: > > + movups xmm2,XMMWORD[rdi] > > + cmp rdx,0x20 > > + jb NEAR $L$ecb_dec_one > > + movups xmm3,XMMWORD[16+rdi] > > + je NEAR $L$ecb_dec_two > > + movups xmm4,XMMWORD[32+rdi] > > + cmp rdx,0x40 > > + jb NEAR $L$ecb_dec_three > > + movups xmm5,XMMWORD[48+rdi] > > + je NEAR $L$ecb_dec_four > > + movups xmm6,XMMWORD[64+rdi] > > + cmp rdx,0x60 > > + jb NEAR $L$ecb_dec_five > > + movups xmm7,XMMWORD[80+rdi] > > + je NEAR $L$ecb_dec_six > > + movups xmm8,XMMWORD[96+rdi] > > + movups xmm0,XMMWORD[rcx] > > + xorps xmm9,xmm9 > > + call _aesni_decrypt8 > > + movups XMMWORD[rsi],xmm2 > > + pxor xmm2,xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + pxor xmm3,xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + pxor xmm4,xmm4 > > + movups XMMWORD[48+rsi],xmm5 > > + pxor xmm5,xmm5 > > + movups XMMWORD[64+rsi],xmm6 > > + pxor xmm6,xmm6 > > + movups XMMWORD[80+rsi],xmm7 > > + pxor xmm7,xmm7 > > + movups XMMWORD[96+rsi],xmm8 > > + pxor xmm8,xmm8 > > + pxor xmm9,xmm9 > > + jmp NEAR $L$ecb_ret > > +ALIGN 16 > > +$L$ecb_dec_one: > > + movups xmm0,XMMWORD[rcx] > > + movups xmm1,XMMWORD[16+rcx] > > + lea rcx,[32+rcx] > > + xorps xmm2,xmm0 > > +$L$oop_dec1_4: > > +DB 102,15,56,222,209 > > + dec eax > > + movups xmm1,XMMWORD[rcx] > > + lea rcx,[16+rcx] > > + jnz NEAR $L$oop_dec1_4 > > +DB 102,15,56,223,209 > > + movups XMMWORD[rsi],xmm2 > > + pxor xmm2,xmm2 > > + jmp NEAR $L$ecb_ret > > +ALIGN 16 > > +$L$ecb_dec_two: > > + call _aesni_decrypt2 > > + movups XMMWORD[rsi],xmm2 > > + pxor xmm2,xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + pxor xmm3,xmm3 > > + jmp NEAR $L$ecb_ret > > +ALIGN 16 > > +$L$ecb_dec_three: > > + call _aesni_decrypt3 > > + movups XMMWORD[rsi],xmm2 > > + pxor xmm2,xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + pxor xmm3,xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + pxor xmm4,xmm4 > > + jmp NEAR $L$ecb_ret > > +ALIGN 16 > > +$L$ecb_dec_four: > > + call _aesni_decrypt4 > > + movups XMMWORD[rsi],xmm2 > > + pxor xmm2,xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + pxor xmm3,xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + pxor xmm4,xmm4 > > + movups XMMWORD[48+rsi],xmm5 > > + pxor xmm5,xmm5 > > + jmp NEAR $L$ecb_ret > > +ALIGN 16 > > +$L$ecb_dec_five: > > + xorps xmm7,xmm7 > > + call _aesni_decrypt6 > > + movups XMMWORD[rsi],xmm2 > > + pxor xmm2,xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + pxor xmm3,xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + pxor xmm4,xmm4 > > + movups XMMWORD[48+rsi],xmm5 > > + pxor xmm5,xmm5 > > + movups XMMWORD[64+rsi],xmm6 > > + pxor xmm6,xmm6 > > + pxor xmm7,xmm7 > > + jmp NEAR $L$ecb_ret > > +ALIGN 16 > > +$L$ecb_dec_six: > > + call _aesni_decrypt6 > > + movups XMMWORD[rsi],xmm2 > > + pxor xmm2,xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + pxor xmm3,xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + pxor xmm4,xmm4 > > + movups XMMWORD[48+rsi],xmm5 > > + pxor xmm5,xmm5 > > + movups XMMWORD[64+rsi],xmm6 > > + pxor xmm6,xmm6 > > + 
movups XMMWORD[80+rsi],xmm7 > > + pxor xmm7,xmm7 > > + > > +$L$ecb_ret: > > + xorps xmm0,xmm0 > > + pxor xmm1,xmm1 > > + movaps xmm6,XMMWORD[rsp] > > + movaps XMMWORD[rsp],xmm0 > > + movaps xmm7,XMMWORD[16+rsp] > > + movaps XMMWORD[16+rsp],xmm0 > > + movaps xmm8,XMMWORD[32+rsp] > > + movaps XMMWORD[32+rsp],xmm0 > > + movaps xmm9,XMMWORD[48+rsp] > > + movaps XMMWORD[48+rsp],xmm0 > > + lea rsp,[88+rsp] > > +$L$ecb_enc_ret: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_aesni_ecb_encrypt: > > +global aesni_ccm64_encrypt_blocks > > + > > +ALIGN 16 > > +aesni_ccm64_encrypt_blocks: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_aesni_ccm64_encrypt_blocks: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + mov rcx,r9 > > + mov r8,QWORD[40+rsp] > > + mov r9,QWORD[48+rsp] > > + > > + > > + > > + lea rsp,[((-88))+rsp] > > + movaps XMMWORD[rsp],xmm6 > > + movaps XMMWORD[16+rsp],xmm7 > > + movaps XMMWORD[32+rsp],xmm8 > > + movaps XMMWORD[48+rsp],xmm9 > > +$L$ccm64_enc_body: > > + mov eax,DWORD[240+rcx] > > + movdqu xmm6,XMMWORD[r8] > > + movdqa xmm9,XMMWORD[$L$increment64] > > + movdqa xmm7,XMMWORD[$L$bswap_mask] > > + > > + shl eax,4 > > + mov r10d,16 > > + lea r11,[rcx] > > + movdqu xmm3,XMMWORD[r9] > > + movdqa xmm2,xmm6 > > + lea rcx,[32+rax*1+rcx] > > +DB 102,15,56,0,247 > > + sub r10,rax > > + jmp NEAR $L$ccm64_enc_outer > > +ALIGN 16 > > +$L$ccm64_enc_outer: > > + movups xmm0,XMMWORD[r11] > > + mov rax,r10 > > + movups xmm8,XMMWORD[rdi] > > + > > + xorps xmm2,xmm0 > > + movups xmm1,XMMWORD[16+r11] > > + xorps xmm0,xmm8 > > + xorps xmm3,xmm0 > > + movups xmm0,XMMWORD[32+r11] > > + > > +$L$ccm64_enc2_loop: > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > + movups xmm1,XMMWORD[rax*1+rcx] > > + add rax,32 > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > > + jnz NEAR $L$ccm64_enc2_loop > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > + paddq xmm6,xmm9 > > + dec rdx > > +DB 102,15,56,221,208 > > +DB 102,15,56,221,216 > > + > > + lea rdi,[16+rdi] > > + xorps xmm8,xmm2 > > + movdqa xmm2,xmm6 > > + movups XMMWORD[rsi],xmm8 > > +DB 102,15,56,0,215 > > + lea rsi,[16+rsi] > > + jnz NEAR $L$ccm64_enc_outer > > + > > + pxor xmm0,xmm0 > > + pxor xmm1,xmm1 > > + pxor xmm2,xmm2 > > + movups XMMWORD[r9],xmm3 > > + pxor xmm3,xmm3 > > + pxor xmm8,xmm8 > > + pxor xmm6,xmm6 > > + movaps xmm6,XMMWORD[rsp] > > + movaps XMMWORD[rsp],xmm0 > > + movaps xmm7,XMMWORD[16+rsp] > > + movaps XMMWORD[16+rsp],xmm0 > > + movaps xmm8,XMMWORD[32+rsp] > > + movaps XMMWORD[32+rsp],xmm0 > > + movaps xmm9,XMMWORD[48+rsp] > > + movaps XMMWORD[48+rsp],xmm0 > > + lea rsp,[88+rsp] > > +$L$ccm64_enc_ret: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_aesni_ccm64_encrypt_blocks: > > +global aesni_ccm64_decrypt_blocks > > + > > +ALIGN 16 > > +aesni_ccm64_decrypt_blocks: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_aesni_ccm64_decrypt_blocks: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + mov rcx,r9 > > + mov r8,QWORD[40+rsp] > > + mov r9,QWORD[48+rsp] > > + > > + > > + > > + lea rsp,[((-88))+rsp] > > + movaps XMMWORD[rsp],xmm6 > > + movaps XMMWORD[16+rsp],xmm7 > > + movaps XMMWORD[32+rsp],xmm8 > > + movaps XMMWORD[48+rsp],xmm9 > > +$L$ccm64_dec_body: > > + mov eax,DWORD[240+rcx] > > + movups 
xmm6,XMMWORD[r8] > > + movdqu xmm3,XMMWORD[r9] > > + movdqa xmm9,XMMWORD[$L$increment64] > > + movdqa xmm7,XMMWORD[$L$bswap_mask] > > + > > + movaps xmm2,xmm6 > > + mov r10d,eax > > + mov r11,rcx > > +DB 102,15,56,0,247 > > + movups xmm0,XMMWORD[rcx] > > + movups xmm1,XMMWORD[16+rcx] > > + lea rcx,[32+rcx] > > + xorps xmm2,xmm0 > > +$L$oop_enc1_5: > > +DB 102,15,56,220,209 > > + dec eax > > + movups xmm1,XMMWORD[rcx] > > + lea rcx,[16+rcx] > > + jnz NEAR $L$oop_enc1_5 > > +DB 102,15,56,221,209 > > + shl r10d,4 > > + mov eax,16 > > + movups xmm8,XMMWORD[rdi] > > + paddq xmm6,xmm9 > > + lea rdi,[16+rdi] > > + sub rax,r10 > > + lea rcx,[32+r10*1+r11] > > + mov r10,rax > > + jmp NEAR $L$ccm64_dec_outer > > +ALIGN 16 > > +$L$ccm64_dec_outer: > > + xorps xmm8,xmm2 > > + movdqa xmm2,xmm6 > > + movups XMMWORD[rsi],xmm8 > > + lea rsi,[16+rsi] > > +DB 102,15,56,0,215 > > + > > + sub rdx,1 > > + jz NEAR $L$ccm64_dec_break > > + > > + movups xmm0,XMMWORD[r11] > > + mov rax,r10 > > + movups xmm1,XMMWORD[16+r11] > > + xorps xmm8,xmm0 > > + xorps xmm2,xmm0 > > + xorps xmm3,xmm8 > > + movups xmm0,XMMWORD[32+r11] > > + jmp NEAR $L$ccm64_dec2_loop > > +ALIGN 16 > > +$L$ccm64_dec2_loop: > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > + movups xmm1,XMMWORD[rax*1+rcx] > > + add rax,32 > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > > + jnz NEAR $L$ccm64_dec2_loop > > + movups xmm8,XMMWORD[rdi] > > + paddq xmm6,xmm9 > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,221,208 > > +DB 102,15,56,221,216 > > + lea rdi,[16+rdi] > > + jmp NEAR $L$ccm64_dec_outer > > + > > +ALIGN 16 > > +$L$ccm64_dec_break: > > + > > + mov eax,DWORD[240+r11] > > + movups xmm0,XMMWORD[r11] > > + movups xmm1,XMMWORD[16+r11] > > + xorps xmm8,xmm0 > > + lea r11,[32+r11] > > + xorps xmm3,xmm8 > > +$L$oop_enc1_6: > > +DB 102,15,56,220,217 > > + dec eax > > + movups xmm1,XMMWORD[r11] > > + lea r11,[16+r11] > > + jnz NEAR $L$oop_enc1_6 > > +DB 102,15,56,221,217 > > + pxor xmm0,xmm0 > > + pxor xmm1,xmm1 > > + pxor xmm2,xmm2 > > + movups XMMWORD[r9],xmm3 > > + pxor xmm3,xmm3 > > + pxor xmm8,xmm8 > > + pxor xmm6,xmm6 > > + movaps xmm6,XMMWORD[rsp] > > + movaps XMMWORD[rsp],xmm0 > > + movaps xmm7,XMMWORD[16+rsp] > > + movaps XMMWORD[16+rsp],xmm0 > > + movaps xmm8,XMMWORD[32+rsp] > > + movaps XMMWORD[32+rsp],xmm0 > > + movaps xmm9,XMMWORD[48+rsp] > > + movaps XMMWORD[48+rsp],xmm0 > > + lea rsp,[88+rsp] > > +$L$ccm64_dec_ret: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_aesni_ccm64_decrypt_blocks: > > +global aesni_ctr32_encrypt_blocks > > + > > +ALIGN 16 > > +aesni_ctr32_encrypt_blocks: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_aesni_ctr32_encrypt_blocks: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + mov rcx,r9 > > + mov r8,QWORD[40+rsp] > > + > > + > > + > > + cmp rdx,1 > > + jne NEAR $L$ctr32_bulk > > + > > + > > + > > + movups xmm2,XMMWORD[r8] > > + movups xmm3,XMMWORD[rdi] > > + mov edx,DWORD[240+rcx] > > + movups xmm0,XMMWORD[rcx] > > + movups xmm1,XMMWORD[16+rcx] > > + lea rcx,[32+rcx] > > + xorps xmm2,xmm0 > > +$L$oop_enc1_7: > > +DB 102,15,56,220,209 > > + dec edx > > + movups xmm1,XMMWORD[rcx] > > + lea rcx,[16+rcx] > > + jnz NEAR $L$oop_enc1_7 > > +DB 102,15,56,221,209 > > + pxor xmm0,xmm0 > > + pxor xmm1,xmm1 > > + xorps xmm2,xmm3 > > + pxor xmm3,xmm3 > > + movups XMMWORD[rsi],xmm2 > > + xorps xmm2,xmm2 > > + 
jmp NEAR $L$ctr32_epilogue > > + > > +ALIGN 16 > > +$L$ctr32_bulk: > > + lea r11,[rsp] > > + > > + push rbp > > + > > + sub rsp,288 > > + and rsp,-16 > > + movaps XMMWORD[(-168)+r11],xmm6 > > + movaps XMMWORD[(-152)+r11],xmm7 > > + movaps XMMWORD[(-136)+r11],xmm8 > > + movaps XMMWORD[(-120)+r11],xmm9 > > + movaps XMMWORD[(-104)+r11],xmm10 > > + movaps XMMWORD[(-88)+r11],xmm11 > > + movaps XMMWORD[(-72)+r11],xmm12 > > + movaps XMMWORD[(-56)+r11],xmm13 > > + movaps XMMWORD[(-40)+r11],xmm14 > > + movaps XMMWORD[(-24)+r11],xmm15 > > +$L$ctr32_body: > > + > > + > > + > > + > > + movdqu xmm2,XMMWORD[r8] > > + movdqu xmm0,XMMWORD[rcx] > > + mov r8d,DWORD[12+r8] > > + pxor xmm2,xmm0 > > + mov ebp,DWORD[12+rcx] > > + movdqa XMMWORD[rsp],xmm2 > > + bswap r8d > > + movdqa xmm3,xmm2 > > + movdqa xmm4,xmm2 > > + movdqa xmm5,xmm2 > > + movdqa XMMWORD[64+rsp],xmm2 > > + movdqa XMMWORD[80+rsp],xmm2 > > + movdqa XMMWORD[96+rsp],xmm2 > > + mov r10,rdx > > + movdqa XMMWORD[112+rsp],xmm2 > > + > > + lea rax,[1+r8] > > + lea rdx,[2+r8] > > + bswap eax > > + bswap edx > > + xor eax,ebp > > + xor edx,ebp > > +DB 102,15,58,34,216,3 > > + lea rax,[3+r8] > > + movdqa XMMWORD[16+rsp],xmm3 > > +DB 102,15,58,34,226,3 > > + bswap eax > > + mov rdx,r10 > > + lea r10,[4+r8] > > + movdqa XMMWORD[32+rsp],xmm4 > > + xor eax,ebp > > + bswap r10d > > +DB 102,15,58,34,232,3 > > + xor r10d,ebp > > + movdqa XMMWORD[48+rsp],xmm5 > > + lea r9,[5+r8] > > + mov DWORD[((64+12))+rsp],r10d > > + bswap r9d > > + lea r10,[6+r8] > > + mov eax,DWORD[240+rcx] > > + xor r9d,ebp > > + bswap r10d > > + mov DWORD[((80+12))+rsp],r9d > > + xor r10d,ebp > > + lea r9,[7+r8] > > + mov DWORD[((96+12))+rsp],r10d > > + bswap r9d > > + mov r10d,DWORD[((OPENSSL_ia32cap_P+4))] > > + xor r9d,ebp > > + and r10d,71303168 > > + mov DWORD[((112+12))+rsp],r9d > > + > > + movups xmm1,XMMWORD[16+rcx] > > + > > + movdqa xmm6,XMMWORD[64+rsp] > > + movdqa xmm7,XMMWORD[80+rsp] > > + > > + cmp rdx,8 > > + jb NEAR $L$ctr32_tail > > + > > + sub rdx,6 > > + cmp r10d,4194304 > > + je NEAR $L$ctr32_6x > > + > > + lea rcx,[128+rcx] > > + sub rdx,2 > > + jmp NEAR $L$ctr32_loop8 > > + > > +ALIGN 16 > > +$L$ctr32_6x: > > + shl eax,4 > > + mov r10d,48 > > + bswap ebp > > + lea rcx,[32+rax*1+rcx] > > + sub r10,rax > > + jmp NEAR $L$ctr32_loop6 > > + > > +ALIGN 16 > > +$L$ctr32_loop6: > > + add r8d,6 > > + movups xmm0,XMMWORD[((-48))+r10*1+rcx] > > +DB 102,15,56,220,209 > > + mov eax,r8d > > + xor eax,ebp > > +DB 102,15,56,220,217 > > +DB 0x0f,0x38,0xf1,0x44,0x24,12 > > + lea eax,[1+r8] > > +DB 102,15,56,220,225 > > + xor eax,ebp > > +DB 0x0f,0x38,0xf1,0x44,0x24,28 > > +DB 102,15,56,220,233 > > + lea eax,[2+r8] > > + xor eax,ebp > > +DB 102,15,56,220,241 > > +DB 0x0f,0x38,0xf1,0x44,0x24,44 > > + lea eax,[3+r8] > > +DB 102,15,56,220,249 > > + movups xmm1,XMMWORD[((-32))+r10*1+rcx] > > + xor eax,ebp > > + > > +DB 102,15,56,220,208 > > +DB 0x0f,0x38,0xf1,0x44,0x24,60 > > + lea eax,[4+r8] > > +DB 102,15,56,220,216 > > + xor eax,ebp > > +DB 0x0f,0x38,0xf1,0x44,0x24,76 > > +DB 102,15,56,220,224 > > + lea eax,[5+r8] > > + xor eax,ebp > > +DB 102,15,56,220,232 > > +DB 0x0f,0x38,0xf1,0x44,0x24,92 > > + mov rax,r10 > > +DB 102,15,56,220,240 > > +DB 102,15,56,220,248 > > + movups xmm0,XMMWORD[((-16))+r10*1+rcx] > > + > > + call $L$enc_loop6 > > + > > + movdqu xmm8,XMMWORD[rdi] > > + movdqu xmm9,XMMWORD[16+rdi] > > + movdqu xmm10,XMMWORD[32+rdi] > > + movdqu xmm11,XMMWORD[48+rdi] > > + movdqu xmm12,XMMWORD[64+rdi] > > + movdqu xmm13,XMMWORD[80+rdi] > > + lea rdi,[96+rdi] > > + movups 
xmm1,XMMWORD[((-64))+r10*1+rcx] > > + pxor xmm8,xmm2 > > + movaps xmm2,XMMWORD[rsp] > > + pxor xmm9,xmm3 > > + movaps xmm3,XMMWORD[16+rsp] > > + pxor xmm10,xmm4 > > + movaps xmm4,XMMWORD[32+rsp] > > + pxor xmm11,xmm5 > > + movaps xmm5,XMMWORD[48+rsp] > > + pxor xmm12,xmm6 > > + movaps xmm6,XMMWORD[64+rsp] > > + pxor xmm13,xmm7 > > + movaps xmm7,XMMWORD[80+rsp] > > + movdqu XMMWORD[rsi],xmm8 > > + movdqu XMMWORD[16+rsi],xmm9 > > + movdqu XMMWORD[32+rsi],xmm10 > > + movdqu XMMWORD[48+rsi],xmm11 > > + movdqu XMMWORD[64+rsi],xmm12 > > + movdqu XMMWORD[80+rsi],xmm13 > > + lea rsi,[96+rsi] > > + > > + sub rdx,6 > > + jnc NEAR $L$ctr32_loop6 > > + > > + add rdx,6 > > + jz NEAR $L$ctr32_done > > + > > + lea eax,[((-48))+r10] > > + lea rcx,[((-80))+r10*1+rcx] > > + neg eax > > + shr eax,4 > > + jmp NEAR $L$ctr32_tail > > + > > +ALIGN 32 > > +$L$ctr32_loop8: > > + add r8d,8 > > + movdqa xmm8,XMMWORD[96+rsp] > > +DB 102,15,56,220,209 > > + mov r9d,r8d > > + movdqa xmm9,XMMWORD[112+rsp] > > +DB 102,15,56,220,217 > > + bswap r9d > > + movups xmm0,XMMWORD[((32-128))+rcx] > > +DB 102,15,56,220,225 > > + xor r9d,ebp > > + nop > > +DB 102,15,56,220,233 > > + mov DWORD[((0+12))+rsp],r9d > > + lea r9,[1+r8] > > +DB 102,15,56,220,241 > > +DB 102,15,56,220,249 > > +DB 102,68,15,56,220,193 > > +DB 102,68,15,56,220,201 > > + movups xmm1,XMMWORD[((48-128))+rcx] > > + bswap r9d > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > + xor r9d,ebp > > +DB 0x66,0x90 > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > + mov DWORD[((16+12))+rsp],r9d > > + lea r9,[2+r8] > > +DB 102,15,56,220,240 > > +DB 102,15,56,220,248 > > +DB 102,68,15,56,220,192 > > +DB 102,68,15,56,220,200 > > + movups xmm0,XMMWORD[((64-128))+rcx] > > + bswap r9d > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > + xor r9d,ebp > > +DB 0x66,0x90 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > + mov DWORD[((32+12))+rsp],r9d > > + lea r9,[3+r8] > > +DB 102,15,56,220,241 > > +DB 102,15,56,220,249 > > +DB 102,68,15,56,220,193 > > +DB 102,68,15,56,220,201 > > + movups xmm1,XMMWORD[((80-128))+rcx] > > + bswap r9d > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > + xor r9d,ebp > > +DB 0x66,0x90 > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > + mov DWORD[((48+12))+rsp],r9d > > + lea r9,[4+r8] > > +DB 102,15,56,220,240 > > +DB 102,15,56,220,248 > > +DB 102,68,15,56,220,192 > > +DB 102,68,15,56,220,200 > > + movups xmm0,XMMWORD[((96-128))+rcx] > > + bswap r9d > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > + xor r9d,ebp > > +DB 0x66,0x90 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > + mov DWORD[((64+12))+rsp],r9d > > + lea r9,[5+r8] > > +DB 102,15,56,220,241 > > +DB 102,15,56,220,249 > > +DB 102,68,15,56,220,193 > > +DB 102,68,15,56,220,201 > > + movups xmm1,XMMWORD[((112-128))+rcx] > > + bswap r9d > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > + xor r9d,ebp > > +DB 0x66,0x90 > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > + mov DWORD[((80+12))+rsp],r9d > > + lea r9,[6+r8] > > +DB 102,15,56,220,240 > > +DB 102,15,56,220,248 > > +DB 102,68,15,56,220,192 > > +DB 102,68,15,56,220,200 > > + movups xmm0,XMMWORD[((128-128))+rcx] > > + bswap r9d > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > + xor r9d,ebp > > +DB 0x66,0x90 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > + mov DWORD[((96+12))+rsp],r9d > > + lea r9,[7+r8] > > +DB 102,15,56,220,241 > > +DB 102,15,56,220,249 > > +DB 102,68,15,56,220,193 > > +DB 102,68,15,56,220,201 > > + movups xmm1,XMMWORD[((144-128))+rcx] > > + bswap 
r9d > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > +DB 102,15,56,220,224 > > + xor r9d,ebp > > + movdqu xmm10,XMMWORD[rdi] > > +DB 102,15,56,220,232 > > + mov DWORD[((112+12))+rsp],r9d > > + cmp eax,11 > > +DB 102,15,56,220,240 > > +DB 102,15,56,220,248 > > +DB 102,68,15,56,220,192 > > +DB 102,68,15,56,220,200 > > + movups xmm0,XMMWORD[((160-128))+rcx] > > + > > + jb NEAR $L$ctr32_enc_done > > + > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > +DB 102,15,56,220,241 > > +DB 102,15,56,220,249 > > +DB 102,68,15,56,220,193 > > +DB 102,68,15,56,220,201 > > + movups xmm1,XMMWORD[((176-128))+rcx] > > + > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > +DB 102,15,56,220,240 > > +DB 102,15,56,220,248 > > +DB 102,68,15,56,220,192 > > +DB 102,68,15,56,220,200 > > + movups xmm0,XMMWORD[((192-128))+rcx] > > + je NEAR $L$ctr32_enc_done > > + > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > +DB 102,15,56,220,241 > > +DB 102,15,56,220,249 > > +DB 102,68,15,56,220,193 > > +DB 102,68,15,56,220,201 > > + movups xmm1,XMMWORD[((208-128))+rcx] > > + > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > +DB 102,15,56,220,240 > > +DB 102,15,56,220,248 > > +DB 102,68,15,56,220,192 > > +DB 102,68,15,56,220,200 > > + movups xmm0,XMMWORD[((224-128))+rcx] > > + jmp NEAR $L$ctr32_enc_done > > + > > +ALIGN 16 > > +$L$ctr32_enc_done: > > + movdqu xmm11,XMMWORD[16+rdi] > > + pxor xmm10,xmm0 > > + movdqu xmm12,XMMWORD[32+rdi] > > + pxor xmm11,xmm0 > > + movdqu xmm13,XMMWORD[48+rdi] > > + pxor xmm12,xmm0 > > + movdqu xmm14,XMMWORD[64+rdi] > > + pxor xmm13,xmm0 > > + movdqu xmm15,XMMWORD[80+rdi] > > + pxor xmm14,xmm0 > > + pxor xmm15,xmm0 > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > +DB 102,15,56,220,241 > > +DB 102,15,56,220,249 > > +DB 102,68,15,56,220,193 > > +DB 102,68,15,56,220,201 > > + movdqu xmm1,XMMWORD[96+rdi] > > + lea rdi,[128+rdi] > > + > > +DB 102,65,15,56,221,210 > > + pxor xmm1,xmm0 > > + movdqu xmm10,XMMWORD[((112-128))+rdi] > > +DB 102,65,15,56,221,219 > > + pxor xmm10,xmm0 > > + movdqa xmm11,XMMWORD[rsp] > > +DB 102,65,15,56,221,228 > > +DB 102,65,15,56,221,237 > > + movdqa xmm12,XMMWORD[16+rsp] > > + movdqa xmm13,XMMWORD[32+rsp] > > +DB 102,65,15,56,221,246 > > +DB 102,65,15,56,221,255 > > + movdqa xmm14,XMMWORD[48+rsp] > > + movdqa xmm15,XMMWORD[64+rsp] > > +DB 102,68,15,56,221,193 > > + movdqa xmm0,XMMWORD[80+rsp] > > + movups xmm1,XMMWORD[((16-128))+rcx] > > +DB 102,69,15,56,221,202 > > + > > + movups XMMWORD[rsi],xmm2 > > + movdqa xmm2,xmm11 > > + movups XMMWORD[16+rsi],xmm3 > > + movdqa xmm3,xmm12 > > + movups XMMWORD[32+rsi],xmm4 > > + movdqa xmm4,xmm13 > > + movups XMMWORD[48+rsi],xmm5 > > + movdqa xmm5,xmm14 > > + movups XMMWORD[64+rsi],xmm6 > > + movdqa xmm6,xmm15 > > + movups XMMWORD[80+rsi],xmm7 > > + movdqa xmm7,xmm0 > > + movups XMMWORD[96+rsi],xmm8 > > + movups XMMWORD[112+rsi],xmm9 > > + lea rsi,[128+rsi] > > + > > + sub rdx,8 > > + jnc NEAR $L$ctr32_loop8 > > + > > + add rdx,8 > > + jz NEAR $L$ctr32_done > > + lea rcx,[((-128))+rcx] > > + > > +$L$ctr32_tail: > > + > > + > > + lea rcx,[16+rcx] > > + cmp rdx,4 > > + jb NEAR $L$ctr32_loop3 > > + je NEAR $L$ctr32_loop4 > > + > > + > > + shl eax,4 > > + movdqa xmm8,XMMWORD[96+rsp] > > + pxor xmm9,xmm9 > > + > > + movups xmm0,XMMWORD[16+rcx] > > +DB 
102,15,56,220,209 > > +DB 102,15,56,220,217 > > + lea rcx,[((32-16))+rax*1+rcx] > > + neg rax > > +DB 102,15,56,220,225 > > + add rax,16 > > + movups xmm10,XMMWORD[rdi] > > +DB 102,15,56,220,233 > > +DB 102,15,56,220,241 > > + movups xmm11,XMMWORD[16+rdi] > > + movups xmm12,XMMWORD[32+rdi] > > +DB 102,15,56,220,249 > > +DB 102,68,15,56,220,193 > > + > > + call $L$enc_loop8_enter > > + > > + movdqu xmm13,XMMWORD[48+rdi] > > + pxor xmm2,xmm10 > > + movdqu xmm10,XMMWORD[64+rdi] > > + pxor xmm3,xmm11 > > + movdqu XMMWORD[rsi],xmm2 > > + pxor xmm4,xmm12 > > + movdqu XMMWORD[16+rsi],xmm3 > > + pxor xmm5,xmm13 > > + movdqu XMMWORD[32+rsi],xmm4 > > + pxor xmm6,xmm10 > > + movdqu XMMWORD[48+rsi],xmm5 > > + movdqu XMMWORD[64+rsi],xmm6 > > + cmp rdx,6 > > + jb NEAR $L$ctr32_done > > + > > + movups xmm11,XMMWORD[80+rdi] > > + xorps xmm7,xmm11 > > + movups XMMWORD[80+rsi],xmm7 > > + je NEAR $L$ctr32_done > > + > > + movups xmm12,XMMWORD[96+rdi] > > + xorps xmm8,xmm12 > > + movups XMMWORD[96+rsi],xmm8 > > + jmp NEAR $L$ctr32_done > > + > > +ALIGN 32 > > +$L$ctr32_loop4: > > +DB 102,15,56,220,209 > > + lea rcx,[16+rcx] > > + dec eax > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > + movups xmm1,XMMWORD[rcx] > > + jnz NEAR $L$ctr32_loop4 > > +DB 102,15,56,221,209 > > +DB 102,15,56,221,217 > > + movups xmm10,XMMWORD[rdi] > > + movups xmm11,XMMWORD[16+rdi] > > +DB 102,15,56,221,225 > > +DB 102,15,56,221,233 > > + movups xmm12,XMMWORD[32+rdi] > > + movups xmm13,XMMWORD[48+rdi] > > + > > + xorps xmm2,xmm10 > > + movups XMMWORD[rsi],xmm2 > > + xorps xmm3,xmm11 > > + movups XMMWORD[16+rsi],xmm3 > > + pxor xmm4,xmm12 > > + movdqu XMMWORD[32+rsi],xmm4 > > + pxor xmm5,xmm13 > > + movdqu XMMWORD[48+rsi],xmm5 > > + jmp NEAR $L$ctr32_done > > + > > +ALIGN 32 > > +$L$ctr32_loop3: > > +DB 102,15,56,220,209 > > + lea rcx,[16+rcx] > > + dec eax > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > + movups xmm1,XMMWORD[rcx] > > + jnz NEAR $L$ctr32_loop3 > > +DB 102,15,56,221,209 > > +DB 102,15,56,221,217 > > +DB 102,15,56,221,225 > > + > > + movups xmm10,XMMWORD[rdi] > > + xorps xmm2,xmm10 > > + movups XMMWORD[rsi],xmm2 > > + cmp rdx,2 > > + jb NEAR $L$ctr32_done > > + > > + movups xmm11,XMMWORD[16+rdi] > > + xorps xmm3,xmm11 > > + movups XMMWORD[16+rsi],xmm3 > > + je NEAR $L$ctr32_done > > + > > + movups xmm12,XMMWORD[32+rdi] > > + xorps xmm4,xmm12 > > + movups XMMWORD[32+rsi],xmm4 > > + > > +$L$ctr32_done: > > + xorps xmm0,xmm0 > > + xor ebp,ebp > > + pxor xmm1,xmm1 > > + pxor xmm2,xmm2 > > + pxor xmm3,xmm3 > > + pxor xmm4,xmm4 > > + pxor xmm5,xmm5 > > + movaps xmm6,XMMWORD[((-168))+r11] > > + movaps XMMWORD[(-168)+r11],xmm0 > > + movaps xmm7,XMMWORD[((-152))+r11] > > + movaps XMMWORD[(-152)+r11],xmm0 > > + movaps xmm8,XMMWORD[((-136))+r11] > > + movaps XMMWORD[(-136)+r11],xmm0 > > + movaps xmm9,XMMWORD[((-120))+r11] > > + movaps XMMWORD[(-120)+r11],xmm0 > > + movaps xmm10,XMMWORD[((-104))+r11] > > + movaps XMMWORD[(-104)+r11],xmm0 > > + movaps xmm11,XMMWORD[((-88))+r11] > > + movaps XMMWORD[(-88)+r11],xmm0 > > + movaps xmm12,XMMWORD[((-72))+r11] > > + movaps XMMWORD[(-72)+r11],xmm0 > > + movaps xmm13,XMMWORD[((-56))+r11] > > + movaps XMMWORD[(-56)+r11],xmm0 > > + movaps xmm14,XMMWORD[((-40))+r11] > > + movaps XMMWORD[(-40)+r11],xmm0 > > + movaps xmm15,XMMWORD[((-24))+r11] > > + movaps XMMWORD[(-24)+r11],xmm0 > > + movaps XMMWORD[rsp],xmm0 > > + movaps XMMWORD[16+rsp],xmm0 > > + movaps XMMWORD[32+rsp],xmm0 > > + movaps XMMWORD[48+rsp],xmm0 > > + movaps XMMWORD[64+rsp],xmm0 > > + 
movaps XMMWORD[80+rsp],xmm0 > > + movaps XMMWORD[96+rsp],xmm0 > > + movaps XMMWORD[112+rsp],xmm0 > > + mov rbp,QWORD[((-8))+r11] > > + > > + lea rsp,[r11] > > + > > +$L$ctr32_epilogue: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_aesni_ctr32_encrypt_blocks: > > +global aesni_xts_encrypt > > + > > +ALIGN 16 > > +aesni_xts_encrypt: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_aesni_xts_encrypt: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + mov rcx,r9 > > + mov r8,QWORD[40+rsp] > > + mov r9,QWORD[48+rsp] > > + > > + > > + > > + lea r11,[rsp] > > + > > + push rbp > > + > > + sub rsp,272 > > + and rsp,-16 > > + movaps XMMWORD[(-168)+r11],xmm6 > > + movaps XMMWORD[(-152)+r11],xmm7 > > + movaps XMMWORD[(-136)+r11],xmm8 > > + movaps XMMWORD[(-120)+r11],xmm9 > > + movaps XMMWORD[(-104)+r11],xmm10 > > + movaps XMMWORD[(-88)+r11],xmm11 > > + movaps XMMWORD[(-72)+r11],xmm12 > > + movaps XMMWORD[(-56)+r11],xmm13 > > + movaps XMMWORD[(-40)+r11],xmm14 > > + movaps XMMWORD[(-24)+r11],xmm15 > > +$L$xts_enc_body: > > + movups xmm2,XMMWORD[r9] > > + mov eax,DWORD[240+r8] > > + mov r10d,DWORD[240+rcx] > > + movups xmm0,XMMWORD[r8] > > + movups xmm1,XMMWORD[16+r8] > > + lea r8,[32+r8] > > + xorps xmm2,xmm0 > > +$L$oop_enc1_8: > > +DB 102,15,56,220,209 > > + dec eax > > + movups xmm1,XMMWORD[r8] > > + lea r8,[16+r8] > > + jnz NEAR $L$oop_enc1_8 > > +DB 102,15,56,221,209 > > + movups xmm0,XMMWORD[rcx] > > + mov rbp,rcx > > + mov eax,r10d > > + shl r10d,4 > > + mov r9,rdx > > + and rdx,-16 > > + > > + movups xmm1,XMMWORD[16+r10*1+rcx] > > + > > + movdqa xmm8,XMMWORD[$L$xts_magic] > > + movdqa xmm15,xmm2 > > + pshufd xmm9,xmm2,0x5f > > + pxor xmm1,xmm0 > > + movdqa xmm14,xmm9 > > + paddd xmm9,xmm9 > > + movdqa xmm10,xmm15 > > + psrad xmm14,31 > > + paddq xmm15,xmm15 > > + pand xmm14,xmm8 > > + pxor xmm10,xmm0 > > + pxor xmm15,xmm14 > > + movdqa xmm14,xmm9 > > + paddd xmm9,xmm9 > > + movdqa xmm11,xmm15 > > + psrad xmm14,31 > > + paddq xmm15,xmm15 > > + pand xmm14,xmm8 > > + pxor xmm11,xmm0 > > + pxor xmm15,xmm14 > > + movdqa xmm14,xmm9 > > + paddd xmm9,xmm9 > > + movdqa xmm12,xmm15 > > + psrad xmm14,31 > > + paddq xmm15,xmm15 > > + pand xmm14,xmm8 > > + pxor xmm12,xmm0 > > + pxor xmm15,xmm14 > > + movdqa xmm14,xmm9 > > + paddd xmm9,xmm9 > > + movdqa xmm13,xmm15 > > + psrad xmm14,31 > > + paddq xmm15,xmm15 > > + pand xmm14,xmm8 > > + pxor xmm13,xmm0 > > + pxor xmm15,xmm14 > > + movdqa xmm14,xmm15 > > + psrad xmm9,31 > > + paddq xmm15,xmm15 > > + pand xmm9,xmm8 > > + pxor xmm14,xmm0 > > + pxor xmm15,xmm9 > > + movaps XMMWORD[96+rsp],xmm1 > > + > > + sub rdx,16*6 > > + jc NEAR $L$xts_enc_short > > + > > + mov eax,16+96 > > + lea rcx,[32+r10*1+rbp] > > + sub rax,r10 > > + movups xmm1,XMMWORD[16+rbp] > > + mov r10,rax > > + lea r8,[$L$xts_magic] > > + jmp NEAR $L$xts_enc_grandloop > > + > > +ALIGN 32 > > +$L$xts_enc_grandloop: > > + movdqu xmm2,XMMWORD[rdi] > > + movdqa xmm8,xmm0 > > + movdqu xmm3,XMMWORD[16+rdi] > > + pxor xmm2,xmm10 > > + movdqu xmm4,XMMWORD[32+rdi] > > + pxor xmm3,xmm11 > > +DB 102,15,56,220,209 > > + movdqu xmm5,XMMWORD[48+rdi] > > + pxor xmm4,xmm12 > > +DB 102,15,56,220,217 > > + movdqu xmm6,XMMWORD[64+rdi] > > + pxor xmm5,xmm13 > > +DB 102,15,56,220,225 > > + movdqu xmm7,XMMWORD[80+rdi] > > + pxor xmm8,xmm15 > > + movdqa xmm9,XMMWORD[96+rsp] > > + pxor xmm6,xmm14 > > +DB 102,15,56,220,233 > > + movups xmm0,XMMWORD[32+rbp] > > + lea rdi,[96+rdi] > > + 
pxor xmm7,xmm8 > > + > > + pxor xmm10,xmm9 > > +DB 102,15,56,220,241 > > + pxor xmm11,xmm9 > > + movdqa XMMWORD[rsp],xmm10 > > +DB 102,15,56,220,249 > > + movups xmm1,XMMWORD[48+rbp] > > + pxor xmm12,xmm9 > > + > > +DB 102,15,56,220,208 > > + pxor xmm13,xmm9 > > + movdqa XMMWORD[16+rsp],xmm11 > > +DB 102,15,56,220,216 > > + pxor xmm14,xmm9 > > + movdqa XMMWORD[32+rsp],xmm12 > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > + pxor xmm8,xmm9 > > + movdqa XMMWORD[64+rsp],xmm14 > > +DB 102,15,56,220,240 > > +DB 102,15,56,220,248 > > + movups xmm0,XMMWORD[64+rbp] > > + movdqa XMMWORD[80+rsp],xmm8 > > + pshufd xmm9,xmm15,0x5f > > + jmp NEAR $L$xts_enc_loop6 > > +ALIGN 32 > > +$L$xts_enc_loop6: > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > +DB 102,15,56,220,241 > > +DB 102,15,56,220,249 > > + movups xmm1,XMMWORD[((-64))+rax*1+rcx] > > + add rax,32 > > + > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > +DB 102,15,56,220,240 > > +DB 102,15,56,220,248 > > + movups xmm0,XMMWORD[((-80))+rax*1+rcx] > > + jnz NEAR $L$xts_enc_loop6 > > + > > + movdqa xmm8,XMMWORD[r8] > > + movdqa xmm14,xmm9 > > + paddd xmm9,xmm9 > > +DB 102,15,56,220,209 > > + paddq xmm15,xmm15 > > + psrad xmm14,31 > > +DB 102,15,56,220,217 > > + pand xmm14,xmm8 > > + movups xmm10,XMMWORD[rbp] > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > +DB 102,15,56,220,241 > > + pxor xmm15,xmm14 > > + movaps xmm11,xmm10 > > +DB 102,15,56,220,249 > > + movups xmm1,XMMWORD[((-64))+rcx] > > + > > + movdqa xmm14,xmm9 > > +DB 102,15,56,220,208 > > + paddd xmm9,xmm9 > > + pxor xmm10,xmm15 > > +DB 102,15,56,220,216 > > + psrad xmm14,31 > > + paddq xmm15,xmm15 > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > + pand xmm14,xmm8 > > + movaps xmm12,xmm11 > > +DB 102,15,56,220,240 > > + pxor xmm15,xmm14 > > + movdqa xmm14,xmm9 > > +DB 102,15,56,220,248 > > + movups xmm0,XMMWORD[((-48))+rcx] > > + > > + paddd xmm9,xmm9 > > +DB 102,15,56,220,209 > > + pxor xmm11,xmm15 > > + psrad xmm14,31 > > +DB 102,15,56,220,217 > > + paddq xmm15,xmm15 > > + pand xmm14,xmm8 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > + movdqa XMMWORD[48+rsp],xmm13 > > + pxor xmm15,xmm14 > > +DB 102,15,56,220,241 > > + movaps xmm13,xmm12 > > + movdqa xmm14,xmm9 > > +DB 102,15,56,220,249 > > + movups xmm1,XMMWORD[((-32))+rcx] > > + > > + paddd xmm9,xmm9 > > +DB 102,15,56,220,208 > > + pxor xmm12,xmm15 > > + psrad xmm14,31 > > +DB 102,15,56,220,216 > > + paddq xmm15,xmm15 > > + pand xmm14,xmm8 > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > +DB 102,15,56,220,240 > > + pxor xmm15,xmm14 > > + movaps xmm14,xmm13 > > +DB 102,15,56,220,248 > > + > > + movdqa xmm0,xmm9 > > + paddd xmm9,xmm9 > > +DB 102,15,56,220,209 > > + pxor xmm13,xmm15 > > + psrad xmm0,31 > > +DB 102,15,56,220,217 > > + paddq xmm15,xmm15 > > + pand xmm0,xmm8 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > + pxor xmm15,xmm0 > > + movups xmm0,XMMWORD[rbp] > > +DB 102,15,56,220,241 > > +DB 102,15,56,220,249 > > + movups xmm1,XMMWORD[16+rbp] > > + > > + pxor xmm14,xmm15 > > +DB 102,15,56,221,84,36,0 > > + psrad xmm9,31 > > + paddq xmm15,xmm15 > > +DB 102,15,56,221,92,36,16 > > +DB 102,15,56,221,100,36,32 > > + pand xmm9,xmm8 > > + mov rax,r10 > > +DB 102,15,56,221,108,36,48 > > +DB 102,15,56,221,116,36,64 > > +DB 102,15,56,221,124,36,80 > > + pxor xmm15,xmm9 > > + > > + lea rsi,[96+rsi] > > + movups XMMWORD[(-96)+rsi],xmm2 > > + movups XMMWORD[(-80)+rsi],xmm3 > > + 
movups XMMWORD[(-64)+rsi],xmm4 > > + movups XMMWORD[(-48)+rsi],xmm5 > > + movups XMMWORD[(-32)+rsi],xmm6 > > + movups XMMWORD[(-16)+rsi],xmm7 > > + sub rdx,16*6 > > + jnc NEAR $L$xts_enc_grandloop > > + > > + mov eax,16+96 > > + sub eax,r10d > > + mov rcx,rbp > > + shr eax,4 > > + > > +$L$xts_enc_short: > > + > > + mov r10d,eax > > + pxor xmm10,xmm0 > > + add rdx,16*6 > > + jz NEAR $L$xts_enc_done > > + > > + pxor xmm11,xmm0 > > + cmp rdx,0x20 > > + jb NEAR $L$xts_enc_one > > + pxor xmm12,xmm0 > > + je NEAR $L$xts_enc_two > > + > > + pxor xmm13,xmm0 > > + cmp rdx,0x40 > > + jb NEAR $L$xts_enc_three > > + pxor xmm14,xmm0 > > + je NEAR $L$xts_enc_four > > + > > + movdqu xmm2,XMMWORD[rdi] > > + movdqu xmm3,XMMWORD[16+rdi] > > + movdqu xmm4,XMMWORD[32+rdi] > > + pxor xmm2,xmm10 > > + movdqu xmm5,XMMWORD[48+rdi] > > + pxor xmm3,xmm11 > > + movdqu xmm6,XMMWORD[64+rdi] > > + lea rdi,[80+rdi] > > + pxor xmm4,xmm12 > > + pxor xmm5,xmm13 > > + pxor xmm6,xmm14 > > + pxor xmm7,xmm7 > > + > > + call _aesni_encrypt6 > > + > > + xorps xmm2,xmm10 > > + movdqa xmm10,xmm15 > > + xorps xmm3,xmm11 > > + xorps xmm4,xmm12 > > + movdqu XMMWORD[rsi],xmm2 > > + xorps xmm5,xmm13 > > + movdqu XMMWORD[16+rsi],xmm3 > > + xorps xmm6,xmm14 > > + movdqu XMMWORD[32+rsi],xmm4 > > + movdqu XMMWORD[48+rsi],xmm5 > > + movdqu XMMWORD[64+rsi],xmm6 > > + lea rsi,[80+rsi] > > + jmp NEAR $L$xts_enc_done > > + > > +ALIGN 16 > > +$L$xts_enc_one: > > + movups xmm2,XMMWORD[rdi] > > + lea rdi,[16+rdi] > > + xorps xmm2,xmm10 > > + movups xmm0,XMMWORD[rcx] > > + movups xmm1,XMMWORD[16+rcx] > > + lea rcx,[32+rcx] > > + xorps xmm2,xmm0 > > +$L$oop_enc1_9: > > +DB 102,15,56,220,209 > > + dec eax > > + movups xmm1,XMMWORD[rcx] > > + lea rcx,[16+rcx] > > + jnz NEAR $L$oop_enc1_9 > > +DB 102,15,56,221,209 > > + xorps xmm2,xmm10 > > + movdqa xmm10,xmm11 > > + movups XMMWORD[rsi],xmm2 > > + lea rsi,[16+rsi] > > + jmp NEAR $L$xts_enc_done > > + > > +ALIGN 16 > > +$L$xts_enc_two: > > + movups xmm2,XMMWORD[rdi] > > + movups xmm3,XMMWORD[16+rdi] > > + lea rdi,[32+rdi] > > + xorps xmm2,xmm10 > > + xorps xmm3,xmm11 > > + > > + call _aesni_encrypt2 > > + > > + xorps xmm2,xmm10 > > + movdqa xmm10,xmm12 > > + xorps xmm3,xmm11 > > + movups XMMWORD[rsi],xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + lea rsi,[32+rsi] > > + jmp NEAR $L$xts_enc_done > > + > > +ALIGN 16 > > +$L$xts_enc_three: > > + movups xmm2,XMMWORD[rdi] > > + movups xmm3,XMMWORD[16+rdi] > > + movups xmm4,XMMWORD[32+rdi] > > + lea rdi,[48+rdi] > > + xorps xmm2,xmm10 > > + xorps xmm3,xmm11 > > + xorps xmm4,xmm12 > > + > > + call _aesni_encrypt3 > > + > > + xorps xmm2,xmm10 > > + movdqa xmm10,xmm13 > > + xorps xmm3,xmm11 > > + xorps xmm4,xmm12 > > + movups XMMWORD[rsi],xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + lea rsi,[48+rsi] > > + jmp NEAR $L$xts_enc_done > > + > > +ALIGN 16 > > +$L$xts_enc_four: > > + movups xmm2,XMMWORD[rdi] > > + movups xmm3,XMMWORD[16+rdi] > > + movups xmm4,XMMWORD[32+rdi] > > + xorps xmm2,xmm10 > > + movups xmm5,XMMWORD[48+rdi] > > + lea rdi,[64+rdi] > > + xorps xmm3,xmm11 > > + xorps xmm4,xmm12 > > + xorps xmm5,xmm13 > > + > > + call _aesni_encrypt4 > > + > > + pxor xmm2,xmm10 > > + movdqa xmm10,xmm14 > > + pxor xmm3,xmm11 > > + pxor xmm4,xmm12 > > + movdqu XMMWORD[rsi],xmm2 > > + pxor xmm5,xmm13 > > + movdqu XMMWORD[16+rsi],xmm3 > > + movdqu XMMWORD[32+rsi],xmm4 > > + movdqu XMMWORD[48+rsi],xmm5 > > + lea rsi,[64+rsi] > > + jmp NEAR $L$xts_enc_done > > + > > +ALIGN 16 > > +$L$xts_enc_done: > > + and r9,15 > > + jz NEAR $L$xts_enc_ret 
> > + mov rdx,r9 > > + > > +$L$xts_enc_steal: > > + movzx eax,BYTE[rdi] > > + movzx ecx,BYTE[((-16))+rsi] > > + lea rdi,[1+rdi] > > + mov BYTE[((-16))+rsi],al > > + mov BYTE[rsi],cl > > + lea rsi,[1+rsi] > > + sub rdx,1 > > + jnz NEAR $L$xts_enc_steal > > + > > + sub rsi,r9 > > + mov rcx,rbp > > + mov eax,r10d > > + > > + movups xmm2,XMMWORD[((-16))+rsi] > > + xorps xmm2,xmm10 > > + movups xmm0,XMMWORD[rcx] > > + movups xmm1,XMMWORD[16+rcx] > > + lea rcx,[32+rcx] > > + xorps xmm2,xmm0 > > +$L$oop_enc1_10: > > +DB 102,15,56,220,209 > > + dec eax > > + movups xmm1,XMMWORD[rcx] > > + lea rcx,[16+rcx] > > + jnz NEAR $L$oop_enc1_10 > > +DB 102,15,56,221,209 > > + xorps xmm2,xmm10 > > + movups XMMWORD[(-16)+rsi],xmm2 > > + > > +$L$xts_enc_ret: > > + xorps xmm0,xmm0 > > + pxor xmm1,xmm1 > > + pxor xmm2,xmm2 > > + pxor xmm3,xmm3 > > + pxor xmm4,xmm4 > > + pxor xmm5,xmm5 > > + movaps xmm6,XMMWORD[((-168))+r11] > > + movaps XMMWORD[(-168)+r11],xmm0 > > + movaps xmm7,XMMWORD[((-152))+r11] > > + movaps XMMWORD[(-152)+r11],xmm0 > > + movaps xmm8,XMMWORD[((-136))+r11] > > + movaps XMMWORD[(-136)+r11],xmm0 > > + movaps xmm9,XMMWORD[((-120))+r11] > > + movaps XMMWORD[(-120)+r11],xmm0 > > + movaps xmm10,XMMWORD[((-104))+r11] > > + movaps XMMWORD[(-104)+r11],xmm0 > > + movaps xmm11,XMMWORD[((-88))+r11] > > + movaps XMMWORD[(-88)+r11],xmm0 > > + movaps xmm12,XMMWORD[((-72))+r11] > > + movaps XMMWORD[(-72)+r11],xmm0 > > + movaps xmm13,XMMWORD[((-56))+r11] > > + movaps XMMWORD[(-56)+r11],xmm0 > > + movaps xmm14,XMMWORD[((-40))+r11] > > + movaps XMMWORD[(-40)+r11],xmm0 > > + movaps xmm15,XMMWORD[((-24))+r11] > > + movaps XMMWORD[(-24)+r11],xmm0 > > + movaps XMMWORD[rsp],xmm0 > > + movaps XMMWORD[16+rsp],xmm0 > > + movaps XMMWORD[32+rsp],xmm0 > > + movaps XMMWORD[48+rsp],xmm0 > > + movaps XMMWORD[64+rsp],xmm0 > > + movaps XMMWORD[80+rsp],xmm0 > > + movaps XMMWORD[96+rsp],xmm0 > > + mov rbp,QWORD[((-8))+r11] > > + > > + lea rsp,[r11] > > + > > +$L$xts_enc_epilogue: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_aesni_xts_encrypt: > > +global aesni_xts_decrypt > > + > > +ALIGN 16 > > +aesni_xts_decrypt: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_aesni_xts_decrypt: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + mov rcx,r9 > > + mov r8,QWORD[40+rsp] > > + mov r9,QWORD[48+rsp] > > + > > + > > + > > + lea r11,[rsp] > > + > > + push rbp > > + > > + sub rsp,272 > > + and rsp,-16 > > + movaps XMMWORD[(-168)+r11],xmm6 > > + movaps XMMWORD[(-152)+r11],xmm7 > > + movaps XMMWORD[(-136)+r11],xmm8 > > + movaps XMMWORD[(-120)+r11],xmm9 > > + movaps XMMWORD[(-104)+r11],xmm10 > > + movaps XMMWORD[(-88)+r11],xmm11 > > + movaps XMMWORD[(-72)+r11],xmm12 > > + movaps XMMWORD[(-56)+r11],xmm13 > > + movaps XMMWORD[(-40)+r11],xmm14 > > + movaps XMMWORD[(-24)+r11],xmm15 > > +$L$xts_dec_body: > > + movups xmm2,XMMWORD[r9] > > + mov eax,DWORD[240+r8] > > + mov r10d,DWORD[240+rcx] > > + movups xmm0,XMMWORD[r8] > > + movups xmm1,XMMWORD[16+r8] > > + lea r8,[32+r8] > > + xorps xmm2,xmm0 > > +$L$oop_enc1_11: > > +DB 102,15,56,220,209 > > + dec eax > > + movups xmm1,XMMWORD[r8] > > + lea r8,[16+r8] > > + jnz NEAR $L$oop_enc1_11 > > +DB 102,15,56,221,209 > > + xor eax,eax > > + test rdx,15 > > + setnz al > > + shl rax,4 > > + sub rdx,rax > > + > > + movups xmm0,XMMWORD[rcx] > > + mov rbp,rcx > > + mov eax,r10d > > + shl r10d,4 > > + mov r9,rdx > > + and rdx,-16 > > + > > + movups 
xmm1,XMMWORD[16+r10*1+rcx] > > + > > + movdqa xmm8,XMMWORD[$L$xts_magic] > > + movdqa xmm15,xmm2 > > + pshufd xmm9,xmm2,0x5f > > + pxor xmm1,xmm0 > > + movdqa xmm14,xmm9 > > + paddd xmm9,xmm9 > > + movdqa xmm10,xmm15 > > + psrad xmm14,31 > > + paddq xmm15,xmm15 > > + pand xmm14,xmm8 > > + pxor xmm10,xmm0 > > + pxor xmm15,xmm14 > > + movdqa xmm14,xmm9 > > + paddd xmm9,xmm9 > > + movdqa xmm11,xmm15 > > + psrad xmm14,31 > > + paddq xmm15,xmm15 > > + pand xmm14,xmm8 > > + pxor xmm11,xmm0 > > + pxor xmm15,xmm14 > > + movdqa xmm14,xmm9 > > + paddd xmm9,xmm9 > > + movdqa xmm12,xmm15 > > + psrad xmm14,31 > > + paddq xmm15,xmm15 > > + pand xmm14,xmm8 > > + pxor xmm12,xmm0 > > + pxor xmm15,xmm14 > > + movdqa xmm14,xmm9 > > + paddd xmm9,xmm9 > > + movdqa xmm13,xmm15 > > + psrad xmm14,31 > > + paddq xmm15,xmm15 > > + pand xmm14,xmm8 > > + pxor xmm13,xmm0 > > + pxor xmm15,xmm14 > > + movdqa xmm14,xmm15 > > + psrad xmm9,31 > > + paddq xmm15,xmm15 > > + pand xmm9,xmm8 > > + pxor xmm14,xmm0 > > + pxor xmm15,xmm9 > > + movaps XMMWORD[96+rsp],xmm1 > > + > > + sub rdx,16*6 > > + jc NEAR $L$xts_dec_short > > + > > + mov eax,16+96 > > + lea rcx,[32+r10*1+rbp] > > + sub rax,r10 > > + movups xmm1,XMMWORD[16+rbp] > > + mov r10,rax > > + lea r8,[$L$xts_magic] > > + jmp NEAR $L$xts_dec_grandloop > > + > > +ALIGN 32 > > +$L$xts_dec_grandloop: > > + movdqu xmm2,XMMWORD[rdi] > > + movdqa xmm8,xmm0 > > + movdqu xmm3,XMMWORD[16+rdi] > > + pxor xmm2,xmm10 > > + movdqu xmm4,XMMWORD[32+rdi] > > + pxor xmm3,xmm11 > > +DB 102,15,56,222,209 > > + movdqu xmm5,XMMWORD[48+rdi] > > + pxor xmm4,xmm12 > > +DB 102,15,56,222,217 > > + movdqu xmm6,XMMWORD[64+rdi] > > + pxor xmm5,xmm13 > > +DB 102,15,56,222,225 > > + movdqu xmm7,XMMWORD[80+rdi] > > + pxor xmm8,xmm15 > > + movdqa xmm9,XMMWORD[96+rsp] > > + pxor xmm6,xmm14 > > +DB 102,15,56,222,233 > > + movups xmm0,XMMWORD[32+rbp] > > + lea rdi,[96+rdi] > > + pxor xmm7,xmm8 > > + > > + pxor xmm10,xmm9 > > +DB 102,15,56,222,241 > > + pxor xmm11,xmm9 > > + movdqa XMMWORD[rsp],xmm10 > > +DB 102,15,56,222,249 > > + movups xmm1,XMMWORD[48+rbp] > > + pxor xmm12,xmm9 > > + > > +DB 102,15,56,222,208 > > + pxor xmm13,xmm9 > > + movdqa XMMWORD[16+rsp],xmm11 > > +DB 102,15,56,222,216 > > + pxor xmm14,xmm9 > > + movdqa XMMWORD[32+rsp],xmm12 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > + pxor xmm8,xmm9 > > + movdqa XMMWORD[64+rsp],xmm14 > > +DB 102,15,56,222,240 > > +DB 102,15,56,222,248 > > + movups xmm0,XMMWORD[64+rbp] > > + movdqa XMMWORD[80+rsp],xmm8 > > + pshufd xmm9,xmm15,0x5f > > + jmp NEAR $L$xts_dec_loop6 > > +ALIGN 32 > > +$L$xts_dec_loop6: > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > +DB 102,15,56,222,241 > > +DB 102,15,56,222,249 > > + movups xmm1,XMMWORD[((-64))+rax*1+rcx] > > + add rax,32 > > + > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > +DB 102,15,56,222,240 > > +DB 102,15,56,222,248 > > + movups xmm0,XMMWORD[((-80))+rax*1+rcx] > > + jnz NEAR $L$xts_dec_loop6 > > + > > + movdqa xmm8,XMMWORD[r8] > > + movdqa xmm14,xmm9 > > + paddd xmm9,xmm9 > > +DB 102,15,56,222,209 > > + paddq xmm15,xmm15 > > + psrad xmm14,31 > > +DB 102,15,56,222,217 > > + pand xmm14,xmm8 > > + movups xmm10,XMMWORD[rbp] > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > +DB 102,15,56,222,241 > > + pxor xmm15,xmm14 > > + movaps xmm11,xmm10 > > +DB 102,15,56,222,249 > > + movups xmm1,XMMWORD[((-64))+rcx] > > + > > + movdqa xmm14,xmm9 > > +DB 102,15,56,222,208 > > + paddd 
xmm9,xmm9 > > + pxor xmm10,xmm15 > > +DB 102,15,56,222,216 > > + psrad xmm14,31 > > + paddq xmm15,xmm15 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > + pand xmm14,xmm8 > > + movaps xmm12,xmm11 > > +DB 102,15,56,222,240 > > + pxor xmm15,xmm14 > > + movdqa xmm14,xmm9 > > +DB 102,15,56,222,248 > > + movups xmm0,XMMWORD[((-48))+rcx] > > + > > + paddd xmm9,xmm9 > > +DB 102,15,56,222,209 > > + pxor xmm11,xmm15 > > + psrad xmm14,31 > > +DB 102,15,56,222,217 > > + paddq xmm15,xmm15 > > + pand xmm14,xmm8 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > + movdqa XMMWORD[48+rsp],xmm13 > > + pxor xmm15,xmm14 > > +DB 102,15,56,222,241 > > + movaps xmm13,xmm12 > > + movdqa xmm14,xmm9 > > +DB 102,15,56,222,249 > > + movups xmm1,XMMWORD[((-32))+rcx] > > + > > + paddd xmm9,xmm9 > > +DB 102,15,56,222,208 > > + pxor xmm12,xmm15 > > + psrad xmm14,31 > > +DB 102,15,56,222,216 > > + paddq xmm15,xmm15 > > + pand xmm14,xmm8 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > +DB 102,15,56,222,240 > > + pxor xmm15,xmm14 > > + movaps xmm14,xmm13 > > +DB 102,15,56,222,248 > > + > > + movdqa xmm0,xmm9 > > + paddd xmm9,xmm9 > > +DB 102,15,56,222,209 > > + pxor xmm13,xmm15 > > + psrad xmm0,31 > > +DB 102,15,56,222,217 > > + paddq xmm15,xmm15 > > + pand xmm0,xmm8 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > + pxor xmm15,xmm0 > > + movups xmm0,XMMWORD[rbp] > > +DB 102,15,56,222,241 > > +DB 102,15,56,222,249 > > + movups xmm1,XMMWORD[16+rbp] > > + > > + pxor xmm14,xmm15 > > +DB 102,15,56,223,84,36,0 > > + psrad xmm9,31 > > + paddq xmm15,xmm15 > > +DB 102,15,56,223,92,36,16 > > +DB 102,15,56,223,100,36,32 > > + pand xmm9,xmm8 > > + mov rax,r10 > > +DB 102,15,56,223,108,36,48 > > +DB 102,15,56,223,116,36,64 > > +DB 102,15,56,223,124,36,80 > > + pxor xmm15,xmm9 > > + > > + lea rsi,[96+rsi] > > + movups XMMWORD[(-96)+rsi],xmm2 > > + movups XMMWORD[(-80)+rsi],xmm3 > > + movups XMMWORD[(-64)+rsi],xmm4 > > + movups XMMWORD[(-48)+rsi],xmm5 > > + movups XMMWORD[(-32)+rsi],xmm6 > > + movups XMMWORD[(-16)+rsi],xmm7 > > + sub rdx,16*6 > > + jnc NEAR $L$xts_dec_grandloop > > + > > + mov eax,16+96 > > + sub eax,r10d > > + mov rcx,rbp > > + shr eax,4 > > + > > +$L$xts_dec_short: > > + > > + mov r10d,eax > > + pxor xmm10,xmm0 > > + pxor xmm11,xmm0 > > + add rdx,16*6 > > + jz NEAR $L$xts_dec_done > > + > > + pxor xmm12,xmm0 > > + cmp rdx,0x20 > > + jb NEAR $L$xts_dec_one > > + pxor xmm13,xmm0 > > + je NEAR $L$xts_dec_two > > + > > + pxor xmm14,xmm0 > > + cmp rdx,0x40 > > + jb NEAR $L$xts_dec_three > > + je NEAR $L$xts_dec_four > > + > > + movdqu xmm2,XMMWORD[rdi] > > + movdqu xmm3,XMMWORD[16+rdi] > > + movdqu xmm4,XMMWORD[32+rdi] > > + pxor xmm2,xmm10 > > + movdqu xmm5,XMMWORD[48+rdi] > > + pxor xmm3,xmm11 > > + movdqu xmm6,XMMWORD[64+rdi] > > + lea rdi,[80+rdi] > > + pxor xmm4,xmm12 > > + pxor xmm5,xmm13 > > + pxor xmm6,xmm14 > > + > > + call _aesni_decrypt6 > > + > > + xorps xmm2,xmm10 > > + xorps xmm3,xmm11 > > + xorps xmm4,xmm12 > > + movdqu XMMWORD[rsi],xmm2 > > + xorps xmm5,xmm13 > > + movdqu XMMWORD[16+rsi],xmm3 > > + xorps xmm6,xmm14 > > + movdqu XMMWORD[32+rsi],xmm4 > > + pxor xmm14,xmm14 > > + movdqu XMMWORD[48+rsi],xmm5 > > + pcmpgtd xmm14,xmm15 > > + movdqu XMMWORD[64+rsi],xmm6 > > + lea rsi,[80+rsi] > > + pshufd xmm11,xmm14,0x13 > > + and r9,15 > > + jz NEAR $L$xts_dec_ret > > + > > + movdqa xmm10,xmm15 > > + paddq xmm15,xmm15 > > + pand xmm11,xmm8 > > + pxor xmm11,xmm15 > > + jmp NEAR $L$xts_dec_done2 > > + > > +ALIGN 16 > > +$L$xts_dec_one: > > + movups xmm2,XMMWORD[rdi] > > + lea 
rdi,[16+rdi] > > + xorps xmm2,xmm10 > > + movups xmm0,XMMWORD[rcx] > > + movups xmm1,XMMWORD[16+rcx] > > + lea rcx,[32+rcx] > > + xorps xmm2,xmm0 > > +$L$oop_dec1_12: > > +DB 102,15,56,222,209 > > + dec eax > > + movups xmm1,XMMWORD[rcx] > > + lea rcx,[16+rcx] > > + jnz NEAR $L$oop_dec1_12 > > +DB 102,15,56,223,209 > > + xorps xmm2,xmm10 > > + movdqa xmm10,xmm11 > > + movups XMMWORD[rsi],xmm2 > > + movdqa xmm11,xmm12 > > + lea rsi,[16+rsi] > > + jmp NEAR $L$xts_dec_done > > + > > +ALIGN 16 > > +$L$xts_dec_two: > > + movups xmm2,XMMWORD[rdi] > > + movups xmm3,XMMWORD[16+rdi] > > + lea rdi,[32+rdi] > > + xorps xmm2,xmm10 > > + xorps xmm3,xmm11 > > + > > + call _aesni_decrypt2 > > + > > + xorps xmm2,xmm10 > > + movdqa xmm10,xmm12 > > + xorps xmm3,xmm11 > > + movdqa xmm11,xmm13 > > + movups XMMWORD[rsi],xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + lea rsi,[32+rsi] > > + jmp NEAR $L$xts_dec_done > > + > > +ALIGN 16 > > +$L$xts_dec_three: > > + movups xmm2,XMMWORD[rdi] > > + movups xmm3,XMMWORD[16+rdi] > > + movups xmm4,XMMWORD[32+rdi] > > + lea rdi,[48+rdi] > > + xorps xmm2,xmm10 > > + xorps xmm3,xmm11 > > + xorps xmm4,xmm12 > > + > > + call _aesni_decrypt3 > > + > > + xorps xmm2,xmm10 > > + movdqa xmm10,xmm13 > > + xorps xmm3,xmm11 > > + movdqa xmm11,xmm14 > > + xorps xmm4,xmm12 > > + movups XMMWORD[rsi],xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + lea rsi,[48+rsi] > > + jmp NEAR $L$xts_dec_done > > + > > +ALIGN 16 > > +$L$xts_dec_four: > > + movups xmm2,XMMWORD[rdi] > > + movups xmm3,XMMWORD[16+rdi] > > + movups xmm4,XMMWORD[32+rdi] > > + xorps xmm2,xmm10 > > + movups xmm5,XMMWORD[48+rdi] > > + lea rdi,[64+rdi] > > + xorps xmm3,xmm11 > > + xorps xmm4,xmm12 > > + xorps xmm5,xmm13 > > + > > + call _aesni_decrypt4 > > + > > + pxor xmm2,xmm10 > > + movdqa xmm10,xmm14 > > + pxor xmm3,xmm11 > > + movdqa xmm11,xmm15 > > + pxor xmm4,xmm12 > > + movdqu XMMWORD[rsi],xmm2 > > + pxor xmm5,xmm13 > > + movdqu XMMWORD[16+rsi],xmm3 > > + movdqu XMMWORD[32+rsi],xmm4 > > + movdqu XMMWORD[48+rsi],xmm5 > > + lea rsi,[64+rsi] > > + jmp NEAR $L$xts_dec_done > > + > > +ALIGN 16 > > +$L$xts_dec_done: > > + and r9,15 > > + jz NEAR $L$xts_dec_ret > > +$L$xts_dec_done2: > > + mov rdx,r9 > > + mov rcx,rbp > > + mov eax,r10d > > + > > + movups xmm2,XMMWORD[rdi] > > + xorps xmm2,xmm11 > > + movups xmm0,XMMWORD[rcx] > > + movups xmm1,XMMWORD[16+rcx] > > + lea rcx,[32+rcx] > > + xorps xmm2,xmm0 > > +$L$oop_dec1_13: > > +DB 102,15,56,222,209 > > + dec eax > > + movups xmm1,XMMWORD[rcx] > > + lea rcx,[16+rcx] > > + jnz NEAR $L$oop_dec1_13 > > +DB 102,15,56,223,209 > > + xorps xmm2,xmm11 > > + movups XMMWORD[rsi],xmm2 > > + > > +$L$xts_dec_steal: > > + movzx eax,BYTE[16+rdi] > > + movzx ecx,BYTE[rsi] > > + lea rdi,[1+rdi] > > + mov BYTE[rsi],al > > + mov BYTE[16+rsi],cl > > + lea rsi,[1+rsi] > > + sub rdx,1 > > + jnz NEAR $L$xts_dec_steal > > + > > + sub rsi,r9 > > + mov rcx,rbp > > + mov eax,r10d > > + > > + movups xmm2,XMMWORD[rsi] > > + xorps xmm2,xmm10 > > + movups xmm0,XMMWORD[rcx] > > + movups xmm1,XMMWORD[16+rcx] > > + lea rcx,[32+rcx] > > + xorps xmm2,xmm0 > > +$L$oop_dec1_14: > > +DB 102,15,56,222,209 > > + dec eax > > + movups xmm1,XMMWORD[rcx] > > + lea rcx,[16+rcx] > > + jnz NEAR $L$oop_dec1_14 > > +DB 102,15,56,223,209 > > + xorps xmm2,xmm10 > > + movups XMMWORD[rsi],xmm2 > > + > > +$L$xts_dec_ret: > > + xorps xmm0,xmm0 > > + pxor xmm1,xmm1 > > + pxor xmm2,xmm2 > > + pxor xmm3,xmm3 > > + pxor xmm4,xmm4 > > + pxor xmm5,xmm5 > > + movaps xmm6,XMMWORD[((-168))+r11] > > + movaps 
XMMWORD[(-168)+r11],xmm0 > > + movaps xmm7,XMMWORD[((-152))+r11] > > + movaps XMMWORD[(-152)+r11],xmm0 > > + movaps xmm8,XMMWORD[((-136))+r11] > > + movaps XMMWORD[(-136)+r11],xmm0 > > + movaps xmm9,XMMWORD[((-120))+r11] > > + movaps XMMWORD[(-120)+r11],xmm0 > > + movaps xmm10,XMMWORD[((-104))+r11] > > + movaps XMMWORD[(-104)+r11],xmm0 > > + movaps xmm11,XMMWORD[((-88))+r11] > > + movaps XMMWORD[(-88)+r11],xmm0 > > + movaps xmm12,XMMWORD[((-72))+r11] > > + movaps XMMWORD[(-72)+r11],xmm0 > > + movaps xmm13,XMMWORD[((-56))+r11] > > + movaps XMMWORD[(-56)+r11],xmm0 > > + movaps xmm14,XMMWORD[((-40))+r11] > > + movaps XMMWORD[(-40)+r11],xmm0 > > + movaps xmm15,XMMWORD[((-24))+r11] > > + movaps XMMWORD[(-24)+r11],xmm0 > > + movaps XMMWORD[rsp],xmm0 > > + movaps XMMWORD[16+rsp],xmm0 > > + movaps XMMWORD[32+rsp],xmm0 > > + movaps XMMWORD[48+rsp],xmm0 > > + movaps XMMWORD[64+rsp],xmm0 > > + movaps XMMWORD[80+rsp],xmm0 > > + movaps XMMWORD[96+rsp],xmm0 > > + mov rbp,QWORD[((-8))+r11] > > + > > + lea rsp,[r11] > > + > > +$L$xts_dec_epilogue: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_aesni_xts_decrypt: > > +global aesni_ocb_encrypt > > + > > +ALIGN 32 > > +aesni_ocb_encrypt: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_aesni_ocb_encrypt: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + mov rcx,r9 > > + mov r8,QWORD[40+rsp] > > + mov r9,QWORD[48+rsp] > > + > > + > > + > > + lea rax,[rsp] > > + push rbx > > + > > + push rbp > > + > > + push r12 > > + > > + push r13 > > + > > + push r14 > > + > > + lea rsp,[((-160))+rsp] > > + movaps XMMWORD[rsp],xmm6 > > + movaps XMMWORD[16+rsp],xmm7 > > + movaps XMMWORD[32+rsp],xmm8 > > + movaps XMMWORD[48+rsp],xmm9 > > + movaps XMMWORD[64+rsp],xmm10 > > + movaps XMMWORD[80+rsp],xmm11 > > + movaps XMMWORD[96+rsp],xmm12 > > + movaps XMMWORD[112+rsp],xmm13 > > + movaps XMMWORD[128+rsp],xmm14 > > + movaps XMMWORD[144+rsp],xmm15 > > +$L$ocb_enc_body: > > + mov rbx,QWORD[56+rax] > > + mov rbp,QWORD[((56+8))+rax] > > + > > + mov r10d,DWORD[240+rcx] > > + mov r11,rcx > > + shl r10d,4 > > + movups xmm9,XMMWORD[rcx] > > + movups xmm1,XMMWORD[16+r10*1+rcx] > > + > > + movdqu xmm15,XMMWORD[r9] > > + pxor xmm9,xmm1 > > + pxor xmm15,xmm1 > > + > > + mov eax,16+32 > > + lea rcx,[32+r10*1+r11] > > + movups xmm1,XMMWORD[16+r11] > > + sub rax,r10 > > + mov r10,rax > > + > > + movdqu xmm10,XMMWORD[rbx] > > + movdqu xmm8,XMMWORD[rbp] > > + > > + test r8,1 > > + jnz NEAR $L$ocb_enc_odd > > + > > + bsf r12,r8 > > + add r8,1 > > + shl r12,4 > > + movdqu xmm7,XMMWORD[r12*1+rbx] > > + movdqu xmm2,XMMWORD[rdi] > > + lea rdi,[16+rdi] > > + > > + call __ocb_encrypt1 > > + > > + movdqa xmm15,xmm7 > > + movups XMMWORD[rsi],xmm2 > > + lea rsi,[16+rsi] > > + sub rdx,1 > > + jz NEAR $L$ocb_enc_done > > + > > +$L$ocb_enc_odd: > > + lea r12,[1+r8] > > + lea r13,[3+r8] > > + lea r14,[5+r8] > > + lea r8,[6+r8] > > + bsf r12,r12 > > + bsf r13,r13 > > + bsf r14,r14 > > + shl r12,4 > > + shl r13,4 > > + shl r14,4 > > + > > + sub rdx,6 > > + jc NEAR $L$ocb_enc_short > > + jmp NEAR $L$ocb_enc_grandloop > > + > > +ALIGN 32 > > +$L$ocb_enc_grandloop: > > + movdqu xmm2,XMMWORD[rdi] > > + movdqu xmm3,XMMWORD[16+rdi] > > + movdqu xmm4,XMMWORD[32+rdi] > > + movdqu xmm5,XMMWORD[48+rdi] > > + movdqu xmm6,XMMWORD[64+rdi] > > + movdqu xmm7,XMMWORD[80+rdi] > > + lea rdi,[96+rdi] > > + > > + call __ocb_encrypt6 > > + > > + movups XMMWORD[rsi],xmm2 > > + movups 
XMMWORD[16+rsi],xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + movups XMMWORD[48+rsi],xmm5 > > + movups XMMWORD[64+rsi],xmm6 > > + movups XMMWORD[80+rsi],xmm7 > > + lea rsi,[96+rsi] > > + sub rdx,6 > > + jnc NEAR $L$ocb_enc_grandloop > > + > > +$L$ocb_enc_short: > > + add rdx,6 > > + jz NEAR $L$ocb_enc_done > > + > > + movdqu xmm2,XMMWORD[rdi] > > + cmp rdx,2 > > + jb NEAR $L$ocb_enc_one > > + movdqu xmm3,XMMWORD[16+rdi] > > + je NEAR $L$ocb_enc_two > > + > > + movdqu xmm4,XMMWORD[32+rdi] > > + cmp rdx,4 > > + jb NEAR $L$ocb_enc_three > > + movdqu xmm5,XMMWORD[48+rdi] > > + je NEAR $L$ocb_enc_four > > + > > + movdqu xmm6,XMMWORD[64+rdi] > > + pxor xmm7,xmm7 > > + > > + call __ocb_encrypt6 > > + > > + movdqa xmm15,xmm14 > > + movups XMMWORD[rsi],xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + movups XMMWORD[48+rsi],xmm5 > > + movups XMMWORD[64+rsi],xmm6 > > + > > + jmp NEAR $L$ocb_enc_done > > + > > +ALIGN 16 > > +$L$ocb_enc_one: > > + movdqa xmm7,xmm10 > > + > > + call __ocb_encrypt1 > > + > > + movdqa xmm15,xmm7 > > + movups XMMWORD[rsi],xmm2 > > + jmp NEAR $L$ocb_enc_done > > + > > +ALIGN 16 > > +$L$ocb_enc_two: > > + pxor xmm4,xmm4 > > + pxor xmm5,xmm5 > > + > > + call __ocb_encrypt4 > > + > > + movdqa xmm15,xmm11 > > + movups XMMWORD[rsi],xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + > > + jmp NEAR $L$ocb_enc_done > > + > > +ALIGN 16 > > +$L$ocb_enc_three: > > + pxor xmm5,xmm5 > > + > > + call __ocb_encrypt4 > > + > > + movdqa xmm15,xmm12 > > + movups XMMWORD[rsi],xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + > > + jmp NEAR $L$ocb_enc_done > > + > > +ALIGN 16 > > +$L$ocb_enc_four: > > + call __ocb_encrypt4 > > + > > + movdqa xmm15,xmm13 > > + movups XMMWORD[rsi],xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + movups XMMWORD[48+rsi],xmm5 > > + > > +$L$ocb_enc_done: > > + pxor xmm15,xmm0 > > + movdqu XMMWORD[rbp],xmm8 > > + movdqu XMMWORD[r9],xmm15 > > + > > + xorps xmm0,xmm0 > > + pxor xmm1,xmm1 > > + pxor xmm2,xmm2 > > + pxor xmm3,xmm3 > > + pxor xmm4,xmm4 > > + pxor xmm5,xmm5 > > + movaps xmm6,XMMWORD[rsp] > > + movaps XMMWORD[rsp],xmm0 > > + movaps xmm7,XMMWORD[16+rsp] > > + movaps XMMWORD[16+rsp],xmm0 > > + movaps xmm8,XMMWORD[32+rsp] > > + movaps XMMWORD[32+rsp],xmm0 > > + movaps xmm9,XMMWORD[48+rsp] > > + movaps XMMWORD[48+rsp],xmm0 > > + movaps xmm10,XMMWORD[64+rsp] > > + movaps XMMWORD[64+rsp],xmm0 > > + movaps xmm11,XMMWORD[80+rsp] > > + movaps XMMWORD[80+rsp],xmm0 > > + movaps xmm12,XMMWORD[96+rsp] > > + movaps XMMWORD[96+rsp],xmm0 > > + movaps xmm13,XMMWORD[112+rsp] > > + movaps XMMWORD[112+rsp],xmm0 > > + movaps xmm14,XMMWORD[128+rsp] > > + movaps XMMWORD[128+rsp],xmm0 > > + movaps xmm15,XMMWORD[144+rsp] > > + movaps XMMWORD[144+rsp],xmm0 > > + lea rax,[((160+40))+rsp] > > +$L$ocb_enc_pop: > > + mov r14,QWORD[((-40))+rax] > > + > > + mov r13,QWORD[((-32))+rax] > > + > > + mov r12,QWORD[((-24))+rax] > > + > > + mov rbp,QWORD[((-16))+rax] > > + > > + mov rbx,QWORD[((-8))+rax] > > + > > + lea rsp,[rax] > > + > > +$L$ocb_enc_epilogue: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_aesni_ocb_encrypt: > > + > > + > > +ALIGN 32 > > +__ocb_encrypt6: > > + > > + pxor xmm15,xmm9 > > + movdqu xmm11,XMMWORD[r12*1+rbx] > > + movdqa xmm12,xmm10 > > + movdqu xmm13,XMMWORD[r13*1+rbx] > > + movdqa xmm14,xmm10 > > + pxor xmm10,xmm15 > > + movdqu xmm15,XMMWORD[r14*1+rbx] > > + pxor xmm11,xmm10 > > + pxor xmm8,xmm2 > > + pxor 
xmm2,xmm10 > > + pxor xmm12,xmm11 > > + pxor xmm8,xmm3 > > + pxor xmm3,xmm11 > > + pxor xmm13,xmm12 > > + pxor xmm8,xmm4 > > + pxor xmm4,xmm12 > > + pxor xmm14,xmm13 > > + pxor xmm8,xmm5 > > + pxor xmm5,xmm13 > > + pxor xmm15,xmm14 > > + pxor xmm8,xmm6 > > + pxor xmm6,xmm14 > > + pxor xmm8,xmm7 > > + pxor xmm7,xmm15 > > + movups xmm0,XMMWORD[32+r11] > > + > > + lea r12,[1+r8] > > + lea r13,[3+r8] > > + lea r14,[5+r8] > > + add r8,6 > > + pxor xmm10,xmm9 > > + bsf r12,r12 > > + bsf r13,r13 > > + bsf r14,r14 > > + > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > + pxor xmm11,xmm9 > > + pxor xmm12,xmm9 > > +DB 102,15,56,220,241 > > + pxor xmm13,xmm9 > > + pxor xmm14,xmm9 > > +DB 102,15,56,220,249 > > + movups xmm1,XMMWORD[48+r11] > > + pxor xmm15,xmm9 > > + > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > +DB 102,15,56,220,240 > > +DB 102,15,56,220,248 > > + movups xmm0,XMMWORD[64+r11] > > + shl r12,4 > > + shl r13,4 > > + jmp NEAR $L$ocb_enc_loop6 > > + > > +ALIGN 32 > > +$L$ocb_enc_loop6: > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > +DB 102,15,56,220,241 > > +DB 102,15,56,220,249 > > + movups xmm1,XMMWORD[rax*1+rcx] > > + add rax,32 > > + > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > +DB 102,15,56,220,240 > > +DB 102,15,56,220,248 > > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > > + jnz NEAR $L$ocb_enc_loop6 > > + > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > +DB 102,15,56,220,241 > > +DB 102,15,56,220,249 > > + movups xmm1,XMMWORD[16+r11] > > + shl r14,4 > > + > > +DB 102,65,15,56,221,210 > > + movdqu xmm10,XMMWORD[rbx] > > + mov rax,r10 > > +DB 102,65,15,56,221,219 > > +DB 102,65,15,56,221,228 > > +DB 102,65,15,56,221,237 > > +DB 102,65,15,56,221,246 > > +DB 102,65,15,56,221,255 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > + > > +ALIGN 32 > > +__ocb_encrypt4: > > + > > + pxor xmm15,xmm9 > > + movdqu xmm11,XMMWORD[r12*1+rbx] > > + movdqa xmm12,xmm10 > > + movdqu xmm13,XMMWORD[r13*1+rbx] > > + pxor xmm10,xmm15 > > + pxor xmm11,xmm10 > > + pxor xmm8,xmm2 > > + pxor xmm2,xmm10 > > + pxor xmm12,xmm11 > > + pxor xmm8,xmm3 > > + pxor xmm3,xmm11 > > + pxor xmm13,xmm12 > > + pxor xmm8,xmm4 > > + pxor xmm4,xmm12 > > + pxor xmm8,xmm5 > > + pxor xmm5,xmm13 > > + movups xmm0,XMMWORD[32+r11] > > + > > + pxor xmm10,xmm9 > > + pxor xmm11,xmm9 > > + pxor xmm12,xmm9 > > + pxor xmm13,xmm9 > > + > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > + movups xmm1,XMMWORD[48+r11] > > + > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > + movups xmm0,XMMWORD[64+r11] > > + jmp NEAR $L$ocb_enc_loop4 > > + > > +ALIGN 32 > > +$L$ocb_enc_loop4: > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > + movups xmm1,XMMWORD[rax*1+rcx] > > + add rax,32 > > + > > +DB 102,15,56,220,208 > > +DB 102,15,56,220,216 > > +DB 102,15,56,220,224 > > +DB 102,15,56,220,232 > > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > > + jnz NEAR $L$ocb_enc_loop4 > > + > > +DB 102,15,56,220,209 > > +DB 102,15,56,220,217 > > +DB 102,15,56,220,225 > > +DB 102,15,56,220,233 > > + movups xmm1,XMMWORD[16+r11] > > + mov rax,r10 > > + > > +DB 102,65,15,56,221,210 > > +DB 102,65,15,56,221,219 > 
> +DB 102,65,15,56,221,228 > > +DB 102,65,15,56,221,237 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > + > > +ALIGN 32 > > +__ocb_encrypt1: > > + > > + pxor xmm7,xmm15 > > + pxor xmm7,xmm9 > > + pxor xmm8,xmm2 > > + pxor xmm2,xmm7 > > + movups xmm0,XMMWORD[32+r11] > > + > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[48+r11] > > + pxor xmm7,xmm9 > > + > > +DB 102,15,56,220,208 > > + movups xmm0,XMMWORD[64+r11] > > + jmp NEAR $L$ocb_enc_loop1 > > + > > +ALIGN 32 > > +$L$ocb_enc_loop1: > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[rax*1+rcx] > > + add rax,32 > > + > > +DB 102,15,56,220,208 > > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > > + jnz NEAR $L$ocb_enc_loop1 > > + > > +DB 102,15,56,220,209 > > + movups xmm1,XMMWORD[16+r11] > > + mov rax,r10 > > + > > +DB 102,15,56,221,215 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +global aesni_ocb_decrypt > > + > > +ALIGN 32 > > +aesni_ocb_decrypt: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_aesni_ocb_decrypt: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + mov rcx,r9 > > + mov r8,QWORD[40+rsp] > > + mov r9,QWORD[48+rsp] > > + > > + > > + > > + lea rax,[rsp] > > + push rbx > > + > > + push rbp > > + > > + push r12 > > + > > + push r13 > > + > > + push r14 > > + > > + lea rsp,[((-160))+rsp] > > + movaps XMMWORD[rsp],xmm6 > > + movaps XMMWORD[16+rsp],xmm7 > > + movaps XMMWORD[32+rsp],xmm8 > > + movaps XMMWORD[48+rsp],xmm9 > > + movaps XMMWORD[64+rsp],xmm10 > > + movaps XMMWORD[80+rsp],xmm11 > > + movaps XMMWORD[96+rsp],xmm12 > > + movaps XMMWORD[112+rsp],xmm13 > > + movaps XMMWORD[128+rsp],xmm14 > > + movaps XMMWORD[144+rsp],xmm15 > > +$L$ocb_dec_body: > > + mov rbx,QWORD[56+rax] > > + mov rbp,QWORD[((56+8))+rax] > > + > > + mov r10d,DWORD[240+rcx] > > + mov r11,rcx > > + shl r10d,4 > > + movups xmm9,XMMWORD[rcx] > > + movups xmm1,XMMWORD[16+r10*1+rcx] > > + > > + movdqu xmm15,XMMWORD[r9] > > + pxor xmm9,xmm1 > > + pxor xmm15,xmm1 > > + > > + mov eax,16+32 > > + lea rcx,[32+r10*1+r11] > > + movups xmm1,XMMWORD[16+r11] > > + sub rax,r10 > > + mov r10,rax > > + > > + movdqu xmm10,XMMWORD[rbx] > > + movdqu xmm8,XMMWORD[rbp] > > + > > + test r8,1 > > + jnz NEAR $L$ocb_dec_odd > > + > > + bsf r12,r8 > > + add r8,1 > > + shl r12,4 > > + movdqu xmm7,XMMWORD[r12*1+rbx] > > + movdqu xmm2,XMMWORD[rdi] > > + lea rdi,[16+rdi] > > + > > + call __ocb_decrypt1 > > + > > + movdqa xmm15,xmm7 > > + movups XMMWORD[rsi],xmm2 > > + xorps xmm8,xmm2 > > + lea rsi,[16+rsi] > > + sub rdx,1 > > + jz NEAR $L$ocb_dec_done > > + > > +$L$ocb_dec_odd: > > + lea r12,[1+r8] > > + lea r13,[3+r8] > > + lea r14,[5+r8] > > + lea r8,[6+r8] > > + bsf r12,r12 > > + bsf r13,r13 > > + bsf r14,r14 > > + shl r12,4 > > + shl r13,4 > > + shl r14,4 > > + > > + sub rdx,6 > > + jc NEAR $L$ocb_dec_short > > + jmp NEAR $L$ocb_dec_grandloop > > + > > +ALIGN 32 > > +$L$ocb_dec_grandloop: > > + movdqu xmm2,XMMWORD[rdi] > > + movdqu xmm3,XMMWORD[16+rdi] > > + movdqu xmm4,XMMWORD[32+rdi] > > + movdqu xmm5,XMMWORD[48+rdi] > > + movdqu xmm6,XMMWORD[64+rdi] > > + movdqu xmm7,XMMWORD[80+rdi] > > + lea rdi,[96+rdi] > > + > > + call __ocb_decrypt6 > > + > > + movups XMMWORD[rsi],xmm2 > > + pxor xmm8,xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + pxor xmm8,xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + pxor xmm8,xmm4 > > + movups XMMWORD[48+rsi],xmm5 > > + pxor xmm8,xmm5 > > + movups XMMWORD[64+rsi],xmm6 > > + pxor xmm8,xmm6 > > + movups XMMWORD[80+rsi],xmm7 > > + pxor xmm8,xmm7 > > + lea rsi,[96+rsi] > > + sub rdx,6 > > + jnc 
NEAR $L$ocb_dec_grandloop > > + > > +$L$ocb_dec_short: > > + add rdx,6 > > + jz NEAR $L$ocb_dec_done > > + > > + movdqu xmm2,XMMWORD[rdi] > > + cmp rdx,2 > > + jb NEAR $L$ocb_dec_one > > + movdqu xmm3,XMMWORD[16+rdi] > > + je NEAR $L$ocb_dec_two > > + > > + movdqu xmm4,XMMWORD[32+rdi] > > + cmp rdx,4 > > + jb NEAR $L$ocb_dec_three > > + movdqu xmm5,XMMWORD[48+rdi] > > + je NEAR $L$ocb_dec_four > > + > > + movdqu xmm6,XMMWORD[64+rdi] > > + pxor xmm7,xmm7 > > + > > + call __ocb_decrypt6 > > + > > + movdqa xmm15,xmm14 > > + movups XMMWORD[rsi],xmm2 > > + pxor xmm8,xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + pxor xmm8,xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + pxor xmm8,xmm4 > > + movups XMMWORD[48+rsi],xmm5 > > + pxor xmm8,xmm5 > > + movups XMMWORD[64+rsi],xmm6 > > + pxor xmm8,xmm6 > > + > > + jmp NEAR $L$ocb_dec_done > > + > > +ALIGN 16 > > +$L$ocb_dec_one: > > + movdqa xmm7,xmm10 > > + > > + call __ocb_decrypt1 > > + > > + movdqa xmm15,xmm7 > > + movups XMMWORD[rsi],xmm2 > > + xorps xmm8,xmm2 > > + jmp NEAR $L$ocb_dec_done > > + > > +ALIGN 16 > > +$L$ocb_dec_two: > > + pxor xmm4,xmm4 > > + pxor xmm5,xmm5 > > + > > + call __ocb_decrypt4 > > + > > + movdqa xmm15,xmm11 > > + movups XMMWORD[rsi],xmm2 > > + xorps xmm8,xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + xorps xmm8,xmm3 > > + > > + jmp NEAR $L$ocb_dec_done > > + > > +ALIGN 16 > > +$L$ocb_dec_three: > > + pxor xmm5,xmm5 > > + > > + call __ocb_decrypt4 > > + > > + movdqa xmm15,xmm12 > > + movups XMMWORD[rsi],xmm2 > > + xorps xmm8,xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + xorps xmm8,xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + xorps xmm8,xmm4 > > + > > + jmp NEAR $L$ocb_dec_done > > + > > +ALIGN 16 > > +$L$ocb_dec_four: > > + call __ocb_decrypt4 > > + > > + movdqa xmm15,xmm13 > > + movups XMMWORD[rsi],xmm2 > > + pxor xmm8,xmm2 > > + movups XMMWORD[16+rsi],xmm3 > > + pxor xmm8,xmm3 > > + movups XMMWORD[32+rsi],xmm4 > > + pxor xmm8,xmm4 > > + movups XMMWORD[48+rsi],xmm5 > > + pxor xmm8,xmm5 > > + > > +$L$ocb_dec_done: > > + pxor xmm15,xmm0 > > + movdqu XMMWORD[rbp],xmm8 > > + movdqu XMMWORD[r9],xmm15 > > + > > + xorps xmm0,xmm0 > > + pxor xmm1,xmm1 > > + pxor xmm2,xmm2 > > + pxor xmm3,xmm3 > > + pxor xmm4,xmm4 > > + pxor xmm5,xmm5 > > + movaps xmm6,XMMWORD[rsp] > > + movaps XMMWORD[rsp],xmm0 > > + movaps xmm7,XMMWORD[16+rsp] > > + movaps XMMWORD[16+rsp],xmm0 > > + movaps xmm8,XMMWORD[32+rsp] > > + movaps XMMWORD[32+rsp],xmm0 > > + movaps xmm9,XMMWORD[48+rsp] > > + movaps XMMWORD[48+rsp],xmm0 > > + movaps xmm10,XMMWORD[64+rsp] > > + movaps XMMWORD[64+rsp],xmm0 > > + movaps xmm11,XMMWORD[80+rsp] > > + movaps XMMWORD[80+rsp],xmm0 > > + movaps xmm12,XMMWORD[96+rsp] > > + movaps XMMWORD[96+rsp],xmm0 > > + movaps xmm13,XMMWORD[112+rsp] > > + movaps XMMWORD[112+rsp],xmm0 > > + movaps xmm14,XMMWORD[128+rsp] > > + movaps XMMWORD[128+rsp],xmm0 > > + movaps xmm15,XMMWORD[144+rsp] > > + movaps XMMWORD[144+rsp],xmm0 > > + lea rax,[((160+40))+rsp] > > +$L$ocb_dec_pop: > > + mov r14,QWORD[((-40))+rax] > > + > > + mov r13,QWORD[((-32))+rax] > > + > > + mov r12,QWORD[((-24))+rax] > > + > > + mov rbp,QWORD[((-16))+rax] > > + > > + mov rbx,QWORD[((-8))+rax] > > + > > + lea rsp,[rax] > > + > > +$L$ocb_dec_epilogue: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_aesni_ocb_decrypt: > > + > > + > > +ALIGN 32 > > +__ocb_decrypt6: > > + > > + pxor xmm15,xmm9 > > + movdqu xmm11,XMMWORD[r12*1+rbx] > > + movdqa xmm12,xmm10 > > + movdqu xmm13,XMMWORD[r13*1+rbx] > > + movdqa xmm14,xmm10 > > + 
pxor xmm10,xmm15 > > + movdqu xmm15,XMMWORD[r14*1+rbx] > > + pxor xmm11,xmm10 > > + pxor xmm2,xmm10 > > + pxor xmm12,xmm11 > > + pxor xmm3,xmm11 > > + pxor xmm13,xmm12 > > + pxor xmm4,xmm12 > > + pxor xmm14,xmm13 > > + pxor xmm5,xmm13 > > + pxor xmm15,xmm14 > > + pxor xmm6,xmm14 > > + pxor xmm7,xmm15 > > + movups xmm0,XMMWORD[32+r11] > > + > > + lea r12,[1+r8] > > + lea r13,[3+r8] > > + lea r14,[5+r8] > > + add r8,6 > > + pxor xmm10,xmm9 > > + bsf r12,r12 > > + bsf r13,r13 > > + bsf r14,r14 > > + > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > + pxor xmm11,xmm9 > > + pxor xmm12,xmm9 > > +DB 102,15,56,222,241 > > + pxor xmm13,xmm9 > > + pxor xmm14,xmm9 > > +DB 102,15,56,222,249 > > + movups xmm1,XMMWORD[48+r11] > > + pxor xmm15,xmm9 > > + > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > +DB 102,15,56,222,240 > > +DB 102,15,56,222,248 > > + movups xmm0,XMMWORD[64+r11] > > + shl r12,4 > > + shl r13,4 > > + jmp NEAR $L$ocb_dec_loop6 > > + > > +ALIGN 32 > > +$L$ocb_dec_loop6: > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > +DB 102,15,56,222,241 > > +DB 102,15,56,222,249 > > + movups xmm1,XMMWORD[rax*1+rcx] > > + add rax,32 > > + > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > +DB 102,15,56,222,240 > > +DB 102,15,56,222,248 > > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > > + jnz NEAR $L$ocb_dec_loop6 > > + > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > +DB 102,15,56,222,241 > > +DB 102,15,56,222,249 > > + movups xmm1,XMMWORD[16+r11] > > + shl r14,4 > > + > > +DB 102,65,15,56,223,210 > > + movdqu xmm10,XMMWORD[rbx] > > + mov rax,r10 > > +DB 102,65,15,56,223,219 > > +DB 102,65,15,56,223,228 > > +DB 102,65,15,56,223,237 > > +DB 102,65,15,56,223,246 > > +DB 102,65,15,56,223,255 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > + > > +ALIGN 32 > > +__ocb_decrypt4: > > + > > + pxor xmm15,xmm9 > > + movdqu xmm11,XMMWORD[r12*1+rbx] > > + movdqa xmm12,xmm10 > > + movdqu xmm13,XMMWORD[r13*1+rbx] > > + pxor xmm10,xmm15 > > + pxor xmm11,xmm10 > > + pxor xmm2,xmm10 > > + pxor xmm12,xmm11 > > + pxor xmm3,xmm11 > > + pxor xmm13,xmm12 > > + pxor xmm4,xmm12 > > + pxor xmm5,xmm13 > > + movups xmm0,XMMWORD[32+r11] > > + > > + pxor xmm10,xmm9 > > + pxor xmm11,xmm9 > > + pxor xmm12,xmm9 > > + pxor xmm13,xmm9 > > + > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > + movups xmm1,XMMWORD[48+r11] > > + > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > + movups xmm0,XMMWORD[64+r11] > > + jmp NEAR $L$ocb_dec_loop4 > > + > > +ALIGN 32 > > +$L$ocb_dec_loop4: > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > + movups xmm1,XMMWORD[rax*1+rcx] > > + add rax,32 > > + > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > > + jnz NEAR $L$ocb_dec_loop4 > > + > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > + movups xmm1,XMMWORD[16+r11] > > + mov rax,r10 > > + > > +DB 102,65,15,56,223,210 > > +DB 102,65,15,56,223,219 > > +DB 102,65,15,56,223,228 > > +DB 102,65,15,56,223,237 > > + DB 0F3h,0C3h ;repret > > + > > + > > 
+ > > + > > +ALIGN 32 > > +__ocb_decrypt1: > > + > > + pxor xmm7,xmm15 > > + pxor xmm7,xmm9 > > + pxor xmm2,xmm7 > > + movups xmm0,XMMWORD[32+r11] > > + > > +DB 102,15,56,222,209 > > + movups xmm1,XMMWORD[48+r11] > > + pxor xmm7,xmm9 > > + > > +DB 102,15,56,222,208 > > + movups xmm0,XMMWORD[64+r11] > > + jmp NEAR $L$ocb_dec_loop1 > > + > > +ALIGN 32 > > +$L$ocb_dec_loop1: > > +DB 102,15,56,222,209 > > + movups xmm1,XMMWORD[rax*1+rcx] > > + add rax,32 > > + > > +DB 102,15,56,222,208 > > + movups xmm0,XMMWORD[((-16))+rax*1+rcx] > > + jnz NEAR $L$ocb_dec_loop1 > > + > > +DB 102,15,56,222,209 > > + movups xmm1,XMMWORD[16+r11] > > + mov rax,r10 > > + > > +DB 102,15,56,223,215 > > + DB 0F3h,0C3h ;repret > > + > > + > > +global aesni_cbc_encrypt > > + > > +ALIGN 16 > > +aesni_cbc_encrypt: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_aesni_cbc_encrypt: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + mov rcx,r9 > > + mov r8,QWORD[40+rsp] > > + mov r9,QWORD[48+rsp] > > + > > + > > + > > + test rdx,rdx > > + jz NEAR $L$cbc_ret > > + > > + mov r10d,DWORD[240+rcx] > > + mov r11,rcx > > + test r9d,r9d > > + jz NEAR $L$cbc_decrypt > > + > > + movups xmm2,XMMWORD[r8] > > + mov eax,r10d > > + cmp rdx,16 > > + jb NEAR $L$cbc_enc_tail > > + sub rdx,16 > > + jmp NEAR $L$cbc_enc_loop > > +ALIGN 16 > > +$L$cbc_enc_loop: > > + movups xmm3,XMMWORD[rdi] > > + lea rdi,[16+rdi] > > + > > + movups xmm0,XMMWORD[rcx] > > + movups xmm1,XMMWORD[16+rcx] > > + xorps xmm3,xmm0 > > + lea rcx,[32+rcx] > > + xorps xmm2,xmm3 > > +$L$oop_enc1_15: > > +DB 102,15,56,220,209 > > + dec eax > > + movups xmm1,XMMWORD[rcx] > > + lea rcx,[16+rcx] > > + jnz NEAR $L$oop_enc1_15 > > +DB 102,15,56,221,209 > > + mov eax,r10d > > + mov rcx,r11 > > + movups XMMWORD[rsi],xmm2 > > + lea rsi,[16+rsi] > > + sub rdx,16 > > + jnc NEAR $L$cbc_enc_loop > > + add rdx,16 > > + jnz NEAR $L$cbc_enc_tail > > + pxor xmm0,xmm0 > > + pxor xmm1,xmm1 > > + movups XMMWORD[r8],xmm2 > > + pxor xmm2,xmm2 > > + pxor xmm3,xmm3 > > + jmp NEAR $L$cbc_ret > > + > > +$L$cbc_enc_tail: > > + mov rcx,rdx > > + xchg rsi,rdi > > + DD 0x9066A4F3 > > + mov ecx,16 > > + sub rcx,rdx > > + xor eax,eax > > + DD 0x9066AAF3 > > + lea rdi,[((-16))+rdi] > > + mov eax,r10d > > + mov rsi,rdi > > + mov rcx,r11 > > + xor rdx,rdx > > + jmp NEAR $L$cbc_enc_loop > > + > > +ALIGN 16 > > +$L$cbc_decrypt: > > + cmp rdx,16 > > + jne NEAR $L$cbc_decrypt_bulk > > + > > + > > + > > + movdqu xmm2,XMMWORD[rdi] > > + movdqu xmm3,XMMWORD[r8] > > + movdqa xmm4,xmm2 > > + movups xmm0,XMMWORD[rcx] > > + movups xmm1,XMMWORD[16+rcx] > > + lea rcx,[32+rcx] > > + xorps xmm2,xmm0 > > +$L$oop_dec1_16: > > +DB 102,15,56,222,209 > > + dec r10d > > + movups xmm1,XMMWORD[rcx] > > + lea rcx,[16+rcx] > > + jnz NEAR $L$oop_dec1_16 > > +DB 102,15,56,223,209 > > + pxor xmm0,xmm0 > > + pxor xmm1,xmm1 > > + movdqu XMMWORD[r8],xmm4 > > + xorps xmm2,xmm3 > > + pxor xmm3,xmm3 > > + movups XMMWORD[rsi],xmm2 > > + pxor xmm2,xmm2 > > + jmp NEAR $L$cbc_ret > > +ALIGN 16 > > +$L$cbc_decrypt_bulk: > > + lea r11,[rsp] > > + > > + push rbp > > + > > + sub rsp,176 > > + and rsp,-16 > > + movaps XMMWORD[16+rsp],xmm6 > > + movaps XMMWORD[32+rsp],xmm7 > > + movaps XMMWORD[48+rsp],xmm8 > > + movaps XMMWORD[64+rsp],xmm9 > > + movaps XMMWORD[80+rsp],xmm10 > > + movaps XMMWORD[96+rsp],xmm11 > > + movaps XMMWORD[112+rsp],xmm12 > > + movaps XMMWORD[128+rsp],xmm13 > > + movaps XMMWORD[144+rsp],xmm14 > > + movaps XMMWORD[160+rsp],xmm15 > > +$L$cbc_decrypt_body: > > + mov 
rbp,rcx > > + movups xmm10,XMMWORD[r8] > > + mov eax,r10d > > + cmp rdx,0x50 > > + jbe NEAR $L$cbc_dec_tail > > + > > + movups xmm0,XMMWORD[rcx] > > + movdqu xmm2,XMMWORD[rdi] > > + movdqu xmm3,XMMWORD[16+rdi] > > + movdqa xmm11,xmm2 > > + movdqu xmm4,XMMWORD[32+rdi] > > + movdqa xmm12,xmm3 > > + movdqu xmm5,XMMWORD[48+rdi] > > + movdqa xmm13,xmm4 > > + movdqu xmm6,XMMWORD[64+rdi] > > + movdqa xmm14,xmm5 > > + movdqu xmm7,XMMWORD[80+rdi] > > + movdqa xmm15,xmm6 > > + mov r9d,DWORD[((OPENSSL_ia32cap_P+4))] > > + cmp rdx,0x70 > > + jbe NEAR $L$cbc_dec_six_or_seven > > + > > + and r9d,71303168 > > + sub rdx,0x50 > > + cmp r9d,4194304 > > + je NEAR $L$cbc_dec_loop6_enter > > + sub rdx,0x20 > > + lea rcx,[112+rcx] > > + jmp NEAR $L$cbc_dec_loop8_enter > > +ALIGN 16 > > +$L$cbc_dec_loop8: > > + movups XMMWORD[rsi],xmm9 > > + lea rsi,[16+rsi] > > +$L$cbc_dec_loop8_enter: > > + movdqu xmm8,XMMWORD[96+rdi] > > + pxor xmm2,xmm0 > > + movdqu xmm9,XMMWORD[112+rdi] > > + pxor xmm3,xmm0 > > + movups xmm1,XMMWORD[((16-112))+rcx] > > + pxor xmm4,xmm0 > > + mov rbp,-1 > > + cmp rdx,0x70 > > + pxor xmm5,xmm0 > > + pxor xmm6,xmm0 > > + pxor xmm7,xmm0 > > + pxor xmm8,xmm0 > > + > > +DB 102,15,56,222,209 > > + pxor xmm9,xmm0 > > + movups xmm0,XMMWORD[((32-112))+rcx] > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > +DB 102,15,56,222,241 > > +DB 102,15,56,222,249 > > +DB 102,68,15,56,222,193 > > + adc rbp,0 > > + and rbp,128 > > +DB 102,68,15,56,222,201 > > + add rbp,rdi > > + movups xmm1,XMMWORD[((48-112))+rcx] > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > +DB 102,15,56,222,240 > > +DB 102,15,56,222,248 > > +DB 102,68,15,56,222,192 > > +DB 102,68,15,56,222,200 > > + movups xmm0,XMMWORD[((64-112))+rcx] > > + nop > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > +DB 102,15,56,222,241 > > +DB 102,15,56,222,249 > > +DB 102,68,15,56,222,193 > > +DB 102,68,15,56,222,201 > > + movups xmm1,XMMWORD[((80-112))+rcx] > > + nop > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > +DB 102,15,56,222,240 > > +DB 102,15,56,222,248 > > +DB 102,68,15,56,222,192 > > +DB 102,68,15,56,222,200 > > + movups xmm0,XMMWORD[((96-112))+rcx] > > + nop > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > +DB 102,15,56,222,241 > > +DB 102,15,56,222,249 > > +DB 102,68,15,56,222,193 > > +DB 102,68,15,56,222,201 > > + movups xmm1,XMMWORD[((112-112))+rcx] > > + nop > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > +DB 102,15,56,222,240 > > +DB 102,15,56,222,248 > > +DB 102,68,15,56,222,192 > > +DB 102,68,15,56,222,200 > > + movups xmm0,XMMWORD[((128-112))+rcx] > > + nop > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > +DB 102,15,56,222,241 > > +DB 102,15,56,222,249 > > +DB 102,68,15,56,222,193 > > +DB 102,68,15,56,222,201 > > + movups xmm1,XMMWORD[((144-112))+rcx] > > + cmp eax,11 > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > +DB 102,15,56,222,240 > > +DB 102,15,56,222,248 > > +DB 102,68,15,56,222,192 > > +DB 102,68,15,56,222,200 > > + movups xmm0,XMMWORD[((160-112))+rcx] > > + jb NEAR $L$cbc_dec_done > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > +DB 
102,15,56,222,241 > > +DB 102,15,56,222,249 > > +DB 102,68,15,56,222,193 > > +DB 102,68,15,56,222,201 > > + movups xmm1,XMMWORD[((176-112))+rcx] > > + nop > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > +DB 102,15,56,222,240 > > +DB 102,15,56,222,248 > > +DB 102,68,15,56,222,192 > > +DB 102,68,15,56,222,200 > > + movups xmm0,XMMWORD[((192-112))+rcx] > > + je NEAR $L$cbc_dec_done > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > +DB 102,15,56,222,241 > > +DB 102,15,56,222,249 > > +DB 102,68,15,56,222,193 > > +DB 102,68,15,56,222,201 > > + movups xmm1,XMMWORD[((208-112))+rcx] > > + nop > > +DB 102,15,56,222,208 > > +DB 102,15,56,222,216 > > +DB 102,15,56,222,224 > > +DB 102,15,56,222,232 > > +DB 102,15,56,222,240 > > +DB 102,15,56,222,248 > > +DB 102,68,15,56,222,192 > > +DB 102,68,15,56,222,200 > > + movups xmm0,XMMWORD[((224-112))+rcx] > > + jmp NEAR $L$cbc_dec_done > > +ALIGN 16 > > +$L$cbc_dec_done: > > +DB 102,15,56,222,209 > > +DB 102,15,56,222,217 > > + pxor xmm10,xmm0 > > + pxor xmm11,xmm0 > > +DB 102,15,56,222,225 > > +DB 102,15,56,222,233 > > + pxor xmm12,xmm0 > > + pxor xmm13,xmm0 > > +DB 102,15,56,222,241 > > +DB 102,15,56,222,249 > > + pxor xmm14,xmm0 > > + pxor xmm15,xmm0 > > +DB 102,68,15,56,222,193 > > +DB 102,68,15,56,222,201 > > + movdqu xmm1,XMMWORD[80+rdi] > > + > > +DB 102,65,15,56,223,210 > > + movdqu xmm10,XMMWORD[96+rdi] > > + pxor xmm1,xmm0 > > +DB 102,65,15,56,223,219 > > + pxor xmm10,xmm0 > > + movdqu xmm0,XMMWORD[112+rdi] > > +DB 102,65,15,56,223,228 > > + lea rdi,[128+rdi] > > + movdqu xmm11,XMMWORD[rbp] > > +DB 102,65,15,56,223,237 > > +DB 102,65,15,56,223,246 > > + movdqu xmm12,XMMWORD[16+rbp] > > + movdqu xmm13,XMMWORD[32+rbp] > > +DB 102,65,15,56,223,255 > > +DB 102,68,15,56,223,193 > > + movdqu xmm14,XMMWORD[48+rbp] > > + movdqu xmm15,XMMWORD[64+rbp] > > +DB 102,69,15,56,223,202 > > + movdqa xmm10,xmm0 > > + movdqu xmm1,XMMWORD[80+rbp] > > + movups xmm0,XMMWORD[((-112))+rcx] > > + > > + movups XMMWORD[rsi],xmm2 > > + movdqa xmm2,xmm11 > > + movups XMMWORD[16+rsi],xmm3 > > + movdqa xmm3,xmm12 > > + movups XMMWORD[32+rsi],xmm4 > > + movdqa xmm4,xmm13 > > + movups XMMWORD[48+rsi],xmm5 > > + movdqa xmm5,xmm14 > > + movups XMMWORD[64+rsi],xmm6 > > + movdqa xmm6,xmm15 > > + movups XMMWORD[80+rsi],xmm7 > > + movdqa xmm7,xmm1 > > + movups XMMWORD[96+rsi],xmm8 > > + lea rsi,[112+rsi] > > + > > + sub rdx,0x80 > > + ja NEAR $L$cbc_dec_loop8 > > + > > + movaps xmm2,xmm9 > > + lea rcx,[((-112))+rcx] > > + add rdx,0x70 > > + jle NEAR $L$cbc_dec_clear_tail_collected > > + movups XMMWORD[rsi],xmm9 > > + lea rsi,[16+rsi] > > + cmp rdx,0x50 > > + jbe NEAR $L$cbc_dec_tail > > + > > + movaps xmm2,xmm11 > > +$L$cbc_dec_six_or_seven: > > + cmp rdx,0x60 > > + ja NEAR $L$cbc_dec_seven > > + > > + movaps xmm8,xmm7 > > + call _aesni_decrypt6 > > + pxor xmm2,xmm10 > > + movaps xmm10,xmm8 > > + pxor xmm3,xmm11 > > + movdqu XMMWORD[rsi],xmm2 > > + pxor xmm4,xmm12 > > + movdqu XMMWORD[16+rsi],xmm3 > > + pxor xmm3,xmm3 > > + pxor xmm5,xmm13 > > + movdqu XMMWORD[32+rsi],xmm4 > > + pxor xmm4,xmm4 > > + pxor xmm6,xmm14 > > + movdqu XMMWORD[48+rsi],xmm5 > > + pxor xmm5,xmm5 > > + pxor xmm7,xmm15 > > + movdqu XMMWORD[64+rsi],xmm6 > > + pxor xmm6,xmm6 > > + lea rsi,[80+rsi] > > + movdqa xmm2,xmm7 > > + pxor xmm7,xmm7 > > + jmp NEAR $L$cbc_dec_tail_collected > > + > > +ALIGN 16 > > +$L$cbc_dec_seven: > > + movups xmm8,XMMWORD[96+rdi] > > + xorps xmm9,xmm9 > > + call 
_aesni_decrypt8 > > + movups xmm9,XMMWORD[80+rdi] > > + pxor xmm2,xmm10 > > + movups xmm10,XMMWORD[96+rdi] > > + pxor xmm3,xmm11 > > + movdqu XMMWORD[rsi],xmm2 > > + pxor xmm4,xmm12 > > + movdqu XMMWORD[16+rsi],xmm3 > > + pxor xmm3,xmm3 > > + pxor xmm5,xmm13 > > + movdqu XMMWORD[32+rsi],xmm4 > > + pxor xmm4,xmm4 > > + pxor xmm6,xmm14 > > + movdqu XMMWORD[48+rsi],xmm5 > > + pxor xmm5,xmm5 > > + pxor xmm7,xmm15 > > + movdqu XMMWORD[64+rsi],xmm6 > > + pxor xmm6,xmm6 > > + pxor xmm8,xmm9 > > + movdqu XMMWORD[80+rsi],xmm7 > > + pxor xmm7,xmm7 > > + lea rsi,[96+rsi] > > + movdqa xmm2,xmm8 > > + pxor xmm8,xmm8 > > + pxor xmm9,xmm9 > > + jmp NEAR $L$cbc_dec_tail_collected > > + > > +ALIGN 16 > > +$L$cbc_dec_loop6: > > + movups XMMWORD[rsi],xmm7 > > + lea rsi,[16+rsi] > > + movdqu xmm2,XMMWORD[rdi] > > + movdqu xmm3,XMMWORD[16+rdi] > > + movdqa xmm11,xmm2 > > + movdqu xmm4,XMMWORD[32+rdi] > > + movdqa xmm12,xmm3 > > + movdqu xmm5,XMMWORD[48+rdi] > > + movdqa xmm13,xmm4 > > + movdqu xmm6,XMMWORD[64+rdi] > > + movdqa xmm14,xmm5 > > + movdqu xmm7,XMMWORD[80+rdi] > > + movdqa xmm15,xmm6 > > +$L$cbc_dec_loop6_enter: > > + lea rdi,[96+rdi] > > + movdqa xmm8,xmm7 > > + > > + call _aesni_decrypt6 > > + > > + pxor xmm2,xmm10 > > + movdqa xmm10,xmm8 > > + pxor xmm3,xmm11 > > + movdqu XMMWORD[rsi],xmm2 > > + pxor xmm4,xmm12 > > + movdqu XMMWORD[16+rsi],xmm3 > > + pxor xmm5,xmm13 > > + movdqu XMMWORD[32+rsi],xmm4 > > + pxor xmm6,xmm14 > > + mov rcx,rbp > > + movdqu XMMWORD[48+rsi],xmm5 > > + pxor xmm7,xmm15 > > + mov eax,r10d > > + movdqu XMMWORD[64+rsi],xmm6 > > + lea rsi,[80+rsi] > > + sub rdx,0x60 > > + ja NEAR $L$cbc_dec_loop6 > > + > > + movdqa xmm2,xmm7 > > + add rdx,0x50 > > + jle NEAR $L$cbc_dec_clear_tail_collected > > + movups XMMWORD[rsi],xmm7 > > + lea rsi,[16+rsi] > > + > > +$L$cbc_dec_tail: > > + movups xmm2,XMMWORD[rdi] > > + sub rdx,0x10 > > + jbe NEAR $L$cbc_dec_one > > + > > + movups xmm3,XMMWORD[16+rdi] > > + movaps xmm11,xmm2 > > + sub rdx,0x10 > > + jbe NEAR $L$cbc_dec_two > > + > > + movups xmm4,XMMWORD[32+rdi] > > + movaps xmm12,xmm3 > > + sub rdx,0x10 > > + jbe NEAR $L$cbc_dec_three > > + > > + movups xmm5,XMMWORD[48+rdi] > > + movaps xmm13,xmm4 > > + sub rdx,0x10 > > + jbe NEAR $L$cbc_dec_four > > + > > + movups xmm6,XMMWORD[64+rdi] > > + movaps xmm14,xmm5 > > + movaps xmm15,xmm6 > > + xorps xmm7,xmm7 > > + call _aesni_decrypt6 > > + pxor xmm2,xmm10 > > + movaps xmm10,xmm15 > > + pxor xmm3,xmm11 > > + movdqu XMMWORD[rsi],xmm2 > > + pxor xmm4,xmm12 > > + movdqu XMMWORD[16+rsi],xmm3 > > + pxor xmm3,xmm3 > > + pxor xmm5,xmm13 > > + movdqu XMMWORD[32+rsi],xmm4 > > + pxor xmm4,xmm4 > > + pxor xmm6,xmm14 > > + movdqu XMMWORD[48+rsi],xmm5 > > + pxor xmm5,xmm5 > > + lea rsi,[64+rsi] > > + movdqa xmm2,xmm6 > > + pxor xmm6,xmm6 > > + pxor xmm7,xmm7 > > + sub rdx,0x10 > > + jmp NEAR $L$cbc_dec_tail_collected > > + > > +ALIGN 16 > > +$L$cbc_dec_one: > > + movaps xmm11,xmm2 > > + movups xmm0,XMMWORD[rcx] > > + movups xmm1,XMMWORD[16+rcx] > > + lea rcx,[32+rcx] > > + xorps xmm2,xmm0 > > +$L$oop_dec1_17: > > +DB 102,15,56,222,209 > > + dec eax > > + movups xmm1,XMMWORD[rcx] > > + lea rcx,[16+rcx] > > + jnz NEAR $L$oop_dec1_17 > > +DB 102,15,56,223,209 > > + xorps xmm2,xmm10 > > + movaps xmm10,xmm11 > > + jmp NEAR $L$cbc_dec_tail_collected > > +ALIGN 16 > > +$L$cbc_dec_two: > > + movaps xmm12,xmm3 > > + call _aesni_decrypt2 > > + pxor xmm2,xmm10 > > + movaps xmm10,xmm12 > > + pxor xmm3,xmm11 > > + movdqu XMMWORD[rsi],xmm2 > > + movdqa xmm2,xmm3 > > + pxor xmm3,xmm3 > > + lea rsi,[16+rsi] > > + jmp 
NEAR $L$cbc_dec_tail_collected > > +ALIGN 16 > > +$L$cbc_dec_three: > > + movaps xmm13,xmm4 > > + call _aesni_decrypt3 > > + pxor xmm2,xmm10 > > + movaps xmm10,xmm13 > > + pxor xmm3,xmm11 > > + movdqu XMMWORD[rsi],xmm2 > > + pxor xmm4,xmm12 > > + movdqu XMMWORD[16+rsi],xmm3 > > + pxor xmm3,xmm3 > > + movdqa xmm2,xmm4 > > + pxor xmm4,xmm4 > > + lea rsi,[32+rsi] > > + jmp NEAR $L$cbc_dec_tail_collected > > +ALIGN 16 > > +$L$cbc_dec_four: > > + movaps xmm14,xmm5 > > + call _aesni_decrypt4 > > + pxor xmm2,xmm10 > > + movaps xmm10,xmm14 > > + pxor xmm3,xmm11 > > + movdqu XMMWORD[rsi],xmm2 > > + pxor xmm4,xmm12 > > + movdqu XMMWORD[16+rsi],xmm3 > > + pxor xmm3,xmm3 > > + pxor xmm5,xmm13 > > + movdqu XMMWORD[32+rsi],xmm4 > > + pxor xmm4,xmm4 > > + movdqa xmm2,xmm5 > > + pxor xmm5,xmm5 > > + lea rsi,[48+rsi] > > + jmp NEAR $L$cbc_dec_tail_collected > > + > > +ALIGN 16 > > +$L$cbc_dec_clear_tail_collected: > > + pxor xmm3,xmm3 > > + pxor xmm4,xmm4 > > + pxor xmm5,xmm5 > > +$L$cbc_dec_tail_collected: > > + movups XMMWORD[r8],xmm10 > > + and rdx,15 > > + jnz NEAR $L$cbc_dec_tail_partial > > + movups XMMWORD[rsi],xmm2 > > + pxor xmm2,xmm2 > > + jmp NEAR $L$cbc_dec_ret > > +ALIGN 16 > > +$L$cbc_dec_tail_partial: > > + movaps XMMWORD[rsp],xmm2 > > + pxor xmm2,xmm2 > > + mov rcx,16 > > + mov rdi,rsi > > + sub rcx,rdx > > + lea rsi,[rsp] > > + DD 0x9066A4F3 > > + movdqa XMMWORD[rsp],xmm2 > > + > > +$L$cbc_dec_ret: > > + xorps xmm0,xmm0 > > + pxor xmm1,xmm1 > > + movaps xmm6,XMMWORD[16+rsp] > > + movaps XMMWORD[16+rsp],xmm0 > > + movaps xmm7,XMMWORD[32+rsp] > > + movaps XMMWORD[32+rsp],xmm0 > > + movaps xmm8,XMMWORD[48+rsp] > > + movaps XMMWORD[48+rsp],xmm0 > > + movaps xmm9,XMMWORD[64+rsp] > > + movaps XMMWORD[64+rsp],xmm0 > > + movaps xmm10,XMMWORD[80+rsp] > > + movaps XMMWORD[80+rsp],xmm0 > > + movaps xmm11,XMMWORD[96+rsp] > > + movaps XMMWORD[96+rsp],xmm0 > > + movaps xmm12,XMMWORD[112+rsp] > > + movaps XMMWORD[112+rsp],xmm0 > > + movaps xmm13,XMMWORD[128+rsp] > > + movaps XMMWORD[128+rsp],xmm0 > > + movaps xmm14,XMMWORD[144+rsp] > > + movaps XMMWORD[144+rsp],xmm0 > > + movaps xmm15,XMMWORD[160+rsp] > > + movaps XMMWORD[160+rsp],xmm0 > > + mov rbp,QWORD[((-8))+r11] > > + > > + lea rsp,[r11] > > + > > +$L$cbc_ret: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_aesni_cbc_encrypt: > > +global aesni_set_decrypt_key > > + > > +ALIGN 16 > > +aesni_set_decrypt_key: > > + > > +DB 0x48,0x83,0xEC,0x08 > > + > > + call __aesni_set_encrypt_key > > + shl edx,4 > > + test eax,eax > > + jnz NEAR $L$dec_key_ret > > + lea rcx,[16+rdx*1+r8] > > + > > + movups xmm0,XMMWORD[r8] > > + movups xmm1,XMMWORD[rcx] > > + movups XMMWORD[rcx],xmm0 > > + movups XMMWORD[r8],xmm1 > > + lea r8,[16+r8] > > + lea rcx,[((-16))+rcx] > > + > > +$L$dec_key_inverse: > > + movups xmm0,XMMWORD[r8] > > + movups xmm1,XMMWORD[rcx] > > +DB 102,15,56,219,192 > > +DB 102,15,56,219,201 > > + lea r8,[16+r8] > > + lea rcx,[((-16))+rcx] > > + movups XMMWORD[16+rcx],xmm0 > > + movups XMMWORD[(-16)+r8],xmm1 > > + cmp rcx,r8 > > + ja NEAR $L$dec_key_inverse > > + > > + movups xmm0,XMMWORD[r8] > > +DB 102,15,56,219,192 > > + pxor xmm1,xmm1 > > + movups XMMWORD[rcx],xmm0 > > + pxor xmm0,xmm0 > > +$L$dec_key_ret: > > + add rsp,8 > > + > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_set_decrypt_key: > > + > > +global aesni_set_encrypt_key > > + > > +ALIGN 16 > > +aesni_set_encrypt_key: > > +__aesni_set_encrypt_key: > > + > > +DB 0x48,0x83,0xEC,0x08 > > + > > + mov rax,-1 > > + test 
rcx,rcx > > + jz NEAR $L$enc_key_ret > > + test r8,r8 > > + jz NEAR $L$enc_key_ret > > + > > + mov r10d,268437504 > > + movups xmm0,XMMWORD[rcx] > > + xorps xmm4,xmm4 > > + and r10d,DWORD[((OPENSSL_ia32cap_P+4))] > > + lea rax,[16+r8] > > + cmp edx,256 > > + je NEAR $L$14rounds > > + cmp edx,192 > > + je NEAR $L$12rounds > > + cmp edx,128 > > + jne NEAR $L$bad_keybits > > + > > +$L$10rounds: > > + mov edx,9 > > + cmp r10d,268435456 > > + je NEAR $L$10rounds_alt > > + > > + movups XMMWORD[r8],xmm0 > > +DB 102,15,58,223,200,1 > > + call $L$key_expansion_128_cold > > +DB 102,15,58,223,200,2 > > + call $L$key_expansion_128 > > +DB 102,15,58,223,200,4 > > + call $L$key_expansion_128 > > +DB 102,15,58,223,200,8 > > + call $L$key_expansion_128 > > +DB 102,15,58,223,200,16 > > + call $L$key_expansion_128 > > +DB 102,15,58,223,200,32 > > + call $L$key_expansion_128 > > +DB 102,15,58,223,200,64 > > + call $L$key_expansion_128 > > +DB 102,15,58,223,200,128 > > + call $L$key_expansion_128 > > +DB 102,15,58,223,200,27 > > + call $L$key_expansion_128 > > +DB 102,15,58,223,200,54 > > + call $L$key_expansion_128 > > + movups XMMWORD[rax],xmm0 > > + mov DWORD[80+rax],edx > > + xor eax,eax > > + jmp NEAR $L$enc_key_ret > > + > > +ALIGN 16 > > +$L$10rounds_alt: > > + movdqa xmm5,XMMWORD[$L$key_rotate] > > + mov r10d,8 > > + movdqa xmm4,XMMWORD[$L$key_rcon1] > > + movdqa xmm2,xmm0 > > + movdqu XMMWORD[r8],xmm0 > > + jmp NEAR $L$oop_key128 > > + > > +ALIGN 16 > > +$L$oop_key128: > > +DB 102,15,56,0,197 > > +DB 102,15,56,221,196 > > + pslld xmm4,1 > > + lea rax,[16+rax] > > + > > + movdqa xmm3,xmm2 > > + pslldq xmm2,4 > > + pxor xmm3,xmm2 > > + pslldq xmm2,4 > > + pxor xmm3,xmm2 > > + pslldq xmm2,4 > > + pxor xmm2,xmm3 > > + > > + pxor xmm0,xmm2 > > + movdqu XMMWORD[(-16)+rax],xmm0 > > + movdqa xmm2,xmm0 > > + > > + dec r10d > > + jnz NEAR $L$oop_key128 > > + > > + movdqa xmm4,XMMWORD[$L$key_rcon1b] > > + > > +DB 102,15,56,0,197 > > +DB 102,15,56,221,196 > > + pslld xmm4,1 > > + > > + movdqa xmm3,xmm2 > > + pslldq xmm2,4 > > + pxor xmm3,xmm2 > > + pslldq xmm2,4 > > + pxor xmm3,xmm2 > > + pslldq xmm2,4 > > + pxor xmm2,xmm3 > > + > > + pxor xmm0,xmm2 > > + movdqu XMMWORD[rax],xmm0 > > + > > + movdqa xmm2,xmm0 > > +DB 102,15,56,0,197 > > +DB 102,15,56,221,196 > > + > > + movdqa xmm3,xmm2 > > + pslldq xmm2,4 > > + pxor xmm3,xmm2 > > + pslldq xmm2,4 > > + pxor xmm3,xmm2 > > + pslldq xmm2,4 > > + pxor xmm2,xmm3 > > + > > + pxor xmm0,xmm2 > > + movdqu XMMWORD[16+rax],xmm0 > > + > > + mov DWORD[96+rax],edx > > + xor eax,eax > > + jmp NEAR $L$enc_key_ret > > + > > +ALIGN 16 > > +$L$12rounds: > > + movq xmm2,QWORD[16+rcx] > > + mov edx,11 > > + cmp r10d,268435456 > > + je NEAR $L$12rounds_alt > > + > > + movups XMMWORD[r8],xmm0 > > +DB 102,15,58,223,202,1 > > + call $L$key_expansion_192a_cold > > +DB 102,15,58,223,202,2 > > + call $L$key_expansion_192b > > +DB 102,15,58,223,202,4 > > + call $L$key_expansion_192a > > +DB 102,15,58,223,202,8 > > + call $L$key_expansion_192b > > +DB 102,15,58,223,202,16 > > + call $L$key_expansion_192a > > +DB 102,15,58,223,202,32 > > + call $L$key_expansion_192b > > +DB 102,15,58,223,202,64 > > + call $L$key_expansion_192a > > +DB 102,15,58,223,202,128 > > + call $L$key_expansion_192b > > + movups XMMWORD[rax],xmm0 > > + mov DWORD[48+rax],edx > > + xor rax,rax > > + jmp NEAR $L$enc_key_ret > > + > > +ALIGN 16 > > +$L$12rounds_alt: > > + movdqa xmm5,XMMWORD[$L$key_rotate192] > > + movdqa xmm4,XMMWORD[$L$key_rcon1] > > + mov r10d,8 > > + movdqu XMMWORD[r8],xmm0 > > + jmp NEAR $L$oop_key192 > 
> + > > +ALIGN 16 > > +$L$oop_key192: > > + movq QWORD[rax],xmm2 > > + movdqa xmm1,xmm2 > > +DB 102,15,56,0,213 > > +DB 102,15,56,221,212 > > + pslld xmm4,1 > > + lea rax,[24+rax] > > + > > + movdqa xmm3,xmm0 > > + pslldq xmm0,4 > > + pxor xmm3,xmm0 > > + pslldq xmm0,4 > > + pxor xmm3,xmm0 > > + pslldq xmm0,4 > > + pxor xmm0,xmm3 > > + > > + pshufd xmm3,xmm0,0xff > > + pxor xmm3,xmm1 > > + pslldq xmm1,4 > > + pxor xmm3,xmm1 > > + > > + pxor xmm0,xmm2 > > + pxor xmm2,xmm3 > > + movdqu XMMWORD[(-16)+rax],xmm0 > > + > > + dec r10d > > + jnz NEAR $L$oop_key192 > > + > > + mov DWORD[32+rax],edx > > + xor eax,eax > > + jmp NEAR $L$enc_key_ret > > + > > +ALIGN 16 > > +$L$14rounds: > > + movups xmm2,XMMWORD[16+rcx] > > + mov edx,13 > > + lea rax,[16+rax] > > + cmp r10d,268435456 > > + je NEAR $L$14rounds_alt > > + > > + movups XMMWORD[r8],xmm0 > > + movups XMMWORD[16+r8],xmm2 > > +DB 102,15,58,223,202,1 > > + call $L$key_expansion_256a_cold > > +DB 102,15,58,223,200,1 > > + call $L$key_expansion_256b > > +DB 102,15,58,223,202,2 > > + call $L$key_expansion_256a > > +DB 102,15,58,223,200,2 > > + call $L$key_expansion_256b > > +DB 102,15,58,223,202,4 > > + call $L$key_expansion_256a > > +DB 102,15,58,223,200,4 > > + call $L$key_expansion_256b > > +DB 102,15,58,223,202,8 > > + call $L$key_expansion_256a > > +DB 102,15,58,223,200,8 > > + call $L$key_expansion_256b > > +DB 102,15,58,223,202,16 > > + call $L$key_expansion_256a > > +DB 102,15,58,223,200,16 > > + call $L$key_expansion_256b > > +DB 102,15,58,223,202,32 > > + call $L$key_expansion_256a > > +DB 102,15,58,223,200,32 > > + call $L$key_expansion_256b > > +DB 102,15,58,223,202,64 > > + call $L$key_expansion_256a > > + movups XMMWORD[rax],xmm0 > > + mov DWORD[16+rax],edx > > + xor rax,rax > > + jmp NEAR $L$enc_key_ret > > + > > +ALIGN 16 > > +$L$14rounds_alt: > > + movdqa xmm5,XMMWORD[$L$key_rotate] > > + movdqa xmm4,XMMWORD[$L$key_rcon1] > > + mov r10d,7 > > + movdqu XMMWORD[r8],xmm0 > > + movdqa xmm1,xmm2 > > + movdqu XMMWORD[16+r8],xmm2 > > + jmp NEAR $L$oop_key256 > > + > > +ALIGN 16 > > +$L$oop_key256: > > +DB 102,15,56,0,213 > > +DB 102,15,56,221,212 > > + > > + movdqa xmm3,xmm0 > > + pslldq xmm0,4 > > + pxor xmm3,xmm0 > > + pslldq xmm0,4 > > + pxor xmm3,xmm0 > > + pslldq xmm0,4 > > + pxor xmm0,xmm3 > > + pslld xmm4,1 > > + > > + pxor xmm0,xmm2 > > + movdqu XMMWORD[rax],xmm0 > > + > > + dec r10d > > + jz NEAR $L$done_key256 > > + > > + pshufd xmm2,xmm0,0xff > > + pxor xmm3,xmm3 > > +DB 102,15,56,221,211 > > + > > + movdqa xmm3,xmm1 > > + pslldq xmm1,4 > > + pxor xmm3,xmm1 > > + pslldq xmm1,4 > > + pxor xmm3,xmm1 > > + pslldq xmm1,4 > > + pxor xmm1,xmm3 > > + > > + pxor xmm2,xmm1 > > + movdqu XMMWORD[16+rax],xmm2 > > + lea rax,[32+rax] > > + movdqa xmm1,xmm2 > > + > > + jmp NEAR $L$oop_key256 > > + > > +$L$done_key256: > > + mov DWORD[16+rax],edx > > + xor eax,eax > > + jmp NEAR $L$enc_key_ret > > + > > +ALIGN 16 > > +$L$bad_keybits: > > + mov rax,-2 > > +$L$enc_key_ret: > > + pxor xmm0,xmm0 > > + pxor xmm1,xmm1 > > + pxor xmm2,xmm2 > > + pxor xmm3,xmm3 > > + pxor xmm4,xmm4 > > + pxor xmm5,xmm5 > > + add rsp,8 > > + > > + DB 0F3h,0C3h ;repret > > +$L$SEH_end_set_encrypt_key: > > + > > +ALIGN 16 > > +$L$key_expansion_128: > > + movups XMMWORD[rax],xmm0 > > + lea rax,[16+rax] > > +$L$key_expansion_128_cold: > > + shufps xmm4,xmm0,16 > > + xorps xmm0,xmm4 > > + shufps xmm4,xmm0,140 > > + xorps xmm0,xmm4 > > + shufps xmm1,xmm1,255 > > + xorps xmm0,xmm1 > > + DB 0F3h,0C3h ;repret > > + > > +ALIGN 16 > > +$L$key_expansion_192a: > > + movups 
XMMWORD[rax],xmm0 > > + lea rax,[16+rax] > > +$L$key_expansion_192a_cold: > > + movaps xmm5,xmm2 > > +$L$key_expansion_192b_warm: > > + shufps xmm4,xmm0,16 > > + movdqa xmm3,xmm2 > > + xorps xmm0,xmm4 > > + shufps xmm4,xmm0,140 > > + pslldq xmm3,4 > > + xorps xmm0,xmm4 > > + pshufd xmm1,xmm1,85 > > + pxor xmm2,xmm3 > > + pxor xmm0,xmm1 > > + pshufd xmm3,xmm0,255 > > + pxor xmm2,xmm3 > > + DB 0F3h,0C3h ;repret > > + > > +ALIGN 16 > > +$L$key_expansion_192b: > > + movaps xmm3,xmm0 > > + shufps xmm5,xmm0,68 > > + movups XMMWORD[rax],xmm5 > > + shufps xmm3,xmm2,78 > > + movups XMMWORD[16+rax],xmm3 > > + lea rax,[32+rax] > > + jmp NEAR $L$key_expansion_192b_warm > > + > > +ALIGN 16 > > +$L$key_expansion_256a: > > + movups XMMWORD[rax],xmm2 > > + lea rax,[16+rax] > > +$L$key_expansion_256a_cold: > > + shufps xmm4,xmm0,16 > > + xorps xmm0,xmm4 > > + shufps xmm4,xmm0,140 > > + xorps xmm0,xmm4 > > + shufps xmm1,xmm1,255 > > + xorps xmm0,xmm1 > > + DB 0F3h,0C3h ;repret > > + > > +ALIGN 16 > > +$L$key_expansion_256b: > > + movups XMMWORD[rax],xmm0 > > + lea rax,[16+rax] > > + > > + shufps xmm4,xmm2,16 > > + xorps xmm2,xmm4 > > + shufps xmm4,xmm2,140 > > + xorps xmm2,xmm4 > > + shufps xmm1,xmm1,170 > > + xorps xmm2,xmm1 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +ALIGN 64 > > +$L$bswap_mask: > > +DB 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0 > > +$L$increment32: > > + DD 6,6,6,0 > > +$L$increment64: > > + DD 1,0,0,0 > > +$L$xts_magic: > > + DD 0x87,0,1,0 > > +$L$increment1: > > +DB 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1 > > +$L$key_rotate: > > + DD 0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d > > +$L$key_rotate192: > > + DD 0x04070605,0x04070605,0x04070605,0x04070605 > > +$L$key_rcon1: > > + DD 1,1,1,1 > > +$L$key_rcon1b: > > + DD 0x1b,0x1b,0x1b,0x1b > > + > > +DB 65,69,83,32,102,111,114,32,73,110,116,101,108,32,65,69 > > +DB 83,45,78,73,44,32,67,82,89,80,84,79,71,65,77,83 > > +DB 32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115 > > +DB 115,108,46,111,114,103,62,0 > > +ALIGN 64 > > +EXTERN __imp_RtlVirtualUnwind > > + > > +ALIGN 16 > > +ecb_ccm64_se_handler: > > + push rsi > > + push rdi > > + push rbx > > + push rbp > > + push r12 > > + push r13 > > + push r14 > > + push r15 > > + pushfq > > + sub rsp,64 > > + > > + mov rax,QWORD[120+r8] > > + mov rbx,QWORD[248+r8] > > + > > + mov rsi,QWORD[8+r9] > > + mov r11,QWORD[56+r9] > > + > > + mov r10d,DWORD[r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jb NEAR $L$common_seh_tail > > + > > + mov rax,QWORD[152+r8] > > + > > + mov r10d,DWORD[4+r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jae NEAR $L$common_seh_tail > > + > > + lea rsi,[rax] > > + lea rdi,[512+r8] > > + mov ecx,8 > > + DD 0xa548f3fc > > + lea rax,[88+rax] > > + > > + jmp NEAR $L$common_seh_tail > > + > > + > > + > > +ALIGN 16 > > +ctr_xts_se_handler: > > + push rsi > > + push rdi > > + push rbx > > + push rbp > > + push r12 > > + push r13 > > + push r14 > > + push r15 > > + pushfq > > + sub rsp,64 > > + > > + mov rax,QWORD[120+r8] > > + mov rbx,QWORD[248+r8] > > + > > + mov rsi,QWORD[8+r9] > > + mov r11,QWORD[56+r9] > > + > > + mov r10d,DWORD[r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jb NEAR $L$common_seh_tail > > + > > + mov rax,QWORD[152+r8] > > + > > + mov r10d,DWORD[4+r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jae NEAR $L$common_seh_tail > > + > > + mov rax,QWORD[208+r8] > > + > > + lea rsi,[((-168))+rax] > > + lea rdi,[512+r8] > > + mov ecx,20 > > + DD 0xa548f3fc > > + > > + mov rbp,QWORD[((-8))+rax] > > + mov QWORD[160+r8],rbp > > + jmp NEAR 
$L$common_seh_tail > > + > > + > > + > > +ALIGN 16 > > +ocb_se_handler: > > + push rsi > > + push rdi > > + push rbx > > + push rbp > > + push r12 > > + push r13 > > + push r14 > > + push r15 > > + pushfq > > + sub rsp,64 > > + > > + mov rax,QWORD[120+r8] > > + mov rbx,QWORD[248+r8] > > + > > + mov rsi,QWORD[8+r9] > > + mov r11,QWORD[56+r9] > > + > > + mov r10d,DWORD[r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jb NEAR $L$common_seh_tail > > + > > + mov r10d,DWORD[4+r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jae NEAR $L$common_seh_tail > > + > > + mov r10d,DWORD[8+r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jae NEAR $L$ocb_no_xmm > > + > > + mov rax,QWORD[152+r8] > > + > > + lea rsi,[rax] > > + lea rdi,[512+r8] > > + mov ecx,20 > > + DD 0xa548f3fc > > + lea rax,[((160+40))+rax] > > + > > +$L$ocb_no_xmm: > > + mov rbx,QWORD[((-8))+rax] > > + mov rbp,QWORD[((-16))+rax] > > + mov r12,QWORD[((-24))+rax] > > + mov r13,QWORD[((-32))+rax] > > + mov r14,QWORD[((-40))+rax] > > + > > + mov QWORD[144+r8],rbx > > + mov QWORD[160+r8],rbp > > + mov QWORD[216+r8],r12 > > + mov QWORD[224+r8],r13 > > + mov QWORD[232+r8],r14 > > + > > + jmp NEAR $L$common_seh_tail > > + > > + > > +ALIGN 16 > > +cbc_se_handler: > > + push rsi > > + push rdi > > + push rbx > > + push rbp > > + push r12 > > + push r13 > > + push r14 > > + push r15 > > + pushfq > > + sub rsp,64 > > + > > + mov rax,QWORD[152+r8] > > + mov rbx,QWORD[248+r8] > > + > > + lea r10,[$L$cbc_decrypt_bulk] > > + cmp rbx,r10 > > + jb NEAR $L$common_seh_tail > > + > > + mov rax,QWORD[120+r8] > > + > > + lea r10,[$L$cbc_decrypt_body] > > + cmp rbx,r10 > > + jb NEAR $L$common_seh_tail > > + > > + mov rax,QWORD[152+r8] > > + > > + lea r10,[$L$cbc_ret] > > + cmp rbx,r10 > > + jae NEAR $L$common_seh_tail > > + > > + lea rsi,[16+rax] > > + lea rdi,[512+r8] > > + mov ecx,20 > > + DD 0xa548f3fc > > + > > + mov rax,QWORD[208+r8] > > + > > + mov rbp,QWORD[((-8))+rax] > > + mov QWORD[160+r8],rbp > > + > > +$L$common_seh_tail: > > + mov rdi,QWORD[8+rax] > > + mov rsi,QWORD[16+rax] > > + mov QWORD[152+r8],rax > > + mov QWORD[168+r8],rsi > > + mov QWORD[176+r8],rdi > > + > > + mov rdi,QWORD[40+r9] > > + mov rsi,r8 > > + mov ecx,154 > > + DD 0xa548f3fc > > + > > + mov rsi,r9 > > + xor rcx,rcx > > + mov rdx,QWORD[8+rsi] > > + mov r8,QWORD[rsi] > > + mov r9,QWORD[16+rsi] > > + mov r10,QWORD[40+rsi] > > + lea r11,[56+rsi] > > + lea r12,[24+rsi] > > + mov QWORD[32+rsp],r10 > > + mov QWORD[40+rsp],r11 > > + mov QWORD[48+rsp],r12 > > + mov QWORD[56+rsp],rcx > > + call QWORD[__imp_RtlVirtualUnwind] > > + > > + mov eax,1 > > + add rsp,64 > > + popfq > > + pop r15 > > + pop r14 > > + pop r13 > > + pop r12 > > + pop rbp > > + pop rbx > > + pop rdi > > + pop rsi > > + DB 0F3h,0C3h ;repret > > + > > + > > +section .pdata rdata align=4 > > +ALIGN 4 > > + DD $L$SEH_begin_aesni_ecb_encrypt wrt ..imagebase > > + DD $L$SEH_end_aesni_ecb_encrypt wrt ..imagebase > > + DD $L$SEH_info_ecb wrt ..imagebase > > + > > + DD $L$SEH_begin_aesni_ccm64_encrypt_blocks > wrt ..imagebase > > + DD $L$SEH_end_aesni_ccm64_encrypt_blocks > wrt ..imagebase > > + DD $L$SEH_info_ccm64_enc wrt ..imagebase > > + > > + DD $L$SEH_begin_aesni_ccm64_decrypt_blocks > wrt ..imagebase > > + DD $L$SEH_end_aesni_ccm64_decrypt_blocks > wrt ..imagebase > > + DD $L$SEH_info_ccm64_dec wrt ..imagebase > > + > > + DD $L$SEH_begin_aesni_ctr32_encrypt_blocks > wrt ..imagebase > > + DD $L$SEH_end_aesni_ctr32_encrypt_blocks > wrt ..imagebase > > + DD $L$SEH_info_ctr32 wrt ..imagebase > > + > > + DD 
$L$SEH_begin_aesni_xts_encrypt wrt ..imagebase > > + DD $L$SEH_end_aesni_xts_encrypt wrt ..imagebase > > + DD $L$SEH_info_xts_enc wrt ..imagebase > > + > > + DD $L$SEH_begin_aesni_xts_decrypt wrt ..imagebase > > + DD $L$SEH_end_aesni_xts_decrypt wrt ..imagebase > > + DD $L$SEH_info_xts_dec wrt ..imagebase > > + > > + DD $L$SEH_begin_aesni_ocb_encrypt wrt ..imagebase > > + DD $L$SEH_end_aesni_ocb_encrypt wrt ..imagebase > > + DD $L$SEH_info_ocb_enc wrt ..imagebase > > + > > + DD $L$SEH_begin_aesni_ocb_decrypt wrt ..imagebase > > + DD $L$SEH_end_aesni_ocb_decrypt wrt ..imagebase > > + DD $L$SEH_info_ocb_dec wrt ..imagebase > > + DD $L$SEH_begin_aesni_cbc_encrypt wrt ..imagebase > > + DD $L$SEH_end_aesni_cbc_encrypt wrt ..imagebase > > + DD $L$SEH_info_cbc wrt ..imagebase > > + > > + DD aesni_set_decrypt_key wrt ..imagebase > > + DD $L$SEH_end_set_decrypt_key wrt ..imagebase > > + DD $L$SEH_info_key wrt ..imagebase > > + > > + DD aesni_set_encrypt_key wrt ..imagebase > > + DD $L$SEH_end_set_encrypt_key wrt ..imagebase > > + DD $L$SEH_info_key wrt ..imagebase > > +section .xdata rdata align=8 > > +ALIGN 8 > > +$L$SEH_info_ecb: > > +DB 9,0,0,0 > > + DD ecb_ccm64_se_handler wrt ..imagebase > > + DD $L$ecb_enc_body wrt ..imagebase,$L$ecb_enc_ret > wrt ..imagebase > > +$L$SEH_info_ccm64_enc: > > +DB 9,0,0,0 > > + DD ecb_ccm64_se_handler wrt ..imagebase > > + DD $L$ccm64_enc_body > wrt ..imagebase,$L$ccm64_enc_ret > > wrt ..imagebase > > +$L$SEH_info_ccm64_dec: > > +DB 9,0,0,0 > > + DD ecb_ccm64_se_handler wrt ..imagebase > > + DD $L$ccm64_dec_body > wrt ..imagebase,$L$ccm64_dec_ret > > wrt ..imagebase > > +$L$SEH_info_ctr32: > > +DB 9,0,0,0 > > + DD ctr_xts_se_handler wrt ..imagebase > > + DD $L$ctr32_body wrt ..imagebase,$L$ctr32_epilogue > wrt ..imagebase > > +$L$SEH_info_xts_enc: > > +DB 9,0,0,0 > > + DD ctr_xts_se_handler wrt ..imagebase > > + DD $L$xts_enc_body > wrt ..imagebase,$L$xts_enc_epilogue > > wrt ..imagebase > > +$L$SEH_info_xts_dec: > > +DB 9,0,0,0 > > + DD ctr_xts_se_handler wrt ..imagebase > > + DD $L$xts_dec_body > wrt ..imagebase,$L$xts_dec_epilogue > > wrt ..imagebase > > +$L$SEH_info_ocb_enc: > > +DB 9,0,0,0 > > + DD ocb_se_handler wrt ..imagebase > > + DD $L$ocb_enc_body > wrt ..imagebase,$L$ocb_enc_epilogue > > wrt ..imagebase > > + DD $L$ocb_enc_pop wrt ..imagebase > > + DD 0 > > +$L$SEH_info_ocb_dec: > > +DB 9,0,0,0 > > + DD ocb_se_handler wrt ..imagebase > > + DD $L$ocb_dec_body > wrt ..imagebase,$L$ocb_dec_epilogue > > wrt ..imagebase > > + DD $L$ocb_dec_pop wrt ..imagebase > > + DD 0 > > +$L$SEH_info_cbc: > > +DB 9,0,0,0 > > + DD cbc_se_handler wrt ..imagebase > > +$L$SEH_info_key: > > +DB 0x01,0x04,0x01,0x00 > > +DB 0x04,0x02,0x00,0x00 > > diff --git > a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm > > b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm > > new file mode 100644 > > index 0000000000..1c911fa294 > > --- /dev/null > > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm > > @@ -0,0 +1,1173 @@ > > +; WARNING: do not edit! > > +; Generated from openssl/crypto/aes/asm/vpaes-x86_64.pl > > +; > > +; Copyright 2011-2020 The OpenSSL Project Authors. All Rights Reserved. > > +; > > +; Licensed under the OpenSSL license (the "License"). You may not use > > +; this file except in compliance with the License. 
You can obtain a copy > > +; in the file LICENSE in the source distribution or at > > +; https://www.openssl.org/source/license.html > > + > > +default rel > > +%define XMMWORD > > +%define YMMWORD > > +%define ZMMWORD > > +section .text code align=64 > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > +ALIGN 16 > > +_vpaes_encrypt_core: > > + > > + mov r9,rdx > > + mov r11,16 > > + mov eax,DWORD[240+rdx] > > + movdqa xmm1,xmm9 > > + movdqa xmm2,XMMWORD[$L$k_ipt] > > + pandn xmm1,xmm0 > > + movdqu xmm5,XMMWORD[r9] > > + psrld xmm1,4 > > + pand xmm0,xmm9 > > +DB 102,15,56,0,208 > > + movdqa xmm0,XMMWORD[(($L$k_ipt+16))] > > +DB 102,15,56,0,193 > > + pxor xmm2,xmm5 > > + add r9,16 > > + pxor xmm0,xmm2 > > + lea r10,[$L$k_mc_backward] > > + jmp NEAR $L$enc_entry > > + > > +ALIGN 16 > > +$L$enc_loop: > > + > > + movdqa xmm4,xmm13 > > + movdqa xmm0,xmm12 > > +DB 102,15,56,0,226 > > +DB 102,15,56,0,195 > > + pxor xmm4,xmm5 > > + movdqa xmm5,xmm15 > > + pxor xmm0,xmm4 > > + movdqa xmm1,XMMWORD[((-64))+r10*1+r11] > > +DB 102,15,56,0,234 > > + movdqa xmm4,XMMWORD[r10*1+r11] > > + movdqa xmm2,xmm14 > > +DB 102,15,56,0,211 > > + movdqa xmm3,xmm0 > > + pxor xmm2,xmm5 > > +DB 102,15,56,0,193 > > + add r9,16 > > + pxor xmm0,xmm2 > > +DB 102,15,56,0,220 > > + add r11,16 > > + pxor xmm3,xmm0 > > +DB 102,15,56,0,193 > > + and r11,0x30 > > + sub rax,1 > > + pxor xmm0,xmm3 > > + > > +$L$enc_entry: > > + > > + movdqa xmm1,xmm9 > > + movdqa xmm5,xmm11 > > + pandn xmm1,xmm0 > > + psrld xmm1,4 > > + pand xmm0,xmm9 > > +DB 102,15,56,0,232 > > + movdqa xmm3,xmm10 > > + pxor xmm0,xmm1 > > +DB 102,15,56,0,217 > > + movdqa xmm4,xmm10 > > + pxor xmm3,xmm5 > > +DB 102,15,56,0,224 > > + movdqa xmm2,xmm10 > > + pxor xmm4,xmm5 > > +DB 102,15,56,0,211 > > + movdqa xmm3,xmm10 > > + pxor xmm2,xmm0 > > +DB 102,15,56,0,220 > > + movdqu xmm5,XMMWORD[r9] > > + pxor xmm3,xmm1 > > + jnz NEAR $L$enc_loop > > + > > + > > + movdqa xmm4,XMMWORD[((-96))+r10] > > + movdqa xmm0,XMMWORD[((-80))+r10] > > +DB 102,15,56,0,226 > > + pxor xmm4,xmm5 > > +DB 102,15,56,0,195 > > + movdqa xmm1,XMMWORD[64+r10*1+r11] > > + pxor xmm0,xmm4 > > +DB 102,15,56,0,193 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > + > > + > > + > > + > > + > > + > > +ALIGN 16 > > +_vpaes_decrypt_core: > > + > > + mov r9,rdx > > + mov eax,DWORD[240+rdx] > > + movdqa xmm1,xmm9 > > + movdqa xmm2,XMMWORD[$L$k_dipt] > > + pandn xmm1,xmm0 > > + mov r11,rax > > + psrld xmm1,4 > > + movdqu xmm5,XMMWORD[r9] > > + shl r11,4 > > + pand xmm0,xmm9 > > +DB 102,15,56,0,208 > > + movdqa xmm0,XMMWORD[(($L$k_dipt+16))] > > + xor r11,0x30 > > + lea r10,[$L$k_dsbd] > > +DB 102,15,56,0,193 > > + and r11,0x30 > > + pxor xmm2,xmm5 > > + movdqa xmm5,XMMWORD[(($L$k_mc_forward+48))] > > + pxor xmm0,xmm2 > > + add r9,16 > > + add r11,r10 > > + jmp NEAR $L$dec_entry > > + > > +ALIGN 16 > > +$L$dec_loop: > > + > > + > > + > > + movdqa xmm4,XMMWORD[((-32))+r10] > > + movdqa xmm1,XMMWORD[((-16))+r10] > > +DB 102,15,56,0,226 > > +DB 102,15,56,0,203 > > + pxor xmm0,xmm4 > > + movdqa xmm4,XMMWORD[r10] > > + pxor xmm0,xmm1 > > + movdqa xmm1,XMMWORD[16+r10] > > + > > +DB 102,15,56,0,226 > > +DB 102,15,56,0,197 > > +DB 102,15,56,0,203 > > + pxor xmm0,xmm4 > > + movdqa xmm4,XMMWORD[32+r10] > > + pxor xmm0,xmm1 > > + movdqa xmm1,XMMWORD[48+r10] > > + > > +DB 102,15,56,0,226 > > +DB 102,15,56,0,197 > > +DB 102,15,56,0,203 > > + pxor xmm0,xmm4 > > + movdqa xmm4,XMMWORD[64+r10] > > + pxor xmm0,xmm1 > > + movdqa xmm1,XMMWORD[80+r10] > > + > > +DB 
102,15,56,0,226 > > +DB 102,15,56,0,197 > > +DB 102,15,56,0,203 > > + pxor xmm0,xmm4 > > + add r9,16 > > +DB 102,15,58,15,237,12 > > + pxor xmm0,xmm1 > > + sub rax,1 > > + > > +$L$dec_entry: > > + > > + movdqa xmm1,xmm9 > > + pandn xmm1,xmm0 > > + movdqa xmm2,xmm11 > > + psrld xmm1,4 > > + pand xmm0,xmm9 > > +DB 102,15,56,0,208 > > + movdqa xmm3,xmm10 > > + pxor xmm0,xmm1 > > +DB 102,15,56,0,217 > > + movdqa xmm4,xmm10 > > + pxor xmm3,xmm2 > > +DB 102,15,56,0,224 > > + pxor xmm4,xmm2 > > + movdqa xmm2,xmm10 > > +DB 102,15,56,0,211 > > + movdqa xmm3,xmm10 > > + pxor xmm2,xmm0 > > +DB 102,15,56,0,220 > > + movdqu xmm0,XMMWORD[r9] > > + pxor xmm3,xmm1 > > + jnz NEAR $L$dec_loop > > + > > + > > + movdqa xmm4,XMMWORD[96+r10] > > +DB 102,15,56,0,226 > > + pxor xmm4,xmm0 > > + movdqa xmm0,XMMWORD[112+r10] > > + movdqa xmm2,XMMWORD[((-352))+r11] > > +DB 102,15,56,0,195 > > + pxor xmm0,xmm4 > > +DB 102,15,56,0,194 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > + > > + > > + > > + > > + > > + > > +ALIGN 16 > > +_vpaes_schedule_core: > > + > > + > > + > > + > > + > > + > > + call _vpaes_preheat > > + movdqa xmm8,XMMWORD[$L$k_rcon] > > + movdqu xmm0,XMMWORD[rdi] > > + > > + > > + movdqa xmm3,xmm0 > > + lea r11,[$L$k_ipt] > > + call _vpaes_schedule_transform > > + movdqa xmm7,xmm0 > > + > > + lea r10,[$L$k_sr] > > + test rcx,rcx > > + jnz NEAR $L$schedule_am_decrypting > > + > > + > > + movdqu XMMWORD[rdx],xmm0 > > + jmp NEAR $L$schedule_go > > + > > +$L$schedule_am_decrypting: > > + > > + movdqa xmm1,XMMWORD[r10*1+r8] > > +DB 102,15,56,0,217 > > + movdqu XMMWORD[rdx],xmm3 > > + xor r8,0x30 > > + > > +$L$schedule_go: > > + cmp esi,192 > > + ja NEAR $L$schedule_256 > > + je NEAR $L$schedule_192 > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > +$L$schedule_128: > > + mov esi,10 > > + > > +$L$oop_schedule_128: > > + call _vpaes_schedule_round > > + dec rsi > > + jz NEAR $L$schedule_mangle_last > > + call _vpaes_schedule_mangle > > + jmp NEAR $L$oop_schedule_128 > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > +ALIGN 16 > > +$L$schedule_192: > > + movdqu xmm0,XMMWORD[8+rdi] > > + call _vpaes_schedule_transform > > + movdqa xmm6,xmm0 > > + pxor xmm4,xmm4 > > + movhlps xmm6,xmm4 > > + mov esi,4 > > + > > +$L$oop_schedule_192: > > + call _vpaes_schedule_round > > +DB 102,15,58,15,198,8 > > + call _vpaes_schedule_mangle > > + call _vpaes_schedule_192_smear > > + call _vpaes_schedule_mangle > > + call _vpaes_schedule_round > > + dec rsi > > + jz NEAR $L$schedule_mangle_last > > + call _vpaes_schedule_mangle > > + call _vpaes_schedule_192_smear > > + jmp NEAR $L$oop_schedule_192 > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > +ALIGN 16 > > +$L$schedule_256: > > + movdqu xmm0,XMMWORD[16+rdi] > > + call _vpaes_schedule_transform > > + mov esi,7 > > + > > +$L$oop_schedule_256: > > + call _vpaes_schedule_mangle > > + movdqa xmm6,xmm0 > > + > > + > > + call _vpaes_schedule_round > > + dec rsi > > + jz NEAR $L$schedule_mangle_last > > + call _vpaes_schedule_mangle > > + > > + > > + pshufd xmm0,xmm0,0xFF > > + movdqa xmm5,xmm7 > > + movdqa xmm7,xmm6 > > + call _vpaes_schedule_low_round > > + movdqa xmm7,xmm5 > > + > > + jmp NEAR $L$oop_schedule_256 > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > +ALIGN 16 > > +$L$schedule_mangle_last: > > + > > + lea r11,[$L$k_deskew] > > + test rcx,rcx > > + jnz NEAR $L$schedule_mangle_last_dec > > + > > + > > + movdqa xmm1,XMMWORD[r10*1+r8] > > +DB 102,15,56,0,193 > > 
+ lea r11,[$L$k_opt] > > + add rdx,32 > > + > > +$L$schedule_mangle_last_dec: > > + add rdx,-16 > > + pxor xmm0,XMMWORD[$L$k_s63] > > + call _vpaes_schedule_transform > > + movdqu XMMWORD[rdx],xmm0 > > + > > + > > + pxor xmm0,xmm0 > > + pxor xmm1,xmm1 > > + pxor xmm2,xmm2 > > + pxor xmm3,xmm3 > > + pxor xmm4,xmm4 > > + pxor xmm5,xmm5 > > + pxor xmm6,xmm6 > > + pxor xmm7,xmm7 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > +ALIGN 16 > > +_vpaes_schedule_192_smear: > > + > > + pshufd xmm1,xmm6,0x80 > > + pshufd xmm0,xmm7,0xFE > > + pxor xmm6,xmm1 > > + pxor xmm1,xmm1 > > + pxor xmm6,xmm0 > > + movdqa xmm0,xmm6 > > + movhlps xmm6,xmm1 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > +ALIGN 16 > > +_vpaes_schedule_round: > > + > > + > > + pxor xmm1,xmm1 > > +DB 102,65,15,58,15,200,15 > > +DB 102,69,15,58,15,192,15 > > + pxor xmm7,xmm1 > > + > > + > > + pshufd xmm0,xmm0,0xFF > > +DB 102,15,58,15,192,1 > > + > > + > > + > > + > > +_vpaes_schedule_low_round: > > + > > + movdqa xmm1,xmm7 > > + pslldq xmm7,4 > > + pxor xmm7,xmm1 > > + movdqa xmm1,xmm7 > > + pslldq xmm7,8 > > + pxor xmm7,xmm1 > > + pxor xmm7,XMMWORD[$L$k_s63] > > + > > + > > + movdqa xmm1,xmm9 > > + pandn xmm1,xmm0 > > + psrld xmm1,4 > > + pand xmm0,xmm9 > > + movdqa xmm2,xmm11 > > +DB 102,15,56,0,208 > > + pxor xmm0,xmm1 > > + movdqa xmm3,xmm10 > > +DB 102,15,56,0,217 > > + pxor xmm3,xmm2 > > + movdqa xmm4,xmm10 > > +DB 102,15,56,0,224 > > + pxor xmm4,xmm2 > > + movdqa xmm2,xmm10 > > +DB 102,15,56,0,211 > > + pxor xmm2,xmm0 > > + movdqa xmm3,xmm10 > > +DB 102,15,56,0,220 > > + pxor xmm3,xmm1 > > + movdqa xmm4,xmm13 > > +DB 102,15,56,0,226 > > + movdqa xmm0,xmm12 > > +DB 102,15,56,0,195 > > + pxor xmm0,xmm4 > > + > > + > > + pxor xmm0,xmm7 > > + movdqa xmm7,xmm0 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > +ALIGN 16 > > +_vpaes_schedule_transform: > > + > > + movdqa xmm1,xmm9 > > + pandn xmm1,xmm0 > > + psrld xmm1,4 > > + pand xmm0,xmm9 > > + movdqa xmm2,XMMWORD[r11] > > +DB 102,15,56,0,208 > > + movdqa xmm0,XMMWORD[16+r11] > > +DB 102,15,56,0,193 > > + pxor xmm0,xmm2 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > +ALIGN 16 > > +_vpaes_schedule_mangle: > > + > > + movdqa xmm4,xmm0 > > + movdqa xmm5,XMMWORD[$L$k_mc_forward] > > + test rcx,rcx > > + jnz NEAR $L$schedule_mangle_dec > > + > > + > > + add rdx,16 > > + pxor xmm4,XMMWORD[$L$k_s63] > > +DB 102,15,56,0,229 > > + movdqa xmm3,xmm4 > > +DB 102,15,56,0,229 > > + pxor xmm3,xmm4 > > +DB 102,15,56,0,229 > > + pxor xmm3,xmm4 > > + > > + jmp NEAR $L$schedule_mangle_both > > +ALIGN 16 > > +$L$schedule_mangle_dec: > > + > > + lea r11,[$L$k_dksd] > > + movdqa xmm1,xmm9 > > + pandn xmm1,xmm4 > > + psrld xmm1,4 > > + pand xmm4,xmm9 > > + > > + movdqa xmm2,XMMWORD[r11] > > +DB 102,15,56,0,212 > > + movdqa xmm3,XMMWORD[16+r11] > > +DB 102,15,56,0,217 > > + pxor xmm3,xmm2 > > +DB 102,15,56,0,221 > > + > > + movdqa xmm2,XMMWORD[32+r11] > > +DB 102,15,56,0,212 > > + pxor xmm2,xmm3 > > + movdqa xmm3,XMMWORD[48+r11] > > +DB 102,15,56,0,217 > > + pxor xmm3,xmm2 > > +DB 102,15,56,0,221 > > + > > + movdqa xmm2,XMMWORD[64+r11] > > +DB 102,15,56,0,212 > > + pxor xmm2,xmm3 > > + movdqa 
xmm3,XMMWORD[80+r11] > > +DB 102,15,56,0,217 > > + pxor xmm3,xmm2 > > +DB 102,15,56,0,221 > > + > > + movdqa xmm2,XMMWORD[96+r11] > > +DB 102,15,56,0,212 > > + pxor xmm2,xmm3 > > + movdqa xmm3,XMMWORD[112+r11] > > +DB 102,15,56,0,217 > > + pxor xmm3,xmm2 > > + > > + add rdx,-16 > > + > > +$L$schedule_mangle_both: > > + movdqa xmm1,XMMWORD[r10*1+r8] > > +DB 102,15,56,0,217 > > + add r8,-16 > > + and r8,0x30 > > + movdqu XMMWORD[rdx],xmm3 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > + > > + > > + > > +global vpaes_set_encrypt_key > > + > > +ALIGN 16 > > +vpaes_set_encrypt_key: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_vpaes_set_encrypt_key: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + > > + > > + > > + lea rsp,[((-184))+rsp] > > + movaps XMMWORD[16+rsp],xmm6 > > + movaps XMMWORD[32+rsp],xmm7 > > + movaps XMMWORD[48+rsp],xmm8 > > + movaps XMMWORD[64+rsp],xmm9 > > + movaps XMMWORD[80+rsp],xmm10 > > + movaps XMMWORD[96+rsp],xmm11 > > + movaps XMMWORD[112+rsp],xmm12 > > + movaps XMMWORD[128+rsp],xmm13 > > + movaps XMMWORD[144+rsp],xmm14 > > + movaps XMMWORD[160+rsp],xmm15 > > +$L$enc_key_body: > > + mov eax,esi > > + shr eax,5 > > + add eax,5 > > + mov DWORD[240+rdx],eax > > + > > + mov ecx,0 > > + mov r8d,0x30 > > + call _vpaes_schedule_core > > + movaps xmm6,XMMWORD[16+rsp] > > + movaps xmm7,XMMWORD[32+rsp] > > + movaps xmm8,XMMWORD[48+rsp] > > + movaps xmm9,XMMWORD[64+rsp] > > + movaps xmm10,XMMWORD[80+rsp] > > + movaps xmm11,XMMWORD[96+rsp] > > + movaps xmm12,XMMWORD[112+rsp] > > + movaps xmm13,XMMWORD[128+rsp] > > + movaps xmm14,XMMWORD[144+rsp] > > + movaps xmm15,XMMWORD[160+rsp] > > + lea rsp,[184+rsp] > > +$L$enc_key_epilogue: > > + xor eax,eax > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_vpaes_set_encrypt_key: > > + > > +global vpaes_set_decrypt_key > > + > > +ALIGN 16 > > +vpaes_set_decrypt_key: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_vpaes_set_decrypt_key: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + > > + > > + > > + lea rsp,[((-184))+rsp] > > + movaps XMMWORD[16+rsp],xmm6 > > + movaps XMMWORD[32+rsp],xmm7 > > + movaps XMMWORD[48+rsp],xmm8 > > + movaps XMMWORD[64+rsp],xmm9 > > + movaps XMMWORD[80+rsp],xmm10 > > + movaps XMMWORD[96+rsp],xmm11 > > + movaps XMMWORD[112+rsp],xmm12 > > + movaps XMMWORD[128+rsp],xmm13 > > + movaps XMMWORD[144+rsp],xmm14 > > + movaps XMMWORD[160+rsp],xmm15 > > +$L$dec_key_body: > > + mov eax,esi > > + shr eax,5 > > + add eax,5 > > + mov DWORD[240+rdx],eax > > + shl eax,4 > > + lea rdx,[16+rax*1+rdx] > > + > > + mov ecx,1 > > + mov r8d,esi > > + shr r8d,1 > > + and r8d,32 > > + xor r8d,32 > > + call _vpaes_schedule_core > > + movaps xmm6,XMMWORD[16+rsp] > > + movaps xmm7,XMMWORD[32+rsp] > > + movaps xmm8,XMMWORD[48+rsp] > > + movaps xmm9,XMMWORD[64+rsp] > > + movaps xmm10,XMMWORD[80+rsp] > > + movaps xmm11,XMMWORD[96+rsp] > > + movaps xmm12,XMMWORD[112+rsp] > > + movaps xmm13,XMMWORD[128+rsp] > > + movaps xmm14,XMMWORD[144+rsp] > > + movaps xmm15,XMMWORD[160+rsp] > > + lea rsp,[184+rsp] > > +$L$dec_key_epilogue: > > + xor eax,eax > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_vpaes_set_decrypt_key: > > + > > +global vpaes_encrypt > > + > > +ALIGN 16 > > +vpaes_encrypt: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov 
rax,rsp > > +$L$SEH_begin_vpaes_encrypt: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + > > + > > + > > + lea rsp,[((-184))+rsp] > > + movaps XMMWORD[16+rsp],xmm6 > > + movaps XMMWORD[32+rsp],xmm7 > > + movaps XMMWORD[48+rsp],xmm8 > > + movaps XMMWORD[64+rsp],xmm9 > > + movaps XMMWORD[80+rsp],xmm10 > > + movaps XMMWORD[96+rsp],xmm11 > > + movaps XMMWORD[112+rsp],xmm12 > > + movaps XMMWORD[128+rsp],xmm13 > > + movaps XMMWORD[144+rsp],xmm14 > > + movaps XMMWORD[160+rsp],xmm15 > > +$L$enc_body: > > + movdqu xmm0,XMMWORD[rdi] > > + call _vpaes_preheat > > + call _vpaes_encrypt_core > > + movdqu XMMWORD[rsi],xmm0 > > + movaps xmm6,XMMWORD[16+rsp] > > + movaps xmm7,XMMWORD[32+rsp] > > + movaps xmm8,XMMWORD[48+rsp] > > + movaps xmm9,XMMWORD[64+rsp] > > + movaps xmm10,XMMWORD[80+rsp] > > + movaps xmm11,XMMWORD[96+rsp] > > + movaps xmm12,XMMWORD[112+rsp] > > + movaps xmm13,XMMWORD[128+rsp] > > + movaps xmm14,XMMWORD[144+rsp] > > + movaps xmm15,XMMWORD[160+rsp] > > + lea rsp,[184+rsp] > > +$L$enc_epilogue: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_vpaes_encrypt: > > + > > +global vpaes_decrypt > > + > > +ALIGN 16 > > +vpaes_decrypt: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_vpaes_decrypt: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + > > + > > + > > + lea rsp,[((-184))+rsp] > > + movaps XMMWORD[16+rsp],xmm6 > > + movaps XMMWORD[32+rsp],xmm7 > > + movaps XMMWORD[48+rsp],xmm8 > > + movaps XMMWORD[64+rsp],xmm9 > > + movaps XMMWORD[80+rsp],xmm10 > > + movaps XMMWORD[96+rsp],xmm11 > > + movaps XMMWORD[112+rsp],xmm12 > > + movaps XMMWORD[128+rsp],xmm13 > > + movaps XMMWORD[144+rsp],xmm14 > > + movaps XMMWORD[160+rsp],xmm15 > > +$L$dec_body: > > + movdqu xmm0,XMMWORD[rdi] > > + call _vpaes_preheat > > + call _vpaes_decrypt_core > > + movdqu XMMWORD[rsi],xmm0 > > + movaps xmm6,XMMWORD[16+rsp] > > + movaps xmm7,XMMWORD[32+rsp] > > + movaps xmm8,XMMWORD[48+rsp] > > + movaps xmm9,XMMWORD[64+rsp] > > + movaps xmm10,XMMWORD[80+rsp] > > + movaps xmm11,XMMWORD[96+rsp] > > + movaps xmm12,XMMWORD[112+rsp] > > + movaps xmm13,XMMWORD[128+rsp] > > + movaps xmm14,XMMWORD[144+rsp] > > + movaps xmm15,XMMWORD[160+rsp] > > + lea rsp,[184+rsp] > > +$L$dec_epilogue: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_vpaes_decrypt: > > +global vpaes_cbc_encrypt > > + > > +ALIGN 16 > > +vpaes_cbc_encrypt: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_vpaes_cbc_encrypt: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + mov rcx,r9 > > + mov r8,QWORD[40+rsp] > > + mov r9,QWORD[48+rsp] > > + > > + > > + > > + xchg rdx,rcx > > + sub rcx,16 > > + jc NEAR $L$cbc_abort > > + lea rsp,[((-184))+rsp] > > + movaps XMMWORD[16+rsp],xmm6 > > + movaps XMMWORD[32+rsp],xmm7 > > + movaps XMMWORD[48+rsp],xmm8 > > + movaps XMMWORD[64+rsp],xmm9 > > + movaps XMMWORD[80+rsp],xmm10 > > + movaps XMMWORD[96+rsp],xmm11 > > + movaps XMMWORD[112+rsp],xmm12 > > + movaps XMMWORD[128+rsp],xmm13 > > + movaps XMMWORD[144+rsp],xmm14 > > + movaps XMMWORD[160+rsp],xmm15 > > +$L$cbc_body: > > + movdqu xmm6,XMMWORD[r8] > > + sub rsi,rdi > > + call _vpaes_preheat > > + cmp r9d,0 > > + je NEAR $L$cbc_dec_loop > > + jmp NEAR $L$cbc_enc_loop > > +ALIGN 16 > > +$L$cbc_enc_loop: > > + movdqu xmm0,XMMWORD[rdi] > > + pxor xmm0,xmm6 > > + call _vpaes_encrypt_core > > + 
movdqa xmm6,xmm0 > > + movdqu XMMWORD[rdi*1+rsi],xmm0 > > + lea rdi,[16+rdi] > > + sub rcx,16 > > + jnc NEAR $L$cbc_enc_loop > > + jmp NEAR $L$cbc_done > > +ALIGN 16 > > +$L$cbc_dec_loop: > > + movdqu xmm0,XMMWORD[rdi] > > + movdqa xmm7,xmm0 > > + call _vpaes_decrypt_core > > + pxor xmm0,xmm6 > > + movdqa xmm6,xmm7 > > + movdqu XMMWORD[rdi*1+rsi],xmm0 > > + lea rdi,[16+rdi] > > + sub rcx,16 > > + jnc NEAR $L$cbc_dec_loop > > +$L$cbc_done: > > + movdqu XMMWORD[r8],xmm6 > > + movaps xmm6,XMMWORD[16+rsp] > > + movaps xmm7,XMMWORD[32+rsp] > > + movaps xmm8,XMMWORD[48+rsp] > > + movaps xmm9,XMMWORD[64+rsp] > > + movaps xmm10,XMMWORD[80+rsp] > > + movaps xmm11,XMMWORD[96+rsp] > > + movaps xmm12,XMMWORD[112+rsp] > > + movaps xmm13,XMMWORD[128+rsp] > > + movaps xmm14,XMMWORD[144+rsp] > > + movaps xmm15,XMMWORD[160+rsp] > > + lea rsp,[184+rsp] > > +$L$cbc_epilogue: > > +$L$cbc_abort: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_vpaes_cbc_encrypt: > > + > > + > > + > > + > > + > > + > > + > > +ALIGN 16 > > +_vpaes_preheat: > > + > > + lea r10,[$L$k_s0F] > > + movdqa xmm10,XMMWORD[((-32))+r10] > > + movdqa xmm11,XMMWORD[((-16))+r10] > > + movdqa xmm9,XMMWORD[r10] > > + movdqa xmm13,XMMWORD[48+r10] > > + movdqa xmm12,XMMWORD[64+r10] > > + movdqa xmm15,XMMWORD[80+r10] > > + movdqa xmm14,XMMWORD[96+r10] > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > + > > + > > + > > + > > + > > +ALIGN 64 > > +_vpaes_consts: > > +$L$k_inv: > > + DQ 0x0E05060F0D080180,0x040703090A0B0C02 > > + DQ 0x01040A060F0B0780,0x030D0E0C02050809 > > + > > +$L$k_s0F: > > + DQ 0x0F0F0F0F0F0F0F0F,0x0F0F0F0F0F0F0F0F > > + > > +$L$k_ipt: > > + DQ 0xC2B2E8985A2A7000,0xCABAE09052227808 > > + DQ 0x4C01307D317C4D00,0xCD80B1FCB0FDCC81 > > + > > +$L$k_sb1: > > + DQ 0xB19BE18FCB503E00,0xA5DF7A6E142AF544 > > + DQ 0x3618D415FAE22300,0x3BF7CCC10D2ED9EF > > +$L$k_sb2: > > + DQ 0xE27A93C60B712400,0x5EB7E955BC982FCD > > + DQ 0x69EB88400AE12900,0xC2A163C8AB82234A > > +$L$k_sbo: > > + DQ 0xD0D26D176FBDC700,0x15AABF7AC502A878 > > + DQ 0xCFE474A55FBB6A00,0x8E1E90D1412B35FA > > + > > +$L$k_mc_forward: > > + DQ 0x0407060500030201,0x0C0F0E0D080B0A09 > > + DQ 0x080B0A0904070605,0x000302010C0F0E0D > > + DQ 0x0C0F0E0D080B0A09,0x0407060500030201 > > + DQ 0x000302010C0F0E0D,0x080B0A0904070605 > > + > > +$L$k_mc_backward: > > + DQ 0x0605040702010003,0x0E0D0C0F0A09080B > > + DQ 0x020100030E0D0C0F,0x0A09080B06050407 > > + DQ 0x0E0D0C0F0A09080B,0x0605040702010003 > > + DQ 0x0A09080B06050407,0x020100030E0D0C0F > > + > > +$L$k_sr: > > + DQ 0x0706050403020100,0x0F0E0D0C0B0A0908 > > + DQ 0x030E09040F0A0500,0x0B06010C07020D08 > > + DQ 0x0F060D040B020900,0x070E050C030A0108 > > + DQ 0x0B0E0104070A0D00,0x0306090C0F020508 > > + > > +$L$k_rcon: > > + DQ 0x1F8391B9AF9DEEB6,0x702A98084D7C7D81 > > + > > +$L$k_s63: > > + DQ 0x5B5B5B5B5B5B5B5B,0x5B5B5B5B5B5B5B5B > > + > > +$L$k_opt: > > + DQ 0xFF9F4929D6B66000,0xF7974121DEBE6808 > > + DQ 0x01EDBD5150BCEC00,0xE10D5DB1B05C0CE0 > > + > > +$L$k_deskew: > > + DQ 0x07E4A34047A4E300,0x1DFEB95A5DBEF91A > > + DQ 0x5F36B5DC83EA6900,0x2841C2ABF49D1E77 > > + > > + > > + > > + > > + > > +$L$k_dksd: > > + DQ 0xFEB91A5DA3E44700,0x0740E3A45A1DBEF9 > > + DQ 0x41C277F4B5368300,0x5FDC69EAAB289D1E > > +$L$k_dksb: > > + DQ 0x9A4FCA1F8550D500,0x03D653861CC94C99 > > + DQ 0x115BEDA7B6FC4A00,0xD993256F7E3482C8 > > +$L$k_dkse: > > + DQ 0xD5031CCA1FC9D600,0x53859A4C994F5086 > > + DQ 0xA23196054FDC7BE8,0xCD5EF96A20B31487 > > +$L$k_dks9: > > + DQ 0xB6116FC87ED9A700,0x4AED933482255BFC 
> > + DQ 0x4576516227143300,0x8BB89FACE9DAFDCE > > + > > + > > + > > + > > + > > +$L$k_dipt: > > + DQ 0x0F505B040B545F00,0x154A411E114E451A > > + DQ 0x86E383E660056500,0x12771772F491F194 > > + > > +$L$k_dsb9: > > + DQ 0x851C03539A86D600,0xCAD51F504F994CC9 > > + DQ 0xC03B1789ECD74900,0x725E2C9EB2FBA565 > > +$L$k_dsbd: > > + DQ 0x7D57CCDFE6B1A200,0xF56E9B13882A4439 > > + DQ 0x3CE2FAF724C6CB00,0x2931180D15DEEFD3 > > +$L$k_dsbb: > > + DQ 0xD022649296B44200,0x602646F6B0F2D404 > > + DQ 0xC19498A6CD596700,0xF3FF0C3E3255AA6B > > +$L$k_dsbe: > > + DQ 0x46F2929626D4D000,0x2242600464B4F6B0 > > + DQ 0x0C55A6CDFFAAC100,0x9467F36B98593E32 > > +$L$k_dsbo: > > + DQ 0x1387EA537EF94000,0xC7AA6DB9D4943E2D > > + DQ 0x12D7560F93441D00,0xCA4B8159D8C58E9C > > +DB 86,101,99,116,111,114,32,80,101,114,109,117,116,97,116,105 > > +DB 111,110,32,65,69,83,32,102,111,114,32,120,56,54,95,54 > > +DB 52,47,83,83,83,69,51,44,32,77,105,107,101,32,72,97 > > +DB 109,98,117,114,103,32,40,83,116,97,110,102,111,114,100,32 > > +DB 85,110,105,118,101,114,115,105,116,121,41,0 > > +ALIGN 64 > > + > > +EXTERN __imp_RtlVirtualUnwind > > + > > +ALIGN 16 > > +se_handler: > > + push rsi > > + push rdi > > + push rbx > > + push rbp > > + push r12 > > + push r13 > > + push r14 > > + push r15 > > + pushfq > > + sub rsp,64 > > + > > + mov rax,QWORD[120+r8] > > + mov rbx,QWORD[248+r8] > > + > > + mov rsi,QWORD[8+r9] > > + mov r11,QWORD[56+r9] > > + > > + mov r10d,DWORD[r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jb NEAR $L$in_prologue > > + > > + mov rax,QWORD[152+r8] > > + > > + mov r10d,DWORD[4+r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jae NEAR $L$in_prologue > > + > > + lea rsi,[16+rax] > > + lea rdi,[512+r8] > > + mov ecx,20 > > + DD 0xa548f3fc > > + lea rax,[184+rax] > > + > > +$L$in_prologue: > > + mov rdi,QWORD[8+rax] > > + mov rsi,QWORD[16+rax] > > + mov QWORD[152+r8],rax > > + mov QWORD[168+r8],rsi > > + mov QWORD[176+r8],rdi > > + > > + mov rdi,QWORD[40+r9] > > + mov rsi,r8 > > + mov ecx,154 > > + DD 0xa548f3fc > > + > > + mov rsi,r9 > > + xor rcx,rcx > > + mov rdx,QWORD[8+rsi] > > + mov r8,QWORD[rsi] > > + mov r9,QWORD[16+rsi] > > + mov r10,QWORD[40+rsi] > > + lea r11,[56+rsi] > > + lea r12,[24+rsi] > > + mov QWORD[32+rsp],r10 > > + mov QWORD[40+rsp],r11 > > + mov QWORD[48+rsp],r12 > > + mov QWORD[56+rsp],rcx > > + call QWORD[__imp_RtlVirtualUnwind] > > + > > + mov eax,1 > > + add rsp,64 > > + popfq > > + pop r15 > > + pop r14 > > + pop r13 > > + pop r12 > > + pop rbp > > + pop rbx > > + pop rdi > > + pop rsi > > + DB 0F3h,0C3h ;repret > > + > > + > > +section .pdata rdata align=4 > > +ALIGN 4 > > + DD $L$SEH_begin_vpaes_set_encrypt_key > wrt ..imagebase > > + DD $L$SEH_end_vpaes_set_encrypt_key wrt ..imagebase > > + DD $L$SEH_info_vpaes_set_encrypt_key wrt ..imagebase > > + > > + DD $L$SEH_begin_vpaes_set_decrypt_key > wrt ..imagebase > > + DD $L$SEH_end_vpaes_set_decrypt_key wrt ..imagebase > > + DD $L$SEH_info_vpaes_set_decrypt_key wrt ..imagebase > > + > > + DD $L$SEH_begin_vpaes_encrypt wrt ..imagebase > > + DD $L$SEH_end_vpaes_encrypt wrt ..imagebase > > + DD $L$SEH_info_vpaes_encrypt wrt ..imagebase > > + > > + DD $L$SEH_begin_vpaes_decrypt wrt ..imagebase > > + DD $L$SEH_end_vpaes_decrypt wrt ..imagebase > > + DD $L$SEH_info_vpaes_decrypt wrt ..imagebase > > + > > + DD $L$SEH_begin_vpaes_cbc_encrypt wrt ..imagebase > > + DD $L$SEH_end_vpaes_cbc_encrypt wrt ..imagebase > > + DD $L$SEH_info_vpaes_cbc_encrypt wrt ..imagebase > > + > > +section .xdata rdata align=8 > > +ALIGN 8 > > 
+$L$SEH_info_vpaes_set_encrypt_key: > > +DB 9,0,0,0 > > + DD se_handler wrt ..imagebase > > + DD $L$enc_key_body > wrt ..imagebase,$L$enc_key_epilogue > > wrt ..imagebase > > +$L$SEH_info_vpaes_set_decrypt_key: > > +DB 9,0,0,0 > > + DD se_handler wrt ..imagebase > > + DD $L$dec_key_body > wrt ..imagebase,$L$dec_key_epilogue > > wrt ..imagebase > > +$L$SEH_info_vpaes_encrypt: > > +DB 9,0,0,0 > > + DD se_handler wrt ..imagebase > > + DD $L$enc_body wrt ..imagebase,$L$enc_epilogue > wrt ..imagebase > > +$L$SEH_info_vpaes_decrypt: > > +DB 9,0,0,0 > > + DD se_handler wrt ..imagebase > > + DD $L$dec_body wrt ..imagebase,$L$dec_epilogue > wrt ..imagebase > > +$L$SEH_info_vpaes_cbc_encrypt: > > +DB 9,0,0,0 > > + DD se_handler wrt ..imagebase > > + DD $L$cbc_body wrt ..imagebase,$L$cbc_epilogue > wrt ..imagebase > > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm- > > x86_64.nasm > b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm- > > x86_64.nasm > > new file mode 100644 > > index 0000000000..9e1a2d0a40 > > --- /dev/null > > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm- > > x86_64.nasm > > @@ -0,0 +1,34 @@ > > +; WARNING: do not edit! > > +; Generated from openssl/crypto/modes/asm/aesni-gcm-x86_64.pl > > +; > > +; Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. > > +; > > +; Licensed under the OpenSSL license (the "License"). You may not use > > +; this file except in compliance with the License. You can obtain a copy > > +; in the file LICENSE in the source distribution or at > > +; https://www.openssl.org/source/license.html > > + > > +default rel > > +%define XMMWORD > > +%define YMMWORD > > +%define ZMMWORD > > +section .text code align=64 > > + > > + > > +global aesni_gcm_encrypt > > + > > +aesni_gcm_encrypt: > > + > > + xor eax,eax > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +global aesni_gcm_decrypt > > + > > +aesni_gcm_decrypt: > > + > > + xor eax,eax > > + DB 0F3h,0C3h ;repret > > + > > + > > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash- > > x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash- > > x86_64.nasm > > new file mode 100644 > > index 0000000000..60f283d5fb > > --- /dev/null > > +++ > b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm > > @@ -0,0 +1,1569 @@ > > +; WARNING: do not edit! > > +; Generated from openssl/crypto/modes/asm/ghash-x86_64.pl > > +; > > +; Copyright 2010-2020 The OpenSSL Project Authors. All Rights Reserved. > > +; > > +; Licensed under the OpenSSL license (the "License"). You may not use > > +; this file except in compliance with the License. 
You can obtain a copy > > +; in the file LICENSE in the source distribution or at > > +; https://www.openssl.org/source/license.html > > + > > +default rel > > +%define XMMWORD > > +%define YMMWORD > > +%define ZMMWORD > > +section .text code align=64 > > + > > +EXTERN OPENSSL_ia32cap_P > > + > > +global gcm_gmult_4bit > > + > > +ALIGN 16 > > +gcm_gmult_4bit: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_gcm_gmult_4bit: > > + mov rdi,rcx > > + mov rsi,rdx > > + > > + > > + > > + push rbx > > + > > + push rbp > > + > > + push r12 > > + > > + push r13 > > + > > + push r14 > > + > > + push r15 > > + > > + sub rsp,280 > > + > > +$L$gmult_prologue: > > + > > + movzx r8,BYTE[15+rdi] > > + lea r11,[$L$rem_4bit] > > + xor rax,rax > > + xor rbx,rbx > > + mov al,r8b > > + mov bl,r8b > > + shl al,4 > > + mov rcx,14 > > + mov r8,QWORD[8+rax*1+rsi] > > + mov r9,QWORD[rax*1+rsi] > > + and bl,0xf0 > > + mov rdx,r8 > > + jmp NEAR $L$oop1 > > + > > +ALIGN 16 > > +$L$oop1: > > + shr r8,4 > > + and rdx,0xf > > + mov r10,r9 > > + mov al,BYTE[rcx*1+rdi] > > + shr r9,4 > > + xor r8,QWORD[8+rbx*1+rsi] > > + shl r10,60 > > + xor r9,QWORD[rbx*1+rsi] > > + mov bl,al > > + xor r9,QWORD[rdx*8+r11] > > + mov rdx,r8 > > + shl al,4 > > + xor r8,r10 > > + dec rcx > > + js NEAR $L$break1 > > + > > + shr r8,4 > > + and rdx,0xf > > + mov r10,r9 > > + shr r9,4 > > + xor r8,QWORD[8+rax*1+rsi] > > + shl r10,60 > > + xor r9,QWORD[rax*1+rsi] > > + and bl,0xf0 > > + xor r9,QWORD[rdx*8+r11] > > + mov rdx,r8 > > + xor r8,r10 > > + jmp NEAR $L$oop1 > > + > > +ALIGN 16 > > +$L$break1: > > + shr r8,4 > > + and rdx,0xf > > + mov r10,r9 > > + shr r9,4 > > + xor r8,QWORD[8+rax*1+rsi] > > + shl r10,60 > > + xor r9,QWORD[rax*1+rsi] > > + and bl,0xf0 > > + xor r9,QWORD[rdx*8+r11] > > + mov rdx,r8 > > + xor r8,r10 > > + > > + shr r8,4 > > + and rdx,0xf > > + mov r10,r9 > > + shr r9,4 > > + xor r8,QWORD[8+rbx*1+rsi] > > + shl r10,60 > > + xor r9,QWORD[rbx*1+rsi] > > + xor r8,r10 > > + xor r9,QWORD[rdx*8+r11] > > + > > + bswap r8 > > + bswap r9 > > + mov QWORD[8+rdi],r8 > > + mov QWORD[rdi],r9 > > + > > + lea rsi,[((280+48))+rsp] > > + > > + mov rbx,QWORD[((-8))+rsi] > > + > > + lea rsp,[rsi] > > + > > +$L$gmult_epilogue: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_gcm_gmult_4bit: > > +global gcm_ghash_4bit > > + > > +ALIGN 16 > > +gcm_ghash_4bit: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_gcm_ghash_4bit: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + mov rcx,r9 > > + > > + > > + > > + push rbx > > + > > + push rbp > > + > > + push r12 > > + > > + push r13 > > + > > + push r14 > > + > > + push r15 > > + > > + sub rsp,280 > > + > > +$L$ghash_prologue: > > + mov r14,rdx > > + mov r15,rcx > > + sub rsi,-128 > > + lea rbp,[((16+128))+rsp] > > + xor edx,edx > > + mov r8,QWORD[((0+0-128))+rsi] > > + mov rax,QWORD[((0+8-128))+rsi] > > + mov dl,al > > + shr rax,4 > > + mov r10,r8 > > + shr r8,4 > > + mov r9,QWORD[((16+0-128))+rsi] > > + shl dl,4 > > + mov rbx,QWORD[((16+8-128))+rsi] > > + shl r10,60 > > + mov BYTE[rsp],dl > > + or rax,r10 > > + mov dl,bl > > + shr rbx,4 > > + mov r10,r9 > > + shr r9,4 > > + mov QWORD[rbp],r8 > > + mov r8,QWORD[((32+0-128))+rsi] > > + shl dl,4 > > + mov QWORD[((0-128))+rbp],rax > > + mov rax,QWORD[((32+8-128))+rsi] > > + shl r10,60 > > + mov BYTE[1+rsp],dl > > + or rbx,r10 > > + mov dl,al > > + shr rax,4 > > + 
mov r10,r8 > > + shr r8,4 > > + mov QWORD[8+rbp],r9 > > + mov r9,QWORD[((48+0-128))+rsi] > > + shl dl,4 > > + mov QWORD[((8-128))+rbp],rbx > > + mov rbx,QWORD[((48+8-128))+rsi] > > + shl r10,60 > > + mov BYTE[2+rsp],dl > > + or rax,r10 > > + mov dl,bl > > + shr rbx,4 > > + mov r10,r9 > > + shr r9,4 > > + mov QWORD[16+rbp],r8 > > + mov r8,QWORD[((64+0-128))+rsi] > > + shl dl,4 > > + mov QWORD[((16-128))+rbp],rax > > + mov rax,QWORD[((64+8-128))+rsi] > > + shl r10,60 > > + mov BYTE[3+rsp],dl > > + or rbx,r10 > > + mov dl,al > > + shr rax,4 > > + mov r10,r8 > > + shr r8,4 > > + mov QWORD[24+rbp],r9 > > + mov r9,QWORD[((80+0-128))+rsi] > > + shl dl,4 > > + mov QWORD[((24-128))+rbp],rbx > > + mov rbx,QWORD[((80+8-128))+rsi] > > + shl r10,60 > > + mov BYTE[4+rsp],dl > > + or rax,r10 > > + mov dl,bl > > + shr rbx,4 > > + mov r10,r9 > > + shr r9,4 > > + mov QWORD[32+rbp],r8 > > + mov r8,QWORD[((96+0-128))+rsi] > > + shl dl,4 > > + mov QWORD[((32-128))+rbp],rax > > + mov rax,QWORD[((96+8-128))+rsi] > > + shl r10,60 > > + mov BYTE[5+rsp],dl > > + or rbx,r10 > > + mov dl,al > > + shr rax,4 > > + mov r10,r8 > > + shr r8,4 > > + mov QWORD[40+rbp],r9 > > + mov r9,QWORD[((112+0-128))+rsi] > > + shl dl,4 > > + mov QWORD[((40-128))+rbp],rbx > > + mov rbx,QWORD[((112+8-128))+rsi] > > + shl r10,60 > > + mov BYTE[6+rsp],dl > > + or rax,r10 > > + mov dl,bl > > + shr rbx,4 > > + mov r10,r9 > > + shr r9,4 > > + mov QWORD[48+rbp],r8 > > + mov r8,QWORD[((128+0-128))+rsi] > > + shl dl,4 > > + mov QWORD[((48-128))+rbp],rax > > + mov rax,QWORD[((128+8-128))+rsi] > > + shl r10,60 > > + mov BYTE[7+rsp],dl > > + or rbx,r10 > > + mov dl,al > > + shr rax,4 > > + mov r10,r8 > > + shr r8,4 > > + mov QWORD[56+rbp],r9 > > + mov r9,QWORD[((144+0-128))+rsi] > > + shl dl,4 > > + mov QWORD[((56-128))+rbp],rbx > > + mov rbx,QWORD[((144+8-128))+rsi] > > + shl r10,60 > > + mov BYTE[8+rsp],dl > > + or rax,r10 > > + mov dl,bl > > + shr rbx,4 > > + mov r10,r9 > > + shr r9,4 > > + mov QWORD[64+rbp],r8 > > + mov r8,QWORD[((160+0-128))+rsi] > > + shl dl,4 > > + mov QWORD[((64-128))+rbp],rax > > + mov rax,QWORD[((160+8-128))+rsi] > > + shl r10,60 > > + mov BYTE[9+rsp],dl > > + or rbx,r10 > > + mov dl,al > > + shr rax,4 > > + mov r10,r8 > > + shr r8,4 > > + mov QWORD[72+rbp],r9 > > + mov r9,QWORD[((176+0-128))+rsi] > > + shl dl,4 > > + mov QWORD[((72-128))+rbp],rbx > > + mov rbx,QWORD[((176+8-128))+rsi] > > + shl r10,60 > > + mov BYTE[10+rsp],dl > > + or rax,r10 > > + mov dl,bl > > + shr rbx,4 > > + mov r10,r9 > > + shr r9,4 > > + mov QWORD[80+rbp],r8 > > + mov r8,QWORD[((192+0-128))+rsi] > > + shl dl,4 > > + mov QWORD[((80-128))+rbp],rax > > + mov rax,QWORD[((192+8-128))+rsi] > > + shl r10,60 > > + mov BYTE[11+rsp],dl > > + or rbx,r10 > > + mov dl,al > > + shr rax,4 > > + mov r10,r8 > > + shr r8,4 > > + mov QWORD[88+rbp],r9 > > + mov r9,QWORD[((208+0-128))+rsi] > > + shl dl,4 > > + mov QWORD[((88-128))+rbp],rbx > > + mov rbx,QWORD[((208+8-128))+rsi] > > + shl r10,60 > > + mov BYTE[12+rsp],dl > > + or rax,r10 > > + mov dl,bl > > + shr rbx,4 > > + mov r10,r9 > > + shr r9,4 > > + mov QWORD[96+rbp],r8 > > + mov r8,QWORD[((224+0-128))+rsi] > > + shl dl,4 > > + mov QWORD[((96-128))+rbp],rax > > + mov rax,QWORD[((224+8-128))+rsi] > > + shl r10,60 > > + mov BYTE[13+rsp],dl > > + or rbx,r10 > > + mov dl,al > > + shr rax,4 > > + mov r10,r8 > > + shr r8,4 > > + mov QWORD[104+rbp],r9 > > + mov r9,QWORD[((240+0-128))+rsi] > > + shl dl,4 > > + mov QWORD[((104-128))+rbp],rbx > > + mov rbx,QWORD[((240+8-128))+rsi] > > + shl r10,60 > > + mov 
BYTE[14+rsp],dl > > + or rax,r10 > > + mov dl,bl > > + shr rbx,4 > > + mov r10,r9 > > + shr r9,4 > > + mov QWORD[112+rbp],r8 > > + shl dl,4 > > + mov QWORD[((112-128))+rbp],rax > > + shl r10,60 > > + mov BYTE[15+rsp],dl > > + or rbx,r10 > > + mov QWORD[120+rbp],r9 > > + mov QWORD[((120-128))+rbp],rbx > > + add rsi,-128 > > + mov r8,QWORD[8+rdi] > > + mov r9,QWORD[rdi] > > + add r15,r14 > > + lea r11,[$L$rem_8bit] > > + jmp NEAR $L$outer_loop > > +ALIGN 16 > > +$L$outer_loop: > > + xor r9,QWORD[r14] > > + mov rdx,QWORD[8+r14] > > + lea r14,[16+r14] > > + xor rdx,r8 > > + mov QWORD[rdi],r9 > > + mov QWORD[8+rdi],rdx > > + shr rdx,32 > > + xor rax,rax > > + rol edx,8 > > + mov al,dl > > + movzx ebx,dl > > + shl al,4 > > + shr ebx,4 > > + rol edx,8 > > + mov r8,QWORD[8+rax*1+rsi] > > + mov r9,QWORD[rax*1+rsi] > > + mov al,dl > > + movzx ecx,dl > > + shl al,4 > > + movzx r12,BYTE[rbx*1+rsp] > > + shr ecx,4 > > + xor r12,r8 > > + mov r10,r9 > > + shr r8,8 > > + movzx r12,r12b > > + shr r9,8 > > + xor r8,QWORD[((-128))+rbx*8+rbp] > > + shl r10,56 > > + xor r9,QWORD[rbx*8+rbp] > > + rol edx,8 > > + xor r8,QWORD[8+rax*1+rsi] > > + xor r9,QWORD[rax*1+rsi] > > + mov al,dl > > + xor r8,r10 > > + movzx r12,WORD[r12*2+r11] > > + movzx ebx,dl > > + shl al,4 > > + movzx r13,BYTE[rcx*1+rsp] > > + shr ebx,4 > > + shl r12,48 > > + xor r13,r8 > > + mov r10,r9 > > + xor r9,r12 > > + shr r8,8 > > + movzx r13,r13b > > + shr r9,8 > > + xor r8,QWORD[((-128))+rcx*8+rbp] > > + shl r10,56 > > + xor r9,QWORD[rcx*8+rbp] > > + rol edx,8 > > + xor r8,QWORD[8+rax*1+rsi] > > + xor r9,QWORD[rax*1+rsi] > > + mov al,dl > > + xor r8,r10 > > + movzx r13,WORD[r13*2+r11] > > + movzx ecx,dl > > + shl al,4 > > + movzx r12,BYTE[rbx*1+rsp] > > + shr ecx,4 > > + shl r13,48 > > + xor r12,r8 > > + mov r10,r9 > > + xor r9,r13 > > + shr r8,8 > > + movzx r12,r12b > > + mov edx,DWORD[8+rdi] > > + shr r9,8 > > + xor r8,QWORD[((-128))+rbx*8+rbp] > > + shl r10,56 > > + xor r9,QWORD[rbx*8+rbp] > > + rol edx,8 > > + xor r8,QWORD[8+rax*1+rsi] > > + xor r9,QWORD[rax*1+rsi] > > + mov al,dl > > + xor r8,r10 > > + movzx r12,WORD[r12*2+r11] > > + movzx ebx,dl > > + shl al,4 > > + movzx r13,BYTE[rcx*1+rsp] > > + shr ebx,4 > > + shl r12,48 > > + xor r13,r8 > > + mov r10,r9 > > + xor r9,r12 > > + shr r8,8 > > + movzx r13,r13b > > + shr r9,8 > > + xor r8,QWORD[((-128))+rcx*8+rbp] > > + shl r10,56 > > + xor r9,QWORD[rcx*8+rbp] > > + rol edx,8 > > + xor r8,QWORD[8+rax*1+rsi] > > + xor r9,QWORD[rax*1+rsi] > > + mov al,dl > > + xor r8,r10 > > + movzx r13,WORD[r13*2+r11] > > + movzx ecx,dl > > + shl al,4 > > + movzx r12,BYTE[rbx*1+rsp] > > + shr ecx,4 > > + shl r13,48 > > + xor r12,r8 > > + mov r10,r9 > > + xor r9,r13 > > + shr r8,8 > > + movzx r12,r12b > > + shr r9,8 > > + xor r8,QWORD[((-128))+rbx*8+rbp] > > + shl r10,56 > > + xor r9,QWORD[rbx*8+rbp] > > + rol edx,8 > > + xor r8,QWORD[8+rax*1+rsi] > > + xor r9,QWORD[rax*1+rsi] > > + mov al,dl > > + xor r8,r10 > > + movzx r12,WORD[r12*2+r11] > > + movzx ebx,dl > > + shl al,4 > > + movzx r13,BYTE[rcx*1+rsp] > > + shr ebx,4 > > + shl r12,48 > > + xor r13,r8 > > + mov r10,r9 > > + xor r9,r12 > > + shr r8,8 > > + movzx r13,r13b > > + shr r9,8 > > + xor r8,QWORD[((-128))+rcx*8+rbp] > > + shl r10,56 > > + xor r9,QWORD[rcx*8+rbp] > > + rol edx,8 > > + xor r8,QWORD[8+rax*1+rsi] > > + xor r9,QWORD[rax*1+rsi] > > + mov al,dl > > + xor r8,r10 > > + movzx r13,WORD[r13*2+r11] > > + movzx ecx,dl > > + shl al,4 > > + movzx r12,BYTE[rbx*1+rsp] > > + shr ecx,4 > > + shl r13,48 > > + xor r12,r8 > > + mov r10,r9 > > + xor 
r9,r13 > > + shr r8,8 > > + movzx r12,r12b > > + mov edx,DWORD[4+rdi] > > + shr r9,8 > > + xor r8,QWORD[((-128))+rbx*8+rbp] > > + shl r10,56 > > + xor r9,QWORD[rbx*8+rbp] > > + rol edx,8 > > + xor r8,QWORD[8+rax*1+rsi] > > + xor r9,QWORD[rax*1+rsi] > > + mov al,dl > > + xor r8,r10 > > + movzx r12,WORD[r12*2+r11] > > + movzx ebx,dl > > + shl al,4 > > + movzx r13,BYTE[rcx*1+rsp] > > + shr ebx,4 > > + shl r12,48 > > + xor r13,r8 > > + mov r10,r9 > > + xor r9,r12 > > + shr r8,8 > > + movzx r13,r13b > > + shr r9,8 > > + xor r8,QWORD[((-128))+rcx*8+rbp] > > + shl r10,56 > > + xor r9,QWORD[rcx*8+rbp] > > + rol edx,8 > > + xor r8,QWORD[8+rax*1+rsi] > > + xor r9,QWORD[rax*1+rsi] > > + mov al,dl > > + xor r8,r10 > > + movzx r13,WORD[r13*2+r11] > > + movzx ecx,dl > > + shl al,4 > > + movzx r12,BYTE[rbx*1+rsp] > > + shr ecx,4 > > + shl r13,48 > > + xor r12,r8 > > + mov r10,r9 > > + xor r9,r13 > > + shr r8,8 > > + movzx r12,r12b > > + shr r9,8 > > + xor r8,QWORD[((-128))+rbx*8+rbp] > > + shl r10,56 > > + xor r9,QWORD[rbx*8+rbp] > > + rol edx,8 > > + xor r8,QWORD[8+rax*1+rsi] > > + xor r9,QWORD[rax*1+rsi] > > + mov al,dl > > + xor r8,r10 > > + movzx r12,WORD[r12*2+r11] > > + movzx ebx,dl > > + shl al,4 > > + movzx r13,BYTE[rcx*1+rsp] > > + shr ebx,4 > > + shl r12,48 > > + xor r13,r8 > > + mov r10,r9 > > + xor r9,r12 > > + shr r8,8 > > + movzx r13,r13b > > + shr r9,8 > > + xor r8,QWORD[((-128))+rcx*8+rbp] > > + shl r10,56 > > + xor r9,QWORD[rcx*8+rbp] > > + rol edx,8 > > + xor r8,QWORD[8+rax*1+rsi] > > + xor r9,QWORD[rax*1+rsi] > > + mov al,dl > > + xor r8,r10 > > + movzx r13,WORD[r13*2+r11] > > + movzx ecx,dl > > + shl al,4 > > + movzx r12,BYTE[rbx*1+rsp] > > + shr ecx,4 > > + shl r13,48 > > + xor r12,r8 > > + mov r10,r9 > > + xor r9,r13 > > + shr r8,8 > > + movzx r12,r12b > > + mov edx,DWORD[rdi] > > + shr r9,8 > > + xor r8,QWORD[((-128))+rbx*8+rbp] > > + shl r10,56 > > + xor r9,QWORD[rbx*8+rbp] > > + rol edx,8 > > + xor r8,QWORD[8+rax*1+rsi] > > + xor r9,QWORD[rax*1+rsi] > > + mov al,dl > > + xor r8,r10 > > + movzx r12,WORD[r12*2+r11] > > + movzx ebx,dl > > + shl al,4 > > + movzx r13,BYTE[rcx*1+rsp] > > + shr ebx,4 > > + shl r12,48 > > + xor r13,r8 > > + mov r10,r9 > > + xor r9,r12 > > + shr r8,8 > > + movzx r13,r13b > > + shr r9,8 > > + xor r8,QWORD[((-128))+rcx*8+rbp] > > + shl r10,56 > > + xor r9,QWORD[rcx*8+rbp] > > + rol edx,8 > > + xor r8,QWORD[8+rax*1+rsi] > > + xor r9,QWORD[rax*1+rsi] > > + mov al,dl > > + xor r8,r10 > > + movzx r13,WORD[r13*2+r11] > > + movzx ecx,dl > > + shl al,4 > > + movzx r12,BYTE[rbx*1+rsp] > > + shr ecx,4 > > + shl r13,48 > > + xor r12,r8 > > + mov r10,r9 > > + xor r9,r13 > > + shr r8,8 > > + movzx r12,r12b > > + shr r9,8 > > + xor r8,QWORD[((-128))+rbx*8+rbp] > > + shl r10,56 > > + xor r9,QWORD[rbx*8+rbp] > > + rol edx,8 > > + xor r8,QWORD[8+rax*1+rsi] > > + xor r9,QWORD[rax*1+rsi] > > + mov al,dl > > + xor r8,r10 > > + movzx r12,WORD[r12*2+r11] > > + movzx ebx,dl > > + shl al,4 > > + movzx r13,BYTE[rcx*1+rsp] > > + shr ebx,4 > > + shl r12,48 > > + xor r13,r8 > > + mov r10,r9 > > + xor r9,r12 > > + shr r8,8 > > + movzx r13,r13b > > + shr r9,8 > > + xor r8,QWORD[((-128))+rcx*8+rbp] > > + shl r10,56 > > + xor r9,QWORD[rcx*8+rbp] > > + rol edx,8 > > + xor r8,QWORD[8+rax*1+rsi] > > + xor r9,QWORD[rax*1+rsi] > > + mov al,dl > > + xor r8,r10 > > + movzx r13,WORD[r13*2+r11] > > + movzx ecx,dl > > + shl al,4 > > + movzx r12,BYTE[rbx*1+rsp] > > + and ecx,240 > > + shl r13,48 > > + xor r12,r8 > > + mov r10,r9 > > + xor r9,r13 > > + shr r8,8 > > + movzx r12,r12b > > + mov 
edx,DWORD[((-4))+rdi] > > + shr r9,8 > > + xor r8,QWORD[((-128))+rbx*8+rbp] > > + shl r10,56 > > + xor r9,QWORD[rbx*8+rbp] > > + movzx r12,WORD[r12*2+r11] > > + xor r8,QWORD[8+rax*1+rsi] > > + xor r9,QWORD[rax*1+rsi] > > + shl r12,48 > > + xor r8,r10 > > + xor r9,r12 > > + movzx r13,r8b > > + shr r8,4 > > + mov r10,r9 > > + shl r13b,4 > > + shr r9,4 > > + xor r8,QWORD[8+rcx*1+rsi] > > + movzx r13,WORD[r13*2+r11] > > + shl r10,60 > > + xor r9,QWORD[rcx*1+rsi] > > + xor r8,r10 > > + shl r13,48 > > + bswap r8 > > + xor r9,r13 > > + bswap r9 > > + cmp r14,r15 > > + jb NEAR $L$outer_loop > > + mov QWORD[8+rdi],r8 > > + mov QWORD[rdi],r9 > > + > > + lea rsi,[((280+48))+rsp] > > + > > + mov r15,QWORD[((-48))+rsi] > > + > > + mov r14,QWORD[((-40))+rsi] > > + > > + mov r13,QWORD[((-32))+rsi] > > + > > + mov r12,QWORD[((-24))+rsi] > > + > > + mov rbp,QWORD[((-16))+rsi] > > + > > + mov rbx,QWORD[((-8))+rsi] > > + > > + lea rsp,[rsi] > > + > > +$L$ghash_epilogue: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_gcm_ghash_4bit: > > +global gcm_init_clmul > > + > > +ALIGN 16 > > +gcm_init_clmul: > > + > > +$L$_init_clmul: > > +$L$SEH_begin_gcm_init_clmul: > > + > > +DB 0x48,0x83,0xec,0x18 > > +DB 0x0f,0x29,0x34,0x24 > > + movdqu xmm2,XMMWORD[rdx] > > + pshufd xmm2,xmm2,78 > > + > > + > > + pshufd xmm4,xmm2,255 > > + movdqa xmm3,xmm2 > > + psllq xmm2,1 > > + pxor xmm5,xmm5 > > + psrlq xmm3,63 > > + pcmpgtd xmm5,xmm4 > > + pslldq xmm3,8 > > + por xmm2,xmm3 > > + > > + > > + pand xmm5,XMMWORD[$L$0x1c2_polynomial] > > + pxor xmm2,xmm5 > > + > > + > > + pshufd xmm6,xmm2,78 > > + movdqa xmm0,xmm2 > > + pxor xmm6,xmm2 > > + movdqa xmm1,xmm0 > > + pshufd xmm3,xmm0,78 > > + pxor xmm3,xmm0 > > +DB 102,15,58,68,194,0 > > +DB 102,15,58,68,202,17 > > +DB 102,15,58,68,222,0 > > + pxor xmm3,xmm0 > > + pxor xmm3,xmm1 > > + > > + movdqa xmm4,xmm3 > > + psrldq xmm3,8 > > + pslldq xmm4,8 > > + pxor xmm1,xmm3 > > + pxor xmm0,xmm4 > > + > > + movdqa xmm4,xmm0 > > + movdqa xmm3,xmm0 > > + psllq xmm0,5 > > + pxor xmm3,xmm0 > > + psllq xmm0,1 > > + pxor xmm0,xmm3 > > + psllq xmm0,57 > > + movdqa xmm3,xmm0 > > + pslldq xmm0,8 > > + psrldq xmm3,8 > > + pxor xmm0,xmm4 > > + pxor xmm1,xmm3 > > + > > + > > + movdqa xmm4,xmm0 > > + psrlq xmm0,1 > > + pxor xmm1,xmm4 > > + pxor xmm4,xmm0 > > + psrlq xmm0,5 > > + pxor xmm0,xmm4 > > + psrlq xmm0,1 > > + pxor xmm0,xmm1 > > + pshufd xmm3,xmm2,78 > > + pshufd xmm4,xmm0,78 > > + pxor xmm3,xmm2 > > + movdqu XMMWORD[rcx],xmm2 > > + pxor xmm4,xmm0 > > + movdqu XMMWORD[16+rcx],xmm0 > > +DB 102,15,58,15,227,8 > > + movdqu XMMWORD[32+rcx],xmm4 > > + movdqa xmm1,xmm0 > > + pshufd xmm3,xmm0,78 > > + pxor xmm3,xmm0 > > +DB 102,15,58,68,194,0 > > +DB 102,15,58,68,202,17 > > +DB 102,15,58,68,222,0 > > + pxor xmm3,xmm0 > > + pxor xmm3,xmm1 > > + > > + movdqa xmm4,xmm3 > > + psrldq xmm3,8 > > + pslldq xmm4,8 > > + pxor xmm1,xmm3 > > + pxor xmm0,xmm4 > > + > > + movdqa xmm4,xmm0 > > + movdqa xmm3,xmm0 > > + psllq xmm0,5 > > + pxor xmm3,xmm0 > > + psllq xmm0,1 > > + pxor xmm0,xmm3 > > + psllq xmm0,57 > > + movdqa xmm3,xmm0 > > + pslldq xmm0,8 > > + psrldq xmm3,8 > > + pxor xmm0,xmm4 > > + pxor xmm1,xmm3 > > + > > + > > + movdqa xmm4,xmm0 > > + psrlq xmm0,1 > > + pxor xmm1,xmm4 > > + pxor xmm4,xmm0 > > + psrlq xmm0,5 > > + pxor xmm0,xmm4 > > + psrlq xmm0,1 > > + pxor xmm0,xmm1 > > + movdqa xmm5,xmm0 > > + movdqa xmm1,xmm0 > > + pshufd xmm3,xmm0,78 > > + pxor xmm3,xmm0 > > +DB 102,15,58,68,194,0 > > +DB 102,15,58,68,202,17 > > +DB 
102,15,58,68,222,0 > > + pxor xmm3,xmm0 > > + pxor xmm3,xmm1 > > + > > + movdqa xmm4,xmm3 > > + psrldq xmm3,8 > > + pslldq xmm4,8 > > + pxor xmm1,xmm3 > > + pxor xmm0,xmm4 > > + > > + movdqa xmm4,xmm0 > > + movdqa xmm3,xmm0 > > + psllq xmm0,5 > > + pxor xmm3,xmm0 > > + psllq xmm0,1 > > + pxor xmm0,xmm3 > > + psllq xmm0,57 > > + movdqa xmm3,xmm0 > > + pslldq xmm0,8 > > + psrldq xmm3,8 > > + pxor xmm0,xmm4 > > + pxor xmm1,xmm3 > > + > > + > > + movdqa xmm4,xmm0 > > + psrlq xmm0,1 > > + pxor xmm1,xmm4 > > + pxor xmm4,xmm0 > > + psrlq xmm0,5 > > + pxor xmm0,xmm4 > > + psrlq xmm0,1 > > + pxor xmm0,xmm1 > > + pshufd xmm3,xmm5,78 > > + pshufd xmm4,xmm0,78 > > + pxor xmm3,xmm5 > > + movdqu XMMWORD[48+rcx],xmm5 > > + pxor xmm4,xmm0 > > + movdqu XMMWORD[64+rcx],xmm0 > > +DB 102,15,58,15,227,8 > > + movdqu XMMWORD[80+rcx],xmm4 > > + movaps xmm6,XMMWORD[rsp] > > + lea rsp,[24+rsp] > > +$L$SEH_end_gcm_init_clmul: > > + DB 0F3h,0C3h ;repret > > + > > + > > +global gcm_gmult_clmul > > + > > +ALIGN 16 > > +gcm_gmult_clmul: > > + > > +$L$_gmult_clmul: > > + movdqu xmm0,XMMWORD[rcx] > > + movdqa xmm5,XMMWORD[$L$bswap_mask] > > + movdqu xmm2,XMMWORD[rdx] > > + movdqu xmm4,XMMWORD[32+rdx] > > +DB 102,15,56,0,197 > > + movdqa xmm1,xmm0 > > + pshufd xmm3,xmm0,78 > > + pxor xmm3,xmm0 > > +DB 102,15,58,68,194,0 > > +DB 102,15,58,68,202,17 > > +DB 102,15,58,68,220,0 > > + pxor xmm3,xmm0 > > + pxor xmm3,xmm1 > > + > > + movdqa xmm4,xmm3 > > + psrldq xmm3,8 > > + pslldq xmm4,8 > > + pxor xmm1,xmm3 > > + pxor xmm0,xmm4 > > + > > + movdqa xmm4,xmm0 > > + movdqa xmm3,xmm0 > > + psllq xmm0,5 > > + pxor xmm3,xmm0 > > + psllq xmm0,1 > > + pxor xmm0,xmm3 > > + psllq xmm0,57 > > + movdqa xmm3,xmm0 > > + pslldq xmm0,8 > > + psrldq xmm3,8 > > + pxor xmm0,xmm4 > > + pxor xmm1,xmm3 > > + > > + > > + movdqa xmm4,xmm0 > > + psrlq xmm0,1 > > + pxor xmm1,xmm4 > > + pxor xmm4,xmm0 > > + psrlq xmm0,5 > > + pxor xmm0,xmm4 > > + psrlq xmm0,1 > > + pxor xmm0,xmm1 > > +DB 102,15,56,0,197 > > + movdqu XMMWORD[rcx],xmm0 > > + DB 0F3h,0C3h ;repret > > + > > + > > +global gcm_ghash_clmul > > + > > +ALIGN 32 > > +gcm_ghash_clmul: > > + > > +$L$_ghash_clmul: > > + lea rax,[((-136))+rsp] > > +$L$SEH_begin_gcm_ghash_clmul: > > + > > +DB 0x48,0x8d,0x60,0xe0 > > +DB 0x0f,0x29,0x70,0xe0 > > +DB 0x0f,0x29,0x78,0xf0 > > +DB 0x44,0x0f,0x29,0x00 > > +DB 0x44,0x0f,0x29,0x48,0x10 > > +DB 0x44,0x0f,0x29,0x50,0x20 > > +DB 0x44,0x0f,0x29,0x58,0x30 > > +DB 0x44,0x0f,0x29,0x60,0x40 > > +DB 0x44,0x0f,0x29,0x68,0x50 > > +DB 0x44,0x0f,0x29,0x70,0x60 > > +DB 0x44,0x0f,0x29,0x78,0x70 > > + movdqa xmm10,XMMWORD[$L$bswap_mask] > > + > > + movdqu xmm0,XMMWORD[rcx] > > + movdqu xmm2,XMMWORD[rdx] > > + movdqu xmm7,XMMWORD[32+rdx] > > +DB 102,65,15,56,0,194 > > + > > + sub r9,0x10 > > + jz NEAR $L$odd_tail > > + > > + movdqu xmm6,XMMWORD[16+rdx] > > + mov eax,DWORD[((OPENSSL_ia32cap_P+4))] > > + cmp r9,0x30 > > + jb NEAR $L$skip4x > > + > > + and eax,71303168 > > + cmp eax,4194304 > > + je NEAR $L$skip4x > > + > > + sub r9,0x30 > > + mov rax,0xA040608020C0E000 > > + movdqu xmm14,XMMWORD[48+rdx] > > + movdqu xmm15,XMMWORD[64+rdx] > > + > > + > > + > > + > > + movdqu xmm3,XMMWORD[48+r8] > > + movdqu xmm11,XMMWORD[32+r8] > > +DB 102,65,15,56,0,218 > > +DB 102,69,15,56,0,218 > > + movdqa xmm5,xmm3 > > + pshufd xmm4,xmm3,78 > > + pxor xmm4,xmm3 > > +DB 102,15,58,68,218,0 > > +DB 102,15,58,68,234,17 > > +DB 102,15,58,68,231,0 > > + > > + movdqa xmm13,xmm11 > > + pshufd xmm12,xmm11,78 > > + pxor xmm12,xmm11 > > +DB 102,68,15,58,68,222,0 > > +DB 102,68,15,58,68,238,17 > > +DB 
102,68,15,58,68,231,16 > > + xorps xmm3,xmm11 > > + xorps xmm5,xmm13 > > + movups xmm7,XMMWORD[80+rdx] > > + xorps xmm4,xmm12 > > + > > + movdqu xmm11,XMMWORD[16+r8] > > + movdqu xmm8,XMMWORD[r8] > > +DB 102,69,15,56,0,218 > > +DB 102,69,15,56,0,194 > > + movdqa xmm13,xmm11 > > + pshufd xmm12,xmm11,78 > > + pxor xmm0,xmm8 > > + pxor xmm12,xmm11 > > +DB 102,69,15,58,68,222,0 > > + movdqa xmm1,xmm0 > > + pshufd xmm8,xmm0,78 > > + pxor xmm8,xmm0 > > +DB 102,69,15,58,68,238,17 > > +DB 102,68,15,58,68,231,0 > > + xorps xmm3,xmm11 > > + xorps xmm5,xmm13 > > + > > + lea r8,[64+r8] > > + sub r9,0x40 > > + jc NEAR $L$tail4x > > + > > + jmp NEAR $L$mod4_loop > > +ALIGN 32 > > +$L$mod4_loop: > > +DB 102,65,15,58,68,199,0 > > + xorps xmm4,xmm12 > > + movdqu xmm11,XMMWORD[48+r8] > > +DB 102,69,15,56,0,218 > > +DB 102,65,15,58,68,207,17 > > + xorps xmm0,xmm3 > > + movdqu xmm3,XMMWORD[32+r8] > > + movdqa xmm13,xmm11 > > +DB 102,68,15,58,68,199,16 > > + pshufd xmm12,xmm11,78 > > + xorps xmm1,xmm5 > > + pxor xmm12,xmm11 > > +DB 102,65,15,56,0,218 > > + movups xmm7,XMMWORD[32+rdx] > > + xorps xmm8,xmm4 > > +DB 102,68,15,58,68,218,0 > > + pshufd xmm4,xmm3,78 > > + > > + pxor xmm8,xmm0 > > + movdqa xmm5,xmm3 > > + pxor xmm8,xmm1 > > + pxor xmm4,xmm3 > > + movdqa xmm9,xmm8 > > +DB 102,68,15,58,68,234,17 > > + pslldq xmm8,8 > > + psrldq xmm9,8 > > + pxor xmm0,xmm8 > > + movdqa xmm8,XMMWORD[$L$7_mask] > > + pxor xmm1,xmm9 > > +DB 102,76,15,110,200 > > + > > + pand xmm8,xmm0 > > +DB 102,69,15,56,0,200 > > + pxor xmm9,xmm0 > > +DB 102,68,15,58,68,231,0 > > + psllq xmm9,57 > > + movdqa xmm8,xmm9 > > + pslldq xmm9,8 > > +DB 102,15,58,68,222,0 > > + psrldq xmm8,8 > > + pxor xmm0,xmm9 > > + pxor xmm1,xmm8 > > + movdqu xmm8,XMMWORD[r8] > > + > > + movdqa xmm9,xmm0 > > + psrlq xmm0,1 > > +DB 102,15,58,68,238,17 > > + xorps xmm3,xmm11 > > + movdqu xmm11,XMMWORD[16+r8] > > +DB 102,69,15,56,0,218 > > +DB 102,15,58,68,231,16 > > + xorps xmm5,xmm13 > > + movups xmm7,XMMWORD[80+rdx] > > +DB 102,69,15,56,0,194 > > + pxor xmm1,xmm9 > > + pxor xmm9,xmm0 > > + psrlq xmm0,5 > > + > > + movdqa xmm13,xmm11 > > + pxor xmm4,xmm12 > > + pshufd xmm12,xmm11,78 > > + pxor xmm0,xmm9 > > + pxor xmm1,xmm8 > > + pxor xmm12,xmm11 > > +DB 102,69,15,58,68,222,0 > > + psrlq xmm0,1 > > + pxor xmm0,xmm1 > > + movdqa xmm1,xmm0 > > +DB 102,69,15,58,68,238,17 > > + xorps xmm3,xmm11 > > + pshufd xmm8,xmm0,78 > > + pxor xmm8,xmm0 > > + > > +DB 102,68,15,58,68,231,0 > > + xorps xmm5,xmm13 > > + > > + lea r8,[64+r8] > > + sub r9,0x40 > > + jnc NEAR $L$mod4_loop > > + > > +$L$tail4x: > > +DB 102,65,15,58,68,199,0 > > +DB 102,65,15,58,68,207,17 > > +DB 102,68,15,58,68,199,16 > > + xorps xmm4,xmm12 > > + xorps xmm0,xmm3 > > + xorps xmm1,xmm5 > > + pxor xmm1,xmm0 > > + pxor xmm8,xmm4 > > + > > + pxor xmm8,xmm1 > > + pxor xmm1,xmm0 > > + > > + movdqa xmm9,xmm8 > > + psrldq xmm8,8 > > + pslldq xmm9,8 > > + pxor xmm1,xmm8 > > + pxor xmm0,xmm9 > > + > > + movdqa xmm4,xmm0 > > + movdqa xmm3,xmm0 > > + psllq xmm0,5 > > + pxor xmm3,xmm0 > > + psllq xmm0,1 > > + pxor xmm0,xmm3 > > + psllq xmm0,57 > > + movdqa xmm3,xmm0 > > + pslldq xmm0,8 > > + psrldq xmm3,8 > > + pxor xmm0,xmm4 > > + pxor xmm1,xmm3 > > + > > + > > + movdqa xmm4,xmm0 > > + psrlq xmm0,1 > > + pxor xmm1,xmm4 > > + pxor xmm4,xmm0 > > + psrlq xmm0,5 > > + pxor xmm0,xmm4 > > + psrlq xmm0,1 > > + pxor xmm0,xmm1 > > + add r9,0x40 > > + jz NEAR $L$done > > + movdqu xmm7,XMMWORD[32+rdx] > > + sub r9,0x10 > > + jz NEAR $L$odd_tail > > +$L$skip4x: > > + > > + > > + > > + > > + > > + movdqu xmm8,XMMWORD[r8] > > 
+ movdqu xmm3,XMMWORD[16+r8] > > +DB 102,69,15,56,0,194 > > +DB 102,65,15,56,0,218 > > + pxor xmm0,xmm8 > > + > > + movdqa xmm5,xmm3 > > + pshufd xmm4,xmm3,78 > > + pxor xmm4,xmm3 > > +DB 102,15,58,68,218,0 > > +DB 102,15,58,68,234,17 > > +DB 102,15,58,68,231,0 > > + > > + lea r8,[32+r8] > > + nop > > + sub r9,0x20 > > + jbe NEAR $L$even_tail > > + nop > > + jmp NEAR $L$mod_loop > > + > > +ALIGN 32 > > +$L$mod_loop: > > + movdqa xmm1,xmm0 > > + movdqa xmm8,xmm4 > > + pshufd xmm4,xmm0,78 > > + pxor xmm4,xmm0 > > + > > +DB 102,15,58,68,198,0 > > +DB 102,15,58,68,206,17 > > +DB 102,15,58,68,231,16 > > + > > + pxor xmm0,xmm3 > > + pxor xmm1,xmm5 > > + movdqu xmm9,XMMWORD[r8] > > + pxor xmm8,xmm0 > > +DB 102,69,15,56,0,202 > > + movdqu xmm3,XMMWORD[16+r8] > > + > > + pxor xmm8,xmm1 > > + pxor xmm1,xmm9 > > + pxor xmm4,xmm8 > > +DB 102,65,15,56,0,218 > > + movdqa xmm8,xmm4 > > + psrldq xmm8,8 > > + pslldq xmm4,8 > > + pxor xmm1,xmm8 > > + pxor xmm0,xmm4 > > + > > + movdqa xmm5,xmm3 > > + > > + movdqa xmm9,xmm0 > > + movdqa xmm8,xmm0 > > + psllq xmm0,5 > > + pxor xmm8,xmm0 > > +DB 102,15,58,68,218,0 > > + psllq xmm0,1 > > + pxor xmm0,xmm8 > > + psllq xmm0,57 > > + movdqa xmm8,xmm0 > > + pslldq xmm0,8 > > + psrldq xmm8,8 > > + pxor xmm0,xmm9 > > + pshufd xmm4,xmm5,78 > > + pxor xmm1,xmm8 > > + pxor xmm4,xmm5 > > + > > + movdqa xmm9,xmm0 > > + psrlq xmm0,1 > > +DB 102,15,58,68,234,17 > > + pxor xmm1,xmm9 > > + pxor xmm9,xmm0 > > + psrlq xmm0,5 > > + pxor xmm0,xmm9 > > + lea r8,[32+r8] > > + psrlq xmm0,1 > > +DB 102,15,58,68,231,0 > > + pxor xmm0,xmm1 > > + > > + sub r9,0x20 > > + ja NEAR $L$mod_loop > > + > > +$L$even_tail: > > + movdqa xmm1,xmm0 > > + movdqa xmm8,xmm4 > > + pshufd xmm4,xmm0,78 > > + pxor xmm4,xmm0 > > + > > +DB 102,15,58,68,198,0 > > +DB 102,15,58,68,206,17 > > +DB 102,15,58,68,231,16 > > + > > + pxor xmm0,xmm3 > > + pxor xmm1,xmm5 > > + pxor xmm8,xmm0 > > + pxor xmm8,xmm1 > > + pxor xmm4,xmm8 > > + movdqa xmm8,xmm4 > > + psrldq xmm8,8 > > + pslldq xmm4,8 > > + pxor xmm1,xmm8 > > + pxor xmm0,xmm4 > > + > > + movdqa xmm4,xmm0 > > + movdqa xmm3,xmm0 > > + psllq xmm0,5 > > + pxor xmm3,xmm0 > > + psllq xmm0,1 > > + pxor xmm0,xmm3 > > + psllq xmm0,57 > > + movdqa xmm3,xmm0 > > + pslldq xmm0,8 > > + psrldq xmm3,8 > > + pxor xmm0,xmm4 > > + pxor xmm1,xmm3 > > + > > + > > + movdqa xmm4,xmm0 > > + psrlq xmm0,1 > > + pxor xmm1,xmm4 > > + pxor xmm4,xmm0 > > + psrlq xmm0,5 > > + pxor xmm0,xmm4 > > + psrlq xmm0,1 > > + pxor xmm0,xmm1 > > + test r9,r9 > > + jnz NEAR $L$done > > + > > +$L$odd_tail: > > + movdqu xmm8,XMMWORD[r8] > > +DB 102,69,15,56,0,194 > > + pxor xmm0,xmm8 > > + movdqa xmm1,xmm0 > > + pshufd xmm3,xmm0,78 > > + pxor xmm3,xmm0 > > +DB 102,15,58,68,194,0 > > +DB 102,15,58,68,202,17 > > +DB 102,15,58,68,223,0 > > + pxor xmm3,xmm0 > > + pxor xmm3,xmm1 > > + > > + movdqa xmm4,xmm3 > > + psrldq xmm3,8 > > + pslldq xmm4,8 > > + pxor xmm1,xmm3 > > + pxor xmm0,xmm4 > > + > > + movdqa xmm4,xmm0 > > + movdqa xmm3,xmm0 > > + psllq xmm0,5 > > + pxor xmm3,xmm0 > > + psllq xmm0,1 > > + pxor xmm0,xmm3 > > + psllq xmm0,57 > > + movdqa xmm3,xmm0 > > + pslldq xmm0,8 > > + psrldq xmm3,8 > > + pxor xmm0,xmm4 > > + pxor xmm1,xmm3 > > + > > + > > + movdqa xmm4,xmm0 > > + psrlq xmm0,1 > > + pxor xmm1,xmm4 > > + pxor xmm4,xmm0 > > + psrlq xmm0,5 > > + pxor xmm0,xmm4 > > + psrlq xmm0,1 > > + pxor xmm0,xmm1 > > +$L$done: > > +DB 102,65,15,56,0,194 > > + movdqu XMMWORD[rcx],xmm0 > > + movaps xmm6,XMMWORD[rsp] > > + movaps xmm7,XMMWORD[16+rsp] > > + movaps xmm8,XMMWORD[32+rsp] > > + movaps 
xmm9,XMMWORD[48+rsp] > > + movaps xmm10,XMMWORD[64+rsp] > > + movaps xmm11,XMMWORD[80+rsp] > > + movaps xmm12,XMMWORD[96+rsp] > > + movaps xmm13,XMMWORD[112+rsp] > > + movaps xmm14,XMMWORD[128+rsp] > > + movaps xmm15,XMMWORD[144+rsp] > > + lea rsp,[168+rsp] > > +$L$SEH_end_gcm_ghash_clmul: > > + DB 0F3h,0C3h ;repret > > + > > + > > +global gcm_init_avx > > + > > +ALIGN 32 > > +gcm_init_avx: > > + > > + jmp NEAR $L$_init_clmul > > + > > + > > +global gcm_gmult_avx > > + > > +ALIGN 32 > > +gcm_gmult_avx: > > + > > + jmp NEAR $L$_gmult_clmul > > + > > + > > +global gcm_ghash_avx > > + > > +ALIGN 32 > > +gcm_ghash_avx: > > + > > + jmp NEAR $L$_ghash_clmul > > + > > + > > +ALIGN 64 > > +$L$bswap_mask: > > +DB 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0 > > +$L$0x1c2_polynomial: > > +DB 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0xc2 > > +$L$7_mask: > > + DD 7,0,7,0 > > +$L$7_mask_poly: > > + DD 7,0,450,0 > > +ALIGN 64 > > + > > +$L$rem_4bit: > > + DD 0,0,0,471859200,0,943718400,0,610271232 > > + DD > 0,1887436800,0,1822425088,0,1220542464,0,1423966208 > > + DD > 0,3774873600,0,4246732800,0,3644850176,0,3311403008 > > + DD > 0,2441084928,0,2376073216,0,2847932416,0,3051356160 > > + > > +$L$rem_8bit: > > + DW > 0x0000,0x01C2,0x0384,0x0246,0x0708,0x06CA,0x048C,0x054E > > + DW > 0x0E10,0x0FD2,0x0D94,0x0C56,0x0918,0x08DA,0x0A9C,0x0B5E > > + DW > 0x1C20,0x1DE2,0x1FA4,0x1E66,0x1B28,0x1AEA,0x18AC,0x196E > > + DW > 0x1230,0x13F2,0x11B4,0x1076,0x1538,0x14FA,0x16BC,0x177E > > + DW > 0x3840,0x3982,0x3BC4,0x3A06,0x3F48,0x3E8A,0x3CCC,0x3D0E > > + DW > 0x3650,0x3792,0x35D4,0x3416,0x3158,0x309A,0x32DC,0x331E > > + DW > 0x2460,0x25A2,0x27E4,0x2626,0x2368,0x22AA,0x20EC,0x212E > > + DW > 0x2A70,0x2BB2,0x29F4,0x2836,0x2D78,0x2CBA,0x2EFC,0x2F3E > > + DW > 0x7080,0x7142,0x7304,0x72C6,0x7788,0x764A,0x740C,0x75CE > > + DW > 0x7E90,0x7F52,0x7D14,0x7CD6,0x7998,0x785A,0x7A1C,0x7BDE > > + DW > 0x6CA0,0x6D62,0x6F24,0x6EE6,0x6BA8,0x6A6A,0x682C,0x69EE > > + DW > 0x62B0,0x6372,0x6134,0x60F6,0x65B8,0x647A,0x663C,0x67FE > > + DW > 0x48C0,0x4902,0x4B44,0x4A86,0x4FC8,0x4E0A,0x4C4C,0x4D8E > > + DW > 0x46D0,0x4712,0x4554,0x4496,0x41D8,0x401A,0x425C,0x439E > > + DW > 0x54E0,0x5522,0x5764,0x56A6,0x53E8,0x522A,0x506C,0x51AE > > + DW > 0x5AF0,0x5B32,0x5974,0x58B6,0x5DF8,0x5C3A,0x5E7C,0x5FBE > > + DW > 0xE100,0xE0C2,0xE284,0xE346,0xE608,0xE7CA,0xE58C,0xE44E > > + DW > 0xEF10,0xEED2,0xEC94,0xED56,0xE818,0xE9DA,0xEB9C,0xEA5E > > + DW > 0xFD20,0xFCE2,0xFEA4,0xFF66,0xFA28,0xFBEA,0xF9AC,0xF86E > > + DW > 0xF330,0xF2F2,0xF0B4,0xF176,0xF438,0xF5FA,0xF7BC,0xF67E > > + DW > 0xD940,0xD882,0xDAC4,0xDB06,0xDE48,0xDF8A,0xDDCC,0xDC0E > > + DW > 0xD750,0xD692,0xD4D4,0xD516,0xD058,0xD19A,0xD3DC,0xD21E > > + DW > 0xC560,0xC4A2,0xC6E4,0xC726,0xC268,0xC3AA,0xC1EC,0xC02E > > + DW > 0xCB70,0xCAB2,0xC8F4,0xC936,0xCC78,0xCDBA,0xCFFC,0xCE3E > > + DW > 0x9180,0x9042,0x9204,0x93C6,0x9688,0x974A,0x950C,0x94CE > > + DW > 0x9F90,0x9E52,0x9C14,0x9DD6,0x9898,0x995A,0x9B1C,0x9ADE > > + DW > 0x8DA0,0x8C62,0x8E24,0x8FE6,0x8AA8,0x8B6A,0x892C,0x88EE > > + DW > 0x83B0,0x8272,0x8034,0x81F6,0x84B8,0x857A,0x873C,0x86FE > > + DW > 0xA9C0,0xA802,0xAA44,0xAB86,0xAEC8,0xAF0A,0xAD4C,0xAC8E > > + DW > 0xA7D0,0xA612,0xA454,0xA596,0xA0D8,0xA11A,0xA35C,0xA29E > > + DW > 0xB5E0,0xB422,0xB664,0xB7A6,0xB2E8,0xB32A,0xB16C,0xB0AE > > + DW > 0xBBF0,0xBA32,0xB874,0xB9B6,0xBCF8,0xBD3A,0xBF7C,0xBEBE > > + > > +DB 71,72,65,83,72,32,102,111,114,32,120,56,54,95,54,52 > > +DB 44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32 > > +DB 60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111 > > +DB 114,103,62,0 > > 
+ALIGN 64 > > +EXTERN __imp_RtlVirtualUnwind > > + > > +ALIGN 16 > > +se_handler: > > + push rsi > > + push rdi > > + push rbx > > + push rbp > > + push r12 > > + push r13 > > + push r14 > > + push r15 > > + pushfq > > + sub rsp,64 > > + > > + mov rax,QWORD[120+r8] > > + mov rbx,QWORD[248+r8] > > + > > + mov rsi,QWORD[8+r9] > > + mov r11,QWORD[56+r9] > > + > > + mov r10d,DWORD[r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jb NEAR $L$in_prologue > > + > > + mov rax,QWORD[152+r8] > > + > > + mov r10d,DWORD[4+r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jae NEAR $L$in_prologue > > + > > + lea rax,[((48+280))+rax] > > + > > + mov rbx,QWORD[((-8))+rax] > > + mov rbp,QWORD[((-16))+rax] > > + mov r12,QWORD[((-24))+rax] > > + mov r13,QWORD[((-32))+rax] > > + mov r14,QWORD[((-40))+rax] > > + mov r15,QWORD[((-48))+rax] > > + mov QWORD[144+r8],rbx > > + mov QWORD[160+r8],rbp > > + mov QWORD[216+r8],r12 > > + mov QWORD[224+r8],r13 > > + mov QWORD[232+r8],r14 > > + mov QWORD[240+r8],r15 > > + > > +$L$in_prologue: > > + mov rdi,QWORD[8+rax] > > + mov rsi,QWORD[16+rax] > > + mov QWORD[152+r8],rax > > + mov QWORD[168+r8],rsi > > + mov QWORD[176+r8],rdi > > + > > + mov rdi,QWORD[40+r9] > > + mov rsi,r8 > > + mov ecx,154 > > + DD 0xa548f3fc > > + > > + mov rsi,r9 > > + xor rcx,rcx > > + mov rdx,QWORD[8+rsi] > > + mov r8,QWORD[rsi] > > + mov r9,QWORD[16+rsi] > > + mov r10,QWORD[40+rsi] > > + lea r11,[56+rsi] > > + lea r12,[24+rsi] > > + mov QWORD[32+rsp],r10 > > + mov QWORD[40+rsp],r11 > > + mov QWORD[48+rsp],r12 > > + mov QWORD[56+rsp],rcx > > + call QWORD[__imp_RtlVirtualUnwind] > > + > > + mov eax,1 > > + add rsp,64 > > + popfq > > + pop r15 > > + pop r14 > > + pop r13 > > + pop r12 > > + pop rbp > > + pop rbx > > + pop rdi > > + pop rsi > > + DB 0F3h,0C3h ;repret > > + > > + > > +section .pdata rdata align=4 > > +ALIGN 4 > > + DD $L$SEH_begin_gcm_gmult_4bit wrt ..imagebase > > + DD $L$SEH_end_gcm_gmult_4bit wrt ..imagebase > > + DD $L$SEH_info_gcm_gmult_4bit wrt ..imagebase > > + > > + DD $L$SEH_begin_gcm_ghash_4bit wrt ..imagebase > > + DD $L$SEH_end_gcm_ghash_4bit wrt ..imagebase > > + DD $L$SEH_info_gcm_ghash_4bit wrt ..imagebase > > + > > + DD $L$SEH_begin_gcm_init_clmul wrt ..imagebase > > + DD $L$SEH_end_gcm_init_clmul wrt ..imagebase > > + DD $L$SEH_info_gcm_init_clmul wrt ..imagebase > > + > > + DD $L$SEH_begin_gcm_ghash_clmul wrt ..imagebase > > + DD $L$SEH_end_gcm_ghash_clmul wrt ..imagebase > > + DD $L$SEH_info_gcm_ghash_clmul wrt ..imagebase > > +section .xdata rdata align=8 > > +ALIGN 8 > > +$L$SEH_info_gcm_gmult_4bit: > > +DB 9,0,0,0 > > + DD se_handler wrt ..imagebase > > + DD $L$gmult_prologue > wrt ..imagebase,$L$gmult_epilogue > > wrt ..imagebase > > +$L$SEH_info_gcm_ghash_4bit: > > +DB 9,0,0,0 > > + DD se_handler wrt ..imagebase > > + DD $L$ghash_prologue > wrt ..imagebase,$L$ghash_epilogue > > wrt ..imagebase > > +$L$SEH_info_gcm_init_clmul: > > +DB 0x01,0x08,0x03,0x00 > > +DB 0x08,0x68,0x00,0x00 > > +DB 0x04,0x22,0x00,0x00 > > +$L$SEH_info_gcm_ghash_clmul: > > +DB 0x01,0x33,0x16,0x00 > > +DB 0x33,0xf8,0x09,0x00 > > +DB 0x2e,0xe8,0x08,0x00 > > +DB 0x29,0xd8,0x07,0x00 > > +DB 0x24,0xc8,0x06,0x00 > > +DB 0x1f,0xb8,0x05,0x00 > > +DB 0x1a,0xa8,0x04,0x00 > > +DB 0x15,0x98,0x03,0x00 > > +DB 0x10,0x88,0x02,0x00 > > +DB 0x0c,0x78,0x01,0x00 > > +DB 0x08,0x68,0x00,0x00 > > +DB 0x04,0x01,0x15,0x00 > > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb- > > x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb- > > x86_64.nasm > > new file mode 100644 
> > index 0000000000..f3b7b0e35e > > --- /dev/null > > +++ > b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm > > @@ -0,0 +1,3137 @@ > > +; WARNING: do not edit! > > +; Generated from openssl/crypto/sha/asm/sha1-mb-x86_64.pl > > +; > > +; Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. > > +; > > +; Licensed under the OpenSSL license (the "License"). You may not use > > +; this file except in compliance with the License. You can obtain a copy > > +; in the file LICENSE in the source distribution or at > > +; https://www.openssl.org/source/license.html > > + > > +default rel > > +%define XMMWORD > > +%define YMMWORD > > +%define ZMMWORD > > +section .text code align=64 > > + > > + > > +EXTERN OPENSSL_ia32cap_P > > + > > +global sha1_multi_block > > + > > +ALIGN 32 > > +sha1_multi_block: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_sha1_multi_block: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + > > + > > + > > + mov rcx,QWORD[((OPENSSL_ia32cap_P+4))] > > + bt rcx,61 > > + jc NEAR _shaext_shortcut > > + mov rax,rsp > > + > > + push rbx > > + > > + push rbp > > + > > + lea rsp,[((-168))+rsp] > > + movaps XMMWORD[rsp],xmm6 > > + movaps XMMWORD[16+rsp],xmm7 > > + movaps XMMWORD[32+rsp],xmm8 > > + movaps XMMWORD[48+rsp],xmm9 > > + movaps XMMWORD[(-120)+rax],xmm10 > > + movaps XMMWORD[(-104)+rax],xmm11 > > + movaps XMMWORD[(-88)+rax],xmm12 > > + movaps XMMWORD[(-72)+rax],xmm13 > > + movaps XMMWORD[(-56)+rax],xmm14 > > + movaps XMMWORD[(-40)+rax],xmm15 > > + sub rsp,288 > > + and rsp,-256 > > + mov QWORD[272+rsp],rax > > + > > +$L$body: > > + lea rbp,[K_XX_XX] > > + lea rbx,[256+rsp] > > + > > +$L$oop_grande: > > + mov DWORD[280+rsp],edx > > + xor edx,edx > > + mov r8,QWORD[rsi] > > + mov ecx,DWORD[8+rsi] > > + cmp ecx,edx > > + cmovg edx,ecx > > + test ecx,ecx > > + mov DWORD[rbx],ecx > > + cmovle r8,rbp > > + mov r9,QWORD[16+rsi] > > + mov ecx,DWORD[24+rsi] > > + cmp ecx,edx > > + cmovg edx,ecx > > + test ecx,ecx > > + mov DWORD[4+rbx],ecx > > + cmovle r9,rbp > > + mov r10,QWORD[32+rsi] > > + mov ecx,DWORD[40+rsi] > > + cmp ecx,edx > > + cmovg edx,ecx > > + test ecx,ecx > > + mov DWORD[8+rbx],ecx > > + cmovle r10,rbp > > + mov r11,QWORD[48+rsi] > > + mov ecx,DWORD[56+rsi] > > + cmp ecx,edx > > + cmovg edx,ecx > > + test ecx,ecx > > + mov DWORD[12+rbx],ecx > > + cmovle r11,rbp > > + test edx,edx > > + jz NEAR $L$done > > + > > + movdqu xmm10,XMMWORD[rdi] > > + lea rax,[128+rsp] > > + movdqu xmm11,XMMWORD[32+rdi] > > + movdqu xmm12,XMMWORD[64+rdi] > > + movdqu xmm13,XMMWORD[96+rdi] > > + movdqu xmm14,XMMWORD[128+rdi] > > + movdqa xmm5,XMMWORD[96+rbp] > > + movdqa xmm15,XMMWORD[((-32))+rbp] > > + jmp NEAR $L$oop > > + > > +ALIGN 32 > > +$L$oop: > > + movd xmm0,DWORD[r8] > > + lea r8,[64+r8] > > + movd xmm2,DWORD[r9] > > + lea r9,[64+r9] > > + movd xmm3,DWORD[r10] > > + lea r10,[64+r10] > > + movd xmm4,DWORD[r11] > > + lea r11,[64+r11] > > + punpckldq xmm0,xmm3 > > + movd xmm1,DWORD[((-60))+r8] > > + punpckldq xmm2,xmm4 > > + movd xmm9,DWORD[((-60))+r9] > > + punpckldq xmm0,xmm2 > > + movd xmm8,DWORD[((-60))+r10] > > +DB 102,15,56,0,197 > > + movd xmm7,DWORD[((-60))+r11] > > + punpckldq xmm1,xmm8 > > + movdqa xmm8,xmm10 > > + paddd xmm14,xmm15 > > + punpckldq xmm9,xmm7 > > + movdqa xmm7,xmm11 > > + movdqa xmm6,xmm11 > > + pslld xmm8,5 > > + pandn xmm7,xmm13 > > + pand xmm6,xmm12 > > + punpckldq xmm1,xmm9 > > + movdqa xmm9,xmm10 > > + > > + movdqa XMMWORD[(0-128)+rax],xmm0 > > + paddd 
xmm14,xmm0 > > + movd xmm2,DWORD[((-56))+r8] > > + psrld xmm9,27 > > + pxor xmm6,xmm7 > > + movdqa xmm7,xmm11 > > + > > + por xmm8,xmm9 > > + movd xmm9,DWORD[((-56))+r9] > > + pslld xmm7,30 > > + paddd xmm14,xmm6 > > + > > + psrld xmm11,2 > > + paddd xmm14,xmm8 > > +DB 102,15,56,0,205 > > + movd xmm8,DWORD[((-56))+r10] > > + por xmm11,xmm7 > > + movd xmm7,DWORD[((-56))+r11] > > + punpckldq xmm2,xmm8 > > + movdqa xmm8,xmm14 > > + paddd xmm13,xmm15 > > + punpckldq xmm9,xmm7 > > + movdqa xmm7,xmm10 > > + movdqa xmm6,xmm10 > > + pslld xmm8,5 > > + pandn xmm7,xmm12 > > + pand xmm6,xmm11 > > + punpckldq xmm2,xmm9 > > + movdqa xmm9,xmm14 > > + > > + movdqa XMMWORD[(16-128)+rax],xmm1 > > + paddd xmm13,xmm1 > > + movd xmm3,DWORD[((-52))+r8] > > + psrld xmm9,27 > > + pxor xmm6,xmm7 > > + movdqa xmm7,xmm10 > > + > > + por xmm8,xmm9 > > + movd xmm9,DWORD[((-52))+r9] > > + pslld xmm7,30 > > + paddd xmm13,xmm6 > > + > > + psrld xmm10,2 > > + paddd xmm13,xmm8 > > +DB 102,15,56,0,213 > > + movd xmm8,DWORD[((-52))+r10] > > + por xmm10,xmm7 > > + movd xmm7,DWORD[((-52))+r11] > > + punpckldq xmm3,xmm8 > > + movdqa xmm8,xmm13 > > + paddd xmm12,xmm15 > > + punpckldq xmm9,xmm7 > > + movdqa xmm7,xmm14 > > + movdqa xmm6,xmm14 > > + pslld xmm8,5 > > + pandn xmm7,xmm11 > > + pand xmm6,xmm10 > > + punpckldq xmm3,xmm9 > > + movdqa xmm9,xmm13 > > + > > + movdqa XMMWORD[(32-128)+rax],xmm2 > > + paddd xmm12,xmm2 > > + movd xmm4,DWORD[((-48))+r8] > > + psrld xmm9,27 > > + pxor xmm6,xmm7 > > + movdqa xmm7,xmm14 > > + > > + por xmm8,xmm9 > > + movd xmm9,DWORD[((-48))+r9] > > + pslld xmm7,30 > > + paddd xmm12,xmm6 > > + > > + psrld xmm14,2 > > + paddd xmm12,xmm8 > > +DB 102,15,56,0,221 > > + movd xmm8,DWORD[((-48))+r10] > > + por xmm14,xmm7 > > + movd xmm7,DWORD[((-48))+r11] > > + punpckldq xmm4,xmm8 > > + movdqa xmm8,xmm12 > > + paddd xmm11,xmm15 > > + punpckldq xmm9,xmm7 > > + movdqa xmm7,xmm13 > > + movdqa xmm6,xmm13 > > + pslld xmm8,5 > > + pandn xmm7,xmm10 > > + pand xmm6,xmm14 > > + punpckldq xmm4,xmm9 > > + movdqa xmm9,xmm12 > > + > > + movdqa XMMWORD[(48-128)+rax],xmm3 > > + paddd xmm11,xmm3 > > + movd xmm0,DWORD[((-44))+r8] > > + psrld xmm9,27 > > + pxor xmm6,xmm7 > > + movdqa xmm7,xmm13 > > + > > + por xmm8,xmm9 > > + movd xmm9,DWORD[((-44))+r9] > > + pslld xmm7,30 > > + paddd xmm11,xmm6 > > + > > + psrld xmm13,2 > > + paddd xmm11,xmm8 > > +DB 102,15,56,0,229 > > + movd xmm8,DWORD[((-44))+r10] > > + por xmm13,xmm7 > > + movd xmm7,DWORD[((-44))+r11] > > + punpckldq xmm0,xmm8 > > + movdqa xmm8,xmm11 > > + paddd xmm10,xmm15 > > + punpckldq xmm9,xmm7 > > + movdqa xmm7,xmm12 > > + movdqa xmm6,xmm12 > > + pslld xmm8,5 > > + pandn xmm7,xmm14 > > + pand xmm6,xmm13 > > + punpckldq xmm0,xmm9 > > + movdqa xmm9,xmm11 > > + > > + movdqa XMMWORD[(64-128)+rax],xmm4 > > + paddd xmm10,xmm4 > > + movd xmm1,DWORD[((-40))+r8] > > + psrld xmm9,27 > > + pxor xmm6,xmm7 > > + movdqa xmm7,xmm12 > > + > > + por xmm8,xmm9 > > + movd xmm9,DWORD[((-40))+r9] > > + pslld xmm7,30 > > + paddd xmm10,xmm6 > > + > > + psrld xmm12,2 > > + paddd xmm10,xmm8 > > +DB 102,15,56,0,197 > > + movd xmm8,DWORD[((-40))+r10] > > + por xmm12,xmm7 > > + movd xmm7,DWORD[((-40))+r11] > > + punpckldq xmm1,xmm8 > > + movdqa xmm8,xmm10 > > + paddd xmm14,xmm15 > > + punpckldq xmm9,xmm7 > > + movdqa xmm7,xmm11 > > + movdqa xmm6,xmm11 > > + pslld xmm8,5 > > + pandn xmm7,xmm13 > > + pand xmm6,xmm12 > > + punpckldq xmm1,xmm9 > > + movdqa xmm9,xmm10 > > + > > + movdqa XMMWORD[(80-128)+rax],xmm0 > > + paddd xmm14,xmm0 > > + movd xmm2,DWORD[((-36))+r8] > > + psrld xmm9,27 > > 
+ pxor xmm6,xmm7 > > + movdqa xmm7,xmm11 > > + > > + por xmm8,xmm9 > > + movd xmm9,DWORD[((-36))+r9] > > + pslld xmm7,30 > > + paddd xmm14,xmm6 > > + > > + psrld xmm11,2 > > + paddd xmm14,xmm8 > > +DB 102,15,56,0,205 > > + movd xmm8,DWORD[((-36))+r10] > > + por xmm11,xmm7 > > + movd xmm7,DWORD[((-36))+r11] > > + punpckldq xmm2,xmm8 > > + movdqa xmm8,xmm14 > > + paddd xmm13,xmm15 > > + punpckldq xmm9,xmm7 > > + movdqa xmm7,xmm10 > > + movdqa xmm6,xmm10 > > + pslld xmm8,5 > > + pandn xmm7,xmm12 > > + pand xmm6,xmm11 > > + punpckldq xmm2,xmm9 > > + movdqa xmm9,xmm14 > > + > > + movdqa XMMWORD[(96-128)+rax],xmm1 > > + paddd xmm13,xmm1 > > + movd xmm3,DWORD[((-32))+r8] > > + psrld xmm9,27 > > + pxor xmm6,xmm7 > > + movdqa xmm7,xmm10 > > + > > + por xmm8,xmm9 > > + movd xmm9,DWORD[((-32))+r9] > > + pslld xmm7,30 > > + paddd xmm13,xmm6 > > + > > + psrld xmm10,2 > > + paddd xmm13,xmm8 > > +DB 102,15,56,0,213 > > + movd xmm8,DWORD[((-32))+r10] > > + por xmm10,xmm7 > > + movd xmm7,DWORD[((-32))+r11] > > + punpckldq xmm3,xmm8 > > + movdqa xmm8,xmm13 > > + paddd xmm12,xmm15 > > + punpckldq xmm9,xmm7 > > + movdqa xmm7,xmm14 > > + movdqa xmm6,xmm14 > > + pslld xmm8,5 > > + pandn xmm7,xmm11 > > + pand xmm6,xmm10 > > + punpckldq xmm3,xmm9 > > + movdqa xmm9,xmm13 > > + > > + movdqa XMMWORD[(112-128)+rax],xmm2 > > + paddd xmm12,xmm2 > > + movd xmm4,DWORD[((-28))+r8] > > + psrld xmm9,27 > > + pxor xmm6,xmm7 > > + movdqa xmm7,xmm14 > > + > > + por xmm8,xmm9 > > + movd xmm9,DWORD[((-28))+r9] > > + pslld xmm7,30 > > + paddd xmm12,xmm6 > > + > > + psrld xmm14,2 > > + paddd xmm12,xmm8 > > +DB 102,15,56,0,221 > > + movd xmm8,DWORD[((-28))+r10] > > + por xmm14,xmm7 > > + movd xmm7,DWORD[((-28))+r11] > > + punpckldq xmm4,xmm8 > > + movdqa xmm8,xmm12 > > + paddd xmm11,xmm15 > > + punpckldq xmm9,xmm7 > > + movdqa xmm7,xmm13 > > + movdqa xmm6,xmm13 > > + pslld xmm8,5 > > + pandn xmm7,xmm10 > > + pand xmm6,xmm14 > > + punpckldq xmm4,xmm9 > > + movdqa xmm9,xmm12 > > + > > + movdqa XMMWORD[(128-128)+rax],xmm3 > > + paddd xmm11,xmm3 > > + movd xmm0,DWORD[((-24))+r8] > > + psrld xmm9,27 > > + pxor xmm6,xmm7 > > + movdqa xmm7,xmm13 > > + > > + por xmm8,xmm9 > > + movd xmm9,DWORD[((-24))+r9] > > + pslld xmm7,30 > > + paddd xmm11,xmm6 > > + > > + psrld xmm13,2 > > + paddd xmm11,xmm8 > > +DB 102,15,56,0,229 > > + movd xmm8,DWORD[((-24))+r10] > > + por xmm13,xmm7 > > + movd xmm7,DWORD[((-24))+r11] > > + punpckldq xmm0,xmm8 > > + movdqa xmm8,xmm11 > > + paddd xmm10,xmm15 > > + punpckldq xmm9,xmm7 > > + movdqa xmm7,xmm12 > > + movdqa xmm6,xmm12 > > + pslld xmm8,5 > > + pandn xmm7,xmm14 > > + pand xmm6,xmm13 > > + punpckldq xmm0,xmm9 > > + movdqa xmm9,xmm11 > > + > > + movdqa XMMWORD[(144-128)+rax],xmm4 > > + paddd xmm10,xmm4 > > + movd xmm1,DWORD[((-20))+r8] > > + psrld xmm9,27 > > + pxor xmm6,xmm7 > > + movdqa xmm7,xmm12 > > + > > + por xmm8,xmm9 > > + movd xmm9,DWORD[((-20))+r9] > > + pslld xmm7,30 > > + paddd xmm10,xmm6 > > + > > + psrld xmm12,2 > > + paddd xmm10,xmm8 > > +DB 102,15,56,0,197 > > + movd xmm8,DWORD[((-20))+r10] > > + por xmm12,xmm7 > > + movd xmm7,DWORD[((-20))+r11] > > + punpckldq xmm1,xmm8 > > + movdqa xmm8,xmm10 > > + paddd xmm14,xmm15 > > + punpckldq xmm9,xmm7 > > + movdqa xmm7,xmm11 > > + movdqa xmm6,xmm11 > > + pslld xmm8,5 > > + pandn xmm7,xmm13 > > + pand xmm6,xmm12 > > + punpckldq xmm1,xmm9 > > + movdqa xmm9,xmm10 > > + > > + movdqa XMMWORD[(160-128)+rax],xmm0 > > + paddd xmm14,xmm0 > > + movd xmm2,DWORD[((-16))+r8] > > + psrld xmm9,27 > > + pxor xmm6,xmm7 > > + movdqa xmm7,xmm11 > > + > > + por 
xmm8,xmm9 > > + movd xmm9,DWORD[((-16))+r9] > > + pslld xmm7,30 > > + paddd xmm14,xmm6 > > + > > + psrld xmm11,2 > > + paddd xmm14,xmm8 > > +DB 102,15,56,0,205 > > + movd xmm8,DWORD[((-16))+r10] > > + por xmm11,xmm7 > > + movd xmm7,DWORD[((-16))+r11] > > + punpckldq xmm2,xmm8 > > + movdqa xmm8,xmm14 > > + paddd xmm13,xmm15 > > + punpckldq xmm9,xmm7 > > + movdqa xmm7,xmm10 > > + movdqa xmm6,xmm10 > > + pslld xmm8,5 > > + pandn xmm7,xmm12 > > + pand xmm6,xmm11 > > + punpckldq xmm2,xmm9 > > + movdqa xmm9,xmm14 > > + > > + movdqa XMMWORD[(176-128)+rax],xmm1 > > + paddd xmm13,xmm1 > > + movd xmm3,DWORD[((-12))+r8] > > + psrld xmm9,27 > > + pxor xmm6,xmm7 > > + movdqa xmm7,xmm10 > > + > > + por xmm8,xmm9 > > + movd xmm9,DWORD[((-12))+r9] > > + pslld xmm7,30 > > + paddd xmm13,xmm6 > > + > > + psrld xmm10,2 > > + paddd xmm13,xmm8 > > +DB 102,15,56,0,213 > > + movd xmm8,DWORD[((-12))+r10] > > + por xmm10,xmm7 > > + movd xmm7,DWORD[((-12))+r11] > > + punpckldq xmm3,xmm8 > > + movdqa xmm8,xmm13 > > + paddd xmm12,xmm15 > > + punpckldq xmm9,xmm7 > > + movdqa xmm7,xmm14 > > + movdqa xmm6,xmm14 > > + pslld xmm8,5 > > + pandn xmm7,xmm11 > > + pand xmm6,xmm10 > > + punpckldq xmm3,xmm9 > > + movdqa xmm9,xmm13 > > + > > + movdqa XMMWORD[(192-128)+rax],xmm2 > > + paddd xmm12,xmm2 > > + movd xmm4,DWORD[((-8))+r8] > > + psrld xmm9,27 > > + pxor xmm6,xmm7 > > + movdqa xmm7,xmm14 > > + > > + por xmm8,xmm9 > > + movd xmm9,DWORD[((-8))+r9] > > + pslld xmm7,30 > > + paddd xmm12,xmm6 > > + > > + psrld xmm14,2 > > + paddd xmm12,xmm8 > > +DB 102,15,56,0,221 > > + movd xmm8,DWORD[((-8))+r10] > > + por xmm14,xmm7 > > + movd xmm7,DWORD[((-8))+r11] > > + punpckldq xmm4,xmm8 > > + movdqa xmm8,xmm12 > > + paddd xmm11,xmm15 > > + punpckldq xmm9,xmm7 > > + movdqa xmm7,xmm13 > > + movdqa xmm6,xmm13 > > + pslld xmm8,5 > > + pandn xmm7,xmm10 > > + pand xmm6,xmm14 > > + punpckldq xmm4,xmm9 > > + movdqa xmm9,xmm12 > > + > > + movdqa XMMWORD[(208-128)+rax],xmm3 > > + paddd xmm11,xmm3 > > + movd xmm0,DWORD[((-4))+r8] > > + psrld xmm9,27 > > + pxor xmm6,xmm7 > > + movdqa xmm7,xmm13 > > + > > + por xmm8,xmm9 > > + movd xmm9,DWORD[((-4))+r9] > > + pslld xmm7,30 > > + paddd xmm11,xmm6 > > + > > + psrld xmm13,2 > > + paddd xmm11,xmm8 > > +DB 102,15,56,0,229 > > + movd xmm8,DWORD[((-4))+r10] > > + por xmm13,xmm7 > > + movdqa xmm1,XMMWORD[((0-128))+rax] > > + movd xmm7,DWORD[((-4))+r11] > > + punpckldq xmm0,xmm8 > > + movdqa xmm8,xmm11 > > + paddd xmm10,xmm15 > > + punpckldq xmm9,xmm7 > > + movdqa xmm7,xmm12 > > + movdqa xmm6,xmm12 > > + pslld xmm8,5 > > + prefetcht0 [63+r8] > > + pandn xmm7,xmm14 > > + pand xmm6,xmm13 > > + punpckldq xmm0,xmm9 > > + movdqa xmm9,xmm11 > > + > > + movdqa XMMWORD[(224-128)+rax],xmm4 > > + paddd xmm10,xmm4 > > + psrld xmm9,27 > > + pxor xmm6,xmm7 > > + movdqa xmm7,xmm12 > > + prefetcht0 [63+r9] > > + > > + por xmm8,xmm9 > > + pslld xmm7,30 > > + paddd xmm10,xmm6 > > + prefetcht0 [63+r10] > > + > > + psrld xmm12,2 > > + paddd xmm10,xmm8 > > +DB 102,15,56,0,197 > > + prefetcht0 [63+r11] > > + por xmm12,xmm7 > > + movdqa xmm2,XMMWORD[((16-128))+rax] > > + pxor xmm1,xmm3 > > + movdqa xmm3,XMMWORD[((32-128))+rax] > > + > > + movdqa xmm8,xmm10 > > + pxor xmm1,XMMWORD[((128-128))+rax] > > + paddd xmm14,xmm15 > > + movdqa xmm7,xmm11 > > + pslld xmm8,5 > > + pxor xmm1,xmm3 > > + movdqa xmm6,xmm11 > > + pandn xmm7,xmm13 > > + movdqa xmm5,xmm1 > > + pand xmm6,xmm12 > > + movdqa xmm9,xmm10 > > + psrld xmm5,31 > > + paddd xmm1,xmm1 > > + > > + movdqa XMMWORD[(240-128)+rax],xmm0 > > + paddd xmm14,xmm0 > > + psrld xmm9,27 > 
> + pxor xmm6,xmm7 > > + > > + movdqa xmm7,xmm11 > > + por xmm8,xmm9 > > + pslld xmm7,30 > > + paddd xmm14,xmm6 > > + > > + psrld xmm11,2 > > + paddd xmm14,xmm8 > > + por xmm1,xmm5 > > + por xmm11,xmm7 > > + pxor xmm2,xmm4 > > + movdqa xmm4,XMMWORD[((48-128))+rax] > > + > > + movdqa xmm8,xmm14 > > + pxor xmm2,XMMWORD[((144-128))+rax] > > + paddd xmm13,xmm15 > > + movdqa xmm7,xmm10 > > + pslld xmm8,5 > > + pxor xmm2,xmm4 > > + movdqa xmm6,xmm10 > > + pandn xmm7,xmm12 > > + movdqa xmm5,xmm2 > > + pand xmm6,xmm11 > > + movdqa xmm9,xmm14 > > + psrld xmm5,31 > > + paddd xmm2,xmm2 > > + > > + movdqa XMMWORD[(0-128)+rax],xmm1 > > + paddd xmm13,xmm1 > > + psrld xmm9,27 > > + pxor xmm6,xmm7 > > + > > + movdqa xmm7,xmm10 > > + por xmm8,xmm9 > > + pslld xmm7,30 > > + paddd xmm13,xmm6 > > + > > + psrld xmm10,2 > > + paddd xmm13,xmm8 > > + por xmm2,xmm5 > > + por xmm10,xmm7 > > + pxor xmm3,xmm0 > > + movdqa xmm0,XMMWORD[((64-128))+rax] > > + > > + movdqa xmm8,xmm13 > > + pxor xmm3,XMMWORD[((160-128))+rax] > > + paddd xmm12,xmm15 > > + movdqa xmm7,xmm14 > > + pslld xmm8,5 > > + pxor xmm3,xmm0 > > + movdqa xmm6,xmm14 > > + pandn xmm7,xmm11 > > + movdqa xmm5,xmm3 > > + pand xmm6,xmm10 > > + movdqa xmm9,xmm13 > > + psrld xmm5,31 > > + paddd xmm3,xmm3 > > + > > + movdqa XMMWORD[(16-128)+rax],xmm2 > > + paddd xmm12,xmm2 > > + psrld xmm9,27 > > + pxor xmm6,xmm7 > > + > > + movdqa xmm7,xmm14 > > + por xmm8,xmm9 > > + pslld xmm7,30 > > + paddd xmm12,xmm6 > > + > > + psrld xmm14,2 > > + paddd xmm12,xmm8 > > + por xmm3,xmm5 > > + por xmm14,xmm7 > > + pxor xmm4,xmm1 > > + movdqa xmm1,XMMWORD[((80-128))+rax] > > + > > + movdqa xmm8,xmm12 > > + pxor xmm4,XMMWORD[((176-128))+rax] > > + paddd xmm11,xmm15 > > + movdqa xmm7,xmm13 > > + pslld xmm8,5 > > + pxor xmm4,xmm1 > > + movdqa xmm6,xmm13 > > + pandn xmm7,xmm10 > > + movdqa xmm5,xmm4 > > + pand xmm6,xmm14 > > + movdqa xmm9,xmm12 > > + psrld xmm5,31 > > + paddd xmm4,xmm4 > > + > > + movdqa XMMWORD[(32-128)+rax],xmm3 > > + paddd xmm11,xmm3 > > + psrld xmm9,27 > > + pxor xmm6,xmm7 > > + > > + movdqa xmm7,xmm13 > > + por xmm8,xmm9 > > + pslld xmm7,30 > > + paddd xmm11,xmm6 > > + > > + psrld xmm13,2 > > + paddd xmm11,xmm8 > > + por xmm4,xmm5 > > + por xmm13,xmm7 > > + pxor xmm0,xmm2 > > + movdqa xmm2,XMMWORD[((96-128))+rax] > > + > > + movdqa xmm8,xmm11 > > + pxor xmm0,XMMWORD[((192-128))+rax] > > + paddd xmm10,xmm15 > > + movdqa xmm7,xmm12 > > + pslld xmm8,5 > > + pxor xmm0,xmm2 > > + movdqa xmm6,xmm12 > > + pandn xmm7,xmm14 > > + movdqa xmm5,xmm0 > > + pand xmm6,xmm13 > > + movdqa xmm9,xmm11 > > + psrld xmm5,31 > > + paddd xmm0,xmm0 > > + > > + movdqa XMMWORD[(48-128)+rax],xmm4 > > + paddd xmm10,xmm4 > > + psrld xmm9,27 > > + pxor xmm6,xmm7 > > + > > + movdqa xmm7,xmm12 > > + por xmm8,xmm9 > > + pslld xmm7,30 > > + paddd xmm10,xmm6 > > + > > + psrld xmm12,2 > > + paddd xmm10,xmm8 > > + por xmm0,xmm5 > > + por xmm12,xmm7 > > + movdqa xmm15,XMMWORD[rbp] > > + pxor xmm1,xmm3 > > + movdqa xmm3,XMMWORD[((112-128))+rax] > > + > > + movdqa xmm8,xmm10 > > + movdqa xmm6,xmm13 > > + pxor xmm1,XMMWORD[((208-128))+rax] > > + paddd xmm14,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm11 > > + > > + movdqa xmm9,xmm10 > > + movdqa XMMWORD[(64-128)+rax],xmm0 > > + paddd xmm14,xmm0 > > + pxor xmm1,xmm3 > > + psrld xmm9,27 > > + pxor xmm6,xmm12 > > + movdqa xmm7,xmm11 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm1 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm14,xmm6 > > + paddd xmm1,xmm1 > > + > > + psrld xmm11,2 > > + paddd xmm14,xmm8 > > + por xmm1,xmm5 > > + por xmm11,xmm7 > 
> + pxor xmm2,xmm4 > > + movdqa xmm4,XMMWORD[((128-128))+rax] > > + > > + movdqa xmm8,xmm14 > > + movdqa xmm6,xmm12 > > + pxor xmm2,XMMWORD[((224-128))+rax] > > + paddd xmm13,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm10 > > + > > + movdqa xmm9,xmm14 > > + movdqa XMMWORD[(80-128)+rax],xmm1 > > + paddd xmm13,xmm1 > > + pxor xmm2,xmm4 > > + psrld xmm9,27 > > + pxor xmm6,xmm11 > > + movdqa xmm7,xmm10 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm2 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm13,xmm6 > > + paddd xmm2,xmm2 > > + > > + psrld xmm10,2 > > + paddd xmm13,xmm8 > > + por xmm2,xmm5 > > + por xmm10,xmm7 > > + pxor xmm3,xmm0 > > + movdqa xmm0,XMMWORD[((144-128))+rax] > > + > > + movdqa xmm8,xmm13 > > + movdqa xmm6,xmm11 > > + pxor xmm3,XMMWORD[((240-128))+rax] > > + paddd xmm12,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm14 > > + > > + movdqa xmm9,xmm13 > > + movdqa XMMWORD[(96-128)+rax],xmm2 > > + paddd xmm12,xmm2 > > + pxor xmm3,xmm0 > > + psrld xmm9,27 > > + pxor xmm6,xmm10 > > + movdqa xmm7,xmm14 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm3 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm12,xmm6 > > + paddd xmm3,xmm3 > > + > > + psrld xmm14,2 > > + paddd xmm12,xmm8 > > + por xmm3,xmm5 > > + por xmm14,xmm7 > > + pxor xmm4,xmm1 > > + movdqa xmm1,XMMWORD[((160-128))+rax] > > + > > + movdqa xmm8,xmm12 > > + movdqa xmm6,xmm10 > > + pxor xmm4,XMMWORD[((0-128))+rax] > > + paddd xmm11,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm13 > > + > > + movdqa xmm9,xmm12 > > + movdqa XMMWORD[(112-128)+rax],xmm3 > > + paddd xmm11,xmm3 > > + pxor xmm4,xmm1 > > + psrld xmm9,27 > > + pxor xmm6,xmm14 > > + movdqa xmm7,xmm13 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm4 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm11,xmm6 > > + paddd xmm4,xmm4 > > + > > + psrld xmm13,2 > > + paddd xmm11,xmm8 > > + por xmm4,xmm5 > > + por xmm13,xmm7 > > + pxor xmm0,xmm2 > > + movdqa xmm2,XMMWORD[((176-128))+rax] > > + > > + movdqa xmm8,xmm11 > > + movdqa xmm6,xmm14 > > + pxor xmm0,XMMWORD[((16-128))+rax] > > + paddd xmm10,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm12 > > + > > + movdqa xmm9,xmm11 > > + movdqa XMMWORD[(128-128)+rax],xmm4 > > + paddd xmm10,xmm4 > > + pxor xmm0,xmm2 > > + psrld xmm9,27 > > + pxor xmm6,xmm13 > > + movdqa xmm7,xmm12 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm0 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm10,xmm6 > > + paddd xmm0,xmm0 > > + > > + psrld xmm12,2 > > + paddd xmm10,xmm8 > > + por xmm0,xmm5 > > + por xmm12,xmm7 > > + pxor xmm1,xmm3 > > + movdqa xmm3,XMMWORD[((192-128))+rax] > > + > > + movdqa xmm8,xmm10 > > + movdqa xmm6,xmm13 > > + pxor xmm1,XMMWORD[((32-128))+rax] > > + paddd xmm14,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm11 > > + > > + movdqa xmm9,xmm10 > > + movdqa XMMWORD[(144-128)+rax],xmm0 > > + paddd xmm14,xmm0 > > + pxor xmm1,xmm3 > > + psrld xmm9,27 > > + pxor xmm6,xmm12 > > + movdqa xmm7,xmm11 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm1 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm14,xmm6 > > + paddd xmm1,xmm1 > > + > > + psrld xmm11,2 > > + paddd xmm14,xmm8 > > + por xmm1,xmm5 > > + por xmm11,xmm7 > > + pxor xmm2,xmm4 > > + movdqa xmm4,XMMWORD[((208-128))+rax] > > + > > + movdqa xmm8,xmm14 > > + movdqa xmm6,xmm12 > > + pxor xmm2,XMMWORD[((48-128))+rax] > > + paddd xmm13,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm10 > > + > > + movdqa xmm9,xmm14 > > + movdqa XMMWORD[(160-128)+rax],xmm1 > > + paddd xmm13,xmm1 > > + pxor xmm2,xmm4 > > + psrld xmm9,27 > > + pxor xmm6,xmm11 > > + movdqa xmm7,xmm10 > > + > > + 
pslld xmm7,30 > > + movdqa xmm5,xmm2 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm13,xmm6 > > + paddd xmm2,xmm2 > > + > > + psrld xmm10,2 > > + paddd xmm13,xmm8 > > + por xmm2,xmm5 > > + por xmm10,xmm7 > > + pxor xmm3,xmm0 > > + movdqa xmm0,XMMWORD[((224-128))+rax] > > + > > + movdqa xmm8,xmm13 > > + movdqa xmm6,xmm11 > > + pxor xmm3,XMMWORD[((64-128))+rax] > > + paddd xmm12,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm14 > > + > > + movdqa xmm9,xmm13 > > + movdqa XMMWORD[(176-128)+rax],xmm2 > > + paddd xmm12,xmm2 > > + pxor xmm3,xmm0 > > + psrld xmm9,27 > > + pxor xmm6,xmm10 > > + movdqa xmm7,xmm14 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm3 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm12,xmm6 > > + paddd xmm3,xmm3 > > + > > + psrld xmm14,2 > > + paddd xmm12,xmm8 > > + por xmm3,xmm5 > > + por xmm14,xmm7 > > + pxor xmm4,xmm1 > > + movdqa xmm1,XMMWORD[((240-128))+rax] > > + > > + movdqa xmm8,xmm12 > > + movdqa xmm6,xmm10 > > + pxor xmm4,XMMWORD[((80-128))+rax] > > + paddd xmm11,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm13 > > + > > + movdqa xmm9,xmm12 > > + movdqa XMMWORD[(192-128)+rax],xmm3 > > + paddd xmm11,xmm3 > > + pxor xmm4,xmm1 > > + psrld xmm9,27 > > + pxor xmm6,xmm14 > > + movdqa xmm7,xmm13 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm4 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm11,xmm6 > > + paddd xmm4,xmm4 > > + > > + psrld xmm13,2 > > + paddd xmm11,xmm8 > > + por xmm4,xmm5 > > + por xmm13,xmm7 > > + pxor xmm0,xmm2 > > + movdqa xmm2,XMMWORD[((0-128))+rax] > > + > > + movdqa xmm8,xmm11 > > + movdqa xmm6,xmm14 > > + pxor xmm0,XMMWORD[((96-128))+rax] > > + paddd xmm10,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm12 > > + > > + movdqa xmm9,xmm11 > > + movdqa XMMWORD[(208-128)+rax],xmm4 > > + paddd xmm10,xmm4 > > + pxor xmm0,xmm2 > > + psrld xmm9,27 > > + pxor xmm6,xmm13 > > + movdqa xmm7,xmm12 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm0 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm10,xmm6 > > + paddd xmm0,xmm0 > > + > > + psrld xmm12,2 > > + paddd xmm10,xmm8 > > + por xmm0,xmm5 > > + por xmm12,xmm7 > > + pxor xmm1,xmm3 > > + movdqa xmm3,XMMWORD[((16-128))+rax] > > + > > + movdqa xmm8,xmm10 > > + movdqa xmm6,xmm13 > > + pxor xmm1,XMMWORD[((112-128))+rax] > > + paddd xmm14,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm11 > > + > > + movdqa xmm9,xmm10 > > + movdqa XMMWORD[(224-128)+rax],xmm0 > > + paddd xmm14,xmm0 > > + pxor xmm1,xmm3 > > + psrld xmm9,27 > > + pxor xmm6,xmm12 > > + movdqa xmm7,xmm11 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm1 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm14,xmm6 > > + paddd xmm1,xmm1 > > + > > + psrld xmm11,2 > > + paddd xmm14,xmm8 > > + por xmm1,xmm5 > > + por xmm11,xmm7 > > + pxor xmm2,xmm4 > > + movdqa xmm4,XMMWORD[((32-128))+rax] > > + > > + movdqa xmm8,xmm14 > > + movdqa xmm6,xmm12 > > + pxor xmm2,XMMWORD[((128-128))+rax] > > + paddd xmm13,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm10 > > + > > + movdqa xmm9,xmm14 > > + movdqa XMMWORD[(240-128)+rax],xmm1 > > + paddd xmm13,xmm1 > > + pxor xmm2,xmm4 > > + psrld xmm9,27 > > + pxor xmm6,xmm11 > > + movdqa xmm7,xmm10 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm2 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm13,xmm6 > > + paddd xmm2,xmm2 > > + > > + psrld xmm10,2 > > + paddd xmm13,xmm8 > > + por xmm2,xmm5 > > + por xmm10,xmm7 > > + pxor xmm3,xmm0 > > + movdqa xmm0,XMMWORD[((48-128))+rax] > > + > > + movdqa xmm8,xmm13 > > + movdqa xmm6,xmm11 > > + pxor xmm3,XMMWORD[((144-128))+rax] > > + paddd xmm12,xmm15 > > + pslld xmm8,5 > > 
+ pxor xmm6,xmm14 > > + > > + movdqa xmm9,xmm13 > > + movdqa XMMWORD[(0-128)+rax],xmm2 > > + paddd xmm12,xmm2 > > + pxor xmm3,xmm0 > > + psrld xmm9,27 > > + pxor xmm6,xmm10 > > + movdqa xmm7,xmm14 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm3 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm12,xmm6 > > + paddd xmm3,xmm3 > > + > > + psrld xmm14,2 > > + paddd xmm12,xmm8 > > + por xmm3,xmm5 > > + por xmm14,xmm7 > > + pxor xmm4,xmm1 > > + movdqa xmm1,XMMWORD[((64-128))+rax] > > + > > + movdqa xmm8,xmm12 > > + movdqa xmm6,xmm10 > > + pxor xmm4,XMMWORD[((160-128))+rax] > > + paddd xmm11,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm13 > > + > > + movdqa xmm9,xmm12 > > + movdqa XMMWORD[(16-128)+rax],xmm3 > > + paddd xmm11,xmm3 > > + pxor xmm4,xmm1 > > + psrld xmm9,27 > > + pxor xmm6,xmm14 > > + movdqa xmm7,xmm13 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm4 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm11,xmm6 > > + paddd xmm4,xmm4 > > + > > + psrld xmm13,2 > > + paddd xmm11,xmm8 > > + por xmm4,xmm5 > > + por xmm13,xmm7 > > + pxor xmm0,xmm2 > > + movdqa xmm2,XMMWORD[((80-128))+rax] > > + > > + movdqa xmm8,xmm11 > > + movdqa xmm6,xmm14 > > + pxor xmm0,XMMWORD[((176-128))+rax] > > + paddd xmm10,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm12 > > + > > + movdqa xmm9,xmm11 > > + movdqa XMMWORD[(32-128)+rax],xmm4 > > + paddd xmm10,xmm4 > > + pxor xmm0,xmm2 > > + psrld xmm9,27 > > + pxor xmm6,xmm13 > > + movdqa xmm7,xmm12 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm0 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm10,xmm6 > > + paddd xmm0,xmm0 > > + > > + psrld xmm12,2 > > + paddd xmm10,xmm8 > > + por xmm0,xmm5 > > + por xmm12,xmm7 > > + pxor xmm1,xmm3 > > + movdqa xmm3,XMMWORD[((96-128))+rax] > > + > > + movdqa xmm8,xmm10 > > + movdqa xmm6,xmm13 > > + pxor xmm1,XMMWORD[((192-128))+rax] > > + paddd xmm14,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm11 > > + > > + movdqa xmm9,xmm10 > > + movdqa XMMWORD[(48-128)+rax],xmm0 > > + paddd xmm14,xmm0 > > + pxor xmm1,xmm3 > > + psrld xmm9,27 > > + pxor xmm6,xmm12 > > + movdqa xmm7,xmm11 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm1 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm14,xmm6 > > + paddd xmm1,xmm1 > > + > > + psrld xmm11,2 > > + paddd xmm14,xmm8 > > + por xmm1,xmm5 > > + por xmm11,xmm7 > > + pxor xmm2,xmm4 > > + movdqa xmm4,XMMWORD[((112-128))+rax] > > + > > + movdqa xmm8,xmm14 > > + movdqa xmm6,xmm12 > > + pxor xmm2,XMMWORD[((208-128))+rax] > > + paddd xmm13,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm10 > > + > > + movdqa xmm9,xmm14 > > + movdqa XMMWORD[(64-128)+rax],xmm1 > > + paddd xmm13,xmm1 > > + pxor xmm2,xmm4 > > + psrld xmm9,27 > > + pxor xmm6,xmm11 > > + movdqa xmm7,xmm10 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm2 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm13,xmm6 > > + paddd xmm2,xmm2 > > + > > + psrld xmm10,2 > > + paddd xmm13,xmm8 > > + por xmm2,xmm5 > > + por xmm10,xmm7 > > + pxor xmm3,xmm0 > > + movdqa xmm0,XMMWORD[((128-128))+rax] > > + > > + movdqa xmm8,xmm13 > > + movdqa xmm6,xmm11 > > + pxor xmm3,XMMWORD[((224-128))+rax] > > + paddd xmm12,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm14 > > + > > + movdqa xmm9,xmm13 > > + movdqa XMMWORD[(80-128)+rax],xmm2 > > + paddd xmm12,xmm2 > > + pxor xmm3,xmm0 > > + psrld xmm9,27 > > + pxor xmm6,xmm10 > > + movdqa xmm7,xmm14 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm3 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm12,xmm6 > > + paddd xmm3,xmm3 > > + > > + psrld xmm14,2 > > + paddd xmm12,xmm8 > > + por xmm3,xmm5 > > + por 
xmm14,xmm7 > > + pxor xmm4,xmm1 > > + movdqa xmm1,XMMWORD[((144-128))+rax] > > + > > + movdqa xmm8,xmm12 > > + movdqa xmm6,xmm10 > > + pxor xmm4,XMMWORD[((240-128))+rax] > > + paddd xmm11,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm13 > > + > > + movdqa xmm9,xmm12 > > + movdqa XMMWORD[(96-128)+rax],xmm3 > > + paddd xmm11,xmm3 > > + pxor xmm4,xmm1 > > + psrld xmm9,27 > > + pxor xmm6,xmm14 > > + movdqa xmm7,xmm13 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm4 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm11,xmm6 > > + paddd xmm4,xmm4 > > + > > + psrld xmm13,2 > > + paddd xmm11,xmm8 > > + por xmm4,xmm5 > > + por xmm13,xmm7 > > + pxor xmm0,xmm2 > > + movdqa xmm2,XMMWORD[((160-128))+rax] > > + > > + movdqa xmm8,xmm11 > > + movdqa xmm6,xmm14 > > + pxor xmm0,XMMWORD[((0-128))+rax] > > + paddd xmm10,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm12 > > + > > + movdqa xmm9,xmm11 > > + movdqa XMMWORD[(112-128)+rax],xmm4 > > + paddd xmm10,xmm4 > > + pxor xmm0,xmm2 > > + psrld xmm9,27 > > + pxor xmm6,xmm13 > > + movdqa xmm7,xmm12 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm0 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm10,xmm6 > > + paddd xmm0,xmm0 > > + > > + psrld xmm12,2 > > + paddd xmm10,xmm8 > > + por xmm0,xmm5 > > + por xmm12,xmm7 > > + movdqa xmm15,XMMWORD[32+rbp] > > + pxor xmm1,xmm3 > > + movdqa xmm3,XMMWORD[((176-128))+rax] > > + > > + movdqa xmm8,xmm10 > > + movdqa xmm7,xmm13 > > + pxor xmm1,XMMWORD[((16-128))+rax] > > + pxor xmm1,xmm3 > > + paddd xmm14,xmm15 > > + pslld xmm8,5 > > + movdqa xmm9,xmm10 > > + pand xmm7,xmm12 > > + > > + movdqa xmm6,xmm13 > > + movdqa xmm5,xmm1 > > + psrld xmm9,27 > > + paddd xmm14,xmm7 > > + pxor xmm6,xmm12 > > + > > + movdqa XMMWORD[(128-128)+rax],xmm0 > > + paddd xmm14,xmm0 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm11 > > + movdqa xmm7,xmm11 > > + > > + pslld xmm7,30 > > + paddd xmm1,xmm1 > > + paddd xmm14,xmm6 > > + > > + psrld xmm11,2 > > + paddd xmm14,xmm8 > > + por xmm1,xmm5 > > + por xmm11,xmm7 > > + pxor xmm2,xmm4 > > + movdqa xmm4,XMMWORD[((192-128))+rax] > > + > > + movdqa xmm8,xmm14 > > + movdqa xmm7,xmm12 > > + pxor xmm2,XMMWORD[((32-128))+rax] > > + pxor xmm2,xmm4 > > + paddd xmm13,xmm15 > > + pslld xmm8,5 > > + movdqa xmm9,xmm14 > > + pand xmm7,xmm11 > > + > > + movdqa xmm6,xmm12 > > + movdqa xmm5,xmm2 > > + psrld xmm9,27 > > + paddd xmm13,xmm7 > > + pxor xmm6,xmm11 > > + > > + movdqa XMMWORD[(144-128)+rax],xmm1 > > + paddd xmm13,xmm1 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm10 > > + movdqa xmm7,xmm10 > > + > > + pslld xmm7,30 > > + paddd xmm2,xmm2 > > + paddd xmm13,xmm6 > > + > > + psrld xmm10,2 > > + paddd xmm13,xmm8 > > + por xmm2,xmm5 > > + por xmm10,xmm7 > > + pxor xmm3,xmm0 > > + movdqa xmm0,XMMWORD[((208-128))+rax] > > + > > + movdqa xmm8,xmm13 > > + movdqa xmm7,xmm11 > > + pxor xmm3,XMMWORD[((48-128))+rax] > > + pxor xmm3,xmm0 > > + paddd xmm12,xmm15 > > + pslld xmm8,5 > > + movdqa xmm9,xmm13 > > + pand xmm7,xmm10 > > + > > + movdqa xmm6,xmm11 > > + movdqa xmm5,xmm3 > > + psrld xmm9,27 > > + paddd xmm12,xmm7 > > + pxor xmm6,xmm10 > > + > > + movdqa XMMWORD[(160-128)+rax],xmm2 > > + paddd xmm12,xmm2 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm14 > > + movdqa xmm7,xmm14 > > + > > + pslld xmm7,30 > > + paddd xmm3,xmm3 > > + paddd xmm12,xmm6 > > + > > + psrld xmm14,2 > > + paddd xmm12,xmm8 > > + por xmm3,xmm5 > > + por xmm14,xmm7 > > + pxor xmm4,xmm1 > > + movdqa xmm1,XMMWORD[((224-128))+rax] > > + > > + movdqa xmm8,xmm12 > > + movdqa xmm7,xmm10 > > + pxor 
xmm4,XMMWORD[((64-128))+rax] > > + pxor xmm4,xmm1 > > + paddd xmm11,xmm15 > > + pslld xmm8,5 > > + movdqa xmm9,xmm12 > > + pand xmm7,xmm14 > > + > > + movdqa xmm6,xmm10 > > + movdqa xmm5,xmm4 > > + psrld xmm9,27 > > + paddd xmm11,xmm7 > > + pxor xmm6,xmm14 > > + > > + movdqa XMMWORD[(176-128)+rax],xmm3 > > + paddd xmm11,xmm3 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm13 > > + movdqa xmm7,xmm13 > > + > > + pslld xmm7,30 > > + paddd xmm4,xmm4 > > + paddd xmm11,xmm6 > > + > > + psrld xmm13,2 > > + paddd xmm11,xmm8 > > + por xmm4,xmm5 > > + por xmm13,xmm7 > > + pxor xmm0,xmm2 > > + movdqa xmm2,XMMWORD[((240-128))+rax] > > + > > + movdqa xmm8,xmm11 > > + movdqa xmm7,xmm14 > > + pxor xmm0,XMMWORD[((80-128))+rax] > > + pxor xmm0,xmm2 > > + paddd xmm10,xmm15 > > + pslld xmm8,5 > > + movdqa xmm9,xmm11 > > + pand xmm7,xmm13 > > + > > + movdqa xmm6,xmm14 > > + movdqa xmm5,xmm0 > > + psrld xmm9,27 > > + paddd xmm10,xmm7 > > + pxor xmm6,xmm13 > > + > > + movdqa XMMWORD[(192-128)+rax],xmm4 > > + paddd xmm10,xmm4 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm12 > > + movdqa xmm7,xmm12 > > + > > + pslld xmm7,30 > > + paddd xmm0,xmm0 > > + paddd xmm10,xmm6 > > + > > + psrld xmm12,2 > > + paddd xmm10,xmm8 > > + por xmm0,xmm5 > > + por xmm12,xmm7 > > + pxor xmm1,xmm3 > > + movdqa xmm3,XMMWORD[((0-128))+rax] > > + > > + movdqa xmm8,xmm10 > > + movdqa xmm7,xmm13 > > + pxor xmm1,XMMWORD[((96-128))+rax] > > + pxor xmm1,xmm3 > > + paddd xmm14,xmm15 > > + pslld xmm8,5 > > + movdqa xmm9,xmm10 > > + pand xmm7,xmm12 > > + > > + movdqa xmm6,xmm13 > > + movdqa xmm5,xmm1 > > + psrld xmm9,27 > > + paddd xmm14,xmm7 > > + pxor xmm6,xmm12 > > + > > + movdqa XMMWORD[(208-128)+rax],xmm0 > > + paddd xmm14,xmm0 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm11 > > + movdqa xmm7,xmm11 > > + > > + pslld xmm7,30 > > + paddd xmm1,xmm1 > > + paddd xmm14,xmm6 > > + > > + psrld xmm11,2 > > + paddd xmm14,xmm8 > > + por xmm1,xmm5 > > + por xmm11,xmm7 > > + pxor xmm2,xmm4 > > + movdqa xmm4,XMMWORD[((16-128))+rax] > > + > > + movdqa xmm8,xmm14 > > + movdqa xmm7,xmm12 > > + pxor xmm2,XMMWORD[((112-128))+rax] > > + pxor xmm2,xmm4 > > + paddd xmm13,xmm15 > > + pslld xmm8,5 > > + movdqa xmm9,xmm14 > > + pand xmm7,xmm11 > > + > > + movdqa xmm6,xmm12 > > + movdqa xmm5,xmm2 > > + psrld xmm9,27 > > + paddd xmm13,xmm7 > > + pxor xmm6,xmm11 > > + > > + movdqa XMMWORD[(224-128)+rax],xmm1 > > + paddd xmm13,xmm1 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm10 > > + movdqa xmm7,xmm10 > > + > > + pslld xmm7,30 > > + paddd xmm2,xmm2 > > + paddd xmm13,xmm6 > > + > > + psrld xmm10,2 > > + paddd xmm13,xmm8 > > + por xmm2,xmm5 > > + por xmm10,xmm7 > > + pxor xmm3,xmm0 > > + movdqa xmm0,XMMWORD[((32-128))+rax] > > + > > + movdqa xmm8,xmm13 > > + movdqa xmm7,xmm11 > > + pxor xmm3,XMMWORD[((128-128))+rax] > > + pxor xmm3,xmm0 > > + paddd xmm12,xmm15 > > + pslld xmm8,5 > > + movdqa xmm9,xmm13 > > + pand xmm7,xmm10 > > + > > + movdqa xmm6,xmm11 > > + movdqa xmm5,xmm3 > > + psrld xmm9,27 > > + paddd xmm12,xmm7 > > + pxor xmm6,xmm10 > > + > > + movdqa XMMWORD[(240-128)+rax],xmm2 > > + paddd xmm12,xmm2 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm14 > > + movdqa xmm7,xmm14 > > + > > + pslld xmm7,30 > > + paddd xmm3,xmm3 > > + paddd xmm12,xmm6 > > + > > + psrld xmm14,2 > > + paddd xmm12,xmm8 > > + por xmm3,xmm5 > > + por xmm14,xmm7 > > + pxor xmm4,xmm1 > > + movdqa xmm1,XMMWORD[((48-128))+rax] > > + > > + movdqa xmm8,xmm12 > > + movdqa xmm7,xmm10 > > + pxor xmm4,XMMWORD[((144-128))+rax] > > + pxor 
xmm4,xmm1 > > + paddd xmm11,xmm15 > > + pslld xmm8,5 > > + movdqa xmm9,xmm12 > > + pand xmm7,xmm14 > > + > > + movdqa xmm6,xmm10 > > + movdqa xmm5,xmm4 > > + psrld xmm9,27 > > + paddd xmm11,xmm7 > > + pxor xmm6,xmm14 > > + > > + movdqa XMMWORD[(0-128)+rax],xmm3 > > + paddd xmm11,xmm3 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm13 > > + movdqa xmm7,xmm13 > > + > > + pslld xmm7,30 > > + paddd xmm4,xmm4 > > + paddd xmm11,xmm6 > > + > > + psrld xmm13,2 > > + paddd xmm11,xmm8 > > + por xmm4,xmm5 > > + por xmm13,xmm7 > > + pxor xmm0,xmm2 > > + movdqa xmm2,XMMWORD[((64-128))+rax] > > + > > + movdqa xmm8,xmm11 > > + movdqa xmm7,xmm14 > > + pxor xmm0,XMMWORD[((160-128))+rax] > > + pxor xmm0,xmm2 > > + paddd xmm10,xmm15 > > + pslld xmm8,5 > > + movdqa xmm9,xmm11 > > + pand xmm7,xmm13 > > + > > + movdqa xmm6,xmm14 > > + movdqa xmm5,xmm0 > > + psrld xmm9,27 > > + paddd xmm10,xmm7 > > + pxor xmm6,xmm13 > > + > > + movdqa XMMWORD[(16-128)+rax],xmm4 > > + paddd xmm10,xmm4 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm12 > > + movdqa xmm7,xmm12 > > + > > + pslld xmm7,30 > > + paddd xmm0,xmm0 > > + paddd xmm10,xmm6 > > + > > + psrld xmm12,2 > > + paddd xmm10,xmm8 > > + por xmm0,xmm5 > > + por xmm12,xmm7 > > + pxor xmm1,xmm3 > > + movdqa xmm3,XMMWORD[((80-128))+rax] > > + > > + movdqa xmm8,xmm10 > > + movdqa xmm7,xmm13 > > + pxor xmm1,XMMWORD[((176-128))+rax] > > + pxor xmm1,xmm3 > > + paddd xmm14,xmm15 > > + pslld xmm8,5 > > + movdqa xmm9,xmm10 > > + pand xmm7,xmm12 > > + > > + movdqa xmm6,xmm13 > > + movdqa xmm5,xmm1 > > + psrld xmm9,27 > > + paddd xmm14,xmm7 > > + pxor xmm6,xmm12 > > + > > + movdqa XMMWORD[(32-128)+rax],xmm0 > > + paddd xmm14,xmm0 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm11 > > + movdqa xmm7,xmm11 > > + > > + pslld xmm7,30 > > + paddd xmm1,xmm1 > > + paddd xmm14,xmm6 > > + > > + psrld xmm11,2 > > + paddd xmm14,xmm8 > > + por xmm1,xmm5 > > + por xmm11,xmm7 > > + pxor xmm2,xmm4 > > + movdqa xmm4,XMMWORD[((96-128))+rax] > > + > > + movdqa xmm8,xmm14 > > + movdqa xmm7,xmm12 > > + pxor xmm2,XMMWORD[((192-128))+rax] > > + pxor xmm2,xmm4 > > + paddd xmm13,xmm15 > > + pslld xmm8,5 > > + movdqa xmm9,xmm14 > > + pand xmm7,xmm11 > > + > > + movdqa xmm6,xmm12 > > + movdqa xmm5,xmm2 > > + psrld xmm9,27 > > + paddd xmm13,xmm7 > > + pxor xmm6,xmm11 > > + > > + movdqa XMMWORD[(48-128)+rax],xmm1 > > + paddd xmm13,xmm1 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm10 > > + movdqa xmm7,xmm10 > > + > > + pslld xmm7,30 > > + paddd xmm2,xmm2 > > + paddd xmm13,xmm6 > > + > > + psrld xmm10,2 > > + paddd xmm13,xmm8 > > + por xmm2,xmm5 > > + por xmm10,xmm7 > > + pxor xmm3,xmm0 > > + movdqa xmm0,XMMWORD[((112-128))+rax] > > + > > + movdqa xmm8,xmm13 > > + movdqa xmm7,xmm11 > > + pxor xmm3,XMMWORD[((208-128))+rax] > > + pxor xmm3,xmm0 > > + paddd xmm12,xmm15 > > + pslld xmm8,5 > > + movdqa xmm9,xmm13 > > + pand xmm7,xmm10 > > + > > + movdqa xmm6,xmm11 > > + movdqa xmm5,xmm3 > > + psrld xmm9,27 > > + paddd xmm12,xmm7 > > + pxor xmm6,xmm10 > > + > > + movdqa XMMWORD[(64-128)+rax],xmm2 > > + paddd xmm12,xmm2 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm14 > > + movdqa xmm7,xmm14 > > + > > + pslld xmm7,30 > > + paddd xmm3,xmm3 > > + paddd xmm12,xmm6 > > + > > + psrld xmm14,2 > > + paddd xmm12,xmm8 > > + por xmm3,xmm5 > > + por xmm14,xmm7 > > + pxor xmm4,xmm1 > > + movdqa xmm1,XMMWORD[((128-128))+rax] > > + > > + movdqa xmm8,xmm12 > > + movdqa xmm7,xmm10 > > + pxor xmm4,XMMWORD[((224-128))+rax] > > + pxor xmm4,xmm1 > > + paddd xmm11,xmm15 > > + 
pslld xmm8,5 > > + movdqa xmm9,xmm12 > > + pand xmm7,xmm14 > > + > > + movdqa xmm6,xmm10 > > + movdqa xmm5,xmm4 > > + psrld xmm9,27 > > + paddd xmm11,xmm7 > > + pxor xmm6,xmm14 > > + > > + movdqa XMMWORD[(80-128)+rax],xmm3 > > + paddd xmm11,xmm3 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm13 > > + movdqa xmm7,xmm13 > > + > > + pslld xmm7,30 > > + paddd xmm4,xmm4 > > + paddd xmm11,xmm6 > > + > > + psrld xmm13,2 > > + paddd xmm11,xmm8 > > + por xmm4,xmm5 > > + por xmm13,xmm7 > > + pxor xmm0,xmm2 > > + movdqa xmm2,XMMWORD[((144-128))+rax] > > + > > + movdqa xmm8,xmm11 > > + movdqa xmm7,xmm14 > > + pxor xmm0,XMMWORD[((240-128))+rax] > > + pxor xmm0,xmm2 > > + paddd xmm10,xmm15 > > + pslld xmm8,5 > > + movdqa xmm9,xmm11 > > + pand xmm7,xmm13 > > + > > + movdqa xmm6,xmm14 > > + movdqa xmm5,xmm0 > > + psrld xmm9,27 > > + paddd xmm10,xmm7 > > + pxor xmm6,xmm13 > > + > > + movdqa XMMWORD[(96-128)+rax],xmm4 > > + paddd xmm10,xmm4 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm12 > > + movdqa xmm7,xmm12 > > + > > + pslld xmm7,30 > > + paddd xmm0,xmm0 > > + paddd xmm10,xmm6 > > + > > + psrld xmm12,2 > > + paddd xmm10,xmm8 > > + por xmm0,xmm5 > > + por xmm12,xmm7 > > + pxor xmm1,xmm3 > > + movdqa xmm3,XMMWORD[((160-128))+rax] > > + > > + movdqa xmm8,xmm10 > > + movdqa xmm7,xmm13 > > + pxor xmm1,XMMWORD[((0-128))+rax] > > + pxor xmm1,xmm3 > > + paddd xmm14,xmm15 > > + pslld xmm8,5 > > + movdqa xmm9,xmm10 > > + pand xmm7,xmm12 > > + > > + movdqa xmm6,xmm13 > > + movdqa xmm5,xmm1 > > + psrld xmm9,27 > > + paddd xmm14,xmm7 > > + pxor xmm6,xmm12 > > + > > + movdqa XMMWORD[(112-128)+rax],xmm0 > > + paddd xmm14,xmm0 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm11 > > + movdqa xmm7,xmm11 > > + > > + pslld xmm7,30 > > + paddd xmm1,xmm1 > > + paddd xmm14,xmm6 > > + > > + psrld xmm11,2 > > + paddd xmm14,xmm8 > > + por xmm1,xmm5 > > + por xmm11,xmm7 > > + pxor xmm2,xmm4 > > + movdqa xmm4,XMMWORD[((176-128))+rax] > > + > > + movdqa xmm8,xmm14 > > + movdqa xmm7,xmm12 > > + pxor xmm2,XMMWORD[((16-128))+rax] > > + pxor xmm2,xmm4 > > + paddd xmm13,xmm15 > > + pslld xmm8,5 > > + movdqa xmm9,xmm14 > > + pand xmm7,xmm11 > > + > > + movdqa xmm6,xmm12 > > + movdqa xmm5,xmm2 > > + psrld xmm9,27 > > + paddd xmm13,xmm7 > > + pxor xmm6,xmm11 > > + > > + movdqa XMMWORD[(128-128)+rax],xmm1 > > + paddd xmm13,xmm1 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm10 > > + movdqa xmm7,xmm10 > > + > > + pslld xmm7,30 > > + paddd xmm2,xmm2 > > + paddd xmm13,xmm6 > > + > > + psrld xmm10,2 > > + paddd xmm13,xmm8 > > + por xmm2,xmm5 > > + por xmm10,xmm7 > > + pxor xmm3,xmm0 > > + movdqa xmm0,XMMWORD[((192-128))+rax] > > + > > + movdqa xmm8,xmm13 > > + movdqa xmm7,xmm11 > > + pxor xmm3,XMMWORD[((32-128))+rax] > > + pxor xmm3,xmm0 > > + paddd xmm12,xmm15 > > + pslld xmm8,5 > > + movdqa xmm9,xmm13 > > + pand xmm7,xmm10 > > + > > + movdqa xmm6,xmm11 > > + movdqa xmm5,xmm3 > > + psrld xmm9,27 > > + paddd xmm12,xmm7 > > + pxor xmm6,xmm10 > > + > > + movdqa XMMWORD[(144-128)+rax],xmm2 > > + paddd xmm12,xmm2 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm14 > > + movdqa xmm7,xmm14 > > + > > + pslld xmm7,30 > > + paddd xmm3,xmm3 > > + paddd xmm12,xmm6 > > + > > + psrld xmm14,2 > > + paddd xmm12,xmm8 > > + por xmm3,xmm5 > > + por xmm14,xmm7 > > + pxor xmm4,xmm1 > > + movdqa xmm1,XMMWORD[((208-128))+rax] > > + > > + movdqa xmm8,xmm12 > > + movdqa xmm7,xmm10 > > + pxor xmm4,XMMWORD[((48-128))+rax] > > + pxor xmm4,xmm1 > > + paddd xmm11,xmm15 > > + pslld xmm8,5 > > + movdqa xmm9,xmm12 > > 
+ pand xmm7,xmm14 > > + > > + movdqa xmm6,xmm10 > > + movdqa xmm5,xmm4 > > + psrld xmm9,27 > > + paddd xmm11,xmm7 > > + pxor xmm6,xmm14 > > + > > + movdqa XMMWORD[(160-128)+rax],xmm3 > > + paddd xmm11,xmm3 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm13 > > + movdqa xmm7,xmm13 > > + > > + pslld xmm7,30 > > + paddd xmm4,xmm4 > > + paddd xmm11,xmm6 > > + > > + psrld xmm13,2 > > + paddd xmm11,xmm8 > > + por xmm4,xmm5 > > + por xmm13,xmm7 > > + pxor xmm0,xmm2 > > + movdqa xmm2,XMMWORD[((224-128))+rax] > > + > > + movdqa xmm8,xmm11 > > + movdqa xmm7,xmm14 > > + pxor xmm0,XMMWORD[((64-128))+rax] > > + pxor xmm0,xmm2 > > + paddd xmm10,xmm15 > > + pslld xmm8,5 > > + movdqa xmm9,xmm11 > > + pand xmm7,xmm13 > > + > > + movdqa xmm6,xmm14 > > + movdqa xmm5,xmm0 > > + psrld xmm9,27 > > + paddd xmm10,xmm7 > > + pxor xmm6,xmm13 > > + > > + movdqa XMMWORD[(176-128)+rax],xmm4 > > + paddd xmm10,xmm4 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + pand xmm6,xmm12 > > + movdqa xmm7,xmm12 > > + > > + pslld xmm7,30 > > + paddd xmm0,xmm0 > > + paddd xmm10,xmm6 > > + > > + psrld xmm12,2 > > + paddd xmm10,xmm8 > > + por xmm0,xmm5 > > + por xmm12,xmm7 > > + movdqa xmm15,XMMWORD[64+rbp] > > + pxor xmm1,xmm3 > > + movdqa xmm3,XMMWORD[((240-128))+rax] > > + > > + movdqa xmm8,xmm10 > > + movdqa xmm6,xmm13 > > + pxor xmm1,XMMWORD[((80-128))+rax] > > + paddd xmm14,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm11 > > + > > + movdqa xmm9,xmm10 > > + movdqa XMMWORD[(192-128)+rax],xmm0 > > + paddd xmm14,xmm0 > > + pxor xmm1,xmm3 > > + psrld xmm9,27 > > + pxor xmm6,xmm12 > > + movdqa xmm7,xmm11 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm1 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm14,xmm6 > > + paddd xmm1,xmm1 > > + > > + psrld xmm11,2 > > + paddd xmm14,xmm8 > > + por xmm1,xmm5 > > + por xmm11,xmm7 > > + pxor xmm2,xmm4 > > + movdqa xmm4,XMMWORD[((0-128))+rax] > > + > > + movdqa xmm8,xmm14 > > + movdqa xmm6,xmm12 > > + pxor xmm2,XMMWORD[((96-128))+rax] > > + paddd xmm13,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm10 > > + > > + movdqa xmm9,xmm14 > > + movdqa XMMWORD[(208-128)+rax],xmm1 > > + paddd xmm13,xmm1 > > + pxor xmm2,xmm4 > > + psrld xmm9,27 > > + pxor xmm6,xmm11 > > + movdqa xmm7,xmm10 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm2 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm13,xmm6 > > + paddd xmm2,xmm2 > > + > > + psrld xmm10,2 > > + paddd xmm13,xmm8 > > + por xmm2,xmm5 > > + por xmm10,xmm7 > > + pxor xmm3,xmm0 > > + movdqa xmm0,XMMWORD[((16-128))+rax] > > + > > + movdqa xmm8,xmm13 > > + movdqa xmm6,xmm11 > > + pxor xmm3,XMMWORD[((112-128))+rax] > > + paddd xmm12,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm14 > > + > > + movdqa xmm9,xmm13 > > + movdqa XMMWORD[(224-128)+rax],xmm2 > > + paddd xmm12,xmm2 > > + pxor xmm3,xmm0 > > + psrld xmm9,27 > > + pxor xmm6,xmm10 > > + movdqa xmm7,xmm14 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm3 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm12,xmm6 > > + paddd xmm3,xmm3 > > + > > + psrld xmm14,2 > > + paddd xmm12,xmm8 > > + por xmm3,xmm5 > > + por xmm14,xmm7 > > + pxor xmm4,xmm1 > > + movdqa xmm1,XMMWORD[((32-128))+rax] > > + > > + movdqa xmm8,xmm12 > > + movdqa xmm6,xmm10 > > + pxor xmm4,XMMWORD[((128-128))+rax] > > + paddd xmm11,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm13 > > + > > + movdqa xmm9,xmm12 > > + movdqa XMMWORD[(240-128)+rax],xmm3 > > + paddd xmm11,xmm3 > > + pxor xmm4,xmm1 > > + psrld xmm9,27 > > + pxor xmm6,xmm14 > > + movdqa xmm7,xmm13 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm4 > > + por xmm8,xmm9 > > + 
psrld xmm5,31 > > + paddd xmm11,xmm6 > > + paddd xmm4,xmm4 > > + > > + psrld xmm13,2 > > + paddd xmm11,xmm8 > > + por xmm4,xmm5 > > + por xmm13,xmm7 > > + pxor xmm0,xmm2 > > + movdqa xmm2,XMMWORD[((48-128))+rax] > > + > > + movdqa xmm8,xmm11 > > + movdqa xmm6,xmm14 > > + pxor xmm0,XMMWORD[((144-128))+rax] > > + paddd xmm10,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm12 > > + > > + movdqa xmm9,xmm11 > > + movdqa XMMWORD[(0-128)+rax],xmm4 > > + paddd xmm10,xmm4 > > + pxor xmm0,xmm2 > > + psrld xmm9,27 > > + pxor xmm6,xmm13 > > + movdqa xmm7,xmm12 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm0 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm10,xmm6 > > + paddd xmm0,xmm0 > > + > > + psrld xmm12,2 > > + paddd xmm10,xmm8 > > + por xmm0,xmm5 > > + por xmm12,xmm7 > > + pxor xmm1,xmm3 > > + movdqa xmm3,XMMWORD[((64-128))+rax] > > + > > + movdqa xmm8,xmm10 > > + movdqa xmm6,xmm13 > > + pxor xmm1,XMMWORD[((160-128))+rax] > > + paddd xmm14,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm11 > > + > > + movdqa xmm9,xmm10 > > + movdqa XMMWORD[(16-128)+rax],xmm0 > > + paddd xmm14,xmm0 > > + pxor xmm1,xmm3 > > + psrld xmm9,27 > > + pxor xmm6,xmm12 > > + movdqa xmm7,xmm11 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm1 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm14,xmm6 > > + paddd xmm1,xmm1 > > + > > + psrld xmm11,2 > > + paddd xmm14,xmm8 > > + por xmm1,xmm5 > > + por xmm11,xmm7 > > + pxor xmm2,xmm4 > > + movdqa xmm4,XMMWORD[((80-128))+rax] > > + > > + movdqa xmm8,xmm14 > > + movdqa xmm6,xmm12 > > + pxor xmm2,XMMWORD[((176-128))+rax] > > + paddd xmm13,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm10 > > + > > + movdqa xmm9,xmm14 > > + movdqa XMMWORD[(32-128)+rax],xmm1 > > + paddd xmm13,xmm1 > > + pxor xmm2,xmm4 > > + psrld xmm9,27 > > + pxor xmm6,xmm11 > > + movdqa xmm7,xmm10 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm2 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm13,xmm6 > > + paddd xmm2,xmm2 > > + > > + psrld xmm10,2 > > + paddd xmm13,xmm8 > > + por xmm2,xmm5 > > + por xmm10,xmm7 > > + pxor xmm3,xmm0 > > + movdqa xmm0,XMMWORD[((96-128))+rax] > > + > > + movdqa xmm8,xmm13 > > + movdqa xmm6,xmm11 > > + pxor xmm3,XMMWORD[((192-128))+rax] > > + paddd xmm12,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm14 > > + > > + movdqa xmm9,xmm13 > > + movdqa XMMWORD[(48-128)+rax],xmm2 > > + paddd xmm12,xmm2 > > + pxor xmm3,xmm0 > > + psrld xmm9,27 > > + pxor xmm6,xmm10 > > + movdqa xmm7,xmm14 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm3 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm12,xmm6 > > + paddd xmm3,xmm3 > > + > > + psrld xmm14,2 > > + paddd xmm12,xmm8 > > + por xmm3,xmm5 > > + por xmm14,xmm7 > > + pxor xmm4,xmm1 > > + movdqa xmm1,XMMWORD[((112-128))+rax] > > + > > + movdqa xmm8,xmm12 > > + movdqa xmm6,xmm10 > > + pxor xmm4,XMMWORD[((208-128))+rax] > > + paddd xmm11,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm13 > > + > > + movdqa xmm9,xmm12 > > + movdqa XMMWORD[(64-128)+rax],xmm3 > > + paddd xmm11,xmm3 > > + pxor xmm4,xmm1 > > + psrld xmm9,27 > > + pxor xmm6,xmm14 > > + movdqa xmm7,xmm13 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm4 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm11,xmm6 > > + paddd xmm4,xmm4 > > + > > + psrld xmm13,2 > > + paddd xmm11,xmm8 > > + por xmm4,xmm5 > > + por xmm13,xmm7 > > + pxor xmm0,xmm2 > > + movdqa xmm2,XMMWORD[((128-128))+rax] > > + > > + movdqa xmm8,xmm11 > > + movdqa xmm6,xmm14 > > + pxor xmm0,XMMWORD[((224-128))+rax] > > + paddd xmm10,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm12 > > + > > + movdqa xmm9,xmm11 > > + movdqa 
XMMWORD[(80-128)+rax],xmm4 > > + paddd xmm10,xmm4 > > + pxor xmm0,xmm2 > > + psrld xmm9,27 > > + pxor xmm6,xmm13 > > + movdqa xmm7,xmm12 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm0 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm10,xmm6 > > + paddd xmm0,xmm0 > > + > > + psrld xmm12,2 > > + paddd xmm10,xmm8 > > + por xmm0,xmm5 > > + por xmm12,xmm7 > > + pxor xmm1,xmm3 > > + movdqa xmm3,XMMWORD[((144-128))+rax] > > + > > + movdqa xmm8,xmm10 > > + movdqa xmm6,xmm13 > > + pxor xmm1,XMMWORD[((240-128))+rax] > > + paddd xmm14,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm11 > > + > > + movdqa xmm9,xmm10 > > + movdqa XMMWORD[(96-128)+rax],xmm0 > > + paddd xmm14,xmm0 > > + pxor xmm1,xmm3 > > + psrld xmm9,27 > > + pxor xmm6,xmm12 > > + movdqa xmm7,xmm11 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm1 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm14,xmm6 > > + paddd xmm1,xmm1 > > + > > + psrld xmm11,2 > > + paddd xmm14,xmm8 > > + por xmm1,xmm5 > > + por xmm11,xmm7 > > + pxor xmm2,xmm4 > > + movdqa xmm4,XMMWORD[((160-128))+rax] > > + > > + movdqa xmm8,xmm14 > > + movdqa xmm6,xmm12 > > + pxor xmm2,XMMWORD[((0-128))+rax] > > + paddd xmm13,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm10 > > + > > + movdqa xmm9,xmm14 > > + movdqa XMMWORD[(112-128)+rax],xmm1 > > + paddd xmm13,xmm1 > > + pxor xmm2,xmm4 > > + psrld xmm9,27 > > + pxor xmm6,xmm11 > > + movdqa xmm7,xmm10 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm2 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm13,xmm6 > > + paddd xmm2,xmm2 > > + > > + psrld xmm10,2 > > + paddd xmm13,xmm8 > > + por xmm2,xmm5 > > + por xmm10,xmm7 > > + pxor xmm3,xmm0 > > + movdqa xmm0,XMMWORD[((176-128))+rax] > > + > > + movdqa xmm8,xmm13 > > + movdqa xmm6,xmm11 > > + pxor xmm3,XMMWORD[((16-128))+rax] > > + paddd xmm12,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm14 > > + > > + movdqa xmm9,xmm13 > > + paddd xmm12,xmm2 > > + pxor xmm3,xmm0 > > + psrld xmm9,27 > > + pxor xmm6,xmm10 > > + movdqa xmm7,xmm14 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm3 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm12,xmm6 > > + paddd xmm3,xmm3 > > + > > + psrld xmm14,2 > > + paddd xmm12,xmm8 > > + por xmm3,xmm5 > > + por xmm14,xmm7 > > + pxor xmm4,xmm1 > > + movdqa xmm1,XMMWORD[((192-128))+rax] > > + > > + movdqa xmm8,xmm12 > > + movdqa xmm6,xmm10 > > + pxor xmm4,XMMWORD[((32-128))+rax] > > + paddd xmm11,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm13 > > + > > + movdqa xmm9,xmm12 > > + paddd xmm11,xmm3 > > + pxor xmm4,xmm1 > > + psrld xmm9,27 > > + pxor xmm6,xmm14 > > + movdqa xmm7,xmm13 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm4 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm11,xmm6 > > + paddd xmm4,xmm4 > > + > > + psrld xmm13,2 > > + paddd xmm11,xmm8 > > + por xmm4,xmm5 > > + por xmm13,xmm7 > > + pxor xmm0,xmm2 > > + movdqa xmm2,XMMWORD[((208-128))+rax] > > + > > + movdqa xmm8,xmm11 > > + movdqa xmm6,xmm14 > > + pxor xmm0,XMMWORD[((48-128))+rax] > > + paddd xmm10,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm12 > > + > > + movdqa xmm9,xmm11 > > + paddd xmm10,xmm4 > > + pxor xmm0,xmm2 > > + psrld xmm9,27 > > + pxor xmm6,xmm13 > > + movdqa xmm7,xmm12 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm0 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm10,xmm6 > > + paddd xmm0,xmm0 > > + > > + psrld xmm12,2 > > + paddd xmm10,xmm8 > > + por xmm0,xmm5 > > + por xmm12,xmm7 > > + pxor xmm1,xmm3 > > + movdqa xmm3,XMMWORD[((224-128))+rax] > > + > > + movdqa xmm8,xmm10 > > + movdqa xmm6,xmm13 > > + pxor xmm1,XMMWORD[((64-128))+rax] > > + paddd 
xmm14,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm11 > > + > > + movdqa xmm9,xmm10 > > + paddd xmm14,xmm0 > > + pxor xmm1,xmm3 > > + psrld xmm9,27 > > + pxor xmm6,xmm12 > > + movdqa xmm7,xmm11 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm1 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm14,xmm6 > > + paddd xmm1,xmm1 > > + > > + psrld xmm11,2 > > + paddd xmm14,xmm8 > > + por xmm1,xmm5 > > + por xmm11,xmm7 > > + pxor xmm2,xmm4 > > + movdqa xmm4,XMMWORD[((240-128))+rax] > > + > > + movdqa xmm8,xmm14 > > + movdqa xmm6,xmm12 > > + pxor xmm2,XMMWORD[((80-128))+rax] > > + paddd xmm13,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm10 > > + > > + movdqa xmm9,xmm14 > > + paddd xmm13,xmm1 > > + pxor xmm2,xmm4 > > + psrld xmm9,27 > > + pxor xmm6,xmm11 > > + movdqa xmm7,xmm10 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm2 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm13,xmm6 > > + paddd xmm2,xmm2 > > + > > + psrld xmm10,2 > > + paddd xmm13,xmm8 > > + por xmm2,xmm5 > > + por xmm10,xmm7 > > + pxor xmm3,xmm0 > > + movdqa xmm0,XMMWORD[((0-128))+rax] > > + > > + movdqa xmm8,xmm13 > > + movdqa xmm6,xmm11 > > + pxor xmm3,XMMWORD[((96-128))+rax] > > + paddd xmm12,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm14 > > + > > + movdqa xmm9,xmm13 > > + paddd xmm12,xmm2 > > + pxor xmm3,xmm0 > > + psrld xmm9,27 > > + pxor xmm6,xmm10 > > + movdqa xmm7,xmm14 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm3 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm12,xmm6 > > + paddd xmm3,xmm3 > > + > > + psrld xmm14,2 > > + paddd xmm12,xmm8 > > + por xmm3,xmm5 > > + por xmm14,xmm7 > > + pxor xmm4,xmm1 > > + movdqa xmm1,XMMWORD[((16-128))+rax] > > + > > + movdqa xmm8,xmm12 > > + movdqa xmm6,xmm10 > > + pxor xmm4,XMMWORD[((112-128))+rax] > > + paddd xmm11,xmm15 > > + pslld xmm8,5 > > + pxor xmm6,xmm13 > > + > > + movdqa xmm9,xmm12 > > + paddd xmm11,xmm3 > > + pxor xmm4,xmm1 > > + psrld xmm9,27 > > + pxor xmm6,xmm14 > > + movdqa xmm7,xmm13 > > + > > + pslld xmm7,30 > > + movdqa xmm5,xmm4 > > + por xmm8,xmm9 > > + psrld xmm5,31 > > + paddd xmm11,xmm6 > > + paddd xmm4,xmm4 > > + > > + psrld xmm13,2 > > + paddd xmm11,xmm8 > > + por xmm4,xmm5 > > + por xmm13,xmm7 > > + movdqa xmm8,xmm11 > > + paddd xmm10,xmm15 > > + movdqa xmm6,xmm14 > > + pslld xmm8,5 > > + pxor xmm6,xmm12 > > + > > + movdqa xmm9,xmm11 > > + paddd xmm10,xmm4 > > + psrld xmm9,27 > > + movdqa xmm7,xmm12 > > + pxor xmm6,xmm13 > > + > > + pslld xmm7,30 > > + por xmm8,xmm9 > > + paddd xmm10,xmm6 > > + > > + psrld xmm12,2 > > + paddd xmm10,xmm8 > > + por xmm12,xmm7 > > + movdqa xmm0,XMMWORD[rbx] > > + mov ecx,1 > > + cmp ecx,DWORD[rbx] > > + pxor xmm8,xmm8 > > + cmovge r8,rbp > > + cmp ecx,DWORD[4+rbx] > > + movdqa xmm1,xmm0 > > + cmovge r9,rbp > > + cmp ecx,DWORD[8+rbx] > > + pcmpgtd xmm1,xmm8 > > + cmovge r10,rbp > > + cmp ecx,DWORD[12+rbx] > > + paddd xmm0,xmm1 > > + cmovge r11,rbp > > + > > + movdqu xmm6,XMMWORD[rdi] > > + pand xmm10,xmm1 > > + movdqu xmm7,XMMWORD[32+rdi] > > + pand xmm11,xmm1 > > + paddd xmm10,xmm6 > > + movdqu xmm8,XMMWORD[64+rdi] > > + pand xmm12,xmm1 > > + paddd xmm11,xmm7 > > + movdqu xmm9,XMMWORD[96+rdi] > > + pand xmm13,xmm1 > > + paddd xmm12,xmm8 > > + movdqu xmm5,XMMWORD[128+rdi] > > + pand xmm14,xmm1 > > + movdqu XMMWORD[rdi],xmm10 > > + paddd xmm13,xmm9 > > + movdqu XMMWORD[32+rdi],xmm11 > > + paddd xmm14,xmm5 > > + movdqu XMMWORD[64+rdi],xmm12 > > + movdqu XMMWORD[96+rdi],xmm13 > > + movdqu XMMWORD[128+rdi],xmm14 > > + > > + movdqa XMMWORD[rbx],xmm0 > > + movdqa xmm5,XMMWORD[96+rbp] > > + movdqa 
xmm15,XMMWORD[((-32))+rbp] > > + dec edx > > + jnz NEAR $L$oop > > + > > + mov edx,DWORD[280+rsp] > > + lea rdi,[16+rdi] > > + lea rsi,[64+rsi] > > + dec edx > > + jnz NEAR $L$oop_grande > > + > > +$L$done: > > + mov rax,QWORD[272+rsp] > > + > > + movaps xmm6,XMMWORD[((-184))+rax] > > + movaps xmm7,XMMWORD[((-168))+rax] > > + movaps xmm8,XMMWORD[((-152))+rax] > > + movaps xmm9,XMMWORD[((-136))+rax] > > + movaps xmm10,XMMWORD[((-120))+rax] > > + movaps xmm11,XMMWORD[((-104))+rax] > > + movaps xmm12,XMMWORD[((-88))+rax] > > + movaps xmm13,XMMWORD[((-72))+rax] > > + movaps xmm14,XMMWORD[((-56))+rax] > > + movaps xmm15,XMMWORD[((-40))+rax] > > + mov rbp,QWORD[((-16))+rax] > > + > > + mov rbx,QWORD[((-8))+rax] > > + > > + lea rsp,[rax] > > + > > +$L$epilogue: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_sha1_multi_block: > > + > > +ALIGN 32 > > +sha1_multi_block_shaext: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_sha1_multi_block_shaext: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + > > + > > + > > +_shaext_shortcut: > > + mov rax,rsp > > + > > + push rbx > > + > > + push rbp > > + > > + lea rsp,[((-168))+rsp] > > + movaps XMMWORD[rsp],xmm6 > > + movaps XMMWORD[16+rsp],xmm7 > > + movaps XMMWORD[32+rsp],xmm8 > > + movaps XMMWORD[48+rsp],xmm9 > > + movaps XMMWORD[(-120)+rax],xmm10 > > + movaps XMMWORD[(-104)+rax],xmm11 > > + movaps XMMWORD[(-88)+rax],xmm12 > > + movaps XMMWORD[(-72)+rax],xmm13 > > + movaps XMMWORD[(-56)+rax],xmm14 > > + movaps XMMWORD[(-40)+rax],xmm15 > > + sub rsp,288 > > + shl edx,1 > > + and rsp,-256 > > + lea rdi,[64+rdi] > > + mov QWORD[272+rsp],rax > > +$L$body_shaext: > > + lea rbx,[256+rsp] > > + movdqa xmm3,XMMWORD[((K_XX_XX+128))] > > + > > +$L$oop_grande_shaext: > > + mov DWORD[280+rsp],edx > > + xor edx,edx > > + mov r8,QWORD[rsi] > > + mov ecx,DWORD[8+rsi] > > + cmp ecx,edx > > + cmovg edx,ecx > > + test ecx,ecx > > + mov DWORD[rbx],ecx > > + cmovle r8,rsp > > + mov r9,QWORD[16+rsi] > > + mov ecx,DWORD[24+rsi] > > + cmp ecx,edx > > + cmovg edx,ecx > > + test ecx,ecx > > + mov DWORD[4+rbx],ecx > > + cmovle r9,rsp > > + test edx,edx > > + jz NEAR $L$done_shaext > > + > > + movq xmm0,QWORD[((0-64))+rdi] > > + movq xmm4,QWORD[((32-64))+rdi] > > + movq xmm5,QWORD[((64-64))+rdi] > > + movq xmm6,QWORD[((96-64))+rdi] > > + movq xmm7,QWORD[((128-64))+rdi] > > + > > + punpckldq xmm0,xmm4 > > + punpckldq xmm5,xmm6 > > + > > + movdqa xmm8,xmm0 > > + punpcklqdq xmm0,xmm5 > > + punpckhqdq xmm8,xmm5 > > + > > + pshufd xmm1,xmm7,63 > > + pshufd xmm9,xmm7,127 > > + pshufd xmm0,xmm0,27 > > + pshufd xmm8,xmm8,27 > > + jmp NEAR $L$oop_shaext > > + > > +ALIGN 32 > > +$L$oop_shaext: > > + movdqu xmm4,XMMWORD[r8] > > + movdqu xmm11,XMMWORD[r9] > > + movdqu xmm5,XMMWORD[16+r8] > > + movdqu xmm12,XMMWORD[16+r9] > > + movdqu xmm6,XMMWORD[32+r8] > > +DB 102,15,56,0,227 > > + movdqu xmm13,XMMWORD[32+r9] > > +DB 102,68,15,56,0,219 > > + movdqu xmm7,XMMWORD[48+r8] > > + lea r8,[64+r8] > > +DB 102,15,56,0,235 > > + movdqu xmm14,XMMWORD[48+r9] > > + lea r9,[64+r9] > > +DB 102,68,15,56,0,227 > > + > > + movdqa XMMWORD[80+rsp],xmm1 > > + paddd xmm1,xmm4 > > + movdqa XMMWORD[112+rsp],xmm9 > > + paddd xmm9,xmm11 > > + movdqa XMMWORD[64+rsp],xmm0 > > + movdqa xmm2,xmm0 > > + movdqa XMMWORD[96+rsp],xmm8 > > + movdqa xmm10,xmm8 > > +DB 15,58,204,193,0 > > +DB 15,56,200,213 > > +DB 69,15,58,204,193,0 > > +DB 69,15,56,200,212 > > +DB 102,15,56,0,243 > > + 
prefetcht0 [127+r8] > > +DB 15,56,201,229 > > +DB 102,68,15,56,0,235 > > + prefetcht0 [127+r9] > > +DB 69,15,56,201,220 > > + > > +DB 102,15,56,0,251 > > + movdqa xmm1,xmm0 > > +DB 102,68,15,56,0,243 > > + movdqa xmm9,xmm8 > > +DB 15,58,204,194,0 > > +DB 15,56,200,206 > > +DB 69,15,58,204,194,0 > > +DB 69,15,56,200,205 > > + pxor xmm4,xmm6 > > +DB 15,56,201,238 > > + pxor xmm11,xmm13 > > +DB 69,15,56,201,229 > > + movdqa xmm2,xmm0 > > + movdqa xmm10,xmm8 > > +DB 15,58,204,193,0 > > +DB 15,56,200,215 > > +DB 69,15,58,204,193,0 > > +DB 69,15,56,200,214 > > +DB 15,56,202,231 > > +DB 69,15,56,202,222 > > + pxor xmm5,xmm7 > > +DB 15,56,201,247 > > + pxor xmm12,xmm14 > > +DB 69,15,56,201,238 > > + movdqa xmm1,xmm0 > > + movdqa xmm9,xmm8 > > +DB 15,58,204,194,0 > > +DB 15,56,200,204 > > +DB 69,15,58,204,194,0 > > +DB 69,15,56,200,203 > > +DB 15,56,202,236 > > +DB 69,15,56,202,227 > > + pxor xmm6,xmm4 > > +DB 15,56,201,252 > > + pxor xmm13,xmm11 > > +DB 69,15,56,201,243 > > + movdqa xmm2,xmm0 > > + movdqa xmm10,xmm8 > > +DB 15,58,204,193,0 > > +DB 15,56,200,213 > > +DB 69,15,58,204,193,0 > > +DB 69,15,56,200,212 > > +DB 15,56,202,245 > > +DB 69,15,56,202,236 > > + pxor xmm7,xmm5 > > +DB 15,56,201,229 > > + pxor xmm14,xmm12 > > +DB 69,15,56,201,220 > > + movdqa xmm1,xmm0 > > + movdqa xmm9,xmm8 > > +DB 15,58,204,194,1 > > +DB 15,56,200,206 > > +DB 69,15,58,204,194,1 > > +DB 69,15,56,200,205 > > +DB 15,56,202,254 > > +DB 69,15,56,202,245 > > + pxor xmm4,xmm6 > > +DB 15,56,201,238 > > + pxor xmm11,xmm13 > > +DB 69,15,56,201,229 > > + movdqa xmm2,xmm0 > > + movdqa xmm10,xmm8 > > +DB 15,58,204,193,1 > > +DB 15,56,200,215 > > +DB 69,15,58,204,193,1 > > +DB 69,15,56,200,214 > > +DB 15,56,202,231 > > +DB 69,15,56,202,222 > > + pxor xmm5,xmm7 > > +DB 15,56,201,247 > > + pxor xmm12,xmm14 > > +DB 69,15,56,201,238 > > + movdqa xmm1,xmm0 > > + movdqa xmm9,xmm8 > > +DB 15,58,204,194,1 > > +DB 15,56,200,204 > > +DB 69,15,58,204,194,1 > > +DB 69,15,56,200,203 > > +DB 15,56,202,236 > > +DB 69,15,56,202,227 > > + pxor xmm6,xmm4 > > +DB 15,56,201,252 > > + pxor xmm13,xmm11 > > +DB 69,15,56,201,243 > > + movdqa xmm2,xmm0 > > + movdqa xmm10,xmm8 > > +DB 15,58,204,193,1 > > +DB 15,56,200,213 > > +DB 69,15,58,204,193,1 > > +DB 69,15,56,200,212 > > +DB 15,56,202,245 > > +DB 69,15,56,202,236 > > + pxor xmm7,xmm5 > > +DB 15,56,201,229 > > + pxor xmm14,xmm12 > > +DB 69,15,56,201,220 > > + movdqa xmm1,xmm0 > > + movdqa xmm9,xmm8 > > +DB 15,58,204,194,1 > > +DB 15,56,200,206 > > +DB 69,15,58,204,194,1 > > +DB 69,15,56,200,205 > > +DB 15,56,202,254 > > +DB 69,15,56,202,245 > > + pxor xmm4,xmm6 > > +DB 15,56,201,238 > > + pxor xmm11,xmm13 > > +DB 69,15,56,201,229 > > + movdqa xmm2,xmm0 > > + movdqa xmm10,xmm8 > > +DB 15,58,204,193,2 > > +DB 15,56,200,215 > > +DB 69,15,58,204,193,2 > > +DB 69,15,56,200,214 > > +DB 15,56,202,231 > > +DB 69,15,56,202,222 > > + pxor xmm5,xmm7 > > +DB 15,56,201,247 > > + pxor xmm12,xmm14 > > +DB 69,15,56,201,238 > > + movdqa xmm1,xmm0 > > + movdqa xmm9,xmm8 > > +DB 15,58,204,194,2 > > +DB 15,56,200,204 > > +DB 69,15,58,204,194,2 > > +DB 69,15,56,200,203 > > +DB 15,56,202,236 > > +DB 69,15,56,202,227 > > + pxor xmm6,xmm4 > > +DB 15,56,201,252 > > + pxor xmm13,xmm11 > > +DB 69,15,56,201,243 > > + movdqa xmm2,xmm0 > > + movdqa xmm10,xmm8 > > +DB 15,58,204,193,2 > > +DB 15,56,200,213 > > +DB 69,15,58,204,193,2 > > +DB 69,15,56,200,212 > > +DB 15,56,202,245 > > +DB 69,15,56,202,236 > > + pxor xmm7,xmm5 > > +DB 15,56,201,229 > > + pxor xmm14,xmm12 > > +DB 69,15,56,201,220 > > + movdqa xmm1,xmm0 > > + 
movdqa xmm9,xmm8 > > +DB 15,58,204,194,2 > > +DB 15,56,200,206 > > +DB 69,15,58,204,194,2 > > +DB 69,15,56,200,205 > > +DB 15,56,202,254 > > +DB 69,15,56,202,245 > > + pxor xmm4,xmm6 > > +DB 15,56,201,238 > > + pxor xmm11,xmm13 > > +DB 69,15,56,201,229 > > + movdqa xmm2,xmm0 > > + movdqa xmm10,xmm8 > > +DB 15,58,204,193,2 > > +DB 15,56,200,215 > > +DB 69,15,58,204,193,2 > > +DB 69,15,56,200,214 > > +DB 15,56,202,231 > > +DB 69,15,56,202,222 > > + pxor xmm5,xmm7 > > +DB 15,56,201,247 > > + pxor xmm12,xmm14 > > +DB 69,15,56,201,238 > > + movdqa xmm1,xmm0 > > + movdqa xmm9,xmm8 > > +DB 15,58,204,194,3 > > +DB 15,56,200,204 > > +DB 69,15,58,204,194,3 > > +DB 69,15,56,200,203 > > +DB 15,56,202,236 > > +DB 69,15,56,202,227 > > + pxor xmm6,xmm4 > > +DB 15,56,201,252 > > + pxor xmm13,xmm11 > > +DB 69,15,56,201,243 > > + movdqa xmm2,xmm0 > > + movdqa xmm10,xmm8 > > +DB 15,58,204,193,3 > > +DB 15,56,200,213 > > +DB 69,15,58,204,193,3 > > +DB 69,15,56,200,212 > > +DB 15,56,202,245 > > +DB 69,15,56,202,236 > > + pxor xmm7,xmm5 > > + pxor xmm14,xmm12 > > + > > + mov ecx,1 > > + pxor xmm4,xmm4 > > + cmp ecx,DWORD[rbx] > > + cmovge r8,rsp > > + > > + movdqa xmm1,xmm0 > > + movdqa xmm9,xmm8 > > +DB 15,58,204,194,3 > > +DB 15,56,200,206 > > +DB 69,15,58,204,194,3 > > +DB 69,15,56,200,205 > > +DB 15,56,202,254 > > +DB 69,15,56,202,245 > > + > > + cmp ecx,DWORD[4+rbx] > > + cmovge r9,rsp > > + movq xmm6,QWORD[rbx] > > + > > + movdqa xmm2,xmm0 > > + movdqa xmm10,xmm8 > > +DB 15,58,204,193,3 > > +DB 15,56,200,215 > > +DB 69,15,58,204,193,3 > > +DB 69,15,56,200,214 > > + > > + pshufd xmm11,xmm6,0x00 > > + pshufd xmm12,xmm6,0x55 > > + movdqa xmm7,xmm6 > > + pcmpgtd xmm11,xmm4 > > + pcmpgtd xmm12,xmm4 > > + > > + movdqa xmm1,xmm0 > > + movdqa xmm9,xmm8 > > +DB 15,58,204,194,3 > > +DB 15,56,200,204 > > +DB 69,15,58,204,194,3 > > +DB 68,15,56,200,204 > > + > > + pcmpgtd xmm7,xmm4 > > + pand xmm0,xmm11 > > + pand xmm1,xmm11 > > + pand xmm8,xmm12 > > + pand xmm9,xmm12 > > + paddd xmm6,xmm7 > > + > > + paddd xmm0,XMMWORD[64+rsp] > > + paddd xmm1,XMMWORD[80+rsp] > > + paddd xmm8,XMMWORD[96+rsp] > > + paddd xmm9,XMMWORD[112+rsp] > > + > > + movq QWORD[rbx],xmm6 > > + dec edx > > + jnz NEAR $L$oop_shaext > > + > > + mov edx,DWORD[280+rsp] > > + > > + pshufd xmm0,xmm0,27 > > + pshufd xmm8,xmm8,27 > > + > > + movdqa xmm6,xmm0 > > + punpckldq xmm0,xmm8 > > + punpckhdq xmm6,xmm8 > > + punpckhdq xmm1,xmm9 > > + movq QWORD[(0-64)+rdi],xmm0 > > + psrldq xmm0,8 > > + movq QWORD[(64-64)+rdi],xmm6 > > + psrldq xmm6,8 > > + movq QWORD[(32-64)+rdi],xmm0 > > + psrldq xmm1,8 > > + movq QWORD[(96-64)+rdi],xmm6 > > + movq QWORD[(128-64)+rdi],xmm1 > > + > > + lea rdi,[8+rdi] > > + lea rsi,[32+rsi] > > + dec edx > > + jnz NEAR $L$oop_grande_shaext > > + > > +$L$done_shaext: > > + > > + movaps xmm6,XMMWORD[((-184))+rax] > > + movaps xmm7,XMMWORD[((-168))+rax] > > + movaps xmm8,XMMWORD[((-152))+rax] > > + movaps xmm9,XMMWORD[((-136))+rax] > > + movaps xmm10,XMMWORD[((-120))+rax] > > + movaps xmm11,XMMWORD[((-104))+rax] > > + movaps xmm12,XMMWORD[((-88))+rax] > > + movaps xmm13,XMMWORD[((-72))+rax] > > + movaps xmm14,XMMWORD[((-56))+rax] > > + movaps xmm15,XMMWORD[((-40))+rax] > > + mov rbp,QWORD[((-16))+rax] > > + > > + mov rbx,QWORD[((-8))+rax] > > + > > + lea rsp,[rax] > > + > > +$L$epilogue_shaext: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_sha1_multi_block_shaext: > > + > > +ALIGN 256 > > + DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999 > > + DD 
0x5a827999,0x5a827999,0x5a827999,0x5a827999 > > +K_XX_XX: > > + DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > > + DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > > + DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > > + DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > > + DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > > + DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > +DB > 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0 > > +DB 83,72,65,49,32,109,117,108,116,105,45,98,108,111,99,107 > > +DB 32,116,114,97,110,115,102,111,114,109,32,102,111,114,32,120 > > +DB 56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77 > > +DB 83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110 > > +DB 115,115,108,46,111,114,103,62,0 > > +EXTERN __imp_RtlVirtualUnwind > > + > > +ALIGN 16 > > +se_handler: > > + push rsi > > + push rdi > > + push rbx > > + push rbp > > + push r12 > > + push r13 > > + push r14 > > + push r15 > > + pushfq > > + sub rsp,64 > > + > > + mov rax,QWORD[120+r8] > > + mov rbx,QWORD[248+r8] > > + > > + mov rsi,QWORD[8+r9] > > + mov r11,QWORD[56+r9] > > + > > + mov r10d,DWORD[r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jb NEAR $L$in_prologue > > + > > + mov rax,QWORD[152+r8] > > + > > + mov r10d,DWORD[4+r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jae NEAR $L$in_prologue > > + > > + mov rax,QWORD[272+rax] > > + > > + mov rbx,QWORD[((-8))+rax] > > + mov rbp,QWORD[((-16))+rax] > > + mov QWORD[144+r8],rbx > > + mov QWORD[160+r8],rbp > > + > > + lea rsi,[((-24-160))+rax] > > + lea rdi,[512+r8] > > + mov ecx,20 > > + DD 0xa548f3fc > > + > > +$L$in_prologue: > > + mov rdi,QWORD[8+rax] > > + mov rsi,QWORD[16+rax] > > + mov QWORD[152+r8],rax > > + mov QWORD[168+r8],rsi > > + mov QWORD[176+r8],rdi > > + > > + mov rdi,QWORD[40+r9] > > + mov rsi,r8 > > + mov ecx,154 > > + DD 0xa548f3fc > > + > > + mov rsi,r9 > > + xor rcx,rcx > > + mov rdx,QWORD[8+rsi] > > + mov r8,QWORD[rsi] > > + mov r9,QWORD[16+rsi] > > + mov r10,QWORD[40+rsi] > > + lea r11,[56+rsi] > > + lea r12,[24+rsi] > > + mov QWORD[32+rsp],r10 > > + mov QWORD[40+rsp],r11 > > + mov QWORD[48+rsp],r12 > > + mov QWORD[56+rsp],rcx > > + call QWORD[__imp_RtlVirtualUnwind] > > + > > + mov eax,1 > > + add rsp,64 > > + popfq > > + pop r15 > > + pop r14 > > + pop r13 > > + pop r12 > > + pop rbp > > + pop rbx > > + pop rdi > > + pop rsi > > + DB 0F3h,0C3h ;repret > > + > > +section .pdata rdata align=4 > > +ALIGN 4 > > + DD $L$SEH_begin_sha1_multi_block wrt ..imagebase > > + DD $L$SEH_end_sha1_multi_block wrt ..imagebase > > + DD $L$SEH_info_sha1_multi_block wrt ..imagebase > > + DD $L$SEH_begin_sha1_multi_block_shaext > wrt ..imagebase > > + DD $L$SEH_end_sha1_multi_block_shaext > wrt ..imagebase > > + DD $L$SEH_info_sha1_multi_block_shaext > wrt ..imagebase > > +section .xdata rdata align=8 > > +ALIGN 8 > > +$L$SEH_info_sha1_multi_block: > > +DB 9,0,0,0 > > + DD se_handler wrt ..imagebase > > + DD $L$body wrt ..imagebase,$L$epilogue > wrt ..imagebase > > +$L$SEH_info_sha1_multi_block_shaext: > > +DB 9,0,0,0 > > + DD se_handler wrt ..imagebase > > + DD $L$body_shaext wrt ..imagebase,$L$epilogue_shaext > > wrt ..imagebase > > diff --git > a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm > > b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm > > new file mode 100644 > > index 0000000000..c6d68d348f > > --- /dev/null > > +++ 
b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm > > @@ -0,0 +1,2884 @@ > > +; WARNING: do not edit! > > +; Generated from openssl/crypto/sha/asm/sha1-x86_64.pl > > +; > > +; Copyright 2006-2020 The OpenSSL Project Authors. All Rights Reserved. > > +; > > +; Licensed under the OpenSSL license (the "License"). You may not use > > +; this file except in compliance with the License. You can obtain a copy > > +; in the file LICENSE in the source distribution or at > > +; https://www.openssl.org/source/license.html > > + > > +default rel > > +%define XMMWORD > > +%define YMMWORD > > +%define ZMMWORD > > +section .text code align=64 > > + > > +EXTERN OPENSSL_ia32cap_P > > + > > +global sha1_block_data_order > > + > > +ALIGN 16 > > +sha1_block_data_order: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_sha1_block_data_order: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + > > + > > + > > + mov r9d,DWORD[((OPENSSL_ia32cap_P+0))] > > + mov r8d,DWORD[((OPENSSL_ia32cap_P+4))] > > + mov r10d,DWORD[((OPENSSL_ia32cap_P+8))] > > + test r8d,512 > > + jz NEAR $L$ialu > > + test r10d,536870912 > > + jnz NEAR _shaext_shortcut > > + jmp NEAR _ssse3_shortcut > > + > > +ALIGN 16 > > +$L$ialu: > > + mov rax,rsp > > + > > + push rbx > > + > > + push rbp > > + > > + push r12 > > + > > + push r13 > > + > > + push r14 > > + > > + mov r8,rdi > > + sub rsp,72 > > + mov r9,rsi > > + and rsp,-64 > > + mov r10,rdx > > + mov QWORD[64+rsp],rax > > + > > +$L$prologue: > > + > > + mov esi,DWORD[r8] > > + mov edi,DWORD[4+r8] > > + mov r11d,DWORD[8+r8] > > + mov r12d,DWORD[12+r8] > > + mov r13d,DWORD[16+r8] > > + jmp NEAR $L$loop > > + > > +ALIGN 16 > > +$L$loop: > > + mov edx,DWORD[r9] > > + bswap edx > > + mov ebp,DWORD[4+r9] > > + mov eax,r12d > > + mov DWORD[rsp],edx > > + mov ecx,esi > > + bswap ebp > > + xor eax,r11d > > + rol ecx,5 > > + and eax,edi > > + lea r13d,[1518500249+r13*1+rdx] > > + add r13d,ecx > > + xor eax,r12d > > + rol edi,30 > > + add r13d,eax > > + mov r14d,DWORD[8+r9] > > + mov eax,r11d > > + mov DWORD[4+rsp],ebp > > + mov ecx,r13d > > + bswap r14d > > + xor eax,edi > > + rol ecx,5 > > + and eax,esi > > + lea r12d,[1518500249+r12*1+rbp] > > + add r12d,ecx > > + xor eax,r11d > > + rol esi,30 > > + add r12d,eax > > + mov edx,DWORD[12+r9] > > + mov eax,edi > > + mov DWORD[8+rsp],r14d > > + mov ecx,r12d > > + bswap edx > > + xor eax,esi > > + rol ecx,5 > > + and eax,r13d > > + lea r11d,[1518500249+r11*1+r14] > > + add r11d,ecx > > + xor eax,edi > > + rol r13d,30 > > + add r11d,eax > > + mov ebp,DWORD[16+r9] > > + mov eax,esi > > + mov DWORD[12+rsp],edx > > + mov ecx,r11d > > + bswap ebp > > + xor eax,r13d > > + rol ecx,5 > > + and eax,r12d > > + lea edi,[1518500249+rdi*1+rdx] > > + add edi,ecx > > + xor eax,esi > > + rol r12d,30 > > + add edi,eax > > + mov r14d,DWORD[20+r9] > > + mov eax,r13d > > + mov DWORD[16+rsp],ebp > > + mov ecx,edi > > + bswap r14d > > + xor eax,r12d > > + rol ecx,5 > > + and eax,r11d > > + lea esi,[1518500249+rsi*1+rbp] > > + add esi,ecx > > + xor eax,r13d > > + rol r11d,30 > > + add esi,eax > > + mov edx,DWORD[24+r9] > > + mov eax,r12d > > + mov DWORD[20+rsp],r14d > > + mov ecx,esi > > + bswap edx > > + xor eax,r11d > > + rol ecx,5 > > + and eax,edi > > + lea r13d,[1518500249+r13*1+r14] > > + add r13d,ecx > > + xor eax,r12d > > + rol edi,30 > > + add r13d,eax > > + mov ebp,DWORD[28+r9] > > + mov eax,r11d > > + mov DWORD[24+rsp],edx > > + mov ecx,r13d > > + bswap ebp > > + xor eax,edi > > + rol ecx,5 
> > + and eax,esi > > + lea r12d,[1518500249+r12*1+rdx] > > + add r12d,ecx > > + xor eax,r11d > > + rol esi,30 > > + add r12d,eax > > + mov r14d,DWORD[32+r9] > > + mov eax,edi > > + mov DWORD[28+rsp],ebp > > + mov ecx,r12d > > + bswap r14d > > + xor eax,esi > > + rol ecx,5 > > + and eax,r13d > > + lea r11d,[1518500249+r11*1+rbp] > > + add r11d,ecx > > + xor eax,edi > > + rol r13d,30 > > + add r11d,eax > > + mov edx,DWORD[36+r9] > > + mov eax,esi > > + mov DWORD[32+rsp],r14d > > + mov ecx,r11d > > + bswap edx > > + xor eax,r13d > > + rol ecx,5 > > + and eax,r12d > > + lea edi,[1518500249+rdi*1+r14] > > + add edi,ecx > > + xor eax,esi > > + rol r12d,30 > > + add edi,eax > > + mov ebp,DWORD[40+r9] > > + mov eax,r13d > > + mov DWORD[36+rsp],edx > > + mov ecx,edi > > + bswap ebp > > + xor eax,r12d > > + rol ecx,5 > > + and eax,r11d > > + lea esi,[1518500249+rsi*1+rdx] > > + add esi,ecx > > + xor eax,r13d > > + rol r11d,30 > > + add esi,eax > > + mov r14d,DWORD[44+r9] > > + mov eax,r12d > > + mov DWORD[40+rsp],ebp > > + mov ecx,esi > > + bswap r14d > > + xor eax,r11d > > + rol ecx,5 > > + and eax,edi > > + lea r13d,[1518500249+r13*1+rbp] > > + add r13d,ecx > > + xor eax,r12d > > + rol edi,30 > > + add r13d,eax > > + mov edx,DWORD[48+r9] > > + mov eax,r11d > > + mov DWORD[44+rsp],r14d > > + mov ecx,r13d > > + bswap edx > > + xor eax,edi > > + rol ecx,5 > > + and eax,esi > > + lea r12d,[1518500249+r12*1+r14] > > + add r12d,ecx > > + xor eax,r11d > > + rol esi,30 > > + add r12d,eax > > + mov ebp,DWORD[52+r9] > > + mov eax,edi > > + mov DWORD[48+rsp],edx > > + mov ecx,r12d > > + bswap ebp > > + xor eax,esi > > + rol ecx,5 > > + and eax,r13d > > + lea r11d,[1518500249+r11*1+rdx] > > + add r11d,ecx > > + xor eax,edi > > + rol r13d,30 > > + add r11d,eax > > + mov r14d,DWORD[56+r9] > > + mov eax,esi > > + mov DWORD[52+rsp],ebp > > + mov ecx,r11d > > + bswap r14d > > + xor eax,r13d > > + rol ecx,5 > > + and eax,r12d > > + lea edi,[1518500249+rdi*1+rbp] > > + add edi,ecx > > + xor eax,esi > > + rol r12d,30 > > + add edi,eax > > + mov edx,DWORD[60+r9] > > + mov eax,r13d > > + mov DWORD[56+rsp],r14d > > + mov ecx,edi > > + bswap edx > > + xor eax,r12d > > + rol ecx,5 > > + and eax,r11d > > + lea esi,[1518500249+rsi*1+r14] > > + add esi,ecx > > + xor eax,r13d > > + rol r11d,30 > > + add esi,eax > > + xor ebp,DWORD[rsp] > > + mov eax,r12d > > + mov DWORD[60+rsp],edx > > + mov ecx,esi > > + xor ebp,DWORD[8+rsp] > > + xor eax,r11d > > + rol ecx,5 > > + xor ebp,DWORD[32+rsp] > > + and eax,edi > > + lea r13d,[1518500249+r13*1+rdx] > > + rol edi,30 > > + xor eax,r12d > > + add r13d,ecx > > + rol ebp,1 > > + add r13d,eax > > + xor r14d,DWORD[4+rsp] > > + mov eax,r11d > > + mov DWORD[rsp],ebp > > + mov ecx,r13d > > + xor r14d,DWORD[12+rsp] > > + xor eax,edi > > + rol ecx,5 > > + xor r14d,DWORD[36+rsp] > > + and eax,esi > > + lea r12d,[1518500249+r12*1+rbp] > > + rol esi,30 > > + xor eax,r11d > > + add r12d,ecx > > + rol r14d,1 > > + add r12d,eax > > + xor edx,DWORD[8+rsp] > > + mov eax,edi > > + mov DWORD[4+rsp],r14d > > + mov ecx,r12d > > + xor edx,DWORD[16+rsp] > > + xor eax,esi > > + rol ecx,5 > > + xor edx,DWORD[40+rsp] > > + and eax,r13d > > + lea r11d,[1518500249+r11*1+r14] > > + rol r13d,30 > > + xor eax,edi > > + add r11d,ecx > > + rol edx,1 > > + add r11d,eax > > + xor ebp,DWORD[12+rsp] > > + mov eax,esi > > + mov DWORD[8+rsp],edx > > + mov ecx,r11d > > + xor ebp,DWORD[20+rsp] > > + xor eax,r13d > > + rol ecx,5 > > + xor ebp,DWORD[44+rsp] > > + and eax,r12d > > + lea edi,[1518500249+rdi*1+rdx] > > + rol 
r12d,30 > > + xor eax,esi > > + add edi,ecx > > + rol ebp,1 > > + add edi,eax > > + xor r14d,DWORD[16+rsp] > > + mov eax,r13d > > + mov DWORD[12+rsp],ebp > > + mov ecx,edi > > + xor r14d,DWORD[24+rsp] > > + xor eax,r12d > > + rol ecx,5 > > + xor r14d,DWORD[48+rsp] > > + and eax,r11d > > + lea esi,[1518500249+rsi*1+rbp] > > + rol r11d,30 > > + xor eax,r13d > > + add esi,ecx > > + rol r14d,1 > > + add esi,eax > > + xor edx,DWORD[20+rsp] > > + mov eax,edi > > + mov DWORD[16+rsp],r14d > > + mov ecx,esi > > + xor edx,DWORD[28+rsp] > > + xor eax,r12d > > + rol ecx,5 > > + xor edx,DWORD[52+rsp] > > + lea r13d,[1859775393+r13*1+r14] > > + xor eax,r11d > > + add r13d,ecx > > + rol edi,30 > > + add r13d,eax > > + rol edx,1 > > + xor ebp,DWORD[24+rsp] > > + mov eax,esi > > + mov DWORD[20+rsp],edx > > + mov ecx,r13d > > + xor ebp,DWORD[32+rsp] > > + xor eax,r11d > > + rol ecx,5 > > + xor ebp,DWORD[56+rsp] > > + lea r12d,[1859775393+r12*1+rdx] > > + xor eax,edi > > + add r12d,ecx > > + rol esi,30 > > + add r12d,eax > > + rol ebp,1 > > + xor r14d,DWORD[28+rsp] > > + mov eax,r13d > > + mov DWORD[24+rsp],ebp > > + mov ecx,r12d > > + xor r14d,DWORD[36+rsp] > > + xor eax,edi > > + rol ecx,5 > > + xor r14d,DWORD[60+rsp] > > + lea r11d,[1859775393+r11*1+rbp] > > + xor eax,esi > > + add r11d,ecx > > + rol r13d,30 > > + add r11d,eax > > + rol r14d,1 > > + xor edx,DWORD[32+rsp] > > + mov eax,r12d > > + mov DWORD[28+rsp],r14d > > + mov ecx,r11d > > + xor edx,DWORD[40+rsp] > > + xor eax,esi > > + rol ecx,5 > > + xor edx,DWORD[rsp] > > + lea edi,[1859775393+rdi*1+r14] > > + xor eax,r13d > > + add edi,ecx > > + rol r12d,30 > > + add edi,eax > > + rol edx,1 > > + xor ebp,DWORD[36+rsp] > > + mov eax,r11d > > + mov DWORD[32+rsp],edx > > + mov ecx,edi > > + xor ebp,DWORD[44+rsp] > > + xor eax,r13d > > + rol ecx,5 > > + xor ebp,DWORD[4+rsp] > > + lea esi,[1859775393+rsi*1+rdx] > > + xor eax,r12d > > + add esi,ecx > > + rol r11d,30 > > + add esi,eax > > + rol ebp,1 > > + xor r14d,DWORD[40+rsp] > > + mov eax,edi > > + mov DWORD[36+rsp],ebp > > + mov ecx,esi > > + xor r14d,DWORD[48+rsp] > > + xor eax,r12d > > + rol ecx,5 > > + xor r14d,DWORD[8+rsp] > > + lea r13d,[1859775393+r13*1+rbp] > > + xor eax,r11d > > + add r13d,ecx > > + rol edi,30 > > + add r13d,eax > > + rol r14d,1 > > + xor edx,DWORD[44+rsp] > > + mov eax,esi > > + mov DWORD[40+rsp],r14d > > + mov ecx,r13d > > + xor edx,DWORD[52+rsp] > > + xor eax,r11d > > + rol ecx,5 > > + xor edx,DWORD[12+rsp] > > + lea r12d,[1859775393+r12*1+r14] > > + xor eax,edi > > + add r12d,ecx > > + rol esi,30 > > + add r12d,eax > > + rol edx,1 > > + xor ebp,DWORD[48+rsp] > > + mov eax,r13d > > + mov DWORD[44+rsp],edx > > + mov ecx,r12d > > + xor ebp,DWORD[56+rsp] > > + xor eax,edi > > + rol ecx,5 > > + xor ebp,DWORD[16+rsp] > > + lea r11d,[1859775393+r11*1+rdx] > > + xor eax,esi > > + add r11d,ecx > > + rol r13d,30 > > + add r11d,eax > > + rol ebp,1 > > + xor r14d,DWORD[52+rsp] > > + mov eax,r12d > > + mov DWORD[48+rsp],ebp > > + mov ecx,r11d > > + xor r14d,DWORD[60+rsp] > > + xor eax,esi > > + rol ecx,5 > > + xor r14d,DWORD[20+rsp] > > + lea edi,[1859775393+rdi*1+rbp] > > + xor eax,r13d > > + add edi,ecx > > + rol r12d,30 > > + add edi,eax > > + rol r14d,1 > > + xor edx,DWORD[56+rsp] > > + mov eax,r11d > > + mov DWORD[52+rsp],r14d > > + mov ecx,edi > > + xor edx,DWORD[rsp] > > + xor eax,r13d > > + rol ecx,5 > > + xor edx,DWORD[24+rsp] > > + lea esi,[1859775393+rsi*1+r14] > > + xor eax,r12d > > + add esi,ecx > > + rol r11d,30 > > + add esi,eax > > + rol edx,1 > > + xor ebp,DWORD[60+rsp] 
> > + mov eax,edi > > + mov DWORD[56+rsp],edx > > + mov ecx,esi > > + xor ebp,DWORD[4+rsp] > > + xor eax,r12d > > + rol ecx,5 > > + xor ebp,DWORD[28+rsp] > > + lea r13d,[1859775393+r13*1+rdx] > > + xor eax,r11d > > + add r13d,ecx > > + rol edi,30 > > + add r13d,eax > > + rol ebp,1 > > + xor r14d,DWORD[rsp] > > + mov eax,esi > > + mov DWORD[60+rsp],ebp > > + mov ecx,r13d > > + xor r14d,DWORD[8+rsp] > > + xor eax,r11d > > + rol ecx,5 > > + xor r14d,DWORD[32+rsp] > > + lea r12d,[1859775393+r12*1+rbp] > > + xor eax,edi > > + add r12d,ecx > > + rol esi,30 > > + add r12d,eax > > + rol r14d,1 > > + xor edx,DWORD[4+rsp] > > + mov eax,r13d > > + mov DWORD[rsp],r14d > > + mov ecx,r12d > > + xor edx,DWORD[12+rsp] > > + xor eax,edi > > + rol ecx,5 > > + xor edx,DWORD[36+rsp] > > + lea r11d,[1859775393+r11*1+r14] > > + xor eax,esi > > + add r11d,ecx > > + rol r13d,30 > > + add r11d,eax > > + rol edx,1 > > + xor ebp,DWORD[8+rsp] > > + mov eax,r12d > > + mov DWORD[4+rsp],edx > > + mov ecx,r11d > > + xor ebp,DWORD[16+rsp] > > + xor eax,esi > > + rol ecx,5 > > + xor ebp,DWORD[40+rsp] > > + lea edi,[1859775393+rdi*1+rdx] > > + xor eax,r13d > > + add edi,ecx > > + rol r12d,30 > > + add edi,eax > > + rol ebp,1 > > + xor r14d,DWORD[12+rsp] > > + mov eax,r11d > > + mov DWORD[8+rsp],ebp > > + mov ecx,edi > > + xor r14d,DWORD[20+rsp] > > + xor eax,r13d > > + rol ecx,5 > > + xor r14d,DWORD[44+rsp] > > + lea esi,[1859775393+rsi*1+rbp] > > + xor eax,r12d > > + add esi,ecx > > + rol r11d,30 > > + add esi,eax > > + rol r14d,1 > > + xor edx,DWORD[16+rsp] > > + mov eax,edi > > + mov DWORD[12+rsp],r14d > > + mov ecx,esi > > + xor edx,DWORD[24+rsp] > > + xor eax,r12d > > + rol ecx,5 > > + xor edx,DWORD[48+rsp] > > + lea r13d,[1859775393+r13*1+r14] > > + xor eax,r11d > > + add r13d,ecx > > + rol edi,30 > > + add r13d,eax > > + rol edx,1 > > + xor ebp,DWORD[20+rsp] > > + mov eax,esi > > + mov DWORD[16+rsp],edx > > + mov ecx,r13d > > + xor ebp,DWORD[28+rsp] > > + xor eax,r11d > > + rol ecx,5 > > + xor ebp,DWORD[52+rsp] > > + lea r12d,[1859775393+r12*1+rdx] > > + xor eax,edi > > + add r12d,ecx > > + rol esi,30 > > + add r12d,eax > > + rol ebp,1 > > + xor r14d,DWORD[24+rsp] > > + mov eax,r13d > > + mov DWORD[20+rsp],ebp > > + mov ecx,r12d > > + xor r14d,DWORD[32+rsp] > > + xor eax,edi > > + rol ecx,5 > > + xor r14d,DWORD[56+rsp] > > + lea r11d,[1859775393+r11*1+rbp] > > + xor eax,esi > > + add r11d,ecx > > + rol r13d,30 > > + add r11d,eax > > + rol r14d,1 > > + xor edx,DWORD[28+rsp] > > + mov eax,r12d > > + mov DWORD[24+rsp],r14d > > + mov ecx,r11d > > + xor edx,DWORD[36+rsp] > > + xor eax,esi > > + rol ecx,5 > > + xor edx,DWORD[60+rsp] > > + lea edi,[1859775393+rdi*1+r14] > > + xor eax,r13d > > + add edi,ecx > > + rol r12d,30 > > + add edi,eax > > + rol edx,1 > > + xor ebp,DWORD[32+rsp] > > + mov eax,r11d > > + mov DWORD[28+rsp],edx > > + mov ecx,edi > > + xor ebp,DWORD[40+rsp] > > + xor eax,r13d > > + rol ecx,5 > > + xor ebp,DWORD[rsp] > > + lea esi,[1859775393+rsi*1+rdx] > > + xor eax,r12d > > + add esi,ecx > > + rol r11d,30 > > + add esi,eax > > + rol ebp,1 > > + xor r14d,DWORD[36+rsp] > > + mov eax,r12d > > + mov DWORD[32+rsp],ebp > > + mov ebx,r12d > > + xor r14d,DWORD[44+rsp] > > + and eax,r11d > > + mov ecx,esi > > + xor r14d,DWORD[4+rsp] > > + lea r13d,[((-1894007588))+r13*1+rbp] > > + xor ebx,r11d > > + rol ecx,5 > > + add r13d,eax > > + rol r14d,1 > > + and ebx,edi > > + add r13d,ecx > > + rol edi,30 > > + add r13d,ebx > > + xor edx,DWORD[40+rsp] > > + mov eax,r11d > > + mov DWORD[36+rsp],r14d > > + mov ebx,r11d > > 
+ xor edx,DWORD[48+rsp] > > + and eax,edi > > + mov ecx,r13d > > + xor edx,DWORD[8+rsp] > > + lea r12d,[((-1894007588))+r12*1+r14] > > + xor ebx,edi > > + rol ecx,5 > > + add r12d,eax > > + rol edx,1 > > + and ebx,esi > > + add r12d,ecx > > + rol esi,30 > > + add r12d,ebx > > + xor ebp,DWORD[44+rsp] > > + mov eax,edi > > + mov DWORD[40+rsp],edx > > + mov ebx,edi > > + xor ebp,DWORD[52+rsp] > > + and eax,esi > > + mov ecx,r12d > > + xor ebp,DWORD[12+rsp] > > + lea r11d,[((-1894007588))+r11*1+rdx] > > + xor ebx,esi > > + rol ecx,5 > > + add r11d,eax > > + rol ebp,1 > > + and ebx,r13d > > + add r11d,ecx > > + rol r13d,30 > > + add r11d,ebx > > + xor r14d,DWORD[48+rsp] > > + mov eax,esi > > + mov DWORD[44+rsp],ebp > > + mov ebx,esi > > + xor r14d,DWORD[56+rsp] > > + and eax,r13d > > + mov ecx,r11d > > + xor r14d,DWORD[16+rsp] > > + lea edi,[((-1894007588))+rdi*1+rbp] > > + xor ebx,r13d > > + rol ecx,5 > > + add edi,eax > > + rol r14d,1 > > + and ebx,r12d > > + add edi,ecx > > + rol r12d,30 > > + add edi,ebx > > + xor edx,DWORD[52+rsp] > > + mov eax,r13d > > + mov DWORD[48+rsp],r14d > > + mov ebx,r13d > > + xor edx,DWORD[60+rsp] > > + and eax,r12d > > + mov ecx,edi > > + xor edx,DWORD[20+rsp] > > + lea esi,[((-1894007588))+rsi*1+r14] > > + xor ebx,r12d > > + rol ecx,5 > > + add esi,eax > > + rol edx,1 > > + and ebx,r11d > > + add esi,ecx > > + rol r11d,30 > > + add esi,ebx > > + xor ebp,DWORD[56+rsp] > > + mov eax,r12d > > + mov DWORD[52+rsp],edx > > + mov ebx,r12d > > + xor ebp,DWORD[rsp] > > + and eax,r11d > > + mov ecx,esi > > + xor ebp,DWORD[24+rsp] > > + lea r13d,[((-1894007588))+r13*1+rdx] > > + xor ebx,r11d > > + rol ecx,5 > > + add r13d,eax > > + rol ebp,1 > > + and ebx,edi > > + add r13d,ecx > > + rol edi,30 > > + add r13d,ebx > > + xor r14d,DWORD[60+rsp] > > + mov eax,r11d > > + mov DWORD[56+rsp],ebp > > + mov ebx,r11d > > + xor r14d,DWORD[4+rsp] > > + and eax,edi > > + mov ecx,r13d > > + xor r14d,DWORD[28+rsp] > > + lea r12d,[((-1894007588))+r12*1+rbp] > > + xor ebx,edi > > + rol ecx,5 > > + add r12d,eax > > + rol r14d,1 > > + and ebx,esi > > + add r12d,ecx > > + rol esi,30 > > + add r12d,ebx > > + xor edx,DWORD[rsp] > > + mov eax,edi > > + mov DWORD[60+rsp],r14d > > + mov ebx,edi > > + xor edx,DWORD[8+rsp] > > + and eax,esi > > + mov ecx,r12d > > + xor edx,DWORD[32+rsp] > > + lea r11d,[((-1894007588))+r11*1+r14] > > + xor ebx,esi > > + rol ecx,5 > > + add r11d,eax > > + rol edx,1 > > + and ebx,r13d > > + add r11d,ecx > > + rol r13d,30 > > + add r11d,ebx > > + xor ebp,DWORD[4+rsp] > > + mov eax,esi > > + mov DWORD[rsp],edx > > + mov ebx,esi > > + xor ebp,DWORD[12+rsp] > > + and eax,r13d > > + mov ecx,r11d > > + xor ebp,DWORD[36+rsp] > > + lea edi,[((-1894007588))+rdi*1+rdx] > > + xor ebx,r13d > > + rol ecx,5 > > + add edi,eax > > + rol ebp,1 > > + and ebx,r12d > > + add edi,ecx > > + rol r12d,30 > > + add edi,ebx > > + xor r14d,DWORD[8+rsp] > > + mov eax,r13d > > + mov DWORD[4+rsp],ebp > > + mov ebx,r13d > > + xor r14d,DWORD[16+rsp] > > + and eax,r12d > > + mov ecx,edi > > + xor r14d,DWORD[40+rsp] > > + lea esi,[((-1894007588))+rsi*1+rbp] > > + xor ebx,r12d > > + rol ecx,5 > > + add esi,eax > > + rol r14d,1 > > + and ebx,r11d > > + add esi,ecx > > + rol r11d,30 > > + add esi,ebx > > + xor edx,DWORD[12+rsp] > > + mov eax,r12d > > + mov DWORD[8+rsp],r14d > > + mov ebx,r12d > > + xor edx,DWORD[20+rsp] > > + and eax,r11d > > + mov ecx,esi > > + xor edx,DWORD[44+rsp] > > + lea r13d,[((-1894007588))+r13*1+r14] > > + xor ebx,r11d > > + rol ecx,5 > > + add r13d,eax > > + rol edx,1 > > + and 
ebx,edi > > + add r13d,ecx > > + rol edi,30 > > + add r13d,ebx > > + xor ebp,DWORD[16+rsp] > > + mov eax,r11d > > + mov DWORD[12+rsp],edx > > + mov ebx,r11d > > + xor ebp,DWORD[24+rsp] > > + and eax,edi > > + mov ecx,r13d > > + xor ebp,DWORD[48+rsp] > > + lea r12d,[((-1894007588))+r12*1+rdx] > > + xor ebx,edi > > + rol ecx,5 > > + add r12d,eax > > + rol ebp,1 > > + and ebx,esi > > + add r12d,ecx > > + rol esi,30 > > + add r12d,ebx > > + xor r14d,DWORD[20+rsp] > > + mov eax,edi > > + mov DWORD[16+rsp],ebp > > + mov ebx,edi > > + xor r14d,DWORD[28+rsp] > > + and eax,esi > > + mov ecx,r12d > > + xor r14d,DWORD[52+rsp] > > + lea r11d,[((-1894007588))+r11*1+rbp] > > + xor ebx,esi > > + rol ecx,5 > > + add r11d,eax > > + rol r14d,1 > > + and ebx,r13d > > + add r11d,ecx > > + rol r13d,30 > > + add r11d,ebx > > + xor edx,DWORD[24+rsp] > > + mov eax,esi > > + mov DWORD[20+rsp],r14d > > + mov ebx,esi > > + xor edx,DWORD[32+rsp] > > + and eax,r13d > > + mov ecx,r11d > > + xor edx,DWORD[56+rsp] > > + lea edi,[((-1894007588))+rdi*1+r14] > > + xor ebx,r13d > > + rol ecx,5 > > + add edi,eax > > + rol edx,1 > > + and ebx,r12d > > + add edi,ecx > > + rol r12d,30 > > + add edi,ebx > > + xor ebp,DWORD[28+rsp] > > + mov eax,r13d > > + mov DWORD[24+rsp],edx > > + mov ebx,r13d > > + xor ebp,DWORD[36+rsp] > > + and eax,r12d > > + mov ecx,edi > > + xor ebp,DWORD[60+rsp] > > + lea esi,[((-1894007588))+rsi*1+rdx] > > + xor ebx,r12d > > + rol ecx,5 > > + add esi,eax > > + rol ebp,1 > > + and ebx,r11d > > + add esi,ecx > > + rol r11d,30 > > + add esi,ebx > > + xor r14d,DWORD[32+rsp] > > + mov eax,r12d > > + mov DWORD[28+rsp],ebp > > + mov ebx,r12d > > + xor r14d,DWORD[40+rsp] > > + and eax,r11d > > + mov ecx,esi > > + xor r14d,DWORD[rsp] > > + lea r13d,[((-1894007588))+r13*1+rbp] > > + xor ebx,r11d > > + rol ecx,5 > > + add r13d,eax > > + rol r14d,1 > > + and ebx,edi > > + add r13d,ecx > > + rol edi,30 > > + add r13d,ebx > > + xor edx,DWORD[36+rsp] > > + mov eax,r11d > > + mov DWORD[32+rsp],r14d > > + mov ebx,r11d > > + xor edx,DWORD[44+rsp] > > + and eax,edi > > + mov ecx,r13d > > + xor edx,DWORD[4+rsp] > > + lea r12d,[((-1894007588))+r12*1+r14] > > + xor ebx,edi > > + rol ecx,5 > > + add r12d,eax > > + rol edx,1 > > + and ebx,esi > > + add r12d,ecx > > + rol esi,30 > > + add r12d,ebx > > + xor ebp,DWORD[40+rsp] > > + mov eax,edi > > + mov DWORD[36+rsp],edx > > + mov ebx,edi > > + xor ebp,DWORD[48+rsp] > > + and eax,esi > > + mov ecx,r12d > > + xor ebp,DWORD[8+rsp] > > + lea r11d,[((-1894007588))+r11*1+rdx] > > + xor ebx,esi > > + rol ecx,5 > > + add r11d,eax > > + rol ebp,1 > > + and ebx,r13d > > + add r11d,ecx > > + rol r13d,30 > > + add r11d,ebx > > + xor r14d,DWORD[44+rsp] > > + mov eax,esi > > + mov DWORD[40+rsp],ebp > > + mov ebx,esi > > + xor r14d,DWORD[52+rsp] > > + and eax,r13d > > + mov ecx,r11d > > + xor r14d,DWORD[12+rsp] > > + lea edi,[((-1894007588))+rdi*1+rbp] > > + xor ebx,r13d > > + rol ecx,5 > > + add edi,eax > > + rol r14d,1 > > + and ebx,r12d > > + add edi,ecx > > + rol r12d,30 > > + add edi,ebx > > + xor edx,DWORD[48+rsp] > > + mov eax,r13d > > + mov DWORD[44+rsp],r14d > > + mov ebx,r13d > > + xor edx,DWORD[56+rsp] > > + and eax,r12d > > + mov ecx,edi > > + xor edx,DWORD[16+rsp] > > + lea esi,[((-1894007588))+rsi*1+r14] > > + xor ebx,r12d > > + rol ecx,5 > > + add esi,eax > > + rol edx,1 > > + and ebx,r11d > > + add esi,ecx > > + rol r11d,30 > > + add esi,ebx > > + xor ebp,DWORD[52+rsp] > > + mov eax,edi > > + mov DWORD[48+rsp],edx > > + mov ecx,esi > > + xor ebp,DWORD[60+rsp] > > + xor eax,r12d 
> > + rol ecx,5 > > + xor ebp,DWORD[20+rsp] > > + lea r13d,[((-899497514))+r13*1+rdx] > > + xor eax,r11d > > + add r13d,ecx > > + rol edi,30 > > + add r13d,eax > > + rol ebp,1 > > + xor r14d,DWORD[56+rsp] > > + mov eax,esi > > + mov DWORD[52+rsp],ebp > > + mov ecx,r13d > > + xor r14d,DWORD[rsp] > > + xor eax,r11d > > + rol ecx,5 > > + xor r14d,DWORD[24+rsp] > > + lea r12d,[((-899497514))+r12*1+rbp] > > + xor eax,edi > > + add r12d,ecx > > + rol esi,30 > > + add r12d,eax > > + rol r14d,1 > > + xor edx,DWORD[60+rsp] > > + mov eax,r13d > > + mov DWORD[56+rsp],r14d > > + mov ecx,r12d > > + xor edx,DWORD[4+rsp] > > + xor eax,edi > > + rol ecx,5 > > + xor edx,DWORD[28+rsp] > > + lea r11d,[((-899497514))+r11*1+r14] > > + xor eax,esi > > + add r11d,ecx > > + rol r13d,30 > > + add r11d,eax > > + rol edx,1 > > + xor ebp,DWORD[rsp] > > + mov eax,r12d > > + mov DWORD[60+rsp],edx > > + mov ecx,r11d > > + xor ebp,DWORD[8+rsp] > > + xor eax,esi > > + rol ecx,5 > > + xor ebp,DWORD[32+rsp] > > + lea edi,[((-899497514))+rdi*1+rdx] > > + xor eax,r13d > > + add edi,ecx > > + rol r12d,30 > > + add edi,eax > > + rol ebp,1 > > + xor r14d,DWORD[4+rsp] > > + mov eax,r11d > > + mov DWORD[rsp],ebp > > + mov ecx,edi > > + xor r14d,DWORD[12+rsp] > > + xor eax,r13d > > + rol ecx,5 > > + xor r14d,DWORD[36+rsp] > > + lea esi,[((-899497514))+rsi*1+rbp] > > + xor eax,r12d > > + add esi,ecx > > + rol r11d,30 > > + add esi,eax > > + rol r14d,1 > > + xor edx,DWORD[8+rsp] > > + mov eax,edi > > + mov DWORD[4+rsp],r14d > > + mov ecx,esi > > + xor edx,DWORD[16+rsp] > > + xor eax,r12d > > + rol ecx,5 > > + xor edx,DWORD[40+rsp] > > + lea r13d,[((-899497514))+r13*1+r14] > > + xor eax,r11d > > + add r13d,ecx > > + rol edi,30 > > + add r13d,eax > > + rol edx,1 > > + xor ebp,DWORD[12+rsp] > > + mov eax,esi > > + mov DWORD[8+rsp],edx > > + mov ecx,r13d > > + xor ebp,DWORD[20+rsp] > > + xor eax,r11d > > + rol ecx,5 > > + xor ebp,DWORD[44+rsp] > > + lea r12d,[((-899497514))+r12*1+rdx] > > + xor eax,edi > > + add r12d,ecx > > + rol esi,30 > > + add r12d,eax > > + rol ebp,1 > > + xor r14d,DWORD[16+rsp] > > + mov eax,r13d > > + mov DWORD[12+rsp],ebp > > + mov ecx,r12d > > + xor r14d,DWORD[24+rsp] > > + xor eax,edi > > + rol ecx,5 > > + xor r14d,DWORD[48+rsp] > > + lea r11d,[((-899497514))+r11*1+rbp] > > + xor eax,esi > > + add r11d,ecx > > + rol r13d,30 > > + add r11d,eax > > + rol r14d,1 > > + xor edx,DWORD[20+rsp] > > + mov eax,r12d > > + mov DWORD[16+rsp],r14d > > + mov ecx,r11d > > + xor edx,DWORD[28+rsp] > > + xor eax,esi > > + rol ecx,5 > > + xor edx,DWORD[52+rsp] > > + lea edi,[((-899497514))+rdi*1+r14] > > + xor eax,r13d > > + add edi,ecx > > + rol r12d,30 > > + add edi,eax > > + rol edx,1 > > + xor ebp,DWORD[24+rsp] > > + mov eax,r11d > > + mov DWORD[20+rsp],edx > > + mov ecx,edi > > + xor ebp,DWORD[32+rsp] > > + xor eax,r13d > > + rol ecx,5 > > + xor ebp,DWORD[56+rsp] > > + lea esi,[((-899497514))+rsi*1+rdx] > > + xor eax,r12d > > + add esi,ecx > > + rol r11d,30 > > + add esi,eax > > + rol ebp,1 > > + xor r14d,DWORD[28+rsp] > > + mov eax,edi > > + mov DWORD[24+rsp],ebp > > + mov ecx,esi > > + xor r14d,DWORD[36+rsp] > > + xor eax,r12d > > + rol ecx,5 > > + xor r14d,DWORD[60+rsp] > > + lea r13d,[((-899497514))+r13*1+rbp] > > + xor eax,r11d > > + add r13d,ecx > > + rol edi,30 > > + add r13d,eax > > + rol r14d,1 > > + xor edx,DWORD[32+rsp] > > + mov eax,esi > > + mov DWORD[28+rsp],r14d > > + mov ecx,r13d > > + xor edx,DWORD[40+rsp] > > + xor eax,r11d > > + rol ecx,5 > > + xor edx,DWORD[rsp] > > + lea r12d,[((-899497514))+r12*1+r14] > 
> + xor eax,edi > > + add r12d,ecx > > + rol esi,30 > > + add r12d,eax > > + rol edx,1 > > + xor ebp,DWORD[36+rsp] > > + mov eax,r13d > > + > > + mov ecx,r12d > > + xor ebp,DWORD[44+rsp] > > + xor eax,edi > > + rol ecx,5 > > + xor ebp,DWORD[4+rsp] > > + lea r11d,[((-899497514))+r11*1+rdx] > > + xor eax,esi > > + add r11d,ecx > > + rol r13d,30 > > + add r11d,eax > > + rol ebp,1 > > + xor r14d,DWORD[40+rsp] > > + mov eax,r12d > > + > > + mov ecx,r11d > > + xor r14d,DWORD[48+rsp] > > + xor eax,esi > > + rol ecx,5 > > + xor r14d,DWORD[8+rsp] > > + lea edi,[((-899497514))+rdi*1+rbp] > > + xor eax,r13d > > + add edi,ecx > > + rol r12d,30 > > + add edi,eax > > + rol r14d,1 > > + xor edx,DWORD[44+rsp] > > + mov eax,r11d > > + > > + mov ecx,edi > > + xor edx,DWORD[52+rsp] > > + xor eax,r13d > > + rol ecx,5 > > + xor edx,DWORD[12+rsp] > > + lea esi,[((-899497514))+rsi*1+r14] > > + xor eax,r12d > > + add esi,ecx > > + rol r11d,30 > > + add esi,eax > > + rol edx,1 > > + xor ebp,DWORD[48+rsp] > > + mov eax,edi > > + > > + mov ecx,esi > > + xor ebp,DWORD[56+rsp] > > + xor eax,r12d > > + rol ecx,5 > > + xor ebp,DWORD[16+rsp] > > + lea r13d,[((-899497514))+r13*1+rdx] > > + xor eax,r11d > > + add r13d,ecx > > + rol edi,30 > > + add r13d,eax > > + rol ebp,1 > > + xor r14d,DWORD[52+rsp] > > + mov eax,esi > > + > > + mov ecx,r13d > > + xor r14d,DWORD[60+rsp] > > + xor eax,r11d > > + rol ecx,5 > > + xor r14d,DWORD[20+rsp] > > + lea r12d,[((-899497514))+r12*1+rbp] > > + xor eax,edi > > + add r12d,ecx > > + rol esi,30 > > + add r12d,eax > > + rol r14d,1 > > + xor edx,DWORD[56+rsp] > > + mov eax,r13d > > + > > + mov ecx,r12d > > + xor edx,DWORD[rsp] > > + xor eax,edi > > + rol ecx,5 > > + xor edx,DWORD[24+rsp] > > + lea r11d,[((-899497514))+r11*1+r14] > > + xor eax,esi > > + add r11d,ecx > > + rol r13d,30 > > + add r11d,eax > > + rol edx,1 > > + xor ebp,DWORD[60+rsp] > > + mov eax,r12d > > + > > + mov ecx,r11d > > + xor ebp,DWORD[4+rsp] > > + xor eax,esi > > + rol ecx,5 > > + xor ebp,DWORD[28+rsp] > > + lea edi,[((-899497514))+rdi*1+rdx] > > + xor eax,r13d > > + add edi,ecx > > + rol r12d,30 > > + add edi,eax > > + rol ebp,1 > > + mov eax,r11d > > + mov ecx,edi > > + xor eax,r13d > > + lea esi,[((-899497514))+rsi*1+rbp] > > + rol ecx,5 > > + xor eax,r12d > > + add esi,ecx > > + rol r11d,30 > > + add esi,eax > > + add esi,DWORD[r8] > > + add edi,DWORD[4+r8] > > + add r11d,DWORD[8+r8] > > + add r12d,DWORD[12+r8] > > + add r13d,DWORD[16+r8] > > + mov DWORD[r8],esi > > + mov DWORD[4+r8],edi > > + mov DWORD[8+r8],r11d > > + mov DWORD[12+r8],r12d > > + mov DWORD[16+r8],r13d > > + > > + sub r10,1 > > + lea r9,[64+r9] > > + jnz NEAR $L$loop > > + > > + mov rsi,QWORD[64+rsp] > > + > > + mov r14,QWORD[((-40))+rsi] > > + > > + mov r13,QWORD[((-32))+rsi] > > + > > + mov r12,QWORD[((-24))+rsi] > > + > > + mov rbp,QWORD[((-16))+rsi] > > + > > + mov rbx,QWORD[((-8))+rsi] > > + > > + lea rsp,[rsi] > > + > > +$L$epilogue: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_sha1_block_data_order: > > + > > +ALIGN 32 > > +sha1_block_data_order_shaext: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_sha1_block_data_order_shaext: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + > > + > > +_shaext_shortcut: > > + > > + lea rsp,[((-72))+rsp] > > + movaps XMMWORD[(-8-64)+rax],xmm6 > > + movaps XMMWORD[(-8-48)+rax],xmm7 > > + movaps XMMWORD[(-8-32)+rax],xmm8 > > + movaps XMMWORD[(-8-16)+rax],xmm9 > > 
+$L$prologue_shaext: > > + movdqu xmm0,XMMWORD[rdi] > > + movd xmm1,DWORD[16+rdi] > > + movdqa xmm3,XMMWORD[((K_XX_XX+160))] > > + > > + movdqu xmm4,XMMWORD[rsi] > > + pshufd xmm0,xmm0,27 > > + movdqu xmm5,XMMWORD[16+rsi] > > + pshufd xmm1,xmm1,27 > > + movdqu xmm6,XMMWORD[32+rsi] > > +DB 102,15,56,0,227 > > + movdqu xmm7,XMMWORD[48+rsi] > > +DB 102,15,56,0,235 > > +DB 102,15,56,0,243 > > + movdqa xmm9,xmm1 > > +DB 102,15,56,0,251 > > + jmp NEAR $L$oop_shaext > > + > > +ALIGN 16 > > +$L$oop_shaext: > > + dec rdx > > + lea r8,[64+rsi] > > + paddd xmm1,xmm4 > > + cmovne rsi,r8 > > + movdqa xmm8,xmm0 > > +DB 15,56,201,229 > > + movdqa xmm2,xmm0 > > +DB 15,58,204,193,0 > > +DB 15,56,200,213 > > + pxor xmm4,xmm6 > > +DB 15,56,201,238 > > +DB 15,56,202,231 > > + > > + movdqa xmm1,xmm0 > > +DB 15,58,204,194,0 > > +DB 15,56,200,206 > > + pxor xmm5,xmm7 > > +DB 15,56,202,236 > > +DB 15,56,201,247 > > + movdqa xmm2,xmm0 > > +DB 15,58,204,193,0 > > +DB 15,56,200,215 > > + pxor xmm6,xmm4 > > +DB 15,56,201,252 > > +DB 15,56,202,245 > > + > > + movdqa xmm1,xmm0 > > +DB 15,58,204,194,0 > > +DB 15,56,200,204 > > + pxor xmm7,xmm5 > > +DB 15,56,202,254 > > +DB 15,56,201,229 > > + movdqa xmm2,xmm0 > > +DB 15,58,204,193,0 > > +DB 15,56,200,213 > > + pxor xmm4,xmm6 > > +DB 15,56,201,238 > > +DB 15,56,202,231 > > + > > + movdqa xmm1,xmm0 > > +DB 15,58,204,194,1 > > +DB 15,56,200,206 > > + pxor xmm5,xmm7 > > +DB 15,56,202,236 > > +DB 15,56,201,247 > > + movdqa xmm2,xmm0 > > +DB 15,58,204,193,1 > > +DB 15,56,200,215 > > + pxor xmm6,xmm4 > > +DB 15,56,201,252 > > +DB 15,56,202,245 > > + > > + movdqa xmm1,xmm0 > > +DB 15,58,204,194,1 > > +DB 15,56,200,204 > > + pxor xmm7,xmm5 > > +DB 15,56,202,254 > > +DB 15,56,201,229 > > + movdqa xmm2,xmm0 > > +DB 15,58,204,193,1 > > +DB 15,56,200,213 > > + pxor xmm4,xmm6 > > +DB 15,56,201,238 > > +DB 15,56,202,231 > > + > > + movdqa xmm1,xmm0 > > +DB 15,58,204,194,1 > > +DB 15,56,200,206 > > + pxor xmm5,xmm7 > > +DB 15,56,202,236 > > +DB 15,56,201,247 > > + movdqa xmm2,xmm0 > > +DB 15,58,204,193,2 > > +DB 15,56,200,215 > > + pxor xmm6,xmm4 > > +DB 15,56,201,252 > > +DB 15,56,202,245 > > + > > + movdqa xmm1,xmm0 > > +DB 15,58,204,194,2 > > +DB 15,56,200,204 > > + pxor xmm7,xmm5 > > +DB 15,56,202,254 > > +DB 15,56,201,229 > > + movdqa xmm2,xmm0 > > +DB 15,58,204,193,2 > > +DB 15,56,200,213 > > + pxor xmm4,xmm6 > > +DB 15,56,201,238 > > +DB 15,56,202,231 > > + > > + movdqa xmm1,xmm0 > > +DB 15,58,204,194,2 > > +DB 15,56,200,206 > > + pxor xmm5,xmm7 > > +DB 15,56,202,236 > > +DB 15,56,201,247 > > + movdqa xmm2,xmm0 > > +DB 15,58,204,193,2 > > +DB 15,56,200,215 > > + pxor xmm6,xmm4 > > +DB 15,56,201,252 > > +DB 15,56,202,245 > > + > > + movdqa xmm1,xmm0 > > +DB 15,58,204,194,3 > > +DB 15,56,200,204 > > + pxor xmm7,xmm5 > > +DB 15,56,202,254 > > + movdqu xmm4,XMMWORD[rsi] > > + movdqa xmm2,xmm0 > > +DB 15,58,204,193,3 > > +DB 15,56,200,213 > > + movdqu xmm5,XMMWORD[16+rsi] > > +DB 102,15,56,0,227 > > + > > + movdqa xmm1,xmm0 > > +DB 15,58,204,194,3 > > +DB 15,56,200,206 > > + movdqu xmm6,XMMWORD[32+rsi] > > +DB 102,15,56,0,235 > > + > > + movdqa xmm2,xmm0 > > +DB 15,58,204,193,3 > > +DB 15,56,200,215 > > + movdqu xmm7,XMMWORD[48+rsi] > > +DB 102,15,56,0,243 > > + > > + movdqa xmm1,xmm0 > > +DB 15,58,204,194,3 > > +DB 65,15,56,200,201 > > +DB 102,15,56,0,251 > > + > > + paddd xmm0,xmm8 > > + movdqa xmm9,xmm1 > > + > > + jnz NEAR $L$oop_shaext > > + > > + pshufd xmm0,xmm0,27 > > + pshufd xmm1,xmm1,27 > > + movdqu XMMWORD[rdi],xmm0 > > + movd DWORD[16+rdi],xmm1 > > + movaps 
xmm6,XMMWORD[((-8-64))+rax] > > + movaps xmm7,XMMWORD[((-8-48))+rax] > > + movaps xmm8,XMMWORD[((-8-32))+rax] > > + movaps xmm9,XMMWORD[((-8-16))+rax] > > + mov rsp,rax > > +$L$epilogue_shaext: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_sha1_block_data_order_shaext: > > + > > +ALIGN 16 > > +sha1_block_data_order_ssse3: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_sha1_block_data_order_ssse3: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + > > + > > +_ssse3_shortcut: > > + > > + mov r11,rsp > > + > > + push rbx > > + > > + push rbp > > + > > + push r12 > > + > > + push r13 > > + > > + push r14 > > + > > + lea rsp,[((-160))+rsp] > > + movaps XMMWORD[(-40-96)+r11],xmm6 > > + movaps XMMWORD[(-40-80)+r11],xmm7 > > + movaps XMMWORD[(-40-64)+r11],xmm8 > > + movaps XMMWORD[(-40-48)+r11],xmm9 > > + movaps XMMWORD[(-40-32)+r11],xmm10 > > + movaps XMMWORD[(-40-16)+r11],xmm11 > > +$L$prologue_ssse3: > > + and rsp,-64 > > + mov r8,rdi > > + mov r9,rsi > > + mov r10,rdx > > + > > + shl r10,6 > > + add r10,r9 > > + lea r14,[((K_XX_XX+64))] > > + > > + mov eax,DWORD[r8] > > + mov ebx,DWORD[4+r8] > > + mov ecx,DWORD[8+r8] > > + mov edx,DWORD[12+r8] > > + mov esi,ebx > > + mov ebp,DWORD[16+r8] > > + mov edi,ecx > > + xor edi,edx > > + and esi,edi > > + > > + movdqa xmm6,XMMWORD[64+r14] > > + movdqa xmm9,XMMWORD[((-64))+r14] > > + movdqu xmm0,XMMWORD[r9] > > + movdqu xmm1,XMMWORD[16+r9] > > + movdqu xmm2,XMMWORD[32+r9] > > + movdqu xmm3,XMMWORD[48+r9] > > +DB 102,15,56,0,198 > > +DB 102,15,56,0,206 > > +DB 102,15,56,0,214 > > + add r9,64 > > + paddd xmm0,xmm9 > > +DB 102,15,56,0,222 > > + paddd xmm1,xmm9 > > + paddd xmm2,xmm9 > > + movdqa XMMWORD[rsp],xmm0 > > + psubd xmm0,xmm9 > > + movdqa XMMWORD[16+rsp],xmm1 > > + psubd xmm1,xmm9 > > + movdqa XMMWORD[32+rsp],xmm2 > > + psubd xmm2,xmm9 > > + jmp NEAR $L$oop_ssse3 > > +ALIGN 16 > > +$L$oop_ssse3: > > + ror ebx,2 > > + pshufd xmm4,xmm0,238 > > + xor esi,edx > > + movdqa xmm8,xmm3 > > + paddd xmm9,xmm3 > > + mov edi,eax > > + add ebp,DWORD[rsp] > > + punpcklqdq xmm4,xmm1 > > + xor ebx,ecx > > + rol eax,5 > > + add ebp,esi > > + psrldq xmm8,4 > > + and edi,ebx > > + xor ebx,ecx > > + pxor xmm4,xmm0 > > + add ebp,eax > > + ror eax,7 > > + pxor xmm8,xmm2 > > + xor edi,ecx > > + mov esi,ebp > > + add edx,DWORD[4+rsp] > > + pxor xmm4,xmm8 > > + xor eax,ebx > > + rol ebp,5 > > + movdqa XMMWORD[48+rsp],xmm9 > > + add edx,edi > > + and esi,eax > > + movdqa xmm10,xmm4 > > + xor eax,ebx > > + add edx,ebp > > + ror ebp,7 > > + movdqa xmm8,xmm4 > > + xor esi,ebx > > + pslldq xmm10,12 > > + paddd xmm4,xmm4 > > + mov edi,edx > > + add ecx,DWORD[8+rsp] > > + psrld xmm8,31 > > + xor ebp,eax > > + rol edx,5 > > + add ecx,esi > > + movdqa xmm9,xmm10 > > + and edi,ebp > > + xor ebp,eax > > + psrld xmm10,30 > > + add ecx,edx > > + ror edx,7 > > + por xmm4,xmm8 > > + xor edi,eax > > + mov esi,ecx > > + add ebx,DWORD[12+rsp] > > + pslld xmm9,2 > > + pxor xmm4,xmm10 > > + xor edx,ebp > > + movdqa xmm10,XMMWORD[((-64))+r14] > > + rol ecx,5 > > + add ebx,edi > > + and esi,edx > > + pxor xmm4,xmm9 > > + xor edx,ebp > > + add ebx,ecx > > + ror ecx,7 > > + pshufd xmm5,xmm1,238 > > + xor esi,ebp > > + movdqa xmm9,xmm4 > > + paddd xmm10,xmm4 > > + mov edi,ebx > > + add eax,DWORD[16+rsp] > > + punpcklqdq xmm5,xmm2 > > + xor ecx,edx > > + rol ebx,5 > > + add eax,esi > > + psrldq xmm9,4 > > + and edi,ecx > > + xor ecx,edx > > + pxor xmm5,xmm1 > 
> + add eax,ebx > > + ror ebx,7 > > + pxor xmm9,xmm3 > > + xor edi,edx > > + mov esi,eax > > + add ebp,DWORD[20+rsp] > > + pxor xmm5,xmm9 > > + xor ebx,ecx > > + rol eax,5 > > + movdqa XMMWORD[rsp],xmm10 > > + add ebp,edi > > + and esi,ebx > > + movdqa xmm8,xmm5 > > + xor ebx,ecx > > + add ebp,eax > > + ror eax,7 > > + movdqa xmm9,xmm5 > > + xor esi,ecx > > + pslldq xmm8,12 > > + paddd xmm5,xmm5 > > + mov edi,ebp > > + add edx,DWORD[24+rsp] > > + psrld xmm9,31 > > + xor eax,ebx > > + rol ebp,5 > > + add edx,esi > > + movdqa xmm10,xmm8 > > + and edi,eax > > + xor eax,ebx > > + psrld xmm8,30 > > + add edx,ebp > > + ror ebp,7 > > + por xmm5,xmm9 > > + xor edi,ebx > > + mov esi,edx > > + add ecx,DWORD[28+rsp] > > + pslld xmm10,2 > > + pxor xmm5,xmm8 > > + xor ebp,eax > > + movdqa xmm8,XMMWORD[((-32))+r14] > > + rol edx,5 > > + add ecx,edi > > + and esi,ebp > > + pxor xmm5,xmm10 > > + xor ebp,eax > > + add ecx,edx > > + ror edx,7 > > + pshufd xmm6,xmm2,238 > > + xor esi,eax > > + movdqa xmm10,xmm5 > > + paddd xmm8,xmm5 > > + mov edi,ecx > > + add ebx,DWORD[32+rsp] > > + punpcklqdq xmm6,xmm3 > > + xor edx,ebp > > + rol ecx,5 > > + add ebx,esi > > + psrldq xmm10,4 > > + and edi,edx > > + xor edx,ebp > > + pxor xmm6,xmm2 > > + add ebx,ecx > > + ror ecx,7 > > + pxor xmm10,xmm4 > > + xor edi,ebp > > + mov esi,ebx > > + add eax,DWORD[36+rsp] > > + pxor xmm6,xmm10 > > + xor ecx,edx > > + rol ebx,5 > > + movdqa XMMWORD[16+rsp],xmm8 > > + add eax,edi > > + and esi,ecx > > + movdqa xmm9,xmm6 > > + xor ecx,edx > > + add eax,ebx > > + ror ebx,7 > > + movdqa xmm10,xmm6 > > + xor esi,edx > > + pslldq xmm9,12 > > + paddd xmm6,xmm6 > > + mov edi,eax > > + add ebp,DWORD[40+rsp] > > + psrld xmm10,31 > > + xor ebx,ecx > > + rol eax,5 > > + add ebp,esi > > + movdqa xmm8,xmm9 > > + and edi,ebx > > + xor ebx,ecx > > + psrld xmm9,30 > > + add ebp,eax > > + ror eax,7 > > + por xmm6,xmm10 > > + xor edi,ecx > > + mov esi,ebp > > + add edx,DWORD[44+rsp] > > + pslld xmm8,2 > > + pxor xmm6,xmm9 > > + xor eax,ebx > > + movdqa xmm9,XMMWORD[((-32))+r14] > > + rol ebp,5 > > + add edx,edi > > + and esi,eax > > + pxor xmm6,xmm8 > > + xor eax,ebx > > + add edx,ebp > > + ror ebp,7 > > + pshufd xmm7,xmm3,238 > > + xor esi,ebx > > + movdqa xmm8,xmm6 > > + paddd xmm9,xmm6 > > + mov edi,edx > > + add ecx,DWORD[48+rsp] > > + punpcklqdq xmm7,xmm4 > > + xor ebp,eax > > + rol edx,5 > > + add ecx,esi > > + psrldq xmm8,4 > > + and edi,ebp > > + xor ebp,eax > > + pxor xmm7,xmm3 > > + add ecx,edx > > + ror edx,7 > > + pxor xmm8,xmm5 > > + xor edi,eax > > + mov esi,ecx > > + add ebx,DWORD[52+rsp] > > + pxor xmm7,xmm8 > > + xor edx,ebp > > + rol ecx,5 > > + movdqa XMMWORD[32+rsp],xmm9 > > + add ebx,edi > > + and esi,edx > > + movdqa xmm10,xmm7 > > + xor edx,ebp > > + add ebx,ecx > > + ror ecx,7 > > + movdqa xmm8,xmm7 > > + xor esi,ebp > > + pslldq xmm10,12 > > + paddd xmm7,xmm7 > > + mov edi,ebx > > + add eax,DWORD[56+rsp] > > + psrld xmm8,31 > > + xor ecx,edx > > + rol ebx,5 > > + add eax,esi > > + movdqa xmm9,xmm10 > > + and edi,ecx > > + xor ecx,edx > > + psrld xmm10,30 > > + add eax,ebx > > + ror ebx,7 > > + por xmm7,xmm8 > > + xor edi,edx > > + mov esi,eax > > + add ebp,DWORD[60+rsp] > > + pslld xmm9,2 > > + pxor xmm7,xmm10 > > + xor ebx,ecx > > + movdqa xmm10,XMMWORD[((-32))+r14] > > + rol eax,5 > > + add ebp,edi > > + and esi,ebx > > + pxor xmm7,xmm9 > > + pshufd xmm9,xmm6,238 > > + xor ebx,ecx > > + add ebp,eax > > + ror eax,7 > > + pxor xmm0,xmm4 > > + xor esi,ecx > > + mov edi,ebp > > + add edx,DWORD[rsp] > > + punpcklqdq xmm9,xmm7 > 
> + xor eax,ebx > > + rol ebp,5 > > + pxor xmm0,xmm1 > > + add edx,esi > > + and edi,eax > > + movdqa xmm8,xmm10 > > + xor eax,ebx > > + paddd xmm10,xmm7 > > + add edx,ebp > > + pxor xmm0,xmm9 > > + ror ebp,7 > > + xor edi,ebx > > + mov esi,edx > > + add ecx,DWORD[4+rsp] > > + movdqa xmm9,xmm0 > > + xor ebp,eax > > + rol edx,5 > > + movdqa XMMWORD[48+rsp],xmm10 > > + add ecx,edi > > + and esi,ebp > > + xor ebp,eax > > + pslld xmm0,2 > > + add ecx,edx > > + ror edx,7 > > + psrld xmm9,30 > > + xor esi,eax > > + mov edi,ecx > > + add ebx,DWORD[8+rsp] > > + por xmm0,xmm9 > > + xor edx,ebp > > + rol ecx,5 > > + pshufd xmm10,xmm7,238 > > + add ebx,esi > > + and edi,edx > > + xor edx,ebp > > + add ebx,ecx > > + add eax,DWORD[12+rsp] > > + xor edi,ebp > > + mov esi,ebx > > + rol ebx,5 > > + add eax,edi > > + xor esi,edx > > + ror ecx,7 > > + add eax,ebx > > + pxor xmm1,xmm5 > > + add ebp,DWORD[16+rsp] > > + xor esi,ecx > > + punpcklqdq xmm10,xmm0 > > + mov edi,eax > > + rol eax,5 > > + pxor xmm1,xmm2 > > + add ebp,esi > > + xor edi,ecx > > + movdqa xmm9,xmm8 > > + ror ebx,7 > > + paddd xmm8,xmm0 > > + add ebp,eax > > + pxor xmm1,xmm10 > > + add edx,DWORD[20+rsp] > > + xor edi,ebx > > + mov esi,ebp > > + rol ebp,5 > > + movdqa xmm10,xmm1 > > + add edx,edi > > + xor esi,ebx > > + movdqa XMMWORD[rsp],xmm8 > > + ror eax,7 > > + add edx,ebp > > + add ecx,DWORD[24+rsp] > > + pslld xmm1,2 > > + xor esi,eax > > + mov edi,edx > > + psrld xmm10,30 > > + rol edx,5 > > + add ecx,esi > > + xor edi,eax > > + ror ebp,7 > > + por xmm1,xmm10 > > + add ecx,edx > > + add ebx,DWORD[28+rsp] > > + pshufd xmm8,xmm0,238 > > + xor edi,ebp > > + mov esi,ecx > > + rol ecx,5 > > + add ebx,edi > > + xor esi,ebp > > + ror edx,7 > > + add ebx,ecx > > + pxor xmm2,xmm6 > > + add eax,DWORD[32+rsp] > > + xor esi,edx > > + punpcklqdq xmm8,xmm1 > > + mov edi,ebx > > + rol ebx,5 > > + pxor xmm2,xmm3 > > + add eax,esi > > + xor edi,edx > > + movdqa xmm10,XMMWORD[r14] > > + ror ecx,7 > > + paddd xmm9,xmm1 > > + add eax,ebx > > + pxor xmm2,xmm8 > > + add ebp,DWORD[36+rsp] > > + xor edi,ecx > > + mov esi,eax > > + rol eax,5 > > + movdqa xmm8,xmm2 > > + add ebp,edi > > + xor esi,ecx > > + movdqa XMMWORD[16+rsp],xmm9 > > + ror ebx,7 > > + add ebp,eax > > + add edx,DWORD[40+rsp] > > + pslld xmm2,2 > > + xor esi,ebx > > + mov edi,ebp > > + psrld xmm8,30 > > + rol ebp,5 > > + add edx,esi > > + xor edi,ebx > > + ror eax,7 > > + por xmm2,xmm8 > > + add edx,ebp > > + add ecx,DWORD[44+rsp] > > + pshufd xmm9,xmm1,238 > > + xor edi,eax > > + mov esi,edx > > + rol edx,5 > > + add ecx,edi > > + xor esi,eax > > + ror ebp,7 > > + add ecx,edx > > + pxor xmm3,xmm7 > > + add ebx,DWORD[48+rsp] > > + xor esi,ebp > > + punpcklqdq xmm9,xmm2 > > + mov edi,ecx > > + rol ecx,5 > > + pxor xmm3,xmm4 > > + add ebx,esi > > + xor edi,ebp > > + movdqa xmm8,xmm10 > > + ror edx,7 > > + paddd xmm10,xmm2 > > + add ebx,ecx > > + pxor xmm3,xmm9 > > + add eax,DWORD[52+rsp] > > + xor edi,edx > > + mov esi,ebx > > + rol ebx,5 > > + movdqa xmm9,xmm3 > > + add eax,edi > > + xor esi,edx > > + movdqa XMMWORD[32+rsp],xmm10 > > + ror ecx,7 > > + add eax,ebx > > + add ebp,DWORD[56+rsp] > > + pslld xmm3,2 > > + xor esi,ecx > > + mov edi,eax > > + psrld xmm9,30 > > + rol eax,5 > > + add ebp,esi > > + xor edi,ecx > > + ror ebx,7 > > + por xmm3,xmm9 > > + add ebp,eax > > + add edx,DWORD[60+rsp] > > + pshufd xmm10,xmm2,238 > > + xor edi,ebx > > + mov esi,ebp > > + rol ebp,5 > > + add edx,edi > > + xor esi,ebx > > + ror eax,7 > > + add edx,ebp > > + pxor xmm4,xmm0 > > + add ecx,DWORD[rsp] > 
> + xor esi,eax > > + punpcklqdq xmm10,xmm3 > > + mov edi,edx > > + rol edx,5 > > + pxor xmm4,xmm5 > > + add ecx,esi > > + xor edi,eax > > + movdqa xmm9,xmm8 > > + ror ebp,7 > > + paddd xmm8,xmm3 > > + add ecx,edx > > + pxor xmm4,xmm10 > > + add ebx,DWORD[4+rsp] > > + xor edi,ebp > > + mov esi,ecx > > + rol ecx,5 > > + movdqa xmm10,xmm4 > > + add ebx,edi > > + xor esi,ebp > > + movdqa XMMWORD[48+rsp],xmm8 > > + ror edx,7 > > + add ebx,ecx > > + add eax,DWORD[8+rsp] > > + pslld xmm4,2 > > + xor esi,edx > > + mov edi,ebx > > + psrld xmm10,30 > > + rol ebx,5 > > + add eax,esi > > + xor edi,edx > > + ror ecx,7 > > + por xmm4,xmm10 > > + add eax,ebx > > + add ebp,DWORD[12+rsp] > > + pshufd xmm8,xmm3,238 > > + xor edi,ecx > > + mov esi,eax > > + rol eax,5 > > + add ebp,edi > > + xor esi,ecx > > + ror ebx,7 > > + add ebp,eax > > + pxor xmm5,xmm1 > > + add edx,DWORD[16+rsp] > > + xor esi,ebx > > + punpcklqdq xmm8,xmm4 > > + mov edi,ebp > > + rol ebp,5 > > + pxor xmm5,xmm6 > > + add edx,esi > > + xor edi,ebx > > + movdqa xmm10,xmm9 > > + ror eax,7 > > + paddd xmm9,xmm4 > > + add edx,ebp > > + pxor xmm5,xmm8 > > + add ecx,DWORD[20+rsp] > > + xor edi,eax > > + mov esi,edx > > + rol edx,5 > > + movdqa xmm8,xmm5 > > + add ecx,edi > > + xor esi,eax > > + movdqa XMMWORD[rsp],xmm9 > > + ror ebp,7 > > + add ecx,edx > > + add ebx,DWORD[24+rsp] > > + pslld xmm5,2 > > + xor esi,ebp > > + mov edi,ecx > > + psrld xmm8,30 > > + rol ecx,5 > > + add ebx,esi > > + xor edi,ebp > > + ror edx,7 > > + por xmm5,xmm8 > > + add ebx,ecx > > + add eax,DWORD[28+rsp] > > + pshufd xmm9,xmm4,238 > > + ror ecx,7 > > + mov esi,ebx > > + xor edi,edx > > + rol ebx,5 > > + add eax,edi > > + xor esi,ecx > > + xor ecx,edx > > + add eax,ebx > > + pxor xmm6,xmm2 > > + add ebp,DWORD[32+rsp] > > + and esi,ecx > > + xor ecx,edx > > + ror ebx,7 > > + punpcklqdq xmm9,xmm5 > > + mov edi,eax > > + xor esi,ecx > > + pxor xmm6,xmm7 > > + rol eax,5 > > + add ebp,esi > > + movdqa xmm8,xmm10 > > + xor edi,ebx > > + paddd xmm10,xmm5 > > + xor ebx,ecx > > + pxor xmm6,xmm9 > > + add ebp,eax > > + add edx,DWORD[36+rsp] > > + and edi,ebx > > + xor ebx,ecx > > + ror eax,7 > > + movdqa xmm9,xmm6 > > + mov esi,ebp > > + xor edi,ebx > > + movdqa XMMWORD[16+rsp],xmm10 > > + rol ebp,5 > > + add edx,edi > > + xor esi,eax > > + pslld xmm6,2 > > + xor eax,ebx > > + add edx,ebp > > + psrld xmm9,30 > > + add ecx,DWORD[40+rsp] > > + and esi,eax > > + xor eax,ebx > > + por xmm6,xmm9 > > + ror ebp,7 > > + mov edi,edx > > + xor esi,eax > > + rol edx,5 > > + pshufd xmm10,xmm5,238 > > + add ecx,esi > > + xor edi,ebp > > + xor ebp,eax > > + add ecx,edx > > + add ebx,DWORD[44+rsp] > > + and edi,ebp > > + xor ebp,eax > > + ror edx,7 > > + mov esi,ecx > > + xor edi,ebp > > + rol ecx,5 > > + add ebx,edi > > + xor esi,edx > > + xor edx,ebp > > + add ebx,ecx > > + pxor xmm7,xmm3 > > + add eax,DWORD[48+rsp] > > + and esi,edx > > + xor edx,ebp > > + ror ecx,7 > > + punpcklqdq xmm10,xmm6 > > + mov edi,ebx > > + xor esi,edx > > + pxor xmm7,xmm0 > > + rol ebx,5 > > + add eax,esi > > + movdqa xmm9,XMMWORD[32+r14] > > + xor edi,ecx > > + paddd xmm8,xmm6 > > + xor ecx,edx > > + pxor xmm7,xmm10 > > + add eax,ebx > > + add ebp,DWORD[52+rsp] > > + and edi,ecx > > + xor ecx,edx > > + ror ebx,7 > > + movdqa xmm10,xmm7 > > + mov esi,eax > > + xor edi,ecx > > + movdqa XMMWORD[32+rsp],xmm8 > > + rol eax,5 > > + add ebp,edi > > + xor esi,ebx > > + pslld xmm7,2 > > + xor ebx,ecx > > + add ebp,eax > > + psrld xmm10,30 > > + add edx,DWORD[56+rsp] > > + and esi,ebx > > + xor ebx,ecx > > + por 
xmm7,xmm10 > > + ror eax,7 > > + mov edi,ebp > > + xor esi,ebx > > + rol ebp,5 > > + pshufd xmm8,xmm6,238 > > + add edx,esi > > + xor edi,eax > > + xor eax,ebx > > + add edx,ebp > > + add ecx,DWORD[60+rsp] > > + and edi,eax > > + xor eax,ebx > > + ror ebp,7 > > + mov esi,edx > > + xor edi,eax > > + rol edx,5 > > + add ecx,edi > > + xor esi,ebp > > + xor ebp,eax > > + add ecx,edx > > + pxor xmm0,xmm4 > > + add ebx,DWORD[rsp] > > + and esi,ebp > > + xor ebp,eax > > + ror edx,7 > > + punpcklqdq xmm8,xmm7 > > + mov edi,ecx > > + xor esi,ebp > > + pxor xmm0,xmm1 > > + rol ecx,5 > > + add ebx,esi > > + movdqa xmm10,xmm9 > > + xor edi,edx > > + paddd xmm9,xmm7 > > + xor edx,ebp > > + pxor xmm0,xmm8 > > + add ebx,ecx > > + add eax,DWORD[4+rsp] > > + and edi,edx > > + xor edx,ebp > > + ror ecx,7 > > + movdqa xmm8,xmm0 > > + mov esi,ebx > > + xor edi,edx > > + movdqa XMMWORD[48+rsp],xmm9 > > + rol ebx,5 > > + add eax,edi > > + xor esi,ecx > > + pslld xmm0,2 > > + xor ecx,edx > > + add eax,ebx > > + psrld xmm8,30 > > + add ebp,DWORD[8+rsp] > > + and esi,ecx > > + xor ecx,edx > > + por xmm0,xmm8 > > + ror ebx,7 > > + mov edi,eax > > + xor esi,ecx > > + rol eax,5 > > + pshufd xmm9,xmm7,238 > > + add ebp,esi > > + xor edi,ebx > > + xor ebx,ecx > > + add ebp,eax > > + add edx,DWORD[12+rsp] > > + and edi,ebx > > + xor ebx,ecx > > + ror eax,7 > > + mov esi,ebp > > + xor edi,ebx > > + rol ebp,5 > > + add edx,edi > > + xor esi,eax > > + xor eax,ebx > > + add edx,ebp > > + pxor xmm1,xmm5 > > + add ecx,DWORD[16+rsp] > > + and esi,eax > > + xor eax,ebx > > + ror ebp,7 > > + punpcklqdq xmm9,xmm0 > > + mov edi,edx > > + xor esi,eax > > + pxor xmm1,xmm2 > > + rol edx,5 > > + add ecx,esi > > + movdqa xmm8,xmm10 > > + xor edi,ebp > > + paddd xmm10,xmm0 > > + xor ebp,eax > > + pxor xmm1,xmm9 > > + add ecx,edx > > + add ebx,DWORD[20+rsp] > > + and edi,ebp > > + xor ebp,eax > > + ror edx,7 > > + movdqa xmm9,xmm1 > > + mov esi,ecx > > + xor edi,ebp > > + movdqa XMMWORD[rsp],xmm10 > > + rol ecx,5 > > + add ebx,edi > > + xor esi,edx > > + pslld xmm1,2 > > + xor edx,ebp > > + add ebx,ecx > > + psrld xmm9,30 > > + add eax,DWORD[24+rsp] > > + and esi,edx > > + xor edx,ebp > > + por xmm1,xmm9 > > + ror ecx,7 > > + mov edi,ebx > > + xor esi,edx > > + rol ebx,5 > > + pshufd xmm10,xmm0,238 > > + add eax,esi > > + xor edi,ecx > > + xor ecx,edx > > + add eax,ebx > > + add ebp,DWORD[28+rsp] > > + and edi,ecx > > + xor ecx,edx > > + ror ebx,7 > > + mov esi,eax > > + xor edi,ecx > > + rol eax,5 > > + add ebp,edi > > + xor esi,ebx > > + xor ebx,ecx > > + add ebp,eax > > + pxor xmm2,xmm6 > > + add edx,DWORD[32+rsp] > > + and esi,ebx > > + xor ebx,ecx > > + ror eax,7 > > + punpcklqdq xmm10,xmm1 > > + mov edi,ebp > > + xor esi,ebx > > + pxor xmm2,xmm3 > > + rol ebp,5 > > + add edx,esi > > + movdqa xmm9,xmm8 > > + xor edi,eax > > + paddd xmm8,xmm1 > > + xor eax,ebx > > + pxor xmm2,xmm10 > > + add edx,ebp > > + add ecx,DWORD[36+rsp] > > + and edi,eax > > + xor eax,ebx > > + ror ebp,7 > > + movdqa xmm10,xmm2 > > + mov esi,edx > > + xor edi,eax > > + movdqa XMMWORD[16+rsp],xmm8 > > + rol edx,5 > > + add ecx,edi > > + xor esi,ebp > > + pslld xmm2,2 > > + xor ebp,eax > > + add ecx,edx > > + psrld xmm10,30 > > + add ebx,DWORD[40+rsp] > > + and esi,ebp > > + xor ebp,eax > > + por xmm2,xmm10 > > + ror edx,7 > > + mov edi,ecx > > + xor esi,ebp > > + rol ecx,5 > > + pshufd xmm8,xmm1,238 > > + add ebx,esi > > + xor edi,edx > > + xor edx,ebp > > + add ebx,ecx > > + add eax,DWORD[44+rsp] > > + and edi,edx > > + xor edx,ebp > > + ror ecx,7 > > + mov 
esi,ebx > > + xor edi,edx > > + rol ebx,5 > > + add eax,edi > > + xor esi,edx > > + add eax,ebx > > + pxor xmm3,xmm7 > > + add ebp,DWORD[48+rsp] > > + xor esi,ecx > > + punpcklqdq xmm8,xmm2 > > + mov edi,eax > > + rol eax,5 > > + pxor xmm3,xmm4 > > + add ebp,esi > > + xor edi,ecx > > + movdqa xmm10,xmm9 > > + ror ebx,7 > > + paddd xmm9,xmm2 > > + add ebp,eax > > + pxor xmm3,xmm8 > > + add edx,DWORD[52+rsp] > > + xor edi,ebx > > + mov esi,ebp > > + rol ebp,5 > > + movdqa xmm8,xmm3 > > + add edx,edi > > + xor esi,ebx > > + movdqa XMMWORD[32+rsp],xmm9 > > + ror eax,7 > > + add edx,ebp > > + add ecx,DWORD[56+rsp] > > + pslld xmm3,2 > > + xor esi,eax > > + mov edi,edx > > + psrld xmm8,30 > > + rol edx,5 > > + add ecx,esi > > + xor edi,eax > > + ror ebp,7 > > + por xmm3,xmm8 > > + add ecx,edx > > + add ebx,DWORD[60+rsp] > > + xor edi,ebp > > + mov esi,ecx > > + rol ecx,5 > > + add ebx,edi > > + xor esi,ebp > > + ror edx,7 > > + add ebx,ecx > > + add eax,DWORD[rsp] > > + xor esi,edx > > + mov edi,ebx > > + rol ebx,5 > > + paddd xmm10,xmm3 > > + add eax,esi > > + xor edi,edx > > + movdqa XMMWORD[48+rsp],xmm10 > > + ror ecx,7 > > + add eax,ebx > > + add ebp,DWORD[4+rsp] > > + xor edi,ecx > > + mov esi,eax > > + rol eax,5 > > + add ebp,edi > > + xor esi,ecx > > + ror ebx,7 > > + add ebp,eax > > + add edx,DWORD[8+rsp] > > + xor esi,ebx > > + mov edi,ebp > > + rol ebp,5 > > + add edx,esi > > + xor edi,ebx > > + ror eax,7 > > + add edx,ebp > > + add ecx,DWORD[12+rsp] > > + xor edi,eax > > + mov esi,edx > > + rol edx,5 > > + add ecx,edi > > + xor esi,eax > > + ror ebp,7 > > + add ecx,edx > > + cmp r9,r10 > > + je NEAR $L$done_ssse3 > > + movdqa xmm6,XMMWORD[64+r14] > > + movdqa xmm9,XMMWORD[((-64))+r14] > > + movdqu xmm0,XMMWORD[r9] > > + movdqu xmm1,XMMWORD[16+r9] > > + movdqu xmm2,XMMWORD[32+r9] > > + movdqu xmm3,XMMWORD[48+r9] > > +DB 102,15,56,0,198 > > + add r9,64 > > + add ebx,DWORD[16+rsp] > > + xor esi,ebp > > + mov edi,ecx > > +DB 102,15,56,0,206 > > + rol ecx,5 > > + add ebx,esi > > + xor edi,ebp > > + ror edx,7 > > + paddd xmm0,xmm9 > > + add ebx,ecx > > + add eax,DWORD[20+rsp] > > + xor edi,edx > > + mov esi,ebx > > + movdqa XMMWORD[rsp],xmm0 > > + rol ebx,5 > > + add eax,edi > > + xor esi,edx > > + ror ecx,7 > > + psubd xmm0,xmm9 > > + add eax,ebx > > + add ebp,DWORD[24+rsp] > > + xor esi,ecx > > + mov edi,eax > > + rol eax,5 > > + add ebp,esi > > + xor edi,ecx > > + ror ebx,7 > > + add ebp,eax > > + add edx,DWORD[28+rsp] > > + xor edi,ebx > > + mov esi,ebp > > + rol ebp,5 > > + add edx,edi > > + xor esi,ebx > > + ror eax,7 > > + add edx,ebp > > + add ecx,DWORD[32+rsp] > > + xor esi,eax > > + mov edi,edx > > +DB 102,15,56,0,214 > > + rol edx,5 > > + add ecx,esi > > + xor edi,eax > > + ror ebp,7 > > + paddd xmm1,xmm9 > > + add ecx,edx > > + add ebx,DWORD[36+rsp] > > + xor edi,ebp > > + mov esi,ecx > > + movdqa XMMWORD[16+rsp],xmm1 > > + rol ecx,5 > > + add ebx,edi > > + xor esi,ebp > > + ror edx,7 > > + psubd xmm1,xmm9 > > + add ebx,ecx > > + add eax,DWORD[40+rsp] > > + xor esi,edx > > + mov edi,ebx > > + rol ebx,5 > > + add eax,esi > > + xor edi,edx > > + ror ecx,7 > > + add eax,ebx > > + add ebp,DWORD[44+rsp] > > + xor edi,ecx > > + mov esi,eax > > + rol eax,5 > > + add ebp,edi > > + xor esi,ecx > > + ror ebx,7 > > + add ebp,eax > > + add edx,DWORD[48+rsp] > > + xor esi,ebx > > + mov edi,ebp > > +DB 102,15,56,0,222 > > + rol ebp,5 > > + add edx,esi > > + xor edi,ebx > > + ror eax,7 > > + paddd xmm2,xmm9 > > + add edx,ebp > > + add ecx,DWORD[52+rsp] > > + xor edi,eax > > + mov esi,edx > > + 
movdqa XMMWORD[32+rsp],xmm2 > > + rol edx,5 > > + add ecx,edi > > + xor esi,eax > > + ror ebp,7 > > + psubd xmm2,xmm9 > > + add ecx,edx > > + add ebx,DWORD[56+rsp] > > + xor esi,ebp > > + mov edi,ecx > > + rol ecx,5 > > + add ebx,esi > > + xor edi,ebp > > + ror edx,7 > > + add ebx,ecx > > + add eax,DWORD[60+rsp] > > + xor edi,edx > > + mov esi,ebx > > + rol ebx,5 > > + add eax,edi > > + ror ecx,7 > > + add eax,ebx > > + add eax,DWORD[r8] > > + add esi,DWORD[4+r8] > > + add ecx,DWORD[8+r8] > > + add edx,DWORD[12+r8] > > + mov DWORD[r8],eax > > + add ebp,DWORD[16+r8] > > + mov DWORD[4+r8],esi > > + mov ebx,esi > > + mov DWORD[8+r8],ecx > > + mov edi,ecx > > + mov DWORD[12+r8],edx > > + xor edi,edx > > + mov DWORD[16+r8],ebp > > + and esi,edi > > + jmp NEAR $L$oop_ssse3 > > + > > +ALIGN 16 > > +$L$done_ssse3: > > + add ebx,DWORD[16+rsp] > > + xor esi,ebp > > + mov edi,ecx > > + rol ecx,5 > > + add ebx,esi > > + xor edi,ebp > > + ror edx,7 > > + add ebx,ecx > > + add eax,DWORD[20+rsp] > > + xor edi,edx > > + mov esi,ebx > > + rol ebx,5 > > + add eax,edi > > + xor esi,edx > > + ror ecx,7 > > + add eax,ebx > > + add ebp,DWORD[24+rsp] > > + xor esi,ecx > > + mov edi,eax > > + rol eax,5 > > + add ebp,esi > > + xor edi,ecx > > + ror ebx,7 > > + add ebp,eax > > + add edx,DWORD[28+rsp] > > + xor edi,ebx > > + mov esi,ebp > > + rol ebp,5 > > + add edx,edi > > + xor esi,ebx > > + ror eax,7 > > + add edx,ebp > > + add ecx,DWORD[32+rsp] > > + xor esi,eax > > + mov edi,edx > > + rol edx,5 > > + add ecx,esi > > + xor edi,eax > > + ror ebp,7 > > + add ecx,edx > > + add ebx,DWORD[36+rsp] > > + xor edi,ebp > > + mov esi,ecx > > + rol ecx,5 > > + add ebx,edi > > + xor esi,ebp > > + ror edx,7 > > + add ebx,ecx > > + add eax,DWORD[40+rsp] > > + xor esi,edx > > + mov edi,ebx > > + rol ebx,5 > > + add eax,esi > > + xor edi,edx > > + ror ecx,7 > > + add eax,ebx > > + add ebp,DWORD[44+rsp] > > + xor edi,ecx > > + mov esi,eax > > + rol eax,5 > > + add ebp,edi > > + xor esi,ecx > > + ror ebx,7 > > + add ebp,eax > > + add edx,DWORD[48+rsp] > > + xor esi,ebx > > + mov edi,ebp > > + rol ebp,5 > > + add edx,esi > > + xor edi,ebx > > + ror eax,7 > > + add edx,ebp > > + add ecx,DWORD[52+rsp] > > + xor edi,eax > > + mov esi,edx > > + rol edx,5 > > + add ecx,edi > > + xor esi,eax > > + ror ebp,7 > > + add ecx,edx > > + add ebx,DWORD[56+rsp] > > + xor esi,ebp > > + mov edi,ecx > > + rol ecx,5 > > + add ebx,esi > > + xor edi,ebp > > + ror edx,7 > > + add ebx,ecx > > + add eax,DWORD[60+rsp] > > + xor edi,edx > > + mov esi,ebx > > + rol ebx,5 > > + add eax,edi > > + ror ecx,7 > > + add eax,ebx > > + add eax,DWORD[r8] > > + add esi,DWORD[4+r8] > > + add ecx,DWORD[8+r8] > > + mov DWORD[r8],eax > > + add edx,DWORD[12+r8] > > + mov DWORD[4+r8],esi > > + add ebp,DWORD[16+r8] > > + mov DWORD[8+r8],ecx > > + mov DWORD[12+r8],edx > > + mov DWORD[16+r8],ebp > > + movaps xmm6,XMMWORD[((-40-96))+r11] > > + movaps xmm7,XMMWORD[((-40-80))+r11] > > + movaps xmm8,XMMWORD[((-40-64))+r11] > > + movaps xmm9,XMMWORD[((-40-48))+r11] > > + movaps xmm10,XMMWORD[((-40-32))+r11] > > + movaps xmm11,XMMWORD[((-40-16))+r11] > > + mov r14,QWORD[((-40))+r11] > > + > > + mov r13,QWORD[((-32))+r11] > > + > > + mov r12,QWORD[((-24))+r11] > > + > > + mov rbp,QWORD[((-16))+r11] > > + > > + mov rbx,QWORD[((-8))+r11] > > + > > + lea rsp,[r11] > > + > > +$L$epilogue_ssse3: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_sha1_block_data_order_ssse3: > > +ALIGN 64 > > +K_XX_XX: > > + DD 
0x5a827999,0x5a827999,0x5a827999,0x5a827999 > > + DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999 > > + DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > > + DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > > + DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > > + DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > > + DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > > + DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > +DB > 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0 > > +DB 83,72,65,49,32,98,108,111,99,107,32,116,114,97,110,115 > > +DB 102,111,114,109,32,102,111,114,32,120,56,54,95,54,52,44 > > +DB 32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60 > > +DB > 97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114 > > +DB 103,62,0 > > +ALIGN 64 > > +EXTERN __imp_RtlVirtualUnwind > > + > > +ALIGN 16 > > +se_handler: > > + push rsi > > + push rdi > > + push rbx > > + push rbp > > + push r12 > > + push r13 > > + push r14 > > + push r15 > > + pushfq > > + sub rsp,64 > > + > > + mov rax,QWORD[120+r8] > > + mov rbx,QWORD[248+r8] > > + > > + lea r10,[$L$prologue] > > + cmp rbx,r10 > > + jb NEAR $L$common_seh_tail > > + > > + mov rax,QWORD[152+r8] > > + > > + lea r10,[$L$epilogue] > > + cmp rbx,r10 > > + jae NEAR $L$common_seh_tail > > + > > + mov rax,QWORD[64+rax] > > + > > + mov rbx,QWORD[((-8))+rax] > > + mov rbp,QWORD[((-16))+rax] > > + mov r12,QWORD[((-24))+rax] > > + mov r13,QWORD[((-32))+rax] > > + mov r14,QWORD[((-40))+rax] > > + mov QWORD[144+r8],rbx > > + mov QWORD[160+r8],rbp > > + mov QWORD[216+r8],r12 > > + mov QWORD[224+r8],r13 > > + mov QWORD[232+r8],r14 > > + > > + jmp NEAR $L$common_seh_tail > > + > > + > > +ALIGN 16 > > +shaext_handler: > > + push rsi > > + push rdi > > + push rbx > > + push rbp > > + push r12 > > + push r13 > > + push r14 > > + push r15 > > + pushfq > > + sub rsp,64 > > + > > + mov rax,QWORD[120+r8] > > + mov rbx,QWORD[248+r8] > > + > > + lea r10,[$L$prologue_shaext] > > + cmp rbx,r10 > > + jb NEAR $L$common_seh_tail > > + > > + lea r10,[$L$epilogue_shaext] > > + cmp rbx,r10 > > + jae NEAR $L$common_seh_tail > > + > > + lea rsi,[((-8-64))+rax] > > + lea rdi,[512+r8] > > + mov ecx,8 > > + DD 0xa548f3fc > > + > > + jmp NEAR $L$common_seh_tail > > + > > + > > +ALIGN 16 > > +ssse3_handler: > > + push rsi > > + push rdi > > + push rbx > > + push rbp > > + push r12 > > + push r13 > > + push r14 > > + push r15 > > + pushfq > > + sub rsp,64 > > + > > + mov rax,QWORD[120+r8] > > + mov rbx,QWORD[248+r8] > > + > > + mov rsi,QWORD[8+r9] > > + mov r11,QWORD[56+r9] > > + > > + mov r10d,DWORD[r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jb NEAR $L$common_seh_tail > > + > > + mov rax,QWORD[208+r8] > > + > > + mov r10d,DWORD[4+r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jae NEAR $L$common_seh_tail > > + > > + lea rsi,[((-40-96))+rax] > > + lea rdi,[512+r8] > > + mov ecx,12 > > + DD 0xa548f3fc > > + > > + mov rbx,QWORD[((-8))+rax] > > + mov rbp,QWORD[((-16))+rax] > > + mov r12,QWORD[((-24))+rax] > > + mov r13,QWORD[((-32))+rax] > > + mov r14,QWORD[((-40))+rax] > > + mov QWORD[144+r8],rbx > > + mov QWORD[160+r8],rbp > > + mov QWORD[216+r8],r12 > > + mov QWORD[224+r8],r13 > > + mov QWORD[232+r8],r14 > > + > > +$L$common_seh_tail: > > + mov rdi,QWORD[8+rax] > > + mov rsi,QWORD[16+rax] > > + mov QWORD[152+r8],rax > > + mov QWORD[168+r8],rsi > > + mov QWORD[176+r8],rdi > > + > > + mov rdi,QWORD[40+r9] > > + mov rsi,r8 > > + mov ecx,154 > > 
+ DD 0xa548f3fc > > + > > + mov rsi,r9 > > + xor rcx,rcx > > + mov rdx,QWORD[8+rsi] > > + mov r8,QWORD[rsi] > > + mov r9,QWORD[16+rsi] > > + mov r10,QWORD[40+rsi] > > + lea r11,[56+rsi] > > + lea r12,[24+rsi] > > + mov QWORD[32+rsp],r10 > > + mov QWORD[40+rsp],r11 > > + mov QWORD[48+rsp],r12 > > + mov QWORD[56+rsp],rcx > > + call QWORD[__imp_RtlVirtualUnwind] > > + > > + mov eax,1 > > + add rsp,64 > > + popfq > > + pop r15 > > + pop r14 > > + pop r13 > > + pop r12 > > + pop rbp > > + pop rbx > > + pop rdi > > + pop rsi > > + DB 0F3h,0C3h ;repret > > + > > + > > +section .pdata rdata align=4 > > +ALIGN 4 > > + DD $L$SEH_begin_sha1_block_data_order > wrt ..imagebase > > + DD $L$SEH_end_sha1_block_data_order wrt ..imagebase > > + DD $L$SEH_info_sha1_block_data_order wrt ..imagebase > > + DD $L$SEH_begin_sha1_block_data_order_shaext > wrt ..imagebase > > + DD $L$SEH_end_sha1_block_data_order_shaext > wrt ..imagebase > > + DD $L$SEH_info_sha1_block_data_order_shaext > wrt ..imagebase > > + DD $L$SEH_begin_sha1_block_data_order_ssse3 > wrt ..imagebase > > + DD $L$SEH_end_sha1_block_data_order_ssse3 > wrt ..imagebase > > + DD $L$SEH_info_sha1_block_data_order_ssse3 > wrt ..imagebase > > +section .xdata rdata align=8 > > +ALIGN 8 > > +$L$SEH_info_sha1_block_data_order: > > +DB 9,0,0,0 > > + DD se_handler wrt ..imagebase > > +$L$SEH_info_sha1_block_data_order_shaext: > > +DB 9,0,0,0 > > + DD shaext_handler wrt ..imagebase > > +$L$SEH_info_sha1_block_data_order_ssse3: > > +DB 9,0,0,0 > > + DD ssse3_handler wrt ..imagebase > > + DD $L$prologue_ssse3 > wrt ..imagebase,$L$epilogue_ssse3 > > wrt ..imagebase > > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb- > > x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb- > > x86_64.nasm > > new file mode 100644 > > index 0000000000..7cd5eae85c > > --- /dev/null > > +++ > b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm > > @@ -0,0 +1,3461 @@ > > +; WARNING: do not edit! > > +; Generated from openssl/crypto/sha/asm/sha256-mb-x86_64.pl > > +; > > +; Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. > > +; > > +; Licensed under the OpenSSL license (the "License"). You may not use > > +; this file except in compliance with the License. 
You can obtain a copy > > +; in the file LICENSE in the source distribution or at > > +; https://www.openssl.org/source/license.html > > + > > +default rel > > +%define XMMWORD > > +%define YMMWORD > > +%define ZMMWORD > > +section .text code align=64 > > + > > + > > +EXTERN OPENSSL_ia32cap_P > > + > > +global sha256_multi_block > > + > > +ALIGN 32 > > +sha256_multi_block: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_sha256_multi_block: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + > > + > > + > > + mov rcx,QWORD[((OPENSSL_ia32cap_P+4))] > > + bt rcx,61 > > + jc NEAR _shaext_shortcut > > + mov rax,rsp > > + > > + push rbx > > + > > + push rbp > > + > > + lea rsp,[((-168))+rsp] > > + movaps XMMWORD[rsp],xmm6 > > + movaps XMMWORD[16+rsp],xmm7 > > + movaps XMMWORD[32+rsp],xmm8 > > + movaps XMMWORD[48+rsp],xmm9 > > + movaps XMMWORD[(-120)+rax],xmm10 > > + movaps XMMWORD[(-104)+rax],xmm11 > > + movaps XMMWORD[(-88)+rax],xmm12 > > + movaps XMMWORD[(-72)+rax],xmm13 > > + movaps XMMWORD[(-56)+rax],xmm14 > > + movaps XMMWORD[(-40)+rax],xmm15 > > + sub rsp,288 > > + and rsp,-256 > > + mov QWORD[272+rsp],rax > > + > > +$L$body: > > + lea rbp,[((K256+128))] > > + lea rbx,[256+rsp] > > + lea rdi,[128+rdi] > > + > > +$L$oop_grande: > > + mov DWORD[280+rsp],edx > > + xor edx,edx > > + mov r8,QWORD[rsi] > > + mov ecx,DWORD[8+rsi] > > + cmp ecx,edx > > + cmovg edx,ecx > > + test ecx,ecx > > + mov DWORD[rbx],ecx > > + cmovle r8,rbp > > + mov r9,QWORD[16+rsi] > > + mov ecx,DWORD[24+rsi] > > + cmp ecx,edx > > + cmovg edx,ecx > > + test ecx,ecx > > + mov DWORD[4+rbx],ecx > > + cmovle r9,rbp > > + mov r10,QWORD[32+rsi] > > + mov ecx,DWORD[40+rsi] > > + cmp ecx,edx > > + cmovg edx,ecx > > + test ecx,ecx > > + mov DWORD[8+rbx],ecx > > + cmovle r10,rbp > > + mov r11,QWORD[48+rsi] > > + mov ecx,DWORD[56+rsi] > > + cmp ecx,edx > > + cmovg edx,ecx > > + test ecx,ecx > > + mov DWORD[12+rbx],ecx > > + cmovle r11,rbp > > + test edx,edx > > + jz NEAR $L$done > > + > > + movdqu xmm8,XMMWORD[((0-128))+rdi] > > + lea rax,[128+rsp] > > + movdqu xmm9,XMMWORD[((32-128))+rdi] > > + movdqu xmm10,XMMWORD[((64-128))+rdi] > > + movdqu xmm11,XMMWORD[((96-128))+rdi] > > + movdqu xmm12,XMMWORD[((128-128))+rdi] > > + movdqu xmm13,XMMWORD[((160-128))+rdi] > > + movdqu xmm14,XMMWORD[((192-128))+rdi] > > + movdqu xmm15,XMMWORD[((224-128))+rdi] > > + movdqu xmm6,XMMWORD[$L$pbswap] > > + jmp NEAR $L$oop > > + > > +ALIGN 32 > > +$L$oop: > > + movdqa xmm4,xmm10 > > + pxor xmm4,xmm9 > > + movd xmm5,DWORD[r8] > > + movd xmm0,DWORD[r9] > > + movd xmm1,DWORD[r10] > > + movd xmm2,DWORD[r11] > > + punpckldq xmm5,xmm1 > > + punpckldq xmm0,xmm2 > > + punpckldq xmm5,xmm0 > > + movdqa xmm7,xmm12 > > +DB 102,15,56,0,238 > > + movdqa xmm2,xmm12 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm12 > > + pslld xmm2,7 > > + movdqa XMMWORD[(0-128)+rax],xmm5 > > + paddd xmm5,xmm15 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[((-128))+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm12 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm3,xmm12 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm14 > > + pand xmm3,xmm13 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm8 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm8 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + movdqa xmm3,xmm9 > > + movdqa xmm7,xmm8 > > + pslld xmm2,10 > > + pxor xmm3,xmm8 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd 
xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm4,xmm3 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm15,xmm9 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm15,xmm4 > > + paddd xmm11,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm15,xmm5 > > + paddd xmm15,xmm7 > > + movd xmm5,DWORD[4+r8] > > + movd xmm0,DWORD[4+r9] > > + movd xmm1,DWORD[4+r10] > > + movd xmm2,DWORD[4+r11] > > + punpckldq xmm5,xmm1 > > + punpckldq xmm0,xmm2 > > + punpckldq xmm5,xmm0 > > + movdqa xmm7,xmm11 > > + > > + movdqa xmm2,xmm11 > > +DB 102,15,56,0,238 > > + psrld xmm7,6 > > + movdqa xmm1,xmm11 > > + pslld xmm2,7 > > + movdqa XMMWORD[(16-128)+rax],xmm5 > > + paddd xmm5,xmm14 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[((-96))+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm11 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm4,xmm11 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm13 > > + pand xmm4,xmm12 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm15 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm15 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm4 > > + movdqa xmm4,xmm8 > > + movdqa xmm7,xmm15 > > + pslld xmm2,10 > > + pxor xmm4,xmm15 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm3,xmm4 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm14,xmm8 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm14,xmm3 > > + paddd xmm10,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm14,xmm5 > > + paddd xmm14,xmm7 > > + movd xmm5,DWORD[8+r8] > > + movd xmm0,DWORD[8+r9] > > + movd xmm1,DWORD[8+r10] > > + movd xmm2,DWORD[8+r11] > > + punpckldq xmm5,xmm1 > > + punpckldq xmm0,xmm2 > > + punpckldq xmm5,xmm0 > > + movdqa xmm7,xmm10 > > +DB 102,15,56,0,238 > > + movdqa xmm2,xmm10 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm10 > > + pslld xmm2,7 > > + movdqa XMMWORD[(32-128)+rax],xmm5 > > + paddd xmm5,xmm13 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[((-64))+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm10 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm3,xmm10 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm12 > > + pand xmm3,xmm11 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm14 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm14 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + movdqa xmm3,xmm15 > > + movdqa xmm7,xmm14 > > + pslld xmm2,10 > > + pxor xmm3,xmm14 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm4,xmm3 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm13,xmm15 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm13,xmm4 > > + paddd xmm9,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm13,xmm5 > > + paddd xmm13,xmm7 > > + movd xmm5,DWORD[12+r8] > > + movd xmm0,DWORD[12+r9] > > + movd xmm1,DWORD[12+r10] > > + movd xmm2,DWORD[12+r11] > > + punpckldq xmm5,xmm1 > > + punpckldq xmm0,xmm2 > > + punpckldq xmm5,xmm0 > > + movdqa xmm7,xmm9 > > + > > + movdqa xmm2,xmm9 > > +DB 102,15,56,0,238 > > + psrld xmm7,6 > > + movdqa xmm1,xmm9 > > + pslld xmm2,7 > > + movdqa XMMWORD[(48-128)+rax],xmm5 > > + paddd xmm5,xmm12 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[((-32))+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm9 
> > + > > + pxor xmm7,xmm2 > > + movdqa xmm4,xmm9 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm11 > > + pand xmm4,xmm10 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm13 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm13 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm4 > > + movdqa xmm4,xmm14 > > + movdqa xmm7,xmm13 > > + pslld xmm2,10 > > + pxor xmm4,xmm13 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm3,xmm4 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm12,xmm14 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm12,xmm3 > > + paddd xmm8,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm12,xmm5 > > + paddd xmm12,xmm7 > > + movd xmm5,DWORD[16+r8] > > + movd xmm0,DWORD[16+r9] > > + movd xmm1,DWORD[16+r10] > > + movd xmm2,DWORD[16+r11] > > + punpckldq xmm5,xmm1 > > + punpckldq xmm0,xmm2 > > + punpckldq xmm5,xmm0 > > + movdqa xmm7,xmm8 > > +DB 102,15,56,0,238 > > + movdqa xmm2,xmm8 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm8 > > + pslld xmm2,7 > > + movdqa XMMWORD[(64-128)+rax],xmm5 > > + paddd xmm5,xmm11 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm8 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm3,xmm8 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm10 > > + pand xmm3,xmm9 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm12 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm12 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + movdqa xmm3,xmm13 > > + movdqa xmm7,xmm12 > > + pslld xmm2,10 > > + pxor xmm3,xmm12 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm4,xmm3 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm11,xmm13 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm11,xmm4 > > + paddd xmm15,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm11,xmm5 > > + paddd xmm11,xmm7 > > + movd xmm5,DWORD[20+r8] > > + movd xmm0,DWORD[20+r9] > > + movd xmm1,DWORD[20+r10] > > + movd xmm2,DWORD[20+r11] > > + punpckldq xmm5,xmm1 > > + punpckldq xmm0,xmm2 > > + punpckldq xmm5,xmm0 > > + movdqa xmm7,xmm15 > > + > > + movdqa xmm2,xmm15 > > +DB 102,15,56,0,238 > > + psrld xmm7,6 > > + movdqa xmm1,xmm15 > > + pslld xmm2,7 > > + movdqa XMMWORD[(80-128)+rax],xmm5 > > + paddd xmm5,xmm10 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[32+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm15 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm4,xmm15 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm9 > > + pand xmm4,xmm8 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm11 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm11 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm4 > > + movdqa xmm4,xmm12 > > + movdqa xmm7,xmm11 > > + pslld xmm2,10 > > + pxor xmm4,xmm11 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm3,xmm4 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm10,xmm12 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm10,xmm3 > > + paddd xmm14,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm10,xmm5 > > + paddd xmm10,xmm7 > > + movd xmm5,DWORD[24+r8] > > + movd xmm0,DWORD[24+r9] > > + movd xmm1,DWORD[24+r10] > > + movd xmm2,DWORD[24+r11] > > + punpckldq xmm5,xmm1 > > + 
punpckldq xmm0,xmm2 > > + punpckldq xmm5,xmm0 > > + movdqa xmm7,xmm14 > > +DB 102,15,56,0,238 > > + movdqa xmm2,xmm14 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm14 > > + pslld xmm2,7 > > + movdqa XMMWORD[(96-128)+rax],xmm5 > > + paddd xmm5,xmm9 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[64+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm14 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm3,xmm14 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm8 > > + pand xmm3,xmm15 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm10 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm10 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + movdqa xmm3,xmm11 > > + movdqa xmm7,xmm10 > > + pslld xmm2,10 > > + pxor xmm3,xmm10 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm4,xmm3 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm9,xmm11 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm9,xmm4 > > + paddd xmm13,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm9,xmm5 > > + paddd xmm9,xmm7 > > + movd xmm5,DWORD[28+r8] > > + movd xmm0,DWORD[28+r9] > > + movd xmm1,DWORD[28+r10] > > + movd xmm2,DWORD[28+r11] > > + punpckldq xmm5,xmm1 > > + punpckldq xmm0,xmm2 > > + punpckldq xmm5,xmm0 > > + movdqa xmm7,xmm13 > > + > > + movdqa xmm2,xmm13 > > +DB 102,15,56,0,238 > > + psrld xmm7,6 > > + movdqa xmm1,xmm13 > > + pslld xmm2,7 > > + movdqa XMMWORD[(112-128)+rax],xmm5 > > + paddd xmm5,xmm8 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[96+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm13 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm4,xmm13 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm15 > > + pand xmm4,xmm14 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm9 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm9 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm4 > > + movdqa xmm4,xmm10 > > + movdqa xmm7,xmm9 > > + pslld xmm2,10 > > + pxor xmm4,xmm9 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm3,xmm4 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm8,xmm10 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm8,xmm3 > > + paddd xmm12,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm8,xmm5 > > + paddd xmm8,xmm7 > > + lea rbp,[256+rbp] > > + movd xmm5,DWORD[32+r8] > > + movd xmm0,DWORD[32+r9] > > + movd xmm1,DWORD[32+r10] > > + movd xmm2,DWORD[32+r11] > > + punpckldq xmm5,xmm1 > > + punpckldq xmm0,xmm2 > > + punpckldq xmm5,xmm0 > > + movdqa xmm7,xmm12 > > +DB 102,15,56,0,238 > > + movdqa xmm2,xmm12 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm12 > > + pslld xmm2,7 > > + movdqa XMMWORD[(128-128)+rax],xmm5 > > + paddd xmm5,xmm15 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[((-128))+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm12 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm3,xmm12 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm14 > > + pand xmm3,xmm13 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm8 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm8 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + movdqa xmm3,xmm9 > > + movdqa xmm7,xmm8 > > + pslld xmm2,10 > > + pxor xmm3,xmm8 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + 
pslld xmm2,19-10 > > + pand xmm4,xmm3 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm15,xmm9 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm15,xmm4 > > + paddd xmm11,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm15,xmm5 > > + paddd xmm15,xmm7 > > + movd xmm5,DWORD[36+r8] > > + movd xmm0,DWORD[36+r9] > > + movd xmm1,DWORD[36+r10] > > + movd xmm2,DWORD[36+r11] > > + punpckldq xmm5,xmm1 > > + punpckldq xmm0,xmm2 > > + punpckldq xmm5,xmm0 > > + movdqa xmm7,xmm11 > > + > > + movdqa xmm2,xmm11 > > +DB 102,15,56,0,238 > > + psrld xmm7,6 > > + movdqa xmm1,xmm11 > > + pslld xmm2,7 > > + movdqa XMMWORD[(144-128)+rax],xmm5 > > + paddd xmm5,xmm14 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[((-96))+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm11 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm4,xmm11 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm13 > > + pand xmm4,xmm12 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm15 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm15 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm4 > > + movdqa xmm4,xmm8 > > + movdqa xmm7,xmm15 > > + pslld xmm2,10 > > + pxor xmm4,xmm15 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm3,xmm4 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm14,xmm8 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm14,xmm3 > > + paddd xmm10,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm14,xmm5 > > + paddd xmm14,xmm7 > > + movd xmm5,DWORD[40+r8] > > + movd xmm0,DWORD[40+r9] > > + movd xmm1,DWORD[40+r10] > > + movd xmm2,DWORD[40+r11] > > + punpckldq xmm5,xmm1 > > + punpckldq xmm0,xmm2 > > + punpckldq xmm5,xmm0 > > + movdqa xmm7,xmm10 > > +DB 102,15,56,0,238 > > + movdqa xmm2,xmm10 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm10 > > + pslld xmm2,7 > > + movdqa XMMWORD[(160-128)+rax],xmm5 > > + paddd xmm5,xmm13 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[((-64))+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm10 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm3,xmm10 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm12 > > + pand xmm3,xmm11 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm14 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm14 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + movdqa xmm3,xmm15 > > + movdqa xmm7,xmm14 > > + pslld xmm2,10 > > + pxor xmm3,xmm14 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm4,xmm3 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm13,xmm15 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm13,xmm4 > > + paddd xmm9,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm13,xmm5 > > + paddd xmm13,xmm7 > > + movd xmm5,DWORD[44+r8] > > + movd xmm0,DWORD[44+r9] > > + movd xmm1,DWORD[44+r10] > > + movd xmm2,DWORD[44+r11] > > + punpckldq xmm5,xmm1 > > + punpckldq xmm0,xmm2 > > + punpckldq xmm5,xmm0 > > + movdqa xmm7,xmm9 > > + > > + movdqa xmm2,xmm9 > > +DB 102,15,56,0,238 > > + psrld xmm7,6 > > + movdqa xmm1,xmm9 > > + pslld xmm2,7 > > + movdqa XMMWORD[(176-128)+rax],xmm5 > > + paddd xmm5,xmm12 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[((-32))+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm9 > > + 
> > + pxor xmm7,xmm2 > > + movdqa xmm4,xmm9 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm11 > > + pand xmm4,xmm10 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm13 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm13 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm4 > > + movdqa xmm4,xmm14 > > + movdqa xmm7,xmm13 > > + pslld xmm2,10 > > + pxor xmm4,xmm13 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm3,xmm4 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm12,xmm14 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm12,xmm3 > > + paddd xmm8,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm12,xmm5 > > + paddd xmm12,xmm7 > > + movd xmm5,DWORD[48+r8] > > + movd xmm0,DWORD[48+r9] > > + movd xmm1,DWORD[48+r10] > > + movd xmm2,DWORD[48+r11] > > + punpckldq xmm5,xmm1 > > + punpckldq xmm0,xmm2 > > + punpckldq xmm5,xmm0 > > + movdqa xmm7,xmm8 > > +DB 102,15,56,0,238 > > + movdqa xmm2,xmm8 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm8 > > + pslld xmm2,7 > > + movdqa XMMWORD[(192-128)+rax],xmm5 > > + paddd xmm5,xmm11 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm8 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm3,xmm8 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm10 > > + pand xmm3,xmm9 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm12 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm12 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + movdqa xmm3,xmm13 > > + movdqa xmm7,xmm12 > > + pslld xmm2,10 > > + pxor xmm3,xmm12 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm4,xmm3 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm11,xmm13 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm11,xmm4 > > + paddd xmm15,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm11,xmm5 > > + paddd xmm11,xmm7 > > + movd xmm5,DWORD[52+r8] > > + movd xmm0,DWORD[52+r9] > > + movd xmm1,DWORD[52+r10] > > + movd xmm2,DWORD[52+r11] > > + punpckldq xmm5,xmm1 > > + punpckldq xmm0,xmm2 > > + punpckldq xmm5,xmm0 > > + movdqa xmm7,xmm15 > > + > > + movdqa xmm2,xmm15 > > +DB 102,15,56,0,238 > > + psrld xmm7,6 > > + movdqa xmm1,xmm15 > > + pslld xmm2,7 > > + movdqa XMMWORD[(208-128)+rax],xmm5 > > + paddd xmm5,xmm10 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[32+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm15 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm4,xmm15 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm9 > > + pand xmm4,xmm8 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm11 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm11 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm4 > > + movdqa xmm4,xmm12 > > + movdqa xmm7,xmm11 > > + pslld xmm2,10 > > + pxor xmm4,xmm11 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm3,xmm4 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm10,xmm12 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm10,xmm3 > > + paddd xmm14,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm10,xmm5 > > + paddd xmm10,xmm7 > > + movd xmm5,DWORD[56+r8] > > + movd xmm0,DWORD[56+r9] > > + movd xmm1,DWORD[56+r10] > > + movd xmm2,DWORD[56+r11] > > + punpckldq xmm5,xmm1 > > + 
punpckldq xmm0,xmm2 > > + punpckldq xmm5,xmm0 > > + movdqa xmm7,xmm14 > > +DB 102,15,56,0,238 > > + movdqa xmm2,xmm14 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm14 > > + pslld xmm2,7 > > + movdqa XMMWORD[(224-128)+rax],xmm5 > > + paddd xmm5,xmm9 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[64+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm14 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm3,xmm14 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm8 > > + pand xmm3,xmm15 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm10 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm10 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + movdqa xmm3,xmm11 > > + movdqa xmm7,xmm10 > > + pslld xmm2,10 > > + pxor xmm3,xmm10 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm4,xmm3 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm9,xmm11 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm9,xmm4 > > + paddd xmm13,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm9,xmm5 > > + paddd xmm9,xmm7 > > + movd xmm5,DWORD[60+r8] > > + lea r8,[64+r8] > > + movd xmm0,DWORD[60+r9] > > + lea r9,[64+r9] > > + movd xmm1,DWORD[60+r10] > > + lea r10,[64+r10] > > + movd xmm2,DWORD[60+r11] > > + lea r11,[64+r11] > > + punpckldq xmm5,xmm1 > > + punpckldq xmm0,xmm2 > > + punpckldq xmm5,xmm0 > > + movdqa xmm7,xmm13 > > + > > + movdqa xmm2,xmm13 > > +DB 102,15,56,0,238 > > + psrld xmm7,6 > > + movdqa xmm1,xmm13 > > + pslld xmm2,7 > > + movdqa XMMWORD[(240-128)+rax],xmm5 > > + paddd xmm5,xmm8 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[96+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm13 > > + prefetcht0 [63+r8] > > + pxor xmm7,xmm2 > > + movdqa xmm4,xmm13 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm15 > > + pand xmm4,xmm14 > > + pxor xmm7,xmm1 > > + > > + prefetcht0 [63+r9] > > + movdqa xmm1,xmm9 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm9 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm4 > > + movdqa xmm4,xmm10 > > + movdqa xmm7,xmm9 > > + pslld xmm2,10 > > + pxor xmm4,xmm9 > > + > > + prefetcht0 [63+r10] > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm3,xmm4 > > + pxor xmm1,xmm7 > > + > > + prefetcht0 [63+r11] > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm8,xmm10 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm8,xmm3 > > + paddd xmm12,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm8,xmm5 > > + paddd xmm8,xmm7 > > + lea rbp,[256+rbp] > > + movdqu xmm5,XMMWORD[((0-128))+rax] > > + mov ecx,3 > > + jmp NEAR $L$oop_16_xx > > +ALIGN 32 > > +$L$oop_16_xx: > > + movdqa xmm6,XMMWORD[((16-128))+rax] > > + paddd xmm5,XMMWORD[((144-128))+rax] > > + > > + movdqa xmm7,xmm6 > > + movdqa xmm1,xmm6 > > + psrld xmm7,3 > > + movdqa xmm2,xmm6 > > + > > + psrld xmm1,7 > > + movdqa xmm0,XMMWORD[((224-128))+rax] > > + pslld xmm2,14 > > + pxor xmm7,xmm1 > > + psrld xmm1,18-7 > > + movdqa xmm3,xmm0 > > + pxor xmm7,xmm2 > > + pslld xmm2,25-14 > > + pxor xmm7,xmm1 > > + psrld xmm0,10 > > + movdqa xmm1,xmm3 > > + > > + psrld xmm3,17 > > + pxor xmm7,xmm2 > > + pslld xmm1,13 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + psrld xmm3,19-17 > > + pxor xmm0,xmm1 > > + pslld xmm1,15-13 > > + pxor xmm0,xmm3 > > + pxor xmm0,xmm1 > > + paddd xmm5,xmm0 > > + movdqa xmm7,xmm12 > > + > > + movdqa xmm2,xmm12 > > + > > + 
psrld xmm7,6 > > + movdqa xmm1,xmm12 > > + pslld xmm2,7 > > + movdqa XMMWORD[(0-128)+rax],xmm5 > > + paddd xmm5,xmm15 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[((-128))+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm12 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm3,xmm12 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm14 > > + pand xmm3,xmm13 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm8 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm8 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + movdqa xmm3,xmm9 > > + movdqa xmm7,xmm8 > > + pslld xmm2,10 > > + pxor xmm3,xmm8 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm4,xmm3 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm15,xmm9 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm15,xmm4 > > + paddd xmm11,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm15,xmm5 > > + paddd xmm15,xmm7 > > + movdqa xmm5,XMMWORD[((32-128))+rax] > > + paddd xmm6,XMMWORD[((160-128))+rax] > > + > > + movdqa xmm7,xmm5 > > + movdqa xmm1,xmm5 > > + psrld xmm7,3 > > + movdqa xmm2,xmm5 > > + > > + psrld xmm1,7 > > + movdqa xmm0,XMMWORD[((240-128))+rax] > > + pslld xmm2,14 > > + pxor xmm7,xmm1 > > + psrld xmm1,18-7 > > + movdqa xmm4,xmm0 > > + pxor xmm7,xmm2 > > + pslld xmm2,25-14 > > + pxor xmm7,xmm1 > > + psrld xmm0,10 > > + movdqa xmm1,xmm4 > > + > > + psrld xmm4,17 > > + pxor xmm7,xmm2 > > + pslld xmm1,13 > > + paddd xmm6,xmm7 > > + pxor xmm0,xmm4 > > + psrld xmm4,19-17 > > + pxor xmm0,xmm1 > > + pslld xmm1,15-13 > > + pxor xmm0,xmm4 > > + pxor xmm0,xmm1 > > + paddd xmm6,xmm0 > > + movdqa xmm7,xmm11 > > + > > + movdqa xmm2,xmm11 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm11 > > + pslld xmm2,7 > > + movdqa XMMWORD[(16-128)+rax],xmm6 > > + paddd xmm6,xmm14 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm6,XMMWORD[((-96))+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm11 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm4,xmm11 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm13 > > + pand xmm4,xmm12 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm15 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm15 > > + psrld xmm1,2 > > + paddd xmm6,xmm7 > > + pxor xmm0,xmm4 > > + movdqa xmm4,xmm8 > > + movdqa xmm7,xmm15 > > + pslld xmm2,10 > > + pxor xmm4,xmm15 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm6,xmm0 > > + pslld xmm2,19-10 > > + pand xmm3,xmm4 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm14,xmm8 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm14,xmm3 > > + paddd xmm10,xmm6 > > + pxor xmm7,xmm2 > > + > > + paddd xmm14,xmm6 > > + paddd xmm14,xmm7 > > + movdqa xmm6,XMMWORD[((48-128))+rax] > > + paddd xmm5,XMMWORD[((176-128))+rax] > > + > > + movdqa xmm7,xmm6 > > + movdqa xmm1,xmm6 > > + psrld xmm7,3 > > + movdqa xmm2,xmm6 > > + > > + psrld xmm1,7 > > + movdqa xmm0,XMMWORD[((0-128))+rax] > > + pslld xmm2,14 > > + pxor xmm7,xmm1 > > + psrld xmm1,18-7 > > + movdqa xmm3,xmm0 > > + pxor xmm7,xmm2 > > + pslld xmm2,25-14 > > + pxor xmm7,xmm1 > > + psrld xmm0,10 > > + movdqa xmm1,xmm3 > > + > > + psrld xmm3,17 > > + pxor xmm7,xmm2 > > + pslld xmm1,13 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + psrld xmm3,19-17 > > + pxor xmm0,xmm1 > > + pslld xmm1,15-13 > > + pxor xmm0,xmm3 > > + pxor xmm0,xmm1 > > + paddd xmm5,xmm0 > > + movdqa xmm7,xmm10 
> > + > > + movdqa xmm2,xmm10 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm10 > > + pslld xmm2,7 > > + movdqa XMMWORD[(32-128)+rax],xmm5 > > + paddd xmm5,xmm13 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[((-64))+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm10 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm3,xmm10 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm12 > > + pand xmm3,xmm11 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm14 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm14 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + movdqa xmm3,xmm15 > > + movdqa xmm7,xmm14 > > + pslld xmm2,10 > > + pxor xmm3,xmm14 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm4,xmm3 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm13,xmm15 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm13,xmm4 > > + paddd xmm9,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm13,xmm5 > > + paddd xmm13,xmm7 > > + movdqa xmm5,XMMWORD[((64-128))+rax] > > + paddd xmm6,XMMWORD[((192-128))+rax] > > + > > + movdqa xmm7,xmm5 > > + movdqa xmm1,xmm5 > > + psrld xmm7,3 > > + movdqa xmm2,xmm5 > > + > > + psrld xmm1,7 > > + movdqa xmm0,XMMWORD[((16-128))+rax] > > + pslld xmm2,14 > > + pxor xmm7,xmm1 > > + psrld xmm1,18-7 > > + movdqa xmm4,xmm0 > > + pxor xmm7,xmm2 > > + pslld xmm2,25-14 > > + pxor xmm7,xmm1 > > + psrld xmm0,10 > > + movdqa xmm1,xmm4 > > + > > + psrld xmm4,17 > > + pxor xmm7,xmm2 > > + pslld xmm1,13 > > + paddd xmm6,xmm7 > > + pxor xmm0,xmm4 > > + psrld xmm4,19-17 > > + pxor xmm0,xmm1 > > + pslld xmm1,15-13 > > + pxor xmm0,xmm4 > > + pxor xmm0,xmm1 > > + paddd xmm6,xmm0 > > + movdqa xmm7,xmm9 > > + > > + movdqa xmm2,xmm9 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm9 > > + pslld xmm2,7 > > + movdqa XMMWORD[(48-128)+rax],xmm6 > > + paddd xmm6,xmm12 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm6,XMMWORD[((-32))+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm9 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm4,xmm9 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm11 > > + pand xmm4,xmm10 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm13 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm13 > > + psrld xmm1,2 > > + paddd xmm6,xmm7 > > + pxor xmm0,xmm4 > > + movdqa xmm4,xmm14 > > + movdqa xmm7,xmm13 > > + pslld xmm2,10 > > + pxor xmm4,xmm13 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm6,xmm0 > > + pslld xmm2,19-10 > > + pand xmm3,xmm4 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm12,xmm14 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm12,xmm3 > > + paddd xmm8,xmm6 > > + pxor xmm7,xmm2 > > + > > + paddd xmm12,xmm6 > > + paddd xmm12,xmm7 > > + movdqa xmm6,XMMWORD[((80-128))+rax] > > + paddd xmm5,XMMWORD[((208-128))+rax] > > + > > + movdqa xmm7,xmm6 > > + movdqa xmm1,xmm6 > > + psrld xmm7,3 > > + movdqa xmm2,xmm6 > > + > > + psrld xmm1,7 > > + movdqa xmm0,XMMWORD[((32-128))+rax] > > + pslld xmm2,14 > > + pxor xmm7,xmm1 > > + psrld xmm1,18-7 > > + movdqa xmm3,xmm0 > > + pxor xmm7,xmm2 > > + pslld xmm2,25-14 > > + pxor xmm7,xmm1 > > + psrld xmm0,10 > > + movdqa xmm1,xmm3 > > + > > + psrld xmm3,17 > > + pxor xmm7,xmm2 > > + pslld xmm1,13 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + psrld xmm3,19-17 > > + pxor xmm0,xmm1 > > + pslld xmm1,15-13 > > + pxor xmm0,xmm3 > > + pxor xmm0,xmm1 > > 
+ paddd xmm5,xmm0 > > + movdqa xmm7,xmm8 > > + > > + movdqa xmm2,xmm8 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm8 > > + pslld xmm2,7 > > + movdqa XMMWORD[(64-128)+rax],xmm5 > > + paddd xmm5,xmm11 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm8 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm3,xmm8 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm10 > > + pand xmm3,xmm9 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm12 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm12 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + movdqa xmm3,xmm13 > > + movdqa xmm7,xmm12 > > + pslld xmm2,10 > > + pxor xmm3,xmm12 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm4,xmm3 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm11,xmm13 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm11,xmm4 > > + paddd xmm15,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm11,xmm5 > > + paddd xmm11,xmm7 > > + movdqa xmm5,XMMWORD[((96-128))+rax] > > + paddd xmm6,XMMWORD[((224-128))+rax] > > + > > + movdqa xmm7,xmm5 > > + movdqa xmm1,xmm5 > > + psrld xmm7,3 > > + movdqa xmm2,xmm5 > > + > > + psrld xmm1,7 > > + movdqa xmm0,XMMWORD[((48-128))+rax] > > + pslld xmm2,14 > > + pxor xmm7,xmm1 > > + psrld xmm1,18-7 > > + movdqa xmm4,xmm0 > > + pxor xmm7,xmm2 > > + pslld xmm2,25-14 > > + pxor xmm7,xmm1 > > + psrld xmm0,10 > > + movdqa xmm1,xmm4 > > + > > + psrld xmm4,17 > > + pxor xmm7,xmm2 > > + pslld xmm1,13 > > + paddd xmm6,xmm7 > > + pxor xmm0,xmm4 > > + psrld xmm4,19-17 > > + pxor xmm0,xmm1 > > + pslld xmm1,15-13 > > + pxor xmm0,xmm4 > > + pxor xmm0,xmm1 > > + paddd xmm6,xmm0 > > + movdqa xmm7,xmm15 > > + > > + movdqa xmm2,xmm15 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm15 > > + pslld xmm2,7 > > + movdqa XMMWORD[(80-128)+rax],xmm6 > > + paddd xmm6,xmm10 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm6,XMMWORD[32+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm15 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm4,xmm15 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm9 > > + pand xmm4,xmm8 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm11 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm11 > > + psrld xmm1,2 > > + paddd xmm6,xmm7 > > + pxor xmm0,xmm4 > > + movdqa xmm4,xmm12 > > + movdqa xmm7,xmm11 > > + pslld xmm2,10 > > + pxor xmm4,xmm11 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm6,xmm0 > > + pslld xmm2,19-10 > > + pand xmm3,xmm4 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm10,xmm12 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm10,xmm3 > > + paddd xmm14,xmm6 > > + pxor xmm7,xmm2 > > + > > + paddd xmm10,xmm6 > > + paddd xmm10,xmm7 > > + movdqa xmm6,XMMWORD[((112-128))+rax] > > + paddd xmm5,XMMWORD[((240-128))+rax] > > + > > + movdqa xmm7,xmm6 > > + movdqa xmm1,xmm6 > > + psrld xmm7,3 > > + movdqa xmm2,xmm6 > > + > > + psrld xmm1,7 > > + movdqa xmm0,XMMWORD[((64-128))+rax] > > + pslld xmm2,14 > > + pxor xmm7,xmm1 > > + psrld xmm1,18-7 > > + movdqa xmm3,xmm0 > > + pxor xmm7,xmm2 > > + pslld xmm2,25-14 > > + pxor xmm7,xmm1 > > + psrld xmm0,10 > > + movdqa xmm1,xmm3 > > + > > + psrld xmm3,17 > > + pxor xmm7,xmm2 > > + pslld xmm1,13 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + psrld xmm3,19-17 > > + pxor xmm0,xmm1 > > + pslld xmm1,15-13 > > + pxor 
xmm0,xmm3 > > + pxor xmm0,xmm1 > > + paddd xmm5,xmm0 > > + movdqa xmm7,xmm14 > > + > > + movdqa xmm2,xmm14 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm14 > > + pslld xmm2,7 > > + movdqa XMMWORD[(96-128)+rax],xmm5 > > + paddd xmm5,xmm9 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[64+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm14 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm3,xmm14 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm8 > > + pand xmm3,xmm15 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm10 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm10 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + movdqa xmm3,xmm11 > > + movdqa xmm7,xmm10 > > + pslld xmm2,10 > > + pxor xmm3,xmm10 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm4,xmm3 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm9,xmm11 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm9,xmm4 > > + paddd xmm13,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm9,xmm5 > > + paddd xmm9,xmm7 > > + movdqa xmm5,XMMWORD[((128-128))+rax] > > + paddd xmm6,XMMWORD[((0-128))+rax] > > + > > + movdqa xmm7,xmm5 > > + movdqa xmm1,xmm5 > > + psrld xmm7,3 > > + movdqa xmm2,xmm5 > > + > > + psrld xmm1,7 > > + movdqa xmm0,XMMWORD[((80-128))+rax] > > + pslld xmm2,14 > > + pxor xmm7,xmm1 > > + psrld xmm1,18-7 > > + movdqa xmm4,xmm0 > > + pxor xmm7,xmm2 > > + pslld xmm2,25-14 > > + pxor xmm7,xmm1 > > + psrld xmm0,10 > > + movdqa xmm1,xmm4 > > + > > + psrld xmm4,17 > > + pxor xmm7,xmm2 > > + pslld xmm1,13 > > + paddd xmm6,xmm7 > > + pxor xmm0,xmm4 > > + psrld xmm4,19-17 > > + pxor xmm0,xmm1 > > + pslld xmm1,15-13 > > + pxor xmm0,xmm4 > > + pxor xmm0,xmm1 > > + paddd xmm6,xmm0 > > + movdqa xmm7,xmm13 > > + > > + movdqa xmm2,xmm13 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm13 > > + pslld xmm2,7 > > + movdqa XMMWORD[(112-128)+rax],xmm6 > > + paddd xmm6,xmm8 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm6,XMMWORD[96+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm13 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm4,xmm13 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm15 > > + pand xmm4,xmm14 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm9 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm9 > > + psrld xmm1,2 > > + paddd xmm6,xmm7 > > + pxor xmm0,xmm4 > > + movdqa xmm4,xmm10 > > + movdqa xmm7,xmm9 > > + pslld xmm2,10 > > + pxor xmm4,xmm9 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm6,xmm0 > > + pslld xmm2,19-10 > > + pand xmm3,xmm4 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm8,xmm10 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm8,xmm3 > > + paddd xmm12,xmm6 > > + pxor xmm7,xmm2 > > + > > + paddd xmm8,xmm6 > > + paddd xmm8,xmm7 > > + lea rbp,[256+rbp] > > + movdqa xmm6,XMMWORD[((144-128))+rax] > > + paddd xmm5,XMMWORD[((16-128))+rax] > > + > > + movdqa xmm7,xmm6 > > + movdqa xmm1,xmm6 > > + psrld xmm7,3 > > + movdqa xmm2,xmm6 > > + > > + psrld xmm1,7 > > + movdqa xmm0,XMMWORD[((96-128))+rax] > > + pslld xmm2,14 > > + pxor xmm7,xmm1 > > + psrld xmm1,18-7 > > + movdqa xmm3,xmm0 > > + pxor xmm7,xmm2 > > + pslld xmm2,25-14 > > + pxor xmm7,xmm1 > > + psrld xmm0,10 > > + movdqa xmm1,xmm3 > > + > > + psrld xmm3,17 > > + pxor xmm7,xmm2 > > + pslld xmm1,13 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + psrld xmm3,19-17 > > + 
pxor xmm0,xmm1 > > + pslld xmm1,15-13 > > + pxor xmm0,xmm3 > > + pxor xmm0,xmm1 > > + paddd xmm5,xmm0 > > + movdqa xmm7,xmm12 > > + > > + movdqa xmm2,xmm12 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm12 > > + pslld xmm2,7 > > + movdqa XMMWORD[(128-128)+rax],xmm5 > > + paddd xmm5,xmm15 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[((-128))+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm12 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm3,xmm12 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm14 > > + pand xmm3,xmm13 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm8 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm8 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + movdqa xmm3,xmm9 > > + movdqa xmm7,xmm8 > > + pslld xmm2,10 > > + pxor xmm3,xmm8 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm4,xmm3 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm15,xmm9 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm15,xmm4 > > + paddd xmm11,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm15,xmm5 > > + paddd xmm15,xmm7 > > + movdqa xmm5,XMMWORD[((160-128))+rax] > > + paddd xmm6,XMMWORD[((32-128))+rax] > > + > > + movdqa xmm7,xmm5 > > + movdqa xmm1,xmm5 > > + psrld xmm7,3 > > + movdqa xmm2,xmm5 > > + > > + psrld xmm1,7 > > + movdqa xmm0,XMMWORD[((112-128))+rax] > > + pslld xmm2,14 > > + pxor xmm7,xmm1 > > + psrld xmm1,18-7 > > + movdqa xmm4,xmm0 > > + pxor xmm7,xmm2 > > + pslld xmm2,25-14 > > + pxor xmm7,xmm1 > > + psrld xmm0,10 > > + movdqa xmm1,xmm4 > > + > > + psrld xmm4,17 > > + pxor xmm7,xmm2 > > + pslld xmm1,13 > > + paddd xmm6,xmm7 > > + pxor xmm0,xmm4 > > + psrld xmm4,19-17 > > + pxor xmm0,xmm1 > > + pslld xmm1,15-13 > > + pxor xmm0,xmm4 > > + pxor xmm0,xmm1 > > + paddd xmm6,xmm0 > > + movdqa xmm7,xmm11 > > + > > + movdqa xmm2,xmm11 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm11 > > + pslld xmm2,7 > > + movdqa XMMWORD[(144-128)+rax],xmm6 > > + paddd xmm6,xmm14 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm6,XMMWORD[((-96))+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm11 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm4,xmm11 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm13 > > + pand xmm4,xmm12 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm15 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm15 > > + psrld xmm1,2 > > + paddd xmm6,xmm7 > > + pxor xmm0,xmm4 > > + movdqa xmm4,xmm8 > > + movdqa xmm7,xmm15 > > + pslld xmm2,10 > > + pxor xmm4,xmm15 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm6,xmm0 > > + pslld xmm2,19-10 > > + pand xmm3,xmm4 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm14,xmm8 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm14,xmm3 > > + paddd xmm10,xmm6 > > + pxor xmm7,xmm2 > > + > > + paddd xmm14,xmm6 > > + paddd xmm14,xmm7 > > + movdqa xmm6,XMMWORD[((176-128))+rax] > > + paddd xmm5,XMMWORD[((48-128))+rax] > > + > > + movdqa xmm7,xmm6 > > + movdqa xmm1,xmm6 > > + psrld xmm7,3 > > + movdqa xmm2,xmm6 > > + > > + psrld xmm1,7 > > + movdqa xmm0,XMMWORD[((128-128))+rax] > > + pslld xmm2,14 > > + pxor xmm7,xmm1 > > + psrld xmm1,18-7 > > + movdqa xmm3,xmm0 > > + pxor xmm7,xmm2 > > + pslld xmm2,25-14 > > + pxor xmm7,xmm1 > > + psrld xmm0,10 > > + movdqa xmm1,xmm3 > > + > > + psrld xmm3,17 > > + pxor xmm7,xmm2 > > + pslld xmm1,13 > > + paddd xmm5,xmm7 > > 
+ pxor xmm0,xmm3 > > + psrld xmm3,19-17 > > + pxor xmm0,xmm1 > > + pslld xmm1,15-13 > > + pxor xmm0,xmm3 > > + pxor xmm0,xmm1 > > + paddd xmm5,xmm0 > > + movdqa xmm7,xmm10 > > + > > + movdqa xmm2,xmm10 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm10 > > + pslld xmm2,7 > > + movdqa XMMWORD[(160-128)+rax],xmm5 > > + paddd xmm5,xmm13 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[((-64))+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm10 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm3,xmm10 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm12 > > + pand xmm3,xmm11 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm14 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm14 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + movdqa xmm3,xmm15 > > + movdqa xmm7,xmm14 > > + pslld xmm2,10 > > + pxor xmm3,xmm14 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm4,xmm3 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm13,xmm15 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm13,xmm4 > > + paddd xmm9,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm13,xmm5 > > + paddd xmm13,xmm7 > > + movdqa xmm5,XMMWORD[((192-128))+rax] > > + paddd xmm6,XMMWORD[((64-128))+rax] > > + > > + movdqa xmm7,xmm5 > > + movdqa xmm1,xmm5 > > + psrld xmm7,3 > > + movdqa xmm2,xmm5 > > + > > + psrld xmm1,7 > > + movdqa xmm0,XMMWORD[((144-128))+rax] > > + pslld xmm2,14 > > + pxor xmm7,xmm1 > > + psrld xmm1,18-7 > > + movdqa xmm4,xmm0 > > + pxor xmm7,xmm2 > > + pslld xmm2,25-14 > > + pxor xmm7,xmm1 > > + psrld xmm0,10 > > + movdqa xmm1,xmm4 > > + > > + psrld xmm4,17 > > + pxor xmm7,xmm2 > > + pslld xmm1,13 > > + paddd xmm6,xmm7 > > + pxor xmm0,xmm4 > > + psrld xmm4,19-17 > > + pxor xmm0,xmm1 > > + pslld xmm1,15-13 > > + pxor xmm0,xmm4 > > + pxor xmm0,xmm1 > > + paddd xmm6,xmm0 > > + movdqa xmm7,xmm9 > > + > > + movdqa xmm2,xmm9 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm9 > > + pslld xmm2,7 > > + movdqa XMMWORD[(176-128)+rax],xmm6 > > + paddd xmm6,xmm12 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm6,XMMWORD[((-32))+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm9 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm4,xmm9 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm11 > > + pand xmm4,xmm10 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm13 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm13 > > + psrld xmm1,2 > > + paddd xmm6,xmm7 > > + pxor xmm0,xmm4 > > + movdqa xmm4,xmm14 > > + movdqa xmm7,xmm13 > > + pslld xmm2,10 > > + pxor xmm4,xmm13 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm6,xmm0 > > + pslld xmm2,19-10 > > + pand xmm3,xmm4 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm12,xmm14 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm12,xmm3 > > + paddd xmm8,xmm6 > > + pxor xmm7,xmm2 > > + > > + paddd xmm12,xmm6 > > + paddd xmm12,xmm7 > > + movdqa xmm6,XMMWORD[((208-128))+rax] > > + paddd xmm5,XMMWORD[((80-128))+rax] > > + > > + movdqa xmm7,xmm6 > > + movdqa xmm1,xmm6 > > + psrld xmm7,3 > > + movdqa xmm2,xmm6 > > + > > + psrld xmm1,7 > > + movdqa xmm0,XMMWORD[((160-128))+rax] > > + pslld xmm2,14 > > + pxor xmm7,xmm1 > > + psrld xmm1,18-7 > > + movdqa xmm3,xmm0 > > + pxor xmm7,xmm2 > > + pslld xmm2,25-14 > > + pxor xmm7,xmm1 > > + psrld xmm0,10 > > + movdqa xmm1,xmm3 > > + > > + psrld xmm3,17 > > + pxor xmm7,xmm2 
> > + pslld xmm1,13 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + psrld xmm3,19-17 > > + pxor xmm0,xmm1 > > + pslld xmm1,15-13 > > + pxor xmm0,xmm3 > > + pxor xmm0,xmm1 > > + paddd xmm5,xmm0 > > + movdqa xmm7,xmm8 > > + > > + movdqa xmm2,xmm8 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm8 > > + pslld xmm2,7 > > + movdqa XMMWORD[(192-128)+rax],xmm5 > > + paddd xmm5,xmm11 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm8 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm3,xmm8 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm10 > > + pand xmm3,xmm9 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm12 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm12 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + movdqa xmm3,xmm13 > > + movdqa xmm7,xmm12 > > + pslld xmm2,10 > > + pxor xmm3,xmm12 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm4,xmm3 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm11,xmm13 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm11,xmm4 > > + paddd xmm15,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm11,xmm5 > > + paddd xmm11,xmm7 > > + movdqa xmm5,XMMWORD[((224-128))+rax] > > + paddd xmm6,XMMWORD[((96-128))+rax] > > + > > + movdqa xmm7,xmm5 > > + movdqa xmm1,xmm5 > > + psrld xmm7,3 > > + movdqa xmm2,xmm5 > > + > > + psrld xmm1,7 > > + movdqa xmm0,XMMWORD[((176-128))+rax] > > + pslld xmm2,14 > > + pxor xmm7,xmm1 > > + psrld xmm1,18-7 > > + movdqa xmm4,xmm0 > > + pxor xmm7,xmm2 > > + pslld xmm2,25-14 > > + pxor xmm7,xmm1 > > + psrld xmm0,10 > > + movdqa xmm1,xmm4 > > + > > + psrld xmm4,17 > > + pxor xmm7,xmm2 > > + pslld xmm1,13 > > + paddd xmm6,xmm7 > > + pxor xmm0,xmm4 > > + psrld xmm4,19-17 > > + pxor xmm0,xmm1 > > + pslld xmm1,15-13 > > + pxor xmm0,xmm4 > > + pxor xmm0,xmm1 > > + paddd xmm6,xmm0 > > + movdqa xmm7,xmm15 > > + > > + movdqa xmm2,xmm15 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm15 > > + pslld xmm2,7 > > + movdqa XMMWORD[(208-128)+rax],xmm6 > > + paddd xmm6,xmm10 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm6,XMMWORD[32+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm15 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm4,xmm15 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm9 > > + pand xmm4,xmm8 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm11 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm11 > > + psrld xmm1,2 > > + paddd xmm6,xmm7 > > + pxor xmm0,xmm4 > > + movdqa xmm4,xmm12 > > + movdqa xmm7,xmm11 > > + pslld xmm2,10 > > + pxor xmm4,xmm11 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm6,xmm0 > > + pslld xmm2,19-10 > > + pand xmm3,xmm4 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm10,xmm12 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm10,xmm3 > > + paddd xmm14,xmm6 > > + pxor xmm7,xmm2 > > + > > + paddd xmm10,xmm6 > > + paddd xmm10,xmm7 > > + movdqa xmm6,XMMWORD[((240-128))+rax] > > + paddd xmm5,XMMWORD[((112-128))+rax] > > + > > + movdqa xmm7,xmm6 > > + movdqa xmm1,xmm6 > > + psrld xmm7,3 > > + movdqa xmm2,xmm6 > > + > > + psrld xmm1,7 > > + movdqa xmm0,XMMWORD[((192-128))+rax] > > + pslld xmm2,14 > > + pxor xmm7,xmm1 > > + psrld xmm1,18-7 > > + movdqa xmm3,xmm0 > > + pxor xmm7,xmm2 > > + pslld xmm2,25-14 > > + pxor xmm7,xmm1 > > + psrld xmm0,10 > > + movdqa xmm1,xmm3 > > + > > + 
psrld xmm3,17 > > + pxor xmm7,xmm2 > > + pslld xmm1,13 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + psrld xmm3,19-17 > > + pxor xmm0,xmm1 > > + pslld xmm1,15-13 > > + pxor xmm0,xmm3 > > + pxor xmm0,xmm1 > > + paddd xmm5,xmm0 > > + movdqa xmm7,xmm14 > > + > > + movdqa xmm2,xmm14 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm14 > > + pslld xmm2,7 > > + movdqa XMMWORD[(224-128)+rax],xmm5 > > + paddd xmm5,xmm9 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm5,XMMWORD[64+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm14 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm3,xmm14 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm8 > > + pand xmm3,xmm15 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm10 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm10 > > + psrld xmm1,2 > > + paddd xmm5,xmm7 > > + pxor xmm0,xmm3 > > + movdqa xmm3,xmm11 > > + movdqa xmm7,xmm10 > > + pslld xmm2,10 > > + pxor xmm3,xmm10 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm5,xmm0 > > + pslld xmm2,19-10 > > + pand xmm4,xmm3 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm9,xmm11 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm9,xmm4 > > + paddd xmm13,xmm5 > > + pxor xmm7,xmm2 > > + > > + paddd xmm9,xmm5 > > + paddd xmm9,xmm7 > > + movdqa xmm5,XMMWORD[((0-128))+rax] > > + paddd xmm6,XMMWORD[((128-128))+rax] > > + > > + movdqa xmm7,xmm5 > > + movdqa xmm1,xmm5 > > + psrld xmm7,3 > > + movdqa xmm2,xmm5 > > + > > + psrld xmm1,7 > > + movdqa xmm0,XMMWORD[((208-128))+rax] > > + pslld xmm2,14 > > + pxor xmm7,xmm1 > > + psrld xmm1,18-7 > > + movdqa xmm4,xmm0 > > + pxor xmm7,xmm2 > > + pslld xmm2,25-14 > > + pxor xmm7,xmm1 > > + psrld xmm0,10 > > + movdqa xmm1,xmm4 > > + > > + psrld xmm4,17 > > + pxor xmm7,xmm2 > > + pslld xmm1,13 > > + paddd xmm6,xmm7 > > + pxor xmm0,xmm4 > > + psrld xmm4,19-17 > > + pxor xmm0,xmm1 > > + pslld xmm1,15-13 > > + pxor xmm0,xmm4 > > + pxor xmm0,xmm1 > > + paddd xmm6,xmm0 > > + movdqa xmm7,xmm13 > > + > > + movdqa xmm2,xmm13 > > + > > + psrld xmm7,6 > > + movdqa xmm1,xmm13 > > + pslld xmm2,7 > > + movdqa XMMWORD[(240-128)+rax],xmm6 > > + paddd xmm6,xmm8 > > + > > + psrld xmm1,11 > > + pxor xmm7,xmm2 > > + pslld xmm2,21-7 > > + paddd xmm6,XMMWORD[96+rbp] > > + pxor xmm7,xmm1 > > + > > + psrld xmm1,25-11 > > + movdqa xmm0,xmm13 > > + > > + pxor xmm7,xmm2 > > + movdqa xmm4,xmm13 > > + pslld xmm2,26-21 > > + pandn xmm0,xmm15 > > + pand xmm4,xmm14 > > + pxor xmm7,xmm1 > > + > > + > > + movdqa xmm1,xmm9 > > + pxor xmm7,xmm2 > > + movdqa xmm2,xmm9 > > + psrld xmm1,2 > > + paddd xmm6,xmm7 > > + pxor xmm0,xmm4 > > + movdqa xmm4,xmm10 > > + movdqa xmm7,xmm9 > > + pslld xmm2,10 > > + pxor xmm4,xmm9 > > + > > + > > + psrld xmm7,13 > > + pxor xmm1,xmm2 > > + paddd xmm6,xmm0 > > + pslld xmm2,19-10 > > + pand xmm3,xmm4 > > + pxor xmm1,xmm7 > > + > > + > > + psrld xmm7,22-13 > > + pxor xmm1,xmm2 > > + movdqa xmm8,xmm10 > > + pslld xmm2,30-19 > > + pxor xmm7,xmm1 > > + pxor xmm8,xmm3 > > + paddd xmm12,xmm6 > > + pxor xmm7,xmm2 > > + > > + paddd xmm8,xmm6 > > + paddd xmm8,xmm7 > > + lea rbp,[256+rbp] > > + dec ecx > > + jnz NEAR $L$oop_16_xx > > + > > + mov ecx,1 > > + lea rbp,[((K256+128))] > > + > > + movdqa xmm7,XMMWORD[rbx] > > + cmp ecx,DWORD[rbx] > > + pxor xmm0,xmm0 > > + cmovge r8,rbp > > + cmp ecx,DWORD[4+rbx] > > + movdqa xmm6,xmm7 > > + cmovge r9,rbp > > + cmp ecx,DWORD[8+rbx] > > + pcmpgtd xmm6,xmm0 > > + cmovge r10,rbp > > + cmp ecx,DWORD[12+rbx] > > + paddd xmm7,xmm6 > > + cmovge 
r11,rbp > > + > > + movdqu xmm0,XMMWORD[((0-128))+rdi] > > + pand xmm8,xmm6 > > + movdqu xmm1,XMMWORD[((32-128))+rdi] > > + pand xmm9,xmm6 > > + movdqu xmm2,XMMWORD[((64-128))+rdi] > > + pand xmm10,xmm6 > > + movdqu xmm5,XMMWORD[((96-128))+rdi] > > + pand xmm11,xmm6 > > + paddd xmm8,xmm0 > > + movdqu xmm0,XMMWORD[((128-128))+rdi] > > + pand xmm12,xmm6 > > + paddd xmm9,xmm1 > > + movdqu xmm1,XMMWORD[((160-128))+rdi] > > + pand xmm13,xmm6 > > + paddd xmm10,xmm2 > > + movdqu xmm2,XMMWORD[((192-128))+rdi] > > + pand xmm14,xmm6 > > + paddd xmm11,xmm5 > > + movdqu xmm5,XMMWORD[((224-128))+rdi] > > + pand xmm15,xmm6 > > + paddd xmm12,xmm0 > > + paddd xmm13,xmm1 > > + movdqu XMMWORD[(0-128)+rdi],xmm8 > > + paddd xmm14,xmm2 > > + movdqu XMMWORD[(32-128)+rdi],xmm9 > > + paddd xmm15,xmm5 > > + movdqu XMMWORD[(64-128)+rdi],xmm10 > > + movdqu XMMWORD[(96-128)+rdi],xmm11 > > + movdqu XMMWORD[(128-128)+rdi],xmm12 > > + movdqu XMMWORD[(160-128)+rdi],xmm13 > > + movdqu XMMWORD[(192-128)+rdi],xmm14 > > + movdqu XMMWORD[(224-128)+rdi],xmm15 > > + > > + movdqa XMMWORD[rbx],xmm7 > > + movdqa xmm6,XMMWORD[$L$pbswap] > > + dec edx > > + jnz NEAR $L$oop > > + > > + mov edx,DWORD[280+rsp] > > + lea rdi,[16+rdi] > > + lea rsi,[64+rsi] > > + dec edx > > + jnz NEAR $L$oop_grande > > + > > +$L$done: > > + mov rax,QWORD[272+rsp] > > + > > + movaps xmm6,XMMWORD[((-184))+rax] > > + movaps xmm7,XMMWORD[((-168))+rax] > > + movaps xmm8,XMMWORD[((-152))+rax] > > + movaps xmm9,XMMWORD[((-136))+rax] > > + movaps xmm10,XMMWORD[((-120))+rax] > > + movaps xmm11,XMMWORD[((-104))+rax] > > + movaps xmm12,XMMWORD[((-88))+rax] > > + movaps xmm13,XMMWORD[((-72))+rax] > > + movaps xmm14,XMMWORD[((-56))+rax] > > + movaps xmm15,XMMWORD[((-40))+rax] > > + mov rbp,QWORD[((-16))+rax] > > + > > + mov rbx,QWORD[((-8))+rax] > > + > > + lea rsp,[rax] > > + > > +$L$epilogue: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_sha256_multi_block: > > + > > +ALIGN 32 > > +sha256_multi_block_shaext: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_sha256_multi_block_shaext: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + > > + > > + > > +_shaext_shortcut: > > + mov rax,rsp > > + > > + push rbx > > + > > + push rbp > > + > > + lea rsp,[((-168))+rsp] > > + movaps XMMWORD[rsp],xmm6 > > + movaps XMMWORD[16+rsp],xmm7 > > + movaps XMMWORD[32+rsp],xmm8 > > + movaps XMMWORD[48+rsp],xmm9 > > + movaps XMMWORD[(-120)+rax],xmm10 > > + movaps XMMWORD[(-104)+rax],xmm11 > > + movaps XMMWORD[(-88)+rax],xmm12 > > + movaps XMMWORD[(-72)+rax],xmm13 > > + movaps XMMWORD[(-56)+rax],xmm14 > > + movaps XMMWORD[(-40)+rax],xmm15 > > + sub rsp,288 > > + shl edx,1 > > + and rsp,-256 > > + lea rdi,[128+rdi] > > + mov QWORD[272+rsp],rax > > +$L$body_shaext: > > + lea rbx,[256+rsp] > > + lea rbp,[((K256_shaext+128))] > > + > > +$L$oop_grande_shaext: > > + mov DWORD[280+rsp],edx > > + xor edx,edx > > + mov r8,QWORD[rsi] > > + mov ecx,DWORD[8+rsi] > > + cmp ecx,edx > > + cmovg edx,ecx > > + test ecx,ecx > > + mov DWORD[rbx],ecx > > + cmovle r8,rsp > > + mov r9,QWORD[16+rsi] > > + mov ecx,DWORD[24+rsi] > > + cmp ecx,edx > > + cmovg edx,ecx > > + test ecx,ecx > > + mov DWORD[4+rbx],ecx > > + cmovle r9,rsp > > + test edx,edx > > + jz NEAR $L$done_shaext > > + > > + movq xmm12,QWORD[((0-128))+rdi] > > + movq xmm4,QWORD[((32-128))+rdi] > > + movq xmm13,QWORD[((64-128))+rdi] > > + movq xmm5,QWORD[((96-128))+rdi] > > + movq xmm8,QWORD[((128-128))+rdi] > 
> + movq xmm9,QWORD[((160-128))+rdi] > > + movq xmm10,QWORD[((192-128))+rdi] > > + movq xmm11,QWORD[((224-128))+rdi] > > + > > + punpckldq xmm12,xmm4 > > + punpckldq xmm13,xmm5 > > + punpckldq xmm8,xmm9 > > + punpckldq xmm10,xmm11 > > + movdqa xmm3,XMMWORD[((K256_shaext-16))] > > + > > + movdqa xmm14,xmm12 > > + movdqa xmm15,xmm13 > > + punpcklqdq xmm12,xmm8 > > + punpcklqdq xmm13,xmm10 > > + punpckhqdq xmm14,xmm8 > > + punpckhqdq xmm15,xmm10 > > + > > + pshufd xmm12,xmm12,27 > > + pshufd xmm13,xmm13,27 > > + pshufd xmm14,xmm14,27 > > + pshufd xmm15,xmm15,27 > > + jmp NEAR $L$oop_shaext > > + > > +ALIGN 32 > > +$L$oop_shaext: > > + movdqu xmm4,XMMWORD[r8] > > + movdqu xmm8,XMMWORD[r9] > > + movdqu xmm5,XMMWORD[16+r8] > > + movdqu xmm9,XMMWORD[16+r9] > > + movdqu xmm6,XMMWORD[32+r8] > > +DB 102,15,56,0,227 > > + movdqu xmm10,XMMWORD[32+r9] > > +DB 102,68,15,56,0,195 > > + movdqu xmm7,XMMWORD[48+r8] > > + lea r8,[64+r8] > > + movdqu xmm11,XMMWORD[48+r9] > > + lea r9,[64+r9] > > + > > + movdqa xmm0,XMMWORD[((0-128))+rbp] > > +DB 102,15,56,0,235 > > + paddd xmm0,xmm4 > > + pxor xmm4,xmm12 > > + movdqa xmm1,xmm0 > > + movdqa xmm2,XMMWORD[((0-128))+rbp] > > +DB 102,68,15,56,0,203 > > + paddd xmm2,xmm8 > > + movdqa XMMWORD[80+rsp],xmm13 > > +DB 69,15,56,203,236 > > + pxor xmm8,xmm14 > > + movdqa xmm0,xmm2 > > + movdqa XMMWORD[112+rsp],xmm15 > > +DB 69,15,56,203,254 > > + pshufd xmm0,xmm1,0x0e > > + pxor xmm4,xmm12 > > + movdqa XMMWORD[64+rsp],xmm12 > > +DB 69,15,56,203,229 > > + pshufd xmm0,xmm2,0x0e > > + pxor xmm8,xmm14 > > + movdqa XMMWORD[96+rsp],xmm14 > > + movdqa xmm1,XMMWORD[((16-128))+rbp] > > + paddd xmm1,xmm5 > > +DB 102,15,56,0,243 > > +DB 69,15,56,203,247 > > + > > + movdqa xmm0,xmm1 > > + movdqa xmm2,XMMWORD[((16-128))+rbp] > > + paddd xmm2,xmm9 > > +DB 69,15,56,203,236 > > + movdqa xmm0,xmm2 > > + prefetcht0 [127+r8] > > +DB 102,15,56,0,251 > > +DB 102,68,15,56,0,211 > > + prefetcht0 [127+r9] > > +DB 69,15,56,203,254 > > + pshufd xmm0,xmm1,0x0e > > +DB 102,68,15,56,0,219 > > +DB 15,56,204,229 > > +DB 69,15,56,203,229 > > + pshufd xmm0,xmm2,0x0e > > + movdqa xmm1,XMMWORD[((32-128))+rbp] > > + paddd xmm1,xmm6 > > +DB 69,15,56,203,247 > > + > > + movdqa xmm0,xmm1 > > + movdqa xmm2,XMMWORD[((32-128))+rbp] > > + paddd xmm2,xmm10 > > +DB 69,15,56,203,236 > > +DB 69,15,56,204,193 > > + movdqa xmm0,xmm2 > > + movdqa xmm3,xmm7 > > +DB 69,15,56,203,254 > > + pshufd xmm0,xmm1,0x0e > > +DB 102,15,58,15,222,4 > > + paddd xmm4,xmm3 > > + movdqa xmm3,xmm11 > > +DB 102,65,15,58,15,218,4 > > +DB 15,56,204,238 > > +DB 69,15,56,203,229 > > + pshufd xmm0,xmm2,0x0e > > + movdqa xmm1,XMMWORD[((48-128))+rbp] > > + paddd xmm1,xmm7 > > +DB 69,15,56,203,247 > > +DB 69,15,56,204,202 > > + > > + movdqa xmm0,xmm1 > > + movdqa xmm2,XMMWORD[((48-128))+rbp] > > + paddd xmm8,xmm3 > > + paddd xmm2,xmm11 > > +DB 15,56,205,231 > > +DB 69,15,56,203,236 > > + movdqa xmm0,xmm2 > > + movdqa xmm3,xmm4 > > +DB 102,15,58,15,223,4 > > +DB 69,15,56,203,254 > > +DB 69,15,56,205,195 > > + pshufd xmm0,xmm1,0x0e > > + paddd xmm5,xmm3 > > + movdqa xmm3,xmm8 > > +DB 102,65,15,58,15,219,4 > > +DB 15,56,204,247 > > +DB 69,15,56,203,229 > > + pshufd xmm0,xmm2,0x0e > > + movdqa xmm1,XMMWORD[((64-128))+rbp] > > + paddd xmm1,xmm4 > > +DB 69,15,56,203,247 > > +DB 69,15,56,204,211 > > + movdqa xmm0,xmm1 > > + movdqa xmm2,XMMWORD[((64-128))+rbp] > > + paddd xmm9,xmm3 > > + paddd xmm2,xmm8 > > +DB 15,56,205,236 > > +DB 69,15,56,203,236 > > + movdqa xmm0,xmm2 > > + movdqa xmm3,xmm5 > > +DB 102,15,58,15,220,4 > > +DB 69,15,56,203,254 > > +DB 
69,15,56,205,200 > > + pshufd xmm0,xmm1,0x0e > > + paddd xmm6,xmm3 > > + movdqa xmm3,xmm9 > > +DB 102,65,15,58,15,216,4 > > +DB 15,56,204,252 > > +DB 69,15,56,203,229 > > + pshufd xmm0,xmm2,0x0e > > + movdqa xmm1,XMMWORD[((80-128))+rbp] > > + paddd xmm1,xmm5 > > +DB 69,15,56,203,247 > > +DB 69,15,56,204,216 > > + movdqa xmm0,xmm1 > > + movdqa xmm2,XMMWORD[((80-128))+rbp] > > + paddd xmm10,xmm3 > > + paddd xmm2,xmm9 > > +DB 15,56,205,245 > > +DB 69,15,56,203,236 > > + movdqa xmm0,xmm2 > > + movdqa xmm3,xmm6 > > +DB 102,15,58,15,221,4 > > +DB 69,15,56,203,254 > > +DB 69,15,56,205,209 > > + pshufd xmm0,xmm1,0x0e > > + paddd xmm7,xmm3 > > + movdqa xmm3,xmm10 > > +DB 102,65,15,58,15,217,4 > > +DB 15,56,204,229 > > +DB 69,15,56,203,229 > > + pshufd xmm0,xmm2,0x0e > > + movdqa xmm1,XMMWORD[((96-128))+rbp] > > + paddd xmm1,xmm6 > > +DB 69,15,56,203,247 > > +DB 69,15,56,204,193 > > + movdqa xmm0,xmm1 > > + movdqa xmm2,XMMWORD[((96-128))+rbp] > > + paddd xmm11,xmm3 > > + paddd xmm2,xmm10 > > +DB 15,56,205,254 > > +DB 69,15,56,203,236 > > + movdqa xmm0,xmm2 > > + movdqa xmm3,xmm7 > > +DB 102,15,58,15,222,4 > > +DB 69,15,56,203,254 > > +DB 69,15,56,205,218 > > + pshufd xmm0,xmm1,0x0e > > + paddd xmm4,xmm3 > > + movdqa xmm3,xmm11 > > +DB 102,65,15,58,15,218,4 > > +DB 15,56,204,238 > > +DB 69,15,56,203,229 > > + pshufd xmm0,xmm2,0x0e > > + movdqa xmm1,XMMWORD[((112-128))+rbp] > > + paddd xmm1,xmm7 > > +DB 69,15,56,203,247 > > +DB 69,15,56,204,202 > > + movdqa xmm0,xmm1 > > + movdqa xmm2,XMMWORD[((112-128))+rbp] > > + paddd xmm8,xmm3 > > + paddd xmm2,xmm11 > > +DB 15,56,205,231 > > +DB 69,15,56,203,236 > > + movdqa xmm0,xmm2 > > + movdqa xmm3,xmm4 > > +DB 102,15,58,15,223,4 > > +DB 69,15,56,203,254 > > +DB 69,15,56,205,195 > > + pshufd xmm0,xmm1,0x0e > > + paddd xmm5,xmm3 > > + movdqa xmm3,xmm8 > > +DB 102,65,15,58,15,219,4 > > +DB 15,56,204,247 > > +DB 69,15,56,203,229 > > + pshufd xmm0,xmm2,0x0e > > + movdqa xmm1,XMMWORD[((128-128))+rbp] > > + paddd xmm1,xmm4 > > +DB 69,15,56,203,247 > > +DB 69,15,56,204,211 > > + movdqa xmm0,xmm1 > > + movdqa xmm2,XMMWORD[((128-128))+rbp] > > + paddd xmm9,xmm3 > > + paddd xmm2,xmm8 > > +DB 15,56,205,236 > > +DB 69,15,56,203,236 > > + movdqa xmm0,xmm2 > > + movdqa xmm3,xmm5 > > +DB 102,15,58,15,220,4 > > +DB 69,15,56,203,254 > > +DB 69,15,56,205,200 > > + pshufd xmm0,xmm1,0x0e > > + paddd xmm6,xmm3 > > + movdqa xmm3,xmm9 > > +DB 102,65,15,58,15,216,4 > > +DB 15,56,204,252 > > +DB 69,15,56,203,229 > > + pshufd xmm0,xmm2,0x0e > > + movdqa xmm1,XMMWORD[((144-128))+rbp] > > + paddd xmm1,xmm5 > > +DB 69,15,56,203,247 > > +DB 69,15,56,204,216 > > + movdqa xmm0,xmm1 > > + movdqa xmm2,XMMWORD[((144-128))+rbp] > > + paddd xmm10,xmm3 > > + paddd xmm2,xmm9 > > +DB 15,56,205,245 > > +DB 69,15,56,203,236 > > + movdqa xmm0,xmm2 > > + movdqa xmm3,xmm6 > > +DB 102,15,58,15,221,4 > > +DB 69,15,56,203,254 > > +DB 69,15,56,205,209 > > + pshufd xmm0,xmm1,0x0e > > + paddd xmm7,xmm3 > > + movdqa xmm3,xmm10 > > +DB 102,65,15,58,15,217,4 > > +DB 15,56,204,229 > > +DB 69,15,56,203,229 > > + pshufd xmm0,xmm2,0x0e > > + movdqa xmm1,XMMWORD[((160-128))+rbp] > > + paddd xmm1,xmm6 > > +DB 69,15,56,203,247 > > +DB 69,15,56,204,193 > > + movdqa xmm0,xmm1 > > + movdqa xmm2,XMMWORD[((160-128))+rbp] > > + paddd xmm11,xmm3 > > + paddd xmm2,xmm10 > > +DB 15,56,205,254 > > +DB 69,15,56,203,236 > > + movdqa xmm0,xmm2 > > + movdqa xmm3,xmm7 > > +DB 102,15,58,15,222,4 > > +DB 69,15,56,203,254 > > +DB 69,15,56,205,218 > > + pshufd xmm0,xmm1,0x0e > > + paddd xmm4,xmm3 > > + movdqa xmm3,xmm11 > > +DB 
102,65,15,58,15,218,4 > > +DB 15,56,204,238 > > +DB 69,15,56,203,229 > > + pshufd xmm0,xmm2,0x0e > > + movdqa xmm1,XMMWORD[((176-128))+rbp] > > + paddd xmm1,xmm7 > > +DB 69,15,56,203,247 > > +DB 69,15,56,204,202 > > + movdqa xmm0,xmm1 > > + movdqa xmm2,XMMWORD[((176-128))+rbp] > > + paddd xmm8,xmm3 > > + paddd xmm2,xmm11 > > +DB 15,56,205,231 > > +DB 69,15,56,203,236 > > + movdqa xmm0,xmm2 > > + movdqa xmm3,xmm4 > > +DB 102,15,58,15,223,4 > > +DB 69,15,56,203,254 > > +DB 69,15,56,205,195 > > + pshufd xmm0,xmm1,0x0e > > + paddd xmm5,xmm3 > > + movdqa xmm3,xmm8 > > +DB 102,65,15,58,15,219,4 > > +DB 15,56,204,247 > > +DB 69,15,56,203,229 > > + pshufd xmm0,xmm2,0x0e > > + movdqa xmm1,XMMWORD[((192-128))+rbp] > > + paddd xmm1,xmm4 > > +DB 69,15,56,203,247 > > +DB 69,15,56,204,211 > > + movdqa xmm0,xmm1 > > + movdqa xmm2,XMMWORD[((192-128))+rbp] > > + paddd xmm9,xmm3 > > + paddd xmm2,xmm8 > > +DB 15,56,205,236 > > +DB 69,15,56,203,236 > > + movdqa xmm0,xmm2 > > + movdqa xmm3,xmm5 > > +DB 102,15,58,15,220,4 > > +DB 69,15,56,203,254 > > +DB 69,15,56,205,200 > > + pshufd xmm0,xmm1,0x0e > > + paddd xmm6,xmm3 > > + movdqa xmm3,xmm9 > > +DB 102,65,15,58,15,216,4 > > +DB 15,56,204,252 > > +DB 69,15,56,203,229 > > + pshufd xmm0,xmm2,0x0e > > + movdqa xmm1,XMMWORD[((208-128))+rbp] > > + paddd xmm1,xmm5 > > +DB 69,15,56,203,247 > > +DB 69,15,56,204,216 > > + movdqa xmm0,xmm1 > > + movdqa xmm2,XMMWORD[((208-128))+rbp] > > + paddd xmm10,xmm3 > > + paddd xmm2,xmm9 > > +DB 15,56,205,245 > > +DB 69,15,56,203,236 > > + movdqa xmm0,xmm2 > > + movdqa xmm3,xmm6 > > +DB 102,15,58,15,221,4 > > +DB 69,15,56,203,254 > > +DB 69,15,56,205,209 > > + pshufd xmm0,xmm1,0x0e > > + paddd xmm7,xmm3 > > + movdqa xmm3,xmm10 > > +DB 102,65,15,58,15,217,4 > > + nop > > +DB 69,15,56,203,229 > > + pshufd xmm0,xmm2,0x0e > > + movdqa xmm1,XMMWORD[((224-128))+rbp] > > + paddd xmm1,xmm6 > > +DB 69,15,56,203,247 > > + > > + movdqa xmm0,xmm1 > > + movdqa xmm2,XMMWORD[((224-128))+rbp] > > + paddd xmm11,xmm3 > > + paddd xmm2,xmm10 > > +DB 15,56,205,254 > > + nop > > +DB 69,15,56,203,236 > > + movdqa xmm0,xmm2 > > + mov ecx,1 > > + pxor xmm6,xmm6 > > +DB 69,15,56,203,254 > > +DB 69,15,56,205,218 > > + pshufd xmm0,xmm1,0x0e > > + movdqa xmm1,XMMWORD[((240-128))+rbp] > > + paddd xmm1,xmm7 > > + movq xmm7,QWORD[rbx] > > + nop > > +DB 69,15,56,203,229 > > + pshufd xmm0,xmm2,0x0e > > + movdqa xmm2,XMMWORD[((240-128))+rbp] > > + paddd xmm2,xmm11 > > +DB 69,15,56,203,247 > > + > > + movdqa xmm0,xmm1 > > + cmp ecx,DWORD[rbx] > > + cmovge r8,rsp > > + cmp ecx,DWORD[4+rbx] > > + cmovge r9,rsp > > + pshufd xmm9,xmm7,0x00 > > +DB 69,15,56,203,236 > > + movdqa xmm0,xmm2 > > + pshufd xmm10,xmm7,0x55 > > + movdqa xmm11,xmm7 > > +DB 69,15,56,203,254 > > + pshufd xmm0,xmm1,0x0e > > + pcmpgtd xmm9,xmm6 > > + pcmpgtd xmm10,xmm6 > > +DB 69,15,56,203,229 > > + pshufd xmm0,xmm2,0x0e > > + pcmpgtd xmm11,xmm6 > > + movdqa xmm3,XMMWORD[((K256_shaext-16))] > > +DB 69,15,56,203,247 > > + > > + pand xmm13,xmm9 > > + pand xmm15,xmm10 > > + pand xmm12,xmm9 > > + pand xmm14,xmm10 > > + paddd xmm11,xmm7 > > + > > + paddd xmm13,XMMWORD[80+rsp] > > + paddd xmm15,XMMWORD[112+rsp] > > + paddd xmm12,XMMWORD[64+rsp] > > + paddd xmm14,XMMWORD[96+rsp] > > + > > + movq QWORD[rbx],xmm11 > > + dec edx > > + jnz NEAR $L$oop_shaext > > + > > + mov edx,DWORD[280+rsp] > > + > > + pshufd xmm12,xmm12,27 > > + pshufd xmm13,xmm13,27 > > + pshufd xmm14,xmm14,27 > > + pshufd xmm15,xmm15,27 > > + > > + movdqa xmm5,xmm12 > > + movdqa xmm6,xmm13 > > + punpckldq xmm12,xmm14 > > + punpckhdq 
xmm5,xmm14 > > + punpckldq xmm13,xmm15 > > + punpckhdq xmm6,xmm15 > > + > > + movq QWORD[(0-128)+rdi],xmm12 > > + psrldq xmm12,8 > > + movq QWORD[(128-128)+rdi],xmm5 > > + psrldq xmm5,8 > > + movq QWORD[(32-128)+rdi],xmm12 > > + movq QWORD[(160-128)+rdi],xmm5 > > + > > + movq QWORD[(64-128)+rdi],xmm13 > > + psrldq xmm13,8 > > + movq QWORD[(192-128)+rdi],xmm6 > > + psrldq xmm6,8 > > + movq QWORD[(96-128)+rdi],xmm13 > > + movq QWORD[(224-128)+rdi],xmm6 > > + > > + lea rdi,[8+rdi] > > + lea rsi,[32+rsi] > > + dec edx > > + jnz NEAR $L$oop_grande_shaext > > + > > +$L$done_shaext: > > + > > + movaps xmm6,XMMWORD[((-184))+rax] > > + movaps xmm7,XMMWORD[((-168))+rax] > > + movaps xmm8,XMMWORD[((-152))+rax] > > + movaps xmm9,XMMWORD[((-136))+rax] > > + movaps xmm10,XMMWORD[((-120))+rax] > > + movaps xmm11,XMMWORD[((-104))+rax] > > + movaps xmm12,XMMWORD[((-88))+rax] > > + movaps xmm13,XMMWORD[((-72))+rax] > > + movaps xmm14,XMMWORD[((-56))+rax] > > + movaps xmm15,XMMWORD[((-40))+rax] > > + mov rbp,QWORD[((-16))+rax] > > + > > + mov rbx,QWORD[((-8))+rax] > > + > > + lea rsp,[rax] > > + > > +$L$epilogue_shaext: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_sha256_multi_block_shaext: > > +ALIGN 256 > > +K256: > > + DD 1116352408,1116352408,1116352408,1116352408 > > + DD 1116352408,1116352408,1116352408,1116352408 > > + DD 1899447441,1899447441,1899447441,1899447441 > > + DD 1899447441,1899447441,1899447441,1899447441 > > + DD 3049323471,3049323471,3049323471,3049323471 > > + DD 3049323471,3049323471,3049323471,3049323471 > > + DD 3921009573,3921009573,3921009573,3921009573 > > + DD 3921009573,3921009573,3921009573,3921009573 > > + DD 961987163,961987163,961987163,961987163 > > + DD 961987163,961987163,961987163,961987163 > > + DD 1508970993,1508970993,1508970993,1508970993 > > + DD 1508970993,1508970993,1508970993,1508970993 > > + DD 2453635748,2453635748,2453635748,2453635748 > > + DD 2453635748,2453635748,2453635748,2453635748 > > + DD 2870763221,2870763221,2870763221,2870763221 > > + DD 2870763221,2870763221,2870763221,2870763221 > > + DD 3624381080,3624381080,3624381080,3624381080 > > + DD 3624381080,3624381080,3624381080,3624381080 > > + DD 310598401,310598401,310598401,310598401 > > + DD 310598401,310598401,310598401,310598401 > > + DD 607225278,607225278,607225278,607225278 > > + DD 607225278,607225278,607225278,607225278 > > + DD 1426881987,1426881987,1426881987,1426881987 > > + DD 1426881987,1426881987,1426881987,1426881987 > > + DD 1925078388,1925078388,1925078388,1925078388 > > + DD 1925078388,1925078388,1925078388,1925078388 > > + DD 2162078206,2162078206,2162078206,2162078206 > > + DD 2162078206,2162078206,2162078206,2162078206 > > + DD 2614888103,2614888103,2614888103,2614888103 > > + DD 2614888103,2614888103,2614888103,2614888103 > > + DD 3248222580,3248222580,3248222580,3248222580 > > + DD 3248222580,3248222580,3248222580,3248222580 > > + DD 3835390401,3835390401,3835390401,3835390401 > > + DD 3835390401,3835390401,3835390401,3835390401 > > + DD 4022224774,4022224774,4022224774,4022224774 > > + DD 4022224774,4022224774,4022224774,4022224774 > > + DD 264347078,264347078,264347078,264347078 > > + DD 264347078,264347078,264347078,264347078 > > + DD 604807628,604807628,604807628,604807628 > > + DD 604807628,604807628,604807628,604807628 > > + DD 770255983,770255983,770255983,770255983 > > + DD 770255983,770255983,770255983,770255983 > > + DD 1249150122,1249150122,1249150122,1249150122 > > + DD 
1249150122,1249150122,1249150122,1249150122 > > + DD 1555081692,1555081692,1555081692,1555081692 > > + DD 1555081692,1555081692,1555081692,1555081692 > > + DD 1996064986,1996064986,1996064986,1996064986 > > + DD 1996064986,1996064986,1996064986,1996064986 > > + DD 2554220882,2554220882,2554220882,2554220882 > > + DD 2554220882,2554220882,2554220882,2554220882 > > + DD 2821834349,2821834349,2821834349,2821834349 > > + DD 2821834349,2821834349,2821834349,2821834349 > > + DD 2952996808,2952996808,2952996808,2952996808 > > + DD 2952996808,2952996808,2952996808,2952996808 > > + DD 3210313671,3210313671,3210313671,3210313671 > > + DD 3210313671,3210313671,3210313671,3210313671 > > + DD 3336571891,3336571891,3336571891,3336571891 > > + DD 3336571891,3336571891,3336571891,3336571891 > > + DD 3584528711,3584528711,3584528711,3584528711 > > + DD 3584528711,3584528711,3584528711,3584528711 > > + DD 113926993,113926993,113926993,113926993 > > + DD 113926993,113926993,113926993,113926993 > > + DD 338241895,338241895,338241895,338241895 > > + DD 338241895,338241895,338241895,338241895 > > + DD 666307205,666307205,666307205,666307205 > > + DD 666307205,666307205,666307205,666307205 > > + DD 773529912,773529912,773529912,773529912 > > + DD 773529912,773529912,773529912,773529912 > > + DD 1294757372,1294757372,1294757372,1294757372 > > + DD 1294757372,1294757372,1294757372,1294757372 > > + DD 1396182291,1396182291,1396182291,1396182291 > > + DD 1396182291,1396182291,1396182291,1396182291 > > + DD 1695183700,1695183700,1695183700,1695183700 > > + DD 1695183700,1695183700,1695183700,1695183700 > > + DD 1986661051,1986661051,1986661051,1986661051 > > + DD 1986661051,1986661051,1986661051,1986661051 > > + DD 2177026350,2177026350,2177026350,2177026350 > > + DD 2177026350,2177026350,2177026350,2177026350 > > + DD 2456956037,2456956037,2456956037,2456956037 > > + DD 2456956037,2456956037,2456956037,2456956037 > > + DD 2730485921,2730485921,2730485921,2730485921 > > + DD 2730485921,2730485921,2730485921,2730485921 > > + DD 2820302411,2820302411,2820302411,2820302411 > > + DD 2820302411,2820302411,2820302411,2820302411 > > + DD 3259730800,3259730800,3259730800,3259730800 > > + DD 3259730800,3259730800,3259730800,3259730800 > > + DD 3345764771,3345764771,3345764771,3345764771 > > + DD 3345764771,3345764771,3345764771,3345764771 > > + DD 3516065817,3516065817,3516065817,3516065817 > > + DD 3516065817,3516065817,3516065817,3516065817 > > + DD 3600352804,3600352804,3600352804,3600352804 > > + DD 3600352804,3600352804,3600352804,3600352804 > > + DD 4094571909,4094571909,4094571909,4094571909 > > + DD 4094571909,4094571909,4094571909,4094571909 > > + DD 275423344,275423344,275423344,275423344 > > + DD 275423344,275423344,275423344,275423344 > > + DD 430227734,430227734,430227734,430227734 > > + DD 430227734,430227734,430227734,430227734 > > + DD 506948616,506948616,506948616,506948616 > > + DD 506948616,506948616,506948616,506948616 > > + DD 659060556,659060556,659060556,659060556 > > + DD 659060556,659060556,659060556,659060556 > > + DD 883997877,883997877,883997877,883997877 > > + DD 883997877,883997877,883997877,883997877 > > + DD 958139571,958139571,958139571,958139571 > > + DD 958139571,958139571,958139571,958139571 > > + DD 1322822218,1322822218,1322822218,1322822218 > > + DD 1322822218,1322822218,1322822218,1322822218 > > + DD 1537002063,1537002063,1537002063,1537002063 > > + DD 1537002063,1537002063,1537002063,1537002063 > > + DD 1747873779,1747873779,1747873779,1747873779 > > + DD 
1747873779,1747873779,1747873779,1747873779 > > + DD 1955562222,1955562222,1955562222,1955562222 > > + DD 1955562222,1955562222,1955562222,1955562222 > > + DD 2024104815,2024104815,2024104815,2024104815 > > + DD 2024104815,2024104815,2024104815,2024104815 > > + DD 2227730452,2227730452,2227730452,2227730452 > > + DD 2227730452,2227730452,2227730452,2227730452 > > + DD 2361852424,2361852424,2361852424,2361852424 > > + DD 2361852424,2361852424,2361852424,2361852424 > > + DD 2428436474,2428436474,2428436474,2428436474 > > + DD 2428436474,2428436474,2428436474,2428436474 > > + DD 2756734187,2756734187,2756734187,2756734187 > > + DD 2756734187,2756734187,2756734187,2756734187 > > + DD 3204031479,3204031479,3204031479,3204031479 > > + DD 3204031479,3204031479,3204031479,3204031479 > > + DD 3329325298,3329325298,3329325298,3329325298 > > + DD 3329325298,3329325298,3329325298,3329325298 > > +$L$pbswap: > > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > +K256_shaext: > > + DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 > > + DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 > > + DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 > > + DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 > > + DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc > > + DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da > > + DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 > > + DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 > > + DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 > > + DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 > > + DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 > > + DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 > > + DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 > > + DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 > > + DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 > > + DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 > > +DB 83,72,65,50,53,54,32,109,117,108,116,105,45,98,108,111 > > +DB 99,107,32,116,114,97,110,115,102,111,114,109,32,102,111,114 > > +DB 32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71 > > +DB 65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112 > > +DB 101,110,115,115,108,46,111,114,103,62,0 > > +EXTERN __imp_RtlVirtualUnwind > > + > > +ALIGN 16 > > +se_handler: > > + push rsi > > + push rdi > > + push rbx > > + push rbp > > + push r12 > > + push r13 > > + push r14 > > + push r15 > > + pushfq > > + sub rsp,64 > > + > > + mov rax,QWORD[120+r8] > > + mov rbx,QWORD[248+r8] > > + > > + mov rsi,QWORD[8+r9] > > + mov r11,QWORD[56+r9] > > + > > + mov r10d,DWORD[r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jb NEAR $L$in_prologue > > + > > + mov rax,QWORD[152+r8] > > + > > + mov r10d,DWORD[4+r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jae NEAR $L$in_prologue > > + > > + mov rax,QWORD[272+rax] > > + > > + mov rbx,QWORD[((-8))+rax] > > + mov rbp,QWORD[((-16))+rax] > > + mov QWORD[144+r8],rbx > > + mov QWORD[160+r8],rbp > > + > > + lea rsi,[((-24-160))+rax] > > + lea rdi,[512+r8] > > + mov ecx,20 > > + DD 0xa548f3fc > > + > > +$L$in_prologue: > > + mov rdi,QWORD[8+rax] > > + mov rsi,QWORD[16+rax] > > + mov QWORD[152+r8],rax > > + mov QWORD[168+r8],rsi > > + mov QWORD[176+r8],rdi > > + > > + mov rdi,QWORD[40+r9] > > + mov rsi,r8 > > + mov ecx,154 > > + DD 0xa548f3fc > > + > > + mov rsi,r9 > > + xor rcx,rcx > > + mov rdx,QWORD[8+rsi] > > + mov r8,QWORD[rsi] > > + mov r9,QWORD[16+rsi] > > + mov r10,QWORD[40+rsi] > > + lea r11,[56+rsi] > > + lea r12,[24+rsi] > > + mov QWORD[32+rsp],r10 > > + mov QWORD[40+rsp],r11 > > + mov 
QWORD[48+rsp],r12 > > + mov QWORD[56+rsp],rcx > > + call QWORD[__imp_RtlVirtualUnwind] > > + > > + mov eax,1 > > + add rsp,64 > > + popfq > > + pop r15 > > + pop r14 > > + pop r13 > > + pop r12 > > + pop rbp > > + pop rbx > > + pop rdi > > + pop rsi > > + DB 0F3h,0C3h ;repret > > + > > +section .pdata rdata align=4 > > +ALIGN 4 > > + DD $L$SEH_begin_sha256_multi_block wrt ..imagebase > > + DD $L$SEH_end_sha256_multi_block wrt ..imagebase > > + DD $L$SEH_info_sha256_multi_block wrt ..imagebase > > + DD $L$SEH_begin_sha256_multi_block_shaext > wrt ..imagebase > > + DD $L$SEH_end_sha256_multi_block_shaext > wrt ..imagebase > > + DD $L$SEH_info_sha256_multi_block_shaext > wrt ..imagebase > > +section .xdata rdata align=8 > > +ALIGN 8 > > +$L$SEH_info_sha256_multi_block: > > +DB 9,0,0,0 > > + DD se_handler wrt ..imagebase > > + DD $L$body wrt ..imagebase,$L$epilogue > wrt ..imagebase > > +$L$SEH_info_sha256_multi_block_shaext: > > +DB 9,0,0,0 > > + DD se_handler wrt ..imagebase > > + DD $L$body_shaext wrt ..imagebase,$L$epilogue_shaext > > wrt ..imagebase > > diff --git > a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm > > b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm > > new file mode 100644 > > index 0000000000..70e49862a3 > > --- /dev/null > > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm > > @@ -0,0 +1,3313 @@ > > +; WARNING: do not edit! > > +; Generated from openssl/crypto/sha/asm/sha512-x86_64.pl > > +; > > +; Copyright 2005-2020 The OpenSSL Project Authors. All Rights Reserved. > > +; > > +; Licensed under the OpenSSL license (the "License"). You may not use > > +; this file except in compliance with the License. You can obtain a copy > > +; in the file LICENSE in the source distribution or at > > +; https://www.openssl.org/source/license.html > > + > > +default rel > > +%define XMMWORD > > +%define YMMWORD > > +%define ZMMWORD > > +section .text code align=64 > > + > > + > > +EXTERN OPENSSL_ia32cap_P > > +global sha256_block_data_order > > + > > +ALIGN 16 > > +sha256_block_data_order: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_sha256_block_data_order: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + > > + > > + > > + lea r11,[OPENSSL_ia32cap_P] > > + mov r9d,DWORD[r11] > > + mov r10d,DWORD[4+r11] > > + mov r11d,DWORD[8+r11] > > + test r11d,536870912 > > + jnz NEAR _shaext_shortcut > > + test r10d,512 > > + jnz NEAR $L$ssse3_shortcut > > + mov rax,rsp > > + > > + push rbx > > + > > + push rbp > > + > > + push r12 > > + > > + push r13 > > + > > + push r14 > > + > > + push r15 > > + > > + shl rdx,4 > > + sub rsp,16*4+4*8 > > + lea rdx,[rdx*4+rsi] > > + and rsp,-64 > > + mov QWORD[((64+0))+rsp],rdi > > + mov QWORD[((64+8))+rsp],rsi > > + mov QWORD[((64+16))+rsp],rdx > > + mov QWORD[88+rsp],rax > > + > > +$L$prologue: > > + > > + mov eax,DWORD[rdi] > > + mov ebx,DWORD[4+rdi] > > + mov ecx,DWORD[8+rdi] > > + mov edx,DWORD[12+rdi] > > + mov r8d,DWORD[16+rdi] > > + mov r9d,DWORD[20+rdi] > > + mov r10d,DWORD[24+rdi] > > + mov r11d,DWORD[28+rdi] > > + jmp NEAR $L$loop > > + > > +ALIGN 16 > > +$L$loop: > > + mov edi,ebx > > + lea rbp,[K256] > > + xor edi,ecx > > + mov r12d,DWORD[rsi] > > + mov r13d,r8d > > + mov r14d,eax > > + bswap r12d > > + ror r13d,14 > > + mov r15d,r9d > > + > > + xor r13d,r8d > > + ror r14d,9 > > + xor r15d,r10d > > + > > + mov DWORD[rsp],r12d > > + xor r14d,eax > > + and r15d,r8d > > + > > + ror r13d,5 > > + add r12d,r11d > > + xor r15d,r10d 
> > + > > + ror r14d,11 > > + xor r13d,r8d > > + add r12d,r15d > > + > > + mov r15d,eax > > + add r12d,DWORD[rbp] > > + xor r14d,eax > > + > > + xor r15d,ebx > > + ror r13d,6 > > + mov r11d,ebx > > + > > + and edi,r15d > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor r11d,edi > > + add edx,r12d > > + add r11d,r12d > > + > > + lea rbp,[4+rbp] > > + add r11d,r14d > > + mov r12d,DWORD[4+rsi] > > + mov r13d,edx > > + mov r14d,r11d > > + bswap r12d > > + ror r13d,14 > > + mov edi,r8d > > + > > + xor r13d,edx > > + ror r14d,9 > > + xor edi,r9d > > + > > + mov DWORD[4+rsp],r12d > > + xor r14d,r11d > > + and edi,edx > > + > > + ror r13d,5 > > + add r12d,r10d > > + xor edi,r9d > > + > > + ror r14d,11 > > + xor r13d,edx > > + add r12d,edi > > + > > + mov edi,r11d > > + add r12d,DWORD[rbp] > > + xor r14d,r11d > > + > > + xor edi,eax > > + ror r13d,6 > > + mov r10d,eax > > + > > + and r15d,edi > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor r10d,r15d > > + add ecx,r12d > > + add r10d,r12d > > + > > + lea rbp,[4+rbp] > > + add r10d,r14d > > + mov r12d,DWORD[8+rsi] > > + mov r13d,ecx > > + mov r14d,r10d > > + bswap r12d > > + ror r13d,14 > > + mov r15d,edx > > + > > + xor r13d,ecx > > + ror r14d,9 > > + xor r15d,r8d > > + > > + mov DWORD[8+rsp],r12d > > + xor r14d,r10d > > + and r15d,ecx > > + > > + ror r13d,5 > > + add r12d,r9d > > + xor r15d,r8d > > + > > + ror r14d,11 > > + xor r13d,ecx > > + add r12d,r15d > > + > > + mov r15d,r10d > > + add r12d,DWORD[rbp] > > + xor r14d,r10d > > + > > + xor r15d,r11d > > + ror r13d,6 > > + mov r9d,r11d > > + > > + and edi,r15d > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor r9d,edi > > + add ebx,r12d > > + add r9d,r12d > > + > > + lea rbp,[4+rbp] > > + add r9d,r14d > > + mov r12d,DWORD[12+rsi] > > + mov r13d,ebx > > + mov r14d,r9d > > + bswap r12d > > + ror r13d,14 > > + mov edi,ecx > > + > > + xor r13d,ebx > > + ror r14d,9 > > + xor edi,edx > > + > > + mov DWORD[12+rsp],r12d > > + xor r14d,r9d > > + and edi,ebx > > + > > + ror r13d,5 > > + add r12d,r8d > > + xor edi,edx > > + > > + ror r14d,11 > > + xor r13d,ebx > > + add r12d,edi > > + > > + mov edi,r9d > > + add r12d,DWORD[rbp] > > + xor r14d,r9d > > + > > + xor edi,r10d > > + ror r13d,6 > > + mov r8d,r10d > > + > > + and r15d,edi > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor r8d,r15d > > + add eax,r12d > > + add r8d,r12d > > + > > + lea rbp,[20+rbp] > > + add r8d,r14d > > + mov r12d,DWORD[16+rsi] > > + mov r13d,eax > > + mov r14d,r8d > > + bswap r12d > > + ror r13d,14 > > + mov r15d,ebx > > + > > + xor r13d,eax > > + ror r14d,9 > > + xor r15d,ecx > > + > > + mov DWORD[16+rsp],r12d > > + xor r14d,r8d > > + and r15d,eax > > + > > + ror r13d,5 > > + add r12d,edx > > + xor r15d,ecx > > + > > + ror r14d,11 > > + xor r13d,eax > > + add r12d,r15d > > + > > + mov r15d,r8d > > + add r12d,DWORD[rbp] > > + xor r14d,r8d > > + > > + xor r15d,r9d > > + ror r13d,6 > > + mov edx,r9d > > + > > + and edi,r15d > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor edx,edi > > + add r11d,r12d > > + add edx,r12d > > + > > + lea rbp,[4+rbp] > > + add edx,r14d > > + mov r12d,DWORD[20+rsi] > > + mov r13d,r11d > > + mov r14d,edx > > + bswap r12d > > + ror r13d,14 > > + mov edi,eax > > + > > + xor r13d,r11d > > + ror r14d,9 > > + xor edi,ebx > > + > > + mov DWORD[20+rsp],r12d > > + xor r14d,edx > > + and edi,r11d > > + > > + ror r13d,5 > > + add r12d,ecx > > + xor edi,ebx > > + > > + ror r14d,11 > > + xor r13d,r11d > > + add r12d,edi > > + > > + mov edi,edx > > + add r12d,DWORD[rbp] > > + xor r14d,edx > > + > > + xor 
edi,r8d > > + ror r13d,6 > > + mov ecx,r8d > > + > > + and r15d,edi > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor ecx,r15d > > + add r10d,r12d > > + add ecx,r12d > > + > > + lea rbp,[4+rbp] > > + add ecx,r14d > > + mov r12d,DWORD[24+rsi] > > + mov r13d,r10d > > + mov r14d,ecx > > + bswap r12d > > + ror r13d,14 > > + mov r15d,r11d > > + > > + xor r13d,r10d > > + ror r14d,9 > > + xor r15d,eax > > + > > + mov DWORD[24+rsp],r12d > > + xor r14d,ecx > > + and r15d,r10d > > + > > + ror r13d,5 > > + add r12d,ebx > > + xor r15d,eax > > + > > + ror r14d,11 > > + xor r13d,r10d > > + add r12d,r15d > > + > > + mov r15d,ecx > > + add r12d,DWORD[rbp] > > + xor r14d,ecx > > + > > + xor r15d,edx > > + ror r13d,6 > > + mov ebx,edx > > + > > + and edi,r15d > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor ebx,edi > > + add r9d,r12d > > + add ebx,r12d > > + > > + lea rbp,[4+rbp] > > + add ebx,r14d > > + mov r12d,DWORD[28+rsi] > > + mov r13d,r9d > > + mov r14d,ebx > > + bswap r12d > > + ror r13d,14 > > + mov edi,r10d > > + > > + xor r13d,r9d > > + ror r14d,9 > > + xor edi,r11d > > + > > + mov DWORD[28+rsp],r12d > > + xor r14d,ebx > > + and edi,r9d > > + > > + ror r13d,5 > > + add r12d,eax > > + xor edi,r11d > > + > > + ror r14d,11 > > + xor r13d,r9d > > + add r12d,edi > > + > > + mov edi,ebx > > + add r12d,DWORD[rbp] > > + xor r14d,ebx > > + > > + xor edi,ecx > > + ror r13d,6 > > + mov eax,ecx > > + > > + and r15d,edi > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor eax,r15d > > + add r8d,r12d > > + add eax,r12d > > + > > + lea rbp,[20+rbp] > > + add eax,r14d > > + mov r12d,DWORD[32+rsi] > > + mov r13d,r8d > > + mov r14d,eax > > + bswap r12d > > + ror r13d,14 > > + mov r15d,r9d > > + > > + xor r13d,r8d > > + ror r14d,9 > > + xor r15d,r10d > > + > > + mov DWORD[32+rsp],r12d > > + xor r14d,eax > > + and r15d,r8d > > + > > + ror r13d,5 > > + add r12d,r11d > > + xor r15d,r10d > > + > > + ror r14d,11 > > + xor r13d,r8d > > + add r12d,r15d > > + > > + mov r15d,eax > > + add r12d,DWORD[rbp] > > + xor r14d,eax > > + > > + xor r15d,ebx > > + ror r13d,6 > > + mov r11d,ebx > > + > > + and edi,r15d > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor r11d,edi > > + add edx,r12d > > + add r11d,r12d > > + > > + lea rbp,[4+rbp] > > + add r11d,r14d > > + mov r12d,DWORD[36+rsi] > > + mov r13d,edx > > + mov r14d,r11d > > + bswap r12d > > + ror r13d,14 > > + mov edi,r8d > > + > > + xor r13d,edx > > + ror r14d,9 > > + xor edi,r9d > > + > > + mov DWORD[36+rsp],r12d > > + xor r14d,r11d > > + and edi,edx > > + > > + ror r13d,5 > > + add r12d,r10d > > + xor edi,r9d > > + > > + ror r14d,11 > > + xor r13d,edx > > + add r12d,edi > > + > > + mov edi,r11d > > + add r12d,DWORD[rbp] > > + xor r14d,r11d > > + > > + xor edi,eax > > + ror r13d,6 > > + mov r10d,eax > > + > > + and r15d,edi > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor r10d,r15d > > + add ecx,r12d > > + add r10d,r12d > > + > > + lea rbp,[4+rbp] > > + add r10d,r14d > > + mov r12d,DWORD[40+rsi] > > + mov r13d,ecx > > + mov r14d,r10d > > + bswap r12d > > + ror r13d,14 > > + mov r15d,edx > > + > > + xor r13d,ecx > > + ror r14d,9 > > + xor r15d,r8d > > + > > + mov DWORD[40+rsp],r12d > > + xor r14d,r10d > > + and r15d,ecx > > + > > + ror r13d,5 > > + add r12d,r9d > > + xor r15d,r8d > > + > > + ror r14d,11 > > + xor r13d,ecx > > + add r12d,r15d > > + > > + mov r15d,r10d > > + add r12d,DWORD[rbp] > > + xor r14d,r10d > > + > > + xor r15d,r11d > > + ror r13d,6 > > + mov r9d,r11d > > + > > + and edi,r15d > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor r9d,edi > > 
+ add ebx,r12d > > + add r9d,r12d > > + > > + lea rbp,[4+rbp] > > + add r9d,r14d > > + mov r12d,DWORD[44+rsi] > > + mov r13d,ebx > > + mov r14d,r9d > > + bswap r12d > > + ror r13d,14 > > + mov edi,ecx > > + > > + xor r13d,ebx > > + ror r14d,9 > > + xor edi,edx > > + > > + mov DWORD[44+rsp],r12d > > + xor r14d,r9d > > + and edi,ebx > > + > > + ror r13d,5 > > + add r12d,r8d > > + xor edi,edx > > + > > + ror r14d,11 > > + xor r13d,ebx > > + add r12d,edi > > + > > + mov edi,r9d > > + add r12d,DWORD[rbp] > > + xor r14d,r9d > > + > > + xor edi,r10d > > + ror r13d,6 > > + mov r8d,r10d > > + > > + and r15d,edi > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor r8d,r15d > > + add eax,r12d > > + add r8d,r12d > > + > > + lea rbp,[20+rbp] > > + add r8d,r14d > > + mov r12d,DWORD[48+rsi] > > + mov r13d,eax > > + mov r14d,r8d > > + bswap r12d > > + ror r13d,14 > > + mov r15d,ebx > > + > > + xor r13d,eax > > + ror r14d,9 > > + xor r15d,ecx > > + > > + mov DWORD[48+rsp],r12d > > + xor r14d,r8d > > + and r15d,eax > > + > > + ror r13d,5 > > + add r12d,edx > > + xor r15d,ecx > > + > > + ror r14d,11 > > + xor r13d,eax > > + add r12d,r15d > > + > > + mov r15d,r8d > > + add r12d,DWORD[rbp] > > + xor r14d,r8d > > + > > + xor r15d,r9d > > + ror r13d,6 > > + mov edx,r9d > > + > > + and edi,r15d > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor edx,edi > > + add r11d,r12d > > + add edx,r12d > > + > > + lea rbp,[4+rbp] > > + add edx,r14d > > + mov r12d,DWORD[52+rsi] > > + mov r13d,r11d > > + mov r14d,edx > > + bswap r12d > > + ror r13d,14 > > + mov edi,eax > > + > > + xor r13d,r11d > > + ror r14d,9 > > + xor edi,ebx > > + > > + mov DWORD[52+rsp],r12d > > + xor r14d,edx > > + and edi,r11d > > + > > + ror r13d,5 > > + add r12d,ecx > > + xor edi,ebx > > + > > + ror r14d,11 > > + xor r13d,r11d > > + add r12d,edi > > + > > + mov edi,edx > > + add r12d,DWORD[rbp] > > + xor r14d,edx > > + > > + xor edi,r8d > > + ror r13d,6 > > + mov ecx,r8d > > + > > + and r15d,edi > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor ecx,r15d > > + add r10d,r12d > > + add ecx,r12d > > + > > + lea rbp,[4+rbp] > > + add ecx,r14d > > + mov r12d,DWORD[56+rsi] > > + mov r13d,r10d > > + mov r14d,ecx > > + bswap r12d > > + ror r13d,14 > > + mov r15d,r11d > > + > > + xor r13d,r10d > > + ror r14d,9 > > + xor r15d,eax > > + > > + mov DWORD[56+rsp],r12d > > + xor r14d,ecx > > + and r15d,r10d > > + > > + ror r13d,5 > > + add r12d,ebx > > + xor r15d,eax > > + > > + ror r14d,11 > > + xor r13d,r10d > > + add r12d,r15d > > + > > + mov r15d,ecx > > + add r12d,DWORD[rbp] > > + xor r14d,ecx > > + > > + xor r15d,edx > > + ror r13d,6 > > + mov ebx,edx > > + > > + and edi,r15d > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor ebx,edi > > + add r9d,r12d > > + add ebx,r12d > > + > > + lea rbp,[4+rbp] > > + add ebx,r14d > > + mov r12d,DWORD[60+rsi] > > + mov r13d,r9d > > + mov r14d,ebx > > + bswap r12d > > + ror r13d,14 > > + mov edi,r10d > > + > > + xor r13d,r9d > > + ror r14d,9 > > + xor edi,r11d > > + > > + mov DWORD[60+rsp],r12d > > + xor r14d,ebx > > + and edi,r9d > > + > > + ror r13d,5 > > + add r12d,eax > > + xor edi,r11d > > + > > + ror r14d,11 > > + xor r13d,r9d > > + add r12d,edi > > + > > + mov edi,ebx > > + add r12d,DWORD[rbp] > > + xor r14d,ebx > > + > > + xor edi,ecx > > + ror r13d,6 > > + mov eax,ecx > > + > > + and r15d,edi > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor eax,r15d > > + add r8d,r12d > > + add eax,r12d > > + > > + lea rbp,[20+rbp] > > + jmp NEAR $L$rounds_16_xx > > +ALIGN 16 > > +$L$rounds_16_xx: > > + mov 
r13d,DWORD[4+rsp] > > + mov r15d,DWORD[56+rsp] > > + > > + mov r12d,r13d > > + ror r13d,11 > > + add eax,r14d > > + mov r14d,r15d > > + ror r15d,2 > > + > > + xor r13d,r12d > > + shr r12d,3 > > + ror r13d,7 > > + xor r15d,r14d > > + shr r14d,10 > > + > > + ror r15d,17 > > + xor r12d,r13d > > + xor r15d,r14d > > + add r12d,DWORD[36+rsp] > > + > > + add r12d,DWORD[rsp] > > + mov r13d,r8d > > + add r12d,r15d > > + mov r14d,eax > > + ror r13d,14 > > + mov r15d,r9d > > + > > + xor r13d,r8d > > + ror r14d,9 > > + xor r15d,r10d > > + > > + mov DWORD[rsp],r12d > > + xor r14d,eax > > + and r15d,r8d > > + > > + ror r13d,5 > > + add r12d,r11d > > + xor r15d,r10d > > + > > + ror r14d,11 > > + xor r13d,r8d > > + add r12d,r15d > > + > > + mov r15d,eax > > + add r12d,DWORD[rbp] > > + xor r14d,eax > > + > > + xor r15d,ebx > > + ror r13d,6 > > + mov r11d,ebx > > + > > + and edi,r15d > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor r11d,edi > > + add edx,r12d > > + add r11d,r12d > > + > > + lea rbp,[4+rbp] > > + mov r13d,DWORD[8+rsp] > > + mov edi,DWORD[60+rsp] > > + > > + mov r12d,r13d > > + ror r13d,11 > > + add r11d,r14d > > + mov r14d,edi > > + ror edi,2 > > + > > + xor r13d,r12d > > + shr r12d,3 > > + ror r13d,7 > > + xor edi,r14d > > + shr r14d,10 > > + > > + ror edi,17 > > + xor r12d,r13d > > + xor edi,r14d > > + add r12d,DWORD[40+rsp] > > + > > + add r12d,DWORD[4+rsp] > > + mov r13d,edx > > + add r12d,edi > > + mov r14d,r11d > > + ror r13d,14 > > + mov edi,r8d > > + > > + xor r13d,edx > > + ror r14d,9 > > + xor edi,r9d > > + > > + mov DWORD[4+rsp],r12d > > + xor r14d,r11d > > + and edi,edx > > + > > + ror r13d,5 > > + add r12d,r10d > > + xor edi,r9d > > + > > + ror r14d,11 > > + xor r13d,edx > > + add r12d,edi > > + > > + mov edi,r11d > > + add r12d,DWORD[rbp] > > + xor r14d,r11d > > + > > + xor edi,eax > > + ror r13d,6 > > + mov r10d,eax > > + > > + and r15d,edi > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor r10d,r15d > > + add ecx,r12d > > + add r10d,r12d > > + > > + lea rbp,[4+rbp] > > + mov r13d,DWORD[12+rsp] > > + mov r15d,DWORD[rsp] > > + > > + mov r12d,r13d > > + ror r13d,11 > > + add r10d,r14d > > + mov r14d,r15d > > + ror r15d,2 > > + > > + xor r13d,r12d > > + shr r12d,3 > > + ror r13d,7 > > + xor r15d,r14d > > + shr r14d,10 > > + > > + ror r15d,17 > > + xor r12d,r13d > > + xor r15d,r14d > > + add r12d,DWORD[44+rsp] > > + > > + add r12d,DWORD[8+rsp] > > + mov r13d,ecx > > + add r12d,r15d > > + mov r14d,r10d > > + ror r13d,14 > > + mov r15d,edx > > + > > + xor r13d,ecx > > + ror r14d,9 > > + xor r15d,r8d > > + > > + mov DWORD[8+rsp],r12d > > + xor r14d,r10d > > + and r15d,ecx > > + > > + ror r13d,5 > > + add r12d,r9d > > + xor r15d,r8d > > + > > + ror r14d,11 > > + xor r13d,ecx > > + add r12d,r15d > > + > > + mov r15d,r10d > > + add r12d,DWORD[rbp] > > + xor r14d,r10d > > + > > + xor r15d,r11d > > + ror r13d,6 > > + mov r9d,r11d > > + > > + and edi,r15d > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor r9d,edi > > + add ebx,r12d > > + add r9d,r12d > > + > > + lea rbp,[4+rbp] > > + mov r13d,DWORD[16+rsp] > > + mov edi,DWORD[4+rsp] > > + > > + mov r12d,r13d > > + ror r13d,11 > > + add r9d,r14d > > + mov r14d,edi > > + ror edi,2 > > + > > + xor r13d,r12d > > + shr r12d,3 > > + ror r13d,7 > > + xor edi,r14d > > + shr r14d,10 > > + > > + ror edi,17 > > + xor r12d,r13d > > + xor edi,r14d > > + add r12d,DWORD[48+rsp] > > + > > + add r12d,DWORD[12+rsp] > > + mov r13d,ebx > > + add r12d,edi > > + mov r14d,r9d > > + ror r13d,14 > > + mov edi,ecx > > + > > + xor r13d,ebx > > + ror r14d,9 > 
> + xor edi,edx > > + > > + mov DWORD[12+rsp],r12d > > + xor r14d,r9d > > + and edi,ebx > > + > > + ror r13d,5 > > + add r12d,r8d > > + xor edi,edx > > + > > + ror r14d,11 > > + xor r13d,ebx > > + add r12d,edi > > + > > + mov edi,r9d > > + add r12d,DWORD[rbp] > > + xor r14d,r9d > > + > > + xor edi,r10d > > + ror r13d,6 > > + mov r8d,r10d > > + > > + and r15d,edi > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor r8d,r15d > > + add eax,r12d > > + add r8d,r12d > > + > > + lea rbp,[20+rbp] > > + mov r13d,DWORD[20+rsp] > > + mov r15d,DWORD[8+rsp] > > + > > + mov r12d,r13d > > + ror r13d,11 > > + add r8d,r14d > > + mov r14d,r15d > > + ror r15d,2 > > + > > + xor r13d,r12d > > + shr r12d,3 > > + ror r13d,7 > > + xor r15d,r14d > > + shr r14d,10 > > + > > + ror r15d,17 > > + xor r12d,r13d > > + xor r15d,r14d > > + add r12d,DWORD[52+rsp] > > + > > + add r12d,DWORD[16+rsp] > > + mov r13d,eax > > + add r12d,r15d > > + mov r14d,r8d > > + ror r13d,14 > > + mov r15d,ebx > > + > > + xor r13d,eax > > + ror r14d,9 > > + xor r15d,ecx > > + > > + mov DWORD[16+rsp],r12d > > + xor r14d,r8d > > + and r15d,eax > > + > > + ror r13d,5 > > + add r12d,edx > > + xor r15d,ecx > > + > > + ror r14d,11 > > + xor r13d,eax > > + add r12d,r15d > > + > > + mov r15d,r8d > > + add r12d,DWORD[rbp] > > + xor r14d,r8d > > + > > + xor r15d,r9d > > + ror r13d,6 > > + mov edx,r9d > > + > > + and edi,r15d > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor edx,edi > > + add r11d,r12d > > + add edx,r12d > > + > > + lea rbp,[4+rbp] > > + mov r13d,DWORD[24+rsp] > > + mov edi,DWORD[12+rsp] > > + > > + mov r12d,r13d > > + ror r13d,11 > > + add edx,r14d > > + mov r14d,edi > > + ror edi,2 > > + > > + xor r13d,r12d > > + shr r12d,3 > > + ror r13d,7 > > + xor edi,r14d > > + shr r14d,10 > > + > > + ror edi,17 > > + xor r12d,r13d > > + xor edi,r14d > > + add r12d,DWORD[56+rsp] > > + > > + add r12d,DWORD[20+rsp] > > + mov r13d,r11d > > + add r12d,edi > > + mov r14d,edx > > + ror r13d,14 > > + mov edi,eax > > + > > + xor r13d,r11d > > + ror r14d,9 > > + xor edi,ebx > > + > > + mov DWORD[20+rsp],r12d > > + xor r14d,edx > > + and edi,r11d > > + > > + ror r13d,5 > > + add r12d,ecx > > + xor edi,ebx > > + > > + ror r14d,11 > > + xor r13d,r11d > > + add r12d,edi > > + > > + mov edi,edx > > + add r12d,DWORD[rbp] > > + xor r14d,edx > > + > > + xor edi,r8d > > + ror r13d,6 > > + mov ecx,r8d > > + > > + and r15d,edi > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor ecx,r15d > > + add r10d,r12d > > + add ecx,r12d > > + > > + lea rbp,[4+rbp] > > + mov r13d,DWORD[28+rsp] > > + mov r15d,DWORD[16+rsp] > > + > > + mov r12d,r13d > > + ror r13d,11 > > + add ecx,r14d > > + mov r14d,r15d > > + ror r15d,2 > > + > > + xor r13d,r12d > > + shr r12d,3 > > + ror r13d,7 > > + xor r15d,r14d > > + shr r14d,10 > > + > > + ror r15d,17 > > + xor r12d,r13d > > + xor r15d,r14d > > + add r12d,DWORD[60+rsp] > > + > > + add r12d,DWORD[24+rsp] > > + mov r13d,r10d > > + add r12d,r15d > > + mov r14d,ecx > > + ror r13d,14 > > + mov r15d,r11d > > + > > + xor r13d,r10d > > + ror r14d,9 > > + xor r15d,eax > > + > > + mov DWORD[24+rsp],r12d > > + xor r14d,ecx > > + and r15d,r10d > > + > > + ror r13d,5 > > + add r12d,ebx > > + xor r15d,eax > > + > > + ror r14d,11 > > + xor r13d,r10d > > + add r12d,r15d > > + > > + mov r15d,ecx > > + add r12d,DWORD[rbp] > > + xor r14d,ecx > > + > > + xor r15d,edx > > + ror r13d,6 > > + mov ebx,edx > > + > > + and edi,r15d > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor ebx,edi > > + add r9d,r12d > > + add ebx,r12d > > + > > + lea rbp,[4+rbp] > > + 
mov r13d,DWORD[32+rsp] > > + mov edi,DWORD[20+rsp] > > + > > + mov r12d,r13d > > + ror r13d,11 > > + add ebx,r14d > > + mov r14d,edi > > + ror edi,2 > > + > > + xor r13d,r12d > > + shr r12d,3 > > + ror r13d,7 > > + xor edi,r14d > > + shr r14d,10 > > + > > + ror edi,17 > > + xor r12d,r13d > > + xor edi,r14d > > + add r12d,DWORD[rsp] > > + > > + add r12d,DWORD[28+rsp] > > + mov r13d,r9d > > + add r12d,edi > > + mov r14d,ebx > > + ror r13d,14 > > + mov edi,r10d > > + > > + xor r13d,r9d > > + ror r14d,9 > > + xor edi,r11d > > + > > + mov DWORD[28+rsp],r12d > > + xor r14d,ebx > > + and edi,r9d > > + > > + ror r13d,5 > > + add r12d,eax > > + xor edi,r11d > > + > > + ror r14d,11 > > + xor r13d,r9d > > + add r12d,edi > > + > > + mov edi,ebx > > + add r12d,DWORD[rbp] > > + xor r14d,ebx > > + > > + xor edi,ecx > > + ror r13d,6 > > + mov eax,ecx > > + > > + and r15d,edi > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor eax,r15d > > + add r8d,r12d > > + add eax,r12d > > + > > + lea rbp,[20+rbp] > > + mov r13d,DWORD[36+rsp] > > + mov r15d,DWORD[24+rsp] > > + > > + mov r12d,r13d > > + ror r13d,11 > > + add eax,r14d > > + mov r14d,r15d > > + ror r15d,2 > > + > > + xor r13d,r12d > > + shr r12d,3 > > + ror r13d,7 > > + xor r15d,r14d > > + shr r14d,10 > > + > > + ror r15d,17 > > + xor r12d,r13d > > + xor r15d,r14d > > + add r12d,DWORD[4+rsp] > > + > > + add r12d,DWORD[32+rsp] > > + mov r13d,r8d > > + add r12d,r15d > > + mov r14d,eax > > + ror r13d,14 > > + mov r15d,r9d > > + > > + xor r13d,r8d > > + ror r14d,9 > > + xor r15d,r10d > > + > > + mov DWORD[32+rsp],r12d > > + xor r14d,eax > > + and r15d,r8d > > + > > + ror r13d,5 > > + add r12d,r11d > > + xor r15d,r10d > > + > > + ror r14d,11 > > + xor r13d,r8d > > + add r12d,r15d > > + > > + mov r15d,eax > > + add r12d,DWORD[rbp] > > + xor r14d,eax > > + > > + xor r15d,ebx > > + ror r13d,6 > > + mov r11d,ebx > > + > > + and edi,r15d > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor r11d,edi > > + add edx,r12d > > + add r11d,r12d > > + > > + lea rbp,[4+rbp] > > + mov r13d,DWORD[40+rsp] > > + mov edi,DWORD[28+rsp] > > + > > + mov r12d,r13d > > + ror r13d,11 > > + add r11d,r14d > > + mov r14d,edi > > + ror edi,2 > > + > > + xor r13d,r12d > > + shr r12d,3 > > + ror r13d,7 > > + xor edi,r14d > > + shr r14d,10 > > + > > + ror edi,17 > > + xor r12d,r13d > > + xor edi,r14d > > + add r12d,DWORD[8+rsp] > > + > > + add r12d,DWORD[36+rsp] > > + mov r13d,edx > > + add r12d,edi > > + mov r14d,r11d > > + ror r13d,14 > > + mov edi,r8d > > + > > + xor r13d,edx > > + ror r14d,9 > > + xor edi,r9d > > + > > + mov DWORD[36+rsp],r12d > > + xor r14d,r11d > > + and edi,edx > > + > > + ror r13d,5 > > + add r12d,r10d > > + xor edi,r9d > > + > > + ror r14d,11 > > + xor r13d,edx > > + add r12d,edi > > + > > + mov edi,r11d > > + add r12d,DWORD[rbp] > > + xor r14d,r11d > > + > > + xor edi,eax > > + ror r13d,6 > > + mov r10d,eax > > + > > + and r15d,edi > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor r10d,r15d > > + add ecx,r12d > > + add r10d,r12d > > + > > + lea rbp,[4+rbp] > > + mov r13d,DWORD[44+rsp] > > + mov r15d,DWORD[32+rsp] > > + > > + mov r12d,r13d > > + ror r13d,11 > > + add r10d,r14d > > + mov r14d,r15d > > + ror r15d,2 > > + > > + xor r13d,r12d > > + shr r12d,3 > > + ror r13d,7 > > + xor r15d,r14d > > + shr r14d,10 > > + > > + ror r15d,17 > > + xor r12d,r13d > > + xor r15d,r14d > > + add r12d,DWORD[12+rsp] > > + > > + add r12d,DWORD[40+rsp] > > + mov r13d,ecx > > + add r12d,r15d > > + mov r14d,r10d > > + ror r13d,14 > > + mov r15d,edx > > + > > + xor r13d,ecx > > + ror 
r14d,9 > > + xor r15d,r8d > > + > > + mov DWORD[40+rsp],r12d > > + xor r14d,r10d > > + and r15d,ecx > > + > > + ror r13d,5 > > + add r12d,r9d > > + xor r15d,r8d > > + > > + ror r14d,11 > > + xor r13d,ecx > > + add r12d,r15d > > + > > + mov r15d,r10d > > + add r12d,DWORD[rbp] > > + xor r14d,r10d > > + > > + xor r15d,r11d > > + ror r13d,6 > > + mov r9d,r11d > > + > > + and edi,r15d > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor r9d,edi > > + add ebx,r12d > > + add r9d,r12d > > + > > + lea rbp,[4+rbp] > > + mov r13d,DWORD[48+rsp] > > + mov edi,DWORD[36+rsp] > > + > > + mov r12d,r13d > > + ror r13d,11 > > + add r9d,r14d > > + mov r14d,edi > > + ror edi,2 > > + > > + xor r13d,r12d > > + shr r12d,3 > > + ror r13d,7 > > + xor edi,r14d > > + shr r14d,10 > > + > > + ror edi,17 > > + xor r12d,r13d > > + xor edi,r14d > > + add r12d,DWORD[16+rsp] > > + > > + add r12d,DWORD[44+rsp] > > + mov r13d,ebx > > + add r12d,edi > > + mov r14d,r9d > > + ror r13d,14 > > + mov edi,ecx > > + > > + xor r13d,ebx > > + ror r14d,9 > > + xor edi,edx > > + > > + mov DWORD[44+rsp],r12d > > + xor r14d,r9d > > + and edi,ebx > > + > > + ror r13d,5 > > + add r12d,r8d > > + xor edi,edx > > + > > + ror r14d,11 > > + xor r13d,ebx > > + add r12d,edi > > + > > + mov edi,r9d > > + add r12d,DWORD[rbp] > > + xor r14d,r9d > > + > > + xor edi,r10d > > + ror r13d,6 > > + mov r8d,r10d > > + > > + and r15d,edi > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor r8d,r15d > > + add eax,r12d > > + add r8d,r12d > > + > > + lea rbp,[20+rbp] > > + mov r13d,DWORD[52+rsp] > > + mov r15d,DWORD[40+rsp] > > + > > + mov r12d,r13d > > + ror r13d,11 > > + add r8d,r14d > > + mov r14d,r15d > > + ror r15d,2 > > + > > + xor r13d,r12d > > + shr r12d,3 > > + ror r13d,7 > > + xor r15d,r14d > > + shr r14d,10 > > + > > + ror r15d,17 > > + xor r12d,r13d > > + xor r15d,r14d > > + add r12d,DWORD[20+rsp] > > + > > + add r12d,DWORD[48+rsp] > > + mov r13d,eax > > + add r12d,r15d > > + mov r14d,r8d > > + ror r13d,14 > > + mov r15d,ebx > > + > > + xor r13d,eax > > + ror r14d,9 > > + xor r15d,ecx > > + > > + mov DWORD[48+rsp],r12d > > + xor r14d,r8d > > + and r15d,eax > > + > > + ror r13d,5 > > + add r12d,edx > > + xor r15d,ecx > > + > > + ror r14d,11 > > + xor r13d,eax > > + add r12d,r15d > > + > > + mov r15d,r8d > > + add r12d,DWORD[rbp] > > + xor r14d,r8d > > + > > + xor r15d,r9d > > + ror r13d,6 > > + mov edx,r9d > > + > > + and edi,r15d > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor edx,edi > > + add r11d,r12d > > + add edx,r12d > > + > > + lea rbp,[4+rbp] > > + mov r13d,DWORD[56+rsp] > > + mov edi,DWORD[44+rsp] > > + > > + mov r12d,r13d > > + ror r13d,11 > > + add edx,r14d > > + mov r14d,edi > > + ror edi,2 > > + > > + xor r13d,r12d > > + shr r12d,3 > > + ror r13d,7 > > + xor edi,r14d > > + shr r14d,10 > > + > > + ror edi,17 > > + xor r12d,r13d > > + xor edi,r14d > > + add r12d,DWORD[24+rsp] > > + > > + add r12d,DWORD[52+rsp] > > + mov r13d,r11d > > + add r12d,edi > > + mov r14d,edx > > + ror r13d,14 > > + mov edi,eax > > + > > + xor r13d,r11d > > + ror r14d,9 > > + xor edi,ebx > > + > > + mov DWORD[52+rsp],r12d > > + xor r14d,edx > > + and edi,r11d > > + > > + ror r13d,5 > > + add r12d,ecx > > + xor edi,ebx > > + > > + ror r14d,11 > > + xor r13d,r11d > > + add r12d,edi > > + > > + mov edi,edx > > + add r12d,DWORD[rbp] > > + xor r14d,edx > > + > > + xor edi,r8d > > + ror r13d,6 > > + mov ecx,r8d > > + > > + and r15d,edi > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor ecx,r15d > > + add r10d,r12d > > + add ecx,r12d > > + > > + lea rbp,[4+rbp] > > 
+ mov r13d,DWORD[60+rsp] > > + mov r15d,DWORD[48+rsp] > > + > > + mov r12d,r13d > > + ror r13d,11 > > + add ecx,r14d > > + mov r14d,r15d > > + ror r15d,2 > > + > > + xor r13d,r12d > > + shr r12d,3 > > + ror r13d,7 > > + xor r15d,r14d > > + shr r14d,10 > > + > > + ror r15d,17 > > + xor r12d,r13d > > + xor r15d,r14d > > + add r12d,DWORD[28+rsp] > > + > > + add r12d,DWORD[56+rsp] > > + mov r13d,r10d > > + add r12d,r15d > > + mov r14d,ecx > > + ror r13d,14 > > + mov r15d,r11d > > + > > + xor r13d,r10d > > + ror r14d,9 > > + xor r15d,eax > > + > > + mov DWORD[56+rsp],r12d > > + xor r14d,ecx > > + and r15d,r10d > > + > > + ror r13d,5 > > + add r12d,ebx > > + xor r15d,eax > > + > > + ror r14d,11 > > + xor r13d,r10d > > + add r12d,r15d > > + > > + mov r15d,ecx > > + add r12d,DWORD[rbp] > > + xor r14d,ecx > > + > > + xor r15d,edx > > + ror r13d,6 > > + mov ebx,edx > > + > > + and edi,r15d > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor ebx,edi > > + add r9d,r12d > > + add ebx,r12d > > + > > + lea rbp,[4+rbp] > > + mov r13d,DWORD[rsp] > > + mov edi,DWORD[52+rsp] > > + > > + mov r12d,r13d > > + ror r13d,11 > > + add ebx,r14d > > + mov r14d,edi > > + ror edi,2 > > + > > + xor r13d,r12d > > + shr r12d,3 > > + ror r13d,7 > > + xor edi,r14d > > + shr r14d,10 > > + > > + ror edi,17 > > + xor r12d,r13d > > + xor edi,r14d > > + add r12d,DWORD[32+rsp] > > + > > + add r12d,DWORD[60+rsp] > > + mov r13d,r9d > > + add r12d,edi > > + mov r14d,ebx > > + ror r13d,14 > > + mov edi,r10d > > + > > + xor r13d,r9d > > + ror r14d,9 > > + xor edi,r11d > > + > > + mov DWORD[60+rsp],r12d > > + xor r14d,ebx > > + and edi,r9d > > + > > + ror r13d,5 > > + add r12d,eax > > + xor edi,r11d > > + > > + ror r14d,11 > > + xor r13d,r9d > > + add r12d,edi > > + > > + mov edi,ebx > > + add r12d,DWORD[rbp] > > + xor r14d,ebx > > + > > + xor edi,ecx > > + ror r13d,6 > > + mov eax,ecx > > + > > + and r15d,edi > > + ror r14d,2 > > + add r12d,r13d > > + > > + xor eax,r15d > > + add r8d,r12d > > + add eax,r12d > > + > > + lea rbp,[20+rbp] > > + cmp BYTE[3+rbp],0 > > + jnz NEAR $L$rounds_16_xx > > + > > + mov rdi,QWORD[((64+0))+rsp] > > + add eax,r14d > > + lea rsi,[64+rsi] > > + > > + add eax,DWORD[rdi] > > + add ebx,DWORD[4+rdi] > > + add ecx,DWORD[8+rdi] > > + add edx,DWORD[12+rdi] > > + add r8d,DWORD[16+rdi] > > + add r9d,DWORD[20+rdi] > > + add r10d,DWORD[24+rdi] > > + add r11d,DWORD[28+rdi] > > + > > + cmp rsi,QWORD[((64+16))+rsp] > > + > > + mov DWORD[rdi],eax > > + mov DWORD[4+rdi],ebx > > + mov DWORD[8+rdi],ecx > > + mov DWORD[12+rdi],edx > > + mov DWORD[16+rdi],r8d > > + mov DWORD[20+rdi],r9d > > + mov DWORD[24+rdi],r10d > > + mov DWORD[28+rdi],r11d > > + jb NEAR $L$loop > > + > > + mov rsi,QWORD[88+rsp] > > + > > + mov r15,QWORD[((-48))+rsi] > > + > > + mov r14,QWORD[((-40))+rsi] > > + > > + mov r13,QWORD[((-32))+rsi] > > + > > + mov r12,QWORD[((-24))+rsi] > > + > > + mov rbp,QWORD[((-16))+rsi] > > + > > + mov rbx,QWORD[((-8))+rsi] > > + > > + lea rsp,[rsi] > > + > > +$L$epilogue: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_sha256_block_data_order: > > +ALIGN 64 > > + > > +K256: > > + DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 > > + DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 > > + DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 > > + DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 > > + DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 > > + DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 > > + DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 > > + DD 
0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 > > + DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc > > + DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc > > + DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da > > + DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da > > + DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 > > + DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 > > + DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 > > + DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 > > + DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 > > + DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 > > + DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 > > + DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 > > + DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 > > + DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 > > + DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 > > + DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 > > + DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 > > + DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 > > + DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 > > + DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 > > + DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 > > + DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 > > + DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 > > + DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 > > + > > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > + DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > + DD 0x03020100,0x0b0a0908,0xffffffff,0xffffffff > > + DD 0x03020100,0x0b0a0908,0xffffffff,0xffffffff > > + DD 0xffffffff,0xffffffff,0x03020100,0x0b0a0908 > > + DD 0xffffffff,0xffffffff,0x03020100,0x0b0a0908 > > +DB 83,72,65,50,53,54,32,98,108,111,99,107,32,116,114,97 > > +DB 110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54 > > +DB 52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121 > > +DB 32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46 > > +DB 111,114,103,62,0 > > + > > +ALIGN 64 > > +sha256_block_data_order_shaext: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_sha256_block_data_order_shaext: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + > > + > > +_shaext_shortcut: > > + > > + lea rsp,[((-88))+rsp] > > + movaps XMMWORD[(-8-80)+rax],xmm6 > > + movaps XMMWORD[(-8-64)+rax],xmm7 > > + movaps XMMWORD[(-8-48)+rax],xmm8 > > + movaps XMMWORD[(-8-32)+rax],xmm9 > > + movaps XMMWORD[(-8-16)+rax],xmm10 > > +$L$prologue_shaext: > > + lea rcx,[((K256+128))] > > + movdqu xmm1,XMMWORD[rdi] > > + movdqu xmm2,XMMWORD[16+rdi] > > + movdqa xmm7,XMMWORD[((512-128))+rcx] > > + > > + pshufd xmm0,xmm1,0x1b > > + pshufd xmm1,xmm1,0xb1 > > + pshufd xmm2,xmm2,0x1b > > + movdqa xmm8,xmm7 > > +DB 102,15,58,15,202,8 > > + punpcklqdq xmm2,xmm0 > > + jmp NEAR $L$oop_shaext > > + > > +ALIGN 16 > > +$L$oop_shaext: > > + movdqu xmm3,XMMWORD[rsi] > > + movdqu xmm4,XMMWORD[16+rsi] > > + movdqu xmm5,XMMWORD[32+rsi] > > +DB 102,15,56,0,223 > > + movdqu xmm6,XMMWORD[48+rsi] > > + > > + movdqa xmm0,XMMWORD[((0-128))+rcx] > > + paddd xmm0,xmm3 > > +DB 102,15,56,0,231 > > + movdqa xmm10,xmm2 > > +DB 15,56,203,209 > > + pshufd xmm0,xmm0,0x0e > > + nop > > + movdqa xmm9,xmm1 > > +DB 15,56,203,202 > > + > > + movdqa xmm0,XMMWORD[((32-128))+rcx] > > + paddd xmm0,xmm4 > > +DB 102,15,56,0,239 > > +DB 15,56,203,209 > > + pshufd xmm0,xmm0,0x0e > > + lea rsi,[64+rsi] > > +DB 15,56,204,220 > > +DB 15,56,203,202 > > + > > + movdqa xmm0,XMMWORD[((64-128))+rcx] > > + paddd xmm0,xmm5 > > +DB 102,15,56,0,247 > > +DB 15,56,203,209 > > + pshufd 
xmm0,xmm0,0x0e > > + movdqa xmm7,xmm6 > > +DB 102,15,58,15,253,4 > > + nop > > + paddd xmm3,xmm7 > > +DB 15,56,204,229 > > +DB 15,56,203,202 > > + > > + movdqa xmm0,XMMWORD[((96-128))+rcx] > > + paddd xmm0,xmm6 > > +DB 15,56,205,222 > > +DB 15,56,203,209 > > + pshufd xmm0,xmm0,0x0e > > + movdqa xmm7,xmm3 > > +DB 102,15,58,15,254,4 > > + nop > > + paddd xmm4,xmm7 > > +DB 15,56,204,238 > > +DB 15,56,203,202 > > + movdqa xmm0,XMMWORD[((128-128))+rcx] > > + paddd xmm0,xmm3 > > +DB 15,56,205,227 > > +DB 15,56,203,209 > > + pshufd xmm0,xmm0,0x0e > > + movdqa xmm7,xmm4 > > +DB 102,15,58,15,251,4 > > + nop > > + paddd xmm5,xmm7 > > +DB 15,56,204,243 > > +DB 15,56,203,202 > > + movdqa xmm0,XMMWORD[((160-128))+rcx] > > + paddd xmm0,xmm4 > > +DB 15,56,205,236 > > +DB 15,56,203,209 > > + pshufd xmm0,xmm0,0x0e > > + movdqa xmm7,xmm5 > > +DB 102,15,58,15,252,4 > > + nop > > + paddd xmm6,xmm7 > > +DB 15,56,204,220 > > +DB 15,56,203,202 > > + movdqa xmm0,XMMWORD[((192-128))+rcx] > > + paddd xmm0,xmm5 > > +DB 15,56,205,245 > > +DB 15,56,203,209 > > + pshufd xmm0,xmm0,0x0e > > + movdqa xmm7,xmm6 > > +DB 102,15,58,15,253,4 > > + nop > > + paddd xmm3,xmm7 > > +DB 15,56,204,229 > > +DB 15,56,203,202 > > + movdqa xmm0,XMMWORD[((224-128))+rcx] > > + paddd xmm0,xmm6 > > +DB 15,56,205,222 > > +DB 15,56,203,209 > > + pshufd xmm0,xmm0,0x0e > > + movdqa xmm7,xmm3 > > +DB 102,15,58,15,254,4 > > + nop > > + paddd xmm4,xmm7 > > +DB 15,56,204,238 > > +DB 15,56,203,202 > > + movdqa xmm0,XMMWORD[((256-128))+rcx] > > + paddd xmm0,xmm3 > > +DB 15,56,205,227 > > +DB 15,56,203,209 > > + pshufd xmm0,xmm0,0x0e > > + movdqa xmm7,xmm4 > > +DB 102,15,58,15,251,4 > > + nop > > + paddd xmm5,xmm7 > > +DB 15,56,204,243 > > +DB 15,56,203,202 > > + movdqa xmm0,XMMWORD[((288-128))+rcx] > > + paddd xmm0,xmm4 > > +DB 15,56,205,236 > > +DB 15,56,203,209 > > + pshufd xmm0,xmm0,0x0e > > + movdqa xmm7,xmm5 > > +DB 102,15,58,15,252,4 > > + nop > > + paddd xmm6,xmm7 > > +DB 15,56,204,220 > > +DB 15,56,203,202 > > + movdqa xmm0,XMMWORD[((320-128))+rcx] > > + paddd xmm0,xmm5 > > +DB 15,56,205,245 > > +DB 15,56,203,209 > > + pshufd xmm0,xmm0,0x0e > > + movdqa xmm7,xmm6 > > +DB 102,15,58,15,253,4 > > + nop > > + paddd xmm3,xmm7 > > +DB 15,56,204,229 > > +DB 15,56,203,202 > > + movdqa xmm0,XMMWORD[((352-128))+rcx] > > + paddd xmm0,xmm6 > > +DB 15,56,205,222 > > +DB 15,56,203,209 > > + pshufd xmm0,xmm0,0x0e > > + movdqa xmm7,xmm3 > > +DB 102,15,58,15,254,4 > > + nop > > + paddd xmm4,xmm7 > > +DB 15,56,204,238 > > +DB 15,56,203,202 > > + movdqa xmm0,XMMWORD[((384-128))+rcx] > > + paddd xmm0,xmm3 > > +DB 15,56,205,227 > > +DB 15,56,203,209 > > + pshufd xmm0,xmm0,0x0e > > + movdqa xmm7,xmm4 > > +DB 102,15,58,15,251,4 > > + nop > > + paddd xmm5,xmm7 > > +DB 15,56,204,243 > > +DB 15,56,203,202 > > + movdqa xmm0,XMMWORD[((416-128))+rcx] > > + paddd xmm0,xmm4 > > +DB 15,56,205,236 > > +DB 15,56,203,209 > > + pshufd xmm0,xmm0,0x0e > > + movdqa xmm7,xmm5 > > +DB 102,15,58,15,252,4 > > +DB 15,56,203,202 > > + paddd xmm6,xmm7 > > + > > + movdqa xmm0,XMMWORD[((448-128))+rcx] > > + paddd xmm0,xmm5 > > +DB 15,56,203,209 > > + pshufd xmm0,xmm0,0x0e > > +DB 15,56,205,245 > > + movdqa xmm7,xmm8 > > +DB 15,56,203,202 > > + > > + movdqa xmm0,XMMWORD[((480-128))+rcx] > > + paddd xmm0,xmm6 > > + nop > > +DB 15,56,203,209 > > + pshufd xmm0,xmm0,0x0e > > + dec rdx > > + nop > > +DB 15,56,203,202 > > + > > + paddd xmm2,xmm10 > > + paddd xmm1,xmm9 > > + jnz NEAR $L$oop_shaext > > + > > + pshufd xmm2,xmm2,0xb1 > > + pshufd xmm7,xmm1,0x1b > > + pshufd xmm1,xmm1,0xb1 > > + 
punpckhqdq xmm1,xmm2 > > +DB 102,15,58,15,215,8 > > + > > + movdqu XMMWORD[rdi],xmm1 > > + movdqu XMMWORD[16+rdi],xmm2 > > + movaps xmm6,XMMWORD[((-8-80))+rax] > > + movaps xmm7,XMMWORD[((-8-64))+rax] > > + movaps xmm8,XMMWORD[((-8-48))+rax] > > + movaps xmm9,XMMWORD[((-8-32))+rax] > > + movaps xmm10,XMMWORD[((-8-16))+rax] > > + mov rsp,rax > > +$L$epilogue_shaext: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_sha256_block_data_order_shaext: > > + > > +ALIGN 64 > > +sha256_block_data_order_ssse3: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_sha256_block_data_order_ssse3: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + > > + > > + > > +$L$ssse3_shortcut: > > + mov rax,rsp > > + > > + push rbx > > + > > + push rbp > > + > > + push r12 > > + > > + push r13 > > + > > + push r14 > > + > > + push r15 > > + > > + shl rdx,4 > > + sub rsp,160 > > + lea rdx,[rdx*4+rsi] > > + and rsp,-64 > > + mov QWORD[((64+0))+rsp],rdi > > + mov QWORD[((64+8))+rsp],rsi > > + mov QWORD[((64+16))+rsp],rdx > > + mov QWORD[88+rsp],rax > > + > > + movaps XMMWORD[(64+32)+rsp],xmm6 > > + movaps XMMWORD[(64+48)+rsp],xmm7 > > + movaps XMMWORD[(64+64)+rsp],xmm8 > > + movaps XMMWORD[(64+80)+rsp],xmm9 > > +$L$prologue_ssse3: > > + > > + mov eax,DWORD[rdi] > > + mov ebx,DWORD[4+rdi] > > + mov ecx,DWORD[8+rdi] > > + mov edx,DWORD[12+rdi] > > + mov r8d,DWORD[16+rdi] > > + mov r9d,DWORD[20+rdi] > > + mov r10d,DWORD[24+rdi] > > + mov r11d,DWORD[28+rdi] > > + > > + > > + jmp NEAR $L$loop_ssse3 > > +ALIGN 16 > > +$L$loop_ssse3: > > + movdqa xmm7,XMMWORD[((K256+512))] > > + movdqu xmm0,XMMWORD[rsi] > > + movdqu xmm1,XMMWORD[16+rsi] > > + movdqu xmm2,XMMWORD[32+rsi] > > +DB 102,15,56,0,199 > > + movdqu xmm3,XMMWORD[48+rsi] > > + lea rbp,[K256] > > +DB 102,15,56,0,207 > > + movdqa xmm4,XMMWORD[rbp] > > + movdqa xmm5,XMMWORD[32+rbp] > > +DB 102,15,56,0,215 > > + paddd xmm4,xmm0 > > + movdqa xmm6,XMMWORD[64+rbp] > > +DB 102,15,56,0,223 > > + movdqa xmm7,XMMWORD[96+rbp] > > + paddd xmm5,xmm1 > > + paddd xmm6,xmm2 > > + paddd xmm7,xmm3 > > + movdqa XMMWORD[rsp],xmm4 > > + mov r14d,eax > > + movdqa XMMWORD[16+rsp],xmm5 > > + mov edi,ebx > > + movdqa XMMWORD[32+rsp],xmm6 > > + xor edi,ecx > > + movdqa XMMWORD[48+rsp],xmm7 > > + mov r13d,r8d > > + jmp NEAR $L$ssse3_00_47 > > + > > +ALIGN 16 > > +$L$ssse3_00_47: > > + sub rbp,-128 > > + ror r13d,14 > > + movdqa xmm4,xmm1 > > + mov eax,r14d > > + mov r12d,r9d > > + movdqa xmm7,xmm3 > > + ror r14d,9 > > + xor r13d,r8d > > + xor r12d,r10d > > + ror r13d,5 > > + xor r14d,eax > > +DB 102,15,58,15,224,4 > > + and r12d,r8d > > + xor r13d,r8d > > +DB 102,15,58,15,250,4 > > + add r11d,DWORD[rsp] > > + mov r15d,eax > > + xor r12d,r10d > > + ror r14d,11 > > + movdqa xmm5,xmm4 > > + xor r15d,ebx > > + add r11d,r12d > > + movdqa xmm6,xmm4 > > + ror r13d,6 > > + and edi,r15d > > + psrld xmm4,3 > > + xor r14d,eax > > + add r11d,r13d > > + xor edi,ebx > > + paddd xmm0,xmm7 > > + ror r14d,2 > > + add edx,r11d > > + psrld xmm6,7 > > + add r11d,edi > > + mov r13d,edx > > + pshufd xmm7,xmm3,250 > > + add r14d,r11d > > + ror r13d,14 > > + pslld xmm5,14 > > + mov r11d,r14d > > + mov r12d,r8d > > + pxor xmm4,xmm6 > > + ror r14d,9 > > + xor r13d,edx > > + xor r12d,r9d > > + ror r13d,5 > > + psrld xmm6,11 > > + xor r14d,r11d > > + pxor xmm4,xmm5 > > + and r12d,edx > > + xor r13d,edx > > + pslld xmm5,11 > > + add r10d,DWORD[4+rsp] > > + mov edi,r11d > > + pxor xmm4,xmm6 > > + 
xor r12d,r9d > > + ror r14d,11 > > + movdqa xmm6,xmm7 > > + xor edi,eax > > + add r10d,r12d > > + pxor xmm4,xmm5 > > + ror r13d,6 > > + and r15d,edi > > + xor r14d,r11d > > + psrld xmm7,10 > > + add r10d,r13d > > + xor r15d,eax > > + paddd xmm0,xmm4 > > + ror r14d,2 > > + add ecx,r10d > > + psrlq xmm6,17 > > + add r10d,r15d > > + mov r13d,ecx > > + add r14d,r10d > > + pxor xmm7,xmm6 > > + ror r13d,14 > > + mov r10d,r14d > > + mov r12d,edx > > + ror r14d,9 > > + psrlq xmm6,2 > > + xor r13d,ecx > > + xor r12d,r8d > > + pxor xmm7,xmm6 > > + ror r13d,5 > > + xor r14d,r10d > > + and r12d,ecx > > + pshufd xmm7,xmm7,128 > > + xor r13d,ecx > > + add r9d,DWORD[8+rsp] > > + mov r15d,r10d > > + psrldq xmm7,8 > > + xor r12d,r8d > > + ror r14d,11 > > + xor r15d,r11d > > + add r9d,r12d > > + ror r13d,6 > > + paddd xmm0,xmm7 > > + and edi,r15d > > + xor r14d,r10d > > + add r9d,r13d > > + pshufd xmm7,xmm0,80 > > + xor edi,r11d > > + ror r14d,2 > > + add ebx,r9d > > + movdqa xmm6,xmm7 > > + add r9d,edi > > + mov r13d,ebx > > + psrld xmm7,10 > > + add r14d,r9d > > + ror r13d,14 > > + psrlq xmm6,17 > > + mov r9d,r14d > > + mov r12d,ecx > > + pxor xmm7,xmm6 > > + ror r14d,9 > > + xor r13d,ebx > > + xor r12d,edx > > + ror r13d,5 > > + xor r14d,r9d > > + psrlq xmm6,2 > > + and r12d,ebx > > + xor r13d,ebx > > + add r8d,DWORD[12+rsp] > > + pxor xmm7,xmm6 > > + mov edi,r9d > > + xor r12d,edx > > + ror r14d,11 > > + pshufd xmm7,xmm7,8 > > + xor edi,r10d > > + add r8d,r12d > > + movdqa xmm6,XMMWORD[rbp] > > + ror r13d,6 > > + and r15d,edi > > + pslldq xmm7,8 > > + xor r14d,r9d > > + add r8d,r13d > > + xor r15d,r10d > > + paddd xmm0,xmm7 > > + ror r14d,2 > > + add eax,r8d > > + add r8d,r15d > > + paddd xmm6,xmm0 > > + mov r13d,eax > > + add r14d,r8d > > + movdqa XMMWORD[rsp],xmm6 > > + ror r13d,14 > > + movdqa xmm4,xmm2 > > + mov r8d,r14d > > + mov r12d,ebx > > + movdqa xmm7,xmm0 > > + ror r14d,9 > > + xor r13d,eax > > + xor r12d,ecx > > + ror r13d,5 > > + xor r14d,r8d > > +DB 102,15,58,15,225,4 > > + and r12d,eax > > + xor r13d,eax > > +DB 102,15,58,15,251,4 > > + add edx,DWORD[16+rsp] > > + mov r15d,r8d > > + xor r12d,ecx > > + ror r14d,11 > > + movdqa xmm5,xmm4 > > + xor r15d,r9d > > + add edx,r12d > > + movdqa xmm6,xmm4 > > + ror r13d,6 > > + and edi,r15d > > + psrld xmm4,3 > > + xor r14d,r8d > > + add edx,r13d > > + xor edi,r9d > > + paddd xmm1,xmm7 > > + ror r14d,2 > > + add r11d,edx > > + psrld xmm6,7 > > + add edx,edi > > + mov r13d,r11d > > + pshufd xmm7,xmm0,250 > > + add r14d,edx > > + ror r13d,14 > > + pslld xmm5,14 > > + mov edx,r14d > > + mov r12d,eax > > + pxor xmm4,xmm6 > > + ror r14d,9 > > + xor r13d,r11d > > + xor r12d,ebx > > + ror r13d,5 > > + psrld xmm6,11 > > + xor r14d,edx > > + pxor xmm4,xmm5 > > + and r12d,r11d > > + xor r13d,r11d > > + pslld xmm5,11 > > + add ecx,DWORD[20+rsp] > > + mov edi,edx > > + pxor xmm4,xmm6 > > + xor r12d,ebx > > + ror r14d,11 > > + movdqa xmm6,xmm7 > > + xor edi,r8d > > + add ecx,r12d > > + pxor xmm4,xmm5 > > + ror r13d,6 > > + and r15d,edi > > + xor r14d,edx > > + psrld xmm7,10 > > + add ecx,r13d > > + xor r15d,r8d > > + paddd xmm1,xmm4 > > + ror r14d,2 > > + add r10d,ecx > > + psrlq xmm6,17 > > + add ecx,r15d > > + mov r13d,r10d > > + add r14d,ecx > > + pxor xmm7,xmm6 > > + ror r13d,14 > > + mov ecx,r14d > > + mov r12d,r11d > > + ror r14d,9 > > + psrlq xmm6,2 > > + xor r13d,r10d > > + xor r12d,eax > > + pxor xmm7,xmm6 > > + ror r13d,5 > > + xor r14d,ecx > > + and r12d,r10d > > + pshufd xmm7,xmm7,128 > > + xor r13d,r10d > > + add ebx,DWORD[24+rsp] > > + mov 
r15d,ecx > > + psrldq xmm7,8 > > + xor r12d,eax > > + ror r14d,11 > > + xor r15d,edx > > + add ebx,r12d > > + ror r13d,6 > > + paddd xmm1,xmm7 > > + and edi,r15d > > + xor r14d,ecx > > + add ebx,r13d > > + pshufd xmm7,xmm1,80 > > + xor edi,edx > > + ror r14d,2 > > + add r9d,ebx > > + movdqa xmm6,xmm7 > > + add ebx,edi > > + mov r13d,r9d > > + psrld xmm7,10 > > + add r14d,ebx > > + ror r13d,14 > > + psrlq xmm6,17 > > + mov ebx,r14d > > + mov r12d,r10d > > + pxor xmm7,xmm6 > > + ror r14d,9 > > + xor r13d,r9d > > + xor r12d,r11d > > + ror r13d,5 > > + xor r14d,ebx > > + psrlq xmm6,2 > > + and r12d,r9d > > + xor r13d,r9d > > + add eax,DWORD[28+rsp] > > + pxor xmm7,xmm6 > > + mov edi,ebx > > + xor r12d,r11d > > + ror r14d,11 > > + pshufd xmm7,xmm7,8 > > + xor edi,ecx > > + add eax,r12d > > + movdqa xmm6,XMMWORD[32+rbp] > > + ror r13d,6 > > + and r15d,edi > > + pslldq xmm7,8 > > + xor r14d,ebx > > + add eax,r13d > > + xor r15d,ecx > > + paddd xmm1,xmm7 > > + ror r14d,2 > > + add r8d,eax > > + add eax,r15d > > + paddd xmm6,xmm1 > > + mov r13d,r8d > > + add r14d,eax > > + movdqa XMMWORD[16+rsp],xmm6 > > + ror r13d,14 > > + movdqa xmm4,xmm3 > > + mov eax,r14d > > + mov r12d,r9d > > + movdqa xmm7,xmm1 > > + ror r14d,9 > > + xor r13d,r8d > > + xor r12d,r10d > > + ror r13d,5 > > + xor r14d,eax > > +DB 102,15,58,15,226,4 > > + and r12d,r8d > > + xor r13d,r8d > > +DB 102,15,58,15,248,4 > > + add r11d,DWORD[32+rsp] > > + mov r15d,eax > > + xor r12d,r10d > > + ror r14d,11 > > + movdqa xmm5,xmm4 > > + xor r15d,ebx > > + add r11d,r12d > > + movdqa xmm6,xmm4 > > + ror r13d,6 > > + and edi,r15d > > + psrld xmm4,3 > > + xor r14d,eax > > + add r11d,r13d > > + xor edi,ebx > > + paddd xmm2,xmm7 > > + ror r14d,2 > > + add edx,r11d > > + psrld xmm6,7 > > + add r11d,edi > > + mov r13d,edx > > + pshufd xmm7,xmm1,250 > > + add r14d,r11d > > + ror r13d,14 > > + pslld xmm5,14 > > + mov r11d,r14d > > + mov r12d,r8d > > + pxor xmm4,xmm6 > > + ror r14d,9 > > + xor r13d,edx > > + xor r12d,r9d > > + ror r13d,5 > > + psrld xmm6,11 > > + xor r14d,r11d > > + pxor xmm4,xmm5 > > + and r12d,edx > > + xor r13d,edx > > + pslld xmm5,11 > > + add r10d,DWORD[36+rsp] > > + mov edi,r11d > > + pxor xmm4,xmm6 > > + xor r12d,r9d > > + ror r14d,11 > > + movdqa xmm6,xmm7 > > + xor edi,eax > > + add r10d,r12d > > + pxor xmm4,xmm5 > > + ror r13d,6 > > + and r15d,edi > > + xor r14d,r11d > > + psrld xmm7,10 > > + add r10d,r13d > > + xor r15d,eax > > + paddd xmm2,xmm4 > > + ror r14d,2 > > + add ecx,r10d > > + psrlq xmm6,17 > > + add r10d,r15d > > + mov r13d,ecx > > + add r14d,r10d > > + pxor xmm7,xmm6 > > + ror r13d,14 > > + mov r10d,r14d > > + mov r12d,edx > > + ror r14d,9 > > + psrlq xmm6,2 > > + xor r13d,ecx > > + xor r12d,r8d > > + pxor xmm7,xmm6 > > + ror r13d,5 > > + xor r14d,r10d > > + and r12d,ecx > > + pshufd xmm7,xmm7,128 > > + xor r13d,ecx > > + add r9d,DWORD[40+rsp] > > + mov r15d,r10d > > + psrldq xmm7,8 > > + xor r12d,r8d > > + ror r14d,11 > > + xor r15d,r11d > > + add r9d,r12d > > + ror r13d,6 > > + paddd xmm2,xmm7 > > + and edi,r15d > > + xor r14d,r10d > > + add r9d,r13d > > + pshufd xmm7,xmm2,80 > > + xor edi,r11d > > + ror r14d,2 > > + add ebx,r9d > > + movdqa xmm6,xmm7 > > + add r9d,edi > > + mov r13d,ebx > > + psrld xmm7,10 > > + add r14d,r9d > > + ror r13d,14 > > + psrlq xmm6,17 > > + mov r9d,r14d > > + mov r12d,ecx > > + pxor xmm7,xmm6 > > + ror r14d,9 > > + xor r13d,ebx > > + xor r12d,edx > > + ror r13d,5 > > + xor r14d,r9d > > + psrlq xmm6,2 > > + and r12d,ebx > > + xor r13d,ebx > > + add r8d,DWORD[44+rsp] > > + pxor 
xmm7,xmm6 > > + mov edi,r9d > > + xor r12d,edx > > + ror r14d,11 > > + pshufd xmm7,xmm7,8 > > + xor edi,r10d > > + add r8d,r12d > > + movdqa xmm6,XMMWORD[64+rbp] > > + ror r13d,6 > > + and r15d,edi > > + pslldq xmm7,8 > > + xor r14d,r9d > > + add r8d,r13d > > + xor r15d,r10d > > + paddd xmm2,xmm7 > > + ror r14d,2 > > + add eax,r8d > > + add r8d,r15d > > + paddd xmm6,xmm2 > > + mov r13d,eax > > + add r14d,r8d > > + movdqa XMMWORD[32+rsp],xmm6 > > + ror r13d,14 > > + movdqa xmm4,xmm0 > > + mov r8d,r14d > > + mov r12d,ebx > > + movdqa xmm7,xmm2 > > + ror r14d,9 > > + xor r13d,eax > > + xor r12d,ecx > > + ror r13d,5 > > + xor r14d,r8d > > +DB 102,15,58,15,227,4 > > + and r12d,eax > > + xor r13d,eax > > +DB 102,15,58,15,249,4 > > + add edx,DWORD[48+rsp] > > + mov r15d,r8d > > + xor r12d,ecx > > + ror r14d,11 > > + movdqa xmm5,xmm4 > > + xor r15d,r9d > > + add edx,r12d > > + movdqa xmm6,xmm4 > > + ror r13d,6 > > + and edi,r15d > > + psrld xmm4,3 > > + xor r14d,r8d > > + add edx,r13d > > + xor edi,r9d > > + paddd xmm3,xmm7 > > + ror r14d,2 > > + add r11d,edx > > + psrld xmm6,7 > > + add edx,edi > > + mov r13d,r11d > > + pshufd xmm7,xmm2,250 > > + add r14d,edx > > + ror r13d,14 > > + pslld xmm5,14 > > + mov edx,r14d > > + mov r12d,eax > > + pxor xmm4,xmm6 > > + ror r14d,9 > > + xor r13d,r11d > > + xor r12d,ebx > > + ror r13d,5 > > + psrld xmm6,11 > > + xor r14d,edx > > + pxor xmm4,xmm5 > > + and r12d,r11d > > + xor r13d,r11d > > + pslld xmm5,11 > > + add ecx,DWORD[52+rsp] > > + mov edi,edx > > + pxor xmm4,xmm6 > > + xor r12d,ebx > > + ror r14d,11 > > + movdqa xmm6,xmm7 > > + xor edi,r8d > > + add ecx,r12d > > + pxor xmm4,xmm5 > > + ror r13d,6 > > + and r15d,edi > > + xor r14d,edx > > + psrld xmm7,10 > > + add ecx,r13d > > + xor r15d,r8d > > + paddd xmm3,xmm4 > > + ror r14d,2 > > + add r10d,ecx > > + psrlq xmm6,17 > > + add ecx,r15d > > + mov r13d,r10d > > + add r14d,ecx > > + pxor xmm7,xmm6 > > + ror r13d,14 > > + mov ecx,r14d > > + mov r12d,r11d > > + ror r14d,9 > > + psrlq xmm6,2 > > + xor r13d,r10d > > + xor r12d,eax > > + pxor xmm7,xmm6 > > + ror r13d,5 > > + xor r14d,ecx > > + and r12d,r10d > > + pshufd xmm7,xmm7,128 > > + xor r13d,r10d > > + add ebx,DWORD[56+rsp] > > + mov r15d,ecx > > + psrldq xmm7,8 > > + xor r12d,eax > > + ror r14d,11 > > + xor r15d,edx > > + add ebx,r12d > > + ror r13d,6 > > + paddd xmm3,xmm7 > > + and edi,r15d > > + xor r14d,ecx > > + add ebx,r13d > > + pshufd xmm7,xmm3,80 > > + xor edi,edx > > + ror r14d,2 > > + add r9d,ebx > > + movdqa xmm6,xmm7 > > + add ebx,edi > > + mov r13d,r9d > > + psrld xmm7,10 > > + add r14d,ebx > > + ror r13d,14 > > + psrlq xmm6,17 > > + mov ebx,r14d > > + mov r12d,r10d > > + pxor xmm7,xmm6 > > + ror r14d,9 > > + xor r13d,r9d > > + xor r12d,r11d > > + ror r13d,5 > > + xor r14d,ebx > > + psrlq xmm6,2 > > + and r12d,r9d > > + xor r13d,r9d > > + add eax,DWORD[60+rsp] > > + pxor xmm7,xmm6 > > + mov edi,ebx > > + xor r12d,r11d > > + ror r14d,11 > > + pshufd xmm7,xmm7,8 > > + xor edi,ecx > > + add eax,r12d > > + movdqa xmm6,XMMWORD[96+rbp] > > + ror r13d,6 > > + and r15d,edi > > + pslldq xmm7,8 > > + xor r14d,ebx > > + add eax,r13d > > + xor r15d,ecx > > + paddd xmm3,xmm7 > > + ror r14d,2 > > + add r8d,eax > > + add eax,r15d > > + paddd xmm6,xmm3 > > + mov r13d,r8d > > + add r14d,eax > > + movdqa XMMWORD[48+rsp],xmm6 > > + cmp BYTE[131+rbp],0 > > + jne NEAR $L$ssse3_00_47 > > + ror r13d,14 > > + mov eax,r14d > > + mov r12d,r9d > > + ror r14d,9 > > + xor r13d,r8d > > + xor r12d,r10d > > + ror r13d,5 > > + xor r14d,eax > > + and r12d,r8d > > + xor 
r13d,r8d > > + add r11d,DWORD[rsp] > > + mov r15d,eax > > + xor r12d,r10d > > + ror r14d,11 > > + xor r15d,ebx > > + add r11d,r12d > > + ror r13d,6 > > + and edi,r15d > > + xor r14d,eax > > + add r11d,r13d > > + xor edi,ebx > > + ror r14d,2 > > + add edx,r11d > > + add r11d,edi > > + mov r13d,edx > > + add r14d,r11d > > + ror r13d,14 > > + mov r11d,r14d > > + mov r12d,r8d > > + ror r14d,9 > > + xor r13d,edx > > + xor r12d,r9d > > + ror r13d,5 > > + xor r14d,r11d > > + and r12d,edx > > + xor r13d,edx > > + add r10d,DWORD[4+rsp] > > + mov edi,r11d > > + xor r12d,r9d > > + ror r14d,11 > > + xor edi,eax > > + add r10d,r12d > > + ror r13d,6 > > + and r15d,edi > > + xor r14d,r11d > > + add r10d,r13d > > + xor r15d,eax > > + ror r14d,2 > > + add ecx,r10d > > + add r10d,r15d > > + mov r13d,ecx > > + add r14d,r10d > > + ror r13d,14 > > + mov r10d,r14d > > + mov r12d,edx > > + ror r14d,9 > > + xor r13d,ecx > > + xor r12d,r8d > > + ror r13d,5 > > + xor r14d,r10d > > + and r12d,ecx > > + xor r13d,ecx > > + add r9d,DWORD[8+rsp] > > + mov r15d,r10d > > + xor r12d,r8d > > + ror r14d,11 > > + xor r15d,r11d > > + add r9d,r12d > > + ror r13d,6 > > + and edi,r15d > > + xor r14d,r10d > > + add r9d,r13d > > + xor edi,r11d > > + ror r14d,2 > > + add ebx,r9d > > + add r9d,edi > > + mov r13d,ebx > > + add r14d,r9d > > + ror r13d,14 > > + mov r9d,r14d > > + mov r12d,ecx > > + ror r14d,9 > > + xor r13d,ebx > > + xor r12d,edx > > + ror r13d,5 > > + xor r14d,r9d > > + and r12d,ebx > > + xor r13d,ebx > > + add r8d,DWORD[12+rsp] > > + mov edi,r9d > > + xor r12d,edx > > + ror r14d,11 > > + xor edi,r10d > > + add r8d,r12d > > + ror r13d,6 > > + and r15d,edi > > + xor r14d,r9d > > + add r8d,r13d > > + xor r15d,r10d > > + ror r14d,2 > > + add eax,r8d > > + add r8d,r15d > > + mov r13d,eax > > + add r14d,r8d > > + ror r13d,14 > > + mov r8d,r14d > > + mov r12d,ebx > > + ror r14d,9 > > + xor r13d,eax > > + xor r12d,ecx > > + ror r13d,5 > > + xor r14d,r8d > > + and r12d,eax > > + xor r13d,eax > > + add edx,DWORD[16+rsp] > > + mov r15d,r8d > > + xor r12d,ecx > > + ror r14d,11 > > + xor r15d,r9d > > + add edx,r12d > > + ror r13d,6 > > + and edi,r15d > > + xor r14d,r8d > > + add edx,r13d > > + xor edi,r9d > > + ror r14d,2 > > + add r11d,edx > > + add edx,edi > > + mov r13d,r11d > > + add r14d,edx > > + ror r13d,14 > > + mov edx,r14d > > + mov r12d,eax > > + ror r14d,9 > > + xor r13d,r11d > > + xor r12d,ebx > > + ror r13d,5 > > + xor r14d,edx > > + and r12d,r11d > > + xor r13d,r11d > > + add ecx,DWORD[20+rsp] > > + mov edi,edx > > + xor r12d,ebx > > + ror r14d,11 > > + xor edi,r8d > > + add ecx,r12d > > + ror r13d,6 > > + and r15d,edi > > + xor r14d,edx > > + add ecx,r13d > > + xor r15d,r8d > > + ror r14d,2 > > + add r10d,ecx > > + add ecx,r15d > > + mov r13d,r10d > > + add r14d,ecx > > + ror r13d,14 > > + mov ecx,r14d > > + mov r12d,r11d > > + ror r14d,9 > > + xor r13d,r10d > > + xor r12d,eax > > + ror r13d,5 > > + xor r14d,ecx > > + and r12d,r10d > > + xor r13d,r10d > > + add ebx,DWORD[24+rsp] > > + mov r15d,ecx > > + xor r12d,eax > > + ror r14d,11 > > + xor r15d,edx > > + add ebx,r12d > > + ror r13d,6 > > + and edi,r15d > > + xor r14d,ecx > > + add ebx,r13d > > + xor edi,edx > > + ror r14d,2 > > + add r9d,ebx > > + add ebx,edi > > + mov r13d,r9d > > + add r14d,ebx > > + ror r13d,14 > > + mov ebx,r14d > > + mov r12d,r10d > > + ror r14d,9 > > + xor r13d,r9d > > + xor r12d,r11d > > + ror r13d,5 > > + xor r14d,ebx > > + and r12d,r9d > > + xor r13d,r9d > > + add eax,DWORD[28+rsp] > > + mov edi,ebx > > + xor r12d,r11d > > + ror 
r14d,11 > > + xor edi,ecx > > + add eax,r12d > > + ror r13d,6 > > + and r15d,edi > > + xor r14d,ebx > > + add eax,r13d > > + xor r15d,ecx > > + ror r14d,2 > > + add r8d,eax > > + add eax,r15d > > + mov r13d,r8d > > + add r14d,eax > > + ror r13d,14 > > + mov eax,r14d > > + mov r12d,r9d > > + ror r14d,9 > > + xor r13d,r8d > > + xor r12d,r10d > > + ror r13d,5 > > + xor r14d,eax > > + and r12d,r8d > > + xor r13d,r8d > > + add r11d,DWORD[32+rsp] > > + mov r15d,eax > > + xor r12d,r10d > > + ror r14d,11 > > + xor r15d,ebx > > + add r11d,r12d > > + ror r13d,6 > > + and edi,r15d > > + xor r14d,eax > > + add r11d,r13d > > + xor edi,ebx > > + ror r14d,2 > > + add edx,r11d > > + add r11d,edi > > + mov r13d,edx > > + add r14d,r11d > > + ror r13d,14 > > + mov r11d,r14d > > + mov r12d,r8d > > + ror r14d,9 > > + xor r13d,edx > > + xor r12d,r9d > > + ror r13d,5 > > + xor r14d,r11d > > + and r12d,edx > > + xor r13d,edx > > + add r10d,DWORD[36+rsp] > > + mov edi,r11d > > + xor r12d,r9d > > + ror r14d,11 > > + xor edi,eax > > + add r10d,r12d > > + ror r13d,6 > > + and r15d,edi > > + xor r14d,r11d > > + add r10d,r13d > > + xor r15d,eax > > + ror r14d,2 > > + add ecx,r10d > > + add r10d,r15d > > + mov r13d,ecx > > + add r14d,r10d > > + ror r13d,14 > > + mov r10d,r14d > > + mov r12d,edx > > + ror r14d,9 > > + xor r13d,ecx > > + xor r12d,r8d > > + ror r13d,5 > > + xor r14d,r10d > > + and r12d,ecx > > + xor r13d,ecx > > + add r9d,DWORD[40+rsp] > > + mov r15d,r10d > > + xor r12d,r8d > > + ror r14d,11 > > + xor r15d,r11d > > + add r9d,r12d > > + ror r13d,6 > > + and edi,r15d > > + xor r14d,r10d > > + add r9d,r13d > > + xor edi,r11d > > + ror r14d,2 > > + add ebx,r9d > > + add r9d,edi > > + mov r13d,ebx > > + add r14d,r9d > > + ror r13d,14 > > + mov r9d,r14d > > + mov r12d,ecx > > + ror r14d,9 > > + xor r13d,ebx > > + xor r12d,edx > > + ror r13d,5 > > + xor r14d,r9d > > + and r12d,ebx > > + xor r13d,ebx > > + add r8d,DWORD[44+rsp] > > + mov edi,r9d > > + xor r12d,edx > > + ror r14d,11 > > + xor edi,r10d > > + add r8d,r12d > > + ror r13d,6 > > + and r15d,edi > > + xor r14d,r9d > > + add r8d,r13d > > + xor r15d,r10d > > + ror r14d,2 > > + add eax,r8d > > + add r8d,r15d > > + mov r13d,eax > > + add r14d,r8d > > + ror r13d,14 > > + mov r8d,r14d > > + mov r12d,ebx > > + ror r14d,9 > > + xor r13d,eax > > + xor r12d,ecx > > + ror r13d,5 > > + xor r14d,r8d > > + and r12d,eax > > + xor r13d,eax > > + add edx,DWORD[48+rsp] > > + mov r15d,r8d > > + xor r12d,ecx > > + ror r14d,11 > > + xor r15d,r9d > > + add edx,r12d > > + ror r13d,6 > > + and edi,r15d > > + xor r14d,r8d > > + add edx,r13d > > + xor edi,r9d > > + ror r14d,2 > > + add r11d,edx > > + add edx,edi > > + mov r13d,r11d > > + add r14d,edx > > + ror r13d,14 > > + mov edx,r14d > > + mov r12d,eax > > + ror r14d,9 > > + xor r13d,r11d > > + xor r12d,ebx > > + ror r13d,5 > > + xor r14d,edx > > + and r12d,r11d > > + xor r13d,r11d > > + add ecx,DWORD[52+rsp] > > + mov edi,edx > > + xor r12d,ebx > > + ror r14d,11 > > + xor edi,r8d > > + add ecx,r12d > > + ror r13d,6 > > + and r15d,edi > > + xor r14d,edx > > + add ecx,r13d > > + xor r15d,r8d > > + ror r14d,2 > > + add r10d,ecx > > + add ecx,r15d > > + mov r13d,r10d > > + add r14d,ecx > > + ror r13d,14 > > + mov ecx,r14d > > + mov r12d,r11d > > + ror r14d,9 > > + xor r13d,r10d > > + xor r12d,eax > > + ror r13d,5 > > + xor r14d,ecx > > + and r12d,r10d > > + xor r13d,r10d > > + add ebx,DWORD[56+rsp] > > + mov r15d,ecx > > + xor r12d,eax > > + ror r14d,11 > > + xor r15d,edx > > + add ebx,r12d > > + ror r13d,6 > > + and edi,r15d > > 
+ xor r14d,ecx > > + add ebx,r13d > > + xor edi,edx > > + ror r14d,2 > > + add r9d,ebx > > + add ebx,edi > > + mov r13d,r9d > > + add r14d,ebx > > + ror r13d,14 > > + mov ebx,r14d > > + mov r12d,r10d > > + ror r14d,9 > > + xor r13d,r9d > > + xor r12d,r11d > > + ror r13d,5 > > + xor r14d,ebx > > + and r12d,r9d > > + xor r13d,r9d > > + add eax,DWORD[60+rsp] > > + mov edi,ebx > > + xor r12d,r11d > > + ror r14d,11 > > + xor edi,ecx > > + add eax,r12d > > + ror r13d,6 > > + and r15d,edi > > + xor r14d,ebx > > + add eax,r13d > > + xor r15d,ecx > > + ror r14d,2 > > + add r8d,eax > > + add eax,r15d > > + mov r13d,r8d > > + add r14d,eax > > + mov rdi,QWORD[((64+0))+rsp] > > + mov eax,r14d > > + > > + add eax,DWORD[rdi] > > + lea rsi,[64+rsi] > > + add ebx,DWORD[4+rdi] > > + add ecx,DWORD[8+rdi] > > + add edx,DWORD[12+rdi] > > + add r8d,DWORD[16+rdi] > > + add r9d,DWORD[20+rdi] > > + add r10d,DWORD[24+rdi] > > + add r11d,DWORD[28+rdi] > > + > > + cmp rsi,QWORD[((64+16))+rsp] > > + > > + mov DWORD[rdi],eax > > + mov DWORD[4+rdi],ebx > > + mov DWORD[8+rdi],ecx > > + mov DWORD[12+rdi],edx > > + mov DWORD[16+rdi],r8d > > + mov DWORD[20+rdi],r9d > > + mov DWORD[24+rdi],r10d > > + mov DWORD[28+rdi],r11d > > + jb NEAR $L$loop_ssse3 > > + > > + mov rsi,QWORD[88+rsp] > > + > > + movaps xmm6,XMMWORD[((64+32))+rsp] > > + movaps xmm7,XMMWORD[((64+48))+rsp] > > + movaps xmm8,XMMWORD[((64+64))+rsp] > > + movaps xmm9,XMMWORD[((64+80))+rsp] > > + mov r15,QWORD[((-48))+rsi] > > + > > + mov r14,QWORD[((-40))+rsi] > > + > > + mov r13,QWORD[((-32))+rsi] > > + > > + mov r12,QWORD[((-24))+rsi] > > + > > + mov rbp,QWORD[((-16))+rsi] > > + > > + mov rbx,QWORD[((-8))+rsi] > > + > > + lea rsp,[rsi] > > + > > +$L$epilogue_ssse3: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_sha256_block_data_order_ssse3: > > +EXTERN __imp_RtlVirtualUnwind > > + > > +ALIGN 16 > > +se_handler: > > + push rsi > > + push rdi > > + push rbx > > + push rbp > > + push r12 > > + push r13 > > + push r14 > > + push r15 > > + pushfq > > + sub rsp,64 > > + > > + mov rax,QWORD[120+r8] > > + mov rbx,QWORD[248+r8] > > + > > + mov rsi,QWORD[8+r9] > > + mov r11,QWORD[56+r9] > > + > > + mov r10d,DWORD[r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jb NEAR $L$in_prologue > > + > > + mov rax,QWORD[152+r8] > > + > > + mov r10d,DWORD[4+r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jae NEAR $L$in_prologue > > + mov rsi,rax > > + mov rax,QWORD[((64+24))+rax] > > + > > + mov rbx,QWORD[((-8))+rax] > > + mov rbp,QWORD[((-16))+rax] > > + mov r12,QWORD[((-24))+rax] > > + mov r13,QWORD[((-32))+rax] > > + mov r14,QWORD[((-40))+rax] > > + mov r15,QWORD[((-48))+rax] > > + mov QWORD[144+r8],rbx > > + mov QWORD[160+r8],rbp > > + mov QWORD[216+r8],r12 > > + mov QWORD[224+r8],r13 > > + mov QWORD[232+r8],r14 > > + mov QWORD[240+r8],r15 > > + > > + lea r10,[$L$epilogue] > > + cmp rbx,r10 > > + jb NEAR $L$in_prologue > > + > > + lea rsi,[((64+32))+rsi] > > + lea rdi,[512+r8] > > + mov ecx,8 > > + DD 0xa548f3fc > > + > > +$L$in_prologue: > > + mov rdi,QWORD[8+rax] > > + mov rsi,QWORD[16+rax] > > + mov QWORD[152+r8],rax > > + mov QWORD[168+r8],rsi > > + mov QWORD[176+r8],rdi > > + > > + mov rdi,QWORD[40+r9] > > + mov rsi,r8 > > + mov ecx,154 > > + DD 0xa548f3fc > > + > > + mov rsi,r9 > > + xor rcx,rcx > > + mov rdx,QWORD[8+rsi] > > + mov r8,QWORD[rsi] > > + mov r9,QWORD[16+rsi] > > + mov r10,QWORD[40+rsi] > > + lea r11,[56+rsi] > > + lea r12,[24+rsi] > > + mov QWORD[32+rsp],r10 > > + mov 
QWORD[40+rsp],r11 > > + mov QWORD[48+rsp],r12 > > + mov QWORD[56+rsp],rcx > > + call QWORD[__imp_RtlVirtualUnwind] > > + > > + mov eax,1 > > + add rsp,64 > > + popfq > > + pop r15 > > + pop r14 > > + pop r13 > > + pop r12 > > + pop rbp > > + pop rbx > > + pop rdi > > + pop rsi > > + DB 0F3h,0C3h ;repret > > + > > + > > +ALIGN 16 > > +shaext_handler: > > + push rsi > > + push rdi > > + push rbx > > + push rbp > > + push r12 > > + push r13 > > + push r14 > > + push r15 > > + pushfq > > + sub rsp,64 > > + > > + mov rax,QWORD[120+r8] > > + mov rbx,QWORD[248+r8] > > + > > + lea r10,[$L$prologue_shaext] > > + cmp rbx,r10 > > + jb NEAR $L$in_prologue > > + > > + lea r10,[$L$epilogue_shaext] > > + cmp rbx,r10 > > + jae NEAR $L$in_prologue > > + > > + lea rsi,[((-8-80))+rax] > > + lea rdi,[512+r8] > > + mov ecx,10 > > + DD 0xa548f3fc > > + > > + jmp NEAR $L$in_prologue > > + > > +section .pdata rdata align=4 > > +ALIGN 4 > > + DD $L$SEH_begin_sha256_block_data_order > wrt ..imagebase > > + DD $L$SEH_end_sha256_block_data_order > wrt ..imagebase > > + DD $L$SEH_info_sha256_block_data_order > wrt ..imagebase > > + DD $L$SEH_begin_sha256_block_data_order_shaext > wrt ..imagebase > > + DD $L$SEH_end_sha256_block_data_order_shaext > wrt ..imagebase > > + DD $L$SEH_info_sha256_block_data_order_shaext > wrt ..imagebase > > + DD $L$SEH_begin_sha256_block_data_order_ssse3 > wrt ..imagebase > > + DD $L$SEH_end_sha256_block_data_order_ssse3 > wrt ..imagebase > > + DD $L$SEH_info_sha256_block_data_order_ssse3 > wrt ..imagebase > > +section .xdata rdata align=8 > > +ALIGN 8 > > +$L$SEH_info_sha256_block_data_order: > > +DB 9,0,0,0 > > + DD se_handler wrt ..imagebase > > + DD $L$prologue wrt ..imagebase,$L$epilogue > wrt ..imagebase > > +$L$SEH_info_sha256_block_data_order_shaext: > > +DB 9,0,0,0 > > + DD shaext_handler wrt ..imagebase > > +$L$SEH_info_sha256_block_data_order_ssse3: > > +DB 9,0,0,0 > > + DD se_handler wrt ..imagebase > > + DD $L$prologue_ssse3 > wrt ..imagebase,$L$epilogue_ssse3 > > wrt ..imagebase > > diff --git > a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm > > b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm > > new file mode 100644 > > index 0000000000..c6397d4393 > > --- /dev/null > > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm > > @@ -0,0 +1,1938 @@ > > +; WARNING: do not edit! > > +; Generated from openssl/crypto/sha/asm/sha512-x86_64.pl > > +; > > +; Copyright 2005-2020 The OpenSSL Project Authors. All Rights Reserved. > > +; > > +; Licensed under the OpenSSL license (the "License"). You may not use > > +; this file except in compliance with the License. 
You can obtain a copy > > +; in the file LICENSE in the source distribution or at > > +; https://www.openssl.org/source/license.html > > + > > +default rel > > +%define XMMWORD > > +%define YMMWORD > > +%define ZMMWORD > > +section .text code align=64 > > + > > + > > +EXTERN OPENSSL_ia32cap_P > > +global sha512_block_data_order > > + > > +ALIGN 16 > > +sha512_block_data_order: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_sha512_block_data_order: > > + mov rdi,rcx > > + mov rsi,rdx > > + mov rdx,r8 > > + > > + > > + > > + mov rax,rsp > > + > > + push rbx > > + > > + push rbp > > + > > + push r12 > > + > > + push r13 > > + > > + push r14 > > + > > + push r15 > > + > > + shl rdx,4 > > + sub rsp,16*8+4*8 > > + lea rdx,[rdx*8+rsi] > > + and rsp,-64 > > + mov QWORD[((128+0))+rsp],rdi > > + mov QWORD[((128+8))+rsp],rsi > > + mov QWORD[((128+16))+rsp],rdx > > + mov QWORD[152+rsp],rax > > + > > +$L$prologue: > > + > > + mov rax,QWORD[rdi] > > + mov rbx,QWORD[8+rdi] > > + mov rcx,QWORD[16+rdi] > > + mov rdx,QWORD[24+rdi] > > + mov r8,QWORD[32+rdi] > > + mov r9,QWORD[40+rdi] > > + mov r10,QWORD[48+rdi] > > + mov r11,QWORD[56+rdi] > > + jmp NEAR $L$loop > > + > > +ALIGN 16 > > +$L$loop: > > + mov rdi,rbx > > + lea rbp,[K512] > > + xor rdi,rcx > > + mov r12,QWORD[rsi] > > + mov r13,r8 > > + mov r14,rax > > + bswap r12 > > + ror r13,23 > > + mov r15,r9 > > + > > + xor r13,r8 > > + ror r14,5 > > + xor r15,r10 > > + > > + mov QWORD[rsp],r12 > > + xor r14,rax > > + and r15,r8 > > + > > + ror r13,4 > > + add r12,r11 > > + xor r15,r10 > > + > > + ror r14,6 > > + xor r13,r8 > > + add r12,r15 > > + > > + mov r15,rax > > + add r12,QWORD[rbp] > > + xor r14,rax > > + > > + xor r15,rbx > > + ror r13,14 > > + mov r11,rbx > > + > > + and rdi,r15 > > + ror r14,28 > > + add r12,r13 > > + > > + xor r11,rdi > > + add rdx,r12 > > + add r11,r12 > > + > > + lea rbp,[8+rbp] > > + add r11,r14 > > + mov r12,QWORD[8+rsi] > > + mov r13,rdx > > + mov r14,r11 > > + bswap r12 > > + ror r13,23 > > + mov rdi,r8 > > + > > + xor r13,rdx > > + ror r14,5 > > + xor rdi,r9 > > + > > + mov QWORD[8+rsp],r12 > > + xor r14,r11 > > + and rdi,rdx > > + > > + ror r13,4 > > + add r12,r10 > > + xor rdi,r9 > > + > > + ror r14,6 > > + xor r13,rdx > > + add r12,rdi > > + > > + mov rdi,r11 > > + add r12,QWORD[rbp] > > + xor r14,r11 > > + > > + xor rdi,rax > > + ror r13,14 > > + mov r10,rax > > + > > + and r15,rdi > > + ror r14,28 > > + add r12,r13 > > + > > + xor r10,r15 > > + add rcx,r12 > > + add r10,r12 > > + > > + lea rbp,[24+rbp] > > + add r10,r14 > > + mov r12,QWORD[16+rsi] > > + mov r13,rcx > > + mov r14,r10 > > + bswap r12 > > + ror r13,23 > > + mov r15,rdx > > + > > + xor r13,rcx > > + ror r14,5 > > + xor r15,r8 > > + > > + mov QWORD[16+rsp],r12 > > + xor r14,r10 > > + and r15,rcx > > + > > + ror r13,4 > > + add r12,r9 > > + xor r15,r8 > > + > > + ror r14,6 > > + xor r13,rcx > > + add r12,r15 > > + > > + mov r15,r10 > > + add r12,QWORD[rbp] > > + xor r14,r10 > > + > > + xor r15,r11 > > + ror r13,14 > > + mov r9,r11 > > + > > + and rdi,r15 > > + ror r14,28 > > + add r12,r13 > > + > > + xor r9,rdi > > + add rbx,r12 > > + add r9,r12 > > + > > + lea rbp,[8+rbp] > > + add r9,r14 > > + mov r12,QWORD[24+rsi] > > + mov r13,rbx > > + mov r14,r9 > > + bswap r12 > > + ror r13,23 > > + mov rdi,rcx > > + > > + xor r13,rbx > > + ror r14,5 > > + xor rdi,rdx > > + > > + mov QWORD[24+rsp],r12 > > + xor r14,r9 > > + and rdi,rbx > > + > > + ror r13,4 > > + add r12,r8 > > + xor rdi,rdx > > + > > + ror 
r14,6 > > + xor r13,rbx > > + add r12,rdi > > + > > + mov rdi,r9 > > + add r12,QWORD[rbp] > > + xor r14,r9 > > + > > + xor rdi,r10 > > + ror r13,14 > > + mov r8,r10 > > + > > + and r15,rdi > > + ror r14,28 > > + add r12,r13 > > + > > + xor r8,r15 > > + add rax,r12 > > + add r8,r12 > > + > > + lea rbp,[24+rbp] > > + add r8,r14 > > + mov r12,QWORD[32+rsi] > > + mov r13,rax > > + mov r14,r8 > > + bswap r12 > > + ror r13,23 > > + mov r15,rbx > > + > > + xor r13,rax > > + ror r14,5 > > + xor r15,rcx > > + > > + mov QWORD[32+rsp],r12 > > + xor r14,r8 > > + and r15,rax > > + > > + ror r13,4 > > + add r12,rdx > > + xor r15,rcx > > + > > + ror r14,6 > > + xor r13,rax > > + add r12,r15 > > + > > + mov r15,r8 > > + add r12,QWORD[rbp] > > + xor r14,r8 > > + > > + xor r15,r9 > > + ror r13,14 > > + mov rdx,r9 > > + > > + and rdi,r15 > > + ror r14,28 > > + add r12,r13 > > + > > + xor rdx,rdi > > + add r11,r12 > > + add rdx,r12 > > + > > + lea rbp,[8+rbp] > > + add rdx,r14 > > + mov r12,QWORD[40+rsi] > > + mov r13,r11 > > + mov r14,rdx > > + bswap r12 > > + ror r13,23 > > + mov rdi,rax > > + > > + xor r13,r11 > > + ror r14,5 > > + xor rdi,rbx > > + > > + mov QWORD[40+rsp],r12 > > + xor r14,rdx > > + and rdi,r11 > > + > > + ror r13,4 > > + add r12,rcx > > + xor rdi,rbx > > + > > + ror r14,6 > > + xor r13,r11 > > + add r12,rdi > > + > > + mov rdi,rdx > > + add r12,QWORD[rbp] > > + xor r14,rdx > > + > > + xor rdi,r8 > > + ror r13,14 > > + mov rcx,r8 > > + > > + and r15,rdi > > + ror r14,28 > > + add r12,r13 > > + > > + xor rcx,r15 > > + add r10,r12 > > + add rcx,r12 > > + > > + lea rbp,[24+rbp] > > + add rcx,r14 > > + mov r12,QWORD[48+rsi] > > + mov r13,r10 > > + mov r14,rcx > > + bswap r12 > > + ror r13,23 > > + mov r15,r11 > > + > > + xor r13,r10 > > + ror r14,5 > > + xor r15,rax > > + > > + mov QWORD[48+rsp],r12 > > + xor r14,rcx > > + and r15,r10 > > + > > + ror r13,4 > > + add r12,rbx > > + xor r15,rax > > + > > + ror r14,6 > > + xor r13,r10 > > + add r12,r15 > > + > > + mov r15,rcx > > + add r12,QWORD[rbp] > > + xor r14,rcx > > + > > + xor r15,rdx > > + ror r13,14 > > + mov rbx,rdx > > + > > + and rdi,r15 > > + ror r14,28 > > + add r12,r13 > > + > > + xor rbx,rdi > > + add r9,r12 > > + add rbx,r12 > > + > > + lea rbp,[8+rbp] > > + add rbx,r14 > > + mov r12,QWORD[56+rsi] > > + mov r13,r9 > > + mov r14,rbx > > + bswap r12 > > + ror r13,23 > > + mov rdi,r10 > > + > > + xor r13,r9 > > + ror r14,5 > > + xor rdi,r11 > > + > > + mov QWORD[56+rsp],r12 > > + xor r14,rbx > > + and rdi,r9 > > + > > + ror r13,4 > > + add r12,rax > > + xor rdi,r11 > > + > > + ror r14,6 > > + xor r13,r9 > > + add r12,rdi > > + > > + mov rdi,rbx > > + add r12,QWORD[rbp] > > + xor r14,rbx > > + > > + xor rdi,rcx > > + ror r13,14 > > + mov rax,rcx > > + > > + and r15,rdi > > + ror r14,28 > > + add r12,r13 > > + > > + xor rax,r15 > > + add r8,r12 > > + add rax,r12 > > + > > + lea rbp,[24+rbp] > > + add rax,r14 > > + mov r12,QWORD[64+rsi] > > + mov r13,r8 > > + mov r14,rax > > + bswap r12 > > + ror r13,23 > > + mov r15,r9 > > + > > + xor r13,r8 > > + ror r14,5 > > + xor r15,r10 > > + > > + mov QWORD[64+rsp],r12 > > + xor r14,rax > > + and r15,r8 > > + > > + ror r13,4 > > + add r12,r11 > > + xor r15,r10 > > + > > + ror r14,6 > > + xor r13,r8 > > + add r12,r15 > > + > > + mov r15,rax > > + add r12,QWORD[rbp] > > + xor r14,rax > > + > > + xor r15,rbx > > + ror r13,14 > > + mov r11,rbx > > + > > + and rdi,r15 > > + ror r14,28 > > + add r12,r13 > > + > > + xor r11,rdi > > + add rdx,r12 > > + add r11,r12 > > + > > + lea rbp,[8+rbp] > > + add 
r11,r14 > > + mov r12,QWORD[72+rsi] > > + mov r13,rdx > > + mov r14,r11 > > + bswap r12 > > + ror r13,23 > > + mov rdi,r8 > > + > > + xor r13,rdx > > + ror r14,5 > > + xor rdi,r9 > > + > > + mov QWORD[72+rsp],r12 > > + xor r14,r11 > > + and rdi,rdx > > + > > + ror r13,4 > > + add r12,r10 > > + xor rdi,r9 > > + > > + ror r14,6 > > + xor r13,rdx > > + add r12,rdi > > + > > + mov rdi,r11 > > + add r12,QWORD[rbp] > > + xor r14,r11 > > + > > + xor rdi,rax > > + ror r13,14 > > + mov r10,rax > > + > > + and r15,rdi > > + ror r14,28 > > + add r12,r13 > > + > > + xor r10,r15 > > + add rcx,r12 > > + add r10,r12 > > + > > + lea rbp,[24+rbp] > > + add r10,r14 > > + mov r12,QWORD[80+rsi] > > + mov r13,rcx > > + mov r14,r10 > > + bswap r12 > > + ror r13,23 > > + mov r15,rdx > > + > > + xor r13,rcx > > + ror r14,5 > > + xor r15,r8 > > + > > + mov QWORD[80+rsp],r12 > > + xor r14,r10 > > + and r15,rcx > > + > > + ror r13,4 > > + add r12,r9 > > + xor r15,r8 > > + > > + ror r14,6 > > + xor r13,rcx > > + add r12,r15 > > + > > + mov r15,r10 > > + add r12,QWORD[rbp] > > + xor r14,r10 > > + > > + xor r15,r11 > > + ror r13,14 > > + mov r9,r11 > > + > > + and rdi,r15 > > + ror r14,28 > > + add r12,r13 > > + > > + xor r9,rdi > > + add rbx,r12 > > + add r9,r12 > > + > > + lea rbp,[8+rbp] > > + add r9,r14 > > + mov r12,QWORD[88+rsi] > > + mov r13,rbx > > + mov r14,r9 > > + bswap r12 > > + ror r13,23 > > + mov rdi,rcx > > + > > + xor r13,rbx > > + ror r14,5 > > + xor rdi,rdx > > + > > + mov QWORD[88+rsp],r12 > > + xor r14,r9 > > + and rdi,rbx > > + > > + ror r13,4 > > + add r12,r8 > > + xor rdi,rdx > > + > > + ror r14,6 > > + xor r13,rbx > > + add r12,rdi > > + > > + mov rdi,r9 > > + add r12,QWORD[rbp] > > + xor r14,r9 > > + > > + xor rdi,r10 > > + ror r13,14 > > + mov r8,r10 > > + > > + and r15,rdi > > + ror r14,28 > > + add r12,r13 > > + > > + xor r8,r15 > > + add rax,r12 > > + add r8,r12 > > + > > + lea rbp,[24+rbp] > > + add r8,r14 > > + mov r12,QWORD[96+rsi] > > + mov r13,rax > > + mov r14,r8 > > + bswap r12 > > + ror r13,23 > > + mov r15,rbx > > + > > + xor r13,rax > > + ror r14,5 > > + xor r15,rcx > > + > > + mov QWORD[96+rsp],r12 > > + xor r14,r8 > > + and r15,rax > > + > > + ror r13,4 > > + add r12,rdx > > + xor r15,rcx > > + > > + ror r14,6 > > + xor r13,rax > > + add r12,r15 > > + > > + mov r15,r8 > > + add r12,QWORD[rbp] > > + xor r14,r8 > > + > > + xor r15,r9 > > + ror r13,14 > > + mov rdx,r9 > > + > > + and rdi,r15 > > + ror r14,28 > > + add r12,r13 > > + > > + xor rdx,rdi > > + add r11,r12 > > + add rdx,r12 > > + > > + lea rbp,[8+rbp] > > + add rdx,r14 > > + mov r12,QWORD[104+rsi] > > + mov r13,r11 > > + mov r14,rdx > > + bswap r12 > > + ror r13,23 > > + mov rdi,rax > > + > > + xor r13,r11 > > + ror r14,5 > > + xor rdi,rbx > > + > > + mov QWORD[104+rsp],r12 > > + xor r14,rdx > > + and rdi,r11 > > + > > + ror r13,4 > > + add r12,rcx > > + xor rdi,rbx > > + > > + ror r14,6 > > + xor r13,r11 > > + add r12,rdi > > + > > + mov rdi,rdx > > + add r12,QWORD[rbp] > > + xor r14,rdx > > + > > + xor rdi,r8 > > + ror r13,14 > > + mov rcx,r8 > > + > > + and r15,rdi > > + ror r14,28 > > + add r12,r13 > > + > > + xor rcx,r15 > > + add r10,r12 > > + add rcx,r12 > > + > > + lea rbp,[24+rbp] > > + add rcx,r14 > > + mov r12,QWORD[112+rsi] > > + mov r13,r10 > > + mov r14,rcx > > + bswap r12 > > + ror r13,23 > > + mov r15,r11 > > + > > + xor r13,r10 > > + ror r14,5 > > + xor r15,rax > > + > > + mov QWORD[112+rsp],r12 > > + xor r14,rcx > > + and r15,r10 > > + > > + ror r13,4 > > + add r12,rbx > > + xor r15,rax > > + > > + ror 
r14,6 > > + xor r13,r10 > > + add r12,r15 > > + > > + mov r15,rcx > > + add r12,QWORD[rbp] > > + xor r14,rcx > > + > > + xor r15,rdx > > + ror r13,14 > > + mov rbx,rdx > > + > > + and rdi,r15 > > + ror r14,28 > > + add r12,r13 > > + > > + xor rbx,rdi > > + add r9,r12 > > + add rbx,r12 > > + > > + lea rbp,[8+rbp] > > + add rbx,r14 > > + mov r12,QWORD[120+rsi] > > + mov r13,r9 > > + mov r14,rbx > > + bswap r12 > > + ror r13,23 > > + mov rdi,r10 > > + > > + xor r13,r9 > > + ror r14,5 > > + xor rdi,r11 > > + > > + mov QWORD[120+rsp],r12 > > + xor r14,rbx > > + and rdi,r9 > > + > > + ror r13,4 > > + add r12,rax > > + xor rdi,r11 > > + > > + ror r14,6 > > + xor r13,r9 > > + add r12,rdi > > + > > + mov rdi,rbx > > + add r12,QWORD[rbp] > > + xor r14,rbx > > + > > + xor rdi,rcx > > + ror r13,14 > > + mov rax,rcx > > + > > + and r15,rdi > > + ror r14,28 > > + add r12,r13 > > + > > + xor rax,r15 > > + add r8,r12 > > + add rax,r12 > > + > > + lea rbp,[24+rbp] > > + jmp NEAR $L$rounds_16_xx > > +ALIGN 16 > > +$L$rounds_16_xx: > > + mov r13,QWORD[8+rsp] > > + mov r15,QWORD[112+rsp] > > + > > + mov r12,r13 > > + ror r13,7 > > + add rax,r14 > > + mov r14,r15 > > + ror r15,42 > > + > > + xor r13,r12 > > + shr r12,7 > > + ror r13,1 > > + xor r15,r14 > > + shr r14,6 > > + > > + ror r15,19 > > + xor r12,r13 > > + xor r15,r14 > > + add r12,QWORD[72+rsp] > > + > > + add r12,QWORD[rsp] > > + mov r13,r8 > > + add r12,r15 > > + mov r14,rax > > + ror r13,23 > > + mov r15,r9 > > + > > + xor r13,r8 > > + ror r14,5 > > + xor r15,r10 > > + > > + mov QWORD[rsp],r12 > > + xor r14,rax > > + and r15,r8 > > + > > + ror r13,4 > > + add r12,r11 > > + xor r15,r10 > > + > > + ror r14,6 > > + xor r13,r8 > > + add r12,r15 > > + > > + mov r15,rax > > + add r12,QWORD[rbp] > > + xor r14,rax > > + > > + xor r15,rbx > > + ror r13,14 > > + mov r11,rbx > > + > > + and rdi,r15 > > + ror r14,28 > > + add r12,r13 > > + > > + xor r11,rdi > > + add rdx,r12 > > + add r11,r12 > > + > > + lea rbp,[8+rbp] > > + mov r13,QWORD[16+rsp] > > + mov rdi,QWORD[120+rsp] > > + > > + mov r12,r13 > > + ror r13,7 > > + add r11,r14 > > + mov r14,rdi > > + ror rdi,42 > > + > > + xor r13,r12 > > + shr r12,7 > > + ror r13,1 > > + xor rdi,r14 > > + shr r14,6 > > + > > + ror rdi,19 > > + xor r12,r13 > > + xor rdi,r14 > > + add r12,QWORD[80+rsp] > > + > > + add r12,QWORD[8+rsp] > > + mov r13,rdx > > + add r12,rdi > > + mov r14,r11 > > + ror r13,23 > > + mov rdi,r8 > > + > > + xor r13,rdx > > + ror r14,5 > > + xor rdi,r9 > > + > > + mov QWORD[8+rsp],r12 > > + xor r14,r11 > > + and rdi,rdx > > + > > + ror r13,4 > > + add r12,r10 > > + xor rdi,r9 > > + > > + ror r14,6 > > + xor r13,rdx > > + add r12,rdi > > + > > + mov rdi,r11 > > + add r12,QWORD[rbp] > > + xor r14,r11 > > + > > + xor rdi,rax > > + ror r13,14 > > + mov r10,rax > > + > > + and r15,rdi > > + ror r14,28 > > + add r12,r13 > > + > > + xor r10,r15 > > + add rcx,r12 > > + add r10,r12 > > + > > + lea rbp,[24+rbp] > > + mov r13,QWORD[24+rsp] > > + mov r15,QWORD[rsp] > > + > > + mov r12,r13 > > + ror r13,7 > > + add r10,r14 > > + mov r14,r15 > > + ror r15,42 > > + > > + xor r13,r12 > > + shr r12,7 > > + ror r13,1 > > + xor r15,r14 > > + shr r14,6 > > + > > + ror r15,19 > > + xor r12,r13 > > + xor r15,r14 > > + add r12,QWORD[88+rsp] > > + > > + add r12,QWORD[16+rsp] > > + mov r13,rcx > > + add r12,r15 > > + mov r14,r10 > > + ror r13,23 > > + mov r15,rdx > > + > > + xor r13,rcx > > + ror r14,5 > > + xor r15,r8 > > + > > + mov QWORD[16+rsp],r12 > > + xor r14,r10 > > + and r15,rcx > > + > > + ror r13,4 > > + add 
r12,r9 > > + xor r15,r8 > > + > > + ror r14,6 > > + xor r13,rcx > > + add r12,r15 > > + > > + mov r15,r10 > > + add r12,QWORD[rbp] > > + xor r14,r10 > > + > > + xor r15,r11 > > + ror r13,14 > > + mov r9,r11 > > + > > + and rdi,r15 > > + ror r14,28 > > + add r12,r13 > > + > > + xor r9,rdi > > + add rbx,r12 > > + add r9,r12 > > + > > + lea rbp,[8+rbp] > > + mov r13,QWORD[32+rsp] > > + mov rdi,QWORD[8+rsp] > > + > > + mov r12,r13 > > + ror r13,7 > > + add r9,r14 > > + mov r14,rdi > > + ror rdi,42 > > + > > + xor r13,r12 > > + shr r12,7 > > + ror r13,1 > > + xor rdi,r14 > > + shr r14,6 > > + > > + ror rdi,19 > > + xor r12,r13 > > + xor rdi,r14 > > + add r12,QWORD[96+rsp] > > + > > + add r12,QWORD[24+rsp] > > + mov r13,rbx > > + add r12,rdi > > + mov r14,r9 > > + ror r13,23 > > + mov rdi,rcx > > + > > + xor r13,rbx > > + ror r14,5 > > + xor rdi,rdx > > + > > + mov QWORD[24+rsp],r12 > > + xor r14,r9 > > + and rdi,rbx > > + > > + ror r13,4 > > + add r12,r8 > > + xor rdi,rdx > > + > > + ror r14,6 > > + xor r13,rbx > > + add r12,rdi > > + > > + mov rdi,r9 > > + add r12,QWORD[rbp] > > + xor r14,r9 > > + > > + xor rdi,r10 > > + ror r13,14 > > + mov r8,r10 > > + > > + and r15,rdi > > + ror r14,28 > > + add r12,r13 > > + > > + xor r8,r15 > > + add rax,r12 > > + add r8,r12 > > + > > + lea rbp,[24+rbp] > > + mov r13,QWORD[40+rsp] > > + mov r15,QWORD[16+rsp] > > + > > + mov r12,r13 > > + ror r13,7 > > + add r8,r14 > > + mov r14,r15 > > + ror r15,42 > > + > > + xor r13,r12 > > + shr r12,7 > > + ror r13,1 > > + xor r15,r14 > > + shr r14,6 > > + > > + ror r15,19 > > + xor r12,r13 > > + xor r15,r14 > > + add r12,QWORD[104+rsp] > > + > > + add r12,QWORD[32+rsp] > > + mov r13,rax > > + add r12,r15 > > + mov r14,r8 > > + ror r13,23 > > + mov r15,rbx > > + > > + xor r13,rax > > + ror r14,5 > > + xor r15,rcx > > + > > + mov QWORD[32+rsp],r12 > > + xor r14,r8 > > + and r15,rax > > + > > + ror r13,4 > > + add r12,rdx > > + xor r15,rcx > > + > > + ror r14,6 > > + xor r13,rax > > + add r12,r15 > > + > > + mov r15,r8 > > + add r12,QWORD[rbp] > > + xor r14,r8 > > + > > + xor r15,r9 > > + ror r13,14 > > + mov rdx,r9 > > + > > + and rdi,r15 > > + ror r14,28 > > + add r12,r13 > > + > > + xor rdx,rdi > > + add r11,r12 > > + add rdx,r12 > > + > > + lea rbp,[8+rbp] > > + mov r13,QWORD[48+rsp] > > + mov rdi,QWORD[24+rsp] > > + > > + mov r12,r13 > > + ror r13,7 > > + add rdx,r14 > > + mov r14,rdi > > + ror rdi,42 > > + > > + xor r13,r12 > > + shr r12,7 > > + ror r13,1 > > + xor rdi,r14 > > + shr r14,6 > > + > > + ror rdi,19 > > + xor r12,r13 > > + xor rdi,r14 > > + add r12,QWORD[112+rsp] > > + > > + add r12,QWORD[40+rsp] > > + mov r13,r11 > > + add r12,rdi > > + mov r14,rdx > > + ror r13,23 > > + mov rdi,rax > > + > > + xor r13,r11 > > + ror r14,5 > > + xor rdi,rbx > > + > > + mov QWORD[40+rsp],r12 > > + xor r14,rdx > > + and rdi,r11 > > + > > + ror r13,4 > > + add r12,rcx > > + xor rdi,rbx > > + > > + ror r14,6 > > + xor r13,r11 > > + add r12,rdi > > + > > + mov rdi,rdx > > + add r12,QWORD[rbp] > > + xor r14,rdx > > + > > + xor rdi,r8 > > + ror r13,14 > > + mov rcx,r8 > > + > > + and r15,rdi > > + ror r14,28 > > + add r12,r13 > > + > > + xor rcx,r15 > > + add r10,r12 > > + add rcx,r12 > > + > > + lea rbp,[24+rbp] > > + mov r13,QWORD[56+rsp] > > + mov r15,QWORD[32+rsp] > > + > > + mov r12,r13 > > + ror r13,7 > > + add rcx,r14 > > + mov r14,r15 > > + ror r15,42 > > + > > + xor r13,r12 > > + shr r12,7 > > + ror r13,1 > > + xor r15,r14 > > + shr r14,6 > > + > > + ror r15,19 > > + xor r12,r13 > > + xor r15,r14 > > + add 
r12,QWORD[120+rsp] > > + > > + add r12,QWORD[48+rsp] > > + mov r13,r10 > > + add r12,r15 > > + mov r14,rcx > > + ror r13,23 > > + mov r15,r11 > > + > > + xor r13,r10 > > + ror r14,5 > > + xor r15,rax > > + > > + mov QWORD[48+rsp],r12 > > + xor r14,rcx > > + and r15,r10 > > + > > + ror r13,4 > > + add r12,rbx > > + xor r15,rax > > + > > + ror r14,6 > > + xor r13,r10 > > + add r12,r15 > > + > > + mov r15,rcx > > + add r12,QWORD[rbp] > > + xor r14,rcx > > + > > + xor r15,rdx > > + ror r13,14 > > + mov rbx,rdx > > + > > + and rdi,r15 > > + ror r14,28 > > + add r12,r13 > > + > > + xor rbx,rdi > > + add r9,r12 > > + add rbx,r12 > > + > > + lea rbp,[8+rbp] > > + mov r13,QWORD[64+rsp] > > + mov rdi,QWORD[40+rsp] > > + > > + mov r12,r13 > > + ror r13,7 > > + add rbx,r14 > > + mov r14,rdi > > + ror rdi,42 > > + > > + xor r13,r12 > > + shr r12,7 > > + ror r13,1 > > + xor rdi,r14 > > + shr r14,6 > > + > > + ror rdi,19 > > + xor r12,r13 > > + xor rdi,r14 > > + add r12,QWORD[rsp] > > + > > + add r12,QWORD[56+rsp] > > + mov r13,r9 > > + add r12,rdi > > + mov r14,rbx > > + ror r13,23 > > + mov rdi,r10 > > + > > + xor r13,r9 > > + ror r14,5 > > + xor rdi,r11 > > + > > + mov QWORD[56+rsp],r12 > > + xor r14,rbx > > + and rdi,r9 > > + > > + ror r13,4 > > + add r12,rax > > + xor rdi,r11 > > + > > + ror r14,6 > > + xor r13,r9 > > + add r12,rdi > > + > > + mov rdi,rbx > > + add r12,QWORD[rbp] > > + xor r14,rbx > > + > > + xor rdi,rcx > > + ror r13,14 > > + mov rax,rcx > > + > > + and r15,rdi > > + ror r14,28 > > + add r12,r13 > > + > > + xor rax,r15 > > + add r8,r12 > > + add rax,r12 > > + > > + lea rbp,[24+rbp] > > + mov r13,QWORD[72+rsp] > > + mov r15,QWORD[48+rsp] > > + > > + mov r12,r13 > > + ror r13,7 > > + add rax,r14 > > + mov r14,r15 > > + ror r15,42 > > + > > + xor r13,r12 > > + shr r12,7 > > + ror r13,1 > > + xor r15,r14 > > + shr r14,6 > > + > > + ror r15,19 > > + xor r12,r13 > > + xor r15,r14 > > + add r12,QWORD[8+rsp] > > + > > + add r12,QWORD[64+rsp] > > + mov r13,r8 > > + add r12,r15 > > + mov r14,rax > > + ror r13,23 > > + mov r15,r9 > > + > > + xor r13,r8 > > + ror r14,5 > > + xor r15,r10 > > + > > + mov QWORD[64+rsp],r12 > > + xor r14,rax > > + and r15,r8 > > + > > + ror r13,4 > > + add r12,r11 > > + xor r15,r10 > > + > > + ror r14,6 > > + xor r13,r8 > > + add r12,r15 > > + > > + mov r15,rax > > + add r12,QWORD[rbp] > > + xor r14,rax > > + > > + xor r15,rbx > > + ror r13,14 > > + mov r11,rbx > > + > > + and rdi,r15 > > + ror r14,28 > > + add r12,r13 > > + > > + xor r11,rdi > > + add rdx,r12 > > + add r11,r12 > > + > > + lea rbp,[8+rbp] > > + mov r13,QWORD[80+rsp] > > + mov rdi,QWORD[56+rsp] > > + > > + mov r12,r13 > > + ror r13,7 > > + add r11,r14 > > + mov r14,rdi > > + ror rdi,42 > > + > > + xor r13,r12 > > + shr r12,7 > > + ror r13,1 > > + xor rdi,r14 > > + shr r14,6 > > + > > + ror rdi,19 > > + xor r12,r13 > > + xor rdi,r14 > > + add r12,QWORD[16+rsp] > > + > > + add r12,QWORD[72+rsp] > > + mov r13,rdx > > + add r12,rdi > > + mov r14,r11 > > + ror r13,23 > > + mov rdi,r8 > > + > > + xor r13,rdx > > + ror r14,5 > > + xor rdi,r9 > > + > > + mov QWORD[72+rsp],r12 > > + xor r14,r11 > > + and rdi,rdx > > + > > + ror r13,4 > > + add r12,r10 > > + xor rdi,r9 > > + > > + ror r14,6 > > + xor r13,rdx > > + add r12,rdi > > + > > + mov rdi,r11 > > + add r12,QWORD[rbp] > > + xor r14,r11 > > + > > + xor rdi,rax > > + ror r13,14 > > + mov r10,rax > > + > > + and r15,rdi > > + ror r14,28 > > + add r12,r13 > > + > > + xor r10,r15 > > + add rcx,r12 > > + add r10,r12 > > + > > + lea rbp,[24+rbp] > > + mov 
r13,QWORD[88+rsp] > > + mov r15,QWORD[64+rsp] > > + > > + mov r12,r13 > > + ror r13,7 > > + add r10,r14 > > + mov r14,r15 > > + ror r15,42 > > + > > + xor r13,r12 > > + shr r12,7 > > + ror r13,1 > > + xor r15,r14 > > + shr r14,6 > > + > > + ror r15,19 > > + xor r12,r13 > > + xor r15,r14 > > + add r12,QWORD[24+rsp] > > + > > + add r12,QWORD[80+rsp] > > + mov r13,rcx > > + add r12,r15 > > + mov r14,r10 > > + ror r13,23 > > + mov r15,rdx > > + > > + xor r13,rcx > > + ror r14,5 > > + xor r15,r8 > > + > > + mov QWORD[80+rsp],r12 > > + xor r14,r10 > > + and r15,rcx > > + > > + ror r13,4 > > + add r12,r9 > > + xor r15,r8 > > + > > + ror r14,6 > > + xor r13,rcx > > + add r12,r15 > > + > > + mov r15,r10 > > + add r12,QWORD[rbp] > > + xor r14,r10 > > + > > + xor r15,r11 > > + ror r13,14 > > + mov r9,r11 > > + > > + and rdi,r15 > > + ror r14,28 > > + add r12,r13 > > + > > + xor r9,rdi > > + add rbx,r12 > > + add r9,r12 > > + > > + lea rbp,[8+rbp] > > + mov r13,QWORD[96+rsp] > > + mov rdi,QWORD[72+rsp] > > + > > + mov r12,r13 > > + ror r13,7 > > + add r9,r14 > > + mov r14,rdi > > + ror rdi,42 > > + > > + xor r13,r12 > > + shr r12,7 > > + ror r13,1 > > + xor rdi,r14 > > + shr r14,6 > > + > > + ror rdi,19 > > + xor r12,r13 > > + xor rdi,r14 > > + add r12,QWORD[32+rsp] > > + > > + add r12,QWORD[88+rsp] > > + mov r13,rbx > > + add r12,rdi > > + mov r14,r9 > > + ror r13,23 > > + mov rdi,rcx > > + > > + xor r13,rbx > > + ror r14,5 > > + xor rdi,rdx > > + > > + mov QWORD[88+rsp],r12 > > + xor r14,r9 > > + and rdi,rbx > > + > > + ror r13,4 > > + add r12,r8 > > + xor rdi,rdx > > + > > + ror r14,6 > > + xor r13,rbx > > + add r12,rdi > > + > > + mov rdi,r9 > > + add r12,QWORD[rbp] > > + xor r14,r9 > > + > > + xor rdi,r10 > > + ror r13,14 > > + mov r8,r10 > > + > > + and r15,rdi > > + ror r14,28 > > + add r12,r13 > > + > > + xor r8,r15 > > + add rax,r12 > > + add r8,r12 > > + > > + lea rbp,[24+rbp] > > + mov r13,QWORD[104+rsp] > > + mov r15,QWORD[80+rsp] > > + > > + mov r12,r13 > > + ror r13,7 > > + add r8,r14 > > + mov r14,r15 > > + ror r15,42 > > + > > + xor r13,r12 > > + shr r12,7 > > + ror r13,1 > > + xor r15,r14 > > + shr r14,6 > > + > > + ror r15,19 > > + xor r12,r13 > > + xor r15,r14 > > + add r12,QWORD[40+rsp] > > + > > + add r12,QWORD[96+rsp] > > + mov r13,rax > > + add r12,r15 > > + mov r14,r8 > > + ror r13,23 > > + mov r15,rbx > > + > > + xor r13,rax > > + ror r14,5 > > + xor r15,rcx > > + > > + mov QWORD[96+rsp],r12 > > + xor r14,r8 > > + and r15,rax > > + > > + ror r13,4 > > + add r12,rdx > > + xor r15,rcx > > + > > + ror r14,6 > > + xor r13,rax > > + add r12,r15 > > + > > + mov r15,r8 > > + add r12,QWORD[rbp] > > + xor r14,r8 > > + > > + xor r15,r9 > > + ror r13,14 > > + mov rdx,r9 > > + > > + and rdi,r15 > > + ror r14,28 > > + add r12,r13 > > + > > + xor rdx,rdi > > + add r11,r12 > > + add rdx,r12 > > + > > + lea rbp,[8+rbp] > > + mov r13,QWORD[112+rsp] > > + mov rdi,QWORD[88+rsp] > > + > > + mov r12,r13 > > + ror r13,7 > > + add rdx,r14 > > + mov r14,rdi > > + ror rdi,42 > > + > > + xor r13,r12 > > + shr r12,7 > > + ror r13,1 > > + xor rdi,r14 > > + shr r14,6 > > + > > + ror rdi,19 > > + xor r12,r13 > > + xor rdi,r14 > > + add r12,QWORD[48+rsp] > > + > > + add r12,QWORD[104+rsp] > > + mov r13,r11 > > + add r12,rdi > > + mov r14,rdx > > + ror r13,23 > > + mov rdi,rax > > + > > + xor r13,r11 > > + ror r14,5 > > + xor rdi,rbx > > + > > + mov QWORD[104+rsp],r12 > > + xor r14,rdx > > + and rdi,r11 > > + > > + ror r13,4 > > + add r12,rcx > > + xor rdi,rbx > > + > > + ror r14,6 > > + xor r13,r11 > > + 
add r12,rdi > > + > > + mov rdi,rdx > > + add r12,QWORD[rbp] > > + xor r14,rdx > > + > > + xor rdi,r8 > > + ror r13,14 > > + mov rcx,r8 > > + > > + and r15,rdi > > + ror r14,28 > > + add r12,r13 > > + > > + xor rcx,r15 > > + add r10,r12 > > + add rcx,r12 > > + > > + lea rbp,[24+rbp] > > + mov r13,QWORD[120+rsp] > > + mov r15,QWORD[96+rsp] > > + > > + mov r12,r13 > > + ror r13,7 > > + add rcx,r14 > > + mov r14,r15 > > + ror r15,42 > > + > > + xor r13,r12 > > + shr r12,7 > > + ror r13,1 > > + xor r15,r14 > > + shr r14,6 > > + > > + ror r15,19 > > + xor r12,r13 > > + xor r15,r14 > > + add r12,QWORD[56+rsp] > > + > > + add r12,QWORD[112+rsp] > > + mov r13,r10 > > + add r12,r15 > > + mov r14,rcx > > + ror r13,23 > > + mov r15,r11 > > + > > + xor r13,r10 > > + ror r14,5 > > + xor r15,rax > > + > > + mov QWORD[112+rsp],r12 > > + xor r14,rcx > > + and r15,r10 > > + > > + ror r13,4 > > + add r12,rbx > > + xor r15,rax > > + > > + ror r14,6 > > + xor r13,r10 > > + add r12,r15 > > + > > + mov r15,rcx > > + add r12,QWORD[rbp] > > + xor r14,rcx > > + > > + xor r15,rdx > > + ror r13,14 > > + mov rbx,rdx > > + > > + and rdi,r15 > > + ror r14,28 > > + add r12,r13 > > + > > + xor rbx,rdi > > + add r9,r12 > > + add rbx,r12 > > + > > + lea rbp,[8+rbp] > > + mov r13,QWORD[rsp] > > + mov rdi,QWORD[104+rsp] > > + > > + mov r12,r13 > > + ror r13,7 > > + add rbx,r14 > > + mov r14,rdi > > + ror rdi,42 > > + > > + xor r13,r12 > > + shr r12,7 > > + ror r13,1 > > + xor rdi,r14 > > + shr r14,6 > > + > > + ror rdi,19 > > + xor r12,r13 > > + xor rdi,r14 > > + add r12,QWORD[64+rsp] > > + > > + add r12,QWORD[120+rsp] > > + mov r13,r9 > > + add r12,rdi > > + mov r14,rbx > > + ror r13,23 > > + mov rdi,r10 > > + > > + xor r13,r9 > > + ror r14,5 > > + xor rdi,r11 > > + > > + mov QWORD[120+rsp],r12 > > + xor r14,rbx > > + and rdi,r9 > > + > > + ror r13,4 > > + add r12,rax > > + xor rdi,r11 > > + > > + ror r14,6 > > + xor r13,r9 > > + add r12,rdi > > + > > + mov rdi,rbx > > + add r12,QWORD[rbp] > > + xor r14,rbx > > + > > + xor rdi,rcx > > + ror r13,14 > > + mov rax,rcx > > + > > + and r15,rdi > > + ror r14,28 > > + add r12,r13 > > + > > + xor rax,r15 > > + add r8,r12 > > + add rax,r12 > > + > > + lea rbp,[24+rbp] > > + cmp BYTE[7+rbp],0 > > + jnz NEAR $L$rounds_16_xx > > + > > + mov rdi,QWORD[((128+0))+rsp] > > + add rax,r14 > > + lea rsi,[128+rsi] > > + > > + add rax,QWORD[rdi] > > + add rbx,QWORD[8+rdi] > > + add rcx,QWORD[16+rdi] > > + add rdx,QWORD[24+rdi] > > + add r8,QWORD[32+rdi] > > + add r9,QWORD[40+rdi] > > + add r10,QWORD[48+rdi] > > + add r11,QWORD[56+rdi] > > + > > + cmp rsi,QWORD[((128+16))+rsp] > > + > > + mov QWORD[rdi],rax > > + mov QWORD[8+rdi],rbx > > + mov QWORD[16+rdi],rcx > > + mov QWORD[24+rdi],rdx > > + mov QWORD[32+rdi],r8 > > + mov QWORD[40+rdi],r9 > > + mov QWORD[48+rdi],r10 > > + mov QWORD[56+rdi],r11 > > + jb NEAR $L$loop > > + > > + mov rsi,QWORD[152+rsp] > > + > > + mov r15,QWORD[((-48))+rsi] > > + > > + mov r14,QWORD[((-40))+rsi] > > + > > + mov r13,QWORD[((-32))+rsi] > > + > > + mov r12,QWORD[((-24))+rsi] > > + > > + mov rbp,QWORD[((-16))+rsi] > > + > > + mov rbx,QWORD[((-8))+rsi] > > + > > + lea rsp,[rsi] > > + > > +$L$epilogue: > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_sha512_block_data_order: > > +ALIGN 64 > > + > > +K512: > > + DQ 0x428a2f98d728ae22,0x7137449123ef65cd > > + DQ 0x428a2f98d728ae22,0x7137449123ef65cd > > + DQ 0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc > > + DQ 0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc > > + DQ 
0x3956c25bf348b538,0x59f111f1b605d019 > > + DQ 0x3956c25bf348b538,0x59f111f1b605d019 > > + DQ 0x923f82a4af194f9b,0xab1c5ed5da6d8118 > > + DQ 0x923f82a4af194f9b,0xab1c5ed5da6d8118 > > + DQ 0xd807aa98a3030242,0x12835b0145706fbe > > + DQ 0xd807aa98a3030242,0x12835b0145706fbe > > + DQ 0x243185be4ee4b28c,0x550c7dc3d5ffb4e2 > > + DQ 0x243185be4ee4b28c,0x550c7dc3d5ffb4e2 > > + DQ 0x72be5d74f27b896f,0x80deb1fe3b1696b1 > > + DQ 0x72be5d74f27b896f,0x80deb1fe3b1696b1 > > + DQ 0x9bdc06a725c71235,0xc19bf174cf692694 > > + DQ 0x9bdc06a725c71235,0xc19bf174cf692694 > > + DQ 0xe49b69c19ef14ad2,0xefbe4786384f25e3 > > + DQ 0xe49b69c19ef14ad2,0xefbe4786384f25e3 > > + DQ 0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65 > > + DQ 0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65 > > + DQ 0x2de92c6f592b0275,0x4a7484aa6ea6e483 > > + DQ 0x2de92c6f592b0275,0x4a7484aa6ea6e483 > > + DQ 0x5cb0a9dcbd41fbd4,0x76f988da831153b5 > > + DQ 0x5cb0a9dcbd41fbd4,0x76f988da831153b5 > > + DQ 0x983e5152ee66dfab,0xa831c66d2db43210 > > + DQ 0x983e5152ee66dfab,0xa831c66d2db43210 > > + DQ 0xb00327c898fb213f,0xbf597fc7beef0ee4 > > + DQ 0xb00327c898fb213f,0xbf597fc7beef0ee4 > > + DQ 0xc6e00bf33da88fc2,0xd5a79147930aa725 > > + DQ 0xc6e00bf33da88fc2,0xd5a79147930aa725 > > + DQ 0x06ca6351e003826f,0x142929670a0e6e70 > > + DQ 0x06ca6351e003826f,0x142929670a0e6e70 > > + DQ 0x27b70a8546d22ffc,0x2e1b21385c26c926 > > + DQ 0x27b70a8546d22ffc,0x2e1b21385c26c926 > > + DQ 0x4d2c6dfc5ac42aed,0x53380d139d95b3df > > + DQ 0x4d2c6dfc5ac42aed,0x53380d139d95b3df > > + DQ 0x650a73548baf63de,0x766a0abb3c77b2a8 > > + DQ 0x650a73548baf63de,0x766a0abb3c77b2a8 > > + DQ 0x81c2c92e47edaee6,0x92722c851482353b > > + DQ 0x81c2c92e47edaee6,0x92722c851482353b > > + DQ 0xa2bfe8a14cf10364,0xa81a664bbc423001 > > + DQ 0xa2bfe8a14cf10364,0xa81a664bbc423001 > > + DQ 0xc24b8b70d0f89791,0xc76c51a30654be30 > > + DQ 0xc24b8b70d0f89791,0xc76c51a30654be30 > > + DQ 0xd192e819d6ef5218,0xd69906245565a910 > > + DQ 0xd192e819d6ef5218,0xd69906245565a910 > > + DQ 0xf40e35855771202a,0x106aa07032bbd1b8 > > + DQ 0xf40e35855771202a,0x106aa07032bbd1b8 > > + DQ 0x19a4c116b8d2d0c8,0x1e376c085141ab53 > > + DQ 0x19a4c116b8d2d0c8,0x1e376c085141ab53 > > + DQ 0x2748774cdf8eeb99,0x34b0bcb5e19b48a8 > > + DQ 0x2748774cdf8eeb99,0x34b0bcb5e19b48a8 > > + DQ 0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb > > + DQ 0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb > > + DQ 0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3 > > + DQ 0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3 > > + DQ 0x748f82ee5defb2fc,0x78a5636f43172f60 > > + DQ 0x748f82ee5defb2fc,0x78a5636f43172f60 > > + DQ 0x84c87814a1f0ab72,0x8cc702081a6439ec > > + DQ 0x84c87814a1f0ab72,0x8cc702081a6439ec > > + DQ 0x90befffa23631e28,0xa4506cebde82bde9 > > + DQ 0x90befffa23631e28,0xa4506cebde82bde9 > > + DQ 0xbef9a3f7b2c67915,0xc67178f2e372532b > > + DQ 0xbef9a3f7b2c67915,0xc67178f2e372532b > > + DQ 0xca273eceea26619c,0xd186b8c721c0c207 > > + DQ 0xca273eceea26619c,0xd186b8c721c0c207 > > + DQ 0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178 > > + DQ 0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178 > > + DQ 0x06f067aa72176fba,0x0a637dc5a2c898a6 > > + DQ 0x06f067aa72176fba,0x0a637dc5a2c898a6 > > + DQ 0x113f9804bef90dae,0x1b710b35131c471b > > + DQ 0x113f9804bef90dae,0x1b710b35131c471b > > + DQ 0x28db77f523047d84,0x32caab7b40c72493 > > + DQ 0x28db77f523047d84,0x32caab7b40c72493 > > + DQ 0x3c9ebe0a15c9bebc,0x431d67c49c100d4c > > + DQ 0x3c9ebe0a15c9bebc,0x431d67c49c100d4c > > + DQ 0x4cc5d4becb3e42b6,0x597f299cfc657e2a > > + DQ 0x4cc5d4becb3e42b6,0x597f299cfc657e2a > > + DQ 0x5fcb6fab3ad6faec,0x6c44198c4a475817 > > + DQ 
0x5fcb6fab3ad6faec,0x6c44198c4a475817 > > + > > + DQ 0x0001020304050607,0x08090a0b0c0d0e0f > > + DQ 0x0001020304050607,0x08090a0b0c0d0e0f > > +DB 83,72,65,53,49,50,32,98,108,111,99,107,32,116,114,97 > > +DB 110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54 > > +DB 52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121 > > +DB 32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46 > > +DB 111,114,103,62,0 > > +EXTERN __imp_RtlVirtualUnwind > > + > > +ALIGN 16 > > +se_handler: > > + push rsi > > + push rdi > > + push rbx > > + push rbp > > + push r12 > > + push r13 > > + push r14 > > + push r15 > > + pushfq > > + sub rsp,64 > > + > > + mov rax,QWORD[120+r8] > > + mov rbx,QWORD[248+r8] > > + > > + mov rsi,QWORD[8+r9] > > + mov r11,QWORD[56+r9] > > + > > + mov r10d,DWORD[r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jb NEAR $L$in_prologue > > + > > + mov rax,QWORD[152+r8] > > + > > + mov r10d,DWORD[4+r11] > > + lea r10,[r10*1+rsi] > > + cmp rbx,r10 > > + jae NEAR $L$in_prologue > > + mov rsi,rax > > + mov rax,QWORD[((128+24))+rax] > > + > > + mov rbx,QWORD[((-8))+rax] > > + mov rbp,QWORD[((-16))+rax] > > + mov r12,QWORD[((-24))+rax] > > + mov r13,QWORD[((-32))+rax] > > + mov r14,QWORD[((-40))+rax] > > + mov r15,QWORD[((-48))+rax] > > + mov QWORD[144+r8],rbx > > + mov QWORD[160+r8],rbp > > + mov QWORD[216+r8],r12 > > + mov QWORD[224+r8],r13 > > + mov QWORD[232+r8],r14 > > + mov QWORD[240+r8],r15 > > + > > + lea r10,[$L$epilogue] > > + cmp rbx,r10 > > + jb NEAR $L$in_prologue > > + > > + lea rsi,[((128+32))+rsi] > > + lea rdi,[512+r8] > > + mov ecx,12 > > + DD 0xa548f3fc > > + > > +$L$in_prologue: > > + mov rdi,QWORD[8+rax] > > + mov rsi,QWORD[16+rax] > > + mov QWORD[152+r8],rax > > + mov QWORD[168+r8],rsi > > + mov QWORD[176+r8],rdi > > + > > + mov rdi,QWORD[40+r9] > > + mov rsi,r8 > > + mov ecx,154 > > + DD 0xa548f3fc > > + > > + mov rsi,r9 > > + xor rcx,rcx > > + mov rdx,QWORD[8+rsi] > > + mov r8,QWORD[rsi] > > + mov r9,QWORD[16+rsi] > > + mov r10,QWORD[40+rsi] > > + lea r11,[56+rsi] > > + lea r12,[24+rsi] > > + mov QWORD[32+rsp],r10 > > + mov QWORD[40+rsp],r11 > > + mov QWORD[48+rsp],r12 > > + mov QWORD[56+rsp],rcx > > + call QWORD[__imp_RtlVirtualUnwind] > > + > > + mov eax,1 > > + add rsp,64 > > + popfq > > + pop r15 > > + pop r14 > > + pop r13 > > + pop r12 > > + pop rbp > > + pop rbx > > + pop rdi > > + pop rsi > > + DB 0F3h,0C3h ;repret > > + > > +section .pdata rdata align=4 > > +ALIGN 4 > > + DD $L$SEH_begin_sha512_block_data_order > wrt ..imagebase > > + DD $L$SEH_end_sha512_block_data_order > wrt ..imagebase > > + DD $L$SEH_info_sha512_block_data_order > wrt ..imagebase > > +section .xdata rdata align=8 > > +ALIGN 8 > > +$L$SEH_info_sha512_block_data_order: > > +DB 9,0,0,0 > > + DD se_handler wrt ..imagebase > > + DD $L$prologue wrt ..imagebase,$L$epilogue > wrt ..imagebase > > diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm > > b/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm > > new file mode 100644 > > index 0000000000..2a3d5bcf72 > > --- /dev/null > > +++ b/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm > > @@ -0,0 +1,491 @@ > > +; WARNING: do not edit! > > +; Generated from openssl/crypto/x86_64cpuid.pl > > +; > > +; Copyright 2005-2020 The OpenSSL Project Authors. All Rights Reserved. > > +; > > +; Licensed under the OpenSSL license (the "License"). You may not use > > +; this file except in compliance with the License. 
You can obtain a copy > > +; in the file LICENSE in the source distribution or at > > +; https://www.openssl.org/source/license.html > > + > > +default rel > > +%define XMMWORD > > +%define YMMWORD > > +%define ZMMWORD > > +EXTERN OPENSSL_cpuid_setup > > + > > +section .CRT$XCU rdata align=8 > > + DQ OPENSSL_cpuid_setup > > + > > + > > +common OPENSSL_ia32cap_P 16 > > + > > +section .text code align=64 > > + > > + > > +global OPENSSL_atomic_add > > + > > +ALIGN 16 > > +OPENSSL_atomic_add: > > + > > + mov eax,DWORD[rcx] > > +$L$spin: lea r8,[rax*1+rdx] > > +DB 0xf0 > > + cmpxchg DWORD[rcx],r8d > > + jne NEAR $L$spin > > + mov eax,r8d > > +DB 0x48,0x98 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +global OPENSSL_rdtsc > > + > > +ALIGN 16 > > +OPENSSL_rdtsc: > > + > > + rdtsc > > + shl rdx,32 > > + or rax,rdx > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +global OPENSSL_ia32_cpuid > > + > > +ALIGN 16 > > +OPENSSL_ia32_cpuid: > > + mov QWORD[8+rsp],rdi ;WIN64 prologue > > + mov QWORD[16+rsp],rsi > > + mov rax,rsp > > +$L$SEH_begin_OPENSSL_ia32_cpuid: > > + mov rdi,rcx > > + > > + > > + > > + mov r8,rbx > > + > > + > > + xor eax,eax > > + mov QWORD[8+rdi],rax > > + cpuid > > + mov r11d,eax > > + > > + xor eax,eax > > + cmp ebx,0x756e6547 > > + setne al > > + mov r9d,eax > > + cmp edx,0x49656e69 > > + setne al > > + or r9d,eax > > + cmp ecx,0x6c65746e > > + setne al > > + or r9d,eax > > + jz NEAR $L$intel > > + > > + cmp ebx,0x68747541 > > + setne al > > + mov r10d,eax > > + cmp edx,0x69746E65 > > + setne al > > + or r10d,eax > > + cmp ecx,0x444D4163 > > + setne al > > + or r10d,eax > > + jnz NEAR $L$intel > > + > > + > > + mov eax,0x80000000 > > + cpuid > > + cmp eax,0x80000001 > > + jb NEAR $L$intel > > + mov r10d,eax > > + mov eax,0x80000001 > > + cpuid > > + or r9d,ecx > > + and r9d,0x00000801 > > + > > + cmp r10d,0x80000008 > > + jb NEAR $L$intel > > + > > + mov eax,0x80000008 > > + cpuid > > + movzx r10,cl > > + inc r10 > > + > > + mov eax,1 > > + cpuid > > + bt edx,28 > > + jnc NEAR $L$generic > > + shr ebx,16 > > + cmp bl,r10b > > + ja NEAR $L$generic > > + and edx,0xefffffff > > + jmp NEAR $L$generic > > + > > +$L$intel: > > + cmp r11d,4 > > + mov r10d,-1 > > + jb NEAR $L$nocacheinfo > > + > > + mov eax,4 > > + mov ecx,0 > > + cpuid > > + mov r10d,eax > > + shr r10d,14 > > + and r10d,0xfff > > + > > +$L$nocacheinfo: > > + mov eax,1 > > + cpuid > > + movd xmm0,eax > > + and edx,0xbfefffff > > + cmp r9d,0 > > + jne NEAR $L$notintel > > + or edx,0x40000000 > > + and ah,15 > > + cmp ah,15 > > + jne NEAR $L$notP4 > > + or edx,0x00100000 > > +$L$notP4: > > + cmp ah,6 > > + jne NEAR $L$notintel > > + and eax,0x0fff0ff0 > > + cmp eax,0x00050670 > > + je NEAR $L$knights > > + cmp eax,0x00080650 > > + jne NEAR $L$notintel > > +$L$knights: > > + and ecx,0xfbffffff > > + > > +$L$notintel: > > + bt edx,28 > > + jnc NEAR $L$generic > > + and edx,0xefffffff > > + cmp r10d,0 > > + je NEAR $L$generic > > + > > + or edx,0x10000000 > > + shr ebx,16 > > + cmp bl,1 > > + ja NEAR $L$generic > > + and edx,0xefffffff > > +$L$generic: > > + and r9d,0x00000800 > > + and ecx,0xfffff7ff > > + or r9d,ecx > > + > > + mov r10d,edx > > + > > + cmp r11d,7 > > + jb NEAR $L$no_extended_info > > + mov eax,7 > > + xor ecx,ecx > > + cpuid > > + bt r9d,26 > > + jc NEAR $L$notknights > > + and ebx,0xfff7ffff > > +$L$notknights: > > + movd eax,xmm0 > > + and eax,0x0fff0ff0 > > + cmp eax,0x00050650 > > + jne NEAR $L$notskylakex > > + and ebx,0xfffeffff > > + > > +$L$notskylakex: > > + mov DWORD[8+rdi],ebx > > + mov 
DWORD[12+rdi],ecx > > +$L$no_extended_info: > > + > > + bt r9d,27 > > + jnc NEAR $L$clear_avx > > + xor ecx,ecx > > +DB 0x0f,0x01,0xd0 > > + and eax,0xe6 > > + cmp eax,0xe6 > > + je NEAR $L$done > > + and DWORD[8+rdi],0x3fdeffff > > + > > + > > + > > + > > + and eax,6 > > + cmp eax,6 > > + je NEAR $L$done > > +$L$clear_avx: > > + mov eax,0xefffe7ff > > + and r9d,eax > > + mov eax,0x3fdeffdf > > + and DWORD[8+rdi],eax > > +$L$done: > > + shl r9,32 > > + mov eax,r10d > > + mov rbx,r8 > > + > > + or rax,r9 > > + mov rdi,QWORD[8+rsp] ;WIN64 epilogue > > + mov rsi,QWORD[16+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +$L$SEH_end_OPENSSL_ia32_cpuid: > > + > > +global OPENSSL_cleanse > > + > > +ALIGN 16 > > +OPENSSL_cleanse: > > + > > + xor rax,rax > > + cmp rdx,15 > > + jae NEAR $L$ot > > + cmp rdx,0 > > + je NEAR $L$ret > > +$L$ittle: > > + mov BYTE[rcx],al > > + sub rdx,1 > > + lea rcx,[1+rcx] > > + jnz NEAR $L$ittle > > +$L$ret: > > + DB 0F3h,0C3h ;repret > > +ALIGN 16 > > +$L$ot: > > + test rcx,7 > > + jz NEAR $L$aligned > > + mov BYTE[rcx],al > > + lea rdx,[((-1))+rdx] > > + lea rcx,[1+rcx] > > + jmp NEAR $L$ot > > +$L$aligned: > > + mov QWORD[rcx],rax > > + lea rdx,[((-8))+rdx] > > + test rdx,-8 > > + lea rcx,[8+rcx] > > + jnz NEAR $L$aligned > > + cmp rdx,0 > > + jne NEAR $L$ittle > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +global CRYPTO_memcmp > > + > > +ALIGN 16 > > +CRYPTO_memcmp: > > + > > + xor rax,rax > > + xor r10,r10 > > + cmp r8,0 > > + je NEAR $L$no_data > > + cmp r8,16 > > + jne NEAR $L$oop_cmp > > + mov r10,QWORD[rcx] > > + mov r11,QWORD[8+rcx] > > + mov r8,1 > > + xor r10,QWORD[rdx] > > + xor r11,QWORD[8+rdx] > > + or r10,r11 > > + cmovnz rax,r8 > > + DB 0F3h,0C3h ;repret > > + > > +ALIGN 16 > > +$L$oop_cmp: > > + mov r10b,BYTE[rcx] > > + lea rcx,[1+rcx] > > + xor r10b,BYTE[rdx] > > + lea rdx,[1+rdx] > > + or al,r10b > > + dec r8 > > + jnz NEAR $L$oop_cmp > > + neg rax > > + shr rax,63 > > +$L$no_data: > > + DB 0F3h,0C3h ;repret > > + > > + > > +global OPENSSL_wipe_cpu > > + > > +ALIGN 16 > > +OPENSSL_wipe_cpu: > > + pxor xmm0,xmm0 > > + pxor xmm1,xmm1 > > + pxor xmm2,xmm2 > > + pxor xmm3,xmm3 > > + pxor xmm4,xmm4 > > + pxor xmm5,xmm5 > > + xor rcx,rcx > > + xor rdx,rdx > > + xor r8,r8 > > + xor r9,r9 > > + xor r10,r10 > > + xor r11,r11 > > + lea rax,[8+rsp] > > + DB 0F3h,0C3h ;repret > > + > > +global OPENSSL_instrument_bus > > + > > +ALIGN 16 > > +OPENSSL_instrument_bus: > > + > > + mov r10,rcx > > + mov rcx,rdx > > + mov r11,rdx > > + > > + rdtsc > > + mov r8d,eax > > + mov r9d,0 > > + clflush [r10] > > +DB 0xf0 > > + add DWORD[r10],r9d > > + jmp NEAR $L$oop > > +ALIGN 16 > > +$L$oop: rdtsc > > + mov edx,eax > > + sub eax,r8d > > + mov r8d,edx > > + mov r9d,eax > > + clflush [r10] > > +DB 0xf0 > > + add DWORD[r10],eax > > + lea r10,[4+r10] > > + sub rcx,1 > > + jnz NEAR $L$oop > > + > > + mov rax,r11 > > + DB 0F3h,0C3h ;repret > > + > > + > > + > > +global OPENSSL_instrument_bus2 > > + > > +ALIGN 16 > > +OPENSSL_instrument_bus2: > > + > > + mov r10,rcx > > + mov rcx,rdx > > + mov r11,r8 > > + mov QWORD[8+rsp],rcx > > + > > + rdtsc > > + mov r8d,eax > > + mov r9d,0 > > + > > + clflush [r10] > > +DB 0xf0 > > + add DWORD[r10],r9d > > + > > + rdtsc > > + mov edx,eax > > + sub eax,r8d > > + mov r8d,edx > > + mov r9d,eax > > +$L$oop2: > > + clflush [r10] > > +DB 0xf0 > > + add DWORD[r10],eax > > + > > + sub r11,1 > > + jz NEAR $L$done2 > > + > > + rdtsc > > + mov edx,eax > > + sub eax,r8d > > + mov r8d,edx > > + cmp eax,r9d > > + mov r9d,eax > > + mov edx,0 > > + setne dl > > 
+ sub rcx,rdx > > + lea r10,[rdx*4+r10] > > + jnz NEAR $L$oop2 > > + > > +$L$done2: > > + mov rax,QWORD[8+rsp] > > + sub rax,rcx > > + DB 0F3h,0C3h ;repret > > + > > + > > +global OPENSSL_ia32_rdrand_bytes > > + > > +ALIGN 16 > > +OPENSSL_ia32_rdrand_bytes: > > + > > + xor rax,rax > > + cmp rdx,0 > > + je NEAR $L$done_rdrand_bytes > > + > > + mov r11,8 > > +$L$oop_rdrand_bytes: > > +DB 73,15,199,242 > > + jc NEAR $L$break_rdrand_bytes > > + dec r11 > > + jnz NEAR $L$oop_rdrand_bytes > > + jmp NEAR $L$done_rdrand_bytes > > + > > +ALIGN 16 > > +$L$break_rdrand_bytes: > > + cmp rdx,8 > > + jb NEAR $L$tail_rdrand_bytes > > + mov QWORD[rcx],r10 > > + lea rcx,[8+rcx] > > + add rax,8 > > + sub rdx,8 > > + jz NEAR $L$done_rdrand_bytes > > + mov r11,8 > > + jmp NEAR $L$oop_rdrand_bytes > > + > > +ALIGN 16 > > +$L$tail_rdrand_bytes: > > + mov BYTE[rcx],r10b > > + lea rcx,[1+rcx] > > + inc rax > > + shr r10,8 > > + dec rdx > > + jnz NEAR $L$tail_rdrand_bytes > > + > > +$L$done_rdrand_bytes: > > + xor r10,r10 > > + DB 0F3h,0C3h ;repret > > + > > + > > +global OPENSSL_ia32_rdseed_bytes > > + > > +ALIGN 16 > > +OPENSSL_ia32_rdseed_bytes: > > + > > + xor rax,rax > > + cmp rdx,0 > > + je NEAR $L$done_rdseed_bytes > > + > > + mov r11,8 > > +$L$oop_rdseed_bytes: > > +DB 73,15,199,250 > > + jc NEAR $L$break_rdseed_bytes > > + dec r11 > > + jnz NEAR $L$oop_rdseed_bytes > > + jmp NEAR $L$done_rdseed_bytes > > + > > +ALIGN 16 > > +$L$break_rdseed_bytes: > > + cmp rdx,8 > > + jb NEAR $L$tail_rdseed_bytes > > + mov QWORD[rcx],r10 > > + lea rcx,[8+rcx] > > + add rax,8 > > + sub rdx,8 > > + jz NEAR $L$done_rdseed_bytes > > + mov r11,8 > > + jmp NEAR $L$oop_rdseed_bytes > > + > > +ALIGN 16 > > +$L$tail_rdseed_bytes: > > + mov BYTE[rcx],r10b > > + lea rcx,[1+rcx] > > + inc rax > > + shr r10,8 > > + dec rdx > > + jnz NEAR $L$tail_rdseed_bytes > > + > > +$L$done_rdseed_bytes: > > + xor r10,r10 > > + DB 0F3h,0C3h ;repret > > + > > + > > diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-mb- > > x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-mb- > > x86_64.S > > new file mode 100644 > > index 0000000000..7749fd685a > > --- /dev/null > > +++ > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-mb-x86_64.S > > @@ -0,0 +1,552 @@ > > +# WARNING: do not edit! > > +# Generated from openssl/crypto/aes/asm/aesni-mb-x86_64.pl > > +# > > +# Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. > > +# > > +# Licensed under the OpenSSL license (the "License"). You may not use > > +# this file except in compliance with the License. 
You can obtain a copy > > +# in the file LICENSE in the source distribution or at > > +# https://www.openssl.org/source/license.html > > + > > +.text > > + > > + > > + > > +.globl aesni_multi_cbc_encrypt > > +.type aesni_multi_cbc_encrypt,@function > > +.align 32 > > +aesni_multi_cbc_encrypt: > > +.cfi_startproc > > + movq %rsp,%rax > > +.cfi_def_cfa_register %rax > > + pushq %rbx > > +.cfi_offset %rbx,-16 > > + pushq %rbp > > +.cfi_offset %rbp,-24 > > + pushq %r12 > > +.cfi_offset %r12,-32 > > + pushq %r13 > > +.cfi_offset %r13,-40 > > + pushq %r14 > > +.cfi_offset %r14,-48 > > + pushq %r15 > > +.cfi_offset %r15,-56 > > + > > + > > + > > + > > + > > + > > + subq $48,%rsp > > + andq $-64,%rsp > > + movq %rax,16(%rsp) > > +.cfi_escape 0x0f,0x05,0x77,0x10,0x06,0x23,0x08 > > + > > +.Lenc4x_body: > > + movdqu (%rsi),%xmm12 > > + leaq 120(%rsi),%rsi > > + leaq 80(%rdi),%rdi > > + > > +.Lenc4x_loop_grande: > > + movl %edx,24(%rsp) > > + xorl %edx,%edx > > + movl -64(%rdi),%ecx > > + movq -80(%rdi),%r8 > > + cmpl %edx,%ecx > > + movq -72(%rdi),%r12 > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movdqu -56(%rdi),%xmm2 > > + movl %ecx,32(%rsp) > > + cmovleq %rsp,%r8 > > + movl -24(%rdi),%ecx > > + movq -40(%rdi),%r9 > > + cmpl %edx,%ecx > > + movq -32(%rdi),%r13 > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movdqu -16(%rdi),%xmm3 > > + movl %ecx,36(%rsp) > > + cmovleq %rsp,%r9 > > + movl 16(%rdi),%ecx > > + movq 0(%rdi),%r10 > > + cmpl %edx,%ecx > > + movq 8(%rdi),%r14 > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movdqu 24(%rdi),%xmm4 > > + movl %ecx,40(%rsp) > > + cmovleq %rsp,%r10 > > + movl 56(%rdi),%ecx > > + movq 40(%rdi),%r11 > > + cmpl %edx,%ecx > > + movq 48(%rdi),%r15 > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movdqu 64(%rdi),%xmm5 > > + movl %ecx,44(%rsp) > > + cmovleq %rsp,%r11 > > + testl %edx,%edx > > + jz .Lenc4x_done > > + > > + movups 16-120(%rsi),%xmm1 > > + pxor %xmm12,%xmm2 > > + movups 32-120(%rsi),%xmm0 > > + pxor %xmm12,%xmm3 > > + movl 240-120(%rsi),%eax > > + pxor %xmm12,%xmm4 > > + movdqu (%r8),%xmm6 > > + pxor %xmm12,%xmm5 > > + movdqu (%r9),%xmm7 > > + pxor %xmm6,%xmm2 > > + movdqu (%r10),%xmm8 > > + pxor %xmm7,%xmm3 > > + movdqu (%r11),%xmm9 > > + pxor %xmm8,%xmm4 > > + pxor %xmm9,%xmm5 > > + movdqa 32(%rsp),%xmm10 > > + xorq %rbx,%rbx > > + jmp .Loop_enc4x > > + > > +.align 32 > > +.Loop_enc4x: > > + addq $16,%rbx > > + leaq 16(%rsp),%rbp > > + movl $1,%ecx > > + subq %rbx,%rbp > > + > > +.byte 102,15,56,220,209 > > + prefetcht0 31(%r8,%rbx,1) > > + prefetcht0 31(%r9,%rbx,1) > > +.byte 102,15,56,220,217 > > + prefetcht0 31(%r10,%rbx,1) > > + prefetcht0 31(%r10,%rbx,1) > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > + movups 48-120(%rsi),%xmm1 > > + cmpl 32(%rsp),%ecx > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > +.byte 102,15,56,220,224 > > + cmovgeq %rbp,%r8 > > + cmovgq %rbp,%r12 > > +.byte 102,15,56,220,232 > > + movups -56(%rsi),%xmm0 > > + cmpl 36(%rsp),%ecx > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > + cmovgeq %rbp,%r9 > > + cmovgq %rbp,%r13 > > +.byte 102,15,56,220,233 > > + movups -40(%rsi),%xmm1 > > + cmpl 40(%rsp),%ecx > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > +.byte 102,15,56,220,224 > > + cmovgeq %rbp,%r10 > > + cmovgq %rbp,%r14 > > +.byte 102,15,56,220,232 > > + movups -24(%rsi),%xmm0 > > + cmpl 44(%rsp),%ecx > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > + cmovgeq %rbp,%r11 > > + cmovgq 
%rbp,%r15 > > +.byte 102,15,56,220,233 > > + movups -8(%rsi),%xmm1 > > + movdqa %xmm10,%xmm11 > > +.byte 102,15,56,220,208 > > + prefetcht0 15(%r12,%rbx,1) > > + prefetcht0 15(%r13,%rbx,1) > > +.byte 102,15,56,220,216 > > + prefetcht0 15(%r14,%rbx,1) > > + prefetcht0 15(%r15,%rbx,1) > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > + movups 128-120(%rsi),%xmm0 > > + pxor %xmm12,%xmm12 > > + > > +.byte 102,15,56,220,209 > > + pcmpgtd %xmm12,%xmm11 > > + movdqu -120(%rsi),%xmm12 > > +.byte 102,15,56,220,217 > > + paddd %xmm11,%xmm10 > > + movdqa %xmm10,32(%rsp) > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > + movups 144-120(%rsi),%xmm1 > > + > > + cmpl $11,%eax > > + > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > + movups 160-120(%rsi),%xmm0 > > + > > + jb .Lenc4x_tail > > + > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > + movups 176-120(%rsi),%xmm1 > > + > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > + movups 192-120(%rsi),%xmm0 > > + > > + je .Lenc4x_tail > > + > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > + movups 208-120(%rsi),%xmm1 > > + > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > + movups 224-120(%rsi),%xmm0 > > + jmp .Lenc4x_tail > > + > > +.align 32 > > +.Lenc4x_tail: > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > + movdqu (%r8,%rbx,1),%xmm6 > > + movdqu 16-120(%rsi),%xmm1 > > + > > +.byte 102,15,56,221,208 > > + movdqu (%r9,%rbx,1),%xmm7 > > + pxor %xmm12,%xmm6 > > +.byte 102,15,56,221,216 > > + movdqu (%r10,%rbx,1),%xmm8 > > + pxor %xmm12,%xmm7 > > +.byte 102,15,56,221,224 > > + movdqu (%r11,%rbx,1),%xmm9 > > + pxor %xmm12,%xmm8 > > +.byte 102,15,56,221,232 > > + movdqu 32-120(%rsi),%xmm0 > > + pxor %xmm12,%xmm9 > > + > > + movups %xmm2,-16(%r12,%rbx,1) > > + pxor %xmm6,%xmm2 > > + movups %xmm3,-16(%r13,%rbx,1) > > + pxor %xmm7,%xmm3 > > + movups %xmm4,-16(%r14,%rbx,1) > > + pxor %xmm8,%xmm4 > > + movups %xmm5,-16(%r15,%rbx,1) > > + pxor %xmm9,%xmm5 > > + > > + decl %edx > > + jnz .Loop_enc4x > > + > > + movq 16(%rsp),%rax > > +.cfi_def_cfa %rax,8 > > + movl 24(%rsp),%edx > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + leaq 160(%rdi),%rdi > > + decl %edx > > + jnz .Lenc4x_loop_grande > > + > > +.Lenc4x_done: > > + movq -48(%rax),%r15 > > +.cfi_restore %r15 > > + movq -40(%rax),%r14 > > +.cfi_restore %r14 > > + movq -32(%rax),%r13 > > +.cfi_restore %r13 > > + movq -24(%rax),%r12 > > +.cfi_restore %r12 > > + movq -16(%rax),%rbp > > +.cfi_restore %rbp > > + movq -8(%rax),%rbx > > +.cfi_restore %rbx > > + leaq (%rax),%rsp > > +.cfi_def_cfa_register %rsp > > +.Lenc4x_epilogue: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size aesni_multi_cbc_encrypt,.-aesni_multi_cbc_encrypt > > + > > +.globl aesni_multi_cbc_decrypt > > +.type aesni_multi_cbc_decrypt,@function > > +.align 32 > > +aesni_multi_cbc_decrypt: > > +.cfi_startproc > > + movq %rsp,%rax > > +.cfi_def_cfa_register %rax > > + pushq %rbx > > +.cfi_offset %rbx,-16 > > + pushq %rbp > > +.cfi_offset %rbp,-24 > > + pushq %r12 > > +.cfi_offset %r12,-32 > > + pushq %r13 > > +.cfi_offset %r13,-40 > > + pushq %r14 > > +.cfi_offset %r14,-48 > > + pushq %r15 > > 
+.cfi_offset %r15,-56 > > + > > + > > + > > + > > + > > + > > + subq $48,%rsp > > + andq $-64,%rsp > > + movq %rax,16(%rsp) > > +.cfi_escape 0x0f,0x05,0x77,0x10,0x06,0x23,0x08 > > + > > +.Ldec4x_body: > > + movdqu (%rsi),%xmm12 > > + leaq 120(%rsi),%rsi > > + leaq 80(%rdi),%rdi > > + > > +.Ldec4x_loop_grande: > > + movl %edx,24(%rsp) > > + xorl %edx,%edx > > + movl -64(%rdi),%ecx > > + movq -80(%rdi),%r8 > > + cmpl %edx,%ecx > > + movq -72(%rdi),%r12 > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movdqu -56(%rdi),%xmm6 > > + movl %ecx,32(%rsp) > > + cmovleq %rsp,%r8 > > + movl -24(%rdi),%ecx > > + movq -40(%rdi),%r9 > > + cmpl %edx,%ecx > > + movq -32(%rdi),%r13 > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movdqu -16(%rdi),%xmm7 > > + movl %ecx,36(%rsp) > > + cmovleq %rsp,%r9 > > + movl 16(%rdi),%ecx > > + movq 0(%rdi),%r10 > > + cmpl %edx,%ecx > > + movq 8(%rdi),%r14 > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movdqu 24(%rdi),%xmm8 > > + movl %ecx,40(%rsp) > > + cmovleq %rsp,%r10 > > + movl 56(%rdi),%ecx > > + movq 40(%rdi),%r11 > > + cmpl %edx,%ecx > > + movq 48(%rdi),%r15 > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movdqu 64(%rdi),%xmm9 > > + movl %ecx,44(%rsp) > > + cmovleq %rsp,%r11 > > + testl %edx,%edx > > + jz .Ldec4x_done > > + > > + movups 16-120(%rsi),%xmm1 > > + movups 32-120(%rsi),%xmm0 > > + movl 240-120(%rsi),%eax > > + movdqu (%r8),%xmm2 > > + movdqu (%r9),%xmm3 > > + pxor %xmm12,%xmm2 > > + movdqu (%r10),%xmm4 > > + pxor %xmm12,%xmm3 > > + movdqu (%r11),%xmm5 > > + pxor %xmm12,%xmm4 > > + pxor %xmm12,%xmm5 > > + movdqa 32(%rsp),%xmm10 > > + xorq %rbx,%rbx > > + jmp .Loop_dec4x > > + > > +.align 32 > > +.Loop_dec4x: > > + addq $16,%rbx > > + leaq 16(%rsp),%rbp > > + movl $1,%ecx > > + subq %rbx,%rbp > > + > > +.byte 102,15,56,222,209 > > + prefetcht0 31(%r8,%rbx,1) > > + prefetcht0 31(%r9,%rbx,1) > > +.byte 102,15,56,222,217 > > + prefetcht0 31(%r10,%rbx,1) > > + prefetcht0 31(%r11,%rbx,1) > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > + movups 48-120(%rsi),%xmm1 > > + cmpl 32(%rsp),%ecx > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > + cmovgeq %rbp,%r8 > > + cmovgq %rbp,%r12 > > +.byte 102,15,56,222,232 > > + movups -56(%rsi),%xmm0 > > + cmpl 36(%rsp),%ecx > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > + cmovgeq %rbp,%r9 > > + cmovgq %rbp,%r13 > > +.byte 102,15,56,222,233 > > + movups -40(%rsi),%xmm1 > > + cmpl 40(%rsp),%ecx > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > + cmovgeq %rbp,%r10 > > + cmovgq %rbp,%r14 > > +.byte 102,15,56,222,232 > > + movups -24(%rsi),%xmm0 > > + cmpl 44(%rsp),%ecx > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > + cmovgeq %rbp,%r11 > > + cmovgq %rbp,%r15 > > +.byte 102,15,56,222,233 > > + movups -8(%rsi),%xmm1 > > + movdqa %xmm10,%xmm11 > > +.byte 102,15,56,222,208 > > + prefetcht0 15(%r12,%rbx,1) > > + prefetcht0 15(%r13,%rbx,1) > > +.byte 102,15,56,222,216 > > + prefetcht0 15(%r14,%rbx,1) > > + prefetcht0 15(%r15,%rbx,1) > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > > + movups 128-120(%rsi),%xmm0 > > + pxor %xmm12,%xmm12 > > + > > +.byte 102,15,56,222,209 > > + pcmpgtd %xmm12,%xmm11 > > + movdqu -120(%rsi),%xmm12 > > +.byte 102,15,56,222,217 > > + paddd %xmm11,%xmm10 > > + movdqa %xmm10,32(%rsp) > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > + movups 144-120(%rsi),%xmm1 > > + > > + cmpl $11,%eax > > + > 
> +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > > + movups 160-120(%rsi),%xmm0 > > + > > + jb .Ldec4x_tail > > + > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > + movups 176-120(%rsi),%xmm1 > > + > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > > + movups 192-120(%rsi),%xmm0 > > + > > + je .Ldec4x_tail > > + > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > + movups 208-120(%rsi),%xmm1 > > + > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > > + movups 224-120(%rsi),%xmm0 > > + jmp .Ldec4x_tail > > + > > +.align 32 > > +.Ldec4x_tail: > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > + pxor %xmm0,%xmm6 > > + pxor %xmm0,%xmm7 > > +.byte 102,15,56,222,233 > > + movdqu 16-120(%rsi),%xmm1 > > + pxor %xmm0,%xmm8 > > + pxor %xmm0,%xmm9 > > + movdqu 32-120(%rsi),%xmm0 > > + > > +.byte 102,15,56,223,214 > > +.byte 102,15,56,223,223 > > + movdqu -16(%r8,%rbx,1),%xmm6 > > + movdqu -16(%r9,%rbx,1),%xmm7 > > +.byte 102,65,15,56,223,224 > > +.byte 102,65,15,56,223,233 > > + movdqu -16(%r10,%rbx,1),%xmm8 > > + movdqu -16(%r11,%rbx,1),%xmm9 > > + > > + movups %xmm2,-16(%r12,%rbx,1) > > + movdqu (%r8,%rbx,1),%xmm2 > > + movups %xmm3,-16(%r13,%rbx,1) > > + movdqu (%r9,%rbx,1),%xmm3 > > + pxor %xmm12,%xmm2 > > + movups %xmm4,-16(%r14,%rbx,1) > > + movdqu (%r10,%rbx,1),%xmm4 > > + pxor %xmm12,%xmm3 > > + movups %xmm5,-16(%r15,%rbx,1) > > + movdqu (%r11,%rbx,1),%xmm5 > > + pxor %xmm12,%xmm4 > > + pxor %xmm12,%xmm5 > > + > > + decl %edx > > + jnz .Loop_dec4x > > + > > + movq 16(%rsp),%rax > > +.cfi_def_cfa %rax,8 > > + movl 24(%rsp),%edx > > + > > + leaq 160(%rdi),%rdi > > + decl %edx > > + jnz .Ldec4x_loop_grande > > + > > +.Ldec4x_done: > > + movq -48(%rax),%r15 > > +.cfi_restore %r15 > > + movq -40(%rax),%r14 > > +.cfi_restore %r14 > > + movq -32(%rax),%r13 > > +.cfi_restore %r13 > > + movq -24(%rax),%r12 > > +.cfi_restore %r12 > > + movq -16(%rax),%rbp > > +.cfi_restore %rbp > > + movq -8(%rax),%rbx > > +.cfi_restore %rbx > > + leaq (%rax),%rsp > > +.cfi_def_cfa_register %rsp > > +.Ldec4x_epilogue: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size aesni_multi_cbc_decrypt,.-aesni_multi_cbc_decrypt > > diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha1- > > x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha1- > > x86_64.S > > new file mode 100644 > > index 0000000000..ab763a2eec > > --- /dev/null > > +++ > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha1-x86_64.S > > @@ -0,0 +1,1719 @@ > > +# WARNING: do not edit! > > +# Generated from openssl/crypto/aes/asm/aesni-sha1-x86_64.pl > > +# > > +# Copyright 2011-2020 The OpenSSL Project Authors. All Rights Reserved. > > +# > > +# Licensed under the OpenSSL license (the "License"). You may not use > > +# this file except in compliance with the License. 
You can obtain a copy > > +# in the file LICENSE in the source distribution or at > > +# https://www.openssl.org/source/license.html > > + > > +.text > > + > > + > > +.globl aesni_cbc_sha1_enc > > +.type aesni_cbc_sha1_enc,@function > > +.align 32 > > +aesni_cbc_sha1_enc: > > +.cfi_startproc > > + > > + movl OPENSSL_ia32cap_P+0(%rip),%r10d > > + movq OPENSSL_ia32cap_P+4(%rip),%r11 > > + btq $61,%r11 > > + jc aesni_cbc_sha1_enc_shaext > > + jmp aesni_cbc_sha1_enc_ssse3 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size aesni_cbc_sha1_enc,.-aesni_cbc_sha1_enc > > +.type aesni_cbc_sha1_enc_ssse3,@function > > +.align 32 > > +aesni_cbc_sha1_enc_ssse3: > > +.cfi_startproc > > + movq 8(%rsp),%r10 > > + > > + > > + pushq %rbx > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %rbx,-16 > > + pushq %rbp > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %rbp,-24 > > + pushq %r12 > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %r12,-32 > > + pushq %r13 > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %r13,-40 > > + pushq %r14 > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %r14,-48 > > + pushq %r15 > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %r15,-56 > > + leaq -104(%rsp),%rsp > > +.cfi_adjust_cfa_offset 104 > > + > > + > > + movq %rdi,%r12 > > + movq %rsi,%r13 > > + movq %rdx,%r14 > > + leaq 112(%rcx),%r15 > > + movdqu (%r8),%xmm2 > > + movq %r8,88(%rsp) > > + shlq $6,%r14 > > + subq %r12,%r13 > > + movl 240-112(%r15),%r8d > > + addq %r10,%r14 > > + > > + leaq K_XX_XX(%rip),%r11 > > + movl 0(%r9),%eax > > + movl 4(%r9),%ebx > > + movl 8(%r9),%ecx > > + movl 12(%r9),%edx > > + movl %ebx,%esi > > + movl 16(%r9),%ebp > > + movl %ecx,%edi > > + xorl %edx,%edi > > + andl %edi,%esi > > + > > + movdqa 64(%r11),%xmm3 > > + movdqa 0(%r11),%xmm13 > > + movdqu 0(%r10),%xmm4 > > + movdqu 16(%r10),%xmm5 > > + movdqu 32(%r10),%xmm6 > > + movdqu 48(%r10),%xmm7 > > +.byte 102,15,56,0,227 > > +.byte 102,15,56,0,235 > > +.byte 102,15,56,0,243 > > + addq $64,%r10 > > + paddd %xmm13,%xmm4 > > +.byte 102,15,56,0,251 > > + paddd %xmm13,%xmm5 > > + paddd %xmm13,%xmm6 > > + movdqa %xmm4,0(%rsp) > > + psubd %xmm13,%xmm4 > > + movdqa %xmm5,16(%rsp) > > + psubd %xmm13,%xmm5 > > + movdqa %xmm6,32(%rsp) > > + psubd %xmm13,%xmm6 > > + movups -112(%r15),%xmm15 > > + movups 16-112(%r15),%xmm0 > > + jmp .Loop_ssse3 > > +.align 32 > > +.Loop_ssse3: > > + rorl $2,%ebx > > + movups 0(%r12),%xmm14 > > + xorps %xmm15,%xmm14 > > + xorps %xmm14,%xmm2 > > + movups -80(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + pshufd $238,%xmm4,%xmm8 > > + xorl %edx,%esi > > + movdqa %xmm7,%xmm12 > > + paddd %xmm7,%xmm13 > > + movl %eax,%edi > > + addl 0(%rsp),%ebp > > + punpcklqdq %xmm5,%xmm8 > > + xorl %ecx,%ebx > > + roll $5,%eax > > + addl %esi,%ebp > > + psrldq $4,%xmm12 > > + andl %ebx,%edi > > + xorl %ecx,%ebx > > + pxor %xmm4,%xmm8 > > + addl %eax,%ebp > > + rorl $7,%eax > > + pxor %xmm6,%xmm12 > > + xorl %ecx,%edi > > + movl %ebp,%esi > > + addl 4(%rsp),%edx > > + pxor %xmm12,%xmm8 > > + xorl %ebx,%eax > > + roll $5,%ebp > > + movdqa %xmm13,48(%rsp) > > + addl %edi,%edx > > + movups -64(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + andl %eax,%esi > > + movdqa %xmm8,%xmm3 > > + xorl %ebx,%eax > > + addl %ebp,%edx > > + rorl $7,%ebp > > + movdqa %xmm8,%xmm12 > > + xorl %ebx,%esi > > + pslldq $12,%xmm3 > > + paddd %xmm8,%xmm8 > > + movl %edx,%edi > > + addl 8(%rsp),%ecx > > + psrld $31,%xmm12 > > + xorl %eax,%ebp > > + roll $5,%edx > > + addl %esi,%ecx > > + movdqa %xmm3,%xmm13 > > + andl %ebp,%edi > > + xorl %eax,%ebp > > + psrld $30,%xmm3 > > + addl 
%edx,%ecx > > + rorl $7,%edx > > + por %xmm12,%xmm8 > > + xorl %eax,%edi > > + movl %ecx,%esi > > + addl 12(%rsp),%ebx > > + movups -48(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + pslld $2,%xmm13 > > + pxor %xmm3,%xmm8 > > + xorl %ebp,%edx > > + movdqa 0(%r11),%xmm3 > > + roll $5,%ecx > > + addl %edi,%ebx > > + andl %edx,%esi > > + pxor %xmm13,%xmm8 > > + xorl %ebp,%edx > > + addl %ecx,%ebx > > + rorl $7,%ecx > > + pshufd $238,%xmm5,%xmm9 > > + xorl %ebp,%esi > > + movdqa %xmm8,%xmm13 > > + paddd %xmm8,%xmm3 > > + movl %ebx,%edi > > + addl 16(%rsp),%eax > > + punpcklqdq %xmm6,%xmm9 > > + xorl %edx,%ecx > > + roll $5,%ebx > > + addl %esi,%eax > > + psrldq $4,%xmm13 > > + andl %ecx,%edi > > + xorl %edx,%ecx > > + pxor %xmm5,%xmm9 > > + addl %ebx,%eax > > + rorl $7,%ebx > > + movups -32(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + pxor %xmm7,%xmm13 > > + xorl %edx,%edi > > + movl %eax,%esi > > + addl 20(%rsp),%ebp > > + pxor %xmm13,%xmm9 > > + xorl %ecx,%ebx > > + roll $5,%eax > > + movdqa %xmm3,0(%rsp) > > + addl %edi,%ebp > > + andl %ebx,%esi > > + movdqa %xmm9,%xmm12 > > + xorl %ecx,%ebx > > + addl %eax,%ebp > > + rorl $7,%eax > > + movdqa %xmm9,%xmm13 > > + xorl %ecx,%esi > > + pslldq $12,%xmm12 > > + paddd %xmm9,%xmm9 > > + movl %ebp,%edi > > + addl 24(%rsp),%edx > > + psrld $31,%xmm13 > > + xorl %ebx,%eax > > + roll $5,%ebp > > + addl %esi,%edx > > + movups -16(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + movdqa %xmm12,%xmm3 > > + andl %eax,%edi > > + xorl %ebx,%eax > > + psrld $30,%xmm12 > > + addl %ebp,%edx > > + rorl $7,%ebp > > + por %xmm13,%xmm9 > > + xorl %ebx,%edi > > + movl %edx,%esi > > + addl 28(%rsp),%ecx > > + pslld $2,%xmm3 > > + pxor %xmm12,%xmm9 > > + xorl %eax,%ebp > > + movdqa 16(%r11),%xmm12 > > + roll $5,%edx > > + addl %edi,%ecx > > + andl %ebp,%esi > > + pxor %xmm3,%xmm9 > > + xorl %eax,%ebp > > + addl %edx,%ecx > > + rorl $7,%edx > > + pshufd $238,%xmm6,%xmm10 > > + xorl %eax,%esi > > + movdqa %xmm9,%xmm3 > > + paddd %xmm9,%xmm12 > > + movl %ecx,%edi > > + addl 32(%rsp),%ebx > > + movups 0(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + punpcklqdq %xmm7,%xmm10 > > + xorl %ebp,%edx > > + roll $5,%ecx > > + addl %esi,%ebx > > + psrldq $4,%xmm3 > > + andl %edx,%edi > > + xorl %ebp,%edx > > + pxor %xmm6,%xmm10 > > + addl %ecx,%ebx > > + rorl $7,%ecx > > + pxor %xmm8,%xmm3 > > + xorl %ebp,%edi > > + movl %ebx,%esi > > + addl 36(%rsp),%eax > > + pxor %xmm3,%xmm10 > > + xorl %edx,%ecx > > + roll $5,%ebx > > + movdqa %xmm12,16(%rsp) > > + addl %edi,%eax > > + andl %ecx,%esi > > + movdqa %xmm10,%xmm13 > > + xorl %edx,%ecx > > + addl %ebx,%eax > > + rorl $7,%ebx > > + movups 16(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + movdqa %xmm10,%xmm3 > > + xorl %edx,%esi > > + pslldq $12,%xmm13 > > + paddd %xmm10,%xmm10 > > + movl %eax,%edi > > + addl 40(%rsp),%ebp > > + psrld $31,%xmm3 > > + xorl %ecx,%ebx > > + roll $5,%eax > > + addl %esi,%ebp > > + movdqa %xmm13,%xmm12 > > + andl %ebx,%edi > > + xorl %ecx,%ebx > > + psrld $30,%xmm13 > > + addl %eax,%ebp > > + rorl $7,%eax > > + por %xmm3,%xmm10 > > + xorl %ecx,%edi > > + movl %ebp,%esi > > + addl 44(%rsp),%edx > > + pslld $2,%xmm12 > > + pxor %xmm13,%xmm10 > > + xorl %ebx,%eax > > + movdqa 16(%r11),%xmm13 > > + roll $5,%ebp > > + addl %edi,%edx > > + movups 32(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + andl %eax,%esi > > + pxor %xmm12,%xmm10 > > + xorl %ebx,%eax > > + addl %ebp,%edx > > + rorl $7,%ebp > > + pshufd $238,%xmm7,%xmm11 > > + xorl %ebx,%esi > > + movdqa %xmm10,%xmm12 > > + paddd %xmm10,%xmm13 > > + movl 
%edx,%edi > > + addl 48(%rsp),%ecx > > + punpcklqdq %xmm8,%xmm11 > > + xorl %eax,%ebp > > + roll $5,%edx > > + addl %esi,%ecx > > + psrldq $4,%xmm12 > > + andl %ebp,%edi > > + xorl %eax,%ebp > > + pxor %xmm7,%xmm11 > > + addl %edx,%ecx > > + rorl $7,%edx > > + pxor %xmm9,%xmm12 > > + xorl %eax,%edi > > + movl %ecx,%esi > > + addl 52(%rsp),%ebx > > + movups 48(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + pxor %xmm12,%xmm11 > > + xorl %ebp,%edx > > + roll $5,%ecx > > + movdqa %xmm13,32(%rsp) > > + addl %edi,%ebx > > + andl %edx,%esi > > + movdqa %xmm11,%xmm3 > > + xorl %ebp,%edx > > + addl %ecx,%ebx > > + rorl $7,%ecx > > + movdqa %xmm11,%xmm12 > > + xorl %ebp,%esi > > + pslldq $12,%xmm3 > > + paddd %xmm11,%xmm11 > > + movl %ebx,%edi > > + addl 56(%rsp),%eax > > + psrld $31,%xmm12 > > + xorl %edx,%ecx > > + roll $5,%ebx > > + addl %esi,%eax > > + movdqa %xmm3,%xmm13 > > + andl %ecx,%edi > > + xorl %edx,%ecx > > + psrld $30,%xmm3 > > + addl %ebx,%eax > > + rorl $7,%ebx > > + cmpl $11,%r8d > > + jb .Laesenclast1 > > + movups 64(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + movups 80(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + je .Laesenclast1 > > + movups 96(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + movups 112(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > +.Laesenclast1: > > +.byte 102,15,56,221,209 > > + movups 16-112(%r15),%xmm0 > > + por %xmm12,%xmm11 > > + xorl %edx,%edi > > + movl %eax,%esi > > + addl 60(%rsp),%ebp > > + pslld $2,%xmm13 > > + pxor %xmm3,%xmm11 > > + xorl %ecx,%ebx > > + movdqa 16(%r11),%xmm3 > > + roll $5,%eax > > + addl %edi,%ebp > > + andl %ebx,%esi > > + pxor %xmm13,%xmm11 > > + pshufd $238,%xmm10,%xmm13 > > + xorl %ecx,%ebx > > + addl %eax,%ebp > > + rorl $7,%eax > > + pxor %xmm8,%xmm4 > > + xorl %ecx,%esi > > + movl %ebp,%edi > > + addl 0(%rsp),%edx > > + punpcklqdq %xmm11,%xmm13 > > + xorl %ebx,%eax > > + roll $5,%ebp > > + pxor %xmm5,%xmm4 > > + addl %esi,%edx > > + movups 16(%r12),%xmm14 > > + xorps %xmm15,%xmm14 > > + movups %xmm2,0(%r12,%r13,1) > > + xorps %xmm14,%xmm2 > > + movups -80(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + andl %eax,%edi > > + movdqa %xmm3,%xmm12 > > + xorl %ebx,%eax > > + paddd %xmm11,%xmm3 > > + addl %ebp,%edx > > + pxor %xmm13,%xmm4 > > + rorl $7,%ebp > > + xorl %ebx,%edi > > + movl %edx,%esi > > + addl 4(%rsp),%ecx > > + movdqa %xmm4,%xmm13 > > + xorl %eax,%ebp > > + roll $5,%edx > > + movdqa %xmm3,48(%rsp) > > + addl %edi,%ecx > > + andl %ebp,%esi > > + xorl %eax,%ebp > > + pslld $2,%xmm4 > > + addl %edx,%ecx > > + rorl $7,%edx > > + psrld $30,%xmm13 > > + xorl %eax,%esi > > + movl %ecx,%edi > > + addl 8(%rsp),%ebx > > + movups -64(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + por %xmm13,%xmm4 > > + xorl %ebp,%edx > > + roll $5,%ecx > > + pshufd $238,%xmm11,%xmm3 > > + addl %esi,%ebx > > + andl %edx,%edi > > + xorl %ebp,%edx > > + addl %ecx,%ebx > > + addl 12(%rsp),%eax > > + xorl %ebp,%edi > > + movl %ebx,%esi > > + roll $5,%ebx > > + addl %edi,%eax > > + xorl %edx,%esi > > + rorl $7,%ecx > > + addl %ebx,%eax > > + pxor %xmm9,%xmm5 > > + addl 16(%rsp),%ebp > > + movups -48(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + xorl %ecx,%esi > > + punpcklqdq %xmm4,%xmm3 > > + movl %eax,%edi > > + roll $5,%eax > > + pxor %xmm6,%xmm5 > > + addl %esi,%ebp > > + xorl %ecx,%edi > > + movdqa %xmm12,%xmm13 > > + rorl $7,%ebx > > + paddd %xmm4,%xmm12 > > + addl %eax,%ebp > > + pxor %xmm3,%xmm5 > > + addl 20(%rsp),%edx > > + xorl %ebx,%edi > > + movl %ebp,%esi > > + roll $5,%ebp > > + movdqa %xmm5,%xmm3 > > + addl %edi,%edx > > + xorl 
%ebx,%esi > > + movdqa %xmm12,0(%rsp) > > + rorl $7,%eax > > + addl %ebp,%edx > > + addl 24(%rsp),%ecx > > + pslld $2,%xmm5 > > + xorl %eax,%esi > > + movl %edx,%edi > > + psrld $30,%xmm3 > > + roll $5,%edx > > + addl %esi,%ecx > > + movups -32(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + xorl %eax,%edi > > + rorl $7,%ebp > > + por %xmm3,%xmm5 > > + addl %edx,%ecx > > + addl 28(%rsp),%ebx > > + pshufd $238,%xmm4,%xmm12 > > + xorl %ebp,%edi > > + movl %ecx,%esi > > + roll $5,%ecx > > + addl %edi,%ebx > > + xorl %ebp,%esi > > + rorl $7,%edx > > + addl %ecx,%ebx > > + pxor %xmm10,%xmm6 > > + addl 32(%rsp),%eax > > + xorl %edx,%esi > > + punpcklqdq %xmm5,%xmm12 > > + movl %ebx,%edi > > + roll $5,%ebx > > + pxor %xmm7,%xmm6 > > + addl %esi,%eax > > + xorl %edx,%edi > > + movdqa 32(%r11),%xmm3 > > + rorl $7,%ecx > > + paddd %xmm5,%xmm13 > > + addl %ebx,%eax > > + pxor %xmm12,%xmm6 > > + addl 36(%rsp),%ebp > > + movups -16(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + xorl %ecx,%edi > > + movl %eax,%esi > > + roll $5,%eax > > + movdqa %xmm6,%xmm12 > > + addl %edi,%ebp > > + xorl %ecx,%esi > > + movdqa %xmm13,16(%rsp) > > + rorl $7,%ebx > > + addl %eax,%ebp > > + addl 40(%rsp),%edx > > + pslld $2,%xmm6 > > + xorl %ebx,%esi > > + movl %ebp,%edi > > + psrld $30,%xmm12 > > + roll $5,%ebp > > + addl %esi,%edx > > + xorl %ebx,%edi > > + rorl $7,%eax > > + por %xmm12,%xmm6 > > + addl %ebp,%edx > > + addl 44(%rsp),%ecx > > + pshufd $238,%xmm5,%xmm13 > > + xorl %eax,%edi > > + movl %edx,%esi > > + roll $5,%edx > > + addl %edi,%ecx > > + movups 0(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + xorl %eax,%esi > > + rorl $7,%ebp > > + addl %edx,%ecx > > + pxor %xmm11,%xmm7 > > + addl 48(%rsp),%ebx > > + xorl %ebp,%esi > > + punpcklqdq %xmm6,%xmm13 > > + movl %ecx,%edi > > + roll $5,%ecx > > + pxor %xmm8,%xmm7 > > + addl %esi,%ebx > > + xorl %ebp,%edi > > + movdqa %xmm3,%xmm12 > > + rorl $7,%edx > > + paddd %xmm6,%xmm3 > > + addl %ecx,%ebx > > + pxor %xmm13,%xmm7 > > + addl 52(%rsp),%eax > > + xorl %edx,%edi > > + movl %ebx,%esi > > + roll $5,%ebx > > + movdqa %xmm7,%xmm13 > > + addl %edi,%eax > > + xorl %edx,%esi > > + movdqa %xmm3,32(%rsp) > > + rorl $7,%ecx > > + addl %ebx,%eax > > + addl 56(%rsp),%ebp > > + movups 16(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + pslld $2,%xmm7 > > + xorl %ecx,%esi > > + movl %eax,%edi > > + psrld $30,%xmm13 > > + roll $5,%eax > > + addl %esi,%ebp > > + xorl %ecx,%edi > > + rorl $7,%ebx > > + por %xmm13,%xmm7 > > + addl %eax,%ebp > > + addl 60(%rsp),%edx > > + pshufd $238,%xmm6,%xmm3 > > + xorl %ebx,%edi > > + movl %ebp,%esi > > + roll $5,%ebp > > + addl %edi,%edx > > + xorl %ebx,%esi > > + rorl $7,%eax > > + addl %ebp,%edx > > + pxor %xmm4,%xmm8 > > + addl 0(%rsp),%ecx > > + xorl %eax,%esi > > + punpcklqdq %xmm7,%xmm3 > > + movl %edx,%edi > > + roll $5,%edx > > + pxor %xmm9,%xmm8 > > + addl %esi,%ecx > > + movups 32(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + xorl %eax,%edi > > + movdqa %xmm12,%xmm13 > > + rorl $7,%ebp > > + paddd %xmm7,%xmm12 > > + addl %edx,%ecx > > + pxor %xmm3,%xmm8 > > + addl 4(%rsp),%ebx > > + xorl %ebp,%edi > > + movl %ecx,%esi > > + roll $5,%ecx > > + movdqa %xmm8,%xmm3 > > + addl %edi,%ebx > > + xorl %ebp,%esi > > + movdqa %xmm12,48(%rsp) > > + rorl $7,%edx > > + addl %ecx,%ebx > > + addl 8(%rsp),%eax > > + pslld $2,%xmm8 > > + xorl %edx,%esi > > + movl %ebx,%edi > > + psrld $30,%xmm3 > > + roll $5,%ebx > > + addl %esi,%eax > > + xorl %edx,%edi > > + rorl $7,%ecx > > + por %xmm3,%xmm8 > > + addl %ebx,%eax > > + addl 12(%rsp),%ebp > > + 
movups 48(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + pshufd $238,%xmm7,%xmm12 > > + xorl %ecx,%edi > > + movl %eax,%esi > > + roll $5,%eax > > + addl %edi,%ebp > > + xorl %ecx,%esi > > + rorl $7,%ebx > > + addl %eax,%ebp > > + pxor %xmm5,%xmm9 > > + addl 16(%rsp),%edx > > + xorl %ebx,%esi > > + punpcklqdq %xmm8,%xmm12 > > + movl %ebp,%edi > > + roll $5,%ebp > > + pxor %xmm10,%xmm9 > > + addl %esi,%edx > > + xorl %ebx,%edi > > + movdqa %xmm13,%xmm3 > > + rorl $7,%eax > > + paddd %xmm8,%xmm13 > > + addl %ebp,%edx > > + pxor %xmm12,%xmm9 > > + addl 20(%rsp),%ecx > > + xorl %eax,%edi > > + movl %edx,%esi > > + roll $5,%edx > > + movdqa %xmm9,%xmm12 > > + addl %edi,%ecx > > + cmpl $11,%r8d > > + jb .Laesenclast2 > > + movups 64(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + movups 80(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + je .Laesenclast2 > > + movups 96(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + movups 112(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > +.Laesenclast2: > > +.byte 102,15,56,221,209 > > + movups 16-112(%r15),%xmm0 > > + xorl %eax,%esi > > + movdqa %xmm13,0(%rsp) > > + rorl $7,%ebp > > + addl %edx,%ecx > > + addl 24(%rsp),%ebx > > + pslld $2,%xmm9 > > + xorl %ebp,%esi > > + movl %ecx,%edi > > + psrld $30,%xmm12 > > + roll $5,%ecx > > + addl %esi,%ebx > > + xorl %ebp,%edi > > + rorl $7,%edx > > + por %xmm12,%xmm9 > > + addl %ecx,%ebx > > + addl 28(%rsp),%eax > > + pshufd $238,%xmm8,%xmm13 > > + rorl $7,%ecx > > + movl %ebx,%esi > > + xorl %edx,%edi > > + roll $5,%ebx > > + addl %edi,%eax > > + xorl %ecx,%esi > > + xorl %edx,%ecx > > + addl %ebx,%eax > > + pxor %xmm6,%xmm10 > > + addl 32(%rsp),%ebp > > + movups 32(%r12),%xmm14 > > + xorps %xmm15,%xmm14 > > + movups %xmm2,16(%r13,%r12,1) > > + xorps %xmm14,%xmm2 > > + movups -80(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + andl %ecx,%esi > > + xorl %edx,%ecx > > + rorl $7,%ebx > > + punpcklqdq %xmm9,%xmm13 > > + movl %eax,%edi > > + xorl %ecx,%esi > > + pxor %xmm11,%xmm10 > > + roll $5,%eax > > + addl %esi,%ebp > > + movdqa %xmm3,%xmm12 > > + xorl %ebx,%edi > > + paddd %xmm9,%xmm3 > > + xorl %ecx,%ebx > > + pxor %xmm13,%xmm10 > > + addl %eax,%ebp > > + addl 36(%rsp),%edx > > + andl %ebx,%edi > > + xorl %ecx,%ebx > > + rorl $7,%eax > > + movdqa %xmm10,%xmm13 > > + movl %ebp,%esi > > + xorl %ebx,%edi > > + movdqa %xmm3,16(%rsp) > > + roll $5,%ebp > > + addl %edi,%edx > > + movups -64(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + xorl %eax,%esi > > + pslld $2,%xmm10 > > + xorl %ebx,%eax > > + addl %ebp,%edx > > + psrld $30,%xmm13 > > + addl 40(%rsp),%ecx > > + andl %eax,%esi > > + xorl %ebx,%eax > > + por %xmm13,%xmm10 > > + rorl $7,%ebp > > + movl %edx,%edi > > + xorl %eax,%esi > > + roll $5,%edx > > + pshufd $238,%xmm9,%xmm3 > > + addl %esi,%ecx > > + xorl %ebp,%edi > > + xorl %eax,%ebp > > + addl %edx,%ecx > > + addl 44(%rsp),%ebx > > + andl %ebp,%edi > > + xorl %eax,%ebp > > + rorl $7,%edx > > + movups -48(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + movl %ecx,%esi > > + xorl %ebp,%edi > > + roll $5,%ecx > > + addl %edi,%ebx > > + xorl %edx,%esi > > + xorl %ebp,%edx > > + addl %ecx,%ebx > > + pxor %xmm7,%xmm11 > > + addl 48(%rsp),%eax > > + andl %edx,%esi > > + xorl %ebp,%edx > > + rorl $7,%ecx > > + punpcklqdq %xmm10,%xmm3 > > + movl %ebx,%edi > > + xorl %edx,%esi > > + pxor %xmm4,%xmm11 > > + roll $5,%ebx > > + addl %esi,%eax > > + movdqa 48(%r11),%xmm13 > > + xorl %ecx,%edi > > + paddd %xmm10,%xmm12 > > + xorl %edx,%ecx > > + pxor %xmm3,%xmm11 > > + addl %ebx,%eax > > + addl 52(%rsp),%ebp > > + movups 
-32(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + andl %ecx,%edi > > + xorl %edx,%ecx > > + rorl $7,%ebx > > + movdqa %xmm11,%xmm3 > > + movl %eax,%esi > > + xorl %ecx,%edi > > + movdqa %xmm12,32(%rsp) > > + roll $5,%eax > > + addl %edi,%ebp > > + xorl %ebx,%esi > > + pslld $2,%xmm11 > > + xorl %ecx,%ebx > > + addl %eax,%ebp > > + psrld $30,%xmm3 > > + addl 56(%rsp),%edx > > + andl %ebx,%esi > > + xorl %ecx,%ebx > > + por %xmm3,%xmm11 > > + rorl $7,%eax > > + movl %ebp,%edi > > + xorl %ebx,%esi > > + roll $5,%ebp > > + pshufd $238,%xmm10,%xmm12 > > + addl %esi,%edx > > + movups -16(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + xorl %eax,%edi > > + xorl %ebx,%eax > > + addl %ebp,%edx > > + addl 60(%rsp),%ecx > > + andl %eax,%edi > > + xorl %ebx,%eax > > + rorl $7,%ebp > > + movl %edx,%esi > > + xorl %eax,%edi > > + roll $5,%edx > > + addl %edi,%ecx > > + xorl %ebp,%esi > > + xorl %eax,%ebp > > + addl %edx,%ecx > > + pxor %xmm8,%xmm4 > > + addl 0(%rsp),%ebx > > + andl %ebp,%esi > > + xorl %eax,%ebp > > + rorl $7,%edx > > + movups 0(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + punpcklqdq %xmm11,%xmm12 > > + movl %ecx,%edi > > + xorl %ebp,%esi > > + pxor %xmm5,%xmm4 > > + roll $5,%ecx > > + addl %esi,%ebx > > + movdqa %xmm13,%xmm3 > > + xorl %edx,%edi > > + paddd %xmm11,%xmm13 > > + xorl %ebp,%edx > > + pxor %xmm12,%xmm4 > > + addl %ecx,%ebx > > + addl 4(%rsp),%eax > > + andl %edx,%edi > > + xorl %ebp,%edx > > + rorl $7,%ecx > > + movdqa %xmm4,%xmm12 > > + movl %ebx,%esi > > + xorl %edx,%edi > > + movdqa %xmm13,48(%rsp) > > + roll $5,%ebx > > + addl %edi,%eax > > + xorl %ecx,%esi > > + pslld $2,%xmm4 > > + xorl %edx,%ecx > > + addl %ebx,%eax > > + psrld $30,%xmm12 > > + addl 8(%rsp),%ebp > > + movups 16(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + andl %ecx,%esi > > + xorl %edx,%ecx > > + por %xmm12,%xmm4 > > + rorl $7,%ebx > > + movl %eax,%edi > > + xorl %ecx,%esi > > + roll $5,%eax > > + pshufd $238,%xmm11,%xmm13 > > + addl %esi,%ebp > > + xorl %ebx,%edi > > + xorl %ecx,%ebx > > + addl %eax,%ebp > > + addl 12(%rsp),%edx > > + andl %ebx,%edi > > + xorl %ecx,%ebx > > + rorl $7,%eax > > + movl %ebp,%esi > > + xorl %ebx,%edi > > + roll $5,%ebp > > + addl %edi,%edx > > + movups 32(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + xorl %eax,%esi > > + xorl %ebx,%eax > > + addl %ebp,%edx > > + pxor %xmm9,%xmm5 > > + addl 16(%rsp),%ecx > > + andl %eax,%esi > > + xorl %ebx,%eax > > + rorl $7,%ebp > > + punpcklqdq %xmm4,%xmm13 > > + movl %edx,%edi > > + xorl %eax,%esi > > + pxor %xmm6,%xmm5 > > + roll $5,%edx > > + addl %esi,%ecx > > + movdqa %xmm3,%xmm12 > > + xorl %ebp,%edi > > + paddd %xmm4,%xmm3 > > + xorl %eax,%ebp > > + pxor %xmm13,%xmm5 > > + addl %edx,%ecx > > + addl 20(%rsp),%ebx > > + andl %ebp,%edi > > + xorl %eax,%ebp > > + rorl $7,%edx > > + movups 48(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + movdqa %xmm5,%xmm13 > > + movl %ecx,%esi > > + xorl %ebp,%edi > > + movdqa %xmm3,0(%rsp) > > + roll $5,%ecx > > + addl %edi,%ebx > > + xorl %edx,%esi > > + pslld $2,%xmm5 > > + xorl %ebp,%edx > > + addl %ecx,%ebx > > + psrld $30,%xmm13 > > + addl 24(%rsp),%eax > > + andl %edx,%esi > > + xorl %ebp,%edx > > + por %xmm13,%xmm5 > > + rorl $7,%ecx > > + movl %ebx,%edi > > + xorl %edx,%esi > > + roll $5,%ebx > > + pshufd $238,%xmm4,%xmm3 > > + addl %esi,%eax > > + xorl %ecx,%edi > > + xorl %edx,%ecx > > + addl %ebx,%eax > > + addl 28(%rsp),%ebp > > + cmpl $11,%r8d > > + jb .Laesenclast3 > > + movups 64(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + movups 80(%r15),%xmm1 > > +.byte 102,15,56,220,208 
> > + je .Laesenclast3 > > + movups 96(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + movups 112(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > +.Laesenclast3: > > +.byte 102,15,56,221,209 > > + movups 16-112(%r15),%xmm0 > > + andl %ecx,%edi > > + xorl %edx,%ecx > > + rorl $7,%ebx > > + movl %eax,%esi > > + xorl %ecx,%edi > > + roll $5,%eax > > + addl %edi,%ebp > > + xorl %ebx,%esi > > + xorl %ecx,%ebx > > + addl %eax,%ebp > > + pxor %xmm10,%xmm6 > > + addl 32(%rsp),%edx > > + andl %ebx,%esi > > + xorl %ecx,%ebx > > + rorl $7,%eax > > + punpcklqdq %xmm5,%xmm3 > > + movl %ebp,%edi > > + xorl %ebx,%esi > > + pxor %xmm7,%xmm6 > > + roll $5,%ebp > > + addl %esi,%edx > > + movups 48(%r12),%xmm14 > > + xorps %xmm15,%xmm14 > > + movups %xmm2,32(%r13,%r12,1) > > + xorps %xmm14,%xmm2 > > + movups -80(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + movdqa %xmm12,%xmm13 > > + xorl %eax,%edi > > + paddd %xmm5,%xmm12 > > + xorl %ebx,%eax > > + pxor %xmm3,%xmm6 > > + addl %ebp,%edx > > + addl 36(%rsp),%ecx > > + andl %eax,%edi > > + xorl %ebx,%eax > > + rorl $7,%ebp > > + movdqa %xmm6,%xmm3 > > + movl %edx,%esi > > + xorl %eax,%edi > > + movdqa %xmm12,16(%rsp) > > + roll $5,%edx > > + addl %edi,%ecx > > + xorl %ebp,%esi > > + pslld $2,%xmm6 > > + xorl %eax,%ebp > > + addl %edx,%ecx > > + psrld $30,%xmm3 > > + addl 40(%rsp),%ebx > > + andl %ebp,%esi > > + xorl %eax,%ebp > > + por %xmm3,%xmm6 > > + rorl $7,%edx > > + movups -64(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + movl %ecx,%edi > > + xorl %ebp,%esi > > + roll $5,%ecx > > + pshufd $238,%xmm5,%xmm12 > > + addl %esi,%ebx > > + xorl %edx,%edi > > + xorl %ebp,%edx > > + addl %ecx,%ebx > > + addl 44(%rsp),%eax > > + andl %edx,%edi > > + xorl %ebp,%edx > > + rorl $7,%ecx > > + movl %ebx,%esi > > + xorl %edx,%edi > > + roll $5,%ebx > > + addl %edi,%eax > > + xorl %edx,%esi > > + addl %ebx,%eax > > + pxor %xmm11,%xmm7 > > + addl 48(%rsp),%ebp > > + movups -48(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + xorl %ecx,%esi > > + punpcklqdq %xmm6,%xmm12 > > + movl %eax,%edi > > + roll $5,%eax > > + pxor %xmm8,%xmm7 > > + addl %esi,%ebp > > + xorl %ecx,%edi > > + movdqa %xmm13,%xmm3 > > + rorl $7,%ebx > > + paddd %xmm6,%xmm13 > > + addl %eax,%ebp > > + pxor %xmm12,%xmm7 > > + addl 52(%rsp),%edx > > + xorl %ebx,%edi > > + movl %ebp,%esi > > + roll $5,%ebp > > + movdqa %xmm7,%xmm12 > > + addl %edi,%edx > > + xorl %ebx,%esi > > + movdqa %xmm13,32(%rsp) > > + rorl $7,%eax > > + addl %ebp,%edx > > + addl 56(%rsp),%ecx > > + pslld $2,%xmm7 > > + xorl %eax,%esi > > + movl %edx,%edi > > + psrld $30,%xmm12 > > + roll $5,%edx > > + addl %esi,%ecx > > + movups -32(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + xorl %eax,%edi > > + rorl $7,%ebp > > + por %xmm12,%xmm7 > > + addl %edx,%ecx > > + addl 60(%rsp),%ebx > > + xorl %ebp,%edi > > + movl %ecx,%esi > > + roll $5,%ecx > > + addl %edi,%ebx > > + xorl %ebp,%esi > > + rorl $7,%edx > > + addl %ecx,%ebx > > + addl 0(%rsp),%eax > > + xorl %edx,%esi > > + movl %ebx,%edi > > + roll $5,%ebx > > + paddd %xmm7,%xmm3 > > + addl %esi,%eax > > + xorl %edx,%edi > > + movdqa %xmm3,48(%rsp) > > + rorl $7,%ecx > > + addl %ebx,%eax > > + addl 4(%rsp),%ebp > > + movups -16(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + xorl %ecx,%edi > > + movl %eax,%esi > > + roll $5,%eax > > + addl %edi,%ebp > > + xorl %ecx,%esi > > + rorl $7,%ebx > > + addl %eax,%ebp > > + addl 8(%rsp),%edx > > + xorl %ebx,%esi > > + movl %ebp,%edi > > + roll $5,%ebp > > + addl %esi,%edx > > + xorl %ebx,%edi > > + rorl $7,%eax > > + addl %ebp,%edx > > + addl 
12(%rsp),%ecx > > + xorl %eax,%edi > > + movl %edx,%esi > > + roll $5,%edx > > + addl %edi,%ecx > > + movups 0(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + xorl %eax,%esi > > + rorl $7,%ebp > > + addl %edx,%ecx > > + cmpq %r14,%r10 > > + je .Ldone_ssse3 > > + movdqa 64(%r11),%xmm3 > > + movdqa 0(%r11),%xmm13 > > + movdqu 0(%r10),%xmm4 > > + movdqu 16(%r10),%xmm5 > > + movdqu 32(%r10),%xmm6 > > + movdqu 48(%r10),%xmm7 > > +.byte 102,15,56,0,227 > > + addq $64,%r10 > > + addl 16(%rsp),%ebx > > + xorl %ebp,%esi > > + movl %ecx,%edi > > +.byte 102,15,56,0,235 > > + roll $5,%ecx > > + addl %esi,%ebx > > + xorl %ebp,%edi > > + rorl $7,%edx > > + paddd %xmm13,%xmm4 > > + addl %ecx,%ebx > > + addl 20(%rsp),%eax > > + xorl %edx,%edi > > + movl %ebx,%esi > > + movdqa %xmm4,0(%rsp) > > + roll $5,%ebx > > + addl %edi,%eax > > + xorl %edx,%esi > > + rorl $7,%ecx > > + psubd %xmm13,%xmm4 > > + addl %ebx,%eax > > + addl 24(%rsp),%ebp > > + movups 16(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + xorl %ecx,%esi > > + movl %eax,%edi > > + roll $5,%eax > > + addl %esi,%ebp > > + xorl %ecx,%edi > > + rorl $7,%ebx > > + addl %eax,%ebp > > + addl 28(%rsp),%edx > > + xorl %ebx,%edi > > + movl %ebp,%esi > > + roll $5,%ebp > > + addl %edi,%edx > > + xorl %ebx,%esi > > + rorl $7,%eax > > + addl %ebp,%edx > > + addl 32(%rsp),%ecx > > + xorl %eax,%esi > > + movl %edx,%edi > > +.byte 102,15,56,0,243 > > + roll $5,%edx > > + addl %esi,%ecx > > + movups 32(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + xorl %eax,%edi > > + rorl $7,%ebp > > + paddd %xmm13,%xmm5 > > + addl %edx,%ecx > > + addl 36(%rsp),%ebx > > + xorl %ebp,%edi > > + movl %ecx,%esi > > + movdqa %xmm5,16(%rsp) > > + roll $5,%ecx > > + addl %edi,%ebx > > + xorl %ebp,%esi > > + rorl $7,%edx > > + psubd %xmm13,%xmm5 > > + addl %ecx,%ebx > > + addl 40(%rsp),%eax > > + xorl %edx,%esi > > + movl %ebx,%edi > > + roll $5,%ebx > > + addl %esi,%eax > > + xorl %edx,%edi > > + rorl $7,%ecx > > + addl %ebx,%eax > > + addl 44(%rsp),%ebp > > + movups 48(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + xorl %ecx,%edi > > + movl %eax,%esi > > + roll $5,%eax > > + addl %edi,%ebp > > + xorl %ecx,%esi > > + rorl $7,%ebx > > + addl %eax,%ebp > > + addl 48(%rsp),%edx > > + xorl %ebx,%esi > > + movl %ebp,%edi > > +.byte 102,15,56,0,251 > > + roll $5,%ebp > > + addl %esi,%edx > > + xorl %ebx,%edi > > + rorl $7,%eax > > + paddd %xmm13,%xmm6 > > + addl %ebp,%edx > > + addl 52(%rsp),%ecx > > + xorl %eax,%edi > > + movl %edx,%esi > > + movdqa %xmm6,32(%rsp) > > + roll $5,%edx > > + addl %edi,%ecx > > + cmpl $11,%r8d > > + jb .Laesenclast4 > > + movups 64(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + movups 80(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + je .Laesenclast4 > > + movups 96(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + movups 112(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > +.Laesenclast4: > > +.byte 102,15,56,221,209 > > + movups 16-112(%r15),%xmm0 > > + xorl %eax,%esi > > + rorl $7,%ebp > > + psubd %xmm13,%xmm6 > > + addl %edx,%ecx > > + addl 56(%rsp),%ebx > > + xorl %ebp,%esi > > + movl %ecx,%edi > > + roll $5,%ecx > > + addl %esi,%ebx > > + xorl %ebp,%edi > > + rorl $7,%edx > > + addl %ecx,%ebx > > + addl 60(%rsp),%eax > > + xorl %edx,%edi > > + movl %ebx,%esi > > + roll $5,%ebx > > + addl %edi,%eax > > + rorl $7,%ecx > > + addl %ebx,%eax > > + movups %xmm2,48(%r13,%r12,1) > > + leaq 64(%r12),%r12 > > + > > + addl 0(%r9),%eax > > + addl 4(%r9),%esi > > + addl 8(%r9),%ecx > > + addl 12(%r9),%edx > > + movl %eax,0(%r9) > > + addl 16(%r9),%ebp > > + movl %esi,4(%r9) 
> > + movl %esi,%ebx > > + movl %ecx,8(%r9) > > + movl %ecx,%edi > > + movl %edx,12(%r9) > > + xorl %edx,%edi > > + movl %ebp,16(%r9) > > + andl %edi,%esi > > + jmp .Loop_ssse3 > > + > > +.Ldone_ssse3: > > + addl 16(%rsp),%ebx > > + xorl %ebp,%esi > > + movl %ecx,%edi > > + roll $5,%ecx > > + addl %esi,%ebx > > + xorl %ebp,%edi > > + rorl $7,%edx > > + addl %ecx,%ebx > > + addl 20(%rsp),%eax > > + xorl %edx,%edi > > + movl %ebx,%esi > > + roll $5,%ebx > > + addl %edi,%eax > > + xorl %edx,%esi > > + rorl $7,%ecx > > + addl %ebx,%eax > > + addl 24(%rsp),%ebp > > + movups 16(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + xorl %ecx,%esi > > + movl %eax,%edi > > + roll $5,%eax > > + addl %esi,%ebp > > + xorl %ecx,%edi > > + rorl $7,%ebx > > + addl %eax,%ebp > > + addl 28(%rsp),%edx > > + xorl %ebx,%edi > > + movl %ebp,%esi > > + roll $5,%ebp > > + addl %edi,%edx > > + xorl %ebx,%esi > > + rorl $7,%eax > > + addl %ebp,%edx > > + addl 32(%rsp),%ecx > > + xorl %eax,%esi > > + movl %edx,%edi > > + roll $5,%edx > > + addl %esi,%ecx > > + movups 32(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + xorl %eax,%edi > > + rorl $7,%ebp > > + addl %edx,%ecx > > + addl 36(%rsp),%ebx > > + xorl %ebp,%edi > > + movl %ecx,%esi > > + roll $5,%ecx > > + addl %edi,%ebx > > + xorl %ebp,%esi > > + rorl $7,%edx > > + addl %ecx,%ebx > > + addl 40(%rsp),%eax > > + xorl %edx,%esi > > + movl %ebx,%edi > > + roll $5,%ebx > > + addl %esi,%eax > > + xorl %edx,%edi > > + rorl $7,%ecx > > + addl %ebx,%eax > > + addl 44(%rsp),%ebp > > + movups 48(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + xorl %ecx,%edi > > + movl %eax,%esi > > + roll $5,%eax > > + addl %edi,%ebp > > + xorl %ecx,%esi > > + rorl $7,%ebx > > + addl %eax,%ebp > > + addl 48(%rsp),%edx > > + xorl %ebx,%esi > > + movl %ebp,%edi > > + roll $5,%ebp > > + addl %esi,%edx > > + xorl %ebx,%edi > > + rorl $7,%eax > > + addl %ebp,%edx > > + addl 52(%rsp),%ecx > > + xorl %eax,%edi > > + movl %edx,%esi > > + roll $5,%edx > > + addl %edi,%ecx > > + cmpl $11,%r8d > > + jb .Laesenclast5 > > + movups 64(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + movups 80(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > + je .Laesenclast5 > > + movups 96(%r15),%xmm0 > > +.byte 102,15,56,220,209 > > + movups 112(%r15),%xmm1 > > +.byte 102,15,56,220,208 > > +.Laesenclast5: > > +.byte 102,15,56,221,209 > > + movups 16-112(%r15),%xmm0 > > + xorl %eax,%esi > > + rorl $7,%ebp > > + addl %edx,%ecx > > + addl 56(%rsp),%ebx > > + xorl %ebp,%esi > > + movl %ecx,%edi > > + roll $5,%ecx > > + addl %esi,%ebx > > + xorl %ebp,%edi > > + rorl $7,%edx > > + addl %ecx,%ebx > > + addl 60(%rsp),%eax > > + xorl %edx,%edi > > + movl %ebx,%esi > > + roll $5,%ebx > > + addl %edi,%eax > > + rorl $7,%ecx > > + addl %ebx,%eax > > + movups %xmm2,48(%r13,%r12,1) > > + movq 88(%rsp),%r8 > > + > > + addl 0(%r9),%eax > > + addl 4(%r9),%esi > > + addl 8(%r9),%ecx > > + movl %eax,0(%r9) > > + addl 12(%r9),%edx > > + movl %esi,4(%r9) > > + addl 16(%r9),%ebp > > + movl %ecx,8(%r9) > > + movl %edx,12(%r9) > > + movl %ebp,16(%r9) > > + movups %xmm2,(%r8) > > + leaq 104(%rsp),%rsi > > +.cfi_def_cfa %rsi,56 > > + movq 0(%rsi),%r15 > > +.cfi_restore %r15 > > + movq 8(%rsi),%r14 > > +.cfi_restore %r14 > > + movq 16(%rsi),%r13 > > +.cfi_restore %r13 > > + movq 24(%rsi),%r12 > > +.cfi_restore %r12 > > + movq 32(%rsi),%rbp > > +.cfi_restore %rbp > > + movq 40(%rsi),%rbx > > +.cfi_restore %rbx > > + leaq 48(%rsi),%rsp > > +.cfi_def_cfa %rsp,8 > > +.Lepilogue_ssse3: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size 
aesni_cbc_sha1_enc_ssse3,.-aesni_cbc_sha1_enc_ssse3 > > +.align 64 > > +K_XX_XX: > > +.long 0x5a827999,0x5a827999,0x5a827999,0x5a827999 > > +.long 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > > +.long 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > > +.long 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > > +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > +.byte 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0 > > + > > +.byte > > > 65,69,83,78,73,45,67,66,67,43,83,72,65,49,32,115,116,105,116,99,104,32,102, > > > 111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121 > , > > > 32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,6 > 2, > > 0 > > +.align 64 > > +.type aesni_cbc_sha1_enc_shaext,@function > > +.align 32 > > +aesni_cbc_sha1_enc_shaext: > > +.cfi_startproc > > + movq 8(%rsp),%r10 > > + movdqu (%r9),%xmm8 > > + movd 16(%r9),%xmm9 > > + movdqa K_XX_XX+80(%rip),%xmm7 > > + > > + movl 240(%rcx),%r11d > > + subq %rdi,%rsi > > + movups (%rcx),%xmm15 > > + movups (%r8),%xmm2 > > + movups 16(%rcx),%xmm0 > > + leaq 112(%rcx),%rcx > > + > > + pshufd $27,%xmm8,%xmm8 > > + pshufd $27,%xmm9,%xmm9 > > + jmp .Loop_shaext > > + > > +.align 16 > > +.Loop_shaext: > > + movups 0(%rdi),%xmm14 > > + xorps %xmm15,%xmm14 > > + xorps %xmm14,%xmm2 > > + movups -80(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > + movdqu (%r10),%xmm3 > > + movdqa %xmm9,%xmm12 > > +.byte 102,15,56,0,223 > > + movdqu 16(%r10),%xmm4 > > + movdqa %xmm8,%xmm11 > > + movups -64(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,0,231 > > + > > + paddd %xmm3,%xmm9 > > + movdqu 32(%r10),%xmm5 > > + leaq 64(%r10),%r10 > > + pxor %xmm12,%xmm3 > > + movups -48(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > + pxor %xmm12,%xmm3 > > + movdqa %xmm8,%xmm10 > > +.byte 102,15,56,0,239 > > +.byte 69,15,58,204,193,0 > > +.byte 68,15,56,200,212 > > + movups -32(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > +.byte 15,56,201,220 > > + movdqu -16(%r10),%xmm6 > > + movdqa %xmm8,%xmm9 > > +.byte 102,15,56,0,247 > > + movups -16(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > +.byte 69,15,58,204,194,0 > > +.byte 68,15,56,200,205 > > + pxor %xmm5,%xmm3 > > +.byte 15,56,201,229 > > + movups 0(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movdqa %xmm8,%xmm10 > > +.byte 69,15,58,204,193,0 > > +.byte 68,15,56,200,214 > > + movups 16(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > +.byte 15,56,202,222 > > + pxor %xmm6,%xmm4 > > +.byte 15,56,201,238 > > + movups 32(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movdqa %xmm8,%xmm9 > > +.byte 69,15,58,204,194,0 > > +.byte 68,15,56,200,203 > > + movups 48(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > +.byte 15,56,202,227 > > + pxor %xmm3,%xmm5 > > +.byte 15,56,201,243 > > + cmpl $11,%r11d > > + jb .Laesenclast6 > > + movups 64(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movups 80(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > + je .Laesenclast6 > > + movups 96(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movups 112(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > +.Laesenclast6: > > +.byte 102,15,56,221,209 > > + movups 16-112(%rcx),%xmm0 > > + movdqa %xmm8,%xmm10 > > +.byte 69,15,58,204,193,0 > > +.byte 68,15,56,200,212 > > + movups 16(%rdi),%xmm14 > > + xorps %xmm15,%xmm14 > > + movups %xmm2,0(%rsi,%rdi,1) > > + xorps %xmm14,%xmm2 > > + movups -80(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > +.byte 15,56,202,236 > > + pxor %xmm4,%xmm6 > > +.byte 15,56,201,220 > > + movups -64(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movdqa %xmm8,%xmm9 > > +.byte 
69,15,58,204,194,1 > > +.byte 68,15,56,200,205 > > + movups -48(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > +.byte 15,56,202,245 > > + pxor %xmm5,%xmm3 > > +.byte 15,56,201,229 > > + movups -32(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movdqa %xmm8,%xmm10 > > +.byte 69,15,58,204,193,1 > > +.byte 68,15,56,200,214 > > + movups -16(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > +.byte 15,56,202,222 > > + pxor %xmm6,%xmm4 > > +.byte 15,56,201,238 > > + movups 0(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movdqa %xmm8,%xmm9 > > +.byte 69,15,58,204,194,1 > > +.byte 68,15,56,200,203 > > + movups 16(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > +.byte 15,56,202,227 > > + pxor %xmm3,%xmm5 > > +.byte 15,56,201,243 > > + movups 32(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movdqa %xmm8,%xmm10 > > +.byte 69,15,58,204,193,1 > > +.byte 68,15,56,200,212 > > + movups 48(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > +.byte 15,56,202,236 > > + pxor %xmm4,%xmm6 > > +.byte 15,56,201,220 > > + cmpl $11,%r11d > > + jb .Laesenclast7 > > + movups 64(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movups 80(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > + je .Laesenclast7 > > + movups 96(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movups 112(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > +.Laesenclast7: > > +.byte 102,15,56,221,209 > > + movups 16-112(%rcx),%xmm0 > > + movdqa %xmm8,%xmm9 > > +.byte 69,15,58,204,194,1 > > +.byte 68,15,56,200,205 > > + movups 32(%rdi),%xmm14 > > + xorps %xmm15,%xmm14 > > + movups %xmm2,16(%rsi,%rdi,1) > > + xorps %xmm14,%xmm2 > > + movups -80(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > +.byte 15,56,202,245 > > + pxor %xmm5,%xmm3 > > +.byte 15,56,201,229 > > + movups -64(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movdqa %xmm8,%xmm10 > > +.byte 69,15,58,204,193,2 > > +.byte 68,15,56,200,214 > > + movups -48(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > +.byte 15,56,202,222 > > + pxor %xmm6,%xmm4 > > +.byte 15,56,201,238 > > + movups -32(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movdqa %xmm8,%xmm9 > > +.byte 69,15,58,204,194,2 > > +.byte 68,15,56,200,203 > > + movups -16(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > +.byte 15,56,202,227 > > + pxor %xmm3,%xmm5 > > +.byte 15,56,201,243 > > + movups 0(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movdqa %xmm8,%xmm10 > > +.byte 69,15,58,204,193,2 > > +.byte 68,15,56,200,212 > > + movups 16(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > +.byte 15,56,202,236 > > + pxor %xmm4,%xmm6 > > +.byte 15,56,201,220 > > + movups 32(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movdqa %xmm8,%xmm9 > > +.byte 69,15,58,204,194,2 > > +.byte 68,15,56,200,205 > > + movups 48(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > +.byte 15,56,202,245 > > + pxor %xmm5,%xmm3 > > +.byte 15,56,201,229 > > + cmpl $11,%r11d > > + jb .Laesenclast8 > > + movups 64(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movups 80(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > + je .Laesenclast8 > > + movups 96(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movups 112(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > +.Laesenclast8: > > +.byte 102,15,56,221,209 > > + movups 16-112(%rcx),%xmm0 > > + movdqa %xmm8,%xmm10 > > +.byte 69,15,58,204,193,2 > > +.byte 68,15,56,200,214 > > + movups 48(%rdi),%xmm14 > > + xorps %xmm15,%xmm14 > > + movups %xmm2,32(%rsi,%rdi,1) > > + xorps %xmm14,%xmm2 > > + movups -80(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > +.byte 15,56,202,222 > > + pxor %xmm6,%xmm4 > > +.byte 15,56,201,238 > > + movups -64(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + 
movdqa %xmm8,%xmm9 > > +.byte 69,15,58,204,194,3 > > +.byte 68,15,56,200,203 > > + movups -48(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > +.byte 15,56,202,227 > > + pxor %xmm3,%xmm5 > > +.byte 15,56,201,243 > > + movups -32(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movdqa %xmm8,%xmm10 > > +.byte 69,15,58,204,193,3 > > +.byte 68,15,56,200,212 > > +.byte 15,56,202,236 > > + pxor %xmm4,%xmm6 > > + movups -16(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > + movdqa %xmm8,%xmm9 > > +.byte 69,15,58,204,194,3 > > +.byte 68,15,56,200,205 > > +.byte 15,56,202,245 > > + movups 0(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movdqa %xmm12,%xmm5 > > + movdqa %xmm8,%xmm10 > > +.byte 69,15,58,204,193,3 > > +.byte 68,15,56,200,214 > > + movups 16(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > + movdqa %xmm8,%xmm9 > > +.byte 69,15,58,204,194,3 > > +.byte 68,15,56,200,205 > > + movups 32(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movups 48(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > + cmpl $11,%r11d > > + jb .Laesenclast9 > > + movups 64(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movups 80(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > + je .Laesenclast9 > > + movups 96(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > + movups 112(%rcx),%xmm1 > > +.byte 102,15,56,220,208 > > +.Laesenclast9: > > +.byte 102,15,56,221,209 > > + movups 16-112(%rcx),%xmm0 > > + decq %rdx > > + > > + paddd %xmm11,%xmm8 > > + movups %xmm2,48(%rsi,%rdi,1) > > + leaq 64(%rdi),%rdi > > + jnz .Loop_shaext > > + > > + pshufd $27,%xmm8,%xmm8 > > + pshufd $27,%xmm9,%xmm9 > > + movups %xmm2,(%r8) > > + movdqu %xmm8,(%r9) > > + movd %xmm9,16(%r9) > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size aesni_cbc_sha1_enc_shaext,.-aesni_cbc_sha1_enc_shaext > > diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha256- > > x86_64.S > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha256- > > x86_64.S > > new file mode 100644 > > index 0000000000..e257169287 > > --- /dev/null > > +++ > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-sha256-x86_64.S > > @@ -0,0 +1,69 @@ > > +# WARNING: do not edit! > > +# Generated from openssl/crypto/aes/asm/aesni-sha256-x86_64.pl > > +# > > +# Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. > > +# > > +# Licensed under the OpenSSL license (the "License"). You may not use > > +# this file except in compliance with the License. 
You can obtain a copy > > +# in the file LICENSE in the source distribution or at > > +# https://www.openssl.org/source/license.html > > + > > +.text > > + > > + > > +.globl aesni_cbc_sha256_enc > > +.type aesni_cbc_sha256_enc,@function > > +.align 16 > > +aesni_cbc_sha256_enc: > > +.cfi_startproc > > + xorl %eax,%eax > > + cmpq $0,%rdi > > + je .Lprobe > > + ud2 > > +.Lprobe: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size aesni_cbc_sha256_enc,.-aesni_cbc_sha256_enc > > + > > +.align 64 > > +.type K256,@object > > +K256: > > +.long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 > > +.long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 > > +.long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 > > +.long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 > > +.long 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 > > +.long 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 > > +.long 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 > > +.long 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 > > +.long 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc > > +.long 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc > > +.long 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da > > +.long 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da > > +.long 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 > > +.long 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 > > +.long 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 > > +.long 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 > > +.long 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 > > +.long 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 > > +.long 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 > > +.long 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 > > +.long 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 > > +.long 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 > > +.long 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 > > +.long 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 > > +.long 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 > > +.long 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 > > +.long 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 > > +.long 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 > > +.long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 > > +.long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 > > +.long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 > > +.long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 > > + > > +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > +.long 0,0,0,0, 0,0,0,0, -1,-1,-1,-1 > > +.long 0,0,0,0, 0,0,0,0 > > +.byte > > > 65,69,83,78,73,45,67,66,67,43,83,72,65,50,53,54,32,115,116,105,116,99,104, 3 > > > 2,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83,32, > 9 > > > 8,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114, > 1 > > 03,62,0 > > +.align 64 > > diff --git > a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-x86_64.S > > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-x86_64.S > > new file mode 100644 > > index 0000000000..2bdb5cf251 > > --- /dev/null > > +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/aesni-x86_64.S > > @@ -0,0 +1,4484 @@ > > +# WARNING: do not edit! > > +# Generated from openssl/crypto/aes/asm/aesni-x86_64.pl > > +# > > +# Copyright 2009-2020 The OpenSSL Project Authors. All Rights Reserved. > > +# > > +# Licensed under the OpenSSL license (the "License"). You may not use > > +# this file except in compliance with the License. 
You can obtain a copy > > +# in the file LICENSE in the source distribution or at > > +# https://www.openssl.org/source/license.html > > + > > +.text > > + > > +.globl aesni_encrypt > > +.type aesni_encrypt,@function > > +.align 16 > > +aesni_encrypt: > > +.cfi_startproc > > + movups (%rdi),%xmm2 > > + movl 240(%rdx),%eax > > + movups (%rdx),%xmm0 > > + movups 16(%rdx),%xmm1 > > + leaq 32(%rdx),%rdx > > + xorps %xmm0,%xmm2 > > +.Loop_enc1_1: > > +.byte 102,15,56,220,209 > > + decl %eax > > + movups (%rdx),%xmm1 > > + leaq 16(%rdx),%rdx > > + jnz .Loop_enc1_1 > > +.byte 102,15,56,221,209 > > + pxor %xmm0,%xmm0 > > + pxor %xmm1,%xmm1 > > + movups %xmm2,(%rsi) > > + pxor %xmm2,%xmm2 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size aesni_encrypt,.-aesni_encrypt > > + > > +.globl aesni_decrypt > > +.type aesni_decrypt,@function > > +.align 16 > > +aesni_decrypt: > > +.cfi_startproc > > + movups (%rdi),%xmm2 > > + movl 240(%rdx),%eax > > + movups (%rdx),%xmm0 > > + movups 16(%rdx),%xmm1 > > + leaq 32(%rdx),%rdx > > + xorps %xmm0,%xmm2 > > +.Loop_dec1_2: > > +.byte 102,15,56,222,209 > > + decl %eax > > + movups (%rdx),%xmm1 > > + leaq 16(%rdx),%rdx > > + jnz .Loop_dec1_2 > > +.byte 102,15,56,223,209 > > + pxor %xmm0,%xmm0 > > + pxor %xmm1,%xmm1 > > + movups %xmm2,(%rsi) > > + pxor %xmm2,%xmm2 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size aesni_decrypt, .-aesni_decrypt > > +.type _aesni_encrypt2,@function > > +.align 16 > > +_aesni_encrypt2: > > +.cfi_startproc > > + movups (%rcx),%xmm0 > > + shll $4,%eax > > + movups 16(%rcx),%xmm1 > > + xorps %xmm0,%xmm2 > > + xorps %xmm0,%xmm3 > > + movups 32(%rcx),%xmm0 > > + leaq 32(%rcx,%rax,1),%rcx > > + negq %rax > > + addq $16,%rax > > + > > +.Lenc_loop2: > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > + movups (%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > + movups -16(%rcx,%rax,1),%xmm0 > > + jnz .Lenc_loop2 > > + > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,221,208 > > +.byte 102,15,56,221,216 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size _aesni_encrypt2,.-_aesni_encrypt2 > > +.type _aesni_decrypt2,@function > > +.align 16 > > +_aesni_decrypt2: > > +.cfi_startproc > > + movups (%rcx),%xmm0 > > + shll $4,%eax > > + movups 16(%rcx),%xmm1 > > + xorps %xmm0,%xmm2 > > + xorps %xmm0,%xmm3 > > + movups 32(%rcx),%xmm0 > > + leaq 32(%rcx,%rax,1),%rcx > > + negq %rax > > + addq $16,%rax > > + > > +.Ldec_loop2: > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > + movups (%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > + movups -16(%rcx,%rax,1),%xmm0 > > + jnz .Ldec_loop2 > > + > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,223,208 > > +.byte 102,15,56,223,216 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size _aesni_decrypt2,.-_aesni_decrypt2 > > +.type _aesni_encrypt3,@function > > +.align 16 > > +_aesni_encrypt3: > > +.cfi_startproc > > + movups (%rcx),%xmm0 > > + shll $4,%eax > > + movups 16(%rcx),%xmm1 > > + xorps %xmm0,%xmm2 > > + xorps %xmm0,%xmm3 > > + xorps %xmm0,%xmm4 > > + movups 32(%rcx),%xmm0 > > + leaq 32(%rcx,%rax,1),%rcx > > + negq %rax > > + addq $16,%rax > > + > > +.Lenc_loop3: > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > + movups (%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > +.byte 102,15,56,220,224 > > + movups -16(%rcx,%rax,1),%xmm0 > > 
+ jnz .Lenc_loop3 > > + > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,221,208 > > +.byte 102,15,56,221,216 > > +.byte 102,15,56,221,224 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size _aesni_encrypt3,.-_aesni_encrypt3 > > +.type _aesni_decrypt3,@function > > +.align 16 > > +_aesni_decrypt3: > > +.cfi_startproc > > + movups (%rcx),%xmm0 > > + shll $4,%eax > > + movups 16(%rcx),%xmm1 > > + xorps %xmm0,%xmm2 > > + xorps %xmm0,%xmm3 > > + xorps %xmm0,%xmm4 > > + movups 32(%rcx),%xmm0 > > + leaq 32(%rcx,%rax,1),%rcx > > + negq %rax > > + addq $16,%rax > > + > > +.Ldec_loop3: > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > + movups (%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > + movups -16(%rcx,%rax,1),%xmm0 > > + jnz .Ldec_loop3 > > + > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,223,208 > > +.byte 102,15,56,223,216 > > +.byte 102,15,56,223,224 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size _aesni_decrypt3,.-_aesni_decrypt3 > > +.type _aesni_encrypt4,@function > > +.align 16 > > +_aesni_encrypt4: > > +.cfi_startproc > > + movups (%rcx),%xmm0 > > + shll $4,%eax > > + movups 16(%rcx),%xmm1 > > + xorps %xmm0,%xmm2 > > + xorps %xmm0,%xmm3 > > + xorps %xmm0,%xmm4 > > + xorps %xmm0,%xmm5 > > + movups 32(%rcx),%xmm0 > > + leaq 32(%rcx,%rax,1),%rcx > > + negq %rax > > +.byte 0x0f,0x1f,0x00 > > + addq $16,%rax > > + > > +.Lenc_loop4: > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > + movups (%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > + movups -16(%rcx,%rax,1),%xmm0 > > + jnz .Lenc_loop4 > > + > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > +.byte 102,15,56,221,208 > > +.byte 102,15,56,221,216 > > +.byte 102,15,56,221,224 > > +.byte 102,15,56,221,232 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size _aesni_encrypt4,.-_aesni_encrypt4 > > +.type _aesni_decrypt4,@function > > +.align 16 > > +_aesni_decrypt4: > > +.cfi_startproc > > + movups (%rcx),%xmm0 > > + shll $4,%eax > > + movups 16(%rcx),%xmm1 > > + xorps %xmm0,%xmm2 > > + xorps %xmm0,%xmm3 > > + xorps %xmm0,%xmm4 > > + xorps %xmm0,%xmm5 > > + movups 32(%rcx),%xmm0 > > + leaq 32(%rcx,%rax,1),%rcx > > + negq %rax > > +.byte 0x0f,0x1f,0x00 > > + addq $16,%rax > > + > > +.Ldec_loop4: > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > + movups (%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > > + movups -16(%rcx,%rax,1),%xmm0 > > + jnz .Ldec_loop4 > > + > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > +.byte 102,15,56,223,208 > > +.byte 102,15,56,223,216 > > +.byte 102,15,56,223,224 > > +.byte 102,15,56,223,232 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size _aesni_decrypt4,.-_aesni_decrypt4 > > +.type _aesni_encrypt6,@function > > +.align 16 > > +_aesni_encrypt6: > > +.cfi_startproc > > + movups (%rcx),%xmm0 > > + shll $4,%eax > > + movups 16(%rcx),%xmm1 > > + xorps %xmm0,%xmm2 > > + pxor 
%xmm0,%xmm3 > > + pxor %xmm0,%xmm4 > > +.byte 102,15,56,220,209 > > + leaq 32(%rcx,%rax,1),%rcx > > + negq %rax > > +.byte 102,15,56,220,217 > > + pxor %xmm0,%xmm5 > > + pxor %xmm0,%xmm6 > > +.byte 102,15,56,220,225 > > + pxor %xmm0,%xmm7 > > + movups (%rcx,%rax,1),%xmm0 > > + addq $16,%rax > > + jmp .Lenc_loop6_enter > > +.align 16 > > +.Lenc_loop6: > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.Lenc_loop6_enter: > > +.byte 102,15,56,220,233 > > +.byte 102,15,56,220,241 > > +.byte 102,15,56,220,249 > > + movups (%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > +.byte 102,15,56,220,240 > > +.byte 102,15,56,220,248 > > + movups -16(%rcx,%rax,1),%xmm0 > > + jnz .Lenc_loop6 > > + > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > +.byte 102,15,56,220,241 > > +.byte 102,15,56,220,249 > > +.byte 102,15,56,221,208 > > +.byte 102,15,56,221,216 > > +.byte 102,15,56,221,224 > > +.byte 102,15,56,221,232 > > +.byte 102,15,56,221,240 > > +.byte 102,15,56,221,248 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size _aesni_encrypt6,.-_aesni_encrypt6 > > +.type _aesni_decrypt6,@function > > +.align 16 > > +_aesni_decrypt6: > > +.cfi_startproc > > + movups (%rcx),%xmm0 > > + shll $4,%eax > > + movups 16(%rcx),%xmm1 > > + xorps %xmm0,%xmm2 > > + pxor %xmm0,%xmm3 > > + pxor %xmm0,%xmm4 > > +.byte 102,15,56,222,209 > > + leaq 32(%rcx,%rax,1),%rcx > > + negq %rax > > +.byte 102,15,56,222,217 > > + pxor %xmm0,%xmm5 > > + pxor %xmm0,%xmm6 > > +.byte 102,15,56,222,225 > > + pxor %xmm0,%xmm7 > > + movups (%rcx,%rax,1),%xmm0 > > + addq $16,%rax > > + jmp .Ldec_loop6_enter > > +.align 16 > > +.Ldec_loop6: > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.Ldec_loop6_enter: > > +.byte 102,15,56,222,233 > > +.byte 102,15,56,222,241 > > +.byte 102,15,56,222,249 > > + movups (%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > > +.byte 102,15,56,222,240 > > +.byte 102,15,56,222,248 > > + movups -16(%rcx,%rax,1),%xmm0 > > + jnz .Ldec_loop6 > > + > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > +.byte 102,15,56,222,241 > > +.byte 102,15,56,222,249 > > +.byte 102,15,56,223,208 > > +.byte 102,15,56,223,216 > > +.byte 102,15,56,223,224 > > +.byte 102,15,56,223,232 > > +.byte 102,15,56,223,240 > > +.byte 102,15,56,223,248 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size _aesni_decrypt6,.-_aesni_decrypt6 > > +.type _aesni_encrypt8,@function > > +.align 16 > > +_aesni_encrypt8: > > +.cfi_startproc > > + movups (%rcx),%xmm0 > > + shll $4,%eax > > + movups 16(%rcx),%xmm1 > > + xorps %xmm0,%xmm2 > > + xorps %xmm0,%xmm3 > > + pxor %xmm0,%xmm4 > > + pxor %xmm0,%xmm5 > > + pxor %xmm0,%xmm6 > > + leaq 32(%rcx,%rax,1),%rcx > > + negq %rax > > +.byte 102,15,56,220,209 > > + pxor %xmm0,%xmm7 > > + pxor %xmm0,%xmm8 > > +.byte 102,15,56,220,217 > > + pxor %xmm0,%xmm9 > > + movups (%rcx,%rax,1),%xmm0 > > + addq $16,%rax > > + jmp .Lenc_loop8_inner > > +.align 16 > > +.Lenc_loop8: > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.Lenc_loop8_inner: > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > +.byte 102,15,56,220,241 > > +.byte 102,15,56,220,249 > > +.byte 102,68,15,56,220,193 
> > +.byte 102,68,15,56,220,201 > > +.Lenc_loop8_enter: > > + movups (%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > +.byte 102,15,56,220,240 > > +.byte 102,15,56,220,248 > > +.byte 102,68,15,56,220,192 > > +.byte 102,68,15,56,220,200 > > + movups -16(%rcx,%rax,1),%xmm0 > > + jnz .Lenc_loop8 > > + > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > +.byte 102,15,56,220,241 > > +.byte 102,15,56,220,249 > > +.byte 102,68,15,56,220,193 > > +.byte 102,68,15,56,220,201 > > +.byte 102,15,56,221,208 > > +.byte 102,15,56,221,216 > > +.byte 102,15,56,221,224 > > +.byte 102,15,56,221,232 > > +.byte 102,15,56,221,240 > > +.byte 102,15,56,221,248 > > +.byte 102,68,15,56,221,192 > > +.byte 102,68,15,56,221,200 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size _aesni_encrypt8,.-_aesni_encrypt8 > > +.type _aesni_decrypt8,@function > > +.align 16 > > +_aesni_decrypt8: > > +.cfi_startproc > > + movups (%rcx),%xmm0 > > + shll $4,%eax > > + movups 16(%rcx),%xmm1 > > + xorps %xmm0,%xmm2 > > + xorps %xmm0,%xmm3 > > + pxor %xmm0,%xmm4 > > + pxor %xmm0,%xmm5 > > + pxor %xmm0,%xmm6 > > + leaq 32(%rcx,%rax,1),%rcx > > + negq %rax > > +.byte 102,15,56,222,209 > > + pxor %xmm0,%xmm7 > > + pxor %xmm0,%xmm8 > > +.byte 102,15,56,222,217 > > + pxor %xmm0,%xmm9 > > + movups (%rcx,%rax,1),%xmm0 > > + addq $16,%rax > > + jmp .Ldec_loop8_inner > > +.align 16 > > +.Ldec_loop8: > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.Ldec_loop8_inner: > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > +.byte 102,15,56,222,241 > > +.byte 102,15,56,222,249 > > +.byte 102,68,15,56,222,193 > > +.byte 102,68,15,56,222,201 > > +.Ldec_loop8_enter: > > + movups (%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > > +.byte 102,15,56,222,240 > > +.byte 102,15,56,222,248 > > +.byte 102,68,15,56,222,192 > > +.byte 102,68,15,56,222,200 > > + movups -16(%rcx,%rax,1),%xmm0 > > + jnz .Ldec_loop8 > > + > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > +.byte 102,15,56,222,241 > > +.byte 102,15,56,222,249 > > +.byte 102,68,15,56,222,193 > > +.byte 102,68,15,56,222,201 > > +.byte 102,15,56,223,208 > > +.byte 102,15,56,223,216 > > +.byte 102,15,56,223,224 > > +.byte 102,15,56,223,232 > > +.byte 102,15,56,223,240 > > +.byte 102,15,56,223,248 > > +.byte 102,68,15,56,223,192 > > +.byte 102,68,15,56,223,200 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size _aesni_decrypt8,.-_aesni_decrypt8 > > +.globl aesni_ecb_encrypt > > +.type aesni_ecb_encrypt,@function > > +.align 16 > > +aesni_ecb_encrypt: > > +.cfi_startproc > > + andq $-16,%rdx > > + jz .Lecb_ret > > + > > + movl 240(%rcx),%eax > > + movups (%rcx),%xmm0 > > + movq %rcx,%r11 > > + movl %eax,%r10d > > + testl %r8d,%r8d > > + jz .Lecb_decrypt > > + > > + cmpq $0x80,%rdx > > + jb .Lecb_enc_tail > > + > > + movdqu (%rdi),%xmm2 > > + movdqu 16(%rdi),%xmm3 > > + movdqu 32(%rdi),%xmm4 > > + movdqu 48(%rdi),%xmm5 > > + movdqu 64(%rdi),%xmm6 > > + movdqu 80(%rdi),%xmm7 > > + movdqu 96(%rdi),%xmm8 > > + movdqu 112(%rdi),%xmm9 > > + leaq 128(%rdi),%rdi > > + subq $0x80,%rdx > > + jmp .Lecb_enc_loop8_enter > > +.align 16 > > +.Lecb_enc_loop8: > > + movups %xmm2,(%rsi) > > + movq %r11,%rcx > > + movdqu (%rdi),%xmm2 > > + movl %r10d,%eax 
> > + movups %xmm3,16(%rsi) > > + movdqu 16(%rdi),%xmm3 > > + movups %xmm4,32(%rsi) > > + movdqu 32(%rdi),%xmm4 > > + movups %xmm5,48(%rsi) > > + movdqu 48(%rdi),%xmm5 > > + movups %xmm6,64(%rsi) > > + movdqu 64(%rdi),%xmm6 > > + movups %xmm7,80(%rsi) > > + movdqu 80(%rdi),%xmm7 > > + movups %xmm8,96(%rsi) > > + movdqu 96(%rdi),%xmm8 > > + movups %xmm9,112(%rsi) > > + leaq 128(%rsi),%rsi > > + movdqu 112(%rdi),%xmm9 > > + leaq 128(%rdi),%rdi > > +.Lecb_enc_loop8_enter: > > + > > + call _aesni_encrypt8 > > + > > + subq $0x80,%rdx > > + jnc .Lecb_enc_loop8 > > + > > + movups %xmm2,(%rsi) > > + movq %r11,%rcx > > + movups %xmm3,16(%rsi) > > + movl %r10d,%eax > > + movups %xmm4,32(%rsi) > > + movups %xmm5,48(%rsi) > > + movups %xmm6,64(%rsi) > > + movups %xmm7,80(%rsi) > > + movups %xmm8,96(%rsi) > > + movups %xmm9,112(%rsi) > > + leaq 128(%rsi),%rsi > > + addq $0x80,%rdx > > + jz .Lecb_ret > > + > > +.Lecb_enc_tail: > > + movups (%rdi),%xmm2 > > + cmpq $0x20,%rdx > > + jb .Lecb_enc_one > > + movups 16(%rdi),%xmm3 > > + je .Lecb_enc_two > > + movups 32(%rdi),%xmm4 > > + cmpq $0x40,%rdx > > + jb .Lecb_enc_three > > + movups 48(%rdi),%xmm5 > > + je .Lecb_enc_four > > + movups 64(%rdi),%xmm6 > > + cmpq $0x60,%rdx > > + jb .Lecb_enc_five > > + movups 80(%rdi),%xmm7 > > + je .Lecb_enc_six > > + movdqu 96(%rdi),%xmm8 > > + xorps %xmm9,%xmm9 > > + call _aesni_encrypt8 > > + movups %xmm2,(%rsi) > > + movups %xmm3,16(%rsi) > > + movups %xmm4,32(%rsi) > > + movups %xmm5,48(%rsi) > > + movups %xmm6,64(%rsi) > > + movups %xmm7,80(%rsi) > > + movups %xmm8,96(%rsi) > > + jmp .Lecb_ret > > +.align 16 > > +.Lecb_enc_one: > > + movups (%rcx),%xmm0 > > + movups 16(%rcx),%xmm1 > > + leaq 32(%rcx),%rcx > > + xorps %xmm0,%xmm2 > > +.Loop_enc1_3: > > +.byte 102,15,56,220,209 > > + decl %eax > > + movups (%rcx),%xmm1 > > + leaq 16(%rcx),%rcx > > + jnz .Loop_enc1_3 > > +.byte 102,15,56,221,209 > > + movups %xmm2,(%rsi) > > + jmp .Lecb_ret > > +.align 16 > > +.Lecb_enc_two: > > + call _aesni_encrypt2 > > + movups %xmm2,(%rsi) > > + movups %xmm3,16(%rsi) > > + jmp .Lecb_ret > > +.align 16 > > +.Lecb_enc_three: > > + call _aesni_encrypt3 > > + movups %xmm2,(%rsi) > > + movups %xmm3,16(%rsi) > > + movups %xmm4,32(%rsi) > > + jmp .Lecb_ret > > +.align 16 > > +.Lecb_enc_four: > > + call _aesni_encrypt4 > > + movups %xmm2,(%rsi) > > + movups %xmm3,16(%rsi) > > + movups %xmm4,32(%rsi) > > + movups %xmm5,48(%rsi) > > + jmp .Lecb_ret > > +.align 16 > > +.Lecb_enc_five: > > + xorps %xmm7,%xmm7 > > + call _aesni_encrypt6 > > + movups %xmm2,(%rsi) > > + movups %xmm3,16(%rsi) > > + movups %xmm4,32(%rsi) > > + movups %xmm5,48(%rsi) > > + movups %xmm6,64(%rsi) > > + jmp .Lecb_ret > > +.align 16 > > +.Lecb_enc_six: > > + call _aesni_encrypt6 > > + movups %xmm2,(%rsi) > > + movups %xmm3,16(%rsi) > > + movups %xmm4,32(%rsi) > > + movups %xmm5,48(%rsi) > > + movups %xmm6,64(%rsi) > > + movups %xmm7,80(%rsi) > > + jmp .Lecb_ret > > + > > +.align 16 > > +.Lecb_decrypt: > > + cmpq $0x80,%rdx > > + jb .Lecb_dec_tail > > + > > + movdqu (%rdi),%xmm2 > > + movdqu 16(%rdi),%xmm3 > > + movdqu 32(%rdi),%xmm4 > > + movdqu 48(%rdi),%xmm5 > > + movdqu 64(%rdi),%xmm6 > > + movdqu 80(%rdi),%xmm7 > > + movdqu 96(%rdi),%xmm8 > > + movdqu 112(%rdi),%xmm9 > > + leaq 128(%rdi),%rdi > > + subq $0x80,%rdx > > + jmp .Lecb_dec_loop8_enter > > +.align 16 > > +.Lecb_dec_loop8: > > + movups %xmm2,(%rsi) > > + movq %r11,%rcx > > + movdqu (%rdi),%xmm2 > > + movl %r10d,%eax > > + movups %xmm3,16(%rsi) > > + movdqu 16(%rdi),%xmm3 > > + movups %xmm4,32(%rsi) > > + 
movdqu 32(%rdi),%xmm4 > > + movups %xmm5,48(%rsi) > > + movdqu 48(%rdi),%xmm5 > > + movups %xmm6,64(%rsi) > > + movdqu 64(%rdi),%xmm6 > > + movups %xmm7,80(%rsi) > > + movdqu 80(%rdi),%xmm7 > > + movups %xmm8,96(%rsi) > > + movdqu 96(%rdi),%xmm8 > > + movups %xmm9,112(%rsi) > > + leaq 128(%rsi),%rsi > > + movdqu 112(%rdi),%xmm9 > > + leaq 128(%rdi),%rdi > > +.Lecb_dec_loop8_enter: > > + > > + call _aesni_decrypt8 > > + > > + movups (%r11),%xmm0 > > + subq $0x80,%rdx > > + jnc .Lecb_dec_loop8 > > + > > + movups %xmm2,(%rsi) > > + pxor %xmm2,%xmm2 > > + movq %r11,%rcx > > + movups %xmm3,16(%rsi) > > + pxor %xmm3,%xmm3 > > + movl %r10d,%eax > > + movups %xmm4,32(%rsi) > > + pxor %xmm4,%xmm4 > > + movups %xmm5,48(%rsi) > > + pxor %xmm5,%xmm5 > > + movups %xmm6,64(%rsi) > > + pxor %xmm6,%xmm6 > > + movups %xmm7,80(%rsi) > > + pxor %xmm7,%xmm7 > > + movups %xmm8,96(%rsi) > > + pxor %xmm8,%xmm8 > > + movups %xmm9,112(%rsi) > > + pxor %xmm9,%xmm9 > > + leaq 128(%rsi),%rsi > > + addq $0x80,%rdx > > + jz .Lecb_ret > > + > > +.Lecb_dec_tail: > > + movups (%rdi),%xmm2 > > + cmpq $0x20,%rdx > > + jb .Lecb_dec_one > > + movups 16(%rdi),%xmm3 > > + je .Lecb_dec_two > > + movups 32(%rdi),%xmm4 > > + cmpq $0x40,%rdx > > + jb .Lecb_dec_three > > + movups 48(%rdi),%xmm5 > > + je .Lecb_dec_four > > + movups 64(%rdi),%xmm6 > > + cmpq $0x60,%rdx > > + jb .Lecb_dec_five > > + movups 80(%rdi),%xmm7 > > + je .Lecb_dec_six > > + movups 96(%rdi),%xmm8 > > + movups (%rcx),%xmm0 > > + xorps %xmm9,%xmm9 > > + call _aesni_decrypt8 > > + movups %xmm2,(%rsi) > > + pxor %xmm2,%xmm2 > > + movups %xmm3,16(%rsi) > > + pxor %xmm3,%xmm3 > > + movups %xmm4,32(%rsi) > > + pxor %xmm4,%xmm4 > > + movups %xmm5,48(%rsi) > > + pxor %xmm5,%xmm5 > > + movups %xmm6,64(%rsi) > > + pxor %xmm6,%xmm6 > > + movups %xmm7,80(%rsi) > > + pxor %xmm7,%xmm7 > > + movups %xmm8,96(%rsi) > > + pxor %xmm8,%xmm8 > > + pxor %xmm9,%xmm9 > > + jmp .Lecb_ret > > +.align 16 > > +.Lecb_dec_one: > > + movups (%rcx),%xmm0 > > + movups 16(%rcx),%xmm1 > > + leaq 32(%rcx),%rcx > > + xorps %xmm0,%xmm2 > > +.Loop_dec1_4: > > +.byte 102,15,56,222,209 > > + decl %eax > > + movups (%rcx),%xmm1 > > + leaq 16(%rcx),%rcx > > + jnz .Loop_dec1_4 > > +.byte 102,15,56,223,209 > > + movups %xmm2,(%rsi) > > + pxor %xmm2,%xmm2 > > + jmp .Lecb_ret > > +.align 16 > > +.Lecb_dec_two: > > + call _aesni_decrypt2 > > + movups %xmm2,(%rsi) > > + pxor %xmm2,%xmm2 > > + movups %xmm3,16(%rsi) > > + pxor %xmm3,%xmm3 > > + jmp .Lecb_ret > > +.align 16 > > +.Lecb_dec_three: > > + call _aesni_decrypt3 > > + movups %xmm2,(%rsi) > > + pxor %xmm2,%xmm2 > > + movups %xmm3,16(%rsi) > > + pxor %xmm3,%xmm3 > > + movups %xmm4,32(%rsi) > > + pxor %xmm4,%xmm4 > > + jmp .Lecb_ret > > +.align 16 > > +.Lecb_dec_four: > > + call _aesni_decrypt4 > > + movups %xmm2,(%rsi) > > + pxor %xmm2,%xmm2 > > + movups %xmm3,16(%rsi) > > + pxor %xmm3,%xmm3 > > + movups %xmm4,32(%rsi) > > + pxor %xmm4,%xmm4 > > + movups %xmm5,48(%rsi) > > + pxor %xmm5,%xmm5 > > + jmp .Lecb_ret > > +.align 16 > > +.Lecb_dec_five: > > + xorps %xmm7,%xmm7 > > + call _aesni_decrypt6 > > + movups %xmm2,(%rsi) > > + pxor %xmm2,%xmm2 > > + movups %xmm3,16(%rsi) > > + pxor %xmm3,%xmm3 > > + movups %xmm4,32(%rsi) > > + pxor %xmm4,%xmm4 > > + movups %xmm5,48(%rsi) > > + pxor %xmm5,%xmm5 > > + movups %xmm6,64(%rsi) > > + pxor %xmm6,%xmm6 > > + pxor %xmm7,%xmm7 > > + jmp .Lecb_ret > > +.align 16 > > +.Lecb_dec_six: > > + call _aesni_decrypt6 > > + movups %xmm2,(%rsi) > > + pxor %xmm2,%xmm2 > > + movups %xmm3,16(%rsi) > > + pxor %xmm3,%xmm3 > > + 
movups %xmm4,32(%rsi) > > + pxor %xmm4,%xmm4 > > + movups %xmm5,48(%rsi) > > + pxor %xmm5,%xmm5 > > + movups %xmm6,64(%rsi) > > + pxor %xmm6,%xmm6 > > + movups %xmm7,80(%rsi) > > + pxor %xmm7,%xmm7 > > + > > +.Lecb_ret: > > + xorps %xmm0,%xmm0 > > + pxor %xmm1,%xmm1 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size aesni_ecb_encrypt,.-aesni_ecb_encrypt > > +.globl aesni_ccm64_encrypt_blocks > > +.type aesni_ccm64_encrypt_blocks,@function > > +.align 16 > > +aesni_ccm64_encrypt_blocks: > > +.cfi_startproc > > + movl 240(%rcx),%eax > > + movdqu (%r8),%xmm6 > > + movdqa .Lincrement64(%rip),%xmm9 > > + movdqa .Lbswap_mask(%rip),%xmm7 > > + > > + shll $4,%eax > > + movl $16,%r10d > > + leaq 0(%rcx),%r11 > > + movdqu (%r9),%xmm3 > > + movdqa %xmm6,%xmm2 > > + leaq 32(%rcx,%rax,1),%rcx > > +.byte 102,15,56,0,247 > > + subq %rax,%r10 > > + jmp .Lccm64_enc_outer > > +.align 16 > > +.Lccm64_enc_outer: > > + movups (%r11),%xmm0 > > + movq %r10,%rax > > + movups (%rdi),%xmm8 > > + > > + xorps %xmm0,%xmm2 > > + movups 16(%r11),%xmm1 > > + xorps %xmm8,%xmm0 > > + xorps %xmm0,%xmm3 > > + movups 32(%r11),%xmm0 > > + > > +.Lccm64_enc2_loop: > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > + movups (%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > + movups -16(%rcx,%rax,1),%xmm0 > > + jnz .Lccm64_enc2_loop > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > + paddq %xmm9,%xmm6 > > + decq %rdx > > +.byte 102,15,56,221,208 > > +.byte 102,15,56,221,216 > > + > > + leaq 16(%rdi),%rdi > > + xorps %xmm2,%xmm8 > > + movdqa %xmm6,%xmm2 > > + movups %xmm8,(%rsi) > > +.byte 102,15,56,0,215 > > + leaq 16(%rsi),%rsi > > + jnz .Lccm64_enc_outer > > + > > + pxor %xmm0,%xmm0 > > + pxor %xmm1,%xmm1 > > + pxor %xmm2,%xmm2 > > + movups %xmm3,(%r9) > > + pxor %xmm3,%xmm3 > > + pxor %xmm8,%xmm8 > > + pxor %xmm6,%xmm6 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size aesni_ccm64_encrypt_blocks,.-aesni_ccm64_encrypt_blocks > > +.globl aesni_ccm64_decrypt_blocks > > +.type aesni_ccm64_decrypt_blocks,@function > > +.align 16 > > +aesni_ccm64_decrypt_blocks: > > +.cfi_startproc > > + movl 240(%rcx),%eax > > + movups (%r8),%xmm6 > > + movdqu (%r9),%xmm3 > > + movdqa .Lincrement64(%rip),%xmm9 > > + movdqa .Lbswap_mask(%rip),%xmm7 > > + > > + movaps %xmm6,%xmm2 > > + movl %eax,%r10d > > + movq %rcx,%r11 > > +.byte 102,15,56,0,247 > > + movups (%rcx),%xmm0 > > + movups 16(%rcx),%xmm1 > > + leaq 32(%rcx),%rcx > > + xorps %xmm0,%xmm2 > > +.Loop_enc1_5: > > +.byte 102,15,56,220,209 > > + decl %eax > > + movups (%rcx),%xmm1 > > + leaq 16(%rcx),%rcx > > + jnz .Loop_enc1_5 > > +.byte 102,15,56,221,209 > > + shll $4,%r10d > > + movl $16,%eax > > + movups (%rdi),%xmm8 > > + paddq %xmm9,%xmm6 > > + leaq 16(%rdi),%rdi > > + subq %r10,%rax > > + leaq 32(%r11,%r10,1),%rcx > > + movq %rax,%r10 > > + jmp .Lccm64_dec_outer > > +.align 16 > > +.Lccm64_dec_outer: > > + xorps %xmm2,%xmm8 > > + movdqa %xmm6,%xmm2 > > + movups %xmm8,(%rsi) > > + leaq 16(%rsi),%rsi > > +.byte 102,15,56,0,215 > > + > > + subq $1,%rdx > > + jz .Lccm64_dec_break > > + > > + movups (%r11),%xmm0 > > + movq %r10,%rax > > + movups 16(%r11),%xmm1 > > + xorps %xmm0,%xmm8 > > + xorps %xmm0,%xmm2 > > + xorps %xmm8,%xmm3 > > + movups 32(%r11),%xmm0 > > + jmp .Lccm64_dec2_loop > > +.align 16 > > +.Lccm64_dec2_loop: > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > + movups (%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > + movups 
-16(%rcx,%rax,1),%xmm0 > > + jnz .Lccm64_dec2_loop > > + movups (%rdi),%xmm8 > > + paddq %xmm9,%xmm6 > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,221,208 > > +.byte 102,15,56,221,216 > > + leaq 16(%rdi),%rdi > > + jmp .Lccm64_dec_outer > > + > > +.align 16 > > +.Lccm64_dec_break: > > + > > + movl 240(%r11),%eax > > + movups (%r11),%xmm0 > > + movups 16(%r11),%xmm1 > > + xorps %xmm0,%xmm8 > > + leaq 32(%r11),%r11 > > + xorps %xmm8,%xmm3 > > +.Loop_enc1_6: > > +.byte 102,15,56,220,217 > > + decl %eax > > + movups (%r11),%xmm1 > > + leaq 16(%r11),%r11 > > + jnz .Loop_enc1_6 > > +.byte 102,15,56,221,217 > > + pxor %xmm0,%xmm0 > > + pxor %xmm1,%xmm1 > > + pxor %xmm2,%xmm2 > > + movups %xmm3,(%r9) > > + pxor %xmm3,%xmm3 > > + pxor %xmm8,%xmm8 > > + pxor %xmm6,%xmm6 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size aesni_ccm64_decrypt_blocks,.-aesni_ccm64_decrypt_blocks > > +.globl aesni_ctr32_encrypt_blocks > > +.type aesni_ctr32_encrypt_blocks,@function > > +.align 16 > > +aesni_ctr32_encrypt_blocks: > > +.cfi_startproc > > + cmpq $1,%rdx > > + jne .Lctr32_bulk > > + > > + > > + > > + movups (%r8),%xmm2 > > + movups (%rdi),%xmm3 > > + movl 240(%rcx),%edx > > + movups (%rcx),%xmm0 > > + movups 16(%rcx),%xmm1 > > + leaq 32(%rcx),%rcx > > + xorps %xmm0,%xmm2 > > +.Loop_enc1_7: > > +.byte 102,15,56,220,209 > > + decl %edx > > + movups (%rcx),%xmm1 > > + leaq 16(%rcx),%rcx > > + jnz .Loop_enc1_7 > > +.byte 102,15,56,221,209 > > + pxor %xmm0,%xmm0 > > + pxor %xmm1,%xmm1 > > + xorps %xmm3,%xmm2 > > + pxor %xmm3,%xmm3 > > + movups %xmm2,(%rsi) > > + xorps %xmm2,%xmm2 > > + jmp .Lctr32_epilogue > > + > > +.align 16 > > +.Lctr32_bulk: > > + leaq (%rsp),%r11 > > +.cfi_def_cfa_register %r11 > > + pushq %rbp > > +.cfi_offset %rbp,-16 > > + subq $128,%rsp > > + andq $-16,%rsp > > + > > + > > + > > + > > + movdqu (%r8),%xmm2 > > + movdqu (%rcx),%xmm0 > > + movl 12(%r8),%r8d > > + pxor %xmm0,%xmm2 > > + movl 12(%rcx),%ebp > > + movdqa %xmm2,0(%rsp) > > + bswapl %r8d > > + movdqa %xmm2,%xmm3 > > + movdqa %xmm2,%xmm4 > > + movdqa %xmm2,%xmm5 > > + movdqa %xmm2,64(%rsp) > > + movdqa %xmm2,80(%rsp) > > + movdqa %xmm2,96(%rsp) > > + movq %rdx,%r10 > > + movdqa %xmm2,112(%rsp) > > + > > + leaq 1(%r8),%rax > > + leaq 2(%r8),%rdx > > + bswapl %eax > > + bswapl %edx > > + xorl %ebp,%eax > > + xorl %ebp,%edx > > +.byte 102,15,58,34,216,3 > > + leaq 3(%r8),%rax > > + movdqa %xmm3,16(%rsp) > > +.byte 102,15,58,34,226,3 > > + bswapl %eax > > + movq %r10,%rdx > > + leaq 4(%r8),%r10 > > + movdqa %xmm4,32(%rsp) > > + xorl %ebp,%eax > > + bswapl %r10d > > +.byte 102,15,58,34,232,3 > > + xorl %ebp,%r10d > > + movdqa %xmm5,48(%rsp) > > + leaq 5(%r8),%r9 > > + movl %r10d,64+12(%rsp) > > + bswapl %r9d > > + leaq 6(%r8),%r10 > > + movl 240(%rcx),%eax > > + xorl %ebp,%r9d > > + bswapl %r10d > > + movl %r9d,80+12(%rsp) > > + xorl %ebp,%r10d > > + leaq 7(%r8),%r9 > > + movl %r10d,96+12(%rsp) > > + bswapl %r9d > > + movl OPENSSL_ia32cap_P+4(%rip),%r10d > > + xorl %ebp,%r9d > > + andl $71303168,%r10d > > + movl %r9d,112+12(%rsp) > > + > > + movups 16(%rcx),%xmm1 > > + > > + movdqa 64(%rsp),%xmm6 > > + movdqa 80(%rsp),%xmm7 > > + > > + cmpq $8,%rdx > > + jb .Lctr32_tail > > + > > + subq $6,%rdx > > + cmpl $4194304,%r10d > > + je .Lctr32_6x > > + > > + leaq 128(%rcx),%rcx > > + subq $2,%rdx > > + jmp .Lctr32_loop8 > > + > > +.align 16 > > +.Lctr32_6x: > > + shll $4,%eax > > + movl $48,%r10d > > + bswapl %ebp > > + leaq 32(%rcx,%rax,1),%rcx > > + subq %rax,%r10 > > + jmp .Lctr32_loop6 > > + > > +.align 16 
> > +.Lctr32_loop6: > > + addl $6,%r8d > > + movups -48(%rcx,%r10,1),%xmm0 > > +.byte 102,15,56,220,209 > > + movl %r8d,%eax > > + xorl %ebp,%eax > > +.byte 102,15,56,220,217 > > +.byte 0x0f,0x38,0xf1,0x44,0x24,12 > > + leal 1(%r8),%eax > > +.byte 102,15,56,220,225 > > + xorl %ebp,%eax > > +.byte 0x0f,0x38,0xf1,0x44,0x24,28 > > +.byte 102,15,56,220,233 > > + leal 2(%r8),%eax > > + xorl %ebp,%eax > > +.byte 102,15,56,220,241 > > +.byte 0x0f,0x38,0xf1,0x44,0x24,44 > > + leal 3(%r8),%eax > > +.byte 102,15,56,220,249 > > + movups -32(%rcx,%r10,1),%xmm1 > > + xorl %ebp,%eax > > + > > +.byte 102,15,56,220,208 > > +.byte 0x0f,0x38,0xf1,0x44,0x24,60 > > + leal 4(%r8),%eax > > +.byte 102,15,56,220,216 > > + xorl %ebp,%eax > > +.byte 0x0f,0x38,0xf1,0x44,0x24,76 > > +.byte 102,15,56,220,224 > > + leal 5(%r8),%eax > > + xorl %ebp,%eax > > +.byte 102,15,56,220,232 > > +.byte 0x0f,0x38,0xf1,0x44,0x24,92 > > + movq %r10,%rax > > +.byte 102,15,56,220,240 > > +.byte 102,15,56,220,248 > > + movups -16(%rcx,%r10,1),%xmm0 > > + > > + call .Lenc_loop6 > > + > > + movdqu (%rdi),%xmm8 > > + movdqu 16(%rdi),%xmm9 > > + movdqu 32(%rdi),%xmm10 > > + movdqu 48(%rdi),%xmm11 > > + movdqu 64(%rdi),%xmm12 > > + movdqu 80(%rdi),%xmm13 > > + leaq 96(%rdi),%rdi > > + movups -64(%rcx,%r10,1),%xmm1 > > + pxor %xmm2,%xmm8 > > + movaps 0(%rsp),%xmm2 > > + pxor %xmm3,%xmm9 > > + movaps 16(%rsp),%xmm3 > > + pxor %xmm4,%xmm10 > > + movaps 32(%rsp),%xmm4 > > + pxor %xmm5,%xmm11 > > + movaps 48(%rsp),%xmm5 > > + pxor %xmm6,%xmm12 > > + movaps 64(%rsp),%xmm6 > > + pxor %xmm7,%xmm13 > > + movaps 80(%rsp),%xmm7 > > + movdqu %xmm8,(%rsi) > > + movdqu %xmm9,16(%rsi) > > + movdqu %xmm10,32(%rsi) > > + movdqu %xmm11,48(%rsi) > > + movdqu %xmm12,64(%rsi) > > + movdqu %xmm13,80(%rsi) > > + leaq 96(%rsi),%rsi > > + > > + subq $6,%rdx > > + jnc .Lctr32_loop6 > > + > > + addq $6,%rdx > > + jz .Lctr32_done > > + > > + leal -48(%r10),%eax > > + leaq -80(%rcx,%r10,1),%rcx > > + negl %eax > > + shrl $4,%eax > > + jmp .Lctr32_tail > > + > > +.align 32 > > +.Lctr32_loop8: > > + addl $8,%r8d > > + movdqa 96(%rsp),%xmm8 > > +.byte 102,15,56,220,209 > > + movl %r8d,%r9d > > + movdqa 112(%rsp),%xmm9 > > +.byte 102,15,56,220,217 > > + bswapl %r9d > > + movups 32-128(%rcx),%xmm0 > > +.byte 102,15,56,220,225 > > + xorl %ebp,%r9d > > + nop > > +.byte 102,15,56,220,233 > > + movl %r9d,0+12(%rsp) > > + leaq 1(%r8),%r9 > > +.byte 102,15,56,220,241 > > +.byte 102,15,56,220,249 > > +.byte 102,68,15,56,220,193 > > +.byte 102,68,15,56,220,201 > > + movups 48-128(%rcx),%xmm1 > > + bswapl %r9d > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > + xorl %ebp,%r9d > > +.byte 0x66,0x90 > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > + movl %r9d,16+12(%rsp) > > + leaq 2(%r8),%r9 > > +.byte 102,15,56,220,240 > > +.byte 102,15,56,220,248 > > +.byte 102,68,15,56,220,192 > > +.byte 102,68,15,56,220,200 > > + movups 64-128(%rcx),%xmm0 > > + bswapl %r9d > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > + xorl %ebp,%r9d > > +.byte 0x66,0x90 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > + movl %r9d,32+12(%rsp) > > + leaq 3(%r8),%r9 > > +.byte 102,15,56,220,241 > > +.byte 102,15,56,220,249 > > +.byte 102,68,15,56,220,193 > > +.byte 102,68,15,56,220,201 > > + movups 80-128(%rcx),%xmm1 > > + bswapl %r9d > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > + xorl %ebp,%r9d > > +.byte 0x66,0x90 > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > + movl %r9d,48+12(%rsp) > > + leaq 4(%r8),%r9 > > +.byte 
102,15,56,220,240 > > +.byte 102,15,56,220,248 > > +.byte 102,68,15,56,220,192 > > +.byte 102,68,15,56,220,200 > > + movups 96-128(%rcx),%xmm0 > > + bswapl %r9d > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > + xorl %ebp,%r9d > > +.byte 0x66,0x90 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > + movl %r9d,64+12(%rsp) > > + leaq 5(%r8),%r9 > > +.byte 102,15,56,220,241 > > +.byte 102,15,56,220,249 > > +.byte 102,68,15,56,220,193 > > +.byte 102,68,15,56,220,201 > > + movups 112-128(%rcx),%xmm1 > > + bswapl %r9d > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > + xorl %ebp,%r9d > > +.byte 0x66,0x90 > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > + movl %r9d,80+12(%rsp) > > + leaq 6(%r8),%r9 > > +.byte 102,15,56,220,240 > > +.byte 102,15,56,220,248 > > +.byte 102,68,15,56,220,192 > > +.byte 102,68,15,56,220,200 > > + movups 128-128(%rcx),%xmm0 > > + bswapl %r9d > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > + xorl %ebp,%r9d > > +.byte 0x66,0x90 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > + movl %r9d,96+12(%rsp) > > + leaq 7(%r8),%r9 > > +.byte 102,15,56,220,241 > > +.byte 102,15,56,220,249 > > +.byte 102,68,15,56,220,193 > > +.byte 102,68,15,56,220,201 > > + movups 144-128(%rcx),%xmm1 > > + bswapl %r9d > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > +.byte 102,15,56,220,224 > > + xorl %ebp,%r9d > > + movdqu 0(%rdi),%xmm10 > > +.byte 102,15,56,220,232 > > + movl %r9d,112+12(%rsp) > > + cmpl $11,%eax > > +.byte 102,15,56,220,240 > > +.byte 102,15,56,220,248 > > +.byte 102,68,15,56,220,192 > > +.byte 102,68,15,56,220,200 > > + movups 160-128(%rcx),%xmm0 > > + > > + jb .Lctr32_enc_done > > + > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > +.byte 102,15,56,220,241 > > +.byte 102,15,56,220,249 > > +.byte 102,68,15,56,220,193 > > +.byte 102,68,15,56,220,201 > > + movups 176-128(%rcx),%xmm1 > > + > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > +.byte 102,15,56,220,240 > > +.byte 102,15,56,220,248 > > +.byte 102,68,15,56,220,192 > > +.byte 102,68,15,56,220,200 > > + movups 192-128(%rcx),%xmm0 > > + je .Lctr32_enc_done > > + > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > +.byte 102,15,56,220,241 > > +.byte 102,15,56,220,249 > > +.byte 102,68,15,56,220,193 > > +.byte 102,68,15,56,220,201 > > + movups 208-128(%rcx),%xmm1 > > + > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > +.byte 102,15,56,220,240 > > +.byte 102,15,56,220,248 > > +.byte 102,68,15,56,220,192 > > +.byte 102,68,15,56,220,200 > > + movups 224-128(%rcx),%xmm0 > > + jmp .Lctr32_enc_done > > + > > +.align 16 > > +.Lctr32_enc_done: > > + movdqu 16(%rdi),%xmm11 > > + pxor %xmm0,%xmm10 > > + movdqu 32(%rdi),%xmm12 > > + pxor %xmm0,%xmm11 > > + movdqu 48(%rdi),%xmm13 > > + pxor %xmm0,%xmm12 > > + movdqu 64(%rdi),%xmm14 > > + pxor %xmm0,%xmm13 > > + movdqu 80(%rdi),%xmm15 > > + pxor %xmm0,%xmm14 > > + pxor %xmm0,%xmm15 > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > +.byte 102,15,56,220,241 > > +.byte 102,15,56,220,249 > > +.byte 102,68,15,56,220,193 > > +.byte 102,68,15,56,220,201 > > + movdqu 96(%rdi),%xmm1 > > + leaq 128(%rdi),%rdi > > + > > +.byte 102,65,15,56,221,210 > > + pxor %xmm0,%xmm1 > 
> + movdqu 112-128(%rdi),%xmm10 > > +.byte 102,65,15,56,221,219 > > + pxor %xmm0,%xmm10 > > + movdqa 0(%rsp),%xmm11 > > +.byte 102,65,15,56,221,228 > > +.byte 102,65,15,56,221,237 > > + movdqa 16(%rsp),%xmm12 > > + movdqa 32(%rsp),%xmm13 > > +.byte 102,65,15,56,221,246 > > +.byte 102,65,15,56,221,255 > > + movdqa 48(%rsp),%xmm14 > > + movdqa 64(%rsp),%xmm15 > > +.byte 102,68,15,56,221,193 > > + movdqa 80(%rsp),%xmm0 > > + movups 16-128(%rcx),%xmm1 > > +.byte 102,69,15,56,221,202 > > + > > + movups %xmm2,(%rsi) > > + movdqa %xmm11,%xmm2 > > + movups %xmm3,16(%rsi) > > + movdqa %xmm12,%xmm3 > > + movups %xmm4,32(%rsi) > > + movdqa %xmm13,%xmm4 > > + movups %xmm5,48(%rsi) > > + movdqa %xmm14,%xmm5 > > + movups %xmm6,64(%rsi) > > + movdqa %xmm15,%xmm6 > > + movups %xmm7,80(%rsi) > > + movdqa %xmm0,%xmm7 > > + movups %xmm8,96(%rsi) > > + movups %xmm9,112(%rsi) > > + leaq 128(%rsi),%rsi > > + > > + subq $8,%rdx > > + jnc .Lctr32_loop8 > > + > > + addq $8,%rdx > > + jz .Lctr32_done > > + leaq -128(%rcx),%rcx > > + > > +.Lctr32_tail: > > + > > + > > + leaq 16(%rcx),%rcx > > + cmpq $4,%rdx > > + jb .Lctr32_loop3 > > + je .Lctr32_loop4 > > + > > + > > + shll $4,%eax > > + movdqa 96(%rsp),%xmm8 > > + pxor %xmm9,%xmm9 > > + > > + movups 16(%rcx),%xmm0 > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > + leaq 32-16(%rcx,%rax,1),%rcx > > + negq %rax > > +.byte 102,15,56,220,225 > > + addq $16,%rax > > + movups (%rdi),%xmm10 > > +.byte 102,15,56,220,233 > > +.byte 102,15,56,220,241 > > + movups 16(%rdi),%xmm11 > > + movups 32(%rdi),%xmm12 > > +.byte 102,15,56,220,249 > > +.byte 102,68,15,56,220,193 > > + > > + call .Lenc_loop8_enter > > + > > + movdqu 48(%rdi),%xmm13 > > + pxor %xmm10,%xmm2 > > + movdqu 64(%rdi),%xmm10 > > + pxor %xmm11,%xmm3 > > + movdqu %xmm2,(%rsi) > > + pxor %xmm12,%xmm4 > > + movdqu %xmm3,16(%rsi) > > + pxor %xmm13,%xmm5 > > + movdqu %xmm4,32(%rsi) > > + pxor %xmm10,%xmm6 > > + movdqu %xmm5,48(%rsi) > > + movdqu %xmm6,64(%rsi) > > + cmpq $6,%rdx > > + jb .Lctr32_done > > + > > + movups 80(%rdi),%xmm11 > > + xorps %xmm11,%xmm7 > > + movups %xmm7,80(%rsi) > > + je .Lctr32_done > > + > > + movups 96(%rdi),%xmm12 > > + xorps %xmm12,%xmm8 > > + movups %xmm8,96(%rsi) > > + jmp .Lctr32_done > > + > > +.align 32 > > +.Lctr32_loop4: > > +.byte 102,15,56,220,209 > > + leaq 16(%rcx),%rcx > > + decl %eax > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > + movups (%rcx),%xmm1 > > + jnz .Lctr32_loop4 > > +.byte 102,15,56,221,209 > > +.byte 102,15,56,221,217 > > + movups (%rdi),%xmm10 > > + movups 16(%rdi),%xmm11 > > +.byte 102,15,56,221,225 > > +.byte 102,15,56,221,233 > > + movups 32(%rdi),%xmm12 > > + movups 48(%rdi),%xmm13 > > + > > + xorps %xmm10,%xmm2 > > + movups %xmm2,(%rsi) > > + xorps %xmm11,%xmm3 > > + movups %xmm3,16(%rsi) > > + pxor %xmm12,%xmm4 > > + movdqu %xmm4,32(%rsi) > > + pxor %xmm13,%xmm5 > > + movdqu %xmm5,48(%rsi) > > + jmp .Lctr32_done > > + > > +.align 32 > > +.Lctr32_loop3: > > +.byte 102,15,56,220,209 > > + leaq 16(%rcx),%rcx > > + decl %eax > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > + movups (%rcx),%xmm1 > > + jnz .Lctr32_loop3 > > +.byte 102,15,56,221,209 > > +.byte 102,15,56,221,217 > > +.byte 102,15,56,221,225 > > + > > + movups (%rdi),%xmm10 > > + xorps %xmm10,%xmm2 > > + movups %xmm2,(%rsi) > > + cmpq $2,%rdx > > + jb .Lctr32_done > > + > > + movups 16(%rdi),%xmm11 > > + xorps %xmm11,%xmm3 > > + movups %xmm3,16(%rsi) > > + je .Lctr32_done > > + > > + movups 32(%rdi),%xmm12 > > + xorps 
%xmm12,%xmm4 > > + movups %xmm4,32(%rsi) > > + > > +.Lctr32_done: > > + xorps %xmm0,%xmm0 > > + xorl %ebp,%ebp > > + pxor %xmm1,%xmm1 > > + pxor %xmm2,%xmm2 > > + pxor %xmm3,%xmm3 > > + pxor %xmm4,%xmm4 > > + pxor %xmm5,%xmm5 > > + pxor %xmm6,%xmm6 > > + pxor %xmm7,%xmm7 > > + movaps %xmm0,0(%rsp) > > + pxor %xmm8,%xmm8 > > + movaps %xmm0,16(%rsp) > > + pxor %xmm9,%xmm9 > > + movaps %xmm0,32(%rsp) > > + pxor %xmm10,%xmm10 > > + movaps %xmm0,48(%rsp) > > + pxor %xmm11,%xmm11 > > + movaps %xmm0,64(%rsp) > > + pxor %xmm12,%xmm12 > > + movaps %xmm0,80(%rsp) > > + pxor %xmm13,%xmm13 > > + movaps %xmm0,96(%rsp) > > + pxor %xmm14,%xmm14 > > + movaps %xmm0,112(%rsp) > > + pxor %xmm15,%xmm15 > > + movq -8(%r11),%rbp > > +.cfi_restore %rbp > > + leaq (%r11),%rsp > > +.cfi_def_cfa_register %rsp > > +.Lctr32_epilogue: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size aesni_ctr32_encrypt_blocks,.-aesni_ctr32_encrypt_blocks > > +.globl aesni_xts_encrypt > > +.type aesni_xts_encrypt,@function > > +.align 16 > > +aesni_xts_encrypt: > > +.cfi_startproc > > + leaq (%rsp),%r11 > > +.cfi_def_cfa_register %r11 > > + pushq %rbp > > +.cfi_offset %rbp,-16 > > + subq $112,%rsp > > + andq $-16,%rsp > > + movups (%r9),%xmm2 > > + movl 240(%r8),%eax > > + movl 240(%rcx),%r10d > > + movups (%r8),%xmm0 > > + movups 16(%r8),%xmm1 > > + leaq 32(%r8),%r8 > > + xorps %xmm0,%xmm2 > > +.Loop_enc1_8: > > +.byte 102,15,56,220,209 > > + decl %eax > > + movups (%r8),%xmm1 > > + leaq 16(%r8),%r8 > > + jnz .Loop_enc1_8 > > +.byte 102,15,56,221,209 > > + movups (%rcx),%xmm0 > > + movq %rcx,%rbp > > + movl %r10d,%eax > > + shll $4,%r10d > > + movq %rdx,%r9 > > + andq $-16,%rdx > > + > > + movups 16(%rcx,%r10,1),%xmm1 > > + > > + movdqa .Lxts_magic(%rip),%xmm8 > > + movdqa %xmm2,%xmm15 > > + pshufd $0x5f,%xmm2,%xmm9 > > + pxor %xmm0,%xmm1 > > + movdqa %xmm9,%xmm14 > > + paddd %xmm9,%xmm9 > > + movdqa %xmm15,%xmm10 > > + psrad $31,%xmm14 > > + paddq %xmm15,%xmm15 > > + pand %xmm8,%xmm14 > > + pxor %xmm0,%xmm10 > > + pxor %xmm14,%xmm15 > > + movdqa %xmm9,%xmm14 > > + paddd %xmm9,%xmm9 > > + movdqa %xmm15,%xmm11 > > + psrad $31,%xmm14 > > + paddq %xmm15,%xmm15 > > + pand %xmm8,%xmm14 > > + pxor %xmm0,%xmm11 > > + pxor %xmm14,%xmm15 > > + movdqa %xmm9,%xmm14 > > + paddd %xmm9,%xmm9 > > + movdqa %xmm15,%xmm12 > > + psrad $31,%xmm14 > > + paddq %xmm15,%xmm15 > > + pand %xmm8,%xmm14 > > + pxor %xmm0,%xmm12 > > + pxor %xmm14,%xmm15 > > + movdqa %xmm9,%xmm14 > > + paddd %xmm9,%xmm9 > > + movdqa %xmm15,%xmm13 > > + psrad $31,%xmm14 > > + paddq %xmm15,%xmm15 > > + pand %xmm8,%xmm14 > > + pxor %xmm0,%xmm13 > > + pxor %xmm14,%xmm15 > > + movdqa %xmm15,%xmm14 > > + psrad $31,%xmm9 > > + paddq %xmm15,%xmm15 > > + pand %xmm8,%xmm9 > > + pxor %xmm0,%xmm14 > > + pxor %xmm9,%xmm15 > > + movaps %xmm1,96(%rsp) > > + > > + subq $96,%rdx > > + jc .Lxts_enc_short > > + > > + movl $16+96,%eax > > + leaq 32(%rbp,%r10,1),%rcx > > + subq %r10,%rax > > + movups 16(%rbp),%xmm1 > > + movq %rax,%r10 > > + leaq .Lxts_magic(%rip),%r8 > > + jmp .Lxts_enc_grandloop > > + > > +.align 32 > > +.Lxts_enc_grandloop: > > + movdqu 0(%rdi),%xmm2 > > + movdqa %xmm0,%xmm8 > > + movdqu 16(%rdi),%xmm3 > > + pxor %xmm10,%xmm2 > > + movdqu 32(%rdi),%xmm4 > > + pxor %xmm11,%xmm3 > > +.byte 102,15,56,220,209 > > + movdqu 48(%rdi),%xmm5 > > + pxor %xmm12,%xmm4 > > +.byte 102,15,56,220,217 > > + movdqu 64(%rdi),%xmm6 > > + pxor %xmm13,%xmm5 > > +.byte 102,15,56,220,225 > > + movdqu 80(%rdi),%xmm7 > > + pxor %xmm15,%xmm8 > > + movdqa 96(%rsp),%xmm9 > > + pxor %xmm14,%xmm6 > > +.byte 
102,15,56,220,233 > > + movups 32(%rbp),%xmm0 > > + leaq 96(%rdi),%rdi > > + pxor %xmm8,%xmm7 > > + > > + pxor %xmm9,%xmm10 > > +.byte 102,15,56,220,241 > > + pxor %xmm9,%xmm11 > > + movdqa %xmm10,0(%rsp) > > +.byte 102,15,56,220,249 > > + movups 48(%rbp),%xmm1 > > + pxor %xmm9,%xmm12 > > + > > +.byte 102,15,56,220,208 > > + pxor %xmm9,%xmm13 > > + movdqa %xmm11,16(%rsp) > > +.byte 102,15,56,220,216 > > + pxor %xmm9,%xmm14 > > + movdqa %xmm12,32(%rsp) > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > + pxor %xmm9,%xmm8 > > + movdqa %xmm14,64(%rsp) > > +.byte 102,15,56,220,240 > > +.byte 102,15,56,220,248 > > + movups 64(%rbp),%xmm0 > > + movdqa %xmm8,80(%rsp) > > + pshufd $0x5f,%xmm15,%xmm9 > > + jmp .Lxts_enc_loop6 > > +.align 32 > > +.Lxts_enc_loop6: > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > +.byte 102,15,56,220,241 > > +.byte 102,15,56,220,249 > > + movups -64(%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > + > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > +.byte 102,15,56,220,240 > > +.byte 102,15,56,220,248 > > + movups -80(%rcx,%rax,1),%xmm0 > > + jnz .Lxts_enc_loop6 > > + > > + movdqa (%r8),%xmm8 > > + movdqa %xmm9,%xmm14 > > + paddd %xmm9,%xmm9 > > +.byte 102,15,56,220,209 > > + paddq %xmm15,%xmm15 > > + psrad $31,%xmm14 > > +.byte 102,15,56,220,217 > > + pand %xmm8,%xmm14 > > + movups (%rbp),%xmm10 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > +.byte 102,15,56,220,241 > > + pxor %xmm14,%xmm15 > > + movaps %xmm10,%xmm11 > > +.byte 102,15,56,220,249 > > + movups -64(%rcx),%xmm1 > > + > > + movdqa %xmm9,%xmm14 > > +.byte 102,15,56,220,208 > > + paddd %xmm9,%xmm9 > > + pxor %xmm15,%xmm10 > > +.byte 102,15,56,220,216 > > + psrad $31,%xmm14 > > + paddq %xmm15,%xmm15 > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > + pand %xmm8,%xmm14 > > + movaps %xmm11,%xmm12 > > +.byte 102,15,56,220,240 > > + pxor %xmm14,%xmm15 > > + movdqa %xmm9,%xmm14 > > +.byte 102,15,56,220,248 > > + movups -48(%rcx),%xmm0 > > + > > + paddd %xmm9,%xmm9 > > +.byte 102,15,56,220,209 > > + pxor %xmm15,%xmm11 > > + psrad $31,%xmm14 > > +.byte 102,15,56,220,217 > > + paddq %xmm15,%xmm15 > > + pand %xmm8,%xmm14 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > + movdqa %xmm13,48(%rsp) > > + pxor %xmm14,%xmm15 > > +.byte 102,15,56,220,241 > > + movaps %xmm12,%xmm13 > > + movdqa %xmm9,%xmm14 > > +.byte 102,15,56,220,249 > > + movups -32(%rcx),%xmm1 > > + > > + paddd %xmm9,%xmm9 > > +.byte 102,15,56,220,208 > > + pxor %xmm15,%xmm12 > > + psrad $31,%xmm14 > > +.byte 102,15,56,220,216 > > + paddq %xmm15,%xmm15 > > + pand %xmm8,%xmm14 > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > +.byte 102,15,56,220,240 > > + pxor %xmm14,%xmm15 > > + movaps %xmm13,%xmm14 > > +.byte 102,15,56,220,248 > > + > > + movdqa %xmm9,%xmm0 > > + paddd %xmm9,%xmm9 > > +.byte 102,15,56,220,209 > > + pxor %xmm15,%xmm13 > > + psrad $31,%xmm0 > > +.byte 102,15,56,220,217 > > + paddq %xmm15,%xmm15 > > + pand %xmm8,%xmm0 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > + pxor %xmm0,%xmm15 > > + movups (%rbp),%xmm0 > > +.byte 102,15,56,220,241 > > +.byte 102,15,56,220,249 > > + movups 16(%rbp),%xmm1 > > + > > + pxor %xmm15,%xmm14 > > +.byte 102,15,56,221,84,36,0 > > + psrad $31,%xmm9 > > + paddq %xmm15,%xmm15 > > +.byte 102,15,56,221,92,36,16 > > +.byte 102,15,56,221,100,36,32 > > + pand %xmm8,%xmm9 > > + movq %r10,%rax > > +.byte 
102,15,56,221,108,36,48 > > +.byte 102,15,56,221,116,36,64 > > +.byte 102,15,56,221,124,36,80 > > + pxor %xmm9,%xmm15 > > + > > + leaq 96(%rsi),%rsi > > + movups %xmm2,-96(%rsi) > > + movups %xmm3,-80(%rsi) > > + movups %xmm4,-64(%rsi) > > + movups %xmm5,-48(%rsi) > > + movups %xmm6,-32(%rsi) > > + movups %xmm7,-16(%rsi) > > + subq $96,%rdx > > + jnc .Lxts_enc_grandloop > > + > > + movl $16+96,%eax > > + subl %r10d,%eax > > + movq %rbp,%rcx > > + shrl $4,%eax > > + > > +.Lxts_enc_short: > > + > > + movl %eax,%r10d > > + pxor %xmm0,%xmm10 > > + addq $96,%rdx > > + jz .Lxts_enc_done > > + > > + pxor %xmm0,%xmm11 > > + cmpq $0x20,%rdx > > + jb .Lxts_enc_one > > + pxor %xmm0,%xmm12 > > + je .Lxts_enc_two > > + > > + pxor %xmm0,%xmm13 > > + cmpq $0x40,%rdx > > + jb .Lxts_enc_three > > + pxor %xmm0,%xmm14 > > + je .Lxts_enc_four > > + > > + movdqu (%rdi),%xmm2 > > + movdqu 16(%rdi),%xmm3 > > + movdqu 32(%rdi),%xmm4 > > + pxor %xmm10,%xmm2 > > + movdqu 48(%rdi),%xmm5 > > + pxor %xmm11,%xmm3 > > + movdqu 64(%rdi),%xmm6 > > + leaq 80(%rdi),%rdi > > + pxor %xmm12,%xmm4 > > + pxor %xmm13,%xmm5 > > + pxor %xmm14,%xmm6 > > + pxor %xmm7,%xmm7 > > + > > + call _aesni_encrypt6 > > + > > + xorps %xmm10,%xmm2 > > + movdqa %xmm15,%xmm10 > > + xorps %xmm11,%xmm3 > > + xorps %xmm12,%xmm4 > > + movdqu %xmm2,(%rsi) > > + xorps %xmm13,%xmm5 > > + movdqu %xmm3,16(%rsi) > > + xorps %xmm14,%xmm6 > > + movdqu %xmm4,32(%rsi) > > + movdqu %xmm5,48(%rsi) > > + movdqu %xmm6,64(%rsi) > > + leaq 80(%rsi),%rsi > > + jmp .Lxts_enc_done > > + > > +.align 16 > > +.Lxts_enc_one: > > + movups (%rdi),%xmm2 > > + leaq 16(%rdi),%rdi > > + xorps %xmm10,%xmm2 > > + movups (%rcx),%xmm0 > > + movups 16(%rcx),%xmm1 > > + leaq 32(%rcx),%rcx > > + xorps %xmm0,%xmm2 > > +.Loop_enc1_9: > > +.byte 102,15,56,220,209 > > + decl %eax > > + movups (%rcx),%xmm1 > > + leaq 16(%rcx),%rcx > > + jnz .Loop_enc1_9 > > +.byte 102,15,56,221,209 > > + xorps %xmm10,%xmm2 > > + movdqa %xmm11,%xmm10 > > + movups %xmm2,(%rsi) > > + leaq 16(%rsi),%rsi > > + jmp .Lxts_enc_done > > + > > +.align 16 > > +.Lxts_enc_two: > > + movups (%rdi),%xmm2 > > + movups 16(%rdi),%xmm3 > > + leaq 32(%rdi),%rdi > > + xorps %xmm10,%xmm2 > > + xorps %xmm11,%xmm3 > > + > > + call _aesni_encrypt2 > > + > > + xorps %xmm10,%xmm2 > > + movdqa %xmm12,%xmm10 > > + xorps %xmm11,%xmm3 > > + movups %xmm2,(%rsi) > > + movups %xmm3,16(%rsi) > > + leaq 32(%rsi),%rsi > > + jmp .Lxts_enc_done > > + > > +.align 16 > > +.Lxts_enc_three: > > + movups (%rdi),%xmm2 > > + movups 16(%rdi),%xmm3 > > + movups 32(%rdi),%xmm4 > > + leaq 48(%rdi),%rdi > > + xorps %xmm10,%xmm2 > > + xorps %xmm11,%xmm3 > > + xorps %xmm12,%xmm4 > > + > > + call _aesni_encrypt3 > > + > > + xorps %xmm10,%xmm2 > > + movdqa %xmm13,%xmm10 > > + xorps %xmm11,%xmm3 > > + xorps %xmm12,%xmm4 > > + movups %xmm2,(%rsi) > > + movups %xmm3,16(%rsi) > > + movups %xmm4,32(%rsi) > > + leaq 48(%rsi),%rsi > > + jmp .Lxts_enc_done > > + > > +.align 16 > > +.Lxts_enc_four: > > + movups (%rdi),%xmm2 > > + movups 16(%rdi),%xmm3 > > + movups 32(%rdi),%xmm4 > > + xorps %xmm10,%xmm2 > > + movups 48(%rdi),%xmm5 > > + leaq 64(%rdi),%rdi > > + xorps %xmm11,%xmm3 > > + xorps %xmm12,%xmm4 > > + xorps %xmm13,%xmm5 > > + > > + call _aesni_encrypt4 > > + > > + pxor %xmm10,%xmm2 > > + movdqa %xmm14,%xmm10 > > + pxor %xmm11,%xmm3 > > + pxor %xmm12,%xmm4 > > + movdqu %xmm2,(%rsi) > > + pxor %xmm13,%xmm5 > > + movdqu %xmm3,16(%rsi) > > + movdqu %xmm4,32(%rsi) > > + movdqu %xmm5,48(%rsi) > > + leaq 64(%rsi),%rsi > > + jmp .Lxts_enc_done > > + > > +.align 16 > > 
+.Lxts_enc_done: > > + andq $15,%r9 > > + jz .Lxts_enc_ret > > + movq %r9,%rdx > > + > > +.Lxts_enc_steal: > > + movzbl (%rdi),%eax > > + movzbl -16(%rsi),%ecx > > + leaq 1(%rdi),%rdi > > + movb %al,-16(%rsi) > > + movb %cl,0(%rsi) > > + leaq 1(%rsi),%rsi > > + subq $1,%rdx > > + jnz .Lxts_enc_steal > > + > > + subq %r9,%rsi > > + movq %rbp,%rcx > > + movl %r10d,%eax > > + > > + movups -16(%rsi),%xmm2 > > + xorps %xmm10,%xmm2 > > + movups (%rcx),%xmm0 > > + movups 16(%rcx),%xmm1 > > + leaq 32(%rcx),%rcx > > + xorps %xmm0,%xmm2 > > +.Loop_enc1_10: > > +.byte 102,15,56,220,209 > > + decl %eax > > + movups (%rcx),%xmm1 > > + leaq 16(%rcx),%rcx > > + jnz .Loop_enc1_10 > > +.byte 102,15,56,221,209 > > + xorps %xmm10,%xmm2 > > + movups %xmm2,-16(%rsi) > > + > > +.Lxts_enc_ret: > > + xorps %xmm0,%xmm0 > > + pxor %xmm1,%xmm1 > > + pxor %xmm2,%xmm2 > > + pxor %xmm3,%xmm3 > > + pxor %xmm4,%xmm4 > > + pxor %xmm5,%xmm5 > > + pxor %xmm6,%xmm6 > > + pxor %xmm7,%xmm7 > > + movaps %xmm0,0(%rsp) > > + pxor %xmm8,%xmm8 > > + movaps %xmm0,16(%rsp) > > + pxor %xmm9,%xmm9 > > + movaps %xmm0,32(%rsp) > > + pxor %xmm10,%xmm10 > > + movaps %xmm0,48(%rsp) > > + pxor %xmm11,%xmm11 > > + movaps %xmm0,64(%rsp) > > + pxor %xmm12,%xmm12 > > + movaps %xmm0,80(%rsp) > > + pxor %xmm13,%xmm13 > > + movaps %xmm0,96(%rsp) > > + pxor %xmm14,%xmm14 > > + pxor %xmm15,%xmm15 > > + movq -8(%r11),%rbp > > +.cfi_restore %rbp > > + leaq (%r11),%rsp > > +.cfi_def_cfa_register %rsp > > +.Lxts_enc_epilogue: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size aesni_xts_encrypt,.-aesni_xts_encrypt > > +.globl aesni_xts_decrypt > > +.type aesni_xts_decrypt,@function > > +.align 16 > > +aesni_xts_decrypt: > > +.cfi_startproc > > + leaq (%rsp),%r11 > > +.cfi_def_cfa_register %r11 > > + pushq %rbp > > +.cfi_offset %rbp,-16 > > + subq $112,%rsp > > + andq $-16,%rsp > > + movups (%r9),%xmm2 > > + movl 240(%r8),%eax > > + movl 240(%rcx),%r10d > > + movups (%r8),%xmm0 > > + movups 16(%r8),%xmm1 > > + leaq 32(%r8),%r8 > > + xorps %xmm0,%xmm2 > > +.Loop_enc1_11: > > +.byte 102,15,56,220,209 > > + decl %eax > > + movups (%r8),%xmm1 > > + leaq 16(%r8),%r8 > > + jnz .Loop_enc1_11 > > +.byte 102,15,56,221,209 > > + xorl %eax,%eax > > + testq $15,%rdx > > + setnz %al > > + shlq $4,%rax > > + subq %rax,%rdx > > + > > + movups (%rcx),%xmm0 > > + movq %rcx,%rbp > > + movl %r10d,%eax > > + shll $4,%r10d > > + movq %rdx,%r9 > > + andq $-16,%rdx > > + > > + movups 16(%rcx,%r10,1),%xmm1 > > + > > + movdqa .Lxts_magic(%rip),%xmm8 > > + movdqa %xmm2,%xmm15 > > + pshufd $0x5f,%xmm2,%xmm9 > > + pxor %xmm0,%xmm1 > > + movdqa %xmm9,%xmm14 > > + paddd %xmm9,%xmm9 > > + movdqa %xmm15,%xmm10 > > + psrad $31,%xmm14 > > + paddq %xmm15,%xmm15 > > + pand %xmm8,%xmm14 > > + pxor %xmm0,%xmm10 > > + pxor %xmm14,%xmm15 > > + movdqa %xmm9,%xmm14 > > + paddd %xmm9,%xmm9 > > + movdqa %xmm15,%xmm11 > > + psrad $31,%xmm14 > > + paddq %xmm15,%xmm15 > > + pand %xmm8,%xmm14 > > + pxor %xmm0,%xmm11 > > + pxor %xmm14,%xmm15 > > + movdqa %xmm9,%xmm14 > > + paddd %xmm9,%xmm9 > > + movdqa %xmm15,%xmm12 > > + psrad $31,%xmm14 > > + paddq %xmm15,%xmm15 > > + pand %xmm8,%xmm14 > > + pxor %xmm0,%xmm12 > > + pxor %xmm14,%xmm15 > > + movdqa %xmm9,%xmm14 > > + paddd %xmm9,%xmm9 > > + movdqa %xmm15,%xmm13 > > + psrad $31,%xmm14 > > + paddq %xmm15,%xmm15 > > + pand %xmm8,%xmm14 > > + pxor %xmm0,%xmm13 > > + pxor %xmm14,%xmm15 > > + movdqa %xmm15,%xmm14 > > + psrad $31,%xmm9 > > + paddq %xmm15,%xmm15 > > + pand %xmm8,%xmm9 > > + pxor %xmm0,%xmm14 > > + pxor %xmm9,%xmm15 > > + movaps %xmm1,96(%rsp) > 
> + > > + subq $96,%rdx > > + jc .Lxts_dec_short > > + > > + movl $16+96,%eax > > + leaq 32(%rbp,%r10,1),%rcx > > + subq %r10,%rax > > + movups 16(%rbp),%xmm1 > > + movq %rax,%r10 > > + leaq .Lxts_magic(%rip),%r8 > > + jmp .Lxts_dec_grandloop > > + > > +.align 32 > > +.Lxts_dec_grandloop: > > + movdqu 0(%rdi),%xmm2 > > + movdqa %xmm0,%xmm8 > > + movdqu 16(%rdi),%xmm3 > > + pxor %xmm10,%xmm2 > > + movdqu 32(%rdi),%xmm4 > > + pxor %xmm11,%xmm3 > > +.byte 102,15,56,222,209 > > + movdqu 48(%rdi),%xmm5 > > + pxor %xmm12,%xmm4 > > +.byte 102,15,56,222,217 > > + movdqu 64(%rdi),%xmm6 > > + pxor %xmm13,%xmm5 > > +.byte 102,15,56,222,225 > > + movdqu 80(%rdi),%xmm7 > > + pxor %xmm15,%xmm8 > > + movdqa 96(%rsp),%xmm9 > > + pxor %xmm14,%xmm6 > > +.byte 102,15,56,222,233 > > + movups 32(%rbp),%xmm0 > > + leaq 96(%rdi),%rdi > > + pxor %xmm8,%xmm7 > > + > > + pxor %xmm9,%xmm10 > > +.byte 102,15,56,222,241 > > + pxor %xmm9,%xmm11 > > + movdqa %xmm10,0(%rsp) > > +.byte 102,15,56,222,249 > > + movups 48(%rbp),%xmm1 > > + pxor %xmm9,%xmm12 > > + > > +.byte 102,15,56,222,208 > > + pxor %xmm9,%xmm13 > > + movdqa %xmm11,16(%rsp) > > +.byte 102,15,56,222,216 > > + pxor %xmm9,%xmm14 > > + movdqa %xmm12,32(%rsp) > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > > + pxor %xmm9,%xmm8 > > + movdqa %xmm14,64(%rsp) > > +.byte 102,15,56,222,240 > > +.byte 102,15,56,222,248 > > + movups 64(%rbp),%xmm0 > > + movdqa %xmm8,80(%rsp) > > + pshufd $0x5f,%xmm15,%xmm9 > > + jmp .Lxts_dec_loop6 > > +.align 32 > > +.Lxts_dec_loop6: > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > +.byte 102,15,56,222,241 > > +.byte 102,15,56,222,249 > > + movups -64(%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > + > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > > +.byte 102,15,56,222,240 > > +.byte 102,15,56,222,248 > > + movups -80(%rcx,%rax,1),%xmm0 > > + jnz .Lxts_dec_loop6 > > + > > + movdqa (%r8),%xmm8 > > + movdqa %xmm9,%xmm14 > > + paddd %xmm9,%xmm9 > > +.byte 102,15,56,222,209 > > + paddq %xmm15,%xmm15 > > + psrad $31,%xmm14 > > +.byte 102,15,56,222,217 > > + pand %xmm8,%xmm14 > > + movups (%rbp),%xmm10 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > +.byte 102,15,56,222,241 > > + pxor %xmm14,%xmm15 > > + movaps %xmm10,%xmm11 > > +.byte 102,15,56,222,249 > > + movups -64(%rcx),%xmm1 > > + > > + movdqa %xmm9,%xmm14 > > +.byte 102,15,56,222,208 > > + paddd %xmm9,%xmm9 > > + pxor %xmm15,%xmm10 > > +.byte 102,15,56,222,216 > > + psrad $31,%xmm14 > > + paddq %xmm15,%xmm15 > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > > + pand %xmm8,%xmm14 > > + movaps %xmm11,%xmm12 > > +.byte 102,15,56,222,240 > > + pxor %xmm14,%xmm15 > > + movdqa %xmm9,%xmm14 > > +.byte 102,15,56,222,248 > > + movups -48(%rcx),%xmm0 > > + > > + paddd %xmm9,%xmm9 > > +.byte 102,15,56,222,209 > > + pxor %xmm15,%xmm11 > > + psrad $31,%xmm14 > > +.byte 102,15,56,222,217 > > + paddq %xmm15,%xmm15 > > + pand %xmm8,%xmm14 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > + movdqa %xmm13,48(%rsp) > > + pxor %xmm14,%xmm15 > > +.byte 102,15,56,222,241 > > + movaps %xmm12,%xmm13 > > + movdqa %xmm9,%xmm14 > > +.byte 102,15,56,222,249 > > + movups -32(%rcx),%xmm1 > > + > > + paddd %xmm9,%xmm9 > > +.byte 102,15,56,222,208 > > + pxor %xmm15,%xmm12 > > + psrad $31,%xmm14 > > +.byte 102,15,56,222,216 > > + paddq %xmm15,%xmm15 > > + pand %xmm8,%xmm14 > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > 
> +.byte 102,15,56,222,240 > > + pxor %xmm14,%xmm15 > > + movaps %xmm13,%xmm14 > > +.byte 102,15,56,222,248 > > + > > + movdqa %xmm9,%xmm0 > > + paddd %xmm9,%xmm9 > > +.byte 102,15,56,222,209 > > + pxor %xmm15,%xmm13 > > + psrad $31,%xmm0 > > +.byte 102,15,56,222,217 > > + paddq %xmm15,%xmm15 > > + pand %xmm8,%xmm0 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > + pxor %xmm0,%xmm15 > > + movups (%rbp),%xmm0 > > +.byte 102,15,56,222,241 > > +.byte 102,15,56,222,249 > > + movups 16(%rbp),%xmm1 > > + > > + pxor %xmm15,%xmm14 > > +.byte 102,15,56,223,84,36,0 > > + psrad $31,%xmm9 > > + paddq %xmm15,%xmm15 > > +.byte 102,15,56,223,92,36,16 > > +.byte 102,15,56,223,100,36,32 > > + pand %xmm8,%xmm9 > > + movq %r10,%rax > > +.byte 102,15,56,223,108,36,48 > > +.byte 102,15,56,223,116,36,64 > > +.byte 102,15,56,223,124,36,80 > > + pxor %xmm9,%xmm15 > > + > > + leaq 96(%rsi),%rsi > > + movups %xmm2,-96(%rsi) > > + movups %xmm3,-80(%rsi) > > + movups %xmm4,-64(%rsi) > > + movups %xmm5,-48(%rsi) > > + movups %xmm6,-32(%rsi) > > + movups %xmm7,-16(%rsi) > > + subq $96,%rdx > > + jnc .Lxts_dec_grandloop > > + > > + movl $16+96,%eax > > + subl %r10d,%eax > > + movq %rbp,%rcx > > + shrl $4,%eax > > + > > +.Lxts_dec_short: > > + > > + movl %eax,%r10d > > + pxor %xmm0,%xmm10 > > + pxor %xmm0,%xmm11 > > + addq $96,%rdx > > + jz .Lxts_dec_done > > + > > + pxor %xmm0,%xmm12 > > + cmpq $0x20,%rdx > > + jb .Lxts_dec_one > > + pxor %xmm0,%xmm13 > > + je .Lxts_dec_two > > + > > + pxor %xmm0,%xmm14 > > + cmpq $0x40,%rdx > > + jb .Lxts_dec_three > > + je .Lxts_dec_four > > + > > + movdqu (%rdi),%xmm2 > > + movdqu 16(%rdi),%xmm3 > > + movdqu 32(%rdi),%xmm4 > > + pxor %xmm10,%xmm2 > > + movdqu 48(%rdi),%xmm5 > > + pxor %xmm11,%xmm3 > > + movdqu 64(%rdi),%xmm6 > > + leaq 80(%rdi),%rdi > > + pxor %xmm12,%xmm4 > > + pxor %xmm13,%xmm5 > > + pxor %xmm14,%xmm6 > > + > > + call _aesni_decrypt6 > > + > > + xorps %xmm10,%xmm2 > > + xorps %xmm11,%xmm3 > > + xorps %xmm12,%xmm4 > > + movdqu %xmm2,(%rsi) > > + xorps %xmm13,%xmm5 > > + movdqu %xmm3,16(%rsi) > > + xorps %xmm14,%xmm6 > > + movdqu %xmm4,32(%rsi) > > + pxor %xmm14,%xmm14 > > + movdqu %xmm5,48(%rsi) > > + pcmpgtd %xmm15,%xmm14 > > + movdqu %xmm6,64(%rsi) > > + leaq 80(%rsi),%rsi > > + pshufd $0x13,%xmm14,%xmm11 > > + andq $15,%r9 > > + jz .Lxts_dec_ret > > + > > + movdqa %xmm15,%xmm10 > > + paddq %xmm15,%xmm15 > > + pand %xmm8,%xmm11 > > + pxor %xmm15,%xmm11 > > + jmp .Lxts_dec_done2 > > + > > +.align 16 > > +.Lxts_dec_one: > > + movups (%rdi),%xmm2 > > + leaq 16(%rdi),%rdi > > + xorps %xmm10,%xmm2 > > + movups (%rcx),%xmm0 > > + movups 16(%rcx),%xmm1 > > + leaq 32(%rcx),%rcx > > + xorps %xmm0,%xmm2 > > +.Loop_dec1_12: > > +.byte 102,15,56,222,209 > > + decl %eax > > + movups (%rcx),%xmm1 > > + leaq 16(%rcx),%rcx > > + jnz .Loop_dec1_12 > > +.byte 102,15,56,223,209 > > + xorps %xmm10,%xmm2 > > + movdqa %xmm11,%xmm10 > > + movups %xmm2,(%rsi) > > + movdqa %xmm12,%xmm11 > > + leaq 16(%rsi),%rsi > > + jmp .Lxts_dec_done > > + > > +.align 16 > > +.Lxts_dec_two: > > + movups (%rdi),%xmm2 > > + movups 16(%rdi),%xmm3 > > + leaq 32(%rdi),%rdi > > + xorps %xmm10,%xmm2 > > + xorps %xmm11,%xmm3 > > + > > + call _aesni_decrypt2 > > + > > + xorps %xmm10,%xmm2 > > + movdqa %xmm12,%xmm10 > > + xorps %xmm11,%xmm3 > > + movdqa %xmm13,%xmm11 > > + movups %xmm2,(%rsi) > > + movups %xmm3,16(%rsi) > > + leaq 32(%rsi),%rsi > > + jmp .Lxts_dec_done > > + > > +.align 16 > > +.Lxts_dec_three: > > + movups (%rdi),%xmm2 > > + movups 16(%rdi),%xmm3 > > + movups 32(%rdi),%xmm4 > > + 
leaq 48(%rdi),%rdi > > + xorps %xmm10,%xmm2 > > + xorps %xmm11,%xmm3 > > + xorps %xmm12,%xmm4 > > + > > + call _aesni_decrypt3 > > + > > + xorps %xmm10,%xmm2 > > + movdqa %xmm13,%xmm10 > > + xorps %xmm11,%xmm3 > > + movdqa %xmm14,%xmm11 > > + xorps %xmm12,%xmm4 > > + movups %xmm2,(%rsi) > > + movups %xmm3,16(%rsi) > > + movups %xmm4,32(%rsi) > > + leaq 48(%rsi),%rsi > > + jmp .Lxts_dec_done > > + > > +.align 16 > > +.Lxts_dec_four: > > + movups (%rdi),%xmm2 > > + movups 16(%rdi),%xmm3 > > + movups 32(%rdi),%xmm4 > > + xorps %xmm10,%xmm2 > > + movups 48(%rdi),%xmm5 > > + leaq 64(%rdi),%rdi > > + xorps %xmm11,%xmm3 > > + xorps %xmm12,%xmm4 > > + xorps %xmm13,%xmm5 > > + > > + call _aesni_decrypt4 > > + > > + pxor %xmm10,%xmm2 > > + movdqa %xmm14,%xmm10 > > + pxor %xmm11,%xmm3 > > + movdqa %xmm15,%xmm11 > > + pxor %xmm12,%xmm4 > > + movdqu %xmm2,(%rsi) > > + pxor %xmm13,%xmm5 > > + movdqu %xmm3,16(%rsi) > > + movdqu %xmm4,32(%rsi) > > + movdqu %xmm5,48(%rsi) > > + leaq 64(%rsi),%rsi > > + jmp .Lxts_dec_done > > + > > +.align 16 > > +.Lxts_dec_done: > > + andq $15,%r9 > > + jz .Lxts_dec_ret > > +.Lxts_dec_done2: > > + movq %r9,%rdx > > + movq %rbp,%rcx > > + movl %r10d,%eax > > + > > + movups (%rdi),%xmm2 > > + xorps %xmm11,%xmm2 > > + movups (%rcx),%xmm0 > > + movups 16(%rcx),%xmm1 > > + leaq 32(%rcx),%rcx > > + xorps %xmm0,%xmm2 > > +.Loop_dec1_13: > > +.byte 102,15,56,222,209 > > + decl %eax > > + movups (%rcx),%xmm1 > > + leaq 16(%rcx),%rcx > > + jnz .Loop_dec1_13 > > +.byte 102,15,56,223,209 > > + xorps %xmm11,%xmm2 > > + movups %xmm2,(%rsi) > > + > > +.Lxts_dec_steal: > > + movzbl 16(%rdi),%eax > > + movzbl (%rsi),%ecx > > + leaq 1(%rdi),%rdi > > + movb %al,(%rsi) > > + movb %cl,16(%rsi) > > + leaq 1(%rsi),%rsi > > + subq $1,%rdx > > + jnz .Lxts_dec_steal > > + > > + subq %r9,%rsi > > + movq %rbp,%rcx > > + movl %r10d,%eax > > + > > + movups (%rsi),%xmm2 > > + xorps %xmm10,%xmm2 > > + movups (%rcx),%xmm0 > > + movups 16(%rcx),%xmm1 > > + leaq 32(%rcx),%rcx > > + xorps %xmm0,%xmm2 > > +.Loop_dec1_14: > > +.byte 102,15,56,222,209 > > + decl %eax > > + movups (%rcx),%xmm1 > > + leaq 16(%rcx),%rcx > > + jnz .Loop_dec1_14 > > +.byte 102,15,56,223,209 > > + xorps %xmm10,%xmm2 > > + movups %xmm2,(%rsi) > > + > > +.Lxts_dec_ret: > > + xorps %xmm0,%xmm0 > > + pxor %xmm1,%xmm1 > > + pxor %xmm2,%xmm2 > > + pxor %xmm3,%xmm3 > > + pxor %xmm4,%xmm4 > > + pxor %xmm5,%xmm5 > > + pxor %xmm6,%xmm6 > > + pxor %xmm7,%xmm7 > > + movaps %xmm0,0(%rsp) > > + pxor %xmm8,%xmm8 > > + movaps %xmm0,16(%rsp) > > + pxor %xmm9,%xmm9 > > + movaps %xmm0,32(%rsp) > > + pxor %xmm10,%xmm10 > > + movaps %xmm0,48(%rsp) > > + pxor %xmm11,%xmm11 > > + movaps %xmm0,64(%rsp) > > + pxor %xmm12,%xmm12 > > + movaps %xmm0,80(%rsp) > > + pxor %xmm13,%xmm13 > > + movaps %xmm0,96(%rsp) > > + pxor %xmm14,%xmm14 > > + pxor %xmm15,%xmm15 > > + movq -8(%r11),%rbp > > +.cfi_restore %rbp > > + leaq (%r11),%rsp > > +.cfi_def_cfa_register %rsp > > +.Lxts_dec_epilogue: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size aesni_xts_decrypt,.-aesni_xts_decrypt > > +.globl aesni_ocb_encrypt > > +.type aesni_ocb_encrypt,@function > > +.align 32 > > +aesni_ocb_encrypt: > > +.cfi_startproc > > + leaq (%rsp),%rax > > + pushq %rbx > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %rbx,-16 > > + pushq %rbp > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %rbp,-24 > > + pushq %r12 > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %r12,-32 > > + pushq %r13 > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %r13,-40 > > + pushq %r14 > > +.cfi_adjust_cfa_offset 8 > > 
+.cfi_offset %r14,-48 > > + movq 8(%rax),%rbx > > + movq 8+8(%rax),%rbp > > + > > + movl 240(%rcx),%r10d > > + movq %rcx,%r11 > > + shll $4,%r10d > > + movups (%rcx),%xmm9 > > + movups 16(%rcx,%r10,1),%xmm1 > > + > > + movdqu (%r9),%xmm15 > > + pxor %xmm1,%xmm9 > > + pxor %xmm1,%xmm15 > > + > > + movl $16+32,%eax > > + leaq 32(%r11,%r10,1),%rcx > > + movups 16(%r11),%xmm1 > > + subq %r10,%rax > > + movq %rax,%r10 > > + > > + movdqu (%rbx),%xmm10 > > + movdqu (%rbp),%xmm8 > > + > > + testq $1,%r8 > > + jnz .Locb_enc_odd > > + > > + bsfq %r8,%r12 > > + addq $1,%r8 > > + shlq $4,%r12 > > + movdqu (%rbx,%r12,1),%xmm7 > > + movdqu (%rdi),%xmm2 > > + leaq 16(%rdi),%rdi > > + > > + call __ocb_encrypt1 > > + > > + movdqa %xmm7,%xmm15 > > + movups %xmm2,(%rsi) > > + leaq 16(%rsi),%rsi > > + subq $1,%rdx > > + jz .Locb_enc_done > > + > > +.Locb_enc_odd: > > + leaq 1(%r8),%r12 > > + leaq 3(%r8),%r13 > > + leaq 5(%r8),%r14 > > + leaq 6(%r8),%r8 > > + bsfq %r12,%r12 > > + bsfq %r13,%r13 > > + bsfq %r14,%r14 > > + shlq $4,%r12 > > + shlq $4,%r13 > > + shlq $4,%r14 > > + > > + subq $6,%rdx > > + jc .Locb_enc_short > > + jmp .Locb_enc_grandloop > > + > > +.align 32 > > +.Locb_enc_grandloop: > > + movdqu 0(%rdi),%xmm2 > > + movdqu 16(%rdi),%xmm3 > > + movdqu 32(%rdi),%xmm4 > > + movdqu 48(%rdi),%xmm5 > > + movdqu 64(%rdi),%xmm6 > > + movdqu 80(%rdi),%xmm7 > > + leaq 96(%rdi),%rdi > > + > > + call __ocb_encrypt6 > > + > > + movups %xmm2,0(%rsi) > > + movups %xmm3,16(%rsi) > > + movups %xmm4,32(%rsi) > > + movups %xmm5,48(%rsi) > > + movups %xmm6,64(%rsi) > > + movups %xmm7,80(%rsi) > > + leaq 96(%rsi),%rsi > > + subq $6,%rdx > > + jnc .Locb_enc_grandloop > > + > > +.Locb_enc_short: > > + addq $6,%rdx > > + jz .Locb_enc_done > > + > > + movdqu 0(%rdi),%xmm2 > > + cmpq $2,%rdx > > + jb .Locb_enc_one > > + movdqu 16(%rdi),%xmm3 > > + je .Locb_enc_two > > + > > + movdqu 32(%rdi),%xmm4 > > + cmpq $4,%rdx > > + jb .Locb_enc_three > > + movdqu 48(%rdi),%xmm5 > > + je .Locb_enc_four > > + > > + movdqu 64(%rdi),%xmm6 > > + pxor %xmm7,%xmm7 > > + > > + call __ocb_encrypt6 > > + > > + movdqa %xmm14,%xmm15 > > + movups %xmm2,0(%rsi) > > + movups %xmm3,16(%rsi) > > + movups %xmm4,32(%rsi) > > + movups %xmm5,48(%rsi) > > + movups %xmm6,64(%rsi) > > + > > + jmp .Locb_enc_done > > + > > +.align 16 > > +.Locb_enc_one: > > + movdqa %xmm10,%xmm7 > > + > > + call __ocb_encrypt1 > > + > > + movdqa %xmm7,%xmm15 > > + movups %xmm2,0(%rsi) > > + jmp .Locb_enc_done > > + > > +.align 16 > > +.Locb_enc_two: > > + pxor %xmm4,%xmm4 > > + pxor %xmm5,%xmm5 > > + > > + call __ocb_encrypt4 > > + > > + movdqa %xmm11,%xmm15 > > + movups %xmm2,0(%rsi) > > + movups %xmm3,16(%rsi) > > + > > + jmp .Locb_enc_done > > + > > +.align 16 > > +.Locb_enc_three: > > + pxor %xmm5,%xmm5 > > + > > + call __ocb_encrypt4 > > + > > + movdqa %xmm12,%xmm15 > > + movups %xmm2,0(%rsi) > > + movups %xmm3,16(%rsi) > > + movups %xmm4,32(%rsi) > > + > > + jmp .Locb_enc_done > > + > > +.align 16 > > +.Locb_enc_four: > > + call __ocb_encrypt4 > > + > > + movdqa %xmm13,%xmm15 > > + movups %xmm2,0(%rsi) > > + movups %xmm3,16(%rsi) > > + movups %xmm4,32(%rsi) > > + movups %xmm5,48(%rsi) > > + > > +.Locb_enc_done: > > + pxor %xmm0,%xmm15 > > + movdqu %xmm8,(%rbp) > > + movdqu %xmm15,(%r9) > > + > > + xorps %xmm0,%xmm0 > > + pxor %xmm1,%xmm1 > > + pxor %xmm2,%xmm2 > > + pxor %xmm3,%xmm3 > > + pxor %xmm4,%xmm4 > > + pxor %xmm5,%xmm5 > > + pxor %xmm6,%xmm6 > > + pxor %xmm7,%xmm7 > > + pxor %xmm8,%xmm8 > > + pxor %xmm9,%xmm9 > > + pxor %xmm10,%xmm10 > > + pxor %xmm11,%xmm11 > 
> + pxor %xmm12,%xmm12 > > + pxor %xmm13,%xmm13 > > + pxor %xmm14,%xmm14 > > + pxor %xmm15,%xmm15 > > + leaq 40(%rsp),%rax > > +.cfi_def_cfa %rax,8 > > + movq -40(%rax),%r14 > > +.cfi_restore %r14 > > + movq -32(%rax),%r13 > > +.cfi_restore %r13 > > + movq -24(%rax),%r12 > > +.cfi_restore %r12 > > + movq -16(%rax),%rbp > > +.cfi_restore %rbp > > + movq -8(%rax),%rbx > > +.cfi_restore %rbx > > + leaq (%rax),%rsp > > +.cfi_def_cfa_register %rsp > > +.Locb_enc_epilogue: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size aesni_ocb_encrypt,.-aesni_ocb_encrypt > > + > > +.type __ocb_encrypt6,@function > > +.align 32 > > +__ocb_encrypt6: > > +.cfi_startproc > > + pxor %xmm9,%xmm15 > > + movdqu (%rbx,%r12,1),%xmm11 > > + movdqa %xmm10,%xmm12 > > + movdqu (%rbx,%r13,1),%xmm13 > > + movdqa %xmm10,%xmm14 > > + pxor %xmm15,%xmm10 > > + movdqu (%rbx,%r14,1),%xmm15 > > + pxor %xmm10,%xmm11 > > + pxor %xmm2,%xmm8 > > + pxor %xmm10,%xmm2 > > + pxor %xmm11,%xmm12 > > + pxor %xmm3,%xmm8 > > + pxor %xmm11,%xmm3 > > + pxor %xmm12,%xmm13 > > + pxor %xmm4,%xmm8 > > + pxor %xmm12,%xmm4 > > + pxor %xmm13,%xmm14 > > + pxor %xmm5,%xmm8 > > + pxor %xmm13,%xmm5 > > + pxor %xmm14,%xmm15 > > + pxor %xmm6,%xmm8 > > + pxor %xmm14,%xmm6 > > + pxor %xmm7,%xmm8 > > + pxor %xmm15,%xmm7 > > + movups 32(%r11),%xmm0 > > + > > + leaq 1(%r8),%r12 > > + leaq 3(%r8),%r13 > > + leaq 5(%r8),%r14 > > + addq $6,%r8 > > + pxor %xmm9,%xmm10 > > + bsfq %r12,%r12 > > + bsfq %r13,%r13 > > + bsfq %r14,%r14 > > + > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > + pxor %xmm9,%xmm11 > > + pxor %xmm9,%xmm12 > > +.byte 102,15,56,220,241 > > + pxor %xmm9,%xmm13 > > + pxor %xmm9,%xmm14 > > +.byte 102,15,56,220,249 > > + movups 48(%r11),%xmm1 > > + pxor %xmm9,%xmm15 > > + > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > +.byte 102,15,56,220,240 > > +.byte 102,15,56,220,248 > > + movups 64(%r11),%xmm0 > > + shlq $4,%r12 > > + shlq $4,%r13 > > + jmp .Locb_enc_loop6 > > + > > +.align 32 > > +.Locb_enc_loop6: > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > +.byte 102,15,56,220,241 > > +.byte 102,15,56,220,249 > > + movups (%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > + > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > +.byte 102,15,56,220,240 > > +.byte 102,15,56,220,248 > > + movups -16(%rcx,%rax,1),%xmm0 > > + jnz .Locb_enc_loop6 > > + > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > +.byte 102,15,56,220,241 > > +.byte 102,15,56,220,249 > > + movups 16(%r11),%xmm1 > > + shlq $4,%r14 > > + > > +.byte 102,65,15,56,221,210 > > + movdqu (%rbx),%xmm10 > > + movq %r10,%rax > > +.byte 102,65,15,56,221,219 > > +.byte 102,65,15,56,221,228 > > +.byte 102,65,15,56,221,237 > > +.byte 102,65,15,56,221,246 > > +.byte 102,65,15,56,221,255 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size __ocb_encrypt6,.-__ocb_encrypt6 > > + > > +.type __ocb_encrypt4,@function > > +.align 32 > > +__ocb_encrypt4: > > +.cfi_startproc > > + pxor %xmm9,%xmm15 > > + movdqu (%rbx,%r12,1),%xmm11 > > + movdqa %xmm10,%xmm12 > > + movdqu (%rbx,%r13,1),%xmm13 > > + pxor %xmm15,%xmm10 > > + pxor %xmm10,%xmm11 > > + pxor %xmm2,%xmm8 > > + pxor %xmm10,%xmm2 > > + pxor %xmm11,%xmm12 > > + pxor %xmm3,%xmm8 > > + pxor %xmm11,%xmm3 > > + pxor 
%xmm12,%xmm13 > > + pxor %xmm4,%xmm8 > > + pxor %xmm12,%xmm4 > > + pxor %xmm5,%xmm8 > > + pxor %xmm13,%xmm5 > > + movups 32(%r11),%xmm0 > > + > > + pxor %xmm9,%xmm10 > > + pxor %xmm9,%xmm11 > > + pxor %xmm9,%xmm12 > > + pxor %xmm9,%xmm13 > > + > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > + movups 48(%r11),%xmm1 > > + > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > + movups 64(%r11),%xmm0 > > + jmp .Locb_enc_loop4 > > + > > +.align 32 > > +.Locb_enc_loop4: > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > + movups (%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > + > > +.byte 102,15,56,220,208 > > +.byte 102,15,56,220,216 > > +.byte 102,15,56,220,224 > > +.byte 102,15,56,220,232 > > + movups -16(%rcx,%rax,1),%xmm0 > > + jnz .Locb_enc_loop4 > > + > > +.byte 102,15,56,220,209 > > +.byte 102,15,56,220,217 > > +.byte 102,15,56,220,225 > > +.byte 102,15,56,220,233 > > + movups 16(%r11),%xmm1 > > + movq %r10,%rax > > + > > +.byte 102,65,15,56,221,210 > > +.byte 102,65,15,56,221,219 > > +.byte 102,65,15,56,221,228 > > +.byte 102,65,15,56,221,237 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size __ocb_encrypt4,.-__ocb_encrypt4 > > + > > +.type __ocb_encrypt1,@function > > +.align 32 > > +__ocb_encrypt1: > > +.cfi_startproc > > + pxor %xmm15,%xmm7 > > + pxor %xmm9,%xmm7 > > + pxor %xmm2,%xmm8 > > + pxor %xmm7,%xmm2 > > + movups 32(%r11),%xmm0 > > + > > +.byte 102,15,56,220,209 > > + movups 48(%r11),%xmm1 > > + pxor %xmm9,%xmm7 > > + > > +.byte 102,15,56,220,208 > > + movups 64(%r11),%xmm0 > > + jmp .Locb_enc_loop1 > > + > > +.align 32 > > +.Locb_enc_loop1: > > +.byte 102,15,56,220,209 > > + movups (%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > + > > +.byte 102,15,56,220,208 > > + movups -16(%rcx,%rax,1),%xmm0 > > + jnz .Locb_enc_loop1 > > + > > +.byte 102,15,56,220,209 > > + movups 16(%r11),%xmm1 > > + movq %r10,%rax > > + > > +.byte 102,15,56,221,215 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size __ocb_encrypt1,.-__ocb_encrypt1 > > + > > +.globl aesni_ocb_decrypt > > +.type aesni_ocb_decrypt,@function > > +.align 32 > > +aesni_ocb_decrypt: > > +.cfi_startproc > > + leaq (%rsp),%rax > > + pushq %rbx > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %rbx,-16 > > + pushq %rbp > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %rbp,-24 > > + pushq %r12 > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %r12,-32 > > + pushq %r13 > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %r13,-40 > > + pushq %r14 > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %r14,-48 > > + movq 8(%rax),%rbx > > + movq 8+8(%rax),%rbp > > + > > + movl 240(%rcx),%r10d > > + movq %rcx,%r11 > > + shll $4,%r10d > > + movups (%rcx),%xmm9 > > + movups 16(%rcx,%r10,1),%xmm1 > > + > > + movdqu (%r9),%xmm15 > > + pxor %xmm1,%xmm9 > > + pxor %xmm1,%xmm15 > > + > > + movl $16+32,%eax > > + leaq 32(%r11,%r10,1),%rcx > > + movups 16(%r11),%xmm1 > > + subq %r10,%rax > > + movq %rax,%r10 > > + > > + movdqu (%rbx),%xmm10 > > + movdqu (%rbp),%xmm8 > > + > > + testq $1,%r8 > > + jnz .Locb_dec_odd > > + > > + bsfq %r8,%r12 > > + addq $1,%r8 > > + shlq $4,%r12 > > + movdqu (%rbx,%r12,1),%xmm7 > > + movdqu (%rdi),%xmm2 > > + leaq 16(%rdi),%rdi > > + > > + call __ocb_decrypt1 > > + > > + movdqa %xmm7,%xmm15 > > + movups %xmm2,(%rsi) > > + xorps %xmm2,%xmm8 > > + leaq 16(%rsi),%rsi > > + subq $1,%rdx > > + jz .Locb_dec_done > > + > > +.Locb_dec_odd: > > + 
leaq 1(%r8),%r12 > > + leaq 3(%r8),%r13 > > + leaq 5(%r8),%r14 > > + leaq 6(%r8),%r8 > > + bsfq %r12,%r12 > > + bsfq %r13,%r13 > > + bsfq %r14,%r14 > > + shlq $4,%r12 > > + shlq $4,%r13 > > + shlq $4,%r14 > > + > > + subq $6,%rdx > > + jc .Locb_dec_short > > + jmp .Locb_dec_grandloop > > + > > +.align 32 > > +.Locb_dec_grandloop: > > + movdqu 0(%rdi),%xmm2 > > + movdqu 16(%rdi),%xmm3 > > + movdqu 32(%rdi),%xmm4 > > + movdqu 48(%rdi),%xmm5 > > + movdqu 64(%rdi),%xmm6 > > + movdqu 80(%rdi),%xmm7 > > + leaq 96(%rdi),%rdi > > + > > + call __ocb_decrypt6 > > + > > + movups %xmm2,0(%rsi) > > + pxor %xmm2,%xmm8 > > + movups %xmm3,16(%rsi) > > + pxor %xmm3,%xmm8 > > + movups %xmm4,32(%rsi) > > + pxor %xmm4,%xmm8 > > + movups %xmm5,48(%rsi) > > + pxor %xmm5,%xmm8 > > + movups %xmm6,64(%rsi) > > + pxor %xmm6,%xmm8 > > + movups %xmm7,80(%rsi) > > + pxor %xmm7,%xmm8 > > + leaq 96(%rsi),%rsi > > + subq $6,%rdx > > + jnc .Locb_dec_grandloop > > + > > +.Locb_dec_short: > > + addq $6,%rdx > > + jz .Locb_dec_done > > + > > + movdqu 0(%rdi),%xmm2 > > + cmpq $2,%rdx > > + jb .Locb_dec_one > > + movdqu 16(%rdi),%xmm3 > > + je .Locb_dec_two > > + > > + movdqu 32(%rdi),%xmm4 > > + cmpq $4,%rdx > > + jb .Locb_dec_three > > + movdqu 48(%rdi),%xmm5 > > + je .Locb_dec_four > > + > > + movdqu 64(%rdi),%xmm6 > > + pxor %xmm7,%xmm7 > > + > > + call __ocb_decrypt6 > > + > > + movdqa %xmm14,%xmm15 > > + movups %xmm2,0(%rsi) > > + pxor %xmm2,%xmm8 > > + movups %xmm3,16(%rsi) > > + pxor %xmm3,%xmm8 > > + movups %xmm4,32(%rsi) > > + pxor %xmm4,%xmm8 > > + movups %xmm5,48(%rsi) > > + pxor %xmm5,%xmm8 > > + movups %xmm6,64(%rsi) > > + pxor %xmm6,%xmm8 > > + > > + jmp .Locb_dec_done > > + > > +.align 16 > > +.Locb_dec_one: > > + movdqa %xmm10,%xmm7 > > + > > + call __ocb_decrypt1 > > + > > + movdqa %xmm7,%xmm15 > > + movups %xmm2,0(%rsi) > > + xorps %xmm2,%xmm8 > > + jmp .Locb_dec_done > > + > > +.align 16 > > +.Locb_dec_two: > > + pxor %xmm4,%xmm4 > > + pxor %xmm5,%xmm5 > > + > > + call __ocb_decrypt4 > > + > > + movdqa %xmm11,%xmm15 > > + movups %xmm2,0(%rsi) > > + xorps %xmm2,%xmm8 > > + movups %xmm3,16(%rsi) > > + xorps %xmm3,%xmm8 > > + > > + jmp .Locb_dec_done > > + > > +.align 16 > > +.Locb_dec_three: > > + pxor %xmm5,%xmm5 > > + > > + call __ocb_decrypt4 > > + > > + movdqa %xmm12,%xmm15 > > + movups %xmm2,0(%rsi) > > + xorps %xmm2,%xmm8 > > + movups %xmm3,16(%rsi) > > + xorps %xmm3,%xmm8 > > + movups %xmm4,32(%rsi) > > + xorps %xmm4,%xmm8 > > + > > + jmp .Locb_dec_done > > + > > +.align 16 > > +.Locb_dec_four: > > + call __ocb_decrypt4 > > + > > + movdqa %xmm13,%xmm15 > > + movups %xmm2,0(%rsi) > > + pxor %xmm2,%xmm8 > > + movups %xmm3,16(%rsi) > > + pxor %xmm3,%xmm8 > > + movups %xmm4,32(%rsi) > > + pxor %xmm4,%xmm8 > > + movups %xmm5,48(%rsi) > > + pxor %xmm5,%xmm8 > > + > > +.Locb_dec_done: > > + pxor %xmm0,%xmm15 > > + movdqu %xmm8,(%rbp) > > + movdqu %xmm15,(%r9) > > + > > + xorps %xmm0,%xmm0 > > + pxor %xmm1,%xmm1 > > + pxor %xmm2,%xmm2 > > + pxor %xmm3,%xmm3 > > + pxor %xmm4,%xmm4 > > + pxor %xmm5,%xmm5 > > + pxor %xmm6,%xmm6 > > + pxor %xmm7,%xmm7 > > + pxor %xmm8,%xmm8 > > + pxor %xmm9,%xmm9 > > + pxor %xmm10,%xmm10 > > + pxor %xmm11,%xmm11 > > + pxor %xmm12,%xmm12 > > + pxor %xmm13,%xmm13 > > + pxor %xmm14,%xmm14 > > + pxor %xmm15,%xmm15 > > + leaq 40(%rsp),%rax > > +.cfi_def_cfa %rax,8 > > + movq -40(%rax),%r14 > > +.cfi_restore %r14 > > + movq -32(%rax),%r13 > > +.cfi_restore %r13 > > + movq -24(%rax),%r12 > > +.cfi_restore %r12 > > + movq -16(%rax),%rbp > > +.cfi_restore %rbp > > + movq -8(%rax),%rbx > > 
+.cfi_restore %rbx > > + leaq (%rax),%rsp > > +.cfi_def_cfa_register %rsp > > +.Locb_dec_epilogue: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size aesni_ocb_decrypt,.-aesni_ocb_decrypt > > + > > +.type __ocb_decrypt6,@function > > +.align 32 > > +__ocb_decrypt6: > > +.cfi_startproc > > + pxor %xmm9,%xmm15 > > + movdqu (%rbx,%r12,1),%xmm11 > > + movdqa %xmm10,%xmm12 > > + movdqu (%rbx,%r13,1),%xmm13 > > + movdqa %xmm10,%xmm14 > > + pxor %xmm15,%xmm10 > > + movdqu (%rbx,%r14,1),%xmm15 > > + pxor %xmm10,%xmm11 > > + pxor %xmm10,%xmm2 > > + pxor %xmm11,%xmm12 > > + pxor %xmm11,%xmm3 > > + pxor %xmm12,%xmm13 > > + pxor %xmm12,%xmm4 > > + pxor %xmm13,%xmm14 > > + pxor %xmm13,%xmm5 > > + pxor %xmm14,%xmm15 > > + pxor %xmm14,%xmm6 > > + pxor %xmm15,%xmm7 > > + movups 32(%r11),%xmm0 > > + > > + leaq 1(%r8),%r12 > > + leaq 3(%r8),%r13 > > + leaq 5(%r8),%r14 > > + addq $6,%r8 > > + pxor %xmm9,%xmm10 > > + bsfq %r12,%r12 > > + bsfq %r13,%r13 > > + bsfq %r14,%r14 > > + > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > + pxor %xmm9,%xmm11 > > + pxor %xmm9,%xmm12 > > +.byte 102,15,56,222,241 > > + pxor %xmm9,%xmm13 > > + pxor %xmm9,%xmm14 > > +.byte 102,15,56,222,249 > > + movups 48(%r11),%xmm1 > > + pxor %xmm9,%xmm15 > > + > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > > +.byte 102,15,56,222,240 > > +.byte 102,15,56,222,248 > > + movups 64(%r11),%xmm0 > > + shlq $4,%r12 > > + shlq $4,%r13 > > + jmp .Locb_dec_loop6 > > + > > +.align 32 > > +.Locb_dec_loop6: > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > +.byte 102,15,56,222,241 > > +.byte 102,15,56,222,249 > > + movups (%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > + > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > > +.byte 102,15,56,222,240 > > +.byte 102,15,56,222,248 > > + movups -16(%rcx,%rax,1),%xmm0 > > + jnz .Locb_dec_loop6 > > + > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > +.byte 102,15,56,222,241 > > +.byte 102,15,56,222,249 > > + movups 16(%r11),%xmm1 > > + shlq $4,%r14 > > + > > +.byte 102,65,15,56,223,210 > > + movdqu (%rbx),%xmm10 > > + movq %r10,%rax > > +.byte 102,65,15,56,223,219 > > +.byte 102,65,15,56,223,228 > > +.byte 102,65,15,56,223,237 > > +.byte 102,65,15,56,223,246 > > +.byte 102,65,15,56,223,255 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size __ocb_decrypt6,.-__ocb_decrypt6 > > + > > +.type __ocb_decrypt4,@function > > +.align 32 > > +__ocb_decrypt4: > > +.cfi_startproc > > + pxor %xmm9,%xmm15 > > + movdqu (%rbx,%r12,1),%xmm11 > > + movdqa %xmm10,%xmm12 > > + movdqu (%rbx,%r13,1),%xmm13 > > + pxor %xmm15,%xmm10 > > + pxor %xmm10,%xmm11 > > + pxor %xmm10,%xmm2 > > + pxor %xmm11,%xmm12 > > + pxor %xmm11,%xmm3 > > + pxor %xmm12,%xmm13 > > + pxor %xmm12,%xmm4 > > + pxor %xmm13,%xmm5 > > + movups 32(%r11),%xmm0 > > + > > + pxor %xmm9,%xmm10 > > + pxor %xmm9,%xmm11 > > + pxor %xmm9,%xmm12 > > + pxor %xmm9,%xmm13 > > + > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > + movups 48(%r11),%xmm1 > > + > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > > + movups 64(%r11),%xmm0 > > + jmp .Locb_dec_loop4 > > + > > +.align 32 > > +.Locb_dec_loop4: > > 
+.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > + movups (%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > + > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > > + movups -16(%rcx,%rax,1),%xmm0 > > + jnz .Locb_dec_loop4 > > + > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > + movups 16(%r11),%xmm1 > > + movq %r10,%rax > > + > > +.byte 102,65,15,56,223,210 > > +.byte 102,65,15,56,223,219 > > +.byte 102,65,15,56,223,228 > > +.byte 102,65,15,56,223,237 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size __ocb_decrypt4,.-__ocb_decrypt4 > > + > > +.type __ocb_decrypt1,@function > > +.align 32 > > +__ocb_decrypt1: > > +.cfi_startproc > > + pxor %xmm15,%xmm7 > > + pxor %xmm9,%xmm7 > > + pxor %xmm7,%xmm2 > > + movups 32(%r11),%xmm0 > > + > > +.byte 102,15,56,222,209 > > + movups 48(%r11),%xmm1 > > + pxor %xmm9,%xmm7 > > + > > +.byte 102,15,56,222,208 > > + movups 64(%r11),%xmm0 > > + jmp .Locb_dec_loop1 > > + > > +.align 32 > > +.Locb_dec_loop1: > > +.byte 102,15,56,222,209 > > + movups (%rcx,%rax,1),%xmm1 > > + addq $32,%rax > > + > > +.byte 102,15,56,222,208 > > + movups -16(%rcx,%rax,1),%xmm0 > > + jnz .Locb_dec_loop1 > > + > > +.byte 102,15,56,222,209 > > + movups 16(%r11),%xmm1 > > + movq %r10,%rax > > + > > +.byte 102,15,56,223,215 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size __ocb_decrypt1,.-__ocb_decrypt1 > > +.globl aesni_cbc_encrypt > > +.type aesni_cbc_encrypt,@function > > +.align 16 > > +aesni_cbc_encrypt: > > +.cfi_startproc > > + testq %rdx,%rdx > > + jz .Lcbc_ret > > + > > + movl 240(%rcx),%r10d > > + movq %rcx,%r11 > > + testl %r9d,%r9d > > + jz .Lcbc_decrypt > > + > > + movups (%r8),%xmm2 > > + movl %r10d,%eax > > + cmpq $16,%rdx > > + jb .Lcbc_enc_tail > > + subq $16,%rdx > > + jmp .Lcbc_enc_loop > > +.align 16 > > +.Lcbc_enc_loop: > > + movups (%rdi),%xmm3 > > + leaq 16(%rdi),%rdi > > + > > + movups (%rcx),%xmm0 > > + movups 16(%rcx),%xmm1 > > + xorps %xmm0,%xmm3 > > + leaq 32(%rcx),%rcx > > + xorps %xmm3,%xmm2 > > +.Loop_enc1_15: > > +.byte 102,15,56,220,209 > > + decl %eax > > + movups (%rcx),%xmm1 > > + leaq 16(%rcx),%rcx > > + jnz .Loop_enc1_15 > > +.byte 102,15,56,221,209 > > + movl %r10d,%eax > > + movq %r11,%rcx > > + movups %xmm2,0(%rsi) > > + leaq 16(%rsi),%rsi > > + subq $16,%rdx > > + jnc .Lcbc_enc_loop > > + addq $16,%rdx > > + jnz .Lcbc_enc_tail > > + pxor %xmm0,%xmm0 > > + pxor %xmm1,%xmm1 > > + movups %xmm2,(%r8) > > + pxor %xmm2,%xmm2 > > + pxor %xmm3,%xmm3 > > + jmp .Lcbc_ret > > + > > +.Lcbc_enc_tail: > > + movq %rdx,%rcx > > + xchgq %rdi,%rsi > > +.long 0x9066A4F3 > > + movl $16,%ecx > > + subq %rdx,%rcx > > + xorl %eax,%eax > > +.long 0x9066AAF3 > > + leaq -16(%rdi),%rdi > > + movl %r10d,%eax > > + movq %rdi,%rsi > > + movq %r11,%rcx > > + xorq %rdx,%rdx > > + jmp .Lcbc_enc_loop > > + > > +.align 16 > > +.Lcbc_decrypt: > > + cmpq $16,%rdx > > + jne .Lcbc_decrypt_bulk > > + > > + > > + > > + movdqu (%rdi),%xmm2 > > + movdqu (%r8),%xmm3 > > + movdqa %xmm2,%xmm4 > > + movups (%rcx),%xmm0 > > + movups 16(%rcx),%xmm1 > > + leaq 32(%rcx),%rcx > > + xorps %xmm0,%xmm2 > > +.Loop_dec1_16: > > +.byte 102,15,56,222,209 > > + decl %r10d > > + movups (%rcx),%xmm1 > > + leaq 16(%rcx),%rcx > > + jnz .Loop_dec1_16 > > +.byte 102,15,56,223,209 > > + pxor %xmm0,%xmm0 > > + pxor %xmm1,%xmm1 > > + movdqu %xmm4,(%r8) > > + xorps %xmm3,%xmm2 > > + pxor %xmm3,%xmm3 > > + movups 
%xmm2,(%rsi) > > + pxor %xmm2,%xmm2 > > + jmp .Lcbc_ret > > +.align 16 > > +.Lcbc_decrypt_bulk: > > + leaq (%rsp),%r11 > > +.cfi_def_cfa_register %r11 > > + pushq %rbp > > +.cfi_offset %rbp,-16 > > + subq $16,%rsp > > + andq $-16,%rsp > > + movq %rcx,%rbp > > + movups (%r8),%xmm10 > > + movl %r10d,%eax > > + cmpq $0x50,%rdx > > + jbe .Lcbc_dec_tail > > + > > + movups (%rcx),%xmm0 > > + movdqu 0(%rdi),%xmm2 > > + movdqu 16(%rdi),%xmm3 > > + movdqa %xmm2,%xmm11 > > + movdqu 32(%rdi),%xmm4 > > + movdqa %xmm3,%xmm12 > > + movdqu 48(%rdi),%xmm5 > > + movdqa %xmm4,%xmm13 > > + movdqu 64(%rdi),%xmm6 > > + movdqa %xmm5,%xmm14 > > + movdqu 80(%rdi),%xmm7 > > + movdqa %xmm6,%xmm15 > > + movl OPENSSL_ia32cap_P+4(%rip),%r9d > > + cmpq $0x70,%rdx > > + jbe .Lcbc_dec_six_or_seven > > + > > + andl $71303168,%r9d > > + subq $0x50,%rdx > > + cmpl $4194304,%r9d > > + je .Lcbc_dec_loop6_enter > > + subq $0x20,%rdx > > + leaq 112(%rcx),%rcx > > + jmp .Lcbc_dec_loop8_enter > > +.align 16 > > +.Lcbc_dec_loop8: > > + movups %xmm9,(%rsi) > > + leaq 16(%rsi),%rsi > > +.Lcbc_dec_loop8_enter: > > + movdqu 96(%rdi),%xmm8 > > + pxor %xmm0,%xmm2 > > + movdqu 112(%rdi),%xmm9 > > + pxor %xmm0,%xmm3 > > + movups 16-112(%rcx),%xmm1 > > + pxor %xmm0,%xmm4 > > + movq $-1,%rbp > > + cmpq $0x70,%rdx > > + pxor %xmm0,%xmm5 > > + pxor %xmm0,%xmm6 > > + pxor %xmm0,%xmm7 > > + pxor %xmm0,%xmm8 > > + > > +.byte 102,15,56,222,209 > > + pxor %xmm0,%xmm9 > > + movups 32-112(%rcx),%xmm0 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > +.byte 102,15,56,222,241 > > +.byte 102,15,56,222,249 > > +.byte 102,68,15,56,222,193 > > + adcq $0,%rbp > > + andq $128,%rbp > > +.byte 102,68,15,56,222,201 > > + addq %rdi,%rbp > > + movups 48-112(%rcx),%xmm1 > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > > +.byte 102,15,56,222,240 > > +.byte 102,15,56,222,248 > > +.byte 102,68,15,56,222,192 > > +.byte 102,68,15,56,222,200 > > + movups 64-112(%rcx),%xmm0 > > + nop > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > +.byte 102,15,56,222,241 > > +.byte 102,15,56,222,249 > > +.byte 102,68,15,56,222,193 > > +.byte 102,68,15,56,222,201 > > + movups 80-112(%rcx),%xmm1 > > + nop > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > > +.byte 102,15,56,222,240 > > +.byte 102,15,56,222,248 > > +.byte 102,68,15,56,222,192 > > +.byte 102,68,15,56,222,200 > > + movups 96-112(%rcx),%xmm0 > > + nop > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > +.byte 102,15,56,222,241 > > +.byte 102,15,56,222,249 > > +.byte 102,68,15,56,222,193 > > +.byte 102,68,15,56,222,201 > > + movups 112-112(%rcx),%xmm1 > > + nop > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > > +.byte 102,15,56,222,240 > > +.byte 102,15,56,222,248 > > +.byte 102,68,15,56,222,192 > > +.byte 102,68,15,56,222,200 > > + movups 128-112(%rcx),%xmm0 > > + nop > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > +.byte 102,15,56,222,241 > > +.byte 102,15,56,222,249 > > +.byte 102,68,15,56,222,193 > > +.byte 102,68,15,56,222,201 > > + movups 144-112(%rcx),%xmm1 > > + cmpl $11,%eax > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > 
+.byte 102,15,56,222,232 > > +.byte 102,15,56,222,240 > > +.byte 102,15,56,222,248 > > +.byte 102,68,15,56,222,192 > > +.byte 102,68,15,56,222,200 > > + movups 160-112(%rcx),%xmm0 > > + jb .Lcbc_dec_done > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > +.byte 102,15,56,222,241 > > +.byte 102,15,56,222,249 > > +.byte 102,68,15,56,222,193 > > +.byte 102,68,15,56,222,201 > > + movups 176-112(%rcx),%xmm1 > > + nop > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > > +.byte 102,15,56,222,240 > > +.byte 102,15,56,222,248 > > +.byte 102,68,15,56,222,192 > > +.byte 102,68,15,56,222,200 > > + movups 192-112(%rcx),%xmm0 > > + je .Lcbc_dec_done > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > +.byte 102,15,56,222,241 > > +.byte 102,15,56,222,249 > > +.byte 102,68,15,56,222,193 > > +.byte 102,68,15,56,222,201 > > + movups 208-112(%rcx),%xmm1 > > + nop > > +.byte 102,15,56,222,208 > > +.byte 102,15,56,222,216 > > +.byte 102,15,56,222,224 > > +.byte 102,15,56,222,232 > > +.byte 102,15,56,222,240 > > +.byte 102,15,56,222,248 > > +.byte 102,68,15,56,222,192 > > +.byte 102,68,15,56,222,200 > > + movups 224-112(%rcx),%xmm0 > > + jmp .Lcbc_dec_done > > +.align 16 > > +.Lcbc_dec_done: > > +.byte 102,15,56,222,209 > > +.byte 102,15,56,222,217 > > + pxor %xmm0,%xmm10 > > + pxor %xmm0,%xmm11 > > +.byte 102,15,56,222,225 > > +.byte 102,15,56,222,233 > > + pxor %xmm0,%xmm12 > > + pxor %xmm0,%xmm13 > > +.byte 102,15,56,222,241 > > +.byte 102,15,56,222,249 > > + pxor %xmm0,%xmm14 > > + pxor %xmm0,%xmm15 > > +.byte 102,68,15,56,222,193 > > +.byte 102,68,15,56,222,201 > > + movdqu 80(%rdi),%xmm1 > > + > > +.byte 102,65,15,56,223,210 > > + movdqu 96(%rdi),%xmm10 > > + pxor %xmm0,%xmm1 > > +.byte 102,65,15,56,223,219 > > + pxor %xmm0,%xmm10 > > + movdqu 112(%rdi),%xmm0 > > +.byte 102,65,15,56,223,228 > > + leaq 128(%rdi),%rdi > > + movdqu 0(%rbp),%xmm11 > > +.byte 102,65,15,56,223,237 > > +.byte 102,65,15,56,223,246 > > + movdqu 16(%rbp),%xmm12 > > + movdqu 32(%rbp),%xmm13 > > +.byte 102,65,15,56,223,255 > > +.byte 102,68,15,56,223,193 > > + movdqu 48(%rbp),%xmm14 > > + movdqu 64(%rbp),%xmm15 > > +.byte 102,69,15,56,223,202 > > + movdqa %xmm0,%xmm10 > > + movdqu 80(%rbp),%xmm1 > > + movups -112(%rcx),%xmm0 > > + > > + movups %xmm2,(%rsi) > > + movdqa %xmm11,%xmm2 > > + movups %xmm3,16(%rsi) > > + movdqa %xmm12,%xmm3 > > + movups %xmm4,32(%rsi) > > + movdqa %xmm13,%xmm4 > > + movups %xmm5,48(%rsi) > > + movdqa %xmm14,%xmm5 > > + movups %xmm6,64(%rsi) > > + movdqa %xmm15,%xmm6 > > + movups %xmm7,80(%rsi) > > + movdqa %xmm1,%xmm7 > > + movups %xmm8,96(%rsi) > > + leaq 112(%rsi),%rsi > > + > > + subq $0x80,%rdx > > + ja .Lcbc_dec_loop8 > > + > > + movaps %xmm9,%xmm2 > > + leaq -112(%rcx),%rcx > > + addq $0x70,%rdx > > + jle .Lcbc_dec_clear_tail_collected > > + movups %xmm9,(%rsi) > > + leaq 16(%rsi),%rsi > > + cmpq $0x50,%rdx > > + jbe .Lcbc_dec_tail > > + > > + movaps %xmm11,%xmm2 > > +.Lcbc_dec_six_or_seven: > > + cmpq $0x60,%rdx > > + ja .Lcbc_dec_seven > > + > > + movaps %xmm7,%xmm8 > > + call _aesni_decrypt6 > > + pxor %xmm10,%xmm2 > > + movaps %xmm8,%xmm10 > > + pxor %xmm11,%xmm3 > > + movdqu %xmm2,(%rsi) > > + pxor %xmm12,%xmm4 > > + movdqu %xmm3,16(%rsi) > > + pxor %xmm3,%xmm3 > > + pxor %xmm13,%xmm5 > > + movdqu %xmm4,32(%rsi) > > + pxor %xmm4,%xmm4 > > + pxor %xmm14,%xmm6 > > + movdqu %xmm5,48(%rsi) > > + 
pxor %xmm5,%xmm5 > > + pxor %xmm15,%xmm7 > > + movdqu %xmm6,64(%rsi) > > + pxor %xmm6,%xmm6 > > + leaq 80(%rsi),%rsi > > + movdqa %xmm7,%xmm2 > > + pxor %xmm7,%xmm7 > > + jmp .Lcbc_dec_tail_collected > > + > > +.align 16 > > +.Lcbc_dec_seven: > > + movups 96(%rdi),%xmm8 > > + xorps %xmm9,%xmm9 > > + call _aesni_decrypt8 > > + movups 80(%rdi),%xmm9 > > + pxor %xmm10,%xmm2 > > + movups 96(%rdi),%xmm10 > > + pxor %xmm11,%xmm3 > > + movdqu %xmm2,(%rsi) > > + pxor %xmm12,%xmm4 > > + movdqu %xmm3,16(%rsi) > > + pxor %xmm3,%xmm3 > > + pxor %xmm13,%xmm5 > > + movdqu %xmm4,32(%rsi) > > + pxor %xmm4,%xmm4 > > + pxor %xmm14,%xmm6 > > + movdqu %xmm5,48(%rsi) > > + pxor %xmm5,%xmm5 > > + pxor %xmm15,%xmm7 > > + movdqu %xmm6,64(%rsi) > > + pxor %xmm6,%xmm6 > > + pxor %xmm9,%xmm8 > > + movdqu %xmm7,80(%rsi) > > + pxor %xmm7,%xmm7 > > + leaq 96(%rsi),%rsi > > + movdqa %xmm8,%xmm2 > > + pxor %xmm8,%xmm8 > > + pxor %xmm9,%xmm9 > > + jmp .Lcbc_dec_tail_collected > > + > > +.align 16 > > +.Lcbc_dec_loop6: > > + movups %xmm7,(%rsi) > > + leaq 16(%rsi),%rsi > > + movdqu 0(%rdi),%xmm2 > > + movdqu 16(%rdi),%xmm3 > > + movdqa %xmm2,%xmm11 > > + movdqu 32(%rdi),%xmm4 > > + movdqa %xmm3,%xmm12 > > + movdqu 48(%rdi),%xmm5 > > + movdqa %xmm4,%xmm13 > > + movdqu 64(%rdi),%xmm6 > > + movdqa %xmm5,%xmm14 > > + movdqu 80(%rdi),%xmm7 > > + movdqa %xmm6,%xmm15 > > +.Lcbc_dec_loop6_enter: > > + leaq 96(%rdi),%rdi > > + movdqa %xmm7,%xmm8 > > + > > + call _aesni_decrypt6 > > + > > + pxor %xmm10,%xmm2 > > + movdqa %xmm8,%xmm10 > > + pxor %xmm11,%xmm3 > > + movdqu %xmm2,(%rsi) > > + pxor %xmm12,%xmm4 > > + movdqu %xmm3,16(%rsi) > > + pxor %xmm13,%xmm5 > > + movdqu %xmm4,32(%rsi) > > + pxor %xmm14,%xmm6 > > + movq %rbp,%rcx > > + movdqu %xmm5,48(%rsi) > > + pxor %xmm15,%xmm7 > > + movl %r10d,%eax > > + movdqu %xmm6,64(%rsi) > > + leaq 80(%rsi),%rsi > > + subq $0x60,%rdx > > + ja .Lcbc_dec_loop6 > > + > > + movdqa %xmm7,%xmm2 > > + addq $0x50,%rdx > > + jle .Lcbc_dec_clear_tail_collected > > + movups %xmm7,(%rsi) > > + leaq 16(%rsi),%rsi > > + > > +.Lcbc_dec_tail: > > + movups (%rdi),%xmm2 > > + subq $0x10,%rdx > > + jbe .Lcbc_dec_one > > + > > + movups 16(%rdi),%xmm3 > > + movaps %xmm2,%xmm11 > > + subq $0x10,%rdx > > + jbe .Lcbc_dec_two > > + > > + movups 32(%rdi),%xmm4 > > + movaps %xmm3,%xmm12 > > + subq $0x10,%rdx > > + jbe .Lcbc_dec_three > > + > > + movups 48(%rdi),%xmm5 > > + movaps %xmm4,%xmm13 > > + subq $0x10,%rdx > > + jbe .Lcbc_dec_four > > + > > + movups 64(%rdi),%xmm6 > > + movaps %xmm5,%xmm14 > > + movaps %xmm6,%xmm15 > > + xorps %xmm7,%xmm7 > > + call _aesni_decrypt6 > > + pxor %xmm10,%xmm2 > > + movaps %xmm15,%xmm10 > > + pxor %xmm11,%xmm3 > > + movdqu %xmm2,(%rsi) > > + pxor %xmm12,%xmm4 > > + movdqu %xmm3,16(%rsi) > > + pxor %xmm3,%xmm3 > > + pxor %xmm13,%xmm5 > > + movdqu %xmm4,32(%rsi) > > + pxor %xmm4,%xmm4 > > + pxor %xmm14,%xmm6 > > + movdqu %xmm5,48(%rsi) > > + pxor %xmm5,%xmm5 > > + leaq 64(%rsi),%rsi > > + movdqa %xmm6,%xmm2 > > + pxor %xmm6,%xmm6 > > + pxor %xmm7,%xmm7 > > + subq $0x10,%rdx > > + jmp .Lcbc_dec_tail_collected > > + > > +.align 16 > > +.Lcbc_dec_one: > > + movaps %xmm2,%xmm11 > > + movups (%rcx),%xmm0 > > + movups 16(%rcx),%xmm1 > > + leaq 32(%rcx),%rcx > > + xorps %xmm0,%xmm2 > > +.Loop_dec1_17: > > +.byte 102,15,56,222,209 > > + decl %eax > > + movups (%rcx),%xmm1 > > + leaq 16(%rcx),%rcx > > + jnz .Loop_dec1_17 > > +.byte 102,15,56,223,209 > > + xorps %xmm10,%xmm2 > > + movaps %xmm11,%xmm10 > > + jmp .Lcbc_dec_tail_collected > > +.align 16 > > +.Lcbc_dec_two: > > + movaps 
%xmm3,%xmm12 > > + call _aesni_decrypt2 > > + pxor %xmm10,%xmm2 > > + movaps %xmm12,%xmm10 > > + pxor %xmm11,%xmm3 > > + movdqu %xmm2,(%rsi) > > + movdqa %xmm3,%xmm2 > > + pxor %xmm3,%xmm3 > > + leaq 16(%rsi),%rsi > > + jmp .Lcbc_dec_tail_collected > > +.align 16 > > +.Lcbc_dec_three: > > + movaps %xmm4,%xmm13 > > + call _aesni_decrypt3 > > + pxor %xmm10,%xmm2 > > + movaps %xmm13,%xmm10 > > + pxor %xmm11,%xmm3 > > + movdqu %xmm2,(%rsi) > > + pxor %xmm12,%xmm4 > > + movdqu %xmm3,16(%rsi) > > + pxor %xmm3,%xmm3 > > + movdqa %xmm4,%xmm2 > > + pxor %xmm4,%xmm4 > > + leaq 32(%rsi),%rsi > > + jmp .Lcbc_dec_tail_collected > > +.align 16 > > +.Lcbc_dec_four: > > + movaps %xmm5,%xmm14 > > + call _aesni_decrypt4 > > + pxor %xmm10,%xmm2 > > + movaps %xmm14,%xmm10 > > + pxor %xmm11,%xmm3 > > + movdqu %xmm2,(%rsi) > > + pxor %xmm12,%xmm4 > > + movdqu %xmm3,16(%rsi) > > + pxor %xmm3,%xmm3 > > + pxor %xmm13,%xmm5 > > + movdqu %xmm4,32(%rsi) > > + pxor %xmm4,%xmm4 > > + movdqa %xmm5,%xmm2 > > + pxor %xmm5,%xmm5 > > + leaq 48(%rsi),%rsi > > + jmp .Lcbc_dec_tail_collected > > + > > +.align 16 > > +.Lcbc_dec_clear_tail_collected: > > + pxor %xmm3,%xmm3 > > + pxor %xmm4,%xmm4 > > + pxor %xmm5,%xmm5 > > + pxor %xmm6,%xmm6 > > + pxor %xmm7,%xmm7 > > + pxor %xmm8,%xmm8 > > + pxor %xmm9,%xmm9 > > +.Lcbc_dec_tail_collected: > > + movups %xmm10,(%r8) > > + andq $15,%rdx > > + jnz .Lcbc_dec_tail_partial > > + movups %xmm2,(%rsi) > > + pxor %xmm2,%xmm2 > > + jmp .Lcbc_dec_ret > > +.align 16 > > +.Lcbc_dec_tail_partial: > > + movaps %xmm2,(%rsp) > > + pxor %xmm2,%xmm2 > > + movq $16,%rcx > > + movq %rsi,%rdi > > + subq %rdx,%rcx > > + leaq (%rsp),%rsi > > +.long 0x9066A4F3 > > + movdqa %xmm2,(%rsp) > > + > > +.Lcbc_dec_ret: > > + xorps %xmm0,%xmm0 > > + pxor %xmm1,%xmm1 > > + movq -8(%r11),%rbp > > +.cfi_restore %rbp > > + leaq (%r11),%rsp > > +.cfi_def_cfa_register %rsp > > +.Lcbc_ret: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size aesni_cbc_encrypt,.-aesni_cbc_encrypt > > +.globl aesni_set_decrypt_key > > +.type aesni_set_decrypt_key,@function > > +.align 16 > > +aesni_set_decrypt_key: > > +.cfi_startproc > > +.byte 0x48,0x83,0xEC,0x08 > > +.cfi_adjust_cfa_offset 8 > > + call __aesni_set_encrypt_key > > + shll $4,%esi > > + testl %eax,%eax > > + jnz .Ldec_key_ret > > + leaq 16(%rdx,%rsi,1),%rdi > > + > > + movups (%rdx),%xmm0 > > + movups (%rdi),%xmm1 > > + movups %xmm0,(%rdi) > > + movups %xmm1,(%rdx) > > + leaq 16(%rdx),%rdx > > + leaq -16(%rdi),%rdi > > + > > +.Ldec_key_inverse: > > + movups (%rdx),%xmm0 > > + movups (%rdi),%xmm1 > > +.byte 102,15,56,219,192 > > +.byte 102,15,56,219,201 > > + leaq 16(%rdx),%rdx > > + leaq -16(%rdi),%rdi > > + movups %xmm0,16(%rdi) > > + movups %xmm1,-16(%rdx) > > + cmpq %rdx,%rdi > > + ja .Ldec_key_inverse > > + > > + movups (%rdx),%xmm0 > > +.byte 102,15,56,219,192 > > + pxor %xmm1,%xmm1 > > + movups %xmm0,(%rdi) > > + pxor %xmm0,%xmm0 > > +.Ldec_key_ret: > > + addq $8,%rsp > > +.cfi_adjust_cfa_offset -8 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.LSEH_end_set_decrypt_key: > > +.size aesni_set_decrypt_key,.-aesni_set_decrypt_key > > +.globl aesni_set_encrypt_key > > +.type aesni_set_encrypt_key,@function > > +.align 16 > > +aesni_set_encrypt_key: > > +__aesni_set_encrypt_key: > > +.cfi_startproc > > +.byte 0x48,0x83,0xEC,0x08 > > +.cfi_adjust_cfa_offset 8 > > + movq $-1,%rax > > + testq %rdi,%rdi > > + jz .Lenc_key_ret > > + testq %rdx,%rdx > > + jz .Lenc_key_ret > > + > > + movl $268437504,%r10d > > + movups (%rdi),%xmm0 > > + xorps %xmm4,%xmm4 > > + andl 
OPENSSL_ia32cap_P+4(%rip),%r10d > > + leaq 16(%rdx),%rax > > + cmpl $256,%esi > > + je .L14rounds > > + cmpl $192,%esi > > + je .L12rounds > > + cmpl $128,%esi > > + jne .Lbad_keybits > > + > > +.L10rounds: > > + movl $9,%esi > > + cmpl $268435456,%r10d > > + je .L10rounds_alt > > + > > + movups %xmm0,(%rdx) > > +.byte 102,15,58,223,200,1 > > + call .Lkey_expansion_128_cold > > +.byte 102,15,58,223,200,2 > > + call .Lkey_expansion_128 > > +.byte 102,15,58,223,200,4 > > + call .Lkey_expansion_128 > > +.byte 102,15,58,223,200,8 > > + call .Lkey_expansion_128 > > +.byte 102,15,58,223,200,16 > > + call .Lkey_expansion_128 > > +.byte 102,15,58,223,200,32 > > + call .Lkey_expansion_128 > > +.byte 102,15,58,223,200,64 > > + call .Lkey_expansion_128 > > +.byte 102,15,58,223,200,128 > > + call .Lkey_expansion_128 > > +.byte 102,15,58,223,200,27 > > + call .Lkey_expansion_128 > > +.byte 102,15,58,223,200,54 > > + call .Lkey_expansion_128 > > + movups %xmm0,(%rax) > > + movl %esi,80(%rax) > > + xorl %eax,%eax > > + jmp .Lenc_key_ret > > + > > +.align 16 > > +.L10rounds_alt: > > + movdqa .Lkey_rotate(%rip),%xmm5 > > + movl $8,%r10d > > + movdqa .Lkey_rcon1(%rip),%xmm4 > > + movdqa %xmm0,%xmm2 > > + movdqu %xmm0,(%rdx) > > + jmp .Loop_key128 > > + > > +.align 16 > > +.Loop_key128: > > +.byte 102,15,56,0,197 > > +.byte 102,15,56,221,196 > > + pslld $1,%xmm4 > > + leaq 16(%rax),%rax > > + > > + movdqa %xmm2,%xmm3 > > + pslldq $4,%xmm2 > > + pxor %xmm2,%xmm3 > > + pslldq $4,%xmm2 > > + pxor %xmm2,%xmm3 > > + pslldq $4,%xmm2 > > + pxor %xmm3,%xmm2 > > + > > + pxor %xmm2,%xmm0 > > + movdqu %xmm0,-16(%rax) > > + movdqa %xmm0,%xmm2 > > + > > + decl %r10d > > + jnz .Loop_key128 > > + > > + movdqa .Lkey_rcon1b(%rip),%xmm4 > > + > > +.byte 102,15,56,0,197 > > +.byte 102,15,56,221,196 > > + pslld $1,%xmm4 > > + > > + movdqa %xmm2,%xmm3 > > + pslldq $4,%xmm2 > > + pxor %xmm2,%xmm3 > > + pslldq $4,%xmm2 > > + pxor %xmm2,%xmm3 > > + pslldq $4,%xmm2 > > + pxor %xmm3,%xmm2 > > + > > + pxor %xmm2,%xmm0 > > + movdqu %xmm0,(%rax) > > + > > + movdqa %xmm0,%xmm2 > > +.byte 102,15,56,0,197 > > +.byte 102,15,56,221,196 > > + > > + movdqa %xmm2,%xmm3 > > + pslldq $4,%xmm2 > > + pxor %xmm2,%xmm3 > > + pslldq $4,%xmm2 > > + pxor %xmm2,%xmm3 > > + pslldq $4,%xmm2 > > + pxor %xmm3,%xmm2 > > + > > + pxor %xmm2,%xmm0 > > + movdqu %xmm0,16(%rax) > > + > > + movl %esi,96(%rax) > > + xorl %eax,%eax > > + jmp .Lenc_key_ret > > + > > +.align 16 > > +.L12rounds: > > + movq 16(%rdi),%xmm2 > > + movl $11,%esi > > + cmpl $268435456,%r10d > > + je .L12rounds_alt > > + > > + movups %xmm0,(%rdx) > > +.byte 102,15,58,223,202,1 > > + call .Lkey_expansion_192a_cold > > +.byte 102,15,58,223,202,2 > > + call .Lkey_expansion_192b > > +.byte 102,15,58,223,202,4 > > + call .Lkey_expansion_192a > > +.byte 102,15,58,223,202,8 > > + call .Lkey_expansion_192b > > +.byte 102,15,58,223,202,16 > > + call .Lkey_expansion_192a > > +.byte 102,15,58,223,202,32 > > + call .Lkey_expansion_192b > > +.byte 102,15,58,223,202,64 > > + call .Lkey_expansion_192a > > +.byte 102,15,58,223,202,128 > > + call .Lkey_expansion_192b > > + movups %xmm0,(%rax) > > + movl %esi,48(%rax) > > + xorq %rax,%rax > > + jmp .Lenc_key_ret > > + > > +.align 16 > > +.L12rounds_alt: > > + movdqa .Lkey_rotate192(%rip),%xmm5 > > + movdqa .Lkey_rcon1(%rip),%xmm4 > > + movl $8,%r10d > > + movdqu %xmm0,(%rdx) > > + jmp .Loop_key192 > > + > > +.align 16 > > +.Loop_key192: > > + movq %xmm2,0(%rax) > > + movdqa %xmm2,%xmm1 > > +.byte 102,15,56,0,213 > > +.byte 102,15,56,221,212 > > + pslld $1,%xmm4 
> > + leaq 24(%rax),%rax > > + > > + movdqa %xmm0,%xmm3 > > + pslldq $4,%xmm0 > > + pxor %xmm0,%xmm3 > > + pslldq $4,%xmm0 > > + pxor %xmm0,%xmm3 > > + pslldq $4,%xmm0 > > + pxor %xmm3,%xmm0 > > + > > + pshufd $0xff,%xmm0,%xmm3 > > + pxor %xmm1,%xmm3 > > + pslldq $4,%xmm1 > > + pxor %xmm1,%xmm3 > > + > > + pxor %xmm2,%xmm0 > > + pxor %xmm3,%xmm2 > > + movdqu %xmm0,-16(%rax) > > + > > + decl %r10d > > + jnz .Loop_key192 > > + > > + movl %esi,32(%rax) > > + xorl %eax,%eax > > + jmp .Lenc_key_ret > > + > > +.align 16 > > +.L14rounds: > > + movups 16(%rdi),%xmm2 > > + movl $13,%esi > > + leaq 16(%rax),%rax > > + cmpl $268435456,%r10d > > + je .L14rounds_alt > > + > > + movups %xmm0,(%rdx) > > + movups %xmm2,16(%rdx) > > +.byte 102,15,58,223,202,1 > > + call .Lkey_expansion_256a_cold > > +.byte 102,15,58,223,200,1 > > + call .Lkey_expansion_256b > > +.byte 102,15,58,223,202,2 > > + call .Lkey_expansion_256a > > +.byte 102,15,58,223,200,2 > > + call .Lkey_expansion_256b > > +.byte 102,15,58,223,202,4 > > + call .Lkey_expansion_256a > > +.byte 102,15,58,223,200,4 > > + call .Lkey_expansion_256b > > +.byte 102,15,58,223,202,8 > > + call .Lkey_expansion_256a > > +.byte 102,15,58,223,200,8 > > + call .Lkey_expansion_256b > > +.byte 102,15,58,223,202,16 > > + call .Lkey_expansion_256a > > +.byte 102,15,58,223,200,16 > > + call .Lkey_expansion_256b > > +.byte 102,15,58,223,202,32 > > + call .Lkey_expansion_256a > > +.byte 102,15,58,223,200,32 > > + call .Lkey_expansion_256b > > +.byte 102,15,58,223,202,64 > > + call .Lkey_expansion_256a > > + movups %xmm0,(%rax) > > + movl %esi,16(%rax) > > + xorq %rax,%rax > > + jmp .Lenc_key_ret > > + > > +.align 16 > > +.L14rounds_alt: > > + movdqa .Lkey_rotate(%rip),%xmm5 > > + movdqa .Lkey_rcon1(%rip),%xmm4 > > + movl $7,%r10d > > + movdqu %xmm0,0(%rdx) > > + movdqa %xmm2,%xmm1 > > + movdqu %xmm2,16(%rdx) > > + jmp .Loop_key256 > > + > > +.align 16 > > +.Loop_key256: > > +.byte 102,15,56,0,213 > > +.byte 102,15,56,221,212 > > + > > + movdqa %xmm0,%xmm3 > > + pslldq $4,%xmm0 > > + pxor %xmm0,%xmm3 > > + pslldq $4,%xmm0 > > + pxor %xmm0,%xmm3 > > + pslldq $4,%xmm0 > > + pxor %xmm3,%xmm0 > > + pslld $1,%xmm4 > > + > > + pxor %xmm2,%xmm0 > > + movdqu %xmm0,(%rax) > > + > > + decl %r10d > > + jz .Ldone_key256 > > + > > + pshufd $0xff,%xmm0,%xmm2 > > + pxor %xmm3,%xmm3 > > +.byte 102,15,56,221,211 > > + > > + movdqa %xmm1,%xmm3 > > + pslldq $4,%xmm1 > > + pxor %xmm1,%xmm3 > > + pslldq $4,%xmm1 > > + pxor %xmm1,%xmm3 > > + pslldq $4,%xmm1 > > + pxor %xmm3,%xmm1 > > + > > + pxor %xmm1,%xmm2 > > + movdqu %xmm2,16(%rax) > > + leaq 32(%rax),%rax > > + movdqa %xmm2,%xmm1 > > + > > + jmp .Loop_key256 > > + > > +.Ldone_key256: > > + movl %esi,16(%rax) > > + xorl %eax,%eax > > + jmp .Lenc_key_ret > > + > > +.align 16 > > +.Lbad_keybits: > > + movq $-2,%rax > > +.Lenc_key_ret: > > + pxor %xmm0,%xmm0 > > + pxor %xmm1,%xmm1 > > + pxor %xmm2,%xmm2 > > + pxor %xmm3,%xmm3 > > + pxor %xmm4,%xmm4 > > + pxor %xmm5,%xmm5 > > + addq $8,%rsp > > +.cfi_adjust_cfa_offset -8 > > + .byte 0xf3,0xc3 > > +.LSEH_end_set_encrypt_key: > > + > > +.align 16 > > +.Lkey_expansion_128: > > + movups %xmm0,(%rax) > > + leaq 16(%rax),%rax > > +.Lkey_expansion_128_cold: > > + shufps $16,%xmm0,%xmm4 > > + xorps %xmm4,%xmm0 > > + shufps $140,%xmm0,%xmm4 > > + xorps %xmm4,%xmm0 > > + shufps $255,%xmm1,%xmm1 > > + xorps %xmm1,%xmm0 > > + .byte 0xf3,0xc3 > > + > > +.align 16 > > +.Lkey_expansion_192a: > > + movups %xmm0,(%rax) > > + leaq 16(%rax),%rax > > +.Lkey_expansion_192a_cold: > > + movaps %xmm2,%xmm5 > > 
+.Lkey_expansion_192b_warm: > > + shufps $16,%xmm0,%xmm4 > > + movdqa %xmm2,%xmm3 > > + xorps %xmm4,%xmm0 > > + shufps $140,%xmm0,%xmm4 > > + pslldq $4,%xmm3 > > + xorps %xmm4,%xmm0 > > + pshufd $85,%xmm1,%xmm1 > > + pxor %xmm3,%xmm2 > > + pxor %xmm1,%xmm0 > > + pshufd $255,%xmm0,%xmm3 > > + pxor %xmm3,%xmm2 > > + .byte 0xf3,0xc3 > > + > > +.align 16 > > +.Lkey_expansion_192b: > > + movaps %xmm0,%xmm3 > > + shufps $68,%xmm0,%xmm5 > > + movups %xmm5,(%rax) > > + shufps $78,%xmm2,%xmm3 > > + movups %xmm3,16(%rax) > > + leaq 32(%rax),%rax > > + jmp .Lkey_expansion_192b_warm > > + > > +.align 16 > > +.Lkey_expansion_256a: > > + movups %xmm2,(%rax) > > + leaq 16(%rax),%rax > > +.Lkey_expansion_256a_cold: > > + shufps $16,%xmm0,%xmm4 > > + xorps %xmm4,%xmm0 > > + shufps $140,%xmm0,%xmm4 > > + xorps %xmm4,%xmm0 > > + shufps $255,%xmm1,%xmm1 > > + xorps %xmm1,%xmm0 > > + .byte 0xf3,0xc3 > > + > > +.align 16 > > +.Lkey_expansion_256b: > > + movups %xmm0,(%rax) > > + leaq 16(%rax),%rax > > + > > + shufps $16,%xmm2,%xmm4 > > + xorps %xmm4,%xmm2 > > + shufps $140,%xmm2,%xmm4 > > + xorps %xmm4,%xmm2 > > + shufps $170,%xmm1,%xmm1 > > + xorps %xmm1,%xmm2 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size aesni_set_encrypt_key,.-aesni_set_encrypt_key > > +.size __aesni_set_encrypt_key,.-__aesni_set_encrypt_key > > +.align 64 > > +.Lbswap_mask: > > +.byte 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0 > > +.Lincrement32: > > +.long 6,6,6,0 > > +.Lincrement64: > > +.long 1,0,0,0 > > +.Lxts_magic: > > +.long 0x87,0,1,0 > > +.Lincrement1: > > +.byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1 > > +.Lkey_rotate: > > +.long 0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d > > +.Lkey_rotate192: > > +.long 0x04070605,0x04070605,0x04070605,0x04070605 > > +.Lkey_rcon1: > > +.long 1,1,1,1 > > +.Lkey_rcon1b: > > +.long 0x1b,0x1b,0x1b,0x1b > > + > > +.byte > > > 65,69,83,32,102,111,114,32,73,110,116,101,108,32,65,69,83,45,78,73,44,32,6 > 7 > > ,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112 > ,1 > > 01,110,115,115,108,46,111,114,103,62,0 > > +.align 64 > > diff --git > a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/vpaes-x86_64.S > > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/vpaes-x86_64.S > > new file mode 100644 > > index 0000000000..982818f83b > > --- /dev/null > > +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/aes/vpaes-x86_64.S > > @@ -0,0 +1,863 @@ > > +# WARNING: do not edit! > > +# Generated from openssl/crypto/aes/asm/vpaes-x86_64.pl > > +# > > +# Copyright 2011-2020 The OpenSSL Project Authors. All Rights Reserved. > > +# > > +# Licensed under the OpenSSL license (the "License"). You may not use > > +# this file except in compliance with the License. 
You can obtain a copy > > +# in the file LICENSE in the source distribution or at > > +# https://www.openssl.org/source/license.html > > + > > +.text > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > +.type _vpaes_encrypt_core,@function > > +.align 16 > > +_vpaes_encrypt_core: > > +.cfi_startproc > > + movq %rdx,%r9 > > + movq $16,%r11 > > + movl 240(%rdx),%eax > > + movdqa %xmm9,%xmm1 > > + movdqa .Lk_ipt(%rip),%xmm2 > > + pandn %xmm0,%xmm1 > > + movdqu (%r9),%xmm5 > > + psrld $4,%xmm1 > > + pand %xmm9,%xmm0 > > +.byte 102,15,56,0,208 > > + movdqa .Lk_ipt+16(%rip),%xmm0 > > +.byte 102,15,56,0,193 > > + pxor %xmm5,%xmm2 > > + addq $16,%r9 > > + pxor %xmm2,%xmm0 > > + leaq .Lk_mc_backward(%rip),%r10 > > + jmp .Lenc_entry > > + > > +.align 16 > > +.Lenc_loop: > > + > > + movdqa %xmm13,%xmm4 > > + movdqa %xmm12,%xmm0 > > +.byte 102,15,56,0,226 > > +.byte 102,15,56,0,195 > > + pxor %xmm5,%xmm4 > > + movdqa %xmm15,%xmm5 > > + pxor %xmm4,%xmm0 > > + movdqa -64(%r11,%r10,1),%xmm1 > > +.byte 102,15,56,0,234 > > + movdqa (%r11,%r10,1),%xmm4 > > + movdqa %xmm14,%xmm2 > > +.byte 102,15,56,0,211 > > + movdqa %xmm0,%xmm3 > > + pxor %xmm5,%xmm2 > > +.byte 102,15,56,0,193 > > + addq $16,%r9 > > + pxor %xmm2,%xmm0 > > +.byte 102,15,56,0,220 > > + addq $16,%r11 > > + pxor %xmm0,%xmm3 > > +.byte 102,15,56,0,193 > > + andq $0x30,%r11 > > + subq $1,%rax > > + pxor %xmm3,%xmm0 > > + > > +.Lenc_entry: > > + > > + movdqa %xmm9,%xmm1 > > + movdqa %xmm11,%xmm5 > > + pandn %xmm0,%xmm1 > > + psrld $4,%xmm1 > > + pand %xmm9,%xmm0 > > +.byte 102,15,56,0,232 > > + movdqa %xmm10,%xmm3 > > + pxor %xmm1,%xmm0 > > +.byte 102,15,56,0,217 > > + movdqa %xmm10,%xmm4 > > + pxor %xmm5,%xmm3 > > +.byte 102,15,56,0,224 > > + movdqa %xmm10,%xmm2 > > + pxor %xmm5,%xmm4 > > +.byte 102,15,56,0,211 > > + movdqa %xmm10,%xmm3 > > + pxor %xmm0,%xmm2 > > +.byte 102,15,56,0,220 > > + movdqu (%r9),%xmm5 > > + pxor %xmm1,%xmm3 > > + jnz .Lenc_loop > > + > > + > > + movdqa -96(%r10),%xmm4 > > + movdqa -80(%r10),%xmm0 > > +.byte 102,15,56,0,226 > > + pxor %xmm5,%xmm4 > > +.byte 102,15,56,0,195 > > + movdqa 64(%r11,%r10,1),%xmm1 > > + pxor %xmm4,%xmm0 > > +.byte 102,15,56,0,193 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size _vpaes_encrypt_core,.-_vpaes_encrypt_core > > + > > + > > + > > + > > + > > + > > +.type _vpaes_decrypt_core,@function > > +.align 16 > > +_vpaes_decrypt_core: > > +.cfi_startproc > > + movq %rdx,%r9 > > + movl 240(%rdx),%eax > > + movdqa %xmm9,%xmm1 > > + movdqa .Lk_dipt(%rip),%xmm2 > > + pandn %xmm0,%xmm1 > > + movq %rax,%r11 > > + psrld $4,%xmm1 > > + movdqu (%r9),%xmm5 > > + shlq $4,%r11 > > + pand %xmm9,%xmm0 > > +.byte 102,15,56,0,208 > > + movdqa .Lk_dipt+16(%rip),%xmm0 > > + xorq $0x30,%r11 > > + leaq .Lk_dsbd(%rip),%r10 > > +.byte 102,15,56,0,193 > > + andq $0x30,%r11 > > + pxor %xmm5,%xmm2 > > + movdqa .Lk_mc_forward+48(%rip),%xmm5 > > + pxor %xmm2,%xmm0 > > + addq $16,%r9 > > + addq %r10,%r11 > > + jmp .Ldec_entry > > + > > +.align 16 > > +.Ldec_loop: > > + > > + > > + > > + movdqa -32(%r10),%xmm4 > > + movdqa -16(%r10),%xmm1 > > +.byte 102,15,56,0,226 > > +.byte 102,15,56,0,203 > > + pxor %xmm4,%xmm0 > > + movdqa 0(%r10),%xmm4 > > + pxor %xmm1,%xmm0 > > + movdqa 16(%r10),%xmm1 > > + > > +.byte 102,15,56,0,226 > > +.byte 102,15,56,0,197 > > +.byte 102,15,56,0,203 > > + pxor %xmm4,%xmm0 > > + movdqa 32(%r10),%xmm4 > > + pxor %xmm1,%xmm0 > > + movdqa 48(%r10),%xmm1 > > + > > +.byte 102,15,56,0,226 > > +.byte 102,15,56,0,197 > > +.byte 102,15,56,0,203 > > + pxor 
%xmm4,%xmm0 > > + movdqa 64(%r10),%xmm4 > > + pxor %xmm1,%xmm0 > > + movdqa 80(%r10),%xmm1 > > + > > +.byte 102,15,56,0,226 > > +.byte 102,15,56,0,197 > > +.byte 102,15,56,0,203 > > + pxor %xmm4,%xmm0 > > + addq $16,%r9 > > +.byte 102,15,58,15,237,12 > > + pxor %xmm1,%xmm0 > > + subq $1,%rax > > + > > +.Ldec_entry: > > + > > + movdqa %xmm9,%xmm1 > > + pandn %xmm0,%xmm1 > > + movdqa %xmm11,%xmm2 > > + psrld $4,%xmm1 > > + pand %xmm9,%xmm0 > > +.byte 102,15,56,0,208 > > + movdqa %xmm10,%xmm3 > > + pxor %xmm1,%xmm0 > > +.byte 102,15,56,0,217 > > + movdqa %xmm10,%xmm4 > > + pxor %xmm2,%xmm3 > > +.byte 102,15,56,0,224 > > + pxor %xmm2,%xmm4 > > + movdqa %xmm10,%xmm2 > > +.byte 102,15,56,0,211 > > + movdqa %xmm10,%xmm3 > > + pxor %xmm0,%xmm2 > > +.byte 102,15,56,0,220 > > + movdqu (%r9),%xmm0 > > + pxor %xmm1,%xmm3 > > + jnz .Ldec_loop > > + > > + > > + movdqa 96(%r10),%xmm4 > > +.byte 102,15,56,0,226 > > + pxor %xmm0,%xmm4 > > + movdqa 112(%r10),%xmm0 > > + movdqa -352(%r11),%xmm2 > > +.byte 102,15,56,0,195 > > + pxor %xmm4,%xmm0 > > +.byte 102,15,56,0,194 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size _vpaes_decrypt_core,.-_vpaes_decrypt_core > > + > > + > > + > > + > > + > > + > > +.type _vpaes_schedule_core,@function > > +.align 16 > > +_vpaes_schedule_core: > > +.cfi_startproc > > + > > + > > + > > + > > + > > + call _vpaes_preheat > > + movdqa .Lk_rcon(%rip),%xmm8 > > + movdqu (%rdi),%xmm0 > > + > > + > > + movdqa %xmm0,%xmm3 > > + leaq .Lk_ipt(%rip),%r11 > > + call _vpaes_schedule_transform > > + movdqa %xmm0,%xmm7 > > + > > + leaq .Lk_sr(%rip),%r10 > > + testq %rcx,%rcx > > + jnz .Lschedule_am_decrypting > > + > > + > > + movdqu %xmm0,(%rdx) > > + jmp .Lschedule_go > > + > > +.Lschedule_am_decrypting: > > + > > + movdqa (%r8,%r10,1),%xmm1 > > +.byte 102,15,56,0,217 > > + movdqu %xmm3,(%rdx) > > + xorq $0x30,%r8 > > + > > +.Lschedule_go: > > + cmpl $192,%esi > > + ja .Lschedule_256 > > + je .Lschedule_192 > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > +.Lschedule_128: > > + movl $10,%esi > > + > > +.Loop_schedule_128: > > + call _vpaes_schedule_round > > + decq %rsi > > + jz .Lschedule_mangle_last > > + call _vpaes_schedule_mangle > > + jmp .Loop_schedule_128 > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > +.align 16 > > +.Lschedule_192: > > + movdqu 8(%rdi),%xmm0 > > + call _vpaes_schedule_transform > > + movdqa %xmm0,%xmm6 > > + pxor %xmm4,%xmm4 > > + movhlps %xmm4,%xmm6 > > + movl $4,%esi > > + > > +.Loop_schedule_192: > > + call _vpaes_schedule_round > > +.byte 102,15,58,15,198,8 > > + call _vpaes_schedule_mangle > > + call _vpaes_schedule_192_smear > > + call _vpaes_schedule_mangle > > + call _vpaes_schedule_round > > + decq %rsi > > + jz .Lschedule_mangle_last > > + call _vpaes_schedule_mangle > > + call _vpaes_schedule_192_smear > > + jmp .Loop_schedule_192 > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > +.align 16 > > +.Lschedule_256: > > + movdqu 16(%rdi),%xmm0 > > + call _vpaes_schedule_transform > > + movl $7,%esi > > + > > +.Loop_schedule_256: > > + call _vpaes_schedule_mangle > > + movdqa %xmm0,%xmm6 > > + > > + > > + call _vpaes_schedule_round > > + decq %rsi > > + jz .Lschedule_mangle_last > > + call _vpaes_schedule_mangle > > + > > + > > + pshufd $0xFF,%xmm0,%xmm0 > > + movdqa %xmm7,%xmm5 > > + movdqa %xmm6,%xmm7 > > + call _vpaes_schedule_low_round > > + movdqa %xmm5,%xmm7 > > + > > + jmp .Loop_schedule_256 > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + 
> > +.align 16 > > +.Lschedule_mangle_last: > > + > > + leaq .Lk_deskew(%rip),%r11 > > + testq %rcx,%rcx > > + jnz .Lschedule_mangle_last_dec > > + > > + > > + movdqa (%r8,%r10,1),%xmm1 > > +.byte 102,15,56,0,193 > > + leaq .Lk_opt(%rip),%r11 > > + addq $32,%rdx > > + > > +.Lschedule_mangle_last_dec: > > + addq $-16,%rdx > > + pxor .Lk_s63(%rip),%xmm0 > > + call _vpaes_schedule_transform > > + movdqu %xmm0,(%rdx) > > + > > + > > + pxor %xmm0,%xmm0 > > + pxor %xmm1,%xmm1 > > + pxor %xmm2,%xmm2 > > + pxor %xmm3,%xmm3 > > + pxor %xmm4,%xmm4 > > + pxor %xmm5,%xmm5 > > + pxor %xmm6,%xmm6 > > + pxor %xmm7,%xmm7 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size _vpaes_schedule_core,.-_vpaes_schedule_core > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > +.type _vpaes_schedule_192_smear,@function > > +.align 16 > > +_vpaes_schedule_192_smear: > > +.cfi_startproc > > + pshufd $0x80,%xmm6,%xmm1 > > + pshufd $0xFE,%xmm7,%xmm0 > > + pxor %xmm1,%xmm6 > > + pxor %xmm1,%xmm1 > > + pxor %xmm0,%xmm6 > > + movdqa %xmm6,%xmm0 > > + movhlps %xmm1,%xmm6 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size _vpaes_schedule_192_smear,.-_vpaes_schedule_192_smear > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > +.type _vpaes_schedule_round,@function > > +.align 16 > > +_vpaes_schedule_round: > > +.cfi_startproc > > + > > + pxor %xmm1,%xmm1 > > +.byte 102,65,15,58,15,200,15 > > +.byte 102,69,15,58,15,192,15 > > + pxor %xmm1,%xmm7 > > + > > + > > + pshufd $0xFF,%xmm0,%xmm0 > > +.byte 102,15,58,15,192,1 > > + > > + > > + > > + > > +_vpaes_schedule_low_round: > > + > > + movdqa %xmm7,%xmm1 > > + pslldq $4,%xmm7 > > + pxor %xmm1,%xmm7 > > + movdqa %xmm7,%xmm1 > > + pslldq $8,%xmm7 > > + pxor %xmm1,%xmm7 > > + pxor .Lk_s63(%rip),%xmm7 > > + > > + > > + movdqa %xmm9,%xmm1 > > + pandn %xmm0,%xmm1 > > + psrld $4,%xmm1 > > + pand %xmm9,%xmm0 > > + movdqa %xmm11,%xmm2 > > +.byte 102,15,56,0,208 > > + pxor %xmm1,%xmm0 > > + movdqa %xmm10,%xmm3 > > +.byte 102,15,56,0,217 > > + pxor %xmm2,%xmm3 > > + movdqa %xmm10,%xmm4 > > +.byte 102,15,56,0,224 > > + pxor %xmm2,%xmm4 > > + movdqa %xmm10,%xmm2 > > +.byte 102,15,56,0,211 > > + pxor %xmm0,%xmm2 > > + movdqa %xmm10,%xmm3 > > +.byte 102,15,56,0,220 > > + pxor %xmm1,%xmm3 > > + movdqa %xmm13,%xmm4 > > +.byte 102,15,56,0,226 > > + movdqa %xmm12,%xmm0 > > +.byte 102,15,56,0,195 > > + pxor %xmm4,%xmm0 > > + > > + > > + pxor %xmm7,%xmm0 > > + movdqa %xmm0,%xmm7 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size _vpaes_schedule_round,.-_vpaes_schedule_round > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > +.type _vpaes_schedule_transform,@function > > +.align 16 > > +_vpaes_schedule_transform: > > +.cfi_startproc > > + movdqa %xmm9,%xmm1 > > + pandn %xmm0,%xmm1 > > + psrld $4,%xmm1 > > + pand %xmm9,%xmm0 > > + movdqa (%r11),%xmm2 > > +.byte 102,15,56,0,208 > > + movdqa 16(%r11),%xmm0 > > +.byte 102,15,56,0,193 > > + pxor %xmm2,%xmm0 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size _vpaes_schedule_transform,.-_vpaes_schedule_transform > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > +.type _vpaes_schedule_mangle,@function > > +.align 16 > > +_vpaes_schedule_mangle: > > +.cfi_startproc > > + movdqa %xmm0,%xmm4 > > + movdqa .Lk_mc_forward(%rip),%xmm5 > > + testq %rcx,%rcx > > + jnz .Lschedule_mangle_dec > > + > > + > > + addq $16,%rdx > > + pxor .Lk_s63(%rip),%xmm4 > 
> +.byte 102,15,56,0,229 > > + movdqa %xmm4,%xmm3 > > +.byte 102,15,56,0,229 > > + pxor %xmm4,%xmm3 > > +.byte 102,15,56,0,229 > > + pxor %xmm4,%xmm3 > > + > > + jmp .Lschedule_mangle_both > > +.align 16 > > +.Lschedule_mangle_dec: > > + > > + leaq .Lk_dksd(%rip),%r11 > > + movdqa %xmm9,%xmm1 > > + pandn %xmm4,%xmm1 > > + psrld $4,%xmm1 > > + pand %xmm9,%xmm4 > > + > > + movdqa 0(%r11),%xmm2 > > +.byte 102,15,56,0,212 > > + movdqa 16(%r11),%xmm3 > > +.byte 102,15,56,0,217 > > + pxor %xmm2,%xmm3 > > +.byte 102,15,56,0,221 > > + > > + movdqa 32(%r11),%xmm2 > > +.byte 102,15,56,0,212 > > + pxor %xmm3,%xmm2 > > + movdqa 48(%r11),%xmm3 > > +.byte 102,15,56,0,217 > > + pxor %xmm2,%xmm3 > > +.byte 102,15,56,0,221 > > + > > + movdqa 64(%r11),%xmm2 > > +.byte 102,15,56,0,212 > > + pxor %xmm3,%xmm2 > > + movdqa 80(%r11),%xmm3 > > +.byte 102,15,56,0,217 > > + pxor %xmm2,%xmm3 > > +.byte 102,15,56,0,221 > > + > > + movdqa 96(%r11),%xmm2 > > +.byte 102,15,56,0,212 > > + pxor %xmm3,%xmm2 > > + movdqa 112(%r11),%xmm3 > > +.byte 102,15,56,0,217 > > + pxor %xmm2,%xmm3 > > + > > + addq $-16,%rdx > > + > > +.Lschedule_mangle_both: > > + movdqa (%r8,%r10,1),%xmm1 > > +.byte 102,15,56,0,217 > > + addq $-16,%r8 > > + andq $0x30,%r8 > > + movdqu %xmm3,(%rdx) > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size _vpaes_schedule_mangle,.-_vpaes_schedule_mangle > > + > > + > > + > > + > > +.globl vpaes_set_encrypt_key > > +.type vpaes_set_encrypt_key,@function > > +.align 16 > > +vpaes_set_encrypt_key: > > +.cfi_startproc > > + movl %esi,%eax > > + shrl $5,%eax > > + addl $5,%eax > > + movl %eax,240(%rdx) > > + > > + movl $0,%ecx > > + movl $0x30,%r8d > > + call _vpaes_schedule_core > > + xorl %eax,%eax > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size vpaes_set_encrypt_key,.-vpaes_set_encrypt_key > > + > > +.globl vpaes_set_decrypt_key > > +.type vpaes_set_decrypt_key,@function > > +.align 16 > > +vpaes_set_decrypt_key: > > +.cfi_startproc > > + movl %esi,%eax > > + shrl $5,%eax > > + addl $5,%eax > > + movl %eax,240(%rdx) > > + shll $4,%eax > > + leaq 16(%rdx,%rax,1),%rdx > > + > > + movl $1,%ecx > > + movl %esi,%r8d > > + shrl $1,%r8d > > + andl $32,%r8d > > + xorl $32,%r8d > > + call _vpaes_schedule_core > > + xorl %eax,%eax > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size vpaes_set_decrypt_key,.-vpaes_set_decrypt_key > > + > > +.globl vpaes_encrypt > > +.type vpaes_encrypt,@function > > +.align 16 > > +vpaes_encrypt: > > +.cfi_startproc > > + movdqu (%rdi),%xmm0 > > + call _vpaes_preheat > > + call _vpaes_encrypt_core > > + movdqu %xmm0,(%rsi) > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size vpaes_encrypt,.-vpaes_encrypt > > + > > +.globl vpaes_decrypt > > +.type vpaes_decrypt,@function > > +.align 16 > > +vpaes_decrypt: > > +.cfi_startproc > > + movdqu (%rdi),%xmm0 > > + call _vpaes_preheat > > + call _vpaes_decrypt_core > > + movdqu %xmm0,(%rsi) > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size vpaes_decrypt,.-vpaes_decrypt > > +.globl vpaes_cbc_encrypt > > +.type vpaes_cbc_encrypt,@function > > +.align 16 > > +vpaes_cbc_encrypt: > > +.cfi_startproc > > + xchgq %rcx,%rdx > > + subq $16,%rcx > > + jc .Lcbc_abort > > + movdqu (%r8),%xmm6 > > + subq %rdi,%rsi > > + call _vpaes_preheat > > + cmpl $0,%r9d > > + je .Lcbc_dec_loop > > + jmp .Lcbc_enc_loop > > +.align 16 > > +.Lcbc_enc_loop: > > + movdqu (%rdi),%xmm0 > > + pxor %xmm6,%xmm0 > > + call _vpaes_encrypt_core > > + movdqa %xmm0,%xmm6 > > + movdqu %xmm0,(%rsi,%rdi,1) > > + leaq 16(%rdi),%rdi > > + subq $16,%rcx > > + jnc .Lcbc_enc_loop > > + jmp 
.Lcbc_done > > +.align 16 > > +.Lcbc_dec_loop: > > + movdqu (%rdi),%xmm0 > > + movdqa %xmm0,%xmm7 > > + call _vpaes_decrypt_core > > + pxor %xmm6,%xmm0 > > + movdqa %xmm7,%xmm6 > > + movdqu %xmm0,(%rsi,%rdi,1) > > + leaq 16(%rdi),%rdi > > + subq $16,%rcx > > + jnc .Lcbc_dec_loop > > +.Lcbc_done: > > + movdqu %xmm6,(%r8) > > +.Lcbc_abort: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size vpaes_cbc_encrypt,.-vpaes_cbc_encrypt > > + > > + > > + > > + > > + > > + > > +.type _vpaes_preheat,@function > > +.align 16 > > +_vpaes_preheat: > > +.cfi_startproc > > + leaq .Lk_s0F(%rip),%r10 > > + movdqa -32(%r10),%xmm10 > > + movdqa -16(%r10),%xmm11 > > + movdqa 0(%r10),%xmm9 > > + movdqa 48(%r10),%xmm13 > > + movdqa 64(%r10),%xmm12 > > + movdqa 80(%r10),%xmm15 > > + movdqa 96(%r10),%xmm14 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size _vpaes_preheat,.-_vpaes_preheat > > + > > + > > + > > + > > + > > +.type _vpaes_consts,@object > > +.align 64 > > +_vpaes_consts: > > +.Lk_inv: > > +.quad 0x0E05060F0D080180, 0x040703090A0B0C02 > > +.quad 0x01040A060F0B0780, 0x030D0E0C02050809 > > + > > +.Lk_s0F: > > +.quad 0x0F0F0F0F0F0F0F0F, 0x0F0F0F0F0F0F0F0F > > + > > +.Lk_ipt: > > +.quad 0xC2B2E8985A2A7000, 0xCABAE09052227808 > > +.quad 0x4C01307D317C4D00, 0xCD80B1FCB0FDCC81 > > + > > +.Lk_sb1: > > +.quad 0xB19BE18FCB503E00, 0xA5DF7A6E142AF544 > > +.quad 0x3618D415FAE22300, 0x3BF7CCC10D2ED9EF > > +.Lk_sb2: > > +.quad 0xE27A93C60B712400, 0x5EB7E955BC982FCD > > +.quad 0x69EB88400AE12900, 0xC2A163C8AB82234A > > +.Lk_sbo: > > +.quad 0xD0D26D176FBDC700, 0x15AABF7AC502A878 > > +.quad 0xCFE474A55FBB6A00, 0x8E1E90D1412B35FA > > + > > +.Lk_mc_forward: > > +.quad 0x0407060500030201, 0x0C0F0E0D080B0A09 > > +.quad 0x080B0A0904070605, 0x000302010C0F0E0D > > +.quad 0x0C0F0E0D080B0A09, 0x0407060500030201 > > +.quad 0x000302010C0F0E0D, 0x080B0A0904070605 > > + > > +.Lk_mc_backward: > > +.quad 0x0605040702010003, 0x0E0D0C0F0A09080B > > +.quad 0x020100030E0D0C0F, 0x0A09080B06050407 > > +.quad 0x0E0D0C0F0A09080B, 0x0605040702010003 > > +.quad 0x0A09080B06050407, 0x020100030E0D0C0F > > + > > +.Lk_sr: > > +.quad 0x0706050403020100, 0x0F0E0D0C0B0A0908 > > +.quad 0x030E09040F0A0500, 0x0B06010C07020D08 > > +.quad 0x0F060D040B020900, 0x070E050C030A0108 > > +.quad 0x0B0E0104070A0D00, 0x0306090C0F020508 > > + > > +.Lk_rcon: > > +.quad 0x1F8391B9AF9DEEB6, 0x702A98084D7C7D81 > > + > > +.Lk_s63: > > +.quad 0x5B5B5B5B5B5B5B5B, 0x5B5B5B5B5B5B5B5B > > + > > +.Lk_opt: > > +.quad 0xFF9F4929D6B66000, 0xF7974121DEBE6808 > > +.quad 0x01EDBD5150BCEC00, 0xE10D5DB1B05C0CE0 > > + > > +.Lk_deskew: > > +.quad 0x07E4A34047A4E300, 0x1DFEB95A5DBEF91A > > +.quad 0x5F36B5DC83EA6900, 0x2841C2ABF49D1E77 > > + > > + > > + > > + > > + > > +.Lk_dksd: > > +.quad 0xFEB91A5DA3E44700, 0x0740E3A45A1DBEF9 > > +.quad 0x41C277F4B5368300, 0x5FDC69EAAB289D1E > > +.Lk_dksb: > > +.quad 0x9A4FCA1F8550D500, 0x03D653861CC94C99 > > +.quad 0x115BEDA7B6FC4A00, 0xD993256F7E3482C8 > > +.Lk_dkse: > > +.quad 0xD5031CCA1FC9D600, 0x53859A4C994F5086 > > +.quad 0xA23196054FDC7BE8, 0xCD5EF96A20B31487 > > +.Lk_dks9: > > +.quad 0xB6116FC87ED9A700, 0x4AED933482255BFC > > +.quad 0x4576516227143300, 0x8BB89FACE9DAFDCE > > + > > + > > + > > + > > + > > +.Lk_dipt: > > +.quad 0x0F505B040B545F00, 0x154A411E114E451A > > +.quad 0x86E383E660056500, 0x12771772F491F194 > > + > > +.Lk_dsb9: > > +.quad 0x851C03539A86D600, 0xCAD51F504F994CC9 > > +.quad 0xC03B1789ECD74900, 0x725E2C9EB2FBA565 > > +.Lk_dsbd: > > +.quad 0x7D57CCDFE6B1A200, 0xF56E9B13882A4439 > > +.quad 0x3CE2FAF724C6CB00, 0x2931180D15DEEFD3 > 
> +.Lk_dsbb: > > +.quad 0xD022649296B44200, 0x602646F6B0F2D404 > > +.quad 0xC19498A6CD596700, 0xF3FF0C3E3255AA6B > > +.Lk_dsbe: > > +.quad 0x46F2929626D4D000, 0x2242600464B4F6B0 > > +.quad 0x0C55A6CDFFAAC100, 0x9467F36B98593E32 > > +.Lk_dsbo: > > +.quad 0x1387EA537EF94000, 0xC7AA6DB9D4943E2D > > +.quad 0x12D7560F93441D00, 0xCA4B8159D8C58E9C > > +.byte > > > 86,101,99,116,111,114,32,80,101,114,109,117,116,97,116,105,111,110,32,65, > 6 > > > 9,83,32,102,111,114,32,120,56,54,95,54,52,47,83,83,83,69,51,44,32,77,105,10 > > > 7,101,32,72,97,109,98,117,114,103,32,40,83,116,97,110,102,111,114,100,32,8 > > 5,110,105,118,101,114,115,105,116,121,41,0 > > +.align 64 > > +.size _vpaes_consts,.-_vpaes_consts > > diff --git > a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/aesni-gcm- > > x86_64.S > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/aesni-gcm- > > x86_64.S > > new file mode 100644 > > index 0000000000..1201f3427a > > --- /dev/null > > +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/aesni-gcm- > > x86_64.S > > @@ -0,0 +1,29 @@ > > +# WARNING: do not edit! > > +# Generated from openssl/crypto/modes/asm/aesni-gcm-x86_64.pl > > +# > > +# Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. > > +# > > +# Licensed under the OpenSSL license (the "License"). You may not use > > +# this file except in compliance with the License. You can obtain a copy > > +# in the file LICENSE in the source distribution or at > > +# https://www.openssl.org/source/license.html > > + > > +.text > > + > > +.globl aesni_gcm_encrypt > > +.type aesni_gcm_encrypt,@function > > +aesni_gcm_encrypt: > > +.cfi_startproc > > + xorl %eax,%eax > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size aesni_gcm_encrypt,.-aesni_gcm_encrypt > > + > > +.globl aesni_gcm_decrypt > > +.type aesni_gcm_decrypt,@function > > +aesni_gcm_decrypt: > > +.cfi_startproc > > + xorl %eax,%eax > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size aesni_gcm_decrypt,.-aesni_gcm_decrypt > > diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/ghash- > > x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/ghash- > > x86_64.S > > new file mode 100644 > > index 0000000000..3fcaa4b2ef > > --- /dev/null > > +++ > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/modes/ghash-x86_64.S > > @@ -0,0 +1,1386 @@ > > +# WARNING: do not edit! > > +# Generated from openssl/crypto/modes/asm/ghash-x86_64.pl > > +# > > +# Copyright 2010-2020 The OpenSSL Project Authors. All Rights Reserved. > > +# > > +# Licensed under the OpenSSL license (the "License"). You may not use > > +# this file except in compliance with the License. 
You can obtain a copy > > +# in the file LICENSE in the source distribution or at > > +# https://www.openssl.org/source/license.html > > + > > +.text > > + > > + > > +.globl gcm_gmult_4bit > > +.type gcm_gmult_4bit,@function > > +.align 16 > > +gcm_gmult_4bit: > > +.cfi_startproc > > + pushq %rbx > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %rbx,-16 > > + pushq %rbp > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %rbp,-24 > > + pushq %r12 > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %r12,-32 > > + pushq %r13 > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %r13,-40 > > + pushq %r14 > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %r14,-48 > > + pushq %r15 > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %r15,-56 > > + subq $280,%rsp > > +.cfi_adjust_cfa_offset 280 > > +.Lgmult_prologue: > > + > > + movzbq 15(%rdi),%r8 > > + leaq .Lrem_4bit(%rip),%r11 > > + xorq %rax,%rax > > + xorq %rbx,%rbx > > + movb %r8b,%al > > + movb %r8b,%bl > > + shlb $4,%al > > + movq $14,%rcx > > + movq 8(%rsi,%rax,1),%r8 > > + movq (%rsi,%rax,1),%r9 > > + andb $0xf0,%bl > > + movq %r8,%rdx > > + jmp .Loop1 > > + > > +.align 16 > > +.Loop1: > > + shrq $4,%r8 > > + andq $0xf,%rdx > > + movq %r9,%r10 > > + movb (%rdi,%rcx,1),%al > > + shrq $4,%r9 > > + xorq 8(%rsi,%rbx,1),%r8 > > + shlq $60,%r10 > > + xorq (%rsi,%rbx,1),%r9 > > + movb %al,%bl > > + xorq (%r11,%rdx,8),%r9 > > + movq %r8,%rdx > > + shlb $4,%al > > + xorq %r10,%r8 > > + decq %rcx > > + js .Lbreak1 > > + > > + shrq $4,%r8 > > + andq $0xf,%rdx > > + movq %r9,%r10 > > + shrq $4,%r9 > > + xorq 8(%rsi,%rax,1),%r8 > > + shlq $60,%r10 > > + xorq (%rsi,%rax,1),%r9 > > + andb $0xf0,%bl > > + xorq (%r11,%rdx,8),%r9 > > + movq %r8,%rdx > > + xorq %r10,%r8 > > + jmp .Loop1 > > + > > +.align 16 > > +.Lbreak1: > > + shrq $4,%r8 > > + andq $0xf,%rdx > > + movq %r9,%r10 > > + shrq $4,%r9 > > + xorq 8(%rsi,%rax,1),%r8 > > + shlq $60,%r10 > > + xorq (%rsi,%rax,1),%r9 > > + andb $0xf0,%bl > > + xorq (%r11,%rdx,8),%r9 > > + movq %r8,%rdx > > + xorq %r10,%r8 > > + > > + shrq $4,%r8 > > + andq $0xf,%rdx > > + movq %r9,%r10 > > + shrq $4,%r9 > > + xorq 8(%rsi,%rbx,1),%r8 > > + shlq $60,%r10 > > + xorq (%rsi,%rbx,1),%r9 > > + xorq %r10,%r8 > > + xorq (%r11,%rdx,8),%r9 > > + > > + bswapq %r8 > > + bswapq %r9 > > + movq %r8,8(%rdi) > > + movq %r9,(%rdi) > > + > > + leaq 280+48(%rsp),%rsi > > +.cfi_def_cfa %rsi,8 > > + movq -8(%rsi),%rbx > > +.cfi_restore %rbx > > + leaq (%rsi),%rsp > > +.cfi_def_cfa_register %rsp > > +.Lgmult_epilogue: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size gcm_gmult_4bit,.-gcm_gmult_4bit > > +.globl gcm_ghash_4bit > > +.type gcm_ghash_4bit,@function > > +.align 16 > > +gcm_ghash_4bit: > > +.cfi_startproc > > + pushq %rbx > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %rbx,-16 > > + pushq %rbp > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %rbp,-24 > > + pushq %r12 > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %r12,-32 > > + pushq %r13 > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %r13,-40 > > + pushq %r14 > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %r14,-48 > > + pushq %r15 > > +.cfi_adjust_cfa_offset 8 > > +.cfi_offset %r15,-56 > > + subq $280,%rsp > > +.cfi_adjust_cfa_offset 280 > > +.Lghash_prologue: > > + movq %rdx,%r14 > > + movq %rcx,%r15 > > + subq $-128,%rsi > > + leaq 16+128(%rsp),%rbp > > + xorl %edx,%edx > > + movq 0+0-128(%rsi),%r8 > > + movq 0+8-128(%rsi),%rax > > + movb %al,%dl > > + shrq $4,%rax > > + movq %r8,%r10 > > + shrq $4,%r8 > > + movq 16+0-128(%rsi),%r9 > > + shlb $4,%dl > > + movq 16+8-128(%rsi),%rbx > > + shlq $60,%r10 > 
> + movb %dl,0(%rsp) > > + orq %r10,%rax > > + movb %bl,%dl > > + shrq $4,%rbx > > + movq %r9,%r10 > > + shrq $4,%r9 > > + movq %r8,0(%rbp) > > + movq 32+0-128(%rsi),%r8 > > + shlb $4,%dl > > + movq %rax,0-128(%rbp) > > + movq 32+8-128(%rsi),%rax > > + shlq $60,%r10 > > + movb %dl,1(%rsp) > > + orq %r10,%rbx > > + movb %al,%dl > > + shrq $4,%rax > > + movq %r8,%r10 > > + shrq $4,%r8 > > + movq %r9,8(%rbp) > > + movq 48+0-128(%rsi),%r9 > > + shlb $4,%dl > > + movq %rbx,8-128(%rbp) > > + movq 48+8-128(%rsi),%rbx > > + shlq $60,%r10 > > + movb %dl,2(%rsp) > > + orq %r10,%rax > > + movb %bl,%dl > > + shrq $4,%rbx > > + movq %r9,%r10 > > + shrq $4,%r9 > > + movq %r8,16(%rbp) > > + movq 64+0-128(%rsi),%r8 > > + shlb $4,%dl > > + movq %rax,16-128(%rbp) > > + movq 64+8-128(%rsi),%rax > > + shlq $60,%r10 > > + movb %dl,3(%rsp) > > + orq %r10,%rbx > > + movb %al,%dl > > + shrq $4,%rax > > + movq %r8,%r10 > > + shrq $4,%r8 > > + movq %r9,24(%rbp) > > + movq 80+0-128(%rsi),%r9 > > + shlb $4,%dl > > + movq %rbx,24-128(%rbp) > > + movq 80+8-128(%rsi),%rbx > > + shlq $60,%r10 > > + movb %dl,4(%rsp) > > + orq %r10,%rax > > + movb %bl,%dl > > + shrq $4,%rbx > > + movq %r9,%r10 > > + shrq $4,%r9 > > + movq %r8,32(%rbp) > > + movq 96+0-128(%rsi),%r8 > > + shlb $4,%dl > > + movq %rax,32-128(%rbp) > > + movq 96+8-128(%rsi),%rax > > + shlq $60,%r10 > > + movb %dl,5(%rsp) > > + orq %r10,%rbx > > + movb %al,%dl > > + shrq $4,%rax > > + movq %r8,%r10 > > + shrq $4,%r8 > > + movq %r9,40(%rbp) > > + movq 112+0-128(%rsi),%r9 > > + shlb $4,%dl > > + movq %rbx,40-128(%rbp) > > + movq 112+8-128(%rsi),%rbx > > + shlq $60,%r10 > > + movb %dl,6(%rsp) > > + orq %r10,%rax > > + movb %bl,%dl > > + shrq $4,%rbx > > + movq %r9,%r10 > > + shrq $4,%r9 > > + movq %r8,48(%rbp) > > + movq 128+0-128(%rsi),%r8 > > + shlb $4,%dl > > + movq %rax,48-128(%rbp) > > + movq 128+8-128(%rsi),%rax > > + shlq $60,%r10 > > + movb %dl,7(%rsp) > > + orq %r10,%rbx > > + movb %al,%dl > > + shrq $4,%rax > > + movq %r8,%r10 > > + shrq $4,%r8 > > + movq %r9,56(%rbp) > > + movq 144+0-128(%rsi),%r9 > > + shlb $4,%dl > > + movq %rbx,56-128(%rbp) > > + movq 144+8-128(%rsi),%rbx > > + shlq $60,%r10 > > + movb %dl,8(%rsp) > > + orq %r10,%rax > > + movb %bl,%dl > > + shrq $4,%rbx > > + movq %r9,%r10 > > + shrq $4,%r9 > > + movq %r8,64(%rbp) > > + movq 160+0-128(%rsi),%r8 > > + shlb $4,%dl > > + movq %rax,64-128(%rbp) > > + movq 160+8-128(%rsi),%rax > > + shlq $60,%r10 > > + movb %dl,9(%rsp) > > + orq %r10,%rbx > > + movb %al,%dl > > + shrq $4,%rax > > + movq %r8,%r10 > > + shrq $4,%r8 > > + movq %r9,72(%rbp) > > + movq 176+0-128(%rsi),%r9 > > + shlb $4,%dl > > + movq %rbx,72-128(%rbp) > > + movq 176+8-128(%rsi),%rbx > > + shlq $60,%r10 > > + movb %dl,10(%rsp) > > + orq %r10,%rax > > + movb %bl,%dl > > + shrq $4,%rbx > > + movq %r9,%r10 > > + shrq $4,%r9 > > + movq %r8,80(%rbp) > > + movq 192+0-128(%rsi),%r8 > > + shlb $4,%dl > > + movq %rax,80-128(%rbp) > > + movq 192+8-128(%rsi),%rax > > + shlq $60,%r10 > > + movb %dl,11(%rsp) > > + orq %r10,%rbx > > + movb %al,%dl > > + shrq $4,%rax > > + movq %r8,%r10 > > + shrq $4,%r8 > > + movq %r9,88(%rbp) > > + movq 208+0-128(%rsi),%r9 > > + shlb $4,%dl > > + movq %rbx,88-128(%rbp) > > + movq 208+8-128(%rsi),%rbx > > + shlq $60,%r10 > > + movb %dl,12(%rsp) > > + orq %r10,%rax > > + movb %bl,%dl > > + shrq $4,%rbx > > + movq %r9,%r10 > > + shrq $4,%r9 > > + movq %r8,96(%rbp) > > + movq 224+0-128(%rsi),%r8 > > + shlb $4,%dl > > + movq %rax,96-128(%rbp) > > + movq 224+8-128(%rsi),%rax > > + shlq $60,%r10 > > + movb 
%dl,13(%rsp) > > + orq %r10,%rbx > > + movb %al,%dl > > + shrq $4,%rax > > + movq %r8,%r10 > > + shrq $4,%r8 > > + movq %r9,104(%rbp) > > + movq 240+0-128(%rsi),%r9 > > + shlb $4,%dl > > + movq %rbx,104-128(%rbp) > > + movq 240+8-128(%rsi),%rbx > > + shlq $60,%r10 > > + movb %dl,14(%rsp) > > + orq %r10,%rax > > + movb %bl,%dl > > + shrq $4,%rbx > > + movq %r9,%r10 > > + shrq $4,%r9 > > + movq %r8,112(%rbp) > > + shlb $4,%dl > > + movq %rax,112-128(%rbp) > > + shlq $60,%r10 > > + movb %dl,15(%rsp) > > + orq %r10,%rbx > > + movq %r9,120(%rbp) > > + movq %rbx,120-128(%rbp) > > + addq $-128,%rsi > > + movq 8(%rdi),%r8 > > + movq 0(%rdi),%r9 > > + addq %r14,%r15 > > + leaq .Lrem_8bit(%rip),%r11 > > + jmp .Louter_loop > > +.align 16 > > +.Louter_loop: > > + xorq (%r14),%r9 > > + movq 8(%r14),%rdx > > + leaq 16(%r14),%r14 > > + xorq %r8,%rdx > > + movq %r9,(%rdi) > > + movq %rdx,8(%rdi) > > + shrq $32,%rdx > > + xorq %rax,%rax > > + roll $8,%edx > > + movb %dl,%al > > + movzbl %dl,%ebx > > + shlb $4,%al > > + shrl $4,%ebx > > + roll $8,%edx > > + movq 8(%rsi,%rax,1),%r8 > > + movq (%rsi,%rax,1),%r9 > > + movb %dl,%al > > + movzbl %dl,%ecx > > + shlb $4,%al > > + movzbq (%rsp,%rbx,1),%r12 > > + shrl $4,%ecx > > + xorq %r8,%r12 > > + movq %r9,%r10 > > + shrq $8,%r8 > > + movzbq %r12b,%r12 > > + shrq $8,%r9 > > + xorq -128(%rbp,%rbx,8),%r8 > > + shlq $56,%r10 > > + xorq (%rbp,%rbx,8),%r9 > > + roll $8,%edx > > + xorq 8(%rsi,%rax,1),%r8 > > + xorq (%rsi,%rax,1),%r9 > > + movb %dl,%al > > + xorq %r10,%r8 > > + movzwq (%r11,%r12,2),%r12 > > + movzbl %dl,%ebx > > + shlb $4,%al > > + movzbq (%rsp,%rcx,1),%r13 > > + shrl $4,%ebx > > + shlq $48,%r12 > > + xorq %r8,%r13 > > + movq %r9,%r10 > > + xorq %r12,%r9 > > + shrq $8,%r8 > > + movzbq %r13b,%r13 > > + shrq $8,%r9 > > + xorq -128(%rbp,%rcx,8),%r8 > > + shlq $56,%r10 > > + xorq (%rbp,%rcx,8),%r9 > > + roll $8,%edx > > + xorq 8(%rsi,%rax,1),%r8 > > + xorq (%rsi,%rax,1),%r9 > > + movb %dl,%al > > + xorq %r10,%r8 > > + movzwq (%r11,%r13,2),%r13 > > + movzbl %dl,%ecx > > + shlb $4,%al > > + movzbq (%rsp,%rbx,1),%r12 > > + shrl $4,%ecx > > + shlq $48,%r13 > > + xorq %r8,%r12 > > + movq %r9,%r10 > > + xorq %r13,%r9 > > + shrq $8,%r8 > > + movzbq %r12b,%r12 > > + movl 8(%rdi),%edx > > + shrq $8,%r9 > > + xorq -128(%rbp,%rbx,8),%r8 > > + shlq $56,%r10 > > + xorq (%rbp,%rbx,8),%r9 > > + roll $8,%edx > > + xorq 8(%rsi,%rax,1),%r8 > > + xorq (%rsi,%rax,1),%r9 > > + movb %dl,%al > > + xorq %r10,%r8 > > + movzwq (%r11,%r12,2),%r12 > > + movzbl %dl,%ebx > > + shlb $4,%al > > + movzbq (%rsp,%rcx,1),%r13 > > + shrl $4,%ebx > > + shlq $48,%r12 > > + xorq %r8,%r13 > > + movq %r9,%r10 > > + xorq %r12,%r9 > > + shrq $8,%r8 > > + movzbq %r13b,%r13 > > + shrq $8,%r9 > > + xorq -128(%rbp,%rcx,8),%r8 > > + shlq $56,%r10 > > + xorq (%rbp,%rcx,8),%r9 > > + roll $8,%edx > > + xorq 8(%rsi,%rax,1),%r8 > > + xorq (%rsi,%rax,1),%r9 > > + movb %dl,%al > > + xorq %r10,%r8 > > + movzwq (%r11,%r13,2),%r13 > > + movzbl %dl,%ecx > > + shlb $4,%al > > + movzbq (%rsp,%rbx,1),%r12 > > + shrl $4,%ecx > > + shlq $48,%r13 > > + xorq %r8,%r12 > > + movq %r9,%r10 > > + xorq %r13,%r9 > > + shrq $8,%r8 > > + movzbq %r12b,%r12 > > + shrq $8,%r9 > > + xorq -128(%rbp,%rbx,8),%r8 > > + shlq $56,%r10 > > + xorq (%rbp,%rbx,8),%r9 > > + roll $8,%edx > > + xorq 8(%rsi,%rax,1),%r8 > > + xorq (%rsi,%rax,1),%r9 > > + movb %dl,%al > > + xorq %r10,%r8 > > + movzwq (%r11,%r12,2),%r12 > > + movzbl %dl,%ebx > > + shlb $4,%al > > + movzbq (%rsp,%rcx,1),%r13 > > + shrl $4,%ebx > > + shlq $48,%r12 > > + xorq %r8,%r13 > 
> + movq %r9,%r10 > > + xorq %r12,%r9 > > + shrq $8,%r8 > > + movzbq %r13b,%r13 > > + shrq $8,%r9 > > + xorq -128(%rbp,%rcx,8),%r8 > > + shlq $56,%r10 > > + xorq (%rbp,%rcx,8),%r9 > > + roll $8,%edx > > + xorq 8(%rsi,%rax,1),%r8 > > + xorq (%rsi,%rax,1),%r9 > > + movb %dl,%al > > + xorq %r10,%r8 > > + movzwq (%r11,%r13,2),%r13 > > + movzbl %dl,%ecx > > + shlb $4,%al > > + movzbq (%rsp,%rbx,1),%r12 > > + shrl $4,%ecx > > + shlq $48,%r13 > > + xorq %r8,%r12 > > + movq %r9,%r10 > > + xorq %r13,%r9 > > + shrq $8,%r8 > > + movzbq %r12b,%r12 > > + movl 4(%rdi),%edx > > + shrq $8,%r9 > > + xorq -128(%rbp,%rbx,8),%r8 > > + shlq $56,%r10 > > + xorq (%rbp,%rbx,8),%r9 > > + roll $8,%edx > > + xorq 8(%rsi,%rax,1),%r8 > > + xorq (%rsi,%rax,1),%r9 > > + movb %dl,%al > > + xorq %r10,%r8 > > + movzwq (%r11,%r12,2),%r12 > > + movzbl %dl,%ebx > > + shlb $4,%al > > + movzbq (%rsp,%rcx,1),%r13 > > + shrl $4,%ebx > > + shlq $48,%r12 > > + xorq %r8,%r13 > > + movq %r9,%r10 > > + xorq %r12,%r9 > > + shrq $8,%r8 > > + movzbq %r13b,%r13 > > + shrq $8,%r9 > > + xorq -128(%rbp,%rcx,8),%r8 > > + shlq $56,%r10 > > + xorq (%rbp,%rcx,8),%r9 > > + roll $8,%edx > > + xorq 8(%rsi,%rax,1),%r8 > > + xorq (%rsi,%rax,1),%r9 > > + movb %dl,%al > > + xorq %r10,%r8 > > + movzwq (%r11,%r13,2),%r13 > > + movzbl %dl,%ecx > > + shlb $4,%al > > + movzbq (%rsp,%rbx,1),%r12 > > + shrl $4,%ecx > > + shlq $48,%r13 > > + xorq %r8,%r12 > > + movq %r9,%r10 > > + xorq %r13,%r9 > > + shrq $8,%r8 > > + movzbq %r12b,%r12 > > + shrq $8,%r9 > > + xorq -128(%rbp,%rbx,8),%r8 > > + shlq $56,%r10 > > + xorq (%rbp,%rbx,8),%r9 > > + roll $8,%edx > > + xorq 8(%rsi,%rax,1),%r8 > > + xorq (%rsi,%rax,1),%r9 > > + movb %dl,%al > > + xorq %r10,%r8 > > + movzwq (%r11,%r12,2),%r12 > > + movzbl %dl,%ebx > > + shlb $4,%al > > + movzbq (%rsp,%rcx,1),%r13 > > + shrl $4,%ebx > > + shlq $48,%r12 > > + xorq %r8,%r13 > > + movq %r9,%r10 > > + xorq %r12,%r9 > > + shrq $8,%r8 > > + movzbq %r13b,%r13 > > + shrq $8,%r9 > > + xorq -128(%rbp,%rcx,8),%r8 > > + shlq $56,%r10 > > + xorq (%rbp,%rcx,8),%r9 > > + roll $8,%edx > > + xorq 8(%rsi,%rax,1),%r8 > > + xorq (%rsi,%rax,1),%r9 > > + movb %dl,%al > > + xorq %r10,%r8 > > + movzwq (%r11,%r13,2),%r13 > > + movzbl %dl,%ecx > > + shlb $4,%al > > + movzbq (%rsp,%rbx,1),%r12 > > + shrl $4,%ecx > > + shlq $48,%r13 > > + xorq %r8,%r12 > > + movq %r9,%r10 > > + xorq %r13,%r9 > > + shrq $8,%r8 > > + movzbq %r12b,%r12 > > + movl 0(%rdi),%edx > > + shrq $8,%r9 > > + xorq -128(%rbp,%rbx,8),%r8 > > + shlq $56,%r10 > > + xorq (%rbp,%rbx,8),%r9 > > + roll $8,%edx > > + xorq 8(%rsi,%rax,1),%r8 > > + xorq (%rsi,%rax,1),%r9 > > + movb %dl,%al > > + xorq %r10,%r8 > > + movzwq (%r11,%r12,2),%r12 > > + movzbl %dl,%ebx > > + shlb $4,%al > > + movzbq (%rsp,%rcx,1),%r13 > > + shrl $4,%ebx > > + shlq $48,%r12 > > + xorq %r8,%r13 > > + movq %r9,%r10 > > + xorq %r12,%r9 > > + shrq $8,%r8 > > + movzbq %r13b,%r13 > > + shrq $8,%r9 > > + xorq -128(%rbp,%rcx,8),%r8 > > + shlq $56,%r10 > > + xorq (%rbp,%rcx,8),%r9 > > + roll $8,%edx > > + xorq 8(%rsi,%rax,1),%r8 > > + xorq (%rsi,%rax,1),%r9 > > + movb %dl,%al > > + xorq %r10,%r8 > > + movzwq (%r11,%r13,2),%r13 > > + movzbl %dl,%ecx > > + shlb $4,%al > > + movzbq (%rsp,%rbx,1),%r12 > > + shrl $4,%ecx > > + shlq $48,%r13 > > + xorq %r8,%r12 > > + movq %r9,%r10 > > + xorq %r13,%r9 > > + shrq $8,%r8 > > + movzbq %r12b,%r12 > > + shrq $8,%r9 > > + xorq -128(%rbp,%rbx,8),%r8 > > + shlq $56,%r10 > > + xorq (%rbp,%rbx,8),%r9 > > + roll $8,%edx > > + xorq 8(%rsi,%rax,1),%r8 > > + xorq (%rsi,%rax,1),%r9 > > + movb 
%dl,%al > > + xorq %r10,%r8 > > + movzwq (%r11,%r12,2),%r12 > > + movzbl %dl,%ebx > > + shlb $4,%al > > + movzbq (%rsp,%rcx,1),%r13 > > + shrl $4,%ebx > > + shlq $48,%r12 > > + xorq %r8,%r13 > > + movq %r9,%r10 > > + xorq %r12,%r9 > > + shrq $8,%r8 > > + movzbq %r13b,%r13 > > + shrq $8,%r9 > > + xorq -128(%rbp,%rcx,8),%r8 > > + shlq $56,%r10 > > + xorq (%rbp,%rcx,8),%r9 > > + roll $8,%edx > > + xorq 8(%rsi,%rax,1),%r8 > > + xorq (%rsi,%rax,1),%r9 > > + movb %dl,%al > > + xorq %r10,%r8 > > + movzwq (%r11,%r13,2),%r13 > > + movzbl %dl,%ecx > > + shlb $4,%al > > + movzbq (%rsp,%rbx,1),%r12 > > + andl $240,%ecx > > + shlq $48,%r13 > > + xorq %r8,%r12 > > + movq %r9,%r10 > > + xorq %r13,%r9 > > + shrq $8,%r8 > > + movzbq %r12b,%r12 > > + movl -4(%rdi),%edx > > + shrq $8,%r9 > > + xorq -128(%rbp,%rbx,8),%r8 > > + shlq $56,%r10 > > + xorq (%rbp,%rbx,8),%r9 > > + movzwq (%r11,%r12,2),%r12 > > + xorq 8(%rsi,%rax,1),%r8 > > + xorq (%rsi,%rax,1),%r9 > > + shlq $48,%r12 > > + xorq %r10,%r8 > > + xorq %r12,%r9 > > + movzbq %r8b,%r13 > > + shrq $4,%r8 > > + movq %r9,%r10 > > + shlb $4,%r13b > > + shrq $4,%r9 > > + xorq 8(%rsi,%rcx,1),%r8 > > + movzwq (%r11,%r13,2),%r13 > > + shlq $60,%r10 > > + xorq (%rsi,%rcx,1),%r9 > > + xorq %r10,%r8 > > + shlq $48,%r13 > > + bswapq %r8 > > + xorq %r13,%r9 > > + bswapq %r9 > > + cmpq %r15,%r14 > > + jb .Louter_loop > > + movq %r8,8(%rdi) > > + movq %r9,(%rdi) > > + > > + leaq 280+48(%rsp),%rsi > > +.cfi_def_cfa %rsi,8 > > + movq -48(%rsi),%r15 > > +.cfi_restore %r15 > > + movq -40(%rsi),%r14 > > +.cfi_restore %r14 > > + movq -32(%rsi),%r13 > > +.cfi_restore %r13 > > + movq -24(%rsi),%r12 > > +.cfi_restore %r12 > > + movq -16(%rsi),%rbp > > +.cfi_restore %rbp > > + movq -8(%rsi),%rbx > > +.cfi_restore %rbx > > + leaq 0(%rsi),%rsp > > +.cfi_def_cfa_register %rsp > > +.Lghash_epilogue: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size gcm_ghash_4bit,.-gcm_ghash_4bit > > +.globl gcm_init_clmul > > +.type gcm_init_clmul,@function > > +.align 16 > > +gcm_init_clmul: > > +.cfi_startproc > > +.L_init_clmul: > > + movdqu (%rsi),%xmm2 > > + pshufd $78,%xmm2,%xmm2 > > + > > + > > + pshufd $255,%xmm2,%xmm4 > > + movdqa %xmm2,%xmm3 > > + psllq $1,%xmm2 > > + pxor %xmm5,%xmm5 > > + psrlq $63,%xmm3 > > + pcmpgtd %xmm4,%xmm5 > > + pslldq $8,%xmm3 > > + por %xmm3,%xmm2 > > + > > + > > + pand .L0x1c2_polynomial(%rip),%xmm5 > > + pxor %xmm5,%xmm2 > > + > > + > > + pshufd $78,%xmm2,%xmm6 > > + movdqa %xmm2,%xmm0 > > + pxor %xmm2,%xmm6 > > + movdqa %xmm0,%xmm1 > > + pshufd $78,%xmm0,%xmm3 > > + pxor %xmm0,%xmm3 > > +.byte 102,15,58,68,194,0 > > +.byte 102,15,58,68,202,17 > > +.byte 102,15,58,68,222,0 > > + pxor %xmm0,%xmm3 > > + pxor %xmm1,%xmm3 > > + > > + movdqa %xmm3,%xmm4 > > + psrldq $8,%xmm3 > > + pslldq $8,%xmm4 > > + pxor %xmm3,%xmm1 > > + pxor %xmm4,%xmm0 > > + > > + movdqa %xmm0,%xmm4 > > + movdqa %xmm0,%xmm3 > > + psllq $5,%xmm0 > > + pxor %xmm0,%xmm3 > > + psllq $1,%xmm0 > > + pxor %xmm3,%xmm0 > > + psllq $57,%xmm0 > > + movdqa %xmm0,%xmm3 > > + pslldq $8,%xmm0 > > + psrldq $8,%xmm3 > > + pxor %xmm4,%xmm0 > > + pxor %xmm3,%xmm1 > > + > > + > > + movdqa %xmm0,%xmm4 > > + psrlq $1,%xmm0 > > + pxor %xmm4,%xmm1 > > + pxor %xmm0,%xmm4 > > + psrlq $5,%xmm0 > > + pxor %xmm4,%xmm0 > > + psrlq $1,%xmm0 > > + pxor %xmm1,%xmm0 > > + pshufd $78,%xmm2,%xmm3 > > + pshufd $78,%xmm0,%xmm4 > > + pxor %xmm2,%xmm3 > > + movdqu %xmm2,0(%rdi) > > + pxor %xmm0,%xmm4 > > + movdqu %xmm0,16(%rdi) > > +.byte 102,15,58,15,227,8 > > + movdqu %xmm4,32(%rdi) > > + movdqa %xmm0,%xmm1 > > + pshufd 
$78,%xmm0,%xmm3 > > + pxor %xmm0,%xmm3 > > +.byte 102,15,58,68,194,0 > > +.byte 102,15,58,68,202,17 > > +.byte 102,15,58,68,222,0 > > + pxor %xmm0,%xmm3 > > + pxor %xmm1,%xmm3 > > + > > + movdqa %xmm3,%xmm4 > > + psrldq $8,%xmm3 > > + pslldq $8,%xmm4 > > + pxor %xmm3,%xmm1 > > + pxor %xmm4,%xmm0 > > + > > + movdqa %xmm0,%xmm4 > > + movdqa %xmm0,%xmm3 > > + psllq $5,%xmm0 > > + pxor %xmm0,%xmm3 > > + psllq $1,%xmm0 > > + pxor %xmm3,%xmm0 > > + psllq $57,%xmm0 > > + movdqa %xmm0,%xmm3 > > + pslldq $8,%xmm0 > > + psrldq $8,%xmm3 > > + pxor %xmm4,%xmm0 > > + pxor %xmm3,%xmm1 > > + > > + > > + movdqa %xmm0,%xmm4 > > + psrlq $1,%xmm0 > > + pxor %xmm4,%xmm1 > > + pxor %xmm0,%xmm4 > > + psrlq $5,%xmm0 > > + pxor %xmm4,%xmm0 > > + psrlq $1,%xmm0 > > + pxor %xmm1,%xmm0 > > + movdqa %xmm0,%xmm5 > > + movdqa %xmm0,%xmm1 > > + pshufd $78,%xmm0,%xmm3 > > + pxor %xmm0,%xmm3 > > +.byte 102,15,58,68,194,0 > > +.byte 102,15,58,68,202,17 > > +.byte 102,15,58,68,222,0 > > + pxor %xmm0,%xmm3 > > + pxor %xmm1,%xmm3 > > + > > + movdqa %xmm3,%xmm4 > > + psrldq $8,%xmm3 > > + pslldq $8,%xmm4 > > + pxor %xmm3,%xmm1 > > + pxor %xmm4,%xmm0 > > + > > + movdqa %xmm0,%xmm4 > > + movdqa %xmm0,%xmm3 > > + psllq $5,%xmm0 > > + pxor %xmm0,%xmm3 > > + psllq $1,%xmm0 > > + pxor %xmm3,%xmm0 > > + psllq $57,%xmm0 > > + movdqa %xmm0,%xmm3 > > + pslldq $8,%xmm0 > > + psrldq $8,%xmm3 > > + pxor %xmm4,%xmm0 > > + pxor %xmm3,%xmm1 > > + > > + > > + movdqa %xmm0,%xmm4 > > + psrlq $1,%xmm0 > > + pxor %xmm4,%xmm1 > > + pxor %xmm0,%xmm4 > > + psrlq $5,%xmm0 > > + pxor %xmm4,%xmm0 > > + psrlq $1,%xmm0 > > + pxor %xmm1,%xmm0 > > + pshufd $78,%xmm5,%xmm3 > > + pshufd $78,%xmm0,%xmm4 > > + pxor %xmm5,%xmm3 > > + movdqu %xmm5,48(%rdi) > > + pxor %xmm0,%xmm4 > > + movdqu %xmm0,64(%rdi) > > +.byte 102,15,58,15,227,8 > > + movdqu %xmm4,80(%rdi) > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size gcm_init_clmul,.-gcm_init_clmul > > +.globl gcm_gmult_clmul > > +.type gcm_gmult_clmul,@function > > +.align 16 > > +gcm_gmult_clmul: > > +.cfi_startproc > > +.L_gmult_clmul: > > + movdqu (%rdi),%xmm0 > > + movdqa .Lbswap_mask(%rip),%xmm5 > > + movdqu (%rsi),%xmm2 > > + movdqu 32(%rsi),%xmm4 > > +.byte 102,15,56,0,197 > > + movdqa %xmm0,%xmm1 > > + pshufd $78,%xmm0,%xmm3 > > + pxor %xmm0,%xmm3 > > +.byte 102,15,58,68,194,0 > > +.byte 102,15,58,68,202,17 > > +.byte 102,15,58,68,220,0 > > + pxor %xmm0,%xmm3 > > + pxor %xmm1,%xmm3 > > + > > + movdqa %xmm3,%xmm4 > > + psrldq $8,%xmm3 > > + pslldq $8,%xmm4 > > + pxor %xmm3,%xmm1 > > + pxor %xmm4,%xmm0 > > + > > + movdqa %xmm0,%xmm4 > > + movdqa %xmm0,%xmm3 > > + psllq $5,%xmm0 > > + pxor %xmm0,%xmm3 > > + psllq $1,%xmm0 > > + pxor %xmm3,%xmm0 > > + psllq $57,%xmm0 > > + movdqa %xmm0,%xmm3 > > + pslldq $8,%xmm0 > > + psrldq $8,%xmm3 > > + pxor %xmm4,%xmm0 > > + pxor %xmm3,%xmm1 > > + > > + > > + movdqa %xmm0,%xmm4 > > + psrlq $1,%xmm0 > > + pxor %xmm4,%xmm1 > > + pxor %xmm0,%xmm4 > > + psrlq $5,%xmm0 > > + pxor %xmm4,%xmm0 > > + psrlq $1,%xmm0 > > + pxor %xmm1,%xmm0 > > +.byte 102,15,56,0,197 > > + movdqu %xmm0,(%rdi) > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size gcm_gmult_clmul,.-gcm_gmult_clmul > > +.globl gcm_ghash_clmul > > +.type gcm_ghash_clmul,@function > > +.align 32 > > +gcm_ghash_clmul: > > +.cfi_startproc > > +.L_ghash_clmul: > > + movdqa .Lbswap_mask(%rip),%xmm10 > > + > > + movdqu (%rdi),%xmm0 > > + movdqu (%rsi),%xmm2 > > + movdqu 32(%rsi),%xmm7 > > +.byte 102,65,15,56,0,194 > > + > > + subq $0x10,%rcx > > + jz .Lodd_tail > > + > > + movdqu 16(%rsi),%xmm6 > > + movl 
OPENSSL_ia32cap_P+4(%rip),%eax > > + cmpq $0x30,%rcx > > + jb .Lskip4x > > + > > + andl $71303168,%eax > > + cmpl $4194304,%eax > > + je .Lskip4x > > + > > + subq $0x30,%rcx > > + movq $0xA040608020C0E000,%rax > > + movdqu 48(%rsi),%xmm14 > > + movdqu 64(%rsi),%xmm15 > > + > > + > > + > > + > > + movdqu 48(%rdx),%xmm3 > > + movdqu 32(%rdx),%xmm11 > > +.byte 102,65,15,56,0,218 > > +.byte 102,69,15,56,0,218 > > + movdqa %xmm3,%xmm5 > > + pshufd $78,%xmm3,%xmm4 > > + pxor %xmm3,%xmm4 > > +.byte 102,15,58,68,218,0 > > +.byte 102,15,58,68,234,17 > > +.byte 102,15,58,68,231,0 > > + > > + movdqa %xmm11,%xmm13 > > + pshufd $78,%xmm11,%xmm12 > > + pxor %xmm11,%xmm12 > > +.byte 102,68,15,58,68,222,0 > > +.byte 102,68,15,58,68,238,17 > > +.byte 102,68,15,58,68,231,16 > > + xorps %xmm11,%xmm3 > > + xorps %xmm13,%xmm5 > > + movups 80(%rsi),%xmm7 > > + xorps %xmm12,%xmm4 > > + > > + movdqu 16(%rdx),%xmm11 > > + movdqu 0(%rdx),%xmm8 > > +.byte 102,69,15,56,0,218 > > +.byte 102,69,15,56,0,194 > > + movdqa %xmm11,%xmm13 > > + pshufd $78,%xmm11,%xmm12 > > + pxor %xmm8,%xmm0 > > + pxor %xmm11,%xmm12 > > +.byte 102,69,15,58,68,222,0 > > + movdqa %xmm0,%xmm1 > > + pshufd $78,%xmm0,%xmm8 > > + pxor %xmm0,%xmm8 > > +.byte 102,69,15,58,68,238,17 > > +.byte 102,68,15,58,68,231,0 > > + xorps %xmm11,%xmm3 > > + xorps %xmm13,%xmm5 > > + > > + leaq 64(%rdx),%rdx > > + subq $0x40,%rcx > > + jc .Ltail4x > > + > > + jmp .Lmod4_loop > > +.align 32 > > +.Lmod4_loop: > > +.byte 102,65,15,58,68,199,0 > > + xorps %xmm12,%xmm4 > > + movdqu 48(%rdx),%xmm11 > > +.byte 102,69,15,56,0,218 > > +.byte 102,65,15,58,68,207,17 > > + xorps %xmm3,%xmm0 > > + movdqu 32(%rdx),%xmm3 > > + movdqa %xmm11,%xmm13 > > +.byte 102,68,15,58,68,199,16 > > + pshufd $78,%xmm11,%xmm12 > > + xorps %xmm5,%xmm1 > > + pxor %xmm11,%xmm12 > > +.byte 102,65,15,56,0,218 > > + movups 32(%rsi),%xmm7 > > + xorps %xmm4,%xmm8 > > +.byte 102,68,15,58,68,218,0 > > + pshufd $78,%xmm3,%xmm4 > > + > > + pxor %xmm0,%xmm8 > > + movdqa %xmm3,%xmm5 > > + pxor %xmm1,%xmm8 > > + pxor %xmm3,%xmm4 > > + movdqa %xmm8,%xmm9 > > +.byte 102,68,15,58,68,234,17 > > + pslldq $8,%xmm8 > > + psrldq $8,%xmm9 > > + pxor %xmm8,%xmm0 > > + movdqa .L7_mask(%rip),%xmm8 > > + pxor %xmm9,%xmm1 > > +.byte 102,76,15,110,200 > > + > > + pand %xmm0,%xmm8 > > +.byte 102,69,15,56,0,200 > > + pxor %xmm0,%xmm9 > > +.byte 102,68,15,58,68,231,0 > > + psllq $57,%xmm9 > > + movdqa %xmm9,%xmm8 > > + pslldq $8,%xmm9 > > +.byte 102,15,58,68,222,0 > > + psrldq $8,%xmm8 > > + pxor %xmm9,%xmm0 > > + pxor %xmm8,%xmm1 > > + movdqu 0(%rdx),%xmm8 > > + > > + movdqa %xmm0,%xmm9 > > + psrlq $1,%xmm0 > > +.byte 102,15,58,68,238,17 > > + xorps %xmm11,%xmm3 > > + movdqu 16(%rdx),%xmm11 > > +.byte 102,69,15,56,0,218 > > +.byte 102,15,58,68,231,16 > > + xorps %xmm13,%xmm5 > > + movups 80(%rsi),%xmm7 > > +.byte 102,69,15,56,0,194 > > + pxor %xmm9,%xmm1 > > + pxor %xmm0,%xmm9 > > + psrlq $5,%xmm0 > > + > > + movdqa %xmm11,%xmm13 > > + pxor %xmm12,%xmm4 > > + pshufd $78,%xmm11,%xmm12 > > + pxor %xmm9,%xmm0 > > + pxor %xmm8,%xmm1 > > + pxor %xmm11,%xmm12 > > +.byte 102,69,15,58,68,222,0 > > + psrlq $1,%xmm0 > > + pxor %xmm1,%xmm0 > > + movdqa %xmm0,%xmm1 > > +.byte 102,69,15,58,68,238,17 > > + xorps %xmm11,%xmm3 > > + pshufd $78,%xmm0,%xmm8 > > + pxor %xmm0,%xmm8 > > + > > +.byte 102,68,15,58,68,231,0 > > + xorps %xmm13,%xmm5 > > + > > + leaq 64(%rdx),%rdx > > + subq $0x40,%rcx > > + jnc .Lmod4_loop > > + > > +.Ltail4x: > > +.byte 102,65,15,58,68,199,0 > > +.byte 102,65,15,58,68,207,17 > > +.byte 102,68,15,58,68,199,16 > > + 
xorps %xmm12,%xmm4 > > + xorps %xmm3,%xmm0 > > + xorps %xmm5,%xmm1 > > + pxor %xmm0,%xmm1 > > + pxor %xmm4,%xmm8 > > + > > + pxor %xmm1,%xmm8 > > + pxor %xmm0,%xmm1 > > + > > + movdqa %xmm8,%xmm9 > > + psrldq $8,%xmm8 > > + pslldq $8,%xmm9 > > + pxor %xmm8,%xmm1 > > + pxor %xmm9,%xmm0 > > + > > + movdqa %xmm0,%xmm4 > > + movdqa %xmm0,%xmm3 > > + psllq $5,%xmm0 > > + pxor %xmm0,%xmm3 > > + psllq $1,%xmm0 > > + pxor %xmm3,%xmm0 > > + psllq $57,%xmm0 > > + movdqa %xmm0,%xmm3 > > + pslldq $8,%xmm0 > > + psrldq $8,%xmm3 > > + pxor %xmm4,%xmm0 > > + pxor %xmm3,%xmm1 > > + > > + > > + movdqa %xmm0,%xmm4 > > + psrlq $1,%xmm0 > > + pxor %xmm4,%xmm1 > > + pxor %xmm0,%xmm4 > > + psrlq $5,%xmm0 > > + pxor %xmm4,%xmm0 > > + psrlq $1,%xmm0 > > + pxor %xmm1,%xmm0 > > + addq $0x40,%rcx > > + jz .Ldone > > + movdqu 32(%rsi),%xmm7 > > + subq $0x10,%rcx > > + jz .Lodd_tail > > +.Lskip4x: > > + > > + > > + > > + > > + > > + movdqu (%rdx),%xmm8 > > + movdqu 16(%rdx),%xmm3 > > +.byte 102,69,15,56,0,194 > > +.byte 102,65,15,56,0,218 > > + pxor %xmm8,%xmm0 > > + > > + movdqa %xmm3,%xmm5 > > + pshufd $78,%xmm3,%xmm4 > > + pxor %xmm3,%xmm4 > > +.byte 102,15,58,68,218,0 > > +.byte 102,15,58,68,234,17 > > +.byte 102,15,58,68,231,0 > > + > > + leaq 32(%rdx),%rdx > > + nop > > + subq $0x20,%rcx > > + jbe .Leven_tail > > + nop > > + jmp .Lmod_loop > > + > > +.align 32 > > +.Lmod_loop: > > + movdqa %xmm0,%xmm1 > > + movdqa %xmm4,%xmm8 > > + pshufd $78,%xmm0,%xmm4 > > + pxor %xmm0,%xmm4 > > + > > +.byte 102,15,58,68,198,0 > > +.byte 102,15,58,68,206,17 > > +.byte 102,15,58,68,231,16 > > + > > + pxor %xmm3,%xmm0 > > + pxor %xmm5,%xmm1 > > + movdqu (%rdx),%xmm9 > > + pxor %xmm0,%xmm8 > > +.byte 102,69,15,56,0,202 > > + movdqu 16(%rdx),%xmm3 > > + > > + pxor %xmm1,%xmm8 > > + pxor %xmm9,%xmm1 > > + pxor %xmm8,%xmm4 > > +.byte 102,65,15,56,0,218 > > + movdqa %xmm4,%xmm8 > > + psrldq $8,%xmm8 > > + pslldq $8,%xmm4 > > + pxor %xmm8,%xmm1 > > + pxor %xmm4,%xmm0 > > + > > + movdqa %xmm3,%xmm5 > > + > > + movdqa %xmm0,%xmm9 > > + movdqa %xmm0,%xmm8 > > + psllq $5,%xmm0 > > + pxor %xmm0,%xmm8 > > +.byte 102,15,58,68,218,0 > > + psllq $1,%xmm0 > > + pxor %xmm8,%xmm0 > > + psllq $57,%xmm0 > > + movdqa %xmm0,%xmm8 > > + pslldq $8,%xmm0 > > + psrldq $8,%xmm8 > > + pxor %xmm9,%xmm0 > > + pshufd $78,%xmm5,%xmm4 > > + pxor %xmm8,%xmm1 > > + pxor %xmm5,%xmm4 > > + > > + movdqa %xmm0,%xmm9 > > + psrlq $1,%xmm0 > > +.byte 102,15,58,68,234,17 > > + pxor %xmm9,%xmm1 > > + pxor %xmm0,%xmm9 > > + psrlq $5,%xmm0 > > + pxor %xmm9,%xmm0 > > + leaq 32(%rdx),%rdx > > + psrlq $1,%xmm0 > > +.byte 102,15,58,68,231,0 > > + pxor %xmm1,%xmm0 > > + > > + subq $0x20,%rcx > > + ja .Lmod_loop > > + > > +.Leven_tail: > > + movdqa %xmm0,%xmm1 > > + movdqa %xmm4,%xmm8 > > + pshufd $78,%xmm0,%xmm4 > > + pxor %xmm0,%xmm4 > > + > > +.byte 102,15,58,68,198,0 > > +.byte 102,15,58,68,206,17 > > +.byte 102,15,58,68,231,16 > > + > > + pxor %xmm3,%xmm0 > > + pxor %xmm5,%xmm1 > > + pxor %xmm0,%xmm8 > > + pxor %xmm1,%xmm8 > > + pxor %xmm8,%xmm4 > > + movdqa %xmm4,%xmm8 > > + psrldq $8,%xmm8 > > + pslldq $8,%xmm4 > > + pxor %xmm8,%xmm1 > > + pxor %xmm4,%xmm0 > > + > > + movdqa %xmm0,%xmm4 > > + movdqa %xmm0,%xmm3 > > + psllq $5,%xmm0 > > + pxor %xmm0,%xmm3 > > + psllq $1,%xmm0 > > + pxor %xmm3,%xmm0 > > + psllq $57,%xmm0 > > + movdqa %xmm0,%xmm3 > > + pslldq $8,%xmm0 > > + psrldq $8,%xmm3 > > + pxor %xmm4,%xmm0 > > + pxor %xmm3,%xmm1 > > + > > + > > + movdqa %xmm0,%xmm4 > > + psrlq $1,%xmm0 > > + pxor %xmm4,%xmm1 > > + pxor %xmm0,%xmm4 > > + psrlq $5,%xmm0 > > + pxor 
%xmm4,%xmm0 > > + psrlq $1,%xmm0 > > + pxor %xmm1,%xmm0 > > + testq %rcx,%rcx > > + jnz .Ldone > > + > > +.Lodd_tail: > > + movdqu (%rdx),%xmm8 > > +.byte 102,69,15,56,0,194 > > + pxor %xmm8,%xmm0 > > + movdqa %xmm0,%xmm1 > > + pshufd $78,%xmm0,%xmm3 > > + pxor %xmm0,%xmm3 > > +.byte 102,15,58,68,194,0 > > +.byte 102,15,58,68,202,17 > > +.byte 102,15,58,68,223,0 > > + pxor %xmm0,%xmm3 > > + pxor %xmm1,%xmm3 > > + > > + movdqa %xmm3,%xmm4 > > + psrldq $8,%xmm3 > > + pslldq $8,%xmm4 > > + pxor %xmm3,%xmm1 > > + pxor %xmm4,%xmm0 > > + > > + movdqa %xmm0,%xmm4 > > + movdqa %xmm0,%xmm3 > > + psllq $5,%xmm0 > > + pxor %xmm0,%xmm3 > > + psllq $1,%xmm0 > > + pxor %xmm3,%xmm0 > > + psllq $57,%xmm0 > > + movdqa %xmm0,%xmm3 > > + pslldq $8,%xmm0 > > + psrldq $8,%xmm3 > > + pxor %xmm4,%xmm0 > > + pxor %xmm3,%xmm1 > > + > > + > > + movdqa %xmm0,%xmm4 > > + psrlq $1,%xmm0 > > + pxor %xmm4,%xmm1 > > + pxor %xmm0,%xmm4 > > + psrlq $5,%xmm0 > > + pxor %xmm4,%xmm0 > > + psrlq $1,%xmm0 > > + pxor %xmm1,%xmm0 > > +.Ldone: > > +.byte 102,65,15,56,0,194 > > + movdqu %xmm0,(%rdi) > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size gcm_ghash_clmul,.-gcm_ghash_clmul > > +.globl gcm_init_avx > > +.type gcm_init_avx,@function > > +.align 32 > > +gcm_init_avx: > > +.cfi_startproc > > + jmp .L_init_clmul > > +.cfi_endproc > > +.size gcm_init_avx,.-gcm_init_avx > > +.globl gcm_gmult_avx > > +.type gcm_gmult_avx,@function > > +.align 32 > > +gcm_gmult_avx: > > +.cfi_startproc > > + jmp .L_gmult_clmul > > +.cfi_endproc > > +.size gcm_gmult_avx,.-gcm_gmult_avx > > +.globl gcm_ghash_avx > > +.type gcm_ghash_avx,@function > > +.align 32 > > +gcm_ghash_avx: > > +.cfi_startproc > > + jmp .L_ghash_clmul > > +.cfi_endproc > > +.size gcm_ghash_avx,.-gcm_ghash_avx > > +.align 64 > > +.Lbswap_mask: > > +.byte 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0 > > +.L0x1c2_polynomial: > > +.byte 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0xc2 > > +.L7_mask: > > +.long 7,0,7,0 > > +.L7_mask_poly: > > +.long 7,0,450,0 > > +.align 64 > > +.type .Lrem_4bit,@object > > +.Lrem_4bit: > > +.long 0,0,0,471859200,0,943718400,0,610271232 > > +.long 0,1887436800,0,1822425088,0,1220542464,0,1423966208 > > +.long 0,3774873600,0,4246732800,0,3644850176,0,3311403008 > > +.long 0,2441084928,0,2376073216,0,2847932416,0,3051356160 > > +.type .Lrem_8bit,@object > > +.Lrem_8bit: > > +.value 0x0000,0x01C2,0x0384,0x0246,0x0708,0x06CA,0x048C,0x054E > > +.value 0x0E10,0x0FD2,0x0D94,0x0C56,0x0918,0x08DA,0x0A9C,0x0B5E > > +.value 0x1C20,0x1DE2,0x1FA4,0x1E66,0x1B28,0x1AEA,0x18AC,0x196E > > +.value 0x1230,0x13F2,0x11B4,0x1076,0x1538,0x14FA,0x16BC,0x177E > > +.value 0x3840,0x3982,0x3BC4,0x3A06,0x3F48,0x3E8A,0x3CCC,0x3D0E > > +.value 0x3650,0x3792,0x35D4,0x3416,0x3158,0x309A,0x32DC,0x331E > > +.value 0x2460,0x25A2,0x27E4,0x2626,0x2368,0x22AA,0x20EC,0x212E > > +.value 0x2A70,0x2BB2,0x29F4,0x2836,0x2D78,0x2CBA,0x2EFC,0x2F3E > > +.value 0x7080,0x7142,0x7304,0x72C6,0x7788,0x764A,0x740C,0x75CE > > +.value 0x7E90,0x7F52,0x7D14,0x7CD6,0x7998,0x785A,0x7A1C,0x7BDE > > +.value 0x6CA0,0x6D62,0x6F24,0x6EE6,0x6BA8,0x6A6A,0x682C,0x69EE > > +.value 0x62B0,0x6372,0x6134,0x60F6,0x65B8,0x647A,0x663C,0x67FE > > +.value 0x48C0,0x4902,0x4B44,0x4A86,0x4FC8,0x4E0A,0x4C4C,0x4D8E > > +.value 0x46D0,0x4712,0x4554,0x4496,0x41D8,0x401A,0x425C,0x439E > > +.value 0x54E0,0x5522,0x5764,0x56A6,0x53E8,0x522A,0x506C,0x51AE > > +.value 0x5AF0,0x5B32,0x5974,0x58B6,0x5DF8,0x5C3A,0x5E7C,0x5FBE > > +.value 0xE100,0xE0C2,0xE284,0xE346,0xE608,0xE7CA,0xE58C,0xE44E > > +.value 0xEF10,0xEED2,0xEC94,0xED56,0xE818,0xE9DA,0xEB9C,0xEA5E 
> > +.value 0xFD20,0xFCE2,0xFEA4,0xFF66,0xFA28,0xFBEA,0xF9AC,0xF86E > > +.value 0xF330,0xF2F2,0xF0B4,0xF176,0xF438,0xF5FA,0xF7BC,0xF67E > > +.value 0xD940,0xD882,0xDAC4,0xDB06,0xDE48,0xDF8A,0xDDCC,0xDC0E > > +.value 0xD750,0xD692,0xD4D4,0xD516,0xD058,0xD19A,0xD3DC,0xD21E > > +.value 0xC560,0xC4A2,0xC6E4,0xC726,0xC268,0xC3AA,0xC1EC,0xC02E > > +.value 0xCB70,0xCAB2,0xC8F4,0xC936,0xCC78,0xCDBA,0xCFFC,0xCE3E > > +.value 0x9180,0x9042,0x9204,0x93C6,0x9688,0x974A,0x950C,0x94CE > > +.value 0x9F90,0x9E52,0x9C14,0x9DD6,0x9898,0x995A,0x9B1C,0x9ADE > > +.value 0x8DA0,0x8C62,0x8E24,0x8FE6,0x8AA8,0x8B6A,0x892C,0x88EE > > +.value 0x83B0,0x8272,0x8034,0x81F6,0x84B8,0x857A,0x873C,0x86FE > > +.value 0xA9C0,0xA802,0xAA44,0xAB86,0xAEC8,0xAF0A,0xAD4C,0xAC8E > > +.value 0xA7D0,0xA612,0xA454,0xA596,0xA0D8,0xA11A,0xA35C,0xA29E > > +.value 0xB5E0,0xB422,0xB664,0xB7A6,0xB2E8,0xB32A,0xB16C,0xB0AE > > +.value 0xBBF0,0xBA32,0xB874,0xB9B6,0xBCF8,0xBD3A,0xBF7C,0xBEBE > > + > > +.byte > > > 71,72,65,83,72,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79 > , > > > 71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115, > > 108,46,111,114,103,62,0 > > +.align 64 > > diff --git > a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-mb-x86_64.S > > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-mb-x86_64.S > > new file mode 100644 > > index 0000000000..4572bc7227 > > --- /dev/null > > +++ > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-mb-x86_64.S > > @@ -0,0 +1,2962 @@ > > +# WARNING: do not edit! > > +# Generated from openssl/crypto/sha/asm/sha1-mb-x86_64.pl > > +# > > +# Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. > > +# > > +# Licensed under the OpenSSL license (the "License"). You may not use > > +# this file except in compliance with the License. 
You can obtain a copy > > +# in the file LICENSE in the source distribution or at > > +# https://www.openssl.org/source/license.html > > + > > +.text > > + > > + > > + > > +.globl sha1_multi_block > > +.type sha1_multi_block,@function > > +.align 32 > > +sha1_multi_block: > > +.cfi_startproc > > + movq OPENSSL_ia32cap_P+4(%rip),%rcx > > + btq $61,%rcx > > + jc _shaext_shortcut > > + movq %rsp,%rax > > +.cfi_def_cfa_register %rax > > + pushq %rbx > > +.cfi_offset %rbx,-16 > > + pushq %rbp > > +.cfi_offset %rbx,-24 > > + subq $288,%rsp > > + andq $-256,%rsp > > + movq %rax,272(%rsp) > > +.cfi_escape 0x0f,0x06,0x77,0x90,0x02,0x06,0x23,0x08 > > +.Lbody: > > + leaq K_XX_XX(%rip),%rbp > > + leaq 256(%rsp),%rbx > > + > > +.Loop_grande: > > + movl %edx,280(%rsp) > > + xorl %edx,%edx > > + movq 0(%rsi),%r8 > > + movl 8(%rsi),%ecx > > + cmpl %edx,%ecx > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movl %ecx,0(%rbx) > > + cmovleq %rbp,%r8 > > + movq 16(%rsi),%r9 > > + movl 24(%rsi),%ecx > > + cmpl %edx,%ecx > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movl %ecx,4(%rbx) > > + cmovleq %rbp,%r9 > > + movq 32(%rsi),%r10 > > + movl 40(%rsi),%ecx > > + cmpl %edx,%ecx > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movl %ecx,8(%rbx) > > + cmovleq %rbp,%r10 > > + movq 48(%rsi),%r11 > > + movl 56(%rsi),%ecx > > + cmpl %edx,%ecx > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movl %ecx,12(%rbx) > > + cmovleq %rbp,%r11 > > + testl %edx,%edx > > + jz .Ldone > > + > > + movdqu 0(%rdi),%xmm10 > > + leaq 128(%rsp),%rax > > + movdqu 32(%rdi),%xmm11 > > + movdqu 64(%rdi),%xmm12 > > + movdqu 96(%rdi),%xmm13 > > + movdqu 128(%rdi),%xmm14 > > + movdqa 96(%rbp),%xmm5 > > + movdqa -32(%rbp),%xmm15 > > + jmp .Loop > > + > > +.align 32 > > +.Loop: > > + movd (%r8),%xmm0 > > + leaq 64(%r8),%r8 > > + movd (%r9),%xmm2 > > + leaq 64(%r9),%r9 > > + movd (%r10),%xmm3 > > + leaq 64(%r10),%r10 > > + movd (%r11),%xmm4 > > + leaq 64(%r11),%r11 > > + punpckldq %xmm3,%xmm0 > > + movd -60(%r8),%xmm1 > > + punpckldq %xmm4,%xmm2 > > + movd -60(%r9),%xmm9 > > + punpckldq %xmm2,%xmm0 > > + movd -60(%r10),%xmm8 > > +.byte 102,15,56,0,197 > > + movd -60(%r11),%xmm7 > > + punpckldq %xmm8,%xmm1 > > + movdqa %xmm10,%xmm8 > > + paddd %xmm15,%xmm14 > > + punpckldq %xmm7,%xmm9 > > + movdqa %xmm11,%xmm7 > > + movdqa %xmm11,%xmm6 > > + pslld $5,%xmm8 > > + pandn %xmm13,%xmm7 > > + pand %xmm12,%xmm6 > > + punpckldq %xmm9,%xmm1 > > + movdqa %xmm10,%xmm9 > > + > > + movdqa %xmm0,0-128(%rax) > > + paddd %xmm0,%xmm14 > > + movd -56(%r8),%xmm2 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + movdqa %xmm11,%xmm7 > > + > > + por %xmm9,%xmm8 > > + movd -56(%r9),%xmm9 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm14 > > + > > + psrld $2,%xmm11 > > + paddd %xmm8,%xmm14 > > +.byte 102,15,56,0,205 > > + movd -56(%r10),%xmm8 > > + por %xmm7,%xmm11 > > + movd -56(%r11),%xmm7 > > + punpckldq %xmm8,%xmm2 > > + movdqa %xmm14,%xmm8 > > + paddd %xmm15,%xmm13 > > + punpckldq %xmm7,%xmm9 > > + movdqa %xmm10,%xmm7 > > + movdqa %xmm10,%xmm6 > > + pslld $5,%xmm8 > > + pandn %xmm12,%xmm7 > > + pand %xmm11,%xmm6 > > + punpckldq %xmm9,%xmm2 > > + movdqa %xmm14,%xmm9 > > + > > + movdqa %xmm1,16-128(%rax) > > + paddd %xmm1,%xmm13 > > + movd -52(%r8),%xmm3 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + movdqa %xmm10,%xmm7 > > + > > + por %xmm9,%xmm8 > > + movd -52(%r9),%xmm9 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm13 > > + > > + psrld $2,%xmm10 > > + paddd %xmm8,%xmm13 > > +.byte 102,15,56,0,213 > > + movd -52(%r10),%xmm8 > > + por %xmm7,%xmm10 > > + movd 
-52(%r11),%xmm7 > > + punpckldq %xmm8,%xmm3 > > + movdqa %xmm13,%xmm8 > > + paddd %xmm15,%xmm12 > > + punpckldq %xmm7,%xmm9 > > + movdqa %xmm14,%xmm7 > > + movdqa %xmm14,%xmm6 > > + pslld $5,%xmm8 > > + pandn %xmm11,%xmm7 > > + pand %xmm10,%xmm6 > > + punpckldq %xmm9,%xmm3 > > + movdqa %xmm13,%xmm9 > > + > > + movdqa %xmm2,32-128(%rax) > > + paddd %xmm2,%xmm12 > > + movd -48(%r8),%xmm4 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + movdqa %xmm14,%xmm7 > > + > > + por %xmm9,%xmm8 > > + movd -48(%r9),%xmm9 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm12 > > + > > + psrld $2,%xmm14 > > + paddd %xmm8,%xmm12 > > +.byte 102,15,56,0,221 > > + movd -48(%r10),%xmm8 > > + por %xmm7,%xmm14 > > + movd -48(%r11),%xmm7 > > + punpckldq %xmm8,%xmm4 > > + movdqa %xmm12,%xmm8 > > + paddd %xmm15,%xmm11 > > + punpckldq %xmm7,%xmm9 > > + movdqa %xmm13,%xmm7 > > + movdqa %xmm13,%xmm6 > > + pslld $5,%xmm8 > > + pandn %xmm10,%xmm7 > > + pand %xmm14,%xmm6 > > + punpckldq %xmm9,%xmm4 > > + movdqa %xmm12,%xmm9 > > + > > + movdqa %xmm3,48-128(%rax) > > + paddd %xmm3,%xmm11 > > + movd -44(%r8),%xmm0 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + movdqa %xmm13,%xmm7 > > + > > + por %xmm9,%xmm8 > > + movd -44(%r9),%xmm9 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm11 > > + > > + psrld $2,%xmm13 > > + paddd %xmm8,%xmm11 > > +.byte 102,15,56,0,229 > > + movd -44(%r10),%xmm8 > > + por %xmm7,%xmm13 > > + movd -44(%r11),%xmm7 > > + punpckldq %xmm8,%xmm0 > > + movdqa %xmm11,%xmm8 > > + paddd %xmm15,%xmm10 > > + punpckldq %xmm7,%xmm9 > > + movdqa %xmm12,%xmm7 > > + movdqa %xmm12,%xmm6 > > + pslld $5,%xmm8 > > + pandn %xmm14,%xmm7 > > + pand %xmm13,%xmm6 > > + punpckldq %xmm9,%xmm0 > > + movdqa %xmm11,%xmm9 > > + > > + movdqa %xmm4,64-128(%rax) > > + paddd %xmm4,%xmm10 > > + movd -40(%r8),%xmm1 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + movdqa %xmm12,%xmm7 > > + > > + por %xmm9,%xmm8 > > + movd -40(%r9),%xmm9 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm10 > > + > > + psrld $2,%xmm12 > > + paddd %xmm8,%xmm10 > > +.byte 102,15,56,0,197 > > + movd -40(%r10),%xmm8 > > + por %xmm7,%xmm12 > > + movd -40(%r11),%xmm7 > > + punpckldq %xmm8,%xmm1 > > + movdqa %xmm10,%xmm8 > > + paddd %xmm15,%xmm14 > > + punpckldq %xmm7,%xmm9 > > + movdqa %xmm11,%xmm7 > > + movdqa %xmm11,%xmm6 > > + pslld $5,%xmm8 > > + pandn %xmm13,%xmm7 > > + pand %xmm12,%xmm6 > > + punpckldq %xmm9,%xmm1 > > + movdqa %xmm10,%xmm9 > > + > > + movdqa %xmm0,80-128(%rax) > > + paddd %xmm0,%xmm14 > > + movd -36(%r8),%xmm2 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + movdqa %xmm11,%xmm7 > > + > > + por %xmm9,%xmm8 > > + movd -36(%r9),%xmm9 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm14 > > + > > + psrld $2,%xmm11 > > + paddd %xmm8,%xmm14 > > +.byte 102,15,56,0,205 > > + movd -36(%r10),%xmm8 > > + por %xmm7,%xmm11 > > + movd -36(%r11),%xmm7 > > + punpckldq %xmm8,%xmm2 > > + movdqa %xmm14,%xmm8 > > + paddd %xmm15,%xmm13 > > + punpckldq %xmm7,%xmm9 > > + movdqa %xmm10,%xmm7 > > + movdqa %xmm10,%xmm6 > > + pslld $5,%xmm8 > > + pandn %xmm12,%xmm7 > > + pand %xmm11,%xmm6 > > + punpckldq %xmm9,%xmm2 > > + movdqa %xmm14,%xmm9 > > + > > + movdqa %xmm1,96-128(%rax) > > + paddd %xmm1,%xmm13 > > + movd -32(%r8),%xmm3 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + movdqa %xmm10,%xmm7 > > + > > + por %xmm9,%xmm8 > > + movd -32(%r9),%xmm9 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm13 > > + > > + psrld $2,%xmm10 > > + paddd %xmm8,%xmm13 > > +.byte 102,15,56,0,213 > > + movd -32(%r10),%xmm8 > > + por %xmm7,%xmm10 > > + movd -32(%r11),%xmm7 > > + punpckldq %xmm8,%xmm3 
> > + movdqa %xmm13,%xmm8 > > + paddd %xmm15,%xmm12 > > + punpckldq %xmm7,%xmm9 > > + movdqa %xmm14,%xmm7 > > + movdqa %xmm14,%xmm6 > > + pslld $5,%xmm8 > > + pandn %xmm11,%xmm7 > > + pand %xmm10,%xmm6 > > + punpckldq %xmm9,%xmm3 > > + movdqa %xmm13,%xmm9 > > + > > + movdqa %xmm2,112-128(%rax) > > + paddd %xmm2,%xmm12 > > + movd -28(%r8),%xmm4 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + movdqa %xmm14,%xmm7 > > + > > + por %xmm9,%xmm8 > > + movd -28(%r9),%xmm9 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm12 > > + > > + psrld $2,%xmm14 > > + paddd %xmm8,%xmm12 > > +.byte 102,15,56,0,221 > > + movd -28(%r10),%xmm8 > > + por %xmm7,%xmm14 > > + movd -28(%r11),%xmm7 > > + punpckldq %xmm8,%xmm4 > > + movdqa %xmm12,%xmm8 > > + paddd %xmm15,%xmm11 > > + punpckldq %xmm7,%xmm9 > > + movdqa %xmm13,%xmm7 > > + movdqa %xmm13,%xmm6 > > + pslld $5,%xmm8 > > + pandn %xmm10,%xmm7 > > + pand %xmm14,%xmm6 > > + punpckldq %xmm9,%xmm4 > > + movdqa %xmm12,%xmm9 > > + > > + movdqa %xmm3,128-128(%rax) > > + paddd %xmm3,%xmm11 > > + movd -24(%r8),%xmm0 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + movdqa %xmm13,%xmm7 > > + > > + por %xmm9,%xmm8 > > + movd -24(%r9),%xmm9 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm11 > > + > > + psrld $2,%xmm13 > > + paddd %xmm8,%xmm11 > > +.byte 102,15,56,0,229 > > + movd -24(%r10),%xmm8 > > + por %xmm7,%xmm13 > > + movd -24(%r11),%xmm7 > > + punpckldq %xmm8,%xmm0 > > + movdqa %xmm11,%xmm8 > > + paddd %xmm15,%xmm10 > > + punpckldq %xmm7,%xmm9 > > + movdqa %xmm12,%xmm7 > > + movdqa %xmm12,%xmm6 > > + pslld $5,%xmm8 > > + pandn %xmm14,%xmm7 > > + pand %xmm13,%xmm6 > > + punpckldq %xmm9,%xmm0 > > + movdqa %xmm11,%xmm9 > > + > > + movdqa %xmm4,144-128(%rax) > > + paddd %xmm4,%xmm10 > > + movd -20(%r8),%xmm1 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + movdqa %xmm12,%xmm7 > > + > > + por %xmm9,%xmm8 > > + movd -20(%r9),%xmm9 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm10 > > + > > + psrld $2,%xmm12 > > + paddd %xmm8,%xmm10 > > +.byte 102,15,56,0,197 > > + movd -20(%r10),%xmm8 > > + por %xmm7,%xmm12 > > + movd -20(%r11),%xmm7 > > + punpckldq %xmm8,%xmm1 > > + movdqa %xmm10,%xmm8 > > + paddd %xmm15,%xmm14 > > + punpckldq %xmm7,%xmm9 > > + movdqa %xmm11,%xmm7 > > + movdqa %xmm11,%xmm6 > > + pslld $5,%xmm8 > > + pandn %xmm13,%xmm7 > > + pand %xmm12,%xmm6 > > + punpckldq %xmm9,%xmm1 > > + movdqa %xmm10,%xmm9 > > + > > + movdqa %xmm0,160-128(%rax) > > + paddd %xmm0,%xmm14 > > + movd -16(%r8),%xmm2 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + movdqa %xmm11,%xmm7 > > + > > + por %xmm9,%xmm8 > > + movd -16(%r9),%xmm9 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm14 > > + > > + psrld $2,%xmm11 > > + paddd %xmm8,%xmm14 > > +.byte 102,15,56,0,205 > > + movd -16(%r10),%xmm8 > > + por %xmm7,%xmm11 > > + movd -16(%r11),%xmm7 > > + punpckldq %xmm8,%xmm2 > > + movdqa %xmm14,%xmm8 > > + paddd %xmm15,%xmm13 > > + punpckldq %xmm7,%xmm9 > > + movdqa %xmm10,%xmm7 > > + movdqa %xmm10,%xmm6 > > + pslld $5,%xmm8 > > + pandn %xmm12,%xmm7 > > + pand %xmm11,%xmm6 > > + punpckldq %xmm9,%xmm2 > > + movdqa %xmm14,%xmm9 > > + > > + movdqa %xmm1,176-128(%rax) > > + paddd %xmm1,%xmm13 > > + movd -12(%r8),%xmm3 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + movdqa %xmm10,%xmm7 > > + > > + por %xmm9,%xmm8 > > + movd -12(%r9),%xmm9 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm13 > > + > > + psrld $2,%xmm10 > > + paddd %xmm8,%xmm13 > > +.byte 102,15,56,0,213 > > + movd -12(%r10),%xmm8 > > + por %xmm7,%xmm10 > > + movd -12(%r11),%xmm7 > > + punpckldq %xmm8,%xmm3 > > + movdqa %xmm13,%xmm8 > > + paddd 
%xmm15,%xmm12 > > + punpckldq %xmm7,%xmm9 > > + movdqa %xmm14,%xmm7 > > + movdqa %xmm14,%xmm6 > > + pslld $5,%xmm8 > > + pandn %xmm11,%xmm7 > > + pand %xmm10,%xmm6 > > + punpckldq %xmm9,%xmm3 > > + movdqa %xmm13,%xmm9 > > + > > + movdqa %xmm2,192-128(%rax) > > + paddd %xmm2,%xmm12 > > + movd -8(%r8),%xmm4 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + movdqa %xmm14,%xmm7 > > + > > + por %xmm9,%xmm8 > > + movd -8(%r9),%xmm9 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm12 > > + > > + psrld $2,%xmm14 > > + paddd %xmm8,%xmm12 > > +.byte 102,15,56,0,221 > > + movd -8(%r10),%xmm8 > > + por %xmm7,%xmm14 > > + movd -8(%r11),%xmm7 > > + punpckldq %xmm8,%xmm4 > > + movdqa %xmm12,%xmm8 > > + paddd %xmm15,%xmm11 > > + punpckldq %xmm7,%xmm9 > > + movdqa %xmm13,%xmm7 > > + movdqa %xmm13,%xmm6 > > + pslld $5,%xmm8 > > + pandn %xmm10,%xmm7 > > + pand %xmm14,%xmm6 > > + punpckldq %xmm9,%xmm4 > > + movdqa %xmm12,%xmm9 > > + > > + movdqa %xmm3,208-128(%rax) > > + paddd %xmm3,%xmm11 > > + movd -4(%r8),%xmm0 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + movdqa %xmm13,%xmm7 > > + > > + por %xmm9,%xmm8 > > + movd -4(%r9),%xmm9 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm11 > > + > > + psrld $2,%xmm13 > > + paddd %xmm8,%xmm11 > > +.byte 102,15,56,0,229 > > + movd -4(%r10),%xmm8 > > + por %xmm7,%xmm13 > > + movdqa 0-128(%rax),%xmm1 > > + movd -4(%r11),%xmm7 > > + punpckldq %xmm8,%xmm0 > > + movdqa %xmm11,%xmm8 > > + paddd %xmm15,%xmm10 > > + punpckldq %xmm7,%xmm9 > > + movdqa %xmm12,%xmm7 > > + movdqa %xmm12,%xmm6 > > + pslld $5,%xmm8 > > + prefetcht0 63(%r8) > > + pandn %xmm14,%xmm7 > > + pand %xmm13,%xmm6 > > + punpckldq %xmm9,%xmm0 > > + movdqa %xmm11,%xmm9 > > + > > + movdqa %xmm4,224-128(%rax) > > + paddd %xmm4,%xmm10 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + movdqa %xmm12,%xmm7 > > + prefetcht0 63(%r9) > > + > > + por %xmm9,%xmm8 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm10 > > + prefetcht0 63(%r10) > > + > > + psrld $2,%xmm12 > > + paddd %xmm8,%xmm10 > > +.byte 102,15,56,0,197 > > + prefetcht0 63(%r11) > > + por %xmm7,%xmm12 > > + movdqa 16-128(%rax),%xmm2 > > + pxor %xmm3,%xmm1 > > + movdqa 32-128(%rax),%xmm3 > > + > > + movdqa %xmm10,%xmm8 > > + pxor 128-128(%rax),%xmm1 > > + paddd %xmm15,%xmm14 > > + movdqa %xmm11,%xmm7 > > + pslld $5,%xmm8 > > + pxor %xmm3,%xmm1 > > + movdqa %xmm11,%xmm6 > > + pandn %xmm13,%xmm7 > > + movdqa %xmm1,%xmm5 > > + pand %xmm12,%xmm6 > > + movdqa %xmm10,%xmm9 > > + psrld $31,%xmm5 > > + paddd %xmm1,%xmm1 > > + > > + movdqa %xmm0,240-128(%rax) > > + paddd %xmm0,%xmm14 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + > > + movdqa %xmm11,%xmm7 > > + por %xmm9,%xmm8 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm14 > > + > > + psrld $2,%xmm11 > > + paddd %xmm8,%xmm14 > > + por %xmm5,%xmm1 > > + por %xmm7,%xmm11 > > + pxor %xmm4,%xmm2 > > + movdqa 48-128(%rax),%xmm4 > > + > > + movdqa %xmm14,%xmm8 > > + pxor 144-128(%rax),%xmm2 > > + paddd %xmm15,%xmm13 > > + movdqa %xmm10,%xmm7 > > + pslld $5,%xmm8 > > + pxor %xmm4,%xmm2 > > + movdqa %xmm10,%xmm6 > > + pandn %xmm12,%xmm7 > > + movdqa %xmm2,%xmm5 > > + pand %xmm11,%xmm6 > > + movdqa %xmm14,%xmm9 > > + psrld $31,%xmm5 > > + paddd %xmm2,%xmm2 > > + > > + movdqa %xmm1,0-128(%rax) > > + paddd %xmm1,%xmm13 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + > > + movdqa %xmm10,%xmm7 > > + por %xmm9,%xmm8 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm13 > > + > > + psrld $2,%xmm10 > > + paddd %xmm8,%xmm13 > > + por %xmm5,%xmm2 > > + por %xmm7,%xmm10 > > + pxor %xmm0,%xmm3 > > + movdqa 64-128(%rax),%xmm0 > > + > > + movdqa 
%xmm13,%xmm8 > > + pxor 160-128(%rax),%xmm3 > > + paddd %xmm15,%xmm12 > > + movdqa %xmm14,%xmm7 > > + pslld $5,%xmm8 > > + pxor %xmm0,%xmm3 > > + movdqa %xmm14,%xmm6 > > + pandn %xmm11,%xmm7 > > + movdqa %xmm3,%xmm5 > > + pand %xmm10,%xmm6 > > + movdqa %xmm13,%xmm9 > > + psrld $31,%xmm5 > > + paddd %xmm3,%xmm3 > > + > > + movdqa %xmm2,16-128(%rax) > > + paddd %xmm2,%xmm12 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + > > + movdqa %xmm14,%xmm7 > > + por %xmm9,%xmm8 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm12 > > + > > + psrld $2,%xmm14 > > + paddd %xmm8,%xmm12 > > + por %xmm5,%xmm3 > > + por %xmm7,%xmm14 > > + pxor %xmm1,%xmm4 > > + movdqa 80-128(%rax),%xmm1 > > + > > + movdqa %xmm12,%xmm8 > > + pxor 176-128(%rax),%xmm4 > > + paddd %xmm15,%xmm11 > > + movdqa %xmm13,%xmm7 > > + pslld $5,%xmm8 > > + pxor %xmm1,%xmm4 > > + movdqa %xmm13,%xmm6 > > + pandn %xmm10,%xmm7 > > + movdqa %xmm4,%xmm5 > > + pand %xmm14,%xmm6 > > + movdqa %xmm12,%xmm9 > > + psrld $31,%xmm5 > > + paddd %xmm4,%xmm4 > > + > > + movdqa %xmm3,32-128(%rax) > > + paddd %xmm3,%xmm11 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + > > + movdqa %xmm13,%xmm7 > > + por %xmm9,%xmm8 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm11 > > + > > + psrld $2,%xmm13 > > + paddd %xmm8,%xmm11 > > + por %xmm5,%xmm4 > > + por %xmm7,%xmm13 > > + pxor %xmm2,%xmm0 > > + movdqa 96-128(%rax),%xmm2 > > + > > + movdqa %xmm11,%xmm8 > > + pxor 192-128(%rax),%xmm0 > > + paddd %xmm15,%xmm10 > > + movdqa %xmm12,%xmm7 > > + pslld $5,%xmm8 > > + pxor %xmm2,%xmm0 > > + movdqa %xmm12,%xmm6 > > + pandn %xmm14,%xmm7 > > + movdqa %xmm0,%xmm5 > > + pand %xmm13,%xmm6 > > + movdqa %xmm11,%xmm9 > > + psrld $31,%xmm5 > > + paddd %xmm0,%xmm0 > > + > > + movdqa %xmm4,48-128(%rax) > > + paddd %xmm4,%xmm10 > > + psrld $27,%xmm9 > > + pxor %xmm7,%xmm6 > > + > > + movdqa %xmm12,%xmm7 > > + por %xmm9,%xmm8 > > + pslld $30,%xmm7 > > + paddd %xmm6,%xmm10 > > + > > + psrld $2,%xmm12 > > + paddd %xmm8,%xmm10 > > + por %xmm5,%xmm0 > > + por %xmm7,%xmm12 > > + movdqa 0(%rbp),%xmm15 > > + pxor %xmm3,%xmm1 > > + movdqa 112-128(%rax),%xmm3 > > + > > + movdqa %xmm10,%xmm8 > > + movdqa %xmm13,%xmm6 > > + pxor 208-128(%rax),%xmm1 > > + paddd %xmm15,%xmm14 > > + pslld $5,%xmm8 > > + pxor %xmm11,%xmm6 > > + > > + movdqa %xmm10,%xmm9 > > + movdqa %xmm0,64-128(%rax) > > + paddd %xmm0,%xmm14 > > + pxor %xmm3,%xmm1 > > + psrld $27,%xmm9 > > + pxor %xmm12,%xmm6 > > + movdqa %xmm11,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm1,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm14 > > + paddd %xmm1,%xmm1 > > + > > + psrld $2,%xmm11 > > + paddd %xmm8,%xmm14 > > + por %xmm5,%xmm1 > > + por %xmm7,%xmm11 > > + pxor %xmm4,%xmm2 > > + movdqa 128-128(%rax),%xmm4 > > + > > + movdqa %xmm14,%xmm8 > > + movdqa %xmm12,%xmm6 > > + pxor 224-128(%rax),%xmm2 > > + paddd %xmm15,%xmm13 > > + pslld $5,%xmm8 > > + pxor %xmm10,%xmm6 > > + > > + movdqa %xmm14,%xmm9 > > + movdqa %xmm1,80-128(%rax) > > + paddd %xmm1,%xmm13 > > + pxor %xmm4,%xmm2 > > + psrld $27,%xmm9 > > + pxor %xmm11,%xmm6 > > + movdqa %xmm10,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm2,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm13 > > + paddd %xmm2,%xmm2 > > + > > + psrld $2,%xmm10 > > + paddd %xmm8,%xmm13 > > + por %xmm5,%xmm2 > > + por %xmm7,%xmm10 > > + pxor %xmm0,%xmm3 > > + movdqa 144-128(%rax),%xmm0 > > + > > + movdqa %xmm13,%xmm8 > > + movdqa %xmm11,%xmm6 > > + pxor 240-128(%rax),%xmm3 > > + paddd %xmm15,%xmm12 > > + pslld $5,%xmm8 > > + pxor %xmm14,%xmm6 > > + > > + 
movdqa %xmm13,%xmm9 > > + movdqa %xmm2,96-128(%rax) > > + paddd %xmm2,%xmm12 > > + pxor %xmm0,%xmm3 > > + psrld $27,%xmm9 > > + pxor %xmm10,%xmm6 > > + movdqa %xmm14,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm3,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm12 > > + paddd %xmm3,%xmm3 > > + > > + psrld $2,%xmm14 > > + paddd %xmm8,%xmm12 > > + por %xmm5,%xmm3 > > + por %xmm7,%xmm14 > > + pxor %xmm1,%xmm4 > > + movdqa 160-128(%rax),%xmm1 > > + > > + movdqa %xmm12,%xmm8 > > + movdqa %xmm10,%xmm6 > > + pxor 0-128(%rax),%xmm4 > > + paddd %xmm15,%xmm11 > > + pslld $5,%xmm8 > > + pxor %xmm13,%xmm6 > > + > > + movdqa %xmm12,%xmm9 > > + movdqa %xmm3,112-128(%rax) > > + paddd %xmm3,%xmm11 > > + pxor %xmm1,%xmm4 > > + psrld $27,%xmm9 > > + pxor %xmm14,%xmm6 > > + movdqa %xmm13,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm4,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm11 > > + paddd %xmm4,%xmm4 > > + > > + psrld $2,%xmm13 > > + paddd %xmm8,%xmm11 > > + por %xmm5,%xmm4 > > + por %xmm7,%xmm13 > > + pxor %xmm2,%xmm0 > > + movdqa 176-128(%rax),%xmm2 > > + > > + movdqa %xmm11,%xmm8 > > + movdqa %xmm14,%xmm6 > > + pxor 16-128(%rax),%xmm0 > > + paddd %xmm15,%xmm10 > > + pslld $5,%xmm8 > > + pxor %xmm12,%xmm6 > > + > > + movdqa %xmm11,%xmm9 > > + movdqa %xmm4,128-128(%rax) > > + paddd %xmm4,%xmm10 > > + pxor %xmm2,%xmm0 > > + psrld $27,%xmm9 > > + pxor %xmm13,%xmm6 > > + movdqa %xmm12,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm0,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm10 > > + paddd %xmm0,%xmm0 > > + > > + psrld $2,%xmm12 > > + paddd %xmm8,%xmm10 > > + por %xmm5,%xmm0 > > + por %xmm7,%xmm12 > > + pxor %xmm3,%xmm1 > > + movdqa 192-128(%rax),%xmm3 > > + > > + movdqa %xmm10,%xmm8 > > + movdqa %xmm13,%xmm6 > > + pxor 32-128(%rax),%xmm1 > > + paddd %xmm15,%xmm14 > > + pslld $5,%xmm8 > > + pxor %xmm11,%xmm6 > > + > > + movdqa %xmm10,%xmm9 > > + movdqa %xmm0,144-128(%rax) > > + paddd %xmm0,%xmm14 > > + pxor %xmm3,%xmm1 > > + psrld $27,%xmm9 > > + pxor %xmm12,%xmm6 > > + movdqa %xmm11,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm1,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm14 > > + paddd %xmm1,%xmm1 > > + > > + psrld $2,%xmm11 > > + paddd %xmm8,%xmm14 > > + por %xmm5,%xmm1 > > + por %xmm7,%xmm11 > > + pxor %xmm4,%xmm2 > > + movdqa 208-128(%rax),%xmm4 > > + > > + movdqa %xmm14,%xmm8 > > + movdqa %xmm12,%xmm6 > > + pxor 48-128(%rax),%xmm2 > > + paddd %xmm15,%xmm13 > > + pslld $5,%xmm8 > > + pxor %xmm10,%xmm6 > > + > > + movdqa %xmm14,%xmm9 > > + movdqa %xmm1,160-128(%rax) > > + paddd %xmm1,%xmm13 > > + pxor %xmm4,%xmm2 > > + psrld $27,%xmm9 > > + pxor %xmm11,%xmm6 > > + movdqa %xmm10,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm2,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm13 > > + paddd %xmm2,%xmm2 > > + > > + psrld $2,%xmm10 > > + paddd %xmm8,%xmm13 > > + por %xmm5,%xmm2 > > + por %xmm7,%xmm10 > > + pxor %xmm0,%xmm3 > > + movdqa 224-128(%rax),%xmm0 > > + > > + movdqa %xmm13,%xmm8 > > + movdqa %xmm11,%xmm6 > > + pxor 64-128(%rax),%xmm3 > > + paddd %xmm15,%xmm12 > > + pslld $5,%xmm8 > > + pxor %xmm14,%xmm6 > > + > > + movdqa %xmm13,%xmm9 > > + movdqa %xmm2,176-128(%rax) > > + paddd %xmm2,%xmm12 > > + pxor %xmm0,%xmm3 > > + psrld $27,%xmm9 > > + pxor %xmm10,%xmm6 > > + movdqa %xmm14,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm3,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm12 > > + paddd %xmm3,%xmm3 > > + > > + psrld 
$2,%xmm14 > > + paddd %xmm8,%xmm12 > > + por %xmm5,%xmm3 > > + por %xmm7,%xmm14 > > + pxor %xmm1,%xmm4 > > + movdqa 240-128(%rax),%xmm1 > > + > > + movdqa %xmm12,%xmm8 > > + movdqa %xmm10,%xmm6 > > + pxor 80-128(%rax),%xmm4 > > + paddd %xmm15,%xmm11 > > + pslld $5,%xmm8 > > + pxor %xmm13,%xmm6 > > + > > + movdqa %xmm12,%xmm9 > > + movdqa %xmm3,192-128(%rax) > > + paddd %xmm3,%xmm11 > > + pxor %xmm1,%xmm4 > > + psrld $27,%xmm9 > > + pxor %xmm14,%xmm6 > > + movdqa %xmm13,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm4,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm11 > > + paddd %xmm4,%xmm4 > > + > > + psrld $2,%xmm13 > > + paddd %xmm8,%xmm11 > > + por %xmm5,%xmm4 > > + por %xmm7,%xmm13 > > + pxor %xmm2,%xmm0 > > + movdqa 0-128(%rax),%xmm2 > > + > > + movdqa %xmm11,%xmm8 > > + movdqa %xmm14,%xmm6 > > + pxor 96-128(%rax),%xmm0 > > + paddd %xmm15,%xmm10 > > + pslld $5,%xmm8 > > + pxor %xmm12,%xmm6 > > + > > + movdqa %xmm11,%xmm9 > > + movdqa %xmm4,208-128(%rax) > > + paddd %xmm4,%xmm10 > > + pxor %xmm2,%xmm0 > > + psrld $27,%xmm9 > > + pxor %xmm13,%xmm6 > > + movdqa %xmm12,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm0,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm10 > > + paddd %xmm0,%xmm0 > > + > > + psrld $2,%xmm12 > > + paddd %xmm8,%xmm10 > > + por %xmm5,%xmm0 > > + por %xmm7,%xmm12 > > + pxor %xmm3,%xmm1 > > + movdqa 16-128(%rax),%xmm3 > > + > > + movdqa %xmm10,%xmm8 > > + movdqa %xmm13,%xmm6 > > + pxor 112-128(%rax),%xmm1 > > + paddd %xmm15,%xmm14 > > + pslld $5,%xmm8 > > + pxor %xmm11,%xmm6 > > + > > + movdqa %xmm10,%xmm9 > > + movdqa %xmm0,224-128(%rax) > > + paddd %xmm0,%xmm14 > > + pxor %xmm3,%xmm1 > > + psrld $27,%xmm9 > > + pxor %xmm12,%xmm6 > > + movdqa %xmm11,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm1,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm14 > > + paddd %xmm1,%xmm1 > > + > > + psrld $2,%xmm11 > > + paddd %xmm8,%xmm14 > > + por %xmm5,%xmm1 > > + por %xmm7,%xmm11 > > + pxor %xmm4,%xmm2 > > + movdqa 32-128(%rax),%xmm4 > > + > > + movdqa %xmm14,%xmm8 > > + movdqa %xmm12,%xmm6 > > + pxor 128-128(%rax),%xmm2 > > + paddd %xmm15,%xmm13 > > + pslld $5,%xmm8 > > + pxor %xmm10,%xmm6 > > + > > + movdqa %xmm14,%xmm9 > > + movdqa %xmm1,240-128(%rax) > > + paddd %xmm1,%xmm13 > > + pxor %xmm4,%xmm2 > > + psrld $27,%xmm9 > > + pxor %xmm11,%xmm6 > > + movdqa %xmm10,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm2,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm13 > > + paddd %xmm2,%xmm2 > > + > > + psrld $2,%xmm10 > > + paddd %xmm8,%xmm13 > > + por %xmm5,%xmm2 > > + por %xmm7,%xmm10 > > + pxor %xmm0,%xmm3 > > + movdqa 48-128(%rax),%xmm0 > > + > > + movdqa %xmm13,%xmm8 > > + movdqa %xmm11,%xmm6 > > + pxor 144-128(%rax),%xmm3 > > + paddd %xmm15,%xmm12 > > + pslld $5,%xmm8 > > + pxor %xmm14,%xmm6 > > + > > + movdqa %xmm13,%xmm9 > > + movdqa %xmm2,0-128(%rax) > > + paddd %xmm2,%xmm12 > > + pxor %xmm0,%xmm3 > > + psrld $27,%xmm9 > > + pxor %xmm10,%xmm6 > > + movdqa %xmm14,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm3,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm12 > > + paddd %xmm3,%xmm3 > > + > > + psrld $2,%xmm14 > > + paddd %xmm8,%xmm12 > > + por %xmm5,%xmm3 > > + por %xmm7,%xmm14 > > + pxor %xmm1,%xmm4 > > + movdqa 64-128(%rax),%xmm1 > > + > > + movdqa %xmm12,%xmm8 > > + movdqa %xmm10,%xmm6 > > + pxor 160-128(%rax),%xmm4 > > + paddd %xmm15,%xmm11 > > + pslld $5,%xmm8 > > + pxor %xmm13,%xmm6 > > + > > + movdqa %xmm12,%xmm9 > > + 
movdqa %xmm3,16-128(%rax) > > + paddd %xmm3,%xmm11 > > + pxor %xmm1,%xmm4 > > + psrld $27,%xmm9 > > + pxor %xmm14,%xmm6 > > + movdqa %xmm13,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm4,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm11 > > + paddd %xmm4,%xmm4 > > + > > + psrld $2,%xmm13 > > + paddd %xmm8,%xmm11 > > + por %xmm5,%xmm4 > > + por %xmm7,%xmm13 > > + pxor %xmm2,%xmm0 > > + movdqa 80-128(%rax),%xmm2 > > + > > + movdqa %xmm11,%xmm8 > > + movdqa %xmm14,%xmm6 > > + pxor 176-128(%rax),%xmm0 > > + paddd %xmm15,%xmm10 > > + pslld $5,%xmm8 > > + pxor %xmm12,%xmm6 > > + > > + movdqa %xmm11,%xmm9 > > + movdqa %xmm4,32-128(%rax) > > + paddd %xmm4,%xmm10 > > + pxor %xmm2,%xmm0 > > + psrld $27,%xmm9 > > + pxor %xmm13,%xmm6 > > + movdqa %xmm12,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm0,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm10 > > + paddd %xmm0,%xmm0 > > + > > + psrld $2,%xmm12 > > + paddd %xmm8,%xmm10 > > + por %xmm5,%xmm0 > > + por %xmm7,%xmm12 > > + pxor %xmm3,%xmm1 > > + movdqa 96-128(%rax),%xmm3 > > + > > + movdqa %xmm10,%xmm8 > > + movdqa %xmm13,%xmm6 > > + pxor 192-128(%rax),%xmm1 > > + paddd %xmm15,%xmm14 > > + pslld $5,%xmm8 > > + pxor %xmm11,%xmm6 > > + > > + movdqa %xmm10,%xmm9 > > + movdqa %xmm0,48-128(%rax) > > + paddd %xmm0,%xmm14 > > + pxor %xmm3,%xmm1 > > + psrld $27,%xmm9 > > + pxor %xmm12,%xmm6 > > + movdqa %xmm11,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm1,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm14 > > + paddd %xmm1,%xmm1 > > + > > + psrld $2,%xmm11 > > + paddd %xmm8,%xmm14 > > + por %xmm5,%xmm1 > > + por %xmm7,%xmm11 > > + pxor %xmm4,%xmm2 > > + movdqa 112-128(%rax),%xmm4 > > + > > + movdqa %xmm14,%xmm8 > > + movdqa %xmm12,%xmm6 > > + pxor 208-128(%rax),%xmm2 > > + paddd %xmm15,%xmm13 > > + pslld $5,%xmm8 > > + pxor %xmm10,%xmm6 > > + > > + movdqa %xmm14,%xmm9 > > + movdqa %xmm1,64-128(%rax) > > + paddd %xmm1,%xmm13 > > + pxor %xmm4,%xmm2 > > + psrld $27,%xmm9 > > + pxor %xmm11,%xmm6 > > + movdqa %xmm10,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm2,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm13 > > + paddd %xmm2,%xmm2 > > + > > + psrld $2,%xmm10 > > + paddd %xmm8,%xmm13 > > + por %xmm5,%xmm2 > > + por %xmm7,%xmm10 > > + pxor %xmm0,%xmm3 > > + movdqa 128-128(%rax),%xmm0 > > + > > + movdqa %xmm13,%xmm8 > > + movdqa %xmm11,%xmm6 > > + pxor 224-128(%rax),%xmm3 > > + paddd %xmm15,%xmm12 > > + pslld $5,%xmm8 > > + pxor %xmm14,%xmm6 > > + > > + movdqa %xmm13,%xmm9 > > + movdqa %xmm2,80-128(%rax) > > + paddd %xmm2,%xmm12 > > + pxor %xmm0,%xmm3 > > + psrld $27,%xmm9 > > + pxor %xmm10,%xmm6 > > + movdqa %xmm14,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm3,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm12 > > + paddd %xmm3,%xmm3 > > + > > + psrld $2,%xmm14 > > + paddd %xmm8,%xmm12 > > + por %xmm5,%xmm3 > > + por %xmm7,%xmm14 > > + pxor %xmm1,%xmm4 > > + movdqa 144-128(%rax),%xmm1 > > + > > + movdqa %xmm12,%xmm8 > > + movdqa %xmm10,%xmm6 > > + pxor 240-128(%rax),%xmm4 > > + paddd %xmm15,%xmm11 > > + pslld $5,%xmm8 > > + pxor %xmm13,%xmm6 > > + > > + movdqa %xmm12,%xmm9 > > + movdqa %xmm3,96-128(%rax) > > + paddd %xmm3,%xmm11 > > + pxor %xmm1,%xmm4 > > + psrld $27,%xmm9 > > + pxor %xmm14,%xmm6 > > + movdqa %xmm13,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm4,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm11 > > + paddd %xmm4,%xmm4 > > + > > + psrld $2,%xmm13 > > + paddd 
%xmm8,%xmm11 > > + por %xmm5,%xmm4 > > + por %xmm7,%xmm13 > > + pxor %xmm2,%xmm0 > > + movdqa 160-128(%rax),%xmm2 > > + > > + movdqa %xmm11,%xmm8 > > + movdqa %xmm14,%xmm6 > > + pxor 0-128(%rax),%xmm0 > > + paddd %xmm15,%xmm10 > > + pslld $5,%xmm8 > > + pxor %xmm12,%xmm6 > > + > > + movdqa %xmm11,%xmm9 > > + movdqa %xmm4,112-128(%rax) > > + paddd %xmm4,%xmm10 > > + pxor %xmm2,%xmm0 > > + psrld $27,%xmm9 > > + pxor %xmm13,%xmm6 > > + movdqa %xmm12,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm0,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm10 > > + paddd %xmm0,%xmm0 > > + > > + psrld $2,%xmm12 > > + paddd %xmm8,%xmm10 > > + por %xmm5,%xmm0 > > + por %xmm7,%xmm12 > > + movdqa 32(%rbp),%xmm15 > > + pxor %xmm3,%xmm1 > > + movdqa 176-128(%rax),%xmm3 > > + > > + movdqa %xmm10,%xmm8 > > + movdqa %xmm13,%xmm7 > > + pxor 16-128(%rax),%xmm1 > > + pxor %xmm3,%xmm1 > > + paddd %xmm15,%xmm14 > > + pslld $5,%xmm8 > > + movdqa %xmm10,%xmm9 > > + pand %xmm12,%xmm7 > > + > > + movdqa %xmm13,%xmm6 > > + movdqa %xmm1,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm14 > > + pxor %xmm12,%xmm6 > > + > > + movdqa %xmm0,128-128(%rax) > > + paddd %xmm0,%xmm14 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand %xmm11,%xmm6 > > + movdqa %xmm11,%xmm7 > > + > > + pslld $30,%xmm7 > > + paddd %xmm1,%xmm1 > > + paddd %xmm6,%xmm14 > > + > > + psrld $2,%xmm11 > > + paddd %xmm8,%xmm14 > > + por %xmm5,%xmm1 > > + por %xmm7,%xmm11 > > + pxor %xmm4,%xmm2 > > + movdqa 192-128(%rax),%xmm4 > > + > > + movdqa %xmm14,%xmm8 > > + movdqa %xmm12,%xmm7 > > + pxor 32-128(%rax),%xmm2 > > + pxor %xmm4,%xmm2 > > + paddd %xmm15,%xmm13 > > + pslld $5,%xmm8 > > + movdqa %xmm14,%xmm9 > > + pand %xmm11,%xmm7 > > + > > + movdqa %xmm12,%xmm6 > > + movdqa %xmm2,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm13 > > + pxor %xmm11,%xmm6 > > + > > + movdqa %xmm1,144-128(%rax) > > + paddd %xmm1,%xmm13 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand %xmm10,%xmm6 > > + movdqa %xmm10,%xmm7 > > + > > + pslld $30,%xmm7 > > + paddd %xmm2,%xmm2 > > + paddd %xmm6,%xmm13 > > + > > + psrld $2,%xmm10 > > + paddd %xmm8,%xmm13 > > + por %xmm5,%xmm2 > > + por %xmm7,%xmm10 > > + pxor %xmm0,%xmm3 > > + movdqa 208-128(%rax),%xmm0 > > + > > + movdqa %xmm13,%xmm8 > > + movdqa %xmm11,%xmm7 > > + pxor 48-128(%rax),%xmm3 > > + pxor %xmm0,%xmm3 > > + paddd %xmm15,%xmm12 > > + pslld $5,%xmm8 > > + movdqa %xmm13,%xmm9 > > + pand %xmm10,%xmm7 > > + > > + movdqa %xmm11,%xmm6 > > + movdqa %xmm3,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm12 > > + pxor %xmm10,%xmm6 > > + > > + movdqa %xmm2,160-128(%rax) > > + paddd %xmm2,%xmm12 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand %xmm14,%xmm6 > > + movdqa %xmm14,%xmm7 > > + > > + pslld $30,%xmm7 > > + paddd %xmm3,%xmm3 > > + paddd %xmm6,%xmm12 > > + > > + psrld $2,%xmm14 > > + paddd %xmm8,%xmm12 > > + por %xmm5,%xmm3 > > + por %xmm7,%xmm14 > > + pxor %xmm1,%xmm4 > > + movdqa 224-128(%rax),%xmm1 > > + > > + movdqa %xmm12,%xmm8 > > + movdqa %xmm10,%xmm7 > > + pxor 64-128(%rax),%xmm4 > > + pxor %xmm1,%xmm4 > > + paddd %xmm15,%xmm11 > > + pslld $5,%xmm8 > > + movdqa %xmm12,%xmm9 > > + pand %xmm14,%xmm7 > > + > > + movdqa %xmm10,%xmm6 > > + movdqa %xmm4,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm11 > > + pxor %xmm14,%xmm6 > > + > > + movdqa %xmm3,176-128(%rax) > > + paddd %xmm3,%xmm11 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand %xmm13,%xmm6 > > + movdqa %xmm13,%xmm7 > > + > > + pslld $30,%xmm7 > > + paddd %xmm4,%xmm4 > > + paddd %xmm6,%xmm11 > > + > > + psrld 
$2,%xmm13 > > + paddd %xmm8,%xmm11 > > + por %xmm5,%xmm4 > > + por %xmm7,%xmm13 > > + pxor %xmm2,%xmm0 > > + movdqa 240-128(%rax),%xmm2 > > + > > + movdqa %xmm11,%xmm8 > > + movdqa %xmm14,%xmm7 > > + pxor 80-128(%rax),%xmm0 > > + pxor %xmm2,%xmm0 > > + paddd %xmm15,%xmm10 > > + pslld $5,%xmm8 > > + movdqa %xmm11,%xmm9 > > + pand %xmm13,%xmm7 > > + > > + movdqa %xmm14,%xmm6 > > + movdqa %xmm0,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm10 > > + pxor %xmm13,%xmm6 > > + > > + movdqa %xmm4,192-128(%rax) > > + paddd %xmm4,%xmm10 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand %xmm12,%xmm6 > > + movdqa %xmm12,%xmm7 > > + > > + pslld $30,%xmm7 > > + paddd %xmm0,%xmm0 > > + paddd %xmm6,%xmm10 > > + > > + psrld $2,%xmm12 > > + paddd %xmm8,%xmm10 > > + por %xmm5,%xmm0 > > + por %xmm7,%xmm12 > > + pxor %xmm3,%xmm1 > > + movdqa 0-128(%rax),%xmm3 > > + > > + movdqa %xmm10,%xmm8 > > + movdqa %xmm13,%xmm7 > > + pxor 96-128(%rax),%xmm1 > > + pxor %xmm3,%xmm1 > > + paddd %xmm15,%xmm14 > > + pslld $5,%xmm8 > > + movdqa %xmm10,%xmm9 > > + pand %xmm12,%xmm7 > > + > > + movdqa %xmm13,%xmm6 > > + movdqa %xmm1,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm14 > > + pxor %xmm12,%xmm6 > > + > > + movdqa %xmm0,208-128(%rax) > > + paddd %xmm0,%xmm14 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand %xmm11,%xmm6 > > + movdqa %xmm11,%xmm7 > > + > > + pslld $30,%xmm7 > > + paddd %xmm1,%xmm1 > > + paddd %xmm6,%xmm14 > > + > > + psrld $2,%xmm11 > > + paddd %xmm8,%xmm14 > > + por %xmm5,%xmm1 > > + por %xmm7,%xmm11 > > + pxor %xmm4,%xmm2 > > + movdqa 16-128(%rax),%xmm4 > > + > > + movdqa %xmm14,%xmm8 > > + movdqa %xmm12,%xmm7 > > + pxor 112-128(%rax),%xmm2 > > + pxor %xmm4,%xmm2 > > + paddd %xmm15,%xmm13 > > + pslld $5,%xmm8 > > + movdqa %xmm14,%xmm9 > > + pand %xmm11,%xmm7 > > + > > + movdqa %xmm12,%xmm6 > > + movdqa %xmm2,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm13 > > + pxor %xmm11,%xmm6 > > + > > + movdqa %xmm1,224-128(%rax) > > + paddd %xmm1,%xmm13 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand %xmm10,%xmm6 > > + movdqa %xmm10,%xmm7 > > + > > + pslld $30,%xmm7 > > + paddd %xmm2,%xmm2 > > + paddd %xmm6,%xmm13 > > + > > + psrld $2,%xmm10 > > + paddd %xmm8,%xmm13 > > + por %xmm5,%xmm2 > > + por %xmm7,%xmm10 > > + pxor %xmm0,%xmm3 > > + movdqa 32-128(%rax),%xmm0 > > + > > + movdqa %xmm13,%xmm8 > > + movdqa %xmm11,%xmm7 > > + pxor 128-128(%rax),%xmm3 > > + pxor %xmm0,%xmm3 > > + paddd %xmm15,%xmm12 > > + pslld $5,%xmm8 > > + movdqa %xmm13,%xmm9 > > + pand %xmm10,%xmm7 > > + > > + movdqa %xmm11,%xmm6 > > + movdqa %xmm3,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm12 > > + pxor %xmm10,%xmm6 > > + > > + movdqa %xmm2,240-128(%rax) > > + paddd %xmm2,%xmm12 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand %xmm14,%xmm6 > > + movdqa %xmm14,%xmm7 > > + > > + pslld $30,%xmm7 > > + paddd %xmm3,%xmm3 > > + paddd %xmm6,%xmm12 > > + > > + psrld $2,%xmm14 > > + paddd %xmm8,%xmm12 > > + por %xmm5,%xmm3 > > + por %xmm7,%xmm14 > > + pxor %xmm1,%xmm4 > > + movdqa 48-128(%rax),%xmm1 > > + > > + movdqa %xmm12,%xmm8 > > + movdqa %xmm10,%xmm7 > > + pxor 144-128(%rax),%xmm4 > > + pxor %xmm1,%xmm4 > > + paddd %xmm15,%xmm11 > > + pslld $5,%xmm8 > > + movdqa %xmm12,%xmm9 > > + pand %xmm14,%xmm7 > > + > > + movdqa %xmm10,%xmm6 > > + movdqa %xmm4,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm11 > > + pxor %xmm14,%xmm6 > > + > > + movdqa %xmm3,0-128(%rax) > > + paddd %xmm3,%xmm11 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand %xmm13,%xmm6 > > + movdqa %xmm13,%xmm7 > > + > > + pslld $30,%xmm7 
> > + paddd %xmm4,%xmm4 > > + paddd %xmm6,%xmm11 > > + > > + psrld $2,%xmm13 > > + paddd %xmm8,%xmm11 > > + por %xmm5,%xmm4 > > + por %xmm7,%xmm13 > > + pxor %xmm2,%xmm0 > > + movdqa 64-128(%rax),%xmm2 > > + > > + movdqa %xmm11,%xmm8 > > + movdqa %xmm14,%xmm7 > > + pxor 160-128(%rax),%xmm0 > > + pxor %xmm2,%xmm0 > > + paddd %xmm15,%xmm10 > > + pslld $5,%xmm8 > > + movdqa %xmm11,%xmm9 > > + pand %xmm13,%xmm7 > > + > > + movdqa %xmm14,%xmm6 > > + movdqa %xmm0,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm10 > > + pxor %xmm13,%xmm6 > > + > > + movdqa %xmm4,16-128(%rax) > > + paddd %xmm4,%xmm10 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand %xmm12,%xmm6 > > + movdqa %xmm12,%xmm7 > > + > > + pslld $30,%xmm7 > > + paddd %xmm0,%xmm0 > > + paddd %xmm6,%xmm10 > > + > > + psrld $2,%xmm12 > > + paddd %xmm8,%xmm10 > > + por %xmm5,%xmm0 > > + por %xmm7,%xmm12 > > + pxor %xmm3,%xmm1 > > + movdqa 80-128(%rax),%xmm3 > > + > > + movdqa %xmm10,%xmm8 > > + movdqa %xmm13,%xmm7 > > + pxor 176-128(%rax),%xmm1 > > + pxor %xmm3,%xmm1 > > + paddd %xmm15,%xmm14 > > + pslld $5,%xmm8 > > + movdqa %xmm10,%xmm9 > > + pand %xmm12,%xmm7 > > + > > + movdqa %xmm13,%xmm6 > > + movdqa %xmm1,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm14 > > + pxor %xmm12,%xmm6 > > + > > + movdqa %xmm0,32-128(%rax) > > + paddd %xmm0,%xmm14 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand %xmm11,%xmm6 > > + movdqa %xmm11,%xmm7 > > + > > + pslld $30,%xmm7 > > + paddd %xmm1,%xmm1 > > + paddd %xmm6,%xmm14 > > + > > + psrld $2,%xmm11 > > + paddd %xmm8,%xmm14 > > + por %xmm5,%xmm1 > > + por %xmm7,%xmm11 > > + pxor %xmm4,%xmm2 > > + movdqa 96-128(%rax),%xmm4 > > + > > + movdqa %xmm14,%xmm8 > > + movdqa %xmm12,%xmm7 > > + pxor 192-128(%rax),%xmm2 > > + pxor %xmm4,%xmm2 > > + paddd %xmm15,%xmm13 > > + pslld $5,%xmm8 > > + movdqa %xmm14,%xmm9 > > + pand %xmm11,%xmm7 > > + > > + movdqa %xmm12,%xmm6 > > + movdqa %xmm2,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm13 > > + pxor %xmm11,%xmm6 > > + > > + movdqa %xmm1,48-128(%rax) > > + paddd %xmm1,%xmm13 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand %xmm10,%xmm6 > > + movdqa %xmm10,%xmm7 > > + > > + pslld $30,%xmm7 > > + paddd %xmm2,%xmm2 > > + paddd %xmm6,%xmm13 > > + > > + psrld $2,%xmm10 > > + paddd %xmm8,%xmm13 > > + por %xmm5,%xmm2 > > + por %xmm7,%xmm10 > > + pxor %xmm0,%xmm3 > > + movdqa 112-128(%rax),%xmm0 > > + > > + movdqa %xmm13,%xmm8 > > + movdqa %xmm11,%xmm7 > > + pxor 208-128(%rax),%xmm3 > > + pxor %xmm0,%xmm3 > > + paddd %xmm15,%xmm12 > > + pslld $5,%xmm8 > > + movdqa %xmm13,%xmm9 > > + pand %xmm10,%xmm7 > > + > > + movdqa %xmm11,%xmm6 > > + movdqa %xmm3,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm12 > > + pxor %xmm10,%xmm6 > > + > > + movdqa %xmm2,64-128(%rax) > > + paddd %xmm2,%xmm12 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand %xmm14,%xmm6 > > + movdqa %xmm14,%xmm7 > > + > > + pslld $30,%xmm7 > > + paddd %xmm3,%xmm3 > > + paddd %xmm6,%xmm12 > > + > > + psrld $2,%xmm14 > > + paddd %xmm8,%xmm12 > > + por %xmm5,%xmm3 > > + por %xmm7,%xmm14 > > + pxor %xmm1,%xmm4 > > + movdqa 128-128(%rax),%xmm1 > > + > > + movdqa %xmm12,%xmm8 > > + movdqa %xmm10,%xmm7 > > + pxor 224-128(%rax),%xmm4 > > + pxor %xmm1,%xmm4 > > + paddd %xmm15,%xmm11 > > + pslld $5,%xmm8 > > + movdqa %xmm12,%xmm9 > > + pand %xmm14,%xmm7 > > + > > + movdqa %xmm10,%xmm6 > > + movdqa %xmm4,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm11 > > + pxor %xmm14,%xmm6 > > + > > + movdqa %xmm3,80-128(%rax) > > + paddd %xmm3,%xmm11 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand 
%xmm13,%xmm6 > > + movdqa %xmm13,%xmm7 > > + > > + pslld $30,%xmm7 > > + paddd %xmm4,%xmm4 > > + paddd %xmm6,%xmm11 > > + > > + psrld $2,%xmm13 > > + paddd %xmm8,%xmm11 > > + por %xmm5,%xmm4 > > + por %xmm7,%xmm13 > > + pxor %xmm2,%xmm0 > > + movdqa 144-128(%rax),%xmm2 > > + > > + movdqa %xmm11,%xmm8 > > + movdqa %xmm14,%xmm7 > > + pxor 240-128(%rax),%xmm0 > > + pxor %xmm2,%xmm0 > > + paddd %xmm15,%xmm10 > > + pslld $5,%xmm8 > > + movdqa %xmm11,%xmm9 > > + pand %xmm13,%xmm7 > > + > > + movdqa %xmm14,%xmm6 > > + movdqa %xmm0,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm10 > > + pxor %xmm13,%xmm6 > > + > > + movdqa %xmm4,96-128(%rax) > > + paddd %xmm4,%xmm10 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand %xmm12,%xmm6 > > + movdqa %xmm12,%xmm7 > > + > > + pslld $30,%xmm7 > > + paddd %xmm0,%xmm0 > > + paddd %xmm6,%xmm10 > > + > > + psrld $2,%xmm12 > > + paddd %xmm8,%xmm10 > > + por %xmm5,%xmm0 > > + por %xmm7,%xmm12 > > + pxor %xmm3,%xmm1 > > + movdqa 160-128(%rax),%xmm3 > > + > > + movdqa %xmm10,%xmm8 > > + movdqa %xmm13,%xmm7 > > + pxor 0-128(%rax),%xmm1 > > + pxor %xmm3,%xmm1 > > + paddd %xmm15,%xmm14 > > + pslld $5,%xmm8 > > + movdqa %xmm10,%xmm9 > > + pand %xmm12,%xmm7 > > + > > + movdqa %xmm13,%xmm6 > > + movdqa %xmm1,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm14 > > + pxor %xmm12,%xmm6 > > + > > + movdqa %xmm0,112-128(%rax) > > + paddd %xmm0,%xmm14 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand %xmm11,%xmm6 > > + movdqa %xmm11,%xmm7 > > + > > + pslld $30,%xmm7 > > + paddd %xmm1,%xmm1 > > + paddd %xmm6,%xmm14 > > + > > + psrld $2,%xmm11 > > + paddd %xmm8,%xmm14 > > + por %xmm5,%xmm1 > > + por %xmm7,%xmm11 > > + pxor %xmm4,%xmm2 > > + movdqa 176-128(%rax),%xmm4 > > + > > + movdqa %xmm14,%xmm8 > > + movdqa %xmm12,%xmm7 > > + pxor 16-128(%rax),%xmm2 > > + pxor %xmm4,%xmm2 > > + paddd %xmm15,%xmm13 > > + pslld $5,%xmm8 > > + movdqa %xmm14,%xmm9 > > + pand %xmm11,%xmm7 > > + > > + movdqa %xmm12,%xmm6 > > + movdqa %xmm2,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm13 > > + pxor %xmm11,%xmm6 > > + > > + movdqa %xmm1,128-128(%rax) > > + paddd %xmm1,%xmm13 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand %xmm10,%xmm6 > > + movdqa %xmm10,%xmm7 > > + > > + pslld $30,%xmm7 > > + paddd %xmm2,%xmm2 > > + paddd %xmm6,%xmm13 > > + > > + psrld $2,%xmm10 > > + paddd %xmm8,%xmm13 > > + por %xmm5,%xmm2 > > + por %xmm7,%xmm10 > > + pxor %xmm0,%xmm3 > > + movdqa 192-128(%rax),%xmm0 > > + > > + movdqa %xmm13,%xmm8 > > + movdqa %xmm11,%xmm7 > > + pxor 32-128(%rax),%xmm3 > > + pxor %xmm0,%xmm3 > > + paddd %xmm15,%xmm12 > > + pslld $5,%xmm8 > > + movdqa %xmm13,%xmm9 > > + pand %xmm10,%xmm7 > > + > > + movdqa %xmm11,%xmm6 > > + movdqa %xmm3,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm12 > > + pxor %xmm10,%xmm6 > > + > > + movdqa %xmm2,144-128(%rax) > > + paddd %xmm2,%xmm12 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand %xmm14,%xmm6 > > + movdqa %xmm14,%xmm7 > > + > > + pslld $30,%xmm7 > > + paddd %xmm3,%xmm3 > > + paddd %xmm6,%xmm12 > > + > > + psrld $2,%xmm14 > > + paddd %xmm8,%xmm12 > > + por %xmm5,%xmm3 > > + por %xmm7,%xmm14 > > + pxor %xmm1,%xmm4 > > + movdqa 208-128(%rax),%xmm1 > > + > > + movdqa %xmm12,%xmm8 > > + movdqa %xmm10,%xmm7 > > + pxor 48-128(%rax),%xmm4 > > + pxor %xmm1,%xmm4 > > + paddd %xmm15,%xmm11 > > + pslld $5,%xmm8 > > + movdqa %xmm12,%xmm9 > > + pand %xmm14,%xmm7 > > + > > + movdqa %xmm10,%xmm6 > > + movdqa %xmm4,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm11 > > + pxor %xmm14,%xmm6 > > + > > + movdqa %xmm3,160-128(%rax) > > + 
paddd %xmm3,%xmm11 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand %xmm13,%xmm6 > > + movdqa %xmm13,%xmm7 > > + > > + pslld $30,%xmm7 > > + paddd %xmm4,%xmm4 > > + paddd %xmm6,%xmm11 > > + > > + psrld $2,%xmm13 > > + paddd %xmm8,%xmm11 > > + por %xmm5,%xmm4 > > + por %xmm7,%xmm13 > > + pxor %xmm2,%xmm0 > > + movdqa 224-128(%rax),%xmm2 > > + > > + movdqa %xmm11,%xmm8 > > + movdqa %xmm14,%xmm7 > > + pxor 64-128(%rax),%xmm0 > > + pxor %xmm2,%xmm0 > > + paddd %xmm15,%xmm10 > > + pslld $5,%xmm8 > > + movdqa %xmm11,%xmm9 > > + pand %xmm13,%xmm7 > > + > > + movdqa %xmm14,%xmm6 > > + movdqa %xmm0,%xmm5 > > + psrld $27,%xmm9 > > + paddd %xmm7,%xmm10 > > + pxor %xmm13,%xmm6 > > + > > + movdqa %xmm4,176-128(%rax) > > + paddd %xmm4,%xmm10 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + pand %xmm12,%xmm6 > > + movdqa %xmm12,%xmm7 > > + > > + pslld $30,%xmm7 > > + paddd %xmm0,%xmm0 > > + paddd %xmm6,%xmm10 > > + > > + psrld $2,%xmm12 > > + paddd %xmm8,%xmm10 > > + por %xmm5,%xmm0 > > + por %xmm7,%xmm12 > > + movdqa 64(%rbp),%xmm15 > > + pxor %xmm3,%xmm1 > > + movdqa 240-128(%rax),%xmm3 > > + > > + movdqa %xmm10,%xmm8 > > + movdqa %xmm13,%xmm6 > > + pxor 80-128(%rax),%xmm1 > > + paddd %xmm15,%xmm14 > > + pslld $5,%xmm8 > > + pxor %xmm11,%xmm6 > > + > > + movdqa %xmm10,%xmm9 > > + movdqa %xmm0,192-128(%rax) > > + paddd %xmm0,%xmm14 > > + pxor %xmm3,%xmm1 > > + psrld $27,%xmm9 > > + pxor %xmm12,%xmm6 > > + movdqa %xmm11,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm1,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm14 > > + paddd %xmm1,%xmm1 > > + > > + psrld $2,%xmm11 > > + paddd %xmm8,%xmm14 > > + por %xmm5,%xmm1 > > + por %xmm7,%xmm11 > > + pxor %xmm4,%xmm2 > > + movdqa 0-128(%rax),%xmm4 > > + > > + movdqa %xmm14,%xmm8 > > + movdqa %xmm12,%xmm6 > > + pxor 96-128(%rax),%xmm2 > > + paddd %xmm15,%xmm13 > > + pslld $5,%xmm8 > > + pxor %xmm10,%xmm6 > > + > > + movdqa %xmm14,%xmm9 > > + movdqa %xmm1,208-128(%rax) > > + paddd %xmm1,%xmm13 > > + pxor %xmm4,%xmm2 > > + psrld $27,%xmm9 > > + pxor %xmm11,%xmm6 > > + movdqa %xmm10,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm2,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm13 > > + paddd %xmm2,%xmm2 > > + > > + psrld $2,%xmm10 > > + paddd %xmm8,%xmm13 > > + por %xmm5,%xmm2 > > + por %xmm7,%xmm10 > > + pxor %xmm0,%xmm3 > > + movdqa 16-128(%rax),%xmm0 > > + > > + movdqa %xmm13,%xmm8 > > + movdqa %xmm11,%xmm6 > > + pxor 112-128(%rax),%xmm3 > > + paddd %xmm15,%xmm12 > > + pslld $5,%xmm8 > > + pxor %xmm14,%xmm6 > > + > > + movdqa %xmm13,%xmm9 > > + movdqa %xmm2,224-128(%rax) > > + paddd %xmm2,%xmm12 > > + pxor %xmm0,%xmm3 > > + psrld $27,%xmm9 > > + pxor %xmm10,%xmm6 > > + movdqa %xmm14,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm3,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm12 > > + paddd %xmm3,%xmm3 > > + > > + psrld $2,%xmm14 > > + paddd %xmm8,%xmm12 > > + por %xmm5,%xmm3 > > + por %xmm7,%xmm14 > > + pxor %xmm1,%xmm4 > > + movdqa 32-128(%rax),%xmm1 > > + > > + movdqa %xmm12,%xmm8 > > + movdqa %xmm10,%xmm6 > > + pxor 128-128(%rax),%xmm4 > > + paddd %xmm15,%xmm11 > > + pslld $5,%xmm8 > > + pxor %xmm13,%xmm6 > > + > > + movdqa %xmm12,%xmm9 > > + movdqa %xmm3,240-128(%rax) > > + paddd %xmm3,%xmm11 > > + pxor %xmm1,%xmm4 > > + psrld $27,%xmm9 > > + pxor %xmm14,%xmm6 > > + movdqa %xmm13,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm4,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm11 > > + paddd %xmm4,%xmm4 > > + > > + psrld $2,%xmm13 > > + 
paddd %xmm8,%xmm11 > > + por %xmm5,%xmm4 > > + por %xmm7,%xmm13 > > + pxor %xmm2,%xmm0 > > + movdqa 48-128(%rax),%xmm2 > > + > > + movdqa %xmm11,%xmm8 > > + movdqa %xmm14,%xmm6 > > + pxor 144-128(%rax),%xmm0 > > + paddd %xmm15,%xmm10 > > + pslld $5,%xmm8 > > + pxor %xmm12,%xmm6 > > + > > + movdqa %xmm11,%xmm9 > > + movdqa %xmm4,0-128(%rax) > > + paddd %xmm4,%xmm10 > > + pxor %xmm2,%xmm0 > > + psrld $27,%xmm9 > > + pxor %xmm13,%xmm6 > > + movdqa %xmm12,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm0,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm10 > > + paddd %xmm0,%xmm0 > > + > > + psrld $2,%xmm12 > > + paddd %xmm8,%xmm10 > > + por %xmm5,%xmm0 > > + por %xmm7,%xmm12 > > + pxor %xmm3,%xmm1 > > + movdqa 64-128(%rax),%xmm3 > > + > > + movdqa %xmm10,%xmm8 > > + movdqa %xmm13,%xmm6 > > + pxor 160-128(%rax),%xmm1 > > + paddd %xmm15,%xmm14 > > + pslld $5,%xmm8 > > + pxor %xmm11,%xmm6 > > + > > + movdqa %xmm10,%xmm9 > > + movdqa %xmm0,16-128(%rax) > > + paddd %xmm0,%xmm14 > > + pxor %xmm3,%xmm1 > > + psrld $27,%xmm9 > > + pxor %xmm12,%xmm6 > > + movdqa %xmm11,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm1,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm14 > > + paddd %xmm1,%xmm1 > > + > > + psrld $2,%xmm11 > > + paddd %xmm8,%xmm14 > > + por %xmm5,%xmm1 > > + por %xmm7,%xmm11 > > + pxor %xmm4,%xmm2 > > + movdqa 80-128(%rax),%xmm4 > > + > > + movdqa %xmm14,%xmm8 > > + movdqa %xmm12,%xmm6 > > + pxor 176-128(%rax),%xmm2 > > + paddd %xmm15,%xmm13 > > + pslld $5,%xmm8 > > + pxor %xmm10,%xmm6 > > + > > + movdqa %xmm14,%xmm9 > > + movdqa %xmm1,32-128(%rax) > > + paddd %xmm1,%xmm13 > > + pxor %xmm4,%xmm2 > > + psrld $27,%xmm9 > > + pxor %xmm11,%xmm6 > > + movdqa %xmm10,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm2,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm13 > > + paddd %xmm2,%xmm2 > > + > > + psrld $2,%xmm10 > > + paddd %xmm8,%xmm13 > > + por %xmm5,%xmm2 > > + por %xmm7,%xmm10 > > + pxor %xmm0,%xmm3 > > + movdqa 96-128(%rax),%xmm0 > > + > > + movdqa %xmm13,%xmm8 > > + movdqa %xmm11,%xmm6 > > + pxor 192-128(%rax),%xmm3 > > + paddd %xmm15,%xmm12 > > + pslld $5,%xmm8 > > + pxor %xmm14,%xmm6 > > + > > + movdqa %xmm13,%xmm9 > > + movdqa %xmm2,48-128(%rax) > > + paddd %xmm2,%xmm12 > > + pxor %xmm0,%xmm3 > > + psrld $27,%xmm9 > > + pxor %xmm10,%xmm6 > > + movdqa %xmm14,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm3,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm12 > > + paddd %xmm3,%xmm3 > > + > > + psrld $2,%xmm14 > > + paddd %xmm8,%xmm12 > > + por %xmm5,%xmm3 > > + por %xmm7,%xmm14 > > + pxor %xmm1,%xmm4 > > + movdqa 112-128(%rax),%xmm1 > > + > > + movdqa %xmm12,%xmm8 > > + movdqa %xmm10,%xmm6 > > + pxor 208-128(%rax),%xmm4 > > + paddd %xmm15,%xmm11 > > + pslld $5,%xmm8 > > + pxor %xmm13,%xmm6 > > + > > + movdqa %xmm12,%xmm9 > > + movdqa %xmm3,64-128(%rax) > > + paddd %xmm3,%xmm11 > > + pxor %xmm1,%xmm4 > > + psrld $27,%xmm9 > > + pxor %xmm14,%xmm6 > > + movdqa %xmm13,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm4,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm11 > > + paddd %xmm4,%xmm4 > > + > > + psrld $2,%xmm13 > > + paddd %xmm8,%xmm11 > > + por %xmm5,%xmm4 > > + por %xmm7,%xmm13 > > + pxor %xmm2,%xmm0 > > + movdqa 128-128(%rax),%xmm2 > > + > > + movdqa %xmm11,%xmm8 > > + movdqa %xmm14,%xmm6 > > + pxor 224-128(%rax),%xmm0 > > + paddd %xmm15,%xmm10 > > + pslld $5,%xmm8 > > + pxor %xmm12,%xmm6 > > + > > + movdqa %xmm11,%xmm9 > > + movdqa 
%xmm4,80-128(%rax) > > + paddd %xmm4,%xmm10 > > + pxor %xmm2,%xmm0 > > + psrld $27,%xmm9 > > + pxor %xmm13,%xmm6 > > + movdqa %xmm12,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm0,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm10 > > + paddd %xmm0,%xmm0 > > + > > + psrld $2,%xmm12 > > + paddd %xmm8,%xmm10 > > + por %xmm5,%xmm0 > > + por %xmm7,%xmm12 > > + pxor %xmm3,%xmm1 > > + movdqa 144-128(%rax),%xmm3 > > + > > + movdqa %xmm10,%xmm8 > > + movdqa %xmm13,%xmm6 > > + pxor 240-128(%rax),%xmm1 > > + paddd %xmm15,%xmm14 > > + pslld $5,%xmm8 > > + pxor %xmm11,%xmm6 > > + > > + movdqa %xmm10,%xmm9 > > + movdqa %xmm0,96-128(%rax) > > + paddd %xmm0,%xmm14 > > + pxor %xmm3,%xmm1 > > + psrld $27,%xmm9 > > + pxor %xmm12,%xmm6 > > + movdqa %xmm11,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm1,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm14 > > + paddd %xmm1,%xmm1 > > + > > + psrld $2,%xmm11 > > + paddd %xmm8,%xmm14 > > + por %xmm5,%xmm1 > > + por %xmm7,%xmm11 > > + pxor %xmm4,%xmm2 > > + movdqa 160-128(%rax),%xmm4 > > + > > + movdqa %xmm14,%xmm8 > > + movdqa %xmm12,%xmm6 > > + pxor 0-128(%rax),%xmm2 > > + paddd %xmm15,%xmm13 > > + pslld $5,%xmm8 > > + pxor %xmm10,%xmm6 > > + > > + movdqa %xmm14,%xmm9 > > + movdqa %xmm1,112-128(%rax) > > + paddd %xmm1,%xmm13 > > + pxor %xmm4,%xmm2 > > + psrld $27,%xmm9 > > + pxor %xmm11,%xmm6 > > + movdqa %xmm10,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm2,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm13 > > + paddd %xmm2,%xmm2 > > + > > + psrld $2,%xmm10 > > + paddd %xmm8,%xmm13 > > + por %xmm5,%xmm2 > > + por %xmm7,%xmm10 > > + pxor %xmm0,%xmm3 > > + movdqa 176-128(%rax),%xmm0 > > + > > + movdqa %xmm13,%xmm8 > > + movdqa %xmm11,%xmm6 > > + pxor 16-128(%rax),%xmm3 > > + paddd %xmm15,%xmm12 > > + pslld $5,%xmm8 > > + pxor %xmm14,%xmm6 > > + > > + movdqa %xmm13,%xmm9 > > + paddd %xmm2,%xmm12 > > + pxor %xmm0,%xmm3 > > + psrld $27,%xmm9 > > + pxor %xmm10,%xmm6 > > + movdqa %xmm14,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm3,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm12 > > + paddd %xmm3,%xmm3 > > + > > + psrld $2,%xmm14 > > + paddd %xmm8,%xmm12 > > + por %xmm5,%xmm3 > > + por %xmm7,%xmm14 > > + pxor %xmm1,%xmm4 > > + movdqa 192-128(%rax),%xmm1 > > + > > + movdqa %xmm12,%xmm8 > > + movdqa %xmm10,%xmm6 > > + pxor 32-128(%rax),%xmm4 > > + paddd %xmm15,%xmm11 > > + pslld $5,%xmm8 > > + pxor %xmm13,%xmm6 > > + > > + movdqa %xmm12,%xmm9 > > + paddd %xmm3,%xmm11 > > + pxor %xmm1,%xmm4 > > + psrld $27,%xmm9 > > + pxor %xmm14,%xmm6 > > + movdqa %xmm13,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm4,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm11 > > + paddd %xmm4,%xmm4 > > + > > + psrld $2,%xmm13 > > + paddd %xmm8,%xmm11 > > + por %xmm5,%xmm4 > > + por %xmm7,%xmm13 > > + pxor %xmm2,%xmm0 > > + movdqa 208-128(%rax),%xmm2 > > + > > + movdqa %xmm11,%xmm8 > > + movdqa %xmm14,%xmm6 > > + pxor 48-128(%rax),%xmm0 > > + paddd %xmm15,%xmm10 > > + pslld $5,%xmm8 > > + pxor %xmm12,%xmm6 > > + > > + movdqa %xmm11,%xmm9 > > + paddd %xmm4,%xmm10 > > + pxor %xmm2,%xmm0 > > + psrld $27,%xmm9 > > + pxor %xmm13,%xmm6 > > + movdqa %xmm12,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm0,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm10 > > + paddd %xmm0,%xmm0 > > + > > + psrld $2,%xmm12 > > + paddd %xmm8,%xmm10 > > + por %xmm5,%xmm0 > > + por %xmm7,%xmm12 > > + pxor %xmm3,%xmm1 > > + movdqa 
224-128(%rax),%xmm3 > > + > > + movdqa %xmm10,%xmm8 > > + movdqa %xmm13,%xmm6 > > + pxor 64-128(%rax),%xmm1 > > + paddd %xmm15,%xmm14 > > + pslld $5,%xmm8 > > + pxor %xmm11,%xmm6 > > + > > + movdqa %xmm10,%xmm9 > > + paddd %xmm0,%xmm14 > > + pxor %xmm3,%xmm1 > > + psrld $27,%xmm9 > > + pxor %xmm12,%xmm6 > > + movdqa %xmm11,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm1,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm14 > > + paddd %xmm1,%xmm1 > > + > > + psrld $2,%xmm11 > > + paddd %xmm8,%xmm14 > > + por %xmm5,%xmm1 > > + por %xmm7,%xmm11 > > + pxor %xmm4,%xmm2 > > + movdqa 240-128(%rax),%xmm4 > > + > > + movdqa %xmm14,%xmm8 > > + movdqa %xmm12,%xmm6 > > + pxor 80-128(%rax),%xmm2 > > + paddd %xmm15,%xmm13 > > + pslld $5,%xmm8 > > + pxor %xmm10,%xmm6 > > + > > + movdqa %xmm14,%xmm9 > > + paddd %xmm1,%xmm13 > > + pxor %xmm4,%xmm2 > > + psrld $27,%xmm9 > > + pxor %xmm11,%xmm6 > > + movdqa %xmm10,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm2,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm13 > > + paddd %xmm2,%xmm2 > > + > > + psrld $2,%xmm10 > > + paddd %xmm8,%xmm13 > > + por %xmm5,%xmm2 > > + por %xmm7,%xmm10 > > + pxor %xmm0,%xmm3 > > + movdqa 0-128(%rax),%xmm0 > > + > > + movdqa %xmm13,%xmm8 > > + movdqa %xmm11,%xmm6 > > + pxor 96-128(%rax),%xmm3 > > + paddd %xmm15,%xmm12 > > + pslld $5,%xmm8 > > + pxor %xmm14,%xmm6 > > + > > + movdqa %xmm13,%xmm9 > > + paddd %xmm2,%xmm12 > > + pxor %xmm0,%xmm3 > > + psrld $27,%xmm9 > > + pxor %xmm10,%xmm6 > > + movdqa %xmm14,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm3,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm12 > > + paddd %xmm3,%xmm3 > > + > > + psrld $2,%xmm14 > > + paddd %xmm8,%xmm12 > > + por %xmm5,%xmm3 > > + por %xmm7,%xmm14 > > + pxor %xmm1,%xmm4 > > + movdqa 16-128(%rax),%xmm1 > > + > > + movdqa %xmm12,%xmm8 > > + movdqa %xmm10,%xmm6 > > + pxor 112-128(%rax),%xmm4 > > + paddd %xmm15,%xmm11 > > + pslld $5,%xmm8 > > + pxor %xmm13,%xmm6 > > + > > + movdqa %xmm12,%xmm9 > > + paddd %xmm3,%xmm11 > > + pxor %xmm1,%xmm4 > > + psrld $27,%xmm9 > > + pxor %xmm14,%xmm6 > > + movdqa %xmm13,%xmm7 > > + > > + pslld $30,%xmm7 > > + movdqa %xmm4,%xmm5 > > + por %xmm9,%xmm8 > > + psrld $31,%xmm5 > > + paddd %xmm6,%xmm11 > > + paddd %xmm4,%xmm4 > > + > > + psrld $2,%xmm13 > > + paddd %xmm8,%xmm11 > > + por %xmm5,%xmm4 > > + por %xmm7,%xmm13 > > + movdqa %xmm11,%xmm8 > > + paddd %xmm15,%xmm10 > > + movdqa %xmm14,%xmm6 > > + pslld $5,%xmm8 > > + pxor %xmm12,%xmm6 > > + > > + movdqa %xmm11,%xmm9 > > + paddd %xmm4,%xmm10 > > + psrld $27,%xmm9 > > + movdqa %xmm12,%xmm7 > > + pxor %xmm13,%xmm6 > > + > > + pslld $30,%xmm7 > > + por %xmm9,%xmm8 > > + paddd %xmm6,%xmm10 > > + > > + psrld $2,%xmm12 > > + paddd %xmm8,%xmm10 > > + por %xmm7,%xmm12 > > + movdqa (%rbx),%xmm0 > > + movl $1,%ecx > > + cmpl 0(%rbx),%ecx > > + pxor %xmm8,%xmm8 > > + cmovgeq %rbp,%r8 > > + cmpl 4(%rbx),%ecx > > + movdqa %xmm0,%xmm1 > > + cmovgeq %rbp,%r9 > > + cmpl 8(%rbx),%ecx > > + pcmpgtd %xmm8,%xmm1 > > + cmovgeq %rbp,%r10 > > + cmpl 12(%rbx),%ecx > > + paddd %xmm1,%xmm0 > > + cmovgeq %rbp,%r11 > > + > > + movdqu 0(%rdi),%xmm6 > > + pand %xmm1,%xmm10 > > + movdqu 32(%rdi),%xmm7 > > + pand %xmm1,%xmm11 > > + paddd %xmm6,%xmm10 > > + movdqu 64(%rdi),%xmm8 > > + pand %xmm1,%xmm12 > > + paddd %xmm7,%xmm11 > > + movdqu 96(%rdi),%xmm9 > > + pand %xmm1,%xmm13 > > + paddd %xmm8,%xmm12 > > + movdqu 128(%rdi),%xmm5 > > + pand %xmm1,%xmm14 > > + movdqu %xmm10,0(%rdi) > > + paddd %xmm9,%xmm13 > > + movdqu 
%xmm11,32(%rdi) > > + paddd %xmm5,%xmm14 > > + movdqu %xmm12,64(%rdi) > > + movdqu %xmm13,96(%rdi) > > + movdqu %xmm14,128(%rdi) > > + > > + movdqa %xmm0,(%rbx) > > + movdqa 96(%rbp),%xmm5 > > + movdqa -32(%rbp),%xmm15 > > + decl %edx > > + jnz .Loop > > + > > + movl 280(%rsp),%edx > > + leaq 16(%rdi),%rdi > > + leaq 64(%rsi),%rsi > > + decl %edx > > + jnz .Loop_grande > > + > > +.Ldone: > > + movq 272(%rsp),%rax > > +.cfi_def_cfa %rax,8 > > + movq -16(%rax),%rbp > > +.cfi_restore %rbp > > + movq -8(%rax),%rbx > > +.cfi_restore %rbx > > + leaq (%rax),%rsp > > +.cfi_def_cfa_register %rsp > > +.Lepilogue: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size sha1_multi_block,.-sha1_multi_block > > +.type sha1_multi_block_shaext,@function > > +.align 32 > > +sha1_multi_block_shaext: > > +.cfi_startproc > > +_shaext_shortcut: > > + movq %rsp,%rax > > +.cfi_def_cfa_register %rax > > + pushq %rbx > > +.cfi_offset %rbx,-16 > > + pushq %rbp > > +.cfi_offset %rbp,-24 > > + subq $288,%rsp > > + shll $1,%edx > > + andq $-256,%rsp > > + leaq 64(%rdi),%rdi > > + movq %rax,272(%rsp) > > +.Lbody_shaext: > > + leaq 256(%rsp),%rbx > > + movdqa K_XX_XX+128(%rip),%xmm3 > > + > > +.Loop_grande_shaext: > > + movl %edx,280(%rsp) > > + xorl %edx,%edx > > + movq 0(%rsi),%r8 > > + movl 8(%rsi),%ecx > > + cmpl %edx,%ecx > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movl %ecx,0(%rbx) > > + cmovleq %rsp,%r8 > > + movq 16(%rsi),%r9 > > + movl 24(%rsi),%ecx > > + cmpl %edx,%ecx > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movl %ecx,4(%rbx) > > + cmovleq %rsp,%r9 > > + testl %edx,%edx > > + jz .Ldone_shaext > > + > > + movq 0-64(%rdi),%xmm0 > > + movq 32-64(%rdi),%xmm4 > > + movq 64-64(%rdi),%xmm5 > > + movq 96-64(%rdi),%xmm6 > > + movq 128-64(%rdi),%xmm7 > > + > > + punpckldq %xmm4,%xmm0 > > + punpckldq %xmm6,%xmm5 > > + > > + movdqa %xmm0,%xmm8 > > + punpcklqdq %xmm5,%xmm0 > > + punpckhqdq %xmm5,%xmm8 > > + > > + pshufd $63,%xmm7,%xmm1 > > + pshufd $127,%xmm7,%xmm9 > > + pshufd $27,%xmm0,%xmm0 > > + pshufd $27,%xmm8,%xmm8 > > + jmp .Loop_shaext > > + > > +.align 32 > > +.Loop_shaext: > > + movdqu 0(%r8),%xmm4 > > + movdqu 0(%r9),%xmm11 > > + movdqu 16(%r8),%xmm5 > > + movdqu 16(%r9),%xmm12 > > + movdqu 32(%r8),%xmm6 > > +.byte 102,15,56,0,227 > > + movdqu 32(%r9),%xmm13 > > +.byte 102,68,15,56,0,219 > > + movdqu 48(%r8),%xmm7 > > + leaq 64(%r8),%r8 > > +.byte 102,15,56,0,235 > > + movdqu 48(%r9),%xmm14 > > + leaq 64(%r9),%r9 > > +.byte 102,68,15,56,0,227 > > + > > + movdqa %xmm1,80(%rsp) > > + paddd %xmm4,%xmm1 > > + movdqa %xmm9,112(%rsp) > > + paddd %xmm11,%xmm9 > > + movdqa %xmm0,64(%rsp) > > + movdqa %xmm0,%xmm2 > > + movdqa %xmm8,96(%rsp) > > + movdqa %xmm8,%xmm10 > > +.byte 15,58,204,193,0 > > +.byte 15,56,200,213 > > +.byte 69,15,58,204,193,0 > > +.byte 69,15,56,200,212 > > +.byte 102,15,56,0,243 > > + prefetcht0 127(%r8) > > +.byte 15,56,201,229 > > +.byte 102,68,15,56,0,235 > > + prefetcht0 127(%r9) > > +.byte 69,15,56,201,220 > > + > > +.byte 102,15,56,0,251 > > + movdqa %xmm0,%xmm1 > > +.byte 102,68,15,56,0,243 > > + movdqa %xmm8,%xmm9 > > +.byte 15,58,204,194,0 > > +.byte 15,56,200,206 > > +.byte 69,15,58,204,194,0 > > +.byte 69,15,56,200,205 > > + pxor %xmm6,%xmm4 > > +.byte 15,56,201,238 > > + pxor %xmm13,%xmm11 > > +.byte 69,15,56,201,229 > > + movdqa %xmm0,%xmm2 > > + movdqa %xmm8,%xmm10 > > +.byte 15,58,204,193,0 > > +.byte 15,56,200,215 > > +.byte 69,15,58,204,193,0 > > +.byte 69,15,56,200,214 > > +.byte 15,56,202,231 > > +.byte 69,15,56,202,222 > > + pxor %xmm7,%xmm5 > > +.byte 15,56,201,247 
> > + pxor %xmm14,%xmm12 > > +.byte 69,15,56,201,238 > > + movdqa %xmm0,%xmm1 > > + movdqa %xmm8,%xmm9 > > +.byte 15,58,204,194,0 > > +.byte 15,56,200,204 > > +.byte 69,15,58,204,194,0 > > +.byte 69,15,56,200,203 > > +.byte 15,56,202,236 > > +.byte 69,15,56,202,227 > > + pxor %xmm4,%xmm6 > > +.byte 15,56,201,252 > > + pxor %xmm11,%xmm13 > > +.byte 69,15,56,201,243 > > + movdqa %xmm0,%xmm2 > > + movdqa %xmm8,%xmm10 > > +.byte 15,58,204,193,0 > > +.byte 15,56,200,213 > > +.byte 69,15,58,204,193,0 > > +.byte 69,15,56,200,212 > > +.byte 15,56,202,245 > > +.byte 69,15,56,202,236 > > + pxor %xmm5,%xmm7 > > +.byte 15,56,201,229 > > + pxor %xmm12,%xmm14 > > +.byte 69,15,56,201,220 > > + movdqa %xmm0,%xmm1 > > + movdqa %xmm8,%xmm9 > > +.byte 15,58,204,194,1 > > +.byte 15,56,200,206 > > +.byte 69,15,58,204,194,1 > > +.byte 69,15,56,200,205 > > +.byte 15,56,202,254 > > +.byte 69,15,56,202,245 > > + pxor %xmm6,%xmm4 > > +.byte 15,56,201,238 > > + pxor %xmm13,%xmm11 > > +.byte 69,15,56,201,229 > > + movdqa %xmm0,%xmm2 > > + movdqa %xmm8,%xmm10 > > +.byte 15,58,204,193,1 > > +.byte 15,56,200,215 > > +.byte 69,15,58,204,193,1 > > +.byte 69,15,56,200,214 > > +.byte 15,56,202,231 > > +.byte 69,15,56,202,222 > > + pxor %xmm7,%xmm5 > > +.byte 15,56,201,247 > > + pxor %xmm14,%xmm12 > > +.byte 69,15,56,201,238 > > + movdqa %xmm0,%xmm1 > > + movdqa %xmm8,%xmm9 > > +.byte 15,58,204,194,1 > > +.byte 15,56,200,204 > > +.byte 69,15,58,204,194,1 > > +.byte 69,15,56,200,203 > > +.byte 15,56,202,236 > > +.byte 69,15,56,202,227 > > + pxor %xmm4,%xmm6 > > +.byte 15,56,201,252 > > + pxor %xmm11,%xmm13 > > +.byte 69,15,56,201,243 > > + movdqa %xmm0,%xmm2 > > + movdqa %xmm8,%xmm10 > > +.byte 15,58,204,193,1 > > +.byte 15,56,200,213 > > +.byte 69,15,58,204,193,1 > > +.byte 69,15,56,200,212 > > +.byte 15,56,202,245 > > +.byte 69,15,56,202,236 > > + pxor %xmm5,%xmm7 > > +.byte 15,56,201,229 > > + pxor %xmm12,%xmm14 > > +.byte 69,15,56,201,220 > > + movdqa %xmm0,%xmm1 > > + movdqa %xmm8,%xmm9 > > +.byte 15,58,204,194,1 > > +.byte 15,56,200,206 > > +.byte 69,15,58,204,194,1 > > +.byte 69,15,56,200,205 > > +.byte 15,56,202,254 > > +.byte 69,15,56,202,245 > > + pxor %xmm6,%xmm4 > > +.byte 15,56,201,238 > > + pxor %xmm13,%xmm11 > > +.byte 69,15,56,201,229 > > + movdqa %xmm0,%xmm2 > > + movdqa %xmm8,%xmm10 > > +.byte 15,58,204,193,2 > > +.byte 15,56,200,215 > > +.byte 69,15,58,204,193,2 > > +.byte 69,15,56,200,214 > > +.byte 15,56,202,231 > > +.byte 69,15,56,202,222 > > + pxor %xmm7,%xmm5 > > +.byte 15,56,201,247 > > + pxor %xmm14,%xmm12 > > +.byte 69,15,56,201,238 > > + movdqa %xmm0,%xmm1 > > + movdqa %xmm8,%xmm9 > > +.byte 15,58,204,194,2 > > +.byte 15,56,200,204 > > +.byte 69,15,58,204,194,2 > > +.byte 69,15,56,200,203 > > +.byte 15,56,202,236 > > +.byte 69,15,56,202,227 > > + pxor %xmm4,%xmm6 > > +.byte 15,56,201,252 > > + pxor %xmm11,%xmm13 > > +.byte 69,15,56,201,243 > > + movdqa %xmm0,%xmm2 > > + movdqa %xmm8,%xmm10 > > +.byte 15,58,204,193,2 > > +.byte 15,56,200,213 > > +.byte 69,15,58,204,193,2 > > +.byte 69,15,56,200,212 > > +.byte 15,56,202,245 > > +.byte 69,15,56,202,236 > > + pxor %xmm5,%xmm7 > > +.byte 15,56,201,229 > > + pxor %xmm12,%xmm14 > > +.byte 69,15,56,201,220 > > + movdqa %xmm0,%xmm1 > > + movdqa %xmm8,%xmm9 > > +.byte 15,58,204,194,2 > > +.byte 15,56,200,206 > > +.byte 69,15,58,204,194,2 > > +.byte 69,15,56,200,205 > > +.byte 15,56,202,254 > > +.byte 69,15,56,202,245 > > + pxor %xmm6,%xmm4 > > +.byte 15,56,201,238 > > + pxor %xmm13,%xmm11 > > +.byte 69,15,56,201,229 > > + movdqa %xmm0,%xmm2 > > + movdqa 
%xmm8,%xmm10 > > +.byte 15,58,204,193,2 > > +.byte 15,56,200,215 > > +.byte 69,15,58,204,193,2 > > +.byte 69,15,56,200,214 > > +.byte 15,56,202,231 > > +.byte 69,15,56,202,222 > > + pxor %xmm7,%xmm5 > > +.byte 15,56,201,247 > > + pxor %xmm14,%xmm12 > > +.byte 69,15,56,201,238 > > + movdqa %xmm0,%xmm1 > > + movdqa %xmm8,%xmm9 > > +.byte 15,58,204,194,3 > > +.byte 15,56,200,204 > > +.byte 69,15,58,204,194,3 > > +.byte 69,15,56,200,203 > > +.byte 15,56,202,236 > > +.byte 69,15,56,202,227 > > + pxor %xmm4,%xmm6 > > +.byte 15,56,201,252 > > + pxor %xmm11,%xmm13 > > +.byte 69,15,56,201,243 > > + movdqa %xmm0,%xmm2 > > + movdqa %xmm8,%xmm10 > > +.byte 15,58,204,193,3 > > +.byte 15,56,200,213 > > +.byte 69,15,58,204,193,3 > > +.byte 69,15,56,200,212 > > +.byte 15,56,202,245 > > +.byte 69,15,56,202,236 > > + pxor %xmm5,%xmm7 > > + pxor %xmm12,%xmm14 > > + > > + movl $1,%ecx > > + pxor %xmm4,%xmm4 > > + cmpl 0(%rbx),%ecx > > + cmovgeq %rsp,%r8 > > + > > + movdqa %xmm0,%xmm1 > > + movdqa %xmm8,%xmm9 > > +.byte 15,58,204,194,3 > > +.byte 15,56,200,206 > > +.byte 69,15,58,204,194,3 > > +.byte 69,15,56,200,205 > > +.byte 15,56,202,254 > > +.byte 69,15,56,202,245 > > + > > + cmpl 4(%rbx),%ecx > > + cmovgeq %rsp,%r9 > > + movq (%rbx),%xmm6 > > + > > + movdqa %xmm0,%xmm2 > > + movdqa %xmm8,%xmm10 > > +.byte 15,58,204,193,3 > > +.byte 15,56,200,215 > > +.byte 69,15,58,204,193,3 > > +.byte 69,15,56,200,214 > > + > > + pshufd $0x00,%xmm6,%xmm11 > > + pshufd $0x55,%xmm6,%xmm12 > > + movdqa %xmm6,%xmm7 > > + pcmpgtd %xmm4,%xmm11 > > + pcmpgtd %xmm4,%xmm12 > > + > > + movdqa %xmm0,%xmm1 > > + movdqa %xmm8,%xmm9 > > +.byte 15,58,204,194,3 > > +.byte 15,56,200,204 > > +.byte 69,15,58,204,194,3 > > +.byte 68,15,56,200,204 > > + > > + pcmpgtd %xmm4,%xmm7 > > + pand %xmm11,%xmm0 > > + pand %xmm11,%xmm1 > > + pand %xmm12,%xmm8 > > + pand %xmm12,%xmm9 > > + paddd %xmm7,%xmm6 > > + > > + paddd 64(%rsp),%xmm0 > > + paddd 80(%rsp),%xmm1 > > + paddd 96(%rsp),%xmm8 > > + paddd 112(%rsp),%xmm9 > > + > > + movq %xmm6,(%rbx) > > + decl %edx > > + jnz .Loop_shaext > > + > > + movl 280(%rsp),%edx > > + > > + pshufd $27,%xmm0,%xmm0 > > + pshufd $27,%xmm8,%xmm8 > > + > > + movdqa %xmm0,%xmm6 > > + punpckldq %xmm8,%xmm0 > > + punpckhdq %xmm8,%xmm6 > > + punpckhdq %xmm9,%xmm1 > > + movq %xmm0,0-64(%rdi) > > + psrldq $8,%xmm0 > > + movq %xmm6,64-64(%rdi) > > + psrldq $8,%xmm6 > > + movq %xmm0,32-64(%rdi) > > + psrldq $8,%xmm1 > > + movq %xmm6,96-64(%rdi) > > + movq %xmm1,128-64(%rdi) > > + > > + leaq 8(%rdi),%rdi > > + leaq 32(%rsi),%rsi > > + decl %edx > > + jnz .Loop_grande_shaext > > + > > +.Ldone_shaext: > > + > > + movq -16(%rax),%rbp > > +.cfi_restore %rbp > > + movq -8(%rax),%rbx > > +.cfi_restore %rbx > > + leaq (%rax),%rsp > > +.cfi_def_cfa_register %rsp > > +.Lepilogue_shaext: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size sha1_multi_block_shaext,.-sha1_multi_block_shaext > > + > > +.align 256 > > +.long 0x5a827999,0x5a827999,0x5a827999,0x5a827999 > > +.long 0x5a827999,0x5a827999,0x5a827999,0x5a827999 > > +K_XX_XX: > > +.long 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > > +.long 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > > +.long 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > > +.long 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > > +.long 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > > +.long 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > > +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > +.byte 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0 
> > +.byte > > > 83,72,65,49,32,109,117,108,116,105,45,98,108,111,99,107,32,116,114,97,110, > > > 115,102,111,114,109,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80 > , > > > 84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,11 > > 5,115,108,46,111,114,103,62,0 > > diff --git > a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-x86_64.S > > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-x86_64.S > > new file mode 100644 > > index 0000000000..0b59726ae4 > > --- /dev/null > > +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha1-x86_64.S > > @@ -0,0 +1,2631 @@ > > +# WARNING: do not edit! > > +# Generated from openssl/crypto/sha/asm/sha1-x86_64.pl > > +# > > +# Copyright 2006-2020 The OpenSSL Project Authors. All Rights Reserved. > > +# > > +# Licensed under the OpenSSL license (the "License"). You may not use > > +# this file except in compliance with the License. You can obtain a copy > > +# in the file LICENSE in the source distribution or at > > +# https://www.openssl.org/source/license.html > > + > > +.text > > + > > + > > +.globl sha1_block_data_order > > +.type sha1_block_data_order,@function > > +.align 16 > > +sha1_block_data_order: > > +.cfi_startproc > > + movl OPENSSL_ia32cap_P+0(%rip),%r9d > > + movl OPENSSL_ia32cap_P+4(%rip),%r8d > > + movl OPENSSL_ia32cap_P+8(%rip),%r10d > > + testl $512,%r8d > > + jz .Lialu > > + testl $536870912,%r10d > > + jnz _shaext_shortcut > > + jmp _ssse3_shortcut > > + > > +.align 16 > > +.Lialu: > > + movq %rsp,%rax > > +.cfi_def_cfa_register %rax > > + pushq %rbx > > +.cfi_offset %rbx,-16 > > + pushq %rbp > > +.cfi_offset %rbp,-24 > > + pushq %r12 > > +.cfi_offset %r12,-32 > > + pushq %r13 > > +.cfi_offset %r13,-40 > > + pushq %r14 > > +.cfi_offset %r14,-48 > > + movq %rdi,%r8 > > + subq $72,%rsp > > + movq %rsi,%r9 > > + andq $-64,%rsp > > + movq %rdx,%r10 > > + movq %rax,64(%rsp) > > +.cfi_escape 0x0f,0x06,0x77,0xc0,0x00,0x06,0x23,0x08 > > +.Lprologue: > > + > > + movl 0(%r8),%esi > > + movl 4(%r8),%edi > > + movl 8(%r8),%r11d > > + movl 12(%r8),%r12d > > + movl 16(%r8),%r13d > > + jmp .Lloop > > + > > +.align 16 > > +.Lloop: > > + movl 0(%r9),%edx > > + bswapl %edx > > + movl 4(%r9),%ebp > > + movl %r12d,%eax > > + movl %edx,0(%rsp) > > + movl %esi,%ecx > > + bswapl %ebp > > + xorl %r11d,%eax > > + roll $5,%ecx > > + andl %edi,%eax > > + leal 1518500249(%rdx,%r13,1),%r13d > > + addl %ecx,%r13d > > + xorl %r12d,%eax > > + roll $30,%edi > > + addl %eax,%r13d > > + movl 8(%r9),%r14d > > + movl %r11d,%eax > > + movl %ebp,4(%rsp) > > + movl %r13d,%ecx > > + bswapl %r14d > > + xorl %edi,%eax > > + roll $5,%ecx > > + andl %esi,%eax > > + leal 1518500249(%rbp,%r12,1),%r12d > > + addl %ecx,%r12d > > + xorl %r11d,%eax > > + roll $30,%esi > > + addl %eax,%r12d > > + movl 12(%r9),%edx > > + movl %edi,%eax > > + movl %r14d,8(%rsp) > > + movl %r12d,%ecx > > + bswapl %edx > > + xorl %esi,%eax > > + roll $5,%ecx > > + andl %r13d,%eax > > + leal 1518500249(%r14,%r11,1),%r11d > > + addl %ecx,%r11d > > + xorl %edi,%eax > > + roll $30,%r13d > > + addl %eax,%r11d > > + movl 16(%r9),%ebp > > + movl %esi,%eax > > + movl %edx,12(%rsp) > > + movl %r11d,%ecx > > + bswapl %ebp > > + xorl %r13d,%eax > > + roll $5,%ecx > > + andl %r12d,%eax > > + leal 1518500249(%rdx,%rdi,1),%edi > > + addl %ecx,%edi > > + xorl %esi,%eax > > + roll $30,%r12d > > + addl %eax,%edi > > + movl 20(%r9),%r14d > > + movl %r13d,%eax > > + movl %ebp,16(%rsp) > > + movl %edi,%ecx > > + bswapl %r14d > > + xorl %r12d,%eax > > + roll $5,%ecx > > + andl 
%r11d,%eax > > + leal 1518500249(%rbp,%rsi,1),%esi > > + addl %ecx,%esi > > + xorl %r13d,%eax > > + roll $30,%r11d > > + addl %eax,%esi > > + movl 24(%r9),%edx > > + movl %r12d,%eax > > + movl %r14d,20(%rsp) > > + movl %esi,%ecx > > + bswapl %edx > > + xorl %r11d,%eax > > + roll $5,%ecx > > + andl %edi,%eax > > + leal 1518500249(%r14,%r13,1),%r13d > > + addl %ecx,%r13d > > + xorl %r12d,%eax > > + roll $30,%edi > > + addl %eax,%r13d > > + movl 28(%r9),%ebp > > + movl %r11d,%eax > > + movl %edx,24(%rsp) > > + movl %r13d,%ecx > > + bswapl %ebp > > + xorl %edi,%eax > > + roll $5,%ecx > > + andl %esi,%eax > > + leal 1518500249(%rdx,%r12,1),%r12d > > + addl %ecx,%r12d > > + xorl %r11d,%eax > > + roll $30,%esi > > + addl %eax,%r12d > > + movl 32(%r9),%r14d > > + movl %edi,%eax > > + movl %ebp,28(%rsp) > > + movl %r12d,%ecx > > + bswapl %r14d > > + xorl %esi,%eax > > + roll $5,%ecx > > + andl %r13d,%eax > > + leal 1518500249(%rbp,%r11,1),%r11d > > + addl %ecx,%r11d > > + xorl %edi,%eax > > + roll $30,%r13d > > + addl %eax,%r11d > > + movl 36(%r9),%edx > > + movl %esi,%eax > > + movl %r14d,32(%rsp) > > + movl %r11d,%ecx > > + bswapl %edx > > + xorl %r13d,%eax > > + roll $5,%ecx > > + andl %r12d,%eax > > + leal 1518500249(%r14,%rdi,1),%edi > > + addl %ecx,%edi > > + xorl %esi,%eax > > + roll $30,%r12d > > + addl %eax,%edi > > + movl 40(%r9),%ebp > > + movl %r13d,%eax > > + movl %edx,36(%rsp) > > + movl %edi,%ecx > > + bswapl %ebp > > + xorl %r12d,%eax > > + roll $5,%ecx > > + andl %r11d,%eax > > + leal 1518500249(%rdx,%rsi,1),%esi > > + addl %ecx,%esi > > + xorl %r13d,%eax > > + roll $30,%r11d > > + addl %eax,%esi > > + movl 44(%r9),%r14d > > + movl %r12d,%eax > > + movl %ebp,40(%rsp) > > + movl %esi,%ecx > > + bswapl %r14d > > + xorl %r11d,%eax > > + roll $5,%ecx > > + andl %edi,%eax > > + leal 1518500249(%rbp,%r13,1),%r13d > > + addl %ecx,%r13d > > + xorl %r12d,%eax > > + roll $30,%edi > > + addl %eax,%r13d > > + movl 48(%r9),%edx > > + movl %r11d,%eax > > + movl %r14d,44(%rsp) > > + movl %r13d,%ecx > > + bswapl %edx > > + xorl %edi,%eax > > + roll $5,%ecx > > + andl %esi,%eax > > + leal 1518500249(%r14,%r12,1),%r12d > > + addl %ecx,%r12d > > + xorl %r11d,%eax > > + roll $30,%esi > > + addl %eax,%r12d > > + movl 52(%r9),%ebp > > + movl %edi,%eax > > + movl %edx,48(%rsp) > > + movl %r12d,%ecx > > + bswapl %ebp > > + xorl %esi,%eax > > + roll $5,%ecx > > + andl %r13d,%eax > > + leal 1518500249(%rdx,%r11,1),%r11d > > + addl %ecx,%r11d > > + xorl %edi,%eax > > + roll $30,%r13d > > + addl %eax,%r11d > > + movl 56(%r9),%r14d > > + movl %esi,%eax > > + movl %ebp,52(%rsp) > > + movl %r11d,%ecx > > + bswapl %r14d > > + xorl %r13d,%eax > > + roll $5,%ecx > > + andl %r12d,%eax > > + leal 1518500249(%rbp,%rdi,1),%edi > > + addl %ecx,%edi > > + xorl %esi,%eax > > + roll $30,%r12d > > + addl %eax,%edi > > + movl 60(%r9),%edx > > + movl %r13d,%eax > > + movl %r14d,56(%rsp) > > + movl %edi,%ecx > > + bswapl %edx > > + xorl %r12d,%eax > > + roll $5,%ecx > > + andl %r11d,%eax > > + leal 1518500249(%r14,%rsi,1),%esi > > + addl %ecx,%esi > > + xorl %r13d,%eax > > + roll $30,%r11d > > + addl %eax,%esi > > + xorl 0(%rsp),%ebp > > + movl %r12d,%eax > > + movl %edx,60(%rsp) > > + movl %esi,%ecx > > + xorl 8(%rsp),%ebp > > + xorl %r11d,%eax > > + roll $5,%ecx > > + xorl 32(%rsp),%ebp > > + andl %edi,%eax > > + leal 1518500249(%rdx,%r13,1),%r13d > > + roll $30,%edi > > + xorl %r12d,%eax > > + addl %ecx,%r13d > > + roll $1,%ebp > > + addl %eax,%r13d > > + xorl 4(%rsp),%r14d > > + movl %r11d,%eax > > + movl %ebp,0(%rsp) > > 
+ movl %r13d,%ecx > > + xorl 12(%rsp),%r14d > > + xorl %edi,%eax > > + roll $5,%ecx > > + xorl 36(%rsp),%r14d > > + andl %esi,%eax > > + leal 1518500249(%rbp,%r12,1),%r12d > > + roll $30,%esi > > + xorl %r11d,%eax > > + addl %ecx,%r12d > > + roll $1,%r14d > > + addl %eax,%r12d > > + xorl 8(%rsp),%edx > > + movl %edi,%eax > > + movl %r14d,4(%rsp) > > + movl %r12d,%ecx > > + xorl 16(%rsp),%edx > > + xorl %esi,%eax > > + roll $5,%ecx > > + xorl 40(%rsp),%edx > > + andl %r13d,%eax > > + leal 1518500249(%r14,%r11,1),%r11d > > + roll $30,%r13d > > + xorl %edi,%eax > > + addl %ecx,%r11d > > + roll $1,%edx > > + addl %eax,%r11d > > + xorl 12(%rsp),%ebp > > + movl %esi,%eax > > + movl %edx,8(%rsp) > > + movl %r11d,%ecx > > + xorl 20(%rsp),%ebp > > + xorl %r13d,%eax > > + roll $5,%ecx > > + xorl 44(%rsp),%ebp > > + andl %r12d,%eax > > + leal 1518500249(%rdx,%rdi,1),%edi > > + roll $30,%r12d > > + xorl %esi,%eax > > + addl %ecx,%edi > > + roll $1,%ebp > > + addl %eax,%edi > > + xorl 16(%rsp),%r14d > > + movl %r13d,%eax > > + movl %ebp,12(%rsp) > > + movl %edi,%ecx > > + xorl 24(%rsp),%r14d > > + xorl %r12d,%eax > > + roll $5,%ecx > > + xorl 48(%rsp),%r14d > > + andl %r11d,%eax > > + leal 1518500249(%rbp,%rsi,1),%esi > > + roll $30,%r11d > > + xorl %r13d,%eax > > + addl %ecx,%esi > > + roll $1,%r14d > > + addl %eax,%esi > > + xorl 20(%rsp),%edx > > + movl %edi,%eax > > + movl %r14d,16(%rsp) > > + movl %esi,%ecx > > + xorl 28(%rsp),%edx > > + xorl %r12d,%eax > > + roll $5,%ecx > > + xorl 52(%rsp),%edx > > + leal 1859775393(%r14,%r13,1),%r13d > > + xorl %r11d,%eax > > + addl %ecx,%r13d > > + roll $30,%edi > > + addl %eax,%r13d > > + roll $1,%edx > > + xorl 24(%rsp),%ebp > > + movl %esi,%eax > > + movl %edx,20(%rsp) > > + movl %r13d,%ecx > > + xorl 32(%rsp),%ebp > > + xorl %r11d,%eax > > + roll $5,%ecx > > + xorl 56(%rsp),%ebp > > + leal 1859775393(%rdx,%r12,1),%r12d > > + xorl %edi,%eax > > + addl %ecx,%r12d > > + roll $30,%esi > > + addl %eax,%r12d > > + roll $1,%ebp > > + xorl 28(%rsp),%r14d > > + movl %r13d,%eax > > + movl %ebp,24(%rsp) > > + movl %r12d,%ecx > > + xorl 36(%rsp),%r14d > > + xorl %edi,%eax > > + roll $5,%ecx > > + xorl 60(%rsp),%r14d > > + leal 1859775393(%rbp,%r11,1),%r11d > > + xorl %esi,%eax > > + addl %ecx,%r11d > > + roll $30,%r13d > > + addl %eax,%r11d > > + roll $1,%r14d > > + xorl 32(%rsp),%edx > > + movl %r12d,%eax > > + movl %r14d,28(%rsp) > > + movl %r11d,%ecx > > + xorl 40(%rsp),%edx > > + xorl %esi,%eax > > + roll $5,%ecx > > + xorl 0(%rsp),%edx > > + leal 1859775393(%r14,%rdi,1),%edi > > + xorl %r13d,%eax > > + addl %ecx,%edi > > + roll $30,%r12d > > + addl %eax,%edi > > + roll $1,%edx > > + xorl 36(%rsp),%ebp > > + movl %r11d,%eax > > + movl %edx,32(%rsp) > > + movl %edi,%ecx > > + xorl 44(%rsp),%ebp > > + xorl %r13d,%eax > > + roll $5,%ecx > > + xorl 4(%rsp),%ebp > > + leal 1859775393(%rdx,%rsi,1),%esi > > + xorl %r12d,%eax > > + addl %ecx,%esi > > + roll $30,%r11d > > + addl %eax,%esi > > + roll $1,%ebp > > + xorl 40(%rsp),%r14d > > + movl %edi,%eax > > + movl %ebp,36(%rsp) > > + movl %esi,%ecx > > + xorl 48(%rsp),%r14d > > + xorl %r12d,%eax > > + roll $5,%ecx > > + xorl 8(%rsp),%r14d > > + leal 1859775393(%rbp,%r13,1),%r13d > > + xorl %r11d,%eax > > + addl %ecx,%r13d > > + roll $30,%edi > > + addl %eax,%r13d > > + roll $1,%r14d > > + xorl 44(%rsp),%edx > > + movl %esi,%eax > > + movl %r14d,40(%rsp) > > + movl %r13d,%ecx > > + xorl 52(%rsp),%edx > > + xorl %r11d,%eax > > + roll $5,%ecx > > + xorl 12(%rsp),%edx > > + leal 1859775393(%r14,%r12,1),%r12d > > + xorl 
%edi,%eax > > + addl %ecx,%r12d > > + roll $30,%esi > > + addl %eax,%r12d > > + roll $1,%edx > > + xorl 48(%rsp),%ebp > > + movl %r13d,%eax > > + movl %edx,44(%rsp) > > + movl %r12d,%ecx > > + xorl 56(%rsp),%ebp > > + xorl %edi,%eax > > + roll $5,%ecx > > + xorl 16(%rsp),%ebp > > + leal 1859775393(%rdx,%r11,1),%r11d > > + xorl %esi,%eax > > + addl %ecx,%r11d > > + roll $30,%r13d > > + addl %eax,%r11d > > + roll $1,%ebp > > + xorl 52(%rsp),%r14d > > + movl %r12d,%eax > > + movl %ebp,48(%rsp) > > + movl %r11d,%ecx > > + xorl 60(%rsp),%r14d > > + xorl %esi,%eax > > + roll $5,%ecx > > + xorl 20(%rsp),%r14d > > + leal 1859775393(%rbp,%rdi,1),%edi > > + xorl %r13d,%eax > > + addl %ecx,%edi > > + roll $30,%r12d > > + addl %eax,%edi > > + roll $1,%r14d > > + xorl 56(%rsp),%edx > > + movl %r11d,%eax > > + movl %r14d,52(%rsp) > > + movl %edi,%ecx > > + xorl 0(%rsp),%edx > > + xorl %r13d,%eax > > + roll $5,%ecx > > + xorl 24(%rsp),%edx > > + leal 1859775393(%r14,%rsi,1),%esi > > + xorl %r12d,%eax > > + addl %ecx,%esi > > + roll $30,%r11d > > + addl %eax,%esi > > + roll $1,%edx > > + xorl 60(%rsp),%ebp > > + movl %edi,%eax > > + movl %edx,56(%rsp) > > + movl %esi,%ecx > > + xorl 4(%rsp),%ebp > > + xorl %r12d,%eax > > + roll $5,%ecx > > + xorl 28(%rsp),%ebp > > + leal 1859775393(%rdx,%r13,1),%r13d > > + xorl %r11d,%eax > > + addl %ecx,%r13d > > + roll $30,%edi > > + addl %eax,%r13d > > + roll $1,%ebp > > + xorl 0(%rsp),%r14d > > + movl %esi,%eax > > + movl %ebp,60(%rsp) > > + movl %r13d,%ecx > > + xorl 8(%rsp),%r14d > > + xorl %r11d,%eax > > + roll $5,%ecx > > + xorl 32(%rsp),%r14d > > + leal 1859775393(%rbp,%r12,1),%r12d > > + xorl %edi,%eax > > + addl %ecx,%r12d > > + roll $30,%esi > > + addl %eax,%r12d > > + roll $1,%r14d > > + xorl 4(%rsp),%edx > > + movl %r13d,%eax > > + movl %r14d,0(%rsp) > > + movl %r12d,%ecx > > + xorl 12(%rsp),%edx > > + xorl %edi,%eax > > + roll $5,%ecx > > + xorl 36(%rsp),%edx > > + leal 1859775393(%r14,%r11,1),%r11d > > + xorl %esi,%eax > > + addl %ecx,%r11d > > + roll $30,%r13d > > + addl %eax,%r11d > > + roll $1,%edx > > + xorl 8(%rsp),%ebp > > + movl %r12d,%eax > > + movl %edx,4(%rsp) > > + movl %r11d,%ecx > > + xorl 16(%rsp),%ebp > > + xorl %esi,%eax > > + roll $5,%ecx > > + xorl 40(%rsp),%ebp > > + leal 1859775393(%rdx,%rdi,1),%edi > > + xorl %r13d,%eax > > + addl %ecx,%edi > > + roll $30,%r12d > > + addl %eax,%edi > > + roll $1,%ebp > > + xorl 12(%rsp),%r14d > > + movl %r11d,%eax > > + movl %ebp,8(%rsp) > > + movl %edi,%ecx > > + xorl 20(%rsp),%r14d > > + xorl %r13d,%eax > > + roll $5,%ecx > > + xorl 44(%rsp),%r14d > > + leal 1859775393(%rbp,%rsi,1),%esi > > + xorl %r12d,%eax > > + addl %ecx,%esi > > + roll $30,%r11d > > + addl %eax,%esi > > + roll $1,%r14d > > + xorl 16(%rsp),%edx > > + movl %edi,%eax > > + movl %r14d,12(%rsp) > > + movl %esi,%ecx > > + xorl 24(%rsp),%edx > > + xorl %r12d,%eax > > + roll $5,%ecx > > + xorl 48(%rsp),%edx > > + leal 1859775393(%r14,%r13,1),%r13d > > + xorl %r11d,%eax > > + addl %ecx,%r13d > > + roll $30,%edi > > + addl %eax,%r13d > > + roll $1,%edx > > + xorl 20(%rsp),%ebp > > + movl %esi,%eax > > + movl %edx,16(%rsp) > > + movl %r13d,%ecx > > + xorl 28(%rsp),%ebp > > + xorl %r11d,%eax > > + roll $5,%ecx > > + xorl 52(%rsp),%ebp > > + leal 1859775393(%rdx,%r12,1),%r12d > > + xorl %edi,%eax > > + addl %ecx,%r12d > > + roll $30,%esi > > + addl %eax,%r12d > > + roll $1,%ebp > > + xorl 24(%rsp),%r14d > > + movl %r13d,%eax > > + movl %ebp,20(%rsp) > > + movl %r12d,%ecx > > + xorl 32(%rsp),%r14d > > + xorl %edi,%eax > > + roll $5,%ecx > > + 
xorl 56(%rsp),%r14d > > + leal 1859775393(%rbp,%r11,1),%r11d > > + xorl %esi,%eax > > + addl %ecx,%r11d > > + roll $30,%r13d > > + addl %eax,%r11d > > + roll $1,%r14d > > + xorl 28(%rsp),%edx > > + movl %r12d,%eax > > + movl %r14d,24(%rsp) > > + movl %r11d,%ecx > > + xorl 36(%rsp),%edx > > + xorl %esi,%eax > > + roll $5,%ecx > > + xorl 60(%rsp),%edx > > + leal 1859775393(%r14,%rdi,1),%edi > > + xorl %r13d,%eax > > + addl %ecx,%edi > > + roll $30,%r12d > > + addl %eax,%edi > > + roll $1,%edx > > + xorl 32(%rsp),%ebp > > + movl %r11d,%eax > > + movl %edx,28(%rsp) > > + movl %edi,%ecx > > + xorl 40(%rsp),%ebp > > + xorl %r13d,%eax > > + roll $5,%ecx > > + xorl 0(%rsp),%ebp > > + leal 1859775393(%rdx,%rsi,1),%esi > > + xorl %r12d,%eax > > + addl %ecx,%esi > > + roll $30,%r11d > > + addl %eax,%esi > > + roll $1,%ebp > > + xorl 36(%rsp),%r14d > > + movl %r12d,%eax > > + movl %ebp,32(%rsp) > > + movl %r12d,%ebx > > + xorl 44(%rsp),%r14d > > + andl %r11d,%eax > > + movl %esi,%ecx > > + xorl 4(%rsp),%r14d > > + leal -1894007588(%rbp,%r13,1),%r13d > > + xorl %r11d,%ebx > > + roll $5,%ecx > > + addl %eax,%r13d > > + roll $1,%r14d > > + andl %edi,%ebx > > + addl %ecx,%r13d > > + roll $30,%edi > > + addl %ebx,%r13d > > + xorl 40(%rsp),%edx > > + movl %r11d,%eax > > + movl %r14d,36(%rsp) > > + movl %r11d,%ebx > > + xorl 48(%rsp),%edx > > + andl %edi,%eax > > + movl %r13d,%ecx > > + xorl 8(%rsp),%edx > > + leal -1894007588(%r14,%r12,1),%r12d > > + xorl %edi,%ebx > > + roll $5,%ecx > > + addl %eax,%r12d > > + roll $1,%edx > > + andl %esi,%ebx > > + addl %ecx,%r12d > > + roll $30,%esi > > + addl %ebx,%r12d > > + xorl 44(%rsp),%ebp > > + movl %edi,%eax > > + movl %edx,40(%rsp) > > + movl %edi,%ebx > > + xorl 52(%rsp),%ebp > > + andl %esi,%eax > > + movl %r12d,%ecx > > + xorl 12(%rsp),%ebp > > + leal -1894007588(%rdx,%r11,1),%r11d > > + xorl %esi,%ebx > > + roll $5,%ecx > > + addl %eax,%r11d > > + roll $1,%ebp > > + andl %r13d,%ebx > > + addl %ecx,%r11d > > + roll $30,%r13d > > + addl %ebx,%r11d > > + xorl 48(%rsp),%r14d > > + movl %esi,%eax > > + movl %ebp,44(%rsp) > > + movl %esi,%ebx > > + xorl 56(%rsp),%r14d > > + andl %r13d,%eax > > + movl %r11d,%ecx > > + xorl 16(%rsp),%r14d > > + leal -1894007588(%rbp,%rdi,1),%edi > > + xorl %r13d,%ebx > > + roll $5,%ecx > > + addl %eax,%edi > > + roll $1,%r14d > > + andl %r12d,%ebx > > + addl %ecx,%edi > > + roll $30,%r12d > > + addl %ebx,%edi > > + xorl 52(%rsp),%edx > > + movl %r13d,%eax > > + movl %r14d,48(%rsp) > > + movl %r13d,%ebx > > + xorl 60(%rsp),%edx > > + andl %r12d,%eax > > + movl %edi,%ecx > > + xorl 20(%rsp),%edx > > + leal -1894007588(%r14,%rsi,1),%esi > > + xorl %r12d,%ebx > > + roll $5,%ecx > > + addl %eax,%esi > > + roll $1,%edx > > + andl %r11d,%ebx > > + addl %ecx,%esi > > + roll $30,%r11d > > + addl %ebx,%esi > > + xorl 56(%rsp),%ebp > > + movl %r12d,%eax > > + movl %edx,52(%rsp) > > + movl %r12d,%ebx > > + xorl 0(%rsp),%ebp > > + andl %r11d,%eax > > + movl %esi,%ecx > > + xorl 24(%rsp),%ebp > > + leal -1894007588(%rdx,%r13,1),%r13d > > + xorl %r11d,%ebx > > + roll $5,%ecx > > + addl %eax,%r13d > > + roll $1,%ebp > > + andl %edi,%ebx > > + addl %ecx,%r13d > > + roll $30,%edi > > + addl %ebx,%r13d > > + xorl 60(%rsp),%r14d > > + movl %r11d,%eax > > + movl %ebp,56(%rsp) > > + movl %r11d,%ebx > > + xorl 4(%rsp),%r14d > > + andl %edi,%eax > > + movl %r13d,%ecx > > + xorl 28(%rsp),%r14d > > + leal -1894007588(%rbp,%r12,1),%r12d > > + xorl %edi,%ebx > > + roll $5,%ecx > > + addl %eax,%r12d > > + roll $1,%r14d > > + andl %esi,%ebx > > + addl %ecx,%r12d 
> > + roll $30,%esi > > + addl %ebx,%r12d > > + xorl 0(%rsp),%edx > > + movl %edi,%eax > > + movl %r14d,60(%rsp) > > + movl %edi,%ebx > > + xorl 8(%rsp),%edx > > + andl %esi,%eax > > + movl %r12d,%ecx > > + xorl 32(%rsp),%edx > > + leal -1894007588(%r14,%r11,1),%r11d > > + xorl %esi,%ebx > > + roll $5,%ecx > > + addl %eax,%r11d > > + roll $1,%edx > > + andl %r13d,%ebx > > + addl %ecx,%r11d > > + roll $30,%r13d > > + addl %ebx,%r11d > > + xorl 4(%rsp),%ebp > > + movl %esi,%eax > > + movl %edx,0(%rsp) > > + movl %esi,%ebx > > + xorl 12(%rsp),%ebp > > + andl %r13d,%eax > > + movl %r11d,%ecx > > + xorl 36(%rsp),%ebp > > + leal -1894007588(%rdx,%rdi,1),%edi > > + xorl %r13d,%ebx > > + roll $5,%ecx > > + addl %eax,%edi > > + roll $1,%ebp > > + andl %r12d,%ebx > > + addl %ecx,%edi > > + roll $30,%r12d > > + addl %ebx,%edi > > + xorl 8(%rsp),%r14d > > + movl %r13d,%eax > > + movl %ebp,4(%rsp) > > + movl %r13d,%ebx > > + xorl 16(%rsp),%r14d > > + andl %r12d,%eax > > + movl %edi,%ecx > > + xorl 40(%rsp),%r14d > > + leal -1894007588(%rbp,%rsi,1),%esi > > + xorl %r12d,%ebx > > + roll $5,%ecx > > + addl %eax,%esi > > + roll $1,%r14d > > + andl %r11d,%ebx > > + addl %ecx,%esi > > + roll $30,%r11d > > + addl %ebx,%esi > > + xorl 12(%rsp),%edx > > + movl %r12d,%eax > > + movl %r14d,8(%rsp) > > + movl %r12d,%ebx > > + xorl 20(%rsp),%edx > > + andl %r11d,%eax > > + movl %esi,%ecx > > + xorl 44(%rsp),%edx > > + leal -1894007588(%r14,%r13,1),%r13d > > + xorl %r11d,%ebx > > + roll $5,%ecx > > + addl %eax,%r13d > > + roll $1,%edx > > + andl %edi,%ebx > > + addl %ecx,%r13d > > + roll $30,%edi > > + addl %ebx,%r13d > > + xorl 16(%rsp),%ebp > > + movl %r11d,%eax > > + movl %edx,12(%rsp) > > + movl %r11d,%ebx > > + xorl 24(%rsp),%ebp > > + andl %edi,%eax > > + movl %r13d,%ecx > > + xorl 48(%rsp),%ebp > > + leal -1894007588(%rdx,%r12,1),%r12d > > + xorl %edi,%ebx > > + roll $5,%ecx > > + addl %eax,%r12d > > + roll $1,%ebp > > + andl %esi,%ebx > > + addl %ecx,%r12d > > + roll $30,%esi > > + addl %ebx,%r12d > > + xorl 20(%rsp),%r14d > > + movl %edi,%eax > > + movl %ebp,16(%rsp) > > + movl %edi,%ebx > > + xorl 28(%rsp),%r14d > > + andl %esi,%eax > > + movl %r12d,%ecx > > + xorl 52(%rsp),%r14d > > + leal -1894007588(%rbp,%r11,1),%r11d > > + xorl %esi,%ebx > > + roll $5,%ecx > > + addl %eax,%r11d > > + roll $1,%r14d > > + andl %r13d,%ebx > > + addl %ecx,%r11d > > + roll $30,%r13d > > + addl %ebx,%r11d > > + xorl 24(%rsp),%edx > > + movl %esi,%eax > > + movl %r14d,20(%rsp) > > + movl %esi,%ebx > > + xorl 32(%rsp),%edx > > + andl %r13d,%eax > > + movl %r11d,%ecx > > + xorl 56(%rsp),%edx > > + leal -1894007588(%r14,%rdi,1),%edi > > + xorl %r13d,%ebx > > + roll $5,%ecx > > + addl %eax,%edi > > + roll $1,%edx > > + andl %r12d,%ebx > > + addl %ecx,%edi > > + roll $30,%r12d > > + addl %ebx,%edi > > + xorl 28(%rsp),%ebp > > + movl %r13d,%eax > > + movl %edx,24(%rsp) > > + movl %r13d,%ebx > > + xorl 36(%rsp),%ebp > > + andl %r12d,%eax > > + movl %edi,%ecx > > + xorl 60(%rsp),%ebp > > + leal -1894007588(%rdx,%rsi,1),%esi > > + xorl %r12d,%ebx > > + roll $5,%ecx > > + addl %eax,%esi > > + roll $1,%ebp > > + andl %r11d,%ebx > > + addl %ecx,%esi > > + roll $30,%r11d > > + addl %ebx,%esi > > + xorl 32(%rsp),%r14d > > + movl %r12d,%eax > > + movl %ebp,28(%rsp) > > + movl %r12d,%ebx > > + xorl 40(%rsp),%r14d > > + andl %r11d,%eax > > + movl %esi,%ecx > > + xorl 0(%rsp),%r14d > > + leal -1894007588(%rbp,%r13,1),%r13d > > + xorl %r11d,%ebx > > + roll $5,%ecx > > + addl %eax,%r13d > > + roll $1,%r14d > > + andl %edi,%ebx > > + addl 
%ecx,%r13d > > + roll $30,%edi > > + addl %ebx,%r13d > > + xorl 36(%rsp),%edx > > + movl %r11d,%eax > > + movl %r14d,32(%rsp) > > + movl %r11d,%ebx > > + xorl 44(%rsp),%edx > > + andl %edi,%eax > > + movl %r13d,%ecx > > + xorl 4(%rsp),%edx > > + leal -1894007588(%r14,%r12,1),%r12d > > + xorl %edi,%ebx > > + roll $5,%ecx > > + addl %eax,%r12d > > + roll $1,%edx > > + andl %esi,%ebx > > + addl %ecx,%r12d > > + roll $30,%esi > > + addl %ebx,%r12d > > + xorl 40(%rsp),%ebp > > + movl %edi,%eax > > + movl %edx,36(%rsp) > > + movl %edi,%ebx > > + xorl 48(%rsp),%ebp > > + andl %esi,%eax > > + movl %r12d,%ecx > > + xorl 8(%rsp),%ebp > > + leal -1894007588(%rdx,%r11,1),%r11d > > + xorl %esi,%ebx > > + roll $5,%ecx > > + addl %eax,%r11d > > + roll $1,%ebp > > + andl %r13d,%ebx > > + addl %ecx,%r11d > > + roll $30,%r13d > > + addl %ebx,%r11d > > + xorl 44(%rsp),%r14d > > + movl %esi,%eax > > + movl %ebp,40(%rsp) > > + movl %esi,%ebx > > + xorl 52(%rsp),%r14d > > + andl %r13d,%eax > > + movl %r11d,%ecx > > + xorl 12(%rsp),%r14d > > + leal -1894007588(%rbp,%rdi,1),%edi > > + xorl %r13d,%ebx > > + roll $5,%ecx > > + addl %eax,%edi > > + roll $1,%r14d > > + andl %r12d,%ebx > > + addl %ecx,%edi > > + roll $30,%r12d > > + addl %ebx,%edi > > + xorl 48(%rsp),%edx > > + movl %r13d,%eax > > + movl %r14d,44(%rsp) > > + movl %r13d,%ebx > > + xorl 56(%rsp),%edx > > + andl %r12d,%eax > > + movl %edi,%ecx > > + xorl 16(%rsp),%edx > > + leal -1894007588(%r14,%rsi,1),%esi > > + xorl %r12d,%ebx > > + roll $5,%ecx > > + addl %eax,%esi > > + roll $1,%edx > > + andl %r11d,%ebx > > + addl %ecx,%esi > > + roll $30,%r11d > > + addl %ebx,%esi > > + xorl 52(%rsp),%ebp > > + movl %edi,%eax > > + movl %edx,48(%rsp) > > + movl %esi,%ecx > > + xorl 60(%rsp),%ebp > > + xorl %r12d,%eax > > + roll $5,%ecx > > + xorl 20(%rsp),%ebp > > + leal -899497514(%rdx,%r13,1),%r13d > > + xorl %r11d,%eax > > + addl %ecx,%r13d > > + roll $30,%edi > > + addl %eax,%r13d > > + roll $1,%ebp > > + xorl 56(%rsp),%r14d > > + movl %esi,%eax > > + movl %ebp,52(%rsp) > > + movl %r13d,%ecx > > + xorl 0(%rsp),%r14d > > + xorl %r11d,%eax > > + roll $5,%ecx > > + xorl 24(%rsp),%r14d > > + leal -899497514(%rbp,%r12,1),%r12d > > + xorl %edi,%eax > > + addl %ecx,%r12d > > + roll $30,%esi > > + addl %eax,%r12d > > + roll $1,%r14d > > + xorl 60(%rsp),%edx > > + movl %r13d,%eax > > + movl %r14d,56(%rsp) > > + movl %r12d,%ecx > > + xorl 4(%rsp),%edx > > + xorl %edi,%eax > > + roll $5,%ecx > > + xorl 28(%rsp),%edx > > + leal -899497514(%r14,%r11,1),%r11d > > + xorl %esi,%eax > > + addl %ecx,%r11d > > + roll $30,%r13d > > + addl %eax,%r11d > > + roll $1,%edx > > + xorl 0(%rsp),%ebp > > + movl %r12d,%eax > > + movl %edx,60(%rsp) > > + movl %r11d,%ecx > > + xorl 8(%rsp),%ebp > > + xorl %esi,%eax > > + roll $5,%ecx > > + xorl 32(%rsp),%ebp > > + leal -899497514(%rdx,%rdi,1),%edi > > + xorl %r13d,%eax > > + addl %ecx,%edi > > + roll $30,%r12d > > + addl %eax,%edi > > + roll $1,%ebp > > + xorl 4(%rsp),%r14d > > + movl %r11d,%eax > > + movl %ebp,0(%rsp) > > + movl %edi,%ecx > > + xorl 12(%rsp),%r14d > > + xorl %r13d,%eax > > + roll $5,%ecx > > + xorl 36(%rsp),%r14d > > + leal -899497514(%rbp,%rsi,1),%esi > > + xorl %r12d,%eax > > + addl %ecx,%esi > > + roll $30,%r11d > > + addl %eax,%esi > > + roll $1,%r14d > > + xorl 8(%rsp),%edx > > + movl %edi,%eax > > + movl %r14d,4(%rsp) > > + movl %esi,%ecx > > + xorl 16(%rsp),%edx > > + xorl %r12d,%eax > > + roll $5,%ecx > > + xorl 40(%rsp),%edx > > + leal -899497514(%r14,%r13,1),%r13d > > + xorl %r11d,%eax > > + addl %ecx,%r13d > > + 
roll $30,%edi > > + addl %eax,%r13d > > + roll $1,%edx > > + xorl 12(%rsp),%ebp > > + movl %esi,%eax > > + movl %edx,8(%rsp) > > + movl %r13d,%ecx > > + xorl 20(%rsp),%ebp > > + xorl %r11d,%eax > > + roll $5,%ecx > > + xorl 44(%rsp),%ebp > > + leal -899497514(%rdx,%r12,1),%r12d > > + xorl %edi,%eax > > + addl %ecx,%r12d > > + roll $30,%esi > > + addl %eax,%r12d > > + roll $1,%ebp > > + xorl 16(%rsp),%r14d > > + movl %r13d,%eax > > + movl %ebp,12(%rsp) > > + movl %r12d,%ecx > > + xorl 24(%rsp),%r14d > > + xorl %edi,%eax > > + roll $5,%ecx > > + xorl 48(%rsp),%r14d > > + leal -899497514(%rbp,%r11,1),%r11d > > + xorl %esi,%eax > > + addl %ecx,%r11d > > + roll $30,%r13d > > + addl %eax,%r11d > > + roll $1,%r14d > > + xorl 20(%rsp),%edx > > + movl %r12d,%eax > > + movl %r14d,16(%rsp) > > + movl %r11d,%ecx > > + xorl 28(%rsp),%edx > > + xorl %esi,%eax > > + roll $5,%ecx > > + xorl 52(%rsp),%edx > > + leal -899497514(%r14,%rdi,1),%edi > > + xorl %r13d,%eax > > + addl %ecx,%edi > > + roll $30,%r12d > > + addl %eax,%edi > > + roll $1,%edx > > + xorl 24(%rsp),%ebp > > + movl %r11d,%eax > > + movl %edx,20(%rsp) > > + movl %edi,%ecx > > + xorl 32(%rsp),%ebp > > + xorl %r13d,%eax > > + roll $5,%ecx > > + xorl 56(%rsp),%ebp > > + leal -899497514(%rdx,%rsi,1),%esi > > + xorl %r12d,%eax > > + addl %ecx,%esi > > + roll $30,%r11d > > + addl %eax,%esi > > + roll $1,%ebp > > + xorl 28(%rsp),%r14d > > + movl %edi,%eax > > + movl %ebp,24(%rsp) > > + movl %esi,%ecx > > + xorl 36(%rsp),%r14d > > + xorl %r12d,%eax > > + roll $5,%ecx > > + xorl 60(%rsp),%r14d > > + leal -899497514(%rbp,%r13,1),%r13d > > + xorl %r11d,%eax > > + addl %ecx,%r13d > > + roll $30,%edi > > + addl %eax,%r13d > > + roll $1,%r14d > > + xorl 32(%rsp),%edx > > + movl %esi,%eax > > + movl %r14d,28(%rsp) > > + movl %r13d,%ecx > > + xorl 40(%rsp),%edx > > + xorl %r11d,%eax > > + roll $5,%ecx > > + xorl 0(%rsp),%edx > > + leal -899497514(%r14,%r12,1),%r12d > > + xorl %edi,%eax > > + addl %ecx,%r12d > > + roll $30,%esi > > + addl %eax,%r12d > > + roll $1,%edx > > + xorl 36(%rsp),%ebp > > + movl %r13d,%eax > > + > > + movl %r12d,%ecx > > + xorl 44(%rsp),%ebp > > + xorl %edi,%eax > > + roll $5,%ecx > > + xorl 4(%rsp),%ebp > > + leal -899497514(%rdx,%r11,1),%r11d > > + xorl %esi,%eax > > + addl %ecx,%r11d > > + roll $30,%r13d > > + addl %eax,%r11d > > + roll $1,%ebp > > + xorl 40(%rsp),%r14d > > + movl %r12d,%eax > > + > > + movl %r11d,%ecx > > + xorl 48(%rsp),%r14d > > + xorl %esi,%eax > > + roll $5,%ecx > > + xorl 8(%rsp),%r14d > > + leal -899497514(%rbp,%rdi,1),%edi > > + xorl %r13d,%eax > > + addl %ecx,%edi > > + roll $30,%r12d > > + addl %eax,%edi > > + roll $1,%r14d > > + xorl 44(%rsp),%edx > > + movl %r11d,%eax > > + > > + movl %edi,%ecx > > + xorl 52(%rsp),%edx > > + xorl %r13d,%eax > > + roll $5,%ecx > > + xorl 12(%rsp),%edx > > + leal -899497514(%r14,%rsi,1),%esi > > + xorl %r12d,%eax > > + addl %ecx,%esi > > + roll $30,%r11d > > + addl %eax,%esi > > + roll $1,%edx > > + xorl 48(%rsp),%ebp > > + movl %edi,%eax > > + > > + movl %esi,%ecx > > + xorl 56(%rsp),%ebp > > + xorl %r12d,%eax > > + roll $5,%ecx > > + xorl 16(%rsp),%ebp > > + leal -899497514(%rdx,%r13,1),%r13d > > + xorl %r11d,%eax > > + addl %ecx,%r13d > > + roll $30,%edi > > + addl %eax,%r13d > > + roll $1,%ebp > > + xorl 52(%rsp),%r14d > > + movl %esi,%eax > > + > > + movl %r13d,%ecx > > + xorl 60(%rsp),%r14d > > + xorl %r11d,%eax > > + roll $5,%ecx > > + xorl 20(%rsp),%r14d > > + leal -899497514(%rbp,%r12,1),%r12d > > + xorl %edi,%eax > > + addl %ecx,%r12d > > + roll $30,%esi > > + 
addl %eax,%r12d > > + roll $1,%r14d > > + xorl 56(%rsp),%edx > > + movl %r13d,%eax > > + > > + movl %r12d,%ecx > > + xorl 0(%rsp),%edx > > + xorl %edi,%eax > > + roll $5,%ecx > > + xorl 24(%rsp),%edx > > + leal -899497514(%r14,%r11,1),%r11d > > + xorl %esi,%eax > > + addl %ecx,%r11d > > + roll $30,%r13d > > + addl %eax,%r11d > > + roll $1,%edx > > + xorl 60(%rsp),%ebp > > + movl %r12d,%eax > > + > > + movl %r11d,%ecx > > + xorl 4(%rsp),%ebp > > + xorl %esi,%eax > > + roll $5,%ecx > > + xorl 28(%rsp),%ebp > > + leal -899497514(%rdx,%rdi,1),%edi > > + xorl %r13d,%eax > > + addl %ecx,%edi > > + roll $30,%r12d > > + addl %eax,%edi > > + roll $1,%ebp > > + movl %r11d,%eax > > + movl %edi,%ecx > > + xorl %r13d,%eax > > + leal -899497514(%rbp,%rsi,1),%esi > > + roll $5,%ecx > > + xorl %r12d,%eax > > + addl %ecx,%esi > > + roll $30,%r11d > > + addl %eax,%esi > > + addl 0(%r8),%esi > > + addl 4(%r8),%edi > > + addl 8(%r8),%r11d > > + addl 12(%r8),%r12d > > + addl 16(%r8),%r13d > > + movl %esi,0(%r8) > > + movl %edi,4(%r8) > > + movl %r11d,8(%r8) > > + movl %r12d,12(%r8) > > + movl %r13d,16(%r8) > > + > > + subq $1,%r10 > > + leaq 64(%r9),%r9 > > + jnz .Lloop > > + > > + movq 64(%rsp),%rsi > > +.cfi_def_cfa %rsi,8 > > + movq -40(%rsi),%r14 > > +.cfi_restore %r14 > > + movq -32(%rsi),%r13 > > +.cfi_restore %r13 > > + movq -24(%rsi),%r12 > > +.cfi_restore %r12 > > + movq -16(%rsi),%rbp > > +.cfi_restore %rbp > > + movq -8(%rsi),%rbx > > +.cfi_restore %rbx > > + leaq (%rsi),%rsp > > +.cfi_def_cfa_register %rsp > > +.Lepilogue: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size sha1_block_data_order,.-sha1_block_data_order > > +.type sha1_block_data_order_shaext,@function > > +.align 32 > > +sha1_block_data_order_shaext: > > +_shaext_shortcut: > > +.cfi_startproc > > + movdqu (%rdi),%xmm0 > > + movd 16(%rdi),%xmm1 > > + movdqa K_XX_XX+160(%rip),%xmm3 > > + > > + movdqu (%rsi),%xmm4 > > + pshufd $27,%xmm0,%xmm0 > > + movdqu 16(%rsi),%xmm5 > > + pshufd $27,%xmm1,%xmm1 > > + movdqu 32(%rsi),%xmm6 > > +.byte 102,15,56,0,227 > > + movdqu 48(%rsi),%xmm7 > > +.byte 102,15,56,0,235 > > +.byte 102,15,56,0,243 > > + movdqa %xmm1,%xmm9 > > +.byte 102,15,56,0,251 > > + jmp .Loop_shaext > > + > > +.align 16 > > +.Loop_shaext: > > + decq %rdx > > + leaq 64(%rsi),%r8 > > + paddd %xmm4,%xmm1 > > + cmovneq %r8,%rsi > > + movdqa %xmm0,%xmm8 > > +.byte 15,56,201,229 > > + movdqa %xmm0,%xmm2 > > +.byte 15,58,204,193,0 > > +.byte 15,56,200,213 > > + pxor %xmm6,%xmm4 > > +.byte 15,56,201,238 > > +.byte 15,56,202,231 > > + > > + movdqa %xmm0,%xmm1 > > +.byte 15,58,204,194,0 > > +.byte 15,56,200,206 > > + pxor %xmm7,%xmm5 > > +.byte 15,56,202,236 > > +.byte 15,56,201,247 > > + movdqa %xmm0,%xmm2 > > +.byte 15,58,204,193,0 > > +.byte 15,56,200,215 > > + pxor %xmm4,%xmm6 > > +.byte 15,56,201,252 > > +.byte 15,56,202,245 > > + > > + movdqa %xmm0,%xmm1 > > +.byte 15,58,204,194,0 > > +.byte 15,56,200,204 > > + pxor %xmm5,%xmm7 > > +.byte 15,56,202,254 > > +.byte 15,56,201,229 > > + movdqa %xmm0,%xmm2 > > +.byte 15,58,204,193,0 > > +.byte 15,56,200,213 > > + pxor %xmm6,%xmm4 > > +.byte 15,56,201,238 > > +.byte 15,56,202,231 > > + > > + movdqa %xmm0,%xmm1 > > +.byte 15,58,204,194,1 > > +.byte 15,56,200,206 > > + pxor %xmm7,%xmm5 > > +.byte 15,56,202,236 > > +.byte 15,56,201,247 > > + movdqa %xmm0,%xmm2 > > +.byte 15,58,204,193,1 > > +.byte 15,56,200,215 > > + pxor %xmm4,%xmm6 > > +.byte 15,56,201,252 > > +.byte 15,56,202,245 > > + > > + movdqa %xmm0,%xmm1 > > +.byte 15,58,204,194,1 > > +.byte 15,56,200,204 > > + pxor %xmm5,%xmm7 > > 
+.byte 15,56,202,254 > > +.byte 15,56,201,229 > > + movdqa %xmm0,%xmm2 > > +.byte 15,58,204,193,1 > > +.byte 15,56,200,213 > > + pxor %xmm6,%xmm4 > > +.byte 15,56,201,238 > > +.byte 15,56,202,231 > > + > > + movdqa %xmm0,%xmm1 > > +.byte 15,58,204,194,1 > > +.byte 15,56,200,206 > > + pxor %xmm7,%xmm5 > > +.byte 15,56,202,236 > > +.byte 15,56,201,247 > > + movdqa %xmm0,%xmm2 > > +.byte 15,58,204,193,2 > > +.byte 15,56,200,215 > > + pxor %xmm4,%xmm6 > > +.byte 15,56,201,252 > > +.byte 15,56,202,245 > > + > > + movdqa %xmm0,%xmm1 > > +.byte 15,58,204,194,2 > > +.byte 15,56,200,204 > > + pxor %xmm5,%xmm7 > > +.byte 15,56,202,254 > > +.byte 15,56,201,229 > > + movdqa %xmm0,%xmm2 > > +.byte 15,58,204,193,2 > > +.byte 15,56,200,213 > > + pxor %xmm6,%xmm4 > > +.byte 15,56,201,238 > > +.byte 15,56,202,231 > > + > > + movdqa %xmm0,%xmm1 > > +.byte 15,58,204,194,2 > > +.byte 15,56,200,206 > > + pxor %xmm7,%xmm5 > > +.byte 15,56,202,236 > > +.byte 15,56,201,247 > > + movdqa %xmm0,%xmm2 > > +.byte 15,58,204,193,2 > > +.byte 15,56,200,215 > > + pxor %xmm4,%xmm6 > > +.byte 15,56,201,252 > > +.byte 15,56,202,245 > > + > > + movdqa %xmm0,%xmm1 > > +.byte 15,58,204,194,3 > > +.byte 15,56,200,204 > > + pxor %xmm5,%xmm7 > > +.byte 15,56,202,254 > > + movdqu (%rsi),%xmm4 > > + movdqa %xmm0,%xmm2 > > +.byte 15,58,204,193,3 > > +.byte 15,56,200,213 > > + movdqu 16(%rsi),%xmm5 > > +.byte 102,15,56,0,227 > > + > > + movdqa %xmm0,%xmm1 > > +.byte 15,58,204,194,3 > > +.byte 15,56,200,206 > > + movdqu 32(%rsi),%xmm6 > > +.byte 102,15,56,0,235 > > + > > + movdqa %xmm0,%xmm2 > > +.byte 15,58,204,193,3 > > +.byte 15,56,200,215 > > + movdqu 48(%rsi),%xmm7 > > +.byte 102,15,56,0,243 > > + > > + movdqa %xmm0,%xmm1 > > +.byte 15,58,204,194,3 > > +.byte 65,15,56,200,201 > > +.byte 102,15,56,0,251 > > + > > + paddd %xmm8,%xmm0 > > + movdqa %xmm1,%xmm9 > > + > > + jnz .Loop_shaext > > + > > + pshufd $27,%xmm0,%xmm0 > > + pshufd $27,%xmm1,%xmm1 > > + movdqu %xmm0,(%rdi) > > + movd %xmm1,16(%rdi) > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size sha1_block_data_order_shaext,.-sha1_block_data_order_shaext > > +.type sha1_block_data_order_ssse3,@function > > +.align 16 > > +sha1_block_data_order_ssse3: > > +_ssse3_shortcut: > > +.cfi_startproc > > + movq %rsp,%r11 > > +.cfi_def_cfa_register %r11 > > + pushq %rbx > > +.cfi_offset %rbx,-16 > > + pushq %rbp > > +.cfi_offset %rbp,-24 > > + pushq %r12 > > +.cfi_offset %r12,-32 > > + pushq %r13 > > +.cfi_offset %r13,-40 > > + pushq %r14 > > +.cfi_offset %r14,-48 > > + leaq -64(%rsp),%rsp > > + andq $-64,%rsp > > + movq %rdi,%r8 > > + movq %rsi,%r9 > > + movq %rdx,%r10 > > + > > + shlq $6,%r10 > > + addq %r9,%r10 > > + leaq K_XX_XX+64(%rip),%r14 > > + > > + movl 0(%r8),%eax > > + movl 4(%r8),%ebx > > + movl 8(%r8),%ecx > > + movl 12(%r8),%edx > > + movl %ebx,%esi > > + movl 16(%r8),%ebp > > + movl %ecx,%edi > > + xorl %edx,%edi > > + andl %edi,%esi > > + > > + movdqa 64(%r14),%xmm6 > > + movdqa -64(%r14),%xmm9 > > + movdqu 0(%r9),%xmm0 > > + movdqu 16(%r9),%xmm1 > > + movdqu 32(%r9),%xmm2 > > + movdqu 48(%r9),%xmm3 > > +.byte 102,15,56,0,198 > > +.byte 102,15,56,0,206 > > +.byte 102,15,56,0,214 > > + addq $64,%r9 > > + paddd %xmm9,%xmm0 > > +.byte 102,15,56,0,222 > > + paddd %xmm9,%xmm1 > > + paddd %xmm9,%xmm2 > > + movdqa %xmm0,0(%rsp) > > + psubd %xmm9,%xmm0 > > + movdqa %xmm1,16(%rsp) > > + psubd %xmm9,%xmm1 > > + movdqa %xmm2,32(%rsp) > > + psubd %xmm9,%xmm2 > > + jmp .Loop_ssse3 > > +.align 16 > > +.Loop_ssse3: > > + rorl $2,%ebx > > + pshufd $238,%xmm0,%xmm4 > > + xorl %edx,%esi > 
> + movdqa %xmm3,%xmm8 > > + paddd %xmm3,%xmm9 > > + movl %eax,%edi > > + addl 0(%rsp),%ebp > > + punpcklqdq %xmm1,%xmm4 > > + xorl %ecx,%ebx > > + roll $5,%eax > > + addl %esi,%ebp > > + psrldq $4,%xmm8 > > + andl %ebx,%edi > > + xorl %ecx,%ebx > > + pxor %xmm0,%xmm4 > > + addl %eax,%ebp > > + rorl $7,%eax > > + pxor %xmm2,%xmm8 > > + xorl %ecx,%edi > > + movl %ebp,%esi > > + addl 4(%rsp),%edx > > + pxor %xmm8,%xmm4 > > + xorl %ebx,%eax > > + roll $5,%ebp > > + movdqa %xmm9,48(%rsp) > > + addl %edi,%edx > > + andl %eax,%esi > > + movdqa %xmm4,%xmm10 > > + xorl %ebx,%eax > > + addl %ebp,%edx > > + rorl $7,%ebp > > + movdqa %xmm4,%xmm8 > > + xorl %ebx,%esi > > + pslldq $12,%xmm10 > > + paddd %xmm4,%xmm4 > > + movl %edx,%edi > > + addl 8(%rsp),%ecx > > + psrld $31,%xmm8 > > + xorl %eax,%ebp > > + roll $5,%edx > > + addl %esi,%ecx > > + movdqa %xmm10,%xmm9 > > + andl %ebp,%edi > > + xorl %eax,%ebp > > + psrld $30,%xmm10 > > + addl %edx,%ecx > > + rorl $7,%edx > > + por %xmm8,%xmm4 > > + xorl %eax,%edi > > + movl %ecx,%esi > > + addl 12(%rsp),%ebx > > + pslld $2,%xmm9 > > + pxor %xmm10,%xmm4 > > + xorl %ebp,%edx > > + movdqa -64(%r14),%xmm10 > > + roll $5,%ecx > > + addl %edi,%ebx > > + andl %edx,%esi > > + pxor %xmm9,%xmm4 > > + xorl %ebp,%edx > > + addl %ecx,%ebx > > + rorl $7,%ecx > > + pshufd $238,%xmm1,%xmm5 > > + xorl %ebp,%esi > > + movdqa %xmm4,%xmm9 > > + paddd %xmm4,%xmm10 > > + movl %ebx,%edi > > + addl 16(%rsp),%eax > > + punpcklqdq %xmm2,%xmm5 > > + xorl %edx,%ecx > > + roll $5,%ebx > > + addl %esi,%eax > > + psrldq $4,%xmm9 > > + andl %ecx,%edi > > + xorl %edx,%ecx > > + pxor %xmm1,%xmm5 > > + addl %ebx,%eax > > + rorl $7,%ebx > > + pxor %xmm3,%xmm9 > > + xorl %edx,%edi > > + movl %eax,%esi > > + addl 20(%rsp),%ebp > > + pxor %xmm9,%xmm5 > > + xorl %ecx,%ebx > > + roll $5,%eax > > + movdqa %xmm10,0(%rsp) > > + addl %edi,%ebp > > + andl %ebx,%esi > > + movdqa %xmm5,%xmm8 > > + xorl %ecx,%ebx > > + addl %eax,%ebp > > + rorl $7,%eax > > + movdqa %xmm5,%xmm9 > > + xorl %ecx,%esi > > + pslldq $12,%xmm8 > > + paddd %xmm5,%xmm5 > > + movl %ebp,%edi > > + addl 24(%rsp),%edx > > + psrld $31,%xmm9 > > + xorl %ebx,%eax > > + roll $5,%ebp > > + addl %esi,%edx > > + movdqa %xmm8,%xmm10 > > + andl %eax,%edi > > + xorl %ebx,%eax > > + psrld $30,%xmm8 > > + addl %ebp,%edx > > + rorl $7,%ebp > > + por %xmm9,%xmm5 > > + xorl %ebx,%edi > > + movl %edx,%esi > > + addl 28(%rsp),%ecx > > + pslld $2,%xmm10 > > + pxor %xmm8,%xmm5 > > + xorl %eax,%ebp > > + movdqa -32(%r14),%xmm8 > > + roll $5,%edx > > + addl %edi,%ecx > > + andl %ebp,%esi > > + pxor %xmm10,%xmm5 > > + xorl %eax,%ebp > > + addl %edx,%ecx > > + rorl $7,%edx > > + pshufd $238,%xmm2,%xmm6 > > + xorl %eax,%esi > > + movdqa %xmm5,%xmm10 > > + paddd %xmm5,%xmm8 > > + movl %ecx,%edi > > + addl 32(%rsp),%ebx > > + punpcklqdq %xmm3,%xmm6 > > + xorl %ebp,%edx > > + roll $5,%ecx > > + addl %esi,%ebx > > + psrldq $4,%xmm10 > > + andl %edx,%edi > > + xorl %ebp,%edx > > + pxor %xmm2,%xmm6 > > + addl %ecx,%ebx > > + rorl $7,%ecx > > + pxor %xmm4,%xmm10 > > + xorl %ebp,%edi > > + movl %ebx,%esi > > + addl 36(%rsp),%eax > > + pxor %xmm10,%xmm6 > > + xorl %edx,%ecx > > + roll $5,%ebx > > + movdqa %xmm8,16(%rsp) > > + addl %edi,%eax > > + andl %ecx,%esi > > + movdqa %xmm6,%xmm9 > > + xorl %edx,%ecx > > + addl %ebx,%eax > > + rorl $7,%ebx > > + movdqa %xmm6,%xmm10 > > + xorl %edx,%esi > > + pslldq $12,%xmm9 > > + paddd %xmm6,%xmm6 > > + movl %eax,%edi > > + addl 40(%rsp),%ebp > > + psrld $31,%xmm10 > > + xorl %ecx,%ebx > > + roll $5,%eax > > + addl 
%esi,%ebp > > + movdqa %xmm9,%xmm8 > > + andl %ebx,%edi > > + xorl %ecx,%ebx > > + psrld $30,%xmm9 > > + addl %eax,%ebp > > + rorl $7,%eax > > + por %xmm10,%xmm6 > > + xorl %ecx,%edi > > + movl %ebp,%esi > > + addl 44(%rsp),%edx > > + pslld $2,%xmm8 > > + pxor %xmm9,%xmm6 > > + xorl %ebx,%eax > > + movdqa -32(%r14),%xmm9 > > + roll $5,%ebp > > + addl %edi,%edx > > + andl %eax,%esi > > + pxor %xmm8,%xmm6 > > + xorl %ebx,%eax > > + addl %ebp,%edx > > + rorl $7,%ebp > > + pshufd $238,%xmm3,%xmm7 > > + xorl %ebx,%esi > > + movdqa %xmm6,%xmm8 > > + paddd %xmm6,%xmm9 > > + movl %edx,%edi > > + addl 48(%rsp),%ecx > > + punpcklqdq %xmm4,%xmm7 > > + xorl %eax,%ebp > > + roll $5,%edx > > + addl %esi,%ecx > > + psrldq $4,%xmm8 > > + andl %ebp,%edi > > + xorl %eax,%ebp > > + pxor %xmm3,%xmm7 > > + addl %edx,%ecx > > + rorl $7,%edx > > + pxor %xmm5,%xmm8 > > + xorl %eax,%edi > > + movl %ecx,%esi > > + addl 52(%rsp),%ebx > > + pxor %xmm8,%xmm7 > > + xorl %ebp,%edx > > + roll $5,%ecx > > + movdqa %xmm9,32(%rsp) > > + addl %edi,%ebx > > + andl %edx,%esi > > + movdqa %xmm7,%xmm10 > > + xorl %ebp,%edx > > + addl %ecx,%ebx > > + rorl $7,%ecx > > + movdqa %xmm7,%xmm8 > > + xorl %ebp,%esi > > + pslldq $12,%xmm10 > > + paddd %xmm7,%xmm7 > > + movl %ebx,%edi > > + addl 56(%rsp),%eax > > + psrld $31,%xmm8 > > + xorl %edx,%ecx > > + roll $5,%ebx > > + addl %esi,%eax > > + movdqa %xmm10,%xmm9 > > + andl %ecx,%edi > > + xorl %edx,%ecx > > + psrld $30,%xmm10 > > + addl %ebx,%eax > > + rorl $7,%ebx > > + por %xmm8,%xmm7 > > + xorl %edx,%edi > > + movl %eax,%esi > > + addl 60(%rsp),%ebp > > + pslld $2,%xmm9 > > + pxor %xmm10,%xmm7 > > + xorl %ecx,%ebx > > + movdqa -32(%r14),%xmm10 > > + roll $5,%eax > > + addl %edi,%ebp > > + andl %ebx,%esi > > + pxor %xmm9,%xmm7 > > + pshufd $238,%xmm6,%xmm9 > > + xorl %ecx,%ebx > > + addl %eax,%ebp > > + rorl $7,%eax > > + pxor %xmm4,%xmm0 > > + xorl %ecx,%esi > > + movl %ebp,%edi > > + addl 0(%rsp),%edx > > + punpcklqdq %xmm7,%xmm9 > > + xorl %ebx,%eax > > + roll $5,%ebp > > + pxor %xmm1,%xmm0 > > + addl %esi,%edx > > + andl %eax,%edi > > + movdqa %xmm10,%xmm8 > > + xorl %ebx,%eax > > + paddd %xmm7,%xmm10 > > + addl %ebp,%edx > > + pxor %xmm9,%xmm0 > > + rorl $7,%ebp > > + xorl %ebx,%edi > > + movl %edx,%esi > > + addl 4(%rsp),%ecx > > + movdqa %xmm0,%xmm9 > > + xorl %eax,%ebp > > + roll $5,%edx > > + movdqa %xmm10,48(%rsp) > > + addl %edi,%ecx > > + andl %ebp,%esi > > + xorl %eax,%ebp > > + pslld $2,%xmm0 > > + addl %edx,%ecx > > + rorl $7,%edx > > + psrld $30,%xmm9 > > + xorl %eax,%esi > > + movl %ecx,%edi > > + addl 8(%rsp),%ebx > > + por %xmm9,%xmm0 > > + xorl %ebp,%edx > > + roll $5,%ecx > > + pshufd $238,%xmm7,%xmm10 > > + addl %esi,%ebx > > + andl %edx,%edi > > + xorl %ebp,%edx > > + addl %ecx,%ebx > > + addl 12(%rsp),%eax > > + xorl %ebp,%edi > > + movl %ebx,%esi > > + roll $5,%ebx > > + addl %edi,%eax > > + xorl %edx,%esi > > + rorl $7,%ecx > > + addl %ebx,%eax > > + pxor %xmm5,%xmm1 > > + addl 16(%rsp),%ebp > > + xorl %ecx,%esi > > + punpcklqdq %xmm0,%xmm10 > > + movl %eax,%edi > > + roll $5,%eax > > + pxor %xmm2,%xmm1 > > + addl %esi,%ebp > > + xorl %ecx,%edi > > + movdqa %xmm8,%xmm9 > > + rorl $7,%ebx > > + paddd %xmm0,%xmm8 > > + addl %eax,%ebp > > + pxor %xmm10,%xmm1 > > + addl 20(%rsp),%edx > > + xorl %ebx,%edi > > + movl %ebp,%esi > > + roll $5,%ebp > > + movdqa %xmm1,%xmm10 > > + addl %edi,%edx > > + xorl %ebx,%esi > > + movdqa %xmm8,0(%rsp) > > + rorl $7,%eax > > + addl %ebp,%edx > > + addl 24(%rsp),%ecx > > + pslld $2,%xmm1 > > + xorl %eax,%esi > > + movl 
%edx,%edi > > + psrld $30,%xmm10 > > + roll $5,%edx > > + addl %esi,%ecx > > + xorl %eax,%edi > > + rorl $7,%ebp > > + por %xmm10,%xmm1 > > + addl %edx,%ecx > > + addl 28(%rsp),%ebx > > + pshufd $238,%xmm0,%xmm8 > > + xorl %ebp,%edi > > + movl %ecx,%esi > > + roll $5,%ecx > > + addl %edi,%ebx > > + xorl %ebp,%esi > > + rorl $7,%edx > > + addl %ecx,%ebx > > + pxor %xmm6,%xmm2 > > + addl 32(%rsp),%eax > > + xorl %edx,%esi > > + punpcklqdq %xmm1,%xmm8 > > + movl %ebx,%edi > > + roll $5,%ebx > > + pxor %xmm3,%xmm2 > > + addl %esi,%eax > > + xorl %edx,%edi > > + movdqa 0(%r14),%xmm10 > > + rorl $7,%ecx > > + paddd %xmm1,%xmm9 > > + addl %ebx,%eax > > + pxor %xmm8,%xmm2 > > + addl 36(%rsp),%ebp > > + xorl %ecx,%edi > > + movl %eax,%esi > > + roll $5,%eax > > + movdqa %xmm2,%xmm8 > > + addl %edi,%ebp > > + xorl %ecx,%esi > > + movdqa %xmm9,16(%rsp) > > + rorl $7,%ebx > > + addl %eax,%ebp > > + addl 40(%rsp),%edx > > + pslld $2,%xmm2 > > + xorl %ebx,%esi > > + movl %ebp,%edi > > + psrld $30,%xmm8 > > + roll $5,%ebp > > + addl %esi,%edx > > + xorl %ebx,%edi > > + rorl $7,%eax > > + por %xmm8,%xmm2 > > + addl %ebp,%edx > > + addl 44(%rsp),%ecx > > + pshufd $238,%xmm1,%xmm9 > > + xorl %eax,%edi > > + movl %edx,%esi > > + roll $5,%edx > > + addl %edi,%ecx > > + xorl %eax,%esi > > + rorl $7,%ebp > > + addl %edx,%ecx > > + pxor %xmm7,%xmm3 > > + addl 48(%rsp),%ebx > > + xorl %ebp,%esi > > + punpcklqdq %xmm2,%xmm9 > > + movl %ecx,%edi > > + roll $5,%ecx > > + pxor %xmm4,%xmm3 > > + addl %esi,%ebx > > + xorl %ebp,%edi > > + movdqa %xmm10,%xmm8 > > + rorl $7,%edx > > + paddd %xmm2,%xmm10 > > + addl %ecx,%ebx > > + pxor %xmm9,%xmm3 > > + addl 52(%rsp),%eax > > + xorl %edx,%edi > > + movl %ebx,%esi > > + roll $5,%ebx > > + movdqa %xmm3,%xmm9 > > + addl %edi,%eax > > + xorl %edx,%esi > > + movdqa %xmm10,32(%rsp) > > + rorl $7,%ecx > > + addl %ebx,%eax > > + addl 56(%rsp),%ebp > > + pslld $2,%xmm3 > > + xorl %ecx,%esi > > + movl %eax,%edi > > + psrld $30,%xmm9 > > + roll $5,%eax > > + addl %esi,%ebp > > + xorl %ecx,%edi > > + rorl $7,%ebx > > + por %xmm9,%xmm3 > > + addl %eax,%ebp > > + addl 60(%rsp),%edx > > + pshufd $238,%xmm2,%xmm10 > > + xorl %ebx,%edi > > + movl %ebp,%esi > > + roll $5,%ebp > > + addl %edi,%edx > > + xorl %ebx,%esi > > + rorl $7,%eax > > + addl %ebp,%edx > > + pxor %xmm0,%xmm4 > > + addl 0(%rsp),%ecx > > + xorl %eax,%esi > > + punpcklqdq %xmm3,%xmm10 > > + movl %edx,%edi > > + roll $5,%edx > > + pxor %xmm5,%xmm4 > > + addl %esi,%ecx > > + xorl %eax,%edi > > + movdqa %xmm8,%xmm9 > > + rorl $7,%ebp > > + paddd %xmm3,%xmm8 > > + addl %edx,%ecx > > + pxor %xmm10,%xmm4 > > + addl 4(%rsp),%ebx > > + xorl %ebp,%edi > > + movl %ecx,%esi > > + roll $5,%ecx > > + movdqa %xmm4,%xmm10 > > + addl %edi,%ebx > > + xorl %ebp,%esi > > + movdqa %xmm8,48(%rsp) > > + rorl $7,%edx > > + addl %ecx,%ebx > > + addl 8(%rsp),%eax > > + pslld $2,%xmm4 > > + xorl %edx,%esi > > + movl %ebx,%edi > > + psrld $30,%xmm10 > > + roll $5,%ebx > > + addl %esi,%eax > > + xorl %edx,%edi > > + rorl $7,%ecx > > + por %xmm10,%xmm4 > > + addl %ebx,%eax > > + addl 12(%rsp),%ebp > > + pshufd $238,%xmm3,%xmm8 > > + xorl %ecx,%edi > > + movl %eax,%esi > > + roll $5,%eax > > + addl %edi,%ebp > > + xorl %ecx,%esi > > + rorl $7,%ebx > > + addl %eax,%ebp > > + pxor %xmm1,%xmm5 > > + addl 16(%rsp),%edx > > + xorl %ebx,%esi > > + punpcklqdq %xmm4,%xmm8 > > + movl %ebp,%edi > > + roll $5,%ebp > > + pxor %xmm6,%xmm5 > > + addl %esi,%edx > > + xorl %ebx,%edi > > + movdqa %xmm9,%xmm10 > > + rorl $7,%eax > > + paddd %xmm4,%xmm9 > > + addl 
%ebp,%edx > > + pxor %xmm8,%xmm5 > > + addl 20(%rsp),%ecx > > + xorl %eax,%edi > > + movl %edx,%esi > > + roll $5,%edx > > + movdqa %xmm5,%xmm8 > > + addl %edi,%ecx > > + xorl %eax,%esi > > + movdqa %xmm9,0(%rsp) > > + rorl $7,%ebp > > + addl %edx,%ecx > > + addl 24(%rsp),%ebx > > + pslld $2,%xmm5 > > + xorl %ebp,%esi > > + movl %ecx,%edi > > + psrld $30,%xmm8 > > + roll $5,%ecx > > + addl %esi,%ebx > > + xorl %ebp,%edi > > + rorl $7,%edx > > + por %xmm8,%xmm5 > > + addl %ecx,%ebx > > + addl 28(%rsp),%eax > > + pshufd $238,%xmm4,%xmm9 > > + rorl $7,%ecx > > + movl %ebx,%esi > > + xorl %edx,%edi > > + roll $5,%ebx > > + addl %edi,%eax > > + xorl %ecx,%esi > > + xorl %edx,%ecx > > + addl %ebx,%eax > > + pxor %xmm2,%xmm6 > > + addl 32(%rsp),%ebp > > + andl %ecx,%esi > > + xorl %edx,%ecx > > + rorl $7,%ebx > > + punpcklqdq %xmm5,%xmm9 > > + movl %eax,%edi > > + xorl %ecx,%esi > > + pxor %xmm7,%xmm6 > > + roll $5,%eax > > + addl %esi,%ebp > > + movdqa %xmm10,%xmm8 > > + xorl %ebx,%edi > > + paddd %xmm5,%xmm10 > > + xorl %ecx,%ebx > > + pxor %xmm9,%xmm6 > > + addl %eax,%ebp > > + addl 36(%rsp),%edx > > + andl %ebx,%edi > > + xorl %ecx,%ebx > > + rorl $7,%eax > > + movdqa %xmm6,%xmm9 > > + movl %ebp,%esi > > + xorl %ebx,%edi > > + movdqa %xmm10,16(%rsp) > > + roll $5,%ebp > > + addl %edi,%edx > > + xorl %eax,%esi > > + pslld $2,%xmm6 > > + xorl %ebx,%eax > > + addl %ebp,%edx > > + psrld $30,%xmm9 > > + addl 40(%rsp),%ecx > > + andl %eax,%esi > > + xorl %ebx,%eax > > + por %xmm9,%xmm6 > > + rorl $7,%ebp > > + movl %edx,%edi > > + xorl %eax,%esi > > + roll $5,%edx > > + pshufd $238,%xmm5,%xmm10 > > + addl %esi,%ecx > > + xorl %ebp,%edi > > + xorl %eax,%ebp > > + addl %edx,%ecx > > + addl 44(%rsp),%ebx > > + andl %ebp,%edi > > + xorl %eax,%ebp > > + rorl $7,%edx > > + movl %ecx,%esi > > + xorl %ebp,%edi > > + roll $5,%ecx > > + addl %edi,%ebx > > + xorl %edx,%esi > > + xorl %ebp,%edx > > + addl %ecx,%ebx > > + pxor %xmm3,%xmm7 > > + addl 48(%rsp),%eax > > + andl %edx,%esi > > + xorl %ebp,%edx > > + rorl $7,%ecx > > + punpcklqdq %xmm6,%xmm10 > > + movl %ebx,%edi > > + xorl %edx,%esi > > + pxor %xmm0,%xmm7 > > + roll $5,%ebx > > + addl %esi,%eax > > + movdqa 32(%r14),%xmm9 > > + xorl %ecx,%edi > > + paddd %xmm6,%xmm8 > > + xorl %edx,%ecx > > + pxor %xmm10,%xmm7 > > + addl %ebx,%eax > > + addl 52(%rsp),%ebp > > + andl %ecx,%edi > > + xorl %edx,%ecx > > + rorl $7,%ebx > > + movdqa %xmm7,%xmm10 > > + movl %eax,%esi > > + xorl %ecx,%edi > > + movdqa %xmm8,32(%rsp) > > + roll $5,%eax > > + addl %edi,%ebp > > + xorl %ebx,%esi > > + pslld $2,%xmm7 > > + xorl %ecx,%ebx > > + addl %eax,%ebp > > + psrld $30,%xmm10 > > + addl 56(%rsp),%edx > > + andl %ebx,%esi > > + xorl %ecx,%ebx > > + por %xmm10,%xmm7 > > + rorl $7,%eax > > + movl %ebp,%edi > > + xorl %ebx,%esi > > + roll $5,%ebp > > + pshufd $238,%xmm6,%xmm8 > > + addl %esi,%edx > > + xorl %eax,%edi > > + xorl %ebx,%eax > > + addl %ebp,%edx > > + addl 60(%rsp),%ecx > > + andl %eax,%edi > > + xorl %ebx,%eax > > + rorl $7,%ebp > > + movl %edx,%esi > > + xorl %eax,%edi > > + roll $5,%edx > > + addl %edi,%ecx > > + xorl %ebp,%esi > > + xorl %eax,%ebp > > + addl %edx,%ecx > > + pxor %xmm4,%xmm0 > > + addl 0(%rsp),%ebx > > + andl %ebp,%esi > > + xorl %eax,%ebp > > + rorl $7,%edx > > + punpcklqdq %xmm7,%xmm8 > > + movl %ecx,%edi > > + xorl %ebp,%esi > > + pxor %xmm1,%xmm0 > > + roll $5,%ecx > > + addl %esi,%ebx > > + movdqa %xmm9,%xmm10 > > + xorl %edx,%edi > > + paddd %xmm7,%xmm9 > > + xorl %ebp,%edx > > + pxor %xmm8,%xmm0 > > + addl %ecx,%ebx > > + addl 
4(%rsp),%eax > > + andl %edx,%edi > > + xorl %ebp,%edx > > + rorl $7,%ecx > > + movdqa %xmm0,%xmm8 > > + movl %ebx,%esi > > + xorl %edx,%edi > > + movdqa %xmm9,48(%rsp) > > + roll $5,%ebx > > + addl %edi,%eax > > + xorl %ecx,%esi > > + pslld $2,%xmm0 > > + xorl %edx,%ecx > > + addl %ebx,%eax > > + psrld $30,%xmm8 > > + addl 8(%rsp),%ebp > > + andl %ecx,%esi > > + xorl %edx,%ecx > > + por %xmm8,%xmm0 > > + rorl $7,%ebx > > + movl %eax,%edi > > + xorl %ecx,%esi > > + roll $5,%eax > > + pshufd $238,%xmm7,%xmm9 > > + addl %esi,%ebp > > + xorl %ebx,%edi > > + xorl %ecx,%ebx > > + addl %eax,%ebp > > + addl 12(%rsp),%edx > > + andl %ebx,%edi > > + xorl %ecx,%ebx > > + rorl $7,%eax > > + movl %ebp,%esi > > + xorl %ebx,%edi > > + roll $5,%ebp > > + addl %edi,%edx > > + xorl %eax,%esi > > + xorl %ebx,%eax > > + addl %ebp,%edx > > + pxor %xmm5,%xmm1 > > + addl 16(%rsp),%ecx > > + andl %eax,%esi > > + xorl %ebx,%eax > > + rorl $7,%ebp > > + punpcklqdq %xmm0,%xmm9 > > + movl %edx,%edi > > + xorl %eax,%esi > > + pxor %xmm2,%xmm1 > > + roll $5,%edx > > + addl %esi,%ecx > > + movdqa %xmm10,%xmm8 > > + xorl %ebp,%edi > > + paddd %xmm0,%xmm10 > > + xorl %eax,%ebp > > + pxor %xmm9,%xmm1 > > + addl %edx,%ecx > > + addl 20(%rsp),%ebx > > + andl %ebp,%edi > > + xorl %eax,%ebp > > + rorl $7,%edx > > + movdqa %xmm1,%xmm9 > > + movl %ecx,%esi > > + xorl %ebp,%edi > > + movdqa %xmm10,0(%rsp) > > + roll $5,%ecx > > + addl %edi,%ebx > > + xorl %edx,%esi > > + pslld $2,%xmm1 > > + xorl %ebp,%edx > > + addl %ecx,%ebx > > + psrld $30,%xmm9 > > + addl 24(%rsp),%eax > > + andl %edx,%esi > > + xorl %ebp,%edx > > + por %xmm9,%xmm1 > > + rorl $7,%ecx > > + movl %ebx,%edi > > + xorl %edx,%esi > > + roll $5,%ebx > > + pshufd $238,%xmm0,%xmm10 > > + addl %esi,%eax > > + xorl %ecx,%edi > > + xorl %edx,%ecx > > + addl %ebx,%eax > > + addl 28(%rsp),%ebp > > + andl %ecx,%edi > > + xorl %edx,%ecx > > + rorl $7,%ebx > > + movl %eax,%esi > > + xorl %ecx,%edi > > + roll $5,%eax > > + addl %edi,%ebp > > + xorl %ebx,%esi > > + xorl %ecx,%ebx > > + addl %eax,%ebp > > + pxor %xmm6,%xmm2 > > + addl 32(%rsp),%edx > > + andl %ebx,%esi > > + xorl %ecx,%ebx > > + rorl $7,%eax > > + punpcklqdq %xmm1,%xmm10 > > + movl %ebp,%edi > > + xorl %ebx,%esi > > + pxor %xmm3,%xmm2 > > + roll $5,%ebp > > + addl %esi,%edx > > + movdqa %xmm8,%xmm9 > > + xorl %eax,%edi > > + paddd %xmm1,%xmm8 > > + xorl %ebx,%eax > > + pxor %xmm10,%xmm2 > > + addl %ebp,%edx > > + addl 36(%rsp),%ecx > > + andl %eax,%edi > > + xorl %ebx,%eax > > + rorl $7,%ebp > > + movdqa %xmm2,%xmm10 > > + movl %edx,%esi > > + xorl %eax,%edi > > + movdqa %xmm8,16(%rsp) > > + roll $5,%edx > > + addl %edi,%ecx > > + xorl %ebp,%esi > > + pslld $2,%xmm2 > > + xorl %eax,%ebp > > + addl %edx,%ecx > > + psrld $30,%xmm10 > > + addl 40(%rsp),%ebx > > + andl %ebp,%esi > > + xorl %eax,%ebp > > + por %xmm10,%xmm2 > > + rorl $7,%edx > > + movl %ecx,%edi > > + xorl %ebp,%esi > > + roll $5,%ecx > > + pshufd $238,%xmm1,%xmm8 > > + addl %esi,%ebx > > + xorl %edx,%edi > > + xorl %ebp,%edx > > + addl %ecx,%ebx > > + addl 44(%rsp),%eax > > + andl %edx,%edi > > + xorl %ebp,%edx > > + rorl $7,%ecx > > + movl %ebx,%esi > > + xorl %edx,%edi > > + roll $5,%ebx > > + addl %edi,%eax > > + xorl %edx,%esi > > + addl %ebx,%eax > > + pxor %xmm7,%xmm3 > > + addl 48(%rsp),%ebp > > + xorl %ecx,%esi > > + punpcklqdq %xmm2,%xmm8 > > + movl %eax,%edi > > + roll $5,%eax > > + pxor %xmm4,%xmm3 > > + addl %esi,%ebp > > + xorl %ecx,%edi > > + movdqa %xmm9,%xmm10 > > + rorl $7,%ebx > > + paddd %xmm2,%xmm9 > > + addl %eax,%ebp > > + 
pxor %xmm8,%xmm3 > > + addl 52(%rsp),%edx > > + xorl %ebx,%edi > > + movl %ebp,%esi > > + roll $5,%ebp > > + movdqa %xmm3,%xmm8 > > + addl %edi,%edx > > + xorl %ebx,%esi > > + movdqa %xmm9,32(%rsp) > > + rorl $7,%eax > > + addl %ebp,%edx > > + addl 56(%rsp),%ecx > > + pslld $2,%xmm3 > > + xorl %eax,%esi > > + movl %edx,%edi > > + psrld $30,%xmm8 > > + roll $5,%edx > > + addl %esi,%ecx > > + xorl %eax,%edi > > + rorl $7,%ebp > > + por %xmm8,%xmm3 > > + addl %edx,%ecx > > + addl 60(%rsp),%ebx > > + xorl %ebp,%edi > > + movl %ecx,%esi > > + roll $5,%ecx > > + addl %edi,%ebx > > + xorl %ebp,%esi > > + rorl $7,%edx > > + addl %ecx,%ebx > > + addl 0(%rsp),%eax > > + xorl %edx,%esi > > + movl %ebx,%edi > > + roll $5,%ebx > > + paddd %xmm3,%xmm10 > > + addl %esi,%eax > > + xorl %edx,%edi > > + movdqa %xmm10,48(%rsp) > > + rorl $7,%ecx > > + addl %ebx,%eax > > + addl 4(%rsp),%ebp > > + xorl %ecx,%edi > > + movl %eax,%esi > > + roll $5,%eax > > + addl %edi,%ebp > > + xorl %ecx,%esi > > + rorl $7,%ebx > > + addl %eax,%ebp > > + addl 8(%rsp),%edx > > + xorl %ebx,%esi > > + movl %ebp,%edi > > + roll $5,%ebp > > + addl %esi,%edx > > + xorl %ebx,%edi > > + rorl $7,%eax > > + addl %ebp,%edx > > + addl 12(%rsp),%ecx > > + xorl %eax,%edi > > + movl %edx,%esi > > + roll $5,%edx > > + addl %edi,%ecx > > + xorl %eax,%esi > > + rorl $7,%ebp > > + addl %edx,%ecx > > + cmpq %r10,%r9 > > + je .Ldone_ssse3 > > + movdqa 64(%r14),%xmm6 > > + movdqa -64(%r14),%xmm9 > > + movdqu 0(%r9),%xmm0 > > + movdqu 16(%r9),%xmm1 > > + movdqu 32(%r9),%xmm2 > > + movdqu 48(%r9),%xmm3 > > +.byte 102,15,56,0,198 > > + addq $64,%r9 > > + addl 16(%rsp),%ebx > > + xorl %ebp,%esi > > + movl %ecx,%edi > > +.byte 102,15,56,0,206 > > + roll $5,%ecx > > + addl %esi,%ebx > > + xorl %ebp,%edi > > + rorl $7,%edx > > + paddd %xmm9,%xmm0 > > + addl %ecx,%ebx > > + addl 20(%rsp),%eax > > + xorl %edx,%edi > > + movl %ebx,%esi > > + movdqa %xmm0,0(%rsp) > > + roll $5,%ebx > > + addl %edi,%eax > > + xorl %edx,%esi > > + rorl $7,%ecx > > + psubd %xmm9,%xmm0 > > + addl %ebx,%eax > > + addl 24(%rsp),%ebp > > + xorl %ecx,%esi > > + movl %eax,%edi > > + roll $5,%eax > > + addl %esi,%ebp > > + xorl %ecx,%edi > > + rorl $7,%ebx > > + addl %eax,%ebp > > + addl 28(%rsp),%edx > > + xorl %ebx,%edi > > + movl %ebp,%esi > > + roll $5,%ebp > > + addl %edi,%edx > > + xorl %ebx,%esi > > + rorl $7,%eax > > + addl %ebp,%edx > > + addl 32(%rsp),%ecx > > + xorl %eax,%esi > > + movl %edx,%edi > > +.byte 102,15,56,0,214 > > + roll $5,%edx > > + addl %esi,%ecx > > + xorl %eax,%edi > > + rorl $7,%ebp > > + paddd %xmm9,%xmm1 > > + addl %edx,%ecx > > + addl 36(%rsp),%ebx > > + xorl %ebp,%edi > > + movl %ecx,%esi > > + movdqa %xmm1,16(%rsp) > > + roll $5,%ecx > > + addl %edi,%ebx > > + xorl %ebp,%esi > > + rorl $7,%edx > > + psubd %xmm9,%xmm1 > > + addl %ecx,%ebx > > + addl 40(%rsp),%eax > > + xorl %edx,%esi > > + movl %ebx,%edi > > + roll $5,%ebx > > + addl %esi,%eax > > + xorl %edx,%edi > > + rorl $7,%ecx > > + addl %ebx,%eax > > + addl 44(%rsp),%ebp > > + xorl %ecx,%edi > > + movl %eax,%esi > > + roll $5,%eax > > + addl %edi,%ebp > > + xorl %ecx,%esi > > + rorl $7,%ebx > > + addl %eax,%ebp > > + addl 48(%rsp),%edx > > + xorl %ebx,%esi > > + movl %ebp,%edi > > +.byte 102,15,56,0,222 > > + roll $5,%ebp > > + addl %esi,%edx > > + xorl %ebx,%edi > > + rorl $7,%eax > > + paddd %xmm9,%xmm2 > > + addl %ebp,%edx > > + addl 52(%rsp),%ecx > > + xorl %eax,%edi > > + movl %edx,%esi > > + movdqa %xmm2,32(%rsp) > > + roll $5,%edx > > + addl %edi,%ecx > > + xorl %eax,%esi > > + rorl 
$7,%ebp > > + psubd %xmm9,%xmm2 > > + addl %edx,%ecx > > + addl 56(%rsp),%ebx > > + xorl %ebp,%esi > > + movl %ecx,%edi > > + roll $5,%ecx > > + addl %esi,%ebx > > + xorl %ebp,%edi > > + rorl $7,%edx > > + addl %ecx,%ebx > > + addl 60(%rsp),%eax > > + xorl %edx,%edi > > + movl %ebx,%esi > > + roll $5,%ebx > > + addl %edi,%eax > > + rorl $7,%ecx > > + addl %ebx,%eax > > + addl 0(%r8),%eax > > + addl 4(%r8),%esi > > + addl 8(%r8),%ecx > > + addl 12(%r8),%edx > > + movl %eax,0(%r8) > > + addl 16(%r8),%ebp > > + movl %esi,4(%r8) > > + movl %esi,%ebx > > + movl %ecx,8(%r8) > > + movl %ecx,%edi > > + movl %edx,12(%r8) > > + xorl %edx,%edi > > + movl %ebp,16(%r8) > > + andl %edi,%esi > > + jmp .Loop_ssse3 > > + > > +.align 16 > > +.Ldone_ssse3: > > + addl 16(%rsp),%ebx > > + xorl %ebp,%esi > > + movl %ecx,%edi > > + roll $5,%ecx > > + addl %esi,%ebx > > + xorl %ebp,%edi > > + rorl $7,%edx > > + addl %ecx,%ebx > > + addl 20(%rsp),%eax > > + xorl %edx,%edi > > + movl %ebx,%esi > > + roll $5,%ebx > > + addl %edi,%eax > > + xorl %edx,%esi > > + rorl $7,%ecx > > + addl %ebx,%eax > > + addl 24(%rsp),%ebp > > + xorl %ecx,%esi > > + movl %eax,%edi > > + roll $5,%eax > > + addl %esi,%ebp > > + xorl %ecx,%edi > > + rorl $7,%ebx > > + addl %eax,%ebp > > + addl 28(%rsp),%edx > > + xorl %ebx,%edi > > + movl %ebp,%esi > > + roll $5,%ebp > > + addl %edi,%edx > > + xorl %ebx,%esi > > + rorl $7,%eax > > + addl %ebp,%edx > > + addl 32(%rsp),%ecx > > + xorl %eax,%esi > > + movl %edx,%edi > > + roll $5,%edx > > + addl %esi,%ecx > > + xorl %eax,%edi > > + rorl $7,%ebp > > + addl %edx,%ecx > > + addl 36(%rsp),%ebx > > + xorl %ebp,%edi > > + movl %ecx,%esi > > + roll $5,%ecx > > + addl %edi,%ebx > > + xorl %ebp,%esi > > + rorl $7,%edx > > + addl %ecx,%ebx > > + addl 40(%rsp),%eax > > + xorl %edx,%esi > > + movl %ebx,%edi > > + roll $5,%ebx > > + addl %esi,%eax > > + xorl %edx,%edi > > + rorl $7,%ecx > > + addl %ebx,%eax > > + addl 44(%rsp),%ebp > > + xorl %ecx,%edi > > + movl %eax,%esi > > + roll $5,%eax > > + addl %edi,%ebp > > + xorl %ecx,%esi > > + rorl $7,%ebx > > + addl %eax,%ebp > > + addl 48(%rsp),%edx > > + xorl %ebx,%esi > > + movl %ebp,%edi > > + roll $5,%ebp > > + addl %esi,%edx > > + xorl %ebx,%edi > > + rorl $7,%eax > > + addl %ebp,%edx > > + addl 52(%rsp),%ecx > > + xorl %eax,%edi > > + movl %edx,%esi > > + roll $5,%edx > > + addl %edi,%ecx > > + xorl %eax,%esi > > + rorl $7,%ebp > > + addl %edx,%ecx > > + addl 56(%rsp),%ebx > > + xorl %ebp,%esi > > + movl %ecx,%edi > > + roll $5,%ecx > > + addl %esi,%ebx > > + xorl %ebp,%edi > > + rorl $7,%edx > > + addl %ecx,%ebx > > + addl 60(%rsp),%eax > > + xorl %edx,%edi > > + movl %ebx,%esi > > + roll $5,%ebx > > + addl %edi,%eax > > + rorl $7,%ecx > > + addl %ebx,%eax > > + addl 0(%r8),%eax > > + addl 4(%r8),%esi > > + addl 8(%r8),%ecx > > + movl %eax,0(%r8) > > + addl 12(%r8),%edx > > + movl %esi,4(%r8) > > + addl 16(%r8),%ebp > > + movl %ecx,8(%r8) > > + movl %edx,12(%r8) > > + movl %ebp,16(%r8) > > + movq -40(%r11),%r14 > > +.cfi_restore %r14 > > + movq -32(%r11),%r13 > > +.cfi_restore %r13 > > + movq -24(%r11),%r12 > > +.cfi_restore %r12 > > + movq -16(%r11),%rbp > > +.cfi_restore %rbp > > + movq -8(%r11),%rbx > > +.cfi_restore %rbx > > + leaq (%r11),%rsp > > +.cfi_def_cfa_register %rsp > > +.Lepilogue_ssse3: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size sha1_block_data_order_ssse3,.-sha1_block_data_order_ssse3 > > +.align 64 > > +K_XX_XX: > > +.long 0x5a827999,0x5a827999,0x5a827999,0x5a827999 > > +.long 0x5a827999,0x5a827999,0x5a827999,0x5a827999 > > 
+.long 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > > +.long 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1 > > +.long 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > > +.long 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc > > +.long 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > > +.long 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6 > > +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > +.byte 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0 > > +.byte > > > 83,72,65,49,32,98,108,111,99,107,32,116,114,97,110,115,102,111,114,109,32, > > > 102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98 > , > > > 121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,1 > 0 > > 3,62,0 > > +.align 64 > > diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-mb- > > x86_64.S b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-mb- > > x86_64.S > > new file mode 100644 > > index 0000000000..25dee488b8 > > --- /dev/null > > +++ > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-mb-x86_64.S > > @@ -0,0 +1,3286 @@ > > +# WARNING: do not edit! > > +# Generated from openssl/crypto/sha/asm/sha256-mb-x86_64.pl > > +# > > +# Copyright 2013-2020 The OpenSSL Project Authors. All Rights Reserved. > > +# > > +# Licensed under the OpenSSL license (the "License"). You may not use > > +# this file except in compliance with the License. You can obtain a copy > > +# in the file LICENSE in the source distribution or at > > +# https://www.openssl.org/source/license.html > > + > > +.text > > + > > + > > + > > +.globl sha256_multi_block > > +.type sha256_multi_block,@function > > +.align 32 > > +sha256_multi_block: > > +.cfi_startproc > > + movq OPENSSL_ia32cap_P+4(%rip),%rcx > > + btq $61,%rcx > > + jc _shaext_shortcut > > + movq %rsp,%rax > > +.cfi_def_cfa_register %rax > > + pushq %rbx > > +.cfi_offset %rbx,-16 > > + pushq %rbp > > +.cfi_offset %rbp,-24 > > + subq $288,%rsp > > + andq $-256,%rsp > > + movq %rax,272(%rsp) > > +.cfi_escape 0x0f,0x06,0x77,0x90,0x02,0x06,0x23,0x08 > > +.Lbody: > > + leaq K256+128(%rip),%rbp > > + leaq 256(%rsp),%rbx > > + leaq 128(%rdi),%rdi > > + > > +.Loop_grande: > > + movl %edx,280(%rsp) > > + xorl %edx,%edx > > + movq 0(%rsi),%r8 > > + movl 8(%rsi),%ecx > > + cmpl %edx,%ecx > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movl %ecx,0(%rbx) > > + cmovleq %rbp,%r8 > > + movq 16(%rsi),%r9 > > + movl 24(%rsi),%ecx > > + cmpl %edx,%ecx > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movl %ecx,4(%rbx) > > + cmovleq %rbp,%r9 > > + movq 32(%rsi),%r10 > > + movl 40(%rsi),%ecx > > + cmpl %edx,%ecx > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movl %ecx,8(%rbx) > > + cmovleq %rbp,%r10 > > + movq 48(%rsi),%r11 > > + movl 56(%rsi),%ecx > > + cmpl %edx,%ecx > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movl %ecx,12(%rbx) > > + cmovleq %rbp,%r11 > > + testl %edx,%edx > > + jz .Ldone > > + > > + movdqu 0-128(%rdi),%xmm8 > > + leaq 128(%rsp),%rax > > + movdqu 32-128(%rdi),%xmm9 > > + movdqu 64-128(%rdi),%xmm10 > > + movdqu 96-128(%rdi),%xmm11 > > + movdqu 128-128(%rdi),%xmm12 > > + movdqu 160-128(%rdi),%xmm13 > > + movdqu 192-128(%rdi),%xmm14 > > + movdqu 224-128(%rdi),%xmm15 > > + movdqu .Lpbswap(%rip),%xmm6 > > + jmp .Loop > > + > > +.align 32 > > +.Loop: > > + movdqa %xmm10,%xmm4 > > + pxor %xmm9,%xmm4 > > + movd 0(%r8),%xmm5 > > + movd 0(%r9),%xmm0 > > + movd 0(%r10),%xmm1 > > + movd 0(%r11),%xmm2 > > + punpckldq %xmm1,%xmm5 > > + punpckldq %xmm2,%xmm0 > > + 
punpckldq %xmm0,%xmm5 > > + movdqa %xmm12,%xmm7 > > +.byte 102,15,56,0,238 > > + movdqa %xmm12,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm12,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,0-128(%rax) > > + paddd %xmm15,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd -128(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm12,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm12,%xmm3 > > + pslld $26-21,%xmm2 > > + pandn %xmm14,%xmm0 > > + pand %xmm13,%xmm3 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm8,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm8,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + movdqa %xmm9,%xmm3 > > + movdqa %xmm8,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm8,%xmm3 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm3,%xmm4 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm9,%xmm15 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm4,%xmm15 > > + paddd %xmm5,%xmm11 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm15 > > + paddd %xmm7,%xmm15 > > + movd 4(%r8),%xmm5 > > + movd 4(%r9),%xmm0 > > + movd 4(%r10),%xmm1 > > + movd 4(%r11),%xmm2 > > + punpckldq %xmm1,%xmm5 > > + punpckldq %xmm2,%xmm0 > > + punpckldq %xmm0,%xmm5 > > + movdqa %xmm11,%xmm7 > > + > > + movdqa %xmm11,%xmm2 > > +.byte 102,15,56,0,238 > > + psrld $6,%xmm7 > > + movdqa %xmm11,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,16-128(%rax) > > + paddd %xmm14,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd -96(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm11,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm11,%xmm4 > > + pslld $26-21,%xmm2 > > + pandn %xmm13,%xmm0 > > + pand %xmm12,%xmm4 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm15,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm15,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm4,%xmm0 > > + movdqa %xmm8,%xmm4 > > + movdqa %xmm15,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm15,%xmm4 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm4,%xmm3 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm8,%xmm14 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm3,%xmm14 > > + paddd %xmm5,%xmm10 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm14 > > + paddd %xmm7,%xmm14 > > + movd 8(%r8),%xmm5 > > + movd 8(%r9),%xmm0 > > + movd 8(%r10),%xmm1 > > + movd 8(%r11),%xmm2 > > + punpckldq %xmm1,%xmm5 > > + punpckldq %xmm2,%xmm0 > > + punpckldq %xmm0,%xmm5 > > + movdqa %xmm10,%xmm7 > > +.byte 102,15,56,0,238 > > + movdqa %xmm10,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm10,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,32-128(%rax) > > + paddd %xmm13,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd -64(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm10,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm10,%xmm3 > > + pslld $26-21,%xmm2 > > + pandn %xmm12,%xmm0 > > + pand %xmm11,%xmm3 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm14,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm14,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + movdqa %xmm15,%xmm3 > > + movdqa 
%xmm14,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm14,%xmm3 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm3,%xmm4 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm15,%xmm13 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm4,%xmm13 > > + paddd %xmm5,%xmm9 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm13 > > + paddd %xmm7,%xmm13 > > + movd 12(%r8),%xmm5 > > + movd 12(%r9),%xmm0 > > + movd 12(%r10),%xmm1 > > + movd 12(%r11),%xmm2 > > + punpckldq %xmm1,%xmm5 > > + punpckldq %xmm2,%xmm0 > > + punpckldq %xmm0,%xmm5 > > + movdqa %xmm9,%xmm7 > > + > > + movdqa %xmm9,%xmm2 > > +.byte 102,15,56,0,238 > > + psrld $6,%xmm7 > > + movdqa %xmm9,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,48-128(%rax) > > + paddd %xmm12,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd -32(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm9,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm9,%xmm4 > > + pslld $26-21,%xmm2 > > + pandn %xmm11,%xmm0 > > + pand %xmm10,%xmm4 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm13,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm13,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm4,%xmm0 > > + movdqa %xmm14,%xmm4 > > + movdqa %xmm13,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm13,%xmm4 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm4,%xmm3 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm14,%xmm12 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm3,%xmm12 > > + paddd %xmm5,%xmm8 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm12 > > + paddd %xmm7,%xmm12 > > + movd 16(%r8),%xmm5 > > + movd 16(%r9),%xmm0 > > + movd 16(%r10),%xmm1 > > + movd 16(%r11),%xmm2 > > + punpckldq %xmm1,%xmm5 > > + punpckldq %xmm2,%xmm0 > > + punpckldq %xmm0,%xmm5 > > + movdqa %xmm8,%xmm7 > > +.byte 102,15,56,0,238 > > + movdqa %xmm8,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm8,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,64-128(%rax) > > + paddd %xmm11,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd 0(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm8,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm8,%xmm3 > > + pslld $26-21,%xmm2 > > + pandn %xmm10,%xmm0 > > + pand %xmm9,%xmm3 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm12,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm12,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + movdqa %xmm13,%xmm3 > > + movdqa %xmm12,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm12,%xmm3 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm3,%xmm4 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm13,%xmm11 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm4,%xmm11 > > + paddd %xmm5,%xmm15 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm11 > > + paddd %xmm7,%xmm11 > > + movd 20(%r8),%xmm5 > > + movd 20(%r9),%xmm0 > > + movd 20(%r10),%xmm1 > > + movd 20(%r11),%xmm2 > > + punpckldq %xmm1,%xmm5 > > + punpckldq %xmm2,%xmm0 > > + punpckldq %xmm0,%xmm5 > > + movdqa %xmm15,%xmm7 > > + > > + movdqa %xmm15,%xmm2 > > +.byte 102,15,56,0,238 > > + psrld $6,%xmm7 > > + 
movdqa %xmm15,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,80-128(%rax) > > + paddd %xmm10,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd 32(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm15,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm15,%xmm4 > > + pslld $26-21,%xmm2 > > + pandn %xmm9,%xmm0 > > + pand %xmm8,%xmm4 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm11,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm11,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm4,%xmm0 > > + movdqa %xmm12,%xmm4 > > + movdqa %xmm11,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm11,%xmm4 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm4,%xmm3 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm12,%xmm10 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm3,%xmm10 > > + paddd %xmm5,%xmm14 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm10 > > + paddd %xmm7,%xmm10 > > + movd 24(%r8),%xmm5 > > + movd 24(%r9),%xmm0 > > + movd 24(%r10),%xmm1 > > + movd 24(%r11),%xmm2 > > + punpckldq %xmm1,%xmm5 > > + punpckldq %xmm2,%xmm0 > > + punpckldq %xmm0,%xmm5 > > + movdqa %xmm14,%xmm7 > > +.byte 102,15,56,0,238 > > + movdqa %xmm14,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm14,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,96-128(%rax) > > + paddd %xmm9,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd 64(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm14,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm14,%xmm3 > > + pslld $26-21,%xmm2 > > + pandn %xmm8,%xmm0 > > + pand %xmm15,%xmm3 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm10,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm10,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + movdqa %xmm11,%xmm3 > > + movdqa %xmm10,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm10,%xmm3 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm3,%xmm4 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm11,%xmm9 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm4,%xmm9 > > + paddd %xmm5,%xmm13 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm9 > > + paddd %xmm7,%xmm9 > > + movd 28(%r8),%xmm5 > > + movd 28(%r9),%xmm0 > > + movd 28(%r10),%xmm1 > > + movd 28(%r11),%xmm2 > > + punpckldq %xmm1,%xmm5 > > + punpckldq %xmm2,%xmm0 > > + punpckldq %xmm0,%xmm5 > > + movdqa %xmm13,%xmm7 > > + > > + movdqa %xmm13,%xmm2 > > +.byte 102,15,56,0,238 > > + psrld $6,%xmm7 > > + movdqa %xmm13,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,112-128(%rax) > > + paddd %xmm8,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd 96(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm13,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm13,%xmm4 > > + pslld $26-21,%xmm2 > > + pandn %xmm15,%xmm0 > > + pand %xmm14,%xmm4 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm9,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm9,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm4,%xmm0 > > + movdqa %xmm10,%xmm4 > > + movdqa %xmm9,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm9,%xmm4 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > 
+ pslld $19-10,%xmm2 > > + pand %xmm4,%xmm3 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm10,%xmm8 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm3,%xmm8 > > + paddd %xmm5,%xmm12 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm8 > > + paddd %xmm7,%xmm8 > > + leaq 256(%rbp),%rbp > > + movd 32(%r8),%xmm5 > > + movd 32(%r9),%xmm0 > > + movd 32(%r10),%xmm1 > > + movd 32(%r11),%xmm2 > > + punpckldq %xmm1,%xmm5 > > + punpckldq %xmm2,%xmm0 > > + punpckldq %xmm0,%xmm5 > > + movdqa %xmm12,%xmm7 > > +.byte 102,15,56,0,238 > > + movdqa %xmm12,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm12,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,128-128(%rax) > > + paddd %xmm15,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd -128(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm12,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm12,%xmm3 > > + pslld $26-21,%xmm2 > > + pandn %xmm14,%xmm0 > > + pand %xmm13,%xmm3 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm8,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm8,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + movdqa %xmm9,%xmm3 > > + movdqa %xmm8,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm8,%xmm3 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm3,%xmm4 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm9,%xmm15 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm4,%xmm15 > > + paddd %xmm5,%xmm11 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm15 > > + paddd %xmm7,%xmm15 > > + movd 36(%r8),%xmm5 > > + movd 36(%r9),%xmm0 > > + movd 36(%r10),%xmm1 > > + movd 36(%r11),%xmm2 > > + punpckldq %xmm1,%xmm5 > > + punpckldq %xmm2,%xmm0 > > + punpckldq %xmm0,%xmm5 > > + movdqa %xmm11,%xmm7 > > + > > + movdqa %xmm11,%xmm2 > > +.byte 102,15,56,0,238 > > + psrld $6,%xmm7 > > + movdqa %xmm11,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,144-128(%rax) > > + paddd %xmm14,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd -96(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm11,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm11,%xmm4 > > + pslld $26-21,%xmm2 > > + pandn %xmm13,%xmm0 > > + pand %xmm12,%xmm4 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm15,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm15,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm4,%xmm0 > > + movdqa %xmm8,%xmm4 > > + movdqa %xmm15,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm15,%xmm4 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm4,%xmm3 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm8,%xmm14 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm3,%xmm14 > > + paddd %xmm5,%xmm10 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm14 > > + paddd %xmm7,%xmm14 > > + movd 40(%r8),%xmm5 > > + movd 40(%r9),%xmm0 > > + movd 40(%r10),%xmm1 > > + movd 40(%r11),%xmm2 > > + punpckldq %xmm1,%xmm5 > > + punpckldq %xmm2,%xmm0 > > + punpckldq %xmm0,%xmm5 > > + movdqa %xmm10,%xmm7 > > +.byte 102,15,56,0,238 > > + movdqa %xmm10,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm10,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,160-128(%rax) > > + paddd %xmm13,%xmm5 > > + > > + 
psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd -64(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm10,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm10,%xmm3 > > + pslld $26-21,%xmm2 > > + pandn %xmm12,%xmm0 > > + pand %xmm11,%xmm3 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm14,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm14,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + movdqa %xmm15,%xmm3 > > + movdqa %xmm14,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm14,%xmm3 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm3,%xmm4 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm15,%xmm13 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm4,%xmm13 > > + paddd %xmm5,%xmm9 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm13 > > + paddd %xmm7,%xmm13 > > + movd 44(%r8),%xmm5 > > + movd 44(%r9),%xmm0 > > + movd 44(%r10),%xmm1 > > + movd 44(%r11),%xmm2 > > + punpckldq %xmm1,%xmm5 > > + punpckldq %xmm2,%xmm0 > > + punpckldq %xmm0,%xmm5 > > + movdqa %xmm9,%xmm7 > > + > > + movdqa %xmm9,%xmm2 > > +.byte 102,15,56,0,238 > > + psrld $6,%xmm7 > > + movdqa %xmm9,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,176-128(%rax) > > + paddd %xmm12,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd -32(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm9,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm9,%xmm4 > > + pslld $26-21,%xmm2 > > + pandn %xmm11,%xmm0 > > + pand %xmm10,%xmm4 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm13,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm13,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm4,%xmm0 > > + movdqa %xmm14,%xmm4 > > + movdqa %xmm13,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm13,%xmm4 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm4,%xmm3 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm14,%xmm12 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm3,%xmm12 > > + paddd %xmm5,%xmm8 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm12 > > + paddd %xmm7,%xmm12 > > + movd 48(%r8),%xmm5 > > + movd 48(%r9),%xmm0 > > + movd 48(%r10),%xmm1 > > + movd 48(%r11),%xmm2 > > + punpckldq %xmm1,%xmm5 > > + punpckldq %xmm2,%xmm0 > > + punpckldq %xmm0,%xmm5 > > + movdqa %xmm8,%xmm7 > > +.byte 102,15,56,0,238 > > + movdqa %xmm8,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm8,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,192-128(%rax) > > + paddd %xmm11,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd 0(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm8,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm8,%xmm3 > > + pslld $26-21,%xmm2 > > + pandn %xmm10,%xmm0 > > + pand %xmm9,%xmm3 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm12,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm12,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + movdqa %xmm13,%xmm3 > > + movdqa %xmm12,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm12,%xmm3 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm3,%xmm4 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > 
+ pxor %xmm2,%xmm1 > > + movdqa %xmm13,%xmm11 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm4,%xmm11 > > + paddd %xmm5,%xmm15 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm11 > > + paddd %xmm7,%xmm11 > > + movd 52(%r8),%xmm5 > > + movd 52(%r9),%xmm0 > > + movd 52(%r10),%xmm1 > > + movd 52(%r11),%xmm2 > > + punpckldq %xmm1,%xmm5 > > + punpckldq %xmm2,%xmm0 > > + punpckldq %xmm0,%xmm5 > > + movdqa %xmm15,%xmm7 > > + > > + movdqa %xmm15,%xmm2 > > +.byte 102,15,56,0,238 > > + psrld $6,%xmm7 > > + movdqa %xmm15,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,208-128(%rax) > > + paddd %xmm10,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd 32(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm15,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm15,%xmm4 > > + pslld $26-21,%xmm2 > > + pandn %xmm9,%xmm0 > > + pand %xmm8,%xmm4 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm11,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm11,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm4,%xmm0 > > + movdqa %xmm12,%xmm4 > > + movdqa %xmm11,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm11,%xmm4 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm4,%xmm3 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm12,%xmm10 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm3,%xmm10 > > + paddd %xmm5,%xmm14 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm10 > > + paddd %xmm7,%xmm10 > > + movd 56(%r8),%xmm5 > > + movd 56(%r9),%xmm0 > > + movd 56(%r10),%xmm1 > > + movd 56(%r11),%xmm2 > > + punpckldq %xmm1,%xmm5 > > + punpckldq %xmm2,%xmm0 > > + punpckldq %xmm0,%xmm5 > > + movdqa %xmm14,%xmm7 > > +.byte 102,15,56,0,238 > > + movdqa %xmm14,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm14,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,224-128(%rax) > > + paddd %xmm9,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd 64(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm14,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm14,%xmm3 > > + pslld $26-21,%xmm2 > > + pandn %xmm8,%xmm0 > > + pand %xmm15,%xmm3 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm10,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm10,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + movdqa %xmm11,%xmm3 > > + movdqa %xmm10,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm10,%xmm3 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm3,%xmm4 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm11,%xmm9 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm4,%xmm9 > > + paddd %xmm5,%xmm13 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm9 > > + paddd %xmm7,%xmm9 > > + movd 60(%r8),%xmm5 > > + leaq 64(%r8),%r8 > > + movd 60(%r9),%xmm0 > > + leaq 64(%r9),%r9 > > + movd 60(%r10),%xmm1 > > + leaq 64(%r10),%r10 > > + movd 60(%r11),%xmm2 > > + leaq 64(%r11),%r11 > > + punpckldq %xmm1,%xmm5 > > + punpckldq %xmm2,%xmm0 > > + punpckldq %xmm0,%xmm5 > > + movdqa %xmm13,%xmm7 > > + > > + movdqa %xmm13,%xmm2 > > +.byte 102,15,56,0,238 > > + psrld $6,%xmm7 > > + movdqa %xmm13,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,240-128(%rax) > > + paddd %xmm8,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 
> > + pslld $21-7,%xmm2 > > + paddd 96(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm13,%xmm0 > > + prefetcht0 63(%r8) > > + pxor %xmm2,%xmm7 > > + movdqa %xmm13,%xmm4 > > + pslld $26-21,%xmm2 > > + pandn %xmm15,%xmm0 > > + pand %xmm14,%xmm4 > > + pxor %xmm1,%xmm7 > > + > > + prefetcht0 63(%r9) > > + movdqa %xmm9,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm9,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm4,%xmm0 > > + movdqa %xmm10,%xmm4 > > + movdqa %xmm9,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm9,%xmm4 > > + > > + prefetcht0 63(%r10) > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm4,%xmm3 > > + pxor %xmm7,%xmm1 > > + > > + prefetcht0 63(%r11) > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm10,%xmm8 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm3,%xmm8 > > + paddd %xmm5,%xmm12 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm8 > > + paddd %xmm7,%xmm8 > > + leaq 256(%rbp),%rbp > > + movdqu 0-128(%rax),%xmm5 > > + movl $3,%ecx > > + jmp .Loop_16_xx > > +.align 32 > > +.Loop_16_xx: > > + movdqa 16-128(%rax),%xmm6 > > + paddd 144-128(%rax),%xmm5 > > + > > + movdqa %xmm6,%xmm7 > > + movdqa %xmm6,%xmm1 > > + psrld $3,%xmm7 > > + movdqa %xmm6,%xmm2 > > + > > + psrld $7,%xmm1 > > + movdqa 224-128(%rax),%xmm0 > > + pslld $14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $18-7,%xmm1 > > + movdqa %xmm0,%xmm3 > > + pxor %xmm2,%xmm7 > > + pslld $25-14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $10,%xmm0 > > + movdqa %xmm3,%xmm1 > > + > > + psrld $17,%xmm3 > > + pxor %xmm2,%xmm7 > > + pslld $13,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + psrld $19-17,%xmm3 > > + pxor %xmm1,%xmm0 > > + pslld $15-13,%xmm1 > > + pxor %xmm3,%xmm0 > > + pxor %xmm1,%xmm0 > > + paddd %xmm0,%xmm5 > > + movdqa %xmm12,%xmm7 > > + > > + movdqa %xmm12,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm12,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,0-128(%rax) > > + paddd %xmm15,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd -128(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm12,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm12,%xmm3 > > + pslld $26-21,%xmm2 > > + pandn %xmm14,%xmm0 > > + pand %xmm13,%xmm3 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm8,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm8,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + movdqa %xmm9,%xmm3 > > + movdqa %xmm8,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm8,%xmm3 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm3,%xmm4 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm9,%xmm15 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm4,%xmm15 > > + paddd %xmm5,%xmm11 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm15 > > + paddd %xmm7,%xmm15 > > + movdqa 32-128(%rax),%xmm5 > > + paddd 160-128(%rax),%xmm6 > > + > > + movdqa %xmm5,%xmm7 > > + movdqa %xmm5,%xmm1 > > + psrld $3,%xmm7 > > + movdqa %xmm5,%xmm2 > > + > > + psrld $7,%xmm1 > > + movdqa 240-128(%rax),%xmm0 > > + pslld $14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $18-7,%xmm1 > > + movdqa %xmm0,%xmm4 > > + pxor %xmm2,%xmm7 > > + pslld $25-14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $10,%xmm0 > > + movdqa %xmm4,%xmm1 > > + > > + psrld $17,%xmm4 > > + pxor %xmm2,%xmm7 > > + pslld $13,%xmm1 > > 
+ paddd %xmm7,%xmm6 > > + pxor %xmm4,%xmm0 > > + psrld $19-17,%xmm4 > > + pxor %xmm1,%xmm0 > > + pslld $15-13,%xmm1 > > + pxor %xmm4,%xmm0 > > + pxor %xmm1,%xmm0 > > + paddd %xmm0,%xmm6 > > + movdqa %xmm11,%xmm7 > > + > > + movdqa %xmm11,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm11,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm6,16-128(%rax) > > + paddd %xmm14,%xmm6 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd -96(%rbp),%xmm6 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm11,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm11,%xmm4 > > + pslld $26-21,%xmm2 > > + pandn %xmm13,%xmm0 > > + pand %xmm12,%xmm4 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm15,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm15,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm6 > > + pxor %xmm4,%xmm0 > > + movdqa %xmm8,%xmm4 > > + movdqa %xmm15,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm15,%xmm4 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm6 > > + pslld $19-10,%xmm2 > > + pand %xmm4,%xmm3 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm8,%xmm14 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm3,%xmm14 > > + paddd %xmm6,%xmm10 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm6,%xmm14 > > + paddd %xmm7,%xmm14 > > + movdqa 48-128(%rax),%xmm6 > > + paddd 176-128(%rax),%xmm5 > > + > > + movdqa %xmm6,%xmm7 > > + movdqa %xmm6,%xmm1 > > + psrld $3,%xmm7 > > + movdqa %xmm6,%xmm2 > > + > > + psrld $7,%xmm1 > > + movdqa 0-128(%rax),%xmm0 > > + pslld $14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $18-7,%xmm1 > > + movdqa %xmm0,%xmm3 > > + pxor %xmm2,%xmm7 > > + pslld $25-14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $10,%xmm0 > > + movdqa %xmm3,%xmm1 > > + > > + psrld $17,%xmm3 > > + pxor %xmm2,%xmm7 > > + pslld $13,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + psrld $19-17,%xmm3 > > + pxor %xmm1,%xmm0 > > + pslld $15-13,%xmm1 > > + pxor %xmm3,%xmm0 > > + pxor %xmm1,%xmm0 > > + paddd %xmm0,%xmm5 > > + movdqa %xmm10,%xmm7 > > + > > + movdqa %xmm10,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm10,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,32-128(%rax) > > + paddd %xmm13,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd -64(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm10,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm10,%xmm3 > > + pslld $26-21,%xmm2 > > + pandn %xmm12,%xmm0 > > + pand %xmm11,%xmm3 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm14,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm14,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + movdqa %xmm15,%xmm3 > > + movdqa %xmm14,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm14,%xmm3 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm3,%xmm4 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm15,%xmm13 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm4,%xmm13 > > + paddd %xmm5,%xmm9 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm13 > > + paddd %xmm7,%xmm13 > > + movdqa 64-128(%rax),%xmm5 > > + paddd 192-128(%rax),%xmm6 > > + > > + movdqa %xmm5,%xmm7 > > + movdqa %xmm5,%xmm1 > > + psrld $3,%xmm7 > > + movdqa %xmm5,%xmm2 > > + > > + psrld $7,%xmm1 > > + movdqa 16-128(%rax),%xmm0 > > + pslld $14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld 
$18-7,%xmm1 > > + movdqa %xmm0,%xmm4 > > + pxor %xmm2,%xmm7 > > + pslld $25-14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $10,%xmm0 > > + movdqa %xmm4,%xmm1 > > + > > + psrld $17,%xmm4 > > + pxor %xmm2,%xmm7 > > + pslld $13,%xmm1 > > + paddd %xmm7,%xmm6 > > + pxor %xmm4,%xmm0 > > + psrld $19-17,%xmm4 > > + pxor %xmm1,%xmm0 > > + pslld $15-13,%xmm1 > > + pxor %xmm4,%xmm0 > > + pxor %xmm1,%xmm0 > > + paddd %xmm0,%xmm6 > > + movdqa %xmm9,%xmm7 > > + > > + movdqa %xmm9,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm9,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm6,48-128(%rax) > > + paddd %xmm12,%xmm6 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd -32(%rbp),%xmm6 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm9,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm9,%xmm4 > > + pslld $26-21,%xmm2 > > + pandn %xmm11,%xmm0 > > + pand %xmm10,%xmm4 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm13,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm13,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm6 > > + pxor %xmm4,%xmm0 > > + movdqa %xmm14,%xmm4 > > + movdqa %xmm13,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm13,%xmm4 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm6 > > + pslld $19-10,%xmm2 > > + pand %xmm4,%xmm3 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm14,%xmm12 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm3,%xmm12 > > + paddd %xmm6,%xmm8 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm6,%xmm12 > > + paddd %xmm7,%xmm12 > > + movdqa 80-128(%rax),%xmm6 > > + paddd 208-128(%rax),%xmm5 > > + > > + movdqa %xmm6,%xmm7 > > + movdqa %xmm6,%xmm1 > > + psrld $3,%xmm7 > > + movdqa %xmm6,%xmm2 > > + > > + psrld $7,%xmm1 > > + movdqa 32-128(%rax),%xmm0 > > + pslld $14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $18-7,%xmm1 > > + movdqa %xmm0,%xmm3 > > + pxor %xmm2,%xmm7 > > + pslld $25-14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $10,%xmm0 > > + movdqa %xmm3,%xmm1 > > + > > + psrld $17,%xmm3 > > + pxor %xmm2,%xmm7 > > + pslld $13,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + psrld $19-17,%xmm3 > > + pxor %xmm1,%xmm0 > > + pslld $15-13,%xmm1 > > + pxor %xmm3,%xmm0 > > + pxor %xmm1,%xmm0 > > + paddd %xmm0,%xmm5 > > + movdqa %xmm8,%xmm7 > > + > > + movdqa %xmm8,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm8,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,64-128(%rax) > > + paddd %xmm11,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd 0(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm8,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm8,%xmm3 > > + pslld $26-21,%xmm2 > > + pandn %xmm10,%xmm0 > > + pand %xmm9,%xmm3 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm12,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm12,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + movdqa %xmm13,%xmm3 > > + movdqa %xmm12,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm12,%xmm3 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm3,%xmm4 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm13,%xmm11 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm4,%xmm11 > > + paddd %xmm5,%xmm15 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm11 > > + paddd %xmm7,%xmm11 > > + movdqa 96-128(%rax),%xmm5 > > + paddd 224-128(%rax),%xmm6 > 
> + > > + movdqa %xmm5,%xmm7 > > + movdqa %xmm5,%xmm1 > > + psrld $3,%xmm7 > > + movdqa %xmm5,%xmm2 > > + > > + psrld $7,%xmm1 > > + movdqa 48-128(%rax),%xmm0 > > + pslld $14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $18-7,%xmm1 > > + movdqa %xmm0,%xmm4 > > + pxor %xmm2,%xmm7 > > + pslld $25-14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $10,%xmm0 > > + movdqa %xmm4,%xmm1 > > + > > + psrld $17,%xmm4 > > + pxor %xmm2,%xmm7 > > + pslld $13,%xmm1 > > + paddd %xmm7,%xmm6 > > + pxor %xmm4,%xmm0 > > + psrld $19-17,%xmm4 > > + pxor %xmm1,%xmm0 > > + pslld $15-13,%xmm1 > > + pxor %xmm4,%xmm0 > > + pxor %xmm1,%xmm0 > > + paddd %xmm0,%xmm6 > > + movdqa %xmm15,%xmm7 > > + > > + movdqa %xmm15,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm15,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm6,80-128(%rax) > > + paddd %xmm10,%xmm6 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd 32(%rbp),%xmm6 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm15,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm15,%xmm4 > > + pslld $26-21,%xmm2 > > + pandn %xmm9,%xmm0 > > + pand %xmm8,%xmm4 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm11,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm11,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm6 > > + pxor %xmm4,%xmm0 > > + movdqa %xmm12,%xmm4 > > + movdqa %xmm11,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm11,%xmm4 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm6 > > + pslld $19-10,%xmm2 > > + pand %xmm4,%xmm3 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm12,%xmm10 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm3,%xmm10 > > + paddd %xmm6,%xmm14 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm6,%xmm10 > > + paddd %xmm7,%xmm10 > > + movdqa 112-128(%rax),%xmm6 > > + paddd 240-128(%rax),%xmm5 > > + > > + movdqa %xmm6,%xmm7 > > + movdqa %xmm6,%xmm1 > > + psrld $3,%xmm7 > > + movdqa %xmm6,%xmm2 > > + > > + psrld $7,%xmm1 > > + movdqa 64-128(%rax),%xmm0 > > + pslld $14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $18-7,%xmm1 > > + movdqa %xmm0,%xmm3 > > + pxor %xmm2,%xmm7 > > + pslld $25-14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $10,%xmm0 > > + movdqa %xmm3,%xmm1 > > + > > + psrld $17,%xmm3 > > + pxor %xmm2,%xmm7 > > + pslld $13,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + psrld $19-17,%xmm3 > > + pxor %xmm1,%xmm0 > > + pslld $15-13,%xmm1 > > + pxor %xmm3,%xmm0 > > + pxor %xmm1,%xmm0 > > + paddd %xmm0,%xmm5 > > + movdqa %xmm14,%xmm7 > > + > > + movdqa %xmm14,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm14,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,96-128(%rax) > > + paddd %xmm9,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd 64(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm14,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm14,%xmm3 > > + pslld $26-21,%xmm2 > > + pandn %xmm8,%xmm0 > > + pand %xmm15,%xmm3 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm10,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm10,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + movdqa %xmm11,%xmm3 > > + movdqa %xmm10,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm10,%xmm3 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm3,%xmm4 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm11,%xmm9 > > + pslld 
$30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm4,%xmm9 > > + paddd %xmm5,%xmm13 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm9 > > + paddd %xmm7,%xmm9 > > + movdqa 128-128(%rax),%xmm5 > > + paddd 0-128(%rax),%xmm6 > > + > > + movdqa %xmm5,%xmm7 > > + movdqa %xmm5,%xmm1 > > + psrld $3,%xmm7 > > + movdqa %xmm5,%xmm2 > > + > > + psrld $7,%xmm1 > > + movdqa 80-128(%rax),%xmm0 > > + pslld $14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $18-7,%xmm1 > > + movdqa %xmm0,%xmm4 > > + pxor %xmm2,%xmm7 > > + pslld $25-14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $10,%xmm0 > > + movdqa %xmm4,%xmm1 > > + > > + psrld $17,%xmm4 > > + pxor %xmm2,%xmm7 > > + pslld $13,%xmm1 > > + paddd %xmm7,%xmm6 > > + pxor %xmm4,%xmm0 > > + psrld $19-17,%xmm4 > > + pxor %xmm1,%xmm0 > > + pslld $15-13,%xmm1 > > + pxor %xmm4,%xmm0 > > + pxor %xmm1,%xmm0 > > + paddd %xmm0,%xmm6 > > + movdqa %xmm13,%xmm7 > > + > > + movdqa %xmm13,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm13,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm6,112-128(%rax) > > + paddd %xmm8,%xmm6 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd 96(%rbp),%xmm6 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm13,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm13,%xmm4 > > + pslld $26-21,%xmm2 > > + pandn %xmm15,%xmm0 > > + pand %xmm14,%xmm4 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm9,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm9,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm6 > > + pxor %xmm4,%xmm0 > > + movdqa %xmm10,%xmm4 > > + movdqa %xmm9,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm9,%xmm4 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm6 > > + pslld $19-10,%xmm2 > > + pand %xmm4,%xmm3 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm10,%xmm8 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm3,%xmm8 > > + paddd %xmm6,%xmm12 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm6,%xmm8 > > + paddd %xmm7,%xmm8 > > + leaq 256(%rbp),%rbp > > + movdqa 144-128(%rax),%xmm6 > > + paddd 16-128(%rax),%xmm5 > > + > > + movdqa %xmm6,%xmm7 > > + movdqa %xmm6,%xmm1 > > + psrld $3,%xmm7 > > + movdqa %xmm6,%xmm2 > > + > > + psrld $7,%xmm1 > > + movdqa 96-128(%rax),%xmm0 > > + pslld $14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $18-7,%xmm1 > > + movdqa %xmm0,%xmm3 > > + pxor %xmm2,%xmm7 > > + pslld $25-14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $10,%xmm0 > > + movdqa %xmm3,%xmm1 > > + > > + psrld $17,%xmm3 > > + pxor %xmm2,%xmm7 > > + pslld $13,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + psrld $19-17,%xmm3 > > + pxor %xmm1,%xmm0 > > + pslld $15-13,%xmm1 > > + pxor %xmm3,%xmm0 > > + pxor %xmm1,%xmm0 > > + paddd %xmm0,%xmm5 > > + movdqa %xmm12,%xmm7 > > + > > + movdqa %xmm12,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm12,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,128-128(%rax) > > + paddd %xmm15,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd -128(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm12,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm12,%xmm3 > > + pslld $26-21,%xmm2 > > + pandn %xmm14,%xmm0 > > + pand %xmm13,%xmm3 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm8,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm8,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + movdqa %xmm9,%xmm3 > > + movdqa %xmm8,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm8,%xmm3 > > + > > + 
> > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm3,%xmm4 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm9,%xmm15 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm4,%xmm15 > > + paddd %xmm5,%xmm11 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm15 > > + paddd %xmm7,%xmm15 > > + movdqa 160-128(%rax),%xmm5 > > + paddd 32-128(%rax),%xmm6 > > + > > + movdqa %xmm5,%xmm7 > > + movdqa %xmm5,%xmm1 > > + psrld $3,%xmm7 > > + movdqa %xmm5,%xmm2 > > + > > + psrld $7,%xmm1 > > + movdqa 112-128(%rax),%xmm0 > > + pslld $14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $18-7,%xmm1 > > + movdqa %xmm0,%xmm4 > > + pxor %xmm2,%xmm7 > > + pslld $25-14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $10,%xmm0 > > + movdqa %xmm4,%xmm1 > > + > > + psrld $17,%xmm4 > > + pxor %xmm2,%xmm7 > > + pslld $13,%xmm1 > > + paddd %xmm7,%xmm6 > > + pxor %xmm4,%xmm0 > > + psrld $19-17,%xmm4 > > + pxor %xmm1,%xmm0 > > + pslld $15-13,%xmm1 > > + pxor %xmm4,%xmm0 > > + pxor %xmm1,%xmm0 > > + paddd %xmm0,%xmm6 > > + movdqa %xmm11,%xmm7 > > + > > + movdqa %xmm11,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm11,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm6,144-128(%rax) > > + paddd %xmm14,%xmm6 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd -96(%rbp),%xmm6 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm11,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm11,%xmm4 > > + pslld $26-21,%xmm2 > > + pandn %xmm13,%xmm0 > > + pand %xmm12,%xmm4 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm15,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm15,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm6 > > + pxor %xmm4,%xmm0 > > + movdqa %xmm8,%xmm4 > > + movdqa %xmm15,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm15,%xmm4 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm6 > > + pslld $19-10,%xmm2 > > + pand %xmm4,%xmm3 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm8,%xmm14 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm3,%xmm14 > > + paddd %xmm6,%xmm10 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm6,%xmm14 > > + paddd %xmm7,%xmm14 > > + movdqa 176-128(%rax),%xmm6 > > + paddd 48-128(%rax),%xmm5 > > + > > + movdqa %xmm6,%xmm7 > > + movdqa %xmm6,%xmm1 > > + psrld $3,%xmm7 > > + movdqa %xmm6,%xmm2 > > + > > + psrld $7,%xmm1 > > + movdqa 128-128(%rax),%xmm0 > > + pslld $14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $18-7,%xmm1 > > + movdqa %xmm0,%xmm3 > > + pxor %xmm2,%xmm7 > > + pslld $25-14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $10,%xmm0 > > + movdqa %xmm3,%xmm1 > > + > > + psrld $17,%xmm3 > > + pxor %xmm2,%xmm7 > > + pslld $13,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + psrld $19-17,%xmm3 > > + pxor %xmm1,%xmm0 > > + pslld $15-13,%xmm1 > > + pxor %xmm3,%xmm0 > > + pxor %xmm1,%xmm0 > > + paddd %xmm0,%xmm5 > > + movdqa %xmm10,%xmm7 > > + > > + movdqa %xmm10,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm10,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,160-128(%rax) > > + paddd %xmm13,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd -64(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm10,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm10,%xmm3 > > + pslld $26-21,%xmm2 > > + pandn %xmm12,%xmm0 > > + pand %xmm11,%xmm3 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa 
%xmm14,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm14,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + movdqa %xmm15,%xmm3 > > + movdqa %xmm14,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm14,%xmm3 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm3,%xmm4 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm15,%xmm13 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm4,%xmm13 > > + paddd %xmm5,%xmm9 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm13 > > + paddd %xmm7,%xmm13 > > + movdqa 192-128(%rax),%xmm5 > > + paddd 64-128(%rax),%xmm6 > > + > > + movdqa %xmm5,%xmm7 > > + movdqa %xmm5,%xmm1 > > + psrld $3,%xmm7 > > + movdqa %xmm5,%xmm2 > > + > > + psrld $7,%xmm1 > > + movdqa 144-128(%rax),%xmm0 > > + pslld $14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $18-7,%xmm1 > > + movdqa %xmm0,%xmm4 > > + pxor %xmm2,%xmm7 > > + pslld $25-14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $10,%xmm0 > > + movdqa %xmm4,%xmm1 > > + > > + psrld $17,%xmm4 > > + pxor %xmm2,%xmm7 > > + pslld $13,%xmm1 > > + paddd %xmm7,%xmm6 > > + pxor %xmm4,%xmm0 > > + psrld $19-17,%xmm4 > > + pxor %xmm1,%xmm0 > > + pslld $15-13,%xmm1 > > + pxor %xmm4,%xmm0 > > + pxor %xmm1,%xmm0 > > + paddd %xmm0,%xmm6 > > + movdqa %xmm9,%xmm7 > > + > > + movdqa %xmm9,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm9,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm6,176-128(%rax) > > + paddd %xmm12,%xmm6 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd -32(%rbp),%xmm6 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm9,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm9,%xmm4 > > + pslld $26-21,%xmm2 > > + pandn %xmm11,%xmm0 > > + pand %xmm10,%xmm4 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm13,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm13,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm6 > > + pxor %xmm4,%xmm0 > > + movdqa %xmm14,%xmm4 > > + movdqa %xmm13,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm13,%xmm4 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm6 > > + pslld $19-10,%xmm2 > > + pand %xmm4,%xmm3 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm14,%xmm12 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm3,%xmm12 > > + paddd %xmm6,%xmm8 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm6,%xmm12 > > + paddd %xmm7,%xmm12 > > + movdqa 208-128(%rax),%xmm6 > > + paddd 80-128(%rax),%xmm5 > > + > > + movdqa %xmm6,%xmm7 > > + movdqa %xmm6,%xmm1 > > + psrld $3,%xmm7 > > + movdqa %xmm6,%xmm2 > > + > > + psrld $7,%xmm1 > > + movdqa 160-128(%rax),%xmm0 > > + pslld $14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $18-7,%xmm1 > > + movdqa %xmm0,%xmm3 > > + pxor %xmm2,%xmm7 > > + pslld $25-14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $10,%xmm0 > > + movdqa %xmm3,%xmm1 > > + > > + psrld $17,%xmm3 > > + pxor %xmm2,%xmm7 > > + pslld $13,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + psrld $19-17,%xmm3 > > + pxor %xmm1,%xmm0 > > + pslld $15-13,%xmm1 > > + pxor %xmm3,%xmm0 > > + pxor %xmm1,%xmm0 > > + paddd %xmm0,%xmm5 > > + movdqa %xmm8,%xmm7 > > + > > + movdqa %xmm8,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm8,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,192-128(%rax) > > + paddd %xmm11,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd 0(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + 
psrld $25-11,%xmm1 > > + movdqa %xmm8,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm8,%xmm3 > > + pslld $26-21,%xmm2 > > + pandn %xmm10,%xmm0 > > + pand %xmm9,%xmm3 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm12,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm12,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + movdqa %xmm13,%xmm3 > > + movdqa %xmm12,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm12,%xmm3 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm3,%xmm4 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm13,%xmm11 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm4,%xmm11 > > + paddd %xmm5,%xmm15 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm11 > > + paddd %xmm7,%xmm11 > > + movdqa 224-128(%rax),%xmm5 > > + paddd 96-128(%rax),%xmm6 > > + > > + movdqa %xmm5,%xmm7 > > + movdqa %xmm5,%xmm1 > > + psrld $3,%xmm7 > > + movdqa %xmm5,%xmm2 > > + > > + psrld $7,%xmm1 > > + movdqa 176-128(%rax),%xmm0 > > + pslld $14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $18-7,%xmm1 > > + movdqa %xmm0,%xmm4 > > + pxor %xmm2,%xmm7 > > + pslld $25-14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $10,%xmm0 > > + movdqa %xmm4,%xmm1 > > + > > + psrld $17,%xmm4 > > + pxor %xmm2,%xmm7 > > + pslld $13,%xmm1 > > + paddd %xmm7,%xmm6 > > + pxor %xmm4,%xmm0 > > + psrld $19-17,%xmm4 > > + pxor %xmm1,%xmm0 > > + pslld $15-13,%xmm1 > > + pxor %xmm4,%xmm0 > > + pxor %xmm1,%xmm0 > > + paddd %xmm0,%xmm6 > > + movdqa %xmm15,%xmm7 > > + > > + movdqa %xmm15,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm15,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm6,208-128(%rax) > > + paddd %xmm10,%xmm6 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd 32(%rbp),%xmm6 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm15,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm15,%xmm4 > > + pslld $26-21,%xmm2 > > + pandn %xmm9,%xmm0 > > + pand %xmm8,%xmm4 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm11,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm11,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm6 > > + pxor %xmm4,%xmm0 > > + movdqa %xmm12,%xmm4 > > + movdqa %xmm11,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm11,%xmm4 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm6 > > + pslld $19-10,%xmm2 > > + pand %xmm4,%xmm3 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm12,%xmm10 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm3,%xmm10 > > + paddd %xmm6,%xmm14 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm6,%xmm10 > > + paddd %xmm7,%xmm10 > > + movdqa 240-128(%rax),%xmm6 > > + paddd 112-128(%rax),%xmm5 > > + > > + movdqa %xmm6,%xmm7 > > + movdqa %xmm6,%xmm1 > > + psrld $3,%xmm7 > > + movdqa %xmm6,%xmm2 > > + > > + psrld $7,%xmm1 > > + movdqa 192-128(%rax),%xmm0 > > + pslld $14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $18-7,%xmm1 > > + movdqa %xmm0,%xmm3 > > + pxor %xmm2,%xmm7 > > + pslld $25-14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $10,%xmm0 > > + movdqa %xmm3,%xmm1 > > + > > + psrld $17,%xmm3 > > + pxor %xmm2,%xmm7 > > + pslld $13,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + psrld $19-17,%xmm3 > > + pxor %xmm1,%xmm0 > > + pslld $15-13,%xmm1 > > + pxor %xmm3,%xmm0 > > + pxor %xmm1,%xmm0 > > + paddd %xmm0,%xmm5 > > + movdqa %xmm14,%xmm7 > > + > > + movdqa %xmm14,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa 
%xmm14,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm5,224-128(%rax) > > + paddd %xmm9,%xmm5 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd 64(%rbp),%xmm5 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm14,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm14,%xmm3 > > + pslld $26-21,%xmm2 > > + pandn %xmm8,%xmm0 > > + pand %xmm15,%xmm3 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm10,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm10,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm5 > > + pxor %xmm3,%xmm0 > > + movdqa %xmm11,%xmm3 > > + movdqa %xmm10,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm10,%xmm3 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm5 > > + pslld $19-10,%xmm2 > > + pand %xmm3,%xmm4 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm11,%xmm9 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm4,%xmm9 > > + paddd %xmm5,%xmm13 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm5,%xmm9 > > + paddd %xmm7,%xmm9 > > + movdqa 0-128(%rax),%xmm5 > > + paddd 128-128(%rax),%xmm6 > > + > > + movdqa %xmm5,%xmm7 > > + movdqa %xmm5,%xmm1 > > + psrld $3,%xmm7 > > + movdqa %xmm5,%xmm2 > > + > > + psrld $7,%xmm1 > > + movdqa 208-128(%rax),%xmm0 > > + pslld $14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $18-7,%xmm1 > > + movdqa %xmm0,%xmm4 > > + pxor %xmm2,%xmm7 > > + pslld $25-14,%xmm2 > > + pxor %xmm1,%xmm7 > > + psrld $10,%xmm0 > > + movdqa %xmm4,%xmm1 > > + > > + psrld $17,%xmm4 > > + pxor %xmm2,%xmm7 > > + pslld $13,%xmm1 > > + paddd %xmm7,%xmm6 > > + pxor %xmm4,%xmm0 > > + psrld $19-17,%xmm4 > > + pxor %xmm1,%xmm0 > > + pslld $15-13,%xmm1 > > + pxor %xmm4,%xmm0 > > + pxor %xmm1,%xmm0 > > + paddd %xmm0,%xmm6 > > + movdqa %xmm13,%xmm7 > > + > > + movdqa %xmm13,%xmm2 > > + > > + psrld $6,%xmm7 > > + movdqa %xmm13,%xmm1 > > + pslld $7,%xmm2 > > + movdqa %xmm6,240-128(%rax) > > + paddd %xmm8,%xmm6 > > + > > + psrld $11,%xmm1 > > + pxor %xmm2,%xmm7 > > + pslld $21-7,%xmm2 > > + paddd 96(%rbp),%xmm6 > > + pxor %xmm1,%xmm7 > > + > > + psrld $25-11,%xmm1 > > + movdqa %xmm13,%xmm0 > > + > > + pxor %xmm2,%xmm7 > > + movdqa %xmm13,%xmm4 > > + pslld $26-21,%xmm2 > > + pandn %xmm15,%xmm0 > > + pand %xmm14,%xmm4 > > + pxor %xmm1,%xmm7 > > + > > + > > + movdqa %xmm9,%xmm1 > > + pxor %xmm2,%xmm7 > > + movdqa %xmm9,%xmm2 > > + psrld $2,%xmm1 > > + paddd %xmm7,%xmm6 > > + pxor %xmm4,%xmm0 > > + movdqa %xmm10,%xmm4 > > + movdqa %xmm9,%xmm7 > > + pslld $10,%xmm2 > > + pxor %xmm9,%xmm4 > > + > > + > > + psrld $13,%xmm7 > > + pxor %xmm2,%xmm1 > > + paddd %xmm0,%xmm6 > > + pslld $19-10,%xmm2 > > + pand %xmm4,%xmm3 > > + pxor %xmm7,%xmm1 > > + > > + > > + psrld $22-13,%xmm7 > > + pxor %xmm2,%xmm1 > > + movdqa %xmm10,%xmm8 > > + pslld $30-19,%xmm2 > > + pxor %xmm1,%xmm7 > > + pxor %xmm3,%xmm8 > > + paddd %xmm6,%xmm12 > > + pxor %xmm2,%xmm7 > > + > > + paddd %xmm6,%xmm8 > > + paddd %xmm7,%xmm8 > > + leaq 256(%rbp),%rbp > > + decl %ecx > > + jnz .Loop_16_xx > > + > > + movl $1,%ecx > > + leaq K256+128(%rip),%rbp > > + > > + movdqa (%rbx),%xmm7 > > + cmpl 0(%rbx),%ecx > > + pxor %xmm0,%xmm0 > > + cmovgeq %rbp,%r8 > > + cmpl 4(%rbx),%ecx > > + movdqa %xmm7,%xmm6 > > + cmovgeq %rbp,%r9 > > + cmpl 8(%rbx),%ecx > > + pcmpgtd %xmm0,%xmm6 > > + cmovgeq %rbp,%r10 > > + cmpl 12(%rbx),%ecx > > + paddd %xmm6,%xmm7 > > + cmovgeq %rbp,%r11 > > + > > + movdqu 0-128(%rdi),%xmm0 > > + pand %xmm6,%xmm8 > > + movdqu 32-128(%rdi),%xmm1 > > + pand %xmm6,%xmm9 > > + movdqu 
64-128(%rdi),%xmm2 > > + pand %xmm6,%xmm10 > > + movdqu 96-128(%rdi),%xmm5 > > + pand %xmm6,%xmm11 > > + paddd %xmm0,%xmm8 > > + movdqu 128-128(%rdi),%xmm0 > > + pand %xmm6,%xmm12 > > + paddd %xmm1,%xmm9 > > + movdqu 160-128(%rdi),%xmm1 > > + pand %xmm6,%xmm13 > > + paddd %xmm2,%xmm10 > > + movdqu 192-128(%rdi),%xmm2 > > + pand %xmm6,%xmm14 > > + paddd %xmm5,%xmm11 > > + movdqu 224-128(%rdi),%xmm5 > > + pand %xmm6,%xmm15 > > + paddd %xmm0,%xmm12 > > + paddd %xmm1,%xmm13 > > + movdqu %xmm8,0-128(%rdi) > > + paddd %xmm2,%xmm14 > > + movdqu %xmm9,32-128(%rdi) > > + paddd %xmm5,%xmm15 > > + movdqu %xmm10,64-128(%rdi) > > + movdqu %xmm11,96-128(%rdi) > > + movdqu %xmm12,128-128(%rdi) > > + movdqu %xmm13,160-128(%rdi) > > + movdqu %xmm14,192-128(%rdi) > > + movdqu %xmm15,224-128(%rdi) > > + > > + movdqa %xmm7,(%rbx) > > + movdqa .Lpbswap(%rip),%xmm6 > > + decl %edx > > + jnz .Loop > > + > > + movl 280(%rsp),%edx > > + leaq 16(%rdi),%rdi > > + leaq 64(%rsi),%rsi > > + decl %edx > > + jnz .Loop_grande > > + > > +.Ldone: > > + movq 272(%rsp),%rax > > +.cfi_def_cfa %rax,8 > > + movq -16(%rax),%rbp > > +.cfi_restore %rbp > > + movq -8(%rax),%rbx > > +.cfi_restore %rbx > > + leaq (%rax),%rsp > > +.cfi_def_cfa_register %rsp > > +.Lepilogue: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size sha256_multi_block,.-sha256_multi_block > > +.type sha256_multi_block_shaext,@function > > +.align 32 > > +sha256_multi_block_shaext: > > +.cfi_startproc > > +_shaext_shortcut: > > + movq %rsp,%rax > > +.cfi_def_cfa_register %rax > > + pushq %rbx > > +.cfi_offset %rbx,-16 > > + pushq %rbp > > +.cfi_offset %rbp,-24 > > + subq $288,%rsp > > + shll $1,%edx > > + andq $-256,%rsp > > + leaq 128(%rdi),%rdi > > + movq %rax,272(%rsp) > > +.Lbody_shaext: > > + leaq 256(%rsp),%rbx > > + leaq K256_shaext+128(%rip),%rbp > > + > > +.Loop_grande_shaext: > > + movl %edx,280(%rsp) > > + xorl %edx,%edx > > + movq 0(%rsi),%r8 > > + movl 8(%rsi),%ecx > > + cmpl %edx,%ecx > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movl %ecx,0(%rbx) > > + cmovleq %rsp,%r8 > > + movq 16(%rsi),%r9 > > + movl 24(%rsi),%ecx > > + cmpl %edx,%ecx > > + cmovgl %ecx,%edx > > + testl %ecx,%ecx > > + movl %ecx,4(%rbx) > > + cmovleq %rsp,%r9 > > + testl %edx,%edx > > + jz .Ldone_shaext > > + > > + movq 0-128(%rdi),%xmm12 > > + movq 32-128(%rdi),%xmm4 > > + movq 64-128(%rdi),%xmm13 > > + movq 96-128(%rdi),%xmm5 > > + movq 128-128(%rdi),%xmm8 > > + movq 160-128(%rdi),%xmm9 > > + movq 192-128(%rdi),%xmm10 > > + movq 224-128(%rdi),%xmm11 > > + > > + punpckldq %xmm4,%xmm12 > > + punpckldq %xmm5,%xmm13 > > + punpckldq %xmm9,%xmm8 > > + punpckldq %xmm11,%xmm10 > > + movdqa K256_shaext-16(%rip),%xmm3 > > + > > + movdqa %xmm12,%xmm14 > > + movdqa %xmm13,%xmm15 > > + punpcklqdq %xmm8,%xmm12 > > + punpcklqdq %xmm10,%xmm13 > > + punpckhqdq %xmm8,%xmm14 > > + punpckhqdq %xmm10,%xmm15 > > + > > + pshufd $27,%xmm12,%xmm12 > > + pshufd $27,%xmm13,%xmm13 > > + pshufd $27,%xmm14,%xmm14 > > + pshufd $27,%xmm15,%xmm15 > > + jmp .Loop_shaext > > + > > +.align 32 > > +.Loop_shaext: > > + movdqu 0(%r8),%xmm4 > > + movdqu 0(%r9),%xmm8 > > + movdqu 16(%r8),%xmm5 > > + movdqu 16(%r9),%xmm9 > > + movdqu 32(%r8),%xmm6 > > +.byte 102,15,56,0,227 > > + movdqu 32(%r9),%xmm10 > > +.byte 102,68,15,56,0,195 > > + movdqu 48(%r8),%xmm7 > > + leaq 64(%r8),%r8 > > + movdqu 48(%r9),%xmm11 > > + leaq 64(%r9),%r9 > > + > > + movdqa 0-128(%rbp),%xmm0 > > +.byte 102,15,56,0,235 > > + paddd %xmm4,%xmm0 > > + pxor %xmm12,%xmm4 > > + movdqa %xmm0,%xmm1 > > + movdqa 0-128(%rbp),%xmm2 > > +.byte 
102,68,15,56,0,203 > > + paddd %xmm8,%xmm2 > > + movdqa %xmm13,80(%rsp) > > +.byte 69,15,56,203,236 > > + pxor %xmm14,%xmm8 > > + movdqa %xmm2,%xmm0 > > + movdqa %xmm15,112(%rsp) > > +.byte 69,15,56,203,254 > > + pshufd $0x0e,%xmm1,%xmm0 > > + pxor %xmm12,%xmm4 > > + movdqa %xmm12,64(%rsp) > > +.byte 69,15,56,203,229 > > + pshufd $0x0e,%xmm2,%xmm0 > > + pxor %xmm14,%xmm8 > > + movdqa %xmm14,96(%rsp) > > + movdqa 16-128(%rbp),%xmm1 > > + paddd %xmm5,%xmm1 > > +.byte 102,15,56,0,243 > > +.byte 69,15,56,203,247 > > + > > + movdqa %xmm1,%xmm0 > > + movdqa 16-128(%rbp),%xmm2 > > + paddd %xmm9,%xmm2 > > +.byte 69,15,56,203,236 > > + movdqa %xmm2,%xmm0 > > + prefetcht0 127(%r8) > > +.byte 102,15,56,0,251 > > +.byte 102,68,15,56,0,211 > > + prefetcht0 127(%r9) > > +.byte 69,15,56,203,254 > > + pshufd $0x0e,%xmm1,%xmm0 > > +.byte 102,68,15,56,0,219 > > +.byte 15,56,204,229 > > +.byte 69,15,56,203,229 > > + pshufd $0x0e,%xmm2,%xmm0 > > + movdqa 32-128(%rbp),%xmm1 > > + paddd %xmm6,%xmm1 > > +.byte 69,15,56,203,247 > > + > > + movdqa %xmm1,%xmm0 > > + movdqa 32-128(%rbp),%xmm2 > > + paddd %xmm10,%xmm2 > > +.byte 69,15,56,203,236 > > +.byte 69,15,56,204,193 > > + movdqa %xmm2,%xmm0 > > + movdqa %xmm7,%xmm3 > > +.byte 69,15,56,203,254 > > + pshufd $0x0e,%xmm1,%xmm0 > > +.byte 102,15,58,15,222,4 > > + paddd %xmm3,%xmm4 > > + movdqa %xmm11,%xmm3 > > +.byte 102,65,15,58,15,218,4 > > +.byte 15,56,204,238 > > +.byte 69,15,56,203,229 > > + pshufd $0x0e,%xmm2,%xmm0 > > + movdqa 48-128(%rbp),%xmm1 > > + paddd %xmm7,%xmm1 > > +.byte 69,15,56,203,247 > > +.byte 69,15,56,204,202 > > + > > + movdqa %xmm1,%xmm0 > > + movdqa 48-128(%rbp),%xmm2 > > + paddd %xmm3,%xmm8 > > + paddd %xmm11,%xmm2 > > +.byte 15,56,205,231 > > +.byte 69,15,56,203,236 > > + movdqa %xmm2,%xmm0 > > + movdqa %xmm4,%xmm3 > > +.byte 102,15,58,15,223,4 > > +.byte 69,15,56,203,254 > > +.byte 69,15,56,205,195 > > + pshufd $0x0e,%xmm1,%xmm0 > > + paddd %xmm3,%xmm5 > > + movdqa %xmm8,%xmm3 > > +.byte 102,65,15,58,15,219,4 > > +.byte 15,56,204,247 > > +.byte 69,15,56,203,229 > > + pshufd $0x0e,%xmm2,%xmm0 > > + movdqa 64-128(%rbp),%xmm1 > > + paddd %xmm4,%xmm1 > > +.byte 69,15,56,203,247 > > +.byte 69,15,56,204,211 > > + movdqa %xmm1,%xmm0 > > + movdqa 64-128(%rbp),%xmm2 > > + paddd %xmm3,%xmm9 > > + paddd %xmm8,%xmm2 > > +.byte 15,56,205,236 > > +.byte 69,15,56,203,236 > > + movdqa %xmm2,%xmm0 > > + movdqa %xmm5,%xmm3 > > +.byte 102,15,58,15,220,4 > > +.byte 69,15,56,203,254 > > +.byte 69,15,56,205,200 > > + pshufd $0x0e,%xmm1,%xmm0 > > + paddd %xmm3,%xmm6 > > + movdqa %xmm9,%xmm3 > > +.byte 102,65,15,58,15,216,4 > > +.byte 15,56,204,252 > > +.byte 69,15,56,203,229 > > + pshufd $0x0e,%xmm2,%xmm0 > > + movdqa 80-128(%rbp),%xmm1 > > + paddd %xmm5,%xmm1 > > +.byte 69,15,56,203,247 > > +.byte 69,15,56,204,216 > > + movdqa %xmm1,%xmm0 > > + movdqa 80-128(%rbp),%xmm2 > > + paddd %xmm3,%xmm10 > > + paddd %xmm9,%xmm2 > > +.byte 15,56,205,245 > > +.byte 69,15,56,203,236 > > + movdqa %xmm2,%xmm0 > > + movdqa %xmm6,%xmm3 > > +.byte 102,15,58,15,221,4 > > +.byte 69,15,56,203,254 > > +.byte 69,15,56,205,209 > > + pshufd $0x0e,%xmm1,%xmm0 > > + paddd %xmm3,%xmm7 > > + movdqa %xmm10,%xmm3 > > +.byte 102,65,15,58,15,217,4 > > +.byte 15,56,204,229 > > +.byte 69,15,56,203,229 > > + pshufd $0x0e,%xmm2,%xmm0 > > + movdqa 96-128(%rbp),%xmm1 > > + paddd %xmm6,%xmm1 > > +.byte 69,15,56,203,247 > > +.byte 69,15,56,204,193 > > + movdqa %xmm1,%xmm0 > > + movdqa 96-128(%rbp),%xmm2 > > + paddd %xmm3,%xmm11 > > + paddd %xmm10,%xmm2 > > +.byte 15,56,205,254 > > +.byte 
69,15,56,203,236 > > + movdqa %xmm2,%xmm0 > > + movdqa %xmm7,%xmm3 > > +.byte 102,15,58,15,222,4 > > +.byte 69,15,56,203,254 > > +.byte 69,15,56,205,218 > > + pshufd $0x0e,%xmm1,%xmm0 > > + paddd %xmm3,%xmm4 > > + movdqa %xmm11,%xmm3 > > +.byte 102,65,15,58,15,218,4 > > +.byte 15,56,204,238 > > +.byte 69,15,56,203,229 > > + pshufd $0x0e,%xmm2,%xmm0 > > + movdqa 112-128(%rbp),%xmm1 > > + paddd %xmm7,%xmm1 > > +.byte 69,15,56,203,247 > > +.byte 69,15,56,204,202 > > + movdqa %xmm1,%xmm0 > > + movdqa 112-128(%rbp),%xmm2 > > + paddd %xmm3,%xmm8 > > + paddd %xmm11,%xmm2 > > +.byte 15,56,205,231 > > +.byte 69,15,56,203,236 > > + movdqa %xmm2,%xmm0 > > + movdqa %xmm4,%xmm3 > > +.byte 102,15,58,15,223,4 > > +.byte 69,15,56,203,254 > > +.byte 69,15,56,205,195 > > + pshufd $0x0e,%xmm1,%xmm0 > > + paddd %xmm3,%xmm5 > > + movdqa %xmm8,%xmm3 > > +.byte 102,65,15,58,15,219,4 > > +.byte 15,56,204,247 > > +.byte 69,15,56,203,229 > > + pshufd $0x0e,%xmm2,%xmm0 > > + movdqa 128-128(%rbp),%xmm1 > > + paddd %xmm4,%xmm1 > > +.byte 69,15,56,203,247 > > +.byte 69,15,56,204,211 > > + movdqa %xmm1,%xmm0 > > + movdqa 128-128(%rbp),%xmm2 > > + paddd %xmm3,%xmm9 > > + paddd %xmm8,%xmm2 > > +.byte 15,56,205,236 > > +.byte 69,15,56,203,236 > > + movdqa %xmm2,%xmm0 > > + movdqa %xmm5,%xmm3 > > +.byte 102,15,58,15,220,4 > > +.byte 69,15,56,203,254 > > +.byte 69,15,56,205,200 > > + pshufd $0x0e,%xmm1,%xmm0 > > + paddd %xmm3,%xmm6 > > + movdqa %xmm9,%xmm3 > > +.byte 102,65,15,58,15,216,4 > > +.byte 15,56,204,252 > > +.byte 69,15,56,203,229 > > + pshufd $0x0e,%xmm2,%xmm0 > > + movdqa 144-128(%rbp),%xmm1 > > + paddd %xmm5,%xmm1 > > +.byte 69,15,56,203,247 > > +.byte 69,15,56,204,216 > > + movdqa %xmm1,%xmm0 > > + movdqa 144-128(%rbp),%xmm2 > > + paddd %xmm3,%xmm10 > > + paddd %xmm9,%xmm2 > > +.byte 15,56,205,245 > > +.byte 69,15,56,203,236 > > + movdqa %xmm2,%xmm0 > > + movdqa %xmm6,%xmm3 > > +.byte 102,15,58,15,221,4 > > +.byte 69,15,56,203,254 > > +.byte 69,15,56,205,209 > > + pshufd $0x0e,%xmm1,%xmm0 > > + paddd %xmm3,%xmm7 > > + movdqa %xmm10,%xmm3 > > +.byte 102,65,15,58,15,217,4 > > +.byte 15,56,204,229 > > +.byte 69,15,56,203,229 > > + pshufd $0x0e,%xmm2,%xmm0 > > + movdqa 160-128(%rbp),%xmm1 > > + paddd %xmm6,%xmm1 > > +.byte 69,15,56,203,247 > > +.byte 69,15,56,204,193 > > + movdqa %xmm1,%xmm0 > > + movdqa 160-128(%rbp),%xmm2 > > + paddd %xmm3,%xmm11 > > + paddd %xmm10,%xmm2 > > +.byte 15,56,205,254 > > +.byte 69,15,56,203,236 > > + movdqa %xmm2,%xmm0 > > + movdqa %xmm7,%xmm3 > > +.byte 102,15,58,15,222,4 > > +.byte 69,15,56,203,254 > > +.byte 69,15,56,205,218 > > + pshufd $0x0e,%xmm1,%xmm0 > > + paddd %xmm3,%xmm4 > > + movdqa %xmm11,%xmm3 > > +.byte 102,65,15,58,15,218,4 > > +.byte 15,56,204,238 > > +.byte 69,15,56,203,229 > > + pshufd $0x0e,%xmm2,%xmm0 > > + movdqa 176-128(%rbp),%xmm1 > > + paddd %xmm7,%xmm1 > > +.byte 69,15,56,203,247 > > +.byte 69,15,56,204,202 > > + movdqa %xmm1,%xmm0 > > + movdqa 176-128(%rbp),%xmm2 > > + paddd %xmm3,%xmm8 > > + paddd %xmm11,%xmm2 > > +.byte 15,56,205,231 > > +.byte 69,15,56,203,236 > > + movdqa %xmm2,%xmm0 > > + movdqa %xmm4,%xmm3 > > +.byte 102,15,58,15,223,4 > > +.byte 69,15,56,203,254 > > +.byte 69,15,56,205,195 > > + pshufd $0x0e,%xmm1,%xmm0 > > + paddd %xmm3,%xmm5 > > + movdqa %xmm8,%xmm3 > > +.byte 102,65,15,58,15,219,4 > > +.byte 15,56,204,247 > > +.byte 69,15,56,203,229 > > + pshufd $0x0e,%xmm2,%xmm0 > > + movdqa 192-128(%rbp),%xmm1 > > + paddd %xmm4,%xmm1 > > +.byte 69,15,56,203,247 > > +.byte 69,15,56,204,211 > > + movdqa %xmm1,%xmm0 > > + movdqa 192-128(%rbp),%xmm2 
> > + paddd %xmm3,%xmm9 > > + paddd %xmm8,%xmm2 > > +.byte 15,56,205,236 > > +.byte 69,15,56,203,236 > > + movdqa %xmm2,%xmm0 > > + movdqa %xmm5,%xmm3 > > +.byte 102,15,58,15,220,4 > > +.byte 69,15,56,203,254 > > +.byte 69,15,56,205,200 > > + pshufd $0x0e,%xmm1,%xmm0 > > + paddd %xmm3,%xmm6 > > + movdqa %xmm9,%xmm3 > > +.byte 102,65,15,58,15,216,4 > > +.byte 15,56,204,252 > > +.byte 69,15,56,203,229 > > + pshufd $0x0e,%xmm2,%xmm0 > > + movdqa 208-128(%rbp),%xmm1 > > + paddd %xmm5,%xmm1 > > +.byte 69,15,56,203,247 > > +.byte 69,15,56,204,216 > > + movdqa %xmm1,%xmm0 > > + movdqa 208-128(%rbp),%xmm2 > > + paddd %xmm3,%xmm10 > > + paddd %xmm9,%xmm2 > > +.byte 15,56,205,245 > > +.byte 69,15,56,203,236 > > + movdqa %xmm2,%xmm0 > > + movdqa %xmm6,%xmm3 > > +.byte 102,15,58,15,221,4 > > +.byte 69,15,56,203,254 > > +.byte 69,15,56,205,209 > > + pshufd $0x0e,%xmm1,%xmm0 > > + paddd %xmm3,%xmm7 > > + movdqa %xmm10,%xmm3 > > +.byte 102,65,15,58,15,217,4 > > + nop > > +.byte 69,15,56,203,229 > > + pshufd $0x0e,%xmm2,%xmm0 > > + movdqa 224-128(%rbp),%xmm1 > > + paddd %xmm6,%xmm1 > > +.byte 69,15,56,203,247 > > + > > + movdqa %xmm1,%xmm0 > > + movdqa 224-128(%rbp),%xmm2 > > + paddd %xmm3,%xmm11 > > + paddd %xmm10,%xmm2 > > +.byte 15,56,205,254 > > + nop > > +.byte 69,15,56,203,236 > > + movdqa %xmm2,%xmm0 > > + movl $1,%ecx > > + pxor %xmm6,%xmm6 > > +.byte 69,15,56,203,254 > > +.byte 69,15,56,205,218 > > + pshufd $0x0e,%xmm1,%xmm0 > > + movdqa 240-128(%rbp),%xmm1 > > + paddd %xmm7,%xmm1 > > + movq (%rbx),%xmm7 > > + nop > > +.byte 69,15,56,203,229 > > + pshufd $0x0e,%xmm2,%xmm0 > > + movdqa 240-128(%rbp),%xmm2 > > + paddd %xmm11,%xmm2 > > +.byte 69,15,56,203,247 > > + > > + movdqa %xmm1,%xmm0 > > + cmpl 0(%rbx),%ecx > > + cmovgeq %rsp,%r8 > > + cmpl 4(%rbx),%ecx > > + cmovgeq %rsp,%r9 > > + pshufd $0x00,%xmm7,%xmm9 > > +.byte 69,15,56,203,236 > > + movdqa %xmm2,%xmm0 > > + pshufd $0x55,%xmm7,%xmm10 > > + movdqa %xmm7,%xmm11 > > +.byte 69,15,56,203,254 > > + pshufd $0x0e,%xmm1,%xmm0 > > + pcmpgtd %xmm6,%xmm9 > > + pcmpgtd %xmm6,%xmm10 > > +.byte 69,15,56,203,229 > > + pshufd $0x0e,%xmm2,%xmm0 > > + pcmpgtd %xmm6,%xmm11 > > + movdqa K256_shaext-16(%rip),%xmm3 > > +.byte 69,15,56,203,247 > > + > > + pand %xmm9,%xmm13 > > + pand %xmm10,%xmm15 > > + pand %xmm9,%xmm12 > > + pand %xmm10,%xmm14 > > + paddd %xmm7,%xmm11 > > + > > + paddd 80(%rsp),%xmm13 > > + paddd 112(%rsp),%xmm15 > > + paddd 64(%rsp),%xmm12 > > + paddd 96(%rsp),%xmm14 > > + > > + movq %xmm11,(%rbx) > > + decl %edx > > + jnz .Loop_shaext > > + > > + movl 280(%rsp),%edx > > + > > + pshufd $27,%xmm12,%xmm12 > > + pshufd $27,%xmm13,%xmm13 > > + pshufd $27,%xmm14,%xmm14 > > + pshufd $27,%xmm15,%xmm15 > > + > > + movdqa %xmm12,%xmm5 > > + movdqa %xmm13,%xmm6 > > + punpckldq %xmm14,%xmm12 > > + punpckhdq %xmm14,%xmm5 > > + punpckldq %xmm15,%xmm13 > > + punpckhdq %xmm15,%xmm6 > > + > > + movq %xmm12,0-128(%rdi) > > + psrldq $8,%xmm12 > > + movq %xmm5,128-128(%rdi) > > + psrldq $8,%xmm5 > > + movq %xmm12,32-128(%rdi) > > + movq %xmm5,160-128(%rdi) > > + > > + movq %xmm13,64-128(%rdi) > > + psrldq $8,%xmm13 > > + movq %xmm6,192-128(%rdi) > > + psrldq $8,%xmm6 > > + movq %xmm13,96-128(%rdi) > > + movq %xmm6,224-128(%rdi) > > + > > + leaq 8(%rdi),%rdi > > + leaq 32(%rsi),%rsi > > + decl %edx > > + jnz .Loop_grande_shaext > > + > > +.Ldone_shaext: > > + > > + movq -16(%rax),%rbp > > +.cfi_restore %rbp > > + movq -8(%rax),%rbx > > +.cfi_restore %rbx > > + leaq (%rax),%rsp > > +.cfi_def_cfa_register %rsp > > +.Lepilogue_shaext: > > + .byte 0xf3,0xc3 > > 
+.cfi_endproc > > +.size sha256_multi_block_shaext,.-sha256_multi_block_shaext > > +.align 256 > > +K256: > > +.long 1116352408,1116352408,1116352408,1116352408 > > +.long 1116352408,1116352408,1116352408,1116352408 > > +.long 1899447441,1899447441,1899447441,1899447441 > > +.long 1899447441,1899447441,1899447441,1899447441 > > +.long 3049323471,3049323471,3049323471,3049323471 > > +.long 3049323471,3049323471,3049323471,3049323471 > > +.long 3921009573,3921009573,3921009573,3921009573 > > +.long 3921009573,3921009573,3921009573,3921009573 > > +.long 961987163,961987163,961987163,961987163 > > +.long 961987163,961987163,961987163,961987163 > > +.long 1508970993,1508970993,1508970993,1508970993 > > +.long 1508970993,1508970993,1508970993,1508970993 > > +.long 2453635748,2453635748,2453635748,2453635748 > > +.long 2453635748,2453635748,2453635748,2453635748 > > +.long 2870763221,2870763221,2870763221,2870763221 > > +.long 2870763221,2870763221,2870763221,2870763221 > > +.long 3624381080,3624381080,3624381080,3624381080 > > +.long 3624381080,3624381080,3624381080,3624381080 > > +.long 310598401,310598401,310598401,310598401 > > +.long 310598401,310598401,310598401,310598401 > > +.long 607225278,607225278,607225278,607225278 > > +.long 607225278,607225278,607225278,607225278 > > +.long 1426881987,1426881987,1426881987,1426881987 > > +.long 1426881987,1426881987,1426881987,1426881987 > > +.long 1925078388,1925078388,1925078388,1925078388 > > +.long 1925078388,1925078388,1925078388,1925078388 > > +.long 2162078206,2162078206,2162078206,2162078206 > > +.long 2162078206,2162078206,2162078206,2162078206 > > +.long 2614888103,2614888103,2614888103,2614888103 > > +.long 2614888103,2614888103,2614888103,2614888103 > > +.long 3248222580,3248222580,3248222580,3248222580 > > +.long 3248222580,3248222580,3248222580,3248222580 > > +.long 3835390401,3835390401,3835390401,3835390401 > > +.long 3835390401,3835390401,3835390401,3835390401 > > +.long 4022224774,4022224774,4022224774,4022224774 > > +.long 4022224774,4022224774,4022224774,4022224774 > > +.long 264347078,264347078,264347078,264347078 > > +.long 264347078,264347078,264347078,264347078 > > +.long 604807628,604807628,604807628,604807628 > > +.long 604807628,604807628,604807628,604807628 > > +.long 770255983,770255983,770255983,770255983 > > +.long 770255983,770255983,770255983,770255983 > > +.long 1249150122,1249150122,1249150122,1249150122 > > +.long 1249150122,1249150122,1249150122,1249150122 > > +.long 1555081692,1555081692,1555081692,1555081692 > > +.long 1555081692,1555081692,1555081692,1555081692 > > +.long 1996064986,1996064986,1996064986,1996064986 > > +.long 1996064986,1996064986,1996064986,1996064986 > > +.long 2554220882,2554220882,2554220882,2554220882 > > +.long 2554220882,2554220882,2554220882,2554220882 > > +.long 2821834349,2821834349,2821834349,2821834349 > > +.long 2821834349,2821834349,2821834349,2821834349 > > +.long 2952996808,2952996808,2952996808,2952996808 > > +.long 2952996808,2952996808,2952996808,2952996808 > > +.long 3210313671,3210313671,3210313671,3210313671 > > +.long 3210313671,3210313671,3210313671,3210313671 > > +.long 3336571891,3336571891,3336571891,3336571891 > > +.long 3336571891,3336571891,3336571891,3336571891 > > +.long 3584528711,3584528711,3584528711,3584528711 > > +.long 3584528711,3584528711,3584528711,3584528711 > > +.long 113926993,113926993,113926993,113926993 > > +.long 113926993,113926993,113926993,113926993 > > +.long 338241895,338241895,338241895,338241895 > > +.long 
338241895,338241895,338241895,338241895 > > +.long 666307205,666307205,666307205,666307205 > > +.long 666307205,666307205,666307205,666307205 > > +.long 773529912,773529912,773529912,773529912 > > +.long 773529912,773529912,773529912,773529912 > > +.long 1294757372,1294757372,1294757372,1294757372 > > +.long 1294757372,1294757372,1294757372,1294757372 > > +.long 1396182291,1396182291,1396182291,1396182291 > > +.long 1396182291,1396182291,1396182291,1396182291 > > +.long 1695183700,1695183700,1695183700,1695183700 > > +.long 1695183700,1695183700,1695183700,1695183700 > > +.long 1986661051,1986661051,1986661051,1986661051 > > +.long 1986661051,1986661051,1986661051,1986661051 > > +.long 2177026350,2177026350,2177026350,2177026350 > > +.long 2177026350,2177026350,2177026350,2177026350 > > +.long 2456956037,2456956037,2456956037,2456956037 > > +.long 2456956037,2456956037,2456956037,2456956037 > > +.long 2730485921,2730485921,2730485921,2730485921 > > +.long 2730485921,2730485921,2730485921,2730485921 > > +.long 2820302411,2820302411,2820302411,2820302411 > > +.long 2820302411,2820302411,2820302411,2820302411 > > +.long 3259730800,3259730800,3259730800,3259730800 > > +.long 3259730800,3259730800,3259730800,3259730800 > > +.long 3345764771,3345764771,3345764771,3345764771 > > +.long 3345764771,3345764771,3345764771,3345764771 > > +.long 3516065817,3516065817,3516065817,3516065817 > > +.long 3516065817,3516065817,3516065817,3516065817 > > +.long 3600352804,3600352804,3600352804,3600352804 > > +.long 3600352804,3600352804,3600352804,3600352804 > > +.long 4094571909,4094571909,4094571909,4094571909 > > +.long 4094571909,4094571909,4094571909,4094571909 > > +.long 275423344,275423344,275423344,275423344 > > +.long 275423344,275423344,275423344,275423344 > > +.long 430227734,430227734,430227734,430227734 > > +.long 430227734,430227734,430227734,430227734 > > +.long 506948616,506948616,506948616,506948616 > > +.long 506948616,506948616,506948616,506948616 > > +.long 659060556,659060556,659060556,659060556 > > +.long 659060556,659060556,659060556,659060556 > > +.long 883997877,883997877,883997877,883997877 > > +.long 883997877,883997877,883997877,883997877 > > +.long 958139571,958139571,958139571,958139571 > > +.long 958139571,958139571,958139571,958139571 > > +.long 1322822218,1322822218,1322822218,1322822218 > > +.long 1322822218,1322822218,1322822218,1322822218 > > +.long 1537002063,1537002063,1537002063,1537002063 > > +.long 1537002063,1537002063,1537002063,1537002063 > > +.long 1747873779,1747873779,1747873779,1747873779 > > +.long 1747873779,1747873779,1747873779,1747873779 > > +.long 1955562222,1955562222,1955562222,1955562222 > > +.long 1955562222,1955562222,1955562222,1955562222 > > +.long 2024104815,2024104815,2024104815,2024104815 > > +.long 2024104815,2024104815,2024104815,2024104815 > > +.long 2227730452,2227730452,2227730452,2227730452 > > +.long 2227730452,2227730452,2227730452,2227730452 > > +.long 2361852424,2361852424,2361852424,2361852424 > > +.long 2361852424,2361852424,2361852424,2361852424 > > +.long 2428436474,2428436474,2428436474,2428436474 > > +.long 2428436474,2428436474,2428436474,2428436474 > > +.long 2756734187,2756734187,2756734187,2756734187 > > +.long 2756734187,2756734187,2756734187,2756734187 > > +.long 3204031479,3204031479,3204031479,3204031479 > > +.long 3204031479,3204031479,3204031479,3204031479 > > +.long 3329325298,3329325298,3329325298,3329325298 > > +.long 3329325298,3329325298,3329325298,3329325298 > > +.Lpbswap: > > +.long 
0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > +K256_shaext: > > +.long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 > > +.long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 > > +.long 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 > > +.long 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 > > +.long 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc > > +.long 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da > > +.long 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 > > +.long 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 > > +.long 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 > > +.long 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 > > +.long 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 > > +.long 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 > > +.long 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 > > +.long 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 > > +.long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 > > +.long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 > > +.byte > > > 83,72,65,50,53,54,32,109,117,108,116,105,45,98,108,111,99,107,32,116,114,9 > > > 7,110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82 > , > > > 89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101, > 1 > > 10,115,115,108,46,111,114,103,62,0 > > diff --git > a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-x86_64.S > > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-x86_64.S > > new file mode 100644 > > index 0000000000..a5d3cf5068 > > --- /dev/null > > +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha256-x86_64.S > > @@ -0,0 +1,3097 @@ > > +# WARNING: do not edit! > > +# Generated from openssl/crypto/sha/asm/sha512-x86_64.pl > > +# > > +# Copyright 2005-2020 The OpenSSL Project Authors. All Rights Reserved. > > +# > > +# Licensed under the OpenSSL license (the "License"). You may not use > > +# this file except in compliance with the License. 
You can obtain a copy > > +# in the file LICENSE in the source distribution or at > > +# https://www.openssl.org/source/license.html > > + > > +.text > > + > > + > > +.globl sha256_block_data_order > > +.type sha256_block_data_order,@function > > +.align 16 > > +sha256_block_data_order: > > +.cfi_startproc > > + leaq OPENSSL_ia32cap_P(%rip),%r11 > > + movl 0(%r11),%r9d > > + movl 4(%r11),%r10d > > + movl 8(%r11),%r11d > > + testl $536870912,%r11d > > + jnz _shaext_shortcut > > + testl $512,%r10d > > + jnz .Lssse3_shortcut > > + movq %rsp,%rax > > +.cfi_def_cfa_register %rax > > + pushq %rbx > > +.cfi_offset %rbx,-16 > > + pushq %rbp > > +.cfi_offset %rbp,-24 > > + pushq %r12 > > +.cfi_offset %r12,-32 > > + pushq %r13 > > +.cfi_offset %r13,-40 > > + pushq %r14 > > +.cfi_offset %r14,-48 > > + pushq %r15 > > +.cfi_offset %r15,-56 > > + shlq $4,%rdx > > + subq $64+32,%rsp > > + leaq (%rsi,%rdx,4),%rdx > > + andq $-64,%rsp > > + movq %rdi,64+0(%rsp) > > + movq %rsi,64+8(%rsp) > > + movq %rdx,64+16(%rsp) > > + movq %rax,88(%rsp) > > +.cfi_escape 0x0f,0x06,0x77,0xd8,0x00,0x06,0x23,0x08 > > +.Lprologue: > > + > > + movl 0(%rdi),%eax > > + movl 4(%rdi),%ebx > > + movl 8(%rdi),%ecx > > + movl 12(%rdi),%edx > > + movl 16(%rdi),%r8d > > + movl 20(%rdi),%r9d > > + movl 24(%rdi),%r10d > > + movl 28(%rdi),%r11d > > + jmp .Lloop > > + > > +.align 16 > > +.Lloop: > > + movl %ebx,%edi > > + leaq K256(%rip),%rbp > > + xorl %ecx,%edi > > + movl 0(%rsi),%r12d > > + movl %r8d,%r13d > > + movl %eax,%r14d > > + bswapl %r12d > > + rorl $14,%r13d > > + movl %r9d,%r15d > > + > > + xorl %r8d,%r13d > > + rorl $9,%r14d > > + xorl %r10d,%r15d > > + > > + movl %r12d,0(%rsp) > > + xorl %eax,%r14d > > + andl %r8d,%r15d > > + > > + rorl $5,%r13d > > + addl %r11d,%r12d > > + xorl %r10d,%r15d > > + > > + rorl $11,%r14d > > + xorl %r8d,%r13d > > + addl %r15d,%r12d > > + > > + movl %eax,%r15d > > + addl (%rbp),%r12d > > + xorl %eax,%r14d > > + > > + xorl %ebx,%r15d > > + rorl $6,%r13d > > + movl %ebx,%r11d > > + > > + andl %r15d,%edi > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %edi,%r11d > > + addl %r12d,%edx > > + addl %r12d,%r11d > > + > > + leaq 4(%rbp),%rbp > > + addl %r14d,%r11d > > + movl 4(%rsi),%r12d > > + movl %edx,%r13d > > + movl %r11d,%r14d > > + bswapl %r12d > > + rorl $14,%r13d > > + movl %r8d,%edi > > + > > + xorl %edx,%r13d > > + rorl $9,%r14d > > + xorl %r9d,%edi > > + > > + movl %r12d,4(%rsp) > > + xorl %r11d,%r14d > > + andl %edx,%edi > > + > > + rorl $5,%r13d > > + addl %r10d,%r12d > > + xorl %r9d,%edi > > + > > + rorl $11,%r14d > > + xorl %edx,%r13d > > + addl %edi,%r12d > > + > > + movl %r11d,%edi > > + addl (%rbp),%r12d > > + xorl %r11d,%r14d > > + > > + xorl %eax,%edi > > + rorl $6,%r13d > > + movl %eax,%r10d > > + > > + andl %edi,%r15d > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %r15d,%r10d > > + addl %r12d,%ecx > > + addl %r12d,%r10d > > + > > + leaq 4(%rbp),%rbp > > + addl %r14d,%r10d > > + movl 8(%rsi),%r12d > > + movl %ecx,%r13d > > + movl %r10d,%r14d > > + bswapl %r12d > > + rorl $14,%r13d > > + movl %edx,%r15d > > + > > + xorl %ecx,%r13d > > + rorl $9,%r14d > > + xorl %r8d,%r15d > > + > > + movl %r12d,8(%rsp) > > + xorl %r10d,%r14d > > + andl %ecx,%r15d > > + > > + rorl $5,%r13d > > + addl %r9d,%r12d > > + xorl %r8d,%r15d > > + > > + rorl $11,%r14d > > + xorl %ecx,%r13d > > + addl %r15d,%r12d > > + > > + movl %r10d,%r15d > > + addl (%rbp),%r12d > > + xorl %r10d,%r14d > > + > > + xorl %r11d,%r15d > > + rorl $6,%r13d > > + movl %r11d,%r9d > > + > > + andl %r15d,%edi > 
> + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %edi,%r9d > > + addl %r12d,%ebx > > + addl %r12d,%r9d > > + > > + leaq 4(%rbp),%rbp > > + addl %r14d,%r9d > > + movl 12(%rsi),%r12d > > + movl %ebx,%r13d > > + movl %r9d,%r14d > > + bswapl %r12d > > + rorl $14,%r13d > > + movl %ecx,%edi > > + > > + xorl %ebx,%r13d > > + rorl $9,%r14d > > + xorl %edx,%edi > > + > > + movl %r12d,12(%rsp) > > + xorl %r9d,%r14d > > + andl %ebx,%edi > > + > > + rorl $5,%r13d > > + addl %r8d,%r12d > > + xorl %edx,%edi > > + > > + rorl $11,%r14d > > + xorl %ebx,%r13d > > + addl %edi,%r12d > > + > > + movl %r9d,%edi > > + addl (%rbp),%r12d > > + xorl %r9d,%r14d > > + > > + xorl %r10d,%edi > > + rorl $6,%r13d > > + movl %r10d,%r8d > > + > > + andl %edi,%r15d > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %r15d,%r8d > > + addl %r12d,%eax > > + addl %r12d,%r8d > > + > > + leaq 20(%rbp),%rbp > > + addl %r14d,%r8d > > + movl 16(%rsi),%r12d > > + movl %eax,%r13d > > + movl %r8d,%r14d > > + bswapl %r12d > > + rorl $14,%r13d > > + movl %ebx,%r15d > > + > > + xorl %eax,%r13d > > + rorl $9,%r14d > > + xorl %ecx,%r15d > > + > > + movl %r12d,16(%rsp) > > + xorl %r8d,%r14d > > + andl %eax,%r15d > > + > > + rorl $5,%r13d > > + addl %edx,%r12d > > + xorl %ecx,%r15d > > + > > + rorl $11,%r14d > > + xorl %eax,%r13d > > + addl %r15d,%r12d > > + > > + movl %r8d,%r15d > > + addl (%rbp),%r12d > > + xorl %r8d,%r14d > > + > > + xorl %r9d,%r15d > > + rorl $6,%r13d > > + movl %r9d,%edx > > + > > + andl %r15d,%edi > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %edi,%edx > > + addl %r12d,%r11d > > + addl %r12d,%edx > > + > > + leaq 4(%rbp),%rbp > > + addl %r14d,%edx > > + movl 20(%rsi),%r12d > > + movl %r11d,%r13d > > + movl %edx,%r14d > > + bswapl %r12d > > + rorl $14,%r13d > > + movl %eax,%edi > > + > > + xorl %r11d,%r13d > > + rorl $9,%r14d > > + xorl %ebx,%edi > > + > > + movl %r12d,20(%rsp) > > + xorl %edx,%r14d > > + andl %r11d,%edi > > + > > + rorl $5,%r13d > > + addl %ecx,%r12d > > + xorl %ebx,%edi > > + > > + rorl $11,%r14d > > + xorl %r11d,%r13d > > + addl %edi,%r12d > > + > > + movl %edx,%edi > > + addl (%rbp),%r12d > > + xorl %edx,%r14d > > + > > + xorl %r8d,%edi > > + rorl $6,%r13d > > + movl %r8d,%ecx > > + > > + andl %edi,%r15d > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %r15d,%ecx > > + addl %r12d,%r10d > > + addl %r12d,%ecx > > + > > + leaq 4(%rbp),%rbp > > + addl %r14d,%ecx > > + movl 24(%rsi),%r12d > > + movl %r10d,%r13d > > + movl %ecx,%r14d > > + bswapl %r12d > > + rorl $14,%r13d > > + movl %r11d,%r15d > > + > > + xorl %r10d,%r13d > > + rorl $9,%r14d > > + xorl %eax,%r15d > > + > > + movl %r12d,24(%rsp) > > + xorl %ecx,%r14d > > + andl %r10d,%r15d > > + > > + rorl $5,%r13d > > + addl %ebx,%r12d > > + xorl %eax,%r15d > > + > > + rorl $11,%r14d > > + xorl %r10d,%r13d > > + addl %r15d,%r12d > > + > > + movl %ecx,%r15d > > + addl (%rbp),%r12d > > + xorl %ecx,%r14d > > + > > + xorl %edx,%r15d > > + rorl $6,%r13d > > + movl %edx,%ebx > > + > > + andl %r15d,%edi > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %edi,%ebx > > + addl %r12d,%r9d > > + addl %r12d,%ebx > > + > > + leaq 4(%rbp),%rbp > > + addl %r14d,%ebx > > + movl 28(%rsi),%r12d > > + movl %r9d,%r13d > > + movl %ebx,%r14d > > + bswapl %r12d > > + rorl $14,%r13d > > + movl %r10d,%edi > > + > > + xorl %r9d,%r13d > > + rorl $9,%r14d > > + xorl %r11d,%edi > > + > > + movl %r12d,28(%rsp) > > + xorl %ebx,%r14d > > + andl %r9d,%edi > > + > > + rorl $5,%r13d > > + addl %eax,%r12d > > + xorl %r11d,%edi > > + > > + 
rorl $11,%r14d > > + xorl %r9d,%r13d > > + addl %edi,%r12d > > + > > + movl %ebx,%edi > > + addl (%rbp),%r12d > > + xorl %ebx,%r14d > > + > > + xorl %ecx,%edi > > + rorl $6,%r13d > > + movl %ecx,%eax > > + > > + andl %edi,%r15d > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %r15d,%eax > > + addl %r12d,%r8d > > + addl %r12d,%eax > > + > > + leaq 20(%rbp),%rbp > > + addl %r14d,%eax > > + movl 32(%rsi),%r12d > > + movl %r8d,%r13d > > + movl %eax,%r14d > > + bswapl %r12d > > + rorl $14,%r13d > > + movl %r9d,%r15d > > + > > + xorl %r8d,%r13d > > + rorl $9,%r14d > > + xorl %r10d,%r15d > > + > > + movl %r12d,32(%rsp) > > + xorl %eax,%r14d > > + andl %r8d,%r15d > > + > > + rorl $5,%r13d > > + addl %r11d,%r12d > > + xorl %r10d,%r15d > > + > > + rorl $11,%r14d > > + xorl %r8d,%r13d > > + addl %r15d,%r12d > > + > > + movl %eax,%r15d > > + addl (%rbp),%r12d > > + xorl %eax,%r14d > > + > > + xorl %ebx,%r15d > > + rorl $6,%r13d > > + movl %ebx,%r11d > > + > > + andl %r15d,%edi > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %edi,%r11d > > + addl %r12d,%edx > > + addl %r12d,%r11d > > + > > + leaq 4(%rbp),%rbp > > + addl %r14d,%r11d > > + movl 36(%rsi),%r12d > > + movl %edx,%r13d > > + movl %r11d,%r14d > > + bswapl %r12d > > + rorl $14,%r13d > > + movl %r8d,%edi > > + > > + xorl %edx,%r13d > > + rorl $9,%r14d > > + xorl %r9d,%edi > > + > > + movl %r12d,36(%rsp) > > + xorl %r11d,%r14d > > + andl %edx,%edi > > + > > + rorl $5,%r13d > > + addl %r10d,%r12d > > + xorl %r9d,%edi > > + > > + rorl $11,%r14d > > + xorl %edx,%r13d > > + addl %edi,%r12d > > + > > + movl %r11d,%edi > > + addl (%rbp),%r12d > > + xorl %r11d,%r14d > > + > > + xorl %eax,%edi > > + rorl $6,%r13d > > + movl %eax,%r10d > > + > > + andl %edi,%r15d > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %r15d,%r10d > > + addl %r12d,%ecx > > + addl %r12d,%r10d > > + > > + leaq 4(%rbp),%rbp > > + addl %r14d,%r10d > > + movl 40(%rsi),%r12d > > + movl %ecx,%r13d > > + movl %r10d,%r14d > > + bswapl %r12d > > + rorl $14,%r13d > > + movl %edx,%r15d > > + > > + xorl %ecx,%r13d > > + rorl $9,%r14d > > + xorl %r8d,%r15d > > + > > + movl %r12d,40(%rsp) > > + xorl %r10d,%r14d > > + andl %ecx,%r15d > > + > > + rorl $5,%r13d > > + addl %r9d,%r12d > > + xorl %r8d,%r15d > > + > > + rorl $11,%r14d > > + xorl %ecx,%r13d > > + addl %r15d,%r12d > > + > > + movl %r10d,%r15d > > + addl (%rbp),%r12d > > + xorl %r10d,%r14d > > + > > + xorl %r11d,%r15d > > + rorl $6,%r13d > > + movl %r11d,%r9d > > + > > + andl %r15d,%edi > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %edi,%r9d > > + addl %r12d,%ebx > > + addl %r12d,%r9d > > + > > + leaq 4(%rbp),%rbp > > + addl %r14d,%r9d > > + movl 44(%rsi),%r12d > > + movl %ebx,%r13d > > + movl %r9d,%r14d > > + bswapl %r12d > > + rorl $14,%r13d > > + movl %ecx,%edi > > + > > + xorl %ebx,%r13d > > + rorl $9,%r14d > > + xorl %edx,%edi > > + > > + movl %r12d,44(%rsp) > > + xorl %r9d,%r14d > > + andl %ebx,%edi > > + > > + rorl $5,%r13d > > + addl %r8d,%r12d > > + xorl %edx,%edi > > + > > + rorl $11,%r14d > > + xorl %ebx,%r13d > > + addl %edi,%r12d > > + > > + movl %r9d,%edi > > + addl (%rbp),%r12d > > + xorl %r9d,%r14d > > + > > + xorl %r10d,%edi > > + rorl $6,%r13d > > + movl %r10d,%r8d > > + > > + andl %edi,%r15d > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %r15d,%r8d > > + addl %r12d,%eax > > + addl %r12d,%r8d > > + > > + leaq 20(%rbp),%rbp > > + addl %r14d,%r8d > > + movl 48(%rsi),%r12d > > + movl %eax,%r13d > > + movl %r8d,%r14d > > + bswapl %r12d > > + rorl $14,%r13d > > + 
movl %ebx,%r15d > > + > > + xorl %eax,%r13d > > + rorl $9,%r14d > > + xorl %ecx,%r15d > > + > > + movl %r12d,48(%rsp) > > + xorl %r8d,%r14d > > + andl %eax,%r15d > > + > > + rorl $5,%r13d > > + addl %edx,%r12d > > + xorl %ecx,%r15d > > + > > + rorl $11,%r14d > > + xorl %eax,%r13d > > + addl %r15d,%r12d > > + > > + movl %r8d,%r15d > > + addl (%rbp),%r12d > > + xorl %r8d,%r14d > > + > > + xorl %r9d,%r15d > > + rorl $6,%r13d > > + movl %r9d,%edx > > + > > + andl %r15d,%edi > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %edi,%edx > > + addl %r12d,%r11d > > + addl %r12d,%edx > > + > > + leaq 4(%rbp),%rbp > > + addl %r14d,%edx > > + movl 52(%rsi),%r12d > > + movl %r11d,%r13d > > + movl %edx,%r14d > > + bswapl %r12d > > + rorl $14,%r13d > > + movl %eax,%edi > > + > > + xorl %r11d,%r13d > > + rorl $9,%r14d > > + xorl %ebx,%edi > > + > > + movl %r12d,52(%rsp) > > + xorl %edx,%r14d > > + andl %r11d,%edi > > + > > + rorl $5,%r13d > > + addl %ecx,%r12d > > + xorl %ebx,%edi > > + > > + rorl $11,%r14d > > + xorl %r11d,%r13d > > + addl %edi,%r12d > > + > > + movl %edx,%edi > > + addl (%rbp),%r12d > > + xorl %edx,%r14d > > + > > + xorl %r8d,%edi > > + rorl $6,%r13d > > + movl %r8d,%ecx > > + > > + andl %edi,%r15d > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %r15d,%ecx > > + addl %r12d,%r10d > > + addl %r12d,%ecx > > + > > + leaq 4(%rbp),%rbp > > + addl %r14d,%ecx > > + movl 56(%rsi),%r12d > > + movl %r10d,%r13d > > + movl %ecx,%r14d > > + bswapl %r12d > > + rorl $14,%r13d > > + movl %r11d,%r15d > > + > > + xorl %r10d,%r13d > > + rorl $9,%r14d > > + xorl %eax,%r15d > > + > > + movl %r12d,56(%rsp) > > + xorl %ecx,%r14d > > + andl %r10d,%r15d > > + > > + rorl $5,%r13d > > + addl %ebx,%r12d > > + xorl %eax,%r15d > > + > > + rorl $11,%r14d > > + xorl %r10d,%r13d > > + addl %r15d,%r12d > > + > > + movl %ecx,%r15d > > + addl (%rbp),%r12d > > + xorl %ecx,%r14d > > + > > + xorl %edx,%r15d > > + rorl $6,%r13d > > + movl %edx,%ebx > > + > > + andl %r15d,%edi > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %edi,%ebx > > + addl %r12d,%r9d > > + addl %r12d,%ebx > > + > > + leaq 4(%rbp),%rbp > > + addl %r14d,%ebx > > + movl 60(%rsi),%r12d > > + movl %r9d,%r13d > > + movl %ebx,%r14d > > + bswapl %r12d > > + rorl $14,%r13d > > + movl %r10d,%edi > > + > > + xorl %r9d,%r13d > > + rorl $9,%r14d > > + xorl %r11d,%edi > > + > > + movl %r12d,60(%rsp) > > + xorl %ebx,%r14d > > + andl %r9d,%edi > > + > > + rorl $5,%r13d > > + addl %eax,%r12d > > + xorl %r11d,%edi > > + > > + rorl $11,%r14d > > + xorl %r9d,%r13d > > + addl %edi,%r12d > > + > > + movl %ebx,%edi > > + addl (%rbp),%r12d > > + xorl %ebx,%r14d > > + > > + xorl %ecx,%edi > > + rorl $6,%r13d > > + movl %ecx,%eax > > + > > + andl %edi,%r15d > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %r15d,%eax > > + addl %r12d,%r8d > > + addl %r12d,%eax > > + > > + leaq 20(%rbp),%rbp > > + jmp .Lrounds_16_xx > > +.align 16 > > +.Lrounds_16_xx: > > + movl 4(%rsp),%r13d > > + movl 56(%rsp),%r15d > > + > > + movl %r13d,%r12d > > + rorl $11,%r13d > > + addl %r14d,%eax > > + movl %r15d,%r14d > > + rorl $2,%r15d > > + > > + xorl %r12d,%r13d > > + shrl $3,%r12d > > + rorl $7,%r13d > > + xorl %r14d,%r15d > > + shrl $10,%r14d > > + > > + rorl $17,%r15d > > + xorl %r13d,%r12d > > + xorl %r14d,%r15d > > + addl 36(%rsp),%r12d > > + > > + addl 0(%rsp),%r12d > > + movl %r8d,%r13d > > + addl %r15d,%r12d > > + movl %eax,%r14d > > + rorl $14,%r13d > > + movl %r9d,%r15d > > + > > + xorl %r8d,%r13d > > + rorl $9,%r14d > > + xorl %r10d,%r15d > > + > > 
+ movl %r12d,0(%rsp) > > + xorl %eax,%r14d > > + andl %r8d,%r15d > > + > > + rorl $5,%r13d > > + addl %r11d,%r12d > > + xorl %r10d,%r15d > > + > > + rorl $11,%r14d > > + xorl %r8d,%r13d > > + addl %r15d,%r12d > > + > > + movl %eax,%r15d > > + addl (%rbp),%r12d > > + xorl %eax,%r14d > > + > > + xorl %ebx,%r15d > > + rorl $6,%r13d > > + movl %ebx,%r11d > > + > > + andl %r15d,%edi > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %edi,%r11d > > + addl %r12d,%edx > > + addl %r12d,%r11d > > + > > + leaq 4(%rbp),%rbp > > + movl 8(%rsp),%r13d > > + movl 60(%rsp),%edi > > + > > + movl %r13d,%r12d > > + rorl $11,%r13d > > + addl %r14d,%r11d > > + movl %edi,%r14d > > + rorl $2,%edi > > + > > + xorl %r12d,%r13d > > + shrl $3,%r12d > > + rorl $7,%r13d > > + xorl %r14d,%edi > > + shrl $10,%r14d > > + > > + rorl $17,%edi > > + xorl %r13d,%r12d > > + xorl %r14d,%edi > > + addl 40(%rsp),%r12d > > + > > + addl 4(%rsp),%r12d > > + movl %edx,%r13d > > + addl %edi,%r12d > > + movl %r11d,%r14d > > + rorl $14,%r13d > > + movl %r8d,%edi > > + > > + xorl %edx,%r13d > > + rorl $9,%r14d > > + xorl %r9d,%edi > > + > > + movl %r12d,4(%rsp) > > + xorl %r11d,%r14d > > + andl %edx,%edi > > + > > + rorl $5,%r13d > > + addl %r10d,%r12d > > + xorl %r9d,%edi > > + > > + rorl $11,%r14d > > + xorl %edx,%r13d > > + addl %edi,%r12d > > + > > + movl %r11d,%edi > > + addl (%rbp),%r12d > > + xorl %r11d,%r14d > > + > > + xorl %eax,%edi > > + rorl $6,%r13d > > + movl %eax,%r10d > > + > > + andl %edi,%r15d > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %r15d,%r10d > > + addl %r12d,%ecx > > + addl %r12d,%r10d > > + > > + leaq 4(%rbp),%rbp > > + movl 12(%rsp),%r13d > > + movl 0(%rsp),%r15d > > + > > + movl %r13d,%r12d > > + rorl $11,%r13d > > + addl %r14d,%r10d > > + movl %r15d,%r14d > > + rorl $2,%r15d > > + > > + xorl %r12d,%r13d > > + shrl $3,%r12d > > + rorl $7,%r13d > > + xorl %r14d,%r15d > > + shrl $10,%r14d > > + > > + rorl $17,%r15d > > + xorl %r13d,%r12d > > + xorl %r14d,%r15d > > + addl 44(%rsp),%r12d > > + > > + addl 8(%rsp),%r12d > > + movl %ecx,%r13d > > + addl %r15d,%r12d > > + movl %r10d,%r14d > > + rorl $14,%r13d > > + movl %edx,%r15d > > + > > + xorl %ecx,%r13d > > + rorl $9,%r14d > > + xorl %r8d,%r15d > > + > > + movl %r12d,8(%rsp) > > + xorl %r10d,%r14d > > + andl %ecx,%r15d > > + > > + rorl $5,%r13d > > + addl %r9d,%r12d > > + xorl %r8d,%r15d > > + > > + rorl $11,%r14d > > + xorl %ecx,%r13d > > + addl %r15d,%r12d > > + > > + movl %r10d,%r15d > > + addl (%rbp),%r12d > > + xorl %r10d,%r14d > > + > > + xorl %r11d,%r15d > > + rorl $6,%r13d > > + movl %r11d,%r9d > > + > > + andl %r15d,%edi > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %edi,%r9d > > + addl %r12d,%ebx > > + addl %r12d,%r9d > > + > > + leaq 4(%rbp),%rbp > > + movl 16(%rsp),%r13d > > + movl 4(%rsp),%edi > > + > > + movl %r13d,%r12d > > + rorl $11,%r13d > > + addl %r14d,%r9d > > + movl %edi,%r14d > > + rorl $2,%edi > > + > > + xorl %r12d,%r13d > > + shrl $3,%r12d > > + rorl $7,%r13d > > + xorl %r14d,%edi > > + shrl $10,%r14d > > + > > + rorl $17,%edi > > + xorl %r13d,%r12d > > + xorl %r14d,%edi > > + addl 48(%rsp),%r12d > > + > > + addl 12(%rsp),%r12d > > + movl %ebx,%r13d > > + addl %edi,%r12d > > + movl %r9d,%r14d > > + rorl $14,%r13d > > + movl %ecx,%edi > > + > > + xorl %ebx,%r13d > > + rorl $9,%r14d > > + xorl %edx,%edi > > + > > + movl %r12d,12(%rsp) > > + xorl %r9d,%r14d > > + andl %ebx,%edi > > + > > + rorl $5,%r13d > > + addl %r8d,%r12d > > + xorl %edx,%edi > > + > > + rorl $11,%r14d > > + xorl %ebx,%r13d > > + 
addl %edi,%r12d > > + > > + movl %r9d,%edi > > + addl (%rbp),%r12d > > + xorl %r9d,%r14d > > + > > + xorl %r10d,%edi > > + rorl $6,%r13d > > + movl %r10d,%r8d > > + > > + andl %edi,%r15d > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %r15d,%r8d > > + addl %r12d,%eax > > + addl %r12d,%r8d > > + > > + leaq 20(%rbp),%rbp > > + movl 20(%rsp),%r13d > > + movl 8(%rsp),%r15d > > + > > + movl %r13d,%r12d > > + rorl $11,%r13d > > + addl %r14d,%r8d > > + movl %r15d,%r14d > > + rorl $2,%r15d > > + > > + xorl %r12d,%r13d > > + shrl $3,%r12d > > + rorl $7,%r13d > > + xorl %r14d,%r15d > > + shrl $10,%r14d > > + > > + rorl $17,%r15d > > + xorl %r13d,%r12d > > + xorl %r14d,%r15d > > + addl 52(%rsp),%r12d > > + > > + addl 16(%rsp),%r12d > > + movl %eax,%r13d > > + addl %r15d,%r12d > > + movl %r8d,%r14d > > + rorl $14,%r13d > > + movl %ebx,%r15d > > + > > + xorl %eax,%r13d > > + rorl $9,%r14d > > + xorl %ecx,%r15d > > + > > + movl %r12d,16(%rsp) > > + xorl %r8d,%r14d > > + andl %eax,%r15d > > + > > + rorl $5,%r13d > > + addl %edx,%r12d > > + xorl %ecx,%r15d > > + > > + rorl $11,%r14d > > + xorl %eax,%r13d > > + addl %r15d,%r12d > > + > > + movl %r8d,%r15d > > + addl (%rbp),%r12d > > + xorl %r8d,%r14d > > + > > + xorl %r9d,%r15d > > + rorl $6,%r13d > > + movl %r9d,%edx > > + > > + andl %r15d,%edi > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %edi,%edx > > + addl %r12d,%r11d > > + addl %r12d,%edx > > + > > + leaq 4(%rbp),%rbp > > + movl 24(%rsp),%r13d > > + movl 12(%rsp),%edi > > + > > + movl %r13d,%r12d > > + rorl $11,%r13d > > + addl %r14d,%edx > > + movl %edi,%r14d > > + rorl $2,%edi > > + > > + xorl %r12d,%r13d > > + shrl $3,%r12d > > + rorl $7,%r13d > > + xorl %r14d,%edi > > + shrl $10,%r14d > > + > > + rorl $17,%edi > > + xorl %r13d,%r12d > > + xorl %r14d,%edi > > + addl 56(%rsp),%r12d > > + > > + addl 20(%rsp),%r12d > > + movl %r11d,%r13d > > + addl %edi,%r12d > > + movl %edx,%r14d > > + rorl $14,%r13d > > + movl %eax,%edi > > + > > + xorl %r11d,%r13d > > + rorl $9,%r14d > > + xorl %ebx,%edi > > + > > + movl %r12d,20(%rsp) > > + xorl %edx,%r14d > > + andl %r11d,%edi > > + > > + rorl $5,%r13d > > + addl %ecx,%r12d > > + xorl %ebx,%edi > > + > > + rorl $11,%r14d > > + xorl %r11d,%r13d > > + addl %edi,%r12d > > + > > + movl %edx,%edi > > + addl (%rbp),%r12d > > + xorl %edx,%r14d > > + > > + xorl %r8d,%edi > > + rorl $6,%r13d > > + movl %r8d,%ecx > > + > > + andl %edi,%r15d > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %r15d,%ecx > > + addl %r12d,%r10d > > + addl %r12d,%ecx > > + > > + leaq 4(%rbp),%rbp > > + movl 28(%rsp),%r13d > > + movl 16(%rsp),%r15d > > + > > + movl %r13d,%r12d > > + rorl $11,%r13d > > + addl %r14d,%ecx > > + movl %r15d,%r14d > > + rorl $2,%r15d > > + > > + xorl %r12d,%r13d > > + shrl $3,%r12d > > + rorl $7,%r13d > > + xorl %r14d,%r15d > > + shrl $10,%r14d > > + > > + rorl $17,%r15d > > + xorl %r13d,%r12d > > + xorl %r14d,%r15d > > + addl 60(%rsp),%r12d > > + > > + addl 24(%rsp),%r12d > > + movl %r10d,%r13d > > + addl %r15d,%r12d > > + movl %ecx,%r14d > > + rorl $14,%r13d > > + movl %r11d,%r15d > > + > > + xorl %r10d,%r13d > > + rorl $9,%r14d > > + xorl %eax,%r15d > > + > > + movl %r12d,24(%rsp) > > + xorl %ecx,%r14d > > + andl %r10d,%r15d > > + > > + rorl $5,%r13d > > + addl %ebx,%r12d > > + xorl %eax,%r15d > > + > > + rorl $11,%r14d > > + xorl %r10d,%r13d > > + addl %r15d,%r12d > > + > > + movl %ecx,%r15d > > + addl (%rbp),%r12d > > + xorl %ecx,%r14d > > + > > + xorl %edx,%r15d > > + rorl $6,%r13d > > + movl %edx,%ebx > > + > > + andl 
%r15d,%edi > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %edi,%ebx > > + addl %r12d,%r9d > > + addl %r12d,%ebx > > + > > + leaq 4(%rbp),%rbp > > + movl 32(%rsp),%r13d > > + movl 20(%rsp),%edi > > + > > + movl %r13d,%r12d > > + rorl $11,%r13d > > + addl %r14d,%ebx > > + movl %edi,%r14d > > + rorl $2,%edi > > + > > + xorl %r12d,%r13d > > + shrl $3,%r12d > > + rorl $7,%r13d > > + xorl %r14d,%edi > > + shrl $10,%r14d > > + > > + rorl $17,%edi > > + xorl %r13d,%r12d > > + xorl %r14d,%edi > > + addl 0(%rsp),%r12d > > + > > + addl 28(%rsp),%r12d > > + movl %r9d,%r13d > > + addl %edi,%r12d > > + movl %ebx,%r14d > > + rorl $14,%r13d > > + movl %r10d,%edi > > + > > + xorl %r9d,%r13d > > + rorl $9,%r14d > > + xorl %r11d,%edi > > + > > + movl %r12d,28(%rsp) > > + xorl %ebx,%r14d > > + andl %r9d,%edi > > + > > + rorl $5,%r13d > > + addl %eax,%r12d > > + xorl %r11d,%edi > > + > > + rorl $11,%r14d > > + xorl %r9d,%r13d > > + addl %edi,%r12d > > + > > + movl %ebx,%edi > > + addl (%rbp),%r12d > > + xorl %ebx,%r14d > > + > > + xorl %ecx,%edi > > + rorl $6,%r13d > > + movl %ecx,%eax > > + > > + andl %edi,%r15d > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %r15d,%eax > > + addl %r12d,%r8d > > + addl %r12d,%eax > > + > > + leaq 20(%rbp),%rbp > > + movl 36(%rsp),%r13d > > + movl 24(%rsp),%r15d > > + > > + movl %r13d,%r12d > > + rorl $11,%r13d > > + addl %r14d,%eax > > + movl %r15d,%r14d > > + rorl $2,%r15d > > + > > + xorl %r12d,%r13d > > + shrl $3,%r12d > > + rorl $7,%r13d > > + xorl %r14d,%r15d > > + shrl $10,%r14d > > + > > + rorl $17,%r15d > > + xorl %r13d,%r12d > > + xorl %r14d,%r15d > > + addl 4(%rsp),%r12d > > + > > + addl 32(%rsp),%r12d > > + movl %r8d,%r13d > > + addl %r15d,%r12d > > + movl %eax,%r14d > > + rorl $14,%r13d > > + movl %r9d,%r15d > > + > > + xorl %r8d,%r13d > > + rorl $9,%r14d > > + xorl %r10d,%r15d > > + > > + movl %r12d,32(%rsp) > > + xorl %eax,%r14d > > + andl %r8d,%r15d > > + > > + rorl $5,%r13d > > + addl %r11d,%r12d > > + xorl %r10d,%r15d > > + > > + rorl $11,%r14d > > + xorl %r8d,%r13d > > + addl %r15d,%r12d > > + > > + movl %eax,%r15d > > + addl (%rbp),%r12d > > + xorl %eax,%r14d > > + > > + xorl %ebx,%r15d > > + rorl $6,%r13d > > + movl %ebx,%r11d > > + > > + andl %r15d,%edi > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %edi,%r11d > > + addl %r12d,%edx > > + addl %r12d,%r11d > > + > > + leaq 4(%rbp),%rbp > > + movl 40(%rsp),%r13d > > + movl 28(%rsp),%edi > > + > > + movl %r13d,%r12d > > + rorl $11,%r13d > > + addl %r14d,%r11d > > + movl %edi,%r14d > > + rorl $2,%edi > > + > > + xorl %r12d,%r13d > > + shrl $3,%r12d > > + rorl $7,%r13d > > + xorl %r14d,%edi > > + shrl $10,%r14d > > + > > + rorl $17,%edi > > + xorl %r13d,%r12d > > + xorl %r14d,%edi > > + addl 8(%rsp),%r12d > > + > > + addl 36(%rsp),%r12d > > + movl %edx,%r13d > > + addl %edi,%r12d > > + movl %r11d,%r14d > > + rorl $14,%r13d > > + movl %r8d,%edi > > + > > + xorl %edx,%r13d > > + rorl $9,%r14d > > + xorl %r9d,%edi > > + > > + movl %r12d,36(%rsp) > > + xorl %r11d,%r14d > > + andl %edx,%edi > > + > > + rorl $5,%r13d > > + addl %r10d,%r12d > > + xorl %r9d,%edi > > + > > + rorl $11,%r14d > > + xorl %edx,%r13d > > + addl %edi,%r12d > > + > > + movl %r11d,%edi > > + addl (%rbp),%r12d > > + xorl %r11d,%r14d > > + > > + xorl %eax,%edi > > + rorl $6,%r13d > > + movl %eax,%r10d > > + > > + andl %edi,%r15d > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %r15d,%r10d > > + addl %r12d,%ecx > > + addl %r12d,%r10d > > + > > + leaq 4(%rbp),%rbp > > + movl 44(%rsp),%r13d > > + movl 
32(%rsp),%r15d > > + > > + movl %r13d,%r12d > > + rorl $11,%r13d > > + addl %r14d,%r10d > > + movl %r15d,%r14d > > + rorl $2,%r15d > > + > > + xorl %r12d,%r13d > > + shrl $3,%r12d > > + rorl $7,%r13d > > + xorl %r14d,%r15d > > + shrl $10,%r14d > > + > > + rorl $17,%r15d > > + xorl %r13d,%r12d > > + xorl %r14d,%r15d > > + addl 12(%rsp),%r12d > > + > > + addl 40(%rsp),%r12d > > + movl %ecx,%r13d > > + addl %r15d,%r12d > > + movl %r10d,%r14d > > + rorl $14,%r13d > > + movl %edx,%r15d > > + > > + xorl %ecx,%r13d > > + rorl $9,%r14d > > + xorl %r8d,%r15d > > + > > + movl %r12d,40(%rsp) > > + xorl %r10d,%r14d > > + andl %ecx,%r15d > > + > > + rorl $5,%r13d > > + addl %r9d,%r12d > > + xorl %r8d,%r15d > > + > > + rorl $11,%r14d > > + xorl %ecx,%r13d > > + addl %r15d,%r12d > > + > > + movl %r10d,%r15d > > + addl (%rbp),%r12d > > + xorl %r10d,%r14d > > + > > + xorl %r11d,%r15d > > + rorl $6,%r13d > > + movl %r11d,%r9d > > + > > + andl %r15d,%edi > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %edi,%r9d > > + addl %r12d,%ebx > > + addl %r12d,%r9d > > + > > + leaq 4(%rbp),%rbp > > + movl 48(%rsp),%r13d > > + movl 36(%rsp),%edi > > + > > + movl %r13d,%r12d > > + rorl $11,%r13d > > + addl %r14d,%r9d > > + movl %edi,%r14d > > + rorl $2,%edi > > + > > + xorl %r12d,%r13d > > + shrl $3,%r12d > > + rorl $7,%r13d > > + xorl %r14d,%edi > > + shrl $10,%r14d > > + > > + rorl $17,%edi > > + xorl %r13d,%r12d > > + xorl %r14d,%edi > > + addl 16(%rsp),%r12d > > + > > + addl 44(%rsp),%r12d > > + movl %ebx,%r13d > > + addl %edi,%r12d > > + movl %r9d,%r14d > > + rorl $14,%r13d > > + movl %ecx,%edi > > + > > + xorl %ebx,%r13d > > + rorl $9,%r14d > > + xorl %edx,%edi > > + > > + movl %r12d,44(%rsp) > > + xorl %r9d,%r14d > > + andl %ebx,%edi > > + > > + rorl $5,%r13d > > + addl %r8d,%r12d > > + xorl %edx,%edi > > + > > + rorl $11,%r14d > > + xorl %ebx,%r13d > > + addl %edi,%r12d > > + > > + movl %r9d,%edi > > + addl (%rbp),%r12d > > + xorl %r9d,%r14d > > + > > + xorl %r10d,%edi > > + rorl $6,%r13d > > + movl %r10d,%r8d > > + > > + andl %edi,%r15d > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %r15d,%r8d > > + addl %r12d,%eax > > + addl %r12d,%r8d > > + > > + leaq 20(%rbp),%rbp > > + movl 52(%rsp),%r13d > > + movl 40(%rsp),%r15d > > + > > + movl %r13d,%r12d > > + rorl $11,%r13d > > + addl %r14d,%r8d > > + movl %r15d,%r14d > > + rorl $2,%r15d > > + > > + xorl %r12d,%r13d > > + shrl $3,%r12d > > + rorl $7,%r13d > > + xorl %r14d,%r15d > > + shrl $10,%r14d > > + > > + rorl $17,%r15d > > + xorl %r13d,%r12d > > + xorl %r14d,%r15d > > + addl 20(%rsp),%r12d > > + > > + addl 48(%rsp),%r12d > > + movl %eax,%r13d > > + addl %r15d,%r12d > > + movl %r8d,%r14d > > + rorl $14,%r13d > > + movl %ebx,%r15d > > + > > + xorl %eax,%r13d > > + rorl $9,%r14d > > + xorl %ecx,%r15d > > + > > + movl %r12d,48(%rsp) > > + xorl %r8d,%r14d > > + andl %eax,%r15d > > + > > + rorl $5,%r13d > > + addl %edx,%r12d > > + xorl %ecx,%r15d > > + > > + rorl $11,%r14d > > + xorl %eax,%r13d > > + addl %r15d,%r12d > > + > > + movl %r8d,%r15d > > + addl (%rbp),%r12d > > + xorl %r8d,%r14d > > + > > + xorl %r9d,%r15d > > + rorl $6,%r13d > > + movl %r9d,%edx > > + > > + andl %r15d,%edi > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %edi,%edx > > + addl %r12d,%r11d > > + addl %r12d,%edx > > + > > + leaq 4(%rbp),%rbp > > + movl 56(%rsp),%r13d > > + movl 44(%rsp),%edi > > + > > + movl %r13d,%r12d > > + rorl $11,%r13d > > + addl %r14d,%edx > > + movl %edi,%r14d > > + rorl $2,%edi > > + > > + xorl %r12d,%r13d > > + shrl $3,%r12d > > + 
rorl $7,%r13d > > + xorl %r14d,%edi > > + shrl $10,%r14d > > + > > + rorl $17,%edi > > + xorl %r13d,%r12d > > + xorl %r14d,%edi > > + addl 24(%rsp),%r12d > > + > > + addl 52(%rsp),%r12d > > + movl %r11d,%r13d > > + addl %edi,%r12d > > + movl %edx,%r14d > > + rorl $14,%r13d > > + movl %eax,%edi > > + > > + xorl %r11d,%r13d > > + rorl $9,%r14d > > + xorl %ebx,%edi > > + > > + movl %r12d,52(%rsp) > > + xorl %edx,%r14d > > + andl %r11d,%edi > > + > > + rorl $5,%r13d > > + addl %ecx,%r12d > > + xorl %ebx,%edi > > + > > + rorl $11,%r14d > > + xorl %r11d,%r13d > > + addl %edi,%r12d > > + > > + movl %edx,%edi > > + addl (%rbp),%r12d > > + xorl %edx,%r14d > > + > > + xorl %r8d,%edi > > + rorl $6,%r13d > > + movl %r8d,%ecx > > + > > + andl %edi,%r15d > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %r15d,%ecx > > + addl %r12d,%r10d > > + addl %r12d,%ecx > > + > > + leaq 4(%rbp),%rbp > > + movl 60(%rsp),%r13d > > + movl 48(%rsp),%r15d > > + > > + movl %r13d,%r12d > > + rorl $11,%r13d > > + addl %r14d,%ecx > > + movl %r15d,%r14d > > + rorl $2,%r15d > > + > > + xorl %r12d,%r13d > > + shrl $3,%r12d > > + rorl $7,%r13d > > + xorl %r14d,%r15d > > + shrl $10,%r14d > > + > > + rorl $17,%r15d > > + xorl %r13d,%r12d > > + xorl %r14d,%r15d > > + addl 28(%rsp),%r12d > > + > > + addl 56(%rsp),%r12d > > + movl %r10d,%r13d > > + addl %r15d,%r12d > > + movl %ecx,%r14d > > + rorl $14,%r13d > > + movl %r11d,%r15d > > + > > + xorl %r10d,%r13d > > + rorl $9,%r14d > > + xorl %eax,%r15d > > + > > + movl %r12d,56(%rsp) > > + xorl %ecx,%r14d > > + andl %r10d,%r15d > > + > > + rorl $5,%r13d > > + addl %ebx,%r12d > > + xorl %eax,%r15d > > + > > + rorl $11,%r14d > > + xorl %r10d,%r13d > > + addl %r15d,%r12d > > + > > + movl %ecx,%r15d > > + addl (%rbp),%r12d > > + xorl %ecx,%r14d > > + > > + xorl %edx,%r15d > > + rorl $6,%r13d > > + movl %edx,%ebx > > + > > + andl %r15d,%edi > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %edi,%ebx > > + addl %r12d,%r9d > > + addl %r12d,%ebx > > + > > + leaq 4(%rbp),%rbp > > + movl 0(%rsp),%r13d > > + movl 52(%rsp),%edi > > + > > + movl %r13d,%r12d > > + rorl $11,%r13d > > + addl %r14d,%ebx > > + movl %edi,%r14d > > + rorl $2,%edi > > + > > + xorl %r12d,%r13d > > + shrl $3,%r12d > > + rorl $7,%r13d > > + xorl %r14d,%edi > > + shrl $10,%r14d > > + > > + rorl $17,%edi > > + xorl %r13d,%r12d > > + xorl %r14d,%edi > > + addl 32(%rsp),%r12d > > + > > + addl 60(%rsp),%r12d > > + movl %r9d,%r13d > > + addl %edi,%r12d > > + movl %ebx,%r14d > > + rorl $14,%r13d > > + movl %r10d,%edi > > + > > + xorl %r9d,%r13d > > + rorl $9,%r14d > > + xorl %r11d,%edi > > + > > + movl %r12d,60(%rsp) > > + xorl %ebx,%r14d > > + andl %r9d,%edi > > + > > + rorl $5,%r13d > > + addl %eax,%r12d > > + xorl %r11d,%edi > > + > > + rorl $11,%r14d > > + xorl %r9d,%r13d > > + addl %edi,%r12d > > + > > + movl %ebx,%edi > > + addl (%rbp),%r12d > > + xorl %ebx,%r14d > > + > > + xorl %ecx,%edi > > + rorl $6,%r13d > > + movl %ecx,%eax > > + > > + andl %edi,%r15d > > + rorl $2,%r14d > > + addl %r13d,%r12d > > + > > + xorl %r15d,%eax > > + addl %r12d,%r8d > > + addl %r12d,%eax > > + > > + leaq 20(%rbp),%rbp > > + cmpb $0,3(%rbp) > > + jnz .Lrounds_16_xx > > + > > + movq 64+0(%rsp),%rdi > > + addl %r14d,%eax > > + leaq 64(%rsi),%rsi > > + > > + addl 0(%rdi),%eax > > + addl 4(%rdi),%ebx > > + addl 8(%rdi),%ecx > > + addl 12(%rdi),%edx > > + addl 16(%rdi),%r8d > > + addl 20(%rdi),%r9d > > + addl 24(%rdi),%r10d > > + addl 28(%rdi),%r11d > > + > > + cmpq 64+16(%rsp),%rsi > > + > > + movl %eax,0(%rdi) > > + movl 
%ebx,4(%rdi) > > + movl %ecx,8(%rdi) > > + movl %edx,12(%rdi) > > + movl %r8d,16(%rdi) > > + movl %r9d,20(%rdi) > > + movl %r10d,24(%rdi) > > + movl %r11d,28(%rdi) > > + jb .Lloop > > + > > + movq 88(%rsp),%rsi > > +.cfi_def_cfa %rsi,8 > > + movq -48(%rsi),%r15 > > +.cfi_restore %r15 > > + movq -40(%rsi),%r14 > > +.cfi_restore %r14 > > + movq -32(%rsi),%r13 > > +.cfi_restore %r13 > > + movq -24(%rsi),%r12 > > +.cfi_restore %r12 > > + movq -16(%rsi),%rbp > > +.cfi_restore %rbp > > + movq -8(%rsi),%rbx > > +.cfi_restore %rbx > > + leaq (%rsi),%rsp > > +.cfi_def_cfa_register %rsp > > +.Lepilogue: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size sha256_block_data_order,.-sha256_block_data_order > > +.align 64 > > +.type K256,@object > > +K256: > > +.long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 > > +.long 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5 > > +.long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 > > +.long 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5 > > +.long 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 > > +.long 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3 > > +.long 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 > > +.long 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174 > > +.long 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc > > +.long 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc > > +.long 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da > > +.long 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da > > +.long 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 > > +.long 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7 > > +.long 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 > > +.long 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967 > > +.long 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 > > +.long 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13 > > +.long 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 > > +.long 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85 > > +.long 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 > > +.long 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3 > > +.long 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 > > +.long 0xd192e819,0xd6990624,0xf40e3585,0x106aa070 > > +.long 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 > > +.long 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5 > > +.long 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 > > +.long 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3 > > +.long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 > > +.long 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208 > > +.long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 > > +.long 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2 > > + > > +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > +.long 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f > > +.long 0x03020100,0x0b0a0908,0xffffffff,0xffffffff > > +.long 0x03020100,0x0b0a0908,0xffffffff,0xffffffff > > +.long 0xffffffff,0xffffffff,0x03020100,0x0b0a0908 > > +.long 0xffffffff,0xffffffff,0x03020100,0x0b0a0908 > > +.byte > > > 83,72,65,50,53,54,32,98,108,111,99,107,32,116,114,97,110,115,102,111,114,1 > > > 09,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83 > , > > > 32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111, > 1 > > 14,103,62,0 > > +.type sha256_block_data_order_shaext,@function > > +.align 64 > > +sha256_block_data_order_shaext: > > +_shaext_shortcut: > > +.cfi_startproc > > + leaq K256+128(%rip),%rcx > > + movdqu (%rdi),%xmm1 > > + movdqu 16(%rdi),%xmm2 > > + movdqa 512-128(%rcx),%xmm7 > > + > > + pshufd $0x1b,%xmm1,%xmm0 > > + pshufd $0xb1,%xmm1,%xmm1 > > + pshufd $0x1b,%xmm2,%xmm2 > > + movdqa %xmm7,%xmm8 > > +.byte 102,15,58,15,202,8 > > + 
punpcklqdq %xmm0,%xmm2 > > + jmp .Loop_shaext > > + > > +.align 16 > > +.Loop_shaext: > > + movdqu (%rsi),%xmm3 > > + movdqu 16(%rsi),%xmm4 > > + movdqu 32(%rsi),%xmm5 > > +.byte 102,15,56,0,223 > > + movdqu 48(%rsi),%xmm6 > > + > > + movdqa 0-128(%rcx),%xmm0 > > + paddd %xmm3,%xmm0 > > +.byte 102,15,56,0,231 > > + movdqa %xmm2,%xmm10 > > +.byte 15,56,203,209 > > + pshufd $0x0e,%xmm0,%xmm0 > > + nop > > + movdqa %xmm1,%xmm9 > > +.byte 15,56,203,202 > > + > > + movdqa 32-128(%rcx),%xmm0 > > + paddd %xmm4,%xmm0 > > +.byte 102,15,56,0,239 > > +.byte 15,56,203,209 > > + pshufd $0x0e,%xmm0,%xmm0 > > + leaq 64(%rsi),%rsi > > +.byte 15,56,204,220 > > +.byte 15,56,203,202 > > + > > + movdqa 64-128(%rcx),%xmm0 > > + paddd %xmm5,%xmm0 > > +.byte 102,15,56,0,247 > > +.byte 15,56,203,209 > > + pshufd $0x0e,%xmm0,%xmm0 > > + movdqa %xmm6,%xmm7 > > +.byte 102,15,58,15,253,4 > > + nop > > + paddd %xmm7,%xmm3 > > +.byte 15,56,204,229 > > +.byte 15,56,203,202 > > + > > + movdqa 96-128(%rcx),%xmm0 > > + paddd %xmm6,%xmm0 > > +.byte 15,56,205,222 > > +.byte 15,56,203,209 > > + pshufd $0x0e,%xmm0,%xmm0 > > + movdqa %xmm3,%xmm7 > > +.byte 102,15,58,15,254,4 > > + nop > > + paddd %xmm7,%xmm4 > > +.byte 15,56,204,238 > > +.byte 15,56,203,202 > > + movdqa 128-128(%rcx),%xmm0 > > + paddd %xmm3,%xmm0 > > +.byte 15,56,205,227 > > +.byte 15,56,203,209 > > + pshufd $0x0e,%xmm0,%xmm0 > > + movdqa %xmm4,%xmm7 > > +.byte 102,15,58,15,251,4 > > + nop > > + paddd %xmm7,%xmm5 > > +.byte 15,56,204,243 > > +.byte 15,56,203,202 > > + movdqa 160-128(%rcx),%xmm0 > > + paddd %xmm4,%xmm0 > > +.byte 15,56,205,236 > > +.byte 15,56,203,209 > > + pshufd $0x0e,%xmm0,%xmm0 > > + movdqa %xmm5,%xmm7 > > +.byte 102,15,58,15,252,4 > > + nop > > + paddd %xmm7,%xmm6 > > +.byte 15,56,204,220 > > +.byte 15,56,203,202 > > + movdqa 192-128(%rcx),%xmm0 > > + paddd %xmm5,%xmm0 > > +.byte 15,56,205,245 > > +.byte 15,56,203,209 > > + pshufd $0x0e,%xmm0,%xmm0 > > + movdqa %xmm6,%xmm7 > > +.byte 102,15,58,15,253,4 > > + nop > > + paddd %xmm7,%xmm3 > > +.byte 15,56,204,229 > > +.byte 15,56,203,202 > > + movdqa 224-128(%rcx),%xmm0 > > + paddd %xmm6,%xmm0 > > +.byte 15,56,205,222 > > +.byte 15,56,203,209 > > + pshufd $0x0e,%xmm0,%xmm0 > > + movdqa %xmm3,%xmm7 > > +.byte 102,15,58,15,254,4 > > + nop > > + paddd %xmm7,%xmm4 > > +.byte 15,56,204,238 > > +.byte 15,56,203,202 > > + movdqa 256-128(%rcx),%xmm0 > > + paddd %xmm3,%xmm0 > > +.byte 15,56,205,227 > > +.byte 15,56,203,209 > > + pshufd $0x0e,%xmm0,%xmm0 > > + movdqa %xmm4,%xmm7 > > +.byte 102,15,58,15,251,4 > > + nop > > + paddd %xmm7,%xmm5 > > +.byte 15,56,204,243 > > +.byte 15,56,203,202 > > + movdqa 288-128(%rcx),%xmm0 > > + paddd %xmm4,%xmm0 > > +.byte 15,56,205,236 > > +.byte 15,56,203,209 > > + pshufd $0x0e,%xmm0,%xmm0 > > + movdqa %xmm5,%xmm7 > > +.byte 102,15,58,15,252,4 > > + nop > > + paddd %xmm7,%xmm6 > > +.byte 15,56,204,220 > > +.byte 15,56,203,202 > > + movdqa 320-128(%rcx),%xmm0 > > + paddd %xmm5,%xmm0 > > +.byte 15,56,205,245 > > +.byte 15,56,203,209 > > + pshufd $0x0e,%xmm0,%xmm0 > > + movdqa %xmm6,%xmm7 > > +.byte 102,15,58,15,253,4 > > + nop > > + paddd %xmm7,%xmm3 > > +.byte 15,56,204,229 > > +.byte 15,56,203,202 > > + movdqa 352-128(%rcx),%xmm0 > > + paddd %xmm6,%xmm0 > > +.byte 15,56,205,222 > > +.byte 15,56,203,209 > > + pshufd $0x0e,%xmm0,%xmm0 > > + movdqa %xmm3,%xmm7 > > +.byte 102,15,58,15,254,4 > > + nop > > + paddd %xmm7,%xmm4 > > +.byte 15,56,204,238 > > +.byte 15,56,203,202 > > + movdqa 384-128(%rcx),%xmm0 > > + paddd %xmm3,%xmm0 > > +.byte 15,56,205,227 > > +.byte 
15,56,203,209 > > + pshufd $0x0e,%xmm0,%xmm0 > > + movdqa %xmm4,%xmm7 > > +.byte 102,15,58,15,251,4 > > + nop > > + paddd %xmm7,%xmm5 > > +.byte 15,56,204,243 > > +.byte 15,56,203,202 > > + movdqa 416-128(%rcx),%xmm0 > > + paddd %xmm4,%xmm0 > > +.byte 15,56,205,236 > > +.byte 15,56,203,209 > > + pshufd $0x0e,%xmm0,%xmm0 > > + movdqa %xmm5,%xmm7 > > +.byte 102,15,58,15,252,4 > > +.byte 15,56,203,202 > > + paddd %xmm7,%xmm6 > > + > > + movdqa 448-128(%rcx),%xmm0 > > + paddd %xmm5,%xmm0 > > +.byte 15,56,203,209 > > + pshufd $0x0e,%xmm0,%xmm0 > > +.byte 15,56,205,245 > > + movdqa %xmm8,%xmm7 > > +.byte 15,56,203,202 > > + > > + movdqa 480-128(%rcx),%xmm0 > > + paddd %xmm6,%xmm0 > > + nop > > +.byte 15,56,203,209 > > + pshufd $0x0e,%xmm0,%xmm0 > > + decq %rdx > > + nop > > +.byte 15,56,203,202 > > + > > + paddd %xmm10,%xmm2 > > + paddd %xmm9,%xmm1 > > + jnz .Loop_shaext > > + > > + pshufd $0xb1,%xmm2,%xmm2 > > + pshufd $0x1b,%xmm1,%xmm7 > > + pshufd $0xb1,%xmm1,%xmm1 > > + punpckhqdq %xmm2,%xmm1 > > +.byte 102,15,58,15,215,8 > > + > > + movdqu %xmm1,(%rdi) > > + movdqu %xmm2,16(%rdi) > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size > sha256_block_data_order_shaext,.-sha256_block_data_order_shaext > > +.type sha256_block_data_order_ssse3,@function > > +.align 64 > > +sha256_block_data_order_ssse3: > > +.cfi_startproc > > +.Lssse3_shortcut: > > + movq %rsp,%rax > > +.cfi_def_cfa_register %rax > > + pushq %rbx > > +.cfi_offset %rbx,-16 > > + pushq %rbp > > +.cfi_offset %rbp,-24 > > + pushq %r12 > > +.cfi_offset %r12,-32 > > + pushq %r13 > > +.cfi_offset %r13,-40 > > + pushq %r14 > > +.cfi_offset %r14,-48 > > + pushq %r15 > > +.cfi_offset %r15,-56 > > + shlq $4,%rdx > > + subq $96,%rsp > > + leaq (%rsi,%rdx,4),%rdx > > + andq $-64,%rsp > > + movq %rdi,64+0(%rsp) > > + movq %rsi,64+8(%rsp) > > + movq %rdx,64+16(%rsp) > > + movq %rax,88(%rsp) > > +.cfi_escape 0x0f,0x06,0x77,0xd8,0x00,0x06,0x23,0x08 > > +.Lprologue_ssse3: > > + > > + movl 0(%rdi),%eax > > + movl 4(%rdi),%ebx > > + movl 8(%rdi),%ecx > > + movl 12(%rdi),%edx > > + movl 16(%rdi),%r8d > > + movl 20(%rdi),%r9d > > + movl 24(%rdi),%r10d > > + movl 28(%rdi),%r11d > > + > > + > > + jmp .Lloop_ssse3 > > +.align 16 > > +.Lloop_ssse3: > > + movdqa K256+512(%rip),%xmm7 > > + movdqu 0(%rsi),%xmm0 > > + movdqu 16(%rsi),%xmm1 > > + movdqu 32(%rsi),%xmm2 > > +.byte 102,15,56,0,199 > > + movdqu 48(%rsi),%xmm3 > > + leaq K256(%rip),%rbp > > +.byte 102,15,56,0,207 > > + movdqa 0(%rbp),%xmm4 > > + movdqa 32(%rbp),%xmm5 > > +.byte 102,15,56,0,215 > > + paddd %xmm0,%xmm4 > > + movdqa 64(%rbp),%xmm6 > > +.byte 102,15,56,0,223 > > + movdqa 96(%rbp),%xmm7 > > + paddd %xmm1,%xmm5 > > + paddd %xmm2,%xmm6 > > + paddd %xmm3,%xmm7 > > + movdqa %xmm4,0(%rsp) > > + movl %eax,%r14d > > + movdqa %xmm5,16(%rsp) > > + movl %ebx,%edi > > + movdqa %xmm6,32(%rsp) > > + xorl %ecx,%edi > > + movdqa %xmm7,48(%rsp) > > + movl %r8d,%r13d > > + jmp .Lssse3_00_47 > > + > > +.align 16 > > +.Lssse3_00_47: > > + subq $-128,%rbp > > + rorl $14,%r13d > > + movdqa %xmm1,%xmm4 > > + movl %r14d,%eax > > + movl %r9d,%r12d > > + movdqa %xmm3,%xmm7 > > + rorl $9,%r14d > > + xorl %r8d,%r13d > > + xorl %r10d,%r12d > > + rorl $5,%r13d > > + xorl %eax,%r14d > > +.byte 102,15,58,15,224,4 > > + andl %r8d,%r12d > > + xorl %r8d,%r13d > > +.byte 102,15,58,15,250,4 > > + addl 0(%rsp),%r11d > > + movl %eax,%r15d > > + xorl %r10d,%r12d > > + rorl $11,%r14d > > + movdqa %xmm4,%xmm5 > > + xorl %ebx,%r15d > > + addl %r12d,%r11d > > + movdqa %xmm4,%xmm6 > > + rorl $6,%r13d > > + andl %r15d,%edi > > + 
psrld $3,%xmm4 > > + xorl %eax,%r14d > > + addl %r13d,%r11d > > + xorl %ebx,%edi > > + paddd %xmm7,%xmm0 > > + rorl $2,%r14d > > + addl %r11d,%edx > > + psrld $7,%xmm6 > > + addl %edi,%r11d > > + movl %edx,%r13d > > + pshufd $250,%xmm3,%xmm7 > > + addl %r11d,%r14d > > + rorl $14,%r13d > > + pslld $14,%xmm5 > > + movl %r14d,%r11d > > + movl %r8d,%r12d > > + pxor %xmm6,%xmm4 > > + rorl $9,%r14d > > + xorl %edx,%r13d > > + xorl %r9d,%r12d > > + rorl $5,%r13d > > + psrld $11,%xmm6 > > + xorl %r11d,%r14d > > + pxor %xmm5,%xmm4 > > + andl %edx,%r12d > > + xorl %edx,%r13d > > + pslld $11,%xmm5 > > + addl 4(%rsp),%r10d > > + movl %r11d,%edi > > + pxor %xmm6,%xmm4 > > + xorl %r9d,%r12d > > + rorl $11,%r14d > > + movdqa %xmm7,%xmm6 > > + xorl %eax,%edi > > + addl %r12d,%r10d > > + pxor %xmm5,%xmm4 > > + rorl $6,%r13d > > + andl %edi,%r15d > > + xorl %r11d,%r14d > > + psrld $10,%xmm7 > > + addl %r13d,%r10d > > + xorl %eax,%r15d > > + paddd %xmm4,%xmm0 > > + rorl $2,%r14d > > + addl %r10d,%ecx > > + psrlq $17,%xmm6 > > + addl %r15d,%r10d > > + movl %ecx,%r13d > > + addl %r10d,%r14d > > + pxor %xmm6,%xmm7 > > + rorl $14,%r13d > > + movl %r14d,%r10d > > + movl %edx,%r12d > > + rorl $9,%r14d > > + psrlq $2,%xmm6 > > + xorl %ecx,%r13d > > + xorl %r8d,%r12d > > + pxor %xmm6,%xmm7 > > + rorl $5,%r13d > > + xorl %r10d,%r14d > > + andl %ecx,%r12d > > + pshufd $128,%xmm7,%xmm7 > > + xorl %ecx,%r13d > > + addl 8(%rsp),%r9d > > + movl %r10d,%r15d > > + psrldq $8,%xmm7 > > + xorl %r8d,%r12d > > + rorl $11,%r14d > > + xorl %r11d,%r15d > > + addl %r12d,%r9d > > + rorl $6,%r13d > > + paddd %xmm7,%xmm0 > > + andl %r15d,%edi > > + xorl %r10d,%r14d > > + addl %r13d,%r9d > > + pshufd $80,%xmm0,%xmm7 > > + xorl %r11d,%edi > > + rorl $2,%r14d > > + addl %r9d,%ebx > > + movdqa %xmm7,%xmm6 > > + addl %edi,%r9d > > + movl %ebx,%r13d > > + psrld $10,%xmm7 > > + addl %r9d,%r14d > > + rorl $14,%r13d > > + psrlq $17,%xmm6 > > + movl %r14d,%r9d > > + movl %ecx,%r12d > > + pxor %xmm6,%xmm7 > > + rorl $9,%r14d > > + xorl %ebx,%r13d > > + xorl %edx,%r12d > > + rorl $5,%r13d > > + xorl %r9d,%r14d > > + psrlq $2,%xmm6 > > + andl %ebx,%r12d > > + xorl %ebx,%r13d > > + addl 12(%rsp),%r8d > > + pxor %xmm6,%xmm7 > > + movl %r9d,%edi > > + xorl %edx,%r12d > > + rorl $11,%r14d > > + pshufd $8,%xmm7,%xmm7 > > + xorl %r10d,%edi > > + addl %r12d,%r8d > > + movdqa 0(%rbp),%xmm6 > > + rorl $6,%r13d > > + andl %edi,%r15d > > + pslldq $8,%xmm7 > > + xorl %r9d,%r14d > > + addl %r13d,%r8d > > + xorl %r10d,%r15d > > + paddd %xmm7,%xmm0 > > + rorl $2,%r14d > > + addl %r8d,%eax > > + addl %r15d,%r8d > > + paddd %xmm0,%xmm6 > > + movl %eax,%r13d > > + addl %r8d,%r14d > > + movdqa %xmm6,0(%rsp) > > + rorl $14,%r13d > > + movdqa %xmm2,%xmm4 > > + movl %r14d,%r8d > > + movl %ebx,%r12d > > + movdqa %xmm0,%xmm7 > > + rorl $9,%r14d > > + xorl %eax,%r13d > > + xorl %ecx,%r12d > > + rorl $5,%r13d > > + xorl %r8d,%r14d > > +.byte 102,15,58,15,225,4 > > + andl %eax,%r12d > > + xorl %eax,%r13d > > +.byte 102,15,58,15,251,4 > > + addl 16(%rsp),%edx > > + movl %r8d,%r15d > > + xorl %ecx,%r12d > > + rorl $11,%r14d > > + movdqa %xmm4,%xmm5 > > + xorl %r9d,%r15d > > + addl %r12d,%edx > > + movdqa %xmm4,%xmm6 > > + rorl $6,%r13d > > + andl %r15d,%edi > > + psrld $3,%xmm4 > > + xorl %r8d,%r14d > > + addl %r13d,%edx > > + xorl %r9d,%edi > > + paddd %xmm7,%xmm1 > > + rorl $2,%r14d > > + addl %edx,%r11d > > + psrld $7,%xmm6 > > + addl %edi,%edx > > + movl %r11d,%r13d > > + pshufd $250,%xmm0,%xmm7 > > + addl %edx,%r14d > > + rorl $14,%r13d > > + pslld $14,%xmm5 > > + movl 
%r14d,%edx > > + movl %eax,%r12d > > + pxor %xmm6,%xmm4 > > + rorl $9,%r14d > > + xorl %r11d,%r13d > > + xorl %ebx,%r12d > > + rorl $5,%r13d > > + psrld $11,%xmm6 > > + xorl %edx,%r14d > > + pxor %xmm5,%xmm4 > > + andl %r11d,%r12d > > + xorl %r11d,%r13d > > + pslld $11,%xmm5 > > + addl 20(%rsp),%ecx > > + movl %edx,%edi > > + pxor %xmm6,%xmm4 > > + xorl %ebx,%r12d > > + rorl $11,%r14d > > + movdqa %xmm7,%xmm6 > > + xorl %r8d,%edi > > + addl %r12d,%ecx > > + pxor %xmm5,%xmm4 > > + rorl $6,%r13d > > + andl %edi,%r15d > > + xorl %edx,%r14d > > + psrld $10,%xmm7 > > + addl %r13d,%ecx > > + xorl %r8d,%r15d > > + paddd %xmm4,%xmm1 > > + rorl $2,%r14d > > + addl %ecx,%r10d > > + psrlq $17,%xmm6 > > + addl %r15d,%ecx > > + movl %r10d,%r13d > > + addl %ecx,%r14d > > + pxor %xmm6,%xmm7 > > + rorl $14,%r13d > > + movl %r14d,%ecx > > + movl %r11d,%r12d > > + rorl $9,%r14d > > + psrlq $2,%xmm6 > > + xorl %r10d,%r13d > > + xorl %eax,%r12d > > + pxor %xmm6,%xmm7 > > + rorl $5,%r13d > > + xorl %ecx,%r14d > > + andl %r10d,%r12d > > + pshufd $128,%xmm7,%xmm7 > > + xorl %r10d,%r13d > > + addl 24(%rsp),%ebx > > + movl %ecx,%r15d > > + psrldq $8,%xmm7 > > + xorl %eax,%r12d > > + rorl $11,%r14d > > + xorl %edx,%r15d > > + addl %r12d,%ebx > > + rorl $6,%r13d > > + paddd %xmm7,%xmm1 > > + andl %r15d,%edi > > + xorl %ecx,%r14d > > + addl %r13d,%ebx > > + pshufd $80,%xmm1,%xmm7 > > + xorl %edx,%edi > > + rorl $2,%r14d > > + addl %ebx,%r9d > > + movdqa %xmm7,%xmm6 > > + addl %edi,%ebx > > + movl %r9d,%r13d > > + psrld $10,%xmm7 > > + addl %ebx,%r14d > > + rorl $14,%r13d > > + psrlq $17,%xmm6 > > + movl %r14d,%ebx > > + movl %r10d,%r12d > > + pxor %xmm6,%xmm7 > > + rorl $9,%r14d > > + xorl %r9d,%r13d > > + xorl %r11d,%r12d > > + rorl $5,%r13d > > + xorl %ebx,%r14d > > + psrlq $2,%xmm6 > > + andl %r9d,%r12d > > + xorl %r9d,%r13d > > + addl 28(%rsp),%eax > > + pxor %xmm6,%xmm7 > > + movl %ebx,%edi > > + xorl %r11d,%r12d > > + rorl $11,%r14d > > + pshufd $8,%xmm7,%xmm7 > > + xorl %ecx,%edi > > + addl %r12d,%eax > > + movdqa 32(%rbp),%xmm6 > > + rorl $6,%r13d > > + andl %edi,%r15d > > + pslldq $8,%xmm7 > > + xorl %ebx,%r14d > > + addl %r13d,%eax > > + xorl %ecx,%r15d > > + paddd %xmm7,%xmm1 > > + rorl $2,%r14d > > + addl %eax,%r8d > > + addl %r15d,%eax > > + paddd %xmm1,%xmm6 > > + movl %r8d,%r13d > > + addl %eax,%r14d > > + movdqa %xmm6,16(%rsp) > > + rorl $14,%r13d > > + movdqa %xmm3,%xmm4 > > + movl %r14d,%eax > > + movl %r9d,%r12d > > + movdqa %xmm1,%xmm7 > > + rorl $9,%r14d > > + xorl %r8d,%r13d > > + xorl %r10d,%r12d > > + rorl $5,%r13d > > + xorl %eax,%r14d > > +.byte 102,15,58,15,226,4 > > + andl %r8d,%r12d > > + xorl %r8d,%r13d > > +.byte 102,15,58,15,248,4 > > + addl 32(%rsp),%r11d > > + movl %eax,%r15d > > + xorl %r10d,%r12d > > + rorl $11,%r14d > > + movdqa %xmm4,%xmm5 > > + xorl %ebx,%r15d > > + addl %r12d,%r11d > > + movdqa %xmm4,%xmm6 > > + rorl $6,%r13d > > + andl %r15d,%edi > > + psrld $3,%xmm4 > > + xorl %eax,%r14d > > + addl %r13d,%r11d > > + xorl %ebx,%edi > > + paddd %xmm7,%xmm2 > > + rorl $2,%r14d > > + addl %r11d,%edx > > + psrld $7,%xmm6 > > + addl %edi,%r11d > > + movl %edx,%r13d > > + pshufd $250,%xmm1,%xmm7 > > + addl %r11d,%r14d > > + rorl $14,%r13d > > + pslld $14,%xmm5 > > + movl %r14d,%r11d > > + movl %r8d,%r12d > > + pxor %xmm6,%xmm4 > > + rorl $9,%r14d > > + xorl %edx,%r13d > > + xorl %r9d,%r12d > > + rorl $5,%r13d > > + psrld $11,%xmm6 > > + xorl %r11d,%r14d > > + pxor %xmm5,%xmm4 > > + andl %edx,%r12d > > + xorl %edx,%r13d > > + pslld $11,%xmm5 > > + addl 36(%rsp),%r10d > > + movl 
%r11d,%edi > > + pxor %xmm6,%xmm4 > > + xorl %r9d,%r12d > > + rorl $11,%r14d > > + movdqa %xmm7,%xmm6 > > + xorl %eax,%edi > > + addl %r12d,%r10d > > + pxor %xmm5,%xmm4 > > + rorl $6,%r13d > > + andl %edi,%r15d > > + xorl %r11d,%r14d > > + psrld $10,%xmm7 > > + addl %r13d,%r10d > > + xorl %eax,%r15d > > + paddd %xmm4,%xmm2 > > + rorl $2,%r14d > > + addl %r10d,%ecx > > + psrlq $17,%xmm6 > > + addl %r15d,%r10d > > + movl %ecx,%r13d > > + addl %r10d,%r14d > > + pxor %xmm6,%xmm7 > > + rorl $14,%r13d > > + movl %r14d,%r10d > > + movl %edx,%r12d > > + rorl $9,%r14d > > + psrlq $2,%xmm6 > > + xorl %ecx,%r13d > > + xorl %r8d,%r12d > > + pxor %xmm6,%xmm7 > > + rorl $5,%r13d > > + xorl %r10d,%r14d > > + andl %ecx,%r12d > > + pshufd $128,%xmm7,%xmm7 > > + xorl %ecx,%r13d > > + addl 40(%rsp),%r9d > > + movl %r10d,%r15d > > + psrldq $8,%xmm7 > > + xorl %r8d,%r12d > > + rorl $11,%r14d > > + xorl %r11d,%r15d > > + addl %r12d,%r9d > > + rorl $6,%r13d > > + paddd %xmm7,%xmm2 > > + andl %r15d,%edi > > + xorl %r10d,%r14d > > + addl %r13d,%r9d > > + pshufd $80,%xmm2,%xmm7 > > + xorl %r11d,%edi > > + rorl $2,%r14d > > + addl %r9d,%ebx > > + movdqa %xmm7,%xmm6 > > + addl %edi,%r9d > > + movl %ebx,%r13d > > + psrld $10,%xmm7 > > + addl %r9d,%r14d > > + rorl $14,%r13d > > + psrlq $17,%xmm6 > > + movl %r14d,%r9d > > + movl %ecx,%r12d > > + pxor %xmm6,%xmm7 > > + rorl $9,%r14d > > + xorl %ebx,%r13d > > + xorl %edx,%r12d > > + rorl $5,%r13d > > + xorl %r9d,%r14d > > + psrlq $2,%xmm6 > > + andl %ebx,%r12d > > + xorl %ebx,%r13d > > + addl 44(%rsp),%r8d > > + pxor %xmm6,%xmm7 > > + movl %r9d,%edi > > + xorl %edx,%r12d > > + rorl $11,%r14d > > + pshufd $8,%xmm7,%xmm7 > > + xorl %r10d,%edi > > + addl %r12d,%r8d > > + movdqa 64(%rbp),%xmm6 > > + rorl $6,%r13d > > + andl %edi,%r15d > > + pslldq $8,%xmm7 > > + xorl %r9d,%r14d > > + addl %r13d,%r8d > > + xorl %r10d,%r15d > > + paddd %xmm7,%xmm2 > > + rorl $2,%r14d > > + addl %r8d,%eax > > + addl %r15d,%r8d > > + paddd %xmm2,%xmm6 > > + movl %eax,%r13d > > + addl %r8d,%r14d > > + movdqa %xmm6,32(%rsp) > > + rorl $14,%r13d > > + movdqa %xmm0,%xmm4 > > + movl %r14d,%r8d > > + movl %ebx,%r12d > > + movdqa %xmm2,%xmm7 > > + rorl $9,%r14d > > + xorl %eax,%r13d > > + xorl %ecx,%r12d > > + rorl $5,%r13d > > + xorl %r8d,%r14d > > +.byte 102,15,58,15,227,4 > > + andl %eax,%r12d > > + xorl %eax,%r13d > > +.byte 102,15,58,15,249,4 > > + addl 48(%rsp),%edx > > + movl %r8d,%r15d > > + xorl %ecx,%r12d > > + rorl $11,%r14d > > + movdqa %xmm4,%xmm5 > > + xorl %r9d,%r15d > > + addl %r12d,%edx > > + movdqa %xmm4,%xmm6 > > + rorl $6,%r13d > > + andl %r15d,%edi > > + psrld $3,%xmm4 > > + xorl %r8d,%r14d > > + addl %r13d,%edx > > + xorl %r9d,%edi > > + paddd %xmm7,%xmm3 > > + rorl $2,%r14d > > + addl %edx,%r11d > > + psrld $7,%xmm6 > > + addl %edi,%edx > > + movl %r11d,%r13d > > + pshufd $250,%xmm2,%xmm7 > > + addl %edx,%r14d > > + rorl $14,%r13d > > + pslld $14,%xmm5 > > + movl %r14d,%edx > > + movl %eax,%r12d > > + pxor %xmm6,%xmm4 > > + rorl $9,%r14d > > + xorl %r11d,%r13d > > + xorl %ebx,%r12d > > + rorl $5,%r13d > > + psrld $11,%xmm6 > > + xorl %edx,%r14d > > + pxor %xmm5,%xmm4 > > + andl %r11d,%r12d > > + xorl %r11d,%r13d > > + pslld $11,%xmm5 > > + addl 52(%rsp),%ecx > > + movl %edx,%edi > > + pxor %xmm6,%xmm4 > > + xorl %ebx,%r12d > > + rorl $11,%r14d > > + movdqa %xmm7,%xmm6 > > + xorl %r8d,%edi > > + addl %r12d,%ecx > > + pxor %xmm5,%xmm4 > > + rorl $6,%r13d > > + andl %edi,%r15d > > + xorl %edx,%r14d > > + psrld $10,%xmm7 > > + addl %r13d,%ecx > > + xorl %r8d,%r15d > > + paddd 
%xmm4,%xmm3 > > + rorl $2,%r14d > > + addl %ecx,%r10d > > + psrlq $17,%xmm6 > > + addl %r15d,%ecx > > + movl %r10d,%r13d > > + addl %ecx,%r14d > > + pxor %xmm6,%xmm7 > > + rorl $14,%r13d > > + movl %r14d,%ecx > > + movl %r11d,%r12d > > + rorl $9,%r14d > > + psrlq $2,%xmm6 > > + xorl %r10d,%r13d > > + xorl %eax,%r12d > > + pxor %xmm6,%xmm7 > > + rorl $5,%r13d > > + xorl %ecx,%r14d > > + andl %r10d,%r12d > > + pshufd $128,%xmm7,%xmm7 > > + xorl %r10d,%r13d > > + addl 56(%rsp),%ebx > > + movl %ecx,%r15d > > + psrldq $8,%xmm7 > > + xorl %eax,%r12d > > + rorl $11,%r14d > > + xorl %edx,%r15d > > + addl %r12d,%ebx > > + rorl $6,%r13d > > + paddd %xmm7,%xmm3 > > + andl %r15d,%edi > > + xorl %ecx,%r14d > > + addl %r13d,%ebx > > + pshufd $80,%xmm3,%xmm7 > > + xorl %edx,%edi > > + rorl $2,%r14d > > + addl %ebx,%r9d > > + movdqa %xmm7,%xmm6 > > + addl %edi,%ebx > > + movl %r9d,%r13d > > + psrld $10,%xmm7 > > + addl %ebx,%r14d > > + rorl $14,%r13d > > + psrlq $17,%xmm6 > > + movl %r14d,%ebx > > + movl %r10d,%r12d > > + pxor %xmm6,%xmm7 > > + rorl $9,%r14d > > + xorl %r9d,%r13d > > + xorl %r11d,%r12d > > + rorl $5,%r13d > > + xorl %ebx,%r14d > > + psrlq $2,%xmm6 > > + andl %r9d,%r12d > > + xorl %r9d,%r13d > > + addl 60(%rsp),%eax > > + pxor %xmm6,%xmm7 > > + movl %ebx,%edi > > + xorl %r11d,%r12d > > + rorl $11,%r14d > > + pshufd $8,%xmm7,%xmm7 > > + xorl %ecx,%edi > > + addl %r12d,%eax > > + movdqa 96(%rbp),%xmm6 > > + rorl $6,%r13d > > + andl %edi,%r15d > > + pslldq $8,%xmm7 > > + xorl %ebx,%r14d > > + addl %r13d,%eax > > + xorl %ecx,%r15d > > + paddd %xmm7,%xmm3 > > + rorl $2,%r14d > > + addl %eax,%r8d > > + addl %r15d,%eax > > + paddd %xmm3,%xmm6 > > + movl %r8d,%r13d > > + addl %eax,%r14d > > + movdqa %xmm6,48(%rsp) > > + cmpb $0,131(%rbp) > > + jne .Lssse3_00_47 > > + rorl $14,%r13d > > + movl %r14d,%eax > > + movl %r9d,%r12d > > + rorl $9,%r14d > > + xorl %r8d,%r13d > > + xorl %r10d,%r12d > > + rorl $5,%r13d > > + xorl %eax,%r14d > > + andl %r8d,%r12d > > + xorl %r8d,%r13d > > + addl 0(%rsp),%r11d > > + movl %eax,%r15d > > + xorl %r10d,%r12d > > + rorl $11,%r14d > > + xorl %ebx,%r15d > > + addl %r12d,%r11d > > + rorl $6,%r13d > > + andl %r15d,%edi > > + xorl %eax,%r14d > > + addl %r13d,%r11d > > + xorl %ebx,%edi > > + rorl $2,%r14d > > + addl %r11d,%edx > > + addl %edi,%r11d > > + movl %edx,%r13d > > + addl %r11d,%r14d > > + rorl $14,%r13d > > + movl %r14d,%r11d > > + movl %r8d,%r12d > > + rorl $9,%r14d > > + xorl %edx,%r13d > > + xorl %r9d,%r12d > > + rorl $5,%r13d > > + xorl %r11d,%r14d > > + andl %edx,%r12d > > + xorl %edx,%r13d > > + addl 4(%rsp),%r10d > > + movl %r11d,%edi > > + xorl %r9d,%r12d > > + rorl $11,%r14d > > + xorl %eax,%edi > > + addl %r12d,%r10d > > + rorl $6,%r13d > > + andl %edi,%r15d > > + xorl %r11d,%r14d > > + addl %r13d,%r10d > > + xorl %eax,%r15d > > + rorl $2,%r14d > > + addl %r10d,%ecx > > + addl %r15d,%r10d > > + movl %ecx,%r13d > > + addl %r10d,%r14d > > + rorl $14,%r13d > > + movl %r14d,%r10d > > + movl %edx,%r12d > > + rorl $9,%r14d > > + xorl %ecx,%r13d > > + xorl %r8d,%r12d > > + rorl $5,%r13d > > + xorl %r10d,%r14d > > + andl %ecx,%r12d > > + xorl %ecx,%r13d > > + addl 8(%rsp),%r9d > > + movl %r10d,%r15d > > + xorl %r8d,%r12d > > + rorl $11,%r14d > > + xorl %r11d,%r15d > > + addl %r12d,%r9d > > + rorl $6,%r13d > > + andl %r15d,%edi > > + xorl %r10d,%r14d > > + addl %r13d,%r9d > > + xorl %r11d,%edi > > + rorl $2,%r14d > > + addl %r9d,%ebx > > + addl %edi,%r9d > > + movl %ebx,%r13d > > + addl %r9d,%r14d > > + rorl $14,%r13d > > + movl %r14d,%r9d > > + movl 
%ecx,%r12d > > + rorl $9,%r14d > > + xorl %ebx,%r13d > > + xorl %edx,%r12d > > + rorl $5,%r13d > > + xorl %r9d,%r14d > > + andl %ebx,%r12d > > + xorl %ebx,%r13d > > + addl 12(%rsp),%r8d > > + movl %r9d,%edi > > + xorl %edx,%r12d > > + rorl $11,%r14d > > + xorl %r10d,%edi > > + addl %r12d,%r8d > > + rorl $6,%r13d > > + andl %edi,%r15d > > + xorl %r9d,%r14d > > + addl %r13d,%r8d > > + xorl %r10d,%r15d > > + rorl $2,%r14d > > + addl %r8d,%eax > > + addl %r15d,%r8d > > + movl %eax,%r13d > > + addl %r8d,%r14d > > + rorl $14,%r13d > > + movl %r14d,%r8d > > + movl %ebx,%r12d > > + rorl $9,%r14d > > + xorl %eax,%r13d > > + xorl %ecx,%r12d > > + rorl $5,%r13d > > + xorl %r8d,%r14d > > + andl %eax,%r12d > > + xorl %eax,%r13d > > + addl 16(%rsp),%edx > > + movl %r8d,%r15d > > + xorl %ecx,%r12d > > + rorl $11,%r14d > > + xorl %r9d,%r15d > > + addl %r12d,%edx > > + rorl $6,%r13d > > + andl %r15d,%edi > > + xorl %r8d,%r14d > > + addl %r13d,%edx > > + xorl %r9d,%edi > > + rorl $2,%r14d > > + addl %edx,%r11d > > + addl %edi,%edx > > + movl %r11d,%r13d > > + addl %edx,%r14d > > + rorl $14,%r13d > > + movl %r14d,%edx > > + movl %eax,%r12d > > + rorl $9,%r14d > > + xorl %r11d,%r13d > > + xorl %ebx,%r12d > > + rorl $5,%r13d > > + xorl %edx,%r14d > > + andl %r11d,%r12d > > + xorl %r11d,%r13d > > + addl 20(%rsp),%ecx > > + movl %edx,%edi > > + xorl %ebx,%r12d > > + rorl $11,%r14d > > + xorl %r8d,%edi > > + addl %r12d,%ecx > > + rorl $6,%r13d > > + andl %edi,%r15d > > + xorl %edx,%r14d > > + addl %r13d,%ecx > > + xorl %r8d,%r15d > > + rorl $2,%r14d > > + addl %ecx,%r10d > > + addl %r15d,%ecx > > + movl %r10d,%r13d > > + addl %ecx,%r14d > > + rorl $14,%r13d > > + movl %r14d,%ecx > > + movl %r11d,%r12d > > + rorl $9,%r14d > > + xorl %r10d,%r13d > > + xorl %eax,%r12d > > + rorl $5,%r13d > > + xorl %ecx,%r14d > > + andl %r10d,%r12d > > + xorl %r10d,%r13d > > + addl 24(%rsp),%ebx > > + movl %ecx,%r15d > > + xorl %eax,%r12d > > + rorl $11,%r14d > > + xorl %edx,%r15d > > + addl %r12d,%ebx > > + rorl $6,%r13d > > + andl %r15d,%edi > > + xorl %ecx,%r14d > > + addl %r13d,%ebx > > + xorl %edx,%edi > > + rorl $2,%r14d > > + addl %ebx,%r9d > > + addl %edi,%ebx > > + movl %r9d,%r13d > > + addl %ebx,%r14d > > + rorl $14,%r13d > > + movl %r14d,%ebx > > + movl %r10d,%r12d > > + rorl $9,%r14d > > + xorl %r9d,%r13d > > + xorl %r11d,%r12d > > + rorl $5,%r13d > > + xorl %ebx,%r14d > > + andl %r9d,%r12d > > + xorl %r9d,%r13d > > + addl 28(%rsp),%eax > > + movl %ebx,%edi > > + xorl %r11d,%r12d > > + rorl $11,%r14d > > + xorl %ecx,%edi > > + addl %r12d,%eax > > + rorl $6,%r13d > > + andl %edi,%r15d > > + xorl %ebx,%r14d > > + addl %r13d,%eax > > + xorl %ecx,%r15d > > + rorl $2,%r14d > > + addl %eax,%r8d > > + addl %r15d,%eax > > + movl %r8d,%r13d > > + addl %eax,%r14d > > + rorl $14,%r13d > > + movl %r14d,%eax > > + movl %r9d,%r12d > > + rorl $9,%r14d > > + xorl %r8d,%r13d > > + xorl %r10d,%r12d > > + rorl $5,%r13d > > + xorl %eax,%r14d > > + andl %r8d,%r12d > > + xorl %r8d,%r13d > > + addl 32(%rsp),%r11d > > + movl %eax,%r15d > > + xorl %r10d,%r12d > > + rorl $11,%r14d > > + xorl %ebx,%r15d > > + addl %r12d,%r11d > > + rorl $6,%r13d > > + andl %r15d,%edi > > + xorl %eax,%r14d > > + addl %r13d,%r11d > > + xorl %ebx,%edi > > + rorl $2,%r14d > > + addl %r11d,%edx > > + addl %edi,%r11d > > + movl %edx,%r13d > > + addl %r11d,%r14d > > + rorl $14,%r13d > > + movl %r14d,%r11d > > + movl %r8d,%r12d > > + rorl $9,%r14d > > + xorl %edx,%r13d > > + xorl %r9d,%r12d > > + rorl $5,%r13d > > + xorl %r11d,%r14d > > + andl %edx,%r12d > > + xorl 
%edx,%r13d > > + addl 36(%rsp),%r10d > > + movl %r11d,%edi > > + xorl %r9d,%r12d > > + rorl $11,%r14d > > + xorl %eax,%edi > > + addl %r12d,%r10d > > + rorl $6,%r13d > > + andl %edi,%r15d > > + xorl %r11d,%r14d > > + addl %r13d,%r10d > > + xorl %eax,%r15d > > + rorl $2,%r14d > > + addl %r10d,%ecx > > + addl %r15d,%r10d > > + movl %ecx,%r13d > > + addl %r10d,%r14d > > + rorl $14,%r13d > > + movl %r14d,%r10d > > + movl %edx,%r12d > > + rorl $9,%r14d > > + xorl %ecx,%r13d > > + xorl %r8d,%r12d > > + rorl $5,%r13d > > + xorl %r10d,%r14d > > + andl %ecx,%r12d > > + xorl %ecx,%r13d > > + addl 40(%rsp),%r9d > > + movl %r10d,%r15d > > + xorl %r8d,%r12d > > + rorl $11,%r14d > > + xorl %r11d,%r15d > > + addl %r12d,%r9d > > + rorl $6,%r13d > > + andl %r15d,%edi > > + xorl %r10d,%r14d > > + addl %r13d,%r9d > > + xorl %r11d,%edi > > + rorl $2,%r14d > > + addl %r9d,%ebx > > + addl %edi,%r9d > > + movl %ebx,%r13d > > + addl %r9d,%r14d > > + rorl $14,%r13d > > + movl %r14d,%r9d > > + movl %ecx,%r12d > > + rorl $9,%r14d > > + xorl %ebx,%r13d > > + xorl %edx,%r12d > > + rorl $5,%r13d > > + xorl %r9d,%r14d > > + andl %ebx,%r12d > > + xorl %ebx,%r13d > > + addl 44(%rsp),%r8d > > + movl %r9d,%edi > > + xorl %edx,%r12d > > + rorl $11,%r14d > > + xorl %r10d,%edi > > + addl %r12d,%r8d > > + rorl $6,%r13d > > + andl %edi,%r15d > > + xorl %r9d,%r14d > > + addl %r13d,%r8d > > + xorl %r10d,%r15d > > + rorl $2,%r14d > > + addl %r8d,%eax > > + addl %r15d,%r8d > > + movl %eax,%r13d > > + addl %r8d,%r14d > > + rorl $14,%r13d > > + movl %r14d,%r8d > > + movl %ebx,%r12d > > + rorl $9,%r14d > > + xorl %eax,%r13d > > + xorl %ecx,%r12d > > + rorl $5,%r13d > > + xorl %r8d,%r14d > > + andl %eax,%r12d > > + xorl %eax,%r13d > > + addl 48(%rsp),%edx > > + movl %r8d,%r15d > > + xorl %ecx,%r12d > > + rorl $11,%r14d > > + xorl %r9d,%r15d > > + addl %r12d,%edx > > + rorl $6,%r13d > > + andl %r15d,%edi > > + xorl %r8d,%r14d > > + addl %r13d,%edx > > + xorl %r9d,%edi > > + rorl $2,%r14d > > + addl %edx,%r11d > > + addl %edi,%edx > > + movl %r11d,%r13d > > + addl %edx,%r14d > > + rorl $14,%r13d > > + movl %r14d,%edx > > + movl %eax,%r12d > > + rorl $9,%r14d > > + xorl %r11d,%r13d > > + xorl %ebx,%r12d > > + rorl $5,%r13d > > + xorl %edx,%r14d > > + andl %r11d,%r12d > > + xorl %r11d,%r13d > > + addl 52(%rsp),%ecx > > + movl %edx,%edi > > + xorl %ebx,%r12d > > + rorl $11,%r14d > > + xorl %r8d,%edi > > + addl %r12d,%ecx > > + rorl $6,%r13d > > + andl %edi,%r15d > > + xorl %edx,%r14d > > + addl %r13d,%ecx > > + xorl %r8d,%r15d > > + rorl $2,%r14d > > + addl %ecx,%r10d > > + addl %r15d,%ecx > > + movl %r10d,%r13d > > + addl %ecx,%r14d > > + rorl $14,%r13d > > + movl %r14d,%ecx > > + movl %r11d,%r12d > > + rorl $9,%r14d > > + xorl %r10d,%r13d > > + xorl %eax,%r12d > > + rorl $5,%r13d > > + xorl %ecx,%r14d > > + andl %r10d,%r12d > > + xorl %r10d,%r13d > > + addl 56(%rsp),%ebx > > + movl %ecx,%r15d > > + xorl %eax,%r12d > > + rorl $11,%r14d > > + xorl %edx,%r15d > > + addl %r12d,%ebx > > + rorl $6,%r13d > > + andl %r15d,%edi > > + xorl %ecx,%r14d > > + addl %r13d,%ebx > > + xorl %edx,%edi > > + rorl $2,%r14d > > + addl %ebx,%r9d > > + addl %edi,%ebx > > + movl %r9d,%r13d > > + addl %ebx,%r14d > > + rorl $14,%r13d > > + movl %r14d,%ebx > > + movl %r10d,%r12d > > + rorl $9,%r14d > > + xorl %r9d,%r13d > > + xorl %r11d,%r12d > > + rorl $5,%r13d > > + xorl %ebx,%r14d > > + andl %r9d,%r12d > > + xorl %r9d,%r13d > > + addl 60(%rsp),%eax > > + movl %ebx,%edi > > + xorl %r11d,%r12d > > + rorl $11,%r14d > > + xorl %ecx,%edi > > + addl %r12d,%eax > > + 
rorl $6,%r13d > > + andl %edi,%r15d > > + xorl %ebx,%r14d > > + addl %r13d,%eax > > + xorl %ecx,%r15d > > + rorl $2,%r14d > > + addl %eax,%r8d > > + addl %r15d,%eax > > + movl %r8d,%r13d > > + addl %eax,%r14d > > + movq 64+0(%rsp),%rdi > > + movl %r14d,%eax > > + > > + addl 0(%rdi),%eax > > + leaq 64(%rsi),%rsi > > + addl 4(%rdi),%ebx > > + addl 8(%rdi),%ecx > > + addl 12(%rdi),%edx > > + addl 16(%rdi),%r8d > > + addl 20(%rdi),%r9d > > + addl 24(%rdi),%r10d > > + addl 28(%rdi),%r11d > > + > > + cmpq 64+16(%rsp),%rsi > > + > > + movl %eax,0(%rdi) > > + movl %ebx,4(%rdi) > > + movl %ecx,8(%rdi) > > + movl %edx,12(%rdi) > > + movl %r8d,16(%rdi) > > + movl %r9d,20(%rdi) > > + movl %r10d,24(%rdi) > > + movl %r11d,28(%rdi) > > + jb .Lloop_ssse3 > > + > > + movq 88(%rsp),%rsi > > +.cfi_def_cfa %rsi,8 > > + movq -48(%rsi),%r15 > > +.cfi_restore %r15 > > + movq -40(%rsi),%r14 > > +.cfi_restore %r14 > > + movq -32(%rsi),%r13 > > +.cfi_restore %r13 > > + movq -24(%rsi),%r12 > > +.cfi_restore %r12 > > + movq -16(%rsi),%rbp > > +.cfi_restore %rbp > > + movq -8(%rsi),%rbx > > +.cfi_restore %rbx > > + leaq (%rsi),%rsp > > +.cfi_def_cfa_register %rsp > > +.Lepilogue_ssse3: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size > sha256_block_data_order_ssse3,.-sha256_block_data_order_ssse3 > > diff --git > a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha512-x86_64.S > > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha512-x86_64.S > > new file mode 100644 > > index 0000000000..11e67e5ba1 > > --- /dev/null > > +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/sha/sha512-x86_64.S > > @@ -0,0 +1,1811 @@ > > +# WARNING: do not edit! > > +# Generated from openssl/crypto/sha/asm/sha512-x86_64.pl > > +# > > +# Copyright 2005-2020 The OpenSSL Project Authors. All Rights Reserved. > > +# > > +# Licensed under the OpenSSL license (the "License"). You may not use > > +# this file except in compliance with the License. 
You can obtain a copy > > +# in the file LICENSE in the source distribution or at > > +# https://www.openssl.org/source/license.html > > + > > +.text > > + > > + > > +.globl sha512_block_data_order > > +.type sha512_block_data_order,@function > > +.align 16 > > +sha512_block_data_order: > > +.cfi_startproc > > + movq %rsp,%rax > > +.cfi_def_cfa_register %rax > > + pushq %rbx > > +.cfi_offset %rbx,-16 > > + pushq %rbp > > +.cfi_offset %rbp,-24 > > + pushq %r12 > > +.cfi_offset %r12,-32 > > + pushq %r13 > > +.cfi_offset %r13,-40 > > + pushq %r14 > > +.cfi_offset %r14,-48 > > + pushq %r15 > > +.cfi_offset %r15,-56 > > + shlq $4,%rdx > > + subq $128+32,%rsp > > + leaq (%rsi,%rdx,8),%rdx > > + andq $-64,%rsp > > + movq %rdi,128+0(%rsp) > > + movq %rsi,128+8(%rsp) > > + movq %rdx,128+16(%rsp) > > + movq %rax,152(%rsp) > > +.cfi_escape 0x0f,0x06,0x77,0x98,0x01,0x06,0x23,0x08 > > +.Lprologue: > > + > > + movq 0(%rdi),%rax > > + movq 8(%rdi),%rbx > > + movq 16(%rdi),%rcx > > + movq 24(%rdi),%rdx > > + movq 32(%rdi),%r8 > > + movq 40(%rdi),%r9 > > + movq 48(%rdi),%r10 > > + movq 56(%rdi),%r11 > > + jmp .Lloop > > + > > +.align 16 > > +.Lloop: > > + movq %rbx,%rdi > > + leaq K512(%rip),%rbp > > + xorq %rcx,%rdi > > + movq 0(%rsi),%r12 > > + movq %r8,%r13 > > + movq %rax,%r14 > > + bswapq %r12 > > + rorq $23,%r13 > > + movq %r9,%r15 > > + > > + xorq %r8,%r13 > > + rorq $5,%r14 > > + xorq %r10,%r15 > > + > > + movq %r12,0(%rsp) > > + xorq %rax,%r14 > > + andq %r8,%r15 > > + > > + rorq $4,%r13 > > + addq %r11,%r12 > > + xorq %r10,%r15 > > + > > + rorq $6,%r14 > > + xorq %r8,%r13 > > + addq %r15,%r12 > > + > > + movq %rax,%r15 > > + addq (%rbp),%r12 > > + xorq %rax,%r14 > > + > > + xorq %rbx,%r15 > > + rorq $14,%r13 > > + movq %rbx,%r11 > > + > > + andq %r15,%rdi > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %rdi,%r11 > > + addq %r12,%rdx > > + addq %r12,%r11 > > + > > + leaq 8(%rbp),%rbp > > + addq %r14,%r11 > > + movq 8(%rsi),%r12 > > + movq %rdx,%r13 > > + movq %r11,%r14 > > + bswapq %r12 > > + rorq $23,%r13 > > + movq %r8,%rdi > > + > > + xorq %rdx,%r13 > > + rorq $5,%r14 > > + xorq %r9,%rdi > > + > > + movq %r12,8(%rsp) > > + xorq %r11,%r14 > > + andq %rdx,%rdi > > + > > + rorq $4,%r13 > > + addq %r10,%r12 > > + xorq %r9,%rdi > > + > > + rorq $6,%r14 > > + xorq %rdx,%r13 > > + addq %rdi,%r12 > > + > > + movq %r11,%rdi > > + addq (%rbp),%r12 > > + xorq %r11,%r14 > > + > > + xorq %rax,%rdi > > + rorq $14,%r13 > > + movq %rax,%r10 > > + > > + andq %rdi,%r15 > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %r15,%r10 > > + addq %r12,%rcx > > + addq %r12,%r10 > > + > > + leaq 24(%rbp),%rbp > > + addq %r14,%r10 > > + movq 16(%rsi),%r12 > > + movq %rcx,%r13 > > + movq %r10,%r14 > > + bswapq %r12 > > + rorq $23,%r13 > > + movq %rdx,%r15 > > + > > + xorq %rcx,%r13 > > + rorq $5,%r14 > > + xorq %r8,%r15 > > + > > + movq %r12,16(%rsp) > > + xorq %r10,%r14 > > + andq %rcx,%r15 > > + > > + rorq $4,%r13 > > + addq %r9,%r12 > > + xorq %r8,%r15 > > + > > + rorq $6,%r14 > > + xorq %rcx,%r13 > > + addq %r15,%r12 > > + > > + movq %r10,%r15 > > + addq (%rbp),%r12 > > + xorq %r10,%r14 > > + > > + xorq %r11,%r15 > > + rorq $14,%r13 > > + movq %r11,%r9 > > + > > + andq %r15,%rdi > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %rdi,%r9 > > + addq %r12,%rbx > > + addq %r12,%r9 > > + > > + leaq 8(%rbp),%rbp > > + addq %r14,%r9 > > + movq 24(%rsi),%r12 > > + movq %rbx,%r13 > > + movq %r9,%r14 > > + bswapq %r12 > > + rorq $23,%r13 > > + movq %rcx,%rdi > > + > > + xorq %rbx,%r13 > > + rorq $5,%r14 
> > + xorq %rdx,%rdi > > + > > + movq %r12,24(%rsp) > > + xorq %r9,%r14 > > + andq %rbx,%rdi > > + > > + rorq $4,%r13 > > + addq %r8,%r12 > > + xorq %rdx,%rdi > > + > > + rorq $6,%r14 > > + xorq %rbx,%r13 > > + addq %rdi,%r12 > > + > > + movq %r9,%rdi > > + addq (%rbp),%r12 > > + xorq %r9,%r14 > > + > > + xorq %r10,%rdi > > + rorq $14,%r13 > > + movq %r10,%r8 > > + > > + andq %rdi,%r15 > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %r15,%r8 > > + addq %r12,%rax > > + addq %r12,%r8 > > + > > + leaq 24(%rbp),%rbp > > + addq %r14,%r8 > > + movq 32(%rsi),%r12 > > + movq %rax,%r13 > > + movq %r8,%r14 > > + bswapq %r12 > > + rorq $23,%r13 > > + movq %rbx,%r15 > > + > > + xorq %rax,%r13 > > + rorq $5,%r14 > > + xorq %rcx,%r15 > > + > > + movq %r12,32(%rsp) > > + xorq %r8,%r14 > > + andq %rax,%r15 > > + > > + rorq $4,%r13 > > + addq %rdx,%r12 > > + xorq %rcx,%r15 > > + > > + rorq $6,%r14 > > + xorq %rax,%r13 > > + addq %r15,%r12 > > + > > + movq %r8,%r15 > > + addq (%rbp),%r12 > > + xorq %r8,%r14 > > + > > + xorq %r9,%r15 > > + rorq $14,%r13 > > + movq %r9,%rdx > > + > > + andq %r15,%rdi > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %rdi,%rdx > > + addq %r12,%r11 > > + addq %r12,%rdx > > + > > + leaq 8(%rbp),%rbp > > + addq %r14,%rdx > > + movq 40(%rsi),%r12 > > + movq %r11,%r13 > > + movq %rdx,%r14 > > + bswapq %r12 > > + rorq $23,%r13 > > + movq %rax,%rdi > > + > > + xorq %r11,%r13 > > + rorq $5,%r14 > > + xorq %rbx,%rdi > > + > > + movq %r12,40(%rsp) > > + xorq %rdx,%r14 > > + andq %r11,%rdi > > + > > + rorq $4,%r13 > > + addq %rcx,%r12 > > + xorq %rbx,%rdi > > + > > + rorq $6,%r14 > > + xorq %r11,%r13 > > + addq %rdi,%r12 > > + > > + movq %rdx,%rdi > > + addq (%rbp),%r12 > > + xorq %rdx,%r14 > > + > > + xorq %r8,%rdi > > + rorq $14,%r13 > > + movq %r8,%rcx > > + > > + andq %rdi,%r15 > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %r15,%rcx > > + addq %r12,%r10 > > + addq %r12,%rcx > > + > > + leaq 24(%rbp),%rbp > > + addq %r14,%rcx > > + movq 48(%rsi),%r12 > > + movq %r10,%r13 > > + movq %rcx,%r14 > > + bswapq %r12 > > + rorq $23,%r13 > > + movq %r11,%r15 > > + > > + xorq %r10,%r13 > > + rorq $5,%r14 > > + xorq %rax,%r15 > > + > > + movq %r12,48(%rsp) > > + xorq %rcx,%r14 > > + andq %r10,%r15 > > + > > + rorq $4,%r13 > > + addq %rbx,%r12 > > + xorq %rax,%r15 > > + > > + rorq $6,%r14 > > + xorq %r10,%r13 > > + addq %r15,%r12 > > + > > + movq %rcx,%r15 > > + addq (%rbp),%r12 > > + xorq %rcx,%r14 > > + > > + xorq %rdx,%r15 > > + rorq $14,%r13 > > + movq %rdx,%rbx > > + > > + andq %r15,%rdi > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %rdi,%rbx > > + addq %r12,%r9 > > + addq %r12,%rbx > > + > > + leaq 8(%rbp),%rbp > > + addq %r14,%rbx > > + movq 56(%rsi),%r12 > > + movq %r9,%r13 > > + movq %rbx,%r14 > > + bswapq %r12 > > + rorq $23,%r13 > > + movq %r10,%rdi > > + > > + xorq %r9,%r13 > > + rorq $5,%r14 > > + xorq %r11,%rdi > > + > > + movq %r12,56(%rsp) > > + xorq %rbx,%r14 > > + andq %r9,%rdi > > + > > + rorq $4,%r13 > > + addq %rax,%r12 > > + xorq %r11,%rdi > > + > > + rorq $6,%r14 > > + xorq %r9,%r13 > > + addq %rdi,%r12 > > + > > + movq %rbx,%rdi > > + addq (%rbp),%r12 > > + xorq %rbx,%r14 > > + > > + xorq %rcx,%rdi > > + rorq $14,%r13 > > + movq %rcx,%rax > > + > > + andq %rdi,%r15 > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %r15,%rax > > + addq %r12,%r8 > > + addq %r12,%rax > > + > > + leaq 24(%rbp),%rbp > > + addq %r14,%rax > > + movq 64(%rsi),%r12 > > + movq %r8,%r13 > > + movq %rax,%r14 > > + bswapq %r12 > > + rorq $23,%r13 > > + 
movq %r9,%r15 > > + > > + xorq %r8,%r13 > > + rorq $5,%r14 > > + xorq %r10,%r15 > > + > > + movq %r12,64(%rsp) > > + xorq %rax,%r14 > > + andq %r8,%r15 > > + > > + rorq $4,%r13 > > + addq %r11,%r12 > > + xorq %r10,%r15 > > + > > + rorq $6,%r14 > > + xorq %r8,%r13 > > + addq %r15,%r12 > > + > > + movq %rax,%r15 > > + addq (%rbp),%r12 > > + xorq %rax,%r14 > > + > > + xorq %rbx,%r15 > > + rorq $14,%r13 > > + movq %rbx,%r11 > > + > > + andq %r15,%rdi > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %rdi,%r11 > > + addq %r12,%rdx > > + addq %r12,%r11 > > + > > + leaq 8(%rbp),%rbp > > + addq %r14,%r11 > > + movq 72(%rsi),%r12 > > + movq %rdx,%r13 > > + movq %r11,%r14 > > + bswapq %r12 > > + rorq $23,%r13 > > + movq %r8,%rdi > > + > > + xorq %rdx,%r13 > > + rorq $5,%r14 > > + xorq %r9,%rdi > > + > > + movq %r12,72(%rsp) > > + xorq %r11,%r14 > > + andq %rdx,%rdi > > + > > + rorq $4,%r13 > > + addq %r10,%r12 > > + xorq %r9,%rdi > > + > > + rorq $6,%r14 > > + xorq %rdx,%r13 > > + addq %rdi,%r12 > > + > > + movq %r11,%rdi > > + addq (%rbp),%r12 > > + xorq %r11,%r14 > > + > > + xorq %rax,%rdi > > + rorq $14,%r13 > > + movq %rax,%r10 > > + > > + andq %rdi,%r15 > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %r15,%r10 > > + addq %r12,%rcx > > + addq %r12,%r10 > > + > > + leaq 24(%rbp),%rbp > > + addq %r14,%r10 > > + movq 80(%rsi),%r12 > > + movq %rcx,%r13 > > + movq %r10,%r14 > > + bswapq %r12 > > + rorq $23,%r13 > > + movq %rdx,%r15 > > + > > + xorq %rcx,%r13 > > + rorq $5,%r14 > > + xorq %r8,%r15 > > + > > + movq %r12,80(%rsp) > > + xorq %r10,%r14 > > + andq %rcx,%r15 > > + > > + rorq $4,%r13 > > + addq %r9,%r12 > > + xorq %r8,%r15 > > + > > + rorq $6,%r14 > > + xorq %rcx,%r13 > > + addq %r15,%r12 > > + > > + movq %r10,%r15 > > + addq (%rbp),%r12 > > + xorq %r10,%r14 > > + > > + xorq %r11,%r15 > > + rorq $14,%r13 > > + movq %r11,%r9 > > + > > + andq %r15,%rdi > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %rdi,%r9 > > + addq %r12,%rbx > > + addq %r12,%r9 > > + > > + leaq 8(%rbp),%rbp > > + addq %r14,%r9 > > + movq 88(%rsi),%r12 > > + movq %rbx,%r13 > > + movq %r9,%r14 > > + bswapq %r12 > > + rorq $23,%r13 > > + movq %rcx,%rdi > > + > > + xorq %rbx,%r13 > > + rorq $5,%r14 > > + xorq %rdx,%rdi > > + > > + movq %r12,88(%rsp) > > + xorq %r9,%r14 > > + andq %rbx,%rdi > > + > > + rorq $4,%r13 > > + addq %r8,%r12 > > + xorq %rdx,%rdi > > + > > + rorq $6,%r14 > > + xorq %rbx,%r13 > > + addq %rdi,%r12 > > + > > + movq %r9,%rdi > > + addq (%rbp),%r12 > > + xorq %r9,%r14 > > + > > + xorq %r10,%rdi > > + rorq $14,%r13 > > + movq %r10,%r8 > > + > > + andq %rdi,%r15 > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %r15,%r8 > > + addq %r12,%rax > > + addq %r12,%r8 > > + > > + leaq 24(%rbp),%rbp > > + addq %r14,%r8 > > + movq 96(%rsi),%r12 > > + movq %rax,%r13 > > + movq %r8,%r14 > > + bswapq %r12 > > + rorq $23,%r13 > > + movq %rbx,%r15 > > + > > + xorq %rax,%r13 > > + rorq $5,%r14 > > + xorq %rcx,%r15 > > + > > + movq %r12,96(%rsp) > > + xorq %r8,%r14 > > + andq %rax,%r15 > > + > > + rorq $4,%r13 > > + addq %rdx,%r12 > > + xorq %rcx,%r15 > > + > > + rorq $6,%r14 > > + xorq %rax,%r13 > > + addq %r15,%r12 > > + > > + movq %r8,%r15 > > + addq (%rbp),%r12 > > + xorq %r8,%r14 > > + > > + xorq %r9,%r15 > > + rorq $14,%r13 > > + movq %r9,%rdx > > + > > + andq %r15,%rdi > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %rdi,%rdx > > + addq %r12,%r11 > > + addq %r12,%rdx > > + > > + leaq 8(%rbp),%rbp > > + addq %r14,%rdx > > + movq 104(%rsi),%r12 > > + movq %r11,%r13 > > + movq 
%rdx,%r14 > > + bswapq %r12 > > + rorq $23,%r13 > > + movq %rax,%rdi > > + > > + xorq %r11,%r13 > > + rorq $5,%r14 > > + xorq %rbx,%rdi > > + > > + movq %r12,104(%rsp) > > + xorq %rdx,%r14 > > + andq %r11,%rdi > > + > > + rorq $4,%r13 > > + addq %rcx,%r12 > > + xorq %rbx,%rdi > > + > > + rorq $6,%r14 > > + xorq %r11,%r13 > > + addq %rdi,%r12 > > + > > + movq %rdx,%rdi > > + addq (%rbp),%r12 > > + xorq %rdx,%r14 > > + > > + xorq %r8,%rdi > > + rorq $14,%r13 > > + movq %r8,%rcx > > + > > + andq %rdi,%r15 > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %r15,%rcx > > + addq %r12,%r10 > > + addq %r12,%rcx > > + > > + leaq 24(%rbp),%rbp > > + addq %r14,%rcx > > + movq 112(%rsi),%r12 > > + movq %r10,%r13 > > + movq %rcx,%r14 > > + bswapq %r12 > > + rorq $23,%r13 > > + movq %r11,%r15 > > + > > + xorq %r10,%r13 > > + rorq $5,%r14 > > + xorq %rax,%r15 > > + > > + movq %r12,112(%rsp) > > + xorq %rcx,%r14 > > + andq %r10,%r15 > > + > > + rorq $4,%r13 > > + addq %rbx,%r12 > > + xorq %rax,%r15 > > + > > + rorq $6,%r14 > > + xorq %r10,%r13 > > + addq %r15,%r12 > > + > > + movq %rcx,%r15 > > + addq (%rbp),%r12 > > + xorq %rcx,%r14 > > + > > + xorq %rdx,%r15 > > + rorq $14,%r13 > > + movq %rdx,%rbx > > + > > + andq %r15,%rdi > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %rdi,%rbx > > + addq %r12,%r9 > > + addq %r12,%rbx > > + > > + leaq 8(%rbp),%rbp > > + addq %r14,%rbx > > + movq 120(%rsi),%r12 > > + movq %r9,%r13 > > + movq %rbx,%r14 > > + bswapq %r12 > > + rorq $23,%r13 > > + movq %r10,%rdi > > + > > + xorq %r9,%r13 > > + rorq $5,%r14 > > + xorq %r11,%rdi > > + > > + movq %r12,120(%rsp) > > + xorq %rbx,%r14 > > + andq %r9,%rdi > > + > > + rorq $4,%r13 > > + addq %rax,%r12 > > + xorq %r11,%rdi > > + > > + rorq $6,%r14 > > + xorq %r9,%r13 > > + addq %rdi,%r12 > > + > > + movq %rbx,%rdi > > + addq (%rbp),%r12 > > + xorq %rbx,%r14 > > + > > + xorq %rcx,%rdi > > + rorq $14,%r13 > > + movq %rcx,%rax > > + > > + andq %rdi,%r15 > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %r15,%rax > > + addq %r12,%r8 > > + addq %r12,%rax > > + > > + leaq 24(%rbp),%rbp > > + jmp .Lrounds_16_xx > > +.align 16 > > +.Lrounds_16_xx: > > + movq 8(%rsp),%r13 > > + movq 112(%rsp),%r15 > > + > > + movq %r13,%r12 > > + rorq $7,%r13 > > + addq %r14,%rax > > + movq %r15,%r14 > > + rorq $42,%r15 > > + > > + xorq %r12,%r13 > > + shrq $7,%r12 > > + rorq $1,%r13 > > + xorq %r14,%r15 > > + shrq $6,%r14 > > + > > + rorq $19,%r15 > > + xorq %r13,%r12 > > + xorq %r14,%r15 > > + addq 72(%rsp),%r12 > > + > > + addq 0(%rsp),%r12 > > + movq %r8,%r13 > > + addq %r15,%r12 > > + movq %rax,%r14 > > + rorq $23,%r13 > > + movq %r9,%r15 > > + > > + xorq %r8,%r13 > > + rorq $5,%r14 > > + xorq %r10,%r15 > > + > > + movq %r12,0(%rsp) > > + xorq %rax,%r14 > > + andq %r8,%r15 > > + > > + rorq $4,%r13 > > + addq %r11,%r12 > > + xorq %r10,%r15 > > + > > + rorq $6,%r14 > > + xorq %r8,%r13 > > + addq %r15,%r12 > > + > > + movq %rax,%r15 > > + addq (%rbp),%r12 > > + xorq %rax,%r14 > > + > > + xorq %rbx,%r15 > > + rorq $14,%r13 > > + movq %rbx,%r11 > > + > > + andq %r15,%rdi > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %rdi,%r11 > > + addq %r12,%rdx > > + addq %r12,%r11 > > + > > + leaq 8(%rbp),%rbp > > + movq 16(%rsp),%r13 > > + movq 120(%rsp),%rdi > > + > > + movq %r13,%r12 > > + rorq $7,%r13 > > + addq %r14,%r11 > > + movq %rdi,%r14 > > + rorq $42,%rdi > > + > > + xorq %r12,%r13 > > + shrq $7,%r12 > > + rorq $1,%r13 > > + xorq %r14,%rdi > > + shrq $6,%r14 > > + > > + rorq $19,%rdi > > + xorq %r13,%r12 > > + xorq 
%r14,%rdi > > + addq 80(%rsp),%r12 > > + > > + addq 8(%rsp),%r12 > > + movq %rdx,%r13 > > + addq %rdi,%r12 > > + movq %r11,%r14 > > + rorq $23,%r13 > > + movq %r8,%rdi > > + > > + xorq %rdx,%r13 > > + rorq $5,%r14 > > + xorq %r9,%rdi > > + > > + movq %r12,8(%rsp) > > + xorq %r11,%r14 > > + andq %rdx,%rdi > > + > > + rorq $4,%r13 > > + addq %r10,%r12 > > + xorq %r9,%rdi > > + > > + rorq $6,%r14 > > + xorq %rdx,%r13 > > + addq %rdi,%r12 > > + > > + movq %r11,%rdi > > + addq (%rbp),%r12 > > + xorq %r11,%r14 > > + > > + xorq %rax,%rdi > > + rorq $14,%r13 > > + movq %rax,%r10 > > + > > + andq %rdi,%r15 > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %r15,%r10 > > + addq %r12,%rcx > > + addq %r12,%r10 > > + > > + leaq 24(%rbp),%rbp > > + movq 24(%rsp),%r13 > > + movq 0(%rsp),%r15 > > + > > + movq %r13,%r12 > > + rorq $7,%r13 > > + addq %r14,%r10 > > + movq %r15,%r14 > > + rorq $42,%r15 > > + > > + xorq %r12,%r13 > > + shrq $7,%r12 > > + rorq $1,%r13 > > + xorq %r14,%r15 > > + shrq $6,%r14 > > + > > + rorq $19,%r15 > > + xorq %r13,%r12 > > + xorq %r14,%r15 > > + addq 88(%rsp),%r12 > > + > > + addq 16(%rsp),%r12 > > + movq %rcx,%r13 > > + addq %r15,%r12 > > + movq %r10,%r14 > > + rorq $23,%r13 > > + movq %rdx,%r15 > > + > > + xorq %rcx,%r13 > > + rorq $5,%r14 > > + xorq %r8,%r15 > > + > > + movq %r12,16(%rsp) > > + xorq %r10,%r14 > > + andq %rcx,%r15 > > + > > + rorq $4,%r13 > > + addq %r9,%r12 > > + xorq %r8,%r15 > > + > > + rorq $6,%r14 > > + xorq %rcx,%r13 > > + addq %r15,%r12 > > + > > + movq %r10,%r15 > > + addq (%rbp),%r12 > > + xorq %r10,%r14 > > + > > + xorq %r11,%r15 > > + rorq $14,%r13 > > + movq %r11,%r9 > > + > > + andq %r15,%rdi > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %rdi,%r9 > > + addq %r12,%rbx > > + addq %r12,%r9 > > + > > + leaq 8(%rbp),%rbp > > + movq 32(%rsp),%r13 > > + movq 8(%rsp),%rdi > > + > > + movq %r13,%r12 > > + rorq $7,%r13 > > + addq %r14,%r9 > > + movq %rdi,%r14 > > + rorq $42,%rdi > > + > > + xorq %r12,%r13 > > + shrq $7,%r12 > > + rorq $1,%r13 > > + xorq %r14,%rdi > > + shrq $6,%r14 > > + > > + rorq $19,%rdi > > + xorq %r13,%r12 > > + xorq %r14,%rdi > > + addq 96(%rsp),%r12 > > + > > + addq 24(%rsp),%r12 > > + movq %rbx,%r13 > > + addq %rdi,%r12 > > + movq %r9,%r14 > > + rorq $23,%r13 > > + movq %rcx,%rdi > > + > > + xorq %rbx,%r13 > > + rorq $5,%r14 > > + xorq %rdx,%rdi > > + > > + movq %r12,24(%rsp) > > + xorq %r9,%r14 > > + andq %rbx,%rdi > > + > > + rorq $4,%r13 > > + addq %r8,%r12 > > + xorq %rdx,%rdi > > + > > + rorq $6,%r14 > > + xorq %rbx,%r13 > > + addq %rdi,%r12 > > + > > + movq %r9,%rdi > > + addq (%rbp),%r12 > > + xorq %r9,%r14 > > + > > + xorq %r10,%rdi > > + rorq $14,%r13 > > + movq %r10,%r8 > > + > > + andq %rdi,%r15 > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %r15,%r8 > > + addq %r12,%rax > > + addq %r12,%r8 > > + > > + leaq 24(%rbp),%rbp > > + movq 40(%rsp),%r13 > > + movq 16(%rsp),%r15 > > + > > + movq %r13,%r12 > > + rorq $7,%r13 > > + addq %r14,%r8 > > + movq %r15,%r14 > > + rorq $42,%r15 > > + > > + xorq %r12,%r13 > > + shrq $7,%r12 > > + rorq $1,%r13 > > + xorq %r14,%r15 > > + shrq $6,%r14 > > + > > + rorq $19,%r15 > > + xorq %r13,%r12 > > + xorq %r14,%r15 > > + addq 104(%rsp),%r12 > > + > > + addq 32(%rsp),%r12 > > + movq %rax,%r13 > > + addq %r15,%r12 > > + movq %r8,%r14 > > + rorq $23,%r13 > > + movq %rbx,%r15 > > + > > + xorq %rax,%r13 > > + rorq $5,%r14 > > + xorq %rcx,%r15 > > + > > + movq %r12,32(%rsp) > > + xorq %r8,%r14 > > + andq %rax,%r15 > > + > > + rorq $4,%r13 > > + addq %rdx,%r12 > > + 
xorq %rcx,%r15 > > + > > + rorq $6,%r14 > > + xorq %rax,%r13 > > + addq %r15,%r12 > > + > > + movq %r8,%r15 > > + addq (%rbp),%r12 > > + xorq %r8,%r14 > > + > > + xorq %r9,%r15 > > + rorq $14,%r13 > > + movq %r9,%rdx > > + > > + andq %r15,%rdi > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %rdi,%rdx > > + addq %r12,%r11 > > + addq %r12,%rdx > > + > > + leaq 8(%rbp),%rbp > > + movq 48(%rsp),%r13 > > + movq 24(%rsp),%rdi > > + > > + movq %r13,%r12 > > + rorq $7,%r13 > > + addq %r14,%rdx > > + movq %rdi,%r14 > > + rorq $42,%rdi > > + > > + xorq %r12,%r13 > > + shrq $7,%r12 > > + rorq $1,%r13 > > + xorq %r14,%rdi > > + shrq $6,%r14 > > + > > + rorq $19,%rdi > > + xorq %r13,%r12 > > + xorq %r14,%rdi > > + addq 112(%rsp),%r12 > > + > > + addq 40(%rsp),%r12 > > + movq %r11,%r13 > > + addq %rdi,%r12 > > + movq %rdx,%r14 > > + rorq $23,%r13 > > + movq %rax,%rdi > > + > > + xorq %r11,%r13 > > + rorq $5,%r14 > > + xorq %rbx,%rdi > > + > > + movq %r12,40(%rsp) > > + xorq %rdx,%r14 > > + andq %r11,%rdi > > + > > + rorq $4,%r13 > > + addq %rcx,%r12 > > + xorq %rbx,%rdi > > + > > + rorq $6,%r14 > > + xorq %r11,%r13 > > + addq %rdi,%r12 > > + > > + movq %rdx,%rdi > > + addq (%rbp),%r12 > > + xorq %rdx,%r14 > > + > > + xorq %r8,%rdi > > + rorq $14,%r13 > > + movq %r8,%rcx > > + > > + andq %rdi,%r15 > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %r15,%rcx > > + addq %r12,%r10 > > + addq %r12,%rcx > > + > > + leaq 24(%rbp),%rbp > > + movq 56(%rsp),%r13 > > + movq 32(%rsp),%r15 > > + > > + movq %r13,%r12 > > + rorq $7,%r13 > > + addq %r14,%rcx > > + movq %r15,%r14 > > + rorq $42,%r15 > > + > > + xorq %r12,%r13 > > + shrq $7,%r12 > > + rorq $1,%r13 > > + xorq %r14,%r15 > > + shrq $6,%r14 > > + > > + rorq $19,%r15 > > + xorq %r13,%r12 > > + xorq %r14,%r15 > > + addq 120(%rsp),%r12 > > + > > + addq 48(%rsp),%r12 > > + movq %r10,%r13 > > + addq %r15,%r12 > > + movq %rcx,%r14 > > + rorq $23,%r13 > > + movq %r11,%r15 > > + > > + xorq %r10,%r13 > > + rorq $5,%r14 > > + xorq %rax,%r15 > > + > > + movq %r12,48(%rsp) > > + xorq %rcx,%r14 > > + andq %r10,%r15 > > + > > + rorq $4,%r13 > > + addq %rbx,%r12 > > + xorq %rax,%r15 > > + > > + rorq $6,%r14 > > + xorq %r10,%r13 > > + addq %r15,%r12 > > + > > + movq %rcx,%r15 > > + addq (%rbp),%r12 > > + xorq %rcx,%r14 > > + > > + xorq %rdx,%r15 > > + rorq $14,%r13 > > + movq %rdx,%rbx > > + > > + andq %r15,%rdi > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %rdi,%rbx > > + addq %r12,%r9 > > + addq %r12,%rbx > > + > > + leaq 8(%rbp),%rbp > > + movq 64(%rsp),%r13 > > + movq 40(%rsp),%rdi > > + > > + movq %r13,%r12 > > + rorq $7,%r13 > > + addq %r14,%rbx > > + movq %rdi,%r14 > > + rorq $42,%rdi > > + > > + xorq %r12,%r13 > > + shrq $7,%r12 > > + rorq $1,%r13 > > + xorq %r14,%rdi > > + shrq $6,%r14 > > + > > + rorq $19,%rdi > > + xorq %r13,%r12 > > + xorq %r14,%rdi > > + addq 0(%rsp),%r12 > > + > > + addq 56(%rsp),%r12 > > + movq %r9,%r13 > > + addq %rdi,%r12 > > + movq %rbx,%r14 > > + rorq $23,%r13 > > + movq %r10,%rdi > > + > > + xorq %r9,%r13 > > + rorq $5,%r14 > > + xorq %r11,%rdi > > + > > + movq %r12,56(%rsp) > > + xorq %rbx,%r14 > > + andq %r9,%rdi > > + > > + rorq $4,%r13 > > + addq %rax,%r12 > > + xorq %r11,%rdi > > + > > + rorq $6,%r14 > > + xorq %r9,%r13 > > + addq %rdi,%r12 > > + > > + movq %rbx,%rdi > > + addq (%rbp),%r12 > > + xorq %rbx,%r14 > > + > > + xorq %rcx,%rdi > > + rorq $14,%r13 > > + movq %rcx,%rax > > + > > + andq %rdi,%r15 > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %r15,%rax > > + addq %r12,%r8 > > + addq 
%r12,%rax > > + > > + leaq 24(%rbp),%rbp > > + movq 72(%rsp),%r13 > > + movq 48(%rsp),%r15 > > + > > + movq %r13,%r12 > > + rorq $7,%r13 > > + addq %r14,%rax > > + movq %r15,%r14 > > + rorq $42,%r15 > > + > > + xorq %r12,%r13 > > + shrq $7,%r12 > > + rorq $1,%r13 > > + xorq %r14,%r15 > > + shrq $6,%r14 > > + > > + rorq $19,%r15 > > + xorq %r13,%r12 > > + xorq %r14,%r15 > > + addq 8(%rsp),%r12 > > + > > + addq 64(%rsp),%r12 > > + movq %r8,%r13 > > + addq %r15,%r12 > > + movq %rax,%r14 > > + rorq $23,%r13 > > + movq %r9,%r15 > > + > > + xorq %r8,%r13 > > + rorq $5,%r14 > > + xorq %r10,%r15 > > + > > + movq %r12,64(%rsp) > > + xorq %rax,%r14 > > + andq %r8,%r15 > > + > > + rorq $4,%r13 > > + addq %r11,%r12 > > + xorq %r10,%r15 > > + > > + rorq $6,%r14 > > + xorq %r8,%r13 > > + addq %r15,%r12 > > + > > + movq %rax,%r15 > > + addq (%rbp),%r12 > > + xorq %rax,%r14 > > + > > + xorq %rbx,%r15 > > + rorq $14,%r13 > > + movq %rbx,%r11 > > + > > + andq %r15,%rdi > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %rdi,%r11 > > + addq %r12,%rdx > > + addq %r12,%r11 > > + > > + leaq 8(%rbp),%rbp > > + movq 80(%rsp),%r13 > > + movq 56(%rsp),%rdi > > + > > + movq %r13,%r12 > > + rorq $7,%r13 > > + addq %r14,%r11 > > + movq %rdi,%r14 > > + rorq $42,%rdi > > + > > + xorq %r12,%r13 > > + shrq $7,%r12 > > + rorq $1,%r13 > > + xorq %r14,%rdi > > + shrq $6,%r14 > > + > > + rorq $19,%rdi > > + xorq %r13,%r12 > > + xorq %r14,%rdi > > + addq 16(%rsp),%r12 > > + > > + addq 72(%rsp),%r12 > > + movq %rdx,%r13 > > + addq %rdi,%r12 > > + movq %r11,%r14 > > + rorq $23,%r13 > > + movq %r8,%rdi > > + > > + xorq %rdx,%r13 > > + rorq $5,%r14 > > + xorq %r9,%rdi > > + > > + movq %r12,72(%rsp) > > + xorq %r11,%r14 > > + andq %rdx,%rdi > > + > > + rorq $4,%r13 > > + addq %r10,%r12 > > + xorq %r9,%rdi > > + > > + rorq $6,%r14 > > + xorq %rdx,%r13 > > + addq %rdi,%r12 > > + > > + movq %r11,%rdi > > + addq (%rbp),%r12 > > + xorq %r11,%r14 > > + > > + xorq %rax,%rdi > > + rorq $14,%r13 > > + movq %rax,%r10 > > + > > + andq %rdi,%r15 > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %r15,%r10 > > + addq %r12,%rcx > > + addq %r12,%r10 > > + > > + leaq 24(%rbp),%rbp > > + movq 88(%rsp),%r13 > > + movq 64(%rsp),%r15 > > + > > + movq %r13,%r12 > > + rorq $7,%r13 > > + addq %r14,%r10 > > + movq %r15,%r14 > > + rorq $42,%r15 > > + > > + xorq %r12,%r13 > > + shrq $7,%r12 > > + rorq $1,%r13 > > + xorq %r14,%r15 > > + shrq $6,%r14 > > + > > + rorq $19,%r15 > > + xorq %r13,%r12 > > + xorq %r14,%r15 > > + addq 24(%rsp),%r12 > > + > > + addq 80(%rsp),%r12 > > + movq %rcx,%r13 > > + addq %r15,%r12 > > + movq %r10,%r14 > > + rorq $23,%r13 > > + movq %rdx,%r15 > > + > > + xorq %rcx,%r13 > > + rorq $5,%r14 > > + xorq %r8,%r15 > > + > > + movq %r12,80(%rsp) > > + xorq %r10,%r14 > > + andq %rcx,%r15 > > + > > + rorq $4,%r13 > > + addq %r9,%r12 > > + xorq %r8,%r15 > > + > > + rorq $6,%r14 > > + xorq %rcx,%r13 > > + addq %r15,%r12 > > + > > + movq %r10,%r15 > > + addq (%rbp),%r12 > > + xorq %r10,%r14 > > + > > + xorq %r11,%r15 > > + rorq $14,%r13 > > + movq %r11,%r9 > > + > > + andq %r15,%rdi > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %rdi,%r9 > > + addq %r12,%rbx > > + addq %r12,%r9 > > + > > + leaq 8(%rbp),%rbp > > + movq 96(%rsp),%r13 > > + movq 72(%rsp),%rdi > > + > > + movq %r13,%r12 > > + rorq $7,%r13 > > + addq %r14,%r9 > > + movq %rdi,%r14 > > + rorq $42,%rdi > > + > > + xorq %r12,%r13 > > + shrq $7,%r12 > > + rorq $1,%r13 > > + xorq %r14,%rdi > > + shrq $6,%r14 > > + > > + rorq $19,%rdi > > + xorq %r13,%r12 > > + 
xorq %r14,%rdi > > + addq 32(%rsp),%r12 > > + > > + addq 88(%rsp),%r12 > > + movq %rbx,%r13 > > + addq %rdi,%r12 > > + movq %r9,%r14 > > + rorq $23,%r13 > > + movq %rcx,%rdi > > + > > + xorq %rbx,%r13 > > + rorq $5,%r14 > > + xorq %rdx,%rdi > > + > > + movq %r12,88(%rsp) > > + xorq %r9,%r14 > > + andq %rbx,%rdi > > + > > + rorq $4,%r13 > > + addq %r8,%r12 > > + xorq %rdx,%rdi > > + > > + rorq $6,%r14 > > + xorq %rbx,%r13 > > + addq %rdi,%r12 > > + > > + movq %r9,%rdi > > + addq (%rbp),%r12 > > + xorq %r9,%r14 > > + > > + xorq %r10,%rdi > > + rorq $14,%r13 > > + movq %r10,%r8 > > + > > + andq %rdi,%r15 > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %r15,%r8 > > + addq %r12,%rax > > + addq %r12,%r8 > > + > > + leaq 24(%rbp),%rbp > > + movq 104(%rsp),%r13 > > + movq 80(%rsp),%r15 > > + > > + movq %r13,%r12 > > + rorq $7,%r13 > > + addq %r14,%r8 > > + movq %r15,%r14 > > + rorq $42,%r15 > > + > > + xorq %r12,%r13 > > + shrq $7,%r12 > > + rorq $1,%r13 > > + xorq %r14,%r15 > > + shrq $6,%r14 > > + > > + rorq $19,%r15 > > + xorq %r13,%r12 > > + xorq %r14,%r15 > > + addq 40(%rsp),%r12 > > + > > + addq 96(%rsp),%r12 > > + movq %rax,%r13 > > + addq %r15,%r12 > > + movq %r8,%r14 > > + rorq $23,%r13 > > + movq %rbx,%r15 > > + > > + xorq %rax,%r13 > > + rorq $5,%r14 > > + xorq %rcx,%r15 > > + > > + movq %r12,96(%rsp) > > + xorq %r8,%r14 > > + andq %rax,%r15 > > + > > + rorq $4,%r13 > > + addq %rdx,%r12 > > + xorq %rcx,%r15 > > + > > + rorq $6,%r14 > > + xorq %rax,%r13 > > + addq %r15,%r12 > > + > > + movq %r8,%r15 > > + addq (%rbp),%r12 > > + xorq %r8,%r14 > > + > > + xorq %r9,%r15 > > + rorq $14,%r13 > > + movq %r9,%rdx > > + > > + andq %r15,%rdi > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %rdi,%rdx > > + addq %r12,%r11 > > + addq %r12,%rdx > > + > > + leaq 8(%rbp),%rbp > > + movq 112(%rsp),%r13 > > + movq 88(%rsp),%rdi > > + > > + movq %r13,%r12 > > + rorq $7,%r13 > > + addq %r14,%rdx > > + movq %rdi,%r14 > > + rorq $42,%rdi > > + > > + xorq %r12,%r13 > > + shrq $7,%r12 > > + rorq $1,%r13 > > + xorq %r14,%rdi > > + shrq $6,%r14 > > + > > + rorq $19,%rdi > > + xorq %r13,%r12 > > + xorq %r14,%rdi > > + addq 48(%rsp),%r12 > > + > > + addq 104(%rsp),%r12 > > + movq %r11,%r13 > > + addq %rdi,%r12 > > + movq %rdx,%r14 > > + rorq $23,%r13 > > + movq %rax,%rdi > > + > > + xorq %r11,%r13 > > + rorq $5,%r14 > > + xorq %rbx,%rdi > > + > > + movq %r12,104(%rsp) > > + xorq %rdx,%r14 > > + andq %r11,%rdi > > + > > + rorq $4,%r13 > > + addq %rcx,%r12 > > + xorq %rbx,%rdi > > + > > + rorq $6,%r14 > > + xorq %r11,%r13 > > + addq %rdi,%r12 > > + > > + movq %rdx,%rdi > > + addq (%rbp),%r12 > > + xorq %rdx,%r14 > > + > > + xorq %r8,%rdi > > + rorq $14,%r13 > > + movq %r8,%rcx > > + > > + andq %rdi,%r15 > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %r15,%rcx > > + addq %r12,%r10 > > + addq %r12,%rcx > > + > > + leaq 24(%rbp),%rbp > > + movq 120(%rsp),%r13 > > + movq 96(%rsp),%r15 > > + > > + movq %r13,%r12 > > + rorq $7,%r13 > > + addq %r14,%rcx > > + movq %r15,%r14 > > + rorq $42,%r15 > > + > > + xorq %r12,%r13 > > + shrq $7,%r12 > > + rorq $1,%r13 > > + xorq %r14,%r15 > > + shrq $6,%r14 > > + > > + rorq $19,%r15 > > + xorq %r13,%r12 > > + xorq %r14,%r15 > > + addq 56(%rsp),%r12 > > + > > + addq 112(%rsp),%r12 > > + movq %r10,%r13 > > + addq %r15,%r12 > > + movq %rcx,%r14 > > + rorq $23,%r13 > > + movq %r11,%r15 > > + > > + xorq %r10,%r13 > > + rorq $5,%r14 > > + xorq %rax,%r15 > > + > > + movq %r12,112(%rsp) > > + xorq %rcx,%r14 > > + andq %r10,%r15 > > + > > + rorq $4,%r13 > > + 
addq %rbx,%r12 > > + xorq %rax,%r15 > > + > > + rorq $6,%r14 > > + xorq %r10,%r13 > > + addq %r15,%r12 > > + > > + movq %rcx,%r15 > > + addq (%rbp),%r12 > > + xorq %rcx,%r14 > > + > > + xorq %rdx,%r15 > > + rorq $14,%r13 > > + movq %rdx,%rbx > > + > > + andq %r15,%rdi > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %rdi,%rbx > > + addq %r12,%r9 > > + addq %r12,%rbx > > + > > + leaq 8(%rbp),%rbp > > + movq 0(%rsp),%r13 > > + movq 104(%rsp),%rdi > > + > > + movq %r13,%r12 > > + rorq $7,%r13 > > + addq %r14,%rbx > > + movq %rdi,%r14 > > + rorq $42,%rdi > > + > > + xorq %r12,%r13 > > + shrq $7,%r12 > > + rorq $1,%r13 > > + xorq %r14,%rdi > > + shrq $6,%r14 > > + > > + rorq $19,%rdi > > + xorq %r13,%r12 > > + xorq %r14,%rdi > > + addq 64(%rsp),%r12 > > + > > + addq 120(%rsp),%r12 > > + movq %r9,%r13 > > + addq %rdi,%r12 > > + movq %rbx,%r14 > > + rorq $23,%r13 > > + movq %r10,%rdi > > + > > + xorq %r9,%r13 > > + rorq $5,%r14 > > + xorq %r11,%rdi > > + > > + movq %r12,120(%rsp) > > + xorq %rbx,%r14 > > + andq %r9,%rdi > > + > > + rorq $4,%r13 > > + addq %rax,%r12 > > + xorq %r11,%rdi > > + > > + rorq $6,%r14 > > + xorq %r9,%r13 > > + addq %rdi,%r12 > > + > > + movq %rbx,%rdi > > + addq (%rbp),%r12 > > + xorq %rbx,%r14 > > + > > + xorq %rcx,%rdi > > + rorq $14,%r13 > > + movq %rcx,%rax > > + > > + andq %rdi,%r15 > > + rorq $28,%r14 > > + addq %r13,%r12 > > + > > + xorq %r15,%rax > > + addq %r12,%r8 > > + addq %r12,%rax > > + > > + leaq 24(%rbp),%rbp > > + cmpb $0,7(%rbp) > > + jnz .Lrounds_16_xx > > + > > + movq 128+0(%rsp),%rdi > > + addq %r14,%rax > > + leaq 128(%rsi),%rsi > > + > > + addq 0(%rdi),%rax > > + addq 8(%rdi),%rbx > > + addq 16(%rdi),%rcx > > + addq 24(%rdi),%rdx > > + addq 32(%rdi),%r8 > > + addq 40(%rdi),%r9 > > + addq 48(%rdi),%r10 > > + addq 56(%rdi),%r11 > > + > > + cmpq 128+16(%rsp),%rsi > > + > > + movq %rax,0(%rdi) > > + movq %rbx,8(%rdi) > > + movq %rcx,16(%rdi) > > + movq %rdx,24(%rdi) > > + movq %r8,32(%rdi) > > + movq %r9,40(%rdi) > > + movq %r10,48(%rdi) > > + movq %r11,56(%rdi) > > + jb .Lloop > > + > > + movq 152(%rsp),%rsi > > +.cfi_def_cfa %rsi,8 > > + movq -48(%rsi),%r15 > > +.cfi_restore %r15 > > + movq -40(%rsi),%r14 > > +.cfi_restore %r14 > > + movq -32(%rsi),%r13 > > +.cfi_restore %r13 > > + movq -24(%rsi),%r12 > > +.cfi_restore %r12 > > + movq -16(%rsi),%rbp > > +.cfi_restore %rbp > > + movq -8(%rsi),%rbx > > +.cfi_restore %rbx > > + leaq (%rsi),%rsp > > +.cfi_def_cfa_register %rsp > > +.Lepilogue: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size sha512_block_data_order,.-sha512_block_data_order > > +.align 64 > > +.type K512,@object > > +K512: > > +.quad 0x428a2f98d728ae22,0x7137449123ef65cd > > +.quad 0x428a2f98d728ae22,0x7137449123ef65cd > > +.quad 0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc > > +.quad 0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc > > +.quad 0x3956c25bf348b538,0x59f111f1b605d019 > > +.quad 0x3956c25bf348b538,0x59f111f1b605d019 > > +.quad 0x923f82a4af194f9b,0xab1c5ed5da6d8118 > > +.quad 0x923f82a4af194f9b,0xab1c5ed5da6d8118 > > +.quad 0xd807aa98a3030242,0x12835b0145706fbe > > +.quad 0xd807aa98a3030242,0x12835b0145706fbe > > +.quad 0x243185be4ee4b28c,0x550c7dc3d5ffb4e2 > > +.quad 0x243185be4ee4b28c,0x550c7dc3d5ffb4e2 > > +.quad 0x72be5d74f27b896f,0x80deb1fe3b1696b1 > > +.quad 0x72be5d74f27b896f,0x80deb1fe3b1696b1 > > +.quad 0x9bdc06a725c71235,0xc19bf174cf692694 > > +.quad 0x9bdc06a725c71235,0xc19bf174cf692694 > > +.quad 0xe49b69c19ef14ad2,0xefbe4786384f25e3 > > +.quad 0xe49b69c19ef14ad2,0xefbe4786384f25e3 > > +.quad 
0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65 > > +.quad 0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65 > > +.quad 0x2de92c6f592b0275,0x4a7484aa6ea6e483 > > +.quad 0x2de92c6f592b0275,0x4a7484aa6ea6e483 > > +.quad 0x5cb0a9dcbd41fbd4,0x76f988da831153b5 > > +.quad 0x5cb0a9dcbd41fbd4,0x76f988da831153b5 > > +.quad 0x983e5152ee66dfab,0xa831c66d2db43210 > > +.quad 0x983e5152ee66dfab,0xa831c66d2db43210 > > +.quad 0xb00327c898fb213f,0xbf597fc7beef0ee4 > > +.quad 0xb00327c898fb213f,0xbf597fc7beef0ee4 > > +.quad 0xc6e00bf33da88fc2,0xd5a79147930aa725 > > +.quad 0xc6e00bf33da88fc2,0xd5a79147930aa725 > > +.quad 0x06ca6351e003826f,0x142929670a0e6e70 > > +.quad 0x06ca6351e003826f,0x142929670a0e6e70 > > +.quad 0x27b70a8546d22ffc,0x2e1b21385c26c926 > > +.quad 0x27b70a8546d22ffc,0x2e1b21385c26c926 > > +.quad 0x4d2c6dfc5ac42aed,0x53380d139d95b3df > > +.quad 0x4d2c6dfc5ac42aed,0x53380d139d95b3df > > +.quad 0x650a73548baf63de,0x766a0abb3c77b2a8 > > +.quad 0x650a73548baf63de,0x766a0abb3c77b2a8 > > +.quad 0x81c2c92e47edaee6,0x92722c851482353b > > +.quad 0x81c2c92e47edaee6,0x92722c851482353b > > +.quad 0xa2bfe8a14cf10364,0xa81a664bbc423001 > > +.quad 0xa2bfe8a14cf10364,0xa81a664bbc423001 > > +.quad 0xc24b8b70d0f89791,0xc76c51a30654be30 > > +.quad 0xc24b8b70d0f89791,0xc76c51a30654be30 > > +.quad 0xd192e819d6ef5218,0xd69906245565a910 > > +.quad 0xd192e819d6ef5218,0xd69906245565a910 > > +.quad 0xf40e35855771202a,0x106aa07032bbd1b8 > > +.quad 0xf40e35855771202a,0x106aa07032bbd1b8 > > +.quad 0x19a4c116b8d2d0c8,0x1e376c085141ab53 > > +.quad 0x19a4c116b8d2d0c8,0x1e376c085141ab53 > > +.quad 0x2748774cdf8eeb99,0x34b0bcb5e19b48a8 > > +.quad 0x2748774cdf8eeb99,0x34b0bcb5e19b48a8 > > +.quad 0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb > > +.quad 0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb > > +.quad 0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3 > > +.quad 0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3 > > +.quad 0x748f82ee5defb2fc,0x78a5636f43172f60 > > +.quad 0x748f82ee5defb2fc,0x78a5636f43172f60 > > +.quad 0x84c87814a1f0ab72,0x8cc702081a6439ec > > +.quad 0x84c87814a1f0ab72,0x8cc702081a6439ec > > +.quad 0x90befffa23631e28,0xa4506cebde82bde9 > > +.quad 0x90befffa23631e28,0xa4506cebde82bde9 > > +.quad 0xbef9a3f7b2c67915,0xc67178f2e372532b > > +.quad 0xbef9a3f7b2c67915,0xc67178f2e372532b > > +.quad 0xca273eceea26619c,0xd186b8c721c0c207 > > +.quad 0xca273eceea26619c,0xd186b8c721c0c207 > > +.quad 0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178 > > +.quad 0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178 > > +.quad 0x06f067aa72176fba,0x0a637dc5a2c898a6 > > +.quad 0x06f067aa72176fba,0x0a637dc5a2c898a6 > > +.quad 0x113f9804bef90dae,0x1b710b35131c471b > > +.quad 0x113f9804bef90dae,0x1b710b35131c471b > > +.quad 0x28db77f523047d84,0x32caab7b40c72493 > > +.quad 0x28db77f523047d84,0x32caab7b40c72493 > > +.quad 0x3c9ebe0a15c9bebc,0x431d67c49c100d4c > > +.quad 0x3c9ebe0a15c9bebc,0x431d67c49c100d4c > > +.quad 0x4cc5d4becb3e42b6,0x597f299cfc657e2a > > +.quad 0x4cc5d4becb3e42b6,0x597f299cfc657e2a > > +.quad 0x5fcb6fab3ad6faec,0x6c44198c4a475817 > > +.quad 0x5fcb6fab3ad6faec,0x6c44198c4a475817 > > + > > +.quad 0x0001020304050607,0x08090a0b0c0d0e0f > > +.quad 0x0001020304050607,0x08090a0b0c0d0e0f > > +.byte > > > 83,72,65,53,49,50,32,98,108,111,99,107,32,116,114,97,110,115,102,111,114,1 > > > 09,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83 > , > > > 32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111, > 1 > > 14,103,62,0 > > diff --git a/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/x86_64cpuid.S > > b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/x86_64cpuid.S > > new 
file mode 100644 > > index 0000000000..cac5f8f32c > > --- /dev/null > > +++ b/CryptoPkg/Library/OpensslLib/X64Gcc/crypto/x86_64cpuid.S > > @@ -0,0 +1,491 @@ > > +# WARNING: do not edit! > > +# Generated from openssl/crypto/x86_64cpuid.pl > > +# > > +# Copyright 2005-2020 The OpenSSL Project Authors. All Rights Reserved. > > +# > > +# Licensed under the OpenSSL license (the "License"). You may not use > > +# this file except in compliance with the License. You can obtain a copy > > +# in the file LICENSE in the source distribution or at > > +# https://www.openssl.org/source/license.html > > + > > + > > +.hidden OPENSSL_cpuid_setup > > +.section .init > > + call OPENSSL_cpuid_setup > > + > > +.hidden OPENSSL_ia32cap_P > > +.comm OPENSSL_ia32cap_P,16,4 > > + > > +.text > > + > > +.globl OPENSSL_atomic_add > > +.type OPENSSL_atomic_add,@function > > +.align 16 > > +OPENSSL_atomic_add: > > +.cfi_startproc > > + movl (%rdi),%eax > > +.Lspin: leaq (%rsi,%rax,1),%r8 > > +.byte 0xf0 > > + cmpxchgl %r8d,(%rdi) > > + jne .Lspin > > + movl %r8d,%eax > > +.byte 0x48,0x98 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size OPENSSL_atomic_add,.-OPENSSL_atomic_add > > + > > +.globl OPENSSL_rdtsc > > +.type OPENSSL_rdtsc,@function > > +.align 16 > > +OPENSSL_rdtsc: > > +.cfi_startproc > > + rdtsc > > + shlq $32,%rdx > > + orq %rdx,%rax > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size OPENSSL_rdtsc,.-OPENSSL_rdtsc > > + > > +.globl OPENSSL_ia32_cpuid > > +.type OPENSSL_ia32_cpuid,@function > > +.align 16 > > +OPENSSL_ia32_cpuid: > > +.cfi_startproc > > + movq %rbx,%r8 > > +.cfi_register %rbx,%r8 > > + > > + xorl %eax,%eax > > + movq %rax,8(%rdi) > > + cpuid > > + movl %eax,%r11d > > + > > + xorl %eax,%eax > > + cmpl $0x756e6547,%ebx > > + setne %al > > + movl %eax,%r9d > > + cmpl $0x49656e69,%edx > > + setne %al > > + orl %eax,%r9d > > + cmpl $0x6c65746e,%ecx > > + setne %al > > + orl %eax,%r9d > > + jz .Lintel > > + > > + cmpl $0x68747541,%ebx > > + setne %al > > + movl %eax,%r10d > > + cmpl $0x69746E65,%edx > > + setne %al > > + orl %eax,%r10d > > + cmpl $0x444D4163,%ecx > > + setne %al > > + orl %eax,%r10d > > + jnz .Lintel > > + > > + > > + movl $0x80000000,%eax > > + cpuid > > + cmpl $0x80000001,%eax > > + jb .Lintel > > + movl %eax,%r10d > > + movl $0x80000001,%eax > > + cpuid > > + orl %ecx,%r9d > > + andl $0x00000801,%r9d > > + > > + cmpl $0x80000008,%r10d > > + jb .Lintel > > + > > + movl $0x80000008,%eax > > + cpuid > > + movzbq %cl,%r10 > > + incq %r10 > > + > > + movl $1,%eax > > + cpuid > > + btl $28,%edx > > + jnc .Lgeneric > > + shrl $16,%ebx > > + cmpb %r10b,%bl > > + ja .Lgeneric > > + andl $0xefffffff,%edx > > + jmp .Lgeneric > > + > > +.Lintel: > > + cmpl $4,%r11d > > + movl $-1,%r10d > > + jb .Lnocacheinfo > > + > > + movl $4,%eax > > + movl $0,%ecx > > + cpuid > > + movl %eax,%r10d > > + shrl $14,%r10d > > + andl $0xfff,%r10d > > + > > +.Lnocacheinfo: > > + movl $1,%eax > > + cpuid > > + movd %eax,%xmm0 > > + andl $0xbfefffff,%edx > > + cmpl $0,%r9d > > + jne .Lnotintel > > + orl $0x40000000,%edx > > + andb $15,%ah > > + cmpb $15,%ah > > + jne .LnotP4 > > + orl $0x00100000,%edx > > +.LnotP4: > > + cmpb $6,%ah > > + jne .Lnotintel > > + andl $0x0fff0ff0,%eax > > + cmpl $0x00050670,%eax > > + je .Lknights > > + cmpl $0x00080650,%eax > > + jne .Lnotintel > > +.Lknights: > > + andl $0xfbffffff,%ecx > > + > > +.Lnotintel: > > + btl $28,%edx > > + jnc .Lgeneric > > + andl $0xefffffff,%edx > > + cmpl $0,%r10d > > + je .Lgeneric > > + > > + orl $0x10000000,%edx > > + shrl $16,%ebx > > + cmpb 
$1,%bl > > + ja .Lgeneric > > + andl $0xefffffff,%edx > > +.Lgeneric: > > + andl $0x00000800,%r9d > > + andl $0xfffff7ff,%ecx > > + orl %ecx,%r9d > > + > > + movl %edx,%r10d > > + > > + cmpl $7,%r11d > > + jb .Lno_extended_info > > + movl $7,%eax > > + xorl %ecx,%ecx > > + cpuid > > + btl $26,%r9d > > + jc .Lnotknights > > + andl $0xfff7ffff,%ebx > > +.Lnotknights: > > + movd %xmm0,%eax > > + andl $0x0fff0ff0,%eax > > + cmpl $0x00050650,%eax > > + jne .Lnotskylakex > > + andl $0xfffeffff,%ebx > > + > > +.Lnotskylakex: > > + movl %ebx,8(%rdi) > > + movl %ecx,12(%rdi) > > +.Lno_extended_info: > > + > > + btl $27,%r9d > > + jnc .Lclear_avx > > + xorl %ecx,%ecx > > +.byte 0x0f,0x01,0xd0 > > + andl $0xe6,%eax > > + cmpl $0xe6,%eax > > + je .Ldone > > + andl $0x3fdeffff,8(%rdi) > > + > > + > > + > > + > > + andl $6,%eax > > + cmpl $6,%eax > > + je .Ldone > > +.Lclear_avx: > > + movl $0xefffe7ff,%eax > > + andl %eax,%r9d > > + movl $0x3fdeffdf,%eax > > + andl %eax,8(%rdi) > > +.Ldone: > > + shlq $32,%r9 > > + movl %r10d,%eax > > + movq %r8,%rbx > > +.cfi_restore %rbx > > + orq %r9,%rax > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size OPENSSL_ia32_cpuid,.-OPENSSL_ia32_cpuid > > + > > +.globl OPENSSL_cleanse > > +.type OPENSSL_cleanse,@function > > +.align 16 > > +OPENSSL_cleanse: > > +.cfi_startproc > > + xorq %rax,%rax > > + cmpq $15,%rsi > > + jae .Lot > > + cmpq $0,%rsi > > + je .Lret > > +.Little: > > + movb %al,(%rdi) > > + subq $1,%rsi > > + leaq 1(%rdi),%rdi > > + jnz .Little > > +.Lret: > > + .byte 0xf3,0xc3 > > +.align 16 > > +.Lot: > > + testq $7,%rdi > > + jz .Laligned > > + movb %al,(%rdi) > > + leaq -1(%rsi),%rsi > > + leaq 1(%rdi),%rdi > > + jmp .Lot > > +.Laligned: > > + movq %rax,(%rdi) > > + leaq -8(%rsi),%rsi > > + testq $-8,%rsi > > + leaq 8(%rdi),%rdi > > + jnz .Laligned > > + cmpq $0,%rsi > > + jne .Little > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size OPENSSL_cleanse,.-OPENSSL_cleanse > > + > > +.globl CRYPTO_memcmp > > +.type CRYPTO_memcmp,@function > > +.align 16 > > +CRYPTO_memcmp: > > +.cfi_startproc > > + xorq %rax,%rax > > + xorq %r10,%r10 > > + cmpq $0,%rdx > > + je .Lno_data > > + cmpq $16,%rdx > > + jne .Loop_cmp > > + movq (%rdi),%r10 > > + movq 8(%rdi),%r11 > > + movq $1,%rdx > > + xorq (%rsi),%r10 > > + xorq 8(%rsi),%r11 > > + orq %r11,%r10 > > + cmovnzq %rdx,%rax > > + .byte 0xf3,0xc3 > > + > > +.align 16 > > +.Loop_cmp: > > + movb (%rdi),%r10b > > + leaq 1(%rdi),%rdi > > + xorb (%rsi),%r10b > > + leaq 1(%rsi),%rsi > > + orb %r10b,%al > > + decq %rdx > > + jnz .Loop_cmp > > + negq %rax > > + shrq $63,%rax > > +.Lno_data: > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size CRYPTO_memcmp,.-CRYPTO_memcmp > > +.globl OPENSSL_wipe_cpu > > +.type OPENSSL_wipe_cpu,@function > > +.align 16 > > +OPENSSL_wipe_cpu: > > +.cfi_startproc > > + pxor %xmm0,%xmm0 > > + pxor %xmm1,%xmm1 > > + pxor %xmm2,%xmm2 > > + pxor %xmm3,%xmm3 > > + pxor %xmm4,%xmm4 > > + pxor %xmm5,%xmm5 > > + pxor %xmm6,%xmm6 > > + pxor %xmm7,%xmm7 > > + pxor %xmm8,%xmm8 > > + pxor %xmm9,%xmm9 > > + pxor %xmm10,%xmm10 > > + pxor %xmm11,%xmm11 > > + pxor %xmm12,%xmm12 > > + pxor %xmm13,%xmm13 > > + pxor %xmm14,%xmm14 > > + pxor %xmm15,%xmm15 > > + xorq %rcx,%rcx > > + xorq %rdx,%rdx > > + xorq %rsi,%rsi > > + xorq %rdi,%rdi > > + xorq %r8,%r8 > > + xorq %r9,%r9 > > + xorq %r10,%r10 > > + xorq %r11,%r11 > > + leaq 8(%rsp),%rax > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size OPENSSL_wipe_cpu,.-OPENSSL_wipe_cpu > > +.globl OPENSSL_instrument_bus > > +.type OPENSSL_instrument_bus,@function > > +.align 16 
> > +OPENSSL_instrument_bus: > > +.cfi_startproc > > + movq %rdi,%r10 > > + movq %rsi,%rcx > > + movq %rsi,%r11 > > + > > + rdtsc > > + movl %eax,%r8d > > + movl $0,%r9d > > + clflush (%r10) > > +.byte 0xf0 > > + addl %r9d,(%r10) > > + jmp .Loop > > +.align 16 > > +.Loop: rdtsc > > + movl %eax,%edx > > + subl %r8d,%eax > > + movl %edx,%r8d > > + movl %eax,%r9d > > + clflush (%r10) > > +.byte 0xf0 > > + addl %eax,(%r10) > > + leaq 4(%r10),%r10 > > + subq $1,%rcx > > + jnz .Loop > > + > > + movq %r11,%rax > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size OPENSSL_instrument_bus,.-OPENSSL_instrument_bus > > + > > +.globl OPENSSL_instrument_bus2 > > +.type OPENSSL_instrument_bus2,@function > > +.align 16 > > +OPENSSL_instrument_bus2: > > +.cfi_startproc > > + movq %rdi,%r10 > > + movq %rsi,%rcx > > + movq %rdx,%r11 > > + movq %rcx,8(%rsp) > > + > > + rdtsc > > + movl %eax,%r8d > > + movl $0,%r9d > > + > > + clflush (%r10) > > +.byte 0xf0 > > + addl %r9d,(%r10) > > + > > + rdtsc > > + movl %eax,%edx > > + subl %r8d,%eax > > + movl %edx,%r8d > > + movl %eax,%r9d > > +.Loop2: > > + clflush (%r10) > > +.byte 0xf0 > > + addl %eax,(%r10) > > + > > + subq $1,%r11 > > + jz .Ldone2 > > + > > + rdtsc > > + movl %eax,%edx > > + subl %r8d,%eax > > + movl %edx,%r8d > > + cmpl %r9d,%eax > > + movl %eax,%r9d > > + movl $0,%edx > > + setne %dl > > + subq %rdx,%rcx > > + leaq (%r10,%rdx,4),%r10 > > + jnz .Loop2 > > + > > +.Ldone2: > > + movq 8(%rsp),%rax > > + subq %rcx,%rax > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size OPENSSL_instrument_bus2,.-OPENSSL_instrument_bus2 > > +.globl OPENSSL_ia32_rdrand_bytes > > +.type OPENSSL_ia32_rdrand_bytes,@function > > +.align 16 > > +OPENSSL_ia32_rdrand_bytes: > > +.cfi_startproc > > + xorq %rax,%rax > > + cmpq $0,%rsi > > + je .Ldone_rdrand_bytes > > + > > + movq $8,%r11 > > +.Loop_rdrand_bytes: > > +.byte 73,15,199,242 > > + jc .Lbreak_rdrand_bytes > > + decq %r11 > > + jnz .Loop_rdrand_bytes > > + jmp .Ldone_rdrand_bytes > > + > > +.align 16 > > +.Lbreak_rdrand_bytes: > > + cmpq $8,%rsi > > + jb .Ltail_rdrand_bytes > > + movq %r10,(%rdi) > > + leaq 8(%rdi),%rdi > > + addq $8,%rax > > + subq $8,%rsi > > + jz .Ldone_rdrand_bytes > > + movq $8,%r11 > > + jmp .Loop_rdrand_bytes > > + > > +.align 16 > > +.Ltail_rdrand_bytes: > > + movb %r10b,(%rdi) > > + leaq 1(%rdi),%rdi > > + incq %rax > > + shrq $8,%r10 > > + decq %rsi > > + jnz .Ltail_rdrand_bytes > > + > > +.Ldone_rdrand_bytes: > > + xorq %r10,%r10 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size OPENSSL_ia32_rdrand_bytes,.-OPENSSL_ia32_rdrand_bytes > > +.globl OPENSSL_ia32_rdseed_bytes > > +.type OPENSSL_ia32_rdseed_bytes,@function > > +.align 16 > > +OPENSSL_ia32_rdseed_bytes: > > +.cfi_startproc > > + xorq %rax,%rax > > + cmpq $0,%rsi > > + je .Ldone_rdseed_bytes > > + > > + movq $8,%r11 > > +.Loop_rdseed_bytes: > > +.byte 73,15,199,250 > > + jc .Lbreak_rdseed_bytes > > + decq %r11 > > + jnz .Loop_rdseed_bytes > > + jmp .Ldone_rdseed_bytes > > + > > +.align 16 > > +.Lbreak_rdseed_bytes: > > + cmpq $8,%rsi > > + jb .Ltail_rdseed_bytes > > + movq %r10,(%rdi) > > + leaq 8(%rdi),%rdi > > + addq $8,%rax > > + subq $8,%rsi > > + jz .Ldone_rdseed_bytes > > + movq $8,%r11 > > + jmp .Loop_rdseed_bytes > > + > > +.align 16 > > +.Ltail_rdseed_bytes: > > + movb %r10b,(%rdi) > > + leaq 1(%rdi),%rdi > > + incq %rax > > + shrq $8,%r10 > > + decq %rsi > > + jnz .Ltail_rdseed_bytes > > + > > +.Ldone_rdseed_bytes: > > + xorq %r10,%r10 > > + .byte 0xf3,0xc3 > > +.cfi_endproc > > +.size 
OPENSSL_ia32_rdseed_bytes,.-OPENSSL_ia32_rdseed_bytes > > -- > > 2.32.0.windows.1 > > > > > ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2021-08-06 20:01 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20210720220646.659-1-christopher.zurcher@outlook.com>
2021-07-20 22:06 ` [PATCH v7 1/3] BaseTools: Remove COMMON section from the GCC discard list Christopher Zurcher
2021-07-21  1:11   ` 回复: " gaoliming
2021-07-21  1:14     ` [edk2-devel] " Christopher Zurcher
2021-07-21  1:46       ` 回复: " gaoliming
2021-07-21 11:44   ` [edk2-devel] " Yao, Jiewen
2021-08-04 12:26     ` Ard Biesheuvel
2021-08-05  5:04       ` 回复: " gaoliming
2021-08-06 20:01         ` Christopher Zurcher
2021-07-20 22:06 ` [PATCH v7 2/3] CryptoPkg/OpensslLib: Add native instruction support for X64 Christopher Zurcher
2021-07-21 11:44   ` Yao, Jiewen
2021-07-20 22:06 ` [PATCH v7 3/3] CryptoPkg/OpensslLib: Commit the auto-generated assembly files " Christopher Zurcher
2021-07-21 11:44   ` Yao, Jiewen
2021-07-26 10:08     ` 回复: [edk2-devel] " gaoliming