* [PATCH 0/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
@ 2020-03-17 10:26 Zurcher, Christopher J
From: Zurcher, Christopher J @ 2020-03-17 10:26 UTC (permalink / raw)
To: devel; +Cc: Jian J Wang, Xiaoyu Lu, Eugene Cohen, Ard Biesheuvel
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507
This patch adds support for building the native instruction algorithms for
the IA32 and X64 versions of OpensslLib. The process_files.pl script was
modified to parse the .asm file targets from the OpenSSL build configuration
data structure and to generate the necessary assembly files for the EDK2
build environment.
For the X64 variant, OpenSSL includes calls to a Windows error-handling API;
that function has been stubbed out in ApiHooks.c.
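For illustration, a stub of this shape could satisfy such an import (a hypothetical sketch, not the actual ApiHooks.c contents; the symbol name here is invented, since the cover letter does not name the Windows API in question):

```c
#include <stddef.h>

/* Hypothetical stand-in for the ApiHooks.c idea: the generated X64
   assembly references a Windows error-handling routine that does not
   exist under UEFI, so the link-time dependency is satisfied with a
   do-nothing stub. The stubbed code path is never taken at runtime. */
void *
WindowsApiHookStub (
  void  *Args
  )
{
  (void)Args;   /* argument intentionally ignored */
  return NULL;  /* nothing to report back to the caller */
}
```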
For all variants, a constructor was added to call the required CPUID function
within OpenSSL to facilitate processor capability checks in the native
algorithms.
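The constructor arrangement can be sketched as follows (a minimal, self-contained illustration, not the actual OpensslLibConstructor.c; `OPENSSL_cpuid_setup` is assumed to be OpenSSL's internal capability probe and is replaced here by a local stand-in so the sketch compiles on its own):

```c
/* Stand-in for OpenSSL's internal CPUID probe. In the real library this
   routine executes CPUID and records the processor feature bits that the
   native assembly implementations consult at run time. */
static int  mCpuidSetupCalls = 0;

static void
OPENSSL_cpuid_setup (
  void
  )
{
  mCpuidSetupCalls++;
}

/* Sketch of the library constructor: run the capability probe once,
   before any caller can reach a native-instruction algorithm. */
int
OpensslLibConstructor (
  void
  )
{
  OPENSSL_cpuid_setup ();
  return 0;  /* success, in the spirit of EDK2's RETURN_SUCCESS */
}
```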
Additional native architecture variants should be simple to add by following
the changes made for these two architectures.
The OpenSSL assembly files are traditionally generated at build time using a
perl script. To avoid that burden on EDK2 users, these end-result assembly
files are generated during the configuration steps performed by the package
maintainer (through process_files.pl). The Perl generator scripts inside
OpenSSL do not carry over file comments, as they are only meant to create
intermediate build files, so process_files.pl contains additional hooks to
preserve the copyright headers and to clean up tabs and line endings to
comply with EDK2 coding standards. The resulting file headers align with
the generated .h files which are already included in the EDK2 repository.
Cc: Jian J Wang <jian.j.wang@intel.com>
Cc: Xiaoyu Lu <xiaoyux.lu@intel.com>
Cc: Eugene Cohen <eugene@hp.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Christopher J Zurcher (1):
CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
CryptoPkg/Library/OpensslLib/OpensslLib.inf | 2 +-
CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf | 2 +-
CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf | 680 ++
CryptoPkg/Library/OpensslLib/OpensslLibX64.inf | 691 ++
CryptoPkg/Library/Include/openssl/opensslconf.h | 3 -
CryptoPkg/Library/OpensslLib/ApiHooks.c | 18 +
CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c | 34 +
CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm | 3209 ++++++++
CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm | 648 ++
CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm | 1522 ++++
CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm | 1259 +++
CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm | 352 +
CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm | 486 ++
CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm | 887 +++
CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm | 1835 +++++
CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm | 690 ++
CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm | 1264 +++
CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm | 381 +
CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm | 3977 ++++++++++
CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm | 6796 ++++++++++++++++
CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm | 2842 +++++++
CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm | 513 ++
CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm | 1772 +++++
CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm | 3271 ++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm | 4709 +++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm | 5084 ++++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm | 1170 +++
CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm | 1989 +++++
CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm | 2242 ++++++
CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm | 432 +
CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm | 1479 ++++
CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm | 4033 ++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm | 794 ++
CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm | 984 +++
CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm | 2077 +++++
CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm | 1395 ++++
CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm | 784 ++
CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm | 532 ++
CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm | 7581 ++++++++++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm | 5773 ++++++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm | 8262 ++++++++++++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm | 5712 ++++++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm | 5668 ++++++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm | 472 ++
CryptoPkg/Library/OpensslLib/process_files.pl | 208 +-
CryptoPkg/Library/OpensslLib/uefi-asm.conf | 14 +
46 files changed, 94478 insertions(+), 50 deletions(-)
create mode 100644 CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf
create mode 100644 CryptoPkg/Library/OpensslLib/OpensslLibX64.inf
create mode 100644 CryptoPkg/Library/OpensslLib/ApiHooks.c
create mode 100644 CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c
create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm
create mode 100644 CryptoPkg/Library/OpensslLib/uefi-asm.conf
--
2.16.2.windows.1
* [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
From: Zurcher, Christopher J @ 2020-03-17 10:26 UTC (permalink / raw)
To: devel; +Cc: Jian J Wang, Xiaoyu Lu, Eugene Cohen, Ard Biesheuvel
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507
Add IA32 and X64 versions of OpensslLib.inf, along with their respective
assembly files. This also introduces the required modifications to
process_files.pl for generating these files.
Cc: Jian J Wang <jian.j.wang@intel.com>
Cc: Xiaoyu Lu <xiaoyux.lu@intel.com>
Cc: Eugene Cohen <eugene@hp.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Christopher J Zurcher <christopher.j.zurcher@intel.com>
---
CryptoPkg/Library/OpensslLib/OpensslLib.inf | 2 +-
CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf | 2 +-
CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf | 680 ++
CryptoPkg/Library/OpensslLib/OpensslLibX64.inf | 691 ++
CryptoPkg/Library/Include/openssl/opensslconf.h | 3 -
CryptoPkg/Library/OpensslLib/ApiHooks.c | 18 +
CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c | 34 +
CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm | 3209 ++++++++
CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm | 648 ++
CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm | 1522 ++++
CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm | 1259 +++
CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm | 352 +
CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm | 486 ++
CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm | 887 +++
CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm | 1835 +++++
CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm | 690 ++
CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm | 1264 +++
CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm | 381 +
CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm | 3977 ++++++++++
CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm | 6796 ++++++++++++++++
CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm | 2842 +++++++
CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm | 513 ++
CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm | 1772 +++++
CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm | 3271 ++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm | 4709 +++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm | 5084 ++++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm | 1170 +++
CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm | 1989 +++++
CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm | 2242 ++++++
CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm | 432 +
CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm | 1479 ++++
CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm | 4033 ++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm | 794 ++
CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm | 984 +++
CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm | 2077 +++++
CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm | 1395 ++++
CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm | 784 ++
CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm | 532 ++
CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm | 7581 ++++++++++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm | 5773 ++++++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm | 8262 ++++++++++++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm | 5712 ++++++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm | 5668 ++++++++++++++
CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm | 472 ++
CryptoPkg/Library/OpensslLib/process_files.pl | 208 +-
CryptoPkg/Library/OpensslLib/uefi-asm.conf | 14 +
46 files changed, 94478 insertions(+), 50 deletions(-)
diff --git a/CryptoPkg/Library/OpensslLib/OpensslLib.inf b/CryptoPkg/Library/OpensslLib/OpensslLib.inf
index 3519a66885..542507a534 100644
--- a/CryptoPkg/Library/OpensslLib/OpensslLib.inf
+++ b/CryptoPkg/Library/OpensslLib/OpensslLib.inf
@@ -15,7 +15,7 @@
VERSION_STRING = 1.0
LIBRARY_CLASS = OpensslLib
DEFINE OPENSSL_PATH = openssl
- DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE
+ DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -DOPENSSL_NO_ASM
#
# VALID_ARCHITECTURES = IA32 X64 ARM AARCH64
diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf b/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf
index 8a723cb8cd..f0c588284c 100644
--- a/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf
+++ b/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf
@@ -15,7 +15,7 @@
VERSION_STRING = 1.0
LIBRARY_CLASS = OpensslLib
DEFINE OPENSSL_PATH = openssl
- DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE
+ DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -DOPENSSL_NO_ASM
#
# VALID_ARCHITECTURES = IA32 X64 ARM AARCH64
diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf b/CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf
new file mode 100644
index 0000000000..14f4d4ab1a
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf
@@ -0,0 +1,680 @@
+## @file
+# This module provides OpenSSL Library implementation.
+#
+# Copyright (c) 2010 - 2020, Intel Corporation. All rights reserved.<BR>
+# SPDX-License-Identifier: BSD-2-Clause-Patent
+#
+##
+
+[Defines]
+ INF_VERSION = 0x00010005
+ BASE_NAME = OpensslLibIa32
+ MODULE_UNI_FILE = OpensslLib.uni
+ FILE_GUID = 5805D1D4-F8EE-4FBA-BDD8-74465F16A534
+ MODULE_TYPE = BASE
+ VERSION_STRING = 1.0
+ LIBRARY_CLASS = OpensslLib
+ DEFINE OPENSSL_PATH = openssl
+ DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE
+ DEFINE OPENSSL_FLAGS_CONFIG = -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM
+ CONSTRUCTOR = OpensslLibConstructor
+
+#
+# VALID_ARCHITECTURES = IA32
+#
+
+[Sources]
+ OpensslLibConstructor.c
+ $(OPENSSL_PATH)/e_os.h
+ $(OPENSSL_PATH)/ms/uplink.h
+# Autogenerated files list starts here
+ Ia32/crypto/aes/aesni-x86.nasm
+ Ia32/crypto/aes/vpaes-x86.nasm
+ Ia32/crypto/bn/bn-586.nasm
+ Ia32/crypto/bn/co-586.nasm
+ Ia32/crypto/bn/x86-gf2m.nasm
+ Ia32/crypto/bn/x86-mont.nasm
+ Ia32/crypto/des/crypt586.nasm
+ Ia32/crypto/des/des-586.nasm
+ Ia32/crypto/md5/md5-586.nasm
+ Ia32/crypto/modes/ghash-x86.nasm
+ Ia32/crypto/rc4/rc4-586.nasm
+ Ia32/crypto/sha/sha1-586.nasm
+ Ia32/crypto/sha/sha256-586.nasm
+ Ia32/crypto/sha/sha512-586.nasm
+ Ia32/crypto/x86cpuid.nasm
+ $(OPENSSL_PATH)/crypto/aes/aes_cbc.c
+ $(OPENSSL_PATH)/crypto/aes/aes_cfb.c
+ $(OPENSSL_PATH)/crypto/aes/aes_core.c
+ $(OPENSSL_PATH)/crypto/aes/aes_ecb.c
+ $(OPENSSL_PATH)/crypto/aes/aes_ige.c
+ $(OPENSSL_PATH)/crypto/aes/aes_misc.c
+ $(OPENSSL_PATH)/crypto/aes/aes_ofb.c
+ $(OPENSSL_PATH)/crypto/aes/aes_wrap.c
+ $(OPENSSL_PATH)/crypto/aria/aria.c
+ $(OPENSSL_PATH)/crypto/asn1/a_bitstr.c
+ $(OPENSSL_PATH)/crypto/asn1/a_d2i_fp.c
+ $(OPENSSL_PATH)/crypto/asn1/a_digest.c
+ $(OPENSSL_PATH)/crypto/asn1/a_dup.c
+ $(OPENSSL_PATH)/crypto/asn1/a_gentm.c
+ $(OPENSSL_PATH)/crypto/asn1/a_i2d_fp.c
+ $(OPENSSL_PATH)/crypto/asn1/a_int.c
+ $(OPENSSL_PATH)/crypto/asn1/a_mbstr.c
+ $(OPENSSL_PATH)/crypto/asn1/a_object.c
+ $(OPENSSL_PATH)/crypto/asn1/a_octet.c
+ $(OPENSSL_PATH)/crypto/asn1/a_print.c
+ $(OPENSSL_PATH)/crypto/asn1/a_sign.c
+ $(OPENSSL_PATH)/crypto/asn1/a_strex.c
+ $(OPENSSL_PATH)/crypto/asn1/a_strnid.c
+ $(OPENSSL_PATH)/crypto/asn1/a_time.c
+ $(OPENSSL_PATH)/crypto/asn1/a_type.c
+ $(OPENSSL_PATH)/crypto/asn1/a_utctm.c
+ $(OPENSSL_PATH)/crypto/asn1/a_utf8.c
+ $(OPENSSL_PATH)/crypto/asn1/a_verify.c
+ $(OPENSSL_PATH)/crypto/asn1/ameth_lib.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_err.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_gen.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_lib.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_par.c
+ $(OPENSSL_PATH)/crypto/asn1/asn_mime.c
+ $(OPENSSL_PATH)/crypto/asn1/asn_moid.c
+ $(OPENSSL_PATH)/crypto/asn1/asn_mstbl.c
+ $(OPENSSL_PATH)/crypto/asn1/asn_pack.c
+ $(OPENSSL_PATH)/crypto/asn1/bio_asn1.c
+ $(OPENSSL_PATH)/crypto/asn1/bio_ndef.c
+ $(OPENSSL_PATH)/crypto/asn1/d2i_pr.c
+ $(OPENSSL_PATH)/crypto/asn1/d2i_pu.c
+ $(OPENSSL_PATH)/crypto/asn1/evp_asn1.c
+ $(OPENSSL_PATH)/crypto/asn1/f_int.c
+ $(OPENSSL_PATH)/crypto/asn1/f_string.c
+ $(OPENSSL_PATH)/crypto/asn1/i2d_pr.c
+ $(OPENSSL_PATH)/crypto/asn1/i2d_pu.c
+ $(OPENSSL_PATH)/crypto/asn1/n_pkey.c
+ $(OPENSSL_PATH)/crypto/asn1/nsseq.c
+ $(OPENSSL_PATH)/crypto/asn1/p5_pbe.c
+ $(OPENSSL_PATH)/crypto/asn1/p5_pbev2.c
+ $(OPENSSL_PATH)/crypto/asn1/p5_scrypt.c
+ $(OPENSSL_PATH)/crypto/asn1/p8_pkey.c
+ $(OPENSSL_PATH)/crypto/asn1/t_bitst.c
+ $(OPENSSL_PATH)/crypto/asn1/t_pkey.c
+ $(OPENSSL_PATH)/crypto/asn1/t_spki.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_dec.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_enc.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_fre.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_new.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_prn.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_scn.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_typ.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_utl.c
+ $(OPENSSL_PATH)/crypto/asn1/x_algor.c
+ $(OPENSSL_PATH)/crypto/asn1/x_bignum.c
+ $(OPENSSL_PATH)/crypto/asn1/x_info.c
+ $(OPENSSL_PATH)/crypto/asn1/x_int64.c
+ $(OPENSSL_PATH)/crypto/asn1/x_long.c
+ $(OPENSSL_PATH)/crypto/asn1/x_pkey.c
+ $(OPENSSL_PATH)/crypto/asn1/x_sig.c
+ $(OPENSSL_PATH)/crypto/asn1/x_spki.c
+ $(OPENSSL_PATH)/crypto/asn1/x_val.c
+ $(OPENSSL_PATH)/crypto/async/arch/async_null.c
+ $(OPENSSL_PATH)/crypto/async/arch/async_posix.c
+ $(OPENSSL_PATH)/crypto/async/arch/async_win.c
+ $(OPENSSL_PATH)/crypto/async/async.c
+ $(OPENSSL_PATH)/crypto/async/async_err.c
+ $(OPENSSL_PATH)/crypto/async/async_wait.c
+ $(OPENSSL_PATH)/crypto/bio/b_addr.c
+ $(OPENSSL_PATH)/crypto/bio/b_dump.c
+ $(OPENSSL_PATH)/crypto/bio/b_sock.c
+ $(OPENSSL_PATH)/crypto/bio/b_sock2.c
+ $(OPENSSL_PATH)/crypto/bio/bf_buff.c
+ $(OPENSSL_PATH)/crypto/bio/bf_lbuf.c
+ $(OPENSSL_PATH)/crypto/bio/bf_nbio.c
+ $(OPENSSL_PATH)/crypto/bio/bf_null.c
+ $(OPENSSL_PATH)/crypto/bio/bio_cb.c
+ $(OPENSSL_PATH)/crypto/bio/bio_err.c
+ $(OPENSSL_PATH)/crypto/bio/bio_lib.c
+ $(OPENSSL_PATH)/crypto/bio/bio_meth.c
+ $(OPENSSL_PATH)/crypto/bio/bss_acpt.c
+ $(OPENSSL_PATH)/crypto/bio/bss_bio.c
+ $(OPENSSL_PATH)/crypto/bio/bss_conn.c
+ $(OPENSSL_PATH)/crypto/bio/bss_dgram.c
+ $(OPENSSL_PATH)/crypto/bio/bss_fd.c
+ $(OPENSSL_PATH)/crypto/bio/bss_file.c
+ $(OPENSSL_PATH)/crypto/bio/bss_log.c
+ $(OPENSSL_PATH)/crypto/bio/bss_mem.c
+ $(OPENSSL_PATH)/crypto/bio/bss_null.c
+ $(OPENSSL_PATH)/crypto/bio/bss_sock.c
+ $(OPENSSL_PATH)/crypto/bn/bn_add.c
+ $(OPENSSL_PATH)/crypto/bn/bn_blind.c
+ $(OPENSSL_PATH)/crypto/bn/bn_const.c
+ $(OPENSSL_PATH)/crypto/bn/bn_ctx.c
+ $(OPENSSL_PATH)/crypto/bn/bn_depr.c
+ $(OPENSSL_PATH)/crypto/bn/bn_dh.c
+ $(OPENSSL_PATH)/crypto/bn/bn_div.c
+ $(OPENSSL_PATH)/crypto/bn/bn_err.c
+ $(OPENSSL_PATH)/crypto/bn/bn_exp.c
+ $(OPENSSL_PATH)/crypto/bn/bn_exp2.c
+ $(OPENSSL_PATH)/crypto/bn/bn_gcd.c
+ $(OPENSSL_PATH)/crypto/bn/bn_gf2m.c
+ $(OPENSSL_PATH)/crypto/bn/bn_intern.c
+ $(OPENSSL_PATH)/crypto/bn/bn_kron.c
+ $(OPENSSL_PATH)/crypto/bn/bn_lib.c
+ $(OPENSSL_PATH)/crypto/bn/bn_mod.c
+ $(OPENSSL_PATH)/crypto/bn/bn_mont.c
+ $(OPENSSL_PATH)/crypto/bn/bn_mpi.c
+ $(OPENSSL_PATH)/crypto/bn/bn_mul.c
+ $(OPENSSL_PATH)/crypto/bn/bn_nist.c
+ $(OPENSSL_PATH)/crypto/bn/bn_prime.c
+ $(OPENSSL_PATH)/crypto/bn/bn_print.c
+ $(OPENSSL_PATH)/crypto/bn/bn_rand.c
+ $(OPENSSL_PATH)/crypto/bn/bn_recp.c
+ $(OPENSSL_PATH)/crypto/bn/bn_shift.c
+ $(OPENSSL_PATH)/crypto/bn/bn_sqr.c
+ $(OPENSSL_PATH)/crypto/bn/bn_sqrt.c
+ $(OPENSSL_PATH)/crypto/bn/bn_srp.c
+ $(OPENSSL_PATH)/crypto/bn/bn_word.c
+ $(OPENSSL_PATH)/crypto/bn/bn_x931p.c
+ $(OPENSSL_PATH)/crypto/buffer/buf_err.c
+ $(OPENSSL_PATH)/crypto/buffer/buffer.c
+ $(OPENSSL_PATH)/crypto/cmac/cm_ameth.c
+ $(OPENSSL_PATH)/crypto/cmac/cm_pmeth.c
+ $(OPENSSL_PATH)/crypto/cmac/cmac.c
+ $(OPENSSL_PATH)/crypto/comp/c_zlib.c
+ $(OPENSSL_PATH)/crypto/comp/comp_err.c
+ $(OPENSSL_PATH)/crypto/comp/comp_lib.c
+ $(OPENSSL_PATH)/crypto/conf/conf_api.c
+ $(OPENSSL_PATH)/crypto/conf/conf_def.c
+ $(OPENSSL_PATH)/crypto/conf/conf_err.c
+ $(OPENSSL_PATH)/crypto/conf/conf_lib.c
+ $(OPENSSL_PATH)/crypto/conf/conf_mall.c
+ $(OPENSSL_PATH)/crypto/conf/conf_mod.c
+ $(OPENSSL_PATH)/crypto/conf/conf_sap.c
+ $(OPENSSL_PATH)/crypto/conf/conf_ssl.c
+ $(OPENSSL_PATH)/crypto/cpt_err.c
+ $(OPENSSL_PATH)/crypto/cryptlib.c
+ $(OPENSSL_PATH)/crypto/ctype.c
+ $(OPENSSL_PATH)/crypto/cversion.c
+ $(OPENSSL_PATH)/crypto/des/cbc_cksm.c
+ $(OPENSSL_PATH)/crypto/des/cbc_enc.c
+ $(OPENSSL_PATH)/crypto/des/cfb64ede.c
+ $(OPENSSL_PATH)/crypto/des/cfb64enc.c
+ $(OPENSSL_PATH)/crypto/des/cfb_enc.c
+ $(OPENSSL_PATH)/crypto/des/ecb3_enc.c
+ $(OPENSSL_PATH)/crypto/des/ecb_enc.c
+ $(OPENSSL_PATH)/crypto/des/fcrypt.c
+ $(OPENSSL_PATH)/crypto/des/ofb64ede.c
+ $(OPENSSL_PATH)/crypto/des/ofb64enc.c
+ $(OPENSSL_PATH)/crypto/des/ofb_enc.c
+ $(OPENSSL_PATH)/crypto/des/pcbc_enc.c
+ $(OPENSSL_PATH)/crypto/des/qud_cksm.c
+ $(OPENSSL_PATH)/crypto/des/rand_key.c
+ $(OPENSSL_PATH)/crypto/des/set_key.c
+ $(OPENSSL_PATH)/crypto/des/str2key.c
+ $(OPENSSL_PATH)/crypto/des/xcbc_enc.c
+ $(OPENSSL_PATH)/crypto/dh/dh_ameth.c
+ $(OPENSSL_PATH)/crypto/dh/dh_asn1.c
+ $(OPENSSL_PATH)/crypto/dh/dh_check.c
+ $(OPENSSL_PATH)/crypto/dh/dh_depr.c
+ $(OPENSSL_PATH)/crypto/dh/dh_err.c
+ $(OPENSSL_PATH)/crypto/dh/dh_gen.c
+ $(OPENSSL_PATH)/crypto/dh/dh_kdf.c
+ $(OPENSSL_PATH)/crypto/dh/dh_key.c
+ $(OPENSSL_PATH)/crypto/dh/dh_lib.c
+ $(OPENSSL_PATH)/crypto/dh/dh_meth.c
+ $(OPENSSL_PATH)/crypto/dh/dh_pmeth.c
+ $(OPENSSL_PATH)/crypto/dh/dh_prn.c
+ $(OPENSSL_PATH)/crypto/dh/dh_rfc5114.c
+ $(OPENSSL_PATH)/crypto/dh/dh_rfc7919.c
+ $(OPENSSL_PATH)/crypto/dso/dso_dl.c
+ $(OPENSSL_PATH)/crypto/dso/dso_dlfcn.c
+ $(OPENSSL_PATH)/crypto/dso/dso_err.c
+ $(OPENSSL_PATH)/crypto/dso/dso_lib.c
+ $(OPENSSL_PATH)/crypto/dso/dso_openssl.c
+ $(OPENSSL_PATH)/crypto/dso/dso_vms.c
+ $(OPENSSL_PATH)/crypto/dso/dso_win32.c
+ $(OPENSSL_PATH)/crypto/ebcdic.c
+ $(OPENSSL_PATH)/crypto/err/err.c
+ $(OPENSSL_PATH)/crypto/err/err_prn.c
+ $(OPENSSL_PATH)/crypto/evp/bio_b64.c
+ $(OPENSSL_PATH)/crypto/evp/bio_enc.c
+ $(OPENSSL_PATH)/crypto/evp/bio_md.c
+ $(OPENSSL_PATH)/crypto/evp/bio_ok.c
+ $(OPENSSL_PATH)/crypto/evp/c_allc.c
+ $(OPENSSL_PATH)/crypto/evp/c_alld.c
+ $(OPENSSL_PATH)/crypto/evp/cmeth_lib.c
+ $(OPENSSL_PATH)/crypto/evp/digest.c
+ $(OPENSSL_PATH)/crypto/evp/e_aes.c
+ $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha1.c
+ $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha256.c
+ $(OPENSSL_PATH)/crypto/evp/e_aria.c
+ $(OPENSSL_PATH)/crypto/evp/e_bf.c
+ $(OPENSSL_PATH)/crypto/evp/e_camellia.c
+ $(OPENSSL_PATH)/crypto/evp/e_cast.c
+ $(OPENSSL_PATH)/crypto/evp/e_chacha20_poly1305.c
+ $(OPENSSL_PATH)/crypto/evp/e_des.c
+ $(OPENSSL_PATH)/crypto/evp/e_des3.c
+ $(OPENSSL_PATH)/crypto/evp/e_idea.c
+ $(OPENSSL_PATH)/crypto/evp/e_null.c
+ $(OPENSSL_PATH)/crypto/evp/e_old.c
+ $(OPENSSL_PATH)/crypto/evp/e_rc2.c
+ $(OPENSSL_PATH)/crypto/evp/e_rc4.c
+ $(OPENSSL_PATH)/crypto/evp/e_rc4_hmac_md5.c
+ $(OPENSSL_PATH)/crypto/evp/e_rc5.c
+ $(OPENSSL_PATH)/crypto/evp/e_seed.c
+ $(OPENSSL_PATH)/crypto/evp/e_sm4.c
+ $(OPENSSL_PATH)/crypto/evp/e_xcbc_d.c
+ $(OPENSSL_PATH)/crypto/evp/encode.c
+ $(OPENSSL_PATH)/crypto/evp/evp_cnf.c
+ $(OPENSSL_PATH)/crypto/evp/evp_enc.c
+ $(OPENSSL_PATH)/crypto/evp/evp_err.c
+ $(OPENSSL_PATH)/crypto/evp/evp_key.c
+ $(OPENSSL_PATH)/crypto/evp/evp_lib.c
+ $(OPENSSL_PATH)/crypto/evp/evp_pbe.c
+ $(OPENSSL_PATH)/crypto/evp/evp_pkey.c
+ $(OPENSSL_PATH)/crypto/evp/m_md2.c
+ $(OPENSSL_PATH)/crypto/evp/m_md4.c
+ $(OPENSSL_PATH)/crypto/evp/m_md5.c
+ $(OPENSSL_PATH)/crypto/evp/m_md5_sha1.c
+ $(OPENSSL_PATH)/crypto/evp/m_mdc2.c
+ $(OPENSSL_PATH)/crypto/evp/m_null.c
+ $(OPENSSL_PATH)/crypto/evp/m_ripemd.c
+ $(OPENSSL_PATH)/crypto/evp/m_sha1.c
+ $(OPENSSL_PATH)/crypto/evp/m_sha3.c
+ $(OPENSSL_PATH)/crypto/evp/m_sigver.c
+ $(OPENSSL_PATH)/crypto/evp/m_wp.c
+ $(OPENSSL_PATH)/crypto/evp/names.c
+ $(OPENSSL_PATH)/crypto/evp/p5_crpt.c
+ $(OPENSSL_PATH)/crypto/evp/p5_crpt2.c
+ $(OPENSSL_PATH)/crypto/evp/p_dec.c
+ $(OPENSSL_PATH)/crypto/evp/p_enc.c
+ $(OPENSSL_PATH)/crypto/evp/p_lib.c
+ $(OPENSSL_PATH)/crypto/evp/p_open.c
+ $(OPENSSL_PATH)/crypto/evp/p_seal.c
+ $(OPENSSL_PATH)/crypto/evp/p_sign.c
+ $(OPENSSL_PATH)/crypto/evp/p_verify.c
+ $(OPENSSL_PATH)/crypto/evp/pbe_scrypt.c
+ $(OPENSSL_PATH)/crypto/evp/pmeth_fn.c
+ $(OPENSSL_PATH)/crypto/evp/pmeth_gn.c
+ $(OPENSSL_PATH)/crypto/evp/pmeth_lib.c
+ $(OPENSSL_PATH)/crypto/ex_data.c
+ $(OPENSSL_PATH)/crypto/getenv.c
+ $(OPENSSL_PATH)/crypto/hmac/hm_ameth.c
+ $(OPENSSL_PATH)/crypto/hmac/hm_pmeth.c
+ $(OPENSSL_PATH)/crypto/hmac/hmac.c
+ $(OPENSSL_PATH)/crypto/init.c
+ $(OPENSSL_PATH)/crypto/kdf/hkdf.c
+ $(OPENSSL_PATH)/crypto/kdf/kdf_err.c
+ $(OPENSSL_PATH)/crypto/kdf/scrypt.c
+ $(OPENSSL_PATH)/crypto/kdf/tls1_prf.c
+ $(OPENSSL_PATH)/crypto/lhash/lh_stats.c
+ $(OPENSSL_PATH)/crypto/lhash/lhash.c
+ $(OPENSSL_PATH)/crypto/md4/md4_dgst.c
+ $(OPENSSL_PATH)/crypto/md4/md4_one.c
+ $(OPENSSL_PATH)/crypto/md5/md5_dgst.c
+ $(OPENSSL_PATH)/crypto/md5/md5_one.c
+ $(OPENSSL_PATH)/crypto/mem.c
+ $(OPENSSL_PATH)/crypto/mem_dbg.c
+ $(OPENSSL_PATH)/crypto/mem_sec.c
+ $(OPENSSL_PATH)/crypto/modes/cbc128.c
+ $(OPENSSL_PATH)/crypto/modes/ccm128.c
+ $(OPENSSL_PATH)/crypto/modes/cfb128.c
+ $(OPENSSL_PATH)/crypto/modes/ctr128.c
+ $(OPENSSL_PATH)/crypto/modes/cts128.c
+ $(OPENSSL_PATH)/crypto/modes/gcm128.c
+ $(OPENSSL_PATH)/crypto/modes/ocb128.c
+ $(OPENSSL_PATH)/crypto/modes/ofb128.c
+ $(OPENSSL_PATH)/crypto/modes/wrap128.c
+ $(OPENSSL_PATH)/crypto/modes/xts128.c
+ $(OPENSSL_PATH)/crypto/o_dir.c
+ $(OPENSSL_PATH)/crypto/o_fips.c
+ $(OPENSSL_PATH)/crypto/o_fopen.c
+ $(OPENSSL_PATH)/crypto/o_init.c
+ $(OPENSSL_PATH)/crypto/o_str.c
+ $(OPENSSL_PATH)/crypto/o_time.c
+ $(OPENSSL_PATH)/crypto/objects/o_names.c
+ $(OPENSSL_PATH)/crypto/objects/obj_dat.c
+ $(OPENSSL_PATH)/crypto/objects/obj_err.c
+ $(OPENSSL_PATH)/crypto/objects/obj_lib.c
+ $(OPENSSL_PATH)/crypto/objects/obj_xref.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_asn.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_cl.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_err.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_ext.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_ht.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_lib.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_prn.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_srv.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_vfy.c
+ $(OPENSSL_PATH)/crypto/ocsp/v3_ocsp.c
+ $(OPENSSL_PATH)/crypto/pem/pem_all.c
+ $(OPENSSL_PATH)/crypto/pem/pem_err.c
+ $(OPENSSL_PATH)/crypto/pem/pem_info.c
+ $(OPENSSL_PATH)/crypto/pem/pem_lib.c
+ $(OPENSSL_PATH)/crypto/pem/pem_oth.c
+ $(OPENSSL_PATH)/crypto/pem/pem_pk8.c
+ $(OPENSSL_PATH)/crypto/pem/pem_pkey.c
+ $(OPENSSL_PATH)/crypto/pem/pem_sign.c
+ $(OPENSSL_PATH)/crypto/pem/pem_x509.c
+ $(OPENSSL_PATH)/crypto/pem/pem_xaux.c
+ $(OPENSSL_PATH)/crypto/pem/pvkfmt.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_add.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_asn.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_attr.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_crpt.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_crt.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_decr.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_init.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_key.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_kiss.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_mutl.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_npas.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_p8d.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_p8e.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_sbag.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_utl.c
+ $(OPENSSL_PATH)/crypto/pkcs12/pk12err.c
+ $(OPENSSL_PATH)/crypto/pkcs7/bio_pk7.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_asn1.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_attr.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_doit.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_lib.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_mime.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_smime.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pkcs7err.c
+ $(OPENSSL_PATH)/crypto/rand/drbg_ctr.c
+ $(OPENSSL_PATH)/crypto/rand/drbg_lib.c
+ $(OPENSSL_PATH)/crypto/rand/rand_egd.c
+ $(OPENSSL_PATH)/crypto/rand/rand_err.c
+ $(OPENSSL_PATH)/crypto/rand/rand_lib.c
+ $(OPENSSL_PATH)/crypto/rand/rand_unix.c
+ $(OPENSSL_PATH)/crypto/rand/rand_vms.c
+ $(OPENSSL_PATH)/crypto/rand/rand_win.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_ameth.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_asn1.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_chk.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_crpt.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_depr.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_err.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_gen.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_lib.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_meth.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_mp.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_none.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_oaep.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_ossl.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_pk1.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_pmeth.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_prn.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_pss.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_saos.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_sign.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_ssl.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_x931.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_x931g.c
+ $(OPENSSL_PATH)/crypto/sha/keccak1600.c
+ $(OPENSSL_PATH)/crypto/sha/sha1_one.c
+ $(OPENSSL_PATH)/crypto/sha/sha1dgst.c
+ $(OPENSSL_PATH)/crypto/sha/sha256.c
+ $(OPENSSL_PATH)/crypto/sha/sha512.c
+ $(OPENSSL_PATH)/crypto/siphash/siphash.c
+ $(OPENSSL_PATH)/crypto/siphash/siphash_ameth.c
+ $(OPENSSL_PATH)/crypto/siphash/siphash_pmeth.c
+ $(OPENSSL_PATH)/crypto/sm3/m_sm3.c
+ $(OPENSSL_PATH)/crypto/sm3/sm3.c
+ $(OPENSSL_PATH)/crypto/sm4/sm4.c
+ $(OPENSSL_PATH)/crypto/stack/stack.c
+ $(OPENSSL_PATH)/crypto/threads_none.c
+ $(OPENSSL_PATH)/crypto/threads_pthread.c
+ $(OPENSSL_PATH)/crypto/threads_win.c
+ $(OPENSSL_PATH)/crypto/txt_db/txt_db.c
+ $(OPENSSL_PATH)/crypto/ui/ui_err.c
+ $(OPENSSL_PATH)/crypto/ui/ui_lib.c
+ $(OPENSSL_PATH)/crypto/ui/ui_null.c
+ $(OPENSSL_PATH)/crypto/ui/ui_openssl.c
+ $(OPENSSL_PATH)/crypto/ui/ui_util.c
+ $(OPENSSL_PATH)/crypto/uid.c
+ $(OPENSSL_PATH)/crypto/x509/by_dir.c
+ $(OPENSSL_PATH)/crypto/x509/by_file.c
+ $(OPENSSL_PATH)/crypto/x509/t_crl.c
+ $(OPENSSL_PATH)/crypto/x509/t_req.c
+ $(OPENSSL_PATH)/crypto/x509/t_x509.c
+ $(OPENSSL_PATH)/crypto/x509/x509_att.c
+ $(OPENSSL_PATH)/crypto/x509/x509_cmp.c
+ $(OPENSSL_PATH)/crypto/x509/x509_d2.c
+ $(OPENSSL_PATH)/crypto/x509/x509_def.c
+ $(OPENSSL_PATH)/crypto/x509/x509_err.c
+ $(OPENSSL_PATH)/crypto/x509/x509_ext.c
+ $(OPENSSL_PATH)/crypto/x509/x509_lu.c
+ $(OPENSSL_PATH)/crypto/x509/x509_meth.c
+ $(OPENSSL_PATH)/crypto/x509/x509_obj.c
+ $(OPENSSL_PATH)/crypto/x509/x509_r2x.c
+ $(OPENSSL_PATH)/crypto/x509/x509_req.c
+ $(OPENSSL_PATH)/crypto/x509/x509_set.c
+ $(OPENSSL_PATH)/crypto/x509/x509_trs.c
+ $(OPENSSL_PATH)/crypto/x509/x509_txt.c
+ $(OPENSSL_PATH)/crypto/x509/x509_v3.c
+ $(OPENSSL_PATH)/crypto/x509/x509_vfy.c
+ $(OPENSSL_PATH)/crypto/x509/x509_vpm.c
+ $(OPENSSL_PATH)/crypto/x509/x509cset.c
+ $(OPENSSL_PATH)/crypto/x509/x509name.c
+ $(OPENSSL_PATH)/crypto/x509/x509rset.c
+ $(OPENSSL_PATH)/crypto/x509/x509spki.c
+ $(OPENSSL_PATH)/crypto/x509/x509type.c
+ $(OPENSSL_PATH)/crypto/x509/x_all.c
+ $(OPENSSL_PATH)/crypto/x509/x_attrib.c
+ $(OPENSSL_PATH)/crypto/x509/x_crl.c
+ $(OPENSSL_PATH)/crypto/x509/x_exten.c
+ $(OPENSSL_PATH)/crypto/x509/x_name.c
+ $(OPENSSL_PATH)/crypto/x509/x_pubkey.c
+ $(OPENSSL_PATH)/crypto/x509/x_req.c
+ $(OPENSSL_PATH)/crypto/x509/x_x509.c
+ $(OPENSSL_PATH)/crypto/x509/x_x509a.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_cache.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_data.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_lib.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_map.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_node.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_tree.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_addr.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_admis.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_akey.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_akeya.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_alt.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_asid.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_bcons.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_bitst.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_conf.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_cpols.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_crld.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_enum.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_extku.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_genn.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_ia5.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_info.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_int.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_lib.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_ncons.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pci.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pcia.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pcons.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pku.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pmaps.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_prn.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_purp.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_skey.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_sxnet.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_tlsf.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_utl.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3err.c
+ $(OPENSSL_PATH)/crypto/arm_arch.h
+ $(OPENSSL_PATH)/crypto/mips_arch.h
+ $(OPENSSL_PATH)/crypto/ppc_arch.h
+ $(OPENSSL_PATH)/crypto/s390x_arch.h
+ $(OPENSSL_PATH)/crypto/sparc_arch.h
+ $(OPENSSL_PATH)/crypto/vms_rms.h
+ $(OPENSSL_PATH)/crypto/aes/aes_locl.h
+ $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.h
+ $(OPENSSL_PATH)/crypto/asn1/asn1_locl.h
+ $(OPENSSL_PATH)/crypto/asn1/charmap.h
+ $(OPENSSL_PATH)/crypto/asn1/standard_methods.h
+ $(OPENSSL_PATH)/crypto/asn1/tbl_standard.h
+ $(OPENSSL_PATH)/crypto/async/async_locl.h
+ $(OPENSSL_PATH)/crypto/async/arch/async_null.h
+ $(OPENSSL_PATH)/crypto/async/arch/async_posix.h
+ $(OPENSSL_PATH)/crypto/async/arch/async_win.h
+ $(OPENSSL_PATH)/crypto/bio/bio_lcl.h
+ $(OPENSSL_PATH)/crypto/bn/bn_lcl.h
+ $(OPENSSL_PATH)/crypto/bn/bn_prime.h
+ $(OPENSSL_PATH)/crypto/bn/rsaz_exp.h
+ $(OPENSSL_PATH)/crypto/comp/comp_lcl.h
+ $(OPENSSL_PATH)/crypto/conf/conf_def.h
+ $(OPENSSL_PATH)/crypto/conf/conf_lcl.h
+ $(OPENSSL_PATH)/crypto/des/des_locl.h
+ $(OPENSSL_PATH)/crypto/des/spr.h
+ $(OPENSSL_PATH)/crypto/dh/dh_locl.h
+ $(OPENSSL_PATH)/crypto/dso/dso_locl.h
+ $(OPENSSL_PATH)/crypto/evp/evp_locl.h
+ $(OPENSSL_PATH)/crypto/hmac/hmac_lcl.h
+ $(OPENSSL_PATH)/crypto/lhash/lhash_lcl.h
+ $(OPENSSL_PATH)/crypto/md4/md4_locl.h
+ $(OPENSSL_PATH)/crypto/md5/md5_locl.h
+ $(OPENSSL_PATH)/crypto/modes/modes_lcl.h
+ $(OPENSSL_PATH)/crypto/objects/obj_dat.h
+ $(OPENSSL_PATH)/crypto/objects/obj_lcl.h
+ $(OPENSSL_PATH)/crypto/objects/obj_xref.h
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_lcl.h
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_lcl.h
+ $(OPENSSL_PATH)/crypto/rand/rand_lcl.h
+ $(OPENSSL_PATH)/crypto/rc4/rc4_locl.h
+ $(OPENSSL_PATH)/crypto/rsa/rsa_locl.h
+ $(OPENSSL_PATH)/crypto/sha/sha_locl.h
+ $(OPENSSL_PATH)/crypto/siphash/siphash_local.h
+ $(OPENSSL_PATH)/crypto/sm3/sm3_locl.h
+ $(OPENSSL_PATH)/crypto/store/store_locl.h
+ $(OPENSSL_PATH)/crypto/ui/ui_locl.h
+ $(OPENSSL_PATH)/crypto/x509/x509_lcl.h
+ $(OPENSSL_PATH)/crypto/x509v3/ext_dat.h
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_int.h
+ $(OPENSSL_PATH)/crypto/x509v3/standard_exts.h
+ $(OPENSSL_PATH)/crypto/x509v3/v3_admis.h
+ $(OPENSSL_PATH)/ssl/bio_ssl.c
+ $(OPENSSL_PATH)/ssl/d1_lib.c
+ $(OPENSSL_PATH)/ssl/d1_msg.c
+ $(OPENSSL_PATH)/ssl/d1_srtp.c
+ $(OPENSSL_PATH)/ssl/methods.c
+ $(OPENSSL_PATH)/ssl/packet.c
+ $(OPENSSL_PATH)/ssl/pqueue.c
+ $(OPENSSL_PATH)/ssl/record/dtls1_bitmap.c
+ $(OPENSSL_PATH)/ssl/record/rec_layer_d1.c
+ $(OPENSSL_PATH)/ssl/record/rec_layer_s3.c
+ $(OPENSSL_PATH)/ssl/record/ssl3_buffer.c
+ $(OPENSSL_PATH)/ssl/record/ssl3_record.c
+ $(OPENSSL_PATH)/ssl/record/ssl3_record_tls13.c
+ $(OPENSSL_PATH)/ssl/s3_cbc.c
+ $(OPENSSL_PATH)/ssl/s3_enc.c
+ $(OPENSSL_PATH)/ssl/s3_lib.c
+ $(OPENSSL_PATH)/ssl/s3_msg.c
+ $(OPENSSL_PATH)/ssl/ssl_asn1.c
+ $(OPENSSL_PATH)/ssl/ssl_cert.c
+ $(OPENSSL_PATH)/ssl/ssl_ciph.c
+ $(OPENSSL_PATH)/ssl/ssl_conf.c
+ $(OPENSSL_PATH)/ssl/ssl_err.c
+ $(OPENSSL_PATH)/ssl/ssl_init.c
+ $(OPENSSL_PATH)/ssl/ssl_lib.c
+ $(OPENSSL_PATH)/ssl/ssl_mcnf.c
+ $(OPENSSL_PATH)/ssl/ssl_rsa.c
+ $(OPENSSL_PATH)/ssl/ssl_sess.c
+ $(OPENSSL_PATH)/ssl/ssl_stat.c
+ $(OPENSSL_PATH)/ssl/ssl_txt.c
+ $(OPENSSL_PATH)/ssl/ssl_utst.c
+ $(OPENSSL_PATH)/ssl/statem/extensions.c
+ $(OPENSSL_PATH)/ssl/statem/extensions_clnt.c
+ $(OPENSSL_PATH)/ssl/statem/extensions_cust.c
+ $(OPENSSL_PATH)/ssl/statem/extensions_srvr.c
+ $(OPENSSL_PATH)/ssl/statem/statem.c
+ $(OPENSSL_PATH)/ssl/statem/statem_clnt.c
+ $(OPENSSL_PATH)/ssl/statem/statem_dtls.c
+ $(OPENSSL_PATH)/ssl/statem/statem_lib.c
+ $(OPENSSL_PATH)/ssl/statem/statem_srvr.c
+ $(OPENSSL_PATH)/ssl/t1_enc.c
+ $(OPENSSL_PATH)/ssl/t1_lib.c
+ $(OPENSSL_PATH)/ssl/t1_trce.c
+ $(OPENSSL_PATH)/ssl/tls13_enc.c
+ $(OPENSSL_PATH)/ssl/tls_srp.c
+ $(OPENSSL_PATH)/ssl/packet_locl.h
+ $(OPENSSL_PATH)/ssl/ssl_cert_table.h
+ $(OPENSSL_PATH)/ssl/ssl_locl.h
+ $(OPENSSL_PATH)/ssl/record/record.h
+ $(OPENSSL_PATH)/ssl/record/record_locl.h
+ $(OPENSSL_PATH)/ssl/statem/statem.h
+ $(OPENSSL_PATH)/ssl/statem/statem_locl.h
+# Autogenerated files list ends here
+ buildinf.h
+ rand_pool_noise.h
+ ossl_store.c
+ rand_pool.c
+
+[Sources.Ia32]
+ rand_pool_noise_tsc.c
+
+[Packages]
+ MdePkg/MdePkg.dec
+ CryptoPkg/CryptoPkg.dec
+
+[LibraryClasses]
+ BaseLib
+ DebugLib
+ TimerLib
+ PrintLib
+
+[BuildOptions]
+ #
+ # Disables the following Visual Studio compiler warnings brought by openssl source,
+ # so we do not break the build with /WX option:
+ # C4090: 'function' : different 'const' qualifiers
+ # C4132: 'object' : const object should be initialized (tls13_enc.c)
+ # C4210: nonstandard extension used: function given file scope
+ # C4244: conversion from type1 to type2, possible loss of data
+ # C4245: conversion from type1 to type2, signed/unsigned mismatch
+ # C4267: conversion from size_t to type, possible loss of data
+ # C4306: 'identifier' : conversion from 'type1' to 'type2' of greater size
+ # C4310: cast truncates constant value
+ # C4389: 'operator' : signed/unsigned mismatch (xxxx)
+ # C4700: uninitialized local variable 'name' used. (conf_sap.c(71))
+ # C4702: unreachable code
+ # C4706: assignment within conditional expression
+ # C4819: The file contains a character that cannot be represented in the current code page
+ #
+ MSFT:*_*_IA32_CC_FLAGS = -U_WIN32 -U_WIN64 -U_MSC_VER $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /wd4090 /wd4132 /wd4210 /wd4244 /wd4245 /wd4267 /wd4310 /wd4389 /wd4700 /wd4702 /wd4706 /wd4819
+
+ INTEL:*_*_IA32_CC_FLAGS = -U_WIN32 -U_WIN64 -U_MSC_VER -U__ICC $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /w
+
+ #
+ # Suppress the following build warnings in openssl so we don't break the build with -Werror
+ # -Werror=maybe-uninitialized: there exist some other paths for which the variable is not initialized.
+ # -Werror=format: Check calls to printf and scanf, etc., to make sure that the arguments supplied have
+ # types appropriate to the format string specified.
+ # -Werror=unused-but-set-variable: Warn whenever a local variable is assigned to, but otherwise unused (aside from its declaration).
+ #
+ GCC:*_*_IA32_CC_FLAGS = -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) -Wno-error=maybe-uninitialized -Wno-error=unused-but-set-variable
+
+ # suppress the following warnings in openssl so we don't break the build with warnings-as-errors:
+ # 1295: Deprecated declaration <entity> - give arg types
+ # 550: <entity> was set but never used
+ # 1293: assignment in condition
+ # 111: statement is unreachable (invariably "break;" after "return X;" in case statement)
+ # 68: integer conversion resulted in a change of sign ("if (Status == -1)")
+ # 177: <entity> was declared but never referenced
+ # 223: function <entity> declared implicitly
+ # 144: a value of type <type> cannot be used to initialize an entity of type <type>
+ # 513: a value of type <type> cannot be assigned to an entity of type <type>
+ # 188: enumerated type mixed with another type (i.e. passing an integer as an enum without a cast)
+ # 1296: Extended constant initialiser used
+ # 128: loop is not reachable - may be emitted inappropriately if code follows a conditional return
+ # from the function that evaluates to true at compile time
+ # 546: transfer of control bypasses initialization - may be emitted inappropriately if the uninitialized
+ # variable is never referenced after the jump
+ # 1: ignore "#1-D: last line of file ends without a newline"
+ # 3017: <entity> may be used before being set (NOTE: This was fixed in OpenSSL 1.1 HEAD with
+ # commit d9b8b89bec4480de3a10bdaf9425db371c19145b, and can be dropped then.)
+ XCODE:*_*_IA32_CC_FLAGS = -mmmx -msse -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) -w -std=c99 -Wno-error=uninitialized
diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibX64.inf b/CryptoPkg/Library/OpensslLib/OpensslLibX64.inf
new file mode 100644
index 0000000000..fcebc6d6de
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/OpensslLibX64.inf
@@ -0,0 +1,691 @@
+## @file
+# This module provides OpenSSL Library implementation.
+#
+# Copyright (c) 2010 - 2020, Intel Corporation. All rights reserved.<BR>
+# SPDX-License-Identifier: BSD-2-Clause-Patent
+#
+##
+
+[Defines]
+ INF_VERSION = 0x00010005
+ BASE_NAME = OpensslLibX64
+ MODULE_UNI_FILE = OpensslLib.uni
+ FILE_GUID = 18125E50-0117-4DD0-BE54-4784AD995FEF
+ MODULE_TYPE = BASE
+ VERSION_STRING = 1.0
+ LIBRARY_CLASS = OpensslLib
+ DEFINE OPENSSL_PATH = openssl
+ DEFINE OPENSSL_FLAGS = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE
+ DEFINE OPENSSL_FLAGS_CONFIG = -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM
+ CONSTRUCTOR = OpensslLibConstructor
+
+#
+# VALID_ARCHITECTURES = X64
+#
+
+[Sources]
+ OpensslLibConstructor.c
+ $(OPENSSL_PATH)/e_os.h
+ $(OPENSSL_PATH)/ms/uplink.h
+# Autogenerated files list starts here
+ X64/crypto/aes/aesni-mb-x86_64.nasm
+ X64/crypto/aes/aesni-sha1-x86_64.nasm
+ X64/crypto/aes/aesni-sha256-x86_64.nasm
+ X64/crypto/aes/aesni-x86_64.nasm
+ X64/crypto/aes/vpaes-x86_64.nasm
+ X64/crypto/bn/rsaz-avx2.nasm
+ X64/crypto/bn/rsaz-x86_64.nasm
+ X64/crypto/bn/x86_64-gf2m.nasm
+ X64/crypto/bn/x86_64-mont.nasm
+ X64/crypto/bn/x86_64-mont5.nasm
+ X64/crypto/md5/md5-x86_64.nasm
+ X64/crypto/modes/aesni-gcm-x86_64.nasm
+ X64/crypto/modes/ghash-x86_64.nasm
+ X64/crypto/rc4/rc4-md5-x86_64.nasm
+ X64/crypto/rc4/rc4-x86_64.nasm
+ X64/crypto/sha/keccak1600-x86_64.nasm
+ X64/crypto/sha/sha1-mb-x86_64.nasm
+ X64/crypto/sha/sha1-x86_64.nasm
+ X64/crypto/sha/sha256-mb-x86_64.nasm
+ X64/crypto/sha/sha256-x86_64.nasm
+ X64/crypto/sha/sha512-x86_64.nasm
+ X64/crypto/x86_64cpuid.nasm
+ $(OPENSSL_PATH)/crypto/aes/aes_cbc.c
+ $(OPENSSL_PATH)/crypto/aes/aes_cfb.c
+ $(OPENSSL_PATH)/crypto/aes/aes_core.c
+ $(OPENSSL_PATH)/crypto/aes/aes_ecb.c
+ $(OPENSSL_PATH)/crypto/aes/aes_ige.c
+ $(OPENSSL_PATH)/crypto/aes/aes_misc.c
+ $(OPENSSL_PATH)/crypto/aes/aes_ofb.c
+ $(OPENSSL_PATH)/crypto/aes/aes_wrap.c
+ $(OPENSSL_PATH)/crypto/aria/aria.c
+ $(OPENSSL_PATH)/crypto/asn1/a_bitstr.c
+ $(OPENSSL_PATH)/crypto/asn1/a_d2i_fp.c
+ $(OPENSSL_PATH)/crypto/asn1/a_digest.c
+ $(OPENSSL_PATH)/crypto/asn1/a_dup.c
+ $(OPENSSL_PATH)/crypto/asn1/a_gentm.c
+ $(OPENSSL_PATH)/crypto/asn1/a_i2d_fp.c
+ $(OPENSSL_PATH)/crypto/asn1/a_int.c
+ $(OPENSSL_PATH)/crypto/asn1/a_mbstr.c
+ $(OPENSSL_PATH)/crypto/asn1/a_object.c
+ $(OPENSSL_PATH)/crypto/asn1/a_octet.c
+ $(OPENSSL_PATH)/crypto/asn1/a_print.c
+ $(OPENSSL_PATH)/crypto/asn1/a_sign.c
+ $(OPENSSL_PATH)/crypto/asn1/a_strex.c
+ $(OPENSSL_PATH)/crypto/asn1/a_strnid.c
+ $(OPENSSL_PATH)/crypto/asn1/a_time.c
+ $(OPENSSL_PATH)/crypto/asn1/a_type.c
+ $(OPENSSL_PATH)/crypto/asn1/a_utctm.c
+ $(OPENSSL_PATH)/crypto/asn1/a_utf8.c
+ $(OPENSSL_PATH)/crypto/asn1/a_verify.c
+ $(OPENSSL_PATH)/crypto/asn1/ameth_lib.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_err.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_gen.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_lib.c
+ $(OPENSSL_PATH)/crypto/asn1/asn1_par.c
+ $(OPENSSL_PATH)/crypto/asn1/asn_mime.c
+ $(OPENSSL_PATH)/crypto/asn1/asn_moid.c
+ $(OPENSSL_PATH)/crypto/asn1/asn_mstbl.c
+ $(OPENSSL_PATH)/crypto/asn1/asn_pack.c
+ $(OPENSSL_PATH)/crypto/asn1/bio_asn1.c
+ $(OPENSSL_PATH)/crypto/asn1/bio_ndef.c
+ $(OPENSSL_PATH)/crypto/asn1/d2i_pr.c
+ $(OPENSSL_PATH)/crypto/asn1/d2i_pu.c
+ $(OPENSSL_PATH)/crypto/asn1/evp_asn1.c
+ $(OPENSSL_PATH)/crypto/asn1/f_int.c
+ $(OPENSSL_PATH)/crypto/asn1/f_string.c
+ $(OPENSSL_PATH)/crypto/asn1/i2d_pr.c
+ $(OPENSSL_PATH)/crypto/asn1/i2d_pu.c
+ $(OPENSSL_PATH)/crypto/asn1/n_pkey.c
+ $(OPENSSL_PATH)/crypto/asn1/nsseq.c
+ $(OPENSSL_PATH)/crypto/asn1/p5_pbe.c
+ $(OPENSSL_PATH)/crypto/asn1/p5_pbev2.c
+ $(OPENSSL_PATH)/crypto/asn1/p5_scrypt.c
+ $(OPENSSL_PATH)/crypto/asn1/p8_pkey.c
+ $(OPENSSL_PATH)/crypto/asn1/t_bitst.c
+ $(OPENSSL_PATH)/crypto/asn1/t_pkey.c
+ $(OPENSSL_PATH)/crypto/asn1/t_spki.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_dec.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_enc.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_fre.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_new.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_prn.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_scn.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_typ.c
+ $(OPENSSL_PATH)/crypto/asn1/tasn_utl.c
+ $(OPENSSL_PATH)/crypto/asn1/x_algor.c
+ $(OPENSSL_PATH)/crypto/asn1/x_bignum.c
+ $(OPENSSL_PATH)/crypto/asn1/x_info.c
+ $(OPENSSL_PATH)/crypto/asn1/x_int64.c
+ $(OPENSSL_PATH)/crypto/asn1/x_long.c
+ $(OPENSSL_PATH)/crypto/asn1/x_pkey.c
+ $(OPENSSL_PATH)/crypto/asn1/x_sig.c
+ $(OPENSSL_PATH)/crypto/asn1/x_spki.c
+ $(OPENSSL_PATH)/crypto/asn1/x_val.c
+ $(OPENSSL_PATH)/crypto/async/arch/async_null.c
+ $(OPENSSL_PATH)/crypto/async/arch/async_posix.c
+ $(OPENSSL_PATH)/crypto/async/arch/async_win.c
+ $(OPENSSL_PATH)/crypto/async/async.c
+ $(OPENSSL_PATH)/crypto/async/async_err.c
+ $(OPENSSL_PATH)/crypto/async/async_wait.c
+ $(OPENSSL_PATH)/crypto/bio/b_addr.c
+ $(OPENSSL_PATH)/crypto/bio/b_dump.c
+ $(OPENSSL_PATH)/crypto/bio/b_sock.c
+ $(OPENSSL_PATH)/crypto/bio/b_sock2.c
+ $(OPENSSL_PATH)/crypto/bio/bf_buff.c
+ $(OPENSSL_PATH)/crypto/bio/bf_lbuf.c
+ $(OPENSSL_PATH)/crypto/bio/bf_nbio.c
+ $(OPENSSL_PATH)/crypto/bio/bf_null.c
+ $(OPENSSL_PATH)/crypto/bio/bio_cb.c
+ $(OPENSSL_PATH)/crypto/bio/bio_err.c
+ $(OPENSSL_PATH)/crypto/bio/bio_lib.c
+ $(OPENSSL_PATH)/crypto/bio/bio_meth.c
+ $(OPENSSL_PATH)/crypto/bio/bss_acpt.c
+ $(OPENSSL_PATH)/crypto/bio/bss_bio.c
+ $(OPENSSL_PATH)/crypto/bio/bss_conn.c
+ $(OPENSSL_PATH)/crypto/bio/bss_dgram.c
+ $(OPENSSL_PATH)/crypto/bio/bss_fd.c
+ $(OPENSSL_PATH)/crypto/bio/bss_file.c
+ $(OPENSSL_PATH)/crypto/bio/bss_log.c
+ $(OPENSSL_PATH)/crypto/bio/bss_mem.c
+ $(OPENSSL_PATH)/crypto/bio/bss_null.c
+ $(OPENSSL_PATH)/crypto/bio/bss_sock.c
+ $(OPENSSL_PATH)/crypto/bn/asm/x86_64-gcc.c
+ $(OPENSSL_PATH)/crypto/bn/bn_add.c
+ $(OPENSSL_PATH)/crypto/bn/bn_blind.c
+ $(OPENSSL_PATH)/crypto/bn/bn_const.c
+ $(OPENSSL_PATH)/crypto/bn/bn_ctx.c
+ $(OPENSSL_PATH)/crypto/bn/bn_depr.c
+ $(OPENSSL_PATH)/crypto/bn/bn_dh.c
+ $(OPENSSL_PATH)/crypto/bn/bn_div.c
+ $(OPENSSL_PATH)/crypto/bn/bn_err.c
+ $(OPENSSL_PATH)/crypto/bn/bn_exp.c
+ $(OPENSSL_PATH)/crypto/bn/bn_exp2.c
+ $(OPENSSL_PATH)/crypto/bn/bn_gcd.c
+ $(OPENSSL_PATH)/crypto/bn/bn_gf2m.c
+ $(OPENSSL_PATH)/crypto/bn/bn_intern.c
+ $(OPENSSL_PATH)/crypto/bn/bn_kron.c
+ $(OPENSSL_PATH)/crypto/bn/bn_lib.c
+ $(OPENSSL_PATH)/crypto/bn/bn_mod.c
+ $(OPENSSL_PATH)/crypto/bn/bn_mont.c
+ $(OPENSSL_PATH)/crypto/bn/bn_mpi.c
+ $(OPENSSL_PATH)/crypto/bn/bn_mul.c
+ $(OPENSSL_PATH)/crypto/bn/bn_nist.c
+ $(OPENSSL_PATH)/crypto/bn/bn_prime.c
+ $(OPENSSL_PATH)/crypto/bn/bn_print.c
+ $(OPENSSL_PATH)/crypto/bn/bn_rand.c
+ $(OPENSSL_PATH)/crypto/bn/bn_recp.c
+ $(OPENSSL_PATH)/crypto/bn/bn_shift.c
+ $(OPENSSL_PATH)/crypto/bn/bn_sqr.c
+ $(OPENSSL_PATH)/crypto/bn/bn_sqrt.c
+ $(OPENSSL_PATH)/crypto/bn/bn_srp.c
+ $(OPENSSL_PATH)/crypto/bn/bn_word.c
+ $(OPENSSL_PATH)/crypto/bn/bn_x931p.c
+ $(OPENSSL_PATH)/crypto/bn/rsaz_exp.c
+ $(OPENSSL_PATH)/crypto/buffer/buf_err.c
+ $(OPENSSL_PATH)/crypto/buffer/buffer.c
+ $(OPENSSL_PATH)/crypto/cmac/cm_ameth.c
+ $(OPENSSL_PATH)/crypto/cmac/cm_pmeth.c
+ $(OPENSSL_PATH)/crypto/cmac/cmac.c
+ $(OPENSSL_PATH)/crypto/comp/c_zlib.c
+ $(OPENSSL_PATH)/crypto/comp/comp_err.c
+ $(OPENSSL_PATH)/crypto/comp/comp_lib.c
+ $(OPENSSL_PATH)/crypto/conf/conf_api.c
+ $(OPENSSL_PATH)/crypto/conf/conf_def.c
+ $(OPENSSL_PATH)/crypto/conf/conf_err.c
+ $(OPENSSL_PATH)/crypto/conf/conf_lib.c
+ $(OPENSSL_PATH)/crypto/conf/conf_mall.c
+ $(OPENSSL_PATH)/crypto/conf/conf_mod.c
+ $(OPENSSL_PATH)/crypto/conf/conf_sap.c
+ $(OPENSSL_PATH)/crypto/conf/conf_ssl.c
+ $(OPENSSL_PATH)/crypto/cpt_err.c
+ $(OPENSSL_PATH)/crypto/cryptlib.c
+ $(OPENSSL_PATH)/crypto/ctype.c
+ $(OPENSSL_PATH)/crypto/cversion.c
+ $(OPENSSL_PATH)/crypto/des/cbc_cksm.c
+ $(OPENSSL_PATH)/crypto/des/cbc_enc.c
+ $(OPENSSL_PATH)/crypto/des/cfb64ede.c
+ $(OPENSSL_PATH)/crypto/des/cfb64enc.c
+ $(OPENSSL_PATH)/crypto/des/cfb_enc.c
+ $(OPENSSL_PATH)/crypto/des/des_enc.c
+ $(OPENSSL_PATH)/crypto/des/ecb3_enc.c
+ $(OPENSSL_PATH)/crypto/des/ecb_enc.c
+ $(OPENSSL_PATH)/crypto/des/fcrypt.c
+ $(OPENSSL_PATH)/crypto/des/fcrypt_b.c
+ $(OPENSSL_PATH)/crypto/des/ofb64ede.c
+ $(OPENSSL_PATH)/crypto/des/ofb64enc.c
+ $(OPENSSL_PATH)/crypto/des/ofb_enc.c
+ $(OPENSSL_PATH)/crypto/des/pcbc_enc.c
+ $(OPENSSL_PATH)/crypto/des/qud_cksm.c
+ $(OPENSSL_PATH)/crypto/des/rand_key.c
+ $(OPENSSL_PATH)/crypto/des/set_key.c
+ $(OPENSSL_PATH)/crypto/des/str2key.c
+ $(OPENSSL_PATH)/crypto/des/xcbc_enc.c
+ $(OPENSSL_PATH)/crypto/dh/dh_ameth.c
+ $(OPENSSL_PATH)/crypto/dh/dh_asn1.c
+ $(OPENSSL_PATH)/crypto/dh/dh_check.c
+ $(OPENSSL_PATH)/crypto/dh/dh_depr.c
+ $(OPENSSL_PATH)/crypto/dh/dh_err.c
+ $(OPENSSL_PATH)/crypto/dh/dh_gen.c
+ $(OPENSSL_PATH)/crypto/dh/dh_kdf.c
+ $(OPENSSL_PATH)/crypto/dh/dh_key.c
+ $(OPENSSL_PATH)/crypto/dh/dh_lib.c
+ $(OPENSSL_PATH)/crypto/dh/dh_meth.c
+ $(OPENSSL_PATH)/crypto/dh/dh_pmeth.c
+ $(OPENSSL_PATH)/crypto/dh/dh_prn.c
+ $(OPENSSL_PATH)/crypto/dh/dh_rfc5114.c
+ $(OPENSSL_PATH)/crypto/dh/dh_rfc7919.c
+ $(OPENSSL_PATH)/crypto/dso/dso_dl.c
+ $(OPENSSL_PATH)/crypto/dso/dso_dlfcn.c
+ $(OPENSSL_PATH)/crypto/dso/dso_err.c
+ $(OPENSSL_PATH)/crypto/dso/dso_lib.c
+ $(OPENSSL_PATH)/crypto/dso/dso_openssl.c
+ $(OPENSSL_PATH)/crypto/dso/dso_vms.c
+ $(OPENSSL_PATH)/crypto/dso/dso_win32.c
+ $(OPENSSL_PATH)/crypto/ebcdic.c
+ $(OPENSSL_PATH)/crypto/err/err.c
+ $(OPENSSL_PATH)/crypto/err/err_prn.c
+ $(OPENSSL_PATH)/crypto/evp/bio_b64.c
+ $(OPENSSL_PATH)/crypto/evp/bio_enc.c
+ $(OPENSSL_PATH)/crypto/evp/bio_md.c
+ $(OPENSSL_PATH)/crypto/evp/bio_ok.c
+ $(OPENSSL_PATH)/crypto/evp/c_allc.c
+ $(OPENSSL_PATH)/crypto/evp/c_alld.c
+ $(OPENSSL_PATH)/crypto/evp/cmeth_lib.c
+ $(OPENSSL_PATH)/crypto/evp/digest.c
+ $(OPENSSL_PATH)/crypto/evp/e_aes.c
+ $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha1.c
+ $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha256.c
+ $(OPENSSL_PATH)/crypto/evp/e_aria.c
+ $(OPENSSL_PATH)/crypto/evp/e_bf.c
+ $(OPENSSL_PATH)/crypto/evp/e_camellia.c
+ $(OPENSSL_PATH)/crypto/evp/e_cast.c
+ $(OPENSSL_PATH)/crypto/evp/e_chacha20_poly1305.c
+ $(OPENSSL_PATH)/crypto/evp/e_des.c
+ $(OPENSSL_PATH)/crypto/evp/e_des3.c
+ $(OPENSSL_PATH)/crypto/evp/e_idea.c
+ $(OPENSSL_PATH)/crypto/evp/e_null.c
+ $(OPENSSL_PATH)/crypto/evp/e_old.c
+ $(OPENSSL_PATH)/crypto/evp/e_rc2.c
+ $(OPENSSL_PATH)/crypto/evp/e_rc4.c
+ $(OPENSSL_PATH)/crypto/evp/e_rc4_hmac_md5.c
+ $(OPENSSL_PATH)/crypto/evp/e_rc5.c
+ $(OPENSSL_PATH)/crypto/evp/e_seed.c
+ $(OPENSSL_PATH)/crypto/evp/e_sm4.c
+ $(OPENSSL_PATH)/crypto/evp/e_xcbc_d.c
+ $(OPENSSL_PATH)/crypto/evp/encode.c
+ $(OPENSSL_PATH)/crypto/evp/evp_cnf.c
+ $(OPENSSL_PATH)/crypto/evp/evp_enc.c
+ $(OPENSSL_PATH)/crypto/evp/evp_err.c
+ $(OPENSSL_PATH)/crypto/evp/evp_key.c
+ $(OPENSSL_PATH)/crypto/evp/evp_lib.c
+ $(OPENSSL_PATH)/crypto/evp/evp_pbe.c
+ $(OPENSSL_PATH)/crypto/evp/evp_pkey.c
+ $(OPENSSL_PATH)/crypto/evp/m_md2.c
+ $(OPENSSL_PATH)/crypto/evp/m_md4.c
+ $(OPENSSL_PATH)/crypto/evp/m_md5.c
+ $(OPENSSL_PATH)/crypto/evp/m_md5_sha1.c
+ $(OPENSSL_PATH)/crypto/evp/m_mdc2.c
+ $(OPENSSL_PATH)/crypto/evp/m_null.c
+ $(OPENSSL_PATH)/crypto/evp/m_ripemd.c
+ $(OPENSSL_PATH)/crypto/evp/m_sha1.c
+ $(OPENSSL_PATH)/crypto/evp/m_sha3.c
+ $(OPENSSL_PATH)/crypto/evp/m_sigver.c
+ $(OPENSSL_PATH)/crypto/evp/m_wp.c
+ $(OPENSSL_PATH)/crypto/evp/names.c
+ $(OPENSSL_PATH)/crypto/evp/p5_crpt.c
+ $(OPENSSL_PATH)/crypto/evp/p5_crpt2.c
+ $(OPENSSL_PATH)/crypto/evp/p_dec.c
+ $(OPENSSL_PATH)/crypto/evp/p_enc.c
+ $(OPENSSL_PATH)/crypto/evp/p_lib.c
+ $(OPENSSL_PATH)/crypto/evp/p_open.c
+ $(OPENSSL_PATH)/crypto/evp/p_seal.c
+ $(OPENSSL_PATH)/crypto/evp/p_sign.c
+ $(OPENSSL_PATH)/crypto/evp/p_verify.c
+ $(OPENSSL_PATH)/crypto/evp/pbe_scrypt.c
+ $(OPENSSL_PATH)/crypto/evp/pmeth_fn.c
+ $(OPENSSL_PATH)/crypto/evp/pmeth_gn.c
+ $(OPENSSL_PATH)/crypto/evp/pmeth_lib.c
+ $(OPENSSL_PATH)/crypto/ex_data.c
+ $(OPENSSL_PATH)/crypto/getenv.c
+ $(OPENSSL_PATH)/crypto/hmac/hm_ameth.c
+ $(OPENSSL_PATH)/crypto/hmac/hm_pmeth.c
+ $(OPENSSL_PATH)/crypto/hmac/hmac.c
+ $(OPENSSL_PATH)/crypto/init.c
+ $(OPENSSL_PATH)/crypto/kdf/hkdf.c
+ $(OPENSSL_PATH)/crypto/kdf/kdf_err.c
+ $(OPENSSL_PATH)/crypto/kdf/scrypt.c
+ $(OPENSSL_PATH)/crypto/kdf/tls1_prf.c
+ $(OPENSSL_PATH)/crypto/lhash/lh_stats.c
+ $(OPENSSL_PATH)/crypto/lhash/lhash.c
+ $(OPENSSL_PATH)/crypto/md4/md4_dgst.c
+ $(OPENSSL_PATH)/crypto/md4/md4_one.c
+ $(OPENSSL_PATH)/crypto/md5/md5_dgst.c
+ $(OPENSSL_PATH)/crypto/md5/md5_one.c
+ $(OPENSSL_PATH)/crypto/mem.c
+ $(OPENSSL_PATH)/crypto/mem_dbg.c
+ $(OPENSSL_PATH)/crypto/mem_sec.c
+ $(OPENSSL_PATH)/crypto/modes/cbc128.c
+ $(OPENSSL_PATH)/crypto/modes/ccm128.c
+ $(OPENSSL_PATH)/crypto/modes/cfb128.c
+ $(OPENSSL_PATH)/crypto/modes/ctr128.c
+ $(OPENSSL_PATH)/crypto/modes/cts128.c
+ $(OPENSSL_PATH)/crypto/modes/gcm128.c
+ $(OPENSSL_PATH)/crypto/modes/ocb128.c
+ $(OPENSSL_PATH)/crypto/modes/ofb128.c
+ $(OPENSSL_PATH)/crypto/modes/wrap128.c
+ $(OPENSSL_PATH)/crypto/modes/xts128.c
+ $(OPENSSL_PATH)/crypto/o_dir.c
+ $(OPENSSL_PATH)/crypto/o_fips.c
+ $(OPENSSL_PATH)/crypto/o_fopen.c
+ $(OPENSSL_PATH)/crypto/o_init.c
+ $(OPENSSL_PATH)/crypto/o_str.c
+ $(OPENSSL_PATH)/crypto/o_time.c
+ $(OPENSSL_PATH)/crypto/objects/o_names.c
+ $(OPENSSL_PATH)/crypto/objects/obj_dat.c
+ $(OPENSSL_PATH)/crypto/objects/obj_err.c
+ $(OPENSSL_PATH)/crypto/objects/obj_lib.c
+ $(OPENSSL_PATH)/crypto/objects/obj_xref.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_asn.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_cl.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_err.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_ext.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_ht.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_lib.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_prn.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_srv.c
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_vfy.c
+ $(OPENSSL_PATH)/crypto/ocsp/v3_ocsp.c
+ $(OPENSSL_PATH)/crypto/pem/pem_all.c
+ $(OPENSSL_PATH)/crypto/pem/pem_err.c
+ $(OPENSSL_PATH)/crypto/pem/pem_info.c
+ $(OPENSSL_PATH)/crypto/pem/pem_lib.c
+ $(OPENSSL_PATH)/crypto/pem/pem_oth.c
+ $(OPENSSL_PATH)/crypto/pem/pem_pk8.c
+ $(OPENSSL_PATH)/crypto/pem/pem_pkey.c
+ $(OPENSSL_PATH)/crypto/pem/pem_sign.c
+ $(OPENSSL_PATH)/crypto/pem/pem_x509.c
+ $(OPENSSL_PATH)/crypto/pem/pem_xaux.c
+ $(OPENSSL_PATH)/crypto/pem/pvkfmt.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_add.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_asn.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_attr.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_crpt.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_crt.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_decr.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_init.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_key.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_kiss.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_mutl.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_npas.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_p8d.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_p8e.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_sbag.c
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_utl.c
+ $(OPENSSL_PATH)/crypto/pkcs12/pk12err.c
+ $(OPENSSL_PATH)/crypto/pkcs7/bio_pk7.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_asn1.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_attr.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_doit.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_lib.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_mime.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pk7_smime.c
+ $(OPENSSL_PATH)/crypto/pkcs7/pkcs7err.c
+ $(OPENSSL_PATH)/crypto/rand/drbg_ctr.c
+ $(OPENSSL_PATH)/crypto/rand/drbg_lib.c
+ $(OPENSSL_PATH)/crypto/rand/rand_egd.c
+ $(OPENSSL_PATH)/crypto/rand/rand_err.c
+ $(OPENSSL_PATH)/crypto/rand/rand_lib.c
+ $(OPENSSL_PATH)/crypto/rand/rand_unix.c
+ $(OPENSSL_PATH)/crypto/rand/rand_vms.c
+ $(OPENSSL_PATH)/crypto/rand/rand_win.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_ameth.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_asn1.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_chk.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_crpt.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_depr.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_err.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_gen.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_lib.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_meth.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_mp.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_none.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_oaep.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_ossl.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_pk1.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_pmeth.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_prn.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_pss.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_saos.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_sign.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_ssl.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_x931.c
+ $(OPENSSL_PATH)/crypto/rsa/rsa_x931g.c
+ $(OPENSSL_PATH)/crypto/sha/sha1_one.c
+ $(OPENSSL_PATH)/crypto/sha/sha1dgst.c
+ $(OPENSSL_PATH)/crypto/sha/sha256.c
+ $(OPENSSL_PATH)/crypto/sha/sha512.c
+ $(OPENSSL_PATH)/crypto/siphash/siphash.c
+ $(OPENSSL_PATH)/crypto/siphash/siphash_ameth.c
+ $(OPENSSL_PATH)/crypto/siphash/siphash_pmeth.c
+ $(OPENSSL_PATH)/crypto/sm3/m_sm3.c
+ $(OPENSSL_PATH)/crypto/sm3/sm3.c
+ $(OPENSSL_PATH)/crypto/sm4/sm4.c
+ $(OPENSSL_PATH)/crypto/stack/stack.c
+ $(OPENSSL_PATH)/crypto/threads_none.c
+ $(OPENSSL_PATH)/crypto/threads_pthread.c
+ $(OPENSSL_PATH)/crypto/threads_win.c
+ $(OPENSSL_PATH)/crypto/txt_db/txt_db.c
+ $(OPENSSL_PATH)/crypto/ui/ui_err.c
+ $(OPENSSL_PATH)/crypto/ui/ui_lib.c
+ $(OPENSSL_PATH)/crypto/ui/ui_null.c
+ $(OPENSSL_PATH)/crypto/ui/ui_openssl.c
+ $(OPENSSL_PATH)/crypto/ui/ui_util.c
+ $(OPENSSL_PATH)/crypto/uid.c
+ $(OPENSSL_PATH)/crypto/x509/by_dir.c
+ $(OPENSSL_PATH)/crypto/x509/by_file.c
+ $(OPENSSL_PATH)/crypto/x509/t_crl.c
+ $(OPENSSL_PATH)/crypto/x509/t_req.c
+ $(OPENSSL_PATH)/crypto/x509/t_x509.c
+ $(OPENSSL_PATH)/crypto/x509/x509_att.c
+ $(OPENSSL_PATH)/crypto/x509/x509_cmp.c
+ $(OPENSSL_PATH)/crypto/x509/x509_d2.c
+ $(OPENSSL_PATH)/crypto/x509/x509_def.c
+ $(OPENSSL_PATH)/crypto/x509/x509_err.c
+ $(OPENSSL_PATH)/crypto/x509/x509_ext.c
+ $(OPENSSL_PATH)/crypto/x509/x509_lu.c
+ $(OPENSSL_PATH)/crypto/x509/x509_meth.c
+ $(OPENSSL_PATH)/crypto/x509/x509_obj.c
+ $(OPENSSL_PATH)/crypto/x509/x509_r2x.c
+ $(OPENSSL_PATH)/crypto/x509/x509_req.c
+ $(OPENSSL_PATH)/crypto/x509/x509_set.c
+ $(OPENSSL_PATH)/crypto/x509/x509_trs.c
+ $(OPENSSL_PATH)/crypto/x509/x509_txt.c
+ $(OPENSSL_PATH)/crypto/x509/x509_v3.c
+ $(OPENSSL_PATH)/crypto/x509/x509_vfy.c
+ $(OPENSSL_PATH)/crypto/x509/x509_vpm.c
+ $(OPENSSL_PATH)/crypto/x509/x509cset.c
+ $(OPENSSL_PATH)/crypto/x509/x509name.c
+ $(OPENSSL_PATH)/crypto/x509/x509rset.c
+ $(OPENSSL_PATH)/crypto/x509/x509spki.c
+ $(OPENSSL_PATH)/crypto/x509/x509type.c
+ $(OPENSSL_PATH)/crypto/x509/x_all.c
+ $(OPENSSL_PATH)/crypto/x509/x_attrib.c
+ $(OPENSSL_PATH)/crypto/x509/x_crl.c
+ $(OPENSSL_PATH)/crypto/x509/x_exten.c
+ $(OPENSSL_PATH)/crypto/x509/x_name.c
+ $(OPENSSL_PATH)/crypto/x509/x_pubkey.c
+ $(OPENSSL_PATH)/crypto/x509/x_req.c
+ $(OPENSSL_PATH)/crypto/x509/x_x509.c
+ $(OPENSSL_PATH)/crypto/x509/x_x509a.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_cache.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_data.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_lib.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_map.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_node.c
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_tree.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_addr.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_admis.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_akey.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_akeya.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_alt.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_asid.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_bcons.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_bitst.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_conf.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_cpols.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_crld.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_enum.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_extku.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_genn.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_ia5.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_info.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_int.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_lib.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_ncons.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pci.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pcia.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pcons.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pku.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_pmaps.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_prn.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_purp.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_skey.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_sxnet.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_tlsf.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3_utl.c
+ $(OPENSSL_PATH)/crypto/x509v3/v3err.c
+ $(OPENSSL_PATH)/crypto/arm_arch.h
+ $(OPENSSL_PATH)/crypto/mips_arch.h
+ $(OPENSSL_PATH)/crypto/ppc_arch.h
+ $(OPENSSL_PATH)/crypto/s390x_arch.h
+ $(OPENSSL_PATH)/crypto/sparc_arch.h
+ $(OPENSSL_PATH)/crypto/vms_rms.h
+ $(OPENSSL_PATH)/crypto/aes/aes_locl.h
+ $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.h
+ $(OPENSSL_PATH)/crypto/asn1/asn1_locl.h
+ $(OPENSSL_PATH)/crypto/asn1/charmap.h
+ $(OPENSSL_PATH)/crypto/asn1/standard_methods.h
+ $(OPENSSL_PATH)/crypto/asn1/tbl_standard.h
+ $(OPENSSL_PATH)/crypto/async/async_locl.h
+ $(OPENSSL_PATH)/crypto/async/arch/async_null.h
+ $(OPENSSL_PATH)/crypto/async/arch/async_posix.h
+ $(OPENSSL_PATH)/crypto/async/arch/async_win.h
+ $(OPENSSL_PATH)/crypto/bio/bio_lcl.h
+ $(OPENSSL_PATH)/crypto/bn/bn_lcl.h
+ $(OPENSSL_PATH)/crypto/bn/bn_prime.h
+ $(OPENSSL_PATH)/crypto/bn/rsaz_exp.h
+ $(OPENSSL_PATH)/crypto/comp/comp_lcl.h
+ $(OPENSSL_PATH)/crypto/conf/conf_def.h
+ $(OPENSSL_PATH)/crypto/conf/conf_lcl.h
+ $(OPENSSL_PATH)/crypto/des/des_locl.h
+ $(OPENSSL_PATH)/crypto/des/spr.h
+ $(OPENSSL_PATH)/crypto/dh/dh_locl.h
+ $(OPENSSL_PATH)/crypto/dso/dso_locl.h
+ $(OPENSSL_PATH)/crypto/evp/evp_locl.h
+ $(OPENSSL_PATH)/crypto/hmac/hmac_lcl.h
+ $(OPENSSL_PATH)/crypto/lhash/lhash_lcl.h
+ $(OPENSSL_PATH)/crypto/md4/md4_locl.h
+ $(OPENSSL_PATH)/crypto/md5/md5_locl.h
+ $(OPENSSL_PATH)/crypto/modes/modes_lcl.h
+ $(OPENSSL_PATH)/crypto/objects/obj_dat.h
+ $(OPENSSL_PATH)/crypto/objects/obj_lcl.h
+ $(OPENSSL_PATH)/crypto/objects/obj_xref.h
+ $(OPENSSL_PATH)/crypto/ocsp/ocsp_lcl.h
+ $(OPENSSL_PATH)/crypto/pkcs12/p12_lcl.h
+ $(OPENSSL_PATH)/crypto/rand/rand_lcl.h
+ $(OPENSSL_PATH)/crypto/rc4/rc4_locl.h
+ $(OPENSSL_PATH)/crypto/rsa/rsa_locl.h
+ $(OPENSSL_PATH)/crypto/sha/sha_locl.h
+ $(OPENSSL_PATH)/crypto/siphash/siphash_local.h
+ $(OPENSSL_PATH)/crypto/sm3/sm3_locl.h
+ $(OPENSSL_PATH)/crypto/store/store_locl.h
+ $(OPENSSL_PATH)/crypto/ui/ui_locl.h
+ $(OPENSSL_PATH)/crypto/x509/x509_lcl.h
+ $(OPENSSL_PATH)/crypto/x509v3/ext_dat.h
+ $(OPENSSL_PATH)/crypto/x509v3/pcy_int.h
+ $(OPENSSL_PATH)/crypto/x509v3/standard_exts.h
+ $(OPENSSL_PATH)/crypto/x509v3/v3_admis.h
+ $(OPENSSL_PATH)/ssl/bio_ssl.c
+ $(OPENSSL_PATH)/ssl/d1_lib.c
+ $(OPENSSL_PATH)/ssl/d1_msg.c
+ $(OPENSSL_PATH)/ssl/d1_srtp.c
+ $(OPENSSL_PATH)/ssl/methods.c
+ $(OPENSSL_PATH)/ssl/packet.c
+ $(OPENSSL_PATH)/ssl/pqueue.c
+ $(OPENSSL_PATH)/ssl/record/dtls1_bitmap.c
+ $(OPENSSL_PATH)/ssl/record/rec_layer_d1.c
+ $(OPENSSL_PATH)/ssl/record/rec_layer_s3.c
+ $(OPENSSL_PATH)/ssl/record/ssl3_buffer.c
+ $(OPENSSL_PATH)/ssl/record/ssl3_record.c
+ $(OPENSSL_PATH)/ssl/record/ssl3_record_tls13.c
+ $(OPENSSL_PATH)/ssl/s3_cbc.c
+ $(OPENSSL_PATH)/ssl/s3_enc.c
+ $(OPENSSL_PATH)/ssl/s3_lib.c
+ $(OPENSSL_PATH)/ssl/s3_msg.c
+ $(OPENSSL_PATH)/ssl/ssl_asn1.c
+ $(OPENSSL_PATH)/ssl/ssl_cert.c
+ $(OPENSSL_PATH)/ssl/ssl_ciph.c
+ $(OPENSSL_PATH)/ssl/ssl_conf.c
+ $(OPENSSL_PATH)/ssl/ssl_err.c
+ $(OPENSSL_PATH)/ssl/ssl_init.c
+ $(OPENSSL_PATH)/ssl/ssl_lib.c
+ $(OPENSSL_PATH)/ssl/ssl_mcnf.c
+ $(OPENSSL_PATH)/ssl/ssl_rsa.c
+ $(OPENSSL_PATH)/ssl/ssl_sess.c
+ $(OPENSSL_PATH)/ssl/ssl_stat.c
+ $(OPENSSL_PATH)/ssl/ssl_txt.c
+ $(OPENSSL_PATH)/ssl/ssl_utst.c
+ $(OPENSSL_PATH)/ssl/statem/extensions.c
+ $(OPENSSL_PATH)/ssl/statem/extensions_clnt.c
+ $(OPENSSL_PATH)/ssl/statem/extensions_cust.c
+ $(OPENSSL_PATH)/ssl/statem/extensions_srvr.c
+ $(OPENSSL_PATH)/ssl/statem/statem.c
+ $(OPENSSL_PATH)/ssl/statem/statem_clnt.c
+ $(OPENSSL_PATH)/ssl/statem/statem_dtls.c
+ $(OPENSSL_PATH)/ssl/statem/statem_lib.c
+ $(OPENSSL_PATH)/ssl/statem/statem_srvr.c
+ $(OPENSSL_PATH)/ssl/t1_enc.c
+ $(OPENSSL_PATH)/ssl/t1_lib.c
+ $(OPENSSL_PATH)/ssl/t1_trce.c
+ $(OPENSSL_PATH)/ssl/tls13_enc.c
+ $(OPENSSL_PATH)/ssl/tls_srp.c
+ $(OPENSSL_PATH)/ssl/packet_locl.h
+ $(OPENSSL_PATH)/ssl/ssl_cert_table.h
+ $(OPENSSL_PATH)/ssl/ssl_locl.h
+ $(OPENSSL_PATH)/ssl/record/record.h
+ $(OPENSSL_PATH)/ssl/record/record_locl.h
+ $(OPENSSL_PATH)/ssl/statem/statem.h
+ $(OPENSSL_PATH)/ssl/statem/statem_locl.h
+# Autogenerated files list ends here
+ buildinf.h
+ rand_pool_noise.h
+ ossl_store.c
+ rand_pool.c
+
+[Sources.X64]
+ ApiHooks.c
+ rand_pool_noise_tsc.c
+
+[Packages]
+ MdePkg/MdePkg.dec
+ CryptoPkg/CryptoPkg.dec
+
+[LibraryClasses]
+ BaseLib
+ DebugLib
+ TimerLib
+ PrintLib
+
+[BuildOptions]
+ #
+ # Disable the following Visual Studio compiler warnings emitted by the OpenSSL
+ # source so we do not break the build with the /WX option:
+ # C4090: 'function' : different 'const' qualifiers
+ # C4132: 'object' : const object should be initialized (tls13_enc.c)
+ # C4210: nonstandard extension used: function given file scope
+ # C4244: conversion from type1 to type2, possible loss of data
+ # C4245: conversion from type1 to type2, signed/unsigned mismatch
+ # C4267: conversion from size_t to type, possible loss of data
+ # C4306: 'identifier' : conversion from 'type1' to 'type2' of greater size
+ # C4310: cast truncates constant value
+ # C4389: 'operator' : signed/unsigned mismatch (xxxx)
+ # C4700: uninitialized local variable 'name' used. (conf_sap.c(71))
+ # C4702: unreachable code
+ # C4706: assignment within conditional expression
+ # C4819: The file contains a character that cannot be represented in the current code page
+ #
+ MSFT:*_*_X64_CC_FLAGS = -U_WIN32 -U_WIN64 -U_MSC_VER $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /wd4090 /wd4132 /wd4210 /wd4244 /wd4245 /wd4267 /wd4306 /wd4310 /wd4700 /wd4389 /wd4702 /wd4706 /wd4819
+
+ INTEL:*_*_X64_CC_FLAGS = -U_WIN32 -U_WIN64 -U_MSC_VER -U__ICC $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /w
+
+ #
+ # Suppress the following build warnings in the OpenSSL source so we do not break the build with -Werror:
+ # -Werror=maybe-uninitialized: the compiler cannot prove the variable is initialized on every path.
+ # -Werror=format: checks calls to printf, scanf, etc. to make sure the arguments supplied
+ #   have types appropriate to the format string specified.
+ # -Werror=unused-but-set-variable: warns whenever a local variable is assigned to but otherwise unused (aside from its declaration).
+ #
+ GCC:*_*_X64_CC_FLAGS = -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) -Wno-error=maybe-uninitialized -Wno-error=format -Wno-format -Wno-error=unused-but-set-variable -DNO_MSABI_VA_FUNCS
+
+ # Suppress the following warnings in the OpenSSL source so we do not break the build with warnings-as-errors:
+ # 1295: Deprecated declaration <entity> - give arg types
+ # 550: <entity> was set but never used
+ # 1293: assignment in condition
+ # 111: statement is unreachable (invariably "break;" after "return X;" in case statement)
+ # 68: integer conversion resulted in a change of sign ("if (Status == -1)")
+ # 177: <entity> was declared but never referenced
+ # 223: function <entity> declared implicitly
+ # 144: a value of type <type> cannot be used to initialize an entity of type <type>
+ # 513: a value of type <type> cannot be assigned to an entity of type <type>
+ # 188: enumerated type mixed with another type (i.e. passing an integer as an enum without a cast)
+ # 1296: Extended constant initialiser used
+ # 128: loop is not reachable - may be emitted inappropriately if code follows a conditional return
+ # from the function that evaluates to true at compile time
+ # 546: transfer of control bypasses initialization - may be emitted inappropriately if the uninitialized
+ # variable is never referenced after the jump
+ # 1: ignore "#1-D: last line of file ends without a newline"
+ # 3017: <entity> may be used before being set (NOTE: This was fixed in OpenSSL 1.1 HEAD with
+ # commit d9b8b89bec4480de3a10bdaf9425db371c19145b, and can be dropped then.)
+ XCODE:*_*_X64_CC_FLAGS = -mmmx -msse -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) -w -std=c99 -Wno-error=uninitialized
diff --git a/CryptoPkg/Library/Include/openssl/opensslconf.h b/CryptoPkg/Library/Include/openssl/opensslconf.h
index bd34e53ef2..20f32cc6fe 100644
--- a/CryptoPkg/Library/Include/openssl/opensslconf.h
+++ b/CryptoPkg/Library/Include/openssl/opensslconf.h
@@ -103,9 +103,6 @@ extern "C" {
#ifndef OPENSSL_NO_ASAN
# define OPENSSL_NO_ASAN
#endif
-#ifndef OPENSSL_NO_ASM
-# define OPENSSL_NO_ASM
-#endif
#ifndef OPENSSL_NO_ASYNC
# define OPENSSL_NO_ASYNC
#endif
diff --git a/CryptoPkg/Library/OpensslLib/ApiHooks.c b/CryptoPkg/Library/OpensslLib/ApiHooks.c
new file mode 100644
index 0000000000..58cff16838
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/ApiHooks.c
@@ -0,0 +1,18 @@
+/** @file
+ Stub implementations of Windows API hooks referenced by OpenSSL assembly code.
+
+Copyright (c) 2020, Intel Corporation. All rights reserved.<BR>
+SPDX-License-Identifier: BSD-2-Clause-Patent
+
+**/
+
+#include <Uefi.h>
+
+VOID *
+__imp_RtlVirtualUnwind (
+ VOID * Args
+ )
+{
+ return NULL;
+}
+
diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c b/CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c
new file mode 100644
index 0000000000..ef20d2b84e
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c
@@ -0,0 +1,34 @@
+/** @file
+ Constructor to initialize CPUID data for OpenSSL assembly operations.
+
+Copyright (c) 2020, Intel Corporation. All rights reserved.<BR>
+SPDX-License-Identifier: BSD-2-Clause-Patent
+
+**/
+
+#include <Uefi.h>
+
+extern void OPENSSL_cpuid_setup (void);
+
+/**
+ Constructor routine for OpensslLib.
+
+ The constructor calls an internal OpenSSL function which fetches a local copy
+ of the hardware capability flags; these flags are used to enable the native
+ crypto instructions at run time.
+
+ @param None
+
+ @retval EFI_SUCCESS The construction succeeded.
+
+**/
+EFI_STATUS
+EFIAPI
+OpensslLibConstructor (
+ VOID
+ )
+{
+ OPENSSL_cpuid_setup ();
+
+ return EFI_SUCCESS;
+}
+
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm
new file mode 100644
index 0000000000..30879d3cf5
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm
@@ -0,0 +1,3209 @@
+; Copyright 2009-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+;extern _OPENSSL_ia32cap_P
+global _aesni_encrypt
+align 16
+_aesni_encrypt:
+L$_aesni_encrypt_begin:
+ mov eax,DWORD [4+esp]
+ mov edx,DWORD [12+esp]
+ movups xmm2,[eax]
+ mov ecx,DWORD [240+edx]
+ mov eax,DWORD [8+esp]
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$000enc1_loop_1:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$000enc1_loop_1
+db 102,15,56,221,209
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ movups [eax],xmm2
+ pxor xmm2,xmm2
+ ret
+global _aesni_decrypt
+align 16
+_aesni_decrypt:
+L$_aesni_decrypt_begin:
+ mov eax,DWORD [4+esp]
+ mov edx,DWORD [12+esp]
+ movups xmm2,[eax]
+ mov ecx,DWORD [240+edx]
+ mov eax,DWORD [8+esp]
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$001dec1_loop_2:
+db 102,15,56,222,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$001dec1_loop_2
+db 102,15,56,223,209
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ movups [eax],xmm2
+ pxor xmm2,xmm2
+ ret
+align 16
+__aesni_encrypt2:
+ movups xmm0,[edx]
+ shl ecx,4
+ movups xmm1,[16+edx]
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ movups xmm0,[32+edx]
+ lea edx,[32+ecx*1+edx]
+ neg ecx
+ add ecx,16
+L$002enc2_loop:
+db 102,15,56,220,209
+db 102,15,56,220,217
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,220,208
+db 102,15,56,220,216
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$002enc2_loop
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,221,208
+db 102,15,56,221,216
+ ret
+align 16
+__aesni_decrypt2:
+ movups xmm0,[edx]
+ shl ecx,4
+ movups xmm1,[16+edx]
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ movups xmm0,[32+edx]
+ lea edx,[32+ecx*1+edx]
+ neg ecx
+ add ecx,16
+L$003dec2_loop:
+db 102,15,56,222,209
+db 102,15,56,222,217
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,222,208
+db 102,15,56,222,216
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$003dec2_loop
+db 102,15,56,222,209
+db 102,15,56,222,217
+db 102,15,56,223,208
+db 102,15,56,223,216
+ ret
+align 16
+__aesni_encrypt3:
+ movups xmm0,[edx]
+ shl ecx,4
+ movups xmm1,[16+edx]
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+ movups xmm0,[32+edx]
+ lea edx,[32+ecx*1+edx]
+ neg ecx
+ add ecx,16
+L$004enc3_loop:
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,220,225
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,220,208
+db 102,15,56,220,216
+db 102,15,56,220,224
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$004enc3_loop
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,220,225
+db 102,15,56,221,208
+db 102,15,56,221,216
+db 102,15,56,221,224
+ ret
+align 16
+__aesni_decrypt3:
+ movups xmm0,[edx]
+ shl ecx,4
+ movups xmm1,[16+edx]
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+ movups xmm0,[32+edx]
+ lea edx,[32+ecx*1+edx]
+ neg ecx
+ add ecx,16
+L$005dec3_loop:
+db 102,15,56,222,209
+db 102,15,56,222,217
+db 102,15,56,222,225
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,222,208
+db 102,15,56,222,216
+db 102,15,56,222,224
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$005dec3_loop
+db 102,15,56,222,209
+db 102,15,56,222,217
+db 102,15,56,222,225
+db 102,15,56,223,208
+db 102,15,56,223,216
+db 102,15,56,223,224
+ ret
+align 16
+__aesni_encrypt4:
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ shl ecx,4
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+ pxor xmm5,xmm0
+ movups xmm0,[32+edx]
+ lea edx,[32+ecx*1+edx]
+ neg ecx
+db 15,31,64,0
+ add ecx,16
+L$006enc4_loop:
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,220,225
+db 102,15,56,220,233
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,220,208
+db 102,15,56,220,216
+db 102,15,56,220,224
+db 102,15,56,220,232
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$006enc4_loop
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,220,225
+db 102,15,56,220,233
+db 102,15,56,221,208
+db 102,15,56,221,216
+db 102,15,56,221,224
+db 102,15,56,221,232
+ ret
+align 16
+__aesni_decrypt4:
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ shl ecx,4
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+ pxor xmm5,xmm0
+ movups xmm0,[32+edx]
+ lea edx,[32+ecx*1+edx]
+ neg ecx
+db 15,31,64,0
+ add ecx,16
+L$007dec4_loop:
+db 102,15,56,222,209
+db 102,15,56,222,217
+db 102,15,56,222,225
+db 102,15,56,222,233
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,222,208
+db 102,15,56,222,216
+db 102,15,56,222,224
+db 102,15,56,222,232
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$007dec4_loop
+db 102,15,56,222,209
+db 102,15,56,222,217
+db 102,15,56,222,225
+db 102,15,56,222,233
+db 102,15,56,223,208
+db 102,15,56,223,216
+db 102,15,56,223,224
+db 102,15,56,223,232
+ ret
+align 16
+__aesni_encrypt6:
+ movups xmm0,[edx]
+ shl ecx,4
+ movups xmm1,[16+edx]
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+db 102,15,56,220,209
+ pxor xmm5,xmm0
+ pxor xmm6,xmm0
+db 102,15,56,220,217
+ lea edx,[32+ecx*1+edx]
+ neg ecx
+db 102,15,56,220,225
+ pxor xmm7,xmm0
+ movups xmm0,[ecx*1+edx]
+ add ecx,16
+ jmp NEAR L$008_aesni_encrypt6_inner
+align 16
+L$009enc6_loop:
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,220,225
+L$008_aesni_encrypt6_inner:
+db 102,15,56,220,233
+db 102,15,56,220,241
+db 102,15,56,220,249
+L$_aesni_encrypt6_enter:
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,220,208
+db 102,15,56,220,216
+db 102,15,56,220,224
+db 102,15,56,220,232
+db 102,15,56,220,240
+db 102,15,56,220,248
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$009enc6_loop
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,220,225
+db 102,15,56,220,233
+db 102,15,56,220,241
+db 102,15,56,220,249
+db 102,15,56,221,208
+db 102,15,56,221,216
+db 102,15,56,221,224
+db 102,15,56,221,232
+db 102,15,56,221,240
+db 102,15,56,221,248
+ ret
+align 16
+__aesni_decrypt6:
+ movups xmm0,[edx]
+ shl ecx,4
+ movups xmm1,[16+edx]
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+db 102,15,56,222,209
+ pxor xmm5,xmm0
+ pxor xmm6,xmm0
+db 102,15,56,222,217
+ lea edx,[32+ecx*1+edx]
+ neg ecx
+db 102,15,56,222,225
+ pxor xmm7,xmm0
+ movups xmm0,[ecx*1+edx]
+ add ecx,16
+ jmp NEAR L$010_aesni_decrypt6_inner
+align 16
+L$011dec6_loop:
+db 102,15,56,222,209
+db 102,15,56,222,217
+db 102,15,56,222,225
+L$010_aesni_decrypt6_inner:
+db 102,15,56,222,233
+db 102,15,56,222,241
+db 102,15,56,222,249
+L$_aesni_decrypt6_enter:
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,222,208
+db 102,15,56,222,216
+db 102,15,56,222,224
+db 102,15,56,222,232
+db 102,15,56,222,240
+db 102,15,56,222,248
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$011dec6_loop
+db 102,15,56,222,209
+db 102,15,56,222,217
+db 102,15,56,222,225
+db 102,15,56,222,233
+db 102,15,56,222,241
+db 102,15,56,222,249
+db 102,15,56,223,208
+db 102,15,56,223,216
+db 102,15,56,223,224
+db 102,15,56,223,232
+db 102,15,56,223,240
+db 102,15,56,223,248
+ ret
+global _aesni_ecb_encrypt
+align 16
+_aesni_ecb_encrypt:
+L$_aesni_ecb_encrypt_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ mov ebx,DWORD [36+esp]
+ and eax,-16
+ jz NEAR L$012ecb_ret
+ mov ecx,DWORD [240+edx]
+ test ebx,ebx
+ jz NEAR L$013ecb_decrypt
+ mov ebp,edx
+ mov ebx,ecx
+ cmp eax,96
+ jb NEAR L$014ecb_enc_tail
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ movdqu xmm5,[48+esi]
+ movdqu xmm6,[64+esi]
+ movdqu xmm7,[80+esi]
+ lea esi,[96+esi]
+ sub eax,96
+ jmp NEAR L$015ecb_enc_loop6_enter
+align 16
+L$016ecb_enc_loop6:
+ movups [edi],xmm2
+ movdqu xmm2,[esi]
+ movups [16+edi],xmm3
+ movdqu xmm3,[16+esi]
+ movups [32+edi],xmm4
+ movdqu xmm4,[32+esi]
+ movups [48+edi],xmm5
+ movdqu xmm5,[48+esi]
+ movups [64+edi],xmm6
+ movdqu xmm6,[64+esi]
+ movups [80+edi],xmm7
+ lea edi,[96+edi]
+ movdqu xmm7,[80+esi]
+ lea esi,[96+esi]
+L$015ecb_enc_loop6_enter:
+ call __aesni_encrypt6
+ mov edx,ebp
+ mov ecx,ebx
+ sub eax,96
+ jnc NEAR L$016ecb_enc_loop6
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ movups [64+edi],xmm6
+ movups [80+edi],xmm7
+ lea edi,[96+edi]
+ add eax,96
+ jz NEAR L$012ecb_ret
+L$014ecb_enc_tail:
+ movups xmm2,[esi]
+ cmp eax,32
+ jb NEAR L$017ecb_enc_one
+ movups xmm3,[16+esi]
+ je NEAR L$018ecb_enc_two
+ movups xmm4,[32+esi]
+ cmp eax,64
+ jb NEAR L$019ecb_enc_three
+ movups xmm5,[48+esi]
+ je NEAR L$020ecb_enc_four
+ movups xmm6,[64+esi]
+ xorps xmm7,xmm7
+ call __aesni_encrypt6
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ movups [64+edi],xmm6
+ jmp NEAR L$012ecb_ret
+align 16
+L$017ecb_enc_one:
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$021enc1_loop_3:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$021enc1_loop_3
+db 102,15,56,221,209
+ movups [edi],xmm2
+ jmp NEAR L$012ecb_ret
+align 16
+L$018ecb_enc_two:
+ call __aesni_encrypt2
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ jmp NEAR L$012ecb_ret
+align 16
+L$019ecb_enc_three:
+ call __aesni_encrypt3
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ jmp NEAR L$012ecb_ret
+align 16
+L$020ecb_enc_four:
+ call __aesni_encrypt4
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ jmp NEAR L$012ecb_ret
+align 16
+L$013ecb_decrypt:
+ mov ebp,edx
+ mov ebx,ecx
+ cmp eax,96
+ jb NEAR L$022ecb_dec_tail
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ movdqu xmm5,[48+esi]
+ movdqu xmm6,[64+esi]
+ movdqu xmm7,[80+esi]
+ lea esi,[96+esi]
+ sub eax,96
+ jmp NEAR L$023ecb_dec_loop6_enter
+align 16
+L$024ecb_dec_loop6:
+ movups [edi],xmm2
+ movdqu xmm2,[esi]
+ movups [16+edi],xmm3
+ movdqu xmm3,[16+esi]
+ movups [32+edi],xmm4
+ movdqu xmm4,[32+esi]
+ movups [48+edi],xmm5
+ movdqu xmm5,[48+esi]
+ movups [64+edi],xmm6
+ movdqu xmm6,[64+esi]
+ movups [80+edi],xmm7
+ lea edi,[96+edi]
+ movdqu xmm7,[80+esi]
+ lea esi,[96+esi]
+L$023ecb_dec_loop6_enter:
+ call __aesni_decrypt6
+ mov edx,ebp
+ mov ecx,ebx
+ sub eax,96
+ jnc NEAR L$024ecb_dec_loop6
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ movups [64+edi],xmm6
+ movups [80+edi],xmm7
+ lea edi,[96+edi]
+ add eax,96
+ jz NEAR L$012ecb_ret
+L$022ecb_dec_tail:
+ movups xmm2,[esi]
+ cmp eax,32
+ jb NEAR L$025ecb_dec_one
+ movups xmm3,[16+esi]
+ je NEAR L$026ecb_dec_two
+ movups xmm4,[32+esi]
+ cmp eax,64
+ jb NEAR L$027ecb_dec_three
+ movups xmm5,[48+esi]
+ je NEAR L$028ecb_dec_four
+ movups xmm6,[64+esi]
+ xorps xmm7,xmm7
+ call __aesni_decrypt6
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ movups [64+edi],xmm6
+ jmp NEAR L$012ecb_ret
+align 16
+L$025ecb_dec_one:
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$029dec1_loop_4:
+db 102,15,56,222,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$029dec1_loop_4
+db 102,15,56,223,209
+ movups [edi],xmm2
+ jmp NEAR L$012ecb_ret
+align 16
+L$026ecb_dec_two:
+ call __aesni_decrypt2
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ jmp NEAR L$012ecb_ret
+align 16
+L$027ecb_dec_three:
+ call __aesni_decrypt3
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ jmp NEAR L$012ecb_ret
+align 16
+L$028ecb_dec_four:
+ call __aesni_decrypt4
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+L$012ecb_ret:
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ pxor xmm6,xmm6
+ pxor xmm7,xmm7
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _aesni_ccm64_encrypt_blocks
+align 16
+_aesni_ccm64_encrypt_blocks:
+L$_aesni_ccm64_encrypt_blocks_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ mov ebx,DWORD [36+esp]
+ mov ecx,DWORD [40+esp]
+ mov ebp,esp
+ sub esp,60
+ and esp,-16
+ mov DWORD [48+esp],ebp
+ movdqu xmm7,[ebx]
+ movdqu xmm3,[ecx]
+ mov ecx,DWORD [240+edx]
+ mov DWORD [esp],202182159
+ mov DWORD [4+esp],134810123
+ mov DWORD [8+esp],67438087
+ mov DWORD [12+esp],66051
+ mov ebx,1
+ xor ebp,ebp
+ mov DWORD [16+esp],ebx
+ mov DWORD [20+esp],ebp
+ mov DWORD [24+esp],ebp
+ mov DWORD [28+esp],ebp
+ shl ecx,4
+ mov ebx,16
+ lea ebp,[edx]
+ movdqa xmm5,[esp]
+ movdqa xmm2,xmm7
+ lea edx,[32+ecx*1+edx]
+ sub ebx,ecx
+db 102,15,56,0,253
+L$030ccm64_enc_outer:
+ movups xmm0,[ebp]
+ mov ecx,ebx
+ movups xmm6,[esi]
+ xorps xmm2,xmm0
+ movups xmm1,[16+ebp]
+ xorps xmm0,xmm6
+ xorps xmm3,xmm0
+ movups xmm0,[32+ebp]
+L$031ccm64_enc2_loop:
+db 102,15,56,220,209
+db 102,15,56,220,217
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,220,208
+db 102,15,56,220,216
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$031ccm64_enc2_loop
+db 102,15,56,220,209
+db 102,15,56,220,217
+ paddq xmm7,[16+esp]
+ dec eax
+db 102,15,56,221,208
+db 102,15,56,221,216
+ lea esi,[16+esi]
+ xorps xmm6,xmm2
+ movdqa xmm2,xmm7
+ movups [edi],xmm6
+db 102,15,56,0,213
+ lea edi,[16+edi]
+ jnz NEAR L$030ccm64_enc_outer
+ mov esp,DWORD [48+esp]
+ mov edi,DWORD [40+esp]
+ movups [edi],xmm3
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ pxor xmm6,xmm6
+ pxor xmm7,xmm7
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _aesni_ccm64_decrypt_blocks
+align 16
+_aesni_ccm64_decrypt_blocks:
+L$_aesni_ccm64_decrypt_blocks_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ mov ebx,DWORD [36+esp]
+ mov ecx,DWORD [40+esp]
+ mov ebp,esp
+ sub esp,60
+ and esp,-16
+ mov DWORD [48+esp],ebp
+ movdqu xmm7,[ebx]
+ movdqu xmm3,[ecx]
+ mov ecx,DWORD [240+edx]
+ mov DWORD [esp],202182159
+ mov DWORD [4+esp],134810123
+ mov DWORD [8+esp],67438087
+ mov DWORD [12+esp],66051
+ mov ebx,1
+ xor ebp,ebp
+ mov DWORD [16+esp],ebx
+ mov DWORD [20+esp],ebp
+ mov DWORD [24+esp],ebp
+ mov DWORD [28+esp],ebp
+ movdqa xmm5,[esp]
+ movdqa xmm2,xmm7
+ mov ebp,edx
+ mov ebx,ecx
+db 102,15,56,0,253
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$032enc1_loop_5:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$032enc1_loop_5
+db 102,15,56,221,209
+ shl ebx,4
+ mov ecx,16
+ movups xmm6,[esi]
+ paddq xmm7,[16+esp]
+ lea esi,[16+esi]
+ sub ecx,ebx
+ lea edx,[32+ebx*1+ebp]
+ mov ebx,ecx
+ jmp NEAR L$033ccm64_dec_outer
+align 16
+L$033ccm64_dec_outer:
+ xorps xmm6,xmm2
+ movdqa xmm2,xmm7
+ movups [edi],xmm6
+ lea edi,[16+edi]
+db 102,15,56,0,213
+ sub eax,1
+ jz NEAR L$034ccm64_dec_break
+ movups xmm0,[ebp]
+ mov ecx,ebx
+ movups xmm1,[16+ebp]
+ xorps xmm6,xmm0
+ xorps xmm2,xmm0
+ xorps xmm3,xmm6
+ movups xmm0,[32+ebp]
+L$035ccm64_dec2_loop:
+db 102,15,56,220,209
+db 102,15,56,220,217
+ movups xmm1,[ecx*1+edx]
+ add ecx,32
+db 102,15,56,220,208
+db 102,15,56,220,216
+ movups xmm0,[ecx*1+edx-16]
+ jnz NEAR L$035ccm64_dec2_loop
+ movups xmm6,[esi]
+ paddq xmm7,[16+esp]
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,221,208
+db 102,15,56,221,216
+ lea esi,[16+esi]
+ jmp NEAR L$033ccm64_dec_outer
+align 16
+L$034ccm64_dec_break:
+ mov ecx,DWORD [240+ebp]
+ mov edx,ebp
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ xorps xmm6,xmm0
+ lea edx,[32+edx]
+ xorps xmm3,xmm6
+L$036enc1_loop_6:
+db 102,15,56,220,217
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$036enc1_loop_6
+db 102,15,56,221,217
+ mov esp,DWORD [48+esp]
+ mov edi,DWORD [40+esp]
+ movups [edi],xmm3
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ pxor xmm6,xmm6
+ pxor xmm7,xmm7
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _aesni_ctr32_encrypt_blocks
+align 16
+_aesni_ctr32_encrypt_blocks:
+L$_aesni_ctr32_encrypt_blocks_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ mov ebx,DWORD [36+esp]
+ mov ebp,esp
+ sub esp,88
+ and esp,-16
+ mov DWORD [80+esp],ebp
+ cmp eax,1
+ je NEAR L$037ctr32_one_shortcut
+ movdqu xmm7,[ebx]
+ mov DWORD [esp],202182159
+ mov DWORD [4+esp],134810123
+ mov DWORD [8+esp],67438087
+ mov DWORD [12+esp],66051
+ mov ecx,6
+ xor ebp,ebp
+ mov DWORD [16+esp],ecx
+ mov DWORD [20+esp],ecx
+ mov DWORD [24+esp],ecx
+ mov DWORD [28+esp],ebp
+db 102,15,58,22,251,3
+db 102,15,58,34,253,3
+ mov ecx,DWORD [240+edx]
+ bswap ebx
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ movdqa xmm2,[esp]
+db 102,15,58,34,195,0
+ lea ebp,[3+ebx]
+db 102,15,58,34,205,0
+ inc ebx
+db 102,15,58,34,195,1
+ inc ebp
+db 102,15,58,34,205,1
+ inc ebx
+db 102,15,58,34,195,2
+ inc ebp
+db 102,15,58,34,205,2
+ movdqa [48+esp],xmm0
+db 102,15,56,0,194
+ movdqu xmm6,[edx]
+ movdqa [64+esp],xmm1
+db 102,15,56,0,202
+ pshufd xmm2,xmm0,192
+ pshufd xmm3,xmm0,128
+ cmp eax,6
+ jb NEAR L$038ctr32_tail
+ pxor xmm7,xmm6
+ shl ecx,4
+ mov ebx,16
+ movdqa [32+esp],xmm7
+ mov ebp,edx
+ sub ebx,ecx
+ lea edx,[32+ecx*1+edx]
+ sub eax,6
+ jmp NEAR L$039ctr32_loop6
+align 16
+L$039ctr32_loop6:
+ pshufd xmm4,xmm0,64
+ movdqa xmm0,[32+esp]
+ pshufd xmm5,xmm1,192
+ pxor xmm2,xmm0
+ pshufd xmm6,xmm1,128
+ pxor xmm3,xmm0
+ pshufd xmm7,xmm1,64
+ movups xmm1,[16+ebp]
+ pxor xmm4,xmm0
+ pxor xmm5,xmm0
+db 102,15,56,220,209
+ pxor xmm6,xmm0
+ pxor xmm7,xmm0
+db 102,15,56,220,217
+ movups xmm0,[32+ebp]
+ mov ecx,ebx
+db 102,15,56,220,225
+db 102,15,56,220,233
+db 102,15,56,220,241
+db 102,15,56,220,249
+ call L$_aesni_encrypt6_enter
+ movups xmm1,[esi]
+ movups xmm0,[16+esi]
+ xorps xmm2,xmm1
+ movups xmm1,[32+esi]
+ xorps xmm3,xmm0
+ movups [edi],xmm2
+ movdqa xmm0,[16+esp]
+ xorps xmm4,xmm1
+ movdqa xmm1,[64+esp]
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ paddd xmm1,xmm0
+ paddd xmm0,[48+esp]
+ movdqa xmm2,[esp]
+ movups xmm3,[48+esi]
+ movups xmm4,[64+esi]
+ xorps xmm5,xmm3
+ movups xmm3,[80+esi]
+ lea esi,[96+esi]
+ movdqa [48+esp],xmm0
+db 102,15,56,0,194
+ xorps xmm6,xmm4
+ movups [48+edi],xmm5
+ xorps xmm7,xmm3
+ movdqa [64+esp],xmm1
+db 102,15,56,0,202
+ movups [64+edi],xmm6
+ pshufd xmm2,xmm0,192
+ movups [80+edi],xmm7
+ lea edi,[96+edi]
+ pshufd xmm3,xmm0,128
+ sub eax,6
+ jnc NEAR L$039ctr32_loop6
+ add eax,6
+ jz NEAR L$040ctr32_ret
+ movdqu xmm7,[ebp]
+ mov edx,ebp
+ pxor xmm7,[32+esp]
+ mov ecx,DWORD [240+ebp]
+L$038ctr32_tail:
+ por xmm2,xmm7
+ cmp eax,2
+ jb NEAR L$041ctr32_one
+ pshufd xmm4,xmm0,64
+ por xmm3,xmm7
+ je NEAR L$042ctr32_two
+ pshufd xmm5,xmm1,192
+ por xmm4,xmm7
+ cmp eax,4
+ jb NEAR L$043ctr32_three
+ pshufd xmm6,xmm1,128
+ por xmm5,xmm7
+ je NEAR L$044ctr32_four
+ por xmm6,xmm7
+ call __aesni_encrypt6
+ movups xmm1,[esi]
+ movups xmm0,[16+esi]
+ xorps xmm2,xmm1
+ movups xmm1,[32+esi]
+ xorps xmm3,xmm0
+ movups xmm0,[48+esi]
+ xorps xmm4,xmm1
+ movups xmm1,[64+esi]
+ xorps xmm5,xmm0
+ movups [edi],xmm2
+ xorps xmm6,xmm1
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ movups [64+edi],xmm6
+ jmp NEAR L$040ctr32_ret
+align 16
+L$037ctr32_one_shortcut:
+ movups xmm2,[ebx]
+ mov ecx,DWORD [240+edx]
+L$041ctr32_one:
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$045enc1_loop_7:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$045enc1_loop_7
+db 102,15,56,221,209
+ movups xmm6,[esi]
+ xorps xmm6,xmm2
+ movups [edi],xmm6
+ jmp NEAR L$040ctr32_ret
+align 16
+L$042ctr32_two:
+ call __aesni_encrypt2
+ movups xmm5,[esi]
+ movups xmm6,[16+esi]
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ jmp NEAR L$040ctr32_ret
+align 16
+L$043ctr32_three:
+ call __aesni_encrypt3
+ movups xmm5,[esi]
+ movups xmm6,[16+esi]
+ xorps xmm2,xmm5
+ movups xmm7,[32+esi]
+ xorps xmm3,xmm6
+ movups [edi],xmm2
+ xorps xmm4,xmm7
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ jmp NEAR L$040ctr32_ret
+align 16
+L$044ctr32_four:
+ call __aesni_encrypt4
+ movups xmm6,[esi]
+ movups xmm7,[16+esi]
+ movups xmm1,[32+esi]
+ xorps xmm2,xmm6
+ movups xmm0,[48+esi]
+ xorps xmm3,xmm7
+ movups [edi],xmm2
+ xorps xmm4,xmm1
+ movups [16+edi],xmm3
+ xorps xmm5,xmm0
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+L$040ctr32_ret:
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ movdqa [32+esp],xmm0
+ pxor xmm5,xmm5
+ movdqa [48+esp],xmm0
+ pxor xmm6,xmm6
+ movdqa [64+esp],xmm0
+ pxor xmm7,xmm7
+ mov esp,DWORD [80+esp]
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _aesni_xts_encrypt
+align 16
+_aesni_xts_encrypt:
+L$_aesni_xts_encrypt_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov edx,DWORD [36+esp]
+ mov esi,DWORD [40+esp]
+ mov ecx,DWORD [240+edx]
+ movups xmm2,[esi]
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$046enc1_loop_8:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$046enc1_loop_8
+db 102,15,56,221,209
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ mov ebp,esp
+ sub esp,120
+ mov ecx,DWORD [240+edx]
+ and esp,-16
+ mov DWORD [96+esp],135
+ mov DWORD [100+esp],0
+ mov DWORD [104+esp],1
+ mov DWORD [108+esp],0
+ mov DWORD [112+esp],eax
+ mov DWORD [116+esp],ebp
+ movdqa xmm1,xmm2
+ pxor xmm0,xmm0
+ movdqa xmm3,[96+esp]
+ pcmpgtd xmm0,xmm1
+ and eax,-16
+ mov ebp,edx
+ mov ebx,ecx
+ sub eax,96
+ jc NEAR L$047xts_enc_short
+ shl ecx,4
+ mov ebx,16
+ sub ebx,ecx
+ lea edx,[32+ecx*1+edx]
+ jmp NEAR L$048xts_enc_loop6
+align 16
+L$048xts_enc_loop6:
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa [esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa [16+esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa [32+esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa [48+esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ pshufd xmm7,xmm0,19
+ movdqa [64+esp],xmm1
+ paddq xmm1,xmm1
+ movups xmm0,[ebp]
+ pand xmm7,xmm3
+ movups xmm2,[esi]
+ pxor xmm7,xmm1
+ mov ecx,ebx
+ movdqu xmm3,[16+esi]
+ xorps xmm2,xmm0
+ movdqu xmm4,[32+esi]
+ pxor xmm3,xmm0
+ movdqu xmm5,[48+esi]
+ pxor xmm4,xmm0
+ movdqu xmm6,[64+esi]
+ pxor xmm5,xmm0
+ movdqu xmm1,[80+esi]
+ pxor xmm6,xmm0
+ lea esi,[96+esi]
+ pxor xmm2,[esp]
+ movdqa [80+esp],xmm7
+ pxor xmm7,xmm1
+ movups xmm1,[16+ebp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+db 102,15,56,220,209
+ pxor xmm5,[48+esp]
+ pxor xmm6,[64+esp]
+db 102,15,56,220,217
+ pxor xmm7,xmm0
+ movups xmm0,[32+ebp]
+db 102,15,56,220,225
+db 102,15,56,220,233
+db 102,15,56,220,241
+db 102,15,56,220,249
+ call L$_aesni_encrypt6_enter
+ movdqa xmm1,[80+esp]
+ pxor xmm0,xmm0
+ xorps xmm2,[esp]
+ pcmpgtd xmm0,xmm1
+ xorps xmm3,[16+esp]
+ movups [edi],xmm2
+ xorps xmm4,[32+esp]
+ movups [16+edi],xmm3
+ xorps xmm5,[48+esp]
+ movups [32+edi],xmm4
+ xorps xmm6,[64+esp]
+ movups [48+edi],xmm5
+ xorps xmm7,xmm1
+ movups [64+edi],xmm6
+ pshufd xmm2,xmm0,19
+ movups [80+edi],xmm7
+ lea edi,[96+edi]
+ movdqa xmm3,[96+esp]
+ pxor xmm0,xmm0
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ sub eax,96
+ jnc NEAR L$048xts_enc_loop6
+ mov ecx,DWORD [240+ebp]
+ mov edx,ebp
+ mov ebx,ecx
+L$047xts_enc_short:
+ add eax,96
+ jz NEAR L$049xts_enc_done6x
+ movdqa xmm5,xmm1
+ cmp eax,32
+ jb NEAR L$050xts_enc_one
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ je NEAR L$051xts_enc_two
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa xmm6,xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ cmp eax,64
+ jb NEAR L$052xts_enc_three
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa xmm7,xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ movdqa [esp],xmm5
+ movdqa [16+esp],xmm6
+ je NEAR L$053xts_enc_four
+ movdqa [32+esp],xmm7
+ pshufd xmm7,xmm0,19
+ movdqa [48+esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm7,xmm3
+ pxor xmm7,xmm1
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ pxor xmm2,[esp]
+ movdqu xmm5,[48+esi]
+ pxor xmm3,[16+esp]
+ movdqu xmm6,[64+esi]
+ pxor xmm4,[32+esp]
+ lea esi,[80+esi]
+ pxor xmm5,[48+esp]
+ movdqa [64+esp],xmm7
+ pxor xmm6,xmm7
+ call __aesni_encrypt6
+ movaps xmm1,[64+esp]
+ xorps xmm2,[esp]
+ xorps xmm3,[16+esp]
+ xorps xmm4,[32+esp]
+ movups [edi],xmm2
+ xorps xmm5,[48+esp]
+ movups [16+edi],xmm3
+ xorps xmm6,xmm1
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ movups [64+edi],xmm6
+ lea edi,[80+edi]
+ jmp NEAR L$054xts_enc_done
+align 16
+L$050xts_enc_one:
+ movups xmm2,[esi]
+ lea esi,[16+esi]
+ xorps xmm2,xmm5
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$055enc1_loop_9:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$055enc1_loop_9
+db 102,15,56,221,209
+ xorps xmm2,xmm5
+ movups [edi],xmm2
+ lea edi,[16+edi]
+ movdqa xmm1,xmm5
+ jmp NEAR L$054xts_enc_done
+align 16
+L$051xts_enc_two:
+ movaps xmm6,xmm1
+ movups xmm2,[esi]
+ movups xmm3,[16+esi]
+ lea esi,[32+esi]
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ call __aesni_encrypt2
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ lea edi,[32+edi]
+ movdqa xmm1,xmm6
+ jmp NEAR L$054xts_enc_done
+align 16
+L$052xts_enc_three:
+ movaps xmm7,xmm1
+ movups xmm2,[esi]
+ movups xmm3,[16+esi]
+ movups xmm4,[32+esi]
+ lea esi,[48+esi]
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ xorps xmm4,xmm7
+ call __aesni_encrypt3
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ xorps xmm4,xmm7
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ lea edi,[48+edi]
+ movdqa xmm1,xmm7
+ jmp NEAR L$054xts_enc_done
+align 16
+L$053xts_enc_four:
+ movaps xmm6,xmm1
+ movups xmm2,[esi]
+ movups xmm3,[16+esi]
+ movups xmm4,[32+esi]
+ xorps xmm2,[esp]
+ movups xmm5,[48+esi]
+ lea esi,[64+esi]
+ xorps xmm3,[16+esp]
+ xorps xmm4,xmm7
+ xorps xmm5,xmm6
+ call __aesni_encrypt4
+ xorps xmm2,[esp]
+ xorps xmm3,[16+esp]
+ xorps xmm4,xmm7
+ movups [edi],xmm2
+ xorps xmm5,xmm6
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ lea edi,[64+edi]
+ movdqa xmm1,xmm6
+ jmp NEAR L$054xts_enc_done
+align 16
+L$049xts_enc_done6x:
+ mov eax,DWORD [112+esp]
+ and eax,15
+ jz NEAR L$056xts_enc_ret
+ movdqa xmm5,xmm1
+ mov DWORD [112+esp],eax
+ jmp NEAR L$057xts_enc_steal
+align 16
+L$054xts_enc_done:
+ mov eax,DWORD [112+esp]
+ pxor xmm0,xmm0
+ and eax,15
+ jz NEAR L$056xts_enc_ret
+ pcmpgtd xmm0,xmm1
+ mov DWORD [112+esp],eax
+ pshufd xmm5,xmm0,19
+ paddq xmm1,xmm1
+ pand xmm5,[96+esp]
+ pxor xmm5,xmm1
+L$057xts_enc_steal:
+ movzx ecx,BYTE [esi]
+ movzx edx,BYTE [edi-16]
+ lea esi,[1+esi]
+ mov BYTE [edi-16],cl
+ mov BYTE [edi],dl
+ lea edi,[1+edi]
+ sub eax,1
+ jnz NEAR L$057xts_enc_steal
+ sub edi,DWORD [112+esp]
+ mov edx,ebp
+ mov ecx,ebx
+ movups xmm2,[edi-16]
+ xorps xmm2,xmm5
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$058enc1_loop_10:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$058enc1_loop_10
+db 102,15,56,221,209
+ xorps xmm2,xmm5
+ movups [edi-16],xmm2
+L$056xts_enc_ret:
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ movdqa [esp],xmm0
+ pxor xmm3,xmm3
+ movdqa [16+esp],xmm0
+ pxor xmm4,xmm4
+ movdqa [32+esp],xmm0
+ pxor xmm5,xmm5
+ movdqa [48+esp],xmm0
+ pxor xmm6,xmm6
+ movdqa [64+esp],xmm0
+ pxor xmm7,xmm7
+ movdqa [80+esp],xmm0
+ mov esp,DWORD [116+esp]
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _aesni_xts_decrypt
+align 16
+_aesni_xts_decrypt:
+L$_aesni_xts_decrypt_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov edx,DWORD [36+esp]
+ mov esi,DWORD [40+esp]
+ mov ecx,DWORD [240+edx]
+ movups xmm2,[esi]
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$059enc1_loop_11:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$059enc1_loop_11
+db 102,15,56,221,209
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ mov ebp,esp
+ sub esp,120
+ and esp,-16
+ xor ebx,ebx
+ test eax,15
+ setnz bl
+ shl ebx,4
+ sub eax,ebx
+ mov DWORD [96+esp],135
+ mov DWORD [100+esp],0
+ mov DWORD [104+esp],1
+ mov DWORD [108+esp],0
+ mov DWORD [112+esp],eax
+ mov DWORD [116+esp],ebp
+ mov ecx,DWORD [240+edx]
+ mov ebp,edx
+ mov ebx,ecx
+ movdqa xmm1,xmm2
+ pxor xmm0,xmm0
+ movdqa xmm3,[96+esp]
+ pcmpgtd xmm0,xmm1
+ and eax,-16
+ sub eax,96
+ jc NEAR L$060xts_dec_short
+ shl ecx,4
+ mov ebx,16
+ sub ebx,ecx
+ lea edx,[32+ecx*1+edx]
+ jmp NEAR L$061xts_dec_loop6
+align 16
+L$061xts_dec_loop6:
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa [esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa [16+esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa [32+esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa [48+esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ pshufd xmm7,xmm0,19
+ movdqa [64+esp],xmm1
+ paddq xmm1,xmm1
+ movups xmm0,[ebp]
+ pand xmm7,xmm3
+ movups xmm2,[esi]
+ pxor xmm7,xmm1
+ mov ecx,ebx
+ movdqu xmm3,[16+esi]
+ xorps xmm2,xmm0
+ movdqu xmm4,[32+esi]
+ pxor xmm3,xmm0
+ movdqu xmm5,[48+esi]
+ pxor xmm4,xmm0
+ movdqu xmm6,[64+esi]
+ pxor xmm5,xmm0
+ movdqu xmm1,[80+esi]
+ pxor xmm6,xmm0
+ lea esi,[96+esi]
+ pxor xmm2,[esp]
+ movdqa [80+esp],xmm7
+ pxor xmm7,xmm1
+ movups xmm1,[16+ebp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+db 102,15,56,222,209
+ pxor xmm5,[48+esp]
+ pxor xmm6,[64+esp]
+db 102,15,56,222,217
+ pxor xmm7,xmm0
+ movups xmm0,[32+ebp]
+db 102,15,56,222,225
+db 102,15,56,222,233
+db 102,15,56,222,241
+db 102,15,56,222,249
+ call L$_aesni_decrypt6_enter
+ movdqa xmm1,[80+esp]
+ pxor xmm0,xmm0
+ xorps xmm2,[esp]
+ pcmpgtd xmm0,xmm1
+ xorps xmm3,[16+esp]
+ movups [edi],xmm2
+ xorps xmm4,[32+esp]
+ movups [16+edi],xmm3
+ xorps xmm5,[48+esp]
+ movups [32+edi],xmm4
+ xorps xmm6,[64+esp]
+ movups [48+edi],xmm5
+ xorps xmm7,xmm1
+ movups [64+edi],xmm6
+ pshufd xmm2,xmm0,19
+ movups [80+edi],xmm7
+ lea edi,[96+edi]
+ movdqa xmm3,[96+esp]
+ pxor xmm0,xmm0
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ sub eax,96
+ jnc NEAR L$061xts_dec_loop6
+ mov ecx,DWORD [240+ebp]
+ mov edx,ebp
+ mov ebx,ecx
+L$060xts_dec_short:
+ add eax,96
+ jz NEAR L$062xts_dec_done6x
+ movdqa xmm5,xmm1
+ cmp eax,32
+ jb NEAR L$063xts_dec_one
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ je NEAR L$064xts_dec_two
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa xmm6,xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ cmp eax,64
+ jb NEAR L$065xts_dec_three
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa xmm7,xmm1
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+ movdqa [esp],xmm5
+ movdqa [16+esp],xmm6
+ je NEAR L$066xts_dec_four
+ movdqa [32+esp],xmm7
+ pshufd xmm7,xmm0,19
+ movdqa [48+esp],xmm1
+ paddq xmm1,xmm1
+ pand xmm7,xmm3
+ pxor xmm7,xmm1
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ pxor xmm2,[esp]
+ movdqu xmm5,[48+esi]
+ pxor xmm3,[16+esp]
+ movdqu xmm6,[64+esi]
+ pxor xmm4,[32+esp]
+ lea esi,[80+esi]
+ pxor xmm5,[48+esp]
+ movdqa [64+esp],xmm7
+ pxor xmm6,xmm7
+ call __aesni_decrypt6
+ movaps xmm1,[64+esp]
+ xorps xmm2,[esp]
+ xorps xmm3,[16+esp]
+ xorps xmm4,[32+esp]
+ movups [edi],xmm2
+ xorps xmm5,[48+esp]
+ movups [16+edi],xmm3
+ xorps xmm6,xmm1
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ movups [64+edi],xmm6
+ lea edi,[80+edi]
+ jmp NEAR L$067xts_dec_done
+align 16
+L$063xts_dec_one:
+ movups xmm2,[esi]
+ lea esi,[16+esi]
+ xorps xmm2,xmm5
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$068dec1_loop_12:
+db 102,15,56,222,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$068dec1_loop_12
+db 102,15,56,223,209
+ xorps xmm2,xmm5
+ movups [edi],xmm2
+ lea edi,[16+edi]
+ movdqa xmm1,xmm5
+ jmp NEAR L$067xts_dec_done
+align 16
+L$064xts_dec_two:
+ movaps xmm6,xmm1
+ movups xmm2,[esi]
+ movups xmm3,[16+esi]
+ lea esi,[32+esi]
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ call __aesni_decrypt2
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ lea edi,[32+edi]
+ movdqa xmm1,xmm6
+ jmp NEAR L$067xts_dec_done
+align 16
+L$065xts_dec_three:
+ movaps xmm7,xmm1
+ movups xmm2,[esi]
+ movups xmm3,[16+esi]
+ movups xmm4,[32+esi]
+ lea esi,[48+esi]
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ xorps xmm4,xmm7
+ call __aesni_decrypt3
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ xorps xmm4,xmm7
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ lea edi,[48+edi]
+ movdqa xmm1,xmm7
+ jmp NEAR L$067xts_dec_done
+align 16
+L$066xts_dec_four:
+ movaps xmm6,xmm1
+ movups xmm2,[esi]
+ movups xmm3,[16+esi]
+ movups xmm4,[32+esi]
+ xorps xmm2,[esp]
+ movups xmm5,[48+esi]
+ lea esi,[64+esi]
+ xorps xmm3,[16+esp]
+ xorps xmm4,xmm7
+ xorps xmm5,xmm6
+ call __aesni_decrypt4
+ xorps xmm2,[esp]
+ xorps xmm3,[16+esp]
+ xorps xmm4,xmm7
+ movups [edi],xmm2
+ xorps xmm5,xmm6
+ movups [16+edi],xmm3
+ movups [32+edi],xmm4
+ movups [48+edi],xmm5
+ lea edi,[64+edi]
+ movdqa xmm1,xmm6
+ jmp NEAR L$067xts_dec_done
+align 16
+L$062xts_dec_done6x:
+ mov eax,DWORD [112+esp]
+ and eax,15
+ jz NEAR L$069xts_dec_ret
+ mov DWORD [112+esp],eax
+ jmp NEAR L$070xts_dec_only_one_more
+align 16
+L$067xts_dec_done:
+ mov eax,DWORD [112+esp]
+ pxor xmm0,xmm0
+ and eax,15
+ jz NEAR L$069xts_dec_ret
+ pcmpgtd xmm0,xmm1
+ mov DWORD [112+esp],eax
+ pshufd xmm2,xmm0,19
+ pxor xmm0,xmm0
+ movdqa xmm3,[96+esp]
+ paddq xmm1,xmm1
+ pand xmm2,xmm3
+ pcmpgtd xmm0,xmm1
+ pxor xmm1,xmm2
+L$070xts_dec_only_one_more:
+ pshufd xmm5,xmm0,19
+ movdqa xmm6,xmm1
+ paddq xmm1,xmm1
+ pand xmm5,xmm3
+ pxor xmm5,xmm1
+ mov edx,ebp
+ mov ecx,ebx
+ movups xmm2,[esi]
+ xorps xmm2,xmm5
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$071dec1_loop_13:
+db 102,15,56,222,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$071dec1_loop_13
+db 102,15,56,223,209
+ xorps xmm2,xmm5
+ movups [edi],xmm2
+L$072xts_dec_steal:
+ movzx ecx,BYTE [16+esi]
+ movzx edx,BYTE [edi]
+ lea esi,[1+esi]
+ mov BYTE [edi],cl
+ mov BYTE [16+edi],dl
+ lea edi,[1+edi]
+ sub eax,1
+ jnz NEAR L$072xts_dec_steal
+ sub edi,DWORD [112+esp]
+ mov edx,ebp
+ mov ecx,ebx
+ movups xmm2,[edi]
+ xorps xmm2,xmm6
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$073dec1_loop_14:
+db 102,15,56,222,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$073dec1_loop_14
+db 102,15,56,223,209
+ xorps xmm2,xmm6
+ movups [edi],xmm2
+L$069xts_dec_ret:
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ movdqa [esp],xmm0
+ pxor xmm3,xmm3
+ movdqa [16+esp],xmm0
+ pxor xmm4,xmm4
+ movdqa [32+esp],xmm0
+ pxor xmm5,xmm5
+ movdqa [48+esp],xmm0
+ pxor xmm6,xmm6
+ movdqa [64+esp],xmm0
+ pxor xmm7,xmm7
+ movdqa [80+esp],xmm0
+ mov esp,DWORD [116+esp]
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _aesni_ocb_encrypt
+align 16
+_aesni_ocb_encrypt:
+L$_aesni_ocb_encrypt_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov ecx,DWORD [40+esp]
+ mov ebx,DWORD [48+esp]
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ movdqu xmm0,[ecx]
+ mov ebp,DWORD [36+esp]
+ movdqu xmm1,[ebx]
+ mov ebx,DWORD [44+esp]
+ mov ecx,esp
+ sub esp,132
+ and esp,-16
+ sub edi,esi
+ shl eax,4
+ lea eax,[eax*1+esi-96]
+ mov DWORD [120+esp],edi
+ mov DWORD [124+esp],eax
+ mov DWORD [128+esp],ecx
+ mov ecx,DWORD [240+edx]
+ test ebp,1
+ jnz NEAR L$074odd
+ bsf eax,ebp
+ add ebp,1
+ shl eax,4
+ movdqu xmm7,[eax*1+ebx]
+ mov eax,edx
+ movdqu xmm2,[esi]
+ lea esi,[16+esi]
+ pxor xmm7,xmm0
+ pxor xmm1,xmm2
+ pxor xmm2,xmm7
+ movdqa xmm6,xmm1
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$075enc1_loop_15:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$075enc1_loop_15
+db 102,15,56,221,209
+ xorps xmm2,xmm7
+ movdqa xmm0,xmm7
+ movdqa xmm1,xmm6
+ movups [esi*1+edi-16],xmm2
+ mov ecx,DWORD [240+eax]
+ mov edx,eax
+ mov eax,DWORD [124+esp]
+L$074odd:
+ shl ecx,4
+ mov edi,16
+ sub edi,ecx
+ mov DWORD [112+esp],edx
+ lea edx,[32+ecx*1+edx]
+ mov DWORD [116+esp],edi
+ cmp esi,eax
+ ja NEAR L$076short
+ jmp NEAR L$077grandloop
+align 32
+L$077grandloop:
+ lea ecx,[1+ebp]
+ lea eax,[3+ebp]
+ lea edi,[5+ebp]
+ add ebp,6
+ bsf ecx,ecx
+ bsf eax,eax
+ bsf edi,edi
+ shl ecx,4
+ shl eax,4
+ shl edi,4
+ movdqu xmm2,[ebx]
+ movdqu xmm3,[ecx*1+ebx]
+ mov ecx,DWORD [116+esp]
+ movdqa xmm4,xmm2
+ movdqu xmm5,[eax*1+ebx]
+ movdqa xmm6,xmm2
+ movdqu xmm7,[edi*1+ebx]
+ pxor xmm2,xmm0
+ pxor xmm3,xmm2
+ movdqa [esp],xmm2
+ pxor xmm4,xmm3
+ movdqa [16+esp],xmm3
+ pxor xmm5,xmm4
+ movdqa [32+esp],xmm4
+ pxor xmm6,xmm5
+ movdqa [48+esp],xmm5
+ pxor xmm7,xmm6
+ movdqa [64+esp],xmm6
+ movdqa [80+esp],xmm7
+ movups xmm0,[ecx*1+edx-48]
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ movdqu xmm5,[48+esi]
+ movdqu xmm6,[64+esi]
+ movdqu xmm7,[80+esi]
+ lea esi,[96+esi]
+ pxor xmm1,xmm2
+ pxor xmm2,xmm0
+ pxor xmm1,xmm3
+ pxor xmm3,xmm0
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ pxor xmm1,xmm5
+ pxor xmm5,xmm0
+ pxor xmm1,xmm6
+ pxor xmm6,xmm0
+ pxor xmm1,xmm7
+ pxor xmm7,xmm0
+ movdqa [96+esp],xmm1
+ movups xmm1,[ecx*1+edx-32]
+ pxor xmm2,[esp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+ pxor xmm5,[48+esp]
+ pxor xmm6,[64+esp]
+ pxor xmm7,[80+esp]
+ movups xmm0,[ecx*1+edx-16]
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,220,225
+db 102,15,56,220,233
+db 102,15,56,220,241
+db 102,15,56,220,249
+ mov edi,DWORD [120+esp]
+ mov eax,DWORD [124+esp]
+ call L$_aesni_encrypt6_enter
+ movdqa xmm0,[80+esp]
+ pxor xmm2,[esp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+ pxor xmm5,[48+esp]
+ pxor xmm6,[64+esp]
+ pxor xmm7,xmm0
+ movdqa xmm1,[96+esp]
+ movdqu [esi*1+edi-96],xmm2
+ movdqu [esi*1+edi-80],xmm3
+ movdqu [esi*1+edi-64],xmm4
+ movdqu [esi*1+edi-48],xmm5
+ movdqu [esi*1+edi-32],xmm6
+ movdqu [esi*1+edi-16],xmm7
+ cmp esi,eax
+ jb NEAR L$077grandloop
+L$076short:
+ add eax,96
+ sub eax,esi
+ jz NEAR L$078done
+ cmp eax,32
+ jb NEAR L$079one
+ je NEAR L$080two
+ cmp eax,64
+ jb NEAR L$081three
+ je NEAR L$082four
+ lea ecx,[1+ebp]
+ lea eax,[3+ebp]
+ bsf ecx,ecx
+ bsf eax,eax
+ shl ecx,4
+ shl eax,4
+ movdqu xmm2,[ebx]
+ movdqu xmm3,[ecx*1+ebx]
+ mov ecx,DWORD [116+esp]
+ movdqa xmm4,xmm2
+ movdqu xmm5,[eax*1+ebx]
+ movdqa xmm6,xmm2
+ pxor xmm2,xmm0
+ pxor xmm3,xmm2
+ movdqa [esp],xmm2
+ pxor xmm4,xmm3
+ movdqa [16+esp],xmm3
+ pxor xmm5,xmm4
+ movdqa [32+esp],xmm4
+ pxor xmm6,xmm5
+ movdqa [48+esp],xmm5
+ pxor xmm7,xmm6
+ movdqa [64+esp],xmm6
+ movups xmm0,[ecx*1+edx-48]
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ movdqu xmm5,[48+esi]
+ movdqu xmm6,[64+esi]
+ pxor xmm7,xmm7
+ pxor xmm1,xmm2
+ pxor xmm2,xmm0
+ pxor xmm1,xmm3
+ pxor xmm3,xmm0
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ pxor xmm1,xmm5
+ pxor xmm5,xmm0
+ pxor xmm1,xmm6
+ pxor xmm6,xmm0
+ movdqa [96+esp],xmm1
+ movups xmm1,[ecx*1+edx-32]
+ pxor xmm2,[esp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+ pxor xmm5,[48+esp]
+ pxor xmm6,[64+esp]
+ movups xmm0,[ecx*1+edx-16]
+db 102,15,56,220,209
+db 102,15,56,220,217
+db 102,15,56,220,225
+db 102,15,56,220,233
+db 102,15,56,220,241
+db 102,15,56,220,249
+ mov edi,DWORD [120+esp]
+ call L$_aesni_encrypt6_enter
+ movdqa xmm0,[64+esp]
+ pxor xmm2,[esp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+ pxor xmm5,[48+esp]
+ pxor xmm6,xmm0
+ movdqa xmm1,[96+esp]
+ movdqu [esi*1+edi],xmm2
+ movdqu [16+esi*1+edi],xmm3
+ movdqu [32+esi*1+edi],xmm4
+ movdqu [48+esi*1+edi],xmm5
+ movdqu [64+esi*1+edi],xmm6
+ jmp NEAR L$078done
+align 16
+L$079one:
+ movdqu xmm7,[ebx]
+ mov edx,DWORD [112+esp]
+ movdqu xmm2,[esi]
+ mov ecx,DWORD [240+edx]
+ pxor xmm7,xmm0
+ pxor xmm1,xmm2
+ pxor xmm2,xmm7
+ movdqa xmm6,xmm1
+ mov edi,DWORD [120+esp]
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$083enc1_loop_16:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$083enc1_loop_16
+db 102,15,56,221,209
+ xorps xmm2,xmm7
+ movdqa xmm0,xmm7
+ movdqa xmm1,xmm6
+ movups [esi*1+edi],xmm2
+ jmp NEAR L$078done
+align 16
+L$080two:
+ lea ecx,[1+ebp]
+ mov edx,DWORD [112+esp]
+ bsf ecx,ecx
+ shl ecx,4
+ movdqu xmm6,[ebx]
+ movdqu xmm7,[ecx*1+ebx]
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ mov ecx,DWORD [240+edx]
+ pxor xmm6,xmm0
+ pxor xmm7,xmm6
+ pxor xmm1,xmm2
+ pxor xmm2,xmm6
+ pxor xmm1,xmm3
+ pxor xmm3,xmm7
+ movdqa xmm5,xmm1
+ mov edi,DWORD [120+esp]
+ call __aesni_encrypt2
+ xorps xmm2,xmm6
+ xorps xmm3,xmm7
+ movdqa xmm0,xmm7
+ movdqa xmm1,xmm5
+ movups [esi*1+edi],xmm2
+ movups [16+esi*1+edi],xmm3
+ jmp NEAR L$078done
+align 16
+L$081three:
+ lea ecx,[1+ebp]
+ mov edx,DWORD [112+esp]
+ bsf ecx,ecx
+ shl ecx,4
+ movdqu xmm5,[ebx]
+ movdqu xmm6,[ecx*1+ebx]
+ movdqa xmm7,xmm5
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ mov ecx,DWORD [240+edx]
+ pxor xmm5,xmm0
+ pxor xmm6,xmm5
+ pxor xmm7,xmm6
+ pxor xmm1,xmm2
+ pxor xmm2,xmm5
+ pxor xmm1,xmm3
+ pxor xmm3,xmm6
+ pxor xmm1,xmm4
+ pxor xmm4,xmm7
+ movdqa [96+esp],xmm1
+ mov edi,DWORD [120+esp]
+ call __aesni_encrypt3
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ xorps xmm4,xmm7
+ movdqa xmm0,xmm7
+ movdqa xmm1,[96+esp]
+ movups [esi*1+edi],xmm2
+ movups [16+esi*1+edi],xmm3
+ movups [32+esi*1+edi],xmm4
+ jmp NEAR L$078done
+align 16
+L$082four:
+ lea ecx,[1+ebp]
+ lea eax,[3+ebp]
+ bsf ecx,ecx
+ bsf eax,eax
+ mov edx,DWORD [112+esp]
+ shl ecx,4
+ shl eax,4
+ movdqu xmm4,[ebx]
+ movdqu xmm5,[ecx*1+ebx]
+ movdqa xmm6,xmm4
+ movdqu xmm7,[eax*1+ebx]
+ pxor xmm4,xmm0
+ movdqu xmm2,[esi]
+ pxor xmm5,xmm4
+ movdqu xmm3,[16+esi]
+ pxor xmm6,xmm5
+ movdqa [esp],xmm4
+ pxor xmm7,xmm6
+ movdqa [16+esp],xmm5
+ movdqu xmm4,[32+esi]
+ movdqu xmm5,[48+esi]
+ mov ecx,DWORD [240+edx]
+ pxor xmm1,xmm2
+ pxor xmm2,[esp]
+ pxor xmm1,xmm3
+ pxor xmm3,[16+esp]
+ pxor xmm1,xmm4
+ pxor xmm4,xmm6
+ pxor xmm1,xmm5
+ pxor xmm5,xmm7
+ movdqa [96+esp],xmm1
+ mov edi,DWORD [120+esp]
+ call __aesni_encrypt4
+ xorps xmm2,[esp]
+ xorps xmm3,[16+esp]
+ xorps xmm4,xmm6
+ movups [esi*1+edi],xmm2
+ xorps xmm5,xmm7
+ movups [16+esi*1+edi],xmm3
+ movdqa xmm0,xmm7
+ movups [32+esi*1+edi],xmm4
+ movdqa xmm1,[96+esp]
+ movups [48+esi*1+edi],xmm5
+L$078done:
+ mov edx,DWORD [128+esp]
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ movdqa [esp],xmm2
+ pxor xmm4,xmm4
+ movdqa [16+esp],xmm2
+ pxor xmm5,xmm5
+ movdqa [32+esp],xmm2
+ pxor xmm6,xmm6
+ movdqa [48+esp],xmm2
+ pxor xmm7,xmm7
+ movdqa [64+esp],xmm2
+ movdqa [80+esp],xmm2
+ movdqa [96+esp],xmm2
+ lea esp,[edx]
+ mov ecx,DWORD [40+esp]
+ mov ebx,DWORD [48+esp]
+ movdqu [ecx],xmm0
+ pxor xmm0,xmm0
+ movdqu [ebx],xmm1
+ pxor xmm1,xmm1
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _aesni_ocb_decrypt
+align 16
+_aesni_ocb_decrypt:
+L$_aesni_ocb_decrypt_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov ecx,DWORD [40+esp]
+ mov ebx,DWORD [48+esp]
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ movdqu xmm0,[ecx]
+ mov ebp,DWORD [36+esp]
+ movdqu xmm1,[ebx]
+ mov ebx,DWORD [44+esp]
+ mov ecx,esp
+ sub esp,132
+ and esp,-16
+ sub edi,esi
+ shl eax,4
+ lea eax,[eax*1+esi-96]
+ mov DWORD [120+esp],edi
+ mov DWORD [124+esp],eax
+ mov DWORD [128+esp],ecx
+ mov ecx,DWORD [240+edx]
+ test ebp,1
+ jnz NEAR L$084odd
+ bsf eax,ebp
+ add ebp,1
+ shl eax,4
+ movdqu xmm7,[eax*1+ebx]
+ mov eax,edx
+ movdqu xmm2,[esi]
+ lea esi,[16+esi]
+ pxor xmm7,xmm0
+ pxor xmm2,xmm7
+ movdqa xmm6,xmm1
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$085dec1_loop_17:
+db 102,15,56,222,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$085dec1_loop_17
+db 102,15,56,223,209
+ xorps xmm2,xmm7
+ movaps xmm1,xmm6
+ movdqa xmm0,xmm7
+ xorps xmm1,xmm2
+ movups [esi*1+edi-16],xmm2
+ mov ecx,DWORD [240+eax]
+ mov edx,eax
+ mov eax,DWORD [124+esp]
+L$084odd:
+ shl ecx,4
+ mov edi,16
+ sub edi,ecx
+ mov DWORD [112+esp],edx
+ lea edx,[32+ecx*1+edx]
+ mov DWORD [116+esp],edi
+ cmp esi,eax
+ ja NEAR L$086short
+ jmp NEAR L$087grandloop
+align 32
+L$087grandloop:
+ lea ecx,[1+ebp]
+ lea eax,[3+ebp]
+ lea edi,[5+ebp]
+ add ebp,6
+ bsf ecx,ecx
+ bsf eax,eax
+ bsf edi,edi
+ shl ecx,4
+ shl eax,4
+ shl edi,4
+ movdqu xmm2,[ebx]
+ movdqu xmm3,[ecx*1+ebx]
+ mov ecx,DWORD [116+esp]
+ movdqa xmm4,xmm2
+ movdqu xmm5,[eax*1+ebx]
+ movdqa xmm6,xmm2
+ movdqu xmm7,[edi*1+ebx]
+ pxor xmm2,xmm0
+ pxor xmm3,xmm2
+ movdqa [esp],xmm2
+ pxor xmm4,xmm3
+ movdqa [16+esp],xmm3
+ pxor xmm5,xmm4
+ movdqa [32+esp],xmm4
+ pxor xmm6,xmm5
+ movdqa [48+esp],xmm5
+ pxor xmm7,xmm6
+ movdqa [64+esp],xmm6
+ movdqa [80+esp],xmm7
+ movups xmm0,[ecx*1+edx-48]
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ movdqu xmm5,[48+esi]
+ movdqu xmm6,[64+esi]
+ movdqu xmm7,[80+esi]
+ lea esi,[96+esi]
+ movdqa [96+esp],xmm1
+ pxor xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+ pxor xmm5,xmm0
+ pxor xmm6,xmm0
+ pxor xmm7,xmm0
+ movups xmm1,[ecx*1+edx-32]
+ pxor xmm2,[esp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+ pxor xmm5,[48+esp]
+ pxor xmm6,[64+esp]
+ pxor xmm7,[80+esp]
+ movups xmm0,[ecx*1+edx-16]
+db 102,15,56,222,209
+db 102,15,56,222,217
+db 102,15,56,222,225
+db 102,15,56,222,233
+db 102,15,56,222,241
+db 102,15,56,222,249
+ mov edi,DWORD [120+esp]
+ mov eax,DWORD [124+esp]
+ call L$_aesni_decrypt6_enter
+ movdqa xmm0,[80+esp]
+ pxor xmm2,[esp]
+ movdqa xmm1,[96+esp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+ pxor xmm5,[48+esp]
+ pxor xmm6,[64+esp]
+ pxor xmm7,xmm0
+ pxor xmm1,xmm2
+ movdqu [esi*1+edi-96],xmm2
+ pxor xmm1,xmm3
+ movdqu [esi*1+edi-80],xmm3
+ pxor xmm1,xmm4
+ movdqu [esi*1+edi-64],xmm4
+ pxor xmm1,xmm5
+ movdqu [esi*1+edi-48],xmm5
+ pxor xmm1,xmm6
+ movdqu [esi*1+edi-32],xmm6
+ pxor xmm1,xmm7
+ movdqu [esi*1+edi-16],xmm7
+ cmp esi,eax
+ jb NEAR L$087grandloop
+L$086short:
+ add eax,96
+ sub eax,esi
+ jz NEAR L$088done
+ cmp eax,32
+ jb NEAR L$089one
+ je NEAR L$090two
+ cmp eax,64
+ jb NEAR L$091three
+ je NEAR L$092four
+ lea ecx,[1+ebp]
+ lea eax,[3+ebp]
+ bsf ecx,ecx
+ bsf eax,eax
+ shl ecx,4
+ shl eax,4
+ movdqu xmm2,[ebx]
+ movdqu xmm3,[ecx*1+ebx]
+ mov ecx,DWORD [116+esp]
+ movdqa xmm4,xmm2
+ movdqu xmm5,[eax*1+ebx]
+ movdqa xmm6,xmm2
+ pxor xmm2,xmm0
+ pxor xmm3,xmm2
+ movdqa [esp],xmm2
+ pxor xmm4,xmm3
+ movdqa [16+esp],xmm3
+ pxor xmm5,xmm4
+ movdqa [32+esp],xmm4
+ pxor xmm6,xmm5
+ movdqa [48+esp],xmm5
+ pxor xmm7,xmm6
+ movdqa [64+esp],xmm6
+ movups xmm0,[ecx*1+edx-48]
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ movdqu xmm5,[48+esi]
+ movdqu xmm6,[64+esi]
+ pxor xmm7,xmm7
+ movdqa [96+esp],xmm1
+ pxor xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+ pxor xmm5,xmm0
+ pxor xmm6,xmm0
+ movups xmm1,[ecx*1+edx-32]
+ pxor xmm2,[esp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+ pxor xmm5,[48+esp]
+ pxor xmm6,[64+esp]
+ movups xmm0,[ecx*1+edx-16]
+db 102,15,56,222,209
+db 102,15,56,222,217
+db 102,15,56,222,225
+db 102,15,56,222,233
+db 102,15,56,222,241
+db 102,15,56,222,249
+ mov edi,DWORD [120+esp]
+ call L$_aesni_decrypt6_enter
+ movdqa xmm0,[64+esp]
+ pxor xmm2,[esp]
+ movdqa xmm1,[96+esp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,[32+esp]
+ pxor xmm5,[48+esp]
+ pxor xmm6,xmm0
+ pxor xmm1,xmm2
+ movdqu [esi*1+edi],xmm2
+ pxor xmm1,xmm3
+ movdqu [16+esi*1+edi],xmm3
+ pxor xmm1,xmm4
+ movdqu [32+esi*1+edi],xmm4
+ pxor xmm1,xmm5
+ movdqu [48+esi*1+edi],xmm5
+ pxor xmm1,xmm6
+ movdqu [64+esi*1+edi],xmm6
+ jmp NEAR L$088done
+align 16
+L$089one:
+ movdqu xmm7,[ebx]
+ mov edx,DWORD [112+esp]
+ movdqu xmm2,[esi]
+ mov ecx,DWORD [240+edx]
+ pxor xmm7,xmm0
+ pxor xmm2,xmm7
+ movdqa xmm6,xmm1
+ mov edi,DWORD [120+esp]
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$093dec1_loop_18:
+db 102,15,56,222,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$093dec1_loop_18
+db 102,15,56,223,209
+ xorps xmm2,xmm7
+ movaps xmm1,xmm6
+ movdqa xmm0,xmm7
+ xorps xmm1,xmm2
+ movups [esi*1+edi],xmm2
+ jmp NEAR L$088done
+align 16
+L$090two:
+ lea ecx,[1+ebp]
+ mov edx,DWORD [112+esp]
+ bsf ecx,ecx
+ shl ecx,4
+ movdqu xmm6,[ebx]
+ movdqu xmm7,[ecx*1+ebx]
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ mov ecx,DWORD [240+edx]
+ movdqa xmm5,xmm1
+ pxor xmm6,xmm0
+ pxor xmm7,xmm6
+ pxor xmm2,xmm6
+ pxor xmm3,xmm7
+ mov edi,DWORD [120+esp]
+ call __aesni_decrypt2
+ xorps xmm2,xmm6
+ xorps xmm3,xmm7
+ movdqa xmm0,xmm7
+ xorps xmm5,xmm2
+ movups [esi*1+edi],xmm2
+ xorps xmm5,xmm3
+ movups [16+esi*1+edi],xmm3
+ movaps xmm1,xmm5
+ jmp NEAR L$088done
+align 16
+L$091three:
+ lea ecx,[1+ebp]
+ mov edx,DWORD [112+esp]
+ bsf ecx,ecx
+ shl ecx,4
+ movdqu xmm5,[ebx]
+ movdqu xmm6,[ecx*1+ebx]
+ movdqa xmm7,xmm5
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ mov ecx,DWORD [240+edx]
+ movdqa [96+esp],xmm1
+ pxor xmm5,xmm0
+ pxor xmm6,xmm5
+ pxor xmm7,xmm6
+ pxor xmm2,xmm5
+ pxor xmm3,xmm6
+ pxor xmm4,xmm7
+ mov edi,DWORD [120+esp]
+ call __aesni_decrypt3
+ movdqa xmm1,[96+esp]
+ xorps xmm2,xmm5
+ xorps xmm3,xmm6
+ xorps xmm4,xmm7
+ movups [esi*1+edi],xmm2
+ pxor xmm1,xmm2
+ movdqa xmm0,xmm7
+ movups [16+esi*1+edi],xmm3
+ pxor xmm1,xmm3
+ movups [32+esi*1+edi],xmm4
+ pxor xmm1,xmm4
+ jmp NEAR L$088done
+align 16
+L$092four:
+ lea ecx,[1+ebp]
+ lea eax,[3+ebp]
+ bsf ecx,ecx
+ bsf eax,eax
+ mov edx,DWORD [112+esp]
+ shl ecx,4
+ shl eax,4
+ movdqu xmm4,[ebx]
+ movdqu xmm5,[ecx*1+ebx]
+ movdqa xmm6,xmm4
+ movdqu xmm7,[eax*1+ebx]
+ pxor xmm4,xmm0
+ movdqu xmm2,[esi]
+ pxor xmm5,xmm4
+ movdqu xmm3,[16+esi]
+ pxor xmm6,xmm5
+ movdqa [esp],xmm4
+ pxor xmm7,xmm6
+ movdqa [16+esp],xmm5
+ movdqu xmm4,[32+esi]
+ movdqu xmm5,[48+esi]
+ mov ecx,DWORD [240+edx]
+ movdqa [96+esp],xmm1
+ pxor xmm2,[esp]
+ pxor xmm3,[16+esp]
+ pxor xmm4,xmm6
+ pxor xmm5,xmm7
+ mov edi,DWORD [120+esp]
+ call __aesni_decrypt4
+ movdqa xmm1,[96+esp]
+ xorps xmm2,[esp]
+ xorps xmm3,[16+esp]
+ xorps xmm4,xmm6
+ movups [esi*1+edi],xmm2
+ pxor xmm1,xmm2
+ xorps xmm5,xmm7
+ movups [16+esi*1+edi],xmm3
+ pxor xmm1,xmm3
+ movdqa xmm0,xmm7
+ movups [32+esi*1+edi],xmm4
+ pxor xmm1,xmm4
+ movups [48+esi*1+edi],xmm5
+ pxor xmm1,xmm5
+L$088done:
+ mov edx,DWORD [128+esp]
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ movdqa [esp],xmm2
+ pxor xmm4,xmm4
+ movdqa [16+esp],xmm2
+ pxor xmm5,xmm5
+ movdqa [32+esp],xmm2
+ pxor xmm6,xmm6
+ movdqa [48+esp],xmm2
+ pxor xmm7,xmm7
+ movdqa [64+esp],xmm2
+ movdqa [80+esp],xmm2
+ movdqa [96+esp],xmm2
+ lea esp,[edx]
+ mov ecx,DWORD [40+esp]
+ mov ebx,DWORD [48+esp]
+ movdqu [ecx],xmm0
+ pxor xmm0,xmm0
+ movdqu [ebx],xmm1
+ pxor xmm1,xmm1
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _aesni_cbc_encrypt
+align 16
+_aesni_cbc_encrypt:
+L$_aesni_cbc_encrypt_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ mov ebx,esp
+ mov edi,DWORD [24+esp]
+ sub ebx,24
+ mov eax,DWORD [28+esp]
+ and ebx,-16
+ mov edx,DWORD [32+esp]
+ mov ebp,DWORD [36+esp]
+ test eax,eax
+ jz NEAR L$094cbc_abort
+ cmp DWORD [40+esp],0
+ xchg ebx,esp
+ movups xmm7,[ebp]
+ mov ecx,DWORD [240+edx]
+ mov ebp,edx
+ mov DWORD [16+esp],ebx
+ mov ebx,ecx
+ je NEAR L$095cbc_decrypt
+ movaps xmm2,xmm7
+ cmp eax,16
+ jb NEAR L$096cbc_enc_tail
+ sub eax,16
+ jmp NEAR L$097cbc_enc_loop
+align 16
+L$097cbc_enc_loop:
+ movups xmm7,[esi]
+ lea esi,[16+esi]
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ xorps xmm7,xmm0
+ lea edx,[32+edx]
+ xorps xmm2,xmm7
+L$098enc1_loop_19:
+db 102,15,56,220,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$098enc1_loop_19
+db 102,15,56,221,209
+ mov ecx,ebx
+ mov edx,ebp
+ movups [edi],xmm2
+ lea edi,[16+edi]
+ sub eax,16
+ jnc NEAR L$097cbc_enc_loop
+ add eax,16
+ jnz NEAR L$096cbc_enc_tail
+ movaps xmm7,xmm2
+ pxor xmm2,xmm2
+ jmp NEAR L$099cbc_ret
+L$096cbc_enc_tail:
+ mov ecx,eax
+dd 2767451785
+ mov ecx,16
+ sub ecx,eax
+ xor eax,eax
+dd 2868115081
+ lea edi,[edi-16]
+ mov ecx,ebx
+ mov esi,edi
+ mov edx,ebp
+ jmp NEAR L$097cbc_enc_loop
+align 16
+L$095cbc_decrypt:
+ cmp eax,80
+ jbe NEAR L$100cbc_dec_tail
+ movaps [esp],xmm7
+ sub eax,80
+ jmp NEAR L$101cbc_dec_loop6_enter
+align 16
+L$102cbc_dec_loop6:
+ movaps [esp],xmm0
+ movups [edi],xmm7
+ lea edi,[16+edi]
+L$101cbc_dec_loop6_enter:
+ movdqu xmm2,[esi]
+ movdqu xmm3,[16+esi]
+ movdqu xmm4,[32+esi]
+ movdqu xmm5,[48+esi]
+ movdqu xmm6,[64+esi]
+ movdqu xmm7,[80+esi]
+ call __aesni_decrypt6
+ movups xmm1,[esi]
+ movups xmm0,[16+esi]
+ xorps xmm2,[esp]
+ xorps xmm3,xmm1
+ movups xmm1,[32+esi]
+ xorps xmm4,xmm0
+ movups xmm0,[48+esi]
+ xorps xmm5,xmm1
+ movups xmm1,[64+esi]
+ xorps xmm6,xmm0
+ movups xmm0,[80+esi]
+ xorps xmm7,xmm1
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ lea esi,[96+esi]
+ movups [32+edi],xmm4
+ mov ecx,ebx
+ movups [48+edi],xmm5
+ mov edx,ebp
+ movups [64+edi],xmm6
+ lea edi,[80+edi]
+ sub eax,96
+ ja NEAR L$102cbc_dec_loop6
+ movaps xmm2,xmm7
+ movaps xmm7,xmm0
+ add eax,80
+ jle NEAR L$103cbc_dec_clear_tail_collected
+ movups [edi],xmm2
+ lea edi,[16+edi]
+L$100cbc_dec_tail:
+ movups xmm2,[esi]
+ movaps xmm6,xmm2
+ cmp eax,16
+ jbe NEAR L$104cbc_dec_one
+ movups xmm3,[16+esi]
+ movaps xmm5,xmm3
+ cmp eax,32
+ jbe NEAR L$105cbc_dec_two
+ movups xmm4,[32+esi]
+ cmp eax,48
+ jbe NEAR L$106cbc_dec_three
+ movups xmm5,[48+esi]
+ cmp eax,64
+ jbe NEAR L$107cbc_dec_four
+ movups xmm6,[64+esi]
+ movaps [esp],xmm7
+ movups xmm2,[esi]
+ xorps xmm7,xmm7
+ call __aesni_decrypt6
+ movups xmm1,[esi]
+ movups xmm0,[16+esi]
+ xorps xmm2,[esp]
+ xorps xmm3,xmm1
+ movups xmm1,[32+esi]
+ xorps xmm4,xmm0
+ movups xmm0,[48+esi]
+ xorps xmm5,xmm1
+ movups xmm7,[64+esi]
+ xorps xmm6,xmm0
+ movups [edi],xmm2
+ movups [16+edi],xmm3
+ pxor xmm3,xmm3
+ movups [32+edi],xmm4
+ pxor xmm4,xmm4
+ movups [48+edi],xmm5
+ pxor xmm5,xmm5
+ lea edi,[64+edi]
+ movaps xmm2,xmm6
+ pxor xmm6,xmm6
+ sub eax,80
+ jmp NEAR L$108cbc_dec_tail_collected
+align 16
+L$104cbc_dec_one:
+ movups xmm0,[edx]
+ movups xmm1,[16+edx]
+ lea edx,[32+edx]
+ xorps xmm2,xmm0
+L$109dec1_loop_20:
+db 102,15,56,222,209
+ dec ecx
+ movups xmm1,[edx]
+ lea edx,[16+edx]
+ jnz NEAR L$109dec1_loop_20
+db 102,15,56,223,209
+ xorps xmm2,xmm7
+ movaps xmm7,xmm6
+ sub eax,16
+ jmp NEAR L$108cbc_dec_tail_collected
+align 16
+L$105cbc_dec_two:
+ call __aesni_decrypt2
+ xorps xmm2,xmm7
+ xorps xmm3,xmm6
+ movups [edi],xmm2
+ movaps xmm2,xmm3
+ pxor xmm3,xmm3
+ lea edi,[16+edi]
+ movaps xmm7,xmm5
+ sub eax,32
+ jmp NEAR L$108cbc_dec_tail_collected
+align 16
+L$106cbc_dec_three:
+ call __aesni_decrypt3
+ xorps xmm2,xmm7
+ xorps xmm3,xmm6
+ xorps xmm4,xmm5
+ movups [edi],xmm2
+ movaps xmm2,xmm4
+ pxor xmm4,xmm4
+ movups [16+edi],xmm3
+ pxor xmm3,xmm3
+ lea edi,[32+edi]
+ movups xmm7,[32+esi]
+ sub eax,48
+ jmp NEAR L$108cbc_dec_tail_collected
+align 16
+L$107cbc_dec_four:
+ call __aesni_decrypt4
+ movups xmm1,[16+esi]
+ movups xmm0,[32+esi]
+ xorps xmm2,xmm7
+ movups xmm7,[48+esi]
+ xorps xmm3,xmm6
+ movups [edi],xmm2
+ xorps xmm4,xmm1
+ movups [16+edi],xmm3
+ pxor xmm3,xmm3
+ xorps xmm5,xmm0
+ movups [32+edi],xmm4
+ pxor xmm4,xmm4
+ lea edi,[48+edi]
+ movaps xmm2,xmm5
+ pxor xmm5,xmm5
+ sub eax,64
+ jmp NEAR L$108cbc_dec_tail_collected
+align 16
+L$103cbc_dec_clear_tail_collected:
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ pxor xmm6,xmm6
+L$108cbc_dec_tail_collected:
+ and eax,15
+ jnz NEAR L$110cbc_dec_tail_partial
+ movups [edi],xmm2
+ pxor xmm0,xmm0
+ jmp NEAR L$099cbc_ret
+align 16
+L$110cbc_dec_tail_partial:
+ movaps [esp],xmm2
+ pxor xmm0,xmm0
+ mov ecx,16
+ mov esi,esp
+ sub ecx,eax
+dd 2767451785
+ movdqa [esp],xmm2
+L$099cbc_ret:
+ mov esp,DWORD [16+esp]
+ mov ebp,DWORD [36+esp]
+ pxor xmm2,xmm2
+ pxor xmm1,xmm1
+ movups [ebp],xmm7
+ pxor xmm7,xmm7
+L$094cbc_abort:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 16
+__aesni_set_encrypt_key:
+ push ebp
+ push ebx
+ test eax,eax
+ jz NEAR L$111bad_pointer
+ test edx,edx
+ jz NEAR L$111bad_pointer
+ call L$112pic
+L$112pic:
+ pop ebx
+ lea ebx,[(L$key_const-L$112pic)+ebx]
+ lea ebp,[_OPENSSL_ia32cap_P]
+ movups xmm0,[eax]
+ xorps xmm4,xmm4
+ mov ebp,DWORD [4+ebp]
+ lea edx,[16+edx]
+ and ebp,268437504
+ cmp ecx,256
+ je NEAR L$11314rounds
+ cmp ecx,192
+ je NEAR L$11412rounds
+ cmp ecx,128
+ jne NEAR L$115bad_keybits
+align 16
+L$11610rounds:
+ cmp ebp,268435456
+ je NEAR L$11710rounds_alt
+ mov ecx,9
+ movups [edx-16],xmm0
+db 102,15,58,223,200,1
+ call L$118key_128_cold
+db 102,15,58,223,200,2
+ call L$119key_128
+db 102,15,58,223,200,4
+ call L$119key_128
+db 102,15,58,223,200,8
+ call L$119key_128
+db 102,15,58,223,200,16
+ call L$119key_128
+db 102,15,58,223,200,32
+ call L$119key_128
+db 102,15,58,223,200,64
+ call L$119key_128
+db 102,15,58,223,200,128
+ call L$119key_128
+db 102,15,58,223,200,27
+ call L$119key_128
+db 102,15,58,223,200,54
+ call L$119key_128
+ movups [edx],xmm0
+ mov DWORD [80+edx],ecx
+ jmp NEAR L$120good_key
+align 16
+L$119key_128:
+ movups [edx],xmm0
+ lea edx,[16+edx]
+L$118key_128_cold:
+ shufps xmm4,xmm0,16
+ xorps xmm0,xmm4
+ shufps xmm4,xmm0,140
+ xorps xmm0,xmm4
+ shufps xmm1,xmm1,255
+ xorps xmm0,xmm1
+ ret
+align 16
+L$11710rounds_alt:
+ movdqa xmm5,[ebx]
+ mov ecx,8
+ movdqa xmm4,[32+ebx]
+ movdqa xmm2,xmm0
+ movdqu [edx-16],xmm0
+L$121loop_key128:
+db 102,15,56,0,197
+db 102,15,56,221,196
+ pslld xmm4,1
+ lea edx,[16+edx]
+ movdqa xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm2,xmm3
+ pxor xmm0,xmm2
+ movdqu [edx-16],xmm0
+ movdqa xmm2,xmm0
+ dec ecx
+ jnz NEAR L$121loop_key128
+ movdqa xmm4,[48+ebx]
+db 102,15,56,0,197
+db 102,15,56,221,196
+ pslld xmm4,1
+ movdqa xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm2,xmm3
+ pxor xmm0,xmm2
+ movdqu [edx],xmm0
+ movdqa xmm2,xmm0
+db 102,15,56,0,197
+db 102,15,56,221,196
+ movdqa xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm2,xmm3
+ pxor xmm0,xmm2
+ movdqu [16+edx],xmm0
+ mov ecx,9
+ mov DWORD [96+edx],ecx
+ jmp NEAR L$120good_key
+align 16
+L$11412rounds:
+ movq xmm2,[16+eax]
+ cmp ebp,268435456
+ je NEAR L$12212rounds_alt
+ mov ecx,11
+ movups [edx-16],xmm0
+db 102,15,58,223,202,1
+ call L$123key_192a_cold
+db 102,15,58,223,202,2
+ call L$124key_192b
+db 102,15,58,223,202,4
+ call L$125key_192a
+db 102,15,58,223,202,8
+ call L$124key_192b
+db 102,15,58,223,202,16
+ call L$125key_192a
+db 102,15,58,223,202,32
+ call L$124key_192b
+db 102,15,58,223,202,64
+ call L$125key_192a
+db 102,15,58,223,202,128
+ call L$124key_192b
+ movups [edx],xmm0
+ mov DWORD [48+edx],ecx
+ jmp NEAR L$120good_key
+align 16
+L$125key_192a:
+ movups [edx],xmm0
+ lea edx,[16+edx]
+align 16
+L$123key_192a_cold:
+ movaps xmm5,xmm2
+L$126key_192b_warm:
+ shufps xmm4,xmm0,16
+ movdqa xmm3,xmm2
+ xorps xmm0,xmm4
+ shufps xmm4,xmm0,140
+ pslldq xmm3,4
+ xorps xmm0,xmm4
+ pshufd xmm1,xmm1,85
+ pxor xmm2,xmm3
+ pxor xmm0,xmm1
+ pshufd xmm3,xmm0,255
+ pxor xmm2,xmm3
+ ret
+align 16
+L$124key_192b:
+ movaps xmm3,xmm0
+ shufps xmm5,xmm0,68
+ movups [edx],xmm5
+ shufps xmm3,xmm2,78
+ movups [16+edx],xmm3
+ lea edx,[32+edx]
+ jmp NEAR L$126key_192b_warm
+align 16
+L$12212rounds_alt:
+ movdqa xmm5,[16+ebx]
+ movdqa xmm4,[32+ebx]
+ mov ecx,8
+ movdqu [edx-16],xmm0
+L$127loop_key192:
+ movq [edx],xmm2
+ movdqa xmm1,xmm2
+db 102,15,56,0,213
+db 102,15,56,221,212
+ pslld xmm4,1
+ lea edx,[24+edx]
+ movdqa xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm0,xmm3
+ pshufd xmm3,xmm0,255
+ pxor xmm3,xmm1
+ pslldq xmm1,4
+ pxor xmm3,xmm1
+ pxor xmm0,xmm2
+ pxor xmm2,xmm3
+ movdqu [edx-16],xmm0
+ dec ecx
+ jnz NEAR L$127loop_key192
+ mov ecx,11
+ mov DWORD [32+edx],ecx
+ jmp NEAR L$120good_key
+align 16
+L$11314rounds:
+ movups xmm2,[16+eax]
+ lea edx,[16+edx]
+ cmp ebp,268435456
+ je NEAR L$12814rounds_alt
+ mov ecx,13
+ movups [edx-32],xmm0
+ movups [edx-16],xmm2
+db 102,15,58,223,202,1
+ call L$129key_256a_cold
+db 102,15,58,223,200,1
+ call L$130key_256b
+db 102,15,58,223,202,2
+ call L$131key_256a
+db 102,15,58,223,200,2
+ call L$130key_256b
+db 102,15,58,223,202,4
+ call L$131key_256a
+db 102,15,58,223,200,4
+ call L$130key_256b
+db 102,15,58,223,202,8
+ call L$131key_256a
+db 102,15,58,223,200,8
+ call L$130key_256b
+db 102,15,58,223,202,16
+ call L$131key_256a
+db 102,15,58,223,200,16
+ call L$130key_256b
+db 102,15,58,223,202,32
+ call L$131key_256a
+db 102,15,58,223,200,32
+ call L$130key_256b
+db 102,15,58,223,202,64
+ call L$131key_256a
+ movups [edx],xmm0
+ mov DWORD [16+edx],ecx
+ xor eax,eax
+ jmp NEAR L$120good_key
+align 16
+L$131key_256a:
+ movups [edx],xmm2
+ lea edx,[16+edx]
+L$129key_256a_cold:
+ shufps xmm4,xmm0,16
+ xorps xmm0,xmm4
+ shufps xmm4,xmm0,140
+ xorps xmm0,xmm4
+ shufps xmm1,xmm1,255
+ xorps xmm0,xmm1
+ ret
+align 16
+L$130key_256b:
+ movups [edx],xmm0
+ lea edx,[16+edx]
+ shufps xmm4,xmm2,16
+ xorps xmm2,xmm4
+ shufps xmm4,xmm2,140
+ xorps xmm2,xmm4
+ shufps xmm1,xmm1,170
+ xorps xmm2,xmm1
+ ret
+align 16
+L$12814rounds_alt:
+ movdqa xmm5,[ebx]
+ movdqa xmm4,[32+ebx]
+ mov ecx,7
+ movdqu [edx-32],xmm0
+ movdqa xmm1,xmm2
+ movdqu [edx-16],xmm2
+L$132loop_key256:
+db 102,15,56,0,213
+db 102,15,56,221,212
+ movdqa xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm0,xmm3
+ pslld xmm4,1
+ pxor xmm0,xmm2
+ movdqu [edx],xmm0
+ dec ecx
+ jz NEAR L$133done_key256
+ pshufd xmm2,xmm0,255
+ pxor xmm3,xmm3
+db 102,15,56,221,211
+ movdqa xmm3,xmm1
+ pslldq xmm1,4
+ pxor xmm3,xmm1
+ pslldq xmm1,4
+ pxor xmm3,xmm1
+ pslldq xmm1,4
+ pxor xmm1,xmm3
+ pxor xmm2,xmm1
+ movdqu [16+edx],xmm2
+ lea edx,[32+edx]
+ movdqa xmm1,xmm2
+ jmp NEAR L$132loop_key256
+L$133done_key256:
+ mov ecx,13
+ mov DWORD [16+edx],ecx
+L$120good_key:
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ xor eax,eax
+ pop ebx
+ pop ebp
+ ret
+align 4
+L$111bad_pointer:
+ mov eax,-1
+ pop ebx
+ pop ebp
+ ret
+align 4
+L$115bad_keybits:
+ pxor xmm0,xmm0
+ mov eax,-2
+ pop ebx
+ pop ebp
+ ret
+global _aesni_set_encrypt_key
+align 16
+_aesni_set_encrypt_key:
+L$_aesni_set_encrypt_key_begin:
+ mov eax,DWORD [4+esp]
+ mov ecx,DWORD [8+esp]
+ mov edx,DWORD [12+esp]
+ call __aesni_set_encrypt_key
+ ret
+global _aesni_set_decrypt_key
+align 16
+_aesni_set_decrypt_key:
+L$_aesni_set_decrypt_key_begin:
+ mov eax,DWORD [4+esp]
+ mov ecx,DWORD [8+esp]
+ mov edx,DWORD [12+esp]
+ call __aesni_set_encrypt_key
+ mov edx,DWORD [12+esp]
+ shl ecx,4
+ test eax,eax
+ jnz NEAR L$134dec_key_ret
+ lea eax,[16+ecx*1+edx]
+ movups xmm0,[edx]
+ movups xmm1,[eax]
+ movups [eax],xmm0
+ movups [edx],xmm1
+ lea edx,[16+edx]
+ lea eax,[eax-16]
+L$135dec_key_inverse:
+ movups xmm0,[edx]
+ movups xmm1,[eax]
+db 102,15,56,219,192
+db 102,15,56,219,201
+ lea edx,[16+edx]
+ lea eax,[eax-16]
+ movups [16+eax],xmm0
+ movups [edx-16],xmm1
+ cmp eax,edx
+ ja NEAR L$135dec_key_inverse
+ movups xmm0,[edx]
+db 102,15,56,219,192
+ movups [edx],xmm0
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ xor eax,eax
+L$134dec_key_ret:
+ ret
+align 64
+L$key_const:
+dd 202313229,202313229,202313229,202313229
+dd 67569157,67569157,67569157,67569157
+dd 1,1,1,1
+dd 27,27,27,27
+db 65,69,83,32,102,111,114,32,73,110,116,101,108,32,65,69
+db 83,45,78,73,44,32,67,82,89,80,84,79,71,65,77,83
+db 32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115
+db 115,108,46,111,114,103,62,0
+segment .bss
+common _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm
new file mode 100644
index 0000000000..5eecfdba3d
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm
@@ -0,0 +1,648 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+align 64
+L$_vpaes_consts:
+dd 218628480,235210255,168496130,67568393
+dd 252381056,17041926,33884169,51187212
+dd 252645135,252645135,252645135,252645135
+dd 1512730624,3266504856,1377990664,3401244816
+dd 830229760,1275146365,2969422977,3447763452
+dd 3411033600,2979783055,338359620,2782886510
+dd 4209124096,907596821,221174255,1006095553
+dd 191964160,3799684038,3164090317,1589111125
+dd 182528256,1777043520,2877432650,3265356744
+dd 1874708224,3503451415,3305285752,363511674
+dd 1606117888,3487855781,1093350906,2384367825
+dd 197121,67569157,134941193,202313229
+dd 67569157,134941193,202313229,197121
+dd 134941193,202313229,197121,67569157
+dd 202313229,197121,67569157,134941193
+dd 33619971,100992007,168364043,235736079
+dd 235736079,33619971,100992007,168364043
+dd 168364043,235736079,33619971,100992007
+dd 100992007,168364043,235736079,33619971
+dd 50462976,117835012,185207048,252579084
+dd 252314880,51251460,117574920,184942860
+dd 184682752,252054788,50987272,118359308
+dd 118099200,185467140,251790600,50727180
+dd 2946363062,528716217,1300004225,1881839624
+dd 1532713819,1532713819,1532713819,1532713819
+dd 3602276352,4288629033,3737020424,4153884961
+dd 1354558464,32357713,2958822624,3775749553
+dd 1201988352,132424512,1572796698,503232858
+dd 2213177600,1597421020,4103937655,675398315
+dd 2749646592,4273543773,1511898873,121693092
+dd 3040248576,1103263732,2871565598,1608280554
+dd 2236667136,2588920351,482954393,64377734
+dd 3069987328,291237287,2117370568,3650299247
+dd 533321216,3573750986,2572112006,1401264716
+dd 1339849704,2721158661,548607111,3445553514
+dd 2128193280,3054596040,2183486460,1257083700
+dd 655635200,1165381986,3923443150,2344132524
+dd 190078720,256924420,290342170,357187870
+dd 1610966272,2263057382,4103205268,309794674
+dd 2592527872,2233205587,1335446729,3402964816
+dd 3973531904,3225098121,3002836325,1918774430
+dd 3870401024,2102906079,2284471353,4117666579
+dd 617007872,1021508343,366931923,691083277
+dd 2528395776,3491914898,2968704004,1613121270
+dd 3445188352,3247741094,844474987,4093578302
+dd 651481088,1190302358,1689581232,574775300
+dd 4289380608,206939853,2555985458,2489840491
+dd 2130264064,327674451,3566485037,3349835193
+dd 2470714624,316102159,3636825756,3393945945
+db 86,101,99,116,111,114,32,80,101,114,109,117,116,97,116,105
+db 111,110,32,65,69,83,32,102,111,114,32,120,56,54,47,83
+db 83,83,69,51,44,32,77,105,107,101,32,72,97,109,98,117
+db 114,103,32,40,83,116,97,110,102,111,114,100,32,85,110,105
+db 118,101,114,115,105,116,121,41,0
+align 64
+align 16
+__vpaes_preheat:
+ add ebp,DWORD [esp]
+ movdqa xmm7,[ebp-48]
+ movdqa xmm6,[ebp-16]
+ ret
+align 16
+__vpaes_encrypt_core:
+ mov ecx,16
+ mov eax,DWORD [240+edx]
+ movdqa xmm1,xmm6
+ movdqa xmm2,[ebp]
+ pandn xmm1,xmm0
+ pand xmm0,xmm6
+ movdqu xmm5,[edx]
+db 102,15,56,0,208
+ movdqa xmm0,[16+ebp]
+ pxor xmm2,xmm5
+ psrld xmm1,4
+ add edx,16
+db 102,15,56,0,193
+ lea ebx,[192+ebp]
+ pxor xmm0,xmm2
+ jmp NEAR L$000enc_entry
+align 16
+L$001enc_loop:
+ movdqa xmm4,[32+ebp]
+ movdqa xmm0,[48+ebp]
+db 102,15,56,0,226
+db 102,15,56,0,195
+ pxor xmm4,xmm5
+ movdqa xmm5,[64+ebp]
+ pxor xmm0,xmm4
+ movdqa xmm1,[ecx*1+ebx-64]
+db 102,15,56,0,234
+ movdqa xmm2,[80+ebp]
+ movdqa xmm4,[ecx*1+ebx]
+db 102,15,56,0,211
+ movdqa xmm3,xmm0
+ pxor xmm2,xmm5
+db 102,15,56,0,193
+ add edx,16
+ pxor xmm0,xmm2
+db 102,15,56,0,220
+ add ecx,16
+ pxor xmm3,xmm0
+db 102,15,56,0,193
+ and ecx,48
+ sub eax,1
+ pxor xmm0,xmm3
+L$000enc_entry:
+ movdqa xmm1,xmm6
+ movdqa xmm5,[ebp-32]
+ pandn xmm1,xmm0
+ psrld xmm1,4
+ pand xmm0,xmm6
+db 102,15,56,0,232
+ movdqa xmm3,xmm7
+ pxor xmm0,xmm1
+db 102,15,56,0,217
+ movdqa xmm4,xmm7
+ pxor xmm3,xmm5
+db 102,15,56,0,224
+ movdqa xmm2,xmm7
+ pxor xmm4,xmm5
+db 102,15,56,0,211
+ movdqa xmm3,xmm7
+ pxor xmm2,xmm0
+db 102,15,56,0,220
+ movdqu xmm5,[edx]
+ pxor xmm3,xmm1
+ jnz NEAR L$001enc_loop
+ movdqa xmm4,[96+ebp]
+ movdqa xmm0,[112+ebp]
+db 102,15,56,0,226
+ pxor xmm4,xmm5
+db 102,15,56,0,195
+ movdqa xmm1,[64+ecx*1+ebx]
+ pxor xmm0,xmm4
+db 102,15,56,0,193
+ ret
+align 16
+__vpaes_decrypt_core:
+ lea ebx,[608+ebp]
+ mov eax,DWORD [240+edx]
+ movdqa xmm1,xmm6
+ movdqa xmm2,[ebx-64]
+ pandn xmm1,xmm0
+ mov ecx,eax
+ psrld xmm1,4
+ movdqu xmm5,[edx]
+ shl ecx,4
+ pand xmm0,xmm6
+db 102,15,56,0,208
+ movdqa xmm0,[ebx-48]
+ xor ecx,48
+db 102,15,56,0,193
+ and ecx,48
+ pxor xmm2,xmm5
+ movdqa xmm5,[176+ebp]
+ pxor xmm0,xmm2
+ add edx,16
+ lea ecx,[ecx*1+ebx-352]
+ jmp NEAR L$002dec_entry
+align 16
+L$003dec_loop:
+ movdqa xmm4,[ebx-32]
+ movdqa xmm1,[ebx-16]
+db 102,15,56,0,226
+db 102,15,56,0,203
+ pxor xmm0,xmm4
+ movdqa xmm4,[ebx]
+ pxor xmm0,xmm1
+ movdqa xmm1,[16+ebx]
+db 102,15,56,0,226
+db 102,15,56,0,197
+db 102,15,56,0,203
+ pxor xmm0,xmm4
+ movdqa xmm4,[32+ebx]
+ pxor xmm0,xmm1
+ movdqa xmm1,[48+ebx]
+db 102,15,56,0,226
+db 102,15,56,0,197
+db 102,15,56,0,203
+ pxor xmm0,xmm4
+ movdqa xmm4,[64+ebx]
+ pxor xmm0,xmm1
+ movdqa xmm1,[80+ebx]
+db 102,15,56,0,226
+db 102,15,56,0,197
+db 102,15,56,0,203
+ pxor xmm0,xmm4
+ add edx,16
+db 102,15,58,15,237,12
+ pxor xmm0,xmm1
+ sub eax,1
+L$002dec_entry:
+ movdqa xmm1,xmm6
+ movdqa xmm2,[ebp-32]
+ pandn xmm1,xmm0
+ pand xmm0,xmm6
+ psrld xmm1,4
+db 102,15,56,0,208
+ movdqa xmm3,xmm7
+ pxor xmm0,xmm1
+db 102,15,56,0,217
+ movdqa xmm4,xmm7
+ pxor xmm3,xmm2
+db 102,15,56,0,224
+ pxor xmm4,xmm2
+ movdqa xmm2,xmm7
+db 102,15,56,0,211
+ movdqa xmm3,xmm7
+ pxor xmm2,xmm0
+db 102,15,56,0,220
+ movdqu xmm0,[edx]
+ pxor xmm3,xmm1
+ jnz NEAR L$003dec_loop
+ movdqa xmm4,[96+ebx]
+db 102,15,56,0,226
+ pxor xmm4,xmm0
+ movdqa xmm0,[112+ebx]
+ movdqa xmm2,[ecx]
+db 102,15,56,0,195
+ pxor xmm0,xmm4
+db 102,15,56,0,194
+ ret
+align 16
+__vpaes_schedule_core:
+ add ebp,DWORD [esp]
+ movdqu xmm0,[esi]
+ movdqa xmm2,[320+ebp]
+ movdqa xmm3,xmm0
+ lea ebx,[ebp]
+ movdqa [4+esp],xmm2
+ call __vpaes_schedule_transform
+ movdqa xmm7,xmm0
+ test edi,edi
+ jnz NEAR L$004schedule_am_decrypting
+ movdqu [edx],xmm0
+ jmp NEAR L$005schedule_go
+L$004schedule_am_decrypting:
+ movdqa xmm1,[256+ecx*1+ebp]
+db 102,15,56,0,217
+ movdqu [edx],xmm3
+ xor ecx,48
+L$005schedule_go:
+ cmp eax,192
+ ja NEAR L$006schedule_256
+ je NEAR L$007schedule_192
+L$008schedule_128:
+ mov eax,10
+L$009loop_schedule_128:
+ call __vpaes_schedule_round
+ dec eax
+ jz NEAR L$010schedule_mangle_last
+ call __vpaes_schedule_mangle
+ jmp NEAR L$009loop_schedule_128
+align 16
+L$007schedule_192:
+ movdqu xmm0,[8+esi]
+ call __vpaes_schedule_transform
+ movdqa xmm6,xmm0
+ pxor xmm4,xmm4
+ movhlps xmm6,xmm4
+ mov eax,4
+L$011loop_schedule_192:
+ call __vpaes_schedule_round
+db 102,15,58,15,198,8
+ call __vpaes_schedule_mangle
+ call __vpaes_schedule_192_smear
+ call __vpaes_schedule_mangle
+ call __vpaes_schedule_round
+ dec eax
+ jz NEAR L$010schedule_mangle_last
+ call __vpaes_schedule_mangle
+ call __vpaes_schedule_192_smear
+ jmp NEAR L$011loop_schedule_192
+align 16
+L$006schedule_256:
+ movdqu xmm0,[16+esi]
+ call __vpaes_schedule_transform
+ mov eax,7
+L$012loop_schedule_256:
+ call __vpaes_schedule_mangle
+ movdqa xmm6,xmm0
+ call __vpaes_schedule_round
+ dec eax
+ jz NEAR L$010schedule_mangle_last
+ call __vpaes_schedule_mangle
+ pshufd xmm0,xmm0,255
+ movdqa [20+esp],xmm7
+ movdqa xmm7,xmm6
+ call L$_vpaes_schedule_low_round
+ movdqa xmm7,[20+esp]
+ jmp NEAR L$012loop_schedule_256
+align 16
+L$010schedule_mangle_last:
+ lea ebx,[384+ebp]
+ test edi,edi
+ jnz NEAR L$013schedule_mangle_last_dec
+ movdqa xmm1,[256+ecx*1+ebp]
+db 102,15,56,0,193
+ lea ebx,[352+ebp]
+ add edx,32
+L$013schedule_mangle_last_dec:
+ add edx,-16
+ pxor xmm0,[336+ebp]
+ call __vpaes_schedule_transform
+ movdqu [edx],xmm0
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ pxor xmm6,xmm6
+ pxor xmm7,xmm7
+ ret
+align 16
+__vpaes_schedule_192_smear:
+ pshufd xmm1,xmm6,128
+ pshufd xmm0,xmm7,254
+ pxor xmm6,xmm1
+ pxor xmm1,xmm1
+ pxor xmm6,xmm0
+ movdqa xmm0,xmm6
+ movhlps xmm6,xmm1
+ ret
+align 16
+__vpaes_schedule_round:
+ movdqa xmm2,[8+esp]
+ pxor xmm1,xmm1
+db 102,15,58,15,202,15
+db 102,15,58,15,210,15
+ pxor xmm7,xmm1
+ pshufd xmm0,xmm0,255
+db 102,15,58,15,192,1
+ movdqa [8+esp],xmm2
+L$_vpaes_schedule_low_round:
+ movdqa xmm1,xmm7
+ pslldq xmm7,4
+ pxor xmm7,xmm1
+ movdqa xmm1,xmm7
+ pslldq xmm7,8
+ pxor xmm7,xmm1
+ pxor xmm7,[336+ebp]
+ movdqa xmm4,[ebp-16]
+ movdqa xmm5,[ebp-48]
+ movdqa xmm1,xmm4
+ pandn xmm1,xmm0
+ psrld xmm1,4
+ pand xmm0,xmm4
+ movdqa xmm2,[ebp-32]
+db 102,15,56,0,208
+ pxor xmm0,xmm1
+ movdqa xmm3,xmm5
+db 102,15,56,0,217
+ pxor xmm3,xmm2
+ movdqa xmm4,xmm5
+db 102,15,56,0,224
+ pxor xmm4,xmm2
+ movdqa xmm2,xmm5
+db 102,15,56,0,211
+ pxor xmm2,xmm0
+ movdqa xmm3,xmm5
+db 102,15,56,0,220
+ pxor xmm3,xmm1
+ movdqa xmm4,[32+ebp]
+db 102,15,56,0,226
+ movdqa xmm0,[48+ebp]
+db 102,15,56,0,195
+ pxor xmm0,xmm4
+ pxor xmm0,xmm7
+ movdqa xmm7,xmm0
+ ret
+align 16
+__vpaes_schedule_transform:
+ movdqa xmm2,[ebp-16]
+ movdqa xmm1,xmm2
+ pandn xmm1,xmm0
+ psrld xmm1,4
+ pand xmm0,xmm2
+ movdqa xmm2,[ebx]
+db 102,15,56,0,208
+ movdqa xmm0,[16+ebx]
+db 102,15,56,0,193
+ pxor xmm0,xmm2
+ ret
+align 16
+__vpaes_schedule_mangle:
+ movdqa xmm4,xmm0
+ movdqa xmm5,[128+ebp]
+ test edi,edi
+ jnz NEAR L$014schedule_mangle_dec
+ add edx,16
+ pxor xmm4,[336+ebp]
+db 102,15,56,0,229
+ movdqa xmm3,xmm4
+db 102,15,56,0,229
+ pxor xmm3,xmm4
+db 102,15,56,0,229
+ pxor xmm3,xmm4
+ jmp NEAR L$015schedule_mangle_both
+align 16
+L$014schedule_mangle_dec:
+ movdqa xmm2,[ebp-16]
+ lea esi,[416+ebp]
+ movdqa xmm1,xmm2
+ pandn xmm1,xmm4
+ psrld xmm1,4
+ pand xmm4,xmm2
+ movdqa xmm2,[esi]
+db 102,15,56,0,212
+ movdqa xmm3,[16+esi]
+db 102,15,56,0,217
+ pxor xmm3,xmm2
+db 102,15,56,0,221
+ movdqa xmm2,[32+esi]
+db 102,15,56,0,212
+ pxor xmm2,xmm3
+ movdqa xmm3,[48+esi]
+db 102,15,56,0,217
+ pxor xmm3,xmm2
+db 102,15,56,0,221
+ movdqa xmm2,[64+esi]
+db 102,15,56,0,212
+ pxor xmm2,xmm3
+ movdqa xmm3,[80+esi]
+db 102,15,56,0,217
+ pxor xmm3,xmm2
+db 102,15,56,0,221
+ movdqa xmm2,[96+esi]
+db 102,15,56,0,212
+ pxor xmm2,xmm3
+ movdqa xmm3,[112+esi]
+db 102,15,56,0,217
+ pxor xmm3,xmm2
+ add edx,-16
+L$015schedule_mangle_both:
+ movdqa xmm1,[256+ecx*1+ebp]
+db 102,15,56,0,217
+ add ecx,-16
+ and ecx,48
+ movdqu [edx],xmm3
+ ret
+global _vpaes_set_encrypt_key
+align 16
+_vpaes_set_encrypt_key:
+L$_vpaes_set_encrypt_key_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ lea ebx,[esp-56]
+ mov eax,DWORD [24+esp]
+ and ebx,-16
+ mov edx,DWORD [28+esp]
+ xchg ebx,esp
+ mov DWORD [48+esp],ebx
+ mov ebx,eax
+ shr ebx,5
+ add ebx,5
+ mov DWORD [240+edx],ebx
+ mov ecx,48
+ mov edi,0
+ lea ebp,[(L$_vpaes_consts+0x30-L$016pic_point)]
+ call __vpaes_schedule_core
+L$016pic_point:
+ mov esp,DWORD [48+esp]
+ xor eax,eax
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _vpaes_set_decrypt_key
+align 16
+_vpaes_set_decrypt_key:
+L$_vpaes_set_decrypt_key_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ lea ebx,[esp-56]
+ mov eax,DWORD [24+esp]
+ and ebx,-16
+ mov edx,DWORD [28+esp]
+ xchg ebx,esp
+ mov DWORD [48+esp],ebx
+ mov ebx,eax
+ shr ebx,5
+ add ebx,5
+ mov DWORD [240+edx],ebx
+ shl ebx,4
+ lea edx,[16+ebx*1+edx]
+ mov edi,1
+ mov ecx,eax
+ shr ecx,1
+ and ecx,32
+ xor ecx,32
+ lea ebp,[(L$_vpaes_consts+0x30-L$017pic_point)]
+ call __vpaes_schedule_core
+L$017pic_point:
+ mov esp,DWORD [48+esp]
+ xor eax,eax
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _vpaes_encrypt
+align 16
+_vpaes_encrypt:
+L$_vpaes_encrypt_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ lea ebp,[(L$_vpaes_consts+0x30-L$018pic_point)]
+ call __vpaes_preheat
+L$018pic_point:
+ mov esi,DWORD [20+esp]
+ lea ebx,[esp-56]
+ mov edi,DWORD [24+esp]
+ and ebx,-16
+ mov edx,DWORD [28+esp]
+ xchg ebx,esp
+ mov DWORD [48+esp],ebx
+ movdqu xmm0,[esi]
+ call __vpaes_encrypt_core
+ movdqu [edi],xmm0
+ mov esp,DWORD [48+esp]
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _vpaes_decrypt
+align 16
+_vpaes_decrypt:
+L$_vpaes_decrypt_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ lea ebp,[(L$_vpaes_consts+0x30-L$019pic_point)]
+ call __vpaes_preheat
+L$019pic_point:
+ mov esi,DWORD [20+esp]
+ lea ebx,[esp-56]
+ mov edi,DWORD [24+esp]
+ and ebx,-16
+ mov edx,DWORD [28+esp]
+ xchg ebx,esp
+ mov DWORD [48+esp],ebx
+ movdqu xmm0,[esi]
+ call __vpaes_decrypt_core
+ movdqu [edi],xmm0
+ mov esp,DWORD [48+esp]
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _vpaes_cbc_encrypt
+align 16
+_vpaes_cbc_encrypt:
+L$_vpaes_cbc_encrypt_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ sub eax,16
+ jc NEAR L$020cbc_abort
+ lea ebx,[esp-56]
+ mov ebp,DWORD [36+esp]
+ and ebx,-16
+ mov ecx,DWORD [40+esp]
+ xchg ebx,esp
+ movdqu xmm1,[ebp]
+ sub edi,esi
+ mov DWORD [48+esp],ebx
+ mov DWORD [esp],edi
+ mov DWORD [4+esp],edx
+ mov DWORD [8+esp],ebp
+ mov edi,eax
+ lea ebp,[(L$_vpaes_consts+0x30-L$021pic_point)]
+ call __vpaes_preheat
+L$021pic_point:
+ cmp ecx,0
+ je NEAR L$022cbc_dec_loop
+ jmp NEAR L$023cbc_enc_loop
+align 16
+L$023cbc_enc_loop:
+ movdqu xmm0,[esi]
+ pxor xmm0,xmm1
+ call __vpaes_encrypt_core
+ mov ebx,DWORD [esp]
+ mov edx,DWORD [4+esp]
+ movdqa xmm1,xmm0
+ movdqu [esi*1+ebx],xmm0
+ lea esi,[16+esi]
+ sub edi,16
+ jnc NEAR L$023cbc_enc_loop
+ jmp NEAR L$024cbc_done
+align 16
+L$022cbc_dec_loop:
+ movdqu xmm0,[esi]
+ movdqa [16+esp],xmm1
+ movdqa [32+esp],xmm0
+ call __vpaes_decrypt_core
+ mov ebx,DWORD [esp]
+ mov edx,DWORD [4+esp]
+ pxor xmm0,[16+esp]
+ movdqa xmm1,[32+esp]
+ movdqu [esi*1+ebx],xmm0
+ lea esi,[16+esi]
+ sub edi,16
+ jnc NEAR L$022cbc_dec_loop
+L$024cbc_done:
+ mov ebx,DWORD [8+esp]
+ mov esp,DWORD [48+esp]
+ movdqu [ebx],xmm1
+L$020cbc_abort:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm
new file mode 100644
index 0000000000..75bba13387
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm
@@ -0,0 +1,1522 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+;extern _OPENSSL_ia32cap_P
+global _bn_mul_add_words
+align 16
+_bn_mul_add_words:
+L$_bn_mul_add_words_begin:
+ lea eax,[_OPENSSL_ia32cap_P]
+ bt DWORD [eax],26
+ jnc NEAR L$000maw_non_sse2
+ mov eax,DWORD [4+esp]
+ mov edx,DWORD [8+esp]
+ mov ecx,DWORD [12+esp]
+ movd mm0,DWORD [16+esp]
+ pxor mm1,mm1
+ jmp NEAR L$001maw_sse2_entry
+align 16
+L$002maw_sse2_unrolled:
+ movd mm3,DWORD [eax]
+ paddq mm1,mm3
+ movd mm2,DWORD [edx]
+ pmuludq mm2,mm0
+ movd mm4,DWORD [4+edx]
+ pmuludq mm4,mm0
+ movd mm6,DWORD [8+edx]
+ pmuludq mm6,mm0
+ movd mm7,DWORD [12+edx]
+ pmuludq mm7,mm0
+ paddq mm1,mm2
+ movd mm3,DWORD [4+eax]
+ paddq mm3,mm4
+ movd mm5,DWORD [8+eax]
+ paddq mm5,mm6
+ movd mm4,DWORD [12+eax]
+ paddq mm7,mm4
+ movd DWORD [eax],mm1
+ movd mm2,DWORD [16+edx]
+ pmuludq mm2,mm0
+ psrlq mm1,32
+ movd mm4,DWORD [20+edx]
+ pmuludq mm4,mm0
+ paddq mm1,mm3
+ movd mm6,DWORD [24+edx]
+ pmuludq mm6,mm0
+ movd DWORD [4+eax],mm1
+ psrlq mm1,32
+ movd mm3,DWORD [28+edx]
+ add edx,32
+ pmuludq mm3,mm0
+ paddq mm1,mm5
+ movd mm5,DWORD [16+eax]
+ paddq mm2,mm5
+ movd DWORD [8+eax],mm1
+ psrlq mm1,32
+ paddq mm1,mm7
+ movd mm5,DWORD [20+eax]
+ paddq mm4,mm5
+ movd DWORD [12+eax],mm1
+ psrlq mm1,32
+ paddq mm1,mm2
+ movd mm5,DWORD [24+eax]
+ paddq mm6,mm5
+ movd DWORD [16+eax],mm1
+ psrlq mm1,32
+ paddq mm1,mm4
+ movd mm5,DWORD [28+eax]
+ paddq mm3,mm5
+ movd DWORD [20+eax],mm1
+ psrlq mm1,32
+ paddq mm1,mm6
+ movd DWORD [24+eax],mm1
+ psrlq mm1,32
+ paddq mm1,mm3
+ movd DWORD [28+eax],mm1
+ lea eax,[32+eax]
+ psrlq mm1,32
+ sub ecx,8
+ jz NEAR L$003maw_sse2_exit
+L$001maw_sse2_entry:
+ test ecx,4294967288
+ jnz NEAR L$002maw_sse2_unrolled
+align 4
+L$004maw_sse2_loop:
+ movd mm2,DWORD [edx]
+ movd mm3,DWORD [eax]
+ pmuludq mm2,mm0
+ lea edx,[4+edx]
+ paddq mm1,mm3
+ paddq mm1,mm2
+ movd DWORD [eax],mm1
+ sub ecx,1
+ psrlq mm1,32
+ lea eax,[4+eax]
+ jnz NEAR L$004maw_sse2_loop
+L$003maw_sse2_exit:
+ movd eax,mm1
+ emms
+ ret
+align 16
+L$000maw_non_sse2:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ ;
+ xor esi,esi
+ mov edi,DWORD [20+esp]
+ mov ecx,DWORD [28+esp]
+ mov ebx,DWORD [24+esp]
+ and ecx,4294967288
+ mov ebp,DWORD [32+esp]
+ push ecx
+ jz NEAR L$005maw_finish
+align 16
+L$006maw_loop:
+ ; Round 0
+ mov eax,DWORD [ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [edi]
+ adc edx,0
+ mov DWORD [edi],eax
+ mov esi,edx
+ ; Round 4
+ mov eax,DWORD [4+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [4+edi]
+ adc edx,0
+ mov DWORD [4+edi],eax
+ mov esi,edx
+ ; Round 8
+ mov eax,DWORD [8+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [8+edi]
+ adc edx,0
+ mov DWORD [8+edi],eax
+ mov esi,edx
+ ; Round 12
+ mov eax,DWORD [12+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [12+edi]
+ adc edx,0
+ mov DWORD [12+edi],eax
+ mov esi,edx
+ ; Round 16
+ mov eax,DWORD [16+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [16+edi]
+ adc edx,0
+ mov DWORD [16+edi],eax
+ mov esi,edx
+ ; Round 20
+ mov eax,DWORD [20+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [20+edi]
+ adc edx,0
+ mov DWORD [20+edi],eax
+ mov esi,edx
+ ; Round 24
+ mov eax,DWORD [24+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [24+edi]
+ adc edx,0
+ mov DWORD [24+edi],eax
+ mov esi,edx
+ ; Round 28
+ mov eax,DWORD [28+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [28+edi]
+ adc edx,0
+ mov DWORD [28+edi],eax
+ mov esi,edx
+ ;
+ sub ecx,8
+ lea ebx,[32+ebx]
+ lea edi,[32+edi]
+ jnz NEAR L$006maw_loop
+L$005maw_finish:
+ mov ecx,DWORD [32+esp]
+ and ecx,7
+ jnz NEAR L$007maw_finish2
+ jmp NEAR L$008maw_end
+L$007maw_finish2:
+ ; Tail Round 0
+ mov eax,DWORD [ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [edi]
+ adc edx,0
+ dec ecx
+ mov DWORD [edi],eax
+ mov esi,edx
+ jz NEAR L$008maw_end
+ ; Tail Round 1
+ mov eax,DWORD [4+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [4+edi]
+ adc edx,0
+ dec ecx
+ mov DWORD [4+edi],eax
+ mov esi,edx
+ jz NEAR L$008maw_end
+ ; Tail Round 2
+ mov eax,DWORD [8+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [8+edi]
+ adc edx,0
+ dec ecx
+ mov DWORD [8+edi],eax
+ mov esi,edx
+ jz NEAR L$008maw_end
+ ; Tail Round 3
+ mov eax,DWORD [12+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [12+edi]
+ adc edx,0
+ dec ecx
+ mov DWORD [12+edi],eax
+ mov esi,edx
+ jz NEAR L$008maw_end
+ ; Tail Round 4
+ mov eax,DWORD [16+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [16+edi]
+ adc edx,0
+ dec ecx
+ mov DWORD [16+edi],eax
+ mov esi,edx
+ jz NEAR L$008maw_end
+ ; Tail Round 5
+ mov eax,DWORD [20+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [20+edi]
+ adc edx,0
+ dec ecx
+ mov DWORD [20+edi],eax
+ mov esi,edx
+ jz NEAR L$008maw_end
+ ; Tail Round 6
+ mov eax,DWORD [24+ebx]
+ mul ebp
+ add eax,esi
+ adc edx,0
+ add eax,DWORD [24+edi]
+ adc edx,0
+ mov DWORD [24+edi],eax
+ mov esi,edx
+L$008maw_end:
+ mov eax,esi
+ pop ecx
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _bn_mul_words
+align 16
+_bn_mul_words:
+L$_bn_mul_words_begin:
+ lea eax,[_OPENSSL_ia32cap_P]
+ bt DWORD [eax],26
+ jnc NEAR L$009mw_non_sse2
+ mov eax,DWORD [4+esp]
+ mov edx,DWORD [8+esp]
+ mov ecx,DWORD [12+esp]
+ movd mm0,DWORD [16+esp]
+ pxor mm1,mm1
+align 16
+L$010mw_sse2_loop:
+ movd mm2,DWORD [edx]
+ pmuludq mm2,mm0
+ lea edx,[4+edx]
+ paddq mm1,mm2
+ movd DWORD [eax],mm1
+ sub ecx,1
+ psrlq mm1,32
+ lea eax,[4+eax]
+ jnz NEAR L$010mw_sse2_loop
+ movd eax,mm1
+ emms
+ ret
+align 16
+L$009mw_non_sse2:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ ;
+ xor esi,esi
+ mov edi,DWORD [20+esp]
+ mov ebx,DWORD [24+esp]
+ mov ebp,DWORD [28+esp]
+ mov ecx,DWORD [32+esp]
+ and ebp,4294967288
+ jz NEAR L$011mw_finish
+L$012mw_loop:
+ ; Round 0
+ mov eax,DWORD [ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [edi],eax
+ mov esi,edx
+ ; Round 4
+ mov eax,DWORD [4+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [4+edi],eax
+ mov esi,edx
+ ; Round 8
+ mov eax,DWORD [8+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [8+edi],eax
+ mov esi,edx
+ ; Round 12
+ mov eax,DWORD [12+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [12+edi],eax
+ mov esi,edx
+ ; Round 16
+ mov eax,DWORD [16+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [16+edi],eax
+ mov esi,edx
+ ; Round 20
+ mov eax,DWORD [20+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [20+edi],eax
+ mov esi,edx
+ ; Round 24
+ mov eax,DWORD [24+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [24+edi],eax
+ mov esi,edx
+ ; Round 28
+ mov eax,DWORD [28+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [28+edi],eax
+ mov esi,edx
+ ;
+ add ebx,32
+ add edi,32
+ sub ebp,8
+ jz NEAR L$011mw_finish
+ jmp NEAR L$012mw_loop
+L$011mw_finish:
+ mov ebp,DWORD [28+esp]
+ and ebp,7
+ jnz NEAR L$013mw_finish2
+ jmp NEAR L$014mw_end
+L$013mw_finish2:
+ ; Tail Round 0
+ mov eax,DWORD [ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [edi],eax
+ mov esi,edx
+ dec ebp
+ jz NEAR L$014mw_end
+ ; Tail Round 1
+ mov eax,DWORD [4+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [4+edi],eax
+ mov esi,edx
+ dec ebp
+ jz NEAR L$014mw_end
+ ; Tail Round 2
+ mov eax,DWORD [8+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [8+edi],eax
+ mov esi,edx
+ dec ebp
+ jz NEAR L$014mw_end
+ ; Tail Round 3
+ mov eax,DWORD [12+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [12+edi],eax
+ mov esi,edx
+ dec ebp
+ jz NEAR L$014mw_end
+ ; Tail Round 4
+ mov eax,DWORD [16+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [16+edi],eax
+ mov esi,edx
+ dec ebp
+ jz NEAR L$014mw_end
+ ; Tail Round 5
+ mov eax,DWORD [20+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [20+edi],eax
+ mov esi,edx
+ dec ebp
+ jz NEAR L$014mw_end
+ ; Tail Round 6
+ mov eax,DWORD [24+ebx]
+ mul ecx
+ add eax,esi
+ adc edx,0
+ mov DWORD [24+edi],eax
+ mov esi,edx
+L$014mw_end:
+ mov eax,esi
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _bn_sqr_words
+align 16
+_bn_sqr_words:
+L$_bn_sqr_words_begin:
+ lea eax,[_OPENSSL_ia32cap_P]
+ bt DWORD [eax],26
+ jnc NEAR L$015sqr_non_sse2
+ mov eax,DWORD [4+esp]
+ mov edx,DWORD [8+esp]
+ mov ecx,DWORD [12+esp]
+align 16
+L$016sqr_sse2_loop:
+ movd mm0,DWORD [edx]
+ pmuludq mm0,mm0
+ lea edx,[4+edx]
+ movq [eax],mm0
+ sub ecx,1
+ lea eax,[8+eax]
+ jnz NEAR L$016sqr_sse2_loop
+ emms
+ ret
+align 16
+L$015sqr_non_sse2:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ ;
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov ebx,DWORD [28+esp]
+ and ebx,4294967288
+ jz NEAR L$017sw_finish
+L$018sw_loop:
+ ; Round 0
+ mov eax,DWORD [edi]
+ mul eax
+ mov DWORD [esi],eax
+ mov DWORD [4+esi],edx
+ ; Round 4
+ mov eax,DWORD [4+edi]
+ mul eax
+ mov DWORD [8+esi],eax
+ mov DWORD [12+esi],edx
+ ; Round 8
+ mov eax,DWORD [8+edi]
+ mul eax
+ mov DWORD [16+esi],eax
+ mov DWORD [20+esi],edx
+ ; Round 12
+ mov eax,DWORD [12+edi]
+ mul eax
+ mov DWORD [24+esi],eax
+ mov DWORD [28+esi],edx
+ ; Round 16
+ mov eax,DWORD [16+edi]
+ mul eax
+ mov DWORD [32+esi],eax
+ mov DWORD [36+esi],edx
+ ; Round 20
+ mov eax,DWORD [20+edi]
+ mul eax
+ mov DWORD [40+esi],eax
+ mov DWORD [44+esi],edx
+ ; Round 24
+ mov eax,DWORD [24+edi]
+ mul eax
+ mov DWORD [48+esi],eax
+ mov DWORD [52+esi],edx
+ ; Round 28
+ mov eax,DWORD [28+edi]
+ mul eax
+ mov DWORD [56+esi],eax
+ mov DWORD [60+esi],edx
+ ;
+ add edi,32
+ add esi,64
+ sub ebx,8
+ jnz NEAR L$018sw_loop
+L$017sw_finish:
+ mov ebx,DWORD [28+esp]
+ and ebx,7
+ jz NEAR L$019sw_end
+ ; Tail Round 0
+ mov eax,DWORD [edi]
+ mul eax
+ mov DWORD [esi],eax
+ dec ebx
+ mov DWORD [4+esi],edx
+ jz NEAR L$019sw_end
+ ; Tail Round 1
+ mov eax,DWORD [4+edi]
+ mul eax
+ mov DWORD [8+esi],eax
+ dec ebx
+ mov DWORD [12+esi],edx
+ jz NEAR L$019sw_end
+ ; Tail Round 2
+ mov eax,DWORD [8+edi]
+ mul eax
+ mov DWORD [16+esi],eax
+ dec ebx
+ mov DWORD [20+esi],edx
+ jz NEAR L$019sw_end
+ ; Tail Round 3
+ mov eax,DWORD [12+edi]
+ mul eax
+ mov DWORD [24+esi],eax
+ dec ebx
+ mov DWORD [28+esi],edx
+ jz NEAR L$019sw_end
+ ; Tail Round 4
+ mov eax,DWORD [16+edi]
+ mul eax
+ mov DWORD [32+esi],eax
+ dec ebx
+ mov DWORD [36+esi],edx
+ jz NEAR L$019sw_end
+ ; Tail Round 5
+ mov eax,DWORD [20+edi]
+ mul eax
+ mov DWORD [40+esi],eax
+ dec ebx
+ mov DWORD [44+esi],edx
+ jz NEAR L$019sw_end
+ ; Tail Round 6
+ mov eax,DWORD [24+edi]
+ mul eax
+ mov DWORD [48+esi],eax
+ mov DWORD [52+esi],edx
+L$019sw_end:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _bn_div_words
+align 16
+_bn_div_words:
+L$_bn_div_words_begin:
+ mov edx,DWORD [4+esp]
+ mov eax,DWORD [8+esp]
+ mov ecx,DWORD [12+esp]
+ div ecx
+ ret
+global _bn_add_words
+align 16
+_bn_add_words:
+L$_bn_add_words_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ ;
+ mov ebx,DWORD [20+esp]
+ mov esi,DWORD [24+esp]
+ mov edi,DWORD [28+esp]
+ mov ebp,DWORD [32+esp]
+ xor eax,eax
+ and ebp,4294967288
+ jz NEAR L$020aw_finish
+L$021aw_loop:
+ ; Round 0
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ ; Round 1
+ mov ecx,DWORD [4+esi]
+ mov edx,DWORD [4+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ mov DWORD [4+ebx],ecx
+ ; Round 2
+ mov ecx,DWORD [8+esi]
+ mov edx,DWORD [8+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ mov DWORD [8+ebx],ecx
+ ; Round 3
+ mov ecx,DWORD [12+esi]
+ mov edx,DWORD [12+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ mov DWORD [12+ebx],ecx
+ ; Round 4
+ mov ecx,DWORD [16+esi]
+ mov edx,DWORD [16+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ mov DWORD [16+ebx],ecx
+ ; Round 5
+ mov ecx,DWORD [20+esi]
+ mov edx,DWORD [20+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ mov DWORD [20+ebx],ecx
+ ; Round 6
+ mov ecx,DWORD [24+esi]
+ mov edx,DWORD [24+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ mov DWORD [24+ebx],ecx
+ ; Round 7
+ mov ecx,DWORD [28+esi]
+ mov edx,DWORD [28+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ mov DWORD [28+ebx],ecx
+ ;
+ add esi,32
+ add edi,32
+ add ebx,32
+ sub ebp,8
+ jnz NEAR L$021aw_loop
+L$020aw_finish:
+ mov ebp,DWORD [32+esp]
+ and ebp,7
+ jz NEAR L$022aw_end
+ ; Tail Round 0
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [ebx],ecx
+ jz NEAR L$022aw_end
+ ; Tail Round 1
+ mov ecx,DWORD [4+esi]
+ mov edx,DWORD [4+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [4+ebx],ecx
+ jz NEAR L$022aw_end
+ ; Tail Round 2
+ mov ecx,DWORD [8+esi]
+ mov edx,DWORD [8+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [8+ebx],ecx
+ jz NEAR L$022aw_end
+ ; Tail Round 3
+ mov ecx,DWORD [12+esi]
+ mov edx,DWORD [12+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [12+ebx],ecx
+ jz NEAR L$022aw_end
+ ; Tail Round 4
+ mov ecx,DWORD [16+esi]
+ mov edx,DWORD [16+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [16+ebx],ecx
+ jz NEAR L$022aw_end
+ ; Tail Round 5
+ mov ecx,DWORD [20+esi]
+ mov edx,DWORD [20+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [20+ebx],ecx
+ jz NEAR L$022aw_end
+ ; Tail Round 6
+ mov ecx,DWORD [24+esi]
+ mov edx,DWORD [24+edi]
+ add ecx,eax
+ mov eax,0
+ adc eax,eax
+ add ecx,edx
+ adc eax,0
+ mov DWORD [24+ebx],ecx
+L$022aw_end:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _bn_sub_words
+align 16
+_bn_sub_words:
+L$_bn_sub_words_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ ;
+ mov ebx,DWORD [20+esp]
+ mov esi,DWORD [24+esp]
+ mov edi,DWORD [28+esp]
+ mov ebp,DWORD [32+esp]
+ xor eax,eax
+ and ebp,4294967288
+ jz NEAR L$023aw_finish
+L$024aw_loop:
+ ; Round 0
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ ; Round 1
+ mov ecx,DWORD [4+esi]
+ mov edx,DWORD [4+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [4+ebx],ecx
+ ; Round 2
+ mov ecx,DWORD [8+esi]
+ mov edx,DWORD [8+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [8+ebx],ecx
+ ; Round 3
+ mov ecx,DWORD [12+esi]
+ mov edx,DWORD [12+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [12+ebx],ecx
+ ; Round 4
+ mov ecx,DWORD [16+esi]
+ mov edx,DWORD [16+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [16+ebx],ecx
+ ; Round 5
+ mov ecx,DWORD [20+esi]
+ mov edx,DWORD [20+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [20+ebx],ecx
+ ; Round 6
+ mov ecx,DWORD [24+esi]
+ mov edx,DWORD [24+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [24+ebx],ecx
+ ; Round 7
+ mov ecx,DWORD [28+esi]
+ mov edx,DWORD [28+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [28+ebx],ecx
+ ;
+ add esi,32
+ add edi,32
+ add ebx,32
+ sub ebp,8
+ jnz NEAR L$024aw_loop
+L$023aw_finish:
+ mov ebp,DWORD [32+esp]
+ and ebp,7
+ jz NEAR L$025aw_end
+ ; Tail Round 0
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [ebx],ecx
+ jz NEAR L$025aw_end
+ ; Tail Round 1
+ mov ecx,DWORD [4+esi]
+ mov edx,DWORD [4+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [4+ebx],ecx
+ jz NEAR L$025aw_end
+ ; Tail Round 2
+ mov ecx,DWORD [8+esi]
+ mov edx,DWORD [8+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [8+ebx],ecx
+ jz NEAR L$025aw_end
+ ; Tail Round 3
+ mov ecx,DWORD [12+esi]
+ mov edx,DWORD [12+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [12+ebx],ecx
+ jz NEAR L$025aw_end
+ ; Tail Round 4
+ mov ecx,DWORD [16+esi]
+ mov edx,DWORD [16+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [16+ebx],ecx
+ jz NEAR L$025aw_end
+ ; Tail Round 5
+ mov ecx,DWORD [20+esi]
+ mov edx,DWORD [20+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [20+ebx],ecx
+ jz NEAR L$025aw_end
+ ; Tail Round 6
+ mov ecx,DWORD [24+esi]
+ mov edx,DWORD [24+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [24+ebx],ecx
+L$025aw_end:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _bn_sub_part_words
+align 16
+_bn_sub_part_words:
+L$_bn_sub_part_words_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ ;
+ mov ebx,DWORD [20+esp]
+ mov esi,DWORD [24+esp]
+ mov edi,DWORD [28+esp]
+ mov ebp,DWORD [32+esp]
+ xor eax,eax
+ and ebp,4294967288
+ jz NEAR L$026aw_finish
+L$027aw_loop:
+ ; Round 0
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ ; Round 1
+ mov ecx,DWORD [4+esi]
+ mov edx,DWORD [4+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [4+ebx],ecx
+ ; Round 2
+ mov ecx,DWORD [8+esi]
+ mov edx,DWORD [8+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [8+ebx],ecx
+ ; Round 3
+ mov ecx,DWORD [12+esi]
+ mov edx,DWORD [12+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [12+ebx],ecx
+ ; Round 4
+ mov ecx,DWORD [16+esi]
+ mov edx,DWORD [16+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [16+ebx],ecx
+ ; Round 5
+ mov ecx,DWORD [20+esi]
+ mov edx,DWORD [20+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [20+ebx],ecx
+ ; Round 6
+ mov ecx,DWORD [24+esi]
+ mov edx,DWORD [24+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [24+ebx],ecx
+ ; Round 7
+ mov ecx,DWORD [28+esi]
+ mov edx,DWORD [28+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [28+ebx],ecx
+ ;
+ add esi,32
+ add edi,32
+ add ebx,32
+ sub ebp,8
+ jnz NEAR L$027aw_loop
+L$026aw_finish:
+ mov ebp,DWORD [32+esp]
+ and ebp,7
+ jz NEAR L$028aw_end
+ ; Tail Round 0
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ add esi,4
+ add edi,4
+ add ebx,4
+ dec ebp
+ jz NEAR L$028aw_end
+ ; Tail Round 1
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ add esi,4
+ add edi,4
+ add ebx,4
+ dec ebp
+ jz NEAR L$028aw_end
+ ; Tail Round 2
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ add esi,4
+ add edi,4
+ add ebx,4
+ dec ebp
+ jz NEAR L$028aw_end
+ ; Tail Round 3
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ add esi,4
+ add edi,4
+ add ebx,4
+ dec ebp
+ jz NEAR L$028aw_end
+ ; Tail Round 4
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ add esi,4
+ add edi,4
+ add ebx,4
+ dec ebp
+ jz NEAR L$028aw_end
+ ; Tail Round 5
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ add esi,4
+ add edi,4
+ add ebx,4
+ dec ebp
+ jz NEAR L$028aw_end
+ ; Tail Round 6
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ add esi,4
+ add edi,4
+ add ebx,4
+L$028aw_end:
+ cmp DWORD [36+esp],0
+ je NEAR L$029pw_end
+ mov ebp,DWORD [36+esp]
+ cmp ebp,0
+ je NEAR L$029pw_end
+ jge NEAR L$030pw_pos
+ ; pw_neg
+ mov edx,0
+ sub edx,ebp
+ mov ebp,edx
+ and ebp,4294967288
+ jz NEAR L$031pw_neg_finish
+L$032pw_neg_loop:
+ ; dl<0 Round 0
+ mov ecx,0
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [ebx],ecx
+ ; dl<0 Round 1
+ mov ecx,0
+ mov edx,DWORD [4+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [4+ebx],ecx
+ ; dl<0 Round 2
+ mov ecx,0
+ mov edx,DWORD [8+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [8+ebx],ecx
+ ; dl<0 Round 3
+ mov ecx,0
+ mov edx,DWORD [12+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [12+ebx],ecx
+ ; dl<0 Round 4
+ mov ecx,0
+ mov edx,DWORD [16+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [16+ebx],ecx
+ ; dl<0 Round 5
+ mov ecx,0
+ mov edx,DWORD [20+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [20+ebx],ecx
+ ; dl<0 Round 6
+ mov ecx,0
+ mov edx,DWORD [24+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [24+ebx],ecx
+ ; dl<0 Round 7
+ mov ecx,0
+ mov edx,DWORD [28+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [28+ebx],ecx
+ ;
+ add edi,32
+ add ebx,32
+ sub ebp,8
+ jnz NEAR L$032pw_neg_loop
+L$031pw_neg_finish:
+ mov edx,DWORD [36+esp]
+ mov ebp,0
+ sub ebp,edx
+ and ebp,7
+ jz NEAR L$029pw_end
+ ; dl<0 Tail Round 0
+ mov ecx,0
+ mov edx,DWORD [edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [ebx],ecx
+ jz NEAR L$029pw_end
+ ; dl<0 Tail Round 1
+ mov ecx,0
+ mov edx,DWORD [4+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [4+ebx],ecx
+ jz NEAR L$029pw_end
+ ; dl<0 Tail Round 2
+ mov ecx,0
+ mov edx,DWORD [8+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [8+ebx],ecx
+ jz NEAR L$029pw_end
+ ; dl<0 Tail Round 3
+ mov ecx,0
+ mov edx,DWORD [12+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [12+ebx],ecx
+ jz NEAR L$029pw_end
+ ; dl<0 Tail Round 4
+ mov ecx,0
+ mov edx,DWORD [16+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [16+ebx],ecx
+ jz NEAR L$029pw_end
+ ; dl<0 Tail Round 5
+ mov ecx,0
+ mov edx,DWORD [20+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ dec ebp
+ mov DWORD [20+ebx],ecx
+ jz NEAR L$029pw_end
+ ; dl<0 Tail Round 6
+ mov ecx,0
+ mov edx,DWORD [24+edi]
+ sub ecx,eax
+ mov eax,0
+ adc eax,eax
+ sub ecx,edx
+ adc eax,0
+ mov DWORD [24+ebx],ecx
+ jmp NEAR L$029pw_end
+L$030pw_pos:
+ and ebp,4294967288
+ jz NEAR L$033pw_pos_finish
+L$034pw_pos_loop:
+ ; dl>0 Round 0
+ mov ecx,DWORD [esi]
+ sub ecx,eax
+ mov DWORD [ebx],ecx
+ jnc NEAR L$035pw_nc0
+ ; dl>0 Round 1
+ mov ecx,DWORD [4+esi]
+ sub ecx,eax
+ mov DWORD [4+ebx],ecx
+ jnc NEAR L$036pw_nc1
+ ; dl>0 Round 2
+ mov ecx,DWORD [8+esi]
+ sub ecx,eax
+ mov DWORD [8+ebx],ecx
+ jnc NEAR L$037pw_nc2
+ ; dl>0 Round 3
+ mov ecx,DWORD [12+esi]
+ sub ecx,eax
+ mov DWORD [12+ebx],ecx
+ jnc NEAR L$038pw_nc3
+ ; dl>0 Round 4
+ mov ecx,DWORD [16+esi]
+ sub ecx,eax
+ mov DWORD [16+ebx],ecx
+ jnc NEAR L$039pw_nc4
+ ; dl>0 Round 5
+ mov ecx,DWORD [20+esi]
+ sub ecx,eax
+ mov DWORD [20+ebx],ecx
+ jnc NEAR L$040pw_nc5
+ ; dl>0 Round 6
+ mov ecx,DWORD [24+esi]
+ sub ecx,eax
+ mov DWORD [24+ebx],ecx
+ jnc NEAR L$041pw_nc6
+ ; dl>0 Round 7
+ mov ecx,DWORD [28+esi]
+ sub ecx,eax
+ mov DWORD [28+ebx],ecx
+ jnc NEAR L$042pw_nc7
+ ;
+ add esi,32
+ add ebx,32
+ sub ebp,8
+ jnz NEAR L$034pw_pos_loop
+L$033pw_pos_finish:
+ mov ebp,DWORD [36+esp]
+ and ebp,7
+ jz NEAR L$029pw_end
+ ; dl>0 Tail Round 0
+ mov ecx,DWORD [esi]
+ sub ecx,eax
+ mov DWORD [ebx],ecx
+ jnc NEAR L$043pw_tail_nc0
+ dec ebp
+ jz NEAR L$029pw_end
+ ; dl>0 Tail Round 1
+ mov ecx,DWORD [4+esi]
+ sub ecx,eax
+ mov DWORD [4+ebx],ecx
+ jnc NEAR L$044pw_tail_nc1
+ dec ebp
+ jz NEAR L$029pw_end
+ ; dl>0 Tail Round 2
+ mov ecx,DWORD [8+esi]
+ sub ecx,eax
+ mov DWORD [8+ebx],ecx
+ jnc NEAR L$045pw_tail_nc2
+ dec ebp
+ jz NEAR L$029pw_end
+ ; dl>0 Tail Round 3
+ mov ecx,DWORD [12+esi]
+ sub ecx,eax
+ mov DWORD [12+ebx],ecx
+ jnc NEAR L$046pw_tail_nc3
+ dec ebp
+ jz NEAR L$029pw_end
+ ; dl>0 Tail Round 4
+ mov ecx,DWORD [16+esi]
+ sub ecx,eax
+ mov DWORD [16+ebx],ecx
+ jnc NEAR L$047pw_tail_nc4
+ dec ebp
+ jz NEAR L$029pw_end
+ ; dl>0 Tail Round 5
+ mov ecx,DWORD [20+esi]
+ sub ecx,eax
+ mov DWORD [20+ebx],ecx
+ jnc NEAR L$048pw_tail_nc5
+ dec ebp
+ jz NEAR L$029pw_end
+ ; dl>0 Tail Round 6
+ mov ecx,DWORD [24+esi]
+ sub ecx,eax
+ mov DWORD [24+ebx],ecx
+ jnc NEAR L$049pw_tail_nc6
+ mov eax,1
+ jmp NEAR L$029pw_end
+L$050pw_nc_loop:
+ mov ecx,DWORD [esi]
+ mov DWORD [ebx],ecx
+L$035pw_nc0:
+ mov ecx,DWORD [4+esi]
+ mov DWORD [4+ebx],ecx
+L$036pw_nc1:
+ mov ecx,DWORD [8+esi]
+ mov DWORD [8+ebx],ecx
+L$037pw_nc2:
+ mov ecx,DWORD [12+esi]
+ mov DWORD [12+ebx],ecx
+L$038pw_nc3:
+ mov ecx,DWORD [16+esi]
+ mov DWORD [16+ebx],ecx
+L$039pw_nc4:
+ mov ecx,DWORD [20+esi]
+ mov DWORD [20+ebx],ecx
+L$040pw_nc5:
+ mov ecx,DWORD [24+esi]
+ mov DWORD [24+ebx],ecx
+L$041pw_nc6:
+ mov ecx,DWORD [28+esi]
+ mov DWORD [28+ebx],ecx
+L$042pw_nc7:
+ ;
+ add esi,32
+ add ebx,32
+ sub ebp,8
+ jnz NEAR L$050pw_nc_loop
+ mov ebp,DWORD [36+esp]
+ and ebp,7
+ jz NEAR L$051pw_nc_end
+ mov ecx,DWORD [esi]
+ mov DWORD [ebx],ecx
+L$043pw_tail_nc0:
+ dec ebp
+ jz NEAR L$051pw_nc_end
+ mov ecx,DWORD [4+esi]
+ mov DWORD [4+ebx],ecx
+L$044pw_tail_nc1:
+ dec ebp
+ jz NEAR L$051pw_nc_end
+ mov ecx,DWORD [8+esi]
+ mov DWORD [8+ebx],ecx
+L$045pw_tail_nc2:
+ dec ebp
+ jz NEAR L$051pw_nc_end
+ mov ecx,DWORD [12+esi]
+ mov DWORD [12+ebx],ecx
+L$046pw_tail_nc3:
+ dec ebp
+ jz NEAR L$051pw_nc_end
+ mov ecx,DWORD [16+esi]
+ mov DWORD [16+ebx],ecx
+L$047pw_tail_nc4:
+ dec ebp
+ jz NEAR L$051pw_nc_end
+ mov ecx,DWORD [20+esi]
+ mov DWORD [20+ebx],ecx
+L$048pw_tail_nc5:
+ dec ebp
+ jz NEAR L$051pw_nc_end
+ mov ecx,DWORD [24+esi]
+ mov DWORD [24+ebx],ecx
+L$049pw_tail_nc6:
+L$051pw_nc_end:
+ mov eax,0
+L$029pw_end:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+segment .bss
+common _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm
new file mode 100644
index 0000000000..08eb9fe372
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm
@@ -0,0 +1,1259 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+global _bn_mul_comba8
+align 16
+_bn_mul_comba8:
+L$_bn_mul_comba8_begin:
+ push esi
+ mov esi,DWORD [12+esp]
+ push edi
+ mov edi,DWORD [20+esp]
+ push ebp
+ push ebx
+ xor ebx,ebx
+ mov eax,DWORD [esi]
+ xor ecx,ecx
+ mov edx,DWORD [edi]
+ ; ################## Calculate word 0
+ xor ebp,ebp
+ ; mul a[0]*b[0]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esp]
+ adc ecx,edx
+ mov edx,DWORD [edi]
+ adc ebp,0
+ mov DWORD [eax],ebx
+ mov eax,DWORD [4+esi]
+ ; saved r[0]
+ ; ################## Calculate word 1
+ xor ebx,ebx
+ ; mul a[1]*b[0]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [esi]
+ adc ebp,edx
+ mov edx,DWORD [4+edi]
+ adc ebx,0
+ ; mul a[0]*b[1]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [20+esp]
+ adc ebp,edx
+ mov edx,DWORD [edi]
+ adc ebx,0
+ mov DWORD [4+eax],ecx
+ mov eax,DWORD [8+esi]
+ ; saved r[1]
+ ; ################## Calculate word 2
+ xor ecx,ecx
+ ; mul a[2]*b[0]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [4+esi]
+ adc ebx,edx
+ mov edx,DWORD [4+edi]
+ adc ecx,0
+ ; mul a[1]*b[1]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [esi]
+ adc ebx,edx
+ mov edx,DWORD [8+edi]
+ adc ecx,0
+ ; mul a[0]*b[2]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [20+esp]
+ adc ebx,edx
+ mov edx,DWORD [edi]
+ adc ecx,0
+ mov DWORD [8+eax],ebp
+ mov eax,DWORD [12+esi]
+ ; saved r[2]
+ ; ################## Calculate word 3
+ xor ebp,ebp
+ ; mul a[3]*b[0]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [8+esi]
+ adc ecx,edx
+ mov edx,DWORD [4+edi]
+ adc ebp,0
+ ; mul a[2]*b[1]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [4+esi]
+ adc ecx,edx
+ mov edx,DWORD [8+edi]
+ adc ebp,0
+ ; mul a[1]*b[2]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [esi]
+ adc ecx,edx
+ mov edx,DWORD [12+edi]
+ adc ebp,0
+ ; mul a[0]*b[3]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esp]
+ adc ecx,edx
+ mov edx,DWORD [edi]
+ adc ebp,0
+ mov DWORD [12+eax],ebx
+ mov eax,DWORD [16+esi]
+ ; saved r[3]
+ ; ################## Calculate word 4
+ xor ebx,ebx
+ ; mul a[4]*b[0]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [12+esi]
+ adc ebp,edx
+ mov edx,DWORD [4+edi]
+ adc ebx,0
+ ; mul a[3]*b[1]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [8+esi]
+ adc ebp,edx
+ mov edx,DWORD [8+edi]
+ adc ebx,0
+ ; mul a[2]*b[2]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [4+esi]
+ adc ebp,edx
+ mov edx,DWORD [12+edi]
+ adc ebx,0
+ ; mul a[1]*b[3]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [esi]
+ adc ebp,edx
+ mov edx,DWORD [16+edi]
+ adc ebx,0
+ ; mul a[0]*b[4]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [20+esp]
+ adc ebp,edx
+ mov edx,DWORD [edi]
+ adc ebx,0
+ mov DWORD [16+eax],ecx
+ mov eax,DWORD [20+esi]
+ ; saved r[4]
+ ; ################## Calculate word 5
+ xor ecx,ecx
+ ; mul a[5]*b[0]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [16+esi]
+ adc ebx,edx
+ mov edx,DWORD [4+edi]
+ adc ecx,0
+ ; mul a[4]*b[1]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [12+esi]
+ adc ebx,edx
+ mov edx,DWORD [8+edi]
+ adc ecx,0
+ ; mul a[3]*b[2]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [8+esi]
+ adc ebx,edx
+ mov edx,DWORD [12+edi]
+ adc ecx,0
+ ; mul a[2]*b[3]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [4+esi]
+ adc ebx,edx
+ mov edx,DWORD [16+edi]
+ adc ecx,0
+ ; mul a[1]*b[4]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [esi]
+ adc ebx,edx
+ mov edx,DWORD [20+edi]
+ adc ecx,0
+ ; mul a[0]*b[5]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [20+esp]
+ adc ebx,edx
+ mov edx,DWORD [edi]
+ adc ecx,0
+ mov DWORD [20+eax],ebp
+ mov eax,DWORD [24+esi]
+ ; saved r[5]
+ ; ################## Calculate word 6
+ xor ebp,ebp
+ ; mul a[6]*b[0]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esi]
+ adc ecx,edx
+ mov edx,DWORD [4+edi]
+ adc ebp,0
+ ; mul a[5]*b[1]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [16+esi]
+ adc ecx,edx
+ mov edx,DWORD [8+edi]
+ adc ebp,0
+ ; mul a[4]*b[2]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [12+esi]
+ adc ecx,edx
+ mov edx,DWORD [12+edi]
+ adc ebp,0
+ ; mul a[3]*b[3]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [8+esi]
+ adc ecx,edx
+ mov edx,DWORD [16+edi]
+ adc ebp,0
+ ; mul a[2]*b[4]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [4+esi]
+ adc ecx,edx
+ mov edx,DWORD [20+edi]
+ adc ebp,0
+ ; mul a[1]*b[5]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [esi]
+ adc ecx,edx
+ mov edx,DWORD [24+edi]
+ adc ebp,0
+ ; mul a[0]*b[6]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esp]
+ adc ecx,edx
+ mov edx,DWORD [edi]
+ adc ebp,0
+ mov DWORD [24+eax],ebx
+ mov eax,DWORD [28+esi]
+ ; saved r[6]
+ ; ################## Calculate word 7
+ xor ebx,ebx
+ ; mul a[7]*b[0]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [24+esi]
+ adc ebp,edx
+ mov edx,DWORD [4+edi]
+ adc ebx,0
+ ; mul a[6]*b[1]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [20+esi]
+ adc ebp,edx
+ mov edx,DWORD [8+edi]
+ adc ebx,0
+ ; mul a[5]*b[2]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [16+esi]
+ adc ebp,edx
+ mov edx,DWORD [12+edi]
+ adc ebx,0
+ ; mul a[4]*b[3]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [12+esi]
+ adc ebp,edx
+ mov edx,DWORD [16+edi]
+ adc ebx,0
+ ; mul a[3]*b[4]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [8+esi]
+ adc ebp,edx
+ mov edx,DWORD [20+edi]
+ adc ebx,0
+ ; mul a[2]*b[5]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [4+esi]
+ adc ebp,edx
+ mov edx,DWORD [24+edi]
+ adc ebx,0
+ ; mul a[1]*b[6]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [esi]
+ adc ebp,edx
+ mov edx,DWORD [28+edi]
+ adc ebx,0
+ ; mul a[0]*b[7]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [20+esp]
+ adc ebp,edx
+ mov edx,DWORD [4+edi]
+ adc ebx,0
+ mov DWORD [28+eax],ecx
+ mov eax,DWORD [28+esi]
+ ; saved r[7]
+ ; ################## Calculate word 8
+ xor ecx,ecx
+ ; mul a[7]*b[1]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [24+esi]
+ adc ebx,edx
+ mov edx,DWORD [8+edi]
+ adc ecx,0
+ ; mul a[6]*b[2]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [20+esi]
+ adc ebx,edx
+ mov edx,DWORD [12+edi]
+ adc ecx,0
+ ; mul a[5]*b[3]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [16+esi]
+ adc ebx,edx
+ mov edx,DWORD [16+edi]
+ adc ecx,0
+ ; mul a[4]*b[4]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [12+esi]
+ adc ebx,edx
+ mov edx,DWORD [20+edi]
+ adc ecx,0
+ ; mul a[3]*b[5]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [8+esi]
+ adc ebx,edx
+ mov edx,DWORD [24+edi]
+ adc ecx,0
+ ; mul a[2]*b[6]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [4+esi]
+ adc ebx,edx
+ mov edx,DWORD [28+edi]
+ adc ecx,0
+ ; mul a[1]*b[7]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [20+esp]
+ adc ebx,edx
+ mov edx,DWORD [8+edi]
+ adc ecx,0
+ mov DWORD [32+eax],ebp
+ mov eax,DWORD [28+esi]
+ ; saved r[8]
+ ; ################## Calculate word 9
+ xor ebp,ebp
+ ; mul a[7]*b[2]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [24+esi]
+ adc ecx,edx
+ mov edx,DWORD [12+edi]
+ adc ebp,0
+ ; mul a[6]*b[3]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esi]
+ adc ecx,edx
+ mov edx,DWORD [16+edi]
+ adc ebp,0
+ ; mul a[5]*b[4]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [16+esi]
+ adc ecx,edx
+ mov edx,DWORD [20+edi]
+ adc ebp,0
+ ; mul a[4]*b[5]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [12+esi]
+ adc ecx,edx
+ mov edx,DWORD [24+edi]
+ adc ebp,0
+ ; mul a[3]*b[6]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [8+esi]
+ adc ecx,edx
+ mov edx,DWORD [28+edi]
+ adc ebp,0
+ ; mul a[2]*b[7]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esp]
+ adc ecx,edx
+ mov edx,DWORD [12+edi]
+ adc ebp,0
+ mov DWORD [36+eax],ebx
+ mov eax,DWORD [28+esi]
+ ; saved r[9]
+ ; ################## Calculate word 10
+ xor ebx,ebx
+ ; mul a[7]*b[3]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [24+esi]
+ adc ebp,edx
+ mov edx,DWORD [16+edi]
+ adc ebx,0
+ ; mul a[6]*b[4]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [20+esi]
+ adc ebp,edx
+ mov edx,DWORD [20+edi]
+ adc ebx,0
+ ; mul a[5]*b[5]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [16+esi]
+ adc ebp,edx
+ mov edx,DWORD [24+edi]
+ adc ebx,0
+ ; mul a[4]*b[6]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [12+esi]
+ adc ebp,edx
+ mov edx,DWORD [28+edi]
+ adc ebx,0
+ ; mul a[3]*b[7]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [20+esp]
+ adc ebp,edx
+ mov edx,DWORD [16+edi]
+ adc ebx,0
+ mov DWORD [40+eax],ecx
+ mov eax,DWORD [28+esi]
+ ; saved r[10]
+ ; ################## Calculate word 11
+ xor ecx,ecx
+ ; mul a[7]*b[4]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [24+esi]
+ adc ebx,edx
+ mov edx,DWORD [20+edi]
+ adc ecx,0
+ ; mul a[6]*b[5]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [20+esi]
+ adc ebx,edx
+ mov edx,DWORD [24+edi]
+ adc ecx,0
+ ; mul a[5]*b[6]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [16+esi]
+ adc ebx,edx
+ mov edx,DWORD [28+edi]
+ adc ecx,0
+ ; mul a[4]*b[7]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [20+esp]
+ adc ebx,edx
+ mov edx,DWORD [20+edi]
+ adc ecx,0
+ mov DWORD [44+eax],ebp
+ mov eax,DWORD [28+esi]
+ ; saved r[11]
+ ; ################## Calculate word 12
+ xor ebp,ebp
+ ; mul a[7]*b[5]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [24+esi]
+ adc ecx,edx
+ mov edx,DWORD [24+edi]
+ adc ebp,0
+ ; mul a[6]*b[6]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esi]
+ adc ecx,edx
+ mov edx,DWORD [28+edi]
+ adc ebp,0
+ ; mul a[5]*b[7]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esp]
+ adc ecx,edx
+ mov edx,DWORD [24+edi]
+ adc ebp,0
+ mov DWORD [48+eax],ebx
+ mov eax,DWORD [28+esi]
+ ; saved r[12]
+ ; ################## Calculate word 13
+ xor ebx,ebx
+ ; mul a[7]*b[6]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [24+esi]
+ adc ebp,edx
+ mov edx,DWORD [28+edi]
+ adc ebx,0
+ ; mul a[6]*b[7]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [20+esp]
+ adc ebp,edx
+ mov edx,DWORD [28+edi]
+ adc ebx,0
+ mov DWORD [52+eax],ecx
+ mov eax,DWORD [28+esi]
+ ; saved r[13]
+ ; ################## Calculate word 14
+ xor ecx,ecx
+ ; mul a[7]*b[7]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [20+esp]
+ adc ebx,edx
+ adc ecx,0
+ mov DWORD [56+eax],ebp
+ ; saved r[14]
+ ; save r[15]
+ mov DWORD [60+eax],ebx
+ pop ebx
+ pop ebp
+ pop edi
+ pop esi
+ ret
+global _bn_mul_comba4
+align 16
+_bn_mul_comba4:
+L$_bn_mul_comba4_begin:
+ push esi
+ mov esi,DWORD [12+esp]
+ push edi
+ mov edi,DWORD [20+esp]
+ push ebp
+ push ebx
+ xor ebx,ebx
+ mov eax,DWORD [esi]
+ xor ecx,ecx
+ mov edx,DWORD [edi]
+ ; ################## Calculate word 0
+ xor ebp,ebp
+ ; mul a[0]*b[0]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esp]
+ adc ecx,edx
+ mov edx,DWORD [edi]
+ adc ebp,0
+ mov DWORD [eax],ebx
+ mov eax,DWORD [4+esi]
+ ; saved r[0]
+ ; ################## Calculate word 1
+ xor ebx,ebx
+ ; mul a[1]*b[0]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [esi]
+ adc ebp,edx
+ mov edx,DWORD [4+edi]
+ adc ebx,0
+ ; mul a[0]*b[1]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [20+esp]
+ adc ebp,edx
+ mov edx,DWORD [edi]
+ adc ebx,0
+ mov DWORD [4+eax],ecx
+ mov eax,DWORD [8+esi]
+ ; saved r[1]
+ ; ################## Calculate word 2
+ xor ecx,ecx
+ ; mul a[2]*b[0]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [4+esi]
+ adc ebx,edx
+ mov edx,DWORD [4+edi]
+ adc ecx,0
+ ; mul a[1]*b[1]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [esi]
+ adc ebx,edx
+ mov edx,DWORD [8+edi]
+ adc ecx,0
+ ; mul a[0]*b[2]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [20+esp]
+ adc ebx,edx
+ mov edx,DWORD [edi]
+ adc ecx,0
+ mov DWORD [8+eax],ebp
+ mov eax,DWORD [12+esi]
+ ; saved r[2]
+ ; ################## Calculate word 3
+ xor ebp,ebp
+ ; mul a[3]*b[0]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [8+esi]
+ adc ecx,edx
+ mov edx,DWORD [4+edi]
+ adc ebp,0
+ ; mul a[2]*b[1]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [4+esi]
+ adc ecx,edx
+ mov edx,DWORD [8+edi]
+ adc ebp,0
+ ; mul a[1]*b[2]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [esi]
+ adc ecx,edx
+ mov edx,DWORD [12+edi]
+ adc ebp,0
+ ; mul a[0]*b[3]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esp]
+ adc ecx,edx
+ mov edx,DWORD [4+edi]
+ adc ebp,0
+ mov DWORD [12+eax],ebx
+ mov eax,DWORD [12+esi]
+ ; saved r[3]
+ ; ################## Calculate word 4
+ xor ebx,ebx
+ ; mul a[3]*b[1]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [8+esi]
+ adc ebp,edx
+ mov edx,DWORD [8+edi]
+ adc ebx,0
+ ; mul a[2]*b[2]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [4+esi]
+ adc ebp,edx
+ mov edx,DWORD [12+edi]
+ adc ebx,0
+ ; mul a[1]*b[3]
+ mul edx
+ add ecx,eax
+ mov eax,DWORD [20+esp]
+ adc ebp,edx
+ mov edx,DWORD [8+edi]
+ adc ebx,0
+ mov DWORD [16+eax],ecx
+ mov eax,DWORD [12+esi]
+ ; saved r[4]
+ ; ################## Calculate word 5
+ xor ecx,ecx
+ ; mul a[3]*b[2]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [8+esi]
+ adc ebx,edx
+ mov edx,DWORD [12+edi]
+ adc ecx,0
+ ; mul a[2]*b[3]
+ mul edx
+ add ebp,eax
+ mov eax,DWORD [20+esp]
+ adc ebx,edx
+ mov edx,DWORD [12+edi]
+ adc ecx,0
+ mov DWORD [20+eax],ebp
+ mov eax,DWORD [12+esi]
+ ; saved r[5]
+ ; ################## Calculate word 6
+ xor ebp,ebp
+ ; mul a[3]*b[3]
+ mul edx
+ add ebx,eax
+ mov eax,DWORD [20+esp]
+ adc ecx,edx
+ adc ebp,0
+ mov DWORD [24+eax],ebx
+ ; saved r[6]
+ ; save r[7]
+ mov DWORD [28+eax],ecx
+ pop ebx
+ pop ebp
+ pop edi
+ pop esi
+ ret
+global _bn_sqr_comba8
+align 16
+_bn_sqr_comba8:
+L$_bn_sqr_comba8_begin:
+ push esi
+ push edi
+ push ebp
+ push ebx
+ mov edi,DWORD [20+esp]
+ mov esi,DWORD [24+esp]
+ xor ebx,ebx
+ xor ecx,ecx
+ mov eax,DWORD [esi]
+ ; ############### Calculate word 0
+ xor ebp,ebp
+ ; sqr a[0]*a[0]
+ mul eax
+ add ebx,eax
+ adc ecx,edx
+ mov edx,DWORD [esi]
+ adc ebp,0
+ mov DWORD [edi],ebx
+ mov eax,DWORD [4+esi]
+ ; saved r[0]
+ ; ############### Calculate word 1
+ xor ebx,ebx
+ ; sqr a[1]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [8+esi]
+ adc ebx,0
+ mov DWORD [4+edi],ecx
+ mov edx,DWORD [esi]
+ ; saved r[1]
+ ; ############### Calculate word 2
+ xor ecx,ecx
+ ; sqr a[2]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [4+esi]
+ adc ecx,0
+ ; sqr a[1]*a[1]
+ mul eax
+ add ebp,eax
+ adc ebx,edx
+ mov edx,DWORD [esi]
+ adc ecx,0
+ mov DWORD [8+edi],ebp
+ mov eax,DWORD [12+esi]
+ ; saved r[2]
+ ; ############### Calculate word 3
+ xor ebp,ebp
+ ; sqr a[3]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [8+esi]
+ adc ebp,0
+ mov edx,DWORD [4+esi]
+ ; sqr a[2]*a[1]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [16+esi]
+ adc ebp,0
+ mov DWORD [12+edi],ebx
+ mov edx,DWORD [esi]
+ ; saved r[3]
+ ; ############### Calculate word 4
+ xor ebx,ebx
+ ; sqr a[4]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [12+esi]
+ adc ebx,0
+ mov edx,DWORD [4+esi]
+ ; sqr a[3]*a[1]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [8+esi]
+ adc ebx,0
+ ; sqr a[2]*a[2]
+ mul eax
+ add ecx,eax
+ adc ebp,edx
+ mov edx,DWORD [esi]
+ adc ebx,0
+ mov DWORD [16+edi],ecx
+ mov eax,DWORD [20+esi]
+ ; saved r[4]
+ ; ############### Calculate word 5
+ xor ecx,ecx
+ ; sqr a[5]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [16+esi]
+ adc ecx,0
+ mov edx,DWORD [4+esi]
+ ; sqr a[4]*a[1]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [12+esi]
+ adc ecx,0
+ mov edx,DWORD [8+esi]
+ ; sqr a[3]*a[2]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [24+esi]
+ adc ecx,0
+ mov DWORD [20+edi],ebp
+ mov edx,DWORD [esi]
+ ; saved r[5]
+ ; ############### Calculate word 6
+ xor ebp,ebp
+ ; sqr a[6]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [20+esi]
+ adc ebp,0
+ mov edx,DWORD [4+esi]
+ ; sqr a[5]*a[1]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [16+esi]
+ adc ebp,0
+ mov edx,DWORD [8+esi]
+ ; sqr a[4]*a[2]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [12+esi]
+ adc ebp,0
+ ; sqr a[3]*a[3]
+ mul eax
+ add ebx,eax
+ adc ecx,edx
+ mov edx,DWORD [esi]
+ adc ebp,0
+ mov DWORD [24+edi],ebx
+ mov eax,DWORD [28+esi]
+ ; saved r[6]
+ ; ############### Calculate word 7
+ xor ebx,ebx
+ ; sqr a[7]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [24+esi]
+ adc ebx,0
+ mov edx,DWORD [4+esi]
+ ; sqr a[6]*a[1]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [20+esi]
+ adc ebx,0
+ mov edx,DWORD [8+esi]
+ ; sqr a[5]*a[2]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [16+esi]
+ adc ebx,0
+ mov edx,DWORD [12+esi]
+ ; sqr a[4]*a[3]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [28+esi]
+ adc ebx,0
+ mov DWORD [28+edi],ecx
+ mov edx,DWORD [4+esi]
+ ; saved r[7]
+ ; ############### Calculate word 8
+ xor ecx,ecx
+ ; sqr a[7]*a[1]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [24+esi]
+ adc ecx,0
+ mov edx,DWORD [8+esi]
+ ; sqr a[6]*a[2]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [20+esi]
+ adc ecx,0
+ mov edx,DWORD [12+esi]
+ ; sqr a[5]*a[3]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [16+esi]
+ adc ecx,0
+ ; sqr a[4]*a[4]
+ mul eax
+ add ebp,eax
+ adc ebx,edx
+ mov edx,DWORD [8+esi]
+ adc ecx,0
+ mov DWORD [32+edi],ebp
+ mov eax,DWORD [28+esi]
+ ; saved r[8]
+ ; ############### Calculate word 9
+ xor ebp,ebp
+ ; sqr a[7]*a[2]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [24+esi]
+ adc ebp,0
+ mov edx,DWORD [12+esi]
+ ; sqr a[6]*a[3]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [20+esi]
+ adc ebp,0
+ mov edx,DWORD [16+esi]
+ ; sqr a[5]*a[4]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [28+esi]
+ adc ebp,0
+ mov DWORD [36+edi],ebx
+ mov edx,DWORD [12+esi]
+ ; saved r[9]
+ ; ############### Calculate word 10
+ xor ebx,ebx
+ ; sqr a[7]*a[3]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [24+esi]
+ adc ebx,0
+ mov edx,DWORD [16+esi]
+ ; sqr a[6]*a[4]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [20+esi]
+ adc ebx,0
+ ; sqr a[5]*a[5]
+ mul eax
+ add ecx,eax
+ adc ebp,edx
+ mov edx,DWORD [16+esi]
+ adc ebx,0
+ mov DWORD [40+edi],ecx
+ mov eax,DWORD [28+esi]
+ ; saved r[10]
+ ; ############### Calculate word 11
+ xor ecx,ecx
+ ; sqr a[7]*a[4]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [24+esi]
+ adc ecx,0
+ mov edx,DWORD [20+esi]
+ ; sqr a[6]*a[5]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [28+esi]
+ adc ecx,0
+ mov DWORD [44+edi],ebp
+ mov edx,DWORD [20+esi]
+ ; saved r[11]
+ ; ############### Calculate word 12
+ xor ebp,ebp
+ ; sqr a[7]*a[5]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [24+esi]
+ adc ebp,0
+ ; sqr a[6]*a[6]
+ mul eax
+ add ebx,eax
+ adc ecx,edx
+ mov edx,DWORD [24+esi]
+ adc ebp,0
+ mov DWORD [48+edi],ebx
+ mov eax,DWORD [28+esi]
+ ; saved r[12]
+ ; ############### Calculate word 13
+ xor ebx,ebx
+ ; sqr a[7]*a[6]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [28+esi]
+ adc ebx,0
+ mov DWORD [52+edi],ecx
+ ; saved r[13]
+ ; ############### Calculate word 14
+ xor ecx,ecx
+ ; sqr a[7]*a[7]
+ mul eax
+ add ebp,eax
+ adc ebx,edx
+ adc ecx,0
+ mov DWORD [56+edi],ebp
+ ; saved r[14]
+ mov DWORD [60+edi],ebx
+ pop ebx
+ pop ebp
+ pop edi
+ pop esi
+ ret
+global _bn_sqr_comba4
+align 16
+_bn_sqr_comba4:
+L$_bn_sqr_comba4_begin:
+ push esi
+ push edi
+ push ebp
+ push ebx
+ mov edi,DWORD [20+esp]
+ mov esi,DWORD [24+esp]
+ xor ebx,ebx
+ xor ecx,ecx
+ mov eax,DWORD [esi]
+ ; ############### Calculate word 0
+ xor ebp,ebp
+ ; sqr a[0]*a[0]
+ mul eax
+ add ebx,eax
+ adc ecx,edx
+ mov edx,DWORD [esi]
+ adc ebp,0
+ mov DWORD [edi],ebx
+ mov eax,DWORD [4+esi]
+ ; saved r[0]
+ ; ############### Calculate word 1
+ xor ebx,ebx
+ ; sqr a[1]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [8+esi]
+ adc ebx,0
+ mov DWORD [4+edi],ecx
+ mov edx,DWORD [esi]
+ ; saved r[1]
+ ; ############### Calculate word 2
+ xor ecx,ecx
+ ; sqr a[2]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [4+esi]
+ adc ecx,0
+ ; sqr a[1]*a[1]
+ mul eax
+ add ebp,eax
+ adc ebx,edx
+ mov edx,DWORD [esi]
+ adc ecx,0
+ mov DWORD [8+edi],ebp
+ mov eax,DWORD [12+esi]
+ ; saved r[2]
+ ; ############### Calculate word 3
+ xor ebp,ebp
+ ; sqr a[3]*a[0]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [8+esi]
+ adc ebp,0
+ mov edx,DWORD [4+esi]
+ ; sqr a[2]*a[1]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebp,0
+ add ebx,eax
+ adc ecx,edx
+ mov eax,DWORD [12+esi]
+ adc ebp,0
+ mov DWORD [12+edi],ebx
+ mov edx,DWORD [4+esi]
+ ; saved r[3]
+ ; ############### Calculate word 4
+ xor ebx,ebx
+ ; sqr a[3]*a[1]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ebx,0
+ add ecx,eax
+ adc ebp,edx
+ mov eax,DWORD [8+esi]
+ adc ebx,0
+ ; sqr a[2]*a[2]
+ mul eax
+ add ecx,eax
+ adc ebp,edx
+ mov edx,DWORD [8+esi]
+ adc ebx,0
+ mov DWORD [16+edi],ecx
+ mov eax,DWORD [12+esi]
+ ; saved r[4]
+ ; ############### Calculate word 5
+ xor ecx,ecx
+ ; sqr a[3]*a[2]
+ mul edx
+ add eax,eax
+ adc edx,edx
+ adc ecx,0
+ add ebp,eax
+ adc ebx,edx
+ mov eax,DWORD [12+esi]
+ adc ecx,0
+ mov DWORD [20+edi],ebp
+ ; saved r[5]
+ ; ############### Calculate word 6
+ xor ebp,ebp
+ ; sqr a[3]*a[3]
+ mul eax
+ add ebx,eax
+ adc ecx,edx
+ adc ebp,0
+ mov DWORD [24+edi],ebx
+ ; saved r[6]
+ mov DWORD [28+edi],ecx
+ pop ebx
+ pop ebp
+ pop edi
+ pop esi
+ ret
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm
new file mode 100644
index 0000000000..5f2f4f65de
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm
@@ -0,0 +1,352 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+;extern _OPENSSL_ia32cap_P
+align 16
+__mul_1x1_mmx:
+ sub esp,36
+ mov ecx,eax
+ lea edx,[eax*1+eax]
+ and ecx,1073741823
+ lea ebp,[edx*1+edx]
+ mov DWORD [esp],0
+ and edx,2147483647
+ movd mm2,eax
+ movd mm3,ebx
+ mov DWORD [4+esp],ecx
+ xor ecx,edx
+ pxor mm5,mm5
+ pxor mm4,mm4
+ mov DWORD [8+esp],edx
+ xor edx,ebp
+ mov DWORD [12+esp],ecx
+ pcmpgtd mm5,mm2
+ paddd mm2,mm2
+ xor ecx,edx
+ mov DWORD [16+esp],ebp
+ xor ebp,edx
+ pand mm5,mm3
+ pcmpgtd mm4,mm2
+ mov DWORD [20+esp],ecx
+ xor ebp,ecx
+ psllq mm5,31
+ pand mm4,mm3
+ mov DWORD [24+esp],edx
+ mov esi,7
+ mov DWORD [28+esp],ebp
+ mov ebp,esi
+ and esi,ebx
+ shr ebx,3
+ mov edi,ebp
+ psllq mm4,30
+ and edi,ebx
+ shr ebx,3
+ movd mm0,DWORD [esi*4+esp]
+ mov esi,ebp
+ and esi,ebx
+ shr ebx,3
+ movd mm2,DWORD [edi*4+esp]
+ mov edi,ebp
+ psllq mm2,3
+ and edi,ebx
+ shr ebx,3
+ pxor mm0,mm2
+ movd mm1,DWORD [esi*4+esp]
+ mov esi,ebp
+ psllq mm1,6
+ and esi,ebx
+ shr ebx,3
+ pxor mm0,mm1
+ movd mm2,DWORD [edi*4+esp]
+ mov edi,ebp
+ psllq mm2,9
+ and edi,ebx
+ shr ebx,3
+ pxor mm0,mm2
+ movd mm1,DWORD [esi*4+esp]
+ mov esi,ebp
+ psllq mm1,12
+ and esi,ebx
+ shr ebx,3
+ pxor mm0,mm1
+ movd mm2,DWORD [edi*4+esp]
+ mov edi,ebp
+ psllq mm2,15
+ and edi,ebx
+ shr ebx,3
+ pxor mm0,mm2
+ movd mm1,DWORD [esi*4+esp]
+ mov esi,ebp
+ psllq mm1,18
+ and esi,ebx
+ shr ebx,3
+ pxor mm0,mm1
+ movd mm2,DWORD [edi*4+esp]
+ mov edi,ebp
+ psllq mm2,21
+ and edi,ebx
+ shr ebx,3
+ pxor mm0,mm2
+ movd mm1,DWORD [esi*4+esp]
+ mov esi,ebp
+ psllq mm1,24
+ and esi,ebx
+ shr ebx,3
+ pxor mm0,mm1
+ movd mm2,DWORD [edi*4+esp]
+ pxor mm0,mm4
+ psllq mm2,27
+ pxor mm0,mm2
+ movd mm1,DWORD [esi*4+esp]
+ pxor mm0,mm5
+ psllq mm1,30
+ add esp,36
+ pxor mm0,mm1
+ ret
+align 16
+__mul_1x1_ialu:
+ sub esp,36
+ mov ecx,eax
+ lea edx,[eax*1+eax]
+ lea ebp,[eax*4]
+ and ecx,1073741823
+ lea edi,[eax*1+eax]
+ sar eax,31
+ mov DWORD [esp],0
+ and edx,2147483647
+ mov DWORD [4+esp],ecx
+ xor ecx,edx
+ mov DWORD [8+esp],edx
+ xor edx,ebp
+ mov DWORD [12+esp],ecx
+ xor ecx,edx
+ mov DWORD [16+esp],ebp
+ xor ebp,edx
+ mov DWORD [20+esp],ecx
+ xor ebp,ecx
+ sar edi,31
+ and eax,ebx
+ mov DWORD [24+esp],edx
+ and edi,ebx
+ mov DWORD [28+esp],ebp
+ mov edx,eax
+ shl eax,31
+ mov ecx,edi
+ shr edx,1
+ mov esi,7
+ shl edi,30
+ and esi,ebx
+ shr ecx,2
+ xor eax,edi
+ shr ebx,3
+ mov edi,7
+ and edi,ebx
+ shr ebx,3
+ xor edx,ecx
+ xor eax,DWORD [esi*4+esp]
+ mov esi,7
+ and esi,ebx
+ shr ebx,3
+ mov ebp,DWORD [edi*4+esp]
+ mov edi,7
+ mov ecx,ebp
+ shl ebp,3
+ and edi,ebx
+ shr ecx,29
+ xor eax,ebp
+ shr ebx,3
+ xor edx,ecx
+ mov ecx,DWORD [esi*4+esp]
+ mov esi,7
+ mov ebp,ecx
+ shl ecx,6
+ and esi,ebx
+ shr ebp,26
+ xor eax,ecx
+ shr ebx,3
+ xor edx,ebp
+ mov ebp,DWORD [edi*4+esp]
+ mov edi,7
+ mov ecx,ebp
+ shl ebp,9
+ and edi,ebx
+ shr ecx,23
+ xor eax,ebp
+ shr ebx,3
+ xor edx,ecx
+ mov ecx,DWORD [esi*4+esp]
+ mov esi,7
+ mov ebp,ecx
+ shl ecx,12
+ and esi,ebx
+ shr ebp,20
+ xor eax,ecx
+ shr ebx,3
+ xor edx,ebp
+ mov ebp,DWORD [edi*4+esp]
+ mov edi,7
+ mov ecx,ebp
+ shl ebp,15
+ and edi,ebx
+ shr ecx,17
+ xor eax,ebp
+ shr ebx,3
+ xor edx,ecx
+ mov ecx,DWORD [esi*4+esp]
+ mov esi,7
+ mov ebp,ecx
+ shl ecx,18
+ and esi,ebx
+ shr ebp,14
+ xor eax,ecx
+ shr ebx,3
+ xor edx,ebp
+ mov ebp,DWORD [edi*4+esp]
+ mov edi,7
+ mov ecx,ebp
+ shl ebp,21
+ and edi,ebx
+ shr ecx,11
+ xor eax,ebp
+ shr ebx,3
+ xor edx,ecx
+ mov ecx,DWORD [esi*4+esp]
+ mov esi,7
+ mov ebp,ecx
+ shl ecx,24
+ and esi,ebx
+ shr ebp,8
+ xor eax,ecx
+ shr ebx,3
+ xor edx,ebp
+ mov ebp,DWORD [edi*4+esp]
+ mov ecx,ebp
+ shl ebp,27
+ mov edi,DWORD [esi*4+esp]
+ shr ecx,5
+ mov esi,edi
+ xor eax,ebp
+ shl edi,30
+ xor edx,ecx
+ shr esi,2
+ xor eax,edi
+ xor edx,esi
+ add esp,36
+ ret
+global _bn_GF2m_mul_2x2
+align 16
+_bn_GF2m_mul_2x2:
+L$_bn_GF2m_mul_2x2_begin:
+ lea edx,[_OPENSSL_ia32cap_P]
+ mov eax,DWORD [edx]
+ mov edx,DWORD [4+edx]
+ test eax,8388608
+ jz NEAR L$000ialu
+ test eax,16777216
+ jz NEAR L$001mmx
+ test edx,2
+ jz NEAR L$001mmx
+ movups xmm0,[8+esp]
+ shufps xmm0,xmm0,177
+db 102,15,58,68,192,1
+ mov eax,DWORD [4+esp]
+ movups [eax],xmm0
+ ret
+align 16
+L$001mmx:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov eax,DWORD [24+esp]
+ mov ebx,DWORD [32+esp]
+ call __mul_1x1_mmx
+ movq mm7,mm0
+ mov eax,DWORD [28+esp]
+ mov ebx,DWORD [36+esp]
+ call __mul_1x1_mmx
+ movq mm6,mm0
+ mov eax,DWORD [24+esp]
+ mov ebx,DWORD [32+esp]
+ xor eax,DWORD [28+esp]
+ xor ebx,DWORD [36+esp]
+ call __mul_1x1_mmx
+ pxor mm0,mm7
+ mov eax,DWORD [20+esp]
+ pxor mm0,mm6
+ movq mm2,mm0
+ psllq mm0,32
+ pop edi
+ psrlq mm2,32
+ pop esi
+ pxor mm0,mm6
+ pop ebx
+ pxor mm2,mm7
+ movq [eax],mm0
+ pop ebp
+ movq [8+eax],mm2
+ emms
+ ret
+align 16
+L$000ialu:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ sub esp,20
+ mov eax,DWORD [44+esp]
+ mov ebx,DWORD [52+esp]
+ call __mul_1x1_ialu
+ mov DWORD [8+esp],eax
+ mov DWORD [12+esp],edx
+ mov eax,DWORD [48+esp]
+ mov ebx,DWORD [56+esp]
+ call __mul_1x1_ialu
+ mov DWORD [esp],eax
+ mov DWORD [4+esp],edx
+ mov eax,DWORD [44+esp]
+ mov ebx,DWORD [52+esp]
+ xor eax,DWORD [48+esp]
+ xor ebx,DWORD [56+esp]
+ call __mul_1x1_ialu
+ mov ebp,DWORD [40+esp]
+ mov ebx,DWORD [esp]
+ mov ecx,DWORD [4+esp]
+ mov edi,DWORD [8+esp]
+ mov esi,DWORD [12+esp]
+ xor eax,edx
+ xor edx,ecx
+ xor eax,ebx
+ mov DWORD [ebp],ebx
+ xor edx,edi
+ mov DWORD [12+ebp],esi
+ xor eax,esi
+ add esp,20
+ xor edx,esi
+ pop edi
+ xor eax,edx
+ pop esi
+ mov DWORD [8+ebp],edx
+ pop ebx
+ mov DWORD [4+ebp],eax
+ pop ebp
+ ret
+db 71,70,40,50,94,109,41,32,77,117,108,116,105,112,108,105
+db 99,97,116,105,111,110,32,102,111,114,32,120,56,54,44,32
+db 67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97
+db 112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103
+db 62,0
+segment .bss
+common _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm
new file mode 100644
index 0000000000..904526ffbf
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm
@@ -0,0 +1,486 @@
+; Copyright 2005-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+;extern _OPENSSL_ia32cap_P
+global _bn_mul_mont
+align 16
+_bn_mul_mont:
+L$_bn_mul_mont_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ xor eax,eax
+ mov edi,DWORD [40+esp]
+ cmp edi,4
+ jl NEAR L$000just_leave
+ lea esi,[20+esp]
+ lea edx,[24+esp]
+ add edi,2
+ neg edi
+ lea ebp,[edi*4+esp-32]
+ neg edi
+ mov eax,ebp
+ sub eax,edx
+ and eax,2047
+ sub ebp,eax
+ xor edx,ebp
+ and edx,2048
+ xor edx,2048
+ sub ebp,edx
+ and ebp,-64
+ mov eax,esp
+ sub eax,ebp
+ and eax,-4096
+ mov edx,esp
+ lea esp,[eax*1+ebp]
+ mov eax,DWORD [esp]
+ cmp esp,ebp
+ ja NEAR L$001page_walk
+ jmp NEAR L$002page_walk_done
+align 16
+L$001page_walk:
+ lea esp,[esp-4096]
+ mov eax,DWORD [esp]
+ cmp esp,ebp
+ ja NEAR L$001page_walk
+L$002page_walk_done:
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov ecx,DWORD [8+esi]
+ mov ebp,DWORD [12+esi]
+ mov esi,DWORD [16+esi]
+ mov esi,DWORD [esi]
+ mov DWORD [4+esp],eax
+ mov DWORD [8+esp],ebx
+ mov DWORD [12+esp],ecx
+ mov DWORD [16+esp],ebp
+ mov DWORD [20+esp],esi
+ lea ebx,[edi-3]
+ mov DWORD [24+esp],edx
+ lea eax,[_OPENSSL_ia32cap_P]
+ bt DWORD [eax],26
+ jnc NEAR L$003non_sse2
+ mov eax,-1
+ movd mm7,eax
+ mov esi,DWORD [8+esp]
+ mov edi,DWORD [12+esp]
+ mov ebp,DWORD [16+esp]
+ xor edx,edx
+ xor ecx,ecx
+ movd mm4,DWORD [edi]
+ movd mm5,DWORD [esi]
+ movd mm3,DWORD [ebp]
+ pmuludq mm5,mm4
+ movq mm2,mm5
+ movq mm0,mm5
+ pand mm0,mm7
+ pmuludq mm5,[20+esp]
+ pmuludq mm3,mm5
+ paddq mm3,mm0
+ movd mm1,DWORD [4+ebp]
+ movd mm0,DWORD [4+esi]
+ psrlq mm2,32
+ psrlq mm3,32
+ inc ecx
+align 16
+L$0041st:
+ pmuludq mm0,mm4
+ pmuludq mm1,mm5
+ paddq mm2,mm0
+ paddq mm3,mm1
+ movq mm0,mm2
+ pand mm0,mm7
+ movd mm1,DWORD [4+ecx*4+ebp]
+ paddq mm3,mm0
+ movd mm0,DWORD [4+ecx*4+esi]
+ psrlq mm2,32
+ movd DWORD [28+ecx*4+esp],mm3
+ psrlq mm3,32
+ lea ecx,[1+ecx]
+ cmp ecx,ebx
+ jl NEAR L$0041st
+ pmuludq mm0,mm4
+ pmuludq mm1,mm5
+ paddq mm2,mm0
+ paddq mm3,mm1
+ movq mm0,mm2
+ pand mm0,mm7
+ paddq mm3,mm0
+ movd DWORD [28+ecx*4+esp],mm3
+ psrlq mm2,32
+ psrlq mm3,32
+ paddq mm3,mm2
+ movq [32+ebx*4+esp],mm3
+ inc edx
+L$005outer:
+ xor ecx,ecx
+ movd mm4,DWORD [edx*4+edi]
+ movd mm5,DWORD [esi]
+ movd mm6,DWORD [32+esp]
+ movd mm3,DWORD [ebp]
+ pmuludq mm5,mm4
+ paddq mm5,mm6
+ movq mm0,mm5
+ movq mm2,mm5
+ pand mm0,mm7
+ pmuludq mm5,[20+esp]
+ pmuludq mm3,mm5
+ paddq mm3,mm0
+ movd mm6,DWORD [36+esp]
+ movd mm1,DWORD [4+ebp]
+ movd mm0,DWORD [4+esi]
+ psrlq mm2,32
+ psrlq mm3,32
+ paddq mm2,mm6
+ inc ecx
+ dec ebx
+L$006inner:
+ pmuludq mm0,mm4
+ pmuludq mm1,mm5
+ paddq mm2,mm0
+ paddq mm3,mm1
+ movq mm0,mm2
+ movd mm6,DWORD [36+ecx*4+esp]
+ pand mm0,mm7
+ movd mm1,DWORD [4+ecx*4+ebp]
+ paddq mm3,mm0
+ movd mm0,DWORD [4+ecx*4+esi]
+ psrlq mm2,32
+ movd DWORD [28+ecx*4+esp],mm3
+ psrlq mm3,32
+ paddq mm2,mm6
+ dec ebx
+ lea ecx,[1+ecx]
+ jnz NEAR L$006inner
+ mov ebx,ecx
+ pmuludq mm0,mm4
+ pmuludq mm1,mm5
+ paddq mm2,mm0
+ paddq mm3,mm1
+ movq mm0,mm2
+ pand mm0,mm7
+ paddq mm3,mm0
+ movd DWORD [28+ecx*4+esp],mm3
+ psrlq mm2,32
+ psrlq mm3,32
+ movd mm6,DWORD [36+ebx*4+esp]
+ paddq mm3,mm2
+ paddq mm3,mm6
+ movq [32+ebx*4+esp],mm3
+ lea edx,[1+edx]
+ cmp edx,ebx
+ jle NEAR L$005outer
+ emms
+ jmp NEAR L$007common_tail
+align 16
+L$003non_sse2:
+ mov esi,DWORD [8+esp]
+ lea ebp,[1+ebx]
+ mov edi,DWORD [12+esp]
+ xor ecx,ecx
+ mov edx,esi
+ and ebp,1
+ sub edx,edi
+ lea eax,[4+ebx*4+edi]
+ or ebp,edx
+ mov edi,DWORD [edi]
+ jz NEAR L$008bn_sqr_mont
+ mov DWORD [28+esp],eax
+ mov eax,DWORD [esi]
+ xor edx,edx
+align 16
+L$009mull:
+ mov ebp,edx
+ mul edi
+ add ebp,eax
+ lea ecx,[1+ecx]
+ adc edx,0
+ mov eax,DWORD [ecx*4+esi]
+ cmp ecx,ebx
+ mov DWORD [28+ecx*4+esp],ebp
+ jl NEAR L$009mull
+ mov ebp,edx
+ mul edi
+ mov edi,DWORD [20+esp]
+ add eax,ebp
+ mov esi,DWORD [16+esp]
+ adc edx,0
+ imul edi,DWORD [32+esp]
+ mov DWORD [32+ebx*4+esp],eax
+ xor ecx,ecx
+ mov DWORD [36+ebx*4+esp],edx
+ mov DWORD [40+ebx*4+esp],ecx
+ mov eax,DWORD [esi]
+ mul edi
+ add eax,DWORD [32+esp]
+ mov eax,DWORD [4+esi]
+ adc edx,0
+ inc ecx
+ jmp NEAR L$0102ndmadd
+align 16
+L$0111stmadd:
+ mov ebp,edx
+ mul edi
+ add ebp,DWORD [32+ecx*4+esp]
+ lea ecx,[1+ecx]
+ adc edx,0
+ add ebp,eax
+ mov eax,DWORD [ecx*4+esi]
+ adc edx,0
+ cmp ecx,ebx
+ mov DWORD [28+ecx*4+esp],ebp
+ jl NEAR L$0111stmadd
+ mov ebp,edx
+ mul edi
+ add eax,DWORD [32+ebx*4+esp]
+ mov edi,DWORD [20+esp]
+ adc edx,0
+ mov esi,DWORD [16+esp]
+ add ebp,eax
+ adc edx,0
+ imul edi,DWORD [32+esp]
+ xor ecx,ecx
+ add edx,DWORD [36+ebx*4+esp]
+ mov DWORD [32+ebx*4+esp],ebp
+ adc ecx,0
+ mov eax,DWORD [esi]
+ mov DWORD [36+ebx*4+esp],edx
+ mov DWORD [40+ebx*4+esp],ecx
+ mul edi
+ add eax,DWORD [32+esp]
+ mov eax,DWORD [4+esi]
+ adc edx,0
+ mov ecx,1
+align 16
+L$0102ndmadd:
+ mov ebp,edx
+ mul edi
+ add ebp,DWORD [32+ecx*4+esp]
+ lea ecx,[1+ecx]
+ adc edx,0
+ add ebp,eax
+ mov eax,DWORD [ecx*4+esi]
+ adc edx,0
+ cmp ecx,ebx
+ mov DWORD [24+ecx*4+esp],ebp
+ jl NEAR L$0102ndmadd
+ mov ebp,edx
+ mul edi
+ add ebp,DWORD [32+ebx*4+esp]
+ adc edx,0
+ add ebp,eax
+ adc edx,0
+ mov DWORD [28+ebx*4+esp],ebp
+ xor eax,eax
+ mov ecx,DWORD [12+esp]
+ add edx,DWORD [36+ebx*4+esp]
+ adc eax,DWORD [40+ebx*4+esp]
+ lea ecx,[4+ecx]
+ mov DWORD [32+ebx*4+esp],edx
+ cmp ecx,DWORD [28+esp]
+ mov DWORD [36+ebx*4+esp],eax
+ je NEAR L$007common_tail
+ mov edi,DWORD [ecx]
+ mov esi,DWORD [8+esp]
+ mov DWORD [12+esp],ecx
+ xor ecx,ecx
+ xor edx,edx
+ mov eax,DWORD [esi]
+ jmp NEAR L$0111stmadd
+align 16
+L$008bn_sqr_mont:
+ mov DWORD [esp],ebx
+ mov DWORD [12+esp],ecx
+ mov eax,edi
+ mul edi
+ mov DWORD [32+esp],eax
+ mov ebx,edx
+ shr edx,1
+ and ebx,1
+ inc ecx
+align 16
+L$012sqr:
+ mov eax,DWORD [ecx*4+esi]
+ mov ebp,edx
+ mul edi
+ add eax,ebp
+ lea ecx,[1+ecx]
+ adc edx,0
+ lea ebp,[eax*2+ebx]
+ shr eax,31
+ cmp ecx,DWORD [esp]
+ mov ebx,eax
+ mov DWORD [28+ecx*4+esp],ebp
+ jl NEAR L$012sqr
+ mov eax,DWORD [ecx*4+esi]
+ mov ebp,edx
+ mul edi
+ add eax,ebp
+ mov edi,DWORD [20+esp]
+ adc edx,0
+ mov esi,DWORD [16+esp]
+ lea ebp,[eax*2+ebx]
+ imul edi,DWORD [32+esp]
+ shr eax,31
+ mov DWORD [32+ecx*4+esp],ebp
+ lea ebp,[edx*2+eax]
+ mov eax,DWORD [esi]
+ shr edx,31
+ mov DWORD [36+ecx*4+esp],ebp
+ mov DWORD [40+ecx*4+esp],edx
+ mul edi
+ add eax,DWORD [32+esp]
+ mov ebx,ecx
+ adc edx,0
+ mov eax,DWORD [4+esi]
+ mov ecx,1
+align 16
+L$0133rdmadd:
+ mov ebp,edx
+ mul edi
+ add ebp,DWORD [32+ecx*4+esp]
+ adc edx,0
+ add ebp,eax
+ mov eax,DWORD [4+ecx*4+esi]
+ adc edx,0
+ mov DWORD [28+ecx*4+esp],ebp
+ mov ebp,edx
+ mul edi
+ add ebp,DWORD [36+ecx*4+esp]
+ lea ecx,[2+ecx]
+ adc edx,0
+ add ebp,eax
+ mov eax,DWORD [ecx*4+esi]
+ adc edx,0
+ cmp ecx,ebx
+ mov DWORD [24+ecx*4+esp],ebp
+ jl NEAR L$0133rdmadd
+ mov ebp,edx
+ mul edi
+ add ebp,DWORD [32+ebx*4+esp]
+ adc edx,0
+ add ebp,eax
+ adc edx,0
+ mov DWORD [28+ebx*4+esp],ebp
+ mov ecx,DWORD [12+esp]
+ xor eax,eax
+ mov esi,DWORD [8+esp]
+ add edx,DWORD [36+ebx*4+esp]
+ adc eax,DWORD [40+ebx*4+esp]
+ mov DWORD [32+ebx*4+esp],edx
+ cmp ecx,ebx
+ mov DWORD [36+ebx*4+esp],eax
+ je NEAR L$007common_tail
+ mov edi,DWORD [4+ecx*4+esi]
+ lea ecx,[1+ecx]
+ mov eax,edi
+ mov DWORD [12+esp],ecx
+ mul edi
+ add eax,DWORD [32+ecx*4+esp]
+ adc edx,0
+ mov DWORD [32+ecx*4+esp],eax
+ xor ebp,ebp
+ cmp ecx,ebx
+ lea ecx,[1+ecx]
+ je NEAR L$014sqrlast
+ mov ebx,edx
+ shr edx,1
+ and ebx,1
+align 16
+L$015sqradd:
+ mov eax,DWORD [ecx*4+esi]
+ mov ebp,edx
+ mul edi
+ add eax,ebp
+ lea ebp,[eax*1+eax]
+ adc edx,0
+ shr eax,31
+ add ebp,DWORD [32+ecx*4+esp]
+ lea ecx,[1+ecx]
+ adc eax,0
+ add ebp,ebx
+ adc eax,0
+ cmp ecx,DWORD [esp]
+ mov DWORD [28+ecx*4+esp],ebp
+ mov ebx,eax
+ jle NEAR L$015sqradd
+ mov ebp,edx
+ add edx,edx
+ shr ebp,31
+ add edx,ebx
+ adc ebp,0
+L$014sqrlast:
+ mov edi,DWORD [20+esp]
+ mov esi,DWORD [16+esp]
+ imul edi,DWORD [32+esp]
+ add edx,DWORD [32+ecx*4+esp]
+ mov eax,DWORD [esi]
+ adc ebp,0
+ mov DWORD [32+ecx*4+esp],edx
+ mov DWORD [36+ecx*4+esp],ebp
+ mul edi
+ add eax,DWORD [32+esp]
+ lea ebx,[ecx-1]
+ adc edx,0
+ mov ecx,1
+ mov eax,DWORD [4+esi]
+ jmp NEAR L$0133rdmadd
+align 16
+L$007common_tail:
+ mov ebp,DWORD [16+esp]
+ mov edi,DWORD [4+esp]
+ lea esi,[32+esp]
+ mov eax,DWORD [esi]
+ mov ecx,ebx
+ xor edx,edx
+align 16
+L$016sub:
+ sbb eax,DWORD [edx*4+ebp]
+ mov DWORD [edx*4+edi],eax
+ dec ecx
+ mov eax,DWORD [4+edx*4+esi]
+ lea edx,[1+edx]
+ jge NEAR L$016sub
+ sbb eax,0
+ mov edx,-1
+ xor edx,eax
+ jmp NEAR L$017copy
+align 16
+L$017copy:
+ mov esi,DWORD [32+ebx*4+esp]
+ mov ebp,DWORD [ebx*4+edi]
+ mov DWORD [32+ebx*4+esp],ecx
+ and esi,eax
+ and ebp,edx
+ or ebp,esi
+ mov DWORD [ebx*4+edi],ebp
+ dec ebx
+ jge NEAR L$017copy
+ mov esp,DWORD [24+esp]
+ mov eax,1
+L$000just_leave:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+db 77,111,110,116,103,111,109,101,114,121,32,77,117,108,116,105
+db 112,108,105,99,97,116,105,111,110,32,102,111,114,32,120,56
+db 54,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121
+db 32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46
+db 111,114,103,62,0
+segment .bss
+common _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm
new file mode 100644
index 0000000000..dd69f436c4
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm
@@ -0,0 +1,887 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+extern _DES_SPtrans
+global _fcrypt_body
+align 16
+_fcrypt_body:
+L$_fcrypt_body_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ ;
+ ; Load the 2 words
+ xor edi,edi
+ xor esi,esi
+ lea edx,[_DES_SPtrans]
+ push edx
+ mov ebp,DWORD [28+esp]
+ push DWORD 25
+L$000start:
+ ;
+ ; Round 0
+ mov eax,DWORD [36+esp]
+ mov edx,esi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,esi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [ebp]
+ xor eax,ebx
+ mov ecx,DWORD [4+ebp]
+ xor eax,esi
+ xor edx,esi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor edi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 1
+ mov eax,DWORD [36+esp]
+ mov edx,edi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,edi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [8+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [12+ebp]
+ xor eax,edi
+ xor edx,edi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor esi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 2
+ mov eax,DWORD [36+esp]
+ mov edx,esi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,esi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [16+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [20+ebp]
+ xor eax,esi
+ xor edx,esi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor edi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 3
+ mov eax,DWORD [36+esp]
+ mov edx,edi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,edi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [24+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [28+ebp]
+ xor eax,edi
+ xor edx,edi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor esi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 4
+ mov eax,DWORD [36+esp]
+ mov edx,esi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,esi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [32+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [36+ebp]
+ xor eax,esi
+ xor edx,esi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor edi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 5
+ mov eax,DWORD [36+esp]
+ mov edx,edi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,edi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [40+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [44+ebp]
+ xor eax,edi
+ xor edx,edi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor esi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 6
+ mov eax,DWORD [36+esp]
+ mov edx,esi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,esi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [48+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [52+ebp]
+ xor eax,esi
+ xor edx,esi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor edi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 7
+ mov eax,DWORD [36+esp]
+ mov edx,edi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,edi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [56+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [60+ebp]
+ xor eax,edi
+ xor edx,edi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor esi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 8
+ mov eax,DWORD [36+esp]
+ mov edx,esi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,esi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [64+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [68+ebp]
+ xor eax,esi
+ xor edx,esi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor edi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 9
+ mov eax,DWORD [36+esp]
+ mov edx,edi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,edi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [72+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [76+ebp]
+ xor eax,edi
+ xor edx,edi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor esi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 10
+ mov eax,DWORD [36+esp]
+ mov edx,esi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,esi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [80+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [84+ebp]
+ xor eax,esi
+ xor edx,esi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor edi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 11
+ mov eax,DWORD [36+esp]
+ mov edx,edi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,edi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [88+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [92+ebp]
+ xor eax,edi
+ xor edx,edi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor esi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 12
+ mov eax,DWORD [36+esp]
+ mov edx,esi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,esi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [96+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [100+ebp]
+ xor eax,esi
+ xor edx,esi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor edi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 13
+ mov eax,DWORD [36+esp]
+ mov edx,edi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,edi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [104+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [108+ebp]
+ xor eax,edi
+ xor edx,edi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor esi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 14
+ mov eax,DWORD [36+esp]
+ mov edx,esi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,esi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [112+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [116+ebp]
+ xor eax,esi
+ xor edx,esi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor edi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor edi,ebx
+ mov ebp,DWORD [32+esp]
+ ;
+ ; Round 15
+ mov eax,DWORD [36+esp]
+ mov edx,edi
+ shr edx,16
+ mov ecx,DWORD [40+esp]
+ xor edx,edi
+ and eax,edx
+ and edx,ecx
+ mov ebx,eax
+ shl ebx,16
+ mov ecx,edx
+ shl ecx,16
+ xor eax,ebx
+ xor edx,ecx
+ mov ebx,DWORD [120+ebp]
+ xor eax,ebx
+ mov ecx,DWORD [124+ebp]
+ xor eax,edi
+ xor edx,edi
+ xor edx,ecx
+ and eax,0xfcfcfcfc
+ xor ebx,ebx
+ and edx,0xcfcfcfcf
+ xor ecx,ecx
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ mov ebp,DWORD [4+esp]
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ mov ebx,DWORD [0x600+ebx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x700+ecx*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x400+eax*1+ebp]
+ xor esi,ebx
+ mov ebx,DWORD [0x500+edx*1+ebp]
+ xor esi,ebx
+ mov ebp,DWORD [32+esp]
+ mov ebx,DWORD [esp]
+ mov eax,edi
+ dec ebx
+ mov edi,esi
+ mov esi,eax
+ mov DWORD [esp],ebx
+ jnz NEAR L$000start
+ ;
+ ; FP
+ mov edx,DWORD [28+esp]
+ ror edi,1
+ mov eax,esi
+ xor esi,edi
+ and esi,0xaaaaaaaa
+ xor eax,esi
+ xor edi,esi
+ ;
+ rol eax,23
+ mov esi,eax
+ xor eax,edi
+ and eax,0x03fc03fc
+ xor esi,eax
+ xor edi,eax
+ ;
+ rol esi,10
+ mov eax,esi
+ xor esi,edi
+ and esi,0x33333333
+ xor eax,esi
+ xor edi,esi
+ ;
+ rol edi,18
+ mov esi,edi
+ xor edi,eax
+ and edi,0xfff0000f
+ xor esi,edi
+ xor eax,edi
+ ;
+ rol esi,12
+ mov edi,esi
+ xor esi,eax
+ and esi,0xf0f0f0f0
+ xor edi,esi
+ xor eax,esi
+ ;
+ ror eax,4
+ mov DWORD [edx],eax
+ mov DWORD [4+edx],edi
+ add esp,8
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm
new file mode 100644
index 0000000000..980d488316
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm
@@ -0,0 +1,1835 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+global _DES_SPtrans
+align 16
+__x86_DES_encrypt:
+ push ecx
+ ; Round 0
+ mov eax,DWORD [ecx]
+ xor ebx,ebx
+ mov edx,DWORD [4+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 1
+ mov eax,DWORD [8+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [12+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 2
+ mov eax,DWORD [16+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [20+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 3
+ mov eax,DWORD [24+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [28+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 4
+ mov eax,DWORD [32+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [36+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 5
+ mov eax,DWORD [40+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [44+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 6
+ mov eax,DWORD [48+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [52+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 7
+ mov eax,DWORD [56+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [60+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 8
+ mov eax,DWORD [64+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [68+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 9
+ mov eax,DWORD [72+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [76+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 10
+ mov eax,DWORD [80+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [84+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 11
+ mov eax,DWORD [88+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [92+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 12
+ mov eax,DWORD [96+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [100+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 13
+ mov eax,DWORD [104+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [108+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 14
+ mov eax,DWORD [112+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [116+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 15
+ mov eax,DWORD [120+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [124+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ add esp,4
+ ret
+align 16
+__x86_DES_decrypt:
+ push ecx
+ ; Round 15
+ mov eax,DWORD [120+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [124+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 14
+ mov eax,DWORD [112+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [116+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 13
+ mov eax,DWORD [104+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [108+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 12
+ mov eax,DWORD [96+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [100+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 11
+ mov eax,DWORD [88+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [92+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 10
+ mov eax,DWORD [80+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [84+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 9
+ mov eax,DWORD [72+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [76+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 8
+ mov eax,DWORD [64+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [68+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 7
+ mov eax,DWORD [56+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [60+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 6
+ mov eax,DWORD [48+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [52+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 5
+ mov eax,DWORD [40+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [44+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 4
+ mov eax,DWORD [32+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [36+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 3
+ mov eax,DWORD [24+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [28+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 2
+ mov eax,DWORD [16+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [20+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ ; Round 1
+ mov eax,DWORD [8+ecx]
+ xor ebx,ebx
+ mov edx,DWORD [12+ecx]
+ xor eax,esi
+ xor ecx,ecx
+ xor edx,esi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor edi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor edi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor edi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor edi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor edi,DWORD [0x600+ebx*1+ebp]
+ xor edi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor edi,DWORD [0x400+eax*1+ebp]
+ xor edi,DWORD [0x500+edx*1+ebp]
+ ; Round 0
+ mov eax,DWORD [ecx]
+ xor ebx,ebx
+ mov edx,DWORD [4+ecx]
+ xor eax,edi
+ xor ecx,ecx
+ xor edx,edi
+ and eax,0xfcfcfcfc
+ and edx,0xcfcfcfcf
+ mov bl,al
+ mov cl,ah
+ ror edx,4
+ xor esi,DWORD [ebx*1+ebp]
+ mov bl,dl
+ xor esi,DWORD [0x200+ecx*1+ebp]
+ mov cl,dh
+ shr eax,16
+ xor esi,DWORD [0x100+ebx*1+ebp]
+ mov bl,ah
+ shr edx,16
+ xor esi,DWORD [0x300+ecx*1+ebp]
+ mov cl,dh
+ and eax,0xff
+ and edx,0xff
+ xor esi,DWORD [0x600+ebx*1+ebp]
+ xor esi,DWORD [0x700+ecx*1+ebp]
+ mov ecx,DWORD [esp]
+ xor esi,DWORD [0x400+eax*1+ebp]
+ xor esi,DWORD [0x500+edx*1+ebp]
+ add esp,4
+ ret
+global _DES_encrypt1
+align 16
+_DES_encrypt1:
+L$_DES_encrypt1_begin:
+ push esi
+ push edi
+ ;
+ ; Load the 2 words
+ mov esi,DWORD [12+esp]
+ xor ecx,ecx
+ push ebx
+ push ebp
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [28+esp]
+ mov edi,DWORD [4+esi]
+ ;
+ ; IP
+ rol eax,4
+ mov esi,eax
+ xor eax,edi
+ and eax,0xf0f0f0f0
+ xor esi,eax
+ xor edi,eax
+ ;
+ rol edi,20
+ mov eax,edi
+ xor edi,esi
+ and edi,0xfff0000f
+ xor eax,edi
+ xor esi,edi
+ ;
+ rol eax,14
+ mov edi,eax
+ xor eax,esi
+ and eax,0x33333333
+ xor edi,eax
+ xor esi,eax
+ ;
+ rol esi,22
+ mov eax,esi
+ xor esi,edi
+ and esi,0x03fc03fc
+ xor eax,esi
+ xor edi,esi
+ ;
+ rol eax,9
+ mov esi,eax
+ xor eax,edi
+ and eax,0xaaaaaaaa
+ xor esi,eax
+ xor edi,eax
+ ;
+ rol edi,1
+ call L$000pic_point
+L$000pic_point:
+ pop ebp
+ lea ebp,[(L$des_sptrans-L$000pic_point)+ebp]
+ mov ecx,DWORD [24+esp]
+ cmp ebx,0
+ je NEAR L$001decrypt
+ call __x86_DES_encrypt
+ jmp NEAR L$002done
+L$001decrypt:
+ call __x86_DES_decrypt
+L$002done:
+ ;
+ ; FP
+ mov edx,DWORD [20+esp]
+ ror esi,1
+ mov eax,edi
+ xor edi,esi
+ and edi,0xaaaaaaaa
+ xor eax,edi
+ xor esi,edi
+ ;
+ rol eax,23
+ mov edi,eax
+ xor eax,esi
+ and eax,0x03fc03fc
+ xor edi,eax
+ xor esi,eax
+ ;
+ rol edi,10
+ mov eax,edi
+ xor edi,esi
+ and edi,0x33333333
+ xor eax,edi
+ xor esi,edi
+ ;
+ rol esi,18
+ mov edi,esi
+ xor esi,eax
+ and esi,0xfff0000f
+ xor edi,esi
+ xor eax,esi
+ ;
+ rol edi,12
+ mov esi,edi
+ xor edi,eax
+ and edi,0xf0f0f0f0
+ xor esi,edi
+ xor eax,edi
+ ;
+ ror eax,4
+ mov DWORD [edx],eax
+ mov DWORD [4+edx],esi
+ pop ebp
+ pop ebx
+ pop edi
+ pop esi
+ ret
+global _DES_encrypt2
+align 16
+_DES_encrypt2:
+L$_DES_encrypt2_begin:
+ push esi
+ push edi
+ ;
+ ; Load the 2 words
+ mov eax,DWORD [12+esp]
+ xor ecx,ecx
+ push ebx
+ push ebp
+ mov esi,DWORD [eax]
+ mov ebx,DWORD [28+esp]
+ rol esi,3
+ mov edi,DWORD [4+eax]
+ rol edi,3
+ call L$003pic_point
+L$003pic_point:
+ pop ebp
+ lea ebp,[(L$des_sptrans-L$003pic_point)+ebp]
+ mov ecx,DWORD [24+esp]
+ cmp ebx,0
+ je NEAR L$004decrypt
+ call __x86_DES_encrypt
+ jmp NEAR L$005done
+L$004decrypt:
+ call __x86_DES_decrypt
+L$005done:
+ ;
+ ; Fixup
+ ror edi,3
+ mov eax,DWORD [20+esp]
+ ror esi,3
+ mov DWORD [eax],edi
+ mov DWORD [4+eax],esi
+ pop ebp
+ pop ebx
+ pop edi
+ pop esi
+ ret
+global _DES_encrypt3
+align 16
+_DES_encrypt3:
+L$_DES_encrypt3_begin:
+ push ebx
+ mov ebx,DWORD [8+esp]
+ push ebp
+ push esi
+ push edi
+ ;
+ ; Load the data words
+ mov edi,DWORD [ebx]
+ mov esi,DWORD [4+ebx]
+ sub esp,12
+ ;
+ ; IP
+ rol edi,4
+ mov edx,edi
+ xor edi,esi
+ and edi,0xf0f0f0f0
+ xor edx,edi
+ xor esi,edi
+ ;
+ rol esi,20
+ mov edi,esi
+ xor esi,edx
+ and esi,0xfff0000f
+ xor edi,esi
+ xor edx,esi
+ ;
+ rol edi,14
+ mov esi,edi
+ xor edi,edx
+ and edi,0x33333333
+ xor esi,edi
+ xor edx,edi
+ ;
+ rol edx,22
+ mov edi,edx
+ xor edx,esi
+ and edx,0x03fc03fc
+ xor edi,edx
+ xor esi,edx
+ ;
+ rol edi,9
+ mov edx,edi
+ xor edi,esi
+ and edi,0xaaaaaaaa
+ xor edx,edi
+ xor esi,edi
+ ;
+ ror edx,3
+ ror esi,2
+ mov DWORD [4+ebx],esi
+ mov eax,DWORD [36+esp]
+ mov DWORD [ebx],edx
+ mov edi,DWORD [40+esp]
+ mov esi,DWORD [44+esp]
+ mov DWORD [8+esp],DWORD 1
+ mov DWORD [4+esp],eax
+ mov DWORD [esp],ebx
+ call L$_DES_encrypt2_begin
+ mov DWORD [8+esp],DWORD 0
+ mov DWORD [4+esp],edi
+ mov DWORD [esp],ebx
+ call L$_DES_encrypt2_begin
+ mov DWORD [8+esp],DWORD 1
+ mov DWORD [4+esp],esi
+ mov DWORD [esp],ebx
+ call L$_DES_encrypt2_begin
+ add esp,12
+ mov edi,DWORD [ebx]
+ mov esi,DWORD [4+ebx]
+ ;
+ ; FP
+ rol esi,2
+ rol edi,3
+ mov eax,edi
+ xor edi,esi
+ and edi,0xaaaaaaaa
+ xor eax,edi
+ xor esi,edi
+ ;
+ rol eax,23
+ mov edi,eax
+ xor eax,esi
+ and eax,0x03fc03fc
+ xor edi,eax
+ xor esi,eax
+ ;
+ rol edi,10
+ mov eax,edi
+ xor edi,esi
+ and edi,0x33333333
+ xor eax,edi
+ xor esi,edi
+ ;
+ rol esi,18
+ mov edi,esi
+ xor esi,eax
+ and esi,0xfff0000f
+ xor edi,esi
+ xor eax,esi
+ ;
+ rol edi,12
+ mov esi,edi
+ xor edi,eax
+ and edi,0xf0f0f0f0
+ xor esi,edi
+ xor eax,edi
+ ;
+ ror eax,4
+ mov DWORD [ebx],eax
+ mov DWORD [4+ebx],esi
+ pop edi
+ pop esi
+ pop ebp
+ pop ebx
+ ret
+global _DES_decrypt3
+align 16
+_DES_decrypt3:
+L$_DES_decrypt3_begin:
+ push ebx
+ mov ebx,DWORD [8+esp]
+ push ebp
+ push esi
+ push edi
+ ;
+ ; Load the data words
+ mov edi,DWORD [ebx]
+ mov esi,DWORD [4+ebx]
+ sub esp,12
+ ;
+ ; IP
+ rol edi,4
+ mov edx,edi
+ xor edi,esi
+ and edi,0xf0f0f0f0
+ xor edx,edi
+ xor esi,edi
+ ;
+ rol esi,20
+ mov edi,esi
+ xor esi,edx
+ and esi,0xfff0000f
+ xor edi,esi
+ xor edx,esi
+ ;
+ rol edi,14
+ mov esi,edi
+ xor edi,edx
+ and edi,0x33333333
+ xor esi,edi
+ xor edx,edi
+ ;
+ rol edx,22
+ mov edi,edx
+ xor edx,esi
+ and edx,0x03fc03fc
+ xor edi,edx
+ xor esi,edx
+ ;
+ rol edi,9
+ mov edx,edi
+ xor edi,esi
+ and edi,0xaaaaaaaa
+ xor edx,edi
+ xor esi,edi
+ ;
+ ror edx,3
+ ror esi,2
+ mov DWORD [4+ebx],esi
+ mov esi,DWORD [36+esp]
+ mov DWORD [ebx],edx
+ mov edi,DWORD [40+esp]
+ mov eax,DWORD [44+esp]
+ mov DWORD [8+esp],DWORD 0
+ mov DWORD [4+esp],eax
+ mov DWORD [esp],ebx
+ call L$_DES_encrypt2_begin
+ mov DWORD [8+esp],DWORD 1
+ mov DWORD [4+esp],edi
+ mov DWORD [esp],ebx
+ call L$_DES_encrypt2_begin
+ mov DWORD [8+esp],DWORD 0
+ mov DWORD [4+esp],esi
+ mov DWORD [esp],ebx
+ call L$_DES_encrypt2_begin
+ add esp,12
+ mov edi,DWORD [ebx]
+ mov esi,DWORD [4+ebx]
+ ;
+ ; FP
+ rol esi,2
+ rol edi,3
+ mov eax,edi
+ xor edi,esi
+ and edi,0xaaaaaaaa
+ xor eax,edi
+ xor esi,edi
+ ;
+ rol eax,23
+ mov edi,eax
+ xor eax,esi
+ and eax,0x03fc03fc
+ xor edi,eax
+ xor esi,eax
+ ;
+ rol edi,10
+ mov eax,edi
+ xor edi,esi
+ and edi,0x33333333
+ xor eax,edi
+ xor esi,edi
+ ;
+ rol esi,18
+ mov edi,esi
+ xor esi,eax
+ and esi,0xfff0000f
+ xor edi,esi
+ xor eax,esi
+ ;
+ rol edi,12
+ mov esi,edi
+ xor edi,eax
+ and edi,0xf0f0f0f0
+ xor esi,edi
+ xor eax,edi
+ ;
+ ror eax,4
+ mov DWORD [ebx],eax
+ mov DWORD [4+ebx],esi
+ pop edi
+ pop esi
+ pop ebp
+ pop ebx
+ ret
+global _DES_ncbc_encrypt
+align 16
+_DES_ncbc_encrypt:
+L$_DES_ncbc_encrypt_begin:
+ ;
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov ebp,DWORD [28+esp]
+ ; getting iv ptr from parameter 4
+ mov ebx,DWORD [36+esp]
+ mov esi,DWORD [ebx]
+ mov edi,DWORD [4+ebx]
+ push edi
+ push esi
+ push edi
+ push esi
+ mov ebx,esp
+ mov esi,DWORD [36+esp]
+ mov edi,DWORD [40+esp]
+ ; getting encrypt flag from parameter 5
+ mov ecx,DWORD [56+esp]
+ ; get and push parameter 5
+ push ecx
+ ; get and push parameter 3
+ mov eax,DWORD [52+esp]
+ push eax
+ push ebx
+ cmp ecx,0
+ jz NEAR L$006decrypt
+ and ebp,4294967288
+ mov eax,DWORD [12+esp]
+ mov ebx,DWORD [16+esp]
+ jz NEAR L$007encrypt_finish
+L$008encrypt_loop:
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [4+esi]
+ xor eax,ecx
+ xor ebx,edx
+ mov DWORD [12+esp],eax
+ mov DWORD [16+esp],ebx
+ call L$_DES_encrypt1_begin
+ mov eax,DWORD [12+esp]
+ mov ebx,DWORD [16+esp]
+ mov DWORD [edi],eax
+ mov DWORD [4+edi],ebx
+ add esi,8
+ add edi,8
+ sub ebp,8
+ jnz NEAR L$008encrypt_loop
+L$007encrypt_finish:
+ mov ebp,DWORD [56+esp]
+ and ebp,7
+ jz NEAR L$009finish
+ call L$010PIC_point
+L$010PIC_point:
+ pop edx
+ lea ecx,[(L$011cbc_enc_jmp_table-L$010PIC_point)+edx]
+ mov ebp,DWORD [ebp*4+ecx]
+ add ebp,edx
+ xor ecx,ecx
+ xor edx,edx
+ jmp ebp
+L$012ej7:
+ mov dh,BYTE [6+esi]
+ shl edx,8
+L$013ej6:
+ mov dh,BYTE [5+esi]
+L$014ej5:
+ mov dl,BYTE [4+esi]
+L$015ej4:
+ mov ecx,DWORD [esi]
+ jmp NEAR L$016ejend
+L$017ej3:
+ mov ch,BYTE [2+esi]
+ shl ecx,8
+L$018ej2:
+ mov ch,BYTE [1+esi]
+L$019ej1:
+ mov cl,BYTE [esi]
+L$016ejend:
+ xor eax,ecx
+ xor ebx,edx
+ mov DWORD [12+esp],eax
+ mov DWORD [16+esp],ebx
+ call L$_DES_encrypt1_begin
+ mov eax,DWORD [12+esp]
+ mov ebx,DWORD [16+esp]
+ mov DWORD [edi],eax
+ mov DWORD [4+edi],ebx
+ jmp NEAR L$009finish
+L$006decrypt:
+ and ebp,4294967288
+ mov eax,DWORD [20+esp]
+ mov ebx,DWORD [24+esp]
+ jz NEAR L$020decrypt_finish
+L$021decrypt_loop:
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov DWORD [12+esp],eax
+ mov DWORD [16+esp],ebx
+ call L$_DES_encrypt1_begin
+ mov eax,DWORD [12+esp]
+ mov ebx,DWORD [16+esp]
+ mov ecx,DWORD [20+esp]
+ mov edx,DWORD [24+esp]
+ xor ecx,eax
+ xor edx,ebx
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov DWORD [edi],ecx
+ mov DWORD [4+edi],edx
+ mov DWORD [20+esp],eax
+ mov DWORD [24+esp],ebx
+ add esi,8
+ add edi,8
+ sub ebp,8
+ jnz NEAR L$021decrypt_loop
+L$020decrypt_finish:
+ mov ebp,DWORD [56+esp]
+ and ebp,7
+ jz NEAR L$009finish
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov DWORD [12+esp],eax
+ mov DWORD [16+esp],ebx
+ call L$_DES_encrypt1_begin
+ mov eax,DWORD [12+esp]
+ mov ebx,DWORD [16+esp]
+ mov ecx,DWORD [20+esp]
+ mov edx,DWORD [24+esp]
+ xor ecx,eax
+ xor edx,ebx
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+L$022dj7:
+ ror edx,16
+ mov BYTE [6+edi],dl
+ shr edx,16
+L$023dj6:
+ mov BYTE [5+edi],dh
+L$024dj5:
+ mov BYTE [4+edi],dl
+L$025dj4:
+ mov DWORD [edi],ecx
+ jmp NEAR L$026djend
+L$027dj3:
+ ror ecx,16
+ mov BYTE [2+edi],cl
+ shl ecx,16
+L$028dj2:
+ mov BYTE [1+esi],ch
+L$029dj1:
+ mov BYTE [esi],cl
+L$026djend:
+ jmp NEAR L$009finish
+L$009finish:
+ mov ecx,DWORD [64+esp]
+ add esp,28
+ mov DWORD [ecx],eax
+ mov DWORD [4+ecx],ebx
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 64
+L$011cbc_enc_jmp_table:
+dd 0
+dd L$019ej1-L$010PIC_point
+dd L$018ej2-L$010PIC_point
+dd L$017ej3-L$010PIC_point
+dd L$015ej4-L$010PIC_point
+dd L$014ej5-L$010PIC_point
+dd L$013ej6-L$010PIC_point
+dd L$012ej7-L$010PIC_point
+align 64
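The jump table above dispatches on the 1-7 residual bytes left after the 8-byte CBC loop: execution enters at one of the ej7..ej1 labels and falls through, accumulating the tail of the input into the ecx:edx register pair. A C sketch of the same fall-through idea (an illustrative helper, not an OpenSSL API):

```c
#include <stdint.h>
#include <string.h>

/* Load the final 1..7 bytes of a partial DES block into two 32-bit
   halves, zero-padding the rest -- mirroring the ej7..ej1 targets of
   the assembly jump table above. */
static void load_partial_block(const uint8_t *in, int n,
                               uint32_t *lo, uint32_t *hi)
{
    uint32_t c = 0, d = 0;
    switch (n) {              /* deliberate fall-through, as in ej7..ej1 */
    case 7: d  = (uint32_t)in[6] << 16;   /* mov dh,[6+esi]; shl edx,8 */
    case 6: d |= (uint32_t)in[5] << 8;    /* mov dh,[5+esi]            */
    case 5: d |= in[4];                   /* mov dl,[4+esi]            */
    case 4: memcpy(&c, in, 4); break;     /* mov ecx,[esi] (LE load)   */
    case 3: c  = (uint32_t)in[2] << 16;   /* mov ch,[2+esi]; shl ecx,8 */
    case 2: c |= (uint32_t)in[1] << 8;    /* mov ch,[1+esi]            */
    case 1: c |= in[0];                   /* mov cl,[esi]              */
    }
    *lo = c;
    *hi = d;
}
```

The zero entry at offset 0 of the table is never taken, since a remainder of 0 branches straight to the finish label.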
+global _DES_ede3_cbc_encrypt
+align 16
+_DES_ede3_cbc_encrypt:
+L$_DES_ede3_cbc_encrypt_begin:
+ ;
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov ebp,DWORD [28+esp]
+ ; getting iv ptr from parameter 6
+ mov ebx,DWORD [44+esp]
+ mov esi,DWORD [ebx]
+ mov edi,DWORD [4+ebx]
+ push edi
+ push esi
+ push edi
+ push esi
+ mov ebx,esp
+ mov esi,DWORD [36+esp]
+ mov edi,DWORD [40+esp]
+ ; getting encrypt flag from parameter 7
+ mov ecx,DWORD [64+esp]
+ ; get and push parameter 5
+ mov eax,DWORD [56+esp]
+ push eax
+ ; get and push parameter 4
+ mov eax,DWORD [56+esp]
+ push eax
+ ; get and push parameter 3
+ mov eax,DWORD [56+esp]
+ push eax
+ push ebx
+ cmp ecx,0
+ jz NEAR L$030decrypt
+ and ebp,4294967288
+ mov eax,DWORD [16+esp]
+ mov ebx,DWORD [20+esp]
+ jz NEAR L$031encrypt_finish
+L$032encrypt_loop:
+ mov ecx,DWORD [esi]
+ mov edx,DWORD [4+esi]
+ xor eax,ecx
+ xor ebx,edx
+ mov DWORD [16+esp],eax
+ mov DWORD [20+esp],ebx
+ call L$_DES_encrypt3_begin
+ mov eax,DWORD [16+esp]
+ mov ebx,DWORD [20+esp]
+ mov DWORD [edi],eax
+ mov DWORD [4+edi],ebx
+ add esi,8
+ add edi,8
+ sub ebp,8
+ jnz NEAR L$032encrypt_loop
+L$031encrypt_finish:
+ mov ebp,DWORD [60+esp]
+ and ebp,7
+ jz NEAR L$033finish
+ call L$034PIC_point
+L$034PIC_point:
+ pop edx
+ lea ecx,[(L$035cbc_enc_jmp_table-L$034PIC_point)+edx]
+ mov ebp,DWORD [ebp*4+ecx]
+ add ebp,edx
+ xor ecx,ecx
+ xor edx,edx
+ jmp ebp
+L$036ej7:
+ mov dh,BYTE [6+esi]
+ shl edx,8
+L$037ej6:
+ mov dh,BYTE [5+esi]
+L$038ej5:
+ mov dl,BYTE [4+esi]
+L$039ej4:
+ mov ecx,DWORD [esi]
+ jmp NEAR L$040ejend
+L$041ej3:
+ mov ch,BYTE [2+esi]
+ shl ecx,8
+L$042ej2:
+ mov ch,BYTE [1+esi]
+L$043ej1:
+ mov cl,BYTE [esi]
+L$040ejend:
+ xor eax,ecx
+ xor ebx,edx
+ mov DWORD [16+esp],eax
+ mov DWORD [20+esp],ebx
+ call L$_DES_encrypt3_begin
+ mov eax,DWORD [16+esp]
+ mov ebx,DWORD [20+esp]
+ mov DWORD [edi],eax
+ mov DWORD [4+edi],ebx
+ jmp NEAR L$033finish
+L$030decrypt:
+ and ebp,4294967288
+ mov eax,DWORD [24+esp]
+ mov ebx,DWORD [28+esp]
+ jz NEAR L$044decrypt_finish
+L$045decrypt_loop:
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov DWORD [16+esp],eax
+ mov DWORD [20+esp],ebx
+ call L$_DES_decrypt3_begin
+ mov eax,DWORD [16+esp]
+ mov ebx,DWORD [20+esp]
+ mov ecx,DWORD [24+esp]
+ mov edx,DWORD [28+esp]
+ xor ecx,eax
+ xor edx,ebx
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov DWORD [edi],ecx
+ mov DWORD [4+edi],edx
+ mov DWORD [24+esp],eax
+ mov DWORD [28+esp],ebx
+ add esi,8
+ add edi,8
+ sub ebp,8
+ jnz NEAR L$045decrypt_loop
+L$044decrypt_finish:
+ mov ebp,DWORD [60+esp]
+ and ebp,7
+ jz NEAR L$033finish
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov DWORD [16+esp],eax
+ mov DWORD [20+esp],ebx
+ call L$_DES_decrypt3_begin
+ mov eax,DWORD [16+esp]
+ mov ebx,DWORD [20+esp]
+ mov ecx,DWORD [24+esp]
+ mov edx,DWORD [28+esp]
+ xor ecx,eax
+ xor edx,ebx
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+L$046dj7:
+ ror edx,16
+ mov BYTE [6+edi],dl
+ shr edx,16
+L$047dj6:
+ mov BYTE [5+edi],dh
+L$048dj5:
+ mov BYTE [4+edi],dl
+L$049dj4:
+ mov DWORD [edi],ecx
+ jmp NEAR L$050djend
+L$051dj3:
+ ror ecx,16
+ mov BYTE [2+edi],cl
+ shl ecx,16
+L$052dj2:
+ mov BYTE [1+esi],ch
+L$053dj1:
+ mov BYTE [esi],cl
+L$050djend:
+ jmp NEAR L$033finish
+L$033finish:
+ mov ecx,DWORD [76+esp]
+ add esp,32
+ mov DWORD [ecx],eax
+ mov DWORD [4+ecx],ebx
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 64
+L$035cbc_enc_jmp_table:
+dd 0
+dd L$043ej1-L$034PIC_point
+dd L$042ej2-L$034PIC_point
+dd L$041ej3-L$034PIC_point
+dd L$039ej4-L$034PIC_point
+dd L$038ej5-L$034PIC_point
+dd L$037ej6-L$034PIC_point
+dd L$036ej7-L$034PIC_point
+align 64
+align 64
+_DES_SPtrans:
+L$des_sptrans:
+dd 34080768,524288,33554434,34080770
+dd 33554432,526338,524290,33554434
+dd 526338,34080768,34078720,2050
+dd 33556482,33554432,0,524290
+dd 524288,2,33556480,526336
+dd 34080770,34078720,2050,33556480
+dd 2,2048,526336,34078722
+dd 2048,33556482,34078722,0
+dd 0,34080770,33556480,524290
+dd 34080768,524288,2050,33556480
+dd 34078722,2048,526336,33554434
+dd 526338,2,33554434,34078720
+dd 34080770,526336,34078720,33556482
+dd 33554432,2050,524290,0
+dd 524288,33554432,33556482,34080768
+dd 2,34078722,2048,526338
+dd 1074823184,0,1081344,1074790400
+dd 1073741840,32784,1073774592,1081344
+dd 32768,1074790416,16,1073774592
+dd 1048592,1074823168,1074790400,16
+dd 1048576,1073774608,1074790416,32768
+dd 1081360,1073741824,0,1048592
+dd 1073774608,1081360,1074823168,1073741840
+dd 1073741824,1048576,32784,1074823184
+dd 1048592,1074823168,1073774592,1081360
+dd 1074823184,1048592,1073741840,0
+dd 1073741824,32784,1048576,1074790416
+dd 32768,1073741824,1081360,1073774608
+dd 1074823168,32768,0,1073741840
+dd 16,1074823184,1081344,1074790400
+dd 1074790416,1048576,32784,1073774592
+dd 1073774608,16,1074790400,1081344
+dd 67108865,67371264,256,67109121
+dd 262145,67108864,67109121,262400
+dd 67109120,262144,67371008,1
+dd 67371265,257,1,67371009
+dd 0,262145,67371264,256
+dd 257,67371265,262144,67108865
+dd 67371009,67109120,262401,67371008
+dd 262400,0,67108864,262401
+dd 67371264,256,1,262144
+dd 257,262145,67371008,67109121
+dd 0,67371264,262400,67371009
+dd 262145,67108864,67371265,1
+dd 262401,67108865,67108864,67371265
+dd 262144,67109120,67109121,262400
+dd 67109120,0,67371009,257
+dd 67108865,262401,256,67371008
+dd 4198408,268439552,8,272633864
+dd 0,272629760,268439560,4194312
+dd 272633856,268435464,268435456,4104
+dd 268435464,4198408,4194304,268435456
+dd 272629768,4198400,4096,8
+dd 4198400,268439560,272629760,4096
+dd 4104,0,4194312,272633856
+dd 268439552,272629768,272633864,4194304
+dd 272629768,4104,4194304,268435464
+dd 4198400,268439552,8,272629760
+dd 268439560,0,4096,4194312
+dd 0,272629768,272633856,4096
+dd 268435456,272633864,4198408,4194304
+dd 272633864,8,268439552,4198408
+dd 4194312,4198400,272629760,268439560
+dd 4104,268435456,268435464,272633856
+dd 134217728,65536,1024,134284320
+dd 134283296,134218752,66592,134283264
+dd 65536,32,134217760,66560
+dd 134218784,134283296,134284288,0
+dd 66560,134217728,65568,1056
+dd 134218752,66592,0,134217760
+dd 32,134218784,134284320,65568
+dd 134283264,1024,1056,134284288
+dd 134284288,134218784,65568,134283264
+dd 65536,32,134217760,134218752
+dd 134217728,66560,134284320,0
+dd 66592,134217728,1024,65568
+dd 134218784,1024,0,134284320
+dd 134283296,134284288,1056,65536
+dd 66560,134283296,134218752,1056
+dd 32,66592,134283264,134217760
+dd 2147483712,2097216,0,2149588992
+dd 2097216,8192,2147491904,2097152
+dd 8256,2149589056,2105344,2147483648
+dd 2147491840,2147483712,2149580800,2105408
+dd 2097152,2147491904,2149580864,0
+dd 8192,64,2149588992,2149580864
+dd 2149589056,2149580800,2147483648,8256
+dd 64,2105344,2105408,2147491840
+dd 8256,2147483648,2147491840,2105408
+dd 2149588992,2097216,0,2147491840
+dd 2147483648,8192,2149580864,2097152
+dd 2097216,2149589056,2105344,64
+dd 2149589056,2105344,2097152,2147491904
+dd 2147483712,2149580800,2105408,0
+dd 8192,2147483712,2147491904,2149588992
+dd 2149580800,8256,64,2149580864
+dd 16384,512,16777728,16777220
+dd 16794116,16388,16896,0
+dd 16777216,16777732,516,16793600
+dd 4,16794112,16793600,516
+dd 16777732,16384,16388,16794116
+dd 0,16777728,16777220,16896
+dd 16793604,16900,16794112,4
+dd 16900,16793604,512,16777216
+dd 16900,16793600,16793604,516
+dd 16384,512,16777216,16793604
+dd 16777732,16900,16896,0
+dd 512,16777220,4,16777728
+dd 0,16777732,16777728,16896
+dd 516,16384,16794116,16777216
+dd 16794112,4,16388,16794116
+dd 16777220,16794112,16793600,16388
+dd 545259648,545390592,131200,0
+dd 537001984,8388736,545259520,545390720
+dd 128,536870912,8519680,131200
+dd 8519808,537002112,536871040,545259520
+dd 131072,8519808,8388736,537001984
+dd 545390720,536871040,0,8519680
+dd 536870912,8388608,537002112,545259648
+dd 8388608,131072,545390592,128
+dd 8388608,131072,536871040,545390720
+dd 131200,536870912,0,8519680
+dd 545259648,537002112,537001984,8388736
+dd 545390592,128,8388736,537001984
+dd 545390720,8388608,545259520,536871040
+dd 8519680,131200,537002112,545259520
+dd 128,545390592,8519808,0
+dd 536870912,545259648,131072,8519808
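The repeating rol/mov/xor/and/xor/xor sequences in the IP and FP sections of this file are masked bit swaps: after a rotate, the bits selected by a mask are exchanged between two words, so the full 64-bit DES permutation is built from a handful of such steps. A minimal C sketch of one step (the function name is illustrative; OpenSSL's C fallback expresses the same idiom with shifts in its PERM_OP macro):

```c
#include <stdint.h>

static inline uint32_t rol32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* One exchange step of the DES permutation as used above: rotate a,
   then swap the bits selected by `mask` between a and b.
   Mirrors: rol a,rot / mov t,a / xor a,b / and a,mask / xor t,a / xor b,a */
static void perm_step(uint32_t *a, uint32_t *b, int rot, uint32_t mask)
{
    uint32_t t;
    *a = rol32(*a, rot);
    t = (*a ^ *b) & mask;   /* bits (within mask) where a and b differ */
    *a ^= t;                /* a takes b's bits there ...              */
    *b ^= t;                /* ... and b takes a's: a masked swap      */
}
```

Applying the same step twice with identical arguments restores the original values, which is why the FP section can undo IP by running the steps with inverse rotations in reverse order.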
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm
new file mode 100644
index 0000000000..83e4e77e6a
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm
@@ -0,0 +1,690 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+global _md5_block_asm_data_order
+align 16
+_md5_block_asm_data_order:
+L$_md5_block_asm_data_order_begin:
+ push esi
+ push edi
+ mov edi,DWORD [12+esp]
+ mov esi,DWORD [16+esp]
+ mov ecx,DWORD [20+esp]
+ push ebp
+ shl ecx,6
+ push ebx
+ add ecx,esi
+ sub ecx,64
+ mov eax,DWORD [edi]
+ push ecx
+ mov ebx,DWORD [4+edi]
+ mov ecx,DWORD [8+edi]
+ mov edx,DWORD [12+edi]
+L$000start:
+ ;
+ ; R0 section
+ mov edi,ecx
+ mov ebp,DWORD [esi]
+ ; R0 0
+ xor edi,edx
+ and edi,ebx
+ lea eax,[3614090360+ebp*1+eax]
+ xor edi,edx
+ mov ebp,DWORD [4+esi]
+ add eax,edi
+ rol eax,7
+ mov edi,ebx
+ add eax,ebx
+ ; R0 1
+ xor edi,ecx
+ and edi,eax
+ lea edx,[3905402710+ebp*1+edx]
+ xor edi,ecx
+ mov ebp,DWORD [8+esi]
+ add edx,edi
+ rol edx,12
+ mov edi,eax
+ add edx,eax
+ ; R0 2
+ xor edi,ebx
+ and edi,edx
+ lea ecx,[606105819+ebp*1+ecx]
+ xor edi,ebx
+ mov ebp,DWORD [12+esi]
+ add ecx,edi
+ rol ecx,17
+ mov edi,edx
+ add ecx,edx
+ ; R0 3
+ xor edi,eax
+ and edi,ecx
+ lea ebx,[3250441966+ebp*1+ebx]
+ xor edi,eax
+ mov ebp,DWORD [16+esi]
+ add ebx,edi
+ rol ebx,22
+ mov edi,ecx
+ add ebx,ecx
+ ; R0 4
+ xor edi,edx
+ and edi,ebx
+ lea eax,[4118548399+ebp*1+eax]
+ xor edi,edx
+ mov ebp,DWORD [20+esi]
+ add eax,edi
+ rol eax,7
+ mov edi,ebx
+ add eax,ebx
+ ; R0 5
+ xor edi,ecx
+ and edi,eax
+ lea edx,[1200080426+ebp*1+edx]
+ xor edi,ecx
+ mov ebp,DWORD [24+esi]
+ add edx,edi
+ rol edx,12
+ mov edi,eax
+ add edx,eax
+ ; R0 6
+ xor edi,ebx
+ and edi,edx
+ lea ecx,[2821735955+ebp*1+ecx]
+ xor edi,ebx
+ mov ebp,DWORD [28+esi]
+ add ecx,edi
+ rol ecx,17
+ mov edi,edx
+ add ecx,edx
+ ; R0 7
+ xor edi,eax
+ and edi,ecx
+ lea ebx,[4249261313+ebp*1+ebx]
+ xor edi,eax
+ mov ebp,DWORD [32+esi]
+ add ebx,edi
+ rol ebx,22
+ mov edi,ecx
+ add ebx,ecx
+ ; R0 8
+ xor edi,edx
+ and edi,ebx
+ lea eax,[1770035416+ebp*1+eax]
+ xor edi,edx
+ mov ebp,DWORD [36+esi]
+ add eax,edi
+ rol eax,7
+ mov edi,ebx
+ add eax,ebx
+ ; R0 9
+ xor edi,ecx
+ and edi,eax
+ lea edx,[2336552879+ebp*1+edx]
+ xor edi,ecx
+ mov ebp,DWORD [40+esi]
+ add edx,edi
+ rol edx,12
+ mov edi,eax
+ add edx,eax
+ ; R0 10
+ xor edi,ebx
+ and edi,edx
+ lea ecx,[4294925233+ebp*1+ecx]
+ xor edi,ebx
+ mov ebp,DWORD [44+esi]
+ add ecx,edi
+ rol ecx,17
+ mov edi,edx
+ add ecx,edx
+ ; R0 11
+ xor edi,eax
+ and edi,ecx
+ lea ebx,[2304563134+ebp*1+ebx]
+ xor edi,eax
+ mov ebp,DWORD [48+esi]
+ add ebx,edi
+ rol ebx,22
+ mov edi,ecx
+ add ebx,ecx
+ ; R0 12
+ xor edi,edx
+ and edi,ebx
+ lea eax,[1804603682+ebp*1+eax]
+ xor edi,edx
+ mov ebp,DWORD [52+esi]
+ add eax,edi
+ rol eax,7
+ mov edi,ebx
+ add eax,ebx
+ ; R0 13
+ xor edi,ecx
+ and edi,eax
+ lea edx,[4254626195+ebp*1+edx]
+ xor edi,ecx
+ mov ebp,DWORD [56+esi]
+ add edx,edi
+ rol edx,12
+ mov edi,eax
+ add edx,eax
+ ; R0 14
+ xor edi,ebx
+ and edi,edx
+ lea ecx,[2792965006+ebp*1+ecx]
+ xor edi,ebx
+ mov ebp,DWORD [60+esi]
+ add ecx,edi
+ rol ecx,17
+ mov edi,edx
+ add ecx,edx
+ ; R0 15
+ xor edi,eax
+ and edi,ecx
+ lea ebx,[1236535329+ebp*1+ebx]
+ xor edi,eax
+ mov ebp,DWORD [4+esi]
+ add ebx,edi
+ rol ebx,22
+ mov edi,ecx
+ add ebx,ecx
+ ;
+ ; R1 section
+ ; R1 16
+ xor edi,ebx
+ and edi,edx
+ lea eax,[4129170786+ebp*1+eax]
+ xor edi,ecx
+ mov ebp,DWORD [24+esi]
+ add eax,edi
+ mov edi,ebx
+ rol eax,5
+ add eax,ebx
+ ; R1 17
+ xor edi,eax
+ and edi,ecx
+ lea edx,[3225465664+ebp*1+edx]
+ xor edi,ebx
+ mov ebp,DWORD [44+esi]
+ add edx,edi
+ mov edi,eax
+ rol edx,9
+ add edx,eax
+ ; R1 18
+ xor edi,edx
+ and edi,ebx
+ lea ecx,[643717713+ebp*1+ecx]
+ xor edi,eax
+ mov ebp,DWORD [esi]
+ add ecx,edi
+ mov edi,edx
+ rol ecx,14
+ add ecx,edx
+ ; R1 19
+ xor edi,ecx
+ and edi,eax
+ lea ebx,[3921069994+ebp*1+ebx]
+ xor edi,edx
+ mov ebp,DWORD [20+esi]
+ add ebx,edi
+ mov edi,ecx
+ rol ebx,20
+ add ebx,ecx
+ ; R1 20
+ xor edi,ebx
+ and edi,edx
+ lea eax,[3593408605+ebp*1+eax]
+ xor edi,ecx
+ mov ebp,DWORD [40+esi]
+ add eax,edi
+ mov edi,ebx
+ rol eax,5
+ add eax,ebx
+ ; R1 21
+ xor edi,eax
+ and edi,ecx
+ lea edx,[38016083+ebp*1+edx]
+ xor edi,ebx
+ mov ebp,DWORD [60+esi]
+ add edx,edi
+ mov edi,eax
+ rol edx,9
+ add edx,eax
+ ; R1 22
+ xor edi,edx
+ and edi,ebx
+ lea ecx,[3634488961+ebp*1+ecx]
+ xor edi,eax
+ mov ebp,DWORD [16+esi]
+ add ecx,edi
+ mov edi,edx
+ rol ecx,14
+ add ecx,edx
+ ; R1 23
+ xor edi,ecx
+ and edi,eax
+ lea ebx,[3889429448+ebp*1+ebx]
+ xor edi,edx
+ mov ebp,DWORD [36+esi]
+ add ebx,edi
+ mov edi,ecx
+ rol ebx,20
+ add ebx,ecx
+ ; R1 24
+ xor edi,ebx
+ and edi,edx
+ lea eax,[568446438+ebp*1+eax]
+ xor edi,ecx
+ mov ebp,DWORD [56+esi]
+ add eax,edi
+ mov edi,ebx
+ rol eax,5
+ add eax,ebx
+ ; R1 25
+ xor edi,eax
+ and edi,ecx
+ lea edx,[3275163606+ebp*1+edx]
+ xor edi,ebx
+ mov ebp,DWORD [12+esi]
+ add edx,edi
+ mov edi,eax
+ rol edx,9
+ add edx,eax
+ ; R1 26
+ xor edi,edx
+ and edi,ebx
+ lea ecx,[4107603335+ebp*1+ecx]
+ xor edi,eax
+ mov ebp,DWORD [32+esi]
+ add ecx,edi
+ mov edi,edx
+ rol ecx,14
+ add ecx,edx
+ ; R1 27
+ xor edi,ecx
+ and edi,eax
+ lea ebx,[1163531501+ebp*1+ebx]
+ xor edi,edx
+ mov ebp,DWORD [52+esi]
+ add ebx,edi
+ mov edi,ecx
+ rol ebx,20
+ add ebx,ecx
+ ; R1 28
+ xor edi,ebx
+ and edi,edx
+ lea eax,[2850285829+ebp*1+eax]
+ xor edi,ecx
+ mov ebp,DWORD [8+esi]
+ add eax,edi
+ mov edi,ebx
+ rol eax,5
+ add eax,ebx
+ ; R1 29
+ xor edi,eax
+ and edi,ecx
+ lea edx,[4243563512+ebp*1+edx]
+ xor edi,ebx
+ mov ebp,DWORD [28+esi]
+ add edx,edi
+ mov edi,eax
+ rol edx,9
+ add edx,eax
+ ; R1 30
+ xor edi,edx
+ and edi,ebx
+ lea ecx,[1735328473+ebp*1+ecx]
+ xor edi,eax
+ mov ebp,DWORD [48+esi]
+ add ecx,edi
+ mov edi,edx
+ rol ecx,14
+ add ecx,edx
+ ; R1 31
+ xor edi,ecx
+ and edi,eax
+ lea ebx,[2368359562+ebp*1+ebx]
+ xor edi,edx
+ mov ebp,DWORD [20+esi]
+ add ebx,edi
+ mov edi,ecx
+ rol ebx,20
+ add ebx,ecx
+ ;
+ ; R2 section
+ ; R2 32
+ xor edi,edx
+ xor edi,ebx
+ lea eax,[4294588738+ebp*1+eax]
+ add eax,edi
+ mov ebp,DWORD [32+esi]
+ rol eax,4
+ mov edi,ebx
+ ; R2 33
+ add eax,ebx
+ xor edi,ecx
+ lea edx,[2272392833+ebp*1+edx]
+ xor edi,eax
+ mov ebp,DWORD [44+esi]
+ add edx,edi
+ mov edi,eax
+ rol edx,11
+ add edx,eax
+ ; R2 34
+ xor edi,ebx
+ xor edi,edx
+ lea ecx,[1839030562+ebp*1+ecx]
+ add ecx,edi
+ mov ebp,DWORD [56+esi]
+ rol ecx,16
+ mov edi,edx
+ ; R2 35
+ add ecx,edx
+ xor edi,eax
+ lea ebx,[4259657740+ebp*1+ebx]
+ xor edi,ecx
+ mov ebp,DWORD [4+esi]
+ add ebx,edi
+ mov edi,ecx
+ rol ebx,23
+ add ebx,ecx
+ ; R2 36
+ xor edi,edx
+ xor edi,ebx
+ lea eax,[2763975236+ebp*1+eax]
+ add eax,edi
+ mov ebp,DWORD [16+esi]
+ rol eax,4
+ mov edi,ebx
+ ; R2 37
+ add eax,ebx
+ xor edi,ecx
+ lea edx,[1272893353+ebp*1+edx]
+ xor edi,eax
+ mov ebp,DWORD [28+esi]
+ add edx,edi
+ mov edi,eax
+ rol edx,11
+ add edx,eax
+ ; R2 38
+ xor edi,ebx
+ xor edi,edx
+ lea ecx,[4139469664+ebp*1+ecx]
+ add ecx,edi
+ mov ebp,DWORD [40+esi]
+ rol ecx,16
+ mov edi,edx
+ ; R2 39
+ add ecx,edx
+ xor edi,eax
+ lea ebx,[3200236656+ebp*1+ebx]
+ xor edi,ecx
+ mov ebp,DWORD [52+esi]
+ add ebx,edi
+ mov edi,ecx
+ rol ebx,23
+ add ebx,ecx
+ ; R2 40
+ xor edi,edx
+ xor edi,ebx
+ lea eax,[681279174+ebp*1+eax]
+ add eax,edi
+ mov ebp,DWORD [esi]
+ rol eax,4
+ mov edi,ebx
+ ; R2 41
+ add eax,ebx
+ xor edi,ecx
+ lea edx,[3936430074+ebp*1+edx]
+ xor edi,eax
+ mov ebp,DWORD [12+esi]
+ add edx,edi
+ mov edi,eax
+ rol edx,11
+ add edx,eax
+ ; R2 42
+ xor edi,ebx
+ xor edi,edx
+ lea ecx,[3572445317+ebp*1+ecx]
+ add ecx,edi
+ mov ebp,DWORD [24+esi]
+ rol ecx,16
+ mov edi,edx
+ ; R2 43
+ add ecx,edx
+ xor edi,eax
+ lea ebx,[76029189+ebp*1+ebx]
+ xor edi,ecx
+ mov ebp,DWORD [36+esi]
+ add ebx,edi
+ mov edi,ecx
+ rol ebx,23
+ add ebx,ecx
+ ; R2 44
+ xor edi,edx
+ xor edi,ebx
+ lea eax,[3654602809+ebp*1+eax]
+ add eax,edi
+ mov ebp,DWORD [48+esi]
+ rol eax,4
+ mov edi,ebx
+ ; R2 45
+ add eax,ebx
+ xor edi,ecx
+ lea edx,[3873151461+ebp*1+edx]
+ xor edi,eax
+ mov ebp,DWORD [60+esi]
+ add edx,edi
+ mov edi,eax
+ rol edx,11
+ add edx,eax
+ ; R2 46
+ xor edi,ebx
+ xor edi,edx
+ lea ecx,[530742520+ebp*1+ecx]
+ add ecx,edi
+ mov ebp,DWORD [8+esi]
+ rol ecx,16
+ mov edi,edx
+ ; R2 47
+ add ecx,edx
+ xor edi,eax
+ lea ebx,[3299628645+ebp*1+ebx]
+ xor edi,ecx
+ mov ebp,DWORD [esi]
+ add ebx,edi
+ mov edi,-1
+ rol ebx,23
+ add ebx,ecx
+ ;
+ ; R3 section
+ ; R3 48
+ xor edi,edx
+ or edi,ebx
+ lea eax,[4096336452+ebp*1+eax]
+ xor edi,ecx
+ mov ebp,DWORD [28+esi]
+ add eax,edi
+ mov edi,-1
+ rol eax,6
+ xor edi,ecx
+ add eax,ebx
+ ; R3 49
+ or edi,eax
+ lea edx,[1126891415+ebp*1+edx]
+ xor edi,ebx
+ mov ebp,DWORD [56+esi]
+ add edx,edi
+ mov edi,-1
+ rol edx,10
+ xor edi,ebx
+ add edx,eax
+ ; R3 50
+ or edi,edx
+ lea ecx,[2878612391+ebp*1+ecx]
+ xor edi,eax
+ mov ebp,DWORD [20+esi]
+ add ecx,edi
+ mov edi,-1
+ rol ecx,15
+ xor edi,eax
+ add ecx,edx
+ ; R3 51
+ or edi,ecx
+ lea ebx,[4237533241+ebp*1+ebx]
+ xor edi,edx
+ mov ebp,DWORD [48+esi]
+ add ebx,edi
+ mov edi,-1
+ rol ebx,21
+ xor edi,edx
+ add ebx,ecx
+ ; R3 52
+ or edi,ebx
+ lea eax,[1700485571+ebp*1+eax]
+ xor edi,ecx
+ mov ebp,DWORD [12+esi]
+ add eax,edi
+ mov edi,-1
+ rol eax,6
+ xor edi,ecx
+ add eax,ebx
+ ; R3 53
+ or edi,eax
+ lea edx,[2399980690+ebp*1+edx]
+ xor edi,ebx
+ mov ebp,DWORD [40+esi]
+ add edx,edi
+ mov edi,-1
+ rol edx,10
+ xor edi,ebx
+ add edx,eax
+ ; R3 54
+ or edi,edx
+ lea ecx,[4293915773+ebp*1+ecx]
+ xor edi,eax
+ mov ebp,DWORD [4+esi]
+ add ecx,edi
+ mov edi,-1
+ rol ecx,15
+ xor edi,eax
+ add ecx,edx
+ ; R3 55
+ or edi,ecx
+ lea ebx,[2240044497+ebp*1+ebx]
+ xor edi,edx
+ mov ebp,DWORD [32+esi]
+ add ebx,edi
+ mov edi,-1
+ rol ebx,21
+ xor edi,edx
+ add ebx,ecx
+ ; R3 56
+ or edi,ebx
+ lea eax,[1873313359+ebp*1+eax]
+ xor edi,ecx
+ mov ebp,DWORD [60+esi]
+ add eax,edi
+ mov edi,-1
+ rol eax,6
+ xor edi,ecx
+ add eax,ebx
+ ; R3 57
+ or edi,eax
+ lea edx,[4264355552+ebp*1+edx]
+ xor edi,ebx
+ mov ebp,DWORD [24+esi]
+ add edx,edi
+ mov edi,-1
+ rol edx,10
+ xor edi,ebx
+ add edx,eax
+ ; R3 58
+ or edi,edx
+ lea ecx,[2734768916+ebp*1+ecx]
+ xor edi,eax
+ mov ebp,DWORD [52+esi]
+ add ecx,edi
+ mov edi,-1
+ rol ecx,15
+ xor edi,eax
+ add ecx,edx
+ ; R3 59
+ or edi,ecx
+ lea ebx,[1309151649+ebp*1+ebx]
+ xor edi,edx
+ mov ebp,DWORD [16+esi]
+ add ebx,edi
+ mov edi,-1
+ rol ebx,21
+ xor edi,edx
+ add ebx,ecx
+ ; R3 60
+ or edi,ebx
+ lea eax,[4149444226+ebp*1+eax]
+ xor edi,ecx
+ mov ebp,DWORD [44+esi]
+ add eax,edi
+ mov edi,-1
+ rol eax,6
+ xor edi,ecx
+ add eax,ebx
+ ; R3 61
+ or edi,eax
+ lea edx,[3174756917+ebp*1+edx]
+ xor edi,ebx
+ mov ebp,DWORD [8+esi]
+ add edx,edi
+ mov edi,-1
+ rol edx,10
+ xor edi,ebx
+ add edx,eax
+ ; R3 62
+ or edi,edx
+ lea ecx,[718787259+ebp*1+ecx]
+ xor edi,eax
+ mov ebp,DWORD [36+esi]
+ add ecx,edi
+ mov edi,-1
+ rol ecx,15
+ xor edi,eax
+ add ecx,edx
+ ; R3 63
+ or edi,ecx
+ lea ebx,[3951481745+ebp*1+ebx]
+ xor edi,edx
+ mov ebp,DWORD [24+esp]
+ add ebx,edi
+ add esi,64
+ rol ebx,21
+ mov edi,DWORD [ebp]
+ add ebx,ecx
+ add eax,edi
+ mov edi,DWORD [4+ebp]
+ add ebx,edi
+ mov edi,DWORD [8+ebp]
+ add ecx,edi
+ mov edi,DWORD [12+ebp]
+ add edx,edi
+ mov DWORD [ebp],eax
+ mov DWORD [4+ebp],ebx
+ mov edi,DWORD [esp]
+ mov DWORD [8+ebp],ecx
+ mov DWORD [12+ebp],edx
+ cmp edi,esi
+ jae NEAR L$000start
+ pop eax
+ pop ebx
+ pop ebp
+ pop edi
+ pop esi
+ ret
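The R0 rounds in this file compute MD5's selection function F(b,c,d) = (b AND c) OR (NOT b AND d) with three logic ops instead of four, via the mov/xor/and/xor sequence on edi. A small C sketch showing the two forms are equivalent:

```c
#include <stdint.h>

/* Textbook form of MD5's round-1 function. */
static uint32_t md5_f_classic(uint32_t b, uint32_t c, uint32_t d)
{
    return (b & c) | (~b & d);
}

/* The form used in the assembly above:
   mov edi,ecx / xor edi,edx / and edi,ebx / xor edi,edx
   i.e. ((c ^ d) & b) ^ d -- selects bits of c where b is set
   and bits of d where b is clear. */
static uint32_t md5_f_xor_trick(uint32_t b, uint32_t c, uint32_t d)
{
    return ((c ^ d) & b) ^ d;
}
```

The R1 section uses the symmetric G(b,c,d) the same way with the roles of b and d exchanged, which is why its xor/and operands differ from R0 while the instruction pattern stays the same.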
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm
new file mode 100644
index 0000000000..57649ad22b
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm
@@ -0,0 +1,1264 @@
+; Copyright 2010-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+global _gcm_gmult_4bit_x86
+align 16
+_gcm_gmult_4bit_x86:
+L$_gcm_gmult_4bit_x86_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ sub esp,84
+ mov edi,DWORD [104+esp]
+ mov esi,DWORD [108+esp]
+ mov ebp,DWORD [edi]
+ mov edx,DWORD [4+edi]
+ mov ecx,DWORD [8+edi]
+ mov ebx,DWORD [12+edi]
+ mov DWORD [16+esp],0
+ mov DWORD [20+esp],471859200
+ mov DWORD [24+esp],943718400
+ mov DWORD [28+esp],610271232
+ mov DWORD [32+esp],1887436800
+ mov DWORD [36+esp],1822425088
+ mov DWORD [40+esp],1220542464
+ mov DWORD [44+esp],1423966208
+ mov DWORD [48+esp],3774873600
+ mov DWORD [52+esp],4246732800
+ mov DWORD [56+esp],3644850176
+ mov DWORD [60+esp],3311403008
+ mov DWORD [64+esp],2441084928
+ mov DWORD [68+esp],2376073216
+ mov DWORD [72+esp],2847932416
+ mov DWORD [76+esp],3051356160
+ mov DWORD [esp],ebp
+ mov DWORD [4+esp],edx
+ mov DWORD [8+esp],ecx
+ mov DWORD [12+esp],ebx
+ shr ebx,20
+ and ebx,240
+ mov ebp,DWORD [4+ebx*1+esi]
+ mov edx,DWORD [ebx*1+esi]
+ mov ecx,DWORD [12+ebx*1+esi]
+ mov ebx,DWORD [8+ebx*1+esi]
+ xor eax,eax
+ mov edi,15
+ jmp NEAR L$000x86_loop
+align 16
+L$000x86_loop:
+ mov al,bl
+ shrd ebx,ecx,4
+ and al,15
+ shrd ecx,edx,4
+ shrd edx,ebp,4
+ shr ebp,4
+ xor ebp,DWORD [16+eax*4+esp]
+ mov al,BYTE [edi*1+esp]
+ and al,240
+ xor ebx,DWORD [8+eax*1+esi]
+ xor ecx,DWORD [12+eax*1+esi]
+ xor edx,DWORD [eax*1+esi]
+ xor ebp,DWORD [4+eax*1+esi]
+ dec edi
+ js NEAR L$001x86_break
+ mov al,bl
+ shrd ebx,ecx,4
+ and al,15
+ shrd ecx,edx,4
+ shrd edx,ebp,4
+ shr ebp,4
+ xor ebp,DWORD [16+eax*4+esp]
+ mov al,BYTE [edi*1+esp]
+ shl al,4
+ xor ebx,DWORD [8+eax*1+esi]
+ xor ecx,DWORD [12+eax*1+esi]
+ xor edx,DWORD [eax*1+esi]
+ xor ebp,DWORD [4+eax*1+esi]
+ jmp NEAR L$000x86_loop
+align 16
+L$001x86_break:
+ bswap ebx
+ bswap ecx
+ bswap edx
+ bswap ebp
+ mov edi,DWORD [104+esp]
+ mov DWORD [12+edi],ebx
+ mov DWORD [8+edi],ecx
+ mov DWORD [4+edi],edx
+ mov DWORD [edi],ebp
+ add esp,84
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _gcm_ghash_4bit_x86
+align 16
+_gcm_ghash_4bit_x86:
+L$_gcm_ghash_4bit_x86_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ sub esp,84
+ mov ebx,DWORD [104+esp]
+ mov esi,DWORD [108+esp]
+ mov edi,DWORD [112+esp]
+ mov ecx,DWORD [116+esp]
+ add ecx,edi
+ mov DWORD [116+esp],ecx
+ mov ebp,DWORD [ebx]
+ mov edx,DWORD [4+ebx]
+ mov ecx,DWORD [8+ebx]
+ mov ebx,DWORD [12+ebx]
+ mov DWORD [16+esp],0
+ mov DWORD [20+esp],471859200
+ mov DWORD [24+esp],943718400
+ mov DWORD [28+esp],610271232
+ mov DWORD [32+esp],1887436800
+ mov DWORD [36+esp],1822425088
+ mov DWORD [40+esp],1220542464
+ mov DWORD [44+esp],1423966208
+ mov DWORD [48+esp],3774873600
+ mov DWORD [52+esp],4246732800
+ mov DWORD [56+esp],3644850176
+ mov DWORD [60+esp],3311403008
+ mov DWORD [64+esp],2441084928
+ mov DWORD [68+esp],2376073216
+ mov DWORD [72+esp],2847932416
+ mov DWORD [76+esp],3051356160
+align 16
+L$002x86_outer_loop:
+ xor ebx,DWORD [12+edi]
+ xor ecx,DWORD [8+edi]
+ xor edx,DWORD [4+edi]
+ xor ebp,DWORD [edi]
+ mov DWORD [12+esp],ebx
+ mov DWORD [8+esp],ecx
+ mov DWORD [4+esp],edx
+ mov DWORD [esp],ebp
+ shr ebx,20
+ and ebx,240
+ mov ebp,DWORD [4+ebx*1+esi]
+ mov edx,DWORD [ebx*1+esi]
+ mov ecx,DWORD [12+ebx*1+esi]
+ mov ebx,DWORD [8+ebx*1+esi]
+ xor eax,eax
+ mov edi,15
+ jmp NEAR L$003x86_loop
+align 16
+L$003x86_loop:
+ mov al,bl
+ shrd ebx,ecx,4
+ and al,15
+ shrd ecx,edx,4
+ shrd edx,ebp,4
+ shr ebp,4
+ xor ebp,DWORD [16+eax*4+esp]
+ mov al,BYTE [edi*1+esp]
+ and al,240
+ xor ebx,DWORD [8+eax*1+esi]
+ xor ecx,DWORD [12+eax*1+esi]
+ xor edx,DWORD [eax*1+esi]
+ xor ebp,DWORD [4+eax*1+esi]
+ dec edi
+ js NEAR L$004x86_break
+ mov al,bl
+ shrd ebx,ecx,4
+ and al,15
+ shrd ecx,edx,4
+ shrd edx,ebp,4
+ shr ebp,4
+ xor ebp,DWORD [16+eax*4+esp]
+ mov al,BYTE [edi*1+esp]
+ shl al,4
+ xor ebx,DWORD [8+eax*1+esi]
+ xor ecx,DWORD [12+eax*1+esi]
+ xor edx,DWORD [eax*1+esi]
+ xor ebp,DWORD [4+eax*1+esi]
+ jmp NEAR L$003x86_loop
+align 16
+L$004x86_break:
+ bswap ebx
+ bswap ecx
+ bswap edx
+ bswap ebp
+ mov edi,DWORD [112+esp]
+ lea edi,[16+edi]
+ cmp edi,DWORD [116+esp]
+ mov DWORD [112+esp],edi
+ jb NEAR L$002x86_outer_loop
+ mov edi,DWORD [104+esp]
+ mov DWORD [12+edi],ebx
+ mov DWORD [8+edi],ecx
+ mov DWORD [4+edi],edx
+ mov DWORD [edi],ebp
+ add esp,84
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
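Both 4-bit GHASH routines above shift the 128-bit accumulator right by one nibble per loop iteration using a chain of shrd instructions, each pulling the next limb's low bits into the vacated top bits. A C sketch of that chain on four 32-bit limbs (limb layout is an assumption for illustration, least-significant limb first):

```c
#include <stdint.h>

/* Shift a 128-bit value, stored as four 32-bit limbs (limb[0] least
   significant), right by 4 bits -- the shrd/shrd/shrd/shr chain run
   once per nibble in the 4-bit GHASH loops above. */
static void shift_right_4(uint32_t limb[4])
{
    limb[0] = (limb[0] >> 4) | (limb[1] << 28);  /* shrd ebx,ecx,4 */
    limb[1] = (limb[1] >> 4) | (limb[2] << 28);  /* shrd ecx,edx,4 */
    limb[2] = (limb[2] >> 4) | (limb[3] << 28);  /* shrd edx,ebp,4 */
    limb[3] >>= 4;                               /* shr  ebp,4     */
}
```

In the assembly, the four bits shifted out are not discarded: they index the 16-entry reduction table built at [16+esp] (the constants 471859200, 943718400, ... loaded in the prologue), which folds them back in as required by the GF(2^128) reduction.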
+global _gcm_gmult_4bit_mmx
+align 16
+_gcm_gmult_4bit_mmx:
+L$_gcm_gmult_4bit_mmx_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov edi,DWORD [20+esp]
+ mov esi,DWORD [24+esp]
+ call L$005pic_point
+L$005pic_point:
+ pop eax
+ lea eax,[(L$rem_4bit-L$005pic_point)+eax]
+ movzx ebx,BYTE [15+edi]
+ xor ecx,ecx
+ mov edx,ebx
+ mov cl,dl
+ mov ebp,14
+ shl cl,4
+ and edx,240
+ movq mm0,[8+ecx*1+esi]
+ movq mm1,[ecx*1+esi]
+ movd ebx,mm0
+ jmp NEAR L$006mmx_loop
+align 16
+L$006mmx_loop:
+ psrlq mm0,4
+ and ebx,15
+ movq mm2,mm1
+ psrlq mm1,4
+ pxor mm0,[8+edx*1+esi]
+ mov cl,BYTE [ebp*1+edi]
+ psllq mm2,60
+ pxor mm1,[ebx*8+eax]
+ dec ebp
+ movd ebx,mm0
+ pxor mm1,[edx*1+esi]
+ mov edx,ecx
+ pxor mm0,mm2
+ js NEAR L$007mmx_break
+ shl cl,4
+ and ebx,15
+ psrlq mm0,4
+ and edx,240
+ movq mm2,mm1
+ psrlq mm1,4
+ pxor mm0,[8+ecx*1+esi]
+ psllq mm2,60
+ pxor mm1,[ebx*8+eax]
+ movd ebx,mm0
+ pxor mm1,[ecx*1+esi]
+ pxor mm0,mm2
+ jmp NEAR L$006mmx_loop
+align 16
+L$007mmx_break:
+ shl cl,4
+ and ebx,15
+ psrlq mm0,4
+ and edx,240
+ movq mm2,mm1
+ psrlq mm1,4
+ pxor mm0,[8+ecx*1+esi]
+ psllq mm2,60
+ pxor mm1,[ebx*8+eax]
+ movd ebx,mm0
+ pxor mm1,[ecx*1+esi]
+ pxor mm0,mm2
+ psrlq mm0,4
+ and ebx,15
+ movq mm2,mm1
+ psrlq mm1,4
+ pxor mm0,[8+edx*1+esi]
+ psllq mm2,60
+ pxor mm1,[ebx*8+eax]
+ movd ebx,mm0
+ pxor mm1,[edx*1+esi]
+ pxor mm0,mm2
+ psrlq mm0,32
+ movd edx,mm1
+ psrlq mm1,32
+ movd ecx,mm0
+ movd ebp,mm1
+ bswap ebx
+ bswap edx
+ bswap ecx
+ bswap ebp
+ emms
+ mov DWORD [12+edi],ebx
+ mov DWORD [4+edi],edx
+ mov DWORD [8+edi],ecx
+ mov DWORD [edi],ebp
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _gcm_ghash_4bit_mmx
+align 16
+_gcm_ghash_4bit_mmx:
+L$_gcm_ghash_4bit_mmx_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov eax,DWORD [20+esp]
+ mov ebx,DWORD [24+esp]
+ mov ecx,DWORD [28+esp]
+ mov edx,DWORD [32+esp]
+ mov ebp,esp
+ call L$008pic_point
+L$008pic_point:
+ pop esi
+ lea esi,[(L$rem_8bit-L$008pic_point)+esi]
+ sub esp,544
+ and esp,-64
+ sub esp,16
+ add edx,ecx
+ mov DWORD [544+esp],eax
+ mov DWORD [552+esp],edx
+ mov DWORD [556+esp],ebp
+ add ebx,128
+ lea edi,[144+esp]
+ lea ebp,[400+esp]
+ mov edx,DWORD [ebx-120]
+ movq mm0,[ebx-120]
+ movq mm3,[ebx-128]
+ shl edx,4
+ mov BYTE [esp],dl
+ mov edx,DWORD [ebx-104]
+ movq mm2,[ebx-104]
+ movq mm5,[ebx-112]
+ movq [edi-128],mm0
+ psrlq mm0,4
+ movq [edi],mm3
+ movq mm7,mm3
+ psrlq mm3,4
+ shl edx,4
+ mov BYTE [1+esp],dl
+ mov edx,DWORD [ebx-88]
+ movq mm1,[ebx-88]
+ psllq mm7,60
+ movq mm4,[ebx-96]
+ por mm0,mm7
+ movq [edi-120],mm2
+ psrlq mm2,4
+ movq [8+edi],mm5
+ movq mm6,mm5
+ movq [ebp-128],mm0
+ psrlq mm5,4
+ movq [ebp],mm3
+ shl edx,4
+ mov BYTE [2+esp],dl
+ mov edx,DWORD [ebx-72]
+ movq mm0,[ebx-72]
+ psllq mm6,60
+ movq mm3,[ebx-80]
+ por mm2,mm6
+ movq [edi-112],mm1
+ psrlq mm1,4
+ movq [16+edi],mm4
+ movq mm7,mm4
+ movq [ebp-120],mm2
+ psrlq mm4,4
+ movq [8+ebp],mm5
+ shl edx,4
+ mov BYTE [3+esp],dl
+ mov edx,DWORD [ebx-56]
+ movq mm2,[ebx-56]
+ psllq mm7,60
+ movq mm5,[ebx-64]
+ por mm1,mm7
+ movq [edi-104],mm0
+ psrlq mm0,4
+ movq [24+edi],mm3
+ movq mm6,mm3
+ movq [ebp-112],mm1
+ psrlq mm3,4
+ movq [16+ebp],mm4
+ shl edx,4
+ mov BYTE [4+esp],dl
+ mov edx,DWORD [ebx-40]
+ movq mm1,[ebx-40]
+ psllq mm6,60
+ movq mm4,[ebx-48]
+ por mm0,mm6
+ movq [edi-96],mm2
+ psrlq mm2,4
+ movq [32+edi],mm5
+ movq mm7,mm5
+ movq [ebp-104],mm0
+ psrlq mm5,4
+ movq [24+ebp],mm3
+ shl edx,4
+ mov BYTE [5+esp],dl
+ mov edx,DWORD [ebx-24]
+ movq mm0,[ebx-24]
+ psllq mm7,60
+ movq mm3,[ebx-32]
+ por mm2,mm7
+ movq [edi-88],mm1
+ psrlq mm1,4
+ movq [40+edi],mm4
+ movq mm6,mm4
+ movq [ebp-96],mm2
+ psrlq mm4,4
+ movq [32+ebp],mm5
+ shl edx,4
+ mov BYTE [6+esp],dl
+ mov edx,DWORD [ebx-8]
+ movq mm2,[ebx-8]
+ psllq mm6,60
+ movq mm5,[ebx-16]
+ por mm1,mm6
+ movq [edi-80],mm0
+ psrlq mm0,4
+ movq [48+edi],mm3
+ movq mm7,mm3
+ movq [ebp-88],mm1
+ psrlq mm3,4
+ movq [40+ebp],mm4
+ shl edx,4
+ mov BYTE [7+esp],dl
+ mov edx,DWORD [8+ebx]
+ movq mm1,[8+ebx]
+ psllq mm7,60
+ movq mm4,[ebx]
+ por mm0,mm7
+ movq [edi-72],mm2
+ psrlq mm2,4
+ movq [56+edi],mm5
+ movq mm6,mm5
+ movq [ebp-80],mm0
+ psrlq mm5,4
+ movq [48+ebp],mm3
+ shl edx,4
+ mov BYTE [8+esp],dl
+ mov edx,DWORD [24+ebx]
+ movq mm0,[24+ebx]
+ psllq mm6,60
+ movq mm3,[16+ebx]
+ por mm2,mm6
+ movq [edi-64],mm1
+ psrlq mm1,4
+ movq [64+edi],mm4
+ movq mm7,mm4
+ movq [ebp-72],mm2
+ psrlq mm4,4
+ movq [56+ebp],mm5
+ shl edx,4
+ mov BYTE [9+esp],dl
+ mov edx,DWORD [40+ebx]
+ movq mm2,[40+ebx]
+ psllq mm7,60
+ movq mm5,[32+ebx]
+ por mm1,mm7
+ movq [edi-56],mm0
+ psrlq mm0,4
+ movq [72+edi],mm3
+ movq mm6,mm3
+ movq [ebp-64],mm1
+ psrlq mm3,4
+ movq [64+ebp],mm4
+ shl edx,4
+ mov BYTE [10+esp],dl
+ mov edx,DWORD [56+ebx]
+ movq mm1,[56+ebx]
+ psllq mm6,60
+ movq mm4,[48+ebx]
+ por mm0,mm6
+ movq [edi-48],mm2
+ psrlq mm2,4
+ movq [80+edi],mm5
+ movq mm7,mm5
+ movq [ebp-56],mm0
+ psrlq mm5,4
+ movq [72+ebp],mm3
+ shl edx,4
+ mov BYTE [11+esp],dl
+ mov edx,DWORD [72+ebx]
+ movq mm0,[72+ebx]
+ psllq mm7,60
+ movq mm3,[64+ebx]
+ por mm2,mm7
+ movq [edi-40],mm1
+ psrlq mm1,4
+ movq [88+edi],mm4
+ movq mm6,mm4
+ movq [ebp-48],mm2
+ psrlq mm4,4
+ movq [80+ebp],mm5
+ shl edx,4
+ mov BYTE [12+esp],dl
+ mov edx,DWORD [88+ebx]
+ movq mm2,[88+ebx]
+ psllq mm6,60
+ movq mm5,[80+ebx]
+ por mm1,mm6
+ movq [edi-32],mm0
+ psrlq mm0,4
+ movq [96+edi],mm3
+ movq mm7,mm3
+ movq [ebp-40],mm1
+ psrlq mm3,4
+ movq [88+ebp],mm4
+ shl edx,4
+ mov BYTE [13+esp],dl
+ mov edx,DWORD [104+ebx]
+ movq mm1,[104+ebx]
+ psllq mm7,60
+ movq mm4,[96+ebx]
+ por mm0,mm7
+ movq [edi-24],mm2
+ psrlq mm2,4
+ movq [104+edi],mm5
+ movq mm6,mm5
+ movq [ebp-32],mm0
+ psrlq mm5,4
+ movq [96+ebp],mm3
+ shl edx,4
+ mov BYTE [14+esp],dl
+ mov edx,DWORD [120+ebx]
+ movq mm0,[120+ebx]
+ psllq mm6,60
+ movq mm3,[112+ebx]
+ por mm2,mm6
+ movq [edi-16],mm1
+ psrlq mm1,4
+ movq [112+edi],mm4
+ movq mm7,mm4
+ movq [ebp-24],mm2
+ psrlq mm4,4
+ movq [104+ebp],mm5
+ shl edx,4
+ mov BYTE [15+esp],dl
+ psllq mm7,60
+ por mm1,mm7
+ movq [edi-8],mm0
+ psrlq mm0,4
+ movq [120+edi],mm3
+ movq mm6,mm3
+ movq [ebp-16],mm1
+ psrlq mm3,4
+ movq [112+ebp],mm4
+ psllq mm6,60
+ por mm0,mm6
+ movq [ebp-8],mm0
+ movq [120+ebp],mm3
+ movq mm6,[eax]
+ mov ebx,DWORD [8+eax]
+ mov edx,DWORD [12+eax]
+align 16
+L$009outer:
+ xor edx,DWORD [12+ecx]
+ xor ebx,DWORD [8+ecx]
+ pxor mm6,[ecx]
+ lea ecx,[16+ecx]
+ mov DWORD [536+esp],ebx
+ movq [528+esp],mm6
+ mov DWORD [548+esp],ecx
+ xor eax,eax
+ rol edx,8
+ mov al,dl
+ mov ebp,eax
+ and al,15
+ shr ebp,4
+ pxor mm0,mm0
+ rol edx,8
+ pxor mm1,mm1
+ pxor mm2,mm2
+ movq mm7,[16+eax*8+esp]
+ movq mm6,[144+eax*8+esp]
+ mov al,dl
+ movd ebx,mm7
+ psrlq mm7,8
+ movq mm3,mm6
+ mov edi,eax
+ psrlq mm6,8
+ pxor mm7,[272+ebp*8+esp]
+ and al,15
+ psllq mm3,56
+ shr edi,4
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+ebp*8+esp]
+ xor bl,BYTE [ebp*1+esp]
+ mov al,dl
+ movd ecx,mm7
+ movzx ebx,bl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov ebp,eax
+ psrlq mm6,8
+ pxor mm7,[272+edi*8+esp]
+ and al,15
+ psllq mm3,56
+ shr ebp,4
+ pinsrw mm2,WORD [ebx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+edi*8+esp]
+ xor cl,BYTE [edi*1+esp]
+ mov al,dl
+ mov edx,DWORD [536+esp]
+ movd ebx,mm7
+ movzx ecx,cl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov edi,eax
+ psrlq mm6,8
+ pxor mm7,[272+ebp*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm2
+ shr edi,4
+ pinsrw mm1,WORD [ecx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+ebp*8+esp]
+ xor bl,BYTE [ebp*1+esp]
+ mov al,dl
+ movd ecx,mm7
+ movzx ebx,bl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov ebp,eax
+ psrlq mm6,8
+ pxor mm7,[272+edi*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm1
+ shr ebp,4
+ pinsrw mm0,WORD [ebx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+edi*8+esp]
+ xor cl,BYTE [edi*1+esp]
+ mov al,dl
+ movd ebx,mm7
+ movzx ecx,cl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov edi,eax
+ psrlq mm6,8
+ pxor mm7,[272+ebp*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm0
+ shr edi,4
+ pinsrw mm2,WORD [ecx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+ebp*8+esp]
+ xor bl,BYTE [ebp*1+esp]
+ mov al,dl
+ movd ecx,mm7
+ movzx ebx,bl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov ebp,eax
+ psrlq mm6,8
+ pxor mm7,[272+edi*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm2
+ shr ebp,4
+ pinsrw mm1,WORD [ebx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+edi*8+esp]
+ xor cl,BYTE [edi*1+esp]
+ mov al,dl
+ mov edx,DWORD [532+esp]
+ movd ebx,mm7
+ movzx ecx,cl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov edi,eax
+ psrlq mm6,8
+ pxor mm7,[272+ebp*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm1
+ shr edi,4
+ pinsrw mm0,WORD [ecx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+ebp*8+esp]
+ xor bl,BYTE [ebp*1+esp]
+ mov al,dl
+ movd ecx,mm7
+ movzx ebx,bl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov ebp,eax
+ psrlq mm6,8
+ pxor mm7,[272+edi*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm0
+ shr ebp,4
+ pinsrw mm2,WORD [ebx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+edi*8+esp]
+ xor cl,BYTE [edi*1+esp]
+ mov al,dl
+ movd ebx,mm7
+ movzx ecx,cl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov edi,eax
+ psrlq mm6,8
+ pxor mm7,[272+ebp*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm2
+ shr edi,4
+ pinsrw mm1,WORD [ecx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+ebp*8+esp]
+ xor bl,BYTE [ebp*1+esp]
+ mov al,dl
+ movd ecx,mm7
+ movzx ebx,bl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov ebp,eax
+ psrlq mm6,8
+ pxor mm7,[272+edi*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm1
+ shr ebp,4
+ pinsrw mm0,WORD [ebx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+edi*8+esp]
+ xor cl,BYTE [edi*1+esp]
+ mov al,dl
+ mov edx,DWORD [528+esp]
+ movd ebx,mm7
+ movzx ecx,cl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov edi,eax
+ psrlq mm6,8
+ pxor mm7,[272+ebp*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm0
+ shr edi,4
+ pinsrw mm2,WORD [ecx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+ebp*8+esp]
+ xor bl,BYTE [ebp*1+esp]
+ mov al,dl
+ movd ecx,mm7
+ movzx ebx,bl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov ebp,eax
+ psrlq mm6,8
+ pxor mm7,[272+edi*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm2
+ shr ebp,4
+ pinsrw mm1,WORD [ebx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+edi*8+esp]
+ xor cl,BYTE [edi*1+esp]
+ mov al,dl
+ movd ebx,mm7
+ movzx ecx,cl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov edi,eax
+ psrlq mm6,8
+ pxor mm7,[272+ebp*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm1
+ shr edi,4
+ pinsrw mm0,WORD [ecx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+ebp*8+esp]
+ xor bl,BYTE [ebp*1+esp]
+ mov al,dl
+ movd ecx,mm7
+ movzx ebx,bl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov ebp,eax
+ psrlq mm6,8
+ pxor mm7,[272+edi*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm0
+ shr ebp,4
+ pinsrw mm2,WORD [ebx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ rol edx,8
+ pxor mm6,[144+eax*8+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+edi*8+esp]
+ xor cl,BYTE [edi*1+esp]
+ mov al,dl
+ mov edx,DWORD [524+esp]
+ movd ebx,mm7
+ movzx ecx,cl
+ psrlq mm7,8
+ movq mm3,mm6
+ mov edi,eax
+ psrlq mm6,8
+ pxor mm7,[272+ebp*8+esp]
+ and al,15
+ psllq mm3,56
+ pxor mm6,mm2
+ shr edi,4
+ pinsrw mm1,WORD [ecx*2+esi],2
+ pxor mm7,[16+eax*8+esp]
+ pxor mm6,[144+eax*8+esp]
+ xor bl,BYTE [ebp*1+esp]
+ pxor mm7,mm3
+ pxor mm6,[400+ebp*8+esp]
+ movzx ebx,bl
+ pxor mm2,mm2
+ psllq mm1,4
+ movd ecx,mm7
+ psrlq mm7,4
+ movq mm3,mm6
+ psrlq mm6,4
+ shl ecx,4
+ pxor mm7,[16+edi*8+esp]
+ psllq mm3,60
+ movzx ecx,cl
+ pxor mm7,mm3
+ pxor mm6,[144+edi*8+esp]
+ pinsrw mm0,WORD [ebx*2+esi],2
+ pxor mm6,mm1
+ movd edx,mm7
+ pinsrw mm2,WORD [ecx*2+esi],3
+ psllq mm0,12
+ pxor mm6,mm0
+ psrlq mm7,32
+ pxor mm6,mm2
+ mov ecx,DWORD [548+esp]
+ movd ebx,mm7
+ movq mm3,mm6
+ psllw mm6,8
+ psrlw mm3,8
+ por mm6,mm3
+ bswap edx
+ pshufw mm6,mm6,27
+ bswap ebx
+ cmp ecx,DWORD [552+esp]
+ jne NEAR L$009outer
+ mov eax,DWORD [544+esp]
+ mov DWORD [12+eax],edx
+ mov DWORD [8+eax],ebx
+ movq [eax],mm6
+ mov esp,DWORD [556+esp]
+ emms
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _gcm_init_clmul
+align 16
+_gcm_init_clmul:
+L$_gcm_init_clmul_begin:
+ mov edx,DWORD [4+esp]
+ mov eax,DWORD [8+esp]
+ call L$010pic
+L$010pic:
+ pop ecx
+ lea ecx,[(L$bswap-L$010pic)+ecx]
+ movdqu xmm2,[eax]
+ pshufd xmm2,xmm2,78
+ pshufd xmm4,xmm2,255
+ movdqa xmm3,xmm2
+ psllq xmm2,1
+ pxor xmm5,xmm5
+ psrlq xmm3,63
+ pcmpgtd xmm5,xmm4
+ pslldq xmm3,8
+ por xmm2,xmm3
+ pand xmm5,[16+ecx]
+ pxor xmm2,xmm5
+ movdqa xmm0,xmm2
+ movdqa xmm1,xmm0
+ pshufd xmm3,xmm0,78
+ pshufd xmm4,xmm2,78
+ pxor xmm3,xmm0
+ pxor xmm4,xmm2
+db 102,15,58,68,194,0
+db 102,15,58,68,202,17
+db 102,15,58,68,220,0
+ xorps xmm3,xmm0
+ xorps xmm3,xmm1
+ movdqa xmm4,xmm3
+ psrldq xmm3,8
+ pslldq xmm4,8
+ pxor xmm1,xmm3
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+ pshufd xmm3,xmm2,78
+ pshufd xmm4,xmm0,78
+ pxor xmm3,xmm2
+ movdqu [edx],xmm2
+ pxor xmm4,xmm0
+ movdqu [16+edx],xmm0
+db 102,15,58,15,227,8
+ movdqu [32+edx],xmm4
+ ret
+global _gcm_gmult_clmul
+align 16
+_gcm_gmult_clmul:
+L$_gcm_gmult_clmul_begin:
+ mov eax,DWORD [4+esp]
+ mov edx,DWORD [8+esp]
+ call L$011pic
+L$011pic:
+ pop ecx
+ lea ecx,[(L$bswap-L$011pic)+ecx]
+ movdqu xmm0,[eax]
+ movdqa xmm5,[ecx]
+ movups xmm2,[edx]
+db 102,15,56,0,197
+ movups xmm4,[32+edx]
+ movdqa xmm1,xmm0
+ pshufd xmm3,xmm0,78
+ pxor xmm3,xmm0
+db 102,15,58,68,194,0
+db 102,15,58,68,202,17
+db 102,15,58,68,220,0
+ xorps xmm3,xmm0
+ xorps xmm3,xmm1
+ movdqa xmm4,xmm3
+ psrldq xmm3,8
+ pslldq xmm4,8
+ pxor xmm1,xmm3
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+db 102,15,56,0,197
+ movdqu [eax],xmm0
+ ret
+global _gcm_ghash_clmul
+align 16
+_gcm_ghash_clmul:
+L$_gcm_ghash_clmul_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov eax,DWORD [20+esp]
+ mov edx,DWORD [24+esp]
+ mov esi,DWORD [28+esp]
+ mov ebx,DWORD [32+esp]
+ call L$012pic
+L$012pic:
+ pop ecx
+ lea ecx,[(L$bswap-L$012pic)+ecx]
+ movdqu xmm0,[eax]
+ movdqa xmm5,[ecx]
+ movdqu xmm2,[edx]
+db 102,15,56,0,197
+ sub ebx,16
+ jz NEAR L$013odd_tail
+ movdqu xmm3,[esi]
+ movdqu xmm6,[16+esi]
+db 102,15,56,0,221
+db 102,15,56,0,245
+ movdqu xmm5,[32+edx]
+ pxor xmm0,xmm3
+ pshufd xmm3,xmm6,78
+ movdqa xmm7,xmm6
+ pxor xmm3,xmm6
+ lea esi,[32+esi]
+db 102,15,58,68,242,0
+db 102,15,58,68,250,17
+db 102,15,58,68,221,0
+ movups xmm2,[16+edx]
+ nop
+ sub ebx,32
+ jbe NEAR L$014even_tail
+ jmp NEAR L$015mod_loop
+align 32
+L$015mod_loop:
+ pshufd xmm4,xmm0,78
+ movdqa xmm1,xmm0
+ pxor xmm4,xmm0
+ nop
+db 102,15,58,68,194,0
+db 102,15,58,68,202,17
+db 102,15,58,68,229,16
+ movups xmm2,[edx]
+ xorps xmm0,xmm6
+ movdqa xmm5,[ecx]
+ xorps xmm1,xmm7
+ movdqu xmm7,[esi]
+ pxor xmm3,xmm0
+ movdqu xmm6,[16+esi]
+ pxor xmm3,xmm1
+db 102,15,56,0,253
+ pxor xmm4,xmm3
+ movdqa xmm3,xmm4
+ psrldq xmm4,8
+ pslldq xmm3,8
+ pxor xmm1,xmm4
+ pxor xmm0,xmm3
+db 102,15,56,0,245
+ pxor xmm1,xmm7
+ movdqa xmm7,xmm6
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+db 102,15,58,68,242,0
+ movups xmm5,[32+edx]
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+ pshufd xmm3,xmm7,78
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm3,xmm7
+ pxor xmm1,xmm4
+db 102,15,58,68,250,17
+ movups xmm2,[16+edx]
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+db 102,15,58,68,221,0
+ lea esi,[32+esi]
+ sub ebx,32
+ ja NEAR L$015mod_loop
+L$014even_tail:
+ pshufd xmm4,xmm0,78
+ movdqa xmm1,xmm0
+ pxor xmm4,xmm0
+db 102,15,58,68,194,0
+db 102,15,58,68,202,17
+db 102,15,58,68,229,16
+ movdqa xmm5,[ecx]
+ xorps xmm0,xmm6
+ xorps xmm1,xmm7
+ pxor xmm3,xmm0
+ pxor xmm3,xmm1
+ pxor xmm4,xmm3
+ movdqa xmm3,xmm4
+ psrldq xmm4,8
+ pslldq xmm3,8
+ pxor xmm1,xmm4
+ pxor xmm0,xmm3
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+ test ebx,ebx
+ jnz NEAR L$016done
+ movups xmm2,[edx]
+L$013odd_tail:
+ movdqu xmm3,[esi]
+db 102,15,56,0,221
+ pxor xmm0,xmm3
+ movdqa xmm1,xmm0
+ pshufd xmm3,xmm0,78
+ pshufd xmm4,xmm2,78
+ pxor xmm3,xmm0
+ pxor xmm4,xmm2
+db 102,15,58,68,194,0
+db 102,15,58,68,202,17
+db 102,15,58,68,220,0
+ xorps xmm3,xmm0
+ xorps xmm3,xmm1
+ movdqa xmm4,xmm3
+ psrldq xmm3,8
+ pslldq xmm4,8
+ pxor xmm1,xmm3
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+L$016done:
+db 102,15,56,0,197
+ movdqu [eax],xmm0
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 64
+L$bswap:
+db 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+db 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,194
+align 64
+L$rem_8bit:
+dw 0,450,900,582,1800,1738,1164,1358
+dw 3600,4050,3476,3158,2328,2266,2716,2910
+dw 7200,7650,8100,7782,6952,6890,6316,6510
+dw 4656,5106,4532,4214,5432,5370,5820,6014
+dw 14400,14722,15300,14854,16200,16010,15564,15630
+dw 13904,14226,13780,13334,12632,12442,13020,13086
+dw 9312,9634,10212,9766,9064,8874,8428,8494
+dw 10864,11186,10740,10294,11640,11450,12028,12094
+dw 28800,28994,29444,29382,30600,30282,29708,30158
+dw 32400,32594,32020,31958,31128,30810,31260,31710
+dw 27808,28002,28452,28390,27560,27242,26668,27118
+dw 25264,25458,24884,24822,26040,25722,26172,26622
+dw 18624,18690,19268,19078,20424,19978,19532,19854
+dw 18128,18194,17748,17558,16856,16410,16988,17310
+dw 21728,21794,22372,22182,21480,21034,20588,20910
+dw 23280,23346,22900,22710,24056,23610,24188,24510
+dw 57600,57538,57988,58182,58888,59338,58764,58446
+dw 61200,61138,60564,60758,59416,59866,60316,59998
+dw 64800,64738,65188,65382,64040,64490,63916,63598
+dw 62256,62194,61620,61814,62520,62970,63420,63102
+dw 55616,55426,56004,56070,56904,57226,56780,56334
+dw 55120,54930,54484,54550,53336,53658,54236,53790
+dw 50528,50338,50916,50982,49768,50090,49644,49198
+dw 52080,51890,51444,51510,52344,52666,53244,52798
+dw 37248,36930,37380,37830,38536,38730,38156,38094
+dw 40848,40530,39956,40406,39064,39258,39708,39646
+dw 36256,35938,36388,36838,35496,35690,35116,35054
+dw 33712,33394,32820,33270,33976,34170,34620,34558
+dw 43456,43010,43588,43910,44744,44810,44364,44174
+dw 42960,42514,42068,42390,41176,41242,41820,41630
+dw 46560,46114,46692,47014,45800,45866,45420,45230
+dw 48112,47666,47220,47542,48376,48442,49020,48830
+align 64
+L$rem_4bit:
+dd 0,0,0,471859200,0,943718400,0,610271232
+dd 0,1887436800,0,1822425088,0,1220542464,0,1423966208
+dd 0,3774873600,0,4246732800,0,3644850176,0,3311403008
+dd 0,2441084928,0,2376073216,0,2847932416,0,3051356160
+db 71,72,65,83,72,32,102,111,114,32,120,56,54,44,32,67
+db 82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112
+db 112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62
+db 0
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm
new file mode 100644
index 0000000000..e78222ee9d
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm
@@ -0,0 +1,381 @@
+; Copyright 1998-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+;extern _OPENSSL_ia32cap_P
+global _RC4
+align 16
+_RC4:
+L$_RC4_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov edi,DWORD [20+esp]
+ mov edx,DWORD [24+esp]
+ mov esi,DWORD [28+esp]
+ mov ebp,DWORD [32+esp]
+ xor eax,eax
+ xor ebx,ebx
+ cmp edx,0
+ je NEAR L$000abort
+ mov al,BYTE [edi]
+ mov bl,BYTE [4+edi]
+ add edi,8
+ lea ecx,[edx*1+esi]
+ sub ebp,esi
+ mov DWORD [24+esp],ecx
+ inc al
+ cmp DWORD [256+edi],-1
+ je NEAR L$001RC4_CHAR
+ mov ecx,DWORD [eax*4+edi]
+ and edx,-4
+ jz NEAR L$002loop1
+ mov DWORD [32+esp],ebp
+ test edx,-8
+ jz NEAR L$003go4loop4
+ lea ebp,[_OPENSSL_ia32cap_P]
+ bt DWORD [ebp],26
+ jnc NEAR L$003go4loop4
+ mov ebp,DWORD [32+esp]
+ and edx,-8
+ lea edx,[edx*1+esi-8]
+ mov DWORD [edi-4],edx
+ add bl,cl
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ inc eax
+ add edx,ecx
+ movzx eax,al
+ movzx edx,dl
+ movq mm0,[esi]
+ mov ecx,DWORD [eax*4+edi]
+ movd mm2,DWORD [edx*4+edi]
+ jmp NEAR L$004loop_mmx_enter
+align 16
+L$005loop_mmx:
+ add bl,cl
+ psllq mm1,56
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ inc eax
+ add edx,ecx
+ movzx eax,al
+ movzx edx,dl
+ pxor mm2,mm1
+ movq mm0,[esi]
+ movq [esi*1+ebp-8],mm2
+ mov ecx,DWORD [eax*4+edi]
+ movd mm2,DWORD [edx*4+edi]
+L$004loop_mmx_enter:
+ add bl,cl
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ inc eax
+ add edx,ecx
+ movzx eax,al
+ movzx edx,dl
+ pxor mm2,mm0
+ mov ecx,DWORD [eax*4+edi]
+ movd mm1,DWORD [edx*4+edi]
+ add bl,cl
+ psllq mm1,8
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ inc eax
+ add edx,ecx
+ movzx eax,al
+ movzx edx,dl
+ pxor mm2,mm1
+ mov ecx,DWORD [eax*4+edi]
+ movd mm1,DWORD [edx*4+edi]
+ add bl,cl
+ psllq mm1,16
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ inc eax
+ add edx,ecx
+ movzx eax,al
+ movzx edx,dl
+ pxor mm2,mm1
+ mov ecx,DWORD [eax*4+edi]
+ movd mm1,DWORD [edx*4+edi]
+ add bl,cl
+ psllq mm1,24
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ inc eax
+ add edx,ecx
+ movzx eax,al
+ movzx edx,dl
+ pxor mm2,mm1
+ mov ecx,DWORD [eax*4+edi]
+ movd mm1,DWORD [edx*4+edi]
+ add bl,cl
+ psllq mm1,32
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ inc eax
+ add edx,ecx
+ movzx eax,al
+ movzx edx,dl
+ pxor mm2,mm1
+ mov ecx,DWORD [eax*4+edi]
+ movd mm1,DWORD [edx*4+edi]
+ add bl,cl
+ psllq mm1,40
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ inc eax
+ add edx,ecx
+ movzx eax,al
+ movzx edx,dl
+ pxor mm2,mm1
+ mov ecx,DWORD [eax*4+edi]
+ movd mm1,DWORD [edx*4+edi]
+ add bl,cl
+ psllq mm1,48
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ inc eax
+ add edx,ecx
+ movzx eax,al
+ movzx edx,dl
+ pxor mm2,mm1
+ mov ecx,DWORD [eax*4+edi]
+ movd mm1,DWORD [edx*4+edi]
+ mov edx,ebx
+ xor ebx,ebx
+ mov bl,dl
+ cmp esi,DWORD [edi-4]
+ lea esi,[8+esi]
+ jb NEAR L$005loop_mmx
+ psllq mm1,56
+ pxor mm2,mm1
+ movq [esi*1+ebp-8],mm2
+ emms
+ cmp esi,DWORD [24+esp]
+ je NEAR L$006done
+ jmp NEAR L$002loop1
+align 16
+L$003go4loop4:
+ lea edx,[edx*1+esi-4]
+ mov DWORD [28+esp],edx
+L$007loop4:
+ add bl,cl
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ add edx,ecx
+ inc al
+ and edx,255
+ mov ecx,DWORD [eax*4+edi]
+ mov ebp,DWORD [edx*4+edi]
+ add bl,cl
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ add edx,ecx
+ inc al
+ and edx,255
+ ror ebp,8
+ mov ecx,DWORD [eax*4+edi]
+ or ebp,DWORD [edx*4+edi]
+ add bl,cl
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ add edx,ecx
+ inc al
+ and edx,255
+ ror ebp,8
+ mov ecx,DWORD [eax*4+edi]
+ or ebp,DWORD [edx*4+edi]
+ add bl,cl
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ add edx,ecx
+ inc al
+ and edx,255
+ ror ebp,8
+ mov ecx,DWORD [32+esp]
+ or ebp,DWORD [edx*4+edi]
+ ror ebp,8
+ xor ebp,DWORD [esi]
+ cmp esi,DWORD [28+esp]
+ mov DWORD [esi*1+ecx],ebp
+ lea esi,[4+esi]
+ mov ecx,DWORD [eax*4+edi]
+ jb NEAR L$007loop4
+ cmp esi,DWORD [24+esp]
+ je NEAR L$006done
+ mov ebp,DWORD [32+esp]
+align 16
+L$002loop1:
+ add bl,cl
+ mov edx,DWORD [ebx*4+edi]
+ mov DWORD [ebx*4+edi],ecx
+ mov DWORD [eax*4+edi],edx
+ add edx,ecx
+ inc al
+ and edx,255
+ mov edx,DWORD [edx*4+edi]
+ xor dl,BYTE [esi]
+ lea esi,[1+esi]
+ mov ecx,DWORD [eax*4+edi]
+ cmp esi,DWORD [24+esp]
+ mov BYTE [esi*1+ebp-1],dl
+ jb NEAR L$002loop1
+ jmp NEAR L$006done
+align 16
+L$001RC4_CHAR:
+ movzx ecx,BYTE [eax*1+edi]
+L$008cloop1:
+ add bl,cl
+ movzx edx,BYTE [ebx*1+edi]
+ mov BYTE [ebx*1+edi],cl
+ mov BYTE [eax*1+edi],dl
+ add dl,cl
+ movzx edx,BYTE [edx*1+edi]
+ add al,1
+ xor dl,BYTE [esi]
+ lea esi,[1+esi]
+ movzx ecx,BYTE [eax*1+edi]
+ cmp esi,DWORD [24+esp]
+ mov BYTE [esi*1+ebp-1],dl
+ jb NEAR L$008cloop1
+L$006done:
+ dec al
+ mov DWORD [edi-4],ebx
+ mov BYTE [edi-8],al
+L$000abort:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _RC4_set_key
+align 16
+_RC4_set_key:
+L$_RC4_set_key_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov edi,DWORD [20+esp]
+ mov ebp,DWORD [24+esp]
+ mov esi,DWORD [28+esp]
+ lea edx,[_OPENSSL_ia32cap_P]
+ lea edi,[8+edi]
+ lea esi,[ebp*1+esi]
+ neg ebp
+ xor eax,eax
+ mov DWORD [edi-4],ebp
+ bt DWORD [edx],20
+ jc NEAR L$009c1stloop
+align 16
+L$010w1stloop:
+ mov DWORD [eax*4+edi],eax
+ add al,1
+ jnc NEAR L$010w1stloop
+ xor ecx,ecx
+ xor edx,edx
+align 16
+L$011w2ndloop:
+ mov eax,DWORD [ecx*4+edi]
+ add dl,BYTE [ebp*1+esi]
+ add dl,al
+ add ebp,1
+ mov ebx,DWORD [edx*4+edi]
+ jnz NEAR L$012wnowrap
+ mov ebp,DWORD [edi-4]
+L$012wnowrap:
+ mov DWORD [edx*4+edi],eax
+ mov DWORD [ecx*4+edi],ebx
+ add cl,1
+ jnc NEAR L$011w2ndloop
+ jmp NEAR L$013exit
+align 16
+L$009c1stloop:
+ mov BYTE [eax*1+edi],al
+ add al,1
+ jnc NEAR L$009c1stloop
+ xor ecx,ecx
+ xor edx,edx
+ xor ebx,ebx
+align 16
+L$014c2ndloop:
+ mov al,BYTE [ecx*1+edi]
+ add dl,BYTE [ebp*1+esi]
+ add dl,al
+ add ebp,1
+ mov bl,BYTE [edx*1+edi]
+ jnz NEAR L$015cnowrap
+ mov ebp,DWORD [edi-4]
+L$015cnowrap:
+ mov BYTE [edx*1+edi],al
+ mov BYTE [ecx*1+edi],bl
+ add cl,1
+ jnc NEAR L$014c2ndloop
+ mov DWORD [256+edi],-1
+L$013exit:
+ xor eax,eax
+ mov DWORD [edi-8],eax
+ mov DWORD [edi-4],eax
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _RC4_options
+align 16
+_RC4_options:
+L$_RC4_options_begin:
+ call L$016pic_point
+L$016pic_point:
+ pop eax
+ lea eax,[(L$017opts-L$016pic_point)+eax]
+ lea edx,[_OPENSSL_ia32cap_P]
+ mov edx,DWORD [edx]
+ bt edx,20
+ jc NEAR L$0181xchar
+ bt edx,26
+ jnc NEAR L$019ret
+ add eax,25
+ ret
+L$0181xchar:
+ add eax,12
+L$019ret:
+ ret
+align 64
+L$017opts:
+db 114,99,52,40,52,120,44,105,110,116,41,0
+db 114,99,52,40,49,120,44,99,104,97,114,41,0
+db 114,99,52,40,56,120,44,109,109,120,41,0
+db 82,67,52,32,102,111,114,32,120,56,54,44,32,67,82,89
+db 80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114
+db 111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
+align 64
+segment .bss
+common _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm
new file mode 100644
index 0000000000..4a893333d8
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm
@@ -0,0 +1,3977 @@
+; Copyright 1998-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+;extern _OPENSSL_ia32cap_P
+global _sha1_block_data_order
+align 16
+_sha1_block_data_order:
+L$_sha1_block_data_order_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ call L$000pic_point
+L$000pic_point:
+ pop ebp
+ lea esi,[_OPENSSL_ia32cap_P]
+ lea ebp,[(L$K_XX_XX-L$000pic_point)+ebp]
+ mov eax,DWORD [esi]
+ mov edx,DWORD [4+esi]
+ test edx,512
+ jz NEAR L$001x86
+ mov ecx,DWORD [8+esi]
+ test eax,16777216
+ jz NEAR L$001x86
+ test ecx,536870912
+ jnz NEAR L$shaext_shortcut
+ and edx,268435456
+ and eax,1073741824
+ or eax,edx
+ cmp eax,1342177280
+ je NEAR L$avx_shortcut
+ jmp NEAR L$ssse3_shortcut
+align 16
+L$001x86:
+ mov ebp,DWORD [20+esp]
+ mov esi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ sub esp,76
+ shl eax,6
+ add eax,esi
+ mov DWORD [104+esp],eax
+ mov edi,DWORD [16+ebp]
+ jmp NEAR L$002loop
+align 16
+L$002loop:
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov ecx,DWORD [8+esi]
+ mov edx,DWORD [12+esi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ mov DWORD [esp],eax
+ mov DWORD [4+esp],ebx
+ mov DWORD [8+esp],ecx
+ mov DWORD [12+esp],edx
+ mov eax,DWORD [16+esi]
+ mov ebx,DWORD [20+esi]
+ mov ecx,DWORD [24+esi]
+ mov edx,DWORD [28+esi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ mov DWORD [16+esp],eax
+ mov DWORD [20+esp],ebx
+ mov DWORD [24+esp],ecx
+ mov DWORD [28+esp],edx
+ mov eax,DWORD [32+esi]
+ mov ebx,DWORD [36+esi]
+ mov ecx,DWORD [40+esi]
+ mov edx,DWORD [44+esi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ mov DWORD [32+esp],eax
+ mov DWORD [36+esp],ebx
+ mov DWORD [40+esp],ecx
+ mov DWORD [44+esp],edx
+ mov eax,DWORD [48+esi]
+ mov ebx,DWORD [52+esi]
+ mov ecx,DWORD [56+esi]
+ mov edx,DWORD [60+esi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ mov DWORD [48+esp],eax
+ mov DWORD [52+esp],ebx
+ mov DWORD [56+esp],ecx
+ mov DWORD [60+esp],edx
+ mov DWORD [100+esp],esi
+ mov eax,DWORD [ebp]
+ mov ebx,DWORD [4+ebp]
+ mov ecx,DWORD [8+ebp]
+ mov edx,DWORD [12+ebp]
+ ; 00_15 0
+ mov esi,ecx
+ mov ebp,eax
+ rol ebp,5
+ xor esi,edx
+ add ebp,edi
+ mov edi,DWORD [esp]
+ and esi,ebx
+ ror ebx,2
+ xor esi,edx
+ lea ebp,[1518500249+edi*1+ebp]
+ add ebp,esi
+ ; 00_15 1
+ mov edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ xor edi,ecx
+ add ebp,edx
+ mov edx,DWORD [4+esp]
+ and edi,eax
+ ror eax,2
+ xor edi,ecx
+ lea ebp,[1518500249+edx*1+ebp]
+ add ebp,edi
+ ; 00_15 2
+ mov edx,eax
+ mov edi,ebp
+ rol ebp,5
+ xor edx,ebx
+ add ebp,ecx
+ mov ecx,DWORD [8+esp]
+ and edx,esi
+ ror esi,2
+ xor edx,ebx
+ lea ebp,[1518500249+ecx*1+ebp]
+ add ebp,edx
+ ; 00_15 3
+ mov ecx,esi
+ mov edx,ebp
+ rol ebp,5
+ xor ecx,eax
+ add ebp,ebx
+ mov ebx,DWORD [12+esp]
+ and ecx,edi
+ ror edi,2
+ xor ecx,eax
+ lea ebp,[1518500249+ebx*1+ebp]
+ add ebp,ecx
+ ; 00_15 4
+ mov ebx,edi
+ mov ecx,ebp
+ rol ebp,5
+ xor ebx,esi
+ add ebp,eax
+ mov eax,DWORD [16+esp]
+ and ebx,edx
+ ror edx,2
+ xor ebx,esi
+ lea ebp,[1518500249+eax*1+ebp]
+ add ebp,ebx
+ ; 00_15 5
+ mov eax,edx
+ mov ebx,ebp
+ rol ebp,5
+ xor eax,edi
+ add ebp,esi
+ mov esi,DWORD [20+esp]
+ and eax,ecx
+ ror ecx,2
+ xor eax,edi
+ lea ebp,[1518500249+esi*1+ebp]
+ add ebp,eax
+ ; 00_15 6
+ mov esi,ecx
+ mov eax,ebp
+ rol ebp,5
+ xor esi,edx
+ add ebp,edi
+ mov edi,DWORD [24+esp]
+ and esi,ebx
+ ror ebx,2
+ xor esi,edx
+ lea ebp,[1518500249+edi*1+ebp]
+ add ebp,esi
+ ; 00_15 7
+ mov edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ xor edi,ecx
+ add ebp,edx
+ mov edx,DWORD [28+esp]
+ and edi,eax
+ ror eax,2
+ xor edi,ecx
+ lea ebp,[1518500249+edx*1+ebp]
+ add ebp,edi
+ ; 00_15 8
+ mov edx,eax
+ mov edi,ebp
+ rol ebp,5
+ xor edx,ebx
+ add ebp,ecx
+ mov ecx,DWORD [32+esp]
+ and edx,esi
+ ror esi,2
+ xor edx,ebx
+ lea ebp,[1518500249+ecx*1+ebp]
+ add ebp,edx
+ ; 00_15 9
+ mov ecx,esi
+ mov edx,ebp
+ rol ebp,5
+ xor ecx,eax
+ add ebp,ebx
+ mov ebx,DWORD [36+esp]
+ and ecx,edi
+ ror edi,2
+ xor ecx,eax
+ lea ebp,[1518500249+ebx*1+ebp]
+ add ebp,ecx
+ ; 00_15 10
+ mov ebx,edi
+ mov ecx,ebp
+ rol ebp,5
+ xor ebx,esi
+ add ebp,eax
+ mov eax,DWORD [40+esp]
+ and ebx,edx
+ ror edx,2
+ xor ebx,esi
+ lea ebp,[1518500249+eax*1+ebp]
+ add ebp,ebx
+ ; 00_15 11
+ mov eax,edx
+ mov ebx,ebp
+ rol ebp,5
+ xor eax,edi
+ add ebp,esi
+ mov esi,DWORD [44+esp]
+ and eax,ecx
+ ror ecx,2
+ xor eax,edi
+ lea ebp,[1518500249+esi*1+ebp]
+ add ebp,eax
+ ; 00_15 12
+ mov esi,ecx
+ mov eax,ebp
+ rol ebp,5
+ xor esi,edx
+ add ebp,edi
+ mov edi,DWORD [48+esp]
+ and esi,ebx
+ ror ebx,2
+ xor esi,edx
+ lea ebp,[1518500249+edi*1+ebp]
+ add ebp,esi
+ ; 00_15 13
+ mov edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ xor edi,ecx
+ add ebp,edx
+ mov edx,DWORD [52+esp]
+ and edi,eax
+ ror eax,2
+ xor edi,ecx
+ lea ebp,[1518500249+edx*1+ebp]
+ add ebp,edi
+ ; 00_15 14
+ mov edx,eax
+ mov edi,ebp
+ rol ebp,5
+ xor edx,ebx
+ add ebp,ecx
+ mov ecx,DWORD [56+esp]
+ and edx,esi
+ ror esi,2
+ xor edx,ebx
+ lea ebp,[1518500249+ecx*1+ebp]
+ add ebp,edx
+ ; 00_15 15
+ mov ecx,esi
+ mov edx,ebp
+ rol ebp,5
+ xor ecx,eax
+ add ebp,ebx
+ mov ebx,DWORD [60+esp]
+ and ecx,edi
+ ror edi,2
+ xor ecx,eax
+ lea ebp,[1518500249+ebx*1+ebp]
+ mov ebx,DWORD [esp]
+ add ecx,ebp
+ ; 16_19 16
+ mov ebp,edi
+ xor ebx,DWORD [8+esp]
+ xor ebp,esi
+ xor ebx,DWORD [32+esp]
+ and ebp,edx
+ xor ebx,DWORD [52+esp]
+ rol ebx,1
+ xor ebp,esi
+ add eax,ebp
+ mov ebp,ecx
+ ror edx,2
+ mov DWORD [esp],ebx
+ rol ebp,5
+ lea ebx,[1518500249+eax*1+ebx]
+ mov eax,DWORD [4+esp]
+ add ebx,ebp
+ ; 16_19 17
+ mov ebp,edx
+ xor eax,DWORD [12+esp]
+ xor ebp,edi
+ xor eax,DWORD [36+esp]
+ and ebp,ecx
+ xor eax,DWORD [56+esp]
+ rol eax,1
+ xor ebp,edi
+ add esi,ebp
+ mov ebp,ebx
+ ror ecx,2
+ mov DWORD [4+esp],eax
+ rol ebp,5
+ lea eax,[1518500249+esi*1+eax]
+ mov esi,DWORD [8+esp]
+ add eax,ebp
+ ; 16_19 18
+ mov ebp,ecx
+ xor esi,DWORD [16+esp]
+ xor ebp,edx
+ xor esi,DWORD [40+esp]
+ and ebp,ebx
+ xor esi,DWORD [60+esp]
+ rol esi,1
+ xor ebp,edx
+ add edi,ebp
+ mov ebp,eax
+ ror ebx,2
+ mov DWORD [8+esp],esi
+ rol ebp,5
+ lea esi,[1518500249+edi*1+esi]
+ mov edi,DWORD [12+esp]
+ add esi,ebp
+ ; 16_19 19
+ mov ebp,ebx
+ xor edi,DWORD [20+esp]
+ xor ebp,ecx
+ xor edi,DWORD [44+esp]
+ and ebp,eax
+ xor edi,DWORD [esp]
+ rol edi,1
+ xor ebp,ecx
+ add edx,ebp
+ mov ebp,esi
+ ror eax,2
+ mov DWORD [12+esp],edi
+ rol ebp,5
+ lea edi,[1518500249+edx*1+edi]
+ mov edx,DWORD [16+esp]
+ add edi,ebp
+ ; 20_39 20
+ mov ebp,esi
+ xor edx,DWORD [24+esp]
+ xor ebp,eax
+ xor edx,DWORD [48+esp]
+ xor ebp,ebx
+ xor edx,DWORD [4+esp]
+ rol edx,1
+ add ecx,ebp
+ ror esi,2
+ mov ebp,edi
+ rol ebp,5
+ mov DWORD [16+esp],edx
+ lea edx,[1859775393+ecx*1+edx]
+ mov ecx,DWORD [20+esp]
+ add edx,ebp
+ ; 20_39 21
+ mov ebp,edi
+ xor ecx,DWORD [28+esp]
+ xor ebp,esi
+ xor ecx,DWORD [52+esp]
+ xor ebp,eax
+ xor ecx,DWORD [8+esp]
+ rol ecx,1
+ add ebx,ebp
+ ror edi,2
+ mov ebp,edx
+ rol ebp,5
+ mov DWORD [20+esp],ecx
+ lea ecx,[1859775393+ebx*1+ecx]
+ mov ebx,DWORD [24+esp]
+ add ecx,ebp
+ ; 20_39 22
+ mov ebp,edx
+ xor ebx,DWORD [32+esp]
+ xor ebp,edi
+ xor ebx,DWORD [56+esp]
+ xor ebp,esi
+ xor ebx,DWORD [12+esp]
+ rol ebx,1
+ add eax,ebp
+ ror edx,2
+ mov ebp,ecx
+ rol ebp,5
+ mov DWORD [24+esp],ebx
+ lea ebx,[1859775393+eax*1+ebx]
+ mov eax,DWORD [28+esp]
+ add ebx,ebp
+ ; 20_39 23
+ mov ebp,ecx
+ xor eax,DWORD [36+esp]
+ xor ebp,edx
+ xor eax,DWORD [60+esp]
+ xor ebp,edi
+ xor eax,DWORD [16+esp]
+ rol eax,1
+ add esi,ebp
+ ror ecx,2
+ mov ebp,ebx
+ rol ebp,5
+ mov DWORD [28+esp],eax
+ lea eax,[1859775393+esi*1+eax]
+ mov esi,DWORD [32+esp]
+ add eax,ebp
+ ; 20_39 24
+ mov ebp,ebx
+ xor esi,DWORD [40+esp]
+ xor ebp,ecx
+ xor esi,DWORD [esp]
+ xor ebp,edx
+ xor esi,DWORD [20+esp]
+ rol esi,1
+ add edi,ebp
+ ror ebx,2
+ mov ebp,eax
+ rol ebp,5
+ mov DWORD [32+esp],esi
+ lea esi,[1859775393+edi*1+esi]
+ mov edi,DWORD [36+esp]
+ add esi,ebp
+ ; 20_39 25
+ mov ebp,eax
+ xor edi,DWORD [44+esp]
+ xor ebp,ebx
+ xor edi,DWORD [4+esp]
+ xor ebp,ecx
+ xor edi,DWORD [24+esp]
+ rol edi,1
+ add edx,ebp
+ ror eax,2
+ mov ebp,esi
+ rol ebp,5
+ mov DWORD [36+esp],edi
+ lea edi,[1859775393+edx*1+edi]
+ mov edx,DWORD [40+esp]
+ add edi,ebp
+ ; 20_39 26
+ mov ebp,esi
+ xor edx,DWORD [48+esp]
+ xor ebp,eax
+ xor edx,DWORD [8+esp]
+ xor ebp,ebx
+ xor edx,DWORD [28+esp]
+ rol edx,1
+ add ecx,ebp
+ ror esi,2
+ mov ebp,edi
+ rol ebp,5
+ mov DWORD [40+esp],edx
+ lea edx,[1859775393+ecx*1+edx]
+ mov ecx,DWORD [44+esp]
+ add edx,ebp
+ ; 20_39 27
+ mov ebp,edi
+ xor ecx,DWORD [52+esp]
+ xor ebp,esi
+ xor ecx,DWORD [12+esp]
+ xor ebp,eax
+ xor ecx,DWORD [32+esp]
+ rol ecx,1
+ add ebx,ebp
+ ror edi,2
+ mov ebp,edx
+ rol ebp,5
+ mov DWORD [44+esp],ecx
+ lea ecx,[1859775393+ebx*1+ecx]
+ mov ebx,DWORD [48+esp]
+ add ecx,ebp
+ ; 20_39 28
+ mov ebp,edx
+ xor ebx,DWORD [56+esp]
+ xor ebp,edi
+ xor ebx,DWORD [16+esp]
+ xor ebp,esi
+ xor ebx,DWORD [36+esp]
+ rol ebx,1
+ add eax,ebp
+ ror edx,2
+ mov ebp,ecx
+ rol ebp,5
+ mov DWORD [48+esp],ebx
+ lea ebx,[1859775393+eax*1+ebx]
+ mov eax,DWORD [52+esp]
+ add ebx,ebp
+ ; 20_39 29
+ mov ebp,ecx
+ xor eax,DWORD [60+esp]
+ xor ebp,edx
+ xor eax,DWORD [20+esp]
+ xor ebp,edi
+ xor eax,DWORD [40+esp]
+ rol eax,1
+ add esi,ebp
+ ror ecx,2
+ mov ebp,ebx
+ rol ebp,5
+ mov DWORD [52+esp],eax
+ lea eax,[1859775393+esi*1+eax]
+ mov esi,DWORD [56+esp]
+ add eax,ebp
+ ; 20_39 30
+ mov ebp,ebx
+ xor esi,DWORD [esp]
+ xor ebp,ecx
+ xor esi,DWORD [24+esp]
+ xor ebp,edx
+ xor esi,DWORD [44+esp]
+ rol esi,1
+ add edi,ebp
+ ror ebx,2
+ mov ebp,eax
+ rol ebp,5
+ mov DWORD [56+esp],esi
+ lea esi,[1859775393+edi*1+esi]
+ mov edi,DWORD [60+esp]
+ add esi,ebp
+ ; 20_39 31
+ mov ebp,eax
+ xor edi,DWORD [4+esp]
+ xor ebp,ebx
+ xor edi,DWORD [28+esp]
+ xor ebp,ecx
+ xor edi,DWORD [48+esp]
+ rol edi,1
+ add edx,ebp
+ ror eax,2
+ mov ebp,esi
+ rol ebp,5
+ mov DWORD [60+esp],edi
+ lea edi,[1859775393+edx*1+edi]
+ mov edx,DWORD [esp]
+ add edi,ebp
+ ; 20_39 32
+ mov ebp,esi
+ xor edx,DWORD [8+esp]
+ xor ebp,eax
+ xor edx,DWORD [32+esp]
+ xor ebp,ebx
+ xor edx,DWORD [52+esp]
+ rol edx,1
+ add ecx,ebp
+ ror esi,2
+ mov ebp,edi
+ rol ebp,5
+ mov DWORD [esp],edx
+ lea edx,[1859775393+ecx*1+edx]
+ mov ecx,DWORD [4+esp]
+ add edx,ebp
+ ; 20_39 33
+ mov ebp,edi
+ xor ecx,DWORD [12+esp]
+ xor ebp,esi
+ xor ecx,DWORD [36+esp]
+ xor ebp,eax
+ xor ecx,DWORD [56+esp]
+ rol ecx,1
+ add ebx,ebp
+ ror edi,2
+ mov ebp,edx
+ rol ebp,5
+ mov DWORD [4+esp],ecx
+ lea ecx,[1859775393+ebx*1+ecx]
+ mov ebx,DWORD [8+esp]
+ add ecx,ebp
+ ; 20_39 34
+ mov ebp,edx
+ xor ebx,DWORD [16+esp]
+ xor ebp,edi
+ xor ebx,DWORD [40+esp]
+ xor ebp,esi
+ xor ebx,DWORD [60+esp]
+ rol ebx,1
+ add eax,ebp
+ ror edx,2
+ mov ebp,ecx
+ rol ebp,5
+ mov DWORD [8+esp],ebx
+ lea ebx,[1859775393+eax*1+ebx]
+ mov eax,DWORD [12+esp]
+ add ebx,ebp
+ ; 20_39 35
+ mov ebp,ecx
+ xor eax,DWORD [20+esp]
+ xor ebp,edx
+ xor eax,DWORD [44+esp]
+ xor ebp,edi
+ xor eax,DWORD [esp]
+ rol eax,1
+ add esi,ebp
+ ror ecx,2
+ mov ebp,ebx
+ rol ebp,5
+ mov DWORD [12+esp],eax
+ lea eax,[1859775393+esi*1+eax]
+ mov esi,DWORD [16+esp]
+ add eax,ebp
+ ; 20_39 36
+ mov ebp,ebx
+ xor esi,DWORD [24+esp]
+ xor ebp,ecx
+ xor esi,DWORD [48+esp]
+ xor ebp,edx
+ xor esi,DWORD [4+esp]
+ rol esi,1
+ add edi,ebp
+ ror ebx,2
+ mov ebp,eax
+ rol ebp,5
+ mov DWORD [16+esp],esi
+ lea esi,[1859775393+edi*1+esi]
+ mov edi,DWORD [20+esp]
+ add esi,ebp
+ ; 20_39 37
+ mov ebp,eax
+ xor edi,DWORD [28+esp]
+ xor ebp,ebx
+ xor edi,DWORD [52+esp]
+ xor ebp,ecx
+ xor edi,DWORD [8+esp]
+ rol edi,1
+ add edx,ebp
+ ror eax,2
+ mov ebp,esi
+ rol ebp,5
+ mov DWORD [20+esp],edi
+ lea edi,[1859775393+edx*1+edi]
+ mov edx,DWORD [24+esp]
+ add edi,ebp
+ ; 20_39 38
+ mov ebp,esi
+ xor edx,DWORD [32+esp]
+ xor ebp,eax
+ xor edx,DWORD [56+esp]
+ xor ebp,ebx
+ xor edx,DWORD [12+esp]
+ rol edx,1
+ add ecx,ebp
+ ror esi,2
+ mov ebp,edi
+ rol ebp,5
+ mov DWORD [24+esp],edx
+ lea edx,[1859775393+ecx*1+edx]
+ mov ecx,DWORD [28+esp]
+ add edx,ebp
+ ; 20_39 39
+ mov ebp,edi
+ xor ecx,DWORD [36+esp]
+ xor ebp,esi
+ xor ecx,DWORD [60+esp]
+ xor ebp,eax
+ xor ecx,DWORD [16+esp]
+ rol ecx,1
+ add ebx,ebp
+ ror edi,2
+ mov ebp,edx
+ rol ebp,5
+ mov DWORD [28+esp],ecx
+ lea ecx,[1859775393+ebx*1+ecx]
+ mov ebx,DWORD [32+esp]
+ add ecx,ebp
+ ; 40_59 40
+ mov ebp,edi
+ xor ebx,DWORD [40+esp]
+ xor ebp,esi
+ xor ebx,DWORD [esp]
+ and ebp,edx
+ xor ebx,DWORD [20+esp]
+ rol ebx,1
+ add ebp,eax
+ ror edx,2
+ mov eax,ecx
+ rol eax,5
+ mov DWORD [32+esp],ebx
+ lea ebx,[2400959708+ebp*1+ebx]
+ mov ebp,edi
+ add ebx,eax
+ and ebp,esi
+ mov eax,DWORD [36+esp]
+ add ebx,ebp
+ ; 40_59 41
+ mov ebp,edx
+ xor eax,DWORD [44+esp]
+ xor ebp,edi
+ xor eax,DWORD [4+esp]
+ and ebp,ecx
+ xor eax,DWORD [24+esp]
+ rol eax,1
+ add ebp,esi
+ ror ecx,2
+ mov esi,ebx
+ rol esi,5
+ mov DWORD [36+esp],eax
+ lea eax,[2400959708+ebp*1+eax]
+ mov ebp,edx
+ add eax,esi
+ and ebp,edi
+ mov esi,DWORD [40+esp]
+ add eax,ebp
+ ; 40_59 42
+ mov ebp,ecx
+ xor esi,DWORD [48+esp]
+ xor ebp,edx
+ xor esi,DWORD [8+esp]
+ and ebp,ebx
+ xor esi,DWORD [28+esp]
+ rol esi,1
+ add ebp,edi
+ ror ebx,2
+ mov edi,eax
+ rol edi,5
+ mov DWORD [40+esp],esi
+ lea esi,[2400959708+ebp*1+esi]
+ mov ebp,ecx
+ add esi,edi
+ and ebp,edx
+ mov edi,DWORD [44+esp]
+ add esi,ebp
+ ; 40_59 43
+ mov ebp,ebx
+ xor edi,DWORD [52+esp]
+ xor ebp,ecx
+ xor edi,DWORD [12+esp]
+ and ebp,eax
+ xor edi,DWORD [32+esp]
+ rol edi,1
+ add ebp,edx
+ ror eax,2
+ mov edx,esi
+ rol edx,5
+ mov DWORD [44+esp],edi
+ lea edi,[2400959708+ebp*1+edi]
+ mov ebp,ebx
+ add edi,edx
+ and ebp,ecx
+ mov edx,DWORD [48+esp]
+ add edi,ebp
+ ; 40_59 44
+ mov ebp,eax
+ xor edx,DWORD [56+esp]
+ xor ebp,ebx
+ xor edx,DWORD [16+esp]
+ and ebp,esi
+ xor edx,DWORD [36+esp]
+ rol edx,1
+ add ebp,ecx
+ ror esi,2
+ mov ecx,edi
+ rol ecx,5
+ mov DWORD [48+esp],edx
+ lea edx,[2400959708+ebp*1+edx]
+ mov ebp,eax
+ add edx,ecx
+ and ebp,ebx
+ mov ecx,DWORD [52+esp]
+ add edx,ebp
+ ; 40_59 45
+ mov ebp,esi
+ xor ecx,DWORD [60+esp]
+ xor ebp,eax
+ xor ecx,DWORD [20+esp]
+ and ebp,edi
+ xor ecx,DWORD [40+esp]
+ rol ecx,1
+ add ebp,ebx
+ ror edi,2
+ mov ebx,edx
+ rol ebx,5
+ mov DWORD [52+esp],ecx
+ lea ecx,[2400959708+ebp*1+ecx]
+ mov ebp,esi
+ add ecx,ebx
+ and ebp,eax
+ mov ebx,DWORD [56+esp]
+ add ecx,ebp
+ ; 40_59 46
+ mov ebp,edi
+ xor ebx,DWORD [esp]
+ xor ebp,esi
+ xor ebx,DWORD [24+esp]
+ and ebp,edx
+ xor ebx,DWORD [44+esp]
+ rol ebx,1
+ add ebp,eax
+ ror edx,2
+ mov eax,ecx
+ rol eax,5
+ mov DWORD [56+esp],ebx
+ lea ebx,[2400959708+ebp*1+ebx]
+ mov ebp,edi
+ add ebx,eax
+ and ebp,esi
+ mov eax,DWORD [60+esp]
+ add ebx,ebp
+ ; 40_59 47
+ mov ebp,edx
+ xor eax,DWORD [4+esp]
+ xor ebp,edi
+ xor eax,DWORD [28+esp]
+ and ebp,ecx
+ xor eax,DWORD [48+esp]
+ rol eax,1
+ add ebp,esi
+ ror ecx,2
+ mov esi,ebx
+ rol esi,5
+ mov DWORD [60+esp],eax
+ lea eax,[2400959708+ebp*1+eax]
+ mov ebp,edx
+ add eax,esi
+ and ebp,edi
+ mov esi,DWORD [esp]
+ add eax,ebp
+ ; 40_59 48
+ mov ebp,ecx
+ xor esi,DWORD [8+esp]
+ xor ebp,edx
+ xor esi,DWORD [32+esp]
+ and ebp,ebx
+ xor esi,DWORD [52+esp]
+ rol esi,1
+ add ebp,edi
+ ror ebx,2
+ mov edi,eax
+ rol edi,5
+ mov DWORD [esp],esi
+ lea esi,[2400959708+ebp*1+esi]
+ mov ebp,ecx
+ add esi,edi
+ and ebp,edx
+ mov edi,DWORD [4+esp]
+ add esi,ebp
+ ; 40_59 49
+ mov ebp,ebx
+ xor edi,DWORD [12+esp]
+ xor ebp,ecx
+ xor edi,DWORD [36+esp]
+ and ebp,eax
+ xor edi,DWORD [56+esp]
+ rol edi,1
+ add ebp,edx
+ ror eax,2
+ mov edx,esi
+ rol edx,5
+ mov DWORD [4+esp],edi
+ lea edi,[2400959708+ebp*1+edi]
+ mov ebp,ebx
+ add edi,edx
+ and ebp,ecx
+ mov edx,DWORD [8+esp]
+ add edi,ebp
+ ; 40_59 50
+ mov ebp,eax
+ xor edx,DWORD [16+esp]
+ xor ebp,ebx
+ xor edx,DWORD [40+esp]
+ and ebp,esi
+ xor edx,DWORD [60+esp]
+ rol edx,1
+ add ebp,ecx
+ ror esi,2
+ mov ecx,edi
+ rol ecx,5
+ mov DWORD [8+esp],edx
+ lea edx,[2400959708+ebp*1+edx]
+ mov ebp,eax
+ add edx,ecx
+ and ebp,ebx
+ mov ecx,DWORD [12+esp]
+ add edx,ebp
+ ; 40_59 51
+ mov ebp,esi
+ xor ecx,DWORD [20+esp]
+ xor ebp,eax
+ xor ecx,DWORD [44+esp]
+ and ebp,edi
+ xor ecx,DWORD [esp]
+ rol ecx,1
+ add ebp,ebx
+ ror edi,2
+ mov ebx,edx
+ rol ebx,5
+ mov DWORD [12+esp],ecx
+ lea ecx,[2400959708+ebp*1+ecx]
+ mov ebp,esi
+ add ecx,ebx
+ and ebp,eax
+ mov ebx,DWORD [16+esp]
+ add ecx,ebp
+ ; 40_59 52
+ mov ebp,edi
+ xor ebx,DWORD [24+esp]
+ xor ebp,esi
+ xor ebx,DWORD [48+esp]
+ and ebp,edx
+ xor ebx,DWORD [4+esp]
+ rol ebx,1
+ add ebp,eax
+ ror edx,2
+ mov eax,ecx
+ rol eax,5
+ mov DWORD [16+esp],ebx
+ lea ebx,[2400959708+ebp*1+ebx]
+ mov ebp,edi
+ add ebx,eax
+ and ebp,esi
+ mov eax,DWORD [20+esp]
+ add ebx,ebp
+ ; 40_59 53
+ mov ebp,edx
+ xor eax,DWORD [28+esp]
+ xor ebp,edi
+ xor eax,DWORD [52+esp]
+ and ebp,ecx
+ xor eax,DWORD [8+esp]
+ rol eax,1
+ add ebp,esi
+ ror ecx,2
+ mov esi,ebx
+ rol esi,5
+ mov DWORD [20+esp],eax
+ lea eax,[2400959708+ebp*1+eax]
+ mov ebp,edx
+ add eax,esi
+ and ebp,edi
+ mov esi,DWORD [24+esp]
+ add eax,ebp
+ ; 40_59 54
+ mov ebp,ecx
+ xor esi,DWORD [32+esp]
+ xor ebp,edx
+ xor esi,DWORD [56+esp]
+ and ebp,ebx
+ xor esi,DWORD [12+esp]
+ rol esi,1
+ add ebp,edi
+ ror ebx,2
+ mov edi,eax
+ rol edi,5
+ mov DWORD [24+esp],esi
+ lea esi,[2400959708+ebp*1+esi]
+ mov ebp,ecx
+ add esi,edi
+ and ebp,edx
+ mov edi,DWORD [28+esp]
+ add esi,ebp
+ ; 40_59 55
+ mov ebp,ebx
+ xor edi,DWORD [36+esp]
+ xor ebp,ecx
+ xor edi,DWORD [60+esp]
+ and ebp,eax
+ xor edi,DWORD [16+esp]
+ rol edi,1
+ add ebp,edx
+ ror eax,2
+ mov edx,esi
+ rol edx,5
+ mov DWORD [28+esp],edi
+ lea edi,[2400959708+ebp*1+edi]
+ mov ebp,ebx
+ add edi,edx
+ and ebp,ecx
+ mov edx,DWORD [32+esp]
+ add edi,ebp
+ ; 40_59 56
+ mov ebp,eax
+ xor edx,DWORD [40+esp]
+ xor ebp,ebx
+ xor edx,DWORD [esp]
+ and ebp,esi
+ xor edx,DWORD [20+esp]
+ rol edx,1
+ add ebp,ecx
+ ror esi,2
+ mov ecx,edi
+ rol ecx,5
+ mov DWORD [32+esp],edx
+ lea edx,[2400959708+ebp*1+edx]
+ mov ebp,eax
+ add edx,ecx
+ and ebp,ebx
+ mov ecx,DWORD [36+esp]
+ add edx,ebp
+ ; 40_59 57
+ mov ebp,esi
+ xor ecx,DWORD [44+esp]
+ xor ebp,eax
+ xor ecx,DWORD [4+esp]
+ and ebp,edi
+ xor ecx,DWORD [24+esp]
+ rol ecx,1
+ add ebp,ebx
+ ror edi,2
+ mov ebx,edx
+ rol ebx,5
+ mov DWORD [36+esp],ecx
+ lea ecx,[2400959708+ebp*1+ecx]
+ mov ebp,esi
+ add ecx,ebx
+ and ebp,eax
+ mov ebx,DWORD [40+esp]
+ add ecx,ebp
+ ; 40_59 58
+ mov ebp,edi
+ xor ebx,DWORD [48+esp]
+ xor ebp,esi
+ xor ebx,DWORD [8+esp]
+ and ebp,edx
+ xor ebx,DWORD [28+esp]
+ rol ebx,1
+ add ebp,eax
+ ror edx,2
+ mov eax,ecx
+ rol eax,5
+ mov DWORD [40+esp],ebx
+ lea ebx,[2400959708+ebp*1+ebx]
+ mov ebp,edi
+ add ebx,eax
+ and ebp,esi
+ mov eax,DWORD [44+esp]
+ add ebx,ebp
+ ; 40_59 59
+ mov ebp,edx
+ xor eax,DWORD [52+esp]
+ xor ebp,edi
+ xor eax,DWORD [12+esp]
+ and ebp,ecx
+ xor eax,DWORD [32+esp]
+ rol eax,1
+ add ebp,esi
+ ror ecx,2
+ mov esi,ebx
+ rol esi,5
+ mov DWORD [44+esp],eax
+ lea eax,[2400959708+ebp*1+eax]
+ mov ebp,edx
+ add eax,esi
+ and ebp,edi
+ mov esi,DWORD [48+esp]
+ add eax,ebp
+ ; 20_39 60
+ mov ebp,ebx
+ xor esi,DWORD [56+esp]
+ xor ebp,ecx
+ xor esi,DWORD [16+esp]
+ xor ebp,edx
+ xor esi,DWORD [36+esp]
+ rol esi,1
+ add edi,ebp
+ ror ebx,2
+ mov ebp,eax
+ rol ebp,5
+ mov DWORD [48+esp],esi
+ lea esi,[3395469782+edi*1+esi]
+ mov edi,DWORD [52+esp]
+ add esi,ebp
+ ; 20_39 61
+ mov ebp,eax
+ xor edi,DWORD [60+esp]
+ xor ebp,ebx
+ xor edi,DWORD [20+esp]
+ xor ebp,ecx
+ xor edi,DWORD [40+esp]
+ rol edi,1
+ add edx,ebp
+ ror eax,2
+ mov ebp,esi
+ rol ebp,5
+ mov DWORD [52+esp],edi
+ lea edi,[3395469782+edx*1+edi]
+ mov edx,DWORD [56+esp]
+ add edi,ebp
+ ; 20_39 62
+ mov ebp,esi
+ xor edx,DWORD [esp]
+ xor ebp,eax
+ xor edx,DWORD [24+esp]
+ xor ebp,ebx
+ xor edx,DWORD [44+esp]
+ rol edx,1
+ add ecx,ebp
+ ror esi,2
+ mov ebp,edi
+ rol ebp,5
+ mov DWORD [56+esp],edx
+ lea edx,[3395469782+ecx*1+edx]
+ mov ecx,DWORD [60+esp]
+ add edx,ebp
+ ; 20_39 63
+ mov ebp,edi
+ xor ecx,DWORD [4+esp]
+ xor ebp,esi
+ xor ecx,DWORD [28+esp]
+ xor ebp,eax
+ xor ecx,DWORD [48+esp]
+ rol ecx,1
+ add ebx,ebp
+ ror edi,2
+ mov ebp,edx
+ rol ebp,5
+ mov DWORD [60+esp],ecx
+ lea ecx,[3395469782+ebx*1+ecx]
+ mov ebx,DWORD [esp]
+ add ecx,ebp
+ ; 20_39 64
+ mov ebp,edx
+ xor ebx,DWORD [8+esp]
+ xor ebp,edi
+ xor ebx,DWORD [32+esp]
+ xor ebp,esi
+ xor ebx,DWORD [52+esp]
+ rol ebx,1
+ add eax,ebp
+ ror edx,2
+ mov ebp,ecx
+ rol ebp,5
+ mov DWORD [esp],ebx
+ lea ebx,[3395469782+eax*1+ebx]
+ mov eax,DWORD [4+esp]
+ add ebx,ebp
+ ; 20_39 65
+ mov ebp,ecx
+ xor eax,DWORD [12+esp]
+ xor ebp,edx
+ xor eax,DWORD [36+esp]
+ xor ebp,edi
+ xor eax,DWORD [56+esp]
+ rol eax,1
+ add esi,ebp
+ ror ecx,2
+ mov ebp,ebx
+ rol ebp,5
+ mov DWORD [4+esp],eax
+ lea eax,[3395469782+esi*1+eax]
+ mov esi,DWORD [8+esp]
+ add eax,ebp
+ ; 20_39 66
+ mov ebp,ebx
+ xor esi,DWORD [16+esp]
+ xor ebp,ecx
+ xor esi,DWORD [40+esp]
+ xor ebp,edx
+ xor esi,DWORD [60+esp]
+ rol esi,1
+ add edi,ebp
+ ror ebx,2
+ mov ebp,eax
+ rol ebp,5
+ mov DWORD [8+esp],esi
+ lea esi,[3395469782+edi*1+esi]
+ mov edi,DWORD [12+esp]
+ add esi,ebp
+ ; 20_39 67
+ mov ebp,eax
+ xor edi,DWORD [20+esp]
+ xor ebp,ebx
+ xor edi,DWORD [44+esp]
+ xor ebp,ecx
+ xor edi,DWORD [esp]
+ rol edi,1
+ add edx,ebp
+ ror eax,2
+ mov ebp,esi
+ rol ebp,5
+ mov DWORD [12+esp],edi
+ lea edi,[3395469782+edx*1+edi]
+ mov edx,DWORD [16+esp]
+ add edi,ebp
+ ; 20_39 68
+ mov ebp,esi
+ xor edx,DWORD [24+esp]
+ xor ebp,eax
+ xor edx,DWORD [48+esp]
+ xor ebp,ebx
+ xor edx,DWORD [4+esp]
+ rol edx,1
+ add ecx,ebp
+ ror esi,2
+ mov ebp,edi
+ rol ebp,5
+ mov DWORD [16+esp],edx
+ lea edx,[3395469782+ecx*1+edx]
+ mov ecx,DWORD [20+esp]
+ add edx,ebp
+ ; 20_39 69
+ mov ebp,edi
+ xor ecx,DWORD [28+esp]
+ xor ebp,esi
+ xor ecx,DWORD [52+esp]
+ xor ebp,eax
+ xor ecx,DWORD [8+esp]
+ rol ecx,1
+ add ebx,ebp
+ ror edi,2
+ mov ebp,edx
+ rol ebp,5
+ mov DWORD [20+esp],ecx
+ lea ecx,[3395469782+ebx*1+ecx]
+ mov ebx,DWORD [24+esp]
+ add ecx,ebp
+ ; 20_39 70
+ mov ebp,edx
+ xor ebx,DWORD [32+esp]
+ xor ebp,edi
+ xor ebx,DWORD [56+esp]
+ xor ebp,esi
+ xor ebx,DWORD [12+esp]
+ rol ebx,1
+ add eax,ebp
+ ror edx,2
+ mov ebp,ecx
+ rol ebp,5
+ mov DWORD [24+esp],ebx
+ lea ebx,[3395469782+eax*1+ebx]
+ mov eax,DWORD [28+esp]
+ add ebx,ebp
+ ; 20_39 71
+ mov ebp,ecx
+ xor eax,DWORD [36+esp]
+ xor ebp,edx
+ xor eax,DWORD [60+esp]
+ xor ebp,edi
+ xor eax,DWORD [16+esp]
+ rol eax,1
+ add esi,ebp
+ ror ecx,2
+ mov ebp,ebx
+ rol ebp,5
+ mov DWORD [28+esp],eax
+ lea eax,[3395469782+esi*1+eax]
+ mov esi,DWORD [32+esp]
+ add eax,ebp
+ ; 20_39 72
+ mov ebp,ebx
+ xor esi,DWORD [40+esp]
+ xor ebp,ecx
+ xor esi,DWORD [esp]
+ xor ebp,edx
+ xor esi,DWORD [20+esp]
+ rol esi,1
+ add edi,ebp
+ ror ebx,2
+ mov ebp,eax
+ rol ebp,5
+ mov DWORD [32+esp],esi
+ lea esi,[3395469782+edi*1+esi]
+ mov edi,DWORD [36+esp]
+ add esi,ebp
+ ; 20_39 73
+ mov ebp,eax
+ xor edi,DWORD [44+esp]
+ xor ebp,ebx
+ xor edi,DWORD [4+esp]
+ xor ebp,ecx
+ xor edi,DWORD [24+esp]
+ rol edi,1
+ add edx,ebp
+ ror eax,2
+ mov ebp,esi
+ rol ebp,5
+ mov DWORD [36+esp],edi
+ lea edi,[3395469782+edx*1+edi]
+ mov edx,DWORD [40+esp]
+ add edi,ebp
+ ; 20_39 74
+ mov ebp,esi
+ xor edx,DWORD [48+esp]
+ xor ebp,eax
+ xor edx,DWORD [8+esp]
+ xor ebp,ebx
+ xor edx,DWORD [28+esp]
+ rol edx,1
+ add ecx,ebp
+ ror esi,2
+ mov ebp,edi
+ rol ebp,5
+ mov DWORD [40+esp],edx
+ lea edx,[3395469782+ecx*1+edx]
+ mov ecx,DWORD [44+esp]
+ add edx,ebp
+ ; 20_39 75
+ mov ebp,edi
+ xor ecx,DWORD [52+esp]
+ xor ebp,esi
+ xor ecx,DWORD [12+esp]
+ xor ebp,eax
+ xor ecx,DWORD [32+esp]
+ rol ecx,1
+ add ebx,ebp
+ ror edi,2
+ mov ebp,edx
+ rol ebp,5
+ mov DWORD [44+esp],ecx
+ lea ecx,[3395469782+ebx*1+ecx]
+ mov ebx,DWORD [48+esp]
+ add ecx,ebp
+ ; 20_39 76
+ mov ebp,edx
+ xor ebx,DWORD [56+esp]
+ xor ebp,edi
+ xor ebx,DWORD [16+esp]
+ xor ebp,esi
+ xor ebx,DWORD [36+esp]
+ rol ebx,1
+ add eax,ebp
+ ror edx,2
+ mov ebp,ecx
+ rol ebp,5
+ mov DWORD [48+esp],ebx
+ lea ebx,[3395469782+eax*1+ebx]
+ mov eax,DWORD [52+esp]
+ add ebx,ebp
+ ; 20_39 77
+ mov ebp,ecx
+ xor eax,DWORD [60+esp]
+ xor ebp,edx
+ xor eax,DWORD [20+esp]
+ xor ebp,edi
+ xor eax,DWORD [40+esp]
+ rol eax,1
+ add esi,ebp
+ ror ecx,2
+ mov ebp,ebx
+ rol ebp,5
+ lea eax,[3395469782+esi*1+eax]
+ mov esi,DWORD [56+esp]
+ add eax,ebp
+ ; 20_39 78
+ mov ebp,ebx
+ xor esi,DWORD [esp]
+ xor ebp,ecx
+ xor esi,DWORD [24+esp]
+ xor ebp,edx
+ xor esi,DWORD [44+esp]
+ rol esi,1
+ add edi,ebp
+ ror ebx,2
+ mov ebp,eax
+ rol ebp,5
+ lea esi,[3395469782+edi*1+esi]
+ mov edi,DWORD [60+esp]
+ add esi,ebp
+ ; 20_39 79
+ mov ebp,eax
+ xor edi,DWORD [4+esp]
+ xor ebp,ebx
+ xor edi,DWORD [28+esp]
+ xor ebp,ecx
+ xor edi,DWORD [48+esp]
+ rol edi,1
+ add edx,ebp
+ ror eax,2
+ mov ebp,esi
+ rol ebp,5
+ lea edi,[3395469782+edx*1+edi]
+ add edi,ebp
+ mov ebp,DWORD [96+esp]
+ mov edx,DWORD [100+esp]
+ add edi,DWORD [ebp]
+ add esi,DWORD [4+ebp]
+ add eax,DWORD [8+ebp]
+ add ebx,DWORD [12+ebp]
+ add ecx,DWORD [16+ebp]
+ mov DWORD [ebp],edi
+ add edx,64
+ mov DWORD [4+ebp],esi
+ cmp edx,DWORD [104+esp]
+ mov DWORD [8+ebp],eax
+ mov edi,ecx
+ mov DWORD [12+ebp],ebx
+ mov esi,edx
+ mov DWORD [16+ebp],ecx
+ jb NEAR L$002loop
+ add esp,76
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 16
+__sha1_block_data_order_shaext:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ call L$003pic_point
+L$003pic_point:
+ pop ebp
+ lea ebp,[(L$K_XX_XX-L$003pic_point)+ebp]
+L$shaext_shortcut:
+ mov edi,DWORD [20+esp]
+ mov ebx,esp
+ mov esi,DWORD [24+esp]
+ mov ecx,DWORD [28+esp]
+ sub esp,32
+ movdqu xmm0,[edi]
+ movd xmm1,DWORD [16+edi]
+ and esp,-32
+ movdqa xmm3,[80+ebp]
+ movdqu xmm4,[esi]
+ pshufd xmm0,xmm0,27
+ movdqu xmm5,[16+esi]
+ pshufd xmm1,xmm1,27
+ movdqu xmm6,[32+esi]
+db 102,15,56,0,227
+ movdqu xmm7,[48+esi]
+db 102,15,56,0,235
+db 102,15,56,0,243
+db 102,15,56,0,251
+ jmp NEAR L$004loop_shaext
+align 16
+L$004loop_shaext:
+ dec ecx
+ lea eax,[64+esi]
+ movdqa [esp],xmm1
+ paddd xmm1,xmm4
+ cmovne esi,eax
+ movdqa [16+esp],xmm0
+db 15,56,201,229
+ movdqa xmm2,xmm0
+db 15,58,204,193,0
+db 15,56,200,213
+ pxor xmm4,xmm6
+db 15,56,201,238
+db 15,56,202,231
+ movdqa xmm1,xmm0
+db 15,58,204,194,0
+db 15,56,200,206
+ pxor xmm5,xmm7
+db 15,56,202,236
+db 15,56,201,247
+ movdqa xmm2,xmm0
+db 15,58,204,193,0
+db 15,56,200,215
+ pxor xmm6,xmm4
+db 15,56,201,252
+db 15,56,202,245
+ movdqa xmm1,xmm0
+db 15,58,204,194,0
+db 15,56,200,204
+ pxor xmm7,xmm5
+db 15,56,202,254
+db 15,56,201,229
+ movdqa xmm2,xmm0
+db 15,58,204,193,0
+db 15,56,200,213
+ pxor xmm4,xmm6
+db 15,56,201,238
+db 15,56,202,231
+ movdqa xmm1,xmm0
+db 15,58,204,194,1
+db 15,56,200,206
+ pxor xmm5,xmm7
+db 15,56,202,236
+db 15,56,201,247
+ movdqa xmm2,xmm0
+db 15,58,204,193,1
+db 15,56,200,215
+ pxor xmm6,xmm4
+db 15,56,201,252
+db 15,56,202,245
+ movdqa xmm1,xmm0
+db 15,58,204,194,1
+db 15,56,200,204
+ pxor xmm7,xmm5
+db 15,56,202,254
+db 15,56,201,229
+ movdqa xmm2,xmm0
+db 15,58,204,193,1
+db 15,56,200,213
+ pxor xmm4,xmm6
+db 15,56,201,238
+db 15,56,202,231
+ movdqa xmm1,xmm0
+db 15,58,204,194,1
+db 15,56,200,206
+ pxor xmm5,xmm7
+db 15,56,202,236
+db 15,56,201,247
+ movdqa xmm2,xmm0
+db 15,58,204,193,2
+db 15,56,200,215
+ pxor xmm6,xmm4
+db 15,56,201,252
+db 15,56,202,245
+ movdqa xmm1,xmm0
+db 15,58,204,194,2
+db 15,56,200,204
+ pxor xmm7,xmm5
+db 15,56,202,254
+db 15,56,201,229
+ movdqa xmm2,xmm0
+db 15,58,204,193,2
+db 15,56,200,213
+ pxor xmm4,xmm6
+db 15,56,201,238
+db 15,56,202,231
+ movdqa xmm1,xmm0
+db 15,58,204,194,2
+db 15,56,200,206
+ pxor xmm5,xmm7
+db 15,56,202,236
+db 15,56,201,247
+ movdqa xmm2,xmm0
+db 15,58,204,193,2
+db 15,56,200,215
+ pxor xmm6,xmm4
+db 15,56,201,252
+db 15,56,202,245
+ movdqa xmm1,xmm0
+db 15,58,204,194,3
+db 15,56,200,204
+ pxor xmm7,xmm5
+db 15,56,202,254
+ movdqu xmm4,[esi]
+ movdqa xmm2,xmm0
+db 15,58,204,193,3
+db 15,56,200,213
+ movdqu xmm5,[16+esi]
+db 102,15,56,0,227
+ movdqa xmm1,xmm0
+db 15,58,204,194,3
+db 15,56,200,206
+ movdqu xmm6,[32+esi]
+db 102,15,56,0,235
+ movdqa xmm2,xmm0
+db 15,58,204,193,3
+db 15,56,200,215
+ movdqu xmm7,[48+esi]
+db 102,15,56,0,243
+ movdqa xmm1,xmm0
+db 15,58,204,194,3
+ movdqa xmm2,[esp]
+db 102,15,56,0,251
+db 15,56,200,202
+ paddd xmm0,[16+esp]
+ jnz NEAR L$004loop_shaext
+ pshufd xmm0,xmm0,27
+ pshufd xmm1,xmm1,27
+ movdqu [edi],xmm0
+ movd DWORD [16+edi],xmm1
+ mov esp,ebx
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 16
+__sha1_block_data_order_ssse3:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ call L$005pic_point
+L$005pic_point:
+ pop ebp
+ lea ebp,[(L$K_XX_XX-L$005pic_point)+ebp]
+L$ssse3_shortcut:
+ movdqa xmm7,[ebp]
+ movdqa xmm0,[16+ebp]
+ movdqa xmm1,[32+ebp]
+ movdqa xmm2,[48+ebp]
+ movdqa xmm6,[64+ebp]
+ mov edi,DWORD [20+esp]
+ mov ebp,DWORD [24+esp]
+ mov edx,DWORD [28+esp]
+ mov esi,esp
+ sub esp,208
+ and esp,-64
+ movdqa [112+esp],xmm0
+ movdqa [128+esp],xmm1
+ movdqa [144+esp],xmm2
+ shl edx,6
+ movdqa [160+esp],xmm7
+ add edx,ebp
+ movdqa [176+esp],xmm6
+ add ebp,64
+ mov DWORD [192+esp],edi
+ mov DWORD [196+esp],ebp
+ mov DWORD [200+esp],edx
+ mov DWORD [204+esp],esi
+ mov eax,DWORD [edi]
+ mov ebx,DWORD [4+edi]
+ mov ecx,DWORD [8+edi]
+ mov edx,DWORD [12+edi]
+ mov edi,DWORD [16+edi]
+ mov esi,ebx
+ movdqu xmm0,[ebp-64]
+ movdqu xmm1,[ebp-48]
+ movdqu xmm2,[ebp-32]
+ movdqu xmm3,[ebp-16]
+db 102,15,56,0,198
+db 102,15,56,0,206
+db 102,15,56,0,214
+ movdqa [96+esp],xmm7
+db 102,15,56,0,222
+ paddd xmm0,xmm7
+ paddd xmm1,xmm7
+ paddd xmm2,xmm7
+ movdqa [esp],xmm0
+ psubd xmm0,xmm7
+ movdqa [16+esp],xmm1
+ psubd xmm1,xmm7
+ movdqa [32+esp],xmm2
+ mov ebp,ecx
+ psubd xmm2,xmm7
+ xor ebp,edx
+ pshufd xmm4,xmm0,238
+ and esi,ebp
+ jmp NEAR L$006loop
+align 16
+L$006loop:
+ ror ebx,2
+ xor esi,edx
+ mov ebp,eax
+ punpcklqdq xmm4,xmm1
+ movdqa xmm6,xmm3
+ add edi,DWORD [esp]
+ xor ebx,ecx
+ paddd xmm7,xmm3
+ movdqa [64+esp],xmm0
+ rol eax,5
+ add edi,esi
+ psrldq xmm6,4
+ and ebp,ebx
+ xor ebx,ecx
+ pxor xmm4,xmm0
+ add edi,eax
+ ror eax,7
+ pxor xmm6,xmm2
+ xor ebp,ecx
+ mov esi,edi
+ add edx,DWORD [4+esp]
+ pxor xmm4,xmm6
+ xor eax,ebx
+ rol edi,5
+ movdqa [48+esp],xmm7
+ add edx,ebp
+ and esi,eax
+ movdqa xmm0,xmm4
+ xor eax,ebx
+ add edx,edi
+ ror edi,7
+ movdqa xmm6,xmm4
+ xor esi,ebx
+ pslldq xmm0,12
+ paddd xmm4,xmm4
+ mov ebp,edx
+ add ecx,DWORD [8+esp]
+ psrld xmm6,31
+ xor edi,eax
+ rol edx,5
+ movdqa xmm7,xmm0
+ add ecx,esi
+ and ebp,edi
+ xor edi,eax
+ psrld xmm0,30
+ add ecx,edx
+ ror edx,7
+ por xmm4,xmm6
+ xor ebp,eax
+ mov esi,ecx
+ add ebx,DWORD [12+esp]
+ pslld xmm7,2
+ xor edx,edi
+ rol ecx,5
+ pxor xmm4,xmm0
+ movdqa xmm0,[96+esp]
+ add ebx,ebp
+ and esi,edx
+ pxor xmm4,xmm7
+ pshufd xmm5,xmm1,238
+ xor edx,edi
+ add ebx,ecx
+ ror ecx,7
+ xor esi,edi
+ mov ebp,ebx
+ punpcklqdq xmm5,xmm2
+ movdqa xmm7,xmm4
+ add eax,DWORD [16+esp]
+ xor ecx,edx
+ paddd xmm0,xmm4
+ movdqa [80+esp],xmm1
+ rol ebx,5
+ add eax,esi
+ psrldq xmm7,4
+ and ebp,ecx
+ xor ecx,edx
+ pxor xmm5,xmm1
+ add eax,ebx
+ ror ebx,7
+ pxor xmm7,xmm3
+ xor ebp,edx
+ mov esi,eax
+ add edi,DWORD [20+esp]
+ pxor xmm5,xmm7
+ xor ebx,ecx
+ rol eax,5
+ movdqa [esp],xmm0
+ add edi,ebp
+ and esi,ebx
+ movdqa xmm1,xmm5
+ xor ebx,ecx
+ add edi,eax
+ ror eax,7
+ movdqa xmm7,xmm5
+ xor esi,ecx
+ pslldq xmm1,12
+ paddd xmm5,xmm5
+ mov ebp,edi
+ add edx,DWORD [24+esp]
+ psrld xmm7,31
+ xor eax,ebx
+ rol edi,5
+ movdqa xmm0,xmm1
+ add edx,esi
+ and ebp,eax
+ xor eax,ebx
+ psrld xmm1,30
+ add edx,edi
+ ror edi,7
+ por xmm5,xmm7
+ xor ebp,ebx
+ mov esi,edx
+ add ecx,DWORD [28+esp]
+ pslld xmm0,2
+ xor edi,eax
+ rol edx,5
+ pxor xmm5,xmm1
+ movdqa xmm1,[112+esp]
+ add ecx,ebp
+ and esi,edi
+ pxor xmm5,xmm0
+ pshufd xmm6,xmm2,238
+ xor edi,eax
+ add ecx,edx
+ ror edx,7
+ xor esi,eax
+ mov ebp,ecx
+ punpcklqdq xmm6,xmm3
+ movdqa xmm0,xmm5
+ add ebx,DWORD [32+esp]
+ xor edx,edi
+ paddd xmm1,xmm5
+ movdqa [96+esp],xmm2
+ rol ecx,5
+ add ebx,esi
+ psrldq xmm0,4
+ and ebp,edx
+ xor edx,edi
+ pxor xmm6,xmm2
+ add ebx,ecx
+ ror ecx,7
+ pxor xmm0,xmm4
+ xor ebp,edi
+ mov esi,ebx
+ add eax,DWORD [36+esp]
+ pxor xmm6,xmm0
+ xor ecx,edx
+ rol ebx,5
+ movdqa [16+esp],xmm1
+ add eax,ebp
+ and esi,ecx
+ movdqa xmm2,xmm6
+ xor ecx,edx
+ add eax,ebx
+ ror ebx,7
+ movdqa xmm0,xmm6
+ xor esi,edx
+ pslldq xmm2,12
+ paddd xmm6,xmm6
+ mov ebp,eax
+ add edi,DWORD [40+esp]
+ psrld xmm0,31
+ xor ebx,ecx
+ rol eax,5
+ movdqa xmm1,xmm2
+ add edi,esi
+ and ebp,ebx
+ xor ebx,ecx
+ psrld xmm2,30
+ add edi,eax
+ ror eax,7
+ por xmm6,xmm0
+ xor ebp,ecx
+ movdqa xmm0,[64+esp]
+ mov esi,edi
+ add edx,DWORD [44+esp]
+ pslld xmm1,2
+ xor eax,ebx
+ rol edi,5
+ pxor xmm6,xmm2
+ movdqa xmm2,[112+esp]
+ add edx,ebp
+ and esi,eax
+ pxor xmm6,xmm1
+ pshufd xmm7,xmm3,238
+ xor eax,ebx
+ add edx,edi
+ ror edi,7
+ xor esi,ebx
+ mov ebp,edx
+ punpcklqdq xmm7,xmm4
+ movdqa xmm1,xmm6
+ add ecx,DWORD [48+esp]
+ xor edi,eax
+ paddd xmm2,xmm6
+ movdqa [64+esp],xmm3
+ rol edx,5
+ add ecx,esi
+ psrldq xmm1,4
+ and ebp,edi
+ xor edi,eax
+ pxor xmm7,xmm3
+ add ecx,edx
+ ror edx,7
+ pxor xmm1,xmm5
+ xor ebp,eax
+ mov esi,ecx
+ add ebx,DWORD [52+esp]
+ pxor xmm7,xmm1
+ xor edx,edi
+ rol ecx,5
+ movdqa [32+esp],xmm2
+ add ebx,ebp
+ and esi,edx
+ movdqa xmm3,xmm7
+ xor edx,edi
+ add ebx,ecx
+ ror ecx,7
+ movdqa xmm1,xmm7
+ xor esi,edi
+ pslldq xmm3,12
+ paddd xmm7,xmm7
+ mov ebp,ebx
+ add eax,DWORD [56+esp]
+ psrld xmm1,31
+ xor ecx,edx
+ rol ebx,5
+ movdqa xmm2,xmm3
+ add eax,esi
+ and ebp,ecx
+ xor ecx,edx
+ psrld xmm3,30
+ add eax,ebx
+ ror ebx,7
+ por xmm7,xmm1
+ xor ebp,edx
+ movdqa xmm1,[80+esp]
+ mov esi,eax
+ add edi,DWORD [60+esp]
+ pslld xmm2,2
+ xor ebx,ecx
+ rol eax,5
+ pxor xmm7,xmm3
+ movdqa xmm3,[112+esp]
+ add edi,ebp
+ and esi,ebx
+ pxor xmm7,xmm2
+ pshufd xmm2,xmm6,238
+ xor ebx,ecx
+ add edi,eax
+ ror eax,7
+ pxor xmm0,xmm4
+ punpcklqdq xmm2,xmm7
+ xor esi,ecx
+ mov ebp,edi
+ add edx,DWORD [esp]
+ pxor xmm0,xmm1
+ movdqa [80+esp],xmm4
+ xor eax,ebx
+ rol edi,5
+ movdqa xmm4,xmm3
+ add edx,esi
+ paddd xmm3,xmm7
+ and ebp,eax
+ pxor xmm0,xmm2
+ xor eax,ebx
+ add edx,edi
+ ror edi,7
+ xor ebp,ebx
+ movdqa xmm2,xmm0
+ movdqa [48+esp],xmm3
+ mov esi,edx
+ add ecx,DWORD [4+esp]
+ xor edi,eax
+ rol edx,5
+ pslld xmm0,2
+ add ecx,ebp
+ and esi,edi
+ psrld xmm2,30
+ xor edi,eax
+ add ecx,edx
+ ror edx,7
+ xor esi,eax
+ mov ebp,ecx
+ add ebx,DWORD [8+esp]
+ xor edx,edi
+ rol ecx,5
+ por xmm0,xmm2
+ add ebx,esi
+ and ebp,edx
+ movdqa xmm2,[96+esp]
+ xor edx,edi
+ add ebx,ecx
+ add eax,DWORD [12+esp]
+ xor ebp,edi
+ mov esi,ebx
+ pshufd xmm3,xmm7,238
+ rol ebx,5
+ add eax,ebp
+ xor esi,edx
+ ror ecx,7
+ add eax,ebx
+ add edi,DWORD [16+esp]
+ pxor xmm1,xmm5
+ punpcklqdq xmm3,xmm0
+ xor esi,ecx
+ mov ebp,eax
+ rol eax,5
+ pxor xmm1,xmm2
+ movdqa [96+esp],xmm5
+ add edi,esi
+ xor ebp,ecx
+ movdqa xmm5,xmm4
+ ror ebx,7
+ paddd xmm4,xmm0
+ add edi,eax
+ pxor xmm1,xmm3
+ add edx,DWORD [20+esp]
+ xor ebp,ebx
+ mov esi,edi
+ rol edi,5
+ movdqa xmm3,xmm1
+ movdqa [esp],xmm4
+ add edx,ebp
+ xor esi,ebx
+ ror eax,7
+ add edx,edi
+ pslld xmm1,2
+ add ecx,DWORD [24+esp]
+ xor esi,eax
+ psrld xmm3,30
+ mov ebp,edx
+ rol edx,5
+ add ecx,esi
+ xor ebp,eax
+ ror edi,7
+ add ecx,edx
+ por xmm1,xmm3
+ add ebx,DWORD [28+esp]
+ xor ebp,edi
+ movdqa xmm3,[64+esp]
+ mov esi,ecx
+ rol ecx,5
+ add ebx,ebp
+ xor esi,edi
+ ror edx,7
+ pshufd xmm4,xmm0,238
+ add ebx,ecx
+ add eax,DWORD [32+esp]
+ pxor xmm2,xmm6
+ punpcklqdq xmm4,xmm1
+ xor esi,edx
+ mov ebp,ebx
+ rol ebx,5
+ pxor xmm2,xmm3
+ movdqa [64+esp],xmm6
+ add eax,esi
+ xor ebp,edx
+ movdqa xmm6,[128+esp]
+ ror ecx,7
+ paddd xmm5,xmm1
+ add eax,ebx
+ pxor xmm2,xmm4
+ add edi,DWORD [36+esp]
+ xor ebp,ecx
+ mov esi,eax
+ rol eax,5
+ movdqa xmm4,xmm2
+ movdqa [16+esp],xmm5
+ add edi,ebp
+ xor esi,ecx
+ ror ebx,7
+ add edi,eax
+ pslld xmm2,2
+ add edx,DWORD [40+esp]
+ xor esi,ebx
+ psrld xmm4,30
+ mov ebp,edi
+ rol edi,5
+ add edx,esi
+ xor ebp,ebx
+ ror eax,7
+ add edx,edi
+ por xmm2,xmm4
+ add ecx,DWORD [44+esp]
+ xor ebp,eax
+ movdqa xmm4,[80+esp]
+ mov esi,edx
+ rol edx,5
+ add ecx,ebp
+ xor esi,eax
+ ror edi,7
+ pshufd xmm5,xmm1,238
+ add ecx,edx
+ add ebx,DWORD [48+esp]
+ pxor xmm3,xmm7
+ punpcklqdq xmm5,xmm2
+ xor esi,edi
+ mov ebp,ecx
+ rol ecx,5
+ pxor xmm3,xmm4
+ movdqa [80+esp],xmm7
+ add ebx,esi
+ xor ebp,edi
+ movdqa xmm7,xmm6
+ ror edx,7
+ paddd xmm6,xmm2
+ add ebx,ecx
+ pxor xmm3,xmm5
+ add eax,DWORD [52+esp]
+ xor ebp,edx
+ mov esi,ebx
+ rol ebx,5
+ movdqa xmm5,xmm3
+ movdqa [32+esp],xmm6
+ add eax,ebp
+ xor esi,edx
+ ror ecx,7
+ add eax,ebx
+ pslld xmm3,2
+ add edi,DWORD [56+esp]
+ xor esi,ecx
+ psrld xmm5,30
+ mov ebp,eax
+ rol eax,5
+ add edi,esi
+ xor ebp,ecx
+ ror ebx,7
+ add edi,eax
+ por xmm3,xmm5
+ add edx,DWORD [60+esp]
+ xor ebp,ebx
+ movdqa xmm5,[96+esp]
+ mov esi,edi
+ rol edi,5
+ add edx,ebp
+ xor esi,ebx
+ ror eax,7
+ pshufd xmm6,xmm2,238
+ add edx,edi
+ add ecx,DWORD [esp]
+ pxor xmm4,xmm0
+ punpcklqdq xmm6,xmm3
+ xor esi,eax
+ mov ebp,edx
+ rol edx,5
+ pxor xmm4,xmm5
+ movdqa [96+esp],xmm0
+ add ecx,esi
+ xor ebp,eax
+ movdqa xmm0,xmm7
+ ror edi,7
+ paddd xmm7,xmm3
+ add ecx,edx
+ pxor xmm4,xmm6
+ add ebx,DWORD [4+esp]
+ xor ebp,edi
+ mov esi,ecx
+ rol ecx,5
+ movdqa xmm6,xmm4
+ movdqa [48+esp],xmm7
+ add ebx,ebp
+ xor esi,edi
+ ror edx,7
+ add ebx,ecx
+ pslld xmm4,2
+ add eax,DWORD [8+esp]
+ xor esi,edx
+ psrld xmm6,30
+ mov ebp,ebx
+ rol ebx,5
+ add eax,esi
+ xor ebp,edx
+ ror ecx,7
+ add eax,ebx
+ por xmm4,xmm6
+ add edi,DWORD [12+esp]
+ xor ebp,ecx
+ movdqa xmm6,[64+esp]
+ mov esi,eax
+ rol eax,5
+ add edi,ebp
+ xor esi,ecx
+ ror ebx,7
+ pshufd xmm7,xmm3,238
+ add edi,eax
+ add edx,DWORD [16+esp]
+ pxor xmm5,xmm1
+ punpcklqdq xmm7,xmm4
+ xor esi,ebx
+ mov ebp,edi
+ rol edi,5
+ pxor xmm5,xmm6
+ movdqa [64+esp],xmm1
+ add edx,esi
+ xor ebp,ebx
+ movdqa xmm1,xmm0
+ ror eax,7
+ paddd xmm0,xmm4
+ add edx,edi
+ pxor xmm5,xmm7
+ add ecx,DWORD [20+esp]
+ xor ebp,eax
+ mov esi,edx
+ rol edx,5
+ movdqa xmm7,xmm5
+ movdqa [esp],xmm0
+ add ecx,ebp
+ xor esi,eax
+ ror edi,7
+ add ecx,edx
+ pslld xmm5,2
+ add ebx,DWORD [24+esp]
+ xor esi,edi
+ psrld xmm7,30
+ mov ebp,ecx
+ rol ecx,5
+ add ebx,esi
+ xor ebp,edi
+ ror edx,7
+ add ebx,ecx
+ por xmm5,xmm7
+ add eax,DWORD [28+esp]
+ movdqa xmm7,[80+esp]
+ ror ecx,7
+ mov esi,ebx
+ xor ebp,edx
+ rol ebx,5
+ pshufd xmm0,xmm4,238
+ add eax,ebp
+ xor esi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add edi,DWORD [32+esp]
+ pxor xmm6,xmm2
+ punpcklqdq xmm0,xmm5
+ and esi,ecx
+ xor ecx,edx
+ ror ebx,7
+ pxor xmm6,xmm7
+ movdqa [80+esp],xmm2
+ mov ebp,eax
+ xor esi,ecx
+ rol eax,5
+ movdqa xmm2,xmm1
+ add edi,esi
+ paddd xmm1,xmm5
+ xor ebp,ebx
+ pxor xmm6,xmm0
+ xor ebx,ecx
+ add edi,eax
+ add edx,DWORD [36+esp]
+ and ebp,ebx
+ movdqa xmm0,xmm6
+ movdqa [16+esp],xmm1
+ xor ebx,ecx
+ ror eax,7
+ mov esi,edi
+ xor ebp,ebx
+ rol edi,5
+ pslld xmm6,2
+ add edx,ebp
+ xor esi,eax
+ psrld xmm0,30
+ xor eax,ebx
+ add edx,edi
+ add ecx,DWORD [40+esp]
+ and esi,eax
+ xor eax,ebx
+ ror edi,7
+ por xmm6,xmm0
+ mov ebp,edx
+ xor esi,eax
+ movdqa xmm0,[96+esp]
+ rol edx,5
+ add ecx,esi
+ xor ebp,edi
+ xor edi,eax
+ add ecx,edx
+ pshufd xmm1,xmm5,238
+ add ebx,DWORD [44+esp]
+ and ebp,edi
+ xor edi,eax
+ ror edx,7
+ mov esi,ecx
+ xor ebp,edi
+ rol ecx,5
+ add ebx,ebp
+ xor esi,edx
+ xor edx,edi
+ add ebx,ecx
+ add eax,DWORD [48+esp]
+ pxor xmm7,xmm3
+ punpcklqdq xmm1,xmm6
+ and esi,edx
+ xor edx,edi
+ ror ecx,7
+ pxor xmm7,xmm0
+ movdqa [96+esp],xmm3
+ mov ebp,ebx
+ xor esi,edx
+ rol ebx,5
+ movdqa xmm3,[144+esp]
+ add eax,esi
+ paddd xmm2,xmm6
+ xor ebp,ecx
+ pxor xmm7,xmm1
+ xor ecx,edx
+ add eax,ebx
+ add edi,DWORD [52+esp]
+ and ebp,ecx
+ movdqa xmm1,xmm7
+ movdqa [32+esp],xmm2
+ xor ecx,edx
+ ror ebx,7
+ mov esi,eax
+ xor ebp,ecx
+ rol eax,5
+ pslld xmm7,2
+ add edi,ebp
+ xor esi,ebx
+ psrld xmm1,30
+ xor ebx,ecx
+ add edi,eax
+ add edx,DWORD [56+esp]
+ and esi,ebx
+ xor ebx,ecx
+ ror eax,7
+ por xmm7,xmm1
+ mov ebp,edi
+ xor esi,ebx
+ movdqa xmm1,[64+esp]
+ rol edi,5
+ add edx,esi
+ xor ebp,eax
+ xor eax,ebx
+ add edx,edi
+ pshufd xmm2,xmm6,238
+ add ecx,DWORD [60+esp]
+ and ebp,eax
+ xor eax,ebx
+ ror edi,7
+ mov esi,edx
+ xor ebp,eax
+ rol edx,5
+ add ecx,ebp
+ xor esi,edi
+ xor edi,eax
+ add ecx,edx
+ add ebx,DWORD [esp]
+ pxor xmm0,xmm4
+ punpcklqdq xmm2,xmm7
+ and esi,edi
+ xor edi,eax
+ ror edx,7
+ pxor xmm0,xmm1
+ movdqa [64+esp],xmm4
+ mov ebp,ecx
+ xor esi,edi
+ rol ecx,5
+ movdqa xmm4,xmm3
+ add ebx,esi
+ paddd xmm3,xmm7
+ xor ebp,edx
+ pxor xmm0,xmm2
+ xor edx,edi
+ add ebx,ecx
+ add eax,DWORD [4+esp]
+ and ebp,edx
+ movdqa xmm2,xmm0
+ movdqa [48+esp],xmm3
+ xor edx,edi
+ ror ecx,7
+ mov esi,ebx
+ xor ebp,edx
+ rol ebx,5
+ pslld xmm0,2
+ add eax,ebp
+ xor esi,ecx
+ psrld xmm2,30
+ xor ecx,edx
+ add eax,ebx
+ add edi,DWORD [8+esp]
+ and esi,ecx
+ xor ecx,edx
+ ror ebx,7
+ por xmm0,xmm2
+ mov ebp,eax
+ xor esi,ecx
+ movdqa xmm2,[80+esp]
+ rol eax,5
+ add edi,esi
+ xor ebp,ebx
+ xor ebx,ecx
+ add edi,eax
+ pshufd xmm3,xmm7,238
+ add edx,DWORD [12+esp]
+ and ebp,ebx
+ xor ebx,ecx
+ ror eax,7
+ mov esi,edi
+ xor ebp,ebx
+ rol edi,5
+ add edx,ebp
+ xor esi,eax
+ xor eax,ebx
+ add edx,edi
+ add ecx,DWORD [16+esp]
+ pxor xmm1,xmm5
+ punpcklqdq xmm3,xmm0
+ and esi,eax
+ xor eax,ebx
+ ror edi,7
+ pxor xmm1,xmm2
+ movdqa [80+esp],xmm5
+ mov ebp,edx
+ xor esi,eax
+ rol edx,5
+ movdqa xmm5,xmm4
+ add ecx,esi
+ paddd xmm4,xmm0
+ xor ebp,edi
+ pxor xmm1,xmm3
+ xor edi,eax
+ add ecx,edx
+ add ebx,DWORD [20+esp]
+ and ebp,edi
+ movdqa xmm3,xmm1
+ movdqa [esp],xmm4
+ xor edi,eax
+ ror edx,7
+ mov esi,ecx
+ xor ebp,edi
+ rol ecx,5
+ pslld xmm1,2
+ add ebx,ebp
+ xor esi,edx
+ psrld xmm3,30
+ xor edx,edi
+ add ebx,ecx
+ add eax,DWORD [24+esp]
+ and esi,edx
+ xor edx,edi
+ ror ecx,7
+ por xmm1,xmm3
+ mov ebp,ebx
+ xor esi,edx
+ movdqa xmm3,[96+esp]
+ rol ebx,5
+ add eax,esi
+ xor ebp,ecx
+ xor ecx,edx
+ add eax,ebx
+ pshufd xmm4,xmm0,238
+ add edi,DWORD [28+esp]
+ and ebp,ecx
+ xor ecx,edx
+ ror ebx,7
+ mov esi,eax
+ xor ebp,ecx
+ rol eax,5
+ add edi,ebp
+ xor esi,ebx
+ xor ebx,ecx
+ add edi,eax
+ add edx,DWORD [32+esp]
+ pxor xmm2,xmm6
+ punpcklqdq xmm4,xmm1
+ and esi,ebx
+ xor ebx,ecx
+ ror eax,7
+ pxor xmm2,xmm3
+ movdqa [96+esp],xmm6
+ mov ebp,edi
+ xor esi,ebx
+ rol edi,5
+ movdqa xmm6,xmm5
+ add edx,esi
+ paddd xmm5,xmm1
+ xor ebp,eax
+ pxor xmm2,xmm4
+ xor eax,ebx
+ add edx,edi
+ add ecx,DWORD [36+esp]
+ and ebp,eax
+ movdqa xmm4,xmm2
+ movdqa [16+esp],xmm5
+ xor eax,ebx
+ ror edi,7
+ mov esi,edx
+ xor ebp,eax
+ rol edx,5
+ pslld xmm2,2
+ add ecx,ebp
+ xor esi,edi
+ psrld xmm4,30
+ xor edi,eax
+ add ecx,edx
+ add ebx,DWORD [40+esp]
+ and esi,edi
+ xor edi,eax
+ ror edx,7
+ por xmm2,xmm4
+ mov ebp,ecx
+ xor esi,edi
+ movdqa xmm4,[64+esp]
+ rol ecx,5
+ add ebx,esi
+ xor ebp,edx
+ xor edx,edi
+ add ebx,ecx
+ pshufd xmm5,xmm1,238
+ add eax,DWORD [44+esp]
+ and ebp,edx
+ xor edx,edi
+ ror ecx,7
+ mov esi,ebx
+ xor ebp,edx
+ rol ebx,5
+ add eax,ebp
+ xor esi,edx
+ add eax,ebx
+ add edi,DWORD [48+esp]
+ pxor xmm3,xmm7
+ punpcklqdq xmm5,xmm2
+ xor esi,ecx
+ mov ebp,eax
+ rol eax,5
+ pxor xmm3,xmm4
+ movdqa [64+esp],xmm7
+ add edi,esi
+ xor ebp,ecx
+ movdqa xmm7,xmm6
+ ror ebx,7
+ paddd xmm6,xmm2
+ add edi,eax
+ pxor xmm3,xmm5
+ add edx,DWORD [52+esp]
+ xor ebp,ebx
+ mov esi,edi
+ rol edi,5
+ movdqa xmm5,xmm3
+ movdqa [32+esp],xmm6
+ add edx,ebp
+ xor esi,ebx
+ ror eax,7
+ add edx,edi
+ pslld xmm3,2
+ add ecx,DWORD [56+esp]
+ xor esi,eax
+ psrld xmm5,30
+ mov ebp,edx
+ rol edx,5
+ add ecx,esi
+ xor ebp,eax
+ ror edi,7
+ add ecx,edx
+ por xmm3,xmm5
+ add ebx,DWORD [60+esp]
+ xor ebp,edi
+ mov esi,ecx
+ rol ecx,5
+ add ebx,ebp
+ xor esi,edi
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD [esp]
+ xor esi,edx
+ mov ebp,ebx
+ rol ebx,5
+ add eax,esi
+ xor ebp,edx
+ ror ecx,7
+ paddd xmm7,xmm3
+ add eax,ebx
+ add edi,DWORD [4+esp]
+ xor ebp,ecx
+ mov esi,eax
+ movdqa [48+esp],xmm7
+ rol eax,5
+ add edi,ebp
+ xor esi,ecx
+ ror ebx,7
+ add edi,eax
+ add edx,DWORD [8+esp]
+ xor esi,ebx
+ mov ebp,edi
+ rol edi,5
+ add edx,esi
+ xor ebp,ebx
+ ror eax,7
+ add edx,edi
+ add ecx,DWORD [12+esp]
+ xor ebp,eax
+ mov esi,edx
+ rol edx,5
+ add ecx,ebp
+ xor esi,eax
+ ror edi,7
+ add ecx,edx
+ mov ebp,DWORD [196+esp]
+ cmp ebp,DWORD [200+esp]
+ je NEAR L$007done
+ movdqa xmm7,[160+esp]
+ movdqa xmm6,[176+esp]
+ movdqu xmm0,[ebp]
+ movdqu xmm1,[16+ebp]
+ movdqu xmm2,[32+ebp]
+ movdqu xmm3,[48+ebp]
+ add ebp,64
+db 102,15,56,0,198
+ mov DWORD [196+esp],ebp
+ movdqa [96+esp],xmm7
+ add ebx,DWORD [16+esp]
+ xor esi,edi
+ mov ebp,ecx
+ rol ecx,5
+ add ebx,esi
+ xor ebp,edi
+ ror edx,7
+db 102,15,56,0,206
+ add ebx,ecx
+ add eax,DWORD [20+esp]
+ xor ebp,edx
+ mov esi,ebx
+ paddd xmm0,xmm7
+ rol ebx,5
+ add eax,ebp
+ xor esi,edx
+ ror ecx,7
+ movdqa [esp],xmm0
+ add eax,ebx
+ add edi,DWORD [24+esp]
+ xor esi,ecx
+ mov ebp,eax
+ psubd xmm0,xmm7
+ rol eax,5
+ add edi,esi
+ xor ebp,ecx
+ ror ebx,7
+ add edi,eax
+ add edx,DWORD [28+esp]
+ xor ebp,ebx
+ mov esi,edi
+ rol edi,5
+ add edx,ebp
+ xor esi,ebx
+ ror eax,7
+ add edx,edi
+ add ecx,DWORD [32+esp]
+ xor esi,eax
+ mov ebp,edx
+ rol edx,5
+ add ecx,esi
+ xor ebp,eax
+ ror edi,7
+db 102,15,56,0,214
+ add ecx,edx
+ add ebx,DWORD [36+esp]
+ xor ebp,edi
+ mov esi,ecx
+ paddd xmm1,xmm7
+ rol ecx,5
+ add ebx,ebp
+ xor esi,edi
+ ror edx,7
+ movdqa [16+esp],xmm1
+ add ebx,ecx
+ add eax,DWORD [40+esp]
+ xor esi,edx
+ mov ebp,ebx
+ psubd xmm1,xmm7
+ rol ebx,5
+ add eax,esi
+ xor ebp,edx
+ ror ecx,7
+ add eax,ebx
+ add edi,DWORD [44+esp]
+ xor ebp,ecx
+ mov esi,eax
+ rol eax,5
+ add edi,ebp
+ xor esi,ecx
+ ror ebx,7
+ add edi,eax
+ add edx,DWORD [48+esp]
+ xor esi,ebx
+ mov ebp,edi
+ rol edi,5
+ add edx,esi
+ xor ebp,ebx
+ ror eax,7
+db 102,15,56,0,222
+ add edx,edi
+ add ecx,DWORD [52+esp]
+ xor ebp,eax
+ mov esi,edx
+ paddd xmm2,xmm7
+ rol edx,5
+ add ecx,ebp
+ xor esi,eax
+ ror edi,7
+ movdqa [32+esp],xmm2
+ add ecx,edx
+ add ebx,DWORD [56+esp]
+ xor esi,edi
+ mov ebp,ecx
+ psubd xmm2,xmm7
+ rol ecx,5
+ add ebx,esi
+ xor ebp,edi
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD [60+esp]
+ xor ebp,edx
+ mov esi,ebx
+ rol ebx,5
+ add eax,ebp
+ ror ecx,7
+ add eax,ebx
+ mov ebp,DWORD [192+esp]
+ add eax,DWORD [ebp]
+ add esi,DWORD [4+ebp]
+ add ecx,DWORD [8+ebp]
+ mov DWORD [ebp],eax
+ add edx,DWORD [12+ebp]
+ mov DWORD [4+ebp],esi
+ add edi,DWORD [16+ebp]
+ mov DWORD [8+ebp],ecx
+ mov ebx,ecx
+ mov DWORD [12+ebp],edx
+ xor ebx,edx
+ mov DWORD [16+ebp],edi
+ mov ebp,esi
+ pshufd xmm4,xmm0,238
+ and esi,ebx
+ mov ebx,ebp
+ jmp NEAR L$006loop
+align 16
+L$007done:
+ add ebx,DWORD [16+esp]
+ xor esi,edi
+ mov ebp,ecx
+ rol ecx,5
+ add ebx,esi
+ xor ebp,edi
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD [20+esp]
+ xor ebp,edx
+ mov esi,ebx
+ rol ebx,5
+ add eax,ebp
+ xor esi,edx
+ ror ecx,7
+ add eax,ebx
+ add edi,DWORD [24+esp]
+ xor esi,ecx
+ mov ebp,eax
+ rol eax,5
+ add edi,esi
+ xor ebp,ecx
+ ror ebx,7
+ add edi,eax
+ add edx,DWORD [28+esp]
+ xor ebp,ebx
+ mov esi,edi
+ rol edi,5
+ add edx,ebp
+ xor esi,ebx
+ ror eax,7
+ add edx,edi
+ add ecx,DWORD [32+esp]
+ xor esi,eax
+ mov ebp,edx
+ rol edx,5
+ add ecx,esi
+ xor ebp,eax
+ ror edi,7
+ add ecx,edx
+ add ebx,DWORD [36+esp]
+ xor ebp,edi
+ mov esi,ecx
+ rol ecx,5
+ add ebx,ebp
+ xor esi,edi
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD [40+esp]
+ xor esi,edx
+ mov ebp,ebx
+ rol ebx,5
+ add eax,esi
+ xor ebp,edx
+ ror ecx,7
+ add eax,ebx
+ add edi,DWORD [44+esp]
+ xor ebp,ecx
+ mov esi,eax
+ rol eax,5
+ add edi,ebp
+ xor esi,ecx
+ ror ebx,7
+ add edi,eax
+ add edx,DWORD [48+esp]
+ xor esi,ebx
+ mov ebp,edi
+ rol edi,5
+ add edx,esi
+ xor ebp,ebx
+ ror eax,7
+ add edx,edi
+ add ecx,DWORD [52+esp]
+ xor ebp,eax
+ mov esi,edx
+ rol edx,5
+ add ecx,ebp
+ xor esi,eax
+ ror edi,7
+ add ecx,edx
+ add ebx,DWORD [56+esp]
+ xor esi,edi
+ mov ebp,ecx
+ rol ecx,5
+ add ebx,esi
+ xor ebp,edi
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD [60+esp]
+ xor ebp,edx
+ mov esi,ebx
+ rol ebx,5
+ add eax,ebp
+ ror ecx,7
+ add eax,ebx
+ mov ebp,DWORD [192+esp]
+ add eax,DWORD [ebp]
+ mov esp,DWORD [204+esp]
+ add esi,DWORD [4+ebp]
+ add ecx,DWORD [8+ebp]
+ mov DWORD [ebp],eax
+ add edx,DWORD [12+ebp]
+ mov DWORD [4+ebp],esi
+ add edi,DWORD [16+ebp]
+ mov DWORD [8+ebp],ecx
+ mov DWORD [12+ebp],edx
+ mov DWORD [16+ebp],edi
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 16
+__sha1_block_data_order_avx:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ call L$008pic_point
+L$008pic_point:
+ pop ebp
+ lea ebp,[(L$K_XX_XX-L$008pic_point)+ebp]
+L$avx_shortcut:
+ vzeroall
+ vmovdqa xmm7,[ebp]
+ vmovdqa xmm0,[16+ebp]
+ vmovdqa xmm1,[32+ebp]
+ vmovdqa xmm2,[48+ebp]
+ vmovdqa xmm6,[64+ebp]
+ mov edi,DWORD [20+esp]
+ mov ebp,DWORD [24+esp]
+ mov edx,DWORD [28+esp]
+ mov esi,esp
+ sub esp,208
+ and esp,-64
+ vmovdqa [112+esp],xmm0
+ vmovdqa [128+esp],xmm1
+ vmovdqa [144+esp],xmm2
+ shl edx,6
+ vmovdqa [160+esp],xmm7
+ add edx,ebp
+ vmovdqa [176+esp],xmm6
+ add ebp,64
+ mov DWORD [192+esp],edi
+ mov DWORD [196+esp],ebp
+ mov DWORD [200+esp],edx
+ mov DWORD [204+esp],esi
+ mov eax,DWORD [edi]
+ mov ebx,DWORD [4+edi]
+ mov ecx,DWORD [8+edi]
+ mov edx,DWORD [12+edi]
+ mov edi,DWORD [16+edi]
+ mov esi,ebx
+ vmovdqu xmm0,[ebp-64]
+ vmovdqu xmm1,[ebp-48]
+ vmovdqu xmm2,[ebp-32]
+ vmovdqu xmm3,[ebp-16]
+ vpshufb xmm0,xmm0,xmm6
+ vpshufb xmm1,xmm1,xmm6
+ vpshufb xmm2,xmm2,xmm6
+ vmovdqa [96+esp],xmm7
+ vpshufb xmm3,xmm3,xmm6
+ vpaddd xmm4,xmm0,xmm7
+ vpaddd xmm5,xmm1,xmm7
+ vpaddd xmm6,xmm2,xmm7
+ vmovdqa [esp],xmm4
+ mov ebp,ecx
+ vmovdqa [16+esp],xmm5
+ xor ebp,edx
+ vmovdqa [32+esp],xmm6
+ and esi,ebp
+ jmp NEAR L$009loop
+align 16
+L$009loop:
+ shrd ebx,ebx,2
+ xor esi,edx
+ vpalignr xmm4,xmm1,xmm0,8
+ mov ebp,eax
+ add edi,DWORD [esp]
+ vpaddd xmm7,xmm7,xmm3
+ vmovdqa [64+esp],xmm0
+ xor ebx,ecx
+ shld eax,eax,5
+ vpsrldq xmm6,xmm3,4
+ add edi,esi
+ and ebp,ebx
+ vpxor xmm4,xmm4,xmm0
+ xor ebx,ecx
+ add edi,eax
+ vpxor xmm6,xmm6,xmm2
+ shrd eax,eax,7
+ xor ebp,ecx
+ vmovdqa [48+esp],xmm7
+ mov esi,edi
+ add edx,DWORD [4+esp]
+ vpxor xmm4,xmm4,xmm6
+ xor eax,ebx
+ shld edi,edi,5
+ add edx,ebp
+ and esi,eax
+ vpsrld xmm6,xmm4,31
+ xor eax,ebx
+ add edx,edi
+ shrd edi,edi,7
+ xor esi,ebx
+ vpslldq xmm0,xmm4,12
+ vpaddd xmm4,xmm4,xmm4
+ mov ebp,edx
+ add ecx,DWORD [8+esp]
+ xor edi,eax
+ shld edx,edx,5
+ vpsrld xmm7,xmm0,30
+ vpor xmm4,xmm4,xmm6
+ add ecx,esi
+ and ebp,edi
+ xor edi,eax
+ add ecx,edx
+ vpslld xmm0,xmm0,2
+ shrd edx,edx,7
+ xor ebp,eax
+ vpxor xmm4,xmm4,xmm7
+ mov esi,ecx
+ add ebx,DWORD [12+esp]
+ xor edx,edi
+ shld ecx,ecx,5
+ vpxor xmm4,xmm4,xmm0
+ add ebx,ebp
+ and esi,edx
+ vmovdqa xmm0,[96+esp]
+ xor edx,edi
+ add ebx,ecx
+ shrd ecx,ecx,7
+ xor esi,edi
+ vpalignr xmm5,xmm2,xmm1,8
+ mov ebp,ebx
+ add eax,DWORD [16+esp]
+ vpaddd xmm0,xmm0,xmm4
+ vmovdqa [80+esp],xmm1
+ xor ecx,edx
+ shld ebx,ebx,5
+ vpsrldq xmm7,xmm4,4
+ add eax,esi
+ and ebp,ecx
+ vpxor xmm5,xmm5,xmm1
+ xor ecx,edx
+ add eax,ebx
+ vpxor xmm7,xmm7,xmm3
+ shrd ebx,ebx,7
+ xor ebp,edx
+ vmovdqa [esp],xmm0
+ mov esi,eax
+ add edi,DWORD [20+esp]
+ vpxor xmm5,xmm5,xmm7
+ xor ebx,ecx
+ shld eax,eax,5
+ add edi,ebp
+ and esi,ebx
+ vpsrld xmm7,xmm5,31
+ xor ebx,ecx
+ add edi,eax
+ shrd eax,eax,7
+ xor esi,ecx
+ vpslldq xmm1,xmm5,12
+ vpaddd xmm5,xmm5,xmm5
+ mov ebp,edi
+ add edx,DWORD [24+esp]
+ xor eax,ebx
+ shld edi,edi,5
+ vpsrld xmm0,xmm1,30
+ vpor xmm5,xmm5,xmm7
+ add edx,esi
+ and ebp,eax
+ xor eax,ebx
+ add edx,edi
+ vpslld xmm1,xmm1,2
+ shrd edi,edi,7
+ xor ebp,ebx
+ vpxor xmm5,xmm5,xmm0
+ mov esi,edx
+ add ecx,DWORD [28+esp]
+ xor edi,eax
+ shld edx,edx,5
+ vpxor xmm5,xmm5,xmm1
+ add ecx,ebp
+ and esi,edi
+ vmovdqa xmm1,[112+esp]
+ xor edi,eax
+ add ecx,edx
+ shrd edx,edx,7
+ xor esi,eax
+ vpalignr xmm6,xmm3,xmm2,8
+ mov ebp,ecx
+ add ebx,DWORD [32+esp]
+ vpaddd xmm1,xmm1,xmm5
+ vmovdqa [96+esp],xmm2
+ xor edx,edi
+ shld ecx,ecx,5
+ vpsrldq xmm0,xmm5,4
+ add ebx,esi
+ and ebp,edx
+ vpxor xmm6,xmm6,xmm2
+ xor edx,edi
+ add ebx,ecx
+ vpxor xmm0,xmm0,xmm4
+ shrd ecx,ecx,7
+ xor ebp,edi
+ vmovdqa [16+esp],xmm1
+ mov esi,ebx
+ add eax,DWORD [36+esp]
+ vpxor xmm6,xmm6,xmm0
+ xor ecx,edx
+ shld ebx,ebx,5
+ add eax,ebp
+ and esi,ecx
+ vpsrld xmm0,xmm6,31
+ xor ecx,edx
+ add eax,ebx
+ shrd ebx,ebx,7
+ xor esi,edx
+ vpslldq xmm2,xmm6,12
+ vpaddd xmm6,xmm6,xmm6
+ mov ebp,eax
+ add edi,DWORD [40+esp]
+ xor ebx,ecx
+ shld eax,eax,5
+ vpsrld xmm1,xmm2,30
+ vpor xmm6,xmm6,xmm0
+ add edi,esi
+ and ebp,ebx
+ xor ebx,ecx
+ add edi,eax
+ vpslld xmm2,xmm2,2
+ vmovdqa xmm0,[64+esp]
+ shrd eax,eax,7
+ xor ebp,ecx
+ vpxor xmm6,xmm6,xmm1
+ mov esi,edi
+ add edx,DWORD [44+esp]
+ xor eax,ebx
+ shld edi,edi,5
+ vpxor xmm6,xmm6,xmm2
+ add edx,ebp
+ and esi,eax
+ vmovdqa xmm2,[112+esp]
+ xor eax,ebx
+ add edx,edi
+ shrd edi,edi,7
+ xor esi,ebx
+ vpalignr xmm7,xmm4,xmm3,8
+ mov ebp,edx
+ add ecx,DWORD [48+esp]
+ vpaddd xmm2,xmm2,xmm6
+ vmovdqa [64+esp],xmm3
+ xor edi,eax
+ shld edx,edx,5
+ vpsrldq xmm1,xmm6,4
+ add ecx,esi
+ and ebp,edi
+ vpxor xmm7,xmm7,xmm3
+ xor edi,eax
+ add ecx,edx
+ vpxor xmm1,xmm1,xmm5
+ shrd edx,edx,7
+ xor ebp,eax
+ vmovdqa [32+esp],xmm2
+ mov esi,ecx
+ add ebx,DWORD [52+esp]
+ vpxor xmm7,xmm7,xmm1
+ xor edx,edi
+ shld ecx,ecx,5
+ add ebx,ebp
+ and esi,edx
+ vpsrld xmm1,xmm7,31
+ xor edx,edi
+ add ebx,ecx
+ shrd ecx,ecx,7
+ xor esi,edi
+ vpslldq xmm3,xmm7,12
+ vpaddd xmm7,xmm7,xmm7
+ mov ebp,ebx
+ add eax,DWORD [56+esp]
+ xor ecx,edx
+ shld ebx,ebx,5
+ vpsrld xmm2,xmm3,30
+ vpor xmm7,xmm7,xmm1
+ add eax,esi
+ and ebp,ecx
+ xor ecx,edx
+ add eax,ebx
+ vpslld xmm3,xmm3,2
+ vmovdqa xmm1,[80+esp]
+ shrd ebx,ebx,7
+ xor ebp,edx
+ vpxor xmm7,xmm7,xmm2
+ mov esi,eax
+ add edi,DWORD [60+esp]
+ xor ebx,ecx
+ shld eax,eax,5
+ vpxor xmm7,xmm7,xmm3
+ add edi,ebp
+ and esi,ebx
+ vmovdqa xmm3,[112+esp]
+ xor ebx,ecx
+ add edi,eax
+ vpalignr xmm2,xmm7,xmm6,8
+ vpxor xmm0,xmm0,xmm4
+ shrd eax,eax,7
+ xor esi,ecx
+ mov ebp,edi
+ add edx,DWORD [esp]
+ vpxor xmm0,xmm0,xmm1
+ vmovdqa [80+esp],xmm4
+ xor eax,ebx
+ shld edi,edi,5
+ vmovdqa xmm4,xmm3
+ vpaddd xmm3,xmm3,xmm7
+ add edx,esi
+ and ebp,eax
+ vpxor xmm0,xmm0,xmm2
+ xor eax,ebx
+ add edx,edi
+ shrd edi,edi,7
+ xor ebp,ebx
+ vpsrld xmm2,xmm0,30
+ vmovdqa [48+esp],xmm3
+ mov esi,edx
+ add ecx,DWORD [4+esp]
+ xor edi,eax
+ shld edx,edx,5
+ vpslld xmm0,xmm0,2
+ add ecx,ebp
+ and esi,edi
+ xor edi,eax
+ add ecx,edx
+ shrd edx,edx,7
+ xor esi,eax
+ mov ebp,ecx
+ add ebx,DWORD [8+esp]
+ vpor xmm0,xmm0,xmm2
+ xor edx,edi
+ shld ecx,ecx,5
+ vmovdqa xmm2,[96+esp]
+ add ebx,esi
+ and ebp,edx
+ xor edx,edi
+ add ebx,ecx
+ add eax,DWORD [12+esp]
+ xor ebp,edi
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,ebp
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpalignr xmm3,xmm0,xmm7,8
+ vpxor xmm1,xmm1,xmm5
+ add edi,DWORD [16+esp]
+ xor esi,ecx
+ mov ebp,eax
+ shld eax,eax,5
+ vpxor xmm1,xmm1,xmm2
+ vmovdqa [96+esp],xmm5
+ add edi,esi
+ xor ebp,ecx
+ vmovdqa xmm5,xmm4
+ vpaddd xmm4,xmm4,xmm0
+ shrd ebx,ebx,7
+ add edi,eax
+ vpxor xmm1,xmm1,xmm3
+ add edx,DWORD [20+esp]
+ xor ebp,ebx
+ mov esi,edi
+ shld edi,edi,5
+ vpsrld xmm3,xmm1,30
+ vmovdqa [esp],xmm4
+ add edx,ebp
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,edi
+ vpslld xmm1,xmm1,2
+ add ecx,DWORD [24+esp]
+ xor esi,eax
+ mov ebp,edx
+ shld edx,edx,5
+ add ecx,esi
+ xor ebp,eax
+ shrd edi,edi,7
+ add ecx,edx
+ vpor xmm1,xmm1,xmm3
+ add ebx,DWORD [28+esp]
+ xor ebp,edi
+ vmovdqa xmm3,[64+esp]
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,ebp
+ xor esi,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ vpalignr xmm4,xmm1,xmm0,8
+ vpxor xmm2,xmm2,xmm6
+ add eax,DWORD [32+esp]
+ xor esi,edx
+ mov ebp,ebx
+ shld ebx,ebx,5
+ vpxor xmm2,xmm2,xmm3
+ vmovdqa [64+esp],xmm6
+ add eax,esi
+ xor ebp,edx
+ vmovdqa xmm6,[128+esp]
+ vpaddd xmm5,xmm5,xmm1
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpxor xmm2,xmm2,xmm4
+ add edi,DWORD [36+esp]
+ xor ebp,ecx
+ mov esi,eax
+ shld eax,eax,5
+ vpsrld xmm4,xmm2,30
+ vmovdqa [16+esp],xmm5
+ add edi,ebp
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add edi,eax
+ vpslld xmm2,xmm2,2
+ add edx,DWORD [40+esp]
+ xor esi,ebx
+ mov ebp,edi
+ shld edi,edi,5
+ add edx,esi
+ xor ebp,ebx
+ shrd eax,eax,7
+ add edx,edi
+ vpor xmm2,xmm2,xmm4
+ add ecx,DWORD [44+esp]
+ xor ebp,eax
+ vmovdqa xmm4,[80+esp]
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,ebp
+ xor esi,eax
+ shrd edi,edi,7
+ add ecx,edx
+ vpalignr xmm5,xmm2,xmm1,8
+ vpxor xmm3,xmm3,xmm7
+ add ebx,DWORD [48+esp]
+ xor esi,edi
+ mov ebp,ecx
+ shld ecx,ecx,5
+ vpxor xmm3,xmm3,xmm4
+ vmovdqa [80+esp],xmm7
+ add ebx,esi
+ xor ebp,edi
+ vmovdqa xmm7,xmm6
+ vpaddd xmm6,xmm6,xmm2
+ shrd edx,edx,7
+ add ebx,ecx
+ vpxor xmm3,xmm3,xmm5
+ add eax,DWORD [52+esp]
+ xor ebp,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ vpsrld xmm5,xmm3,30
+ vmovdqa [32+esp],xmm6
+ add eax,ebp
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpslld xmm3,xmm3,2
+ add edi,DWORD [56+esp]
+ xor esi,ecx
+ mov ebp,eax
+ shld eax,eax,5
+ add edi,esi
+ xor ebp,ecx
+ shrd ebx,ebx,7
+ add edi,eax
+ vpor xmm3,xmm3,xmm5
+ add edx,DWORD [60+esp]
+ xor ebp,ebx
+ vmovdqa xmm5,[96+esp]
+ mov esi,edi
+ shld edi,edi,5
+ add edx,ebp
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,edi
+ vpalignr xmm6,xmm3,xmm2,8
+ vpxor xmm4,xmm4,xmm0
+ add ecx,DWORD [esp]
+ xor esi,eax
+ mov ebp,edx
+ shld edx,edx,5
+ vpxor xmm4,xmm4,xmm5
+ vmovdqa [96+esp],xmm0
+ add ecx,esi
+ xor ebp,eax
+ vmovdqa xmm0,xmm7
+ vpaddd xmm7,xmm7,xmm3
+ shrd edi,edi,7
+ add ecx,edx
+ vpxor xmm4,xmm4,xmm6
+ add ebx,DWORD [4+esp]
+ xor ebp,edi
+ mov esi,ecx
+ shld ecx,ecx,5
+ vpsrld xmm6,xmm4,30
+ vmovdqa [48+esp],xmm7
+ add ebx,ebp
+ xor esi,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ vpslld xmm4,xmm4,2
+ add eax,DWORD [8+esp]
+ xor esi,edx
+ mov ebp,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ xor ebp,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpor xmm4,xmm4,xmm6
+ add edi,DWORD [12+esp]
+ xor ebp,ecx
+ vmovdqa xmm6,[64+esp]
+ mov esi,eax
+ shld eax,eax,5
+ add edi,ebp
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add edi,eax
+ vpalignr xmm7,xmm4,xmm3,8
+ vpxor xmm5,xmm5,xmm1
+ add edx,DWORD [16+esp]
+ xor esi,ebx
+ mov ebp,edi
+ shld edi,edi,5
+ vpxor xmm5,xmm5,xmm6
+ vmovdqa [64+esp],xmm1
+ add edx,esi
+ xor ebp,ebx
+ vmovdqa xmm1,xmm0
+ vpaddd xmm0,xmm0,xmm4
+ shrd eax,eax,7
+ add edx,edi
+ vpxor xmm5,xmm5,xmm7
+ add ecx,DWORD [20+esp]
+ xor ebp,eax
+ mov esi,edx
+ shld edx,edx,5
+ vpsrld xmm7,xmm5,30
+ vmovdqa [esp],xmm0
+ add ecx,ebp
+ xor esi,eax
+ shrd edi,edi,7
+ add ecx,edx
+ vpslld xmm5,xmm5,2
+ add ebx,DWORD [24+esp]
+ xor esi,edi
+ mov ebp,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor ebp,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ vpor xmm5,xmm5,xmm7
+ add eax,DWORD [28+esp]
+ vmovdqa xmm7,[80+esp]
+ shrd ecx,ecx,7
+ mov esi,ebx
+ xor ebp,edx
+ shld ebx,ebx,5
+ add eax,ebp
+ xor esi,ecx
+ xor ecx,edx
+ add eax,ebx
+ vpalignr xmm0,xmm5,xmm4,8
+ vpxor xmm6,xmm6,xmm2
+ add edi,DWORD [32+esp]
+ and esi,ecx
+ xor ecx,edx
+ shrd ebx,ebx,7
+ vpxor xmm6,xmm6,xmm7
+ vmovdqa [80+esp],xmm2
+ mov ebp,eax
+ xor esi,ecx
+ vmovdqa xmm2,xmm1
+ vpaddd xmm1,xmm1,xmm5
+ shld eax,eax,5
+ add edi,esi
+ vpxor xmm6,xmm6,xmm0
+ xor ebp,ebx
+ xor ebx,ecx
+ add edi,eax
+ add edx,DWORD [36+esp]
+ vpsrld xmm0,xmm6,30
+ vmovdqa [16+esp],xmm1
+ and ebp,ebx
+ xor ebx,ecx
+ shrd eax,eax,7
+ mov esi,edi
+ vpslld xmm6,xmm6,2
+ xor ebp,ebx
+ shld edi,edi,5
+ add edx,ebp
+ xor esi,eax
+ xor eax,ebx
+ add edx,edi
+ add ecx,DWORD [40+esp]
+ and esi,eax
+ vpor xmm6,xmm6,xmm0
+ xor eax,ebx
+ shrd edi,edi,7
+ vmovdqa xmm0,[96+esp]
+ mov ebp,edx
+ xor esi,eax
+ shld edx,edx,5
+ add ecx,esi
+ xor ebp,edi
+ xor edi,eax
+ add ecx,edx
+ add ebx,DWORD [44+esp]
+ and ebp,edi
+ xor edi,eax
+ shrd edx,edx,7
+ mov esi,ecx
+ xor ebp,edi
+ shld ecx,ecx,5
+ add ebx,ebp
+ xor esi,edx
+ xor edx,edi
+ add ebx,ecx
+ vpalignr xmm1,xmm6,xmm5,8
+ vpxor xmm7,xmm7,xmm3
+ add eax,DWORD [48+esp]
+ and esi,edx
+ xor edx,edi
+ shrd ecx,ecx,7
+ vpxor xmm7,xmm7,xmm0
+ vmovdqa [96+esp],xmm3
+ mov ebp,ebx
+ xor esi,edx
+ vmovdqa xmm3,[144+esp]
+ vpaddd xmm2,xmm2,xmm6
+ shld ebx,ebx,5
+ add eax,esi
+ vpxor xmm7,xmm7,xmm1
+ xor ebp,ecx
+ xor ecx,edx
+ add eax,ebx
+ add edi,DWORD [52+esp]
+ vpsrld xmm1,xmm7,30
+ vmovdqa [32+esp],xmm2
+ and ebp,ecx
+ xor ecx,edx
+ shrd ebx,ebx,7
+ mov esi,eax
+ vpslld xmm7,xmm7,2
+ xor ebp,ecx
+ shld eax,eax,5
+ add edi,ebp
+ xor esi,ebx
+ xor ebx,ecx
+ add edi,eax
+ add edx,DWORD [56+esp]
+ and esi,ebx
+ vpor xmm7,xmm7,xmm1
+ xor ebx,ecx
+ shrd eax,eax,7
+ vmovdqa xmm1,[64+esp]
+ mov ebp,edi
+ xor esi,ebx
+ shld edi,edi,5
+ add edx,esi
+ xor ebp,eax
+ xor eax,ebx
+ add edx,edi
+ add ecx,DWORD [60+esp]
+ and ebp,eax
+ xor eax,ebx
+ shrd edi,edi,7
+ mov esi,edx
+ xor ebp,eax
+ shld edx,edx,5
+ add ecx,ebp
+ xor esi,edi
+ xor edi,eax
+ add ecx,edx
+ vpalignr xmm2,xmm7,xmm6,8
+ vpxor xmm0,xmm0,xmm4
+ add ebx,DWORD [esp]
+ and esi,edi
+ xor edi,eax
+ shrd edx,edx,7
+ vpxor xmm0,xmm0,xmm1
+ vmovdqa [64+esp],xmm4
+ mov ebp,ecx
+ xor esi,edi
+ vmovdqa xmm4,xmm3
+ vpaddd xmm3,xmm3,xmm7
+ shld ecx,ecx,5
+ add ebx,esi
+ vpxor xmm0,xmm0,xmm2
+ xor ebp,edx
+ xor edx,edi
+ add ebx,ecx
+ add eax,DWORD [4+esp]
+ vpsrld xmm2,xmm0,30
+ vmovdqa [48+esp],xmm3
+ and ebp,edx
+ xor edx,edi
+ shrd ecx,ecx,7
+ mov esi,ebx
+ vpslld xmm0,xmm0,2
+ xor ebp,edx
+ shld ebx,ebx,5
+ add eax,ebp
+ xor esi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add edi,DWORD [8+esp]
+ and esi,ecx
+ vpor xmm0,xmm0,xmm2
+ xor ecx,edx
+ shrd ebx,ebx,7
+ vmovdqa xmm2,[80+esp]
+ mov ebp,eax
+ xor esi,ecx
+ shld eax,eax,5
+ add edi,esi
+ xor ebp,ebx
+ xor ebx,ecx
+ add edi,eax
+ add edx,DWORD [12+esp]
+ and ebp,ebx
+ xor ebx,ecx
+ shrd eax,eax,7
+ mov esi,edi
+ xor ebp,ebx
+ shld edi,edi,5
+ add edx,ebp
+ xor esi,eax
+ xor eax,ebx
+ add edx,edi
+ vpalignr xmm3,xmm0,xmm7,8
+ vpxor xmm1,xmm1,xmm5
+ add ecx,DWORD [16+esp]
+ and esi,eax
+ xor eax,ebx
+ shrd edi,edi,7
+ vpxor xmm1,xmm1,xmm2
+ vmovdqa [80+esp],xmm5
+ mov ebp,edx
+ xor esi,eax
+ vmovdqa xmm5,xmm4
+ vpaddd xmm4,xmm4,xmm0
+ shld edx,edx,5
+ add ecx,esi
+ vpxor xmm1,xmm1,xmm3
+ xor ebp,edi
+ xor edi,eax
+ add ecx,edx
+ add ebx,DWORD [20+esp]
+ vpsrld xmm3,xmm1,30
+ vmovdqa [esp],xmm4
+ and ebp,edi
+ xor edi,eax
+ shrd edx,edx,7
+ mov esi,ecx
+ vpslld xmm1,xmm1,2
+ xor ebp,edi
+ shld ecx,ecx,5
+ add ebx,ebp
+ xor esi,edx
+ xor edx,edi
+ add ebx,ecx
+ add eax,DWORD [24+esp]
+ and esi,edx
+ vpor xmm1,xmm1,xmm3
+ xor edx,edi
+ shrd ecx,ecx,7
+ vmovdqa xmm3,[96+esp]
+ mov ebp,ebx
+ xor esi,edx
+ shld ebx,ebx,5
+ add eax,esi
+ xor ebp,ecx
+ xor ecx,edx
+ add eax,ebx
+ add edi,DWORD [28+esp]
+ and ebp,ecx
+ xor ecx,edx
+ shrd ebx,ebx,7
+ mov esi,eax
+ xor ebp,ecx
+ shld eax,eax,5
+ add edi,ebp
+ xor esi,ebx
+ xor ebx,ecx
+ add edi,eax
+ vpalignr xmm4,xmm1,xmm0,8
+ vpxor xmm2,xmm2,xmm6
+ add edx,DWORD [32+esp]
+ and esi,ebx
+ xor ebx,ecx
+ shrd eax,eax,7
+ vpxor xmm2,xmm2,xmm3
+ vmovdqa [96+esp],xmm6
+ mov ebp,edi
+ xor esi,ebx
+ vmovdqa xmm6,xmm5
+ vpaddd xmm5,xmm5,xmm1
+ shld edi,edi,5
+ add edx,esi
+ vpxor xmm2,xmm2,xmm4
+ xor ebp,eax
+ xor eax,ebx
+ add edx,edi
+ add ecx,DWORD [36+esp]
+ vpsrld xmm4,xmm2,30
+ vmovdqa [16+esp],xmm5
+ and ebp,eax
+ xor eax,ebx
+ shrd edi,edi,7
+ mov esi,edx
+ vpslld xmm2,xmm2,2
+ xor ebp,eax
+ shld edx,edx,5
+ add ecx,ebp
+ xor esi,edi
+ xor edi,eax
+ add ecx,edx
+ add ebx,DWORD [40+esp]
+ and esi,edi
+ vpor xmm2,xmm2,xmm4
+ xor edi,eax
+ shrd edx,edx,7
+ vmovdqa xmm4,[64+esp]
+ mov ebp,ecx
+ xor esi,edi
+ shld ecx,ecx,5
+ add ebx,esi
+ xor ebp,edx
+ xor edx,edi
+ add ebx,ecx
+ add eax,DWORD [44+esp]
+ and ebp,edx
+ xor edx,edi
+ shrd ecx,ecx,7
+ mov esi,ebx
+ xor ebp,edx
+ shld ebx,ebx,5
+ add eax,ebp
+ xor esi,edx
+ add eax,ebx
+ vpalignr xmm5,xmm2,xmm1,8
+ vpxor xmm3,xmm3,xmm7
+ add edi,DWORD [48+esp]
+ xor esi,ecx
+ mov ebp,eax
+ shld eax,eax,5
+ vpxor xmm3,xmm3,xmm4
+ vmovdqa [64+esp],xmm7
+ add edi,esi
+ xor ebp,ecx
+ vmovdqa xmm7,xmm6
+ vpaddd xmm6,xmm6,xmm2
+ shrd ebx,ebx,7
+ add edi,eax
+ vpxor xmm3,xmm3,xmm5
+ add edx,DWORD [52+esp]
+ xor ebp,ebx
+ mov esi,edi
+ shld edi,edi,5
+ vpsrld xmm5,xmm3,30
+ vmovdqa [32+esp],xmm6
+ add edx,ebp
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,edi
+ vpslld xmm3,xmm3,2
+ add ecx,DWORD [56+esp]
+ xor esi,eax
+ mov ebp,edx
+ shld edx,edx,5
+ add ecx,esi
+ xor ebp,eax
+ shrd edi,edi,7
+ add ecx,edx
+ vpor xmm3,xmm3,xmm5
+ add ebx,DWORD [60+esp]
+ xor ebp,edi
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,ebp
+ xor esi,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD [esp]
+ vpaddd xmm7,xmm7,xmm3
+ xor esi,edx
+ mov ebp,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ vmovdqa [48+esp],xmm7
+ xor ebp,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add edi,DWORD [4+esp]
+ xor ebp,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add edi,ebp
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add edi,eax
+ add edx,DWORD [8+esp]
+ xor esi,ebx
+ mov ebp,edi
+ shld edi,edi,5
+ add edx,esi
+ xor ebp,ebx
+ shrd eax,eax,7
+ add edx,edi
+ add ecx,DWORD [12+esp]
+ xor ebp,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,ebp
+ xor esi,eax
+ shrd edi,edi,7
+ add ecx,edx
+ mov ebp,DWORD [196+esp]
+ cmp ebp,DWORD [200+esp]
+ je NEAR L$010done
+ vmovdqa xmm7,[160+esp]
+ vmovdqa xmm6,[176+esp]
+ vmovdqu xmm0,[ebp]
+ vmovdqu xmm1,[16+ebp]
+ vmovdqu xmm2,[32+ebp]
+ vmovdqu xmm3,[48+ebp]
+ add ebp,64
+ vpshufb xmm0,xmm0,xmm6
+ mov DWORD [196+esp],ebp
+ vmovdqa [96+esp],xmm7
+ add ebx,DWORD [16+esp]
+ xor esi,edi
+ vpshufb xmm1,xmm1,xmm6
+ mov ebp,ecx
+ shld ecx,ecx,5
+ vpaddd xmm4,xmm0,xmm7
+ add ebx,esi
+ xor ebp,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ vmovdqa [esp],xmm4
+ add eax,DWORD [20+esp]
+ xor ebp,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,ebp
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add edi,DWORD [24+esp]
+ xor esi,ecx
+ mov ebp,eax
+ shld eax,eax,5
+ add edi,esi
+ xor ebp,ecx
+ shrd ebx,ebx,7
+ add edi,eax
+ add edx,DWORD [28+esp]
+ xor ebp,ebx
+ mov esi,edi
+ shld edi,edi,5
+ add edx,ebp
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,edi
+ add ecx,DWORD [32+esp]
+ xor esi,eax
+ vpshufb xmm2,xmm2,xmm6
+ mov ebp,edx
+ shld edx,edx,5
+ vpaddd xmm5,xmm1,xmm7
+ add ecx,esi
+ xor ebp,eax
+ shrd edi,edi,7
+ add ecx,edx
+ vmovdqa [16+esp],xmm5
+ add ebx,DWORD [36+esp]
+ xor ebp,edi
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,ebp
+ xor esi,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD [40+esp]
+ xor esi,edx
+ mov ebp,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ xor ebp,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add edi,DWORD [44+esp]
+ xor ebp,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add edi,ebp
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add edi,eax
+ add edx,DWORD [48+esp]
+ xor esi,ebx
+ vpshufb xmm3,xmm3,xmm6
+ mov ebp,edi
+ shld edi,edi,5
+ vpaddd xmm6,xmm2,xmm7
+ add edx,esi
+ xor ebp,ebx
+ shrd eax,eax,7
+ add edx,edi
+ vmovdqa [32+esp],xmm6
+ add ecx,DWORD [52+esp]
+ xor ebp,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,ebp
+ xor esi,eax
+ shrd edi,edi,7
+ add ecx,edx
+ add ebx,DWORD [56+esp]
+ xor esi,edi
+ mov ebp,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor ebp,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD [60+esp]
+ xor ebp,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,ebp
+ shrd ecx,ecx,7
+ add eax,ebx
+ mov ebp,DWORD [192+esp]
+ add eax,DWORD [ebp]
+ add esi,DWORD [4+ebp]
+ add ecx,DWORD [8+ebp]
+ mov DWORD [ebp],eax
+ add edx,DWORD [12+ebp]
+ mov DWORD [4+ebp],esi
+ add edi,DWORD [16+ebp]
+ mov ebx,ecx
+ mov DWORD [8+ebp],ecx
+ xor ebx,edx
+ mov DWORD [12+ebp],edx
+ mov DWORD [16+ebp],edi
+ mov ebp,esi
+ and esi,ebx
+ mov ebx,ebp
+ jmp NEAR L$009loop
+align 16
+L$010done:
+ add ebx,DWORD [16+esp]
+ xor esi,edi
+ mov ebp,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor ebp,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD [20+esp]
+ xor ebp,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,ebp
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add edi,DWORD [24+esp]
+ xor esi,ecx
+ mov ebp,eax
+ shld eax,eax,5
+ add edi,esi
+ xor ebp,ecx
+ shrd ebx,ebx,7
+ add edi,eax
+ add edx,DWORD [28+esp]
+ xor ebp,ebx
+ mov esi,edi
+ shld edi,edi,5
+ add edx,ebp
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,edi
+ add ecx,DWORD [32+esp]
+ xor esi,eax
+ mov ebp,edx
+ shld edx,edx,5
+ add ecx,esi
+ xor ebp,eax
+ shrd edi,edi,7
+ add ecx,edx
+ add ebx,DWORD [36+esp]
+ xor ebp,edi
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,ebp
+ xor esi,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD [40+esp]
+ xor esi,edx
+ mov ebp,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ xor ebp,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add edi,DWORD [44+esp]
+ xor ebp,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add edi,ebp
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add edi,eax
+ add edx,DWORD [48+esp]
+ xor esi,ebx
+ mov ebp,edi
+ shld edi,edi,5
+ add edx,esi
+ xor ebp,ebx
+ shrd eax,eax,7
+ add edx,edi
+ add ecx,DWORD [52+esp]
+ xor ebp,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,ebp
+ xor esi,eax
+ shrd edi,edi,7
+ add ecx,edx
+ add ebx,DWORD [56+esp]
+ xor esi,edi
+ mov ebp,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor ebp,edi
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD [60+esp]
+ xor ebp,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,ebp
+ shrd ecx,ecx,7
+ add eax,ebx
+ vzeroall
+ mov ebp,DWORD [192+esp]
+ add eax,DWORD [ebp]
+ mov esp,DWORD [204+esp]
+ add esi,DWORD [4+ebp]
+ add ecx,DWORD [8+ebp]
+ mov DWORD [ebp],eax
+ add edx,DWORD [12+ebp]
+ mov DWORD [4+ebp],esi
+ add edi,DWORD [16+ebp]
+ mov DWORD [8+ebp],ecx
+ mov DWORD [12+ebp],edx
+ mov DWORD [16+ebp],edi
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 64
+L$K_XX_XX:
+dd 1518500249,1518500249,1518500249,1518500249
+dd 1859775393,1859775393,1859775393,1859775393
+dd 2400959708,2400959708,2400959708,2400959708
+dd 3395469782,3395469782,3395469782,3395469782
+dd 66051,67438087,134810123,202182159
+db 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+db 83,72,65,49,32,98,108,111,99,107,32,116,114,97,110,115
+db 102,111,114,109,32,102,111,114,32,120,56,54,44,32,67,82
+db 89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112
+db 114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
+segment .bss
+common _OPENSSL_ia32cap_P 16
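The sha256-586.nasm file added below computes the SHA-256 round functions with long sequences of `ror`/`xor`/`and`/`shr` instructions (for example, the repeated `ror edx,14` / `ror edx,5` / `ror edx,6` chains). For readers following the generated assembly, here is a minimal pure-Python sketch of the FIPS 180-4 primitives those sequences implement; the function names follow the standard, not the assembly, and this is illustrative only, not part of the patch:

```python
# Pure-Python sketch of the SHA-256 primitives that the generated
# NASM computes with ror/xor/and/shr instruction sequences.
# Names follow FIPS 180-4; this is not part of the OpenSSL build.

MASK32 = 0xFFFFFFFF

def ror(x, n):
    """32-bit rotate right, matching the asm's `ror reg,n`."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def big_sigma0(a):
    # Sigma0(a) = ROR2 ^ ROR13 ^ ROR22 (the ror 9/11/2 chain on `a`)
    return ror(a, 2) ^ ror(a, 13) ^ ror(a, 22)

def big_sigma1(e):
    # Sigma1(e) = ROR6 ^ ROR11 ^ ROR25 (the ror 14/5/6 chain on `e`)
    return ror(e, 6) ^ ror(e, 11) ^ ror(e, 25)

def small_sigma0(x):
    # sigma0(x) = ROR7 ^ ROR18 ^ SHR3, used in the message schedule
    return ror(x, 7) ^ ror(x, 18) ^ (x >> 3)

def small_sigma1(x):
    # sigma1(x) = ROR17 ^ ROR19 ^ SHR10
    return ror(x, 17) ^ ror(x, 19) ^ (x >> 10)

def ch(e, f, g):
    # Ch(e,f,g); the asm computes it as g ^ (e & (f ^ g)) to save a NOT
    return g ^ (e & (f ^ g))

def maj(a, b, c):
    # Maj(a,b,c) = (a&b) ^ (a&c) ^ (b&c)
    return (a & b) ^ (a & c) ^ (b & c)
```

The `L$001K256` constant table in the file is the standard array of round constants K[0..63] (note the sentinel comparisons against 3248222580 and 3329325298, i.e. K[32] and K[63], which terminate the 00-15 and 16-63 round loops).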
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm
new file mode 100644
index 0000000000..0540b0eac7
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm
@@ -0,0 +1,6796 @@
+; Copyright 2007-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+;extern _OPENSSL_ia32cap_P
+global _sha256_block_data_order
+align 16
+_sha256_block_data_order:
+L$_sha256_block_data_order_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov ebx,esp
+ call L$000pic_point
+L$000pic_point:
+ pop ebp
+ lea ebp,[(L$001K256-L$000pic_point)+ebp]
+ sub esp,16
+ and esp,-64
+ shl eax,6
+ add eax,edi
+ mov DWORD [esp],esi
+ mov DWORD [4+esp],edi
+ mov DWORD [8+esp],eax
+ mov DWORD [12+esp],ebx
+ lea edx,[_OPENSSL_ia32cap_P]
+ mov ecx,DWORD [edx]
+ mov ebx,DWORD [4+edx]
+ test ecx,1048576
+ jnz NEAR L$002loop
+ mov edx,DWORD [8+edx]
+ test ecx,16777216
+ jz NEAR L$003no_xmm
+ and ecx,1073741824
+ and ebx,268435968
+ test edx,536870912
+ jnz NEAR L$004shaext
+ or ecx,ebx
+ and ecx,1342177280
+ cmp ecx,1342177280
+ je NEAR L$005AVX
+ test ebx,512
+ jnz NEAR L$006SSSE3
+L$003no_xmm:
+ sub eax,edi
+ cmp eax,256
+ jae NEAR L$007unrolled
+ jmp NEAR L$002loop
+align 16
+L$002loop:
+ mov eax,DWORD [edi]
+ mov ebx,DWORD [4+edi]
+ mov ecx,DWORD [8+edi]
+ bswap eax
+ mov edx,DWORD [12+edi]
+ bswap ebx
+ push eax
+ bswap ecx
+ push ebx
+ bswap edx
+ push ecx
+ push edx
+ mov eax,DWORD [16+edi]
+ mov ebx,DWORD [20+edi]
+ mov ecx,DWORD [24+edi]
+ bswap eax
+ mov edx,DWORD [28+edi]
+ bswap ebx
+ push eax
+ bswap ecx
+ push ebx
+ bswap edx
+ push ecx
+ push edx
+ mov eax,DWORD [32+edi]
+ mov ebx,DWORD [36+edi]
+ mov ecx,DWORD [40+edi]
+ bswap eax
+ mov edx,DWORD [44+edi]
+ bswap ebx
+ push eax
+ bswap ecx
+ push ebx
+ bswap edx
+ push ecx
+ push edx
+ mov eax,DWORD [48+edi]
+ mov ebx,DWORD [52+edi]
+ mov ecx,DWORD [56+edi]
+ bswap eax
+ mov edx,DWORD [60+edi]
+ bswap ebx
+ push eax
+ bswap ecx
+ push ebx
+ bswap edx
+ push ecx
+ push edx
+ add edi,64
+ lea esp,[esp-36]
+ mov DWORD [104+esp],edi
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov ecx,DWORD [8+esi]
+ mov edi,DWORD [12+esi]
+ mov DWORD [8+esp],ebx
+ xor ebx,ecx
+ mov DWORD [12+esp],ecx
+ mov DWORD [16+esp],edi
+ mov DWORD [esp],ebx
+ mov edx,DWORD [16+esi]
+ mov ebx,DWORD [20+esi]
+ mov ecx,DWORD [24+esi]
+ mov edi,DWORD [28+esi]
+ mov DWORD [24+esp],ebx
+ mov DWORD [28+esp],ecx
+ mov DWORD [32+esp],edi
+align 16
+L$00800_15:
+ mov ecx,edx
+ mov esi,DWORD [24+esp]
+ ror ecx,14
+ mov edi,DWORD [28+esp]
+ xor ecx,edx
+ xor esi,edi
+ mov ebx,DWORD [96+esp]
+ ror ecx,5
+ and esi,edx
+ mov DWORD [20+esp],edx
+ xor edx,ecx
+ add ebx,DWORD [32+esp]
+ xor esi,edi
+ ror edx,6
+ mov ecx,eax
+ add ebx,esi
+ ror ecx,9
+ add ebx,edx
+ mov edi,DWORD [8+esp]
+ xor ecx,eax
+ mov DWORD [4+esp],eax
+ lea esp,[esp-4]
+ ror ecx,11
+ mov esi,DWORD [ebp]
+ xor ecx,eax
+ mov edx,DWORD [20+esp]
+ xor eax,edi
+ ror ecx,2
+ add ebx,esi
+ mov DWORD [esp],eax
+ add edx,ebx
+ and eax,DWORD [4+esp]
+ add ebx,ecx
+ xor eax,edi
+ add ebp,4
+ add eax,ebx
+ cmp esi,3248222580
+ jne NEAR L$00800_15
+ mov ecx,DWORD [156+esp]
+ jmp NEAR L$00916_63
+align 16
+L$00916_63:
+ mov ebx,ecx
+ mov esi,DWORD [104+esp]
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [160+esp]
+ shr edi,10
+ add ebx,DWORD [124+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [24+esp]
+ ror ecx,14
+ add ebx,edi
+ mov edi,DWORD [28+esp]
+ xor ecx,edx
+ xor esi,edi
+ mov DWORD [96+esp],ebx
+ ror ecx,5
+ and esi,edx
+ mov DWORD [20+esp],edx
+ xor edx,ecx
+ add ebx,DWORD [32+esp]
+ xor esi,edi
+ ror edx,6
+ mov ecx,eax
+ add ebx,esi
+ ror ecx,9
+ add ebx,edx
+ mov edi,DWORD [8+esp]
+ xor ecx,eax
+ mov DWORD [4+esp],eax
+ lea esp,[esp-4]
+ ror ecx,11
+ mov esi,DWORD [ebp]
+ xor ecx,eax
+ mov edx,DWORD [20+esp]
+ xor eax,edi
+ ror ecx,2
+ add ebx,esi
+ mov DWORD [esp],eax
+ add edx,ebx
+ and eax,DWORD [4+esp]
+ add ebx,ecx
+ xor eax,edi
+ mov ecx,DWORD [156+esp]
+ add ebp,4
+ add eax,ebx
+ cmp esi,3329325298
+ jne NEAR L$00916_63
+ mov esi,DWORD [356+esp]
+ mov ebx,DWORD [8+esp]
+ mov ecx,DWORD [16+esp]
+ add eax,DWORD [esi]
+ add ebx,DWORD [4+esi]
+ add edi,DWORD [8+esi]
+ add ecx,DWORD [12+esi]
+ mov DWORD [esi],eax
+ mov DWORD [4+esi],ebx
+ mov DWORD [8+esi],edi
+ mov DWORD [12+esi],ecx
+ mov eax,DWORD [24+esp]
+ mov ebx,DWORD [28+esp]
+ mov ecx,DWORD [32+esp]
+ mov edi,DWORD [360+esp]
+ add edx,DWORD [16+esi]
+ add eax,DWORD [20+esi]
+ add ebx,DWORD [24+esi]
+ add ecx,DWORD [28+esi]
+ mov DWORD [16+esi],edx
+ mov DWORD [20+esi],eax
+ mov DWORD [24+esi],ebx
+ mov DWORD [28+esi],ecx
+ lea esp,[356+esp]
+ sub ebp,256
+ cmp edi,DWORD [8+esp]
+ jb NEAR L$002loop
+ mov esp,DWORD [12+esp]
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 64
+L$001K256:
+dd 1116352408,1899447441,3049323471,3921009573,961987163,1508970993,2453635748,2870763221,3624381080,310598401,607225278,1426881987,1925078388,2162078206,2614888103,3248222580,3835390401,4022224774,264347078,604807628,770255983,1249150122,1555081692,1996064986,2554220882,2821834349,2952996808,3210313671,3336571891,3584528711,113926993,338241895,666307205,773529912,1294757372,1396182291,1695183700,1986661051,2177026350,2456956037,2730485921,2820302411,3259730800,3345764771,3516065817,3600352804,4094571909,275423344,430227734,506948616,659060556,883997877,958139571,1322822218,1537002063,1747873779,1955562222,2024104815,2227730452,2361852424,2428436474,2756734187,3204031479,3329325298
+dd 66051,67438087,134810123,202182159
+db 83,72,65,50,53,54,32,98,108,111,99,107,32,116,114,97
+db 110,115,102,111,114,109,32,102,111,114,32,120,56,54,44,32
+db 67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97
+db 112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103
+db 62,0
+align 16
+L$007unrolled:
+ lea esp,[esp-96]
+ mov eax,DWORD [esi]
+ mov ebp,DWORD [4+esi]
+ mov ecx,DWORD [8+esi]
+ mov ebx,DWORD [12+esi]
+ mov DWORD [4+esp],ebp
+ xor ebp,ecx
+ mov DWORD [8+esp],ecx
+ mov DWORD [12+esp],ebx
+ mov edx,DWORD [16+esi]
+ mov ebx,DWORD [20+esi]
+ mov ecx,DWORD [24+esi]
+ mov esi,DWORD [28+esi]
+ mov DWORD [20+esp],ebx
+ mov DWORD [24+esp],ecx
+ mov DWORD [28+esp],esi
+ jmp NEAR L$010grand_loop
+align 16
+L$010grand_loop:
+ mov ebx,DWORD [edi]
+ mov ecx,DWORD [4+edi]
+ bswap ebx
+ mov esi,DWORD [8+edi]
+ bswap ecx
+ mov DWORD [32+esp],ebx
+ bswap esi
+ mov DWORD [36+esp],ecx
+ mov DWORD [40+esp],esi
+ mov ebx,DWORD [12+edi]
+ mov ecx,DWORD [16+edi]
+ bswap ebx
+ mov esi,DWORD [20+edi]
+ bswap ecx
+ mov DWORD [44+esp],ebx
+ bswap esi
+ mov DWORD [48+esp],ecx
+ mov DWORD [52+esp],esi
+ mov ebx,DWORD [24+edi]
+ mov ecx,DWORD [28+edi]
+ bswap ebx
+ mov esi,DWORD [32+edi]
+ bswap ecx
+ mov DWORD [56+esp],ebx
+ bswap esi
+ mov DWORD [60+esp],ecx
+ mov DWORD [64+esp],esi
+ mov ebx,DWORD [36+edi]
+ mov ecx,DWORD [40+edi]
+ bswap ebx
+ mov esi,DWORD [44+edi]
+ bswap ecx
+ mov DWORD [68+esp],ebx
+ bswap esi
+ mov DWORD [72+esp],ecx
+ mov DWORD [76+esp],esi
+ mov ebx,DWORD [48+edi]
+ mov ecx,DWORD [52+edi]
+ bswap ebx
+ mov esi,DWORD [56+edi]
+ bswap ecx
+ mov DWORD [80+esp],ebx
+ bswap esi
+ mov DWORD [84+esp],ecx
+ mov DWORD [88+esp],esi
+ mov ebx,DWORD [60+edi]
+ add edi,64
+ bswap ebx
+ mov DWORD [100+esp],edi
+ mov DWORD [92+esp],ebx
+ mov ecx,edx
+ mov esi,DWORD [20+esp]
+ ror edx,14
+ mov edi,DWORD [24+esp]
+ xor edx,ecx
+ mov ebx,DWORD [32+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [28+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [4+esp]
+ xor ecx,eax
+ mov DWORD [esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[1116352408+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [12+esp]
+ add ebp,ecx
+ mov esi,edx
+ mov ecx,DWORD [16+esp]
+ ror edx,14
+ mov edi,DWORD [20+esp]
+ xor edx,esi
+ mov ebx,DWORD [36+esp]
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [12+esp],esi
+ xor edx,esi
+ add ebx,DWORD [24+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [esp]
+ xor esi,ebp
+ mov DWORD [28+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[1899447441+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,esi
+ mov ecx,edx
+ mov esi,DWORD [12+esp]
+ ror edx,14
+ mov edi,DWORD [16+esp]
+ xor edx,ecx
+ mov ebx,DWORD [40+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [20+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [28+esp]
+ xor ecx,eax
+ mov DWORD [24+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[3049323471+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [4+esp]
+ add ebp,ecx
+ mov esi,edx
+ mov ecx,DWORD [8+esp]
+ ror edx,14
+ mov edi,DWORD [12+esp]
+ xor edx,esi
+ mov ebx,DWORD [44+esp]
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [4+esp],esi
+ xor edx,esi
+ add ebx,DWORD [16+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [24+esp]
+ xor esi,ebp
+ mov DWORD [20+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[3921009573+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,esi
+ mov ecx,edx
+ mov esi,DWORD [4+esp]
+ ror edx,14
+ mov edi,DWORD [8+esp]
+ xor edx,ecx
+ mov ebx,DWORD [48+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [12+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [20+esp]
+ xor ecx,eax
+ mov DWORD [16+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[961987163+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [28+esp]
+ add ebp,ecx
+ mov esi,edx
+ mov ecx,DWORD [esp]
+ ror edx,14
+ mov edi,DWORD [4+esp]
+ xor edx,esi
+ mov ebx,DWORD [52+esp]
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [28+esp],esi
+ xor edx,esi
+ add ebx,DWORD [8+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [16+esp]
+ xor esi,ebp
+ mov DWORD [12+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[1508970993+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,esi
+ mov ecx,edx
+ mov esi,DWORD [28+esp]
+ ror edx,14
+ mov edi,DWORD [esp]
+ xor edx,ecx
+ mov ebx,DWORD [56+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [4+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [12+esp]
+ xor ecx,eax
+ mov DWORD [8+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[2453635748+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [20+esp]
+ add ebp,ecx
+ mov esi,edx
+ mov ecx,DWORD [24+esp]
+ ror edx,14
+ mov edi,DWORD [28+esp]
+ xor edx,esi
+ mov ebx,DWORD [60+esp]
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [20+esp],esi
+ xor edx,esi
+ add ebx,DWORD [esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [8+esp]
+ xor esi,ebp
+ mov DWORD [4+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[2870763221+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,esi
+ mov ecx,edx
+ mov esi,DWORD [20+esp]
+ ror edx,14
+ mov edi,DWORD [24+esp]
+ xor edx,ecx
+ mov ebx,DWORD [64+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [28+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [4+esp]
+ xor ecx,eax
+ mov DWORD [esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[3624381080+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [12+esp]
+ add ebp,ecx
+ mov esi,edx
+ mov ecx,DWORD [16+esp]
+ ror edx,14
+ mov edi,DWORD [20+esp]
+ xor edx,esi
+ mov ebx,DWORD [68+esp]
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [12+esp],esi
+ xor edx,esi
+ add ebx,DWORD [24+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [esp]
+ xor esi,ebp
+ mov DWORD [28+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[310598401+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,esi
+ mov ecx,edx
+ mov esi,DWORD [12+esp]
+ ror edx,14
+ mov edi,DWORD [16+esp]
+ xor edx,ecx
+ mov ebx,DWORD [72+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [20+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [28+esp]
+ xor ecx,eax
+ mov DWORD [24+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[607225278+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [4+esp]
+ add ebp,ecx
+ mov esi,edx
+ mov ecx,DWORD [8+esp]
+ ror edx,14
+ mov edi,DWORD [12+esp]
+ xor edx,esi
+ mov ebx,DWORD [76+esp]
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [4+esp],esi
+ xor edx,esi
+ add ebx,DWORD [16+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [24+esp]
+ xor esi,ebp
+ mov DWORD [20+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[1426881987+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,esi
+ mov ecx,edx
+ mov esi,DWORD [4+esp]
+ ror edx,14
+ mov edi,DWORD [8+esp]
+ xor edx,ecx
+ mov ebx,DWORD [80+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [12+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [20+esp]
+ xor ecx,eax
+ mov DWORD [16+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[1925078388+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [28+esp]
+ add ebp,ecx
+ mov esi,edx
+ mov ecx,DWORD [esp]
+ ror edx,14
+ mov edi,DWORD [4+esp]
+ xor edx,esi
+ mov ebx,DWORD [84+esp]
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [28+esp],esi
+ xor edx,esi
+ add ebx,DWORD [8+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [16+esp]
+ xor esi,ebp
+ mov DWORD [12+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[2162078206+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,esi
+ mov ecx,edx
+ mov esi,DWORD [28+esp]
+ ror edx,14
+ mov edi,DWORD [esp]
+ xor edx,ecx
+ mov ebx,DWORD [88+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [4+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [12+esp]
+ xor ecx,eax
+ mov DWORD [8+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[2614888103+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [20+esp]
+ add ebp,ecx
+ mov esi,edx
+ mov ecx,DWORD [24+esp]
+ ror edx,14
+ mov edi,DWORD [28+esp]
+ xor edx,esi
+ mov ebx,DWORD [92+esp]
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [20+esp],esi
+ xor edx,esi
+ add ebx,DWORD [esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [8+esp]
+ xor esi,ebp
+ mov DWORD [4+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[3248222580+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [36+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,esi
+ mov esi,DWORD [88+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [32+esp]
+ shr edi,10
+ add ebx,DWORD [68+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [20+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [24+esp]
+ xor edx,ecx
+ mov DWORD [32+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [28+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [4+esp]
+ xor ecx,eax
+ mov DWORD [esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[3835390401+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [40+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [12+esp]
+ add ebp,ecx
+ mov ecx,DWORD [92+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [36+esp]
+ shr edi,10
+ add ebx,DWORD [72+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [16+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [20+esp]
+ xor edx,esi
+ mov DWORD [36+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [12+esp],esi
+ xor edx,esi
+ add ebx,DWORD [24+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [esp]
+ xor esi,ebp
+ mov DWORD [28+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[4022224774+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [44+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,esi
+ mov esi,DWORD [32+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [40+esp]
+ shr edi,10
+ add ebx,DWORD [76+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [12+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [16+esp]
+ xor edx,ecx
+ mov DWORD [40+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [20+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [28+esp]
+ xor ecx,eax
+ mov DWORD [24+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[264347078+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [48+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [4+esp]
+ add ebp,ecx
+ mov ecx,DWORD [36+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [44+esp]
+ shr edi,10
+ add ebx,DWORD [80+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [8+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [12+esp]
+ xor edx,esi
+ mov DWORD [44+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [4+esp],esi
+ xor edx,esi
+ add ebx,DWORD [16+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [24+esp]
+ xor esi,ebp
+ mov DWORD [20+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[604807628+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [52+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,esi
+ mov esi,DWORD [40+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [48+esp]
+ shr edi,10
+ add ebx,DWORD [84+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [4+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [8+esp]
+ xor edx,ecx
+ mov DWORD [48+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [12+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [20+esp]
+ xor ecx,eax
+ mov DWORD [16+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[770255983+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [56+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [28+esp]
+ add ebp,ecx
+ mov ecx,DWORD [44+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [52+esp]
+ shr edi,10
+ add ebx,DWORD [88+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [4+esp]
+ xor edx,esi
+ mov DWORD [52+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [28+esp],esi
+ xor edx,esi
+ add ebx,DWORD [8+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [16+esp]
+ xor esi,ebp
+ mov DWORD [12+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[1249150122+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [60+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,esi
+ mov esi,DWORD [48+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [56+esp]
+ shr edi,10
+ add ebx,DWORD [92+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [28+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [esp]
+ xor edx,ecx
+ mov DWORD [56+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [4+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [12+esp]
+ xor ecx,eax
+ mov DWORD [8+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[1555081692+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [64+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [20+esp]
+ add ebp,ecx
+ mov ecx,DWORD [52+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [60+esp]
+ shr edi,10
+ add ebx,DWORD [32+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [24+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [28+esp]
+ xor edx,esi
+ mov DWORD [60+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [20+esp],esi
+ xor edx,esi
+ add ebx,DWORD [esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [8+esp]
+ xor esi,ebp
+ mov DWORD [4+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[1996064986+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [68+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,esi
+ mov esi,DWORD [56+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [64+esp]
+ shr edi,10
+ add ebx,DWORD [36+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [20+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [24+esp]
+ xor edx,ecx
+ mov DWORD [64+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [28+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [4+esp]
+ xor ecx,eax
+ mov DWORD [esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[2554220882+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [72+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [12+esp]
+ add ebp,ecx
+ mov ecx,DWORD [60+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [68+esp]
+ shr edi,10
+ add ebx,DWORD [40+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [16+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [20+esp]
+ xor edx,esi
+ mov DWORD [68+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [12+esp],esi
+ xor edx,esi
+ add ebx,DWORD [24+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [esp]
+ xor esi,ebp
+ mov DWORD [28+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[2821834349+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [76+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,esi
+ mov esi,DWORD [64+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [72+esp]
+ shr edi,10
+ add ebx,DWORD [44+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [12+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [16+esp]
+ xor edx,ecx
+ mov DWORD [72+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [20+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [28+esp]
+ xor ecx,eax
+ mov DWORD [24+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[2952996808+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [80+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [4+esp]
+ add ebp,ecx
+ mov ecx,DWORD [68+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [76+esp]
+ shr edi,10
+ add ebx,DWORD [48+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [8+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [12+esp]
+ xor edx,esi
+ mov DWORD [76+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [4+esp],esi
+ xor edx,esi
+ add ebx,DWORD [16+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [24+esp]
+ xor esi,ebp
+ mov DWORD [20+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[3210313671+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [84+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,esi
+ mov esi,DWORD [72+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [80+esp]
+ shr edi,10
+ add ebx,DWORD [52+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [4+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [8+esp]
+ xor edx,ecx
+ mov DWORD [80+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [12+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [20+esp]
+ xor ecx,eax
+ mov DWORD [16+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[3336571891+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [88+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [28+esp]
+ add ebp,ecx
+ mov ecx,DWORD [76+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [84+esp]
+ shr edi,10
+ add ebx,DWORD [56+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [4+esp]
+ xor edx,esi
+ mov DWORD [84+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [28+esp],esi
+ xor edx,esi
+ add ebx,DWORD [8+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [16+esp]
+ xor esi,ebp
+ mov DWORD [12+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[3584528711+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [92+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,esi
+ mov esi,DWORD [80+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [88+esp]
+ shr edi,10
+ add ebx,DWORD [60+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [28+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [esp]
+ xor edx,ecx
+ mov DWORD [88+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [4+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [12+esp]
+ xor ecx,eax
+ mov DWORD [8+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[113926993+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [32+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [20+esp]
+ add ebp,ecx
+ mov ecx,DWORD [84+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [92+esp]
+ shr edi,10
+ add ebx,DWORD [64+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [24+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [28+esp]
+ xor edx,esi
+ mov DWORD [92+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [20+esp],esi
+ xor edx,esi
+ add ebx,DWORD [esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [8+esp]
+ xor esi,ebp
+ mov DWORD [4+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[338241895+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [36+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,esi
+ mov esi,DWORD [88+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [32+esp]
+ shr edi,10
+ add ebx,DWORD [68+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [20+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [24+esp]
+ xor edx,ecx
+ mov DWORD [32+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [28+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [4+esp]
+ xor ecx,eax
+ mov DWORD [esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[666307205+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [40+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [12+esp]
+ add ebp,ecx
+ mov ecx,DWORD [92+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [36+esp]
+ shr edi,10
+ add ebx,DWORD [72+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [16+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [20+esp]
+ xor edx,esi
+ mov DWORD [36+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [12+esp],esi
+ xor edx,esi
+ add ebx,DWORD [24+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [esp]
+ xor esi,ebp
+ mov DWORD [28+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[773529912+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [44+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,esi
+ mov esi,DWORD [32+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [40+esp]
+ shr edi,10
+ add ebx,DWORD [76+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [12+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [16+esp]
+ xor edx,ecx
+ mov DWORD [40+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [20+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [28+esp]
+ xor ecx,eax
+ mov DWORD [24+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[1294757372+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [48+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [4+esp]
+ add ebp,ecx
+ mov ecx,DWORD [36+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [44+esp]
+ shr edi,10
+ add ebx,DWORD [80+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [8+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [12+esp]
+ xor edx,esi
+ mov DWORD [44+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [4+esp],esi
+ xor edx,esi
+ add ebx,DWORD [16+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [24+esp]
+ xor esi,ebp
+ mov DWORD [20+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[1396182291+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [52+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,esi
+ mov esi,DWORD [40+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [48+esp]
+ shr edi,10
+ add ebx,DWORD [84+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [4+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [8+esp]
+ xor edx,ecx
+ mov DWORD [48+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [12+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [20+esp]
+ xor ecx,eax
+ mov DWORD [16+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[1695183700+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [56+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [28+esp]
+ add ebp,ecx
+ mov ecx,DWORD [44+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [52+esp]
+ shr edi,10
+ add ebx,DWORD [88+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [4+esp]
+ xor edx,esi
+ mov DWORD [52+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [28+esp],esi
+ xor edx,esi
+ add ebx,DWORD [8+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [16+esp]
+ xor esi,ebp
+ mov DWORD [12+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[1986661051+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [60+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,esi
+ mov esi,DWORD [48+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [56+esp]
+ shr edi,10
+ add ebx,DWORD [92+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [28+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [esp]
+ xor edx,ecx
+ mov DWORD [56+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [4+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [12+esp]
+ xor ecx,eax
+ mov DWORD [8+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[2177026350+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [64+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [20+esp]
+ add ebp,ecx
+ mov ecx,DWORD [52+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [60+esp]
+ shr edi,10
+ add ebx,DWORD [32+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [24+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [28+esp]
+ xor edx,esi
+ mov DWORD [60+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [20+esp],esi
+ xor edx,esi
+ add ebx,DWORD [esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [8+esp]
+ xor esi,ebp
+ mov DWORD [4+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[2456956037+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [68+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,esi
+ mov esi,DWORD [56+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [64+esp]
+ shr edi,10
+ add ebx,DWORD [36+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [20+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [24+esp]
+ xor edx,ecx
+ mov DWORD [64+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [28+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [4+esp]
+ xor ecx,eax
+ mov DWORD [esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[2730485921+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [72+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [12+esp]
+ add ebp,ecx
+ mov ecx,DWORD [60+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [68+esp]
+ shr edi,10
+ add ebx,DWORD [40+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [16+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [20+esp]
+ xor edx,esi
+ mov DWORD [68+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [12+esp],esi
+ xor edx,esi
+ add ebx,DWORD [24+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [esp]
+ xor esi,ebp
+ mov DWORD [28+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[2820302411+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [76+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,esi
+ mov esi,DWORD [64+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [72+esp]
+ shr edi,10
+ add ebx,DWORD [44+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [12+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [16+esp]
+ xor edx,ecx
+ mov DWORD [72+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [20+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [28+esp]
+ xor ecx,eax
+ mov DWORD [24+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[3259730800+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [80+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [4+esp]
+ add ebp,ecx
+ mov ecx,DWORD [68+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [76+esp]
+ shr edi,10
+ add ebx,DWORD [48+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [8+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [12+esp]
+ xor edx,esi
+ mov DWORD [76+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [4+esp],esi
+ xor edx,esi
+ add ebx,DWORD [16+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [24+esp]
+ xor esi,ebp
+ mov DWORD [20+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[3345764771+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [84+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,esi
+ mov esi,DWORD [72+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [80+esp]
+ shr edi,10
+ add ebx,DWORD [52+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [4+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [8+esp]
+ xor edx,ecx
+ mov DWORD [80+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [12+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [20+esp]
+ xor ecx,eax
+ mov DWORD [16+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[3516065817+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [88+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [28+esp]
+ add ebp,ecx
+ mov ecx,DWORD [76+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [84+esp]
+ shr edi,10
+ add ebx,DWORD [56+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [4+esp]
+ xor edx,esi
+ mov DWORD [84+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [28+esp],esi
+ xor edx,esi
+ add ebx,DWORD [8+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [16+esp]
+ xor esi,ebp
+ mov DWORD [12+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[3600352804+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [92+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,esi
+ mov esi,DWORD [80+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [88+esp]
+ shr edi,10
+ add ebx,DWORD [60+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [28+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [esp]
+ xor edx,ecx
+ mov DWORD [88+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [4+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [12+esp]
+ xor ecx,eax
+ mov DWORD [8+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[4094571909+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [32+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [20+esp]
+ add ebp,ecx
+ mov ecx,DWORD [84+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [92+esp]
+ shr edi,10
+ add ebx,DWORD [64+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [24+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [28+esp]
+ xor edx,esi
+ mov DWORD [92+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [20+esp],esi
+ xor edx,esi
+ add ebx,DWORD [esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [8+esp]
+ xor esi,ebp
+ mov DWORD [4+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[275423344+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [36+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,esi
+ mov esi,DWORD [88+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [32+esp]
+ shr edi,10
+ add ebx,DWORD [68+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [20+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [24+esp]
+ xor edx,ecx
+ mov DWORD [32+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [28+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [4+esp]
+ xor ecx,eax
+ mov DWORD [esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[430227734+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [40+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [12+esp]
+ add ebp,ecx
+ mov ecx,DWORD [92+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [36+esp]
+ shr edi,10
+ add ebx,DWORD [72+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [16+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [20+esp]
+ xor edx,esi
+ mov DWORD [36+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [12+esp],esi
+ xor edx,esi
+ add ebx,DWORD [24+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [esp]
+ xor esi,ebp
+ mov DWORD [28+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[506948616+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [44+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,esi
+ mov esi,DWORD [32+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [40+esp]
+ shr edi,10
+ add ebx,DWORD [76+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [12+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [16+esp]
+ xor edx,ecx
+ mov DWORD [40+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [20+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [28+esp]
+ xor ecx,eax
+ mov DWORD [24+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[659060556+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [48+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [4+esp]
+ add ebp,ecx
+ mov ecx,DWORD [36+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [44+esp]
+ shr edi,10
+ add ebx,DWORD [80+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [8+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [12+esp]
+ xor edx,esi
+ mov DWORD [44+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [4+esp],esi
+ xor edx,esi
+ add ebx,DWORD [16+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [24+esp]
+ xor esi,ebp
+ mov DWORD [20+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[883997877+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [52+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,esi
+ mov esi,DWORD [40+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [48+esp]
+ shr edi,10
+ add ebx,DWORD [84+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [4+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [8+esp]
+ xor edx,ecx
+ mov DWORD [48+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [12+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [20+esp]
+ xor ecx,eax
+ mov DWORD [16+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[958139571+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [56+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [28+esp]
+ add ebp,ecx
+ mov ecx,DWORD [44+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [52+esp]
+ shr edi,10
+ add ebx,DWORD [88+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [4+esp]
+ xor edx,esi
+ mov DWORD [52+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [28+esp],esi
+ xor edx,esi
+ add ebx,DWORD [8+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [16+esp]
+ xor esi,ebp
+ mov DWORD [12+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[1322822218+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [60+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,esi
+ mov esi,DWORD [48+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [56+esp]
+ shr edi,10
+ add ebx,DWORD [92+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [28+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [esp]
+ xor edx,ecx
+ mov DWORD [56+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [4+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [12+esp]
+ xor ecx,eax
+ mov DWORD [8+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[1537002063+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [64+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [20+esp]
+ add ebp,ecx
+ mov ecx,DWORD [52+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [60+esp]
+ shr edi,10
+ add ebx,DWORD [32+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [24+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [28+esp]
+ xor edx,esi
+ mov DWORD [60+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [20+esp],esi
+ xor edx,esi
+ add ebx,DWORD [esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [8+esp]
+ xor esi,ebp
+ mov DWORD [4+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[1747873779+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [68+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,esi
+ mov esi,DWORD [56+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [64+esp]
+ shr edi,10
+ add ebx,DWORD [36+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [20+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [24+esp]
+ xor edx,ecx
+ mov DWORD [64+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [28+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [4+esp]
+ xor ecx,eax
+ mov DWORD [esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[1955562222+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [72+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [12+esp]
+ add ebp,ecx
+ mov ecx,DWORD [60+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [68+esp]
+ shr edi,10
+ add ebx,DWORD [40+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [16+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [20+esp]
+ xor edx,esi
+ mov DWORD [68+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [12+esp],esi
+ xor edx,esi
+ add ebx,DWORD [24+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [esp]
+ xor esi,ebp
+ mov DWORD [28+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[2024104815+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [76+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,esi
+ mov esi,DWORD [64+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [72+esp]
+ shr edi,10
+ add ebx,DWORD [44+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [12+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [16+esp]
+ xor edx,ecx
+ mov DWORD [72+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [20+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [28+esp]
+ xor ecx,eax
+ mov DWORD [24+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[2227730452+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [80+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [4+esp]
+ add ebp,ecx
+ mov ecx,DWORD [68+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [76+esp]
+ shr edi,10
+ add ebx,DWORD [48+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [8+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [12+esp]
+ xor edx,esi
+ mov DWORD [76+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [4+esp],esi
+ xor edx,esi
+ add ebx,DWORD [16+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [24+esp]
+ xor esi,ebp
+ mov DWORD [20+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[2361852424+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [84+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,esi
+ mov esi,DWORD [72+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [80+esp]
+ shr edi,10
+ add ebx,DWORD [52+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [4+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [8+esp]
+ xor edx,ecx
+ mov DWORD [80+esp],ebx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [12+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [20+esp]
+ xor ecx,eax
+ mov DWORD [16+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[2428436474+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [88+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [28+esp]
+ add ebp,ecx
+ mov ecx,DWORD [76+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [84+esp]
+ shr edi,10
+ add ebx,DWORD [56+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [4+esp]
+ xor edx,esi
+ mov DWORD [84+esp],ebx
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [28+esp],esi
+ xor edx,esi
+ add ebx,DWORD [8+esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [16+esp]
+ xor esi,ebp
+ mov DWORD [12+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[2756734187+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ mov ecx,DWORD [92+esp]
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,esi
+ mov esi,DWORD [80+esp]
+ mov ebx,ecx
+ ror ecx,11
+ mov edi,esi
+ ror esi,2
+ xor ecx,ebx
+ shr ebx,3
+ ror ecx,7
+ xor esi,edi
+ xor ebx,ecx
+ ror esi,17
+ add ebx,DWORD [88+esp]
+ shr edi,10
+ add ebx,DWORD [60+esp]
+ mov ecx,edx
+ xor edi,esi
+ mov esi,DWORD [28+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [esp]
+ xor edx,ecx
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ add ebx,DWORD [4+esp]
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add ebx,edi
+ ror ecx,9
+ mov esi,eax
+ mov edi,DWORD [12+esp]
+ xor ecx,eax
+ mov DWORD [8+esp],eax
+ xor eax,edi
+ ror ecx,11
+ and ebp,eax
+ lea edx,[3204031479+edx*1+ebx]
+ xor ecx,esi
+ xor ebp,edi
+ mov esi,DWORD [32+esp]
+ ror ecx,2
+ add ebp,edx
+ add edx,DWORD [20+esp]
+ add ebp,ecx
+ mov ecx,DWORD [84+esp]
+ mov ebx,esi
+ ror esi,11
+ mov edi,ecx
+ ror ecx,2
+ xor esi,ebx
+ shr ebx,3
+ ror esi,7
+ xor ecx,edi
+ xor ebx,esi
+ ror ecx,17
+ add ebx,DWORD [92+esp]
+ shr edi,10
+ add ebx,DWORD [64+esp]
+ mov esi,edx
+ xor edi,ecx
+ mov ecx,DWORD [24+esp]
+ ror edx,14
+ add ebx,edi
+ mov edi,DWORD [28+esp]
+ xor edx,esi
+ xor ecx,edi
+ ror edx,5
+ and ecx,esi
+ mov DWORD [20+esp],esi
+ xor edx,esi
+ add ebx,DWORD [esp]
+ xor edi,ecx
+ ror edx,6
+ mov esi,ebp
+ add ebx,edi
+ ror esi,9
+ mov ecx,ebp
+ mov edi,DWORD [8+esp]
+ xor esi,ebp
+ mov DWORD [4+esp],ebp
+ xor ebp,edi
+ ror esi,11
+ and eax,ebp
+ lea edx,[3329325298+edx*1+ebx]
+ xor esi,ecx
+ xor eax,edi
+ ror esi,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,esi
+ mov esi,DWORD [96+esp]
+ xor ebp,edi
+ mov ecx,DWORD [12+esp]
+ add eax,DWORD [esi]
+ add ebp,DWORD [4+esi]
+ add edi,DWORD [8+esi]
+ add ecx,DWORD [12+esi]
+ mov DWORD [esi],eax
+ mov DWORD [4+esi],ebp
+ mov DWORD [8+esi],edi
+ mov DWORD [12+esi],ecx
+ mov DWORD [4+esp],ebp
+ xor ebp,edi
+ mov DWORD [8+esp],edi
+ mov DWORD [12+esp],ecx
+ mov edi,DWORD [20+esp]
+ mov ebx,DWORD [24+esp]
+ mov ecx,DWORD [28+esp]
+ add edx,DWORD [16+esi]
+ add edi,DWORD [20+esi]
+ add ebx,DWORD [24+esi]
+ add ecx,DWORD [28+esi]
+ mov DWORD [16+esi],edx
+ mov DWORD [20+esi],edi
+ mov DWORD [24+esi],ebx
+ mov DWORD [28+esi],ecx
+ mov DWORD [20+esp],edi
+ mov edi,DWORD [100+esp]
+ mov DWORD [24+esp],ebx
+ mov DWORD [28+esp],ecx
+ cmp edi,DWORD [104+esp]
+ jb NEAR L$010grand_loop
+ mov esp,DWORD [108+esp]
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 32
+L$004shaext:
+ sub esp,32
+ movdqu xmm1,[esi]
+ lea ebp,[128+ebp]
+ movdqu xmm2,[16+esi]
+ movdqa xmm7,[128+ebp]
+ pshufd xmm0,xmm1,27
+ pshufd xmm1,xmm1,177
+ pshufd xmm2,xmm2,27
+db 102,15,58,15,202,8
+ punpcklqdq xmm2,xmm0
+ jmp NEAR L$011loop_shaext
+align 16
+L$011loop_shaext:
+ movdqu xmm3,[edi]
+ movdqu xmm4,[16+edi]
+ movdqu xmm5,[32+edi]
+db 102,15,56,0,223
+ movdqu xmm6,[48+edi]
+ movdqa [16+esp],xmm2
+ movdqa xmm0,[ebp-128]
+ paddd xmm0,xmm3
+db 102,15,56,0,231
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ nop
+ movdqa [esp],xmm1
+db 15,56,203,202
+ movdqa xmm0,[ebp-112]
+ paddd xmm0,xmm4
+db 102,15,56,0,239
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ lea edi,[64+edi]
+db 15,56,204,220
+db 15,56,203,202
+ movdqa xmm0,[ebp-96]
+ paddd xmm0,xmm5
+db 102,15,56,0,247
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm6
+db 102,15,58,15,253,4
+ nop
+ paddd xmm3,xmm7
+db 15,56,204,229
+db 15,56,203,202
+ movdqa xmm0,[ebp-80]
+ paddd xmm0,xmm6
+db 15,56,205,222
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm3
+db 102,15,58,15,254,4
+ nop
+ paddd xmm4,xmm7
+db 15,56,204,238
+db 15,56,203,202
+ movdqa xmm0,[ebp-64]
+ paddd xmm0,xmm3
+db 15,56,205,227
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm4
+db 102,15,58,15,251,4
+ nop
+ paddd xmm5,xmm7
+db 15,56,204,243
+db 15,56,203,202
+ movdqa xmm0,[ebp-48]
+ paddd xmm0,xmm4
+db 15,56,205,236
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm5
+db 102,15,58,15,252,4
+ nop
+ paddd xmm6,xmm7
+db 15,56,204,220
+db 15,56,203,202
+ movdqa xmm0,[ebp-32]
+ paddd xmm0,xmm5
+db 15,56,205,245
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm6
+db 102,15,58,15,253,4
+ nop
+ paddd xmm3,xmm7
+db 15,56,204,229
+db 15,56,203,202
+ movdqa xmm0,[ebp-16]
+ paddd xmm0,xmm6
+db 15,56,205,222
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm3
+db 102,15,58,15,254,4
+ nop
+ paddd xmm4,xmm7
+db 15,56,204,238
+db 15,56,203,202
+ movdqa xmm0,[ebp]
+ paddd xmm0,xmm3
+db 15,56,205,227
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm4
+db 102,15,58,15,251,4
+ nop
+ paddd xmm5,xmm7
+db 15,56,204,243
+db 15,56,203,202
+ movdqa xmm0,[16+ebp]
+ paddd xmm0,xmm4
+db 15,56,205,236
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm5
+db 102,15,58,15,252,4
+ nop
+ paddd xmm6,xmm7
+db 15,56,204,220
+db 15,56,203,202
+ movdqa xmm0,[32+ebp]
+ paddd xmm0,xmm5
+db 15,56,205,245
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm6
+db 102,15,58,15,253,4
+ nop
+ paddd xmm3,xmm7
+db 15,56,204,229
+db 15,56,203,202
+ movdqa xmm0,[48+ebp]
+ paddd xmm0,xmm6
+db 15,56,205,222
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm3
+db 102,15,58,15,254,4
+ nop
+ paddd xmm4,xmm7
+db 15,56,204,238
+db 15,56,203,202
+ movdqa xmm0,[64+ebp]
+ paddd xmm0,xmm3
+db 15,56,205,227
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm4
+db 102,15,58,15,251,4
+ nop
+ paddd xmm5,xmm7
+db 15,56,204,243
+db 15,56,203,202
+ movdqa xmm0,[80+ebp]
+ paddd xmm0,xmm4
+db 15,56,205,236
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ movdqa xmm7,xmm5
+db 102,15,58,15,252,4
+db 15,56,203,202
+ paddd xmm6,xmm7
+ movdqa xmm0,[96+ebp]
+ paddd xmm0,xmm5
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+db 15,56,205,245
+ movdqa xmm7,[128+ebp]
+db 15,56,203,202
+ movdqa xmm0,[112+ebp]
+ paddd xmm0,xmm6
+ nop
+db 15,56,203,209
+ pshufd xmm0,xmm0,14
+ cmp eax,edi
+ nop
+db 15,56,203,202
+ paddd xmm2,[16+esp]
+ paddd xmm1,[esp]
+ jnz NEAR L$011loop_shaext
+ pshufd xmm2,xmm2,177
+ pshufd xmm7,xmm1,27
+ pshufd xmm1,xmm1,177
+ punpckhqdq xmm1,xmm2
+db 102,15,58,15,215,8
+ mov esp,DWORD [44+esp]
+ movdqu [esi],xmm1
+ movdqu [16+esi],xmm2
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 32
+L$006SSSE3:
+ lea esp,[esp-96]
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov ecx,DWORD [8+esi]
+ mov edi,DWORD [12+esi]
+ mov DWORD [4+esp],ebx
+ xor ebx,ecx
+ mov DWORD [8+esp],ecx
+ mov DWORD [12+esp],edi
+ mov edx,DWORD [16+esi]
+ mov edi,DWORD [20+esi]
+ mov ecx,DWORD [24+esi]
+ mov esi,DWORD [28+esi]
+ mov DWORD [20+esp],edi
+ mov edi,DWORD [100+esp]
+ mov DWORD [24+esp],ecx
+ mov DWORD [28+esp],esi
+ movdqa xmm7,[256+ebp]
+ jmp NEAR L$012grand_ssse3
+align 16
+L$012grand_ssse3:
+ movdqu xmm0,[edi]
+ movdqu xmm1,[16+edi]
+ movdqu xmm2,[32+edi]
+ movdqu xmm3,[48+edi]
+ add edi,64
+db 102,15,56,0,199
+ mov DWORD [100+esp],edi
+db 102,15,56,0,207
+ movdqa xmm4,[ebp]
+db 102,15,56,0,215
+ movdqa xmm5,[16+ebp]
+ paddd xmm4,xmm0
+db 102,15,56,0,223
+ movdqa xmm6,[32+ebp]
+ paddd xmm5,xmm1
+ movdqa xmm7,[48+ebp]
+ movdqa [32+esp],xmm4
+ paddd xmm6,xmm2
+ movdqa [48+esp],xmm5
+ paddd xmm7,xmm3
+ movdqa [64+esp],xmm6
+ movdqa [80+esp],xmm7
+ jmp NEAR L$013ssse3_00_47
+align 16
+L$013ssse3_00_47:
+ add ebp,64
+ mov ecx,edx
+ movdqa xmm4,xmm1
+ ror edx,14
+ mov esi,DWORD [20+esp]
+ movdqa xmm7,xmm3
+ xor edx,ecx
+ mov edi,DWORD [24+esp]
+db 102,15,58,15,224,4
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+db 102,15,58,15,250,4
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ movdqa xmm5,xmm4
+ ror edx,6
+ mov ecx,eax
+ movdqa xmm6,xmm4
+ add edx,edi
+ mov edi,DWORD [4+esp]
+ psrld xmm4,3
+ mov esi,eax
+ ror ecx,9
+ paddd xmm0,xmm7
+ mov DWORD [esp],eax
+ xor ecx,eax
+ psrld xmm6,7
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ ror ecx,11
+ and ebx,eax
+ pshufd xmm7,xmm3,250
+ xor ecx,esi
+ add edx,DWORD [32+esp]
+ pslld xmm5,14
+ xor ebx,edi
+ ror ecx,2
+ pxor xmm4,xmm6
+ add ebx,edx
+ add edx,DWORD [12+esp]
+ psrld xmm6,11
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ pxor xmm4,xmm5
+ mov esi,DWORD [16+esp]
+ xor edx,ecx
+ pslld xmm5,11
+ mov edi,DWORD [20+esp]
+ xor esi,edi
+ ror edx,5
+ pxor xmm4,xmm6
+ and esi,ecx
+ mov DWORD [12+esp],ecx
+ movdqa xmm6,xmm7
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ pxor xmm4,xmm5
+ mov ecx,ebx
+ add edx,edi
+ psrld xmm7,10
+ mov edi,DWORD [esp]
+ mov esi,ebx
+ ror ecx,9
+ paddd xmm0,xmm4
+ mov DWORD [28+esp],ebx
+ xor ecx,ebx
+ psrlq xmm6,17
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ ror ecx,11
+ pxor xmm7,xmm6
+ and eax,ebx
+ xor ecx,esi
+ psrlq xmm6,2
+ add edx,DWORD [36+esp]
+ xor eax,edi
+ ror ecx,2
+ pxor xmm7,xmm6
+ add eax,edx
+ add edx,DWORD [8+esp]
+ pshufd xmm7,xmm7,128
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [12+esp]
+ xor edx,ecx
+ mov edi,DWORD [16+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ psrldq xmm7,8
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ paddd xmm0,xmm7
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [28+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [24+esp],eax
+ pshufd xmm7,xmm0,80
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ movdqa xmm6,xmm7
+ ror ecx,11
+ psrld xmm7,10
+ and ebx,eax
+ psrlq xmm6,17
+ xor ecx,esi
+ add edx,DWORD [40+esp]
+ xor ebx,edi
+ ror ecx,2
+ pxor xmm7,xmm6
+ add ebx,edx
+ add edx,DWORD [4+esp]
+ psrlq xmm6,2
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ pxor xmm7,xmm6
+ mov esi,DWORD [8+esp]
+ xor edx,ecx
+ mov edi,DWORD [12+esp]
+ pshufd xmm7,xmm7,8
+ xor esi,edi
+ ror edx,5
+ movdqa xmm6,[ebp]
+ and esi,ecx
+ mov DWORD [4+esp],ecx
+ pslldq xmm7,8
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [24+esp]
+ mov esi,ebx
+ ror ecx,9
+ paddd xmm0,xmm7
+ mov DWORD [20+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ paddd xmm6,xmm0
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [44+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,ecx
+ movdqa [32+esp],xmm6
+ mov ecx,edx
+ movdqa xmm4,xmm2
+ ror edx,14
+ mov esi,DWORD [4+esp]
+ movdqa xmm7,xmm0
+ xor edx,ecx
+ mov edi,DWORD [8+esp]
+db 102,15,58,15,225,4
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+db 102,15,58,15,251,4
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ movdqa xmm5,xmm4
+ ror edx,6
+ mov ecx,eax
+ movdqa xmm6,xmm4
+ add edx,edi
+ mov edi,DWORD [20+esp]
+ psrld xmm4,3
+ mov esi,eax
+ ror ecx,9
+ paddd xmm1,xmm7
+ mov DWORD [16+esp],eax
+ xor ecx,eax
+ psrld xmm6,7
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ ror ecx,11
+ and ebx,eax
+ pshufd xmm7,xmm0,250
+ xor ecx,esi
+ add edx,DWORD [48+esp]
+ pslld xmm5,14
+ xor ebx,edi
+ ror ecx,2
+ pxor xmm4,xmm6
+ add ebx,edx
+ add edx,DWORD [28+esp]
+ psrld xmm6,11
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ pxor xmm4,xmm5
+ mov esi,DWORD [esp]
+ xor edx,ecx
+ pslld xmm5,11
+ mov edi,DWORD [4+esp]
+ xor esi,edi
+ ror edx,5
+ pxor xmm4,xmm6
+ and esi,ecx
+ mov DWORD [28+esp],ecx
+ movdqa xmm6,xmm7
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ pxor xmm4,xmm5
+ mov ecx,ebx
+ add edx,edi
+ psrld xmm7,10
+ mov edi,DWORD [16+esp]
+ mov esi,ebx
+ ror ecx,9
+ paddd xmm1,xmm4
+ mov DWORD [12+esp],ebx
+ xor ecx,ebx
+ psrlq xmm6,17
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ ror ecx,11
+ pxor xmm7,xmm6
+ and eax,ebx
+ xor ecx,esi
+ psrlq xmm6,2
+ add edx,DWORD [52+esp]
+ xor eax,edi
+ ror ecx,2
+ pxor xmm7,xmm6
+ add eax,edx
+ add edx,DWORD [24+esp]
+ pshufd xmm7,xmm7,128
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [28+esp]
+ xor edx,ecx
+ mov edi,DWORD [esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ psrldq xmm7,8
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ paddd xmm1,xmm7
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [12+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [8+esp],eax
+ pshufd xmm7,xmm1,80
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ movdqa xmm6,xmm7
+ ror ecx,11
+ psrld xmm7,10
+ and ebx,eax
+ psrlq xmm6,17
+ xor ecx,esi
+ add edx,DWORD [56+esp]
+ xor ebx,edi
+ ror ecx,2
+ pxor xmm7,xmm6
+ add ebx,edx
+ add edx,DWORD [20+esp]
+ psrlq xmm6,2
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ pxor xmm7,xmm6
+ mov esi,DWORD [24+esp]
+ xor edx,ecx
+ mov edi,DWORD [28+esp]
+ pshufd xmm7,xmm7,8
+ xor esi,edi
+ ror edx,5
+ movdqa xmm6,[16+ebp]
+ and esi,ecx
+ mov DWORD [20+esp],ecx
+ pslldq xmm7,8
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [8+esp]
+ mov esi,ebx
+ ror ecx,9
+ paddd xmm1,xmm7
+ mov DWORD [4+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [esp]
+ paddd xmm6,xmm1
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [60+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,ecx
+ movdqa [48+esp],xmm6
+ mov ecx,edx
+ movdqa xmm4,xmm3
+ ror edx,14
+ mov esi,DWORD [20+esp]
+ movdqa xmm7,xmm1
+ xor edx,ecx
+ mov edi,DWORD [24+esp]
+db 102,15,58,15,226,4
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+db 102,15,58,15,248,4
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ movdqa xmm5,xmm4
+ ror edx,6
+ mov ecx,eax
+ movdqa xmm6,xmm4
+ add edx,edi
+ mov edi,DWORD [4+esp]
+ psrld xmm4,3
+ mov esi,eax
+ ror ecx,9
+ paddd xmm2,xmm7
+ mov DWORD [esp],eax
+ xor ecx,eax
+ psrld xmm6,7
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ ror ecx,11
+ and ebx,eax
+ pshufd xmm7,xmm1,250
+ xor ecx,esi
+ add edx,DWORD [64+esp]
+ pslld xmm5,14
+ xor ebx,edi
+ ror ecx,2
+ pxor xmm4,xmm6
+ add ebx,edx
+ add edx,DWORD [12+esp]
+ psrld xmm6,11
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ pxor xmm4,xmm5
+ mov esi,DWORD [16+esp]
+ xor edx,ecx
+ pslld xmm5,11
+ mov edi,DWORD [20+esp]
+ xor esi,edi
+ ror edx,5
+ pxor xmm4,xmm6
+ and esi,ecx
+ mov DWORD [12+esp],ecx
+ movdqa xmm6,xmm7
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ pxor xmm4,xmm5
+ mov ecx,ebx
+ add edx,edi
+ psrld xmm7,10
+ mov edi,DWORD [esp]
+ mov esi,ebx
+ ror ecx,9
+ paddd xmm2,xmm4
+ mov DWORD [28+esp],ebx
+ xor ecx,ebx
+ psrlq xmm6,17
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ ror ecx,11
+ pxor xmm7,xmm6
+ and eax,ebx
+ xor ecx,esi
+ psrlq xmm6,2
+ add edx,DWORD [68+esp]
+ xor eax,edi
+ ror ecx,2
+ pxor xmm7,xmm6
+ add eax,edx
+ add edx,DWORD [8+esp]
+ pshufd xmm7,xmm7,128
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [12+esp]
+ xor edx,ecx
+ mov edi,DWORD [16+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ psrldq xmm7,8
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ paddd xmm2,xmm7
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [28+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [24+esp],eax
+ pshufd xmm7,xmm2,80
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ movdqa xmm6,xmm7
+ ror ecx,11
+ psrld xmm7,10
+ and ebx,eax
+ psrlq xmm6,17
+ xor ecx,esi
+ add edx,DWORD [72+esp]
+ xor ebx,edi
+ ror ecx,2
+ pxor xmm7,xmm6
+ add ebx,edx
+ add edx,DWORD [4+esp]
+ psrlq xmm6,2
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ pxor xmm7,xmm6
+ mov esi,DWORD [8+esp]
+ xor edx,ecx
+ mov edi,DWORD [12+esp]
+ pshufd xmm7,xmm7,8
+ xor esi,edi
+ ror edx,5
+ movdqa xmm6,[32+ebp]
+ and esi,ecx
+ mov DWORD [4+esp],ecx
+ pslldq xmm7,8
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [24+esp]
+ mov esi,ebx
+ ror ecx,9
+ paddd xmm2,xmm7
+ mov DWORD [20+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ paddd xmm6,xmm2
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [76+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,ecx
+ movdqa [64+esp],xmm6
+ mov ecx,edx
+ movdqa xmm4,xmm0
+ ror edx,14
+ mov esi,DWORD [4+esp]
+ movdqa xmm7,xmm2
+ xor edx,ecx
+ mov edi,DWORD [8+esp]
+db 102,15,58,15,227,4
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+db 102,15,58,15,249,4
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ movdqa xmm5,xmm4
+ ror edx,6
+ mov ecx,eax
+ movdqa xmm6,xmm4
+ add edx,edi
+ mov edi,DWORD [20+esp]
+ psrld xmm4,3
+ mov esi,eax
+ ror ecx,9
+ paddd xmm3,xmm7
+ mov DWORD [16+esp],eax
+ xor ecx,eax
+ psrld xmm6,7
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ ror ecx,11
+ and ebx,eax
+ pshufd xmm7,xmm2,250
+ xor ecx,esi
+ add edx,DWORD [80+esp]
+ pslld xmm5,14
+ xor ebx,edi
+ ror ecx,2
+ pxor xmm4,xmm6
+ add ebx,edx
+ add edx,DWORD [28+esp]
+ psrld xmm6,11
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ pxor xmm4,xmm5
+ mov esi,DWORD [esp]
+ xor edx,ecx
+ pslld xmm5,11
+ mov edi,DWORD [4+esp]
+ xor esi,edi
+ ror edx,5
+ pxor xmm4,xmm6
+ and esi,ecx
+ mov DWORD [28+esp],ecx
+ movdqa xmm6,xmm7
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ pxor xmm4,xmm5
+ mov ecx,ebx
+ add edx,edi
+ psrld xmm7,10
+ mov edi,DWORD [16+esp]
+ mov esi,ebx
+ ror ecx,9
+ paddd xmm3,xmm4
+ mov DWORD [12+esp],ebx
+ xor ecx,ebx
+ psrlq xmm6,17
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ ror ecx,11
+ pxor xmm7,xmm6
+ and eax,ebx
+ xor ecx,esi
+ psrlq xmm6,2
+ add edx,DWORD [84+esp]
+ xor eax,edi
+ ror ecx,2
+ pxor xmm7,xmm6
+ add eax,edx
+ add edx,DWORD [24+esp]
+ pshufd xmm7,xmm7,128
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [28+esp]
+ xor edx,ecx
+ mov edi,DWORD [esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ psrldq xmm7,8
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ paddd xmm3,xmm7
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [12+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [8+esp],eax
+ pshufd xmm7,xmm3,80
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ movdqa xmm6,xmm7
+ ror ecx,11
+ psrld xmm7,10
+ and ebx,eax
+ psrlq xmm6,17
+ xor ecx,esi
+ add edx,DWORD [88+esp]
+ xor ebx,edi
+ ror ecx,2
+ pxor xmm7,xmm6
+ add ebx,edx
+ add edx,DWORD [20+esp]
+ psrlq xmm6,2
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ pxor xmm7,xmm6
+ mov esi,DWORD [24+esp]
+ xor edx,ecx
+ mov edi,DWORD [28+esp]
+ pshufd xmm7,xmm7,8
+ xor esi,edi
+ ror edx,5
+ movdqa xmm6,[48+ebp]
+ and esi,ecx
+ mov DWORD [20+esp],ecx
+ pslldq xmm7,8
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [8+esp]
+ mov esi,ebx
+ ror ecx,9
+ paddd xmm3,xmm7
+ mov DWORD [4+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [esp]
+ paddd xmm6,xmm3
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [92+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,ecx
+ movdqa [80+esp],xmm6
+ cmp DWORD [64+ebp],66051
+ jne NEAR L$013ssse3_00_47
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [20+esp]
+ xor edx,ecx
+ mov edi,DWORD [24+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [4+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ ror ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [32+esp]
+ xor ebx,edi
+ ror ecx,2
+ add ebx,edx
+ add edx,DWORD [12+esp]
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [16+esp]
+ xor edx,ecx
+ mov edi,DWORD [20+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [12+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [esp]
+ mov esi,ebx
+ ror ecx,9
+ mov DWORD [28+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [36+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [12+esp]
+ xor edx,ecx
+ mov edi,DWORD [16+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [28+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [24+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ ror ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [40+esp]
+ xor ebx,edi
+ ror ecx,2
+ add ebx,edx
+ add edx,DWORD [4+esp]
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [8+esp]
+ xor edx,ecx
+ mov edi,DWORD [12+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [4+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [24+esp]
+ mov esi,ebx
+ ror ecx,9
+ mov DWORD [20+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [44+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [4+esp]
+ xor edx,ecx
+ mov edi,DWORD [8+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [20+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [16+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ ror ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [48+esp]
+ xor ebx,edi
+ ror ecx,2
+ add ebx,edx
+ add edx,DWORD [28+esp]
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [esp]
+ xor edx,ecx
+ mov edi,DWORD [4+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [28+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [16+esp]
+ mov esi,ebx
+ ror ecx,9
+ mov DWORD [12+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [52+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [28+esp]
+ xor edx,ecx
+ mov edi,DWORD [esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [12+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [8+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ ror ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [56+esp]
+ xor ebx,edi
+ ror ecx,2
+ add ebx,edx
+ add edx,DWORD [20+esp]
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [24+esp]
+ xor edx,ecx
+ mov edi,DWORD [28+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [20+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [8+esp]
+ mov esi,ebx
+ ror ecx,9
+ mov DWORD [4+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [esp]
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [60+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [20+esp]
+ xor edx,ecx
+ mov edi,DWORD [24+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [4+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ ror ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [64+esp]
+ xor ebx,edi
+ ror ecx,2
+ add ebx,edx
+ add edx,DWORD [12+esp]
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [16+esp]
+ xor edx,ecx
+ mov edi,DWORD [20+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [12+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [esp]
+ mov esi,ebx
+ ror ecx,9
+ mov DWORD [28+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [68+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [12+esp]
+ xor edx,ecx
+ mov edi,DWORD [16+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [28+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [24+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ ror ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [72+esp]
+ xor ebx,edi
+ ror ecx,2
+ add ebx,edx
+ add edx,DWORD [4+esp]
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [8+esp]
+ xor edx,ecx
+ mov edi,DWORD [12+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [4+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [24+esp]
+ mov esi,ebx
+ ror ecx,9
+ mov DWORD [20+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [76+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [4+esp]
+ xor edx,ecx
+ mov edi,DWORD [8+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [20+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [16+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ ror ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [80+esp]
+ xor ebx,edi
+ ror ecx,2
+ add ebx,edx
+ add edx,DWORD [28+esp]
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [esp]
+ xor edx,ecx
+ mov edi,DWORD [4+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [28+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [16+esp]
+ mov esi,ebx
+ ror ecx,9
+ mov DWORD [12+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [84+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [28+esp]
+ xor edx,ecx
+ mov edi,DWORD [esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [12+esp]
+ mov esi,eax
+ ror ecx,9
+ mov DWORD [8+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ ror ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [88+esp]
+ xor ebx,edi
+ ror ecx,2
+ add ebx,edx
+ add edx,DWORD [20+esp]
+ add ebx,ecx
+ mov ecx,edx
+ ror edx,14
+ mov esi,DWORD [24+esp]
+ xor edx,ecx
+ mov edi,DWORD [28+esp]
+ xor esi,edi
+ ror edx,5
+ and esi,ecx
+ mov DWORD [20+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ ror edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [8+esp]
+ mov esi,ebx
+ ror ecx,9
+ mov DWORD [4+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [esp]
+ ror ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [92+esp]
+ xor eax,edi
+ ror ecx,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,ecx
+ mov esi,DWORD [96+esp]
+ xor ebx,edi
+ mov ecx,DWORD [12+esp]
+ add eax,DWORD [esi]
+ add ebx,DWORD [4+esi]
+ add edi,DWORD [8+esi]
+ add ecx,DWORD [12+esi]
+ mov DWORD [esi],eax
+ mov DWORD [4+esi],ebx
+ mov DWORD [8+esi],edi
+ mov DWORD [12+esi],ecx
+ mov DWORD [4+esp],ebx
+ xor ebx,edi
+ mov DWORD [8+esp],edi
+ mov DWORD [12+esp],ecx
+ mov edi,DWORD [20+esp]
+ mov ecx,DWORD [24+esp]
+ add edx,DWORD [16+esi]
+ add edi,DWORD [20+esi]
+ add ecx,DWORD [24+esi]
+ mov DWORD [16+esi],edx
+ mov DWORD [20+esi],edi
+ mov DWORD [20+esp],edi
+ mov edi,DWORD [28+esp]
+ mov DWORD [24+esi],ecx
+ add edi,DWORD [28+esi]
+ mov DWORD [24+esp],ecx
+ mov DWORD [28+esi],edi
+ mov DWORD [28+esp],edi
+ mov edi,DWORD [100+esp]
+ movdqa xmm7,[64+ebp]
+ sub ebp,192
+ cmp edi,DWORD [104+esp]
+ jb NEAR L$012grand_ssse3
+ mov esp,DWORD [108+esp]
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 32
+L$005AVX:
+ and edx,264
+ cmp edx,264
+ je NEAR L$014AVX_BMI
+ lea esp,[esp-96]
+ vzeroall
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov ecx,DWORD [8+esi]
+ mov edi,DWORD [12+esi]
+ mov DWORD [4+esp],ebx
+ xor ebx,ecx
+ mov DWORD [8+esp],ecx
+ mov DWORD [12+esp],edi
+ mov edx,DWORD [16+esi]
+ mov edi,DWORD [20+esi]
+ mov ecx,DWORD [24+esi]
+ mov esi,DWORD [28+esi]
+ mov DWORD [20+esp],edi
+ mov edi,DWORD [100+esp]
+ mov DWORD [24+esp],ecx
+ mov DWORD [28+esp],esi
+ vmovdqa xmm7,[256+ebp]
+ jmp NEAR L$015grand_avx
+align 32
+L$015grand_avx:
+ vmovdqu xmm0,[edi]
+ vmovdqu xmm1,[16+edi]
+ vmovdqu xmm2,[32+edi]
+ vmovdqu xmm3,[48+edi]
+ add edi,64
+ vpshufb xmm0,xmm0,xmm7
+ mov DWORD [100+esp],edi
+ vpshufb xmm1,xmm1,xmm7
+ vpshufb xmm2,xmm2,xmm7
+ vpaddd xmm4,xmm0,[ebp]
+ vpshufb xmm3,xmm3,xmm7
+ vpaddd xmm5,xmm1,[16+ebp]
+ vpaddd xmm6,xmm2,[32+ebp]
+ vpaddd xmm7,xmm3,[48+ebp]
+ vmovdqa [32+esp],xmm4
+ vmovdqa [48+esp],xmm5
+ vmovdqa [64+esp],xmm6
+ vmovdqa [80+esp],xmm7
+ jmp NEAR L$016avx_00_47
+align 16
+L$016avx_00_47:
+ add ebp,64
+ vpalignr xmm4,xmm1,xmm0,4
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [20+esp]
+ vpalignr xmm7,xmm3,xmm2,4
+ xor edx,ecx
+ mov edi,DWORD [24+esp]
+ xor esi,edi
+ vpsrld xmm6,xmm4,7
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ vpaddd xmm0,xmm0,xmm7
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrld xmm7,xmm4,3
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [4+esp]
+ vpslld xmm5,xmm4,14
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [esp],eax
+ vpxor xmm4,xmm7,xmm6
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ vpshufd xmm7,xmm3,250
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ vpsrld xmm6,xmm6,11
+ add edx,DWORD [32+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ vpxor xmm4,xmm4,xmm5
+ add ebx,edx
+ add edx,DWORD [12+esp]
+ add ebx,ecx
+ vpslld xmm5,xmm5,11
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [16+esp]
+ vpxor xmm4,xmm4,xmm6
+ xor edx,ecx
+ mov edi,DWORD [20+esp]
+ xor esi,edi
+ vpsrld xmm6,xmm7,10
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [12+esp],ecx
+ vpxor xmm4,xmm4,xmm5
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrlq xmm5,xmm7,17
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [esp]
+ vpaddd xmm0,xmm0,xmm4
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [28+esp],ebx
+ vpxor xmm6,xmm6,xmm5
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ vpsrlq xmm7,xmm7,19
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ vpxor xmm6,xmm6,xmm7
+ add edx,DWORD [36+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ vpshufd xmm7,xmm6,132
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,ecx
+ vpsrldq xmm7,xmm7,8
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [12+esp]
+ vpaddd xmm0,xmm0,xmm7
+ xor edx,ecx
+ mov edi,DWORD [16+esp]
+ xor esi,edi
+ vpshufd xmm7,xmm0,80
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ vpsrld xmm6,xmm7,10
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrlq xmm5,xmm7,17
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [28+esp]
+ vpxor xmm6,xmm6,xmm5
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [24+esp],eax
+ vpsrlq xmm7,xmm7,19
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ vpxor xmm6,xmm6,xmm7
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ vpshufd xmm7,xmm6,232
+ add edx,DWORD [40+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ vpslldq xmm7,xmm7,8
+ add ebx,edx
+ add edx,DWORD [4+esp]
+ add ebx,ecx
+ vpaddd xmm0,xmm0,xmm7
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [8+esp]
+ vpaddd xmm6,xmm0,[ebp]
+ xor edx,ecx
+ mov edi,DWORD [12+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [4+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [24+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [20+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [44+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,ecx
+ vmovdqa [32+esp],xmm6
+ vpalignr xmm4,xmm2,xmm1,4
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [4+esp]
+ vpalignr xmm7,xmm0,xmm3,4
+ xor edx,ecx
+ mov edi,DWORD [8+esp]
+ xor esi,edi
+ vpsrld xmm6,xmm4,7
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ vpaddd xmm1,xmm1,xmm7
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrld xmm7,xmm4,3
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [20+esp]
+ vpslld xmm5,xmm4,14
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [16+esp],eax
+ vpxor xmm4,xmm7,xmm6
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ vpshufd xmm7,xmm0,250
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ vpsrld xmm6,xmm6,11
+ add edx,DWORD [48+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ vpxor xmm4,xmm4,xmm5
+ add ebx,edx
+ add edx,DWORD [28+esp]
+ add ebx,ecx
+ vpslld xmm5,xmm5,11
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [esp]
+ vpxor xmm4,xmm4,xmm6
+ xor edx,ecx
+ mov edi,DWORD [4+esp]
+ xor esi,edi
+ vpsrld xmm6,xmm7,10
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [28+esp],ecx
+ vpxor xmm4,xmm4,xmm5
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrlq xmm5,xmm7,17
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [16+esp]
+ vpaddd xmm1,xmm1,xmm4
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [12+esp],ebx
+ vpxor xmm6,xmm6,xmm5
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ vpsrlq xmm7,xmm7,19
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ vpxor xmm6,xmm6,xmm7
+ add edx,DWORD [52+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ vpshufd xmm7,xmm6,132
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,ecx
+ vpsrldq xmm7,xmm7,8
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [28+esp]
+ vpaddd xmm1,xmm1,xmm7
+ xor edx,ecx
+ mov edi,DWORD [esp]
+ xor esi,edi
+ vpshufd xmm7,xmm1,80
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ vpsrld xmm6,xmm7,10
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrlq xmm5,xmm7,17
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [12+esp]
+ vpxor xmm6,xmm6,xmm5
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [8+esp],eax
+ vpsrlq xmm7,xmm7,19
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ vpxor xmm6,xmm6,xmm7
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ vpshufd xmm7,xmm6,232
+ add edx,DWORD [56+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ vpslldq xmm7,xmm7,8
+ add ebx,edx
+ add edx,DWORD [20+esp]
+ add ebx,ecx
+ vpaddd xmm1,xmm1,xmm7
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [24+esp]
+ vpaddd xmm6,xmm1,[16+ebp]
+ xor edx,ecx
+ mov edi,DWORD [28+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [20+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [8+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [4+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [60+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,ecx
+ vmovdqa [48+esp],xmm6
+ vpalignr xmm4,xmm3,xmm2,4
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [20+esp]
+ vpalignr xmm7,xmm1,xmm0,4
+ xor edx,ecx
+ mov edi,DWORD [24+esp]
+ xor esi,edi
+ vpsrld xmm6,xmm4,7
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ vpaddd xmm2,xmm2,xmm7
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrld xmm7,xmm4,3
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [4+esp]
+ vpslld xmm5,xmm4,14
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [esp],eax
+ vpxor xmm4,xmm7,xmm6
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ vpshufd xmm7,xmm1,250
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ vpsrld xmm6,xmm6,11
+ add edx,DWORD [64+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ vpxor xmm4,xmm4,xmm5
+ add ebx,edx
+ add edx,DWORD [12+esp]
+ add ebx,ecx
+ vpslld xmm5,xmm5,11
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [16+esp]
+ vpxor xmm4,xmm4,xmm6
+ xor edx,ecx
+ mov edi,DWORD [20+esp]
+ xor esi,edi
+ vpsrld xmm6,xmm7,10
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [12+esp],ecx
+ vpxor xmm4,xmm4,xmm5
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrlq xmm5,xmm7,17
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [esp]
+ vpaddd xmm2,xmm2,xmm4
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [28+esp],ebx
+ vpxor xmm6,xmm6,xmm5
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ vpsrlq xmm7,xmm7,19
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ vpxor xmm6,xmm6,xmm7
+ add edx,DWORD [68+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ vpshufd xmm7,xmm6,132
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,ecx
+ vpsrldq xmm7,xmm7,8
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [12+esp]
+ vpaddd xmm2,xmm2,xmm7
+ xor edx,ecx
+ mov edi,DWORD [16+esp]
+ xor esi,edi
+ vpshufd xmm7,xmm2,80
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ vpsrld xmm6,xmm7,10
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrlq xmm5,xmm7,17
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [28+esp]
+ vpxor xmm6,xmm6,xmm5
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [24+esp],eax
+ vpsrlq xmm7,xmm7,19
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ vpxor xmm6,xmm6,xmm7
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ vpshufd xmm7,xmm6,232
+ add edx,DWORD [72+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ vpslldq xmm7,xmm7,8
+ add ebx,edx
+ add edx,DWORD [4+esp]
+ add ebx,ecx
+ vpaddd xmm2,xmm2,xmm7
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [8+esp]
+ vpaddd xmm6,xmm2,[32+ebp]
+ xor edx,ecx
+ mov edi,DWORD [12+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [4+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [24+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [20+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [76+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,ecx
+ vmovdqa [64+esp],xmm6
+ vpalignr xmm4,xmm0,xmm3,4
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [4+esp]
+ vpalignr xmm7,xmm2,xmm1,4
+ xor edx,ecx
+ mov edi,DWORD [8+esp]
+ xor esi,edi
+ vpsrld xmm6,xmm4,7
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ vpaddd xmm3,xmm3,xmm7
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrld xmm7,xmm4,3
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [20+esp]
+ vpslld xmm5,xmm4,14
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [16+esp],eax
+ vpxor xmm4,xmm7,xmm6
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ vpshufd xmm7,xmm2,250
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ vpsrld xmm6,xmm6,11
+ add edx,DWORD [80+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ vpxor xmm4,xmm4,xmm5
+ add ebx,edx
+ add edx,DWORD [28+esp]
+ add ebx,ecx
+ vpslld xmm5,xmm5,11
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [esp]
+ vpxor xmm4,xmm4,xmm6
+ xor edx,ecx
+ mov edi,DWORD [4+esp]
+ xor esi,edi
+ vpsrld xmm6,xmm7,10
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [28+esp],ecx
+ vpxor xmm4,xmm4,xmm5
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrlq xmm5,xmm7,17
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [16+esp]
+ vpaddd xmm3,xmm3,xmm4
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [12+esp],ebx
+ vpxor xmm6,xmm6,xmm5
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ vpsrlq xmm7,xmm7,19
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ vpxor xmm6,xmm6,xmm7
+ add edx,DWORD [84+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ vpshufd xmm7,xmm6,132
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,ecx
+ vpsrldq xmm7,xmm7,8
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [28+esp]
+ vpaddd xmm3,xmm3,xmm7
+ xor edx,ecx
+ mov edi,DWORD [esp]
+ xor esi,edi
+ vpshufd xmm7,xmm3,80
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ vpsrld xmm6,xmm7,10
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ vpsrlq xmm5,xmm7,17
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [12+esp]
+ vpxor xmm6,xmm6,xmm5
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [8+esp],eax
+ vpsrlq xmm7,xmm7,19
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ vpxor xmm6,xmm6,xmm7
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ vpshufd xmm7,xmm6,232
+ add edx,DWORD [88+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ vpslldq xmm7,xmm7,8
+ add ebx,edx
+ add edx,DWORD [20+esp]
+ add ebx,ecx
+ vpaddd xmm3,xmm3,xmm7
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [24+esp]
+ vpaddd xmm6,xmm3,[48+ebp]
+ xor edx,ecx
+ mov edi,DWORD [28+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [20+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [8+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [4+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [92+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,ecx
+ vmovdqa [80+esp],xmm6
+ cmp DWORD [64+ebp],66051
+ jne NEAR L$016avx_00_47
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [20+esp]
+ xor edx,ecx
+ mov edi,DWORD [24+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [4+esp]
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [32+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ add ebx,edx
+ add edx,DWORD [12+esp]
+ add ebx,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [16+esp]
+ xor edx,ecx
+ mov edi,DWORD [20+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [12+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [28+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [36+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [12+esp]
+ xor edx,ecx
+ mov edi,DWORD [16+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [28+esp]
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [24+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [40+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ add ebx,edx
+ add edx,DWORD [4+esp]
+ add ebx,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [8+esp]
+ xor edx,ecx
+ mov edi,DWORD [12+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [4+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [24+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [20+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [44+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [4+esp]
+ xor edx,ecx
+ mov edi,DWORD [8+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [20+esp]
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [16+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [48+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ add ebx,edx
+ add edx,DWORD [28+esp]
+ add ebx,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [esp]
+ xor edx,ecx
+ mov edi,DWORD [4+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [28+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [16+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [12+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [52+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [28+esp]
+ xor edx,ecx
+ mov edi,DWORD [esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [12+esp]
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [8+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [56+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ add ebx,edx
+ add edx,DWORD [20+esp]
+ add ebx,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [24+esp]
+ xor edx,ecx
+ mov edi,DWORD [28+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [20+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [8+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [4+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [60+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [20+esp]
+ xor edx,ecx
+ mov edi,DWORD [24+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [16+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [4+esp]
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [64+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ add ebx,edx
+ add edx,DWORD [12+esp]
+ add ebx,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [16+esp]
+ xor edx,ecx
+ mov edi,DWORD [20+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [12+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [28+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [68+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [8+esp]
+ add eax,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [12+esp]
+ xor edx,ecx
+ mov edi,DWORD [16+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [8+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [28+esp]
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [24+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [72+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ add ebx,edx
+ add edx,DWORD [4+esp]
+ add ebx,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [8+esp]
+ xor edx,ecx
+ mov edi,DWORD [12+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [4+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [24+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [20+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [76+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [esp]
+ add eax,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [4+esp]
+ xor edx,ecx
+ mov edi,DWORD [8+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [20+esp]
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [16+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [80+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ add ebx,edx
+ add edx,DWORD [28+esp]
+ add ebx,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [esp]
+ xor edx,ecx
+ mov edi,DWORD [4+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [28+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [16+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [12+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [84+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [24+esp]
+ add eax,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [28+esp]
+ xor edx,ecx
+ mov edi,DWORD [esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [24+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,eax
+ add edx,edi
+ mov edi,DWORD [12+esp]
+ mov esi,eax
+ shrd ecx,ecx,9
+ mov DWORD [8+esp],eax
+ xor ecx,eax
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ shrd ecx,ecx,11
+ and ebx,eax
+ xor ecx,esi
+ add edx,DWORD [88+esp]
+ xor ebx,edi
+ shrd ecx,ecx,2
+ add ebx,edx
+ add edx,DWORD [20+esp]
+ add ebx,ecx
+ mov ecx,edx
+ shrd edx,edx,14
+ mov esi,DWORD [24+esp]
+ xor edx,ecx
+ mov edi,DWORD [28+esp]
+ xor esi,edi
+ shrd edx,edx,5
+ and esi,ecx
+ mov DWORD [20+esp],ecx
+ xor edx,ecx
+ xor edi,esi
+ shrd edx,edx,6
+ mov ecx,ebx
+ add edx,edi
+ mov edi,DWORD [8+esp]
+ mov esi,ebx
+ shrd ecx,ecx,9
+ mov DWORD [4+esp],ebx
+ xor ecx,ebx
+ xor ebx,edi
+ add edx,DWORD [esp]
+ shrd ecx,ecx,11
+ and eax,ebx
+ xor ecx,esi
+ add edx,DWORD [92+esp]
+ xor eax,edi
+ shrd ecx,ecx,2
+ add eax,edx
+ add edx,DWORD [16+esp]
+ add eax,ecx
+ mov esi,DWORD [96+esp]
+ xor ebx,edi
+ mov ecx,DWORD [12+esp]
+ add eax,DWORD [esi]
+ add ebx,DWORD [4+esi]
+ add edi,DWORD [8+esi]
+ add ecx,DWORD [12+esi]
+ mov DWORD [esi],eax
+ mov DWORD [4+esi],ebx
+ mov DWORD [8+esi],edi
+ mov DWORD [12+esi],ecx
+ mov DWORD [4+esp],ebx
+ xor ebx,edi
+ mov DWORD [8+esp],edi
+ mov DWORD [12+esp],ecx
+ mov edi,DWORD [20+esp]
+ mov ecx,DWORD [24+esp]
+ add edx,DWORD [16+esi]
+ add edi,DWORD [20+esi]
+ add ecx,DWORD [24+esi]
+ mov DWORD [16+esi],edx
+ mov DWORD [20+esi],edi
+ mov DWORD [20+esp],edi
+ mov edi,DWORD [28+esp]
+ mov DWORD [24+esi],ecx
+ add edi,DWORD [28+esi]
+ mov DWORD [24+esp],ecx
+ mov DWORD [28+esi],edi
+ mov DWORD [28+esp],edi
+ mov edi,DWORD [100+esp]
+ vmovdqa xmm7,[64+ebp]
+ sub ebp,192
+ cmp edi,DWORD [104+esp]
+ jb NEAR L$015grand_avx
+ mov esp,DWORD [108+esp]
+ vzeroall
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 32
+L$014AVX_BMI:
+ lea esp,[esp-96]
+ vzeroall
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov ecx,DWORD [8+esi]
+ mov edi,DWORD [12+esi]
+ mov DWORD [4+esp],ebx
+ xor ebx,ecx
+ mov DWORD [8+esp],ecx
+ mov DWORD [12+esp],edi
+ mov edx,DWORD [16+esi]
+ mov edi,DWORD [20+esi]
+ mov ecx,DWORD [24+esi]
+ mov esi,DWORD [28+esi]
+ mov DWORD [20+esp],edi
+ mov edi,DWORD [100+esp]
+ mov DWORD [24+esp],ecx
+ mov DWORD [28+esp],esi
+ vmovdqa xmm7,[256+ebp]
+ jmp NEAR L$017grand_avx_bmi
+align 32
+L$017grand_avx_bmi:
+ vmovdqu xmm0,[edi]
+ vmovdqu xmm1,[16+edi]
+ vmovdqu xmm2,[32+edi]
+ vmovdqu xmm3,[48+edi]
+ add edi,64
+ vpshufb xmm0,xmm0,xmm7
+ mov DWORD [100+esp],edi
+ vpshufb xmm1,xmm1,xmm7
+ vpshufb xmm2,xmm2,xmm7
+ vpaddd xmm4,xmm0,[ebp]
+ vpshufb xmm3,xmm3,xmm7
+ vpaddd xmm5,xmm1,[16+ebp]
+ vpaddd xmm6,xmm2,[32+ebp]
+ vpaddd xmm7,xmm3,[48+ebp]
+ vmovdqa [32+esp],xmm4
+ vmovdqa [48+esp],xmm5
+ vmovdqa [64+esp],xmm6
+ vmovdqa [80+esp],xmm7
+ jmp NEAR L$018avx_bmi_00_47
+align 16
+L$018avx_bmi_00_47:
+ add ebp,64
+ vpalignr xmm4,xmm1,xmm0,4
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [16+esp],edx
+ vpalignr xmm7,xmm3,xmm2,4
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [24+esp]
+ vpsrld xmm6,xmm4,7
+ xor ecx,edi
+ and edx,DWORD [20+esp]
+ mov DWORD [esp],eax
+ vpaddd xmm0,xmm0,xmm7
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ vpsrld xmm7,xmm4,3
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ vpslld xmm5,xmm4,14
+ mov edi,DWORD [4+esp]
+ xor ecx,esi
+ xor eax,edi
+ vpxor xmm4,xmm7,xmm6
+ add edx,DWORD [28+esp]
+ and ebx,eax
+ add edx,DWORD [32+esp]
+ vpshufd xmm7,xmm3,250
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [12+esp]
+ vpsrld xmm6,xmm6,11
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpxor xmm4,xmm4,xmm5
+ mov DWORD [12+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpslld xmm5,xmm5,11
+ andn esi,edx,DWORD [20+esp]
+ xor ecx,edi
+ and edx,DWORD [16+esp]
+ vpxor xmm4,xmm4,xmm6
+ mov DWORD [28+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ vpsrld xmm6,xmm7,10
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ vpxor xmm4,xmm4,xmm5
+ mov edi,DWORD [esp]
+ xor ecx,esi
+ xor ebx,edi
+ vpsrlq xmm5,xmm7,17
+ add edx,DWORD [24+esp]
+ and eax,ebx
+ add edx,DWORD [36+esp]
+ vpaddd xmm0,xmm0,xmm4
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [8+esp]
+ vpxor xmm6,xmm6,xmm5
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpsrlq xmm7,xmm7,19
+ mov DWORD [8+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpxor xmm6,xmm6,xmm7
+ andn esi,edx,DWORD [16+esp]
+ xor ecx,edi
+ and edx,DWORD [12+esp]
+ vpshufd xmm7,xmm6,132
+ mov DWORD [24+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ vpsrldq xmm7,xmm7,8
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ vpaddd xmm0,xmm0,xmm7
+ mov edi,DWORD [28+esp]
+ xor ecx,esi
+ xor eax,edi
+ vpshufd xmm7,xmm0,80
+ add edx,DWORD [20+esp]
+ and ebx,eax
+ add edx,DWORD [40+esp]
+ vpsrld xmm6,xmm7,10
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [4+esp]
+ vpsrlq xmm5,xmm7,17
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpxor xmm6,xmm6,xmm5
+ mov DWORD [4+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpsrlq xmm7,xmm7,19
+ andn esi,edx,DWORD [12+esp]
+ xor ecx,edi
+ and edx,DWORD [8+esp]
+ vpxor xmm6,xmm6,xmm7
+ mov DWORD [20+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ vpshufd xmm7,xmm6,232
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ vpslldq xmm7,xmm7,8
+ mov edi,DWORD [24+esp]
+ xor ecx,esi
+ xor ebx,edi
+ vpaddd xmm0,xmm0,xmm7
+ add edx,DWORD [16+esp]
+ and eax,ebx
+ add edx,DWORD [44+esp]
+ vpaddd xmm6,xmm0,[ebp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [esp]
+ lea eax,[ecx*1+eax]
+ vmovdqa [32+esp],xmm6
+ vpalignr xmm4,xmm2,xmm1,4
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [esp],edx
+ vpalignr xmm7,xmm0,xmm3,4
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [8+esp]
+ vpsrld xmm6,xmm4,7
+ xor ecx,edi
+ and edx,DWORD [4+esp]
+ mov DWORD [16+esp],eax
+ vpaddd xmm1,xmm1,xmm7
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ vpsrld xmm7,xmm4,3
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ vpslld xmm5,xmm4,14
+ mov edi,DWORD [20+esp]
+ xor ecx,esi
+ xor eax,edi
+ vpxor xmm4,xmm7,xmm6
+ add edx,DWORD [12+esp]
+ and ebx,eax
+ add edx,DWORD [48+esp]
+ vpshufd xmm7,xmm0,250
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [28+esp]
+ vpsrld xmm6,xmm6,11
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpxor xmm4,xmm4,xmm5
+ mov DWORD [28+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpslld xmm5,xmm5,11
+ andn esi,edx,DWORD [4+esp]
+ xor ecx,edi
+ and edx,DWORD [esp]
+ vpxor xmm4,xmm4,xmm6
+ mov DWORD [12+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ vpsrld xmm6,xmm7,10
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ vpxor xmm4,xmm4,xmm5
+ mov edi,DWORD [16+esp]
+ xor ecx,esi
+ xor ebx,edi
+ vpsrlq xmm5,xmm7,17
+ add edx,DWORD [8+esp]
+ and eax,ebx
+ add edx,DWORD [52+esp]
+ vpaddd xmm1,xmm1,xmm4
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [24+esp]
+ vpxor xmm6,xmm6,xmm5
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpsrlq xmm7,xmm7,19
+ mov DWORD [24+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpxor xmm6,xmm6,xmm7
+ andn esi,edx,DWORD [esp]
+ xor ecx,edi
+ and edx,DWORD [28+esp]
+ vpshufd xmm7,xmm6,132
+ mov DWORD [8+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ vpsrldq xmm7,xmm7,8
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ vpaddd xmm1,xmm1,xmm7
+ mov edi,DWORD [12+esp]
+ xor ecx,esi
+ xor eax,edi
+ vpshufd xmm7,xmm1,80
+ add edx,DWORD [4+esp]
+ and ebx,eax
+ add edx,DWORD [56+esp]
+ vpsrld xmm6,xmm7,10
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [20+esp]
+ vpsrlq xmm5,xmm7,17
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpxor xmm6,xmm6,xmm5
+ mov DWORD [20+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpsrlq xmm7,xmm7,19
+ andn esi,edx,DWORD [28+esp]
+ xor ecx,edi
+ and edx,DWORD [24+esp]
+ vpxor xmm6,xmm6,xmm7
+ mov DWORD [4+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ vpshufd xmm7,xmm6,232
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ vpslldq xmm7,xmm7,8
+ mov edi,DWORD [8+esp]
+ xor ecx,esi
+ xor ebx,edi
+ vpaddd xmm1,xmm1,xmm7
+ add edx,DWORD [esp]
+ and eax,ebx
+ add edx,DWORD [60+esp]
+ vpaddd xmm6,xmm1,[16+ebp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [16+esp]
+ lea eax,[ecx*1+eax]
+ vmovdqa [48+esp],xmm6
+ vpalignr xmm4,xmm3,xmm2,4
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [16+esp],edx
+ vpalignr xmm7,xmm1,xmm0,4
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [24+esp]
+ vpsrld xmm6,xmm4,7
+ xor ecx,edi
+ and edx,DWORD [20+esp]
+ mov DWORD [esp],eax
+ vpaddd xmm2,xmm2,xmm7
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ vpsrld xmm7,xmm4,3
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ vpslld xmm5,xmm4,14
+ mov edi,DWORD [4+esp]
+ xor ecx,esi
+ xor eax,edi
+ vpxor xmm4,xmm7,xmm6
+ add edx,DWORD [28+esp]
+ and ebx,eax
+ add edx,DWORD [64+esp]
+ vpshufd xmm7,xmm1,250
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [12+esp]
+ vpsrld xmm6,xmm6,11
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpxor xmm4,xmm4,xmm5
+ mov DWORD [12+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpslld xmm5,xmm5,11
+ andn esi,edx,DWORD [20+esp]
+ xor ecx,edi
+ and edx,DWORD [16+esp]
+ vpxor xmm4,xmm4,xmm6
+ mov DWORD [28+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ vpsrld xmm6,xmm7,10
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ vpxor xmm4,xmm4,xmm5
+ mov edi,DWORD [esp]
+ xor ecx,esi
+ xor ebx,edi
+ vpsrlq xmm5,xmm7,17
+ add edx,DWORD [24+esp]
+ and eax,ebx
+ add edx,DWORD [68+esp]
+ vpaddd xmm2,xmm2,xmm4
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [8+esp]
+ vpxor xmm6,xmm6,xmm5
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpsrlq xmm7,xmm7,19
+ mov DWORD [8+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpxor xmm6,xmm6,xmm7
+ andn esi,edx,DWORD [16+esp]
+ xor ecx,edi
+ and edx,DWORD [12+esp]
+ vpshufd xmm7,xmm6,132
+ mov DWORD [24+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ vpsrldq xmm7,xmm7,8
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ vpaddd xmm2,xmm2,xmm7
+ mov edi,DWORD [28+esp]
+ xor ecx,esi
+ xor eax,edi
+ vpshufd xmm7,xmm2,80
+ add edx,DWORD [20+esp]
+ and ebx,eax
+ add edx,DWORD [72+esp]
+ vpsrld xmm6,xmm7,10
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [4+esp]
+ vpsrlq xmm5,xmm7,17
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpxor xmm6,xmm6,xmm5
+ mov DWORD [4+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpsrlq xmm7,xmm7,19
+ andn esi,edx,DWORD [12+esp]
+ xor ecx,edi
+ and edx,DWORD [8+esp]
+ vpxor xmm6,xmm6,xmm7
+ mov DWORD [20+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ vpshufd xmm7,xmm6,232
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ vpslldq xmm7,xmm7,8
+ mov edi,DWORD [24+esp]
+ xor ecx,esi
+ xor ebx,edi
+ vpaddd xmm2,xmm2,xmm7
+ add edx,DWORD [16+esp]
+ and eax,ebx
+ add edx,DWORD [76+esp]
+ vpaddd xmm6,xmm2,[32+ebp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [esp]
+ lea eax,[ecx*1+eax]
+ vmovdqa [64+esp],xmm6
+ vpalignr xmm4,xmm0,xmm3,4
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [esp],edx
+ vpalignr xmm7,xmm2,xmm1,4
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [8+esp]
+ vpsrld xmm6,xmm4,7
+ xor ecx,edi
+ and edx,DWORD [4+esp]
+ mov DWORD [16+esp],eax
+ vpaddd xmm3,xmm3,xmm7
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ vpsrld xmm7,xmm4,3
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ vpslld xmm5,xmm4,14
+ mov edi,DWORD [20+esp]
+ xor ecx,esi
+ xor eax,edi
+ vpxor xmm4,xmm7,xmm6
+ add edx,DWORD [12+esp]
+ and ebx,eax
+ add edx,DWORD [80+esp]
+ vpshufd xmm7,xmm2,250
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [28+esp]
+ vpsrld xmm6,xmm6,11
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpxor xmm4,xmm4,xmm5
+ mov DWORD [28+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpslld xmm5,xmm5,11
+ andn esi,edx,DWORD [4+esp]
+ xor ecx,edi
+ and edx,DWORD [esp]
+ vpxor xmm4,xmm4,xmm6
+ mov DWORD [12+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ vpsrld xmm6,xmm7,10
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ vpxor xmm4,xmm4,xmm5
+ mov edi,DWORD [16+esp]
+ xor ecx,esi
+ xor ebx,edi
+ vpsrlq xmm5,xmm7,17
+ add edx,DWORD [8+esp]
+ and eax,ebx
+ add edx,DWORD [84+esp]
+ vpaddd xmm3,xmm3,xmm4
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [24+esp]
+ vpxor xmm6,xmm6,xmm5
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpsrlq xmm7,xmm7,19
+ mov DWORD [24+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpxor xmm6,xmm6,xmm7
+ andn esi,edx,DWORD [esp]
+ xor ecx,edi
+ and edx,DWORD [28+esp]
+ vpshufd xmm7,xmm6,132
+ mov DWORD [8+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ vpsrldq xmm7,xmm7,8
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ vpaddd xmm3,xmm3,xmm7
+ mov edi,DWORD [12+esp]
+ xor ecx,esi
+ xor eax,edi
+ vpshufd xmm7,xmm3,80
+ add edx,DWORD [4+esp]
+ and ebx,eax
+ add edx,DWORD [88+esp]
+ vpsrld xmm6,xmm7,10
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [20+esp]
+ vpsrlq xmm5,xmm7,17
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ vpxor xmm6,xmm6,xmm5
+ mov DWORD [20+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ vpsrlq xmm7,xmm7,19
+ andn esi,edx,DWORD [28+esp]
+ xor ecx,edi
+ and edx,DWORD [24+esp]
+ vpxor xmm6,xmm6,xmm7
+ mov DWORD [4+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ vpshufd xmm7,xmm6,232
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ vpslldq xmm7,xmm7,8
+ mov edi,DWORD [8+esp]
+ xor ecx,esi
+ xor ebx,edi
+ vpaddd xmm3,xmm3,xmm7
+ add edx,DWORD [esp]
+ and eax,ebx
+ add edx,DWORD [92+esp]
+ vpaddd xmm6,xmm3,[48+ebp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [16+esp]
+ lea eax,[ecx*1+eax]
+ vmovdqa [80+esp],xmm6
+ cmp DWORD [64+ebp],66051
+ jne NEAR L$018avx_bmi_00_47
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [16+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [24+esp]
+ xor ecx,edi
+ and edx,DWORD [20+esp]
+ mov DWORD [esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ mov edi,DWORD [4+esp]
+ xor ecx,esi
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ and ebx,eax
+ add edx,DWORD [32+esp]
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [12+esp]
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [12+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [20+esp]
+ xor ecx,edi
+ and edx,DWORD [16+esp]
+ mov DWORD [28+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ mov edi,DWORD [esp]
+ xor ecx,esi
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ and eax,ebx
+ add edx,DWORD [36+esp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [8+esp]
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [8+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [16+esp]
+ xor ecx,edi
+ and edx,DWORD [12+esp]
+ mov DWORD [24+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ mov edi,DWORD [28+esp]
+ xor ecx,esi
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ and ebx,eax
+ add edx,DWORD [40+esp]
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [4+esp]
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [4+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [12+esp]
+ xor ecx,edi
+ and edx,DWORD [8+esp]
+ mov DWORD [20+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ mov edi,DWORD [24+esp]
+ xor ecx,esi
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ and eax,ebx
+ add edx,DWORD [44+esp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [esp]
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [8+esp]
+ xor ecx,edi
+ and edx,DWORD [4+esp]
+ mov DWORD [16+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ mov edi,DWORD [20+esp]
+ xor ecx,esi
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ and ebx,eax
+ add edx,DWORD [48+esp]
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [28+esp]
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [28+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [4+esp]
+ xor ecx,edi
+ and edx,DWORD [esp]
+ mov DWORD [12+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ mov edi,DWORD [16+esp]
+ xor ecx,esi
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ and eax,ebx
+ add edx,DWORD [52+esp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [24+esp]
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [24+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [esp]
+ xor ecx,edi
+ and edx,DWORD [28+esp]
+ mov DWORD [8+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ mov edi,DWORD [12+esp]
+ xor ecx,esi
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ and ebx,eax
+ add edx,DWORD [56+esp]
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [20+esp]
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [20+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [28+esp]
+ xor ecx,edi
+ and edx,DWORD [24+esp]
+ mov DWORD [4+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ mov edi,DWORD [8+esp]
+ xor ecx,esi
+ xor ebx,edi
+ add edx,DWORD [esp]
+ and eax,ebx
+ add edx,DWORD [60+esp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [16+esp]
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [16+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [24+esp]
+ xor ecx,edi
+ and edx,DWORD [20+esp]
+ mov DWORD [esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ mov edi,DWORD [4+esp]
+ xor ecx,esi
+ xor eax,edi
+ add edx,DWORD [28+esp]
+ and ebx,eax
+ add edx,DWORD [64+esp]
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [12+esp]
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [12+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [20+esp]
+ xor ecx,edi
+ and edx,DWORD [16+esp]
+ mov DWORD [28+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ mov edi,DWORD [esp]
+ xor ecx,esi
+ xor ebx,edi
+ add edx,DWORD [24+esp]
+ and eax,ebx
+ add edx,DWORD [68+esp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [8+esp]
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [8+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [16+esp]
+ xor ecx,edi
+ and edx,DWORD [12+esp]
+ mov DWORD [24+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ mov edi,DWORD [28+esp]
+ xor ecx,esi
+ xor eax,edi
+ add edx,DWORD [20+esp]
+ and ebx,eax
+ add edx,DWORD [72+esp]
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [4+esp]
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [4+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [12+esp]
+ xor ecx,edi
+ and edx,DWORD [8+esp]
+ mov DWORD [20+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ mov edi,DWORD [24+esp]
+ xor ecx,esi
+ xor ebx,edi
+ add edx,DWORD [16+esp]
+ and eax,ebx
+ add edx,DWORD [76+esp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [esp]
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [8+esp]
+ xor ecx,edi
+ and edx,DWORD [4+esp]
+ mov DWORD [16+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ mov edi,DWORD [20+esp]
+ xor ecx,esi
+ xor eax,edi
+ add edx,DWORD [12+esp]
+ and ebx,eax
+ add edx,DWORD [80+esp]
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [28+esp]
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [28+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [4+esp]
+ xor ecx,edi
+ and edx,DWORD [esp]
+ mov DWORD [12+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ mov edi,DWORD [16+esp]
+ xor ecx,esi
+ xor ebx,edi
+ add edx,DWORD [8+esp]
+ and eax,ebx
+ add edx,DWORD [84+esp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [24+esp]
+ lea eax,[ecx*1+eax]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [24+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [esp]
+ xor ecx,edi
+ and edx,DWORD [28+esp]
+ mov DWORD [8+esp],eax
+ or edx,esi
+ rorx edi,eax,2
+ rorx esi,eax,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,eax,22
+ xor esi,edi
+ mov edi,DWORD [12+esp]
+ xor ecx,esi
+ xor eax,edi
+ add edx,DWORD [4+esp]
+ and ebx,eax
+ add edx,DWORD [88+esp]
+ xor ebx,edi
+ add ecx,edx
+ add edx,DWORD [20+esp]
+ lea ebx,[ecx*1+ebx]
+ rorx ecx,edx,6
+ rorx esi,edx,11
+ mov DWORD [20+esp],edx
+ rorx edi,edx,25
+ xor ecx,esi
+ andn esi,edx,DWORD [28+esp]
+ xor ecx,edi
+ and edx,DWORD [24+esp]
+ mov DWORD [4+esp],ebx
+ or edx,esi
+ rorx edi,ebx,2
+ rorx esi,ebx,13
+ lea edx,[ecx*1+edx]
+ rorx ecx,ebx,22
+ xor esi,edi
+ mov edi,DWORD [8+esp]
+ xor ecx,esi
+ xor ebx,edi
+ add edx,DWORD [esp]
+ and eax,ebx
+ add edx,DWORD [92+esp]
+ xor eax,edi
+ add ecx,edx
+ add edx,DWORD [16+esp]
+ lea eax,[ecx*1+eax]
+ mov esi,DWORD [96+esp]
+ xor ebx,edi
+ mov ecx,DWORD [12+esp]
+ add eax,DWORD [esi]
+ add ebx,DWORD [4+esi]
+ add edi,DWORD [8+esi]
+ add ecx,DWORD [12+esi]
+ mov DWORD [esi],eax
+ mov DWORD [4+esi],ebx
+ mov DWORD [8+esi],edi
+ mov DWORD [12+esi],ecx
+ mov DWORD [4+esp],ebx
+ xor ebx,edi
+ mov DWORD [8+esp],edi
+ mov DWORD [12+esp],ecx
+ mov edi,DWORD [20+esp]
+ mov ecx,DWORD [24+esp]
+ add edx,DWORD [16+esi]
+ add edi,DWORD [20+esi]
+ add ecx,DWORD [24+esi]
+ mov DWORD [16+esi],edx
+ mov DWORD [20+esi],edi
+ mov DWORD [20+esp],edi
+ mov edi,DWORD [28+esp]
+ mov DWORD [24+esi],ecx
+ add edi,DWORD [28+esi]
+ mov DWORD [24+esp],ecx
+ mov DWORD [28+esi],edi
+ mov DWORD [28+esp],edi
+ mov edi,DWORD [100+esp]
+ vmovdqa xmm7,[64+ebp]
+ sub ebp,192
+ cmp edi,DWORD [104+esp]
+ jb NEAR L$017grand_avx_bmi
+ mov esp,DWORD [108+esp]
+ vzeroall
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+segment .bss
+common _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm
new file mode 100644
index 0000000000..f80f1cca53
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm
@@ -0,0 +1,2842 @@
+; Copyright 2007-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+;extern _OPENSSL_ia32cap_P
+global _sha512_block_data_order
+align 16
+_sha512_block_data_order:
+L$_sha512_block_data_order_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov esi,DWORD [20+esp]
+ mov edi,DWORD [24+esp]
+ mov eax,DWORD [28+esp]
+ mov ebx,esp
+ call L$000pic_point
+L$000pic_point:
+ pop ebp
+ lea ebp,[(L$001K512-L$000pic_point)+ebp]
+ sub esp,16
+ and esp,-64
+ shl eax,7
+ add eax,edi
+ mov DWORD [esp],esi
+ mov DWORD [4+esp],edi
+ mov DWORD [8+esp],eax
+ mov DWORD [12+esp],ebx
+ lea edx,[_OPENSSL_ia32cap_P]
+ mov ecx,DWORD [edx]
+ test ecx,67108864
+ jz NEAR L$002loop_x86
+ mov edx,DWORD [4+edx]
+ movq mm0,[esi]
+ and ecx,16777216
+ movq mm1,[8+esi]
+ and edx,512
+ movq mm2,[16+esi]
+ or ecx,edx
+ movq mm3,[24+esi]
+ movq mm4,[32+esi]
+ movq mm5,[40+esi]
+ movq mm6,[48+esi]
+ movq mm7,[56+esi]
+ cmp ecx,16777728
+ je NEAR L$003SSSE3
+ sub esp,80
+ jmp NEAR L$004loop_sse2
+align 16
+L$004loop_sse2:
+ movq [8+esp],mm1
+ movq [16+esp],mm2
+ movq [24+esp],mm3
+ movq [40+esp],mm5
+ movq [48+esp],mm6
+ pxor mm2,mm1
+ movq [56+esp],mm7
+ movq mm3,mm0
+ mov eax,DWORD [edi]
+ mov ebx,DWORD [4+edi]
+ add edi,8
+ mov edx,15
+ bswap eax
+ bswap ebx
+ jmp NEAR L$00500_14_sse2
+align 16
+L$00500_14_sse2:
+ movd mm1,eax
+ mov eax,DWORD [edi]
+ movd mm7,ebx
+ mov ebx,DWORD [4+edi]
+ add edi,8
+ bswap eax
+ bswap ebx
+ punpckldq mm7,mm1
+ movq mm1,mm4
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [32+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ movq mm0,mm3
+ movq [72+esp],mm7
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[56+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ paddq mm7,[ebp]
+ pxor mm3,mm4
+ movq mm4,[24+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[8+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ sub esp,8
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[40+esp]
+ paddq mm3,mm2
+ movq mm2,mm0
+ add ebp,8
+ paddq mm3,mm6
+ movq mm6,[48+esp]
+ dec edx
+ jnz NEAR L$00500_14_sse2
+ movd mm1,eax
+ movd mm7,ebx
+ punpckldq mm7,mm1
+ movq mm1,mm4
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [32+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ movq mm0,mm3
+ movq [72+esp],mm7
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[56+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ paddq mm7,[ebp]
+ pxor mm3,mm4
+ movq mm4,[24+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[8+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ sub esp,8
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm7,[192+esp]
+ paddq mm3,mm2
+ movq mm2,mm0
+ add ebp,8
+ paddq mm3,mm6
+ pxor mm0,mm0
+ mov edx,32
+ jmp NEAR L$00616_79_sse2
+align 16
+L$00616_79_sse2:
+ movq mm5,[88+esp]
+ movq mm1,mm7
+ psrlq mm7,1
+ movq mm6,mm5
+ psrlq mm5,6
+ psllq mm1,56
+ paddq mm0,mm3
+ movq mm3,mm7
+ psrlq mm7,6
+ pxor mm3,mm1
+ psllq mm1,7
+ pxor mm3,mm7
+ psrlq mm7,1
+ pxor mm3,mm1
+ movq mm1,mm5
+ psrlq mm5,13
+ pxor mm7,mm3
+ psllq mm6,3
+ pxor mm1,mm5
+ paddq mm7,[200+esp]
+ pxor mm1,mm6
+ psrlq mm5,42
+ paddq mm7,[128+esp]
+ pxor mm1,mm5
+ psllq mm6,42
+ movq mm5,[40+esp]
+ pxor mm1,mm6
+ movq mm6,[48+esp]
+ paddq mm7,mm1
+ movq mm1,mm4
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [32+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ movq [72+esp],mm7
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[56+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ paddq mm7,[ebp]
+ pxor mm3,mm4
+ movq mm4,[24+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[8+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ sub esp,8
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm7,[192+esp]
+ paddq mm2,mm6
+ add ebp,8
+ movq mm5,[88+esp]
+ movq mm1,mm7
+ psrlq mm7,1
+ movq mm6,mm5
+ psrlq mm5,6
+ psllq mm1,56
+ paddq mm2,mm3
+ movq mm3,mm7
+ psrlq mm7,6
+ pxor mm3,mm1
+ psllq mm1,7
+ pxor mm3,mm7
+ psrlq mm7,1
+ pxor mm3,mm1
+ movq mm1,mm5
+ psrlq mm5,13
+ pxor mm7,mm3
+ psllq mm6,3
+ pxor mm1,mm5
+ paddq mm7,[200+esp]
+ pxor mm1,mm6
+ psrlq mm5,42
+ paddq mm7,[128+esp]
+ pxor mm1,mm5
+ psllq mm6,42
+ movq mm5,[40+esp]
+ pxor mm1,mm6
+ movq mm6,[48+esp]
+ paddq mm7,mm1
+ movq mm1,mm4
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [32+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ movq [72+esp],mm7
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[56+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ paddq mm7,[ebp]
+ pxor mm3,mm4
+ movq mm4,[24+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[8+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ sub esp,8
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm7,[192+esp]
+ paddq mm0,mm6
+ add ebp,8
+ dec edx
+ jnz NEAR L$00616_79_sse2
+ paddq mm0,mm3
+ movq mm1,[8+esp]
+ movq mm3,[24+esp]
+ movq mm5,[40+esp]
+ movq mm6,[48+esp]
+ movq mm7,[56+esp]
+ pxor mm2,mm1
+ paddq mm0,[esi]
+ paddq mm1,[8+esi]
+ paddq mm2,[16+esi]
+ paddq mm3,[24+esi]
+ paddq mm4,[32+esi]
+ paddq mm5,[40+esi]
+ paddq mm6,[48+esi]
+ paddq mm7,[56+esi]
+ mov eax,640
+ movq [esi],mm0
+ movq [8+esi],mm1
+ movq [16+esi],mm2
+ movq [24+esi],mm3
+ movq [32+esi],mm4
+ movq [40+esi],mm5
+ movq [48+esi],mm6
+ movq [56+esi],mm7
+ lea esp,[eax*1+esp]
+ sub ebp,eax
+ cmp edi,DWORD [88+esp]
+ jb NEAR L$004loop_sse2
+ mov esp,DWORD [92+esp]
+ emms
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 32
+L$003SSSE3:
+ lea edx,[esp-64]
+ sub esp,256
+ movdqa xmm1,[640+ebp]
+ movdqu xmm0,[edi]
+db 102,15,56,0,193
+ movdqa xmm3,[ebp]
+ movdqa xmm2,xmm1
+ movdqu xmm1,[16+edi]
+ paddq xmm3,xmm0
+db 102,15,56,0,202
+ movdqa [edx-128],xmm3
+ movdqa xmm4,[16+ebp]
+ movdqa xmm3,xmm2
+ movdqu xmm2,[32+edi]
+ paddq xmm4,xmm1
+db 102,15,56,0,211
+ movdqa [edx-112],xmm4
+ movdqa xmm5,[32+ebp]
+ movdqa xmm4,xmm3
+ movdqu xmm3,[48+edi]
+ paddq xmm5,xmm2
+db 102,15,56,0,220
+ movdqa [edx-96],xmm5
+ movdqa xmm6,[48+ebp]
+ movdqa xmm5,xmm4
+ movdqu xmm4,[64+edi]
+ paddq xmm6,xmm3
+db 102,15,56,0,229
+ movdqa [edx-80],xmm6
+ movdqa xmm7,[64+ebp]
+ movdqa xmm6,xmm5
+ movdqu xmm5,[80+edi]
+ paddq xmm7,xmm4
+db 102,15,56,0,238
+ movdqa [edx-64],xmm7
+ movdqa [edx],xmm0
+ movdqa xmm0,[80+ebp]
+ movdqa xmm7,xmm6
+ movdqu xmm6,[96+edi]
+ paddq xmm0,xmm5
+db 102,15,56,0,247
+ movdqa [edx-48],xmm0
+ movdqa [16+edx],xmm1
+ movdqa xmm1,[96+ebp]
+ movdqa xmm0,xmm7
+ movdqu xmm7,[112+edi]
+ paddq xmm1,xmm6
+db 102,15,56,0,248
+ movdqa [edx-32],xmm1
+ movdqa [32+edx],xmm2
+ movdqa xmm2,[112+ebp]
+ movdqa xmm0,[edx]
+ paddq xmm2,xmm7
+ movdqa [edx-16],xmm2
+ nop
+align 32
+L$007loop_ssse3:
+ movdqa xmm2,[16+edx]
+ movdqa [48+edx],xmm3
+ lea ebp,[128+ebp]
+ movq [8+esp],mm1
+ mov ebx,edi
+ movq [16+esp],mm2
+ lea edi,[128+edi]
+ movq [24+esp],mm3
+ cmp edi,eax
+ movq [40+esp],mm5
+ cmovb ebx,edi
+ movq [48+esp],mm6
+ mov ecx,4
+ pxor mm2,mm1
+ movq [56+esp],mm7
+ pxor mm3,mm3
+ jmp NEAR L$00800_47_ssse3
+align 32
+L$00800_47_ssse3:
+ movdqa xmm3,xmm5
+ movdqa xmm1,xmm2
+db 102,15,58,15,208,8
+ movdqa [edx],xmm4
+db 102,15,58,15,220,8
+ movdqa xmm4,xmm2
+ psrlq xmm2,7
+ paddq xmm0,xmm3
+ movdqa xmm3,xmm4
+ psrlq xmm4,1
+ psllq xmm3,56
+ pxor xmm2,xmm4
+ psrlq xmm4,7
+ pxor xmm2,xmm3
+ psllq xmm3,7
+ pxor xmm2,xmm4
+ movdqa xmm4,xmm7
+ pxor xmm2,xmm3
+ movdqa xmm3,xmm7
+ psrlq xmm4,6
+ paddq xmm0,xmm2
+ movdqa xmm2,xmm7
+ psrlq xmm3,19
+ psllq xmm2,3
+ pxor xmm4,xmm3
+ psrlq xmm3,42
+ pxor xmm4,xmm2
+ psllq xmm2,42
+ pxor xmm4,xmm3
+ movdqa xmm3,[32+edx]
+ pxor xmm4,xmm2
+ movdqa xmm2,[ebp]
+ movq mm1,mm4
+ paddq xmm0,xmm4
+ movq mm7,[edx-128]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [32+esp],mm4
+ paddq xmm2,xmm0
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[56+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[24+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[8+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[32+esp]
+ paddq mm2,mm6
+ movq mm6,[40+esp]
+ movq mm1,mm4
+ movq mm7,[edx-120]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [24+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [56+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[48+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[16+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[24+esp]
+ paddq mm0,mm6
+ movq mm6,[32+esp]
+ movdqa [edx-128],xmm2
+ movdqa xmm4,xmm6
+ movdqa xmm2,xmm3
+db 102,15,58,15,217,8
+ movdqa [16+edx],xmm5
+db 102,15,58,15,229,8
+ movdqa xmm5,xmm3
+ psrlq xmm3,7
+ paddq xmm1,xmm4
+ movdqa xmm4,xmm5
+ psrlq xmm5,1
+ psllq xmm4,56
+ pxor xmm3,xmm5
+ psrlq xmm5,7
+ pxor xmm3,xmm4
+ psllq xmm4,7
+ pxor xmm3,xmm5
+ movdqa xmm5,xmm0
+ pxor xmm3,xmm4
+ movdqa xmm4,xmm0
+ psrlq xmm5,6
+ paddq xmm1,xmm3
+ movdqa xmm3,xmm0
+ psrlq xmm4,19
+ psllq xmm3,3
+ pxor xmm5,xmm4
+ psrlq xmm4,42
+ pxor xmm5,xmm3
+ psllq xmm3,42
+ pxor xmm5,xmm4
+ movdqa xmm4,[48+edx]
+ pxor xmm5,xmm3
+ movdqa xmm3,[16+ebp]
+ movq mm1,mm4
+ paddq xmm1,xmm5
+ movq mm7,[edx-112]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [16+esp],mm4
+ paddq xmm3,xmm1
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [48+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[40+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[8+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[56+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[16+esp]
+ paddq mm2,mm6
+ movq mm6,[24+esp]
+ movq mm1,mm4
+ movq mm7,[edx-104]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [8+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [40+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[32+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[48+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[8+esp]
+ paddq mm0,mm6
+ movq mm6,[16+esp]
+ movdqa [edx-112],xmm3
+ movdqa xmm5,xmm7
+ movdqa xmm3,xmm4
+db 102,15,58,15,226,8
+ movdqa [32+edx],xmm6
+db 102,15,58,15,238,8
+ movdqa xmm6,xmm4
+ psrlq xmm4,7
+ paddq xmm2,xmm5
+ movdqa xmm5,xmm6
+ psrlq xmm6,1
+ psllq xmm5,56
+ pxor xmm4,xmm6
+ psrlq xmm6,7
+ pxor xmm4,xmm5
+ psllq xmm5,7
+ pxor xmm4,xmm6
+ movdqa xmm6,xmm1
+ pxor xmm4,xmm5
+ movdqa xmm5,xmm1
+ psrlq xmm6,6
+ paddq xmm2,xmm4
+ movdqa xmm4,xmm1
+ psrlq xmm5,19
+ psllq xmm4,3
+ pxor xmm6,xmm5
+ psrlq xmm5,42
+ pxor xmm6,xmm4
+ psllq xmm4,42
+ pxor xmm6,xmm5
+ movdqa xmm5,[edx]
+ pxor xmm6,xmm4
+ movdqa xmm4,[32+ebp]
+ movq mm1,mm4
+ paddq xmm2,xmm6
+ movq mm7,[edx-96]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [esp],mm4
+ paddq xmm4,xmm2
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [32+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[24+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[56+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[40+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[esp]
+ paddq mm2,mm6
+ movq mm6,[8+esp]
+ movq mm1,mm4
+ movq mm7,[edx-88]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [56+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [24+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[16+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[48+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[32+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[56+esp]
+ paddq mm0,mm6
+ movq mm6,[esp]
+ movdqa [edx-96],xmm4
+ movdqa xmm6,xmm0
+ movdqa xmm4,xmm5
+db 102,15,58,15,235,8
+ movdqa [48+edx],xmm7
+db 102,15,58,15,247,8
+ movdqa xmm7,xmm5
+ psrlq xmm5,7
+ paddq xmm3,xmm6
+ movdqa xmm6,xmm7
+ psrlq xmm7,1
+ psllq xmm6,56
+ pxor xmm5,xmm7
+ psrlq xmm7,7
+ pxor xmm5,xmm6
+ psllq xmm6,7
+ pxor xmm5,xmm7
+ movdqa xmm7,xmm2
+ pxor xmm5,xmm6
+ movdqa xmm6,xmm2
+ psrlq xmm7,6
+ paddq xmm3,xmm5
+ movdqa xmm5,xmm2
+ psrlq xmm6,19
+ psllq xmm5,3
+ pxor xmm7,xmm6
+ psrlq xmm6,42
+ pxor xmm7,xmm5
+ psllq xmm5,42
+ pxor xmm7,xmm6
+ movdqa xmm6,[16+edx]
+ pxor xmm7,xmm5
+ movdqa xmm5,[48+ebp]
+ movq mm1,mm4
+ paddq xmm3,xmm7
+ movq mm7,[edx-80]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [48+esp],mm4
+ paddq xmm5,xmm3
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [16+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[8+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[40+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[24+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[48+esp]
+ paddq mm2,mm6
+ movq mm6,[56+esp]
+ movq mm1,mm4
+ movq mm7,[edx-72]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [40+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [8+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[32+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[16+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[40+esp]
+ paddq mm0,mm6
+ movq mm6,[48+esp]
+ movdqa [edx-80],xmm5
+ movdqa xmm7,xmm1
+ movdqa xmm5,xmm6
+db 102,15,58,15,244,8
+ movdqa [edx],xmm0
+db 102,15,58,15,248,8
+ movdqa xmm0,xmm6
+ psrlq xmm6,7
+ paddq xmm4,xmm7
+ movdqa xmm7,xmm0
+ psrlq xmm0,1
+ psllq xmm7,56
+ pxor xmm6,xmm0
+ psrlq xmm0,7
+ pxor xmm6,xmm7
+ psllq xmm7,7
+ pxor xmm6,xmm0
+ movdqa xmm0,xmm3
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm3
+ psrlq xmm0,6
+ paddq xmm4,xmm6
+ movdqa xmm6,xmm3
+ psrlq xmm7,19
+ psllq xmm6,3
+ pxor xmm0,xmm7
+ psrlq xmm7,42
+ pxor xmm0,xmm6
+ psllq xmm6,42
+ pxor xmm0,xmm7
+ movdqa xmm7,[32+edx]
+ pxor xmm0,xmm6
+ movdqa xmm6,[64+ebp]
+ movq mm1,mm4
+ paddq xmm4,xmm0
+ movq mm7,[edx-64]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [32+esp],mm4
+ paddq xmm6,xmm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[56+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[24+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[8+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[32+esp]
+ paddq mm2,mm6
+ movq mm6,[40+esp]
+ movq mm1,mm4
+ movq mm7,[edx-56]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [24+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [56+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[48+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[16+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[24+esp]
+ paddq mm0,mm6
+ movq mm6,[32+esp]
+ movdqa [edx-64],xmm6
+ movdqa xmm0,xmm2
+ movdqa xmm6,xmm7
+db 102,15,58,15,253,8
+ movdqa [16+edx],xmm1
+db 102,15,58,15,193,8
+ movdqa xmm1,xmm7
+ psrlq xmm7,7
+ paddq xmm5,xmm0
+ movdqa xmm0,xmm1
+ psrlq xmm1,1
+ psllq xmm0,56
+ pxor xmm7,xmm1
+ psrlq xmm1,7
+ pxor xmm7,xmm0
+ psllq xmm0,7
+ pxor xmm7,xmm1
+ movdqa xmm1,xmm4
+ pxor xmm7,xmm0
+ movdqa xmm0,xmm4
+ psrlq xmm1,6
+ paddq xmm5,xmm7
+ movdqa xmm7,xmm4
+ psrlq xmm0,19
+ psllq xmm7,3
+ pxor xmm1,xmm0
+ psrlq xmm0,42
+ pxor xmm1,xmm7
+ psllq xmm7,42
+ pxor xmm1,xmm0
+ movdqa xmm0,[48+edx]
+ pxor xmm1,xmm7
+ movdqa xmm7,[80+ebp]
+ movq mm1,mm4
+ paddq xmm5,xmm1
+ movq mm7,[edx-48]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [16+esp],mm4
+ paddq xmm7,xmm5
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [48+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[40+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[8+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[56+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[16+esp]
+ paddq mm2,mm6
+ movq mm6,[24+esp]
+ movq mm1,mm4
+ movq mm7,[edx-40]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [8+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [40+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[32+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[48+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[8+esp]
+ paddq mm0,mm6
+ movq mm6,[16+esp]
+ movdqa [edx-48],xmm7
+ movdqa xmm1,xmm3
+ movdqa xmm7,xmm0
+db 102,15,58,15,198,8
+ movdqa [32+edx],xmm2
+db 102,15,58,15,202,8
+ movdqa xmm2,xmm0
+ psrlq xmm0,7
+ paddq xmm6,xmm1
+ movdqa xmm1,xmm2
+ psrlq xmm2,1
+ psllq xmm1,56
+ pxor xmm0,xmm2
+ psrlq xmm2,7
+ pxor xmm0,xmm1
+ psllq xmm1,7
+ pxor xmm0,xmm2
+ movdqa xmm2,xmm5
+ pxor xmm0,xmm1
+ movdqa xmm1,xmm5
+ psrlq xmm2,6
+ paddq xmm6,xmm0
+ movdqa xmm0,xmm5
+ psrlq xmm1,19
+ psllq xmm0,3
+ pxor xmm2,xmm1
+ psrlq xmm1,42
+ pxor xmm2,xmm0
+ psllq xmm0,42
+ pxor xmm2,xmm1
+ movdqa xmm1,[edx]
+ pxor xmm2,xmm0
+ movdqa xmm0,[96+ebp]
+ movq mm1,mm4
+ paddq xmm6,xmm2
+ movq mm7,[edx-32]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [esp],mm4
+ paddq xmm0,xmm6
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [32+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[24+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[56+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[40+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[esp]
+ paddq mm2,mm6
+ movq mm6,[8+esp]
+ movq mm1,mm4
+ movq mm7,[edx-24]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [56+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [24+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[16+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[48+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[32+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[56+esp]
+ paddq mm0,mm6
+ movq mm6,[esp]
+ movdqa [edx-32],xmm0
+ movdqa xmm2,xmm4
+ movdqa xmm0,xmm1
+db 102,15,58,15,207,8
+ movdqa [48+edx],xmm3
+db 102,15,58,15,211,8
+ movdqa xmm3,xmm1
+ psrlq xmm1,7
+ paddq xmm7,xmm2
+ movdqa xmm2,xmm3
+ psrlq xmm3,1
+ psllq xmm2,56
+ pxor xmm1,xmm3
+ psrlq xmm3,7
+ pxor xmm1,xmm2
+ psllq xmm2,7
+ pxor xmm1,xmm3
+ movdqa xmm3,xmm6
+ pxor xmm1,xmm2
+ movdqa xmm2,xmm6
+ psrlq xmm3,6
+ paddq xmm7,xmm1
+ movdqa xmm1,xmm6
+ psrlq xmm2,19
+ psllq xmm1,3
+ pxor xmm3,xmm2
+ psrlq xmm2,42
+ pxor xmm3,xmm1
+ psllq xmm1,42
+ pxor xmm3,xmm2
+ movdqa xmm2,[16+edx]
+ pxor xmm3,xmm1
+ movdqa xmm1,[112+ebp]
+ movq mm1,mm4
+ paddq xmm7,xmm3
+ movq mm7,[edx-16]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [48+esp],mm4
+ paddq xmm1,xmm7
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [16+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[8+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[40+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[24+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[48+esp]
+ paddq mm2,mm6
+ movq mm6,[56+esp]
+ movq mm1,mm4
+ movq mm7,[edx-8]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [40+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [8+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[32+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[16+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[40+esp]
+ paddq mm0,mm6
+ movq mm6,[48+esp]
+ movdqa [edx-16],xmm1
+ lea ebp,[128+ebp]
+ dec ecx
+ jnz NEAR L$00800_47_ssse3
+ movdqa xmm1,[ebp]
+ lea ebp,[ebp-640]
+ movdqu xmm0,[ebx]
+db 102,15,56,0,193
+ movdqa xmm3,[ebp]
+ movdqa xmm2,xmm1
+ movdqu xmm1,[16+ebx]
+ paddq xmm3,xmm0
+db 102,15,56,0,202
+ movq mm1,mm4
+ movq mm7,[edx-128]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [32+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[56+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[24+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[8+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[32+esp]
+ paddq mm2,mm6
+ movq mm6,[40+esp]
+ movq mm1,mm4
+ movq mm7,[edx-120]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [24+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [56+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[48+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[16+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[24+esp]
+ paddq mm0,mm6
+ movq mm6,[32+esp]
+ movdqa [edx-128],xmm3
+ movdqa xmm4,[16+ebp]
+ movdqa xmm3,xmm2
+ movdqu xmm2,[32+ebx]
+ paddq xmm4,xmm1
+db 102,15,56,0,211
+ movq mm1,mm4
+ movq mm7,[edx-112]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [16+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [48+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[40+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[8+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[56+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[16+esp]
+ paddq mm2,mm6
+ movq mm6,[24+esp]
+ movq mm1,mm4
+ movq mm7,[edx-104]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [8+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [40+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[32+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[48+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[8+esp]
+ paddq mm0,mm6
+ movq mm6,[16+esp]
+ movdqa [edx-112],xmm4
+ movdqa xmm5,[32+ebp]
+ movdqa xmm4,xmm3
+ movdqu xmm3,[48+ebx]
+ paddq xmm5,xmm2
+db 102,15,56,0,220
+ movq mm1,mm4
+ movq mm7,[edx-96]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [32+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[24+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[56+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[40+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[esp]
+ paddq mm2,mm6
+ movq mm6,[8+esp]
+ movq mm1,mm4
+ movq mm7,[edx-88]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [56+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [24+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[16+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[48+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[32+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[56+esp]
+ paddq mm0,mm6
+ movq mm6,[esp]
+ movdqa [edx-96],xmm5
+ movdqa xmm6,[48+ebp]
+ movdqa xmm5,xmm4
+ movdqu xmm4,[64+ebx]
+ paddq xmm6,xmm3
+db 102,15,56,0,229
+ movq mm1,mm4
+ movq mm7,[edx-80]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [48+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [16+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[8+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[40+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[24+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[48+esp]
+ paddq mm2,mm6
+ movq mm6,[56+esp]
+ movq mm1,mm4
+ movq mm7,[edx-72]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [40+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [8+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[32+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[16+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[40+esp]
+ paddq mm0,mm6
+ movq mm6,[48+esp]
+ movdqa [edx-80],xmm6
+ movdqa xmm7,[64+ebp]
+ movdqa xmm6,xmm5
+ movdqu xmm5,[80+ebx]
+ paddq xmm7,xmm4
+db 102,15,56,0,238
+ movq mm1,mm4
+ movq mm7,[edx-64]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [32+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[56+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[24+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[8+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[32+esp]
+ paddq mm2,mm6
+ movq mm6,[40+esp]
+ movq mm1,mm4
+ movq mm7,[edx-56]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [24+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [56+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[48+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[16+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[24+esp]
+ paddq mm0,mm6
+ movq mm6,[32+esp]
+ movdqa [edx-64],xmm7
+ movdqa [edx],xmm0
+ movdqa xmm0,[80+ebp]
+ movdqa xmm7,xmm6
+ movdqu xmm6,[96+ebx]
+ paddq xmm0,xmm5
+db 102,15,56,0,247
+ movq mm1,mm4
+ movq mm7,[edx-48]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [16+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [48+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[40+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[8+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[56+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[16+esp]
+ paddq mm2,mm6
+ movq mm6,[24+esp]
+ movq mm1,mm4
+ movq mm7,[edx-40]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [8+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [40+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[32+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[48+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[8+esp]
+ paddq mm0,mm6
+ movq mm6,[16+esp]
+ movdqa [edx-48],xmm0
+ movdqa [16+edx],xmm1
+ movdqa xmm1,[96+ebp]
+ movdqa xmm0,xmm7
+ movdqu xmm7,[112+ebx]
+ paddq xmm1,xmm6
+db 102,15,56,0,248
+ movq mm1,mm4
+ movq mm7,[edx-32]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [32+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[24+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[56+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[40+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[esp]
+ paddq mm2,mm6
+ movq mm6,[8+esp]
+ movq mm1,mm4
+ movq mm7,[edx-24]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [56+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [24+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[16+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[48+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[32+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[56+esp]
+ paddq mm0,mm6
+ movq mm6,[esp]
+ movdqa [edx-32],xmm1
+ movdqa [32+edx],xmm2
+ movdqa xmm2,[112+ebp]
+ movdqa xmm0,[edx]
+ paddq xmm2,xmm7
+ movq mm1,mm4
+ movq mm7,[edx-16]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [48+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm0,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [16+esp],mm0
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[8+esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[40+esp]
+ paddq mm3,mm7
+ movq mm5,mm0
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm0
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[24+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm0,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm2,mm0
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm2,mm1
+ pxor mm6,mm7
+ movq mm5,[48+esp]
+ paddq mm2,mm6
+ movq mm6,[56+esp]
+ movq mm1,mm4
+ movq mm7,[edx-8]
+ pxor mm5,mm6
+ psrlq mm1,14
+ movq [40+esp],mm4
+ pand mm5,mm4
+ psllq mm4,23
+ paddq mm2,mm3
+ movq mm3,mm1
+ psrlq mm1,4
+ pxor mm5,mm6
+ pxor mm3,mm4
+ psllq mm4,23
+ pxor mm3,mm1
+ movq [8+esp],mm2
+ paddq mm7,mm5
+ pxor mm3,mm4
+ psrlq mm1,23
+ paddq mm7,[esp]
+ pxor mm3,mm1
+ psllq mm4,4
+ pxor mm3,mm4
+ movq mm4,[32+esp]
+ paddq mm3,mm7
+ movq mm5,mm2
+ psrlq mm5,28
+ paddq mm4,mm3
+ movq mm6,mm2
+ movq mm7,mm5
+ psllq mm6,25
+ movq mm1,[16+esp]
+ psrlq mm5,6
+ pxor mm7,mm6
+ psllq mm6,5
+ pxor mm7,mm5
+ pxor mm2,mm1
+ psrlq mm5,5
+ pxor mm7,mm6
+ pand mm0,mm2
+ psllq mm6,6
+ pxor mm7,mm5
+ pxor mm0,mm1
+ pxor mm6,mm7
+ movq mm5,[40+esp]
+ paddq mm0,mm6
+ movq mm6,[48+esp]
+ movdqa [edx-16],xmm2
+ movq mm1,[8+esp]
+ paddq mm0,mm3
+ movq mm3,[24+esp]
+ movq mm7,[56+esp]
+ pxor mm2,mm1
+ paddq mm0,[esi]
+ paddq mm1,[8+esi]
+ paddq mm2,[16+esi]
+ paddq mm3,[24+esi]
+ paddq mm4,[32+esi]
+ paddq mm5,[40+esi]
+ paddq mm6,[48+esi]
+ paddq mm7,[56+esi]
+ movq [esi],mm0
+ movq [8+esi],mm1
+ movq [16+esi],mm2
+ movq [24+esi],mm3
+ movq [32+esi],mm4
+ movq [40+esi],mm5
+ movq [48+esi],mm6
+ movq [56+esi],mm7
+ cmp edi,eax
+ jb NEAR L$007loop_ssse3
+ mov esp,DWORD [76+edx]
+ emms
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 16
+L$002loop_x86:
+ mov eax,DWORD [edi]
+ mov ebx,DWORD [4+edi]
+ mov ecx,DWORD [8+edi]
+ mov edx,DWORD [12+edi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ push eax
+ push ebx
+ push ecx
+ push edx
+ mov eax,DWORD [16+edi]
+ mov ebx,DWORD [20+edi]
+ mov ecx,DWORD [24+edi]
+ mov edx,DWORD [28+edi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ push eax
+ push ebx
+ push ecx
+ push edx
+ mov eax,DWORD [32+edi]
+ mov ebx,DWORD [36+edi]
+ mov ecx,DWORD [40+edi]
+ mov edx,DWORD [44+edi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ push eax
+ push ebx
+ push ecx
+ push edx
+ mov eax,DWORD [48+edi]
+ mov ebx,DWORD [52+edi]
+ mov ecx,DWORD [56+edi]
+ mov edx,DWORD [60+edi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ push eax
+ push ebx
+ push ecx
+ push edx
+ mov eax,DWORD [64+edi]
+ mov ebx,DWORD [68+edi]
+ mov ecx,DWORD [72+edi]
+ mov edx,DWORD [76+edi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ push eax
+ push ebx
+ push ecx
+ push edx
+ mov eax,DWORD [80+edi]
+ mov ebx,DWORD [84+edi]
+ mov ecx,DWORD [88+edi]
+ mov edx,DWORD [92+edi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ push eax
+ push ebx
+ push ecx
+ push edx
+ mov eax,DWORD [96+edi]
+ mov ebx,DWORD [100+edi]
+ mov ecx,DWORD [104+edi]
+ mov edx,DWORD [108+edi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ push eax
+ push ebx
+ push ecx
+ push edx
+ mov eax,DWORD [112+edi]
+ mov ebx,DWORD [116+edi]
+ mov ecx,DWORD [120+edi]
+ mov edx,DWORD [124+edi]
+ bswap eax
+ bswap ebx
+ bswap ecx
+ bswap edx
+ push eax
+ push ebx
+ push ecx
+ push edx
+ add edi,128
+ sub esp,72
+ mov DWORD [204+esp],edi
+ lea edi,[8+esp]
+ mov ecx,16
+dd 2784229001
+align 16
+L$00900_15_x86:
+ mov ecx,DWORD [40+esp]
+ mov edx,DWORD [44+esp]
+ mov esi,ecx
+ shr ecx,9
+ mov edi,edx
+ shr edx,9
+ mov ebx,ecx
+ shl esi,14
+ mov eax,edx
+ shl edi,14
+ xor ebx,esi
+ shr ecx,5
+ xor eax,edi
+ shr edx,5
+ xor eax,ecx
+ shl esi,4
+ xor ebx,edx
+ shl edi,4
+ xor ebx,esi
+ shr ecx,4
+ xor eax,edi
+ shr edx,4
+ xor eax,ecx
+ shl esi,5
+ xor ebx,edx
+ shl edi,5
+ xor eax,esi
+ xor ebx,edi
+ mov ecx,DWORD [48+esp]
+ mov edx,DWORD [52+esp]
+ mov esi,DWORD [56+esp]
+ mov edi,DWORD [60+esp]
+ add eax,DWORD [64+esp]
+ adc ebx,DWORD [68+esp]
+ xor ecx,esi
+ xor edx,edi
+ and ecx,DWORD [40+esp]
+ and edx,DWORD [44+esp]
+ add eax,DWORD [192+esp]
+ adc ebx,DWORD [196+esp]
+ xor ecx,esi
+ xor edx,edi
+ mov esi,DWORD [ebp]
+ mov edi,DWORD [4+ebp]
+ add eax,ecx
+ adc ebx,edx
+ mov ecx,DWORD [32+esp]
+ mov edx,DWORD [36+esp]
+ add eax,esi
+ adc ebx,edi
+ mov DWORD [esp],eax
+ mov DWORD [4+esp],ebx
+ add eax,ecx
+ adc ebx,edx
+ mov ecx,DWORD [8+esp]
+ mov edx,DWORD [12+esp]
+ mov DWORD [32+esp],eax
+ mov DWORD [36+esp],ebx
+ mov esi,ecx
+ shr ecx,2
+ mov edi,edx
+ shr edx,2
+ mov ebx,ecx
+ shl esi,4
+ mov eax,edx
+ shl edi,4
+ xor ebx,esi
+ shr ecx,5
+ xor eax,edi
+ shr edx,5
+ xor ebx,ecx
+ shl esi,21
+ xor eax,edx
+ shl edi,21
+ xor eax,esi
+ shr ecx,21
+ xor ebx,edi
+ shr edx,21
+ xor eax,ecx
+ shl esi,5
+ xor ebx,edx
+ shl edi,5
+ xor eax,esi
+ xor ebx,edi
+ mov ecx,DWORD [8+esp]
+ mov edx,DWORD [12+esp]
+ mov esi,DWORD [16+esp]
+ mov edi,DWORD [20+esp]
+ add eax,DWORD [esp]
+ adc ebx,DWORD [4+esp]
+ or ecx,esi
+ or edx,edi
+ and ecx,DWORD [24+esp]
+ and edx,DWORD [28+esp]
+ and esi,DWORD [8+esp]
+ and edi,DWORD [12+esp]
+ or ecx,esi
+ or edx,edi
+ add eax,ecx
+ adc ebx,edx
+ mov DWORD [esp],eax
+ mov DWORD [4+esp],ebx
+ mov dl,BYTE [ebp]
+ sub esp,8
+ lea ebp,[8+ebp]
+ cmp dl,148
+ jne NEAR L$00900_15_x86
+align 16
+L$01016_79_x86:
+ mov ecx,DWORD [312+esp]
+ mov edx,DWORD [316+esp]
+ mov esi,ecx
+ shr ecx,1
+ mov edi,edx
+ shr edx,1
+ mov eax,ecx
+ shl esi,24
+ mov ebx,edx
+ shl edi,24
+ xor ebx,esi
+ shr ecx,6
+ xor eax,edi
+ shr edx,6
+ xor eax,ecx
+ shl esi,7
+ xor ebx,edx
+ shl edi,1
+ xor ebx,esi
+ shr ecx,1
+ xor eax,edi
+ shr edx,1
+ xor eax,ecx
+ shl edi,6
+ xor ebx,edx
+ xor eax,edi
+ mov DWORD [esp],eax
+ mov DWORD [4+esp],ebx
+ mov ecx,DWORD [208+esp]
+ mov edx,DWORD [212+esp]
+ mov esi,ecx
+ shr ecx,6
+ mov edi,edx
+ shr edx,6
+ mov eax,ecx
+ shl esi,3
+ mov ebx,edx
+ shl edi,3
+ xor eax,esi
+ shr ecx,13
+ xor ebx,edi
+ shr edx,13
+ xor eax,ecx
+ shl esi,10
+ xor ebx,edx
+ shl edi,10
+ xor ebx,esi
+ shr ecx,10
+ xor eax,edi
+ shr edx,10
+ xor ebx,ecx
+ shl edi,13
+ xor eax,edx
+ xor eax,edi
+ mov ecx,DWORD [320+esp]
+ mov edx,DWORD [324+esp]
+ add eax,DWORD [esp]
+ adc ebx,DWORD [4+esp]
+ mov esi,DWORD [248+esp]
+ mov edi,DWORD [252+esp]
+ add eax,ecx
+ adc ebx,edx
+ add eax,esi
+ adc ebx,edi
+ mov DWORD [192+esp],eax
+ mov DWORD [196+esp],ebx
+ mov ecx,DWORD [40+esp]
+ mov edx,DWORD [44+esp]
+ mov esi,ecx
+ shr ecx,9
+ mov edi,edx
+ shr edx,9
+ mov ebx,ecx
+ shl esi,14
+ mov eax,edx
+ shl edi,14
+ xor ebx,esi
+ shr ecx,5
+ xor eax,edi
+ shr edx,5
+ xor eax,ecx
+ shl esi,4
+ xor ebx,edx
+ shl edi,4
+ xor ebx,esi
+ shr ecx,4
+ xor eax,edi
+ shr edx,4
+ xor eax,ecx
+ shl esi,5
+ xor ebx,edx
+ shl edi,5
+ xor eax,esi
+ xor ebx,edi
+ mov ecx,DWORD [48+esp]
+ mov edx,DWORD [52+esp]
+ mov esi,DWORD [56+esp]
+ mov edi,DWORD [60+esp]
+ add eax,DWORD [64+esp]
+ adc ebx,DWORD [68+esp]
+ xor ecx,esi
+ xor edx,edi
+ and ecx,DWORD [40+esp]
+ and edx,DWORD [44+esp]
+ add eax,DWORD [192+esp]
+ adc ebx,DWORD [196+esp]
+ xor ecx,esi
+ xor edx,edi
+ mov esi,DWORD [ebp]
+ mov edi,DWORD [4+ebp]
+ add eax,ecx
+ adc ebx,edx
+ mov ecx,DWORD [32+esp]
+ mov edx,DWORD [36+esp]
+ add eax,esi
+ adc ebx,edi
+ mov DWORD [esp],eax
+ mov DWORD [4+esp],ebx
+ add eax,ecx
+ adc ebx,edx
+ mov ecx,DWORD [8+esp]
+ mov edx,DWORD [12+esp]
+ mov DWORD [32+esp],eax
+ mov DWORD [36+esp],ebx
+ mov esi,ecx
+ shr ecx,2
+ mov edi,edx
+ shr edx,2
+ mov ebx,ecx
+ shl esi,4
+ mov eax,edx
+ shl edi,4
+ xor ebx,esi
+ shr ecx,5
+ xor eax,edi
+ shr edx,5
+ xor ebx,ecx
+ shl esi,21
+ xor eax,edx
+ shl edi,21
+ xor eax,esi
+ shr ecx,21
+ xor ebx,edi
+ shr edx,21
+ xor eax,ecx
+ shl esi,5
+ xor ebx,edx
+ shl edi,5
+ xor eax,esi
+ xor ebx,edi
+ mov ecx,DWORD [8+esp]
+ mov edx,DWORD [12+esp]
+ mov esi,DWORD [16+esp]
+ mov edi,DWORD [20+esp]
+ add eax,DWORD [esp]
+ adc ebx,DWORD [4+esp]
+ or ecx,esi
+ or edx,edi
+ and ecx,DWORD [24+esp]
+ and edx,DWORD [28+esp]
+ and esi,DWORD [8+esp]
+ and edi,DWORD [12+esp]
+ or ecx,esi
+ or edx,edi
+ add eax,ecx
+ adc ebx,edx
+ mov DWORD [esp],eax
+ mov DWORD [4+esp],ebx
+ mov dl,BYTE [ebp]
+ sub esp,8
+ lea ebp,[8+ebp]
+ cmp dl,23
+ jne NEAR L$01016_79_x86
+ mov esi,DWORD [840+esp]
+ mov edi,DWORD [844+esp]
+ mov eax,DWORD [esi]
+ mov ebx,DWORD [4+esi]
+ mov ecx,DWORD [8+esi]
+ mov edx,DWORD [12+esi]
+ add eax,DWORD [8+esp]
+ adc ebx,DWORD [12+esp]
+ mov DWORD [esi],eax
+ mov DWORD [4+esi],ebx
+ add ecx,DWORD [16+esp]
+ adc edx,DWORD [20+esp]
+ mov DWORD [8+esi],ecx
+ mov DWORD [12+esi],edx
+ mov eax,DWORD [16+esi]
+ mov ebx,DWORD [20+esi]
+ mov ecx,DWORD [24+esi]
+ mov edx,DWORD [28+esi]
+ add eax,DWORD [24+esp]
+ adc ebx,DWORD [28+esp]
+ mov DWORD [16+esi],eax
+ mov DWORD [20+esi],ebx
+ add ecx,DWORD [32+esp]
+ adc edx,DWORD [36+esp]
+ mov DWORD [24+esi],ecx
+ mov DWORD [28+esi],edx
+ mov eax,DWORD [32+esi]
+ mov ebx,DWORD [36+esi]
+ mov ecx,DWORD [40+esi]
+ mov edx,DWORD [44+esi]
+ add eax,DWORD [40+esp]
+ adc ebx,DWORD [44+esp]
+ mov DWORD [32+esi],eax
+ mov DWORD [36+esi],ebx
+ add ecx,DWORD [48+esp]
+ adc edx,DWORD [52+esp]
+ mov DWORD [40+esi],ecx
+ mov DWORD [44+esi],edx
+ mov eax,DWORD [48+esi]
+ mov ebx,DWORD [52+esi]
+ mov ecx,DWORD [56+esi]
+ mov edx,DWORD [60+esi]
+ add eax,DWORD [56+esp]
+ adc ebx,DWORD [60+esp]
+ mov DWORD [48+esi],eax
+ mov DWORD [52+esi],ebx
+ add ecx,DWORD [64+esp]
+ adc edx,DWORD [68+esp]
+ mov DWORD [56+esi],ecx
+ mov DWORD [60+esi],edx
+ add esp,840
+ sub ebp,640
+ cmp edi,DWORD [8+esp]
+ jb NEAR L$002loop_x86
+ mov esp,DWORD [12+esp]
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+align 64
+L$001K512:
+dd 3609767458,1116352408
+dd 602891725,1899447441
+dd 3964484399,3049323471
+dd 2173295548,3921009573
+dd 4081628472,961987163
+dd 3053834265,1508970993
+dd 2937671579,2453635748
+dd 3664609560,2870763221
+dd 2734883394,3624381080
+dd 1164996542,310598401
+dd 1323610764,607225278
+dd 3590304994,1426881987
+dd 4068182383,1925078388
+dd 991336113,2162078206
+dd 633803317,2614888103
+dd 3479774868,3248222580
+dd 2666613458,3835390401
+dd 944711139,4022224774
+dd 2341262773,264347078
+dd 2007800933,604807628
+dd 1495990901,770255983
+dd 1856431235,1249150122
+dd 3175218132,1555081692
+dd 2198950837,1996064986
+dd 3999719339,2554220882
+dd 766784016,2821834349
+dd 2566594879,2952996808
+dd 3203337956,3210313671
+dd 1034457026,3336571891
+dd 2466948901,3584528711
+dd 3758326383,113926993
+dd 168717936,338241895
+dd 1188179964,666307205
+dd 1546045734,773529912
+dd 1522805485,1294757372
+dd 2643833823,1396182291
+dd 2343527390,1695183700
+dd 1014477480,1986661051
+dd 1206759142,2177026350
+dd 344077627,2456956037
+dd 1290863460,2730485921
+dd 3158454273,2820302411
+dd 3505952657,3259730800
+dd 106217008,3345764771
+dd 3606008344,3516065817
+dd 1432725776,3600352804
+dd 1467031594,4094571909
+dd 851169720,275423344
+dd 3100823752,430227734
+dd 1363258195,506948616
+dd 3750685593,659060556
+dd 3785050280,883997877
+dd 3318307427,958139571
+dd 3812723403,1322822218
+dd 2003034995,1537002063
+dd 3602036899,1747873779
+dd 1575990012,1955562222
+dd 1125592928,2024104815
+dd 2716904306,2227730452
+dd 442776044,2361852424
+dd 593698344,2428436474
+dd 3733110249,2756734187
+dd 2999351573,3204031479
+dd 3815920427,3329325298
+dd 3928383900,3391569614
+dd 566280711,3515267271
+dd 3454069534,3940187606
+dd 4000239992,4118630271
+dd 1914138554,116418474
+dd 2731055270,174292421
+dd 3203993006,289380356
+dd 320620315,460393269
+dd 587496836,685471733
+dd 1086792851,852142971
+dd 365543100,1017036298
+dd 2618297676,1126000580
+dd 3409855158,1288033470
+dd 4234509866,1501505948
+dd 987167468,1607167915
+dd 1246189591,1816402316
+dd 67438087,66051
+dd 202182159,134810123
+db 83,72,65,53,49,50,32,98,108,111,99,107,32,116,114,97
+db 110,115,102,111,114,109,32,102,111,114,32,120,56,54,44,32
+db 67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97
+db 112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103
+db 62,0
+segment .bss
+common _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm
new file mode 100644
index 0000000000..9d61eedd34
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm
@@ -0,0 +1,513 @@
+; Copyright 2004-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text code align=64
+%else
+section .text code
+%endif
+global _OPENSSL_ia32_cpuid
+align 16
+_OPENSSL_ia32_cpuid:
+L$_OPENSSL_ia32_cpuid_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ xor edx,edx
+ pushfd
+ pop eax
+ mov ecx,eax
+ xor eax,2097152
+ push eax
+ popfd
+ pushfd
+ pop eax
+ xor ecx,eax
+ xor eax,eax
+ mov esi,DWORD [20+esp]
+ mov DWORD [8+esi],eax
+ bt ecx,21
+ jnc NEAR L$000nocpuid
+ cpuid
+ mov edi,eax
+ xor eax,eax
+ cmp ebx,1970169159
+ setne al
+ mov ebp,eax
+ cmp edx,1231384169
+ setne al
+ or ebp,eax
+ cmp ecx,1818588270
+ setne al
+ or ebp,eax
+ jz NEAR L$001intel
+ cmp ebx,1752462657
+ setne al
+ mov esi,eax
+ cmp edx,1769238117
+ setne al
+ or esi,eax
+ cmp ecx,1145913699
+ setne al
+ or esi,eax
+ jnz NEAR L$001intel
+ mov eax,2147483648
+ cpuid
+ cmp eax,2147483649
+ jb NEAR L$001intel
+ mov esi,eax
+ mov eax,2147483649
+ cpuid
+ or ebp,ecx
+ and ebp,2049
+ cmp esi,2147483656
+ jb NEAR L$001intel
+ mov eax,2147483656
+ cpuid
+ movzx esi,cl
+ inc esi
+ mov eax,1
+ xor ecx,ecx
+ cpuid
+ bt edx,28
+ jnc NEAR L$002generic
+ shr ebx,16
+ and ebx,255
+ cmp ebx,esi
+ ja NEAR L$002generic
+ and edx,4026531839
+ jmp NEAR L$002generic
+L$001intel:
+ cmp edi,4
+ mov esi,-1
+ jb NEAR L$003nocacheinfo
+ mov eax,4
+ mov ecx,0
+ cpuid
+ mov esi,eax
+ shr esi,14
+ and esi,4095
+L$003nocacheinfo:
+ mov eax,1
+ xor ecx,ecx
+ cpuid
+ and edx,3220176895
+ cmp ebp,0
+ jne NEAR L$004notintel
+ or edx,1073741824
+ and ah,15
+ cmp ah,15
+ jne NEAR L$004notintel
+ or edx,1048576
+L$004notintel:
+ bt edx,28
+ jnc NEAR L$002generic
+ and edx,4026531839
+ cmp esi,0
+ je NEAR L$002generic
+ or edx,268435456
+ shr ebx,16
+ cmp bl,1
+ ja NEAR L$002generic
+ and edx,4026531839
+L$002generic:
+ and ebp,2048
+ and ecx,4294965247
+ mov esi,edx
+ or ebp,ecx
+ cmp edi,7
+ mov edi,DWORD [20+esp]
+ jb NEAR L$005no_extended_info
+ mov eax,7
+ xor ecx,ecx
+ cpuid
+ mov DWORD [8+edi],ebx
+L$005no_extended_info:
+ bt ebp,27
+ jnc NEAR L$006clear_avx
+ xor ecx,ecx
+db 15,1,208
+ and eax,6
+ cmp eax,6
+ je NEAR L$007done
+ cmp eax,2
+ je NEAR L$006clear_avx
+L$008clear_xmm:
+ and ebp,4261412861
+ and esi,4278190079
+L$006clear_avx:
+ and ebp,4026525695
+ and DWORD [8+edi],4294967263
+L$007done:
+ mov eax,esi
+ mov edx,ebp
+L$000nocpuid:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+;extern _OPENSSL_ia32cap_P
+global _OPENSSL_rdtsc
+align 16
+_OPENSSL_rdtsc:
+L$_OPENSSL_rdtsc_begin:
+ xor eax,eax
+ xor edx,edx
+ lea ecx,[_OPENSSL_ia32cap_P]
+ bt DWORD [ecx],4
+ jnc NEAR L$009notsc
+ rdtsc
+L$009notsc:
+ ret
+global _OPENSSL_instrument_halt
+align 16
+_OPENSSL_instrument_halt:
+L$_OPENSSL_instrument_halt_begin:
+ lea ecx,[_OPENSSL_ia32cap_P]
+ bt DWORD [ecx],4
+ jnc NEAR L$010nohalt
+dd 2421723150
+ and eax,3
+ jnz NEAR L$010nohalt
+ pushfd
+ pop eax
+ bt eax,9
+ jnc NEAR L$010nohalt
+ rdtsc
+ push edx
+ push eax
+ hlt
+ rdtsc
+ sub eax,DWORD [esp]
+ sbb edx,DWORD [4+esp]
+ add esp,8
+ ret
+L$010nohalt:
+ xor eax,eax
+ xor edx,edx
+ ret
+global _OPENSSL_far_spin
+align 16
+_OPENSSL_far_spin:
+L$_OPENSSL_far_spin_begin:
+ pushfd
+ pop eax
+ bt eax,9
+ jnc NEAR L$011nospin
+ mov eax,DWORD [4+esp]
+ mov ecx,DWORD [8+esp]
+dd 2430111262
+ xor eax,eax
+ mov edx,DWORD [ecx]
+ jmp NEAR L$012spin
+align 16
+L$012spin:
+ inc eax
+ cmp edx,DWORD [ecx]
+ je NEAR L$012spin
+dd 529567888
+ ret
+L$011nospin:
+ xor eax,eax
+ xor edx,edx
+ ret
+global _OPENSSL_wipe_cpu
+align 16
+_OPENSSL_wipe_cpu:
+L$_OPENSSL_wipe_cpu_begin:
+ xor eax,eax
+ xor edx,edx
+ lea ecx,[_OPENSSL_ia32cap_P]
+ mov ecx,DWORD [ecx]
+ bt DWORD [ecx],1
+ jnc NEAR L$013no_x87
+ and ecx,83886080
+ cmp ecx,83886080
+ jne NEAR L$014no_sse2
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ pxor xmm6,xmm6
+ pxor xmm7,xmm7
+L$014no_sse2:
+dd 4007259865,4007259865,4007259865,4007259865,2430851995
+L$013no_x87:
+ lea eax,[4+esp]
+ ret
+global _OPENSSL_atomic_add
+align 16
+_OPENSSL_atomic_add:
+L$_OPENSSL_atomic_add_begin:
+ mov edx,DWORD [4+esp]
+ mov ecx,DWORD [8+esp]
+ push ebx
+ nop
+ mov eax,DWORD [edx]
+L$015spin:
+ lea ebx,[ecx*1+eax]
+ nop
+dd 447811568
+ jne NEAR L$015spin
+ mov eax,ebx
+ pop ebx
+ ret
+global _OPENSSL_cleanse
+align 16
+_OPENSSL_cleanse:
+L$_OPENSSL_cleanse_begin:
+ mov edx,DWORD [4+esp]
+ mov ecx,DWORD [8+esp]
+ xor eax,eax
+ cmp ecx,7
+ jae NEAR L$016lot
+ cmp ecx,0
+ je NEAR L$017ret
+L$018little:
+ mov BYTE [edx],al
+ sub ecx,1
+ lea edx,[1+edx]
+ jnz NEAR L$018little
+L$017ret:
+ ret
+align 16
+L$016lot:
+ test edx,3
+ jz NEAR L$019aligned
+ mov BYTE [edx],al
+ lea ecx,[ecx-1]
+ lea edx,[1+edx]
+ jmp NEAR L$016lot
+L$019aligned:
+ mov DWORD [edx],eax
+ lea ecx,[ecx-4]
+ test ecx,-4
+ lea edx,[4+edx]
+ jnz NEAR L$019aligned
+ cmp ecx,0
+ jne NEAR L$018little
+ ret
+global _CRYPTO_memcmp
+align 16
+_CRYPTO_memcmp:
+L$_CRYPTO_memcmp_begin:
+ push esi
+ push edi
+ mov esi,DWORD [12+esp]
+ mov edi,DWORD [16+esp]
+ mov ecx,DWORD [20+esp]
+ xor eax,eax
+ xor edx,edx
+ cmp ecx,0
+ je NEAR L$020no_data
+L$021loop:
+ mov dl,BYTE [esi]
+ lea esi,[1+esi]
+ xor dl,BYTE [edi]
+ lea edi,[1+edi]
+ or al,dl
+ dec ecx
+ jnz NEAR L$021loop
+ neg eax
+ shr eax,31
+L$020no_data:
+ pop edi
+ pop esi
+ ret
+global _OPENSSL_instrument_bus
+align 16
+_OPENSSL_instrument_bus:
+L$_OPENSSL_instrument_bus_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov eax,0
+ lea edx,[_OPENSSL_ia32cap_P]
+ bt DWORD [edx],4
+ jnc NEAR L$022nogo
+ bt DWORD [edx],19
+ jnc NEAR L$022nogo
+ mov edi,DWORD [20+esp]
+ mov ecx,DWORD [24+esp]
+ rdtsc
+ mov esi,eax
+ mov ebx,0
+ clflush [edi]
+db 240
+ add DWORD [edi],ebx
+ jmp NEAR L$023loop
+align 16
+L$023loop:
+ rdtsc
+ mov edx,eax
+ sub eax,esi
+ mov esi,edx
+ mov ebx,eax
+ clflush [edi]
+db 240
+ add DWORD [edi],eax
+ lea edi,[4+edi]
+ sub ecx,1
+ jnz NEAR L$023loop
+ mov eax,DWORD [24+esp]
+L$022nogo:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _OPENSSL_instrument_bus2
+align 16
+_OPENSSL_instrument_bus2:
+L$_OPENSSL_instrument_bus2_begin:
+ push ebp
+ push ebx
+ push esi
+ push edi
+ mov eax,0
+ lea edx,[_OPENSSL_ia32cap_P]
+ bt DWORD [edx],4
+ jnc NEAR L$024nogo
+ bt DWORD [edx],19
+ jnc NEAR L$024nogo
+ mov edi,DWORD [20+esp]
+ mov ecx,DWORD [24+esp]
+ mov ebp,DWORD [28+esp]
+ rdtsc
+ mov esi,eax
+ mov ebx,0
+ clflush [edi]
+db 240
+ add DWORD [edi],ebx
+ rdtsc
+ mov edx,eax
+ sub eax,esi
+ mov esi,edx
+ mov ebx,eax
+ jmp NEAR L$025loop2
+align 16
+L$025loop2:
+ clflush [edi]
+db 240
+ add DWORD [edi],eax
+ sub ebp,1
+ jz NEAR L$026done2
+ rdtsc
+ mov edx,eax
+ sub eax,esi
+ mov esi,edx
+ cmp eax,ebx
+ mov ebx,eax
+ mov edx,0
+ setne dl
+ sub ecx,edx
+ lea edi,[edx*4+edi]
+ jnz NEAR L$025loop2
+L$026done2:
+ mov eax,DWORD [24+esp]
+ sub eax,ecx
+L$024nogo:
+ pop edi
+ pop esi
+ pop ebx
+ pop ebp
+ ret
+global _OPENSSL_ia32_rdrand_bytes
+align 16
+_OPENSSL_ia32_rdrand_bytes:
+L$_OPENSSL_ia32_rdrand_bytes_begin:
+ push edi
+ push ebx
+ xor eax,eax
+ mov edi,DWORD [12+esp]
+ mov ebx,DWORD [16+esp]
+ cmp ebx,0
+ je NEAR L$027done
+ mov ecx,8
+L$028loop:
+db 15,199,242
+ jc NEAR L$029break
+ loop L$028loop
+ jmp NEAR L$027done
+align 16
+L$029break:
+ cmp ebx,4
+ jb NEAR L$030tail
+ mov DWORD [edi],edx
+ lea edi,[4+edi]
+ add eax,4
+ sub ebx,4
+ jz NEAR L$027done
+ mov ecx,8
+ jmp NEAR L$028loop
+align 16
+L$030tail:
+ mov BYTE [edi],dl
+ lea edi,[1+edi]
+ inc eax
+ shr edx,8
+ dec ebx
+ jnz NEAR L$030tail
+L$027done:
+ xor edx,edx
+ pop ebx
+ pop edi
+ ret
+global _OPENSSL_ia32_rdseed_bytes
+align 16
+_OPENSSL_ia32_rdseed_bytes:
+L$_OPENSSL_ia32_rdseed_bytes_begin:
+ push edi
+ push ebx
+ xor eax,eax
+ mov edi,DWORD [12+esp]
+ mov ebx,DWORD [16+esp]
+ cmp ebx,0
+ je NEAR L$031done
+ mov ecx,8
+L$032loop:
+db 15,199,250
+ jc NEAR L$033break
+ loop L$032loop
+ jmp NEAR L$031done
+align 16
+L$033break:
+ cmp ebx,4
+ jb NEAR L$034tail
+ mov DWORD [edi],edx
+ lea edi,[4+edi]
+ add eax,4
+ sub ebx,4
+ jz NEAR L$031done
+ mov ecx,8
+ jmp NEAR L$032loop
+align 16
+L$034tail:
+ mov BYTE [edi],dl
+ lea edi,[1+edi]
+ inc eax
+ shr edx,8
+ dec ebx
+ jnz NEAR L$034tail
+L$031done:
+ xor edx,edx
+ pop ebx
+ pop edi
+ ret
+segment .bss
+common _OPENSSL_ia32cap_P 16
+segment .CRT$XCU data align=4
+extern _OPENSSL_cpuid_setup
+dd _OPENSSL_cpuid_setup
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm
new file mode 100644
index 0000000000..a90434b21f
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm
@@ -0,0 +1,1772 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN OPENSSL_ia32cap_P
+
+global aesni_multi_cbc_encrypt
+
+ALIGN 32
+aesni_multi_cbc_encrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_multi_cbc_encrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ cmp edx,2
+ jb NEAR $L$enc_non_avx
+ mov ecx,DWORD[((OPENSSL_ia32cap_P+4))]
+ test ecx,268435456
+ jnz NEAR _avx_cbc_enc_shortcut
+ jmp NEAR $L$enc_non_avx
+ALIGN 16
+$L$enc_non_avx:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[64+rsp],xmm10
+ movaps XMMWORD[80+rsp],xmm11
+ movaps XMMWORD[96+rsp],xmm12
+ movaps XMMWORD[(-104)+rax],xmm13
+ movaps XMMWORD[(-88)+rax],xmm14
+ movaps XMMWORD[(-72)+rax],xmm15
+
+
+
+
+
+
+ sub rsp,48
+ and rsp,-64
+ mov QWORD[16+rsp],rax
+
+
+$L$enc4x_body:
+ movdqu xmm12,XMMWORD[rsi]
+ lea rsi,[120+rsi]
+ lea rdi,[80+rdi]
+
+$L$enc4x_loop_grande:
+ mov DWORD[24+rsp],edx
+ xor edx,edx
+ mov ecx,DWORD[((-64))+rdi]
+ mov r8,QWORD[((-80))+rdi]
+ cmp ecx,edx
+ mov r12,QWORD[((-72))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ movdqu xmm2,XMMWORD[((-56))+rdi]
+ mov DWORD[32+rsp],ecx
+ cmovle r8,rsp
+ mov ecx,DWORD[((-24))+rdi]
+ mov r9,QWORD[((-40))+rdi]
+ cmp ecx,edx
+ mov r13,QWORD[((-32))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ movdqu xmm3,XMMWORD[((-16))+rdi]
+ mov DWORD[36+rsp],ecx
+ cmovle r9,rsp
+ mov ecx,DWORD[16+rdi]
+ mov r10,QWORD[rdi]
+ cmp ecx,edx
+ mov r14,QWORD[8+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ movdqu xmm4,XMMWORD[24+rdi]
+ mov DWORD[40+rsp],ecx
+ cmovle r10,rsp
+ mov ecx,DWORD[56+rdi]
+ mov r11,QWORD[40+rdi]
+ cmp ecx,edx
+ mov r15,QWORD[48+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ movdqu xmm5,XMMWORD[64+rdi]
+ mov DWORD[44+rsp],ecx
+ cmovle r11,rsp
+ test edx,edx
+ jz NEAR $L$enc4x_done
+
+ movups xmm1,XMMWORD[((16-120))+rsi]
+ pxor xmm2,xmm12
+ movups xmm0,XMMWORD[((32-120))+rsi]
+ pxor xmm3,xmm12
+ mov eax,DWORD[((240-120))+rsi]
+ pxor xmm4,xmm12
+ movdqu xmm6,XMMWORD[r8]
+ pxor xmm5,xmm12
+ movdqu xmm7,XMMWORD[r9]
+ pxor xmm2,xmm6
+ movdqu xmm8,XMMWORD[r10]
+ pxor xmm3,xmm7
+ movdqu xmm9,XMMWORD[r11]
+ pxor xmm4,xmm8
+ pxor xmm5,xmm9
+ movdqa xmm10,XMMWORD[32+rsp]
+ xor rbx,rbx
+ jmp NEAR $L$oop_enc4x
+
+ALIGN 32
+$L$oop_enc4x:
+ add rbx,16
+ lea rbp,[16+rsp]
+ mov ecx,1
+ sub rbp,rbx
+
+DB 102,15,56,220,209
+ prefetcht0 [31+rbx*1+r8]
+ prefetcht0 [31+rbx*1+r9]
+DB 102,15,56,220,217
+ prefetcht0 [31+rbx*1+r10]
+ prefetcht0 [31+rbx*1+r10]
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[((48-120))+rsi]
+ cmp ecx,DWORD[32+rsp]
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+ cmovge r8,rbp
+ cmovg r12,rbp
+DB 102,15,56,220,232
+ movups xmm0,XMMWORD[((-56))+rsi]
+ cmp ecx,DWORD[36+rsp]
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+ cmovge r9,rbp
+ cmovg r13,rbp
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[((-40))+rsi]
+ cmp ecx,DWORD[40+rsp]
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+ cmovge r10,rbp
+ cmovg r14,rbp
+DB 102,15,56,220,232
+ movups xmm0,XMMWORD[((-24))+rsi]
+ cmp ecx,DWORD[44+rsp]
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+ cmovge r11,rbp
+ cmovg r15,rbp
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[((-8))+rsi]
+ movdqa xmm11,xmm10
+DB 102,15,56,220,208
+ prefetcht0 [15+rbx*1+r12]
+ prefetcht0 [15+rbx*1+r13]
+DB 102,15,56,220,216
+ prefetcht0 [15+rbx*1+r14]
+ prefetcht0 [15+rbx*1+r15]
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ movups xmm0,XMMWORD[((128-120))+rsi]
+ pxor xmm12,xmm12
+
+DB 102,15,56,220,209
+ pcmpgtd xmm11,xmm12
+ movdqu xmm12,XMMWORD[((-120))+rsi]
+DB 102,15,56,220,217
+ paddd xmm10,xmm11
+ movdqa XMMWORD[32+rsp],xmm10
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[((144-120))+rsi]
+
+ cmp eax,11
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ movups xmm0,XMMWORD[((160-120))+rsi]
+
+ jb NEAR $L$enc4x_tail
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[((176-120))+rsi]
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ movups xmm0,XMMWORD[((192-120))+rsi]
+
+ je NEAR $L$enc4x_tail
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[((208-120))+rsi]
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ movups xmm0,XMMWORD[((224-120))+rsi]
+ jmp NEAR $L$enc4x_tail
+
+ALIGN 32
+$L$enc4x_tail:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movdqu xmm6,XMMWORD[rbx*1+r8]
+ movdqu xmm1,XMMWORD[((16-120))+rsi]
+
+DB 102,15,56,221,208
+ movdqu xmm7,XMMWORD[rbx*1+r9]
+ pxor xmm6,xmm12
+DB 102,15,56,221,216
+ movdqu xmm8,XMMWORD[rbx*1+r10]
+ pxor xmm7,xmm12
+DB 102,15,56,221,224
+ movdqu xmm9,XMMWORD[rbx*1+r11]
+ pxor xmm8,xmm12
+DB 102,15,56,221,232
+ movdqu xmm0,XMMWORD[((32-120))+rsi]
+ pxor xmm9,xmm12
+
+ movups XMMWORD[(-16)+rbx*1+r12],xmm2
+ pxor xmm2,xmm6
+ movups XMMWORD[(-16)+rbx*1+r13],xmm3
+ pxor xmm3,xmm7
+ movups XMMWORD[(-16)+rbx*1+r14],xmm4
+ pxor xmm4,xmm8
+ movups XMMWORD[(-16)+rbx*1+r15],xmm5
+ pxor xmm5,xmm9
+
+ dec edx
+ jnz NEAR $L$oop_enc4x
+
+ mov rax,QWORD[16+rsp]
+
+ mov edx,DWORD[24+rsp]
+
+
+
+
+
+
+
+
+
+
+ lea rdi,[160+rdi]
+ dec edx
+ jnz NEAR $L$enc4x_loop_grande
+
+$L$enc4x_done:
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+
+
+
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$enc4x_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_multi_cbc_encrypt:
+
+global aesni_multi_cbc_decrypt
+
+ALIGN 32
+aesni_multi_cbc_decrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_multi_cbc_decrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ cmp edx,2
+ jb NEAR $L$dec_non_avx
+ mov ecx,DWORD[((OPENSSL_ia32cap_P+4))]
+ test ecx,268435456
+ jnz NEAR _avx_cbc_dec_shortcut
+ jmp NEAR $L$dec_non_avx
+ALIGN 16
+$L$dec_non_avx:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[64+rsp],xmm10
+ movaps XMMWORD[80+rsp],xmm11
+ movaps XMMWORD[96+rsp],xmm12
+ movaps XMMWORD[(-104)+rax],xmm13
+ movaps XMMWORD[(-88)+rax],xmm14
+ movaps XMMWORD[(-72)+rax],xmm15
+
+
+
+
+
+
+ sub rsp,48
+ and rsp,-64
+ mov QWORD[16+rsp],rax
+
+
+$L$dec4x_body:
+ movdqu xmm12,XMMWORD[rsi]
+ lea rsi,[120+rsi]
+ lea rdi,[80+rdi]
+
+$L$dec4x_loop_grande:
+ mov DWORD[24+rsp],edx
+ xor edx,edx
+ mov ecx,DWORD[((-64))+rdi]
+ mov r8,QWORD[((-80))+rdi]
+ cmp ecx,edx
+ mov r12,QWORD[((-72))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ movdqu xmm6,XMMWORD[((-56))+rdi]
+ mov DWORD[32+rsp],ecx
+ cmovle r8,rsp
+ mov ecx,DWORD[((-24))+rdi]
+ mov r9,QWORD[((-40))+rdi]
+ cmp ecx,edx
+ mov r13,QWORD[((-32))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ movdqu xmm7,XMMWORD[((-16))+rdi]
+ mov DWORD[36+rsp],ecx
+ cmovle r9,rsp
+ mov ecx,DWORD[16+rdi]
+ mov r10,QWORD[rdi]
+ cmp ecx,edx
+ mov r14,QWORD[8+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ movdqu xmm8,XMMWORD[24+rdi]
+ mov DWORD[40+rsp],ecx
+ cmovle r10,rsp
+ mov ecx,DWORD[56+rdi]
+ mov r11,QWORD[40+rdi]
+ cmp ecx,edx
+ mov r15,QWORD[48+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ movdqu xmm9,XMMWORD[64+rdi]
+ mov DWORD[44+rsp],ecx
+ cmovle r11,rsp
+ test edx,edx
+ jz NEAR $L$dec4x_done
+
+ movups xmm1,XMMWORD[((16-120))+rsi]
+ movups xmm0,XMMWORD[((32-120))+rsi]
+ mov eax,DWORD[((240-120))+rsi]
+ movdqu xmm2,XMMWORD[r8]
+ movdqu xmm3,XMMWORD[r9]
+ pxor xmm2,xmm12
+ movdqu xmm4,XMMWORD[r10]
+ pxor xmm3,xmm12
+ movdqu xmm5,XMMWORD[r11]
+ pxor xmm4,xmm12
+ pxor xmm5,xmm12
+ movdqa xmm10,XMMWORD[32+rsp]
+ xor rbx,rbx
+ jmp NEAR $L$oop_dec4x
+
+ALIGN 32
+$L$oop_dec4x:
+ add rbx,16
+ lea rbp,[16+rsp]
+ mov ecx,1
+ sub rbp,rbx
+
+DB 102,15,56,222,209
+ prefetcht0 [31+rbx*1+r8]
+ prefetcht0 [31+rbx*1+r9]
+DB 102,15,56,222,217
+ prefetcht0 [31+rbx*1+r10]
+ prefetcht0 [31+rbx*1+r11]
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[((48-120))+rsi]
+ cmp ecx,DWORD[32+rsp]
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+ cmovge r8,rbp
+ cmovg r12,rbp
+DB 102,15,56,222,232
+ movups xmm0,XMMWORD[((-56))+rsi]
+ cmp ecx,DWORD[36+rsp]
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+ cmovge r9,rbp
+ cmovg r13,rbp
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[((-40))+rsi]
+ cmp ecx,DWORD[40+rsp]
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+ cmovge r10,rbp
+ cmovg r14,rbp
+DB 102,15,56,222,232
+ movups xmm0,XMMWORD[((-24))+rsi]
+ cmp ecx,DWORD[44+rsp]
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+ cmovge r11,rbp
+ cmovg r15,rbp
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[((-8))+rsi]
+ movdqa xmm11,xmm10
+DB 102,15,56,222,208
+ prefetcht0 [15+rbx*1+r12]
+ prefetcht0 [15+rbx*1+r13]
+DB 102,15,56,222,216
+ prefetcht0 [15+rbx*1+r14]
+ prefetcht0 [15+rbx*1+r15]
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+ movups xmm0,XMMWORD[((128-120))+rsi]
+ pxor xmm12,xmm12
+
+DB 102,15,56,222,209
+ pcmpgtd xmm11,xmm12
+ movdqu xmm12,XMMWORD[((-120))+rsi]
+DB 102,15,56,222,217
+ paddd xmm10,xmm11
+ movdqa XMMWORD[32+rsp],xmm10
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[((144-120))+rsi]
+
+ cmp eax,11
+
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+ movups xmm0,XMMWORD[((160-120))+rsi]
+
+ jb NEAR $L$dec4x_tail
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[((176-120))+rsi]
+
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+ movups xmm0,XMMWORD[((192-120))+rsi]
+
+ je NEAR $L$dec4x_tail
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[((208-120))+rsi]
+
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+ movups xmm0,XMMWORD[((224-120))+rsi]
+ jmp NEAR $L$dec4x_tail
+
+ALIGN 32
+$L$dec4x_tail:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+ pxor xmm6,xmm0
+ pxor xmm7,xmm0
+DB 102,15,56,222,233
+ movdqu xmm1,XMMWORD[((16-120))+rsi]
+ pxor xmm8,xmm0
+ pxor xmm9,xmm0
+ movdqu xmm0,XMMWORD[((32-120))+rsi]
+
+DB 102,15,56,223,214
+DB 102,15,56,223,223
+ movdqu xmm6,XMMWORD[((-16))+rbx*1+r8]
+ movdqu xmm7,XMMWORD[((-16))+rbx*1+r9]
+DB 102,65,15,56,223,224
+DB 102,65,15,56,223,233
+ movdqu xmm8,XMMWORD[((-16))+rbx*1+r10]
+ movdqu xmm9,XMMWORD[((-16))+rbx*1+r11]
+
+ movups XMMWORD[(-16)+rbx*1+r12],xmm2
+ movdqu xmm2,XMMWORD[rbx*1+r8]
+ movups XMMWORD[(-16)+rbx*1+r13],xmm3
+ movdqu xmm3,XMMWORD[rbx*1+r9]
+ pxor xmm2,xmm12
+ movups XMMWORD[(-16)+rbx*1+r14],xmm4
+ movdqu xmm4,XMMWORD[rbx*1+r10]
+ pxor xmm3,xmm12
+ movups XMMWORD[(-16)+rbx*1+r15],xmm5
+ movdqu xmm5,XMMWORD[rbx*1+r11]
+ pxor xmm4,xmm12
+ pxor xmm5,xmm12
+
+ dec edx
+ jnz NEAR $L$oop_dec4x
+
+ mov rax,QWORD[16+rsp]
+
+ mov edx,DWORD[24+rsp]
+
+ lea rdi,[160+rdi]
+ dec edx
+ jnz NEAR $L$dec4x_loop_grande
+
+$L$dec4x_done:
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+
+
+
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$dec4x_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_multi_cbc_decrypt:
+
+ALIGN 32
+aesni_multi_cbc_encrypt_avx:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_multi_cbc_encrypt_avx:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+_avx_cbc_enc_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[64+rsp],xmm10
+ movaps XMMWORD[80+rsp],xmm11
+ movaps XMMWORD[(-120)+rax],xmm12
+ movaps XMMWORD[(-104)+rax],xmm13
+ movaps XMMWORD[(-88)+rax],xmm14
+ movaps XMMWORD[(-72)+rax],xmm15
+
+
+
+
+
+
+
+
+ sub rsp,192
+ and rsp,-128
+ mov QWORD[16+rsp],rax
+
+
+$L$enc8x_body:
+ vzeroupper
+ vmovdqu xmm15,XMMWORD[rsi]
+ lea rsi,[120+rsi]
+ lea rdi,[160+rdi]
+ shr edx,1
+
+$L$enc8x_loop_grande:
+
+ xor edx,edx
+ mov ecx,DWORD[((-144))+rdi]
+ mov r8,QWORD[((-160))+rdi]
+ cmp ecx,edx
+ mov rbx,QWORD[((-152))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm2,XMMWORD[((-136))+rdi]
+ mov DWORD[32+rsp],ecx
+ cmovle r8,rsp
+ sub rbx,r8
+ mov QWORD[64+rsp],rbx
+ mov ecx,DWORD[((-104))+rdi]
+ mov r9,QWORD[((-120))+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[((-112))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm3,XMMWORD[((-96))+rdi]
+ mov DWORD[36+rsp],ecx
+ cmovle r9,rsp
+ sub rbp,r9
+ mov QWORD[72+rsp],rbp
+ mov ecx,DWORD[((-64))+rdi]
+ mov r10,QWORD[((-80))+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[((-72))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm4,XMMWORD[((-56))+rdi]
+ mov DWORD[40+rsp],ecx
+ cmovle r10,rsp
+ sub rbp,r10
+ mov QWORD[80+rsp],rbp
+ mov ecx,DWORD[((-24))+rdi]
+ mov r11,QWORD[((-40))+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[((-32))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm5,XMMWORD[((-16))+rdi]
+ mov DWORD[44+rsp],ecx
+ cmovle r11,rsp
+ sub rbp,r11
+ mov QWORD[88+rsp],rbp
+ mov ecx,DWORD[16+rdi]
+ mov r12,QWORD[rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[8+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm6,XMMWORD[24+rdi]
+ mov DWORD[48+rsp],ecx
+ cmovle r12,rsp
+ sub rbp,r12
+ mov QWORD[96+rsp],rbp
+ mov ecx,DWORD[56+rdi]
+ mov r13,QWORD[40+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[48+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm7,XMMWORD[64+rdi]
+ mov DWORD[52+rsp],ecx
+ cmovle r13,rsp
+ sub rbp,r13
+ mov QWORD[104+rsp],rbp
+ mov ecx,DWORD[96+rdi]
+ mov r14,QWORD[80+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[88+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm8,XMMWORD[104+rdi]
+ mov DWORD[56+rsp],ecx
+ cmovle r14,rsp
+ sub rbp,r14
+ mov QWORD[112+rsp],rbp
+ mov ecx,DWORD[136+rdi]
+ mov r15,QWORD[120+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[128+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm9,XMMWORD[144+rdi]
+ mov DWORD[60+rsp],ecx
+ cmovle r15,rsp
+ sub rbp,r15
+ mov QWORD[120+rsp],rbp
+ test edx,edx
+ jz NEAR $L$enc8x_done
+
+ vmovups xmm1,XMMWORD[((16-120))+rsi]
+ vmovups xmm0,XMMWORD[((32-120))+rsi]
+ mov eax,DWORD[((240-120))+rsi]
+
+ vpxor xmm10,xmm15,XMMWORD[r8]
+ lea rbp,[128+rsp]
+ vpxor xmm11,xmm15,XMMWORD[r9]
+ vpxor xmm12,xmm15,XMMWORD[r10]
+ vpxor xmm13,xmm15,XMMWORD[r11]
+ vpxor xmm2,xmm2,xmm10
+ vpxor xmm10,xmm15,XMMWORD[r12]
+ vpxor xmm3,xmm3,xmm11
+ vpxor xmm11,xmm15,XMMWORD[r13]
+ vpxor xmm4,xmm4,xmm12
+ vpxor xmm12,xmm15,XMMWORD[r14]
+ vpxor xmm5,xmm5,xmm13
+ vpxor xmm13,xmm15,XMMWORD[r15]
+ vpxor xmm6,xmm6,xmm10
+ mov ecx,1
+ vpxor xmm7,xmm7,xmm11
+ vpxor xmm8,xmm8,xmm12
+ vpxor xmm9,xmm9,xmm13
+ jmp NEAR $L$oop_enc8x
+
+ALIGN 32
+$L$oop_enc8x:
+ vaesenc xmm2,xmm2,xmm1
+ cmp ecx,DWORD[((32+0))+rsp]
+ vaesenc xmm3,xmm3,xmm1
+ prefetcht0 [31+r8]
+ vaesenc xmm4,xmm4,xmm1
+ vaesenc xmm5,xmm5,xmm1
+ lea rbx,[rbx*1+r8]
+ cmovge r8,rsp
+ vaesenc xmm6,xmm6,xmm1
+ cmovg rbx,rsp
+ vaesenc xmm7,xmm7,xmm1
+ sub rbx,r8
+ vaesenc xmm8,xmm8,xmm1
+ vpxor xmm10,xmm15,XMMWORD[16+r8]
+ mov QWORD[((64+0))+rsp],rbx
+ vaesenc xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((-72))+rsi]
+ lea r8,[16+rbx*1+r8]
+ vmovdqu XMMWORD[rbp],xmm10
+ vaesenc xmm2,xmm2,xmm0
+ cmp ecx,DWORD[((32+4))+rsp]
+ mov rbx,QWORD[((64+8))+rsp]
+ vaesenc xmm3,xmm3,xmm0
+ prefetcht0 [31+r9]
+ vaesenc xmm4,xmm4,xmm0
+ vaesenc xmm5,xmm5,xmm0
+ lea rbx,[rbx*1+r9]
+ cmovge r9,rsp
+ vaesenc xmm6,xmm6,xmm0
+ cmovg rbx,rsp
+ vaesenc xmm7,xmm7,xmm0
+ sub rbx,r9
+ vaesenc xmm8,xmm8,xmm0
+ vpxor xmm11,xmm15,XMMWORD[16+r9]
+ mov QWORD[((64+8))+rsp],rbx
+ vaesenc xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[((-56))+rsi]
+ lea r9,[16+rbx*1+r9]
+ vmovdqu XMMWORD[16+rbp],xmm11
+ vaesenc xmm2,xmm2,xmm1
+ cmp ecx,DWORD[((32+8))+rsp]
+ mov rbx,QWORD[((64+16))+rsp]
+ vaesenc xmm3,xmm3,xmm1
+ prefetcht0 [31+r10]
+ vaesenc xmm4,xmm4,xmm1
+ prefetcht0 [15+r8]
+ vaesenc xmm5,xmm5,xmm1
+ lea rbx,[rbx*1+r10]
+ cmovge r10,rsp
+ vaesenc xmm6,xmm6,xmm1
+ cmovg rbx,rsp
+ vaesenc xmm7,xmm7,xmm1
+ sub rbx,r10
+ vaesenc xmm8,xmm8,xmm1
+ vpxor xmm12,xmm15,XMMWORD[16+r10]
+ mov QWORD[((64+16))+rsp],rbx
+ vaesenc xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((-40))+rsi]
+ lea r10,[16+rbx*1+r10]
+ vmovdqu XMMWORD[32+rbp],xmm12
+ vaesenc xmm2,xmm2,xmm0
+ cmp ecx,DWORD[((32+12))+rsp]
+ mov rbx,QWORD[((64+24))+rsp]
+ vaesenc xmm3,xmm3,xmm0
+ prefetcht0 [31+r11]
+ vaesenc xmm4,xmm4,xmm0
+ prefetcht0 [15+r9]
+ vaesenc xmm5,xmm5,xmm0
+ lea rbx,[rbx*1+r11]
+ cmovge r11,rsp
+ vaesenc xmm6,xmm6,xmm0
+ cmovg rbx,rsp
+ vaesenc xmm7,xmm7,xmm0
+ sub rbx,r11
+ vaesenc xmm8,xmm8,xmm0
+ vpxor xmm13,xmm15,XMMWORD[16+r11]
+ mov QWORD[((64+24))+rsp],rbx
+ vaesenc xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[((-24))+rsi]
+ lea r11,[16+rbx*1+r11]
+ vmovdqu XMMWORD[48+rbp],xmm13
+ vaesenc xmm2,xmm2,xmm1
+ cmp ecx,DWORD[((32+16))+rsp]
+ mov rbx,QWORD[((64+32))+rsp]
+ vaesenc xmm3,xmm3,xmm1
+ prefetcht0 [31+r12]
+ vaesenc xmm4,xmm4,xmm1
+ prefetcht0 [15+r10]
+ vaesenc xmm5,xmm5,xmm1
+ lea rbx,[rbx*1+r12]
+ cmovge r12,rsp
+ vaesenc xmm6,xmm6,xmm1
+ cmovg rbx,rsp
+ vaesenc xmm7,xmm7,xmm1
+ sub rbx,r12
+ vaesenc xmm8,xmm8,xmm1
+ vpxor xmm10,xmm15,XMMWORD[16+r12]
+ mov QWORD[((64+32))+rsp],rbx
+ vaesenc xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((-8))+rsi]
+ lea r12,[16+rbx*1+r12]
+ vaesenc xmm2,xmm2,xmm0
+ cmp ecx,DWORD[((32+20))+rsp]
+ mov rbx,QWORD[((64+40))+rsp]
+ vaesenc xmm3,xmm3,xmm0
+ prefetcht0 [31+r13]
+ vaesenc xmm4,xmm4,xmm0
+ prefetcht0 [15+r11]
+ vaesenc xmm5,xmm5,xmm0
+ lea rbx,[r13*1+rbx]
+ cmovge r13,rsp
+ vaesenc xmm6,xmm6,xmm0
+ cmovg rbx,rsp
+ vaesenc xmm7,xmm7,xmm0
+ sub rbx,r13
+ vaesenc xmm8,xmm8,xmm0
+ vpxor xmm11,xmm15,XMMWORD[16+r13]
+ mov QWORD[((64+40))+rsp],rbx
+ vaesenc xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[8+rsi]
+ lea r13,[16+rbx*1+r13]
+ vaesenc xmm2,xmm2,xmm1
+ cmp ecx,DWORD[((32+24))+rsp]
+ mov rbx,QWORD[((64+48))+rsp]
+ vaesenc xmm3,xmm3,xmm1
+ prefetcht0 [31+r14]
+ vaesenc xmm4,xmm4,xmm1
+ prefetcht0 [15+r12]
+ vaesenc xmm5,xmm5,xmm1
+ lea rbx,[rbx*1+r14]
+ cmovge r14,rsp
+ vaesenc xmm6,xmm6,xmm1
+ cmovg rbx,rsp
+ vaesenc xmm7,xmm7,xmm1
+ sub rbx,r14
+ vaesenc xmm8,xmm8,xmm1
+ vpxor xmm12,xmm15,XMMWORD[16+r14]
+ mov QWORD[((64+48))+rsp],rbx
+ vaesenc xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[24+rsi]
+ lea r14,[16+rbx*1+r14]
+ vaesenc xmm2,xmm2,xmm0
+ cmp ecx,DWORD[((32+28))+rsp]
+ mov rbx,QWORD[((64+56))+rsp]
+ vaesenc xmm3,xmm3,xmm0
+ prefetcht0 [31+r15]
+ vaesenc xmm4,xmm4,xmm0
+ prefetcht0 [15+r13]
+ vaesenc xmm5,xmm5,xmm0
+ lea rbx,[rbx*1+r15]
+ cmovge r15,rsp
+ vaesenc xmm6,xmm6,xmm0
+ cmovg rbx,rsp
+ vaesenc xmm7,xmm7,xmm0
+ sub rbx,r15
+ vaesenc xmm8,xmm8,xmm0
+ vpxor xmm13,xmm15,XMMWORD[16+r15]
+ mov QWORD[((64+56))+rsp],rbx
+ vaesenc xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[40+rsi]
+ lea r15,[16+rbx*1+r15]
+ vmovdqu xmm14,XMMWORD[32+rsp]
+ prefetcht0 [15+r14]
+ prefetcht0 [15+r15]
+ cmp eax,11
+ jb NEAR $L$enc8x_tail
+
+ vaesenc xmm2,xmm2,xmm1
+ vaesenc xmm3,xmm3,xmm1
+ vaesenc xmm4,xmm4,xmm1
+ vaesenc xmm5,xmm5,xmm1
+ vaesenc xmm6,xmm6,xmm1
+ vaesenc xmm7,xmm7,xmm1
+ vaesenc xmm8,xmm8,xmm1
+ vaesenc xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((176-120))+rsi]
+
+ vaesenc xmm2,xmm2,xmm0
+ vaesenc xmm3,xmm3,xmm0
+ vaesenc xmm4,xmm4,xmm0
+ vaesenc xmm5,xmm5,xmm0
+ vaesenc xmm6,xmm6,xmm0
+ vaesenc xmm7,xmm7,xmm0
+ vaesenc xmm8,xmm8,xmm0
+ vaesenc xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[((192-120))+rsi]
+ je NEAR $L$enc8x_tail
+
+ vaesenc xmm2,xmm2,xmm1
+ vaesenc xmm3,xmm3,xmm1
+ vaesenc xmm4,xmm4,xmm1
+ vaesenc xmm5,xmm5,xmm1
+ vaesenc xmm6,xmm6,xmm1
+ vaesenc xmm7,xmm7,xmm1
+ vaesenc xmm8,xmm8,xmm1
+ vaesenc xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((208-120))+rsi]
+
+ vaesenc xmm2,xmm2,xmm0
+ vaesenc xmm3,xmm3,xmm0
+ vaesenc xmm4,xmm4,xmm0
+ vaesenc xmm5,xmm5,xmm0
+ vaesenc xmm6,xmm6,xmm0
+ vaesenc xmm7,xmm7,xmm0
+ vaesenc xmm8,xmm8,xmm0
+ vaesenc xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[((224-120))+rsi]
+
+$L$enc8x_tail:
+ vaesenc xmm2,xmm2,xmm1
+ vpxor xmm15,xmm15,xmm15
+ vaesenc xmm3,xmm3,xmm1
+ vaesenc xmm4,xmm4,xmm1
+ vpcmpgtd xmm15,xmm14,xmm15
+ vaesenc xmm5,xmm5,xmm1
+ vaesenc xmm6,xmm6,xmm1
+ vpaddd xmm15,xmm15,xmm14
+ vmovdqu xmm14,XMMWORD[48+rsp]
+ vaesenc xmm7,xmm7,xmm1
+ mov rbx,QWORD[64+rsp]
+ vaesenc xmm8,xmm8,xmm1
+ vaesenc xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((16-120))+rsi]
+
+ vaesenclast xmm2,xmm2,xmm0
+ vmovdqa XMMWORD[32+rsp],xmm15
+ vpxor xmm15,xmm15,xmm15
+ vaesenclast xmm3,xmm3,xmm0
+ vaesenclast xmm4,xmm4,xmm0
+ vpcmpgtd xmm15,xmm14,xmm15
+ vaesenclast xmm5,xmm5,xmm0
+ vaesenclast xmm6,xmm6,xmm0
+ vpaddd xmm14,xmm14,xmm15
+ vmovdqu xmm15,XMMWORD[((-120))+rsi]
+ vaesenclast xmm7,xmm7,xmm0
+ vaesenclast xmm8,xmm8,xmm0
+ vmovdqa XMMWORD[48+rsp],xmm14
+ vaesenclast xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[((32-120))+rsi]
+
+ vmovups XMMWORD[(-16)+r8],xmm2
+ sub r8,rbx
+ vpxor xmm2,xmm2,XMMWORD[rbp]
+ vmovups XMMWORD[(-16)+r9],xmm3
+ sub r9,QWORD[72+rsp]
+ vpxor xmm3,xmm3,XMMWORD[16+rbp]
+ vmovups XMMWORD[(-16)+r10],xmm4
+ sub r10,QWORD[80+rsp]
+ vpxor xmm4,xmm4,XMMWORD[32+rbp]
+ vmovups XMMWORD[(-16)+r11],xmm5
+ sub r11,QWORD[88+rsp]
+ vpxor xmm5,xmm5,XMMWORD[48+rbp]
+ vmovups XMMWORD[(-16)+r12],xmm6
+ sub r12,QWORD[96+rsp]
+ vpxor xmm6,xmm6,xmm10
+ vmovups XMMWORD[(-16)+r13],xmm7
+ sub r13,QWORD[104+rsp]
+ vpxor xmm7,xmm7,xmm11
+ vmovups XMMWORD[(-16)+r14],xmm8
+ sub r14,QWORD[112+rsp]
+ vpxor xmm8,xmm8,xmm12
+ vmovups XMMWORD[(-16)+r15],xmm9
+ sub r15,QWORD[120+rsp]
+ vpxor xmm9,xmm9,xmm13
+
+ dec edx
+ jnz NEAR $L$oop_enc8x
+
+ mov rax,QWORD[16+rsp]
+
+
+
+
+
+
+$L$enc8x_done:
+ vzeroupper
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+ movaps xmm13,XMMWORD[((-104))+rax]
+ movaps xmm14,XMMWORD[((-88))+rax]
+ movaps xmm15,XMMWORD[((-72))+rax]
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$enc8x_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_multi_cbc_encrypt_avx:
+
+
+ALIGN 32
+aesni_multi_cbc_decrypt_avx:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_multi_cbc_decrypt_avx:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+_avx_cbc_dec_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[64+rsp],xmm10
+ movaps XMMWORD[80+rsp],xmm11
+ movaps XMMWORD[(-120)+rax],xmm12
+ movaps XMMWORD[(-104)+rax],xmm13
+ movaps XMMWORD[(-88)+rax],xmm14
+ movaps XMMWORD[(-72)+rax],xmm15
+
+
+
+
+
+
+
+
+
+ sub rsp,256
+ and rsp,-256
+ sub rsp,192
+ mov QWORD[16+rsp],rax
+
+
+$L$dec8x_body:
+ vzeroupper
+ vmovdqu xmm15,XMMWORD[rsi]
+ lea rsi,[120+rsi]
+ lea rdi,[160+rdi]
+ shr edx,1
+
+$L$dec8x_loop_grande:
+
+ xor edx,edx
+ mov ecx,DWORD[((-144))+rdi]
+ mov r8,QWORD[((-160))+rdi]
+ cmp ecx,edx
+ mov rbx,QWORD[((-152))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm2,XMMWORD[((-136))+rdi]
+ mov DWORD[32+rsp],ecx
+ cmovle r8,rsp
+ sub rbx,r8
+ mov QWORD[64+rsp],rbx
+ vmovdqu XMMWORD[192+rsp],xmm2
+ mov ecx,DWORD[((-104))+rdi]
+ mov r9,QWORD[((-120))+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[((-112))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm3,XMMWORD[((-96))+rdi]
+ mov DWORD[36+rsp],ecx
+ cmovle r9,rsp
+ sub rbp,r9
+ mov QWORD[72+rsp],rbp
+ vmovdqu XMMWORD[208+rsp],xmm3
+ mov ecx,DWORD[((-64))+rdi]
+ mov r10,QWORD[((-80))+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[((-72))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm4,XMMWORD[((-56))+rdi]
+ mov DWORD[40+rsp],ecx
+ cmovle r10,rsp
+ sub rbp,r10
+ mov QWORD[80+rsp],rbp
+ vmovdqu XMMWORD[224+rsp],xmm4
+ mov ecx,DWORD[((-24))+rdi]
+ mov r11,QWORD[((-40))+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[((-32))+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm5,XMMWORD[((-16))+rdi]
+ mov DWORD[44+rsp],ecx
+ cmovle r11,rsp
+ sub rbp,r11
+ mov QWORD[88+rsp],rbp
+ vmovdqu XMMWORD[240+rsp],xmm5
+ mov ecx,DWORD[16+rdi]
+ mov r12,QWORD[rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[8+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm6,XMMWORD[24+rdi]
+ mov DWORD[48+rsp],ecx
+ cmovle r12,rsp
+ sub rbp,r12
+ mov QWORD[96+rsp],rbp
+ vmovdqu XMMWORD[256+rsp],xmm6
+ mov ecx,DWORD[56+rdi]
+ mov r13,QWORD[40+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[48+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm7,XMMWORD[64+rdi]
+ mov DWORD[52+rsp],ecx
+ cmovle r13,rsp
+ sub rbp,r13
+ mov QWORD[104+rsp],rbp
+ vmovdqu XMMWORD[272+rsp],xmm7
+ mov ecx,DWORD[96+rdi]
+ mov r14,QWORD[80+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[88+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm8,XMMWORD[104+rdi]
+ mov DWORD[56+rsp],ecx
+ cmovle r14,rsp
+ sub rbp,r14
+ mov QWORD[112+rsp],rbp
+ vmovdqu XMMWORD[288+rsp],xmm8
+ mov ecx,DWORD[136+rdi]
+ mov r15,QWORD[120+rdi]
+ cmp ecx,edx
+ mov rbp,QWORD[128+rdi]
+ cmovg edx,ecx
+ test ecx,ecx
+ vmovdqu xmm9,XMMWORD[144+rdi]
+ mov DWORD[60+rsp],ecx
+ cmovle r15,rsp
+ sub rbp,r15
+ mov QWORD[120+rsp],rbp
+ vmovdqu XMMWORD[304+rsp],xmm9
+ test edx,edx
+ jz NEAR $L$dec8x_done
+
+ vmovups xmm1,XMMWORD[((16-120))+rsi]
+ vmovups xmm0,XMMWORD[((32-120))+rsi]
+ mov eax,DWORD[((240-120))+rsi]
+ lea rbp,[((192+128))+rsp]
+
+ vmovdqu xmm2,XMMWORD[r8]
+ vmovdqu xmm3,XMMWORD[r9]
+ vmovdqu xmm4,XMMWORD[r10]
+ vmovdqu xmm5,XMMWORD[r11]
+ vmovdqu xmm6,XMMWORD[r12]
+ vmovdqu xmm7,XMMWORD[r13]
+ vmovdqu xmm8,XMMWORD[r14]
+ vmovdqu xmm9,XMMWORD[r15]
+ vmovdqu XMMWORD[rbp],xmm2
+ vpxor xmm2,xmm2,xmm15
+ vmovdqu XMMWORD[16+rbp],xmm3
+ vpxor xmm3,xmm3,xmm15
+ vmovdqu XMMWORD[32+rbp],xmm4
+ vpxor xmm4,xmm4,xmm15
+ vmovdqu XMMWORD[48+rbp],xmm5
+ vpxor xmm5,xmm5,xmm15
+ vmovdqu XMMWORD[64+rbp],xmm6
+ vpxor xmm6,xmm6,xmm15
+ vmovdqu XMMWORD[80+rbp],xmm7
+ vpxor xmm7,xmm7,xmm15
+ vmovdqu XMMWORD[96+rbp],xmm8
+ vpxor xmm8,xmm8,xmm15
+ vmovdqu XMMWORD[112+rbp],xmm9
+ vpxor xmm9,xmm9,xmm15
+ xor rbp,0x80
+ mov ecx,1
+ jmp NEAR $L$oop_dec8x
+
+ALIGN 32
+$L$oop_dec8x:
+ vaesdec xmm2,xmm2,xmm1
+ cmp ecx,DWORD[((32+0))+rsp]
+ vaesdec xmm3,xmm3,xmm1
+ prefetcht0 [31+r8]
+ vaesdec xmm4,xmm4,xmm1
+ vaesdec xmm5,xmm5,xmm1
+ lea rbx,[rbx*1+r8]
+ cmovge r8,rsp
+ vaesdec xmm6,xmm6,xmm1
+ cmovg rbx,rsp
+ vaesdec xmm7,xmm7,xmm1
+ sub rbx,r8
+ vaesdec xmm8,xmm8,xmm1
+ vmovdqu xmm10,XMMWORD[16+r8]
+ mov QWORD[((64+0))+rsp],rbx
+ vaesdec xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((-72))+rsi]
+ lea r8,[16+rbx*1+r8]
+ vmovdqu XMMWORD[128+rsp],xmm10
+ vaesdec xmm2,xmm2,xmm0
+ cmp ecx,DWORD[((32+4))+rsp]
+ mov rbx,QWORD[((64+8))+rsp]
+ vaesdec xmm3,xmm3,xmm0
+ prefetcht0 [31+r9]
+ vaesdec xmm4,xmm4,xmm0
+ vaesdec xmm5,xmm5,xmm0
+ lea rbx,[rbx*1+r9]
+ cmovge r9,rsp
+ vaesdec xmm6,xmm6,xmm0
+ cmovg rbx,rsp
+ vaesdec xmm7,xmm7,xmm0
+ sub rbx,r9
+ vaesdec xmm8,xmm8,xmm0
+ vmovdqu xmm11,XMMWORD[16+r9]
+ mov QWORD[((64+8))+rsp],rbx
+ vaesdec xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[((-56))+rsi]
+ lea r9,[16+rbx*1+r9]
+ vmovdqu XMMWORD[144+rsp],xmm11
+ vaesdec xmm2,xmm2,xmm1
+ cmp ecx,DWORD[((32+8))+rsp]
+ mov rbx,QWORD[((64+16))+rsp]
+ vaesdec xmm3,xmm3,xmm1
+ prefetcht0 [31+r10]
+ vaesdec xmm4,xmm4,xmm1
+ prefetcht0 [15+r8]
+ vaesdec xmm5,xmm5,xmm1
+ lea rbx,[rbx*1+r10]
+ cmovge r10,rsp
+ vaesdec xmm6,xmm6,xmm1
+ cmovg rbx,rsp
+ vaesdec xmm7,xmm7,xmm1
+ sub rbx,r10
+ vaesdec xmm8,xmm8,xmm1
+ vmovdqu xmm12,XMMWORD[16+r10]
+ mov QWORD[((64+16))+rsp],rbx
+ vaesdec xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((-40))+rsi]
+ lea r10,[16+rbx*1+r10]
+ vmovdqu XMMWORD[160+rsp],xmm12
+ vaesdec xmm2,xmm2,xmm0
+ cmp ecx,DWORD[((32+12))+rsp]
+ mov rbx,QWORD[((64+24))+rsp]
+ vaesdec xmm3,xmm3,xmm0
+ prefetcht0 [31+r11]
+ vaesdec xmm4,xmm4,xmm0
+ prefetcht0 [15+r9]
+ vaesdec xmm5,xmm5,xmm0
+ lea rbx,[rbx*1+r11]
+ cmovge r11,rsp
+ vaesdec xmm6,xmm6,xmm0
+ cmovg rbx,rsp
+ vaesdec xmm7,xmm7,xmm0
+ sub rbx,r11
+ vaesdec xmm8,xmm8,xmm0
+ vmovdqu xmm13,XMMWORD[16+r11]
+ mov QWORD[((64+24))+rsp],rbx
+ vaesdec xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[((-24))+rsi]
+ lea r11,[16+rbx*1+r11]
+ vmovdqu XMMWORD[176+rsp],xmm13
+ vaesdec xmm2,xmm2,xmm1
+ cmp ecx,DWORD[((32+16))+rsp]
+ mov rbx,QWORD[((64+32))+rsp]
+ vaesdec xmm3,xmm3,xmm1
+ prefetcht0 [31+r12]
+ vaesdec xmm4,xmm4,xmm1
+ prefetcht0 [15+r10]
+ vaesdec xmm5,xmm5,xmm1
+ lea rbx,[rbx*1+r12]
+ cmovge r12,rsp
+ vaesdec xmm6,xmm6,xmm1
+ cmovg rbx,rsp
+ vaesdec xmm7,xmm7,xmm1
+ sub rbx,r12
+ vaesdec xmm8,xmm8,xmm1
+ vmovdqu xmm10,XMMWORD[16+r12]
+ mov QWORD[((64+32))+rsp],rbx
+ vaesdec xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((-8))+rsi]
+ lea r12,[16+rbx*1+r12]
+ vaesdec xmm2,xmm2,xmm0
+ cmp ecx,DWORD[((32+20))+rsp]
+ mov rbx,QWORD[((64+40))+rsp]
+ vaesdec xmm3,xmm3,xmm0
+ prefetcht0 [31+r13]
+ vaesdec xmm4,xmm4,xmm0
+ prefetcht0 [15+r11]
+ vaesdec xmm5,xmm5,xmm0
+ lea rbx,[r13*1+rbx]
+ cmovge r13,rsp
+ vaesdec xmm6,xmm6,xmm0
+ cmovg rbx,rsp
+ vaesdec xmm7,xmm7,xmm0
+ sub rbx,r13
+ vaesdec xmm8,xmm8,xmm0
+ vmovdqu xmm11,XMMWORD[16+r13]
+ mov QWORD[((64+40))+rsp],rbx
+ vaesdec xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[8+rsi]
+ lea r13,[16+rbx*1+r13]
+ vaesdec xmm2,xmm2,xmm1
+ cmp ecx,DWORD[((32+24))+rsp]
+ mov rbx,QWORD[((64+48))+rsp]
+ vaesdec xmm3,xmm3,xmm1
+ prefetcht0 [31+r14]
+ vaesdec xmm4,xmm4,xmm1
+ prefetcht0 [15+r12]
+ vaesdec xmm5,xmm5,xmm1
+ lea rbx,[rbx*1+r14]
+ cmovge r14,rsp
+ vaesdec xmm6,xmm6,xmm1
+ cmovg rbx,rsp
+ vaesdec xmm7,xmm7,xmm1
+ sub rbx,r14
+ vaesdec xmm8,xmm8,xmm1
+ vmovdqu xmm12,XMMWORD[16+r14]
+ mov QWORD[((64+48))+rsp],rbx
+ vaesdec xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[24+rsi]
+ lea r14,[16+rbx*1+r14]
+ vaesdec xmm2,xmm2,xmm0
+ cmp ecx,DWORD[((32+28))+rsp]
+ mov rbx,QWORD[((64+56))+rsp]
+ vaesdec xmm3,xmm3,xmm0
+ prefetcht0 [31+r15]
+ vaesdec xmm4,xmm4,xmm0
+ prefetcht0 [15+r13]
+ vaesdec xmm5,xmm5,xmm0
+ lea rbx,[rbx*1+r15]
+ cmovge r15,rsp
+ vaesdec xmm6,xmm6,xmm0
+ cmovg rbx,rsp
+ vaesdec xmm7,xmm7,xmm0
+ sub rbx,r15
+ vaesdec xmm8,xmm8,xmm0
+ vmovdqu xmm13,XMMWORD[16+r15]
+ mov QWORD[((64+56))+rsp],rbx
+ vaesdec xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[40+rsi]
+ lea r15,[16+rbx*1+r15]
+ vmovdqu xmm14,XMMWORD[32+rsp]
+ prefetcht0 [15+r14]
+ prefetcht0 [15+r15]
+ cmp eax,11
+ jb NEAR $L$dec8x_tail
+
+ vaesdec xmm2,xmm2,xmm1
+ vaesdec xmm3,xmm3,xmm1
+ vaesdec xmm4,xmm4,xmm1
+ vaesdec xmm5,xmm5,xmm1
+ vaesdec xmm6,xmm6,xmm1
+ vaesdec xmm7,xmm7,xmm1
+ vaesdec xmm8,xmm8,xmm1
+ vaesdec xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((176-120))+rsi]
+
+ vaesdec xmm2,xmm2,xmm0
+ vaesdec xmm3,xmm3,xmm0
+ vaesdec xmm4,xmm4,xmm0
+ vaesdec xmm5,xmm5,xmm0
+ vaesdec xmm6,xmm6,xmm0
+ vaesdec xmm7,xmm7,xmm0
+ vaesdec xmm8,xmm8,xmm0
+ vaesdec xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[((192-120))+rsi]
+ je NEAR $L$dec8x_tail
+
+ vaesdec xmm2,xmm2,xmm1
+ vaesdec xmm3,xmm3,xmm1
+ vaesdec xmm4,xmm4,xmm1
+ vaesdec xmm5,xmm5,xmm1
+ vaesdec xmm6,xmm6,xmm1
+ vaesdec xmm7,xmm7,xmm1
+ vaesdec xmm8,xmm8,xmm1
+ vaesdec xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((208-120))+rsi]
+
+ vaesdec xmm2,xmm2,xmm0
+ vaesdec xmm3,xmm3,xmm0
+ vaesdec xmm4,xmm4,xmm0
+ vaesdec xmm5,xmm5,xmm0
+ vaesdec xmm6,xmm6,xmm0
+ vaesdec xmm7,xmm7,xmm0
+ vaesdec xmm8,xmm8,xmm0
+ vaesdec xmm9,xmm9,xmm0
+ vmovups xmm0,XMMWORD[((224-120))+rsi]
+
+$L$dec8x_tail:
+ vaesdec xmm2,xmm2,xmm1
+ vpxor xmm15,xmm15,xmm15
+ vaesdec xmm3,xmm3,xmm1
+ vaesdec xmm4,xmm4,xmm1
+ vpcmpgtd xmm15,xmm14,xmm15
+ vaesdec xmm5,xmm5,xmm1
+ vaesdec xmm6,xmm6,xmm1
+ vpaddd xmm15,xmm15,xmm14
+ vmovdqu xmm14,XMMWORD[48+rsp]
+ vaesdec xmm7,xmm7,xmm1
+ mov rbx,QWORD[64+rsp]
+ vaesdec xmm8,xmm8,xmm1
+ vaesdec xmm9,xmm9,xmm1
+ vmovups xmm1,XMMWORD[((16-120))+rsi]
+
+ vaesdeclast xmm2,xmm2,xmm0
+ vmovdqa XMMWORD[32+rsp],xmm15
+ vpxor xmm15,xmm15,xmm15
+ vaesdeclast xmm3,xmm3,xmm0
+ vpxor xmm2,xmm2,XMMWORD[rbp]
+ vaesdeclast xmm4,xmm4,xmm0
+ vpxor xmm3,xmm3,XMMWORD[16+rbp]
+ vpcmpgtd xmm15,xmm14,xmm15
+ vaesdeclast xmm5,xmm5,xmm0
+ vpxor xmm4,xmm4,XMMWORD[32+rbp]
+ vaesdeclast xmm6,xmm6,xmm0
+ vpxor xmm5,xmm5,XMMWORD[48+rbp]
+ vpaddd xmm14,xmm14,xmm15
+ vmovdqu xmm15,XMMWORD[((-120))+rsi]
+ vaesdeclast xmm7,xmm7,xmm0
+ vpxor xmm6,xmm6,XMMWORD[64+rbp]
+ vaesdeclast xmm8,xmm8,xmm0
+ vpxor xmm7,xmm7,XMMWORD[80+rbp]
+ vmovdqa XMMWORD[48+rsp],xmm14
+ vaesdeclast xmm9,xmm9,xmm0
+ vpxor xmm8,xmm8,XMMWORD[96+rbp]
+ vmovups xmm0,XMMWORD[((32-120))+rsi]
+
+ vmovups XMMWORD[(-16)+r8],xmm2
+ sub r8,rbx
+ vmovdqu xmm2,XMMWORD[((128+0))+rsp]
+ vpxor xmm9,xmm9,XMMWORD[112+rbp]
+ vmovups XMMWORD[(-16)+r9],xmm3
+ sub r9,QWORD[72+rsp]
+ vmovdqu XMMWORD[rbp],xmm2
+ vpxor xmm2,xmm2,xmm15
+ vmovdqu xmm3,XMMWORD[((128+16))+rsp]
+ vmovups XMMWORD[(-16)+r10],xmm4
+ sub r10,QWORD[80+rsp]
+ vmovdqu XMMWORD[16+rbp],xmm3
+ vpxor xmm3,xmm3,xmm15
+ vmovdqu xmm4,XMMWORD[((128+32))+rsp]
+ vmovups XMMWORD[(-16)+r11],xmm5
+ sub r11,QWORD[88+rsp]
+ vmovdqu XMMWORD[32+rbp],xmm4
+ vpxor xmm4,xmm4,xmm15
+ vmovdqu xmm5,XMMWORD[((128+48))+rsp]
+ vmovups XMMWORD[(-16)+r12],xmm6
+ sub r12,QWORD[96+rsp]
+ vmovdqu XMMWORD[48+rbp],xmm5
+ vpxor xmm5,xmm5,xmm15
+ vmovdqu XMMWORD[64+rbp],xmm10
+ vpxor xmm6,xmm15,xmm10
+ vmovups XMMWORD[(-16)+r13],xmm7
+ sub r13,QWORD[104+rsp]
+ vmovdqu XMMWORD[80+rbp],xmm11
+ vpxor xmm7,xmm15,xmm11
+ vmovups XMMWORD[(-16)+r14],xmm8
+ sub r14,QWORD[112+rsp]
+ vmovdqu XMMWORD[96+rbp],xmm12
+ vpxor xmm8,xmm15,xmm12
+ vmovups XMMWORD[(-16)+r15],xmm9
+ sub r15,QWORD[120+rsp]
+ vmovdqu XMMWORD[112+rbp],xmm13
+ vpxor xmm9,xmm15,xmm13
+
+ xor rbp,128
+ dec edx
+ jnz NEAR $L$oop_dec8x
+
+ mov rax,QWORD[16+rsp]
+
+
+
+
+
+
+$L$dec8x_done:
+ vzeroupper
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+ movaps xmm13,XMMWORD[((-104))+rax]
+ movaps xmm14,XMMWORD[((-88))+rax]
+ movaps xmm15,XMMWORD[((-72))+rax]
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$dec8x_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_multi_cbc_decrypt_avx:
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ mov rax,QWORD[16+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+ lea rsi,[((-56-160))+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_aesni_multi_cbc_encrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_multi_cbc_encrypt wrt ..imagebase
+ DD $L$SEH_info_aesni_multi_cbc_encrypt wrt ..imagebase
+ DD $L$SEH_begin_aesni_multi_cbc_decrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_multi_cbc_decrypt wrt ..imagebase
+ DD $L$SEH_info_aesni_multi_cbc_decrypt wrt ..imagebase
+ DD $L$SEH_begin_aesni_multi_cbc_encrypt_avx wrt ..imagebase
+ DD $L$SEH_end_aesni_multi_cbc_encrypt_avx wrt ..imagebase
+ DD $L$SEH_info_aesni_multi_cbc_encrypt_avx wrt ..imagebase
+ DD $L$SEH_begin_aesni_multi_cbc_decrypt_avx wrt ..imagebase
+ DD $L$SEH_end_aesni_multi_cbc_decrypt_avx wrt ..imagebase
+ DD $L$SEH_info_aesni_multi_cbc_decrypt_avx wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_aesni_multi_cbc_encrypt:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$enc4x_body wrt ..imagebase,$L$enc4x_epilogue wrt ..imagebase
+$L$SEH_info_aesni_multi_cbc_decrypt:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$dec4x_body wrt ..imagebase,$L$dec4x_epilogue wrt ..imagebase
+$L$SEH_info_aesni_multi_cbc_encrypt_avx:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$enc8x_body wrt ..imagebase,$L$enc8x_epilogue wrt ..imagebase
+$L$SEH_info_aesni_multi_cbc_decrypt_avx:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$dec8x_body wrt ..imagebase,$L$dec8x_epilogue wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm
new file mode 100644
index 0000000000..0b706c4e77
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm
@@ -0,0 +1,3271 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN OPENSSL_ia32cap_P
+
+global aesni_cbc_sha1_enc
+
+ALIGN 32
+aesni_cbc_sha1_enc:
+
+ mov r10d,DWORD[((OPENSSL_ia32cap_P+0))]
+ mov r11,QWORD[((OPENSSL_ia32cap_P+4))]
+ bt r11,61
+ jc NEAR aesni_cbc_sha1_enc_shaext
+ and r11d,268435456
+ and r10d,1073741824
+ or r10d,r11d
+ cmp r10d,1342177280
+ je NEAR aesni_cbc_sha1_enc_avx
+ jmp NEAR aesni_cbc_sha1_enc_ssse3
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 32
+aesni_cbc_sha1_enc_ssse3:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_cbc_sha1_enc_ssse3:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov r10,QWORD[56+rsp]
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-264))+rsp]
+
+
+
+ movaps XMMWORD[(96+0)+rsp],xmm6
+ movaps XMMWORD[(96+16)+rsp],xmm7
+ movaps XMMWORD[(96+32)+rsp],xmm8
+ movaps XMMWORD[(96+48)+rsp],xmm9
+ movaps XMMWORD[(96+64)+rsp],xmm10
+ movaps XMMWORD[(96+80)+rsp],xmm11
+ movaps XMMWORD[(96+96)+rsp],xmm12
+ movaps XMMWORD[(96+112)+rsp],xmm13
+ movaps XMMWORD[(96+128)+rsp],xmm14
+ movaps XMMWORD[(96+144)+rsp],xmm15
+$L$prologue_ssse3:
+ mov r12,rdi
+ mov r13,rsi
+ mov r14,rdx
+ lea r15,[112+rcx]
+ movdqu xmm2,XMMWORD[r8]
+ mov QWORD[88+rsp],r8
+ shl r14,6
+ sub r13,r12
+ mov r8d,DWORD[((240-112))+r15]
+ add r14,r10
+
+ lea r11,[K_XX_XX]
+ mov eax,DWORD[r9]
+ mov ebx,DWORD[4+r9]
+ mov ecx,DWORD[8+r9]
+ mov edx,DWORD[12+r9]
+ mov esi,ebx
+ mov ebp,DWORD[16+r9]
+ mov edi,ecx
+ xor edi,edx
+ and esi,edi
+
+ movdqa xmm3,XMMWORD[64+r11]
+ movdqa xmm13,XMMWORD[r11]
+ movdqu xmm4,XMMWORD[r10]
+ movdqu xmm5,XMMWORD[16+r10]
+ movdqu xmm6,XMMWORD[32+r10]
+ movdqu xmm7,XMMWORD[48+r10]
+DB 102,15,56,0,227
+DB 102,15,56,0,235
+DB 102,15,56,0,243
+ add r10,64
+ paddd xmm4,xmm13
+DB 102,15,56,0,251
+ paddd xmm5,xmm13
+ paddd xmm6,xmm13
+ movdqa XMMWORD[rsp],xmm4
+ psubd xmm4,xmm13
+ movdqa XMMWORD[16+rsp],xmm5
+ psubd xmm5,xmm13
+ movdqa XMMWORD[32+rsp],xmm6
+ psubd xmm6,xmm13
+ movups xmm15,XMMWORD[((-112))+r15]
+ movups xmm0,XMMWORD[((16-112))+r15]
+ jmp NEAR $L$oop_ssse3
+ALIGN 32
+$L$oop_ssse3:
+ ror ebx,2
+ movups xmm14,XMMWORD[r12]
+ xorps xmm14,xmm15
+ xorps xmm2,xmm14
+ movups xmm1,XMMWORD[((-80))+r15]
+DB 102,15,56,220,208
+ pshufd xmm8,xmm4,238
+ xor esi,edx
+ movdqa xmm12,xmm7
+ paddd xmm13,xmm7
+ mov edi,eax
+ add ebp,DWORD[rsp]
+ punpcklqdq xmm8,xmm5
+ xor ebx,ecx
+ rol eax,5
+ add ebp,esi
+ psrldq xmm12,4
+ and edi,ebx
+ xor ebx,ecx
+ pxor xmm8,xmm4
+ add ebp,eax
+ ror eax,7
+ pxor xmm12,xmm6
+ xor edi,ecx
+ mov esi,ebp
+ add edx,DWORD[4+rsp]
+ pxor xmm8,xmm12
+ xor eax,ebx
+ rol ebp,5
+ movdqa XMMWORD[48+rsp],xmm13
+ add edx,edi
+ movups xmm0,XMMWORD[((-64))+r15]
+DB 102,15,56,220,209
+ and esi,eax
+ movdqa xmm3,xmm8
+ xor eax,ebx
+ add edx,ebp
+ ror ebp,7
+ movdqa xmm12,xmm8
+ xor esi,ebx
+ pslldq xmm3,12
+ paddd xmm8,xmm8
+ mov edi,edx
+ add ecx,DWORD[8+rsp]
+ psrld xmm12,31
+ xor ebp,eax
+ rol edx,5
+ add ecx,esi
+ movdqa xmm13,xmm3
+ and edi,ebp
+ xor ebp,eax
+ psrld xmm3,30
+ add ecx,edx
+ ror edx,7
+ por xmm8,xmm12
+ xor edi,eax
+ mov esi,ecx
+ add ebx,DWORD[12+rsp]
+ movups xmm1,XMMWORD[((-48))+r15]
+DB 102,15,56,220,208
+ pslld xmm13,2
+ pxor xmm8,xmm3
+ xor edx,ebp
+ movdqa xmm3,XMMWORD[r11]
+ rol ecx,5
+ add ebx,edi
+ and esi,edx
+ pxor xmm8,xmm13
+ xor edx,ebp
+ add ebx,ecx
+ ror ecx,7
+ pshufd xmm9,xmm5,238
+ xor esi,ebp
+ movdqa xmm13,xmm8
+ paddd xmm3,xmm8
+ mov edi,ebx
+ add eax,DWORD[16+rsp]
+ punpcklqdq xmm9,xmm6
+ xor ecx,edx
+ rol ebx,5
+ add eax,esi
+ psrldq xmm13,4
+ and edi,ecx
+ xor ecx,edx
+ pxor xmm9,xmm5
+ add eax,ebx
+ ror ebx,7
+ movups xmm0,XMMWORD[((-32))+r15]
+DB 102,15,56,220,209
+ pxor xmm13,xmm7
+ xor edi,edx
+ mov esi,eax
+ add ebp,DWORD[20+rsp]
+ pxor xmm9,xmm13
+ xor ebx,ecx
+ rol eax,5
+ movdqa XMMWORD[rsp],xmm3
+ add ebp,edi
+ and esi,ebx
+ movdqa xmm12,xmm9
+ xor ebx,ecx
+ add ebp,eax
+ ror eax,7
+ movdqa xmm13,xmm9
+ xor esi,ecx
+ pslldq xmm12,12
+ paddd xmm9,xmm9
+ mov edi,ebp
+ add edx,DWORD[24+rsp]
+ psrld xmm13,31
+ xor eax,ebx
+ rol ebp,5
+ add edx,esi
+ movups xmm1,XMMWORD[((-16))+r15]
+DB 102,15,56,220,208
+ movdqa xmm3,xmm12
+ and edi,eax
+ xor eax,ebx
+ psrld xmm12,30
+ add edx,ebp
+ ror ebp,7
+ por xmm9,xmm13
+ xor edi,ebx
+ mov esi,edx
+ add ecx,DWORD[28+rsp]
+ pslld xmm3,2
+ pxor xmm9,xmm12
+ xor ebp,eax
+ movdqa xmm12,XMMWORD[16+r11]
+ rol edx,5
+ add ecx,edi
+ and esi,ebp
+ pxor xmm9,xmm3
+ xor ebp,eax
+ add ecx,edx
+ ror edx,7
+ pshufd xmm10,xmm6,238
+ xor esi,eax
+ movdqa xmm3,xmm9
+ paddd xmm12,xmm9
+ mov edi,ecx
+ add ebx,DWORD[32+rsp]
+ movups xmm0,XMMWORD[r15]
+DB 102,15,56,220,209
+ punpcklqdq xmm10,xmm7
+ xor edx,ebp
+ rol ecx,5
+ add ebx,esi
+ psrldq xmm3,4
+ and edi,edx
+ xor edx,ebp
+ pxor xmm10,xmm6
+ add ebx,ecx
+ ror ecx,7
+ pxor xmm3,xmm8
+ xor edi,ebp
+ mov esi,ebx
+ add eax,DWORD[36+rsp]
+ pxor xmm10,xmm3
+ xor ecx,edx
+ rol ebx,5
+ movdqa XMMWORD[16+rsp],xmm12
+ add eax,edi
+ and esi,ecx
+ movdqa xmm13,xmm10
+ xor ecx,edx
+ add eax,ebx
+ ror ebx,7
+ movups xmm1,XMMWORD[16+r15]
+DB 102,15,56,220,208
+ movdqa xmm3,xmm10
+ xor esi,edx
+ pslldq xmm13,12
+ paddd xmm10,xmm10
+ mov edi,eax
+ add ebp,DWORD[40+rsp]
+ psrld xmm3,31
+ xor ebx,ecx
+ rol eax,5
+ add ebp,esi
+ movdqa xmm12,xmm13
+ and edi,ebx
+ xor ebx,ecx
+ psrld xmm13,30
+ add ebp,eax
+ ror eax,7
+ por xmm10,xmm3
+ xor edi,ecx
+ mov esi,ebp
+ add edx,DWORD[44+rsp]
+ pslld xmm12,2
+ pxor xmm10,xmm13
+ xor eax,ebx
+ movdqa xmm13,XMMWORD[16+r11]
+ rol ebp,5
+ add edx,edi
+ movups xmm0,XMMWORD[32+r15]
+DB 102,15,56,220,209
+ and esi,eax
+ pxor xmm10,xmm12
+ xor eax,ebx
+ add edx,ebp
+ ror ebp,7
+ pshufd xmm11,xmm7,238
+ xor esi,ebx
+ movdqa xmm12,xmm10
+ paddd xmm13,xmm10
+ mov edi,edx
+ add ecx,DWORD[48+rsp]
+ punpcklqdq xmm11,xmm8
+ xor ebp,eax
+ rol edx,5
+ add ecx,esi
+ psrldq xmm12,4
+ and edi,ebp
+ xor ebp,eax
+ pxor xmm11,xmm7
+ add ecx,edx
+ ror edx,7
+ pxor xmm12,xmm9
+ xor edi,eax
+ mov esi,ecx
+ add ebx,DWORD[52+rsp]
+ movups xmm1,XMMWORD[48+r15]
+DB 102,15,56,220,208
+ pxor xmm11,xmm12
+ xor edx,ebp
+ rol ecx,5
+ movdqa XMMWORD[32+rsp],xmm13
+ add ebx,edi
+ and esi,edx
+ movdqa xmm3,xmm11
+ xor edx,ebp
+ add ebx,ecx
+ ror ecx,7
+ movdqa xmm12,xmm11
+ xor esi,ebp
+ pslldq xmm3,12
+ paddd xmm11,xmm11
+ mov edi,ebx
+ add eax,DWORD[56+rsp]
+ psrld xmm12,31
+ xor ecx,edx
+ rol ebx,5
+ add eax,esi
+ movdqa xmm13,xmm3
+ and edi,ecx
+ xor ecx,edx
+ psrld xmm3,30
+ add eax,ebx
+ ror ebx,7
+ cmp r8d,11
+ jb NEAR $L$aesenclast1
+ movups xmm0,XMMWORD[64+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[80+r15]
+DB 102,15,56,220,208
+ je NEAR $L$aesenclast1
+ movups xmm0,XMMWORD[96+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[112+r15]
+DB 102,15,56,220,208
+$L$aesenclast1:
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[((16-112))+r15]
+ por xmm11,xmm12
+ xor edi,edx
+ mov esi,eax
+ add ebp,DWORD[60+rsp]
+ pslld xmm13,2
+ pxor xmm11,xmm3
+ xor ebx,ecx
+ movdqa xmm3,XMMWORD[16+r11]
+ rol eax,5
+ add ebp,edi
+ and esi,ebx
+ pxor xmm11,xmm13
+ pshufd xmm13,xmm10,238
+ xor ebx,ecx
+ add ebp,eax
+ ror eax,7
+ pxor xmm4,xmm8
+ xor esi,ecx
+ mov edi,ebp
+ add edx,DWORD[rsp]
+ punpcklqdq xmm13,xmm11
+ xor eax,ebx
+ rol ebp,5
+ pxor xmm4,xmm5
+ add edx,esi
+ movups xmm14,XMMWORD[16+r12]
+ xorps xmm14,xmm15
+ movups XMMWORD[r13*1+r12],xmm2
+ xorps xmm2,xmm14
+ movups xmm1,XMMWORD[((-80))+r15]
+DB 102,15,56,220,208
+ and edi,eax
+ movdqa xmm12,xmm3
+ xor eax,ebx
+ paddd xmm3,xmm11
+ add edx,ebp
+ pxor xmm4,xmm13
+ ror ebp,7
+ xor edi,ebx
+ mov esi,edx
+ add ecx,DWORD[4+rsp]
+ movdqa xmm13,xmm4
+ xor ebp,eax
+ rol edx,5
+ movdqa XMMWORD[48+rsp],xmm3
+ add ecx,edi
+ and esi,ebp
+ xor ebp,eax
+ pslld xmm4,2
+ add ecx,edx
+ ror edx,7
+ psrld xmm13,30
+ xor esi,eax
+ mov edi,ecx
+ add ebx,DWORD[8+rsp]
+ movups xmm0,XMMWORD[((-64))+r15]
+DB 102,15,56,220,209
+ por xmm4,xmm13
+ xor edx,ebp
+ rol ecx,5
+ pshufd xmm3,xmm11,238
+ add ebx,esi
+ and edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[12+rsp]
+ xor edi,ebp
+ mov esi,ebx
+ rol ebx,5
+ add eax,edi
+ xor esi,edx
+ ror ecx,7
+ add eax,ebx
+ pxor xmm5,xmm9
+ add ebp,DWORD[16+rsp]
+ movups xmm1,XMMWORD[((-48))+r15]
+DB 102,15,56,220,208
+ xor esi,ecx
+ punpcklqdq xmm3,xmm4
+ mov edi,eax
+ rol eax,5
+ pxor xmm5,xmm6
+ add ebp,esi
+ xor edi,ecx
+ movdqa xmm13,xmm12
+ ror ebx,7
+ paddd xmm12,xmm4
+ add ebp,eax
+ pxor xmm5,xmm3
+ add edx,DWORD[20+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ movdqa xmm3,xmm5
+ add edx,edi
+ xor esi,ebx
+ movdqa XMMWORD[rsp],xmm12
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[24+rsp]
+ pslld xmm5,2
+ xor esi,eax
+ mov edi,edx
+ psrld xmm3,30
+ rol edx,5
+ add ecx,esi
+ movups xmm0,XMMWORD[((-32))+r15]
+DB 102,15,56,220,209
+ xor edi,eax
+ ror ebp,7
+ por xmm5,xmm3
+ add ecx,edx
+ add ebx,DWORD[28+rsp]
+ pshufd xmm12,xmm4,238
+ xor edi,ebp
+ mov esi,ecx
+ rol ecx,5
+ add ebx,edi
+ xor esi,ebp
+ ror edx,7
+ add ebx,ecx
+ pxor xmm6,xmm10
+ add eax,DWORD[32+rsp]
+ xor esi,edx
+ punpcklqdq xmm12,xmm5
+ mov edi,ebx
+ rol ebx,5
+ pxor xmm6,xmm7
+ add eax,esi
+ xor edi,edx
+ movdqa xmm3,XMMWORD[32+r11]
+ ror ecx,7
+ paddd xmm13,xmm5
+ add eax,ebx
+ pxor xmm6,xmm12
+ add ebp,DWORD[36+rsp]
+ movups xmm1,XMMWORD[((-16))+r15]
+DB 102,15,56,220,208
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ movdqa xmm12,xmm6
+ add ebp,edi
+ xor esi,ecx
+ movdqa XMMWORD[16+rsp],xmm13
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[40+rsp]
+ pslld xmm6,2
+ xor esi,ebx
+ mov edi,ebp
+ psrld xmm12,30
+ rol ebp,5
+ add edx,esi
+ xor edi,ebx
+ ror eax,7
+ por xmm6,xmm12
+ add edx,ebp
+ add ecx,DWORD[44+rsp]
+ pshufd xmm13,xmm5,238
+ xor edi,eax
+ mov esi,edx
+ rol edx,5
+ add ecx,edi
+ movups xmm0,XMMWORD[r15]
+DB 102,15,56,220,209
+ xor esi,eax
+ ror ebp,7
+ add ecx,edx
+ pxor xmm7,xmm11
+ add ebx,DWORD[48+rsp]
+ xor esi,ebp
+ punpcklqdq xmm13,xmm6
+ mov edi,ecx
+ rol ecx,5
+ pxor xmm7,xmm8
+ add ebx,esi
+ xor edi,ebp
+ movdqa xmm12,xmm3
+ ror edx,7
+ paddd xmm3,xmm6
+ add ebx,ecx
+ pxor xmm7,xmm13
+ add eax,DWORD[52+rsp]
+ xor edi,edx
+ mov esi,ebx
+ rol ebx,5
+ movdqa xmm13,xmm7
+ add eax,edi
+ xor esi,edx
+ movdqa XMMWORD[32+rsp],xmm3
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[56+rsp]
+ movups xmm1,XMMWORD[16+r15]
+DB 102,15,56,220,208
+ pslld xmm7,2
+ xor esi,ecx
+ mov edi,eax
+ psrld xmm13,30
+ rol eax,5
+ add ebp,esi
+ xor edi,ecx
+ ror ebx,7
+ por xmm7,xmm13
+ add ebp,eax
+ add edx,DWORD[60+rsp]
+ pshufd xmm3,xmm6,238
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ add edx,edi
+ xor esi,ebx
+ ror eax,7
+ add edx,ebp
+ pxor xmm8,xmm4
+ add ecx,DWORD[rsp]
+ xor esi,eax
+ punpcklqdq xmm3,xmm7
+ mov edi,edx
+ rol edx,5
+ pxor xmm8,xmm9
+ add ecx,esi
+ movups xmm0,XMMWORD[32+r15]
+DB 102,15,56,220,209
+ xor edi,eax
+ movdqa xmm13,xmm12
+ ror ebp,7
+ paddd xmm12,xmm7
+ add ecx,edx
+ pxor xmm8,xmm3
+ add ebx,DWORD[4+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ rol ecx,5
+ movdqa xmm3,xmm8
+ add ebx,edi
+ xor esi,ebp
+ movdqa XMMWORD[48+rsp],xmm12
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[8+rsp]
+ pslld xmm8,2
+ xor esi,edx
+ mov edi,ebx
+ psrld xmm3,30
+ rol ebx,5
+ add eax,esi
+ xor edi,edx
+ ror ecx,7
+ por xmm8,xmm3
+ add eax,ebx
+ add ebp,DWORD[12+rsp]
+ movups xmm1,XMMWORD[48+r15]
+DB 102,15,56,220,208
+ pshufd xmm12,xmm7,238
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ add ebp,edi
+ xor esi,ecx
+ ror ebx,7
+ add ebp,eax
+ pxor xmm9,xmm5
+ add edx,DWORD[16+rsp]
+ xor esi,ebx
+ punpcklqdq xmm12,xmm8
+ mov edi,ebp
+ rol ebp,5
+ pxor xmm9,xmm10
+ add edx,esi
+ xor edi,ebx
+ movdqa xmm3,xmm13
+ ror eax,7
+ paddd xmm13,xmm8
+ add edx,ebp
+ pxor xmm9,xmm12
+ add ecx,DWORD[20+rsp]
+ xor edi,eax
+ mov esi,edx
+ rol edx,5
+ movdqa xmm12,xmm9
+ add ecx,edi
+ cmp r8d,11
+ jb NEAR $L$aesenclast2
+ movups xmm0,XMMWORD[64+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[80+r15]
+DB 102,15,56,220,208
+ je NEAR $L$aesenclast2
+ movups xmm0,XMMWORD[96+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[112+r15]
+DB 102,15,56,220,208
+$L$aesenclast2:
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[((16-112))+r15]
+ xor esi,eax
+ movdqa XMMWORD[rsp],xmm13
+ ror ebp,7
+ add ecx,edx
+ add ebx,DWORD[24+rsp]
+ pslld xmm9,2
+ xor esi,ebp
+ mov edi,ecx
+ psrld xmm12,30
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ por xmm9,xmm12
+ add ebx,ecx
+ add eax,DWORD[28+rsp]
+ pshufd xmm13,xmm8,238
+ ror ecx,7
+ mov esi,ebx
+ xor edi,edx
+ rol ebx,5
+ add eax,edi
+ xor esi,ecx
+ xor ecx,edx
+ add eax,ebx
+ pxor xmm10,xmm6
+ add ebp,DWORD[32+rsp]
+ movups xmm14,XMMWORD[32+r12]
+ xorps xmm14,xmm15
+ movups XMMWORD[16+r12*1+r13],xmm2
+ xorps xmm2,xmm14
+ movups xmm1,XMMWORD[((-80))+r15]
+DB 102,15,56,220,208
+ and esi,ecx
+ xor ecx,edx
+ ror ebx,7
+ punpcklqdq xmm13,xmm9
+ mov edi,eax
+ xor esi,ecx
+ pxor xmm10,xmm11
+ rol eax,5
+ add ebp,esi
+ movdqa xmm12,xmm3
+ xor edi,ebx
+ paddd xmm3,xmm9
+ xor ebx,ecx
+ pxor xmm10,xmm13
+ add ebp,eax
+ add edx,DWORD[36+rsp]
+ and edi,ebx
+ xor ebx,ecx
+ ror eax,7
+ movdqa xmm13,xmm10
+ mov esi,ebp
+ xor edi,ebx
+ movdqa XMMWORD[16+rsp],xmm3
+ rol ebp,5
+ add edx,edi
+ movups xmm0,XMMWORD[((-64))+r15]
+DB 102,15,56,220,209
+ xor esi,eax
+ pslld xmm10,2
+ xor eax,ebx
+ add edx,ebp
+ psrld xmm13,30
+ add ecx,DWORD[40+rsp]
+ and esi,eax
+ xor eax,ebx
+ por xmm10,xmm13
+ ror ebp,7
+ mov edi,edx
+ xor esi,eax
+ rol edx,5
+ pshufd xmm3,xmm9,238
+ add ecx,esi
+ xor edi,ebp
+ xor ebp,eax
+ add ecx,edx
+ add ebx,DWORD[44+rsp]
+ and edi,ebp
+ xor ebp,eax
+ ror edx,7
+ movups xmm1,XMMWORD[((-48))+r15]
+DB 102,15,56,220,208
+ mov esi,ecx
+ xor edi,ebp
+ rol ecx,5
+ add ebx,edi
+ xor esi,edx
+ xor edx,ebp
+ add ebx,ecx
+ pxor xmm11,xmm7
+ add eax,DWORD[48+rsp]
+ and esi,edx
+ xor edx,ebp
+ ror ecx,7
+ punpcklqdq xmm3,xmm10
+ mov edi,ebx
+ xor esi,edx
+ pxor xmm11,xmm4
+ rol ebx,5
+ add eax,esi
+ movdqa xmm13,XMMWORD[48+r11]
+ xor edi,ecx
+ paddd xmm12,xmm10
+ xor ecx,edx
+ pxor xmm11,xmm3
+ add eax,ebx
+ add ebp,DWORD[52+rsp]
+ movups xmm0,XMMWORD[((-32))+r15]
+DB 102,15,56,220,209
+ and edi,ecx
+ xor ecx,edx
+ ror ebx,7
+ movdqa xmm3,xmm11
+ mov esi,eax
+ xor edi,ecx
+ movdqa XMMWORD[32+rsp],xmm12
+ rol eax,5
+ add ebp,edi
+ xor esi,ebx
+ pslld xmm11,2
+ xor ebx,ecx
+ add ebp,eax
+ psrld xmm3,30
+ add edx,DWORD[56+rsp]
+ and esi,ebx
+ xor ebx,ecx
+ por xmm11,xmm3
+ ror eax,7
+ mov edi,ebp
+ xor esi,ebx
+ rol ebp,5
+ pshufd xmm12,xmm10,238
+ add edx,esi
+ movups xmm1,XMMWORD[((-16))+r15]
+DB 102,15,56,220,208
+ xor edi,eax
+ xor eax,ebx
+ add edx,ebp
+ add ecx,DWORD[60+rsp]
+ and edi,eax
+ xor eax,ebx
+ ror ebp,7
+ mov esi,edx
+ xor edi,eax
+ rol edx,5
+ add ecx,edi
+ xor esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ pxor xmm4,xmm8
+ add ebx,DWORD[rsp]
+ and esi,ebp
+ xor ebp,eax
+ ror edx,7
+ movups xmm0,XMMWORD[r15]
+DB 102,15,56,220,209
+ punpcklqdq xmm12,xmm11
+ mov edi,ecx
+ xor esi,ebp
+ pxor xmm4,xmm5
+ rol ecx,5
+ add ebx,esi
+ movdqa xmm3,xmm13
+ xor edi,edx
+ paddd xmm13,xmm11
+ xor edx,ebp
+ pxor xmm4,xmm12
+ add ebx,ecx
+ add eax,DWORD[4+rsp]
+ and edi,edx
+ xor edx,ebp
+ ror ecx,7
+ movdqa xmm12,xmm4
+ mov esi,ebx
+ xor edi,edx
+ movdqa XMMWORD[48+rsp],xmm13
+ rol ebx,5
+ add eax,edi
+ xor esi,ecx
+ pslld xmm4,2
+ xor ecx,edx
+ add eax,ebx
+ psrld xmm12,30
+ add ebp,DWORD[8+rsp]
+ movups xmm1,XMMWORD[16+r15]
+DB 102,15,56,220,208
+ and esi,ecx
+ xor ecx,edx
+ por xmm4,xmm12
+ ror ebx,7
+ mov edi,eax
+ xor esi,ecx
+ rol eax,5
+ pshufd xmm13,xmm11,238
+ add ebp,esi
+ xor edi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ add edx,DWORD[12+rsp]
+ and edi,ebx
+ xor ebx,ecx
+ ror eax,7
+ mov esi,ebp
+ xor edi,ebx
+ rol ebp,5
+ add edx,edi
+ movups xmm0,XMMWORD[32+r15]
+DB 102,15,56,220,209
+ xor esi,eax
+ xor eax,ebx
+ add edx,ebp
+ pxor xmm5,xmm9
+ add ecx,DWORD[16+rsp]
+ and esi,eax
+ xor eax,ebx
+ ror ebp,7
+ punpcklqdq xmm13,xmm4
+ mov edi,edx
+ xor esi,eax
+ pxor xmm5,xmm6
+ rol edx,5
+ add ecx,esi
+ movdqa xmm12,xmm3
+ xor edi,ebp
+ paddd xmm3,xmm4
+ xor ebp,eax
+ pxor xmm5,xmm13
+ add ecx,edx
+ add ebx,DWORD[20+rsp]
+ and edi,ebp
+ xor ebp,eax
+ ror edx,7
+ movups xmm1,XMMWORD[48+r15]
+DB 102,15,56,220,208
+ movdqa xmm13,xmm5
+ mov esi,ecx
+ xor edi,ebp
+ movdqa XMMWORD[rsp],xmm3
+ rol ecx,5
+ add ebx,edi
+ xor esi,edx
+ pslld xmm5,2
+ xor edx,ebp
+ add ebx,ecx
+ psrld xmm13,30
+ add eax,DWORD[24+rsp]
+ and esi,edx
+ xor edx,ebp
+ por xmm5,xmm13
+ ror ecx,7
+ mov edi,ebx
+ xor esi,edx
+ rol ebx,5
+ pshufd xmm3,xmm4,238
+ add eax,esi
+ xor edi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add ebp,DWORD[28+rsp]
+ cmp r8d,11
+ jb NEAR $L$aesenclast3
+ movups xmm0,XMMWORD[64+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[80+r15]
+DB 102,15,56,220,208
+ je NEAR $L$aesenclast3
+ movups xmm0,XMMWORD[96+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[112+r15]
+DB 102,15,56,220,208
+$L$aesenclast3:
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[((16-112))+r15]
+ and edi,ecx
+ xor ecx,edx
+ ror ebx,7
+ mov esi,eax
+ xor edi,ecx
+ rol eax,5
+ add ebp,edi
+ xor esi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ pxor xmm6,xmm10
+ add edx,DWORD[32+rsp]
+ and esi,ebx
+ xor ebx,ecx
+ ror eax,7
+ punpcklqdq xmm3,xmm5
+ mov edi,ebp
+ xor esi,ebx
+ pxor xmm6,xmm7
+ rol ebp,5
+ add edx,esi
+ movups xmm14,XMMWORD[48+r12]
+ xorps xmm14,xmm15
+ movups XMMWORD[32+r12*1+r13],xmm2
+ xorps xmm2,xmm14
+ movups xmm1,XMMWORD[((-80))+r15]
+DB 102,15,56,220,208
+ movdqa xmm13,xmm12
+ xor edi,eax
+ paddd xmm12,xmm5
+ xor eax,ebx
+ pxor xmm6,xmm3
+ add edx,ebp
+ add ecx,DWORD[36+rsp]
+ and edi,eax
+ xor eax,ebx
+ ror ebp,7
+ movdqa xmm3,xmm6
+ mov esi,edx
+ xor edi,eax
+ movdqa XMMWORD[16+rsp],xmm12
+ rol edx,5
+ add ecx,edi
+ xor esi,ebp
+ pslld xmm6,2
+ xor ebp,eax
+ add ecx,edx
+ psrld xmm3,30
+ add ebx,DWORD[40+rsp]
+ and esi,ebp
+ xor ebp,eax
+ por xmm6,xmm3
+ ror edx,7
+ movups xmm0,XMMWORD[((-64))+r15]
+DB 102,15,56,220,209
+ mov edi,ecx
+ xor esi,ebp
+ rol ecx,5
+ pshufd xmm12,xmm5,238
+ add ebx,esi
+ xor edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[44+rsp]
+ and edi,edx
+ xor edx,ebp
+ ror ecx,7
+ mov esi,ebx
+ xor edi,edx
+ rol ebx,5
+ add eax,edi
+ xor esi,edx
+ add eax,ebx
+ pxor xmm7,xmm11
+ add ebp,DWORD[48+rsp]
+ movups xmm1,XMMWORD[((-48))+r15]
+DB 102,15,56,220,208
+ xor esi,ecx
+ punpcklqdq xmm12,xmm6
+ mov edi,eax
+ rol eax,5
+ pxor xmm7,xmm8
+ add ebp,esi
+ xor edi,ecx
+ movdqa xmm3,xmm13
+ ror ebx,7
+ paddd xmm13,xmm6
+ add ebp,eax
+ pxor xmm7,xmm12
+ add edx,DWORD[52+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ movdqa xmm12,xmm7
+ add edx,edi
+ xor esi,ebx
+ movdqa XMMWORD[32+rsp],xmm13
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[56+rsp]
+ pslld xmm7,2
+ xor esi,eax
+ mov edi,edx
+ psrld xmm12,30
+ rol edx,5
+ add ecx,esi
+ movups xmm0,XMMWORD[((-32))+r15]
+DB 102,15,56,220,209
+ xor edi,eax
+ ror ebp,7
+ por xmm7,xmm12
+ add ecx,edx
+ add ebx,DWORD[60+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ rol ecx,5
+ add ebx,edi
+ xor esi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[rsp]
+ xor esi,edx
+ mov edi,ebx
+ rol ebx,5
+ paddd xmm3,xmm7
+ add eax,esi
+ xor edi,edx
+ movdqa XMMWORD[48+rsp],xmm3
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[4+rsp]
+ movups xmm1,XMMWORD[((-16))+r15]
+DB 102,15,56,220,208
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ add ebp,edi
+ xor esi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[8+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ rol ebp,5
+ add edx,esi
+ xor edi,ebx
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[12+rsp]
+ xor edi,eax
+ mov esi,edx
+ rol edx,5
+ add ecx,edi
+ movups xmm0,XMMWORD[r15]
+DB 102,15,56,220,209
+ xor esi,eax
+ ror ebp,7
+ add ecx,edx
+ cmp r10,r14
+ je NEAR $L$done_ssse3
+ movdqa xmm3,XMMWORD[64+r11]
+ movdqa xmm13,XMMWORD[r11]
+ movdqu xmm4,XMMWORD[r10]
+ movdqu xmm5,XMMWORD[16+r10]
+ movdqu xmm6,XMMWORD[32+r10]
+ movdqu xmm7,XMMWORD[48+r10]
+DB 102,15,56,0,227
+ add r10,64
+ add ebx,DWORD[16+rsp]
+ xor esi,ebp
+ mov edi,ecx
+DB 102,15,56,0,235
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ paddd xmm4,xmm13
+ add ebx,ecx
+ add eax,DWORD[20+rsp]
+ xor edi,edx
+ mov esi,ebx
+ movdqa XMMWORD[rsp],xmm4
+ rol ebx,5
+ add eax,edi
+ xor esi,edx
+ ror ecx,7
+ psubd xmm4,xmm13
+ add eax,ebx
+ add ebp,DWORD[24+rsp]
+ movups xmm1,XMMWORD[16+r15]
+DB 102,15,56,220,208
+ xor esi,ecx
+ mov edi,eax
+ rol eax,5
+ add ebp,esi
+ xor edi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[28+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ add edx,edi
+ xor esi,ebx
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[32+rsp]
+ xor esi,eax
+ mov edi,edx
+DB 102,15,56,0,243
+ rol edx,5
+ add ecx,esi
+ movups xmm0,XMMWORD[32+r15]
+DB 102,15,56,220,209
+ xor edi,eax
+ ror ebp,7
+ paddd xmm5,xmm13
+ add ecx,edx
+ add ebx,DWORD[36+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ movdqa XMMWORD[16+rsp],xmm5
+ rol ecx,5
+ add ebx,edi
+ xor esi,ebp
+ ror edx,7
+ psubd xmm5,xmm13
+ add ebx,ecx
+ add eax,DWORD[40+rsp]
+ xor esi,edx
+ mov edi,ebx
+ rol ebx,5
+ add eax,esi
+ xor edi,edx
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[44+rsp]
+ movups xmm1,XMMWORD[48+r15]
+DB 102,15,56,220,208
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ add ebp,edi
+ xor esi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[48+rsp]
+ xor esi,ebx
+ mov edi,ebp
+DB 102,15,56,0,251
+ rol ebp,5
+ add edx,esi
+ xor edi,ebx
+ ror eax,7
+ paddd xmm6,xmm13
+ add edx,ebp
+ add ecx,DWORD[52+rsp]
+ xor edi,eax
+ mov esi,edx
+ movdqa XMMWORD[32+rsp],xmm6
+ rol edx,5
+ add ecx,edi
+ cmp r8d,11
+ jb NEAR $L$aesenclast4
+ movups xmm0,XMMWORD[64+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[80+r15]
+DB 102,15,56,220,208
+ je NEAR $L$aesenclast4
+ movups xmm0,XMMWORD[96+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[112+r15]
+DB 102,15,56,220,208
+$L$aesenclast4:
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[((16-112))+r15]
+ xor esi,eax
+ ror ebp,7
+ psubd xmm6,xmm13
+ add ecx,edx
+ add ebx,DWORD[56+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[60+rsp]
+ xor edi,edx
+ mov esi,ebx
+ rol ebx,5
+ add eax,edi
+ ror ecx,7
+ add eax,ebx
+ movups XMMWORD[48+r12*1+r13],xmm2
+ lea r12,[64+r12]
+
+ add eax,DWORD[r9]
+ add esi,DWORD[4+r9]
+ add ecx,DWORD[8+r9]
+ add edx,DWORD[12+r9]
+ mov DWORD[r9],eax
+ add ebp,DWORD[16+r9]
+ mov DWORD[4+r9],esi
+ mov ebx,esi
+ mov DWORD[8+r9],ecx
+ mov edi,ecx
+ mov DWORD[12+r9],edx
+ xor edi,edx
+ mov DWORD[16+r9],ebp
+ and esi,edi
+ jmp NEAR $L$oop_ssse3
+
+$L$done_ssse3:
+ add ebx,DWORD[16+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[20+rsp]
+ xor edi,edx
+ mov esi,ebx
+ rol ebx,5
+ add eax,edi
+ xor esi,edx
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[24+rsp]
+ movups xmm1,XMMWORD[16+r15]
+DB 102,15,56,220,208
+ xor esi,ecx
+ mov edi,eax
+ rol eax,5
+ add ebp,esi
+ xor edi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[28+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ add edx,edi
+ xor esi,ebx
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[32+rsp]
+ xor esi,eax
+ mov edi,edx
+ rol edx,5
+ add ecx,esi
+ movups xmm0,XMMWORD[32+r15]
+DB 102,15,56,220,209
+ xor edi,eax
+ ror ebp,7
+ add ecx,edx
+ add ebx,DWORD[36+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ rol ecx,5
+ add ebx,edi
+ xor esi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[40+rsp]
+ xor esi,edx
+ mov edi,ebx
+ rol ebx,5
+ add eax,esi
+ xor edi,edx
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[44+rsp]
+ movups xmm1,XMMWORD[48+r15]
+DB 102,15,56,220,208
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ add ebp,edi
+ xor esi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[48+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ rol ebp,5
+ add edx,esi
+ xor edi,ebx
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[52+rsp]
+ xor edi,eax
+ mov esi,edx
+ rol edx,5
+ add ecx,edi
+ cmp r8d,11
+ jb NEAR $L$aesenclast5
+ movups xmm0,XMMWORD[64+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[80+r15]
+DB 102,15,56,220,208
+ je NEAR $L$aesenclast5
+ movups xmm0,XMMWORD[96+r15]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[112+r15]
+DB 102,15,56,220,208
+$L$aesenclast5:
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[((16-112))+r15]
+ xor esi,eax
+ ror ebp,7
+ add ecx,edx
+ add ebx,DWORD[56+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[60+rsp]
+ xor edi,edx
+ mov esi,ebx
+ rol ebx,5
+ add eax,edi
+ ror ecx,7
+ add eax,ebx
+ movups XMMWORD[48+r12*1+r13],xmm2
+ mov r8,QWORD[88+rsp]
+
+ add eax,DWORD[r9]
+ add esi,DWORD[4+r9]
+ add ecx,DWORD[8+r9]
+ mov DWORD[r9],eax
+ add edx,DWORD[12+r9]
+ mov DWORD[4+r9],esi
+ add ebp,DWORD[16+r9]
+ mov DWORD[8+r9],ecx
+ mov DWORD[12+r9],edx
+ mov DWORD[16+r9],ebp
+ movups XMMWORD[r8],xmm2
+ movaps xmm6,XMMWORD[((96+0))+rsp]
+ movaps xmm7,XMMWORD[((96+16))+rsp]
+ movaps xmm8,XMMWORD[((96+32))+rsp]
+ movaps xmm9,XMMWORD[((96+48))+rsp]
+ movaps xmm10,XMMWORD[((96+64))+rsp]
+ movaps xmm11,XMMWORD[((96+80))+rsp]
+ movaps xmm12,XMMWORD[((96+96))+rsp]
+ movaps xmm13,XMMWORD[((96+112))+rsp]
+ movaps xmm14,XMMWORD[((96+128))+rsp]
+ movaps xmm15,XMMWORD[((96+144))+rsp]
+ lea rsi,[264+rsp]
+
+ mov r15,QWORD[rsi]
+
+ mov r14,QWORD[8+rsi]
+
+ mov r13,QWORD[16+rsi]
+
+ mov r12,QWORD[24+rsi]
+
+ mov rbp,QWORD[32+rsi]
+
+ mov rbx,QWORD[40+rsi]
+
+ lea rsp,[48+rsi]
+
+$L$epilogue_ssse3:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_cbc_sha1_enc_ssse3:
+
+ALIGN 32
+aesni_cbc_sha1_enc_avx:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_cbc_sha1_enc_avx:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov r10,QWORD[56+rsp]
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-264))+rsp]
+
+
+
+ movaps XMMWORD[(96+0)+rsp],xmm6
+ movaps XMMWORD[(96+16)+rsp],xmm7
+ movaps XMMWORD[(96+32)+rsp],xmm8
+ movaps XMMWORD[(96+48)+rsp],xmm9
+ movaps XMMWORD[(96+64)+rsp],xmm10
+ movaps XMMWORD[(96+80)+rsp],xmm11
+ movaps XMMWORD[(96+96)+rsp],xmm12
+ movaps XMMWORD[(96+112)+rsp],xmm13
+ movaps XMMWORD[(96+128)+rsp],xmm14
+ movaps XMMWORD[(96+144)+rsp],xmm15
+$L$prologue_avx:
+ vzeroall
+ mov r12,rdi
+ mov r13,rsi
+ mov r14,rdx
+ lea r15,[112+rcx]
+ vmovdqu xmm12,XMMWORD[r8]
+ mov QWORD[88+rsp],r8
+ shl r14,6
+ sub r13,r12
+ mov r8d,DWORD[((240-112))+r15]
+ add r14,r10
+
+ lea r11,[K_XX_XX]
+ mov eax,DWORD[r9]
+ mov ebx,DWORD[4+r9]
+ mov ecx,DWORD[8+r9]
+ mov edx,DWORD[12+r9]
+ mov esi,ebx
+ mov ebp,DWORD[16+r9]
+ mov edi,ecx
+ xor edi,edx
+ and esi,edi
+
+ vmovdqa xmm6,XMMWORD[64+r11]
+ vmovdqa xmm10,XMMWORD[r11]
+ vmovdqu xmm0,XMMWORD[r10]
+ vmovdqu xmm1,XMMWORD[16+r10]
+ vmovdqu xmm2,XMMWORD[32+r10]
+ vmovdqu xmm3,XMMWORD[48+r10]
+ vpshufb xmm0,xmm0,xmm6
+ add r10,64
+ vpshufb xmm1,xmm1,xmm6
+ vpshufb xmm2,xmm2,xmm6
+ vpshufb xmm3,xmm3,xmm6
+ vpaddd xmm4,xmm0,xmm10
+ vpaddd xmm5,xmm1,xmm10
+ vpaddd xmm6,xmm2,xmm10
+ vmovdqa XMMWORD[rsp],xmm4
+ vmovdqa XMMWORD[16+rsp],xmm5
+ vmovdqa XMMWORD[32+rsp],xmm6
+ vmovups xmm15,XMMWORD[((-112))+r15]
+ vmovups xmm14,XMMWORD[((16-112))+r15]
+ jmp NEAR $L$oop_avx
+ALIGN 32
+$L$oop_avx:
+ shrd ebx,ebx,2
+ vmovdqu xmm13,XMMWORD[r12]
+ vpxor xmm13,xmm13,xmm15
+ vpxor xmm12,xmm12,xmm13
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-80))+r15]
+ xor esi,edx
+ vpalignr xmm4,xmm1,xmm0,8
+ mov edi,eax
+ add ebp,DWORD[rsp]
+ vpaddd xmm9,xmm10,xmm3
+ xor ebx,ecx
+ shld eax,eax,5
+ vpsrldq xmm8,xmm3,4
+ add ebp,esi
+ and edi,ebx
+ vpxor xmm4,xmm4,xmm0
+ xor ebx,ecx
+ add ebp,eax
+ vpxor xmm8,xmm8,xmm2
+ shrd eax,eax,7
+ xor edi,ecx
+ mov esi,ebp
+ add edx,DWORD[4+rsp]
+ vpxor xmm4,xmm4,xmm8
+ xor eax,ebx
+ shld ebp,ebp,5
+ vmovdqa XMMWORD[48+rsp],xmm9
+ add edx,edi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[((-64))+r15]
+ and esi,eax
+ vpsrld xmm8,xmm4,31
+ xor eax,ebx
+ add edx,ebp
+ shrd ebp,ebp,7
+ xor esi,ebx
+ vpslldq xmm9,xmm4,12
+ vpaddd xmm4,xmm4,xmm4
+ mov edi,edx
+ add ecx,DWORD[8+rsp]
+ xor ebp,eax
+ shld edx,edx,5
+ vpor xmm4,xmm4,xmm8
+ vpsrld xmm8,xmm9,30
+ add ecx,esi
+ and edi,ebp
+ xor ebp,eax
+ add ecx,edx
+ vpslld xmm9,xmm9,2
+ vpxor xmm4,xmm4,xmm8
+ shrd edx,edx,7
+ xor edi,eax
+ mov esi,ecx
+ add ebx,DWORD[12+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-48))+r15]
+ vpxor xmm4,xmm4,xmm9
+ xor edx,ebp
+ shld ecx,ecx,5
+ add ebx,edi
+ and esi,edx
+ xor edx,ebp
+ add ebx,ecx
+ shrd ecx,ecx,7
+ xor esi,ebp
+ vpalignr xmm5,xmm2,xmm1,8
+ mov edi,ebx
+ add eax,DWORD[16+rsp]
+ vpaddd xmm9,xmm10,xmm4
+ xor ecx,edx
+ shld ebx,ebx,5
+ vpsrldq xmm8,xmm4,4
+ add eax,esi
+ and edi,ecx
+ vpxor xmm5,xmm5,xmm1
+ xor ecx,edx
+ add eax,ebx
+ vpxor xmm8,xmm8,xmm3
+ shrd ebx,ebx,7
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[((-32))+r15]
+ xor edi,edx
+ mov esi,eax
+ add ebp,DWORD[20+rsp]
+ vpxor xmm5,xmm5,xmm8
+ xor ebx,ecx
+ shld eax,eax,5
+ vmovdqa XMMWORD[rsp],xmm9
+ add ebp,edi
+ and esi,ebx
+ vpsrld xmm8,xmm5,31
+ xor ebx,ecx
+ add ebp,eax
+ shrd eax,eax,7
+ xor esi,ecx
+ vpslldq xmm9,xmm5,12
+ vpaddd xmm5,xmm5,xmm5
+ mov edi,ebp
+ add edx,DWORD[24+rsp]
+ xor eax,ebx
+ shld ebp,ebp,5
+ vpor xmm5,xmm5,xmm8
+ vpsrld xmm8,xmm9,30
+ add edx,esi
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-16))+r15]
+ and edi,eax
+ xor eax,ebx
+ add edx,ebp
+ vpslld xmm9,xmm9,2
+ vpxor xmm5,xmm5,xmm8
+ shrd ebp,ebp,7
+ xor edi,ebx
+ mov esi,edx
+ add ecx,DWORD[28+rsp]
+ vpxor xmm5,xmm5,xmm9
+ xor ebp,eax
+ shld edx,edx,5
+ vmovdqa xmm10,XMMWORD[16+r11]
+ add ecx,edi
+ and esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ shrd edx,edx,7
+ xor esi,eax
+ vpalignr xmm6,xmm3,xmm2,8
+ mov edi,ecx
+ add ebx,DWORD[32+rsp]
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[r15]
+ vpaddd xmm9,xmm10,xmm5
+ xor edx,ebp
+ shld ecx,ecx,5
+ vpsrldq xmm8,xmm5,4
+ add ebx,esi
+ and edi,edx
+ vpxor xmm6,xmm6,xmm2
+ xor edx,ebp
+ add ebx,ecx
+ vpxor xmm8,xmm8,xmm4
+ shrd ecx,ecx,7
+ xor edi,ebp
+ mov esi,ebx
+ add eax,DWORD[36+rsp]
+ vpxor xmm6,xmm6,xmm8
+ xor ecx,edx
+ shld ebx,ebx,5
+ vmovdqa XMMWORD[16+rsp],xmm9
+ add eax,edi
+ and esi,ecx
+ vpsrld xmm8,xmm6,31
+ xor ecx,edx
+ add eax,ebx
+ shrd ebx,ebx,7
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[16+r15]
+ xor esi,edx
+ vpslldq xmm9,xmm6,12
+ vpaddd xmm6,xmm6,xmm6
+ mov edi,eax
+ add ebp,DWORD[40+rsp]
+ xor ebx,ecx
+ shld eax,eax,5
+ vpor xmm6,xmm6,xmm8
+ vpsrld xmm8,xmm9,30
+ add ebp,esi
+ and edi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ vpslld xmm9,xmm9,2
+ vpxor xmm6,xmm6,xmm8
+ shrd eax,eax,7
+ xor edi,ecx
+ mov esi,ebp
+ add edx,DWORD[44+rsp]
+ vpxor xmm6,xmm6,xmm9
+ xor eax,ebx
+ shld ebp,ebp,5
+ add edx,edi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[32+r15]
+ and esi,eax
+ xor eax,ebx
+ add edx,ebp
+ shrd ebp,ebp,7
+ xor esi,ebx
+ vpalignr xmm7,xmm4,xmm3,8
+ mov edi,edx
+ add ecx,DWORD[48+rsp]
+ vpaddd xmm9,xmm10,xmm6
+ xor ebp,eax
+ shld edx,edx,5
+ vpsrldq xmm8,xmm6,4
+ add ecx,esi
+ and edi,ebp
+ vpxor xmm7,xmm7,xmm3
+ xor ebp,eax
+ add ecx,edx
+ vpxor xmm8,xmm8,xmm5
+ shrd edx,edx,7
+ xor edi,eax
+ mov esi,ecx
+ add ebx,DWORD[52+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[48+r15]
+ vpxor xmm7,xmm7,xmm8
+ xor edx,ebp
+ shld ecx,ecx,5
+ vmovdqa XMMWORD[32+rsp],xmm9
+ add ebx,edi
+ and esi,edx
+ vpsrld xmm8,xmm7,31
+ xor edx,ebp
+ add ebx,ecx
+ shrd ecx,ecx,7
+ xor esi,ebp
+ vpslldq xmm9,xmm7,12
+ vpaddd xmm7,xmm7,xmm7
+ mov edi,ebx
+ add eax,DWORD[56+rsp]
+ xor ecx,edx
+ shld ebx,ebx,5
+ vpor xmm7,xmm7,xmm8
+ vpsrld xmm8,xmm9,30
+ add eax,esi
+ and edi,ecx
+ xor ecx,edx
+ add eax,ebx
+ vpslld xmm9,xmm9,2
+ vpxor xmm7,xmm7,xmm8
+ shrd ebx,ebx,7
+ cmp r8d,11
+ jb NEAR $L$vaesenclast6
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[64+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[80+r15]
+ je NEAR $L$vaesenclast6
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[96+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast6:
+ vaesenclast xmm12,xmm12,xmm15
+ vmovups xmm15,XMMWORD[((-112))+r15]
+ vmovups xmm14,XMMWORD[((16-112))+r15]
+ xor edi,edx
+ mov esi,eax
+ add ebp,DWORD[60+rsp]
+ vpxor xmm7,xmm7,xmm9
+ xor ebx,ecx
+ shld eax,eax,5
+ add ebp,edi
+ and esi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ vpalignr xmm8,xmm7,xmm6,8
+ vpxor xmm0,xmm0,xmm4
+ shrd eax,eax,7
+ xor esi,ecx
+ mov edi,ebp
+ add edx,DWORD[rsp]
+ vpxor xmm0,xmm0,xmm1
+ xor eax,ebx
+ shld ebp,ebp,5
+ vpaddd xmm9,xmm10,xmm7
+ add edx,esi
+ vmovdqu xmm13,XMMWORD[16+r12]
+ vpxor xmm13,xmm13,xmm15
+ vmovups XMMWORD[r13*1+r12],xmm12
+ vpxor xmm12,xmm12,xmm13
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-80))+r15]
+ and edi,eax
+ vpxor xmm0,xmm0,xmm8
+ xor eax,ebx
+ add edx,ebp
+ shrd ebp,ebp,7
+ xor edi,ebx
+ vpsrld xmm8,xmm0,30
+ vmovdqa XMMWORD[48+rsp],xmm9
+ mov esi,edx
+ add ecx,DWORD[4+rsp]
+ xor ebp,eax
+ shld edx,edx,5
+ vpslld xmm0,xmm0,2
+ add ecx,edi
+ and esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ shrd edx,edx,7
+ xor esi,eax
+ mov edi,ecx
+ add ebx,DWORD[8+rsp]
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[((-64))+r15]
+ vpor xmm0,xmm0,xmm8
+ xor edx,ebp
+ shld ecx,ecx,5
+ add ebx,esi
+ and edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[12+rsp]
+ xor edi,ebp
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpalignr xmm8,xmm0,xmm7,8
+ vpxor xmm1,xmm1,xmm5
+ add ebp,DWORD[16+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-48))+r15]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ vpxor xmm1,xmm1,xmm2
+ add ebp,esi
+ xor edi,ecx
+ vpaddd xmm9,xmm10,xmm0
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpxor xmm1,xmm1,xmm8
+ add edx,DWORD[20+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ vpsrld xmm8,xmm1,30
+ vmovdqa XMMWORD[rsp],xmm9
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vpslld xmm1,xmm1,2
+ add ecx,DWORD[24+rsp]
+ xor esi,eax
+ mov edi,edx
+ shld edx,edx,5
+ add ecx,esi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[((-32))+r15]
+ xor edi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpor xmm1,xmm1,xmm8
+ add ebx,DWORD[28+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ vpalignr xmm8,xmm1,xmm0,8
+ vpxor xmm2,xmm2,xmm6
+ add eax,DWORD[32+rsp]
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ vpxor xmm2,xmm2,xmm3
+ add eax,esi
+ xor edi,edx
+ vpaddd xmm9,xmm10,xmm1
+ vmovdqa xmm10,XMMWORD[32+r11]
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpxor xmm2,xmm2,xmm8
+ add ebp,DWORD[36+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-16))+r15]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ vpsrld xmm8,xmm2,30
+ vmovdqa XMMWORD[16+rsp],xmm9
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpslld xmm2,xmm2,2
+ add edx,DWORD[40+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ shld ebp,ebp,5
+ add edx,esi
+ xor edi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vpor xmm2,xmm2,xmm8
+ add ecx,DWORD[44+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,edi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[r15]
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpalignr xmm8,xmm2,xmm1,8
+ vpxor xmm3,xmm3,xmm7
+ add ebx,DWORD[48+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ vpxor xmm3,xmm3,xmm4
+ add ebx,esi
+ xor edi,ebp
+ vpaddd xmm9,xmm10,xmm2
+ shrd edx,edx,7
+ add ebx,ecx
+ vpxor xmm3,xmm3,xmm8
+ add eax,DWORD[52+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ vpsrld xmm8,xmm3,30
+ vmovdqa XMMWORD[32+rsp],xmm9
+ add eax,edi
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpslld xmm3,xmm3,2
+ add ebp,DWORD[56+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[16+r15]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ add ebp,esi
+ xor edi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpor xmm3,xmm3,xmm8
+ add edx,DWORD[60+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vpalignr xmm8,xmm3,xmm2,8
+ vpxor xmm4,xmm4,xmm0
+ add ecx,DWORD[rsp]
+ xor esi,eax
+ mov edi,edx
+ shld edx,edx,5
+ vpxor xmm4,xmm4,xmm5
+ add ecx,esi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[32+r15]
+ xor edi,eax
+ vpaddd xmm9,xmm10,xmm3
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpxor xmm4,xmm4,xmm8
+ add ebx,DWORD[4+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ vpsrld xmm8,xmm4,30
+ vmovdqa XMMWORD[48+rsp],xmm9
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ vpslld xmm4,xmm4,2
+ add eax,DWORD[8+rsp]
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ xor edi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpor xmm4,xmm4,xmm8
+ add ebp,DWORD[12+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[48+r15]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpalignr xmm8,xmm4,xmm3,8
+ vpxor xmm5,xmm5,xmm1
+ add edx,DWORD[16+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ shld ebp,ebp,5
+ vpxor xmm5,xmm5,xmm6
+ add edx,esi
+ xor edi,ebx
+ vpaddd xmm9,xmm10,xmm4
+ shrd eax,eax,7
+ add edx,ebp
+ vpxor xmm5,xmm5,xmm8
+ add ecx,DWORD[20+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ vpsrld xmm8,xmm5,30
+ vmovdqa XMMWORD[rsp],xmm9
+ add ecx,edi
+ cmp r8d,11
+ jb NEAR $L$vaesenclast7
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[64+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[80+r15]
+ je NEAR $L$vaesenclast7
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[96+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast7:
+ vaesenclast xmm12,xmm12,xmm15
+ vmovups xmm15,XMMWORD[((-112))+r15]
+ vmovups xmm14,XMMWORD[((16-112))+r15]
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpslld xmm5,xmm5,2
+ add ebx,DWORD[24+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ vpor xmm5,xmm5,xmm8
+ add eax,DWORD[28+rsp]
+ shrd ecx,ecx,7
+ mov esi,ebx
+ xor edi,edx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,ecx
+ xor ecx,edx
+ add eax,ebx
+ vpalignr xmm8,xmm5,xmm4,8
+ vpxor xmm6,xmm6,xmm2
+ add ebp,DWORD[32+rsp]
+ vmovdqu xmm13,XMMWORD[32+r12]
+ vpxor xmm13,xmm13,xmm15
+ vmovups XMMWORD[16+r12*1+r13],xmm12
+ vpxor xmm12,xmm12,xmm13
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-80))+r15]
+ and esi,ecx
+ xor ecx,edx
+ shrd ebx,ebx,7
+ vpxor xmm6,xmm6,xmm7
+ mov edi,eax
+ xor esi,ecx
+ vpaddd xmm9,xmm10,xmm5
+ shld eax,eax,5
+ add ebp,esi
+ vpxor xmm6,xmm6,xmm8
+ xor edi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ add edx,DWORD[36+rsp]
+ vpsrld xmm8,xmm6,30
+ vmovdqa XMMWORD[16+rsp],xmm9
+ and edi,ebx
+ xor ebx,ecx
+ shrd eax,eax,7
+ mov esi,ebp
+ vpslld xmm6,xmm6,2
+ xor edi,ebx
+ shld ebp,ebp,5
+ add edx,edi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[((-64))+r15]
+ xor esi,eax
+ xor eax,ebx
+ add edx,ebp
+ add ecx,DWORD[40+rsp]
+ and esi,eax
+ vpor xmm6,xmm6,xmm8
+ xor eax,ebx
+ shrd ebp,ebp,7
+ mov edi,edx
+ xor esi,eax
+ shld edx,edx,5
+ add ecx,esi
+ xor edi,ebp
+ xor ebp,eax
+ add ecx,edx
+ add ebx,DWORD[44+rsp]
+ and edi,ebp
+ xor ebp,eax
+ shrd edx,edx,7
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-48))+r15]
+ mov esi,ecx
+ xor edi,ebp
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,edx
+ xor edx,ebp
+ add ebx,ecx
+ vpalignr xmm8,xmm6,xmm5,8
+ vpxor xmm7,xmm7,xmm3
+ add eax,DWORD[48+rsp]
+ and esi,edx
+ xor edx,ebp
+ shrd ecx,ecx,7
+ vpxor xmm7,xmm7,xmm0
+ mov edi,ebx
+ xor esi,edx
+ vpaddd xmm9,xmm10,xmm6
+ vmovdqa xmm10,XMMWORD[48+r11]
+ shld ebx,ebx,5
+ add eax,esi
+ vpxor xmm7,xmm7,xmm8
+ xor edi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add ebp,DWORD[52+rsp]
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[((-32))+r15]
+ vpsrld xmm8,xmm7,30
+ vmovdqa XMMWORD[32+rsp],xmm9
+ and edi,ecx
+ xor ecx,edx
+ shrd ebx,ebx,7
+ mov esi,eax
+ vpslld xmm7,xmm7,2
+ xor edi,ecx
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ add edx,DWORD[56+rsp]
+ and esi,ebx
+ vpor xmm7,xmm7,xmm8
+ xor ebx,ecx
+ shrd eax,eax,7
+ mov edi,ebp
+ xor esi,ebx
+ shld ebp,ebp,5
+ add edx,esi
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-16))+r15]
+ xor edi,eax
+ xor eax,ebx
+ add edx,ebp
+ add ecx,DWORD[60+rsp]
+ and edi,eax
+ xor eax,ebx
+ shrd ebp,ebp,7
+ mov esi,edx
+ xor edi,eax
+ shld edx,edx,5
+ add ecx,edi
+ xor esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ vpalignr xmm8,xmm7,xmm6,8
+ vpxor xmm0,xmm0,xmm4
+ add ebx,DWORD[rsp]
+ and esi,ebp
+ xor ebp,eax
+ shrd edx,edx,7
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[r15]
+ vpxor xmm0,xmm0,xmm1
+ mov edi,ecx
+ xor esi,ebp
+ vpaddd xmm9,xmm10,xmm7
+ shld ecx,ecx,5
+ add ebx,esi
+ vpxor xmm0,xmm0,xmm8
+ xor edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[4+rsp]
+ vpsrld xmm8,xmm0,30
+ vmovdqa XMMWORD[48+rsp],xmm9
+ and edi,edx
+ xor edx,ebp
+ shrd ecx,ecx,7
+ mov esi,ebx
+ vpslld xmm0,xmm0,2
+ xor edi,edx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add ebp,DWORD[8+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[16+r15]
+ and esi,ecx
+ vpor xmm0,xmm0,xmm8
+ xor ecx,edx
+ shrd ebx,ebx,7
+ mov edi,eax
+ xor esi,ecx
+ shld eax,eax,5
+ add ebp,esi
+ xor edi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ add edx,DWORD[12+rsp]
+ and edi,ebx
+ xor ebx,ecx
+ shrd eax,eax,7
+ mov esi,ebp
+ xor edi,ebx
+ shld ebp,ebp,5
+ add edx,edi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[32+r15]
+ xor esi,eax
+ xor eax,ebx
+ add edx,ebp
+ vpalignr xmm8,xmm0,xmm7,8
+ vpxor xmm1,xmm1,xmm5
+ add ecx,DWORD[16+rsp]
+ and esi,eax
+ xor eax,ebx
+ shrd ebp,ebp,7
+ vpxor xmm1,xmm1,xmm2
+ mov edi,edx
+ xor esi,eax
+ vpaddd xmm9,xmm10,xmm0
+ shld edx,edx,5
+ add ecx,esi
+ vpxor xmm1,xmm1,xmm8
+ xor edi,ebp
+ xor ebp,eax
+ add ecx,edx
+ add ebx,DWORD[20+rsp]
+ vpsrld xmm8,xmm1,30
+ vmovdqa XMMWORD[rsp],xmm9
+ and edi,ebp
+ xor ebp,eax
+ shrd edx,edx,7
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[48+r15]
+ mov esi,ecx
+ vpslld xmm1,xmm1,2
+ xor edi,ebp
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[24+rsp]
+ and esi,edx
+ vpor xmm1,xmm1,xmm8
+ xor edx,ebp
+ shrd ecx,ecx,7
+ mov edi,ebx
+ xor esi,edx
+ shld ebx,ebx,5
+ add eax,esi
+ xor edi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add ebp,DWORD[28+rsp]
+ cmp r8d,11
+ jb NEAR $L$vaesenclast8
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[64+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[80+r15]
+ je NEAR $L$vaesenclast8
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[96+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast8:
+ vaesenclast xmm12,xmm12,xmm15
+ vmovups xmm15,XMMWORD[((-112))+r15]
+ vmovups xmm14,XMMWORD[((16-112))+r15]
+ and edi,ecx
+ xor ecx,edx
+ shrd ebx,ebx,7
+ mov esi,eax
+ xor edi,ecx
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ vpalignr xmm8,xmm1,xmm0,8
+ vpxor xmm2,xmm2,xmm6
+ add edx,DWORD[32+rsp]
+ and esi,ebx
+ xor ebx,ecx
+ shrd eax,eax,7
+ vpxor xmm2,xmm2,xmm3
+ mov edi,ebp
+ xor esi,ebx
+ vpaddd xmm9,xmm10,xmm1
+ shld ebp,ebp,5
+ add edx,esi
+ vmovdqu xmm13,XMMWORD[48+r12]
+ vpxor xmm13,xmm13,xmm15
+ vmovups XMMWORD[32+r12*1+r13],xmm12
+ vpxor xmm12,xmm12,xmm13
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-80))+r15]
+ vpxor xmm2,xmm2,xmm8
+ xor edi,eax
+ xor eax,ebx
+ add edx,ebp
+ add ecx,DWORD[36+rsp]
+ vpsrld xmm8,xmm2,30
+ vmovdqa XMMWORD[16+rsp],xmm9
+ and edi,eax
+ xor eax,ebx
+ shrd ebp,ebp,7
+ mov esi,edx
+ vpslld xmm2,xmm2,2
+ xor edi,eax
+ shld edx,edx,5
+ add ecx,edi
+ xor esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ add ebx,DWORD[40+rsp]
+ and esi,ebp
+ vpor xmm2,xmm2,xmm8
+ xor ebp,eax
+ shrd edx,edx,7
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[((-64))+r15]
+ mov edi,ecx
+ xor esi,ebp
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[44+rsp]
+ and edi,edx
+ xor edx,ebp
+ shrd ecx,ecx,7
+ mov esi,ebx
+ xor edi,edx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,edx
+ add eax,ebx
+ vpalignr xmm8,xmm2,xmm1,8
+ vpxor xmm3,xmm3,xmm7
+ add ebp,DWORD[48+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-48))+r15]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ vpxor xmm3,xmm3,xmm4
+ add ebp,esi
+ xor edi,ecx
+ vpaddd xmm9,xmm10,xmm2
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpxor xmm3,xmm3,xmm8
+ add edx,DWORD[52+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ vpsrld xmm8,xmm3,30
+ vmovdqa XMMWORD[32+rsp],xmm9
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vpslld xmm3,xmm3,2
+ add ecx,DWORD[56+rsp]
+ xor esi,eax
+ mov edi,edx
+ shld edx,edx,5
+ add ecx,esi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[((-32))+r15]
+ xor edi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpor xmm3,xmm3,xmm8
+ add ebx,DWORD[60+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[rsp]
+ vpaddd xmm9,xmm10,xmm3
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ vmovdqa XMMWORD[48+rsp],xmm9
+ xor edi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[4+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[((-16))+r15]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[8+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ shld ebp,ebp,5
+ add edx,esi
+ xor edi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ add ecx,DWORD[12+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,edi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[r15]
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ cmp r10,r14
+ je NEAR $L$done_avx
+ vmovdqa xmm9,XMMWORD[64+r11]
+ vmovdqa xmm10,XMMWORD[r11]
+ vmovdqu xmm0,XMMWORD[r10]
+ vmovdqu xmm1,XMMWORD[16+r10]
+ vmovdqu xmm2,XMMWORD[32+r10]
+ vmovdqu xmm3,XMMWORD[48+r10]
+ vpshufb xmm0,xmm0,xmm9
+ add r10,64
+ add ebx,DWORD[16+rsp]
+ xor esi,ebp
+ vpshufb xmm1,xmm1,xmm9
+ mov edi,ecx
+ shld ecx,ecx,5
+ vpaddd xmm8,xmm0,xmm10
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ vmovdqa XMMWORD[rsp],xmm8
+ add eax,DWORD[20+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[24+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[16+r15]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ add ebp,esi
+ xor edi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[28+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ add ecx,DWORD[32+rsp]
+ xor esi,eax
+ vpshufb xmm2,xmm2,xmm9
+ mov edi,edx
+ shld edx,edx,5
+ vpaddd xmm8,xmm1,xmm10
+ add ecx,esi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[32+r15]
+ xor edi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vmovdqa XMMWORD[16+rsp],xmm8
+ add ebx,DWORD[36+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[40+rsp]
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ xor edi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[44+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[48+r15]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[48+rsp]
+ xor esi,ebx
+ vpshufb xmm3,xmm3,xmm9
+ mov edi,ebp
+ shld ebp,ebp,5
+ vpaddd xmm8,xmm2,xmm10
+ add edx,esi
+ xor edi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vmovdqa XMMWORD[32+rsp],xmm8
+ add ecx,DWORD[52+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,edi
+ cmp r8d,11
+ jb NEAR $L$vaesenclast9
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[64+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[80+r15]
+ je NEAR $L$vaesenclast9
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[96+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast9:
+ vaesenclast xmm12,xmm12,xmm15
+ vmovups xmm15,XMMWORD[((-112))+r15]
+ vmovups xmm14,XMMWORD[((16-112))+r15]
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ add ebx,DWORD[56+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[60+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ shrd ecx,ecx,7
+ add eax,ebx
+ vmovups XMMWORD[48+r12*1+r13],xmm12
+ lea r12,[64+r12]
+
+ add eax,DWORD[r9]
+ add esi,DWORD[4+r9]
+ add ecx,DWORD[8+r9]
+ add edx,DWORD[12+r9]
+ mov DWORD[r9],eax
+ add ebp,DWORD[16+r9]
+ mov DWORD[4+r9],esi
+ mov ebx,esi
+ mov DWORD[8+r9],ecx
+ mov edi,ecx
+ mov DWORD[12+r9],edx
+ xor edi,edx
+ mov DWORD[16+r9],ebp
+ and esi,edi
+ jmp NEAR $L$oop_avx
+
+$L$done_avx:
+ add ebx,DWORD[16+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[20+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[24+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[16+r15]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ add ebp,esi
+ xor edi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[28+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ add ecx,DWORD[32+rsp]
+ xor esi,eax
+ mov edi,edx
+ shld edx,edx,5
+ add ecx,esi
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[32+r15]
+ xor edi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ add ebx,DWORD[36+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[40+rsp]
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ xor edi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[44+rsp]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[48+r15]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[48+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ shld ebp,ebp,5
+ add edx,esi
+ xor edi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ add ecx,DWORD[52+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,edi
+ cmp r8d,11
+ jb NEAR $L$vaesenclast10
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[64+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[80+r15]
+ je NEAR $L$vaesenclast10
+ vaesenc xmm12,xmm12,xmm15
+ vmovups xmm14,XMMWORD[96+r15]
+ vaesenc xmm12,xmm12,xmm14
+ vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast10:
+ vaesenclast xmm12,xmm12,xmm15
+ vmovups xmm15,XMMWORD[((-112))+r15]
+ vmovups xmm14,XMMWORD[((16-112))+r15]
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ add ebx,DWORD[56+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[60+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ shrd ecx,ecx,7
+ add eax,ebx
+ vmovups XMMWORD[48+r12*1+r13],xmm12
+ mov r8,QWORD[88+rsp]
+
+ add eax,DWORD[r9]
+ add esi,DWORD[4+r9]
+ add ecx,DWORD[8+r9]
+ mov DWORD[r9],eax
+ add edx,DWORD[12+r9]
+ mov DWORD[4+r9],esi
+ add ebp,DWORD[16+r9]
+ mov DWORD[8+r9],ecx
+ mov DWORD[12+r9],edx
+ mov DWORD[16+r9],ebp
+ vmovups XMMWORD[r8],xmm12
+ vzeroall
+ movaps xmm6,XMMWORD[((96+0))+rsp]
+ movaps xmm7,XMMWORD[((96+16))+rsp]
+ movaps xmm8,XMMWORD[((96+32))+rsp]
+ movaps xmm9,XMMWORD[((96+48))+rsp]
+ movaps xmm10,XMMWORD[((96+64))+rsp]
+ movaps xmm11,XMMWORD[((96+80))+rsp]
+ movaps xmm12,XMMWORD[((96+96))+rsp]
+ movaps xmm13,XMMWORD[((96+112))+rsp]
+ movaps xmm14,XMMWORD[((96+128))+rsp]
+ movaps xmm15,XMMWORD[((96+144))+rsp]
+ lea rsi,[264+rsp]
+
+ mov r15,QWORD[rsi]
+
+ mov r14,QWORD[8+rsi]
+
+ mov r13,QWORD[16+rsi]
+
+ mov r12,QWORD[24+rsi]
+
+ mov rbp,QWORD[32+rsi]
+
+ mov rbx,QWORD[40+rsi]
+
+ lea rsp,[48+rsi]
+
+$L$epilogue_avx:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_cbc_sha1_enc_avx:
+ALIGN 64
+K_XX_XX:
+ DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999
+ DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+ DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+ DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+DB 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0
+
+DB 65,69,83,78,73,45,67,66,67,43,83,72,65,49,32,115
+DB 116,105,116,99,104,32,102,111,114,32,120,56,54,95,54,52
+DB 44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32
+DB 60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111
+DB 114,103,62,0
+ALIGN 64
+
+ALIGN 32
+aesni_cbc_sha1_enc_shaext:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_cbc_sha1_enc_shaext:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+ mov r10,QWORD[56+rsp]
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[(-8-160)+rax],xmm6
+ movaps XMMWORD[(-8-144)+rax],xmm7
+ movaps XMMWORD[(-8-128)+rax],xmm8
+ movaps XMMWORD[(-8-112)+rax],xmm9
+ movaps XMMWORD[(-8-96)+rax],xmm10
+ movaps XMMWORD[(-8-80)+rax],xmm11
+ movaps XMMWORD[(-8-64)+rax],xmm12
+ movaps XMMWORD[(-8-48)+rax],xmm13
+ movaps XMMWORD[(-8-32)+rax],xmm14
+ movaps XMMWORD[(-8-16)+rax],xmm15
+$L$prologue_shaext:
+ movdqu xmm8,XMMWORD[r9]
+ movd xmm9,DWORD[16+r9]
+ movdqa xmm7,XMMWORD[((K_XX_XX+80))]
+
+ mov r11d,DWORD[240+rcx]
+ sub rsi,rdi
+ movups xmm15,XMMWORD[rcx]
+ movups xmm2,XMMWORD[r8]
+ movups xmm0,XMMWORD[16+rcx]
+ lea rcx,[112+rcx]
+
+ pshufd xmm8,xmm8,27
+ pshufd xmm9,xmm9,27
+ jmp NEAR $L$oop_shaext
+
+ALIGN 16
+$L$oop_shaext:
+ movups xmm14,XMMWORD[rdi]
+ xorps xmm14,xmm15
+ xorps xmm2,xmm14
+ movups xmm1,XMMWORD[((-80))+rcx]
+DB 102,15,56,220,208
+ movdqu xmm3,XMMWORD[r10]
+ movdqa xmm12,xmm9
+DB 102,15,56,0,223
+ movdqu xmm4,XMMWORD[16+r10]
+ movdqa xmm11,xmm8
+ movups xmm0,XMMWORD[((-64))+rcx]
+DB 102,15,56,220,209
+DB 102,15,56,0,231
+
+ paddd xmm9,xmm3
+ movdqu xmm5,XMMWORD[32+r10]
+ lea r10,[64+r10]
+ pxor xmm3,xmm12
+ movups xmm1,XMMWORD[((-48))+rcx]
+DB 102,15,56,220,208
+ pxor xmm3,xmm12
+ movdqa xmm10,xmm8
+DB 102,15,56,0,239
+DB 69,15,58,204,193,0
+DB 68,15,56,200,212
+ movups xmm0,XMMWORD[((-32))+rcx]
+DB 102,15,56,220,209
+DB 15,56,201,220
+ movdqu xmm6,XMMWORD[((-16))+r10]
+ movdqa xmm9,xmm8
+DB 102,15,56,0,247
+ movups xmm1,XMMWORD[((-16))+rcx]
+DB 102,15,56,220,208
+DB 69,15,58,204,194,0
+DB 68,15,56,200,205
+ pxor xmm3,xmm5
+DB 15,56,201,229
+ movups xmm0,XMMWORD[rcx]
+DB 102,15,56,220,209
+ movdqa xmm10,xmm8
+DB 69,15,58,204,193,0
+DB 68,15,56,200,214
+ movups xmm1,XMMWORD[16+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,222
+ pxor xmm4,xmm6
+DB 15,56,201,238
+ movups xmm0,XMMWORD[32+rcx]
+DB 102,15,56,220,209
+ movdqa xmm9,xmm8
+DB 69,15,58,204,194,0
+DB 68,15,56,200,203
+ movups xmm1,XMMWORD[48+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,227
+ pxor xmm5,xmm3
+DB 15,56,201,243
+ cmp r11d,11
+ jb NEAR $L$aesenclast11
+ movups xmm0,XMMWORD[64+rcx]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[80+rcx]
+DB 102,15,56,220,208
+ je NEAR $L$aesenclast11
+ movups xmm0,XMMWORD[96+rcx]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[112+rcx]
+DB 102,15,56,220,208
+$L$aesenclast11:
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[((16-112))+rcx]
+ movdqa xmm10,xmm8
+DB 69,15,58,204,193,0
+DB 68,15,56,200,212
+ movups xmm14,XMMWORD[16+rdi]
+ xorps xmm14,xmm15
+ movups XMMWORD[rdi*1+rsi],xmm2
+ xorps xmm2,xmm14
+ movups xmm1,XMMWORD[((-80))+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,236
+ pxor xmm6,xmm4
+DB 15,56,201,220
+ movups xmm0,XMMWORD[((-64))+rcx]
+DB 102,15,56,220,209
+ movdqa xmm9,xmm8
+DB 69,15,58,204,194,1
+DB 68,15,56,200,205
+ movups xmm1,XMMWORD[((-48))+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,245
+ pxor xmm3,xmm5
+DB 15,56,201,229
+ movups xmm0,XMMWORD[((-32))+rcx]
+DB 102,15,56,220,209
+ movdqa xmm10,xmm8
+DB 69,15,58,204,193,1
+DB 68,15,56,200,214
+ movups xmm1,XMMWORD[((-16))+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,222
+ pxor xmm4,xmm6
+DB 15,56,201,238
+ movups xmm0,XMMWORD[rcx]
+DB 102,15,56,220,209
+ movdqa xmm9,xmm8
+DB 69,15,58,204,194,1
+DB 68,15,56,200,203
+ movups xmm1,XMMWORD[16+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,227
+ pxor xmm5,xmm3
+DB 15,56,201,243
+ movups xmm0,XMMWORD[32+rcx]
+DB 102,15,56,220,209
+ movdqa xmm10,xmm8
+DB 69,15,58,204,193,1
+DB 68,15,56,200,212
+ movups xmm1,XMMWORD[48+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,236
+ pxor xmm6,xmm4
+DB 15,56,201,220
+ cmp r11d,11
+ jb NEAR $L$aesenclast12
+ movups xmm0,XMMWORD[64+rcx]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[80+rcx]
+DB 102,15,56,220,208
+ je NEAR $L$aesenclast12
+ movups xmm0,XMMWORD[96+rcx]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[112+rcx]
+DB 102,15,56,220,208
+$L$aesenclast12:
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[((16-112))+rcx]
+ movdqa xmm9,xmm8
+DB 69,15,58,204,194,1
+DB 68,15,56,200,205
+ movups xmm14,XMMWORD[32+rdi]
+ xorps xmm14,xmm15
+ movups XMMWORD[16+rdi*1+rsi],xmm2
+ xorps xmm2,xmm14
+ movups xmm1,XMMWORD[((-80))+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,245
+ pxor xmm3,xmm5
+DB 15,56,201,229
+ movups xmm0,XMMWORD[((-64))+rcx]
+DB 102,15,56,220,209
+ movdqa xmm10,xmm8
+DB 69,15,58,204,193,2
+DB 68,15,56,200,214
+ movups xmm1,XMMWORD[((-48))+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,222
+ pxor xmm4,xmm6
+DB 15,56,201,238
+ movups xmm0,XMMWORD[((-32))+rcx]
+DB 102,15,56,220,209
+ movdqa xmm9,xmm8
+DB 69,15,58,204,194,2
+DB 68,15,56,200,203
+ movups xmm1,XMMWORD[((-16))+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,227
+ pxor xmm5,xmm3
+DB 15,56,201,243
+ movups xmm0,XMMWORD[rcx]
+DB 102,15,56,220,209
+ movdqa xmm10,xmm8
+DB 69,15,58,204,193,2
+DB 68,15,56,200,212
+ movups xmm1,XMMWORD[16+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,236
+ pxor xmm6,xmm4
+DB 15,56,201,220
+ movups xmm0,XMMWORD[32+rcx]
+DB 102,15,56,220,209
+ movdqa xmm9,xmm8
+DB 69,15,58,204,194,2
+DB 68,15,56,200,205
+ movups xmm1,XMMWORD[48+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,245
+ pxor xmm3,xmm5
+DB 15,56,201,229
+ cmp r11d,11
+ jb NEAR $L$aesenclast13
+ movups xmm0,XMMWORD[64+rcx]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[80+rcx]
+DB 102,15,56,220,208
+ je NEAR $L$aesenclast13
+ movups xmm0,XMMWORD[96+rcx]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[112+rcx]
+DB 102,15,56,220,208
+$L$aesenclast13:
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[((16-112))+rcx]
+ movdqa xmm10,xmm8
+DB 69,15,58,204,193,2
+DB 68,15,56,200,214
+ movups xmm14,XMMWORD[48+rdi]
+ xorps xmm14,xmm15
+ movups XMMWORD[32+rdi*1+rsi],xmm2
+ xorps xmm2,xmm14
+ movups xmm1,XMMWORD[((-80))+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,222
+ pxor xmm4,xmm6
+DB 15,56,201,238
+ movups xmm0,XMMWORD[((-64))+rcx]
+DB 102,15,56,220,209
+ movdqa xmm9,xmm8
+DB 69,15,58,204,194,3
+DB 68,15,56,200,203
+ movups xmm1,XMMWORD[((-48))+rcx]
+DB 102,15,56,220,208
+DB 15,56,202,227
+ pxor xmm5,xmm3
+DB 15,56,201,243
+ movups xmm0,XMMWORD[((-32))+rcx]
+DB 102,15,56,220,209
+ movdqa xmm10,xmm8
+DB 69,15,58,204,193,3
+DB 68,15,56,200,212
+DB 15,56,202,236
+ pxor xmm6,xmm4
+ movups xmm1,XMMWORD[((-16))+rcx]
+DB 102,15,56,220,208
+ movdqa xmm9,xmm8
+DB 69,15,58,204,194,3
+DB 68,15,56,200,205
+DB 15,56,202,245
+ movups xmm0,XMMWORD[rcx]
+DB 102,15,56,220,209
+ movdqa xmm5,xmm12
+ movdqa xmm10,xmm8
+DB 69,15,58,204,193,3
+DB 68,15,56,200,214
+ movups xmm1,XMMWORD[16+rcx]
+DB 102,15,56,220,208
+ movdqa xmm9,xmm8
+DB 69,15,58,204,194,3
+DB 68,15,56,200,205
+ movups xmm0,XMMWORD[32+rcx]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[48+rcx]
+DB 102,15,56,220,208
+ cmp r11d,11
+ jb NEAR $L$aesenclast14
+ movups xmm0,XMMWORD[64+rcx]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[80+rcx]
+DB 102,15,56,220,208
+ je NEAR $L$aesenclast14
+ movups xmm0,XMMWORD[96+rcx]
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[112+rcx]
+DB 102,15,56,220,208
+$L$aesenclast14:
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[((16-112))+rcx]
+ dec rdx
+
+ paddd xmm8,xmm11
+ movups XMMWORD[48+rdi*1+rsi],xmm2
+ lea rdi,[64+rdi]
+ jnz NEAR $L$oop_shaext
+
+ pshufd xmm8,xmm8,27
+ pshufd xmm9,xmm9,27
+ movups XMMWORD[r8],xmm2
+ movdqu XMMWORD[r9],xmm8
+ movd DWORD[16+r9],xmm9
+ movaps xmm6,XMMWORD[((-8-160))+rax]
+ movaps xmm7,XMMWORD[((-8-144))+rax]
+ movaps xmm8,XMMWORD[((-8-128))+rax]
+ movaps xmm9,XMMWORD[((-8-112))+rax]
+ movaps xmm10,XMMWORD[((-8-96))+rax]
+ movaps xmm11,XMMWORD[((-8-80))+rax]
+ movaps xmm12,XMMWORD[((-8-64))+rax]
+ movaps xmm13,XMMWORD[((-8-48))+rax]
+ movaps xmm14,XMMWORD[((-8-32))+rax]
+ movaps xmm15,XMMWORD[((-8-16))+rax]
+ mov rsp,rax
+$L$epilogue_shaext:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+$L$SEH_end_aesni_cbc_sha1_enc_shaext:
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+ssse3_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+ lea r10,[aesni_cbc_sha1_enc_shaext]
+ cmp rbx,r10
+ jb NEAR $L$seh_no_shaext
+
+ lea rsi,[rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+ lea rax,[168+rax]
+ jmp NEAR $L$common_seh_tail
+$L$seh_no_shaext:
+ lea rsi,[96+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+ lea rax,[264+rax]
+
+ mov r15,QWORD[rax]
+ mov r14,QWORD[8+rax]
+ mov r13,QWORD[16+rax]
+ mov r12,QWORD[24+rax]
+ mov rbp,QWORD[32+rax]
+ mov rbx,QWORD[40+rax]
+ lea rax,[48+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+$L$common_seh_tail:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_aesni_cbc_sha1_enc_ssse3 wrt ..imagebase
+ DD $L$SEH_end_aesni_cbc_sha1_enc_ssse3 wrt ..imagebase
+ DD $L$SEH_info_aesni_cbc_sha1_enc_ssse3 wrt ..imagebase
+ DD $L$SEH_begin_aesni_cbc_sha1_enc_avx wrt ..imagebase
+ DD $L$SEH_end_aesni_cbc_sha1_enc_avx wrt ..imagebase
+ DD $L$SEH_info_aesni_cbc_sha1_enc_avx wrt ..imagebase
+ DD $L$SEH_begin_aesni_cbc_sha1_enc_shaext wrt ..imagebase
+ DD $L$SEH_end_aesni_cbc_sha1_enc_shaext wrt ..imagebase
+ DD $L$SEH_info_aesni_cbc_sha1_enc_shaext wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_aesni_cbc_sha1_enc_ssse3:
+DB 9,0,0,0
+ DD ssse3_handler wrt ..imagebase
+ DD $L$prologue_ssse3 wrt ..imagebase,$L$epilogue_ssse3 wrt ..imagebase
+$L$SEH_info_aesni_cbc_sha1_enc_avx:
+DB 9,0,0,0
+ DD ssse3_handler wrt ..imagebase
+ DD $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_aesni_cbc_sha1_enc_shaext:
+DB 9,0,0,0
+ DD ssse3_handler wrt ..imagebase
+ DD $L$prologue_shaext wrt ..imagebase,$L$epilogue_shaext wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm
new file mode 100644
index 0000000000..0dba3d7f67
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm
@@ -0,0 +1,4709 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN OPENSSL_ia32cap_P
+global aesni_cbc_sha256_enc
+
+ALIGN 16
+aesni_cbc_sha256_enc:
+ lea r11,[OPENSSL_ia32cap_P]
+ mov eax,1
+ cmp rcx,0
+ je NEAR $L$probe
+ mov eax,DWORD[r11]
+ mov r10,QWORD[4+r11]
+ bt r10,61
+ jc NEAR aesni_cbc_sha256_enc_shaext
+ mov r11,r10
+ shr r11,32
+
+ test r10d,2048
+ jnz NEAR aesni_cbc_sha256_enc_xop
+ and r11d,296
+ cmp r11d,296
+ je NEAR aesni_cbc_sha256_enc_avx2
+ and r10d,268435456
+ jnz NEAR aesni_cbc_sha256_enc_avx
+ ud2
+ xor eax,eax
+ cmp rcx,0
+ je NEAR $L$probe
+ ud2
+$L$probe:
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 64
+
+K256:
+ DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+ DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+ DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+ DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+ DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+ DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+ DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+ DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+ DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+ DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+ DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+ DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+ DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+ DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+ DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+ DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+ DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+ DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+ DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+ DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+ DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+ DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+ DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+ DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+ DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+ DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+ DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+ DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+ DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+ DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+ DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+ DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+ DD 0,0,0,0,0,0,0,0,-1,-1,-1,-1
+ DD 0,0,0,0,0,0,0,0
+DB 65,69,83,78,73,45,67,66,67,43,83,72,65,50,53,54
+DB 32,115,116,105,116,99,104,32,102,111,114,32,120,56,54,95
+DB 54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98
+DB 121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108
+DB 46,111,114,103,62,0
+ALIGN 64
+
+ALIGN 64
+aesni_cbc_sha256_enc_xop:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_cbc_sha256_enc_xop:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+$L$xop_shortcut:
+ mov r10,QWORD[56+rsp]
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ sub rsp,288
+ and rsp,-64
+
+ shl rdx,6
+ sub rsi,rdi
+ sub r10,rdi
+ add rdx,rdi
+
+
+ mov QWORD[((64+8))+rsp],rsi
+ mov QWORD[((64+16))+rsp],rdx
+
+ mov QWORD[((64+32))+rsp],r8
+ mov QWORD[((64+40))+rsp],r9
+ mov QWORD[((64+48))+rsp],r10
+ mov QWORD[120+rsp],rax
+
+ movaps XMMWORD[128+rsp],xmm6
+ movaps XMMWORD[144+rsp],xmm7
+ movaps XMMWORD[160+rsp],xmm8
+ movaps XMMWORD[176+rsp],xmm9
+ movaps XMMWORD[192+rsp],xmm10
+ movaps XMMWORD[208+rsp],xmm11
+ movaps XMMWORD[224+rsp],xmm12
+ movaps XMMWORD[240+rsp],xmm13
+ movaps XMMWORD[256+rsp],xmm14
+ movaps XMMWORD[272+rsp],xmm15
+$L$prologue_xop:
+ vzeroall
+
+ mov r12,rdi
+ lea rdi,[128+rcx]
+ lea r13,[((K256+544))]
+ mov r14d,DWORD[((240-128))+rdi]
+ mov r15,r9
+ mov rsi,r10
+ vmovdqu xmm8,XMMWORD[r8]
+ sub r14,9
+
+ mov eax,DWORD[r15]
+ mov ebx,DWORD[4+r15]
+ mov ecx,DWORD[8+r15]
+ mov edx,DWORD[12+r15]
+ mov r8d,DWORD[16+r15]
+ mov r9d,DWORD[20+r15]
+ mov r10d,DWORD[24+r15]
+ mov r11d,DWORD[28+r15]
+
+ vmovdqa xmm14,XMMWORD[r14*8+r13]
+ vmovdqa xmm13,XMMWORD[16+r14*8+r13]
+ vmovdqa xmm12,XMMWORD[32+r14*8+r13]
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ jmp NEAR $L$loop_xop
+ALIGN 16
+$L$loop_xop:
+ vmovdqa xmm7,XMMWORD[((K256+512))]
+ vmovdqu xmm0,XMMWORD[r12*1+rsi]
+ vmovdqu xmm1,XMMWORD[16+r12*1+rsi]
+ vmovdqu xmm2,XMMWORD[32+r12*1+rsi]
+ vmovdqu xmm3,XMMWORD[48+r12*1+rsi]
+ vpshufb xmm0,xmm0,xmm7
+ lea rbp,[K256]
+ vpshufb xmm1,xmm1,xmm7
+ vpshufb xmm2,xmm2,xmm7
+ vpaddd xmm4,xmm0,XMMWORD[rbp]
+ vpshufb xmm3,xmm3,xmm7
+ vpaddd xmm5,xmm1,XMMWORD[32+rbp]
+ vpaddd xmm6,xmm2,XMMWORD[64+rbp]
+ vpaddd xmm7,xmm3,XMMWORD[96+rbp]
+ vmovdqa XMMWORD[rsp],xmm4
+ mov r14d,eax
+ vmovdqa XMMWORD[16+rsp],xmm5
+ mov esi,ebx
+ vmovdqa XMMWORD[32+rsp],xmm6
+ xor esi,ecx
+ vmovdqa XMMWORD[48+rsp],xmm7
+ mov r13d,r8d
+ jmp NEAR $L$xop_00_47
+
+ALIGN 16
+$L$xop_00_47:
+ sub rbp,-16*2*4
+ vmovdqu xmm9,XMMWORD[r12]
+ mov QWORD[((64+0))+rsp],r12
+ vpalignr xmm4,xmm1,xmm0,4
+ ror r13d,14
+ mov eax,r14d
+ vpalignr xmm7,xmm3,xmm2,4
+ mov r12d,r9d
+ xor r13d,r8d
+DB 143,232,120,194,236,14
+ ror r14d,9
+ xor r12d,r10d
+ vpsrld xmm4,xmm4,3
+ ror r13d,5
+ xor r14d,eax
+ vpaddd xmm0,xmm0,xmm7
+ and r12d,r8d
+ vpxor xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+ xor r13d,r8d
+ add r11d,DWORD[rsp]
+ mov r15d,eax
+DB 143,232,120,194,245,11
+ ror r14d,11
+ xor r12d,r10d
+ vpxor xmm4,xmm4,xmm5
+ xor r15d,ebx
+ ror r13d,6
+ add r11d,r12d
+ and esi,r15d
+DB 143,232,120,194,251,13
+ xor r14d,eax
+ add r11d,r13d
+ vpxor xmm4,xmm4,xmm6
+ xor esi,ebx
+ add edx,r11d
+ vpsrld xmm6,xmm3,10
+ ror r14d,2
+ add r11d,esi
+ vpaddd xmm0,xmm0,xmm4
+ mov r13d,edx
+ add r14d,r11d
+DB 143,232,120,194,239,2
+ ror r13d,14
+ mov r11d,r14d
+ vpxor xmm7,xmm7,xmm6
+ mov r12d,r8d
+ xor r13d,edx
+ ror r14d,9
+ xor r12d,r9d
+ vpxor xmm7,xmm7,xmm5
+ ror r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ vpxor xmm9,xmm9,xmm8
+ xor r13d,edx
+ vpsrldq xmm7,xmm7,8
+ add r10d,DWORD[4+rsp]
+ mov esi,r11d
+ ror r14d,11
+ xor r12d,r9d
+ vpaddd xmm0,xmm0,xmm7
+ xor esi,eax
+ ror r13d,6
+ add r10d,r12d
+ and r15d,esi
+DB 143,232,120,194,248,13
+ xor r14d,r11d
+ add r10d,r13d
+ vpsrld xmm6,xmm0,10
+ xor r15d,eax
+ add ecx,r10d
+DB 143,232,120,194,239,2
+ ror r14d,2
+ add r10d,r15d
+ vpxor xmm7,xmm7,xmm6
+ mov r13d,ecx
+ add r14d,r10d
+ ror r13d,14
+ mov r10d,r14d
+ vpxor xmm7,xmm7,xmm5
+ mov r12d,edx
+ xor r13d,ecx
+ ror r14d,9
+ xor r12d,r8d
+ vpslldq xmm7,xmm7,8
+ ror r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+ xor r13d,ecx
+ vpaddd xmm0,xmm0,xmm7
+ add r9d,DWORD[8+rsp]
+ mov r15d,r10d
+ ror r14d,11
+ xor r12d,r8d
+ vpaddd xmm6,xmm0,XMMWORD[rbp]
+ xor r15d,r11d
+ ror r13d,6
+ add r9d,r12d
+ and esi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor esi,r11d
+ add ebx,r9d
+ ror r14d,2
+ add r9d,esi
+ mov r13d,ebx
+ add r14d,r9d
+ ror r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ xor r13d,ebx
+ ror r14d,9
+ xor r12d,edx
+ ror r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+ xor r13d,ebx
+ add r8d,DWORD[12+rsp]
+ mov esi,r9d
+ ror r14d,11
+ xor r12d,edx
+ xor esi,r10d
+ ror r13d,6
+ add r8d,r12d
+ and r15d,esi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ add eax,r8d
+ ror r14d,2
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ vmovdqa XMMWORD[rsp],xmm6
+ vpalignr xmm4,xmm2,xmm1,4
+ ror r13d,14
+ mov r8d,r14d
+ vpalignr xmm7,xmm0,xmm3,4
+ mov r12d,ebx
+ xor r13d,eax
+DB 143,232,120,194,236,14
+ ror r14d,9
+ xor r12d,ecx
+ vpsrld xmm4,xmm4,3
+ ror r13d,5
+ xor r14d,r8d
+ vpaddd xmm1,xmm1,xmm7
+ and r12d,eax
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+ xor r13d,eax
+ add edx,DWORD[16+rsp]
+ mov r15d,r8d
+DB 143,232,120,194,245,11
+ ror r14d,11
+ xor r12d,ecx
+ vpxor xmm4,xmm4,xmm5
+ xor r15d,r9d
+ ror r13d,6
+ add edx,r12d
+ and esi,r15d
+DB 143,232,120,194,248,13
+ xor r14d,r8d
+ add edx,r13d
+ vpxor xmm4,xmm4,xmm6
+ xor esi,r9d
+ add r11d,edx
+ vpsrld xmm6,xmm0,10
+ ror r14d,2
+ add edx,esi
+ vpaddd xmm1,xmm1,xmm4
+ mov r13d,r11d
+ add r14d,edx
+DB 143,232,120,194,239,2
+ ror r13d,14
+ mov edx,r14d
+ vpxor xmm7,xmm7,xmm6
+ mov r12d,eax
+ xor r13d,r11d
+ ror r14d,9
+ xor r12d,ebx
+ vpxor xmm7,xmm7,xmm5
+ ror r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+ xor r13d,r11d
+ vpsrldq xmm7,xmm7,8
+ add ecx,DWORD[20+rsp]
+ mov esi,edx
+ ror r14d,11
+ xor r12d,ebx
+ vpaddd xmm1,xmm1,xmm7
+ xor esi,r8d
+ ror r13d,6
+ add ecx,r12d
+ and r15d,esi
+DB 143,232,120,194,249,13
+ xor r14d,edx
+ add ecx,r13d
+ vpsrld xmm6,xmm1,10
+ xor r15d,r8d
+ add r10d,ecx
+DB 143,232,120,194,239,2
+ ror r14d,2
+ add ecx,r15d
+ vpxor xmm7,xmm7,xmm6
+ mov r13d,r10d
+ add r14d,ecx
+ ror r13d,14
+ mov ecx,r14d
+ vpxor xmm7,xmm7,xmm5
+ mov r12d,r11d
+ xor r13d,r10d
+ ror r14d,9
+ xor r12d,eax
+ vpslldq xmm7,xmm7,8
+ ror r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+ xor r13d,r10d
+ vpaddd xmm1,xmm1,xmm7
+ add ebx,DWORD[24+rsp]
+ mov r15d,ecx
+ ror r14d,11
+ xor r12d,eax
+ vpaddd xmm6,xmm1,XMMWORD[32+rbp]
+ xor r15d,edx
+ ror r13d,6
+ add ebx,r12d
+ and esi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor esi,edx
+ add r9d,ebx
+ ror r14d,2
+ add ebx,esi
+ mov r13d,r9d
+ add r14d,ebx
+ ror r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ xor r13d,r9d
+ ror r14d,9
+ xor r12d,r11d
+ ror r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+ xor r13d,r9d
+ add eax,DWORD[28+rsp]
+ mov esi,ebx
+ ror r14d,11
+ xor r12d,r11d
+ xor esi,ecx
+ ror r13d,6
+ add eax,r12d
+ and r15d,esi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ add r8d,eax
+ ror r14d,2
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ vmovdqa XMMWORD[16+rsp],xmm6
+ vpalignr xmm4,xmm3,xmm2,4
+ ror r13d,14
+ mov eax,r14d
+ vpalignr xmm7,xmm1,xmm0,4
+ mov r12d,r9d
+ xor r13d,r8d
+DB 143,232,120,194,236,14
+ ror r14d,9
+ xor r12d,r10d
+ vpsrld xmm4,xmm4,3
+ ror r13d,5
+ xor r14d,eax
+ vpaddd xmm2,xmm2,xmm7
+ and r12d,r8d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+ xor r13d,r8d
+ add r11d,DWORD[32+rsp]
+ mov r15d,eax
+DB 143,232,120,194,245,11
+ ror r14d,11
+ xor r12d,r10d
+ vpxor xmm4,xmm4,xmm5
+ xor r15d,ebx
+ ror r13d,6
+ add r11d,r12d
+ and esi,r15d
+DB 143,232,120,194,249,13
+ xor r14d,eax
+ add r11d,r13d
+ vpxor xmm4,xmm4,xmm6
+ xor esi,ebx
+ add edx,r11d
+ vpsrld xmm6,xmm1,10
+ ror r14d,2
+ add r11d,esi
+ vpaddd xmm2,xmm2,xmm4
+ mov r13d,edx
+ add r14d,r11d
+DB 143,232,120,194,239,2
+ ror r13d,14
+ mov r11d,r14d
+ vpxor xmm7,xmm7,xmm6
+ mov r12d,r8d
+ xor r13d,edx
+ ror r14d,9
+ xor r12d,r9d
+ vpxor xmm7,xmm7,xmm5
+ ror r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+ xor r13d,edx
+ vpsrldq xmm7,xmm7,8
+ add r10d,DWORD[36+rsp]
+ mov esi,r11d
+ ror r14d,11
+ xor r12d,r9d
+ vpaddd xmm2,xmm2,xmm7
+ xor esi,eax
+ ror r13d,6
+ add r10d,r12d
+ and r15d,esi
+DB 143,232,120,194,250,13
+ xor r14d,r11d
+ add r10d,r13d
+ vpsrld xmm6,xmm2,10
+ xor r15d,eax
+ add ecx,r10d
+DB 143,232,120,194,239,2
+ ror r14d,2
+ add r10d,r15d
+ vpxor xmm7,xmm7,xmm6
+ mov r13d,ecx
+ add r14d,r10d
+ ror r13d,14
+ mov r10d,r14d
+ vpxor xmm7,xmm7,xmm5
+ mov r12d,edx
+ xor r13d,ecx
+ ror r14d,9
+ xor r12d,r8d
+ vpslldq xmm7,xmm7,8
+ ror r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+ xor r13d,ecx
+ vpaddd xmm2,xmm2,xmm7
+ add r9d,DWORD[40+rsp]
+ mov r15d,r10d
+ ror r14d,11
+ xor r12d,r8d
+ vpaddd xmm6,xmm2,XMMWORD[64+rbp]
+ xor r15d,r11d
+ ror r13d,6
+ add r9d,r12d
+ and esi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor esi,r11d
+ add ebx,r9d
+ ror r14d,2
+ add r9d,esi
+ mov r13d,ebx
+ add r14d,r9d
+ ror r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ xor r13d,ebx
+ ror r14d,9
+ xor r12d,edx
+ ror r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+ xor r13d,ebx
+ add r8d,DWORD[44+rsp]
+ mov esi,r9d
+ ror r14d,11
+ xor r12d,edx
+ xor esi,r10d
+ ror r13d,6
+ add r8d,r12d
+ and r15d,esi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ add eax,r8d
+ ror r14d,2
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ vmovdqa XMMWORD[32+rsp],xmm6
+ vpalignr xmm4,xmm0,xmm3,4
+ ror r13d,14
+ mov r8d,r14d
+ vpalignr xmm7,xmm2,xmm1,4
+ mov r12d,ebx
+ xor r13d,eax
+DB 143,232,120,194,236,14
+ ror r14d,9
+ xor r12d,ecx
+ vpsrld xmm4,xmm4,3
+ ror r13d,5
+ xor r14d,r8d
+ vpaddd xmm3,xmm3,xmm7
+ and r12d,eax
+ vpand xmm8,xmm11,xmm12
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+ xor r13d,eax
+ add edx,DWORD[48+rsp]
+ mov r15d,r8d
+DB 143,232,120,194,245,11
+ ror r14d,11
+ xor r12d,ecx
+ vpxor xmm4,xmm4,xmm5
+ xor r15d,r9d
+ ror r13d,6
+ add edx,r12d
+ and esi,r15d
+DB 143,232,120,194,250,13
+ xor r14d,r8d
+ add edx,r13d
+ vpxor xmm4,xmm4,xmm6
+ xor esi,r9d
+ add r11d,edx
+ vpsrld xmm6,xmm2,10
+ ror r14d,2
+ add edx,esi
+ vpaddd xmm3,xmm3,xmm4
+ mov r13d,r11d
+ add r14d,edx
+DB 143,232,120,194,239,2
+ ror r13d,14
+ mov edx,r14d
+ vpxor xmm7,xmm7,xmm6
+ mov r12d,eax
+ xor r13d,r11d
+ ror r14d,9
+ xor r12d,ebx
+ vpxor xmm7,xmm7,xmm5
+ ror r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+ xor r13d,r11d
+ vpsrldq xmm7,xmm7,8
+ add ecx,DWORD[52+rsp]
+ mov esi,edx
+ ror r14d,11
+ xor r12d,ebx
+ vpaddd xmm3,xmm3,xmm7
+ xor esi,r8d
+ ror r13d,6
+ add ecx,r12d
+ and r15d,esi
+DB 143,232,120,194,251,13
+ xor r14d,edx
+ add ecx,r13d
+ vpsrld xmm6,xmm3,10
+ xor r15d,r8d
+ add r10d,ecx
+DB 143,232,120,194,239,2
+ ror r14d,2
+ add ecx,r15d
+ vpxor xmm7,xmm7,xmm6
+ mov r13d,r10d
+ add r14d,ecx
+ ror r13d,14
+ mov ecx,r14d
+ vpxor xmm7,xmm7,xmm5
+ mov r12d,r11d
+ xor r13d,r10d
+ ror r14d,9
+ xor r12d,eax
+ vpslldq xmm7,xmm7,8
+ ror r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ vpand xmm11,xmm11,xmm13
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+ xor r13d,r10d
+ vpaddd xmm3,xmm3,xmm7
+ add ebx,DWORD[56+rsp]
+ mov r15d,ecx
+ ror r14d,11
+ xor r12d,eax
+ vpaddd xmm6,xmm3,XMMWORD[96+rbp]
+ xor r15d,edx
+ ror r13d,6
+ add ebx,r12d
+ and esi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor esi,edx
+ add r9d,ebx
+ ror r14d,2
+ add ebx,esi
+ mov r13d,r9d
+ add r14d,ebx
+ ror r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ xor r13d,r9d
+ ror r14d,9
+ xor r12d,r11d
+ ror r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vpor xmm8,xmm8,xmm11
+ vaesenclast xmm11,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ xor r13d,r9d
+ add eax,DWORD[60+rsp]
+ mov esi,ebx
+ ror r14d,11
+ xor r12d,r11d
+ xor esi,ecx
+ ror r13d,6
+ add eax,r12d
+ and r15d,esi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ add r8d,eax
+ ror r14d,2
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ vmovdqa XMMWORD[48+rsp],xmm6
+ mov r12,QWORD[((64+0))+rsp]
+ vpand xmm11,xmm11,xmm14
+ mov r15,QWORD[((64+8))+rsp]
+ vpor xmm8,xmm8,xmm11
+ vmovdqu XMMWORD[r12*1+r15],xmm8
+ lea r12,[16+r12]
+ cmp BYTE[131+rbp],0
+ jne NEAR $L$xop_00_47
+ vmovdqu xmm9,XMMWORD[r12]
+ mov QWORD[((64+0))+rsp],r12
+ ror r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ xor r13d,r8d
+ ror r14d,9
+ xor r12d,r10d
+ ror r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ vpxor xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+ xor r13d,r8d
+ add r11d,DWORD[rsp]
+ mov r15d,eax
+ ror r14d,11
+ xor r12d,r10d
+ xor r15d,ebx
+ ror r13d,6
+ add r11d,r12d
+ and esi,r15d
+ xor r14d,eax
+ add r11d,r13d
+ xor esi,ebx
+ add edx,r11d
+ ror r14d,2
+ add r11d,esi
+ mov r13d,edx
+ add r14d,r11d
+ ror r13d,14
+ mov r11d,r14d
+ mov r12d,r8d
+ xor r13d,edx
+ ror r14d,9
+ xor r12d,r9d
+ ror r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ vpxor xmm9,xmm9,xmm8
+ xor r13d,edx
+ add r10d,DWORD[4+rsp]
+ mov esi,r11d
+ ror r14d,11
+ xor r12d,r9d
+ xor esi,eax
+ ror r13d,6
+ add r10d,r12d
+ and r15d,esi
+ xor r14d,r11d
+ add r10d,r13d
+ xor r15d,eax
+ add ecx,r10d
+ ror r14d,2
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ ror r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ xor r13d,ecx
+ ror r14d,9
+ xor r12d,r8d
+ ror r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+ xor r13d,ecx
+ add r9d,DWORD[8+rsp]
+ mov r15d,r10d
+ ror r14d,11
+ xor r12d,r8d
+ xor r15d,r11d
+ ror r13d,6
+ add r9d,r12d
+ and esi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor esi,r11d
+ add ebx,r9d
+ ror r14d,2
+ add r9d,esi
+ mov r13d,ebx
+ add r14d,r9d
+ ror r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ xor r13d,ebx
+ ror r14d,9
+ xor r12d,edx
+ ror r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+ xor r13d,ebx
+ add r8d,DWORD[12+rsp]
+ mov esi,r9d
+ ror r14d,11
+ xor r12d,edx
+ xor esi,r10d
+ ror r13d,6
+ add r8d,r12d
+ and r15d,esi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ add eax,r8d
+ ror r14d,2
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ ror r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ xor r13d,eax
+ ror r14d,9
+ xor r12d,ecx
+ ror r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+ xor r13d,eax
+ add edx,DWORD[16+rsp]
+ mov r15d,r8d
+ ror r14d,11
+ xor r12d,ecx
+ xor r15d,r9d
+ ror r13d,6
+ add edx,r12d
+ and esi,r15d
+ xor r14d,r8d
+ add edx,r13d
+ xor esi,r9d
+ add r11d,edx
+ ror r14d,2
+ add edx,esi
+ mov r13d,r11d
+ add r14d,edx
+ ror r13d,14
+ mov edx,r14d
+ mov r12d,eax
+ xor r13d,r11d
+ ror r14d,9
+ xor r12d,ebx
+ ror r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+ xor r13d,r11d
+ add ecx,DWORD[20+rsp]
+ mov esi,edx
+ ror r14d,11
+ xor r12d,ebx
+ xor esi,r8d
+ ror r13d,6
+ add ecx,r12d
+ and r15d,esi
+ xor r14d,edx
+ add ecx,r13d
+ xor r15d,r8d
+ add r10d,ecx
+ ror r14d,2
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ ror r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ xor r13d,r10d
+ ror r14d,9
+ xor r12d,eax
+ ror r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+ xor r13d,r10d
+ add ebx,DWORD[24+rsp]
+ mov r15d,ecx
+ ror r14d,11
+ xor r12d,eax
+ xor r15d,edx
+ ror r13d,6
+ add ebx,r12d
+ and esi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor esi,edx
+ add r9d,ebx
+ ror r14d,2
+ add ebx,esi
+ mov r13d,r9d
+ add r14d,ebx
+ ror r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ xor r13d,r9d
+ ror r14d,9
+ xor r12d,r11d
+ ror r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+ xor r13d,r9d
+ add eax,DWORD[28+rsp]
+ mov esi,ebx
+ ror r14d,11
+ xor r12d,r11d
+ xor esi,ecx
+ ror r13d,6
+ add eax,r12d
+ and r15d,esi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ add r8d,eax
+ ror r14d,2
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ ror r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ xor r13d,r8d
+ ror r14d,9
+ xor r12d,r10d
+ ror r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+ xor r13d,r8d
+ add r11d,DWORD[32+rsp]
+ mov r15d,eax
+ ror r14d,11
+ xor r12d,r10d
+ xor r15d,ebx
+ ror r13d,6
+ add r11d,r12d
+ and esi,r15d
+ xor r14d,eax
+ add r11d,r13d
+ xor esi,ebx
+ add edx,r11d
+ ror r14d,2
+ add r11d,esi
+ mov r13d,edx
+ add r14d,r11d
+ ror r13d,14
+ mov r11d,r14d
+ mov r12d,r8d
+ xor r13d,edx
+ ror r14d,9
+ xor r12d,r9d
+ ror r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+ xor r13d,edx
+ add r10d,DWORD[36+rsp]
+ mov esi,r11d
+ ror r14d,11
+ xor r12d,r9d
+ xor esi,eax
+ ror r13d,6
+ add r10d,r12d
+ and r15d,esi
+ xor r14d,r11d
+ add r10d,r13d
+ xor r15d,eax
+ add ecx,r10d
+ ror r14d,2
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ ror r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ xor r13d,ecx
+ ror r14d,9
+ xor r12d,r8d
+ ror r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+ xor r13d,ecx
+ add r9d,DWORD[40+rsp]
+ mov r15d,r10d
+ ror r14d,11
+ xor r12d,r8d
+ xor r15d,r11d
+ ror r13d,6
+ add r9d,r12d
+ and esi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor esi,r11d
+ add ebx,r9d
+ ror r14d,2
+ add r9d,esi
+ mov r13d,ebx
+ add r14d,r9d
+ ror r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ xor r13d,ebx
+ ror r14d,9
+ xor r12d,edx
+ ror r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+ xor r13d,ebx
+ add r8d,DWORD[44+rsp]
+ mov esi,r9d
+ ror r14d,11
+ xor r12d,edx
+ xor esi,r10d
+ ror r13d,6
+ add r8d,r12d
+ and r15d,esi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ add eax,r8d
+ ror r14d,2
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ ror r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ xor r13d,eax
+ ror r14d,9
+ xor r12d,ecx
+ ror r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ vpand xmm8,xmm11,xmm12
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+ xor r13d,eax
+ add edx,DWORD[48+rsp]
+ mov r15d,r8d
+ ror r14d,11
+ xor r12d,ecx
+ xor r15d,r9d
+ ror r13d,6
+ add edx,r12d
+ and esi,r15d
+ xor r14d,r8d
+ add edx,r13d
+ xor esi,r9d
+ add r11d,edx
+ ror r14d,2
+ add edx,esi
+ mov r13d,r11d
+ add r14d,edx
+ ror r13d,14
+ mov edx,r14d
+ mov r12d,eax
+ xor r13d,r11d
+ ror r14d,9
+ xor r12d,ebx
+ ror r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+ xor r13d,r11d
+ add ecx,DWORD[52+rsp]
+ mov esi,edx
+ ror r14d,11
+ xor r12d,ebx
+ xor esi,r8d
+ ror r13d,6
+ add ecx,r12d
+ and r15d,esi
+ xor r14d,edx
+ add ecx,r13d
+ xor r15d,r8d
+ add r10d,ecx
+ ror r14d,2
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ ror r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ xor r13d,r10d
+ ror r14d,9
+ xor r12d,eax
+ ror r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ vpand xmm11,xmm11,xmm13
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+ xor r13d,r10d
+ add ebx,DWORD[56+rsp]
+ mov r15d,ecx
+ ror r14d,11
+ xor r12d,eax
+ xor r15d,edx
+ ror r13d,6
+ add ebx,r12d
+ and esi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor esi,edx
+ add r9d,ebx
+ ror r14d,2
+ add ebx,esi
+ mov r13d,r9d
+ add r14d,ebx
+ ror r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ xor r13d,r9d
+ ror r14d,9
+ xor r12d,r11d
+ ror r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vpor xmm8,xmm8,xmm11
+ vaesenclast xmm11,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ xor r13d,r9d
+ add eax,DWORD[60+rsp]
+ mov esi,ebx
+ ror r14d,11
+ xor r12d,r11d
+ xor esi,ecx
+ ror r13d,6
+ add eax,r12d
+ and r15d,esi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ add r8d,eax
+ ror r14d,2
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ mov r12,QWORD[((64+0))+rsp]
+ mov r13,QWORD[((64+8))+rsp]
+ mov r15,QWORD[((64+40))+rsp]
+ mov rsi,QWORD[((64+48))+rsp]
+
+ vpand xmm11,xmm11,xmm14
+ mov eax,r14d
+ vpor xmm8,xmm8,xmm11
+ vmovdqu XMMWORD[r13*1+r12],xmm8
+ lea r12,[16+r12]
+
+ add eax,DWORD[r15]
+ add ebx,DWORD[4+r15]
+ add ecx,DWORD[8+r15]
+ add edx,DWORD[12+r15]
+ add r8d,DWORD[16+r15]
+ add r9d,DWORD[20+r15]
+ add r10d,DWORD[24+r15]
+ add r11d,DWORD[28+r15]
+
+ cmp r12,QWORD[((64+16))+rsp]
+
+ mov DWORD[r15],eax
+ mov DWORD[4+r15],ebx
+ mov DWORD[8+r15],ecx
+ mov DWORD[12+r15],edx
+ mov DWORD[16+r15],r8d
+ mov DWORD[20+r15],r9d
+ mov DWORD[24+r15],r10d
+ mov DWORD[28+r15],r11d
+
+ jb NEAR $L$loop_xop
+
+ mov r8,QWORD[((64+32))+rsp]
+ mov rsi,QWORD[120+rsp]
+
+ vmovdqu XMMWORD[r8],xmm8
+ vzeroall
+ movaps xmm6,XMMWORD[128+rsp]
+ movaps xmm7,XMMWORD[144+rsp]
+ movaps xmm8,XMMWORD[160+rsp]
+ movaps xmm9,XMMWORD[176+rsp]
+ movaps xmm10,XMMWORD[192+rsp]
+ movaps xmm11,XMMWORD[208+rsp]
+ movaps xmm12,XMMWORD[224+rsp]
+ movaps xmm13,XMMWORD[240+rsp]
+ movaps xmm14,XMMWORD[256+rsp]
+ movaps xmm15,XMMWORD[272+rsp]
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue_xop:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_cbc_sha256_enc_xop:
+
+ALIGN 64
+aesni_cbc_sha256_enc_avx:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_cbc_sha256_enc_avx:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+$L$avx_shortcut:
+ mov r10,QWORD[56+rsp]
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ sub rsp,288
+ and rsp,-64
+
+ shl rdx,6
+ sub rsi,rdi
+ sub r10,rdi
+ add rdx,rdi
+
+
+ mov QWORD[((64+8))+rsp],rsi
+ mov QWORD[((64+16))+rsp],rdx
+
+ mov QWORD[((64+32))+rsp],r8
+ mov QWORD[((64+40))+rsp],r9
+ mov QWORD[((64+48))+rsp],r10
+ mov QWORD[120+rsp],rax
+
+ movaps XMMWORD[128+rsp],xmm6
+ movaps XMMWORD[144+rsp],xmm7
+ movaps XMMWORD[160+rsp],xmm8
+ movaps XMMWORD[176+rsp],xmm9
+ movaps XMMWORD[192+rsp],xmm10
+ movaps XMMWORD[208+rsp],xmm11
+ movaps XMMWORD[224+rsp],xmm12
+ movaps XMMWORD[240+rsp],xmm13
+ movaps XMMWORD[256+rsp],xmm14
+ movaps XMMWORD[272+rsp],xmm15
+$L$prologue_avx:
+ vzeroall
+
+ mov r12,rdi
+ lea rdi,[128+rcx]
+ lea r13,[((K256+544))]
+ mov r14d,DWORD[((240-128))+rdi]
+ mov r15,r9
+ mov rsi,r10
+ vmovdqu xmm8,XMMWORD[r8]
+ sub r14,9
+
+ mov eax,DWORD[r15]
+ mov ebx,DWORD[4+r15]
+ mov ecx,DWORD[8+r15]
+ mov edx,DWORD[12+r15]
+ mov r8d,DWORD[16+r15]
+ mov r9d,DWORD[20+r15]
+ mov r10d,DWORD[24+r15]
+ mov r11d,DWORD[28+r15]
+
+ vmovdqa xmm14,XMMWORD[r14*8+r13]
+ vmovdqa xmm13,XMMWORD[16+r14*8+r13]
+ vmovdqa xmm12,XMMWORD[32+r14*8+r13]
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ jmp NEAR $L$loop_avx
+ALIGN 16
+$L$loop_avx:
+ vmovdqa xmm7,XMMWORD[((K256+512))]
+ vmovdqu xmm0,XMMWORD[r12*1+rsi]
+ vmovdqu xmm1,XMMWORD[16+r12*1+rsi]
+ vmovdqu xmm2,XMMWORD[32+r12*1+rsi]
+ vmovdqu xmm3,XMMWORD[48+r12*1+rsi]
+ vpshufb xmm0,xmm0,xmm7
+ lea rbp,[K256]
+ vpshufb xmm1,xmm1,xmm7
+ vpshufb xmm2,xmm2,xmm7
+ vpaddd xmm4,xmm0,XMMWORD[rbp]
+ vpshufb xmm3,xmm3,xmm7
+ vpaddd xmm5,xmm1,XMMWORD[32+rbp]
+ vpaddd xmm6,xmm2,XMMWORD[64+rbp]
+ vpaddd xmm7,xmm3,XMMWORD[96+rbp]
+ vmovdqa XMMWORD[rsp],xmm4
+ mov r14d,eax
+ vmovdqa XMMWORD[16+rsp],xmm5
+ mov esi,ebx
+ vmovdqa XMMWORD[32+rsp],xmm6
+ xor esi,ecx
+ vmovdqa XMMWORD[48+rsp],xmm7
+ mov r13d,r8d
+ jmp NEAR $L$avx_00_47
+
+ALIGN 16
+$L$avx_00_47:
+ sub rbp,-16*2*4
+ vmovdqu xmm9,XMMWORD[r12]
+ mov QWORD[((64+0))+rsp],r12
+ vpalignr xmm4,xmm1,xmm0,4
+ shrd r13d,r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ vpalignr xmm7,xmm3,xmm2,4
+ xor r13d,r8d
+ shrd r14d,r14d,9
+ xor r12d,r10d
+ vpsrld xmm6,xmm4,7
+ shrd r13d,r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ vpaddd xmm0,xmm0,xmm7
+ vpxor xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+ xor r13d,r8d
+ add r11d,DWORD[rsp]
+ mov r15d,eax
+ vpsrld xmm7,xmm4,3
+ shrd r14d,r14d,11
+ xor r12d,r10d
+ xor r15d,ebx
+ vpslld xmm5,xmm4,14
+ shrd r13d,r13d,6
+ add r11d,r12d
+ and esi,r15d
+ vpxor xmm4,xmm7,xmm6
+ xor r14d,eax
+ add r11d,r13d
+ xor esi,ebx
+ vpshufd xmm7,xmm3,250
+ add edx,r11d
+ shrd r14d,r14d,2
+ add r11d,esi
+ vpsrld xmm6,xmm6,11
+ mov r13d,edx
+ add r14d,r11d
+ shrd r13d,r13d,14
+ vpxor xmm4,xmm4,xmm5
+ mov r11d,r14d
+ mov r12d,r8d
+ xor r13d,edx
+ vpslld xmm5,xmm5,11
+ shrd r14d,r14d,9
+ xor r12d,r9d
+ shrd r13d,r13d,5
+ vpxor xmm4,xmm4,xmm6
+ xor r14d,r11d
+ and r12d,edx
+ vpxor xmm9,xmm9,xmm8
+ xor r13d,edx
+ vpsrld xmm6,xmm7,10
+ add r10d,DWORD[4+rsp]
+ mov esi,r11d
+ shrd r14d,r14d,11
+ vpxor xmm4,xmm4,xmm5
+ xor r12d,r9d
+ xor esi,eax
+ shrd r13d,r13d,6
+ vpsrlq xmm7,xmm7,17
+ add r10d,r12d
+ and r15d,esi
+ xor r14d,r11d
+ vpaddd xmm0,xmm0,xmm4
+ add r10d,r13d
+ xor r15d,eax
+ add ecx,r10d
+ vpxor xmm6,xmm6,xmm7
+ shrd r14d,r14d,2
+ add r10d,r15d
+ mov r13d,ecx
+ vpsrlq xmm7,xmm7,2
+ add r14d,r10d
+ shrd r13d,r13d,14
+ mov r10d,r14d
+ vpxor xmm6,xmm6,xmm7
+ mov r12d,edx
+ xor r13d,ecx
+ shrd r14d,r14d,9
+ vpshufd xmm6,xmm6,132
+ xor r12d,r8d
+ shrd r13d,r13d,5
+ xor r14d,r10d
+ vpsrldq xmm6,xmm6,8
+ and r12d,ecx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+ xor r13d,ecx
+ add r9d,DWORD[8+rsp]
+ vpaddd xmm0,xmm0,xmm6
+ mov r15d,r10d
+ shrd r14d,r14d,11
+ xor r12d,r8d
+ vpshufd xmm7,xmm0,80
+ xor r15d,r11d
+ shrd r13d,r13d,6
+ add r9d,r12d
+ vpsrld xmm6,xmm7,10
+ and esi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ vpsrlq xmm7,xmm7,17
+ xor esi,r11d
+ add ebx,r9d
+ shrd r14d,r14d,2
+ vpxor xmm6,xmm6,xmm7
+ add r9d,esi
+ mov r13d,ebx
+ add r14d,r9d
+ vpsrlq xmm7,xmm7,2
+ shrd r13d,r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ vpxor xmm6,xmm6,xmm7
+ xor r13d,ebx
+ shrd r14d,r14d,9
+ xor r12d,edx
+ vpshufd xmm6,xmm6,232
+ shrd r13d,r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vpslldq xmm6,xmm6,8
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+ xor r13d,ebx
+ add r8d,DWORD[12+rsp]
+ mov esi,r9d
+ vpaddd xmm0,xmm0,xmm6
+ shrd r14d,r14d,11
+ xor r12d,edx
+ xor esi,r10d
+ vpaddd xmm6,xmm0,XMMWORD[rbp]
+ shrd r13d,r13d,6
+ add r8d,r12d
+ and r15d,esi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ add eax,r8d
+ shrd r14d,r14d,2
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ vmovdqa XMMWORD[rsp],xmm6
+ vpalignr xmm4,xmm2,xmm1,4
+ shrd r13d,r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ vpalignr xmm7,xmm0,xmm3,4
+ xor r13d,eax
+ shrd r14d,r14d,9
+ xor r12d,ecx
+ vpsrld xmm6,xmm4,7
+ shrd r13d,r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ vpaddd xmm1,xmm1,xmm7
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+ xor r13d,eax
+ add edx,DWORD[16+rsp]
+ mov r15d,r8d
+ vpsrld xmm7,xmm4,3
+ shrd r14d,r14d,11
+ xor r12d,ecx
+ xor r15d,r9d
+ vpslld xmm5,xmm4,14
+ shrd r13d,r13d,6
+ add edx,r12d
+ and esi,r15d
+ vpxor xmm4,xmm7,xmm6
+ xor r14d,r8d
+ add edx,r13d
+ xor esi,r9d
+ vpshufd xmm7,xmm0,250
+ add r11d,edx
+ shrd r14d,r14d,2
+ add edx,esi
+ vpsrld xmm6,xmm6,11
+ mov r13d,r11d
+ add r14d,edx
+ shrd r13d,r13d,14
+ vpxor xmm4,xmm4,xmm5
+ mov edx,r14d
+ mov r12d,eax
+ xor r13d,r11d
+ vpslld xmm5,xmm5,11
+ shrd r14d,r14d,9
+ xor r12d,ebx
+ shrd r13d,r13d,5
+ vpxor xmm4,xmm4,xmm6
+ xor r14d,edx
+ and r12d,r11d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+ xor r13d,r11d
+ vpsrld xmm6,xmm7,10
+ add ecx,DWORD[20+rsp]
+ mov esi,edx
+ shrd r14d,r14d,11
+ vpxor xmm4,xmm4,xmm5
+ xor r12d,ebx
+ xor esi,r8d
+ shrd r13d,r13d,6
+ vpsrlq xmm7,xmm7,17
+ add ecx,r12d
+ and r15d,esi
+ xor r14d,edx
+ vpaddd xmm1,xmm1,xmm4
+ add ecx,r13d
+ xor r15d,r8d
+ add r10d,ecx
+ vpxor xmm6,xmm6,xmm7
+ shrd r14d,r14d,2
+ add ecx,r15d
+ mov r13d,r10d
+ vpsrlq xmm7,xmm7,2
+ add r14d,ecx
+ shrd r13d,r13d,14
+ mov ecx,r14d
+ vpxor xmm6,xmm6,xmm7
+ mov r12d,r11d
+ xor r13d,r10d
+ shrd r14d,r14d,9
+ vpshufd xmm6,xmm6,132
+ xor r12d,eax
+ shrd r13d,r13d,5
+ xor r14d,ecx
+ vpsrldq xmm6,xmm6,8
+ and r12d,r10d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+ xor r13d,r10d
+ add ebx,DWORD[24+rsp]
+ vpaddd xmm1,xmm1,xmm6
+ mov r15d,ecx
+ shrd r14d,r14d,11
+ xor r12d,eax
+ vpshufd xmm7,xmm1,80
+ xor r15d,edx
+ shrd r13d,r13d,6
+ add ebx,r12d
+ vpsrld xmm6,xmm7,10
+ and esi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ vpsrlq xmm7,xmm7,17
+ xor esi,edx
+ add r9d,ebx
+ shrd r14d,r14d,2
+ vpxor xmm6,xmm6,xmm7
+ add ebx,esi
+ mov r13d,r9d
+ add r14d,ebx
+ vpsrlq xmm7,xmm7,2
+ shrd r13d,r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ vpxor xmm6,xmm6,xmm7
+ xor r13d,r9d
+ shrd r14d,r14d,9
+ xor r12d,r11d
+ vpshufd xmm6,xmm6,232
+ shrd r13d,r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vpslldq xmm6,xmm6,8
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+ xor r13d,r9d
+ add eax,DWORD[28+rsp]
+ mov esi,ebx
+ vpaddd xmm1,xmm1,xmm6
+ shrd r14d,r14d,11
+ xor r12d,r11d
+ xor esi,ecx
+ vpaddd xmm6,xmm1,XMMWORD[32+rbp]
+ shrd r13d,r13d,6
+ add eax,r12d
+ and r15d,esi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ add r8d,eax
+ shrd r14d,r14d,2
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ vmovdqa XMMWORD[16+rsp],xmm6
+ vpalignr xmm4,xmm3,xmm2,4
+ shrd r13d,r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ vpalignr xmm7,xmm1,xmm0,4
+ xor r13d,r8d
+ shrd r14d,r14d,9
+ xor r12d,r10d
+ vpsrld xmm6,xmm4,7
+ shrd r13d,r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ vpaddd xmm2,xmm2,xmm7
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+ xor r13d,r8d
+ add r11d,DWORD[32+rsp]
+ mov r15d,eax
+ vpsrld xmm7,xmm4,3
+ shrd r14d,r14d,11
+ xor r12d,r10d
+ xor r15d,ebx
+ vpslld xmm5,xmm4,14
+ shrd r13d,r13d,6
+ add r11d,r12d
+ and esi,r15d
+ vpxor xmm4,xmm7,xmm6
+ xor r14d,eax
+ add r11d,r13d
+ xor esi,ebx
+ vpshufd xmm7,xmm1,250
+ add edx,r11d
+ shrd r14d,r14d,2
+ add r11d,esi
+ vpsrld xmm6,xmm6,11
+ mov r13d,edx
+ add r14d,r11d
+ shrd r13d,r13d,14
+ vpxor xmm4,xmm4,xmm5
+ mov r11d,r14d
+ mov r12d,r8d
+ xor r13d,edx
+ vpslld xmm5,xmm5,11
+ shrd r14d,r14d,9
+ xor r12d,r9d
+ shrd r13d,r13d,5
+ vpxor xmm4,xmm4,xmm6
+ xor r14d,r11d
+ and r12d,edx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+ xor r13d,edx
+ vpsrld xmm6,xmm7,10
+ add r10d,DWORD[36+rsp]
+ mov esi,r11d
+ shrd r14d,r14d,11
+ vpxor xmm4,xmm4,xmm5
+ xor r12d,r9d
+ xor esi,eax
+ shrd r13d,r13d,6
+ vpsrlq xmm7,xmm7,17
+ add r10d,r12d
+ and r15d,esi
+ xor r14d,r11d
+ vpaddd xmm2,xmm2,xmm4
+ add r10d,r13d
+ xor r15d,eax
+ add ecx,r10d
+ vpxor xmm6,xmm6,xmm7
+ shrd r14d,r14d,2
+ add r10d,r15d
+ mov r13d,ecx
+ vpsrlq xmm7,xmm7,2
+ add r14d,r10d
+ shrd r13d,r13d,14
+ mov r10d,r14d
+ vpxor xmm6,xmm6,xmm7
+ mov r12d,edx
+ xor r13d,ecx
+ shrd r14d,r14d,9
+ vpshufd xmm6,xmm6,132
+ xor r12d,r8d
+ shrd r13d,r13d,5
+ xor r14d,r10d
+ vpsrldq xmm6,xmm6,8
+ and r12d,ecx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+ xor r13d,ecx
+ add r9d,DWORD[40+rsp]
+ vpaddd xmm2,xmm2,xmm6
+ mov r15d,r10d
+ shrd r14d,r14d,11
+ xor r12d,r8d
+ vpshufd xmm7,xmm2,80
+ xor r15d,r11d
+ shrd r13d,r13d,6
+ add r9d,r12d
+ vpsrld xmm6,xmm7,10
+ and esi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ vpsrlq xmm7,xmm7,17
+ xor esi,r11d
+ add ebx,r9d
+ shrd r14d,r14d,2
+ vpxor xmm6,xmm6,xmm7
+ add r9d,esi
+ mov r13d,ebx
+ add r14d,r9d
+ vpsrlq xmm7,xmm7,2
+ shrd r13d,r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ vpxor xmm6,xmm6,xmm7
+ xor r13d,ebx
+ shrd r14d,r14d,9
+ xor r12d,edx
+ vpshufd xmm6,xmm6,232
+ shrd r13d,r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vpslldq xmm6,xmm6,8
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+ xor r13d,ebx
+ add r8d,DWORD[44+rsp]
+ mov esi,r9d
+ vpaddd xmm2,xmm2,xmm6
+ shrd r14d,r14d,11
+ xor r12d,edx
+ xor esi,r10d
+ vpaddd xmm6,xmm2,XMMWORD[64+rbp]
+ shrd r13d,r13d,6
+ add r8d,r12d
+ and r15d,esi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ add eax,r8d
+ shrd r14d,r14d,2
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ vmovdqa XMMWORD[32+rsp],xmm6
+ vpalignr xmm4,xmm0,xmm3,4
+ shrd r13d,r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ vpalignr xmm7,xmm2,xmm1,4
+ xor r13d,eax
+ shrd r14d,r14d,9
+ xor r12d,ecx
+ vpsrld xmm6,xmm4,7
+ shrd r13d,r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ vpaddd xmm3,xmm3,xmm7
+ vpand xmm8,xmm11,xmm12
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+ xor r13d,eax
+ add edx,DWORD[48+rsp]
+ mov r15d,r8d
+ vpsrld xmm7,xmm4,3
+ shrd r14d,r14d,11
+ xor r12d,ecx
+ xor r15d,r9d
+ vpslld xmm5,xmm4,14
+ shrd r13d,r13d,6
+ add edx,r12d
+ and esi,r15d
+ vpxor xmm4,xmm7,xmm6
+ xor r14d,r8d
+ add edx,r13d
+ xor esi,r9d
+ vpshufd xmm7,xmm2,250
+ add r11d,edx
+ shrd r14d,r14d,2
+ add edx,esi
+ vpsrld xmm6,xmm6,11
+ mov r13d,r11d
+ add r14d,edx
+ shrd r13d,r13d,14
+ vpxor xmm4,xmm4,xmm5
+ mov edx,r14d
+ mov r12d,eax
+ xor r13d,r11d
+ vpslld xmm5,xmm5,11
+ shrd r14d,r14d,9
+ xor r12d,ebx
+ shrd r13d,r13d,5
+ vpxor xmm4,xmm4,xmm6
+ xor r14d,edx
+ and r12d,r11d
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+ xor r13d,r11d
+ vpsrld xmm6,xmm7,10
+ add ecx,DWORD[52+rsp]
+ mov esi,edx
+ shrd r14d,r14d,11
+ vpxor xmm4,xmm4,xmm5
+ xor r12d,ebx
+ xor esi,r8d
+ shrd r13d,r13d,6
+ vpsrlq xmm7,xmm7,17
+ add ecx,r12d
+ and r15d,esi
+ xor r14d,edx
+ vpaddd xmm3,xmm3,xmm4
+ add ecx,r13d
+ xor r15d,r8d
+ add r10d,ecx
+ vpxor xmm6,xmm6,xmm7
+ shrd r14d,r14d,2
+ add ecx,r15d
+ mov r13d,r10d
+ vpsrlq xmm7,xmm7,2
+ add r14d,ecx
+ shrd r13d,r13d,14
+ mov ecx,r14d
+ vpxor xmm6,xmm6,xmm7
+ mov r12d,r11d
+ xor r13d,r10d
+ shrd r14d,r14d,9
+ vpshufd xmm6,xmm6,132
+ xor r12d,eax
+ shrd r13d,r13d,5
+ xor r14d,ecx
+ vpsrldq xmm6,xmm6,8
+ and r12d,r10d
+ vpand xmm11,xmm11,xmm13
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+ xor r13d,r10d
+ add ebx,DWORD[56+rsp]
+ vpaddd xmm3,xmm3,xmm6
+ mov r15d,ecx
+ shrd r14d,r14d,11
+ xor r12d,eax
+ vpshufd xmm7,xmm3,80
+ xor r15d,edx
+ shrd r13d,r13d,6
+ add ebx,r12d
+ vpsrld xmm6,xmm7,10
+ and esi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ vpsrlq xmm7,xmm7,17
+ xor esi,edx
+ add r9d,ebx
+ shrd r14d,r14d,2
+ vpxor xmm6,xmm6,xmm7
+ add ebx,esi
+ mov r13d,r9d
+ add r14d,ebx
+ vpsrlq xmm7,xmm7,2
+ shrd r13d,r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ vpxor xmm6,xmm6,xmm7
+ xor r13d,r9d
+ shrd r14d,r14d,9
+ xor r12d,r11d
+ vpshufd xmm6,xmm6,232
+ shrd r13d,r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vpslldq xmm6,xmm6,8
+ vpor xmm8,xmm8,xmm11
+ vaesenclast xmm11,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ xor r13d,r9d
+ add eax,DWORD[60+rsp]
+ mov esi,ebx
+ vpaddd xmm3,xmm3,xmm6
+ shrd r14d,r14d,11
+ xor r12d,r11d
+ xor esi,ecx
+ vpaddd xmm6,xmm3,XMMWORD[96+rbp]
+ shrd r13d,r13d,6
+ add eax,r12d
+ and r15d,esi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ add r8d,eax
+ shrd r14d,r14d,2
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ vmovdqa XMMWORD[48+rsp],xmm6
+ mov r12,QWORD[((64+0))+rsp]
+ vpand xmm11,xmm11,xmm14
+ mov r15,QWORD[((64+8))+rsp]
+ vpor xmm8,xmm8,xmm11
+ vmovdqu XMMWORD[r12*1+r15],xmm8
+ lea r12,[16+r12]
+ cmp BYTE[131+rbp],0
+ jne NEAR $L$avx_00_47
+ vmovdqu xmm9,XMMWORD[r12]
+ mov QWORD[((64+0))+rsp],r12
+ shrd r13d,r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ xor r13d,r8d
+ shrd r14d,r14d,9
+ xor r12d,r10d
+ shrd r13d,r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ vpxor xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+ xor r13d,r8d
+ add r11d,DWORD[rsp]
+ mov r15d,eax
+ shrd r14d,r14d,11
+ xor r12d,r10d
+ xor r15d,ebx
+ shrd r13d,r13d,6
+ add r11d,r12d
+ and esi,r15d
+ xor r14d,eax
+ add r11d,r13d
+ xor esi,ebx
+ add edx,r11d
+ shrd r14d,r14d,2
+ add r11d,esi
+ mov r13d,edx
+ add r14d,r11d
+ shrd r13d,r13d,14
+ mov r11d,r14d
+ mov r12d,r8d
+ xor r13d,edx
+ shrd r14d,r14d,9
+ xor r12d,r9d
+ shrd r13d,r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ vpxor xmm9,xmm9,xmm8
+ xor r13d,edx
+ add r10d,DWORD[4+rsp]
+ mov esi,r11d
+ shrd r14d,r14d,11
+ xor r12d,r9d
+ xor esi,eax
+ shrd r13d,r13d,6
+ add r10d,r12d
+ and r15d,esi
+ xor r14d,r11d
+ add r10d,r13d
+ xor r15d,eax
+ add ecx,r10d
+ shrd r14d,r14d,2
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ shrd r13d,r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ xor r13d,ecx
+ shrd r14d,r14d,9
+ xor r12d,r8d
+ shrd r13d,r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+ xor r13d,ecx
+ add r9d,DWORD[8+rsp]
+ mov r15d,r10d
+ shrd r14d,r14d,11
+ xor r12d,r8d
+ xor r15d,r11d
+ shrd r13d,r13d,6
+ add r9d,r12d
+ and esi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor esi,r11d
+ add ebx,r9d
+ shrd r14d,r14d,2
+ add r9d,esi
+ mov r13d,ebx
+ add r14d,r9d
+ shrd r13d,r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ xor r13d,ebx
+ shrd r14d,r14d,9
+ xor r12d,edx
+ shrd r13d,r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+ xor r13d,ebx
+ add r8d,DWORD[12+rsp]
+ mov esi,r9d
+ shrd r14d,r14d,11
+ xor r12d,edx
+ xor esi,r10d
+ shrd r13d,r13d,6
+ add r8d,r12d
+ and r15d,esi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ add eax,r8d
+ shrd r14d,r14d,2
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ shrd r13d,r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ xor r13d,eax
+ shrd r14d,r14d,9
+ xor r12d,ecx
+ shrd r13d,r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+ xor r13d,eax
+ add edx,DWORD[16+rsp]
+ mov r15d,r8d
+ shrd r14d,r14d,11
+ xor r12d,ecx
+ xor r15d,r9d
+ shrd r13d,r13d,6
+ add edx,r12d
+ and esi,r15d
+ xor r14d,r8d
+ add edx,r13d
+ xor esi,r9d
+ add r11d,edx
+ shrd r14d,r14d,2
+ add edx,esi
+ mov r13d,r11d
+ add r14d,edx
+ shrd r13d,r13d,14
+ mov edx,r14d
+ mov r12d,eax
+ xor r13d,r11d
+ shrd r14d,r14d,9
+ xor r12d,ebx
+ shrd r13d,r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+ xor r13d,r11d
+ add ecx,DWORD[20+rsp]
+ mov esi,edx
+ shrd r14d,r14d,11
+ xor r12d,ebx
+ xor esi,r8d
+ shrd r13d,r13d,6
+ add ecx,r12d
+ and r15d,esi
+ xor r14d,edx
+ add ecx,r13d
+ xor r15d,r8d
+ add r10d,ecx
+ shrd r14d,r14d,2
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ shrd r13d,r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ xor r13d,r10d
+ shrd r14d,r14d,9
+ xor r12d,eax
+ shrd r13d,r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+ xor r13d,r10d
+ add ebx,DWORD[24+rsp]
+ mov r15d,ecx
+ shrd r14d,r14d,11
+ xor r12d,eax
+ xor r15d,edx
+ shrd r13d,r13d,6
+ add ebx,r12d
+ and esi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor esi,edx
+ add r9d,ebx
+ shrd r14d,r14d,2
+ add ebx,esi
+ mov r13d,r9d
+ add r14d,ebx
+ shrd r13d,r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ xor r13d,r9d
+ shrd r14d,r14d,9
+ xor r12d,r11d
+ shrd r13d,r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+ xor r13d,r9d
+ add eax,DWORD[28+rsp]
+ mov esi,ebx
+ shrd r14d,r14d,11
+ xor r12d,r11d
+ xor esi,ecx
+ shrd r13d,r13d,6
+ add eax,r12d
+ and r15d,esi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ add r8d,eax
+ shrd r14d,r14d,2
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ shrd r13d,r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ xor r13d,r8d
+ shrd r14d,r14d,9
+ xor r12d,r10d
+ shrd r13d,r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+ xor r13d,r8d
+ add r11d,DWORD[32+rsp]
+ mov r15d,eax
+ shrd r14d,r14d,11
+ xor r12d,r10d
+ xor r15d,ebx
+ shrd r13d,r13d,6
+ add r11d,r12d
+ and esi,r15d
+ xor r14d,eax
+ add r11d,r13d
+ xor esi,ebx
+ add edx,r11d
+ shrd r14d,r14d,2
+ add r11d,esi
+ mov r13d,edx
+ add r14d,r11d
+ shrd r13d,r13d,14
+ mov r11d,r14d
+ mov r12d,r8d
+ xor r13d,edx
+ shrd r14d,r14d,9
+ xor r12d,r9d
+ shrd r13d,r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+ xor r13d,edx
+ add r10d,DWORD[36+rsp]
+ mov esi,r11d
+ shrd r14d,r14d,11
+ xor r12d,r9d
+ xor esi,eax
+ shrd r13d,r13d,6
+ add r10d,r12d
+ and r15d,esi
+ xor r14d,r11d
+ add r10d,r13d
+ xor r15d,eax
+ add ecx,r10d
+ shrd r14d,r14d,2
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ shrd r13d,r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ xor r13d,ecx
+ shrd r14d,r14d,9
+ xor r12d,r8d
+ shrd r13d,r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+ xor r13d,ecx
+ add r9d,DWORD[40+rsp]
+ mov r15d,r10d
+ shrd r14d,r14d,11
+ xor r12d,r8d
+ xor r15d,r11d
+ shrd r13d,r13d,6
+ add r9d,r12d
+ and esi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor esi,r11d
+ add ebx,r9d
+ shrd r14d,r14d,2
+ add r9d,esi
+ mov r13d,ebx
+ add r14d,r9d
+ shrd r13d,r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ xor r13d,ebx
+ shrd r14d,r14d,9
+ xor r12d,edx
+ shrd r13d,r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+ xor r13d,ebx
+ add r8d,DWORD[44+rsp]
+ mov esi,r9d
+ shrd r14d,r14d,11
+ xor r12d,edx
+ xor esi,r10d
+ shrd r13d,r13d,6
+ add r8d,r12d
+ and r15d,esi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ add eax,r8d
+ shrd r14d,r14d,2
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ shrd r13d,r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ xor r13d,eax
+ shrd r14d,r14d,9
+ xor r12d,ecx
+ shrd r13d,r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ vpand xmm8,xmm11,xmm12
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+ xor r13d,eax
+ add edx,DWORD[48+rsp]
+ mov r15d,r8d
+ shrd r14d,r14d,11
+ xor r12d,ecx
+ xor r15d,r9d
+ shrd r13d,r13d,6
+ add edx,r12d
+ and esi,r15d
+ xor r14d,r8d
+ add edx,r13d
+ xor esi,r9d
+ add r11d,edx
+ shrd r14d,r14d,2
+ add edx,esi
+ mov r13d,r11d
+ add r14d,edx
+ shrd r13d,r13d,14
+ mov edx,r14d
+ mov r12d,eax
+ xor r13d,r11d
+ shrd r14d,r14d,9
+ xor r12d,ebx
+ shrd r13d,r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+ xor r13d,r11d
+ add ecx,DWORD[52+rsp]
+ mov esi,edx
+ shrd r14d,r14d,11
+ xor r12d,ebx
+ xor esi,r8d
+ shrd r13d,r13d,6
+ add ecx,r12d
+ and r15d,esi
+ xor r14d,edx
+ add ecx,r13d
+ xor r15d,r8d
+ add r10d,ecx
+ shrd r14d,r14d,2
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ shrd r13d,r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ xor r13d,r10d
+ shrd r14d,r14d,9
+ xor r12d,eax
+ shrd r13d,r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ vpand xmm11,xmm11,xmm13
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+ xor r13d,r10d
+ add ebx,DWORD[56+rsp]
+ mov r15d,ecx
+ shrd r14d,r14d,11
+ xor r12d,eax
+ xor r15d,edx
+ shrd r13d,r13d,6
+ add ebx,r12d
+ and esi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor esi,edx
+ add r9d,ebx
+ shrd r14d,r14d,2
+ add ebx,esi
+ mov r13d,r9d
+ add r14d,ebx
+ shrd r13d,r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ xor r13d,r9d
+ shrd r14d,r14d,9
+ xor r12d,r11d
+ shrd r13d,r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vpor xmm8,xmm8,xmm11
+ vaesenclast xmm11,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ xor r13d,r9d
+ add eax,DWORD[60+rsp]
+ mov esi,ebx
+ shrd r14d,r14d,11
+ xor r12d,r11d
+ xor esi,ecx
+ shrd r13d,r13d,6
+ add eax,r12d
+ and r15d,esi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ add r8d,eax
+ shrd r14d,r14d,2
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ mov r12,QWORD[((64+0))+rsp]
+ mov r13,QWORD[((64+8))+rsp]
+ mov r15,QWORD[((64+40))+rsp]
+ mov rsi,QWORD[((64+48))+rsp]
+
+ vpand xmm11,xmm11,xmm14
+ mov eax,r14d
+ vpor xmm8,xmm8,xmm11
+ vmovdqu XMMWORD[r13*1+r12],xmm8
+ lea r12,[16+r12]
+
+ add eax,DWORD[r15]
+ add ebx,DWORD[4+r15]
+ add ecx,DWORD[8+r15]
+ add edx,DWORD[12+r15]
+ add r8d,DWORD[16+r15]
+ add r9d,DWORD[20+r15]
+ add r10d,DWORD[24+r15]
+ add r11d,DWORD[28+r15]
+
+ cmp r12,QWORD[((64+16))+rsp]
+
+ mov DWORD[r15],eax
+ mov DWORD[4+r15],ebx
+ mov DWORD[8+r15],ecx
+ mov DWORD[12+r15],edx
+ mov DWORD[16+r15],r8d
+ mov DWORD[20+r15],r9d
+ mov DWORD[24+r15],r10d
+ mov DWORD[28+r15],r11d
+ jb NEAR $L$loop_avx
+
+ mov r8,QWORD[((64+32))+rsp]
+ mov rsi,QWORD[120+rsp]
+
+ vmovdqu XMMWORD[r8],xmm8
+ vzeroall
+ movaps xmm6,XMMWORD[128+rsp]
+ movaps xmm7,XMMWORD[144+rsp]
+ movaps xmm8,XMMWORD[160+rsp]
+ movaps xmm9,XMMWORD[176+rsp]
+ movaps xmm10,XMMWORD[192+rsp]
+ movaps xmm11,XMMWORD[208+rsp]
+ movaps xmm12,XMMWORD[224+rsp]
+ movaps xmm13,XMMWORD[240+rsp]
+ movaps xmm14,XMMWORD[256+rsp]
+ movaps xmm15,XMMWORD[272+rsp]
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue_avx:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_cbc_sha256_enc_avx:
+
+ALIGN 64
+aesni_cbc_sha256_enc_avx2:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_cbc_sha256_enc_avx2:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+$L$avx2_shortcut:
+ mov r10,QWORD[56+rsp]
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ sub rsp,736
+ and rsp,-256*4
+ add rsp,448
+
+ shl rdx,6
+ sub rsi,rdi
+ sub r10,rdi
+ add rdx,rdi
+
+
+
+ mov QWORD[((64+16))+rsp],rdx
+
+ mov QWORD[((64+32))+rsp],r8
+ mov QWORD[((64+40))+rsp],r9
+ mov QWORD[((64+48))+rsp],r10
+ mov QWORD[120+rsp],rax
+
+ movaps XMMWORD[128+rsp],xmm6
+ movaps XMMWORD[144+rsp],xmm7
+ movaps XMMWORD[160+rsp],xmm8
+ movaps XMMWORD[176+rsp],xmm9
+ movaps XMMWORD[192+rsp],xmm10
+ movaps XMMWORD[208+rsp],xmm11
+ movaps XMMWORD[224+rsp],xmm12
+ movaps XMMWORD[240+rsp],xmm13
+ movaps XMMWORD[256+rsp],xmm14
+ movaps XMMWORD[272+rsp],xmm15
+$L$prologue_avx2:
+ vzeroall
+
+ mov r13,rdi
+ vpinsrq xmm15,xmm15,rsi,1
+ lea rdi,[128+rcx]
+ lea r12,[((K256+544))]
+ mov r14d,DWORD[((240-128))+rdi]
+ mov r15,r9
+ mov rsi,r10
+ vmovdqu xmm8,XMMWORD[r8]
+ lea r14,[((-9))+r14]
+
+ vmovdqa xmm14,XMMWORD[r14*8+r12]
+ vmovdqa xmm13,XMMWORD[16+r14*8+r12]
+ vmovdqa xmm12,XMMWORD[32+r14*8+r12]
+
+ sub r13,-16*4
+ mov eax,DWORD[r15]
+ lea r12,[r13*1+rsi]
+ mov ebx,DWORD[4+r15]
+ cmp r13,rdx
+ mov ecx,DWORD[8+r15]
+ cmove r12,rsp
+ mov edx,DWORD[12+r15]
+ mov r8d,DWORD[16+r15]
+ mov r9d,DWORD[20+r15]
+ mov r10d,DWORD[24+r15]
+ mov r11d,DWORD[28+r15]
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ jmp NEAR $L$oop_avx2
+ALIGN 16
+$L$oop_avx2:
+ vmovdqa ymm7,YMMWORD[((K256+512))]
+ vmovdqu xmm0,XMMWORD[((-64+0))+r13*1+rsi]
+ vmovdqu xmm1,XMMWORD[((-64+16))+r13*1+rsi]
+ vmovdqu xmm2,XMMWORD[((-64+32))+r13*1+rsi]
+ vmovdqu xmm3,XMMWORD[((-64+48))+r13*1+rsi]
+
+ vinserti128 ymm0,ymm0,XMMWORD[r12],1
+ vinserti128 ymm1,ymm1,XMMWORD[16+r12],1
+ vpshufb ymm0,ymm0,ymm7
+ vinserti128 ymm2,ymm2,XMMWORD[32+r12],1
+ vpshufb ymm1,ymm1,ymm7
+ vinserti128 ymm3,ymm3,XMMWORD[48+r12],1
+
+ lea rbp,[K256]
+ vpshufb ymm2,ymm2,ymm7
+ lea r13,[((-64))+r13]
+ vpaddd ymm4,ymm0,YMMWORD[rbp]
+ vpshufb ymm3,ymm3,ymm7
+ vpaddd ymm5,ymm1,YMMWORD[32+rbp]
+ vpaddd ymm6,ymm2,YMMWORD[64+rbp]
+ vpaddd ymm7,ymm3,YMMWORD[96+rbp]
+ vmovdqa YMMWORD[rsp],ymm4
+ xor r14d,r14d
+ vmovdqa YMMWORD[32+rsp],ymm5
+ lea rsp,[((-64))+rsp]
+ mov esi,ebx
+ vmovdqa YMMWORD[rsp],ymm6
+ xor esi,ecx
+ vmovdqa YMMWORD[32+rsp],ymm7
+ mov r12d,r9d
+ sub rbp,-16*2*4
+ jmp NEAR $L$avx2_00_47
+
+ALIGN 16
+$L$avx2_00_47:
+ vmovdqu xmm9,XMMWORD[r13]
+ vpinsrq xmm15,xmm15,r13,0
+ lea rsp,[((-64))+rsp]
+ vpalignr ymm4,ymm1,ymm0,4
+ add r11d,DWORD[((0+128))+rsp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ vpalignr ymm7,ymm3,ymm2,4
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ vpsrld ymm6,ymm4,7
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ vpaddd ymm0,ymm0,ymm7
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ vpsrld ymm7,ymm4,3
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ vpslld ymm5,ymm4,14
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ vpxor ymm4,ymm7,ymm6
+ and esi,r15d
+ vpxor xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+ xor r14d,r12d
+ xor esi,ebx
+ vpshufd ymm7,ymm3,250
+ xor r14d,r13d
+ lea r11d,[rsi*1+r11]
+ mov r12d,r8d
+ vpsrld ymm6,ymm6,11
+ add r10d,DWORD[((4+128))+rsp]
+ and r12d,edx
+ rorx r13d,edx,25
+ vpxor ymm4,ymm4,ymm5
+ rorx esi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ vpslld ymm5,ymm5,11
+ andn r12d,edx,r9d
+ xor r13d,esi
+ rorx r14d,edx,6
+ vpxor ymm4,ymm4,ymm6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov esi,r11d
+ vpsrld ymm6,ymm7,10
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor esi,eax
+ vpxor ymm4,ymm4,ymm5
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ vpsrlq ymm7,ymm7,17
+ and r15d,esi
+ vpxor xmm9,xmm9,xmm8
+ xor r14d,r12d
+ xor r15d,eax
+ vpaddd ymm0,ymm0,ymm4
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ vpxor ymm6,ymm6,ymm7
+ add r9d,DWORD[((8+128))+rsp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ vpsrlq ymm7,ymm7,2
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ vpshufd ymm6,ymm6,132
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ vpsrldq ymm6,ymm6,8
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ vpaddd ymm0,ymm0,ymm6
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ vpshufd ymm7,ymm0,80
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+ xor r14d,r12d
+ xor esi,r11d
+ vpsrld ymm6,ymm7,10
+ xor r14d,r13d
+ lea r9d,[rsi*1+r9]
+ mov r12d,ecx
+ vpsrlq ymm7,ymm7,17
+ add r8d,DWORD[((12+128))+rsp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ vpxor ymm6,ymm6,ymm7
+ rorx esi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ vpsrlq ymm7,ymm7,2
+ andn r12d,ebx,edx
+ xor r13d,esi
+ rorx r14d,ebx,6
+ vpxor ymm6,ymm6,ymm7
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov esi,r9d
+ vpshufd ymm6,ymm6,232
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor esi,r10d
+ vpslldq ymm6,ymm6,8
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ vpaddd ymm0,ymm0,ymm6
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r10d
+ vpaddd ymm6,ymm0,YMMWORD[rbp]
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ vmovdqa YMMWORD[rsp],ymm6
+ vpalignr ymm4,ymm2,ymm1,4
+ add edx,DWORD[((32+128))+rsp]
+ and r12d,eax
+ rorx r13d,eax,25
+ vpalignr ymm7,ymm0,ymm3,4
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ vpsrld ymm6,ymm4,7
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ vpaddd ymm1,ymm1,ymm7
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ vpsrld ymm7,ymm4,3
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ vpslld ymm5,ymm4,14
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ vpxor ymm4,ymm7,ymm6
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+ xor r14d,r12d
+ xor esi,r9d
+ vpshufd ymm7,ymm0,250
+ xor r14d,r13d
+ lea edx,[rsi*1+rdx]
+ mov r12d,eax
+ vpsrld ymm6,ymm6,11
+ add ecx,DWORD[((36+128))+rsp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ vpxor ymm4,ymm4,ymm5
+ rorx esi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ vpslld ymm5,ymm5,11
+ andn r12d,r11d,ebx
+ xor r13d,esi
+ rorx r14d,r11d,6
+ vpxor ymm4,ymm4,ymm6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov esi,edx
+ vpsrld ymm6,ymm7,10
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor esi,r8d
+ vpxor ymm4,ymm4,ymm5
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ vpsrlq ymm7,ymm7,17
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r8d
+ vpaddd ymm1,ymm1,ymm4
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ vpxor ymm6,ymm6,ymm7
+ add ebx,DWORD[((40+128))+rsp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ vpsrlq ymm7,ymm7,2
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ vpshufd ymm6,ymm6,132
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ vpsrldq ymm6,ymm6,8
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ vpaddd ymm1,ymm1,ymm6
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ vpshufd ymm7,ymm1,80
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+ xor r14d,r12d
+ xor esi,edx
+ vpsrld ymm6,ymm7,10
+ xor r14d,r13d
+ lea ebx,[rsi*1+rbx]
+ mov r12d,r10d
+ vpsrlq ymm7,ymm7,17
+ add eax,DWORD[((44+128))+rsp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ vpxor ymm6,ymm6,ymm7
+ rorx esi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ vpsrlq ymm7,ymm7,2
+ andn r12d,r9d,r11d
+ xor r13d,esi
+ rorx r14d,r9d,6
+ vpxor ymm6,ymm6,ymm7
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov esi,ebx
+ vpshufd ymm6,ymm6,232
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor esi,ecx
+ vpslldq ymm6,ymm6,8
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ vpaddd ymm1,ymm1,ymm6
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+ xor r14d,r12d
+ xor r15d,ecx
+ vpaddd ymm6,ymm1,YMMWORD[32+rbp]
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ vmovdqa YMMWORD[32+rsp],ymm6
+ lea rsp,[((-64))+rsp]
+ vpalignr ymm4,ymm3,ymm2,4
+ add r11d,DWORD[((0+128))+rsp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ vpalignr ymm7,ymm1,ymm0,4
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ vpsrld ymm6,ymm4,7
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ vpaddd ymm2,ymm2,ymm7
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ vpsrld ymm7,ymm4,3
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ vpslld ymm5,ymm4,14
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ vpxor ymm4,ymm7,ymm6
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+ xor r14d,r12d
+ xor esi,ebx
+ vpshufd ymm7,ymm1,250
+ xor r14d,r13d
+ lea r11d,[rsi*1+r11]
+ mov r12d,r8d
+ vpsrld ymm6,ymm6,11
+ add r10d,DWORD[((4+128))+rsp]
+ and r12d,edx
+ rorx r13d,edx,25
+ vpxor ymm4,ymm4,ymm5
+ rorx esi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ vpslld ymm5,ymm5,11
+ andn r12d,edx,r9d
+ xor r13d,esi
+ rorx r14d,edx,6
+ vpxor ymm4,ymm4,ymm6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov esi,r11d
+ vpsrld ymm6,ymm7,10
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor esi,eax
+ vpxor ymm4,ymm4,ymm5
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ vpsrlq ymm7,ymm7,17
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+ xor r14d,r12d
+ xor r15d,eax
+ vpaddd ymm2,ymm2,ymm4
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ vpxor ymm6,ymm6,ymm7
+ add r9d,DWORD[((8+128))+rsp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ vpsrlq ymm7,ymm7,2
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ vpshufd ymm6,ymm6,132
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ vpsrldq ymm6,ymm6,8
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ vpaddd ymm2,ymm2,ymm6
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ vpshufd ymm7,ymm2,80
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+ xor r14d,r12d
+ xor esi,r11d
+ vpsrld ymm6,ymm7,10
+ xor r14d,r13d
+ lea r9d,[rsi*1+r9]
+ mov r12d,ecx
+ vpsrlq ymm7,ymm7,17
+ add r8d,DWORD[((12+128))+rsp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ vpxor ymm6,ymm6,ymm7
+ rorx esi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ vpsrlq ymm7,ymm7,2
+ andn r12d,ebx,edx
+ xor r13d,esi
+ rorx r14d,ebx,6
+ vpxor ymm6,ymm6,ymm7
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov esi,r9d
+ vpshufd ymm6,ymm6,232
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor esi,r10d
+ vpslldq ymm6,ymm6,8
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ vpaddd ymm2,ymm2,ymm6
+ and r15d,esi
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r10d
+ vpaddd ymm6,ymm2,YMMWORD[64+rbp]
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ vmovdqa YMMWORD[rsp],ymm6
+ vpalignr ymm4,ymm0,ymm3,4
+ add edx,DWORD[((32+128))+rsp]
+ and r12d,eax
+ rorx r13d,eax,25
+ vpalignr ymm7,ymm2,ymm1,4
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ vpsrld ymm6,ymm4,7
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ vpaddd ymm3,ymm3,ymm7
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ vpsrld ymm7,ymm4,3
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ vpslld ymm5,ymm4,14
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ vpxor ymm4,ymm7,ymm6
+ and esi,r15d
+ vpand xmm8,xmm11,xmm12
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+ xor r14d,r12d
+ xor esi,r9d
+ vpshufd ymm7,ymm2,250
+ xor r14d,r13d
+ lea edx,[rsi*1+rdx]
+ mov r12d,eax
+ vpsrld ymm6,ymm6,11
+ add ecx,DWORD[((36+128))+rsp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ vpxor ymm4,ymm4,ymm5
+ rorx esi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ vpslld ymm5,ymm5,11
+ andn r12d,r11d,ebx
+ xor r13d,esi
+ rorx r14d,r11d,6
+ vpxor ymm4,ymm4,ymm6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov esi,edx
+ vpsrld ymm6,ymm7,10
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor esi,r8d
+ vpxor ymm4,ymm4,ymm5
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ vpsrlq ymm7,ymm7,17
+ and r15d,esi
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r8d
+ vpaddd ymm3,ymm3,ymm4
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ vpxor ymm6,ymm6,ymm7
+ add ebx,DWORD[((40+128))+rsp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ vpsrlq ymm7,ymm7,2
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ vpshufd ymm6,ymm6,132
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ vpsrldq ymm6,ymm6,8
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ vpaddd ymm3,ymm3,ymm6
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ vpshufd ymm7,ymm3,80
+ and esi,r15d
+ vpand xmm11,xmm11,xmm13
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+ xor r14d,r12d
+ xor esi,edx
+ vpsrld ymm6,ymm7,10
+ xor r14d,r13d
+ lea ebx,[rsi*1+rbx]
+ mov r12d,r10d
+ vpsrlq ymm7,ymm7,17
+ add eax,DWORD[((44+128))+rsp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ vpxor ymm6,ymm6,ymm7
+ rorx esi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ vpsrlq ymm7,ymm7,2
+ andn r12d,r9d,r11d
+ xor r13d,esi
+ rorx r14d,r9d,6
+ vpxor ymm6,ymm6,ymm7
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov esi,ebx
+ vpshufd ymm6,ymm6,232
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor esi,ecx
+ vpslldq ymm6,ymm6,8
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ vpaddd ymm3,ymm3,ymm6
+ and r15d,esi
+ vpor xmm8,xmm8,xmm11
+ vaesenclast xmm11,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ xor r14d,r12d
+ xor r15d,ecx
+ vpaddd ymm6,ymm3,YMMWORD[96+rbp]
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ vmovdqa YMMWORD[32+rsp],ymm6
+ vmovq r13,xmm15
+ vpextrq r15,xmm15,1
+ vpand xmm11,xmm11,xmm14
+ vpor xmm8,xmm8,xmm11
+ vmovdqu XMMWORD[r13*1+r15],xmm8
+ lea r13,[16+r13]
+ lea rbp,[128+rbp]
+ cmp BYTE[3+rbp],0
+ jne NEAR $L$avx2_00_47
+ vmovdqu xmm9,XMMWORD[r13]
+ vpinsrq xmm15,xmm15,r13,0
+ add r11d,DWORD[((0+64))+rsp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ and esi,r15d
+ vpxor xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+ xor r14d,r12d
+ xor esi,ebx
+ xor r14d,r13d
+ lea r11d,[rsi*1+r11]
+ mov r12d,r8d
+ add r10d,DWORD[((4+64))+rsp]
+ and r12d,edx
+ rorx r13d,edx,25
+ rorx esi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ andn r12d,edx,r9d
+ xor r13d,esi
+ rorx r14d,edx,6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov esi,r11d
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor esi,eax
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ and r15d,esi
+ vpxor xmm9,xmm9,xmm8
+ xor r14d,r12d
+ xor r15d,eax
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ add r9d,DWORD[((8+64))+rsp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+ xor r14d,r12d
+ xor esi,r11d
+ xor r14d,r13d
+ lea r9d,[rsi*1+r9]
+ mov r12d,ecx
+ add r8d,DWORD[((12+64))+rsp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ rorx esi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ andn r12d,ebx,edx
+ xor r13d,esi
+ rorx r14d,ebx,6
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov esi,r9d
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor esi,r10d
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r10d
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ add edx,DWORD[((32+64))+rsp]
+ and r12d,eax
+ rorx r13d,eax,25
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+ xor r14d,r12d
+ xor esi,r9d
+ xor r14d,r13d
+ lea edx,[rsi*1+rdx]
+ mov r12d,eax
+ add ecx,DWORD[((36+64))+rsp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ rorx esi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ andn r12d,r11d,ebx
+ xor r13d,esi
+ rorx r14d,r11d,6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov esi,edx
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor esi,r8d
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r8d
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ add ebx,DWORD[((40+64))+rsp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+ xor r14d,r12d
+ xor esi,edx
+ xor r14d,r13d
+ lea ebx,[rsi*1+rbx]
+ mov r12d,r10d
+ add eax,DWORD[((44+64))+rsp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ rorx esi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ andn r12d,r9d,r11d
+ xor r13d,esi
+ rorx r14d,r9d,6
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov esi,ebx
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor esi,ecx
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+ xor r14d,r12d
+ xor r15d,ecx
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ add r11d,DWORD[rsp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+ xor r14d,r12d
+ xor esi,ebx
+ xor r14d,r13d
+ lea r11d,[rsi*1+r11]
+ mov r12d,r8d
+ add r10d,DWORD[4+rsp]
+ and r12d,edx
+ rorx r13d,edx,25
+ rorx esi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ andn r12d,edx,r9d
+ xor r13d,esi
+ rorx r14d,edx,6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov esi,r11d
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor esi,eax
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+ xor r14d,r12d
+ xor r15d,eax
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ add r9d,DWORD[8+rsp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+ xor r14d,r12d
+ xor esi,r11d
+ xor r14d,r13d
+ lea r9d,[rsi*1+r9]
+ mov r12d,ecx
+ add r8d,DWORD[12+rsp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ rorx esi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ andn r12d,ebx,edx
+ xor r13d,esi
+ rorx r14d,ebx,6
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov esi,r9d
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor esi,r10d
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ and r15d,esi
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r10d
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ add edx,DWORD[32+rsp]
+ and r12d,eax
+ rorx r13d,eax,25
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ and esi,r15d
+ vpand xmm8,xmm11,xmm12
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+ xor r14d,r12d
+ xor esi,r9d
+ xor r14d,r13d
+ lea edx,[rsi*1+rdx]
+ mov r12d,eax
+ add ecx,DWORD[36+rsp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ rorx esi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ andn r12d,r11d,ebx
+ xor r13d,esi
+ rorx r14d,r11d,6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov esi,edx
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor esi,r8d
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ and r15d,esi
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r8d
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ add ebx,DWORD[40+rsp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ and esi,r15d
+ vpand xmm11,xmm11,xmm13
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+ xor r14d,r12d
+ xor esi,edx
+ xor r14d,r13d
+ lea ebx,[rsi*1+rbx]
+ mov r12d,r10d
+ add eax,DWORD[44+rsp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ rorx esi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ andn r12d,r9d,r11d
+ xor r13d,esi
+ rorx r14d,r9d,6
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov esi,ebx
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor esi,ecx
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ and r15d,esi
+ vpor xmm8,xmm8,xmm11
+ vaesenclast xmm11,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ xor r14d,r12d
+ xor r15d,ecx
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ vpextrq r12,xmm15,1
+ vmovq r13,xmm15
+ mov r15,QWORD[552+rsp]
+ add eax,r14d
+ lea rbp,[448+rsp]
+
+ vpand xmm11,xmm11,xmm14
+ vpor xmm8,xmm8,xmm11
+ vmovdqu XMMWORD[r13*1+r12],xmm8
+ lea r13,[16+r13]
+
+ add eax,DWORD[r15]
+ add ebx,DWORD[4+r15]
+ add ecx,DWORD[8+r15]
+ add edx,DWORD[12+r15]
+ add r8d,DWORD[16+r15]
+ add r9d,DWORD[20+r15]
+ add r10d,DWORD[24+r15]
+ add r11d,DWORD[28+r15]
+
+ mov DWORD[r15],eax
+ mov DWORD[4+r15],ebx
+ mov DWORD[8+r15],ecx
+ mov DWORD[12+r15],edx
+ mov DWORD[16+r15],r8d
+ mov DWORD[20+r15],r9d
+ mov DWORD[24+r15],r10d
+ mov DWORD[28+r15],r11d
+
+ cmp r13,QWORD[80+rbp]
+ je NEAR $L$done_avx2
+
+ xor r14d,r14d
+ mov esi,ebx
+ mov r12d,r9d
+ xor esi,ecx
+ jmp NEAR $L$ower_avx2
+ALIGN 16
+$L$ower_avx2:
+ vmovdqu xmm9,XMMWORD[r13]
+ vpinsrq xmm15,xmm15,r13,0
+ add r11d,DWORD[((0+16))+rbp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ and esi,r15d
+ vpxor xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+ xor r14d,r12d
+ xor esi,ebx
+ xor r14d,r13d
+ lea r11d,[rsi*1+r11]
+ mov r12d,r8d
+ add r10d,DWORD[((4+16))+rbp]
+ and r12d,edx
+ rorx r13d,edx,25
+ rorx esi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ andn r12d,edx,r9d
+ xor r13d,esi
+ rorx r14d,edx,6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov esi,r11d
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor esi,eax
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ and r15d,esi
+ vpxor xmm9,xmm9,xmm8
+ xor r14d,r12d
+ xor r15d,eax
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ add r9d,DWORD[((8+16))+rbp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+ xor r14d,r12d
+ xor esi,r11d
+ xor r14d,r13d
+ lea r9d,[rsi*1+r9]
+ mov r12d,ecx
+ add r8d,DWORD[((12+16))+rbp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ rorx esi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ andn r12d,ebx,edx
+ xor r13d,esi
+ rorx r14d,ebx,6
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov esi,r9d
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor esi,r10d
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r10d
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ add edx,DWORD[((32+16))+rbp]
+ and r12d,eax
+ rorx r13d,eax,25
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+ xor r14d,r12d
+ xor esi,r9d
+ xor r14d,r13d
+ lea edx,[rsi*1+rdx]
+ mov r12d,eax
+ add ecx,DWORD[((36+16))+rbp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ rorx esi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ andn r12d,r11d,ebx
+ xor r13d,esi
+ rorx r14d,r11d,6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov esi,edx
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor esi,r8d
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r8d
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ add ebx,DWORD[((40+16))+rbp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+ xor r14d,r12d
+ xor esi,edx
+ xor r14d,r13d
+ lea ebx,[rsi*1+rbx]
+ mov r12d,r10d
+ add eax,DWORD[((44+16))+rbp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ rorx esi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ andn r12d,r9d,r11d
+ xor r13d,esi
+ rorx r14d,r9d,6
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov esi,ebx
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor esi,ecx
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+ xor r14d,r12d
+ xor r15d,ecx
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ lea rbp,[((-64))+rbp]
+ add r11d,DWORD[((0+16))+rbp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+ xor r14d,r12d
+ xor esi,ebx
+ xor r14d,r13d
+ lea r11d,[rsi*1+r11]
+ mov r12d,r8d
+ add r10d,DWORD[((4+16))+rbp]
+ and r12d,edx
+ rorx r13d,edx,25
+ rorx esi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ andn r12d,edx,r9d
+ xor r13d,esi
+ rorx r14d,edx,6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov esi,r11d
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor esi,eax
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ and r15d,esi
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+ xor r14d,r12d
+ xor r15d,eax
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ add r9d,DWORD[((8+16))+rbp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ and esi,r15d
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+ xor r14d,r12d
+ xor esi,r11d
+ xor r14d,r13d
+ lea r9d,[rsi*1+r9]
+ mov r12d,ecx
+ add r8d,DWORD[((12+16))+rbp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ rorx esi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ andn r12d,ebx,edx
+ xor r13d,esi
+ rorx r14d,ebx,6
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov esi,r9d
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor esi,r10d
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ and r15d,esi
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r10d
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ add edx,DWORD[((32+16))+rbp]
+ and r12d,eax
+ rorx r13d,eax,25
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ and esi,r15d
+ vpand xmm8,xmm11,xmm12
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+ xor r14d,r12d
+ xor esi,r9d
+ xor r14d,r13d
+ lea edx,[rsi*1+rdx]
+ mov r12d,eax
+ add ecx,DWORD[((36+16))+rbp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ rorx esi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ andn r12d,r11d,ebx
+ xor r13d,esi
+ rorx r14d,r11d,6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov esi,edx
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor esi,r8d
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ and r15d,esi
+ vaesenclast xmm11,xmm9,xmm10
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+ xor r14d,r12d
+ xor r15d,r8d
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ add ebx,DWORD[((40+16))+rbp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ and esi,r15d
+ vpand xmm11,xmm11,xmm13
+ vaesenc xmm9,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+ xor r14d,r12d
+ xor esi,edx
+ xor r14d,r13d
+ lea ebx,[rsi*1+rbx]
+ mov r12d,r10d
+ add eax,DWORD[((44+16))+rbp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ rorx esi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ andn r12d,r9d,r11d
+ xor r13d,esi
+ rorx r14d,r9d,6
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov esi,ebx
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor esi,ecx
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ and r15d,esi
+ vpor xmm8,xmm8,xmm11
+ vaesenclast xmm11,xmm9,xmm10
+ vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+ xor r14d,r12d
+ xor r15d,ecx
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ vmovq r13,xmm15
+ vpextrq r15,xmm15,1
+ vpand xmm11,xmm11,xmm14
+ vpor xmm8,xmm8,xmm11
+ lea rbp,[((-64))+rbp]
+ vmovdqu XMMWORD[r13*1+r15],xmm8
+ lea r13,[16+r13]
+ cmp rbp,rsp
+ jae NEAR $L$ower_avx2
+
+ mov r15,QWORD[552+rsp]
+ lea r13,[64+r13]
+ mov rsi,QWORD[560+rsp]
+ add eax,r14d
+ lea rsp,[448+rsp]
+
+ add eax,DWORD[r15]
+ add ebx,DWORD[4+r15]
+ add ecx,DWORD[8+r15]
+ add edx,DWORD[12+r15]
+ add r8d,DWORD[16+r15]
+ add r9d,DWORD[20+r15]
+ add r10d,DWORD[24+r15]
+ lea r12,[r13*1+rsi]
+ add r11d,DWORD[28+r15]
+
+ cmp r13,QWORD[((64+16))+rsp]
+
+ mov DWORD[r15],eax
+ cmove r12,rsp
+ mov DWORD[4+r15],ebx
+ mov DWORD[8+r15],ecx
+ mov DWORD[12+r15],edx
+ mov DWORD[16+r15],r8d
+ mov DWORD[20+r15],r9d
+ mov DWORD[24+r15],r10d
+ mov DWORD[28+r15],r11d
+
+ jbe NEAR $L$oop_avx2
+ lea rbp,[rsp]
+
+$L$done_avx2:
+ lea rsp,[rbp]
+ mov r8,QWORD[((64+32))+rsp]
+ mov rsi,QWORD[120+rsp]
+
+ vmovdqu XMMWORD[r8],xmm8
+ vzeroall
+ movaps xmm6,XMMWORD[128+rsp]
+ movaps xmm7,XMMWORD[144+rsp]
+ movaps xmm8,XMMWORD[160+rsp]
+ movaps xmm9,XMMWORD[176+rsp]
+ movaps xmm10,XMMWORD[192+rsp]
+ movaps xmm11,XMMWORD[208+rsp]
+ movaps xmm12,XMMWORD[224+rsp]
+ movaps xmm13,XMMWORD[240+rsp]
+ movaps xmm14,XMMWORD[256+rsp]
+ movaps xmm15,XMMWORD[272+rsp]
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue_avx2:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_cbc_sha256_enc_avx2:
+
+ALIGN 32
+aesni_cbc_sha256_enc_shaext:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_cbc_sha256_enc_shaext:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+ mov r10,QWORD[56+rsp]
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[(-8-160)+rax],xmm6
+ movaps XMMWORD[(-8-144)+rax],xmm7
+ movaps XMMWORD[(-8-128)+rax],xmm8
+ movaps XMMWORD[(-8-112)+rax],xmm9
+ movaps XMMWORD[(-8-96)+rax],xmm10
+ movaps XMMWORD[(-8-80)+rax],xmm11
+ movaps XMMWORD[(-8-64)+rax],xmm12
+ movaps XMMWORD[(-8-48)+rax],xmm13
+ movaps XMMWORD[(-8-32)+rax],xmm14
+ movaps XMMWORD[(-8-16)+rax],xmm15
+$L$prologue_shaext:
+ lea rax,[((K256+128))]
+ movdqu xmm1,XMMWORD[r9]
+ movdqu xmm2,XMMWORD[16+r9]
+ movdqa xmm3,XMMWORD[((512-128))+rax]
+
+ mov r11d,DWORD[240+rcx]
+ sub rsi,rdi
+ movups xmm15,XMMWORD[rcx]
+ movups xmm6,XMMWORD[r8]
+ movups xmm4,XMMWORD[16+rcx]
+ lea rcx,[112+rcx]
+
+ pshufd xmm0,xmm1,0x1b
+ pshufd xmm1,xmm1,0xb1
+ pshufd xmm2,xmm2,0x1b
+ movdqa xmm7,xmm3
+DB 102,15,58,15,202,8
+ punpcklqdq xmm2,xmm0
+
+ jmp NEAR $L$oop_shaext
+
+ALIGN 16
+$L$oop_shaext:
+ movdqu xmm10,XMMWORD[r10]
+ movdqu xmm11,XMMWORD[16+r10]
+ movdqu xmm12,XMMWORD[32+r10]
+DB 102,68,15,56,0,211
+ movdqu xmm13,XMMWORD[48+r10]
+
+ movdqa xmm0,XMMWORD[((0-128))+rax]
+ paddd xmm0,xmm10
+DB 102,68,15,56,0,219
+ movdqa xmm9,xmm2
+ movdqa xmm8,xmm1
+ movups xmm14,XMMWORD[rdi]
+ xorps xmm14,xmm15
+ xorps xmm6,xmm14
+ movups xmm5,XMMWORD[((-80))+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movups xmm4,XMMWORD[((-64))+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,202
+
+ movdqa xmm0,XMMWORD[((32-128))+rax]
+ paddd xmm0,xmm11
+DB 102,68,15,56,0,227
+ lea r10,[64+r10]
+ movups xmm5,XMMWORD[((-48))+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movups xmm4,XMMWORD[((-32))+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,202
+
+ movdqa xmm0,XMMWORD[((64-128))+rax]
+ paddd xmm0,xmm12
+DB 102,68,15,56,0,235
+DB 69,15,56,204,211
+ movups xmm5,XMMWORD[((-16))+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm13
+DB 102,65,15,58,15,220,4
+ paddd xmm10,xmm3
+ movups xmm4,XMMWORD[rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,202
+
+ movdqa xmm0,XMMWORD[((96-128))+rax]
+ paddd xmm0,xmm13
+DB 69,15,56,205,213
+DB 69,15,56,204,220
+ movups xmm5,XMMWORD[16+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movups xmm4,XMMWORD[32+rcx]
+ aesenc xmm6,xmm5
+ movdqa xmm3,xmm10
+DB 102,65,15,58,15,221,4
+ paddd xmm11,xmm3
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((128-128))+rax]
+ paddd xmm0,xmm10
+DB 69,15,56,205,218
+DB 69,15,56,204,229
+ movups xmm5,XMMWORD[48+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm11
+DB 102,65,15,58,15,218,4
+ paddd xmm12,xmm3
+ cmp r11d,11
+ jb NEAR $L$aesenclast1
+ movups xmm4,XMMWORD[64+rcx]
+ aesenc xmm6,xmm5
+ movups xmm5,XMMWORD[80+rcx]
+ aesenc xmm6,xmm4
+ je NEAR $L$aesenclast1
+ movups xmm4,XMMWORD[96+rcx]
+ aesenc xmm6,xmm5
+ movups xmm5,XMMWORD[112+rcx]
+ aesenc xmm6,xmm4
+$L$aesenclast1:
+ aesenclast xmm6,xmm5
+ movups xmm4,XMMWORD[((16-112))+rcx]
+ nop
+DB 15,56,203,202
+ movups xmm14,XMMWORD[16+rdi]
+ xorps xmm14,xmm15
+ movups XMMWORD[rdi*1+rsi],xmm6
+ xorps xmm6,xmm14
+ movups xmm5,XMMWORD[((-80))+rcx]
+ aesenc xmm6,xmm4
+ movdqa xmm0,XMMWORD[((160-128))+rax]
+ paddd xmm0,xmm11
+DB 69,15,56,205,227
+DB 69,15,56,204,234
+ movups xmm4,XMMWORD[((-64))+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm12
+DB 102,65,15,58,15,219,4
+ paddd xmm13,xmm3
+ movups xmm5,XMMWORD[((-48))+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((192-128))+rax]
+ paddd xmm0,xmm12
+DB 69,15,56,205,236
+DB 69,15,56,204,211
+ movups xmm4,XMMWORD[((-32))+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm13
+DB 102,65,15,58,15,220,4
+ paddd xmm10,xmm3
+ movups xmm5,XMMWORD[((-16))+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((224-128))+rax]
+ paddd xmm0,xmm13
+DB 69,15,56,205,213
+DB 69,15,56,204,220
+ movups xmm4,XMMWORD[rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm10
+DB 102,65,15,58,15,221,4
+ paddd xmm11,xmm3
+ movups xmm5,XMMWORD[16+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((256-128))+rax]
+ paddd xmm0,xmm10
+DB 69,15,56,205,218
+DB 69,15,56,204,229
+ movups xmm4,XMMWORD[32+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm11
+DB 102,65,15,58,15,218,4
+ paddd xmm12,xmm3
+ movups xmm5,XMMWORD[48+rcx]
+ aesenc xmm6,xmm4
+ cmp r11d,11
+ jb NEAR $L$aesenclast2
+ movups xmm4,XMMWORD[64+rcx]
+ aesenc xmm6,xmm5
+ movups xmm5,XMMWORD[80+rcx]
+ aesenc xmm6,xmm4
+ je NEAR $L$aesenclast2
+ movups xmm4,XMMWORD[96+rcx]
+ aesenc xmm6,xmm5
+ movups xmm5,XMMWORD[112+rcx]
+ aesenc xmm6,xmm4
+$L$aesenclast2:
+ aesenclast xmm6,xmm5
+ movups xmm4,XMMWORD[((16-112))+rcx]
+ nop
+DB 15,56,203,202
+ movups xmm14,XMMWORD[32+rdi]
+ xorps xmm14,xmm15
+ movups XMMWORD[16+rdi*1+rsi],xmm6
+ xorps xmm6,xmm14
+ movups xmm5,XMMWORD[((-80))+rcx]
+ aesenc xmm6,xmm4
+ movdqa xmm0,XMMWORD[((288-128))+rax]
+ paddd xmm0,xmm11
+DB 69,15,56,205,227
+DB 69,15,56,204,234
+ movups xmm4,XMMWORD[((-64))+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm12
+DB 102,65,15,58,15,219,4
+ paddd xmm13,xmm3
+ movups xmm5,XMMWORD[((-48))+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((320-128))+rax]
+ paddd xmm0,xmm12
+DB 69,15,56,205,236
+DB 69,15,56,204,211
+ movups xmm4,XMMWORD[((-32))+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm13
+DB 102,65,15,58,15,220,4
+ paddd xmm10,xmm3
+ movups xmm5,XMMWORD[((-16))+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((352-128))+rax]
+ paddd xmm0,xmm13
+DB 69,15,56,205,213
+DB 69,15,56,204,220
+ movups xmm4,XMMWORD[rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm10
+DB 102,65,15,58,15,221,4
+ paddd xmm11,xmm3
+ movups xmm5,XMMWORD[16+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((384-128))+rax]
+ paddd xmm0,xmm10
+DB 69,15,56,205,218
+DB 69,15,56,204,229
+ movups xmm4,XMMWORD[32+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm11
+DB 102,65,15,58,15,218,4
+ paddd xmm12,xmm3
+ movups xmm5,XMMWORD[48+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((416-128))+rax]
+ paddd xmm0,xmm11
+DB 69,15,56,205,227
+DB 69,15,56,204,234
+ cmp r11d,11
+ jb NEAR $L$aesenclast3
+ movups xmm4,XMMWORD[64+rcx]
+ aesenc xmm6,xmm5
+ movups xmm5,XMMWORD[80+rcx]
+ aesenc xmm6,xmm4
+ je NEAR $L$aesenclast3
+ movups xmm4,XMMWORD[96+rcx]
+ aesenc xmm6,xmm5
+ movups xmm5,XMMWORD[112+rcx]
+ aesenc xmm6,xmm4
+$L$aesenclast3:
+ aesenclast xmm6,xmm5
+ movups xmm4,XMMWORD[((16-112))+rcx]
+ nop
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm3,xmm12
+DB 102,65,15,58,15,219,4
+ paddd xmm13,xmm3
+ movups xmm14,XMMWORD[48+rdi]
+ xorps xmm14,xmm15
+ movups XMMWORD[32+rdi*1+rsi],xmm6
+ xorps xmm6,xmm14
+ movups xmm5,XMMWORD[((-80))+rcx]
+ aesenc xmm6,xmm4
+ movups xmm4,XMMWORD[((-64))+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,202
+
+ movdqa xmm0,XMMWORD[((448-128))+rax]
+ paddd xmm0,xmm12
+DB 69,15,56,205,236
+ movdqa xmm3,xmm7
+ movups xmm5,XMMWORD[((-48))+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movups xmm4,XMMWORD[((-32))+rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,202
+
+ movdqa xmm0,XMMWORD[((480-128))+rax]
+ paddd xmm0,xmm13
+ movups xmm5,XMMWORD[((-16))+rcx]
+ aesenc xmm6,xmm4
+ movups xmm4,XMMWORD[rcx]
+ aesenc xmm6,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movups xmm5,XMMWORD[16+rcx]
+ aesenc xmm6,xmm4
+DB 15,56,203,202
+
+ movups xmm4,XMMWORD[32+rcx]
+ aesenc xmm6,xmm5
+ movups xmm5,XMMWORD[48+rcx]
+ aesenc xmm6,xmm4
+ cmp r11d,11
+ jb NEAR $L$aesenclast4
+ movups xmm4,XMMWORD[64+rcx]
+ aesenc xmm6,xmm5
+ movups xmm5,XMMWORD[80+rcx]
+ aesenc xmm6,xmm4
+ je NEAR $L$aesenclast4
+ movups xmm4,XMMWORD[96+rcx]
+ aesenc xmm6,xmm5
+ movups xmm5,XMMWORD[112+rcx]
+ aesenc xmm6,xmm4
+$L$aesenclast4:
+ aesenclast xmm6,xmm5
+ movups xmm4,XMMWORD[((16-112))+rcx]
+ nop
+
+ paddd xmm2,xmm9
+ paddd xmm1,xmm8
+
+ dec rdx
+ movups XMMWORD[48+rdi*1+rsi],xmm6
+ lea rdi,[64+rdi]
+ jnz NEAR $L$oop_shaext
+
+ pshufd xmm2,xmm2,0xb1
+ pshufd xmm3,xmm1,0x1b
+ pshufd xmm1,xmm1,0xb1
+ punpckhqdq xmm1,xmm2
+DB 102,15,58,15,211,8
+
+ movups XMMWORD[r8],xmm6
+ movdqu XMMWORD[r9],xmm1
+ movdqu XMMWORD[16+r9],xmm2
+ movaps xmm6,XMMWORD[rsp]
+ movaps xmm7,XMMWORD[16+rsp]
+ movaps xmm8,XMMWORD[32+rsp]
+ movaps xmm9,XMMWORD[48+rsp]
+ movaps xmm10,XMMWORD[64+rsp]
+ movaps xmm11,XMMWORD[80+rsp]
+ movaps xmm12,XMMWORD[96+rsp]
+ movaps xmm13,XMMWORD[112+rsp]
+ movaps xmm14,XMMWORD[128+rsp]
+ movaps xmm15,XMMWORD[144+rsp]
+ lea rsp,[((8+160))+rsp]
+$L$epilogue_shaext:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+$L$SEH_end_aesni_cbc_sha256_enc_shaext:
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+ lea r10,[aesni_cbc_sha256_enc_shaext]
+ cmp rbx,r10
+ jb NEAR $L$not_in_shaext
+
+ lea rsi,[rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+ lea rax,[168+rax]
+ jmp NEAR $L$in_prologue
+$L$not_in_shaext:
+ lea r10,[$L$avx2_shortcut]
+ cmp rbx,r10
+ jb NEAR $L$not_in_avx2
+
+ and rax,-256*4
+ add rax,448
+$L$not_in_avx2:
+ mov rsi,rax
+ mov rax,QWORD[((64+56))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+ lea rsi,[((64+64))+rsi]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ DD $L$SEH_begin_aesni_cbc_sha256_enc_xop wrt ..imagebase
+ DD $L$SEH_end_aesni_cbc_sha256_enc_xop wrt ..imagebase
+ DD $L$SEH_info_aesni_cbc_sha256_enc_xop wrt ..imagebase
+
+ DD $L$SEH_begin_aesni_cbc_sha256_enc_avx wrt ..imagebase
+ DD $L$SEH_end_aesni_cbc_sha256_enc_avx wrt ..imagebase
+ DD $L$SEH_info_aesni_cbc_sha256_enc_avx wrt ..imagebase
+ DD $L$SEH_begin_aesni_cbc_sha256_enc_avx2 wrt ..imagebase
+ DD $L$SEH_end_aesni_cbc_sha256_enc_avx2 wrt ..imagebase
+ DD $L$SEH_info_aesni_cbc_sha256_enc_avx2 wrt ..imagebase
+ DD $L$SEH_begin_aesni_cbc_sha256_enc_shaext wrt ..imagebase
+ DD $L$SEH_end_aesni_cbc_sha256_enc_shaext wrt ..imagebase
+ DD $L$SEH_info_aesni_cbc_sha256_enc_shaext wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_aesni_cbc_sha256_enc_xop:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_xop wrt ..imagebase,$L$epilogue_xop wrt ..imagebase
+
+$L$SEH_info_aesni_cbc_sha256_enc_avx:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_aesni_cbc_sha256_enc_avx2:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
+$L$SEH_info_aesni_cbc_sha256_enc_shaext:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_shaext wrt ..imagebase,$L$epilogue_shaext wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm
new file mode 100644
index 0000000000..2705ece3e2
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm
@@ -0,0 +1,5084 @@
+; Copyright 2009-2019 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN OPENSSL_ia32cap_P
+global aesni_encrypt
+
+ALIGN 16
+aesni_encrypt:
+
+ movups xmm2,XMMWORD[rcx]
+ mov eax,DWORD[240+r8]
+ movups xmm0,XMMWORD[r8]
+ movups xmm1,XMMWORD[16+r8]
+ lea r8,[32+r8]
+ xorps xmm2,xmm0
+$L$oop_enc1_1:
+DB 102,15,56,220,209
+ dec eax
+ movups xmm1,XMMWORD[r8]
+ lea r8,[16+r8]
+ jnz NEAR $L$oop_enc1_1
+DB 102,15,56,221,209
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ movups XMMWORD[rdx],xmm2
+ pxor xmm2,xmm2
+ DB 0F3h,0C3h ;repret
+
+
+
+global aesni_decrypt
+
+ALIGN 16
+aesni_decrypt:
+
+ movups xmm2,XMMWORD[rcx]
+ mov eax,DWORD[240+r8]
+ movups xmm0,XMMWORD[r8]
+ movups xmm1,XMMWORD[16+r8]
+ lea r8,[32+r8]
+ xorps xmm2,xmm0
+$L$oop_dec1_2:
+DB 102,15,56,222,209
+ dec eax
+ movups xmm1,XMMWORD[r8]
+ lea r8,[16+r8]
+ jnz NEAR $L$oop_dec1_2
+DB 102,15,56,223,209
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ movups XMMWORD[rdx],xmm2
+ pxor xmm2,xmm2
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_encrypt2:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ xorps xmm3,xmm0
+ movups xmm0,XMMWORD[32+rcx]
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+ add rax,16
+
+$L$enc_loop2:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$enc_loop2
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,221,208
+DB 102,15,56,221,216
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_decrypt2:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ xorps xmm3,xmm0
+ movups xmm0,XMMWORD[32+rcx]
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+ add rax,16
+
+$L$dec_loop2:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$dec_loop2
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,223,208
+DB 102,15,56,223,216
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_encrypt3:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ xorps xmm3,xmm0
+ xorps xmm4,xmm0
+ movups xmm0,XMMWORD[32+rcx]
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+ add rax,16
+
+$L$enc_loop3:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$enc_loop3
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,221,208
+DB 102,15,56,221,216
+DB 102,15,56,221,224
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_decrypt3:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ xorps xmm3,xmm0
+ xorps xmm4,xmm0
+ movups xmm0,XMMWORD[32+rcx]
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+ add rax,16
+
+$L$dec_loop3:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$dec_loop3
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,223,208
+DB 102,15,56,223,216
+DB 102,15,56,223,224
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_encrypt4:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ xorps xmm3,xmm0
+ xorps xmm4,xmm0
+ xorps xmm5,xmm0
+ movups xmm0,XMMWORD[32+rcx]
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+DB 0x0f,0x1f,0x00
+ add rax,16
+
+$L$enc_loop4:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$enc_loop4
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,221,208
+DB 102,15,56,221,216
+DB 102,15,56,221,224
+DB 102,15,56,221,232
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_decrypt4:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ xorps xmm3,xmm0
+ xorps xmm4,xmm0
+ xorps xmm5,xmm0
+ movups xmm0,XMMWORD[32+rcx]
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+DB 0x0f,0x1f,0x00
+ add rax,16
+
+$L$dec_loop4:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$dec_loop4
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,223,208
+DB 102,15,56,223,216
+DB 102,15,56,223,224
+DB 102,15,56,223,232
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_encrypt6:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+DB 102,15,56,220,209
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+DB 102,15,56,220,217
+ pxor xmm5,xmm0
+ pxor xmm6,xmm0
+DB 102,15,56,220,225
+ pxor xmm7,xmm0
+ movups xmm0,XMMWORD[rax*1+rcx]
+ add rax,16
+ jmp NEAR $L$enc_loop6_enter
+ALIGN 16
+$L$enc_loop6:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+$L$enc_loop6_enter:
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$enc_loop6
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,15,56,221,208
+DB 102,15,56,221,216
+DB 102,15,56,221,224
+DB 102,15,56,221,232
+DB 102,15,56,221,240
+DB 102,15,56,221,248
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_decrypt6:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ pxor xmm3,xmm0
+ pxor xmm4,xmm0
+DB 102,15,56,222,209
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+DB 102,15,56,222,217
+ pxor xmm5,xmm0
+ pxor xmm6,xmm0
+DB 102,15,56,222,225
+ pxor xmm7,xmm0
+ movups xmm0,XMMWORD[rax*1+rcx]
+ add rax,16
+ jmp NEAR $L$dec_loop6_enter
+ALIGN 16
+$L$dec_loop6:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+$L$dec_loop6_enter:
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$dec_loop6
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+DB 102,15,56,223,208
+DB 102,15,56,223,216
+DB 102,15,56,223,224
+DB 102,15,56,223,232
+DB 102,15,56,223,240
+DB 102,15,56,223,248
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_encrypt8:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ xorps xmm3,xmm0
+ pxor xmm4,xmm0
+ pxor xmm5,xmm0
+ pxor xmm6,xmm0
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+DB 102,15,56,220,209
+ pxor xmm7,xmm0
+ pxor xmm8,xmm0
+DB 102,15,56,220,217
+ pxor xmm9,xmm0
+ movups xmm0,XMMWORD[rax*1+rcx]
+ add rax,16
+ jmp NEAR $L$enc_loop8_inner
+ALIGN 16
+$L$enc_loop8:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+$L$enc_loop8_inner:
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+DB 102,68,15,56,220,201
+$L$enc_loop8_enter:
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+DB 102,68,15,56,220,192
+DB 102,68,15,56,220,200
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$enc_loop8
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+DB 102,68,15,56,220,201
+DB 102,15,56,221,208
+DB 102,15,56,221,216
+DB 102,15,56,221,224
+DB 102,15,56,221,232
+DB 102,15,56,221,240
+DB 102,15,56,221,248
+DB 102,68,15,56,221,192
+DB 102,68,15,56,221,200
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 16
+_aesni_decrypt8:
+
+ movups xmm0,XMMWORD[rcx]
+ shl eax,4
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm2,xmm0
+ xorps xmm3,xmm0
+ pxor xmm4,xmm0
+ pxor xmm5,xmm0
+ pxor xmm6,xmm0
+ lea rcx,[32+rax*1+rcx]
+ neg rax
+DB 102,15,56,222,209
+ pxor xmm7,xmm0
+ pxor xmm8,xmm0
+DB 102,15,56,222,217
+ pxor xmm9,xmm0
+ movups xmm0,XMMWORD[rax*1+rcx]
+ add rax,16
+ jmp NEAR $L$dec_loop8_inner
+ALIGN 16
+$L$dec_loop8:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+$L$dec_loop8_inner:
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+DB 102,68,15,56,222,193
+DB 102,68,15,56,222,201
+$L$dec_loop8_enter:
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+DB 102,68,15,56,222,192
+DB 102,68,15,56,222,200
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$dec_loop8
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+DB 102,68,15,56,222,193
+DB 102,68,15,56,222,201
+DB 102,15,56,223,208
+DB 102,15,56,223,216
+DB 102,15,56,223,224
+DB 102,15,56,223,232
+DB 102,15,56,223,240
+DB 102,15,56,223,248
+DB 102,68,15,56,223,192
+DB 102,68,15,56,223,200
+ DB 0F3h,0C3h ;repret
+
+
+global aesni_ecb_encrypt
+
+ALIGN 16
+aesni_ecb_encrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_ecb_encrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+
+
+
+ lea rsp,[((-88))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+$L$ecb_enc_body:
+ and rdx,-16
+ jz NEAR $L$ecb_ret
+
+ mov eax,DWORD[240+rcx]
+ movups xmm0,XMMWORD[rcx]
+ mov r11,rcx
+ mov r10d,eax
+ test r8d,r8d
+ jz NEAR $L$ecb_decrypt
+
+ cmp rdx,0x80
+ jb NEAR $L$ecb_enc_tail
+
+ movdqu xmm2,XMMWORD[rdi]
+ movdqu xmm3,XMMWORD[16+rdi]
+ movdqu xmm4,XMMWORD[32+rdi]
+ movdqu xmm5,XMMWORD[48+rdi]
+ movdqu xmm6,XMMWORD[64+rdi]
+ movdqu xmm7,XMMWORD[80+rdi]
+ movdqu xmm8,XMMWORD[96+rdi]
+ movdqu xmm9,XMMWORD[112+rdi]
+ lea rdi,[128+rdi]
+ sub rdx,0x80
+ jmp NEAR $L$ecb_enc_loop8_enter
+ALIGN 16
+$L$ecb_enc_loop8:
+ movups XMMWORD[rsi],xmm2
+ mov rcx,r11
+ movdqu xmm2,XMMWORD[rdi]
+ mov eax,r10d
+ movups XMMWORD[16+rsi],xmm3
+ movdqu xmm3,XMMWORD[16+rdi]
+ movups XMMWORD[32+rsi],xmm4
+ movdqu xmm4,XMMWORD[32+rdi]
+ movups XMMWORD[48+rsi],xmm5
+ movdqu xmm5,XMMWORD[48+rdi]
+ movups XMMWORD[64+rsi],xmm6
+ movdqu xmm6,XMMWORD[64+rdi]
+ movups XMMWORD[80+rsi],xmm7
+ movdqu xmm7,XMMWORD[80+rdi]
+ movups XMMWORD[96+rsi],xmm8
+ movdqu xmm8,XMMWORD[96+rdi]
+ movups XMMWORD[112+rsi],xmm9
+ lea rsi,[128+rsi]
+ movdqu xmm9,XMMWORD[112+rdi]
+ lea rdi,[128+rdi]
+$L$ecb_enc_loop8_enter:
+
+ call _aesni_encrypt8
+
+ sub rdx,0x80
+ jnc NEAR $L$ecb_enc_loop8
+
+ movups XMMWORD[rsi],xmm2
+ mov rcx,r11
+ movups XMMWORD[16+rsi],xmm3
+ mov eax,r10d
+ movups XMMWORD[32+rsi],xmm4
+ movups XMMWORD[48+rsi],xmm5
+ movups XMMWORD[64+rsi],xmm6
+ movups XMMWORD[80+rsi],xmm7
+ movups XMMWORD[96+rsi],xmm8
+ movups XMMWORD[112+rsi],xmm9
+ lea rsi,[128+rsi]
+ add rdx,0x80
+ jz NEAR $L$ecb_ret
+
+$L$ecb_enc_tail:
+ movups xmm2,XMMWORD[rdi]
+ cmp rdx,0x20
+ jb NEAR $L$ecb_enc_one
+ movups xmm3,XMMWORD[16+rdi]
+ je NEAR $L$ecb_enc_two
+ movups xmm4,XMMWORD[32+rdi]
+ cmp rdx,0x40
+ jb NEAR $L$ecb_enc_three
+ movups xmm5,XMMWORD[48+rdi]
+ je NEAR $L$ecb_enc_four
+ movups xmm6,XMMWORD[64+rdi]
+ cmp rdx,0x60
+ jb NEAR $L$ecb_enc_five
+ movups xmm7,XMMWORD[80+rdi]
+ je NEAR $L$ecb_enc_six
+ movdqu xmm8,XMMWORD[96+rdi]
+ xorps xmm9,xmm9
+ call _aesni_encrypt8
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ movups XMMWORD[48+rsi],xmm5
+ movups XMMWORD[64+rsi],xmm6
+ movups XMMWORD[80+rsi],xmm7
+ movups XMMWORD[96+rsi],xmm8
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_enc_one:
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_enc1_3:
+DB 102,15,56,220,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_enc1_3
+DB 102,15,56,221,209
+ movups XMMWORD[rsi],xmm2
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_enc_two:
+ call _aesni_encrypt2
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_enc_three:
+ call _aesni_encrypt3
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_enc_four:
+ call _aesni_encrypt4
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ movups XMMWORD[48+rsi],xmm5
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_enc_five:
+ xorps xmm7,xmm7
+ call _aesni_encrypt6
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ movups XMMWORD[48+rsi],xmm5
+ movups XMMWORD[64+rsi],xmm6
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_enc_six:
+ call _aesni_encrypt6
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ movups XMMWORD[48+rsi],xmm5
+ movups XMMWORD[64+rsi],xmm6
+ movups XMMWORD[80+rsi],xmm7
+ jmp NEAR $L$ecb_ret
+
+ALIGN 16
+$L$ecb_decrypt:
+ cmp rdx,0x80
+ jb NEAR $L$ecb_dec_tail
+
+ movdqu xmm2,XMMWORD[rdi]
+ movdqu xmm3,XMMWORD[16+rdi]
+ movdqu xmm4,XMMWORD[32+rdi]
+ movdqu xmm5,XMMWORD[48+rdi]
+ movdqu xmm6,XMMWORD[64+rdi]
+ movdqu xmm7,XMMWORD[80+rdi]
+ movdqu xmm8,XMMWORD[96+rdi]
+ movdqu xmm9,XMMWORD[112+rdi]
+ lea rdi,[128+rdi]
+ sub rdx,0x80
+ jmp NEAR $L$ecb_dec_loop8_enter
+ALIGN 16
+$L$ecb_dec_loop8:
+ movups XMMWORD[rsi],xmm2
+ mov rcx,r11
+ movdqu xmm2,XMMWORD[rdi]
+ mov eax,r10d
+ movups XMMWORD[16+rsi],xmm3
+ movdqu xmm3,XMMWORD[16+rdi]
+ movups XMMWORD[32+rsi],xmm4
+ movdqu xmm4,XMMWORD[32+rdi]
+ movups XMMWORD[48+rsi],xmm5
+ movdqu xmm5,XMMWORD[48+rdi]
+ movups XMMWORD[64+rsi],xmm6
+ movdqu xmm6,XMMWORD[64+rdi]
+ movups XMMWORD[80+rsi],xmm7
+ movdqu xmm7,XMMWORD[80+rdi]
+ movups XMMWORD[96+rsi],xmm8
+ movdqu xmm8,XMMWORD[96+rdi]
+ movups XMMWORD[112+rsi],xmm9
+ lea rsi,[128+rsi]
+ movdqu xmm9,XMMWORD[112+rdi]
+ lea rdi,[128+rdi]
+$L$ecb_dec_loop8_enter:
+
+ call _aesni_decrypt8
+
+ movups xmm0,XMMWORD[r11]
+ sub rdx,0x80
+ jnc NEAR $L$ecb_dec_loop8
+
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ mov rcx,r11
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ mov eax,r10d
+ movups XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ movups XMMWORD[48+rsi],xmm5
+ pxor xmm5,xmm5
+ movups XMMWORD[64+rsi],xmm6
+ pxor xmm6,xmm6
+ movups XMMWORD[80+rsi],xmm7
+ pxor xmm7,xmm7
+ movups XMMWORD[96+rsi],xmm8
+ pxor xmm8,xmm8
+ movups XMMWORD[112+rsi],xmm9
+ pxor xmm9,xmm9
+ lea rsi,[128+rsi]
+ add rdx,0x80
+ jz NEAR $L$ecb_ret
+
+$L$ecb_dec_tail:
+ movups xmm2,XMMWORD[rdi]
+ cmp rdx,0x20
+ jb NEAR $L$ecb_dec_one
+ movups xmm3,XMMWORD[16+rdi]
+ je NEAR $L$ecb_dec_two
+ movups xmm4,XMMWORD[32+rdi]
+ cmp rdx,0x40
+ jb NEAR $L$ecb_dec_three
+ movups xmm5,XMMWORD[48+rdi]
+ je NEAR $L$ecb_dec_four
+ movups xmm6,XMMWORD[64+rdi]
+ cmp rdx,0x60
+ jb NEAR $L$ecb_dec_five
+ movups xmm7,XMMWORD[80+rdi]
+ je NEAR $L$ecb_dec_six
+ movups xmm8,XMMWORD[96+rdi]
+ movups xmm0,XMMWORD[rcx]
+ xorps xmm9,xmm9
+ call _aesni_decrypt8
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ movups XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ movups XMMWORD[48+rsi],xmm5
+ pxor xmm5,xmm5
+ movups XMMWORD[64+rsi],xmm6
+ pxor xmm6,xmm6
+ movups XMMWORD[80+rsi],xmm7
+ pxor xmm7,xmm7
+ movups XMMWORD[96+rsi],xmm8
+ pxor xmm8,xmm8
+ pxor xmm9,xmm9
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_dec_one:
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_dec1_4:
+DB 102,15,56,222,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_dec1_4
+DB 102,15,56,223,209
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_dec_two:
+ call _aesni_decrypt2
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_dec_three:
+ call _aesni_decrypt3
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ movups XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_dec_four:
+ call _aesni_decrypt4
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ movups XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ movups XMMWORD[48+rsi],xmm5
+ pxor xmm5,xmm5
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_dec_five:
+ xorps xmm7,xmm7
+ call _aesni_decrypt6
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ movups XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ movups XMMWORD[48+rsi],xmm5
+ pxor xmm5,xmm5
+ movups XMMWORD[64+rsi],xmm6
+ pxor xmm6,xmm6
+ pxor xmm7,xmm7
+ jmp NEAR $L$ecb_ret
+ALIGN 16
+$L$ecb_dec_six:
+ call _aesni_decrypt6
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ movups XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ movups XMMWORD[48+rsi],xmm5
+ pxor xmm5,xmm5
+ movups XMMWORD[64+rsi],xmm6
+ pxor xmm6,xmm6
+ movups XMMWORD[80+rsi],xmm7
+ pxor xmm7,xmm7
+
+$L$ecb_ret:
+ xorps xmm0,xmm0
+ pxor xmm1,xmm1
+ movaps xmm6,XMMWORD[rsp]
+ movaps XMMWORD[rsp],xmm0
+ movaps xmm7,XMMWORD[16+rsp]
+ movaps XMMWORD[16+rsp],xmm0
+ movaps xmm8,XMMWORD[32+rsp]
+ movaps XMMWORD[32+rsp],xmm0
+ movaps xmm9,XMMWORD[48+rsp]
+ movaps XMMWORD[48+rsp],xmm0
+ lea rsp,[88+rsp]
+$L$ecb_enc_ret:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_ecb_encrypt:
+global aesni_ccm64_encrypt_blocks
+
+ALIGN 16
+aesni_ccm64_encrypt_blocks:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_ccm64_encrypt_blocks:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+ lea rsp,[((-88))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+$L$ccm64_enc_body:
+ mov eax,DWORD[240+rcx]
+ movdqu xmm6,XMMWORD[r8]
+ movdqa xmm9,XMMWORD[$L$increment64]
+ movdqa xmm7,XMMWORD[$L$bswap_mask]
+
+ shl eax,4
+ mov r10d,16
+ lea r11,[rcx]
+ movdqu xmm3,XMMWORD[r9]
+ movdqa xmm2,xmm6
+ lea rcx,[32+rax*1+rcx]
+DB 102,15,56,0,247
+ sub r10,rax
+ jmp NEAR $L$ccm64_enc_outer
+ALIGN 16
+$L$ccm64_enc_outer:
+ movups xmm0,XMMWORD[r11]
+ mov rax,r10
+ movups xmm8,XMMWORD[rdi]
+
+ xorps xmm2,xmm0
+ movups xmm1,XMMWORD[16+r11]
+ xorps xmm0,xmm8
+ xorps xmm3,xmm0
+ movups xmm0,XMMWORD[32+r11]
+
+$L$ccm64_enc2_loop:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$ccm64_enc2_loop
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+ paddq xmm6,xmm9
+ dec rdx
+DB 102,15,56,221,208
+DB 102,15,56,221,216
+
+ lea rdi,[16+rdi]
+ xorps xmm8,xmm2
+ movdqa xmm2,xmm6
+ movups XMMWORD[rsi],xmm8
+DB 102,15,56,0,215
+ lea rsi,[16+rsi]
+ jnz NEAR $L$ccm64_enc_outer
+
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ movups XMMWORD[r9],xmm3
+ pxor xmm3,xmm3
+ pxor xmm8,xmm8
+ pxor xmm6,xmm6
+ movaps xmm6,XMMWORD[rsp]
+ movaps XMMWORD[rsp],xmm0
+ movaps xmm7,XMMWORD[16+rsp]
+ movaps XMMWORD[16+rsp],xmm0
+ movaps xmm8,XMMWORD[32+rsp]
+ movaps XMMWORD[32+rsp],xmm0
+ movaps xmm9,XMMWORD[48+rsp]
+ movaps XMMWORD[48+rsp],xmm0
+ lea rsp,[88+rsp]
+$L$ccm64_enc_ret:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+$L$SEH_end_aesni_ccm64_encrypt_blocks:
+global aesni_ccm64_decrypt_blocks
+
+ALIGN 16
+aesni_ccm64_decrypt_blocks:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_ccm64_decrypt_blocks:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+ lea rsp,[((-88))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+$L$ccm64_dec_body:
+ mov eax,DWORD[240+rcx]
+ movups xmm6,XMMWORD[r8]
+ movdqu xmm3,XMMWORD[r9]
+ movdqa xmm9,XMMWORD[$L$increment64]
+ movdqa xmm7,XMMWORD[$L$bswap_mask]
+
+ movaps xmm2,xmm6
+ mov r10d,eax
+ mov r11,rcx
+DB 102,15,56,0,247
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_enc1_5:
+DB 102,15,56,220,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_enc1_5
+DB 102,15,56,221,209
+ shl r10d,4
+ mov eax,16
+ movups xmm8,XMMWORD[rdi]
+ paddq xmm6,xmm9
+ lea rdi,[16+rdi]
+ sub rax,r10
+ lea rcx,[32+r10*1+r11]
+ mov r10,rax
+ jmp NEAR $L$ccm64_dec_outer
+ALIGN 16
+$L$ccm64_dec_outer:
+ xorps xmm8,xmm2
+ movdqa xmm2,xmm6
+ movups XMMWORD[rsi],xmm8
+ lea rsi,[16+rsi]
+DB 102,15,56,0,215
+
+ sub rdx,1
+ jz NEAR $L$ccm64_dec_break
+
+ movups xmm0,XMMWORD[r11]
+ mov rax,r10
+ movups xmm1,XMMWORD[16+r11]
+ xorps xmm8,xmm0
+ xorps xmm2,xmm0
+ xorps xmm3,xmm8
+ movups xmm0,XMMWORD[32+r11]
+ jmp NEAR $L$ccm64_dec2_loop
+ALIGN 16
+$L$ccm64_dec2_loop:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$ccm64_dec2_loop
+ movups xmm8,XMMWORD[rdi]
+ paddq xmm6,xmm9
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,221,208
+DB 102,15,56,221,216
+ lea rdi,[16+rdi]
+ jmp NEAR $L$ccm64_dec_outer
+
+ALIGN 16
+$L$ccm64_dec_break:
+
+ mov eax,DWORD[240+r11]
+ movups xmm0,XMMWORD[r11]
+ movups xmm1,XMMWORD[16+r11]
+ xorps xmm8,xmm0
+ lea r11,[32+r11]
+ xorps xmm3,xmm8
+$L$oop_enc1_6:
+DB 102,15,56,220,217
+ dec eax
+ movups xmm1,XMMWORD[r11]
+ lea r11,[16+r11]
+ jnz NEAR $L$oop_enc1_6
+DB 102,15,56,221,217
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ movups XMMWORD[r9],xmm3
+ pxor xmm3,xmm3
+ pxor xmm8,xmm8
+ pxor xmm6,xmm6
+ movaps xmm6,XMMWORD[rsp]
+ movaps XMMWORD[rsp],xmm0
+ movaps xmm7,XMMWORD[16+rsp]
+ movaps XMMWORD[16+rsp],xmm0
+ movaps xmm8,XMMWORD[32+rsp]
+ movaps XMMWORD[32+rsp],xmm0
+ movaps xmm9,XMMWORD[48+rsp]
+ movaps XMMWORD[48+rsp],xmm0
+ lea rsp,[88+rsp]
+$L$ccm64_dec_ret:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+$L$SEH_end_aesni_ccm64_decrypt_blocks:
+global aesni_ctr32_encrypt_blocks
+
+ALIGN 16
+aesni_ctr32_encrypt_blocks:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_ctr32_encrypt_blocks:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+
+
+
+ cmp rdx,1
+ jne NEAR $L$ctr32_bulk
+
+
+
+ movups xmm2,XMMWORD[r8]
+ movups xmm3,XMMWORD[rdi]
+ mov edx,DWORD[240+rcx]
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_enc1_7:
+DB 102,15,56,220,209
+ dec edx
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_enc1_7
+DB 102,15,56,221,209
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ xorps xmm2,xmm3
+ pxor xmm3,xmm3
+ movups XMMWORD[rsi],xmm2
+ xorps xmm2,xmm2
+ jmp NEAR $L$ctr32_epilogue
+
+ALIGN 16
+$L$ctr32_bulk:
+ lea r11,[rsp]
+
+ push rbp
+
+ sub rsp,288
+ and rsp,-16
+ movaps XMMWORD[(-168)+r11],xmm6
+ movaps XMMWORD[(-152)+r11],xmm7
+ movaps XMMWORD[(-136)+r11],xmm8
+ movaps XMMWORD[(-120)+r11],xmm9
+ movaps XMMWORD[(-104)+r11],xmm10
+ movaps XMMWORD[(-88)+r11],xmm11
+ movaps XMMWORD[(-72)+r11],xmm12
+ movaps XMMWORD[(-56)+r11],xmm13
+ movaps XMMWORD[(-40)+r11],xmm14
+ movaps XMMWORD[(-24)+r11],xmm15
+$L$ctr32_body:
+
+
+
+
+ movdqu xmm2,XMMWORD[r8]
+ movdqu xmm0,XMMWORD[rcx]
+ mov r8d,DWORD[12+r8]
+ pxor xmm2,xmm0
+ mov ebp,DWORD[12+rcx]
+ movdqa XMMWORD[rsp],xmm2
+ bswap r8d
+ movdqa xmm3,xmm2
+ movdqa xmm4,xmm2
+ movdqa xmm5,xmm2
+ movdqa XMMWORD[64+rsp],xmm2
+ movdqa XMMWORD[80+rsp],xmm2
+ movdqa XMMWORD[96+rsp],xmm2
+ mov r10,rdx
+ movdqa XMMWORD[112+rsp],xmm2
+
+ lea rax,[1+r8]
+ lea rdx,[2+r8]
+ bswap eax
+ bswap edx
+ xor eax,ebp
+ xor edx,ebp
+DB 102,15,58,34,216,3
+ lea rax,[3+r8]
+ movdqa XMMWORD[16+rsp],xmm3
+DB 102,15,58,34,226,3
+ bswap eax
+ mov rdx,r10
+ lea r10,[4+r8]
+ movdqa XMMWORD[32+rsp],xmm4
+ xor eax,ebp
+ bswap r10d
+DB 102,15,58,34,232,3
+ xor r10d,ebp
+ movdqa XMMWORD[48+rsp],xmm5
+ lea r9,[5+r8]
+ mov DWORD[((64+12))+rsp],r10d
+ bswap r9d
+ lea r10,[6+r8]
+ mov eax,DWORD[240+rcx]
+ xor r9d,ebp
+ bswap r10d
+ mov DWORD[((80+12))+rsp],r9d
+ xor r10d,ebp
+ lea r9,[7+r8]
+ mov DWORD[((96+12))+rsp],r10d
+ bswap r9d
+ mov r10d,DWORD[((OPENSSL_ia32cap_P+4))]
+ xor r9d,ebp
+ and r10d,71303168
+ mov DWORD[((112+12))+rsp],r9d
+
+ movups xmm1,XMMWORD[16+rcx]
+
+ movdqa xmm6,XMMWORD[64+rsp]
+ movdqa xmm7,XMMWORD[80+rsp]
+
+ cmp rdx,8
+ jb NEAR $L$ctr32_tail
+
+ sub rdx,6
+ cmp r10d,4194304
+ je NEAR $L$ctr32_6x
+
+ lea rcx,[128+rcx]
+ sub rdx,2
+ jmp NEAR $L$ctr32_loop8
+
+ALIGN 16
+$L$ctr32_6x:
+ shl eax,4
+ mov r10d,48
+ bswap ebp
+ lea rcx,[32+rax*1+rcx]
+ sub r10,rax
+ jmp NEAR $L$ctr32_loop6
+
+ALIGN 16
+$L$ctr32_loop6:
+ add r8d,6
+ movups xmm0,XMMWORD[((-48))+r10*1+rcx]
+DB 102,15,56,220,209
+ mov eax,r8d
+ xor eax,ebp
+DB 102,15,56,220,217
+DB 0x0f,0x38,0xf1,0x44,0x24,12
+ lea eax,[1+r8]
+DB 102,15,56,220,225
+ xor eax,ebp
+DB 0x0f,0x38,0xf1,0x44,0x24,28
+DB 102,15,56,220,233
+ lea eax,[2+r8]
+ xor eax,ebp
+DB 102,15,56,220,241
+DB 0x0f,0x38,0xf1,0x44,0x24,44
+ lea eax,[3+r8]
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[((-32))+r10*1+rcx]
+ xor eax,ebp
+
+DB 102,15,56,220,208
+DB 0x0f,0x38,0xf1,0x44,0x24,60
+ lea eax,[4+r8]
+DB 102,15,56,220,216
+ xor eax,ebp
+DB 0x0f,0x38,0xf1,0x44,0x24,76
+DB 102,15,56,220,224
+ lea eax,[5+r8]
+ xor eax,ebp
+DB 102,15,56,220,232
+DB 0x0f,0x38,0xf1,0x44,0x24,92
+ mov rax,r10
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+ movups xmm0,XMMWORD[((-16))+r10*1+rcx]
+
+ call $L$enc_loop6
+
+ movdqu xmm8,XMMWORD[rdi]
+ movdqu xmm9,XMMWORD[16+rdi]
+ movdqu xmm10,XMMWORD[32+rdi]
+ movdqu xmm11,XMMWORD[48+rdi]
+ movdqu xmm12,XMMWORD[64+rdi]
+ movdqu xmm13,XMMWORD[80+rdi]
+ lea rdi,[96+rdi]
+ movups xmm1,XMMWORD[((-64))+r10*1+rcx]
+ pxor xmm8,xmm2
+ movaps xmm2,XMMWORD[rsp]
+ pxor xmm9,xmm3
+ movaps xmm3,XMMWORD[16+rsp]
+ pxor xmm10,xmm4
+ movaps xmm4,XMMWORD[32+rsp]
+ pxor xmm11,xmm5
+ movaps xmm5,XMMWORD[48+rsp]
+ pxor xmm12,xmm6
+ movaps xmm6,XMMWORD[64+rsp]
+ pxor xmm13,xmm7
+ movaps xmm7,XMMWORD[80+rsp]
+ movdqu XMMWORD[rsi],xmm8
+ movdqu XMMWORD[16+rsi],xmm9
+ movdqu XMMWORD[32+rsi],xmm10
+ movdqu XMMWORD[48+rsi],xmm11
+ movdqu XMMWORD[64+rsi],xmm12
+ movdqu XMMWORD[80+rsi],xmm13
+ lea rsi,[96+rsi]
+
+ sub rdx,6
+ jnc NEAR $L$ctr32_loop6
+
+ add rdx,6
+ jz NEAR $L$ctr32_done
+
+ lea eax,[((-48))+r10]
+ lea rcx,[((-80))+r10*1+rcx]
+ neg eax
+ shr eax,4
+ jmp NEAR $L$ctr32_tail
+
+ALIGN 32
+$L$ctr32_loop8:
+ add r8d,8
+ movdqa xmm8,XMMWORD[96+rsp]
+DB 102,15,56,220,209
+ mov r9d,r8d
+ movdqa xmm9,XMMWORD[112+rsp]
+DB 102,15,56,220,217
+ bswap r9d
+ movups xmm0,XMMWORD[((32-128))+rcx]
+DB 102,15,56,220,225
+ xor r9d,ebp
+ nop
+DB 102,15,56,220,233
+ mov DWORD[((0+12))+rsp],r9d
+ lea r9,[1+r8]
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+DB 102,68,15,56,220,201
+ movups xmm1,XMMWORD[((48-128))+rcx]
+ bswap r9d
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+ xor r9d,ebp
+DB 0x66,0x90
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ mov DWORD[((16+12))+rsp],r9d
+ lea r9,[2+r8]
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+DB 102,68,15,56,220,192
+DB 102,68,15,56,220,200
+ movups xmm0,XMMWORD[((64-128))+rcx]
+ bswap r9d
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+ xor r9d,ebp
+DB 0x66,0x90
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ mov DWORD[((32+12))+rsp],r9d
+ lea r9,[3+r8]
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+DB 102,68,15,56,220,201
+ movups xmm1,XMMWORD[((80-128))+rcx]
+ bswap r9d
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+ xor r9d,ebp
+DB 0x66,0x90
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ mov DWORD[((48+12))+rsp],r9d
+ lea r9,[4+r8]
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+DB 102,68,15,56,220,192
+DB 102,68,15,56,220,200
+ movups xmm0,XMMWORD[((96-128))+rcx]
+ bswap r9d
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+ xor r9d,ebp
+DB 0x66,0x90
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ mov DWORD[((64+12))+rsp],r9d
+ lea r9,[5+r8]
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+DB 102,68,15,56,220,201
+ movups xmm1,XMMWORD[((112-128))+rcx]
+ bswap r9d
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+ xor r9d,ebp
+DB 0x66,0x90
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ mov DWORD[((80+12))+rsp],r9d
+ lea r9,[6+r8]
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+DB 102,68,15,56,220,192
+DB 102,68,15,56,220,200
+ movups xmm0,XMMWORD[((128-128))+rcx]
+ bswap r9d
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+ xor r9d,ebp
+DB 0x66,0x90
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ mov DWORD[((96+12))+rsp],r9d
+ lea r9,[7+r8]
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+DB 102,68,15,56,220,201
+ movups xmm1,XMMWORD[((144-128))+rcx]
+ bswap r9d
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+ xor r9d,ebp
+ movdqu xmm10,XMMWORD[rdi]
+DB 102,15,56,220,232
+ mov DWORD[((112+12))+rsp],r9d
+ cmp eax,11
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+DB 102,68,15,56,220,192
+DB 102,68,15,56,220,200
+ movups xmm0,XMMWORD[((160-128))+rcx]
+
+ jb NEAR $L$ctr32_enc_done
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+DB 102,68,15,56,220,201
+ movups xmm1,XMMWORD[((176-128))+rcx]
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+DB 102,68,15,56,220,192
+DB 102,68,15,56,220,200
+ movups xmm0,XMMWORD[((192-128))+rcx]
+ je NEAR $L$ctr32_enc_done
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+DB 102,68,15,56,220,201
+ movups xmm1,XMMWORD[((208-128))+rcx]
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+DB 102,68,15,56,220,192
+DB 102,68,15,56,220,200
+ movups xmm0,XMMWORD[((224-128))+rcx]
+ jmp NEAR $L$ctr32_enc_done
+
+ALIGN 16
+$L$ctr32_enc_done:
+ movdqu xmm11,XMMWORD[16+rdi]
+ pxor xmm10,xmm0
+ movdqu xmm12,XMMWORD[32+rdi]
+ pxor xmm11,xmm0
+ movdqu xmm13,XMMWORD[48+rdi]
+ pxor xmm12,xmm0
+ movdqu xmm14,XMMWORD[64+rdi]
+ pxor xmm13,xmm0
+ movdqu xmm15,XMMWORD[80+rdi]
+ pxor xmm14,xmm0
+ pxor xmm15,xmm0
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+DB 102,68,15,56,220,201
+ movdqu xmm1,XMMWORD[96+rdi]
+ lea rdi,[128+rdi]
+
+DB 102,65,15,56,221,210
+ pxor xmm1,xmm0
+ movdqu xmm10,XMMWORD[((112-128))+rdi]
+DB 102,65,15,56,221,219
+ pxor xmm10,xmm0
+ movdqa xmm11,XMMWORD[rsp]
+DB 102,65,15,56,221,228
+DB 102,65,15,56,221,237
+ movdqa xmm12,XMMWORD[16+rsp]
+ movdqa xmm13,XMMWORD[32+rsp]
+DB 102,65,15,56,221,246
+DB 102,65,15,56,221,255
+ movdqa xmm14,XMMWORD[48+rsp]
+ movdqa xmm15,XMMWORD[64+rsp]
+DB 102,68,15,56,221,193
+ movdqa xmm0,XMMWORD[80+rsp]
+ movups xmm1,XMMWORD[((16-128))+rcx]
+DB 102,69,15,56,221,202
+
+ movups XMMWORD[rsi],xmm2
+ movdqa xmm2,xmm11
+ movups XMMWORD[16+rsi],xmm3
+ movdqa xmm3,xmm12
+ movups XMMWORD[32+rsi],xmm4
+ movdqa xmm4,xmm13
+ movups XMMWORD[48+rsi],xmm5
+ movdqa xmm5,xmm14
+ movups XMMWORD[64+rsi],xmm6
+ movdqa xmm6,xmm15
+ movups XMMWORD[80+rsi],xmm7
+ movdqa xmm7,xmm0
+ movups XMMWORD[96+rsi],xmm8
+ movups XMMWORD[112+rsi],xmm9
+ lea rsi,[128+rsi]
+
+ sub rdx,8
+ jnc NEAR $L$ctr32_loop8
+
+ add rdx,8
+ jz NEAR $L$ctr32_done
+ lea rcx,[((-128))+rcx]
+
+$L$ctr32_tail:
+
+
+ lea rcx,[16+rcx]
+ cmp rdx,4
+ jb NEAR $L$ctr32_loop3
+ je NEAR $L$ctr32_loop4
+
+
+ shl eax,4
+ movdqa xmm8,XMMWORD[96+rsp]
+ pxor xmm9,xmm9
+
+ movups xmm0,XMMWORD[16+rcx]
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+ lea rcx,[((32-16))+rax*1+rcx]
+ neg rax
+DB 102,15,56,220,225
+ add rax,16
+ movups xmm10,XMMWORD[rdi]
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+ movups xmm11,XMMWORD[16+rdi]
+ movups xmm12,XMMWORD[32+rdi]
+DB 102,15,56,220,249
+DB 102,68,15,56,220,193
+
+ call $L$enc_loop8_enter
+
+ movdqu xmm13,XMMWORD[48+rdi]
+ pxor xmm2,xmm10
+ movdqu xmm10,XMMWORD[64+rdi]
+ pxor xmm3,xmm11
+ movdqu XMMWORD[rsi],xmm2
+ pxor xmm4,xmm12
+ movdqu XMMWORD[16+rsi],xmm3
+ pxor xmm5,xmm13
+ movdqu XMMWORD[32+rsi],xmm4
+ pxor xmm6,xmm10
+ movdqu XMMWORD[48+rsi],xmm5
+ movdqu XMMWORD[64+rsi],xmm6
+ cmp rdx,6
+ jb NEAR $L$ctr32_done
+
+ movups xmm11,XMMWORD[80+rdi]
+ xorps xmm7,xmm11
+ movups XMMWORD[80+rsi],xmm7
+ je NEAR $L$ctr32_done
+
+ movups xmm12,XMMWORD[96+rdi]
+ xorps xmm8,xmm12
+ movups XMMWORD[96+rsi],xmm8
+ jmp NEAR $L$ctr32_done
+
+ALIGN 32
+$L$ctr32_loop4:
+DB 102,15,56,220,209
+ lea rcx,[16+rcx]
+ dec eax
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[rcx]
+ jnz NEAR $L$ctr32_loop4
+DB 102,15,56,221,209
+DB 102,15,56,221,217
+ movups xmm10,XMMWORD[rdi]
+ movups xmm11,XMMWORD[16+rdi]
+DB 102,15,56,221,225
+DB 102,15,56,221,233
+ movups xmm12,XMMWORD[32+rdi]
+ movups xmm13,XMMWORD[48+rdi]
+
+ xorps xmm2,xmm10
+ movups XMMWORD[rsi],xmm2
+ xorps xmm3,xmm11
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm4,xmm12
+ movdqu XMMWORD[32+rsi],xmm4
+ pxor xmm5,xmm13
+ movdqu XMMWORD[48+rsi],xmm5
+ jmp NEAR $L$ctr32_done
+
+ALIGN 32
+$L$ctr32_loop3:
+DB 102,15,56,220,209
+ lea rcx,[16+rcx]
+ dec eax
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+ movups xmm1,XMMWORD[rcx]
+ jnz NEAR $L$ctr32_loop3
+DB 102,15,56,221,209
+DB 102,15,56,221,217
+DB 102,15,56,221,225
+
+ movups xmm10,XMMWORD[rdi]
+ xorps xmm2,xmm10
+ movups XMMWORD[rsi],xmm2
+ cmp rdx,2
+ jb NEAR $L$ctr32_done
+
+ movups xmm11,XMMWORD[16+rdi]
+ xorps xmm3,xmm11
+ movups XMMWORD[16+rsi],xmm3
+ je NEAR $L$ctr32_done
+
+ movups xmm12,XMMWORD[32+rdi]
+ xorps xmm4,xmm12
+ movups XMMWORD[32+rsi],xmm4
+
+$L$ctr32_done:
+ xorps xmm0,xmm0
+ xor ebp,ebp
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ movaps xmm6,XMMWORD[((-168))+r11]
+ movaps XMMWORD[(-168)+r11],xmm0
+ movaps xmm7,XMMWORD[((-152))+r11]
+ movaps XMMWORD[(-152)+r11],xmm0
+ movaps xmm8,XMMWORD[((-136))+r11]
+ movaps XMMWORD[(-136)+r11],xmm0
+ movaps xmm9,XMMWORD[((-120))+r11]
+ movaps XMMWORD[(-120)+r11],xmm0
+ movaps xmm10,XMMWORD[((-104))+r11]
+ movaps XMMWORD[(-104)+r11],xmm0
+ movaps xmm11,XMMWORD[((-88))+r11]
+ movaps XMMWORD[(-88)+r11],xmm0
+ movaps xmm12,XMMWORD[((-72))+r11]
+ movaps XMMWORD[(-72)+r11],xmm0
+ movaps xmm13,XMMWORD[((-56))+r11]
+ movaps XMMWORD[(-56)+r11],xmm0
+ movaps xmm14,XMMWORD[((-40))+r11]
+ movaps XMMWORD[(-40)+r11],xmm0
+ movaps xmm15,XMMWORD[((-24))+r11]
+ movaps XMMWORD[(-24)+r11],xmm0
+ movaps XMMWORD[rsp],xmm0
+ movaps XMMWORD[16+rsp],xmm0
+ movaps XMMWORD[32+rsp],xmm0
+ movaps XMMWORD[48+rsp],xmm0
+ movaps XMMWORD[64+rsp],xmm0
+ movaps XMMWORD[80+rsp],xmm0
+ movaps XMMWORD[96+rsp],xmm0
+ movaps XMMWORD[112+rsp],xmm0
+ mov rbp,QWORD[((-8))+r11]
+
+ lea rsp,[r11]
+
+$L$ctr32_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_ctr32_encrypt_blocks:
+global aesni_xts_encrypt
+
+ALIGN 16
+aesni_xts_encrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_xts_encrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ lea r11,[rsp]
+
+ push rbp
+
+ sub rsp,272
+ and rsp,-16
+ movaps XMMWORD[(-168)+r11],xmm6
+ movaps XMMWORD[(-152)+r11],xmm7
+ movaps XMMWORD[(-136)+r11],xmm8
+ movaps XMMWORD[(-120)+r11],xmm9
+ movaps XMMWORD[(-104)+r11],xmm10
+ movaps XMMWORD[(-88)+r11],xmm11
+ movaps XMMWORD[(-72)+r11],xmm12
+ movaps XMMWORD[(-56)+r11],xmm13
+ movaps XMMWORD[(-40)+r11],xmm14
+ movaps XMMWORD[(-24)+r11],xmm15
+$L$xts_enc_body:
+ movups xmm2,XMMWORD[r9]
+ mov eax,DWORD[240+r8]
+ mov r10d,DWORD[240+rcx]
+ movups xmm0,XMMWORD[r8]
+ movups xmm1,XMMWORD[16+r8]
+ lea r8,[32+r8]
+ xorps xmm2,xmm0
+$L$oop_enc1_8:
+DB 102,15,56,220,209
+ dec eax
+ movups xmm1,XMMWORD[r8]
+ lea r8,[16+r8]
+ jnz NEAR $L$oop_enc1_8
+DB 102,15,56,221,209
+ movups xmm0,XMMWORD[rcx]
+ mov rbp,rcx
+ mov eax,r10d
+ shl r10d,4
+ mov r9,rdx
+ and rdx,-16
+
+ movups xmm1,XMMWORD[16+r10*1+rcx]
+
+ movdqa xmm8,XMMWORD[$L$xts_magic]
+ movdqa xmm15,xmm2
+ pshufd xmm9,xmm2,0x5f
+ pxor xmm1,xmm0
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+ movdqa xmm10,xmm15
+ psrad xmm14,31
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+ pxor xmm10,xmm0
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+ movdqa xmm11,xmm15
+ psrad xmm14,31
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+ pxor xmm11,xmm0
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+ movdqa xmm12,xmm15
+ psrad xmm14,31
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+ pxor xmm12,xmm0
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+ movdqa xmm13,xmm15
+ psrad xmm14,31
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+ pxor xmm13,xmm0
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm15
+ psrad xmm9,31
+ paddq xmm15,xmm15
+ pand xmm9,xmm8
+ pxor xmm14,xmm0
+ pxor xmm15,xmm9
+ movaps XMMWORD[96+rsp],xmm1
+
+ sub rdx,16*6
+ jc NEAR $L$xts_enc_short
+
+ mov eax,16+96
+ lea rcx,[32+r10*1+rbp]
+ sub rax,r10
+ movups xmm1,XMMWORD[16+rbp]
+ mov r10,rax
+ lea r8,[$L$xts_magic]
+ jmp NEAR $L$xts_enc_grandloop
+
+ALIGN 32
+$L$xts_enc_grandloop:
+ movdqu xmm2,XMMWORD[rdi]
+ movdqa xmm8,xmm0
+ movdqu xmm3,XMMWORD[16+rdi]
+ pxor xmm2,xmm10
+ movdqu xmm4,XMMWORD[32+rdi]
+ pxor xmm3,xmm11
+DB 102,15,56,220,209
+ movdqu xmm5,XMMWORD[48+rdi]
+ pxor xmm4,xmm12
+DB 102,15,56,220,217
+ movdqu xmm6,XMMWORD[64+rdi]
+ pxor xmm5,xmm13
+DB 102,15,56,220,225
+ movdqu xmm7,XMMWORD[80+rdi]
+ pxor xmm8,xmm15
+ movdqa xmm9,XMMWORD[96+rsp]
+ pxor xmm6,xmm14
+DB 102,15,56,220,233
+ movups xmm0,XMMWORD[32+rbp]
+ lea rdi,[96+rdi]
+ pxor xmm7,xmm8
+
+ pxor xmm10,xmm9
+DB 102,15,56,220,241
+ pxor xmm11,xmm9
+ movdqa XMMWORD[rsp],xmm10
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[48+rbp]
+ pxor xmm12,xmm9
+
+DB 102,15,56,220,208
+ pxor xmm13,xmm9
+ movdqa XMMWORD[16+rsp],xmm11
+DB 102,15,56,220,216
+ pxor xmm14,xmm9
+ movdqa XMMWORD[32+rsp],xmm12
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ pxor xmm8,xmm9
+ movdqa XMMWORD[64+rsp],xmm14
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+ movups xmm0,XMMWORD[64+rbp]
+ movdqa XMMWORD[80+rsp],xmm8
+ pshufd xmm9,xmm15,0x5f
+ jmp NEAR $L$xts_enc_loop6
+ALIGN 32
+$L$xts_enc_loop6:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[((-64))+rax*1+rcx]
+ add rax,32
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+ movups xmm0,XMMWORD[((-80))+rax*1+rcx]
+ jnz NEAR $L$xts_enc_loop6
+
+ movdqa xmm8,XMMWORD[r8]
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+DB 102,15,56,220,209
+ paddq xmm15,xmm15
+ psrad xmm14,31
+DB 102,15,56,220,217
+ pand xmm14,xmm8
+ movups xmm10,XMMWORD[rbp]
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+ pxor xmm15,xmm14
+ movaps xmm11,xmm10
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[((-64))+rcx]
+
+ movdqa xmm14,xmm9
+DB 102,15,56,220,208
+ paddd xmm9,xmm9
+ pxor xmm10,xmm15
+DB 102,15,56,220,216
+ psrad xmm14,31
+ paddq xmm15,xmm15
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ pand xmm14,xmm8
+ movaps xmm12,xmm11
+DB 102,15,56,220,240
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm9
+DB 102,15,56,220,248
+ movups xmm0,XMMWORD[((-48))+rcx]
+
+ paddd xmm9,xmm9
+DB 102,15,56,220,209
+ pxor xmm11,xmm15
+ psrad xmm14,31
+DB 102,15,56,220,217
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movdqa XMMWORD[48+rsp],xmm13
+ pxor xmm15,xmm14
+DB 102,15,56,220,241
+ movaps xmm13,xmm12
+ movdqa xmm14,xmm9
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[((-32))+rcx]
+
+ paddd xmm9,xmm9
+DB 102,15,56,220,208
+ pxor xmm12,xmm15
+ psrad xmm14,31
+DB 102,15,56,220,216
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+DB 102,15,56,220,240
+ pxor xmm15,xmm14
+ movaps xmm14,xmm13
+DB 102,15,56,220,248
+
+ movdqa xmm0,xmm9
+ paddd xmm9,xmm9
+DB 102,15,56,220,209
+ pxor xmm13,xmm15
+ psrad xmm0,31
+DB 102,15,56,220,217
+ paddq xmm15,xmm15
+ pand xmm0,xmm8
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ pxor xmm15,xmm0
+ movups xmm0,XMMWORD[rbp]
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[16+rbp]
+
+ pxor xmm14,xmm15
+DB 102,15,56,221,84,36,0
+ psrad xmm9,31
+ paddq xmm15,xmm15
+DB 102,15,56,221,92,36,16
+DB 102,15,56,221,100,36,32
+ pand xmm9,xmm8
+ mov rax,r10
+DB 102,15,56,221,108,36,48
+DB 102,15,56,221,116,36,64
+DB 102,15,56,221,124,36,80
+ pxor xmm15,xmm9
+
+ lea rsi,[96+rsi]
+ movups XMMWORD[(-96)+rsi],xmm2
+ movups XMMWORD[(-80)+rsi],xmm3
+ movups XMMWORD[(-64)+rsi],xmm4
+ movups XMMWORD[(-48)+rsi],xmm5
+ movups XMMWORD[(-32)+rsi],xmm6
+ movups XMMWORD[(-16)+rsi],xmm7
+ sub rdx,16*6
+ jnc NEAR $L$xts_enc_grandloop
+
+ mov eax,16+96
+ sub eax,r10d
+ mov rcx,rbp
+ shr eax,4
+
+$L$xts_enc_short:
+
+ mov r10d,eax
+ pxor xmm10,xmm0
+ add rdx,16*6
+ jz NEAR $L$xts_enc_done
+
+ pxor xmm11,xmm0
+ cmp rdx,0x20
+ jb NEAR $L$xts_enc_one
+ pxor xmm12,xmm0
+ je NEAR $L$xts_enc_two
+
+ pxor xmm13,xmm0
+ cmp rdx,0x40
+ jb NEAR $L$xts_enc_three
+ pxor xmm14,xmm0
+ je NEAR $L$xts_enc_four
+
+ movdqu xmm2,XMMWORD[rdi]
+ movdqu xmm3,XMMWORD[16+rdi]
+ movdqu xmm4,XMMWORD[32+rdi]
+ pxor xmm2,xmm10
+ movdqu xmm5,XMMWORD[48+rdi]
+ pxor xmm3,xmm11
+ movdqu xmm6,XMMWORD[64+rdi]
+ lea rdi,[80+rdi]
+ pxor xmm4,xmm12
+ pxor xmm5,xmm13
+ pxor xmm6,xmm14
+ pxor xmm7,xmm7
+
+ call _aesni_encrypt6
+
+ xorps xmm2,xmm10
+ movdqa xmm10,xmm15
+ xorps xmm3,xmm11
+ xorps xmm4,xmm12
+ movdqu XMMWORD[rsi],xmm2
+ xorps xmm5,xmm13
+ movdqu XMMWORD[16+rsi],xmm3
+ xorps xmm6,xmm14
+ movdqu XMMWORD[32+rsi],xmm4
+ movdqu XMMWORD[48+rsi],xmm5
+ movdqu XMMWORD[64+rsi],xmm6
+ lea rsi,[80+rsi]
+ jmp NEAR $L$xts_enc_done
+
+ALIGN 16
+$L$xts_enc_one:
+ movups xmm2,XMMWORD[rdi]
+ lea rdi,[16+rdi]
+ xorps xmm2,xmm10
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_enc1_9:
+DB 102,15,56,220,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_enc1_9
+DB 102,15,56,221,209
+ xorps xmm2,xmm10
+ movdqa xmm10,xmm11
+ movups XMMWORD[rsi],xmm2
+ lea rsi,[16+rsi]
+ jmp NEAR $L$xts_enc_done
+
+ALIGN 16
+$L$xts_enc_two:
+ movups xmm2,XMMWORD[rdi]
+ movups xmm3,XMMWORD[16+rdi]
+ lea rdi,[32+rdi]
+ xorps xmm2,xmm10
+ xorps xmm3,xmm11
+
+ call _aesni_encrypt2
+
+ xorps xmm2,xmm10
+ movdqa xmm10,xmm12
+ xorps xmm3,xmm11
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ lea rsi,[32+rsi]
+ jmp NEAR $L$xts_enc_done
+
+ALIGN 16
+$L$xts_enc_three:
+ movups xmm2,XMMWORD[rdi]
+ movups xmm3,XMMWORD[16+rdi]
+ movups xmm4,XMMWORD[32+rdi]
+ lea rdi,[48+rdi]
+ xorps xmm2,xmm10
+ xorps xmm3,xmm11
+ xorps xmm4,xmm12
+
+ call _aesni_encrypt3
+
+ xorps xmm2,xmm10
+ movdqa xmm10,xmm13
+ xorps xmm3,xmm11
+ xorps xmm4,xmm12
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ lea rsi,[48+rsi]
+ jmp NEAR $L$xts_enc_done
+
+ALIGN 16
+$L$xts_enc_four:
+ movups xmm2,XMMWORD[rdi]
+ movups xmm3,XMMWORD[16+rdi]
+ movups xmm4,XMMWORD[32+rdi]
+ xorps xmm2,xmm10
+ movups xmm5,XMMWORD[48+rdi]
+ lea rdi,[64+rdi]
+ xorps xmm3,xmm11
+ xorps xmm4,xmm12
+ xorps xmm5,xmm13
+
+ call _aesni_encrypt4
+
+ pxor xmm2,xmm10
+ movdqa xmm10,xmm14
+ pxor xmm3,xmm11
+ pxor xmm4,xmm12
+ movdqu XMMWORD[rsi],xmm2
+ pxor xmm5,xmm13
+ movdqu XMMWORD[16+rsi],xmm3
+ movdqu XMMWORD[32+rsi],xmm4
+ movdqu XMMWORD[48+rsi],xmm5
+ lea rsi,[64+rsi]
+ jmp NEAR $L$xts_enc_done
+
+ALIGN 16
+$L$xts_enc_done:
+ and r9,15
+ jz NEAR $L$xts_enc_ret
+ mov rdx,r9
+
+$L$xts_enc_steal:
+ movzx eax,BYTE[rdi]
+ movzx ecx,BYTE[((-16))+rsi]
+ lea rdi,[1+rdi]
+ mov BYTE[((-16))+rsi],al
+ mov BYTE[rsi],cl
+ lea rsi,[1+rsi]
+ sub rdx,1
+ jnz NEAR $L$xts_enc_steal
+
+ sub rsi,r9
+ mov rcx,rbp
+ mov eax,r10d
+
+ movups xmm2,XMMWORD[((-16))+rsi]
+ xorps xmm2,xmm10
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_enc1_10:
+DB 102,15,56,220,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_enc1_10
+DB 102,15,56,221,209
+ xorps xmm2,xmm10
+ movups XMMWORD[(-16)+rsi],xmm2
+
+$L$xts_enc_ret:
+ xorps xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ movaps xmm6,XMMWORD[((-168))+r11]
+ movaps XMMWORD[(-168)+r11],xmm0
+ movaps xmm7,XMMWORD[((-152))+r11]
+ movaps XMMWORD[(-152)+r11],xmm0
+ movaps xmm8,XMMWORD[((-136))+r11]
+ movaps XMMWORD[(-136)+r11],xmm0
+ movaps xmm9,XMMWORD[((-120))+r11]
+ movaps XMMWORD[(-120)+r11],xmm0
+ movaps xmm10,XMMWORD[((-104))+r11]
+ movaps XMMWORD[(-104)+r11],xmm0
+ movaps xmm11,XMMWORD[((-88))+r11]
+ movaps XMMWORD[(-88)+r11],xmm0
+ movaps xmm12,XMMWORD[((-72))+r11]
+ movaps XMMWORD[(-72)+r11],xmm0
+ movaps xmm13,XMMWORD[((-56))+r11]
+ movaps XMMWORD[(-56)+r11],xmm0
+ movaps xmm14,XMMWORD[((-40))+r11]
+ movaps XMMWORD[(-40)+r11],xmm0
+ movaps xmm15,XMMWORD[((-24))+r11]
+ movaps XMMWORD[(-24)+r11],xmm0
+ movaps XMMWORD[rsp],xmm0
+ movaps XMMWORD[16+rsp],xmm0
+ movaps XMMWORD[32+rsp],xmm0
+ movaps XMMWORD[48+rsp],xmm0
+ movaps XMMWORD[64+rsp],xmm0
+ movaps XMMWORD[80+rsp],xmm0
+ movaps XMMWORD[96+rsp],xmm0
+ mov rbp,QWORD[((-8))+r11]
+
+ lea rsp,[r11]
+
+$L$xts_enc_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_xts_encrypt:
+global aesni_xts_decrypt
+
+ALIGN 16
+aesni_xts_decrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_xts_decrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ lea r11,[rsp]
+
+ push rbp
+
+ sub rsp,272
+ and rsp,-16
+ movaps XMMWORD[(-168)+r11],xmm6
+ movaps XMMWORD[(-152)+r11],xmm7
+ movaps XMMWORD[(-136)+r11],xmm8
+ movaps XMMWORD[(-120)+r11],xmm9
+ movaps XMMWORD[(-104)+r11],xmm10
+ movaps XMMWORD[(-88)+r11],xmm11
+ movaps XMMWORD[(-72)+r11],xmm12
+ movaps XMMWORD[(-56)+r11],xmm13
+ movaps XMMWORD[(-40)+r11],xmm14
+ movaps XMMWORD[(-24)+r11],xmm15
+$L$xts_dec_body:
+ movups xmm2,XMMWORD[r9]
+ mov eax,DWORD[240+r8]
+ mov r10d,DWORD[240+rcx]
+ movups xmm0,XMMWORD[r8]
+ movups xmm1,XMMWORD[16+r8]
+ lea r8,[32+r8]
+ xorps xmm2,xmm0
+$L$oop_enc1_11:
+DB 102,15,56,220,209
+ dec eax
+ movups xmm1,XMMWORD[r8]
+ lea r8,[16+r8]
+ jnz NEAR $L$oop_enc1_11
+DB 102,15,56,221,209
+ xor eax,eax
+ test rdx,15
+ setnz al
+ shl rax,4
+ sub rdx,rax
+
+ movups xmm0,XMMWORD[rcx]
+ mov rbp,rcx
+ mov eax,r10d
+ shl r10d,4
+ mov r9,rdx
+ and rdx,-16
+
+ movups xmm1,XMMWORD[16+r10*1+rcx]
+
+ movdqa xmm8,XMMWORD[$L$xts_magic]
+ movdqa xmm15,xmm2
+ pshufd xmm9,xmm2,0x5f
+ pxor xmm1,xmm0
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+ movdqa xmm10,xmm15
+ psrad xmm14,31
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+ pxor xmm10,xmm0
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+ movdqa xmm11,xmm15
+ psrad xmm14,31
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+ pxor xmm11,xmm0
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+ movdqa xmm12,xmm15
+ psrad xmm14,31
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+ pxor xmm12,xmm0
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+ movdqa xmm13,xmm15
+ psrad xmm14,31
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+ pxor xmm13,xmm0
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm15
+ psrad xmm9,31
+ paddq xmm15,xmm15
+ pand xmm9,xmm8
+ pxor xmm14,xmm0
+ pxor xmm15,xmm9
+ movaps XMMWORD[96+rsp],xmm1
+
+ sub rdx,16*6
+ jc NEAR $L$xts_dec_short
+
+ mov eax,16+96
+ lea rcx,[32+r10*1+rbp]
+ sub rax,r10
+ movups xmm1,XMMWORD[16+rbp]
+ mov r10,rax
+ lea r8,[$L$xts_magic]
+ jmp NEAR $L$xts_dec_grandloop
+
+ALIGN 32
+$L$xts_dec_grandloop:
+ movdqu xmm2,XMMWORD[rdi]
+ movdqa xmm8,xmm0
+ movdqu xmm3,XMMWORD[16+rdi]
+ pxor xmm2,xmm10
+ movdqu xmm4,XMMWORD[32+rdi]
+ pxor xmm3,xmm11
+DB 102,15,56,222,209
+ movdqu xmm5,XMMWORD[48+rdi]
+ pxor xmm4,xmm12
+DB 102,15,56,222,217
+ movdqu xmm6,XMMWORD[64+rdi]
+ pxor xmm5,xmm13
+DB 102,15,56,222,225
+ movdqu xmm7,XMMWORD[80+rdi]
+ pxor xmm8,xmm15
+ movdqa xmm9,XMMWORD[96+rsp]
+ pxor xmm6,xmm14
+DB 102,15,56,222,233
+ movups xmm0,XMMWORD[32+rbp]
+ lea rdi,[96+rdi]
+ pxor xmm7,xmm8
+
+ pxor xmm10,xmm9
+DB 102,15,56,222,241
+ pxor xmm11,xmm9
+ movdqa XMMWORD[rsp],xmm10
+DB 102,15,56,222,249
+ movups xmm1,XMMWORD[48+rbp]
+ pxor xmm12,xmm9
+
+DB 102,15,56,222,208
+ pxor xmm13,xmm9
+ movdqa XMMWORD[16+rsp],xmm11
+DB 102,15,56,222,216
+ pxor xmm14,xmm9
+ movdqa XMMWORD[32+rsp],xmm12
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+ pxor xmm8,xmm9
+ movdqa XMMWORD[64+rsp],xmm14
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+ movups xmm0,XMMWORD[64+rbp]
+ movdqa XMMWORD[80+rsp],xmm8
+ pshufd xmm9,xmm15,0x5f
+ jmp NEAR $L$xts_dec_loop6
+ALIGN 32
+$L$xts_dec_loop6:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+ movups xmm1,XMMWORD[((-64))+rax*1+rcx]
+ add rax,32
+
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+ movups xmm0,XMMWORD[((-80))+rax*1+rcx]
+ jnz NEAR $L$xts_dec_loop6
+
+ movdqa xmm8,XMMWORD[r8]
+ movdqa xmm14,xmm9
+ paddd xmm9,xmm9
+DB 102,15,56,222,209
+ paddq xmm15,xmm15
+ psrad xmm14,31
+DB 102,15,56,222,217
+ pand xmm14,xmm8
+ movups xmm10,XMMWORD[rbp]
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+ pxor xmm15,xmm14
+ movaps xmm11,xmm10
+DB 102,15,56,222,249
+ movups xmm1,XMMWORD[((-64))+rcx]
+
+ movdqa xmm14,xmm9
+DB 102,15,56,222,208
+ paddd xmm9,xmm9
+ pxor xmm10,xmm15
+DB 102,15,56,222,216
+ psrad xmm14,31
+ paddq xmm15,xmm15
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+ pand xmm14,xmm8
+ movaps xmm12,xmm11
+DB 102,15,56,222,240
+ pxor xmm15,xmm14
+ movdqa xmm14,xmm9
+DB 102,15,56,222,248
+ movups xmm0,XMMWORD[((-48))+rcx]
+
+ paddd xmm9,xmm9
+DB 102,15,56,222,209
+ pxor xmm11,xmm15
+ psrad xmm14,31
+DB 102,15,56,222,217
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ movdqa XMMWORD[48+rsp],xmm13
+ pxor xmm15,xmm14
+DB 102,15,56,222,241
+ movaps xmm13,xmm12
+ movdqa xmm14,xmm9
+DB 102,15,56,222,249
+ movups xmm1,XMMWORD[((-32))+rcx]
+
+ paddd xmm9,xmm9
+DB 102,15,56,222,208
+ pxor xmm12,xmm15
+ psrad xmm14,31
+DB 102,15,56,222,216
+ paddq xmm15,xmm15
+ pand xmm14,xmm8
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+ pxor xmm15,xmm14
+ movaps xmm14,xmm13
+DB 102,15,56,222,248
+
+ movdqa xmm0,xmm9
+ paddd xmm9,xmm9
+DB 102,15,56,222,209
+ pxor xmm13,xmm15
+ psrad xmm0,31
+DB 102,15,56,222,217
+ paddq xmm15,xmm15
+ pand xmm0,xmm8
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ pxor xmm15,xmm0
+ movups xmm0,XMMWORD[rbp]
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+ movups xmm1,XMMWORD[16+rbp]
+
+ pxor xmm14,xmm15
+DB 102,15,56,223,84,36,0
+ psrad xmm9,31
+ paddq xmm15,xmm15
+DB 102,15,56,223,92,36,16
+DB 102,15,56,223,100,36,32
+ pand xmm9,xmm8
+ mov rax,r10
+DB 102,15,56,223,108,36,48
+DB 102,15,56,223,116,36,64
+DB 102,15,56,223,124,36,80
+ pxor xmm15,xmm9
+
+ lea rsi,[96+rsi]
+ movups XMMWORD[(-96)+rsi],xmm2
+ movups XMMWORD[(-80)+rsi],xmm3
+ movups XMMWORD[(-64)+rsi],xmm4
+ movups XMMWORD[(-48)+rsi],xmm5
+ movups XMMWORD[(-32)+rsi],xmm6
+ movups XMMWORD[(-16)+rsi],xmm7
+ sub rdx,16*6
+ jnc NEAR $L$xts_dec_grandloop
+
+ mov eax,16+96
+ sub eax,r10d
+ mov rcx,rbp
+ shr eax,4
+
+$L$xts_dec_short:
+
+ mov r10d,eax
+ pxor xmm10,xmm0
+ pxor xmm11,xmm0
+ add rdx,16*6
+ jz NEAR $L$xts_dec_done
+
+ pxor xmm12,xmm0
+ cmp rdx,0x20
+ jb NEAR $L$xts_dec_one
+ pxor xmm13,xmm0
+ je NEAR $L$xts_dec_two
+
+ pxor xmm14,xmm0
+ cmp rdx,0x40
+ jb NEAR $L$xts_dec_three
+ je NEAR $L$xts_dec_four
+
+ movdqu xmm2,XMMWORD[rdi]
+ movdqu xmm3,XMMWORD[16+rdi]
+ movdqu xmm4,XMMWORD[32+rdi]
+ pxor xmm2,xmm10
+ movdqu xmm5,XMMWORD[48+rdi]
+ pxor xmm3,xmm11
+ movdqu xmm6,XMMWORD[64+rdi]
+ lea rdi,[80+rdi]
+ pxor xmm4,xmm12
+ pxor xmm5,xmm13
+ pxor xmm6,xmm14
+
+ call _aesni_decrypt6
+
+ xorps xmm2,xmm10
+ xorps xmm3,xmm11
+ xorps xmm4,xmm12
+ movdqu XMMWORD[rsi],xmm2
+ xorps xmm5,xmm13
+ movdqu XMMWORD[16+rsi],xmm3
+ xorps xmm6,xmm14
+ movdqu XMMWORD[32+rsi],xmm4
+ pxor xmm14,xmm14
+ movdqu XMMWORD[48+rsi],xmm5
+ pcmpgtd xmm14,xmm15
+ movdqu XMMWORD[64+rsi],xmm6
+ lea rsi,[80+rsi]
+ pshufd xmm11,xmm14,0x13
+ and r9,15
+ jz NEAR $L$xts_dec_ret
+
+ movdqa xmm10,xmm15
+ paddq xmm15,xmm15
+ pand xmm11,xmm8
+ pxor xmm11,xmm15
+ jmp NEAR $L$xts_dec_done2
+
+ALIGN 16
+$L$xts_dec_one:
+ movups xmm2,XMMWORD[rdi]
+ lea rdi,[16+rdi]
+ xorps xmm2,xmm10
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_dec1_12:
+DB 102,15,56,222,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_dec1_12
+DB 102,15,56,223,209
+ xorps xmm2,xmm10
+ movdqa xmm10,xmm11
+ movups XMMWORD[rsi],xmm2
+ movdqa xmm11,xmm12
+ lea rsi,[16+rsi]
+ jmp NEAR $L$xts_dec_done
+
+ALIGN 16
+$L$xts_dec_two:
+ movups xmm2,XMMWORD[rdi]
+ movups xmm3,XMMWORD[16+rdi]
+ lea rdi,[32+rdi]
+ xorps xmm2,xmm10
+ xorps xmm3,xmm11
+
+ call _aesni_decrypt2
+
+ xorps xmm2,xmm10
+ movdqa xmm10,xmm12
+ xorps xmm3,xmm11
+ movdqa xmm11,xmm13
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ lea rsi,[32+rsi]
+ jmp NEAR $L$xts_dec_done
+
+ALIGN 16
+$L$xts_dec_three:
+ movups xmm2,XMMWORD[rdi]
+ movups xmm3,XMMWORD[16+rdi]
+ movups xmm4,XMMWORD[32+rdi]
+ lea rdi,[48+rdi]
+ xorps xmm2,xmm10
+ xorps xmm3,xmm11
+ xorps xmm4,xmm12
+
+ call _aesni_decrypt3
+
+ xorps xmm2,xmm10
+ movdqa xmm10,xmm13
+ xorps xmm3,xmm11
+ movdqa xmm11,xmm14
+ xorps xmm4,xmm12
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ lea rsi,[48+rsi]
+ jmp NEAR $L$xts_dec_done
+
+ALIGN 16
+$L$xts_dec_four:
+ movups xmm2,XMMWORD[rdi]
+ movups xmm3,XMMWORD[16+rdi]
+ movups xmm4,XMMWORD[32+rdi]
+ xorps xmm2,xmm10
+ movups xmm5,XMMWORD[48+rdi]
+ lea rdi,[64+rdi]
+ xorps xmm3,xmm11
+ xorps xmm4,xmm12
+ xorps xmm5,xmm13
+
+ call _aesni_decrypt4
+
+ pxor xmm2,xmm10
+ movdqa xmm10,xmm14
+ pxor xmm3,xmm11
+ movdqa xmm11,xmm15
+ pxor xmm4,xmm12
+ movdqu XMMWORD[rsi],xmm2
+ pxor xmm5,xmm13
+ movdqu XMMWORD[16+rsi],xmm3
+ movdqu XMMWORD[32+rsi],xmm4
+ movdqu XMMWORD[48+rsi],xmm5
+ lea rsi,[64+rsi]
+ jmp NEAR $L$xts_dec_done
+
+ALIGN 16
+$L$xts_dec_done:
+ and r9,15
+ jz NEAR $L$xts_dec_ret
+$L$xts_dec_done2:
+ mov rdx,r9
+ mov rcx,rbp
+ mov eax,r10d
+
+ movups xmm2,XMMWORD[rdi]
+ xorps xmm2,xmm11
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_dec1_13:
+DB 102,15,56,222,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_dec1_13
+DB 102,15,56,223,209
+ xorps xmm2,xmm11
+ movups XMMWORD[rsi],xmm2
+
+$L$xts_dec_steal:
+ movzx eax,BYTE[16+rdi]
+ movzx ecx,BYTE[rsi]
+ lea rdi,[1+rdi]
+ mov BYTE[rsi],al
+ mov BYTE[16+rsi],cl
+ lea rsi,[1+rsi]
+ sub rdx,1
+ jnz NEAR $L$xts_dec_steal
+
+ sub rsi,r9
+ mov rcx,rbp
+ mov eax,r10d
+
+ movups xmm2,XMMWORD[rsi]
+ xorps xmm2,xmm10
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_dec1_14:
+DB 102,15,56,222,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_dec1_14
+DB 102,15,56,223,209
+ xorps xmm2,xmm10
+ movups XMMWORD[rsi],xmm2
+
+$L$xts_dec_ret:
+ xorps xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ movaps xmm6,XMMWORD[((-168))+r11]
+ movaps XMMWORD[(-168)+r11],xmm0
+ movaps xmm7,XMMWORD[((-152))+r11]
+ movaps XMMWORD[(-152)+r11],xmm0
+ movaps xmm8,XMMWORD[((-136))+r11]
+ movaps XMMWORD[(-136)+r11],xmm0
+ movaps xmm9,XMMWORD[((-120))+r11]
+ movaps XMMWORD[(-120)+r11],xmm0
+ movaps xmm10,XMMWORD[((-104))+r11]
+ movaps XMMWORD[(-104)+r11],xmm0
+ movaps xmm11,XMMWORD[((-88))+r11]
+ movaps XMMWORD[(-88)+r11],xmm0
+ movaps xmm12,XMMWORD[((-72))+r11]
+ movaps XMMWORD[(-72)+r11],xmm0
+ movaps xmm13,XMMWORD[((-56))+r11]
+ movaps XMMWORD[(-56)+r11],xmm0
+ movaps xmm14,XMMWORD[((-40))+r11]
+ movaps XMMWORD[(-40)+r11],xmm0
+ movaps xmm15,XMMWORD[((-24))+r11]
+ movaps XMMWORD[(-24)+r11],xmm0
+ movaps XMMWORD[rsp],xmm0
+ movaps XMMWORD[16+rsp],xmm0
+ movaps XMMWORD[32+rsp],xmm0
+ movaps XMMWORD[48+rsp],xmm0
+ movaps XMMWORD[64+rsp],xmm0
+ movaps XMMWORD[80+rsp],xmm0
+ movaps XMMWORD[96+rsp],xmm0
+ mov rbp,QWORD[((-8))+r11]
+
+ lea rsp,[r11]
+
+$L$xts_dec_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_xts_decrypt:
+global aesni_ocb_encrypt
+
+ALIGN 32
+aesni_ocb_encrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_ocb_encrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ lea rax,[rsp]
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ lea rsp,[((-160))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[64+rsp],xmm10
+ movaps XMMWORD[80+rsp],xmm11
+ movaps XMMWORD[96+rsp],xmm12
+ movaps XMMWORD[112+rsp],xmm13
+ movaps XMMWORD[128+rsp],xmm14
+ movaps XMMWORD[144+rsp],xmm15
+$L$ocb_enc_body:
+ mov rbx,QWORD[56+rax]
+ mov rbp,QWORD[((56+8))+rax]
+
+ mov r10d,DWORD[240+rcx]
+ mov r11,rcx
+ shl r10d,4
+ movups xmm9,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+r10*1+rcx]
+
+ movdqu xmm15,XMMWORD[r9]
+ pxor xmm9,xmm1
+ pxor xmm15,xmm1
+
+ mov eax,16+32
+ lea rcx,[32+r10*1+r11]
+ movups xmm1,XMMWORD[16+r11]
+ sub rax,r10
+ mov r10,rax
+
+ movdqu xmm10,XMMWORD[rbx]
+ movdqu xmm8,XMMWORD[rbp]
+
+ test r8,1
+ jnz NEAR $L$ocb_enc_odd
+
+ bsf r12,r8
+ add r8,1
+ shl r12,4
+ movdqu xmm7,XMMWORD[r12*1+rbx]
+ movdqu xmm2,XMMWORD[rdi]
+ lea rdi,[16+rdi]
+
+ call __ocb_encrypt1
+
+ movdqa xmm15,xmm7
+ movups XMMWORD[rsi],xmm2
+ lea rsi,[16+rsi]
+ sub rdx,1
+ jz NEAR $L$ocb_enc_done
+
+$L$ocb_enc_odd:
+ lea r12,[1+r8]
+ lea r13,[3+r8]
+ lea r14,[5+r8]
+ lea r8,[6+r8]
+ bsf r12,r12
+ bsf r13,r13
+ bsf r14,r14
+ shl r12,4
+ shl r13,4
+ shl r14,4
+
+ sub rdx,6
+ jc NEAR $L$ocb_enc_short
+ jmp NEAR $L$ocb_enc_grandloop
+
+ALIGN 32
+$L$ocb_enc_grandloop:
+ movdqu xmm2,XMMWORD[rdi]
+ movdqu xmm3,XMMWORD[16+rdi]
+ movdqu xmm4,XMMWORD[32+rdi]
+ movdqu xmm5,XMMWORD[48+rdi]
+ movdqu xmm6,XMMWORD[64+rdi]
+ movdqu xmm7,XMMWORD[80+rdi]
+ lea rdi,[96+rdi]
+
+ call __ocb_encrypt6
+
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ movups XMMWORD[48+rsi],xmm5
+ movups XMMWORD[64+rsi],xmm6
+ movups XMMWORD[80+rsi],xmm7
+ lea rsi,[96+rsi]
+ sub rdx,6
+ jnc NEAR $L$ocb_enc_grandloop
+
+$L$ocb_enc_short:
+ add rdx,6
+ jz NEAR $L$ocb_enc_done
+
+ movdqu xmm2,XMMWORD[rdi]
+ cmp rdx,2
+ jb NEAR $L$ocb_enc_one
+ movdqu xmm3,XMMWORD[16+rdi]
+ je NEAR $L$ocb_enc_two
+
+ movdqu xmm4,XMMWORD[32+rdi]
+ cmp rdx,4
+ jb NEAR $L$ocb_enc_three
+ movdqu xmm5,XMMWORD[48+rdi]
+ je NEAR $L$ocb_enc_four
+
+ movdqu xmm6,XMMWORD[64+rdi]
+ pxor xmm7,xmm7
+
+ call __ocb_encrypt6
+
+ movdqa xmm15,xmm14
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ movups XMMWORD[48+rsi],xmm5
+ movups XMMWORD[64+rsi],xmm6
+
+ jmp NEAR $L$ocb_enc_done
+
+ALIGN 16
+$L$ocb_enc_one:
+ movdqa xmm7,xmm10
+
+ call __ocb_encrypt1
+
+ movdqa xmm15,xmm7
+ movups XMMWORD[rsi],xmm2
+ jmp NEAR $L$ocb_enc_done
+
+ALIGN 16
+$L$ocb_enc_two:
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+
+ call __ocb_encrypt4
+
+ movdqa xmm15,xmm11
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+
+ jmp NEAR $L$ocb_enc_done
+
+ALIGN 16
+$L$ocb_enc_three:
+ pxor xmm5,xmm5
+
+ call __ocb_encrypt4
+
+ movdqa xmm15,xmm12
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+
+ jmp NEAR $L$ocb_enc_done
+
+ALIGN 16
+$L$ocb_enc_four:
+ call __ocb_encrypt4
+
+ movdqa xmm15,xmm13
+ movups XMMWORD[rsi],xmm2
+ movups XMMWORD[16+rsi],xmm3
+ movups XMMWORD[32+rsi],xmm4
+ movups XMMWORD[48+rsi],xmm5
+
+$L$ocb_enc_done:
+ pxor xmm15,xmm0
+ movdqu XMMWORD[rbp],xmm8
+ movdqu XMMWORD[r9],xmm15
+
+ xorps xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ movaps xmm6,XMMWORD[rsp]
+ movaps XMMWORD[rsp],xmm0
+ movaps xmm7,XMMWORD[16+rsp]
+ movaps XMMWORD[16+rsp],xmm0
+ movaps xmm8,XMMWORD[32+rsp]
+ movaps XMMWORD[32+rsp],xmm0
+ movaps xmm9,XMMWORD[48+rsp]
+ movaps XMMWORD[48+rsp],xmm0
+ movaps xmm10,XMMWORD[64+rsp]
+ movaps XMMWORD[64+rsp],xmm0
+ movaps xmm11,XMMWORD[80+rsp]
+ movaps XMMWORD[80+rsp],xmm0
+ movaps xmm12,XMMWORD[96+rsp]
+ movaps XMMWORD[96+rsp],xmm0
+ movaps xmm13,XMMWORD[112+rsp]
+ movaps XMMWORD[112+rsp],xmm0
+ movaps xmm14,XMMWORD[128+rsp]
+ movaps XMMWORD[128+rsp],xmm0
+ movaps xmm15,XMMWORD[144+rsp]
+ movaps XMMWORD[144+rsp],xmm0
+ lea rax,[((160+40))+rsp]
+$L$ocb_enc_pop:
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$ocb_enc_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_ocb_encrypt:
+
+
+ALIGN 32
+__ocb_encrypt6:
+ pxor xmm15,xmm9
+ movdqu xmm11,XMMWORD[r12*1+rbx]
+ movdqa xmm12,xmm10
+ movdqu xmm13,XMMWORD[r13*1+rbx]
+ movdqa xmm14,xmm10
+ pxor xmm10,xmm15
+ movdqu xmm15,XMMWORD[r14*1+rbx]
+ pxor xmm11,xmm10
+ pxor xmm8,xmm2
+ pxor xmm2,xmm10
+ pxor xmm12,xmm11
+ pxor xmm8,xmm3
+ pxor xmm3,xmm11
+ pxor xmm13,xmm12
+ pxor xmm8,xmm4
+ pxor xmm4,xmm12
+ pxor xmm14,xmm13
+ pxor xmm8,xmm5
+ pxor xmm5,xmm13
+ pxor xmm15,xmm14
+ pxor xmm8,xmm6
+ pxor xmm6,xmm14
+ pxor xmm8,xmm7
+ pxor xmm7,xmm15
+ movups xmm0,XMMWORD[32+r11]
+
+ lea r12,[1+r8]
+ lea r13,[3+r8]
+ lea r14,[5+r8]
+ add r8,6
+ pxor xmm10,xmm9
+ bsf r12,r12
+ bsf r13,r13
+ bsf r14,r14
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ pxor xmm11,xmm9
+ pxor xmm12,xmm9
+DB 102,15,56,220,241
+ pxor xmm13,xmm9
+ pxor xmm14,xmm9
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[48+r11]
+ pxor xmm15,xmm9
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+ movups xmm0,XMMWORD[64+r11]
+ shl r12,4
+ shl r13,4
+ jmp NEAR $L$ocb_enc_loop6
+
+ALIGN 32
+$L$ocb_enc_loop6:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+DB 102,15,56,220,240
+DB 102,15,56,220,248
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$ocb_enc_loop6
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+DB 102,15,56,220,241
+DB 102,15,56,220,249
+ movups xmm1,XMMWORD[16+r11]
+ shl r14,4
+
+DB 102,65,15,56,221,210
+ movdqu xmm10,XMMWORD[rbx]
+ mov rax,r10
+DB 102,65,15,56,221,219
+DB 102,65,15,56,221,228
+DB 102,65,15,56,221,237
+DB 102,65,15,56,221,246
+DB 102,65,15,56,221,255
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 32
+__ocb_encrypt4:
+ pxor xmm15,xmm9
+ movdqu xmm11,XMMWORD[r12*1+rbx]
+ movdqa xmm12,xmm10
+ movdqu xmm13,XMMWORD[r13*1+rbx]
+ pxor xmm10,xmm15
+ pxor xmm11,xmm10
+ pxor xmm8,xmm2
+ pxor xmm2,xmm10
+ pxor xmm12,xmm11
+ pxor xmm8,xmm3
+ pxor xmm3,xmm11
+ pxor xmm13,xmm12
+ pxor xmm8,xmm4
+ pxor xmm4,xmm12
+ pxor xmm8,xmm5
+ pxor xmm5,xmm13
+ movups xmm0,XMMWORD[32+r11]
+
+ pxor xmm10,xmm9
+ pxor xmm11,xmm9
+ pxor xmm12,xmm9
+ pxor xmm13,xmm9
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[48+r11]
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ movups xmm0,XMMWORD[64+r11]
+ jmp NEAR $L$ocb_enc_loop4
+
+ALIGN 32
+$L$ocb_enc_loop4:
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+
+DB 102,15,56,220,208
+DB 102,15,56,220,216
+DB 102,15,56,220,224
+DB 102,15,56,220,232
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$ocb_enc_loop4
+
+DB 102,15,56,220,209
+DB 102,15,56,220,217
+DB 102,15,56,220,225
+DB 102,15,56,220,233
+ movups xmm1,XMMWORD[16+r11]
+ mov rax,r10
+
+DB 102,65,15,56,221,210
+DB 102,65,15,56,221,219
+DB 102,65,15,56,221,228
+DB 102,65,15,56,221,237
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 32
+__ocb_encrypt1:
+ pxor xmm7,xmm15
+ pxor xmm7,xmm9
+ pxor xmm8,xmm2
+ pxor xmm2,xmm7
+ movups xmm0,XMMWORD[32+r11]
+
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[48+r11]
+ pxor xmm7,xmm9
+
+DB 102,15,56,220,208
+ movups xmm0,XMMWORD[64+r11]
+ jmp NEAR $L$ocb_enc_loop1
+
+ALIGN 32
+$L$ocb_enc_loop1:
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+
+DB 102,15,56,220,208
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$ocb_enc_loop1
+
+DB 102,15,56,220,209
+ movups xmm1,XMMWORD[16+r11]
+ mov rax,r10
+
+DB 102,15,56,221,215
+ DB 0F3h,0C3h ;repret
+
+
+global aesni_ocb_decrypt
+
+ALIGN 32
+aesni_ocb_decrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_ocb_decrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ lea rax,[rsp]
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ lea rsp,[((-160))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[64+rsp],xmm10
+ movaps XMMWORD[80+rsp],xmm11
+ movaps XMMWORD[96+rsp],xmm12
+ movaps XMMWORD[112+rsp],xmm13
+ movaps XMMWORD[128+rsp],xmm14
+ movaps XMMWORD[144+rsp],xmm15
+$L$ocb_dec_body:
+ mov rbx,QWORD[56+rax]
+ mov rbp,QWORD[((56+8))+rax]
+
+ mov r10d,DWORD[240+rcx]
+ mov r11,rcx
+ shl r10d,4
+ movups xmm9,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+r10*1+rcx]
+
+ movdqu xmm15,XMMWORD[r9]
+ pxor xmm9,xmm1
+ pxor xmm15,xmm1
+
+ mov eax,16+32
+ lea rcx,[32+r10*1+r11]
+ movups xmm1,XMMWORD[16+r11]
+ sub rax,r10
+ mov r10,rax
+
+ movdqu xmm10,XMMWORD[rbx]
+ movdqu xmm8,XMMWORD[rbp]
+
+ test r8,1
+ jnz NEAR $L$ocb_dec_odd
+
+ bsf r12,r8
+ add r8,1
+ shl r12,4
+ movdqu xmm7,XMMWORD[r12*1+rbx]
+ movdqu xmm2,XMMWORD[rdi]
+ lea rdi,[16+rdi]
+
+ call __ocb_decrypt1
+
+ movdqa xmm15,xmm7
+ movups XMMWORD[rsi],xmm2
+ xorps xmm8,xmm2
+ lea rsi,[16+rsi]
+ sub rdx,1
+ jz NEAR $L$ocb_dec_done
+
+$L$ocb_dec_odd:
+ lea r12,[1+r8]
+ lea r13,[3+r8]
+ lea r14,[5+r8]
+ lea r8,[6+r8]
+ bsf r12,r12
+ bsf r13,r13
+ bsf r14,r14
+ shl r12,4
+ shl r13,4
+ shl r14,4
+
+ sub rdx,6
+ jc NEAR $L$ocb_dec_short
+ jmp NEAR $L$ocb_dec_grandloop
+
+ALIGN 32
+$L$ocb_dec_grandloop:
+ movdqu xmm2,XMMWORD[rdi]
+ movdqu xmm3,XMMWORD[16+rdi]
+ movdqu xmm4,XMMWORD[32+rdi]
+ movdqu xmm5,XMMWORD[48+rdi]
+ movdqu xmm6,XMMWORD[64+rdi]
+ movdqu xmm7,XMMWORD[80+rdi]
+ lea rdi,[96+rdi]
+
+ call __ocb_decrypt6
+
+ movups XMMWORD[rsi],xmm2
+ pxor xmm8,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm8,xmm3
+ movups XMMWORD[32+rsi],xmm4
+ pxor xmm8,xmm4
+ movups XMMWORD[48+rsi],xmm5
+ pxor xmm8,xmm5
+ movups XMMWORD[64+rsi],xmm6
+ pxor xmm8,xmm6
+ movups XMMWORD[80+rsi],xmm7
+ pxor xmm8,xmm7
+ lea rsi,[96+rsi]
+ sub rdx,6
+ jnc NEAR $L$ocb_dec_grandloop
+
+$L$ocb_dec_short:
+ add rdx,6
+ jz NEAR $L$ocb_dec_done
+
+ movdqu xmm2,XMMWORD[rdi]
+ cmp rdx,2
+ jb NEAR $L$ocb_dec_one
+ movdqu xmm3,XMMWORD[16+rdi]
+ je NEAR $L$ocb_dec_two
+
+ movdqu xmm4,XMMWORD[32+rdi]
+ cmp rdx,4
+ jb NEAR $L$ocb_dec_three
+ movdqu xmm5,XMMWORD[48+rdi]
+ je NEAR $L$ocb_dec_four
+
+ movdqu xmm6,XMMWORD[64+rdi]
+ pxor xmm7,xmm7
+
+ call __ocb_decrypt6
+
+ movdqa xmm15,xmm14
+ movups XMMWORD[rsi],xmm2
+ pxor xmm8,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm8,xmm3
+ movups XMMWORD[32+rsi],xmm4
+ pxor xmm8,xmm4
+ movups XMMWORD[48+rsi],xmm5
+ pxor xmm8,xmm5
+ movups XMMWORD[64+rsi],xmm6
+ pxor xmm8,xmm6
+
+ jmp NEAR $L$ocb_dec_done
+
+ALIGN 16
+$L$ocb_dec_one:
+ movdqa xmm7,xmm10
+
+ call __ocb_decrypt1
+
+ movdqa xmm15,xmm7
+ movups XMMWORD[rsi],xmm2
+ xorps xmm8,xmm2
+ jmp NEAR $L$ocb_dec_done
+
+ALIGN 16
+$L$ocb_dec_two:
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+
+ call __ocb_decrypt4
+
+ movdqa xmm15,xmm11
+ movups XMMWORD[rsi],xmm2
+ xorps xmm8,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ xorps xmm8,xmm3
+
+ jmp NEAR $L$ocb_dec_done
+
+ALIGN 16
+$L$ocb_dec_three:
+ pxor xmm5,xmm5
+
+ call __ocb_decrypt4
+
+ movdqa xmm15,xmm12
+ movups XMMWORD[rsi],xmm2
+ xorps xmm8,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ xorps xmm8,xmm3
+ movups XMMWORD[32+rsi],xmm4
+ xorps xmm8,xmm4
+
+ jmp NEAR $L$ocb_dec_done
+
+ALIGN 16
+$L$ocb_dec_four:
+ call __ocb_decrypt4
+
+ movdqa xmm15,xmm13
+ movups XMMWORD[rsi],xmm2
+ pxor xmm8,xmm2
+ movups XMMWORD[16+rsi],xmm3
+ pxor xmm8,xmm3
+ movups XMMWORD[32+rsi],xmm4
+ pxor xmm8,xmm4
+ movups XMMWORD[48+rsi],xmm5
+ pxor xmm8,xmm5
+
+$L$ocb_dec_done:
+ pxor xmm15,xmm0
+ movdqu XMMWORD[rbp],xmm8
+ movdqu XMMWORD[r9],xmm15
+
+ xorps xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ movaps xmm6,XMMWORD[rsp]
+ movaps XMMWORD[rsp],xmm0
+ movaps xmm7,XMMWORD[16+rsp]
+ movaps XMMWORD[16+rsp],xmm0
+ movaps xmm8,XMMWORD[32+rsp]
+ movaps XMMWORD[32+rsp],xmm0
+ movaps xmm9,XMMWORD[48+rsp]
+ movaps XMMWORD[48+rsp],xmm0
+ movaps xmm10,XMMWORD[64+rsp]
+ movaps XMMWORD[64+rsp],xmm0
+ movaps xmm11,XMMWORD[80+rsp]
+ movaps XMMWORD[80+rsp],xmm0
+ movaps xmm12,XMMWORD[96+rsp]
+ movaps XMMWORD[96+rsp],xmm0
+ movaps xmm13,XMMWORD[112+rsp]
+ movaps XMMWORD[112+rsp],xmm0
+ movaps xmm14,XMMWORD[128+rsp]
+ movaps XMMWORD[128+rsp],xmm0
+ movaps xmm15,XMMWORD[144+rsp]
+ movaps XMMWORD[144+rsp],xmm0
+ lea rax,[((160+40))+rsp]
+$L$ocb_dec_pop:
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$ocb_dec_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_ocb_decrypt:
+
+
+ALIGN 32
+__ocb_decrypt6:
+ pxor xmm15,xmm9
+ movdqu xmm11,XMMWORD[r12*1+rbx]
+ movdqa xmm12,xmm10
+ movdqu xmm13,XMMWORD[r13*1+rbx]
+ movdqa xmm14,xmm10
+ pxor xmm10,xmm15
+ movdqu xmm15,XMMWORD[r14*1+rbx]
+ pxor xmm11,xmm10
+ pxor xmm2,xmm10
+ pxor xmm12,xmm11
+ pxor xmm3,xmm11
+ pxor xmm13,xmm12
+ pxor xmm4,xmm12
+ pxor xmm14,xmm13
+ pxor xmm5,xmm13
+ pxor xmm15,xmm14
+ pxor xmm6,xmm14
+ pxor xmm7,xmm15
+ movups xmm0,XMMWORD[32+r11]
+
+ lea r12,[1+r8]
+ lea r13,[3+r8]
+ lea r14,[5+r8]
+ add r8,6
+ pxor xmm10,xmm9
+ bsf r12,r12
+ bsf r13,r13
+ bsf r14,r14
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ pxor xmm11,xmm9
+ pxor xmm12,xmm9
+DB 102,15,56,222,241
+ pxor xmm13,xmm9
+ pxor xmm14,xmm9
+DB 102,15,56,222,249
+ movups xmm1,XMMWORD[48+r11]
+ pxor xmm15,xmm9
+
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+ movups xmm0,XMMWORD[64+r11]
+ shl r12,4
+ shl r13,4
+ jmp NEAR $L$ocb_dec_loop6
+
+ALIGN 32
+$L$ocb_dec_loop6:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$ocb_dec_loop6
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+ movups xmm1,XMMWORD[16+r11]
+ shl r14,4
+
+DB 102,65,15,56,223,210
+ movdqu xmm10,XMMWORD[rbx]
+ mov rax,r10
+DB 102,65,15,56,223,219
+DB 102,65,15,56,223,228
+DB 102,65,15,56,223,237
+DB 102,65,15,56,223,246
+DB 102,65,15,56,223,255
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 32
+__ocb_decrypt4:
+ pxor xmm15,xmm9
+ movdqu xmm11,XMMWORD[r12*1+rbx]
+ movdqa xmm12,xmm10
+ movdqu xmm13,XMMWORD[r13*1+rbx]
+ pxor xmm10,xmm15
+ pxor xmm11,xmm10
+ pxor xmm2,xmm10
+ pxor xmm12,xmm11
+ pxor xmm3,xmm11
+ pxor xmm13,xmm12
+ pxor xmm4,xmm12
+ pxor xmm5,xmm13
+ movups xmm0,XMMWORD[32+r11]
+
+ pxor xmm10,xmm9
+ pxor xmm11,xmm9
+ pxor xmm12,xmm9
+ pxor xmm13,xmm9
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[48+r11]
+
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+ movups xmm0,XMMWORD[64+r11]
+ jmp NEAR $L$ocb_dec_loop4
+
+ALIGN 32
+$L$ocb_dec_loop4:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$ocb_dec_loop4
+
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ movups xmm1,XMMWORD[16+r11]
+ mov rax,r10
+
+DB 102,65,15,56,223,210
+DB 102,65,15,56,223,219
+DB 102,65,15,56,223,228
+DB 102,65,15,56,223,237
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 32
+__ocb_decrypt1:
+ pxor xmm7,xmm15
+ pxor xmm7,xmm9
+ pxor xmm2,xmm7
+ movups xmm0,XMMWORD[32+r11]
+
+DB 102,15,56,222,209
+ movups xmm1,XMMWORD[48+r11]
+ pxor xmm7,xmm9
+
+DB 102,15,56,222,208
+ movups xmm0,XMMWORD[64+r11]
+ jmp NEAR $L$ocb_dec_loop1
+
+ALIGN 32
+$L$ocb_dec_loop1:
+DB 102,15,56,222,209
+ movups xmm1,XMMWORD[rax*1+rcx]
+ add rax,32
+
+DB 102,15,56,222,208
+ movups xmm0,XMMWORD[((-16))+rax*1+rcx]
+ jnz NEAR $L$ocb_dec_loop1
+
+DB 102,15,56,222,209
+ movups xmm1,XMMWORD[16+r11]
+ mov rax,r10
+
+DB 102,15,56,223,215
+ DB 0F3h,0C3h ;repret
+
+global aesni_cbc_encrypt
+
+ALIGN 16
+aesni_cbc_encrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_cbc_encrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ test rdx,rdx
+ jz NEAR $L$cbc_ret
+
+ mov r10d,DWORD[240+rcx]
+ mov r11,rcx
+ test r9d,r9d
+ jz NEAR $L$cbc_decrypt
+
+ movups xmm2,XMMWORD[r8]
+ mov eax,r10d
+ cmp rdx,16
+ jb NEAR $L$cbc_enc_tail
+ sub rdx,16
+ jmp NEAR $L$cbc_enc_loop
+ALIGN 16
+$L$cbc_enc_loop:
+ movups xmm3,XMMWORD[rdi]
+ lea rdi,[16+rdi]
+
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ xorps xmm3,xmm0
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm3
+$L$oop_enc1_15:
+DB 102,15,56,220,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_enc1_15
+DB 102,15,56,221,209
+ mov eax,r10d
+ mov rcx,r11
+ movups XMMWORD[rsi],xmm2
+ lea rsi,[16+rsi]
+ sub rdx,16
+ jnc NEAR $L$cbc_enc_loop
+ add rdx,16
+ jnz NEAR $L$cbc_enc_tail
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ movups XMMWORD[r8],xmm2
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ jmp NEAR $L$cbc_ret
+
+$L$cbc_enc_tail:
+ mov rcx,rdx
+ xchg rsi,rdi
+ DD 0x9066A4F3
+ mov ecx,16
+ sub rcx,rdx
+ xor eax,eax
+ DD 0x9066AAF3
+ lea rdi,[((-16))+rdi]
+ mov eax,r10d
+ mov rsi,rdi
+ mov rcx,r11
+ xor rdx,rdx
+ jmp NEAR $L$cbc_enc_loop
+
+ALIGN 16
+$L$cbc_decrypt:
+ cmp rdx,16
+ jne NEAR $L$cbc_decrypt_bulk
+
+
+
+ movdqu xmm2,XMMWORD[rdi]
+ movdqu xmm3,XMMWORD[r8]
+ movdqa xmm4,xmm2
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_dec1_16:
+DB 102,15,56,222,209
+ dec r10d
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_dec1_16
+DB 102,15,56,223,209
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ movdqu XMMWORD[r8],xmm4
+ xorps xmm2,xmm3
+ pxor xmm3,xmm3
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ jmp NEAR $L$cbc_ret
+ALIGN 16
+$L$cbc_decrypt_bulk:
+ lea r11,[rsp]
+
+ push rbp
+
+ sub rsp,176
+ and rsp,-16
+ movaps XMMWORD[16+rsp],xmm6
+ movaps XMMWORD[32+rsp],xmm7
+ movaps XMMWORD[48+rsp],xmm8
+ movaps XMMWORD[64+rsp],xmm9
+ movaps XMMWORD[80+rsp],xmm10
+ movaps XMMWORD[96+rsp],xmm11
+ movaps XMMWORD[112+rsp],xmm12
+ movaps XMMWORD[128+rsp],xmm13
+ movaps XMMWORD[144+rsp],xmm14
+ movaps XMMWORD[160+rsp],xmm15
+$L$cbc_decrypt_body:
+ mov rbp,rcx
+ movups xmm10,XMMWORD[r8]
+ mov eax,r10d
+ cmp rdx,0x50
+ jbe NEAR $L$cbc_dec_tail
+
+ movups xmm0,XMMWORD[rcx]
+ movdqu xmm2,XMMWORD[rdi]
+ movdqu xmm3,XMMWORD[16+rdi]
+ movdqa xmm11,xmm2
+ movdqu xmm4,XMMWORD[32+rdi]
+ movdqa xmm12,xmm3
+ movdqu xmm5,XMMWORD[48+rdi]
+ movdqa xmm13,xmm4
+ movdqu xmm6,XMMWORD[64+rdi]
+ movdqa xmm14,xmm5
+ movdqu xmm7,XMMWORD[80+rdi]
+ movdqa xmm15,xmm6
+ mov r9d,DWORD[((OPENSSL_ia32cap_P+4))]
+ cmp rdx,0x70
+ jbe NEAR $L$cbc_dec_six_or_seven
+
+ and r9d,71303168
+ sub rdx,0x50
+ cmp r9d,4194304
+ je NEAR $L$cbc_dec_loop6_enter
+ sub rdx,0x20
+ lea rcx,[112+rcx]
+ jmp NEAR $L$cbc_dec_loop8_enter
+ALIGN 16
+$L$cbc_dec_loop8:
+ movups XMMWORD[rsi],xmm9
+ lea rsi,[16+rsi]
+$L$cbc_dec_loop8_enter:
+ movdqu xmm8,XMMWORD[96+rdi]
+ pxor xmm2,xmm0
+ movdqu xmm9,XMMWORD[112+rdi]
+ pxor xmm3,xmm0
+ movups xmm1,XMMWORD[((16-112))+rcx]
+ pxor xmm4,xmm0
+ mov rbp,-1
+ cmp rdx,0x70
+ pxor xmm5,xmm0
+ pxor xmm6,xmm0
+ pxor xmm7,xmm0
+ pxor xmm8,xmm0
+
+DB 102,15,56,222,209
+ pxor xmm9,xmm0
+ movups xmm0,XMMWORD[((32-112))+rcx]
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+DB 102,68,15,56,222,193
+ adc rbp,0
+ and rbp,128
+DB 102,68,15,56,222,201
+ add rbp,rdi
+ movups xmm1,XMMWORD[((48-112))+rcx]
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+DB 102,68,15,56,222,192
+DB 102,68,15,56,222,200
+ movups xmm0,XMMWORD[((64-112))+rcx]
+ nop
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+DB 102,68,15,56,222,193
+DB 102,68,15,56,222,201
+ movups xmm1,XMMWORD[((80-112))+rcx]
+ nop
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+DB 102,68,15,56,222,192
+DB 102,68,15,56,222,200
+ movups xmm0,XMMWORD[((96-112))+rcx]
+ nop
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+DB 102,68,15,56,222,193
+DB 102,68,15,56,222,201
+ movups xmm1,XMMWORD[((112-112))+rcx]
+ nop
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+DB 102,68,15,56,222,192
+DB 102,68,15,56,222,200
+ movups xmm0,XMMWORD[((128-112))+rcx]
+ nop
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+DB 102,68,15,56,222,193
+DB 102,68,15,56,222,201
+ movups xmm1,XMMWORD[((144-112))+rcx]
+ cmp eax,11
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+DB 102,68,15,56,222,192
+DB 102,68,15,56,222,200
+ movups xmm0,XMMWORD[((160-112))+rcx]
+ jb NEAR $L$cbc_dec_done
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+DB 102,68,15,56,222,193
+DB 102,68,15,56,222,201
+ movups xmm1,XMMWORD[((176-112))+rcx]
+ nop
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+DB 102,68,15,56,222,192
+DB 102,68,15,56,222,200
+ movups xmm0,XMMWORD[((192-112))+rcx]
+ je NEAR $L$cbc_dec_done
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+DB 102,68,15,56,222,193
+DB 102,68,15,56,222,201
+ movups xmm1,XMMWORD[((208-112))+rcx]
+ nop
+DB 102,15,56,222,208
+DB 102,15,56,222,216
+DB 102,15,56,222,224
+DB 102,15,56,222,232
+DB 102,15,56,222,240
+DB 102,15,56,222,248
+DB 102,68,15,56,222,192
+DB 102,68,15,56,222,200
+ movups xmm0,XMMWORD[((224-112))+rcx]
+ jmp NEAR $L$cbc_dec_done
+ALIGN 16
+$L$cbc_dec_done:
+DB 102,15,56,222,209
+DB 102,15,56,222,217
+ pxor xmm10,xmm0
+ pxor xmm11,xmm0
+DB 102,15,56,222,225
+DB 102,15,56,222,233
+ pxor xmm12,xmm0
+ pxor xmm13,xmm0
+DB 102,15,56,222,241
+DB 102,15,56,222,249
+ pxor xmm14,xmm0
+ pxor xmm15,xmm0
+DB 102,68,15,56,222,193
+DB 102,68,15,56,222,201
+ movdqu xmm1,XMMWORD[80+rdi]
+
+DB 102,65,15,56,223,210
+ movdqu xmm10,XMMWORD[96+rdi]
+ pxor xmm1,xmm0
+DB 102,65,15,56,223,219
+ pxor xmm10,xmm0
+ movdqu xmm0,XMMWORD[112+rdi]
+DB 102,65,15,56,223,228
+ lea rdi,[128+rdi]
+ movdqu xmm11,XMMWORD[rbp]
+DB 102,65,15,56,223,237
+DB 102,65,15,56,223,246
+ movdqu xmm12,XMMWORD[16+rbp]
+ movdqu xmm13,XMMWORD[32+rbp]
+DB 102,65,15,56,223,255
+DB 102,68,15,56,223,193
+ movdqu xmm14,XMMWORD[48+rbp]
+ movdqu xmm15,XMMWORD[64+rbp]
+DB 102,69,15,56,223,202
+ movdqa xmm10,xmm0
+ movdqu xmm1,XMMWORD[80+rbp]
+ movups xmm0,XMMWORD[((-112))+rcx]
+
+ movups XMMWORD[rsi],xmm2
+ movdqa xmm2,xmm11
+ movups XMMWORD[16+rsi],xmm3
+ movdqa xmm3,xmm12
+ movups XMMWORD[32+rsi],xmm4
+ movdqa xmm4,xmm13
+ movups XMMWORD[48+rsi],xmm5
+ movdqa xmm5,xmm14
+ movups XMMWORD[64+rsi],xmm6
+ movdqa xmm6,xmm15
+ movups XMMWORD[80+rsi],xmm7
+ movdqa xmm7,xmm1
+ movups XMMWORD[96+rsi],xmm8
+ lea rsi,[112+rsi]
+
+ sub rdx,0x80
+ ja NEAR $L$cbc_dec_loop8
+
+ movaps xmm2,xmm9
+ lea rcx,[((-112))+rcx]
+ add rdx,0x70
+ jle NEAR $L$cbc_dec_clear_tail_collected
+ movups XMMWORD[rsi],xmm9
+ lea rsi,[16+rsi]
+ cmp rdx,0x50
+ jbe NEAR $L$cbc_dec_tail
+
+ movaps xmm2,xmm11
+$L$cbc_dec_six_or_seven:
+ cmp rdx,0x60
+ ja NEAR $L$cbc_dec_seven
+
+ movaps xmm8,xmm7
+ call _aesni_decrypt6
+ pxor xmm2,xmm10
+ movaps xmm10,xmm8
+ pxor xmm3,xmm11
+ movdqu XMMWORD[rsi],xmm2
+ pxor xmm4,xmm12
+ movdqu XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ pxor xmm5,xmm13
+ movdqu XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ pxor xmm6,xmm14
+ movdqu XMMWORD[48+rsi],xmm5
+ pxor xmm5,xmm5
+ pxor xmm7,xmm15
+ movdqu XMMWORD[64+rsi],xmm6
+ pxor xmm6,xmm6
+ lea rsi,[80+rsi]
+ movdqa xmm2,xmm7
+ pxor xmm7,xmm7
+ jmp NEAR $L$cbc_dec_tail_collected
+
+ALIGN 16
+$L$cbc_dec_seven:
+ movups xmm8,XMMWORD[96+rdi]
+ xorps xmm9,xmm9
+ call _aesni_decrypt8
+ movups xmm9,XMMWORD[80+rdi]
+ pxor xmm2,xmm10
+ movups xmm10,XMMWORD[96+rdi]
+ pxor xmm3,xmm11
+ movdqu XMMWORD[rsi],xmm2
+ pxor xmm4,xmm12
+ movdqu XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ pxor xmm5,xmm13
+ movdqu XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ pxor xmm6,xmm14
+ movdqu XMMWORD[48+rsi],xmm5
+ pxor xmm5,xmm5
+ pxor xmm7,xmm15
+ movdqu XMMWORD[64+rsi],xmm6
+ pxor xmm6,xmm6
+ pxor xmm8,xmm9
+ movdqu XMMWORD[80+rsi],xmm7
+ pxor xmm7,xmm7
+ lea rsi,[96+rsi]
+ movdqa xmm2,xmm8
+ pxor xmm8,xmm8
+ pxor xmm9,xmm9
+ jmp NEAR $L$cbc_dec_tail_collected
+
+ALIGN 16
+$L$cbc_dec_loop6:
+ movups XMMWORD[rsi],xmm7
+ lea rsi,[16+rsi]
+ movdqu xmm2,XMMWORD[rdi]
+ movdqu xmm3,XMMWORD[16+rdi]
+ movdqa xmm11,xmm2
+ movdqu xmm4,XMMWORD[32+rdi]
+ movdqa xmm12,xmm3
+ movdqu xmm5,XMMWORD[48+rdi]
+ movdqa xmm13,xmm4
+ movdqu xmm6,XMMWORD[64+rdi]
+ movdqa xmm14,xmm5
+ movdqu xmm7,XMMWORD[80+rdi]
+ movdqa xmm15,xmm6
+$L$cbc_dec_loop6_enter:
+ lea rdi,[96+rdi]
+ movdqa xmm8,xmm7
+
+ call _aesni_decrypt6
+
+ pxor xmm2,xmm10
+ movdqa xmm10,xmm8
+ pxor xmm3,xmm11
+ movdqu XMMWORD[rsi],xmm2
+ pxor xmm4,xmm12
+ movdqu XMMWORD[16+rsi],xmm3
+ pxor xmm5,xmm13
+ movdqu XMMWORD[32+rsi],xmm4
+ pxor xmm6,xmm14
+ mov rcx,rbp
+ movdqu XMMWORD[48+rsi],xmm5
+ pxor xmm7,xmm15
+ mov eax,r10d
+ movdqu XMMWORD[64+rsi],xmm6
+ lea rsi,[80+rsi]
+ sub rdx,0x60
+ ja NEAR $L$cbc_dec_loop6
+
+ movdqa xmm2,xmm7
+ add rdx,0x50
+ jle NEAR $L$cbc_dec_clear_tail_collected
+ movups XMMWORD[rsi],xmm7
+ lea rsi,[16+rsi]
+
+$L$cbc_dec_tail:
+ movups xmm2,XMMWORD[rdi]
+ sub rdx,0x10
+ jbe NEAR $L$cbc_dec_one
+
+ movups xmm3,XMMWORD[16+rdi]
+ movaps xmm11,xmm2
+ sub rdx,0x10
+ jbe NEAR $L$cbc_dec_two
+
+ movups xmm4,XMMWORD[32+rdi]
+ movaps xmm12,xmm3
+ sub rdx,0x10
+ jbe NEAR $L$cbc_dec_three
+
+ movups xmm5,XMMWORD[48+rdi]
+ movaps xmm13,xmm4
+ sub rdx,0x10
+ jbe NEAR $L$cbc_dec_four
+
+ movups xmm6,XMMWORD[64+rdi]
+ movaps xmm14,xmm5
+ movaps xmm15,xmm6
+ xorps xmm7,xmm7
+ call _aesni_decrypt6
+ pxor xmm2,xmm10
+ movaps xmm10,xmm15
+ pxor xmm3,xmm11
+ movdqu XMMWORD[rsi],xmm2
+ pxor xmm4,xmm12
+ movdqu XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ pxor xmm5,xmm13
+ movdqu XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ pxor xmm6,xmm14
+ movdqu XMMWORD[48+rsi],xmm5
+ pxor xmm5,xmm5
+ lea rsi,[64+rsi]
+ movdqa xmm2,xmm6
+ pxor xmm6,xmm6
+ pxor xmm7,xmm7
+ sub rdx,0x10
+ jmp NEAR $L$cbc_dec_tail_collected
+
+ALIGN 16
+$L$cbc_dec_one:
+ movaps xmm11,xmm2
+ movups xmm0,XMMWORD[rcx]
+ movups xmm1,XMMWORD[16+rcx]
+ lea rcx,[32+rcx]
+ xorps xmm2,xmm0
+$L$oop_dec1_17:
+DB 102,15,56,222,209
+ dec eax
+ movups xmm1,XMMWORD[rcx]
+ lea rcx,[16+rcx]
+ jnz NEAR $L$oop_dec1_17
+DB 102,15,56,223,209
+ xorps xmm2,xmm10
+ movaps xmm10,xmm11
+ jmp NEAR $L$cbc_dec_tail_collected
+ALIGN 16
+$L$cbc_dec_two:
+ movaps xmm12,xmm3
+ call _aesni_decrypt2
+ pxor xmm2,xmm10
+ movaps xmm10,xmm12
+ pxor xmm3,xmm11
+ movdqu XMMWORD[rsi],xmm2
+ movdqa xmm2,xmm3
+ pxor xmm3,xmm3
+ lea rsi,[16+rsi]
+ jmp NEAR $L$cbc_dec_tail_collected
+ALIGN 16
+$L$cbc_dec_three:
+ movaps xmm13,xmm4
+ call _aesni_decrypt3
+ pxor xmm2,xmm10
+ movaps xmm10,xmm13
+ pxor xmm3,xmm11
+ movdqu XMMWORD[rsi],xmm2
+ pxor xmm4,xmm12
+ movdqu XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ movdqa xmm2,xmm4
+ pxor xmm4,xmm4
+ lea rsi,[32+rsi]
+ jmp NEAR $L$cbc_dec_tail_collected
+ALIGN 16
+$L$cbc_dec_four:
+ movaps xmm14,xmm5
+ call _aesni_decrypt4
+ pxor xmm2,xmm10
+ movaps xmm10,xmm14
+ pxor xmm3,xmm11
+ movdqu XMMWORD[rsi],xmm2
+ pxor xmm4,xmm12
+ movdqu XMMWORD[16+rsi],xmm3
+ pxor xmm3,xmm3
+ pxor xmm5,xmm13
+ movdqu XMMWORD[32+rsi],xmm4
+ pxor xmm4,xmm4
+ movdqa xmm2,xmm5
+ pxor xmm5,xmm5
+ lea rsi,[48+rsi]
+ jmp NEAR $L$cbc_dec_tail_collected
+
+ALIGN 16
+$L$cbc_dec_clear_tail_collected:
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+$L$cbc_dec_tail_collected:
+ movups XMMWORD[r8],xmm10
+ and rdx,15
+ jnz NEAR $L$cbc_dec_tail_partial
+ movups XMMWORD[rsi],xmm2
+ pxor xmm2,xmm2
+ jmp NEAR $L$cbc_dec_ret
+ALIGN 16
+$L$cbc_dec_tail_partial:
+ movaps XMMWORD[rsp],xmm2
+ pxor xmm2,xmm2
+ mov rcx,16
+ mov rdi,rsi
+ sub rcx,rdx
+ lea rsi,[rsp]
+ DD 0x9066A4F3
+ movdqa XMMWORD[rsp],xmm2
+
+$L$cbc_dec_ret:
+ xorps xmm0,xmm0
+ pxor xmm1,xmm1
+ movaps xmm6,XMMWORD[16+rsp]
+ movaps XMMWORD[16+rsp],xmm0
+ movaps xmm7,XMMWORD[32+rsp]
+ movaps XMMWORD[32+rsp],xmm0
+ movaps xmm8,XMMWORD[48+rsp]
+ movaps XMMWORD[48+rsp],xmm0
+ movaps xmm9,XMMWORD[64+rsp]
+ movaps XMMWORD[64+rsp],xmm0
+ movaps xmm10,XMMWORD[80+rsp]
+ movaps XMMWORD[80+rsp],xmm0
+ movaps xmm11,XMMWORD[96+rsp]
+ movaps XMMWORD[96+rsp],xmm0
+ movaps xmm12,XMMWORD[112+rsp]
+ movaps XMMWORD[112+rsp],xmm0
+ movaps xmm13,XMMWORD[128+rsp]
+ movaps XMMWORD[128+rsp],xmm0
+ movaps xmm14,XMMWORD[144+rsp]
+ movaps XMMWORD[144+rsp],xmm0
+ movaps xmm15,XMMWORD[160+rsp]
+ movaps XMMWORD[160+rsp],xmm0
+ mov rbp,QWORD[((-8))+r11]
+
+ lea rsp,[r11]
+
+$L$cbc_ret:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_cbc_encrypt:
+global aesni_set_decrypt_key
+
+ALIGN 16
+aesni_set_decrypt_key:
+
+DB 0x48,0x83,0xEC,0x08
+
+ call __aesni_set_encrypt_key
+ shl edx,4
+ test eax,eax
+ jnz NEAR $L$dec_key_ret
+ lea rcx,[16+rdx*1+r8]
+
+ movups xmm0,XMMWORD[r8]
+ movups xmm1,XMMWORD[rcx]
+ movups XMMWORD[rcx],xmm0
+ movups XMMWORD[r8],xmm1
+ lea r8,[16+r8]
+ lea rcx,[((-16))+rcx]
+
+$L$dec_key_inverse:
+ movups xmm0,XMMWORD[r8]
+ movups xmm1,XMMWORD[rcx]
+DB 102,15,56,219,192
+DB 102,15,56,219,201
+ lea r8,[16+r8]
+ lea rcx,[((-16))+rcx]
+ movups XMMWORD[16+rcx],xmm0
+ movups XMMWORD[(-16)+r8],xmm1
+ cmp rcx,r8
+ ja NEAR $L$dec_key_inverse
+
+ movups xmm0,XMMWORD[r8]
+DB 102,15,56,219,192
+ pxor xmm1,xmm1
+ movups XMMWORD[rcx],xmm0
+ pxor xmm0,xmm0
+$L$dec_key_ret:
+ add rsp,8
+
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_set_decrypt_key:
+
+global aesni_set_encrypt_key
+
+ALIGN 16
+aesni_set_encrypt_key:
+__aesni_set_encrypt_key:
+
+DB 0x48,0x83,0xEC,0x08
+
+ mov rax,-1
+ test rcx,rcx
+ jz NEAR $L$enc_key_ret
+ test r8,r8
+ jz NEAR $L$enc_key_ret
+
+ mov r10d,268437504
+ movups xmm0,XMMWORD[rcx]
+ xorps xmm4,xmm4
+ and r10d,DWORD[((OPENSSL_ia32cap_P+4))]
+ lea rax,[16+r8]
+ cmp edx,256
+ je NEAR $L$14rounds
+ cmp edx,192
+ je NEAR $L$12rounds
+ cmp edx,128
+ jne NEAR $L$bad_keybits
+
+$L$10rounds:
+ mov edx,9
+ cmp r10d,268435456
+ je NEAR $L$10rounds_alt
+
+ movups XMMWORD[r8],xmm0
+DB 102,15,58,223,200,1
+ call $L$key_expansion_128_cold
+DB 102,15,58,223,200,2
+ call $L$key_expansion_128
+DB 102,15,58,223,200,4
+ call $L$key_expansion_128
+DB 102,15,58,223,200,8
+ call $L$key_expansion_128
+DB 102,15,58,223,200,16
+ call $L$key_expansion_128
+DB 102,15,58,223,200,32
+ call $L$key_expansion_128
+DB 102,15,58,223,200,64
+ call $L$key_expansion_128
+DB 102,15,58,223,200,128
+ call $L$key_expansion_128
+DB 102,15,58,223,200,27
+ call $L$key_expansion_128
+DB 102,15,58,223,200,54
+ call $L$key_expansion_128
+ movups XMMWORD[rax],xmm0
+ mov DWORD[80+rax],edx
+ xor eax,eax
+ jmp NEAR $L$enc_key_ret
+
+ALIGN 16
+$L$10rounds_alt:
+ movdqa xmm5,XMMWORD[$L$key_rotate]
+ mov r10d,8
+ movdqa xmm4,XMMWORD[$L$key_rcon1]
+ movdqa xmm2,xmm0
+ movdqu XMMWORD[r8],xmm0
+ jmp NEAR $L$oop_key128
+
+ALIGN 16
+$L$oop_key128:
+DB 102,15,56,0,197
+DB 102,15,56,221,196
+ pslld xmm4,1
+ lea rax,[16+rax]
+
+ movdqa xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm2,xmm3
+
+ pxor xmm0,xmm2
+ movdqu XMMWORD[(-16)+rax],xmm0
+ movdqa xmm2,xmm0
+
+ dec r10d
+ jnz NEAR $L$oop_key128
+
+ movdqa xmm4,XMMWORD[$L$key_rcon1b]
+
+DB 102,15,56,0,197
+DB 102,15,56,221,196
+ pslld xmm4,1
+
+ movdqa xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm2,xmm3
+
+ pxor xmm0,xmm2
+ movdqu XMMWORD[rax],xmm0
+
+ movdqa xmm2,xmm0
+DB 102,15,56,0,197
+DB 102,15,56,221,196
+
+ movdqa xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm3,xmm2
+ pslldq xmm2,4
+ pxor xmm2,xmm3
+
+ pxor xmm0,xmm2
+ movdqu XMMWORD[16+rax],xmm0
+
+ mov DWORD[96+rax],edx
+ xor eax,eax
+ jmp NEAR $L$enc_key_ret
+
+ALIGN 16
+$L$12rounds:
+ movq xmm2,QWORD[16+rcx]
+ mov edx,11
+ cmp r10d,268435456
+ je NEAR $L$12rounds_alt
+
+ movups XMMWORD[r8],xmm0
+DB 102,15,58,223,202,1
+ call $L$key_expansion_192a_cold
+DB 102,15,58,223,202,2
+ call $L$key_expansion_192b
+DB 102,15,58,223,202,4
+ call $L$key_expansion_192a
+DB 102,15,58,223,202,8
+ call $L$key_expansion_192b
+DB 102,15,58,223,202,16
+ call $L$key_expansion_192a
+DB 102,15,58,223,202,32
+ call $L$key_expansion_192b
+DB 102,15,58,223,202,64
+ call $L$key_expansion_192a
+DB 102,15,58,223,202,128
+ call $L$key_expansion_192b
+ movups XMMWORD[rax],xmm0
+ mov DWORD[48+rax],edx
+ xor rax,rax
+ jmp NEAR $L$enc_key_ret
+
+ALIGN 16
+$L$12rounds_alt:
+ movdqa xmm5,XMMWORD[$L$key_rotate192]
+ movdqa xmm4,XMMWORD[$L$key_rcon1]
+ mov r10d,8
+ movdqu XMMWORD[r8],xmm0
+ jmp NEAR $L$oop_key192
+
+ALIGN 16
+$L$oop_key192:
+ movq QWORD[rax],xmm2
+ movdqa xmm1,xmm2
+DB 102,15,56,0,213
+DB 102,15,56,221,212
+ pslld xmm4,1
+ lea rax,[24+rax]
+
+ movdqa xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm0,xmm3
+
+ pshufd xmm3,xmm0,0xff
+ pxor xmm3,xmm1
+ pslldq xmm1,4
+ pxor xmm3,xmm1
+
+ pxor xmm0,xmm2
+ pxor xmm2,xmm3
+ movdqu XMMWORD[(-16)+rax],xmm0
+
+ dec r10d
+ jnz NEAR $L$oop_key192
+
+ mov DWORD[32+rax],edx
+ xor eax,eax
+ jmp NEAR $L$enc_key_ret
+
+ALIGN 16
+$L$14rounds:
+ movups xmm2,XMMWORD[16+rcx]
+ mov edx,13
+ lea rax,[16+rax]
+ cmp r10d,268435456
+ je NEAR $L$14rounds_alt
+
+ movups XMMWORD[r8],xmm0
+ movups XMMWORD[16+r8],xmm2
+DB 102,15,58,223,202,1
+ call $L$key_expansion_256a_cold
+DB 102,15,58,223,200,1
+ call $L$key_expansion_256b
+DB 102,15,58,223,202,2
+ call $L$key_expansion_256a
+DB 102,15,58,223,200,2
+ call $L$key_expansion_256b
+DB 102,15,58,223,202,4
+ call $L$key_expansion_256a
+DB 102,15,58,223,200,4
+ call $L$key_expansion_256b
+DB 102,15,58,223,202,8
+ call $L$key_expansion_256a
+DB 102,15,58,223,200,8
+ call $L$key_expansion_256b
+DB 102,15,58,223,202,16
+ call $L$key_expansion_256a
+DB 102,15,58,223,200,16
+ call $L$key_expansion_256b
+DB 102,15,58,223,202,32
+ call $L$key_expansion_256a
+DB 102,15,58,223,200,32
+ call $L$key_expansion_256b
+DB 102,15,58,223,202,64
+ call $L$key_expansion_256a
+ movups XMMWORD[rax],xmm0
+ mov DWORD[16+rax],edx
+ xor rax,rax
+ jmp NEAR $L$enc_key_ret
+
+ALIGN 16
+$L$14rounds_alt:
+ movdqa xmm5,XMMWORD[$L$key_rotate]
+ movdqa xmm4,XMMWORD[$L$key_rcon1]
+ mov r10d,7
+ movdqu XMMWORD[r8],xmm0
+ movdqa xmm1,xmm2
+ movdqu XMMWORD[16+r8],xmm2
+ jmp NEAR $L$oop_key256
+
+ALIGN 16
+$L$oop_key256:
+DB 102,15,56,0,213
+DB 102,15,56,221,212
+
+ movdqa xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm3,xmm0
+ pslldq xmm0,4
+ pxor xmm0,xmm3
+ pslld xmm4,1
+
+ pxor xmm0,xmm2
+ movdqu XMMWORD[rax],xmm0
+
+ dec r10d
+ jz NEAR $L$done_key256
+
+ pshufd xmm2,xmm0,0xff
+ pxor xmm3,xmm3
+DB 102,15,56,221,211
+
+ movdqa xmm3,xmm1
+ pslldq xmm1,4
+ pxor xmm3,xmm1
+ pslldq xmm1,4
+ pxor xmm3,xmm1
+ pslldq xmm1,4
+ pxor xmm1,xmm3
+
+ pxor xmm2,xmm1
+ movdqu XMMWORD[16+rax],xmm2
+ lea rax,[32+rax]
+ movdqa xmm1,xmm2
+
+ jmp NEAR $L$oop_key256
+
+$L$done_key256:
+ mov DWORD[16+rax],edx
+ xor eax,eax
+ jmp NEAR $L$enc_key_ret
+
+ALIGN 16
+$L$bad_keybits:
+ mov rax,-2
+$L$enc_key_ret:
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ add rsp,8
+
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_set_encrypt_key:
+
+ALIGN 16
+$L$key_expansion_128:
+ movups XMMWORD[rax],xmm0
+ lea rax,[16+rax]
+$L$key_expansion_128_cold:
+ shufps xmm4,xmm0,16
+ xorps xmm0,xmm4
+ shufps xmm4,xmm0,140
+ xorps xmm0,xmm4
+ shufps xmm1,xmm1,255
+ xorps xmm0,xmm1
+ DB 0F3h,0C3h ;repret
+
+ALIGN 16
+$L$key_expansion_192a:
+ movups XMMWORD[rax],xmm0
+ lea rax,[16+rax]
+$L$key_expansion_192a_cold:
+ movaps xmm5,xmm2
+$L$key_expansion_192b_warm:
+ shufps xmm4,xmm0,16
+ movdqa xmm3,xmm2
+ xorps xmm0,xmm4
+ shufps xmm4,xmm0,140
+ pslldq xmm3,4
+ xorps xmm0,xmm4
+ pshufd xmm1,xmm1,85
+ pxor xmm2,xmm3
+ pxor xmm0,xmm1
+ pshufd xmm3,xmm0,255
+ pxor xmm2,xmm3
+ DB 0F3h,0C3h ;repret
+
+ALIGN 16
+$L$key_expansion_192b:
+ movaps xmm3,xmm0
+ shufps xmm5,xmm0,68
+ movups XMMWORD[rax],xmm5
+ shufps xmm3,xmm2,78
+ movups XMMWORD[16+rax],xmm3
+ lea rax,[32+rax]
+ jmp NEAR $L$key_expansion_192b_warm
+
+ALIGN 16
+$L$key_expansion_256a:
+ movups XMMWORD[rax],xmm2
+ lea rax,[16+rax]
+$L$key_expansion_256a_cold:
+ shufps xmm4,xmm0,16
+ xorps xmm0,xmm4
+ shufps xmm4,xmm0,140
+ xorps xmm0,xmm4
+ shufps xmm1,xmm1,255
+ xorps xmm0,xmm1
+ DB 0F3h,0C3h ;repret
+
+ALIGN 16
+$L$key_expansion_256b:
+ movups XMMWORD[rax],xmm0
+ lea rax,[16+rax]
+
+ shufps xmm4,xmm2,16
+ xorps xmm2,xmm4
+ shufps xmm4,xmm2,140
+ xorps xmm2,xmm4
+ shufps xmm1,xmm1,170
+ xorps xmm2,xmm1
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 64
+$L$bswap_mask:
+DB 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+$L$increment32:
+ DD 6,6,6,0
+$L$increment64:
+ DD 1,0,0,0
+$L$xts_magic:
+ DD 0x87,0,1,0
+$L$increment1:
+DB 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
+$L$key_rotate:
+ DD 0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d
+$L$key_rotate192:
+ DD 0x04070605,0x04070605,0x04070605,0x04070605
+$L$key_rcon1:
+ DD 1,1,1,1
+$L$key_rcon1b:
+ DD 0x1b,0x1b,0x1b,0x1b
+
+DB 65,69,83,32,102,111,114,32,73,110,116,101,108,32,65,69
+DB 83,45,78,73,44,32,67,82,89,80,84,79,71,65,77,83
+DB 32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115
+DB 115,108,46,111,114,103,62,0
+ALIGN 64
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+ecb_ccm64_se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ lea rsi,[rax]
+ lea rdi,[512+r8]
+ mov ecx,8
+ DD 0xa548f3fc
+ lea rax,[88+rax]
+
+ jmp NEAR $L$common_seh_tail
+
+
+
+ALIGN 16
+ctr_xts_se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ mov rax,QWORD[208+r8]
+
+ lea rsi,[((-168))+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+ mov rbp,QWORD[((-8))+rax]
+ mov QWORD[160+r8],rbp
+ jmp NEAR $L$common_seh_tail
+
+
+
+ALIGN 16
+ocb_se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ mov r10d,DWORD[8+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$ocb_no_xmm
+
+ mov rax,QWORD[152+r8]
+
+ lea rsi,[rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+ lea rax,[((160+40))+rax]
+
+$L$ocb_no_xmm:
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+
+ jmp NEAR $L$common_seh_tail
+
+
+ALIGN 16
+cbc_se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[152+r8]
+ mov rbx,QWORD[248+r8]
+
+ lea r10,[$L$cbc_decrypt_bulk]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[120+r8]
+
+ lea r10,[$L$cbc_decrypt_body]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[152+r8]
+
+ lea r10,[$L$cbc_ret]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ lea rsi,[16+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+ mov rax,QWORD[208+r8]
+
+ mov rbp,QWORD[((-8))+rax]
+ mov QWORD[160+r8],rbp
+
+$L$common_seh_tail:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_aesni_ecb_encrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_ecb_encrypt wrt ..imagebase
+ DD $L$SEH_info_ecb wrt ..imagebase
+
+ DD $L$SEH_begin_aesni_ccm64_encrypt_blocks wrt ..imagebase
+ DD $L$SEH_end_aesni_ccm64_encrypt_blocks wrt ..imagebase
+ DD $L$SEH_info_ccm64_enc wrt ..imagebase
+
+ DD $L$SEH_begin_aesni_ccm64_decrypt_blocks wrt ..imagebase
+ DD $L$SEH_end_aesni_ccm64_decrypt_blocks wrt ..imagebase
+ DD $L$SEH_info_ccm64_dec wrt ..imagebase
+
+ DD $L$SEH_begin_aesni_ctr32_encrypt_blocks wrt ..imagebase
+ DD $L$SEH_end_aesni_ctr32_encrypt_blocks wrt ..imagebase
+ DD $L$SEH_info_ctr32 wrt ..imagebase
+
+ DD $L$SEH_begin_aesni_xts_encrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_xts_encrypt wrt ..imagebase
+ DD $L$SEH_info_xts_enc wrt ..imagebase
+
+ DD $L$SEH_begin_aesni_xts_decrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_xts_decrypt wrt ..imagebase
+ DD $L$SEH_info_xts_dec wrt ..imagebase
+
+ DD $L$SEH_begin_aesni_ocb_encrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_ocb_encrypt wrt ..imagebase
+ DD $L$SEH_info_ocb_enc wrt ..imagebase
+
+ DD $L$SEH_begin_aesni_ocb_decrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_ocb_decrypt wrt ..imagebase
+ DD $L$SEH_info_ocb_dec wrt ..imagebase
+ DD $L$SEH_begin_aesni_cbc_encrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_cbc_encrypt wrt ..imagebase
+ DD $L$SEH_info_cbc wrt ..imagebase
+
+ DD aesni_set_decrypt_key wrt ..imagebase
+ DD $L$SEH_end_set_decrypt_key wrt ..imagebase
+ DD $L$SEH_info_key wrt ..imagebase
+
+ DD aesni_set_encrypt_key wrt ..imagebase
+ DD $L$SEH_end_set_encrypt_key wrt ..imagebase
+ DD $L$SEH_info_key wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_ecb:
+DB 9,0,0,0
+ DD ecb_ccm64_se_handler wrt ..imagebase
+ DD $L$ecb_enc_body wrt ..imagebase,$L$ecb_enc_ret wrt ..imagebase
+$L$SEH_info_ccm64_enc:
+DB 9,0,0,0
+ DD ecb_ccm64_se_handler wrt ..imagebase
+ DD $L$ccm64_enc_body wrt ..imagebase,$L$ccm64_enc_ret wrt ..imagebase
+$L$SEH_info_ccm64_dec:
+DB 9,0,0,0
+ DD ecb_ccm64_se_handler wrt ..imagebase
+ DD $L$ccm64_dec_body wrt ..imagebase,$L$ccm64_dec_ret wrt ..imagebase
+$L$SEH_info_ctr32:
+DB 9,0,0,0
+ DD ctr_xts_se_handler wrt ..imagebase
+ DD $L$ctr32_body wrt ..imagebase,$L$ctr32_epilogue wrt ..imagebase
+$L$SEH_info_xts_enc:
+DB 9,0,0,0
+ DD ctr_xts_se_handler wrt ..imagebase
+ DD $L$xts_enc_body wrt ..imagebase,$L$xts_enc_epilogue wrt ..imagebase
+$L$SEH_info_xts_dec:
+DB 9,0,0,0
+ DD ctr_xts_se_handler wrt ..imagebase
+ DD $L$xts_dec_body wrt ..imagebase,$L$xts_dec_epilogue wrt ..imagebase
+$L$SEH_info_ocb_enc:
+DB 9,0,0,0
+ DD ocb_se_handler wrt ..imagebase
+ DD $L$ocb_enc_body wrt ..imagebase,$L$ocb_enc_epilogue wrt ..imagebase
+ DD $L$ocb_enc_pop wrt ..imagebase
+ DD 0
+$L$SEH_info_ocb_dec:
+DB 9,0,0,0
+ DD ocb_se_handler wrt ..imagebase
+ DD $L$ocb_dec_body wrt ..imagebase,$L$ocb_dec_epilogue wrt ..imagebase
+ DD $L$ocb_dec_pop wrt ..imagebase
+ DD 0
+$L$SEH_info_cbc:
+DB 9,0,0,0
+ DD cbc_se_handler wrt ..imagebase
+$L$SEH_info_key:
+DB 0x01,0x04,0x01,0x00
+DB 0x04,0x02,0x00,0x00
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm
new file mode 100644
index 0000000000..e6a5733924
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm
@@ -0,0 +1,1170 @@
+; Copyright 2011-2019 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN 16
+_vpaes_encrypt_core:
+
+ mov r9,rdx
+ mov r11,16
+ mov eax,DWORD[240+rdx]
+ movdqa xmm1,xmm9
+ movdqa xmm2,XMMWORD[$L$k_ipt]
+ pandn xmm1,xmm0
+ movdqu xmm5,XMMWORD[r9]
+ psrld xmm1,4
+ pand xmm0,xmm9
+DB 102,15,56,0,208
+ movdqa xmm0,XMMWORD[(($L$k_ipt+16))]
+DB 102,15,56,0,193
+ pxor xmm2,xmm5
+ add r9,16
+ pxor xmm0,xmm2
+ lea r10,[$L$k_mc_backward]
+ jmp NEAR $L$enc_entry
+
+ALIGN 16
+$L$enc_loop:
+
+ movdqa xmm4,xmm13
+ movdqa xmm0,xmm12
+DB 102,15,56,0,226
+DB 102,15,56,0,195
+ pxor xmm4,xmm5
+ movdqa xmm5,xmm15
+ pxor xmm0,xmm4
+ movdqa xmm1,XMMWORD[((-64))+r10*1+r11]
+DB 102,15,56,0,234
+ movdqa xmm4,XMMWORD[r10*1+r11]
+ movdqa xmm2,xmm14
+DB 102,15,56,0,211
+ movdqa xmm3,xmm0
+ pxor xmm2,xmm5
+DB 102,15,56,0,193
+ add r9,16
+ pxor xmm0,xmm2
+DB 102,15,56,0,220
+ add r11,16
+ pxor xmm3,xmm0
+DB 102,15,56,0,193
+ and r11,0x30
+ sub rax,1
+ pxor xmm0,xmm3
+
+$L$enc_entry:
+
+ movdqa xmm1,xmm9
+ movdqa xmm5,xmm11
+ pandn xmm1,xmm0
+ psrld xmm1,4
+ pand xmm0,xmm9
+DB 102,15,56,0,232
+ movdqa xmm3,xmm10
+ pxor xmm0,xmm1
+DB 102,15,56,0,217
+ movdqa xmm4,xmm10
+ pxor xmm3,xmm5
+DB 102,15,56,0,224
+ movdqa xmm2,xmm10
+ pxor xmm4,xmm5
+DB 102,15,56,0,211
+ movdqa xmm3,xmm10
+ pxor xmm2,xmm0
+DB 102,15,56,0,220
+ movdqu xmm5,XMMWORD[r9]
+ pxor xmm3,xmm1
+ jnz NEAR $L$enc_loop
+
+
+ movdqa xmm4,XMMWORD[((-96))+r10]
+ movdqa xmm0,XMMWORD[((-80))+r10]
+DB 102,15,56,0,226
+ pxor xmm4,xmm5
+DB 102,15,56,0,195
+ movdqa xmm1,XMMWORD[64+r10*1+r11]
+ pxor xmm0,xmm4
+DB 102,15,56,0,193
+ DB 0F3h,0C3h ;repret
+
+
+
+
+
+
+
+
+
+ALIGN 16
+_vpaes_decrypt_core:
+
+ mov r9,rdx
+ mov eax,DWORD[240+rdx]
+ movdqa xmm1,xmm9
+ movdqa xmm2,XMMWORD[$L$k_dipt]
+ pandn xmm1,xmm0
+ mov r11,rax
+ psrld xmm1,4
+ movdqu xmm5,XMMWORD[r9]
+ shl r11,4
+ pand xmm0,xmm9
+DB 102,15,56,0,208
+ movdqa xmm0,XMMWORD[(($L$k_dipt+16))]
+ xor r11,0x30
+ lea r10,[$L$k_dsbd]
+DB 102,15,56,0,193
+ and r11,0x30
+ pxor xmm2,xmm5
+ movdqa xmm5,XMMWORD[(($L$k_mc_forward+48))]
+ pxor xmm0,xmm2
+ add r9,16
+ add r11,r10
+ jmp NEAR $L$dec_entry
+
+ALIGN 16
+$L$dec_loop:
+
+
+
+ movdqa xmm4,XMMWORD[((-32))+r10]
+ movdqa xmm1,XMMWORD[((-16))+r10]
+DB 102,15,56,0,226
+DB 102,15,56,0,203
+ pxor xmm0,xmm4
+ movdqa xmm4,XMMWORD[r10]
+ pxor xmm0,xmm1
+ movdqa xmm1,XMMWORD[16+r10]
+
+DB 102,15,56,0,226
+DB 102,15,56,0,197
+DB 102,15,56,0,203
+ pxor xmm0,xmm4
+ movdqa xmm4,XMMWORD[32+r10]
+ pxor xmm0,xmm1
+ movdqa xmm1,XMMWORD[48+r10]
+
+DB 102,15,56,0,226
+DB 102,15,56,0,197
+DB 102,15,56,0,203
+ pxor xmm0,xmm4
+ movdqa xmm4,XMMWORD[64+r10]
+ pxor xmm0,xmm1
+ movdqa xmm1,XMMWORD[80+r10]
+
+DB 102,15,56,0,226
+DB 102,15,56,0,197
+DB 102,15,56,0,203
+ pxor xmm0,xmm4
+ add r9,16
+DB 102,15,58,15,237,12
+ pxor xmm0,xmm1
+ sub rax,1
+
+$L$dec_entry:
+
+ movdqa xmm1,xmm9
+ pandn xmm1,xmm0
+ movdqa xmm2,xmm11
+ psrld xmm1,4
+ pand xmm0,xmm9
+DB 102,15,56,0,208
+ movdqa xmm3,xmm10
+ pxor xmm0,xmm1
+DB 102,15,56,0,217
+ movdqa xmm4,xmm10
+ pxor xmm3,xmm2
+DB 102,15,56,0,224
+ pxor xmm4,xmm2
+ movdqa xmm2,xmm10
+DB 102,15,56,0,211
+ movdqa xmm3,xmm10
+ pxor xmm2,xmm0
+DB 102,15,56,0,220
+ movdqu xmm0,XMMWORD[r9]
+ pxor xmm3,xmm1
+ jnz NEAR $L$dec_loop
+
+
+ movdqa xmm4,XMMWORD[96+r10]
+DB 102,15,56,0,226
+ pxor xmm4,xmm0
+ movdqa xmm0,XMMWORD[112+r10]
+ movdqa xmm2,XMMWORD[((-352))+r11]
+DB 102,15,56,0,195
+ pxor xmm0,xmm4
+DB 102,15,56,0,194
+ DB 0F3h,0C3h ;repret
+
+
+
+
+
+
+
+
+
+ALIGN 16
+_vpaes_schedule_core:
+
+
+
+
+
+
+ call _vpaes_preheat
+ movdqa xmm8,XMMWORD[$L$k_rcon]
+ movdqu xmm0,XMMWORD[rdi]
+
+
+ movdqa xmm3,xmm0
+ lea r11,[$L$k_ipt]
+ call _vpaes_schedule_transform
+ movdqa xmm7,xmm0
+
+ lea r10,[$L$k_sr]
+ test rcx,rcx
+ jnz NEAR $L$schedule_am_decrypting
+
+
+ movdqu XMMWORD[rdx],xmm0
+ jmp NEAR $L$schedule_go
+
+$L$schedule_am_decrypting:
+
+ movdqa xmm1,XMMWORD[r10*1+r8]
+DB 102,15,56,0,217
+ movdqu XMMWORD[rdx],xmm3
+ xor r8,0x30
+
+$L$schedule_go:
+ cmp esi,192
+ ja NEAR $L$schedule_256
+ je NEAR $L$schedule_192
+
+
+
+
+
+
+
+
+
+
+$L$schedule_128:
+ mov esi,10
+
+$L$oop_schedule_128:
+ call _vpaes_schedule_round
+ dec rsi
+ jz NEAR $L$schedule_mangle_last
+ call _vpaes_schedule_mangle
+ jmp NEAR $L$oop_schedule_128
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN 16
+$L$schedule_192:
+ movdqu xmm0,XMMWORD[8+rdi]
+ call _vpaes_schedule_transform
+ movdqa xmm6,xmm0
+ pxor xmm4,xmm4
+ movhlps xmm6,xmm4
+ mov esi,4
+
+$L$oop_schedule_192:
+ call _vpaes_schedule_round
+DB 102,15,58,15,198,8
+ call _vpaes_schedule_mangle
+ call _vpaes_schedule_192_smear
+ call _vpaes_schedule_mangle
+ call _vpaes_schedule_round
+ dec rsi
+ jz NEAR $L$schedule_mangle_last
+ call _vpaes_schedule_mangle
+ call _vpaes_schedule_192_smear
+ jmp NEAR $L$oop_schedule_192
+
+
+
+
+
+
+
+
+
+
+
+ALIGN 16
+$L$schedule_256:
+ movdqu xmm0,XMMWORD[16+rdi]
+ call _vpaes_schedule_transform
+ mov esi,7
+
+$L$oop_schedule_256:
+ call _vpaes_schedule_mangle
+ movdqa xmm6,xmm0
+
+
+ call _vpaes_schedule_round
+ dec rsi
+ jz NEAR $L$schedule_mangle_last
+ call _vpaes_schedule_mangle
+
+
+ pshufd xmm0,xmm0,0xFF
+ movdqa xmm5,xmm7
+ movdqa xmm7,xmm6
+ call _vpaes_schedule_low_round
+ movdqa xmm7,xmm5
+
+ jmp NEAR $L$oop_schedule_256
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN 16
+$L$schedule_mangle_last:
+
+ lea r11,[$L$k_deskew]
+ test rcx,rcx
+ jnz NEAR $L$schedule_mangle_last_dec
+
+
+ movdqa xmm1,XMMWORD[r10*1+r8]
+DB 102,15,56,0,193
+ lea r11,[$L$k_opt]
+ add rdx,32
+
+$L$schedule_mangle_last_dec:
+ add rdx,-16
+ pxor xmm0,XMMWORD[$L$k_s63]
+ call _vpaes_schedule_transform
+ movdqu XMMWORD[rdx],xmm0
+
+
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ pxor xmm6,xmm6
+ pxor xmm7,xmm7
+ DB 0F3h,0C3h ;repret
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN 16
+_vpaes_schedule_192_smear:
+
+ pshufd xmm1,xmm6,0x80
+ pshufd xmm0,xmm7,0xFE
+ pxor xmm6,xmm1
+ pxor xmm1,xmm1
+ pxor xmm6,xmm0
+ movdqa xmm0,xmm6
+ movhlps xmm6,xmm1
+ DB 0F3h,0C3h ;repret
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN 16
+_vpaes_schedule_round:
+
+
+ pxor xmm1,xmm1
+DB 102,65,15,58,15,200,15
+DB 102,69,15,58,15,192,15
+ pxor xmm7,xmm1
+
+
+ pshufd xmm0,xmm0,0xFF
+DB 102,15,58,15,192,1
+
+
+
+
+_vpaes_schedule_low_round:
+
+ movdqa xmm1,xmm7
+ pslldq xmm7,4
+ pxor xmm7,xmm1
+ movdqa xmm1,xmm7
+ pslldq xmm7,8
+ pxor xmm7,xmm1
+ pxor xmm7,XMMWORD[$L$k_s63]
+
+
+ movdqa xmm1,xmm9
+ pandn xmm1,xmm0
+ psrld xmm1,4
+ pand xmm0,xmm9
+ movdqa xmm2,xmm11
+DB 102,15,56,0,208
+ pxor xmm0,xmm1
+ movdqa xmm3,xmm10
+DB 102,15,56,0,217
+ pxor xmm3,xmm2
+ movdqa xmm4,xmm10
+DB 102,15,56,0,224
+ pxor xmm4,xmm2
+ movdqa xmm2,xmm10
+DB 102,15,56,0,211
+ pxor xmm2,xmm0
+ movdqa xmm3,xmm10
+DB 102,15,56,0,220
+ pxor xmm3,xmm1
+ movdqa xmm4,xmm13
+DB 102,15,56,0,226
+ movdqa xmm0,xmm12
+DB 102,15,56,0,195
+ pxor xmm0,xmm4
+
+
+ pxor xmm0,xmm7
+ movdqa xmm7,xmm0
+ DB 0F3h,0C3h ;repret
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN 16
+_vpaes_schedule_transform:
+
+ movdqa xmm1,xmm9
+ pandn xmm1,xmm0
+ psrld xmm1,4
+ pand xmm0,xmm9
+ movdqa xmm2,XMMWORD[r11]
+DB 102,15,56,0,208
+ movdqa xmm0,XMMWORD[16+r11]
+DB 102,15,56,0,193
+ pxor xmm0,xmm2
+ DB 0F3h,0C3h ;repret
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN 16
+_vpaes_schedule_mangle:
+
+ movdqa xmm4,xmm0
+ movdqa xmm5,XMMWORD[$L$k_mc_forward]
+ test rcx,rcx
+ jnz NEAR $L$schedule_mangle_dec
+
+
+ add rdx,16
+ pxor xmm4,XMMWORD[$L$k_s63]
+DB 102,15,56,0,229
+ movdqa xmm3,xmm4
+DB 102,15,56,0,229
+ pxor xmm3,xmm4
+DB 102,15,56,0,229
+ pxor xmm3,xmm4
+
+ jmp NEAR $L$schedule_mangle_both
+ALIGN 16
+$L$schedule_mangle_dec:
+
+ lea r11,[$L$k_dksd]
+ movdqa xmm1,xmm9
+ pandn xmm1,xmm4
+ psrld xmm1,4
+ pand xmm4,xmm9
+
+ movdqa xmm2,XMMWORD[r11]
+DB 102,15,56,0,212
+ movdqa xmm3,XMMWORD[16+r11]
+DB 102,15,56,0,217
+ pxor xmm3,xmm2
+DB 102,15,56,0,221
+
+ movdqa xmm2,XMMWORD[32+r11]
+DB 102,15,56,0,212
+ pxor xmm2,xmm3
+ movdqa xmm3,XMMWORD[48+r11]
+DB 102,15,56,0,217
+ pxor xmm3,xmm2
+DB 102,15,56,0,221
+
+ movdqa xmm2,XMMWORD[64+r11]
+DB 102,15,56,0,212
+ pxor xmm2,xmm3
+ movdqa xmm3,XMMWORD[80+r11]
+DB 102,15,56,0,217
+ pxor xmm3,xmm2
+DB 102,15,56,0,221
+
+ movdqa xmm2,XMMWORD[96+r11]
+DB 102,15,56,0,212
+ pxor xmm2,xmm3
+ movdqa xmm3,XMMWORD[112+r11]
+DB 102,15,56,0,217
+ pxor xmm3,xmm2
+
+ add rdx,-16
+
+$L$schedule_mangle_both:
+ movdqa xmm1,XMMWORD[r10*1+r8]
+DB 102,15,56,0,217
+ add r8,-16
+ and r8,0x30
+ movdqu XMMWORD[rdx],xmm3
+ DB 0F3h,0C3h ;repret
+
+
+
+
+
+
+global vpaes_set_encrypt_key
+
+ALIGN 16
+vpaes_set_encrypt_key:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_vpaes_set_encrypt_key:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ lea rsp,[((-184))+rsp]
+ movaps XMMWORD[16+rsp],xmm6
+ movaps XMMWORD[32+rsp],xmm7
+ movaps XMMWORD[48+rsp],xmm8
+ movaps XMMWORD[64+rsp],xmm9
+ movaps XMMWORD[80+rsp],xmm10
+ movaps XMMWORD[96+rsp],xmm11
+ movaps XMMWORD[112+rsp],xmm12
+ movaps XMMWORD[128+rsp],xmm13
+ movaps XMMWORD[144+rsp],xmm14
+ movaps XMMWORD[160+rsp],xmm15
+$L$enc_key_body:
+ mov eax,esi
+ shr eax,5
+ add eax,5
+ mov DWORD[240+rdx],eax
+
+ mov ecx,0
+ mov r8d,0x30
+ call _vpaes_schedule_core
+ movaps xmm6,XMMWORD[16+rsp]
+ movaps xmm7,XMMWORD[32+rsp]
+ movaps xmm8,XMMWORD[48+rsp]
+ movaps xmm9,XMMWORD[64+rsp]
+ movaps xmm10,XMMWORD[80+rsp]
+ movaps xmm11,XMMWORD[96+rsp]
+ movaps xmm12,XMMWORD[112+rsp]
+ movaps xmm13,XMMWORD[128+rsp]
+ movaps xmm14,XMMWORD[144+rsp]
+ movaps xmm15,XMMWORD[160+rsp]
+ lea rsp,[184+rsp]
+$L$enc_key_epilogue:
+ xor eax,eax
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_vpaes_set_encrypt_key:
+
+global vpaes_set_decrypt_key
+
+ALIGN 16
+vpaes_set_decrypt_key:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_vpaes_set_decrypt_key:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ lea rsp,[((-184))+rsp]
+ movaps XMMWORD[16+rsp],xmm6
+ movaps XMMWORD[32+rsp],xmm7
+ movaps XMMWORD[48+rsp],xmm8
+ movaps XMMWORD[64+rsp],xmm9
+ movaps XMMWORD[80+rsp],xmm10
+ movaps XMMWORD[96+rsp],xmm11
+ movaps XMMWORD[112+rsp],xmm12
+ movaps XMMWORD[128+rsp],xmm13
+ movaps XMMWORD[144+rsp],xmm14
+ movaps XMMWORD[160+rsp],xmm15
+$L$dec_key_body:
+ mov eax,esi
+ shr eax,5
+ add eax,5
+ mov DWORD[240+rdx],eax
+ shl eax,4
+ lea rdx,[16+rax*1+rdx]
+
+ mov ecx,1
+ mov r8d,esi
+ shr r8d,1
+ and r8d,32
+ xor r8d,32
+ call _vpaes_schedule_core
+ movaps xmm6,XMMWORD[16+rsp]
+ movaps xmm7,XMMWORD[32+rsp]
+ movaps xmm8,XMMWORD[48+rsp]
+ movaps xmm9,XMMWORD[64+rsp]
+ movaps xmm10,XMMWORD[80+rsp]
+ movaps xmm11,XMMWORD[96+rsp]
+ movaps xmm12,XMMWORD[112+rsp]
+ movaps xmm13,XMMWORD[128+rsp]
+ movaps xmm14,XMMWORD[144+rsp]
+ movaps xmm15,XMMWORD[160+rsp]
+ lea rsp,[184+rsp]
+$L$dec_key_epilogue:
+ xor eax,eax
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_vpaes_set_decrypt_key:
+
+global vpaes_encrypt
+
+ALIGN 16
+vpaes_encrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_vpaes_encrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ lea rsp,[((-184))+rsp]
+ movaps XMMWORD[16+rsp],xmm6
+ movaps XMMWORD[32+rsp],xmm7
+ movaps XMMWORD[48+rsp],xmm8
+ movaps XMMWORD[64+rsp],xmm9
+ movaps XMMWORD[80+rsp],xmm10
+ movaps XMMWORD[96+rsp],xmm11
+ movaps XMMWORD[112+rsp],xmm12
+ movaps XMMWORD[128+rsp],xmm13
+ movaps XMMWORD[144+rsp],xmm14
+ movaps XMMWORD[160+rsp],xmm15
+$L$enc_body:
+ movdqu xmm0,XMMWORD[rdi]
+ call _vpaes_preheat
+ call _vpaes_encrypt_core
+ movdqu XMMWORD[rsi],xmm0
+ movaps xmm6,XMMWORD[16+rsp]
+ movaps xmm7,XMMWORD[32+rsp]
+ movaps xmm8,XMMWORD[48+rsp]
+ movaps xmm9,XMMWORD[64+rsp]
+ movaps xmm10,XMMWORD[80+rsp]
+ movaps xmm11,XMMWORD[96+rsp]
+ movaps xmm12,XMMWORD[112+rsp]
+ movaps xmm13,XMMWORD[128+rsp]
+ movaps xmm14,XMMWORD[144+rsp]
+ movaps xmm15,XMMWORD[160+rsp]
+ lea rsp,[184+rsp]
+$L$enc_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_vpaes_encrypt:
+
+global vpaes_decrypt
+
+ALIGN 16
+vpaes_decrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_vpaes_decrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ lea rsp,[((-184))+rsp]
+ movaps XMMWORD[16+rsp],xmm6
+ movaps XMMWORD[32+rsp],xmm7
+ movaps XMMWORD[48+rsp],xmm8
+ movaps XMMWORD[64+rsp],xmm9
+ movaps XMMWORD[80+rsp],xmm10
+ movaps XMMWORD[96+rsp],xmm11
+ movaps XMMWORD[112+rsp],xmm12
+ movaps XMMWORD[128+rsp],xmm13
+ movaps XMMWORD[144+rsp],xmm14
+ movaps XMMWORD[160+rsp],xmm15
+$L$dec_body:
+ movdqu xmm0,XMMWORD[rdi]
+ call _vpaes_preheat
+ call _vpaes_decrypt_core
+ movdqu XMMWORD[rsi],xmm0
+ movaps xmm6,XMMWORD[16+rsp]
+ movaps xmm7,XMMWORD[32+rsp]
+ movaps xmm8,XMMWORD[48+rsp]
+ movaps xmm9,XMMWORD[64+rsp]
+ movaps xmm10,XMMWORD[80+rsp]
+ movaps xmm11,XMMWORD[96+rsp]
+ movaps xmm12,XMMWORD[112+rsp]
+ movaps xmm13,XMMWORD[128+rsp]
+ movaps xmm14,XMMWORD[144+rsp]
+ movaps xmm15,XMMWORD[160+rsp]
+ lea rsp,[184+rsp]
+$L$dec_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_vpaes_decrypt:
+global vpaes_cbc_encrypt
+
+ALIGN 16
+vpaes_cbc_encrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_vpaes_cbc_encrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ xchg rdx,rcx
+ sub rcx,16
+ jc NEAR $L$cbc_abort
+ lea rsp,[((-184))+rsp]
+ movaps XMMWORD[16+rsp],xmm6
+ movaps XMMWORD[32+rsp],xmm7
+ movaps XMMWORD[48+rsp],xmm8
+ movaps XMMWORD[64+rsp],xmm9
+ movaps XMMWORD[80+rsp],xmm10
+ movaps XMMWORD[96+rsp],xmm11
+ movaps XMMWORD[112+rsp],xmm12
+ movaps XMMWORD[128+rsp],xmm13
+ movaps XMMWORD[144+rsp],xmm14
+ movaps XMMWORD[160+rsp],xmm15
+$L$cbc_body:
+ movdqu xmm6,XMMWORD[r8]
+ sub rsi,rdi
+ call _vpaes_preheat
+ cmp r9d,0
+ je NEAR $L$cbc_dec_loop
+ jmp NEAR $L$cbc_enc_loop
+ALIGN 16
+$L$cbc_enc_loop:
+ movdqu xmm0,XMMWORD[rdi]
+ pxor xmm0,xmm6
+ call _vpaes_encrypt_core
+ movdqa xmm6,xmm0
+ movdqu XMMWORD[rdi*1+rsi],xmm0
+ lea rdi,[16+rdi]
+ sub rcx,16
+ jnc NEAR $L$cbc_enc_loop
+ jmp NEAR $L$cbc_done
+ALIGN 16
+$L$cbc_dec_loop:
+ movdqu xmm0,XMMWORD[rdi]
+ movdqa xmm7,xmm0
+ call _vpaes_decrypt_core
+ pxor xmm0,xmm6
+ movdqa xmm6,xmm7
+ movdqu XMMWORD[rdi*1+rsi],xmm0
+ lea rdi,[16+rdi]
+ sub rcx,16
+ jnc NEAR $L$cbc_dec_loop
+$L$cbc_done:
+ movdqu XMMWORD[r8],xmm6
+ movaps xmm6,XMMWORD[16+rsp]
+ movaps xmm7,XMMWORD[32+rsp]
+ movaps xmm8,XMMWORD[48+rsp]
+ movaps xmm9,XMMWORD[64+rsp]
+ movaps xmm10,XMMWORD[80+rsp]
+ movaps xmm11,XMMWORD[96+rsp]
+ movaps xmm12,XMMWORD[112+rsp]
+ movaps xmm13,XMMWORD[128+rsp]
+ movaps xmm14,XMMWORD[144+rsp]
+ movaps xmm15,XMMWORD[160+rsp]
+ lea rsp,[184+rsp]
+$L$cbc_epilogue:
+$L$cbc_abort:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_vpaes_cbc_encrypt:
+
+
+
+
+
+
+
+ALIGN 16
+_vpaes_preheat:
+
+ lea r10,[$L$k_s0F]
+ movdqa xmm10,XMMWORD[((-32))+r10]
+ movdqa xmm11,XMMWORD[((-16))+r10]
+ movdqa xmm9,XMMWORD[r10]
+ movdqa xmm13,XMMWORD[48+r10]
+ movdqa xmm12,XMMWORD[64+r10]
+ movdqa xmm15,XMMWORD[80+r10]
+ movdqa xmm14,XMMWORD[96+r10]
+ DB 0F3h,0C3h ;repret
+
+
+
+
+
+
+
+
+ALIGN 64
+_vpaes_consts:
+$L$k_inv:
+ DQ 0x0E05060F0D080180,0x040703090A0B0C02
+ DQ 0x01040A060F0B0780,0x030D0E0C02050809
+
+$L$k_s0F:
+ DQ 0x0F0F0F0F0F0F0F0F,0x0F0F0F0F0F0F0F0F
+
+$L$k_ipt:
+ DQ 0xC2B2E8985A2A7000,0xCABAE09052227808
+ DQ 0x4C01307D317C4D00,0xCD80B1FCB0FDCC81
+
+$L$k_sb1:
+ DQ 0xB19BE18FCB503E00,0xA5DF7A6E142AF544
+ DQ 0x3618D415FAE22300,0x3BF7CCC10D2ED9EF
+$L$k_sb2:
+ DQ 0xE27A93C60B712400,0x5EB7E955BC982FCD
+ DQ 0x69EB88400AE12900,0xC2A163C8AB82234A
+$L$k_sbo:
+ DQ 0xD0D26D176FBDC700,0x15AABF7AC502A878
+ DQ 0xCFE474A55FBB6A00,0x8E1E90D1412B35FA
+
+$L$k_mc_forward:
+ DQ 0x0407060500030201,0x0C0F0E0D080B0A09
+ DQ 0x080B0A0904070605,0x000302010C0F0E0D
+ DQ 0x0C0F0E0D080B0A09,0x0407060500030201
+ DQ 0x000302010C0F0E0D,0x080B0A0904070605
+
+$L$k_mc_backward:
+ DQ 0x0605040702010003,0x0E0D0C0F0A09080B
+ DQ 0x020100030E0D0C0F,0x0A09080B06050407
+ DQ 0x0E0D0C0F0A09080B,0x0605040702010003
+ DQ 0x0A09080B06050407,0x020100030E0D0C0F
+
+$L$k_sr:
+ DQ 0x0706050403020100,0x0F0E0D0C0B0A0908
+ DQ 0x030E09040F0A0500,0x0B06010C07020D08
+ DQ 0x0F060D040B020900,0x070E050C030A0108
+ DQ 0x0B0E0104070A0D00,0x0306090C0F020508
+
+$L$k_rcon:
+ DQ 0x1F8391B9AF9DEEB6,0x702A98084D7C7D81
+
+$L$k_s63:
+ DQ 0x5B5B5B5B5B5B5B5B,0x5B5B5B5B5B5B5B5B
+
+$L$k_opt:
+ DQ 0xFF9F4929D6B66000,0xF7974121DEBE6808
+ DQ 0x01EDBD5150BCEC00,0xE10D5DB1B05C0CE0
+
+$L$k_deskew:
+ DQ 0x07E4A34047A4E300,0x1DFEB95A5DBEF91A
+ DQ 0x5F36B5DC83EA6900,0x2841C2ABF49D1E77
+
+
+
+
+
+$L$k_dksd:
+ DQ 0xFEB91A5DA3E44700,0x0740E3A45A1DBEF9
+ DQ 0x41C277F4B5368300,0x5FDC69EAAB289D1E
+$L$k_dksb:
+ DQ 0x9A4FCA1F8550D500,0x03D653861CC94C99
+ DQ 0x115BEDA7B6FC4A00,0xD993256F7E3482C8
+$L$k_dkse:
+ DQ 0xD5031CCA1FC9D600,0x53859A4C994F5086
+ DQ 0xA23196054FDC7BE8,0xCD5EF96A20B31487
+$L$k_dks9:
+ DQ 0xB6116FC87ED9A700,0x4AED933482255BFC
+ DQ 0x4576516227143300,0x8BB89FACE9DAFDCE
+
+
+
+
+
+$L$k_dipt:
+ DQ 0x0F505B040B545F00,0x154A411E114E451A
+ DQ 0x86E383E660056500,0x12771772F491F194
+
+$L$k_dsb9:
+ DQ 0x851C03539A86D600,0xCAD51F504F994CC9
+ DQ 0xC03B1789ECD74900,0x725E2C9EB2FBA565
+$L$k_dsbd:
+ DQ 0x7D57CCDFE6B1A200,0xF56E9B13882A4439
+ DQ 0x3CE2FAF724C6CB00,0x2931180D15DEEFD3
+$L$k_dsbb:
+ DQ 0xD022649296B44200,0x602646F6B0F2D404
+ DQ 0xC19498A6CD596700,0xF3FF0C3E3255AA6B
+$L$k_dsbe:
+ DQ 0x46F2929626D4D000,0x2242600464B4F6B0
+ DQ 0x0C55A6CDFFAAC100,0x9467F36B98593E32
+$L$k_dsbo:
+ DQ 0x1387EA537EF94000,0xC7AA6DB9D4943E2D
+ DQ 0x12D7560F93441D00,0xCA4B8159D8C58E9C
+DB 86,101,99,116,111,114,32,80,101,114,109,117,116,97,116,105
+DB 111,110,32,65,69,83,32,102,111,114,32,120,56,54,95,54
+DB 52,47,83,83,83,69,51,44,32,77,105,107,101,32,72,97
+DB 109,98,117,114,103,32,40,83,116,97,110,102,111,114,100,32
+DB 85,110,105,118,101,114,115,105,116,121,41,0
+ALIGN 64
+
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ lea rsi,[16+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+ lea rax,[184+rax]
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_vpaes_set_encrypt_key wrt ..imagebase
+ DD $L$SEH_end_vpaes_set_encrypt_key wrt ..imagebase
+ DD $L$SEH_info_vpaes_set_encrypt_key wrt ..imagebase
+
+ DD $L$SEH_begin_vpaes_set_decrypt_key wrt ..imagebase
+ DD $L$SEH_end_vpaes_set_decrypt_key wrt ..imagebase
+ DD $L$SEH_info_vpaes_set_decrypt_key wrt ..imagebase
+
+ DD $L$SEH_begin_vpaes_encrypt wrt ..imagebase
+ DD $L$SEH_end_vpaes_encrypt wrt ..imagebase
+ DD $L$SEH_info_vpaes_encrypt wrt ..imagebase
+
+ DD $L$SEH_begin_vpaes_decrypt wrt ..imagebase
+ DD $L$SEH_end_vpaes_decrypt wrt ..imagebase
+ DD $L$SEH_info_vpaes_decrypt wrt ..imagebase
+
+ DD $L$SEH_begin_vpaes_cbc_encrypt wrt ..imagebase
+ DD $L$SEH_end_vpaes_cbc_encrypt wrt ..imagebase
+ DD $L$SEH_info_vpaes_cbc_encrypt wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_vpaes_set_encrypt_key:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$enc_key_body wrt ..imagebase,$L$enc_key_epilogue wrt ..imagebase
+$L$SEH_info_vpaes_set_decrypt_key:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$dec_key_body wrt ..imagebase,$L$dec_key_epilogue wrt ..imagebase
+$L$SEH_info_vpaes_encrypt:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$enc_body wrt ..imagebase,$L$enc_epilogue wrt ..imagebase
+$L$SEH_info_vpaes_decrypt:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$dec_body wrt ..imagebase,$L$dec_epilogue wrt ..imagebase
+$L$SEH_info_vpaes_cbc_encrypt:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$cbc_body wrt ..imagebase,$L$cbc_epilogue wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm
new file mode 100644
index 0000000000..69443b7261
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm
@@ -0,0 +1,1989 @@
+; Copyright 2013-2019 The OpenSSL Project Authors. All Rights Reserved.
+; Copyright (c) 2012, Intel Corporation. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+global rsaz_1024_sqr_avx2
+
+ALIGN 64
+rsaz_1024_sqr_avx2:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_rsaz_1024_sqr_avx2:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+
+
+
+ lea rax,[rsp]
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ vzeroupper
+ lea rsp,[((-168))+rsp]
+ vmovaps XMMWORD[(-216)+rax],xmm6
+ vmovaps XMMWORD[(-200)+rax],xmm7
+ vmovaps XMMWORD[(-184)+rax],xmm8
+ vmovaps XMMWORD[(-168)+rax],xmm9
+ vmovaps XMMWORD[(-152)+rax],xmm10
+ vmovaps XMMWORD[(-136)+rax],xmm11
+ vmovaps XMMWORD[(-120)+rax],xmm12
+ vmovaps XMMWORD[(-104)+rax],xmm13
+ vmovaps XMMWORD[(-88)+rax],xmm14
+ vmovaps XMMWORD[(-72)+rax],xmm15
+$L$sqr_1024_body:
+ mov rbp,rax
+
+ mov r13,rdx
+ sub rsp,832
+ mov r15,r13
+ sub rdi,-128
+ sub rsi,-128
+ sub r13,-128
+
+ and r15,4095
+ add r15,32*10
+ shr r15,12
+ vpxor ymm9,ymm9,ymm9
+ jz NEAR $L$sqr_1024_no_n_copy
+
+
+
+
+
+ sub rsp,32*10
+ vmovdqu ymm0,YMMWORD[((0-128))+r13]
+ and rsp,-2048
+ vmovdqu ymm1,YMMWORD[((32-128))+r13]
+ vmovdqu ymm2,YMMWORD[((64-128))+r13]
+ vmovdqu ymm3,YMMWORD[((96-128))+r13]
+ vmovdqu ymm4,YMMWORD[((128-128))+r13]
+ vmovdqu ymm5,YMMWORD[((160-128))+r13]
+ vmovdqu ymm6,YMMWORD[((192-128))+r13]
+ vmovdqu ymm7,YMMWORD[((224-128))+r13]
+ vmovdqu ymm8,YMMWORD[((256-128))+r13]
+ lea r13,[((832+128))+rsp]
+ vmovdqu YMMWORD[(0-128)+r13],ymm0
+ vmovdqu YMMWORD[(32-128)+r13],ymm1
+ vmovdqu YMMWORD[(64-128)+r13],ymm2
+ vmovdqu YMMWORD[(96-128)+r13],ymm3
+ vmovdqu YMMWORD[(128-128)+r13],ymm4
+ vmovdqu YMMWORD[(160-128)+r13],ymm5
+ vmovdqu YMMWORD[(192-128)+r13],ymm6
+ vmovdqu YMMWORD[(224-128)+r13],ymm7
+ vmovdqu YMMWORD[(256-128)+r13],ymm8
+ vmovdqu YMMWORD[(288-128)+r13],ymm9
+
+$L$sqr_1024_no_n_copy:
+ and rsp,-1024
+
+ vmovdqu ymm1,YMMWORD[((32-128))+rsi]
+ vmovdqu ymm2,YMMWORD[((64-128))+rsi]
+ vmovdqu ymm3,YMMWORD[((96-128))+rsi]
+ vmovdqu ymm4,YMMWORD[((128-128))+rsi]
+ vmovdqu ymm5,YMMWORD[((160-128))+rsi]
+ vmovdqu ymm6,YMMWORD[((192-128))+rsi]
+ vmovdqu ymm7,YMMWORD[((224-128))+rsi]
+ vmovdqu ymm8,YMMWORD[((256-128))+rsi]
+
+ lea rbx,[192+rsp]
+ vmovdqu ymm15,YMMWORD[$L$and_mask]
+ jmp NEAR $L$OOP_GRANDE_SQR_1024
+
+ALIGN 32
+$L$OOP_GRANDE_SQR_1024:
+ lea r9,[((576+128))+rsp]
+ lea r12,[448+rsp]
+
+
+
+
+ vpaddq ymm1,ymm1,ymm1
+ vpbroadcastq ymm10,QWORD[((0-128))+rsi]
+ vpaddq ymm2,ymm2,ymm2
+ vmovdqa YMMWORD[(0-128)+r9],ymm1
+ vpaddq ymm3,ymm3,ymm3
+ vmovdqa YMMWORD[(32-128)+r9],ymm2
+ vpaddq ymm4,ymm4,ymm4
+ vmovdqa YMMWORD[(64-128)+r9],ymm3
+ vpaddq ymm5,ymm5,ymm5
+ vmovdqa YMMWORD[(96-128)+r9],ymm4
+ vpaddq ymm6,ymm6,ymm6
+ vmovdqa YMMWORD[(128-128)+r9],ymm5
+ vpaddq ymm7,ymm7,ymm7
+ vmovdqa YMMWORD[(160-128)+r9],ymm6
+ vpaddq ymm8,ymm8,ymm8
+ vmovdqa YMMWORD[(192-128)+r9],ymm7
+ vpxor ymm9,ymm9,ymm9
+ vmovdqa YMMWORD[(224-128)+r9],ymm8
+
+ vpmuludq ymm0,ymm10,YMMWORD[((0-128))+rsi]
+ vpbroadcastq ymm11,QWORD[((32-128))+rsi]
+ vmovdqu YMMWORD[(288-192)+rbx],ymm9
+ vpmuludq ymm1,ymm1,ymm10
+ vmovdqu YMMWORD[(320-448)+r12],ymm9
+ vpmuludq ymm2,ymm2,ymm10
+ vmovdqu YMMWORD[(352-448)+r12],ymm9
+ vpmuludq ymm3,ymm3,ymm10
+ vmovdqu YMMWORD[(384-448)+r12],ymm9
+ vpmuludq ymm4,ymm4,ymm10
+ vmovdqu YMMWORD[(416-448)+r12],ymm9
+ vpmuludq ymm5,ymm5,ymm10
+ vmovdqu YMMWORD[(448-448)+r12],ymm9
+ vpmuludq ymm6,ymm6,ymm10
+ vmovdqu YMMWORD[(480-448)+r12],ymm9
+ vpmuludq ymm7,ymm7,ymm10
+ vmovdqu YMMWORD[(512-448)+r12],ymm9
+ vpmuludq ymm8,ymm8,ymm10
+ vpbroadcastq ymm10,QWORD[((64-128))+rsi]
+ vmovdqu YMMWORD[(544-448)+r12],ymm9
+
+ mov r15,rsi
+ mov r14d,4
+ jmp NEAR $L$sqr_entry_1024
+ALIGN 32
+$L$OOP_SQR_1024:
+ vpbroadcastq ymm11,QWORD[((32-128))+r15]
+ vpmuludq ymm0,ymm10,YMMWORD[((0-128))+rsi]
+ vpaddq ymm0,ymm0,YMMWORD[((0-192))+rbx]
+ vpmuludq ymm1,ymm10,YMMWORD[((0-128))+r9]
+ vpaddq ymm1,ymm1,YMMWORD[((32-192))+rbx]
+ vpmuludq ymm2,ymm10,YMMWORD[((32-128))+r9]
+ vpaddq ymm2,ymm2,YMMWORD[((64-192))+rbx]
+ vpmuludq ymm3,ymm10,YMMWORD[((64-128))+r9]
+ vpaddq ymm3,ymm3,YMMWORD[((96-192))+rbx]
+ vpmuludq ymm4,ymm10,YMMWORD[((96-128))+r9]
+ vpaddq ymm4,ymm4,YMMWORD[((128-192))+rbx]
+ vpmuludq ymm5,ymm10,YMMWORD[((128-128))+r9]
+ vpaddq ymm5,ymm5,YMMWORD[((160-192))+rbx]
+ vpmuludq ymm6,ymm10,YMMWORD[((160-128))+r9]
+ vpaddq ymm6,ymm6,YMMWORD[((192-192))+rbx]
+ vpmuludq ymm7,ymm10,YMMWORD[((192-128))+r9]
+ vpaddq ymm7,ymm7,YMMWORD[((224-192))+rbx]
+ vpmuludq ymm8,ymm10,YMMWORD[((224-128))+r9]
+ vpbroadcastq ymm10,QWORD[((64-128))+r15]
+ vpaddq ymm8,ymm8,YMMWORD[((256-192))+rbx]
+$L$sqr_entry_1024:
+ vmovdqu YMMWORD[(0-192)+rbx],ymm0
+ vmovdqu YMMWORD[(32-192)+rbx],ymm1
+
+ vpmuludq ymm12,ymm11,YMMWORD[((32-128))+rsi]
+ vpaddq ymm2,ymm2,ymm12
+ vpmuludq ymm14,ymm11,YMMWORD[((32-128))+r9]
+ vpaddq ymm3,ymm3,ymm14
+ vpmuludq ymm13,ymm11,YMMWORD[((64-128))+r9]
+ vpaddq ymm4,ymm4,ymm13
+ vpmuludq ymm12,ymm11,YMMWORD[((96-128))+r9]
+ vpaddq ymm5,ymm5,ymm12
+ vpmuludq ymm14,ymm11,YMMWORD[((128-128))+r9]
+ vpaddq ymm6,ymm6,ymm14
+ vpmuludq ymm13,ymm11,YMMWORD[((160-128))+r9]
+ vpaddq ymm7,ymm7,ymm13
+ vpmuludq ymm12,ymm11,YMMWORD[((192-128))+r9]
+ vpaddq ymm8,ymm8,ymm12
+ vpmuludq ymm0,ymm11,YMMWORD[((224-128))+r9]
+ vpbroadcastq ymm11,QWORD[((96-128))+r15]
+ vpaddq ymm0,ymm0,YMMWORD[((288-192))+rbx]
+
+ vmovdqu YMMWORD[(64-192)+rbx],ymm2
+ vmovdqu YMMWORD[(96-192)+rbx],ymm3
+
+ vpmuludq ymm13,ymm10,YMMWORD[((64-128))+rsi]
+ vpaddq ymm4,ymm4,ymm13
+ vpmuludq ymm12,ymm10,YMMWORD[((64-128))+r9]
+ vpaddq ymm5,ymm5,ymm12
+ vpmuludq ymm14,ymm10,YMMWORD[((96-128))+r9]
+ vpaddq ymm6,ymm6,ymm14
+ vpmuludq ymm13,ymm10,YMMWORD[((128-128))+r9]
+ vpaddq ymm7,ymm7,ymm13
+ vpmuludq ymm12,ymm10,YMMWORD[((160-128))+r9]
+ vpaddq ymm8,ymm8,ymm12
+ vpmuludq ymm14,ymm10,YMMWORD[((192-128))+r9]
+ vpaddq ymm0,ymm0,ymm14
+ vpmuludq ymm1,ymm10,YMMWORD[((224-128))+r9]
+ vpbroadcastq ymm10,QWORD[((128-128))+r15]
+ vpaddq ymm1,ymm1,YMMWORD[((320-448))+r12]
+
+ vmovdqu YMMWORD[(128-192)+rbx],ymm4
+ vmovdqu YMMWORD[(160-192)+rbx],ymm5
+
+ vpmuludq ymm12,ymm11,YMMWORD[((96-128))+rsi]
+ vpaddq ymm6,ymm6,ymm12
+ vpmuludq ymm14,ymm11,YMMWORD[((96-128))+r9]
+ vpaddq ymm7,ymm7,ymm14
+ vpmuludq ymm13,ymm11,YMMWORD[((128-128))+r9]
+ vpaddq ymm8,ymm8,ymm13
+ vpmuludq ymm12,ymm11,YMMWORD[((160-128))+r9]
+ vpaddq ymm0,ymm0,ymm12
+ vpmuludq ymm14,ymm11,YMMWORD[((192-128))+r9]
+ vpaddq ymm1,ymm1,ymm14
+ vpmuludq ymm2,ymm11,YMMWORD[((224-128))+r9]
+ vpbroadcastq ymm11,QWORD[((160-128))+r15]
+ vpaddq ymm2,ymm2,YMMWORD[((352-448))+r12]
+
+ vmovdqu YMMWORD[(192-192)+rbx],ymm6
+ vmovdqu YMMWORD[(224-192)+rbx],ymm7
+
+ vpmuludq ymm12,ymm10,YMMWORD[((128-128))+rsi]
+ vpaddq ymm8,ymm8,ymm12
+ vpmuludq ymm14,ymm10,YMMWORD[((128-128))+r9]
+ vpaddq ymm0,ymm0,ymm14
+ vpmuludq ymm13,ymm10,YMMWORD[((160-128))+r9]
+ vpaddq ymm1,ymm1,ymm13
+ vpmuludq ymm12,ymm10,YMMWORD[((192-128))+r9]
+ vpaddq ymm2,ymm2,ymm12
+ vpmuludq ymm3,ymm10,YMMWORD[((224-128))+r9]
+ vpbroadcastq ymm10,QWORD[((192-128))+r15]
+ vpaddq ymm3,ymm3,YMMWORD[((384-448))+r12]
+
+ vmovdqu YMMWORD[(256-192)+rbx],ymm8
+ vmovdqu YMMWORD[(288-192)+rbx],ymm0
+ lea rbx,[8+rbx]
+
+ vpmuludq ymm13,ymm11,YMMWORD[((160-128))+rsi]
+ vpaddq ymm1,ymm1,ymm13
+ vpmuludq ymm12,ymm11,YMMWORD[((160-128))+r9]
+ vpaddq ymm2,ymm2,ymm12
+ vpmuludq ymm14,ymm11,YMMWORD[((192-128))+r9]
+ vpaddq ymm3,ymm3,ymm14
+ vpmuludq ymm4,ymm11,YMMWORD[((224-128))+r9]
+ vpbroadcastq ymm11,QWORD[((224-128))+r15]
+ vpaddq ymm4,ymm4,YMMWORD[((416-448))+r12]
+
+ vmovdqu YMMWORD[(320-448)+r12],ymm1
+ vmovdqu YMMWORD[(352-448)+r12],ymm2
+
+ vpmuludq ymm12,ymm10,YMMWORD[((192-128))+rsi]
+ vpaddq ymm3,ymm3,ymm12
+ vpmuludq ymm14,ymm10,YMMWORD[((192-128))+r9]
+ vpbroadcastq ymm0,QWORD[((256-128))+r15]
+ vpaddq ymm4,ymm4,ymm14
+ vpmuludq ymm5,ymm10,YMMWORD[((224-128))+r9]
+ vpbroadcastq ymm10,QWORD[((0+8-128))+r15]
+ vpaddq ymm5,ymm5,YMMWORD[((448-448))+r12]
+
+ vmovdqu YMMWORD[(384-448)+r12],ymm3
+ vmovdqu YMMWORD[(416-448)+r12],ymm4
+ lea r15,[8+r15]
+
+ vpmuludq ymm12,ymm11,YMMWORD[((224-128))+rsi]
+ vpaddq ymm5,ymm5,ymm12
+ vpmuludq ymm6,ymm11,YMMWORD[((224-128))+r9]
+ vpaddq ymm6,ymm6,YMMWORD[((480-448))+r12]
+
+ vpmuludq ymm7,ymm0,YMMWORD[((256-128))+rsi]
+ vmovdqu YMMWORD[(448-448)+r12],ymm5
+ vpaddq ymm7,ymm7,YMMWORD[((512-448))+r12]
+ vmovdqu YMMWORD[(480-448)+r12],ymm6
+ vmovdqu YMMWORD[(512-448)+r12],ymm7
+ lea r12,[8+r12]
+
+ dec r14d
+ jnz NEAR $L$OOP_SQR_1024
+
+ vmovdqu ymm8,YMMWORD[256+rsp]
+ vmovdqu ymm1,YMMWORD[288+rsp]
+ vmovdqu ymm2,YMMWORD[320+rsp]
+ lea rbx,[192+rsp]
+
+ vpsrlq ymm14,ymm8,29
+ vpand ymm8,ymm8,ymm15
+ vpsrlq ymm11,ymm1,29
+ vpand ymm1,ymm1,ymm15
+
+ vpermq ymm14,ymm14,0x93
+ vpxor ymm9,ymm9,ymm9
+ vpermq ymm11,ymm11,0x93
+
+ vpblendd ymm10,ymm14,ymm9,3
+ vpblendd ymm14,ymm11,ymm14,3
+ vpaddq ymm8,ymm8,ymm10
+ vpblendd ymm11,ymm9,ymm11,3
+ vpaddq ymm1,ymm1,ymm14
+ vpaddq ymm2,ymm2,ymm11
+ vmovdqu YMMWORD[(288-192)+rbx],ymm1
+ vmovdqu YMMWORD[(320-192)+rbx],ymm2
+
+ mov rax,QWORD[rsp]
+ mov r10,QWORD[8+rsp]
+ mov r11,QWORD[16+rsp]
+ mov r12,QWORD[24+rsp]
+ vmovdqu ymm1,YMMWORD[32+rsp]
+ vmovdqu ymm2,YMMWORD[((64-192))+rbx]
+ vmovdqu ymm3,YMMWORD[((96-192))+rbx]
+ vmovdqu ymm4,YMMWORD[((128-192))+rbx]
+ vmovdqu ymm5,YMMWORD[((160-192))+rbx]
+ vmovdqu ymm6,YMMWORD[((192-192))+rbx]
+ vmovdqu ymm7,YMMWORD[((224-192))+rbx]
+
+ mov r9,rax
+ imul eax,ecx
+ and eax,0x1fffffff
+ vmovd xmm12,eax
+
+ mov rdx,rax
+ imul rax,QWORD[((-128))+r13]
+ vpbroadcastq ymm12,xmm12
+ add r9,rax
+ mov rax,rdx
+ imul rax,QWORD[((8-128))+r13]
+ shr r9,29
+ add r10,rax
+ mov rax,rdx
+ imul rax,QWORD[((16-128))+r13]
+ add r10,r9
+ add r11,rax
+ imul rdx,QWORD[((24-128))+r13]
+ add r12,rdx
+
+ mov rax,r10
+ imul eax,ecx
+ and eax,0x1fffffff
+
+ mov r14d,9
+ jmp NEAR $L$OOP_REDUCE_1024
+
+ALIGN 32
+$L$OOP_REDUCE_1024:
+ vmovd xmm13,eax
+ vpbroadcastq ymm13,xmm13
+
+ vpmuludq ymm10,ymm12,YMMWORD[((32-128))+r13]
+ mov rdx,rax
+ imul rax,QWORD[((-128))+r13]
+ vpaddq ymm1,ymm1,ymm10
+ add r10,rax
+ vpmuludq ymm14,ymm12,YMMWORD[((64-128))+r13]
+ mov rax,rdx
+ imul rax,QWORD[((8-128))+r13]
+ vpaddq ymm2,ymm2,ymm14
+ vpmuludq ymm11,ymm12,YMMWORD[((96-128))+r13]
+DB 0x67
+ add r11,rax
+DB 0x67
+ mov rax,rdx
+ imul rax,QWORD[((16-128))+r13]
+ shr r10,29
+ vpaddq ymm3,ymm3,ymm11
+ vpmuludq ymm10,ymm12,YMMWORD[((128-128))+r13]
+ add r12,rax
+ add r11,r10
+ vpaddq ymm4,ymm4,ymm10
+ vpmuludq ymm14,ymm12,YMMWORD[((160-128))+r13]
+ mov rax,r11
+ imul eax,ecx
+ vpaddq ymm5,ymm5,ymm14
+ vpmuludq ymm11,ymm12,YMMWORD[((192-128))+r13]
+ and eax,0x1fffffff
+ vpaddq ymm6,ymm6,ymm11
+ vpmuludq ymm10,ymm12,YMMWORD[((224-128))+r13]
+ vpaddq ymm7,ymm7,ymm10
+ vpmuludq ymm14,ymm12,YMMWORD[((256-128))+r13]
+ vmovd xmm12,eax
+
+ vpaddq ymm8,ymm8,ymm14
+
+ vpbroadcastq ymm12,xmm12
+
+ vpmuludq ymm11,ymm13,YMMWORD[((32-8-128))+r13]
+ vmovdqu ymm14,YMMWORD[((96-8-128))+r13]
+ mov rdx,rax
+ imul rax,QWORD[((-128))+r13]
+ vpaddq ymm1,ymm1,ymm11
+ vpmuludq ymm10,ymm13,YMMWORD[((64-8-128))+r13]
+ vmovdqu ymm11,YMMWORD[((128-8-128))+r13]
+ add r11,rax
+ mov rax,rdx
+ imul rax,QWORD[((8-128))+r13]
+ vpaddq ymm2,ymm2,ymm10
+ add rax,r12
+ shr r11,29
+ vpmuludq ymm14,ymm14,ymm13
+ vmovdqu ymm10,YMMWORD[((160-8-128))+r13]
+ add rax,r11
+ vpaddq ymm3,ymm3,ymm14
+ vpmuludq ymm11,ymm11,ymm13
+ vmovdqu ymm14,YMMWORD[((192-8-128))+r13]
+DB 0x67
+ mov r12,rax
+ imul eax,ecx
+ vpaddq ymm4,ymm4,ymm11
+ vpmuludq ymm10,ymm10,ymm13
+DB 0xc4,0x41,0x7e,0x6f,0x9d,0x58,0x00,0x00,0x00
+ and eax,0x1fffffff
+ vpaddq ymm5,ymm5,ymm10
+ vpmuludq ymm14,ymm14,ymm13
+ vmovdqu ymm10,YMMWORD[((256-8-128))+r13]
+ vpaddq ymm6,ymm6,ymm14
+ vpmuludq ymm11,ymm11,ymm13
+ vmovdqu ymm9,YMMWORD[((288-8-128))+r13]
+ vmovd xmm0,eax
+ imul rax,QWORD[((-128))+r13]
+ vpaddq ymm7,ymm7,ymm11
+ vpmuludq ymm10,ymm10,ymm13
+ vmovdqu ymm14,YMMWORD[((32-16-128))+r13]
+ vpbroadcastq ymm0,xmm0
+ vpaddq ymm8,ymm8,ymm10
+ vpmuludq ymm9,ymm9,ymm13
+ vmovdqu ymm11,YMMWORD[((64-16-128))+r13]
+ add r12,rax
+
+ vmovdqu ymm13,YMMWORD[((32-24-128))+r13]
+ vpmuludq ymm14,ymm14,ymm12
+ vmovdqu ymm10,YMMWORD[((96-16-128))+r13]
+ vpaddq ymm1,ymm1,ymm14
+ vpmuludq ymm13,ymm13,ymm0
+ vpmuludq ymm11,ymm11,ymm12
+DB 0xc4,0x41,0x7e,0x6f,0xb5,0xf0,0xff,0xff,0xff
+ vpaddq ymm13,ymm13,ymm1
+ vpaddq ymm2,ymm2,ymm11
+ vpmuludq ymm10,ymm10,ymm12
+ vmovdqu ymm11,YMMWORD[((160-16-128))+r13]
+DB 0x67
+ vmovq rax,xmm13
+ vmovdqu YMMWORD[rsp],ymm13
+ vpaddq ymm3,ymm3,ymm10
+ vpmuludq ymm14,ymm14,ymm12
+ vmovdqu ymm10,YMMWORD[((192-16-128))+r13]
+ vpaddq ymm4,ymm4,ymm14
+ vpmuludq ymm11,ymm11,ymm12
+ vmovdqu ymm14,YMMWORD[((224-16-128))+r13]
+ vpaddq ymm5,ymm5,ymm11
+ vpmuludq ymm10,ymm10,ymm12
+ vmovdqu ymm11,YMMWORD[((256-16-128))+r13]
+ vpaddq ymm6,ymm6,ymm10
+ vpmuludq ymm14,ymm14,ymm12
+ shr r12,29
+ vmovdqu ymm10,YMMWORD[((288-16-128))+r13]
+ add rax,r12
+ vpaddq ymm7,ymm7,ymm14
+ vpmuludq ymm11,ymm11,ymm12
+
+ mov r9,rax
+ imul eax,ecx
+ vpaddq ymm8,ymm8,ymm11
+ vpmuludq ymm10,ymm10,ymm12
+ and eax,0x1fffffff
+ vmovd xmm12,eax
+ vmovdqu ymm11,YMMWORD[((96-24-128))+r13]
+DB 0x67
+ vpaddq ymm9,ymm9,ymm10
+ vpbroadcastq ymm12,xmm12
+
+ vpmuludq ymm14,ymm0,YMMWORD[((64-24-128))+r13]
+ vmovdqu ymm10,YMMWORD[((128-24-128))+r13]
+ mov rdx,rax
+ imul rax,QWORD[((-128))+r13]
+ mov r10,QWORD[8+rsp]
+ vpaddq ymm1,ymm2,ymm14
+ vpmuludq ymm11,ymm11,ymm0
+ vmovdqu ymm14,YMMWORD[((160-24-128))+r13]
+ add r9,rax
+ mov rax,rdx
+ imul rax,QWORD[((8-128))+r13]
+DB 0x67
+ shr r9,29
+ mov r11,QWORD[16+rsp]
+ vpaddq ymm2,ymm3,ymm11
+ vpmuludq ymm10,ymm10,ymm0
+ vmovdqu ymm11,YMMWORD[((192-24-128))+r13]
+ add r10,rax
+ mov rax,rdx
+ imul rax,QWORD[((16-128))+r13]
+ vpaddq ymm3,ymm4,ymm10
+ vpmuludq ymm14,ymm14,ymm0
+ vmovdqu ymm10,YMMWORD[((224-24-128))+r13]
+ imul rdx,QWORD[((24-128))+r13]
+ add r11,rax
+ lea rax,[r10*1+r9]
+ vpaddq ymm4,ymm5,ymm14
+ vpmuludq ymm11,ymm11,ymm0
+ vmovdqu ymm14,YMMWORD[((256-24-128))+r13]
+ mov r10,rax
+ imul eax,ecx
+ vpmuludq ymm10,ymm10,ymm0
+ vpaddq ymm5,ymm6,ymm11
+ vmovdqu ymm11,YMMWORD[((288-24-128))+r13]
+ and eax,0x1fffffff
+ vpaddq ymm6,ymm7,ymm10
+ vpmuludq ymm14,ymm14,ymm0
+ add rdx,QWORD[24+rsp]
+ vpaddq ymm7,ymm8,ymm14
+ vpmuludq ymm11,ymm11,ymm0
+ vpaddq ymm8,ymm9,ymm11
+ vmovq xmm9,r12
+ mov r12,rdx
+
+ dec r14d
+ jnz NEAR $L$OOP_REDUCE_1024
+ lea r12,[448+rsp]
+ vpaddq ymm0,ymm13,ymm9
+ vpxor ymm9,ymm9,ymm9
+
+ vpaddq ymm0,ymm0,YMMWORD[((288-192))+rbx]
+ vpaddq ymm1,ymm1,YMMWORD[((320-448))+r12]
+ vpaddq ymm2,ymm2,YMMWORD[((352-448))+r12]
+ vpaddq ymm3,ymm3,YMMWORD[((384-448))+r12]
+ vpaddq ymm4,ymm4,YMMWORD[((416-448))+r12]
+ vpaddq ymm5,ymm5,YMMWORD[((448-448))+r12]
+ vpaddq ymm6,ymm6,YMMWORD[((480-448))+r12]
+ vpaddq ymm7,ymm7,YMMWORD[((512-448))+r12]
+ vpaddq ymm8,ymm8,YMMWORD[((544-448))+r12]
+
+ vpsrlq ymm14,ymm0,29
+ vpand ymm0,ymm0,ymm15
+ vpsrlq ymm11,ymm1,29
+ vpand ymm1,ymm1,ymm15
+ vpsrlq ymm12,ymm2,29
+ vpermq ymm14,ymm14,0x93
+ vpand ymm2,ymm2,ymm15
+ vpsrlq ymm13,ymm3,29
+ vpermq ymm11,ymm11,0x93
+ vpand ymm3,ymm3,ymm15
+ vpermq ymm12,ymm12,0x93
+
+ vpblendd ymm10,ymm14,ymm9,3
+ vpermq ymm13,ymm13,0x93
+ vpblendd ymm14,ymm11,ymm14,3
+ vpaddq ymm0,ymm0,ymm10
+ vpblendd ymm11,ymm12,ymm11,3
+ vpaddq ymm1,ymm1,ymm14
+ vpblendd ymm12,ymm13,ymm12,3
+ vpaddq ymm2,ymm2,ymm11
+ vpblendd ymm13,ymm9,ymm13,3
+ vpaddq ymm3,ymm3,ymm12
+ vpaddq ymm4,ymm4,ymm13
+
+ vpsrlq ymm14,ymm0,29
+ vpand ymm0,ymm0,ymm15
+ vpsrlq ymm11,ymm1,29
+ vpand ymm1,ymm1,ymm15
+ vpsrlq ymm12,ymm2,29
+ vpermq ymm14,ymm14,0x93
+ vpand ymm2,ymm2,ymm15
+ vpsrlq ymm13,ymm3,29
+ vpermq ymm11,ymm11,0x93
+ vpand ymm3,ymm3,ymm15
+ vpermq ymm12,ymm12,0x93
+
+ vpblendd ymm10,ymm14,ymm9,3
+ vpermq ymm13,ymm13,0x93
+ vpblendd ymm14,ymm11,ymm14,3
+ vpaddq ymm0,ymm0,ymm10
+ vpblendd ymm11,ymm12,ymm11,3
+ vpaddq ymm1,ymm1,ymm14
+ vmovdqu YMMWORD[(0-128)+rdi],ymm0
+ vpblendd ymm12,ymm13,ymm12,3
+ vpaddq ymm2,ymm2,ymm11
+ vmovdqu YMMWORD[(32-128)+rdi],ymm1
+ vpblendd ymm13,ymm9,ymm13,3
+ vpaddq ymm3,ymm3,ymm12
+ vmovdqu YMMWORD[(64-128)+rdi],ymm2
+ vpaddq ymm4,ymm4,ymm13
+ vmovdqu YMMWORD[(96-128)+rdi],ymm3
+ vpsrlq ymm14,ymm4,29
+ vpand ymm4,ymm4,ymm15
+ vpsrlq ymm11,ymm5,29
+ vpand ymm5,ymm5,ymm15
+ vpsrlq ymm12,ymm6,29
+ vpermq ymm14,ymm14,0x93
+ vpand ymm6,ymm6,ymm15
+ vpsrlq ymm13,ymm7,29
+ vpermq ymm11,ymm11,0x93
+ vpand ymm7,ymm7,ymm15
+ vpsrlq ymm0,ymm8,29
+ vpermq ymm12,ymm12,0x93
+ vpand ymm8,ymm8,ymm15
+ vpermq ymm13,ymm13,0x93
+
+ vpblendd ymm10,ymm14,ymm9,3
+ vpermq ymm0,ymm0,0x93
+ vpblendd ymm14,ymm11,ymm14,3
+ vpaddq ymm4,ymm4,ymm10
+ vpblendd ymm11,ymm12,ymm11,3
+ vpaddq ymm5,ymm5,ymm14
+ vpblendd ymm12,ymm13,ymm12,3
+ vpaddq ymm6,ymm6,ymm11
+ vpblendd ymm13,ymm0,ymm13,3
+ vpaddq ymm7,ymm7,ymm12
+ vpaddq ymm8,ymm8,ymm13
+
+ vpsrlq ymm14,ymm4,29
+ vpand ymm4,ymm4,ymm15
+ vpsrlq ymm11,ymm5,29
+ vpand ymm5,ymm5,ymm15
+ vpsrlq ymm12,ymm6,29
+ vpermq ymm14,ymm14,0x93
+ vpand ymm6,ymm6,ymm15
+ vpsrlq ymm13,ymm7,29
+ vpermq ymm11,ymm11,0x93
+ vpand ymm7,ymm7,ymm15
+ vpsrlq ymm0,ymm8,29
+ vpermq ymm12,ymm12,0x93
+ vpand ymm8,ymm8,ymm15
+ vpermq ymm13,ymm13,0x93
+
+ vpblendd ymm10,ymm14,ymm9,3
+ vpermq ymm0,ymm0,0x93
+ vpblendd ymm14,ymm11,ymm14,3
+ vpaddq ymm4,ymm4,ymm10
+ vpblendd ymm11,ymm12,ymm11,3
+ vpaddq ymm5,ymm5,ymm14
+ vmovdqu YMMWORD[(128-128)+rdi],ymm4
+ vpblendd ymm12,ymm13,ymm12,3
+ vpaddq ymm6,ymm6,ymm11
+ vmovdqu YMMWORD[(160-128)+rdi],ymm5
+ vpblendd ymm13,ymm0,ymm13,3
+ vpaddq ymm7,ymm7,ymm12
+ vmovdqu YMMWORD[(192-128)+rdi],ymm6
+ vpaddq ymm8,ymm8,ymm13
+ vmovdqu YMMWORD[(224-128)+rdi],ymm7
+ vmovdqu YMMWORD[(256-128)+rdi],ymm8
+
+ mov rsi,rdi
+ dec r8d
+ jne NEAR $L$OOP_GRANDE_SQR_1024
+
+ vzeroall
+ mov rax,rbp
+
+$L$sqr_1024_in_tail:
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+ movaps xmm13,XMMWORD[((-104))+rax]
+ movaps xmm14,XMMWORD[((-88))+rax]
+ movaps xmm15,XMMWORD[((-72))+rax]
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$sqr_1024_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_rsaz_1024_sqr_avx2:
+global rsaz_1024_mul_avx2
+
+ALIGN 64
+rsaz_1024_mul_avx2:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_rsaz_1024_mul_avx2:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+
+
+
+ lea rax,[rsp]
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ vzeroupper
+ lea rsp,[((-168))+rsp]
+ vmovaps XMMWORD[(-216)+rax],xmm6
+ vmovaps XMMWORD[(-200)+rax],xmm7
+ vmovaps XMMWORD[(-184)+rax],xmm8
+ vmovaps XMMWORD[(-168)+rax],xmm9
+ vmovaps XMMWORD[(-152)+rax],xmm10
+ vmovaps XMMWORD[(-136)+rax],xmm11
+ vmovaps XMMWORD[(-120)+rax],xmm12
+ vmovaps XMMWORD[(-104)+rax],xmm13
+ vmovaps XMMWORD[(-88)+rax],xmm14
+ vmovaps XMMWORD[(-72)+rax],xmm15
+$L$mul_1024_body:
+ mov rbp,rax
+
+ vzeroall
+ mov r13,rdx
+ sub rsp,64
+
+
+
+
+
+
+DB 0x67,0x67
+ mov r15,rsi
+ and r15,4095
+ add r15,32*10
+ shr r15,12
+ mov r15,rsi
+ cmovnz rsi,r13
+ cmovnz r13,r15
+
+ mov r15,rcx
+ sub rsi,-128
+ sub rcx,-128
+ sub rdi,-128
+
+ and r15,4095
+ add r15,32*10
+DB 0x67,0x67
+ shr r15,12
+ jz NEAR $L$mul_1024_no_n_copy
+
+
+
+
+
+ sub rsp,32*10
+ vmovdqu ymm0,YMMWORD[((0-128))+rcx]
+ and rsp,-512
+ vmovdqu ymm1,YMMWORD[((32-128))+rcx]
+ vmovdqu ymm2,YMMWORD[((64-128))+rcx]
+ vmovdqu ymm3,YMMWORD[((96-128))+rcx]
+ vmovdqu ymm4,YMMWORD[((128-128))+rcx]
+ vmovdqu ymm5,YMMWORD[((160-128))+rcx]
+ vmovdqu ymm6,YMMWORD[((192-128))+rcx]
+ vmovdqu ymm7,YMMWORD[((224-128))+rcx]
+ vmovdqu ymm8,YMMWORD[((256-128))+rcx]
+ lea rcx,[((64+128))+rsp]
+ vmovdqu YMMWORD[(0-128)+rcx],ymm0
+ vpxor ymm0,ymm0,ymm0
+ vmovdqu YMMWORD[(32-128)+rcx],ymm1
+ vpxor ymm1,ymm1,ymm1
+ vmovdqu YMMWORD[(64-128)+rcx],ymm2
+ vpxor ymm2,ymm2,ymm2
+ vmovdqu YMMWORD[(96-128)+rcx],ymm3
+ vpxor ymm3,ymm3,ymm3
+ vmovdqu YMMWORD[(128-128)+rcx],ymm4
+ vpxor ymm4,ymm4,ymm4
+ vmovdqu YMMWORD[(160-128)+rcx],ymm5
+ vpxor ymm5,ymm5,ymm5
+ vmovdqu YMMWORD[(192-128)+rcx],ymm6
+ vpxor ymm6,ymm6,ymm6
+ vmovdqu YMMWORD[(224-128)+rcx],ymm7
+ vpxor ymm7,ymm7,ymm7
+ vmovdqu YMMWORD[(256-128)+rcx],ymm8
+ vmovdqa ymm8,ymm0
+ vmovdqu YMMWORD[(288-128)+rcx],ymm9
+$L$mul_1024_no_n_copy:
+ and rsp,-64
+
+ mov rbx,QWORD[r13]
+ vpbroadcastq ymm10,QWORD[r13]
+ vmovdqu YMMWORD[rsp],ymm0
+ xor r9,r9
+DB 0x67
+ xor r10,r10
+ xor r11,r11
+ xor r12,r12
+
+ vmovdqu ymm15,YMMWORD[$L$and_mask]
+ mov r14d,9
+ vmovdqu YMMWORD[(288-128)+rdi],ymm9
+ jmp NEAR $L$oop_mul_1024
+
+ALIGN 32
+$L$oop_mul_1024:
+ vpsrlq ymm9,ymm3,29
+ mov rax,rbx
+ imul rax,QWORD[((-128))+rsi]
+ add rax,r9
+ mov r10,rbx
+ imul r10,QWORD[((8-128))+rsi]
+ add r10,QWORD[8+rsp]
+
+ mov r9,rax
+ imul eax,r8d
+ and eax,0x1fffffff
+
+ mov r11,rbx
+ imul r11,QWORD[((16-128))+rsi]
+ add r11,QWORD[16+rsp]
+
+ mov r12,rbx
+ imul r12,QWORD[((24-128))+rsi]
+ add r12,QWORD[24+rsp]
+ vpmuludq ymm0,ymm10,YMMWORD[((32-128))+rsi]
+ vmovd xmm11,eax
+ vpaddq ymm1,ymm1,ymm0
+ vpmuludq ymm12,ymm10,YMMWORD[((64-128))+rsi]
+ vpbroadcastq ymm11,xmm11
+ vpaddq ymm2,ymm2,ymm12
+ vpmuludq ymm13,ymm10,YMMWORD[((96-128))+rsi]
+ vpand ymm3,ymm3,ymm15
+ vpaddq ymm3,ymm3,ymm13
+ vpmuludq ymm0,ymm10,YMMWORD[((128-128))+rsi]
+ vpaddq ymm4,ymm4,ymm0
+ vpmuludq ymm12,ymm10,YMMWORD[((160-128))+rsi]
+ vpaddq ymm5,ymm5,ymm12
+ vpmuludq ymm13,ymm10,YMMWORD[((192-128))+rsi]
+ vpaddq ymm6,ymm6,ymm13
+ vpmuludq ymm0,ymm10,YMMWORD[((224-128))+rsi]
+ vpermq ymm9,ymm9,0x93
+ vpaddq ymm7,ymm7,ymm0
+ vpmuludq ymm12,ymm10,YMMWORD[((256-128))+rsi]
+ vpbroadcastq ymm10,QWORD[8+r13]
+ vpaddq ymm8,ymm8,ymm12
+
+ mov rdx,rax
+ imul rax,QWORD[((-128))+rcx]
+ add r9,rax
+ mov rax,rdx
+ imul rax,QWORD[((8-128))+rcx]
+ add r10,rax
+ mov rax,rdx
+ imul rax,QWORD[((16-128))+rcx]
+ add r11,rax
+ shr r9,29
+ imul rdx,QWORD[((24-128))+rcx]
+ add r12,rdx
+ add r10,r9
+
+ vpmuludq ymm13,ymm11,YMMWORD[((32-128))+rcx]
+ vmovq rbx,xmm10
+ vpaddq ymm1,ymm1,ymm13
+ vpmuludq ymm0,ymm11,YMMWORD[((64-128))+rcx]
+ vpaddq ymm2,ymm2,ymm0
+ vpmuludq ymm12,ymm11,YMMWORD[((96-128))+rcx]
+ vpaddq ymm3,ymm3,ymm12
+ vpmuludq ymm13,ymm11,YMMWORD[((128-128))+rcx]
+ vpaddq ymm4,ymm4,ymm13
+ vpmuludq ymm0,ymm11,YMMWORD[((160-128))+rcx]
+ vpaddq ymm5,ymm5,ymm0
+ vpmuludq ymm12,ymm11,YMMWORD[((192-128))+rcx]
+ vpaddq ymm6,ymm6,ymm12
+ vpmuludq ymm13,ymm11,YMMWORD[((224-128))+rcx]
+ vpblendd ymm12,ymm9,ymm14,3
+ vpaddq ymm7,ymm7,ymm13
+ vpmuludq ymm0,ymm11,YMMWORD[((256-128))+rcx]
+ vpaddq ymm3,ymm3,ymm12
+ vpaddq ymm8,ymm8,ymm0
+
+ mov rax,rbx
+ imul rax,QWORD[((-128))+rsi]
+ add r10,rax
+ vmovdqu ymm12,YMMWORD[((-8+32-128))+rsi]
+ mov rax,rbx
+ imul rax,QWORD[((8-128))+rsi]
+ add r11,rax
+ vmovdqu ymm13,YMMWORD[((-8+64-128))+rsi]
+
+ mov rax,r10
+ vpblendd ymm9,ymm9,ymm14,0xfc
+ imul eax,r8d
+ vpaddq ymm4,ymm4,ymm9
+ and eax,0x1fffffff
+
+ imul rbx,QWORD[((16-128))+rsi]
+ add r12,rbx
+ vpmuludq ymm12,ymm12,ymm10
+ vmovd xmm11,eax
+ vmovdqu ymm0,YMMWORD[((-8+96-128))+rsi]
+ vpaddq ymm1,ymm1,ymm12
+ vpmuludq ymm13,ymm13,ymm10
+ vpbroadcastq ymm11,xmm11
+ vmovdqu ymm12,YMMWORD[((-8+128-128))+rsi]
+ vpaddq ymm2,ymm2,ymm13
+ vpmuludq ymm0,ymm0,ymm10
+ vmovdqu ymm13,YMMWORD[((-8+160-128))+rsi]
+ vpaddq ymm3,ymm3,ymm0
+ vpmuludq ymm12,ymm12,ymm10
+ vmovdqu ymm0,YMMWORD[((-8+192-128))+rsi]
+ vpaddq ymm4,ymm4,ymm12
+ vpmuludq ymm13,ymm13,ymm10
+ vmovdqu ymm12,YMMWORD[((-8+224-128))+rsi]
+ vpaddq ymm5,ymm5,ymm13
+ vpmuludq ymm0,ymm0,ymm10
+ vmovdqu ymm13,YMMWORD[((-8+256-128))+rsi]
+ vpaddq ymm6,ymm6,ymm0
+ vpmuludq ymm12,ymm12,ymm10
+ vmovdqu ymm9,YMMWORD[((-8+288-128))+rsi]
+ vpaddq ymm7,ymm7,ymm12
+ vpmuludq ymm13,ymm13,ymm10
+ vpaddq ymm8,ymm8,ymm13
+ vpmuludq ymm9,ymm9,ymm10
+ vpbroadcastq ymm10,QWORD[16+r13]
+
+ mov rdx,rax
+ imul rax,QWORD[((-128))+rcx]
+ add r10,rax
+ vmovdqu ymm0,YMMWORD[((-8+32-128))+rcx]
+ mov rax,rdx
+ imul rax,QWORD[((8-128))+rcx]
+ add r11,rax
+ vmovdqu ymm12,YMMWORD[((-8+64-128))+rcx]
+ shr r10,29
+ imul rdx,QWORD[((16-128))+rcx]
+ add r12,rdx
+ add r11,r10
+
+ vpmuludq ymm0,ymm0,ymm11
+ vmovq rbx,xmm10
+ vmovdqu ymm13,YMMWORD[((-8+96-128))+rcx]
+ vpaddq ymm1,ymm1,ymm0
+ vpmuludq ymm12,ymm12,ymm11
+ vmovdqu ymm0,YMMWORD[((-8+128-128))+rcx]
+ vpaddq ymm2,ymm2,ymm12
+ vpmuludq ymm13,ymm13,ymm11
+ vmovdqu ymm12,YMMWORD[((-8+160-128))+rcx]
+ vpaddq ymm3,ymm3,ymm13
+ vpmuludq ymm0,ymm0,ymm11
+ vmovdqu ymm13,YMMWORD[((-8+192-128))+rcx]
+ vpaddq ymm4,ymm4,ymm0
+ vpmuludq ymm12,ymm12,ymm11
+ vmovdqu ymm0,YMMWORD[((-8+224-128))+rcx]
+ vpaddq ymm5,ymm5,ymm12
+ vpmuludq ymm13,ymm13,ymm11
+ vmovdqu ymm12,YMMWORD[((-8+256-128))+rcx]
+ vpaddq ymm6,ymm6,ymm13
+ vpmuludq ymm0,ymm0,ymm11
+ vmovdqu ymm13,YMMWORD[((-8+288-128))+rcx]
+ vpaddq ymm7,ymm7,ymm0
+ vpmuludq ymm12,ymm12,ymm11
+ vpaddq ymm8,ymm8,ymm12
+ vpmuludq ymm13,ymm13,ymm11
+ vpaddq ymm9,ymm9,ymm13
+
+ vmovdqu ymm0,YMMWORD[((-16+32-128))+rsi]
+ mov rax,rbx
+ imul rax,QWORD[((-128))+rsi]
+ add rax,r11
+
+ vmovdqu ymm12,YMMWORD[((-16+64-128))+rsi]
+ mov r11,rax
+ imul eax,r8d
+ and eax,0x1fffffff
+
+ imul rbx,QWORD[((8-128))+rsi]
+ add r12,rbx
+ vpmuludq ymm0,ymm0,ymm10
+ vmovd xmm11,eax
+ vmovdqu ymm13,YMMWORD[((-16+96-128))+rsi]
+ vpaddq ymm1,ymm1,ymm0
+ vpmuludq ymm12,ymm12,ymm10
+ vpbroadcastq ymm11,xmm11
+ vmovdqu ymm0,YMMWORD[((-16+128-128))+rsi]
+ vpaddq ymm2,ymm2,ymm12
+ vpmuludq ymm13,ymm13,ymm10
+ vmovdqu ymm12,YMMWORD[((-16+160-128))+rsi]
+ vpaddq ymm3,ymm3,ymm13
+ vpmuludq ymm0,ymm0,ymm10
+ vmovdqu ymm13,YMMWORD[((-16+192-128))+rsi]
+ vpaddq ymm4,ymm4,ymm0
+ vpmuludq ymm12,ymm12,ymm10
+ vmovdqu ymm0,YMMWORD[((-16+224-128))+rsi]
+ vpaddq ymm5,ymm5,ymm12
+ vpmuludq ymm13,ymm13,ymm10
+ vmovdqu ymm12,YMMWORD[((-16+256-128))+rsi]
+ vpaddq ymm6,ymm6,ymm13
+ vpmuludq ymm0,ymm0,ymm10
+ vmovdqu ymm13,YMMWORD[((-16+288-128))+rsi]
+ vpaddq ymm7,ymm7,ymm0
+ vpmuludq ymm12,ymm12,ymm10
+ vpaddq ymm8,ymm8,ymm12
+ vpmuludq ymm13,ymm13,ymm10
+ vpbroadcastq ymm10,QWORD[24+r13]
+ vpaddq ymm9,ymm9,ymm13
+
+ vmovdqu ymm0,YMMWORD[((-16+32-128))+rcx]
+ mov rdx,rax
+ imul rax,QWORD[((-128))+rcx]
+ add r11,rax
+ vmovdqu ymm12,YMMWORD[((-16+64-128))+rcx]
+ imul rdx,QWORD[((8-128))+rcx]
+ add r12,rdx
+ shr r11,29
+
+ vpmuludq ymm0,ymm0,ymm11
+ vmovq rbx,xmm10
+ vmovdqu ymm13,YMMWORD[((-16+96-128))+rcx]
+ vpaddq ymm1,ymm1,ymm0
+ vpmuludq ymm12,ymm12,ymm11
+ vmovdqu ymm0,YMMWORD[((-16+128-128))+rcx]
+ vpaddq ymm2,ymm2,ymm12
+ vpmuludq ymm13,ymm13,ymm11
+ vmovdqu ymm12,YMMWORD[((-16+160-128))+rcx]
+ vpaddq ymm3,ymm3,ymm13
+ vpmuludq ymm0,ymm0,ymm11
+ vmovdqu ymm13,YMMWORD[((-16+192-128))+rcx]
+ vpaddq ymm4,ymm4,ymm0
+ vpmuludq ymm12,ymm12,ymm11
+ vmovdqu ymm0,YMMWORD[((-16+224-128))+rcx]
+ vpaddq ymm5,ymm5,ymm12
+ vpmuludq ymm13,ymm13,ymm11
+ vmovdqu ymm12,YMMWORD[((-16+256-128))+rcx]
+ vpaddq ymm6,ymm6,ymm13
+ vpmuludq ymm0,ymm0,ymm11
+ vmovdqu ymm13,YMMWORD[((-16+288-128))+rcx]
+ vpaddq ymm7,ymm7,ymm0
+ vpmuludq ymm12,ymm12,ymm11
+ vmovdqu ymm0,YMMWORD[((-24+32-128))+rsi]
+ vpaddq ymm8,ymm8,ymm12
+ vpmuludq ymm13,ymm13,ymm11
+ vmovdqu ymm12,YMMWORD[((-24+64-128))+rsi]
+ vpaddq ymm9,ymm9,ymm13
+
+ add r12,r11
+ imul rbx,QWORD[((-128))+rsi]
+ add r12,rbx
+
+ mov rax,r12
+ imul eax,r8d
+ and eax,0x1fffffff
+
+ vpmuludq ymm0,ymm0,ymm10
+ vmovd xmm11,eax
+ vmovdqu ymm13,YMMWORD[((-24+96-128))+rsi]
+ vpaddq ymm1,ymm1,ymm0
+ vpmuludq ymm12,ymm12,ymm10
+ vpbroadcastq ymm11,xmm11
+ vmovdqu ymm0,YMMWORD[((-24+128-128))+rsi]
+ vpaddq ymm2,ymm2,ymm12
+ vpmuludq ymm13,ymm13,ymm10
+ vmovdqu ymm12,YMMWORD[((-24+160-128))+rsi]
+ vpaddq ymm3,ymm3,ymm13
+ vpmuludq ymm0,ymm0,ymm10
+ vmovdqu ymm13,YMMWORD[((-24+192-128))+rsi]
+ vpaddq ymm4,ymm4,ymm0
+ vpmuludq ymm12,ymm12,ymm10
+ vmovdqu ymm0,YMMWORD[((-24+224-128))+rsi]
+ vpaddq ymm5,ymm5,ymm12
+ vpmuludq ymm13,ymm13,ymm10
+ vmovdqu ymm12,YMMWORD[((-24+256-128))+rsi]
+ vpaddq ymm6,ymm6,ymm13
+ vpmuludq ymm0,ymm0,ymm10
+ vmovdqu ymm13,YMMWORD[((-24+288-128))+rsi]
+ vpaddq ymm7,ymm7,ymm0
+ vpmuludq ymm12,ymm12,ymm10
+ vpaddq ymm8,ymm8,ymm12
+ vpmuludq ymm13,ymm13,ymm10
+ vpbroadcastq ymm10,QWORD[32+r13]
+ vpaddq ymm9,ymm9,ymm13
+ add r13,32
+
+ vmovdqu ymm0,YMMWORD[((-24+32-128))+rcx]
+ imul rax,QWORD[((-128))+rcx]
+ add r12,rax
+ shr r12,29
+
+ vmovdqu ymm12,YMMWORD[((-24+64-128))+rcx]
+ vpmuludq ymm0,ymm0,ymm11
+ vmovq rbx,xmm10
+ vmovdqu ymm13,YMMWORD[((-24+96-128))+rcx]
+ vpaddq ymm0,ymm1,ymm0
+ vpmuludq ymm12,ymm12,ymm11
+ vmovdqu YMMWORD[rsp],ymm0
+ vpaddq ymm1,ymm2,ymm12
+ vmovdqu ymm0,YMMWORD[((-24+128-128))+rcx]
+ vpmuludq ymm13,ymm13,ymm11
+ vmovdqu ymm12,YMMWORD[((-24+160-128))+rcx]
+ vpaddq ymm2,ymm3,ymm13
+ vpmuludq ymm0,ymm0,ymm11
+ vmovdqu ymm13,YMMWORD[((-24+192-128))+rcx]
+ vpaddq ymm3,ymm4,ymm0
+ vpmuludq ymm12,ymm12,ymm11
+ vmovdqu ymm0,YMMWORD[((-24+224-128))+rcx]
+ vpaddq ymm4,ymm5,ymm12
+ vpmuludq ymm13,ymm13,ymm11
+ vmovdqu ymm12,YMMWORD[((-24+256-128))+rcx]
+ vpaddq ymm5,ymm6,ymm13
+ vpmuludq ymm0,ymm0,ymm11
+ vmovdqu ymm13,YMMWORD[((-24+288-128))+rcx]
+ mov r9,r12
+ vpaddq ymm6,ymm7,ymm0
+ vpmuludq ymm12,ymm12,ymm11
+ add r9,QWORD[rsp]
+ vpaddq ymm7,ymm8,ymm12
+ vpmuludq ymm13,ymm13,ymm11
+ vmovq xmm12,r12
+ vpaddq ymm8,ymm9,ymm13
+
+ dec r14d
+ jnz NEAR $L$oop_mul_1024
+ vpaddq ymm0,ymm12,YMMWORD[rsp]
+
+ vpsrlq ymm12,ymm0,29
+ vpand ymm0,ymm0,ymm15
+ vpsrlq ymm13,ymm1,29
+ vpand ymm1,ymm1,ymm15
+ vpsrlq ymm10,ymm2,29
+ vpermq ymm12,ymm12,0x93
+ vpand ymm2,ymm2,ymm15
+ vpsrlq ymm11,ymm3,29
+ vpermq ymm13,ymm13,0x93
+ vpand ymm3,ymm3,ymm15
+
+ vpblendd ymm9,ymm12,ymm14,3
+ vpermq ymm10,ymm10,0x93
+ vpblendd ymm12,ymm13,ymm12,3
+ vpermq ymm11,ymm11,0x93
+ vpaddq ymm0,ymm0,ymm9
+ vpblendd ymm13,ymm10,ymm13,3
+ vpaddq ymm1,ymm1,ymm12
+ vpblendd ymm10,ymm11,ymm10,3
+ vpaddq ymm2,ymm2,ymm13
+ vpblendd ymm11,ymm14,ymm11,3
+ vpaddq ymm3,ymm3,ymm10
+ vpaddq ymm4,ymm4,ymm11
+
+ vpsrlq ymm12,ymm0,29
+ vpand ymm0,ymm0,ymm15
+ vpsrlq ymm13,ymm1,29
+ vpand ymm1,ymm1,ymm15
+ vpsrlq ymm10,ymm2,29
+ vpermq ymm12,ymm12,0x93
+ vpand ymm2,ymm2,ymm15
+ vpsrlq ymm11,ymm3,29
+ vpermq ymm13,ymm13,0x93
+ vpand ymm3,ymm3,ymm15
+ vpermq ymm10,ymm10,0x93
+
+ vpblendd ymm9,ymm12,ymm14,3
+ vpermq ymm11,ymm11,0x93
+ vpblendd ymm12,ymm13,ymm12,3
+ vpaddq ymm0,ymm0,ymm9
+ vpblendd ymm13,ymm10,ymm13,3
+ vpaddq ymm1,ymm1,ymm12
+ vpblendd ymm10,ymm11,ymm10,3
+ vpaddq ymm2,ymm2,ymm13
+ vpblendd ymm11,ymm14,ymm11,3
+ vpaddq ymm3,ymm3,ymm10
+ vpaddq ymm4,ymm4,ymm11
+
+ vmovdqu YMMWORD[(0-128)+rdi],ymm0
+ vmovdqu YMMWORD[(32-128)+rdi],ymm1
+ vmovdqu YMMWORD[(64-128)+rdi],ymm2
+ vmovdqu YMMWORD[(96-128)+rdi],ymm3
+ vpsrlq ymm12,ymm4,29
+ vpand ymm4,ymm4,ymm15
+ vpsrlq ymm13,ymm5,29
+ vpand ymm5,ymm5,ymm15
+ vpsrlq ymm10,ymm6,29
+ vpermq ymm12,ymm12,0x93
+ vpand ymm6,ymm6,ymm15
+ vpsrlq ymm11,ymm7,29
+ vpermq ymm13,ymm13,0x93
+ vpand ymm7,ymm7,ymm15
+ vpsrlq ymm0,ymm8,29
+ vpermq ymm10,ymm10,0x93
+ vpand ymm8,ymm8,ymm15
+ vpermq ymm11,ymm11,0x93
+
+ vpblendd ymm9,ymm12,ymm14,3
+ vpermq ymm0,ymm0,0x93
+ vpblendd ymm12,ymm13,ymm12,3
+ vpaddq ymm4,ymm4,ymm9
+ vpblendd ymm13,ymm10,ymm13,3
+ vpaddq ymm5,ymm5,ymm12
+ vpblendd ymm10,ymm11,ymm10,3
+ vpaddq ymm6,ymm6,ymm13
+ vpblendd ymm11,ymm0,ymm11,3
+ vpaddq ymm7,ymm7,ymm10
+ vpaddq ymm8,ymm8,ymm11
+
+ vpsrlq ymm12,ymm4,29
+ vpand ymm4,ymm4,ymm15
+ vpsrlq ymm13,ymm5,29
+ vpand ymm5,ymm5,ymm15
+ vpsrlq ymm10,ymm6,29
+ vpermq ymm12,ymm12,0x93
+ vpand ymm6,ymm6,ymm15
+ vpsrlq ymm11,ymm7,29
+ vpermq ymm13,ymm13,0x93
+ vpand ymm7,ymm7,ymm15
+ vpsrlq ymm0,ymm8,29
+ vpermq ymm10,ymm10,0x93
+ vpand ymm8,ymm8,ymm15
+ vpermq ymm11,ymm11,0x93
+
+ vpblendd ymm9,ymm12,ymm14,3
+ vpermq ymm0,ymm0,0x93
+ vpblendd ymm12,ymm13,ymm12,3
+ vpaddq ymm4,ymm4,ymm9
+ vpblendd ymm13,ymm10,ymm13,3
+ vpaddq ymm5,ymm5,ymm12
+ vpblendd ymm10,ymm11,ymm10,3
+ vpaddq ymm6,ymm6,ymm13
+ vpblendd ymm11,ymm0,ymm11,3
+ vpaddq ymm7,ymm7,ymm10
+ vpaddq ymm8,ymm8,ymm11
+
+ vmovdqu YMMWORD[(128-128)+rdi],ymm4
+ vmovdqu YMMWORD[(160-128)+rdi],ymm5
+ vmovdqu YMMWORD[(192-128)+rdi],ymm6
+ vmovdqu YMMWORD[(224-128)+rdi],ymm7
+ vmovdqu YMMWORD[(256-128)+rdi],ymm8
+ vzeroupper
+
+ mov rax,rbp
+
+$L$mul_1024_in_tail:
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+ movaps xmm13,XMMWORD[((-104))+rax]
+ movaps xmm14,XMMWORD[((-88))+rax]
+ movaps xmm15,XMMWORD[((-72))+rax]
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$mul_1024_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_rsaz_1024_mul_avx2:
+global rsaz_1024_red2norm_avx2
+
+ALIGN 32
+rsaz_1024_red2norm_avx2:
+
+ sub rdx,-128
+ xor rax,rax
+ mov r8,QWORD[((-128))+rdx]
+ mov r9,QWORD[((-120))+rdx]
+ mov r10,QWORD[((-112))+rdx]
+ shl r8,0
+ shl r9,29
+ mov r11,r10
+ shl r10,58
+ shr r11,6
+ add rax,r8
+ add rax,r9
+ add rax,r10
+ adc r11,0
+ mov QWORD[rcx],rax
+ mov rax,r11
+ mov r8,QWORD[((-104))+rdx]
+ mov r9,QWORD[((-96))+rdx]
+ shl r8,23
+ mov r10,r9
+ shl r9,52
+ shr r10,12
+ add rax,r8
+ add rax,r9
+ adc r10,0
+ mov QWORD[8+rcx],rax
+ mov rax,r10
+ mov r11,QWORD[((-88))+rdx]
+ mov r8,QWORD[((-80))+rdx]
+ shl r11,17
+ mov r9,r8
+ shl r8,46
+ shr r9,18
+ add rax,r11
+ add rax,r8
+ adc r9,0
+ mov QWORD[16+rcx],rax
+ mov rax,r9
+ mov r10,QWORD[((-72))+rdx]
+ mov r11,QWORD[((-64))+rdx]
+ shl r10,11
+ mov r8,r11
+ shl r11,40
+ shr r8,24
+ add rax,r10
+ add rax,r11
+ adc r8,0
+ mov QWORD[24+rcx],rax
+ mov rax,r8
+ mov r9,QWORD[((-56))+rdx]
+ mov r10,QWORD[((-48))+rdx]
+ mov r11,QWORD[((-40))+rdx]
+ shl r9,5
+ shl r10,34
+ mov r8,r11
+ shl r11,63
+ shr r8,1
+ add rax,r9
+ add rax,r10
+ add rax,r11
+ adc r8,0
+ mov QWORD[32+rcx],rax
+ mov rax,r8
+ mov r9,QWORD[((-32))+rdx]
+ mov r10,QWORD[((-24))+rdx]
+ shl r9,28
+ mov r11,r10
+ shl r10,57
+ shr r11,7
+ add rax,r9
+ add rax,r10
+ adc r11,0
+ mov QWORD[40+rcx],rax
+ mov rax,r11
+ mov r8,QWORD[((-16))+rdx]
+ mov r9,QWORD[((-8))+rdx]
+ shl r8,22
+ mov r10,r9
+ shl r9,51
+ shr r10,13
+ add rax,r8
+ add rax,r9
+ adc r10,0
+ mov QWORD[48+rcx],rax
+ mov rax,r10
+ mov r11,QWORD[rdx]
+ mov r8,QWORD[8+rdx]
+ shl r11,16
+ mov r9,r8
+ shl r8,45
+ shr r9,19
+ add rax,r11
+ add rax,r8
+ adc r9,0
+ mov QWORD[56+rcx],rax
+ mov rax,r9
+ mov r10,QWORD[16+rdx]
+ mov r11,QWORD[24+rdx]
+ shl r10,10
+ mov r8,r11
+ shl r11,39
+ shr r8,25
+ add rax,r10
+ add rax,r11
+ adc r8,0
+ mov QWORD[64+rcx],rax
+ mov rax,r8
+ mov r9,QWORD[32+rdx]
+ mov r10,QWORD[40+rdx]
+ mov r11,QWORD[48+rdx]
+ shl r9,4
+ shl r10,33
+ mov r8,r11
+ shl r11,62
+ shr r8,2
+ add rax,r9
+ add rax,r10
+ add rax,r11
+ adc r8,0
+ mov QWORD[72+rcx],rax
+ mov rax,r8
+ mov r9,QWORD[56+rdx]
+ mov r10,QWORD[64+rdx]
+ shl r9,27
+ mov r11,r10
+ shl r10,56
+ shr r11,8
+ add rax,r9
+ add rax,r10
+ adc r11,0
+ mov QWORD[80+rcx],rax
+ mov rax,r11
+ mov r8,QWORD[72+rdx]
+ mov r9,QWORD[80+rdx]
+ shl r8,21
+ mov r10,r9
+ shl r9,50
+ shr r10,14
+ add rax,r8
+ add rax,r9
+ adc r10,0
+ mov QWORD[88+rcx],rax
+ mov rax,r10
+ mov r11,QWORD[88+rdx]
+ mov r8,QWORD[96+rdx]
+ shl r11,15
+ mov r9,r8
+ shl r8,44
+ shr r9,20
+ add rax,r11
+ add rax,r8
+ adc r9,0
+ mov QWORD[96+rcx],rax
+ mov rax,r9
+ mov r10,QWORD[104+rdx]
+ mov r11,QWORD[112+rdx]
+ shl r10,9
+ mov r8,r11
+ shl r11,38
+ shr r8,26
+ add rax,r10
+ add rax,r11
+ adc r8,0
+ mov QWORD[104+rcx],rax
+ mov rax,r8
+ mov r9,QWORD[120+rdx]
+ mov r10,QWORD[128+rdx]
+ mov r11,QWORD[136+rdx]
+ shl r9,3
+ shl r10,32
+ mov r8,r11
+ shl r11,61
+ shr r8,3
+ add rax,r9
+ add rax,r10
+ add rax,r11
+ adc r8,0
+ mov QWORD[112+rcx],rax
+ mov rax,r8
+ mov r9,QWORD[144+rdx]
+ mov r10,QWORD[152+rdx]
+ shl r9,26
+ mov r11,r10
+ shl r10,55
+ shr r11,9
+ add rax,r9
+ add rax,r10
+ adc r11,0
+ mov QWORD[120+rcx],rax
+ mov rax,r11
+ DB 0F3h,0C3h ;repret
+
+
+
+global rsaz_1024_norm2red_avx2
+
+ALIGN 32
+rsaz_1024_norm2red_avx2:
+
+ sub rcx,-128
+ mov r8,QWORD[rdx]
+ mov eax,0x1fffffff
+ mov r9,QWORD[8+rdx]
+ mov r11,r8
+ shr r11,0
+ and r11,rax
+ mov QWORD[((-128))+rcx],r11
+ mov r10,r8
+ shr r10,29
+ and r10,rax
+ mov QWORD[((-120))+rcx],r10
+ shrd r8,r9,58
+ and r8,rax
+ mov QWORD[((-112))+rcx],r8
+ mov r10,QWORD[16+rdx]
+ mov r8,r9
+ shr r8,23
+ and r8,rax
+ mov QWORD[((-104))+rcx],r8
+ shrd r9,r10,52
+ and r9,rax
+ mov QWORD[((-96))+rcx],r9
+ mov r11,QWORD[24+rdx]
+ mov r9,r10
+ shr r9,17
+ and r9,rax
+ mov QWORD[((-88))+rcx],r9
+ shrd r10,r11,46
+ and r10,rax
+ mov QWORD[((-80))+rcx],r10
+ mov r8,QWORD[32+rdx]
+ mov r10,r11
+ shr r10,11
+ and r10,rax
+ mov QWORD[((-72))+rcx],r10
+ shrd r11,r8,40
+ and r11,rax
+ mov QWORD[((-64))+rcx],r11
+ mov r9,QWORD[40+rdx]
+ mov r11,r8
+ shr r11,5
+ and r11,rax
+ mov QWORD[((-56))+rcx],r11
+ mov r10,r8
+ shr r10,34
+ and r10,rax
+ mov QWORD[((-48))+rcx],r10
+ shrd r8,r9,63
+ and r8,rax
+ mov QWORD[((-40))+rcx],r8
+ mov r10,QWORD[48+rdx]
+ mov r8,r9
+ shr r8,28
+ and r8,rax
+ mov QWORD[((-32))+rcx],r8
+ shrd r9,r10,57
+ and r9,rax
+ mov QWORD[((-24))+rcx],r9
+ mov r11,QWORD[56+rdx]
+ mov r9,r10
+ shr r9,22
+ and r9,rax
+ mov QWORD[((-16))+rcx],r9
+ shrd r10,r11,51
+ and r10,rax
+ mov QWORD[((-8))+rcx],r10
+ mov r8,QWORD[64+rdx]
+ mov r10,r11
+ shr r10,16
+ and r10,rax
+ mov QWORD[rcx],r10
+ shrd r11,r8,45
+ and r11,rax
+ mov QWORD[8+rcx],r11
+ mov r9,QWORD[72+rdx]
+ mov r11,r8
+ shr r11,10
+ and r11,rax
+ mov QWORD[16+rcx],r11
+ shrd r8,r9,39
+ and r8,rax
+ mov QWORD[24+rcx],r8
+ mov r10,QWORD[80+rdx]
+ mov r8,r9
+ shr r8,4
+ and r8,rax
+ mov QWORD[32+rcx],r8
+ mov r11,r9
+ shr r11,33
+ and r11,rax
+ mov QWORD[40+rcx],r11
+ shrd r9,r10,62
+ and r9,rax
+ mov QWORD[48+rcx],r9
+ mov r11,QWORD[88+rdx]
+ mov r9,r10
+ shr r9,27
+ and r9,rax
+ mov QWORD[56+rcx],r9
+ shrd r10,r11,56
+ and r10,rax
+ mov QWORD[64+rcx],r10
+ mov r8,QWORD[96+rdx]
+ mov r10,r11
+ shr r10,21
+ and r10,rax
+ mov QWORD[72+rcx],r10
+ shrd r11,r8,50
+ and r11,rax
+ mov QWORD[80+rcx],r11
+ mov r9,QWORD[104+rdx]
+ mov r11,r8
+ shr r11,15
+ and r11,rax
+ mov QWORD[88+rcx],r11
+ shrd r8,r9,44
+ and r8,rax
+ mov QWORD[96+rcx],r8
+ mov r10,QWORD[112+rdx]
+ mov r8,r9
+ shr r8,9
+ and r8,rax
+ mov QWORD[104+rcx],r8
+ shrd r9,r10,38
+ and r9,rax
+ mov QWORD[112+rcx],r9
+ mov r11,QWORD[120+rdx]
+ mov r9,r10
+ shr r9,3
+ and r9,rax
+ mov QWORD[120+rcx],r9
+ mov r8,r10
+ shr r8,32
+ and r8,rax
+ mov QWORD[128+rcx],r8
+ shrd r10,r11,61
+ and r10,rax
+ mov QWORD[136+rcx],r10
+ xor r8,r8
+ mov r10,r11
+ shr r10,26
+ and r10,rax
+ mov QWORD[144+rcx],r10
+ shrd r11,r8,55
+ and r11,rax
+ mov QWORD[152+rcx],r11
+ mov QWORD[160+rcx],r8
+ mov QWORD[168+rcx],r8
+ mov QWORD[176+rcx],r8
+ mov QWORD[184+rcx],r8
+ DB 0F3h,0C3h ;repret
+
+
+global rsaz_1024_scatter5_avx2
+
+ALIGN 32
+rsaz_1024_scatter5_avx2:
+
+ vzeroupper
+ vmovdqu ymm5,YMMWORD[$L$scatter_permd]
+ shl r8d,4
+ lea rcx,[r8*1+rcx]
+ mov eax,9
+ jmp NEAR $L$oop_scatter_1024
+
+ALIGN 32
+$L$oop_scatter_1024:
+ vmovdqu ymm0,YMMWORD[rdx]
+ lea rdx,[32+rdx]
+ vpermd ymm0,ymm5,ymm0
+ vmovdqu XMMWORD[rcx],xmm0
+ lea rcx,[512+rcx]
+ dec eax
+ jnz NEAR $L$oop_scatter_1024
+
+ vzeroupper
+ DB 0F3h,0C3h ;repret
+
+
+
+global rsaz_1024_gather5_avx2
+
+ALIGN 32
+rsaz_1024_gather5_avx2:
+
+ vzeroupper
+ mov r11,rsp
+
+ lea rax,[((-136))+rsp]
+$L$SEH_begin_rsaz_1024_gather5:
+
+DB 0x48,0x8d,0x60,0xe0
+DB 0xc5,0xf8,0x29,0x70,0xe0
+DB 0xc5,0xf8,0x29,0x78,0xf0
+DB 0xc5,0x78,0x29,0x40,0x00
+DB 0xc5,0x78,0x29,0x48,0x10
+DB 0xc5,0x78,0x29,0x50,0x20
+DB 0xc5,0x78,0x29,0x58,0x30
+DB 0xc5,0x78,0x29,0x60,0x40
+DB 0xc5,0x78,0x29,0x68,0x50
+DB 0xc5,0x78,0x29,0x70,0x60
+DB 0xc5,0x78,0x29,0x78,0x70
+ lea rsp,[((-256))+rsp]
+ and rsp,-32
+ lea r10,[$L$inc]
+ lea rax,[((-128))+rsp]
+
+ vmovd xmm4,r8d
+ vmovdqa ymm0,YMMWORD[r10]
+ vmovdqa ymm1,YMMWORD[32+r10]
+ vmovdqa ymm5,YMMWORD[64+r10]
+ vpbroadcastd ymm4,xmm4
+
+ vpaddd ymm2,ymm0,ymm5
+ vpcmpeqd ymm0,ymm0,ymm4
+ vpaddd ymm3,ymm1,ymm5
+ vpcmpeqd ymm1,ymm1,ymm4
+ vmovdqa YMMWORD[(0+128)+rax],ymm0
+ vpaddd ymm0,ymm2,ymm5
+ vpcmpeqd ymm2,ymm2,ymm4
+ vmovdqa YMMWORD[(32+128)+rax],ymm1
+ vpaddd ymm1,ymm3,ymm5
+ vpcmpeqd ymm3,ymm3,ymm4
+ vmovdqa YMMWORD[(64+128)+rax],ymm2
+ vpaddd ymm2,ymm0,ymm5
+ vpcmpeqd ymm0,ymm0,ymm4
+ vmovdqa YMMWORD[(96+128)+rax],ymm3
+ vpaddd ymm3,ymm1,ymm5
+ vpcmpeqd ymm1,ymm1,ymm4
+ vmovdqa YMMWORD[(128+128)+rax],ymm0
+ vpaddd ymm8,ymm2,ymm5
+ vpcmpeqd ymm2,ymm2,ymm4
+ vmovdqa YMMWORD[(160+128)+rax],ymm1
+ vpaddd ymm9,ymm3,ymm5
+ vpcmpeqd ymm3,ymm3,ymm4
+ vmovdqa YMMWORD[(192+128)+rax],ymm2
+ vpaddd ymm10,ymm8,ymm5
+ vpcmpeqd ymm8,ymm8,ymm4
+ vmovdqa YMMWORD[(224+128)+rax],ymm3
+ vpaddd ymm11,ymm9,ymm5
+ vpcmpeqd ymm9,ymm9,ymm4
+ vpaddd ymm12,ymm10,ymm5
+ vpcmpeqd ymm10,ymm10,ymm4
+ vpaddd ymm13,ymm11,ymm5
+ vpcmpeqd ymm11,ymm11,ymm4
+ vpaddd ymm14,ymm12,ymm5
+ vpcmpeqd ymm12,ymm12,ymm4
+ vpaddd ymm15,ymm13,ymm5
+ vpcmpeqd ymm13,ymm13,ymm4
+ vpcmpeqd ymm14,ymm14,ymm4
+ vpcmpeqd ymm15,ymm15,ymm4
+
+ vmovdqa ymm7,YMMWORD[((-32))+r10]
+ lea rdx,[128+rdx]
+ mov r8d,9
+
+$L$oop_gather_1024:
+ vmovdqa ymm0,YMMWORD[((0-128))+rdx]
+ vmovdqa ymm1,YMMWORD[((32-128))+rdx]
+ vmovdqa ymm2,YMMWORD[((64-128))+rdx]
+ vmovdqa ymm3,YMMWORD[((96-128))+rdx]
+ vpand ymm0,ymm0,YMMWORD[((0+128))+rax]
+ vpand ymm1,ymm1,YMMWORD[((32+128))+rax]
+ vpand ymm2,ymm2,YMMWORD[((64+128))+rax]
+ vpor ymm4,ymm1,ymm0
+ vpand ymm3,ymm3,YMMWORD[((96+128))+rax]
+ vmovdqa ymm0,YMMWORD[((128-128))+rdx]
+ vmovdqa ymm1,YMMWORD[((160-128))+rdx]
+ vpor ymm5,ymm3,ymm2
+ vmovdqa ymm2,YMMWORD[((192-128))+rdx]
+ vmovdqa ymm3,YMMWORD[((224-128))+rdx]
+ vpand ymm0,ymm0,YMMWORD[((128+128))+rax]
+ vpand ymm1,ymm1,YMMWORD[((160+128))+rax]
+ vpand ymm2,ymm2,YMMWORD[((192+128))+rax]
+ vpor ymm4,ymm4,ymm0
+ vpand ymm3,ymm3,YMMWORD[((224+128))+rax]
+ vpand ymm0,ymm8,YMMWORD[((256-128))+rdx]
+ vpor ymm5,ymm5,ymm1
+ vpand ymm1,ymm9,YMMWORD[((288-128))+rdx]
+ vpor ymm4,ymm4,ymm2
+ vpand ymm2,ymm10,YMMWORD[((320-128))+rdx]
+ vpor ymm5,ymm5,ymm3
+ vpand ymm3,ymm11,YMMWORD[((352-128))+rdx]
+ vpor ymm4,ymm4,ymm0
+ vpand ymm0,ymm12,YMMWORD[((384-128))+rdx]
+ vpor ymm5,ymm5,ymm1
+ vpand ymm1,ymm13,YMMWORD[((416-128))+rdx]
+ vpor ymm4,ymm4,ymm2
+ vpand ymm2,ymm14,YMMWORD[((448-128))+rdx]
+ vpor ymm5,ymm5,ymm3
+ vpand ymm3,ymm15,YMMWORD[((480-128))+rdx]
+ lea rdx,[512+rdx]
+ vpor ymm4,ymm4,ymm0
+ vpor ymm5,ymm5,ymm1
+ vpor ymm4,ymm4,ymm2
+ vpor ymm5,ymm5,ymm3
+
+ vpor ymm4,ymm4,ymm5
+ vextracti128 xmm5,ymm4,1
+ vpor xmm5,xmm5,xmm4
+ vpermd ymm5,ymm7,ymm5
+ vmovdqu YMMWORD[rcx],ymm5
+ lea rcx,[32+rcx]
+ dec r8d
+ jnz NEAR $L$oop_gather_1024
+
+ vpxor ymm0,ymm0,ymm0
+ vmovdqu YMMWORD[rcx],ymm0
+ vzeroupper
+ movaps xmm6,XMMWORD[((-168))+r11]
+ movaps xmm7,XMMWORD[((-152))+r11]
+ movaps xmm8,XMMWORD[((-136))+r11]
+ movaps xmm9,XMMWORD[((-120))+r11]
+ movaps xmm10,XMMWORD[((-104))+r11]
+ movaps xmm11,XMMWORD[((-88))+r11]
+ movaps xmm12,XMMWORD[((-72))+r11]
+ movaps xmm13,XMMWORD[((-56))+r11]
+ movaps xmm14,XMMWORD[((-40))+r11]
+ movaps xmm15,XMMWORD[((-24))+r11]
+ lea rsp,[r11]
+
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_rsaz_1024_gather5:
+
+EXTERN OPENSSL_ia32cap_P
+global rsaz_avx2_eligible
+
+ALIGN 32
+rsaz_avx2_eligible:
+ mov eax,DWORD[((OPENSSL_ia32cap_P+8))]
+ mov ecx,524544
+ mov edx,0
+ and ecx,eax
+ cmp ecx,524544
+ cmove eax,edx
+ and eax,32
+ shr eax,5
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 64
+$L$and_mask:
+ DQ 0x1fffffff,0x1fffffff,0x1fffffff,0x1fffffff
+$L$scatter_permd:
+ DD 0,2,4,6,7,7,7,7
+$L$gather_permd:
+ DD 0,7,1,7,2,7,3,7
+$L$inc:
+ DD 0,0,0,0,1,1,1,1
+ DD 2,2,2,2,3,3,3,3
+ DD 4,4,4,4,4,4,4,4
+ALIGN 64
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+rsaz_se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ mov rbp,QWORD[160+r8]
+
+ mov r10d,DWORD[8+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ cmovc rax,rbp
+
+ mov r15,QWORD[((-48))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov rbx,QWORD[((-8))+rax]
+ mov QWORD[240+r8],r15
+ mov QWORD[232+r8],r14
+ mov QWORD[224+r8],r13
+ mov QWORD[216+r8],r12
+ mov QWORD[160+r8],rbp
+ mov QWORD[144+r8],rbx
+
+ lea rsi,[((-216))+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+$L$common_seh_tail:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_rsaz_1024_sqr_avx2 wrt ..imagebase
+ DD $L$SEH_end_rsaz_1024_sqr_avx2 wrt ..imagebase
+ DD $L$SEH_info_rsaz_1024_sqr_avx2 wrt ..imagebase
+
+ DD $L$SEH_begin_rsaz_1024_mul_avx2 wrt ..imagebase
+ DD $L$SEH_end_rsaz_1024_mul_avx2 wrt ..imagebase
+ DD $L$SEH_info_rsaz_1024_mul_avx2 wrt ..imagebase
+
+ DD $L$SEH_begin_rsaz_1024_gather5 wrt ..imagebase
+ DD $L$SEH_end_rsaz_1024_gather5 wrt ..imagebase
+ DD $L$SEH_info_rsaz_1024_gather5 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_rsaz_1024_sqr_avx2:
+DB 9,0,0,0
+ DD rsaz_se_handler wrt ..imagebase
+ DD $L$sqr_1024_body wrt ..imagebase,$L$sqr_1024_epilogue wrt ..imagebase,$L$sqr_1024_in_tail wrt ..imagebase
+ DD 0
+$L$SEH_info_rsaz_1024_mul_avx2:
+DB 9,0,0,0
+ DD rsaz_se_handler wrt ..imagebase
+ DD $L$mul_1024_body wrt ..imagebase,$L$mul_1024_epilogue wrt ..imagebase,$L$mul_1024_in_tail wrt ..imagebase
+ DD 0
+$L$SEH_info_rsaz_1024_gather5:
+DB 0x01,0x36,0x17,0x0b
+DB 0x36,0xf8,0x09,0x00
+DB 0x31,0xe8,0x08,0x00
+DB 0x2c,0xd8,0x07,0x00
+DB 0x27,0xc8,0x06,0x00
+DB 0x22,0xb8,0x05,0x00
+DB 0x1d,0xa8,0x04,0x00
+DB 0x18,0x98,0x03,0x00
+DB 0x13,0x88,0x02,0x00
+DB 0x0e,0x78,0x01,0x00
+DB 0x09,0x68,0x00,0x00
+DB 0x04,0x01,0x15,0x00
+DB 0x00,0xb3,0x00,0x00
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm
new file mode 100644
index 0000000000..eb4958e903
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm
@@ -0,0 +1,2242 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+; Copyright (c) 2012, Intel Corporation. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN OPENSSL_ia32cap_P
+
+global rsaz_512_sqr
+
+ALIGN 32
+rsaz_512_sqr:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_rsaz_512_sqr:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ sub rsp,128+24
+
+$L$sqr_body:
+ mov rbp,rdx
+ mov rdx,QWORD[rsi]
+ mov rax,QWORD[8+rsi]
+ mov QWORD[128+rsp],rcx
+ mov r11d,0x80100
+ and r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+ cmp r11d,0x80100
+ je NEAR $L$oop_sqrx
+ jmp NEAR $L$oop_sqr
+
+ALIGN 32
+$L$oop_sqr:
+ mov DWORD[((128+8))+rsp],r8d
+
+ mov rbx,rdx
+ mul rdx
+ mov r8,rax
+ mov rax,QWORD[16+rsi]
+ mov r9,rdx
+
+ mul rbx
+ add r9,rax
+ mov rax,QWORD[24+rsi]
+ mov r10,rdx
+ adc r10,0
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[32+rsi]
+ mov r11,rdx
+ adc r11,0
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[40+rsi]
+ mov r12,rdx
+ adc r12,0
+
+ mul rbx
+ add r12,rax
+ mov rax,QWORD[48+rsi]
+ mov r13,rdx
+ adc r13,0
+
+ mul rbx
+ add r13,rax
+ mov rax,QWORD[56+rsi]
+ mov r14,rdx
+ adc r14,0
+
+ mul rbx
+ add r14,rax
+ mov rax,rbx
+ mov r15,rdx
+ adc r15,0
+
+ add r8,r8
+ mov rcx,r9
+ adc r9,r9
+
+ mul rax
+ mov QWORD[rsp],rax
+ add r8,rdx
+ adc r9,0
+
+ mov QWORD[8+rsp],r8
+ shr rcx,63
+
+
+ mov r8,QWORD[8+rsi]
+ mov rax,QWORD[16+rsi]
+ mul r8
+ add r10,rax
+ mov rax,QWORD[24+rsi]
+ mov rbx,rdx
+ adc rbx,0
+
+ mul r8
+ add r11,rax
+ mov rax,QWORD[32+rsi]
+ adc rdx,0
+ add r11,rbx
+ mov rbx,rdx
+ adc rbx,0
+
+ mul r8
+ add r12,rax
+ mov rax,QWORD[40+rsi]
+ adc rdx,0
+ add r12,rbx
+ mov rbx,rdx
+ adc rbx,0
+
+ mul r8
+ add r13,rax
+ mov rax,QWORD[48+rsi]
+ adc rdx,0
+ add r13,rbx
+ mov rbx,rdx
+ adc rbx,0
+
+ mul r8
+ add r14,rax
+ mov rax,QWORD[56+rsi]
+ adc rdx,0
+ add r14,rbx
+ mov rbx,rdx
+ adc rbx,0
+
+ mul r8
+ add r15,rax
+ mov rax,r8
+ adc rdx,0
+ add r15,rbx
+ mov r8,rdx
+ mov rdx,r10
+ adc r8,0
+
+ add rdx,rdx
+ lea r10,[r10*2+rcx]
+ mov rbx,r11
+ adc r11,r11
+
+ mul rax
+ add r9,rax
+ adc r10,rdx
+ adc r11,0
+
+ mov QWORD[16+rsp],r9
+ mov QWORD[24+rsp],r10
+ shr rbx,63
+
+
+ mov r9,QWORD[16+rsi]
+ mov rax,QWORD[24+rsi]
+ mul r9
+ add r12,rax
+ mov rax,QWORD[32+rsi]
+ mov rcx,rdx
+ adc rcx,0
+
+ mul r9
+ add r13,rax
+ mov rax,QWORD[40+rsi]
+ adc rdx,0
+ add r13,rcx
+ mov rcx,rdx
+ adc rcx,0
+
+ mul r9
+ add r14,rax
+ mov rax,QWORD[48+rsi]
+ adc rdx,0
+ add r14,rcx
+ mov rcx,rdx
+ adc rcx,0
+
+ mul r9
+ mov r10,r12
+ lea r12,[r12*2+rbx]
+ add r15,rax
+ mov rax,QWORD[56+rsi]
+ adc rdx,0
+ add r15,rcx
+ mov rcx,rdx
+ adc rcx,0
+
+ mul r9
+ shr r10,63
+ add r8,rax
+ mov rax,r9
+ adc rdx,0
+ add r8,rcx
+ mov r9,rdx
+ adc r9,0
+
+ mov rcx,r13
+ lea r13,[r13*2+r10]
+
+ mul rax
+ add r11,rax
+ adc r12,rdx
+ adc r13,0
+
+ mov QWORD[32+rsp],r11
+ mov QWORD[40+rsp],r12
+ shr rcx,63
+
+
+ mov r10,QWORD[24+rsi]
+ mov rax,QWORD[32+rsi]
+ mul r10
+ add r14,rax
+ mov rax,QWORD[40+rsi]
+ mov rbx,rdx
+ adc rbx,0
+
+ mul r10
+ add r15,rax
+ mov rax,QWORD[48+rsi]
+ adc rdx,0
+ add r15,rbx
+ mov rbx,rdx
+ adc rbx,0
+
+ mul r10
+ mov r12,r14
+ lea r14,[r14*2+rcx]
+ add r8,rax
+ mov rax,QWORD[56+rsi]
+ adc rdx,0
+ add r8,rbx
+ mov rbx,rdx
+ adc rbx,0
+
+ mul r10
+ shr r12,63
+ add r9,rax
+ mov rax,r10
+ adc rdx,0
+ add r9,rbx
+ mov r10,rdx
+ adc r10,0
+
+ mov rbx,r15
+ lea r15,[r15*2+r12]
+
+ mul rax
+ add r13,rax
+ adc r14,rdx
+ adc r15,0
+
+ mov QWORD[48+rsp],r13
+ mov QWORD[56+rsp],r14
+ shr rbx,63
+
+
+ mov r11,QWORD[32+rsi]
+ mov rax,QWORD[40+rsi]
+ mul r11
+ add r8,rax
+ mov rax,QWORD[48+rsi]
+ mov rcx,rdx
+ adc rcx,0
+
+ mul r11
+ add r9,rax
+ mov rax,QWORD[56+rsi]
+ adc rdx,0
+ mov r12,r8
+ lea r8,[r8*2+rbx]
+ add r9,rcx
+ mov rcx,rdx
+ adc rcx,0
+
+ mul r11
+ shr r12,63
+ add r10,rax
+ mov rax,r11
+ adc rdx,0
+ add r10,rcx
+ mov r11,rdx
+ adc r11,0
+
+ mov rcx,r9
+ lea r9,[r9*2+r12]
+
+ mul rax
+ add r15,rax
+ adc r8,rdx
+ adc r9,0
+
+ mov QWORD[64+rsp],r15
+ mov QWORD[72+rsp],r8
+ shr rcx,63
+
+
+ mov r12,QWORD[40+rsi]
+ mov rax,QWORD[48+rsi]
+ mul r12
+ add r10,rax
+ mov rax,QWORD[56+rsi]
+ mov rbx,rdx
+ adc rbx,0
+
+ mul r12
+ add r11,rax
+ mov rax,r12
+ mov r15,r10
+ lea r10,[r10*2+rcx]
+ adc rdx,0
+ shr r15,63
+ add r11,rbx
+ mov r12,rdx
+ adc r12,0
+
+ mov rbx,r11
+ lea r11,[r11*2+r15]
+
+ mul rax
+ add r9,rax
+ adc r10,rdx
+ adc r11,0
+
+ mov QWORD[80+rsp],r9
+ mov QWORD[88+rsp],r10
+
+
+ mov r13,QWORD[48+rsi]
+ mov rax,QWORD[56+rsi]
+ mul r13
+ add r12,rax
+ mov rax,r13
+ mov r13,rdx
+ adc r13,0
+
+ xor r14,r14
+ shl rbx,1
+ adc r12,r12
+ adc r13,r13
+ adc r14,r14
+
+ mul rax
+ add r11,rax
+ adc r12,rdx
+ adc r13,0
+
+ mov QWORD[96+rsp],r11
+ mov QWORD[104+rsp],r12
+
+
+ mov rax,QWORD[56+rsi]
+ mul rax
+ add r13,rax
+ adc rdx,0
+
+ add r14,rdx
+
+ mov QWORD[112+rsp],r13
+ mov QWORD[120+rsp],r14
+
+ mov r8,QWORD[rsp]
+ mov r9,QWORD[8+rsp]
+ mov r10,QWORD[16+rsp]
+ mov r11,QWORD[24+rsp]
+ mov r12,QWORD[32+rsp]
+ mov r13,QWORD[40+rsp]
+ mov r14,QWORD[48+rsp]
+ mov r15,QWORD[56+rsp]
+
+ call __rsaz_512_reduce
+
+ add r8,QWORD[64+rsp]
+ adc r9,QWORD[72+rsp]
+ adc r10,QWORD[80+rsp]
+ adc r11,QWORD[88+rsp]
+ adc r12,QWORD[96+rsp]
+ adc r13,QWORD[104+rsp]
+ adc r14,QWORD[112+rsp]
+ adc r15,QWORD[120+rsp]
+ sbb rcx,rcx
+
+ call __rsaz_512_subtract
+
+ mov rdx,r8
+ mov rax,r9
+ mov r8d,DWORD[((128+8))+rsp]
+ mov rsi,rdi
+
+ dec r8d
+ jnz NEAR $L$oop_sqr
+ jmp NEAR $L$sqr_tail
+
+ALIGN 32
+$L$oop_sqrx:
+ mov DWORD[((128+8))+rsp],r8d
+DB 102,72,15,110,199
+DB 102,72,15,110,205
+
+ mulx r9,r8,rax
+
+ mulx r10,rcx,QWORD[16+rsi]
+ xor rbp,rbp
+
+ mulx r11,rax,QWORD[24+rsi]
+ adcx r9,rcx
+
+ mulx r12,rcx,QWORD[32+rsi]
+ adcx r10,rax
+
+ mulx r13,rax,QWORD[40+rsi]
+ adcx r11,rcx
+
+DB 0xc4,0x62,0xf3,0xf6,0xb6,0x30,0x00,0x00,0x00
+ adcx r12,rax
+ adcx r13,rcx
+
+DB 0xc4,0x62,0xfb,0xf6,0xbe,0x38,0x00,0x00,0x00
+ adcx r14,rax
+ adcx r15,rbp
+
+ mov rcx,r9
+ shld r9,r8,1
+ shl r8,1
+
+ xor ebp,ebp
+ mulx rdx,rax,rdx
+ adcx r8,rdx
+ mov rdx,QWORD[8+rsi]
+ adcx r9,rbp
+
+ mov QWORD[rsp],rax
+ mov QWORD[8+rsp],r8
+
+
+ mulx rbx,rax,QWORD[16+rsi]
+ adox r10,rax
+ adcx r11,rbx
+
+DB 0xc4,0x62,0xc3,0xf6,0x86,0x18,0x00,0x00,0x00
+ adox r11,rdi
+ adcx r12,r8
+
+ mulx rbx,rax,QWORD[32+rsi]
+ adox r12,rax
+ adcx r13,rbx
+
+ mulx r8,rdi,QWORD[40+rsi]
+ adox r13,rdi
+ adcx r14,r8
+
+DB 0xc4,0xe2,0xfb,0xf6,0x9e,0x30,0x00,0x00,0x00
+ adox r14,rax
+ adcx r15,rbx
+
+DB 0xc4,0x62,0xc3,0xf6,0x86,0x38,0x00,0x00,0x00
+ adox r15,rdi
+ adcx r8,rbp
+ adox r8,rbp
+
+ mov rbx,r11
+ shld r11,r10,1
+ shld r10,rcx,1
+
+ xor ebp,ebp
+ mulx rcx,rax,rdx
+ mov rdx,QWORD[16+rsi]
+ adcx r9,rax
+ adcx r10,rcx
+ adcx r11,rbp
+
+ mov QWORD[16+rsp],r9
+DB 0x4c,0x89,0x94,0x24,0x18,0x00,0x00,0x00
+
+
+DB 0xc4,0x62,0xc3,0xf6,0x8e,0x18,0x00,0x00,0x00
+ adox r12,rdi
+ adcx r13,r9
+
+ mulx rcx,rax,QWORD[32+rsi]
+ adox r13,rax
+ adcx r14,rcx
+
+ mulx r9,rdi,QWORD[40+rsi]
+ adox r14,rdi
+ adcx r15,r9
+
+DB 0xc4,0xe2,0xfb,0xf6,0x8e,0x30,0x00,0x00,0x00
+ adox r15,rax
+ adcx r8,rcx
+
+DB 0xc4,0x62,0xc3,0xf6,0x8e,0x38,0x00,0x00,0x00
+ adox r8,rdi
+ adcx r9,rbp
+ adox r9,rbp
+
+ mov rcx,r13
+ shld r13,r12,1
+ shld r12,rbx,1
+
+ xor ebp,ebp
+ mulx rdx,rax,rdx
+ adcx r11,rax
+ adcx r12,rdx
+ mov rdx,QWORD[24+rsi]
+ adcx r13,rbp
+
+ mov QWORD[32+rsp],r11
+DB 0x4c,0x89,0xa4,0x24,0x28,0x00,0x00,0x00
+
+
+DB 0xc4,0xe2,0xfb,0xf6,0x9e,0x20,0x00,0x00,0x00
+ adox r14,rax
+ adcx r15,rbx
+
+ mulx r10,rdi,QWORD[40+rsi]
+ adox r15,rdi
+ adcx r8,r10
+
+ mulx rbx,rax,QWORD[48+rsi]
+ adox r8,rax
+ adcx r9,rbx
+
+ mulx r10,rdi,QWORD[56+rsi]
+ adox r9,rdi
+ adcx r10,rbp
+ adox r10,rbp
+
+DB 0x66
+ mov rbx,r15
+ shld r15,r14,1
+ shld r14,rcx,1
+
+ xor ebp,ebp
+ mulx rdx,rax,rdx
+ adcx r13,rax
+ adcx r14,rdx
+ mov rdx,QWORD[32+rsi]
+ adcx r15,rbp
+
+ mov QWORD[48+rsp],r13
+ mov QWORD[56+rsp],r14
+
+
+DB 0xc4,0x62,0xc3,0xf6,0x9e,0x28,0x00,0x00,0x00
+ adox r8,rdi
+ adcx r9,r11
+
+ mulx rcx,rax,QWORD[48+rsi]
+ adox r9,rax
+ adcx r10,rcx
+
+ mulx r11,rdi,QWORD[56+rsi]
+ adox r10,rdi
+ adcx r11,rbp
+ adox r11,rbp
+
+ mov rcx,r9
+ shld r9,r8,1
+ shld r8,rbx,1
+
+ xor ebp,ebp
+ mulx rdx,rax,rdx
+ adcx r15,rax
+ adcx r8,rdx
+ mov rdx,QWORD[40+rsi]
+ adcx r9,rbp
+
+ mov QWORD[64+rsp],r15
+ mov QWORD[72+rsp],r8
+
+
+DB 0xc4,0xe2,0xfb,0xf6,0x9e,0x30,0x00,0x00,0x00
+ adox r10,rax
+ adcx r11,rbx
+
+DB 0xc4,0x62,0xc3,0xf6,0xa6,0x38,0x00,0x00,0x00
+ adox r11,rdi
+ adcx r12,rbp
+ adox r12,rbp
+
+ mov rbx,r11
+ shld r11,r10,1
+ shld r10,rcx,1
+
+ xor ebp,ebp
+ mulx rdx,rax,rdx
+ adcx r9,rax
+ adcx r10,rdx
+ mov rdx,QWORD[48+rsi]
+ adcx r11,rbp
+
+ mov QWORD[80+rsp],r9
+ mov QWORD[88+rsp],r10
+
+
+DB 0xc4,0x62,0xfb,0xf6,0xae,0x38,0x00,0x00,0x00
+ adox r12,rax
+ adox r13,rbp
+
+ xor r14,r14
+ shld r14,r13,1
+ shld r13,r12,1
+ shld r12,rbx,1
+
+ xor ebp,ebp
+ mulx rdx,rax,rdx
+ adcx r11,rax
+ adcx r12,rdx
+ mov rdx,QWORD[56+rsi]
+ adcx r13,rbp
+
+DB 0x4c,0x89,0x9c,0x24,0x60,0x00,0x00,0x00
+DB 0x4c,0x89,0xa4,0x24,0x68,0x00,0x00,0x00
+
+
+ mulx rdx,rax,rdx
+ adox r13,rax
+ adox rdx,rbp
+
+DB 0x66
+ add r14,rdx
+
+ mov QWORD[112+rsp],r13
+ mov QWORD[120+rsp],r14
+DB 102,72,15,126,199
+DB 102,72,15,126,205
+
+ mov rdx,QWORD[128+rsp]
+ mov r8,QWORD[rsp]
+ mov r9,QWORD[8+rsp]
+ mov r10,QWORD[16+rsp]
+ mov r11,QWORD[24+rsp]
+ mov r12,QWORD[32+rsp]
+ mov r13,QWORD[40+rsp]
+ mov r14,QWORD[48+rsp]
+ mov r15,QWORD[56+rsp]
+
+ call __rsaz_512_reducex
+
+ add r8,QWORD[64+rsp]
+ adc r9,QWORD[72+rsp]
+ adc r10,QWORD[80+rsp]
+ adc r11,QWORD[88+rsp]
+ adc r12,QWORD[96+rsp]
+ adc r13,QWORD[104+rsp]
+ adc r14,QWORD[112+rsp]
+ adc r15,QWORD[120+rsp]
+ sbb rcx,rcx
+
+ call __rsaz_512_subtract
+
+ mov rdx,r8
+ mov rax,r9
+ mov r8d,DWORD[((128+8))+rsp]
+ mov rsi,rdi
+
+ dec r8d
+ jnz NEAR $L$oop_sqrx
+
+$L$sqr_tail:
+
+ lea rax,[((128+24+48))+rsp]
+
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$sqr_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_rsaz_512_sqr:
+global rsaz_512_mul
+
+ALIGN 32
+rsaz_512_mul:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_rsaz_512_mul:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ sub rsp,128+24
+
+$L$mul_body:
+DB 102,72,15,110,199
+DB 102,72,15,110,201
+ mov QWORD[128+rsp],r8
+ mov r11d,0x80100
+ and r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+ cmp r11d,0x80100
+ je NEAR $L$mulx
+ mov rbx,QWORD[rdx]
+ mov rbp,rdx
+ call __rsaz_512_mul
+
+DB 102,72,15,126,199
+DB 102,72,15,126,205
+
+ mov r8,QWORD[rsp]
+ mov r9,QWORD[8+rsp]
+ mov r10,QWORD[16+rsp]
+ mov r11,QWORD[24+rsp]
+ mov r12,QWORD[32+rsp]
+ mov r13,QWORD[40+rsp]
+ mov r14,QWORD[48+rsp]
+ mov r15,QWORD[56+rsp]
+
+ call __rsaz_512_reduce
+ jmp NEAR $L$mul_tail
+
+ALIGN 32
+$L$mulx:
+ mov rbp,rdx
+ mov rdx,QWORD[rdx]
+ call __rsaz_512_mulx
+
+DB 102,72,15,126,199
+DB 102,72,15,126,205
+
+ mov rdx,QWORD[128+rsp]
+ mov r8,QWORD[rsp]
+ mov r9,QWORD[8+rsp]
+ mov r10,QWORD[16+rsp]
+ mov r11,QWORD[24+rsp]
+ mov r12,QWORD[32+rsp]
+ mov r13,QWORD[40+rsp]
+ mov r14,QWORD[48+rsp]
+ mov r15,QWORD[56+rsp]
+
+ call __rsaz_512_reducex
+$L$mul_tail:
+ add r8,QWORD[64+rsp]
+ adc r9,QWORD[72+rsp]
+ adc r10,QWORD[80+rsp]
+ adc r11,QWORD[88+rsp]
+ adc r12,QWORD[96+rsp]
+ adc r13,QWORD[104+rsp]
+ adc r14,QWORD[112+rsp]
+ adc r15,QWORD[120+rsp]
+ sbb rcx,rcx
+
+ call __rsaz_512_subtract
+
+ lea rax,[((128+24+48))+rsp]
+
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$mul_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_rsaz_512_mul:
+global rsaz_512_mul_gather4
+
+ALIGN 32
+rsaz_512_mul_gather4:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_rsaz_512_mul_gather4:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ sub rsp,328
+
+ movaps XMMWORD[160+rsp],xmm6
+ movaps XMMWORD[176+rsp],xmm7
+ movaps XMMWORD[192+rsp],xmm8
+ movaps XMMWORD[208+rsp],xmm9
+ movaps XMMWORD[224+rsp],xmm10
+ movaps XMMWORD[240+rsp],xmm11
+ movaps XMMWORD[256+rsp],xmm12
+ movaps XMMWORD[272+rsp],xmm13
+ movaps XMMWORD[288+rsp],xmm14
+ movaps XMMWORD[304+rsp],xmm15
+$L$mul_gather4_body:
+ movd xmm8,r9d
+ movdqa xmm1,XMMWORD[(($L$inc+16))]
+ movdqa xmm0,XMMWORD[$L$inc]
+
+ pshufd xmm8,xmm8,0
+ movdqa xmm7,xmm1
+ movdqa xmm2,xmm1
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm8
+ movdqa xmm3,xmm7
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm8
+ movdqa xmm4,xmm7
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm8
+ movdqa xmm5,xmm7
+ paddd xmm4,xmm3
+ pcmpeqd xmm3,xmm8
+ movdqa xmm6,xmm7
+ paddd xmm5,xmm4
+ pcmpeqd xmm4,xmm8
+ paddd xmm6,xmm5
+ pcmpeqd xmm5,xmm8
+ paddd xmm7,xmm6
+ pcmpeqd xmm6,xmm8
+ pcmpeqd xmm7,xmm8
+
+ movdqa xmm8,XMMWORD[rdx]
+ movdqa xmm9,XMMWORD[16+rdx]
+ movdqa xmm10,XMMWORD[32+rdx]
+ movdqa xmm11,XMMWORD[48+rdx]
+ pand xmm8,xmm0
+ movdqa xmm12,XMMWORD[64+rdx]
+ pand xmm9,xmm1
+ movdqa xmm13,XMMWORD[80+rdx]
+ pand xmm10,xmm2
+ movdqa xmm14,XMMWORD[96+rdx]
+ pand xmm11,xmm3
+ movdqa xmm15,XMMWORD[112+rdx]
+ lea rbp,[128+rdx]
+ pand xmm12,xmm4
+ pand xmm13,xmm5
+ pand xmm14,xmm6
+ pand xmm15,xmm7
+ por xmm8,xmm10
+ por xmm9,xmm11
+ por xmm8,xmm12
+ por xmm9,xmm13
+ por xmm8,xmm14
+ por xmm9,xmm15
+
+ por xmm8,xmm9
+ pshufd xmm9,xmm8,0x4e
+ por xmm8,xmm9
+ mov r11d,0x80100
+ and r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+ cmp r11d,0x80100
+ je NEAR $L$mulx_gather
+DB 102,76,15,126,195
+
+ mov QWORD[128+rsp],r8
+ mov QWORD[((128+8))+rsp],rdi
+ mov QWORD[((128+16))+rsp],rcx
+
+ mov rax,QWORD[rsi]
+ mov rcx,QWORD[8+rsi]
+ mul rbx
+ mov QWORD[rsp],rax
+ mov rax,rcx
+ mov r8,rdx
+
+ mul rbx
+ add r8,rax
+ mov rax,QWORD[16+rsi]
+ mov r9,rdx
+ adc r9,0
+
+ mul rbx
+ add r9,rax
+ mov rax,QWORD[24+rsi]
+ mov r10,rdx
+ adc r10,0
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[32+rsi]
+ mov r11,rdx
+ adc r11,0
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[40+rsi]
+ mov r12,rdx
+ adc r12,0
+
+ mul rbx
+ add r12,rax
+ mov rax,QWORD[48+rsi]
+ mov r13,rdx
+ adc r13,0
+
+ mul rbx
+ add r13,rax
+ mov rax,QWORD[56+rsi]
+ mov r14,rdx
+ adc r14,0
+
+ mul rbx
+ add r14,rax
+ mov rax,QWORD[rsi]
+ mov r15,rdx
+ adc r15,0
+
+ lea rdi,[8+rsp]
+ mov ecx,7
+ jmp NEAR $L$oop_mul_gather
+
+ALIGN 32
+$L$oop_mul_gather:
+ movdqa xmm8,XMMWORD[rbp]
+ movdqa xmm9,XMMWORD[16+rbp]
+ movdqa xmm10,XMMWORD[32+rbp]
+ movdqa xmm11,XMMWORD[48+rbp]
+ pand xmm8,xmm0
+ movdqa xmm12,XMMWORD[64+rbp]
+ pand xmm9,xmm1
+ movdqa xmm13,XMMWORD[80+rbp]
+ pand xmm10,xmm2
+ movdqa xmm14,XMMWORD[96+rbp]
+ pand xmm11,xmm3
+ movdqa xmm15,XMMWORD[112+rbp]
+ lea rbp,[128+rbp]
+ pand xmm12,xmm4
+ pand xmm13,xmm5
+ pand xmm14,xmm6
+ pand xmm15,xmm7
+ por xmm8,xmm10
+ por xmm9,xmm11
+ por xmm8,xmm12
+ por xmm9,xmm13
+ por xmm8,xmm14
+ por xmm9,xmm15
+
+ por xmm8,xmm9
+ pshufd xmm9,xmm8,0x4e
+ por xmm8,xmm9
+DB 102,76,15,126,195
+
+ mul rbx
+ add r8,rax
+ mov rax,QWORD[8+rsi]
+ mov QWORD[rdi],r8
+ mov r8,rdx
+ adc r8,0
+
+ mul rbx
+ add r9,rax
+ mov rax,QWORD[16+rsi]
+ adc rdx,0
+ add r8,r9
+ mov r9,rdx
+ adc r9,0
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[24+rsi]
+ adc rdx,0
+ add r9,r10
+ mov r10,rdx
+ adc r10,0
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[32+rsi]
+ adc rdx,0
+ add r10,r11
+ mov r11,rdx
+ adc r11,0
+
+ mul rbx
+ add r12,rax
+ mov rax,QWORD[40+rsi]
+ adc rdx,0
+ add r11,r12
+ mov r12,rdx
+ adc r12,0
+
+ mul rbx
+ add r13,rax
+ mov rax,QWORD[48+rsi]
+ adc rdx,0
+ add r12,r13
+ mov r13,rdx
+ adc r13,0
+
+ mul rbx
+ add r14,rax
+ mov rax,QWORD[56+rsi]
+ adc rdx,0
+ add r13,r14
+ mov r14,rdx
+ adc r14,0
+
+ mul rbx
+ add r15,rax
+ mov rax,QWORD[rsi]
+ adc rdx,0
+ add r14,r15
+ mov r15,rdx
+ adc r15,0
+
+ lea rdi,[8+rdi]
+
+ dec ecx
+ jnz NEAR $L$oop_mul_gather
+
+ mov QWORD[rdi],r8
+ mov QWORD[8+rdi],r9
+ mov QWORD[16+rdi],r10
+ mov QWORD[24+rdi],r11
+ mov QWORD[32+rdi],r12
+ mov QWORD[40+rdi],r13
+ mov QWORD[48+rdi],r14
+ mov QWORD[56+rdi],r15
+
+ mov rdi,QWORD[((128+8))+rsp]
+ mov rbp,QWORD[((128+16))+rsp]
+
+ mov r8,QWORD[rsp]
+ mov r9,QWORD[8+rsp]
+ mov r10,QWORD[16+rsp]
+ mov r11,QWORD[24+rsp]
+ mov r12,QWORD[32+rsp]
+ mov r13,QWORD[40+rsp]
+ mov r14,QWORD[48+rsp]
+ mov r15,QWORD[56+rsp]
+
+ call __rsaz_512_reduce
+ jmp NEAR $L$mul_gather_tail
+
+ALIGN 32
+$L$mulx_gather:
+DB 102,76,15,126,194
+
+ mov QWORD[128+rsp],r8
+ mov QWORD[((128+8))+rsp],rdi
+ mov QWORD[((128+16))+rsp],rcx
+
+ mulx r8,rbx,QWORD[rsi]
+ mov QWORD[rsp],rbx
+ xor edi,edi
+
+ mulx r9,rax,QWORD[8+rsi]
+
+ mulx r10,rbx,QWORD[16+rsi]
+ adcx r8,rax
+
+ mulx r11,rax,QWORD[24+rsi]
+ adcx r9,rbx
+
+ mulx r12,rbx,QWORD[32+rsi]
+ adcx r10,rax
+
+ mulx r13,rax,QWORD[40+rsi]
+ adcx r11,rbx
+
+ mulx r14,rbx,QWORD[48+rsi]
+ adcx r12,rax
+
+ mulx r15,rax,QWORD[56+rsi]
+ adcx r13,rbx
+ adcx r14,rax
+DB 0x67
+ mov rbx,r8
+ adcx r15,rdi
+
+ mov rcx,-7
+ jmp NEAR $L$oop_mulx_gather
+
+ALIGN 32
+$L$oop_mulx_gather:
+ movdqa xmm8,XMMWORD[rbp]
+ movdqa xmm9,XMMWORD[16+rbp]
+ movdqa xmm10,XMMWORD[32+rbp]
+ movdqa xmm11,XMMWORD[48+rbp]
+ pand xmm8,xmm0
+ movdqa xmm12,XMMWORD[64+rbp]
+ pand xmm9,xmm1
+ movdqa xmm13,XMMWORD[80+rbp]
+ pand xmm10,xmm2
+ movdqa xmm14,XMMWORD[96+rbp]
+ pand xmm11,xmm3
+ movdqa xmm15,XMMWORD[112+rbp]
+ lea rbp,[128+rbp]
+ pand xmm12,xmm4
+ pand xmm13,xmm5
+ pand xmm14,xmm6
+ pand xmm15,xmm7
+ por xmm8,xmm10
+ por xmm9,xmm11
+ por xmm8,xmm12
+ por xmm9,xmm13
+ por xmm8,xmm14
+ por xmm9,xmm15
+
+ por xmm8,xmm9
+ pshufd xmm9,xmm8,0x4e
+ por xmm8,xmm9
+DB 102,76,15,126,194
+
+DB 0xc4,0x62,0xfb,0xf6,0x86,0x00,0x00,0x00,0x00
+ adcx rbx,rax
+ adox r8,r9
+
+ mulx r9,rax,QWORD[8+rsi]
+ adcx r8,rax
+ adox r9,r10
+
+ mulx r10,rax,QWORD[16+rsi]
+ adcx r9,rax
+ adox r10,r11
+
+DB 0xc4,0x62,0xfb,0xf6,0x9e,0x18,0x00,0x00,0x00
+ adcx r10,rax
+ adox r11,r12
+
+ mulx r12,rax,QWORD[32+rsi]
+ adcx r11,rax
+ adox r12,r13
+
+ mulx r13,rax,QWORD[40+rsi]
+ adcx r12,rax
+ adox r13,r14
+
+DB 0xc4,0x62,0xfb,0xf6,0xb6,0x30,0x00,0x00,0x00
+ adcx r13,rax
+DB 0x67
+ adox r14,r15
+
+ mulx r15,rax,QWORD[56+rsi]
+ mov QWORD[64+rcx*8+rsp],rbx
+ adcx r14,rax
+ adox r15,rdi
+ mov rbx,r8
+ adcx r15,rdi
+
+ inc rcx
+ jnz NEAR $L$oop_mulx_gather
+
+ mov QWORD[64+rsp],r8
+ mov QWORD[((64+8))+rsp],r9
+ mov QWORD[((64+16))+rsp],r10
+ mov QWORD[((64+24))+rsp],r11
+ mov QWORD[((64+32))+rsp],r12
+ mov QWORD[((64+40))+rsp],r13
+ mov QWORD[((64+48))+rsp],r14
+ mov QWORD[((64+56))+rsp],r15
+
+ mov rdx,QWORD[128+rsp]
+ mov rdi,QWORD[((128+8))+rsp]
+ mov rbp,QWORD[((128+16))+rsp]
+
+ mov r8,QWORD[rsp]
+ mov r9,QWORD[8+rsp]
+ mov r10,QWORD[16+rsp]
+ mov r11,QWORD[24+rsp]
+ mov r12,QWORD[32+rsp]
+ mov r13,QWORD[40+rsp]
+ mov r14,QWORD[48+rsp]
+ mov r15,QWORD[56+rsp]
+
+ call __rsaz_512_reducex
+
+$L$mul_gather_tail:
+ add r8,QWORD[64+rsp]
+ adc r9,QWORD[72+rsp]
+ adc r10,QWORD[80+rsp]
+ adc r11,QWORD[88+rsp]
+ adc r12,QWORD[96+rsp]
+ adc r13,QWORD[104+rsp]
+ adc r14,QWORD[112+rsp]
+ adc r15,QWORD[120+rsp]
+ sbb rcx,rcx
+
+ call __rsaz_512_subtract
+
+ lea rax,[((128+24+48))+rsp]
+ movaps xmm6,XMMWORD[((160-200))+rax]
+ movaps xmm7,XMMWORD[((176-200))+rax]
+ movaps xmm8,XMMWORD[((192-200))+rax]
+ movaps xmm9,XMMWORD[((208-200))+rax]
+ movaps xmm10,XMMWORD[((224-200))+rax]
+ movaps xmm11,XMMWORD[((240-200))+rax]
+ movaps xmm12,XMMWORD[((256-200))+rax]
+ movaps xmm13,XMMWORD[((272-200))+rax]
+ movaps xmm14,XMMWORD[((288-200))+rax]
+ movaps xmm15,XMMWORD[((304-200))+rax]
+ lea rax,[176+rax]
+
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$mul_gather4_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_rsaz_512_mul_gather4:
+global rsaz_512_mul_scatter4
+
+ALIGN 32
+rsaz_512_mul_scatter4:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_rsaz_512_mul_scatter4:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ mov r9d,r9d
+ sub rsp,128+24
+
+$L$mul_scatter4_body:
+ lea r8,[r9*8+r8]
+DB 102,72,15,110,199
+DB 102,72,15,110,202
+DB 102,73,15,110,208
+ mov QWORD[128+rsp],rcx
+
+ mov rbp,rdi
+ mov r11d,0x80100
+ and r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+ cmp r11d,0x80100
+ je NEAR $L$mulx_scatter
+ mov rbx,QWORD[rdi]
+ call __rsaz_512_mul
+
+DB 102,72,15,126,199
+DB 102,72,15,126,205
+
+ mov r8,QWORD[rsp]
+ mov r9,QWORD[8+rsp]
+ mov r10,QWORD[16+rsp]
+ mov r11,QWORD[24+rsp]
+ mov r12,QWORD[32+rsp]
+ mov r13,QWORD[40+rsp]
+ mov r14,QWORD[48+rsp]
+ mov r15,QWORD[56+rsp]
+
+ call __rsaz_512_reduce
+ jmp NEAR $L$mul_scatter_tail
+
+ALIGN 32
+$L$mulx_scatter:
+ mov rdx,QWORD[rdi]
+ call __rsaz_512_mulx
+
+DB 102,72,15,126,199
+DB 102,72,15,126,205
+
+ mov rdx,QWORD[128+rsp]
+ mov r8,QWORD[rsp]
+ mov r9,QWORD[8+rsp]
+ mov r10,QWORD[16+rsp]
+ mov r11,QWORD[24+rsp]
+ mov r12,QWORD[32+rsp]
+ mov r13,QWORD[40+rsp]
+ mov r14,QWORD[48+rsp]
+ mov r15,QWORD[56+rsp]
+
+ call __rsaz_512_reducex
+
+$L$mul_scatter_tail:
+ add r8,QWORD[64+rsp]
+ adc r9,QWORD[72+rsp]
+ adc r10,QWORD[80+rsp]
+ adc r11,QWORD[88+rsp]
+ adc r12,QWORD[96+rsp]
+ adc r13,QWORD[104+rsp]
+ adc r14,QWORD[112+rsp]
+ adc r15,QWORD[120+rsp]
+DB 102,72,15,126,214
+ sbb rcx,rcx
+
+ call __rsaz_512_subtract
+
+ mov QWORD[rsi],r8
+ mov QWORD[128+rsi],r9
+ mov QWORD[256+rsi],r10
+ mov QWORD[384+rsi],r11
+ mov QWORD[512+rsi],r12
+ mov QWORD[640+rsi],r13
+ mov QWORD[768+rsi],r14
+ mov QWORD[896+rsi],r15
+
+ lea rax,[((128+24+48))+rsp]
+
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$mul_scatter4_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_rsaz_512_mul_scatter4:
+global rsaz_512_mul_by_one
+
+ALIGN 32
+rsaz_512_mul_by_one:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_rsaz_512_mul_by_one:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ sub rsp,128+24
+
+$L$mul_by_one_body:
+ mov eax,DWORD[((OPENSSL_ia32cap_P+8))]
+ mov rbp,rdx
+ mov QWORD[128+rsp],rcx
+
+ mov r8,QWORD[rsi]
+ pxor xmm0,xmm0
+ mov r9,QWORD[8+rsi]
+ mov r10,QWORD[16+rsi]
+ mov r11,QWORD[24+rsi]
+ mov r12,QWORD[32+rsi]
+ mov r13,QWORD[40+rsi]
+ mov r14,QWORD[48+rsi]
+ mov r15,QWORD[56+rsi]
+
+ movdqa XMMWORD[rsp],xmm0
+ movdqa XMMWORD[16+rsp],xmm0
+ movdqa XMMWORD[32+rsp],xmm0
+ movdqa XMMWORD[48+rsp],xmm0
+ movdqa XMMWORD[64+rsp],xmm0
+ movdqa XMMWORD[80+rsp],xmm0
+ movdqa XMMWORD[96+rsp],xmm0
+ and eax,0x80100
+ cmp eax,0x80100
+ je NEAR $L$by_one_callx
+ call __rsaz_512_reduce
+ jmp NEAR $L$by_one_tail
+ALIGN 32
+$L$by_one_callx:
+ mov rdx,QWORD[128+rsp]
+ call __rsaz_512_reducex
+$L$by_one_tail:
+ mov QWORD[rdi],r8
+ mov QWORD[8+rdi],r9
+ mov QWORD[16+rdi],r10
+ mov QWORD[24+rdi],r11
+ mov QWORD[32+rdi],r12
+ mov QWORD[40+rdi],r13
+ mov QWORD[48+rdi],r14
+ mov QWORD[56+rdi],r15
+
+ lea rax,[((128+24+48))+rsp]
+
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$mul_by_one_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_rsaz_512_mul_by_one:
+
+ALIGN 32
+__rsaz_512_reduce:
+ mov rbx,r8
+ imul rbx,QWORD[((128+8))+rsp]
+ mov rax,QWORD[rbp]
+ mov ecx,8
+ jmp NEAR $L$reduction_loop
+
+ALIGN 32
+$L$reduction_loop:
+ mul rbx
+ mov rax,QWORD[8+rbp]
+ neg r8
+ mov r8,rdx
+ adc r8,0
+
+ mul rbx
+ add r9,rax
+ mov rax,QWORD[16+rbp]
+ adc rdx,0
+ add r8,r9
+ mov r9,rdx
+ adc r9,0
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[24+rbp]
+ adc rdx,0
+ add r9,r10
+ mov r10,rdx
+ adc r10,0
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[32+rbp]
+ adc rdx,0
+ add r10,r11
+ mov rsi,QWORD[((128+8))+rsp]
+
+
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbx
+ add r12,rax
+ mov rax,QWORD[40+rbp]
+ adc rdx,0
+ imul rsi,r8
+ add r11,r12
+ mov r12,rdx
+ adc r12,0
+
+ mul rbx
+ add r13,rax
+ mov rax,QWORD[48+rbp]
+ adc rdx,0
+ add r12,r13
+ mov r13,rdx
+ adc r13,0
+
+ mul rbx
+ add r14,rax
+ mov rax,QWORD[56+rbp]
+ adc rdx,0
+ add r13,r14
+ mov r14,rdx
+ adc r14,0
+
+ mul rbx
+ mov rbx,rsi
+ add r15,rax
+ mov rax,QWORD[rbp]
+ adc rdx,0
+ add r14,r15
+ mov r15,rdx
+ adc r15,0
+
+ dec ecx
+ jne NEAR $L$reduction_loop
+
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 32
+__rsaz_512_reducex:
+
+ imul rdx,r8
+ xor rsi,rsi
+ mov ecx,8
+ jmp NEAR $L$reduction_loopx
+
+ALIGN 32
+$L$reduction_loopx:
+ mov rbx,r8
+ mulx r8,rax,QWORD[rbp]
+ adcx rax,rbx
+ adox r8,r9
+
+ mulx r9,rax,QWORD[8+rbp]
+ adcx r8,rax
+ adox r9,r10
+
+ mulx r10,rbx,QWORD[16+rbp]
+ adcx r9,rbx
+ adox r10,r11
+
+ mulx r11,rbx,QWORD[24+rbp]
+ adcx r10,rbx
+ adox r11,r12
+
+DB 0xc4,0x62,0xe3,0xf6,0xa5,0x20,0x00,0x00,0x00
+ mov rax,rdx
+ mov rdx,r8
+ adcx r11,rbx
+ adox r12,r13
+
+ mulx rdx,rbx,QWORD[((128+8))+rsp]
+ mov rdx,rax
+
+ mulx r13,rax,QWORD[40+rbp]
+ adcx r12,rax
+ adox r13,r14
+
+DB 0xc4,0x62,0xfb,0xf6,0xb5,0x30,0x00,0x00,0x00
+ adcx r13,rax
+ adox r14,r15
+
+ mulx r15,rax,QWORD[56+rbp]
+ mov rdx,rbx
+ adcx r14,rax
+ adox r15,rsi
+ adcx r15,rsi
+
+ dec ecx
+ jne NEAR $L$reduction_loopx
+
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 32
+__rsaz_512_subtract:
+ mov QWORD[rdi],r8
+ mov QWORD[8+rdi],r9
+ mov QWORD[16+rdi],r10
+ mov QWORD[24+rdi],r11
+ mov QWORD[32+rdi],r12
+ mov QWORD[40+rdi],r13
+ mov QWORD[48+rdi],r14
+ mov QWORD[56+rdi],r15
+
+ mov r8,QWORD[rbp]
+ mov r9,QWORD[8+rbp]
+ neg r8
+ not r9
+ and r8,rcx
+ mov r10,QWORD[16+rbp]
+ and r9,rcx
+ not r10
+ mov r11,QWORD[24+rbp]
+ and r10,rcx
+ not r11
+ mov r12,QWORD[32+rbp]
+ and r11,rcx
+ not r12
+ mov r13,QWORD[40+rbp]
+ and r12,rcx
+ not r13
+ mov r14,QWORD[48+rbp]
+ and r13,rcx
+ not r14
+ mov r15,QWORD[56+rbp]
+ and r14,rcx
+ not r15
+ and r15,rcx
+
+ add r8,QWORD[rdi]
+ adc r9,QWORD[8+rdi]
+ adc r10,QWORD[16+rdi]
+ adc r11,QWORD[24+rdi]
+ adc r12,QWORD[32+rdi]
+ adc r13,QWORD[40+rdi]
+ adc r14,QWORD[48+rdi]
+ adc r15,QWORD[56+rdi]
+
+ mov QWORD[rdi],r8
+ mov QWORD[8+rdi],r9
+ mov QWORD[16+rdi],r10
+ mov QWORD[24+rdi],r11
+ mov QWORD[32+rdi],r12
+ mov QWORD[40+rdi],r13
+ mov QWORD[48+rdi],r14
+ mov QWORD[56+rdi],r15
+
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 32
+__rsaz_512_mul:
+ lea rdi,[8+rsp]
+
+ mov rax,QWORD[rsi]
+ mul rbx
+ mov QWORD[rdi],rax
+ mov rax,QWORD[8+rsi]
+ mov r8,rdx
+
+ mul rbx
+ add r8,rax
+ mov rax,QWORD[16+rsi]
+ mov r9,rdx
+ adc r9,0
+
+ mul rbx
+ add r9,rax
+ mov rax,QWORD[24+rsi]
+ mov r10,rdx
+ adc r10,0
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[32+rsi]
+ mov r11,rdx
+ adc r11,0
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[40+rsi]
+ mov r12,rdx
+ adc r12,0
+
+ mul rbx
+ add r12,rax
+ mov rax,QWORD[48+rsi]
+ mov r13,rdx
+ adc r13,0
+
+ mul rbx
+ add r13,rax
+ mov rax,QWORD[56+rsi]
+ mov r14,rdx
+ adc r14,0
+
+ mul rbx
+ add r14,rax
+ mov rax,QWORD[rsi]
+ mov r15,rdx
+ adc r15,0
+
+ lea rbp,[8+rbp]
+ lea rdi,[8+rdi]
+
+ mov ecx,7
+ jmp NEAR $L$oop_mul
+
+ALIGN 32
+$L$oop_mul:
+ mov rbx,QWORD[rbp]
+ mul rbx
+ add r8,rax
+ mov rax,QWORD[8+rsi]
+ mov QWORD[rdi],r8
+ mov r8,rdx
+ adc r8,0
+
+ mul rbx
+ add r9,rax
+ mov rax,QWORD[16+rsi]
+ adc rdx,0
+ add r8,r9
+ mov r9,rdx
+ adc r9,0
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[24+rsi]
+ adc rdx,0
+ add r9,r10
+ mov r10,rdx
+ adc r10,0
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[32+rsi]
+ adc rdx,0
+ add r10,r11
+ mov r11,rdx
+ adc r11,0
+
+ mul rbx
+ add r12,rax
+ mov rax,QWORD[40+rsi]
+ adc rdx,0
+ add r11,r12
+ mov r12,rdx
+ adc r12,0
+
+ mul rbx
+ add r13,rax
+ mov rax,QWORD[48+rsi]
+ adc rdx,0
+ add r12,r13
+ mov r13,rdx
+ adc r13,0
+
+ mul rbx
+ add r14,rax
+ mov rax,QWORD[56+rsi]
+ adc rdx,0
+ add r13,r14
+ mov r14,rdx
+ lea rbp,[8+rbp]
+ adc r14,0
+
+ mul rbx
+ add r15,rax
+ mov rax,QWORD[rsi]
+ adc rdx,0
+ add r14,r15
+ mov r15,rdx
+ adc r15,0
+
+ lea rdi,[8+rdi]
+
+ dec ecx
+ jnz NEAR $L$oop_mul
+
+ mov QWORD[rdi],r8
+ mov QWORD[8+rdi],r9
+ mov QWORD[16+rdi],r10
+ mov QWORD[24+rdi],r11
+ mov QWORD[32+rdi],r12
+ mov QWORD[40+rdi],r13
+ mov QWORD[48+rdi],r14
+ mov QWORD[56+rdi],r15
+
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 32
+__rsaz_512_mulx:
+ mulx r8,rbx,QWORD[rsi]
+ mov rcx,-6
+
+ mulx r9,rax,QWORD[8+rsi]
+ mov QWORD[8+rsp],rbx
+
+ mulx r10,rbx,QWORD[16+rsi]
+ adc r8,rax
+
+ mulx r11,rax,QWORD[24+rsi]
+ adc r9,rbx
+
+ mulx r12,rbx,QWORD[32+rsi]
+ adc r10,rax
+
+ mulx r13,rax,QWORD[40+rsi]
+ adc r11,rbx
+
+ mulx r14,rbx,QWORD[48+rsi]
+ adc r12,rax
+
+ mulx r15,rax,QWORD[56+rsi]
+ mov rdx,QWORD[8+rbp]
+ adc r13,rbx
+ adc r14,rax
+ adc r15,0
+
+ xor rdi,rdi
+ jmp NEAR $L$oop_mulx
+
+ALIGN 32
+$L$oop_mulx:
+ mov rbx,r8
+ mulx r8,rax,QWORD[rsi]
+ adcx rbx,rax
+ adox r8,r9
+
+ mulx r9,rax,QWORD[8+rsi]
+ adcx r8,rax
+ adox r9,r10
+
+ mulx r10,rax,QWORD[16+rsi]
+ adcx r9,rax
+ adox r10,r11
+
+ mulx r11,rax,QWORD[24+rsi]
+ adcx r10,rax
+ adox r11,r12
+
+DB 0x3e,0xc4,0x62,0xfb,0xf6,0xa6,0x20,0x00,0x00,0x00
+ adcx r11,rax
+ adox r12,r13
+
+ mulx r13,rax,QWORD[40+rsi]
+ adcx r12,rax
+ adox r13,r14
+
+ mulx r14,rax,QWORD[48+rsi]
+ adcx r13,rax
+ adox r14,r15
+
+ mulx r15,rax,QWORD[56+rsi]
+ mov rdx,QWORD[64+rcx*8+rbp]
+ mov QWORD[((8+64-8))+rcx*8+rsp],rbx
+ adcx r14,rax
+ adox r15,rdi
+ adcx r15,rdi
+
+ inc rcx
+ jnz NEAR $L$oop_mulx
+
+ mov rbx,r8
+ mulx r8,rax,QWORD[rsi]
+ adcx rbx,rax
+ adox r8,r9
+
+DB 0xc4,0x62,0xfb,0xf6,0x8e,0x08,0x00,0x00,0x00
+ adcx r8,rax
+ adox r9,r10
+
+DB 0xc4,0x62,0xfb,0xf6,0x96,0x10,0x00,0x00,0x00
+ adcx r9,rax
+ adox r10,r11
+
+ mulx r11,rax,QWORD[24+rsi]
+ adcx r10,rax
+ adox r11,r12
+
+ mulx r12,rax,QWORD[32+rsi]
+ adcx r11,rax
+ adox r12,r13
+
+ mulx r13,rax,QWORD[40+rsi]
+ adcx r12,rax
+ adox r13,r14
+
+DB 0xc4,0x62,0xfb,0xf6,0xb6,0x30,0x00,0x00,0x00
+ adcx r13,rax
+ adox r14,r15
+
+DB 0xc4,0x62,0xfb,0xf6,0xbe,0x38,0x00,0x00,0x00
+ adcx r14,rax
+ adox r15,rdi
+ adcx r15,rdi
+
+ mov QWORD[((8+64-8))+rsp],rbx
+ mov QWORD[((8+64))+rsp],r8
+ mov QWORD[((8+64+8))+rsp],r9
+ mov QWORD[((8+64+16))+rsp],r10
+ mov QWORD[((8+64+24))+rsp],r11
+ mov QWORD[((8+64+32))+rsp],r12
+ mov QWORD[((8+64+40))+rsp],r13
+ mov QWORD[((8+64+48))+rsp],r14
+ mov QWORD[((8+64+56))+rsp],r15
+
+ DB 0F3h,0C3h ;repret
+
+global rsaz_512_scatter4
+
+ALIGN 16
+rsaz_512_scatter4:
+ lea rcx,[r8*8+rcx]
+ mov r9d,8
+ jmp NEAR $L$oop_scatter
+ALIGN 16
+$L$oop_scatter:
+ mov rax,QWORD[rdx]
+ lea rdx,[8+rdx]
+ mov QWORD[rcx],rax
+ lea rcx,[128+rcx]
+ dec r9d
+ jnz NEAR $L$oop_scatter
+ DB 0F3h,0C3h ;repret
+
+
+global rsaz_512_gather4
+
+ALIGN 16
+rsaz_512_gather4:
+$L$SEH_begin_rsaz_512_gather4:
+DB 0x48,0x81,0xec,0xa8,0x00,0x00,0x00
+DB 0x0f,0x29,0x34,0x24
+DB 0x0f,0x29,0x7c,0x24,0x10
+DB 0x44,0x0f,0x29,0x44,0x24,0x20
+DB 0x44,0x0f,0x29,0x4c,0x24,0x30
+DB 0x44,0x0f,0x29,0x54,0x24,0x40
+DB 0x44,0x0f,0x29,0x5c,0x24,0x50
+DB 0x44,0x0f,0x29,0x64,0x24,0x60
+DB 0x44,0x0f,0x29,0x6c,0x24,0x70
+DB 0x44,0x0f,0x29,0xb4,0x24,0x80,0,0,0
+DB 0x44,0x0f,0x29,0xbc,0x24,0x90,0,0,0
+ movd xmm8,r8d
+ movdqa xmm1,XMMWORD[(($L$inc+16))]
+ movdqa xmm0,XMMWORD[$L$inc]
+
+ pshufd xmm8,xmm8,0
+ movdqa xmm7,xmm1
+ movdqa xmm2,xmm1
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm8
+ movdqa xmm3,xmm7
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm8
+ movdqa xmm4,xmm7
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm8
+ movdqa xmm5,xmm7
+ paddd xmm4,xmm3
+ pcmpeqd xmm3,xmm8
+ movdqa xmm6,xmm7
+ paddd xmm5,xmm4
+ pcmpeqd xmm4,xmm8
+ paddd xmm6,xmm5
+ pcmpeqd xmm5,xmm8
+ paddd xmm7,xmm6
+ pcmpeqd xmm6,xmm8
+ pcmpeqd xmm7,xmm8
+ mov r9d,8
+ jmp NEAR $L$oop_gather
+ALIGN 16
+$L$oop_gather:
+ movdqa xmm8,XMMWORD[rdx]
+ movdqa xmm9,XMMWORD[16+rdx]
+ movdqa xmm10,XMMWORD[32+rdx]
+ movdqa xmm11,XMMWORD[48+rdx]
+ pand xmm8,xmm0
+ movdqa xmm12,XMMWORD[64+rdx]
+ pand xmm9,xmm1
+ movdqa xmm13,XMMWORD[80+rdx]
+ pand xmm10,xmm2
+ movdqa xmm14,XMMWORD[96+rdx]
+ pand xmm11,xmm3
+ movdqa xmm15,XMMWORD[112+rdx]
+ lea rdx,[128+rdx]
+ pand xmm12,xmm4
+ pand xmm13,xmm5
+ pand xmm14,xmm6
+ pand xmm15,xmm7
+ por xmm8,xmm10
+ por xmm9,xmm11
+ por xmm8,xmm12
+ por xmm9,xmm13
+ por xmm8,xmm14
+ por xmm9,xmm15
+
+ por xmm8,xmm9
+ pshufd xmm9,xmm8,0x4e
+ por xmm8,xmm9
+ movq QWORD[rcx],xmm8
+ lea rcx,[8+rcx]
+ dec r9d
+ jnz NEAR $L$oop_gather
+ movaps xmm6,XMMWORD[rsp]
+ movaps xmm7,XMMWORD[16+rsp]
+ movaps xmm8,XMMWORD[32+rsp]
+ movaps xmm9,XMMWORD[48+rsp]
+ movaps xmm10,XMMWORD[64+rsp]
+ movaps xmm11,XMMWORD[80+rsp]
+ movaps xmm12,XMMWORD[96+rsp]
+ movaps xmm13,XMMWORD[112+rsp]
+ movaps xmm14,XMMWORD[128+rsp]
+ movaps xmm15,XMMWORD[144+rsp]
+ add rsp,0xa8
+ DB 0F3h,0C3h ;repret
+$L$SEH_end_rsaz_512_gather4:
+
+
+ALIGN 64
+$L$inc:
+ DD 0,0,1,1
+ DD 2,2,2,2
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ lea rax,[((128+24+48))+rax]
+
+ lea rbx,[$L$mul_gather4_epilogue]
+ cmp rbx,r10
+ jne NEAR $L$se_not_in_mul_gather4
+
+ lea rax,[176+rax]
+
+ lea rsi,[((-48-168))+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+$L$se_not_in_mul_gather4:
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+$L$common_seh_tail:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_rsaz_512_sqr wrt ..imagebase
+ DD $L$SEH_end_rsaz_512_sqr wrt ..imagebase
+ DD $L$SEH_info_rsaz_512_sqr wrt ..imagebase
+
+ DD $L$SEH_begin_rsaz_512_mul wrt ..imagebase
+ DD $L$SEH_end_rsaz_512_mul wrt ..imagebase
+ DD $L$SEH_info_rsaz_512_mul wrt ..imagebase
+
+ DD $L$SEH_begin_rsaz_512_mul_gather4 wrt ..imagebase
+ DD $L$SEH_end_rsaz_512_mul_gather4 wrt ..imagebase
+ DD $L$SEH_info_rsaz_512_mul_gather4 wrt ..imagebase
+
+ DD $L$SEH_begin_rsaz_512_mul_scatter4 wrt ..imagebase
+ DD $L$SEH_end_rsaz_512_mul_scatter4 wrt ..imagebase
+ DD $L$SEH_info_rsaz_512_mul_scatter4 wrt ..imagebase
+
+ DD $L$SEH_begin_rsaz_512_mul_by_one wrt ..imagebase
+ DD $L$SEH_end_rsaz_512_mul_by_one wrt ..imagebase
+ DD $L$SEH_info_rsaz_512_mul_by_one wrt ..imagebase
+
+ DD $L$SEH_begin_rsaz_512_gather4 wrt ..imagebase
+ DD $L$SEH_end_rsaz_512_gather4 wrt ..imagebase
+ DD $L$SEH_info_rsaz_512_gather4 wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_rsaz_512_sqr:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$sqr_body wrt ..imagebase,$L$sqr_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_mul:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$mul_body wrt ..imagebase,$L$mul_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_mul_gather4:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$mul_gather4_body wrt ..imagebase,$L$mul_gather4_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_mul_scatter4:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$mul_scatter4_body wrt ..imagebase,$L$mul_scatter4_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_mul_by_one:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$mul_by_one_body wrt ..imagebase,$L$mul_by_one_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_gather4:
+DB 0x01,0x46,0x16,0x00
+DB 0x46,0xf8,0x09,0x00
+DB 0x3d,0xe8,0x08,0x00
+DB 0x34,0xd8,0x07,0x00
+DB 0x2e,0xc8,0x06,0x00
+DB 0x28,0xb8,0x05,0x00
+DB 0x22,0xa8,0x04,0x00
+DB 0x1c,0x98,0x03,0x00
+DB 0x16,0x88,0x02,0x00
+DB 0x10,0x78,0x01,0x00
+DB 0x0b,0x68,0x00,0x00
+DB 0x07,0x01,0x15,0x00
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm
new file mode 100644
index 0000000000..b96e85a35a
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm
@@ -0,0 +1,432 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+
+ALIGN 16
+_mul_1x1:
+
+ sub rsp,128+8
+
+ mov r9,-1
+ lea rsi,[rax*1+rax]
+ shr r9,3
+ lea rdi,[rax*4]
+ and r9,rax
+ lea r12,[rax*8]
+ sar rax,63
+ lea r10,[r9*1+r9]
+ sar rsi,63
+ lea r11,[r9*4]
+ and rax,rbp
+ sar rdi,63
+ mov rdx,rax
+ shl rax,63
+ and rsi,rbp
+ shr rdx,1
+ mov rcx,rsi
+ shl rsi,62
+ and rdi,rbp
+ shr rcx,2
+ xor rax,rsi
+ mov rbx,rdi
+ shl rdi,61
+ xor rdx,rcx
+ shr rbx,3
+ xor rax,rdi
+ xor rdx,rbx
+
+ mov r13,r9
+ mov QWORD[rsp],0
+ xor r13,r10
+ mov QWORD[8+rsp],r9
+ mov r14,r11
+ mov QWORD[16+rsp],r10
+ xor r14,r12
+ mov QWORD[24+rsp],r13
+
+ xor r9,r11
+ mov QWORD[32+rsp],r11
+ xor r10,r11
+ mov QWORD[40+rsp],r9
+ xor r13,r11
+ mov QWORD[48+rsp],r10
+ xor r9,r14
+ mov QWORD[56+rsp],r13
+ xor r10,r14
+
+ mov QWORD[64+rsp],r12
+ xor r13,r14
+ mov QWORD[72+rsp],r9
+ xor r9,r11
+ mov QWORD[80+rsp],r10
+ xor r10,r11
+ mov QWORD[88+rsp],r13
+
+ xor r13,r11
+ mov QWORD[96+rsp],r14
+ mov rsi,r8
+ mov QWORD[104+rsp],r9
+ and rsi,rbp
+ mov QWORD[112+rsp],r10
+ shr rbp,4
+ mov QWORD[120+rsp],r13
+ mov rdi,r8
+ and rdi,rbp
+ shr rbp,4
+
+ movq xmm0,QWORD[rsi*8+rsp]
+ mov rsi,r8
+ and rsi,rbp
+ shr rbp,4
+ mov rcx,QWORD[rdi*8+rsp]
+ mov rdi,r8
+ mov rbx,rcx
+ shl rcx,4
+ and rdi,rbp
+ movq xmm1,QWORD[rsi*8+rsp]
+ shr rbx,60
+ xor rax,rcx
+ pslldq xmm1,1
+ mov rsi,r8
+ shr rbp,4
+ xor rdx,rbx
+ and rsi,rbp
+ shr rbp,4
+ pxor xmm0,xmm1
+ mov rcx,QWORD[rdi*8+rsp]
+ mov rdi,r8
+ mov rbx,rcx
+ shl rcx,12
+ and rdi,rbp
+ movq xmm1,QWORD[rsi*8+rsp]
+ shr rbx,52
+ xor rax,rcx
+ pslldq xmm1,2
+ mov rsi,r8
+ shr rbp,4
+ xor rdx,rbx
+ and rsi,rbp
+ shr rbp,4
+ pxor xmm0,xmm1
+ mov rcx,QWORD[rdi*8+rsp]
+ mov rdi,r8
+ mov rbx,rcx
+ shl rcx,20
+ and rdi,rbp
+ movq xmm1,QWORD[rsi*8+rsp]
+ shr rbx,44
+ xor rax,rcx
+ pslldq xmm1,3
+ mov rsi,r8
+ shr rbp,4
+ xor rdx,rbx
+ and rsi,rbp
+ shr rbp,4
+ pxor xmm0,xmm1
+ mov rcx,QWORD[rdi*8+rsp]
+ mov rdi,r8
+ mov rbx,rcx
+ shl rcx,28
+ and rdi,rbp
+ movq xmm1,QWORD[rsi*8+rsp]
+ shr rbx,36
+ xor rax,rcx
+ pslldq xmm1,4
+ mov rsi,r8
+ shr rbp,4
+ xor rdx,rbx
+ and rsi,rbp
+ shr rbp,4
+ pxor xmm0,xmm1
+ mov rcx,QWORD[rdi*8+rsp]
+ mov rdi,r8
+ mov rbx,rcx
+ shl rcx,36
+ and rdi,rbp
+ movq xmm1,QWORD[rsi*8+rsp]
+ shr rbx,28
+ xor rax,rcx
+ pslldq xmm1,5
+ mov rsi,r8
+ shr rbp,4
+ xor rdx,rbx
+ and rsi,rbp
+ shr rbp,4
+ pxor xmm0,xmm1
+ mov rcx,QWORD[rdi*8+rsp]
+ mov rdi,r8
+ mov rbx,rcx
+ shl rcx,44
+ and rdi,rbp
+ movq xmm1,QWORD[rsi*8+rsp]
+ shr rbx,20
+ xor rax,rcx
+ pslldq xmm1,6
+ mov rsi,r8
+ shr rbp,4
+ xor rdx,rbx
+ and rsi,rbp
+ shr rbp,4
+ pxor xmm0,xmm1
+ mov rcx,QWORD[rdi*8+rsp]
+ mov rdi,r8
+ mov rbx,rcx
+ shl rcx,52
+ and rdi,rbp
+ movq xmm1,QWORD[rsi*8+rsp]
+ shr rbx,12
+ xor rax,rcx
+ pslldq xmm1,7
+ mov rsi,r8
+ shr rbp,4
+ xor rdx,rbx
+ and rsi,rbp
+ shr rbp,4
+ pxor xmm0,xmm1
+ mov rcx,QWORD[rdi*8+rsp]
+ mov rbx,rcx
+ shl rcx,60
+DB 102,72,15,126,198
+ shr rbx,4
+ xor rax,rcx
+ psrldq xmm0,8
+ xor rdx,rbx
+DB 102,72,15,126,199
+ xor rax,rsi
+ xor rdx,rdi
+
+ add rsp,128+8
+
+ DB 0F3h,0C3h ;repret
+$L$end_mul_1x1:
+
+
+EXTERN OPENSSL_ia32cap_P
+global bn_GF2m_mul_2x2
+
+ALIGN 16
+bn_GF2m_mul_2x2:
+
+ mov rax,rsp
+ mov r10,QWORD[OPENSSL_ia32cap_P]
+ bt r10,33
+ jnc NEAR $L$vanilla_mul_2x2
+
+DB 102,72,15,110,194
+DB 102,73,15,110,201
+DB 102,73,15,110,208
+ movq xmm3,QWORD[40+rsp]
+ movdqa xmm4,xmm0
+ movdqa xmm5,xmm1
+DB 102,15,58,68,193,0
+ pxor xmm4,xmm2
+ pxor xmm5,xmm3
+DB 102,15,58,68,211,0
+DB 102,15,58,68,229,0
+ xorps xmm4,xmm0
+ xorps xmm4,xmm2
+ movdqa xmm5,xmm4
+ pslldq xmm4,8
+ psrldq xmm5,8
+ pxor xmm2,xmm4
+ pxor xmm0,xmm5
+ movdqu XMMWORD[rcx],xmm2
+ movdqu XMMWORD[16+rcx],xmm0
+ DB 0F3h,0C3h ;repret
+
+ALIGN 16
+$L$vanilla_mul_2x2:
+ lea rsp,[((-136))+rsp]
+
+ mov r10,QWORD[176+rsp]
+ mov QWORD[120+rsp],rdi
+ mov QWORD[128+rsp],rsi
+ mov QWORD[80+rsp],r14
+
+ mov QWORD[88+rsp],r13
+
+ mov QWORD[96+rsp],r12
+
+ mov QWORD[104+rsp],rbp
+
+ mov QWORD[112+rsp],rbx
+
+$L$body_mul_2x2:
+ mov QWORD[32+rsp],rcx
+ mov QWORD[40+rsp],rdx
+ mov QWORD[48+rsp],r8
+ mov QWORD[56+rsp],r9
+ mov QWORD[64+rsp],r10
+
+ mov r8,0xf
+ mov rax,rdx
+ mov rbp,r9
+ call _mul_1x1
+ mov QWORD[16+rsp],rax
+ mov QWORD[24+rsp],rdx
+
+ mov rax,QWORD[48+rsp]
+ mov rbp,QWORD[64+rsp]
+ call _mul_1x1
+ mov QWORD[rsp],rax
+ mov QWORD[8+rsp],rdx
+
+ mov rax,QWORD[40+rsp]
+ mov rbp,QWORD[56+rsp]
+ xor rax,QWORD[48+rsp]
+ xor rbp,QWORD[64+rsp]
+ call _mul_1x1
+ mov rbx,QWORD[rsp]
+ mov rcx,QWORD[8+rsp]
+ mov rdi,QWORD[16+rsp]
+ mov rsi,QWORD[24+rsp]
+ mov rbp,QWORD[32+rsp]
+
+ xor rax,rdx
+ xor rdx,rcx
+ xor rax,rbx
+ mov QWORD[rbp],rbx
+ xor rdx,rdi
+ mov QWORD[24+rbp],rsi
+ xor rax,rsi
+ xor rdx,rsi
+ xor rax,rdx
+ mov QWORD[16+rbp],rdx
+ mov QWORD[8+rbp],rax
+
+ mov r14,QWORD[80+rsp]
+
+ mov r13,QWORD[88+rsp]
+
+ mov r12,QWORD[96+rsp]
+
+ mov rbp,QWORD[104+rsp]
+
+ mov rbx,QWORD[112+rsp]
+
+ mov rdi,QWORD[120+rsp]
+ mov rsi,QWORD[128+rsp]
+ lea rsp,[136+rsp]
+
+$L$epilogue_mul_2x2:
+ DB 0F3h,0C3h ;repret
+$L$end_mul_2x2:
+
+
+DB 71,70,40,50,94,109,41,32,77,117,108,116,105,112,108,105
+DB 99,97,116,105,111,110,32,102,111,114,32,120,56,54,95,54
+DB 52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121
+DB 32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46
+DB 111,114,103,62,0
+ALIGN 16
+EXTERN __imp_RtlVirtualUnwind
+
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ lea r10,[$L$body_mul_2x2]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ lea r10,[$L$epilogue_mul_2x2]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ mov r14,QWORD[80+rax]
+ mov r13,QWORD[88+rax]
+ mov r12,QWORD[96+rax]
+ mov rbp,QWORD[104+rax]
+ mov rbx,QWORD[112+rax]
+ mov rdi,QWORD[120+rax]
+ mov rsi,QWORD[128+rax]
+
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+
+ lea rax,[136+rax]
+
+$L$in_prologue:
+ mov QWORD[152+r8],rax
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD _mul_1x1 wrt ..imagebase
+ DD $L$end_mul_1x1 wrt ..imagebase
+ DD $L$SEH_info_1x1 wrt ..imagebase
+
+ DD $L$vanilla_mul_2x2 wrt ..imagebase
+ DD $L$end_mul_2x2 wrt ..imagebase
+ DD $L$SEH_info_2x2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_1x1:
+DB 0x01,0x07,0x02,0x00
+DB 0x07,0x01,0x11,0x00
+$L$SEH_info_2x2:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm
new file mode 100644
index 0000000000..9ff8ec428f
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm
@@ -0,0 +1,1479 @@
+; Copyright 2005-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN OPENSSL_ia32cap_P
+
+global bn_mul_mont
+
+ALIGN 16
+bn_mul_mont:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_mul_mont:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov r9d,r9d
+ mov rax,rsp
+
+ test r9d,3
+ jnz NEAR $L$mul_enter
+ cmp r9d,8
+ jb NEAR $L$mul_enter
+ mov r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+ cmp rdx,rsi
+ jne NEAR $L$mul4x_enter
+ test r9d,7
+ jz NEAR $L$sqr8x_enter
+ jmp NEAR $L$mul4x_enter
+
+ALIGN 16
+$L$mul_enter:
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ neg r9
+ mov r11,rsp
+ lea r10,[((-16))+r9*8+rsp]
+ neg r9
+ and r10,-1024
+
+
+
+
+
+
+
+
+
+ sub r11,r10
+ and r11,-4096
+ lea rsp,[r11*1+r10]
+ mov r11,QWORD[rsp]
+ cmp rsp,r10
+ ja NEAR $L$mul_page_walk
+ jmp NEAR $L$mul_page_walk_done
+
+ALIGN 16
+$L$mul_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r11,QWORD[rsp]
+ cmp rsp,r10
+ ja NEAR $L$mul_page_walk
+$L$mul_page_walk_done:
+
+ mov QWORD[8+r9*8+rsp],rax
+
+$L$mul_body:
+ mov r12,rdx
+ mov r8,QWORD[r8]
+ mov rbx,QWORD[r12]
+ mov rax,QWORD[rsi]
+
+ xor r14,r14
+ xor r15,r15
+
+ mov rbp,r8
+ mul rbx
+ mov r10,rax
+ mov rax,QWORD[rcx]
+
+ imul rbp,r10
+ mov r11,rdx
+
+ mul rbp
+ add r10,rax
+ mov rax,QWORD[8+rsi]
+ adc rdx,0
+ mov r13,rdx
+
+ lea r15,[1+r15]
+ jmp NEAR $L$1st_enter
+
+ALIGN 16
+$L$1st:
+ add r13,rax
+ mov rax,QWORD[r15*8+rsi]
+ adc rdx,0
+ add r13,r11
+ mov r11,r10
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],r13
+ mov r13,rdx
+
+$L$1st_enter:
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[r15*8+rcx]
+ adc rdx,0
+ lea r15,[1+r15]
+ mov r10,rdx
+
+ mul rbp
+ cmp r15,r9
+ jne NEAR $L$1st
+
+ add r13,rax
+ mov rax,QWORD[rsi]
+ adc rdx,0
+ add r13,r11
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],r13
+ mov r13,rdx
+ mov r11,r10
+
+ xor rdx,rdx
+ add r13,r11
+ adc rdx,0
+ mov QWORD[((-8))+r9*8+rsp],r13
+ mov QWORD[r9*8+rsp],rdx
+
+ lea r14,[1+r14]
+ jmp NEAR $L$outer
+ALIGN 16
+$L$outer:
+ mov rbx,QWORD[r14*8+r12]
+ xor r15,r15
+ mov rbp,r8
+ mov r10,QWORD[rsp]
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[rcx]
+ adc rdx,0
+
+ imul rbp,r10
+ mov r11,rdx
+
+ mul rbp
+ add r10,rax
+ mov rax,QWORD[8+rsi]
+ adc rdx,0
+ mov r10,QWORD[8+rsp]
+ mov r13,rdx
+
+ lea r15,[1+r15]
+ jmp NEAR $L$inner_enter
+
+ALIGN 16
+$L$inner:
+ add r13,rax
+ mov rax,QWORD[r15*8+rsi]
+ adc rdx,0
+ add r13,r10
+ mov r10,QWORD[r15*8+rsp]
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],r13
+ mov r13,rdx
+
+$L$inner_enter:
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[r15*8+rcx]
+ adc rdx,0
+ add r10,r11
+ mov r11,rdx
+ adc r11,0
+ lea r15,[1+r15]
+
+ mul rbp
+ cmp r15,r9
+ jne NEAR $L$inner
+
+ add r13,rax
+ mov rax,QWORD[rsi]
+ adc rdx,0
+ add r13,r10
+ mov r10,QWORD[r15*8+rsp]
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],r13
+ mov r13,rdx
+
+ xor rdx,rdx
+ add r13,r11
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-8))+r9*8+rsp],r13
+ mov QWORD[r9*8+rsp],rdx
+
+ lea r14,[1+r14]
+ cmp r14,r9
+ jb NEAR $L$outer
+
+ xor r14,r14
+ mov rax,QWORD[rsp]
+ mov r15,r9
+
+ALIGN 16
+$L$sub: sbb rax,QWORD[r14*8+rcx]
+ mov QWORD[r14*8+rdi],rax
+ mov rax,QWORD[8+r14*8+rsp]
+ lea r14,[1+r14]
+ dec r15
+ jnz NEAR $L$sub
+
+ sbb rax,0
+ mov rbx,-1
+ xor rbx,rax
+ xor r14,r14
+ mov r15,r9
+
+$L$copy:
+ mov rcx,QWORD[r14*8+rdi]
+ mov rdx,QWORD[r14*8+rsp]
+ and rcx,rbx
+ and rdx,rax
+ mov QWORD[r14*8+rsp],r9
+ or rdx,rcx
+ mov QWORD[r14*8+rdi],rdx
+ lea r14,[1+r14]
+ sub r15,1
+ jnz NEAR $L$copy
+
+ mov rsi,QWORD[8+r9*8+rsp]
+
+ mov rax,1
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$mul_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_mul_mont:
+
+ALIGN 16
+bn_mul4x_mont:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_mul4x_mont:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov r9d,r9d
+ mov rax,rsp
+
+$L$mul4x_enter:
+ and r11d,0x80100
+ cmp r11d,0x80100
+ je NEAR $L$mulx4x_enter
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ neg r9
+ mov r11,rsp
+ lea r10,[((-32))+r9*8+rsp]
+ neg r9
+ and r10,-1024
+
+ sub r11,r10
+ and r11,-4096
+ lea rsp,[r11*1+r10]
+ mov r11,QWORD[rsp]
+ cmp rsp,r10
+ ja NEAR $L$mul4x_page_walk
+ jmp NEAR $L$mul4x_page_walk_done
+
+$L$mul4x_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r11,QWORD[rsp]
+ cmp rsp,r10
+ ja NEAR $L$mul4x_page_walk
+$L$mul4x_page_walk_done:
+
+ mov QWORD[8+r9*8+rsp],rax
+
+$L$mul4x_body:
+ mov QWORD[16+r9*8+rsp],rdi
+ mov r12,rdx
+ mov r8,QWORD[r8]
+ mov rbx,QWORD[r12]
+ mov rax,QWORD[rsi]
+
+ xor r14,r14
+ xor r15,r15
+
+ mov rbp,r8
+ mul rbx
+ mov r10,rax
+ mov rax,QWORD[rcx]
+
+ imul rbp,r10
+ mov r11,rdx
+
+ mul rbp
+ add r10,rax
+ mov rax,QWORD[8+rsi]
+ adc rdx,0
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[8+rcx]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[16+rsi]
+ adc rdx,0
+ add rdi,r11
+ lea r15,[4+r15]
+ adc rdx,0
+ mov QWORD[rsp],rdi
+ mov r13,rdx
+ jmp NEAR $L$1st4x
+ALIGN 16
+$L$1st4x:
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[((-16))+r15*8+rcx]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[((-8))+r15*8+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-24))+r15*8+rsp],r13
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[((-8))+r15*8+rcx]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[r15*8+rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],rdi
+ mov r13,rdx
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[r15*8+rcx]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[8+r15*8+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-8))+r15*8+rsp],r13
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[8+r15*8+rcx]
+ adc rdx,0
+ lea r15,[4+r15]
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[((-16))+r15*8+rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-32))+r15*8+rsp],rdi
+ mov r13,rdx
+ cmp r15,r9
+ jb NEAR $L$1st4x
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[((-16))+r15*8+rcx]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[((-8))+r15*8+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-24))+r15*8+rsp],r13
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[((-8))+r15*8+rcx]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],rdi
+ mov r13,rdx
+
+ xor rdi,rdi
+ add r13,r10
+ adc rdi,0
+ mov QWORD[((-8))+r15*8+rsp],r13
+ mov QWORD[r15*8+rsp],rdi
+
+ lea r14,[1+r14]
+ALIGN 4
+$L$outer4x:
+ mov rbx,QWORD[r14*8+r12]
+ xor r15,r15
+ mov r10,QWORD[rsp]
+ mov rbp,r8
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[rcx]
+ adc rdx,0
+
+ imul rbp,r10
+ mov r11,rdx
+
+ mul rbp
+ add r10,rax
+ mov rax,QWORD[8+rsi]
+ adc rdx,0
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[8+rcx]
+ adc rdx,0
+ add r11,QWORD[8+rsp]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[16+rsi]
+ adc rdx,0
+ add rdi,r11
+ lea r15,[4+r15]
+ adc rdx,0
+ mov QWORD[rsp],rdi
+ mov r13,rdx
+ jmp NEAR $L$inner4x
+ALIGN 16
+$L$inner4x:
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[((-16))+r15*8+rcx]
+ adc rdx,0
+ add r10,QWORD[((-16))+r15*8+rsp]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[((-8))+r15*8+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-24))+r15*8+rsp],r13
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[((-8))+r15*8+rcx]
+ adc rdx,0
+ add r11,QWORD[((-8))+r15*8+rsp]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[r15*8+rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],rdi
+ mov r13,rdx
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[r15*8+rcx]
+ adc rdx,0
+ add r10,QWORD[r15*8+rsp]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[8+r15*8+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-8))+r15*8+rsp],r13
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[8+r15*8+rcx]
+ adc rdx,0
+ add r11,QWORD[8+r15*8+rsp]
+ adc rdx,0
+ lea r15,[4+r15]
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[((-16))+r15*8+rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-32))+r15*8+rsp],rdi
+ mov r13,rdx
+ cmp r15,r9
+ jb NEAR $L$inner4x
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[((-16))+r15*8+rcx]
+ adc rdx,0
+ add r10,QWORD[((-16))+r15*8+rsp]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[((-8))+r15*8+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-24))+r15*8+rsp],r13
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[((-8))+r15*8+rcx]
+ adc rdx,0
+ add r11,QWORD[((-8))+r15*8+rsp]
+ adc rdx,0
+ lea r14,[1+r14]
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],rdi
+ mov r13,rdx
+
+ xor rdi,rdi
+ add r13,r10
+ adc rdi,0
+ add r13,QWORD[r9*8+rsp]
+ adc rdi,0
+ mov QWORD[((-8))+r15*8+rsp],r13
+ mov QWORD[r15*8+rsp],rdi
+
+ cmp r14,r9
+ jb NEAR $L$outer4x
+ mov rdi,QWORD[16+r9*8+rsp]
+ lea r15,[((-4))+r9]
+ mov rax,QWORD[rsp]
+ mov rdx,QWORD[8+rsp]
+ shr r15,2
+ lea rsi,[rsp]
+ xor r14,r14
+
+ sub rax,QWORD[rcx]
+ mov rbx,QWORD[16+rsi]
+ mov rbp,QWORD[24+rsi]
+ sbb rdx,QWORD[8+rcx]
+
+$L$sub4x:
+ mov QWORD[r14*8+rdi],rax
+ mov QWORD[8+r14*8+rdi],rdx
+ sbb rbx,QWORD[16+r14*8+rcx]
+ mov rax,QWORD[32+r14*8+rsi]
+ mov rdx,QWORD[40+r14*8+rsi]
+ sbb rbp,QWORD[24+r14*8+rcx]
+ mov QWORD[16+r14*8+rdi],rbx
+ mov QWORD[24+r14*8+rdi],rbp
+ sbb rax,QWORD[32+r14*8+rcx]
+ mov rbx,QWORD[48+r14*8+rsi]
+ mov rbp,QWORD[56+r14*8+rsi]
+ sbb rdx,QWORD[40+r14*8+rcx]
+ lea r14,[4+r14]
+ dec r15
+ jnz NEAR $L$sub4x
+
+ mov QWORD[r14*8+rdi],rax
+ mov rax,QWORD[32+r14*8+rsi]
+ sbb rbx,QWORD[16+r14*8+rcx]
+ mov QWORD[8+r14*8+rdi],rdx
+ sbb rbp,QWORD[24+r14*8+rcx]
+ mov QWORD[16+r14*8+rdi],rbx
+
+ sbb rax,0
+ mov QWORD[24+r14*8+rdi],rbp
+ pxor xmm0,xmm0
+DB 102,72,15,110,224
+ pcmpeqd xmm5,xmm5
+ pshufd xmm4,xmm4,0
+ mov r15,r9
+ pxor xmm5,xmm4
+ shr r15,2
+ xor eax,eax
+
+ jmp NEAR $L$copy4x
+ALIGN 16
+$L$copy4x:
+ movdqa xmm1,XMMWORD[rax*1+rsp]
+ movdqu xmm2,XMMWORD[rax*1+rdi]
+ pand xmm1,xmm4
+ pand xmm2,xmm5
+ movdqa xmm3,XMMWORD[16+rax*1+rsp]
+ movdqa XMMWORD[rax*1+rsp],xmm0
+ por xmm1,xmm2
+ movdqu xmm2,XMMWORD[16+rax*1+rdi]
+ movdqu XMMWORD[rax*1+rdi],xmm1
+ pand xmm3,xmm4
+ pand xmm2,xmm5
+ movdqa XMMWORD[16+rax*1+rsp],xmm0
+ por xmm3,xmm2
+ movdqu XMMWORD[16+rax*1+rdi],xmm3
+ lea rax,[32+rax]
+ dec r15
+ jnz NEAR $L$copy4x
+ mov rsi,QWORD[8+r9*8+rsp]
+
+ mov rax,1
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$mul4x_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_mul4x_mont:
+EXTERN bn_sqrx8x_internal
+EXTERN bn_sqr8x_internal
+
+
+ALIGN 32
+bn_sqr8x_mont:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_sqr8x_mont:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov rax,rsp
+
+$L$sqr8x_enter:
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+$L$sqr8x_prologue:
+
+ mov r10d,r9d
+ shl r9d,3
+ shl r10,3+2
+ neg r9
+
+
+
+
+
+
+ lea r11,[((-64))+r9*2+rsp]
+ mov rbp,rsp
+ mov r8,QWORD[r8]
+ sub r11,rsi
+ and r11,4095
+ cmp r10,r11
+ jb NEAR $L$sqr8x_sp_alt
+ sub rbp,r11
+ lea rbp,[((-64))+r9*2+rbp]
+ jmp NEAR $L$sqr8x_sp_done
+
+ALIGN 32
+$L$sqr8x_sp_alt:
+ lea r10,[((4096-64))+r9*2]
+ lea rbp,[((-64))+r9*2+rbp]
+ sub r11,r10
+ mov r10,0
+ cmovc r11,r10
+ sub rbp,r11
+$L$sqr8x_sp_done:
+ and rbp,-64
+ mov r11,rsp
+ sub r11,rbp
+ and r11,-4096
+ lea rsp,[rbp*1+r11]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$sqr8x_page_walk
+ jmp NEAR $L$sqr8x_page_walk_done
+
+ALIGN 16
+$L$sqr8x_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$sqr8x_page_walk
+$L$sqr8x_page_walk_done:
+
+ mov r10,r9
+ neg r9
+
+ mov QWORD[32+rsp],r8
+ mov QWORD[40+rsp],rax
+
+$L$sqr8x_body:
+
+DB 102,72,15,110,209
+ pxor xmm0,xmm0
+DB 102,72,15,110,207
+DB 102,73,15,110,218
+ mov eax,DWORD[((OPENSSL_ia32cap_P+8))]
+ and eax,0x80100
+ cmp eax,0x80100
+ jne NEAR $L$sqr8x_nox
+
+ call bn_sqrx8x_internal
+
+
+
+
+ lea rbx,[rcx*1+r8]
+ mov r9,rcx
+ mov rdx,rcx
+DB 102,72,15,126,207
+ sar rcx,3+2
+ jmp NEAR $L$sqr8x_sub
+
+ALIGN 32
+$L$sqr8x_nox:
+ call bn_sqr8x_internal
+
+
+
+
+ lea rbx,[r9*1+rdi]
+ mov rcx,r9
+ mov rdx,r9
+DB 102,72,15,126,207
+ sar rcx,3+2
+ jmp NEAR $L$sqr8x_sub
+
+ALIGN 32
+$L$sqr8x_sub:
+ mov r12,QWORD[rbx]
+ mov r13,QWORD[8+rbx]
+ mov r14,QWORD[16+rbx]
+ mov r15,QWORD[24+rbx]
+ lea rbx,[32+rbx]
+ sbb r12,QWORD[rbp]
+ sbb r13,QWORD[8+rbp]
+ sbb r14,QWORD[16+rbp]
+ sbb r15,QWORD[24+rbp]
+ lea rbp,[32+rbp]
+ mov QWORD[rdi],r12
+ mov QWORD[8+rdi],r13
+ mov QWORD[16+rdi],r14
+ mov QWORD[24+rdi],r15
+ lea rdi,[32+rdi]
+ inc rcx
+ jnz NEAR $L$sqr8x_sub
+
+ sbb rax,0
+ lea rbx,[r9*1+rbx]
+ lea rdi,[r9*1+rdi]
+
+DB 102,72,15,110,200
+ pxor xmm0,xmm0
+ pshufd xmm1,xmm1,0
+ mov rsi,QWORD[40+rsp]
+
+ jmp NEAR $L$sqr8x_cond_copy
+
+ALIGN 32
+$L$sqr8x_cond_copy:
+ movdqa xmm2,XMMWORD[rbx]
+ movdqa xmm3,XMMWORD[16+rbx]
+ lea rbx,[32+rbx]
+ movdqu xmm4,XMMWORD[rdi]
+ movdqu xmm5,XMMWORD[16+rdi]
+ lea rdi,[32+rdi]
+ movdqa XMMWORD[(-32)+rbx],xmm0
+ movdqa XMMWORD[(-16)+rbx],xmm0
+ movdqa XMMWORD[(-32)+rdx*1+rbx],xmm0
+ movdqa XMMWORD[(-16)+rdx*1+rbx],xmm0
+ pcmpeqd xmm0,xmm1
+ pand xmm2,xmm1
+ pand xmm3,xmm1
+ pand xmm4,xmm0
+ pand xmm5,xmm0
+ pxor xmm0,xmm0
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqu XMMWORD[(-32)+rdi],xmm4
+ movdqu XMMWORD[(-16)+rdi],xmm5
+ add r9,32
+ jnz NEAR $L$sqr8x_cond_copy
+
+ mov rax,1
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$sqr8x_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_sqr8x_mont:
+
+ALIGN 32
+bn_mulx4x_mont:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_mulx4x_mont:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov rax,rsp
+
+$L$mulx4x_enter:
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+$L$mulx4x_prologue:
+
+ shl r9d,3
+ xor r10,r10
+ sub r10,r9
+ mov r8,QWORD[r8]
+ lea rbp,[((-72))+r10*1+rsp]
+ and rbp,-128
+ mov r11,rsp
+ sub r11,rbp
+ and r11,-4096
+ lea rsp,[rbp*1+r11]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$mulx4x_page_walk
+ jmp NEAR $L$mulx4x_page_walk_done
+
+ALIGN 16
+$L$mulx4x_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$mulx4x_page_walk
+$L$mulx4x_page_walk_done:
+
+ lea r10,[r9*1+rdx]
+
+
+
+
+
+
+
+
+
+
+
+
+ mov QWORD[rsp],r9
+ shr r9,5
+ mov QWORD[16+rsp],r10
+ sub r9,1
+ mov QWORD[24+rsp],r8
+ mov QWORD[32+rsp],rdi
+ mov QWORD[40+rsp],rax
+
+ mov QWORD[48+rsp],r9
+ jmp NEAR $L$mulx4x_body
+
+ALIGN 32
+$L$mulx4x_body:
+ lea rdi,[8+rdx]
+ mov rdx,QWORD[rdx]
+ lea rbx,[((64+32))+rsp]
+ mov r9,rdx
+
+ mulx rax,r8,QWORD[rsi]
+ mulx r14,r11,QWORD[8+rsi]
+ add r11,rax
+ mov QWORD[8+rsp],rdi
+ mulx r13,r12,QWORD[16+rsi]
+ adc r12,r14
+ adc r13,0
+
+ mov rdi,r8
+ imul r8,QWORD[24+rsp]
+ xor rbp,rbp
+
+ mulx r14,rax,QWORD[24+rsi]
+ mov rdx,r8
+ lea rsi,[32+rsi]
+ adcx r13,rax
+ adcx r14,rbp
+
+ mulx r10,rax,QWORD[rcx]
+ adcx rdi,rax
+ adox r10,r11
+ mulx r11,rax,QWORD[8+rcx]
+ adcx r10,rax
+ adox r11,r12
+DB 0xc4,0x62,0xfb,0xf6,0xa1,0x10,0x00,0x00,0x00
+ mov rdi,QWORD[48+rsp]
+ mov QWORD[((-32))+rbx],r10
+ adcx r11,rax
+ adox r12,r13
+ mulx r15,rax,QWORD[24+rcx]
+ mov rdx,r9
+ mov QWORD[((-24))+rbx],r11
+ adcx r12,rax
+ adox r15,rbp
+ lea rcx,[32+rcx]
+ mov QWORD[((-16))+rbx],r12
+
+ jmp NEAR $L$mulx4x_1st
+
+ALIGN 32
+$L$mulx4x_1st:
+ adcx r15,rbp
+ mulx rax,r10,QWORD[rsi]
+ adcx r10,r14
+ mulx r14,r11,QWORD[8+rsi]
+ adcx r11,rax
+ mulx rax,r12,QWORD[16+rsi]
+ adcx r12,r14
+ mulx r14,r13,QWORD[24+rsi]
+DB 0x67,0x67
+ mov rdx,r8
+ adcx r13,rax
+ adcx r14,rbp
+ lea rsi,[32+rsi]
+ lea rbx,[32+rbx]
+
+ adox r10,r15
+ mulx r15,rax,QWORD[rcx]
+ adcx r10,rax
+ adox r11,r15
+ mulx r15,rax,QWORD[8+rcx]
+ adcx r11,rax
+ adox r12,r15
+ mulx r15,rax,QWORD[16+rcx]
+ mov QWORD[((-40))+rbx],r10
+ adcx r12,rax
+ mov QWORD[((-32))+rbx],r11
+ adox r13,r15
+ mulx r15,rax,QWORD[24+rcx]
+ mov rdx,r9
+ mov QWORD[((-24))+rbx],r12
+ adcx r13,rax
+ adox r15,rbp
+ lea rcx,[32+rcx]
+ mov QWORD[((-16))+rbx],r13
+
+ dec rdi
+ jnz NEAR $L$mulx4x_1st
+
+ mov rax,QWORD[rsp]
+ mov rdi,QWORD[8+rsp]
+ adc r15,rbp
+ add r14,r15
+ sbb r15,r15
+ mov QWORD[((-8))+rbx],r14
+ jmp NEAR $L$mulx4x_outer
+
+ALIGN 32
+$L$mulx4x_outer:
+ mov rdx,QWORD[rdi]
+ lea rdi,[8+rdi]
+ sub rsi,rax
+ mov QWORD[rbx],r15
+ lea rbx,[((64+32))+rsp]
+ sub rcx,rax
+
+ mulx r11,r8,QWORD[rsi]
+ xor ebp,ebp
+ mov r9,rdx
+ mulx r12,r14,QWORD[8+rsi]
+ adox r8,QWORD[((-32))+rbx]
+ adcx r11,r14
+ mulx r13,r15,QWORD[16+rsi]
+ adox r11,QWORD[((-24))+rbx]
+ adcx r12,r15
+ adox r12,QWORD[((-16))+rbx]
+ adcx r13,rbp
+ adox r13,rbp
+
+ mov QWORD[8+rsp],rdi
+ mov r15,r8
+ imul r8,QWORD[24+rsp]
+ xor ebp,ebp
+
+ mulx r14,rax,QWORD[24+rsi]
+ mov rdx,r8
+ adcx r13,rax
+ adox r13,QWORD[((-8))+rbx]
+ adcx r14,rbp
+ lea rsi,[32+rsi]
+ adox r14,rbp
+
+ mulx r10,rax,QWORD[rcx]
+ adcx r15,rax
+ adox r10,r11
+ mulx r11,rax,QWORD[8+rcx]
+ adcx r10,rax
+ adox r11,r12
+ mulx r12,rax,QWORD[16+rcx]
+ mov QWORD[((-32))+rbx],r10
+ adcx r11,rax
+ adox r12,r13
+ mulx r15,rax,QWORD[24+rcx]
+ mov rdx,r9
+ mov QWORD[((-24))+rbx],r11
+ lea rcx,[32+rcx]
+ adcx r12,rax
+ adox r15,rbp
+ mov rdi,QWORD[48+rsp]
+ mov QWORD[((-16))+rbx],r12
+
+ jmp NEAR $L$mulx4x_inner
+
+ALIGN 32
+$L$mulx4x_inner:
+ mulx rax,r10,QWORD[rsi]
+ adcx r15,rbp
+ adox r10,r14
+ mulx r14,r11,QWORD[8+rsi]
+ adcx r10,QWORD[rbx]
+ adox r11,rax
+ mulx rax,r12,QWORD[16+rsi]
+ adcx r11,QWORD[8+rbx]
+ adox r12,r14
+ mulx r14,r13,QWORD[24+rsi]
+ mov rdx,r8
+ adcx r12,QWORD[16+rbx]
+ adox r13,rax
+ adcx r13,QWORD[24+rbx]
+ adox r14,rbp
+ lea rsi,[32+rsi]
+ lea rbx,[32+rbx]
+ adcx r14,rbp
+
+ adox r10,r15
+ mulx r15,rax,QWORD[rcx]
+ adcx r10,rax
+ adox r11,r15
+ mulx r15,rax,QWORD[8+rcx]
+ adcx r11,rax
+ adox r12,r15
+ mulx r15,rax,QWORD[16+rcx]
+ mov QWORD[((-40))+rbx],r10
+ adcx r12,rax
+ adox r13,r15
+ mulx r15,rax,QWORD[24+rcx]
+ mov rdx,r9
+ mov QWORD[((-32))+rbx],r11
+ mov QWORD[((-24))+rbx],r12
+ adcx r13,rax
+ adox r15,rbp
+ lea rcx,[32+rcx]
+ mov QWORD[((-16))+rbx],r13
+
+ dec rdi
+ jnz NEAR $L$mulx4x_inner
+
+ mov rax,QWORD[rsp]
+ mov rdi,QWORD[8+rsp]
+ adc r15,rbp
+ sub rbp,QWORD[rbx]
+ adc r14,r15
+ sbb r15,r15
+ mov QWORD[((-8))+rbx],r14
+
+ cmp rdi,QWORD[16+rsp]
+ jne NEAR $L$mulx4x_outer
+
+ lea rbx,[64+rsp]
+ sub rcx,rax
+ neg r15
+ mov rdx,rax
+ shr rax,3+2
+ mov rdi,QWORD[32+rsp]
+ jmp NEAR $L$mulx4x_sub
+
+ALIGN 32
+$L$mulx4x_sub:
+ mov r11,QWORD[rbx]
+ mov r12,QWORD[8+rbx]
+ mov r13,QWORD[16+rbx]
+ mov r14,QWORD[24+rbx]
+ lea rbx,[32+rbx]
+ sbb r11,QWORD[rcx]
+ sbb r12,QWORD[8+rcx]
+ sbb r13,QWORD[16+rcx]
+ sbb r14,QWORD[24+rcx]
+ lea rcx,[32+rcx]
+ mov QWORD[rdi],r11
+ mov QWORD[8+rdi],r12
+ mov QWORD[16+rdi],r13
+ mov QWORD[24+rdi],r14
+ lea rdi,[32+rdi]
+ dec rax
+ jnz NEAR $L$mulx4x_sub
+
+ sbb r15,0
+ lea rbx,[64+rsp]
+ sub rdi,rdx
+
+DB 102,73,15,110,207
+ pxor xmm0,xmm0
+ pshufd xmm1,xmm1,0
+ mov rsi,QWORD[40+rsp]
+
+ jmp NEAR $L$mulx4x_cond_copy
+
+ALIGN 32
+$L$mulx4x_cond_copy:
+ movdqa xmm2,XMMWORD[rbx]
+ movdqa xmm3,XMMWORD[16+rbx]
+ lea rbx,[32+rbx]
+ movdqu xmm4,XMMWORD[rdi]
+ movdqu xmm5,XMMWORD[16+rdi]
+ lea rdi,[32+rdi]
+ movdqa XMMWORD[(-32)+rbx],xmm0
+ movdqa XMMWORD[(-16)+rbx],xmm0
+ pcmpeqd xmm0,xmm1
+ pand xmm2,xmm1
+ pand xmm3,xmm1
+ pand xmm4,xmm0
+ pand xmm5,xmm0
+ pxor xmm0,xmm0
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqu XMMWORD[(-32)+rdi],xmm4
+ movdqu XMMWORD[(-16)+rdi],xmm5
+ sub rdx,32
+ jnz NEAR $L$mulx4x_cond_copy
+
+ mov QWORD[rbx],rdx
+
+ mov rax,1
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$mulx4x_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_mulx4x_mont:
+DB 77,111,110,116,103,111,109,101,114,121,32,77,117,108,116,105
+DB 112,108,105,99,97,116,105,111,110,32,102,111,114,32,120,56
+DB 54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83
+DB 32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115
+DB 115,108,46,111,114,103,62,0
+ALIGN 16
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+mul_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ mov r10,QWORD[192+r8]
+ mov rax,QWORD[8+r10*8+rax]
+
+ jmp NEAR $L$common_pop_regs
+
+
+
+ALIGN 16
+sqr_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_pop_regs
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[8+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ mov rax,QWORD[40+rax]
+
+$L$common_pop_regs:
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+$L$common_seh_tail:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_bn_mul_mont wrt ..imagebase
+ DD $L$SEH_end_bn_mul_mont wrt ..imagebase
+ DD $L$SEH_info_bn_mul_mont wrt ..imagebase
+
+ DD $L$SEH_begin_bn_mul4x_mont wrt ..imagebase
+ DD $L$SEH_end_bn_mul4x_mont wrt ..imagebase
+ DD $L$SEH_info_bn_mul4x_mont wrt ..imagebase
+
+ DD $L$SEH_begin_bn_sqr8x_mont wrt ..imagebase
+ DD $L$SEH_end_bn_sqr8x_mont wrt ..imagebase
+ DD $L$SEH_info_bn_sqr8x_mont wrt ..imagebase
+ DD $L$SEH_begin_bn_mulx4x_mont wrt ..imagebase
+ DD $L$SEH_end_bn_mulx4x_mont wrt ..imagebase
+ DD $L$SEH_info_bn_mulx4x_mont wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_bn_mul_mont:
+DB 9,0,0,0
+ DD mul_handler wrt ..imagebase
+ DD $L$mul_body wrt ..imagebase,$L$mul_epilogue wrt ..imagebase
+$L$SEH_info_bn_mul4x_mont:
+DB 9,0,0,0
+ DD mul_handler wrt ..imagebase
+ DD $L$mul4x_body wrt ..imagebase,$L$mul4x_epilogue wrt ..imagebase
+$L$SEH_info_bn_sqr8x_mont:
+DB 9,0,0,0
+ DD sqr_handler wrt ..imagebase
+ DD $L$sqr8x_prologue wrt ..imagebase,$L$sqr8x_body wrt ..imagebase,$L$sqr8x_epilogue wrt ..imagebase
+ALIGN 8
+$L$SEH_info_bn_mulx4x_mont:
+DB 9,0,0,0
+ DD sqr_handler wrt ..imagebase
+ DD $L$mulx4x_prologue wrt ..imagebase,$L$mulx4x_body wrt ..imagebase,$L$mulx4x_epilogue wrt ..imagebase
+ALIGN 8
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm
new file mode 100644
index 0000000000..f256a94476
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm
@@ -0,0 +1,4033 @@
+; Copyright 2011-2019 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN OPENSSL_ia32cap_P
+
+global bn_mul_mont_gather5
+
+ALIGN 64
+bn_mul_mont_gather5:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_mul_mont_gather5:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov r9d,r9d
+ mov rax,rsp
+
+ test r9d,7
+ jnz NEAR $L$mul_enter
+ mov r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+ jmp NEAR $L$mul4x_enter
+
+ALIGN 16
+$L$mul_enter:
+ movd xmm5,DWORD[56+rsp]
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ neg r9
+ mov r11,rsp
+ lea r10,[((-280))+r9*8+rsp]
+ neg r9
+ and r10,-1024
+
+
+
+
+
+
+
+
+
+ sub r11,r10
+ and r11,-4096
+ lea rsp,[r11*1+r10]
+ mov r11,QWORD[rsp]
+ cmp rsp,r10
+ ja NEAR $L$mul_page_walk
+ jmp NEAR $L$mul_page_walk_done
+
+$L$mul_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r11,QWORD[rsp]
+ cmp rsp,r10
+ ja NEAR $L$mul_page_walk
+$L$mul_page_walk_done:
+
+ lea r10,[$L$inc]
+ mov QWORD[8+r9*8+rsp],rax
+
+$L$mul_body:
+
+ lea r12,[128+rdx]
+ movdqa xmm0,XMMWORD[r10]
+ movdqa xmm1,XMMWORD[16+r10]
+ lea r10,[((24-112))+r9*8+rsp]
+ and r10,-16
+
+ pshufd xmm5,xmm5,0
+ movdqa xmm4,xmm1
+ movdqa xmm2,xmm1
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+DB 0x67
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[112+r10],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[128+r10],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[144+r10],xmm2
+ movdqa xmm2,xmm4
+
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[160+r10],xmm3
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[176+r10],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[192+r10],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[208+r10],xmm2
+ movdqa xmm2,xmm4
+
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[224+r10],xmm3
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[240+r10],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[256+r10],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[272+r10],xmm2
+ movdqa xmm2,xmm4
+
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[288+r10],xmm3
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[304+r10],xmm0
+
+ paddd xmm3,xmm2
+DB 0x67
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[320+r10],xmm1
+
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[336+r10],xmm2
+ pand xmm0,XMMWORD[64+r12]
+
+ pand xmm1,XMMWORD[80+r12]
+ pand xmm2,XMMWORD[96+r12]
+ movdqa XMMWORD[352+r10],xmm3
+ pand xmm3,XMMWORD[112+r12]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ movdqa xmm4,XMMWORD[((-128))+r12]
+ movdqa xmm5,XMMWORD[((-112))+r12]
+ movdqa xmm2,XMMWORD[((-96))+r12]
+ pand xmm4,XMMWORD[112+r10]
+ movdqa xmm3,XMMWORD[((-80))+r12]
+ pand xmm5,XMMWORD[128+r10]
+ por xmm0,xmm4
+ pand xmm2,XMMWORD[144+r10]
+ por xmm1,xmm5
+ pand xmm3,XMMWORD[160+r10]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ movdqa xmm4,XMMWORD[((-64))+r12]
+ movdqa xmm5,XMMWORD[((-48))+r12]
+ movdqa xmm2,XMMWORD[((-32))+r12]
+ pand xmm4,XMMWORD[176+r10]
+ movdqa xmm3,XMMWORD[((-16))+r12]
+ pand xmm5,XMMWORD[192+r10]
+ por xmm0,xmm4
+ pand xmm2,XMMWORD[208+r10]
+ por xmm1,xmm5
+ pand xmm3,XMMWORD[224+r10]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ movdqa xmm4,XMMWORD[r12]
+ movdqa xmm5,XMMWORD[16+r12]
+ movdqa xmm2,XMMWORD[32+r12]
+ pand xmm4,XMMWORD[240+r10]
+ movdqa xmm3,XMMWORD[48+r12]
+ pand xmm5,XMMWORD[256+r10]
+ por xmm0,xmm4
+ pand xmm2,XMMWORD[272+r10]
+ por xmm1,xmm5
+ pand xmm3,XMMWORD[288+r10]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ por xmm0,xmm1
+ pshufd xmm1,xmm0,0x4e
+ por xmm0,xmm1
+ lea r12,[256+r12]
+DB 102,72,15,126,195
+
+ mov r8,QWORD[r8]
+ mov rax,QWORD[rsi]
+
+ xor r14,r14
+ xor r15,r15
+
+ mov rbp,r8
+ mul rbx
+ mov r10,rax
+ mov rax,QWORD[rcx]
+
+ imul rbp,r10
+ mov r11,rdx
+
+ mul rbp
+ add r10,rax
+ mov rax,QWORD[8+rsi]
+ adc rdx,0
+ mov r13,rdx
+
+ lea r15,[1+r15]
+ jmp NEAR $L$1st_enter
+
+ALIGN 16
+$L$1st:
+ add r13,rax
+ mov rax,QWORD[r15*8+rsi]
+ adc rdx,0
+ add r13,r11
+ mov r11,r10
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],r13
+ mov r13,rdx
+
+$L$1st_enter:
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[r15*8+rcx]
+ adc rdx,0
+ lea r15,[1+r15]
+ mov r10,rdx
+
+ mul rbp
+ cmp r15,r9
+ jne NEAR $L$1st
+
+
+ add r13,rax
+ adc rdx,0
+ add r13,r11
+ adc rdx,0
+ mov QWORD[((-16))+r9*8+rsp],r13
+ mov r13,rdx
+ mov r11,r10
+
+ xor rdx,rdx
+ add r13,r11
+ adc rdx,0
+ mov QWORD[((-8))+r9*8+rsp],r13
+ mov QWORD[r9*8+rsp],rdx
+
+ lea r14,[1+r14]
+ jmp NEAR $L$outer
+ALIGN 16
+$L$outer:
+ lea rdx,[((24+128))+r9*8+rsp]
+ and rdx,-16
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ movdqa xmm0,XMMWORD[((-128))+r12]
+ movdqa xmm1,XMMWORD[((-112))+r12]
+ movdqa xmm2,XMMWORD[((-96))+r12]
+ movdqa xmm3,XMMWORD[((-80))+r12]
+ pand xmm0,XMMWORD[((-128))+rdx]
+ pand xmm1,XMMWORD[((-112))+rdx]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[((-96))+rdx]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[((-80))+rdx]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[((-64))+r12]
+ movdqa xmm1,XMMWORD[((-48))+r12]
+ movdqa xmm2,XMMWORD[((-32))+r12]
+ movdqa xmm3,XMMWORD[((-16))+r12]
+ pand xmm0,XMMWORD[((-64))+rdx]
+ pand xmm1,XMMWORD[((-48))+rdx]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[((-32))+rdx]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[((-16))+rdx]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[r12]
+ movdqa xmm1,XMMWORD[16+r12]
+ movdqa xmm2,XMMWORD[32+r12]
+ movdqa xmm3,XMMWORD[48+r12]
+ pand xmm0,XMMWORD[rdx]
+ pand xmm1,XMMWORD[16+rdx]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[32+rdx]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[48+rdx]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[64+r12]
+ movdqa xmm1,XMMWORD[80+r12]
+ movdqa xmm2,XMMWORD[96+r12]
+ movdqa xmm3,XMMWORD[112+r12]
+ pand xmm0,XMMWORD[64+rdx]
+ pand xmm1,XMMWORD[80+rdx]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[96+rdx]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[112+rdx]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ por xmm4,xmm5
+ pshufd xmm0,xmm4,0x4e
+ por xmm0,xmm4
+ lea r12,[256+r12]
+
+ mov rax,QWORD[rsi]
+DB 102,72,15,126,195
+
+ xor r15,r15
+ mov rbp,r8
+ mov r10,QWORD[rsp]
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[rcx]
+ adc rdx,0
+
+ imul rbp,r10
+ mov r11,rdx
+
+ mul rbp
+ add r10,rax
+ mov rax,QWORD[8+rsi]
+ adc rdx,0
+ mov r10,QWORD[8+rsp]
+ mov r13,rdx
+
+ lea r15,[1+r15]
+ jmp NEAR $L$inner_enter
+
+ALIGN 16
+$L$inner:
+ add r13,rax
+ mov rax,QWORD[r15*8+rsi]
+ adc rdx,0
+ add r13,r10
+ mov r10,QWORD[r15*8+rsp]
+ adc rdx,0
+ mov QWORD[((-16))+r15*8+rsp],r13
+ mov r13,rdx
+
+$L$inner_enter:
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[r15*8+rcx]
+ adc rdx,0
+ add r10,r11
+ mov r11,rdx
+ adc r11,0
+ lea r15,[1+r15]
+
+ mul rbp
+ cmp r15,r9
+ jne NEAR $L$inner
+
+ add r13,rax
+ adc rdx,0
+ add r13,r10
+ mov r10,QWORD[r9*8+rsp]
+ adc rdx,0
+ mov QWORD[((-16))+r9*8+rsp],r13
+ mov r13,rdx
+
+ xor rdx,rdx
+ add r13,r11
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-8))+r9*8+rsp],r13
+ mov QWORD[r9*8+rsp],rdx
+
+ lea r14,[1+r14]
+ cmp r14,r9
+ jb NEAR $L$outer
+
+ xor r14,r14
+ mov rax,QWORD[rsp]
+ lea rsi,[rsp]
+ mov r15,r9
+ jmp NEAR $L$sub
+ALIGN 16
+$L$sub: sbb rax,QWORD[r14*8+rcx]
+ mov QWORD[r14*8+rdi],rax
+ mov rax,QWORD[8+r14*8+rsi]
+ lea r14,[1+r14]
+ dec r15
+ jnz NEAR $L$sub
+
+ sbb rax,0
+ mov rbx,-1
+ xor rbx,rax
+ xor r14,r14
+ mov r15,r9
+
+$L$copy:
+ mov rcx,QWORD[r14*8+rdi]
+ mov rdx,QWORD[r14*8+rsp]
+ and rcx,rbx
+ and rdx,rax
+ mov QWORD[r14*8+rsp],r14
+ or rdx,rcx
+ mov QWORD[r14*8+rdi],rdx
+ lea r14,[1+r14]
+ sub r15,1
+ jnz NEAR $L$copy
+
+ mov rsi,QWORD[8+r9*8+rsp]
+
+ mov rax,1
+
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$mul_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_mul_mont_gather5:
+
+ALIGN 32
+bn_mul4x_mont_gather5:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_mul4x_mont_gather5:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+DB 0x67
+ mov rax,rsp
+
+$L$mul4x_enter:
+ and r11d,0x80108
+ cmp r11d,0x80108
+ je NEAR $L$mulx4x_enter
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+$L$mul4x_prologue:
+
+DB 0x67
+ shl r9d,3
+ lea r10,[r9*2+r9]
+ neg r9
+
+
+
+
+
+
+
+
+
+
+ lea r11,[((-320))+r9*2+rsp]
+ mov rbp,rsp
+ sub r11,rdi
+ and r11,4095
+ cmp r10,r11
+ jb NEAR $L$mul4xsp_alt
+ sub rbp,r11
+ lea rbp,[((-320))+r9*2+rbp]
+ jmp NEAR $L$mul4xsp_done
+
+ALIGN 32
+$L$mul4xsp_alt:
+ lea r10,[((4096-320))+r9*2]
+ lea rbp,[((-320))+r9*2+rbp]
+ sub r11,r10
+ mov r10,0
+ cmovc r11,r10
+ sub rbp,r11
+$L$mul4xsp_done:
+ and rbp,-64
+ mov r11,rsp
+ sub r11,rbp
+ and r11,-4096
+ lea rsp,[rbp*1+r11]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$mul4x_page_walk
+ jmp NEAR $L$mul4x_page_walk_done
+
+$L$mul4x_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$mul4x_page_walk
+$L$mul4x_page_walk_done:
+
+ neg r9
+
+ mov QWORD[40+rsp],rax
+
+$L$mul4x_body:
+
+ call mul4x_internal
+
+ mov rsi,QWORD[40+rsp]
+
+ mov rax,1
+
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$mul4x_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_mul4x_mont_gather5:
+
+
+ALIGN 32
+mul4x_internal:
+ shl r9,5
+ movd xmm5,DWORD[56+rax]
+ lea rax,[$L$inc]
+ lea r13,[128+r9*1+rdx]
+ shr r9,5
+ movdqa xmm0,XMMWORD[rax]
+ movdqa xmm1,XMMWORD[16+rax]
+ lea r10,[((88-112))+r9*1+rsp]
+ lea r12,[128+rdx]
+
+ pshufd xmm5,xmm5,0
+ movdqa xmm4,xmm1
+DB 0x67,0x67
+ movdqa xmm2,xmm1
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+DB 0x67
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[112+r10],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[128+r10],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[144+r10],xmm2
+ movdqa xmm2,xmm4
+
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[160+r10],xmm3
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[176+r10],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[192+r10],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[208+r10],xmm2
+ movdqa xmm2,xmm4
+
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[224+r10],xmm3
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[240+r10],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[256+r10],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[272+r10],xmm2
+ movdqa xmm2,xmm4
+
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[288+r10],xmm3
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[304+r10],xmm0
+
+ paddd xmm3,xmm2
+DB 0x67
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[320+r10],xmm1
+
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[336+r10],xmm2
+ pand xmm0,XMMWORD[64+r12]
+
+ pand xmm1,XMMWORD[80+r12]
+ pand xmm2,XMMWORD[96+r12]
+ movdqa XMMWORD[352+r10],xmm3
+ pand xmm3,XMMWORD[112+r12]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ movdqa xmm4,XMMWORD[((-128))+r12]
+ movdqa xmm5,XMMWORD[((-112))+r12]
+ movdqa xmm2,XMMWORD[((-96))+r12]
+ pand xmm4,XMMWORD[112+r10]
+ movdqa xmm3,XMMWORD[((-80))+r12]
+ pand xmm5,XMMWORD[128+r10]
+ por xmm0,xmm4
+ pand xmm2,XMMWORD[144+r10]
+ por xmm1,xmm5
+ pand xmm3,XMMWORD[160+r10]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ movdqa xmm4,XMMWORD[((-64))+r12]
+ movdqa xmm5,XMMWORD[((-48))+r12]
+ movdqa xmm2,XMMWORD[((-32))+r12]
+ pand xmm4,XMMWORD[176+r10]
+ movdqa xmm3,XMMWORD[((-16))+r12]
+ pand xmm5,XMMWORD[192+r10]
+ por xmm0,xmm4
+ pand xmm2,XMMWORD[208+r10]
+ por xmm1,xmm5
+ pand xmm3,XMMWORD[224+r10]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ movdqa xmm4,XMMWORD[r12]
+ movdqa xmm5,XMMWORD[16+r12]
+ movdqa xmm2,XMMWORD[32+r12]
+ pand xmm4,XMMWORD[240+r10]
+ movdqa xmm3,XMMWORD[48+r12]
+ pand xmm5,XMMWORD[256+r10]
+ por xmm0,xmm4
+ pand xmm2,XMMWORD[272+r10]
+ por xmm1,xmm5
+ pand xmm3,XMMWORD[288+r10]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ por xmm0,xmm1
+ pshufd xmm1,xmm0,0x4e
+ por xmm0,xmm1
+ lea r12,[256+r12]
+DB 102,72,15,126,195
+
+ mov QWORD[((16+8))+rsp],r13
+ mov QWORD[((56+8))+rsp],rdi
+
+ mov r8,QWORD[r8]
+ mov rax,QWORD[rsi]
+ lea rsi,[r9*1+rsi]
+ neg r9
+
+ mov rbp,r8
+ mul rbx
+ mov r10,rax
+ mov rax,QWORD[rcx]
+
+ imul rbp,r10
+ lea r14,[((64+8))+rsp]
+ mov r11,rdx
+
+ mul rbp
+ add r10,rax
+ mov rax,QWORD[8+r9*1+rsi]
+ adc rdx,0
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[8+rcx]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[16+r9*1+rsi]
+ adc rdx,0
+ add rdi,r11
+ lea r15,[32+r9]
+ lea rcx,[32+rcx]
+ adc rdx,0
+ mov QWORD[r14],rdi
+ mov r13,rdx
+ jmp NEAR $L$1st4x
+
+ALIGN 32
+$L$1st4x:
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[((-16))+rcx]
+ lea r14,[32+r14]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[((-8))+r15*1+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-24))+r14],r13
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[((-8))+rcx]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[r15*1+rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-16))+r14],rdi
+ mov r13,rdx
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[rcx]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[8+r15*1+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-8))+r14],r13
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[8+rcx]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[16+r15*1+rsi]
+ adc rdx,0
+ add rdi,r11
+ lea rcx,[32+rcx]
+ adc rdx,0
+ mov QWORD[r14],rdi
+ mov r13,rdx
+
+ add r15,32
+ jnz NEAR $L$1st4x
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[((-16))+rcx]
+ lea r14,[32+r14]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[((-8))+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-24))+r14],r13
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[((-8))+rcx]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[r9*1+rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-16))+r14],rdi
+ mov r13,rdx
+
+ lea rcx,[r9*1+rcx]
+
+ xor rdi,rdi
+ add r13,r10
+ adc rdi,0
+ mov QWORD[((-8))+r14],r13
+
+ jmp NEAR $L$outer4x
+
+ALIGN 32
+$L$outer4x:
+ lea rdx,[((16+128))+r14]
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ movdqa xmm0,XMMWORD[((-128))+r12]
+ movdqa xmm1,XMMWORD[((-112))+r12]
+ movdqa xmm2,XMMWORD[((-96))+r12]
+ movdqa xmm3,XMMWORD[((-80))+r12]
+ pand xmm0,XMMWORD[((-128))+rdx]
+ pand xmm1,XMMWORD[((-112))+rdx]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[((-96))+rdx]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[((-80))+rdx]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[((-64))+r12]
+ movdqa xmm1,XMMWORD[((-48))+r12]
+ movdqa xmm2,XMMWORD[((-32))+r12]
+ movdqa xmm3,XMMWORD[((-16))+r12]
+ pand xmm0,XMMWORD[((-64))+rdx]
+ pand xmm1,XMMWORD[((-48))+rdx]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[((-32))+rdx]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[((-16))+rdx]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[r12]
+ movdqa xmm1,XMMWORD[16+r12]
+ movdqa xmm2,XMMWORD[32+r12]
+ movdqa xmm3,XMMWORD[48+r12]
+ pand xmm0,XMMWORD[rdx]
+ pand xmm1,XMMWORD[16+rdx]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[32+rdx]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[48+rdx]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[64+r12]
+ movdqa xmm1,XMMWORD[80+r12]
+ movdqa xmm2,XMMWORD[96+r12]
+ movdqa xmm3,XMMWORD[112+r12]
+ pand xmm0,XMMWORD[64+rdx]
+ pand xmm1,XMMWORD[80+rdx]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[96+rdx]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[112+rdx]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ por xmm4,xmm5
+ pshufd xmm0,xmm4,0x4e
+ por xmm0,xmm4
+ lea r12,[256+r12]
+DB 102,72,15,126,195
+
+ mov r10,QWORD[r9*1+r14]
+ mov rbp,r8
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[rcx]
+ adc rdx,0
+
+ imul rbp,r10
+ mov r11,rdx
+ mov QWORD[r14],rdi
+
+ lea r14,[r9*1+r14]
+
+ mul rbp
+ add r10,rax
+ mov rax,QWORD[8+r9*1+rsi]
+ adc rdx,0
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[8+rcx]
+ adc rdx,0
+ add r11,QWORD[8+r14]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[16+r9*1+rsi]
+ adc rdx,0
+ add rdi,r11
+ lea r15,[32+r9]
+ lea rcx,[32+rcx]
+ adc rdx,0
+ mov r13,rdx
+ jmp NEAR $L$inner4x
+
+ALIGN 32
+$L$inner4x:
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[((-16))+rcx]
+ adc rdx,0
+ add r10,QWORD[16+r14]
+ lea r14,[32+r14]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[((-8))+r15*1+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-32))+r14],rdi
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[((-8))+rcx]
+ adc rdx,0
+ add r11,QWORD[((-8))+r14]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[r15*1+rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-24))+r14],r13
+ mov r13,rdx
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[rcx]
+ adc rdx,0
+ add r10,QWORD[r14]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[8+r15*1+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-16))+r14],rdi
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[8+rcx]
+ adc rdx,0
+ add r11,QWORD[8+r14]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[16+r15*1+rsi]
+ adc rdx,0
+ add rdi,r11
+ lea rcx,[32+rcx]
+ adc rdx,0
+ mov QWORD[((-8))+r14],r13
+ mov r13,rdx
+
+ add r15,32
+ jnz NEAR $L$inner4x
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[((-16))+rcx]
+ adc rdx,0
+ add r10,QWORD[16+r14]
+ lea r14,[32+r14]
+ adc rdx,0
+ mov r11,rdx
+
+ mul rbp
+ add r13,rax
+ mov rax,QWORD[((-8))+rsi]
+ adc rdx,0
+ add r13,r10
+ adc rdx,0
+ mov QWORD[((-32))+r14],rdi
+ mov rdi,rdx
+
+ mul rbx
+ add r11,rax
+ mov rax,rbp
+ mov rbp,QWORD[((-8))+rcx]
+ adc rdx,0
+ add r11,QWORD[((-8))+r14]
+ adc rdx,0
+ mov r10,rdx
+
+ mul rbp
+ add rdi,rax
+ mov rax,QWORD[r9*1+rsi]
+ adc rdx,0
+ add rdi,r11
+ adc rdx,0
+ mov QWORD[((-24))+r14],r13
+ mov r13,rdx
+
+ mov QWORD[((-16))+r14],rdi
+ lea rcx,[r9*1+rcx]
+
+ xor rdi,rdi
+ add r13,r10
+ adc rdi,0
+ add r13,QWORD[r14]
+ adc rdi,0
+ mov QWORD[((-8))+r14],r13
+
+ cmp r12,QWORD[((16+8))+rsp]
+ jb NEAR $L$outer4x
+ xor rax,rax
+ sub rbp,r13
+ adc r15,r15
+ or rdi,r15
+ sub rax,rdi
+ lea rbx,[r9*1+r14]
+ mov r12,QWORD[rcx]
+ lea rbp,[rcx]
+ mov rcx,r9
+ sar rcx,3+2
+ mov rdi,QWORD[((56+8))+rsp]
+ dec r12
+ xor r10,r10
+ mov r13,QWORD[8+rbp]
+ mov r14,QWORD[16+rbp]
+ mov r15,QWORD[24+rbp]
+ jmp NEAR $L$sqr4x_sub_entry
+
+global bn_power5
+
+ALIGN 32
+bn_power5:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_power5:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov rax,rsp
+
+ mov r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+ and r11d,0x80108
+ cmp r11d,0x80108
+ je NEAR $L$powerx5_enter
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+$L$power5_prologue:
+
+ shl r9d,3
+ lea r10d,[r9*2+r9]
+ neg r9
+ mov r8,QWORD[r8]
+
+
+
+
+
+
+
+
+ lea r11,[((-320))+r9*2+rsp]
+ mov rbp,rsp
+ sub r11,rdi
+ and r11,4095
+ cmp r10,r11
+ jb NEAR $L$pwr_sp_alt
+ sub rbp,r11
+ lea rbp,[((-320))+r9*2+rbp]
+ jmp NEAR $L$pwr_sp_done
+
+ALIGN 32
+$L$pwr_sp_alt:
+ lea r10,[((4096-320))+r9*2]
+ lea rbp,[((-320))+r9*2+rbp]
+ sub r11,r10
+ mov r10,0
+ cmovc r11,r10
+ sub rbp,r11
+$L$pwr_sp_done:
+ and rbp,-64
+ mov r11,rsp
+ sub r11,rbp
+ and r11,-4096
+ lea rsp,[rbp*1+r11]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$pwr_page_walk
+ jmp NEAR $L$pwr_page_walk_done
+
+$L$pwr_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$pwr_page_walk
+$L$pwr_page_walk_done:
+
+ mov r10,r9
+ neg r9
+
+
+
+
+
+
+
+
+
+
+ mov QWORD[32+rsp],r8
+ mov QWORD[40+rsp],rax
+
+$L$power5_body:
+DB 102,72,15,110,207
+DB 102,72,15,110,209
+DB 102,73,15,110,218
+DB 102,72,15,110,226
+
+ call __bn_sqr8x_internal
+ call __bn_post4x_internal
+ call __bn_sqr8x_internal
+ call __bn_post4x_internal
+ call __bn_sqr8x_internal
+ call __bn_post4x_internal
+ call __bn_sqr8x_internal
+ call __bn_post4x_internal
+ call __bn_sqr8x_internal
+ call __bn_post4x_internal
+
+DB 102,72,15,126,209
+DB 102,72,15,126,226
+ mov rdi,rsi
+ mov rax,QWORD[40+rsp]
+ lea r8,[32+rsp]
+
+ call mul4x_internal
+
+ mov rsi,QWORD[40+rsp]
+
+ mov rax,1
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$power5_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_power5:
+
+global bn_sqr8x_internal
+
+
+ALIGN 32
+bn_sqr8x_internal:
+__bn_sqr8x_internal:
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ lea rbp,[32+r10]
+ lea rsi,[r9*1+rsi]
+
+ mov rcx,r9
+
+
+ mov r14,QWORD[((-32))+rbp*1+rsi]
+ lea rdi,[((48+8))+r9*2+rsp]
+ mov rax,QWORD[((-24))+rbp*1+rsi]
+ lea rdi,[((-32))+rbp*1+rdi]
+ mov rbx,QWORD[((-16))+rbp*1+rsi]
+ mov r15,rax
+
+ mul r14
+ mov r10,rax
+ mov rax,rbx
+ mov r11,rdx
+ mov QWORD[((-24))+rbp*1+rdi],r10
+
+ mul r14
+ add r11,rax
+ mov rax,rbx
+ adc rdx,0
+ mov QWORD[((-16))+rbp*1+rdi],r11
+ mov r10,rdx
+
+
+ mov rbx,QWORD[((-8))+rbp*1+rsi]
+ mul r15
+ mov r12,rax
+ mov rax,rbx
+ mov r13,rdx
+
+ lea rcx,[rbp]
+ mul r14
+ add r10,rax
+ mov rax,rbx
+ mov r11,rdx
+ adc r11,0
+ add r10,r12
+ adc r11,0
+ mov QWORD[((-8))+rcx*1+rdi],r10
+ jmp NEAR $L$sqr4x_1st
+
+ALIGN 32
+$L$sqr4x_1st:
+ mov rbx,QWORD[rcx*1+rsi]
+ mul r15
+ add r13,rax
+ mov rax,rbx
+ mov r12,rdx
+ adc r12,0
+
+ mul r14
+ add r11,rax
+ mov rax,rbx
+ mov rbx,QWORD[8+rcx*1+rsi]
+ mov r10,rdx
+ adc r10,0
+ add r11,r13
+ adc r10,0
+
+
+ mul r15
+ add r12,rax
+ mov rax,rbx
+ mov QWORD[rcx*1+rdi],r11
+ mov r13,rdx
+ adc r13,0
+
+ mul r14
+ add r10,rax
+ mov rax,rbx
+ mov rbx,QWORD[16+rcx*1+rsi]
+ mov r11,rdx
+ adc r11,0
+ add r10,r12
+ adc r11,0
+
+ mul r15
+ add r13,rax
+ mov rax,rbx
+ mov QWORD[8+rcx*1+rdi],r10
+ mov r12,rdx
+ adc r12,0
+
+ mul r14
+ add r11,rax
+ mov rax,rbx
+ mov rbx,QWORD[24+rcx*1+rsi]
+ mov r10,rdx
+ adc r10,0
+ add r11,r13
+ adc r10,0
+
+
+ mul r15
+ add r12,rax
+ mov rax,rbx
+ mov QWORD[16+rcx*1+rdi],r11
+ mov r13,rdx
+ adc r13,0
+ lea rcx,[32+rcx]
+
+ mul r14
+ add r10,rax
+ mov rax,rbx
+ mov r11,rdx
+ adc r11,0
+ add r10,r12
+ adc r11,0
+ mov QWORD[((-8))+rcx*1+rdi],r10
+
+ cmp rcx,0
+ jne NEAR $L$sqr4x_1st
+
+ mul r15
+ add r13,rax
+ lea rbp,[16+rbp]
+ adc rdx,0
+ add r13,r11
+ adc rdx,0
+
+ mov QWORD[rdi],r13
+ mov r12,rdx
+ mov QWORD[8+rdi],rdx
+ jmp NEAR $L$sqr4x_outer
+
+ALIGN 32
+$L$sqr4x_outer:
+ mov r14,QWORD[((-32))+rbp*1+rsi]
+ lea rdi,[((48+8))+r9*2+rsp]
+ mov rax,QWORD[((-24))+rbp*1+rsi]
+ lea rdi,[((-32))+rbp*1+rdi]
+ mov rbx,QWORD[((-16))+rbp*1+rsi]
+ mov r15,rax
+
+ mul r14
+ mov r10,QWORD[((-24))+rbp*1+rdi]
+ add r10,rax
+ mov rax,rbx
+ adc rdx,0
+ mov QWORD[((-24))+rbp*1+rdi],r10
+ mov r11,rdx
+
+ mul r14
+ add r11,rax
+ mov rax,rbx
+ adc rdx,0
+ add r11,QWORD[((-16))+rbp*1+rdi]
+ mov r10,rdx
+ adc r10,0
+ mov QWORD[((-16))+rbp*1+rdi],r11
+
+ xor r12,r12
+
+ mov rbx,QWORD[((-8))+rbp*1+rsi]
+ mul r15
+ add r12,rax
+ mov rax,rbx
+ adc rdx,0
+ add r12,QWORD[((-8))+rbp*1+rdi]
+ mov r13,rdx
+ adc r13,0
+
+ mul r14
+ add r10,rax
+ mov rax,rbx
+ adc rdx,0
+ add r10,r12
+ mov r11,rdx
+ adc r11,0
+ mov QWORD[((-8))+rbp*1+rdi],r10
+
+ lea rcx,[rbp]
+ jmp NEAR $L$sqr4x_inner
+
+ALIGN 32
+$L$sqr4x_inner:
+ mov rbx,QWORD[rcx*1+rsi]
+ mul r15
+ add r13,rax
+ mov rax,rbx
+ mov r12,rdx
+ adc r12,0
+ add r13,QWORD[rcx*1+rdi]
+ adc r12,0
+
+DB 0x67
+ mul r14
+ add r11,rax
+ mov rax,rbx
+ mov rbx,QWORD[8+rcx*1+rsi]
+ mov r10,rdx
+ adc r10,0
+ add r11,r13
+ adc r10,0
+
+ mul r15
+ add r12,rax
+ mov QWORD[rcx*1+rdi],r11
+ mov rax,rbx
+ mov r13,rdx
+ adc r13,0
+ add r12,QWORD[8+rcx*1+rdi]
+ lea rcx,[16+rcx]
+ adc r13,0
+
+ mul r14
+ add r10,rax
+ mov rax,rbx
+ adc rdx,0
+ add r10,r12
+ mov r11,rdx
+ adc r11,0
+ mov QWORD[((-8))+rcx*1+rdi],r10
+
+ cmp rcx,0
+ jne NEAR $L$sqr4x_inner
+
+DB 0x67
+ mul r15
+ add r13,rax
+ adc rdx,0
+ add r13,r11
+ adc rdx,0
+
+ mov QWORD[rdi],r13
+ mov r12,rdx
+ mov QWORD[8+rdi],rdx
+
+ add rbp,16
+ jnz NEAR $L$sqr4x_outer
+
+
+ mov r14,QWORD[((-32))+rsi]
+ lea rdi,[((48+8))+r9*2+rsp]
+ mov rax,QWORD[((-24))+rsi]
+ lea rdi,[((-32))+rbp*1+rdi]
+ mov rbx,QWORD[((-16))+rsi]
+ mov r15,rax
+
+ mul r14
+ add r10,rax
+ mov rax,rbx
+ mov r11,rdx
+ adc r11,0
+
+ mul r14
+ add r11,rax
+ mov rax,rbx
+ mov QWORD[((-24))+rdi],r10
+ mov r10,rdx
+ adc r10,0
+ add r11,r13
+ mov rbx,QWORD[((-8))+rsi]
+ adc r10,0
+
+ mul r15
+ add r12,rax
+ mov rax,rbx
+ mov QWORD[((-16))+rdi],r11
+ mov r13,rdx
+ adc r13,0
+
+ mul r14
+ add r10,rax
+ mov rax,rbx
+ mov r11,rdx
+ adc r11,0
+ add r10,r12
+ adc r11,0
+ mov QWORD[((-8))+rdi],r10
+
+ mul r15
+ add r13,rax
+ mov rax,QWORD[((-16))+rsi]
+ adc rdx,0
+ add r13,r11
+ adc rdx,0
+
+ mov QWORD[rdi],r13
+ mov r12,rdx
+ mov QWORD[8+rdi],rdx
+
+ mul rbx
+ add rbp,16
+ xor r14,r14
+ sub rbp,r9
+ xor r15,r15
+
+ add rax,r12
+ adc rdx,0
+ mov QWORD[8+rdi],rax
+ mov QWORD[16+rdi],rdx
+ mov QWORD[24+rdi],r15
+
+ mov rax,QWORD[((-16))+rbp*1+rsi]
+ lea rdi,[((48+8))+rsp]
+ xor r10,r10
+ mov r11,QWORD[8+rdi]
+
+ lea r12,[r10*2+r14]
+ shr r10,63
+ lea r13,[r11*2+rcx]
+ shr r11,63
+ or r13,r10
+ mov r10,QWORD[16+rdi]
+ mov r14,r11
+ mul rax
+ neg r15
+ mov r11,QWORD[24+rdi]
+ adc r12,rax
+ mov rax,QWORD[((-8))+rbp*1+rsi]
+ mov QWORD[rdi],r12
+ adc r13,rdx
+
+ lea rbx,[r10*2+r14]
+ mov QWORD[8+rdi],r13
+ sbb r15,r15
+ shr r10,63
+ lea r8,[r11*2+rcx]
+ shr r11,63
+ or r8,r10
+ mov r10,QWORD[32+rdi]
+ mov r14,r11
+ mul rax
+ neg r15
+ mov r11,QWORD[40+rdi]
+ adc rbx,rax
+ mov rax,QWORD[rbp*1+rsi]
+ mov QWORD[16+rdi],rbx
+ adc r8,rdx
+ lea rbp,[16+rbp]
+ mov QWORD[24+rdi],r8
+ sbb r15,r15
+ lea rdi,[64+rdi]
+ jmp NEAR $L$sqr4x_shift_n_add
+
+ALIGN 32
+$L$sqr4x_shift_n_add:
+ lea r12,[r10*2+r14]
+ shr r10,63
+ lea r13,[r11*2+rcx]
+ shr r11,63
+ or r13,r10
+ mov r10,QWORD[((-16))+rdi]
+ mov r14,r11
+ mul rax
+ neg r15
+ mov r11,QWORD[((-8))+rdi]
+ adc r12,rax
+ mov rax,QWORD[((-8))+rbp*1+rsi]
+ mov QWORD[((-32))+rdi],r12
+ adc r13,rdx
+
+ lea rbx,[r10*2+r14]
+ mov QWORD[((-24))+rdi],r13
+ sbb r15,r15
+ shr r10,63
+ lea r8,[r11*2+rcx]
+ shr r11,63
+ or r8,r10
+ mov r10,QWORD[rdi]
+ mov r14,r11
+ mul rax
+ neg r15
+ mov r11,QWORD[8+rdi]
+ adc rbx,rax
+ mov rax,QWORD[rbp*1+rsi]
+ mov QWORD[((-16))+rdi],rbx
+ adc r8,rdx
+
+ lea r12,[r10*2+r14]
+ mov QWORD[((-8))+rdi],r8
+ sbb r15,r15
+ shr r10,63
+ lea r13,[r11*2+rcx]
+ shr r11,63
+ or r13,r10
+ mov r10,QWORD[16+rdi]
+ mov r14,r11
+ mul rax
+ neg r15
+ mov r11,QWORD[24+rdi]
+ adc r12,rax
+ mov rax,QWORD[8+rbp*1+rsi]
+ mov QWORD[rdi],r12
+ adc r13,rdx
+
+ lea rbx,[r10*2+r14]
+ mov QWORD[8+rdi],r13
+ sbb r15,r15
+ shr r10,63
+ lea r8,[r11*2+rcx]
+ shr r11,63
+ or r8,r10
+ mov r10,QWORD[32+rdi]
+ mov r14,r11
+ mul rax
+ neg r15
+ mov r11,QWORD[40+rdi]
+ adc rbx,rax
+ mov rax,QWORD[16+rbp*1+rsi]
+ mov QWORD[16+rdi],rbx
+ adc r8,rdx
+ mov QWORD[24+rdi],r8
+ sbb r15,r15
+ lea rdi,[64+rdi]
+ add rbp,32
+ jnz NEAR $L$sqr4x_shift_n_add
+
+ lea r12,[r10*2+r14]
+DB 0x67
+ shr r10,63
+ lea r13,[r11*2+rcx]
+ shr r11,63
+ or r13,r10
+ mov r10,QWORD[((-16))+rdi]
+ mov r14,r11
+ mul rax
+ neg r15
+ mov r11,QWORD[((-8))+rdi]
+ adc r12,rax
+ mov rax,QWORD[((-8))+rsi]
+ mov QWORD[((-32))+rdi],r12
+ adc r13,rdx
+
+ lea rbx,[r10*2+r14]
+ mov QWORD[((-24))+rdi],r13
+ sbb r15,r15
+ shr r10,63
+ lea r8,[r11*2+rcx]
+ shr r11,63
+ or r8,r10
+ mul rax
+ neg r15
+ adc rbx,rax
+ adc r8,rdx
+ mov QWORD[((-16))+rdi],rbx
+ mov QWORD[((-8))+rdi],r8
+DB 102,72,15,126,213
+__bn_sqr8x_reduction:
+ xor rax,rax
+ lea rcx,[rbp*1+r9]
+ lea rdx,[((48+8))+r9*2+rsp]
+ mov QWORD[((0+8))+rsp],rcx
+ lea rdi,[((48+8))+r9*1+rsp]
+ mov QWORD[((8+8))+rsp],rdx
+ neg r9
+ jmp NEAR $L$8x_reduction_loop
+
+ALIGN 32
+$L$8x_reduction_loop:
+ lea rdi,[r9*1+rdi]
+DB 0x66
+ mov rbx,QWORD[rdi]
+ mov r9,QWORD[8+rdi]
+ mov r10,QWORD[16+rdi]
+ mov r11,QWORD[24+rdi]
+ mov r12,QWORD[32+rdi]
+ mov r13,QWORD[40+rdi]
+ mov r14,QWORD[48+rdi]
+ mov r15,QWORD[56+rdi]
+ mov QWORD[rdx],rax
+ lea rdi,[64+rdi]
+
+DB 0x67
+ mov r8,rbx
+ imul rbx,QWORD[((32+8))+rsp]
+ mov rax,QWORD[rbp]
+ mov ecx,8
+ jmp NEAR $L$8x_reduce
+
+ALIGN 32
+$L$8x_reduce:
+ mul rbx
+ mov rax,QWORD[8+rbp]
+ neg r8
+ mov r8,rdx
+ adc r8,0
+
+ mul rbx
+ add r9,rax
+ mov rax,QWORD[16+rbp]
+ adc rdx,0
+ add r8,r9
+ mov QWORD[((48-8+8))+rcx*8+rsp],rbx
+ mov r9,rdx
+ adc r9,0
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[24+rbp]
+ adc rdx,0
+ add r9,r10
+ mov rsi,QWORD[((32+8))+rsp]
+ mov r10,rdx
+ adc r10,0
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[32+rbp]
+ adc rdx,0
+ imul rsi,r8
+ add r10,r11
+ mov r11,rdx
+ adc r11,0
+
+ mul rbx
+ add r12,rax
+ mov rax,QWORD[40+rbp]
+ adc rdx,0
+ add r11,r12
+ mov r12,rdx
+ adc r12,0
+
+ mul rbx
+ add r13,rax
+ mov rax,QWORD[48+rbp]
+ adc rdx,0
+ add r12,r13
+ mov r13,rdx
+ adc r13,0
+
+ mul rbx
+ add r14,rax
+ mov rax,QWORD[56+rbp]
+ adc rdx,0
+ add r13,r14
+ mov r14,rdx
+ adc r14,0
+
+ mul rbx
+ mov rbx,rsi
+ add r15,rax
+ mov rax,QWORD[rbp]
+ adc rdx,0
+ add r14,r15
+ mov r15,rdx
+ adc r15,0
+
+ dec ecx
+ jnz NEAR $L$8x_reduce
+
+ lea rbp,[64+rbp]
+ xor rax,rax
+ mov rdx,QWORD[((8+8))+rsp]
+ cmp rbp,QWORD[((0+8))+rsp]
+ jae NEAR $L$8x_no_tail
+
+DB 0x66
+ add r8,QWORD[rdi]
+ adc r9,QWORD[8+rdi]
+ adc r10,QWORD[16+rdi]
+ adc r11,QWORD[24+rdi]
+ adc r12,QWORD[32+rdi]
+ adc r13,QWORD[40+rdi]
+ adc r14,QWORD[48+rdi]
+ adc r15,QWORD[56+rdi]
+ sbb rsi,rsi
+
+ mov rbx,QWORD[((48+56+8))+rsp]
+ mov ecx,8
+ mov rax,QWORD[rbp]
+ jmp NEAR $L$8x_tail
+
+ALIGN 32
+$L$8x_tail:
+ mul rbx
+ add r8,rax
+ mov rax,QWORD[8+rbp]
+ mov QWORD[rdi],r8
+ mov r8,rdx
+ adc r8,0
+
+ mul rbx
+ add r9,rax
+ mov rax,QWORD[16+rbp]
+ adc rdx,0
+ add r8,r9
+ lea rdi,[8+rdi]
+ mov r9,rdx
+ adc r9,0
+
+ mul rbx
+ add r10,rax
+ mov rax,QWORD[24+rbp]
+ adc rdx,0
+ add r9,r10
+ mov r10,rdx
+ adc r10,0
+
+ mul rbx
+ add r11,rax
+ mov rax,QWORD[32+rbp]
+ adc rdx,0
+ add r10,r11
+ mov r11,rdx
+ adc r11,0
+
+ mul rbx
+ add r12,rax
+ mov rax,QWORD[40+rbp]
+ adc rdx,0
+ add r11,r12
+ mov r12,rdx
+ adc r12,0
+
+ mul rbx
+ add r13,rax
+ mov rax,QWORD[48+rbp]
+ adc rdx,0
+ add r12,r13
+ mov r13,rdx
+ adc r13,0
+
+ mul rbx
+ add r14,rax
+ mov rax,QWORD[56+rbp]
+ adc rdx,0
+ add r13,r14
+ mov r14,rdx
+ adc r14,0
+
+ mul rbx
+ mov rbx,QWORD[((48-16+8))+rcx*8+rsp]
+ add r15,rax
+ adc rdx,0
+ add r14,r15
+ mov rax,QWORD[rbp]
+ mov r15,rdx
+ adc r15,0
+
+ dec ecx
+ jnz NEAR $L$8x_tail
+
+ lea rbp,[64+rbp]
+ mov rdx,QWORD[((8+8))+rsp]
+ cmp rbp,QWORD[((0+8))+rsp]
+ jae NEAR $L$8x_tail_done
+
+ mov rbx,QWORD[((48+56+8))+rsp]
+ neg rsi
+ mov rax,QWORD[rbp]
+ adc r8,QWORD[rdi]
+ adc r9,QWORD[8+rdi]
+ adc r10,QWORD[16+rdi]
+ adc r11,QWORD[24+rdi]
+ adc r12,QWORD[32+rdi]
+ adc r13,QWORD[40+rdi]
+ adc r14,QWORD[48+rdi]
+ adc r15,QWORD[56+rdi]
+ sbb rsi,rsi
+
+ mov ecx,8
+ jmp NEAR $L$8x_tail
+
+ALIGN 32
+$L$8x_tail_done:
+ xor rax,rax
+ add r8,QWORD[rdx]
+ adc r9,0
+ adc r10,0
+ adc r11,0
+ adc r12,0
+ adc r13,0
+ adc r14,0
+ adc r15,0
+ adc rax,0
+
+ neg rsi
+$L$8x_no_tail:
+ adc r8,QWORD[rdi]
+ adc r9,QWORD[8+rdi]
+ adc r10,QWORD[16+rdi]
+ adc r11,QWORD[24+rdi]
+ adc r12,QWORD[32+rdi]
+ adc r13,QWORD[40+rdi]
+ adc r14,QWORD[48+rdi]
+ adc r15,QWORD[56+rdi]
+ adc rax,0
+ mov rcx,QWORD[((-8))+rbp]
+ xor rsi,rsi
+
+DB 102,72,15,126,213
+
+ mov QWORD[rdi],r8
+ mov QWORD[8+rdi],r9
+DB 102,73,15,126,217
+ mov QWORD[16+rdi],r10
+ mov QWORD[24+rdi],r11
+ mov QWORD[32+rdi],r12
+ mov QWORD[40+rdi],r13
+ mov QWORD[48+rdi],r14
+ mov QWORD[56+rdi],r15
+ lea rdi,[64+rdi]
+
+ cmp rdi,rdx
+ jb NEAR $L$8x_reduction_loop
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 32
+__bn_post4x_internal:
+ mov r12,QWORD[rbp]
+ lea rbx,[r9*1+rdi]
+ mov rcx,r9
+DB 102,72,15,126,207
+ neg rax
+DB 102,72,15,126,206
+ sar rcx,3+2
+ dec r12
+ xor r10,r10
+ mov r13,QWORD[8+rbp]
+ mov r14,QWORD[16+rbp]
+ mov r15,QWORD[24+rbp]
+ jmp NEAR $L$sqr4x_sub_entry
+
+ALIGN 16
+$L$sqr4x_sub:
+ mov r12,QWORD[rbp]
+ mov r13,QWORD[8+rbp]
+ mov r14,QWORD[16+rbp]
+ mov r15,QWORD[24+rbp]
+$L$sqr4x_sub_entry:
+ lea rbp,[32+rbp]
+ not r12
+ not r13
+ not r14
+ not r15
+ and r12,rax
+ and r13,rax
+ and r14,rax
+ and r15,rax
+
+ neg r10
+ adc r12,QWORD[rbx]
+ adc r13,QWORD[8+rbx]
+ adc r14,QWORD[16+rbx]
+ adc r15,QWORD[24+rbx]
+ mov QWORD[rdi],r12
+ lea rbx,[32+rbx]
+ mov QWORD[8+rdi],r13
+ sbb r10,r10
+ mov QWORD[16+rdi],r14
+ mov QWORD[24+rdi],r15
+ lea rdi,[32+rdi]
+
+ inc rcx
+ jnz NEAR $L$sqr4x_sub
+
+ mov r10,r9
+ neg r9
+ DB 0F3h,0C3h ;repret
+
+global bn_from_montgomery
+
+ALIGN 32
+bn_from_montgomery:
+ test DWORD[48+rsp],7
+ jz NEAR bn_from_mont8x
+ xor eax,eax
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 32
+bn_from_mont8x:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_from_mont8x:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+DB 0x67
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+$L$from_prologue:
+
+ shl r9d,3
+ lea r10,[r9*2+r9]
+ neg r9
+ mov r8,QWORD[r8]
+
+
+
+
+
+
+
+
+ lea r11,[((-320))+r9*2+rsp]
+ mov rbp,rsp
+ sub r11,rdi
+ and r11,4095
+ cmp r10,r11
+ jb NEAR $L$from_sp_alt
+ sub rbp,r11
+ lea rbp,[((-320))+r9*2+rbp]
+ jmp NEAR $L$from_sp_done
+
+ALIGN 32
+$L$from_sp_alt:
+ lea r10,[((4096-320))+r9*2]
+ lea rbp,[((-320))+r9*2+rbp]
+ sub r11,r10
+ mov r10,0
+ cmovc r11,r10
+ sub rbp,r11
+$L$from_sp_done:
+ and rbp,-64
+ mov r11,rsp
+ sub r11,rbp
+ and r11,-4096
+ lea rsp,[rbp*1+r11]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$from_page_walk
+ jmp NEAR $L$from_page_walk_done
+
+$L$from_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$from_page_walk
+$L$from_page_walk_done:
+
+ mov r10,r9
+ neg r9
+
+
+
+
+
+
+
+
+
+
+ mov QWORD[32+rsp],r8
+ mov QWORD[40+rsp],rax
+
+$L$from_body:
+ mov r11,r9
+ lea rax,[48+rsp]
+ pxor xmm0,xmm0
+ jmp NEAR $L$mul_by_1
+
+ALIGN 32
+$L$mul_by_1:
+ movdqu xmm1,XMMWORD[rsi]
+ movdqu xmm2,XMMWORD[16+rsi]
+ movdqu xmm3,XMMWORD[32+rsi]
+ movdqa XMMWORD[r9*1+rax],xmm0
+ movdqu xmm4,XMMWORD[48+rsi]
+ movdqa XMMWORD[16+r9*1+rax],xmm0
+DB 0x48,0x8d,0xb6,0x40,0x00,0x00,0x00
+ movdqa XMMWORD[rax],xmm1
+ movdqa XMMWORD[32+r9*1+rax],xmm0
+ movdqa XMMWORD[16+rax],xmm2
+ movdqa XMMWORD[48+r9*1+rax],xmm0
+ movdqa XMMWORD[32+rax],xmm3
+ movdqa XMMWORD[48+rax],xmm4
+ lea rax,[64+rax]
+ sub r11,64
+ jnz NEAR $L$mul_by_1
+
+DB 102,72,15,110,207
+DB 102,72,15,110,209
+DB 0x67
+ mov rbp,rcx
+DB 102,73,15,110,218
+ mov r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+ and r11d,0x80108
+ cmp r11d,0x80108
+ jne NEAR $L$from_mont_nox
+
+ lea rdi,[r9*1+rax]
+ call __bn_sqrx8x_reduction
+ call __bn_postx4x_internal
+
+ pxor xmm0,xmm0
+ lea rax,[48+rsp]
+ jmp NEAR $L$from_mont_zero
+
+ALIGN 32
+$L$from_mont_nox:
+ call __bn_sqr8x_reduction
+ call __bn_post4x_internal
+
+ pxor xmm0,xmm0
+ lea rax,[48+rsp]
+ jmp NEAR $L$from_mont_zero
+
+ALIGN 32
+$L$from_mont_zero:
+ mov rsi,QWORD[40+rsp]
+
+ movdqa XMMWORD[rax],xmm0
+ movdqa XMMWORD[16+rax],xmm0
+ movdqa XMMWORD[32+rax],xmm0
+ movdqa XMMWORD[48+rax],xmm0
+ lea rax,[64+rax]
+ sub r9,32
+ jnz NEAR $L$from_mont_zero
+
+ mov rax,1
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$from_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_from_mont8x:
+
+ALIGN 32
+bn_mulx4x_mont_gather5:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_mulx4x_mont_gather5:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov rax,rsp
+
+$L$mulx4x_enter:
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+$L$mulx4x_prologue:
+
+ shl r9d,3
+ lea r10,[r9*2+r9]
+ neg r9
+ mov r8,QWORD[r8]
+
+
+
+
+
+
+
+
+
+
+ lea r11,[((-320))+r9*2+rsp]
+ mov rbp,rsp
+ sub r11,rdi
+ and r11,4095
+ cmp r10,r11
+ jb NEAR $L$mulx4xsp_alt
+ sub rbp,r11
+ lea rbp,[((-320))+r9*2+rbp]
+ jmp NEAR $L$mulx4xsp_done
+
+$L$mulx4xsp_alt:
+ lea r10,[((4096-320))+r9*2]
+ lea rbp,[((-320))+r9*2+rbp]
+ sub r11,r10
+ mov r10,0
+ cmovc r11,r10
+ sub rbp,r11
+$L$mulx4xsp_done:
+ and rbp,-64
+ mov r11,rsp
+ sub r11,rbp
+ and r11,-4096
+ lea rsp,[rbp*1+r11]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$mulx4x_page_walk
+ jmp NEAR $L$mulx4x_page_walk_done
+
+$L$mulx4x_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$mulx4x_page_walk
+$L$mulx4x_page_walk_done:
+
+
+
+
+
+
+
+
+
+
+
+
+
+ mov QWORD[32+rsp],r8
+ mov QWORD[40+rsp],rax
+
+$L$mulx4x_body:
+ call mulx4x_internal
+
+ mov rsi,QWORD[40+rsp]
+
+ mov rax,1
+
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$mulx4x_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_mulx4x_mont_gather5:
+
+
+ALIGN 32
+mulx4x_internal:
+ mov QWORD[8+rsp],r9
+ mov r10,r9
+ neg r9
+ shl r9,5
+ neg r10
+ lea r13,[128+r9*1+rdx]
+ shr r9,5+5
+ movd xmm5,DWORD[56+rax]
+ sub r9,1
+ lea rax,[$L$inc]
+ mov QWORD[((16+8))+rsp],r13
+ mov QWORD[((24+8))+rsp],r9
+ mov QWORD[((56+8))+rsp],rdi
+ movdqa xmm0,XMMWORD[rax]
+ movdqa xmm1,XMMWORD[16+rax]
+ lea r10,[((88-112))+r10*1+rsp]
+ lea rdi,[128+rdx]
+
+ pshufd xmm5,xmm5,0
+ movdqa xmm4,xmm1
+DB 0x67
+ movdqa xmm2,xmm1
+DB 0x67
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[112+r10],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[128+r10],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[144+r10],xmm2
+ movdqa xmm2,xmm4
+
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[160+r10],xmm3
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[176+r10],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[192+r10],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[208+r10],xmm2
+ movdqa xmm2,xmm4
+
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[224+r10],xmm3
+ movdqa xmm3,xmm4
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[240+r10],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[256+r10],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[272+r10],xmm2
+ movdqa xmm2,xmm4
+
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[288+r10],xmm3
+ movdqa xmm3,xmm4
+DB 0x67
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[304+r10],xmm0
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[320+r10],xmm1
+
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[336+r10],xmm2
+
+ pand xmm0,XMMWORD[64+rdi]
+ pand xmm1,XMMWORD[80+rdi]
+ pand xmm2,XMMWORD[96+rdi]
+ movdqa XMMWORD[352+r10],xmm3
+ pand xmm3,XMMWORD[112+rdi]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ movdqa xmm4,XMMWORD[((-128))+rdi]
+ movdqa xmm5,XMMWORD[((-112))+rdi]
+ movdqa xmm2,XMMWORD[((-96))+rdi]
+ pand xmm4,XMMWORD[112+r10]
+ movdqa xmm3,XMMWORD[((-80))+rdi]
+ pand xmm5,XMMWORD[128+r10]
+ por xmm0,xmm4
+ pand xmm2,XMMWORD[144+r10]
+ por xmm1,xmm5
+ pand xmm3,XMMWORD[160+r10]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ movdqa xmm4,XMMWORD[((-64))+rdi]
+ movdqa xmm5,XMMWORD[((-48))+rdi]
+ movdqa xmm2,XMMWORD[((-32))+rdi]
+ pand xmm4,XMMWORD[176+r10]
+ movdqa xmm3,XMMWORD[((-16))+rdi]
+ pand xmm5,XMMWORD[192+r10]
+ por xmm0,xmm4
+ pand xmm2,XMMWORD[208+r10]
+ por xmm1,xmm5
+ pand xmm3,XMMWORD[224+r10]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ movdqa xmm4,XMMWORD[rdi]
+ movdqa xmm5,XMMWORD[16+rdi]
+ movdqa xmm2,XMMWORD[32+rdi]
+ pand xmm4,XMMWORD[240+r10]
+ movdqa xmm3,XMMWORD[48+rdi]
+ pand xmm5,XMMWORD[256+r10]
+ por xmm0,xmm4
+ pand xmm2,XMMWORD[272+r10]
+ por xmm1,xmm5
+ pand xmm3,XMMWORD[288+r10]
+ por xmm0,xmm2
+ por xmm1,xmm3
+ pxor xmm0,xmm1
+ pshufd xmm1,xmm0,0x4e
+ por xmm0,xmm1
+ lea rdi,[256+rdi]
+DB 102,72,15,126,194
+ lea rbx,[((64+32+8))+rsp]
+
+ mov r9,rdx
+ mulx rax,r8,QWORD[rsi]
+ mulx r12,r11,QWORD[8+rsi]
+ add r11,rax
+ mulx r13,rax,QWORD[16+rsi]
+ adc r12,rax
+ adc r13,0
+ mulx r14,rax,QWORD[24+rsi]
+
+ mov r15,r8
+ imul r8,QWORD[((32+8))+rsp]
+ xor rbp,rbp
+ mov rdx,r8
+
+ mov QWORD[((8+8))+rsp],rdi
+
+ lea rsi,[32+rsi]
+ adcx r13,rax
+ adcx r14,rbp
+
+ mulx r10,rax,QWORD[rcx]
+ adcx r15,rax
+ adox r10,r11
+ mulx r11,rax,QWORD[8+rcx]
+ adcx r10,rax
+ adox r11,r12
+ mulx r12,rax,QWORD[16+rcx]
+ mov rdi,QWORD[((24+8))+rsp]
+ mov QWORD[((-32))+rbx],r10
+ adcx r11,rax
+ adox r12,r13
+ mulx r15,rax,QWORD[24+rcx]
+ mov rdx,r9
+ mov QWORD[((-24))+rbx],r11
+ adcx r12,rax
+ adox r15,rbp
+ lea rcx,[32+rcx]
+ mov QWORD[((-16))+rbx],r12
+ jmp NEAR $L$mulx4x_1st
+
+ALIGN 32
+$L$mulx4x_1st:
+ adcx r15,rbp
+ mulx rax,r10,QWORD[rsi]
+ adcx r10,r14
+ mulx r14,r11,QWORD[8+rsi]
+ adcx r11,rax
+ mulx rax,r12,QWORD[16+rsi]
+ adcx r12,r14
+ mulx r14,r13,QWORD[24+rsi]
+DB 0x67,0x67
+ mov rdx,r8
+ adcx r13,rax
+ adcx r14,rbp
+ lea rsi,[32+rsi]
+ lea rbx,[32+rbx]
+
+ adox r10,r15
+ mulx r15,rax,QWORD[rcx]
+ adcx r10,rax
+ adox r11,r15
+ mulx r15,rax,QWORD[8+rcx]
+ adcx r11,rax
+ adox r12,r15
+ mulx r15,rax,QWORD[16+rcx]
+ mov QWORD[((-40))+rbx],r10
+ adcx r12,rax
+ mov QWORD[((-32))+rbx],r11
+ adox r13,r15
+ mulx r15,rax,QWORD[24+rcx]
+ mov rdx,r9
+ mov QWORD[((-24))+rbx],r12
+ adcx r13,rax
+ adox r15,rbp
+ lea rcx,[32+rcx]
+ mov QWORD[((-16))+rbx],r13
+
+ dec rdi
+ jnz NEAR $L$mulx4x_1st
+
+ mov rax,QWORD[8+rsp]
+ adc r15,rbp
+ lea rsi,[rax*1+rsi]
+ add r14,r15
+ mov rdi,QWORD[((8+8))+rsp]
+ adc rbp,rbp
+ mov QWORD[((-8))+rbx],r14
+ jmp NEAR $L$mulx4x_outer
+
+ALIGN 32
+$L$mulx4x_outer:
+ lea r10,[((16-256))+rbx]
+ pxor xmm4,xmm4
+DB 0x67,0x67
+ pxor xmm5,xmm5
+ movdqa xmm0,XMMWORD[((-128))+rdi]
+ movdqa xmm1,XMMWORD[((-112))+rdi]
+ movdqa xmm2,XMMWORD[((-96))+rdi]
+ pand xmm0,XMMWORD[256+r10]
+ movdqa xmm3,XMMWORD[((-80))+rdi]
+ pand xmm1,XMMWORD[272+r10]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[288+r10]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[304+r10]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[((-64))+rdi]
+ movdqa xmm1,XMMWORD[((-48))+rdi]
+ movdqa xmm2,XMMWORD[((-32))+rdi]
+ pand xmm0,XMMWORD[320+r10]
+ movdqa xmm3,XMMWORD[((-16))+rdi]
+ pand xmm1,XMMWORD[336+r10]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[352+r10]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[368+r10]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[rdi]
+ movdqa xmm1,XMMWORD[16+rdi]
+ movdqa xmm2,XMMWORD[32+rdi]
+ pand xmm0,XMMWORD[384+r10]
+ movdqa xmm3,XMMWORD[48+rdi]
+ pand xmm1,XMMWORD[400+r10]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[416+r10]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[432+r10]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[64+rdi]
+ movdqa xmm1,XMMWORD[80+rdi]
+ movdqa xmm2,XMMWORD[96+rdi]
+ pand xmm0,XMMWORD[448+r10]
+ movdqa xmm3,XMMWORD[112+rdi]
+ pand xmm1,XMMWORD[464+r10]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[480+r10]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[496+r10]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ por xmm4,xmm5
+ pshufd xmm0,xmm4,0x4e
+ por xmm0,xmm4
+ lea rdi,[256+rdi]
+DB 102,72,15,126,194
+
+ mov QWORD[rbx],rbp
+ lea rbx,[32+rax*1+rbx]
+ mulx r11,r8,QWORD[rsi]
+ xor rbp,rbp
+ mov r9,rdx
+ mulx r12,r14,QWORD[8+rsi]
+ adox r8,QWORD[((-32))+rbx]
+ adcx r11,r14
+ mulx r13,r15,QWORD[16+rsi]
+ adox r11,QWORD[((-24))+rbx]
+ adcx r12,r15
+ mulx r14,rdx,QWORD[24+rsi]
+ adox r12,QWORD[((-16))+rbx]
+ adcx r13,rdx
+ lea rcx,[rax*1+rcx]
+ lea rsi,[32+rsi]
+ adox r13,QWORD[((-8))+rbx]
+ adcx r14,rbp
+ adox r14,rbp
+
+ mov r15,r8
+ imul r8,QWORD[((32+8))+rsp]
+
+ mov rdx,r8
+ xor rbp,rbp
+ mov QWORD[((8+8))+rsp],rdi
+
+ mulx r10,rax,QWORD[rcx]
+ adcx r15,rax
+ adox r10,r11
+ mulx r11,rax,QWORD[8+rcx]
+ adcx r10,rax
+ adox r11,r12
+ mulx r12,rax,QWORD[16+rcx]
+ adcx r11,rax
+ adox r12,r13
+ mulx r15,rax,QWORD[24+rcx]
+ mov rdx,r9
+ mov rdi,QWORD[((24+8))+rsp]
+ mov QWORD[((-32))+rbx],r10
+ adcx r12,rax
+ mov QWORD[((-24))+rbx],r11
+ adox r15,rbp
+ mov QWORD[((-16))+rbx],r12
+ lea rcx,[32+rcx]
+ jmp NEAR $L$mulx4x_inner
+
+ALIGN 32
+$L$mulx4x_inner:
+ mulx rax,r10,QWORD[rsi]
+ adcx r15,rbp
+ adox r10,r14
+ mulx r14,r11,QWORD[8+rsi]
+ adcx r10,QWORD[rbx]
+ adox r11,rax
+ mulx rax,r12,QWORD[16+rsi]
+ adcx r11,QWORD[8+rbx]
+ adox r12,r14
+ mulx r14,r13,QWORD[24+rsi]
+ mov rdx,r8
+ adcx r12,QWORD[16+rbx]
+ adox r13,rax
+ adcx r13,QWORD[24+rbx]
+ adox r14,rbp
+ lea rsi,[32+rsi]
+ lea rbx,[32+rbx]
+ adcx r14,rbp
+
+ adox r10,r15
+ mulx r15,rax,QWORD[rcx]
+ adcx r10,rax
+ adox r11,r15
+ mulx r15,rax,QWORD[8+rcx]
+ adcx r11,rax
+ adox r12,r15
+ mulx r15,rax,QWORD[16+rcx]
+ mov QWORD[((-40))+rbx],r10
+ adcx r12,rax
+ adox r13,r15
+ mov QWORD[((-32))+rbx],r11
+ mulx r15,rax,QWORD[24+rcx]
+ mov rdx,r9
+ lea rcx,[32+rcx]
+ mov QWORD[((-24))+rbx],r12
+ adcx r13,rax
+ adox r15,rbp
+ mov QWORD[((-16))+rbx],r13
+
+ dec rdi
+ jnz NEAR $L$mulx4x_inner
+
+ mov rax,QWORD[((0+8))+rsp]
+ adc r15,rbp
+ sub rdi,QWORD[rbx]
+ mov rdi,QWORD[((8+8))+rsp]
+ mov r10,QWORD[((16+8))+rsp]
+ adc r14,r15
+ lea rsi,[rax*1+rsi]
+ adc rbp,rbp
+ mov QWORD[((-8))+rbx],r14
+
+ cmp rdi,r10
+ jb NEAR $L$mulx4x_outer
+
+ mov r10,QWORD[((-8))+rcx]
+ mov r8,rbp
+ mov r12,QWORD[rax*1+rcx]
+ lea rbp,[rax*1+rcx]
+ mov rcx,rax
+ lea rdi,[rax*1+rbx]
+ xor eax,eax
+ xor r15,r15
+ sub r10,r14
+ adc r15,r15
+ or r8,r15
+ sar rcx,3+2
+ sub rax,r8
+ mov rdx,QWORD[((56+8))+rsp]
+ dec r12
+ mov r13,QWORD[8+rbp]
+ xor r8,r8
+ mov r14,QWORD[16+rbp]
+ mov r15,QWORD[24+rbp]
+ jmp NEAR $L$sqrx4x_sub_entry
+
+
+ALIGN 32
+bn_powerx5:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_bn_powerx5:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ mov rax,rsp
+
+$L$powerx5_enter:
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+$L$powerx5_prologue:
+
+ shl r9d,3
+ lea r10,[r9*2+r9]
+ neg r9
+ mov r8,QWORD[r8]
+
+
+
+
+
+
+
+
+ lea r11,[((-320))+r9*2+rsp]
+ mov rbp,rsp
+ sub r11,rdi
+ and r11,4095
+ cmp r10,r11
+ jb NEAR $L$pwrx_sp_alt
+ sub rbp,r11
+ lea rbp,[((-320))+r9*2+rbp]
+ jmp NEAR $L$pwrx_sp_done
+
+ALIGN 32
+$L$pwrx_sp_alt:
+ lea r10,[((4096-320))+r9*2]
+ lea rbp,[((-320))+r9*2+rbp]
+ sub r11,r10
+ mov r10,0
+ cmovc r11,r10
+ sub rbp,r11
+$L$pwrx_sp_done:
+ and rbp,-64
+ mov r11,rsp
+ sub r11,rbp
+ and r11,-4096
+ lea rsp,[rbp*1+r11]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$pwrx_page_walk
+ jmp NEAR $L$pwrx_page_walk_done
+
+$L$pwrx_page_walk:
+ lea rsp,[((-4096))+rsp]
+ mov r10,QWORD[rsp]
+ cmp rsp,rbp
+ ja NEAR $L$pwrx_page_walk
+$L$pwrx_page_walk_done:
+
+ mov r10,r9
+ neg r9
+
+
+
+
+
+
+
+
+
+
+
+
+ pxor xmm0,xmm0
+DB 102,72,15,110,207
+DB 102,72,15,110,209
+DB 102,73,15,110,218
+DB 102,72,15,110,226
+ mov QWORD[32+rsp],r8
+ mov QWORD[40+rsp],rax
+
+$L$powerx5_body:
+
+ call __bn_sqrx8x_internal
+ call __bn_postx4x_internal
+ call __bn_sqrx8x_internal
+ call __bn_postx4x_internal
+ call __bn_sqrx8x_internal
+ call __bn_postx4x_internal
+ call __bn_sqrx8x_internal
+ call __bn_postx4x_internal
+ call __bn_sqrx8x_internal
+ call __bn_postx4x_internal
+
+ mov r9,r10
+ mov rdi,rsi
+DB 102,72,15,126,209
+DB 102,72,15,126,226
+ mov rax,QWORD[40+rsp]
+
+ call mulx4x_internal
+
+ mov rsi,QWORD[40+rsp]
+
+ mov rax,1
+
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$powerx5_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_bn_powerx5:
+
+global bn_sqrx8x_internal
+
+
+ALIGN 32
+bn_sqrx8x_internal:
+__bn_sqrx8x_internal:
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ lea rdi,[((48+8))+rsp]
+ lea rbp,[r9*1+rsi]
+ mov QWORD[((0+8))+rsp],r9
+ mov QWORD[((8+8))+rsp],rbp
+ jmp NEAR $L$sqr8x_zero_start
+
+ALIGN 32
+DB 0x66,0x66,0x66,0x2e,0x0f,0x1f,0x84,0x00,0x00,0x00,0x00,0x00
+$L$sqrx8x_zero:
+DB 0x3e
+ movdqa XMMWORD[rdi],xmm0
+ movdqa XMMWORD[16+rdi],xmm0
+ movdqa XMMWORD[32+rdi],xmm0
+ movdqa XMMWORD[48+rdi],xmm0
+$L$sqr8x_zero_start:
+ movdqa XMMWORD[64+rdi],xmm0
+ movdqa XMMWORD[80+rdi],xmm0
+ movdqa XMMWORD[96+rdi],xmm0
+ movdqa XMMWORD[112+rdi],xmm0
+ lea rdi,[128+rdi]
+ sub r9,64
+ jnz NEAR $L$sqrx8x_zero
+
+ mov rdx,QWORD[rsi]
+
+ xor r10,r10
+ xor r11,r11
+ xor r12,r12
+ xor r13,r13
+ xor r14,r14
+ xor r15,r15
+ lea rdi,[((48+8))+rsp]
+ xor rbp,rbp
+ jmp NEAR $L$sqrx8x_outer_loop
+
+ALIGN 32
+$L$sqrx8x_outer_loop:
+ mulx rax,r8,QWORD[8+rsi]
+ adcx r8,r9
+ adox r10,rax
+ mulx rax,r9,QWORD[16+rsi]
+ adcx r9,r10
+ adox r11,rax
+DB 0xc4,0xe2,0xab,0xf6,0x86,0x18,0x00,0x00,0x00
+ adcx r10,r11
+ adox r12,rax
+DB 0xc4,0xe2,0xa3,0xf6,0x86,0x20,0x00,0x00,0x00
+ adcx r11,r12
+ adox r13,rax
+ mulx rax,r12,QWORD[40+rsi]
+ adcx r12,r13
+ adox r14,rax
+ mulx rax,r13,QWORD[48+rsi]
+ adcx r13,r14
+ adox rax,r15
+ mulx r15,r14,QWORD[56+rsi]
+ mov rdx,QWORD[8+rsi]
+ adcx r14,rax
+ adox r15,rbp
+ adc r15,QWORD[64+rdi]
+ mov QWORD[8+rdi],r8
+ mov QWORD[16+rdi],r9
+ sbb rcx,rcx
+ xor rbp,rbp
+
+
+ mulx rbx,r8,QWORD[16+rsi]
+ mulx rax,r9,QWORD[24+rsi]
+ adcx r8,r10
+ adox r9,rbx
+ mulx rbx,r10,QWORD[32+rsi]
+ adcx r9,r11
+ adox r10,rax
+DB 0xc4,0xe2,0xa3,0xf6,0x86,0x28,0x00,0x00,0x00
+ adcx r10,r12
+ adox r11,rbx
+DB 0xc4,0xe2,0x9b,0xf6,0x9e,0x30,0x00,0x00,0x00
+ adcx r11,r13
+ adox r12,r14
+DB 0xc4,0x62,0x93,0xf6,0xb6,0x38,0x00,0x00,0x00
+ mov rdx,QWORD[16+rsi]
+ adcx r12,rax
+ adox r13,rbx
+ adcx r13,r15
+ adox r14,rbp
+ adcx r14,rbp
+
+ mov QWORD[24+rdi],r8
+ mov QWORD[32+rdi],r9
+
+ mulx rbx,r8,QWORD[24+rsi]
+ mulx rax,r9,QWORD[32+rsi]
+ adcx r8,r10
+ adox r9,rbx
+ mulx rbx,r10,QWORD[40+rsi]
+ adcx r9,r11
+ adox r10,rax
+DB 0xc4,0xe2,0xa3,0xf6,0x86,0x30,0x00,0x00,0x00
+ adcx r10,r12
+ adox r11,r13
+DB 0xc4,0x62,0x9b,0xf6,0xae,0x38,0x00,0x00,0x00
+DB 0x3e
+ mov rdx,QWORD[24+rsi]
+ adcx r11,rbx
+ adox r12,rax
+ adcx r12,r14
+ mov QWORD[40+rdi],r8
+ mov QWORD[48+rdi],r9
+ mulx rax,r8,QWORD[32+rsi]
+ adox r13,rbp
+ adcx r13,rbp
+
+ mulx rbx,r9,QWORD[40+rsi]
+ adcx r8,r10
+ adox r9,rax
+ mulx rax,r10,QWORD[48+rsi]
+ adcx r9,r11
+ adox r10,r12
+ mulx r12,r11,QWORD[56+rsi]
+ mov rdx,QWORD[32+rsi]
+ mov r14,QWORD[40+rsi]
+ adcx r10,rbx
+ adox r11,rax
+ mov r15,QWORD[48+rsi]
+ adcx r11,r13
+ adox r12,rbp
+ adcx r12,rbp
+
+ mov QWORD[56+rdi],r8
+ mov QWORD[64+rdi],r9
+
+ mulx rax,r9,r14
+ mov r8,QWORD[56+rsi]
+ adcx r9,r10
+ mulx rbx,r10,r15
+ adox r10,rax
+ adcx r10,r11
+ mulx rax,r11,r8
+ mov rdx,r14
+ adox r11,rbx
+ adcx r11,r12
+
+ adcx rax,rbp
+
+ mulx rbx,r14,r15
+ mulx r13,r12,r8
+ mov rdx,r15
+ lea rsi,[64+rsi]
+ adcx r11,r14
+ adox r12,rbx
+ adcx r12,rax
+ adox r13,rbp
+
+DB 0x67,0x67
+ mulx r14,r8,r8
+ adcx r13,r8
+ adcx r14,rbp
+
+ cmp rsi,QWORD[((8+8))+rsp]
+ je NEAR $L$sqrx8x_outer_break
+
+ neg rcx
+ mov rcx,-8
+ mov r15,rbp
+ mov r8,QWORD[64+rdi]
+ adcx r9,QWORD[72+rdi]
+ adcx r10,QWORD[80+rdi]
+ adcx r11,QWORD[88+rdi]
+ adc r12,QWORD[96+rdi]
+ adc r13,QWORD[104+rdi]
+ adc r14,QWORD[112+rdi]
+ adc r15,QWORD[120+rdi]
+ lea rbp,[rsi]
+ lea rdi,[128+rdi]
+ sbb rax,rax
+
+ mov rdx,QWORD[((-64))+rsi]
+ mov QWORD[((16+8))+rsp],rax
+ mov QWORD[((24+8))+rsp],rdi
+
+
+ xor eax,eax
+ jmp NEAR $L$sqrx8x_loop
+
+ALIGN 32
+$L$sqrx8x_loop:
+ mov rbx,r8
+ mulx r8,rax,QWORD[rbp]
+ adcx rbx,rax
+ adox r8,r9
+
+ mulx r9,rax,QWORD[8+rbp]
+ adcx r8,rax
+ adox r9,r10
+
+ mulx r10,rax,QWORD[16+rbp]
+ adcx r9,rax
+ adox r10,r11
+
+ mulx r11,rax,QWORD[24+rbp]
+ adcx r10,rax
+ adox r11,r12
+
+DB 0xc4,0x62,0xfb,0xf6,0xa5,0x20,0x00,0x00,0x00
+ adcx r11,rax
+ adox r12,r13
+
+ mulx r13,rax,QWORD[40+rbp]
+ adcx r12,rax
+ adox r13,r14
+
+ mulx r14,rax,QWORD[48+rbp]
+ mov QWORD[rcx*8+rdi],rbx
+ mov ebx,0
+ adcx r13,rax
+ adox r14,r15
+
+DB 0xc4,0x62,0xfb,0xf6,0xbd,0x38,0x00,0x00,0x00
+ mov rdx,QWORD[8+rcx*8+rsi]
+ adcx r14,rax
+ adox r15,rbx
+ adcx r15,rbx
+
+DB 0x67
+ inc rcx
+ jnz NEAR $L$sqrx8x_loop
+
+ lea rbp,[64+rbp]
+ mov rcx,-8
+ cmp rbp,QWORD[((8+8))+rsp]
+ je NEAR $L$sqrx8x_break
+
+ sub rbx,QWORD[((16+8))+rsp]
+DB 0x66
+ mov rdx,QWORD[((-64))+rsi]
+ adcx r8,QWORD[rdi]
+ adcx r9,QWORD[8+rdi]
+ adc r10,QWORD[16+rdi]
+ adc r11,QWORD[24+rdi]
+ adc r12,QWORD[32+rdi]
+ adc r13,QWORD[40+rdi]
+ adc r14,QWORD[48+rdi]
+ adc r15,QWORD[56+rdi]
+ lea rdi,[64+rdi]
+DB 0x67
+ sbb rax,rax
+ xor ebx,ebx
+ mov QWORD[((16+8))+rsp],rax
+ jmp NEAR $L$sqrx8x_loop
+
+ALIGN 32
+$L$sqrx8x_break:
+ xor rbp,rbp
+ sub rbx,QWORD[((16+8))+rsp]
+ adcx r8,rbp
+ mov rcx,QWORD[((24+8))+rsp]
+ adcx r9,rbp
+ mov rdx,QWORD[rsi]
+ adc r10,0
+ mov QWORD[rdi],r8
+ adc r11,0
+ adc r12,0
+ adc r13,0
+ adc r14,0
+ adc r15,0
+ cmp rdi,rcx
+ je NEAR $L$sqrx8x_outer_loop
+
+ mov QWORD[8+rdi],r9
+ mov r9,QWORD[8+rcx]
+ mov QWORD[16+rdi],r10
+ mov r10,QWORD[16+rcx]
+ mov QWORD[24+rdi],r11
+ mov r11,QWORD[24+rcx]
+ mov QWORD[32+rdi],r12
+ mov r12,QWORD[32+rcx]
+ mov QWORD[40+rdi],r13
+ mov r13,QWORD[40+rcx]
+ mov QWORD[48+rdi],r14
+ mov r14,QWORD[48+rcx]
+ mov QWORD[56+rdi],r15
+ mov r15,QWORD[56+rcx]
+ mov rdi,rcx
+ jmp NEAR $L$sqrx8x_outer_loop
+
+ALIGN 32
+$L$sqrx8x_outer_break:
+ mov QWORD[72+rdi],r9
+DB 102,72,15,126,217
+ mov QWORD[80+rdi],r10
+ mov QWORD[88+rdi],r11
+ mov QWORD[96+rdi],r12
+ mov QWORD[104+rdi],r13
+ mov QWORD[112+rdi],r14
+ lea rdi,[((48+8))+rsp]
+ mov rdx,QWORD[rcx*1+rsi]
+
+ mov r11,QWORD[8+rdi]
+ xor r10,r10
+ mov r9,QWORD[((0+8))+rsp]
+ adox r11,r11
+ mov r12,QWORD[16+rdi]
+ mov r13,QWORD[24+rdi]
+
+
+ALIGN 32
+$L$sqrx4x_shift_n_add:
+ mulx rbx,rax,rdx
+ adox r12,r12
+ adcx rax,r10
+DB 0x48,0x8b,0x94,0x0e,0x08,0x00,0x00,0x00
+DB 0x4c,0x8b,0x97,0x20,0x00,0x00,0x00
+ adox r13,r13
+ adcx rbx,r11
+ mov r11,QWORD[40+rdi]
+ mov QWORD[rdi],rax
+ mov QWORD[8+rdi],rbx
+
+ mulx rbx,rax,rdx
+ adox r10,r10
+ adcx rax,r12
+ mov rdx,QWORD[16+rcx*1+rsi]
+ mov r12,QWORD[48+rdi]
+ adox r11,r11
+ adcx rbx,r13
+ mov r13,QWORD[56+rdi]
+ mov QWORD[16+rdi],rax
+ mov QWORD[24+rdi],rbx
+
+ mulx rbx,rax,rdx
+ adox r12,r12
+ adcx rax,r10
+ mov rdx,QWORD[24+rcx*1+rsi]
+ lea rcx,[32+rcx]
+ mov r10,QWORD[64+rdi]
+ adox r13,r13
+ adcx rbx,r11
+ mov r11,QWORD[72+rdi]
+ mov QWORD[32+rdi],rax
+ mov QWORD[40+rdi],rbx
+
+ mulx rbx,rax,rdx
+ adox r10,r10
+ adcx rax,r12
+ jrcxz $L$sqrx4x_shift_n_add_break
+DB 0x48,0x8b,0x94,0x0e,0x00,0x00,0x00,0x00
+ adox r11,r11
+ adcx rbx,r13
+ mov r12,QWORD[80+rdi]
+ mov r13,QWORD[88+rdi]
+ mov QWORD[48+rdi],rax
+ mov QWORD[56+rdi],rbx
+ lea rdi,[64+rdi]
+ nop
+ jmp NEAR $L$sqrx4x_shift_n_add
+
+ALIGN 32
+$L$sqrx4x_shift_n_add_break:
+ adcx rbx,r13
+ mov QWORD[48+rdi],rax
+ mov QWORD[56+rdi],rbx
+ lea rdi,[64+rdi]
+DB 102,72,15,126,213
+__bn_sqrx8x_reduction:
+ xor eax,eax
+ mov rbx,QWORD[((32+8))+rsp]
+ mov rdx,QWORD[((48+8))+rsp]
+ lea rcx,[((-64))+r9*1+rbp]
+
+ mov QWORD[((0+8))+rsp],rcx
+ mov QWORD[((8+8))+rsp],rdi
+
+ lea rdi,[((48+8))+rsp]
+ jmp NEAR $L$sqrx8x_reduction_loop
+
+ALIGN 32
+$L$sqrx8x_reduction_loop:
+ mov r9,QWORD[8+rdi]
+ mov r10,QWORD[16+rdi]
+ mov r11,QWORD[24+rdi]
+ mov r12,QWORD[32+rdi]
+ mov r8,rdx
+ imul rdx,rbx
+ mov r13,QWORD[40+rdi]
+ mov r14,QWORD[48+rdi]
+ mov r15,QWORD[56+rdi]
+ mov QWORD[((24+8))+rsp],rax
+
+ lea rdi,[64+rdi]
+ xor rsi,rsi
+ mov rcx,-8
+ jmp NEAR $L$sqrx8x_reduce
+
+ALIGN 32
+$L$sqrx8x_reduce:
+ mov rbx,r8
+ mulx r8,rax,QWORD[rbp]
+ adcx rax,rbx
+ adox r8,r9
+
+ mulx r9,rbx,QWORD[8+rbp]
+ adcx r8,rbx
+ adox r9,r10
+
+ mulx r10,rbx,QWORD[16+rbp]
+ adcx r9,rbx
+ adox r10,r11
+
+ mulx r11,rbx,QWORD[24+rbp]
+ adcx r10,rbx
+ adox r11,r12
+
+DB 0xc4,0x62,0xe3,0xf6,0xa5,0x20,0x00,0x00,0x00
+ mov rax,rdx
+ mov rdx,r8
+ adcx r11,rbx
+ adox r12,r13
+
+ mulx rdx,rbx,QWORD[((32+8))+rsp]
+ mov rdx,rax
+ mov QWORD[((64+48+8))+rcx*8+rsp],rax
+
+ mulx r13,rax,QWORD[40+rbp]
+ adcx r12,rax
+ adox r13,r14
+
+ mulx r14,rax,QWORD[48+rbp]
+ adcx r13,rax
+ adox r14,r15
+
+ mulx r15,rax,QWORD[56+rbp]
+ mov rdx,rbx
+ adcx r14,rax
+ adox r15,rsi
+ adcx r15,rsi
+
+DB 0x67,0x67,0x67
+ inc rcx
+ jnz NEAR $L$sqrx8x_reduce
+
+ mov rax,rsi
+ cmp rbp,QWORD[((0+8))+rsp]
+ jae NEAR $L$sqrx8x_no_tail
+
+ mov rdx,QWORD[((48+8))+rsp]
+ add r8,QWORD[rdi]
+ lea rbp,[64+rbp]
+ mov rcx,-8
+ adcx r9,QWORD[8+rdi]
+ adcx r10,QWORD[16+rdi]
+ adc r11,QWORD[24+rdi]
+ adc r12,QWORD[32+rdi]
+ adc r13,QWORD[40+rdi]
+ adc r14,QWORD[48+rdi]
+ adc r15,QWORD[56+rdi]
+ lea rdi,[64+rdi]
+ sbb rax,rax
+
+ xor rsi,rsi
+ mov QWORD[((16+8))+rsp],rax
+ jmp NEAR $L$sqrx8x_tail
+
+ALIGN 32
+$L$sqrx8x_tail:
+ mov rbx,r8
+ mulx r8,rax,QWORD[rbp]
+ adcx rbx,rax
+ adox r8,r9
+
+ mulx r9,rax,QWORD[8+rbp]
+ adcx r8,rax
+ adox r9,r10
+
+ mulx r10,rax,QWORD[16+rbp]
+ adcx r9,rax
+ adox r10,r11
+
+ mulx r11,rax,QWORD[24+rbp]
+ adcx r10,rax
+ adox r11,r12
+
+DB 0xc4,0x62,0xfb,0xf6,0xa5,0x20,0x00,0x00,0x00
+ adcx r11,rax
+ adox r12,r13
+
+ mulx r13,rax,QWORD[40+rbp]
+ adcx r12,rax
+ adox r13,r14
+
+ mulx r14,rax,QWORD[48+rbp]
+ adcx r13,rax
+ adox r14,r15
+
+ mulx r15,rax,QWORD[56+rbp]
+ mov rdx,QWORD[((72+48+8))+rcx*8+rsp]
+ adcx r14,rax
+ adox r15,rsi
+ mov QWORD[rcx*8+rdi],rbx
+ mov rbx,r8
+ adcx r15,rsi
+
+ inc rcx
+ jnz NEAR $L$sqrx8x_tail
+
+ cmp rbp,QWORD[((0+8))+rsp]
+ jae NEAR $L$sqrx8x_tail_done
+
+ sub rsi,QWORD[((16+8))+rsp]
+ mov rdx,QWORD[((48+8))+rsp]
+ lea rbp,[64+rbp]
+ adc r8,QWORD[rdi]
+ adc r9,QWORD[8+rdi]
+ adc r10,QWORD[16+rdi]
+ adc r11,QWORD[24+rdi]
+ adc r12,QWORD[32+rdi]
+ adc r13,QWORD[40+rdi]
+ adc r14,QWORD[48+rdi]
+ adc r15,QWORD[56+rdi]
+ lea rdi,[64+rdi]
+ sbb rax,rax
+ sub rcx,8
+
+ xor rsi,rsi
+ mov QWORD[((16+8))+rsp],rax
+ jmp NEAR $L$sqrx8x_tail
+
+ALIGN 32
+$L$sqrx8x_tail_done:
+ xor rax,rax
+ add r8,QWORD[((24+8))+rsp]
+ adc r9,0
+ adc r10,0
+ adc r11,0
+ adc r12,0
+ adc r13,0
+ adc r14,0
+ adc r15,0
+ adc rax,0
+
+ sub rsi,QWORD[((16+8))+rsp]
+$L$sqrx8x_no_tail:
+ adc r8,QWORD[rdi]
+DB 102,72,15,126,217
+ adc r9,QWORD[8+rdi]
+ mov rsi,QWORD[56+rbp]
+DB 102,72,15,126,213
+ adc r10,QWORD[16+rdi]
+ adc r11,QWORD[24+rdi]
+ adc r12,QWORD[32+rdi]
+ adc r13,QWORD[40+rdi]
+ adc r14,QWORD[48+rdi]
+ adc r15,QWORD[56+rdi]
+ adc rax,0
+
+ mov rbx,QWORD[((32+8))+rsp]
+ mov rdx,QWORD[64+rcx*1+rdi]
+
+ mov QWORD[rdi],r8
+ lea r8,[64+rdi]
+ mov QWORD[8+rdi],r9
+ mov QWORD[16+rdi],r10
+ mov QWORD[24+rdi],r11
+ mov QWORD[32+rdi],r12
+ mov QWORD[40+rdi],r13
+ mov QWORD[48+rdi],r14
+ mov QWORD[56+rdi],r15
+
+ lea rdi,[64+rcx*1+rdi]
+ cmp r8,QWORD[((8+8))+rsp]
+ jb NEAR $L$sqrx8x_reduction_loop
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 32
+__bn_postx4x_internal:
+ mov r12,QWORD[rbp]
+ mov r10,rcx
+ mov r9,rcx
+ neg rax
+ sar rcx,3+2
+
+DB 102,72,15,126,202
+DB 102,72,15,126,206
+ dec r12
+ mov r13,QWORD[8+rbp]
+ xor r8,r8
+ mov r14,QWORD[16+rbp]
+ mov r15,QWORD[24+rbp]
+ jmp NEAR $L$sqrx4x_sub_entry
+
+ALIGN 16
+$L$sqrx4x_sub:
+ mov r12,QWORD[rbp]
+ mov r13,QWORD[8+rbp]
+ mov r14,QWORD[16+rbp]
+ mov r15,QWORD[24+rbp]
+$L$sqrx4x_sub_entry:
+ andn r12,r12,rax
+ lea rbp,[32+rbp]
+ andn r13,r13,rax
+ andn r14,r14,rax
+ andn r15,r15,rax
+
+ neg r8
+ adc r12,QWORD[rdi]
+ adc r13,QWORD[8+rdi]
+ adc r14,QWORD[16+rdi]
+ adc r15,QWORD[24+rdi]
+ mov QWORD[rdx],r12
+ lea rdi,[32+rdi]
+ mov QWORD[8+rdx],r13
+ sbb r8,r8
+ mov QWORD[16+rdx],r14
+ mov QWORD[24+rdx],r15
+ lea rdx,[32+rdx]
+
+ inc rcx
+ jnz NEAR $L$sqrx4x_sub
+
+ neg r9
+
+ DB 0F3h,0C3h ;repret
+
+global bn_get_bits5
+
+ALIGN 16
+bn_get_bits5:
+ lea r10,[rcx]
+ lea r11,[1+rcx]
+ mov ecx,edx
+ shr edx,4
+ and ecx,15
+ lea eax,[((-8))+rcx]
+ cmp ecx,11
+ cmova r10,r11
+ cmova ecx,eax
+ movzx eax,WORD[rdx*2+r10]
+ shr eax,cl
+ and eax,31
+ DB 0F3h,0C3h ;repret
+
+
+global bn_scatter5
+
+ALIGN 16
+bn_scatter5:
+ cmp edx,0
+ jz NEAR $L$scatter_epilogue
+ lea r8,[r9*8+r8]
+$L$scatter:
+ mov rax,QWORD[rcx]
+ lea rcx,[8+rcx]
+ mov QWORD[r8],rax
+ lea r8,[256+r8]
+ sub edx,1
+ jnz NEAR $L$scatter
+$L$scatter_epilogue:
+ DB 0F3h,0C3h ;repret
+
+
+global bn_gather5
+
+ALIGN 32
+bn_gather5:
+$L$SEH_begin_bn_gather5:
+
+DB 0x4c,0x8d,0x14,0x24
+DB 0x48,0x81,0xec,0x08,0x01,0x00,0x00
+ lea rax,[$L$inc]
+ and rsp,-16
+
+ movd xmm5,r9d
+ movdqa xmm0,XMMWORD[rax]
+ movdqa xmm1,XMMWORD[16+rax]
+ lea r11,[128+r8]
+ lea rax,[128+rsp]
+
+ pshufd xmm5,xmm5,0
+ movdqa xmm4,xmm1
+ movdqa xmm2,xmm1
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa xmm3,xmm4
+
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[(-128)+rax],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[(-112)+rax],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[(-96)+rax],xmm2
+ movdqa xmm2,xmm4
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[(-80)+rax],xmm3
+ movdqa xmm3,xmm4
+
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[(-64)+rax],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[(-48)+rax],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[(-32)+rax],xmm2
+ movdqa xmm2,xmm4
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[(-16)+rax],xmm3
+ movdqa xmm3,xmm4
+
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[rax],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[16+rax],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[32+rax],xmm2
+ movdqa xmm2,xmm4
+ paddd xmm1,xmm0
+ pcmpeqd xmm0,xmm5
+ movdqa XMMWORD[48+rax],xmm3
+ movdqa xmm3,xmm4
+
+ paddd xmm2,xmm1
+ pcmpeqd xmm1,xmm5
+ movdqa XMMWORD[64+rax],xmm0
+ movdqa xmm0,xmm4
+
+ paddd xmm3,xmm2
+ pcmpeqd xmm2,xmm5
+ movdqa XMMWORD[80+rax],xmm1
+ movdqa xmm1,xmm4
+
+ paddd xmm0,xmm3
+ pcmpeqd xmm3,xmm5
+ movdqa XMMWORD[96+rax],xmm2
+ movdqa xmm2,xmm4
+ movdqa XMMWORD[112+rax],xmm3
+ jmp NEAR $L$gather
+
+ALIGN 32
+$L$gather:
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ movdqa xmm0,XMMWORD[((-128))+r11]
+ movdqa xmm1,XMMWORD[((-112))+r11]
+ movdqa xmm2,XMMWORD[((-96))+r11]
+ pand xmm0,XMMWORD[((-128))+rax]
+ movdqa xmm3,XMMWORD[((-80))+r11]
+ pand xmm1,XMMWORD[((-112))+rax]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[((-96))+rax]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[((-80))+rax]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[((-64))+r11]
+ movdqa xmm1,XMMWORD[((-48))+r11]
+ movdqa xmm2,XMMWORD[((-32))+r11]
+ pand xmm0,XMMWORD[((-64))+rax]
+ movdqa xmm3,XMMWORD[((-16))+r11]
+ pand xmm1,XMMWORD[((-48))+rax]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[((-32))+rax]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[((-16))+rax]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[r11]
+ movdqa xmm1,XMMWORD[16+r11]
+ movdqa xmm2,XMMWORD[32+r11]
+ pand xmm0,XMMWORD[rax]
+ movdqa xmm3,XMMWORD[48+r11]
+ pand xmm1,XMMWORD[16+rax]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[32+rax]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[48+rax]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ movdqa xmm0,XMMWORD[64+r11]
+ movdqa xmm1,XMMWORD[80+r11]
+ movdqa xmm2,XMMWORD[96+r11]
+ pand xmm0,XMMWORD[64+rax]
+ movdqa xmm3,XMMWORD[112+r11]
+ pand xmm1,XMMWORD[80+rax]
+ por xmm4,xmm0
+ pand xmm2,XMMWORD[96+rax]
+ por xmm5,xmm1
+ pand xmm3,XMMWORD[112+rax]
+ por xmm4,xmm2
+ por xmm5,xmm3
+ por xmm4,xmm5
+ lea r11,[256+r11]
+ pshufd xmm0,xmm4,0x4e
+ por xmm0,xmm4
+ movq QWORD[rcx],xmm0
+ lea rcx,[8+rcx]
+ sub edx,1
+ jnz NEAR $L$gather
+
+ lea rsp,[r10]
+ DB 0F3h,0C3h ;repret
+$L$SEH_end_bn_gather5:
+
+ALIGN 64
+$L$inc:
+ DD 0,0,1,1
+ DD 2,2,2,2
+DB 77,111,110,116,103,111,109,101,114,121,32,77,117,108,116,105
+DB 112,108,105,99,97,116,105,111,110,32,119,105,116,104,32,115
+DB 99,97,116,116,101,114,47,103,97,116,104,101,114,32,102,111
+DB 114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79
+DB 71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111
+DB 112,101,110,115,115,108,46,111,114,103,62,0
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+mul_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_pop_regs
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[8+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ lea r10,[$L$mul_epilogue]
+ cmp rbx,r10
+ ja NEAR $L$body_40
+
+ mov r10,QWORD[192+r8]
+ mov rax,QWORD[8+r10*8+rax]
+
+ jmp NEAR $L$common_pop_regs
+
+$L$body_40:
+ mov rax,QWORD[40+rax]
+$L$common_pop_regs:
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+$L$common_seh_tail:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_bn_mul_mont_gather5 wrt ..imagebase
+ DD $L$SEH_end_bn_mul_mont_gather5 wrt ..imagebase
+ DD $L$SEH_info_bn_mul_mont_gather5 wrt ..imagebase
+
+ DD $L$SEH_begin_bn_mul4x_mont_gather5 wrt ..imagebase
+ DD $L$SEH_end_bn_mul4x_mont_gather5 wrt ..imagebase
+ DD $L$SEH_info_bn_mul4x_mont_gather5 wrt ..imagebase
+
+ DD $L$SEH_begin_bn_power5 wrt ..imagebase
+ DD $L$SEH_end_bn_power5 wrt ..imagebase
+ DD $L$SEH_info_bn_power5 wrt ..imagebase
+
+ DD $L$SEH_begin_bn_from_mont8x wrt ..imagebase
+ DD $L$SEH_end_bn_from_mont8x wrt ..imagebase
+ DD $L$SEH_info_bn_from_mont8x wrt ..imagebase
+ DD $L$SEH_begin_bn_mulx4x_mont_gather5 wrt ..imagebase
+ DD $L$SEH_end_bn_mulx4x_mont_gather5 wrt ..imagebase
+ DD $L$SEH_info_bn_mulx4x_mont_gather5 wrt ..imagebase
+
+ DD $L$SEH_begin_bn_powerx5 wrt ..imagebase
+ DD $L$SEH_end_bn_powerx5 wrt ..imagebase
+ DD $L$SEH_info_bn_powerx5 wrt ..imagebase
+ DD $L$SEH_begin_bn_gather5 wrt ..imagebase
+ DD $L$SEH_end_bn_gather5 wrt ..imagebase
+ DD $L$SEH_info_bn_gather5 wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_bn_mul_mont_gather5:
+DB 9,0,0,0
+ DD mul_handler wrt ..imagebase
+ DD $L$mul_body wrt ..imagebase,$L$mul_body wrt ..imagebase,$L$mul_epilogue wrt ..imagebase
+ALIGN 8
+$L$SEH_info_bn_mul4x_mont_gather5:
+DB 9,0,0,0
+ DD mul_handler wrt ..imagebase
+ DD $L$mul4x_prologue wrt ..imagebase,$L$mul4x_body wrt ..imagebase,$L$mul4x_epilogue wrt ..imagebase
+ALIGN 8
+$L$SEH_info_bn_power5:
+DB 9,0,0,0
+ DD mul_handler wrt ..imagebase
+ DD $L$power5_prologue wrt ..imagebase,$L$power5_body wrt ..imagebase,$L$power5_epilogue wrt ..imagebase
+ALIGN 8
+$L$SEH_info_bn_from_mont8x:
+DB 9,0,0,0
+ DD mul_handler wrt ..imagebase
+ DD $L$from_prologue wrt ..imagebase,$L$from_body wrt ..imagebase,$L$from_epilogue wrt ..imagebase
+ALIGN 8
+$L$SEH_info_bn_mulx4x_mont_gather5:
+DB 9,0,0,0
+ DD mul_handler wrt ..imagebase
+ DD $L$mulx4x_prologue wrt ..imagebase,$L$mulx4x_body wrt ..imagebase,$L$mulx4x_epilogue wrt ..imagebase
+ALIGN 8
+$L$SEH_info_bn_powerx5:
+DB 9,0,0,0
+ DD mul_handler wrt ..imagebase
+ DD $L$powerx5_prologue wrt ..imagebase,$L$powerx5_body wrt ..imagebase,$L$powerx5_epilogue wrt ..imagebase
+ALIGN 8
+$L$SEH_info_bn_gather5:
+DB 0x01,0x0b,0x03,0x0a
+DB 0x0b,0x01,0x21,0x00
+DB 0x04,0xa3,0x00,0x00
+ALIGN 8
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm
new file mode 100644
index 0000000000..ff688eeb06
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm
@@ -0,0 +1,794 @@
+; Author: Marc Bevand <bevand_m (at) epita.fr>
+; Copyright 2005-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+ALIGN 16
+
+global md5_block_asm_data_order
+
+md5_block_asm_data_order:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_md5_block_asm_data_order:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ push rbp
+
+ push rbx
+
+ push r12
+
+ push r14
+
+ push r15
+
+$L$prologue:
+
+
+
+
+ mov rbp,rdi
+ shl rdx,6
+ lea rdi,[rdx*1+rsi]
+ mov eax,DWORD[rbp]
+ mov ebx,DWORD[4+rbp]
+ mov ecx,DWORD[8+rbp]
+ mov edx,DWORD[12+rbp]
+
+
+
+
+
+
+
+ cmp rsi,rdi
+ je NEAR $L$end
+
+
+$L$loop:
+ mov r8d,eax
+ mov r9d,ebx
+ mov r14d,ecx
+ mov r15d,edx
+ mov r10d,DWORD[rsi]
+ mov r11d,edx
+ xor r11d,ecx
+ lea eax,[((-680876936))+r10*1+rax]
+ and r11d,ebx
+ mov r10d,DWORD[4+rsi]
+ xor r11d,edx
+ add eax,r11d
+ rol eax,7
+ mov r11d,ecx
+ add eax,ebx
+ xor r11d,ebx
+ lea edx,[((-389564586))+r10*1+rdx]
+ and r11d,eax
+ mov r10d,DWORD[8+rsi]
+ xor r11d,ecx
+ add edx,r11d
+ rol edx,12
+ mov r11d,ebx
+ add edx,eax
+ xor r11d,eax
+ lea ecx,[606105819+r10*1+rcx]
+ and r11d,edx
+ mov r10d,DWORD[12+rsi]
+ xor r11d,ebx
+ add ecx,r11d
+ rol ecx,17
+ mov r11d,eax
+ add ecx,edx
+ xor r11d,edx
+ lea ebx,[((-1044525330))+r10*1+rbx]
+ and r11d,ecx
+ mov r10d,DWORD[16+rsi]
+ xor r11d,eax
+ add ebx,r11d
+ rol ebx,22
+ mov r11d,edx
+ add ebx,ecx
+ xor r11d,ecx
+ lea eax,[((-176418897))+r10*1+rax]
+ and r11d,ebx
+ mov r10d,DWORD[20+rsi]
+ xor r11d,edx
+ add eax,r11d
+ rol eax,7
+ mov r11d,ecx
+ add eax,ebx
+ xor r11d,ebx
+ lea edx,[1200080426+r10*1+rdx]
+ and r11d,eax
+ mov r10d,DWORD[24+rsi]
+ xor r11d,ecx
+ add edx,r11d
+ rol edx,12
+ mov r11d,ebx
+ add edx,eax
+ xor r11d,eax
+ lea ecx,[((-1473231341))+r10*1+rcx]
+ and r11d,edx
+ mov r10d,DWORD[28+rsi]
+ xor r11d,ebx
+ add ecx,r11d
+ rol ecx,17
+ mov r11d,eax
+ add ecx,edx
+ xor r11d,edx
+ lea ebx,[((-45705983))+r10*1+rbx]
+ and r11d,ecx
+ mov r10d,DWORD[32+rsi]
+ xor r11d,eax
+ add ebx,r11d
+ rol ebx,22
+ mov r11d,edx
+ add ebx,ecx
+ xor r11d,ecx
+ lea eax,[1770035416+r10*1+rax]
+ and r11d,ebx
+ mov r10d,DWORD[36+rsi]
+ xor r11d,edx
+ add eax,r11d
+ rol eax,7
+ mov r11d,ecx
+ add eax,ebx
+ xor r11d,ebx
+ lea edx,[((-1958414417))+r10*1+rdx]
+ and r11d,eax
+ mov r10d,DWORD[40+rsi]
+ xor r11d,ecx
+ add edx,r11d
+ rol edx,12
+ mov r11d,ebx
+ add edx,eax
+ xor r11d,eax
+ lea ecx,[((-42063))+r10*1+rcx]
+ and r11d,edx
+ mov r10d,DWORD[44+rsi]
+ xor r11d,ebx
+ add ecx,r11d
+ rol ecx,17
+ mov r11d,eax
+ add ecx,edx
+ xor r11d,edx
+ lea ebx,[((-1990404162))+r10*1+rbx]
+ and r11d,ecx
+ mov r10d,DWORD[48+rsi]
+ xor r11d,eax
+ add ebx,r11d
+ rol ebx,22
+ mov r11d,edx
+ add ebx,ecx
+ xor r11d,ecx
+ lea eax,[1804603682+r10*1+rax]
+ and r11d,ebx
+ mov r10d,DWORD[52+rsi]
+ xor r11d,edx
+ add eax,r11d
+ rol eax,7
+ mov r11d,ecx
+ add eax,ebx
+ xor r11d,ebx
+ lea edx,[((-40341101))+r10*1+rdx]
+ and r11d,eax
+ mov r10d,DWORD[56+rsi]
+ xor r11d,ecx
+ add edx,r11d
+ rol edx,12
+ mov r11d,ebx
+ add edx,eax
+ xor r11d,eax
+ lea ecx,[((-1502002290))+r10*1+rcx]
+ and r11d,edx
+ mov r10d,DWORD[60+rsi]
+ xor r11d,ebx
+ add ecx,r11d
+ rol ecx,17
+ mov r11d,eax
+ add ecx,edx
+ xor r11d,edx
+ lea ebx,[1236535329+r10*1+rbx]
+ and r11d,ecx
+ mov r10d,DWORD[4+rsi]
+ xor r11d,eax
+ add ebx,r11d
+ rol ebx,22
+ mov r11d,edx
+ add ebx,ecx
+ mov r11d,edx
+ mov r12d,edx
+ not r11d
+ and r12d,ebx
+ lea eax,[((-165796510))+r10*1+rax]
+ and r11d,ecx
+ mov r10d,DWORD[24+rsi]
+ or r12d,r11d
+ mov r11d,ecx
+ add eax,r12d
+ mov r12d,ecx
+ rol eax,5
+ add eax,ebx
+ not r11d
+ and r12d,eax
+ lea edx,[((-1069501632))+r10*1+rdx]
+ and r11d,ebx
+ mov r10d,DWORD[44+rsi]
+ or r12d,r11d
+ mov r11d,ebx
+ add edx,r12d
+ mov r12d,ebx
+ rol edx,9
+ add edx,eax
+ not r11d
+ and r12d,edx
+ lea ecx,[643717713+r10*1+rcx]
+ and r11d,eax
+ mov r10d,DWORD[rsi]
+ or r12d,r11d
+ mov r11d,eax
+ add ecx,r12d
+ mov r12d,eax
+ rol ecx,14
+ add ecx,edx
+ not r11d
+ and r12d,ecx
+ lea ebx,[((-373897302))+r10*1+rbx]
+ and r11d,edx
+ mov r10d,DWORD[20+rsi]
+ or r12d,r11d
+ mov r11d,edx
+ add ebx,r12d
+ mov r12d,edx
+ rol ebx,20
+ add ebx,ecx
+ not r11d
+ and r12d,ebx
+ lea eax,[((-701558691))+r10*1+rax]
+ and r11d,ecx
+ mov r10d,DWORD[40+rsi]
+ or r12d,r11d
+ mov r11d,ecx
+ add eax,r12d
+ mov r12d,ecx
+ rol eax,5
+ add eax,ebx
+ not r11d
+ and r12d,eax
+ lea edx,[38016083+r10*1+rdx]
+ and r11d,ebx
+ mov r10d,DWORD[60+rsi]
+ or r12d,r11d
+ mov r11d,ebx
+ add edx,r12d
+ mov r12d,ebx
+ rol edx,9
+ add edx,eax
+ not r11d
+ and r12d,edx
+ lea ecx,[((-660478335))+r10*1+rcx]
+ and r11d,eax
+ mov r10d,DWORD[16+rsi]
+ or r12d,r11d
+ mov r11d,eax
+ add ecx,r12d
+ mov r12d,eax
+ rol ecx,14
+ add ecx,edx
+ not r11d
+ and r12d,ecx
+ lea ebx,[((-405537848))+r10*1+rbx]
+ and r11d,edx
+ mov r10d,DWORD[36+rsi]
+ or r12d,r11d
+ mov r11d,edx
+ add ebx,r12d
+ mov r12d,edx
+ rol ebx,20
+ add ebx,ecx
+ not r11d
+ and r12d,ebx
+ lea eax,[568446438+r10*1+rax]
+ and r11d,ecx
+ mov r10d,DWORD[56+rsi]
+ or r12d,r11d
+ mov r11d,ecx
+ add eax,r12d
+ mov r12d,ecx
+ rol eax,5
+ add eax,ebx
+ not r11d
+ and r12d,eax
+ lea edx,[((-1019803690))+r10*1+rdx]
+ and r11d,ebx
+ mov r10d,DWORD[12+rsi]
+ or r12d,r11d
+ mov r11d,ebx
+ add edx,r12d
+ mov r12d,ebx
+ rol edx,9
+ add edx,eax
+ not r11d
+ and r12d,edx
+ lea ecx,[((-187363961))+r10*1+rcx]
+ and r11d,eax
+ mov r10d,DWORD[32+rsi]
+ or r12d,r11d
+ mov r11d,eax
+ add ecx,r12d
+ mov r12d,eax
+ rol ecx,14
+ add ecx,edx
+ not r11d
+ and r12d,ecx
+ lea ebx,[1163531501+r10*1+rbx]
+ and r11d,edx
+ mov r10d,DWORD[52+rsi]
+ or r12d,r11d
+ mov r11d,edx
+ add ebx,r12d
+ mov r12d,edx
+ rol ebx,20
+ add ebx,ecx
+ not r11d
+ and r12d,ebx
+ lea eax,[((-1444681467))+r10*1+rax]
+ and r11d,ecx
+ mov r10d,DWORD[8+rsi]
+ or r12d,r11d
+ mov r11d,ecx
+ add eax,r12d
+ mov r12d,ecx
+ rol eax,5
+ add eax,ebx
+ not r11d
+ and r12d,eax
+ lea edx,[((-51403784))+r10*1+rdx]
+ and r11d,ebx
+ mov r10d,DWORD[28+rsi]
+ or r12d,r11d
+ mov r11d,ebx
+ add edx,r12d
+ mov r12d,ebx
+ rol edx,9
+ add edx,eax
+ not r11d
+ and r12d,edx
+ lea ecx,[1735328473+r10*1+rcx]
+ and r11d,eax
+ mov r10d,DWORD[48+rsi]
+ or r12d,r11d
+ mov r11d,eax
+ add ecx,r12d
+ mov r12d,eax
+ rol ecx,14
+ add ecx,edx
+ not r11d
+ and r12d,ecx
+ lea ebx,[((-1926607734))+r10*1+rbx]
+ and r11d,edx
+ mov r10d,DWORD[20+rsi]
+ or r12d,r11d
+ mov r11d,edx
+ add ebx,r12d
+ mov r12d,edx
+ rol ebx,20
+ add ebx,ecx
+ mov r11d,ecx
+ lea eax,[((-378558))+r10*1+rax]
+ xor r11d,edx
+ mov r10d,DWORD[32+rsi]
+ xor r11d,ebx
+ add eax,r11d
+ mov r11d,ebx
+ rol eax,4
+ add eax,ebx
+ lea edx,[((-2022574463))+r10*1+rdx]
+ xor r11d,ecx
+ mov r10d,DWORD[44+rsi]
+ xor r11d,eax
+ add edx,r11d
+ rol edx,11
+ mov r11d,eax
+ add edx,eax
+ lea ecx,[1839030562+r10*1+rcx]
+ xor r11d,ebx
+ mov r10d,DWORD[56+rsi]
+ xor r11d,edx
+ add ecx,r11d
+ mov r11d,edx
+ rol ecx,16
+ add ecx,edx
+ lea ebx,[((-35309556))+r10*1+rbx]
+ xor r11d,eax
+ mov r10d,DWORD[4+rsi]
+ xor r11d,ecx
+ add ebx,r11d
+ rol ebx,23
+ mov r11d,ecx
+ add ebx,ecx
+ lea eax,[((-1530992060))+r10*1+rax]
+ xor r11d,edx
+ mov r10d,DWORD[16+rsi]
+ xor r11d,ebx
+ add eax,r11d
+ mov r11d,ebx
+ rol eax,4
+ add eax,ebx
+ lea edx,[1272893353+r10*1+rdx]
+ xor r11d,ecx
+ mov r10d,DWORD[28+rsi]
+ xor r11d,eax
+ add edx,r11d
+ rol edx,11
+ mov r11d,eax
+ add edx,eax
+ lea ecx,[((-155497632))+r10*1+rcx]
+ xor r11d,ebx
+ mov r10d,DWORD[40+rsi]
+ xor r11d,edx
+ add ecx,r11d
+ mov r11d,edx
+ rol ecx,16
+ add ecx,edx
+ lea ebx,[((-1094730640))+r10*1+rbx]
+ xor r11d,eax
+ mov r10d,DWORD[52+rsi]
+ xor r11d,ecx
+ add ebx,r11d
+ rol ebx,23
+ mov r11d,ecx
+ add ebx,ecx
+ lea eax,[681279174+r10*1+rax]
+ xor r11d,edx
+ mov r10d,DWORD[rsi]
+ xor r11d,ebx
+ add eax,r11d
+ mov r11d,ebx
+ rol eax,4
+ add eax,ebx
+ lea edx,[((-358537222))+r10*1+rdx]
+ xor r11d,ecx
+ mov r10d,DWORD[12+rsi]
+ xor r11d,eax
+ add edx,r11d
+ rol edx,11
+ mov r11d,eax
+ add edx,eax
+ lea ecx,[((-722521979))+r10*1+rcx]
+ xor r11d,ebx
+ mov r10d,DWORD[24+rsi]
+ xor r11d,edx
+ add ecx,r11d
+ mov r11d,edx
+ rol ecx,16
+ add ecx,edx
+ lea ebx,[76029189+r10*1+rbx]
+ xor r11d,eax
+ mov r10d,DWORD[36+rsi]
+ xor r11d,ecx
+ add ebx,r11d
+ rol ebx,23
+ mov r11d,ecx
+ add ebx,ecx
+ lea eax,[((-640364487))+r10*1+rax]
+ xor r11d,edx
+ mov r10d,DWORD[48+rsi]
+ xor r11d,ebx
+ add eax,r11d
+ mov r11d,ebx
+ rol eax,4
+ add eax,ebx
+ lea edx,[((-421815835))+r10*1+rdx]
+ xor r11d,ecx
+ mov r10d,DWORD[60+rsi]
+ xor r11d,eax
+ add edx,r11d
+ rol edx,11
+ mov r11d,eax
+ add edx,eax
+ lea ecx,[530742520+r10*1+rcx]
+ xor r11d,ebx
+ mov r10d,DWORD[8+rsi]
+ xor r11d,edx
+ add ecx,r11d
+ mov r11d,edx
+ rol ecx,16
+ add ecx,edx
+ lea ebx,[((-995338651))+r10*1+rbx]
+ xor r11d,eax
+ mov r10d,DWORD[rsi]
+ xor r11d,ecx
+ add ebx,r11d
+ rol ebx,23
+ mov r11d,ecx
+ add ebx,ecx
+ mov r11d,0xffffffff
+ xor r11d,edx
+ lea eax,[((-198630844))+r10*1+rax]
+ or r11d,ebx
+ mov r10d,DWORD[28+rsi]
+ xor r11d,ecx
+ add eax,r11d
+ mov r11d,0xffffffff
+ rol eax,6
+ xor r11d,ecx
+ add eax,ebx
+ lea edx,[1126891415+r10*1+rdx]
+ or r11d,eax
+ mov r10d,DWORD[56+rsi]
+ xor r11d,ebx
+ add edx,r11d
+ mov r11d,0xffffffff
+ rol edx,10
+ xor r11d,ebx
+ add edx,eax
+ lea ecx,[((-1416354905))+r10*1+rcx]
+ or r11d,edx
+ mov r10d,DWORD[20+rsi]
+ xor r11d,eax
+ add ecx,r11d
+ mov r11d,0xffffffff
+ rol ecx,15
+ xor r11d,eax
+ add ecx,edx
+ lea ebx,[((-57434055))+r10*1+rbx]
+ or r11d,ecx
+ mov r10d,DWORD[48+rsi]
+ xor r11d,edx
+ add ebx,r11d
+ mov r11d,0xffffffff
+ rol ebx,21
+ xor r11d,edx
+ add ebx,ecx
+ lea eax,[1700485571+r10*1+rax]
+ or r11d,ebx
+ mov r10d,DWORD[12+rsi]
+ xor r11d,ecx
+ add eax,r11d
+ mov r11d,0xffffffff
+ rol eax,6
+ xor r11d,ecx
+ add eax,ebx
+ lea edx,[((-1894986606))+r10*1+rdx]
+ or r11d,eax
+ mov r10d,DWORD[40+rsi]
+ xor r11d,ebx
+ add edx,r11d
+ mov r11d,0xffffffff
+ rol edx,10
+ xor r11d,ebx
+ add edx,eax
+ lea ecx,[((-1051523))+r10*1+rcx]
+ or r11d,edx
+ mov r10d,DWORD[4+rsi]
+ xor r11d,eax
+ add ecx,r11d
+ mov r11d,0xffffffff
+ rol ecx,15
+ xor r11d,eax
+ add ecx,edx
+ lea ebx,[((-2054922799))+r10*1+rbx]
+ or r11d,ecx
+ mov r10d,DWORD[32+rsi]
+ xor r11d,edx
+ add ebx,r11d
+ mov r11d,0xffffffff
+ rol ebx,21
+ xor r11d,edx
+ add ebx,ecx
+ lea eax,[1873313359+r10*1+rax]
+ or r11d,ebx
+ mov r10d,DWORD[60+rsi]
+ xor r11d,ecx
+ add eax,r11d
+ mov r11d,0xffffffff
+ rol eax,6
+ xor r11d,ecx
+ add eax,ebx
+ lea edx,[((-30611744))+r10*1+rdx]
+ or r11d,eax
+ mov r10d,DWORD[24+rsi]
+ xor r11d,ebx
+ add edx,r11d
+ mov r11d,0xffffffff
+ rol edx,10
+ xor r11d,ebx
+ add edx,eax
+ lea ecx,[((-1560198380))+r10*1+rcx]
+ or r11d,edx
+ mov r10d,DWORD[52+rsi]
+ xor r11d,eax
+ add ecx,r11d
+ mov r11d,0xffffffff
+ rol ecx,15
+ xor r11d,eax
+ add ecx,edx
+ lea ebx,[1309151649+r10*1+rbx]
+ or r11d,ecx
+ mov r10d,DWORD[16+rsi]
+ xor r11d,edx
+ add ebx,r11d
+ mov r11d,0xffffffff
+ rol ebx,21
+ xor r11d,edx
+ add ebx,ecx
+ lea eax,[((-145523070))+r10*1+rax]
+ or r11d,ebx
+ mov r10d,DWORD[44+rsi]
+ xor r11d,ecx
+ add eax,r11d
+ mov r11d,0xffffffff
+ rol eax,6
+ xor r11d,ecx
+ add eax,ebx
+ lea edx,[((-1120210379))+r10*1+rdx]
+ or r11d,eax
+ mov r10d,DWORD[8+rsi]
+ xor r11d,ebx
+ add edx,r11d
+ mov r11d,0xffffffff
+ rol edx,10
+ xor r11d,ebx
+ add edx,eax
+ lea ecx,[718787259+r10*1+rcx]
+ or r11d,edx
+ mov r10d,DWORD[36+rsi]
+ xor r11d,eax
+ add ecx,r11d
+ mov r11d,0xffffffff
+ rol ecx,15
+ xor r11d,eax
+ add ecx,edx
+ lea ebx,[((-343485551))+r10*1+rbx]
+ or r11d,ecx
+ mov r10d,DWORD[rsi]
+ xor r11d,edx
+ add ebx,r11d
+ mov r11d,0xffffffff
+ rol ebx,21
+ xor r11d,edx
+ add ebx,ecx
+
+ add eax,r8d
+ add ebx,r9d
+ add ecx,r14d
+ add edx,r15d
+
+
+ add rsi,64
+ cmp rsi,rdi
+ jb NEAR $L$loop
+
+
+$L$end:
+ mov DWORD[rbp],eax
+ mov DWORD[4+rbp],ebx
+ mov DWORD[8+rbp],ecx
+ mov DWORD[12+rbp],edx
+
+ mov r15,QWORD[rsp]
+
+ mov r14,QWORD[8+rsp]
+
+ mov r12,QWORD[16+rsp]
+
+ mov rbx,QWORD[24+rsp]
+
+ mov rbp,QWORD[32+rsp]
+
+ add rsp,40
+
+$L$epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_md5_block_asm_data_order:
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ lea r10,[$L$prologue]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ lea r10,[$L$epilogue]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ lea rax,[40+rax]
+
+ mov rbp,QWORD[((-8))+rax]
+ mov rbx,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r14,QWORD[((-32))+rax]
+ mov r15,QWORD[((-40))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_md5_block_asm_data_order wrt ..imagebase
+ DD $L$SEH_end_md5_block_asm_data_order wrt ..imagebase
+ DD $L$SEH_info_md5_block_asm_data_order wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_md5_block_asm_data_order:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm
new file mode 100644
index 0000000000..3951121452
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm
@@ -0,0 +1,984 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+
+ALIGN 32
+_aesni_ctr32_ghash_6x:
+ vmovdqu xmm2,XMMWORD[32+r11]
+ sub rdx,6
+ vpxor xmm4,xmm4,xmm4
+ vmovdqu xmm15,XMMWORD[((0-128))+rcx]
+ vpaddb xmm10,xmm1,xmm2
+ vpaddb xmm11,xmm10,xmm2
+ vpaddb xmm12,xmm11,xmm2
+ vpaddb xmm13,xmm12,xmm2
+ vpaddb xmm14,xmm13,xmm2
+ vpxor xmm9,xmm1,xmm15
+ vmovdqu XMMWORD[(16+8)+rsp],xmm4
+ jmp NEAR $L$oop6x
+
+ALIGN 32
+$L$oop6x:
+ add ebx,100663296
+ jc NEAR $L$handle_ctr32
+ vmovdqu xmm3,XMMWORD[((0-32))+r9]
+ vpaddb xmm1,xmm14,xmm2
+ vpxor xmm10,xmm10,xmm15
+ vpxor xmm11,xmm11,xmm15
+
+$L$resume_ctr32:
+ vmovdqu XMMWORD[r8],xmm1
+ vpclmulqdq xmm5,xmm7,xmm3,0x10
+ vpxor xmm12,xmm12,xmm15
+ vmovups xmm2,XMMWORD[((16-128))+rcx]
+ vpclmulqdq xmm6,xmm7,xmm3,0x01
+ xor r12,r12
+ cmp r15,r14
+
+ vaesenc xmm9,xmm9,xmm2
+ vmovdqu xmm0,XMMWORD[((48+8))+rsp]
+ vpxor xmm13,xmm13,xmm15
+ vpclmulqdq xmm1,xmm7,xmm3,0x00
+ vaesenc xmm10,xmm10,xmm2
+ vpxor xmm14,xmm14,xmm15
+ setnc r12b
+ vpclmulqdq xmm7,xmm7,xmm3,0x11
+ vaesenc xmm11,xmm11,xmm2
+ vmovdqu xmm3,XMMWORD[((16-32))+r9]
+ neg r12
+ vaesenc xmm12,xmm12,xmm2
+ vpxor xmm6,xmm6,xmm5
+ vpclmulqdq xmm5,xmm0,xmm3,0x00
+ vpxor xmm8,xmm8,xmm4
+ vaesenc xmm13,xmm13,xmm2
+ vpxor xmm4,xmm1,xmm5
+ and r12,0x60
+ vmovups xmm15,XMMWORD[((32-128))+rcx]
+ vpclmulqdq xmm1,xmm0,xmm3,0x10
+ vaesenc xmm14,xmm14,xmm2
+
+ vpclmulqdq xmm2,xmm0,xmm3,0x01
+ lea r14,[r12*1+r14]
+ vaesenc xmm9,xmm9,xmm15
+ vpxor xmm8,xmm8,XMMWORD[((16+8))+rsp]
+ vpclmulqdq xmm3,xmm0,xmm3,0x11
+ vmovdqu xmm0,XMMWORD[((64+8))+rsp]
+ vaesenc xmm10,xmm10,xmm15
+ movbe r13,QWORD[88+r14]
+ vaesenc xmm11,xmm11,xmm15
+ movbe r12,QWORD[80+r14]
+ vaesenc xmm12,xmm12,xmm15
+ mov QWORD[((32+8))+rsp],r13
+ vaesenc xmm13,xmm13,xmm15
+ mov QWORD[((40+8))+rsp],r12
+ vmovdqu xmm5,XMMWORD[((48-32))+r9]
+ vaesenc xmm14,xmm14,xmm15
+
+ vmovups xmm15,XMMWORD[((48-128))+rcx]
+ vpxor xmm6,xmm6,xmm1
+ vpclmulqdq xmm1,xmm0,xmm5,0x00
+ vaesenc xmm9,xmm9,xmm15
+ vpxor xmm6,xmm6,xmm2
+ vpclmulqdq xmm2,xmm0,xmm5,0x10
+ vaesenc xmm10,xmm10,xmm15
+ vpxor xmm7,xmm7,xmm3
+ vpclmulqdq xmm3,xmm0,xmm5,0x01
+ vaesenc xmm11,xmm11,xmm15
+ vpclmulqdq xmm5,xmm0,xmm5,0x11
+ vmovdqu xmm0,XMMWORD[((80+8))+rsp]
+ vaesenc xmm12,xmm12,xmm15
+ vaesenc xmm13,xmm13,xmm15
+ vpxor xmm4,xmm4,xmm1
+ vmovdqu xmm1,XMMWORD[((64-32))+r9]
+ vaesenc xmm14,xmm14,xmm15
+
+ vmovups xmm15,XMMWORD[((64-128))+rcx]
+ vpxor xmm6,xmm6,xmm2
+ vpclmulqdq xmm2,xmm0,xmm1,0x00
+ vaesenc xmm9,xmm9,xmm15
+ vpxor xmm6,xmm6,xmm3
+ vpclmulqdq xmm3,xmm0,xmm1,0x10
+ vaesenc xmm10,xmm10,xmm15
+ movbe r13,QWORD[72+r14]
+ vpxor xmm7,xmm7,xmm5
+ vpclmulqdq xmm5,xmm0,xmm1,0x01
+ vaesenc xmm11,xmm11,xmm15
+ movbe r12,QWORD[64+r14]
+ vpclmulqdq xmm1,xmm0,xmm1,0x11
+ vmovdqu xmm0,XMMWORD[((96+8))+rsp]
+ vaesenc xmm12,xmm12,xmm15
+ mov QWORD[((48+8))+rsp],r13
+ vaesenc xmm13,xmm13,xmm15
+ mov QWORD[((56+8))+rsp],r12
+ vpxor xmm4,xmm4,xmm2
+ vmovdqu xmm2,XMMWORD[((96-32))+r9]
+ vaesenc xmm14,xmm14,xmm15
+
+ vmovups xmm15,XMMWORD[((80-128))+rcx]
+ vpxor xmm6,xmm6,xmm3
+ vpclmulqdq xmm3,xmm0,xmm2,0x00
+ vaesenc xmm9,xmm9,xmm15
+ vpxor xmm6,xmm6,xmm5
+ vpclmulqdq xmm5,xmm0,xmm2,0x10
+ vaesenc xmm10,xmm10,xmm15
+ movbe r13,QWORD[56+r14]
+ vpxor xmm7,xmm7,xmm1
+ vpclmulqdq xmm1,xmm0,xmm2,0x01
+ vpxor xmm8,xmm8,XMMWORD[((112+8))+rsp]
+ vaesenc xmm11,xmm11,xmm15
+ movbe r12,QWORD[48+r14]
+ vpclmulqdq xmm2,xmm0,xmm2,0x11
+ vaesenc xmm12,xmm12,xmm15
+ mov QWORD[((64+8))+rsp],r13
+ vaesenc xmm13,xmm13,xmm15
+ mov QWORD[((72+8))+rsp],r12
+ vpxor xmm4,xmm4,xmm3
+ vmovdqu xmm3,XMMWORD[((112-32))+r9]
+ vaesenc xmm14,xmm14,xmm15
+
+ vmovups xmm15,XMMWORD[((96-128))+rcx]
+ vpxor xmm6,xmm6,xmm5
+ vpclmulqdq xmm5,xmm8,xmm3,0x10
+ vaesenc xmm9,xmm9,xmm15
+ vpxor xmm6,xmm6,xmm1
+ vpclmulqdq xmm1,xmm8,xmm3,0x01
+ vaesenc xmm10,xmm10,xmm15
+ movbe r13,QWORD[40+r14]
+ vpxor xmm7,xmm7,xmm2
+ vpclmulqdq xmm2,xmm8,xmm3,0x00
+ vaesenc xmm11,xmm11,xmm15
+ movbe r12,QWORD[32+r14]
+ vpclmulqdq xmm8,xmm8,xmm3,0x11
+ vaesenc xmm12,xmm12,xmm15
+ mov QWORD[((80+8))+rsp],r13
+ vaesenc xmm13,xmm13,xmm15
+ mov QWORD[((88+8))+rsp],r12
+ vpxor xmm6,xmm6,xmm5
+ vaesenc xmm14,xmm14,xmm15
+ vpxor xmm6,xmm6,xmm1
+
+ vmovups xmm15,XMMWORD[((112-128))+rcx]
+ vpslldq xmm5,xmm6,8
+ vpxor xmm4,xmm4,xmm2
+ vmovdqu xmm3,XMMWORD[16+r11]
+
+ vaesenc xmm9,xmm9,xmm15
+ vpxor xmm7,xmm7,xmm8
+ vaesenc xmm10,xmm10,xmm15
+ vpxor xmm4,xmm4,xmm5
+ movbe r13,QWORD[24+r14]
+ vaesenc xmm11,xmm11,xmm15
+ movbe r12,QWORD[16+r14]
+ vpalignr xmm0,xmm4,xmm4,8
+ vpclmulqdq xmm4,xmm4,xmm3,0x10
+ mov QWORD[((96+8))+rsp],r13
+ vaesenc xmm12,xmm12,xmm15
+ mov QWORD[((104+8))+rsp],r12
+ vaesenc xmm13,xmm13,xmm15
+ vmovups xmm1,XMMWORD[((128-128))+rcx]
+ vaesenc xmm14,xmm14,xmm15
+
+ vaesenc xmm9,xmm9,xmm1
+ vmovups xmm15,XMMWORD[((144-128))+rcx]
+ vaesenc xmm10,xmm10,xmm1
+ vpsrldq xmm6,xmm6,8
+ vaesenc xmm11,xmm11,xmm1
+ vpxor xmm7,xmm7,xmm6
+ vaesenc xmm12,xmm12,xmm1
+ vpxor xmm4,xmm4,xmm0
+ movbe r13,QWORD[8+r14]
+ vaesenc xmm13,xmm13,xmm1
+ movbe r12,QWORD[r14]
+ vaesenc xmm14,xmm14,xmm1
+ vmovups xmm1,XMMWORD[((160-128))+rcx]
+ cmp ebp,11
+ jb NEAR $L$enc_tail
+
+ vaesenc xmm9,xmm9,xmm15
+ vaesenc xmm10,xmm10,xmm15
+ vaesenc xmm11,xmm11,xmm15
+ vaesenc xmm12,xmm12,xmm15
+ vaesenc xmm13,xmm13,xmm15
+ vaesenc xmm14,xmm14,xmm15
+
+ vaesenc xmm9,xmm9,xmm1
+ vaesenc xmm10,xmm10,xmm1
+ vaesenc xmm11,xmm11,xmm1
+ vaesenc xmm12,xmm12,xmm1
+ vaesenc xmm13,xmm13,xmm1
+ vmovups xmm15,XMMWORD[((176-128))+rcx]
+ vaesenc xmm14,xmm14,xmm1
+ vmovups xmm1,XMMWORD[((192-128))+rcx]
+ je NEAR $L$enc_tail
+
+ vaesenc xmm9,xmm9,xmm15
+ vaesenc xmm10,xmm10,xmm15
+ vaesenc xmm11,xmm11,xmm15
+ vaesenc xmm12,xmm12,xmm15
+ vaesenc xmm13,xmm13,xmm15
+ vaesenc xmm14,xmm14,xmm15
+
+ vaesenc xmm9,xmm9,xmm1
+ vaesenc xmm10,xmm10,xmm1
+ vaesenc xmm11,xmm11,xmm1
+ vaesenc xmm12,xmm12,xmm1
+ vaesenc xmm13,xmm13,xmm1
+ vmovups xmm15,XMMWORD[((208-128))+rcx]
+ vaesenc xmm14,xmm14,xmm1
+ vmovups xmm1,XMMWORD[((224-128))+rcx]
+ jmp NEAR $L$enc_tail
+
+ALIGN 32
+$L$handle_ctr32:
+ vmovdqu xmm0,XMMWORD[r11]
+ vpshufb xmm6,xmm1,xmm0
+ vmovdqu xmm5,XMMWORD[48+r11]
+ vpaddd xmm10,xmm6,XMMWORD[64+r11]
+ vpaddd xmm11,xmm6,xmm5
+ vmovdqu xmm3,XMMWORD[((0-32))+r9]
+ vpaddd xmm12,xmm10,xmm5
+ vpshufb xmm10,xmm10,xmm0
+ vpaddd xmm13,xmm11,xmm5
+ vpshufb xmm11,xmm11,xmm0
+ vpxor xmm10,xmm10,xmm15
+ vpaddd xmm14,xmm12,xmm5
+ vpshufb xmm12,xmm12,xmm0
+ vpxor xmm11,xmm11,xmm15
+ vpaddd xmm1,xmm13,xmm5
+ vpshufb xmm13,xmm13,xmm0
+ vpshufb xmm14,xmm14,xmm0
+ vpshufb xmm1,xmm1,xmm0
+ jmp NEAR $L$resume_ctr32
+
+ALIGN 32
+$L$enc_tail:
+ vaesenc xmm9,xmm9,xmm15
+ vmovdqu XMMWORD[(16+8)+rsp],xmm7
+ vpalignr xmm8,xmm4,xmm4,8
+ vaesenc xmm10,xmm10,xmm15
+ vpclmulqdq xmm4,xmm4,xmm3,0x10
+ vpxor xmm2,xmm1,XMMWORD[rdi]
+ vaesenc xmm11,xmm11,xmm15
+ vpxor xmm0,xmm1,XMMWORD[16+rdi]
+ vaesenc xmm12,xmm12,xmm15
+ vpxor xmm5,xmm1,XMMWORD[32+rdi]
+ vaesenc xmm13,xmm13,xmm15
+ vpxor xmm6,xmm1,XMMWORD[48+rdi]
+ vaesenc xmm14,xmm14,xmm15
+ vpxor xmm7,xmm1,XMMWORD[64+rdi]
+ vpxor xmm3,xmm1,XMMWORD[80+rdi]
+ vmovdqu xmm1,XMMWORD[r8]
+
+ vaesenclast xmm9,xmm9,xmm2
+ vmovdqu xmm2,XMMWORD[32+r11]
+ vaesenclast xmm10,xmm10,xmm0
+ vpaddb xmm0,xmm1,xmm2
+ mov QWORD[((112+8))+rsp],r13
+ lea rdi,[96+rdi]
+ vaesenclast xmm11,xmm11,xmm5
+ vpaddb xmm5,xmm0,xmm2
+ mov QWORD[((120+8))+rsp],r12
+ lea rsi,[96+rsi]
+ vmovdqu xmm15,XMMWORD[((0-128))+rcx]
+ vaesenclast xmm12,xmm12,xmm6
+ vpaddb xmm6,xmm5,xmm2
+ vaesenclast xmm13,xmm13,xmm7
+ vpaddb xmm7,xmm6,xmm2
+ vaesenclast xmm14,xmm14,xmm3
+ vpaddb xmm3,xmm7,xmm2
+
+ add r10,0x60
+ sub rdx,0x6
+ jc NEAR $L$6x_done
+
+ vmovups XMMWORD[(-96)+rsi],xmm9
+ vpxor xmm9,xmm1,xmm15
+ vmovups XMMWORD[(-80)+rsi],xmm10
+ vmovdqa xmm10,xmm0
+ vmovups XMMWORD[(-64)+rsi],xmm11
+ vmovdqa xmm11,xmm5
+ vmovups XMMWORD[(-48)+rsi],xmm12
+ vmovdqa xmm12,xmm6
+ vmovups XMMWORD[(-32)+rsi],xmm13
+ vmovdqa xmm13,xmm7
+ vmovups XMMWORD[(-16)+rsi],xmm14
+ vmovdqa xmm14,xmm3
+ vmovdqu xmm7,XMMWORD[((32+8))+rsp]
+ jmp NEAR $L$oop6x
+
+$L$6x_done:
+ vpxor xmm8,xmm8,XMMWORD[((16+8))+rsp]
+ vpxor xmm8,xmm8,xmm4
+
+ DB 0F3h,0C3h ;repret
+
+global aesni_gcm_decrypt
+
+ALIGN 32
+aesni_gcm_decrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_gcm_decrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ xor r10,r10
+ cmp rdx,0x60
+ jb NEAR $L$gcm_dec_abort
+
+ lea rax,[rsp]
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[(-216)+rax],xmm6
+ movaps XMMWORD[(-200)+rax],xmm7
+ movaps XMMWORD[(-184)+rax],xmm8
+ movaps XMMWORD[(-168)+rax],xmm9
+ movaps XMMWORD[(-152)+rax],xmm10
+ movaps XMMWORD[(-136)+rax],xmm11
+ movaps XMMWORD[(-120)+rax],xmm12
+ movaps XMMWORD[(-104)+rax],xmm13
+ movaps XMMWORD[(-88)+rax],xmm14
+ movaps XMMWORD[(-72)+rax],xmm15
+$L$gcm_dec_body:
+ vzeroupper
+
+ vmovdqu xmm1,XMMWORD[r8]
+ add rsp,-128
+ mov ebx,DWORD[12+r8]
+ lea r11,[$L$bswap_mask]
+ lea r14,[((-128))+rcx]
+ mov r15,0xf80
+ vmovdqu xmm8,XMMWORD[r9]
+ and rsp,-128
+ vmovdqu xmm0,XMMWORD[r11]
+ lea rcx,[128+rcx]
+ lea r9,[((32+32))+r9]
+ mov ebp,DWORD[((240-128))+rcx]
+ vpshufb xmm8,xmm8,xmm0
+
+ and r14,r15
+ and r15,rsp
+ sub r15,r14
+ jc NEAR $L$dec_no_key_aliasing
+ cmp r15,768
+ jnc NEAR $L$dec_no_key_aliasing
+ sub rsp,r15
+$L$dec_no_key_aliasing:
+
+ vmovdqu xmm7,XMMWORD[80+rdi]
+ lea r14,[rdi]
+ vmovdqu xmm4,XMMWORD[64+rdi]
+ lea r15,[((-192))+rdx*1+rdi]
+ vmovdqu xmm5,XMMWORD[48+rdi]
+ shr rdx,4
+ xor r10,r10
+ vmovdqu xmm6,XMMWORD[32+rdi]
+ vpshufb xmm7,xmm7,xmm0
+ vmovdqu xmm2,XMMWORD[16+rdi]
+ vpshufb xmm4,xmm4,xmm0
+ vmovdqu xmm3,XMMWORD[rdi]
+ vpshufb xmm5,xmm5,xmm0
+ vmovdqu XMMWORD[48+rsp],xmm4
+ vpshufb xmm6,xmm6,xmm0
+ vmovdqu XMMWORD[64+rsp],xmm5
+ vpshufb xmm2,xmm2,xmm0
+ vmovdqu XMMWORD[80+rsp],xmm6
+ vpshufb xmm3,xmm3,xmm0
+ vmovdqu XMMWORD[96+rsp],xmm2
+ vmovdqu XMMWORD[112+rsp],xmm3
+
+ call _aesni_ctr32_ghash_6x
+
+ vmovups XMMWORD[(-96)+rsi],xmm9
+ vmovups XMMWORD[(-80)+rsi],xmm10
+ vmovups XMMWORD[(-64)+rsi],xmm11
+ vmovups XMMWORD[(-48)+rsi],xmm12
+ vmovups XMMWORD[(-32)+rsi],xmm13
+ vmovups XMMWORD[(-16)+rsi],xmm14
+
+ vpshufb xmm8,xmm8,XMMWORD[r11]
+ vmovdqu XMMWORD[(-64)+r9],xmm8
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+ movaps xmm13,XMMWORD[((-104))+rax]
+ movaps xmm14,XMMWORD[((-88))+rax]
+ movaps xmm15,XMMWORD[((-72))+rax]
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$gcm_dec_abort:
+ mov rax,r10
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_gcm_decrypt:
+
+ALIGN 32
+_aesni_ctr32_6x:
+ vmovdqu xmm4,XMMWORD[((0-128))+rcx]
+ vmovdqu xmm2,XMMWORD[32+r11]
+ lea r13,[((-1))+rbp]
+ vmovups xmm15,XMMWORD[((16-128))+rcx]
+ lea r12,[((32-128))+rcx]
+ vpxor xmm9,xmm1,xmm4
+ add ebx,100663296
+ jc NEAR $L$handle_ctr32_2
+ vpaddb xmm10,xmm1,xmm2
+ vpaddb xmm11,xmm10,xmm2
+ vpxor xmm10,xmm10,xmm4
+ vpaddb xmm12,xmm11,xmm2
+ vpxor xmm11,xmm11,xmm4
+ vpaddb xmm13,xmm12,xmm2
+ vpxor xmm12,xmm12,xmm4
+ vpaddb xmm14,xmm13,xmm2
+ vpxor xmm13,xmm13,xmm4
+ vpaddb xmm1,xmm14,xmm2
+ vpxor xmm14,xmm14,xmm4
+ jmp NEAR $L$oop_ctr32
+
+ALIGN 16
+$L$oop_ctr32:
+ vaesenc xmm9,xmm9,xmm15
+ vaesenc xmm10,xmm10,xmm15
+ vaesenc xmm11,xmm11,xmm15
+ vaesenc xmm12,xmm12,xmm15
+ vaesenc xmm13,xmm13,xmm15
+ vaesenc xmm14,xmm14,xmm15
+ vmovups xmm15,XMMWORD[r12]
+ lea r12,[16+r12]
+ dec r13d
+ jnz NEAR $L$oop_ctr32
+
+ vmovdqu xmm3,XMMWORD[r12]
+ vaesenc xmm9,xmm9,xmm15
+ vpxor xmm4,xmm3,XMMWORD[rdi]
+ vaesenc xmm10,xmm10,xmm15
+ vpxor xmm5,xmm3,XMMWORD[16+rdi]
+ vaesenc xmm11,xmm11,xmm15
+ vpxor xmm6,xmm3,XMMWORD[32+rdi]
+ vaesenc xmm12,xmm12,xmm15
+ vpxor xmm8,xmm3,XMMWORD[48+rdi]
+ vaesenc xmm13,xmm13,xmm15
+ vpxor xmm2,xmm3,XMMWORD[64+rdi]
+ vaesenc xmm14,xmm14,xmm15
+ vpxor xmm3,xmm3,XMMWORD[80+rdi]
+ lea rdi,[96+rdi]
+
+ vaesenclast xmm9,xmm9,xmm4
+ vaesenclast xmm10,xmm10,xmm5
+ vaesenclast xmm11,xmm11,xmm6
+ vaesenclast xmm12,xmm12,xmm8
+ vaesenclast xmm13,xmm13,xmm2
+ vaesenclast xmm14,xmm14,xmm3
+ vmovups XMMWORD[rsi],xmm9
+ vmovups XMMWORD[16+rsi],xmm10
+ vmovups XMMWORD[32+rsi],xmm11
+ vmovups XMMWORD[48+rsi],xmm12
+ vmovups XMMWORD[64+rsi],xmm13
+ vmovups XMMWORD[80+rsi],xmm14
+ lea rsi,[96+rsi]
+
+ DB 0F3h,0C3h ;repret
+ALIGN 32
+$L$handle_ctr32_2:
+ vpshufb xmm6,xmm1,xmm0
+ vmovdqu xmm5,XMMWORD[48+r11]
+ vpaddd xmm10,xmm6,XMMWORD[64+r11]
+ vpaddd xmm11,xmm6,xmm5
+ vpaddd xmm12,xmm10,xmm5
+ vpshufb xmm10,xmm10,xmm0
+ vpaddd xmm13,xmm11,xmm5
+ vpshufb xmm11,xmm11,xmm0
+ vpxor xmm10,xmm10,xmm4
+ vpaddd xmm14,xmm12,xmm5
+ vpshufb xmm12,xmm12,xmm0
+ vpxor xmm11,xmm11,xmm4
+ vpaddd xmm1,xmm13,xmm5
+ vpshufb xmm13,xmm13,xmm0
+ vpxor xmm12,xmm12,xmm4
+ vpshufb xmm14,xmm14,xmm0
+ vpxor xmm13,xmm13,xmm4
+ vpshufb xmm1,xmm1,xmm0
+ vpxor xmm14,xmm14,xmm4
+ jmp NEAR $L$oop_ctr32
+
+
+global aesni_gcm_encrypt
+
+ALIGN 32
+aesni_gcm_encrypt:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_aesni_gcm_encrypt:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ xor r10,r10
+ cmp rdx,0x60*3
+ jb NEAR $L$gcm_enc_abort
+
+ lea rax,[rsp]
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[(-216)+rax],xmm6
+ movaps XMMWORD[(-200)+rax],xmm7
+ movaps XMMWORD[(-184)+rax],xmm8
+ movaps XMMWORD[(-168)+rax],xmm9
+ movaps XMMWORD[(-152)+rax],xmm10
+ movaps XMMWORD[(-136)+rax],xmm11
+ movaps XMMWORD[(-120)+rax],xmm12
+ movaps XMMWORD[(-104)+rax],xmm13
+ movaps XMMWORD[(-88)+rax],xmm14
+ movaps XMMWORD[(-72)+rax],xmm15
+$L$gcm_enc_body:
+ vzeroupper
+
+ vmovdqu xmm1,XMMWORD[r8]
+ add rsp,-128
+ mov ebx,DWORD[12+r8]
+ lea r11,[$L$bswap_mask]
+ lea r14,[((-128))+rcx]
+ mov r15,0xf80
+ lea rcx,[128+rcx]
+ vmovdqu xmm0,XMMWORD[r11]
+ and rsp,-128
+ mov ebp,DWORD[((240-128))+rcx]
+
+ and r14,r15
+ and r15,rsp
+ sub r15,r14
+ jc NEAR $L$enc_no_key_aliasing
+ cmp r15,768
+ jnc NEAR $L$enc_no_key_aliasing
+ sub rsp,r15
+$L$enc_no_key_aliasing:
+
+ lea r14,[rsi]
+ lea r15,[((-192))+rdx*1+rsi]
+ shr rdx,4
+
+ call _aesni_ctr32_6x
+ vpshufb xmm8,xmm9,xmm0
+ vpshufb xmm2,xmm10,xmm0
+ vmovdqu XMMWORD[112+rsp],xmm8
+ vpshufb xmm4,xmm11,xmm0
+ vmovdqu XMMWORD[96+rsp],xmm2
+ vpshufb xmm5,xmm12,xmm0
+ vmovdqu XMMWORD[80+rsp],xmm4
+ vpshufb xmm6,xmm13,xmm0
+ vmovdqu XMMWORD[64+rsp],xmm5
+ vpshufb xmm7,xmm14,xmm0
+ vmovdqu XMMWORD[48+rsp],xmm6
+
+ call _aesni_ctr32_6x
+
+ vmovdqu xmm8,XMMWORD[r9]
+ lea r9,[((32+32))+r9]
+ sub rdx,12
+ mov r10,0x60*2
+ vpshufb xmm8,xmm8,xmm0
+
+ call _aesni_ctr32_ghash_6x
+ vmovdqu xmm7,XMMWORD[32+rsp]
+ vmovdqu xmm0,XMMWORD[r11]
+ vmovdqu xmm3,XMMWORD[((0-32))+r9]
+ vpunpckhqdq xmm1,xmm7,xmm7
+ vmovdqu xmm15,XMMWORD[((32-32))+r9]
+ vmovups XMMWORD[(-96)+rsi],xmm9
+ vpshufb xmm9,xmm9,xmm0
+ vpxor xmm1,xmm1,xmm7
+ vmovups XMMWORD[(-80)+rsi],xmm10
+ vpshufb xmm10,xmm10,xmm0
+ vmovups XMMWORD[(-64)+rsi],xmm11
+ vpshufb xmm11,xmm11,xmm0
+ vmovups XMMWORD[(-48)+rsi],xmm12
+ vpshufb xmm12,xmm12,xmm0
+ vmovups XMMWORD[(-32)+rsi],xmm13
+ vpshufb xmm13,xmm13,xmm0
+ vmovups XMMWORD[(-16)+rsi],xmm14
+ vpshufb xmm14,xmm14,xmm0
+ vmovdqu XMMWORD[16+rsp],xmm9
+ vmovdqu xmm6,XMMWORD[48+rsp]
+ vmovdqu xmm0,XMMWORD[((16-32))+r9]
+ vpunpckhqdq xmm2,xmm6,xmm6
+ vpclmulqdq xmm5,xmm7,xmm3,0x00
+ vpxor xmm2,xmm2,xmm6
+ vpclmulqdq xmm7,xmm7,xmm3,0x11
+ vpclmulqdq xmm1,xmm1,xmm15,0x00
+
+ vmovdqu xmm9,XMMWORD[64+rsp]
+ vpclmulqdq xmm4,xmm6,xmm0,0x00
+ vmovdqu xmm3,XMMWORD[((48-32))+r9]
+ vpxor xmm4,xmm4,xmm5
+ vpunpckhqdq xmm5,xmm9,xmm9
+ vpclmulqdq xmm6,xmm6,xmm0,0x11
+ vpxor xmm5,xmm5,xmm9
+ vpxor xmm6,xmm6,xmm7
+ vpclmulqdq xmm2,xmm2,xmm15,0x10
+ vmovdqu xmm15,XMMWORD[((80-32))+r9]
+ vpxor xmm2,xmm2,xmm1
+
+ vmovdqu xmm1,XMMWORD[80+rsp]
+ vpclmulqdq xmm7,xmm9,xmm3,0x00
+ vmovdqu xmm0,XMMWORD[((64-32))+r9]
+ vpxor xmm7,xmm7,xmm4
+ vpunpckhqdq xmm4,xmm1,xmm1
+ vpclmulqdq xmm9,xmm9,xmm3,0x11
+ vpxor xmm4,xmm4,xmm1
+ vpxor xmm9,xmm9,xmm6
+ vpclmulqdq xmm5,xmm5,xmm15,0x00
+ vpxor xmm5,xmm5,xmm2
+
+ vmovdqu xmm2,XMMWORD[96+rsp]
+ vpclmulqdq xmm6,xmm1,xmm0,0x00
+ vmovdqu xmm3,XMMWORD[((96-32))+r9]
+ vpxor xmm6,xmm6,xmm7
+ vpunpckhqdq xmm7,xmm2,xmm2
+ vpclmulqdq xmm1,xmm1,xmm0,0x11
+ vpxor xmm7,xmm7,xmm2
+ vpxor xmm1,xmm1,xmm9
+ vpclmulqdq xmm4,xmm4,xmm15,0x10
+ vmovdqu xmm15,XMMWORD[((128-32))+r9]
+ vpxor xmm4,xmm4,xmm5
+
+ vpxor xmm8,xmm8,XMMWORD[112+rsp]
+ vpclmulqdq xmm5,xmm2,xmm3,0x00
+ vmovdqu xmm0,XMMWORD[((112-32))+r9]
+ vpunpckhqdq xmm9,xmm8,xmm8
+ vpxor xmm5,xmm5,xmm6
+ vpclmulqdq xmm2,xmm2,xmm3,0x11
+ vpxor xmm9,xmm9,xmm8
+ vpxor xmm2,xmm2,xmm1
+ vpclmulqdq xmm7,xmm7,xmm15,0x00
+ vpxor xmm4,xmm7,xmm4
+
+ vpclmulqdq xmm6,xmm8,xmm0,0x00
+ vmovdqu xmm3,XMMWORD[((0-32))+r9]
+ vpunpckhqdq xmm1,xmm14,xmm14
+ vpclmulqdq xmm8,xmm8,xmm0,0x11
+ vpxor xmm1,xmm1,xmm14
+ vpxor xmm5,xmm6,xmm5
+ vpclmulqdq xmm9,xmm9,xmm15,0x10
+ vmovdqu xmm15,XMMWORD[((32-32))+r9]
+ vpxor xmm7,xmm8,xmm2
+ vpxor xmm6,xmm9,xmm4
+
+ vmovdqu xmm0,XMMWORD[((16-32))+r9]
+ vpxor xmm9,xmm7,xmm5
+ vpclmulqdq xmm4,xmm14,xmm3,0x00
+ vpxor xmm6,xmm6,xmm9
+ vpunpckhqdq xmm2,xmm13,xmm13
+ vpclmulqdq xmm14,xmm14,xmm3,0x11
+ vpxor xmm2,xmm2,xmm13
+ vpslldq xmm9,xmm6,8
+ vpclmulqdq xmm1,xmm1,xmm15,0x00
+ vpxor xmm8,xmm5,xmm9
+ vpsrldq xmm6,xmm6,8
+ vpxor xmm7,xmm7,xmm6
+
+ vpclmulqdq xmm5,xmm13,xmm0,0x00
+ vmovdqu xmm3,XMMWORD[((48-32))+r9]
+ vpxor xmm5,xmm5,xmm4
+ vpunpckhqdq xmm9,xmm12,xmm12
+ vpclmulqdq xmm13,xmm13,xmm0,0x11
+ vpxor xmm9,xmm9,xmm12
+ vpxor xmm13,xmm13,xmm14
+ vpalignr xmm14,xmm8,xmm8,8
+ vpclmulqdq xmm2,xmm2,xmm15,0x10
+ vmovdqu xmm15,XMMWORD[((80-32))+r9]
+ vpxor xmm2,xmm2,xmm1
+
+ vpclmulqdq xmm4,xmm12,xmm3,0x00
+ vmovdqu xmm0,XMMWORD[((64-32))+r9]
+ vpxor xmm4,xmm4,xmm5
+ vpunpckhqdq xmm1,xmm11,xmm11
+ vpclmulqdq xmm12,xmm12,xmm3,0x11
+ vpxor xmm1,xmm1,xmm11
+ vpxor xmm12,xmm12,xmm13
+ vxorps xmm7,xmm7,XMMWORD[16+rsp]
+ vpclmulqdq xmm9,xmm9,xmm15,0x00
+ vpxor xmm9,xmm9,xmm2
+
+ vpclmulqdq xmm8,xmm8,XMMWORD[16+r11],0x10
+ vxorps xmm8,xmm8,xmm14
+
+ vpclmulqdq xmm5,xmm11,xmm0,0x00
+ vmovdqu xmm3,XMMWORD[((96-32))+r9]
+ vpxor xmm5,xmm5,xmm4
+ vpunpckhqdq xmm2,xmm10,xmm10
+ vpclmulqdq xmm11,xmm11,xmm0,0x11
+ vpxor xmm2,xmm2,xmm10
+ vpalignr xmm14,xmm8,xmm8,8
+ vpxor xmm11,xmm11,xmm12
+ vpclmulqdq xmm1,xmm1,xmm15,0x10
+ vmovdqu xmm15,XMMWORD[((128-32))+r9]
+ vpxor xmm1,xmm1,xmm9
+
+ vxorps xmm14,xmm14,xmm7
+ vpclmulqdq xmm8,xmm8,XMMWORD[16+r11],0x10
+ vxorps xmm8,xmm8,xmm14
+
+ vpclmulqdq xmm4,xmm10,xmm3,0x00
+ vmovdqu xmm0,XMMWORD[((112-32))+r9]
+ vpxor xmm4,xmm4,xmm5
+ vpunpckhqdq xmm9,xmm8,xmm8
+ vpclmulqdq xmm10,xmm10,xmm3,0x11
+ vpxor xmm9,xmm9,xmm8
+ vpxor xmm10,xmm10,xmm11
+ vpclmulqdq xmm2,xmm2,xmm15,0x00
+ vpxor xmm2,xmm2,xmm1
+
+ vpclmulqdq xmm5,xmm8,xmm0,0x00
+ vpclmulqdq xmm7,xmm8,xmm0,0x11
+ vpxor xmm5,xmm5,xmm4
+ vpclmulqdq xmm6,xmm9,xmm15,0x10
+ vpxor xmm7,xmm7,xmm10
+ vpxor xmm6,xmm6,xmm2
+
+ vpxor xmm4,xmm7,xmm5
+ vpxor xmm6,xmm6,xmm4
+ vpslldq xmm1,xmm6,8
+ vmovdqu xmm3,XMMWORD[16+r11]
+ vpsrldq xmm6,xmm6,8
+ vpxor xmm8,xmm5,xmm1
+ vpxor xmm7,xmm7,xmm6
+
+ vpalignr xmm2,xmm8,xmm8,8
+ vpclmulqdq xmm8,xmm8,xmm3,0x10
+ vpxor xmm8,xmm8,xmm2
+
+ vpalignr xmm2,xmm8,xmm8,8
+ vpclmulqdq xmm8,xmm8,xmm3,0x10
+ vpxor xmm2,xmm2,xmm7
+ vpxor xmm8,xmm8,xmm2
+ vpshufb xmm8,xmm8,XMMWORD[r11]
+ vmovdqu XMMWORD[(-64)+r9],xmm8
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+ movaps xmm13,XMMWORD[((-104))+rax]
+ movaps xmm14,XMMWORD[((-88))+rax]
+ movaps xmm15,XMMWORD[((-72))+rax]
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$gcm_enc_abort:
+ mov rax,r10
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_aesni_gcm_encrypt:
+ALIGN 64
+$L$bswap_mask:
+DB 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+$L$poly:
+DB 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0xc2
+$L$one_msb:
+DB 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
+$L$two_lsb:
+DB 2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
+$L$one_lsb:
+DB 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
+DB 65,69,83,45,78,73,32,71,67,77,32,109,111,100,117,108
+DB 101,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82
+DB 89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112
+DB 114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
+ALIGN 64
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+gcm_se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ mov rax,QWORD[120+r8]
+
+ mov r15,QWORD[((-48))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov rbx,QWORD[((-8))+rax]
+ mov QWORD[240+r8],r15
+ mov QWORD[232+r8],r14
+ mov QWORD[224+r8],r13
+ mov QWORD[216+r8],r12
+ mov QWORD[160+r8],rbp
+ mov QWORD[144+r8],rbx
+
+ lea rsi,[((-216))+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+$L$common_seh_tail:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_aesni_gcm_decrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_gcm_decrypt wrt ..imagebase
+ DD $L$SEH_gcm_dec_info wrt ..imagebase
+
+ DD $L$SEH_begin_aesni_gcm_encrypt wrt ..imagebase
+ DD $L$SEH_end_aesni_gcm_encrypt wrt ..imagebase
+ DD $L$SEH_gcm_enc_info wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_gcm_dec_info:
+DB 9,0,0,0
+ DD gcm_se_handler wrt ..imagebase
+ DD $L$gcm_dec_body wrt ..imagebase,$L$gcm_dec_abort wrt ..imagebase
+$L$SEH_gcm_enc_info:
+DB 9,0,0,0
+ DD gcm_se_handler wrt ..imagebase
+ DD $L$gcm_enc_body wrt ..imagebase,$L$gcm_enc_abort wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm
new file mode 100644
index 0000000000..3d67e12775
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm
@@ -0,0 +1,2077 @@
+; Copyright 2010-2019 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN OPENSSL_ia32cap_P
+
+global gcm_gmult_4bit
+
+ALIGN 16
+gcm_gmult_4bit:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_gcm_gmult_4bit:
+ mov rdi,rcx
+ mov rsi,rdx
+
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ sub rsp,280
+
+$L$gmult_prologue:
+
+ movzx r8,BYTE[15+rdi]
+ lea r11,[$L$rem_4bit]
+ xor rax,rax
+ xor rbx,rbx
+ mov al,r8b
+ mov bl,r8b
+ shl al,4
+ mov rcx,14
+ mov r8,QWORD[8+rax*1+rsi]
+ mov r9,QWORD[rax*1+rsi]
+ and bl,0xf0
+ mov rdx,r8
+ jmp NEAR $L$oop1
+
+ALIGN 16
+$L$oop1:
+ shr r8,4
+ and rdx,0xf
+ mov r10,r9
+ mov al,BYTE[rcx*1+rdi]
+ shr r9,4
+ xor r8,QWORD[8+rbx*1+rsi]
+ shl r10,60
+ xor r9,QWORD[rbx*1+rsi]
+ mov bl,al
+ xor r9,QWORD[rdx*8+r11]
+ mov rdx,r8
+ shl al,4
+ xor r8,r10
+ dec rcx
+ js NEAR $L$break1
+
+ shr r8,4
+ and rdx,0xf
+ mov r10,r9
+ shr r9,4
+ xor r8,QWORD[8+rax*1+rsi]
+ shl r10,60
+ xor r9,QWORD[rax*1+rsi]
+ and bl,0xf0
+ xor r9,QWORD[rdx*8+r11]
+ mov rdx,r8
+ xor r8,r10
+ jmp NEAR $L$oop1
+
+ALIGN 16
+$L$break1:
+ shr r8,4
+ and rdx,0xf
+ mov r10,r9
+ shr r9,4
+ xor r8,QWORD[8+rax*1+rsi]
+ shl r10,60
+ xor r9,QWORD[rax*1+rsi]
+ and bl,0xf0
+ xor r9,QWORD[rdx*8+r11]
+ mov rdx,r8
+ xor r8,r10
+
+ shr r8,4
+ and rdx,0xf
+ mov r10,r9
+ shr r9,4
+ xor r8,QWORD[8+rbx*1+rsi]
+ shl r10,60
+ xor r9,QWORD[rbx*1+rsi]
+ xor r8,r10
+ xor r9,QWORD[rdx*8+r11]
+
+ bswap r8
+ bswap r9
+ mov QWORD[8+rdi],r8
+ mov QWORD[rdi],r9
+
+ lea rsi,[((280+48))+rsp]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$gmult_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_gcm_gmult_4bit:
+global gcm_ghash_4bit
+
+ALIGN 16
+gcm_ghash_4bit:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_gcm_ghash_4bit:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ sub rsp,280
+
+$L$ghash_prologue:
+ mov r14,rdx
+ mov r15,rcx
+ sub rsi,-128
+ lea rbp,[((16+128))+rsp]
+ xor edx,edx
+ mov r8,QWORD[((0+0-128))+rsi]
+ mov rax,QWORD[((0+8-128))+rsi]
+ mov dl,al
+ shr rax,4
+ mov r10,r8
+ shr r8,4
+ mov r9,QWORD[((16+0-128))+rsi]
+ shl dl,4
+ mov rbx,QWORD[((16+8-128))+rsi]
+ shl r10,60
+ mov BYTE[rsp],dl
+ or rax,r10
+ mov dl,bl
+ shr rbx,4
+ mov r10,r9
+ shr r9,4
+ mov QWORD[rbp],r8
+ mov r8,QWORD[((32+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((0-128))+rbp],rax
+ mov rax,QWORD[((32+8-128))+rsi]
+ shl r10,60
+ mov BYTE[1+rsp],dl
+ or rbx,r10
+ mov dl,al
+ shr rax,4
+ mov r10,r8
+ shr r8,4
+ mov QWORD[8+rbp],r9
+ mov r9,QWORD[((48+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((8-128))+rbp],rbx
+ mov rbx,QWORD[((48+8-128))+rsi]
+ shl r10,60
+ mov BYTE[2+rsp],dl
+ or rax,r10
+ mov dl,bl
+ shr rbx,4
+ mov r10,r9
+ shr r9,4
+ mov QWORD[16+rbp],r8
+ mov r8,QWORD[((64+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((16-128))+rbp],rax
+ mov rax,QWORD[((64+8-128))+rsi]
+ shl r10,60
+ mov BYTE[3+rsp],dl
+ or rbx,r10
+ mov dl,al
+ shr rax,4
+ mov r10,r8
+ shr r8,4
+ mov QWORD[24+rbp],r9
+ mov r9,QWORD[((80+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((24-128))+rbp],rbx
+ mov rbx,QWORD[((80+8-128))+rsi]
+ shl r10,60
+ mov BYTE[4+rsp],dl
+ or rax,r10
+ mov dl,bl
+ shr rbx,4
+ mov r10,r9
+ shr r9,4
+ mov QWORD[32+rbp],r8
+ mov r8,QWORD[((96+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((32-128))+rbp],rax
+ mov rax,QWORD[((96+8-128))+rsi]
+ shl r10,60
+ mov BYTE[5+rsp],dl
+ or rbx,r10
+ mov dl,al
+ shr rax,4
+ mov r10,r8
+ shr r8,4
+ mov QWORD[40+rbp],r9
+ mov r9,QWORD[((112+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((40-128))+rbp],rbx
+ mov rbx,QWORD[((112+8-128))+rsi]
+ shl r10,60
+ mov BYTE[6+rsp],dl
+ or rax,r10
+ mov dl,bl
+ shr rbx,4
+ mov r10,r9
+ shr r9,4
+ mov QWORD[48+rbp],r8
+ mov r8,QWORD[((128+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((48-128))+rbp],rax
+ mov rax,QWORD[((128+8-128))+rsi]
+ shl r10,60
+ mov BYTE[7+rsp],dl
+ or rbx,r10
+ mov dl,al
+ shr rax,4
+ mov r10,r8
+ shr r8,4
+ mov QWORD[56+rbp],r9
+ mov r9,QWORD[((144+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((56-128))+rbp],rbx
+ mov rbx,QWORD[((144+8-128))+rsi]
+ shl r10,60
+ mov BYTE[8+rsp],dl
+ or rax,r10
+ mov dl,bl
+ shr rbx,4
+ mov r10,r9
+ shr r9,4
+ mov QWORD[64+rbp],r8
+ mov r8,QWORD[((160+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((64-128))+rbp],rax
+ mov rax,QWORD[((160+8-128))+rsi]
+ shl r10,60
+ mov BYTE[9+rsp],dl
+ or rbx,r10
+ mov dl,al
+ shr rax,4
+ mov r10,r8
+ shr r8,4
+ mov QWORD[72+rbp],r9
+ mov r9,QWORD[((176+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((72-128))+rbp],rbx
+ mov rbx,QWORD[((176+8-128))+rsi]
+ shl r10,60
+ mov BYTE[10+rsp],dl
+ or rax,r10
+ mov dl,bl
+ shr rbx,4
+ mov r10,r9
+ shr r9,4
+ mov QWORD[80+rbp],r8
+ mov r8,QWORD[((192+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((80-128))+rbp],rax
+ mov rax,QWORD[((192+8-128))+rsi]
+ shl r10,60
+ mov BYTE[11+rsp],dl
+ or rbx,r10
+ mov dl,al
+ shr rax,4
+ mov r10,r8
+ shr r8,4
+ mov QWORD[88+rbp],r9
+ mov r9,QWORD[((208+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((88-128))+rbp],rbx
+ mov rbx,QWORD[((208+8-128))+rsi]
+ shl r10,60
+ mov BYTE[12+rsp],dl
+ or rax,r10
+ mov dl,bl
+ shr rbx,4
+ mov r10,r9
+ shr r9,4
+ mov QWORD[96+rbp],r8
+ mov r8,QWORD[((224+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((96-128))+rbp],rax
+ mov rax,QWORD[((224+8-128))+rsi]
+ shl r10,60
+ mov BYTE[13+rsp],dl
+ or rbx,r10
+ mov dl,al
+ shr rax,4
+ mov r10,r8
+ shr r8,4
+ mov QWORD[104+rbp],r9
+ mov r9,QWORD[((240+0-128))+rsi]
+ shl dl,4
+ mov QWORD[((104-128))+rbp],rbx
+ mov rbx,QWORD[((240+8-128))+rsi]
+ shl r10,60
+ mov BYTE[14+rsp],dl
+ or rax,r10
+ mov dl,bl
+ shr rbx,4
+ mov r10,r9
+ shr r9,4
+ mov QWORD[112+rbp],r8
+ shl dl,4
+ mov QWORD[((112-128))+rbp],rax
+ shl r10,60
+ mov BYTE[15+rsp],dl
+ or rbx,r10
+ mov QWORD[120+rbp],r9
+ mov QWORD[((120-128))+rbp],rbx
+ add rsi,-128
+ mov r8,QWORD[8+rdi]
+ mov r9,QWORD[rdi]
+ add r15,r14
+ lea r11,[$L$rem_8bit]
+ jmp NEAR $L$outer_loop
+ALIGN 16
+$L$outer_loop:
+ xor r9,QWORD[r14]
+ mov rdx,QWORD[8+r14]
+ lea r14,[16+r14]
+ xor rdx,r8
+ mov QWORD[rdi],r9
+ mov QWORD[8+rdi],rdx
+ shr rdx,32
+ xor rax,rax
+ rol edx,8
+ mov al,dl
+ movzx ebx,dl
+ shl al,4
+ shr ebx,4
+ rol edx,8
+ mov r8,QWORD[8+rax*1+rsi]
+ mov r9,QWORD[rax*1+rsi]
+ mov al,dl
+ movzx ecx,dl
+ shl al,4
+ movzx r12,BYTE[rbx*1+rsp]
+ shr ecx,4
+ xor r12,r8
+ mov r10,r9
+ shr r8,8
+ movzx r12,r12b
+ shr r9,8
+ xor r8,QWORD[((-128))+rbx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rbx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r12,WORD[r12*2+r11]
+ movzx ebx,dl
+ shl al,4
+ movzx r13,BYTE[rcx*1+rsp]
+ shr ebx,4
+ shl r12,48
+ xor r13,r8
+ mov r10,r9
+ xor r9,r12
+ shr r8,8
+ movzx r13,r13b
+ shr r9,8
+ xor r8,QWORD[((-128))+rcx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rcx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r13,WORD[r13*2+r11]
+ movzx ecx,dl
+ shl al,4
+ movzx r12,BYTE[rbx*1+rsp]
+ shr ecx,4
+ shl r13,48
+ xor r12,r8
+ mov r10,r9
+ xor r9,r13
+ shr r8,8
+ movzx r12,r12b
+ mov edx,DWORD[8+rdi]
+ shr r9,8
+ xor r8,QWORD[((-128))+rbx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rbx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r12,WORD[r12*2+r11]
+ movzx ebx,dl
+ shl al,4
+ movzx r13,BYTE[rcx*1+rsp]
+ shr ebx,4
+ shl r12,48
+ xor r13,r8
+ mov r10,r9
+ xor r9,r12
+ shr r8,8
+ movzx r13,r13b
+ shr r9,8
+ xor r8,QWORD[((-128))+rcx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rcx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r13,WORD[r13*2+r11]
+ movzx ecx,dl
+ shl al,4
+ movzx r12,BYTE[rbx*1+rsp]
+ shr ecx,4
+ shl r13,48
+ xor r12,r8
+ mov r10,r9
+ xor r9,r13
+ shr r8,8
+ movzx r12,r12b
+ shr r9,8
+ xor r8,QWORD[((-128))+rbx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rbx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r12,WORD[r12*2+r11]
+ movzx ebx,dl
+ shl al,4
+ movzx r13,BYTE[rcx*1+rsp]
+ shr ebx,4
+ shl r12,48
+ xor r13,r8
+ mov r10,r9
+ xor r9,r12
+ shr r8,8
+ movzx r13,r13b
+ shr r9,8
+ xor r8,QWORD[((-128))+rcx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rcx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r13,WORD[r13*2+r11]
+ movzx ecx,dl
+ shl al,4
+ movzx r12,BYTE[rbx*1+rsp]
+ shr ecx,4
+ shl r13,48
+ xor r12,r8
+ mov r10,r9
+ xor r9,r13
+ shr r8,8
+ movzx r12,r12b
+ mov edx,DWORD[4+rdi]
+ shr r9,8
+ xor r8,QWORD[((-128))+rbx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rbx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r12,WORD[r12*2+r11]
+ movzx ebx,dl
+ shl al,4
+ movzx r13,BYTE[rcx*1+rsp]
+ shr ebx,4
+ shl r12,48
+ xor r13,r8
+ mov r10,r9
+ xor r9,r12
+ shr r8,8
+ movzx r13,r13b
+ shr r9,8
+ xor r8,QWORD[((-128))+rcx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rcx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r13,WORD[r13*2+r11]
+ movzx ecx,dl
+ shl al,4
+ movzx r12,BYTE[rbx*1+rsp]
+ shr ecx,4
+ shl r13,48
+ xor r12,r8
+ mov r10,r9
+ xor r9,r13
+ shr r8,8
+ movzx r12,r12b
+ shr r9,8
+ xor r8,QWORD[((-128))+rbx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rbx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r12,WORD[r12*2+r11]
+ movzx ebx,dl
+ shl al,4
+ movzx r13,BYTE[rcx*1+rsp]
+ shr ebx,4
+ shl r12,48
+ xor r13,r8
+ mov r10,r9
+ xor r9,r12
+ shr r8,8
+ movzx r13,r13b
+ shr r9,8
+ xor r8,QWORD[((-128))+rcx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rcx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r13,WORD[r13*2+r11]
+ movzx ecx,dl
+ shl al,4
+ movzx r12,BYTE[rbx*1+rsp]
+ shr ecx,4
+ shl r13,48
+ xor r12,r8
+ mov r10,r9
+ xor r9,r13
+ shr r8,8
+ movzx r12,r12b
+ mov edx,DWORD[rdi]
+ shr r9,8
+ xor r8,QWORD[((-128))+rbx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rbx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r12,WORD[r12*2+r11]
+ movzx ebx,dl
+ shl al,4
+ movzx r13,BYTE[rcx*1+rsp]
+ shr ebx,4
+ shl r12,48
+ xor r13,r8
+ mov r10,r9
+ xor r9,r12
+ shr r8,8
+ movzx r13,r13b
+ shr r9,8
+ xor r8,QWORD[((-128))+rcx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rcx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r13,WORD[r13*2+r11]
+ movzx ecx,dl
+ shl al,4
+ movzx r12,BYTE[rbx*1+rsp]
+ shr ecx,4
+ shl r13,48
+ xor r12,r8
+ mov r10,r9
+ xor r9,r13
+ shr r8,8
+ movzx r12,r12b
+ shr r9,8
+ xor r8,QWORD[((-128))+rbx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rbx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r12,WORD[r12*2+r11]
+ movzx ebx,dl
+ shl al,4
+ movzx r13,BYTE[rcx*1+rsp]
+ shr ebx,4
+ shl r12,48
+ xor r13,r8
+ mov r10,r9
+ xor r9,r12
+ shr r8,8
+ movzx r13,r13b
+ shr r9,8
+ xor r8,QWORD[((-128))+rcx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rcx*8+rbp]
+ rol edx,8
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ mov al,dl
+ xor r8,r10
+ movzx r13,WORD[r13*2+r11]
+ movzx ecx,dl
+ shl al,4
+ movzx r12,BYTE[rbx*1+rsp]
+ and ecx,240
+ shl r13,48
+ xor r12,r8
+ mov r10,r9
+ xor r9,r13
+ shr r8,8
+ movzx r12,r12b
+ mov edx,DWORD[((-4))+rdi]
+ shr r9,8
+ xor r8,QWORD[((-128))+rbx*8+rbp]
+ shl r10,56
+ xor r9,QWORD[rbx*8+rbp]
+ movzx r12,WORD[r12*2+r11]
+ xor r8,QWORD[8+rax*1+rsi]
+ xor r9,QWORD[rax*1+rsi]
+ shl r12,48
+ xor r8,r10
+ xor r9,r12
+ movzx r13,r8b
+ shr r8,4
+ mov r10,r9
+ shl r13b,4
+ shr r9,4
+ xor r8,QWORD[8+rcx*1+rsi]
+ movzx r13,WORD[r13*2+r11]
+ shl r10,60
+ xor r9,QWORD[rcx*1+rsi]
+ xor r8,r10
+ shl r13,48
+ bswap r8
+ xor r9,r13
+ bswap r9
+ cmp r14,r15
+ jb NEAR $L$outer_loop
+ mov QWORD[8+rdi],r8
+ mov QWORD[rdi],r9
+
+ lea rsi,[((280+48))+rsp]
+
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$ghash_epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_gcm_ghash_4bit:
+global gcm_init_clmul
+
+ALIGN 16
+gcm_init_clmul:
+
+$L$_init_clmul:
+$L$SEH_begin_gcm_init_clmul:
+
+DB 0x48,0x83,0xec,0x18
+DB 0x0f,0x29,0x34,0x24
+ movdqu xmm2,XMMWORD[rdx]
+ pshufd xmm2,xmm2,78
+
+
+ pshufd xmm4,xmm2,255
+ movdqa xmm3,xmm2
+ psllq xmm2,1
+ pxor xmm5,xmm5
+ psrlq xmm3,63
+ pcmpgtd xmm5,xmm4
+ pslldq xmm3,8
+ por xmm2,xmm3
+
+
+ pand xmm5,XMMWORD[$L$0x1c2_polynomial]
+ pxor xmm2,xmm5
+
+
+ pshufd xmm6,xmm2,78
+ movdqa xmm0,xmm2
+ pxor xmm6,xmm2
+ movdqa xmm1,xmm0
+ pshufd xmm3,xmm0,78
+ pxor xmm3,xmm0
+DB 102,15,58,68,194,0
+DB 102,15,58,68,202,17
+DB 102,15,58,68,222,0
+ pxor xmm3,xmm0
+ pxor xmm3,xmm1
+
+ movdqa xmm4,xmm3
+ psrldq xmm3,8
+ pslldq xmm4,8
+ pxor xmm1,xmm3
+ pxor xmm0,xmm4
+
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+
+
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+ pshufd xmm3,xmm2,78
+ pshufd xmm4,xmm0,78
+ pxor xmm3,xmm2
+ movdqu XMMWORD[rcx],xmm2
+ pxor xmm4,xmm0
+ movdqu XMMWORD[16+rcx],xmm0
+DB 102,15,58,15,227,8
+ movdqu XMMWORD[32+rcx],xmm4
+ movdqa xmm1,xmm0
+ pshufd xmm3,xmm0,78
+ pxor xmm3,xmm0
+DB 102,15,58,68,194,0
+DB 102,15,58,68,202,17
+DB 102,15,58,68,222,0
+ pxor xmm3,xmm0
+ pxor xmm3,xmm1
+
+ movdqa xmm4,xmm3
+ psrldq xmm3,8
+ pslldq xmm4,8
+ pxor xmm1,xmm3
+ pxor xmm0,xmm4
+
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+
+
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+ movdqa xmm5,xmm0
+ movdqa xmm1,xmm0
+ pshufd xmm3,xmm0,78
+ pxor xmm3,xmm0
+DB 102,15,58,68,194,0
+DB 102,15,58,68,202,17
+DB 102,15,58,68,222,0
+ pxor xmm3,xmm0
+ pxor xmm3,xmm1
+
+ movdqa xmm4,xmm3
+ psrldq xmm3,8
+ pslldq xmm4,8
+ pxor xmm1,xmm3
+ pxor xmm0,xmm4
+
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+
+
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+ pshufd xmm3,xmm5,78
+ pshufd xmm4,xmm0,78
+ pxor xmm3,xmm5
+ movdqu XMMWORD[48+rcx],xmm5
+ pxor xmm4,xmm0
+ movdqu XMMWORD[64+rcx],xmm0
+DB 102,15,58,15,227,8
+ movdqu XMMWORD[80+rcx],xmm4
+ movaps xmm6,XMMWORD[rsp]
+ lea rsp,[24+rsp]
+$L$SEH_end_gcm_init_clmul:
+ DB 0F3h,0C3h ;repret
+
+
+global gcm_gmult_clmul
+
+ALIGN 16
+gcm_gmult_clmul:
+
+$L$_gmult_clmul:
+ movdqu xmm0,XMMWORD[rcx]
+ movdqa xmm5,XMMWORD[$L$bswap_mask]
+ movdqu xmm2,XMMWORD[rdx]
+ movdqu xmm4,XMMWORD[32+rdx]
+DB 102,15,56,0,197
+ movdqa xmm1,xmm0
+ pshufd xmm3,xmm0,78
+ pxor xmm3,xmm0
+DB 102,15,58,68,194,0
+DB 102,15,58,68,202,17
+DB 102,15,58,68,220,0
+ pxor xmm3,xmm0
+ pxor xmm3,xmm1
+
+ movdqa xmm4,xmm3
+ psrldq xmm3,8
+ pslldq xmm4,8
+ pxor xmm1,xmm3
+ pxor xmm0,xmm4
+
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+
+
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+DB 102,15,56,0,197
+ movdqu XMMWORD[rcx],xmm0
+ DB 0F3h,0C3h ;repret
+
+
+global gcm_ghash_clmul
+
+ALIGN 32
+gcm_ghash_clmul:
+
+$L$_ghash_clmul:
+ lea rax,[((-136))+rsp]
+$L$SEH_begin_gcm_ghash_clmul:
+
+DB 0x48,0x8d,0x60,0xe0
+DB 0x0f,0x29,0x70,0xe0
+DB 0x0f,0x29,0x78,0xf0
+DB 0x44,0x0f,0x29,0x00
+DB 0x44,0x0f,0x29,0x48,0x10
+DB 0x44,0x0f,0x29,0x50,0x20
+DB 0x44,0x0f,0x29,0x58,0x30
+DB 0x44,0x0f,0x29,0x60,0x40
+DB 0x44,0x0f,0x29,0x68,0x50
+DB 0x44,0x0f,0x29,0x70,0x60
+DB 0x44,0x0f,0x29,0x78,0x70
+ movdqa xmm10,XMMWORD[$L$bswap_mask]
+
+ movdqu xmm0,XMMWORD[rcx]
+ movdqu xmm2,XMMWORD[rdx]
+ movdqu xmm7,XMMWORD[32+rdx]
+DB 102,65,15,56,0,194
+
+ sub r9,0x10
+ jz NEAR $L$odd_tail
+
+ movdqu xmm6,XMMWORD[16+rdx]
+ mov eax,DWORD[((OPENSSL_ia32cap_P+4))]
+ cmp r9,0x30
+ jb NEAR $L$skip4x
+
+ and eax,71303168
+ cmp eax,4194304
+ je NEAR $L$skip4x
+
+ sub r9,0x30
+ mov rax,0xA040608020C0E000
+ movdqu xmm14,XMMWORD[48+rdx]
+ movdqu xmm15,XMMWORD[64+rdx]
+
+
+
+
+ movdqu xmm3,XMMWORD[48+r8]
+ movdqu xmm11,XMMWORD[32+r8]
+DB 102,65,15,56,0,218
+DB 102,69,15,56,0,218
+ movdqa xmm5,xmm3
+ pshufd xmm4,xmm3,78
+ pxor xmm4,xmm3
+DB 102,15,58,68,218,0
+DB 102,15,58,68,234,17
+DB 102,15,58,68,231,0
+
+ movdqa xmm13,xmm11
+ pshufd xmm12,xmm11,78
+ pxor xmm12,xmm11
+DB 102,68,15,58,68,222,0
+DB 102,68,15,58,68,238,17
+DB 102,68,15,58,68,231,16
+ xorps xmm3,xmm11
+ xorps xmm5,xmm13
+ movups xmm7,XMMWORD[80+rdx]
+ xorps xmm4,xmm12
+
+ movdqu xmm11,XMMWORD[16+r8]
+ movdqu xmm8,XMMWORD[r8]
+DB 102,69,15,56,0,218
+DB 102,69,15,56,0,194
+ movdqa xmm13,xmm11
+ pshufd xmm12,xmm11,78
+ pxor xmm0,xmm8
+ pxor xmm12,xmm11
+DB 102,69,15,58,68,222,0
+ movdqa xmm1,xmm0
+ pshufd xmm8,xmm0,78
+ pxor xmm8,xmm0
+DB 102,69,15,58,68,238,17
+DB 102,68,15,58,68,231,0
+ xorps xmm3,xmm11
+ xorps xmm5,xmm13
+
+ lea r8,[64+r8]
+ sub r9,0x40
+ jc NEAR $L$tail4x
+
+ jmp NEAR $L$mod4_loop
+ALIGN 32
+$L$mod4_loop:
+DB 102,65,15,58,68,199,0
+ xorps xmm4,xmm12
+ movdqu xmm11,XMMWORD[48+r8]
+DB 102,69,15,56,0,218
+DB 102,65,15,58,68,207,17
+ xorps xmm0,xmm3
+ movdqu xmm3,XMMWORD[32+r8]
+ movdqa xmm13,xmm11
+DB 102,68,15,58,68,199,16
+ pshufd xmm12,xmm11,78
+ xorps xmm1,xmm5
+ pxor xmm12,xmm11
+DB 102,65,15,56,0,218
+ movups xmm7,XMMWORD[32+rdx]
+ xorps xmm8,xmm4
+DB 102,68,15,58,68,218,0
+ pshufd xmm4,xmm3,78
+
+ pxor xmm8,xmm0
+ movdqa xmm5,xmm3
+ pxor xmm8,xmm1
+ pxor xmm4,xmm3
+ movdqa xmm9,xmm8
+DB 102,68,15,58,68,234,17
+ pslldq xmm8,8
+ psrldq xmm9,8
+ pxor xmm0,xmm8
+ movdqa xmm8,XMMWORD[$L$7_mask]
+ pxor xmm1,xmm9
+DB 102,76,15,110,200
+
+ pand xmm8,xmm0
+DB 102,69,15,56,0,200
+ pxor xmm9,xmm0
+DB 102,68,15,58,68,231,0
+ psllq xmm9,57
+ movdqa xmm8,xmm9
+ pslldq xmm9,8
+DB 102,15,58,68,222,0
+ psrldq xmm8,8
+ pxor xmm0,xmm9
+ pxor xmm1,xmm8
+ movdqu xmm8,XMMWORD[r8]
+
+ movdqa xmm9,xmm0
+ psrlq xmm0,1
+DB 102,15,58,68,238,17
+ xorps xmm3,xmm11
+ movdqu xmm11,XMMWORD[16+r8]
+DB 102,69,15,56,0,218
+DB 102,15,58,68,231,16
+ xorps xmm5,xmm13
+ movups xmm7,XMMWORD[80+rdx]
+DB 102,69,15,56,0,194
+ pxor xmm1,xmm9
+ pxor xmm9,xmm0
+ psrlq xmm0,5
+
+ movdqa xmm13,xmm11
+ pxor xmm4,xmm12
+ pshufd xmm12,xmm11,78
+ pxor xmm0,xmm9
+ pxor xmm1,xmm8
+ pxor xmm12,xmm11
+DB 102,69,15,58,68,222,0
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+ movdqa xmm1,xmm0
+DB 102,69,15,58,68,238,17
+ xorps xmm3,xmm11
+ pshufd xmm8,xmm0,78
+ pxor xmm8,xmm0
+
+DB 102,68,15,58,68,231,0
+ xorps xmm5,xmm13
+
+ lea r8,[64+r8]
+ sub r9,0x40
+ jnc NEAR $L$mod4_loop
+
+$L$tail4x:
+DB 102,65,15,58,68,199,0
+DB 102,65,15,58,68,207,17
+DB 102,68,15,58,68,199,16
+ xorps xmm4,xmm12
+ xorps xmm0,xmm3
+ xorps xmm1,xmm5
+ pxor xmm1,xmm0
+ pxor xmm8,xmm4
+
+ pxor xmm8,xmm1
+ pxor xmm1,xmm0
+
+ movdqa xmm9,xmm8
+ psrldq xmm8,8
+ pslldq xmm9,8
+ pxor xmm1,xmm8
+ pxor xmm0,xmm9
+
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+
+
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+ add r9,0x40
+ jz NEAR $L$done
+ movdqu xmm7,XMMWORD[32+rdx]
+ sub r9,0x10
+ jz NEAR $L$odd_tail
+$L$skip4x:
+
+
+
+
+
+ movdqu xmm8,XMMWORD[r8]
+ movdqu xmm3,XMMWORD[16+r8]
+DB 102,69,15,56,0,194
+DB 102,65,15,56,0,218
+ pxor xmm0,xmm8
+
+ movdqa xmm5,xmm3
+ pshufd xmm4,xmm3,78
+ pxor xmm4,xmm3
+DB 102,15,58,68,218,0
+DB 102,15,58,68,234,17
+DB 102,15,58,68,231,0
+
+ lea r8,[32+r8]
+ nop
+ sub r9,0x20
+ jbe NEAR $L$even_tail
+ nop
+ jmp NEAR $L$mod_loop
+
+ALIGN 32
+$L$mod_loop:
+ movdqa xmm1,xmm0
+ movdqa xmm8,xmm4
+ pshufd xmm4,xmm0,78
+ pxor xmm4,xmm0
+
+DB 102,15,58,68,198,0
+DB 102,15,58,68,206,17
+DB 102,15,58,68,231,16
+
+ pxor xmm0,xmm3
+ pxor xmm1,xmm5
+ movdqu xmm9,XMMWORD[r8]
+ pxor xmm8,xmm0
+DB 102,69,15,56,0,202
+ movdqu xmm3,XMMWORD[16+r8]
+
+ pxor xmm8,xmm1
+ pxor xmm1,xmm9
+ pxor xmm4,xmm8
+DB 102,65,15,56,0,218
+ movdqa xmm8,xmm4
+ psrldq xmm8,8
+ pslldq xmm4,8
+ pxor xmm1,xmm8
+ pxor xmm0,xmm4
+
+ movdqa xmm5,xmm3
+
+ movdqa xmm9,xmm0
+ movdqa xmm8,xmm0
+ psllq xmm0,5
+ pxor xmm8,xmm0
+DB 102,15,58,68,218,0
+ psllq xmm0,1
+ pxor xmm0,xmm8
+ psllq xmm0,57
+ movdqa xmm8,xmm0
+ pslldq xmm0,8
+ psrldq xmm8,8
+ pxor xmm0,xmm9
+ pshufd xmm4,xmm5,78
+ pxor xmm1,xmm8
+ pxor xmm4,xmm5
+
+ movdqa xmm9,xmm0
+ psrlq xmm0,1
+DB 102,15,58,68,234,17
+ pxor xmm1,xmm9
+ pxor xmm9,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm9
+ lea r8,[32+r8]
+ psrlq xmm0,1
+DB 102,15,58,68,231,0
+ pxor xmm0,xmm1
+
+ sub r9,0x20
+ ja NEAR $L$mod_loop
+
+$L$even_tail:
+ movdqa xmm1,xmm0
+ movdqa xmm8,xmm4
+ pshufd xmm4,xmm0,78
+ pxor xmm4,xmm0
+
+DB 102,15,58,68,198,0
+DB 102,15,58,68,206,17
+DB 102,15,58,68,231,16
+
+ pxor xmm0,xmm3
+ pxor xmm1,xmm5
+ pxor xmm8,xmm0
+ pxor xmm8,xmm1
+ pxor xmm4,xmm8
+ movdqa xmm8,xmm4
+ psrldq xmm8,8
+ pslldq xmm4,8
+ pxor xmm1,xmm8
+ pxor xmm0,xmm4
+
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+
+
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+ test r9,r9
+ jnz NEAR $L$done
+
+$L$odd_tail:
+ movdqu xmm8,XMMWORD[r8]
+DB 102,69,15,56,0,194
+ pxor xmm0,xmm8
+ movdqa xmm1,xmm0
+ pshufd xmm3,xmm0,78
+ pxor xmm3,xmm0
+DB 102,15,58,68,194,0
+DB 102,15,58,68,202,17
+DB 102,15,58,68,223,0
+ pxor xmm3,xmm0
+ pxor xmm3,xmm1
+
+ movdqa xmm4,xmm3
+ psrldq xmm3,8
+ pslldq xmm4,8
+ pxor xmm1,xmm3
+ pxor xmm0,xmm4
+
+ movdqa xmm4,xmm0
+ movdqa xmm3,xmm0
+ psllq xmm0,5
+ pxor xmm3,xmm0
+ psllq xmm0,1
+ pxor xmm0,xmm3
+ psllq xmm0,57
+ movdqa xmm3,xmm0
+ pslldq xmm0,8
+ psrldq xmm3,8
+ pxor xmm0,xmm4
+ pxor xmm1,xmm3
+
+
+ movdqa xmm4,xmm0
+ psrlq xmm0,1
+ pxor xmm1,xmm4
+ pxor xmm4,xmm0
+ psrlq xmm0,5
+ pxor xmm0,xmm4
+ psrlq xmm0,1
+ pxor xmm0,xmm1
+$L$done:
+DB 102,65,15,56,0,194
+ movdqu XMMWORD[rcx],xmm0
+ movaps xmm6,XMMWORD[rsp]
+ movaps xmm7,XMMWORD[16+rsp]
+ movaps xmm8,XMMWORD[32+rsp]
+ movaps xmm9,XMMWORD[48+rsp]
+ movaps xmm10,XMMWORD[64+rsp]
+ movaps xmm11,XMMWORD[80+rsp]
+ movaps xmm12,XMMWORD[96+rsp]
+ movaps xmm13,XMMWORD[112+rsp]
+ movaps xmm14,XMMWORD[128+rsp]
+ movaps xmm15,XMMWORD[144+rsp]
+ lea rsp,[168+rsp]
+$L$SEH_end_gcm_ghash_clmul:
+ DB 0F3h,0C3h ;repret
+
+
+global gcm_init_avx
+
+ALIGN 32
+gcm_init_avx:
+
+$L$SEH_begin_gcm_init_avx:
+
+DB 0x48,0x83,0xec,0x18
+DB 0x0f,0x29,0x34,0x24
+ vzeroupper
+
+ vmovdqu xmm2,XMMWORD[rdx]
+ vpshufd xmm2,xmm2,78
+
+
+ vpshufd xmm4,xmm2,255
+ vpsrlq xmm3,xmm2,63
+ vpsllq xmm2,xmm2,1
+ vpxor xmm5,xmm5,xmm5
+ vpcmpgtd xmm5,xmm5,xmm4
+ vpslldq xmm3,xmm3,8
+ vpor xmm2,xmm2,xmm3
+
+
+ vpand xmm5,xmm5,XMMWORD[$L$0x1c2_polynomial]
+ vpxor xmm2,xmm2,xmm5
+
+ vpunpckhqdq xmm6,xmm2,xmm2
+ vmovdqa xmm0,xmm2
+ vpxor xmm6,xmm6,xmm2
+ mov r10,4
+ jmp NEAR $L$init_start_avx
+ALIGN 32
+$L$init_loop_avx:
+ vpalignr xmm5,xmm4,xmm3,8
+ vmovdqu XMMWORD[(-16)+rcx],xmm5
+ vpunpckhqdq xmm3,xmm0,xmm0
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm1,xmm0,xmm2,0x11
+ vpclmulqdq xmm0,xmm0,xmm2,0x00
+ vpclmulqdq xmm3,xmm3,xmm6,0x00
+ vpxor xmm4,xmm1,xmm0
+ vpxor xmm3,xmm3,xmm4
+
+ vpslldq xmm4,xmm3,8
+ vpsrldq xmm3,xmm3,8
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm1,xmm1,xmm3
+ vpsllq xmm3,xmm0,57
+ vpsllq xmm4,xmm0,62
+ vpxor xmm4,xmm4,xmm3
+ vpsllq xmm3,xmm0,63
+ vpxor xmm4,xmm4,xmm3
+ vpslldq xmm3,xmm4,8
+ vpsrldq xmm4,xmm4,8
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm1,xmm1,xmm4
+
+ vpsrlq xmm4,xmm0,1
+ vpxor xmm1,xmm1,xmm0
+ vpxor xmm0,xmm0,xmm4
+ vpsrlq xmm4,xmm4,5
+ vpxor xmm0,xmm0,xmm4
+ vpsrlq xmm0,xmm0,1
+ vpxor xmm0,xmm0,xmm1
+$L$init_start_avx:
+ vmovdqa xmm5,xmm0
+ vpunpckhqdq xmm3,xmm0,xmm0
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm1,xmm0,xmm2,0x11
+ vpclmulqdq xmm0,xmm0,xmm2,0x00
+ vpclmulqdq xmm3,xmm3,xmm6,0x00
+ vpxor xmm4,xmm1,xmm0
+ vpxor xmm3,xmm3,xmm4
+
+ vpslldq xmm4,xmm3,8
+ vpsrldq xmm3,xmm3,8
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm1,xmm1,xmm3
+ vpsllq xmm3,xmm0,57
+ vpsllq xmm4,xmm0,62
+ vpxor xmm4,xmm4,xmm3
+ vpsllq xmm3,xmm0,63
+ vpxor xmm4,xmm4,xmm3
+ vpslldq xmm3,xmm4,8
+ vpsrldq xmm4,xmm4,8
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm1,xmm1,xmm4
+
+ vpsrlq xmm4,xmm0,1
+ vpxor xmm1,xmm1,xmm0
+ vpxor xmm0,xmm0,xmm4
+ vpsrlq xmm4,xmm4,5
+ vpxor xmm0,xmm0,xmm4
+ vpsrlq xmm0,xmm0,1
+ vpxor xmm0,xmm0,xmm1
+ vpshufd xmm3,xmm5,78
+ vpshufd xmm4,xmm0,78
+ vpxor xmm3,xmm3,xmm5
+ vmovdqu XMMWORD[rcx],xmm5
+ vpxor xmm4,xmm4,xmm0
+ vmovdqu XMMWORD[16+rcx],xmm0
+ lea rcx,[48+rcx]
+ sub r10,1
+ jnz NEAR $L$init_loop_avx
+
+ vpalignr xmm5,xmm3,xmm4,8
+ vmovdqu XMMWORD[(-16)+rcx],xmm5
+
+ vzeroupper
+ movaps xmm6,XMMWORD[rsp]
+ lea rsp,[24+rsp]
+$L$SEH_end_gcm_init_avx:
+ DB 0F3h,0C3h ;repret
+
+
+global gcm_gmult_avx
+
+ALIGN 32
+gcm_gmult_avx:
+
+ jmp NEAR $L$_gmult_clmul
+
+
+global gcm_ghash_avx
+
+ALIGN 32
+gcm_ghash_avx:
+
+ lea rax,[((-136))+rsp]
+$L$SEH_begin_gcm_ghash_avx:
+
+DB 0x48,0x8d,0x60,0xe0
+DB 0x0f,0x29,0x70,0xe0
+DB 0x0f,0x29,0x78,0xf0
+DB 0x44,0x0f,0x29,0x00
+DB 0x44,0x0f,0x29,0x48,0x10
+DB 0x44,0x0f,0x29,0x50,0x20
+DB 0x44,0x0f,0x29,0x58,0x30
+DB 0x44,0x0f,0x29,0x60,0x40
+DB 0x44,0x0f,0x29,0x68,0x50
+DB 0x44,0x0f,0x29,0x70,0x60
+DB 0x44,0x0f,0x29,0x78,0x70
+ vzeroupper
+
+ vmovdqu xmm10,XMMWORD[rcx]
+ lea r10,[$L$0x1c2_polynomial]
+ lea rdx,[64+rdx]
+ vmovdqu xmm13,XMMWORD[$L$bswap_mask]
+ vpshufb xmm10,xmm10,xmm13
+ cmp r9,0x80
+ jb NEAR $L$short_avx
+ sub r9,0x80
+
+ vmovdqu xmm14,XMMWORD[112+r8]
+ vmovdqu xmm6,XMMWORD[((0-64))+rdx]
+ vpshufb xmm14,xmm14,xmm13
+ vmovdqu xmm7,XMMWORD[((32-64))+rdx]
+
+ vpunpckhqdq xmm9,xmm14,xmm14
+ vmovdqu xmm15,XMMWORD[96+r8]
+ vpclmulqdq xmm0,xmm14,xmm6,0x00
+ vpxor xmm9,xmm9,xmm14
+ vpshufb xmm15,xmm15,xmm13
+ vpclmulqdq xmm1,xmm14,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((16-64))+rdx]
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vmovdqu xmm14,XMMWORD[80+r8]
+ vpclmulqdq xmm2,xmm9,xmm7,0x00
+ vpxor xmm8,xmm8,xmm15
+
+ vpshufb xmm14,xmm14,xmm13
+ vpclmulqdq xmm3,xmm15,xmm6,0x00
+ vpunpckhqdq xmm9,xmm14,xmm14
+ vpclmulqdq xmm4,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((48-64))+rdx]
+ vpxor xmm9,xmm9,xmm14
+ vmovdqu xmm15,XMMWORD[64+r8]
+ vpclmulqdq xmm5,xmm8,xmm7,0x10
+ vmovdqu xmm7,XMMWORD[((80-64))+rdx]
+
+ vpshufb xmm15,xmm15,xmm13
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm14,xmm6,0x00
+ vpxor xmm4,xmm4,xmm1
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpclmulqdq xmm1,xmm14,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((64-64))+rdx]
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm2,xmm9,xmm7,0x00
+ vpxor xmm8,xmm8,xmm15
+
+ vmovdqu xmm14,XMMWORD[48+r8]
+ vpxor xmm0,xmm0,xmm3
+ vpclmulqdq xmm3,xmm15,xmm6,0x00
+ vpxor xmm1,xmm1,xmm4
+ vpshufb xmm14,xmm14,xmm13
+ vpclmulqdq xmm4,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((96-64))+rdx]
+ vpxor xmm2,xmm2,xmm5
+ vpunpckhqdq xmm9,xmm14,xmm14
+ vpclmulqdq xmm5,xmm8,xmm7,0x10
+ vmovdqu xmm7,XMMWORD[((128-64))+rdx]
+ vpxor xmm9,xmm9,xmm14
+
+ vmovdqu xmm15,XMMWORD[32+r8]
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm14,xmm6,0x00
+ vpxor xmm4,xmm4,xmm1
+ vpshufb xmm15,xmm15,xmm13
+ vpclmulqdq xmm1,xmm14,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((112-64))+rdx]
+ vpxor xmm5,xmm5,xmm2
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpclmulqdq xmm2,xmm9,xmm7,0x00
+ vpxor xmm8,xmm8,xmm15
+
+ vmovdqu xmm14,XMMWORD[16+r8]
+ vpxor xmm0,xmm0,xmm3
+ vpclmulqdq xmm3,xmm15,xmm6,0x00
+ vpxor xmm1,xmm1,xmm4
+ vpshufb xmm14,xmm14,xmm13
+ vpclmulqdq xmm4,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((144-64))+rdx]
+ vpxor xmm2,xmm2,xmm5
+ vpunpckhqdq xmm9,xmm14,xmm14
+ vpclmulqdq xmm5,xmm8,xmm7,0x10
+ vmovdqu xmm7,XMMWORD[((176-64))+rdx]
+ vpxor xmm9,xmm9,xmm14
+
+ vmovdqu xmm15,XMMWORD[r8]
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm14,xmm6,0x00
+ vpxor xmm4,xmm4,xmm1
+ vpshufb xmm15,xmm15,xmm13
+ vpclmulqdq xmm1,xmm14,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((160-64))+rdx]
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm2,xmm9,xmm7,0x10
+
+ lea r8,[128+r8]
+ cmp r9,0x80
+ jb NEAR $L$tail_avx
+
+ vpxor xmm15,xmm15,xmm10
+ sub r9,0x80
+ jmp NEAR $L$oop8x_avx
+
+ALIGN 32
+$L$oop8x_avx:
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vmovdqu xmm14,XMMWORD[112+r8]
+ vpxor xmm3,xmm3,xmm0
+ vpxor xmm8,xmm8,xmm15
+ vpclmulqdq xmm10,xmm15,xmm6,0x00
+ vpshufb xmm14,xmm14,xmm13
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm11,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((0-64))+rdx]
+ vpunpckhqdq xmm9,xmm14,xmm14
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm12,xmm8,xmm7,0x00
+ vmovdqu xmm7,XMMWORD[((32-64))+rdx]
+ vpxor xmm9,xmm9,xmm14
+
+ vmovdqu xmm15,XMMWORD[96+r8]
+ vpclmulqdq xmm0,xmm14,xmm6,0x00
+ vpxor xmm10,xmm10,xmm3
+ vpshufb xmm15,xmm15,xmm13
+ vpclmulqdq xmm1,xmm14,xmm6,0x11
+ vxorps xmm11,xmm11,xmm4
+ vmovdqu xmm6,XMMWORD[((16-64))+rdx]
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpclmulqdq xmm2,xmm9,xmm7,0x00
+ vpxor xmm12,xmm12,xmm5
+ vxorps xmm8,xmm8,xmm15
+
+ vmovdqu xmm14,XMMWORD[80+r8]
+ vpxor xmm12,xmm12,xmm10
+ vpclmulqdq xmm3,xmm15,xmm6,0x00
+ vpxor xmm12,xmm12,xmm11
+ vpslldq xmm9,xmm12,8
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm4,xmm15,xmm6,0x11
+ vpsrldq xmm12,xmm12,8
+ vpxor xmm10,xmm10,xmm9
+ vmovdqu xmm6,XMMWORD[((48-64))+rdx]
+ vpshufb xmm14,xmm14,xmm13
+ vxorps xmm11,xmm11,xmm12
+ vpxor xmm4,xmm4,xmm1
+ vpunpckhqdq xmm9,xmm14,xmm14
+ vpclmulqdq xmm5,xmm8,xmm7,0x10
+ vmovdqu xmm7,XMMWORD[((80-64))+rdx]
+ vpxor xmm9,xmm9,xmm14
+ vpxor xmm5,xmm5,xmm2
+
+ vmovdqu xmm15,XMMWORD[64+r8]
+ vpalignr xmm12,xmm10,xmm10,8
+ vpclmulqdq xmm0,xmm14,xmm6,0x00
+ vpshufb xmm15,xmm15,xmm13
+ vpxor xmm0,xmm0,xmm3
+ vpclmulqdq xmm1,xmm14,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((64-64))+rdx]
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpxor xmm1,xmm1,xmm4
+ vpclmulqdq xmm2,xmm9,xmm7,0x00
+ vxorps xmm8,xmm8,xmm15
+ vpxor xmm2,xmm2,xmm5
+
+ vmovdqu xmm14,XMMWORD[48+r8]
+ vpclmulqdq xmm10,xmm10,XMMWORD[r10],0x10
+ vpclmulqdq xmm3,xmm15,xmm6,0x00
+ vpshufb xmm14,xmm14,xmm13
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm4,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((96-64))+rdx]
+ vpunpckhqdq xmm9,xmm14,xmm14
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm5,xmm8,xmm7,0x10
+ vmovdqu xmm7,XMMWORD[((128-64))+rdx]
+ vpxor xmm9,xmm9,xmm14
+ vpxor xmm5,xmm5,xmm2
+
+ vmovdqu xmm15,XMMWORD[32+r8]
+ vpclmulqdq xmm0,xmm14,xmm6,0x00
+ vpshufb xmm15,xmm15,xmm13
+ vpxor xmm0,xmm0,xmm3
+ vpclmulqdq xmm1,xmm14,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((112-64))+rdx]
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpxor xmm1,xmm1,xmm4
+ vpclmulqdq xmm2,xmm9,xmm7,0x00
+ vpxor xmm8,xmm8,xmm15
+ vpxor xmm2,xmm2,xmm5
+ vxorps xmm10,xmm10,xmm12
+
+ vmovdqu xmm14,XMMWORD[16+r8]
+ vpalignr xmm12,xmm10,xmm10,8
+ vpclmulqdq xmm3,xmm15,xmm6,0x00
+ vpshufb xmm14,xmm14,xmm13
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm4,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((144-64))+rdx]
+ vpclmulqdq xmm10,xmm10,XMMWORD[r10],0x10
+ vxorps xmm12,xmm12,xmm11
+ vpunpckhqdq xmm9,xmm14,xmm14
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm5,xmm8,xmm7,0x10
+ vmovdqu xmm7,XMMWORD[((176-64))+rdx]
+ vpxor xmm9,xmm9,xmm14
+ vpxor xmm5,xmm5,xmm2
+
+ vmovdqu xmm15,XMMWORD[r8]
+ vpclmulqdq xmm0,xmm14,xmm6,0x00
+ vpshufb xmm15,xmm15,xmm13
+ vpclmulqdq xmm1,xmm14,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((160-64))+rdx]
+ vpxor xmm15,xmm15,xmm12
+ vpclmulqdq xmm2,xmm9,xmm7,0x10
+ vpxor xmm15,xmm15,xmm10
+
+ lea r8,[128+r8]
+ sub r9,0x80
+ jnc NEAR $L$oop8x_avx
+
+ add r9,0x80
+ jmp NEAR $L$tail_no_xor_avx
+
+ALIGN 32
+$L$short_avx:
+ vmovdqu xmm14,XMMWORD[((-16))+r9*1+r8]
+ lea r8,[r9*1+r8]
+ vmovdqu xmm6,XMMWORD[((0-64))+rdx]
+ vmovdqu xmm7,XMMWORD[((32-64))+rdx]
+ vpshufb xmm15,xmm14,xmm13
+
+ vmovdqa xmm3,xmm0
+ vmovdqa xmm4,xmm1
+ vmovdqa xmm5,xmm2
+ sub r9,0x10
+ jz NEAR $L$tail_avx
+
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm15,xmm6,0x00
+ vpxor xmm8,xmm8,xmm15
+ vmovdqu xmm14,XMMWORD[((-32))+r8]
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm1,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((16-64))+rdx]
+ vpshufb xmm15,xmm14,xmm13
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm2,xmm8,xmm7,0x00
+ vpsrldq xmm7,xmm7,8
+ sub r9,0x10
+ jz NEAR $L$tail_avx
+
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm15,xmm6,0x00
+ vpxor xmm8,xmm8,xmm15
+ vmovdqu xmm14,XMMWORD[((-48))+r8]
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm1,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((48-64))+rdx]
+ vpshufb xmm15,xmm14,xmm13
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm2,xmm8,xmm7,0x00
+ vmovdqu xmm7,XMMWORD[((80-64))+rdx]
+ sub r9,0x10
+ jz NEAR $L$tail_avx
+
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm15,xmm6,0x00
+ vpxor xmm8,xmm8,xmm15
+ vmovdqu xmm14,XMMWORD[((-64))+r8]
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm1,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((64-64))+rdx]
+ vpshufb xmm15,xmm14,xmm13
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm2,xmm8,xmm7,0x00
+ vpsrldq xmm7,xmm7,8
+ sub r9,0x10
+ jz NEAR $L$tail_avx
+
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm15,xmm6,0x00
+ vpxor xmm8,xmm8,xmm15
+ vmovdqu xmm14,XMMWORD[((-80))+r8]
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm1,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((96-64))+rdx]
+ vpshufb xmm15,xmm14,xmm13
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm2,xmm8,xmm7,0x00
+ vmovdqu xmm7,XMMWORD[((128-64))+rdx]
+ sub r9,0x10
+ jz NEAR $L$tail_avx
+
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm15,xmm6,0x00
+ vpxor xmm8,xmm8,xmm15
+ vmovdqu xmm14,XMMWORD[((-96))+r8]
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm1,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((112-64))+rdx]
+ vpshufb xmm15,xmm14,xmm13
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm2,xmm8,xmm7,0x00
+ vpsrldq xmm7,xmm7,8
+ sub r9,0x10
+ jz NEAR $L$tail_avx
+
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm15,xmm6,0x00
+ vpxor xmm8,xmm8,xmm15
+ vmovdqu xmm14,XMMWORD[((-112))+r8]
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm1,xmm15,xmm6,0x11
+ vmovdqu xmm6,XMMWORD[((144-64))+rdx]
+ vpshufb xmm15,xmm14,xmm13
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm2,xmm8,xmm7,0x00
+ vmovq xmm7,QWORD[((184-64))+rdx]
+ sub r9,0x10
+ jmp NEAR $L$tail_avx
+
+ALIGN 32
+$L$tail_avx:
+ vpxor xmm15,xmm15,xmm10
+$L$tail_no_xor_avx:
+ vpunpckhqdq xmm8,xmm15,xmm15
+ vpxor xmm3,xmm3,xmm0
+ vpclmulqdq xmm0,xmm15,xmm6,0x00
+ vpxor xmm8,xmm8,xmm15
+ vpxor xmm4,xmm4,xmm1
+ vpclmulqdq xmm1,xmm15,xmm6,0x11
+ vpxor xmm5,xmm5,xmm2
+ vpclmulqdq xmm2,xmm8,xmm7,0x00
+
+ vmovdqu xmm12,XMMWORD[r10]
+
+ vpxor xmm10,xmm3,xmm0
+ vpxor xmm11,xmm4,xmm1
+ vpxor xmm5,xmm5,xmm2
+
+ vpxor xmm5,xmm5,xmm10
+ vpxor xmm5,xmm5,xmm11
+ vpslldq xmm9,xmm5,8
+ vpsrldq xmm5,xmm5,8
+ vpxor xmm10,xmm10,xmm9
+ vpxor xmm11,xmm11,xmm5
+
+ vpclmulqdq xmm9,xmm10,xmm12,0x10
+ vpalignr xmm10,xmm10,xmm10,8
+ vpxor xmm10,xmm10,xmm9
+
+ vpclmulqdq xmm9,xmm10,xmm12,0x10
+ vpalignr xmm10,xmm10,xmm10,8
+ vpxor xmm10,xmm10,xmm11
+ vpxor xmm10,xmm10,xmm9
+
+ cmp r9,0
+ jne NEAR $L$short_avx
+
+ vpshufb xmm10,xmm10,xmm13
+ vmovdqu XMMWORD[rcx],xmm10
+ vzeroupper
+ movaps xmm6,XMMWORD[rsp]
+ movaps xmm7,XMMWORD[16+rsp]
+ movaps xmm8,XMMWORD[32+rsp]
+ movaps xmm9,XMMWORD[48+rsp]
+ movaps xmm10,XMMWORD[64+rsp]
+ movaps xmm11,XMMWORD[80+rsp]
+ movaps xmm12,XMMWORD[96+rsp]
+ movaps xmm13,XMMWORD[112+rsp]
+ movaps xmm14,XMMWORD[128+rsp]
+ movaps xmm15,XMMWORD[144+rsp]
+ lea rsp,[168+rsp]
+$L$SEH_end_gcm_ghash_avx:
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 64
+$L$bswap_mask:
+DB 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+$L$0x1c2_polynomial:
+DB 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0xc2
+$L$7_mask:
+ DD 7,0,7,0
+$L$7_mask_poly:
+ DD 7,0,450,0
+ALIGN 64
+
+$L$rem_4bit:
+ DD 0,0,0,471859200,0,943718400,0,610271232
+ DD 0,1887436800,0,1822425088,0,1220542464,0,1423966208
+ DD 0,3774873600,0,4246732800,0,3644850176,0,3311403008
+ DD 0,2441084928,0,2376073216,0,2847932416,0,3051356160
+
+$L$rem_8bit:
+ DW 0x0000,0x01C2,0x0384,0x0246,0x0708,0x06CA,0x048C,0x054E
+ DW 0x0E10,0x0FD2,0x0D94,0x0C56,0x0918,0x08DA,0x0A9C,0x0B5E
+ DW 0x1C20,0x1DE2,0x1FA4,0x1E66,0x1B28,0x1AEA,0x18AC,0x196E
+ DW 0x1230,0x13F2,0x11B4,0x1076,0x1538,0x14FA,0x16BC,0x177E
+ DW 0x3840,0x3982,0x3BC4,0x3A06,0x3F48,0x3E8A,0x3CCC,0x3D0E
+ DW 0x3650,0x3792,0x35D4,0x3416,0x3158,0x309A,0x32DC,0x331E
+ DW 0x2460,0x25A2,0x27E4,0x2626,0x2368,0x22AA,0x20EC,0x212E
+ DW 0x2A70,0x2BB2,0x29F4,0x2836,0x2D78,0x2CBA,0x2EFC,0x2F3E
+ DW 0x7080,0x7142,0x7304,0x72C6,0x7788,0x764A,0x740C,0x75CE
+ DW 0x7E90,0x7F52,0x7D14,0x7CD6,0x7998,0x785A,0x7A1C,0x7BDE
+ DW 0x6CA0,0x6D62,0x6F24,0x6EE6,0x6BA8,0x6A6A,0x682C,0x69EE
+ DW 0x62B0,0x6372,0x6134,0x60F6,0x65B8,0x647A,0x663C,0x67FE
+ DW 0x48C0,0x4902,0x4B44,0x4A86,0x4FC8,0x4E0A,0x4C4C,0x4D8E
+ DW 0x46D0,0x4712,0x4554,0x4496,0x41D8,0x401A,0x425C,0x439E
+ DW 0x54E0,0x5522,0x5764,0x56A6,0x53E8,0x522A,0x506C,0x51AE
+ DW 0x5AF0,0x5B32,0x5974,0x58B6,0x5DF8,0x5C3A,0x5E7C,0x5FBE
+ DW 0xE100,0xE0C2,0xE284,0xE346,0xE608,0xE7CA,0xE58C,0xE44E
+ DW 0xEF10,0xEED2,0xEC94,0xED56,0xE818,0xE9DA,0xEB9C,0xEA5E
+ DW 0xFD20,0xFCE2,0xFEA4,0xFF66,0xFA28,0xFBEA,0xF9AC,0xF86E
+ DW 0xF330,0xF2F2,0xF0B4,0xF176,0xF438,0xF5FA,0xF7BC,0xF67E
+ DW 0xD940,0xD882,0xDAC4,0xDB06,0xDE48,0xDF8A,0xDDCC,0xDC0E
+ DW 0xD750,0xD692,0xD4D4,0xD516,0xD058,0xD19A,0xD3DC,0xD21E
+ DW 0xC560,0xC4A2,0xC6E4,0xC726,0xC268,0xC3AA,0xC1EC,0xC02E
+ DW 0xCB70,0xCAB2,0xC8F4,0xC936,0xCC78,0xCDBA,0xCFFC,0xCE3E
+ DW 0x9180,0x9042,0x9204,0x93C6,0x9688,0x974A,0x950C,0x94CE
+ DW 0x9F90,0x9E52,0x9C14,0x9DD6,0x9898,0x995A,0x9B1C,0x9ADE
+ DW 0x8DA0,0x8C62,0x8E24,0x8FE6,0x8AA8,0x8B6A,0x892C,0x88EE
+ DW 0x83B0,0x8272,0x8034,0x81F6,0x84B8,0x857A,0x873C,0x86FE
+ DW 0xA9C0,0xA802,0xAA44,0xAB86,0xAEC8,0xAF0A,0xAD4C,0xAC8E
+ DW 0xA7D0,0xA612,0xA454,0xA596,0xA0D8,0xA11A,0xA35C,0xA29E
+ DW 0xB5E0,0xB422,0xB664,0xB7A6,0xB2E8,0xB32A,0xB16C,0xB0AE
+ DW 0xBBF0,0xBA32,0xB874,0xB9B6,0xBCF8,0xBD3A,0xBF7C,0xBEBE
+
+DB 71,72,65,83,72,32,102,111,114,32,120,56,54,95,54,52
+DB 44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32
+DB 60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111
+DB 114,103,62,0
+ALIGN 64
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ lea rax,[((48+280))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_gcm_gmult_4bit wrt ..imagebase
+ DD $L$SEH_end_gcm_gmult_4bit wrt ..imagebase
+ DD $L$SEH_info_gcm_gmult_4bit wrt ..imagebase
+
+ DD $L$SEH_begin_gcm_ghash_4bit wrt ..imagebase
+ DD $L$SEH_end_gcm_ghash_4bit wrt ..imagebase
+ DD $L$SEH_info_gcm_ghash_4bit wrt ..imagebase
+
+ DD $L$SEH_begin_gcm_init_clmul wrt ..imagebase
+ DD $L$SEH_end_gcm_init_clmul wrt ..imagebase
+ DD $L$SEH_info_gcm_init_clmul wrt ..imagebase
+
+ DD $L$SEH_begin_gcm_ghash_clmul wrt ..imagebase
+ DD $L$SEH_end_gcm_ghash_clmul wrt ..imagebase
+ DD $L$SEH_info_gcm_ghash_clmul wrt ..imagebase
+ DD $L$SEH_begin_gcm_init_avx wrt ..imagebase
+ DD $L$SEH_end_gcm_init_avx wrt ..imagebase
+ DD $L$SEH_info_gcm_init_clmul wrt ..imagebase
+
+ DD $L$SEH_begin_gcm_ghash_avx wrt ..imagebase
+ DD $L$SEH_end_gcm_ghash_avx wrt ..imagebase
+ DD $L$SEH_info_gcm_ghash_clmul wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_gcm_gmult_4bit:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$gmult_prologue wrt ..imagebase,$L$gmult_epilogue wrt ..imagebase
+$L$SEH_info_gcm_ghash_4bit:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$ghash_prologue wrt ..imagebase,$L$ghash_epilogue wrt ..imagebase
+$L$SEH_info_gcm_init_clmul:
+DB 0x01,0x08,0x03,0x00
+DB 0x08,0x68,0x00,0x00
+DB 0x04,0x22,0x00,0x00
+$L$SEH_info_gcm_ghash_clmul:
+DB 0x01,0x33,0x16,0x00
+DB 0x33,0xf8,0x09,0x00
+DB 0x2e,0xe8,0x08,0x00
+DB 0x29,0xd8,0x07,0x00
+DB 0x24,0xc8,0x06,0x00
+DB 0x1f,0xb8,0x05,0x00
+DB 0x1a,0xa8,0x04,0x00
+DB 0x15,0x98,0x03,0x00
+DB 0x10,0x88,0x02,0x00
+DB 0x0c,0x78,0x01,0x00
+DB 0x08,0x68,0x00,0x00
+DB 0x04,0x01,0x15,0x00
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm
new file mode 100644
index 0000000000..c9a37a47c9
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm
@@ -0,0 +1,1395 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+ALIGN 16
+
+global rc4_md5_enc
+
+rc4_md5_enc:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_rc4_md5_enc:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+ mov r8,QWORD[40+rsp]
+ mov r9,QWORD[48+rsp]
+
+
+
+ cmp r9,0
+ je NEAR $L$abort
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ sub rsp,40
+
+$L$body:
+ mov r11,rcx
+ mov r12,r9
+ mov r13,rsi
+ mov r14,rdx
+ mov r15,r8
+ xor rbp,rbp
+ xor rcx,rcx
+
+ lea rdi,[8+rdi]
+ mov bpl,BYTE[((-8))+rdi]
+ mov cl,BYTE[((-4))+rdi]
+
+ inc bpl
+ sub r14,r13
+ mov eax,DWORD[rbp*4+rdi]
+ add cl,al
+ lea rsi,[rbp*4+rdi]
+ shl r12,6
+ add r12,r15
+ mov QWORD[16+rsp],r12
+
+ mov QWORD[24+rsp],r11
+ mov r8d,DWORD[r11]
+ mov r9d,DWORD[4+r11]
+ mov r10d,DWORD[8+r11]
+ mov r11d,DWORD[12+r11]
+ jmp NEAR $L$oop
+
+ALIGN 16
+$L$oop:
+ mov DWORD[rsp],r8d
+ mov DWORD[4+rsp],r9d
+ mov DWORD[8+rsp],r10d
+ mov r12d,r11d
+ mov DWORD[12+rsp],r11d
+ pxor xmm0,xmm0
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r9d
+ add r8d,DWORD[r15]
+ add al,dl
+ mov ebx,DWORD[4+rsi]
+ add r8d,3614090360
+ xor r12d,r11d
+ movzx eax,al
+ mov DWORD[rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,7
+ mov r12d,r10d
+ movd xmm0,DWORD[rax*4+rdi]
+
+ add r8d,r9d
+ pxor xmm1,xmm1
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r8d
+ add r11d,DWORD[4+r15]
+ add bl,dl
+ mov eax,DWORD[8+rsi]
+ add r11d,3905402710
+ xor r12d,r10d
+ movzx ebx,bl
+ mov DWORD[4+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,12
+ mov r12d,r9d
+ movd xmm1,DWORD[rbx*4+rdi]
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r11d
+ add r10d,DWORD[8+r15]
+ add al,dl
+ mov ebx,DWORD[12+rsi]
+ add r10d,606105819
+ xor r12d,r9d
+ movzx eax,al
+ mov DWORD[8+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,17
+ mov r12d,r8d
+ pinsrw xmm0,WORD[rax*4+rdi],1
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r10d
+ add r9d,DWORD[12+r15]
+ add bl,dl
+ mov eax,DWORD[16+rsi]
+ add r9d,3250441966
+ xor r12d,r8d
+ movzx ebx,bl
+ mov DWORD[12+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,22
+ mov r12d,r11d
+ pinsrw xmm1,WORD[rbx*4+rdi],1
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r9d
+ add r8d,DWORD[16+r15]
+ add al,dl
+ mov ebx,DWORD[20+rsi]
+ add r8d,4118548399
+ xor r12d,r11d
+ movzx eax,al
+ mov DWORD[16+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,7
+ mov r12d,r10d
+ pinsrw xmm0,WORD[rax*4+rdi],2
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r8d
+ add r11d,DWORD[20+r15]
+ add bl,dl
+ mov eax,DWORD[24+rsi]
+ add r11d,1200080426
+ xor r12d,r10d
+ movzx ebx,bl
+ mov DWORD[20+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,12
+ mov r12d,r9d
+ pinsrw xmm1,WORD[rbx*4+rdi],2
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r11d
+ add r10d,DWORD[24+r15]
+ add al,dl
+ mov ebx,DWORD[28+rsi]
+ add r10d,2821735955
+ xor r12d,r9d
+ movzx eax,al
+ mov DWORD[24+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,17
+ mov r12d,r8d
+ pinsrw xmm0,WORD[rax*4+rdi],3
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r10d
+ add r9d,DWORD[28+r15]
+ add bl,dl
+ mov eax,DWORD[32+rsi]
+ add r9d,4249261313
+ xor r12d,r8d
+ movzx ebx,bl
+ mov DWORD[28+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,22
+ mov r12d,r11d
+ pinsrw xmm1,WORD[rbx*4+rdi],3
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r9d
+ add r8d,DWORD[32+r15]
+ add al,dl
+ mov ebx,DWORD[36+rsi]
+ add r8d,1770035416
+ xor r12d,r11d
+ movzx eax,al
+ mov DWORD[32+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,7
+ mov r12d,r10d
+ pinsrw xmm0,WORD[rax*4+rdi],4
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r8d
+ add r11d,DWORD[36+r15]
+ add bl,dl
+ mov eax,DWORD[40+rsi]
+ add r11d,2336552879
+ xor r12d,r10d
+ movzx ebx,bl
+ mov DWORD[36+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,12
+ mov r12d,r9d
+ pinsrw xmm1,WORD[rbx*4+rdi],4
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r11d
+ add r10d,DWORD[40+r15]
+ add al,dl
+ mov ebx,DWORD[44+rsi]
+ add r10d,4294925233
+ xor r12d,r9d
+ movzx eax,al
+ mov DWORD[40+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,17
+ mov r12d,r8d
+ pinsrw xmm0,WORD[rax*4+rdi],5
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r10d
+ add r9d,DWORD[44+r15]
+ add bl,dl
+ mov eax,DWORD[48+rsi]
+ add r9d,2304563134
+ xor r12d,r8d
+ movzx ebx,bl
+ mov DWORD[44+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,22
+ mov r12d,r11d
+ pinsrw xmm1,WORD[rbx*4+rdi],5
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r9d
+ add r8d,DWORD[48+r15]
+ add al,dl
+ mov ebx,DWORD[52+rsi]
+ add r8d,1804603682
+ xor r12d,r11d
+ movzx eax,al
+ mov DWORD[48+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,7
+ mov r12d,r10d
+ pinsrw xmm0,WORD[rax*4+rdi],6
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r8d
+ add r11d,DWORD[52+r15]
+ add bl,dl
+ mov eax,DWORD[56+rsi]
+ add r11d,4254626195
+ xor r12d,r10d
+ movzx ebx,bl
+ mov DWORD[52+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,12
+ mov r12d,r9d
+ pinsrw xmm1,WORD[rbx*4+rdi],6
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r11d
+ add r10d,DWORD[56+r15]
+ add al,dl
+ mov ebx,DWORD[60+rsi]
+ add r10d,2792965006
+ xor r12d,r9d
+ movzx eax,al
+ mov DWORD[56+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,17
+ mov r12d,r8d
+ pinsrw xmm0,WORD[rax*4+rdi],7
+
+ add r10d,r11d
+ movdqu xmm2,XMMWORD[r13]
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r10d
+ add r9d,DWORD[60+r15]
+ add bl,dl
+ mov eax,DWORD[64+rsi]
+ add r9d,1236535329
+ xor r12d,r8d
+ movzx ebx,bl
+ mov DWORD[60+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,22
+ mov r12d,r10d
+ pinsrw xmm1,WORD[rbx*4+rdi],7
+
+ add r9d,r10d
+ psllq xmm1,8
+ pxor xmm2,xmm0
+ pxor xmm2,xmm1
+ pxor xmm0,xmm0
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r11d
+ add r8d,DWORD[4+r15]
+ add al,dl
+ mov ebx,DWORD[68+rsi]
+ add r8d,4129170786
+ xor r12d,r10d
+ movzx eax,al
+ mov DWORD[64+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,5
+ mov r12d,r9d
+ movd xmm0,DWORD[rax*4+rdi]
+
+ add r8d,r9d
+ pxor xmm1,xmm1
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r10d
+ add r11d,DWORD[24+r15]
+ add bl,dl
+ mov eax,DWORD[72+rsi]
+ add r11d,3225465664
+ xor r12d,r9d
+ movzx ebx,bl
+ mov DWORD[68+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,9
+ mov r12d,r8d
+ movd xmm1,DWORD[rbx*4+rdi]
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r9d
+ add r10d,DWORD[44+r15]
+ add al,dl
+ mov ebx,DWORD[76+rsi]
+ add r10d,643717713
+ xor r12d,r8d
+ movzx eax,al
+ mov DWORD[72+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,14
+ mov r12d,r11d
+ pinsrw xmm0,WORD[rax*4+rdi],1
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r8d
+ add r9d,DWORD[r15]
+ add bl,dl
+ mov eax,DWORD[80+rsi]
+ add r9d,3921069994
+ xor r12d,r11d
+ movzx ebx,bl
+ mov DWORD[76+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,20
+ mov r12d,r10d
+ pinsrw xmm1,WORD[rbx*4+rdi],1
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r11d
+ add r8d,DWORD[20+r15]
+ add al,dl
+ mov ebx,DWORD[84+rsi]
+ add r8d,3593408605
+ xor r12d,r10d
+ movzx eax,al
+ mov DWORD[80+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,5
+ mov r12d,r9d
+ pinsrw xmm0,WORD[rax*4+rdi],2
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r10d
+ add r11d,DWORD[40+r15]
+ add bl,dl
+ mov eax,DWORD[88+rsi]
+ add r11d,38016083
+ xor r12d,r9d
+ movzx ebx,bl
+ mov DWORD[84+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,9
+ mov r12d,r8d
+ pinsrw xmm1,WORD[rbx*4+rdi],2
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r9d
+ add r10d,DWORD[60+r15]
+ add al,dl
+ mov ebx,DWORD[92+rsi]
+ add r10d,3634488961
+ xor r12d,r8d
+ movzx eax,al
+ mov DWORD[88+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,14
+ mov r12d,r11d
+ pinsrw xmm0,WORD[rax*4+rdi],3
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r8d
+ add r9d,DWORD[16+r15]
+ add bl,dl
+ mov eax,DWORD[96+rsi]
+ add r9d,3889429448
+ xor r12d,r11d
+ movzx ebx,bl
+ mov DWORD[92+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,20
+ mov r12d,r10d
+ pinsrw xmm1,WORD[rbx*4+rdi],3
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r11d
+ add r8d,DWORD[36+r15]
+ add al,dl
+ mov ebx,DWORD[100+rsi]
+ add r8d,568446438
+ xor r12d,r10d
+ movzx eax,al
+ mov DWORD[96+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,5
+ mov r12d,r9d
+ pinsrw xmm0,WORD[rax*4+rdi],4
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r10d
+ add r11d,DWORD[56+r15]
+ add bl,dl
+ mov eax,DWORD[104+rsi]
+ add r11d,3275163606
+ xor r12d,r9d
+ movzx ebx,bl
+ mov DWORD[100+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,9
+ mov r12d,r8d
+ pinsrw xmm1,WORD[rbx*4+rdi],4
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r9d
+ add r10d,DWORD[12+r15]
+ add al,dl
+ mov ebx,DWORD[108+rsi]
+ add r10d,4107603335
+ xor r12d,r8d
+ movzx eax,al
+ mov DWORD[104+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,14
+ mov r12d,r11d
+ pinsrw xmm0,WORD[rax*4+rdi],5
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r8d
+ add r9d,DWORD[32+r15]
+ add bl,dl
+ mov eax,DWORD[112+rsi]
+ add r9d,1163531501
+ xor r12d,r11d
+ movzx ebx,bl
+ mov DWORD[108+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,20
+ mov r12d,r10d
+ pinsrw xmm1,WORD[rbx*4+rdi],5
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r11d
+ add r8d,DWORD[52+r15]
+ add al,dl
+ mov ebx,DWORD[116+rsi]
+ add r8d,2850285829
+ xor r12d,r10d
+ movzx eax,al
+ mov DWORD[112+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,5
+ mov r12d,r9d
+ pinsrw xmm0,WORD[rax*4+rdi],6
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r10d
+ add r11d,DWORD[8+r15]
+ add bl,dl
+ mov eax,DWORD[120+rsi]
+ add r11d,4243563512
+ xor r12d,r9d
+ movzx ebx,bl
+ mov DWORD[116+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,9
+ mov r12d,r8d
+ pinsrw xmm1,WORD[rbx*4+rdi],6
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],eax
+ and r12d,r9d
+ add r10d,DWORD[28+r15]
+ add al,dl
+ mov ebx,DWORD[124+rsi]
+ add r10d,1735328473
+ xor r12d,r8d
+ movzx eax,al
+ mov DWORD[120+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,14
+ mov r12d,r11d
+ pinsrw xmm0,WORD[rax*4+rdi],7
+
+ add r10d,r11d
+ movdqu xmm3,XMMWORD[16+r13]
+ add bpl,32
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],ebx
+ and r12d,r8d
+ add r9d,DWORD[48+r15]
+ add bl,dl
+ mov eax,DWORD[rbp*4+rdi]
+ add r9d,2368359562
+ xor r12d,r11d
+ movzx ebx,bl
+ mov DWORD[124+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,20
+ mov r12d,r11d
+ pinsrw xmm1,WORD[rbx*4+rdi],7
+
+ add r9d,r10d
+ mov rsi,rcx
+ xor rcx,rcx
+ mov cl,sil
+ lea rsi,[rbp*4+rdi]
+ psllq xmm1,8
+ pxor xmm3,xmm0
+ pxor xmm3,xmm1
+ pxor xmm0,xmm0
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],eax
+ xor r12d,r9d
+ add r8d,DWORD[20+r15]
+ add al,dl
+ mov ebx,DWORD[4+rsi]
+ add r8d,4294588738
+ movzx eax,al
+ add r8d,r12d
+ mov DWORD[rsi],edx
+ add cl,bl
+ rol r8d,4
+ mov r12d,r10d
+ movd xmm0,DWORD[rax*4+rdi]
+
+ add r8d,r9d
+ pxor xmm1,xmm1
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],ebx
+ xor r12d,r8d
+ add r11d,DWORD[32+r15]
+ add bl,dl
+ mov eax,DWORD[8+rsi]
+ add r11d,2272392833
+ movzx ebx,bl
+ add r11d,r12d
+ mov DWORD[4+rsi],edx
+ add cl,al
+ rol r11d,11
+ mov r12d,r9d
+ movd xmm1,DWORD[rbx*4+rdi]
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],eax
+ xor r12d,r11d
+ add r10d,DWORD[44+r15]
+ add al,dl
+ mov ebx,DWORD[12+rsi]
+ add r10d,1839030562
+ movzx eax,al
+ add r10d,r12d
+ mov DWORD[8+rsi],edx
+ add cl,bl
+ rol r10d,16
+ mov r12d,r8d
+ pinsrw xmm0,WORD[rax*4+rdi],1
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],ebx
+ xor r12d,r10d
+ add r9d,DWORD[56+r15]
+ add bl,dl
+ mov eax,DWORD[16+rsi]
+ add r9d,4259657740
+ movzx ebx,bl
+ add r9d,r12d
+ mov DWORD[12+rsi],edx
+ add cl,al
+ rol r9d,23
+ mov r12d,r11d
+ pinsrw xmm1,WORD[rbx*4+rdi],1
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],eax
+ xor r12d,r9d
+ add r8d,DWORD[4+r15]
+ add al,dl
+ mov ebx,DWORD[20+rsi]
+ add r8d,2763975236
+ movzx eax,al
+ add r8d,r12d
+ mov DWORD[16+rsi],edx
+ add cl,bl
+ rol r8d,4
+ mov r12d,r10d
+ pinsrw xmm0,WORD[rax*4+rdi],2
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],ebx
+ xor r12d,r8d
+ add r11d,DWORD[16+r15]
+ add bl,dl
+ mov eax,DWORD[24+rsi]
+ add r11d,1272893353
+ movzx ebx,bl
+ add r11d,r12d
+ mov DWORD[20+rsi],edx
+ add cl,al
+ rol r11d,11
+ mov r12d,r9d
+ pinsrw xmm1,WORD[rbx*4+rdi],2
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],eax
+ xor r12d,r11d
+ add r10d,DWORD[28+r15]
+ add al,dl
+ mov ebx,DWORD[28+rsi]
+ add r10d,4139469664
+ movzx eax,al
+ add r10d,r12d
+ mov DWORD[24+rsi],edx
+ add cl,bl
+ rol r10d,16
+ mov r12d,r8d
+ pinsrw xmm0,WORD[rax*4+rdi],3
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],ebx
+ xor r12d,r10d
+ add r9d,DWORD[40+r15]
+ add bl,dl
+ mov eax,DWORD[32+rsi]
+ add r9d,3200236656
+ movzx ebx,bl
+ add r9d,r12d
+ mov DWORD[28+rsi],edx
+ add cl,al
+ rol r9d,23
+ mov r12d,r11d
+ pinsrw xmm1,WORD[rbx*4+rdi],3
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],eax
+ xor r12d,r9d
+ add r8d,DWORD[52+r15]
+ add al,dl
+ mov ebx,DWORD[36+rsi]
+ add r8d,681279174
+ movzx eax,al
+ add r8d,r12d
+ mov DWORD[32+rsi],edx
+ add cl,bl
+ rol r8d,4
+ mov r12d,r10d
+ pinsrw xmm0,WORD[rax*4+rdi],4
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],ebx
+ xor r12d,r8d
+ add r11d,DWORD[r15]
+ add bl,dl
+ mov eax,DWORD[40+rsi]
+ add r11d,3936430074
+ movzx ebx,bl
+ add r11d,r12d
+ mov DWORD[36+rsi],edx
+ add cl,al
+ rol r11d,11
+ mov r12d,r9d
+ pinsrw xmm1,WORD[rbx*4+rdi],4
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],eax
+ xor r12d,r11d
+ add r10d,DWORD[12+r15]
+ add al,dl
+ mov ebx,DWORD[44+rsi]
+ add r10d,3572445317
+ movzx eax,al
+ add r10d,r12d
+ mov DWORD[40+rsi],edx
+ add cl,bl
+ rol r10d,16
+ mov r12d,r8d
+ pinsrw xmm0,WORD[rax*4+rdi],5
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],ebx
+ xor r12d,r10d
+ add r9d,DWORD[24+r15]
+ add bl,dl
+ mov eax,DWORD[48+rsi]
+ add r9d,76029189
+ movzx ebx,bl
+ add r9d,r12d
+ mov DWORD[44+rsi],edx
+ add cl,al
+ rol r9d,23
+ mov r12d,r11d
+ pinsrw xmm1,WORD[rbx*4+rdi],5
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],eax
+ xor r12d,r9d
+ add r8d,DWORD[36+r15]
+ add al,dl
+ mov ebx,DWORD[52+rsi]
+ add r8d,3654602809
+ movzx eax,al
+ add r8d,r12d
+ mov DWORD[48+rsi],edx
+ add cl,bl
+ rol r8d,4
+ mov r12d,r10d
+ pinsrw xmm0,WORD[rax*4+rdi],6
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],ebx
+ xor r12d,r8d
+ add r11d,DWORD[48+r15]
+ add bl,dl
+ mov eax,DWORD[56+rsi]
+ add r11d,3873151461
+ movzx ebx,bl
+ add r11d,r12d
+ mov DWORD[52+rsi],edx
+ add cl,al
+ rol r11d,11
+ mov r12d,r9d
+ pinsrw xmm1,WORD[rbx*4+rdi],6
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],eax
+ xor r12d,r11d
+ add r10d,DWORD[60+r15]
+ add al,dl
+ mov ebx,DWORD[60+rsi]
+ add r10d,530742520
+ movzx eax,al
+ add r10d,r12d
+ mov DWORD[56+rsi],edx
+ add cl,bl
+ rol r10d,16
+ mov r12d,r8d
+ pinsrw xmm0,WORD[rax*4+rdi],7
+
+ add r10d,r11d
+ movdqu xmm4,XMMWORD[32+r13]
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],ebx
+ xor r12d,r10d
+ add r9d,DWORD[8+r15]
+ add bl,dl
+ mov eax,DWORD[64+rsi]
+ add r9d,3299628645
+ movzx ebx,bl
+ add r9d,r12d
+ mov DWORD[60+rsi],edx
+ add cl,al
+ rol r9d,23
+ mov r12d,-1
+ pinsrw xmm1,WORD[rbx*4+rdi],7
+
+ add r9d,r10d
+ psllq xmm1,8
+ pxor xmm4,xmm0
+ pxor xmm4,xmm1
+ pxor xmm0,xmm0
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],eax
+ or r12d,r9d
+ add r8d,DWORD[r15]
+ add al,dl
+ mov ebx,DWORD[68+rsi]
+ add r8d,4096336452
+ movzx eax,al
+ xor r12d,r10d
+ mov DWORD[64+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,6
+ mov r12d,-1
+ movd xmm0,DWORD[rax*4+rdi]
+
+ add r8d,r9d
+ pxor xmm1,xmm1
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],ebx
+ or r12d,r8d
+ add r11d,DWORD[28+r15]
+ add bl,dl
+ mov eax,DWORD[72+rsi]
+ add r11d,1126891415
+ movzx ebx,bl
+ xor r12d,r9d
+ mov DWORD[68+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,10
+ mov r12d,-1
+ movd xmm1,DWORD[rbx*4+rdi]
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],eax
+ or r12d,r11d
+ add r10d,DWORD[56+r15]
+ add al,dl
+ mov ebx,DWORD[76+rsi]
+ add r10d,2878612391
+ movzx eax,al
+ xor r12d,r8d
+ mov DWORD[72+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,15
+ mov r12d,-1
+ pinsrw xmm0,WORD[rax*4+rdi],1
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],ebx
+ or r12d,r10d
+ add r9d,DWORD[20+r15]
+ add bl,dl
+ mov eax,DWORD[80+rsi]
+ add r9d,4237533241
+ movzx ebx,bl
+ xor r12d,r11d
+ mov DWORD[76+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,21
+ mov r12d,-1
+ pinsrw xmm1,WORD[rbx*4+rdi],1
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],eax
+ or r12d,r9d
+ add r8d,DWORD[48+r15]
+ add al,dl
+ mov ebx,DWORD[84+rsi]
+ add r8d,1700485571
+ movzx eax,al
+ xor r12d,r10d
+ mov DWORD[80+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,6
+ mov r12d,-1
+ pinsrw xmm0,WORD[rax*4+rdi],2
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],ebx
+ or r12d,r8d
+ add r11d,DWORD[12+r15]
+ add bl,dl
+ mov eax,DWORD[88+rsi]
+ add r11d,2399980690
+ movzx ebx,bl
+ xor r12d,r9d
+ mov DWORD[84+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,10
+ mov r12d,-1
+ pinsrw xmm1,WORD[rbx*4+rdi],2
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],eax
+ or r12d,r11d
+ add r10d,DWORD[40+r15]
+ add al,dl
+ mov ebx,DWORD[92+rsi]
+ add r10d,4293915773
+ movzx eax,al
+ xor r12d,r8d
+ mov DWORD[88+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,15
+ mov r12d,-1
+ pinsrw xmm0,WORD[rax*4+rdi],3
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],ebx
+ or r12d,r10d
+ add r9d,DWORD[4+r15]
+ add bl,dl
+ mov eax,DWORD[96+rsi]
+ add r9d,2240044497
+ movzx ebx,bl
+ xor r12d,r11d
+ mov DWORD[92+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,21
+ mov r12d,-1
+ pinsrw xmm1,WORD[rbx*4+rdi],3
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],eax
+ or r12d,r9d
+ add r8d,DWORD[32+r15]
+ add al,dl
+ mov ebx,DWORD[100+rsi]
+ add r8d,1873313359
+ movzx eax,al
+ xor r12d,r10d
+ mov DWORD[96+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,6
+ mov r12d,-1
+ pinsrw xmm0,WORD[rax*4+rdi],4
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],ebx
+ or r12d,r8d
+ add r11d,DWORD[60+r15]
+ add bl,dl
+ mov eax,DWORD[104+rsi]
+ add r11d,4264355552
+ movzx ebx,bl
+ xor r12d,r9d
+ mov DWORD[100+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,10
+ mov r12d,-1
+ pinsrw xmm1,WORD[rbx*4+rdi],4
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],eax
+ or r12d,r11d
+ add r10d,DWORD[24+r15]
+ add al,dl
+ mov ebx,DWORD[108+rsi]
+ add r10d,2734768916
+ movzx eax,al
+ xor r12d,r8d
+ mov DWORD[104+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,15
+ mov r12d,-1
+ pinsrw xmm0,WORD[rax*4+rdi],5
+
+ add r10d,r11d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],ebx
+ or r12d,r10d
+ add r9d,DWORD[52+r15]
+ add bl,dl
+ mov eax,DWORD[112+rsi]
+ add r9d,1309151649
+ movzx ebx,bl
+ xor r12d,r11d
+ mov DWORD[108+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,21
+ mov r12d,-1
+ pinsrw xmm1,WORD[rbx*4+rdi],5
+
+ add r9d,r10d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r11d
+ mov DWORD[rcx*4+rdi],eax
+ or r12d,r9d
+ add r8d,DWORD[16+r15]
+ add al,dl
+ mov ebx,DWORD[116+rsi]
+ add r8d,4149444226
+ movzx eax,al
+ xor r12d,r10d
+ mov DWORD[112+rsi],edx
+ add r8d,r12d
+ add cl,bl
+ rol r8d,6
+ mov r12d,-1
+ pinsrw xmm0,WORD[rax*4+rdi],6
+
+ add r8d,r9d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r10d
+ mov DWORD[rcx*4+rdi],ebx
+ or r12d,r8d
+ add r11d,DWORD[44+r15]
+ add bl,dl
+ mov eax,DWORD[120+rsi]
+ add r11d,3174756917
+ movzx ebx,bl
+ xor r12d,r9d
+ mov DWORD[116+rsi],edx
+ add r11d,r12d
+ add cl,al
+ rol r11d,10
+ mov r12d,-1
+ pinsrw xmm1,WORD[rbx*4+rdi],6
+
+ add r11d,r8d
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r9d
+ mov DWORD[rcx*4+rdi],eax
+ or r12d,r11d
+ add r10d,DWORD[8+r15]
+ add al,dl
+ mov ebx,DWORD[124+rsi]
+ add r10d,718787259
+ movzx eax,al
+ xor r12d,r8d
+ mov DWORD[120+rsi],edx
+ add r10d,r12d
+ add cl,bl
+ rol r10d,15
+ mov r12d,-1
+ pinsrw xmm0,WORD[rax*4+rdi],7
+
+ add r10d,r11d
+ movdqu xmm5,XMMWORD[48+r13]
+ add bpl,32
+ mov edx,DWORD[rcx*4+rdi]
+ xor r12d,r8d
+ mov DWORD[rcx*4+rdi],ebx
+ or r12d,r10d
+ add r9d,DWORD[36+r15]
+ add bl,dl
+ mov eax,DWORD[rbp*4+rdi]
+ add r9d,3951481745
+ movzx ebx,bl
+ xor r12d,r11d
+ mov DWORD[124+rsi],edx
+ add r9d,r12d
+ add cl,al
+ rol r9d,21
+ mov r12d,-1
+ pinsrw xmm1,WORD[rbx*4+rdi],7
+
+ add r9d,r10d
+ mov rsi,rbp
+ xor rbp,rbp
+ mov bpl,sil
+ mov rsi,rcx
+ xor rcx,rcx
+ mov cl,sil
+ lea rsi,[rbp*4+rdi]
+ psllq xmm1,8
+ pxor xmm5,xmm0
+ pxor xmm5,xmm1
+ add r8d,DWORD[rsp]
+ add r9d,DWORD[4+rsp]
+ add r10d,DWORD[8+rsp]
+ add r11d,DWORD[12+rsp]
+
+ movdqu XMMWORD[r13*1+r14],xmm2
+ movdqu XMMWORD[16+r13*1+r14],xmm3
+ movdqu XMMWORD[32+r13*1+r14],xmm4
+ movdqu XMMWORD[48+r13*1+r14],xmm5
+ lea r15,[64+r15]
+ lea r13,[64+r13]
+ cmp r15,QWORD[16+rsp]
+ jb NEAR $L$oop
+
+ mov r12,QWORD[24+rsp]
+ sub cl,al
+ mov DWORD[r12],r8d
+ mov DWORD[4+r12],r9d
+ mov DWORD[8+r12],r10d
+ mov DWORD[12+r12],r11d
+ sub bpl,1
+ mov DWORD[((-8))+rdi],ebp
+ mov DWORD[((-4))+rdi],ecx
+
+ mov r15,QWORD[40+rsp]
+
+ mov r14,QWORD[48+rsp]
+
+ mov r13,QWORD[56+rsp]
+
+ mov r12,QWORD[64+rsp]
+
+ mov rbp,QWORD[72+rsp]
+
+ mov rbx,QWORD[80+rsp]
+
+ lea rsp,[88+rsp]
+
+$L$epilogue:
+$L$abort:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_rc4_md5_enc:
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ lea r10,[$L$body]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ lea r10,[$L$epilogue]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ mov r15,QWORD[40+rax]
+ mov r14,QWORD[48+rax]
+ mov r13,QWORD[56+rax]
+ mov r12,QWORD[64+rax]
+ mov rbp,QWORD[72+rax]
+ mov rbx,QWORD[80+rax]
+ lea rax,[88+rax]
+
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_rc4_md5_enc wrt ..imagebase
+ DD $L$SEH_end_rc4_md5_enc wrt ..imagebase
+ DD $L$SEH_info_rc4_md5_enc wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_rc4_md5_enc:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm
new file mode 100644
index 0000000000..72e3641649
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm
@@ -0,0 +1,784 @@
+; Copyright 2005-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN OPENSSL_ia32cap_P
+
+global RC4
+
+ALIGN 16
+RC4:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_RC4:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+
+ or rsi,rsi
+ jne NEAR $L$entry
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+$L$entry:
+
+ push rbx
+
+ push r12
+
+ push r13
+
+$L$prologue:
+ mov r11,rsi
+ mov r12,rdx
+ mov r13,rcx
+ xor r10,r10
+ xor rcx,rcx
+
+ lea rdi,[8+rdi]
+ mov r10b,BYTE[((-8))+rdi]
+ mov cl,BYTE[((-4))+rdi]
+ cmp DWORD[256+rdi],-1
+ je NEAR $L$RC4_CHAR
+ mov r8d,DWORD[OPENSSL_ia32cap_P]
+ xor rbx,rbx
+ inc r10b
+ sub rbx,r10
+ sub r13,r12
+ mov eax,DWORD[r10*4+rdi]
+ test r11,-16
+ jz NEAR $L$loop1
+ bt r8d,30
+ jc NEAR $L$intel
+ and rbx,7
+ lea rsi,[1+r10]
+ jz NEAR $L$oop8
+ sub r11,rbx
+$L$oop8_warmup:
+ add cl,al
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ mov DWORD[r10*4+rdi],edx
+ add al,dl
+ inc r10b
+ mov edx,DWORD[rax*4+rdi]
+ mov eax,DWORD[r10*4+rdi]
+ xor dl,BYTE[r12]
+ mov BYTE[r13*1+r12],dl
+ lea r12,[1+r12]
+ dec rbx
+ jnz NEAR $L$oop8_warmup
+
+ lea rsi,[1+r10]
+ jmp NEAR $L$oop8
+ALIGN 16
+$L$oop8:
+ add cl,al
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ mov ebx,DWORD[rsi*4+rdi]
+ ror r8,8
+ mov DWORD[r10*4+rdi],edx
+ add dl,al
+ mov r8b,BYTE[rdx*4+rdi]
+ add cl,bl
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ mov eax,DWORD[4+rsi*4+rdi]
+ ror r8,8
+ mov DWORD[4+r10*4+rdi],edx
+ add dl,bl
+ mov r8b,BYTE[rdx*4+rdi]
+ add cl,al
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ mov ebx,DWORD[8+rsi*4+rdi]
+ ror r8,8
+ mov DWORD[8+r10*4+rdi],edx
+ add dl,al
+ mov r8b,BYTE[rdx*4+rdi]
+ add cl,bl
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ mov eax,DWORD[12+rsi*4+rdi]
+ ror r8,8
+ mov DWORD[12+r10*4+rdi],edx
+ add dl,bl
+ mov r8b,BYTE[rdx*4+rdi]
+ add cl,al
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ mov ebx,DWORD[16+rsi*4+rdi]
+ ror r8,8
+ mov DWORD[16+r10*4+rdi],edx
+ add dl,al
+ mov r8b,BYTE[rdx*4+rdi]
+ add cl,bl
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ mov eax,DWORD[20+rsi*4+rdi]
+ ror r8,8
+ mov DWORD[20+r10*4+rdi],edx
+ add dl,bl
+ mov r8b,BYTE[rdx*4+rdi]
+ add cl,al
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ mov ebx,DWORD[24+rsi*4+rdi]
+ ror r8,8
+ mov DWORD[24+r10*4+rdi],edx
+ add dl,al
+ mov r8b,BYTE[rdx*4+rdi]
+ add sil,8
+ add cl,bl
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ mov eax,DWORD[((-4))+rsi*4+rdi]
+ ror r8,8
+ mov DWORD[28+r10*4+rdi],edx
+ add dl,bl
+ mov r8b,BYTE[rdx*4+rdi]
+ add r10b,8
+ ror r8,8
+ sub r11,8
+
+ xor r8,QWORD[r12]
+ mov QWORD[r13*1+r12],r8
+ lea r12,[8+r12]
+
+ test r11,-8
+ jnz NEAR $L$oop8
+ cmp r11,0
+ jne NEAR $L$loop1
+ jmp NEAR $L$exit
+
+ALIGN 16
+$L$intel:
+ test r11,-32
+ jz NEAR $L$loop1
+ and rbx,15
+ jz NEAR $L$oop16_is_hot
+ sub r11,rbx
+$L$oop16_warmup:
+ add cl,al
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ mov DWORD[r10*4+rdi],edx
+ add al,dl
+ inc r10b
+ mov edx,DWORD[rax*4+rdi]
+ mov eax,DWORD[r10*4+rdi]
+ xor dl,BYTE[r12]
+ mov BYTE[r13*1+r12],dl
+ lea r12,[1+r12]
+ dec rbx
+ jnz NEAR $L$oop16_warmup
+
+ mov rbx,rcx
+ xor rcx,rcx
+ mov cl,bl
+
+$L$oop16_is_hot:
+ lea rsi,[r10*4+rdi]
+ add cl,al
+ mov edx,DWORD[rcx*4+rdi]
+ pxor xmm0,xmm0
+ mov DWORD[rcx*4+rdi],eax
+ add al,dl
+ mov ebx,DWORD[4+rsi]
+ movzx eax,al
+ mov DWORD[rsi],edx
+ add cl,bl
+ pinsrw xmm0,WORD[rax*4+rdi],0
+ jmp NEAR $L$oop16_enter
+ALIGN 16
+$L$oop16:
+ add cl,al
+ mov edx,DWORD[rcx*4+rdi]
+ pxor xmm2,xmm0
+ psllq xmm1,8
+ pxor xmm0,xmm0
+ mov DWORD[rcx*4+rdi],eax
+ add al,dl
+ mov ebx,DWORD[4+rsi]
+ movzx eax,al
+ mov DWORD[rsi],edx
+ pxor xmm2,xmm1
+ add cl,bl
+ pinsrw xmm0,WORD[rax*4+rdi],0
+ movdqu XMMWORD[r13*1+r12],xmm2
+ lea r12,[16+r12]
+$L$oop16_enter:
+ mov edx,DWORD[rcx*4+rdi]
+ pxor xmm1,xmm1
+ mov DWORD[rcx*4+rdi],ebx
+ add bl,dl
+ mov eax,DWORD[8+rsi]
+ movzx ebx,bl
+ mov DWORD[4+rsi],edx
+ add cl,al
+ pinsrw xmm1,WORD[rbx*4+rdi],0
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ add al,dl
+ mov ebx,DWORD[12+rsi]
+ movzx eax,al
+ mov DWORD[8+rsi],edx
+ add cl,bl
+ pinsrw xmm0,WORD[rax*4+rdi],1
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ add bl,dl
+ mov eax,DWORD[16+rsi]
+ movzx ebx,bl
+ mov DWORD[12+rsi],edx
+ add cl,al
+ pinsrw xmm1,WORD[rbx*4+rdi],1
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ add al,dl
+ mov ebx,DWORD[20+rsi]
+ movzx eax,al
+ mov DWORD[16+rsi],edx
+ add cl,bl
+ pinsrw xmm0,WORD[rax*4+rdi],2
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ add bl,dl
+ mov eax,DWORD[24+rsi]
+ movzx ebx,bl
+ mov DWORD[20+rsi],edx
+ add cl,al
+ pinsrw xmm1,WORD[rbx*4+rdi],2
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ add al,dl
+ mov ebx,DWORD[28+rsi]
+ movzx eax,al
+ mov DWORD[24+rsi],edx
+ add cl,bl
+ pinsrw xmm0,WORD[rax*4+rdi],3
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ add bl,dl
+ mov eax,DWORD[32+rsi]
+ movzx ebx,bl
+ mov DWORD[28+rsi],edx
+ add cl,al
+ pinsrw xmm1,WORD[rbx*4+rdi],3
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ add al,dl
+ mov ebx,DWORD[36+rsi]
+ movzx eax,al
+ mov DWORD[32+rsi],edx
+ add cl,bl
+ pinsrw xmm0,WORD[rax*4+rdi],4
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ add bl,dl
+ mov eax,DWORD[40+rsi]
+ movzx ebx,bl
+ mov DWORD[36+rsi],edx
+ add cl,al
+ pinsrw xmm1,WORD[rbx*4+rdi],4
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ add al,dl
+ mov ebx,DWORD[44+rsi]
+ movzx eax,al
+ mov DWORD[40+rsi],edx
+ add cl,bl
+ pinsrw xmm0,WORD[rax*4+rdi],5
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ add bl,dl
+ mov eax,DWORD[48+rsi]
+ movzx ebx,bl
+ mov DWORD[44+rsi],edx
+ add cl,al
+ pinsrw xmm1,WORD[rbx*4+rdi],5
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ add al,dl
+ mov ebx,DWORD[52+rsi]
+ movzx eax,al
+ mov DWORD[48+rsi],edx
+ add cl,bl
+ pinsrw xmm0,WORD[rax*4+rdi],6
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ add bl,dl
+ mov eax,DWORD[56+rsi]
+ movzx ebx,bl
+ mov DWORD[52+rsi],edx
+ add cl,al
+ pinsrw xmm1,WORD[rbx*4+rdi],6
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ add al,dl
+ mov ebx,DWORD[60+rsi]
+ movzx eax,al
+ mov DWORD[56+rsi],edx
+ add cl,bl
+ pinsrw xmm0,WORD[rax*4+rdi],7
+ add r10b,16
+ movdqu xmm2,XMMWORD[r12]
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],ebx
+ add bl,dl
+ movzx ebx,bl
+ mov DWORD[60+rsi],edx
+ lea rsi,[r10*4+rdi]
+ pinsrw xmm1,WORD[rbx*4+rdi],7
+ mov eax,DWORD[rsi]
+ mov rbx,rcx
+ xor rcx,rcx
+ sub r11,16
+ mov cl,bl
+ test r11,-16
+ jnz NEAR $L$oop16
+
+ psllq xmm1,8
+ pxor xmm2,xmm0
+ pxor xmm2,xmm1
+ movdqu XMMWORD[r13*1+r12],xmm2
+ lea r12,[16+r12]
+
+ cmp r11,0
+ jne NEAR $L$loop1
+ jmp NEAR $L$exit
+
+ALIGN 16
+$L$loop1:
+ add cl,al
+ mov edx,DWORD[rcx*4+rdi]
+ mov DWORD[rcx*4+rdi],eax
+ mov DWORD[r10*4+rdi],edx
+ add al,dl
+ inc r10b
+ mov edx,DWORD[rax*4+rdi]
+ mov eax,DWORD[r10*4+rdi]
+ xor dl,BYTE[r12]
+ mov BYTE[r13*1+r12],dl
+ lea r12,[1+r12]
+ dec r11
+ jnz NEAR $L$loop1
+ jmp NEAR $L$exit
+
+ALIGN 16
+$L$RC4_CHAR:
+ add r10b,1
+ movzx eax,BYTE[r10*1+rdi]
+ test r11,-8
+ jz NEAR $L$cloop1
+ jmp NEAR $L$cloop8
+ALIGN 16
+$L$cloop8:
+ mov r8d,DWORD[r12]
+ mov r9d,DWORD[4+r12]
+ add cl,al
+ lea rsi,[1+r10]
+ movzx edx,BYTE[rcx*1+rdi]
+ movzx esi,sil
+ movzx ebx,BYTE[rsi*1+rdi]
+ mov BYTE[rcx*1+rdi],al
+ cmp rcx,rsi
+ mov BYTE[r10*1+rdi],dl
+ jne NEAR $L$cmov0
+ mov rbx,rax
+$L$cmov0:
+ add dl,al
+ xor r8b,BYTE[rdx*1+rdi]
+ ror r8d,8
+ add cl,bl
+ lea r10,[1+rsi]
+ movzx edx,BYTE[rcx*1+rdi]
+ movzx r10d,r10b
+ movzx eax,BYTE[r10*1+rdi]
+ mov BYTE[rcx*1+rdi],bl
+ cmp rcx,r10
+ mov BYTE[rsi*1+rdi],dl
+ jne NEAR $L$cmov1
+ mov rax,rbx
+$L$cmov1:
+ add dl,bl
+ xor r8b,BYTE[rdx*1+rdi]
+ ror r8d,8
+ add cl,al
+ lea rsi,[1+r10]
+ movzx edx,BYTE[rcx*1+rdi]
+ movzx esi,sil
+ movzx ebx,BYTE[rsi*1+rdi]
+ mov BYTE[rcx*1+rdi],al
+ cmp rcx,rsi
+ mov BYTE[r10*1+rdi],dl
+ jne NEAR $L$cmov2
+ mov rbx,rax
+$L$cmov2:
+ add dl,al
+ xor r8b,BYTE[rdx*1+rdi]
+ ror r8d,8
+ add cl,bl
+ lea r10,[1+rsi]
+ movzx edx,BYTE[rcx*1+rdi]
+ movzx r10d,r10b
+ movzx eax,BYTE[r10*1+rdi]
+ mov BYTE[rcx*1+rdi],bl
+ cmp rcx,r10
+ mov BYTE[rsi*1+rdi],dl
+ jne NEAR $L$cmov3
+ mov rax,rbx
+$L$cmov3:
+ add dl,bl
+ xor r8b,BYTE[rdx*1+rdi]
+ ror r8d,8
+ add cl,al
+ lea rsi,[1+r10]
+ movzx edx,BYTE[rcx*1+rdi]
+ movzx esi,sil
+ movzx ebx,BYTE[rsi*1+rdi]
+ mov BYTE[rcx*1+rdi],al
+ cmp rcx,rsi
+ mov BYTE[r10*1+rdi],dl
+ jne NEAR $L$cmov4
+ mov rbx,rax
+$L$cmov4:
+ add dl,al
+ xor r9b,BYTE[rdx*1+rdi]
+ ror r9d,8
+ add cl,bl
+ lea r10,[1+rsi]
+ movzx edx,BYTE[rcx*1+rdi]
+ movzx r10d,r10b
+ movzx eax,BYTE[r10*1+rdi]
+ mov BYTE[rcx*1+rdi],bl
+ cmp rcx,r10
+ mov BYTE[rsi*1+rdi],dl
+ jne NEAR $L$cmov5
+ mov rax,rbx
+$L$cmov5:
+ add dl,bl
+ xor r9b,BYTE[rdx*1+rdi]
+ ror r9d,8
+ add cl,al
+ lea rsi,[1+r10]
+ movzx edx,BYTE[rcx*1+rdi]
+ movzx esi,sil
+ movzx ebx,BYTE[rsi*1+rdi]
+ mov BYTE[rcx*1+rdi],al
+ cmp rcx,rsi
+ mov BYTE[r10*1+rdi],dl
+ jne NEAR $L$cmov6
+ mov rbx,rax
+$L$cmov6:
+ add dl,al
+ xor r9b,BYTE[rdx*1+rdi]
+ ror r9d,8
+ add cl,bl
+ lea r10,[1+rsi]
+ movzx edx,BYTE[rcx*1+rdi]
+ movzx r10d,r10b
+ movzx eax,BYTE[r10*1+rdi]
+ mov BYTE[rcx*1+rdi],bl
+ cmp rcx,r10
+ mov BYTE[rsi*1+rdi],dl
+ jne NEAR $L$cmov7
+ mov rax,rbx
+$L$cmov7:
+ add dl,bl
+ xor r9b,BYTE[rdx*1+rdi]
+ ror r9d,8
+ lea r11,[((-8))+r11]
+ mov DWORD[r13],r8d
+ lea r12,[8+r12]
+ mov DWORD[4+r13],r9d
+ lea r13,[8+r13]
+
+ test r11,-8
+ jnz NEAR $L$cloop8
+ cmp r11,0
+ jne NEAR $L$cloop1
+ jmp NEAR $L$exit
+ALIGN 16
+$L$cloop1:
+ add cl,al
+ movzx ecx,cl
+ movzx edx,BYTE[rcx*1+rdi]
+ mov BYTE[rcx*1+rdi],al
+ mov BYTE[r10*1+rdi],dl
+ add dl,al
+ add r10b,1
+ movzx edx,dl
+ movzx r10d,r10b
+ movzx edx,BYTE[rdx*1+rdi]
+ movzx eax,BYTE[r10*1+rdi]
+ xor dl,BYTE[r12]
+ lea r12,[1+r12]
+ mov BYTE[r13],dl
+ lea r13,[1+r13]
+ sub r11,1
+ jnz NEAR $L$cloop1
+ jmp NEAR $L$exit
+
+ALIGN 16
+$L$exit:
+ sub r10b,1
+ mov DWORD[((-8))+rdi],r10d
+ mov DWORD[((-4))+rdi],ecx
+
+ mov r13,QWORD[rsp]
+
+ mov r12,QWORD[8+rsp]
+
+ mov rbx,QWORD[16+rsp]
+
+ add rsp,24
+
+$L$epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_RC4:
+global RC4_set_key
+
+ALIGN 16
+RC4_set_key:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_RC4_set_key:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+ lea rdi,[8+rdi]
+ lea rdx,[rsi*1+rdx]
+ neg rsi
+ mov rcx,rsi
+ xor eax,eax
+ xor r9,r9
+ xor r10,r10
+ xor r11,r11
+
+ mov r8d,DWORD[OPENSSL_ia32cap_P]
+ bt r8d,20
+ jc NEAR $L$c1stloop
+ jmp NEAR $L$w1stloop
+
+ALIGN 16
+$L$w1stloop:
+ mov DWORD[rax*4+rdi],eax
+ add al,1
+ jnc NEAR $L$w1stloop
+
+ xor r9,r9
+ xor r8,r8
+ALIGN 16
+$L$w2ndloop:
+ mov r10d,DWORD[r9*4+rdi]
+ add r8b,BYTE[rsi*1+rdx]
+ add r8b,r10b
+ add rsi,1
+ mov r11d,DWORD[r8*4+rdi]
+ cmovz rsi,rcx
+ mov DWORD[r8*4+rdi],r10d
+ mov DWORD[r9*4+rdi],r11d
+ add r9b,1
+ jnc NEAR $L$w2ndloop
+ jmp NEAR $L$exit_key
+
+ALIGN 16
+$L$c1stloop:
+ mov BYTE[rax*1+rdi],al
+ add al,1
+ jnc NEAR $L$c1stloop
+
+ xor r9,r9
+ xor r8,r8
+ALIGN 16
+$L$c2ndloop:
+ mov r10b,BYTE[r9*1+rdi]
+ add r8b,BYTE[rsi*1+rdx]
+ add r8b,r10b
+ add rsi,1
+ mov r11b,BYTE[r8*1+rdi]
+ jnz NEAR $L$cnowrap
+ mov rsi,rcx
+$L$cnowrap:
+ mov BYTE[r8*1+rdi],r10b
+ mov BYTE[r9*1+rdi],r11b
+ add r9b,1
+ jnc NEAR $L$c2ndloop
+ mov DWORD[256+rdi],-1
+
+ALIGN 16
+$L$exit_key:
+ xor eax,eax
+ mov DWORD[((-8))+rdi],eax
+ mov DWORD[((-4))+rdi],eax
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+$L$SEH_end_RC4_set_key:
+
+global RC4_options
+
+ALIGN 16
+RC4_options:
+ lea rax,[$L$opts]
+ mov edx,DWORD[OPENSSL_ia32cap_P]
+ bt edx,20
+ jc NEAR $L$8xchar
+ bt edx,30
+ jnc NEAR $L$done
+ add rax,25
+ DB 0F3h,0C3h ;repret
+$L$8xchar:
+ add rax,12
+$L$done:
+ DB 0F3h,0C3h ;repret
+ALIGN 64
+$L$opts:
+DB 114,99,52,40,56,120,44,105,110,116,41,0
+DB 114,99,52,40,56,120,44,99,104,97,114,41,0
+DB 114,99,52,40,49,54,120,44,105,110,116,41,0
+DB 82,67,52,32,102,111,114,32,120,56,54,95,54,52,44,32
+DB 67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97
+DB 112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103
+DB 62,0
+ALIGN 64
+
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+stream_se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ lea r10,[$L$prologue]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ lea r10,[$L$epilogue]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ lea rax,[24+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov r12,QWORD[((-16))+rax]
+ mov r13,QWORD[((-24))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ jmp NEAR $L$common_seh_exit
+
+
+
+ALIGN 16
+key_se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[152+r8]
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+$L$common_seh_exit:
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_RC4 wrt ..imagebase
+ DD $L$SEH_end_RC4 wrt ..imagebase
+ DD $L$SEH_info_RC4 wrt ..imagebase
+
+ DD $L$SEH_begin_RC4_set_key wrt ..imagebase
+ DD $L$SEH_end_RC4_set_key wrt ..imagebase
+ DD $L$SEH_info_RC4_set_key wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_RC4:
+DB 9,0,0,0
+ DD stream_se_handler wrt ..imagebase
+$L$SEH_info_RC4_set_key:
+DB 9,0,0,0
+ DD key_se_handler wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm
new file mode 100644
index 0000000000..00eadebf68
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm
@@ -0,0 +1,532 @@
+; Copyright 2017-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+
+ALIGN 32
+__KeccakF1600:
+ mov rax,QWORD[60+rdi]
+ mov rbx,QWORD[68+rdi]
+ mov rcx,QWORD[76+rdi]
+ mov rdx,QWORD[84+rdi]
+ mov rbp,QWORD[92+rdi]
+ jmp NEAR $L$oop
+
+ALIGN 32
+$L$oop:
+ mov r8,QWORD[((-100))+rdi]
+ mov r9,QWORD[((-52))+rdi]
+ mov r10,QWORD[((-4))+rdi]
+ mov r11,QWORD[44+rdi]
+
+ xor rcx,QWORD[((-84))+rdi]
+ xor rdx,QWORD[((-76))+rdi]
+ xor rax,r8
+ xor rbx,QWORD[((-92))+rdi]
+ xor rcx,QWORD[((-44))+rdi]
+ xor rax,QWORD[((-60))+rdi]
+ mov r12,rbp
+ xor rbp,QWORD[((-68))+rdi]
+
+ xor rcx,r10
+ xor rax,QWORD[((-20))+rdi]
+ xor rdx,QWORD[((-36))+rdi]
+ xor rbx,r9
+ xor rbp,QWORD[((-28))+rdi]
+
+ xor rcx,QWORD[36+rdi]
+ xor rax,QWORD[20+rdi]
+ xor rdx,QWORD[4+rdi]
+ xor rbx,QWORD[((-12))+rdi]
+ xor rbp,QWORD[12+rdi]
+
+ mov r13,rcx
+ rol rcx,1
+ xor rcx,rax
+ xor rdx,r11
+
+ rol rax,1
+ xor rax,rdx
+ xor rbx,QWORD[28+rdi]
+
+ rol rdx,1
+ xor rdx,rbx
+ xor rbp,QWORD[52+rdi]
+
+ rol rbx,1
+ xor rbx,rbp
+
+ rol rbp,1
+ xor rbp,r13
+ xor r9,rcx
+ xor r10,rdx
+ rol r9,44
+ xor r11,rbp
+ xor r12,rax
+ rol r10,43
+ xor r8,rbx
+ mov r13,r9
+ rol r11,21
+ or r9,r10
+ xor r9,r8
+ rol r12,14
+
+ xor r9,QWORD[r15]
+ lea r15,[8+r15]
+
+ mov r14,r12
+ and r12,r11
+ mov QWORD[((-100))+rsi],r9
+ xor r12,r10
+ not r10
+ mov QWORD[((-84))+rsi],r12
+
+ or r10,r11
+ mov r12,QWORD[76+rdi]
+ xor r10,r13
+ mov QWORD[((-92))+rsi],r10
+
+ and r13,r8
+ mov r9,QWORD[((-28))+rdi]
+ xor r13,r14
+ mov r10,QWORD[((-20))+rdi]
+ mov QWORD[((-68))+rsi],r13
+
+ or r14,r8
+ mov r8,QWORD[((-76))+rdi]
+ xor r14,r11
+ mov r11,QWORD[28+rdi]
+ mov QWORD[((-76))+rsi],r14
+
+
+ xor r8,rbp
+ xor r12,rdx
+ rol r8,28
+ xor r11,rcx
+ xor r9,rax
+ rol r12,61
+ rol r11,45
+ xor r10,rbx
+ rol r9,20
+ mov r13,r8
+ or r8,r12
+ rol r10,3
+
+ xor r8,r11
+ mov QWORD[((-36))+rsi],r8
+
+ mov r14,r9
+ and r9,r13
+ mov r8,QWORD[((-92))+rdi]
+ xor r9,r12
+ not r12
+ mov QWORD[((-28))+rsi],r9
+
+ or r12,r11
+ mov r9,QWORD[((-44))+rdi]
+ xor r12,r10
+ mov QWORD[((-44))+rsi],r12
+
+ and r11,r10
+ mov r12,QWORD[60+rdi]
+ xor r11,r14
+ mov QWORD[((-52))+rsi],r11
+
+ or r14,r10
+ mov r10,QWORD[4+rdi]
+ xor r14,r13
+ mov r11,QWORD[52+rdi]
+ mov QWORD[((-60))+rsi],r14
+
+
+ xor r10,rbp
+ xor r11,rax
+ rol r10,25
+ xor r9,rdx
+ rol r11,8
+ xor r12,rbx
+ rol r9,6
+ xor r8,rcx
+ rol r12,18
+ mov r13,r10
+ and r10,r11
+ rol r8,1
+
+ not r11
+ xor r10,r9
+ mov QWORD[((-12))+rsi],r10
+
+ mov r14,r12
+ and r12,r11
+ mov r10,QWORD[((-12))+rdi]
+ xor r12,r13
+ mov QWORD[((-4))+rsi],r12
+
+ or r13,r9
+ mov r12,QWORD[84+rdi]
+ xor r13,r8
+ mov QWORD[((-20))+rsi],r13
+
+ and r9,r8
+ xor r9,r14
+ mov QWORD[12+rsi],r9
+
+ or r14,r8
+ mov r9,QWORD[((-60))+rdi]
+ xor r14,r11
+ mov r11,QWORD[36+rdi]
+ mov QWORD[4+rsi],r14
+
+
+ mov r8,QWORD[((-68))+rdi]
+
+ xor r10,rcx
+ xor r11,rdx
+ rol r10,10
+ xor r9,rbx
+ rol r11,15
+ xor r12,rbp
+ rol r9,36
+ xor r8,rax
+ rol r12,56
+ mov r13,r10
+ or r10,r11
+ rol r8,27
+
+ not r11
+ xor r10,r9
+ mov QWORD[28+rsi],r10
+
+ mov r14,r12
+ or r12,r11
+ xor r12,r13
+ mov QWORD[36+rsi],r12
+
+ and r13,r9
+ xor r13,r8
+ mov QWORD[20+rsi],r13
+
+ or r9,r8
+ xor r9,r14
+ mov QWORD[52+rsi],r9
+
+ and r8,r14
+ xor r8,r11
+ mov QWORD[44+rsi],r8
+
+
+ xor rdx,QWORD[((-84))+rdi]
+ xor rbp,QWORD[((-36))+rdi]
+ rol rdx,62
+ xor rcx,QWORD[68+rdi]
+ rol rbp,55
+ xor rax,QWORD[12+rdi]
+ rol rcx,2
+ xor rbx,QWORD[20+rdi]
+ xchg rdi,rsi
+ rol rax,39
+ rol rbx,41
+ mov r13,rdx
+ and rdx,rbp
+ not rbp
+ xor rdx,rcx
+ mov QWORD[92+rdi],rdx
+
+ mov r14,rax
+ and rax,rbp
+ xor rax,r13
+ mov QWORD[60+rdi],rax
+
+ or r13,rcx
+ xor r13,rbx
+ mov QWORD[84+rdi],r13
+
+ and rcx,rbx
+ xor rcx,r14
+ mov QWORD[76+rdi],rcx
+
+ or rbx,r14
+ xor rbx,rbp
+ mov QWORD[68+rdi],rbx
+
+ mov rbp,rdx
+ mov rdx,r13
+
+ test r15,255
+ jnz NEAR $L$oop
+
+ lea r15,[((-192))+r15]
+ DB 0F3h,0C3h ;repret
+
+
+
+ALIGN 32
+KeccakF1600:
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ lea rdi,[100+rdi]
+ sub rsp,200
+
+
+ not QWORD[((-92))+rdi]
+ not QWORD[((-84))+rdi]
+ not QWORD[((-36))+rdi]
+ not QWORD[((-4))+rdi]
+ not QWORD[36+rdi]
+ not QWORD[60+rdi]
+
+ lea r15,[iotas]
+ lea rsi,[100+rsp]
+
+ call __KeccakF1600
+
+ not QWORD[((-92))+rdi]
+ not QWORD[((-84))+rdi]
+ not QWORD[((-36))+rdi]
+ not QWORD[((-4))+rdi]
+ not QWORD[36+rdi]
+ not QWORD[60+rdi]
+ lea rdi,[((-100))+rdi]
+
+ add rsp,200
+
+
+ pop r15
+
+ pop r14
+
+ pop r13
+
+ pop r12
+
+ pop rbp
+
+ pop rbx
+
+ DB 0F3h,0C3h ;repret
+
+
+global SHA3_absorb
+
+ALIGN 32
+SHA3_absorb:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_SHA3_absorb:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+
+
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+
+ lea rdi,[100+rdi]
+ sub rsp,232
+
+
+ mov r9,rsi
+ lea rsi,[100+rsp]
+
+ not QWORD[((-92))+rdi]
+ not QWORD[((-84))+rdi]
+ not QWORD[((-36))+rdi]
+ not QWORD[((-4))+rdi]
+ not QWORD[36+rdi]
+ not QWORD[60+rdi]
+ lea r15,[iotas]
+
+ mov QWORD[((216-100))+rsi],rcx
+
+$L$oop_absorb:
+ cmp rdx,rcx
+ jc NEAR $L$done_absorb
+
+ shr rcx,3
+ lea r8,[((-100))+rdi]
+
+$L$block_absorb:
+ mov rax,QWORD[r9]
+ lea r9,[8+r9]
+ xor rax,QWORD[r8]
+ lea r8,[8+r8]
+ sub rdx,8
+ mov QWORD[((-8))+r8],rax
+ sub rcx,1
+ jnz NEAR $L$block_absorb
+
+ mov QWORD[((200-100))+rsi],r9
+ mov QWORD[((208-100))+rsi],rdx
+ call __KeccakF1600
+ mov r9,QWORD[((200-100))+rsi]
+ mov rdx,QWORD[((208-100))+rsi]
+ mov rcx,QWORD[((216-100))+rsi]
+ jmp NEAR $L$oop_absorb
+
+ALIGN 32
+$L$done_absorb:
+ mov rax,rdx
+
+ not QWORD[((-92))+rdi]
+ not QWORD[((-84))+rdi]
+ not QWORD[((-36))+rdi]
+ not QWORD[((-4))+rdi]
+ not QWORD[36+rdi]
+ not QWORD[60+rdi]
+
+ add rsp,232
+
+
+ pop r15
+
+ pop r14
+
+ pop r13
+
+ pop r12
+
+ pop rbp
+
+ pop rbx
+
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_SHA3_absorb:
+global SHA3_squeeze
+
+ALIGN 32
+SHA3_squeeze:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_SHA3_squeeze:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+ mov rcx,r9
+
+
+
+ push r12
+
+ push r13
+
+ push r14
+
+
+ shr rcx,3
+ mov r8,rdi
+ mov r12,rsi
+ mov r13,rdx
+ mov r14,rcx
+ jmp NEAR $L$oop_squeeze
+
+ALIGN 32
+$L$oop_squeeze:
+ cmp r13,8
+ jb NEAR $L$tail_squeeze
+
+ mov rax,QWORD[r8]
+ lea r8,[8+r8]
+ mov QWORD[r12],rax
+ lea r12,[8+r12]
+ sub r13,8
+ jz NEAR $L$done_squeeze
+
+ sub rcx,1
+ jnz NEAR $L$oop_squeeze
+
+ call KeccakF1600
+ mov r8,rdi
+ mov rcx,r14
+ jmp NEAR $L$oop_squeeze
+
+$L$tail_squeeze:
+ mov rsi,r8
+ mov rdi,r12
+ mov rcx,r13
+DB 0xf3,0xa4
+
+$L$done_squeeze:
+ pop r14
+
+ pop r13
+
+ pop r12
+
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_SHA3_squeeze:
+ALIGN 256
+ DQ 0,0,0,0,0,0,0,0
+
+iotas:
+ DQ 0x0000000000000001
+ DQ 0x0000000000008082
+ DQ 0x800000000000808a
+ DQ 0x8000000080008000
+ DQ 0x000000000000808b
+ DQ 0x0000000080000001
+ DQ 0x8000000080008081
+ DQ 0x8000000000008009
+ DQ 0x000000000000008a
+ DQ 0x0000000000000088
+ DQ 0x0000000080008009
+ DQ 0x000000008000000a
+ DQ 0x000000008000808b
+ DQ 0x800000000000008b
+ DQ 0x8000000000008089
+ DQ 0x8000000000008003
+ DQ 0x8000000000008002
+ DQ 0x8000000000000080
+ DQ 0x000000000000800a
+ DQ 0x800000008000000a
+ DQ 0x8000000080008081
+ DQ 0x8000000000008080
+ DQ 0x0000000080000001
+ DQ 0x8000000080008008
+
+DB 75,101,99,99,97,107,45,49,54,48,48,32,97,98,115,111
+DB 114,98,32,97,110,100,32,115,113,117,101,101,122,101,32,102
+DB 111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84
+DB 79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64
+DB 111,112,101,110,115,115,108,46,111,114,103,62,0
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm
new file mode 100644
index 0000000000..ea394daa3b
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm
@@ -0,0 +1,7581 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN OPENSSL_ia32cap_P
+
+global sha1_multi_block
+
+ALIGN 32
+sha1_multi_block:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha1_multi_block:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ mov rcx,QWORD[((OPENSSL_ia32cap_P+4))]
+ bt rcx,61
+ jc NEAR _shaext_shortcut
+ test ecx,268435456
+ jnz NEAR _avx_shortcut
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[(-120)+rax],xmm10
+ movaps XMMWORD[(-104)+rax],xmm11
+ movaps XMMWORD[(-88)+rax],xmm12
+ movaps XMMWORD[(-72)+rax],xmm13
+ movaps XMMWORD[(-56)+rax],xmm14
+ movaps XMMWORD[(-40)+rax],xmm15
+ sub rsp,288
+ and rsp,-256
+ mov QWORD[272+rsp],rax
+
+$L$body:
+ lea rbp,[K_XX_XX]
+ lea rbx,[256+rsp]
+
+$L$oop_grande:
+ mov DWORD[280+rsp],edx
+ xor edx,edx
+ mov r8,QWORD[rsi]
+ mov ecx,DWORD[8+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[rbx],ecx
+ cmovle r8,rbp
+ mov r9,QWORD[16+rsi]
+ mov ecx,DWORD[24+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[4+rbx],ecx
+ cmovle r9,rbp
+ mov r10,QWORD[32+rsi]
+ mov ecx,DWORD[40+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[8+rbx],ecx
+ cmovle r10,rbp
+ mov r11,QWORD[48+rsi]
+ mov ecx,DWORD[56+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[12+rbx],ecx
+ cmovle r11,rbp
+ test edx,edx
+ jz NEAR $L$done
+
+ movdqu xmm10,XMMWORD[rdi]
+ lea rax,[128+rsp]
+ movdqu xmm11,XMMWORD[32+rdi]
+ movdqu xmm12,XMMWORD[64+rdi]
+ movdqu xmm13,XMMWORD[96+rdi]
+ movdqu xmm14,XMMWORD[128+rdi]
+ movdqa xmm5,XMMWORD[96+rbp]
+ movdqa xmm15,XMMWORD[((-32))+rbp]
+ jmp NEAR $L$oop
+
+ALIGN 32
+$L$oop:
+ movd xmm0,DWORD[r8]
+ lea r8,[64+r8]
+ movd xmm2,DWORD[r9]
+ lea r9,[64+r9]
+ movd xmm3,DWORD[r10]
+ lea r10,[64+r10]
+ movd xmm4,DWORD[r11]
+ lea r11,[64+r11]
+ punpckldq xmm0,xmm3
+ movd xmm1,DWORD[((-60))+r8]
+ punpckldq xmm2,xmm4
+ movd xmm9,DWORD[((-60))+r9]
+ punpckldq xmm0,xmm2
+ movd xmm8,DWORD[((-60))+r10]
+DB 102,15,56,0,197
+ movd xmm7,DWORD[((-60))+r11]
+ punpckldq xmm1,xmm8
+ movdqa xmm8,xmm10
+ paddd xmm14,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm11
+ movdqa xmm6,xmm11
+ pslld xmm8,5
+ pandn xmm7,xmm13
+ pand xmm6,xmm12
+ punpckldq xmm1,xmm9
+ movdqa xmm9,xmm10
+
+ movdqa XMMWORD[(0-128)+rax],xmm0
+ paddd xmm14,xmm0
+ movd xmm2,DWORD[((-56))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm11
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-56))+r9]
+ pslld xmm7,30
+ paddd xmm14,xmm6
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+DB 102,15,56,0,205
+ movd xmm8,DWORD[((-56))+r10]
+ por xmm11,xmm7
+ movd xmm7,DWORD[((-56))+r11]
+ punpckldq xmm2,xmm8
+ movdqa xmm8,xmm14
+ paddd xmm13,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm10
+ movdqa xmm6,xmm10
+ pslld xmm8,5
+ pandn xmm7,xmm12
+ pand xmm6,xmm11
+ punpckldq xmm2,xmm9
+ movdqa xmm9,xmm14
+
+ movdqa XMMWORD[(16-128)+rax],xmm1
+ paddd xmm13,xmm1
+ movd xmm3,DWORD[((-52))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm10
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-52))+r9]
+ pslld xmm7,30
+ paddd xmm13,xmm6
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+DB 102,15,56,0,213
+ movd xmm8,DWORD[((-52))+r10]
+ por xmm10,xmm7
+ movd xmm7,DWORD[((-52))+r11]
+ punpckldq xmm3,xmm8
+ movdqa xmm8,xmm13
+ paddd xmm12,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm14
+ movdqa xmm6,xmm14
+ pslld xmm8,5
+ pandn xmm7,xmm11
+ pand xmm6,xmm10
+ punpckldq xmm3,xmm9
+ movdqa xmm9,xmm13
+
+ movdqa XMMWORD[(32-128)+rax],xmm2
+ paddd xmm12,xmm2
+ movd xmm4,DWORD[((-48))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm14
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-48))+r9]
+ pslld xmm7,30
+ paddd xmm12,xmm6
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+DB 102,15,56,0,221
+ movd xmm8,DWORD[((-48))+r10]
+ por xmm14,xmm7
+ movd xmm7,DWORD[((-48))+r11]
+ punpckldq xmm4,xmm8
+ movdqa xmm8,xmm12
+ paddd xmm11,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm13
+ movdqa xmm6,xmm13
+ pslld xmm8,5
+ pandn xmm7,xmm10
+ pand xmm6,xmm14
+ punpckldq xmm4,xmm9
+ movdqa xmm9,xmm12
+
+ movdqa XMMWORD[(48-128)+rax],xmm3
+ paddd xmm11,xmm3
+ movd xmm0,DWORD[((-44))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm13
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-44))+r9]
+ pslld xmm7,30
+ paddd xmm11,xmm6
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+DB 102,15,56,0,229
+ movd xmm8,DWORD[((-44))+r10]
+ por xmm13,xmm7
+ movd xmm7,DWORD[((-44))+r11]
+ punpckldq xmm0,xmm8
+ movdqa xmm8,xmm11
+ paddd xmm10,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm12
+ movdqa xmm6,xmm12
+ pslld xmm8,5
+ pandn xmm7,xmm14
+ pand xmm6,xmm13
+ punpckldq xmm0,xmm9
+ movdqa xmm9,xmm11
+
+ movdqa XMMWORD[(64-128)+rax],xmm4
+ paddd xmm10,xmm4
+ movd xmm1,DWORD[((-40))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm12
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-40))+r9]
+ pslld xmm7,30
+ paddd xmm10,xmm6
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+DB 102,15,56,0,197
+ movd xmm8,DWORD[((-40))+r10]
+ por xmm12,xmm7
+ movd xmm7,DWORD[((-40))+r11]
+ punpckldq xmm1,xmm8
+ movdqa xmm8,xmm10
+ paddd xmm14,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm11
+ movdqa xmm6,xmm11
+ pslld xmm8,5
+ pandn xmm7,xmm13
+ pand xmm6,xmm12
+ punpckldq xmm1,xmm9
+ movdqa xmm9,xmm10
+
+ movdqa XMMWORD[(80-128)+rax],xmm0
+ paddd xmm14,xmm0
+ movd xmm2,DWORD[((-36))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm11
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-36))+r9]
+ pslld xmm7,30
+ paddd xmm14,xmm6
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+DB 102,15,56,0,205
+ movd xmm8,DWORD[((-36))+r10]
+ por xmm11,xmm7
+ movd xmm7,DWORD[((-36))+r11]
+ punpckldq xmm2,xmm8
+ movdqa xmm8,xmm14
+ paddd xmm13,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm10
+ movdqa xmm6,xmm10
+ pslld xmm8,5
+ pandn xmm7,xmm12
+ pand xmm6,xmm11
+ punpckldq xmm2,xmm9
+ movdqa xmm9,xmm14
+
+ movdqa XMMWORD[(96-128)+rax],xmm1
+ paddd xmm13,xmm1
+ movd xmm3,DWORD[((-32))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm10
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-32))+r9]
+ pslld xmm7,30
+ paddd xmm13,xmm6
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+DB 102,15,56,0,213
+ movd xmm8,DWORD[((-32))+r10]
+ por xmm10,xmm7
+ movd xmm7,DWORD[((-32))+r11]
+ punpckldq xmm3,xmm8
+ movdqa xmm8,xmm13
+ paddd xmm12,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm14
+ movdqa xmm6,xmm14
+ pslld xmm8,5
+ pandn xmm7,xmm11
+ pand xmm6,xmm10
+ punpckldq xmm3,xmm9
+ movdqa xmm9,xmm13
+
+ movdqa XMMWORD[(112-128)+rax],xmm2
+ paddd xmm12,xmm2
+ movd xmm4,DWORD[((-28))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm14
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-28))+r9]
+ pslld xmm7,30
+ paddd xmm12,xmm6
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+DB 102,15,56,0,221
+ movd xmm8,DWORD[((-28))+r10]
+ por xmm14,xmm7
+ movd xmm7,DWORD[((-28))+r11]
+ punpckldq xmm4,xmm8
+ movdqa xmm8,xmm12
+ paddd xmm11,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm13
+ movdqa xmm6,xmm13
+ pslld xmm8,5
+ pandn xmm7,xmm10
+ pand xmm6,xmm14
+ punpckldq xmm4,xmm9
+ movdqa xmm9,xmm12
+
+ movdqa XMMWORD[(128-128)+rax],xmm3
+ paddd xmm11,xmm3
+ movd xmm0,DWORD[((-24))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm13
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-24))+r9]
+ pslld xmm7,30
+ paddd xmm11,xmm6
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+DB 102,15,56,0,229
+ movd xmm8,DWORD[((-24))+r10]
+ por xmm13,xmm7
+ movd xmm7,DWORD[((-24))+r11]
+ punpckldq xmm0,xmm8
+ movdqa xmm8,xmm11
+ paddd xmm10,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm12
+ movdqa xmm6,xmm12
+ pslld xmm8,5
+ pandn xmm7,xmm14
+ pand xmm6,xmm13
+ punpckldq xmm0,xmm9
+ movdqa xmm9,xmm11
+
+ movdqa XMMWORD[(144-128)+rax],xmm4
+ paddd xmm10,xmm4
+ movd xmm1,DWORD[((-20))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm12
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-20))+r9]
+ pslld xmm7,30
+ paddd xmm10,xmm6
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+DB 102,15,56,0,197
+ movd xmm8,DWORD[((-20))+r10]
+ por xmm12,xmm7
+ movd xmm7,DWORD[((-20))+r11]
+ punpckldq xmm1,xmm8
+ movdqa xmm8,xmm10
+ paddd xmm14,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm11
+ movdqa xmm6,xmm11
+ pslld xmm8,5
+ pandn xmm7,xmm13
+ pand xmm6,xmm12
+ punpckldq xmm1,xmm9
+ movdqa xmm9,xmm10
+
+ movdqa XMMWORD[(160-128)+rax],xmm0
+ paddd xmm14,xmm0
+ movd xmm2,DWORD[((-16))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm11
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-16))+r9]
+ pslld xmm7,30
+ paddd xmm14,xmm6
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+DB 102,15,56,0,205
+ movd xmm8,DWORD[((-16))+r10]
+ por xmm11,xmm7
+ movd xmm7,DWORD[((-16))+r11]
+ punpckldq xmm2,xmm8
+ movdqa xmm8,xmm14
+ paddd xmm13,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm10
+ movdqa xmm6,xmm10
+ pslld xmm8,5
+ pandn xmm7,xmm12
+ pand xmm6,xmm11
+ punpckldq xmm2,xmm9
+ movdqa xmm9,xmm14
+
+ movdqa XMMWORD[(176-128)+rax],xmm1
+ paddd xmm13,xmm1
+ movd xmm3,DWORD[((-12))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm10
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-12))+r9]
+ pslld xmm7,30
+ paddd xmm13,xmm6
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+DB 102,15,56,0,213
+ movd xmm8,DWORD[((-12))+r10]
+ por xmm10,xmm7
+ movd xmm7,DWORD[((-12))+r11]
+ punpckldq xmm3,xmm8
+ movdqa xmm8,xmm13
+ paddd xmm12,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm14
+ movdqa xmm6,xmm14
+ pslld xmm8,5
+ pandn xmm7,xmm11
+ pand xmm6,xmm10
+ punpckldq xmm3,xmm9
+ movdqa xmm9,xmm13
+
+ movdqa XMMWORD[(192-128)+rax],xmm2
+ paddd xmm12,xmm2
+ movd xmm4,DWORD[((-8))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm14
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-8))+r9]
+ pslld xmm7,30
+ paddd xmm12,xmm6
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+DB 102,15,56,0,221
+ movd xmm8,DWORD[((-8))+r10]
+ por xmm14,xmm7
+ movd xmm7,DWORD[((-8))+r11]
+ punpckldq xmm4,xmm8
+ movdqa xmm8,xmm12
+ paddd xmm11,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm13
+ movdqa xmm6,xmm13
+ pslld xmm8,5
+ pandn xmm7,xmm10
+ pand xmm6,xmm14
+ punpckldq xmm4,xmm9
+ movdqa xmm9,xmm12
+
+ movdqa XMMWORD[(208-128)+rax],xmm3
+ paddd xmm11,xmm3
+ movd xmm0,DWORD[((-4))+r8]
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm13
+
+ por xmm8,xmm9
+ movd xmm9,DWORD[((-4))+r9]
+ pslld xmm7,30
+ paddd xmm11,xmm6
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+DB 102,15,56,0,229
+ movd xmm8,DWORD[((-4))+r10]
+ por xmm13,xmm7
+ movdqa xmm1,XMMWORD[((0-128))+rax]
+ movd xmm7,DWORD[((-4))+r11]
+ punpckldq xmm0,xmm8
+ movdqa xmm8,xmm11
+ paddd xmm10,xmm15
+ punpckldq xmm9,xmm7
+ movdqa xmm7,xmm12
+ movdqa xmm6,xmm12
+ pslld xmm8,5
+ prefetcht0 [63+r8]
+ pandn xmm7,xmm14
+ pand xmm6,xmm13
+ punpckldq xmm0,xmm9
+ movdqa xmm9,xmm11
+
+ movdqa XMMWORD[(224-128)+rax],xmm4
+ paddd xmm10,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm7
+ movdqa xmm7,xmm12
+ prefetcht0 [63+r9]
+
+ por xmm8,xmm9
+ pslld xmm7,30
+ paddd xmm10,xmm6
+ prefetcht0 [63+r10]
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+DB 102,15,56,0,197
+ prefetcht0 [63+r11]
+ por xmm12,xmm7
+ movdqa xmm2,XMMWORD[((16-128))+rax]
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((32-128))+rax]
+
+ movdqa xmm8,xmm10
+ pxor xmm1,XMMWORD[((128-128))+rax]
+ paddd xmm14,xmm15
+ movdqa xmm7,xmm11
+ pslld xmm8,5
+ pxor xmm1,xmm3
+ movdqa xmm6,xmm11
+ pandn xmm7,xmm13
+ movdqa xmm5,xmm1
+ pand xmm6,xmm12
+ movdqa xmm9,xmm10
+ psrld xmm5,31
+ paddd xmm1,xmm1
+
+ movdqa XMMWORD[(240-128)+rax],xmm0
+ paddd xmm14,xmm0
+ psrld xmm9,27
+ pxor xmm6,xmm7
+
+ movdqa xmm7,xmm11
+ por xmm8,xmm9
+ pslld xmm7,30
+ paddd xmm14,xmm6
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((48-128))+rax]
+
+ movdqa xmm8,xmm14
+ pxor xmm2,XMMWORD[((144-128))+rax]
+ paddd xmm13,xmm15
+ movdqa xmm7,xmm10
+ pslld xmm8,5
+ pxor xmm2,xmm4
+ movdqa xmm6,xmm10
+ pandn xmm7,xmm12
+ movdqa xmm5,xmm2
+ pand xmm6,xmm11
+ movdqa xmm9,xmm14
+ psrld xmm5,31
+ paddd xmm2,xmm2
+
+ movdqa XMMWORD[(0-128)+rax],xmm1
+ paddd xmm13,xmm1
+ psrld xmm9,27
+ pxor xmm6,xmm7
+
+ movdqa xmm7,xmm10
+ por xmm8,xmm9
+ pslld xmm7,30
+ paddd xmm13,xmm6
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((64-128))+rax]
+
+ movdqa xmm8,xmm13
+ pxor xmm3,XMMWORD[((160-128))+rax]
+ paddd xmm12,xmm15
+ movdqa xmm7,xmm14
+ pslld xmm8,5
+ pxor xmm3,xmm0
+ movdqa xmm6,xmm14
+ pandn xmm7,xmm11
+ movdqa xmm5,xmm3
+ pand xmm6,xmm10
+ movdqa xmm9,xmm13
+ psrld xmm5,31
+ paddd xmm3,xmm3
+
+ movdqa XMMWORD[(16-128)+rax],xmm2
+ paddd xmm12,xmm2
+ psrld xmm9,27
+ pxor xmm6,xmm7
+
+ movdqa xmm7,xmm14
+ por xmm8,xmm9
+ pslld xmm7,30
+ paddd xmm12,xmm6
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((80-128))+rax]
+
+ movdqa xmm8,xmm12
+ pxor xmm4,XMMWORD[((176-128))+rax]
+ paddd xmm11,xmm15
+ movdqa xmm7,xmm13
+ pslld xmm8,5
+ pxor xmm4,xmm1
+ movdqa xmm6,xmm13
+ pandn xmm7,xmm10
+ movdqa xmm5,xmm4
+ pand xmm6,xmm14
+ movdqa xmm9,xmm12
+ psrld xmm5,31
+ paddd xmm4,xmm4
+
+ movdqa XMMWORD[(32-128)+rax],xmm3
+ paddd xmm11,xmm3
+ psrld xmm9,27
+ pxor xmm6,xmm7
+
+ movdqa xmm7,xmm13
+ por xmm8,xmm9
+ pslld xmm7,30
+ paddd xmm11,xmm6
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((96-128))+rax]
+
+ movdqa xmm8,xmm11
+ pxor xmm0,XMMWORD[((192-128))+rax]
+ paddd xmm10,xmm15
+ movdqa xmm7,xmm12
+ pslld xmm8,5
+ pxor xmm0,xmm2
+ movdqa xmm6,xmm12
+ pandn xmm7,xmm14
+ movdqa xmm5,xmm0
+ pand xmm6,xmm13
+ movdqa xmm9,xmm11
+ psrld xmm5,31
+ paddd xmm0,xmm0
+
+ movdqa XMMWORD[(48-128)+rax],xmm4
+ paddd xmm10,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm7
+
+ movdqa xmm7,xmm12
+ por xmm8,xmm9
+ pslld xmm7,30
+ paddd xmm10,xmm6
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ movdqa xmm15,XMMWORD[rbp]
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((112-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm6,xmm13
+ pxor xmm1,XMMWORD[((208-128))+rax]
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm11
+
+ movdqa xmm9,xmm10
+ movdqa XMMWORD[(64-128)+rax],xmm0
+ paddd xmm14,xmm0
+ pxor xmm1,xmm3
+ psrld xmm9,27
+ pxor xmm6,xmm12
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ movdqa xmm5,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm14,xmm6
+ paddd xmm1,xmm1
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((128-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm6,xmm12
+ pxor xmm2,XMMWORD[((224-128))+rax]
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm10
+
+ movdqa xmm9,xmm14
+ movdqa XMMWORD[(80-128)+rax],xmm1
+ paddd xmm13,xmm1
+ pxor xmm2,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm11
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ movdqa xmm5,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm13,xmm6
+ paddd xmm2,xmm2
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((144-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm6,xmm11
+ pxor xmm3,XMMWORD[((240-128))+rax]
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm14
+
+ movdqa xmm9,xmm13
+ movdqa XMMWORD[(96-128)+rax],xmm2
+ paddd xmm12,xmm2
+ pxor xmm3,xmm0
+ psrld xmm9,27
+ pxor xmm6,xmm10
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ movdqa xmm5,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm12,xmm6
+ paddd xmm3,xmm3
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((160-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm6,xmm10
+ pxor xmm4,XMMWORD[((0-128))+rax]
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm13
+
+ movdqa xmm9,xmm12
+ movdqa XMMWORD[(112-128)+rax],xmm3
+ paddd xmm11,xmm3
+ pxor xmm4,xmm1
+ psrld xmm9,27
+ pxor xmm6,xmm14
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ movdqa xmm5,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm11,xmm6
+ paddd xmm4,xmm4
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((176-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm6,xmm14
+ pxor xmm0,XMMWORD[((16-128))+rax]
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm12
+
+ movdqa xmm9,xmm11
+ movdqa XMMWORD[(128-128)+rax],xmm4
+ paddd xmm10,xmm4
+ pxor xmm0,xmm2
+ psrld xmm9,27
+ pxor xmm6,xmm13
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ movdqa xmm5,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm10,xmm6
+ paddd xmm0,xmm0
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((192-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm6,xmm13
+ pxor xmm1,XMMWORD[((32-128))+rax]
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm11
+
+ movdqa xmm9,xmm10
+ movdqa XMMWORD[(144-128)+rax],xmm0
+ paddd xmm14,xmm0
+ pxor xmm1,xmm3
+ psrld xmm9,27
+ pxor xmm6,xmm12
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ movdqa xmm5,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm14,xmm6
+ paddd xmm1,xmm1
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((208-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm6,xmm12
+ pxor xmm2,XMMWORD[((48-128))+rax]
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm10
+
+ movdqa xmm9,xmm14
+ movdqa XMMWORD[(160-128)+rax],xmm1
+ paddd xmm13,xmm1
+ pxor xmm2,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm11
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ movdqa xmm5,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm13,xmm6
+ paddd xmm2,xmm2
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((224-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm6,xmm11
+ pxor xmm3,XMMWORD[((64-128))+rax]
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm14
+
+ movdqa xmm9,xmm13
+ movdqa XMMWORD[(176-128)+rax],xmm2
+ paddd xmm12,xmm2
+ pxor xmm3,xmm0
+ psrld xmm9,27
+ pxor xmm6,xmm10
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ movdqa xmm5,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm12,xmm6
+ paddd xmm3,xmm3
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((240-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm6,xmm10
+ pxor xmm4,XMMWORD[((80-128))+rax]
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm13
+
+ movdqa xmm9,xmm12
+ movdqa XMMWORD[(192-128)+rax],xmm3
+ paddd xmm11,xmm3
+ pxor xmm4,xmm1
+ psrld xmm9,27
+ pxor xmm6,xmm14
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ movdqa xmm5,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm11,xmm6
+ paddd xmm4,xmm4
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((0-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm6,xmm14
+ pxor xmm0,XMMWORD[((96-128))+rax]
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm12
+
+ movdqa xmm9,xmm11
+ movdqa XMMWORD[(208-128)+rax],xmm4
+ paddd xmm10,xmm4
+ pxor xmm0,xmm2
+ psrld xmm9,27
+ pxor xmm6,xmm13
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ movdqa xmm5,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm10,xmm6
+ paddd xmm0,xmm0
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((16-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm6,xmm13
+ pxor xmm1,XMMWORD[((112-128))+rax]
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm11
+
+ movdqa xmm9,xmm10
+ movdqa XMMWORD[(224-128)+rax],xmm0
+ paddd xmm14,xmm0
+ pxor xmm1,xmm3
+ psrld xmm9,27
+ pxor xmm6,xmm12
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ movdqa xmm5,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm14,xmm6
+ paddd xmm1,xmm1
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((32-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm6,xmm12
+ pxor xmm2,XMMWORD[((128-128))+rax]
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm10
+
+ movdqa xmm9,xmm14
+ movdqa XMMWORD[(240-128)+rax],xmm1
+ paddd xmm13,xmm1
+ pxor xmm2,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm11
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ movdqa xmm5,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm13,xmm6
+ paddd xmm2,xmm2
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((48-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm6,xmm11
+ pxor xmm3,XMMWORD[((144-128))+rax]
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm14
+
+ movdqa xmm9,xmm13
+ movdqa XMMWORD[(0-128)+rax],xmm2
+ paddd xmm12,xmm2
+ pxor xmm3,xmm0
+ psrld xmm9,27
+ pxor xmm6,xmm10
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ movdqa xmm5,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm12,xmm6
+ paddd xmm3,xmm3
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((64-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm6,xmm10
+ pxor xmm4,XMMWORD[((160-128))+rax]
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm13
+
+ movdqa xmm9,xmm12
+ movdqa XMMWORD[(16-128)+rax],xmm3
+ paddd xmm11,xmm3
+ pxor xmm4,xmm1
+ psrld xmm9,27
+ pxor xmm6,xmm14
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ movdqa xmm5,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm11,xmm6
+ paddd xmm4,xmm4
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((80-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm6,xmm14
+ pxor xmm0,XMMWORD[((176-128))+rax]
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm12
+
+ movdqa xmm9,xmm11
+ movdqa XMMWORD[(32-128)+rax],xmm4
+ paddd xmm10,xmm4
+ pxor xmm0,xmm2
+ psrld xmm9,27
+ pxor xmm6,xmm13
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ movdqa xmm5,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm10,xmm6
+ paddd xmm0,xmm0
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((96-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm6,xmm13
+ pxor xmm1,XMMWORD[((192-128))+rax]
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm11
+
+ movdqa xmm9,xmm10
+ movdqa XMMWORD[(48-128)+rax],xmm0
+ paddd xmm14,xmm0
+ pxor xmm1,xmm3
+ psrld xmm9,27
+ pxor xmm6,xmm12
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ movdqa xmm5,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm14,xmm6
+ paddd xmm1,xmm1
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((112-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm6,xmm12
+ pxor xmm2,XMMWORD[((208-128))+rax]
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm10
+
+ movdqa xmm9,xmm14
+ movdqa XMMWORD[(64-128)+rax],xmm1
+ paddd xmm13,xmm1
+ pxor xmm2,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm11
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ movdqa xmm5,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm13,xmm6
+ paddd xmm2,xmm2
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((128-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm6,xmm11
+ pxor xmm3,XMMWORD[((224-128))+rax]
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm14
+
+ movdqa xmm9,xmm13
+ movdqa XMMWORD[(80-128)+rax],xmm2
+ paddd xmm12,xmm2
+ pxor xmm3,xmm0
+ psrld xmm9,27
+ pxor xmm6,xmm10
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ movdqa xmm5,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm12,xmm6
+ paddd xmm3,xmm3
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((144-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm6,xmm10
+ pxor xmm4,XMMWORD[((240-128))+rax]
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm13
+
+ movdqa xmm9,xmm12
+ movdqa XMMWORD[(96-128)+rax],xmm3
+ paddd xmm11,xmm3
+ pxor xmm4,xmm1
+ psrld xmm9,27
+ pxor xmm6,xmm14
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ movdqa xmm5,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm11,xmm6
+ paddd xmm4,xmm4
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((160-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm6,xmm14
+ pxor xmm0,XMMWORD[((0-128))+rax]
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm12
+
+ movdqa xmm9,xmm11
+ movdqa XMMWORD[(112-128)+rax],xmm4
+ paddd xmm10,xmm4
+ pxor xmm0,xmm2
+ psrld xmm9,27
+ pxor xmm6,xmm13
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ movdqa xmm5,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm10,xmm6
+ paddd xmm0,xmm0
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ movdqa xmm15,XMMWORD[32+rbp]
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((176-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm7,xmm13
+ pxor xmm1,XMMWORD[((16-128))+rax]
+ pxor xmm1,xmm3
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm10
+ pand xmm7,xmm12
+
+ movdqa xmm6,xmm13
+ movdqa xmm5,xmm1
+ psrld xmm9,27
+ paddd xmm14,xmm7
+ pxor xmm6,xmm12
+
+ movdqa XMMWORD[(128-128)+rax],xmm0
+ paddd xmm14,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm11
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ paddd xmm1,xmm1
+ paddd xmm14,xmm6
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((192-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm7,xmm12
+ pxor xmm2,XMMWORD[((32-128))+rax]
+ pxor xmm2,xmm4
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm14
+ pand xmm7,xmm11
+
+ movdqa xmm6,xmm12
+ movdqa xmm5,xmm2
+ psrld xmm9,27
+ paddd xmm13,xmm7
+ pxor xmm6,xmm11
+
+ movdqa XMMWORD[(144-128)+rax],xmm1
+ paddd xmm13,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm10
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ paddd xmm2,xmm2
+ paddd xmm13,xmm6
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((208-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm7,xmm11
+ pxor xmm3,XMMWORD[((48-128))+rax]
+ pxor xmm3,xmm0
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm13
+ pand xmm7,xmm10
+
+ movdqa xmm6,xmm11
+ movdqa xmm5,xmm3
+ psrld xmm9,27
+ paddd xmm12,xmm7
+ pxor xmm6,xmm10
+
+ movdqa XMMWORD[(160-128)+rax],xmm2
+ paddd xmm12,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm14
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ paddd xmm3,xmm3
+ paddd xmm12,xmm6
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((224-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm7,xmm10
+ pxor xmm4,XMMWORD[((64-128))+rax]
+ pxor xmm4,xmm1
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm12
+ pand xmm7,xmm14
+
+ movdqa xmm6,xmm10
+ movdqa xmm5,xmm4
+ psrld xmm9,27
+ paddd xmm11,xmm7
+ pxor xmm6,xmm14
+
+ movdqa XMMWORD[(176-128)+rax],xmm3
+ paddd xmm11,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm13
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ paddd xmm4,xmm4
+ paddd xmm11,xmm6
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((240-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm7,xmm14
+ pxor xmm0,XMMWORD[((80-128))+rax]
+ pxor xmm0,xmm2
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm11
+ pand xmm7,xmm13
+
+ movdqa xmm6,xmm14
+ movdqa xmm5,xmm0
+ psrld xmm9,27
+ paddd xmm10,xmm7
+ pxor xmm6,xmm13
+
+ movdqa XMMWORD[(192-128)+rax],xmm4
+ paddd xmm10,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm12
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ paddd xmm0,xmm0
+ paddd xmm10,xmm6
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((0-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm7,xmm13
+ pxor xmm1,XMMWORD[((96-128))+rax]
+ pxor xmm1,xmm3
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm10
+ pand xmm7,xmm12
+
+ movdqa xmm6,xmm13
+ movdqa xmm5,xmm1
+ psrld xmm9,27
+ paddd xmm14,xmm7
+ pxor xmm6,xmm12
+
+ movdqa XMMWORD[(208-128)+rax],xmm0
+ paddd xmm14,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm11
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ paddd xmm1,xmm1
+ paddd xmm14,xmm6
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((16-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm7,xmm12
+ pxor xmm2,XMMWORD[((112-128))+rax]
+ pxor xmm2,xmm4
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm14
+ pand xmm7,xmm11
+
+ movdqa xmm6,xmm12
+ movdqa xmm5,xmm2
+ psrld xmm9,27
+ paddd xmm13,xmm7
+ pxor xmm6,xmm11
+
+ movdqa XMMWORD[(224-128)+rax],xmm1
+ paddd xmm13,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm10
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ paddd xmm2,xmm2
+ paddd xmm13,xmm6
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((32-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm7,xmm11
+ pxor xmm3,XMMWORD[((128-128))+rax]
+ pxor xmm3,xmm0
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm13
+ pand xmm7,xmm10
+
+ movdqa xmm6,xmm11
+ movdqa xmm5,xmm3
+ psrld xmm9,27
+ paddd xmm12,xmm7
+ pxor xmm6,xmm10
+
+ movdqa XMMWORD[(240-128)+rax],xmm2
+ paddd xmm12,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm14
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ paddd xmm3,xmm3
+ paddd xmm12,xmm6
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((48-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm7,xmm10
+ pxor xmm4,XMMWORD[((144-128))+rax]
+ pxor xmm4,xmm1
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm12
+ pand xmm7,xmm14
+
+ movdqa xmm6,xmm10
+ movdqa xmm5,xmm4
+ psrld xmm9,27
+ paddd xmm11,xmm7
+ pxor xmm6,xmm14
+
+ movdqa XMMWORD[(0-128)+rax],xmm3
+ paddd xmm11,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm13
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ paddd xmm4,xmm4
+ paddd xmm11,xmm6
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((64-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm7,xmm14
+ pxor xmm0,XMMWORD[((160-128))+rax]
+ pxor xmm0,xmm2
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm11
+ pand xmm7,xmm13
+
+ movdqa xmm6,xmm14
+ movdqa xmm5,xmm0
+ psrld xmm9,27
+ paddd xmm10,xmm7
+ pxor xmm6,xmm13
+
+ movdqa XMMWORD[(16-128)+rax],xmm4
+ paddd xmm10,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm12
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ paddd xmm0,xmm0
+ paddd xmm10,xmm6
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((80-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm7,xmm13
+ pxor xmm1,XMMWORD[((176-128))+rax]
+ pxor xmm1,xmm3
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm10
+ pand xmm7,xmm12
+
+ movdqa xmm6,xmm13
+ movdqa xmm5,xmm1
+ psrld xmm9,27
+ paddd xmm14,xmm7
+ pxor xmm6,xmm12
+
+ movdqa XMMWORD[(32-128)+rax],xmm0
+ paddd xmm14,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm11
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ paddd xmm1,xmm1
+ paddd xmm14,xmm6
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((96-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm7,xmm12
+ pxor xmm2,XMMWORD[((192-128))+rax]
+ pxor xmm2,xmm4
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm14
+ pand xmm7,xmm11
+
+ movdqa xmm6,xmm12
+ movdqa xmm5,xmm2
+ psrld xmm9,27
+ paddd xmm13,xmm7
+ pxor xmm6,xmm11
+
+ movdqa XMMWORD[(48-128)+rax],xmm1
+ paddd xmm13,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm10
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ paddd xmm2,xmm2
+ paddd xmm13,xmm6
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((112-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm7,xmm11
+ pxor xmm3,XMMWORD[((208-128))+rax]
+ pxor xmm3,xmm0
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm13
+ pand xmm7,xmm10
+
+ movdqa xmm6,xmm11
+ movdqa xmm5,xmm3
+ psrld xmm9,27
+ paddd xmm12,xmm7
+ pxor xmm6,xmm10
+
+ movdqa XMMWORD[(64-128)+rax],xmm2
+ paddd xmm12,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm14
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ paddd xmm3,xmm3
+ paddd xmm12,xmm6
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((128-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm7,xmm10
+ pxor xmm4,XMMWORD[((224-128))+rax]
+ pxor xmm4,xmm1
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm12
+ pand xmm7,xmm14
+
+ movdqa xmm6,xmm10
+ movdqa xmm5,xmm4
+ psrld xmm9,27
+ paddd xmm11,xmm7
+ pxor xmm6,xmm14
+
+ movdqa XMMWORD[(80-128)+rax],xmm3
+ paddd xmm11,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm13
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ paddd xmm4,xmm4
+ paddd xmm11,xmm6
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((144-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm7,xmm14
+ pxor xmm0,XMMWORD[((240-128))+rax]
+ pxor xmm0,xmm2
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm11
+ pand xmm7,xmm13
+
+ movdqa xmm6,xmm14
+ movdqa xmm5,xmm0
+ psrld xmm9,27
+ paddd xmm10,xmm7
+ pxor xmm6,xmm13
+
+ movdqa XMMWORD[(96-128)+rax],xmm4
+ paddd xmm10,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm12
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ paddd xmm0,xmm0
+ paddd xmm10,xmm6
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((160-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm7,xmm13
+ pxor xmm1,XMMWORD[((0-128))+rax]
+ pxor xmm1,xmm3
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm10
+ pand xmm7,xmm12
+
+ movdqa xmm6,xmm13
+ movdqa xmm5,xmm1
+ psrld xmm9,27
+ paddd xmm14,xmm7
+ pxor xmm6,xmm12
+
+ movdqa XMMWORD[(112-128)+rax],xmm0
+ paddd xmm14,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm11
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ paddd xmm1,xmm1
+ paddd xmm14,xmm6
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((176-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm7,xmm12
+ pxor xmm2,XMMWORD[((16-128))+rax]
+ pxor xmm2,xmm4
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm14
+ pand xmm7,xmm11
+
+ movdqa xmm6,xmm12
+ movdqa xmm5,xmm2
+ psrld xmm9,27
+ paddd xmm13,xmm7
+ pxor xmm6,xmm11
+
+ movdqa XMMWORD[(128-128)+rax],xmm1
+ paddd xmm13,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm10
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ paddd xmm2,xmm2
+ paddd xmm13,xmm6
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((192-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm7,xmm11
+ pxor xmm3,XMMWORD[((32-128))+rax]
+ pxor xmm3,xmm0
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm13
+ pand xmm7,xmm10
+
+ movdqa xmm6,xmm11
+ movdqa xmm5,xmm3
+ psrld xmm9,27
+ paddd xmm12,xmm7
+ pxor xmm6,xmm10
+
+ movdqa XMMWORD[(144-128)+rax],xmm2
+ paddd xmm12,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm14
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ paddd xmm3,xmm3
+ paddd xmm12,xmm6
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((208-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm7,xmm10
+ pxor xmm4,XMMWORD[((48-128))+rax]
+ pxor xmm4,xmm1
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm12
+ pand xmm7,xmm14
+
+ movdqa xmm6,xmm10
+ movdqa xmm5,xmm4
+ psrld xmm9,27
+ paddd xmm11,xmm7
+ pxor xmm6,xmm14
+
+ movdqa XMMWORD[(160-128)+rax],xmm3
+ paddd xmm11,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm13
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ paddd xmm4,xmm4
+ paddd xmm11,xmm6
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((224-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm7,xmm14
+ pxor xmm0,XMMWORD[((64-128))+rax]
+ pxor xmm0,xmm2
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ movdqa xmm9,xmm11
+ pand xmm7,xmm13
+
+ movdqa xmm6,xmm14
+ movdqa xmm5,xmm0
+ psrld xmm9,27
+ paddd xmm10,xmm7
+ pxor xmm6,xmm13
+
+ movdqa XMMWORD[(176-128)+rax],xmm4
+ paddd xmm10,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ pand xmm6,xmm12
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ paddd xmm0,xmm0
+ paddd xmm10,xmm6
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ movdqa xmm15,XMMWORD[64+rbp]
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((240-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm6,xmm13
+ pxor xmm1,XMMWORD[((80-128))+rax]
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm11
+
+ movdqa xmm9,xmm10
+ movdqa XMMWORD[(192-128)+rax],xmm0
+ paddd xmm14,xmm0
+ pxor xmm1,xmm3
+ psrld xmm9,27
+ pxor xmm6,xmm12
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ movdqa xmm5,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm14,xmm6
+ paddd xmm1,xmm1
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((0-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm6,xmm12
+ pxor xmm2,XMMWORD[((96-128))+rax]
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm10
+
+ movdqa xmm9,xmm14
+ movdqa XMMWORD[(208-128)+rax],xmm1
+ paddd xmm13,xmm1
+ pxor xmm2,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm11
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ movdqa xmm5,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm13,xmm6
+ paddd xmm2,xmm2
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((16-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm6,xmm11
+ pxor xmm3,XMMWORD[((112-128))+rax]
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm14
+
+ movdqa xmm9,xmm13
+ movdqa XMMWORD[(224-128)+rax],xmm2
+ paddd xmm12,xmm2
+ pxor xmm3,xmm0
+ psrld xmm9,27
+ pxor xmm6,xmm10
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ movdqa xmm5,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm12,xmm6
+ paddd xmm3,xmm3
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((32-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm6,xmm10
+ pxor xmm4,XMMWORD[((128-128))+rax]
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm13
+
+ movdqa xmm9,xmm12
+ movdqa XMMWORD[(240-128)+rax],xmm3
+ paddd xmm11,xmm3
+ pxor xmm4,xmm1
+ psrld xmm9,27
+ pxor xmm6,xmm14
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ movdqa xmm5,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm11,xmm6
+ paddd xmm4,xmm4
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((48-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm6,xmm14
+ pxor xmm0,XMMWORD[((144-128))+rax]
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm12
+
+ movdqa xmm9,xmm11
+ movdqa XMMWORD[(0-128)+rax],xmm4
+ paddd xmm10,xmm4
+ pxor xmm0,xmm2
+ psrld xmm9,27
+ pxor xmm6,xmm13
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ movdqa xmm5,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm10,xmm6
+ paddd xmm0,xmm0
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((64-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm6,xmm13
+ pxor xmm1,XMMWORD[((160-128))+rax]
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm11
+
+ movdqa xmm9,xmm10
+ movdqa XMMWORD[(16-128)+rax],xmm0
+ paddd xmm14,xmm0
+ pxor xmm1,xmm3
+ psrld xmm9,27
+ pxor xmm6,xmm12
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ movdqa xmm5,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm14,xmm6
+ paddd xmm1,xmm1
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((80-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm6,xmm12
+ pxor xmm2,XMMWORD[((176-128))+rax]
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm10
+
+ movdqa xmm9,xmm14
+ movdqa XMMWORD[(32-128)+rax],xmm1
+ paddd xmm13,xmm1
+ pxor xmm2,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm11
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ movdqa xmm5,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm13,xmm6
+ paddd xmm2,xmm2
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((96-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm6,xmm11
+ pxor xmm3,XMMWORD[((192-128))+rax]
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm14
+
+ movdqa xmm9,xmm13
+ movdqa XMMWORD[(48-128)+rax],xmm2
+ paddd xmm12,xmm2
+ pxor xmm3,xmm0
+ psrld xmm9,27
+ pxor xmm6,xmm10
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ movdqa xmm5,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm12,xmm6
+ paddd xmm3,xmm3
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((112-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm6,xmm10
+ pxor xmm4,XMMWORD[((208-128))+rax]
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm13
+
+ movdqa xmm9,xmm12
+ movdqa XMMWORD[(64-128)+rax],xmm3
+ paddd xmm11,xmm3
+ pxor xmm4,xmm1
+ psrld xmm9,27
+ pxor xmm6,xmm14
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ movdqa xmm5,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm11,xmm6
+ paddd xmm4,xmm4
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((128-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm6,xmm14
+ pxor xmm0,XMMWORD[((224-128))+rax]
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm12
+
+ movdqa xmm9,xmm11
+ movdqa XMMWORD[(80-128)+rax],xmm4
+ paddd xmm10,xmm4
+ pxor xmm0,xmm2
+ psrld xmm9,27
+ pxor xmm6,xmm13
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ movdqa xmm5,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm10,xmm6
+ paddd xmm0,xmm0
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((144-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm6,xmm13
+ pxor xmm1,XMMWORD[((240-128))+rax]
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm11
+
+ movdqa xmm9,xmm10
+ movdqa XMMWORD[(96-128)+rax],xmm0
+ paddd xmm14,xmm0
+ pxor xmm1,xmm3
+ psrld xmm9,27
+ pxor xmm6,xmm12
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ movdqa xmm5,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm14,xmm6
+ paddd xmm1,xmm1
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((160-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm6,xmm12
+ pxor xmm2,XMMWORD[((0-128))+rax]
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm10
+
+ movdqa xmm9,xmm14
+ movdqa XMMWORD[(112-128)+rax],xmm1
+ paddd xmm13,xmm1
+ pxor xmm2,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm11
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ movdqa xmm5,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm13,xmm6
+ paddd xmm2,xmm2
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((176-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm6,xmm11
+ pxor xmm3,XMMWORD[((16-128))+rax]
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm14
+
+ movdqa xmm9,xmm13
+ paddd xmm12,xmm2
+ pxor xmm3,xmm0
+ psrld xmm9,27
+ pxor xmm6,xmm10
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ movdqa xmm5,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm12,xmm6
+ paddd xmm3,xmm3
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((192-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm6,xmm10
+ pxor xmm4,XMMWORD[((32-128))+rax]
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm13
+
+ movdqa xmm9,xmm12
+ paddd xmm11,xmm3
+ pxor xmm4,xmm1
+ psrld xmm9,27
+ pxor xmm6,xmm14
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ movdqa xmm5,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm11,xmm6
+ paddd xmm4,xmm4
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ pxor xmm0,xmm2
+ movdqa xmm2,XMMWORD[((208-128))+rax]
+
+ movdqa xmm8,xmm11
+ movdqa xmm6,xmm14
+ pxor xmm0,XMMWORD[((48-128))+rax]
+ paddd xmm10,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm12
+
+ movdqa xmm9,xmm11
+ paddd xmm10,xmm4
+ pxor xmm0,xmm2
+ psrld xmm9,27
+ pxor xmm6,xmm13
+ movdqa xmm7,xmm12
+
+ pslld xmm7,30
+ movdqa xmm5,xmm0
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm10,xmm6
+ paddd xmm0,xmm0
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm0,xmm5
+ por xmm12,xmm7
+ pxor xmm1,xmm3
+ movdqa xmm3,XMMWORD[((224-128))+rax]
+
+ movdqa xmm8,xmm10
+ movdqa xmm6,xmm13
+ pxor xmm1,XMMWORD[((64-128))+rax]
+ paddd xmm14,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm11
+
+ movdqa xmm9,xmm10
+ paddd xmm14,xmm0
+ pxor xmm1,xmm3
+ psrld xmm9,27
+ pxor xmm6,xmm12
+ movdqa xmm7,xmm11
+
+ pslld xmm7,30
+ movdqa xmm5,xmm1
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm14,xmm6
+ paddd xmm1,xmm1
+
+ psrld xmm11,2
+ paddd xmm14,xmm8
+ por xmm1,xmm5
+ por xmm11,xmm7
+ pxor xmm2,xmm4
+ movdqa xmm4,XMMWORD[((240-128))+rax]
+
+ movdqa xmm8,xmm14
+ movdqa xmm6,xmm12
+ pxor xmm2,XMMWORD[((80-128))+rax]
+ paddd xmm13,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm10
+
+ movdqa xmm9,xmm14
+ paddd xmm13,xmm1
+ pxor xmm2,xmm4
+ psrld xmm9,27
+ pxor xmm6,xmm11
+ movdqa xmm7,xmm10
+
+ pslld xmm7,30
+ movdqa xmm5,xmm2
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm13,xmm6
+ paddd xmm2,xmm2
+
+ psrld xmm10,2
+ paddd xmm13,xmm8
+ por xmm2,xmm5
+ por xmm10,xmm7
+ pxor xmm3,xmm0
+ movdqa xmm0,XMMWORD[((0-128))+rax]
+
+ movdqa xmm8,xmm13
+ movdqa xmm6,xmm11
+ pxor xmm3,XMMWORD[((96-128))+rax]
+ paddd xmm12,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm14
+
+ movdqa xmm9,xmm13
+ paddd xmm12,xmm2
+ pxor xmm3,xmm0
+ psrld xmm9,27
+ pxor xmm6,xmm10
+ movdqa xmm7,xmm14
+
+ pslld xmm7,30
+ movdqa xmm5,xmm3
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm12,xmm6
+ paddd xmm3,xmm3
+
+ psrld xmm14,2
+ paddd xmm12,xmm8
+ por xmm3,xmm5
+ por xmm14,xmm7
+ pxor xmm4,xmm1
+ movdqa xmm1,XMMWORD[((16-128))+rax]
+
+ movdqa xmm8,xmm12
+ movdqa xmm6,xmm10
+ pxor xmm4,XMMWORD[((112-128))+rax]
+ paddd xmm11,xmm15
+ pslld xmm8,5
+ pxor xmm6,xmm13
+
+ movdqa xmm9,xmm12
+ paddd xmm11,xmm3
+ pxor xmm4,xmm1
+ psrld xmm9,27
+ pxor xmm6,xmm14
+ movdqa xmm7,xmm13
+
+ pslld xmm7,30
+ movdqa xmm5,xmm4
+ por xmm8,xmm9
+ psrld xmm5,31
+ paddd xmm11,xmm6
+ paddd xmm4,xmm4
+
+ psrld xmm13,2
+ paddd xmm11,xmm8
+ por xmm4,xmm5
+ por xmm13,xmm7
+ movdqa xmm8,xmm11
+ paddd xmm10,xmm15
+ movdqa xmm6,xmm14
+ pslld xmm8,5
+ pxor xmm6,xmm12
+
+ movdqa xmm9,xmm11
+ paddd xmm10,xmm4
+ psrld xmm9,27
+ movdqa xmm7,xmm12
+ pxor xmm6,xmm13
+
+ pslld xmm7,30
+ por xmm8,xmm9
+ paddd xmm10,xmm6
+
+ psrld xmm12,2
+ paddd xmm10,xmm8
+ por xmm12,xmm7
+ movdqa xmm0,XMMWORD[rbx]
+ mov ecx,1
+ cmp ecx,DWORD[rbx]
+ pxor xmm8,xmm8
+ cmovge r8,rbp
+ cmp ecx,DWORD[4+rbx]
+ movdqa xmm1,xmm0
+ cmovge r9,rbp
+ cmp ecx,DWORD[8+rbx]
+ pcmpgtd xmm1,xmm8
+ cmovge r10,rbp
+ cmp ecx,DWORD[12+rbx]
+ paddd xmm0,xmm1
+ cmovge r11,rbp
+
+ movdqu xmm6,XMMWORD[rdi]
+ pand xmm10,xmm1
+ movdqu xmm7,XMMWORD[32+rdi]
+ pand xmm11,xmm1
+ paddd xmm10,xmm6
+ movdqu xmm8,XMMWORD[64+rdi]
+ pand xmm12,xmm1
+ paddd xmm11,xmm7
+ movdqu xmm9,XMMWORD[96+rdi]
+ pand xmm13,xmm1
+ paddd xmm12,xmm8
+ movdqu xmm5,XMMWORD[128+rdi]
+ pand xmm14,xmm1
+ movdqu XMMWORD[rdi],xmm10
+ paddd xmm13,xmm9
+ movdqu XMMWORD[32+rdi],xmm11
+ paddd xmm14,xmm5
+ movdqu XMMWORD[64+rdi],xmm12
+ movdqu XMMWORD[96+rdi],xmm13
+ movdqu XMMWORD[128+rdi],xmm14
+
+ movdqa XMMWORD[rbx],xmm0
+ movdqa xmm5,XMMWORD[96+rbp]
+ movdqa xmm15,XMMWORD[((-32))+rbp]
+ dec edx
+ jnz NEAR $L$oop
+
+ mov edx,DWORD[280+rsp]
+ lea rdi,[16+rdi]
+ lea rsi,[64+rsi]
+ dec edx
+ jnz NEAR $L$oop_grande
+
+$L$done:
+ mov rax,QWORD[272+rsp]
+
+ movaps xmm6,XMMWORD[((-184))+rax]
+ movaps xmm7,XMMWORD[((-168))+rax]
+ movaps xmm8,XMMWORD[((-152))+rax]
+ movaps xmm9,XMMWORD[((-136))+rax]
+ movaps xmm10,XMMWORD[((-120))+rax]
+ movaps xmm11,XMMWORD[((-104))+rax]
+ movaps xmm12,XMMWORD[((-88))+rax]
+ movaps xmm13,XMMWORD[((-72))+rax]
+ movaps xmm14,XMMWORD[((-56))+rax]
+ movaps xmm15,XMMWORD[((-40))+rax]
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha1_multi_block:
+
+ALIGN 32
+sha1_multi_block_shaext:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha1_multi_block_shaext:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+_shaext_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[(-120)+rax],xmm10
+ movaps XMMWORD[(-104)+rax],xmm11
+ movaps XMMWORD[(-88)+rax],xmm12
+ movaps XMMWORD[(-72)+rax],xmm13
+ movaps XMMWORD[(-56)+rax],xmm14
+ movaps XMMWORD[(-40)+rax],xmm15
+ sub rsp,288
+ shl edx,1
+ and rsp,-256
+ lea rdi,[64+rdi]
+ mov QWORD[272+rsp],rax
+$L$body_shaext:
+ lea rbx,[256+rsp]
+ movdqa xmm3,XMMWORD[((K_XX_XX+128))]
+
+$L$oop_grande_shaext:
+ mov DWORD[280+rsp],edx
+ xor edx,edx
+ mov r8,QWORD[rsi]
+ mov ecx,DWORD[8+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[rbx],ecx
+ cmovle r8,rsp
+ mov r9,QWORD[16+rsi]
+ mov ecx,DWORD[24+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[4+rbx],ecx
+ cmovle r9,rsp
+ test edx,edx
+ jz NEAR $L$done_shaext
+
+ movq xmm0,QWORD[((0-64))+rdi]
+ movq xmm4,QWORD[((32-64))+rdi]
+ movq xmm5,QWORD[((64-64))+rdi]
+ movq xmm6,QWORD[((96-64))+rdi]
+ movq xmm7,QWORD[((128-64))+rdi]
+
+ punpckldq xmm0,xmm4
+ punpckldq xmm5,xmm6
+
+ movdqa xmm8,xmm0
+ punpcklqdq xmm0,xmm5
+ punpckhqdq xmm8,xmm5
+
+ pshufd xmm1,xmm7,63
+ pshufd xmm9,xmm7,127
+ pshufd xmm0,xmm0,27
+ pshufd xmm8,xmm8,27
+ jmp NEAR $L$oop_shaext
+
+ALIGN 32
+$L$oop_shaext:
+ movdqu xmm4,XMMWORD[r8]
+ movdqu xmm11,XMMWORD[r9]
+ movdqu xmm5,XMMWORD[16+r8]
+ movdqu xmm12,XMMWORD[16+r9]
+ movdqu xmm6,XMMWORD[32+r8]
+DB 102,15,56,0,227
+ movdqu xmm13,XMMWORD[32+r9]
+DB 102,68,15,56,0,219
+ movdqu xmm7,XMMWORD[48+r8]
+ lea r8,[64+r8]
+DB 102,15,56,0,235
+ movdqu xmm14,XMMWORD[48+r9]
+ lea r9,[64+r9]
+DB 102,68,15,56,0,227
+
+ movdqa XMMWORD[80+rsp],xmm1
+ paddd xmm1,xmm4
+ movdqa XMMWORD[112+rsp],xmm9
+ paddd xmm9,xmm11
+ movdqa XMMWORD[64+rsp],xmm0
+ movdqa xmm2,xmm0
+ movdqa XMMWORD[96+rsp],xmm8
+ movdqa xmm10,xmm8
+DB 15,58,204,193,0
+DB 15,56,200,213
+DB 69,15,58,204,193,0
+DB 69,15,56,200,212
+DB 102,15,56,0,243
+ prefetcht0 [127+r8]
+DB 15,56,201,229
+DB 102,68,15,56,0,235
+ prefetcht0 [127+r9]
+DB 69,15,56,201,220
+
+DB 102,15,56,0,251
+ movdqa xmm1,xmm0
+DB 102,68,15,56,0,243
+ movdqa xmm9,xmm8
+DB 15,58,204,194,0
+DB 15,56,200,206
+DB 69,15,58,204,194,0
+DB 69,15,56,200,205
+ pxor xmm4,xmm6
+DB 15,56,201,238
+ pxor xmm11,xmm13
+DB 69,15,56,201,229
+ movdqa xmm2,xmm0
+ movdqa xmm10,xmm8
+DB 15,58,204,193,0
+DB 15,56,200,215
+DB 69,15,58,204,193,0
+DB 69,15,56,200,214
+DB 15,56,202,231
+DB 69,15,56,202,222
+ pxor xmm5,xmm7
+DB 15,56,201,247
+ pxor xmm12,xmm14
+DB 69,15,56,201,238
+ movdqa xmm1,xmm0
+ movdqa xmm9,xmm8
+DB 15,58,204,194,0
+DB 15,56,200,204
+DB 69,15,58,204,194,0
+DB 69,15,56,200,203
+DB 15,56,202,236
+DB 69,15,56,202,227
+ pxor xmm6,xmm4
+DB 15,56,201,252
+ pxor xmm13,xmm11
+DB 69,15,56,201,243
+ movdqa xmm2,xmm0
+ movdqa xmm10,xmm8
+DB 15,58,204,193,0
+DB 15,56,200,213
+DB 69,15,58,204,193,0
+DB 69,15,56,200,212
+DB 15,56,202,245
+DB 69,15,56,202,236
+ pxor xmm7,xmm5
+DB 15,56,201,229
+ pxor xmm14,xmm12
+DB 69,15,56,201,220
+ movdqa xmm1,xmm0
+ movdqa xmm9,xmm8
+DB 15,58,204,194,1
+DB 15,56,200,206
+DB 69,15,58,204,194,1
+DB 69,15,56,200,205
+DB 15,56,202,254
+DB 69,15,56,202,245
+ pxor xmm4,xmm6
+DB 15,56,201,238
+ pxor xmm11,xmm13
+DB 69,15,56,201,229
+ movdqa xmm2,xmm0
+ movdqa xmm10,xmm8
+DB 15,58,204,193,1
+DB 15,56,200,215
+DB 69,15,58,204,193,1
+DB 69,15,56,200,214
+DB 15,56,202,231
+DB 69,15,56,202,222
+ pxor xmm5,xmm7
+DB 15,56,201,247
+ pxor xmm12,xmm14
+DB 69,15,56,201,238
+ movdqa xmm1,xmm0
+ movdqa xmm9,xmm8
+DB 15,58,204,194,1
+DB 15,56,200,204
+DB 69,15,58,204,194,1
+DB 69,15,56,200,203
+DB 15,56,202,236
+DB 69,15,56,202,227
+ pxor xmm6,xmm4
+DB 15,56,201,252
+ pxor xmm13,xmm11
+DB 69,15,56,201,243
+ movdqa xmm2,xmm0
+ movdqa xmm10,xmm8
+DB 15,58,204,193,1
+DB 15,56,200,213
+DB 69,15,58,204,193,1
+DB 69,15,56,200,212
+DB 15,56,202,245
+DB 69,15,56,202,236
+ pxor xmm7,xmm5
+DB 15,56,201,229
+ pxor xmm14,xmm12
+DB 69,15,56,201,220
+ movdqa xmm1,xmm0
+ movdqa xmm9,xmm8
+DB 15,58,204,194,1
+DB 15,56,200,206
+DB 69,15,58,204,194,1
+DB 69,15,56,200,205
+DB 15,56,202,254
+DB 69,15,56,202,245
+ pxor xmm4,xmm6
+DB 15,56,201,238
+ pxor xmm11,xmm13
+DB 69,15,56,201,229
+ movdqa xmm2,xmm0
+ movdqa xmm10,xmm8
+DB 15,58,204,193,2
+DB 15,56,200,215
+DB 69,15,58,204,193,2
+DB 69,15,56,200,214
+DB 15,56,202,231
+DB 69,15,56,202,222
+ pxor xmm5,xmm7
+DB 15,56,201,247
+ pxor xmm12,xmm14
+DB 69,15,56,201,238
+ movdqa xmm1,xmm0
+ movdqa xmm9,xmm8
+DB 15,58,204,194,2
+DB 15,56,200,204
+DB 69,15,58,204,194,2
+DB 69,15,56,200,203
+DB 15,56,202,236
+DB 69,15,56,202,227
+ pxor xmm6,xmm4
+DB 15,56,201,252
+ pxor xmm13,xmm11
+DB 69,15,56,201,243
+ movdqa xmm2,xmm0
+ movdqa xmm10,xmm8
+DB 15,58,204,193,2
+DB 15,56,200,213
+DB 69,15,58,204,193,2
+DB 69,15,56,200,212
+DB 15,56,202,245
+DB 69,15,56,202,236
+ pxor xmm7,xmm5
+DB 15,56,201,229
+ pxor xmm14,xmm12
+DB 69,15,56,201,220
+ movdqa xmm1,xmm0
+ movdqa xmm9,xmm8
+DB 15,58,204,194,2
+DB 15,56,200,206
+DB 69,15,58,204,194,2
+DB 69,15,56,200,205
+DB 15,56,202,254
+DB 69,15,56,202,245
+ pxor xmm4,xmm6
+DB 15,56,201,238
+ pxor xmm11,xmm13
+DB 69,15,56,201,229
+ movdqa xmm2,xmm0
+ movdqa xmm10,xmm8
+DB 15,58,204,193,2
+DB 15,56,200,215
+DB 69,15,58,204,193,2
+DB 69,15,56,200,214
+DB 15,56,202,231
+DB 69,15,56,202,222
+ pxor xmm5,xmm7
+DB 15,56,201,247
+ pxor xmm12,xmm14
+DB 69,15,56,201,238
+ movdqa xmm1,xmm0
+ movdqa xmm9,xmm8
+DB 15,58,204,194,3
+DB 15,56,200,204
+DB 69,15,58,204,194,3
+DB 69,15,56,200,203
+DB 15,56,202,236
+DB 69,15,56,202,227
+ pxor xmm6,xmm4
+DB 15,56,201,252
+ pxor xmm13,xmm11
+DB 69,15,56,201,243
+ movdqa xmm2,xmm0
+ movdqa xmm10,xmm8
+DB 15,58,204,193,3
+DB 15,56,200,213
+DB 69,15,58,204,193,3
+DB 69,15,56,200,212
+DB 15,56,202,245
+DB 69,15,56,202,236
+ pxor xmm7,xmm5
+ pxor xmm14,xmm12
+
+ mov ecx,1
+ pxor xmm4,xmm4
+ cmp ecx,DWORD[rbx]
+ cmovge r8,rsp
+
+ movdqa xmm1,xmm0
+ movdqa xmm9,xmm8
+DB 15,58,204,194,3
+DB 15,56,200,206
+DB 69,15,58,204,194,3
+DB 69,15,56,200,205
+DB 15,56,202,254
+DB 69,15,56,202,245
+
+ cmp ecx,DWORD[4+rbx]
+ cmovge r9,rsp
+ movq xmm6,QWORD[rbx]
+
+ movdqa xmm2,xmm0
+ movdqa xmm10,xmm8
+DB 15,58,204,193,3
+DB 15,56,200,215
+DB 69,15,58,204,193,3
+DB 69,15,56,200,214
+
+ pshufd xmm11,xmm6,0x00
+ pshufd xmm12,xmm6,0x55
+ movdqa xmm7,xmm6
+ pcmpgtd xmm11,xmm4
+ pcmpgtd xmm12,xmm4
+
+ movdqa xmm1,xmm0
+ movdqa xmm9,xmm8
+DB 15,58,204,194,3
+DB 15,56,200,204
+DB 69,15,58,204,194,3
+DB 68,15,56,200,204
+
+ pcmpgtd xmm7,xmm4
+ pand xmm0,xmm11
+ pand xmm1,xmm11
+ pand xmm8,xmm12
+ pand xmm9,xmm12
+ paddd xmm6,xmm7
+
+ paddd xmm0,XMMWORD[64+rsp]
+ paddd xmm1,XMMWORD[80+rsp]
+ paddd xmm8,XMMWORD[96+rsp]
+ paddd xmm9,XMMWORD[112+rsp]
+
+ movq QWORD[rbx],xmm6
+ dec edx
+ jnz NEAR $L$oop_shaext
+
+ mov edx,DWORD[280+rsp]
+
+ pshufd xmm0,xmm0,27
+ pshufd xmm8,xmm8,27
+
+ movdqa xmm6,xmm0
+ punpckldq xmm0,xmm8
+ punpckhdq xmm6,xmm8
+ punpckhdq xmm1,xmm9
+ movq QWORD[(0-64)+rdi],xmm0
+ psrldq xmm0,8
+ movq QWORD[(64-64)+rdi],xmm6
+ psrldq xmm6,8
+ movq QWORD[(32-64)+rdi],xmm0
+ psrldq xmm1,8
+ movq QWORD[(96-64)+rdi],xmm6
+ movq QWORD[(128-64)+rdi],xmm1
+
+ lea rdi,[8+rdi]
+ lea rsi,[32+rsi]
+ dec edx
+ jnz NEAR $L$oop_grande_shaext
+
+$L$done_shaext:
+
+ movaps xmm6,XMMWORD[((-184))+rax]
+ movaps xmm7,XMMWORD[((-168))+rax]
+ movaps xmm8,XMMWORD[((-152))+rax]
+ movaps xmm9,XMMWORD[((-136))+rax]
+ movaps xmm10,XMMWORD[((-120))+rax]
+ movaps xmm11,XMMWORD[((-104))+rax]
+ movaps xmm12,XMMWORD[((-88))+rax]
+ movaps xmm13,XMMWORD[((-72))+rax]
+ movaps xmm14,XMMWORD[((-56))+rax]
+ movaps xmm15,XMMWORD[((-40))+rax]
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$epilogue_shaext:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha1_multi_block_shaext:
+
+ALIGN 32
+sha1_multi_block_avx:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha1_multi_block_avx:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+_avx_shortcut:
+ shr rcx,32
+ cmp edx,2
+ jb NEAR $L$avx
+ test ecx,32
+ jnz NEAR _avx2_shortcut
+ jmp NEAR $L$avx
+ALIGN 32
+$L$avx:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[(-120)+rax],xmm10
+ movaps XMMWORD[(-104)+rax],xmm11
+ movaps XMMWORD[(-88)+rax],xmm12
+ movaps XMMWORD[(-72)+rax],xmm13
+ movaps XMMWORD[(-56)+rax],xmm14
+ movaps XMMWORD[(-40)+rax],xmm15
+ sub rsp,288
+ and rsp,-256
+ mov QWORD[272+rsp],rax
+
+$L$body_avx:
+ lea rbp,[K_XX_XX]
+ lea rbx,[256+rsp]
+
+ vzeroupper
+$L$oop_grande_avx:
+ mov DWORD[280+rsp],edx
+ xor edx,edx
+ mov r8,QWORD[rsi]
+ mov ecx,DWORD[8+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[rbx],ecx
+ cmovle r8,rbp
+ mov r9,QWORD[16+rsi]
+ mov ecx,DWORD[24+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[4+rbx],ecx
+ cmovle r9,rbp
+ mov r10,QWORD[32+rsi]
+ mov ecx,DWORD[40+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[8+rbx],ecx
+ cmovle r10,rbp
+ mov r11,QWORD[48+rsi]
+ mov ecx,DWORD[56+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[12+rbx],ecx
+ cmovle r11,rbp
+ test edx,edx
+ jz NEAR $L$done_avx
+
+ vmovdqu xmm10,XMMWORD[rdi]
+ lea rax,[128+rsp]
+ vmovdqu xmm11,XMMWORD[32+rdi]
+ vmovdqu xmm12,XMMWORD[64+rdi]
+ vmovdqu xmm13,XMMWORD[96+rdi]
+ vmovdqu xmm14,XMMWORD[128+rdi]
+ vmovdqu xmm5,XMMWORD[96+rbp]
+ jmp NEAR $L$oop_avx
+
+ALIGN 32
+$L$oop_avx:
+ vmovdqa xmm15,XMMWORD[((-32))+rbp]
+ vmovd xmm0,DWORD[r8]
+ lea r8,[64+r8]
+ vmovd xmm2,DWORD[r9]
+ lea r9,[64+r9]
+ vpinsrd xmm0,xmm0,DWORD[r10],1
+ lea r10,[64+r10]
+ vpinsrd xmm2,xmm2,DWORD[r11],1
+ lea r11,[64+r11]
+ vmovd xmm1,DWORD[((-60))+r8]
+ vpunpckldq xmm0,xmm0,xmm2
+ vmovd xmm9,DWORD[((-60))+r9]
+ vpshufb xmm0,xmm0,xmm5
+ vpinsrd xmm1,xmm1,DWORD[((-60))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-60))+r11],1
+ vpaddd xmm14,xmm14,xmm15
+ vpslld xmm8,xmm10,5
+ vpandn xmm7,xmm11,xmm13
+ vpand xmm6,xmm11,xmm12
+
+ vmovdqa XMMWORD[(0-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpunpckldq xmm1,xmm1,xmm9
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm2,DWORD[((-56))+r8]
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-56))+r9]
+ vpaddd xmm14,xmm14,xmm6
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpshufb xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpinsrd xmm2,xmm2,DWORD[((-56))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-56))+r11],1
+ vpaddd xmm13,xmm13,xmm15
+ vpslld xmm8,xmm14,5
+ vpandn xmm7,xmm10,xmm12
+ vpand xmm6,xmm10,xmm11
+
+ vmovdqa XMMWORD[(16-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpunpckldq xmm2,xmm2,xmm9
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm3,DWORD[((-52))+r8]
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-52))+r9]
+ vpaddd xmm13,xmm13,xmm6
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpshufb xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpinsrd xmm3,xmm3,DWORD[((-52))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-52))+r11],1
+ vpaddd xmm12,xmm12,xmm15
+ vpslld xmm8,xmm13,5
+ vpandn xmm7,xmm14,xmm11
+ vpand xmm6,xmm14,xmm10
+
+ vmovdqa XMMWORD[(32-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpunpckldq xmm3,xmm3,xmm9
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm4,DWORD[((-48))+r8]
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-48))+r9]
+ vpaddd xmm12,xmm12,xmm6
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpshufb xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpinsrd xmm4,xmm4,DWORD[((-48))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-48))+r11],1
+ vpaddd xmm11,xmm11,xmm15
+ vpslld xmm8,xmm12,5
+ vpandn xmm7,xmm13,xmm10
+ vpand xmm6,xmm13,xmm14
+
+ vmovdqa XMMWORD[(48-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpunpckldq xmm4,xmm4,xmm9
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm0,DWORD[((-44))+r8]
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-44))+r9]
+ vpaddd xmm11,xmm11,xmm6
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpshufb xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpinsrd xmm0,xmm0,DWORD[((-44))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-44))+r11],1
+ vpaddd xmm10,xmm10,xmm15
+ vpslld xmm8,xmm11,5
+ vpandn xmm7,xmm12,xmm14
+ vpand xmm6,xmm12,xmm13
+
+ vmovdqa XMMWORD[(64-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpunpckldq xmm0,xmm0,xmm9
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm1,DWORD[((-40))+r8]
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-40))+r9]
+ vpaddd xmm10,xmm10,xmm6
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpshufb xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpinsrd xmm1,xmm1,DWORD[((-40))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-40))+r11],1
+ vpaddd xmm14,xmm14,xmm15
+ vpslld xmm8,xmm10,5
+ vpandn xmm7,xmm11,xmm13
+ vpand xmm6,xmm11,xmm12
+
+ vmovdqa XMMWORD[(80-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpunpckldq xmm1,xmm1,xmm9
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm2,DWORD[((-36))+r8]
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-36))+r9]
+ vpaddd xmm14,xmm14,xmm6
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpshufb xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpinsrd xmm2,xmm2,DWORD[((-36))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-36))+r11],1
+ vpaddd xmm13,xmm13,xmm15
+ vpslld xmm8,xmm14,5
+ vpandn xmm7,xmm10,xmm12
+ vpand xmm6,xmm10,xmm11
+
+ vmovdqa XMMWORD[(96-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpunpckldq xmm2,xmm2,xmm9
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm3,DWORD[((-32))+r8]
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-32))+r9]
+ vpaddd xmm13,xmm13,xmm6
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpshufb xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpinsrd xmm3,xmm3,DWORD[((-32))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-32))+r11],1
+ vpaddd xmm12,xmm12,xmm15
+ vpslld xmm8,xmm13,5
+ vpandn xmm7,xmm14,xmm11
+ vpand xmm6,xmm14,xmm10
+
+ vmovdqa XMMWORD[(112-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpunpckldq xmm3,xmm3,xmm9
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm4,DWORD[((-28))+r8]
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-28))+r9]
+ vpaddd xmm12,xmm12,xmm6
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpshufb xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpinsrd xmm4,xmm4,DWORD[((-28))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-28))+r11],1
+ vpaddd xmm11,xmm11,xmm15
+ vpslld xmm8,xmm12,5
+ vpandn xmm7,xmm13,xmm10
+ vpand xmm6,xmm13,xmm14
+
+ vmovdqa XMMWORD[(128-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpunpckldq xmm4,xmm4,xmm9
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm0,DWORD[((-24))+r8]
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-24))+r9]
+ vpaddd xmm11,xmm11,xmm6
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpshufb xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpinsrd xmm0,xmm0,DWORD[((-24))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-24))+r11],1
+ vpaddd xmm10,xmm10,xmm15
+ vpslld xmm8,xmm11,5
+ vpandn xmm7,xmm12,xmm14
+ vpand xmm6,xmm12,xmm13
+
+ vmovdqa XMMWORD[(144-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpunpckldq xmm0,xmm0,xmm9
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm1,DWORD[((-20))+r8]
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-20))+r9]
+ vpaddd xmm10,xmm10,xmm6
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpshufb xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpinsrd xmm1,xmm1,DWORD[((-20))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-20))+r11],1
+ vpaddd xmm14,xmm14,xmm15
+ vpslld xmm8,xmm10,5
+ vpandn xmm7,xmm11,xmm13
+ vpand xmm6,xmm11,xmm12
+
+ vmovdqa XMMWORD[(160-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpunpckldq xmm1,xmm1,xmm9
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm2,DWORD[((-16))+r8]
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-16))+r9]
+ vpaddd xmm14,xmm14,xmm6
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpshufb xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpinsrd xmm2,xmm2,DWORD[((-16))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-16))+r11],1
+ vpaddd xmm13,xmm13,xmm15
+ vpslld xmm8,xmm14,5
+ vpandn xmm7,xmm10,xmm12
+ vpand xmm6,xmm10,xmm11
+
+ vmovdqa XMMWORD[(176-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpunpckldq xmm2,xmm2,xmm9
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm3,DWORD[((-12))+r8]
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-12))+r9]
+ vpaddd xmm13,xmm13,xmm6
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpshufb xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpinsrd xmm3,xmm3,DWORD[((-12))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-12))+r11],1
+ vpaddd xmm12,xmm12,xmm15
+ vpslld xmm8,xmm13,5
+ vpandn xmm7,xmm14,xmm11
+ vpand xmm6,xmm14,xmm10
+
+ vmovdqa XMMWORD[(192-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpunpckldq xmm3,xmm3,xmm9
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm4,DWORD[((-8))+r8]
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-8))+r9]
+ vpaddd xmm12,xmm12,xmm6
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpshufb xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpinsrd xmm4,xmm4,DWORD[((-8))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-8))+r11],1
+ vpaddd xmm11,xmm11,xmm15
+ vpslld xmm8,xmm12,5
+ vpandn xmm7,xmm13,xmm10
+ vpand xmm6,xmm13,xmm14
+
+ vmovdqa XMMWORD[(208-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpunpckldq xmm4,xmm4,xmm9
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm7
+ vmovd xmm0,DWORD[((-4))+r8]
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vmovd xmm9,DWORD[((-4))+r9]
+ vpaddd xmm11,xmm11,xmm6
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpshufb xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vmovdqa xmm1,XMMWORD[((0-128))+rax]
+ vpinsrd xmm0,xmm0,DWORD[((-4))+r10],1
+ vpinsrd xmm9,xmm9,DWORD[((-4))+r11],1
+ vpaddd xmm10,xmm10,xmm15
+ prefetcht0 [63+r8]
+ vpslld xmm8,xmm11,5
+ vpandn xmm7,xmm12,xmm14
+ vpand xmm6,xmm12,xmm13
+
+ vmovdqa XMMWORD[(224-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpunpckldq xmm0,xmm0,xmm9
+ vpsrld xmm9,xmm11,27
+ prefetcht0 [63+r9]
+ vpxor xmm6,xmm6,xmm7
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ prefetcht0 [63+r10]
+ vpaddd xmm10,xmm10,xmm6
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ prefetcht0 [63+r11]
+ vpshufb xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vmovdqa xmm2,XMMWORD[((16-128))+rax]
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((32-128))+rax]
+
+ vpaddd xmm14,xmm14,xmm15
+ vpslld xmm8,xmm10,5
+ vpandn xmm7,xmm11,xmm13
+
+ vpand xmm6,xmm11,xmm12
+
+ vmovdqa XMMWORD[(240-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpxor xmm1,xmm1,XMMWORD[((128-128))+rax]
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm7
+ vpxor xmm1,xmm1,xmm3
+
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm14,xmm14,xmm6
+
+ vpsrld xmm5,xmm1,31
+ vpaddd xmm1,xmm1,xmm1
+
+ vpsrld xmm11,xmm11,2
+
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((48-128))+rax]
+
+ vpaddd xmm13,xmm13,xmm15
+ vpslld xmm8,xmm14,5
+ vpandn xmm7,xmm10,xmm12
+
+ vpand xmm6,xmm10,xmm11
+
+ vmovdqa XMMWORD[(0-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpxor xmm2,xmm2,XMMWORD[((144-128))+rax]
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm7
+ vpxor xmm2,xmm2,xmm4
+
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm13,xmm13,xmm6
+
+ vpsrld xmm5,xmm2,31
+ vpaddd xmm2,xmm2,xmm2
+
+ vpsrld xmm10,xmm10,2
+
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((64-128))+rax]
+
+ vpaddd xmm12,xmm12,xmm15
+ vpslld xmm8,xmm13,5
+ vpandn xmm7,xmm14,xmm11
+
+ vpand xmm6,xmm14,xmm10
+
+ vmovdqa XMMWORD[(16-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpxor xmm3,xmm3,XMMWORD[((160-128))+rax]
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm7
+ vpxor xmm3,xmm3,xmm0
+
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm12,xmm12,xmm6
+
+ vpsrld xmm5,xmm3,31
+ vpaddd xmm3,xmm3,xmm3
+
+ vpsrld xmm14,xmm14,2
+
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((80-128))+rax]
+
+ vpaddd xmm11,xmm11,xmm15
+ vpslld xmm8,xmm12,5
+ vpandn xmm7,xmm13,xmm10
+
+ vpand xmm6,xmm13,xmm14
+
+ vmovdqa XMMWORD[(32-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpxor xmm4,xmm4,XMMWORD[((176-128))+rax]
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm7
+ vpxor xmm4,xmm4,xmm1
+
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm11,xmm11,xmm6
+
+ vpsrld xmm5,xmm4,31
+ vpaddd xmm4,xmm4,xmm4
+
+ vpsrld xmm13,xmm13,2
+
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((96-128))+rax]
+
+ vpaddd xmm10,xmm10,xmm15
+ vpslld xmm8,xmm11,5
+ vpandn xmm7,xmm12,xmm14
+
+ vpand xmm6,xmm12,xmm13
+
+ vmovdqa XMMWORD[(48-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpxor xmm0,xmm0,XMMWORD[((192-128))+rax]
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm7
+ vpxor xmm0,xmm0,xmm2
+
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm10,xmm10,xmm6
+
+ vpsrld xmm5,xmm0,31
+ vpaddd xmm0,xmm0,xmm0
+
+ vpsrld xmm12,xmm12,2
+
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vmovdqa xmm15,XMMWORD[rbp]
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((112-128))+rax]
+
+ vpslld xmm8,xmm10,5
+ vpaddd xmm14,xmm14,xmm15
+ vpxor xmm6,xmm13,xmm11
+ vmovdqa XMMWORD[(64-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpxor xmm1,xmm1,XMMWORD[((208-128))+rax]
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm14,xmm14,xmm6
+ vpsrld xmm5,xmm1,31
+ vpaddd xmm1,xmm1,xmm1
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((128-128))+rax]
+
+ vpslld xmm8,xmm14,5
+ vpaddd xmm13,xmm13,xmm15
+ vpxor xmm6,xmm12,xmm10
+ vmovdqa XMMWORD[(80-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpxor xmm2,xmm2,XMMWORD[((224-128))+rax]
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm13,xmm13,xmm6
+ vpsrld xmm5,xmm2,31
+ vpaddd xmm2,xmm2,xmm2
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((144-128))+rax]
+
+ vpslld xmm8,xmm13,5
+ vpaddd xmm12,xmm12,xmm15
+ vpxor xmm6,xmm11,xmm14
+ vmovdqa XMMWORD[(96-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpxor xmm3,xmm3,XMMWORD[((240-128))+rax]
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm12,xmm12,xmm6
+ vpsrld xmm5,xmm3,31
+ vpaddd xmm3,xmm3,xmm3
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((160-128))+rax]
+
+ vpslld xmm8,xmm12,5
+ vpaddd xmm11,xmm11,xmm15
+ vpxor xmm6,xmm10,xmm13
+ vmovdqa XMMWORD[(112-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpxor xmm4,xmm4,XMMWORD[((0-128))+rax]
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm11,xmm11,xmm6
+ vpsrld xmm5,xmm4,31
+ vpaddd xmm4,xmm4,xmm4
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((176-128))+rax]
+
+ vpslld xmm8,xmm11,5
+ vpaddd xmm10,xmm10,xmm15
+ vpxor xmm6,xmm14,xmm12
+ vmovdqa XMMWORD[(128-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpxor xmm0,xmm0,XMMWORD[((16-128))+rax]
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm10,xmm10,xmm6
+ vpsrld xmm5,xmm0,31
+ vpaddd xmm0,xmm0,xmm0
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((192-128))+rax]
+
+ vpslld xmm8,xmm10,5
+ vpaddd xmm14,xmm14,xmm15
+ vpxor xmm6,xmm13,xmm11
+ vmovdqa XMMWORD[(144-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpxor xmm1,xmm1,XMMWORD[((32-128))+rax]
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm14,xmm14,xmm6
+ vpsrld xmm5,xmm1,31
+ vpaddd xmm1,xmm1,xmm1
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((208-128))+rax]
+
+ vpslld xmm8,xmm14,5
+ vpaddd xmm13,xmm13,xmm15
+ vpxor xmm6,xmm12,xmm10
+ vmovdqa XMMWORD[(160-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpxor xmm2,xmm2,XMMWORD[((48-128))+rax]
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm13,xmm13,xmm6
+ vpsrld xmm5,xmm2,31
+ vpaddd xmm2,xmm2,xmm2
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((224-128))+rax]
+
+ vpslld xmm8,xmm13,5
+ vpaddd xmm12,xmm12,xmm15
+ vpxor xmm6,xmm11,xmm14
+ vmovdqa XMMWORD[(176-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpxor xmm3,xmm3,XMMWORD[((64-128))+rax]
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm12,xmm12,xmm6
+ vpsrld xmm5,xmm3,31
+ vpaddd xmm3,xmm3,xmm3
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((240-128))+rax]
+
+ vpslld xmm8,xmm12,5
+ vpaddd xmm11,xmm11,xmm15
+ vpxor xmm6,xmm10,xmm13
+ vmovdqa XMMWORD[(192-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpxor xmm4,xmm4,XMMWORD[((80-128))+rax]
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm11,xmm11,xmm6
+ vpsrld xmm5,xmm4,31
+ vpaddd xmm4,xmm4,xmm4
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((0-128))+rax]
+
+ vpslld xmm8,xmm11,5
+ vpaddd xmm10,xmm10,xmm15
+ vpxor xmm6,xmm14,xmm12
+ vmovdqa XMMWORD[(208-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpxor xmm0,xmm0,XMMWORD[((96-128))+rax]
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm10,xmm10,xmm6
+ vpsrld xmm5,xmm0,31
+ vpaddd xmm0,xmm0,xmm0
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((16-128))+rax]
+
+ vpslld xmm8,xmm10,5
+ vpaddd xmm14,xmm14,xmm15
+ vpxor xmm6,xmm13,xmm11
+ vmovdqa XMMWORD[(224-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpxor xmm1,xmm1,XMMWORD[((112-128))+rax]
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm14,xmm14,xmm6
+ vpsrld xmm5,xmm1,31
+ vpaddd xmm1,xmm1,xmm1
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((32-128))+rax]
+
+ vpslld xmm8,xmm14,5
+ vpaddd xmm13,xmm13,xmm15
+ vpxor xmm6,xmm12,xmm10
+ vmovdqa XMMWORD[(240-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpxor xmm2,xmm2,XMMWORD[((128-128))+rax]
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm13,xmm13,xmm6
+ vpsrld xmm5,xmm2,31
+ vpaddd xmm2,xmm2,xmm2
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((48-128))+rax]
+
+ vpslld xmm8,xmm13,5
+ vpaddd xmm12,xmm12,xmm15
+ vpxor xmm6,xmm11,xmm14
+ vmovdqa XMMWORD[(0-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpxor xmm3,xmm3,XMMWORD[((144-128))+rax]
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm12,xmm12,xmm6
+ vpsrld xmm5,xmm3,31
+ vpaddd xmm3,xmm3,xmm3
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((64-128))+rax]
+
+ vpslld xmm8,xmm12,5
+ vpaddd xmm11,xmm11,xmm15
+ vpxor xmm6,xmm10,xmm13
+ vmovdqa XMMWORD[(16-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpxor xmm4,xmm4,XMMWORD[((160-128))+rax]
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm11,xmm11,xmm6
+ vpsrld xmm5,xmm4,31
+ vpaddd xmm4,xmm4,xmm4
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((80-128))+rax]
+
+ vpslld xmm8,xmm11,5
+ vpaddd xmm10,xmm10,xmm15
+ vpxor xmm6,xmm14,xmm12
+ vmovdqa XMMWORD[(32-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpxor xmm0,xmm0,XMMWORD[((176-128))+rax]
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm10,xmm10,xmm6
+ vpsrld xmm5,xmm0,31
+ vpaddd xmm0,xmm0,xmm0
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((96-128))+rax]
+
+ vpslld xmm8,xmm10,5
+ vpaddd xmm14,xmm14,xmm15
+ vpxor xmm6,xmm13,xmm11
+ vmovdqa XMMWORD[(48-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpxor xmm1,xmm1,XMMWORD[((192-128))+rax]
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm14,xmm14,xmm6
+ vpsrld xmm5,xmm1,31
+ vpaddd xmm1,xmm1,xmm1
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((112-128))+rax]
+
+ vpslld xmm8,xmm14,5
+ vpaddd xmm13,xmm13,xmm15
+ vpxor xmm6,xmm12,xmm10
+ vmovdqa XMMWORD[(64-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpxor xmm2,xmm2,XMMWORD[((208-128))+rax]
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm13,xmm13,xmm6
+ vpsrld xmm5,xmm2,31
+ vpaddd xmm2,xmm2,xmm2
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((128-128))+rax]
+
+ vpslld xmm8,xmm13,5
+ vpaddd xmm12,xmm12,xmm15
+ vpxor xmm6,xmm11,xmm14
+ vmovdqa XMMWORD[(80-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpxor xmm3,xmm3,XMMWORD[((224-128))+rax]
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm12,xmm12,xmm6
+ vpsrld xmm5,xmm3,31
+ vpaddd xmm3,xmm3,xmm3
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((144-128))+rax]
+
+ vpslld xmm8,xmm12,5
+ vpaddd xmm11,xmm11,xmm15
+ vpxor xmm6,xmm10,xmm13
+ vmovdqa XMMWORD[(96-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpxor xmm4,xmm4,XMMWORD[((240-128))+rax]
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm11,xmm11,xmm6
+ vpsrld xmm5,xmm4,31
+ vpaddd xmm4,xmm4,xmm4
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((160-128))+rax]
+
+ vpslld xmm8,xmm11,5
+ vpaddd xmm10,xmm10,xmm15
+ vpxor xmm6,xmm14,xmm12
+ vmovdqa XMMWORD[(112-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpxor xmm0,xmm0,XMMWORD[((0-128))+rax]
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm10,xmm10,xmm6
+ vpsrld xmm5,xmm0,31
+ vpaddd xmm0,xmm0,xmm0
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vmovdqa xmm15,XMMWORD[32+rbp]
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((176-128))+rax]
+
+ vpaddd xmm14,xmm14,xmm15
+ vpslld xmm8,xmm10,5
+ vpand xmm7,xmm13,xmm12
+ vpxor xmm1,xmm1,XMMWORD[((16-128))+rax]
+
+ vpaddd xmm14,xmm14,xmm7
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm13,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vmovdqu XMMWORD[(128-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm1,31
+ vpand xmm6,xmm6,xmm11
+ vpaddd xmm1,xmm1,xmm1
+
+ vpslld xmm7,xmm11,30
+ vpaddd xmm14,xmm14,xmm6
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((192-128))+rax]
+
+ vpaddd xmm13,xmm13,xmm15
+ vpslld xmm8,xmm14,5
+ vpand xmm7,xmm12,xmm11
+ vpxor xmm2,xmm2,XMMWORD[((32-128))+rax]
+
+ vpaddd xmm13,xmm13,xmm7
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm12,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vmovdqu XMMWORD[(144-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm2,31
+ vpand xmm6,xmm6,xmm10
+ vpaddd xmm2,xmm2,xmm2
+
+ vpslld xmm7,xmm10,30
+ vpaddd xmm13,xmm13,xmm6
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((208-128))+rax]
+
+ vpaddd xmm12,xmm12,xmm15
+ vpslld xmm8,xmm13,5
+ vpand xmm7,xmm11,xmm10
+ vpxor xmm3,xmm3,XMMWORD[((48-128))+rax]
+
+ vpaddd xmm12,xmm12,xmm7
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm11,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vmovdqu XMMWORD[(160-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm3,31
+ vpand xmm6,xmm6,xmm14
+ vpaddd xmm3,xmm3,xmm3
+
+ vpslld xmm7,xmm14,30
+ vpaddd xmm12,xmm12,xmm6
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((224-128))+rax]
+
+ vpaddd xmm11,xmm11,xmm15
+ vpslld xmm8,xmm12,5
+ vpand xmm7,xmm10,xmm14
+ vpxor xmm4,xmm4,XMMWORD[((64-128))+rax]
+
+ vpaddd xmm11,xmm11,xmm7
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm10,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vmovdqu XMMWORD[(176-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm4,31
+ vpand xmm6,xmm6,xmm13
+ vpaddd xmm4,xmm4,xmm4
+
+ vpslld xmm7,xmm13,30
+ vpaddd xmm11,xmm11,xmm6
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((240-128))+rax]
+
+ vpaddd xmm10,xmm10,xmm15
+ vpslld xmm8,xmm11,5
+ vpand xmm7,xmm14,xmm13
+ vpxor xmm0,xmm0,XMMWORD[((80-128))+rax]
+
+ vpaddd xmm10,xmm10,xmm7
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm14,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vmovdqu XMMWORD[(192-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm0,31
+ vpand xmm6,xmm6,xmm12
+ vpaddd xmm0,xmm0,xmm0
+
+ vpslld xmm7,xmm12,30
+ vpaddd xmm10,xmm10,xmm6
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((0-128))+rax]
+
+ vpaddd xmm14,xmm14,xmm15
+ vpslld xmm8,xmm10,5
+ vpand xmm7,xmm13,xmm12
+ vpxor xmm1,xmm1,XMMWORD[((96-128))+rax]
+
+ vpaddd xmm14,xmm14,xmm7
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm13,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vmovdqu XMMWORD[(208-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm1,31
+ vpand xmm6,xmm6,xmm11
+ vpaddd xmm1,xmm1,xmm1
+
+ vpslld xmm7,xmm11,30
+ vpaddd xmm14,xmm14,xmm6
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((16-128))+rax]
+
+ vpaddd xmm13,xmm13,xmm15
+ vpslld xmm8,xmm14,5
+ vpand xmm7,xmm12,xmm11
+ vpxor xmm2,xmm2,XMMWORD[((112-128))+rax]
+
+ vpaddd xmm13,xmm13,xmm7
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm12,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vmovdqu XMMWORD[(224-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm2,31
+ vpand xmm6,xmm6,xmm10
+ vpaddd xmm2,xmm2,xmm2
+
+ vpslld xmm7,xmm10,30
+ vpaddd xmm13,xmm13,xmm6
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((32-128))+rax]
+
+ vpaddd xmm12,xmm12,xmm15
+ vpslld xmm8,xmm13,5
+ vpand xmm7,xmm11,xmm10
+ vpxor xmm3,xmm3,XMMWORD[((128-128))+rax]
+
+ vpaddd xmm12,xmm12,xmm7
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm11,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vmovdqu XMMWORD[(240-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm3,31
+ vpand xmm6,xmm6,xmm14
+ vpaddd xmm3,xmm3,xmm3
+
+ vpslld xmm7,xmm14,30
+ vpaddd xmm12,xmm12,xmm6
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((48-128))+rax]
+
+ vpaddd xmm11,xmm11,xmm15
+ vpslld xmm8,xmm12,5
+ vpand xmm7,xmm10,xmm14
+ vpxor xmm4,xmm4,XMMWORD[((144-128))+rax]
+
+ vpaddd xmm11,xmm11,xmm7
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm10,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vmovdqu XMMWORD[(0-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm4,31
+ vpand xmm6,xmm6,xmm13
+ vpaddd xmm4,xmm4,xmm4
+
+ vpslld xmm7,xmm13,30
+ vpaddd xmm11,xmm11,xmm6
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((64-128))+rax]
+
+ vpaddd xmm10,xmm10,xmm15
+ vpslld xmm8,xmm11,5
+ vpand xmm7,xmm14,xmm13
+ vpxor xmm0,xmm0,XMMWORD[((160-128))+rax]
+
+ vpaddd xmm10,xmm10,xmm7
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm14,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vmovdqu XMMWORD[(16-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm0,31
+ vpand xmm6,xmm6,xmm12
+ vpaddd xmm0,xmm0,xmm0
+
+ vpslld xmm7,xmm12,30
+ vpaddd xmm10,xmm10,xmm6
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((80-128))+rax]
+
+ vpaddd xmm14,xmm14,xmm15
+ vpslld xmm8,xmm10,5
+ vpand xmm7,xmm13,xmm12
+ vpxor xmm1,xmm1,XMMWORD[((176-128))+rax]
+
+ vpaddd xmm14,xmm14,xmm7
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm13,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vmovdqu XMMWORD[(32-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm1,31
+ vpand xmm6,xmm6,xmm11
+ vpaddd xmm1,xmm1,xmm1
+
+ vpslld xmm7,xmm11,30
+ vpaddd xmm14,xmm14,xmm6
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((96-128))+rax]
+
+ vpaddd xmm13,xmm13,xmm15
+ vpslld xmm8,xmm14,5
+ vpand xmm7,xmm12,xmm11
+ vpxor xmm2,xmm2,XMMWORD[((192-128))+rax]
+
+ vpaddd xmm13,xmm13,xmm7
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm12,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vmovdqu XMMWORD[(48-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm2,31
+ vpand xmm6,xmm6,xmm10
+ vpaddd xmm2,xmm2,xmm2
+
+ vpslld xmm7,xmm10,30
+ vpaddd xmm13,xmm13,xmm6
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((112-128))+rax]
+
+ vpaddd xmm12,xmm12,xmm15
+ vpslld xmm8,xmm13,5
+ vpand xmm7,xmm11,xmm10
+ vpxor xmm3,xmm3,XMMWORD[((208-128))+rax]
+
+ vpaddd xmm12,xmm12,xmm7
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm11,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vmovdqu XMMWORD[(64-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm3,31
+ vpand xmm6,xmm6,xmm14
+ vpaddd xmm3,xmm3,xmm3
+
+ vpslld xmm7,xmm14,30
+ vpaddd xmm12,xmm12,xmm6
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((128-128))+rax]
+
+ vpaddd xmm11,xmm11,xmm15
+ vpslld xmm8,xmm12,5
+ vpand xmm7,xmm10,xmm14
+ vpxor xmm4,xmm4,XMMWORD[((224-128))+rax]
+
+ vpaddd xmm11,xmm11,xmm7
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm10,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vmovdqu XMMWORD[(80-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm4,31
+ vpand xmm6,xmm6,xmm13
+ vpaddd xmm4,xmm4,xmm4
+
+ vpslld xmm7,xmm13,30
+ vpaddd xmm11,xmm11,xmm6
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((144-128))+rax]
+
+ vpaddd xmm10,xmm10,xmm15
+ vpslld xmm8,xmm11,5
+ vpand xmm7,xmm14,xmm13
+ vpxor xmm0,xmm0,XMMWORD[((240-128))+rax]
+
+ vpaddd xmm10,xmm10,xmm7
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm14,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vmovdqu XMMWORD[(96-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm0,31
+ vpand xmm6,xmm6,xmm12
+ vpaddd xmm0,xmm0,xmm0
+
+ vpslld xmm7,xmm12,30
+ vpaddd xmm10,xmm10,xmm6
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((160-128))+rax]
+
+ vpaddd xmm14,xmm14,xmm15
+ vpslld xmm8,xmm10,5
+ vpand xmm7,xmm13,xmm12
+ vpxor xmm1,xmm1,XMMWORD[((0-128))+rax]
+
+ vpaddd xmm14,xmm14,xmm7
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm13,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vmovdqu XMMWORD[(112-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm1,31
+ vpand xmm6,xmm6,xmm11
+ vpaddd xmm1,xmm1,xmm1
+
+ vpslld xmm7,xmm11,30
+ vpaddd xmm14,xmm14,xmm6
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((176-128))+rax]
+
+ vpaddd xmm13,xmm13,xmm15
+ vpslld xmm8,xmm14,5
+ vpand xmm7,xmm12,xmm11
+ vpxor xmm2,xmm2,XMMWORD[((16-128))+rax]
+
+ vpaddd xmm13,xmm13,xmm7
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm12,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vmovdqu XMMWORD[(128-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm2,31
+ vpand xmm6,xmm6,xmm10
+ vpaddd xmm2,xmm2,xmm2
+
+ vpslld xmm7,xmm10,30
+ vpaddd xmm13,xmm13,xmm6
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((192-128))+rax]
+
+ vpaddd xmm12,xmm12,xmm15
+ vpslld xmm8,xmm13,5
+ vpand xmm7,xmm11,xmm10
+ vpxor xmm3,xmm3,XMMWORD[((32-128))+rax]
+
+ vpaddd xmm12,xmm12,xmm7
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm11,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vmovdqu XMMWORD[(144-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm3,31
+ vpand xmm6,xmm6,xmm14
+ vpaddd xmm3,xmm3,xmm3
+
+ vpslld xmm7,xmm14,30
+ vpaddd xmm12,xmm12,xmm6
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((208-128))+rax]
+
+ vpaddd xmm11,xmm11,xmm15
+ vpslld xmm8,xmm12,5
+ vpand xmm7,xmm10,xmm14
+ vpxor xmm4,xmm4,XMMWORD[((48-128))+rax]
+
+ vpaddd xmm11,xmm11,xmm7
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm10,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vmovdqu XMMWORD[(160-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm4,31
+ vpand xmm6,xmm6,xmm13
+ vpaddd xmm4,xmm4,xmm4
+
+ vpslld xmm7,xmm13,30
+ vpaddd xmm11,xmm11,xmm6
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((224-128))+rax]
+
+ vpaddd xmm10,xmm10,xmm15
+ vpslld xmm8,xmm11,5
+ vpand xmm7,xmm14,xmm13
+ vpxor xmm0,xmm0,XMMWORD[((64-128))+rax]
+
+ vpaddd xmm10,xmm10,xmm7
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm14,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vmovdqu XMMWORD[(176-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpor xmm8,xmm8,xmm9
+ vpsrld xmm5,xmm0,31
+ vpand xmm6,xmm6,xmm12
+ vpaddd xmm0,xmm0,xmm0
+
+ vpslld xmm7,xmm12,30
+ vpaddd xmm10,xmm10,xmm6
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vmovdqa xmm15,XMMWORD[64+rbp]
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((240-128))+rax]
+
+ vpslld xmm8,xmm10,5
+ vpaddd xmm14,xmm14,xmm15
+ vpxor xmm6,xmm13,xmm11
+ vmovdqa XMMWORD[(192-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpxor xmm1,xmm1,XMMWORD[((80-128))+rax]
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm14,xmm14,xmm6
+ vpsrld xmm5,xmm1,31
+ vpaddd xmm1,xmm1,xmm1
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((0-128))+rax]
+
+ vpslld xmm8,xmm14,5
+ vpaddd xmm13,xmm13,xmm15
+ vpxor xmm6,xmm12,xmm10
+ vmovdqa XMMWORD[(208-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpxor xmm2,xmm2,XMMWORD[((96-128))+rax]
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm13,xmm13,xmm6
+ vpsrld xmm5,xmm2,31
+ vpaddd xmm2,xmm2,xmm2
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((16-128))+rax]
+
+ vpslld xmm8,xmm13,5
+ vpaddd xmm12,xmm12,xmm15
+ vpxor xmm6,xmm11,xmm14
+ vmovdqa XMMWORD[(224-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpxor xmm3,xmm3,XMMWORD[((112-128))+rax]
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm12,xmm12,xmm6
+ vpsrld xmm5,xmm3,31
+ vpaddd xmm3,xmm3,xmm3
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((32-128))+rax]
+
+ vpslld xmm8,xmm12,5
+ vpaddd xmm11,xmm11,xmm15
+ vpxor xmm6,xmm10,xmm13
+ vmovdqa XMMWORD[(240-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpxor xmm4,xmm4,XMMWORD[((128-128))+rax]
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm11,xmm11,xmm6
+ vpsrld xmm5,xmm4,31
+ vpaddd xmm4,xmm4,xmm4
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((48-128))+rax]
+
+ vpslld xmm8,xmm11,5
+ vpaddd xmm10,xmm10,xmm15
+ vpxor xmm6,xmm14,xmm12
+ vmovdqa XMMWORD[(0-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpxor xmm0,xmm0,XMMWORD[((144-128))+rax]
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm10,xmm10,xmm6
+ vpsrld xmm5,xmm0,31
+ vpaddd xmm0,xmm0,xmm0
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((64-128))+rax]
+
+ vpslld xmm8,xmm10,5
+ vpaddd xmm14,xmm14,xmm15
+ vpxor xmm6,xmm13,xmm11
+ vmovdqa XMMWORD[(16-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpxor xmm1,xmm1,XMMWORD[((160-128))+rax]
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm14,xmm14,xmm6
+ vpsrld xmm5,xmm1,31
+ vpaddd xmm1,xmm1,xmm1
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((80-128))+rax]
+
+ vpslld xmm8,xmm14,5
+ vpaddd xmm13,xmm13,xmm15
+ vpxor xmm6,xmm12,xmm10
+ vmovdqa XMMWORD[(32-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpxor xmm2,xmm2,XMMWORD[((176-128))+rax]
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm13,xmm13,xmm6
+ vpsrld xmm5,xmm2,31
+ vpaddd xmm2,xmm2,xmm2
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((96-128))+rax]
+
+ vpslld xmm8,xmm13,5
+ vpaddd xmm12,xmm12,xmm15
+ vpxor xmm6,xmm11,xmm14
+ vmovdqa XMMWORD[(48-128)+rax],xmm2
+ vpaddd xmm12,xmm12,xmm2
+ vpxor xmm3,xmm3,XMMWORD[((192-128))+rax]
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm12,xmm12,xmm6
+ vpsrld xmm5,xmm3,31
+ vpaddd xmm3,xmm3,xmm3
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((112-128))+rax]
+
+ vpslld xmm8,xmm12,5
+ vpaddd xmm11,xmm11,xmm15
+ vpxor xmm6,xmm10,xmm13
+ vmovdqa XMMWORD[(64-128)+rax],xmm3
+ vpaddd xmm11,xmm11,xmm3
+ vpxor xmm4,xmm4,XMMWORD[((208-128))+rax]
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm11,xmm11,xmm6
+ vpsrld xmm5,xmm4,31
+ vpaddd xmm4,xmm4,xmm4
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((128-128))+rax]
+
+ vpslld xmm8,xmm11,5
+ vpaddd xmm10,xmm10,xmm15
+ vpxor xmm6,xmm14,xmm12
+ vmovdqa XMMWORD[(80-128)+rax],xmm4
+ vpaddd xmm10,xmm10,xmm4
+ vpxor xmm0,xmm0,XMMWORD[((224-128))+rax]
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm10,xmm10,xmm6
+ vpsrld xmm5,xmm0,31
+ vpaddd xmm0,xmm0,xmm0
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((144-128))+rax]
+
+ vpslld xmm8,xmm10,5
+ vpaddd xmm14,xmm14,xmm15
+ vpxor xmm6,xmm13,xmm11
+ vmovdqa XMMWORD[(96-128)+rax],xmm0
+ vpaddd xmm14,xmm14,xmm0
+ vpxor xmm1,xmm1,XMMWORD[((240-128))+rax]
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm14,xmm14,xmm6
+ vpsrld xmm5,xmm1,31
+ vpaddd xmm1,xmm1,xmm1
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((160-128))+rax]
+
+ vpslld xmm8,xmm14,5
+ vpaddd xmm13,xmm13,xmm15
+ vpxor xmm6,xmm12,xmm10
+ vmovdqa XMMWORD[(112-128)+rax],xmm1
+ vpaddd xmm13,xmm13,xmm1
+ vpxor xmm2,xmm2,XMMWORD[((0-128))+rax]
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm13,xmm13,xmm6
+ vpsrld xmm5,xmm2,31
+ vpaddd xmm2,xmm2,xmm2
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((176-128))+rax]
+
+ vpslld xmm8,xmm13,5
+ vpaddd xmm12,xmm12,xmm15
+ vpxor xmm6,xmm11,xmm14
+ vpaddd xmm12,xmm12,xmm2
+ vpxor xmm3,xmm3,XMMWORD[((16-128))+rax]
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm12,xmm12,xmm6
+ vpsrld xmm5,xmm3,31
+ vpaddd xmm3,xmm3,xmm3
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((192-128))+rax]
+
+ vpslld xmm8,xmm12,5
+ vpaddd xmm11,xmm11,xmm15
+ vpxor xmm6,xmm10,xmm13
+ vpaddd xmm11,xmm11,xmm3
+ vpxor xmm4,xmm4,XMMWORD[((32-128))+rax]
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm11,xmm11,xmm6
+ vpsrld xmm5,xmm4,31
+ vpaddd xmm4,xmm4,xmm4
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpxor xmm0,xmm0,xmm2
+ vmovdqa xmm2,XMMWORD[((208-128))+rax]
+
+ vpslld xmm8,xmm11,5
+ vpaddd xmm10,xmm10,xmm15
+ vpxor xmm6,xmm14,xmm12
+ vpaddd xmm10,xmm10,xmm4
+ vpxor xmm0,xmm0,XMMWORD[((48-128))+rax]
+ vpsrld xmm9,xmm11,27
+ vpxor xmm6,xmm6,xmm13
+ vpxor xmm0,xmm0,xmm2
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm10,xmm10,xmm6
+ vpsrld xmm5,xmm0,31
+ vpaddd xmm0,xmm0,xmm0
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm0,xmm0,xmm5
+ vpor xmm12,xmm12,xmm7
+ vpxor xmm1,xmm1,xmm3
+ vmovdqa xmm3,XMMWORD[((224-128))+rax]
+
+ vpslld xmm8,xmm10,5
+ vpaddd xmm14,xmm14,xmm15
+ vpxor xmm6,xmm13,xmm11
+ vpaddd xmm14,xmm14,xmm0
+ vpxor xmm1,xmm1,XMMWORD[((64-128))+rax]
+ vpsrld xmm9,xmm10,27
+ vpxor xmm6,xmm6,xmm12
+ vpxor xmm1,xmm1,xmm3
+
+ vpslld xmm7,xmm11,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm14,xmm14,xmm6
+ vpsrld xmm5,xmm1,31
+ vpaddd xmm1,xmm1,xmm1
+
+ vpsrld xmm11,xmm11,2
+ vpaddd xmm14,xmm14,xmm8
+ vpor xmm1,xmm1,xmm5
+ vpor xmm11,xmm11,xmm7
+ vpxor xmm2,xmm2,xmm4
+ vmovdqa xmm4,XMMWORD[((240-128))+rax]
+
+ vpslld xmm8,xmm14,5
+ vpaddd xmm13,xmm13,xmm15
+ vpxor xmm6,xmm12,xmm10
+ vpaddd xmm13,xmm13,xmm1
+ vpxor xmm2,xmm2,XMMWORD[((80-128))+rax]
+ vpsrld xmm9,xmm14,27
+ vpxor xmm6,xmm6,xmm11
+ vpxor xmm2,xmm2,xmm4
+
+ vpslld xmm7,xmm10,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm13,xmm13,xmm6
+ vpsrld xmm5,xmm2,31
+ vpaddd xmm2,xmm2,xmm2
+
+ vpsrld xmm10,xmm10,2
+ vpaddd xmm13,xmm13,xmm8
+ vpor xmm2,xmm2,xmm5
+ vpor xmm10,xmm10,xmm7
+ vpxor xmm3,xmm3,xmm0
+ vmovdqa xmm0,XMMWORD[((0-128))+rax]
+
+ vpslld xmm8,xmm13,5
+ vpaddd xmm12,xmm12,xmm15
+ vpxor xmm6,xmm11,xmm14
+ vpaddd xmm12,xmm12,xmm2
+ vpxor xmm3,xmm3,XMMWORD[((96-128))+rax]
+ vpsrld xmm9,xmm13,27
+ vpxor xmm6,xmm6,xmm10
+ vpxor xmm3,xmm3,xmm0
+
+ vpslld xmm7,xmm14,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm12,xmm12,xmm6
+ vpsrld xmm5,xmm3,31
+ vpaddd xmm3,xmm3,xmm3
+
+ vpsrld xmm14,xmm14,2
+ vpaddd xmm12,xmm12,xmm8
+ vpor xmm3,xmm3,xmm5
+ vpor xmm14,xmm14,xmm7
+ vpxor xmm4,xmm4,xmm1
+ vmovdqa xmm1,XMMWORD[((16-128))+rax]
+
+ vpslld xmm8,xmm12,5
+ vpaddd xmm11,xmm11,xmm15
+ vpxor xmm6,xmm10,xmm13
+ vpaddd xmm11,xmm11,xmm3
+ vpxor xmm4,xmm4,XMMWORD[((112-128))+rax]
+ vpsrld xmm9,xmm12,27
+ vpxor xmm6,xmm6,xmm14
+ vpxor xmm4,xmm4,xmm1
+
+ vpslld xmm7,xmm13,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm11,xmm11,xmm6
+ vpsrld xmm5,xmm4,31
+ vpaddd xmm4,xmm4,xmm4
+
+ vpsrld xmm13,xmm13,2
+ vpaddd xmm11,xmm11,xmm8
+ vpor xmm4,xmm4,xmm5
+ vpor xmm13,xmm13,xmm7
+ vpslld xmm8,xmm11,5
+ vpaddd xmm10,xmm10,xmm15
+ vpxor xmm6,xmm14,xmm12
+
+ vpsrld xmm9,xmm11,27
+ vpaddd xmm10,xmm10,xmm4
+ vpxor xmm6,xmm6,xmm13
+
+ vpslld xmm7,xmm12,30
+ vpor xmm8,xmm8,xmm9
+ vpaddd xmm10,xmm10,xmm6
+
+ vpsrld xmm12,xmm12,2
+ vpaddd xmm10,xmm10,xmm8
+ vpor xmm12,xmm12,xmm7
+ mov ecx,1
+ cmp ecx,DWORD[rbx]
+ cmovge r8,rbp
+ cmp ecx,DWORD[4+rbx]
+ cmovge r9,rbp
+ cmp ecx,DWORD[8+rbx]
+ cmovge r10,rbp
+ cmp ecx,DWORD[12+rbx]
+ cmovge r11,rbp
+ vmovdqu xmm6,XMMWORD[rbx]
+ vpxor xmm8,xmm8,xmm8
+ vmovdqa xmm7,xmm6
+ vpcmpgtd xmm7,xmm7,xmm8
+ vpaddd xmm6,xmm6,xmm7
+
+ vpand xmm10,xmm10,xmm7
+ vpand xmm11,xmm11,xmm7
+ vpaddd xmm10,xmm10,XMMWORD[rdi]
+ vpand xmm12,xmm12,xmm7
+ vpaddd xmm11,xmm11,XMMWORD[32+rdi]
+ vpand xmm13,xmm13,xmm7
+ vpaddd xmm12,xmm12,XMMWORD[64+rdi]
+ vpand xmm14,xmm14,xmm7
+ vpaddd xmm13,xmm13,XMMWORD[96+rdi]
+ vpaddd xmm14,xmm14,XMMWORD[128+rdi]
+ vmovdqu XMMWORD[rdi],xmm10
+ vmovdqu XMMWORD[32+rdi],xmm11
+ vmovdqu XMMWORD[64+rdi],xmm12
+ vmovdqu XMMWORD[96+rdi],xmm13
+ vmovdqu XMMWORD[128+rdi],xmm14
+
+ vmovdqu XMMWORD[rbx],xmm6
+ vmovdqu xmm5,XMMWORD[96+rbp]
+ dec edx
+ jnz NEAR $L$oop_avx
+
+ mov edx,DWORD[280+rsp]
+ lea rdi,[16+rdi]
+ lea rsi,[64+rsi]
+ dec edx
+ jnz NEAR $L$oop_grande_avx
+
+$L$done_avx:
+ mov rax,QWORD[272+rsp]
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((-184))+rax]
+ movaps xmm7,XMMWORD[((-168))+rax]
+ movaps xmm8,XMMWORD[((-152))+rax]
+ movaps xmm9,XMMWORD[((-136))+rax]
+ movaps xmm10,XMMWORD[((-120))+rax]
+ movaps xmm11,XMMWORD[((-104))+rax]
+ movaps xmm12,XMMWORD[((-88))+rax]
+ movaps xmm13,XMMWORD[((-72))+rax]
+ movaps xmm14,XMMWORD[((-56))+rax]
+ movaps xmm15,XMMWORD[((-40))+rax]
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$epilogue_avx:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha1_multi_block_avx:
+
+ALIGN 32
+sha1_multi_block_avx2:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha1_multi_block_avx2:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+_avx2_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[64+rsp],xmm10
+ movaps XMMWORD[80+rsp],xmm11
+ movaps XMMWORD[(-120)+rax],xmm12
+ movaps XMMWORD[(-104)+rax],xmm13
+ movaps XMMWORD[(-88)+rax],xmm14
+ movaps XMMWORD[(-72)+rax],xmm15
+ sub rsp,576
+ and rsp,-256
+ mov QWORD[544+rsp],rax
+
+$L$body_avx2:
+ lea rbp,[K_XX_XX]
+ shr edx,1
+
+ vzeroupper
+$L$oop_grande_avx2:
+ mov DWORD[552+rsp],edx
+ xor edx,edx
+ lea rbx,[512+rsp]
+ mov r12,QWORD[rsi]
+ mov ecx,DWORD[8+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[rbx],ecx
+ cmovle r12,rbp
+ mov r13,QWORD[16+rsi]
+ mov ecx,DWORD[24+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[4+rbx],ecx
+ cmovle r13,rbp
+ mov r14,QWORD[32+rsi]
+ mov ecx,DWORD[40+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[8+rbx],ecx
+ cmovle r14,rbp
+ mov r15,QWORD[48+rsi]
+ mov ecx,DWORD[56+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[12+rbx],ecx
+ cmovle r15,rbp
+ mov r8,QWORD[64+rsi]
+ mov ecx,DWORD[72+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[16+rbx],ecx
+ cmovle r8,rbp
+ mov r9,QWORD[80+rsi]
+ mov ecx,DWORD[88+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[20+rbx],ecx
+ cmovle r9,rbp
+ mov r10,QWORD[96+rsi]
+ mov ecx,DWORD[104+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[24+rbx],ecx
+ cmovle r10,rbp
+ mov r11,QWORD[112+rsi]
+ mov ecx,DWORD[120+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[28+rbx],ecx
+ cmovle r11,rbp
+ vmovdqu ymm0,YMMWORD[rdi]
+ lea rax,[128+rsp]
+ vmovdqu ymm1,YMMWORD[32+rdi]
+ lea rbx,[((256+128))+rsp]
+ vmovdqu ymm2,YMMWORD[64+rdi]
+ vmovdqu ymm3,YMMWORD[96+rdi]
+ vmovdqu ymm4,YMMWORD[128+rdi]
+ vmovdqu ymm9,YMMWORD[96+rbp]
+ jmp NEAR $L$oop_avx2
+
+ALIGN 32
+$L$oop_avx2:
+ vmovdqa ymm15,YMMWORD[((-32))+rbp]
+ vmovd xmm10,DWORD[r12]
+ lea r12,[64+r12]
+ vmovd xmm12,DWORD[r8]
+ lea r8,[64+r8]
+ vmovd xmm7,DWORD[r13]
+ lea r13,[64+r13]
+ vmovd xmm6,DWORD[r9]
+ lea r9,[64+r9]
+ vpinsrd xmm10,xmm10,DWORD[r14],1
+ lea r14,[64+r14]
+ vpinsrd xmm12,xmm12,DWORD[r10],1
+ lea r10,[64+r10]
+ vpinsrd xmm7,xmm7,DWORD[r15],1
+ lea r15,[64+r15]
+ vpunpckldq ymm10,ymm10,ymm7
+ vpinsrd xmm6,xmm6,DWORD[r11],1
+ lea r11,[64+r11]
+ vpunpckldq ymm12,ymm12,ymm6
+ vmovd xmm11,DWORD[((-60))+r12]
+ vinserti128 ymm10,ymm10,xmm12,1
+ vmovd xmm8,DWORD[((-60))+r8]
+ vpshufb ymm10,ymm10,ymm9
+ vmovd xmm7,DWORD[((-60))+r13]
+ vmovd xmm6,DWORD[((-60))+r9]
+ vpinsrd xmm11,xmm11,DWORD[((-60))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-60))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-60))+r15],1
+ vpunpckldq ymm11,ymm11,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-60))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm4,ymm4,ymm15
+ vpslld ymm7,ymm0,5
+ vpandn ymm6,ymm1,ymm3
+ vpand ymm5,ymm1,ymm2
+
+ vmovdqa YMMWORD[(0-128)+rax],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vinserti128 ymm11,ymm11,xmm8,1
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm12,DWORD[((-56))+r12]
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-56))+r8]
+ vpaddd ymm4,ymm4,ymm5
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpshufb ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vmovd xmm7,DWORD[((-56))+r13]
+ vmovd xmm6,DWORD[((-56))+r9]
+ vpinsrd xmm12,xmm12,DWORD[((-56))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-56))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-56))+r15],1
+ vpunpckldq ymm12,ymm12,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-56))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm3,ymm3,ymm15
+ vpslld ymm7,ymm4,5
+ vpandn ymm6,ymm0,ymm2
+ vpand ymm5,ymm0,ymm1
+
+ vmovdqa YMMWORD[(32-128)+rax],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vinserti128 ymm12,ymm12,xmm8,1
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm13,DWORD[((-52))+r12]
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-52))+r8]
+ vpaddd ymm3,ymm3,ymm5
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpshufb ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vmovd xmm7,DWORD[((-52))+r13]
+ vmovd xmm6,DWORD[((-52))+r9]
+ vpinsrd xmm13,xmm13,DWORD[((-52))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-52))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-52))+r15],1
+ vpunpckldq ymm13,ymm13,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-52))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm2,ymm2,ymm15
+ vpslld ymm7,ymm3,5
+ vpandn ymm6,ymm4,ymm1
+ vpand ymm5,ymm4,ymm0
+
+ vmovdqa YMMWORD[(64-128)+rax],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vinserti128 ymm13,ymm13,xmm8,1
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm14,DWORD[((-48))+r12]
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-48))+r8]
+ vpaddd ymm2,ymm2,ymm5
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpshufb ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vmovd xmm7,DWORD[((-48))+r13]
+ vmovd xmm6,DWORD[((-48))+r9]
+ vpinsrd xmm14,xmm14,DWORD[((-48))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-48))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-48))+r15],1
+ vpunpckldq ymm14,ymm14,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-48))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm1,ymm1,ymm15
+ vpslld ymm7,ymm2,5
+ vpandn ymm6,ymm3,ymm0
+ vpand ymm5,ymm3,ymm4
+
+ vmovdqa YMMWORD[(96-128)+rax],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vinserti128 ymm14,ymm14,xmm8,1
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm10,DWORD[((-44))+r12]
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-44))+r8]
+ vpaddd ymm1,ymm1,ymm5
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpshufb ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vmovd xmm7,DWORD[((-44))+r13]
+ vmovd xmm6,DWORD[((-44))+r9]
+ vpinsrd xmm10,xmm10,DWORD[((-44))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-44))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-44))+r15],1
+ vpunpckldq ymm10,ymm10,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-44))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm0,ymm0,ymm15
+ vpslld ymm7,ymm1,5
+ vpandn ymm6,ymm2,ymm4
+ vpand ymm5,ymm2,ymm3
+
+ vmovdqa YMMWORD[(128-128)+rax],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vinserti128 ymm10,ymm10,xmm8,1
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm11,DWORD[((-40))+r12]
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-40))+r8]
+ vpaddd ymm0,ymm0,ymm5
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpshufb ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vmovd xmm7,DWORD[((-40))+r13]
+ vmovd xmm6,DWORD[((-40))+r9]
+ vpinsrd xmm11,xmm11,DWORD[((-40))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-40))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-40))+r15],1
+ vpunpckldq ymm11,ymm11,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-40))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm4,ymm4,ymm15
+ vpslld ymm7,ymm0,5
+ vpandn ymm6,ymm1,ymm3
+ vpand ymm5,ymm1,ymm2
+
+ vmovdqa YMMWORD[(160-128)+rax],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vinserti128 ymm11,ymm11,xmm8,1
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm12,DWORD[((-36))+r12]
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-36))+r8]
+ vpaddd ymm4,ymm4,ymm5
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpshufb ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vmovd xmm7,DWORD[((-36))+r13]
+ vmovd xmm6,DWORD[((-36))+r9]
+ vpinsrd xmm12,xmm12,DWORD[((-36))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-36))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-36))+r15],1
+ vpunpckldq ymm12,ymm12,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-36))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm3,ymm3,ymm15
+ vpslld ymm7,ymm4,5
+ vpandn ymm6,ymm0,ymm2
+ vpand ymm5,ymm0,ymm1
+
+ vmovdqa YMMWORD[(192-128)+rax],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vinserti128 ymm12,ymm12,xmm8,1
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm13,DWORD[((-32))+r12]
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-32))+r8]
+ vpaddd ymm3,ymm3,ymm5
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpshufb ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vmovd xmm7,DWORD[((-32))+r13]
+ vmovd xmm6,DWORD[((-32))+r9]
+ vpinsrd xmm13,xmm13,DWORD[((-32))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-32))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-32))+r15],1
+ vpunpckldq ymm13,ymm13,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-32))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm2,ymm2,ymm15
+ vpslld ymm7,ymm3,5
+ vpandn ymm6,ymm4,ymm1
+ vpand ymm5,ymm4,ymm0
+
+ vmovdqa YMMWORD[(224-128)+rax],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vinserti128 ymm13,ymm13,xmm8,1
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm14,DWORD[((-28))+r12]
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-28))+r8]
+ vpaddd ymm2,ymm2,ymm5
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpshufb ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vmovd xmm7,DWORD[((-28))+r13]
+ vmovd xmm6,DWORD[((-28))+r9]
+ vpinsrd xmm14,xmm14,DWORD[((-28))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-28))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-28))+r15],1
+ vpunpckldq ymm14,ymm14,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-28))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm1,ymm1,ymm15
+ vpslld ymm7,ymm2,5
+ vpandn ymm6,ymm3,ymm0
+ vpand ymm5,ymm3,ymm4
+
+ vmovdqa YMMWORD[(256-256-128)+rbx],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vinserti128 ymm14,ymm14,xmm8,1
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm10,DWORD[((-24))+r12]
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-24))+r8]
+ vpaddd ymm1,ymm1,ymm5
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpshufb ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vmovd xmm7,DWORD[((-24))+r13]
+ vmovd xmm6,DWORD[((-24))+r9]
+ vpinsrd xmm10,xmm10,DWORD[((-24))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-24))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-24))+r15],1
+ vpunpckldq ymm10,ymm10,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-24))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm0,ymm0,ymm15
+ vpslld ymm7,ymm1,5
+ vpandn ymm6,ymm2,ymm4
+ vpand ymm5,ymm2,ymm3
+
+ vmovdqa YMMWORD[(288-256-128)+rbx],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vinserti128 ymm10,ymm10,xmm8,1
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm11,DWORD[((-20))+r12]
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-20))+r8]
+ vpaddd ymm0,ymm0,ymm5
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpshufb ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vmovd xmm7,DWORD[((-20))+r13]
+ vmovd xmm6,DWORD[((-20))+r9]
+ vpinsrd xmm11,xmm11,DWORD[((-20))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-20))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-20))+r15],1
+ vpunpckldq ymm11,ymm11,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-20))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm4,ymm4,ymm15
+ vpslld ymm7,ymm0,5
+ vpandn ymm6,ymm1,ymm3
+ vpand ymm5,ymm1,ymm2
+
+ vmovdqa YMMWORD[(320-256-128)+rbx],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vinserti128 ymm11,ymm11,xmm8,1
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm12,DWORD[((-16))+r12]
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-16))+r8]
+ vpaddd ymm4,ymm4,ymm5
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpshufb ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vmovd xmm7,DWORD[((-16))+r13]
+ vmovd xmm6,DWORD[((-16))+r9]
+ vpinsrd xmm12,xmm12,DWORD[((-16))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-16))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-16))+r15],1
+ vpunpckldq ymm12,ymm12,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-16))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm3,ymm3,ymm15
+ vpslld ymm7,ymm4,5
+ vpandn ymm6,ymm0,ymm2
+ vpand ymm5,ymm0,ymm1
+
+ vmovdqa YMMWORD[(352-256-128)+rbx],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vinserti128 ymm12,ymm12,xmm8,1
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm13,DWORD[((-12))+r12]
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-12))+r8]
+ vpaddd ymm3,ymm3,ymm5
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpshufb ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vmovd xmm7,DWORD[((-12))+r13]
+ vmovd xmm6,DWORD[((-12))+r9]
+ vpinsrd xmm13,xmm13,DWORD[((-12))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-12))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-12))+r15],1
+ vpunpckldq ymm13,ymm13,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-12))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm2,ymm2,ymm15
+ vpslld ymm7,ymm3,5
+ vpandn ymm6,ymm4,ymm1
+ vpand ymm5,ymm4,ymm0
+
+ vmovdqa YMMWORD[(384-256-128)+rbx],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vinserti128 ymm13,ymm13,xmm8,1
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm14,DWORD[((-8))+r12]
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-8))+r8]
+ vpaddd ymm2,ymm2,ymm5
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpshufb ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vmovd xmm7,DWORD[((-8))+r13]
+ vmovd xmm6,DWORD[((-8))+r9]
+ vpinsrd xmm14,xmm14,DWORD[((-8))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-8))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-8))+r15],1
+ vpunpckldq ymm14,ymm14,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-8))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm1,ymm1,ymm15
+ vpslld ymm7,ymm2,5
+ vpandn ymm6,ymm3,ymm0
+ vpand ymm5,ymm3,ymm4
+
+ vmovdqa YMMWORD[(416-256-128)+rbx],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vinserti128 ymm14,ymm14,xmm8,1
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm6
+ vmovd xmm10,DWORD[((-4))+r12]
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vmovd xmm8,DWORD[((-4))+r8]
+ vpaddd ymm1,ymm1,ymm5
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpshufb ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vmovdqa ymm11,YMMWORD[((0-128))+rax]
+ vmovd xmm7,DWORD[((-4))+r13]
+ vmovd xmm6,DWORD[((-4))+r9]
+ vpinsrd xmm10,xmm10,DWORD[((-4))+r14],1
+ vpinsrd xmm8,xmm8,DWORD[((-4))+r10],1
+ vpinsrd xmm7,xmm7,DWORD[((-4))+r15],1
+ vpunpckldq ymm10,ymm10,ymm7
+ vpinsrd xmm6,xmm6,DWORD[((-4))+r11],1
+ vpunpckldq ymm8,ymm8,ymm6
+ vpaddd ymm0,ymm0,ymm15
+ prefetcht0 [63+r12]
+ vpslld ymm7,ymm1,5
+ vpandn ymm6,ymm2,ymm4
+ vpand ymm5,ymm2,ymm3
+
+ vmovdqa YMMWORD[(448-256-128)+rbx],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vinserti128 ymm10,ymm10,xmm8,1
+ vpsrld ymm8,ymm1,27
+ prefetcht0 [63+r13]
+ vpxor ymm5,ymm5,ymm6
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ prefetcht0 [63+r14]
+ vpaddd ymm0,ymm0,ymm5
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ prefetcht0 [63+r15]
+ vpshufb ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vmovdqa ymm12,YMMWORD[((32-128))+rax]
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((64-128))+rax]
+
+ vpaddd ymm4,ymm4,ymm15
+ vpslld ymm7,ymm0,5
+ vpandn ymm6,ymm1,ymm3
+ prefetcht0 [63+r8]
+ vpand ymm5,ymm1,ymm2
+
+ vmovdqa YMMWORD[(480-256-128)+rbx],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpxor ymm11,ymm11,YMMWORD[((256-256-128))+rbx]
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm6
+ vpxor ymm11,ymm11,ymm13
+ prefetcht0 [63+r9]
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm4,ymm4,ymm5
+ prefetcht0 [63+r10]
+ vpsrld ymm9,ymm11,31
+ vpaddd ymm11,ymm11,ymm11
+
+ vpsrld ymm1,ymm1,2
+ prefetcht0 [63+r11]
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((96-128))+rax]
+
+ vpaddd ymm3,ymm3,ymm15
+ vpslld ymm7,ymm4,5
+ vpandn ymm6,ymm0,ymm2
+
+ vpand ymm5,ymm0,ymm1
+
+ vmovdqa YMMWORD[(0-128)+rax],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpxor ymm12,ymm12,YMMWORD[((288-256-128))+rbx]
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm6
+ vpxor ymm12,ymm12,ymm14
+
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm3,ymm3,ymm5
+
+ vpsrld ymm9,ymm12,31
+ vpaddd ymm12,ymm12,ymm12
+
+ vpsrld ymm0,ymm0,2
+
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((128-128))+rax]
+
+ vpaddd ymm2,ymm2,ymm15
+ vpslld ymm7,ymm3,5
+ vpandn ymm6,ymm4,ymm1
+
+ vpand ymm5,ymm4,ymm0
+
+ vmovdqa YMMWORD[(32-128)+rax],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpxor ymm13,ymm13,YMMWORD[((320-256-128))+rbx]
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm6
+ vpxor ymm13,ymm13,ymm10
+
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm2,ymm2,ymm5
+
+ vpsrld ymm9,ymm13,31
+ vpaddd ymm13,ymm13,ymm13
+
+ vpsrld ymm4,ymm4,2
+
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((160-128))+rax]
+
+ vpaddd ymm1,ymm1,ymm15
+ vpslld ymm7,ymm2,5
+ vpandn ymm6,ymm3,ymm0
+
+ vpand ymm5,ymm3,ymm4
+
+ vmovdqa YMMWORD[(64-128)+rax],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpxor ymm14,ymm14,YMMWORD[((352-256-128))+rbx]
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm6
+ vpxor ymm14,ymm14,ymm11
+
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm1,ymm1,ymm5
+
+ vpsrld ymm9,ymm14,31
+ vpaddd ymm14,ymm14,ymm14
+
+ vpsrld ymm3,ymm3,2
+
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((192-128))+rax]
+
+ vpaddd ymm0,ymm0,ymm15
+ vpslld ymm7,ymm1,5
+ vpandn ymm6,ymm2,ymm4
+
+ vpand ymm5,ymm2,ymm3
+
+ vmovdqa YMMWORD[(96-128)+rax],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpxor ymm10,ymm10,YMMWORD[((384-256-128))+rbx]
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm6
+ vpxor ymm10,ymm10,ymm12
+
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm0,ymm0,ymm5
+
+ vpsrld ymm9,ymm10,31
+ vpaddd ymm10,ymm10,ymm10
+
+ vpsrld ymm2,ymm2,2
+
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vmovdqa ymm15,YMMWORD[rbp]
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((224-128))+rax]
+
+ vpslld ymm7,ymm0,5
+ vpaddd ymm4,ymm4,ymm15
+ vpxor ymm5,ymm3,ymm1
+ vmovdqa YMMWORD[(128-128)+rax],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpxor ymm11,ymm11,YMMWORD[((416-256-128))+rbx]
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm4,ymm4,ymm5
+ vpsrld ymm9,ymm11,31
+ vpaddd ymm11,ymm11,ymm11
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((256-256-128))+rbx]
+
+ vpslld ymm7,ymm4,5
+ vpaddd ymm3,ymm3,ymm15
+ vpxor ymm5,ymm2,ymm0
+ vmovdqa YMMWORD[(160-128)+rax],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpxor ymm12,ymm12,YMMWORD[((448-256-128))+rbx]
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm3,ymm3,ymm5
+ vpsrld ymm9,ymm12,31
+ vpaddd ymm12,ymm12,ymm12
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((288-256-128))+rbx]
+
+ vpslld ymm7,ymm3,5
+ vpaddd ymm2,ymm2,ymm15
+ vpxor ymm5,ymm1,ymm4
+ vmovdqa YMMWORD[(192-128)+rax],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpxor ymm13,ymm13,YMMWORD[((480-256-128))+rbx]
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm2,ymm2,ymm5
+ vpsrld ymm9,ymm13,31
+ vpaddd ymm13,ymm13,ymm13
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((320-256-128))+rbx]
+
+ vpslld ymm7,ymm2,5
+ vpaddd ymm1,ymm1,ymm15
+ vpxor ymm5,ymm0,ymm3
+ vmovdqa YMMWORD[(224-128)+rax],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpxor ymm14,ymm14,YMMWORD[((0-128))+rax]
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm1,ymm1,ymm5
+ vpsrld ymm9,ymm14,31
+ vpaddd ymm14,ymm14,ymm14
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((352-256-128))+rbx]
+
+ vpslld ymm7,ymm1,5
+ vpaddd ymm0,ymm0,ymm15
+ vpxor ymm5,ymm4,ymm2
+ vmovdqa YMMWORD[(256-256-128)+rbx],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpxor ymm10,ymm10,YMMWORD[((32-128))+rax]
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm0,ymm0,ymm5
+ vpsrld ymm9,ymm10,31
+ vpaddd ymm10,ymm10,ymm10
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((384-256-128))+rbx]
+
+ vpslld ymm7,ymm0,5
+ vpaddd ymm4,ymm4,ymm15
+ vpxor ymm5,ymm3,ymm1
+ vmovdqa YMMWORD[(288-256-128)+rbx],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpxor ymm11,ymm11,YMMWORD[((64-128))+rax]
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm4,ymm4,ymm5
+ vpsrld ymm9,ymm11,31
+ vpaddd ymm11,ymm11,ymm11
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((416-256-128))+rbx]
+
+ vpslld ymm7,ymm4,5
+ vpaddd ymm3,ymm3,ymm15
+ vpxor ymm5,ymm2,ymm0
+ vmovdqa YMMWORD[(320-256-128)+rbx],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpxor ymm12,ymm12,YMMWORD[((96-128))+rax]
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm3,ymm3,ymm5
+ vpsrld ymm9,ymm12,31
+ vpaddd ymm12,ymm12,ymm12
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((448-256-128))+rbx]
+
+ vpslld ymm7,ymm3,5
+ vpaddd ymm2,ymm2,ymm15
+ vpxor ymm5,ymm1,ymm4
+ vmovdqa YMMWORD[(352-256-128)+rbx],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpxor ymm13,ymm13,YMMWORD[((128-128))+rax]
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm2,ymm2,ymm5
+ vpsrld ymm9,ymm13,31
+ vpaddd ymm13,ymm13,ymm13
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((480-256-128))+rbx]
+
+ vpslld ymm7,ymm2,5
+ vpaddd ymm1,ymm1,ymm15
+ vpxor ymm5,ymm0,ymm3
+ vmovdqa YMMWORD[(384-256-128)+rbx],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpxor ymm14,ymm14,YMMWORD[((160-128))+rax]
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm1,ymm1,ymm5
+ vpsrld ymm9,ymm14,31
+ vpaddd ymm14,ymm14,ymm14
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((0-128))+rax]
+
+ vpslld ymm7,ymm1,5
+ vpaddd ymm0,ymm0,ymm15
+ vpxor ymm5,ymm4,ymm2
+ vmovdqa YMMWORD[(416-256-128)+rbx],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpxor ymm10,ymm10,YMMWORD[((192-128))+rax]
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm0,ymm0,ymm5
+ vpsrld ymm9,ymm10,31
+ vpaddd ymm10,ymm10,ymm10
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((32-128))+rax]
+
+ vpslld ymm7,ymm0,5
+ vpaddd ymm4,ymm4,ymm15
+ vpxor ymm5,ymm3,ymm1
+ vmovdqa YMMWORD[(448-256-128)+rbx],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpxor ymm11,ymm11,YMMWORD[((224-128))+rax]
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm4,ymm4,ymm5
+ vpsrld ymm9,ymm11,31
+ vpaddd ymm11,ymm11,ymm11
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((64-128))+rax]
+
+ vpslld ymm7,ymm4,5
+ vpaddd ymm3,ymm3,ymm15
+ vpxor ymm5,ymm2,ymm0
+ vmovdqa YMMWORD[(480-256-128)+rbx],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpxor ymm12,ymm12,YMMWORD[((256-256-128))+rbx]
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm3,ymm3,ymm5
+ vpsrld ymm9,ymm12,31
+ vpaddd ymm12,ymm12,ymm12
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((96-128))+rax]
+
+ vpslld ymm7,ymm3,5
+ vpaddd ymm2,ymm2,ymm15
+ vpxor ymm5,ymm1,ymm4
+ vmovdqa YMMWORD[(0-128)+rax],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpxor ymm13,ymm13,YMMWORD[((288-256-128))+rbx]
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm2,ymm2,ymm5
+ vpsrld ymm9,ymm13,31
+ vpaddd ymm13,ymm13,ymm13
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((128-128))+rax]
+
+ vpslld ymm7,ymm2,5
+ vpaddd ymm1,ymm1,ymm15
+ vpxor ymm5,ymm0,ymm3
+ vmovdqa YMMWORD[(32-128)+rax],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpxor ymm14,ymm14,YMMWORD[((320-256-128))+rbx]
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm1,ymm1,ymm5
+ vpsrld ymm9,ymm14,31
+ vpaddd ymm14,ymm14,ymm14
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((160-128))+rax]
+
+ vpslld ymm7,ymm1,5
+ vpaddd ymm0,ymm0,ymm15
+ vpxor ymm5,ymm4,ymm2
+ vmovdqa YMMWORD[(64-128)+rax],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpxor ymm10,ymm10,YMMWORD[((352-256-128))+rbx]
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm0,ymm0,ymm5
+ vpsrld ymm9,ymm10,31
+ vpaddd ymm10,ymm10,ymm10
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((192-128))+rax]
+
+ vpslld ymm7,ymm0,5
+ vpaddd ymm4,ymm4,ymm15
+ vpxor ymm5,ymm3,ymm1
+ vmovdqa YMMWORD[(96-128)+rax],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpxor ymm11,ymm11,YMMWORD[((384-256-128))+rbx]
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm4,ymm4,ymm5
+ vpsrld ymm9,ymm11,31
+ vpaddd ymm11,ymm11,ymm11
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((224-128))+rax]
+
+ vpslld ymm7,ymm4,5
+ vpaddd ymm3,ymm3,ymm15
+ vpxor ymm5,ymm2,ymm0
+ vmovdqa YMMWORD[(128-128)+rax],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpxor ymm12,ymm12,YMMWORD[((416-256-128))+rbx]
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm3,ymm3,ymm5
+ vpsrld ymm9,ymm12,31
+ vpaddd ymm12,ymm12,ymm12
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((256-256-128))+rbx]
+
+ vpslld ymm7,ymm3,5
+ vpaddd ymm2,ymm2,ymm15
+ vpxor ymm5,ymm1,ymm4
+ vmovdqa YMMWORD[(160-128)+rax],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpxor ymm13,ymm13,YMMWORD[((448-256-128))+rbx]
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm2,ymm2,ymm5
+ vpsrld ymm9,ymm13,31
+ vpaddd ymm13,ymm13,ymm13
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((288-256-128))+rbx]
+
+ vpslld ymm7,ymm2,5
+ vpaddd ymm1,ymm1,ymm15
+ vpxor ymm5,ymm0,ymm3
+ vmovdqa YMMWORD[(192-128)+rax],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpxor ymm14,ymm14,YMMWORD[((480-256-128))+rbx]
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm1,ymm1,ymm5
+ vpsrld ymm9,ymm14,31
+ vpaddd ymm14,ymm14,ymm14
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((320-256-128))+rbx]
+
+ vpslld ymm7,ymm1,5
+ vpaddd ymm0,ymm0,ymm15
+ vpxor ymm5,ymm4,ymm2
+ vmovdqa YMMWORD[(224-128)+rax],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpxor ymm10,ymm10,YMMWORD[((0-128))+rax]
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm0,ymm0,ymm5
+ vpsrld ymm9,ymm10,31
+ vpaddd ymm10,ymm10,ymm10
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vmovdqa ymm15,YMMWORD[32+rbp]
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((352-256-128))+rbx]
+
+ vpaddd ymm4,ymm4,ymm15
+ vpslld ymm7,ymm0,5
+ vpand ymm6,ymm3,ymm2
+ vpxor ymm11,ymm11,YMMWORD[((32-128))+rax]
+
+ vpaddd ymm4,ymm4,ymm6
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm3,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vmovdqu YMMWORD[(256-256-128)+rbx],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm11,31
+ vpand ymm5,ymm5,ymm1
+ vpaddd ymm11,ymm11,ymm11
+
+ vpslld ymm6,ymm1,30
+ vpaddd ymm4,ymm4,ymm5
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((384-256-128))+rbx]
+
+ vpaddd ymm3,ymm3,ymm15
+ vpslld ymm7,ymm4,5
+ vpand ymm6,ymm2,ymm1
+ vpxor ymm12,ymm12,YMMWORD[((64-128))+rax]
+
+ vpaddd ymm3,ymm3,ymm6
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm2,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vmovdqu YMMWORD[(288-256-128)+rbx],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm12,31
+ vpand ymm5,ymm5,ymm0
+ vpaddd ymm12,ymm12,ymm12
+
+ vpslld ymm6,ymm0,30
+ vpaddd ymm3,ymm3,ymm5
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((416-256-128))+rbx]
+
+ vpaddd ymm2,ymm2,ymm15
+ vpslld ymm7,ymm3,5
+ vpand ymm6,ymm1,ymm0
+ vpxor ymm13,ymm13,YMMWORD[((96-128))+rax]
+
+ vpaddd ymm2,ymm2,ymm6
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm1,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vmovdqu YMMWORD[(320-256-128)+rbx],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm13,31
+ vpand ymm5,ymm5,ymm4
+ vpaddd ymm13,ymm13,ymm13
+
+ vpslld ymm6,ymm4,30
+ vpaddd ymm2,ymm2,ymm5
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((448-256-128))+rbx]
+
+ vpaddd ymm1,ymm1,ymm15
+ vpslld ymm7,ymm2,5
+ vpand ymm6,ymm0,ymm4
+ vpxor ymm14,ymm14,YMMWORD[((128-128))+rax]
+
+ vpaddd ymm1,ymm1,ymm6
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm0,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vmovdqu YMMWORD[(352-256-128)+rbx],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm14,31
+ vpand ymm5,ymm5,ymm3
+ vpaddd ymm14,ymm14,ymm14
+
+ vpslld ymm6,ymm3,30
+ vpaddd ymm1,ymm1,ymm5
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((480-256-128))+rbx]
+
+ vpaddd ymm0,ymm0,ymm15
+ vpslld ymm7,ymm1,5
+ vpand ymm6,ymm4,ymm3
+ vpxor ymm10,ymm10,YMMWORD[((160-128))+rax]
+
+ vpaddd ymm0,ymm0,ymm6
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm4,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vmovdqu YMMWORD[(384-256-128)+rbx],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm10,31
+ vpand ymm5,ymm5,ymm2
+ vpaddd ymm10,ymm10,ymm10
+
+ vpslld ymm6,ymm2,30
+ vpaddd ymm0,ymm0,ymm5
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((0-128))+rax]
+
+ vpaddd ymm4,ymm4,ymm15
+ vpslld ymm7,ymm0,5
+ vpand ymm6,ymm3,ymm2
+ vpxor ymm11,ymm11,YMMWORD[((192-128))+rax]
+
+ vpaddd ymm4,ymm4,ymm6
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm3,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vmovdqu YMMWORD[(416-256-128)+rbx],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm11,31
+ vpand ymm5,ymm5,ymm1
+ vpaddd ymm11,ymm11,ymm11
+
+ vpslld ymm6,ymm1,30
+ vpaddd ymm4,ymm4,ymm5
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((32-128))+rax]
+
+ vpaddd ymm3,ymm3,ymm15
+ vpslld ymm7,ymm4,5
+ vpand ymm6,ymm2,ymm1
+ vpxor ymm12,ymm12,YMMWORD[((224-128))+rax]
+
+ vpaddd ymm3,ymm3,ymm6
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm2,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vmovdqu YMMWORD[(448-256-128)+rbx],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm12,31
+ vpand ymm5,ymm5,ymm0
+ vpaddd ymm12,ymm12,ymm12
+
+ vpslld ymm6,ymm0,30
+ vpaddd ymm3,ymm3,ymm5
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((64-128))+rax]
+
+ vpaddd ymm2,ymm2,ymm15
+ vpslld ymm7,ymm3,5
+ vpand ymm6,ymm1,ymm0
+ vpxor ymm13,ymm13,YMMWORD[((256-256-128))+rbx]
+
+ vpaddd ymm2,ymm2,ymm6
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm1,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vmovdqu YMMWORD[(480-256-128)+rbx],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm13,31
+ vpand ymm5,ymm5,ymm4
+ vpaddd ymm13,ymm13,ymm13
+
+ vpslld ymm6,ymm4,30
+ vpaddd ymm2,ymm2,ymm5
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((96-128))+rax]
+
+ vpaddd ymm1,ymm1,ymm15
+ vpslld ymm7,ymm2,5
+ vpand ymm6,ymm0,ymm4
+ vpxor ymm14,ymm14,YMMWORD[((288-256-128))+rbx]
+
+ vpaddd ymm1,ymm1,ymm6
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm0,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vmovdqu YMMWORD[(0-128)+rax],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm14,31
+ vpand ymm5,ymm5,ymm3
+ vpaddd ymm14,ymm14,ymm14
+
+ vpslld ymm6,ymm3,30
+ vpaddd ymm1,ymm1,ymm5
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((128-128))+rax]
+
+ vpaddd ymm0,ymm0,ymm15
+ vpslld ymm7,ymm1,5
+ vpand ymm6,ymm4,ymm3
+ vpxor ymm10,ymm10,YMMWORD[((320-256-128))+rbx]
+
+ vpaddd ymm0,ymm0,ymm6
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm4,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vmovdqu YMMWORD[(32-128)+rax],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm10,31
+ vpand ymm5,ymm5,ymm2
+ vpaddd ymm10,ymm10,ymm10
+
+ vpslld ymm6,ymm2,30
+ vpaddd ymm0,ymm0,ymm5
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((160-128))+rax]
+
+ vpaddd ymm4,ymm4,ymm15
+ vpslld ymm7,ymm0,5
+ vpand ymm6,ymm3,ymm2
+ vpxor ymm11,ymm11,YMMWORD[((352-256-128))+rbx]
+
+ vpaddd ymm4,ymm4,ymm6
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm3,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vmovdqu YMMWORD[(64-128)+rax],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm11,31
+ vpand ymm5,ymm5,ymm1
+ vpaddd ymm11,ymm11,ymm11
+
+ vpslld ymm6,ymm1,30
+ vpaddd ymm4,ymm4,ymm5
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((192-128))+rax]
+
+ vpaddd ymm3,ymm3,ymm15
+ vpslld ymm7,ymm4,5
+ vpand ymm6,ymm2,ymm1
+ vpxor ymm12,ymm12,YMMWORD[((384-256-128))+rbx]
+
+ vpaddd ymm3,ymm3,ymm6
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm2,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vmovdqu YMMWORD[(96-128)+rax],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm12,31
+ vpand ymm5,ymm5,ymm0
+ vpaddd ymm12,ymm12,ymm12
+
+ vpslld ymm6,ymm0,30
+ vpaddd ymm3,ymm3,ymm5
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((224-128))+rax]
+
+ vpaddd ymm2,ymm2,ymm15
+ vpslld ymm7,ymm3,5
+ vpand ymm6,ymm1,ymm0
+ vpxor ymm13,ymm13,YMMWORD[((416-256-128))+rbx]
+
+ vpaddd ymm2,ymm2,ymm6
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm1,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vmovdqu YMMWORD[(128-128)+rax],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm13,31
+ vpand ymm5,ymm5,ymm4
+ vpaddd ymm13,ymm13,ymm13
+
+ vpslld ymm6,ymm4,30
+ vpaddd ymm2,ymm2,ymm5
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((256-256-128))+rbx]
+
+ vpaddd ymm1,ymm1,ymm15
+ vpslld ymm7,ymm2,5
+ vpand ymm6,ymm0,ymm4
+ vpxor ymm14,ymm14,YMMWORD[((448-256-128))+rbx]
+
+ vpaddd ymm1,ymm1,ymm6
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm0,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vmovdqu YMMWORD[(160-128)+rax],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm14,31
+ vpand ymm5,ymm5,ymm3
+ vpaddd ymm14,ymm14,ymm14
+
+ vpslld ymm6,ymm3,30
+ vpaddd ymm1,ymm1,ymm5
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((288-256-128))+rbx]
+
+ vpaddd ymm0,ymm0,ymm15
+ vpslld ymm7,ymm1,5
+ vpand ymm6,ymm4,ymm3
+ vpxor ymm10,ymm10,YMMWORD[((480-256-128))+rbx]
+
+ vpaddd ymm0,ymm0,ymm6
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm4,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vmovdqu YMMWORD[(192-128)+rax],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm10,31
+ vpand ymm5,ymm5,ymm2
+ vpaddd ymm10,ymm10,ymm10
+
+ vpslld ymm6,ymm2,30
+ vpaddd ymm0,ymm0,ymm5
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((320-256-128))+rbx]
+
+ vpaddd ymm4,ymm4,ymm15
+ vpslld ymm7,ymm0,5
+ vpand ymm6,ymm3,ymm2
+ vpxor ymm11,ymm11,YMMWORD[((0-128))+rax]
+
+ vpaddd ymm4,ymm4,ymm6
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm3,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vmovdqu YMMWORD[(224-128)+rax],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm11,31
+ vpand ymm5,ymm5,ymm1
+ vpaddd ymm11,ymm11,ymm11
+
+ vpslld ymm6,ymm1,30
+ vpaddd ymm4,ymm4,ymm5
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((352-256-128))+rbx]
+
+ vpaddd ymm3,ymm3,ymm15
+ vpslld ymm7,ymm4,5
+ vpand ymm6,ymm2,ymm1
+ vpxor ymm12,ymm12,YMMWORD[((32-128))+rax]
+
+ vpaddd ymm3,ymm3,ymm6
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm2,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vmovdqu YMMWORD[(256-256-128)+rbx],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm12,31
+ vpand ymm5,ymm5,ymm0
+ vpaddd ymm12,ymm12,ymm12
+
+ vpslld ymm6,ymm0,30
+ vpaddd ymm3,ymm3,ymm5
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((384-256-128))+rbx]
+
+ vpaddd ymm2,ymm2,ymm15
+ vpslld ymm7,ymm3,5
+ vpand ymm6,ymm1,ymm0
+ vpxor ymm13,ymm13,YMMWORD[((64-128))+rax]
+
+ vpaddd ymm2,ymm2,ymm6
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm1,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vmovdqu YMMWORD[(288-256-128)+rbx],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm13,31
+ vpand ymm5,ymm5,ymm4
+ vpaddd ymm13,ymm13,ymm13
+
+ vpslld ymm6,ymm4,30
+ vpaddd ymm2,ymm2,ymm5
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((416-256-128))+rbx]
+
+ vpaddd ymm1,ymm1,ymm15
+ vpslld ymm7,ymm2,5
+ vpand ymm6,ymm0,ymm4
+ vpxor ymm14,ymm14,YMMWORD[((96-128))+rax]
+
+ vpaddd ymm1,ymm1,ymm6
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm0,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vmovdqu YMMWORD[(320-256-128)+rbx],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm14,31
+ vpand ymm5,ymm5,ymm3
+ vpaddd ymm14,ymm14,ymm14
+
+ vpslld ymm6,ymm3,30
+ vpaddd ymm1,ymm1,ymm5
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((448-256-128))+rbx]
+
+ vpaddd ymm0,ymm0,ymm15
+ vpslld ymm7,ymm1,5
+ vpand ymm6,ymm4,ymm3
+ vpxor ymm10,ymm10,YMMWORD[((128-128))+rax]
+
+ vpaddd ymm0,ymm0,ymm6
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm4,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vmovdqu YMMWORD[(352-256-128)+rbx],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpor ymm7,ymm7,ymm8
+ vpsrld ymm9,ymm10,31
+ vpand ymm5,ymm5,ymm2
+ vpaddd ymm10,ymm10,ymm10
+
+ vpslld ymm6,ymm2,30
+ vpaddd ymm0,ymm0,ymm5
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vmovdqa ymm15,YMMWORD[64+rbp]
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((480-256-128))+rbx]
+
+ vpslld ymm7,ymm0,5
+ vpaddd ymm4,ymm4,ymm15
+ vpxor ymm5,ymm3,ymm1
+ vmovdqa YMMWORD[(384-256-128)+rbx],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpxor ymm11,ymm11,YMMWORD[((160-128))+rax]
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm4,ymm4,ymm5
+ vpsrld ymm9,ymm11,31
+ vpaddd ymm11,ymm11,ymm11
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((0-128))+rax]
+
+ vpslld ymm7,ymm4,5
+ vpaddd ymm3,ymm3,ymm15
+ vpxor ymm5,ymm2,ymm0
+ vmovdqa YMMWORD[(416-256-128)+rbx],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpxor ymm12,ymm12,YMMWORD[((192-128))+rax]
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm3,ymm3,ymm5
+ vpsrld ymm9,ymm12,31
+ vpaddd ymm12,ymm12,ymm12
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((32-128))+rax]
+
+ vpslld ymm7,ymm3,5
+ vpaddd ymm2,ymm2,ymm15
+ vpxor ymm5,ymm1,ymm4
+ vmovdqa YMMWORD[(448-256-128)+rbx],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpxor ymm13,ymm13,YMMWORD[((224-128))+rax]
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm2,ymm2,ymm5
+ vpsrld ymm9,ymm13,31
+ vpaddd ymm13,ymm13,ymm13
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((64-128))+rax]
+
+ vpslld ymm7,ymm2,5
+ vpaddd ymm1,ymm1,ymm15
+ vpxor ymm5,ymm0,ymm3
+ vmovdqa YMMWORD[(480-256-128)+rbx],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpxor ymm14,ymm14,YMMWORD[((256-256-128))+rbx]
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm1,ymm1,ymm5
+ vpsrld ymm9,ymm14,31
+ vpaddd ymm14,ymm14,ymm14
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((96-128))+rax]
+
+ vpslld ymm7,ymm1,5
+ vpaddd ymm0,ymm0,ymm15
+ vpxor ymm5,ymm4,ymm2
+ vmovdqa YMMWORD[(0-128)+rax],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpxor ymm10,ymm10,YMMWORD[((288-256-128))+rbx]
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm0,ymm0,ymm5
+ vpsrld ymm9,ymm10,31
+ vpaddd ymm10,ymm10,ymm10
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((128-128))+rax]
+
+ vpslld ymm7,ymm0,5
+ vpaddd ymm4,ymm4,ymm15
+ vpxor ymm5,ymm3,ymm1
+ vmovdqa YMMWORD[(32-128)+rax],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpxor ymm11,ymm11,YMMWORD[((320-256-128))+rbx]
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm4,ymm4,ymm5
+ vpsrld ymm9,ymm11,31
+ vpaddd ymm11,ymm11,ymm11
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((160-128))+rax]
+
+ vpslld ymm7,ymm4,5
+ vpaddd ymm3,ymm3,ymm15
+ vpxor ymm5,ymm2,ymm0
+ vmovdqa YMMWORD[(64-128)+rax],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpxor ymm12,ymm12,YMMWORD[((352-256-128))+rbx]
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm3,ymm3,ymm5
+ vpsrld ymm9,ymm12,31
+ vpaddd ymm12,ymm12,ymm12
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((192-128))+rax]
+
+ vpslld ymm7,ymm3,5
+ vpaddd ymm2,ymm2,ymm15
+ vpxor ymm5,ymm1,ymm4
+ vmovdqa YMMWORD[(96-128)+rax],ymm12
+ vpaddd ymm2,ymm2,ymm12
+ vpxor ymm13,ymm13,YMMWORD[((384-256-128))+rbx]
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm2,ymm2,ymm5
+ vpsrld ymm9,ymm13,31
+ vpaddd ymm13,ymm13,ymm13
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((224-128))+rax]
+
+ vpslld ymm7,ymm2,5
+ vpaddd ymm1,ymm1,ymm15
+ vpxor ymm5,ymm0,ymm3
+ vmovdqa YMMWORD[(128-128)+rax],ymm13
+ vpaddd ymm1,ymm1,ymm13
+ vpxor ymm14,ymm14,YMMWORD[((416-256-128))+rbx]
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm1,ymm1,ymm5
+ vpsrld ymm9,ymm14,31
+ vpaddd ymm14,ymm14,ymm14
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((256-256-128))+rbx]
+
+ vpslld ymm7,ymm1,5
+ vpaddd ymm0,ymm0,ymm15
+ vpxor ymm5,ymm4,ymm2
+ vmovdqa YMMWORD[(160-128)+rax],ymm14
+ vpaddd ymm0,ymm0,ymm14
+ vpxor ymm10,ymm10,YMMWORD[((448-256-128))+rbx]
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm0,ymm0,ymm5
+ vpsrld ymm9,ymm10,31
+ vpaddd ymm10,ymm10,ymm10
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((288-256-128))+rbx]
+
+ vpslld ymm7,ymm0,5
+ vpaddd ymm4,ymm4,ymm15
+ vpxor ymm5,ymm3,ymm1
+ vmovdqa YMMWORD[(192-128)+rax],ymm10
+ vpaddd ymm4,ymm4,ymm10
+ vpxor ymm11,ymm11,YMMWORD[((480-256-128))+rbx]
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm4,ymm4,ymm5
+ vpsrld ymm9,ymm11,31
+ vpaddd ymm11,ymm11,ymm11
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((320-256-128))+rbx]
+
+ vpslld ymm7,ymm4,5
+ vpaddd ymm3,ymm3,ymm15
+ vpxor ymm5,ymm2,ymm0
+ vmovdqa YMMWORD[(224-128)+rax],ymm11
+ vpaddd ymm3,ymm3,ymm11
+ vpxor ymm12,ymm12,YMMWORD[((0-128))+rax]
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm3,ymm3,ymm5
+ vpsrld ymm9,ymm12,31
+ vpaddd ymm12,ymm12,ymm12
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((352-256-128))+rbx]
+
+ vpslld ymm7,ymm3,5
+ vpaddd ymm2,ymm2,ymm15
+ vpxor ymm5,ymm1,ymm4
+ vpaddd ymm2,ymm2,ymm12
+ vpxor ymm13,ymm13,YMMWORD[((32-128))+rax]
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm2,ymm2,ymm5
+ vpsrld ymm9,ymm13,31
+ vpaddd ymm13,ymm13,ymm13
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((384-256-128))+rbx]
+
+ vpslld ymm7,ymm2,5
+ vpaddd ymm1,ymm1,ymm15
+ vpxor ymm5,ymm0,ymm3
+ vpaddd ymm1,ymm1,ymm13
+ vpxor ymm14,ymm14,YMMWORD[((64-128))+rax]
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm1,ymm1,ymm5
+ vpsrld ymm9,ymm14,31
+ vpaddd ymm14,ymm14,ymm14
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpxor ymm10,ymm10,ymm12
+ vmovdqa ymm12,YMMWORD[((416-256-128))+rbx]
+
+ vpslld ymm7,ymm1,5
+ vpaddd ymm0,ymm0,ymm15
+ vpxor ymm5,ymm4,ymm2
+ vpaddd ymm0,ymm0,ymm14
+ vpxor ymm10,ymm10,YMMWORD[((96-128))+rax]
+ vpsrld ymm8,ymm1,27
+ vpxor ymm5,ymm5,ymm3
+ vpxor ymm10,ymm10,ymm12
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm0,ymm0,ymm5
+ vpsrld ymm9,ymm10,31
+ vpaddd ymm10,ymm10,ymm10
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm10,ymm10,ymm9
+ vpor ymm2,ymm2,ymm6
+ vpxor ymm11,ymm11,ymm13
+ vmovdqa ymm13,YMMWORD[((448-256-128))+rbx]
+
+ vpslld ymm7,ymm0,5
+ vpaddd ymm4,ymm4,ymm15
+ vpxor ymm5,ymm3,ymm1
+ vpaddd ymm4,ymm4,ymm10
+ vpxor ymm11,ymm11,YMMWORD[((128-128))+rax]
+ vpsrld ymm8,ymm0,27
+ vpxor ymm5,ymm5,ymm2
+ vpxor ymm11,ymm11,ymm13
+
+ vpslld ymm6,ymm1,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm4,ymm4,ymm5
+ vpsrld ymm9,ymm11,31
+ vpaddd ymm11,ymm11,ymm11
+
+ vpsrld ymm1,ymm1,2
+ vpaddd ymm4,ymm4,ymm7
+ vpor ymm11,ymm11,ymm9
+ vpor ymm1,ymm1,ymm6
+ vpxor ymm12,ymm12,ymm14
+ vmovdqa ymm14,YMMWORD[((480-256-128))+rbx]
+
+ vpslld ymm7,ymm4,5
+ vpaddd ymm3,ymm3,ymm15
+ vpxor ymm5,ymm2,ymm0
+ vpaddd ymm3,ymm3,ymm11
+ vpxor ymm12,ymm12,YMMWORD[((160-128))+rax]
+ vpsrld ymm8,ymm4,27
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm12,ymm12,ymm14
+
+ vpslld ymm6,ymm0,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm3,ymm3,ymm5
+ vpsrld ymm9,ymm12,31
+ vpaddd ymm12,ymm12,ymm12
+
+ vpsrld ymm0,ymm0,2
+ vpaddd ymm3,ymm3,ymm7
+ vpor ymm12,ymm12,ymm9
+ vpor ymm0,ymm0,ymm6
+ vpxor ymm13,ymm13,ymm10
+ vmovdqa ymm10,YMMWORD[((0-128))+rax]
+
+ vpslld ymm7,ymm3,5
+ vpaddd ymm2,ymm2,ymm15
+ vpxor ymm5,ymm1,ymm4
+ vpaddd ymm2,ymm2,ymm12
+ vpxor ymm13,ymm13,YMMWORD[((192-128))+rax]
+ vpsrld ymm8,ymm3,27
+ vpxor ymm5,ymm5,ymm0
+ vpxor ymm13,ymm13,ymm10
+
+ vpslld ymm6,ymm4,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm2,ymm2,ymm5
+ vpsrld ymm9,ymm13,31
+ vpaddd ymm13,ymm13,ymm13
+
+ vpsrld ymm4,ymm4,2
+ vpaddd ymm2,ymm2,ymm7
+ vpor ymm13,ymm13,ymm9
+ vpor ymm4,ymm4,ymm6
+ vpxor ymm14,ymm14,ymm11
+ vmovdqa ymm11,YMMWORD[((32-128))+rax]
+
+ vpslld ymm7,ymm2,5
+ vpaddd ymm1,ymm1,ymm15
+ vpxor ymm5,ymm0,ymm3
+ vpaddd ymm1,ymm1,ymm13
+ vpxor ymm14,ymm14,YMMWORD[((224-128))+rax]
+ vpsrld ymm8,ymm2,27
+ vpxor ymm5,ymm5,ymm4
+ vpxor ymm14,ymm14,ymm11
+
+ vpslld ymm6,ymm3,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm1,ymm1,ymm5
+ vpsrld ymm9,ymm14,31
+ vpaddd ymm14,ymm14,ymm14
+
+ vpsrld ymm3,ymm3,2
+ vpaddd ymm1,ymm1,ymm7
+ vpor ymm14,ymm14,ymm9
+ vpor ymm3,ymm3,ymm6
+ vpslld ymm7,ymm1,5
+ vpaddd ymm0,ymm0,ymm15
+ vpxor ymm5,ymm4,ymm2
+
+ vpsrld ymm8,ymm1,27
+ vpaddd ymm0,ymm0,ymm14
+ vpxor ymm5,ymm5,ymm3
+
+ vpslld ymm6,ymm2,30
+ vpor ymm7,ymm7,ymm8
+ vpaddd ymm0,ymm0,ymm5
+
+ vpsrld ymm2,ymm2,2
+ vpaddd ymm0,ymm0,ymm7
+ vpor ymm2,ymm2,ymm6
+ mov ecx,1
+ lea rbx,[512+rsp]
+ cmp ecx,DWORD[rbx]
+ cmovge r12,rbp
+ cmp ecx,DWORD[4+rbx]
+ cmovge r13,rbp
+ cmp ecx,DWORD[8+rbx]
+ cmovge r14,rbp
+ cmp ecx,DWORD[12+rbx]
+ cmovge r15,rbp
+ cmp ecx,DWORD[16+rbx]
+ cmovge r8,rbp
+ cmp ecx,DWORD[20+rbx]
+ cmovge r9,rbp
+ cmp ecx,DWORD[24+rbx]
+ cmovge r10,rbp
+ cmp ecx,DWORD[28+rbx]
+ cmovge r11,rbp
+ vmovdqu ymm5,YMMWORD[rbx]
+ vpxor ymm7,ymm7,ymm7
+ vmovdqa ymm6,ymm5
+ vpcmpgtd ymm6,ymm6,ymm7
+ vpaddd ymm5,ymm5,ymm6
+
+ vpand ymm0,ymm0,ymm6
+ vpand ymm1,ymm1,ymm6
+ vpaddd ymm0,ymm0,YMMWORD[rdi]
+ vpand ymm2,ymm2,ymm6
+ vpaddd ymm1,ymm1,YMMWORD[32+rdi]
+ vpand ymm3,ymm3,ymm6
+ vpaddd ymm2,ymm2,YMMWORD[64+rdi]
+ vpand ymm4,ymm4,ymm6
+ vpaddd ymm3,ymm3,YMMWORD[96+rdi]
+ vpaddd ymm4,ymm4,YMMWORD[128+rdi]
+ vmovdqu YMMWORD[rdi],ymm0
+ vmovdqu YMMWORD[32+rdi],ymm1
+ vmovdqu YMMWORD[64+rdi],ymm2
+ vmovdqu YMMWORD[96+rdi],ymm3
+ vmovdqu YMMWORD[128+rdi],ymm4
+
+ vmovdqu YMMWORD[rbx],ymm5
+ lea rbx,[((256+128))+rsp]
+ vmovdqu ymm9,YMMWORD[96+rbp]
+ dec edx
+ jnz NEAR $L$oop_avx2
+
+
+
+
+
+
+
+$L$done_avx2:
+ mov rax,QWORD[544+rsp]
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+ movaps xmm13,XMMWORD[((-104))+rax]
+ movaps xmm14,XMMWORD[((-88))+rax]
+ movaps xmm15,XMMWORD[((-72))+rax]
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$epilogue_avx2:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha1_multi_block_avx2:
+
+ALIGN 256
+ DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999
+ DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999
+K_XX_XX:
+ DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+ DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+ DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+ DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+ DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+ DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+DB 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0
+DB 83,72,65,49,32,109,117,108,116,105,45,98,108,111,99,107
+DB 32,116,114,97,110,115,102,111,114,109,32,102,111,114,32,120
+DB 56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77
+DB 83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110
+DB 115,115,108,46,111,114,103,62,0
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ mov rax,QWORD[272+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+
+ lea rsi,[((-24-160))+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 16
+avx2_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ mov rax,QWORD[544+r8]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+ lea rsi,[((-56-160))+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+ jmp NEAR $L$in_prologue
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_sha1_multi_block wrt ..imagebase
+ DD $L$SEH_end_sha1_multi_block wrt ..imagebase
+ DD $L$SEH_info_sha1_multi_block wrt ..imagebase
+ DD $L$SEH_begin_sha1_multi_block_shaext wrt ..imagebase
+ DD $L$SEH_end_sha1_multi_block_shaext wrt ..imagebase
+ DD $L$SEH_info_sha1_multi_block_shaext wrt ..imagebase
+ DD $L$SEH_begin_sha1_multi_block_avx wrt ..imagebase
+ DD $L$SEH_end_sha1_multi_block_avx wrt ..imagebase
+ DD $L$SEH_info_sha1_multi_block_avx wrt ..imagebase
+ DD $L$SEH_begin_sha1_multi_block_avx2 wrt ..imagebase
+ DD $L$SEH_end_sha1_multi_block_avx2 wrt ..imagebase
+ DD $L$SEH_info_sha1_multi_block_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_sha1_multi_block:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$body wrt ..imagebase,$L$epilogue wrt ..imagebase
+$L$SEH_info_sha1_multi_block_shaext:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$body_shaext wrt ..imagebase,$L$epilogue_shaext wrt ..imagebase
+$L$SEH_info_sha1_multi_block_avx:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$body_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha1_multi_block_avx2:
+DB 9,0,0,0
+ DD avx2_handler wrt ..imagebase
+ DD $L$body_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm
new file mode 100644
index 0000000000..3a7655b27f
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm
@@ -0,0 +1,5773 @@
+; Copyright 2006-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN OPENSSL_ia32cap_P
+
+global sha1_block_data_order
+
+ALIGN 16
+sha1_block_data_order:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha1_block_data_order:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ mov r9d,DWORD[((OPENSSL_ia32cap_P+0))]
+ mov r8d,DWORD[((OPENSSL_ia32cap_P+4))]
+ mov r10d,DWORD[((OPENSSL_ia32cap_P+8))]
+ test r8d,512
+ jz NEAR $L$ialu
+ test r10d,536870912
+ jnz NEAR _shaext_shortcut
+ and r10d,296
+ cmp r10d,296
+ je NEAR _avx2_shortcut
+ and r8d,268435456
+ and r9d,1073741824
+ or r8d,r9d
+ cmp r8d,1342177280
+ je NEAR _avx_shortcut
+ jmp NEAR _ssse3_shortcut
+
+ALIGN 16
+$L$ialu:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ mov r8,rdi
+ sub rsp,72
+ mov r9,rsi
+ and rsp,-64
+ mov r10,rdx
+ mov QWORD[64+rsp],rax
+
+$L$prologue:
+
+ mov esi,DWORD[r8]
+ mov edi,DWORD[4+r8]
+ mov r11d,DWORD[8+r8]
+ mov r12d,DWORD[12+r8]
+ mov r13d,DWORD[16+r8]
+ jmp NEAR $L$loop
+
+ALIGN 16
+$L$loop:
+ mov edx,DWORD[r9]
+ bswap edx
+ mov ebp,DWORD[4+r9]
+ mov eax,r12d
+ mov DWORD[rsp],edx
+ mov ecx,esi
+ bswap ebp
+ xor eax,r11d
+ rol ecx,5
+ and eax,edi
+ lea r13d,[1518500249+r13*1+rdx]
+ add r13d,ecx
+ xor eax,r12d
+ rol edi,30
+ add r13d,eax
+ mov r14d,DWORD[8+r9]
+ mov eax,r11d
+ mov DWORD[4+rsp],ebp
+ mov ecx,r13d
+ bswap r14d
+ xor eax,edi
+ rol ecx,5
+ and eax,esi
+ lea r12d,[1518500249+r12*1+rbp]
+ add r12d,ecx
+ xor eax,r11d
+ rol esi,30
+ add r12d,eax
+ mov edx,DWORD[12+r9]
+ mov eax,edi
+ mov DWORD[8+rsp],r14d
+ mov ecx,r12d
+ bswap edx
+ xor eax,esi
+ rol ecx,5
+ and eax,r13d
+ lea r11d,[1518500249+r11*1+r14]
+ add r11d,ecx
+ xor eax,edi
+ rol r13d,30
+ add r11d,eax
+ mov ebp,DWORD[16+r9]
+ mov eax,esi
+ mov DWORD[12+rsp],edx
+ mov ecx,r11d
+ bswap ebp
+ xor eax,r13d
+ rol ecx,5
+ and eax,r12d
+ lea edi,[1518500249+rdi*1+rdx]
+ add edi,ecx
+ xor eax,esi
+ rol r12d,30
+ add edi,eax
+ mov r14d,DWORD[20+r9]
+ mov eax,r13d
+ mov DWORD[16+rsp],ebp
+ mov ecx,edi
+ bswap r14d
+ xor eax,r12d
+ rol ecx,5
+ and eax,r11d
+ lea esi,[1518500249+rsi*1+rbp]
+ add esi,ecx
+ xor eax,r13d
+ rol r11d,30
+ add esi,eax
+ mov edx,DWORD[24+r9]
+ mov eax,r12d
+ mov DWORD[20+rsp],r14d
+ mov ecx,esi
+ bswap edx
+ xor eax,r11d
+ rol ecx,5
+ and eax,edi
+ lea r13d,[1518500249+r13*1+r14]
+ add r13d,ecx
+ xor eax,r12d
+ rol edi,30
+ add r13d,eax
+ mov ebp,DWORD[28+r9]
+ mov eax,r11d
+ mov DWORD[24+rsp],edx
+ mov ecx,r13d
+ bswap ebp
+ xor eax,edi
+ rol ecx,5
+ and eax,esi
+ lea r12d,[1518500249+r12*1+rdx]
+ add r12d,ecx
+ xor eax,r11d
+ rol esi,30
+ add r12d,eax
+ mov r14d,DWORD[32+r9]
+ mov eax,edi
+ mov DWORD[28+rsp],ebp
+ mov ecx,r12d
+ bswap r14d
+ xor eax,esi
+ rol ecx,5
+ and eax,r13d
+ lea r11d,[1518500249+r11*1+rbp]
+ add r11d,ecx
+ xor eax,edi
+ rol r13d,30
+ add r11d,eax
+ mov edx,DWORD[36+r9]
+ mov eax,esi
+ mov DWORD[32+rsp],r14d
+ mov ecx,r11d
+ bswap edx
+ xor eax,r13d
+ rol ecx,5
+ and eax,r12d
+ lea edi,[1518500249+rdi*1+r14]
+ add edi,ecx
+ xor eax,esi
+ rol r12d,30
+ add edi,eax
+ mov ebp,DWORD[40+r9]
+ mov eax,r13d
+ mov DWORD[36+rsp],edx
+ mov ecx,edi
+ bswap ebp
+ xor eax,r12d
+ rol ecx,5
+ and eax,r11d
+ lea esi,[1518500249+rsi*1+rdx]
+ add esi,ecx
+ xor eax,r13d
+ rol r11d,30
+ add esi,eax
+ mov r14d,DWORD[44+r9]
+ mov eax,r12d
+ mov DWORD[40+rsp],ebp
+ mov ecx,esi
+ bswap r14d
+ xor eax,r11d
+ rol ecx,5
+ and eax,edi
+ lea r13d,[1518500249+r13*1+rbp]
+ add r13d,ecx
+ xor eax,r12d
+ rol edi,30
+ add r13d,eax
+ mov edx,DWORD[48+r9]
+ mov eax,r11d
+ mov DWORD[44+rsp],r14d
+ mov ecx,r13d
+ bswap edx
+ xor eax,edi
+ rol ecx,5
+ and eax,esi
+ lea r12d,[1518500249+r12*1+r14]
+ add r12d,ecx
+ xor eax,r11d
+ rol esi,30
+ add r12d,eax
+ mov ebp,DWORD[52+r9]
+ mov eax,edi
+ mov DWORD[48+rsp],edx
+ mov ecx,r12d
+ bswap ebp
+ xor eax,esi
+ rol ecx,5
+ and eax,r13d
+ lea r11d,[1518500249+r11*1+rdx]
+ add r11d,ecx
+ xor eax,edi
+ rol r13d,30
+ add r11d,eax
+ mov r14d,DWORD[56+r9]
+ mov eax,esi
+ mov DWORD[52+rsp],ebp
+ mov ecx,r11d
+ bswap r14d
+ xor eax,r13d
+ rol ecx,5
+ and eax,r12d
+ lea edi,[1518500249+rdi*1+rbp]
+ add edi,ecx
+ xor eax,esi
+ rol r12d,30
+ add edi,eax
+ mov edx,DWORD[60+r9]
+ mov eax,r13d
+ mov DWORD[56+rsp],r14d
+ mov ecx,edi
+ bswap edx
+ xor eax,r12d
+ rol ecx,5
+ and eax,r11d
+ lea esi,[1518500249+rsi*1+r14]
+ add esi,ecx
+ xor eax,r13d
+ rol r11d,30
+ add esi,eax
+ xor ebp,DWORD[rsp]
+ mov eax,r12d
+ mov DWORD[60+rsp],edx
+ mov ecx,esi
+ xor ebp,DWORD[8+rsp]
+ xor eax,r11d
+ rol ecx,5
+ xor ebp,DWORD[32+rsp]
+ and eax,edi
+ lea r13d,[1518500249+r13*1+rdx]
+ rol edi,30
+ xor eax,r12d
+ add r13d,ecx
+ rol ebp,1
+ add r13d,eax
+ xor r14d,DWORD[4+rsp]
+ mov eax,r11d
+ mov DWORD[rsp],ebp
+ mov ecx,r13d
+ xor r14d,DWORD[12+rsp]
+ xor eax,edi
+ rol ecx,5
+ xor r14d,DWORD[36+rsp]
+ and eax,esi
+ lea r12d,[1518500249+r12*1+rbp]
+ rol esi,30
+ xor eax,r11d
+ add r12d,ecx
+ rol r14d,1
+ add r12d,eax
+ xor edx,DWORD[8+rsp]
+ mov eax,edi
+ mov DWORD[4+rsp],r14d
+ mov ecx,r12d
+ xor edx,DWORD[16+rsp]
+ xor eax,esi
+ rol ecx,5
+ xor edx,DWORD[40+rsp]
+ and eax,r13d
+ lea r11d,[1518500249+r11*1+r14]
+ rol r13d,30
+ xor eax,edi
+ add r11d,ecx
+ rol edx,1
+ add r11d,eax
+ xor ebp,DWORD[12+rsp]
+ mov eax,esi
+ mov DWORD[8+rsp],edx
+ mov ecx,r11d
+ xor ebp,DWORD[20+rsp]
+ xor eax,r13d
+ rol ecx,5
+ xor ebp,DWORD[44+rsp]
+ and eax,r12d
+ lea edi,[1518500249+rdi*1+rdx]
+ rol r12d,30
+ xor eax,esi
+ add edi,ecx
+ rol ebp,1
+ add edi,eax
+ xor r14d,DWORD[16+rsp]
+ mov eax,r13d
+ mov DWORD[12+rsp],ebp
+ mov ecx,edi
+ xor r14d,DWORD[24+rsp]
+ xor eax,r12d
+ rol ecx,5
+ xor r14d,DWORD[48+rsp]
+ and eax,r11d
+ lea esi,[1518500249+rsi*1+rbp]
+ rol r11d,30
+ xor eax,r13d
+ add esi,ecx
+ rol r14d,1
+ add esi,eax
+ xor edx,DWORD[20+rsp]
+ mov eax,edi
+ mov DWORD[16+rsp],r14d
+ mov ecx,esi
+ xor edx,DWORD[28+rsp]
+ xor eax,r12d
+ rol ecx,5
+ xor edx,DWORD[52+rsp]
+ lea r13d,[1859775393+r13*1+r14]
+ xor eax,r11d
+ add r13d,ecx
+ rol edi,30
+ add r13d,eax
+ rol edx,1
+ xor ebp,DWORD[24+rsp]
+ mov eax,esi
+ mov DWORD[20+rsp],edx
+ mov ecx,r13d
+ xor ebp,DWORD[32+rsp]
+ xor eax,r11d
+ rol ecx,5
+ xor ebp,DWORD[56+rsp]
+ lea r12d,[1859775393+r12*1+rdx]
+ xor eax,edi
+ add r12d,ecx
+ rol esi,30
+ add r12d,eax
+ rol ebp,1
+ xor r14d,DWORD[28+rsp]
+ mov eax,r13d
+ mov DWORD[24+rsp],ebp
+ mov ecx,r12d
+ xor r14d,DWORD[36+rsp]
+ xor eax,edi
+ rol ecx,5
+ xor r14d,DWORD[60+rsp]
+ lea r11d,[1859775393+r11*1+rbp]
+ xor eax,esi
+ add r11d,ecx
+ rol r13d,30
+ add r11d,eax
+ rol r14d,1
+ xor edx,DWORD[32+rsp]
+ mov eax,r12d
+ mov DWORD[28+rsp],r14d
+ mov ecx,r11d
+ xor edx,DWORD[40+rsp]
+ xor eax,esi
+ rol ecx,5
+ xor edx,DWORD[rsp]
+ lea edi,[1859775393+rdi*1+r14]
+ xor eax,r13d
+ add edi,ecx
+ rol r12d,30
+ add edi,eax
+ rol edx,1
+ xor ebp,DWORD[36+rsp]
+ mov eax,r11d
+ mov DWORD[32+rsp],edx
+ mov ecx,edi
+ xor ebp,DWORD[44+rsp]
+ xor eax,r13d
+ rol ecx,5
+ xor ebp,DWORD[4+rsp]
+ lea esi,[1859775393+rsi*1+rdx]
+ xor eax,r12d
+ add esi,ecx
+ rol r11d,30
+ add esi,eax
+ rol ebp,1
+ xor r14d,DWORD[40+rsp]
+ mov eax,edi
+ mov DWORD[36+rsp],ebp
+ mov ecx,esi
+ xor r14d,DWORD[48+rsp]
+ xor eax,r12d
+ rol ecx,5
+ xor r14d,DWORD[8+rsp]
+ lea r13d,[1859775393+r13*1+rbp]
+ xor eax,r11d
+ add r13d,ecx
+ rol edi,30
+ add r13d,eax
+ rol r14d,1
+ xor edx,DWORD[44+rsp]
+ mov eax,esi
+ mov DWORD[40+rsp],r14d
+ mov ecx,r13d
+ xor edx,DWORD[52+rsp]
+ xor eax,r11d
+ rol ecx,5
+ xor edx,DWORD[12+rsp]
+ lea r12d,[1859775393+r12*1+r14]
+ xor eax,edi
+ add r12d,ecx
+ rol esi,30
+ add r12d,eax
+ rol edx,1
+ xor ebp,DWORD[48+rsp]
+ mov eax,r13d
+ mov DWORD[44+rsp],edx
+ mov ecx,r12d
+ xor ebp,DWORD[56+rsp]
+ xor eax,edi
+ rol ecx,5
+ xor ebp,DWORD[16+rsp]
+ lea r11d,[1859775393+r11*1+rdx]
+ xor eax,esi
+ add r11d,ecx
+ rol r13d,30
+ add r11d,eax
+ rol ebp,1
+ xor r14d,DWORD[52+rsp]
+ mov eax,r12d
+ mov DWORD[48+rsp],ebp
+ mov ecx,r11d
+ xor r14d,DWORD[60+rsp]
+ xor eax,esi
+ rol ecx,5
+ xor r14d,DWORD[20+rsp]
+ lea edi,[1859775393+rdi*1+rbp]
+ xor eax,r13d
+ add edi,ecx
+ rol r12d,30
+ add edi,eax
+ rol r14d,1
+ xor edx,DWORD[56+rsp]
+ mov eax,r11d
+ mov DWORD[52+rsp],r14d
+ mov ecx,edi
+ xor edx,DWORD[rsp]
+ xor eax,r13d
+ rol ecx,5
+ xor edx,DWORD[24+rsp]
+ lea esi,[1859775393+rsi*1+r14]
+ xor eax,r12d
+ add esi,ecx
+ rol r11d,30
+ add esi,eax
+ rol edx,1
+ xor ebp,DWORD[60+rsp]
+ mov eax,edi
+ mov DWORD[56+rsp],edx
+ mov ecx,esi
+ xor ebp,DWORD[4+rsp]
+ xor eax,r12d
+ rol ecx,5
+ xor ebp,DWORD[28+rsp]
+ lea r13d,[1859775393+r13*1+rdx]
+ xor eax,r11d
+ add r13d,ecx
+ rol edi,30
+ add r13d,eax
+ rol ebp,1
+ xor r14d,DWORD[rsp]
+ mov eax,esi
+ mov DWORD[60+rsp],ebp
+ mov ecx,r13d
+ xor r14d,DWORD[8+rsp]
+ xor eax,r11d
+ rol ecx,5
+ xor r14d,DWORD[32+rsp]
+ lea r12d,[1859775393+r12*1+rbp]
+ xor eax,edi
+ add r12d,ecx
+ rol esi,30
+ add r12d,eax
+ rol r14d,1
+ xor edx,DWORD[4+rsp]
+ mov eax,r13d
+ mov DWORD[rsp],r14d
+ mov ecx,r12d
+ xor edx,DWORD[12+rsp]
+ xor eax,edi
+ rol ecx,5
+ xor edx,DWORD[36+rsp]
+ lea r11d,[1859775393+r11*1+r14]
+ xor eax,esi
+ add r11d,ecx
+ rol r13d,30
+ add r11d,eax
+ rol edx,1
+ xor ebp,DWORD[8+rsp]
+ mov eax,r12d
+ mov DWORD[4+rsp],edx
+ mov ecx,r11d
+ xor ebp,DWORD[16+rsp]
+ xor eax,esi
+ rol ecx,5
+ xor ebp,DWORD[40+rsp]
+ lea edi,[1859775393+rdi*1+rdx]
+ xor eax,r13d
+ add edi,ecx
+ rol r12d,30
+ add edi,eax
+ rol ebp,1
+ xor r14d,DWORD[12+rsp]
+ mov eax,r11d
+ mov DWORD[8+rsp],ebp
+ mov ecx,edi
+ xor r14d,DWORD[20+rsp]
+ xor eax,r13d
+ rol ecx,5
+ xor r14d,DWORD[44+rsp]
+ lea esi,[1859775393+rsi*1+rbp]
+ xor eax,r12d
+ add esi,ecx
+ rol r11d,30
+ add esi,eax
+ rol r14d,1
+ xor edx,DWORD[16+rsp]
+ mov eax,edi
+ mov DWORD[12+rsp],r14d
+ mov ecx,esi
+ xor edx,DWORD[24+rsp]
+ xor eax,r12d
+ rol ecx,5
+ xor edx,DWORD[48+rsp]
+ lea r13d,[1859775393+r13*1+r14]
+ xor eax,r11d
+ add r13d,ecx
+ rol edi,30
+ add r13d,eax
+ rol edx,1
+ xor ebp,DWORD[20+rsp]
+ mov eax,esi
+ mov DWORD[16+rsp],edx
+ mov ecx,r13d
+ xor ebp,DWORD[28+rsp]
+ xor eax,r11d
+ rol ecx,5
+ xor ebp,DWORD[52+rsp]
+ lea r12d,[1859775393+r12*1+rdx]
+ xor eax,edi
+ add r12d,ecx
+ rol esi,30
+ add r12d,eax
+ rol ebp,1
+ xor r14d,DWORD[24+rsp]
+ mov eax,r13d
+ mov DWORD[20+rsp],ebp
+ mov ecx,r12d
+ xor r14d,DWORD[32+rsp]
+ xor eax,edi
+ rol ecx,5
+ xor r14d,DWORD[56+rsp]
+ lea r11d,[1859775393+r11*1+rbp]
+ xor eax,esi
+ add r11d,ecx
+ rol r13d,30
+ add r11d,eax
+ rol r14d,1
+ xor edx,DWORD[28+rsp]
+ mov eax,r12d
+ mov DWORD[24+rsp],r14d
+ mov ecx,r11d
+ xor edx,DWORD[36+rsp]
+ xor eax,esi
+ rol ecx,5
+ xor edx,DWORD[60+rsp]
+ lea edi,[1859775393+rdi*1+r14]
+ xor eax,r13d
+ add edi,ecx
+ rol r12d,30
+ add edi,eax
+ rol edx,1
+ xor ebp,DWORD[32+rsp]
+ mov eax,r11d
+ mov DWORD[28+rsp],edx
+ mov ecx,edi
+ xor ebp,DWORD[40+rsp]
+ xor eax,r13d
+ rol ecx,5
+ xor ebp,DWORD[rsp]
+ lea esi,[1859775393+rsi*1+rdx]
+ xor eax,r12d
+ add esi,ecx
+ rol r11d,30
+ add esi,eax
+ rol ebp,1
+ xor r14d,DWORD[36+rsp]
+ mov eax,r12d
+ mov DWORD[32+rsp],ebp
+ mov ebx,r12d
+ xor r14d,DWORD[44+rsp]
+ and eax,r11d
+ mov ecx,esi
+ xor r14d,DWORD[4+rsp]
+ lea r13d,[((-1894007588))+r13*1+rbp]
+ xor ebx,r11d
+ rol ecx,5
+ add r13d,eax
+ rol r14d,1
+ and ebx,edi
+ add r13d,ecx
+ rol edi,30
+ add r13d,ebx
+ xor edx,DWORD[40+rsp]
+ mov eax,r11d
+ mov DWORD[36+rsp],r14d
+ mov ebx,r11d
+ xor edx,DWORD[48+rsp]
+ and eax,edi
+ mov ecx,r13d
+ xor edx,DWORD[8+rsp]
+ lea r12d,[((-1894007588))+r12*1+r14]
+ xor ebx,edi
+ rol ecx,5
+ add r12d,eax
+ rol edx,1
+ and ebx,esi
+ add r12d,ecx
+ rol esi,30
+ add r12d,ebx
+ xor ebp,DWORD[44+rsp]
+ mov eax,edi
+ mov DWORD[40+rsp],edx
+ mov ebx,edi
+ xor ebp,DWORD[52+rsp]
+ and eax,esi
+ mov ecx,r12d
+ xor ebp,DWORD[12+rsp]
+ lea r11d,[((-1894007588))+r11*1+rdx]
+ xor ebx,esi
+ rol ecx,5
+ add r11d,eax
+ rol ebp,1
+ and ebx,r13d
+ add r11d,ecx
+ rol r13d,30
+ add r11d,ebx
+ xor r14d,DWORD[48+rsp]
+ mov eax,esi
+ mov DWORD[44+rsp],ebp
+ mov ebx,esi
+ xor r14d,DWORD[56+rsp]
+ and eax,r13d
+ mov ecx,r11d
+ xor r14d,DWORD[16+rsp]
+ lea edi,[((-1894007588))+rdi*1+rbp]
+ xor ebx,r13d
+ rol ecx,5
+ add edi,eax
+ rol r14d,1
+ and ebx,r12d
+ add edi,ecx
+ rol r12d,30
+ add edi,ebx
+ xor edx,DWORD[52+rsp]
+ mov eax,r13d
+ mov DWORD[48+rsp],r14d
+ mov ebx,r13d
+ xor edx,DWORD[60+rsp]
+ and eax,r12d
+ mov ecx,edi
+ xor edx,DWORD[20+rsp]
+ lea esi,[((-1894007588))+rsi*1+r14]
+ xor ebx,r12d
+ rol ecx,5
+ add esi,eax
+ rol edx,1
+ and ebx,r11d
+ add esi,ecx
+ rol r11d,30
+ add esi,ebx
+ xor ebp,DWORD[56+rsp]
+ mov eax,r12d
+ mov DWORD[52+rsp],edx
+ mov ebx,r12d
+ xor ebp,DWORD[rsp]
+ and eax,r11d
+ mov ecx,esi
+ xor ebp,DWORD[24+rsp]
+ lea r13d,[((-1894007588))+r13*1+rdx]
+ xor ebx,r11d
+ rol ecx,5
+ add r13d,eax
+ rol ebp,1
+ and ebx,edi
+ add r13d,ecx
+ rol edi,30
+ add r13d,ebx
+ xor r14d,DWORD[60+rsp]
+ mov eax,r11d
+ mov DWORD[56+rsp],ebp
+ mov ebx,r11d
+ xor r14d,DWORD[4+rsp]
+ and eax,edi
+ mov ecx,r13d
+ xor r14d,DWORD[28+rsp]
+ lea r12d,[((-1894007588))+r12*1+rbp]
+ xor ebx,edi
+ rol ecx,5
+ add r12d,eax
+ rol r14d,1
+ and ebx,esi
+ add r12d,ecx
+ rol esi,30
+ add r12d,ebx
+ xor edx,DWORD[rsp]
+ mov eax,edi
+ mov DWORD[60+rsp],r14d
+ mov ebx,edi
+ xor edx,DWORD[8+rsp]
+ and eax,esi
+ mov ecx,r12d
+ xor edx,DWORD[32+rsp]
+ lea r11d,[((-1894007588))+r11*1+r14]
+ xor ebx,esi
+ rol ecx,5
+ add r11d,eax
+ rol edx,1
+ and ebx,r13d
+ add r11d,ecx
+ rol r13d,30
+ add r11d,ebx
+ xor ebp,DWORD[4+rsp]
+ mov eax,esi
+ mov DWORD[rsp],edx
+ mov ebx,esi
+ xor ebp,DWORD[12+rsp]
+ and eax,r13d
+ mov ecx,r11d
+ xor ebp,DWORD[36+rsp]
+ lea edi,[((-1894007588))+rdi*1+rdx]
+ xor ebx,r13d
+ rol ecx,5
+ add edi,eax
+ rol ebp,1
+ and ebx,r12d
+ add edi,ecx
+ rol r12d,30
+ add edi,ebx
+ xor r14d,DWORD[8+rsp]
+ mov eax,r13d
+ mov DWORD[4+rsp],ebp
+ mov ebx,r13d
+ xor r14d,DWORD[16+rsp]
+ and eax,r12d
+ mov ecx,edi
+ xor r14d,DWORD[40+rsp]
+ lea esi,[((-1894007588))+rsi*1+rbp]
+ xor ebx,r12d
+ rol ecx,5
+ add esi,eax
+ rol r14d,1
+ and ebx,r11d
+ add esi,ecx
+ rol r11d,30
+ add esi,ebx
+ xor edx,DWORD[12+rsp]
+ mov eax,r12d
+ mov DWORD[8+rsp],r14d
+ mov ebx,r12d
+ xor edx,DWORD[20+rsp]
+ and eax,r11d
+ mov ecx,esi
+ xor edx,DWORD[44+rsp]
+ lea r13d,[((-1894007588))+r13*1+r14]
+ xor ebx,r11d
+ rol ecx,5
+ add r13d,eax
+ rol edx,1
+ and ebx,edi
+ add r13d,ecx
+ rol edi,30
+ add r13d,ebx
+ xor ebp,DWORD[16+rsp]
+ mov eax,r11d
+ mov DWORD[12+rsp],edx
+ mov ebx,r11d
+ xor ebp,DWORD[24+rsp]
+ and eax,edi
+ mov ecx,r13d
+ xor ebp,DWORD[48+rsp]
+ lea r12d,[((-1894007588))+r12*1+rdx]
+ xor ebx,edi
+ rol ecx,5
+ add r12d,eax
+ rol ebp,1
+ and ebx,esi
+ add r12d,ecx
+ rol esi,30
+ add r12d,ebx
+ xor r14d,DWORD[20+rsp]
+ mov eax,edi
+ mov DWORD[16+rsp],ebp
+ mov ebx,edi
+ xor r14d,DWORD[28+rsp]
+ and eax,esi
+ mov ecx,r12d
+ xor r14d,DWORD[52+rsp]
+ lea r11d,[((-1894007588))+r11*1+rbp]
+ xor ebx,esi
+ rol ecx,5
+ add r11d,eax
+ rol r14d,1
+ and ebx,r13d
+ add r11d,ecx
+ rol r13d,30
+ add r11d,ebx
+ xor edx,DWORD[24+rsp]
+ mov eax,esi
+ mov DWORD[20+rsp],r14d
+ mov ebx,esi
+ xor edx,DWORD[32+rsp]
+ and eax,r13d
+ mov ecx,r11d
+ xor edx,DWORD[56+rsp]
+ lea edi,[((-1894007588))+rdi*1+r14]
+ xor ebx,r13d
+ rol ecx,5
+ add edi,eax
+ rol edx,1
+ and ebx,r12d
+ add edi,ecx
+ rol r12d,30
+ add edi,ebx
+ xor ebp,DWORD[28+rsp]
+ mov eax,r13d
+ mov DWORD[24+rsp],edx
+ mov ebx,r13d
+ xor ebp,DWORD[36+rsp]
+ and eax,r12d
+ mov ecx,edi
+ xor ebp,DWORD[60+rsp]
+ lea esi,[((-1894007588))+rsi*1+rdx]
+ xor ebx,r12d
+ rol ecx,5
+ add esi,eax
+ rol ebp,1
+ and ebx,r11d
+ add esi,ecx
+ rol r11d,30
+ add esi,ebx
+ xor r14d,DWORD[32+rsp]
+ mov eax,r12d
+ mov DWORD[28+rsp],ebp
+ mov ebx,r12d
+ xor r14d,DWORD[40+rsp]
+ and eax,r11d
+ mov ecx,esi
+ xor r14d,DWORD[rsp]
+ lea r13d,[((-1894007588))+r13*1+rbp]
+ xor ebx,r11d
+ rol ecx,5
+ add r13d,eax
+ rol r14d,1
+ and ebx,edi
+ add r13d,ecx
+ rol edi,30
+ add r13d,ebx
+ xor edx,DWORD[36+rsp]
+ mov eax,r11d
+ mov DWORD[32+rsp],r14d
+ mov ebx,r11d
+ xor edx,DWORD[44+rsp]
+ and eax,edi
+ mov ecx,r13d
+ xor edx,DWORD[4+rsp]
+ lea r12d,[((-1894007588))+r12*1+r14]
+ xor ebx,edi
+ rol ecx,5
+ add r12d,eax
+ rol edx,1
+ and ebx,esi
+ add r12d,ecx
+ rol esi,30
+ add r12d,ebx
+ xor ebp,DWORD[40+rsp]
+ mov eax,edi
+ mov DWORD[36+rsp],edx
+ mov ebx,edi
+ xor ebp,DWORD[48+rsp]
+ and eax,esi
+ mov ecx,r12d
+ xor ebp,DWORD[8+rsp]
+ lea r11d,[((-1894007588))+r11*1+rdx]
+ xor ebx,esi
+ rol ecx,5
+ add r11d,eax
+ rol ebp,1
+ and ebx,r13d
+ add r11d,ecx
+ rol r13d,30
+ add r11d,ebx
+ xor r14d,DWORD[44+rsp]
+ mov eax,esi
+ mov DWORD[40+rsp],ebp
+ mov ebx,esi
+ xor r14d,DWORD[52+rsp]
+ and eax,r13d
+ mov ecx,r11d
+ xor r14d,DWORD[12+rsp]
+ lea edi,[((-1894007588))+rdi*1+rbp]
+ xor ebx,r13d
+ rol ecx,5
+ add edi,eax
+ rol r14d,1
+ and ebx,r12d
+ add edi,ecx
+ rol r12d,30
+ add edi,ebx
+ xor edx,DWORD[48+rsp]
+ mov eax,r13d
+ mov DWORD[44+rsp],r14d
+ mov ebx,r13d
+ xor edx,DWORD[56+rsp]
+ and eax,r12d
+ mov ecx,edi
+ xor edx,DWORD[16+rsp]
+ lea esi,[((-1894007588))+rsi*1+r14]
+ xor ebx,r12d
+ rol ecx,5
+ add esi,eax
+ rol edx,1
+ and ebx,r11d
+ add esi,ecx
+ rol r11d,30
+ add esi,ebx
+ xor ebp,DWORD[52+rsp]
+ mov eax,edi
+ mov DWORD[48+rsp],edx
+ mov ecx,esi
+ xor ebp,DWORD[60+rsp]
+ xor eax,r12d
+ rol ecx,5
+ xor ebp,DWORD[20+rsp]
+ lea r13d,[((-899497514))+r13*1+rdx]
+ xor eax,r11d
+ add r13d,ecx
+ rol edi,30
+ add r13d,eax
+ rol ebp,1
+ xor r14d,DWORD[56+rsp]
+ mov eax,esi
+ mov DWORD[52+rsp],ebp
+ mov ecx,r13d
+ xor r14d,DWORD[rsp]
+ xor eax,r11d
+ rol ecx,5
+ xor r14d,DWORD[24+rsp]
+ lea r12d,[((-899497514))+r12*1+rbp]
+ xor eax,edi
+ add r12d,ecx
+ rol esi,30
+ add r12d,eax
+ rol r14d,1
+ xor edx,DWORD[60+rsp]
+ mov eax,r13d
+ mov DWORD[56+rsp],r14d
+ mov ecx,r12d
+ xor edx,DWORD[4+rsp]
+ xor eax,edi
+ rol ecx,5
+ xor edx,DWORD[28+rsp]
+ lea r11d,[((-899497514))+r11*1+r14]
+ xor eax,esi
+ add r11d,ecx
+ rol r13d,30
+ add r11d,eax
+ rol edx,1
+ xor ebp,DWORD[rsp]
+ mov eax,r12d
+ mov DWORD[60+rsp],edx
+ mov ecx,r11d
+ xor ebp,DWORD[8+rsp]
+ xor eax,esi
+ rol ecx,5
+ xor ebp,DWORD[32+rsp]
+ lea edi,[((-899497514))+rdi*1+rdx]
+ xor eax,r13d
+ add edi,ecx
+ rol r12d,30
+ add edi,eax
+ rol ebp,1
+ xor r14d,DWORD[4+rsp]
+ mov eax,r11d
+ mov DWORD[rsp],ebp
+ mov ecx,edi
+ xor r14d,DWORD[12+rsp]
+ xor eax,r13d
+ rol ecx,5
+ xor r14d,DWORD[36+rsp]
+ lea esi,[((-899497514))+rsi*1+rbp]
+ xor eax,r12d
+ add esi,ecx
+ rol r11d,30
+ add esi,eax
+ rol r14d,1
+ xor edx,DWORD[8+rsp]
+ mov eax,edi
+ mov DWORD[4+rsp],r14d
+ mov ecx,esi
+ xor edx,DWORD[16+rsp]
+ xor eax,r12d
+ rol ecx,5
+ xor edx,DWORD[40+rsp]
+ lea r13d,[((-899497514))+r13*1+r14]
+ xor eax,r11d
+ add r13d,ecx
+ rol edi,30
+ add r13d,eax
+ rol edx,1
+ xor ebp,DWORD[12+rsp]
+ mov eax,esi
+ mov DWORD[8+rsp],edx
+ mov ecx,r13d
+ xor ebp,DWORD[20+rsp]
+ xor eax,r11d
+ rol ecx,5
+ xor ebp,DWORD[44+rsp]
+ lea r12d,[((-899497514))+r12*1+rdx]
+ xor eax,edi
+ add r12d,ecx
+ rol esi,30
+ add r12d,eax
+ rol ebp,1
+ xor r14d,DWORD[16+rsp]
+ mov eax,r13d
+ mov DWORD[12+rsp],ebp
+ mov ecx,r12d
+ xor r14d,DWORD[24+rsp]
+ xor eax,edi
+ rol ecx,5
+ xor r14d,DWORD[48+rsp]
+ lea r11d,[((-899497514))+r11*1+rbp]
+ xor eax,esi
+ add r11d,ecx
+ rol r13d,30
+ add r11d,eax
+ rol r14d,1
+ xor edx,DWORD[20+rsp]
+ mov eax,r12d
+ mov DWORD[16+rsp],r14d
+ mov ecx,r11d
+ xor edx,DWORD[28+rsp]
+ xor eax,esi
+ rol ecx,5
+ xor edx,DWORD[52+rsp]
+ lea edi,[((-899497514))+rdi*1+r14]
+ xor eax,r13d
+ add edi,ecx
+ rol r12d,30
+ add edi,eax
+ rol edx,1
+ xor ebp,DWORD[24+rsp]
+ mov eax,r11d
+ mov DWORD[20+rsp],edx
+ mov ecx,edi
+ xor ebp,DWORD[32+rsp]
+ xor eax,r13d
+ rol ecx,5
+ xor ebp,DWORD[56+rsp]
+ lea esi,[((-899497514))+rsi*1+rdx]
+ xor eax,r12d
+ add esi,ecx
+ rol r11d,30
+ add esi,eax
+ rol ebp,1
+ xor r14d,DWORD[28+rsp]
+ mov eax,edi
+ mov DWORD[24+rsp],ebp
+ mov ecx,esi
+ xor r14d,DWORD[36+rsp]
+ xor eax,r12d
+ rol ecx,5
+ xor r14d,DWORD[60+rsp]
+ lea r13d,[((-899497514))+r13*1+rbp]
+ xor eax,r11d
+ add r13d,ecx
+ rol edi,30
+ add r13d,eax
+ rol r14d,1
+ xor edx,DWORD[32+rsp]
+ mov eax,esi
+ mov DWORD[28+rsp],r14d
+ mov ecx,r13d
+ xor edx,DWORD[40+rsp]
+ xor eax,r11d
+ rol ecx,5
+ xor edx,DWORD[rsp]
+ lea r12d,[((-899497514))+r12*1+r14]
+ xor eax,edi
+ add r12d,ecx
+ rol esi,30
+ add r12d,eax
+ rol edx,1
+ xor ebp,DWORD[36+rsp]
+ mov eax,r13d
+
+ mov ecx,r12d
+ xor ebp,DWORD[44+rsp]
+ xor eax,edi
+ rol ecx,5
+ xor ebp,DWORD[4+rsp]
+ lea r11d,[((-899497514))+r11*1+rdx]
+ xor eax,esi
+ add r11d,ecx
+ rol r13d,30
+ add r11d,eax
+ rol ebp,1
+ xor r14d,DWORD[40+rsp]
+ mov eax,r12d
+
+ mov ecx,r11d
+ xor r14d,DWORD[48+rsp]
+ xor eax,esi
+ rol ecx,5
+ xor r14d,DWORD[8+rsp]
+ lea edi,[((-899497514))+rdi*1+rbp]
+ xor eax,r13d
+ add edi,ecx
+ rol r12d,30
+ add edi,eax
+ rol r14d,1
+ xor edx,DWORD[44+rsp]
+ mov eax,r11d
+
+ mov ecx,edi
+ xor edx,DWORD[52+rsp]
+ xor eax,r13d
+ rol ecx,5
+ xor edx,DWORD[12+rsp]
+ lea esi,[((-899497514))+rsi*1+r14]
+ xor eax,r12d
+ add esi,ecx
+ rol r11d,30
+ add esi,eax
+ rol edx,1
+ xor ebp,DWORD[48+rsp]
+ mov eax,edi
+
+ mov ecx,esi
+ xor ebp,DWORD[56+rsp]
+ xor eax,r12d
+ rol ecx,5
+ xor ebp,DWORD[16+rsp]
+ lea r13d,[((-899497514))+r13*1+rdx]
+ xor eax,r11d
+ add r13d,ecx
+ rol edi,30
+ add r13d,eax
+ rol ebp,1
+ xor r14d,DWORD[52+rsp]
+ mov eax,esi
+
+ mov ecx,r13d
+ xor r14d,DWORD[60+rsp]
+ xor eax,r11d
+ rol ecx,5
+ xor r14d,DWORD[20+rsp]
+ lea r12d,[((-899497514))+r12*1+rbp]
+ xor eax,edi
+ add r12d,ecx
+ rol esi,30
+ add r12d,eax
+ rol r14d,1
+ xor edx,DWORD[56+rsp]
+ mov eax,r13d
+
+ mov ecx,r12d
+ xor edx,DWORD[rsp]
+ xor eax,edi
+ rol ecx,5
+ xor edx,DWORD[24+rsp]
+ lea r11d,[((-899497514))+r11*1+r14]
+ xor eax,esi
+ add r11d,ecx
+ rol r13d,30
+ add r11d,eax
+ rol edx,1
+ xor ebp,DWORD[60+rsp]
+ mov eax,r12d
+
+ mov ecx,r11d
+ xor ebp,DWORD[4+rsp]
+ xor eax,esi
+ rol ecx,5
+ xor ebp,DWORD[28+rsp]
+ lea edi,[((-899497514))+rdi*1+rdx]
+ xor eax,r13d
+ add edi,ecx
+ rol r12d,30
+ add edi,eax
+ rol ebp,1
+ mov eax,r11d
+ mov ecx,edi
+ xor eax,r13d
+ lea esi,[((-899497514))+rsi*1+rbp]
+ rol ecx,5
+ xor eax,r12d
+ add esi,ecx
+ rol r11d,30
+ add esi,eax
+ add esi,DWORD[r8]
+ add edi,DWORD[4+r8]
+ add r11d,DWORD[8+r8]
+ add r12d,DWORD[12+r8]
+ add r13d,DWORD[16+r8]
+ mov DWORD[r8],esi
+ mov DWORD[4+r8],edi
+ mov DWORD[8+r8],r11d
+ mov DWORD[12+r8],r12d
+ mov DWORD[16+r8],r13d
+
+ sub r10,1
+ lea r9,[64+r9]
+ jnz NEAR $L$loop
+
+ mov rsi,QWORD[64+rsp]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha1_block_data_order:
+
+ALIGN 32
+sha1_block_data_order_shaext:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha1_block_data_order_shaext:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+_shaext_shortcut:
+
+ lea rsp,[((-72))+rsp]
+ movaps XMMWORD[(-8-64)+rax],xmm6
+ movaps XMMWORD[(-8-48)+rax],xmm7
+ movaps XMMWORD[(-8-32)+rax],xmm8
+ movaps XMMWORD[(-8-16)+rax],xmm9
+$L$prologue_shaext:
+ movdqu xmm0,XMMWORD[rdi]
+ movd xmm1,DWORD[16+rdi]
+ movdqa xmm3,XMMWORD[((K_XX_XX+160))]
+
+ movdqu xmm4,XMMWORD[rsi]
+ pshufd xmm0,xmm0,27
+ movdqu xmm5,XMMWORD[16+rsi]
+ pshufd xmm1,xmm1,27
+ movdqu xmm6,XMMWORD[32+rsi]
+DB 102,15,56,0,227
+ movdqu xmm7,XMMWORD[48+rsi]
+DB 102,15,56,0,235
+DB 102,15,56,0,243
+ movdqa xmm9,xmm1
+DB 102,15,56,0,251
+ jmp NEAR $L$oop_shaext
+
+ALIGN 16
+$L$oop_shaext:
+ dec rdx
+ lea r8,[64+rsi]
+ paddd xmm1,xmm4
+ cmovne rsi,r8
+ movdqa xmm8,xmm0
+DB 15,56,201,229
+ movdqa xmm2,xmm0
+DB 15,58,204,193,0
+DB 15,56,200,213
+ pxor xmm4,xmm6
+DB 15,56,201,238
+DB 15,56,202,231
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,0
+DB 15,56,200,206
+ pxor xmm5,xmm7
+DB 15,56,202,236
+DB 15,56,201,247
+ movdqa xmm2,xmm0
+DB 15,58,204,193,0
+DB 15,56,200,215
+ pxor xmm6,xmm4
+DB 15,56,201,252
+DB 15,56,202,245
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,0
+DB 15,56,200,204
+ pxor xmm7,xmm5
+DB 15,56,202,254
+DB 15,56,201,229
+ movdqa xmm2,xmm0
+DB 15,58,204,193,0
+DB 15,56,200,213
+ pxor xmm4,xmm6
+DB 15,56,201,238
+DB 15,56,202,231
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,1
+DB 15,56,200,206
+ pxor xmm5,xmm7
+DB 15,56,202,236
+DB 15,56,201,247
+ movdqa xmm2,xmm0
+DB 15,58,204,193,1
+DB 15,56,200,215
+ pxor xmm6,xmm4
+DB 15,56,201,252
+DB 15,56,202,245
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,1
+DB 15,56,200,204
+ pxor xmm7,xmm5
+DB 15,56,202,254
+DB 15,56,201,229
+ movdqa xmm2,xmm0
+DB 15,58,204,193,1
+DB 15,56,200,213
+ pxor xmm4,xmm6
+DB 15,56,201,238
+DB 15,56,202,231
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,1
+DB 15,56,200,206
+ pxor xmm5,xmm7
+DB 15,56,202,236
+DB 15,56,201,247
+ movdqa xmm2,xmm0
+DB 15,58,204,193,2
+DB 15,56,200,215
+ pxor xmm6,xmm4
+DB 15,56,201,252
+DB 15,56,202,245
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,2
+DB 15,56,200,204
+ pxor xmm7,xmm5
+DB 15,56,202,254
+DB 15,56,201,229
+ movdqa xmm2,xmm0
+DB 15,58,204,193,2
+DB 15,56,200,213
+ pxor xmm4,xmm6
+DB 15,56,201,238
+DB 15,56,202,231
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,2
+DB 15,56,200,206
+ pxor xmm5,xmm7
+DB 15,56,202,236
+DB 15,56,201,247
+ movdqa xmm2,xmm0
+DB 15,58,204,193,2
+DB 15,56,200,215
+ pxor xmm6,xmm4
+DB 15,56,201,252
+DB 15,56,202,245
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,3
+DB 15,56,200,204
+ pxor xmm7,xmm5
+DB 15,56,202,254
+ movdqu xmm4,XMMWORD[rsi]
+ movdqa xmm2,xmm0
+DB 15,58,204,193,3
+DB 15,56,200,213
+ movdqu xmm5,XMMWORD[16+rsi]
+DB 102,15,56,0,227
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,3
+DB 15,56,200,206
+ movdqu xmm6,XMMWORD[32+rsi]
+DB 102,15,56,0,235
+
+ movdqa xmm2,xmm0
+DB 15,58,204,193,3
+DB 15,56,200,215
+ movdqu xmm7,XMMWORD[48+rsi]
+DB 102,15,56,0,243
+
+ movdqa xmm1,xmm0
+DB 15,58,204,194,3
+DB 65,15,56,200,201
+DB 102,15,56,0,251
+
+ paddd xmm0,xmm8
+ movdqa xmm9,xmm1
+
+ jnz NEAR $L$oop_shaext
+
+ pshufd xmm0,xmm0,27
+ pshufd xmm1,xmm1,27
+ movdqu XMMWORD[rdi],xmm0
+ movd DWORD[16+rdi],xmm1
+ movaps xmm6,XMMWORD[((-8-64))+rax]
+ movaps xmm7,XMMWORD[((-8-48))+rax]
+ movaps xmm8,XMMWORD[((-8-32))+rax]
+ movaps xmm9,XMMWORD[((-8-16))+rax]
+ mov rsp,rax
+$L$epilogue_shaext:
+
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+$L$SEH_end_sha1_block_data_order_shaext:
+
+ALIGN 16
+sha1_block_data_order_ssse3:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha1_block_data_order_ssse3:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+_ssse3_shortcut:
+
+ mov r11,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ lea rsp,[((-160))+rsp]
+ movaps XMMWORD[(-40-96)+r11],xmm6
+ movaps XMMWORD[(-40-80)+r11],xmm7
+ movaps XMMWORD[(-40-64)+r11],xmm8
+ movaps XMMWORD[(-40-48)+r11],xmm9
+ movaps XMMWORD[(-40-32)+r11],xmm10
+ movaps XMMWORD[(-40-16)+r11],xmm11
+$L$prologue_ssse3:
+ and rsp,-64
+ mov r8,rdi
+ mov r9,rsi
+ mov r10,rdx
+
+ shl r10,6
+ add r10,r9
+ lea r14,[((K_XX_XX+64))]
+
+ mov eax,DWORD[r8]
+ mov ebx,DWORD[4+r8]
+ mov ecx,DWORD[8+r8]
+ mov edx,DWORD[12+r8]
+ mov esi,ebx
+ mov ebp,DWORD[16+r8]
+ mov edi,ecx
+ xor edi,edx
+ and esi,edi
+
+ movdqa xmm6,XMMWORD[64+r14]
+ movdqa xmm9,XMMWORD[((-64))+r14]
+ movdqu xmm0,XMMWORD[r9]
+ movdqu xmm1,XMMWORD[16+r9]
+ movdqu xmm2,XMMWORD[32+r9]
+ movdqu xmm3,XMMWORD[48+r9]
+DB 102,15,56,0,198
+DB 102,15,56,0,206
+DB 102,15,56,0,214
+ add r9,64
+ paddd xmm0,xmm9
+DB 102,15,56,0,222
+ paddd xmm1,xmm9
+ paddd xmm2,xmm9
+ movdqa XMMWORD[rsp],xmm0
+ psubd xmm0,xmm9
+ movdqa XMMWORD[16+rsp],xmm1
+ psubd xmm1,xmm9
+ movdqa XMMWORD[32+rsp],xmm2
+ psubd xmm2,xmm9
+ jmp NEAR $L$oop_ssse3
+ALIGN 16
+$L$oop_ssse3:
+ ror ebx,2
+ pshufd xmm4,xmm0,238
+ xor esi,edx
+ movdqa xmm8,xmm3
+ paddd xmm9,xmm3
+ mov edi,eax
+ add ebp,DWORD[rsp]
+ punpcklqdq xmm4,xmm1
+ xor ebx,ecx
+ rol eax,5
+ add ebp,esi
+ psrldq xmm8,4
+ and edi,ebx
+ xor ebx,ecx
+ pxor xmm4,xmm0
+ add ebp,eax
+ ror eax,7
+ pxor xmm8,xmm2
+ xor edi,ecx
+ mov esi,ebp
+ add edx,DWORD[4+rsp]
+ pxor xmm4,xmm8
+ xor eax,ebx
+ rol ebp,5
+ movdqa XMMWORD[48+rsp],xmm9
+ add edx,edi
+ and esi,eax
+ movdqa xmm10,xmm4
+ xor eax,ebx
+ add edx,ebp
+ ror ebp,7
+ movdqa xmm8,xmm4
+ xor esi,ebx
+ pslldq xmm10,12
+ paddd xmm4,xmm4
+ mov edi,edx
+ add ecx,DWORD[8+rsp]
+ psrld xmm8,31
+ xor ebp,eax
+ rol edx,5
+ add ecx,esi
+ movdqa xmm9,xmm10
+ and edi,ebp
+ xor ebp,eax
+ psrld xmm10,30
+ add ecx,edx
+ ror edx,7
+ por xmm4,xmm8
+ xor edi,eax
+ mov esi,ecx
+ add ebx,DWORD[12+rsp]
+ pslld xmm9,2
+ pxor xmm4,xmm10
+ xor edx,ebp
+ movdqa xmm10,XMMWORD[((-64))+r14]
+ rol ecx,5
+ add ebx,edi
+ and esi,edx
+ pxor xmm4,xmm9
+ xor edx,ebp
+ add ebx,ecx
+ ror ecx,7
+ pshufd xmm5,xmm1,238
+ xor esi,ebp
+ movdqa xmm9,xmm4
+ paddd xmm10,xmm4
+ mov edi,ebx
+ add eax,DWORD[16+rsp]
+ punpcklqdq xmm5,xmm2
+ xor ecx,edx
+ rol ebx,5
+ add eax,esi
+ psrldq xmm9,4
+ and edi,ecx
+ xor ecx,edx
+ pxor xmm5,xmm1
+ add eax,ebx
+ ror ebx,7
+ pxor xmm9,xmm3
+ xor edi,edx
+ mov esi,eax
+ add ebp,DWORD[20+rsp]
+ pxor xmm5,xmm9
+ xor ebx,ecx
+ rol eax,5
+ movdqa XMMWORD[rsp],xmm10
+ add ebp,edi
+ and esi,ebx
+ movdqa xmm8,xmm5
+ xor ebx,ecx
+ add ebp,eax
+ ror eax,7
+ movdqa xmm9,xmm5
+ xor esi,ecx
+ pslldq xmm8,12
+ paddd xmm5,xmm5
+ mov edi,ebp
+ add edx,DWORD[24+rsp]
+ psrld xmm9,31
+ xor eax,ebx
+ rol ebp,5
+ add edx,esi
+ movdqa xmm10,xmm8
+ and edi,eax
+ xor eax,ebx
+ psrld xmm8,30
+ add edx,ebp
+ ror ebp,7
+ por xmm5,xmm9
+ xor edi,ebx
+ mov esi,edx
+ add ecx,DWORD[28+rsp]
+ pslld xmm10,2
+ pxor xmm5,xmm8
+ xor ebp,eax
+ movdqa xmm8,XMMWORD[((-32))+r14]
+ rol edx,5
+ add ecx,edi
+ and esi,ebp
+ pxor xmm5,xmm10
+ xor ebp,eax
+ add ecx,edx
+ ror edx,7
+ pshufd xmm6,xmm2,238
+ xor esi,eax
+ movdqa xmm10,xmm5
+ paddd xmm8,xmm5
+ mov edi,ecx
+ add ebx,DWORD[32+rsp]
+ punpcklqdq xmm6,xmm3
+ xor edx,ebp
+ rol ecx,5
+ add ebx,esi
+ psrldq xmm10,4
+ and edi,edx
+ xor edx,ebp
+ pxor xmm6,xmm2
+ add ebx,ecx
+ ror ecx,7
+ pxor xmm10,xmm4
+ xor edi,ebp
+ mov esi,ebx
+ add eax,DWORD[36+rsp]
+ pxor xmm6,xmm10
+ xor ecx,edx
+ rol ebx,5
+ movdqa XMMWORD[16+rsp],xmm8
+ add eax,edi
+ and esi,ecx
+ movdqa xmm9,xmm6
+ xor ecx,edx
+ add eax,ebx
+ ror ebx,7
+ movdqa xmm10,xmm6
+ xor esi,edx
+ pslldq xmm9,12
+ paddd xmm6,xmm6
+ mov edi,eax
+ add ebp,DWORD[40+rsp]
+ psrld xmm10,31
+ xor ebx,ecx
+ rol eax,5
+ add ebp,esi
+ movdqa xmm8,xmm9
+ and edi,ebx
+ xor ebx,ecx
+ psrld xmm9,30
+ add ebp,eax
+ ror eax,7
+ por xmm6,xmm10
+ xor edi,ecx
+ mov esi,ebp
+ add edx,DWORD[44+rsp]
+ pslld xmm8,2
+ pxor xmm6,xmm9
+ xor eax,ebx
+ movdqa xmm9,XMMWORD[((-32))+r14]
+ rol ebp,5
+ add edx,edi
+ and esi,eax
+ pxor xmm6,xmm8
+ xor eax,ebx
+ add edx,ebp
+ ror ebp,7
+ pshufd xmm7,xmm3,238
+ xor esi,ebx
+ movdqa xmm8,xmm6
+ paddd xmm9,xmm6
+ mov edi,edx
+ add ecx,DWORD[48+rsp]
+ punpcklqdq xmm7,xmm4
+ xor ebp,eax
+ rol edx,5
+ add ecx,esi
+ psrldq xmm8,4
+ and edi,ebp
+ xor ebp,eax
+ pxor xmm7,xmm3
+ add ecx,edx
+ ror edx,7
+ pxor xmm8,xmm5
+ xor edi,eax
+ mov esi,ecx
+ add ebx,DWORD[52+rsp]
+ pxor xmm7,xmm8
+ xor edx,ebp
+ rol ecx,5
+ movdqa XMMWORD[32+rsp],xmm9
+ add ebx,edi
+ and esi,edx
+ movdqa xmm10,xmm7
+ xor edx,ebp
+ add ebx,ecx
+ ror ecx,7
+ movdqa xmm8,xmm7
+ xor esi,ebp
+ pslldq xmm10,12
+ paddd xmm7,xmm7
+ mov edi,ebx
+ add eax,DWORD[56+rsp]
+ psrld xmm8,31
+ xor ecx,edx
+ rol ebx,5
+ add eax,esi
+ movdqa xmm9,xmm10
+ and edi,ecx
+ xor ecx,edx
+ psrld xmm10,30
+ add eax,ebx
+ ror ebx,7
+ por xmm7,xmm8
+ xor edi,edx
+ mov esi,eax
+ add ebp,DWORD[60+rsp]
+ pslld xmm9,2
+ pxor xmm7,xmm10
+ xor ebx,ecx
+ movdqa xmm10,XMMWORD[((-32))+r14]
+ rol eax,5
+ add ebp,edi
+ and esi,ebx
+ pxor xmm7,xmm9
+ pshufd xmm9,xmm6,238
+ xor ebx,ecx
+ add ebp,eax
+ ror eax,7
+ pxor xmm0,xmm4
+ xor esi,ecx
+ mov edi,ebp
+ add edx,DWORD[rsp]
+ punpcklqdq xmm9,xmm7
+ xor eax,ebx
+ rol ebp,5
+ pxor xmm0,xmm1
+ add edx,esi
+ and edi,eax
+ movdqa xmm8,xmm10
+ xor eax,ebx
+ paddd xmm10,xmm7
+ add edx,ebp
+ pxor xmm0,xmm9
+ ror ebp,7
+ xor edi,ebx
+ mov esi,edx
+ add ecx,DWORD[4+rsp]
+ movdqa xmm9,xmm0
+ xor ebp,eax
+ rol edx,5
+ movdqa XMMWORD[48+rsp],xmm10
+ add ecx,edi
+ and esi,ebp
+ xor ebp,eax
+ pslld xmm0,2
+ add ecx,edx
+ ror edx,7
+ psrld xmm9,30
+ xor esi,eax
+ mov edi,ecx
+ add ebx,DWORD[8+rsp]
+ por xmm0,xmm9
+ xor edx,ebp
+ rol ecx,5
+ pshufd xmm10,xmm7,238
+ add ebx,esi
+ and edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[12+rsp]
+ xor edi,ebp
+ mov esi,ebx
+ rol ebx,5
+ add eax,edi
+ xor esi,edx
+ ror ecx,7
+ add eax,ebx
+ pxor xmm1,xmm5
+ add ebp,DWORD[16+rsp]
+ xor esi,ecx
+ punpcklqdq xmm10,xmm0
+ mov edi,eax
+ rol eax,5
+ pxor xmm1,xmm2
+ add ebp,esi
+ xor edi,ecx
+ movdqa xmm9,xmm8
+ ror ebx,7
+ paddd xmm8,xmm0
+ add ebp,eax
+ pxor xmm1,xmm10
+ add edx,DWORD[20+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ movdqa xmm10,xmm1
+ add edx,edi
+ xor esi,ebx
+ movdqa XMMWORD[rsp],xmm8
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[24+rsp]
+ pslld xmm1,2
+ xor esi,eax
+ mov edi,edx
+ psrld xmm10,30
+ rol edx,5
+ add ecx,esi
+ xor edi,eax
+ ror ebp,7
+ por xmm1,xmm10
+ add ecx,edx
+ add ebx,DWORD[28+rsp]
+ pshufd xmm8,xmm0,238
+ xor edi,ebp
+ mov esi,ecx
+ rol ecx,5
+ add ebx,edi
+ xor esi,ebp
+ ror edx,7
+ add ebx,ecx
+ pxor xmm2,xmm6
+ add eax,DWORD[32+rsp]
+ xor esi,edx
+ punpcklqdq xmm8,xmm1
+ mov edi,ebx
+ rol ebx,5
+ pxor xmm2,xmm3
+ add eax,esi
+ xor edi,edx
+ movdqa xmm10,XMMWORD[r14]
+ ror ecx,7
+ paddd xmm9,xmm1
+ add eax,ebx
+ pxor xmm2,xmm8
+ add ebp,DWORD[36+rsp]
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ movdqa xmm8,xmm2
+ add ebp,edi
+ xor esi,ecx
+ movdqa XMMWORD[16+rsp],xmm9
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[40+rsp]
+ pslld xmm2,2
+ xor esi,ebx
+ mov edi,ebp
+ psrld xmm8,30
+ rol ebp,5
+ add edx,esi
+ xor edi,ebx
+ ror eax,7
+ por xmm2,xmm8
+ add edx,ebp
+ add ecx,DWORD[44+rsp]
+ pshufd xmm9,xmm1,238
+ xor edi,eax
+ mov esi,edx
+ rol edx,5
+ add ecx,edi
+ xor esi,eax
+ ror ebp,7
+ add ecx,edx
+ pxor xmm3,xmm7
+ add ebx,DWORD[48+rsp]
+ xor esi,ebp
+ punpcklqdq xmm9,xmm2
+ mov edi,ecx
+ rol ecx,5
+ pxor xmm3,xmm4
+ add ebx,esi
+ xor edi,ebp
+ movdqa xmm8,xmm10
+ ror edx,7
+ paddd xmm10,xmm2
+ add ebx,ecx
+ pxor xmm3,xmm9
+ add eax,DWORD[52+rsp]
+ xor edi,edx
+ mov esi,ebx
+ rol ebx,5
+ movdqa xmm9,xmm3
+ add eax,edi
+ xor esi,edx
+ movdqa XMMWORD[32+rsp],xmm10
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[56+rsp]
+ pslld xmm3,2
+ xor esi,ecx
+ mov edi,eax
+ psrld xmm9,30
+ rol eax,5
+ add ebp,esi
+ xor edi,ecx
+ ror ebx,7
+ por xmm3,xmm9
+ add ebp,eax
+ add edx,DWORD[60+rsp]
+ pshufd xmm10,xmm2,238
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ add edx,edi
+ xor esi,ebx
+ ror eax,7
+ add edx,ebp
+ pxor xmm4,xmm0
+ add ecx,DWORD[rsp]
+ xor esi,eax
+ punpcklqdq xmm10,xmm3
+ mov edi,edx
+ rol edx,5
+ pxor xmm4,xmm5
+ add ecx,esi
+ xor edi,eax
+ movdqa xmm9,xmm8
+ ror ebp,7
+ paddd xmm8,xmm3
+ add ecx,edx
+ pxor xmm4,xmm10
+ add ebx,DWORD[4+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ rol ecx,5
+ movdqa xmm10,xmm4
+ add ebx,edi
+ xor esi,ebp
+ movdqa XMMWORD[48+rsp],xmm8
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[8+rsp]
+ pslld xmm4,2
+ xor esi,edx
+ mov edi,ebx
+ psrld xmm10,30
+ rol ebx,5
+ add eax,esi
+ xor edi,edx
+ ror ecx,7
+ por xmm4,xmm10
+ add eax,ebx
+ add ebp,DWORD[12+rsp]
+ pshufd xmm8,xmm3,238
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ add ebp,edi
+ xor esi,ecx
+ ror ebx,7
+ add ebp,eax
+ pxor xmm5,xmm1
+ add edx,DWORD[16+rsp]
+ xor esi,ebx
+ punpcklqdq xmm8,xmm4
+ mov edi,ebp
+ rol ebp,5
+ pxor xmm5,xmm6
+ add edx,esi
+ xor edi,ebx
+ movdqa xmm10,xmm9
+ ror eax,7
+ paddd xmm9,xmm4
+ add edx,ebp
+ pxor xmm5,xmm8
+ add ecx,DWORD[20+rsp]
+ xor edi,eax
+ mov esi,edx
+ rol edx,5
+ movdqa xmm8,xmm5
+ add ecx,edi
+ xor esi,eax
+ movdqa XMMWORD[rsp],xmm9
+ ror ebp,7
+ add ecx,edx
+ add ebx,DWORD[24+rsp]
+ pslld xmm5,2
+ xor esi,ebp
+ mov edi,ecx
+ psrld xmm8,30
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ por xmm5,xmm8
+ add ebx,ecx
+ add eax,DWORD[28+rsp]
+ pshufd xmm9,xmm4,238
+ ror ecx,7
+ mov esi,ebx
+ xor edi,edx
+ rol ebx,5
+ add eax,edi
+ xor esi,ecx
+ xor ecx,edx
+ add eax,ebx
+ pxor xmm6,xmm2
+ add ebp,DWORD[32+rsp]
+ and esi,ecx
+ xor ecx,edx
+ ror ebx,7
+ punpcklqdq xmm9,xmm5
+ mov edi,eax
+ xor esi,ecx
+ pxor xmm6,xmm7
+ rol eax,5
+ add ebp,esi
+ movdqa xmm8,xmm10
+ xor edi,ebx
+ paddd xmm10,xmm5
+ xor ebx,ecx
+ pxor xmm6,xmm9
+ add ebp,eax
+ add edx,DWORD[36+rsp]
+ and edi,ebx
+ xor ebx,ecx
+ ror eax,7
+ movdqa xmm9,xmm6
+ mov esi,ebp
+ xor edi,ebx
+ movdqa XMMWORD[16+rsp],xmm10
+ rol ebp,5
+ add edx,edi
+ xor esi,eax
+ pslld xmm6,2
+ xor eax,ebx
+ add edx,ebp
+ psrld xmm9,30
+ add ecx,DWORD[40+rsp]
+ and esi,eax
+ xor eax,ebx
+ por xmm6,xmm9
+ ror ebp,7
+ mov edi,edx
+ xor esi,eax
+ rol edx,5
+ pshufd xmm10,xmm5,238
+ add ecx,esi
+ xor edi,ebp
+ xor ebp,eax
+ add ecx,edx
+ add ebx,DWORD[44+rsp]
+ and edi,ebp
+ xor ebp,eax
+ ror edx,7
+ mov esi,ecx
+ xor edi,ebp
+ rol ecx,5
+ add ebx,edi
+ xor esi,edx
+ xor edx,ebp
+ add ebx,ecx
+ pxor xmm7,xmm3
+ add eax,DWORD[48+rsp]
+ and esi,edx
+ xor edx,ebp
+ ror ecx,7
+ punpcklqdq xmm10,xmm6
+ mov edi,ebx
+ xor esi,edx
+ pxor xmm7,xmm0
+ rol ebx,5
+ add eax,esi
+ movdqa xmm9,XMMWORD[32+r14]
+ xor edi,ecx
+ paddd xmm8,xmm6
+ xor ecx,edx
+ pxor xmm7,xmm10
+ add eax,ebx
+ add ebp,DWORD[52+rsp]
+ and edi,ecx
+ xor ecx,edx
+ ror ebx,7
+ movdqa xmm10,xmm7
+ mov esi,eax
+ xor edi,ecx
+ movdqa XMMWORD[32+rsp],xmm8
+ rol eax,5
+ add ebp,edi
+ xor esi,ebx
+ pslld xmm7,2
+ xor ebx,ecx
+ add ebp,eax
+ psrld xmm10,30
+ add edx,DWORD[56+rsp]
+ and esi,ebx
+ xor ebx,ecx
+ por xmm7,xmm10
+ ror eax,7
+ mov edi,ebp
+ xor esi,ebx
+ rol ebp,5
+ pshufd xmm8,xmm6,238
+ add edx,esi
+ xor edi,eax
+ xor eax,ebx
+ add edx,ebp
+ add ecx,DWORD[60+rsp]
+ and edi,eax
+ xor eax,ebx
+ ror ebp,7
+ mov esi,edx
+ xor edi,eax
+ rol edx,5
+ add ecx,edi
+ xor esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ pxor xmm0,xmm4
+ add ebx,DWORD[rsp]
+ and esi,ebp
+ xor ebp,eax
+ ror edx,7
+ punpcklqdq xmm8,xmm7
+ mov edi,ecx
+ xor esi,ebp
+ pxor xmm0,xmm1
+ rol ecx,5
+ add ebx,esi
+ movdqa xmm10,xmm9
+ xor edi,edx
+ paddd xmm9,xmm7
+ xor edx,ebp
+ pxor xmm0,xmm8
+ add ebx,ecx
+ add eax,DWORD[4+rsp]
+ and edi,edx
+ xor edx,ebp
+ ror ecx,7
+ movdqa xmm8,xmm0
+ mov esi,ebx
+ xor edi,edx
+ movdqa XMMWORD[48+rsp],xmm9
+ rol ebx,5
+ add eax,edi
+ xor esi,ecx
+ pslld xmm0,2
+ xor ecx,edx
+ add eax,ebx
+ psrld xmm8,30
+ add ebp,DWORD[8+rsp]
+ and esi,ecx
+ xor ecx,edx
+ por xmm0,xmm8
+ ror ebx,7
+ mov edi,eax
+ xor esi,ecx
+ rol eax,5
+ pshufd xmm9,xmm7,238
+ add ebp,esi
+ xor edi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ add edx,DWORD[12+rsp]
+ and edi,ebx
+ xor ebx,ecx
+ ror eax,7
+ mov esi,ebp
+ xor edi,ebx
+ rol ebp,5
+ add edx,edi
+ xor esi,eax
+ xor eax,ebx
+ add edx,ebp
+ pxor xmm1,xmm5
+ add ecx,DWORD[16+rsp]
+ and esi,eax
+ xor eax,ebx
+ ror ebp,7
+ punpcklqdq xmm9,xmm0
+ mov edi,edx
+ xor esi,eax
+ pxor xmm1,xmm2
+ rol edx,5
+ add ecx,esi
+ movdqa xmm8,xmm10
+ xor edi,ebp
+ paddd xmm10,xmm0
+ xor ebp,eax
+ pxor xmm1,xmm9
+ add ecx,edx
+ add ebx,DWORD[20+rsp]
+ and edi,ebp
+ xor ebp,eax
+ ror edx,7
+ movdqa xmm9,xmm1
+ mov esi,ecx
+ xor edi,ebp
+ movdqa XMMWORD[rsp],xmm10
+ rol ecx,5
+ add ebx,edi
+ xor esi,edx
+ pslld xmm1,2
+ xor edx,ebp
+ add ebx,ecx
+ psrld xmm9,30
+ add eax,DWORD[24+rsp]
+ and esi,edx
+ xor edx,ebp
+ por xmm1,xmm9
+ ror ecx,7
+ mov edi,ebx
+ xor esi,edx
+ rol ebx,5
+ pshufd xmm10,xmm0,238
+ add eax,esi
+ xor edi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add ebp,DWORD[28+rsp]
+ and edi,ecx
+ xor ecx,edx
+ ror ebx,7
+ mov esi,eax
+ xor edi,ecx
+ rol eax,5
+ add ebp,edi
+ xor esi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ pxor xmm2,xmm6
+ add edx,DWORD[32+rsp]
+ and esi,ebx
+ xor ebx,ecx
+ ror eax,7
+ punpcklqdq xmm10,xmm1
+ mov edi,ebp
+ xor esi,ebx
+ pxor xmm2,xmm3
+ rol ebp,5
+ add edx,esi
+ movdqa xmm9,xmm8
+ xor edi,eax
+ paddd xmm8,xmm1
+ xor eax,ebx
+ pxor xmm2,xmm10
+ add edx,ebp
+ add ecx,DWORD[36+rsp]
+ and edi,eax
+ xor eax,ebx
+ ror ebp,7
+ movdqa xmm10,xmm2
+ mov esi,edx
+ xor edi,eax
+ movdqa XMMWORD[16+rsp],xmm8
+ rol edx,5
+ add ecx,edi
+ xor esi,ebp
+ pslld xmm2,2
+ xor ebp,eax
+ add ecx,edx
+ psrld xmm10,30
+ add ebx,DWORD[40+rsp]
+ and esi,ebp
+ xor ebp,eax
+ por xmm2,xmm10
+ ror edx,7
+ mov edi,ecx
+ xor esi,ebp
+ rol ecx,5
+ pshufd xmm8,xmm1,238
+ add ebx,esi
+ xor edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[44+rsp]
+ and edi,edx
+ xor edx,ebp
+ ror ecx,7
+ mov esi,ebx
+ xor edi,edx
+ rol ebx,5
+ add eax,edi
+ xor esi,edx
+ add eax,ebx
+ pxor xmm3,xmm7
+ add ebp,DWORD[48+rsp]
+ xor esi,ecx
+ punpcklqdq xmm8,xmm2
+ mov edi,eax
+ rol eax,5
+ pxor xmm3,xmm4
+ add ebp,esi
+ xor edi,ecx
+ movdqa xmm10,xmm9
+ ror ebx,7
+ paddd xmm9,xmm2
+ add ebp,eax
+ pxor xmm3,xmm8
+ add edx,DWORD[52+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ movdqa xmm8,xmm3
+ add edx,edi
+ xor esi,ebx
+ movdqa XMMWORD[32+rsp],xmm9
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[56+rsp]
+ pslld xmm3,2
+ xor esi,eax
+ mov edi,edx
+ psrld xmm8,30
+ rol edx,5
+ add ecx,esi
+ xor edi,eax
+ ror ebp,7
+ por xmm3,xmm8
+ add ecx,edx
+ add ebx,DWORD[60+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ rol ecx,5
+ add ebx,edi
+ xor esi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[rsp]
+ xor esi,edx
+ mov edi,ebx
+ rol ebx,5
+ paddd xmm10,xmm3
+ add eax,esi
+ xor edi,edx
+ movdqa XMMWORD[48+rsp],xmm10
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[4+rsp]
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ add ebp,edi
+ xor esi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[8+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ rol ebp,5
+ add edx,esi
+ xor edi,ebx
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[12+rsp]
+ xor edi,eax
+ mov esi,edx
+ rol edx,5
+ add ecx,edi
+ xor esi,eax
+ ror ebp,7
+ add ecx,edx
+ cmp r9,r10
+ je NEAR $L$done_ssse3
+ movdqa xmm6,XMMWORD[64+r14]
+ movdqa xmm9,XMMWORD[((-64))+r14]
+ movdqu xmm0,XMMWORD[r9]
+ movdqu xmm1,XMMWORD[16+r9]
+ movdqu xmm2,XMMWORD[32+r9]
+ movdqu xmm3,XMMWORD[48+r9]
+DB 102,15,56,0,198
+ add r9,64
+ add ebx,DWORD[16+rsp]
+ xor esi,ebp
+ mov edi,ecx
+DB 102,15,56,0,206
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ paddd xmm0,xmm9
+ add ebx,ecx
+ add eax,DWORD[20+rsp]
+ xor edi,edx
+ mov esi,ebx
+ movdqa XMMWORD[rsp],xmm0
+ rol ebx,5
+ add eax,edi
+ xor esi,edx
+ ror ecx,7
+ psubd xmm0,xmm9
+ add eax,ebx
+ add ebp,DWORD[24+rsp]
+ xor esi,ecx
+ mov edi,eax
+ rol eax,5
+ add ebp,esi
+ xor edi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[28+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ add edx,edi
+ xor esi,ebx
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[32+rsp]
+ xor esi,eax
+ mov edi,edx
+DB 102,15,56,0,214
+ rol edx,5
+ add ecx,esi
+ xor edi,eax
+ ror ebp,7
+ paddd xmm1,xmm9
+ add ecx,edx
+ add ebx,DWORD[36+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ movdqa XMMWORD[16+rsp],xmm1
+ rol ecx,5
+ add ebx,edi
+ xor esi,ebp
+ ror edx,7
+ psubd xmm1,xmm9
+ add ebx,ecx
+ add eax,DWORD[40+rsp]
+ xor esi,edx
+ mov edi,ebx
+ rol ebx,5
+ add eax,esi
+ xor edi,edx
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[44+rsp]
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ add ebp,edi
+ xor esi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[48+rsp]
+ xor esi,ebx
+ mov edi,ebp
+DB 102,15,56,0,222
+ rol ebp,5
+ add edx,esi
+ xor edi,ebx
+ ror eax,7
+ paddd xmm2,xmm9
+ add edx,ebp
+ add ecx,DWORD[52+rsp]
+ xor edi,eax
+ mov esi,edx
+ movdqa XMMWORD[32+rsp],xmm2
+ rol edx,5
+ add ecx,edi
+ xor esi,eax
+ ror ebp,7
+ psubd xmm2,xmm9
+ add ecx,edx
+ add ebx,DWORD[56+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[60+rsp]
+ xor edi,edx
+ mov esi,ebx
+ rol ebx,5
+ add eax,edi
+ ror ecx,7
+ add eax,ebx
+ add eax,DWORD[r8]
+ add esi,DWORD[4+r8]
+ add ecx,DWORD[8+r8]
+ add edx,DWORD[12+r8]
+ mov DWORD[r8],eax
+ add ebp,DWORD[16+r8]
+ mov DWORD[4+r8],esi
+ mov ebx,esi
+ mov DWORD[8+r8],ecx
+ mov edi,ecx
+ mov DWORD[12+r8],edx
+ xor edi,edx
+ mov DWORD[16+r8],ebp
+ and esi,edi
+ jmp NEAR $L$oop_ssse3
+
+ALIGN 16
+$L$done_ssse3:
+ add ebx,DWORD[16+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[20+rsp]
+ xor edi,edx
+ mov esi,ebx
+ rol ebx,5
+ add eax,edi
+ xor esi,edx
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[24+rsp]
+ xor esi,ecx
+ mov edi,eax
+ rol eax,5
+ add ebp,esi
+ xor edi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[28+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ rol ebp,5
+ add edx,edi
+ xor esi,ebx
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[32+rsp]
+ xor esi,eax
+ mov edi,edx
+ rol edx,5
+ add ecx,esi
+ xor edi,eax
+ ror ebp,7
+ add ecx,edx
+ add ebx,DWORD[36+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ rol ecx,5
+ add ebx,edi
+ xor esi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[40+rsp]
+ xor esi,edx
+ mov edi,ebx
+ rol ebx,5
+ add eax,esi
+ xor edi,edx
+ ror ecx,7
+ add eax,ebx
+ add ebp,DWORD[44+rsp]
+ xor edi,ecx
+ mov esi,eax
+ rol eax,5
+ add ebp,edi
+ xor esi,ecx
+ ror ebx,7
+ add ebp,eax
+ add edx,DWORD[48+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ rol ebp,5
+ add edx,esi
+ xor edi,ebx
+ ror eax,7
+ add edx,ebp
+ add ecx,DWORD[52+rsp]
+ xor edi,eax
+ mov esi,edx
+ rol edx,5
+ add ecx,edi
+ xor esi,eax
+ ror ebp,7
+ add ecx,edx
+ add ebx,DWORD[56+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ rol ecx,5
+ add ebx,esi
+ xor edi,ebp
+ ror edx,7
+ add ebx,ecx
+ add eax,DWORD[60+rsp]
+ xor edi,edx
+ mov esi,ebx
+ rol ebx,5
+ add eax,edi
+ ror ecx,7
+ add eax,ebx
+ add eax,DWORD[r8]
+ add esi,DWORD[4+r8]
+ add ecx,DWORD[8+r8]
+ mov DWORD[r8],eax
+ add edx,DWORD[12+r8]
+ mov DWORD[4+r8],esi
+ add ebp,DWORD[16+r8]
+ mov DWORD[8+r8],ecx
+ mov DWORD[12+r8],edx
+ mov DWORD[16+r8],ebp
+ movaps xmm6,XMMWORD[((-40-96))+r11]
+ movaps xmm7,XMMWORD[((-40-80))+r11]
+ movaps xmm8,XMMWORD[((-40-64))+r11]
+ movaps xmm9,XMMWORD[((-40-48))+r11]
+ movaps xmm10,XMMWORD[((-40-32))+r11]
+ movaps xmm11,XMMWORD[((-40-16))+r11]
+ mov r14,QWORD[((-40))+r11]
+
+ mov r13,QWORD[((-32))+r11]
+
+ mov r12,QWORD[((-24))+r11]
+
+ mov rbp,QWORD[((-16))+r11]
+
+ mov rbx,QWORD[((-8))+r11]
+
+ lea rsp,[r11]
+
+$L$epilogue_ssse3:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha1_block_data_order_ssse3:
+
+ALIGN 16
+sha1_block_data_order_avx:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha1_block_data_order_avx:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+_avx_shortcut:
+
+ mov r11,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ lea rsp,[((-160))+rsp]
+ vzeroupper
+ vmovaps XMMWORD[(-40-96)+r11],xmm6
+ vmovaps XMMWORD[(-40-80)+r11],xmm7
+ vmovaps XMMWORD[(-40-64)+r11],xmm8
+ vmovaps XMMWORD[(-40-48)+r11],xmm9
+ vmovaps XMMWORD[(-40-32)+r11],xmm10
+ vmovaps XMMWORD[(-40-16)+r11],xmm11
+$L$prologue_avx:
+ and rsp,-64
+ mov r8,rdi
+ mov r9,rsi
+ mov r10,rdx
+
+ shl r10,6
+ add r10,r9
+ lea r14,[((K_XX_XX+64))]
+
+ mov eax,DWORD[r8]
+ mov ebx,DWORD[4+r8]
+ mov ecx,DWORD[8+r8]
+ mov edx,DWORD[12+r8]
+ mov esi,ebx
+ mov ebp,DWORD[16+r8]
+ mov edi,ecx
+ xor edi,edx
+ and esi,edi
+
+ vmovdqa xmm6,XMMWORD[64+r14]
+ vmovdqa xmm11,XMMWORD[((-64))+r14]
+ vmovdqu xmm0,XMMWORD[r9]
+ vmovdqu xmm1,XMMWORD[16+r9]
+ vmovdqu xmm2,XMMWORD[32+r9]
+ vmovdqu xmm3,XMMWORD[48+r9]
+ vpshufb xmm0,xmm0,xmm6
+ add r9,64
+ vpshufb xmm1,xmm1,xmm6
+ vpshufb xmm2,xmm2,xmm6
+ vpshufb xmm3,xmm3,xmm6
+ vpaddd xmm4,xmm0,xmm11
+ vpaddd xmm5,xmm1,xmm11
+ vpaddd xmm6,xmm2,xmm11
+ vmovdqa XMMWORD[rsp],xmm4
+ vmovdqa XMMWORD[16+rsp],xmm5
+ vmovdqa XMMWORD[32+rsp],xmm6
+ jmp NEAR $L$oop_avx
+ALIGN 16
+$L$oop_avx:
+ shrd ebx,ebx,2
+ xor esi,edx
+ vpalignr xmm4,xmm1,xmm0,8
+ mov edi,eax
+ add ebp,DWORD[rsp]
+ vpaddd xmm9,xmm11,xmm3
+ xor ebx,ecx
+ shld eax,eax,5
+ vpsrldq xmm8,xmm3,4
+ add ebp,esi
+ and edi,ebx
+ vpxor xmm4,xmm4,xmm0
+ xor ebx,ecx
+ add ebp,eax
+ vpxor xmm8,xmm8,xmm2
+ shrd eax,eax,7
+ xor edi,ecx
+ mov esi,ebp
+ add edx,DWORD[4+rsp]
+ vpxor xmm4,xmm4,xmm8
+ xor eax,ebx
+ shld ebp,ebp,5
+ vmovdqa XMMWORD[48+rsp],xmm9
+ add edx,edi
+ and esi,eax
+ vpsrld xmm8,xmm4,31
+ xor eax,ebx
+ add edx,ebp
+ shrd ebp,ebp,7
+ xor esi,ebx
+ vpslldq xmm10,xmm4,12
+ vpaddd xmm4,xmm4,xmm4
+ mov edi,edx
+ add ecx,DWORD[8+rsp]
+ xor ebp,eax
+ shld edx,edx,5
+ vpsrld xmm9,xmm10,30
+ vpor xmm4,xmm4,xmm8
+ add ecx,esi
+ and edi,ebp
+ xor ebp,eax
+ add ecx,edx
+ vpslld xmm10,xmm10,2
+ vpxor xmm4,xmm4,xmm9
+ shrd edx,edx,7
+ xor edi,eax
+ mov esi,ecx
+ add ebx,DWORD[12+rsp]
+ vpxor xmm4,xmm4,xmm10
+ xor edx,ebp
+ shld ecx,ecx,5
+ add ebx,edi
+ and esi,edx
+ xor edx,ebp
+ add ebx,ecx
+ shrd ecx,ecx,7
+ xor esi,ebp
+ vpalignr xmm5,xmm2,xmm1,8
+ mov edi,ebx
+ add eax,DWORD[16+rsp]
+ vpaddd xmm9,xmm11,xmm4
+ xor ecx,edx
+ shld ebx,ebx,5
+ vpsrldq xmm8,xmm4,4
+ add eax,esi
+ and edi,ecx
+ vpxor xmm5,xmm5,xmm1
+ xor ecx,edx
+ add eax,ebx
+ vpxor xmm8,xmm8,xmm3
+ shrd ebx,ebx,7
+ xor edi,edx
+ mov esi,eax
+ add ebp,DWORD[20+rsp]
+ vpxor xmm5,xmm5,xmm8
+ xor ebx,ecx
+ shld eax,eax,5
+ vmovdqa XMMWORD[rsp],xmm9
+ add ebp,edi
+ and esi,ebx
+ vpsrld xmm8,xmm5,31
+ xor ebx,ecx
+ add ebp,eax
+ shrd eax,eax,7
+ xor esi,ecx
+ vpslldq xmm10,xmm5,12
+ vpaddd xmm5,xmm5,xmm5
+ mov edi,ebp
+ add edx,DWORD[24+rsp]
+ xor eax,ebx
+ shld ebp,ebp,5
+ vpsrld xmm9,xmm10,30
+ vpor xmm5,xmm5,xmm8
+ add edx,esi
+ and edi,eax
+ xor eax,ebx
+ add edx,ebp
+ vpslld xmm10,xmm10,2
+ vpxor xmm5,xmm5,xmm9
+ shrd ebp,ebp,7
+ xor edi,ebx
+ mov esi,edx
+ add ecx,DWORD[28+rsp]
+ vpxor xmm5,xmm5,xmm10
+ xor ebp,eax
+ shld edx,edx,5
+ vmovdqa xmm11,XMMWORD[((-32))+r14]
+ add ecx,edi
+ and esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ shrd edx,edx,7
+ xor esi,eax
+ vpalignr xmm6,xmm3,xmm2,8
+ mov edi,ecx
+ add ebx,DWORD[32+rsp]
+ vpaddd xmm9,xmm11,xmm5
+ xor edx,ebp
+ shld ecx,ecx,5
+ vpsrldq xmm8,xmm5,4
+ add ebx,esi
+ and edi,edx
+ vpxor xmm6,xmm6,xmm2
+ xor edx,ebp
+ add ebx,ecx
+ vpxor xmm8,xmm8,xmm4
+ shrd ecx,ecx,7
+ xor edi,ebp
+ mov esi,ebx
+ add eax,DWORD[36+rsp]
+ vpxor xmm6,xmm6,xmm8
+ xor ecx,edx
+ shld ebx,ebx,5
+ vmovdqa XMMWORD[16+rsp],xmm9
+ add eax,edi
+ and esi,ecx
+ vpsrld xmm8,xmm6,31
+ xor ecx,edx
+ add eax,ebx
+ shrd ebx,ebx,7
+ xor esi,edx
+ vpslldq xmm10,xmm6,12
+ vpaddd xmm6,xmm6,xmm6
+ mov edi,eax
+ add ebp,DWORD[40+rsp]
+ xor ebx,ecx
+ shld eax,eax,5
+ vpsrld xmm9,xmm10,30
+ vpor xmm6,xmm6,xmm8
+ add ebp,esi
+ and edi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ vpslld xmm10,xmm10,2
+ vpxor xmm6,xmm6,xmm9
+ shrd eax,eax,7
+ xor edi,ecx
+ mov esi,ebp
+ add edx,DWORD[44+rsp]
+ vpxor xmm6,xmm6,xmm10
+ xor eax,ebx
+ shld ebp,ebp,5
+ add edx,edi
+ and esi,eax
+ xor eax,ebx
+ add edx,ebp
+ shrd ebp,ebp,7
+ xor esi,ebx
+ vpalignr xmm7,xmm4,xmm3,8
+ mov edi,edx
+ add ecx,DWORD[48+rsp]
+ vpaddd xmm9,xmm11,xmm6
+ xor ebp,eax
+ shld edx,edx,5
+ vpsrldq xmm8,xmm6,4
+ add ecx,esi
+ and edi,ebp
+ vpxor xmm7,xmm7,xmm3
+ xor ebp,eax
+ add ecx,edx
+ vpxor xmm8,xmm8,xmm5
+ shrd edx,edx,7
+ xor edi,eax
+ mov esi,ecx
+ add ebx,DWORD[52+rsp]
+ vpxor xmm7,xmm7,xmm8
+ xor edx,ebp
+ shld ecx,ecx,5
+ vmovdqa XMMWORD[32+rsp],xmm9
+ add ebx,edi
+ and esi,edx
+ vpsrld xmm8,xmm7,31
+ xor edx,ebp
+ add ebx,ecx
+ shrd ecx,ecx,7
+ xor esi,ebp
+ vpslldq xmm10,xmm7,12
+ vpaddd xmm7,xmm7,xmm7
+ mov edi,ebx
+ add eax,DWORD[56+rsp]
+ xor ecx,edx
+ shld ebx,ebx,5
+ vpsrld xmm9,xmm10,30
+ vpor xmm7,xmm7,xmm8
+ add eax,esi
+ and edi,ecx
+ xor ecx,edx
+ add eax,ebx
+ vpslld xmm10,xmm10,2
+ vpxor xmm7,xmm7,xmm9
+ shrd ebx,ebx,7
+ xor edi,edx
+ mov esi,eax
+ add ebp,DWORD[60+rsp]
+ vpxor xmm7,xmm7,xmm10
+ xor ebx,ecx
+ shld eax,eax,5
+ add ebp,edi
+ and esi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ vpalignr xmm8,xmm7,xmm6,8
+ vpxor xmm0,xmm0,xmm4
+ shrd eax,eax,7
+ xor esi,ecx
+ mov edi,ebp
+ add edx,DWORD[rsp]
+ vpxor xmm0,xmm0,xmm1
+ xor eax,ebx
+ shld ebp,ebp,5
+ vpaddd xmm9,xmm11,xmm7
+ add edx,esi
+ and edi,eax
+ vpxor xmm0,xmm0,xmm8
+ xor eax,ebx
+ add edx,ebp
+ shrd ebp,ebp,7
+ xor edi,ebx
+ vpsrld xmm8,xmm0,30
+ vmovdqa XMMWORD[48+rsp],xmm9
+ mov esi,edx
+ add ecx,DWORD[4+rsp]
+ xor ebp,eax
+ shld edx,edx,5
+ vpslld xmm0,xmm0,2
+ add ecx,edi
+ and esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ shrd edx,edx,7
+ xor esi,eax
+ mov edi,ecx
+ add ebx,DWORD[8+rsp]
+ vpor xmm0,xmm0,xmm8
+ xor edx,ebp
+ shld ecx,ecx,5
+ add ebx,esi
+ and edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[12+rsp]
+ xor edi,ebp
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpalignr xmm8,xmm0,xmm7,8
+ vpxor xmm1,xmm1,xmm5
+ add ebp,DWORD[16+rsp]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ vpxor xmm1,xmm1,xmm2
+ add ebp,esi
+ xor edi,ecx
+ vpaddd xmm9,xmm11,xmm0
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpxor xmm1,xmm1,xmm8
+ add edx,DWORD[20+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ vpsrld xmm8,xmm1,30
+ vmovdqa XMMWORD[rsp],xmm9
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vpslld xmm1,xmm1,2
+ add ecx,DWORD[24+rsp]
+ xor esi,eax
+ mov edi,edx
+ shld edx,edx,5
+ add ecx,esi
+ xor edi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpor xmm1,xmm1,xmm8
+ add ebx,DWORD[28+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ vpalignr xmm8,xmm1,xmm0,8
+ vpxor xmm2,xmm2,xmm6
+ add eax,DWORD[32+rsp]
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ vpxor xmm2,xmm2,xmm3
+ add eax,esi
+ xor edi,edx
+ vpaddd xmm9,xmm11,xmm1
+ vmovdqa xmm11,XMMWORD[r14]
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpxor xmm2,xmm2,xmm8
+ add ebp,DWORD[36+rsp]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ vpsrld xmm8,xmm2,30
+ vmovdqa XMMWORD[16+rsp],xmm9
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpslld xmm2,xmm2,2
+ add edx,DWORD[40+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ shld ebp,ebp,5
+ add edx,esi
+ xor edi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vpor xmm2,xmm2,xmm8
+ add ecx,DWORD[44+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,edi
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpalignr xmm8,xmm2,xmm1,8
+ vpxor xmm3,xmm3,xmm7
+ add ebx,DWORD[48+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ vpxor xmm3,xmm3,xmm4
+ add ebx,esi
+ xor edi,ebp
+ vpaddd xmm9,xmm11,xmm2
+ shrd edx,edx,7
+ add ebx,ecx
+ vpxor xmm3,xmm3,xmm8
+ add eax,DWORD[52+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ vpsrld xmm8,xmm3,30
+ vmovdqa XMMWORD[32+rsp],xmm9
+ add eax,edi
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpslld xmm3,xmm3,2
+ add ebp,DWORD[56+rsp]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ add ebp,esi
+ xor edi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpor xmm3,xmm3,xmm8
+ add edx,DWORD[60+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vpalignr xmm8,xmm3,xmm2,8
+ vpxor xmm4,xmm4,xmm0
+ add ecx,DWORD[rsp]
+ xor esi,eax
+ mov edi,edx
+ shld edx,edx,5
+ vpxor xmm4,xmm4,xmm5
+ add ecx,esi
+ xor edi,eax
+ vpaddd xmm9,xmm11,xmm3
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpxor xmm4,xmm4,xmm8
+ add ebx,DWORD[4+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ vpsrld xmm8,xmm4,30
+ vmovdqa XMMWORD[48+rsp],xmm9
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ vpslld xmm4,xmm4,2
+ add eax,DWORD[8+rsp]
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ xor edi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ vpor xmm4,xmm4,xmm8
+ add ebp,DWORD[12+rsp]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpalignr xmm8,xmm4,xmm3,8
+ vpxor xmm5,xmm5,xmm1
+ add edx,DWORD[16+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ shld ebp,ebp,5
+ vpxor xmm5,xmm5,xmm6
+ add edx,esi
+ xor edi,ebx
+ vpaddd xmm9,xmm11,xmm4
+ shrd eax,eax,7
+ add edx,ebp
+ vpxor xmm5,xmm5,xmm8
+ add ecx,DWORD[20+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ vpsrld xmm8,xmm5,30
+ vmovdqa XMMWORD[rsp],xmm9
+ add ecx,edi
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpslld xmm5,xmm5,2
+ add ebx,DWORD[24+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ vpor xmm5,xmm5,xmm8
+ add eax,DWORD[28+rsp]
+ shrd ecx,ecx,7
+ mov esi,ebx
+ xor edi,edx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,ecx
+ xor ecx,edx
+ add eax,ebx
+ vpalignr xmm8,xmm5,xmm4,8
+ vpxor xmm6,xmm6,xmm2
+ add ebp,DWORD[32+rsp]
+ and esi,ecx
+ xor ecx,edx
+ shrd ebx,ebx,7
+ vpxor xmm6,xmm6,xmm7
+ mov edi,eax
+ xor esi,ecx
+ vpaddd xmm9,xmm11,xmm5
+ shld eax,eax,5
+ add ebp,esi
+ vpxor xmm6,xmm6,xmm8
+ xor edi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ add edx,DWORD[36+rsp]
+ vpsrld xmm8,xmm6,30
+ vmovdqa XMMWORD[16+rsp],xmm9
+ and edi,ebx
+ xor ebx,ecx
+ shrd eax,eax,7
+ mov esi,ebp
+ vpslld xmm6,xmm6,2
+ xor edi,ebx
+ shld ebp,ebp,5
+ add edx,edi
+ xor esi,eax
+ xor eax,ebx
+ add edx,ebp
+ add ecx,DWORD[40+rsp]
+ and esi,eax
+ vpor xmm6,xmm6,xmm8
+ xor eax,ebx
+ shrd ebp,ebp,7
+ mov edi,edx
+ xor esi,eax
+ shld edx,edx,5
+ add ecx,esi
+ xor edi,ebp
+ xor ebp,eax
+ add ecx,edx
+ add ebx,DWORD[44+rsp]
+ and edi,ebp
+ xor ebp,eax
+ shrd edx,edx,7
+ mov esi,ecx
+ xor edi,ebp
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,edx
+ xor edx,ebp
+ add ebx,ecx
+ vpalignr xmm8,xmm6,xmm5,8
+ vpxor xmm7,xmm7,xmm3
+ add eax,DWORD[48+rsp]
+ and esi,edx
+ xor edx,ebp
+ shrd ecx,ecx,7
+ vpxor xmm7,xmm7,xmm0
+ mov edi,ebx
+ xor esi,edx
+ vpaddd xmm9,xmm11,xmm6
+ vmovdqa xmm11,XMMWORD[32+r14]
+ shld ebx,ebx,5
+ add eax,esi
+ vpxor xmm7,xmm7,xmm8
+ xor edi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add ebp,DWORD[52+rsp]
+ vpsrld xmm8,xmm7,30
+ vmovdqa XMMWORD[32+rsp],xmm9
+ and edi,ecx
+ xor ecx,edx
+ shrd ebx,ebx,7
+ mov esi,eax
+ vpslld xmm7,xmm7,2
+ xor edi,ecx
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ add edx,DWORD[56+rsp]
+ and esi,ebx
+ vpor xmm7,xmm7,xmm8
+ xor ebx,ecx
+ shrd eax,eax,7
+ mov edi,ebp
+ xor esi,ebx
+ shld ebp,ebp,5
+ add edx,esi
+ xor edi,eax
+ xor eax,ebx
+ add edx,ebp
+ add ecx,DWORD[60+rsp]
+ and edi,eax
+ xor eax,ebx
+ shrd ebp,ebp,7
+ mov esi,edx
+ xor edi,eax
+ shld edx,edx,5
+ add ecx,edi
+ xor esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ vpalignr xmm8,xmm7,xmm6,8
+ vpxor xmm0,xmm0,xmm4
+ add ebx,DWORD[rsp]
+ and esi,ebp
+ xor ebp,eax
+ shrd edx,edx,7
+ vpxor xmm0,xmm0,xmm1
+ mov edi,ecx
+ xor esi,ebp
+ vpaddd xmm9,xmm11,xmm7
+ shld ecx,ecx,5
+ add ebx,esi
+ vpxor xmm0,xmm0,xmm8
+ xor edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[4+rsp]
+ vpsrld xmm8,xmm0,30
+ vmovdqa XMMWORD[48+rsp],xmm9
+ and edi,edx
+ xor edx,ebp
+ shrd ecx,ecx,7
+ mov esi,ebx
+ vpslld xmm0,xmm0,2
+ xor edi,edx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add ebp,DWORD[8+rsp]
+ and esi,ecx
+ vpor xmm0,xmm0,xmm8
+ xor ecx,edx
+ shrd ebx,ebx,7
+ mov edi,eax
+ xor esi,ecx
+ shld eax,eax,5
+ add ebp,esi
+ xor edi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ add edx,DWORD[12+rsp]
+ and edi,ebx
+ xor ebx,ecx
+ shrd eax,eax,7
+ mov esi,ebp
+ xor edi,ebx
+ shld ebp,ebp,5
+ add edx,edi
+ xor esi,eax
+ xor eax,ebx
+ add edx,ebp
+ vpalignr xmm8,xmm0,xmm7,8
+ vpxor xmm1,xmm1,xmm5
+ add ecx,DWORD[16+rsp]
+ and esi,eax
+ xor eax,ebx
+ shrd ebp,ebp,7
+ vpxor xmm1,xmm1,xmm2
+ mov edi,edx
+ xor esi,eax
+ vpaddd xmm9,xmm11,xmm0
+ shld edx,edx,5
+ add ecx,esi
+ vpxor xmm1,xmm1,xmm8
+ xor edi,ebp
+ xor ebp,eax
+ add ecx,edx
+ add ebx,DWORD[20+rsp]
+ vpsrld xmm8,xmm1,30
+ vmovdqa XMMWORD[rsp],xmm9
+ and edi,ebp
+ xor ebp,eax
+ shrd edx,edx,7
+ mov esi,ecx
+ vpslld xmm1,xmm1,2
+ xor edi,ebp
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[24+rsp]
+ and esi,edx
+ vpor xmm1,xmm1,xmm8
+ xor edx,ebp
+ shrd ecx,ecx,7
+ mov edi,ebx
+ xor esi,edx
+ shld ebx,ebx,5
+ add eax,esi
+ xor edi,ecx
+ xor ecx,edx
+ add eax,ebx
+ add ebp,DWORD[28+rsp]
+ and edi,ecx
+ xor ecx,edx
+ shrd ebx,ebx,7
+ mov esi,eax
+ xor edi,ecx
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ebx
+ xor ebx,ecx
+ add ebp,eax
+ vpalignr xmm8,xmm1,xmm0,8
+ vpxor xmm2,xmm2,xmm6
+ add edx,DWORD[32+rsp]
+ and esi,ebx
+ xor ebx,ecx
+ shrd eax,eax,7
+ vpxor xmm2,xmm2,xmm3
+ mov edi,ebp
+ xor esi,ebx
+ vpaddd xmm9,xmm11,xmm1
+ shld ebp,ebp,5
+ add edx,esi
+ vpxor xmm2,xmm2,xmm8
+ xor edi,eax
+ xor eax,ebx
+ add edx,ebp
+ add ecx,DWORD[36+rsp]
+ vpsrld xmm8,xmm2,30
+ vmovdqa XMMWORD[16+rsp],xmm9
+ and edi,eax
+ xor eax,ebx
+ shrd ebp,ebp,7
+ mov esi,edx
+ vpslld xmm2,xmm2,2
+ xor edi,eax
+ shld edx,edx,5
+ add ecx,edi
+ xor esi,ebp
+ xor ebp,eax
+ add ecx,edx
+ add ebx,DWORD[40+rsp]
+ and esi,ebp
+ vpor xmm2,xmm2,xmm8
+ xor ebp,eax
+ shrd edx,edx,7
+ mov edi,ecx
+ xor esi,ebp
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,edx
+ xor edx,ebp
+ add ebx,ecx
+ add eax,DWORD[44+rsp]
+ and edi,edx
+ xor edx,ebp
+ shrd ecx,ecx,7
+ mov esi,ebx
+ xor edi,edx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,edx
+ add eax,ebx
+ vpalignr xmm8,xmm2,xmm1,8
+ vpxor xmm3,xmm3,xmm7
+ add ebp,DWORD[48+rsp]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ vpxor xmm3,xmm3,xmm4
+ add ebp,esi
+ xor edi,ecx
+ vpaddd xmm9,xmm11,xmm2
+ shrd ebx,ebx,7
+ add ebp,eax
+ vpxor xmm3,xmm3,xmm8
+ add edx,DWORD[52+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ vpsrld xmm8,xmm3,30
+ vmovdqa XMMWORD[32+rsp],xmm9
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vpslld xmm3,xmm3,2
+ add ecx,DWORD[56+rsp]
+ xor esi,eax
+ mov edi,edx
+ shld edx,edx,5
+ add ecx,esi
+ xor edi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vpor xmm3,xmm3,xmm8
+ add ebx,DWORD[60+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[rsp]
+ vpaddd xmm9,xmm11,xmm3
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ vmovdqa XMMWORD[48+rsp],xmm9
+ xor edi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[4+rsp]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[8+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ shld ebp,ebp,5
+ add edx,esi
+ xor edi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ add ecx,DWORD[12+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,edi
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ cmp r9,r10
+ je NEAR $L$done_avx
+ vmovdqa xmm6,XMMWORD[64+r14]
+ vmovdqa xmm11,XMMWORD[((-64))+r14]
+ vmovdqu xmm0,XMMWORD[r9]
+ vmovdqu xmm1,XMMWORD[16+r9]
+ vmovdqu xmm2,XMMWORD[32+r9]
+ vmovdqu xmm3,XMMWORD[48+r9]
+ vpshufb xmm0,xmm0,xmm6
+ add r9,64
+ add ebx,DWORD[16+rsp]
+ xor esi,ebp
+ vpshufb xmm1,xmm1,xmm6
+ mov edi,ecx
+ shld ecx,ecx,5
+ vpaddd xmm4,xmm0,xmm11
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ vmovdqa XMMWORD[rsp],xmm4
+ add eax,DWORD[20+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[24+rsp]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ add ebp,esi
+ xor edi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[28+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ add ecx,DWORD[32+rsp]
+ xor esi,eax
+ vpshufb xmm2,xmm2,xmm6
+ mov edi,edx
+ shld edx,edx,5
+ vpaddd xmm5,xmm1,xmm11
+ add ecx,esi
+ xor edi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ vmovdqa XMMWORD[16+rsp],xmm5
+ add ebx,DWORD[36+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[40+rsp]
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ xor edi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[44+rsp]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[48+rsp]
+ xor esi,ebx
+ vpshufb xmm3,xmm3,xmm6
+ mov edi,ebp
+ shld ebp,ebp,5
+ vpaddd xmm6,xmm2,xmm11
+ add edx,esi
+ xor edi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ vmovdqa XMMWORD[32+rsp],xmm6
+ add ecx,DWORD[52+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,edi
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ add ebx,DWORD[56+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[60+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ shrd ecx,ecx,7
+ add eax,ebx
+ add eax,DWORD[r8]
+ add esi,DWORD[4+r8]
+ add ecx,DWORD[8+r8]
+ add edx,DWORD[12+r8]
+ mov DWORD[r8],eax
+ add ebp,DWORD[16+r8]
+ mov DWORD[4+r8],esi
+ mov ebx,esi
+ mov DWORD[8+r8],ecx
+ mov edi,ecx
+ mov DWORD[12+r8],edx
+ xor edi,edx
+ mov DWORD[16+r8],ebp
+ and esi,edi
+ jmp NEAR $L$oop_avx
+
+ALIGN 16
+$L$done_avx:
+ add ebx,DWORD[16+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[20+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ xor esi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[24+rsp]
+ xor esi,ecx
+ mov edi,eax
+ shld eax,eax,5
+ add ebp,esi
+ xor edi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[28+rsp]
+ xor edi,ebx
+ mov esi,ebp
+ shld ebp,ebp,5
+ add edx,edi
+ xor esi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ add ecx,DWORD[32+rsp]
+ xor esi,eax
+ mov edi,edx
+ shld edx,edx,5
+ add ecx,esi
+ xor edi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ add ebx,DWORD[36+rsp]
+ xor edi,ebp
+ mov esi,ecx
+ shld ecx,ecx,5
+ add ebx,edi
+ xor esi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[40+rsp]
+ xor esi,edx
+ mov edi,ebx
+ shld ebx,ebx,5
+ add eax,esi
+ xor edi,edx
+ shrd ecx,ecx,7
+ add eax,ebx
+ add ebp,DWORD[44+rsp]
+ xor edi,ecx
+ mov esi,eax
+ shld eax,eax,5
+ add ebp,edi
+ xor esi,ecx
+ shrd ebx,ebx,7
+ add ebp,eax
+ add edx,DWORD[48+rsp]
+ xor esi,ebx
+ mov edi,ebp
+ shld ebp,ebp,5
+ add edx,esi
+ xor edi,ebx
+ shrd eax,eax,7
+ add edx,ebp
+ add ecx,DWORD[52+rsp]
+ xor edi,eax
+ mov esi,edx
+ shld edx,edx,5
+ add ecx,edi
+ xor esi,eax
+ shrd ebp,ebp,7
+ add ecx,edx
+ add ebx,DWORD[56+rsp]
+ xor esi,ebp
+ mov edi,ecx
+ shld ecx,ecx,5
+ add ebx,esi
+ xor edi,ebp
+ shrd edx,edx,7
+ add ebx,ecx
+ add eax,DWORD[60+rsp]
+ xor edi,edx
+ mov esi,ebx
+ shld ebx,ebx,5
+ add eax,edi
+ shrd ecx,ecx,7
+ add eax,ebx
+ vzeroupper
+
+ add eax,DWORD[r8]
+ add esi,DWORD[4+r8]
+ add ecx,DWORD[8+r8]
+ mov DWORD[r8],eax
+ add edx,DWORD[12+r8]
+ mov DWORD[4+r8],esi
+ add ebp,DWORD[16+r8]
+ mov DWORD[8+r8],ecx
+ mov DWORD[12+r8],edx
+ mov DWORD[16+r8],ebp
+ movaps xmm6,XMMWORD[((-40-96))+r11]
+ movaps xmm7,XMMWORD[((-40-80))+r11]
+ movaps xmm8,XMMWORD[((-40-64))+r11]
+ movaps xmm9,XMMWORD[((-40-48))+r11]
+ movaps xmm10,XMMWORD[((-40-32))+r11]
+ movaps xmm11,XMMWORD[((-40-16))+r11]
+ mov r14,QWORD[((-40))+r11]
+
+ mov r13,QWORD[((-32))+r11]
+
+ mov r12,QWORD[((-24))+r11]
+
+ mov rbp,QWORD[((-16))+r11]
+
+ mov rbx,QWORD[((-8))+r11]
+
+ lea rsp,[r11]
+
+$L$epilogue_avx:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha1_block_data_order_avx:
+
+ALIGN 16
+sha1_block_data_order_avx2:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha1_block_data_order_avx2:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+_avx2_shortcut:
+
+ mov r11,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ vzeroupper
+ lea rsp,[((-96))+rsp]
+ vmovaps XMMWORD[(-40-96)+r11],xmm6
+ vmovaps XMMWORD[(-40-80)+r11],xmm7
+ vmovaps XMMWORD[(-40-64)+r11],xmm8
+ vmovaps XMMWORD[(-40-48)+r11],xmm9
+ vmovaps XMMWORD[(-40-32)+r11],xmm10
+ vmovaps XMMWORD[(-40-16)+r11],xmm11
+$L$prologue_avx2:
+ mov r8,rdi
+ mov r9,rsi
+ mov r10,rdx
+
+ lea rsp,[((-640))+rsp]
+ shl r10,6
+ lea r13,[64+r9]
+ and rsp,-128
+ add r10,r9
+ lea r14,[((K_XX_XX+64))]
+
+ mov eax,DWORD[r8]
+ cmp r13,r10
+ cmovae r13,r9
+ mov ebp,DWORD[4+r8]
+ mov ecx,DWORD[8+r8]
+ mov edx,DWORD[12+r8]
+ mov esi,DWORD[16+r8]
+ vmovdqu ymm6,YMMWORD[64+r14]
+
+ vmovdqu xmm0,XMMWORD[r9]
+ vmovdqu xmm1,XMMWORD[16+r9]
+ vmovdqu xmm2,XMMWORD[32+r9]
+ vmovdqu xmm3,XMMWORD[48+r9]
+ lea r9,[64+r9]
+ vinserti128 ymm0,ymm0,XMMWORD[r13],1
+ vinserti128 ymm1,ymm1,XMMWORD[16+r13],1
+ vpshufb ymm0,ymm0,ymm6
+ vinserti128 ymm2,ymm2,XMMWORD[32+r13],1
+ vpshufb ymm1,ymm1,ymm6
+ vinserti128 ymm3,ymm3,XMMWORD[48+r13],1
+ vpshufb ymm2,ymm2,ymm6
+ vmovdqu ymm11,YMMWORD[((-64))+r14]
+ vpshufb ymm3,ymm3,ymm6
+
+ vpaddd ymm4,ymm0,ymm11
+ vpaddd ymm5,ymm1,ymm11
+ vmovdqu YMMWORD[rsp],ymm4
+ vpaddd ymm6,ymm2,ymm11
+ vmovdqu YMMWORD[32+rsp],ymm5
+ vpaddd ymm7,ymm3,ymm11
+ vmovdqu YMMWORD[64+rsp],ymm6
+ vmovdqu YMMWORD[96+rsp],ymm7
+ vpalignr ymm4,ymm1,ymm0,8
+ vpsrldq ymm8,ymm3,4
+ vpxor ymm4,ymm4,ymm0
+ vpxor ymm8,ymm8,ymm2
+ vpxor ymm4,ymm4,ymm8
+ vpsrld ymm8,ymm4,31
+ vpslldq ymm10,ymm4,12
+ vpaddd ymm4,ymm4,ymm4
+ vpsrld ymm9,ymm10,30
+ vpor ymm4,ymm4,ymm8
+ vpslld ymm10,ymm10,2
+ vpxor ymm4,ymm4,ymm9
+ vpxor ymm4,ymm4,ymm10
+ vpaddd ymm9,ymm4,ymm11
+ vmovdqu YMMWORD[128+rsp],ymm9
+ vpalignr ymm5,ymm2,ymm1,8
+ vpsrldq ymm8,ymm4,4
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm8,ymm8,ymm3
+ vpxor ymm5,ymm5,ymm8
+ vpsrld ymm8,ymm5,31
+ vmovdqu ymm11,YMMWORD[((-32))+r14]
+ vpslldq ymm10,ymm5,12
+ vpaddd ymm5,ymm5,ymm5
+ vpsrld ymm9,ymm10,30
+ vpor ymm5,ymm5,ymm8
+ vpslld ymm10,ymm10,2
+ vpxor ymm5,ymm5,ymm9
+ vpxor ymm5,ymm5,ymm10
+ vpaddd ymm9,ymm5,ymm11
+ vmovdqu YMMWORD[160+rsp],ymm9
+ vpalignr ymm6,ymm3,ymm2,8
+ vpsrldq ymm8,ymm5,4
+ vpxor ymm6,ymm6,ymm2
+ vpxor ymm8,ymm8,ymm4
+ vpxor ymm6,ymm6,ymm8
+ vpsrld ymm8,ymm6,31
+ vpslldq ymm10,ymm6,12
+ vpaddd ymm6,ymm6,ymm6
+ vpsrld ymm9,ymm10,30
+ vpor ymm6,ymm6,ymm8
+ vpslld ymm10,ymm10,2
+ vpxor ymm6,ymm6,ymm9
+ vpxor ymm6,ymm6,ymm10
+ vpaddd ymm9,ymm6,ymm11
+ vmovdqu YMMWORD[192+rsp],ymm9
+ vpalignr ymm7,ymm4,ymm3,8
+ vpsrldq ymm8,ymm6,4
+ vpxor ymm7,ymm7,ymm3
+ vpxor ymm8,ymm8,ymm5
+ vpxor ymm7,ymm7,ymm8
+ vpsrld ymm8,ymm7,31
+ vpslldq ymm10,ymm7,12
+ vpaddd ymm7,ymm7,ymm7
+ vpsrld ymm9,ymm10,30
+ vpor ymm7,ymm7,ymm8
+ vpslld ymm10,ymm10,2
+ vpxor ymm7,ymm7,ymm9
+ vpxor ymm7,ymm7,ymm10
+ vpaddd ymm9,ymm7,ymm11
+ vmovdqu YMMWORD[224+rsp],ymm9
+ lea r13,[128+rsp]
+ jmp NEAR $L$oop_avx2
+ALIGN 32
+$L$oop_avx2:
+ rorx ebx,ebp,2
+ andn edi,ebp,edx
+ and ebp,ecx
+ xor ebp,edi
+ jmp NEAR $L$align32_1
+ALIGN 32
+$L$align32_1:
+ vpalignr ymm8,ymm7,ymm6,8
+ vpxor ymm0,ymm0,ymm4
+ add esi,DWORD[((-128))+r13]
+ andn edi,eax,ecx
+ vpxor ymm0,ymm0,ymm1
+ add esi,ebp
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ vpxor ymm0,ymm0,ymm8
+ and eax,ebx
+ add esi,r12d
+ xor eax,edi
+ vpsrld ymm8,ymm0,30
+ vpslld ymm0,ymm0,2
+ add edx,DWORD[((-124))+r13]
+ andn edi,esi,ebx
+ add edx,eax
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ and esi,ebp
+ vpor ymm0,ymm0,ymm8
+ add edx,r12d
+ xor esi,edi
+ add ecx,DWORD[((-120))+r13]
+ andn edi,edx,ebp
+ vpaddd ymm9,ymm0,ymm11
+ add ecx,esi
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ and edx,eax
+ vmovdqu YMMWORD[256+rsp],ymm9
+ add ecx,r12d
+ xor edx,edi
+ add ebx,DWORD[((-116))+r13]
+ andn edi,ecx,eax
+ add ebx,edx
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ and ecx,esi
+ add ebx,r12d
+ xor ecx,edi
+ add ebp,DWORD[((-96))+r13]
+ andn edi,ebx,esi
+ add ebp,ecx
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ and ebx,edx
+ add ebp,r12d
+ xor ebx,edi
+ vpalignr ymm8,ymm0,ymm7,8
+ vpxor ymm1,ymm1,ymm5
+ add eax,DWORD[((-92))+r13]
+ andn edi,ebp,edx
+ vpxor ymm1,ymm1,ymm2
+ add eax,ebx
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ vpxor ymm1,ymm1,ymm8
+ and ebp,ecx
+ add eax,r12d
+ xor ebp,edi
+ vpsrld ymm8,ymm1,30
+ vpslld ymm1,ymm1,2
+ add esi,DWORD[((-88))+r13]
+ andn edi,eax,ecx
+ add esi,ebp
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ and eax,ebx
+ vpor ymm1,ymm1,ymm8
+ add esi,r12d
+ xor eax,edi
+ add edx,DWORD[((-84))+r13]
+ andn edi,esi,ebx
+ vpaddd ymm9,ymm1,ymm11
+ add edx,eax
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ and esi,ebp
+ vmovdqu YMMWORD[288+rsp],ymm9
+ add edx,r12d
+ xor esi,edi
+ add ecx,DWORD[((-64))+r13]
+ andn edi,edx,ebp
+ add ecx,esi
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ and edx,eax
+ add ecx,r12d
+ xor edx,edi
+ add ebx,DWORD[((-60))+r13]
+ andn edi,ecx,eax
+ add ebx,edx
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ and ecx,esi
+ add ebx,r12d
+ xor ecx,edi
+ vpalignr ymm8,ymm1,ymm0,8
+ vpxor ymm2,ymm2,ymm6
+ add ebp,DWORD[((-56))+r13]
+ andn edi,ebx,esi
+ vpxor ymm2,ymm2,ymm3
+ vmovdqu ymm11,YMMWORD[r14]
+ add ebp,ecx
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ vpxor ymm2,ymm2,ymm8
+ and ebx,edx
+ add ebp,r12d
+ xor ebx,edi
+ vpsrld ymm8,ymm2,30
+ vpslld ymm2,ymm2,2
+ add eax,DWORD[((-52))+r13]
+ andn edi,ebp,edx
+ add eax,ebx
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ and ebp,ecx
+ vpor ymm2,ymm2,ymm8
+ add eax,r12d
+ xor ebp,edi
+ add esi,DWORD[((-32))+r13]
+ andn edi,eax,ecx
+ vpaddd ymm9,ymm2,ymm11
+ add esi,ebp
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ and eax,ebx
+ vmovdqu YMMWORD[320+rsp],ymm9
+ add esi,r12d
+ xor eax,edi
+ add edx,DWORD[((-28))+r13]
+ andn edi,esi,ebx
+ add edx,eax
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ and esi,ebp
+ add edx,r12d
+ xor esi,edi
+ add ecx,DWORD[((-24))+r13]
+ andn edi,edx,ebp
+ add ecx,esi
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ and edx,eax
+ add ecx,r12d
+ xor edx,edi
+ vpalignr ymm8,ymm2,ymm1,8
+ vpxor ymm3,ymm3,ymm7
+ add ebx,DWORD[((-20))+r13]
+ andn edi,ecx,eax
+ vpxor ymm3,ymm3,ymm4
+ add ebx,edx
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ vpxor ymm3,ymm3,ymm8
+ and ecx,esi
+ add ebx,r12d
+ xor ecx,edi
+ vpsrld ymm8,ymm3,30
+ vpslld ymm3,ymm3,2
+ add ebp,DWORD[r13]
+ andn edi,ebx,esi
+ add ebp,ecx
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ and ebx,edx
+ vpor ymm3,ymm3,ymm8
+ add ebp,r12d
+ xor ebx,edi
+ add eax,DWORD[4+r13]
+ andn edi,ebp,edx
+ vpaddd ymm9,ymm3,ymm11
+ add eax,ebx
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ and ebp,ecx
+ vmovdqu YMMWORD[352+rsp],ymm9
+ add eax,r12d
+ xor ebp,edi
+ add esi,DWORD[8+r13]
+ andn edi,eax,ecx
+ add esi,ebp
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ and eax,ebx
+ add esi,r12d
+ xor eax,edi
+ add edx,DWORD[12+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ vpalignr ymm8,ymm3,ymm2,8
+ vpxor ymm4,ymm4,ymm0
+ add ecx,DWORD[32+r13]
+ lea ecx,[rsi*1+rcx]
+ vpxor ymm4,ymm4,ymm5
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ vpxor ymm4,ymm4,ymm8
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[36+r13]
+ vpsrld ymm8,ymm4,30
+ vpslld ymm4,ymm4,2
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ vpor ymm4,ymm4,ymm8
+ add ebp,DWORD[40+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ vpaddd ymm9,ymm4,ymm11
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ add eax,DWORD[44+r13]
+ vmovdqu YMMWORD[384+rsp],ymm9
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ add esi,DWORD[64+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ vpalignr ymm8,ymm4,ymm3,8
+ vpxor ymm5,ymm5,ymm1
+ add edx,DWORD[68+r13]
+ lea edx,[rax*1+rdx]
+ vpxor ymm5,ymm5,ymm6
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ vpxor ymm5,ymm5,ymm8
+ add edx,r12d
+ xor esi,ebx
+ add ecx,DWORD[72+r13]
+ vpsrld ymm8,ymm5,30
+ vpslld ymm5,ymm5,2
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ vpor ymm5,ymm5,ymm8
+ add ebx,DWORD[76+r13]
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ vpaddd ymm9,ymm5,ymm11
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ add ebp,DWORD[96+r13]
+ vmovdqu YMMWORD[416+rsp],ymm9
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ add eax,DWORD[100+r13]
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ vpalignr ymm8,ymm5,ymm4,8
+ vpxor ymm6,ymm6,ymm2
+ add esi,DWORD[104+r13]
+ lea esi,[rbp*1+rsi]
+ vpxor ymm6,ymm6,ymm7
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ vpxor ymm6,ymm6,ymm8
+ add esi,r12d
+ xor eax,ecx
+ add edx,DWORD[108+r13]
+ lea r13,[256+r13]
+ vpsrld ymm8,ymm6,30
+ vpslld ymm6,ymm6,2
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ vpor ymm6,ymm6,ymm8
+ add ecx,DWORD[((-128))+r13]
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ vpaddd ymm9,ymm6,ymm11
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[((-124))+r13]
+ vmovdqu YMMWORD[448+rsp],ymm9
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ add ebp,DWORD[((-120))+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ vpalignr ymm8,ymm6,ymm5,8
+ vpxor ymm7,ymm7,ymm3
+ add eax,DWORD[((-116))+r13]
+ lea eax,[rbx*1+rax]
+ vpxor ymm7,ymm7,ymm0
+ vmovdqu ymm11,YMMWORD[32+r14]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ vpxor ymm7,ymm7,ymm8
+ add eax,r12d
+ xor ebp,edx
+ add esi,DWORD[((-96))+r13]
+ vpsrld ymm8,ymm7,30
+ vpslld ymm7,ymm7,2
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ vpor ymm7,ymm7,ymm8
+ add edx,DWORD[((-92))+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ vpaddd ymm9,ymm7,ymm11
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ add ecx,DWORD[((-88))+r13]
+ vmovdqu YMMWORD[480+rsp],ymm9
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[((-84))+r13]
+ mov edi,esi
+ xor edi,eax
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ and ecx,edi
+ jmp NEAR $L$align32_2
+ALIGN 32
+$L$align32_2:
+ vpalignr ymm8,ymm7,ymm6,8
+ vpxor ymm0,ymm0,ymm4
+ add ebp,DWORD[((-64))+r13]
+ xor ecx,esi
+ vpxor ymm0,ymm0,ymm1
+ mov edi,edx
+ xor edi,esi
+ lea ebp,[rbp*1+rcx]
+ vpxor ymm0,ymm0,ymm8
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ vpsrld ymm8,ymm0,30
+ vpslld ymm0,ymm0,2
+ add ebp,r12d
+ and ebx,edi
+ add eax,DWORD[((-60))+r13]
+ xor ebx,edx
+ mov edi,ecx
+ xor edi,edx
+ vpor ymm0,ymm0,ymm8
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ vpaddd ymm9,ymm0,ymm11
+ add eax,r12d
+ and ebp,edi
+ add esi,DWORD[((-56))+r13]
+ xor ebp,ecx
+ vmovdqu YMMWORD[512+rsp],ymm9
+ mov edi,ebx
+ xor edi,ecx
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ and eax,edi
+ add edx,DWORD[((-52))+r13]
+ xor eax,ebx
+ mov edi,ebp
+ xor edi,ebx
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ and esi,edi
+ add ecx,DWORD[((-32))+r13]
+ xor esi,ebp
+ mov edi,eax
+ xor edi,ebp
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ and edx,edi
+ vpalignr ymm8,ymm0,ymm7,8
+ vpxor ymm1,ymm1,ymm5
+ add ebx,DWORD[((-28))+r13]
+ xor edx,eax
+ vpxor ymm1,ymm1,ymm2
+ mov edi,esi
+ xor edi,eax
+ lea ebx,[rdx*1+rbx]
+ vpxor ymm1,ymm1,ymm8
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ vpsrld ymm8,ymm1,30
+ vpslld ymm1,ymm1,2
+ add ebx,r12d
+ and ecx,edi
+ add ebp,DWORD[((-24))+r13]
+ xor ecx,esi
+ mov edi,edx
+ xor edi,esi
+ vpor ymm1,ymm1,ymm8
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ vpaddd ymm9,ymm1,ymm11
+ add ebp,r12d
+ and ebx,edi
+ add eax,DWORD[((-20))+r13]
+ xor ebx,edx
+ vmovdqu YMMWORD[544+rsp],ymm9
+ mov edi,ecx
+ xor edi,edx
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ and ebp,edi
+ add esi,DWORD[r13]
+ xor ebp,ecx
+ mov edi,ebx
+ xor edi,ecx
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ and eax,edi
+ add edx,DWORD[4+r13]
+ xor eax,ebx
+ mov edi,ebp
+ xor edi,ebx
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ and esi,edi
+ vpalignr ymm8,ymm1,ymm0,8
+ vpxor ymm2,ymm2,ymm6
+ add ecx,DWORD[8+r13]
+ xor esi,ebp
+ vpxor ymm2,ymm2,ymm3
+ mov edi,eax
+ xor edi,ebp
+ lea ecx,[rsi*1+rcx]
+ vpxor ymm2,ymm2,ymm8
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ vpsrld ymm8,ymm2,30
+ vpslld ymm2,ymm2,2
+ add ecx,r12d
+ and edx,edi
+ add ebx,DWORD[12+r13]
+ xor edx,eax
+ mov edi,esi
+ xor edi,eax
+ vpor ymm2,ymm2,ymm8
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ vpaddd ymm9,ymm2,ymm11
+ add ebx,r12d
+ and ecx,edi
+ add ebp,DWORD[32+r13]
+ xor ecx,esi
+ vmovdqu YMMWORD[576+rsp],ymm9
+ mov edi,edx
+ xor edi,esi
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ and ebx,edi
+ add eax,DWORD[36+r13]
+ xor ebx,edx
+ mov edi,ecx
+ xor edi,edx
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ and ebp,edi
+ add esi,DWORD[40+r13]
+ xor ebp,ecx
+ mov edi,ebx
+ xor edi,ecx
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ and eax,edi
+ vpalignr ymm8,ymm2,ymm1,8
+ vpxor ymm3,ymm3,ymm7
+ add edx,DWORD[44+r13]
+ xor eax,ebx
+ vpxor ymm3,ymm3,ymm4
+ mov edi,ebp
+ xor edi,ebx
+ lea edx,[rax*1+rdx]
+ vpxor ymm3,ymm3,ymm8
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ vpsrld ymm8,ymm3,30
+ vpslld ymm3,ymm3,2
+ add edx,r12d
+ and esi,edi
+ add ecx,DWORD[64+r13]
+ xor esi,ebp
+ mov edi,eax
+ xor edi,ebp
+ vpor ymm3,ymm3,ymm8
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ vpaddd ymm9,ymm3,ymm11
+ add ecx,r12d
+ and edx,edi
+ add ebx,DWORD[68+r13]
+ xor edx,eax
+ vmovdqu YMMWORD[608+rsp],ymm9
+ mov edi,esi
+ xor edi,eax
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ and ecx,edi
+ add ebp,DWORD[72+r13]
+ xor ecx,esi
+ mov edi,edx
+ xor edi,esi
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ and ebx,edi
+ add eax,DWORD[76+r13]
+ xor ebx,edx
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ add esi,DWORD[96+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ add edx,DWORD[100+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ add ecx,DWORD[104+r13]
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[108+r13]
+ lea r13,[256+r13]
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ add ebp,DWORD[((-128))+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ add eax,DWORD[((-124))+r13]
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ add esi,DWORD[((-120))+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ add edx,DWORD[((-116))+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ add ecx,DWORD[((-96))+r13]
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[((-92))+r13]
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ add ebp,DWORD[((-88))+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ add eax,DWORD[((-84))+r13]
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ add esi,DWORD[((-64))+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ add edx,DWORD[((-60))+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ add ecx,DWORD[((-56))+r13]
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[((-52))+r13]
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ add ebp,DWORD[((-32))+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ add eax,DWORD[((-28))+r13]
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ add esi,DWORD[((-24))+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ add edx,DWORD[((-20))+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ add edx,r12d
+ lea r13,[128+r9]
+ lea rdi,[128+r9]
+ cmp r13,r10
+ cmovae r13,r9
+
+
+ add edx,DWORD[r8]
+ add esi,DWORD[4+r8]
+ add ebp,DWORD[8+r8]
+ mov DWORD[r8],edx
+ add ebx,DWORD[12+r8]
+ mov DWORD[4+r8],esi
+ mov eax,edx
+ add ecx,DWORD[16+r8]
+ mov r12d,ebp
+ mov DWORD[8+r8],ebp
+ mov edx,ebx
+
+ mov DWORD[12+r8],ebx
+ mov ebp,esi
+ mov DWORD[16+r8],ecx
+
+ mov esi,ecx
+ mov ecx,r12d
+
+
+ cmp r9,r10
+ je NEAR $L$done_avx2
+ vmovdqu ymm6,YMMWORD[64+r14]
+ cmp rdi,r10
+ ja NEAR $L$ast_avx2
+
+ vmovdqu xmm0,XMMWORD[((-64))+rdi]
+ vmovdqu xmm1,XMMWORD[((-48))+rdi]
+ vmovdqu xmm2,XMMWORD[((-32))+rdi]
+ vmovdqu xmm3,XMMWORD[((-16))+rdi]
+ vinserti128 ymm0,ymm0,XMMWORD[r13],1
+ vinserti128 ymm1,ymm1,XMMWORD[16+r13],1
+ vinserti128 ymm2,ymm2,XMMWORD[32+r13],1
+ vinserti128 ymm3,ymm3,XMMWORD[48+r13],1
+ jmp NEAR $L$ast_avx2
+
+ALIGN 32
+$L$ast_avx2:
+ lea r13,[((128+16))+rsp]
+ rorx ebx,ebp,2
+ andn edi,ebp,edx
+ and ebp,ecx
+ xor ebp,edi
+ sub r9,-128
+ add esi,DWORD[((-128))+r13]
+ andn edi,eax,ecx
+ add esi,ebp
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ and eax,ebx
+ add esi,r12d
+ xor eax,edi
+ add edx,DWORD[((-124))+r13]
+ andn edi,esi,ebx
+ add edx,eax
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ and esi,ebp
+ add edx,r12d
+ xor esi,edi
+ add ecx,DWORD[((-120))+r13]
+ andn edi,edx,ebp
+ add ecx,esi
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ and edx,eax
+ add ecx,r12d
+ xor edx,edi
+ add ebx,DWORD[((-116))+r13]
+ andn edi,ecx,eax
+ add ebx,edx
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ and ecx,esi
+ add ebx,r12d
+ xor ecx,edi
+ add ebp,DWORD[((-96))+r13]
+ andn edi,ebx,esi
+ add ebp,ecx
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ and ebx,edx
+ add ebp,r12d
+ xor ebx,edi
+ add eax,DWORD[((-92))+r13]
+ andn edi,ebp,edx
+ add eax,ebx
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ and ebp,ecx
+ add eax,r12d
+ xor ebp,edi
+ add esi,DWORD[((-88))+r13]
+ andn edi,eax,ecx
+ add esi,ebp
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ and eax,ebx
+ add esi,r12d
+ xor eax,edi
+ add edx,DWORD[((-84))+r13]
+ andn edi,esi,ebx
+ add edx,eax
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ and esi,ebp
+ add edx,r12d
+ xor esi,edi
+ add ecx,DWORD[((-64))+r13]
+ andn edi,edx,ebp
+ add ecx,esi
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ and edx,eax
+ add ecx,r12d
+ xor edx,edi
+ add ebx,DWORD[((-60))+r13]
+ andn edi,ecx,eax
+ add ebx,edx
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ and ecx,esi
+ add ebx,r12d
+ xor ecx,edi
+ add ebp,DWORD[((-56))+r13]
+ andn edi,ebx,esi
+ add ebp,ecx
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ and ebx,edx
+ add ebp,r12d
+ xor ebx,edi
+ add eax,DWORD[((-52))+r13]
+ andn edi,ebp,edx
+ add eax,ebx
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ and ebp,ecx
+ add eax,r12d
+ xor ebp,edi
+ add esi,DWORD[((-32))+r13]
+ andn edi,eax,ecx
+ add esi,ebp
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ and eax,ebx
+ add esi,r12d
+ xor eax,edi
+ add edx,DWORD[((-28))+r13]
+ andn edi,esi,ebx
+ add edx,eax
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ and esi,ebp
+ add edx,r12d
+ xor esi,edi
+ add ecx,DWORD[((-24))+r13]
+ andn edi,edx,ebp
+ add ecx,esi
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ and edx,eax
+ add ecx,r12d
+ xor edx,edi
+ add ebx,DWORD[((-20))+r13]
+ andn edi,ecx,eax
+ add ebx,edx
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ and ecx,esi
+ add ebx,r12d
+ xor ecx,edi
+ add ebp,DWORD[r13]
+ andn edi,ebx,esi
+ add ebp,ecx
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ and ebx,edx
+ add ebp,r12d
+ xor ebx,edi
+ add eax,DWORD[4+r13]
+ andn edi,ebp,edx
+ add eax,ebx
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ and ebp,ecx
+ add eax,r12d
+ xor ebp,edi
+ add esi,DWORD[8+r13]
+ andn edi,eax,ecx
+ add esi,ebp
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ and eax,ebx
+ add esi,r12d
+ xor eax,edi
+ add edx,DWORD[12+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ add ecx,DWORD[32+r13]
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[36+r13]
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ add ebp,DWORD[40+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ add eax,DWORD[44+r13]
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ add esi,DWORD[64+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ vmovdqu ymm11,YMMWORD[((-64))+r14]
+ vpshufb ymm0,ymm0,ymm6
+ add edx,DWORD[68+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ add ecx,DWORD[72+r13]
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[76+r13]
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ add ebp,DWORD[96+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ add eax,DWORD[100+r13]
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ vpshufb ymm1,ymm1,ymm6
+ vpaddd ymm8,ymm0,ymm11
+ add esi,DWORD[104+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ add edx,DWORD[108+r13]
+ lea r13,[256+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ add ecx,DWORD[((-128))+r13]
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[((-124))+r13]
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ add ebp,DWORD[((-120))+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ vmovdqu YMMWORD[rsp],ymm8
+ vpshufb ymm2,ymm2,ymm6
+ vpaddd ymm9,ymm1,ymm11
+ add eax,DWORD[((-116))+r13]
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ add esi,DWORD[((-96))+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ add edx,DWORD[((-92))+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ add ecx,DWORD[((-88))+r13]
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[((-84))+r13]
+ mov edi,esi
+ xor edi,eax
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ and ecx,edi
+ vmovdqu YMMWORD[32+rsp],ymm9
+ vpshufb ymm3,ymm3,ymm6
+ vpaddd ymm6,ymm2,ymm11
+ add ebp,DWORD[((-64))+r13]
+ xor ecx,esi
+ mov edi,edx
+ xor edi,esi
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ and ebx,edi
+ add eax,DWORD[((-60))+r13]
+ xor ebx,edx
+ mov edi,ecx
+ xor edi,edx
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ and ebp,edi
+ add esi,DWORD[((-56))+r13]
+ xor ebp,ecx
+ mov edi,ebx
+ xor edi,ecx
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ and eax,edi
+ add edx,DWORD[((-52))+r13]
+ xor eax,ebx
+ mov edi,ebp
+ xor edi,ebx
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ and esi,edi
+ add ecx,DWORD[((-32))+r13]
+ xor esi,ebp
+ mov edi,eax
+ xor edi,ebp
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ and edx,edi
+ jmp NEAR $L$align32_3
+ALIGN 32
+$L$align32_3:
+ vmovdqu YMMWORD[64+rsp],ymm6
+ vpaddd ymm7,ymm3,ymm11
+ add ebx,DWORD[((-28))+r13]
+ xor edx,eax
+ mov edi,esi
+ xor edi,eax
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ and ecx,edi
+ add ebp,DWORD[((-24))+r13]
+ xor ecx,esi
+ mov edi,edx
+ xor edi,esi
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ and ebx,edi
+ add eax,DWORD[((-20))+r13]
+ xor ebx,edx
+ mov edi,ecx
+ xor edi,edx
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ and ebp,edi
+ add esi,DWORD[r13]
+ xor ebp,ecx
+ mov edi,ebx
+ xor edi,ecx
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ and eax,edi
+ add edx,DWORD[4+r13]
+ xor eax,ebx
+ mov edi,ebp
+ xor edi,ebx
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ and esi,edi
+ vmovdqu YMMWORD[96+rsp],ymm7
+ add ecx,DWORD[8+r13]
+ xor esi,ebp
+ mov edi,eax
+ xor edi,ebp
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ and edx,edi
+ add ebx,DWORD[12+r13]
+ xor edx,eax
+ mov edi,esi
+ xor edi,eax
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ and ecx,edi
+ add ebp,DWORD[32+r13]
+ xor ecx,esi
+ mov edi,edx
+ xor edi,esi
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ and ebx,edi
+ add eax,DWORD[36+r13]
+ xor ebx,edx
+ mov edi,ecx
+ xor edi,edx
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ and ebp,edi
+ add esi,DWORD[40+r13]
+ xor ebp,ecx
+ mov edi,ebx
+ xor edi,ecx
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ and eax,edi
+ vpalignr ymm4,ymm1,ymm0,8
+ add edx,DWORD[44+r13]
+ xor eax,ebx
+ mov edi,ebp
+ xor edi,ebx
+ vpsrldq ymm8,ymm3,4
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ vpxor ymm4,ymm4,ymm0
+ vpxor ymm8,ymm8,ymm2
+ xor esi,ebp
+ add edx,r12d
+ vpxor ymm4,ymm4,ymm8
+ and esi,edi
+ add ecx,DWORD[64+r13]
+ xor esi,ebp
+ mov edi,eax
+ vpsrld ymm8,ymm4,31
+ xor edi,ebp
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ vpslldq ymm10,ymm4,12
+ vpaddd ymm4,ymm4,ymm4
+ rorx esi,edx,2
+ xor edx,eax
+ vpsrld ymm9,ymm10,30
+ vpor ymm4,ymm4,ymm8
+ add ecx,r12d
+ and edx,edi
+ vpslld ymm10,ymm10,2
+ vpxor ymm4,ymm4,ymm9
+ add ebx,DWORD[68+r13]
+ xor edx,eax
+ vpxor ymm4,ymm4,ymm10
+ mov edi,esi
+ xor edi,eax
+ lea ebx,[rdx*1+rbx]
+ vpaddd ymm9,ymm4,ymm11
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ vmovdqu YMMWORD[128+rsp],ymm9
+ add ebx,r12d
+ and ecx,edi
+ add ebp,DWORD[72+r13]
+ xor ecx,esi
+ mov edi,edx
+ xor edi,esi
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ and ebx,edi
+ add eax,DWORD[76+r13]
+ xor ebx,edx
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ vpalignr ymm5,ymm2,ymm1,8
+ add esi,DWORD[96+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ vpsrldq ymm8,ymm4,4
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ vpxor ymm5,ymm5,ymm1
+ vpxor ymm8,ymm8,ymm3
+ add edx,DWORD[100+r13]
+ lea edx,[rax*1+rdx]
+ vpxor ymm5,ymm5,ymm8
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ xor esi,ebp
+ add edx,r12d
+ vpsrld ymm8,ymm5,31
+ vmovdqu ymm11,YMMWORD[((-32))+r14]
+ xor esi,ebx
+ add ecx,DWORD[104+r13]
+ lea ecx,[rsi*1+rcx]
+ vpslldq ymm10,ymm5,12
+ vpaddd ymm5,ymm5,ymm5
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ vpsrld ymm9,ymm10,30
+ vpor ymm5,ymm5,ymm8
+ xor edx,eax
+ add ecx,r12d
+ vpslld ymm10,ymm10,2
+ vpxor ymm5,ymm5,ymm9
+ xor edx,ebp
+ add ebx,DWORD[108+r13]
+ lea r13,[256+r13]
+ vpxor ymm5,ymm5,ymm10
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ vpaddd ymm9,ymm5,ymm11
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ vmovdqu YMMWORD[160+rsp],ymm9
+ add ebp,DWORD[((-128))+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ vpalignr ymm6,ymm3,ymm2,8
+ add eax,DWORD[((-124))+r13]
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ vpsrldq ymm8,ymm5,4
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ vpxor ymm6,ymm6,ymm2
+ vpxor ymm8,ymm8,ymm4
+ add esi,DWORD[((-120))+r13]
+ lea esi,[rbp*1+rsi]
+ vpxor ymm6,ymm6,ymm8
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ vpsrld ymm8,ymm6,31
+ xor eax,ecx
+ add edx,DWORD[((-116))+r13]
+ lea edx,[rax*1+rdx]
+ vpslldq ymm10,ymm6,12
+ vpaddd ymm6,ymm6,ymm6
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ vpsrld ymm9,ymm10,30
+ vpor ymm6,ymm6,ymm8
+ xor esi,ebp
+ add edx,r12d
+ vpslld ymm10,ymm10,2
+ vpxor ymm6,ymm6,ymm9
+ xor esi,ebx
+ add ecx,DWORD[((-96))+r13]
+ vpxor ymm6,ymm6,ymm10
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ vpaddd ymm9,ymm6,ymm11
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ vmovdqu YMMWORD[192+rsp],ymm9
+ add ebx,DWORD[((-92))+r13]
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ vpalignr ymm7,ymm4,ymm3,8
+ add ebp,DWORD[((-88))+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ vpsrldq ymm8,ymm6,4
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ vpxor ymm7,ymm7,ymm3
+ vpxor ymm8,ymm8,ymm5
+ add eax,DWORD[((-84))+r13]
+ lea eax,[rbx*1+rax]
+ vpxor ymm7,ymm7,ymm8
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ vpsrld ymm8,ymm7,31
+ xor ebp,edx
+ add esi,DWORD[((-64))+r13]
+ lea esi,[rbp*1+rsi]
+ vpslldq ymm10,ymm7,12
+ vpaddd ymm7,ymm7,ymm7
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ vpsrld ymm9,ymm10,30
+ vpor ymm7,ymm7,ymm8
+ xor eax,ebx
+ add esi,r12d
+ vpslld ymm10,ymm10,2
+ vpxor ymm7,ymm7,ymm9
+ xor eax,ecx
+ add edx,DWORD[((-60))+r13]
+ vpxor ymm7,ymm7,ymm10
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ rorx eax,esi,2
+ vpaddd ymm9,ymm7,ymm11
+ xor esi,ebp
+ add edx,r12d
+ xor esi,ebx
+ vmovdqu YMMWORD[224+rsp],ymm9
+ add ecx,DWORD[((-56))+r13]
+ lea ecx,[rsi*1+rcx]
+ rorx r12d,edx,27
+ rorx esi,edx,2
+ xor edx,eax
+ add ecx,r12d
+ xor edx,ebp
+ add ebx,DWORD[((-52))+r13]
+ lea ebx,[rdx*1+rbx]
+ rorx r12d,ecx,27
+ rorx edx,ecx,2
+ xor ecx,esi
+ add ebx,r12d
+ xor ecx,eax
+ add ebp,DWORD[((-32))+r13]
+ lea ebp,[rbp*1+rcx]
+ rorx r12d,ebx,27
+ rorx ecx,ebx,2
+ xor ebx,edx
+ add ebp,r12d
+ xor ebx,esi
+ add eax,DWORD[((-28))+r13]
+ lea eax,[rbx*1+rax]
+ rorx r12d,ebp,27
+ rorx ebx,ebp,2
+ xor ebp,ecx
+ add eax,r12d
+ xor ebp,edx
+ add esi,DWORD[((-24))+r13]
+ lea esi,[rbp*1+rsi]
+ rorx r12d,eax,27
+ rorx ebp,eax,2
+ xor eax,ebx
+ add esi,r12d
+ xor eax,ecx
+ add edx,DWORD[((-20))+r13]
+ lea edx,[rax*1+rdx]
+ rorx r12d,esi,27
+ add edx,r12d
+ lea r13,[128+rsp]
+
+
+ add edx,DWORD[r8]
+ add esi,DWORD[4+r8]
+ add ebp,DWORD[8+r8]
+ mov DWORD[r8],edx
+ add ebx,DWORD[12+r8]
+ mov DWORD[4+r8],esi
+ mov eax,edx
+ add ecx,DWORD[16+r8]
+ mov r12d,ebp
+ mov DWORD[8+r8],ebp
+ mov edx,ebx
+
+ mov DWORD[12+r8],ebx
+ mov ebp,esi
+ mov DWORD[16+r8],ecx
+
+ mov esi,ecx
+ mov ecx,r12d
+
+
+ cmp r9,r10
+ jbe NEAR $L$oop_avx2
+
+$L$done_avx2:
+ vzeroupper
+ movaps xmm6,XMMWORD[((-40-96))+r11]
+ movaps xmm7,XMMWORD[((-40-80))+r11]
+ movaps xmm8,XMMWORD[((-40-64))+r11]
+ movaps xmm9,XMMWORD[((-40-48))+r11]
+ movaps xmm10,XMMWORD[((-40-32))+r11]
+ movaps xmm11,XMMWORD[((-40-16))+r11]
+ mov r14,QWORD[((-40))+r11]
+
+ mov r13,QWORD[((-32))+r11]
+
+ mov r12,QWORD[((-24))+r11]
+
+ mov rbp,QWORD[((-16))+r11]
+
+ mov rbx,QWORD[((-8))+r11]
+
+ lea rsp,[r11]
+
+$L$epilogue_avx2:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha1_block_data_order_avx2:
+ALIGN 64
+K_XX_XX:
+ DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999
+ DD 0x5a827999,0x5a827999,0x5a827999,0x5a827999
+ DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+ DD 0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+ DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+ DD 0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+ DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+ DD 0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+DB 0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0
+DB 83,72,65,49,32,98,108,111,99,107,32,116,114,97,110,115
+DB 102,111,114,109,32,102,111,114,32,120,56,54,95,54,52,44
+DB 32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60
+DB 97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114
+DB 103,62,0
+ALIGN 64
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ lea r10,[$L$prologue]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[152+r8]
+
+ lea r10,[$L$epilogue]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ mov rax,QWORD[64+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+
+ jmp NEAR $L$common_seh_tail
+
+
+ALIGN 16
+shaext_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ lea r10,[$L$prologue_shaext]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ lea r10,[$L$epilogue_shaext]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ lea rsi,[((-8-64))+rax]
+ lea rdi,[512+r8]
+ mov ecx,8
+ DD 0xa548f3fc
+
+ jmp NEAR $L$common_seh_tail
+
+
+ALIGN 16
+ssse3_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$common_seh_tail
+
+ mov rax,QWORD[208+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$common_seh_tail
+
+ lea rsi,[((-40-96))+rax]
+ lea rdi,[512+r8]
+ mov ecx,12
+ DD 0xa548f3fc
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+
+$L$common_seh_tail:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_sha1_block_data_order wrt ..imagebase
+ DD $L$SEH_end_sha1_block_data_order wrt ..imagebase
+ DD $L$SEH_info_sha1_block_data_order wrt ..imagebase
+ DD $L$SEH_begin_sha1_block_data_order_shaext wrt ..imagebase
+ DD $L$SEH_end_sha1_block_data_order_shaext wrt ..imagebase
+ DD $L$SEH_info_sha1_block_data_order_shaext wrt ..imagebase
+ DD $L$SEH_begin_sha1_block_data_order_ssse3 wrt ..imagebase
+ DD $L$SEH_end_sha1_block_data_order_ssse3 wrt ..imagebase
+ DD $L$SEH_info_sha1_block_data_order_ssse3 wrt ..imagebase
+ DD $L$SEH_begin_sha1_block_data_order_avx wrt ..imagebase
+ DD $L$SEH_end_sha1_block_data_order_avx wrt ..imagebase
+ DD $L$SEH_info_sha1_block_data_order_avx wrt ..imagebase
+ DD $L$SEH_begin_sha1_block_data_order_avx2 wrt ..imagebase
+ DD $L$SEH_end_sha1_block_data_order_avx2 wrt ..imagebase
+ DD $L$SEH_info_sha1_block_data_order_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_sha1_block_data_order:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+$L$SEH_info_sha1_block_data_order_shaext:
+DB 9,0,0,0
+ DD shaext_handler wrt ..imagebase
+$L$SEH_info_sha1_block_data_order_ssse3:
+DB 9,0,0,0
+ DD ssse3_handler wrt ..imagebase
+ DD $L$prologue_ssse3 wrt ..imagebase,$L$epilogue_ssse3 wrt ..imagebase
+$L$SEH_info_sha1_block_data_order_avx:
+DB 9,0,0,0
+ DD ssse3_handler wrt ..imagebase
+ DD $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha1_block_data_order_avx2:
+DB 9,0,0,0
+ DD ssse3_handler wrt ..imagebase
+ DD $L$prologue_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm
new file mode 100644
index 0000000000..5940112c1f
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm
@@ -0,0 +1,8262 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN OPENSSL_ia32cap_P
+
+global sha256_multi_block
+
+ALIGN 32
+sha256_multi_block:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha256_multi_block:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ mov rcx,QWORD[((OPENSSL_ia32cap_P+4))]
+ bt rcx,61
+ jc NEAR _shaext_shortcut
+ test ecx,268435456
+ jnz NEAR _avx_shortcut
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[(-120)+rax],xmm10
+ movaps XMMWORD[(-104)+rax],xmm11
+ movaps XMMWORD[(-88)+rax],xmm12
+ movaps XMMWORD[(-72)+rax],xmm13
+ movaps XMMWORD[(-56)+rax],xmm14
+ movaps XMMWORD[(-40)+rax],xmm15
+ sub rsp,288
+ and rsp,-256
+ mov QWORD[272+rsp],rax
+
+$L$body:
+ lea rbp,[((K256+128))]
+ lea rbx,[256+rsp]
+ lea rdi,[128+rdi]
+
+$L$oop_grande:
+ mov DWORD[280+rsp],edx
+ xor edx,edx
+ mov r8,QWORD[rsi]
+ mov ecx,DWORD[8+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[rbx],ecx
+ cmovle r8,rbp
+ mov r9,QWORD[16+rsi]
+ mov ecx,DWORD[24+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[4+rbx],ecx
+ cmovle r9,rbp
+ mov r10,QWORD[32+rsi]
+ mov ecx,DWORD[40+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[8+rbx],ecx
+ cmovle r10,rbp
+ mov r11,QWORD[48+rsi]
+ mov ecx,DWORD[56+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[12+rbx],ecx
+ cmovle r11,rbp
+ test edx,edx
+ jz NEAR $L$done
+
+ movdqu xmm8,XMMWORD[((0-128))+rdi]
+ lea rax,[128+rsp]
+ movdqu xmm9,XMMWORD[((32-128))+rdi]
+ movdqu xmm10,XMMWORD[((64-128))+rdi]
+ movdqu xmm11,XMMWORD[((96-128))+rdi]
+ movdqu xmm12,XMMWORD[((128-128))+rdi]
+ movdqu xmm13,XMMWORD[((160-128))+rdi]
+ movdqu xmm14,XMMWORD[((192-128))+rdi]
+ movdqu xmm15,XMMWORD[((224-128))+rdi]
+ movdqu xmm6,XMMWORD[$L$pbswap]
+ jmp NEAR $L$oop
+
+ALIGN 32
+$L$oop:
+ movdqa xmm4,xmm10
+ pxor xmm4,xmm9
+ movd xmm5,DWORD[r8]
+ movd xmm0,DWORD[r9]
+ movd xmm1,DWORD[r10]
+ movd xmm2,DWORD[r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm12
+DB 102,15,56,0,238
+ movdqa xmm2,xmm12
+
+ psrld xmm7,6
+ movdqa xmm1,xmm12
+ pslld xmm2,7
+ movdqa XMMWORD[(0-128)+rax],xmm5
+ paddd xmm5,xmm15
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-128))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm12
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm12
+ pslld xmm2,26-21
+ pandn xmm0,xmm14
+ pand xmm3,xmm13
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm8
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm8
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm9
+ movdqa xmm7,xmm8
+ pslld xmm2,10
+ pxor xmm3,xmm8
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm15,xmm9
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm15,xmm4
+ paddd xmm11,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm15,xmm5
+ paddd xmm15,xmm7
+ movd xmm5,DWORD[4+r8]
+ movd xmm0,DWORD[4+r9]
+ movd xmm1,DWORD[4+r10]
+ movd xmm2,DWORD[4+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm11
+
+ movdqa xmm2,xmm11
+DB 102,15,56,0,238
+ psrld xmm7,6
+ movdqa xmm1,xmm11
+ pslld xmm2,7
+ movdqa XMMWORD[(16-128)+rax],xmm5
+ paddd xmm5,xmm14
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-96))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm11
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm11
+ pslld xmm2,26-21
+ pandn xmm0,xmm13
+ pand xmm4,xmm12
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm15
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm15
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm8
+ movdqa xmm7,xmm15
+ pslld xmm2,10
+ pxor xmm4,xmm15
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm14,xmm8
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm14,xmm3
+ paddd xmm10,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm14,xmm5
+ paddd xmm14,xmm7
+ movd xmm5,DWORD[8+r8]
+ movd xmm0,DWORD[8+r9]
+ movd xmm1,DWORD[8+r10]
+ movd xmm2,DWORD[8+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm10
+DB 102,15,56,0,238
+ movdqa xmm2,xmm10
+
+ psrld xmm7,6
+ movdqa xmm1,xmm10
+ pslld xmm2,7
+ movdqa XMMWORD[(32-128)+rax],xmm5
+ paddd xmm5,xmm13
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-64))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm10
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm10
+ pslld xmm2,26-21
+ pandn xmm0,xmm12
+ pand xmm3,xmm11
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm14
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm14
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm15
+ movdqa xmm7,xmm14
+ pslld xmm2,10
+ pxor xmm3,xmm14
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm13,xmm15
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm13,xmm4
+ paddd xmm9,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm13,xmm5
+ paddd xmm13,xmm7
+ movd xmm5,DWORD[12+r8]
+ movd xmm0,DWORD[12+r9]
+ movd xmm1,DWORD[12+r10]
+ movd xmm2,DWORD[12+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm9
+
+ movdqa xmm2,xmm9
+DB 102,15,56,0,238
+ psrld xmm7,6
+ movdqa xmm1,xmm9
+ pslld xmm2,7
+ movdqa XMMWORD[(48-128)+rax],xmm5
+ paddd xmm5,xmm12
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-32))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm9
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm9
+ pslld xmm2,26-21
+ pandn xmm0,xmm11
+ pand xmm4,xmm10
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm13
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm13
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm14
+ movdqa xmm7,xmm13
+ pslld xmm2,10
+ pxor xmm4,xmm13
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm12,xmm14
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm12,xmm3
+ paddd xmm8,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm12,xmm5
+ paddd xmm12,xmm7
+ movd xmm5,DWORD[16+r8]
+ movd xmm0,DWORD[16+r9]
+ movd xmm1,DWORD[16+r10]
+ movd xmm2,DWORD[16+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm8
+DB 102,15,56,0,238
+ movdqa xmm2,xmm8
+
+ psrld xmm7,6
+ movdqa xmm1,xmm8
+ pslld xmm2,7
+ movdqa XMMWORD[(64-128)+rax],xmm5
+ paddd xmm5,xmm11
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm8
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm8
+ pslld xmm2,26-21
+ pandn xmm0,xmm10
+ pand xmm3,xmm9
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm12
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm12
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm13
+ movdqa xmm7,xmm12
+ pslld xmm2,10
+ pxor xmm3,xmm12
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm11,xmm13
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm11,xmm4
+ paddd xmm15,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm11,xmm5
+ paddd xmm11,xmm7
+ movd xmm5,DWORD[20+r8]
+ movd xmm0,DWORD[20+r9]
+ movd xmm1,DWORD[20+r10]
+ movd xmm2,DWORD[20+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm15
+
+ movdqa xmm2,xmm15
+DB 102,15,56,0,238
+ psrld xmm7,6
+ movdqa xmm1,xmm15
+ pslld xmm2,7
+ movdqa XMMWORD[(80-128)+rax],xmm5
+ paddd xmm5,xmm10
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[32+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm15
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm15
+ pslld xmm2,26-21
+ pandn xmm0,xmm9
+ pand xmm4,xmm8
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm11
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm11
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm12
+ movdqa xmm7,xmm11
+ pslld xmm2,10
+ pxor xmm4,xmm11
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm10,xmm12
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm10,xmm3
+ paddd xmm14,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm10,xmm5
+ paddd xmm10,xmm7
+ movd xmm5,DWORD[24+r8]
+ movd xmm0,DWORD[24+r9]
+ movd xmm1,DWORD[24+r10]
+ movd xmm2,DWORD[24+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm14
+DB 102,15,56,0,238
+ movdqa xmm2,xmm14
+
+ psrld xmm7,6
+ movdqa xmm1,xmm14
+ pslld xmm2,7
+ movdqa XMMWORD[(96-128)+rax],xmm5
+ paddd xmm5,xmm9
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[64+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm14
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm14
+ pslld xmm2,26-21
+ pandn xmm0,xmm8
+ pand xmm3,xmm15
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm10
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm10
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm11
+ movdqa xmm7,xmm10
+ pslld xmm2,10
+ pxor xmm3,xmm10
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm9,xmm11
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm9,xmm4
+ paddd xmm13,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm9,xmm5
+ paddd xmm9,xmm7
+ movd xmm5,DWORD[28+r8]
+ movd xmm0,DWORD[28+r9]
+ movd xmm1,DWORD[28+r10]
+ movd xmm2,DWORD[28+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm13
+
+ movdqa xmm2,xmm13
+DB 102,15,56,0,238
+ psrld xmm7,6
+ movdqa xmm1,xmm13
+ pslld xmm2,7
+ movdqa XMMWORD[(112-128)+rax],xmm5
+ paddd xmm5,xmm8
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[96+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm13
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm13
+ pslld xmm2,26-21
+ pandn xmm0,xmm15
+ pand xmm4,xmm14
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm9
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm9
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm10
+ movdqa xmm7,xmm9
+ pslld xmm2,10
+ pxor xmm4,xmm9
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm8,xmm10
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm8,xmm3
+ paddd xmm12,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm8,xmm5
+ paddd xmm8,xmm7
+ lea rbp,[256+rbp]
+ movd xmm5,DWORD[32+r8]
+ movd xmm0,DWORD[32+r9]
+ movd xmm1,DWORD[32+r10]
+ movd xmm2,DWORD[32+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm12
+DB 102,15,56,0,238
+ movdqa xmm2,xmm12
+
+ psrld xmm7,6
+ movdqa xmm1,xmm12
+ pslld xmm2,7
+ movdqa XMMWORD[(128-128)+rax],xmm5
+ paddd xmm5,xmm15
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-128))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm12
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm12
+ pslld xmm2,26-21
+ pandn xmm0,xmm14
+ pand xmm3,xmm13
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm8
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm8
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm9
+ movdqa xmm7,xmm8
+ pslld xmm2,10
+ pxor xmm3,xmm8
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm15,xmm9
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm15,xmm4
+ paddd xmm11,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm15,xmm5
+ paddd xmm15,xmm7
+ movd xmm5,DWORD[36+r8]
+ movd xmm0,DWORD[36+r9]
+ movd xmm1,DWORD[36+r10]
+ movd xmm2,DWORD[36+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm11
+
+ movdqa xmm2,xmm11
+DB 102,15,56,0,238
+ psrld xmm7,6
+ movdqa xmm1,xmm11
+ pslld xmm2,7
+ movdqa XMMWORD[(144-128)+rax],xmm5
+ paddd xmm5,xmm14
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-96))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm11
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm11
+ pslld xmm2,26-21
+ pandn xmm0,xmm13
+ pand xmm4,xmm12
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm15
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm15
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm8
+ movdqa xmm7,xmm15
+ pslld xmm2,10
+ pxor xmm4,xmm15
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm14,xmm8
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm14,xmm3
+ paddd xmm10,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm14,xmm5
+ paddd xmm14,xmm7
+ movd xmm5,DWORD[40+r8]
+ movd xmm0,DWORD[40+r9]
+ movd xmm1,DWORD[40+r10]
+ movd xmm2,DWORD[40+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm10
+DB 102,15,56,0,238
+ movdqa xmm2,xmm10
+
+ psrld xmm7,6
+ movdqa xmm1,xmm10
+ pslld xmm2,7
+ movdqa XMMWORD[(160-128)+rax],xmm5
+ paddd xmm5,xmm13
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-64))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm10
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm10
+ pslld xmm2,26-21
+ pandn xmm0,xmm12
+ pand xmm3,xmm11
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm14
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm14
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm15
+ movdqa xmm7,xmm14
+ pslld xmm2,10
+ pxor xmm3,xmm14
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm13,xmm15
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm13,xmm4
+ paddd xmm9,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm13,xmm5
+ paddd xmm13,xmm7
+ movd xmm5,DWORD[44+r8]
+ movd xmm0,DWORD[44+r9]
+ movd xmm1,DWORD[44+r10]
+ movd xmm2,DWORD[44+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm9
+
+ movdqa xmm2,xmm9
+DB 102,15,56,0,238
+ psrld xmm7,6
+ movdqa xmm1,xmm9
+ pslld xmm2,7
+ movdqa XMMWORD[(176-128)+rax],xmm5
+ paddd xmm5,xmm12
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-32))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm9
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm9
+ pslld xmm2,26-21
+ pandn xmm0,xmm11
+ pand xmm4,xmm10
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm13
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm13
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm14
+ movdqa xmm7,xmm13
+ pslld xmm2,10
+ pxor xmm4,xmm13
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm12,xmm14
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm12,xmm3
+ paddd xmm8,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm12,xmm5
+ paddd xmm12,xmm7
+ movd xmm5,DWORD[48+r8]
+ movd xmm0,DWORD[48+r9]
+ movd xmm1,DWORD[48+r10]
+ movd xmm2,DWORD[48+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm8
+DB 102,15,56,0,238
+ movdqa xmm2,xmm8
+
+ psrld xmm7,6
+ movdqa xmm1,xmm8
+ pslld xmm2,7
+ movdqa XMMWORD[(192-128)+rax],xmm5
+ paddd xmm5,xmm11
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm8
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm8
+ pslld xmm2,26-21
+ pandn xmm0,xmm10
+ pand xmm3,xmm9
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm12
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm12
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm13
+ movdqa xmm7,xmm12
+ pslld xmm2,10
+ pxor xmm3,xmm12
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm11,xmm13
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm11,xmm4
+ paddd xmm15,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm11,xmm5
+ paddd xmm11,xmm7
+ movd xmm5,DWORD[52+r8]
+ movd xmm0,DWORD[52+r9]
+ movd xmm1,DWORD[52+r10]
+ movd xmm2,DWORD[52+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm15
+
+ movdqa xmm2,xmm15
+DB 102,15,56,0,238
+ psrld xmm7,6
+ movdqa xmm1,xmm15
+ pslld xmm2,7
+ movdqa XMMWORD[(208-128)+rax],xmm5
+ paddd xmm5,xmm10
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[32+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm15
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm15
+ pslld xmm2,26-21
+ pandn xmm0,xmm9
+ pand xmm4,xmm8
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm11
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm11
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm12
+ movdqa xmm7,xmm11
+ pslld xmm2,10
+ pxor xmm4,xmm11
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm10,xmm12
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm10,xmm3
+ paddd xmm14,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm10,xmm5
+ paddd xmm10,xmm7
+ movd xmm5,DWORD[56+r8]
+ movd xmm0,DWORD[56+r9]
+ movd xmm1,DWORD[56+r10]
+ movd xmm2,DWORD[56+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm14
+DB 102,15,56,0,238
+ movdqa xmm2,xmm14
+
+ psrld xmm7,6
+ movdqa xmm1,xmm14
+ pslld xmm2,7
+ movdqa XMMWORD[(224-128)+rax],xmm5
+ paddd xmm5,xmm9
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[64+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm14
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm14
+ pslld xmm2,26-21
+ pandn xmm0,xmm8
+ pand xmm3,xmm15
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm10
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm10
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm11
+ movdqa xmm7,xmm10
+ pslld xmm2,10
+ pxor xmm3,xmm10
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm9,xmm11
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm9,xmm4
+ paddd xmm13,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm9,xmm5
+ paddd xmm9,xmm7
+ movd xmm5,DWORD[60+r8]
+ lea r8,[64+r8]
+ movd xmm0,DWORD[60+r9]
+ lea r9,[64+r9]
+ movd xmm1,DWORD[60+r10]
+ lea r10,[64+r10]
+ movd xmm2,DWORD[60+r11]
+ lea r11,[64+r11]
+ punpckldq xmm5,xmm1
+ punpckldq xmm0,xmm2
+ punpckldq xmm5,xmm0
+ movdqa xmm7,xmm13
+
+ movdqa xmm2,xmm13
+DB 102,15,56,0,238
+ psrld xmm7,6
+ movdqa xmm1,xmm13
+ pslld xmm2,7
+ movdqa XMMWORD[(240-128)+rax],xmm5
+ paddd xmm5,xmm8
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[96+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm13
+ prefetcht0 [63+r8]
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm13
+ pslld xmm2,26-21
+ pandn xmm0,xmm15
+ pand xmm4,xmm14
+ pxor xmm7,xmm1
+
+ prefetcht0 [63+r9]
+ movdqa xmm1,xmm9
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm9
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm10
+ movdqa xmm7,xmm9
+ pslld xmm2,10
+ pxor xmm4,xmm9
+
+ prefetcht0 [63+r10]
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+ prefetcht0 [63+r11]
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm8,xmm10
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm8,xmm3
+ paddd xmm12,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm8,xmm5
+ paddd xmm8,xmm7
+ lea rbp,[256+rbp]
+ movdqu xmm5,XMMWORD[((0-128))+rax]
+ mov ecx,3
+ jmp NEAR $L$oop_16_xx
+ALIGN 32
+$L$oop_16_xx:
+ movdqa xmm6,XMMWORD[((16-128))+rax]
+ paddd xmm5,XMMWORD[((144-128))+rax]
+
+ movdqa xmm7,xmm6
+ movdqa xmm1,xmm6
+ psrld xmm7,3
+ movdqa xmm2,xmm6
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((224-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm3,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm3
+
+ psrld xmm3,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ psrld xmm3,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm3
+ pxor xmm0,xmm1
+ paddd xmm5,xmm0
+ movdqa xmm7,xmm12
+
+ movdqa xmm2,xmm12
+
+ psrld xmm7,6
+ movdqa xmm1,xmm12
+ pslld xmm2,7
+ movdqa XMMWORD[(0-128)+rax],xmm5
+ paddd xmm5,xmm15
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-128))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm12
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm12
+ pslld xmm2,26-21
+ pandn xmm0,xmm14
+ pand xmm3,xmm13
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm8
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm8
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm9
+ movdqa xmm7,xmm8
+ pslld xmm2,10
+ pxor xmm3,xmm8
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm15,xmm9
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm15,xmm4
+ paddd xmm11,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm15,xmm5
+ paddd xmm15,xmm7
+ movdqa xmm5,XMMWORD[((32-128))+rax]
+ paddd xmm6,XMMWORD[((160-128))+rax]
+
+ movdqa xmm7,xmm5
+ movdqa xmm1,xmm5
+ psrld xmm7,3
+ movdqa xmm2,xmm5
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((240-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm4,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm4
+
+ psrld xmm4,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ psrld xmm4,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm4
+ pxor xmm0,xmm1
+ paddd xmm6,xmm0
+ movdqa xmm7,xmm11
+
+ movdqa xmm2,xmm11
+
+ psrld xmm7,6
+ movdqa xmm1,xmm11
+ pslld xmm2,7
+ movdqa XMMWORD[(16-128)+rax],xmm6
+ paddd xmm6,xmm14
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm6,XMMWORD[((-96))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm11
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm11
+ pslld xmm2,26-21
+ pandn xmm0,xmm13
+ pand xmm4,xmm12
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm15
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm15
+ psrld xmm1,2
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm8
+ movdqa xmm7,xmm15
+ pslld xmm2,10
+ pxor xmm4,xmm15
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm6,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm14,xmm8
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm14,xmm3
+ paddd xmm10,xmm6
+ pxor xmm7,xmm2
+
+ paddd xmm14,xmm6
+ paddd xmm14,xmm7
+ movdqa xmm6,XMMWORD[((48-128))+rax]
+ paddd xmm5,XMMWORD[((176-128))+rax]
+
+ movdqa xmm7,xmm6
+ movdqa xmm1,xmm6
+ psrld xmm7,3
+ movdqa xmm2,xmm6
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((0-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm3,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm3
+
+ psrld xmm3,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ psrld xmm3,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm3
+ pxor xmm0,xmm1
+ paddd xmm5,xmm0
+ movdqa xmm7,xmm10
+
+ movdqa xmm2,xmm10
+
+ psrld xmm7,6
+ movdqa xmm1,xmm10
+ pslld xmm2,7
+ movdqa XMMWORD[(32-128)+rax],xmm5
+ paddd xmm5,xmm13
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-64))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm10
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm10
+ pslld xmm2,26-21
+ pandn xmm0,xmm12
+ pand xmm3,xmm11
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm14
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm14
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm15
+ movdqa xmm7,xmm14
+ pslld xmm2,10
+ pxor xmm3,xmm14
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm13,xmm15
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm13,xmm4
+ paddd xmm9,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm13,xmm5
+ paddd xmm13,xmm7
+ movdqa xmm5,XMMWORD[((64-128))+rax]
+ paddd xmm6,XMMWORD[((192-128))+rax]
+
+ movdqa xmm7,xmm5
+ movdqa xmm1,xmm5
+ psrld xmm7,3
+ movdqa xmm2,xmm5
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((16-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm4,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm4
+
+ psrld xmm4,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ psrld xmm4,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm4
+ pxor xmm0,xmm1
+ paddd xmm6,xmm0
+ movdqa xmm7,xmm9
+
+ movdqa xmm2,xmm9
+
+ psrld xmm7,6
+ movdqa xmm1,xmm9
+ pslld xmm2,7
+ movdqa XMMWORD[(48-128)+rax],xmm6
+ paddd xmm6,xmm12
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm6,XMMWORD[((-32))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm9
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm9
+ pslld xmm2,26-21
+ pandn xmm0,xmm11
+ pand xmm4,xmm10
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm13
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm13
+ psrld xmm1,2
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm14
+ movdqa xmm7,xmm13
+ pslld xmm2,10
+ pxor xmm4,xmm13
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm6,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm12,xmm14
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm12,xmm3
+ paddd xmm8,xmm6
+ pxor xmm7,xmm2
+
+ paddd xmm12,xmm6
+ paddd xmm12,xmm7
+ movdqa xmm6,XMMWORD[((80-128))+rax]
+ paddd xmm5,XMMWORD[((208-128))+rax]
+
+ movdqa xmm7,xmm6
+ movdqa xmm1,xmm6
+ psrld xmm7,3
+ movdqa xmm2,xmm6
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((32-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm3,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm3
+
+ psrld xmm3,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ psrld xmm3,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm3
+ pxor xmm0,xmm1
+ paddd xmm5,xmm0
+ movdqa xmm7,xmm8
+
+ movdqa xmm2,xmm8
+
+ psrld xmm7,6
+ movdqa xmm1,xmm8
+ pslld xmm2,7
+ movdqa XMMWORD[(64-128)+rax],xmm5
+ paddd xmm5,xmm11
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm8
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm8
+ pslld xmm2,26-21
+ pandn xmm0,xmm10
+ pand xmm3,xmm9
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm12
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm12
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm13
+ movdqa xmm7,xmm12
+ pslld xmm2,10
+ pxor xmm3,xmm12
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm11,xmm13
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm11,xmm4
+ paddd xmm15,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm11,xmm5
+ paddd xmm11,xmm7
+ movdqa xmm5,XMMWORD[((96-128))+rax]
+ paddd xmm6,XMMWORD[((224-128))+rax]
+
+ movdqa xmm7,xmm5
+ movdqa xmm1,xmm5
+ psrld xmm7,3
+ movdqa xmm2,xmm5
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((48-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm4,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm4
+
+ psrld xmm4,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ psrld xmm4,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm4
+ pxor xmm0,xmm1
+ paddd xmm6,xmm0
+ movdqa xmm7,xmm15
+
+ movdqa xmm2,xmm15
+
+ psrld xmm7,6
+ movdqa xmm1,xmm15
+ pslld xmm2,7
+ movdqa XMMWORD[(80-128)+rax],xmm6
+ paddd xmm6,xmm10
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm6,XMMWORD[32+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm15
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm15
+ pslld xmm2,26-21
+ pandn xmm0,xmm9
+ pand xmm4,xmm8
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm11
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm11
+ psrld xmm1,2
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm12
+ movdqa xmm7,xmm11
+ pslld xmm2,10
+ pxor xmm4,xmm11
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm6,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm10,xmm12
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm10,xmm3
+ paddd xmm14,xmm6
+ pxor xmm7,xmm2
+
+ paddd xmm10,xmm6
+ paddd xmm10,xmm7
+ movdqa xmm6,XMMWORD[((112-128))+rax]
+ paddd xmm5,XMMWORD[((240-128))+rax]
+
+ movdqa xmm7,xmm6
+ movdqa xmm1,xmm6
+ psrld xmm7,3
+ movdqa xmm2,xmm6
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((64-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm3,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm3
+
+ psrld xmm3,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ psrld xmm3,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm3
+ pxor xmm0,xmm1
+ paddd xmm5,xmm0
+ movdqa xmm7,xmm14
+
+ movdqa xmm2,xmm14
+
+ psrld xmm7,6
+ movdqa xmm1,xmm14
+ pslld xmm2,7
+ movdqa XMMWORD[(96-128)+rax],xmm5
+ paddd xmm5,xmm9
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[64+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm14
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm14
+ pslld xmm2,26-21
+ pandn xmm0,xmm8
+ pand xmm3,xmm15
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm10
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm10
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm11
+ movdqa xmm7,xmm10
+ pslld xmm2,10
+ pxor xmm3,xmm10
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm9,xmm11
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm9,xmm4
+ paddd xmm13,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm9,xmm5
+ paddd xmm9,xmm7
+ movdqa xmm5,XMMWORD[((128-128))+rax]
+ paddd xmm6,XMMWORD[((0-128))+rax]
+
+ movdqa xmm7,xmm5
+ movdqa xmm1,xmm5
+ psrld xmm7,3
+ movdqa xmm2,xmm5
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((80-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm4,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm4
+
+ psrld xmm4,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ psrld xmm4,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm4
+ pxor xmm0,xmm1
+ paddd xmm6,xmm0
+ movdqa xmm7,xmm13
+
+ movdqa xmm2,xmm13
+
+ psrld xmm7,6
+ movdqa xmm1,xmm13
+ pslld xmm2,7
+ movdqa XMMWORD[(112-128)+rax],xmm6
+ paddd xmm6,xmm8
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm6,XMMWORD[96+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm13
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm13
+ pslld xmm2,26-21
+ pandn xmm0,xmm15
+ pand xmm4,xmm14
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm9
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm9
+ psrld xmm1,2
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm10
+ movdqa xmm7,xmm9
+ pslld xmm2,10
+ pxor xmm4,xmm9
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm6,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm8,xmm10
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm8,xmm3
+ paddd xmm12,xmm6
+ pxor xmm7,xmm2
+
+ paddd xmm8,xmm6
+ paddd xmm8,xmm7
+ lea rbp,[256+rbp]
+ movdqa xmm6,XMMWORD[((144-128))+rax]
+ paddd xmm5,XMMWORD[((16-128))+rax]
+
+ movdqa xmm7,xmm6
+ movdqa xmm1,xmm6
+ psrld xmm7,3
+ movdqa xmm2,xmm6
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((96-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm3,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm3
+
+ psrld xmm3,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ psrld xmm3,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm3
+ pxor xmm0,xmm1
+ paddd xmm5,xmm0
+ movdqa xmm7,xmm12
+
+ movdqa xmm2,xmm12
+
+ psrld xmm7,6
+ movdqa xmm1,xmm12
+ pslld xmm2,7
+ movdqa XMMWORD[(128-128)+rax],xmm5
+ paddd xmm5,xmm15
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-128))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm12
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm12
+ pslld xmm2,26-21
+ pandn xmm0,xmm14
+ pand xmm3,xmm13
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm8
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm8
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm9
+ movdqa xmm7,xmm8
+ pslld xmm2,10
+ pxor xmm3,xmm8
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm15,xmm9
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm15,xmm4
+ paddd xmm11,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm15,xmm5
+ paddd xmm15,xmm7
+ movdqa xmm5,XMMWORD[((160-128))+rax]
+ paddd xmm6,XMMWORD[((32-128))+rax]
+
+ movdqa xmm7,xmm5
+ movdqa xmm1,xmm5
+ psrld xmm7,3
+ movdqa xmm2,xmm5
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((112-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm4,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm4
+
+ psrld xmm4,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ psrld xmm4,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm4
+ pxor xmm0,xmm1
+ paddd xmm6,xmm0
+ movdqa xmm7,xmm11
+
+ movdqa xmm2,xmm11
+
+ psrld xmm7,6
+ movdqa xmm1,xmm11
+ pslld xmm2,7
+ movdqa XMMWORD[(144-128)+rax],xmm6
+ paddd xmm6,xmm14
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm6,XMMWORD[((-96))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm11
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm11
+ pslld xmm2,26-21
+ pandn xmm0,xmm13
+ pand xmm4,xmm12
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm15
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm15
+ psrld xmm1,2
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm8
+ movdqa xmm7,xmm15
+ pslld xmm2,10
+ pxor xmm4,xmm15
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm6,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm14,xmm8
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm14,xmm3
+ paddd xmm10,xmm6
+ pxor xmm7,xmm2
+
+ paddd xmm14,xmm6
+ paddd xmm14,xmm7
+ movdqa xmm6,XMMWORD[((176-128))+rax]
+ paddd xmm5,XMMWORD[((48-128))+rax]
+
+ movdqa xmm7,xmm6
+ movdqa xmm1,xmm6
+ psrld xmm7,3
+ movdqa xmm2,xmm6
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((128-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm3,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm3
+
+ psrld xmm3,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ psrld xmm3,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm3
+ pxor xmm0,xmm1
+ paddd xmm5,xmm0
+ movdqa xmm7,xmm10
+
+ movdqa xmm2,xmm10
+
+ psrld xmm7,6
+ movdqa xmm1,xmm10
+ pslld xmm2,7
+ movdqa XMMWORD[(160-128)+rax],xmm5
+ paddd xmm5,xmm13
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[((-64))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm10
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm10
+ pslld xmm2,26-21
+ pandn xmm0,xmm12
+ pand xmm3,xmm11
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm14
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm14
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm15
+ movdqa xmm7,xmm14
+ pslld xmm2,10
+ pxor xmm3,xmm14
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm13,xmm15
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm13,xmm4
+ paddd xmm9,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm13,xmm5
+ paddd xmm13,xmm7
+ movdqa xmm5,XMMWORD[((192-128))+rax]
+ paddd xmm6,XMMWORD[((64-128))+rax]
+
+ movdqa xmm7,xmm5
+ movdqa xmm1,xmm5
+ psrld xmm7,3
+ movdqa xmm2,xmm5
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((144-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm4,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm4
+
+ psrld xmm4,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ psrld xmm4,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm4
+ pxor xmm0,xmm1
+ paddd xmm6,xmm0
+ movdqa xmm7,xmm9
+
+ movdqa xmm2,xmm9
+
+ psrld xmm7,6
+ movdqa xmm1,xmm9
+ pslld xmm2,7
+ movdqa XMMWORD[(176-128)+rax],xmm6
+ paddd xmm6,xmm12
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm6,XMMWORD[((-32))+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm9
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm9
+ pslld xmm2,26-21
+ pandn xmm0,xmm11
+ pand xmm4,xmm10
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm13
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm13
+ psrld xmm1,2
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm14
+ movdqa xmm7,xmm13
+ pslld xmm2,10
+ pxor xmm4,xmm13
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm6,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm12,xmm14
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm12,xmm3
+ paddd xmm8,xmm6
+ pxor xmm7,xmm2
+
+ paddd xmm12,xmm6
+ paddd xmm12,xmm7
+ movdqa xmm6,XMMWORD[((208-128))+rax]
+ paddd xmm5,XMMWORD[((80-128))+rax]
+
+ movdqa xmm7,xmm6
+ movdqa xmm1,xmm6
+ psrld xmm7,3
+ movdqa xmm2,xmm6
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((160-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm3,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm3
+
+ psrld xmm3,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ psrld xmm3,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm3
+ pxor xmm0,xmm1
+ paddd xmm5,xmm0
+ movdqa xmm7,xmm8
+
+ movdqa xmm2,xmm8
+
+ psrld xmm7,6
+ movdqa xmm1,xmm8
+ pslld xmm2,7
+ movdqa XMMWORD[(192-128)+rax],xmm5
+ paddd xmm5,xmm11
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm8
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm8
+ pslld xmm2,26-21
+ pandn xmm0,xmm10
+ pand xmm3,xmm9
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm12
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm12
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm13
+ movdqa xmm7,xmm12
+ pslld xmm2,10
+ pxor xmm3,xmm12
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm11,xmm13
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm11,xmm4
+ paddd xmm15,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm11,xmm5
+ paddd xmm11,xmm7
+ movdqa xmm5,XMMWORD[((224-128))+rax]
+ paddd xmm6,XMMWORD[((96-128))+rax]
+
+ movdqa xmm7,xmm5
+ movdqa xmm1,xmm5
+ psrld xmm7,3
+ movdqa xmm2,xmm5
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((176-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm4,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm4
+
+ psrld xmm4,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ psrld xmm4,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm4
+ pxor xmm0,xmm1
+ paddd xmm6,xmm0
+ movdqa xmm7,xmm15
+
+ movdqa xmm2,xmm15
+
+ psrld xmm7,6
+ movdqa xmm1,xmm15
+ pslld xmm2,7
+ movdqa XMMWORD[(208-128)+rax],xmm6
+ paddd xmm6,xmm10
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm6,XMMWORD[32+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm15
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm15
+ pslld xmm2,26-21
+ pandn xmm0,xmm9
+ pand xmm4,xmm8
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm11
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm11
+ psrld xmm1,2
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm12
+ movdqa xmm7,xmm11
+ pslld xmm2,10
+ pxor xmm4,xmm11
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm6,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm10,xmm12
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm10,xmm3
+ paddd xmm14,xmm6
+ pxor xmm7,xmm2
+
+ paddd xmm10,xmm6
+ paddd xmm10,xmm7
+ movdqa xmm6,XMMWORD[((240-128))+rax]
+ paddd xmm5,XMMWORD[((112-128))+rax]
+
+ movdqa xmm7,xmm6
+ movdqa xmm1,xmm6
+ psrld xmm7,3
+ movdqa xmm2,xmm6
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((192-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm3,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm3
+
+ psrld xmm3,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ psrld xmm3,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm3
+ pxor xmm0,xmm1
+ paddd xmm5,xmm0
+ movdqa xmm7,xmm14
+
+ movdqa xmm2,xmm14
+
+ psrld xmm7,6
+ movdqa xmm1,xmm14
+ pslld xmm2,7
+ movdqa XMMWORD[(224-128)+rax],xmm5
+ paddd xmm5,xmm9
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm5,XMMWORD[64+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm14
+
+ pxor xmm7,xmm2
+ movdqa xmm3,xmm14
+ pslld xmm2,26-21
+ pandn xmm0,xmm8
+ pand xmm3,xmm15
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm10
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm10
+ psrld xmm1,2
+ paddd xmm5,xmm7
+ pxor xmm0,xmm3
+ movdqa xmm3,xmm11
+ movdqa xmm7,xmm10
+ pslld xmm2,10
+ pxor xmm3,xmm10
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm5,xmm0
+ pslld xmm2,19-10
+ pand xmm4,xmm3
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm9,xmm11
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm9,xmm4
+ paddd xmm13,xmm5
+ pxor xmm7,xmm2
+
+ paddd xmm9,xmm5
+ paddd xmm9,xmm7
+ movdqa xmm5,XMMWORD[((0-128))+rax]
+ paddd xmm6,XMMWORD[((128-128))+rax]
+
+ movdqa xmm7,xmm5
+ movdqa xmm1,xmm5
+ psrld xmm7,3
+ movdqa xmm2,xmm5
+
+ psrld xmm1,7
+ movdqa xmm0,XMMWORD[((208-128))+rax]
+ pslld xmm2,14
+ pxor xmm7,xmm1
+ psrld xmm1,18-7
+ movdqa xmm4,xmm0
+ pxor xmm7,xmm2
+ pslld xmm2,25-14
+ pxor xmm7,xmm1
+ psrld xmm0,10
+ movdqa xmm1,xmm4
+
+ psrld xmm4,17
+ pxor xmm7,xmm2
+ pslld xmm1,13
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ psrld xmm4,19-17
+ pxor xmm0,xmm1
+ pslld xmm1,15-13
+ pxor xmm0,xmm4
+ pxor xmm0,xmm1
+ paddd xmm6,xmm0
+ movdqa xmm7,xmm13
+
+ movdqa xmm2,xmm13
+
+ psrld xmm7,6
+ movdqa xmm1,xmm13
+ pslld xmm2,7
+ movdqa XMMWORD[(240-128)+rax],xmm6
+ paddd xmm6,xmm8
+
+ psrld xmm1,11
+ pxor xmm7,xmm2
+ pslld xmm2,21-7
+ paddd xmm6,XMMWORD[96+rbp]
+ pxor xmm7,xmm1
+
+ psrld xmm1,25-11
+ movdqa xmm0,xmm13
+
+ pxor xmm7,xmm2
+ movdqa xmm4,xmm13
+ pslld xmm2,26-21
+ pandn xmm0,xmm15
+ pand xmm4,xmm14
+ pxor xmm7,xmm1
+
+
+ movdqa xmm1,xmm9
+ pxor xmm7,xmm2
+ movdqa xmm2,xmm9
+ psrld xmm1,2
+ paddd xmm6,xmm7
+ pxor xmm0,xmm4
+ movdqa xmm4,xmm10
+ movdqa xmm7,xmm9
+ pslld xmm2,10
+ pxor xmm4,xmm9
+
+
+ psrld xmm7,13
+ pxor xmm1,xmm2
+ paddd xmm6,xmm0
+ pslld xmm2,19-10
+ pand xmm3,xmm4
+ pxor xmm1,xmm7
+
+
+ psrld xmm7,22-13
+ pxor xmm1,xmm2
+ movdqa xmm8,xmm10
+ pslld xmm2,30-19
+ pxor xmm7,xmm1
+ pxor xmm8,xmm3
+ paddd xmm12,xmm6
+ pxor xmm7,xmm2
+
+ paddd xmm8,xmm6
+ paddd xmm8,xmm7
+ lea rbp,[256+rbp]
+ dec ecx
+ jnz NEAR $L$oop_16_xx
+
+ mov ecx,1
+ lea rbp,[((K256+128))]
+
+ movdqa xmm7,XMMWORD[rbx]
+ cmp ecx,DWORD[rbx]
+ pxor xmm0,xmm0
+ cmovge r8,rbp
+ cmp ecx,DWORD[4+rbx]
+ movdqa xmm6,xmm7
+ cmovge r9,rbp
+ cmp ecx,DWORD[8+rbx]
+ pcmpgtd xmm6,xmm0
+ cmovge r10,rbp
+ cmp ecx,DWORD[12+rbx]
+ paddd xmm7,xmm6
+ cmovge r11,rbp
+
+ movdqu xmm0,XMMWORD[((0-128))+rdi]
+ pand xmm8,xmm6
+ movdqu xmm1,XMMWORD[((32-128))+rdi]
+ pand xmm9,xmm6
+ movdqu xmm2,XMMWORD[((64-128))+rdi]
+ pand xmm10,xmm6
+ movdqu xmm5,XMMWORD[((96-128))+rdi]
+ pand xmm11,xmm6
+ paddd xmm8,xmm0
+ movdqu xmm0,XMMWORD[((128-128))+rdi]
+ pand xmm12,xmm6
+ paddd xmm9,xmm1
+ movdqu xmm1,XMMWORD[((160-128))+rdi]
+ pand xmm13,xmm6
+ paddd xmm10,xmm2
+ movdqu xmm2,XMMWORD[((192-128))+rdi]
+ pand xmm14,xmm6
+ paddd xmm11,xmm5
+ movdqu xmm5,XMMWORD[((224-128))+rdi]
+ pand xmm15,xmm6
+ paddd xmm12,xmm0
+ paddd xmm13,xmm1
+ movdqu XMMWORD[(0-128)+rdi],xmm8
+ paddd xmm14,xmm2
+ movdqu XMMWORD[(32-128)+rdi],xmm9
+ paddd xmm15,xmm5
+ movdqu XMMWORD[(64-128)+rdi],xmm10
+ movdqu XMMWORD[(96-128)+rdi],xmm11
+ movdqu XMMWORD[(128-128)+rdi],xmm12
+ movdqu XMMWORD[(160-128)+rdi],xmm13
+ movdqu XMMWORD[(192-128)+rdi],xmm14
+ movdqu XMMWORD[(224-128)+rdi],xmm15
+
+ movdqa XMMWORD[rbx],xmm7
+ movdqa xmm6,XMMWORD[$L$pbswap]
+ dec edx
+ jnz NEAR $L$oop
+
+ mov edx,DWORD[280+rsp]
+ lea rdi,[16+rdi]
+ lea rsi,[64+rsi]
+ dec edx
+ jnz NEAR $L$oop_grande
+
+$L$done:
+ mov rax,QWORD[272+rsp]
+
+ movaps xmm6,XMMWORD[((-184))+rax]
+ movaps xmm7,XMMWORD[((-168))+rax]
+ movaps xmm8,XMMWORD[((-152))+rax]
+ movaps xmm9,XMMWORD[((-136))+rax]
+ movaps xmm10,XMMWORD[((-120))+rax]
+ movaps xmm11,XMMWORD[((-104))+rax]
+ movaps xmm12,XMMWORD[((-88))+rax]
+ movaps xmm13,XMMWORD[((-72))+rax]
+ movaps xmm14,XMMWORD[((-56))+rax]
+ movaps xmm15,XMMWORD[((-40))+rax]
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha256_multi_block:
+
+ALIGN 32
+sha256_multi_block_shaext:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha256_multi_block_shaext:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+_shaext_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[(-120)+rax],xmm10
+ movaps XMMWORD[(-104)+rax],xmm11
+ movaps XMMWORD[(-88)+rax],xmm12
+ movaps XMMWORD[(-72)+rax],xmm13
+ movaps XMMWORD[(-56)+rax],xmm14
+ movaps XMMWORD[(-40)+rax],xmm15
+ sub rsp,288
+ shl edx,1
+ and rsp,-256
+ lea rdi,[128+rdi]
+ mov QWORD[272+rsp],rax
+$L$body_shaext:
+ lea rbx,[256+rsp]
+ lea rbp,[((K256_shaext+128))]
+
+$L$oop_grande_shaext:
+ mov DWORD[280+rsp],edx
+ xor edx,edx
+ mov r8,QWORD[rsi]
+ mov ecx,DWORD[8+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[rbx],ecx
+ cmovle r8,rsp
+ mov r9,QWORD[16+rsi]
+ mov ecx,DWORD[24+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[4+rbx],ecx
+ cmovle r9,rsp
+ test edx,edx
+ jz NEAR $L$done_shaext
+
+ movq xmm12,QWORD[((0-128))+rdi]
+ movq xmm4,QWORD[((32-128))+rdi]
+ movq xmm13,QWORD[((64-128))+rdi]
+ movq xmm5,QWORD[((96-128))+rdi]
+ movq xmm8,QWORD[((128-128))+rdi]
+ movq xmm9,QWORD[((160-128))+rdi]
+ movq xmm10,QWORD[((192-128))+rdi]
+ movq xmm11,QWORD[((224-128))+rdi]
+
+ punpckldq xmm12,xmm4
+ punpckldq xmm13,xmm5
+ punpckldq xmm8,xmm9
+ punpckldq xmm10,xmm11
+ movdqa xmm3,XMMWORD[((K256_shaext-16))]
+
+ movdqa xmm14,xmm12
+ movdqa xmm15,xmm13
+ punpcklqdq xmm12,xmm8
+ punpcklqdq xmm13,xmm10
+ punpckhqdq xmm14,xmm8
+ punpckhqdq xmm15,xmm10
+
+ pshufd xmm12,xmm12,27
+ pshufd xmm13,xmm13,27
+ pshufd xmm14,xmm14,27
+ pshufd xmm15,xmm15,27
+ jmp NEAR $L$oop_shaext
+
+ALIGN 32
+$L$oop_shaext:
+ movdqu xmm4,XMMWORD[r8]
+ movdqu xmm8,XMMWORD[r9]
+ movdqu xmm5,XMMWORD[16+r8]
+ movdqu xmm9,XMMWORD[16+r9]
+ movdqu xmm6,XMMWORD[32+r8]
+DB 102,15,56,0,227
+ movdqu xmm10,XMMWORD[32+r9]
+DB 102,68,15,56,0,195
+ movdqu xmm7,XMMWORD[48+r8]
+ lea r8,[64+r8]
+ movdqu xmm11,XMMWORD[48+r9]
+ lea r9,[64+r9]
+
+ movdqa xmm0,XMMWORD[((0-128))+rbp]
+DB 102,15,56,0,235
+ paddd xmm0,xmm4
+ pxor xmm4,xmm12
+ movdqa xmm1,xmm0
+ movdqa xmm2,XMMWORD[((0-128))+rbp]
+DB 102,68,15,56,0,203
+ paddd xmm2,xmm8
+ movdqa XMMWORD[80+rsp],xmm13
+DB 69,15,56,203,236
+ pxor xmm8,xmm14
+ movdqa xmm0,xmm2
+ movdqa XMMWORD[112+rsp],xmm15
+DB 69,15,56,203,254
+ pshufd xmm0,xmm1,0x0e
+ pxor xmm4,xmm12
+ movdqa XMMWORD[64+rsp],xmm12
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ pxor xmm8,xmm14
+ movdqa XMMWORD[96+rsp],xmm14
+ movdqa xmm1,XMMWORD[((16-128))+rbp]
+ paddd xmm1,xmm5
+DB 102,15,56,0,243
+DB 69,15,56,203,247
+
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((16-128))+rbp]
+ paddd xmm2,xmm9
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ prefetcht0 [127+r8]
+DB 102,15,56,0,251
+DB 102,68,15,56,0,211
+ prefetcht0 [127+r9]
+DB 69,15,56,203,254
+ pshufd xmm0,xmm1,0x0e
+DB 102,68,15,56,0,219
+DB 15,56,204,229
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((32-128))+rbp]
+ paddd xmm1,xmm6
+DB 69,15,56,203,247
+
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((32-128))+rbp]
+ paddd xmm2,xmm10
+DB 69,15,56,203,236
+DB 69,15,56,204,193
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm7
+DB 69,15,56,203,254
+ pshufd xmm0,xmm1,0x0e
+DB 102,15,58,15,222,4
+ paddd xmm4,xmm3
+ movdqa xmm3,xmm11
+DB 102,65,15,58,15,218,4
+DB 15,56,204,238
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((48-128))+rbp]
+ paddd xmm1,xmm7
+DB 69,15,56,203,247
+DB 69,15,56,204,202
+
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((48-128))+rbp]
+ paddd xmm8,xmm3
+ paddd xmm2,xmm11
+DB 15,56,205,231
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm4
+DB 102,15,58,15,223,4
+DB 69,15,56,203,254
+DB 69,15,56,205,195
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm5,xmm3
+ movdqa xmm3,xmm8
+DB 102,65,15,58,15,219,4
+DB 15,56,204,247
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((64-128))+rbp]
+ paddd xmm1,xmm4
+DB 69,15,56,203,247
+DB 69,15,56,204,211
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((64-128))+rbp]
+ paddd xmm9,xmm3
+ paddd xmm2,xmm8
+DB 15,56,205,236
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm5
+DB 102,15,58,15,220,4
+DB 69,15,56,203,254
+DB 69,15,56,205,200
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm6,xmm3
+ movdqa xmm3,xmm9
+DB 102,65,15,58,15,216,4
+DB 15,56,204,252
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((80-128))+rbp]
+ paddd xmm1,xmm5
+DB 69,15,56,203,247
+DB 69,15,56,204,216
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((80-128))+rbp]
+ paddd xmm10,xmm3
+ paddd xmm2,xmm9
+DB 15,56,205,245
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm6
+DB 102,15,58,15,221,4
+DB 69,15,56,203,254
+DB 69,15,56,205,209
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm7,xmm3
+ movdqa xmm3,xmm10
+DB 102,65,15,58,15,217,4
+DB 15,56,204,229
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((96-128))+rbp]
+ paddd xmm1,xmm6
+DB 69,15,56,203,247
+DB 69,15,56,204,193
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((96-128))+rbp]
+ paddd xmm11,xmm3
+ paddd xmm2,xmm10
+DB 15,56,205,254
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm7
+DB 102,15,58,15,222,4
+DB 69,15,56,203,254
+DB 69,15,56,205,218
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm4,xmm3
+ movdqa xmm3,xmm11
+DB 102,65,15,58,15,218,4
+DB 15,56,204,238
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((112-128))+rbp]
+ paddd xmm1,xmm7
+DB 69,15,56,203,247
+DB 69,15,56,204,202
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((112-128))+rbp]
+ paddd xmm8,xmm3
+ paddd xmm2,xmm11
+DB 15,56,205,231
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm4
+DB 102,15,58,15,223,4
+DB 69,15,56,203,254
+DB 69,15,56,205,195
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm5,xmm3
+ movdqa xmm3,xmm8
+DB 102,65,15,58,15,219,4
+DB 15,56,204,247
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((128-128))+rbp]
+ paddd xmm1,xmm4
+DB 69,15,56,203,247
+DB 69,15,56,204,211
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((128-128))+rbp]
+ paddd xmm9,xmm3
+ paddd xmm2,xmm8
+DB 15,56,205,236
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm5
+DB 102,15,58,15,220,4
+DB 69,15,56,203,254
+DB 69,15,56,205,200
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm6,xmm3
+ movdqa xmm3,xmm9
+DB 102,65,15,58,15,216,4
+DB 15,56,204,252
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((144-128))+rbp]
+ paddd xmm1,xmm5
+DB 69,15,56,203,247
+DB 69,15,56,204,216
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((144-128))+rbp]
+ paddd xmm10,xmm3
+ paddd xmm2,xmm9
+DB 15,56,205,245
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm6
+DB 102,15,58,15,221,4
+DB 69,15,56,203,254
+DB 69,15,56,205,209
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm7,xmm3
+ movdqa xmm3,xmm10
+DB 102,65,15,58,15,217,4
+DB 15,56,204,229
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((160-128))+rbp]
+ paddd xmm1,xmm6
+DB 69,15,56,203,247
+DB 69,15,56,204,193
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((160-128))+rbp]
+ paddd xmm11,xmm3
+ paddd xmm2,xmm10
+DB 15,56,205,254
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm7
+DB 102,15,58,15,222,4
+DB 69,15,56,203,254
+DB 69,15,56,205,218
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm4,xmm3
+ movdqa xmm3,xmm11
+DB 102,65,15,58,15,218,4
+DB 15,56,204,238
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((176-128))+rbp]
+ paddd xmm1,xmm7
+DB 69,15,56,203,247
+DB 69,15,56,204,202
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((176-128))+rbp]
+ paddd xmm8,xmm3
+ paddd xmm2,xmm11
+DB 15,56,205,231
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm4
+DB 102,15,58,15,223,4
+DB 69,15,56,203,254
+DB 69,15,56,205,195
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm5,xmm3
+ movdqa xmm3,xmm8
+DB 102,65,15,58,15,219,4
+DB 15,56,204,247
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((192-128))+rbp]
+ paddd xmm1,xmm4
+DB 69,15,56,203,247
+DB 69,15,56,204,211
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((192-128))+rbp]
+ paddd xmm9,xmm3
+ paddd xmm2,xmm8
+DB 15,56,205,236
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm5
+DB 102,15,58,15,220,4
+DB 69,15,56,203,254
+DB 69,15,56,205,200
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm6,xmm3
+ movdqa xmm3,xmm9
+DB 102,65,15,58,15,216,4
+DB 15,56,204,252
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((208-128))+rbp]
+ paddd xmm1,xmm5
+DB 69,15,56,203,247
+DB 69,15,56,204,216
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((208-128))+rbp]
+ paddd xmm10,xmm3
+ paddd xmm2,xmm9
+DB 15,56,205,245
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ movdqa xmm3,xmm6
+DB 102,15,58,15,221,4
+DB 69,15,56,203,254
+DB 69,15,56,205,209
+ pshufd xmm0,xmm1,0x0e
+ paddd xmm7,xmm3
+ movdqa xmm3,xmm10
+DB 102,65,15,58,15,217,4
+ nop
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm1,XMMWORD[((224-128))+rbp]
+ paddd xmm1,xmm6
+DB 69,15,56,203,247
+
+ movdqa xmm0,xmm1
+ movdqa xmm2,XMMWORD[((224-128))+rbp]
+ paddd xmm11,xmm3
+ paddd xmm2,xmm10
+DB 15,56,205,254
+ nop
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ mov ecx,1
+ pxor xmm6,xmm6
+DB 69,15,56,203,254
+DB 69,15,56,205,218
+ pshufd xmm0,xmm1,0x0e
+ movdqa xmm1,XMMWORD[((240-128))+rbp]
+ paddd xmm1,xmm7
+ movq xmm7,QWORD[rbx]
+ nop
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ movdqa xmm2,XMMWORD[((240-128))+rbp]
+ paddd xmm2,xmm11
+DB 69,15,56,203,247
+
+ movdqa xmm0,xmm1
+ cmp ecx,DWORD[rbx]
+ cmovge r8,rsp
+ cmp ecx,DWORD[4+rbx]
+ cmovge r9,rsp
+ pshufd xmm9,xmm7,0x00
+DB 69,15,56,203,236
+ movdqa xmm0,xmm2
+ pshufd xmm10,xmm7,0x55
+ movdqa xmm11,xmm7
+DB 69,15,56,203,254
+ pshufd xmm0,xmm1,0x0e
+ pcmpgtd xmm9,xmm6
+ pcmpgtd xmm10,xmm6
+DB 69,15,56,203,229
+ pshufd xmm0,xmm2,0x0e
+ pcmpgtd xmm11,xmm6
+ movdqa xmm3,XMMWORD[((K256_shaext-16))]
+DB 69,15,56,203,247
+
+ pand xmm13,xmm9
+ pand xmm15,xmm10
+ pand xmm12,xmm9
+ pand xmm14,xmm10
+ paddd xmm11,xmm7
+
+ paddd xmm13,XMMWORD[80+rsp]
+ paddd xmm15,XMMWORD[112+rsp]
+ paddd xmm12,XMMWORD[64+rsp]
+ paddd xmm14,XMMWORD[96+rsp]
+
+ movq QWORD[rbx],xmm11
+ dec edx
+ jnz NEAR $L$oop_shaext
+
+ mov edx,DWORD[280+rsp]
+
+ pshufd xmm12,xmm12,27
+ pshufd xmm13,xmm13,27
+ pshufd xmm14,xmm14,27
+ pshufd xmm15,xmm15,27
+
+ movdqa xmm5,xmm12
+ movdqa xmm6,xmm13
+ punpckldq xmm12,xmm14
+ punpckhdq xmm5,xmm14
+ punpckldq xmm13,xmm15
+ punpckhdq xmm6,xmm15
+
+ movq QWORD[(0-128)+rdi],xmm12
+ psrldq xmm12,8
+ movq QWORD[(128-128)+rdi],xmm5
+ psrldq xmm5,8
+ movq QWORD[(32-128)+rdi],xmm12
+ movq QWORD[(160-128)+rdi],xmm5
+
+ movq QWORD[(64-128)+rdi],xmm13
+ psrldq xmm13,8
+ movq QWORD[(192-128)+rdi],xmm6
+ psrldq xmm6,8
+ movq QWORD[(96-128)+rdi],xmm13
+ movq QWORD[(224-128)+rdi],xmm6
+
+ lea rdi,[8+rdi]
+ lea rsi,[32+rsi]
+ dec edx
+ jnz NEAR $L$oop_grande_shaext
+
+$L$done_shaext:
+
+ movaps xmm6,XMMWORD[((-184))+rax]
+ movaps xmm7,XMMWORD[((-168))+rax]
+ movaps xmm8,XMMWORD[((-152))+rax]
+ movaps xmm9,XMMWORD[((-136))+rax]
+ movaps xmm10,XMMWORD[((-120))+rax]
+ movaps xmm11,XMMWORD[((-104))+rax]
+ movaps xmm12,XMMWORD[((-88))+rax]
+ movaps xmm13,XMMWORD[((-72))+rax]
+ movaps xmm14,XMMWORD[((-56))+rax]
+ movaps xmm15,XMMWORD[((-40))+rax]
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$epilogue_shaext:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha256_multi_block_shaext:
+
+ALIGN 32
+sha256_multi_block_avx:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha256_multi_block_avx:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+_avx_shortcut:
+ shr rcx,32
+ cmp edx,2
+ jb NEAR $L$avx
+ test ecx,32
+ jnz NEAR _avx2_shortcut
+ jmp NEAR $L$avx
+ALIGN 32
+$L$avx:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[(-120)+rax],xmm10
+ movaps XMMWORD[(-104)+rax],xmm11
+ movaps XMMWORD[(-88)+rax],xmm12
+ movaps XMMWORD[(-72)+rax],xmm13
+ movaps XMMWORD[(-56)+rax],xmm14
+ movaps XMMWORD[(-40)+rax],xmm15
+ sub rsp,288
+ and rsp,-256
+ mov QWORD[272+rsp],rax
+
+$L$body_avx:
+ lea rbp,[((K256+128))]
+ lea rbx,[256+rsp]
+ lea rdi,[128+rdi]
+
+$L$oop_grande_avx:
+ mov DWORD[280+rsp],edx
+ xor edx,edx
+ mov r8,QWORD[rsi]
+ mov ecx,DWORD[8+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[rbx],ecx
+ cmovle r8,rbp
+ mov r9,QWORD[16+rsi]
+ mov ecx,DWORD[24+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[4+rbx],ecx
+ cmovle r9,rbp
+ mov r10,QWORD[32+rsi]
+ mov ecx,DWORD[40+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[8+rbx],ecx
+ cmovle r10,rbp
+ mov r11,QWORD[48+rsi]
+ mov ecx,DWORD[56+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[12+rbx],ecx
+ cmovle r11,rbp
+ test edx,edx
+ jz NEAR $L$done_avx
+
+ vmovdqu xmm8,XMMWORD[((0-128))+rdi]
+ lea rax,[128+rsp]
+ vmovdqu xmm9,XMMWORD[((32-128))+rdi]
+ vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+ vmovdqu xmm11,XMMWORD[((96-128))+rdi]
+ vmovdqu xmm12,XMMWORD[((128-128))+rdi]
+ vmovdqu xmm13,XMMWORD[((160-128))+rdi]
+ vmovdqu xmm14,XMMWORD[((192-128))+rdi]
+ vmovdqu xmm15,XMMWORD[((224-128))+rdi]
+ vmovdqu xmm6,XMMWORD[$L$pbswap]
+ jmp NEAR $L$oop_avx
+
+ALIGN 32
+$L$oop_avx:
+ vpxor xmm4,xmm10,xmm9
+ vmovd xmm5,DWORD[r8]
+ vmovd xmm0,DWORD[r9]
+ vpinsrd xmm5,xmm5,DWORD[r10],1
+ vpinsrd xmm0,xmm0,DWORD[r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm12,6
+ vpslld xmm2,xmm12,26
+ vmovdqu XMMWORD[(0-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm15
+
+ vpsrld xmm1,xmm12,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm12,21
+ vpaddd xmm5,xmm5,XMMWORD[((-128))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm12,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm12,7
+ vpandn xmm0,xmm12,xmm14
+ vpand xmm3,xmm12,xmm13
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm15,xmm8,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm8,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm9,xmm8
+
+ vpxor xmm15,xmm15,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm8,13
+
+ vpslld xmm2,xmm8,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm15,xmm1
+
+ vpsrld xmm1,xmm8,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm8,10
+ vpxor xmm15,xmm9,xmm4
+ vpaddd xmm11,xmm11,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm15,xmm15,xmm5
+ vpaddd xmm15,xmm15,xmm7
+ vmovd xmm5,DWORD[4+r8]
+ vmovd xmm0,DWORD[4+r9]
+ vpinsrd xmm5,xmm5,DWORD[4+r10],1
+ vpinsrd xmm0,xmm0,DWORD[4+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm11,6
+ vpslld xmm2,xmm11,26
+ vmovdqu XMMWORD[(16-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm14
+
+ vpsrld xmm1,xmm11,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm11,21
+ vpaddd xmm5,xmm5,XMMWORD[((-96))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm11,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm11,7
+ vpandn xmm0,xmm11,xmm13
+ vpand xmm4,xmm11,xmm12
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm14,xmm15,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm15,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm8,xmm15
+
+ vpxor xmm14,xmm14,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm15,13
+
+ vpslld xmm2,xmm15,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm14,xmm1
+
+ vpsrld xmm1,xmm15,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm15,10
+ vpxor xmm14,xmm8,xmm3
+ vpaddd xmm10,xmm10,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm14,xmm14,xmm5
+ vpaddd xmm14,xmm14,xmm7
+ vmovd xmm5,DWORD[8+r8]
+ vmovd xmm0,DWORD[8+r9]
+ vpinsrd xmm5,xmm5,DWORD[8+r10],1
+ vpinsrd xmm0,xmm0,DWORD[8+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm10,6
+ vpslld xmm2,xmm10,26
+ vmovdqu XMMWORD[(32-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm13
+
+ vpsrld xmm1,xmm10,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm10,21
+ vpaddd xmm5,xmm5,XMMWORD[((-64))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm10,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm10,7
+ vpandn xmm0,xmm10,xmm12
+ vpand xmm3,xmm10,xmm11
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm13,xmm14,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm14,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm15,xmm14
+
+ vpxor xmm13,xmm13,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm14,13
+
+ vpslld xmm2,xmm14,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm13,xmm1
+
+ vpsrld xmm1,xmm14,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm14,10
+ vpxor xmm13,xmm15,xmm4
+ vpaddd xmm9,xmm9,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm13,xmm13,xmm5
+ vpaddd xmm13,xmm13,xmm7
+ vmovd xmm5,DWORD[12+r8]
+ vmovd xmm0,DWORD[12+r9]
+ vpinsrd xmm5,xmm5,DWORD[12+r10],1
+ vpinsrd xmm0,xmm0,DWORD[12+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm9,6
+ vpslld xmm2,xmm9,26
+ vmovdqu XMMWORD[(48-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm12
+
+ vpsrld xmm1,xmm9,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm9,21
+ vpaddd xmm5,xmm5,XMMWORD[((-32))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm9,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm9,7
+ vpandn xmm0,xmm9,xmm11
+ vpand xmm4,xmm9,xmm10
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm12,xmm13,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm13,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm14,xmm13
+
+ vpxor xmm12,xmm12,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm13,13
+
+ vpslld xmm2,xmm13,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm12,xmm1
+
+ vpsrld xmm1,xmm13,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm13,10
+ vpxor xmm12,xmm14,xmm3
+ vpaddd xmm8,xmm8,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm12,xmm12,xmm5
+ vpaddd xmm12,xmm12,xmm7
+ vmovd xmm5,DWORD[16+r8]
+ vmovd xmm0,DWORD[16+r9]
+ vpinsrd xmm5,xmm5,DWORD[16+r10],1
+ vpinsrd xmm0,xmm0,DWORD[16+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm8,6
+ vpslld xmm2,xmm8,26
+ vmovdqu XMMWORD[(64-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm11
+
+ vpsrld xmm1,xmm8,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm8,21
+ vpaddd xmm5,xmm5,XMMWORD[rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm8,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm8,7
+ vpandn xmm0,xmm8,xmm10
+ vpand xmm3,xmm8,xmm9
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm11,xmm12,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm12,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm13,xmm12
+
+ vpxor xmm11,xmm11,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm12,13
+
+ vpslld xmm2,xmm12,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm11,xmm1
+
+ vpsrld xmm1,xmm12,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm12,10
+ vpxor xmm11,xmm13,xmm4
+ vpaddd xmm15,xmm15,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm11,xmm11,xmm5
+ vpaddd xmm11,xmm11,xmm7
+ vmovd xmm5,DWORD[20+r8]
+ vmovd xmm0,DWORD[20+r9]
+ vpinsrd xmm5,xmm5,DWORD[20+r10],1
+ vpinsrd xmm0,xmm0,DWORD[20+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm15,6
+ vpslld xmm2,xmm15,26
+ vmovdqu XMMWORD[(80-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm10
+
+ vpsrld xmm1,xmm15,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm15,21
+ vpaddd xmm5,xmm5,XMMWORD[32+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm15,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm15,7
+ vpandn xmm0,xmm15,xmm9
+ vpand xmm4,xmm15,xmm8
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm10,xmm11,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm11,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm12,xmm11
+
+ vpxor xmm10,xmm10,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm11,13
+
+ vpslld xmm2,xmm11,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm10,xmm1
+
+ vpsrld xmm1,xmm11,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm11,10
+ vpxor xmm10,xmm12,xmm3
+ vpaddd xmm14,xmm14,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm10,xmm10,xmm5
+ vpaddd xmm10,xmm10,xmm7
+ vmovd xmm5,DWORD[24+r8]
+ vmovd xmm0,DWORD[24+r9]
+ vpinsrd xmm5,xmm5,DWORD[24+r10],1
+ vpinsrd xmm0,xmm0,DWORD[24+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm14,6
+ vpslld xmm2,xmm14,26
+ vmovdqu XMMWORD[(96-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm9
+
+ vpsrld xmm1,xmm14,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm14,21
+ vpaddd xmm5,xmm5,XMMWORD[64+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm14,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm14,7
+ vpandn xmm0,xmm14,xmm8
+ vpand xmm3,xmm14,xmm15
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm9,xmm10,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm10,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm11,xmm10
+
+ vpxor xmm9,xmm9,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm10,13
+
+ vpslld xmm2,xmm10,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm9,xmm1
+
+ vpsrld xmm1,xmm10,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm10,10
+ vpxor xmm9,xmm11,xmm4
+ vpaddd xmm13,xmm13,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm9,xmm9,xmm5
+ vpaddd xmm9,xmm9,xmm7
+ vmovd xmm5,DWORD[28+r8]
+ vmovd xmm0,DWORD[28+r9]
+ vpinsrd xmm5,xmm5,DWORD[28+r10],1
+ vpinsrd xmm0,xmm0,DWORD[28+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm13,6
+ vpslld xmm2,xmm13,26
+ vmovdqu XMMWORD[(112-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm8
+
+ vpsrld xmm1,xmm13,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm13,21
+ vpaddd xmm5,xmm5,XMMWORD[96+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm13,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm13,7
+ vpandn xmm0,xmm13,xmm15
+ vpand xmm4,xmm13,xmm14
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm8,xmm9,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm9,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm10,xmm9
+
+ vpxor xmm8,xmm8,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm9,13
+
+ vpslld xmm2,xmm9,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm8,xmm1
+
+ vpsrld xmm1,xmm9,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm9,10
+ vpxor xmm8,xmm10,xmm3
+ vpaddd xmm12,xmm12,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm8,xmm8,xmm5
+ vpaddd xmm8,xmm8,xmm7
+ add rbp,256
+ vmovd xmm5,DWORD[32+r8]
+ vmovd xmm0,DWORD[32+r9]
+ vpinsrd xmm5,xmm5,DWORD[32+r10],1
+ vpinsrd xmm0,xmm0,DWORD[32+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm12,6
+ vpslld xmm2,xmm12,26
+ vmovdqu XMMWORD[(128-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm15
+
+ vpsrld xmm1,xmm12,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm12,21
+ vpaddd xmm5,xmm5,XMMWORD[((-128))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm12,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm12,7
+ vpandn xmm0,xmm12,xmm14
+ vpand xmm3,xmm12,xmm13
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm15,xmm8,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm8,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm9,xmm8
+
+ vpxor xmm15,xmm15,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm8,13
+
+ vpslld xmm2,xmm8,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm15,xmm1
+
+ vpsrld xmm1,xmm8,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm8,10
+ vpxor xmm15,xmm9,xmm4
+ vpaddd xmm11,xmm11,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm15,xmm15,xmm5
+ vpaddd xmm15,xmm15,xmm7
+ vmovd xmm5,DWORD[36+r8]
+ vmovd xmm0,DWORD[36+r9]
+ vpinsrd xmm5,xmm5,DWORD[36+r10],1
+ vpinsrd xmm0,xmm0,DWORD[36+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm11,6
+ vpslld xmm2,xmm11,26
+ vmovdqu XMMWORD[(144-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm14
+
+ vpsrld xmm1,xmm11,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm11,21
+ vpaddd xmm5,xmm5,XMMWORD[((-96))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm11,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm11,7
+ vpandn xmm0,xmm11,xmm13
+ vpand xmm4,xmm11,xmm12
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm14,xmm15,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm15,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm8,xmm15
+
+ vpxor xmm14,xmm14,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm15,13
+
+ vpslld xmm2,xmm15,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm14,xmm1
+
+ vpsrld xmm1,xmm15,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm15,10
+ vpxor xmm14,xmm8,xmm3
+ vpaddd xmm10,xmm10,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm14,xmm14,xmm5
+ vpaddd xmm14,xmm14,xmm7
+ vmovd xmm5,DWORD[40+r8]
+ vmovd xmm0,DWORD[40+r9]
+ vpinsrd xmm5,xmm5,DWORD[40+r10],1
+ vpinsrd xmm0,xmm0,DWORD[40+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm10,6
+ vpslld xmm2,xmm10,26
+ vmovdqu XMMWORD[(160-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm13
+
+ vpsrld xmm1,xmm10,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm10,21
+ vpaddd xmm5,xmm5,XMMWORD[((-64))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm10,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm10,7
+ vpandn xmm0,xmm10,xmm12
+ vpand xmm3,xmm10,xmm11
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm13,xmm14,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm14,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm15,xmm14
+
+ vpxor xmm13,xmm13,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm14,13
+
+ vpslld xmm2,xmm14,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm13,xmm1
+
+ vpsrld xmm1,xmm14,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm14,10
+ vpxor xmm13,xmm15,xmm4
+ vpaddd xmm9,xmm9,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm13,xmm13,xmm5
+ vpaddd xmm13,xmm13,xmm7
+ vmovd xmm5,DWORD[44+r8]
+ vmovd xmm0,DWORD[44+r9]
+ vpinsrd xmm5,xmm5,DWORD[44+r10],1
+ vpinsrd xmm0,xmm0,DWORD[44+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm9,6
+ vpslld xmm2,xmm9,26
+ vmovdqu XMMWORD[(176-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm12
+
+ vpsrld xmm1,xmm9,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm9,21
+ vpaddd xmm5,xmm5,XMMWORD[((-32))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm9,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm9,7
+ vpandn xmm0,xmm9,xmm11
+ vpand xmm4,xmm9,xmm10
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm12,xmm13,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm13,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm14,xmm13
+
+ vpxor xmm12,xmm12,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm13,13
+
+ vpslld xmm2,xmm13,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm12,xmm1
+
+ vpsrld xmm1,xmm13,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm13,10
+ vpxor xmm12,xmm14,xmm3
+ vpaddd xmm8,xmm8,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm12,xmm12,xmm5
+ vpaddd xmm12,xmm12,xmm7
+ vmovd xmm5,DWORD[48+r8]
+ vmovd xmm0,DWORD[48+r9]
+ vpinsrd xmm5,xmm5,DWORD[48+r10],1
+ vpinsrd xmm0,xmm0,DWORD[48+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm8,6
+ vpslld xmm2,xmm8,26
+ vmovdqu XMMWORD[(192-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm11
+
+ vpsrld xmm1,xmm8,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm8,21
+ vpaddd xmm5,xmm5,XMMWORD[rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm8,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm8,7
+ vpandn xmm0,xmm8,xmm10
+ vpand xmm3,xmm8,xmm9
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm11,xmm12,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm12,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm13,xmm12
+
+ vpxor xmm11,xmm11,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm12,13
+
+ vpslld xmm2,xmm12,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm11,xmm1
+
+ vpsrld xmm1,xmm12,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm12,10
+ vpxor xmm11,xmm13,xmm4
+ vpaddd xmm15,xmm15,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm11,xmm11,xmm5
+ vpaddd xmm11,xmm11,xmm7
+ vmovd xmm5,DWORD[52+r8]
+ vmovd xmm0,DWORD[52+r9]
+ vpinsrd xmm5,xmm5,DWORD[52+r10],1
+ vpinsrd xmm0,xmm0,DWORD[52+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm15,6
+ vpslld xmm2,xmm15,26
+ vmovdqu XMMWORD[(208-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm10
+
+ vpsrld xmm1,xmm15,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm15,21
+ vpaddd xmm5,xmm5,XMMWORD[32+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm15,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm15,7
+ vpandn xmm0,xmm15,xmm9
+ vpand xmm4,xmm15,xmm8
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm10,xmm11,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm11,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm12,xmm11
+
+ vpxor xmm10,xmm10,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm11,13
+
+ vpslld xmm2,xmm11,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm10,xmm1
+
+ vpsrld xmm1,xmm11,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm11,10
+ vpxor xmm10,xmm12,xmm3
+ vpaddd xmm14,xmm14,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm10,xmm10,xmm5
+ vpaddd xmm10,xmm10,xmm7
+ vmovd xmm5,DWORD[56+r8]
+ vmovd xmm0,DWORD[56+r9]
+ vpinsrd xmm5,xmm5,DWORD[56+r10],1
+ vpinsrd xmm0,xmm0,DWORD[56+r11],1
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm14,6
+ vpslld xmm2,xmm14,26
+ vmovdqu XMMWORD[(224-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm9
+
+ vpsrld xmm1,xmm14,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm14,21
+ vpaddd xmm5,xmm5,XMMWORD[64+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm14,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm14,7
+ vpandn xmm0,xmm14,xmm8
+ vpand xmm3,xmm14,xmm15
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm9,xmm10,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm10,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm11,xmm10
+
+ vpxor xmm9,xmm9,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm10,13
+
+ vpslld xmm2,xmm10,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm9,xmm1
+
+ vpsrld xmm1,xmm10,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm10,10
+ vpxor xmm9,xmm11,xmm4
+ vpaddd xmm13,xmm13,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm9,xmm9,xmm5
+ vpaddd xmm9,xmm9,xmm7
+ vmovd xmm5,DWORD[60+r8]
+ lea r8,[64+r8]
+ vmovd xmm0,DWORD[60+r9]
+ lea r9,[64+r9]
+ vpinsrd xmm5,xmm5,DWORD[60+r10],1
+ lea r10,[64+r10]
+ vpinsrd xmm0,xmm0,DWORD[60+r11],1
+ lea r11,[64+r11]
+ vpunpckldq xmm5,xmm5,xmm0
+ vpshufb xmm5,xmm5,xmm6
+ vpsrld xmm7,xmm13,6
+ vpslld xmm2,xmm13,26
+ vmovdqu XMMWORD[(240-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm8
+
+ vpsrld xmm1,xmm13,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm13,21
+ vpaddd xmm5,xmm5,XMMWORD[96+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm13,25
+ vpxor xmm7,xmm7,xmm2
+ prefetcht0 [63+r8]
+ vpslld xmm2,xmm13,7
+ vpandn xmm0,xmm13,xmm15
+ vpand xmm4,xmm13,xmm14
+ prefetcht0 [63+r9]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm8,xmm9,2
+ vpxor xmm7,xmm7,xmm2
+ prefetcht0 [63+r10]
+ vpslld xmm1,xmm9,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm10,xmm9
+ prefetcht0 [63+r11]
+ vpxor xmm8,xmm8,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm9,13
+
+ vpslld xmm2,xmm9,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm8,xmm1
+
+ vpsrld xmm1,xmm9,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm9,10
+ vpxor xmm8,xmm10,xmm3
+ vpaddd xmm12,xmm12,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm8,xmm8,xmm5
+ vpaddd xmm8,xmm8,xmm7
+ add rbp,256
+ vmovdqu xmm5,XMMWORD[((0-128))+rax]
+ mov ecx,3
+ jmp NEAR $L$oop_16_xx_avx
+ALIGN 32
+$L$oop_16_xx_avx:
+ vmovdqu xmm6,XMMWORD[((16-128))+rax]
+ vpaddd xmm5,xmm5,XMMWORD[((144-128))+rax]
+
+ vpsrld xmm7,xmm6,3
+ vpsrld xmm1,xmm6,7
+ vpslld xmm2,xmm6,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm6,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm6,14
+ vmovdqu xmm0,XMMWORD[((224-128))+rax]
+ vpsrld xmm3,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm5,xmm5,xmm7
+ vpxor xmm7,xmm3,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm5,xmm5,xmm7
+ vpsrld xmm7,xmm12,6
+ vpslld xmm2,xmm12,26
+ vmovdqu XMMWORD[(0-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm15
+
+ vpsrld xmm1,xmm12,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm12,21
+ vpaddd xmm5,xmm5,XMMWORD[((-128))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm12,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm12,7
+ vpandn xmm0,xmm12,xmm14
+ vpand xmm3,xmm12,xmm13
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm15,xmm8,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm8,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm9,xmm8
+
+ vpxor xmm15,xmm15,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm8,13
+
+ vpslld xmm2,xmm8,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm15,xmm1
+
+ vpsrld xmm1,xmm8,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm8,10
+ vpxor xmm15,xmm9,xmm4
+ vpaddd xmm11,xmm11,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm15,xmm15,xmm5
+ vpaddd xmm15,xmm15,xmm7
+ vmovdqu xmm5,XMMWORD[((32-128))+rax]
+ vpaddd xmm6,xmm6,XMMWORD[((160-128))+rax]
+
+ vpsrld xmm7,xmm5,3
+ vpsrld xmm1,xmm5,7
+ vpslld xmm2,xmm5,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm5,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm5,14
+ vmovdqu xmm0,XMMWORD[((240-128))+rax]
+ vpsrld xmm4,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm6,xmm6,xmm7
+ vpxor xmm7,xmm4,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm6,xmm6,xmm7
+ vpsrld xmm7,xmm11,6
+ vpslld xmm2,xmm11,26
+ vmovdqu XMMWORD[(16-128)+rax],xmm6
+ vpaddd xmm6,xmm6,xmm14
+
+ vpsrld xmm1,xmm11,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm11,21
+ vpaddd xmm6,xmm6,XMMWORD[((-96))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm11,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm11,7
+ vpandn xmm0,xmm11,xmm13
+ vpand xmm4,xmm11,xmm12
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm14,xmm15,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm15,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm8,xmm15
+
+ vpxor xmm14,xmm14,xmm1
+ vpaddd xmm6,xmm6,xmm7
+
+ vpsrld xmm1,xmm15,13
+
+ vpslld xmm2,xmm15,19
+ vpaddd xmm6,xmm6,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm14,xmm1
+
+ vpsrld xmm1,xmm15,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm15,10
+ vpxor xmm14,xmm8,xmm3
+ vpaddd xmm10,xmm10,xmm6
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm14,xmm14,xmm6
+ vpaddd xmm14,xmm14,xmm7
+ vmovdqu xmm6,XMMWORD[((48-128))+rax]
+ vpaddd xmm5,xmm5,XMMWORD[((176-128))+rax]
+
+ vpsrld xmm7,xmm6,3
+ vpsrld xmm1,xmm6,7
+ vpslld xmm2,xmm6,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm6,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm6,14
+ vmovdqu xmm0,XMMWORD[((0-128))+rax]
+ vpsrld xmm3,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm5,xmm5,xmm7
+ vpxor xmm7,xmm3,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm5,xmm5,xmm7
+ vpsrld xmm7,xmm10,6
+ vpslld xmm2,xmm10,26
+ vmovdqu XMMWORD[(32-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm13
+
+ vpsrld xmm1,xmm10,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm10,21
+ vpaddd xmm5,xmm5,XMMWORD[((-64))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm10,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm10,7
+ vpandn xmm0,xmm10,xmm12
+ vpand xmm3,xmm10,xmm11
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm13,xmm14,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm14,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm15,xmm14
+
+ vpxor xmm13,xmm13,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm14,13
+
+ vpslld xmm2,xmm14,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm13,xmm1
+
+ vpsrld xmm1,xmm14,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm14,10
+ vpxor xmm13,xmm15,xmm4
+ vpaddd xmm9,xmm9,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm13,xmm13,xmm5
+ vpaddd xmm13,xmm13,xmm7
+ vmovdqu xmm5,XMMWORD[((64-128))+rax]
+ vpaddd xmm6,xmm6,XMMWORD[((192-128))+rax]
+
+ vpsrld xmm7,xmm5,3
+ vpsrld xmm1,xmm5,7
+ vpslld xmm2,xmm5,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm5,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm5,14
+ vmovdqu xmm0,XMMWORD[((16-128))+rax]
+ vpsrld xmm4,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm6,xmm6,xmm7
+ vpxor xmm7,xmm4,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm6,xmm6,xmm7
+ vpsrld xmm7,xmm9,6
+ vpslld xmm2,xmm9,26
+ vmovdqu XMMWORD[(48-128)+rax],xmm6
+ vpaddd xmm6,xmm6,xmm12
+
+ vpsrld xmm1,xmm9,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm9,21
+ vpaddd xmm6,xmm6,XMMWORD[((-32))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm9,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm9,7
+ vpandn xmm0,xmm9,xmm11
+ vpand xmm4,xmm9,xmm10
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm12,xmm13,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm13,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm14,xmm13
+
+ vpxor xmm12,xmm12,xmm1
+ vpaddd xmm6,xmm6,xmm7
+
+ vpsrld xmm1,xmm13,13
+
+ vpslld xmm2,xmm13,19
+ vpaddd xmm6,xmm6,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm12,xmm1
+
+ vpsrld xmm1,xmm13,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm13,10
+ vpxor xmm12,xmm14,xmm3
+ vpaddd xmm8,xmm8,xmm6
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm12,xmm12,xmm6
+ vpaddd xmm12,xmm12,xmm7
+ vmovdqu xmm6,XMMWORD[((80-128))+rax]
+ vpaddd xmm5,xmm5,XMMWORD[((208-128))+rax]
+
+ vpsrld xmm7,xmm6,3
+ vpsrld xmm1,xmm6,7
+ vpslld xmm2,xmm6,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm6,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm6,14
+ vmovdqu xmm0,XMMWORD[((32-128))+rax]
+ vpsrld xmm3,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm5,xmm5,xmm7
+ vpxor xmm7,xmm3,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm5,xmm5,xmm7
+ vpsrld xmm7,xmm8,6
+ vpslld xmm2,xmm8,26
+ vmovdqu XMMWORD[(64-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm11
+
+ vpsrld xmm1,xmm8,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm8,21
+ vpaddd xmm5,xmm5,XMMWORD[rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm8,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm8,7
+ vpandn xmm0,xmm8,xmm10
+ vpand xmm3,xmm8,xmm9
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm11,xmm12,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm12,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm13,xmm12
+
+ vpxor xmm11,xmm11,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm12,13
+
+ vpslld xmm2,xmm12,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm11,xmm1
+
+ vpsrld xmm1,xmm12,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm12,10
+ vpxor xmm11,xmm13,xmm4
+ vpaddd xmm15,xmm15,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm11,xmm11,xmm5
+ vpaddd xmm11,xmm11,xmm7
+ vmovdqu xmm5,XMMWORD[((96-128))+rax]
+ vpaddd xmm6,xmm6,XMMWORD[((224-128))+rax]
+
+ vpsrld xmm7,xmm5,3
+ vpsrld xmm1,xmm5,7
+ vpslld xmm2,xmm5,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm5,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm5,14
+ vmovdqu xmm0,XMMWORD[((48-128))+rax]
+ vpsrld xmm4,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm6,xmm6,xmm7
+ vpxor xmm7,xmm4,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm6,xmm6,xmm7
+ vpsrld xmm7,xmm15,6
+ vpslld xmm2,xmm15,26
+ vmovdqu XMMWORD[(80-128)+rax],xmm6
+ vpaddd xmm6,xmm6,xmm10
+
+ vpsrld xmm1,xmm15,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm15,21
+ vpaddd xmm6,xmm6,XMMWORD[32+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm15,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm15,7
+ vpandn xmm0,xmm15,xmm9
+ vpand xmm4,xmm15,xmm8
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm10,xmm11,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm11,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm12,xmm11
+
+ vpxor xmm10,xmm10,xmm1
+ vpaddd xmm6,xmm6,xmm7
+
+ vpsrld xmm1,xmm11,13
+
+ vpslld xmm2,xmm11,19
+ vpaddd xmm6,xmm6,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm10,xmm1
+
+ vpsrld xmm1,xmm11,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm11,10
+ vpxor xmm10,xmm12,xmm3
+ vpaddd xmm14,xmm14,xmm6
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm10,xmm10,xmm6
+ vpaddd xmm10,xmm10,xmm7
+ vmovdqu xmm6,XMMWORD[((112-128))+rax]
+ vpaddd xmm5,xmm5,XMMWORD[((240-128))+rax]
+
+ vpsrld xmm7,xmm6,3
+ vpsrld xmm1,xmm6,7
+ vpslld xmm2,xmm6,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm6,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm6,14
+ vmovdqu xmm0,XMMWORD[((64-128))+rax]
+ vpsrld xmm3,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm5,xmm5,xmm7
+ vpxor xmm7,xmm3,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm5,xmm5,xmm7
+ vpsrld xmm7,xmm14,6
+ vpslld xmm2,xmm14,26
+ vmovdqu XMMWORD[(96-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm9
+
+ vpsrld xmm1,xmm14,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm14,21
+ vpaddd xmm5,xmm5,XMMWORD[64+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm14,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm14,7
+ vpandn xmm0,xmm14,xmm8
+ vpand xmm3,xmm14,xmm15
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm9,xmm10,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm10,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm11,xmm10
+
+ vpxor xmm9,xmm9,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm10,13
+
+ vpslld xmm2,xmm10,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm9,xmm1
+
+ vpsrld xmm1,xmm10,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm10,10
+ vpxor xmm9,xmm11,xmm4
+ vpaddd xmm13,xmm13,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm9,xmm9,xmm5
+ vpaddd xmm9,xmm9,xmm7
+ vmovdqu xmm5,XMMWORD[((128-128))+rax]
+ vpaddd xmm6,xmm6,XMMWORD[((0-128))+rax]
+
+ vpsrld xmm7,xmm5,3
+ vpsrld xmm1,xmm5,7
+ vpslld xmm2,xmm5,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm5,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm5,14
+ vmovdqu xmm0,XMMWORD[((80-128))+rax]
+ vpsrld xmm4,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm6,xmm6,xmm7
+ vpxor xmm7,xmm4,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm6,xmm6,xmm7
+ vpsrld xmm7,xmm13,6
+ vpslld xmm2,xmm13,26
+ vmovdqu XMMWORD[(112-128)+rax],xmm6
+ vpaddd xmm6,xmm6,xmm8
+
+ vpsrld xmm1,xmm13,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm13,21
+ vpaddd xmm6,xmm6,XMMWORD[96+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm13,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm13,7
+ vpandn xmm0,xmm13,xmm15
+ vpand xmm4,xmm13,xmm14
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm8,xmm9,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm9,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm10,xmm9
+
+ vpxor xmm8,xmm8,xmm1
+ vpaddd xmm6,xmm6,xmm7
+
+ vpsrld xmm1,xmm9,13
+
+ vpslld xmm2,xmm9,19
+ vpaddd xmm6,xmm6,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm8,xmm1
+
+ vpsrld xmm1,xmm9,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm9,10
+ vpxor xmm8,xmm10,xmm3
+ vpaddd xmm12,xmm12,xmm6
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm8,xmm8,xmm6
+ vpaddd xmm8,xmm8,xmm7
+ add rbp,256
+ vmovdqu xmm6,XMMWORD[((144-128))+rax]
+ vpaddd xmm5,xmm5,XMMWORD[((16-128))+rax]
+
+ vpsrld xmm7,xmm6,3
+ vpsrld xmm1,xmm6,7
+ vpslld xmm2,xmm6,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm6,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm6,14
+ vmovdqu xmm0,XMMWORD[((96-128))+rax]
+ vpsrld xmm3,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm5,xmm5,xmm7
+ vpxor xmm7,xmm3,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm5,xmm5,xmm7
+ vpsrld xmm7,xmm12,6
+ vpslld xmm2,xmm12,26
+ vmovdqu XMMWORD[(128-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm15
+
+ vpsrld xmm1,xmm12,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm12,21
+ vpaddd xmm5,xmm5,XMMWORD[((-128))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm12,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm12,7
+ vpandn xmm0,xmm12,xmm14
+ vpand xmm3,xmm12,xmm13
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm15,xmm8,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm8,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm9,xmm8
+
+ vpxor xmm15,xmm15,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm8,13
+
+ vpslld xmm2,xmm8,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm15,xmm1
+
+ vpsrld xmm1,xmm8,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm8,10
+ vpxor xmm15,xmm9,xmm4
+ vpaddd xmm11,xmm11,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm15,xmm15,xmm5
+ vpaddd xmm15,xmm15,xmm7
+ vmovdqu xmm5,XMMWORD[((160-128))+rax]
+ vpaddd xmm6,xmm6,XMMWORD[((32-128))+rax]
+
+ vpsrld xmm7,xmm5,3
+ vpsrld xmm1,xmm5,7
+ vpslld xmm2,xmm5,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm5,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm5,14
+ vmovdqu xmm0,XMMWORD[((112-128))+rax]
+ vpsrld xmm4,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm6,xmm6,xmm7
+ vpxor xmm7,xmm4,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm6,xmm6,xmm7
+ vpsrld xmm7,xmm11,6
+ vpslld xmm2,xmm11,26
+ vmovdqu XMMWORD[(144-128)+rax],xmm6
+ vpaddd xmm6,xmm6,xmm14
+
+ vpsrld xmm1,xmm11,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm11,21
+ vpaddd xmm6,xmm6,XMMWORD[((-96))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm11,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm11,7
+ vpandn xmm0,xmm11,xmm13
+ vpand xmm4,xmm11,xmm12
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm14,xmm15,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm15,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm8,xmm15
+
+ vpxor xmm14,xmm14,xmm1
+ vpaddd xmm6,xmm6,xmm7
+
+ vpsrld xmm1,xmm15,13
+
+ vpslld xmm2,xmm15,19
+ vpaddd xmm6,xmm6,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm14,xmm1
+
+ vpsrld xmm1,xmm15,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm15,10
+ vpxor xmm14,xmm8,xmm3
+ vpaddd xmm10,xmm10,xmm6
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm14,xmm14,xmm6
+ vpaddd xmm14,xmm14,xmm7
+ vmovdqu xmm6,XMMWORD[((176-128))+rax]
+ vpaddd xmm5,xmm5,XMMWORD[((48-128))+rax]
+
+ vpsrld xmm7,xmm6,3
+ vpsrld xmm1,xmm6,7
+ vpslld xmm2,xmm6,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm6,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm6,14
+ vmovdqu xmm0,XMMWORD[((128-128))+rax]
+ vpsrld xmm3,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm5,xmm5,xmm7
+ vpxor xmm7,xmm3,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm5,xmm5,xmm7
+ vpsrld xmm7,xmm10,6
+ vpslld xmm2,xmm10,26
+ vmovdqu XMMWORD[(160-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm13
+
+ vpsrld xmm1,xmm10,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm10,21
+ vpaddd xmm5,xmm5,XMMWORD[((-64))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm10,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm10,7
+ vpandn xmm0,xmm10,xmm12
+ vpand xmm3,xmm10,xmm11
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm13,xmm14,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm14,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm15,xmm14
+
+ vpxor xmm13,xmm13,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm14,13
+
+ vpslld xmm2,xmm14,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm13,xmm1
+
+ vpsrld xmm1,xmm14,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm14,10
+ vpxor xmm13,xmm15,xmm4
+ vpaddd xmm9,xmm9,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm13,xmm13,xmm5
+ vpaddd xmm13,xmm13,xmm7
+ vmovdqu xmm5,XMMWORD[((192-128))+rax]
+ vpaddd xmm6,xmm6,XMMWORD[((64-128))+rax]
+
+ vpsrld xmm7,xmm5,3
+ vpsrld xmm1,xmm5,7
+ vpslld xmm2,xmm5,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm5,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm5,14
+ vmovdqu xmm0,XMMWORD[((144-128))+rax]
+ vpsrld xmm4,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm6,xmm6,xmm7
+ vpxor xmm7,xmm4,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm6,xmm6,xmm7
+ vpsrld xmm7,xmm9,6
+ vpslld xmm2,xmm9,26
+ vmovdqu XMMWORD[(176-128)+rax],xmm6
+ vpaddd xmm6,xmm6,xmm12
+
+ vpsrld xmm1,xmm9,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm9,21
+ vpaddd xmm6,xmm6,XMMWORD[((-32))+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm9,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm9,7
+ vpandn xmm0,xmm9,xmm11
+ vpand xmm4,xmm9,xmm10
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm12,xmm13,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm13,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm14,xmm13
+
+ vpxor xmm12,xmm12,xmm1
+ vpaddd xmm6,xmm6,xmm7
+
+ vpsrld xmm1,xmm13,13
+
+ vpslld xmm2,xmm13,19
+ vpaddd xmm6,xmm6,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm12,xmm1
+
+ vpsrld xmm1,xmm13,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm13,10
+ vpxor xmm12,xmm14,xmm3
+ vpaddd xmm8,xmm8,xmm6
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm12,xmm12,xmm6
+ vpaddd xmm12,xmm12,xmm7
+ vmovdqu xmm6,XMMWORD[((208-128))+rax]
+ vpaddd xmm5,xmm5,XMMWORD[((80-128))+rax]
+
+ vpsrld xmm7,xmm6,3
+ vpsrld xmm1,xmm6,7
+ vpslld xmm2,xmm6,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm6,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm6,14
+ vmovdqu xmm0,XMMWORD[((160-128))+rax]
+ vpsrld xmm3,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm5,xmm5,xmm7
+ vpxor xmm7,xmm3,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm5,xmm5,xmm7
+ vpsrld xmm7,xmm8,6
+ vpslld xmm2,xmm8,26
+ vmovdqu XMMWORD[(192-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm11
+
+ vpsrld xmm1,xmm8,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm8,21
+ vpaddd xmm5,xmm5,XMMWORD[rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm8,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm8,7
+ vpandn xmm0,xmm8,xmm10
+ vpand xmm3,xmm8,xmm9
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm11,xmm12,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm12,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm13,xmm12
+
+ vpxor xmm11,xmm11,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm12,13
+
+ vpslld xmm2,xmm12,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm11,xmm1
+
+ vpsrld xmm1,xmm12,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm12,10
+ vpxor xmm11,xmm13,xmm4
+ vpaddd xmm15,xmm15,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm11,xmm11,xmm5
+ vpaddd xmm11,xmm11,xmm7
+ vmovdqu xmm5,XMMWORD[((224-128))+rax]
+ vpaddd xmm6,xmm6,XMMWORD[((96-128))+rax]
+
+ vpsrld xmm7,xmm5,3
+ vpsrld xmm1,xmm5,7
+ vpslld xmm2,xmm5,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm5,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm5,14
+ vmovdqu xmm0,XMMWORD[((176-128))+rax]
+ vpsrld xmm4,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm6,xmm6,xmm7
+ vpxor xmm7,xmm4,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm6,xmm6,xmm7
+ vpsrld xmm7,xmm15,6
+ vpslld xmm2,xmm15,26
+ vmovdqu XMMWORD[(208-128)+rax],xmm6
+ vpaddd xmm6,xmm6,xmm10
+
+ vpsrld xmm1,xmm15,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm15,21
+ vpaddd xmm6,xmm6,XMMWORD[32+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm15,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm15,7
+ vpandn xmm0,xmm15,xmm9
+ vpand xmm4,xmm15,xmm8
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm10,xmm11,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm11,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm12,xmm11
+
+ vpxor xmm10,xmm10,xmm1
+ vpaddd xmm6,xmm6,xmm7
+
+ vpsrld xmm1,xmm11,13
+
+ vpslld xmm2,xmm11,19
+ vpaddd xmm6,xmm6,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm10,xmm1
+
+ vpsrld xmm1,xmm11,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm11,10
+ vpxor xmm10,xmm12,xmm3
+ vpaddd xmm14,xmm14,xmm6
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm10,xmm10,xmm6
+ vpaddd xmm10,xmm10,xmm7
+ vmovdqu xmm6,XMMWORD[((240-128))+rax]
+ vpaddd xmm5,xmm5,XMMWORD[((112-128))+rax]
+
+ vpsrld xmm7,xmm6,3
+ vpsrld xmm1,xmm6,7
+ vpslld xmm2,xmm6,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm6,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm6,14
+ vmovdqu xmm0,XMMWORD[((192-128))+rax]
+ vpsrld xmm3,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm5,xmm5,xmm7
+ vpxor xmm7,xmm3,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm5,xmm5,xmm7
+ vpsrld xmm7,xmm14,6
+ vpslld xmm2,xmm14,26
+ vmovdqu XMMWORD[(224-128)+rax],xmm5
+ vpaddd xmm5,xmm5,xmm9
+
+ vpsrld xmm1,xmm14,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm14,21
+ vpaddd xmm5,xmm5,XMMWORD[64+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm14,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm14,7
+ vpandn xmm0,xmm14,xmm8
+ vpand xmm3,xmm14,xmm15
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm9,xmm10,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm10,30
+ vpxor xmm0,xmm0,xmm3
+ vpxor xmm3,xmm11,xmm10
+
+ vpxor xmm9,xmm9,xmm1
+ vpaddd xmm5,xmm5,xmm7
+
+ vpsrld xmm1,xmm10,13
+
+ vpslld xmm2,xmm10,19
+ vpaddd xmm5,xmm5,xmm0
+ vpand xmm4,xmm4,xmm3
+
+ vpxor xmm7,xmm9,xmm1
+
+ vpsrld xmm1,xmm10,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm10,10
+ vpxor xmm9,xmm11,xmm4
+ vpaddd xmm13,xmm13,xmm5
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm9,xmm9,xmm5
+ vpaddd xmm9,xmm9,xmm7
+ vmovdqu xmm5,XMMWORD[((0-128))+rax]
+ vpaddd xmm6,xmm6,XMMWORD[((128-128))+rax]
+
+ vpsrld xmm7,xmm5,3
+ vpsrld xmm1,xmm5,7
+ vpslld xmm2,xmm5,25
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm5,18
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm5,14
+ vmovdqu xmm0,XMMWORD[((208-128))+rax]
+ vpsrld xmm4,xmm0,10
+
+ vpxor xmm7,xmm7,xmm1
+ vpsrld xmm1,xmm0,17
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,15
+ vpaddd xmm6,xmm6,xmm7
+ vpxor xmm7,xmm4,xmm1
+ vpsrld xmm1,xmm0,19
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm0,13
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+ vpaddd xmm6,xmm6,xmm7
+ vpsrld xmm7,xmm13,6
+ vpslld xmm2,xmm13,26
+ vmovdqu XMMWORD[(240-128)+rax],xmm6
+ vpaddd xmm6,xmm6,xmm8
+
+ vpsrld xmm1,xmm13,11
+ vpxor xmm7,xmm7,xmm2
+ vpslld xmm2,xmm13,21
+ vpaddd xmm6,xmm6,XMMWORD[96+rbp]
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm1,xmm13,25
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm13,7
+ vpandn xmm0,xmm13,xmm15
+ vpand xmm4,xmm13,xmm14
+
+ vpxor xmm7,xmm7,xmm1
+
+ vpsrld xmm8,xmm9,2
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm1,xmm9,30
+ vpxor xmm0,xmm0,xmm4
+ vpxor xmm4,xmm10,xmm9
+
+ vpxor xmm8,xmm8,xmm1
+ vpaddd xmm6,xmm6,xmm7
+
+ vpsrld xmm1,xmm9,13
+
+ vpslld xmm2,xmm9,19
+ vpaddd xmm6,xmm6,xmm0
+ vpand xmm3,xmm3,xmm4
+
+ vpxor xmm7,xmm8,xmm1
+
+ vpsrld xmm1,xmm9,22
+ vpxor xmm7,xmm7,xmm2
+
+ vpslld xmm2,xmm9,10
+ vpxor xmm8,xmm10,xmm3
+ vpaddd xmm12,xmm12,xmm6
+
+ vpxor xmm7,xmm7,xmm1
+ vpxor xmm7,xmm7,xmm2
+
+ vpaddd xmm8,xmm8,xmm6
+ vpaddd xmm8,xmm8,xmm7
+ add rbp,256
+ dec ecx
+ jnz NEAR $L$oop_16_xx_avx
+
+ mov ecx,1
+ lea rbp,[((K256+128))]
+ cmp ecx,DWORD[rbx]
+ cmovge r8,rbp
+ cmp ecx,DWORD[4+rbx]
+ cmovge r9,rbp
+ cmp ecx,DWORD[8+rbx]
+ cmovge r10,rbp
+ cmp ecx,DWORD[12+rbx]
+ cmovge r11,rbp
+ vmovdqa xmm7,XMMWORD[rbx]
+ vpxor xmm0,xmm0,xmm0
+ vmovdqa xmm6,xmm7
+ vpcmpgtd xmm6,xmm6,xmm0
+ vpaddd xmm7,xmm7,xmm6
+
+ vmovdqu xmm0,XMMWORD[((0-128))+rdi]
+ vpand xmm8,xmm8,xmm6
+ vmovdqu xmm1,XMMWORD[((32-128))+rdi]
+ vpand xmm9,xmm9,xmm6
+ vmovdqu xmm2,XMMWORD[((64-128))+rdi]
+ vpand xmm10,xmm10,xmm6
+ vmovdqu xmm5,XMMWORD[((96-128))+rdi]
+ vpand xmm11,xmm11,xmm6
+ vpaddd xmm8,xmm8,xmm0
+ vmovdqu xmm0,XMMWORD[((128-128))+rdi]
+ vpand xmm12,xmm12,xmm6
+ vpaddd xmm9,xmm9,xmm1
+ vmovdqu xmm1,XMMWORD[((160-128))+rdi]
+ vpand xmm13,xmm13,xmm6
+ vpaddd xmm10,xmm10,xmm2
+ vmovdqu xmm2,XMMWORD[((192-128))+rdi]
+ vpand xmm14,xmm14,xmm6
+ vpaddd xmm11,xmm11,xmm5
+ vmovdqu xmm5,XMMWORD[((224-128))+rdi]
+ vpand xmm15,xmm15,xmm6
+ vpaddd xmm12,xmm12,xmm0
+ vpaddd xmm13,xmm13,xmm1
+ vmovdqu XMMWORD[(0-128)+rdi],xmm8
+ vpaddd xmm14,xmm14,xmm2
+ vmovdqu XMMWORD[(32-128)+rdi],xmm9
+ vpaddd xmm15,xmm15,xmm5
+ vmovdqu XMMWORD[(64-128)+rdi],xmm10
+ vmovdqu XMMWORD[(96-128)+rdi],xmm11
+ vmovdqu XMMWORD[(128-128)+rdi],xmm12
+ vmovdqu XMMWORD[(160-128)+rdi],xmm13
+ vmovdqu XMMWORD[(192-128)+rdi],xmm14
+ vmovdqu XMMWORD[(224-128)+rdi],xmm15
+
+ vmovdqu XMMWORD[rbx],xmm7
+ vmovdqu xmm6,XMMWORD[$L$pbswap]
+ dec edx
+ jnz NEAR $L$oop_avx
+
+ mov edx,DWORD[280+rsp]
+ lea rdi,[16+rdi]
+ lea rsi,[64+rsi]
+ dec edx
+ jnz NEAR $L$oop_grande_avx
+
+$L$done_avx:
+ mov rax,QWORD[272+rsp]
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((-184))+rax]
+ movaps xmm7,XMMWORD[((-168))+rax]
+ movaps xmm8,XMMWORD[((-152))+rax]
+ movaps xmm9,XMMWORD[((-136))+rax]
+ movaps xmm10,XMMWORD[((-120))+rax]
+ movaps xmm11,XMMWORD[((-104))+rax]
+ movaps xmm12,XMMWORD[((-88))+rax]
+ movaps xmm13,XMMWORD[((-72))+rax]
+ movaps xmm14,XMMWORD[((-56))+rax]
+ movaps xmm15,XMMWORD[((-40))+rax]
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$epilogue_avx:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha256_multi_block_avx:
+
+ALIGN 32
+sha256_multi_block_avx2:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha256_multi_block_avx2:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+_avx2_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ lea rsp,[((-168))+rsp]
+ movaps XMMWORD[rsp],xmm6
+ movaps XMMWORD[16+rsp],xmm7
+ movaps XMMWORD[32+rsp],xmm8
+ movaps XMMWORD[48+rsp],xmm9
+ movaps XMMWORD[64+rsp],xmm10
+ movaps XMMWORD[80+rsp],xmm11
+ movaps XMMWORD[(-120)+rax],xmm12
+ movaps XMMWORD[(-104)+rax],xmm13
+ movaps XMMWORD[(-88)+rax],xmm14
+ movaps XMMWORD[(-72)+rax],xmm15
+ sub rsp,576
+ and rsp,-256
+ mov QWORD[544+rsp],rax
+
+$L$body_avx2:
+ lea rbp,[((K256+128))]
+ lea rdi,[128+rdi]
+
+$L$oop_grande_avx2:
+ mov DWORD[552+rsp],edx
+ xor edx,edx
+ lea rbx,[512+rsp]
+ mov r12,QWORD[rsi]
+ mov ecx,DWORD[8+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[rbx],ecx
+ cmovle r12,rbp
+ mov r13,QWORD[16+rsi]
+ mov ecx,DWORD[24+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[4+rbx],ecx
+ cmovle r13,rbp
+ mov r14,QWORD[32+rsi]
+ mov ecx,DWORD[40+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[8+rbx],ecx
+ cmovle r14,rbp
+ mov r15,QWORD[48+rsi]
+ mov ecx,DWORD[56+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[12+rbx],ecx
+ cmovle r15,rbp
+ mov r8,QWORD[64+rsi]
+ mov ecx,DWORD[72+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[16+rbx],ecx
+ cmovle r8,rbp
+ mov r9,QWORD[80+rsi]
+ mov ecx,DWORD[88+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[20+rbx],ecx
+ cmovle r9,rbp
+ mov r10,QWORD[96+rsi]
+ mov ecx,DWORD[104+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[24+rbx],ecx
+ cmovle r10,rbp
+ mov r11,QWORD[112+rsi]
+ mov ecx,DWORD[120+rsi]
+ cmp ecx,edx
+ cmovg edx,ecx
+ test ecx,ecx
+ mov DWORD[28+rbx],ecx
+ cmovle r11,rbp
+ vmovdqu ymm8,YMMWORD[((0-128))+rdi]
+ lea rax,[128+rsp]
+ vmovdqu ymm9,YMMWORD[((32-128))+rdi]
+ lea rbx,[((256+128))+rsp]
+ vmovdqu ymm10,YMMWORD[((64-128))+rdi]
+ vmovdqu ymm11,YMMWORD[((96-128))+rdi]
+ vmovdqu ymm12,YMMWORD[((128-128))+rdi]
+ vmovdqu ymm13,YMMWORD[((160-128))+rdi]
+ vmovdqu ymm14,YMMWORD[((192-128))+rdi]
+ vmovdqu ymm15,YMMWORD[((224-128))+rdi]
+ vmovdqu ymm6,YMMWORD[$L$pbswap]
+ jmp NEAR $L$oop_avx2
+
+ALIGN 32
+$L$oop_avx2:
+ vpxor ymm4,ymm10,ymm9
+ vmovd xmm5,DWORD[r12]
+ vmovd xmm0,DWORD[r8]
+ vmovd xmm1,DWORD[r13]
+ vmovd xmm2,DWORD[r9]
+ vpinsrd xmm5,xmm5,DWORD[r14],1
+ vpinsrd xmm0,xmm0,DWORD[r10],1
+ vpinsrd xmm1,xmm1,DWORD[r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm12,6
+ vpslld ymm2,ymm12,26
+ vmovdqu YMMWORD[(0-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm15
+
+ vpsrld ymm1,ymm12,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm12,21
+ vpaddd ymm5,ymm5,YMMWORD[((-128))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm12,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm12,7
+ vpandn ymm0,ymm12,ymm14
+ vpand ymm3,ymm12,ymm13
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm15,ymm8,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm8,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm9,ymm8
+
+ vpxor ymm15,ymm15,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm8,13
+
+ vpslld ymm2,ymm8,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm15,ymm1
+
+ vpsrld ymm1,ymm8,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm8,10
+ vpxor ymm15,ymm9,ymm4
+ vpaddd ymm11,ymm11,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm15,ymm15,ymm5
+ vpaddd ymm15,ymm15,ymm7
+ vmovd xmm5,DWORD[4+r12]
+ vmovd xmm0,DWORD[4+r8]
+ vmovd xmm1,DWORD[4+r13]
+ vmovd xmm2,DWORD[4+r9]
+ vpinsrd xmm5,xmm5,DWORD[4+r14],1
+ vpinsrd xmm0,xmm0,DWORD[4+r10],1
+ vpinsrd xmm1,xmm1,DWORD[4+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[4+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm11,6
+ vpslld ymm2,ymm11,26
+ vmovdqu YMMWORD[(32-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm14
+
+ vpsrld ymm1,ymm11,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm11,21
+ vpaddd ymm5,ymm5,YMMWORD[((-96))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm11,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm11,7
+ vpandn ymm0,ymm11,ymm13
+ vpand ymm4,ymm11,ymm12
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm14,ymm15,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm15,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm8,ymm15
+
+ vpxor ymm14,ymm14,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm15,13
+
+ vpslld ymm2,ymm15,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm14,ymm1
+
+ vpsrld ymm1,ymm15,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm15,10
+ vpxor ymm14,ymm8,ymm3
+ vpaddd ymm10,ymm10,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm14,ymm14,ymm5
+ vpaddd ymm14,ymm14,ymm7
+ vmovd xmm5,DWORD[8+r12]
+ vmovd xmm0,DWORD[8+r8]
+ vmovd xmm1,DWORD[8+r13]
+ vmovd xmm2,DWORD[8+r9]
+ vpinsrd xmm5,xmm5,DWORD[8+r14],1
+ vpinsrd xmm0,xmm0,DWORD[8+r10],1
+ vpinsrd xmm1,xmm1,DWORD[8+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[8+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm10,6
+ vpslld ymm2,ymm10,26
+ vmovdqu YMMWORD[(64-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm13
+
+ vpsrld ymm1,ymm10,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm10,21
+ vpaddd ymm5,ymm5,YMMWORD[((-64))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm10,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm10,7
+ vpandn ymm0,ymm10,ymm12
+ vpand ymm3,ymm10,ymm11
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm13,ymm14,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm14,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm15,ymm14
+
+ vpxor ymm13,ymm13,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm14,13
+
+ vpslld ymm2,ymm14,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm13,ymm1
+
+ vpsrld ymm1,ymm14,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm14,10
+ vpxor ymm13,ymm15,ymm4
+ vpaddd ymm9,ymm9,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm13,ymm13,ymm5
+ vpaddd ymm13,ymm13,ymm7
+ vmovd xmm5,DWORD[12+r12]
+ vmovd xmm0,DWORD[12+r8]
+ vmovd xmm1,DWORD[12+r13]
+ vmovd xmm2,DWORD[12+r9]
+ vpinsrd xmm5,xmm5,DWORD[12+r14],1
+ vpinsrd xmm0,xmm0,DWORD[12+r10],1
+ vpinsrd xmm1,xmm1,DWORD[12+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[12+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm9,6
+ vpslld ymm2,ymm9,26
+ vmovdqu YMMWORD[(96-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm12
+
+ vpsrld ymm1,ymm9,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm9,21
+ vpaddd ymm5,ymm5,YMMWORD[((-32))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm9,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm9,7
+ vpandn ymm0,ymm9,ymm11
+ vpand ymm4,ymm9,ymm10
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm12,ymm13,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm13,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm14,ymm13
+
+ vpxor ymm12,ymm12,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm13,13
+
+ vpslld ymm2,ymm13,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm12,ymm1
+
+ vpsrld ymm1,ymm13,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm13,10
+ vpxor ymm12,ymm14,ymm3
+ vpaddd ymm8,ymm8,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm12,ymm12,ymm5
+ vpaddd ymm12,ymm12,ymm7
+ vmovd xmm5,DWORD[16+r12]
+ vmovd xmm0,DWORD[16+r8]
+ vmovd xmm1,DWORD[16+r13]
+ vmovd xmm2,DWORD[16+r9]
+ vpinsrd xmm5,xmm5,DWORD[16+r14],1
+ vpinsrd xmm0,xmm0,DWORD[16+r10],1
+ vpinsrd xmm1,xmm1,DWORD[16+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[16+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm8,6
+ vpslld ymm2,ymm8,26
+ vmovdqu YMMWORD[(128-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm11
+
+ vpsrld ymm1,ymm8,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm8,21
+ vpaddd ymm5,ymm5,YMMWORD[rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm8,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm8,7
+ vpandn ymm0,ymm8,ymm10
+ vpand ymm3,ymm8,ymm9
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm11,ymm12,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm12,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm13,ymm12
+
+ vpxor ymm11,ymm11,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm12,13
+
+ vpslld ymm2,ymm12,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm11,ymm1
+
+ vpsrld ymm1,ymm12,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm12,10
+ vpxor ymm11,ymm13,ymm4
+ vpaddd ymm15,ymm15,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm11,ymm11,ymm5
+ vpaddd ymm11,ymm11,ymm7
+ vmovd xmm5,DWORD[20+r12]
+ vmovd xmm0,DWORD[20+r8]
+ vmovd xmm1,DWORD[20+r13]
+ vmovd xmm2,DWORD[20+r9]
+ vpinsrd xmm5,xmm5,DWORD[20+r14],1
+ vpinsrd xmm0,xmm0,DWORD[20+r10],1
+ vpinsrd xmm1,xmm1,DWORD[20+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[20+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm15,6
+ vpslld ymm2,ymm15,26
+ vmovdqu YMMWORD[(160-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm10
+
+ vpsrld ymm1,ymm15,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm15,21
+ vpaddd ymm5,ymm5,YMMWORD[32+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm15,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm15,7
+ vpandn ymm0,ymm15,ymm9
+ vpand ymm4,ymm15,ymm8
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm10,ymm11,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm11,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm12,ymm11
+
+ vpxor ymm10,ymm10,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm11,13
+
+ vpslld ymm2,ymm11,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm10,ymm1
+
+ vpsrld ymm1,ymm11,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm11,10
+ vpxor ymm10,ymm12,ymm3
+ vpaddd ymm14,ymm14,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm10,ymm10,ymm5
+ vpaddd ymm10,ymm10,ymm7
+ vmovd xmm5,DWORD[24+r12]
+ vmovd xmm0,DWORD[24+r8]
+ vmovd xmm1,DWORD[24+r13]
+ vmovd xmm2,DWORD[24+r9]
+ vpinsrd xmm5,xmm5,DWORD[24+r14],1
+ vpinsrd xmm0,xmm0,DWORD[24+r10],1
+ vpinsrd xmm1,xmm1,DWORD[24+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[24+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm14,6
+ vpslld ymm2,ymm14,26
+ vmovdqu YMMWORD[(192-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm9
+
+ vpsrld ymm1,ymm14,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm14,21
+ vpaddd ymm5,ymm5,YMMWORD[64+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm14,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm14,7
+ vpandn ymm0,ymm14,ymm8
+ vpand ymm3,ymm14,ymm15
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm9,ymm10,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm10,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm11,ymm10
+
+ vpxor ymm9,ymm9,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm10,13
+
+ vpslld ymm2,ymm10,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm9,ymm1
+
+ vpsrld ymm1,ymm10,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm10,10
+ vpxor ymm9,ymm11,ymm4
+ vpaddd ymm13,ymm13,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm9,ymm9,ymm5
+ vpaddd ymm9,ymm9,ymm7
+ vmovd xmm5,DWORD[28+r12]
+ vmovd xmm0,DWORD[28+r8]
+ vmovd xmm1,DWORD[28+r13]
+ vmovd xmm2,DWORD[28+r9]
+ vpinsrd xmm5,xmm5,DWORD[28+r14],1
+ vpinsrd xmm0,xmm0,DWORD[28+r10],1
+ vpinsrd xmm1,xmm1,DWORD[28+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[28+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm13,6
+ vpslld ymm2,ymm13,26
+ vmovdqu YMMWORD[(224-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm8
+
+ vpsrld ymm1,ymm13,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm13,21
+ vpaddd ymm5,ymm5,YMMWORD[96+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm13,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm13,7
+ vpandn ymm0,ymm13,ymm15
+ vpand ymm4,ymm13,ymm14
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm8,ymm9,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm9,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm10,ymm9
+
+ vpxor ymm8,ymm8,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm9,13
+
+ vpslld ymm2,ymm9,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm8,ymm1
+
+ vpsrld ymm1,ymm9,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm9,10
+ vpxor ymm8,ymm10,ymm3
+ vpaddd ymm12,ymm12,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm8,ymm8,ymm5
+ vpaddd ymm8,ymm8,ymm7
+ add rbp,256
+ vmovd xmm5,DWORD[32+r12]
+ vmovd xmm0,DWORD[32+r8]
+ vmovd xmm1,DWORD[32+r13]
+ vmovd xmm2,DWORD[32+r9]
+ vpinsrd xmm5,xmm5,DWORD[32+r14],1
+ vpinsrd xmm0,xmm0,DWORD[32+r10],1
+ vpinsrd xmm1,xmm1,DWORD[32+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[32+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm12,6
+ vpslld ymm2,ymm12,26
+ vmovdqu YMMWORD[(256-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm15
+
+ vpsrld ymm1,ymm12,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm12,21
+ vpaddd ymm5,ymm5,YMMWORD[((-128))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm12,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm12,7
+ vpandn ymm0,ymm12,ymm14
+ vpand ymm3,ymm12,ymm13
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm15,ymm8,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm8,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm9,ymm8
+
+ vpxor ymm15,ymm15,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm8,13
+
+ vpslld ymm2,ymm8,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm15,ymm1
+
+ vpsrld ymm1,ymm8,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm8,10
+ vpxor ymm15,ymm9,ymm4
+ vpaddd ymm11,ymm11,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm15,ymm15,ymm5
+ vpaddd ymm15,ymm15,ymm7
+ vmovd xmm5,DWORD[36+r12]
+ vmovd xmm0,DWORD[36+r8]
+ vmovd xmm1,DWORD[36+r13]
+ vmovd xmm2,DWORD[36+r9]
+ vpinsrd xmm5,xmm5,DWORD[36+r14],1
+ vpinsrd xmm0,xmm0,DWORD[36+r10],1
+ vpinsrd xmm1,xmm1,DWORD[36+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[36+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm11,6
+ vpslld ymm2,ymm11,26
+ vmovdqu YMMWORD[(288-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm14
+
+ vpsrld ymm1,ymm11,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm11,21
+ vpaddd ymm5,ymm5,YMMWORD[((-96))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm11,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm11,7
+ vpandn ymm0,ymm11,ymm13
+ vpand ymm4,ymm11,ymm12
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm14,ymm15,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm15,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm8,ymm15
+
+ vpxor ymm14,ymm14,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm15,13
+
+ vpslld ymm2,ymm15,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm14,ymm1
+
+ vpsrld ymm1,ymm15,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm15,10
+ vpxor ymm14,ymm8,ymm3
+ vpaddd ymm10,ymm10,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm14,ymm14,ymm5
+ vpaddd ymm14,ymm14,ymm7
+ vmovd xmm5,DWORD[40+r12]
+ vmovd xmm0,DWORD[40+r8]
+ vmovd xmm1,DWORD[40+r13]
+ vmovd xmm2,DWORD[40+r9]
+ vpinsrd xmm5,xmm5,DWORD[40+r14],1
+ vpinsrd xmm0,xmm0,DWORD[40+r10],1
+ vpinsrd xmm1,xmm1,DWORD[40+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[40+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm10,6
+ vpslld ymm2,ymm10,26
+ vmovdqu YMMWORD[(320-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm13
+
+ vpsrld ymm1,ymm10,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm10,21
+ vpaddd ymm5,ymm5,YMMWORD[((-64))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm10,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm10,7
+ vpandn ymm0,ymm10,ymm12
+ vpand ymm3,ymm10,ymm11
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm13,ymm14,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm14,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm15,ymm14
+
+ vpxor ymm13,ymm13,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm14,13
+
+ vpslld ymm2,ymm14,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm13,ymm1
+
+ vpsrld ymm1,ymm14,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm14,10
+ vpxor ymm13,ymm15,ymm4
+ vpaddd ymm9,ymm9,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm13,ymm13,ymm5
+ vpaddd ymm13,ymm13,ymm7
+ vmovd xmm5,DWORD[44+r12]
+ vmovd xmm0,DWORD[44+r8]
+ vmovd xmm1,DWORD[44+r13]
+ vmovd xmm2,DWORD[44+r9]
+ vpinsrd xmm5,xmm5,DWORD[44+r14],1
+ vpinsrd xmm0,xmm0,DWORD[44+r10],1
+ vpinsrd xmm1,xmm1,DWORD[44+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[44+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm9,6
+ vpslld ymm2,ymm9,26
+ vmovdqu YMMWORD[(352-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm12
+
+ vpsrld ymm1,ymm9,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm9,21
+ vpaddd ymm5,ymm5,YMMWORD[((-32))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm9,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm9,7
+ vpandn ymm0,ymm9,ymm11
+ vpand ymm4,ymm9,ymm10
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm12,ymm13,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm13,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm14,ymm13
+
+ vpxor ymm12,ymm12,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm13,13
+
+ vpslld ymm2,ymm13,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm12,ymm1
+
+ vpsrld ymm1,ymm13,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm13,10
+ vpxor ymm12,ymm14,ymm3
+ vpaddd ymm8,ymm8,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm12,ymm12,ymm5
+ vpaddd ymm12,ymm12,ymm7
+ vmovd xmm5,DWORD[48+r12]
+ vmovd xmm0,DWORD[48+r8]
+ vmovd xmm1,DWORD[48+r13]
+ vmovd xmm2,DWORD[48+r9]
+ vpinsrd xmm5,xmm5,DWORD[48+r14],1
+ vpinsrd xmm0,xmm0,DWORD[48+r10],1
+ vpinsrd xmm1,xmm1,DWORD[48+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[48+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm8,6
+ vpslld ymm2,ymm8,26
+ vmovdqu YMMWORD[(384-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm11
+
+ vpsrld ymm1,ymm8,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm8,21
+ vpaddd ymm5,ymm5,YMMWORD[rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm8,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm8,7
+ vpandn ymm0,ymm8,ymm10
+ vpand ymm3,ymm8,ymm9
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm11,ymm12,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm12,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm13,ymm12
+
+ vpxor ymm11,ymm11,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm12,13
+
+ vpslld ymm2,ymm12,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm11,ymm1
+
+ vpsrld ymm1,ymm12,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm12,10
+ vpxor ymm11,ymm13,ymm4
+ vpaddd ymm15,ymm15,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm11,ymm11,ymm5
+ vpaddd ymm11,ymm11,ymm7
+ vmovd xmm5,DWORD[52+r12]
+ vmovd xmm0,DWORD[52+r8]
+ vmovd xmm1,DWORD[52+r13]
+ vmovd xmm2,DWORD[52+r9]
+ vpinsrd xmm5,xmm5,DWORD[52+r14],1
+ vpinsrd xmm0,xmm0,DWORD[52+r10],1
+ vpinsrd xmm1,xmm1,DWORD[52+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[52+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm15,6
+ vpslld ymm2,ymm15,26
+ vmovdqu YMMWORD[(416-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm10
+
+ vpsrld ymm1,ymm15,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm15,21
+ vpaddd ymm5,ymm5,YMMWORD[32+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm15,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm15,7
+ vpandn ymm0,ymm15,ymm9
+ vpand ymm4,ymm15,ymm8
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm10,ymm11,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm11,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm12,ymm11
+
+ vpxor ymm10,ymm10,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm11,13
+
+ vpslld ymm2,ymm11,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm10,ymm1
+
+ vpsrld ymm1,ymm11,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm11,10
+ vpxor ymm10,ymm12,ymm3
+ vpaddd ymm14,ymm14,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm10,ymm10,ymm5
+ vpaddd ymm10,ymm10,ymm7
+ vmovd xmm5,DWORD[56+r12]
+ vmovd xmm0,DWORD[56+r8]
+ vmovd xmm1,DWORD[56+r13]
+ vmovd xmm2,DWORD[56+r9]
+ vpinsrd xmm5,xmm5,DWORD[56+r14],1
+ vpinsrd xmm0,xmm0,DWORD[56+r10],1
+ vpinsrd xmm1,xmm1,DWORD[56+r15],1
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[56+r11],1
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm14,6
+ vpslld ymm2,ymm14,26
+ vmovdqu YMMWORD[(448-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm9
+
+ vpsrld ymm1,ymm14,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm14,21
+ vpaddd ymm5,ymm5,YMMWORD[64+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm14,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm14,7
+ vpandn ymm0,ymm14,ymm8
+ vpand ymm3,ymm14,ymm15
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm9,ymm10,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm10,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm11,ymm10
+
+ vpxor ymm9,ymm9,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm10,13
+
+ vpslld ymm2,ymm10,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm9,ymm1
+
+ vpsrld ymm1,ymm10,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm10,10
+ vpxor ymm9,ymm11,ymm4
+ vpaddd ymm13,ymm13,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm9,ymm9,ymm5
+ vpaddd ymm9,ymm9,ymm7
+ vmovd xmm5,DWORD[60+r12]
+ lea r12,[64+r12]
+ vmovd xmm0,DWORD[60+r8]
+ lea r8,[64+r8]
+ vmovd xmm1,DWORD[60+r13]
+ lea r13,[64+r13]
+ vmovd xmm2,DWORD[60+r9]
+ lea r9,[64+r9]
+ vpinsrd xmm5,xmm5,DWORD[60+r14],1
+ lea r14,[64+r14]
+ vpinsrd xmm0,xmm0,DWORD[60+r10],1
+ lea r10,[64+r10]
+ vpinsrd xmm1,xmm1,DWORD[60+r15],1
+ lea r15,[64+r15]
+ vpunpckldq ymm5,ymm5,ymm1
+ vpinsrd xmm2,xmm2,DWORD[60+r11],1
+ lea r11,[64+r11]
+ vpunpckldq ymm0,ymm0,ymm2
+ vinserti128 ymm5,ymm5,xmm0,1
+ vpshufb ymm5,ymm5,ymm6
+ vpsrld ymm7,ymm13,6
+ vpslld ymm2,ymm13,26
+ vmovdqu YMMWORD[(480-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm8
+
+ vpsrld ymm1,ymm13,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm13,21
+ vpaddd ymm5,ymm5,YMMWORD[96+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm13,25
+ vpxor ymm7,ymm7,ymm2
+ prefetcht0 [63+r12]
+ vpslld ymm2,ymm13,7
+ vpandn ymm0,ymm13,ymm15
+ vpand ymm4,ymm13,ymm14
+ prefetcht0 [63+r13]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm8,ymm9,2
+ vpxor ymm7,ymm7,ymm2
+ prefetcht0 [63+r14]
+ vpslld ymm1,ymm9,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm10,ymm9
+ prefetcht0 [63+r15]
+ vpxor ymm8,ymm8,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm9,13
+ prefetcht0 [63+r8]
+ vpslld ymm2,ymm9,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm3,ymm3,ymm4
+ prefetcht0 [63+r9]
+ vpxor ymm7,ymm8,ymm1
+
+ vpsrld ymm1,ymm9,22
+ vpxor ymm7,ymm7,ymm2
+ prefetcht0 [63+r10]
+ vpslld ymm2,ymm9,10
+ vpxor ymm8,ymm10,ymm3
+ vpaddd ymm12,ymm12,ymm5
+ prefetcht0 [63+r11]
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm8,ymm8,ymm5
+ vpaddd ymm8,ymm8,ymm7
+ add rbp,256
+ vmovdqu ymm5,YMMWORD[((0-128))+rax]
+ mov ecx,3
+ jmp NEAR $L$oop_16_xx_avx2
+ALIGN 32
+$L$oop_16_xx_avx2:
+ vmovdqu ymm6,YMMWORD[((32-128))+rax]
+ vpaddd ymm5,ymm5,YMMWORD[((288-256-128))+rbx]
+
+ vpsrld ymm7,ymm6,3
+ vpsrld ymm1,ymm6,7
+ vpslld ymm2,ymm6,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm6,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm6,14
+ vmovdqu ymm0,YMMWORD[((448-256-128))+rbx]
+ vpsrld ymm3,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm5,ymm5,ymm7
+ vpxor ymm7,ymm3,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm5,ymm5,ymm7
+ vpsrld ymm7,ymm12,6
+ vpslld ymm2,ymm12,26
+ vmovdqu YMMWORD[(0-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm15
+
+ vpsrld ymm1,ymm12,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm12,21
+ vpaddd ymm5,ymm5,YMMWORD[((-128))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm12,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm12,7
+ vpandn ymm0,ymm12,ymm14
+ vpand ymm3,ymm12,ymm13
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm15,ymm8,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm8,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm9,ymm8
+
+ vpxor ymm15,ymm15,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm8,13
+
+ vpslld ymm2,ymm8,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm15,ymm1
+
+ vpsrld ymm1,ymm8,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm8,10
+ vpxor ymm15,ymm9,ymm4
+ vpaddd ymm11,ymm11,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm15,ymm15,ymm5
+ vpaddd ymm15,ymm15,ymm7
+ vmovdqu ymm5,YMMWORD[((64-128))+rax]
+ vpaddd ymm6,ymm6,YMMWORD[((320-256-128))+rbx]
+
+ vpsrld ymm7,ymm5,3
+ vpsrld ymm1,ymm5,7
+ vpslld ymm2,ymm5,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm5,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm5,14
+ vmovdqu ymm0,YMMWORD[((480-256-128))+rbx]
+ vpsrld ymm4,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm6,ymm6,ymm7
+ vpxor ymm7,ymm4,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm6,ymm6,ymm7
+ vpsrld ymm7,ymm11,6
+ vpslld ymm2,ymm11,26
+ vmovdqu YMMWORD[(32-128)+rax],ymm6
+ vpaddd ymm6,ymm6,ymm14
+
+ vpsrld ymm1,ymm11,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm11,21
+ vpaddd ymm6,ymm6,YMMWORD[((-96))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm11,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm11,7
+ vpandn ymm0,ymm11,ymm13
+ vpand ymm4,ymm11,ymm12
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm14,ymm15,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm15,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm8,ymm15
+
+ vpxor ymm14,ymm14,ymm1
+ vpaddd ymm6,ymm6,ymm7
+
+ vpsrld ymm1,ymm15,13
+
+ vpslld ymm2,ymm15,19
+ vpaddd ymm6,ymm6,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm14,ymm1
+
+ vpsrld ymm1,ymm15,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm15,10
+ vpxor ymm14,ymm8,ymm3
+ vpaddd ymm10,ymm10,ymm6
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm14,ymm14,ymm6
+ vpaddd ymm14,ymm14,ymm7
+ vmovdqu ymm6,YMMWORD[((96-128))+rax]
+ vpaddd ymm5,ymm5,YMMWORD[((352-256-128))+rbx]
+
+ vpsrld ymm7,ymm6,3
+ vpsrld ymm1,ymm6,7
+ vpslld ymm2,ymm6,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm6,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm6,14
+ vmovdqu ymm0,YMMWORD[((0-128))+rax]
+ vpsrld ymm3,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm5,ymm5,ymm7
+ vpxor ymm7,ymm3,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm5,ymm5,ymm7
+ vpsrld ymm7,ymm10,6
+ vpslld ymm2,ymm10,26
+ vmovdqu YMMWORD[(64-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm13
+
+ vpsrld ymm1,ymm10,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm10,21
+ vpaddd ymm5,ymm5,YMMWORD[((-64))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm10,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm10,7
+ vpandn ymm0,ymm10,ymm12
+ vpand ymm3,ymm10,ymm11
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm13,ymm14,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm14,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm15,ymm14
+
+ vpxor ymm13,ymm13,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm14,13
+
+ vpslld ymm2,ymm14,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm13,ymm1
+
+ vpsrld ymm1,ymm14,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm14,10
+ vpxor ymm13,ymm15,ymm4
+ vpaddd ymm9,ymm9,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm13,ymm13,ymm5
+ vpaddd ymm13,ymm13,ymm7
+ vmovdqu ymm5,YMMWORD[((128-128))+rax]
+ vpaddd ymm6,ymm6,YMMWORD[((384-256-128))+rbx]
+
+ vpsrld ymm7,ymm5,3
+ vpsrld ymm1,ymm5,7
+ vpslld ymm2,ymm5,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm5,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm5,14
+ vmovdqu ymm0,YMMWORD[((32-128))+rax]
+ vpsrld ymm4,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm6,ymm6,ymm7
+ vpxor ymm7,ymm4,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm6,ymm6,ymm7
+ vpsrld ymm7,ymm9,6
+ vpslld ymm2,ymm9,26
+ vmovdqu YMMWORD[(96-128)+rax],ymm6
+ vpaddd ymm6,ymm6,ymm12
+
+ vpsrld ymm1,ymm9,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm9,21
+ vpaddd ymm6,ymm6,YMMWORD[((-32))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm9,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm9,7
+ vpandn ymm0,ymm9,ymm11
+ vpand ymm4,ymm9,ymm10
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm12,ymm13,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm13,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm14,ymm13
+
+ vpxor ymm12,ymm12,ymm1
+ vpaddd ymm6,ymm6,ymm7
+
+ vpsrld ymm1,ymm13,13
+
+ vpslld ymm2,ymm13,19
+ vpaddd ymm6,ymm6,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm12,ymm1
+
+ vpsrld ymm1,ymm13,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm13,10
+ vpxor ymm12,ymm14,ymm3
+ vpaddd ymm8,ymm8,ymm6
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm12,ymm12,ymm6
+ vpaddd ymm12,ymm12,ymm7
+ vmovdqu ymm6,YMMWORD[((160-128))+rax]
+ vpaddd ymm5,ymm5,YMMWORD[((416-256-128))+rbx]
+
+ vpsrld ymm7,ymm6,3
+ vpsrld ymm1,ymm6,7
+ vpslld ymm2,ymm6,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm6,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm6,14
+ vmovdqu ymm0,YMMWORD[((64-128))+rax]
+ vpsrld ymm3,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm5,ymm5,ymm7
+ vpxor ymm7,ymm3,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm5,ymm5,ymm7
+ vpsrld ymm7,ymm8,6
+ vpslld ymm2,ymm8,26
+ vmovdqu YMMWORD[(128-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm11
+
+ vpsrld ymm1,ymm8,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm8,21
+ vpaddd ymm5,ymm5,YMMWORD[rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm8,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm8,7
+ vpandn ymm0,ymm8,ymm10
+ vpand ymm3,ymm8,ymm9
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm11,ymm12,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm12,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm13,ymm12
+
+ vpxor ymm11,ymm11,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm12,13
+
+ vpslld ymm2,ymm12,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm11,ymm1
+
+ vpsrld ymm1,ymm12,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm12,10
+ vpxor ymm11,ymm13,ymm4
+ vpaddd ymm15,ymm15,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm11,ymm11,ymm5
+ vpaddd ymm11,ymm11,ymm7
+ vmovdqu ymm5,YMMWORD[((192-128))+rax]
+ vpaddd ymm6,ymm6,YMMWORD[((448-256-128))+rbx]
+
+ vpsrld ymm7,ymm5,3
+ vpsrld ymm1,ymm5,7
+ vpslld ymm2,ymm5,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm5,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm5,14
+ vmovdqu ymm0,YMMWORD[((96-128))+rax]
+ vpsrld ymm4,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm6,ymm6,ymm7
+ vpxor ymm7,ymm4,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm6,ymm6,ymm7
+ vpsrld ymm7,ymm15,6
+ vpslld ymm2,ymm15,26
+ vmovdqu YMMWORD[(160-128)+rax],ymm6
+ vpaddd ymm6,ymm6,ymm10
+
+ vpsrld ymm1,ymm15,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm15,21
+ vpaddd ymm6,ymm6,YMMWORD[32+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm15,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm15,7
+ vpandn ymm0,ymm15,ymm9
+ vpand ymm4,ymm15,ymm8
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm10,ymm11,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm11,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm12,ymm11
+
+ vpxor ymm10,ymm10,ymm1
+ vpaddd ymm6,ymm6,ymm7
+
+ vpsrld ymm1,ymm11,13
+
+ vpslld ymm2,ymm11,19
+ vpaddd ymm6,ymm6,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm10,ymm1
+
+ vpsrld ymm1,ymm11,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm11,10
+ vpxor ymm10,ymm12,ymm3
+ vpaddd ymm14,ymm14,ymm6
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm10,ymm10,ymm6
+ vpaddd ymm10,ymm10,ymm7
+ vmovdqu ymm6,YMMWORD[((224-128))+rax]
+ vpaddd ymm5,ymm5,YMMWORD[((480-256-128))+rbx]
+
+ vpsrld ymm7,ymm6,3
+ vpsrld ymm1,ymm6,7
+ vpslld ymm2,ymm6,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm6,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm6,14
+ vmovdqu ymm0,YMMWORD[((128-128))+rax]
+ vpsrld ymm3,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm5,ymm5,ymm7
+ vpxor ymm7,ymm3,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm5,ymm5,ymm7
+ vpsrld ymm7,ymm14,6
+ vpslld ymm2,ymm14,26
+ vmovdqu YMMWORD[(192-128)+rax],ymm5
+ vpaddd ymm5,ymm5,ymm9
+
+ vpsrld ymm1,ymm14,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm14,21
+ vpaddd ymm5,ymm5,YMMWORD[64+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm14,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm14,7
+ vpandn ymm0,ymm14,ymm8
+ vpand ymm3,ymm14,ymm15
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm9,ymm10,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm10,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm11,ymm10
+
+ vpxor ymm9,ymm9,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm10,13
+
+ vpslld ymm2,ymm10,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm9,ymm1
+
+ vpsrld ymm1,ymm10,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm10,10
+ vpxor ymm9,ymm11,ymm4
+ vpaddd ymm13,ymm13,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm9,ymm9,ymm5
+ vpaddd ymm9,ymm9,ymm7
+ vmovdqu ymm5,YMMWORD[((256-256-128))+rbx]
+ vpaddd ymm6,ymm6,YMMWORD[((0-128))+rax]
+
+ vpsrld ymm7,ymm5,3
+ vpsrld ymm1,ymm5,7
+ vpslld ymm2,ymm5,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm5,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm5,14
+ vmovdqu ymm0,YMMWORD[((160-128))+rax]
+ vpsrld ymm4,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm6,ymm6,ymm7
+ vpxor ymm7,ymm4,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm6,ymm6,ymm7
+ vpsrld ymm7,ymm13,6
+ vpslld ymm2,ymm13,26
+ vmovdqu YMMWORD[(224-128)+rax],ymm6
+ vpaddd ymm6,ymm6,ymm8
+
+ vpsrld ymm1,ymm13,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm13,21
+ vpaddd ymm6,ymm6,YMMWORD[96+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm13,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm13,7
+ vpandn ymm0,ymm13,ymm15
+ vpand ymm4,ymm13,ymm14
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm8,ymm9,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm9,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm10,ymm9
+
+ vpxor ymm8,ymm8,ymm1
+ vpaddd ymm6,ymm6,ymm7
+
+ vpsrld ymm1,ymm9,13
+
+ vpslld ymm2,ymm9,19
+ vpaddd ymm6,ymm6,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm8,ymm1
+
+ vpsrld ymm1,ymm9,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm9,10
+ vpxor ymm8,ymm10,ymm3
+ vpaddd ymm12,ymm12,ymm6
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm8,ymm8,ymm6
+ vpaddd ymm8,ymm8,ymm7
+ add rbp,256
+ vmovdqu ymm6,YMMWORD[((288-256-128))+rbx]
+ vpaddd ymm5,ymm5,YMMWORD[((32-128))+rax]
+
+ vpsrld ymm7,ymm6,3
+ vpsrld ymm1,ymm6,7
+ vpslld ymm2,ymm6,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm6,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm6,14
+ vmovdqu ymm0,YMMWORD[((192-128))+rax]
+ vpsrld ymm3,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm5,ymm5,ymm7
+ vpxor ymm7,ymm3,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm5,ymm5,ymm7
+ vpsrld ymm7,ymm12,6
+ vpslld ymm2,ymm12,26
+ vmovdqu YMMWORD[(256-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm15
+
+ vpsrld ymm1,ymm12,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm12,21
+ vpaddd ymm5,ymm5,YMMWORD[((-128))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm12,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm12,7
+ vpandn ymm0,ymm12,ymm14
+ vpand ymm3,ymm12,ymm13
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm15,ymm8,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm8,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm9,ymm8
+
+ vpxor ymm15,ymm15,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm8,13
+
+ vpslld ymm2,ymm8,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm15,ymm1
+
+ vpsrld ymm1,ymm8,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm8,10
+ vpxor ymm15,ymm9,ymm4
+ vpaddd ymm11,ymm11,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm15,ymm15,ymm5
+ vpaddd ymm15,ymm15,ymm7
+ vmovdqu ymm5,YMMWORD[((320-256-128))+rbx]
+ vpaddd ymm6,ymm6,YMMWORD[((64-128))+rax]
+
+ vpsrld ymm7,ymm5,3
+ vpsrld ymm1,ymm5,7
+ vpslld ymm2,ymm5,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm5,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm5,14
+ vmovdqu ymm0,YMMWORD[((224-128))+rax]
+ vpsrld ymm4,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm6,ymm6,ymm7
+ vpxor ymm7,ymm4,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm6,ymm6,ymm7
+ vpsrld ymm7,ymm11,6
+ vpslld ymm2,ymm11,26
+ vmovdqu YMMWORD[(288-256-128)+rbx],ymm6
+ vpaddd ymm6,ymm6,ymm14
+
+ vpsrld ymm1,ymm11,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm11,21
+ vpaddd ymm6,ymm6,YMMWORD[((-96))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm11,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm11,7
+ vpandn ymm0,ymm11,ymm13
+ vpand ymm4,ymm11,ymm12
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm14,ymm15,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm15,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm8,ymm15
+
+ vpxor ymm14,ymm14,ymm1
+ vpaddd ymm6,ymm6,ymm7
+
+ vpsrld ymm1,ymm15,13
+
+ vpslld ymm2,ymm15,19
+ vpaddd ymm6,ymm6,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm14,ymm1
+
+ vpsrld ymm1,ymm15,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm15,10
+ vpxor ymm14,ymm8,ymm3
+ vpaddd ymm10,ymm10,ymm6
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm14,ymm14,ymm6
+ vpaddd ymm14,ymm14,ymm7
+ vmovdqu ymm6,YMMWORD[((352-256-128))+rbx]
+ vpaddd ymm5,ymm5,YMMWORD[((96-128))+rax]
+
+ vpsrld ymm7,ymm6,3
+ vpsrld ymm1,ymm6,7
+ vpslld ymm2,ymm6,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm6,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm6,14
+ vmovdqu ymm0,YMMWORD[((256-256-128))+rbx]
+ vpsrld ymm3,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm5,ymm5,ymm7
+ vpxor ymm7,ymm3,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm5,ymm5,ymm7
+ vpsrld ymm7,ymm10,6
+ vpslld ymm2,ymm10,26
+ vmovdqu YMMWORD[(320-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm13
+
+ vpsrld ymm1,ymm10,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm10,21
+ vpaddd ymm5,ymm5,YMMWORD[((-64))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm10,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm10,7
+ vpandn ymm0,ymm10,ymm12
+ vpand ymm3,ymm10,ymm11
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm13,ymm14,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm14,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm15,ymm14
+
+ vpxor ymm13,ymm13,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm14,13
+
+ vpslld ymm2,ymm14,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm13,ymm1
+
+ vpsrld ymm1,ymm14,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm14,10
+ vpxor ymm13,ymm15,ymm4
+ vpaddd ymm9,ymm9,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm13,ymm13,ymm5
+ vpaddd ymm13,ymm13,ymm7
+ vmovdqu ymm5,YMMWORD[((384-256-128))+rbx]
+ vpaddd ymm6,ymm6,YMMWORD[((128-128))+rax]
+
+ vpsrld ymm7,ymm5,3
+ vpsrld ymm1,ymm5,7
+ vpslld ymm2,ymm5,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm5,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm5,14
+ vmovdqu ymm0,YMMWORD[((288-256-128))+rbx]
+ vpsrld ymm4,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm6,ymm6,ymm7
+ vpxor ymm7,ymm4,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm6,ymm6,ymm7
+ vpsrld ymm7,ymm9,6
+ vpslld ymm2,ymm9,26
+ vmovdqu YMMWORD[(352-256-128)+rbx],ymm6
+ vpaddd ymm6,ymm6,ymm12
+
+ vpsrld ymm1,ymm9,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm9,21
+ vpaddd ymm6,ymm6,YMMWORD[((-32))+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm9,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm9,7
+ vpandn ymm0,ymm9,ymm11
+ vpand ymm4,ymm9,ymm10
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm12,ymm13,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm13,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm14,ymm13
+
+ vpxor ymm12,ymm12,ymm1
+ vpaddd ymm6,ymm6,ymm7
+
+ vpsrld ymm1,ymm13,13
+
+ vpslld ymm2,ymm13,19
+ vpaddd ymm6,ymm6,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm12,ymm1
+
+ vpsrld ymm1,ymm13,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm13,10
+ vpxor ymm12,ymm14,ymm3
+ vpaddd ymm8,ymm8,ymm6
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm12,ymm12,ymm6
+ vpaddd ymm12,ymm12,ymm7
+ vmovdqu ymm6,YMMWORD[((416-256-128))+rbx]
+ vpaddd ymm5,ymm5,YMMWORD[((160-128))+rax]
+
+ vpsrld ymm7,ymm6,3
+ vpsrld ymm1,ymm6,7
+ vpslld ymm2,ymm6,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm6,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm6,14
+ vmovdqu ymm0,YMMWORD[((320-256-128))+rbx]
+ vpsrld ymm3,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm5,ymm5,ymm7
+ vpxor ymm7,ymm3,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm5,ymm5,ymm7
+ vpsrld ymm7,ymm8,6
+ vpslld ymm2,ymm8,26
+ vmovdqu YMMWORD[(384-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm11
+
+ vpsrld ymm1,ymm8,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm8,21
+ vpaddd ymm5,ymm5,YMMWORD[rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm8,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm8,7
+ vpandn ymm0,ymm8,ymm10
+ vpand ymm3,ymm8,ymm9
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm11,ymm12,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm12,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm13,ymm12
+
+ vpxor ymm11,ymm11,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm12,13
+
+ vpslld ymm2,ymm12,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm11,ymm1
+
+ vpsrld ymm1,ymm12,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm12,10
+ vpxor ymm11,ymm13,ymm4
+ vpaddd ymm15,ymm15,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm11,ymm11,ymm5
+ vpaddd ymm11,ymm11,ymm7
+ vmovdqu ymm5,YMMWORD[((448-256-128))+rbx]
+ vpaddd ymm6,ymm6,YMMWORD[((192-128))+rax]
+
+ vpsrld ymm7,ymm5,3
+ vpsrld ymm1,ymm5,7
+ vpslld ymm2,ymm5,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm5,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm5,14
+ vmovdqu ymm0,YMMWORD[((352-256-128))+rbx]
+ vpsrld ymm4,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm6,ymm6,ymm7
+ vpxor ymm7,ymm4,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm6,ymm6,ymm7
+ vpsrld ymm7,ymm15,6
+ vpslld ymm2,ymm15,26
+ vmovdqu YMMWORD[(416-256-128)+rbx],ymm6
+ vpaddd ymm6,ymm6,ymm10
+
+ vpsrld ymm1,ymm15,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm15,21
+ vpaddd ymm6,ymm6,YMMWORD[32+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm15,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm15,7
+ vpandn ymm0,ymm15,ymm9
+ vpand ymm4,ymm15,ymm8
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm10,ymm11,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm11,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm12,ymm11
+
+ vpxor ymm10,ymm10,ymm1
+ vpaddd ymm6,ymm6,ymm7
+
+ vpsrld ymm1,ymm11,13
+
+ vpslld ymm2,ymm11,19
+ vpaddd ymm6,ymm6,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm10,ymm1
+
+ vpsrld ymm1,ymm11,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm11,10
+ vpxor ymm10,ymm12,ymm3
+ vpaddd ymm14,ymm14,ymm6
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm10,ymm10,ymm6
+ vpaddd ymm10,ymm10,ymm7
+ vmovdqu ymm6,YMMWORD[((480-256-128))+rbx]
+ vpaddd ymm5,ymm5,YMMWORD[((224-128))+rax]
+
+ vpsrld ymm7,ymm6,3
+ vpsrld ymm1,ymm6,7
+ vpslld ymm2,ymm6,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm6,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm6,14
+ vmovdqu ymm0,YMMWORD[((384-256-128))+rbx]
+ vpsrld ymm3,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm5,ymm5,ymm7
+ vpxor ymm7,ymm3,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm5,ymm5,ymm7
+ vpsrld ymm7,ymm14,6
+ vpslld ymm2,ymm14,26
+ vmovdqu YMMWORD[(448-256-128)+rbx],ymm5
+ vpaddd ymm5,ymm5,ymm9
+
+ vpsrld ymm1,ymm14,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm14,21
+ vpaddd ymm5,ymm5,YMMWORD[64+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm14,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm14,7
+ vpandn ymm0,ymm14,ymm8
+ vpand ymm3,ymm14,ymm15
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm9,ymm10,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm10,30
+ vpxor ymm0,ymm0,ymm3
+ vpxor ymm3,ymm11,ymm10
+
+ vpxor ymm9,ymm9,ymm1
+ vpaddd ymm5,ymm5,ymm7
+
+ vpsrld ymm1,ymm10,13
+
+ vpslld ymm2,ymm10,19
+ vpaddd ymm5,ymm5,ymm0
+ vpand ymm4,ymm4,ymm3
+
+ vpxor ymm7,ymm9,ymm1
+
+ vpsrld ymm1,ymm10,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm10,10
+ vpxor ymm9,ymm11,ymm4
+ vpaddd ymm13,ymm13,ymm5
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm9,ymm9,ymm5
+ vpaddd ymm9,ymm9,ymm7
+ vmovdqu ymm5,YMMWORD[((0-128))+rax]
+ vpaddd ymm6,ymm6,YMMWORD[((256-256-128))+rbx]
+
+ vpsrld ymm7,ymm5,3
+ vpsrld ymm1,ymm5,7
+ vpslld ymm2,ymm5,25
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm5,18
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm5,14
+ vmovdqu ymm0,YMMWORD[((416-256-128))+rbx]
+ vpsrld ymm4,ymm0,10
+
+ vpxor ymm7,ymm7,ymm1
+ vpsrld ymm1,ymm0,17
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,15
+ vpaddd ymm6,ymm6,ymm7
+ vpxor ymm7,ymm4,ymm1
+ vpsrld ymm1,ymm0,19
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm0,13
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+ vpaddd ymm6,ymm6,ymm7
+ vpsrld ymm7,ymm13,6
+ vpslld ymm2,ymm13,26
+ vmovdqu YMMWORD[(480-256-128)+rbx],ymm6
+ vpaddd ymm6,ymm6,ymm8
+
+ vpsrld ymm1,ymm13,11
+ vpxor ymm7,ymm7,ymm2
+ vpslld ymm2,ymm13,21
+ vpaddd ymm6,ymm6,YMMWORD[96+rbp]
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm1,ymm13,25
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm13,7
+ vpandn ymm0,ymm13,ymm15
+ vpand ymm4,ymm13,ymm14
+
+ vpxor ymm7,ymm7,ymm1
+
+ vpsrld ymm8,ymm9,2
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm1,ymm9,30
+ vpxor ymm0,ymm0,ymm4
+ vpxor ymm4,ymm10,ymm9
+
+ vpxor ymm8,ymm8,ymm1
+ vpaddd ymm6,ymm6,ymm7
+
+ vpsrld ymm1,ymm9,13
+
+ vpslld ymm2,ymm9,19
+ vpaddd ymm6,ymm6,ymm0
+ vpand ymm3,ymm3,ymm4
+
+ vpxor ymm7,ymm8,ymm1
+
+ vpsrld ymm1,ymm9,22
+ vpxor ymm7,ymm7,ymm2
+
+ vpslld ymm2,ymm9,10
+ vpxor ymm8,ymm10,ymm3
+ vpaddd ymm12,ymm12,ymm6
+
+ vpxor ymm7,ymm7,ymm1
+ vpxor ymm7,ymm7,ymm2
+
+ vpaddd ymm8,ymm8,ymm6
+ vpaddd ymm8,ymm8,ymm7
+ add rbp,256
+ dec ecx
+ jnz NEAR $L$oop_16_xx_avx2
+
+ mov ecx,1
+ lea rbx,[512+rsp]
+ lea rbp,[((K256+128))]
+ cmp ecx,DWORD[rbx]
+ cmovge r12,rbp
+ cmp ecx,DWORD[4+rbx]
+ cmovge r13,rbp
+ cmp ecx,DWORD[8+rbx]
+ cmovge r14,rbp
+ cmp ecx,DWORD[12+rbx]
+ cmovge r15,rbp
+ cmp ecx,DWORD[16+rbx]
+ cmovge r8,rbp
+ cmp ecx,DWORD[20+rbx]
+ cmovge r9,rbp
+ cmp ecx,DWORD[24+rbx]
+ cmovge r10,rbp
+ cmp ecx,DWORD[28+rbx]
+ cmovge r11,rbp
+ vmovdqa ymm7,YMMWORD[rbx]
+ vpxor ymm0,ymm0,ymm0
+ vmovdqa ymm6,ymm7
+ vpcmpgtd ymm6,ymm6,ymm0
+ vpaddd ymm7,ymm7,ymm6
+
+ vmovdqu ymm0,YMMWORD[((0-128))+rdi]
+ vpand ymm8,ymm8,ymm6
+ vmovdqu ymm1,YMMWORD[((32-128))+rdi]
+ vpand ymm9,ymm9,ymm6
+ vmovdqu ymm2,YMMWORD[((64-128))+rdi]
+ vpand ymm10,ymm10,ymm6
+ vmovdqu ymm5,YMMWORD[((96-128))+rdi]
+ vpand ymm11,ymm11,ymm6
+ vpaddd ymm8,ymm8,ymm0
+ vmovdqu ymm0,YMMWORD[((128-128))+rdi]
+ vpand ymm12,ymm12,ymm6
+ vpaddd ymm9,ymm9,ymm1
+ vmovdqu ymm1,YMMWORD[((160-128))+rdi]
+ vpand ymm13,ymm13,ymm6
+ vpaddd ymm10,ymm10,ymm2
+ vmovdqu ymm2,YMMWORD[((192-128))+rdi]
+ vpand ymm14,ymm14,ymm6
+ vpaddd ymm11,ymm11,ymm5
+ vmovdqu ymm5,YMMWORD[((224-128))+rdi]
+ vpand ymm15,ymm15,ymm6
+ vpaddd ymm12,ymm12,ymm0
+ vpaddd ymm13,ymm13,ymm1
+ vmovdqu YMMWORD[(0-128)+rdi],ymm8
+ vpaddd ymm14,ymm14,ymm2
+ vmovdqu YMMWORD[(32-128)+rdi],ymm9
+ vpaddd ymm15,ymm15,ymm5
+ vmovdqu YMMWORD[(64-128)+rdi],ymm10
+ vmovdqu YMMWORD[(96-128)+rdi],ymm11
+ vmovdqu YMMWORD[(128-128)+rdi],ymm12
+ vmovdqu YMMWORD[(160-128)+rdi],ymm13
+ vmovdqu YMMWORD[(192-128)+rdi],ymm14
+ vmovdqu YMMWORD[(224-128)+rdi],ymm15
+
+ vmovdqu YMMWORD[rbx],ymm7
+ lea rbx,[((256+128))+rsp]
+ vmovdqu ymm6,YMMWORD[$L$pbswap]
+ dec edx
+ jnz NEAR $L$oop_avx2
+
+
+
+
+
+
+
+$L$done_avx2:
+ mov rax,QWORD[544+rsp]
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((-216))+rax]
+ movaps xmm7,XMMWORD[((-200))+rax]
+ movaps xmm8,XMMWORD[((-184))+rax]
+ movaps xmm9,XMMWORD[((-168))+rax]
+ movaps xmm10,XMMWORD[((-152))+rax]
+ movaps xmm11,XMMWORD[((-136))+rax]
+ movaps xmm12,XMMWORD[((-120))+rax]
+ movaps xmm13,XMMWORD[((-104))+rax]
+ movaps xmm14,XMMWORD[((-88))+rax]
+ movaps xmm15,XMMWORD[((-72))+rax]
+ mov r15,QWORD[((-48))+rax]
+
+ mov r14,QWORD[((-40))+rax]
+
+ mov r13,QWORD[((-32))+rax]
+
+ mov r12,QWORD[((-24))+rax]
+
+ mov rbp,QWORD[((-16))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+
+ lea rsp,[rax]
+
+$L$epilogue_avx2:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha256_multi_block_avx2:
+ALIGN 256
+K256:
+ DD 1116352408,1116352408,1116352408,1116352408
+ DD 1116352408,1116352408,1116352408,1116352408
+ DD 1899447441,1899447441,1899447441,1899447441
+ DD 1899447441,1899447441,1899447441,1899447441
+ DD 3049323471,3049323471,3049323471,3049323471
+ DD 3049323471,3049323471,3049323471,3049323471
+ DD 3921009573,3921009573,3921009573,3921009573
+ DD 3921009573,3921009573,3921009573,3921009573
+ DD 961987163,961987163,961987163,961987163
+ DD 961987163,961987163,961987163,961987163
+ DD 1508970993,1508970993,1508970993,1508970993
+ DD 1508970993,1508970993,1508970993,1508970993
+ DD 2453635748,2453635748,2453635748,2453635748
+ DD 2453635748,2453635748,2453635748,2453635748
+ DD 2870763221,2870763221,2870763221,2870763221
+ DD 2870763221,2870763221,2870763221,2870763221
+ DD 3624381080,3624381080,3624381080,3624381080
+ DD 3624381080,3624381080,3624381080,3624381080
+ DD 310598401,310598401,310598401,310598401
+ DD 310598401,310598401,310598401,310598401
+ DD 607225278,607225278,607225278,607225278
+ DD 607225278,607225278,607225278,607225278
+ DD 1426881987,1426881987,1426881987,1426881987
+ DD 1426881987,1426881987,1426881987,1426881987
+ DD 1925078388,1925078388,1925078388,1925078388
+ DD 1925078388,1925078388,1925078388,1925078388
+ DD 2162078206,2162078206,2162078206,2162078206
+ DD 2162078206,2162078206,2162078206,2162078206
+ DD 2614888103,2614888103,2614888103,2614888103
+ DD 2614888103,2614888103,2614888103,2614888103
+ DD 3248222580,3248222580,3248222580,3248222580
+ DD 3248222580,3248222580,3248222580,3248222580
+ DD 3835390401,3835390401,3835390401,3835390401
+ DD 3835390401,3835390401,3835390401,3835390401
+ DD 4022224774,4022224774,4022224774,4022224774
+ DD 4022224774,4022224774,4022224774,4022224774
+ DD 264347078,264347078,264347078,264347078
+ DD 264347078,264347078,264347078,264347078
+ DD 604807628,604807628,604807628,604807628
+ DD 604807628,604807628,604807628,604807628
+ DD 770255983,770255983,770255983,770255983
+ DD 770255983,770255983,770255983,770255983
+ DD 1249150122,1249150122,1249150122,1249150122
+ DD 1249150122,1249150122,1249150122,1249150122
+ DD 1555081692,1555081692,1555081692,1555081692
+ DD 1555081692,1555081692,1555081692,1555081692
+ DD 1996064986,1996064986,1996064986,1996064986
+ DD 1996064986,1996064986,1996064986,1996064986
+ DD 2554220882,2554220882,2554220882,2554220882
+ DD 2554220882,2554220882,2554220882,2554220882
+ DD 2821834349,2821834349,2821834349,2821834349
+ DD 2821834349,2821834349,2821834349,2821834349
+ DD 2952996808,2952996808,2952996808,2952996808
+ DD 2952996808,2952996808,2952996808,2952996808
+ DD 3210313671,3210313671,3210313671,3210313671
+ DD 3210313671,3210313671,3210313671,3210313671
+ DD 3336571891,3336571891,3336571891,3336571891
+ DD 3336571891,3336571891,3336571891,3336571891
+ DD 3584528711,3584528711,3584528711,3584528711
+ DD 3584528711,3584528711,3584528711,3584528711
+ DD 113926993,113926993,113926993,113926993
+ DD 113926993,113926993,113926993,113926993
+ DD 338241895,338241895,338241895,338241895
+ DD 338241895,338241895,338241895,338241895
+ DD 666307205,666307205,666307205,666307205
+ DD 666307205,666307205,666307205,666307205
+ DD 773529912,773529912,773529912,773529912
+ DD 773529912,773529912,773529912,773529912
+ DD 1294757372,1294757372,1294757372,1294757372
+ DD 1294757372,1294757372,1294757372,1294757372
+ DD 1396182291,1396182291,1396182291,1396182291
+ DD 1396182291,1396182291,1396182291,1396182291
+ DD 1695183700,1695183700,1695183700,1695183700
+ DD 1695183700,1695183700,1695183700,1695183700
+ DD 1986661051,1986661051,1986661051,1986661051
+ DD 1986661051,1986661051,1986661051,1986661051
+ DD 2177026350,2177026350,2177026350,2177026350
+ DD 2177026350,2177026350,2177026350,2177026350
+ DD 2456956037,2456956037,2456956037,2456956037
+ DD 2456956037,2456956037,2456956037,2456956037
+ DD 2730485921,2730485921,2730485921,2730485921
+ DD 2730485921,2730485921,2730485921,2730485921
+ DD 2820302411,2820302411,2820302411,2820302411
+ DD 2820302411,2820302411,2820302411,2820302411
+ DD 3259730800,3259730800,3259730800,3259730800
+ DD 3259730800,3259730800,3259730800,3259730800
+ DD 3345764771,3345764771,3345764771,3345764771
+ DD 3345764771,3345764771,3345764771,3345764771
+ DD 3516065817,3516065817,3516065817,3516065817
+ DD 3516065817,3516065817,3516065817,3516065817
+ DD 3600352804,3600352804,3600352804,3600352804
+ DD 3600352804,3600352804,3600352804,3600352804
+ DD 4094571909,4094571909,4094571909,4094571909
+ DD 4094571909,4094571909,4094571909,4094571909
+ DD 275423344,275423344,275423344,275423344
+ DD 275423344,275423344,275423344,275423344
+ DD 430227734,430227734,430227734,430227734
+ DD 430227734,430227734,430227734,430227734
+ DD 506948616,506948616,506948616,506948616
+ DD 506948616,506948616,506948616,506948616
+ DD 659060556,659060556,659060556,659060556
+ DD 659060556,659060556,659060556,659060556
+ DD 883997877,883997877,883997877,883997877
+ DD 883997877,883997877,883997877,883997877
+ DD 958139571,958139571,958139571,958139571
+ DD 958139571,958139571,958139571,958139571
+ DD 1322822218,1322822218,1322822218,1322822218
+ DD 1322822218,1322822218,1322822218,1322822218
+ DD 1537002063,1537002063,1537002063,1537002063
+ DD 1537002063,1537002063,1537002063,1537002063
+ DD 1747873779,1747873779,1747873779,1747873779
+ DD 1747873779,1747873779,1747873779,1747873779
+ DD 1955562222,1955562222,1955562222,1955562222
+ DD 1955562222,1955562222,1955562222,1955562222
+ DD 2024104815,2024104815,2024104815,2024104815
+ DD 2024104815,2024104815,2024104815,2024104815
+ DD 2227730452,2227730452,2227730452,2227730452
+ DD 2227730452,2227730452,2227730452,2227730452
+ DD 2361852424,2361852424,2361852424,2361852424
+ DD 2361852424,2361852424,2361852424,2361852424
+ DD 2428436474,2428436474,2428436474,2428436474
+ DD 2428436474,2428436474,2428436474,2428436474
+ DD 2756734187,2756734187,2756734187,2756734187
+ DD 2756734187,2756734187,2756734187,2756734187
+ DD 3204031479,3204031479,3204031479,3204031479
+ DD 3204031479,3204031479,3204031479,3204031479
+ DD 3329325298,3329325298,3329325298,3329325298
+ DD 3329325298,3329325298,3329325298,3329325298
+$L$pbswap:
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+K256_shaext:
+ DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+ DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+ DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+ DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+ DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+ DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+ DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+ DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+ DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+ DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+ DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+ DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+ DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+ DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+ DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+ DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+DB 83,72,65,50,53,54,32,109,117,108,116,105,45,98,108,111
+DB 99,107,32,116,114,97,110,115,102,111,114,109,32,102,111,114
+DB 32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71
+DB 65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112
+DB 101,110,115,115,108,46,111,114,103,62,0
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ mov rax,QWORD[272+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+
+ lea rsi,[((-24-160))+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 16
+avx2_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ mov rax,QWORD[544+r8]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+ lea rsi,[((-56-160))+rax]
+ lea rdi,[512+r8]
+ mov ecx,20
+ DD 0xa548f3fc
+
+ jmp NEAR $L$in_prologue
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_sha256_multi_block wrt ..imagebase
+ DD $L$SEH_end_sha256_multi_block wrt ..imagebase
+ DD $L$SEH_info_sha256_multi_block wrt ..imagebase
+ DD $L$SEH_begin_sha256_multi_block_shaext wrt ..imagebase
+ DD $L$SEH_end_sha256_multi_block_shaext wrt ..imagebase
+ DD $L$SEH_info_sha256_multi_block_shaext wrt ..imagebase
+ DD $L$SEH_begin_sha256_multi_block_avx wrt ..imagebase
+ DD $L$SEH_end_sha256_multi_block_avx wrt ..imagebase
+ DD $L$SEH_info_sha256_multi_block_avx wrt ..imagebase
+ DD $L$SEH_begin_sha256_multi_block_avx2 wrt ..imagebase
+ DD $L$SEH_end_sha256_multi_block_avx2 wrt ..imagebase
+ DD $L$SEH_info_sha256_multi_block_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_sha256_multi_block:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$body wrt ..imagebase,$L$epilogue wrt ..imagebase
+$L$SEH_info_sha256_multi_block_shaext:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$body_shaext wrt ..imagebase,$L$epilogue_shaext wrt ..imagebase
+$L$SEH_info_sha256_multi_block_avx:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$body_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha256_multi_block_avx2:
+DB 9,0,0,0
+ DD avx2_handler wrt ..imagebase
+ DD $L$body_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm
new file mode 100644
index 0000000000..e8abeaa668
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm
@@ -0,0 +1,5712 @@
+; Copyright 2005-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN OPENSSL_ia32cap_P
+global sha256_block_data_order
+
+ALIGN 16
+sha256_block_data_order:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha256_block_data_order:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ lea r11,[OPENSSL_ia32cap_P]
+ mov r9d,DWORD[r11]
+ mov r10d,DWORD[4+r11]
+ mov r11d,DWORD[8+r11]
+ test r11d,536870912
+ jnz NEAR _shaext_shortcut
+ and r11d,296
+ cmp r11d,296
+ je NEAR $L$avx2_shortcut
+ and r9d,1073741824
+ and r10d,268435968
+ or r10d,r9d
+ cmp r10d,1342177792
+ je NEAR $L$avx_shortcut
+ test r10d,512
+ jnz NEAR $L$ssse3_shortcut
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ shl rdx,4
+ sub rsp,16*4+4*8
+ lea rdx,[rdx*4+rsi]
+ and rsp,-64
+ mov QWORD[((64+0))+rsp],rdi
+ mov QWORD[((64+8))+rsp],rsi
+ mov QWORD[((64+16))+rsp],rdx
+ mov QWORD[88+rsp],rax
+
+$L$prologue:
+
+ mov eax,DWORD[rdi]
+ mov ebx,DWORD[4+rdi]
+ mov ecx,DWORD[8+rdi]
+ mov edx,DWORD[12+rdi]
+ mov r8d,DWORD[16+rdi]
+ mov r9d,DWORD[20+rdi]
+ mov r10d,DWORD[24+rdi]
+ mov r11d,DWORD[28+rdi]
+ jmp NEAR $L$loop
+
+ALIGN 16
+$L$loop:
+ mov edi,ebx
+ lea rbp,[K256]
+ xor edi,ecx
+ mov r12d,DWORD[rsi]
+ mov r13d,r8d
+ mov r14d,eax
+ bswap r12d
+ ror r13d,14
+ mov r15d,r9d
+
+ xor r13d,r8d
+ ror r14d,9
+ xor r15d,r10d
+
+ mov DWORD[rsp],r12d
+ xor r14d,eax
+ and r15d,r8d
+
+ ror r13d,5
+ add r12d,r11d
+ xor r15d,r10d
+
+ ror r14d,11
+ xor r13d,r8d
+ add r12d,r15d
+
+ mov r15d,eax
+ add r12d,DWORD[rbp]
+ xor r14d,eax
+
+ xor r15d,ebx
+ ror r13d,6
+ mov r11d,ebx
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor r11d,edi
+ add edx,r12d
+ add r11d,r12d
+
+ lea rbp,[4+rbp]
+ add r11d,r14d
+ mov r12d,DWORD[4+rsi]
+ mov r13d,edx
+ mov r14d,r11d
+ bswap r12d
+ ror r13d,14
+ mov edi,r8d
+
+ xor r13d,edx
+ ror r14d,9
+ xor edi,r9d
+
+ mov DWORD[4+rsp],r12d
+ xor r14d,r11d
+ and edi,edx
+
+ ror r13d,5
+ add r12d,r10d
+ xor edi,r9d
+
+ ror r14d,11
+ xor r13d,edx
+ add r12d,edi
+
+ mov edi,r11d
+ add r12d,DWORD[rbp]
+ xor r14d,r11d
+
+ xor edi,eax
+ ror r13d,6
+ mov r10d,eax
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor r10d,r15d
+ add ecx,r12d
+ add r10d,r12d
+
+ lea rbp,[4+rbp]
+ add r10d,r14d
+ mov r12d,DWORD[8+rsi]
+ mov r13d,ecx
+ mov r14d,r10d
+ bswap r12d
+ ror r13d,14
+ mov r15d,edx
+
+ xor r13d,ecx
+ ror r14d,9
+ xor r15d,r8d
+
+ mov DWORD[8+rsp],r12d
+ xor r14d,r10d
+ and r15d,ecx
+
+ ror r13d,5
+ add r12d,r9d
+ xor r15d,r8d
+
+ ror r14d,11
+ xor r13d,ecx
+ add r12d,r15d
+
+ mov r15d,r10d
+ add r12d,DWORD[rbp]
+ xor r14d,r10d
+
+ xor r15d,r11d
+ ror r13d,6
+ mov r9d,r11d
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor r9d,edi
+ add ebx,r12d
+ add r9d,r12d
+
+ lea rbp,[4+rbp]
+ add r9d,r14d
+ mov r12d,DWORD[12+rsi]
+ mov r13d,ebx
+ mov r14d,r9d
+ bswap r12d
+ ror r13d,14
+ mov edi,ecx
+
+ xor r13d,ebx
+ ror r14d,9
+ xor edi,edx
+
+ mov DWORD[12+rsp],r12d
+ xor r14d,r9d
+ and edi,ebx
+
+ ror r13d,5
+ add r12d,r8d
+ xor edi,edx
+
+ ror r14d,11
+ xor r13d,ebx
+ add r12d,edi
+
+ mov edi,r9d
+ add r12d,DWORD[rbp]
+ xor r14d,r9d
+
+ xor edi,r10d
+ ror r13d,6
+ mov r8d,r10d
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor r8d,r15d
+ add eax,r12d
+ add r8d,r12d
+
+ lea rbp,[20+rbp]
+ add r8d,r14d
+ mov r12d,DWORD[16+rsi]
+ mov r13d,eax
+ mov r14d,r8d
+ bswap r12d
+ ror r13d,14
+ mov r15d,ebx
+
+ xor r13d,eax
+ ror r14d,9
+ xor r15d,ecx
+
+ mov DWORD[16+rsp],r12d
+ xor r14d,r8d
+ and r15d,eax
+
+ ror r13d,5
+ add r12d,edx
+ xor r15d,ecx
+
+ ror r14d,11
+ xor r13d,eax
+ add r12d,r15d
+
+ mov r15d,r8d
+ add r12d,DWORD[rbp]
+ xor r14d,r8d
+
+ xor r15d,r9d
+ ror r13d,6
+ mov edx,r9d
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor edx,edi
+ add r11d,r12d
+ add edx,r12d
+
+ lea rbp,[4+rbp]
+ add edx,r14d
+ mov r12d,DWORD[20+rsi]
+ mov r13d,r11d
+ mov r14d,edx
+ bswap r12d
+ ror r13d,14
+ mov edi,eax
+
+ xor r13d,r11d
+ ror r14d,9
+ xor edi,ebx
+
+ mov DWORD[20+rsp],r12d
+ xor r14d,edx
+ and edi,r11d
+
+ ror r13d,5
+ add r12d,ecx
+ xor edi,ebx
+
+ ror r14d,11
+ xor r13d,r11d
+ add r12d,edi
+
+ mov edi,edx
+ add r12d,DWORD[rbp]
+ xor r14d,edx
+
+ xor edi,r8d
+ ror r13d,6
+ mov ecx,r8d
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor ecx,r15d
+ add r10d,r12d
+ add ecx,r12d
+
+ lea rbp,[4+rbp]
+ add ecx,r14d
+ mov r12d,DWORD[24+rsi]
+ mov r13d,r10d
+ mov r14d,ecx
+ bswap r12d
+ ror r13d,14
+ mov r15d,r11d
+
+ xor r13d,r10d
+ ror r14d,9
+ xor r15d,eax
+
+ mov DWORD[24+rsp],r12d
+ xor r14d,ecx
+ and r15d,r10d
+
+ ror r13d,5
+ add r12d,ebx
+ xor r15d,eax
+
+ ror r14d,11
+ xor r13d,r10d
+ add r12d,r15d
+
+ mov r15d,ecx
+ add r12d,DWORD[rbp]
+ xor r14d,ecx
+
+ xor r15d,edx
+ ror r13d,6
+ mov ebx,edx
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor ebx,edi
+ add r9d,r12d
+ add ebx,r12d
+
+ lea rbp,[4+rbp]
+ add ebx,r14d
+ mov r12d,DWORD[28+rsi]
+ mov r13d,r9d
+ mov r14d,ebx
+ bswap r12d
+ ror r13d,14
+ mov edi,r10d
+
+ xor r13d,r9d
+ ror r14d,9
+ xor edi,r11d
+
+ mov DWORD[28+rsp],r12d
+ xor r14d,ebx
+ and edi,r9d
+
+ ror r13d,5
+ add r12d,eax
+ xor edi,r11d
+
+ ror r14d,11
+ xor r13d,r9d
+ add r12d,edi
+
+ mov edi,ebx
+ add r12d,DWORD[rbp]
+ xor r14d,ebx
+
+ xor edi,ecx
+ ror r13d,6
+ mov eax,ecx
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor eax,r15d
+ add r8d,r12d
+ add eax,r12d
+
+ lea rbp,[20+rbp]
+ add eax,r14d
+ mov r12d,DWORD[32+rsi]
+ mov r13d,r8d
+ mov r14d,eax
+ bswap r12d
+ ror r13d,14
+ mov r15d,r9d
+
+ xor r13d,r8d
+ ror r14d,9
+ xor r15d,r10d
+
+ mov DWORD[32+rsp],r12d
+ xor r14d,eax
+ and r15d,r8d
+
+ ror r13d,5
+ add r12d,r11d
+ xor r15d,r10d
+
+ ror r14d,11
+ xor r13d,r8d
+ add r12d,r15d
+
+ mov r15d,eax
+ add r12d,DWORD[rbp]
+ xor r14d,eax
+
+ xor r15d,ebx
+ ror r13d,6
+ mov r11d,ebx
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor r11d,edi
+ add edx,r12d
+ add r11d,r12d
+
+ lea rbp,[4+rbp]
+ add r11d,r14d
+ mov r12d,DWORD[36+rsi]
+ mov r13d,edx
+ mov r14d,r11d
+ bswap r12d
+ ror r13d,14
+ mov edi,r8d
+
+ xor r13d,edx
+ ror r14d,9
+ xor edi,r9d
+
+ mov DWORD[36+rsp],r12d
+ xor r14d,r11d
+ and edi,edx
+
+ ror r13d,5
+ add r12d,r10d
+ xor edi,r9d
+
+ ror r14d,11
+ xor r13d,edx
+ add r12d,edi
+
+ mov edi,r11d
+ add r12d,DWORD[rbp]
+ xor r14d,r11d
+
+ xor edi,eax
+ ror r13d,6
+ mov r10d,eax
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor r10d,r15d
+ add ecx,r12d
+ add r10d,r12d
+
+ lea rbp,[4+rbp]
+ add r10d,r14d
+ mov r12d,DWORD[40+rsi]
+ mov r13d,ecx
+ mov r14d,r10d
+ bswap r12d
+ ror r13d,14
+ mov r15d,edx
+
+ xor r13d,ecx
+ ror r14d,9
+ xor r15d,r8d
+
+ mov DWORD[40+rsp],r12d
+ xor r14d,r10d
+ and r15d,ecx
+
+ ror r13d,5
+ add r12d,r9d
+ xor r15d,r8d
+
+ ror r14d,11
+ xor r13d,ecx
+ add r12d,r15d
+
+ mov r15d,r10d
+ add r12d,DWORD[rbp]
+ xor r14d,r10d
+
+ xor r15d,r11d
+ ror r13d,6
+ mov r9d,r11d
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor r9d,edi
+ add ebx,r12d
+ add r9d,r12d
+
+ lea rbp,[4+rbp]
+ add r9d,r14d
+ mov r12d,DWORD[44+rsi]
+ mov r13d,ebx
+ mov r14d,r9d
+ bswap r12d
+ ror r13d,14
+ mov edi,ecx
+
+ xor r13d,ebx
+ ror r14d,9
+ xor edi,edx
+
+ mov DWORD[44+rsp],r12d
+ xor r14d,r9d
+ and edi,ebx
+
+ ror r13d,5
+ add r12d,r8d
+ xor edi,edx
+
+ ror r14d,11
+ xor r13d,ebx
+ add r12d,edi
+
+ mov edi,r9d
+ add r12d,DWORD[rbp]
+ xor r14d,r9d
+
+ xor edi,r10d
+ ror r13d,6
+ mov r8d,r10d
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor r8d,r15d
+ add eax,r12d
+ add r8d,r12d
+
+ lea rbp,[20+rbp]
+ add r8d,r14d
+ mov r12d,DWORD[48+rsi]
+ mov r13d,eax
+ mov r14d,r8d
+ bswap r12d
+ ror r13d,14
+ mov r15d,ebx
+
+ xor r13d,eax
+ ror r14d,9
+ xor r15d,ecx
+
+ mov DWORD[48+rsp],r12d
+ xor r14d,r8d
+ and r15d,eax
+
+ ror r13d,5
+ add r12d,edx
+ xor r15d,ecx
+
+ ror r14d,11
+ xor r13d,eax
+ add r12d,r15d
+
+ mov r15d,r8d
+ add r12d,DWORD[rbp]
+ xor r14d,r8d
+
+ xor r15d,r9d
+ ror r13d,6
+ mov edx,r9d
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor edx,edi
+ add r11d,r12d
+ add edx,r12d
+
+ lea rbp,[4+rbp]
+ add edx,r14d
+ mov r12d,DWORD[52+rsi]
+ mov r13d,r11d
+ mov r14d,edx
+ bswap r12d
+ ror r13d,14
+ mov edi,eax
+
+ xor r13d,r11d
+ ror r14d,9
+ xor edi,ebx
+
+ mov DWORD[52+rsp],r12d
+ xor r14d,edx
+ and edi,r11d
+
+ ror r13d,5
+ add r12d,ecx
+ xor edi,ebx
+
+ ror r14d,11
+ xor r13d,r11d
+ add r12d,edi
+
+ mov edi,edx
+ add r12d,DWORD[rbp]
+ xor r14d,edx
+
+ xor edi,r8d
+ ror r13d,6
+ mov ecx,r8d
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor ecx,r15d
+ add r10d,r12d
+ add ecx,r12d
+
+ lea rbp,[4+rbp]
+ add ecx,r14d
+ mov r12d,DWORD[56+rsi]
+ mov r13d,r10d
+ mov r14d,ecx
+ bswap r12d
+ ror r13d,14
+ mov r15d,r11d
+
+ xor r13d,r10d
+ ror r14d,9
+ xor r15d,eax
+
+ mov DWORD[56+rsp],r12d
+ xor r14d,ecx
+ and r15d,r10d
+
+ ror r13d,5
+ add r12d,ebx
+ xor r15d,eax
+
+ ror r14d,11
+ xor r13d,r10d
+ add r12d,r15d
+
+ mov r15d,ecx
+ add r12d,DWORD[rbp]
+ xor r14d,ecx
+
+ xor r15d,edx
+ ror r13d,6
+ mov ebx,edx
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor ebx,edi
+ add r9d,r12d
+ add ebx,r12d
+
+ lea rbp,[4+rbp]
+ add ebx,r14d
+ mov r12d,DWORD[60+rsi]
+ mov r13d,r9d
+ mov r14d,ebx
+ bswap r12d
+ ror r13d,14
+ mov edi,r10d
+
+ xor r13d,r9d
+ ror r14d,9
+ xor edi,r11d
+
+ mov DWORD[60+rsp],r12d
+ xor r14d,ebx
+ and edi,r9d
+
+ ror r13d,5
+ add r12d,eax
+ xor edi,r11d
+
+ ror r14d,11
+ xor r13d,r9d
+ add r12d,edi
+
+ mov edi,ebx
+ add r12d,DWORD[rbp]
+ xor r14d,ebx
+
+ xor edi,ecx
+ ror r13d,6
+ mov eax,ecx
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor eax,r15d
+ add r8d,r12d
+ add eax,r12d
+
+ lea rbp,[20+rbp]
+ jmp NEAR $L$rounds_16_xx
+ALIGN 16
+$L$rounds_16_xx:
+ mov r13d,DWORD[4+rsp]
+ mov r15d,DWORD[56+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add eax,r14d
+ mov r14d,r15d
+ ror r15d,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor r15d,r14d
+ shr r14d,10
+
+ ror r15d,17
+ xor r12d,r13d
+ xor r15d,r14d
+ add r12d,DWORD[36+rsp]
+
+ add r12d,DWORD[rsp]
+ mov r13d,r8d
+ add r12d,r15d
+ mov r14d,eax
+ ror r13d,14
+ mov r15d,r9d
+
+ xor r13d,r8d
+ ror r14d,9
+ xor r15d,r10d
+
+ mov DWORD[rsp],r12d
+ xor r14d,eax
+ and r15d,r8d
+
+ ror r13d,5
+ add r12d,r11d
+ xor r15d,r10d
+
+ ror r14d,11
+ xor r13d,r8d
+ add r12d,r15d
+
+ mov r15d,eax
+ add r12d,DWORD[rbp]
+ xor r14d,eax
+
+ xor r15d,ebx
+ ror r13d,6
+ mov r11d,ebx
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor r11d,edi
+ add edx,r12d
+ add r11d,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[8+rsp]
+ mov edi,DWORD[60+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add r11d,r14d
+ mov r14d,edi
+ ror edi,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor edi,r14d
+ shr r14d,10
+
+ ror edi,17
+ xor r12d,r13d
+ xor edi,r14d
+ add r12d,DWORD[40+rsp]
+
+ add r12d,DWORD[4+rsp]
+ mov r13d,edx
+ add r12d,edi
+ mov r14d,r11d
+ ror r13d,14
+ mov edi,r8d
+
+ xor r13d,edx
+ ror r14d,9
+ xor edi,r9d
+
+ mov DWORD[4+rsp],r12d
+ xor r14d,r11d
+ and edi,edx
+
+ ror r13d,5
+ add r12d,r10d
+ xor edi,r9d
+
+ ror r14d,11
+ xor r13d,edx
+ add r12d,edi
+
+ mov edi,r11d
+ add r12d,DWORD[rbp]
+ xor r14d,r11d
+
+ xor edi,eax
+ ror r13d,6
+ mov r10d,eax
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor r10d,r15d
+ add ecx,r12d
+ add r10d,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[12+rsp]
+ mov r15d,DWORD[rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add r10d,r14d
+ mov r14d,r15d
+ ror r15d,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor r15d,r14d
+ shr r14d,10
+
+ ror r15d,17
+ xor r12d,r13d
+ xor r15d,r14d
+ add r12d,DWORD[44+rsp]
+
+ add r12d,DWORD[8+rsp]
+ mov r13d,ecx
+ add r12d,r15d
+ mov r14d,r10d
+ ror r13d,14
+ mov r15d,edx
+
+ xor r13d,ecx
+ ror r14d,9
+ xor r15d,r8d
+
+ mov DWORD[8+rsp],r12d
+ xor r14d,r10d
+ and r15d,ecx
+
+ ror r13d,5
+ add r12d,r9d
+ xor r15d,r8d
+
+ ror r14d,11
+ xor r13d,ecx
+ add r12d,r15d
+
+ mov r15d,r10d
+ add r12d,DWORD[rbp]
+ xor r14d,r10d
+
+ xor r15d,r11d
+ ror r13d,6
+ mov r9d,r11d
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor r9d,edi
+ add ebx,r12d
+ add r9d,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[16+rsp]
+ mov edi,DWORD[4+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add r9d,r14d
+ mov r14d,edi
+ ror edi,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor edi,r14d
+ shr r14d,10
+
+ ror edi,17
+ xor r12d,r13d
+ xor edi,r14d
+ add r12d,DWORD[48+rsp]
+
+ add r12d,DWORD[12+rsp]
+ mov r13d,ebx
+ add r12d,edi
+ mov r14d,r9d
+ ror r13d,14
+ mov edi,ecx
+
+ xor r13d,ebx
+ ror r14d,9
+ xor edi,edx
+
+ mov DWORD[12+rsp],r12d
+ xor r14d,r9d
+ and edi,ebx
+
+ ror r13d,5
+ add r12d,r8d
+ xor edi,edx
+
+ ror r14d,11
+ xor r13d,ebx
+ add r12d,edi
+
+ mov edi,r9d
+ add r12d,DWORD[rbp]
+ xor r14d,r9d
+
+ xor edi,r10d
+ ror r13d,6
+ mov r8d,r10d
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor r8d,r15d
+ add eax,r12d
+ add r8d,r12d
+
+ lea rbp,[20+rbp]
+ mov r13d,DWORD[20+rsp]
+ mov r15d,DWORD[8+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add r8d,r14d
+ mov r14d,r15d
+ ror r15d,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor r15d,r14d
+ shr r14d,10
+
+ ror r15d,17
+ xor r12d,r13d
+ xor r15d,r14d
+ add r12d,DWORD[52+rsp]
+
+ add r12d,DWORD[16+rsp]
+ mov r13d,eax
+ add r12d,r15d
+ mov r14d,r8d
+ ror r13d,14
+ mov r15d,ebx
+
+ xor r13d,eax
+ ror r14d,9
+ xor r15d,ecx
+
+ mov DWORD[16+rsp],r12d
+ xor r14d,r8d
+ and r15d,eax
+
+ ror r13d,5
+ add r12d,edx
+ xor r15d,ecx
+
+ ror r14d,11
+ xor r13d,eax
+ add r12d,r15d
+
+ mov r15d,r8d
+ add r12d,DWORD[rbp]
+ xor r14d,r8d
+
+ xor r15d,r9d
+ ror r13d,6
+ mov edx,r9d
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor edx,edi
+ add r11d,r12d
+ add edx,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[24+rsp]
+ mov edi,DWORD[12+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add edx,r14d
+ mov r14d,edi
+ ror edi,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor edi,r14d
+ shr r14d,10
+
+ ror edi,17
+ xor r12d,r13d
+ xor edi,r14d
+ add r12d,DWORD[56+rsp]
+
+ add r12d,DWORD[20+rsp]
+ mov r13d,r11d
+ add r12d,edi
+ mov r14d,edx
+ ror r13d,14
+ mov edi,eax
+
+ xor r13d,r11d
+ ror r14d,9
+ xor edi,ebx
+
+ mov DWORD[20+rsp],r12d
+ xor r14d,edx
+ and edi,r11d
+
+ ror r13d,5
+ add r12d,ecx
+ xor edi,ebx
+
+ ror r14d,11
+ xor r13d,r11d
+ add r12d,edi
+
+ mov edi,edx
+ add r12d,DWORD[rbp]
+ xor r14d,edx
+
+ xor edi,r8d
+ ror r13d,6
+ mov ecx,r8d
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor ecx,r15d
+ add r10d,r12d
+ add ecx,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[28+rsp]
+ mov r15d,DWORD[16+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add ecx,r14d
+ mov r14d,r15d
+ ror r15d,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor r15d,r14d
+ shr r14d,10
+
+ ror r15d,17
+ xor r12d,r13d
+ xor r15d,r14d
+ add r12d,DWORD[60+rsp]
+
+ add r12d,DWORD[24+rsp]
+ mov r13d,r10d
+ add r12d,r15d
+ mov r14d,ecx
+ ror r13d,14
+ mov r15d,r11d
+
+ xor r13d,r10d
+ ror r14d,9
+ xor r15d,eax
+
+ mov DWORD[24+rsp],r12d
+ xor r14d,ecx
+ and r15d,r10d
+
+ ror r13d,5
+ add r12d,ebx
+ xor r15d,eax
+
+ ror r14d,11
+ xor r13d,r10d
+ add r12d,r15d
+
+ mov r15d,ecx
+ add r12d,DWORD[rbp]
+ xor r14d,ecx
+
+ xor r15d,edx
+ ror r13d,6
+ mov ebx,edx
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor ebx,edi
+ add r9d,r12d
+ add ebx,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[32+rsp]
+ mov edi,DWORD[20+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add ebx,r14d
+ mov r14d,edi
+ ror edi,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor edi,r14d
+ shr r14d,10
+
+ ror edi,17
+ xor r12d,r13d
+ xor edi,r14d
+ add r12d,DWORD[rsp]
+
+ add r12d,DWORD[28+rsp]
+ mov r13d,r9d
+ add r12d,edi
+ mov r14d,ebx
+ ror r13d,14
+ mov edi,r10d
+
+ xor r13d,r9d
+ ror r14d,9
+ xor edi,r11d
+
+ mov DWORD[28+rsp],r12d
+ xor r14d,ebx
+ and edi,r9d
+
+ ror r13d,5
+ add r12d,eax
+ xor edi,r11d
+
+ ror r14d,11
+ xor r13d,r9d
+ add r12d,edi
+
+ mov edi,ebx
+ add r12d,DWORD[rbp]
+ xor r14d,ebx
+
+ xor edi,ecx
+ ror r13d,6
+ mov eax,ecx
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor eax,r15d
+ add r8d,r12d
+ add eax,r12d
+
+ lea rbp,[20+rbp]
+ mov r13d,DWORD[36+rsp]
+ mov r15d,DWORD[24+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add eax,r14d
+ mov r14d,r15d
+ ror r15d,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor r15d,r14d
+ shr r14d,10
+
+ ror r15d,17
+ xor r12d,r13d
+ xor r15d,r14d
+ add r12d,DWORD[4+rsp]
+
+ add r12d,DWORD[32+rsp]
+ mov r13d,r8d
+ add r12d,r15d
+ mov r14d,eax
+ ror r13d,14
+ mov r15d,r9d
+
+ xor r13d,r8d
+ ror r14d,9
+ xor r15d,r10d
+
+ mov DWORD[32+rsp],r12d
+ xor r14d,eax
+ and r15d,r8d
+
+ ror r13d,5
+ add r12d,r11d
+ xor r15d,r10d
+
+ ror r14d,11
+ xor r13d,r8d
+ add r12d,r15d
+
+ mov r15d,eax
+ add r12d,DWORD[rbp]
+ xor r14d,eax
+
+ xor r15d,ebx
+ ror r13d,6
+ mov r11d,ebx
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor r11d,edi
+ add edx,r12d
+ add r11d,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[40+rsp]
+ mov edi,DWORD[28+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add r11d,r14d
+ mov r14d,edi
+ ror edi,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor edi,r14d
+ shr r14d,10
+
+ ror edi,17
+ xor r12d,r13d
+ xor edi,r14d
+ add r12d,DWORD[8+rsp]
+
+ add r12d,DWORD[36+rsp]
+ mov r13d,edx
+ add r12d,edi
+ mov r14d,r11d
+ ror r13d,14
+ mov edi,r8d
+
+ xor r13d,edx
+ ror r14d,9
+ xor edi,r9d
+
+ mov DWORD[36+rsp],r12d
+ xor r14d,r11d
+ and edi,edx
+
+ ror r13d,5
+ add r12d,r10d
+ xor edi,r9d
+
+ ror r14d,11
+ xor r13d,edx
+ add r12d,edi
+
+ mov edi,r11d
+ add r12d,DWORD[rbp]
+ xor r14d,r11d
+
+ xor edi,eax
+ ror r13d,6
+ mov r10d,eax
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor r10d,r15d
+ add ecx,r12d
+ add r10d,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[44+rsp]
+ mov r15d,DWORD[32+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add r10d,r14d
+ mov r14d,r15d
+ ror r15d,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor r15d,r14d
+ shr r14d,10
+
+ ror r15d,17
+ xor r12d,r13d
+ xor r15d,r14d
+ add r12d,DWORD[12+rsp]
+
+ add r12d,DWORD[40+rsp]
+ mov r13d,ecx
+ add r12d,r15d
+ mov r14d,r10d
+ ror r13d,14
+ mov r15d,edx
+
+ xor r13d,ecx
+ ror r14d,9
+ xor r15d,r8d
+
+ mov DWORD[40+rsp],r12d
+ xor r14d,r10d
+ and r15d,ecx
+
+ ror r13d,5
+ add r12d,r9d
+ xor r15d,r8d
+
+ ror r14d,11
+ xor r13d,ecx
+ add r12d,r15d
+
+ mov r15d,r10d
+ add r12d,DWORD[rbp]
+ xor r14d,r10d
+
+ xor r15d,r11d
+ ror r13d,6
+ mov r9d,r11d
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor r9d,edi
+ add ebx,r12d
+ add r9d,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[48+rsp]
+ mov edi,DWORD[36+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add r9d,r14d
+ mov r14d,edi
+ ror edi,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor edi,r14d
+ shr r14d,10
+
+ ror edi,17
+ xor r12d,r13d
+ xor edi,r14d
+ add r12d,DWORD[16+rsp]
+
+ add r12d,DWORD[44+rsp]
+ mov r13d,ebx
+ add r12d,edi
+ mov r14d,r9d
+ ror r13d,14
+ mov edi,ecx
+
+ xor r13d,ebx
+ ror r14d,9
+ xor edi,edx
+
+ mov DWORD[44+rsp],r12d
+ xor r14d,r9d
+ and edi,ebx
+
+ ror r13d,5
+ add r12d,r8d
+ xor edi,edx
+
+ ror r14d,11
+ xor r13d,ebx
+ add r12d,edi
+
+ mov edi,r9d
+ add r12d,DWORD[rbp]
+ xor r14d,r9d
+
+ xor edi,r10d
+ ror r13d,6
+ mov r8d,r10d
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor r8d,r15d
+ add eax,r12d
+ add r8d,r12d
+
+ lea rbp,[20+rbp]
+ mov r13d,DWORD[52+rsp]
+ mov r15d,DWORD[40+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add r8d,r14d
+ mov r14d,r15d
+ ror r15d,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor r15d,r14d
+ shr r14d,10
+
+ ror r15d,17
+ xor r12d,r13d
+ xor r15d,r14d
+ add r12d,DWORD[20+rsp]
+
+ add r12d,DWORD[48+rsp]
+ mov r13d,eax
+ add r12d,r15d
+ mov r14d,r8d
+ ror r13d,14
+ mov r15d,ebx
+
+ xor r13d,eax
+ ror r14d,9
+ xor r15d,ecx
+
+ mov DWORD[48+rsp],r12d
+ xor r14d,r8d
+ and r15d,eax
+
+ ror r13d,5
+ add r12d,edx
+ xor r15d,ecx
+
+ ror r14d,11
+ xor r13d,eax
+ add r12d,r15d
+
+ mov r15d,r8d
+ add r12d,DWORD[rbp]
+ xor r14d,r8d
+
+ xor r15d,r9d
+ ror r13d,6
+ mov edx,r9d
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor edx,edi
+ add r11d,r12d
+ add edx,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[56+rsp]
+ mov edi,DWORD[44+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add edx,r14d
+ mov r14d,edi
+ ror edi,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor edi,r14d
+ shr r14d,10
+
+ ror edi,17
+ xor r12d,r13d
+ xor edi,r14d
+ add r12d,DWORD[24+rsp]
+
+ add r12d,DWORD[52+rsp]
+ mov r13d,r11d
+ add r12d,edi
+ mov r14d,edx
+ ror r13d,14
+ mov edi,eax
+
+ xor r13d,r11d
+ ror r14d,9
+ xor edi,ebx
+
+ mov DWORD[52+rsp],r12d
+ xor r14d,edx
+ and edi,r11d
+
+ ror r13d,5
+ add r12d,ecx
+ xor edi,ebx
+
+ ror r14d,11
+ xor r13d,r11d
+ add r12d,edi
+
+ mov edi,edx
+ add r12d,DWORD[rbp]
+ xor r14d,edx
+
+ xor edi,r8d
+ ror r13d,6
+ mov ecx,r8d
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor ecx,r15d
+ add r10d,r12d
+ add ecx,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[60+rsp]
+ mov r15d,DWORD[48+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add ecx,r14d
+ mov r14d,r15d
+ ror r15d,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor r15d,r14d
+ shr r14d,10
+
+ ror r15d,17
+ xor r12d,r13d
+ xor r15d,r14d
+ add r12d,DWORD[28+rsp]
+
+ add r12d,DWORD[56+rsp]
+ mov r13d,r10d
+ add r12d,r15d
+ mov r14d,ecx
+ ror r13d,14
+ mov r15d,r11d
+
+ xor r13d,r10d
+ ror r14d,9
+ xor r15d,eax
+
+ mov DWORD[56+rsp],r12d
+ xor r14d,ecx
+ and r15d,r10d
+
+ ror r13d,5
+ add r12d,ebx
+ xor r15d,eax
+
+ ror r14d,11
+ xor r13d,r10d
+ add r12d,r15d
+
+ mov r15d,ecx
+ add r12d,DWORD[rbp]
+ xor r14d,ecx
+
+ xor r15d,edx
+ ror r13d,6
+ mov ebx,edx
+
+ and edi,r15d
+ ror r14d,2
+ add r12d,r13d
+
+ xor ebx,edi
+ add r9d,r12d
+ add ebx,r12d
+
+ lea rbp,[4+rbp]
+ mov r13d,DWORD[rsp]
+ mov edi,DWORD[52+rsp]
+
+ mov r12d,r13d
+ ror r13d,11
+ add ebx,r14d
+ mov r14d,edi
+ ror edi,2
+
+ xor r13d,r12d
+ shr r12d,3
+ ror r13d,7
+ xor edi,r14d
+ shr r14d,10
+
+ ror edi,17
+ xor r12d,r13d
+ xor edi,r14d
+ add r12d,DWORD[32+rsp]
+
+ add r12d,DWORD[60+rsp]
+ mov r13d,r9d
+ add r12d,edi
+ mov r14d,ebx
+ ror r13d,14
+ mov edi,r10d
+
+ xor r13d,r9d
+ ror r14d,9
+ xor edi,r11d
+
+ mov DWORD[60+rsp],r12d
+ xor r14d,ebx
+ and edi,r9d
+
+ ror r13d,5
+ add r12d,eax
+ xor edi,r11d
+
+ ror r14d,11
+ xor r13d,r9d
+ add r12d,edi
+
+ mov edi,ebx
+ add r12d,DWORD[rbp]
+ xor r14d,ebx
+
+ xor edi,ecx
+ ror r13d,6
+ mov eax,ecx
+
+ and r15d,edi
+ ror r14d,2
+ add r12d,r13d
+
+ xor eax,r15d
+ add r8d,r12d
+ add eax,r12d
+
+ lea rbp,[20+rbp]
+ cmp BYTE[3+rbp],0
+ jnz NEAR $L$rounds_16_xx
+
+ mov rdi,QWORD[((64+0))+rsp]
+ add eax,r14d
+ lea rsi,[64+rsi]
+
+ add eax,DWORD[rdi]
+ add ebx,DWORD[4+rdi]
+ add ecx,DWORD[8+rdi]
+ add edx,DWORD[12+rdi]
+ add r8d,DWORD[16+rdi]
+ add r9d,DWORD[20+rdi]
+ add r10d,DWORD[24+rdi]
+ add r11d,DWORD[28+rdi]
+
+ cmp rsi,QWORD[((64+16))+rsp]
+
+ mov DWORD[rdi],eax
+ mov DWORD[4+rdi],ebx
+ mov DWORD[8+rdi],ecx
+ mov DWORD[12+rdi],edx
+ mov DWORD[16+rdi],r8d
+ mov DWORD[20+rdi],r9d
+ mov DWORD[24+rdi],r10d
+ mov DWORD[28+rdi],r11d
+ jb NEAR $L$loop
+
+ mov rsi,QWORD[88+rsp]
+
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha256_block_data_order:
+ALIGN 64
+
+K256:
+ DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+ DD 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+ DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+ DD 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+ DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+ DD 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+ DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+ DD 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+ DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+ DD 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+ DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+ DD 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+ DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+ DD 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+ DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+ DD 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+ DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+ DD 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+ DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+ DD 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+ DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+ DD 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+ DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+ DD 0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+ DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+ DD 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+ DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+ DD 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+ DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+ DD 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+ DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+ DD 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+ DD 0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+ DD 0x03020100,0x0b0a0908,0xffffffff,0xffffffff
+ DD 0x03020100,0x0b0a0908,0xffffffff,0xffffffff
+ DD 0xffffffff,0xffffffff,0x03020100,0x0b0a0908
+ DD 0xffffffff,0xffffffff,0x03020100,0x0b0a0908
+DB 83,72,65,50,53,54,32,98,108,111,99,107,32,116,114,97
+DB 110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54
+DB 52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121
+DB 32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46
+DB 111,114,103,62,0
+
+ALIGN 64
+sha256_block_data_order_shaext:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha256_block_data_order_shaext:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+_shaext_shortcut:
+ lea rsp,[((-88))+rsp]
+ movaps XMMWORD[(-8-80)+rax],xmm6
+ movaps XMMWORD[(-8-64)+rax],xmm7
+ movaps XMMWORD[(-8-48)+rax],xmm8
+ movaps XMMWORD[(-8-32)+rax],xmm9
+ movaps XMMWORD[(-8-16)+rax],xmm10
+$L$prologue_shaext:
+ lea rcx,[((K256+128))]
+ movdqu xmm1,XMMWORD[rdi]
+ movdqu xmm2,XMMWORD[16+rdi]
+ movdqa xmm7,XMMWORD[((512-128))+rcx]
+
+ pshufd xmm0,xmm1,0x1b
+ pshufd xmm1,xmm1,0xb1
+ pshufd xmm2,xmm2,0x1b
+ movdqa xmm8,xmm7
+DB 102,15,58,15,202,8
+ punpcklqdq xmm2,xmm0
+ jmp NEAR $L$oop_shaext
+
+ALIGN 16
+$L$oop_shaext:
+ movdqu xmm3,XMMWORD[rsi]
+ movdqu xmm4,XMMWORD[16+rsi]
+ movdqu xmm5,XMMWORD[32+rsi]
+DB 102,15,56,0,223
+ movdqu xmm6,XMMWORD[48+rsi]
+
+ movdqa xmm0,XMMWORD[((0-128))+rcx]
+ paddd xmm0,xmm3
+DB 102,15,56,0,231
+ movdqa xmm10,xmm2
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ nop
+ movdqa xmm9,xmm1
+DB 15,56,203,202
+
+ movdqa xmm0,XMMWORD[((32-128))+rcx]
+ paddd xmm0,xmm4
+DB 102,15,56,0,239
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ lea rsi,[64+rsi]
+DB 15,56,204,220
+DB 15,56,203,202
+
+ movdqa xmm0,XMMWORD[((64-128))+rcx]
+ paddd xmm0,xmm5
+DB 102,15,56,0,247
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm6
+DB 102,15,58,15,253,4
+ nop
+ paddd xmm3,xmm7
+DB 15,56,204,229
+DB 15,56,203,202
+
+ movdqa xmm0,XMMWORD[((96-128))+rcx]
+ paddd xmm0,xmm6
+DB 15,56,205,222
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm3
+DB 102,15,58,15,254,4
+ nop
+ paddd xmm4,xmm7
+DB 15,56,204,238
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((128-128))+rcx]
+ paddd xmm0,xmm3
+DB 15,56,205,227
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm4
+DB 102,15,58,15,251,4
+ nop
+ paddd xmm5,xmm7
+DB 15,56,204,243
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((160-128))+rcx]
+ paddd xmm0,xmm4
+DB 15,56,205,236
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm5
+DB 102,15,58,15,252,4
+ nop
+ paddd xmm6,xmm7
+DB 15,56,204,220
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((192-128))+rcx]
+ paddd xmm0,xmm5
+DB 15,56,205,245
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm6
+DB 102,15,58,15,253,4
+ nop
+ paddd xmm3,xmm7
+DB 15,56,204,229
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((224-128))+rcx]
+ paddd xmm0,xmm6
+DB 15,56,205,222
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm3
+DB 102,15,58,15,254,4
+ nop
+ paddd xmm4,xmm7
+DB 15,56,204,238
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((256-128))+rcx]
+ paddd xmm0,xmm3
+DB 15,56,205,227
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm4
+DB 102,15,58,15,251,4
+ nop
+ paddd xmm5,xmm7
+DB 15,56,204,243
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((288-128))+rcx]
+ paddd xmm0,xmm4
+DB 15,56,205,236
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm5
+DB 102,15,58,15,252,4
+ nop
+ paddd xmm6,xmm7
+DB 15,56,204,220
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((320-128))+rcx]
+ paddd xmm0,xmm5
+DB 15,56,205,245
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm6
+DB 102,15,58,15,253,4
+ nop
+ paddd xmm3,xmm7
+DB 15,56,204,229
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((352-128))+rcx]
+ paddd xmm0,xmm6
+DB 15,56,205,222
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm3
+DB 102,15,58,15,254,4
+ nop
+ paddd xmm4,xmm7
+DB 15,56,204,238
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((384-128))+rcx]
+ paddd xmm0,xmm3
+DB 15,56,205,227
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm4
+DB 102,15,58,15,251,4
+ nop
+ paddd xmm5,xmm7
+DB 15,56,204,243
+DB 15,56,203,202
+ movdqa xmm0,XMMWORD[((416-128))+rcx]
+ paddd xmm0,xmm4
+DB 15,56,205,236
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ movdqa xmm7,xmm5
+DB 102,15,58,15,252,4
+DB 15,56,203,202
+ paddd xmm6,xmm7
+
+ movdqa xmm0,XMMWORD[((448-128))+rcx]
+ paddd xmm0,xmm5
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+DB 15,56,205,245
+ movdqa xmm7,xmm8
+DB 15,56,203,202
+
+ movdqa xmm0,XMMWORD[((480-128))+rcx]
+ paddd xmm0,xmm6
+ nop
+DB 15,56,203,209
+ pshufd xmm0,xmm0,0x0e
+ dec rdx
+ nop
+DB 15,56,203,202
+
+ paddd xmm2,xmm10
+ paddd xmm1,xmm9
+ jnz NEAR $L$oop_shaext
+
+ pshufd xmm2,xmm2,0xb1
+ pshufd xmm7,xmm1,0x1b
+ pshufd xmm1,xmm1,0xb1
+ punpckhqdq xmm1,xmm2
+DB 102,15,58,15,215,8
+
+ movdqu XMMWORD[rdi],xmm1
+ movdqu XMMWORD[16+rdi],xmm2
+ movaps xmm6,XMMWORD[((-8-80))+rax]
+ movaps xmm7,XMMWORD[((-8-64))+rax]
+ movaps xmm8,XMMWORD[((-8-48))+rax]
+ movaps xmm9,XMMWORD[((-8-32))+rax]
+ movaps xmm10,XMMWORD[((-8-16))+rax]
+ mov rsp,rax
+$L$epilogue_shaext:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+$L$SEH_end_sha256_block_data_order_shaext:
+
+ALIGN 64
+sha256_block_data_order_ssse3:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha256_block_data_order_ssse3:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+$L$ssse3_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ shl rdx,4
+ sub rsp,160
+ lea rdx,[rdx*4+rsi]
+ and rsp,-64
+ mov QWORD[((64+0))+rsp],rdi
+ mov QWORD[((64+8))+rsp],rsi
+ mov QWORD[((64+16))+rsp],rdx
+ mov QWORD[88+rsp],rax
+
+ movaps XMMWORD[(64+32)+rsp],xmm6
+ movaps XMMWORD[(64+48)+rsp],xmm7
+ movaps XMMWORD[(64+64)+rsp],xmm8
+ movaps XMMWORD[(64+80)+rsp],xmm9
+$L$prologue_ssse3:
+
+ mov eax,DWORD[rdi]
+ mov ebx,DWORD[4+rdi]
+ mov ecx,DWORD[8+rdi]
+ mov edx,DWORD[12+rdi]
+ mov r8d,DWORD[16+rdi]
+ mov r9d,DWORD[20+rdi]
+ mov r10d,DWORD[24+rdi]
+ mov r11d,DWORD[28+rdi]
+
+
+ jmp NEAR $L$loop_ssse3
+ALIGN 16
+$L$loop_ssse3:
+ movdqa xmm7,XMMWORD[((K256+512))]
+ movdqu xmm0,XMMWORD[rsi]
+ movdqu xmm1,XMMWORD[16+rsi]
+ movdqu xmm2,XMMWORD[32+rsi]
+DB 102,15,56,0,199
+ movdqu xmm3,XMMWORD[48+rsi]
+ lea rbp,[K256]
+DB 102,15,56,0,207
+ movdqa xmm4,XMMWORD[rbp]
+ movdqa xmm5,XMMWORD[32+rbp]
+DB 102,15,56,0,215
+ paddd xmm4,xmm0
+ movdqa xmm6,XMMWORD[64+rbp]
+DB 102,15,56,0,223
+ movdqa xmm7,XMMWORD[96+rbp]
+ paddd xmm5,xmm1
+ paddd xmm6,xmm2
+ paddd xmm7,xmm3
+ movdqa XMMWORD[rsp],xmm4
+ mov r14d,eax
+ movdqa XMMWORD[16+rsp],xmm5
+ mov edi,ebx
+ movdqa XMMWORD[32+rsp],xmm6
+ xor edi,ecx
+ movdqa XMMWORD[48+rsp],xmm7
+ mov r13d,r8d
+ jmp NEAR $L$ssse3_00_47
+
+ALIGN 16
+$L$ssse3_00_47:
+ sub rbp,-128
+ ror r13d,14
+ movdqa xmm4,xmm1
+ mov eax,r14d
+ mov r12d,r9d
+ movdqa xmm7,xmm3
+ ror r14d,9
+ xor r13d,r8d
+ xor r12d,r10d
+ ror r13d,5
+ xor r14d,eax
+DB 102,15,58,15,224,4
+ and r12d,r8d
+ xor r13d,r8d
+DB 102,15,58,15,250,4
+ add r11d,DWORD[rsp]
+ mov r15d,eax
+ xor r12d,r10d
+ ror r14d,11
+ movdqa xmm5,xmm4
+ xor r15d,ebx
+ add r11d,r12d
+ movdqa xmm6,xmm4
+ ror r13d,6
+ and edi,r15d
+ psrld xmm4,3
+ xor r14d,eax
+ add r11d,r13d
+ xor edi,ebx
+ paddd xmm0,xmm7
+ ror r14d,2
+ add edx,r11d
+ psrld xmm6,7
+ add r11d,edi
+ mov r13d,edx
+ pshufd xmm7,xmm3,250
+ add r14d,r11d
+ ror r13d,14
+ pslld xmm5,14
+ mov r11d,r14d
+ mov r12d,r8d
+ pxor xmm4,xmm6
+ ror r14d,9
+ xor r13d,edx
+ xor r12d,r9d
+ ror r13d,5
+ psrld xmm6,11
+ xor r14d,r11d
+ pxor xmm4,xmm5
+ and r12d,edx
+ xor r13d,edx
+ pslld xmm5,11
+ add r10d,DWORD[4+rsp]
+ mov edi,r11d
+ pxor xmm4,xmm6
+ xor r12d,r9d
+ ror r14d,11
+ movdqa xmm6,xmm7
+ xor edi,eax
+ add r10d,r12d
+ pxor xmm4,xmm5
+ ror r13d,6
+ and r15d,edi
+ xor r14d,r11d
+ psrld xmm7,10
+ add r10d,r13d
+ xor r15d,eax
+ paddd xmm0,xmm4
+ ror r14d,2
+ add ecx,r10d
+ psrlq xmm6,17
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ pxor xmm7,xmm6
+ ror r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ ror r14d,9
+ psrlq xmm6,2
+ xor r13d,ecx
+ xor r12d,r8d
+ pxor xmm7,xmm6
+ ror r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ pshufd xmm7,xmm7,128
+ xor r13d,ecx
+ add r9d,DWORD[8+rsp]
+ mov r15d,r10d
+ psrldq xmm7,8
+ xor r12d,r8d
+ ror r14d,11
+ xor r15d,r11d
+ add r9d,r12d
+ ror r13d,6
+ paddd xmm0,xmm7
+ and edi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ pshufd xmm7,xmm0,80
+ xor edi,r11d
+ ror r14d,2
+ add ebx,r9d
+ movdqa xmm6,xmm7
+ add r9d,edi
+ mov r13d,ebx
+ psrld xmm7,10
+ add r14d,r9d
+ ror r13d,14
+ psrlq xmm6,17
+ mov r9d,r14d
+ mov r12d,ecx
+ pxor xmm7,xmm6
+ ror r14d,9
+ xor r13d,ebx
+ xor r12d,edx
+ ror r13d,5
+ xor r14d,r9d
+ psrlq xmm6,2
+ and r12d,ebx
+ xor r13d,ebx
+ add r8d,DWORD[12+rsp]
+ pxor xmm7,xmm6
+ mov edi,r9d
+ xor r12d,edx
+ ror r14d,11
+ pshufd xmm7,xmm7,8
+ xor edi,r10d
+ add r8d,r12d
+ movdqa xmm6,XMMWORD[rbp]
+ ror r13d,6
+ and r15d,edi
+ pslldq xmm7,8
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ paddd xmm0,xmm7
+ ror r14d,2
+ add eax,r8d
+ add r8d,r15d
+ paddd xmm6,xmm0
+ mov r13d,eax
+ add r14d,r8d
+ movdqa XMMWORD[rsp],xmm6
+ ror r13d,14
+ movdqa xmm4,xmm2
+ mov r8d,r14d
+ mov r12d,ebx
+ movdqa xmm7,xmm0
+ ror r14d,9
+ xor r13d,eax
+ xor r12d,ecx
+ ror r13d,5
+ xor r14d,r8d
+DB 102,15,58,15,225,4
+ and r12d,eax
+ xor r13d,eax
+DB 102,15,58,15,251,4
+ add edx,DWORD[16+rsp]
+ mov r15d,r8d
+ xor r12d,ecx
+ ror r14d,11
+ movdqa xmm5,xmm4
+ xor r15d,r9d
+ add edx,r12d
+ movdqa xmm6,xmm4
+ ror r13d,6
+ and edi,r15d
+ psrld xmm4,3
+ xor r14d,r8d
+ add edx,r13d
+ xor edi,r9d
+ paddd xmm1,xmm7
+ ror r14d,2
+ add r11d,edx
+ psrld xmm6,7
+ add edx,edi
+ mov r13d,r11d
+ pshufd xmm7,xmm0,250
+ add r14d,edx
+ ror r13d,14
+ pslld xmm5,14
+ mov edx,r14d
+ mov r12d,eax
+ pxor xmm4,xmm6
+ ror r14d,9
+ xor r13d,r11d
+ xor r12d,ebx
+ ror r13d,5
+ psrld xmm6,11
+ xor r14d,edx
+ pxor xmm4,xmm5
+ and r12d,r11d
+ xor r13d,r11d
+ pslld xmm5,11
+ add ecx,DWORD[20+rsp]
+ mov edi,edx
+ pxor xmm4,xmm6
+ xor r12d,ebx
+ ror r14d,11
+ movdqa xmm6,xmm7
+ xor edi,r8d
+ add ecx,r12d
+ pxor xmm4,xmm5
+ ror r13d,6
+ and r15d,edi
+ xor r14d,edx
+ psrld xmm7,10
+ add ecx,r13d
+ xor r15d,r8d
+ paddd xmm1,xmm4
+ ror r14d,2
+ add r10d,ecx
+ psrlq xmm6,17
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ pxor xmm7,xmm6
+ ror r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ ror r14d,9
+ psrlq xmm6,2
+ xor r13d,r10d
+ xor r12d,eax
+ pxor xmm7,xmm6
+ ror r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ pshufd xmm7,xmm7,128
+ xor r13d,r10d
+ add ebx,DWORD[24+rsp]
+ mov r15d,ecx
+ psrldq xmm7,8
+ xor r12d,eax
+ ror r14d,11
+ xor r15d,edx
+ add ebx,r12d
+ ror r13d,6
+ paddd xmm1,xmm7
+ and edi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ pshufd xmm7,xmm1,80
+ xor edi,edx
+ ror r14d,2
+ add r9d,ebx
+ movdqa xmm6,xmm7
+ add ebx,edi
+ mov r13d,r9d
+ psrld xmm7,10
+ add r14d,ebx
+ ror r13d,14
+ psrlq xmm6,17
+ mov ebx,r14d
+ mov r12d,r10d
+ pxor xmm7,xmm6
+ ror r14d,9
+ xor r13d,r9d
+ xor r12d,r11d
+ ror r13d,5
+ xor r14d,ebx
+ psrlq xmm6,2
+ and r12d,r9d
+ xor r13d,r9d
+ add eax,DWORD[28+rsp]
+ pxor xmm7,xmm6
+ mov edi,ebx
+ xor r12d,r11d
+ ror r14d,11
+ pshufd xmm7,xmm7,8
+ xor edi,ecx
+ add eax,r12d
+ movdqa xmm6,XMMWORD[32+rbp]
+ ror r13d,6
+ and r15d,edi
+ pslldq xmm7,8
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ paddd xmm1,xmm7
+ ror r14d,2
+ add r8d,eax
+ add eax,r15d
+ paddd xmm6,xmm1
+ mov r13d,r8d
+ add r14d,eax
+ movdqa XMMWORD[16+rsp],xmm6
+ ror r13d,14
+ movdqa xmm4,xmm3
+ mov eax,r14d
+ mov r12d,r9d
+ movdqa xmm7,xmm1
+ ror r14d,9
+ xor r13d,r8d
+ xor r12d,r10d
+ ror r13d,5
+ xor r14d,eax
+DB 102,15,58,15,226,4
+ and r12d,r8d
+ xor r13d,r8d
+DB 102,15,58,15,248,4
+ add r11d,DWORD[32+rsp]
+ mov r15d,eax
+ xor r12d,r10d
+ ror r14d,11
+ movdqa xmm5,xmm4
+ xor r15d,ebx
+ add r11d,r12d
+ movdqa xmm6,xmm4
+ ror r13d,6
+ and edi,r15d
+ psrld xmm4,3
+ xor r14d,eax
+ add r11d,r13d
+ xor edi,ebx
+ paddd xmm2,xmm7
+ ror r14d,2
+ add edx,r11d
+ psrld xmm6,7
+ add r11d,edi
+ mov r13d,edx
+ pshufd xmm7,xmm1,250
+ add r14d,r11d
+ ror r13d,14
+ pslld xmm5,14
+ mov r11d,r14d
+ mov r12d,r8d
+ pxor xmm4,xmm6
+ ror r14d,9
+ xor r13d,edx
+ xor r12d,r9d
+ ror r13d,5
+ psrld xmm6,11
+ xor r14d,r11d
+ pxor xmm4,xmm5
+ and r12d,edx
+ xor r13d,edx
+ pslld xmm5,11
+ add r10d,DWORD[36+rsp]
+ mov edi,r11d
+ pxor xmm4,xmm6
+ xor r12d,r9d
+ ror r14d,11
+ movdqa xmm6,xmm7
+ xor edi,eax
+ add r10d,r12d
+ pxor xmm4,xmm5
+ ror r13d,6
+ and r15d,edi
+ xor r14d,r11d
+ psrld xmm7,10
+ add r10d,r13d
+ xor r15d,eax
+ paddd xmm2,xmm4
+ ror r14d,2
+ add ecx,r10d
+ psrlq xmm6,17
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ pxor xmm7,xmm6
+ ror r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ ror r14d,9
+ psrlq xmm6,2
+ xor r13d,ecx
+ xor r12d,r8d
+ pxor xmm7,xmm6
+ ror r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ pshufd xmm7,xmm7,128
+ xor r13d,ecx
+ add r9d,DWORD[40+rsp]
+ mov r15d,r10d
+ psrldq xmm7,8
+ xor r12d,r8d
+ ror r14d,11
+ xor r15d,r11d
+ add r9d,r12d
+ ror r13d,6
+ paddd xmm2,xmm7
+ and edi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ pshufd xmm7,xmm2,80
+ xor edi,r11d
+ ror r14d,2
+ add ebx,r9d
+ movdqa xmm6,xmm7
+ add r9d,edi
+ mov r13d,ebx
+ psrld xmm7,10
+ add r14d,r9d
+ ror r13d,14
+ psrlq xmm6,17
+ mov r9d,r14d
+ mov r12d,ecx
+ pxor xmm7,xmm6
+ ror r14d,9
+ xor r13d,ebx
+ xor r12d,edx
+ ror r13d,5
+ xor r14d,r9d
+ psrlq xmm6,2
+ and r12d,ebx
+ xor r13d,ebx
+ add r8d,DWORD[44+rsp]
+ pxor xmm7,xmm6
+ mov edi,r9d
+ xor r12d,edx
+ ror r14d,11
+ pshufd xmm7,xmm7,8
+ xor edi,r10d
+ add r8d,r12d
+ movdqa xmm6,XMMWORD[64+rbp]
+ ror r13d,6
+ and r15d,edi
+ pslldq xmm7,8
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ paddd xmm2,xmm7
+ ror r14d,2
+ add eax,r8d
+ add r8d,r15d
+ paddd xmm6,xmm2
+ mov r13d,eax
+ add r14d,r8d
+ movdqa XMMWORD[32+rsp],xmm6
+ ror r13d,14
+ movdqa xmm4,xmm0
+ mov r8d,r14d
+ mov r12d,ebx
+ movdqa xmm7,xmm2
+ ror r14d,9
+ xor r13d,eax
+ xor r12d,ecx
+ ror r13d,5
+ xor r14d,r8d
+DB 102,15,58,15,227,4
+ and r12d,eax
+ xor r13d,eax
+DB 102,15,58,15,249,4
+ add edx,DWORD[48+rsp]
+ mov r15d,r8d
+ xor r12d,ecx
+ ror r14d,11
+ movdqa xmm5,xmm4
+ xor r15d,r9d
+ add edx,r12d
+ movdqa xmm6,xmm4
+ ror r13d,6
+ and edi,r15d
+ psrld xmm4,3
+ xor r14d,r8d
+ add edx,r13d
+ xor edi,r9d
+ paddd xmm3,xmm7
+ ror r14d,2
+ add r11d,edx
+ psrld xmm6,7
+ add edx,edi
+ mov r13d,r11d
+ pshufd xmm7,xmm2,250
+ add r14d,edx
+ ror r13d,14
+ pslld xmm5,14
+ mov edx,r14d
+ mov r12d,eax
+ pxor xmm4,xmm6
+ ror r14d,9
+ xor r13d,r11d
+ xor r12d,ebx
+ ror r13d,5
+ psrld xmm6,11
+ xor r14d,edx
+ pxor xmm4,xmm5
+ and r12d,r11d
+ xor r13d,r11d
+ pslld xmm5,11
+ add ecx,DWORD[52+rsp]
+ mov edi,edx
+ pxor xmm4,xmm6
+ xor r12d,ebx
+ ror r14d,11
+ movdqa xmm6,xmm7
+ xor edi,r8d
+ add ecx,r12d
+ pxor xmm4,xmm5
+ ror r13d,6
+ and r15d,edi
+ xor r14d,edx
+ psrld xmm7,10
+ add ecx,r13d
+ xor r15d,r8d
+ paddd xmm3,xmm4
+ ror r14d,2
+ add r10d,ecx
+ psrlq xmm6,17
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ pxor xmm7,xmm6
+ ror r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ ror r14d,9
+ psrlq xmm6,2
+ xor r13d,r10d
+ xor r12d,eax
+ pxor xmm7,xmm6
+ ror r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ pshufd xmm7,xmm7,128
+ xor r13d,r10d
+ add ebx,DWORD[56+rsp]
+ mov r15d,ecx
+ psrldq xmm7,8
+ xor r12d,eax
+ ror r14d,11
+ xor r15d,edx
+ add ebx,r12d
+ ror r13d,6
+ paddd xmm3,xmm7
+ and edi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ pshufd xmm7,xmm3,80
+ xor edi,edx
+ ror r14d,2
+ add r9d,ebx
+ movdqa xmm6,xmm7
+ add ebx,edi
+ mov r13d,r9d
+ psrld xmm7,10
+ add r14d,ebx
+ ror r13d,14
+ psrlq xmm6,17
+ mov ebx,r14d
+ mov r12d,r10d
+ pxor xmm7,xmm6
+ ror r14d,9
+ xor r13d,r9d
+ xor r12d,r11d
+ ror r13d,5
+ xor r14d,ebx
+ psrlq xmm6,2
+ and r12d,r9d
+ xor r13d,r9d
+ add eax,DWORD[60+rsp]
+ pxor xmm7,xmm6
+ mov edi,ebx
+ xor r12d,r11d
+ ror r14d,11
+ pshufd xmm7,xmm7,8
+ xor edi,ecx
+ add eax,r12d
+ movdqa xmm6,XMMWORD[96+rbp]
+ ror r13d,6
+ and r15d,edi
+ pslldq xmm7,8
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ paddd xmm3,xmm7
+ ror r14d,2
+ add r8d,eax
+ add eax,r15d
+ paddd xmm6,xmm3
+ mov r13d,r8d
+ add r14d,eax
+ movdqa XMMWORD[48+rsp],xmm6
+ cmp BYTE[131+rbp],0
+ jne NEAR $L$ssse3_00_47
+ ror r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ ror r14d,9
+ xor r13d,r8d
+ xor r12d,r10d
+ ror r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ xor r13d,r8d
+ add r11d,DWORD[rsp]
+ mov r15d,eax
+ xor r12d,r10d
+ ror r14d,11
+ xor r15d,ebx
+ add r11d,r12d
+ ror r13d,6
+ and edi,r15d
+ xor r14d,eax
+ add r11d,r13d
+ xor edi,ebx
+ ror r14d,2
+ add edx,r11d
+ add r11d,edi
+ mov r13d,edx
+ add r14d,r11d
+ ror r13d,14
+ mov r11d,r14d
+ mov r12d,r8d
+ ror r14d,9
+ xor r13d,edx
+ xor r12d,r9d
+ ror r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ xor r13d,edx
+ add r10d,DWORD[4+rsp]
+ mov edi,r11d
+ xor r12d,r9d
+ ror r14d,11
+ xor edi,eax
+ add r10d,r12d
+ ror r13d,6
+ and r15d,edi
+ xor r14d,r11d
+ add r10d,r13d
+ xor r15d,eax
+ ror r14d,2
+ add ecx,r10d
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ ror r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ ror r14d,9
+ xor r13d,ecx
+ xor r12d,r8d
+ ror r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ xor r13d,ecx
+ add r9d,DWORD[8+rsp]
+ mov r15d,r10d
+ xor r12d,r8d
+ ror r14d,11
+ xor r15d,r11d
+ add r9d,r12d
+ ror r13d,6
+ and edi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor edi,r11d
+ ror r14d,2
+ add ebx,r9d
+ add r9d,edi
+ mov r13d,ebx
+ add r14d,r9d
+ ror r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ ror r14d,9
+ xor r13d,ebx
+ xor r12d,edx
+ ror r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ xor r13d,ebx
+ add r8d,DWORD[12+rsp]
+ mov edi,r9d
+ xor r12d,edx
+ ror r14d,11
+ xor edi,r10d
+ add r8d,r12d
+ ror r13d,6
+ and r15d,edi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ ror r14d,2
+ add eax,r8d
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ ror r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ ror r14d,9
+ xor r13d,eax
+ xor r12d,ecx
+ ror r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ xor r13d,eax
+ add edx,DWORD[16+rsp]
+ mov r15d,r8d
+ xor r12d,ecx
+ ror r14d,11
+ xor r15d,r9d
+ add edx,r12d
+ ror r13d,6
+ and edi,r15d
+ xor r14d,r8d
+ add edx,r13d
+ xor edi,r9d
+ ror r14d,2
+ add r11d,edx
+ add edx,edi
+ mov r13d,r11d
+ add r14d,edx
+ ror r13d,14
+ mov edx,r14d
+ mov r12d,eax
+ ror r14d,9
+ xor r13d,r11d
+ xor r12d,ebx
+ ror r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ xor r13d,r11d
+ add ecx,DWORD[20+rsp]
+ mov edi,edx
+ xor r12d,ebx
+ ror r14d,11
+ xor edi,r8d
+ add ecx,r12d
+ ror r13d,6
+ and r15d,edi
+ xor r14d,edx
+ add ecx,r13d
+ xor r15d,r8d
+ ror r14d,2
+ add r10d,ecx
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ ror r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ ror r14d,9
+ xor r13d,r10d
+ xor r12d,eax
+ ror r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ xor r13d,r10d
+ add ebx,DWORD[24+rsp]
+ mov r15d,ecx
+ xor r12d,eax
+ ror r14d,11
+ xor r15d,edx
+ add ebx,r12d
+ ror r13d,6
+ and edi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor edi,edx
+ ror r14d,2
+ add r9d,ebx
+ add ebx,edi
+ mov r13d,r9d
+ add r14d,ebx
+ ror r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ ror r14d,9
+ xor r13d,r9d
+ xor r12d,r11d
+ ror r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ xor r13d,r9d
+ add eax,DWORD[28+rsp]
+ mov edi,ebx
+ xor r12d,r11d
+ ror r14d,11
+ xor edi,ecx
+ add eax,r12d
+ ror r13d,6
+ and r15d,edi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ ror r14d,2
+ add r8d,eax
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ ror r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ ror r14d,9
+ xor r13d,r8d
+ xor r12d,r10d
+ ror r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ xor r13d,r8d
+ add r11d,DWORD[32+rsp]
+ mov r15d,eax
+ xor r12d,r10d
+ ror r14d,11
+ xor r15d,ebx
+ add r11d,r12d
+ ror r13d,6
+ and edi,r15d
+ xor r14d,eax
+ add r11d,r13d
+ xor edi,ebx
+ ror r14d,2
+ add edx,r11d
+ add r11d,edi
+ mov r13d,edx
+ add r14d,r11d
+ ror r13d,14
+ mov r11d,r14d
+ mov r12d,r8d
+ ror r14d,9
+ xor r13d,edx
+ xor r12d,r9d
+ ror r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ xor r13d,edx
+ add r10d,DWORD[36+rsp]
+ mov edi,r11d
+ xor r12d,r9d
+ ror r14d,11
+ xor edi,eax
+ add r10d,r12d
+ ror r13d,6
+ and r15d,edi
+ xor r14d,r11d
+ add r10d,r13d
+ xor r15d,eax
+ ror r14d,2
+ add ecx,r10d
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ ror r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ ror r14d,9
+ xor r13d,ecx
+ xor r12d,r8d
+ ror r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ xor r13d,ecx
+ add r9d,DWORD[40+rsp]
+ mov r15d,r10d
+ xor r12d,r8d
+ ror r14d,11
+ xor r15d,r11d
+ add r9d,r12d
+ ror r13d,6
+ and edi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor edi,r11d
+ ror r14d,2
+ add ebx,r9d
+ add r9d,edi
+ mov r13d,ebx
+ add r14d,r9d
+ ror r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ ror r14d,9
+ xor r13d,ebx
+ xor r12d,edx
+ ror r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ xor r13d,ebx
+ add r8d,DWORD[44+rsp]
+ mov edi,r9d
+ xor r12d,edx
+ ror r14d,11
+ xor edi,r10d
+ add r8d,r12d
+ ror r13d,6
+ and r15d,edi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ ror r14d,2
+ add eax,r8d
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ ror r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ ror r14d,9
+ xor r13d,eax
+ xor r12d,ecx
+ ror r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ xor r13d,eax
+ add edx,DWORD[48+rsp]
+ mov r15d,r8d
+ xor r12d,ecx
+ ror r14d,11
+ xor r15d,r9d
+ add edx,r12d
+ ror r13d,6
+ and edi,r15d
+ xor r14d,r8d
+ add edx,r13d
+ xor edi,r9d
+ ror r14d,2
+ add r11d,edx
+ add edx,edi
+ mov r13d,r11d
+ add r14d,edx
+ ror r13d,14
+ mov edx,r14d
+ mov r12d,eax
+ ror r14d,9
+ xor r13d,r11d
+ xor r12d,ebx
+ ror r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ xor r13d,r11d
+ add ecx,DWORD[52+rsp]
+ mov edi,edx
+ xor r12d,ebx
+ ror r14d,11
+ xor edi,r8d
+ add ecx,r12d
+ ror r13d,6
+ and r15d,edi
+ xor r14d,edx
+ add ecx,r13d
+ xor r15d,r8d
+ ror r14d,2
+ add r10d,ecx
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ ror r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ ror r14d,9
+ xor r13d,r10d
+ xor r12d,eax
+ ror r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ xor r13d,r10d
+ add ebx,DWORD[56+rsp]
+ mov r15d,ecx
+ xor r12d,eax
+ ror r14d,11
+ xor r15d,edx
+ add ebx,r12d
+ ror r13d,6
+ and edi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor edi,edx
+ ror r14d,2
+ add r9d,ebx
+ add ebx,edi
+ mov r13d,r9d
+ add r14d,ebx
+ ror r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ ror r14d,9
+ xor r13d,r9d
+ xor r12d,r11d
+ ror r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ xor r13d,r9d
+ add eax,DWORD[60+rsp]
+ mov edi,ebx
+ xor r12d,r11d
+ ror r14d,11
+ xor edi,ecx
+ add eax,r12d
+ ror r13d,6
+ and r15d,edi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ ror r14d,2
+ add r8d,eax
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ mov rdi,QWORD[((64+0))+rsp]
+ mov eax,r14d
+
+ add eax,DWORD[rdi]
+ lea rsi,[64+rsi]
+ add ebx,DWORD[4+rdi]
+ add ecx,DWORD[8+rdi]
+ add edx,DWORD[12+rdi]
+ add r8d,DWORD[16+rdi]
+ add r9d,DWORD[20+rdi]
+ add r10d,DWORD[24+rdi]
+ add r11d,DWORD[28+rdi]
+
+ cmp rsi,QWORD[((64+16))+rsp]
+
+ mov DWORD[rdi],eax
+ mov DWORD[4+rdi],ebx
+ mov DWORD[8+rdi],ecx
+ mov DWORD[12+rdi],edx
+ mov DWORD[16+rdi],r8d
+ mov DWORD[20+rdi],r9d
+ mov DWORD[24+rdi],r10d
+ mov DWORD[28+rdi],r11d
+ jb NEAR $L$loop_ssse3
+
+ mov rsi,QWORD[88+rsp]
+
+ movaps xmm6,XMMWORD[((64+32))+rsp]
+ movaps xmm7,XMMWORD[((64+48))+rsp]
+ movaps xmm8,XMMWORD[((64+64))+rsp]
+ movaps xmm9,XMMWORD[((64+80))+rsp]
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue_ssse3:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha256_block_data_order_ssse3:
+
+ALIGN 64
+sha256_block_data_order_avx:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha256_block_data_order_avx:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+$L$avx_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ shl rdx,4
+ sub rsp,160
+ lea rdx,[rdx*4+rsi]
+ and rsp,-64
+ mov QWORD[((64+0))+rsp],rdi
+ mov QWORD[((64+8))+rsp],rsi
+ mov QWORD[((64+16))+rsp],rdx
+ mov QWORD[88+rsp],rax
+
+ movaps XMMWORD[(64+32)+rsp],xmm6
+ movaps XMMWORD[(64+48)+rsp],xmm7
+ movaps XMMWORD[(64+64)+rsp],xmm8
+ movaps XMMWORD[(64+80)+rsp],xmm9
+$L$prologue_avx:
+
+ vzeroupper
+ mov eax,DWORD[rdi]
+ mov ebx,DWORD[4+rdi]
+ mov ecx,DWORD[8+rdi]
+ mov edx,DWORD[12+rdi]
+ mov r8d,DWORD[16+rdi]
+ mov r9d,DWORD[20+rdi]
+ mov r10d,DWORD[24+rdi]
+ mov r11d,DWORD[28+rdi]
+ vmovdqa xmm8,XMMWORD[((K256+512+32))]
+ vmovdqa xmm9,XMMWORD[((K256+512+64))]
+ jmp NEAR $L$loop_avx
+ALIGN 16
+$L$loop_avx:
+ vmovdqa xmm7,XMMWORD[((K256+512))]
+ vmovdqu xmm0,XMMWORD[rsi]
+ vmovdqu xmm1,XMMWORD[16+rsi]
+ vmovdqu xmm2,XMMWORD[32+rsi]
+ vmovdqu xmm3,XMMWORD[48+rsi]
+ vpshufb xmm0,xmm0,xmm7
+ lea rbp,[K256]
+ vpshufb xmm1,xmm1,xmm7
+ vpshufb xmm2,xmm2,xmm7
+ vpaddd xmm4,xmm0,XMMWORD[rbp]
+ vpshufb xmm3,xmm3,xmm7
+ vpaddd xmm5,xmm1,XMMWORD[32+rbp]
+ vpaddd xmm6,xmm2,XMMWORD[64+rbp]
+ vpaddd xmm7,xmm3,XMMWORD[96+rbp]
+ vmovdqa XMMWORD[rsp],xmm4
+ mov r14d,eax
+ vmovdqa XMMWORD[16+rsp],xmm5
+ mov edi,ebx
+ vmovdqa XMMWORD[32+rsp],xmm6
+ xor edi,ecx
+ vmovdqa XMMWORD[48+rsp],xmm7
+ mov r13d,r8d
+ jmp NEAR $L$avx_00_47
+
+ALIGN 16
+$L$avx_00_47:
+ sub rbp,-128
+ vpalignr xmm4,xmm1,xmm0,4
+ shrd r13d,r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ vpalignr xmm7,xmm3,xmm2,4
+ shrd r14d,r14d,9
+ xor r13d,r8d
+ xor r12d,r10d
+ vpsrld xmm6,xmm4,7
+ shrd r13d,r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ vpaddd xmm0,xmm0,xmm7
+ xor r13d,r8d
+ add r11d,DWORD[rsp]
+ mov r15d,eax
+ vpsrld xmm7,xmm4,3
+ xor r12d,r10d
+ shrd r14d,r14d,11
+ xor r15d,ebx
+ vpslld xmm5,xmm4,14
+ add r11d,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ vpxor xmm4,xmm7,xmm6
+ xor r14d,eax
+ add r11d,r13d
+ xor edi,ebx
+ vpshufd xmm7,xmm3,250
+ shrd r14d,r14d,2
+ add edx,r11d
+ add r11d,edi
+ vpsrld xmm6,xmm6,11
+ mov r13d,edx
+ add r14d,r11d
+ shrd r13d,r13d,14
+ vpxor xmm4,xmm4,xmm5
+ mov r11d,r14d
+ mov r12d,r8d
+ shrd r14d,r14d,9
+ vpslld xmm5,xmm5,11
+ xor r13d,edx
+ xor r12d,r9d
+ shrd r13d,r13d,5
+ vpxor xmm4,xmm4,xmm6
+ xor r14d,r11d
+ and r12d,edx
+ xor r13d,edx
+ vpsrld xmm6,xmm7,10
+ add r10d,DWORD[4+rsp]
+ mov edi,r11d
+ xor r12d,r9d
+ vpxor xmm4,xmm4,xmm5
+ shrd r14d,r14d,11
+ xor edi,eax
+ add r10d,r12d
+ vpsrlq xmm7,xmm7,17
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,r11d
+ vpaddd xmm0,xmm0,xmm4
+ add r10d,r13d
+ xor r15d,eax
+ shrd r14d,r14d,2
+ vpxor xmm6,xmm6,xmm7
+ add ecx,r10d
+ add r10d,r15d
+ mov r13d,ecx
+ vpsrlq xmm7,xmm7,2
+ add r14d,r10d
+ shrd r13d,r13d,14
+ mov r10d,r14d
+ vpxor xmm6,xmm6,xmm7
+ mov r12d,edx
+ shrd r14d,r14d,9
+ xor r13d,ecx
+ vpshufb xmm6,xmm6,xmm8
+ xor r12d,r8d
+ shrd r13d,r13d,5
+ xor r14d,r10d
+ vpaddd xmm0,xmm0,xmm6
+ and r12d,ecx
+ xor r13d,ecx
+ add r9d,DWORD[8+rsp]
+ vpshufd xmm7,xmm0,80
+ mov r15d,r10d
+ xor r12d,r8d
+ shrd r14d,r14d,11
+ vpsrld xmm6,xmm7,10
+ xor r15d,r11d
+ add r9d,r12d
+ shrd r13d,r13d,6
+ vpsrlq xmm7,xmm7,17
+ and edi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ vpxor xmm6,xmm6,xmm7
+ xor edi,r11d
+ shrd r14d,r14d,2
+ add ebx,r9d
+ vpsrlq xmm7,xmm7,2
+ add r9d,edi
+ mov r13d,ebx
+ add r14d,r9d
+ vpxor xmm6,xmm6,xmm7
+ shrd r13d,r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ vpshufb xmm6,xmm6,xmm9
+ shrd r14d,r14d,9
+ xor r13d,ebx
+ xor r12d,edx
+ vpaddd xmm0,xmm0,xmm6
+ shrd r13d,r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vpaddd xmm6,xmm0,XMMWORD[rbp]
+ xor r13d,ebx
+ add r8d,DWORD[12+rsp]
+ mov edi,r9d
+ xor r12d,edx
+ shrd r14d,r14d,11
+ xor edi,r10d
+ add r8d,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ shrd r14d,r14d,2
+ add eax,r8d
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ vmovdqa XMMWORD[rsp],xmm6
+ vpalignr xmm4,xmm2,xmm1,4
+ shrd r13d,r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ vpalignr xmm7,xmm0,xmm3,4
+ shrd r14d,r14d,9
+ xor r13d,eax
+ xor r12d,ecx
+ vpsrld xmm6,xmm4,7
+ shrd r13d,r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ vpaddd xmm1,xmm1,xmm7
+ xor r13d,eax
+ add edx,DWORD[16+rsp]
+ mov r15d,r8d
+ vpsrld xmm7,xmm4,3
+ xor r12d,ecx
+ shrd r14d,r14d,11
+ xor r15d,r9d
+ vpslld xmm5,xmm4,14
+ add edx,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ vpxor xmm4,xmm7,xmm6
+ xor r14d,r8d
+ add edx,r13d
+ xor edi,r9d
+ vpshufd xmm7,xmm0,250
+ shrd r14d,r14d,2
+ add r11d,edx
+ add edx,edi
+ vpsrld xmm6,xmm6,11
+ mov r13d,r11d
+ add r14d,edx
+ shrd r13d,r13d,14
+ vpxor xmm4,xmm4,xmm5
+ mov edx,r14d
+ mov r12d,eax
+ shrd r14d,r14d,9
+ vpslld xmm5,xmm5,11
+ xor r13d,r11d
+ xor r12d,ebx
+ shrd r13d,r13d,5
+ vpxor xmm4,xmm4,xmm6
+ xor r14d,edx
+ and r12d,r11d
+ xor r13d,r11d
+ vpsrld xmm6,xmm7,10
+ add ecx,DWORD[20+rsp]
+ mov edi,edx
+ xor r12d,ebx
+ vpxor xmm4,xmm4,xmm5
+ shrd r14d,r14d,11
+ xor edi,r8d
+ add ecx,r12d
+ vpsrlq xmm7,xmm7,17
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,edx
+ vpaddd xmm1,xmm1,xmm4
+ add ecx,r13d
+ xor r15d,r8d
+ shrd r14d,r14d,2
+ vpxor xmm6,xmm6,xmm7
+ add r10d,ecx
+ add ecx,r15d
+ mov r13d,r10d
+ vpsrlq xmm7,xmm7,2
+ add r14d,ecx
+ shrd r13d,r13d,14
+ mov ecx,r14d
+ vpxor xmm6,xmm6,xmm7
+ mov r12d,r11d
+ shrd r14d,r14d,9
+ xor r13d,r10d
+ vpshufb xmm6,xmm6,xmm8
+ xor r12d,eax
+ shrd r13d,r13d,5
+ xor r14d,ecx
+ vpaddd xmm1,xmm1,xmm6
+ and r12d,r10d
+ xor r13d,r10d
+ add ebx,DWORD[24+rsp]
+ vpshufd xmm7,xmm1,80
+ mov r15d,ecx
+ xor r12d,eax
+ shrd r14d,r14d,11
+ vpsrld xmm6,xmm7,10
+ xor r15d,edx
+ add ebx,r12d
+ shrd r13d,r13d,6
+ vpsrlq xmm7,xmm7,17
+ and edi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ vpxor xmm6,xmm6,xmm7
+ xor edi,edx
+ shrd r14d,r14d,2
+ add r9d,ebx
+ vpsrlq xmm7,xmm7,2
+ add ebx,edi
+ mov r13d,r9d
+ add r14d,ebx
+ vpxor xmm6,xmm6,xmm7
+ shrd r13d,r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ vpshufb xmm6,xmm6,xmm9
+ shrd r14d,r14d,9
+ xor r13d,r9d
+ xor r12d,r11d
+ vpaddd xmm1,xmm1,xmm6
+ shrd r13d,r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vpaddd xmm6,xmm1,XMMWORD[32+rbp]
+ xor r13d,r9d
+ add eax,DWORD[28+rsp]
+ mov edi,ebx
+ xor r12d,r11d
+ shrd r14d,r14d,11
+ xor edi,ecx
+ add eax,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ shrd r14d,r14d,2
+ add r8d,eax
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ vmovdqa XMMWORD[16+rsp],xmm6
+ vpalignr xmm4,xmm3,xmm2,4
+ shrd r13d,r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ vpalignr xmm7,xmm1,xmm0,4
+ shrd r14d,r14d,9
+ xor r13d,r8d
+ xor r12d,r10d
+ vpsrld xmm6,xmm4,7
+ shrd r13d,r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ vpaddd xmm2,xmm2,xmm7
+ xor r13d,r8d
+ add r11d,DWORD[32+rsp]
+ mov r15d,eax
+ vpsrld xmm7,xmm4,3
+ xor r12d,r10d
+ shrd r14d,r14d,11
+ xor r15d,ebx
+ vpslld xmm5,xmm4,14
+ add r11d,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ vpxor xmm4,xmm7,xmm6
+ xor r14d,eax
+ add r11d,r13d
+ xor edi,ebx
+ vpshufd xmm7,xmm1,250
+ shrd r14d,r14d,2
+ add edx,r11d
+ add r11d,edi
+ vpsrld xmm6,xmm6,11
+ mov r13d,edx
+ add r14d,r11d
+ shrd r13d,r13d,14
+ vpxor xmm4,xmm4,xmm5
+ mov r11d,r14d
+ mov r12d,r8d
+ shrd r14d,r14d,9
+ vpslld xmm5,xmm5,11
+ xor r13d,edx
+ xor r12d,r9d
+ shrd r13d,r13d,5
+ vpxor xmm4,xmm4,xmm6
+ xor r14d,r11d
+ and r12d,edx
+ xor r13d,edx
+ vpsrld xmm6,xmm7,10
+ add r10d,DWORD[36+rsp]
+ mov edi,r11d
+ xor r12d,r9d
+ vpxor xmm4,xmm4,xmm5
+ shrd r14d,r14d,11
+ xor edi,eax
+ add r10d,r12d
+ vpsrlq xmm7,xmm7,17
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,r11d
+ vpaddd xmm2,xmm2,xmm4
+ add r10d,r13d
+ xor r15d,eax
+ shrd r14d,r14d,2
+ vpxor xmm6,xmm6,xmm7
+ add ecx,r10d
+ add r10d,r15d
+ mov r13d,ecx
+ vpsrlq xmm7,xmm7,2
+ add r14d,r10d
+ shrd r13d,r13d,14
+ mov r10d,r14d
+ vpxor xmm6,xmm6,xmm7
+ mov r12d,edx
+ shrd r14d,r14d,9
+ xor r13d,ecx
+ vpshufb xmm6,xmm6,xmm8
+ xor r12d,r8d
+ shrd r13d,r13d,5
+ xor r14d,r10d
+ vpaddd xmm2,xmm2,xmm6
+ and r12d,ecx
+ xor r13d,ecx
+ add r9d,DWORD[40+rsp]
+ vpshufd xmm7,xmm2,80
+ mov r15d,r10d
+ xor r12d,r8d
+ shrd r14d,r14d,11
+ vpsrld xmm6,xmm7,10
+ xor r15d,r11d
+ add r9d,r12d
+ shrd r13d,r13d,6
+ vpsrlq xmm7,xmm7,17
+ and edi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ vpxor xmm6,xmm6,xmm7
+ xor edi,r11d
+ shrd r14d,r14d,2
+ add ebx,r9d
+ vpsrlq xmm7,xmm7,2
+ add r9d,edi
+ mov r13d,ebx
+ add r14d,r9d
+ vpxor xmm6,xmm6,xmm7
+ shrd r13d,r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ vpshufb xmm6,xmm6,xmm9
+ shrd r14d,r14d,9
+ xor r13d,ebx
+ xor r12d,edx
+ vpaddd xmm2,xmm2,xmm6
+ shrd r13d,r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ vpaddd xmm6,xmm2,XMMWORD[64+rbp]
+ xor r13d,ebx
+ add r8d,DWORD[44+rsp]
+ mov edi,r9d
+ xor r12d,edx
+ shrd r14d,r14d,11
+ xor edi,r10d
+ add r8d,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ shrd r14d,r14d,2
+ add eax,r8d
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ vmovdqa XMMWORD[32+rsp],xmm6
+ vpalignr xmm4,xmm0,xmm3,4
+ shrd r13d,r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ vpalignr xmm7,xmm2,xmm1,4
+ shrd r14d,r14d,9
+ xor r13d,eax
+ xor r12d,ecx
+ vpsrld xmm6,xmm4,7
+ shrd r13d,r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ vpaddd xmm3,xmm3,xmm7
+ xor r13d,eax
+ add edx,DWORD[48+rsp]
+ mov r15d,r8d
+ vpsrld xmm7,xmm4,3
+ xor r12d,ecx
+ shrd r14d,r14d,11
+ xor r15d,r9d
+ vpslld xmm5,xmm4,14
+ add edx,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ vpxor xmm4,xmm7,xmm6
+ xor r14d,r8d
+ add edx,r13d
+ xor edi,r9d
+ vpshufd xmm7,xmm2,250
+ shrd r14d,r14d,2
+ add r11d,edx
+ add edx,edi
+ vpsrld xmm6,xmm6,11
+ mov r13d,r11d
+ add r14d,edx
+ shrd r13d,r13d,14
+ vpxor xmm4,xmm4,xmm5
+ mov edx,r14d
+ mov r12d,eax
+ shrd r14d,r14d,9
+ vpslld xmm5,xmm5,11
+ xor r13d,r11d
+ xor r12d,ebx
+ shrd r13d,r13d,5
+ vpxor xmm4,xmm4,xmm6
+ xor r14d,edx
+ and r12d,r11d
+ xor r13d,r11d
+ vpsrld xmm6,xmm7,10
+ add ecx,DWORD[52+rsp]
+ mov edi,edx
+ xor r12d,ebx
+ vpxor xmm4,xmm4,xmm5
+ shrd r14d,r14d,11
+ xor edi,r8d
+ add ecx,r12d
+ vpsrlq xmm7,xmm7,17
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,edx
+ vpaddd xmm3,xmm3,xmm4
+ add ecx,r13d
+ xor r15d,r8d
+ shrd r14d,r14d,2
+ vpxor xmm6,xmm6,xmm7
+ add r10d,ecx
+ add ecx,r15d
+ mov r13d,r10d
+ vpsrlq xmm7,xmm7,2
+ add r14d,ecx
+ shrd r13d,r13d,14
+ mov ecx,r14d
+ vpxor xmm6,xmm6,xmm7
+ mov r12d,r11d
+ shrd r14d,r14d,9
+ xor r13d,r10d
+ vpshufb xmm6,xmm6,xmm8
+ xor r12d,eax
+ shrd r13d,r13d,5
+ xor r14d,ecx
+ vpaddd xmm3,xmm3,xmm6
+ and r12d,r10d
+ xor r13d,r10d
+ add ebx,DWORD[56+rsp]
+ vpshufd xmm7,xmm3,80
+ mov r15d,ecx
+ xor r12d,eax
+ shrd r14d,r14d,11
+ vpsrld xmm6,xmm7,10
+ xor r15d,edx
+ add ebx,r12d
+ shrd r13d,r13d,6
+ vpsrlq xmm7,xmm7,17
+ and edi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ vpxor xmm6,xmm6,xmm7
+ xor edi,edx
+ shrd r14d,r14d,2
+ add r9d,ebx
+ vpsrlq xmm7,xmm7,2
+ add ebx,edi
+ mov r13d,r9d
+ add r14d,ebx
+ vpxor xmm6,xmm6,xmm7
+ shrd r13d,r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ vpshufb xmm6,xmm6,xmm9
+ shrd r14d,r14d,9
+ xor r13d,r9d
+ xor r12d,r11d
+ vpaddd xmm3,xmm3,xmm6
+ shrd r13d,r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ vpaddd xmm6,xmm3,XMMWORD[96+rbp]
+ xor r13d,r9d
+ add eax,DWORD[60+rsp]
+ mov edi,ebx
+ xor r12d,r11d
+ shrd r14d,r14d,11
+ xor edi,ecx
+ add eax,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ shrd r14d,r14d,2
+ add r8d,eax
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ vmovdqa XMMWORD[48+rsp],xmm6
+ cmp BYTE[131+rbp],0
+ jne NEAR $L$avx_00_47
+ shrd r13d,r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ shrd r14d,r14d,9
+ xor r13d,r8d
+ xor r12d,r10d
+ shrd r13d,r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ xor r13d,r8d
+ add r11d,DWORD[rsp]
+ mov r15d,eax
+ xor r12d,r10d
+ shrd r14d,r14d,11
+ xor r15d,ebx
+ add r11d,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ xor r14d,eax
+ add r11d,r13d
+ xor edi,ebx
+ shrd r14d,r14d,2
+ add edx,r11d
+ add r11d,edi
+ mov r13d,edx
+ add r14d,r11d
+ shrd r13d,r13d,14
+ mov r11d,r14d
+ mov r12d,r8d
+ shrd r14d,r14d,9
+ xor r13d,edx
+ xor r12d,r9d
+ shrd r13d,r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ xor r13d,edx
+ add r10d,DWORD[4+rsp]
+ mov edi,r11d
+ xor r12d,r9d
+ shrd r14d,r14d,11
+ xor edi,eax
+ add r10d,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,r11d
+ add r10d,r13d
+ xor r15d,eax
+ shrd r14d,r14d,2
+ add ecx,r10d
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ shrd r13d,r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ shrd r14d,r14d,9
+ xor r13d,ecx
+ xor r12d,r8d
+ shrd r13d,r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ xor r13d,ecx
+ add r9d,DWORD[8+rsp]
+ mov r15d,r10d
+ xor r12d,r8d
+ shrd r14d,r14d,11
+ xor r15d,r11d
+ add r9d,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor edi,r11d
+ shrd r14d,r14d,2
+ add ebx,r9d
+ add r9d,edi
+ mov r13d,ebx
+ add r14d,r9d
+ shrd r13d,r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ shrd r14d,r14d,9
+ xor r13d,ebx
+ xor r12d,edx
+ shrd r13d,r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ xor r13d,ebx
+ add r8d,DWORD[12+rsp]
+ mov edi,r9d
+ xor r12d,edx
+ shrd r14d,r14d,11
+ xor edi,r10d
+ add r8d,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ shrd r14d,r14d,2
+ add eax,r8d
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ shrd r13d,r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ shrd r14d,r14d,9
+ xor r13d,eax
+ xor r12d,ecx
+ shrd r13d,r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ xor r13d,eax
+ add edx,DWORD[16+rsp]
+ mov r15d,r8d
+ xor r12d,ecx
+ shrd r14d,r14d,11
+ xor r15d,r9d
+ add edx,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ xor r14d,r8d
+ add edx,r13d
+ xor edi,r9d
+ shrd r14d,r14d,2
+ add r11d,edx
+ add edx,edi
+ mov r13d,r11d
+ add r14d,edx
+ shrd r13d,r13d,14
+ mov edx,r14d
+ mov r12d,eax
+ shrd r14d,r14d,9
+ xor r13d,r11d
+ xor r12d,ebx
+ shrd r13d,r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ xor r13d,r11d
+ add ecx,DWORD[20+rsp]
+ mov edi,edx
+ xor r12d,ebx
+ shrd r14d,r14d,11
+ xor edi,r8d
+ add ecx,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,edx
+ add ecx,r13d
+ xor r15d,r8d
+ shrd r14d,r14d,2
+ add r10d,ecx
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ shrd r13d,r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ shrd r14d,r14d,9
+ xor r13d,r10d
+ xor r12d,eax
+ shrd r13d,r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ xor r13d,r10d
+ add ebx,DWORD[24+rsp]
+ mov r15d,ecx
+ xor r12d,eax
+ shrd r14d,r14d,11
+ xor r15d,edx
+ add ebx,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor edi,edx
+ shrd r14d,r14d,2
+ add r9d,ebx
+ add ebx,edi
+ mov r13d,r9d
+ add r14d,ebx
+ shrd r13d,r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ shrd r14d,r14d,9
+ xor r13d,r9d
+ xor r12d,r11d
+ shrd r13d,r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ xor r13d,r9d
+ add eax,DWORD[28+rsp]
+ mov edi,ebx
+ xor r12d,r11d
+ shrd r14d,r14d,11
+ xor edi,ecx
+ add eax,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ shrd r14d,r14d,2
+ add r8d,eax
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ shrd r13d,r13d,14
+ mov eax,r14d
+ mov r12d,r9d
+ shrd r14d,r14d,9
+ xor r13d,r8d
+ xor r12d,r10d
+ shrd r13d,r13d,5
+ xor r14d,eax
+ and r12d,r8d
+ xor r13d,r8d
+ add r11d,DWORD[32+rsp]
+ mov r15d,eax
+ xor r12d,r10d
+ shrd r14d,r14d,11
+ xor r15d,ebx
+ add r11d,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ xor r14d,eax
+ add r11d,r13d
+ xor edi,ebx
+ shrd r14d,r14d,2
+ add edx,r11d
+ add r11d,edi
+ mov r13d,edx
+ add r14d,r11d
+ shrd r13d,r13d,14
+ mov r11d,r14d
+ mov r12d,r8d
+ shrd r14d,r14d,9
+ xor r13d,edx
+ xor r12d,r9d
+ shrd r13d,r13d,5
+ xor r14d,r11d
+ and r12d,edx
+ xor r13d,edx
+ add r10d,DWORD[36+rsp]
+ mov edi,r11d
+ xor r12d,r9d
+ shrd r14d,r14d,11
+ xor edi,eax
+ add r10d,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,r11d
+ add r10d,r13d
+ xor r15d,eax
+ shrd r14d,r14d,2
+ add ecx,r10d
+ add r10d,r15d
+ mov r13d,ecx
+ add r14d,r10d
+ shrd r13d,r13d,14
+ mov r10d,r14d
+ mov r12d,edx
+ shrd r14d,r14d,9
+ xor r13d,ecx
+ xor r12d,r8d
+ shrd r13d,r13d,5
+ xor r14d,r10d
+ and r12d,ecx
+ xor r13d,ecx
+ add r9d,DWORD[40+rsp]
+ mov r15d,r10d
+ xor r12d,r8d
+ shrd r14d,r14d,11
+ xor r15d,r11d
+ add r9d,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ xor r14d,r10d
+ add r9d,r13d
+ xor edi,r11d
+ shrd r14d,r14d,2
+ add ebx,r9d
+ add r9d,edi
+ mov r13d,ebx
+ add r14d,r9d
+ shrd r13d,r13d,14
+ mov r9d,r14d
+ mov r12d,ecx
+ shrd r14d,r14d,9
+ xor r13d,ebx
+ xor r12d,edx
+ shrd r13d,r13d,5
+ xor r14d,r9d
+ and r12d,ebx
+ xor r13d,ebx
+ add r8d,DWORD[44+rsp]
+ mov edi,r9d
+ xor r12d,edx
+ shrd r14d,r14d,11
+ xor edi,r10d
+ add r8d,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,r9d
+ add r8d,r13d
+ xor r15d,r10d
+ shrd r14d,r14d,2
+ add eax,r8d
+ add r8d,r15d
+ mov r13d,eax
+ add r14d,r8d
+ shrd r13d,r13d,14
+ mov r8d,r14d
+ mov r12d,ebx
+ shrd r14d,r14d,9
+ xor r13d,eax
+ xor r12d,ecx
+ shrd r13d,r13d,5
+ xor r14d,r8d
+ and r12d,eax
+ xor r13d,eax
+ add edx,DWORD[48+rsp]
+ mov r15d,r8d
+ xor r12d,ecx
+ shrd r14d,r14d,11
+ xor r15d,r9d
+ add edx,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ xor r14d,r8d
+ add edx,r13d
+ xor edi,r9d
+ shrd r14d,r14d,2
+ add r11d,edx
+ add edx,edi
+ mov r13d,r11d
+ add r14d,edx
+ shrd r13d,r13d,14
+ mov edx,r14d
+ mov r12d,eax
+ shrd r14d,r14d,9
+ xor r13d,r11d
+ xor r12d,ebx
+ shrd r13d,r13d,5
+ xor r14d,edx
+ and r12d,r11d
+ xor r13d,r11d
+ add ecx,DWORD[52+rsp]
+ mov edi,edx
+ xor r12d,ebx
+ shrd r14d,r14d,11
+ xor edi,r8d
+ add ecx,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,edx
+ add ecx,r13d
+ xor r15d,r8d
+ shrd r14d,r14d,2
+ add r10d,ecx
+ add ecx,r15d
+ mov r13d,r10d
+ add r14d,ecx
+ shrd r13d,r13d,14
+ mov ecx,r14d
+ mov r12d,r11d
+ shrd r14d,r14d,9
+ xor r13d,r10d
+ xor r12d,eax
+ shrd r13d,r13d,5
+ xor r14d,ecx
+ and r12d,r10d
+ xor r13d,r10d
+ add ebx,DWORD[56+rsp]
+ mov r15d,ecx
+ xor r12d,eax
+ shrd r14d,r14d,11
+ xor r15d,edx
+ add ebx,r12d
+ shrd r13d,r13d,6
+ and edi,r15d
+ xor r14d,ecx
+ add ebx,r13d
+ xor edi,edx
+ shrd r14d,r14d,2
+ add r9d,ebx
+ add ebx,edi
+ mov r13d,r9d
+ add r14d,ebx
+ shrd r13d,r13d,14
+ mov ebx,r14d
+ mov r12d,r10d
+ shrd r14d,r14d,9
+ xor r13d,r9d
+ xor r12d,r11d
+ shrd r13d,r13d,5
+ xor r14d,ebx
+ and r12d,r9d
+ xor r13d,r9d
+ add eax,DWORD[60+rsp]
+ mov edi,ebx
+ xor r12d,r11d
+ shrd r14d,r14d,11
+ xor edi,ecx
+ add eax,r12d
+ shrd r13d,r13d,6
+ and r15d,edi
+ xor r14d,ebx
+ add eax,r13d
+ xor r15d,ecx
+ shrd r14d,r14d,2
+ add r8d,eax
+ add eax,r15d
+ mov r13d,r8d
+ add r14d,eax
+ mov rdi,QWORD[((64+0))+rsp]
+ mov eax,r14d
+
+ add eax,DWORD[rdi]
+ lea rsi,[64+rsi]
+ add ebx,DWORD[4+rdi]
+ add ecx,DWORD[8+rdi]
+ add edx,DWORD[12+rdi]
+ add r8d,DWORD[16+rdi]
+ add r9d,DWORD[20+rdi]
+ add r10d,DWORD[24+rdi]
+ add r11d,DWORD[28+rdi]
+
+ cmp rsi,QWORD[((64+16))+rsp]
+
+ mov DWORD[rdi],eax
+ mov DWORD[4+rdi],ebx
+ mov DWORD[8+rdi],ecx
+ mov DWORD[12+rdi],edx
+ mov DWORD[16+rdi],r8d
+ mov DWORD[20+rdi],r9d
+ mov DWORD[24+rdi],r10d
+ mov DWORD[28+rdi],r11d
+ jb NEAR $L$loop_avx
+
+ mov rsi,QWORD[88+rsp]
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((64+32))+rsp]
+ movaps xmm7,XMMWORD[((64+48))+rsp]
+ movaps xmm8,XMMWORD[((64+64))+rsp]
+ movaps xmm9,XMMWORD[((64+80))+rsp]
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue_avx:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha256_block_data_order_avx:
+
+ALIGN 64
+sha256_block_data_order_avx2:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha256_block_data_order_avx2:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+$L$avx2_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ sub rsp,608
+ shl rdx,4
+ and rsp,-256*4
+ lea rdx,[rdx*4+rsi]
+ add rsp,448
+ mov QWORD[((64+0))+rsp],rdi
+ mov QWORD[((64+8))+rsp],rsi
+ mov QWORD[((64+16))+rsp],rdx
+ mov QWORD[88+rsp],rax
+
+ movaps XMMWORD[(64+32)+rsp],xmm6
+ movaps XMMWORD[(64+48)+rsp],xmm7
+ movaps XMMWORD[(64+64)+rsp],xmm8
+ movaps XMMWORD[(64+80)+rsp],xmm9
+$L$prologue_avx2:
+
+ vzeroupper
+ sub rsi,-16*4
+ mov eax,DWORD[rdi]
+ mov r12,rsi
+ mov ebx,DWORD[4+rdi]
+ cmp rsi,rdx
+ mov ecx,DWORD[8+rdi]
+ cmove r12,rsp
+ mov edx,DWORD[12+rdi]
+ mov r8d,DWORD[16+rdi]
+ mov r9d,DWORD[20+rdi]
+ mov r10d,DWORD[24+rdi]
+ mov r11d,DWORD[28+rdi]
+ vmovdqa ymm8,YMMWORD[((K256+512+32))]
+ vmovdqa ymm9,YMMWORD[((K256+512+64))]
+ jmp NEAR $L$oop_avx2
+ALIGN 16
+$L$oop_avx2:
+ vmovdqa ymm7,YMMWORD[((K256+512))]
+ vmovdqu xmm0,XMMWORD[((-64+0))+rsi]
+ vmovdqu xmm1,XMMWORD[((-64+16))+rsi]
+ vmovdqu xmm2,XMMWORD[((-64+32))+rsi]
+ vmovdqu xmm3,XMMWORD[((-64+48))+rsi]
+
+ vinserti128 ymm0,ymm0,XMMWORD[r12],1
+ vinserti128 ymm1,ymm1,XMMWORD[16+r12],1
+ vpshufb ymm0,ymm0,ymm7
+ vinserti128 ymm2,ymm2,XMMWORD[32+r12],1
+ vpshufb ymm1,ymm1,ymm7
+ vinserti128 ymm3,ymm3,XMMWORD[48+r12],1
+
+ lea rbp,[K256]
+ vpshufb ymm2,ymm2,ymm7
+ vpaddd ymm4,ymm0,YMMWORD[rbp]
+ vpshufb ymm3,ymm3,ymm7
+ vpaddd ymm5,ymm1,YMMWORD[32+rbp]
+ vpaddd ymm6,ymm2,YMMWORD[64+rbp]
+ vpaddd ymm7,ymm3,YMMWORD[96+rbp]
+ vmovdqa YMMWORD[rsp],ymm4
+ xor r14d,r14d
+ vmovdqa YMMWORD[32+rsp],ymm5
+ lea rsp,[((-64))+rsp]
+ mov edi,ebx
+ vmovdqa YMMWORD[rsp],ymm6
+ xor edi,ecx
+ vmovdqa YMMWORD[32+rsp],ymm7
+ mov r12d,r9d
+ sub rbp,-16*2*4
+ jmp NEAR $L$avx2_00_47
+
+ALIGN 16
+$L$avx2_00_47:
+ lea rsp,[((-64))+rsp]
+ vpalignr ymm4,ymm1,ymm0,4
+ add r11d,DWORD[((0+128))+rsp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ vpalignr ymm7,ymm3,ymm2,4
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ vpsrld ymm6,ymm4,7
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ vpaddd ymm0,ymm0,ymm7
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ vpsrld ymm7,ymm4,3
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ vpslld ymm5,ymm4,14
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ vpxor ymm4,ymm7,ymm6
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,ebx
+ vpshufd ymm7,ymm3,250
+ xor r14d,r13d
+ lea r11d,[rdi*1+r11]
+ mov r12d,r8d
+ vpsrld ymm6,ymm6,11
+ add r10d,DWORD[((4+128))+rsp]
+ and r12d,edx
+ rorx r13d,edx,25
+ vpxor ymm4,ymm4,ymm5
+ rorx edi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ vpslld ymm5,ymm5,11
+ andn r12d,edx,r9d
+ xor r13d,edi
+ rorx r14d,edx,6
+ vpxor ymm4,ymm4,ymm6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov edi,r11d
+ vpsrld ymm6,ymm7,10
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor edi,eax
+ vpxor ymm4,ymm4,ymm5
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ vpsrlq ymm7,ymm7,17
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,eax
+ vpaddd ymm0,ymm0,ymm4
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ vpxor ymm6,ymm6,ymm7
+ add r9d,DWORD[((8+128))+rsp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ vpsrlq ymm7,ymm7,2
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ vpshufb ymm6,ymm6,ymm8
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ vpaddd ymm0,ymm0,ymm6
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ vpshufd ymm7,ymm0,80
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ vpsrld ymm6,ymm7,10
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r11d
+ vpsrlq ymm7,ymm7,17
+ xor r14d,r13d
+ lea r9d,[rdi*1+r9]
+ mov r12d,ecx
+ vpxor ymm6,ymm6,ymm7
+ add r8d,DWORD[((12+128))+rsp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ vpsrlq ymm7,ymm7,2
+ rorx edi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,ebx,edx
+ xor r13d,edi
+ rorx r14d,ebx,6
+ vpshufb ymm6,ymm6,ymm9
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov edi,r9d
+ vpaddd ymm0,ymm0,ymm6
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor edi,r10d
+ vpaddd ymm6,ymm0,YMMWORD[rbp]
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r10d
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ vmovdqa YMMWORD[rsp],ymm6
+ vpalignr ymm4,ymm2,ymm1,4
+ add edx,DWORD[((32+128))+rsp]
+ and r12d,eax
+ rorx r13d,eax,25
+ vpalignr ymm7,ymm0,ymm3,4
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ vpsrld ymm6,ymm4,7
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ vpaddd ymm1,ymm1,ymm7
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ vpsrld ymm7,ymm4,3
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ vpslld ymm5,ymm4,14
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ vpxor ymm4,ymm7,ymm6
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r9d
+ vpshufd ymm7,ymm0,250
+ xor r14d,r13d
+ lea edx,[rdi*1+rdx]
+ mov r12d,eax
+ vpsrld ymm6,ymm6,11
+ add ecx,DWORD[((36+128))+rsp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ vpxor ymm4,ymm4,ymm5
+ rorx edi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ vpslld ymm5,ymm5,11
+ andn r12d,r11d,ebx
+ xor r13d,edi
+ rorx r14d,r11d,6
+ vpxor ymm4,ymm4,ymm6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov edi,edx
+ vpsrld ymm6,ymm7,10
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor edi,r8d
+ vpxor ymm4,ymm4,ymm5
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ vpsrlq ymm7,ymm7,17
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r8d
+ vpaddd ymm1,ymm1,ymm4
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ vpxor ymm6,ymm6,ymm7
+ add ebx,DWORD[((40+128))+rsp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ vpsrlq ymm7,ymm7,2
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ vpshufb ymm6,ymm6,ymm8
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ vpaddd ymm1,ymm1,ymm6
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ vpshufd ymm7,ymm1,80
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ vpsrld ymm6,ymm7,10
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,edx
+ vpsrlq ymm7,ymm7,17
+ xor r14d,r13d
+ lea ebx,[rdi*1+rbx]
+ mov r12d,r10d
+ vpxor ymm6,ymm6,ymm7
+ add eax,DWORD[((44+128))+rsp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ vpsrlq ymm7,ymm7,2
+ rorx edi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,r9d,r11d
+ xor r13d,edi
+ rorx r14d,r9d,6
+ vpshufb ymm6,ymm6,ymm9
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov edi,ebx
+ vpaddd ymm1,ymm1,ymm6
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor edi,ecx
+ vpaddd ymm6,ymm1,YMMWORD[32+rbp]
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,ecx
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ vmovdqa YMMWORD[32+rsp],ymm6
+ lea rsp,[((-64))+rsp]
+ vpalignr ymm4,ymm3,ymm2,4
+ add r11d,DWORD[((0+128))+rsp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ vpalignr ymm7,ymm1,ymm0,4
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ vpsrld ymm6,ymm4,7
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ vpaddd ymm2,ymm2,ymm7
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ vpsrld ymm7,ymm4,3
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ vpslld ymm5,ymm4,14
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ vpxor ymm4,ymm7,ymm6
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,ebx
+ vpshufd ymm7,ymm1,250
+ xor r14d,r13d
+ lea r11d,[rdi*1+r11]
+ mov r12d,r8d
+ vpsrld ymm6,ymm6,11
+ add r10d,DWORD[((4+128))+rsp]
+ and r12d,edx
+ rorx r13d,edx,25
+ vpxor ymm4,ymm4,ymm5
+ rorx edi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ vpslld ymm5,ymm5,11
+ andn r12d,edx,r9d
+ xor r13d,edi
+ rorx r14d,edx,6
+ vpxor ymm4,ymm4,ymm6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov edi,r11d
+ vpsrld ymm6,ymm7,10
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor edi,eax
+ vpxor ymm4,ymm4,ymm5
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ vpsrlq ymm7,ymm7,17
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,eax
+ vpaddd ymm2,ymm2,ymm4
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ vpxor ymm6,ymm6,ymm7
+ add r9d,DWORD[((8+128))+rsp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ vpsrlq ymm7,ymm7,2
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ vpshufb ymm6,ymm6,ymm8
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ vpaddd ymm2,ymm2,ymm6
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ vpshufd ymm7,ymm2,80
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ vpsrld ymm6,ymm7,10
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r11d
+ vpsrlq ymm7,ymm7,17
+ xor r14d,r13d
+ lea r9d,[rdi*1+r9]
+ mov r12d,ecx
+ vpxor ymm6,ymm6,ymm7
+ add r8d,DWORD[((12+128))+rsp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ vpsrlq ymm7,ymm7,2
+ rorx edi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,ebx,edx
+ xor r13d,edi
+ rorx r14d,ebx,6
+ vpshufb ymm6,ymm6,ymm9
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov edi,r9d
+ vpaddd ymm2,ymm2,ymm6
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor edi,r10d
+ vpaddd ymm6,ymm2,YMMWORD[64+rbp]
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r10d
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ vmovdqa YMMWORD[rsp],ymm6
+ vpalignr ymm4,ymm0,ymm3,4
+ add edx,DWORD[((32+128))+rsp]
+ and r12d,eax
+ rorx r13d,eax,25
+ vpalignr ymm7,ymm2,ymm1,4
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ vpsrld ymm6,ymm4,7
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ vpaddd ymm3,ymm3,ymm7
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ vpsrld ymm7,ymm4,3
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ vpslld ymm5,ymm4,14
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ vpxor ymm4,ymm7,ymm6
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r9d
+ vpshufd ymm7,ymm2,250
+ xor r14d,r13d
+ lea edx,[rdi*1+rdx]
+ mov r12d,eax
+ vpsrld ymm6,ymm6,11
+ add ecx,DWORD[((36+128))+rsp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ vpxor ymm4,ymm4,ymm5
+ rorx edi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ vpslld ymm5,ymm5,11
+ andn r12d,r11d,ebx
+ xor r13d,edi
+ rorx r14d,r11d,6
+ vpxor ymm4,ymm4,ymm6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov edi,edx
+ vpsrld ymm6,ymm7,10
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor edi,r8d
+ vpxor ymm4,ymm4,ymm5
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ vpsrlq ymm7,ymm7,17
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r8d
+ vpaddd ymm3,ymm3,ymm4
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ vpxor ymm6,ymm6,ymm7
+ add ebx,DWORD[((40+128))+rsp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ vpsrlq ymm7,ymm7,2
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ vpshufb ymm6,ymm6,ymm8
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ vpaddd ymm3,ymm3,ymm6
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ vpshufd ymm7,ymm3,80
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ vpsrld ymm6,ymm7,10
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,edx
+ vpsrlq ymm7,ymm7,17
+ xor r14d,r13d
+ lea ebx,[rdi*1+rbx]
+ mov r12d,r10d
+ vpxor ymm6,ymm6,ymm7
+ add eax,DWORD[((44+128))+rsp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ vpsrlq ymm7,ymm7,2
+ rorx edi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ vpxor ymm6,ymm6,ymm7
+ andn r12d,r9d,r11d
+ xor r13d,edi
+ rorx r14d,r9d,6
+ vpshufb ymm6,ymm6,ymm9
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov edi,ebx
+ vpaddd ymm3,ymm3,ymm6
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor edi,ecx
+ vpaddd ymm6,ymm3,YMMWORD[96+rbp]
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,ecx
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ vmovdqa YMMWORD[32+rsp],ymm6
+ lea rbp,[128+rbp]
+ cmp BYTE[3+rbp],0
+ jne NEAR $L$avx2_00_47
+ add r11d,DWORD[((0+64))+rsp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,ebx
+ xor r14d,r13d
+ lea r11d,[rdi*1+r11]
+ mov r12d,r8d
+ add r10d,DWORD[((4+64))+rsp]
+ and r12d,edx
+ rorx r13d,edx,25
+ rorx edi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ andn r12d,edx,r9d
+ xor r13d,edi
+ rorx r14d,edx,6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov edi,r11d
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor edi,eax
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,eax
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ add r9d,DWORD[((8+64))+rsp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r11d
+ xor r14d,r13d
+ lea r9d,[rdi*1+r9]
+ mov r12d,ecx
+ add r8d,DWORD[((12+64))+rsp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ rorx edi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ andn r12d,ebx,edx
+ xor r13d,edi
+ rorx r14d,ebx,6
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov edi,r9d
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor edi,r10d
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r10d
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ add edx,DWORD[((32+64))+rsp]
+ and r12d,eax
+ rorx r13d,eax,25
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r9d
+ xor r14d,r13d
+ lea edx,[rdi*1+rdx]
+ mov r12d,eax
+ add ecx,DWORD[((36+64))+rsp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ rorx edi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ andn r12d,r11d,ebx
+ xor r13d,edi
+ rorx r14d,r11d,6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov edi,edx
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor edi,r8d
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r8d
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ add ebx,DWORD[((40+64))+rsp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,edx
+ xor r14d,r13d
+ lea ebx,[rdi*1+rbx]
+ mov r12d,r10d
+ add eax,DWORD[((44+64))+rsp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ rorx edi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ andn r12d,r9d,r11d
+ xor r13d,edi
+ rorx r14d,r9d,6
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov edi,ebx
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor edi,ecx
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,ecx
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ add r11d,DWORD[rsp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,ebx
+ xor r14d,r13d
+ lea r11d,[rdi*1+r11]
+ mov r12d,r8d
+ add r10d,DWORD[4+rsp]
+ and r12d,edx
+ rorx r13d,edx,25
+ rorx edi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ andn r12d,edx,r9d
+ xor r13d,edi
+ rorx r14d,edx,6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov edi,r11d
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor edi,eax
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,eax
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ add r9d,DWORD[8+rsp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r11d
+ xor r14d,r13d
+ lea r9d,[rdi*1+r9]
+ mov r12d,ecx
+ add r8d,DWORD[12+rsp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ rorx edi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ andn r12d,ebx,edx
+ xor r13d,edi
+ rorx r14d,ebx,6
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov edi,r9d
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor edi,r10d
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r10d
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ add edx,DWORD[32+rsp]
+ and r12d,eax
+ rorx r13d,eax,25
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r9d
+ xor r14d,r13d
+ lea edx,[rdi*1+rdx]
+ mov r12d,eax
+ add ecx,DWORD[36+rsp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ rorx edi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ andn r12d,r11d,ebx
+ xor r13d,edi
+ rorx r14d,r11d,6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov edi,edx
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor edi,r8d
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r8d
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ add ebx,DWORD[40+rsp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,edx
+ xor r14d,r13d
+ lea ebx,[rdi*1+rbx]
+ mov r12d,r10d
+ add eax,DWORD[44+rsp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ rorx edi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ andn r12d,r9d,r11d
+ xor r13d,edi
+ rorx r14d,r9d,6
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov edi,ebx
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor edi,ecx
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,ecx
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ mov rdi,QWORD[512+rsp]
+ add eax,r14d
+
+ lea rbp,[448+rsp]
+
+ add eax,DWORD[rdi]
+ add ebx,DWORD[4+rdi]
+ add ecx,DWORD[8+rdi]
+ add edx,DWORD[12+rdi]
+ add r8d,DWORD[16+rdi]
+ add r9d,DWORD[20+rdi]
+ add r10d,DWORD[24+rdi]
+ add r11d,DWORD[28+rdi]
+
+ mov DWORD[rdi],eax
+ mov DWORD[4+rdi],ebx
+ mov DWORD[8+rdi],ecx
+ mov DWORD[12+rdi],edx
+ mov DWORD[16+rdi],r8d
+ mov DWORD[20+rdi],r9d
+ mov DWORD[24+rdi],r10d
+ mov DWORD[28+rdi],r11d
+
+ cmp rsi,QWORD[80+rbp]
+ je NEAR $L$done_avx2
+
+ xor r14d,r14d
+ mov edi,ebx
+ xor edi,ecx
+ mov r12d,r9d
+ jmp NEAR $L$ower_avx2
+ALIGN 16
+$L$ower_avx2:
+ add r11d,DWORD[((0+16))+rbp]
+ and r12d,r8d
+ rorx r13d,r8d,25
+ rorx r15d,r8d,11
+ lea eax,[r14*1+rax]
+ lea r11d,[r12*1+r11]
+ andn r12d,r8d,r10d
+ xor r13d,r15d
+ rorx r14d,r8d,6
+ lea r11d,[r12*1+r11]
+ xor r13d,r14d
+ mov r15d,eax
+ rorx r12d,eax,22
+ lea r11d,[r13*1+r11]
+ xor r15d,ebx
+ rorx r14d,eax,13
+ rorx r13d,eax,2
+ lea edx,[r11*1+rdx]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,ebx
+ xor r14d,r13d
+ lea r11d,[rdi*1+r11]
+ mov r12d,r8d
+ add r10d,DWORD[((4+16))+rbp]
+ and r12d,edx
+ rorx r13d,edx,25
+ rorx edi,edx,11
+ lea r11d,[r14*1+r11]
+ lea r10d,[r12*1+r10]
+ andn r12d,edx,r9d
+ xor r13d,edi
+ rorx r14d,edx,6
+ lea r10d,[r12*1+r10]
+ xor r13d,r14d
+ mov edi,r11d
+ rorx r12d,r11d,22
+ lea r10d,[r13*1+r10]
+ xor edi,eax
+ rorx r14d,r11d,13
+ rorx r13d,r11d,2
+ lea ecx,[r10*1+rcx]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,eax
+ xor r14d,r13d
+ lea r10d,[r15*1+r10]
+ mov r12d,edx
+ add r9d,DWORD[((8+16))+rbp]
+ and r12d,ecx
+ rorx r13d,ecx,25
+ rorx r15d,ecx,11
+ lea r10d,[r14*1+r10]
+ lea r9d,[r12*1+r9]
+ andn r12d,ecx,r8d
+ xor r13d,r15d
+ rorx r14d,ecx,6
+ lea r9d,[r12*1+r9]
+ xor r13d,r14d
+ mov r15d,r10d
+ rorx r12d,r10d,22
+ lea r9d,[r13*1+r9]
+ xor r15d,r11d
+ rorx r14d,r10d,13
+ rorx r13d,r10d,2
+ lea ebx,[r9*1+rbx]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r11d
+ xor r14d,r13d
+ lea r9d,[rdi*1+r9]
+ mov r12d,ecx
+ add r8d,DWORD[((12+16))+rbp]
+ and r12d,ebx
+ rorx r13d,ebx,25
+ rorx edi,ebx,11
+ lea r9d,[r14*1+r9]
+ lea r8d,[r12*1+r8]
+ andn r12d,ebx,edx
+ xor r13d,edi
+ rorx r14d,ebx,6
+ lea r8d,[r12*1+r8]
+ xor r13d,r14d
+ mov edi,r9d
+ rorx r12d,r9d,22
+ lea r8d,[r13*1+r8]
+ xor edi,r10d
+ rorx r14d,r9d,13
+ rorx r13d,r9d,2
+ lea eax,[r8*1+rax]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r10d
+ xor r14d,r13d
+ lea r8d,[r15*1+r8]
+ mov r12d,ebx
+ add edx,DWORD[((32+16))+rbp]
+ and r12d,eax
+ rorx r13d,eax,25
+ rorx r15d,eax,11
+ lea r8d,[r14*1+r8]
+ lea edx,[r12*1+rdx]
+ andn r12d,eax,ecx
+ xor r13d,r15d
+ rorx r14d,eax,6
+ lea edx,[r12*1+rdx]
+ xor r13d,r14d
+ mov r15d,r8d
+ rorx r12d,r8d,22
+ lea edx,[r13*1+rdx]
+ xor r15d,r9d
+ rorx r14d,r8d,13
+ rorx r13d,r8d,2
+ lea r11d,[rdx*1+r11]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,r9d
+ xor r14d,r13d
+ lea edx,[rdi*1+rdx]
+ mov r12d,eax
+ add ecx,DWORD[((36+16))+rbp]
+ and r12d,r11d
+ rorx r13d,r11d,25
+ rorx edi,r11d,11
+ lea edx,[r14*1+rdx]
+ lea ecx,[r12*1+rcx]
+ andn r12d,r11d,ebx
+ xor r13d,edi
+ rorx r14d,r11d,6
+ lea ecx,[r12*1+rcx]
+ xor r13d,r14d
+ mov edi,edx
+ rorx r12d,edx,22
+ lea ecx,[r13*1+rcx]
+ xor edi,r8d
+ rorx r14d,edx,13
+ rorx r13d,edx,2
+ lea r10d,[rcx*1+r10]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,r8d
+ xor r14d,r13d
+ lea ecx,[r15*1+rcx]
+ mov r12d,r11d
+ add ebx,DWORD[((40+16))+rbp]
+ and r12d,r10d
+ rorx r13d,r10d,25
+ rorx r15d,r10d,11
+ lea ecx,[r14*1+rcx]
+ lea ebx,[r12*1+rbx]
+ andn r12d,r10d,eax
+ xor r13d,r15d
+ rorx r14d,r10d,6
+ lea ebx,[r12*1+rbx]
+ xor r13d,r14d
+ mov r15d,ecx
+ rorx r12d,ecx,22
+ lea ebx,[r13*1+rbx]
+ xor r15d,edx
+ rorx r14d,ecx,13
+ rorx r13d,ecx,2
+ lea r9d,[rbx*1+r9]
+ and edi,r15d
+ xor r14d,r12d
+ xor edi,edx
+ xor r14d,r13d
+ lea ebx,[rdi*1+rbx]
+ mov r12d,r10d
+ add eax,DWORD[((44+16))+rbp]
+ and r12d,r9d
+ rorx r13d,r9d,25
+ rorx edi,r9d,11
+ lea ebx,[r14*1+rbx]
+ lea eax,[r12*1+rax]
+ andn r12d,r9d,r11d
+ xor r13d,edi
+ rorx r14d,r9d,6
+ lea eax,[r12*1+rax]
+ xor r13d,r14d
+ mov edi,ebx
+ rorx r12d,ebx,22
+ lea eax,[r13*1+rax]
+ xor edi,ecx
+ rorx r14d,ebx,13
+ rorx r13d,ebx,2
+ lea r8d,[rax*1+r8]
+ and r15d,edi
+ xor r14d,r12d
+ xor r15d,ecx
+ xor r14d,r13d
+ lea eax,[r15*1+rax]
+ mov r12d,r9d
+ lea rbp,[((-64))+rbp]
+ cmp rbp,rsp
+ jae NEAR $L$ower_avx2
+
+ mov rdi,QWORD[512+rsp]
+ add eax,r14d
+
+ lea rsp,[448+rsp]
+
+ add eax,DWORD[rdi]
+ add ebx,DWORD[4+rdi]
+ add ecx,DWORD[8+rdi]
+ add edx,DWORD[12+rdi]
+ add r8d,DWORD[16+rdi]
+ add r9d,DWORD[20+rdi]
+ lea rsi,[128+rsi]
+ add r10d,DWORD[24+rdi]
+ mov r12,rsi
+ add r11d,DWORD[28+rdi]
+ cmp rsi,QWORD[((64+16))+rsp]
+
+ mov DWORD[rdi],eax
+ cmove r12,rsp
+ mov DWORD[4+rdi],ebx
+ mov DWORD[8+rdi],ecx
+ mov DWORD[12+rdi],edx
+ mov DWORD[16+rdi],r8d
+ mov DWORD[20+rdi],r9d
+ mov DWORD[24+rdi],r10d
+ mov DWORD[28+rdi],r11d
+
+ jbe NEAR $L$oop_avx2
+ lea rbp,[rsp]
+
+$L$done_avx2:
+ lea rsp,[rbp]
+ mov rsi,QWORD[88+rsp]
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((64+32))+rsp]
+ movaps xmm7,XMMWORD[((64+48))+rsp]
+ movaps xmm8,XMMWORD[((64+64))+rsp]
+ movaps xmm9,XMMWORD[((64+80))+rsp]
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue_avx2:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha256_block_data_order_avx2:
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+ lea r10,[$L$avx2_shortcut]
+ cmp rbx,r10
+ jb NEAR $L$not_in_avx2
+
+ and rax,-256*4
+ add rax,448
+$L$not_in_avx2:
+ mov rsi,rax
+ mov rax,QWORD[((64+24))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+ lea r10,[$L$epilogue]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ lea rsi,[((64+32))+rsi]
+ lea rdi,[512+r8]
+ mov ecx,8
+ DD 0xa548f3fc
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+
+ALIGN 16
+shaext_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ lea r10,[$L$prologue_shaext]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ lea r10,[$L$epilogue_shaext]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+
+ lea rsi,[((-8-80))+rax]
+ lea rdi,[512+r8]
+ mov ecx,10
+ DD 0xa548f3fc
+
+ jmp NEAR $L$in_prologue
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_sha256_block_data_order wrt ..imagebase
+ DD $L$SEH_end_sha256_block_data_order wrt ..imagebase
+ DD $L$SEH_info_sha256_block_data_order wrt ..imagebase
+ DD $L$SEH_begin_sha256_block_data_order_shaext wrt ..imagebase
+ DD $L$SEH_end_sha256_block_data_order_shaext wrt ..imagebase
+ DD $L$SEH_info_sha256_block_data_order_shaext wrt ..imagebase
+ DD $L$SEH_begin_sha256_block_data_order_ssse3 wrt ..imagebase
+ DD $L$SEH_end_sha256_block_data_order_ssse3 wrt ..imagebase
+ DD $L$SEH_info_sha256_block_data_order_ssse3 wrt ..imagebase
+ DD $L$SEH_begin_sha256_block_data_order_avx wrt ..imagebase
+ DD $L$SEH_end_sha256_block_data_order_avx wrt ..imagebase
+ DD $L$SEH_info_sha256_block_data_order_avx wrt ..imagebase
+ DD $L$SEH_begin_sha256_block_data_order_avx2 wrt ..imagebase
+ DD $L$SEH_end_sha256_block_data_order_avx2 wrt ..imagebase
+ DD $L$SEH_info_sha256_block_data_order_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_sha256_block_data_order:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue wrt ..imagebase,$L$epilogue wrt ..imagebase
+$L$SEH_info_sha256_block_data_order_shaext:
+DB 9,0,0,0
+ DD shaext_handler wrt ..imagebase
+$L$SEH_info_sha256_block_data_order_ssse3:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_ssse3 wrt ..imagebase,$L$epilogue_ssse3 wrt ..imagebase
+$L$SEH_info_sha256_block_data_order_avx:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha256_block_data_order_avx2:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm
new file mode 100644
index 0000000000..6d48b93b84
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm
@@ -0,0 +1,5668 @@
+; Copyright 2005-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN OPENSSL_ia32cap_P
+global sha512_block_data_order
+
+ALIGN 16
+sha512_block_data_order:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha512_block_data_order:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+ lea r11,[OPENSSL_ia32cap_P]
+ mov r9d,DWORD[r11]
+ mov r10d,DWORD[4+r11]
+ mov r11d,DWORD[8+r11]
+ test r10d,2048
+ jnz NEAR $L$xop_shortcut
+ and r11d,296
+ cmp r11d,296
+ je NEAR $L$avx2_shortcut
+ and r9d,1073741824
+ and r10d,268435968
+ or r10d,r9d
+ cmp r10d,1342177792
+ je NEAR $L$avx_shortcut
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ shl rdx,4
+ sub rsp,16*8+4*8
+ lea rdx,[rdx*8+rsi]
+ and rsp,-64
+ mov QWORD[((128+0))+rsp],rdi
+ mov QWORD[((128+8))+rsp],rsi
+ mov QWORD[((128+16))+rsp],rdx
+ mov QWORD[152+rsp],rax
+
+$L$prologue:
+
+ mov rax,QWORD[rdi]
+ mov rbx,QWORD[8+rdi]
+ mov rcx,QWORD[16+rdi]
+ mov rdx,QWORD[24+rdi]
+ mov r8,QWORD[32+rdi]
+ mov r9,QWORD[40+rdi]
+ mov r10,QWORD[48+rdi]
+ mov r11,QWORD[56+rdi]
+ jmp NEAR $L$loop
+
+ALIGN 16
+$L$loop:
+ mov rdi,rbx
+ lea rbp,[K512]
+ xor rdi,rcx
+ mov r12,QWORD[rsi]
+ mov r13,r8
+ mov r14,rax
+ bswap r12
+ ror r13,23
+ mov r15,r9
+
+ xor r13,r8
+ ror r14,5
+ xor r15,r10
+
+ mov QWORD[rsp],r12
+ xor r14,rax
+ and r15,r8
+
+ ror r13,4
+ add r12,r11
+ xor r15,r10
+
+ ror r14,6
+ xor r13,r8
+ add r12,r15
+
+ mov r15,rax
+ add r12,QWORD[rbp]
+ xor r14,rax
+
+ xor r15,rbx
+ ror r13,14
+ mov r11,rbx
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor r11,rdi
+ add rdx,r12
+ add r11,r12
+
+ lea rbp,[8+rbp]
+ add r11,r14
+ mov r12,QWORD[8+rsi]
+ mov r13,rdx
+ mov r14,r11
+ bswap r12
+ ror r13,23
+ mov rdi,r8
+
+ xor r13,rdx
+ ror r14,5
+ xor rdi,r9
+
+ mov QWORD[8+rsp],r12
+ xor r14,r11
+ and rdi,rdx
+
+ ror r13,4
+ add r12,r10
+ xor rdi,r9
+
+ ror r14,6
+ xor r13,rdx
+ add r12,rdi
+
+ mov rdi,r11
+ add r12,QWORD[rbp]
+ xor r14,r11
+
+ xor rdi,rax
+ ror r13,14
+ mov r10,rax
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor r10,r15
+ add rcx,r12
+ add r10,r12
+
+ lea rbp,[24+rbp]
+ add r10,r14
+ mov r12,QWORD[16+rsi]
+ mov r13,rcx
+ mov r14,r10
+ bswap r12
+ ror r13,23
+ mov r15,rdx
+
+ xor r13,rcx
+ ror r14,5
+ xor r15,r8
+
+ mov QWORD[16+rsp],r12
+ xor r14,r10
+ and r15,rcx
+
+ ror r13,4
+ add r12,r9
+ xor r15,r8
+
+ ror r14,6
+ xor r13,rcx
+ add r12,r15
+
+ mov r15,r10
+ add r12,QWORD[rbp]
+ xor r14,r10
+
+ xor r15,r11
+ ror r13,14
+ mov r9,r11
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor r9,rdi
+ add rbx,r12
+ add r9,r12
+
+ lea rbp,[8+rbp]
+ add r9,r14
+ mov r12,QWORD[24+rsi]
+ mov r13,rbx
+ mov r14,r9
+ bswap r12
+ ror r13,23
+ mov rdi,rcx
+
+ xor r13,rbx
+ ror r14,5
+ xor rdi,rdx
+
+ mov QWORD[24+rsp],r12
+ xor r14,r9
+ and rdi,rbx
+
+ ror r13,4
+ add r12,r8
+ xor rdi,rdx
+
+ ror r14,6
+ xor r13,rbx
+ add r12,rdi
+
+ mov rdi,r9
+ add r12,QWORD[rbp]
+ xor r14,r9
+
+ xor rdi,r10
+ ror r13,14
+ mov r8,r10
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor r8,r15
+ add rax,r12
+ add r8,r12
+
+ lea rbp,[24+rbp]
+ add r8,r14
+ mov r12,QWORD[32+rsi]
+ mov r13,rax
+ mov r14,r8
+ bswap r12
+ ror r13,23
+ mov r15,rbx
+
+ xor r13,rax
+ ror r14,5
+ xor r15,rcx
+
+ mov QWORD[32+rsp],r12
+ xor r14,r8
+ and r15,rax
+
+ ror r13,4
+ add r12,rdx
+ xor r15,rcx
+
+ ror r14,6
+ xor r13,rax
+ add r12,r15
+
+ mov r15,r8
+ add r12,QWORD[rbp]
+ xor r14,r8
+
+ xor r15,r9
+ ror r13,14
+ mov rdx,r9
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor rdx,rdi
+ add r11,r12
+ add rdx,r12
+
+ lea rbp,[8+rbp]
+ add rdx,r14
+ mov r12,QWORD[40+rsi]
+ mov r13,r11
+ mov r14,rdx
+ bswap r12
+ ror r13,23
+ mov rdi,rax
+
+ xor r13,r11
+ ror r14,5
+ xor rdi,rbx
+
+ mov QWORD[40+rsp],r12
+ xor r14,rdx
+ and rdi,r11
+
+ ror r13,4
+ add r12,rcx
+ xor rdi,rbx
+
+ ror r14,6
+ xor r13,r11
+ add r12,rdi
+
+ mov rdi,rdx
+ add r12,QWORD[rbp]
+ xor r14,rdx
+
+ xor rdi,r8
+ ror r13,14
+ mov rcx,r8
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor rcx,r15
+ add r10,r12
+ add rcx,r12
+
+ lea rbp,[24+rbp]
+ add rcx,r14
+ mov r12,QWORD[48+rsi]
+ mov r13,r10
+ mov r14,rcx
+ bswap r12
+ ror r13,23
+ mov r15,r11
+
+ xor r13,r10
+ ror r14,5
+ xor r15,rax
+
+ mov QWORD[48+rsp],r12
+ xor r14,rcx
+ and r15,r10
+
+ ror r13,4
+ add r12,rbx
+ xor r15,rax
+
+ ror r14,6
+ xor r13,r10
+ add r12,r15
+
+ mov r15,rcx
+ add r12,QWORD[rbp]
+ xor r14,rcx
+
+ xor r15,rdx
+ ror r13,14
+ mov rbx,rdx
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor rbx,rdi
+ add r9,r12
+ add rbx,r12
+
+ lea rbp,[8+rbp]
+ add rbx,r14
+ mov r12,QWORD[56+rsi]
+ mov r13,r9
+ mov r14,rbx
+ bswap r12
+ ror r13,23
+ mov rdi,r10
+
+ xor r13,r9
+ ror r14,5
+ xor rdi,r11
+
+ mov QWORD[56+rsp],r12
+ xor r14,rbx
+ and rdi,r9
+
+ ror r13,4
+ add r12,rax
+ xor rdi,r11
+
+ ror r14,6
+ xor r13,r9
+ add r12,rdi
+
+ mov rdi,rbx
+ add r12,QWORD[rbp]
+ xor r14,rbx
+
+ xor rdi,rcx
+ ror r13,14
+ mov rax,rcx
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor rax,r15
+ add r8,r12
+ add rax,r12
+
+ lea rbp,[24+rbp]
+ add rax,r14
+ mov r12,QWORD[64+rsi]
+ mov r13,r8
+ mov r14,rax
+ bswap r12
+ ror r13,23
+ mov r15,r9
+
+ xor r13,r8
+ ror r14,5
+ xor r15,r10
+
+ mov QWORD[64+rsp],r12
+ xor r14,rax
+ and r15,r8
+
+ ror r13,4
+ add r12,r11
+ xor r15,r10
+
+ ror r14,6
+ xor r13,r8
+ add r12,r15
+
+ mov r15,rax
+ add r12,QWORD[rbp]
+ xor r14,rax
+
+ xor r15,rbx
+ ror r13,14
+ mov r11,rbx
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor r11,rdi
+ add rdx,r12
+ add r11,r12
+
+ lea rbp,[8+rbp]
+ add r11,r14
+ mov r12,QWORD[72+rsi]
+ mov r13,rdx
+ mov r14,r11
+ bswap r12
+ ror r13,23
+ mov rdi,r8
+
+ xor r13,rdx
+ ror r14,5
+ xor rdi,r9
+
+ mov QWORD[72+rsp],r12
+ xor r14,r11
+ and rdi,rdx
+
+ ror r13,4
+ add r12,r10
+ xor rdi,r9
+
+ ror r14,6
+ xor r13,rdx
+ add r12,rdi
+
+ mov rdi,r11
+ add r12,QWORD[rbp]
+ xor r14,r11
+
+ xor rdi,rax
+ ror r13,14
+ mov r10,rax
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor r10,r15
+ add rcx,r12
+ add r10,r12
+
+ lea rbp,[24+rbp]
+ add r10,r14
+ mov r12,QWORD[80+rsi]
+ mov r13,rcx
+ mov r14,r10
+ bswap r12
+ ror r13,23
+ mov r15,rdx
+
+ xor r13,rcx
+ ror r14,5
+ xor r15,r8
+
+ mov QWORD[80+rsp],r12
+ xor r14,r10
+ and r15,rcx
+
+ ror r13,4
+ add r12,r9
+ xor r15,r8
+
+ ror r14,6
+ xor r13,rcx
+ add r12,r15
+
+ mov r15,r10
+ add r12,QWORD[rbp]
+ xor r14,r10
+
+ xor r15,r11
+ ror r13,14
+ mov r9,r11
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor r9,rdi
+ add rbx,r12
+ add r9,r12
+
+ lea rbp,[8+rbp]
+ add r9,r14
+ mov r12,QWORD[88+rsi]
+ mov r13,rbx
+ mov r14,r9
+ bswap r12
+ ror r13,23
+ mov rdi,rcx
+
+ xor r13,rbx
+ ror r14,5
+ xor rdi,rdx
+
+ mov QWORD[88+rsp],r12
+ xor r14,r9
+ and rdi,rbx
+
+ ror r13,4
+ add r12,r8
+ xor rdi,rdx
+
+ ror r14,6
+ xor r13,rbx
+ add r12,rdi
+
+ mov rdi,r9
+ add r12,QWORD[rbp]
+ xor r14,r9
+
+ xor rdi,r10
+ ror r13,14
+ mov r8,r10
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor r8,r15
+ add rax,r12
+ add r8,r12
+
+ lea rbp,[24+rbp]
+ add r8,r14
+ mov r12,QWORD[96+rsi]
+ mov r13,rax
+ mov r14,r8
+ bswap r12
+ ror r13,23
+ mov r15,rbx
+
+ xor r13,rax
+ ror r14,5
+ xor r15,rcx
+
+ mov QWORD[96+rsp],r12
+ xor r14,r8
+ and r15,rax
+
+ ror r13,4
+ add r12,rdx
+ xor r15,rcx
+
+ ror r14,6
+ xor r13,rax
+ add r12,r15
+
+ mov r15,r8
+ add r12,QWORD[rbp]
+ xor r14,r8
+
+ xor r15,r9
+ ror r13,14
+ mov rdx,r9
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor rdx,rdi
+ add r11,r12
+ add rdx,r12
+
+ lea rbp,[8+rbp]
+ add rdx,r14
+ mov r12,QWORD[104+rsi]
+ mov r13,r11
+ mov r14,rdx
+ bswap r12
+ ror r13,23
+ mov rdi,rax
+
+ xor r13,r11
+ ror r14,5
+ xor rdi,rbx
+
+ mov QWORD[104+rsp],r12
+ xor r14,rdx
+ and rdi,r11
+
+ ror r13,4
+ add r12,rcx
+ xor rdi,rbx
+
+ ror r14,6
+ xor r13,r11
+ add r12,rdi
+
+ mov rdi,rdx
+ add r12,QWORD[rbp]
+ xor r14,rdx
+
+ xor rdi,r8
+ ror r13,14
+ mov rcx,r8
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor rcx,r15
+ add r10,r12
+ add rcx,r12
+
+ lea rbp,[24+rbp]
+ add rcx,r14
+ mov r12,QWORD[112+rsi]
+ mov r13,r10
+ mov r14,rcx
+ bswap r12
+ ror r13,23
+ mov r15,r11
+
+ xor r13,r10
+ ror r14,5
+ xor r15,rax
+
+ mov QWORD[112+rsp],r12
+ xor r14,rcx
+ and r15,r10
+
+ ror r13,4
+ add r12,rbx
+ xor r15,rax
+
+ ror r14,6
+ xor r13,r10
+ add r12,r15
+
+ mov r15,rcx
+ add r12,QWORD[rbp]
+ xor r14,rcx
+
+ xor r15,rdx
+ ror r13,14
+ mov rbx,rdx
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor rbx,rdi
+ add r9,r12
+ add rbx,r12
+
+ lea rbp,[8+rbp]
+ add rbx,r14
+ mov r12,QWORD[120+rsi]
+ mov r13,r9
+ mov r14,rbx
+ bswap r12
+ ror r13,23
+ mov rdi,r10
+
+ xor r13,r9
+ ror r14,5
+ xor rdi,r11
+
+ mov QWORD[120+rsp],r12
+ xor r14,rbx
+ and rdi,r9
+
+ ror r13,4
+ add r12,rax
+ xor rdi,r11
+
+ ror r14,6
+ xor r13,r9
+ add r12,rdi
+
+ mov rdi,rbx
+ add r12,QWORD[rbp]
+ xor r14,rbx
+
+ xor rdi,rcx
+ ror r13,14
+ mov rax,rcx
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor rax,r15
+ add r8,r12
+ add rax,r12
+
+ lea rbp,[24+rbp]
+ jmp NEAR $L$rounds_16_xx
+ALIGN 16
+$L$rounds_16_xx:
+ mov r13,QWORD[8+rsp]
+ mov r15,QWORD[112+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add rax,r14
+ mov r14,r15
+ ror r15,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor r15,r14
+ shr r14,6
+
+ ror r15,19
+ xor r12,r13
+ xor r15,r14
+ add r12,QWORD[72+rsp]
+
+ add r12,QWORD[rsp]
+ mov r13,r8
+ add r12,r15
+ mov r14,rax
+ ror r13,23
+ mov r15,r9
+
+ xor r13,r8
+ ror r14,5
+ xor r15,r10
+
+ mov QWORD[rsp],r12
+ xor r14,rax
+ and r15,r8
+
+ ror r13,4
+ add r12,r11
+ xor r15,r10
+
+ ror r14,6
+ xor r13,r8
+ add r12,r15
+
+ mov r15,rax
+ add r12,QWORD[rbp]
+ xor r14,rax
+
+ xor r15,rbx
+ ror r13,14
+ mov r11,rbx
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor r11,rdi
+ add rdx,r12
+ add r11,r12
+
+ lea rbp,[8+rbp]
+ mov r13,QWORD[16+rsp]
+ mov rdi,QWORD[120+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add r11,r14
+ mov r14,rdi
+ ror rdi,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor rdi,r14
+ shr r14,6
+
+ ror rdi,19
+ xor r12,r13
+ xor rdi,r14
+ add r12,QWORD[80+rsp]
+
+ add r12,QWORD[8+rsp]
+ mov r13,rdx
+ add r12,rdi
+ mov r14,r11
+ ror r13,23
+ mov rdi,r8
+
+ xor r13,rdx
+ ror r14,5
+ xor rdi,r9
+
+ mov QWORD[8+rsp],r12
+ xor r14,r11
+ and rdi,rdx
+
+ ror r13,4
+ add r12,r10
+ xor rdi,r9
+
+ ror r14,6
+ xor r13,rdx
+ add r12,rdi
+
+ mov rdi,r11
+ add r12,QWORD[rbp]
+ xor r14,r11
+
+ xor rdi,rax
+ ror r13,14
+ mov r10,rax
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor r10,r15
+ add rcx,r12
+ add r10,r12
+
+ lea rbp,[24+rbp]
+ mov r13,QWORD[24+rsp]
+ mov r15,QWORD[rsp]
+
+ mov r12,r13
+ ror r13,7
+ add r10,r14
+ mov r14,r15
+ ror r15,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor r15,r14
+ shr r14,6
+
+ ror r15,19
+ xor r12,r13
+ xor r15,r14
+ add r12,QWORD[88+rsp]
+
+ add r12,QWORD[16+rsp]
+ mov r13,rcx
+ add r12,r15
+ mov r14,r10
+ ror r13,23
+ mov r15,rdx
+
+ xor r13,rcx
+ ror r14,5
+ xor r15,r8
+
+ mov QWORD[16+rsp],r12
+ xor r14,r10
+ and r15,rcx
+
+ ror r13,4
+ add r12,r9
+ xor r15,r8
+
+ ror r14,6
+ xor r13,rcx
+ add r12,r15
+
+ mov r15,r10
+ add r12,QWORD[rbp]
+ xor r14,r10
+
+ xor r15,r11
+ ror r13,14
+ mov r9,r11
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor r9,rdi
+ add rbx,r12
+ add r9,r12
+
+ lea rbp,[8+rbp]
+ mov r13,QWORD[32+rsp]
+ mov rdi,QWORD[8+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add r9,r14
+ mov r14,rdi
+ ror rdi,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor rdi,r14
+ shr r14,6
+
+ ror rdi,19
+ xor r12,r13
+ xor rdi,r14
+ add r12,QWORD[96+rsp]
+
+ add r12,QWORD[24+rsp]
+ mov r13,rbx
+ add r12,rdi
+ mov r14,r9
+ ror r13,23
+ mov rdi,rcx
+
+ xor r13,rbx
+ ror r14,5
+ xor rdi,rdx
+
+ mov QWORD[24+rsp],r12
+ xor r14,r9
+ and rdi,rbx
+
+ ror r13,4
+ add r12,r8
+ xor rdi,rdx
+
+ ror r14,6
+ xor r13,rbx
+ add r12,rdi
+
+ mov rdi,r9
+ add r12,QWORD[rbp]
+ xor r14,r9
+
+ xor rdi,r10
+ ror r13,14
+ mov r8,r10
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor r8,r15
+ add rax,r12
+ add r8,r12
+
+ lea rbp,[24+rbp]
+ mov r13,QWORD[40+rsp]
+ mov r15,QWORD[16+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add r8,r14
+ mov r14,r15
+ ror r15,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor r15,r14
+ shr r14,6
+
+ ror r15,19
+ xor r12,r13
+ xor r15,r14
+ add r12,QWORD[104+rsp]
+
+ add r12,QWORD[32+rsp]
+ mov r13,rax
+ add r12,r15
+ mov r14,r8
+ ror r13,23
+ mov r15,rbx
+
+ xor r13,rax
+ ror r14,5
+ xor r15,rcx
+
+ mov QWORD[32+rsp],r12
+ xor r14,r8
+ and r15,rax
+
+ ror r13,4
+ add r12,rdx
+ xor r15,rcx
+
+ ror r14,6
+ xor r13,rax
+ add r12,r15
+
+ mov r15,r8
+ add r12,QWORD[rbp]
+ xor r14,r8
+
+ xor r15,r9
+ ror r13,14
+ mov rdx,r9
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor rdx,rdi
+ add r11,r12
+ add rdx,r12
+
+ lea rbp,[8+rbp]
+ mov r13,QWORD[48+rsp]
+ mov rdi,QWORD[24+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add rdx,r14
+ mov r14,rdi
+ ror rdi,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor rdi,r14
+ shr r14,6
+
+ ror rdi,19
+ xor r12,r13
+ xor rdi,r14
+ add r12,QWORD[112+rsp]
+
+ add r12,QWORD[40+rsp]
+ mov r13,r11
+ add r12,rdi
+ mov r14,rdx
+ ror r13,23
+ mov rdi,rax
+
+ xor r13,r11
+ ror r14,5
+ xor rdi,rbx
+
+ mov QWORD[40+rsp],r12
+ xor r14,rdx
+ and rdi,r11
+
+ ror r13,4
+ add r12,rcx
+ xor rdi,rbx
+
+ ror r14,6
+ xor r13,r11
+ add r12,rdi
+
+ mov rdi,rdx
+ add r12,QWORD[rbp]
+ xor r14,rdx
+
+ xor rdi,r8
+ ror r13,14
+ mov rcx,r8
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor rcx,r15
+ add r10,r12
+ add rcx,r12
+
+ lea rbp,[24+rbp]
+ mov r13,QWORD[56+rsp]
+ mov r15,QWORD[32+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add rcx,r14
+ mov r14,r15
+ ror r15,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor r15,r14
+ shr r14,6
+
+ ror r15,19
+ xor r12,r13
+ xor r15,r14
+ add r12,QWORD[120+rsp]
+
+ add r12,QWORD[48+rsp]
+ mov r13,r10
+ add r12,r15
+ mov r14,rcx
+ ror r13,23
+ mov r15,r11
+
+ xor r13,r10
+ ror r14,5
+ xor r15,rax
+
+ mov QWORD[48+rsp],r12
+ xor r14,rcx
+ and r15,r10
+
+ ror r13,4
+ add r12,rbx
+ xor r15,rax
+
+ ror r14,6
+ xor r13,r10
+ add r12,r15
+
+ mov r15,rcx
+ add r12,QWORD[rbp]
+ xor r14,rcx
+
+ xor r15,rdx
+ ror r13,14
+ mov rbx,rdx
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor rbx,rdi
+ add r9,r12
+ add rbx,r12
+
+ lea rbp,[8+rbp]
+ mov r13,QWORD[64+rsp]
+ mov rdi,QWORD[40+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add rbx,r14
+ mov r14,rdi
+ ror rdi,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor rdi,r14
+ shr r14,6
+
+ ror rdi,19
+ xor r12,r13
+ xor rdi,r14
+ add r12,QWORD[rsp]
+
+ add r12,QWORD[56+rsp]
+ mov r13,r9
+ add r12,rdi
+ mov r14,rbx
+ ror r13,23
+ mov rdi,r10
+
+ xor r13,r9
+ ror r14,5
+ xor rdi,r11
+
+ mov QWORD[56+rsp],r12
+ xor r14,rbx
+ and rdi,r9
+
+ ror r13,4
+ add r12,rax
+ xor rdi,r11
+
+ ror r14,6
+ xor r13,r9
+ add r12,rdi
+
+ mov rdi,rbx
+ add r12,QWORD[rbp]
+ xor r14,rbx
+
+ xor rdi,rcx
+ ror r13,14
+ mov rax,rcx
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor rax,r15
+ add r8,r12
+ add rax,r12
+
+ lea rbp,[24+rbp]
+ mov r13,QWORD[72+rsp]
+ mov r15,QWORD[48+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add rax,r14
+ mov r14,r15
+ ror r15,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor r15,r14
+ shr r14,6
+
+ ror r15,19
+ xor r12,r13
+ xor r15,r14
+ add r12,QWORD[8+rsp]
+
+ add r12,QWORD[64+rsp]
+ mov r13,r8
+ add r12,r15
+ mov r14,rax
+ ror r13,23
+ mov r15,r9
+
+ xor r13,r8
+ ror r14,5
+ xor r15,r10
+
+ mov QWORD[64+rsp],r12
+ xor r14,rax
+ and r15,r8
+
+ ror r13,4
+ add r12,r11
+ xor r15,r10
+
+ ror r14,6
+ xor r13,r8
+ add r12,r15
+
+ mov r15,rax
+ add r12,QWORD[rbp]
+ xor r14,rax
+
+ xor r15,rbx
+ ror r13,14
+ mov r11,rbx
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor r11,rdi
+ add rdx,r12
+ add r11,r12
+
+ lea rbp,[8+rbp]
+ mov r13,QWORD[80+rsp]
+ mov rdi,QWORD[56+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add r11,r14
+ mov r14,rdi
+ ror rdi,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor rdi,r14
+ shr r14,6
+
+ ror rdi,19
+ xor r12,r13
+ xor rdi,r14
+ add r12,QWORD[16+rsp]
+
+ add r12,QWORD[72+rsp]
+ mov r13,rdx
+ add r12,rdi
+ mov r14,r11
+ ror r13,23
+ mov rdi,r8
+
+ xor r13,rdx
+ ror r14,5
+ xor rdi,r9
+
+ mov QWORD[72+rsp],r12
+ xor r14,r11
+ and rdi,rdx
+
+ ror r13,4
+ add r12,r10
+ xor rdi,r9
+
+ ror r14,6
+ xor r13,rdx
+ add r12,rdi
+
+ mov rdi,r11
+ add r12,QWORD[rbp]
+ xor r14,r11
+
+ xor rdi,rax
+ ror r13,14
+ mov r10,rax
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor r10,r15
+ add rcx,r12
+ add r10,r12
+
+ lea rbp,[24+rbp]
+ mov r13,QWORD[88+rsp]
+ mov r15,QWORD[64+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add r10,r14
+ mov r14,r15
+ ror r15,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor r15,r14
+ shr r14,6
+
+ ror r15,19
+ xor r12,r13
+ xor r15,r14
+ add r12,QWORD[24+rsp]
+
+ add r12,QWORD[80+rsp]
+ mov r13,rcx
+ add r12,r15
+ mov r14,r10
+ ror r13,23
+ mov r15,rdx
+
+ xor r13,rcx
+ ror r14,5
+ xor r15,r8
+
+ mov QWORD[80+rsp],r12
+ xor r14,r10
+ and r15,rcx
+
+ ror r13,4
+ add r12,r9
+ xor r15,r8
+
+ ror r14,6
+ xor r13,rcx
+ add r12,r15
+
+ mov r15,r10
+ add r12,QWORD[rbp]
+ xor r14,r10
+
+ xor r15,r11
+ ror r13,14
+ mov r9,r11
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor r9,rdi
+ add rbx,r12
+ add r9,r12
+
+ lea rbp,[8+rbp]
+ mov r13,QWORD[96+rsp]
+ mov rdi,QWORD[72+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add r9,r14
+ mov r14,rdi
+ ror rdi,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor rdi,r14
+ shr r14,6
+
+ ror rdi,19
+ xor r12,r13
+ xor rdi,r14
+ add r12,QWORD[32+rsp]
+
+ add r12,QWORD[88+rsp]
+ mov r13,rbx
+ add r12,rdi
+ mov r14,r9
+ ror r13,23
+ mov rdi,rcx
+
+ xor r13,rbx
+ ror r14,5
+ xor rdi,rdx
+
+ mov QWORD[88+rsp],r12
+ xor r14,r9
+ and rdi,rbx
+
+ ror r13,4
+ add r12,r8
+ xor rdi,rdx
+
+ ror r14,6
+ xor r13,rbx
+ add r12,rdi
+
+ mov rdi,r9
+ add r12,QWORD[rbp]
+ xor r14,r9
+
+ xor rdi,r10
+ ror r13,14
+ mov r8,r10
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor r8,r15
+ add rax,r12
+ add r8,r12
+
+ lea rbp,[24+rbp]
+ mov r13,QWORD[104+rsp]
+ mov r15,QWORD[80+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add r8,r14
+ mov r14,r15
+ ror r15,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor r15,r14
+ shr r14,6
+
+ ror r15,19
+ xor r12,r13
+ xor r15,r14
+ add r12,QWORD[40+rsp]
+
+ add r12,QWORD[96+rsp]
+ mov r13,rax
+ add r12,r15
+ mov r14,r8
+ ror r13,23
+ mov r15,rbx
+
+ xor r13,rax
+ ror r14,5
+ xor r15,rcx
+
+ mov QWORD[96+rsp],r12
+ xor r14,r8
+ and r15,rax
+
+ ror r13,4
+ add r12,rdx
+ xor r15,rcx
+
+ ror r14,6
+ xor r13,rax
+ add r12,r15
+
+ mov r15,r8
+ add r12,QWORD[rbp]
+ xor r14,r8
+
+ xor r15,r9
+ ror r13,14
+ mov rdx,r9
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor rdx,rdi
+ add r11,r12
+ add rdx,r12
+
+ lea rbp,[8+rbp]
+ mov r13,QWORD[112+rsp]
+ mov rdi,QWORD[88+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add rdx,r14
+ mov r14,rdi
+ ror rdi,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor rdi,r14
+ shr r14,6
+
+ ror rdi,19
+ xor r12,r13
+ xor rdi,r14
+ add r12,QWORD[48+rsp]
+
+ add r12,QWORD[104+rsp]
+ mov r13,r11
+ add r12,rdi
+ mov r14,rdx
+ ror r13,23
+ mov rdi,rax
+
+ xor r13,r11
+ ror r14,5
+ xor rdi,rbx
+
+ mov QWORD[104+rsp],r12
+ xor r14,rdx
+ and rdi,r11
+
+ ror r13,4
+ add r12,rcx
+ xor rdi,rbx
+
+ ror r14,6
+ xor r13,r11
+ add r12,rdi
+
+ mov rdi,rdx
+ add r12,QWORD[rbp]
+ xor r14,rdx
+
+ xor rdi,r8
+ ror r13,14
+ mov rcx,r8
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor rcx,r15
+ add r10,r12
+ add rcx,r12
+
+ lea rbp,[24+rbp]
+ mov r13,QWORD[120+rsp]
+ mov r15,QWORD[96+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add rcx,r14
+ mov r14,r15
+ ror r15,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor r15,r14
+ shr r14,6
+
+ ror r15,19
+ xor r12,r13
+ xor r15,r14
+ add r12,QWORD[56+rsp]
+
+ add r12,QWORD[112+rsp]
+ mov r13,r10
+ add r12,r15
+ mov r14,rcx
+ ror r13,23
+ mov r15,r11
+
+ xor r13,r10
+ ror r14,5
+ xor r15,rax
+
+ mov QWORD[112+rsp],r12
+ xor r14,rcx
+ and r15,r10
+
+ ror r13,4
+ add r12,rbx
+ xor r15,rax
+
+ ror r14,6
+ xor r13,r10
+ add r12,r15
+
+ mov r15,rcx
+ add r12,QWORD[rbp]
+ xor r14,rcx
+
+ xor r15,rdx
+ ror r13,14
+ mov rbx,rdx
+
+ and rdi,r15
+ ror r14,28
+ add r12,r13
+
+ xor rbx,rdi
+ add r9,r12
+ add rbx,r12
+
+ lea rbp,[8+rbp]
+ mov r13,QWORD[rsp]
+ mov rdi,QWORD[104+rsp]
+
+ mov r12,r13
+ ror r13,7
+ add rbx,r14
+ mov r14,rdi
+ ror rdi,42
+
+ xor r13,r12
+ shr r12,7
+ ror r13,1
+ xor rdi,r14
+ shr r14,6
+
+ ror rdi,19
+ xor r12,r13
+ xor rdi,r14
+ add r12,QWORD[64+rsp]
+
+ add r12,QWORD[120+rsp]
+ mov r13,r9
+ add r12,rdi
+ mov r14,rbx
+ ror r13,23
+ mov rdi,r10
+
+ xor r13,r9
+ ror r14,5
+ xor rdi,r11
+
+ mov QWORD[120+rsp],r12
+ xor r14,rbx
+ and rdi,r9
+
+ ror r13,4
+ add r12,rax
+ xor rdi,r11
+
+ ror r14,6
+ xor r13,r9
+ add r12,rdi
+
+ mov rdi,rbx
+ add r12,QWORD[rbp]
+ xor r14,rbx
+
+ xor rdi,rcx
+ ror r13,14
+ mov rax,rcx
+
+ and r15,rdi
+ ror r14,28
+ add r12,r13
+
+ xor rax,r15
+ add r8,r12
+ add rax,r12
+
+ lea rbp,[24+rbp]
+ cmp BYTE[7+rbp],0
+ jnz NEAR $L$rounds_16_xx
+
+ mov rdi,QWORD[((128+0))+rsp]
+ add rax,r14
+ lea rsi,[128+rsi]
+
+ add rax,QWORD[rdi]
+ add rbx,QWORD[8+rdi]
+ add rcx,QWORD[16+rdi]
+ add rdx,QWORD[24+rdi]
+ add r8,QWORD[32+rdi]
+ add r9,QWORD[40+rdi]
+ add r10,QWORD[48+rdi]
+ add r11,QWORD[56+rdi]
+
+ cmp rsi,QWORD[((128+16))+rsp]
+
+ mov QWORD[rdi],rax
+ mov QWORD[8+rdi],rbx
+ mov QWORD[16+rdi],rcx
+ mov QWORD[24+rdi],rdx
+ mov QWORD[32+rdi],r8
+ mov QWORD[40+rdi],r9
+ mov QWORD[48+rdi],r10
+ mov QWORD[56+rdi],r11
+ jb NEAR $L$loop
+
+ mov rsi,QWORD[152+rsp]
+
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha512_block_data_order:
+ALIGN 64
+
+K512:
+ DQ 0x428a2f98d728ae22,0x7137449123ef65cd
+ DQ 0x428a2f98d728ae22,0x7137449123ef65cd
+ DQ 0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc
+ DQ 0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc
+ DQ 0x3956c25bf348b538,0x59f111f1b605d019
+ DQ 0x3956c25bf348b538,0x59f111f1b605d019
+ DQ 0x923f82a4af194f9b,0xab1c5ed5da6d8118
+ DQ 0x923f82a4af194f9b,0xab1c5ed5da6d8118
+ DQ 0xd807aa98a3030242,0x12835b0145706fbe
+ DQ 0xd807aa98a3030242,0x12835b0145706fbe
+ DQ 0x243185be4ee4b28c,0x550c7dc3d5ffb4e2
+ DQ 0x243185be4ee4b28c,0x550c7dc3d5ffb4e2
+ DQ 0x72be5d74f27b896f,0x80deb1fe3b1696b1
+ DQ 0x72be5d74f27b896f,0x80deb1fe3b1696b1
+ DQ 0x9bdc06a725c71235,0xc19bf174cf692694
+ DQ 0x9bdc06a725c71235,0xc19bf174cf692694
+ DQ 0xe49b69c19ef14ad2,0xefbe4786384f25e3
+ DQ 0xe49b69c19ef14ad2,0xefbe4786384f25e3
+ DQ 0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65
+ DQ 0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65
+ DQ 0x2de92c6f592b0275,0x4a7484aa6ea6e483
+ DQ 0x2de92c6f592b0275,0x4a7484aa6ea6e483
+ DQ 0x5cb0a9dcbd41fbd4,0x76f988da831153b5
+ DQ 0x5cb0a9dcbd41fbd4,0x76f988da831153b5
+ DQ 0x983e5152ee66dfab,0xa831c66d2db43210
+ DQ 0x983e5152ee66dfab,0xa831c66d2db43210
+ DQ 0xb00327c898fb213f,0xbf597fc7beef0ee4
+ DQ 0xb00327c898fb213f,0xbf597fc7beef0ee4
+ DQ 0xc6e00bf33da88fc2,0xd5a79147930aa725
+ DQ 0xc6e00bf33da88fc2,0xd5a79147930aa725
+ DQ 0x06ca6351e003826f,0x142929670a0e6e70
+ DQ 0x06ca6351e003826f,0x142929670a0e6e70
+ DQ 0x27b70a8546d22ffc,0x2e1b21385c26c926
+ DQ 0x27b70a8546d22ffc,0x2e1b21385c26c926
+ DQ 0x4d2c6dfc5ac42aed,0x53380d139d95b3df
+ DQ 0x4d2c6dfc5ac42aed,0x53380d139d95b3df
+ DQ 0x650a73548baf63de,0x766a0abb3c77b2a8
+ DQ 0x650a73548baf63de,0x766a0abb3c77b2a8
+ DQ 0x81c2c92e47edaee6,0x92722c851482353b
+ DQ 0x81c2c92e47edaee6,0x92722c851482353b
+ DQ 0xa2bfe8a14cf10364,0xa81a664bbc423001
+ DQ 0xa2bfe8a14cf10364,0xa81a664bbc423001
+ DQ 0xc24b8b70d0f89791,0xc76c51a30654be30
+ DQ 0xc24b8b70d0f89791,0xc76c51a30654be30
+ DQ 0xd192e819d6ef5218,0xd69906245565a910
+ DQ 0xd192e819d6ef5218,0xd69906245565a910
+ DQ 0xf40e35855771202a,0x106aa07032bbd1b8
+ DQ 0xf40e35855771202a,0x106aa07032bbd1b8
+ DQ 0x19a4c116b8d2d0c8,0x1e376c085141ab53
+ DQ 0x19a4c116b8d2d0c8,0x1e376c085141ab53
+ DQ 0x2748774cdf8eeb99,0x34b0bcb5e19b48a8
+ DQ 0x2748774cdf8eeb99,0x34b0bcb5e19b48a8
+ DQ 0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb
+ DQ 0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb
+ DQ 0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3
+ DQ 0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3
+ DQ 0x748f82ee5defb2fc,0x78a5636f43172f60
+ DQ 0x748f82ee5defb2fc,0x78a5636f43172f60
+ DQ 0x84c87814a1f0ab72,0x8cc702081a6439ec
+ DQ 0x84c87814a1f0ab72,0x8cc702081a6439ec
+ DQ 0x90befffa23631e28,0xa4506cebde82bde9
+ DQ 0x90befffa23631e28,0xa4506cebde82bde9
+ DQ 0xbef9a3f7b2c67915,0xc67178f2e372532b
+ DQ 0xbef9a3f7b2c67915,0xc67178f2e372532b
+ DQ 0xca273eceea26619c,0xd186b8c721c0c207
+ DQ 0xca273eceea26619c,0xd186b8c721c0c207
+ DQ 0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178
+ DQ 0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178
+ DQ 0x06f067aa72176fba,0x0a637dc5a2c898a6
+ DQ 0x06f067aa72176fba,0x0a637dc5a2c898a6
+ DQ 0x113f9804bef90dae,0x1b710b35131c471b
+ DQ 0x113f9804bef90dae,0x1b710b35131c471b
+ DQ 0x28db77f523047d84,0x32caab7b40c72493
+ DQ 0x28db77f523047d84,0x32caab7b40c72493
+ DQ 0x3c9ebe0a15c9bebc,0x431d67c49c100d4c
+ DQ 0x3c9ebe0a15c9bebc,0x431d67c49c100d4c
+ DQ 0x4cc5d4becb3e42b6,0x597f299cfc657e2a
+ DQ 0x4cc5d4becb3e42b6,0x597f299cfc657e2a
+ DQ 0x5fcb6fab3ad6faec,0x6c44198c4a475817
+ DQ 0x5fcb6fab3ad6faec,0x6c44198c4a475817
+
+ DQ 0x0001020304050607,0x08090a0b0c0d0e0f
+ DQ 0x0001020304050607,0x08090a0b0c0d0e0f
+DB 83,72,65,53,49,50,32,98,108,111,99,107,32,116,114,97
+DB 110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54
+DB 52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121
+DB 32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46
+DB 111,114,103,62,0
+
+ALIGN 64
+sha512_block_data_order_xop:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha512_block_data_order_xop:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+$L$xop_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ shl rdx,4
+ sub rsp,256
+ lea rdx,[rdx*8+rsi]
+ and rsp,-64
+ mov QWORD[((128+0))+rsp],rdi
+ mov QWORD[((128+8))+rsp],rsi
+ mov QWORD[((128+16))+rsp],rdx
+ mov QWORD[152+rsp],rax
+
+ movaps XMMWORD[(128+32)+rsp],xmm6
+ movaps XMMWORD[(128+48)+rsp],xmm7
+ movaps XMMWORD[(128+64)+rsp],xmm8
+ movaps XMMWORD[(128+80)+rsp],xmm9
+ movaps XMMWORD[(128+96)+rsp],xmm10
+ movaps XMMWORD[(128+112)+rsp],xmm11
+$L$prologue_xop:
+
+ vzeroupper
+ mov rax,QWORD[rdi]
+ mov rbx,QWORD[8+rdi]
+ mov rcx,QWORD[16+rdi]
+ mov rdx,QWORD[24+rdi]
+ mov r8,QWORD[32+rdi]
+ mov r9,QWORD[40+rdi]
+ mov r10,QWORD[48+rdi]
+ mov r11,QWORD[56+rdi]
+ jmp NEAR $L$loop_xop
+ALIGN 16
+$L$loop_xop:
+ vmovdqa xmm11,XMMWORD[((K512+1280))]
+ vmovdqu xmm0,XMMWORD[rsi]
+ lea rbp,[((K512+128))]
+ vmovdqu xmm1,XMMWORD[16+rsi]
+ vmovdqu xmm2,XMMWORD[32+rsi]
+ vpshufb xmm0,xmm0,xmm11
+ vmovdqu xmm3,XMMWORD[48+rsi]
+ vpshufb xmm1,xmm1,xmm11
+ vmovdqu xmm4,XMMWORD[64+rsi]
+ vpshufb xmm2,xmm2,xmm11
+ vmovdqu xmm5,XMMWORD[80+rsi]
+ vpshufb xmm3,xmm3,xmm11
+ vmovdqu xmm6,XMMWORD[96+rsi]
+ vpshufb xmm4,xmm4,xmm11
+ vmovdqu xmm7,XMMWORD[112+rsi]
+ vpshufb xmm5,xmm5,xmm11
+ vpaddq xmm8,xmm0,XMMWORD[((-128))+rbp]
+ vpshufb xmm6,xmm6,xmm11
+ vpaddq xmm9,xmm1,XMMWORD[((-96))+rbp]
+ vpshufb xmm7,xmm7,xmm11
+ vpaddq xmm10,xmm2,XMMWORD[((-64))+rbp]
+ vpaddq xmm11,xmm3,XMMWORD[((-32))+rbp]
+ vmovdqa XMMWORD[rsp],xmm8
+ vpaddq xmm8,xmm4,XMMWORD[rbp]
+ vmovdqa XMMWORD[16+rsp],xmm9
+ vpaddq xmm9,xmm5,XMMWORD[32+rbp]
+ vmovdqa XMMWORD[32+rsp],xmm10
+ vpaddq xmm10,xmm6,XMMWORD[64+rbp]
+ vmovdqa XMMWORD[48+rsp],xmm11
+ vpaddq xmm11,xmm7,XMMWORD[96+rbp]
+ vmovdqa XMMWORD[64+rsp],xmm8
+ mov r14,rax
+ vmovdqa XMMWORD[80+rsp],xmm9
+ mov rdi,rbx
+ vmovdqa XMMWORD[96+rsp],xmm10
+ xor rdi,rcx
+ vmovdqa XMMWORD[112+rsp],xmm11
+ mov r13,r8
+ jmp NEAR $L$xop_00_47
+
+ALIGN 16
+$L$xop_00_47:
+ add rbp,256
+ vpalignr xmm8,xmm1,xmm0,8
+ ror r13,23
+ mov rax,r14
+ vpalignr xmm11,xmm5,xmm4,8
+ mov r12,r9
+ ror r14,5
+DB 143,72,120,195,200,56
+ xor r13,r8
+ xor r12,r10
+ vpsrlq xmm8,xmm8,7
+ ror r13,4
+ xor r14,rax
+ vpaddq xmm0,xmm0,xmm11
+ and r12,r8
+ xor r13,r8
+ add r11,QWORD[rsp]
+ mov r15,rax
+DB 143,72,120,195,209,7
+ xor r12,r10
+ ror r14,6
+ vpxor xmm8,xmm8,xmm9
+ xor r15,rbx
+ add r11,r12
+ ror r13,14
+ and rdi,r15
+DB 143,104,120,195,223,3
+ xor r14,rax
+ add r11,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,rbx
+ ror r14,28
+ vpsrlq xmm10,xmm7,6
+ add rdx,r11
+ add r11,rdi
+ vpaddq xmm0,xmm0,xmm8
+ mov r13,rdx
+ add r14,r11
+DB 143,72,120,195,203,42
+ ror r13,23
+ mov r11,r14
+ vpxor xmm11,xmm11,xmm10
+ mov r12,r8
+ ror r14,5
+ xor r13,rdx
+ xor r12,r9
+ vpxor xmm11,xmm11,xmm9
+ ror r13,4
+ xor r14,r11
+ and r12,rdx
+ xor r13,rdx
+ vpaddq xmm0,xmm0,xmm11
+ add r10,QWORD[8+rsp]
+ mov rdi,r11
+ xor r12,r9
+ ror r14,6
+ vpaddq xmm10,xmm0,XMMWORD[((-128))+rbp]
+ xor rdi,rax
+ add r10,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,r11
+ add r10,r13
+ xor r15,rax
+ ror r14,28
+ add rcx,r10
+ add r10,r15
+ mov r13,rcx
+ add r14,r10
+ vmovdqa XMMWORD[rsp],xmm10
+ vpalignr xmm8,xmm2,xmm1,8
+ ror r13,23
+ mov r10,r14
+ vpalignr xmm11,xmm6,xmm5,8
+ mov r12,rdx
+ ror r14,5
+DB 143,72,120,195,200,56
+ xor r13,rcx
+ xor r12,r8
+ vpsrlq xmm8,xmm8,7
+ ror r13,4
+ xor r14,r10
+ vpaddq xmm1,xmm1,xmm11
+ and r12,rcx
+ xor r13,rcx
+ add r9,QWORD[16+rsp]
+ mov r15,r10
+DB 143,72,120,195,209,7
+ xor r12,r8
+ ror r14,6
+ vpxor xmm8,xmm8,xmm9
+ xor r15,r11
+ add r9,r12
+ ror r13,14
+ and rdi,r15
+DB 143,104,120,195,216,3
+ xor r14,r10
+ add r9,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,r11
+ ror r14,28
+ vpsrlq xmm10,xmm0,6
+ add rbx,r9
+ add r9,rdi
+ vpaddq xmm1,xmm1,xmm8
+ mov r13,rbx
+ add r14,r9
+DB 143,72,120,195,203,42
+ ror r13,23
+ mov r9,r14
+ vpxor xmm11,xmm11,xmm10
+ mov r12,rcx
+ ror r14,5
+ xor r13,rbx
+ xor r12,rdx
+ vpxor xmm11,xmm11,xmm9
+ ror r13,4
+ xor r14,r9
+ and r12,rbx
+ xor r13,rbx
+ vpaddq xmm1,xmm1,xmm11
+ add r8,QWORD[24+rsp]
+ mov rdi,r9
+ xor r12,rdx
+ ror r14,6
+ vpaddq xmm10,xmm1,XMMWORD[((-96))+rbp]
+ xor rdi,r10
+ add r8,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,r9
+ add r8,r13
+ xor r15,r10
+ ror r14,28
+ add rax,r8
+ add r8,r15
+ mov r13,rax
+ add r14,r8
+ vmovdqa XMMWORD[16+rsp],xmm10
+ vpalignr xmm8,xmm3,xmm2,8
+ ror r13,23
+ mov r8,r14
+ vpalignr xmm11,xmm7,xmm6,8
+ mov r12,rbx
+ ror r14,5
+DB 143,72,120,195,200,56
+ xor r13,rax
+ xor r12,rcx
+ vpsrlq xmm8,xmm8,7
+ ror r13,4
+ xor r14,r8
+ vpaddq xmm2,xmm2,xmm11
+ and r12,rax
+ xor r13,rax
+ add rdx,QWORD[32+rsp]
+ mov r15,r8
+DB 143,72,120,195,209,7
+ xor r12,rcx
+ ror r14,6
+ vpxor xmm8,xmm8,xmm9
+ xor r15,r9
+ add rdx,r12
+ ror r13,14
+ and rdi,r15
+DB 143,104,120,195,217,3
+ xor r14,r8
+ add rdx,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,r9
+ ror r14,28
+ vpsrlq xmm10,xmm1,6
+ add r11,rdx
+ add rdx,rdi
+ vpaddq xmm2,xmm2,xmm8
+ mov r13,r11
+ add r14,rdx
+DB 143,72,120,195,203,42
+ ror r13,23
+ mov rdx,r14
+ vpxor xmm11,xmm11,xmm10
+ mov r12,rax
+ ror r14,5
+ xor r13,r11
+ xor r12,rbx
+ vpxor xmm11,xmm11,xmm9
+ ror r13,4
+ xor r14,rdx
+ and r12,r11
+ xor r13,r11
+ vpaddq xmm2,xmm2,xmm11
+ add rcx,QWORD[40+rsp]
+ mov rdi,rdx
+ xor r12,rbx
+ ror r14,6
+ vpaddq xmm10,xmm2,XMMWORD[((-64))+rbp]
+ xor rdi,r8
+ add rcx,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,rdx
+ add rcx,r13
+ xor r15,r8
+ ror r14,28
+ add r10,rcx
+ add rcx,r15
+ mov r13,r10
+ add r14,rcx
+ vmovdqa XMMWORD[32+rsp],xmm10
+ vpalignr xmm8,xmm4,xmm3,8
+ ror r13,23
+ mov rcx,r14
+ vpalignr xmm11,xmm0,xmm7,8
+ mov r12,r11
+ ror r14,5
+DB 143,72,120,195,200,56
+ xor r13,r10
+ xor r12,rax
+ vpsrlq xmm8,xmm8,7
+ ror r13,4
+ xor r14,rcx
+ vpaddq xmm3,xmm3,xmm11
+ and r12,r10
+ xor r13,r10
+ add rbx,QWORD[48+rsp]
+ mov r15,rcx
+DB 143,72,120,195,209,7
+ xor r12,rax
+ ror r14,6
+ vpxor xmm8,xmm8,xmm9
+ xor r15,rdx
+ add rbx,r12
+ ror r13,14
+ and rdi,r15
+DB 143,104,120,195,218,3
+ xor r14,rcx
+ add rbx,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,rdx
+ ror r14,28
+ vpsrlq xmm10,xmm2,6
+ add r9,rbx
+ add rbx,rdi
+ vpaddq xmm3,xmm3,xmm8
+ mov r13,r9
+ add r14,rbx
+DB 143,72,120,195,203,42
+ ror r13,23
+ mov rbx,r14
+ vpxor xmm11,xmm11,xmm10
+ mov r12,r10
+ ror r14,5
+ xor r13,r9
+ xor r12,r11
+ vpxor xmm11,xmm11,xmm9
+ ror r13,4
+ xor r14,rbx
+ and r12,r9
+ xor r13,r9
+ vpaddq xmm3,xmm3,xmm11
+ add rax,QWORD[56+rsp]
+ mov rdi,rbx
+ xor r12,r11
+ ror r14,6
+ vpaddq xmm10,xmm3,XMMWORD[((-32))+rbp]
+ xor rdi,rcx
+ add rax,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,rbx
+ add rax,r13
+ xor r15,rcx
+ ror r14,28
+ add r8,rax
+ add rax,r15
+ mov r13,r8
+ add r14,rax
+ vmovdqa XMMWORD[48+rsp],xmm10
+ vpalignr xmm8,xmm5,xmm4,8
+ ror r13,23
+ mov rax,r14
+ vpalignr xmm11,xmm1,xmm0,8
+ mov r12,r9
+ ror r14,5
+DB 143,72,120,195,200,56
+ xor r13,r8
+ xor r12,r10
+ vpsrlq xmm8,xmm8,7
+ ror r13,4
+ xor r14,rax
+ vpaddq xmm4,xmm4,xmm11
+ and r12,r8
+ xor r13,r8
+ add r11,QWORD[64+rsp]
+ mov r15,rax
+DB 143,72,120,195,209,7
+ xor r12,r10
+ ror r14,6
+ vpxor xmm8,xmm8,xmm9
+ xor r15,rbx
+ add r11,r12
+ ror r13,14
+ and rdi,r15
+DB 143,104,120,195,219,3
+ xor r14,rax
+ add r11,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,rbx
+ ror r14,28
+ vpsrlq xmm10,xmm3,6
+ add rdx,r11
+ add r11,rdi
+ vpaddq xmm4,xmm4,xmm8
+ mov r13,rdx
+ add r14,r11
+DB 143,72,120,195,203,42
+ ror r13,23
+ mov r11,r14
+ vpxor xmm11,xmm11,xmm10
+ mov r12,r8
+ ror r14,5
+ xor r13,rdx
+ xor r12,r9
+ vpxor xmm11,xmm11,xmm9
+ ror r13,4
+ xor r14,r11
+ and r12,rdx
+ xor r13,rdx
+ vpaddq xmm4,xmm4,xmm11
+ add r10,QWORD[72+rsp]
+ mov rdi,r11
+ xor r12,r9
+ ror r14,6
+ vpaddq xmm10,xmm4,XMMWORD[rbp]
+ xor rdi,rax
+ add r10,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,r11
+ add r10,r13
+ xor r15,rax
+ ror r14,28
+ add rcx,r10
+ add r10,r15
+ mov r13,rcx
+ add r14,r10
+ vmovdqa XMMWORD[64+rsp],xmm10
+ vpalignr xmm8,xmm6,xmm5,8
+ ror r13,23
+ mov r10,r14
+ vpalignr xmm11,xmm2,xmm1,8
+ mov r12,rdx
+ ror r14,5
+DB 143,72,120,195,200,56
+ xor r13,rcx
+ xor r12,r8
+ vpsrlq xmm8,xmm8,7
+ ror r13,4
+ xor r14,r10
+ vpaddq xmm5,xmm5,xmm11
+ and r12,rcx
+ xor r13,rcx
+ add r9,QWORD[80+rsp]
+ mov r15,r10
+DB 143,72,120,195,209,7
+ xor r12,r8
+ ror r14,6
+ vpxor xmm8,xmm8,xmm9
+ xor r15,r11
+ add r9,r12
+ ror r13,14
+ and rdi,r15
+DB 143,104,120,195,220,3
+ xor r14,r10
+ add r9,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,r11
+ ror r14,28
+ vpsrlq xmm10,xmm4,6
+ add rbx,r9
+ add r9,rdi
+ vpaddq xmm5,xmm5,xmm8
+ mov r13,rbx
+ add r14,r9
+DB 143,72,120,195,203,42
+ ror r13,23
+ mov r9,r14
+ vpxor xmm11,xmm11,xmm10
+ mov r12,rcx
+ ror r14,5
+ xor r13,rbx
+ xor r12,rdx
+ vpxor xmm11,xmm11,xmm9
+ ror r13,4
+ xor r14,r9
+ and r12,rbx
+ xor r13,rbx
+ vpaddq xmm5,xmm5,xmm11
+ add r8,QWORD[88+rsp]
+ mov rdi,r9
+ xor r12,rdx
+ ror r14,6
+ vpaddq xmm10,xmm5,XMMWORD[32+rbp]
+ xor rdi,r10
+ add r8,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,r9
+ add r8,r13
+ xor r15,r10
+ ror r14,28
+ add rax,r8
+ add r8,r15
+ mov r13,rax
+ add r14,r8
+ vmovdqa XMMWORD[80+rsp],xmm10
+ vpalignr xmm8,xmm7,xmm6,8
+ ror r13,23
+ mov r8,r14
+ vpalignr xmm11,xmm3,xmm2,8
+ mov r12,rbx
+ ror r14,5
+DB 143,72,120,195,200,56
+ xor r13,rax
+ xor r12,rcx
+ vpsrlq xmm8,xmm8,7
+ ror r13,4
+ xor r14,r8
+ vpaddq xmm6,xmm6,xmm11
+ and r12,rax
+ xor r13,rax
+ add rdx,QWORD[96+rsp]
+ mov r15,r8
+DB 143,72,120,195,209,7
+ xor r12,rcx
+ ror r14,6
+ vpxor xmm8,xmm8,xmm9
+ xor r15,r9
+ add rdx,r12
+ ror r13,14
+ and rdi,r15
+DB 143,104,120,195,221,3
+ xor r14,r8
+ add rdx,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,r9
+ ror r14,28
+ vpsrlq xmm10,xmm5,6
+ add r11,rdx
+ add rdx,rdi
+ vpaddq xmm6,xmm6,xmm8
+ mov r13,r11
+ add r14,rdx
+DB 143,72,120,195,203,42
+ ror r13,23
+ mov rdx,r14
+ vpxor xmm11,xmm11,xmm10
+ mov r12,rax
+ ror r14,5
+ xor r13,r11
+ xor r12,rbx
+ vpxor xmm11,xmm11,xmm9
+ ror r13,4
+ xor r14,rdx
+ and r12,r11
+ xor r13,r11
+ vpaddq xmm6,xmm6,xmm11
+ add rcx,QWORD[104+rsp]
+ mov rdi,rdx
+ xor r12,rbx
+ ror r14,6
+ vpaddq xmm10,xmm6,XMMWORD[64+rbp]
+ xor rdi,r8
+ add rcx,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,rdx
+ add rcx,r13
+ xor r15,r8
+ ror r14,28
+ add r10,rcx
+ add rcx,r15
+ mov r13,r10
+ add r14,rcx
+ vmovdqa XMMWORD[96+rsp],xmm10
+ vpalignr xmm8,xmm0,xmm7,8
+ ror r13,23
+ mov rcx,r14
+ vpalignr xmm11,xmm4,xmm3,8
+ mov r12,r11
+ ror r14,5
+DB 143,72,120,195,200,56
+ xor r13,r10
+ xor r12,rax
+ vpsrlq xmm8,xmm8,7
+ ror r13,4
+ xor r14,rcx
+ vpaddq xmm7,xmm7,xmm11
+ and r12,r10
+ xor r13,r10
+ add rbx,QWORD[112+rsp]
+ mov r15,rcx
+DB 143,72,120,195,209,7
+ xor r12,rax
+ ror r14,6
+ vpxor xmm8,xmm8,xmm9
+ xor r15,rdx
+ add rbx,r12
+ ror r13,14
+ and rdi,r15
+DB 143,104,120,195,222,3
+ xor r14,rcx
+ add rbx,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,rdx
+ ror r14,28
+ vpsrlq xmm10,xmm6,6
+ add r9,rbx
+ add rbx,rdi
+ vpaddq xmm7,xmm7,xmm8
+ mov r13,r9
+ add r14,rbx
+DB 143,72,120,195,203,42
+ ror r13,23
+ mov rbx,r14
+ vpxor xmm11,xmm11,xmm10
+ mov r12,r10
+ ror r14,5
+ xor r13,r9
+ xor r12,r11
+ vpxor xmm11,xmm11,xmm9
+ ror r13,4
+ xor r14,rbx
+ and r12,r9
+ xor r13,r9
+ vpaddq xmm7,xmm7,xmm11
+ add rax,QWORD[120+rsp]
+ mov rdi,rbx
+ xor r12,r11
+ ror r14,6
+ vpaddq xmm10,xmm7,XMMWORD[96+rbp]
+ xor rdi,rcx
+ add rax,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,rbx
+ add rax,r13
+ xor r15,rcx
+ ror r14,28
+ add r8,rax
+ add rax,r15
+ mov r13,r8
+ add r14,rax
+ vmovdqa XMMWORD[112+rsp],xmm10
+ cmp BYTE[135+rbp],0
+ jne NEAR $L$xop_00_47
+ ror r13,23
+ mov rax,r14
+ mov r12,r9
+ ror r14,5
+ xor r13,r8
+ xor r12,r10
+ ror r13,4
+ xor r14,rax
+ and r12,r8
+ xor r13,r8
+ add r11,QWORD[rsp]
+ mov r15,rax
+ xor r12,r10
+ ror r14,6
+ xor r15,rbx
+ add r11,r12
+ ror r13,14
+ and rdi,r15
+ xor r14,rax
+ add r11,r13
+ xor rdi,rbx
+ ror r14,28
+ add rdx,r11
+ add r11,rdi
+ mov r13,rdx
+ add r14,r11
+ ror r13,23
+ mov r11,r14
+ mov r12,r8
+ ror r14,5
+ xor r13,rdx
+ xor r12,r9
+ ror r13,4
+ xor r14,r11
+ and r12,rdx
+ xor r13,rdx
+ add r10,QWORD[8+rsp]
+ mov rdi,r11
+ xor r12,r9
+ ror r14,6
+ xor rdi,rax
+ add r10,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,r11
+ add r10,r13
+ xor r15,rax
+ ror r14,28
+ add rcx,r10
+ add r10,r15
+ mov r13,rcx
+ add r14,r10
+ ror r13,23
+ mov r10,r14
+ mov r12,rdx
+ ror r14,5
+ xor r13,rcx
+ xor r12,r8
+ ror r13,4
+ xor r14,r10
+ and r12,rcx
+ xor r13,rcx
+ add r9,QWORD[16+rsp]
+ mov r15,r10
+ xor r12,r8
+ ror r14,6
+ xor r15,r11
+ add r9,r12
+ ror r13,14
+ and rdi,r15
+ xor r14,r10
+ add r9,r13
+ xor rdi,r11
+ ror r14,28
+ add rbx,r9
+ add r9,rdi
+ mov r13,rbx
+ add r14,r9
+ ror r13,23
+ mov r9,r14
+ mov r12,rcx
+ ror r14,5
+ xor r13,rbx
+ xor r12,rdx
+ ror r13,4
+ xor r14,r9
+ and r12,rbx
+ xor r13,rbx
+ add r8,QWORD[24+rsp]
+ mov rdi,r9
+ xor r12,rdx
+ ror r14,6
+ xor rdi,r10
+ add r8,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,r9
+ add r8,r13
+ xor r15,r10
+ ror r14,28
+ add rax,r8
+ add r8,r15
+ mov r13,rax
+ add r14,r8
+ ror r13,23
+ mov r8,r14
+ mov r12,rbx
+ ror r14,5
+ xor r13,rax
+ xor r12,rcx
+ ror r13,4
+ xor r14,r8
+ and r12,rax
+ xor r13,rax
+ add rdx,QWORD[32+rsp]
+ mov r15,r8
+ xor r12,rcx
+ ror r14,6
+ xor r15,r9
+ add rdx,r12
+ ror r13,14
+ and rdi,r15
+ xor r14,r8
+ add rdx,r13
+ xor rdi,r9
+ ror r14,28
+ add r11,rdx
+ add rdx,rdi
+ mov r13,r11
+ add r14,rdx
+ ror r13,23
+ mov rdx,r14
+ mov r12,rax
+ ror r14,5
+ xor r13,r11
+ xor r12,rbx
+ ror r13,4
+ xor r14,rdx
+ and r12,r11
+ xor r13,r11
+ add rcx,QWORD[40+rsp]
+ mov rdi,rdx
+ xor r12,rbx
+ ror r14,6
+ xor rdi,r8
+ add rcx,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,rdx
+ add rcx,r13
+ xor r15,r8
+ ror r14,28
+ add r10,rcx
+ add rcx,r15
+ mov r13,r10
+ add r14,rcx
+ ror r13,23
+ mov rcx,r14
+ mov r12,r11
+ ror r14,5
+ xor r13,r10
+ xor r12,rax
+ ror r13,4
+ xor r14,rcx
+ and r12,r10
+ xor r13,r10
+ add rbx,QWORD[48+rsp]
+ mov r15,rcx
+ xor r12,rax
+ ror r14,6
+ xor r15,rdx
+ add rbx,r12
+ ror r13,14
+ and rdi,r15
+ xor r14,rcx
+ add rbx,r13
+ xor rdi,rdx
+ ror r14,28
+ add r9,rbx
+ add rbx,rdi
+ mov r13,r9
+ add r14,rbx
+ ror r13,23
+ mov rbx,r14
+ mov r12,r10
+ ror r14,5
+ xor r13,r9
+ xor r12,r11
+ ror r13,4
+ xor r14,rbx
+ and r12,r9
+ xor r13,r9
+ add rax,QWORD[56+rsp]
+ mov rdi,rbx
+ xor r12,r11
+ ror r14,6
+ xor rdi,rcx
+ add rax,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,rbx
+ add rax,r13
+ xor r15,rcx
+ ror r14,28
+ add r8,rax
+ add rax,r15
+ mov r13,r8
+ add r14,rax
+ ror r13,23
+ mov rax,r14
+ mov r12,r9
+ ror r14,5
+ xor r13,r8
+ xor r12,r10
+ ror r13,4
+ xor r14,rax
+ and r12,r8
+ xor r13,r8
+ add r11,QWORD[64+rsp]
+ mov r15,rax
+ xor r12,r10
+ ror r14,6
+ xor r15,rbx
+ add r11,r12
+ ror r13,14
+ and rdi,r15
+ xor r14,rax
+ add r11,r13
+ xor rdi,rbx
+ ror r14,28
+ add rdx,r11
+ add r11,rdi
+ mov r13,rdx
+ add r14,r11
+ ror r13,23
+ mov r11,r14
+ mov r12,r8
+ ror r14,5
+ xor r13,rdx
+ xor r12,r9
+ ror r13,4
+ xor r14,r11
+ and r12,rdx
+ xor r13,rdx
+ add r10,QWORD[72+rsp]
+ mov rdi,r11
+ xor r12,r9
+ ror r14,6
+ xor rdi,rax
+ add r10,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,r11
+ add r10,r13
+ xor r15,rax
+ ror r14,28
+ add rcx,r10
+ add r10,r15
+ mov r13,rcx
+ add r14,r10
+ ror r13,23
+ mov r10,r14
+ mov r12,rdx
+ ror r14,5
+ xor r13,rcx
+ xor r12,r8
+ ror r13,4
+ xor r14,r10
+ and r12,rcx
+ xor r13,rcx
+ add r9,QWORD[80+rsp]
+ mov r15,r10
+ xor r12,r8
+ ror r14,6
+ xor r15,r11
+ add r9,r12
+ ror r13,14
+ and rdi,r15
+ xor r14,r10
+ add r9,r13
+ xor rdi,r11
+ ror r14,28
+ add rbx,r9
+ add r9,rdi
+ mov r13,rbx
+ add r14,r9
+ ror r13,23
+ mov r9,r14
+ mov r12,rcx
+ ror r14,5
+ xor r13,rbx
+ xor r12,rdx
+ ror r13,4
+ xor r14,r9
+ and r12,rbx
+ xor r13,rbx
+ add r8,QWORD[88+rsp]
+ mov rdi,r9
+ xor r12,rdx
+ ror r14,6
+ xor rdi,r10
+ add r8,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,r9
+ add r8,r13
+ xor r15,r10
+ ror r14,28
+ add rax,r8
+ add r8,r15
+ mov r13,rax
+ add r14,r8
+ ror r13,23
+ mov r8,r14
+ mov r12,rbx
+ ror r14,5
+ xor r13,rax
+ xor r12,rcx
+ ror r13,4
+ xor r14,r8
+ and r12,rax
+ xor r13,rax
+ add rdx,QWORD[96+rsp]
+ mov r15,r8
+ xor r12,rcx
+ ror r14,6
+ xor r15,r9
+ add rdx,r12
+ ror r13,14
+ and rdi,r15
+ xor r14,r8
+ add rdx,r13
+ xor rdi,r9
+ ror r14,28
+ add r11,rdx
+ add rdx,rdi
+ mov r13,r11
+ add r14,rdx
+ ror r13,23
+ mov rdx,r14
+ mov r12,rax
+ ror r14,5
+ xor r13,r11
+ xor r12,rbx
+ ror r13,4
+ xor r14,rdx
+ and r12,r11
+ xor r13,r11
+ add rcx,QWORD[104+rsp]
+ mov rdi,rdx
+ xor r12,rbx
+ ror r14,6
+ xor rdi,r8
+ add rcx,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,rdx
+ add rcx,r13
+ xor r15,r8
+ ror r14,28
+ add r10,rcx
+ add rcx,r15
+ mov r13,r10
+ add r14,rcx
+ ror r13,23
+ mov rcx,r14
+ mov r12,r11
+ ror r14,5
+ xor r13,r10
+ xor r12,rax
+ ror r13,4
+ xor r14,rcx
+ and r12,r10
+ xor r13,r10
+ add rbx,QWORD[112+rsp]
+ mov r15,rcx
+ xor r12,rax
+ ror r14,6
+ xor r15,rdx
+ add rbx,r12
+ ror r13,14
+ and rdi,r15
+ xor r14,rcx
+ add rbx,r13
+ xor rdi,rdx
+ ror r14,28
+ add r9,rbx
+ add rbx,rdi
+ mov r13,r9
+ add r14,rbx
+ ror r13,23
+ mov rbx,r14
+ mov r12,r10
+ ror r14,5
+ xor r13,r9
+ xor r12,r11
+ ror r13,4
+ xor r14,rbx
+ and r12,r9
+ xor r13,r9
+ add rax,QWORD[120+rsp]
+ mov rdi,rbx
+ xor r12,r11
+ ror r14,6
+ xor rdi,rcx
+ add rax,r12
+ ror r13,14
+ and r15,rdi
+ xor r14,rbx
+ add rax,r13
+ xor r15,rcx
+ ror r14,28
+ add r8,rax
+ add rax,r15
+ mov r13,r8
+ add r14,rax
+ mov rdi,QWORD[((128+0))+rsp]
+ mov rax,r14
+
+ add rax,QWORD[rdi]
+ lea rsi,[128+rsi]
+ add rbx,QWORD[8+rdi]
+ add rcx,QWORD[16+rdi]
+ add rdx,QWORD[24+rdi]
+ add r8,QWORD[32+rdi]
+ add r9,QWORD[40+rdi]
+ add r10,QWORD[48+rdi]
+ add r11,QWORD[56+rdi]
+
+ cmp rsi,QWORD[((128+16))+rsp]
+
+ mov QWORD[rdi],rax
+ mov QWORD[8+rdi],rbx
+ mov QWORD[16+rdi],rcx
+ mov QWORD[24+rdi],rdx
+ mov QWORD[32+rdi],r8
+ mov QWORD[40+rdi],r9
+ mov QWORD[48+rdi],r10
+ mov QWORD[56+rdi],r11
+ jb NEAR $L$loop_xop
+
+ mov rsi,QWORD[152+rsp]
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((128+32))+rsp]
+ movaps xmm7,XMMWORD[((128+48))+rsp]
+ movaps xmm8,XMMWORD[((128+64))+rsp]
+ movaps xmm9,XMMWORD[((128+80))+rsp]
+ movaps xmm10,XMMWORD[((128+96))+rsp]
+ movaps xmm11,XMMWORD[((128+112))+rsp]
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue_xop:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha512_block_data_order_xop:
+
+ALIGN 64
+sha512_block_data_order_avx:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha512_block_data_order_avx:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+$L$avx_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ shl rdx,4
+ sub rsp,256
+ lea rdx,[rdx*8+rsi]
+ and rsp,-64
+ mov QWORD[((128+0))+rsp],rdi
+ mov QWORD[((128+8))+rsp],rsi
+ mov QWORD[((128+16))+rsp],rdx
+ mov QWORD[152+rsp],rax
+
+ movaps XMMWORD[(128+32)+rsp],xmm6
+ movaps XMMWORD[(128+48)+rsp],xmm7
+ movaps XMMWORD[(128+64)+rsp],xmm8
+ movaps XMMWORD[(128+80)+rsp],xmm9
+ movaps XMMWORD[(128+96)+rsp],xmm10
+ movaps XMMWORD[(128+112)+rsp],xmm11
+$L$prologue_avx:
+
+ vzeroupper
+ mov rax,QWORD[rdi]
+ mov rbx,QWORD[8+rdi]
+ mov rcx,QWORD[16+rdi]
+ mov rdx,QWORD[24+rdi]
+ mov r8,QWORD[32+rdi]
+ mov r9,QWORD[40+rdi]
+ mov r10,QWORD[48+rdi]
+ mov r11,QWORD[56+rdi]
+ jmp NEAR $L$loop_avx
+ALIGN 16
+$L$loop_avx:
+ vmovdqa xmm11,XMMWORD[((K512+1280))]
+ vmovdqu xmm0,XMMWORD[rsi]
+ lea rbp,[((K512+128))]
+ vmovdqu xmm1,XMMWORD[16+rsi]
+ vmovdqu xmm2,XMMWORD[32+rsi]
+ vpshufb xmm0,xmm0,xmm11
+ vmovdqu xmm3,XMMWORD[48+rsi]
+ vpshufb xmm1,xmm1,xmm11
+ vmovdqu xmm4,XMMWORD[64+rsi]
+ vpshufb xmm2,xmm2,xmm11
+ vmovdqu xmm5,XMMWORD[80+rsi]
+ vpshufb xmm3,xmm3,xmm11
+ vmovdqu xmm6,XMMWORD[96+rsi]
+ vpshufb xmm4,xmm4,xmm11
+ vmovdqu xmm7,XMMWORD[112+rsi]
+ vpshufb xmm5,xmm5,xmm11
+ vpaddq xmm8,xmm0,XMMWORD[((-128))+rbp]
+ vpshufb xmm6,xmm6,xmm11
+ vpaddq xmm9,xmm1,XMMWORD[((-96))+rbp]
+ vpshufb xmm7,xmm7,xmm11
+ vpaddq xmm10,xmm2,XMMWORD[((-64))+rbp]
+ vpaddq xmm11,xmm3,XMMWORD[((-32))+rbp]
+ vmovdqa XMMWORD[rsp],xmm8
+ vpaddq xmm8,xmm4,XMMWORD[rbp]
+ vmovdqa XMMWORD[16+rsp],xmm9
+ vpaddq xmm9,xmm5,XMMWORD[32+rbp]
+ vmovdqa XMMWORD[32+rsp],xmm10
+ vpaddq xmm10,xmm6,XMMWORD[64+rbp]
+ vmovdqa XMMWORD[48+rsp],xmm11
+ vpaddq xmm11,xmm7,XMMWORD[96+rbp]
+ vmovdqa XMMWORD[64+rsp],xmm8
+ mov r14,rax
+ vmovdqa XMMWORD[80+rsp],xmm9
+ mov rdi,rbx
+ vmovdqa XMMWORD[96+rsp],xmm10
+ xor rdi,rcx
+ vmovdqa XMMWORD[112+rsp],xmm11
+ mov r13,r8
+ jmp NEAR $L$avx_00_47
+
+ALIGN 16
+$L$avx_00_47:
+ add rbp,256
+ vpalignr xmm8,xmm1,xmm0,8
+ shrd r13,r13,23
+ mov rax,r14
+ vpalignr xmm11,xmm5,xmm4,8
+ mov r12,r9
+ shrd r14,r14,5
+ vpsrlq xmm10,xmm8,1
+ xor r13,r8
+ xor r12,r10
+ vpaddq xmm0,xmm0,xmm11
+ shrd r13,r13,4
+ xor r14,rax
+ vpsrlq xmm11,xmm8,7
+ and r12,r8
+ xor r13,r8
+ vpsllq xmm9,xmm8,56
+ add r11,QWORD[rsp]
+ mov r15,rax
+ vpxor xmm8,xmm11,xmm10
+ xor r12,r10
+ shrd r14,r14,6
+ vpsrlq xmm10,xmm10,7
+ xor r15,rbx
+ add r11,r12
+ vpxor xmm8,xmm8,xmm9
+ shrd r13,r13,14
+ and rdi,r15
+ vpsllq xmm9,xmm9,7
+ xor r14,rax
+ add r11,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,rbx
+ shrd r14,r14,28
+ vpsrlq xmm11,xmm7,6
+ add rdx,r11
+ add r11,rdi
+ vpxor xmm8,xmm8,xmm9
+ mov r13,rdx
+ add r14,r11
+ vpsllq xmm10,xmm7,3
+ shrd r13,r13,23
+ mov r11,r14
+ vpaddq xmm0,xmm0,xmm8
+ mov r12,r8
+ shrd r14,r14,5
+ vpsrlq xmm9,xmm7,19
+ xor r13,rdx
+ xor r12,r9
+ vpxor xmm11,xmm11,xmm10
+ shrd r13,r13,4
+ xor r14,r11
+ vpsllq xmm10,xmm10,42
+ and r12,rdx
+ xor r13,rdx
+ vpxor xmm11,xmm11,xmm9
+ add r10,QWORD[8+rsp]
+ mov rdi,r11
+ vpsrlq xmm9,xmm9,42
+ xor r12,r9
+ shrd r14,r14,6
+ vpxor xmm11,xmm11,xmm10
+ xor rdi,rax
+ add r10,r12
+ vpxor xmm11,xmm11,xmm9
+ shrd r13,r13,14
+ and r15,rdi
+ vpaddq xmm0,xmm0,xmm11
+ xor r14,r11
+ add r10,r13
+ vpaddq xmm10,xmm0,XMMWORD[((-128))+rbp]
+ xor r15,rax
+ shrd r14,r14,28
+ add rcx,r10
+ add r10,r15
+ mov r13,rcx
+ add r14,r10
+ vmovdqa XMMWORD[rsp],xmm10
+ vpalignr xmm8,xmm2,xmm1,8
+ shrd r13,r13,23
+ mov r10,r14
+ vpalignr xmm11,xmm6,xmm5,8
+ mov r12,rdx
+ shrd r14,r14,5
+ vpsrlq xmm10,xmm8,1
+ xor r13,rcx
+ xor r12,r8
+ vpaddq xmm1,xmm1,xmm11
+ shrd r13,r13,4
+ xor r14,r10
+ vpsrlq xmm11,xmm8,7
+ and r12,rcx
+ xor r13,rcx
+ vpsllq xmm9,xmm8,56
+ add r9,QWORD[16+rsp]
+ mov r15,r10
+ vpxor xmm8,xmm11,xmm10
+ xor r12,r8
+ shrd r14,r14,6
+ vpsrlq xmm10,xmm10,7
+ xor r15,r11
+ add r9,r12
+ vpxor xmm8,xmm8,xmm9
+ shrd r13,r13,14
+ and rdi,r15
+ vpsllq xmm9,xmm9,7
+ xor r14,r10
+ add r9,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,r11
+ shrd r14,r14,28
+ vpsrlq xmm11,xmm0,6
+ add rbx,r9
+ add r9,rdi
+ vpxor xmm8,xmm8,xmm9
+ mov r13,rbx
+ add r14,r9
+ vpsllq xmm10,xmm0,3
+ shrd r13,r13,23
+ mov r9,r14
+ vpaddq xmm1,xmm1,xmm8
+ mov r12,rcx
+ shrd r14,r14,5
+ vpsrlq xmm9,xmm0,19
+ xor r13,rbx
+ xor r12,rdx
+ vpxor xmm11,xmm11,xmm10
+ shrd r13,r13,4
+ xor r14,r9
+ vpsllq xmm10,xmm10,42
+ and r12,rbx
+ xor r13,rbx
+ vpxor xmm11,xmm11,xmm9
+ add r8,QWORD[24+rsp]
+ mov rdi,r9
+ vpsrlq xmm9,xmm9,42
+ xor r12,rdx
+ shrd r14,r14,6
+ vpxor xmm11,xmm11,xmm10
+ xor rdi,r10
+ add r8,r12
+ vpxor xmm11,xmm11,xmm9
+ shrd r13,r13,14
+ and r15,rdi
+ vpaddq xmm1,xmm1,xmm11
+ xor r14,r9
+ add r8,r13
+ vpaddq xmm10,xmm1,XMMWORD[((-96))+rbp]
+ xor r15,r10
+ shrd r14,r14,28
+ add rax,r8
+ add r8,r15
+ mov r13,rax
+ add r14,r8
+ vmovdqa XMMWORD[16+rsp],xmm10
+ vpalignr xmm8,xmm3,xmm2,8
+ shrd r13,r13,23
+ mov r8,r14
+ vpalignr xmm11,xmm7,xmm6,8
+ mov r12,rbx
+ shrd r14,r14,5
+ vpsrlq xmm10,xmm8,1
+ xor r13,rax
+ xor r12,rcx
+ vpaddq xmm2,xmm2,xmm11
+ shrd r13,r13,4
+ xor r14,r8
+ vpsrlq xmm11,xmm8,7
+ and r12,rax
+ xor r13,rax
+ vpsllq xmm9,xmm8,56
+ add rdx,QWORD[32+rsp]
+ mov r15,r8
+ vpxor xmm8,xmm11,xmm10
+ xor r12,rcx
+ shrd r14,r14,6
+ vpsrlq xmm10,xmm10,7
+ xor r15,r9
+ add rdx,r12
+ vpxor xmm8,xmm8,xmm9
+ shrd r13,r13,14
+ and rdi,r15
+ vpsllq xmm9,xmm9,7
+ xor r14,r8
+ add rdx,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,r9
+ shrd r14,r14,28
+ vpsrlq xmm11,xmm1,6
+ add r11,rdx
+ add rdx,rdi
+ vpxor xmm8,xmm8,xmm9
+ mov r13,r11
+ add r14,rdx
+ vpsllq xmm10,xmm1,3
+ shrd r13,r13,23
+ mov rdx,r14
+ vpaddq xmm2,xmm2,xmm8
+ mov r12,rax
+ shrd r14,r14,5
+ vpsrlq xmm9,xmm1,19
+ xor r13,r11
+ xor r12,rbx
+ vpxor xmm11,xmm11,xmm10
+ shrd r13,r13,4
+ xor r14,rdx
+ vpsllq xmm10,xmm10,42
+ and r12,r11
+ xor r13,r11
+ vpxor xmm11,xmm11,xmm9
+ add rcx,QWORD[40+rsp]
+ mov rdi,rdx
+ vpsrlq xmm9,xmm9,42
+ xor r12,rbx
+ shrd r14,r14,6
+ vpxor xmm11,xmm11,xmm10
+ xor rdi,r8
+ add rcx,r12
+ vpxor xmm11,xmm11,xmm9
+ shrd r13,r13,14
+ and r15,rdi
+ vpaddq xmm2,xmm2,xmm11
+ xor r14,rdx
+ add rcx,r13
+ vpaddq xmm10,xmm2,XMMWORD[((-64))+rbp]
+ xor r15,r8
+ shrd r14,r14,28
+ add r10,rcx
+ add rcx,r15
+ mov r13,r10
+ add r14,rcx
+ vmovdqa XMMWORD[32+rsp],xmm10
+ vpalignr xmm8,xmm4,xmm3,8
+ shrd r13,r13,23
+ mov rcx,r14
+ vpalignr xmm11,xmm0,xmm7,8
+ mov r12,r11
+ shrd r14,r14,5
+ vpsrlq xmm10,xmm8,1
+ xor r13,r10
+ xor r12,rax
+ vpaddq xmm3,xmm3,xmm11
+ shrd r13,r13,4
+ xor r14,rcx
+ vpsrlq xmm11,xmm8,7
+ and r12,r10
+ xor r13,r10
+ vpsllq xmm9,xmm8,56
+ add rbx,QWORD[48+rsp]
+ mov r15,rcx
+ vpxor xmm8,xmm11,xmm10
+ xor r12,rax
+ shrd r14,r14,6
+ vpsrlq xmm10,xmm10,7
+ xor r15,rdx
+ add rbx,r12
+ vpxor xmm8,xmm8,xmm9
+ shrd r13,r13,14
+ and rdi,r15
+ vpsllq xmm9,xmm9,7
+ xor r14,rcx
+ add rbx,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,rdx
+ shrd r14,r14,28
+ vpsrlq xmm11,xmm2,6
+ add r9,rbx
+ add rbx,rdi
+ vpxor xmm8,xmm8,xmm9
+ mov r13,r9
+ add r14,rbx
+ vpsllq xmm10,xmm2,3
+ shrd r13,r13,23
+ mov rbx,r14
+ vpaddq xmm3,xmm3,xmm8
+ mov r12,r10
+ shrd r14,r14,5
+ vpsrlq xmm9,xmm2,19
+ xor r13,r9
+ xor r12,r11
+ vpxor xmm11,xmm11,xmm10
+ shrd r13,r13,4
+ xor r14,rbx
+ vpsllq xmm10,xmm10,42
+ and r12,r9
+ xor r13,r9
+ vpxor xmm11,xmm11,xmm9
+ add rax,QWORD[56+rsp]
+ mov rdi,rbx
+ vpsrlq xmm9,xmm9,42
+ xor r12,r11
+ shrd r14,r14,6
+ vpxor xmm11,xmm11,xmm10
+ xor rdi,rcx
+ add rax,r12
+ vpxor xmm11,xmm11,xmm9
+ shrd r13,r13,14
+ and r15,rdi
+ vpaddq xmm3,xmm3,xmm11
+ xor r14,rbx
+ add rax,r13
+ vpaddq xmm10,xmm3,XMMWORD[((-32))+rbp]
+ xor r15,rcx
+ shrd r14,r14,28
+ add r8,rax
+ add rax,r15
+ mov r13,r8
+ add r14,rax
+ vmovdqa XMMWORD[48+rsp],xmm10
+ vpalignr xmm8,xmm5,xmm4,8
+ shrd r13,r13,23
+ mov rax,r14
+ vpalignr xmm11,xmm1,xmm0,8
+ mov r12,r9
+ shrd r14,r14,5
+ vpsrlq xmm10,xmm8,1
+ xor r13,r8
+ xor r12,r10
+ vpaddq xmm4,xmm4,xmm11
+ shrd r13,r13,4
+ xor r14,rax
+ vpsrlq xmm11,xmm8,7
+ and r12,r8
+ xor r13,r8
+ vpsllq xmm9,xmm8,56
+ add r11,QWORD[64+rsp]
+ mov r15,rax
+ vpxor xmm8,xmm11,xmm10
+ xor r12,r10
+ shrd r14,r14,6
+ vpsrlq xmm10,xmm10,7
+ xor r15,rbx
+ add r11,r12
+ vpxor xmm8,xmm8,xmm9
+ shrd r13,r13,14
+ and rdi,r15
+ vpsllq xmm9,xmm9,7
+ xor r14,rax
+ add r11,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,rbx
+ shrd r14,r14,28
+ vpsrlq xmm11,xmm3,6
+ add rdx,r11
+ add r11,rdi
+ vpxor xmm8,xmm8,xmm9
+ mov r13,rdx
+ add r14,r11
+ vpsllq xmm10,xmm3,3
+ shrd r13,r13,23
+ mov r11,r14
+ vpaddq xmm4,xmm4,xmm8
+ mov r12,r8
+ shrd r14,r14,5
+ vpsrlq xmm9,xmm3,19
+ xor r13,rdx
+ xor r12,r9
+ vpxor xmm11,xmm11,xmm10
+ shrd r13,r13,4
+ xor r14,r11
+ vpsllq xmm10,xmm10,42
+ and r12,rdx
+ xor r13,rdx
+ vpxor xmm11,xmm11,xmm9
+ add r10,QWORD[72+rsp]
+ mov rdi,r11
+ vpsrlq xmm9,xmm9,42
+ xor r12,r9
+ shrd r14,r14,6
+ vpxor xmm11,xmm11,xmm10
+ xor rdi,rax
+ add r10,r12
+ vpxor xmm11,xmm11,xmm9
+ shrd r13,r13,14
+ and r15,rdi
+ vpaddq xmm4,xmm4,xmm11
+ xor r14,r11
+ add r10,r13
+ vpaddq xmm10,xmm4,XMMWORD[rbp]
+ xor r15,rax
+ shrd r14,r14,28
+ add rcx,r10
+ add r10,r15
+ mov r13,rcx
+ add r14,r10
+ vmovdqa XMMWORD[64+rsp],xmm10
+ vpalignr xmm8,xmm6,xmm5,8
+ shrd r13,r13,23
+ mov r10,r14
+ vpalignr xmm11,xmm2,xmm1,8
+ mov r12,rdx
+ shrd r14,r14,5
+ vpsrlq xmm10,xmm8,1
+ xor r13,rcx
+ xor r12,r8
+ vpaddq xmm5,xmm5,xmm11
+ shrd r13,r13,4
+ xor r14,r10
+ vpsrlq xmm11,xmm8,7
+ and r12,rcx
+ xor r13,rcx
+ vpsllq xmm9,xmm8,56
+ add r9,QWORD[80+rsp]
+ mov r15,r10
+ vpxor xmm8,xmm11,xmm10
+ xor r12,r8
+ shrd r14,r14,6
+ vpsrlq xmm10,xmm10,7
+ xor r15,r11
+ add r9,r12
+ vpxor xmm8,xmm8,xmm9
+ shrd r13,r13,14
+ and rdi,r15
+ vpsllq xmm9,xmm9,7
+ xor r14,r10
+ add r9,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,r11
+ shrd r14,r14,28
+ vpsrlq xmm11,xmm4,6
+ add rbx,r9
+ add r9,rdi
+ vpxor xmm8,xmm8,xmm9
+ mov r13,rbx
+ add r14,r9
+ vpsllq xmm10,xmm4,3
+ shrd r13,r13,23
+ mov r9,r14
+ vpaddq xmm5,xmm5,xmm8
+ mov r12,rcx
+ shrd r14,r14,5
+ vpsrlq xmm9,xmm4,19
+ xor r13,rbx
+ xor r12,rdx
+ vpxor xmm11,xmm11,xmm10
+ shrd r13,r13,4
+ xor r14,r9
+ vpsllq xmm10,xmm10,42
+ and r12,rbx
+ xor r13,rbx
+ vpxor xmm11,xmm11,xmm9
+ add r8,QWORD[88+rsp]
+ mov rdi,r9
+ vpsrlq xmm9,xmm9,42
+ xor r12,rdx
+ shrd r14,r14,6
+ vpxor xmm11,xmm11,xmm10
+ xor rdi,r10
+ add r8,r12
+ vpxor xmm11,xmm11,xmm9
+ shrd r13,r13,14
+ and r15,rdi
+ vpaddq xmm5,xmm5,xmm11
+ xor r14,r9
+ add r8,r13
+ vpaddq xmm10,xmm5,XMMWORD[32+rbp]
+ xor r15,r10
+ shrd r14,r14,28
+ add rax,r8
+ add r8,r15
+ mov r13,rax
+ add r14,r8
+ vmovdqa XMMWORD[80+rsp],xmm10
+ vpalignr xmm8,xmm7,xmm6,8
+ shrd r13,r13,23
+ mov r8,r14
+ vpalignr xmm11,xmm3,xmm2,8
+ mov r12,rbx
+ shrd r14,r14,5
+ vpsrlq xmm10,xmm8,1
+ xor r13,rax
+ xor r12,rcx
+ vpaddq xmm6,xmm6,xmm11
+ shrd r13,r13,4
+ xor r14,r8
+ vpsrlq xmm11,xmm8,7
+ and r12,rax
+ xor r13,rax
+ vpsllq xmm9,xmm8,56
+ add rdx,QWORD[96+rsp]
+ mov r15,r8
+ vpxor xmm8,xmm11,xmm10
+ xor r12,rcx
+ shrd r14,r14,6
+ vpsrlq xmm10,xmm10,7
+ xor r15,r9
+ add rdx,r12
+ vpxor xmm8,xmm8,xmm9
+ shrd r13,r13,14
+ and rdi,r15
+ vpsllq xmm9,xmm9,7
+ xor r14,r8
+ add rdx,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,r9
+ shrd r14,r14,28
+ vpsrlq xmm11,xmm5,6
+ add r11,rdx
+ add rdx,rdi
+ vpxor xmm8,xmm8,xmm9
+ mov r13,r11
+ add r14,rdx
+ vpsllq xmm10,xmm5,3
+ shrd r13,r13,23
+ mov rdx,r14
+ vpaddq xmm6,xmm6,xmm8
+ mov r12,rax
+ shrd r14,r14,5
+ vpsrlq xmm9,xmm5,19
+ xor r13,r11
+ xor r12,rbx
+ vpxor xmm11,xmm11,xmm10
+ shrd r13,r13,4
+ xor r14,rdx
+ vpsllq xmm10,xmm10,42
+ and r12,r11
+ xor r13,r11
+ vpxor xmm11,xmm11,xmm9
+ add rcx,QWORD[104+rsp]
+ mov rdi,rdx
+ vpsrlq xmm9,xmm9,42
+ xor r12,rbx
+ shrd r14,r14,6
+ vpxor xmm11,xmm11,xmm10
+ xor rdi,r8
+ add rcx,r12
+ vpxor xmm11,xmm11,xmm9
+ shrd r13,r13,14
+ and r15,rdi
+ vpaddq xmm6,xmm6,xmm11
+ xor r14,rdx
+ add rcx,r13
+ vpaddq xmm10,xmm6,XMMWORD[64+rbp]
+ xor r15,r8
+ shrd r14,r14,28
+ add r10,rcx
+ add rcx,r15
+ mov r13,r10
+ add r14,rcx
+ vmovdqa XMMWORD[96+rsp],xmm10
+ vpalignr xmm8,xmm0,xmm7,8
+ shrd r13,r13,23
+ mov rcx,r14
+ vpalignr xmm11,xmm4,xmm3,8
+ mov r12,r11
+ shrd r14,r14,5
+ vpsrlq xmm10,xmm8,1
+ xor r13,r10
+ xor r12,rax
+ vpaddq xmm7,xmm7,xmm11
+ shrd r13,r13,4
+ xor r14,rcx
+ vpsrlq xmm11,xmm8,7
+ and r12,r10
+ xor r13,r10
+ vpsllq xmm9,xmm8,56
+ add rbx,QWORD[112+rsp]
+ mov r15,rcx
+ vpxor xmm8,xmm11,xmm10
+ xor r12,rax
+ shrd r14,r14,6
+ vpsrlq xmm10,xmm10,7
+ xor r15,rdx
+ add rbx,r12
+ vpxor xmm8,xmm8,xmm9
+ shrd r13,r13,14
+ and rdi,r15
+ vpsllq xmm9,xmm9,7
+ xor r14,rcx
+ add rbx,r13
+ vpxor xmm8,xmm8,xmm10
+ xor rdi,rdx
+ shrd r14,r14,28
+ vpsrlq xmm11,xmm6,6
+ add r9,rbx
+ add rbx,rdi
+ vpxor xmm8,xmm8,xmm9
+ mov r13,r9
+ add r14,rbx
+ vpsllq xmm10,xmm6,3
+ shrd r13,r13,23
+ mov rbx,r14
+ vpaddq xmm7,xmm7,xmm8
+ mov r12,r10
+ shrd r14,r14,5
+ vpsrlq xmm9,xmm6,19
+ xor r13,r9
+ xor r12,r11
+ vpxor xmm11,xmm11,xmm10
+ shrd r13,r13,4
+ xor r14,rbx
+ vpsllq xmm10,xmm10,42
+ and r12,r9
+ xor r13,r9
+ vpxor xmm11,xmm11,xmm9
+ add rax,QWORD[120+rsp]
+ mov rdi,rbx
+ vpsrlq xmm9,xmm9,42
+ xor r12,r11
+ shrd r14,r14,6
+ vpxor xmm11,xmm11,xmm10
+ xor rdi,rcx
+ add rax,r12
+ vpxor xmm11,xmm11,xmm9
+ shrd r13,r13,14
+ and r15,rdi
+ vpaddq xmm7,xmm7,xmm11
+ xor r14,rbx
+ add rax,r13
+ vpaddq xmm10,xmm7,XMMWORD[96+rbp]
+ xor r15,rcx
+ shrd r14,r14,28
+ add r8,rax
+ add rax,r15
+ mov r13,r8
+ add r14,rax
+ vmovdqa XMMWORD[112+rsp],xmm10
+ cmp BYTE[135+rbp],0
+ jne NEAR $L$avx_00_47
+ shrd r13,r13,23
+ mov rax,r14
+ mov r12,r9
+ shrd r14,r14,5
+ xor r13,r8
+ xor r12,r10
+ shrd r13,r13,4
+ xor r14,rax
+ and r12,r8
+ xor r13,r8
+ add r11,QWORD[rsp]
+ mov r15,rax
+ xor r12,r10
+ shrd r14,r14,6
+ xor r15,rbx
+ add r11,r12
+ shrd r13,r13,14
+ and rdi,r15
+ xor r14,rax
+ add r11,r13
+ xor rdi,rbx
+ shrd r14,r14,28
+ add rdx,r11
+ add r11,rdi
+ mov r13,rdx
+ add r14,r11
+ shrd r13,r13,23
+ mov r11,r14
+ mov r12,r8
+ shrd r14,r14,5
+ xor r13,rdx
+ xor r12,r9
+ shrd r13,r13,4
+ xor r14,r11
+ and r12,rdx
+ xor r13,rdx
+ add r10,QWORD[8+rsp]
+ mov rdi,r11
+ xor r12,r9
+ shrd r14,r14,6
+ xor rdi,rax
+ add r10,r12
+ shrd r13,r13,14
+ and r15,rdi
+ xor r14,r11
+ add r10,r13
+ xor r15,rax
+ shrd r14,r14,28
+ add rcx,r10
+ add r10,r15
+ mov r13,rcx
+ add r14,r10
+ shrd r13,r13,23
+ mov r10,r14
+ mov r12,rdx
+ shrd r14,r14,5
+ xor r13,rcx
+ xor r12,r8
+ shrd r13,r13,4
+ xor r14,r10
+ and r12,rcx
+ xor r13,rcx
+ add r9,QWORD[16+rsp]
+ mov r15,r10
+ xor r12,r8
+ shrd r14,r14,6
+ xor r15,r11
+ add r9,r12
+ shrd r13,r13,14
+ and rdi,r15
+ xor r14,r10
+ add r9,r13
+ xor rdi,r11
+ shrd r14,r14,28
+ add rbx,r9
+ add r9,rdi
+ mov r13,rbx
+ add r14,r9
+ shrd r13,r13,23
+ mov r9,r14
+ mov r12,rcx
+ shrd r14,r14,5
+ xor r13,rbx
+ xor r12,rdx
+ shrd r13,r13,4
+ xor r14,r9
+ and r12,rbx
+ xor r13,rbx
+ add r8,QWORD[24+rsp]
+ mov rdi,r9
+ xor r12,rdx
+ shrd r14,r14,6
+ xor rdi,r10
+ add r8,r12
+ shrd r13,r13,14
+ and r15,rdi
+ xor r14,r9
+ add r8,r13
+ xor r15,r10
+ shrd r14,r14,28
+ add rax,r8
+ add r8,r15
+ mov r13,rax
+ add r14,r8
+ shrd r13,r13,23
+ mov r8,r14
+ mov r12,rbx
+ shrd r14,r14,5
+ xor r13,rax
+ xor r12,rcx
+ shrd r13,r13,4
+ xor r14,r8
+ and r12,rax
+ xor r13,rax
+ add rdx,QWORD[32+rsp]
+ mov r15,r8
+ xor r12,rcx
+ shrd r14,r14,6
+ xor r15,r9
+ add rdx,r12
+ shrd r13,r13,14
+ and rdi,r15
+ xor r14,r8
+ add rdx,r13
+ xor rdi,r9
+ shrd r14,r14,28
+ add r11,rdx
+ add rdx,rdi
+ mov r13,r11
+ add r14,rdx
+ shrd r13,r13,23
+ mov rdx,r14
+ mov r12,rax
+ shrd r14,r14,5
+ xor r13,r11
+ xor r12,rbx
+ shrd r13,r13,4
+ xor r14,rdx
+ and r12,r11
+ xor r13,r11
+ add rcx,QWORD[40+rsp]
+ mov rdi,rdx
+ xor r12,rbx
+ shrd r14,r14,6
+ xor rdi,r8
+ add rcx,r12
+ shrd r13,r13,14
+ and r15,rdi
+ xor r14,rdx
+ add rcx,r13
+ xor r15,r8
+ shrd r14,r14,28
+ add r10,rcx
+ add rcx,r15
+ mov r13,r10
+ add r14,rcx
+ shrd r13,r13,23
+ mov rcx,r14
+ mov r12,r11
+ shrd r14,r14,5
+ xor r13,r10
+ xor r12,rax
+ shrd r13,r13,4
+ xor r14,rcx
+ and r12,r10
+ xor r13,r10
+ add rbx,QWORD[48+rsp]
+ mov r15,rcx
+ xor r12,rax
+ shrd r14,r14,6
+ xor r15,rdx
+ add rbx,r12
+ shrd r13,r13,14
+ and rdi,r15
+ xor r14,rcx
+ add rbx,r13
+ xor rdi,rdx
+ shrd r14,r14,28
+ add r9,rbx
+ add rbx,rdi
+ mov r13,r9
+ add r14,rbx
+ shrd r13,r13,23
+ mov rbx,r14
+ mov r12,r10
+ shrd r14,r14,5
+ xor r13,r9
+ xor r12,r11
+ shrd r13,r13,4
+ xor r14,rbx
+ and r12,r9
+ xor r13,r9
+ add rax,QWORD[56+rsp]
+ mov rdi,rbx
+ xor r12,r11
+ shrd r14,r14,6
+ xor rdi,rcx
+ add rax,r12
+ shrd r13,r13,14
+ and r15,rdi
+ xor r14,rbx
+ add rax,r13
+ xor r15,rcx
+ shrd r14,r14,28
+ add r8,rax
+ add rax,r15
+ mov r13,r8
+ add r14,rax
+ shrd r13,r13,23
+ mov rax,r14
+ mov r12,r9
+ shrd r14,r14,5
+ xor r13,r8
+ xor r12,r10
+ shrd r13,r13,4
+ xor r14,rax
+ and r12,r8
+ xor r13,r8
+ add r11,QWORD[64+rsp]
+ mov r15,rax
+ xor r12,r10
+ shrd r14,r14,6
+ xor r15,rbx
+ add r11,r12
+ shrd r13,r13,14
+ and rdi,r15
+ xor r14,rax
+ add r11,r13
+ xor rdi,rbx
+ shrd r14,r14,28
+ add rdx,r11
+ add r11,rdi
+ mov r13,rdx
+ add r14,r11
+ shrd r13,r13,23
+ mov r11,r14
+ mov r12,r8
+ shrd r14,r14,5
+ xor r13,rdx
+ xor r12,r9
+ shrd r13,r13,4
+ xor r14,r11
+ and r12,rdx
+ xor r13,rdx
+ add r10,QWORD[72+rsp]
+ mov rdi,r11
+ xor r12,r9
+ shrd r14,r14,6
+ xor rdi,rax
+ add r10,r12
+ shrd r13,r13,14
+ and r15,rdi
+ xor r14,r11
+ add r10,r13
+ xor r15,rax
+ shrd r14,r14,28
+ add rcx,r10
+ add r10,r15
+ mov r13,rcx
+ add r14,r10
+ shrd r13,r13,23
+ mov r10,r14
+ mov r12,rdx
+ shrd r14,r14,5
+ xor r13,rcx
+ xor r12,r8
+ shrd r13,r13,4
+ xor r14,r10
+ and r12,rcx
+ xor r13,rcx
+ add r9,QWORD[80+rsp]
+ mov r15,r10
+ xor r12,r8
+ shrd r14,r14,6
+ xor r15,r11
+ add r9,r12
+ shrd r13,r13,14
+ and rdi,r15
+ xor r14,r10
+ add r9,r13
+ xor rdi,r11
+ shrd r14,r14,28
+ add rbx,r9
+ add r9,rdi
+ mov r13,rbx
+ add r14,r9
+ shrd r13,r13,23
+ mov r9,r14
+ mov r12,rcx
+ shrd r14,r14,5
+ xor r13,rbx
+ xor r12,rdx
+ shrd r13,r13,4
+ xor r14,r9
+ and r12,rbx
+ xor r13,rbx
+ add r8,QWORD[88+rsp]
+ mov rdi,r9
+ xor r12,rdx
+ shrd r14,r14,6
+ xor rdi,r10
+ add r8,r12
+ shrd r13,r13,14
+ and r15,rdi
+ xor r14,r9
+ add r8,r13
+ xor r15,r10
+ shrd r14,r14,28
+ add rax,r8
+ add r8,r15
+ mov r13,rax
+ add r14,r8
+ shrd r13,r13,23
+ mov r8,r14
+ mov r12,rbx
+ shrd r14,r14,5
+ xor r13,rax
+ xor r12,rcx
+ shrd r13,r13,4
+ xor r14,r8
+ and r12,rax
+ xor r13,rax
+ add rdx,QWORD[96+rsp]
+ mov r15,r8
+ xor r12,rcx
+ shrd r14,r14,6
+ xor r15,r9
+ add rdx,r12
+ shrd r13,r13,14
+ and rdi,r15
+ xor r14,r8
+ add rdx,r13
+ xor rdi,r9
+ shrd r14,r14,28
+ add r11,rdx
+ add rdx,rdi
+ mov r13,r11
+ add r14,rdx
+ shrd r13,r13,23
+ mov rdx,r14
+ mov r12,rax
+ shrd r14,r14,5
+ xor r13,r11
+ xor r12,rbx
+ shrd r13,r13,4
+ xor r14,rdx
+ and r12,r11
+ xor r13,r11
+ add rcx,QWORD[104+rsp]
+ mov rdi,rdx
+ xor r12,rbx
+ shrd r14,r14,6
+ xor rdi,r8
+ add rcx,r12
+ shrd r13,r13,14
+ and r15,rdi
+ xor r14,rdx
+ add rcx,r13
+ xor r15,r8
+ shrd r14,r14,28
+ add r10,rcx
+ add rcx,r15
+ mov r13,r10
+ add r14,rcx
+ shrd r13,r13,23
+ mov rcx,r14
+ mov r12,r11
+ shrd r14,r14,5
+ xor r13,r10
+ xor r12,rax
+ shrd r13,r13,4
+ xor r14,rcx
+ and r12,r10
+ xor r13,r10
+ add rbx,QWORD[112+rsp]
+ mov r15,rcx
+ xor r12,rax
+ shrd r14,r14,6
+ xor r15,rdx
+ add rbx,r12
+ shrd r13,r13,14
+ and rdi,r15
+ xor r14,rcx
+ add rbx,r13
+ xor rdi,rdx
+ shrd r14,r14,28
+ add r9,rbx
+ add rbx,rdi
+ mov r13,r9
+ add r14,rbx
+ shrd r13,r13,23
+ mov rbx,r14
+ mov r12,r10
+ shrd r14,r14,5
+ xor r13,r9
+ xor r12,r11
+ shrd r13,r13,4
+ xor r14,rbx
+ and r12,r9
+ xor r13,r9
+ add rax,QWORD[120+rsp]
+ mov rdi,rbx
+ xor r12,r11
+ shrd r14,r14,6
+ xor rdi,rcx
+ add rax,r12
+ shrd r13,r13,14
+ and r15,rdi
+ xor r14,rbx
+ add rax,r13
+ xor r15,rcx
+ shrd r14,r14,28
+ add r8,rax
+ add rax,r15
+ mov r13,r8
+ add r14,rax
+ mov rdi,QWORD[((128+0))+rsp]
+ mov rax,r14
+
+ add rax,QWORD[rdi]
+ lea rsi,[128+rsi]
+ add rbx,QWORD[8+rdi]
+ add rcx,QWORD[16+rdi]
+ add rdx,QWORD[24+rdi]
+ add r8,QWORD[32+rdi]
+ add r9,QWORD[40+rdi]
+ add r10,QWORD[48+rdi]
+ add r11,QWORD[56+rdi]
+
+ cmp rsi,QWORD[((128+16))+rsp]
+
+ mov QWORD[rdi],rax
+ mov QWORD[8+rdi],rbx
+ mov QWORD[16+rdi],rcx
+ mov QWORD[24+rdi],rdx
+ mov QWORD[32+rdi],r8
+ mov QWORD[40+rdi],r9
+ mov QWORD[48+rdi],r10
+ mov QWORD[56+rdi],r11
+ jb NEAR $L$loop_avx
+
+ mov rsi,QWORD[152+rsp]
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((128+32))+rsp]
+ movaps xmm7,XMMWORD[((128+48))+rsp]
+ movaps xmm8,XMMWORD[((128+64))+rsp]
+ movaps xmm9,XMMWORD[((128+80))+rsp]
+ movaps xmm10,XMMWORD[((128+96))+rsp]
+ movaps xmm11,XMMWORD[((128+112))+rsp]
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue_avx:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha512_block_data_order_avx:
+
+ALIGN 64
+sha512_block_data_order_avx2:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_sha512_block_data_order_avx2:
+ mov rdi,rcx
+ mov rsi,rdx
+ mov rdx,r8
+
+
+
+$L$avx2_shortcut:
+ mov rax,rsp
+
+ push rbx
+
+ push rbp
+
+ push r12
+
+ push r13
+
+ push r14
+
+ push r15
+
+ sub rsp,1408
+ shl rdx,4
+ and rsp,-256*8
+ lea rdx,[rdx*8+rsi]
+ add rsp,1152
+ mov QWORD[((128+0))+rsp],rdi
+ mov QWORD[((128+8))+rsp],rsi
+ mov QWORD[((128+16))+rsp],rdx
+ mov QWORD[152+rsp],rax
+
+ movaps XMMWORD[(128+32)+rsp],xmm6
+ movaps XMMWORD[(128+48)+rsp],xmm7
+ movaps XMMWORD[(128+64)+rsp],xmm8
+ movaps XMMWORD[(128+80)+rsp],xmm9
+ movaps XMMWORD[(128+96)+rsp],xmm10
+ movaps XMMWORD[(128+112)+rsp],xmm11
+$L$prologue_avx2:
+
+ vzeroupper
+ sub rsi,-16*8
+ mov rax,QWORD[rdi]
+ mov r12,rsi
+ mov rbx,QWORD[8+rdi]
+ cmp rsi,rdx
+ mov rcx,QWORD[16+rdi]
+ cmove r12,rsp
+ mov rdx,QWORD[24+rdi]
+ mov r8,QWORD[32+rdi]
+ mov r9,QWORD[40+rdi]
+ mov r10,QWORD[48+rdi]
+ mov r11,QWORD[56+rdi]
+ jmp NEAR $L$oop_avx2
+ALIGN 16
+$L$oop_avx2:
+ vmovdqu xmm0,XMMWORD[((-128))+rsi]
+ vmovdqu xmm1,XMMWORD[((-128+16))+rsi]
+ vmovdqu xmm2,XMMWORD[((-128+32))+rsi]
+ lea rbp,[((K512+128))]
+ vmovdqu xmm3,XMMWORD[((-128+48))+rsi]
+ vmovdqu xmm4,XMMWORD[((-128+64))+rsi]
+ vmovdqu xmm5,XMMWORD[((-128+80))+rsi]
+ vmovdqu xmm6,XMMWORD[((-128+96))+rsi]
+ vmovdqu xmm7,XMMWORD[((-128+112))+rsi]
+
+ vmovdqa ymm10,YMMWORD[1152+rbp]
+ vinserti128 ymm0,ymm0,XMMWORD[r12],1
+ vinserti128 ymm1,ymm1,XMMWORD[16+r12],1
+ vpshufb ymm0,ymm0,ymm10
+ vinserti128 ymm2,ymm2,XMMWORD[32+r12],1
+ vpshufb ymm1,ymm1,ymm10
+ vinserti128 ymm3,ymm3,XMMWORD[48+r12],1
+ vpshufb ymm2,ymm2,ymm10
+ vinserti128 ymm4,ymm4,XMMWORD[64+r12],1
+ vpshufb ymm3,ymm3,ymm10
+ vinserti128 ymm5,ymm5,XMMWORD[80+r12],1
+ vpshufb ymm4,ymm4,ymm10
+ vinserti128 ymm6,ymm6,XMMWORD[96+r12],1
+ vpshufb ymm5,ymm5,ymm10
+ vinserti128 ymm7,ymm7,XMMWORD[112+r12],1
+
+ vpaddq ymm8,ymm0,YMMWORD[((-128))+rbp]
+ vpshufb ymm6,ymm6,ymm10
+ vpaddq ymm9,ymm1,YMMWORD[((-96))+rbp]
+ vpshufb ymm7,ymm7,ymm10
+ vpaddq ymm10,ymm2,YMMWORD[((-64))+rbp]
+ vpaddq ymm11,ymm3,YMMWORD[((-32))+rbp]
+ vmovdqa YMMWORD[rsp],ymm8
+ vpaddq ymm8,ymm4,YMMWORD[rbp]
+ vmovdqa YMMWORD[32+rsp],ymm9
+ vpaddq ymm9,ymm5,YMMWORD[32+rbp]
+ vmovdqa YMMWORD[64+rsp],ymm10
+ vpaddq ymm10,ymm6,YMMWORD[64+rbp]
+ vmovdqa YMMWORD[96+rsp],ymm11
+ lea rsp,[((-128))+rsp]
+ vpaddq ymm11,ymm7,YMMWORD[96+rbp]
+ vmovdqa YMMWORD[rsp],ymm8
+ xor r14,r14
+ vmovdqa YMMWORD[32+rsp],ymm9
+ mov rdi,rbx
+ vmovdqa YMMWORD[64+rsp],ymm10
+ xor rdi,rcx
+ vmovdqa YMMWORD[96+rsp],ymm11
+ mov r12,r9
+ add rbp,16*2*8
+ jmp NEAR $L$avx2_00_47
+
+ALIGN 16
+$L$avx2_00_47:
+ lea rsp,[((-128))+rsp]
+ vpalignr ymm8,ymm1,ymm0,8
+ add r11,QWORD[((0+256))+rsp]
+ and r12,r8
+ rorx r13,r8,41
+ vpalignr ymm11,ymm5,ymm4,8
+ rorx r15,r8,18
+ lea rax,[r14*1+rax]
+ lea r11,[r12*1+r11]
+ vpsrlq ymm10,ymm8,1
+ andn r12,r8,r10
+ xor r13,r15
+ rorx r14,r8,14
+ vpaddq ymm0,ymm0,ymm11
+ vpsrlq ymm11,ymm8,7
+ lea r11,[r12*1+r11]
+ xor r13,r14
+ mov r15,rax
+ vpsllq ymm9,ymm8,56
+ vpxor ymm8,ymm11,ymm10
+ rorx r12,rax,39
+ lea r11,[r13*1+r11]
+ xor r15,rbx
+ vpsrlq ymm10,ymm10,7
+ vpxor ymm8,ymm8,ymm9
+ rorx r14,rax,34
+ rorx r13,rax,28
+ lea rdx,[r11*1+rdx]
+ vpsllq ymm9,ymm9,7
+ vpxor ymm8,ymm8,ymm10
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rbx
+ vpsrlq ymm11,ymm7,6
+ vpxor ymm8,ymm8,ymm9
+ xor r14,r13
+ lea r11,[rdi*1+r11]
+ mov r12,r8
+ vpsllq ymm10,ymm7,3
+ vpaddq ymm0,ymm0,ymm8
+ add r10,QWORD[((8+256))+rsp]
+ and r12,rdx
+ rorx r13,rdx,41
+ vpsrlq ymm9,ymm7,19
+ vpxor ymm11,ymm11,ymm10
+ rorx rdi,rdx,18
+ lea r11,[r14*1+r11]
+ lea r10,[r12*1+r10]
+ vpsllq ymm10,ymm10,42
+ vpxor ymm11,ymm11,ymm9
+ andn r12,rdx,r9
+ xor r13,rdi
+ rorx r14,rdx,14
+ vpsrlq ymm9,ymm9,42
+ vpxor ymm11,ymm11,ymm10
+ lea r10,[r12*1+r10]
+ xor r13,r14
+ mov rdi,r11
+ vpxor ymm11,ymm11,ymm9
+ rorx r12,r11,39
+ lea r10,[r13*1+r10]
+ xor rdi,rax
+ vpaddq ymm0,ymm0,ymm11
+ rorx r14,r11,34
+ rorx r13,r11,28
+ lea rcx,[r10*1+rcx]
+ vpaddq ymm10,ymm0,YMMWORD[((-128))+rbp]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rax
+ xor r14,r13
+ lea r10,[r15*1+r10]
+ mov r12,rdx
+ vmovdqa YMMWORD[rsp],ymm10
+ vpalignr ymm8,ymm2,ymm1,8
+ add r9,QWORD[((32+256))+rsp]
+ and r12,rcx
+ rorx r13,rcx,41
+ vpalignr ymm11,ymm6,ymm5,8
+ rorx r15,rcx,18
+ lea r10,[r14*1+r10]
+ lea r9,[r12*1+r9]
+ vpsrlq ymm10,ymm8,1
+ andn r12,rcx,r8
+ xor r13,r15
+ rorx r14,rcx,14
+ vpaddq ymm1,ymm1,ymm11
+ vpsrlq ymm11,ymm8,7
+ lea r9,[r12*1+r9]
+ xor r13,r14
+ mov r15,r10
+ vpsllq ymm9,ymm8,56
+ vpxor ymm8,ymm11,ymm10
+ rorx r12,r10,39
+ lea r9,[r13*1+r9]
+ xor r15,r11
+ vpsrlq ymm10,ymm10,7
+ vpxor ymm8,ymm8,ymm9
+ rorx r14,r10,34
+ rorx r13,r10,28
+ lea rbx,[r9*1+rbx]
+ vpsllq ymm9,ymm9,7
+ vpxor ymm8,ymm8,ymm10
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r11
+ vpsrlq ymm11,ymm0,6
+ vpxor ymm8,ymm8,ymm9
+ xor r14,r13
+ lea r9,[rdi*1+r9]
+ mov r12,rcx
+ vpsllq ymm10,ymm0,3
+ vpaddq ymm1,ymm1,ymm8
+ add r8,QWORD[((40+256))+rsp]
+ and r12,rbx
+ rorx r13,rbx,41
+ vpsrlq ymm9,ymm0,19
+ vpxor ymm11,ymm11,ymm10
+ rorx rdi,rbx,18
+ lea r9,[r14*1+r9]
+ lea r8,[r12*1+r8]
+ vpsllq ymm10,ymm10,42
+ vpxor ymm11,ymm11,ymm9
+ andn r12,rbx,rdx
+ xor r13,rdi
+ rorx r14,rbx,14
+ vpsrlq ymm9,ymm9,42
+ vpxor ymm11,ymm11,ymm10
+ lea r8,[r12*1+r8]
+ xor r13,r14
+ mov rdi,r9
+ vpxor ymm11,ymm11,ymm9
+ rorx r12,r9,39
+ lea r8,[r13*1+r8]
+ xor rdi,r10
+ vpaddq ymm1,ymm1,ymm11
+ rorx r14,r9,34
+ rorx r13,r9,28
+ lea rax,[r8*1+rax]
+ vpaddq ymm10,ymm1,YMMWORD[((-96))+rbp]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r10
+ xor r14,r13
+ lea r8,[r15*1+r8]
+ mov r12,rbx
+ vmovdqa YMMWORD[32+rsp],ymm10
+ vpalignr ymm8,ymm3,ymm2,8
+ add rdx,QWORD[((64+256))+rsp]
+ and r12,rax
+ rorx r13,rax,41
+ vpalignr ymm11,ymm7,ymm6,8
+ rorx r15,rax,18
+ lea r8,[r14*1+r8]
+ lea rdx,[r12*1+rdx]
+ vpsrlq ymm10,ymm8,1
+ andn r12,rax,rcx
+ xor r13,r15
+ rorx r14,rax,14
+ vpaddq ymm2,ymm2,ymm11
+ vpsrlq ymm11,ymm8,7
+ lea rdx,[r12*1+rdx]
+ xor r13,r14
+ mov r15,r8
+ vpsllq ymm9,ymm8,56
+ vpxor ymm8,ymm11,ymm10
+ rorx r12,r8,39
+ lea rdx,[r13*1+rdx]
+ xor r15,r9
+ vpsrlq ymm10,ymm10,7
+ vpxor ymm8,ymm8,ymm9
+ rorx r14,r8,34
+ rorx r13,r8,28
+ lea r11,[rdx*1+r11]
+ vpsllq ymm9,ymm9,7
+ vpxor ymm8,ymm8,ymm10
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r9
+ vpsrlq ymm11,ymm1,6
+ vpxor ymm8,ymm8,ymm9
+ xor r14,r13
+ lea rdx,[rdi*1+rdx]
+ mov r12,rax
+ vpsllq ymm10,ymm1,3
+ vpaddq ymm2,ymm2,ymm8
+ add rcx,QWORD[((72+256))+rsp]
+ and r12,r11
+ rorx r13,r11,41
+ vpsrlq ymm9,ymm1,19
+ vpxor ymm11,ymm11,ymm10
+ rorx rdi,r11,18
+ lea rdx,[r14*1+rdx]
+ lea rcx,[r12*1+rcx]
+ vpsllq ymm10,ymm10,42
+ vpxor ymm11,ymm11,ymm9
+ andn r12,r11,rbx
+ xor r13,rdi
+ rorx r14,r11,14
+ vpsrlq ymm9,ymm9,42
+ vpxor ymm11,ymm11,ymm10
+ lea rcx,[r12*1+rcx]
+ xor r13,r14
+ mov rdi,rdx
+ vpxor ymm11,ymm11,ymm9
+ rorx r12,rdx,39
+ lea rcx,[r13*1+rcx]
+ xor rdi,r8
+ vpaddq ymm2,ymm2,ymm11
+ rorx r14,rdx,34
+ rorx r13,rdx,28
+ lea r10,[rcx*1+r10]
+ vpaddq ymm10,ymm2,YMMWORD[((-64))+rbp]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r8
+ xor r14,r13
+ lea rcx,[r15*1+rcx]
+ mov r12,r11
+ vmovdqa YMMWORD[64+rsp],ymm10
+ vpalignr ymm8,ymm4,ymm3,8
+ add rbx,QWORD[((96+256))+rsp]
+ and r12,r10
+ rorx r13,r10,41
+ vpalignr ymm11,ymm0,ymm7,8
+ rorx r15,r10,18
+ lea rcx,[r14*1+rcx]
+ lea rbx,[r12*1+rbx]
+ vpsrlq ymm10,ymm8,1
+ andn r12,r10,rax
+ xor r13,r15
+ rorx r14,r10,14
+ vpaddq ymm3,ymm3,ymm11
+ vpsrlq ymm11,ymm8,7
+ lea rbx,[r12*1+rbx]
+ xor r13,r14
+ mov r15,rcx
+ vpsllq ymm9,ymm8,56
+ vpxor ymm8,ymm11,ymm10
+ rorx r12,rcx,39
+ lea rbx,[r13*1+rbx]
+ xor r15,rdx
+ vpsrlq ymm10,ymm10,7
+ vpxor ymm8,ymm8,ymm9
+ rorx r14,rcx,34
+ rorx r13,rcx,28
+ lea r9,[rbx*1+r9]
+ vpsllq ymm9,ymm9,7
+ vpxor ymm8,ymm8,ymm10
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rdx
+ vpsrlq ymm11,ymm2,6
+ vpxor ymm8,ymm8,ymm9
+ xor r14,r13
+ lea rbx,[rdi*1+rbx]
+ mov r12,r10
+ vpsllq ymm10,ymm2,3
+ vpaddq ymm3,ymm3,ymm8
+ add rax,QWORD[((104+256))+rsp]
+ and r12,r9
+ rorx r13,r9,41
+ vpsrlq ymm9,ymm2,19
+ vpxor ymm11,ymm11,ymm10
+ rorx rdi,r9,18
+ lea rbx,[r14*1+rbx]
+ lea rax,[r12*1+rax]
+ vpsllq ymm10,ymm10,42
+ vpxor ymm11,ymm11,ymm9
+ andn r12,r9,r11
+ xor r13,rdi
+ rorx r14,r9,14
+ vpsrlq ymm9,ymm9,42
+ vpxor ymm11,ymm11,ymm10
+ lea rax,[r12*1+rax]
+ xor r13,r14
+ mov rdi,rbx
+ vpxor ymm11,ymm11,ymm9
+ rorx r12,rbx,39
+ lea rax,[r13*1+rax]
+ xor rdi,rcx
+ vpaddq ymm3,ymm3,ymm11
+ rorx r14,rbx,34
+ rorx r13,rbx,28
+ lea r8,[rax*1+r8]
+ vpaddq ymm10,ymm3,YMMWORD[((-32))+rbp]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rcx
+ xor r14,r13
+ lea rax,[r15*1+rax]
+ mov r12,r9
+ vmovdqa YMMWORD[96+rsp],ymm10
+ lea rsp,[((-128))+rsp]
+ vpalignr ymm8,ymm5,ymm4,8
+ add r11,QWORD[((0+256))+rsp]
+ and r12,r8
+ rorx r13,r8,41
+ vpalignr ymm11,ymm1,ymm0,8
+ rorx r15,r8,18
+ lea rax,[r14*1+rax]
+ lea r11,[r12*1+r11]
+ vpsrlq ymm10,ymm8,1
+ andn r12,r8,r10
+ xor r13,r15
+ rorx r14,r8,14
+ vpaddq ymm4,ymm4,ymm11
+ vpsrlq ymm11,ymm8,7
+ lea r11,[r12*1+r11]
+ xor r13,r14
+ mov r15,rax
+ vpsllq ymm9,ymm8,56
+ vpxor ymm8,ymm11,ymm10
+ rorx r12,rax,39
+ lea r11,[r13*1+r11]
+ xor r15,rbx
+ vpsrlq ymm10,ymm10,7
+ vpxor ymm8,ymm8,ymm9
+ rorx r14,rax,34
+ rorx r13,rax,28
+ lea rdx,[r11*1+rdx]
+ vpsllq ymm9,ymm9,7
+ vpxor ymm8,ymm8,ymm10
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rbx
+ vpsrlq ymm11,ymm3,6
+ vpxor ymm8,ymm8,ymm9
+ xor r14,r13
+ lea r11,[rdi*1+r11]
+ mov r12,r8
+ vpsllq ymm10,ymm3,3
+ vpaddq ymm4,ymm4,ymm8
+ add r10,QWORD[((8+256))+rsp]
+ and r12,rdx
+ rorx r13,rdx,41
+ vpsrlq ymm9,ymm3,19
+ vpxor ymm11,ymm11,ymm10
+ rorx rdi,rdx,18
+ lea r11,[r14*1+r11]
+ lea r10,[r12*1+r10]
+ vpsllq ymm10,ymm10,42
+ vpxor ymm11,ymm11,ymm9
+ andn r12,rdx,r9
+ xor r13,rdi
+ rorx r14,rdx,14
+ vpsrlq ymm9,ymm9,42
+ vpxor ymm11,ymm11,ymm10
+ lea r10,[r12*1+r10]
+ xor r13,r14
+ mov rdi,r11
+ vpxor ymm11,ymm11,ymm9
+ rorx r12,r11,39
+ lea r10,[r13*1+r10]
+ xor rdi,rax
+ vpaddq ymm4,ymm4,ymm11
+ rorx r14,r11,34
+ rorx r13,r11,28
+ lea rcx,[r10*1+rcx]
+ vpaddq ymm10,ymm4,YMMWORD[rbp]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rax
+ xor r14,r13
+ lea r10,[r15*1+r10]
+ mov r12,rdx
+ vmovdqa YMMWORD[rsp],ymm10
+ vpalignr ymm8,ymm6,ymm5,8
+ add r9,QWORD[((32+256))+rsp]
+ and r12,rcx
+ rorx r13,rcx,41
+ vpalignr ymm11,ymm2,ymm1,8
+ rorx r15,rcx,18
+ lea r10,[r14*1+r10]
+ lea r9,[r12*1+r9]
+ vpsrlq ymm10,ymm8,1
+ andn r12,rcx,r8
+ xor r13,r15
+ rorx r14,rcx,14
+ vpaddq ymm5,ymm5,ymm11
+ vpsrlq ymm11,ymm8,7
+ lea r9,[r12*1+r9]
+ xor r13,r14
+ mov r15,r10
+ vpsllq ymm9,ymm8,56
+ vpxor ymm8,ymm11,ymm10
+ rorx r12,r10,39
+ lea r9,[r13*1+r9]
+ xor r15,r11
+ vpsrlq ymm10,ymm10,7
+ vpxor ymm8,ymm8,ymm9
+ rorx r14,r10,34
+ rorx r13,r10,28
+ lea rbx,[r9*1+rbx]
+ vpsllq ymm9,ymm9,7
+ vpxor ymm8,ymm8,ymm10
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r11
+ vpsrlq ymm11,ymm4,6
+ vpxor ymm8,ymm8,ymm9
+ xor r14,r13
+ lea r9,[rdi*1+r9]
+ mov r12,rcx
+ vpsllq ymm10,ymm4,3
+ vpaddq ymm5,ymm5,ymm8
+ add r8,QWORD[((40+256))+rsp]
+ and r12,rbx
+ rorx r13,rbx,41
+ vpsrlq ymm9,ymm4,19
+ vpxor ymm11,ymm11,ymm10
+ rorx rdi,rbx,18
+ lea r9,[r14*1+r9]
+ lea r8,[r12*1+r8]
+ vpsllq ymm10,ymm10,42
+ vpxor ymm11,ymm11,ymm9
+ andn r12,rbx,rdx
+ xor r13,rdi
+ rorx r14,rbx,14
+ vpsrlq ymm9,ymm9,42
+ vpxor ymm11,ymm11,ymm10
+ lea r8,[r12*1+r8]
+ xor r13,r14
+ mov rdi,r9
+ vpxor ymm11,ymm11,ymm9
+ rorx r12,r9,39
+ lea r8,[r13*1+r8]
+ xor rdi,r10
+ vpaddq ymm5,ymm5,ymm11
+ rorx r14,r9,34
+ rorx r13,r9,28
+ lea rax,[r8*1+rax]
+ vpaddq ymm10,ymm5,YMMWORD[32+rbp]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r10
+ xor r14,r13
+ lea r8,[r15*1+r8]
+ mov r12,rbx
+ vmovdqa YMMWORD[32+rsp],ymm10
+ vpalignr ymm8,ymm7,ymm6,8
+ add rdx,QWORD[((64+256))+rsp]
+ and r12,rax
+ rorx r13,rax,41
+ vpalignr ymm11,ymm3,ymm2,8
+ rorx r15,rax,18
+ lea r8,[r14*1+r8]
+ lea rdx,[r12*1+rdx]
+ vpsrlq ymm10,ymm8,1
+ andn r12,rax,rcx
+ xor r13,r15
+ rorx r14,rax,14
+ vpaddq ymm6,ymm6,ymm11
+ vpsrlq ymm11,ymm8,7
+ lea rdx,[r12*1+rdx]
+ xor r13,r14
+ mov r15,r8
+ vpsllq ymm9,ymm8,56
+ vpxor ymm8,ymm11,ymm10
+ rorx r12,r8,39
+ lea rdx,[r13*1+rdx]
+ xor r15,r9
+ vpsrlq ymm10,ymm10,7
+ vpxor ymm8,ymm8,ymm9
+ rorx r14,r8,34
+ rorx r13,r8,28
+ lea r11,[rdx*1+r11]
+ vpsllq ymm9,ymm9,7
+ vpxor ymm8,ymm8,ymm10
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r9
+ vpsrlq ymm11,ymm5,6
+ vpxor ymm8,ymm8,ymm9
+ xor r14,r13
+ lea rdx,[rdi*1+rdx]
+ mov r12,rax
+ vpsllq ymm10,ymm5,3
+ vpaddq ymm6,ymm6,ymm8
+ add rcx,QWORD[((72+256))+rsp]
+ and r12,r11
+ rorx r13,r11,41
+ vpsrlq ymm9,ymm5,19
+ vpxor ymm11,ymm11,ymm10
+ rorx rdi,r11,18
+ lea rdx,[r14*1+rdx]
+ lea rcx,[r12*1+rcx]
+ vpsllq ymm10,ymm10,42
+ vpxor ymm11,ymm11,ymm9
+ andn r12,r11,rbx
+ xor r13,rdi
+ rorx r14,r11,14
+ vpsrlq ymm9,ymm9,42
+ vpxor ymm11,ymm11,ymm10
+ lea rcx,[r12*1+rcx]
+ xor r13,r14
+ mov rdi,rdx
+ vpxor ymm11,ymm11,ymm9
+ rorx r12,rdx,39
+ lea rcx,[r13*1+rcx]
+ xor rdi,r8
+ vpaddq ymm6,ymm6,ymm11
+ rorx r14,rdx,34
+ rorx r13,rdx,28
+ lea r10,[rcx*1+r10]
+ vpaddq ymm10,ymm6,YMMWORD[64+rbp]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r8
+ xor r14,r13
+ lea rcx,[r15*1+rcx]
+ mov r12,r11
+ vmovdqa YMMWORD[64+rsp],ymm10
+ vpalignr ymm8,ymm0,ymm7,8
+ add rbx,QWORD[((96+256))+rsp]
+ and r12,r10
+ rorx r13,r10,41
+ vpalignr ymm11,ymm4,ymm3,8
+ rorx r15,r10,18
+ lea rcx,[r14*1+rcx]
+ lea rbx,[r12*1+rbx]
+ vpsrlq ymm10,ymm8,1
+ andn r12,r10,rax
+ xor r13,r15
+ rorx r14,r10,14
+ vpaddq ymm7,ymm7,ymm11
+ vpsrlq ymm11,ymm8,7
+ lea rbx,[r12*1+rbx]
+ xor r13,r14
+ mov r15,rcx
+ vpsllq ymm9,ymm8,56
+ vpxor ymm8,ymm11,ymm10
+ rorx r12,rcx,39
+ lea rbx,[r13*1+rbx]
+ xor r15,rdx
+ vpsrlq ymm10,ymm10,7
+ vpxor ymm8,ymm8,ymm9
+ rorx r14,rcx,34
+ rorx r13,rcx,28
+ lea r9,[rbx*1+r9]
+ vpsllq ymm9,ymm9,7
+ vpxor ymm8,ymm8,ymm10
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rdx
+ vpsrlq ymm11,ymm6,6
+ vpxor ymm8,ymm8,ymm9
+ xor r14,r13
+ lea rbx,[rdi*1+rbx]
+ mov r12,r10
+ vpsllq ymm10,ymm6,3
+ vpaddq ymm7,ymm7,ymm8
+ add rax,QWORD[((104+256))+rsp]
+ and r12,r9
+ rorx r13,r9,41
+ vpsrlq ymm9,ymm6,19
+ vpxor ymm11,ymm11,ymm10
+ rorx rdi,r9,18
+ lea rbx,[r14*1+rbx]
+ lea rax,[r12*1+rax]
+ vpsllq ymm10,ymm10,42
+ vpxor ymm11,ymm11,ymm9
+ andn r12,r9,r11
+ xor r13,rdi
+ rorx r14,r9,14
+ vpsrlq ymm9,ymm9,42
+ vpxor ymm11,ymm11,ymm10
+ lea rax,[r12*1+rax]
+ xor r13,r14
+ mov rdi,rbx
+ vpxor ymm11,ymm11,ymm9
+ rorx r12,rbx,39
+ lea rax,[r13*1+rax]
+ xor rdi,rcx
+ vpaddq ymm7,ymm7,ymm11
+ rorx r14,rbx,34
+ rorx r13,rbx,28
+ lea r8,[rax*1+r8]
+ vpaddq ymm10,ymm7,YMMWORD[96+rbp]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rcx
+ xor r14,r13
+ lea rax,[r15*1+rax]
+ mov r12,r9
+ vmovdqa YMMWORD[96+rsp],ymm10
+ lea rbp,[256+rbp]
+ cmp BYTE[((-121))+rbp],0
+ jne NEAR $L$avx2_00_47
+ add r11,QWORD[((0+128))+rsp]
+ and r12,r8
+ rorx r13,r8,41
+ rorx r15,r8,18
+ lea rax,[r14*1+rax]
+ lea r11,[r12*1+r11]
+ andn r12,r8,r10
+ xor r13,r15
+ rorx r14,r8,14
+ lea r11,[r12*1+r11]
+ xor r13,r14
+ mov r15,rax
+ rorx r12,rax,39
+ lea r11,[r13*1+r11]
+ xor r15,rbx
+ rorx r14,rax,34
+ rorx r13,rax,28
+ lea rdx,[r11*1+rdx]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rbx
+ xor r14,r13
+ lea r11,[rdi*1+r11]
+ mov r12,r8
+ add r10,QWORD[((8+128))+rsp]
+ and r12,rdx
+ rorx r13,rdx,41
+ rorx rdi,rdx,18
+ lea r11,[r14*1+r11]
+ lea r10,[r12*1+r10]
+ andn r12,rdx,r9
+ xor r13,rdi
+ rorx r14,rdx,14
+ lea r10,[r12*1+r10]
+ xor r13,r14
+ mov rdi,r11
+ rorx r12,r11,39
+ lea r10,[r13*1+r10]
+ xor rdi,rax
+ rorx r14,r11,34
+ rorx r13,r11,28
+ lea rcx,[r10*1+rcx]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rax
+ xor r14,r13
+ lea r10,[r15*1+r10]
+ mov r12,rdx
+ add r9,QWORD[((32+128))+rsp]
+ and r12,rcx
+ rorx r13,rcx,41
+ rorx r15,rcx,18
+ lea r10,[r14*1+r10]
+ lea r9,[r12*1+r9]
+ andn r12,rcx,r8
+ xor r13,r15
+ rorx r14,rcx,14
+ lea r9,[r12*1+r9]
+ xor r13,r14
+ mov r15,r10
+ rorx r12,r10,39
+ lea r9,[r13*1+r9]
+ xor r15,r11
+ rorx r14,r10,34
+ rorx r13,r10,28
+ lea rbx,[r9*1+rbx]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r11
+ xor r14,r13
+ lea r9,[rdi*1+r9]
+ mov r12,rcx
+ add r8,QWORD[((40+128))+rsp]
+ and r12,rbx
+ rorx r13,rbx,41
+ rorx rdi,rbx,18
+ lea r9,[r14*1+r9]
+ lea r8,[r12*1+r8]
+ andn r12,rbx,rdx
+ xor r13,rdi
+ rorx r14,rbx,14
+ lea r8,[r12*1+r8]
+ xor r13,r14
+ mov rdi,r9
+ rorx r12,r9,39
+ lea r8,[r13*1+r8]
+ xor rdi,r10
+ rorx r14,r9,34
+ rorx r13,r9,28
+ lea rax,[r8*1+rax]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r10
+ xor r14,r13
+ lea r8,[r15*1+r8]
+ mov r12,rbx
+ add rdx,QWORD[((64+128))+rsp]
+ and r12,rax
+ rorx r13,rax,41
+ rorx r15,rax,18
+ lea r8,[r14*1+r8]
+ lea rdx,[r12*1+rdx]
+ andn r12,rax,rcx
+ xor r13,r15
+ rorx r14,rax,14
+ lea rdx,[r12*1+rdx]
+ xor r13,r14
+ mov r15,r8
+ rorx r12,r8,39
+ lea rdx,[r13*1+rdx]
+ xor r15,r9
+ rorx r14,r8,34
+ rorx r13,r8,28
+ lea r11,[rdx*1+r11]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r9
+ xor r14,r13
+ lea rdx,[rdi*1+rdx]
+ mov r12,rax
+ add rcx,QWORD[((72+128))+rsp]
+ and r12,r11
+ rorx r13,r11,41
+ rorx rdi,r11,18
+ lea rdx,[r14*1+rdx]
+ lea rcx,[r12*1+rcx]
+ andn r12,r11,rbx
+ xor r13,rdi
+ rorx r14,r11,14
+ lea rcx,[r12*1+rcx]
+ xor r13,r14
+ mov rdi,rdx
+ rorx r12,rdx,39
+ lea rcx,[r13*1+rcx]
+ xor rdi,r8
+ rorx r14,rdx,34
+ rorx r13,rdx,28
+ lea r10,[rcx*1+r10]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r8
+ xor r14,r13
+ lea rcx,[r15*1+rcx]
+ mov r12,r11
+ add rbx,QWORD[((96+128))+rsp]
+ and r12,r10
+ rorx r13,r10,41
+ rorx r15,r10,18
+ lea rcx,[r14*1+rcx]
+ lea rbx,[r12*1+rbx]
+ andn r12,r10,rax
+ xor r13,r15
+ rorx r14,r10,14
+ lea rbx,[r12*1+rbx]
+ xor r13,r14
+ mov r15,rcx
+ rorx r12,rcx,39
+ lea rbx,[r13*1+rbx]
+ xor r15,rdx
+ rorx r14,rcx,34
+ rorx r13,rcx,28
+ lea r9,[rbx*1+r9]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rdx
+ xor r14,r13
+ lea rbx,[rdi*1+rbx]
+ mov r12,r10
+ add rax,QWORD[((104+128))+rsp]
+ and r12,r9
+ rorx r13,r9,41
+ rorx rdi,r9,18
+ lea rbx,[r14*1+rbx]
+ lea rax,[r12*1+rax]
+ andn r12,r9,r11
+ xor r13,rdi
+ rorx r14,r9,14
+ lea rax,[r12*1+rax]
+ xor r13,r14
+ mov rdi,rbx
+ rorx r12,rbx,39
+ lea rax,[r13*1+rax]
+ xor rdi,rcx
+ rorx r14,rbx,34
+ rorx r13,rbx,28
+ lea r8,[rax*1+r8]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rcx
+ xor r14,r13
+ lea rax,[r15*1+rax]
+ mov r12,r9
+ add r11,QWORD[rsp]
+ and r12,r8
+ rorx r13,r8,41
+ rorx r15,r8,18
+ lea rax,[r14*1+rax]
+ lea r11,[r12*1+r11]
+ andn r12,r8,r10
+ xor r13,r15
+ rorx r14,r8,14
+ lea r11,[r12*1+r11]
+ xor r13,r14
+ mov r15,rax
+ rorx r12,rax,39
+ lea r11,[r13*1+r11]
+ xor r15,rbx
+ rorx r14,rax,34
+ rorx r13,rax,28
+ lea rdx,[r11*1+rdx]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rbx
+ xor r14,r13
+ lea r11,[rdi*1+r11]
+ mov r12,r8
+ add r10,QWORD[8+rsp]
+ and r12,rdx
+ rorx r13,rdx,41
+ rorx rdi,rdx,18
+ lea r11,[r14*1+r11]
+ lea r10,[r12*1+r10]
+ andn r12,rdx,r9
+ xor r13,rdi
+ rorx r14,rdx,14
+ lea r10,[r12*1+r10]
+ xor r13,r14
+ mov rdi,r11
+ rorx r12,r11,39
+ lea r10,[r13*1+r10]
+ xor rdi,rax
+ rorx r14,r11,34
+ rorx r13,r11,28
+ lea rcx,[r10*1+rcx]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rax
+ xor r14,r13
+ lea r10,[r15*1+r10]
+ mov r12,rdx
+ add r9,QWORD[32+rsp]
+ and r12,rcx
+ rorx r13,rcx,41
+ rorx r15,rcx,18
+ lea r10,[r14*1+r10]
+ lea r9,[r12*1+r9]
+ andn r12,rcx,r8
+ xor r13,r15
+ rorx r14,rcx,14
+ lea r9,[r12*1+r9]
+ xor r13,r14
+ mov r15,r10
+ rorx r12,r10,39
+ lea r9,[r13*1+r9]
+ xor r15,r11
+ rorx r14,r10,34
+ rorx r13,r10,28
+ lea rbx,[r9*1+rbx]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r11
+ xor r14,r13
+ lea r9,[rdi*1+r9]
+ mov r12,rcx
+ add r8,QWORD[40+rsp]
+ and r12,rbx
+ rorx r13,rbx,41
+ rorx rdi,rbx,18
+ lea r9,[r14*1+r9]
+ lea r8,[r12*1+r8]
+ andn r12,rbx,rdx
+ xor r13,rdi
+ rorx r14,rbx,14
+ lea r8,[r12*1+r8]
+ xor r13,r14
+ mov rdi,r9
+ rorx r12,r9,39
+ lea r8,[r13*1+r8]
+ xor rdi,r10
+ rorx r14,r9,34
+ rorx r13,r9,28
+ lea rax,[r8*1+rax]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r10
+ xor r14,r13
+ lea r8,[r15*1+r8]
+ mov r12,rbx
+ add rdx,QWORD[64+rsp]
+ and r12,rax
+ rorx r13,rax,41
+ rorx r15,rax,18
+ lea r8,[r14*1+r8]
+ lea rdx,[r12*1+rdx]
+ andn r12,rax,rcx
+ xor r13,r15
+ rorx r14,rax,14
+ lea rdx,[r12*1+rdx]
+ xor r13,r14
+ mov r15,r8
+ rorx r12,r8,39
+ lea rdx,[r13*1+rdx]
+ xor r15,r9
+ rorx r14,r8,34
+ rorx r13,r8,28
+ lea r11,[rdx*1+r11]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r9
+ xor r14,r13
+ lea rdx,[rdi*1+rdx]
+ mov r12,rax
+ add rcx,QWORD[72+rsp]
+ and r12,r11
+ rorx r13,r11,41
+ rorx rdi,r11,18
+ lea rdx,[r14*1+rdx]
+ lea rcx,[r12*1+rcx]
+ andn r12,r11,rbx
+ xor r13,rdi
+ rorx r14,r11,14
+ lea rcx,[r12*1+rcx]
+ xor r13,r14
+ mov rdi,rdx
+ rorx r12,rdx,39
+ lea rcx,[r13*1+rcx]
+ xor rdi,r8
+ rorx r14,rdx,34
+ rorx r13,rdx,28
+ lea r10,[rcx*1+r10]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r8
+ xor r14,r13
+ lea rcx,[r15*1+rcx]
+ mov r12,r11
+ add rbx,QWORD[96+rsp]
+ and r12,r10
+ rorx r13,r10,41
+ rorx r15,r10,18
+ lea rcx,[r14*1+rcx]
+ lea rbx,[r12*1+rbx]
+ andn r12,r10,rax
+ xor r13,r15
+ rorx r14,r10,14
+ lea rbx,[r12*1+rbx]
+ xor r13,r14
+ mov r15,rcx
+ rorx r12,rcx,39
+ lea rbx,[r13*1+rbx]
+ xor r15,rdx
+ rorx r14,rcx,34
+ rorx r13,rcx,28
+ lea r9,[rbx*1+r9]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rdx
+ xor r14,r13
+ lea rbx,[rdi*1+rbx]
+ mov r12,r10
+ add rax,QWORD[104+rsp]
+ and r12,r9
+ rorx r13,r9,41
+ rorx rdi,r9,18
+ lea rbx,[r14*1+rbx]
+ lea rax,[r12*1+rax]
+ andn r12,r9,r11
+ xor r13,rdi
+ rorx r14,r9,14
+ lea rax,[r12*1+rax]
+ xor r13,r14
+ mov rdi,rbx
+ rorx r12,rbx,39
+ lea rax,[r13*1+rax]
+ xor rdi,rcx
+ rorx r14,rbx,34
+ rorx r13,rbx,28
+ lea r8,[rax*1+r8]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rcx
+ xor r14,r13
+ lea rax,[r15*1+rax]
+ mov r12,r9
+ mov rdi,QWORD[1280+rsp]
+ add rax,r14
+
+ lea rbp,[1152+rsp]
+
+ add rax,QWORD[rdi]
+ add rbx,QWORD[8+rdi]
+ add rcx,QWORD[16+rdi]
+ add rdx,QWORD[24+rdi]
+ add r8,QWORD[32+rdi]
+ add r9,QWORD[40+rdi]
+ add r10,QWORD[48+rdi]
+ add r11,QWORD[56+rdi]
+
+ mov QWORD[rdi],rax
+ mov QWORD[8+rdi],rbx
+ mov QWORD[16+rdi],rcx
+ mov QWORD[24+rdi],rdx
+ mov QWORD[32+rdi],r8
+ mov QWORD[40+rdi],r9
+ mov QWORD[48+rdi],r10
+ mov QWORD[56+rdi],r11
+
+ cmp rsi,QWORD[144+rbp]
+ je NEAR $L$done_avx2
+
+ xor r14,r14
+ mov rdi,rbx
+ xor rdi,rcx
+ mov r12,r9
+ jmp NEAR $L$ower_avx2
+ALIGN 16
+$L$ower_avx2:
+ add r11,QWORD[((0+16))+rbp]
+ and r12,r8
+ rorx r13,r8,41
+ rorx r15,r8,18
+ lea rax,[r14*1+rax]
+ lea r11,[r12*1+r11]
+ andn r12,r8,r10
+ xor r13,r15
+ rorx r14,r8,14
+ lea r11,[r12*1+r11]
+ xor r13,r14
+ mov r15,rax
+ rorx r12,rax,39
+ lea r11,[r13*1+r11]
+ xor r15,rbx
+ rorx r14,rax,34
+ rorx r13,rax,28
+ lea rdx,[r11*1+rdx]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rbx
+ xor r14,r13
+ lea r11,[rdi*1+r11]
+ mov r12,r8
+ add r10,QWORD[((8+16))+rbp]
+ and r12,rdx
+ rorx r13,rdx,41
+ rorx rdi,rdx,18
+ lea r11,[r14*1+r11]
+ lea r10,[r12*1+r10]
+ andn r12,rdx,r9
+ xor r13,rdi
+ rorx r14,rdx,14
+ lea r10,[r12*1+r10]
+ xor r13,r14
+ mov rdi,r11
+ rorx r12,r11,39
+ lea r10,[r13*1+r10]
+ xor rdi,rax
+ rorx r14,r11,34
+ rorx r13,r11,28
+ lea rcx,[r10*1+rcx]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rax
+ xor r14,r13
+ lea r10,[r15*1+r10]
+ mov r12,rdx
+ add r9,QWORD[((32+16))+rbp]
+ and r12,rcx
+ rorx r13,rcx,41
+ rorx r15,rcx,18
+ lea r10,[r14*1+r10]
+ lea r9,[r12*1+r9]
+ andn r12,rcx,r8
+ xor r13,r15
+ rorx r14,rcx,14
+ lea r9,[r12*1+r9]
+ xor r13,r14
+ mov r15,r10
+ rorx r12,r10,39
+ lea r9,[r13*1+r9]
+ xor r15,r11
+ rorx r14,r10,34
+ rorx r13,r10,28
+ lea rbx,[r9*1+rbx]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r11
+ xor r14,r13
+ lea r9,[rdi*1+r9]
+ mov r12,rcx
+ add r8,QWORD[((40+16))+rbp]
+ and r12,rbx
+ rorx r13,rbx,41
+ rorx rdi,rbx,18
+ lea r9,[r14*1+r9]
+ lea r8,[r12*1+r8]
+ andn r12,rbx,rdx
+ xor r13,rdi
+ rorx r14,rbx,14
+ lea r8,[r12*1+r8]
+ xor r13,r14
+ mov rdi,r9
+ rorx r12,r9,39
+ lea r8,[r13*1+r8]
+ xor rdi,r10
+ rorx r14,r9,34
+ rorx r13,r9,28
+ lea rax,[r8*1+rax]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r10
+ xor r14,r13
+ lea r8,[r15*1+r8]
+ mov r12,rbx
+ add rdx,QWORD[((64+16))+rbp]
+ and r12,rax
+ rorx r13,rax,41
+ rorx r15,rax,18
+ lea r8,[r14*1+r8]
+ lea rdx,[r12*1+rdx]
+ andn r12,rax,rcx
+ xor r13,r15
+ rorx r14,rax,14
+ lea rdx,[r12*1+rdx]
+ xor r13,r14
+ mov r15,r8
+ rorx r12,r8,39
+ lea rdx,[r13*1+rdx]
+ xor r15,r9
+ rorx r14,r8,34
+ rorx r13,r8,28
+ lea r11,[rdx*1+r11]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,r9
+ xor r14,r13
+ lea rdx,[rdi*1+rdx]
+ mov r12,rax
+ add rcx,QWORD[((72+16))+rbp]
+ and r12,r11
+ rorx r13,r11,41
+ rorx rdi,r11,18
+ lea rdx,[r14*1+rdx]
+ lea rcx,[r12*1+rcx]
+ andn r12,r11,rbx
+ xor r13,rdi
+ rorx r14,r11,14
+ lea rcx,[r12*1+rcx]
+ xor r13,r14
+ mov rdi,rdx
+ rorx r12,rdx,39
+ lea rcx,[r13*1+rcx]
+ xor rdi,r8
+ rorx r14,rdx,34
+ rorx r13,rdx,28
+ lea r10,[rcx*1+r10]
+ and r15,rdi
+ xor r14,r12
+ xor r15,r8
+ xor r14,r13
+ lea rcx,[r15*1+rcx]
+ mov r12,r11
+ add rbx,QWORD[((96+16))+rbp]
+ and r12,r10
+ rorx r13,r10,41
+ rorx r15,r10,18
+ lea rcx,[r14*1+rcx]
+ lea rbx,[r12*1+rbx]
+ andn r12,r10,rax
+ xor r13,r15
+ rorx r14,r10,14
+ lea rbx,[r12*1+rbx]
+ xor r13,r14
+ mov r15,rcx
+ rorx r12,rcx,39
+ lea rbx,[r13*1+rbx]
+ xor r15,rdx
+ rorx r14,rcx,34
+ rorx r13,rcx,28
+ lea r9,[rbx*1+r9]
+ and rdi,r15
+ xor r14,r12
+ xor rdi,rdx
+ xor r14,r13
+ lea rbx,[rdi*1+rbx]
+ mov r12,r10
+ add rax,QWORD[((104+16))+rbp]
+ and r12,r9
+ rorx r13,r9,41
+ rorx rdi,r9,18
+ lea rbx,[r14*1+rbx]
+ lea rax,[r12*1+rax]
+ andn r12,r9,r11
+ xor r13,rdi
+ rorx r14,r9,14
+ lea rax,[r12*1+rax]
+ xor r13,r14
+ mov rdi,rbx
+ rorx r12,rbx,39
+ lea rax,[r13*1+rax]
+ xor rdi,rcx
+ rorx r14,rbx,34
+ rorx r13,rbx,28
+ lea r8,[rax*1+r8]
+ and r15,rdi
+ xor r14,r12
+ xor r15,rcx
+ xor r14,r13
+ lea rax,[r15*1+rax]
+ mov r12,r9
+ lea rbp,[((-128))+rbp]
+ cmp rbp,rsp
+ jae NEAR $L$ower_avx2
+
+ mov rdi,QWORD[1280+rsp]
+ add rax,r14
+
+ lea rsp,[1152+rsp]
+
+ add rax,QWORD[rdi]
+ add rbx,QWORD[8+rdi]
+ add rcx,QWORD[16+rdi]
+ add rdx,QWORD[24+rdi]
+ add r8,QWORD[32+rdi]
+ add r9,QWORD[40+rdi]
+ lea rsi,[256+rsi]
+ add r10,QWORD[48+rdi]
+ mov r12,rsi
+ add r11,QWORD[56+rdi]
+ cmp rsi,QWORD[((128+16))+rsp]
+
+ mov QWORD[rdi],rax
+ cmove r12,rsp
+ mov QWORD[8+rdi],rbx
+ mov QWORD[16+rdi],rcx
+ mov QWORD[24+rdi],rdx
+ mov QWORD[32+rdi],r8
+ mov QWORD[40+rdi],r9
+ mov QWORD[48+rdi],r10
+ mov QWORD[56+rdi],r11
+
+ jbe NEAR $L$oop_avx2
+ lea rbp,[rsp]
+
+$L$done_avx2:
+ lea rsp,[rbp]
+ mov rsi,QWORD[152+rsp]
+
+ vzeroupper
+ movaps xmm6,XMMWORD[((128+32))+rsp]
+ movaps xmm7,XMMWORD[((128+48))+rsp]
+ movaps xmm8,XMMWORD[((128+64))+rsp]
+ movaps xmm9,XMMWORD[((128+80))+rsp]
+ movaps xmm10,XMMWORD[((128+96))+rsp]
+ movaps xmm11,XMMWORD[((128+112))+rsp]
+ mov r15,QWORD[((-48))+rsi]
+
+ mov r14,QWORD[((-40))+rsi]
+
+ mov r13,QWORD[((-32))+rsi]
+
+ mov r12,QWORD[((-24))+rsi]
+
+ mov rbp,QWORD[((-16))+rsi]
+
+ mov rbx,QWORD[((-8))+rsi]
+
+ lea rsp,[rsi]
+
+$L$epilogue_avx2:
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_sha512_block_data_order_avx2:
+EXTERN __imp_RtlVirtualUnwind
+
+ALIGN 16
+se_handler:
+ push rsi
+ push rdi
+ push rbx
+ push rbp
+ push r12
+ push r13
+ push r14
+ push r15
+ pushfq
+ sub rsp,64
+
+ mov rax,QWORD[120+r8]
+ mov rbx,QWORD[248+r8]
+
+ mov rsi,QWORD[8+r9]
+ mov r11,QWORD[56+r9]
+
+ mov r10d,DWORD[r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ mov rax,QWORD[152+r8]
+
+ mov r10d,DWORD[4+r11]
+ lea r10,[r10*1+rsi]
+ cmp rbx,r10
+ jae NEAR $L$in_prologue
+ lea r10,[$L$avx2_shortcut]
+ cmp rbx,r10
+ jb NEAR $L$not_in_avx2
+
+ and rax,-256*8
+ add rax,1152
+$L$not_in_avx2:
+ mov rsi,rax
+ mov rax,QWORD[((128+24))+rax]
+
+ mov rbx,QWORD[((-8))+rax]
+ mov rbp,QWORD[((-16))+rax]
+ mov r12,QWORD[((-24))+rax]
+ mov r13,QWORD[((-32))+rax]
+ mov r14,QWORD[((-40))+rax]
+ mov r15,QWORD[((-48))+rax]
+ mov QWORD[144+r8],rbx
+ mov QWORD[160+r8],rbp
+ mov QWORD[216+r8],r12
+ mov QWORD[224+r8],r13
+ mov QWORD[232+r8],r14
+ mov QWORD[240+r8],r15
+
+ lea r10,[$L$epilogue]
+ cmp rbx,r10
+ jb NEAR $L$in_prologue
+
+ lea rsi,[((128+32))+rsi]
+ lea rdi,[512+r8]
+ mov ecx,12
+ DD 0xa548f3fc
+
+$L$in_prologue:
+ mov rdi,QWORD[8+rax]
+ mov rsi,QWORD[16+rax]
+ mov QWORD[152+r8],rax
+ mov QWORD[168+r8],rsi
+ mov QWORD[176+r8],rdi
+
+ mov rdi,QWORD[40+r9]
+ mov rsi,r8
+ mov ecx,154
+ DD 0xa548f3fc
+
+ mov rsi,r9
+ xor rcx,rcx
+ mov rdx,QWORD[8+rsi]
+ mov r8,QWORD[rsi]
+ mov r9,QWORD[16+rsi]
+ mov r10,QWORD[40+rsi]
+ lea r11,[56+rsi]
+ lea r12,[24+rsi]
+ mov QWORD[32+rsp],r10
+ mov QWORD[40+rsp],r11
+ mov QWORD[48+rsp],r12
+ mov QWORD[56+rsp],rcx
+ call QWORD[__imp_RtlVirtualUnwind]
+
+ mov eax,1
+ add rsp,64
+ popfq
+ pop r15
+ pop r14
+ pop r13
+ pop r12
+ pop rbp
+ pop rbx
+ pop rdi
+ pop rsi
+ DB 0F3h,0C3h ;repret
+
+section .pdata rdata align=4
+ALIGN 4
+ DD $L$SEH_begin_sha512_block_data_order wrt ..imagebase
+ DD $L$SEH_end_sha512_block_data_order wrt ..imagebase
+ DD $L$SEH_info_sha512_block_data_order wrt ..imagebase
+ DD $L$SEH_begin_sha512_block_data_order_xop wrt ..imagebase
+ DD $L$SEH_end_sha512_block_data_order_xop wrt ..imagebase
+ DD $L$SEH_info_sha512_block_data_order_xop wrt ..imagebase
+ DD $L$SEH_begin_sha512_block_data_order_avx wrt ..imagebase
+ DD $L$SEH_end_sha512_block_data_order_avx wrt ..imagebase
+ DD $L$SEH_info_sha512_block_data_order_avx wrt ..imagebase
+ DD $L$SEH_begin_sha512_block_data_order_avx2 wrt ..imagebase
+ DD $L$SEH_end_sha512_block_data_order_avx2 wrt ..imagebase
+ DD $L$SEH_info_sha512_block_data_order_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN 8
+$L$SEH_info_sha512_block_data_order:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue wrt ..imagebase,$L$epilogue wrt ..imagebase
+$L$SEH_info_sha512_block_data_order_xop:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_xop wrt ..imagebase,$L$epilogue_xop wrt ..imagebase
+$L$SEH_info_sha512_block_data_order_avx:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha512_block_data_order_avx2:
+DB 9,0,0,0
+ DD se_handler wrt ..imagebase
+ DD $L$prologue_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm
new file mode 100644
index 0000000000..2b64a074c3
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm
@@ -0,0 +1,472 @@
+; Copyright 2005-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License"). You may not use
+; this file except in compliance with the License. You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+EXTERN OPENSSL_cpuid_setup
+
+section .CRT$XCU rdata align=8
+ DQ OPENSSL_cpuid_setup
+
+
+common OPENSSL_ia32cap_P 16
+
+section .text code align=64
+
+
+global OPENSSL_atomic_add
+
+ALIGN 16
+OPENSSL_atomic_add:
+ mov eax,DWORD[rcx]
+$L$spin: lea r8,[rax*1+rdx]
+DB 0xf0
+ cmpxchg DWORD[rcx],r8d
+ jne NEAR $L$spin
+ mov eax,r8d
+DB 0x48,0x98
+ DB 0F3h,0C3h ;repret
+
+
+global OPENSSL_rdtsc
+
+ALIGN 16
+OPENSSL_rdtsc:
+ rdtsc
+ shl rdx,32
+ or rax,rdx
+ DB 0F3h,0C3h ;repret
+
+
+global OPENSSL_ia32_cpuid
+
+ALIGN 16
+OPENSSL_ia32_cpuid:
+ mov QWORD[8+rsp],rdi ;WIN64 prologue
+ mov QWORD[16+rsp],rsi
+ mov rax,rsp
+$L$SEH_begin_OPENSSL_ia32_cpuid:
+ mov rdi,rcx
+
+
+
+ mov r8,rbx
+
+
+ xor eax,eax
+ mov QWORD[8+rdi],rax
+ cpuid
+ mov r11d,eax
+
+ xor eax,eax
+ cmp ebx,0x756e6547
+ setne al
+ mov r9d,eax
+ cmp edx,0x49656e69
+ setne al
+ or r9d,eax
+ cmp ecx,0x6c65746e
+ setne al
+ or r9d,eax
+ jz NEAR $L$intel
+
+ cmp ebx,0x68747541
+ setne al
+ mov r10d,eax
+ cmp edx,0x69746E65
+ setne al
+ or r10d,eax
+ cmp ecx,0x444D4163
+ setne al
+ or r10d,eax
+ jnz NEAR $L$intel
+
+
+ mov eax,0x80000000
+ cpuid
+ cmp eax,0x80000001
+ jb NEAR $L$intel
+ mov r10d,eax
+ mov eax,0x80000001
+ cpuid
+ or r9d,ecx
+ and r9d,0x00000801
+
+ cmp r10d,0x80000008
+ jb NEAR $L$intel
+
+ mov eax,0x80000008
+ cpuid
+ movzx r10,cl
+ inc r10
+
+ mov eax,1
+ cpuid
+ bt edx,28
+ jnc NEAR $L$generic
+ shr ebx,16
+ cmp bl,r10b
+ ja NEAR $L$generic
+ and edx,0xefffffff
+ jmp NEAR $L$generic
+
+$L$intel:
+ cmp r11d,4
+ mov r10d,-1
+ jb NEAR $L$nocacheinfo
+
+ mov eax,4
+ mov ecx,0
+ cpuid
+ mov r10d,eax
+ shr r10d,14
+ and r10d,0xfff
+
+$L$nocacheinfo:
+ mov eax,1
+ cpuid
+ movd xmm0,eax
+ and edx,0xbfefffff
+ cmp r9d,0
+ jne NEAR $L$notintel
+ or edx,0x40000000
+ and ah,15
+ cmp ah,15
+ jne NEAR $L$notP4
+ or edx,0x00100000
+$L$notP4:
+ cmp ah,6
+ jne NEAR $L$notintel
+ and eax,0x0fff0ff0
+ cmp eax,0x00050670
+ je NEAR $L$knights
+ cmp eax,0x00080650
+ jne NEAR $L$notintel
+$L$knights:
+ and ecx,0xfbffffff
+
+$L$notintel:
+ bt edx,28
+ jnc NEAR $L$generic
+ and edx,0xefffffff
+ cmp r10d,0
+ je NEAR $L$generic
+
+ or edx,0x10000000
+ shr ebx,16
+ cmp bl,1
+ ja NEAR $L$generic
+ and edx,0xefffffff
+$L$generic:
+ and r9d,0x00000800
+ and ecx,0xfffff7ff
+ or r9d,ecx
+
+ mov r10d,edx
+
+ cmp r11d,7
+ jb NEAR $L$no_extended_info
+ mov eax,7
+ xor ecx,ecx
+ cpuid
+ bt r9d,26
+ jc NEAR $L$notknights
+ and ebx,0xfff7ffff
+$L$notknights:
+ movd eax,xmm0
+ and eax,0x0fff0ff0
+ cmp eax,0x00050650
+ jne NEAR $L$notskylakex
+ and ebx,0xfffeffff
+
+$L$notskylakex:
+ mov DWORD[8+rdi],ebx
+ mov DWORD[12+rdi],ecx
+$L$no_extended_info:
+
+ bt r9d,27
+ jnc NEAR $L$clear_avx
+ xor ecx,ecx
+DB 0x0f,0x01,0xd0
+ and eax,0xe6
+ cmp eax,0xe6
+ je NEAR $L$done
+ and DWORD[8+rdi],0x3fdeffff
+
+
+
+
+ and eax,6
+ cmp eax,6
+ je NEAR $L$done
+$L$clear_avx:
+ mov eax,0xefffe7ff
+ and r9d,eax
+ mov eax,0x3fdeffdf
+ and DWORD[8+rdi],eax
+$L$done:
+ shl r9,32
+ mov eax,r10d
+ mov rbx,r8
+
+ or rax,r9
+ mov rdi,QWORD[8+rsp] ;WIN64 epilogue
+ mov rsi,QWORD[16+rsp]
+ DB 0F3h,0C3h ;repret
+
+$L$SEH_end_OPENSSL_ia32_cpuid:
+
+global OPENSSL_cleanse
+
+ALIGN 16
+OPENSSL_cleanse:
+ xor rax,rax
+ cmp rdx,15
+ jae NEAR $L$ot
+ cmp rdx,0
+ je NEAR $L$ret
+$L$ittle:
+ mov BYTE[rcx],al
+ sub rdx,1
+ lea rcx,[1+rcx]
+ jnz NEAR $L$ittle
+$L$ret:
+ DB 0F3h,0C3h ;repret
+ALIGN 16
+$L$ot:
+ test rcx,7
+ jz NEAR $L$aligned
+ mov BYTE[rcx],al
+ lea rdx,[((-1))+rdx]
+ lea rcx,[1+rcx]
+ jmp NEAR $L$ot
+$L$aligned:
+ mov QWORD[rcx],rax
+ lea rdx,[((-8))+rdx]
+ test rdx,-8
+ lea rcx,[8+rcx]
+ jnz NEAR $L$aligned
+ cmp rdx,0
+ jne NEAR $L$ittle
+ DB 0F3h,0C3h ;repret
+
+
+global CRYPTO_memcmp
+
+ALIGN 16
+CRYPTO_memcmp:
+ xor rax,rax
+ xor r10,r10
+ cmp r8,0
+ je NEAR $L$no_data
+ cmp r8,16
+ jne NEAR $L$oop_cmp
+ mov r10,QWORD[rcx]
+ mov r11,QWORD[8+rcx]
+ mov r8,1
+ xor r10,QWORD[rdx]
+ xor r11,QWORD[8+rdx]
+ or r10,r11
+ cmovnz rax,r8
+ DB 0F3h,0C3h ;repret
+
+ALIGN 16
+$L$oop_cmp:
+ mov r10b,BYTE[rcx]
+ lea rcx,[1+rcx]
+ xor r10b,BYTE[rdx]
+ lea rdx,[1+rdx]
+ or al,r10b
+ dec r8
+ jnz NEAR $L$oop_cmp
+ neg rax
+ shr rax,63
+$L$no_data:
+ DB 0F3h,0C3h ;repret
+
+global OPENSSL_wipe_cpu
+
+ALIGN 16
+OPENSSL_wipe_cpu:
+ pxor xmm0,xmm0
+ pxor xmm1,xmm1
+ pxor xmm2,xmm2
+ pxor xmm3,xmm3
+ pxor xmm4,xmm4
+ pxor xmm5,xmm5
+ xor rcx,rcx
+ xor rdx,rdx
+ xor r8,r8
+ xor r9,r9
+ xor r10,r10
+ xor r11,r11
+ lea rax,[8+rsp]
+ DB 0F3h,0C3h ;repret
+
+global OPENSSL_instrument_bus
+
+ALIGN 16
+OPENSSL_instrument_bus:
+ mov r10,rcx
+ mov rcx,rdx
+ mov r11,rdx
+
+ rdtsc
+ mov r8d,eax
+ mov r9d,0
+ clflush [r10]
+DB 0xf0
+ add DWORD[r10],r9d
+ jmp NEAR $L$oop
+ALIGN 16
+$L$oop: rdtsc
+ mov edx,eax
+ sub eax,r8d
+ mov r8d,edx
+ mov r9d,eax
+ clflush [r10]
+DB 0xf0
+ add DWORD[r10],eax
+ lea r10,[4+r10]
+ sub rcx,1
+ jnz NEAR $L$oop
+
+ mov rax,r11
+ DB 0F3h,0C3h ;repret
+
+
+global OPENSSL_instrument_bus2
+
+ALIGN 16
+OPENSSL_instrument_bus2:
+ mov r10,rcx
+ mov rcx,rdx
+ mov r11,r8
+ mov QWORD[8+rsp],rcx
+
+ rdtsc
+ mov r8d,eax
+ mov r9d,0
+
+ clflush [r10]
+DB 0xf0
+ add DWORD[r10],r9d
+
+ rdtsc
+ mov edx,eax
+ sub eax,r8d
+ mov r8d,edx
+ mov r9d,eax
+$L$oop2:
+ clflush [r10]
+DB 0xf0
+ add DWORD[r10],eax
+
+ sub r11,1
+ jz NEAR $L$done2
+
+ rdtsc
+ mov edx,eax
+ sub eax,r8d
+ mov r8d,edx
+ cmp eax,r9d
+ mov r9d,eax
+ mov edx,0
+ setne dl
+ sub rcx,rdx
+ lea r10,[rdx*4+r10]
+ jnz NEAR $L$oop2
+
+$L$done2:
+ mov rax,QWORD[8+rsp]
+ sub rax,rcx
+ DB 0F3h,0C3h ;repret
+
+global OPENSSL_ia32_rdrand_bytes
+
+ALIGN 16
+OPENSSL_ia32_rdrand_bytes:
+ xor rax,rax
+ cmp rdx,0
+ je NEAR $L$done_rdrand_bytes
+
+ mov r11,8
+$L$oop_rdrand_bytes:
+DB 73,15,199,242
+ jc NEAR $L$break_rdrand_bytes
+ dec r11
+ jnz NEAR $L$oop_rdrand_bytes
+ jmp NEAR $L$done_rdrand_bytes
+
+ALIGN 16
+$L$break_rdrand_bytes:
+ cmp rdx,8
+ jb NEAR $L$tail_rdrand_bytes
+ mov QWORD[rcx],r10
+ lea rcx,[8+rcx]
+ add rax,8
+ sub rdx,8
+ jz NEAR $L$done_rdrand_bytes
+ mov r11,8
+ jmp NEAR $L$oop_rdrand_bytes
+
+ALIGN 16
+$L$tail_rdrand_bytes:
+ mov BYTE[rcx],r10b
+ lea rcx,[1+rcx]
+ inc rax
+ shr r10,8
+ dec rdx
+ jnz NEAR $L$tail_rdrand_bytes
+
+$L$done_rdrand_bytes:
+ xor r10,r10
+ DB 0F3h,0C3h ;repret
+
+global OPENSSL_ia32_rdseed_bytes
+
+ALIGN 16
+OPENSSL_ia32_rdseed_bytes:
+ xor rax,rax
+ cmp rdx,0
+ je NEAR $L$done_rdseed_bytes
+
+ mov r11,8
+$L$oop_rdseed_bytes:
+DB 73,15,199,250
+ jc NEAR $L$break_rdseed_bytes
+ dec r11
+ jnz NEAR $L$oop_rdseed_bytes
+ jmp NEAR $L$done_rdseed_bytes
+
+ALIGN 16
+$L$break_rdseed_bytes:
+ cmp rdx,8
+ jb NEAR $L$tail_rdseed_bytes
+ mov QWORD[rcx],r10
+ lea rcx,[8+rcx]
+ add rax,8
+ sub rdx,8
+ jz NEAR $L$done_rdseed_bytes
+ mov r11,8
+ jmp NEAR $L$oop_rdseed_bytes
+
+ALIGN 16
+$L$tail_rdseed_bytes:
+ mov BYTE[rcx],r10b
+ lea rcx,[1+rcx]
+ inc rax
+ shr r10,8
+ dec rdx
+ jnz NEAR $L$tail_rdseed_bytes
+
+$L$done_rdseed_bytes:
+ xor r10,r10
+ DB 0F3h,0C3h ;repret
+
diff --git a/CryptoPkg/Library/OpensslLib/process_files.pl b/CryptoPkg/Library/OpensslLib/process_files.pl
index 4ba25da407..c0a19b99b6 100755
--- a/CryptoPkg/Library/OpensslLib/process_files.pl
+++ b/CryptoPkg/Library/OpensslLib/process_files.pl
@@ -12,6 +12,47 @@
use strict;
use Cwd;
use File::Copy;
+use File::Basename;
+use File::Path qw(make_path remove_tree);
+use Text::Tabs;
+
+#
+# OpenSSL perlasm generator script does not transfer the copyright header
+#
+sub copy_license_header
+{
+ my @args = split / /, shift; #Separate args by spaces
+ my $source = $args[1]; #Source file is second (after "perl")
+ my $target = pop @args; #Target file is always last
+ chop ($target); #Remove newline char
+
+ my $temp_file_name = "license.tmp";
+ open (my $source_file, "<" . $source) || die $source;
+ open (my $target_file, "<" . $target) || die $target;
+ open (my $temp_file, ">" . $temp_file_name) || die $temp_file_name;
+
+ #Copy source file header to temp file
+ while (my $line = <$source_file>) {
+ next if ($line =~ /#!/); #Ignore shebang line
+ $line =~ s/#/;/; #Fix comment character for assembly
+ $line =~ s/\s+$/\r\n/; #Trim trailing whitespace, fixup line endings
+ print ($temp_file $line);
+ last if ($line =~ /http/); #Last line of copyright header contains a web link
+ }
+ print ($temp_file "\r\n"); #Add an empty line after the header
+ #Retrieve generated assembly contents
+ while (my $line = <$target_file>) {
+ $line =~ s/\s+$/\r\n/; #Trim trailing whitespace, fixup line endings
+ print ($temp_file expand ($line)); #expand() replaces tabs with spaces
+ }
+
+ close ($source_file);
+ close ($target_file);
+ close ($temp_file);
+
+ move ($temp_file_name, $target) ||
+ die "Cannot replace \"" . $target . "\"!";
+}
#
# Find the openssl directory name for use lib. We have to do this
@@ -21,10 +62,39 @@ use File::Copy;
#
my $inf_file;
my $OPENSSL_PATH;
+my $uefi_config;
+my $extension;
+my $arch;
my @inf;
BEGIN {
$inf_file = "OpensslLib.inf";
+ $uefi_config = "UEFI";
+ $arch = shift;
+
+ if (defined $arch) {
+ if (lc ($arch) eq lc ("X64")) {
+ $inf_file = "OpensslLibX64.inf";
+ $uefi_config = "UEFI-x86_64";
+ $extension = "nasm";
+ } elsif (lc ($arch) eq lc ("IA32")) {
+ $arch = "Ia32";
+ $inf_file = "OpensslLibIa32.inf";
+ $uefi_config = "UEFI-x86";
+ $extension = "nasm";
+ } else {
+ die "Unsupported architecture \"" . $arch . "\"!";
+ }
+
+ # Prepare assembly folder
+ if (-d $arch) {
+ remove_tree ($arch, {safe => 1}) ||
+ die "Cannot clean assembly folder \"" . $arch . "\"!";
+ } else {
+ mkdir $arch ||
+ die "Cannot create assembly folder \"" . $arch . "\"!";
+ }
+ }
# Read the contents of the inf file
open( FD, "<" . $inf_file ) ||
@@ -47,9 +117,9 @@ BEGIN {
# Configure UEFI
system(
"./Configure",
- "UEFI",
+ "--config=../uefi-asm.conf",
+ "$uefi_config",
"no-afalgeng",
- "no-asm",
"no-async",
"no-autoerrinit",
"no-autoload-config",
@@ -126,22 +196,52 @@ BEGIN {
# Retrieve file lists from OpenSSL configdata
#
use configdata qw/%unified_info/;
+use configdata qw/%config/;
+use configdata qw/%target/;
+
+#
+# Collect build flags from configdata
+#
+my $flags = "";
+foreach my $f (@{$config{lib_defines}}) {
+ $flags .= " -D$f";
+}
my @cryptofilelist = ();
my @sslfilelist = ();
+my @asmfilelist = ();
+my @asmbuild = ();
foreach my $product ((@{$unified_info{libraries}},
@{$unified_info{engines}})) {
foreach my $o (@{$unified_info{sources}->{$product}}) {
foreach my $s (@{$unified_info{sources}->{$o}}) {
- next if ($unified_info{generate}->{$s});
- next if $s =~ "crypto/bio/b_print.c";
-
# No need to add unused files in UEFI.
# So it can reduce porting time, compile time, library size.
+ next if $s =~ "crypto/bio/b_print.c";
next if $s =~ "crypto/rand/randfile.c";
next if $s =~ "crypto/store/";
next if $s =~ "crypto/err/err_all.c";
+ if ($unified_info{generate}->{$s}) {
+ if (defined $arch) {
+ my $buildstring = "perl";
+ foreach my $arg (@{$unified_info{generate}->{$s}}) {
+ if ($arg =~ ".pl") {
+ $buildstring .= " ./openssl/$arg";
+ } elsif ($arg =~ "PERLASM_SCHEME") {
+ $buildstring .= " $target{perlasm_scheme}";
+ } elsif ($arg =~ "LIB_CFLAGS") {
+ $buildstring .= "$flags";
+ }
+ }
+ ($s, my $path, undef) = fileparse($s, qr/\.[^.]*/);
+ $buildstring .= " ./$arch/$path$s.$extension";
+ make_path ("./$arch/$path");
+ push @asmbuild, "$buildstring\n";
+ push @asmfilelist, " $arch/$path$s.$extension\r\n";
+ }
+ next;
+ }
if ($product =~ "libssl") {
push @sslfilelist, ' $(OPENSSL_PATH)/' . $s . "\r\n";
next;
@@ -179,15 +279,31 @@ foreach (@headers){
}
+#
+# Generate assembly files
+#
+if (@asmbuild) {
+ print "\n--> Generating assembly files ... ";
+ foreach my $buildstring (@asmbuild) {
+ system ("$buildstring");
+ copy_license_header ($buildstring);
+ }
+ print "Done!";
+}
+
#
# Update OpensslLib.inf with autogenerated file list
#
my @new_inf = ();
my $subbing = 0;
-print "\n--> Updating OpensslLib.inf ... ";
+print "\n--> Updating $inf_file ... ";
foreach (@inf) {
+ if ($_ =~ "DEFINE OPENSSL_FLAGS_CONFIG") {
+ push @new_inf, " DEFINE OPENSSL_FLAGS_CONFIG =" . $flags . "\r\n";
+ next;
+ }
if ( $_ =~ "# Autogenerated files list starts here" ) {
- push @new_inf, $_, @cryptofilelist, @sslfilelist;
+ push @new_inf, $_, @asmfilelist, @cryptofilelist, @sslfilelist;
$subbing = 1;
next;
}
@@ -212,49 +328,51 @@ rename( $new_inf_file, $inf_file ) ||
die "rename $inf_file";
print "Done!";
-#
-# Update OpensslLibCrypto.inf with auto-generated file list (no libssl)
-#
-$inf_file = "OpensslLibCrypto.inf";
-
-# Read the contents of the inf file
-@inf = ();
-@new_inf = ();
-open( FD, "<" . $inf_file ) ||
- die "Cannot open \"" . $inf_file . "\"!";
-@inf = (<FD>);
-close(FD) ||
- die "Cannot close \"" . $inf_file . "\"!";
+if (!defined $arch) {
+ #
+ # Update OpensslLibCrypto.inf with auto-generated file list (no libssl)
+ #
+ $inf_file = "OpensslLibCrypto.inf";
-$subbing = 0;
-print "\n--> Updating OpensslLibCrypto.inf ... ";
-foreach (@inf) {
- if ( $_ =~ "# Autogenerated files list starts here" ) {
- push @new_inf, $_, @cryptofilelist;
- $subbing = 1;
- next;
- }
- if ( $_ =~ "# Autogenerated files list ends here" ) {
- push @new_inf, $_;
- $subbing = 0;
- next;
+ # Read the contents of the inf file
+ @inf = ();
+ @new_inf = ();
+ open( FD, "<" . $inf_file ) ||
+ die "Cannot open \"" . $inf_file . "\"!";
+ @inf = (<FD>);
+ close(FD) ||
+ die "Cannot close \"" . $inf_file . "\"!";
+
+ $subbing = 0;
+ print "\n--> Updating OpensslLibCrypto.inf ... ";
+ foreach (@inf) {
+ if ( $_ =~ "# Autogenerated files list starts here" ) {
+ push @new_inf, $_, @cryptofilelist;
+ $subbing = 1;
+ next;
+ }
+ if ( $_ =~ "# Autogenerated files list ends here" ) {
+ push @new_inf, $_;
+ $subbing = 0;
+ next;
+ }
+
+ push @new_inf, $_
+ unless ($subbing);
}
- push @new_inf, $_
- unless ($subbing);
+ $new_inf_file = $inf_file . ".new";
+ open( FD, ">" . $new_inf_file ) ||
+ die $new_inf_file;
+ print( FD @new_inf ) ||
+ die $new_inf_file;
+ close(FD) ||
+ die $new_inf_file;
+ rename( $new_inf_file, $inf_file ) ||
+ die "rename $inf_file";
+ print "Done!";
}
-$new_inf_file = $inf_file . ".new";
-open( FD, ">" . $new_inf_file ) ||
- die $new_inf_file;
-print( FD @new_inf ) ||
- die $new_inf_file;
-close(FD) ||
- die $new_inf_file;
-rename( $new_inf_file, $inf_file ) ||
- die "rename $inf_file";
-print "Done!";
-
#
# Copy opensslconf.h and dso_conf.h generated from OpenSSL Configuration
#
diff --git a/CryptoPkg/Library/OpensslLib/uefi-asm.conf b/CryptoPkg/Library/OpensslLib/uefi-asm.conf
new file mode 100644
index 0000000000..4fd52c9cf2
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/uefi-asm.conf
@@ -0,0 +1,14 @@
+## -*- mode: perl; -*-
+## UEFI assembly openssl configuration targets.
+
+my %targets = (
+#### UEFI
+ "UEFI-x86" => {
+ inherit_from => [ "UEFI", asm("x86_asm") ],
+ perlasm_scheme => "win32n",
+ },
+ "UEFI-x86_64" => {
+ inherit_from => [ "UEFI", asm("x86_64_asm") ],
+ perlasm_scheme => "nasm",
+ },
+);
--
2.16.2.windows.1
* Re: [edk2-devel] [PATCH 0/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
2020-03-17 10:26 [PATCH 0/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64 Zurcher, Christopher J
2020-03-17 10:26 ` [PATCH 1/1] " Zurcher, Christopher J
@ 2020-03-23 12:59 ` Laszlo Ersek
2020-03-25 18:40 ` Ard Biesheuvel
2 siblings, 0 replies; 14+ messages in thread
From: Laszlo Ersek @ 2020-03-23 12:59 UTC (permalink / raw)
To: devel, christopher.j.zurcher
Cc: Jian J Wang, Xiaoyu Lu, Eugene Cohen, Ard Biesheuvel
On 03/17/20 11:26, Zurcher, Christopher J wrote:
> BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507
>
> This patch adds support for building the native instruction algorithms for
> IA32 and X64 versions of OpensslLib. The process_files.pl script was modified
> to parse the .asm file targets from the OpenSSL build config data struct, and
> generate the necessary assembly files for the EDK2 build environment.
>
> For the X64 variant, OpenSSL includes calls to a Windows error handling API,
> and that function has been stubbed out in ApiHooks.c.
>
> For all variants, a constructor was added to call the required CPUID function
> within OpenSSL to facilitate processor capability checks in the native
> algorithms.
>
> Additional native architecture variants should be simple to add by following
> the changes made for these two architectures.
>
> The OpenSSL assembly files are traditionally generated at build time using a
> perl script. To avoid that burden on EDK2 users, these end-result assembly
> files are generated during the configuration steps performed by the package
> maintainer (through process_files.pl). The perl generator scripts inside
> OpenSSL do not parse file comments as they are only meant to create
> intermediate build files, so process_files.pl contains additional hooks to
> preserve the copyright headers as well as clean up tabs and line endings to
> comply with EDK2 coding standards. The resulting file headers align with
> the generated .h files which are already included in the EDK2 repository.
>
> Cc: Jian J Wang <jian.j.wang@intel.com>
> Cc: Xiaoyu Lu <xiaoyux.lu@intel.com>
> Cc: Eugene Cohen <eugene@hp.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> Christopher J Zurcher (1):
> CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
>
> CryptoPkg/Library/OpensslLib/OpensslLib.inf | 2 +-
> CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf | 2 +-
> CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf | 680 ++
> CryptoPkg/Library/OpensslLib/OpensslLibX64.inf | 691 ++
> CryptoPkg/Library/Include/openssl/opensslconf.h | 3 -
> CryptoPkg/Library/OpensslLib/ApiHooks.c | 18 +
> CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c | 34 +
> CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm | 3209 ++++++++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm | 648 ++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm | 1522 ++++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm | 1259 +++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm | 352 +
> CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm | 486 ++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm | 887 +++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm | 1835 +++++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm | 690 ++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm | 1264 +++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm | 381 +
> CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm | 3977 ++++++++++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm | 6796 ++++++++++++++++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm | 2842 +++++++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm | 513 ++
> CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm | 1772 +++++
> CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm | 3271 ++++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm | 4709 +++++++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm | 5084 ++++++++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm | 1170 +++
> CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm | 1989 +++++
> CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm | 2242 ++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm | 432 +
> CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm | 1479 ++++
> CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm | 4033 ++++++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm | 794 ++
> CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm | 984 +++
> CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm | 2077 +++++
> CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm | 1395 ++++
> CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm | 784 ++
> CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm | 532 ++
> CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm | 7581 ++++++++++++++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm | 5773 ++++++++++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm | 8262 ++++++++++++++++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm | 5712 ++++++++++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm | 5668 ++++++++++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm | 472 ++
> CryptoPkg/Library/OpensslLib/process_files.pl | 208 +-
> CryptoPkg/Library/OpensslLib/uefi-asm.conf | 14 +
> 46 files changed, 94478 insertions(+), 50 deletions(-)
> create mode 100644 CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf
> create mode 100644 CryptoPkg/Library/OpensslLib/OpensslLibX64.inf
> create mode 100644 CryptoPkg/Library/OpensslLib/ApiHooks.c
> create mode 100644 CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/uefi-asm.conf
>
(1) Please break this patch into at least two patches. The generated
files add more than ninety thousand lines, and (as I understand) noone
is expected to review them in detail. They should be separated to a
dedicated patch.
The rest of the code changes should be reviewed with more care, so they
deserve at least one stand-alone patch (several patches if necessary).
(2) Furthermore, I would suggest including a comment near the top of
each generated NASM file that said file was generated, and should not be
modified manually. (Of course this would mean updating the perl script
-- I'm not asking for manual comments!)
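[Editorial note: the suggested change is easy to model outside the repo. The sketch below is an illustrative Python rendition — not the actual Perl modification to process_files.pl — of the patch's header-copy step, extended with an auto-generated-file notice along the lines requested above. The notice wording and the sample file contents are assumptions.]

```python
# Illustrative model of process_files.pl's copy_license_header step,
# extended with a "do not edit" notice for generated NASM files.
# Notice text and sample inputs are hypothetical.

def copy_license_header(source_lines, generated_lines):
    out = []
    for line in source_lines:
        if line.startswith("#!"):          # skip the perl shebang line
            continue
        # switch the comment character from perl '#' to NASM ';'
        out.append(";" + line[1:] if line.startswith("#") else line)
        if "http" in line:                 # last header line holds a web link
            break
    out.append("")                         # blank line after the header
    out.append("; WARNING: do not edit! Generated from the OpenSSL sources.")
    out.append("")
    # the real script also expands tabs and normalizes line endings here
    out.extend(l.rstrip() for l in generated_lines)
    return out

source = [
    "#! /usr/bin/env perl",
    "# Copyright 2010-2020 The OpenSSL Project Authors. All Rights Reserved.",
    "# https://www.openssl.org/source/license.html",
    "# body of the perl generator, not copied",
]
generated = ["section .text", "global  main"]
for line in copy_license_header(source, generated):
    print(line)
```

The license header and the new notice land at the top of the generated file, while everything after the header's web-link line in the generator script is dropped.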
Thanks!
Laszlo
* Re: [PATCH 0/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
2020-03-17 10:26 [PATCH 0/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64 Zurcher, Christopher J
2020-03-17 10:26 ` [PATCH 1/1] " Zurcher, Christopher J
2020-03-23 12:59 ` [edk2-devel] [PATCH 0/1] " Laszlo Ersek
@ 2020-03-25 18:40 ` Ard Biesheuvel
2020-03-26 1:04 ` [edk2-devel] " Zurcher, Christopher J
2 siblings, 1 reply; 14+ messages in thread
From: Ard Biesheuvel @ 2020-03-25 18:40 UTC (permalink / raw)
To: Christopher J Zurcher
Cc: edk2-devel-groups-io, Jian J Wang, Xiaoyu Lu, Eugene Cohen
On Tue, 17 Mar 2020 at 11:26, Christopher J Zurcher
<christopher.j.zurcher@intel.com> wrote:
>
> BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507
>
> This patch adds support for building the native instruction algorithms for
> IA32 and X64 versions of OpensslLib. The process_files.pl script was modified
> to parse the .asm file targets from the OpenSSL build config data struct, and
> generate the necessary assembly files for the EDK2 build environment.
>
> For the X64 variant, OpenSSL includes calls to a Windows error handling API,
> and that function has been stubbed out in ApiHooks.c.
>
> For all variants, a constructor was added to call the required CPUID function
> within OpenSSL to facilitate processor capability checks in the native
> algorithms.
>
> Additional native architecture variants should be simple to add by following
> the changes made for these two architectures.
>
> The OpenSSL assembly files are traditionally generated at build time using a
> perl script. To avoid that burden on EDK2 users, these end-result assembly
> files are generated during the configuration steps performed by the package
> maintainer (through process_files.pl). The perl generator scripts inside
> OpenSSL do not parse file comments as they are only meant to create
> intermediate build files, so process_files.pl contains additional hooks to
> preserve the copyright headers as well as clean up tabs and line endings to
> comply with EDK2 coding standards. The resulting file headers align with
> the generated .h files which are already included in the EDK2 repository.
>
> Cc: Jian J Wang <jian.j.wang@intel.com>
> Cc: Xiaoyu Lu <xiaoyux.lu@intel.com>
> Cc: Eugene Cohen <eugene@hp.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> Christopher J Zurcher (1):
> CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
>
Hello Christopher,
It would be helpful to understand the purpose of all this. Also, I
think we could be more picky about which algorithms we enable - DES
and MD5 don't seem highly useful, and even if they were, what do we
gain by using a faster (or smaller?) implementation?
> CryptoPkg/Library/OpensslLib/OpensslLib.inf | 2 +-
> CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf | 2 +-
> CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf | 680 ++
> CryptoPkg/Library/OpensslLib/OpensslLibX64.inf | 691 ++
> CryptoPkg/Library/Include/openssl/opensslconf.h | 3 -
> CryptoPkg/Library/OpensslLib/ApiHooks.c | 18 +
> CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c | 34 +
> CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm | 3209 ++++++++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm | 648 ++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm | 1522 ++++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm | 1259 +++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm | 352 +
> CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm | 486 ++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm | 887 +++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm | 1835 +++++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm | 690 ++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm | 1264 +++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm | 381 +
> CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm | 3977 ++++++++++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm | 6796 ++++++++++++++++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm | 2842 +++++++
> CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm | 513 ++
> CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm | 1772 +++++
> CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm | 3271 ++++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm | 4709 +++++++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm | 5084 ++++++++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm | 1170 +++
> CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm | 1989 +++++
> CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm | 2242 ++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm | 432 +
> CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm | 1479 ++++
> CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm | 4033 ++++++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm | 794 ++
> CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm | 984 +++
> CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm | 2077 +++++
> CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm | 1395 ++++
> CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm | 784 ++
> CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm | 532 ++
> CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm | 7581 ++++++++++++++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm | 5773 ++++++++++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm | 8262 ++++++++++++++++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm | 5712 ++++++++++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm | 5668 ++++++++++++++
> CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm | 472 ++
> CryptoPkg/Library/OpensslLib/process_files.pl | 208 +-
> CryptoPkg/Library/OpensslLib/uefi-asm.conf | 14 +
> 46 files changed, 94478 insertions(+), 50 deletions(-)
> create mode 100644 CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf
> create mode 100644 CryptoPkg/Library/OpensslLib/OpensslLibX64.inf
> create mode 100644 CryptoPkg/Library/OpensslLib/ApiHooks.c
> create mode 100644 CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm
> create mode 100644 CryptoPkg/Library/OpensslLib/uefi-asm.conf
>
> --
> 2.16.2.windows.1
>
* Re: [edk2-devel] [PATCH 0/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
2020-03-25 18:40 ` Ard Biesheuvel
@ 2020-03-26 1:04 ` Zurcher, Christopher J
2020-03-26 7:49 ` Ard Biesheuvel
0 siblings, 1 reply; 14+ messages in thread
From: Zurcher, Christopher J @ 2020-03-26 1:04 UTC (permalink / raw)
To: devel@edk2.groups.io, ard.biesheuvel@linaro.org
Cc: Wang, Jian J, Lu, XiaoyuX, david.harris4@hp.com
> -----Original Message-----
> From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Ard Biesheuvel
> Sent: Wednesday, March 25, 2020 11:40
> To: Zurcher, Christopher J <christopher.j.zurcher@intel.com>
> Cc: edk2-devel-groups-io <devel@edk2.groups.io>; Wang, Jian J
> <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>; Eugene Cohen
> <eugene@hp.com>
> Subject: Re: [edk2-devel] [PATCH 0/1] CryptoPkg/OpensslLib: Add native
> instruction support for IA32 and X64
>
> On Tue, 17 Mar 2020 at 11:26, Christopher J Zurcher
> <christopher.j.zurcher@intel.com> wrote:
> >
> > BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507
> >
> > This patch adds support for building the native instruction algorithms for
> > IA32 and X64 versions of OpensslLib. The process_files.pl script was
> modified
> > to parse the .asm file targets from the OpenSSL build config data struct,
> and
> > generate the necessary assembly files for the EDK2 build environment.
> >
> > For the X64 variant, OpenSSL includes calls to a Windows error handling
> API,
> > and that function has been stubbed out in ApiHooks.c.
> >
> > For all variants, a constructor was added to call the required CPUID
> function
> > within OpenSSL to facilitate processor capability checks in the native
> > algorithms.
> >
> > Additional native architecture variants should be simple to add by
> following
> > the changes made for these two architectures.
> >
> > The OpenSSL assembly files are traditionally generated at build time using
> a
> > perl script. To avoid that burden on EDK2 users, these end-result assembly
> > files are generated during the configuration steps performed by the package
> > maintainer (through process_files.pl). The perl generator scripts inside
> > OpenSSL do not parse file comments as they are only meant to create
> > intermediate build files, so process_files.pl contains additional hooks to
> > preserve the copyright headers as well as clean up tabs and line endings to
> > comply with EDK2 coding standards. The resulting file headers align with
> > the generated .h files which are already included in the EDK2 repository.
> >
> > Cc: Jian J Wang <jian.j.wang@intel.com>
> > Cc: Xiaoyu Lu <xiaoyux.lu@intel.com>
> > Cc: Eugene Cohen <eugene@hp.com>
> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >
> > Christopher J Zurcher (1):
> > CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
> >
>
> Hello Christopher,
>
> It would be helpful to understand the purpose of all this. Also, I
> think we could be more picky about which algorithms we enable - DES
> and MD5 don't seem highly useful, and even if they were, what do we
> gain by using a faster (or smaller?) implementation?
>
The selection of algorithms comes from the default OpenSSL assembly targets; this combination is validated and widely used, and I don't know all the consequences of picking and choosing which ones to include. If necessary I could look into reducing the list.
The primary driver for this change is enabling the Full Flash Update (FFU) OS provisioning flow for Windows as described here:
https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/wim-vs-ffu-image-file-formats
This item under "Reliability" is what we are speeding up: "Includes a catalog and hash table to validate a signature upfront before flashing onto a device. The hash table is generated during capture, and validated when applying the image."
This provisioning flow can be performed within the UEFI environment, and the native algorithms allow significant time savings in a factory setting (minutes per device).
We also had a BZ which requested these changes, but the specific need was not provided. Maybe David @HP can provide further insight?
https://bugzilla.tianocore.org/show_bug.cgi?id=2507
There have been additional platform-specific benefits identified, for example speeding up HMAC authentication of communication with other HW/FW components.
Thanks,
Christopher Zurcher
>
>
>
> >  CryptoPkg/Library/OpensslLib/OpensslLib.inf | 2 +-
> >  CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf | 2 +-
> >  CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf | 680 ++
> >  CryptoPkg/Library/OpensslLib/OpensslLibX64.inf | 691 ++
> >  CryptoPkg/Library/Include/openssl/opensslconf.h | 3 -
> >  CryptoPkg/Library/OpensslLib/ApiHooks.c | 18 +
> >  CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c | 34 +
> >  CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm | 3209 ++++++++
> >  CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm | 648 ++
> >  CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm | 1522 ++++
> >  CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm | 1259 +++
> >  CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm | 352 +
> >  CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm | 486 ++
> >  CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm | 887 +++
> >  CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm | 1835 +++++
> >  CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm | 690 ++
> >  CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm | 1264 +++
> >  CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm | 381 +
> >  CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm | 3977 ++++++++++
> >  CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm | 6796 ++++++++++++++++
> >  CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm | 2842 +++++++
> >  CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm | 513 ++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm | 1772 +++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm | 3271 ++++++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm | 4709 +++++++++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm | 5084 ++++++++++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm | 1170 +++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm | 1989 +++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm | 2242 ++++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm | 432 +
> >  CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm | 1479 ++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm | 4033 ++++++++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm | 794 ++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm | 984 +++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm | 2077 +++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm | 1395 ++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm | 784 ++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm | 532 ++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm | 7581 ++++++++++++++++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm | 5773 ++++++++++++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm | 8262 ++++++++++++++++++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm | 5712 ++++++++++++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm | 5668 ++++++++++++++
> >  CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm | 472 ++
> >  CryptoPkg/Library/OpensslLib/process_files.pl | 208 +-
> >  CryptoPkg/Library/OpensslLib/uefi-asm.conf | 14 +
> > 46 files changed, 94478 insertions(+), 50 deletions(-)
> > create mode 100644 CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf
> > create mode 100644 CryptoPkg/Library/OpensslLib/OpensslLibX64.inf
> > create mode 100644 CryptoPkg/Library/OpensslLib/ApiHooks.c
> > create mode 100644 CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c
> > create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm
> > create mode 100644 CryptoPkg/Library/OpensslLib/uefi-asm.conf
> >
> > --
> > 2.16.2.windows.1
> >
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
2020-03-17 10:26 ` [PATCH 1/1] " Zurcher, Christopher J
@ 2020-03-26 1:15 ` Yao, Jiewen
[not found] ` <15FFB5A5A94CCE31.23217@groups.io>
1 sibling, 0 replies; 14+ messages in thread
From: Yao, Jiewen @ 2020-03-26 1:15 UTC (permalink / raw)
To: devel@edk2.groups.io, Zurcher, Christopher J
Cc: Wang, Jian J, Lu, XiaoyuX, Eugene Cohen, Ard Biesheuvel
Hi Christopher,
Thanks for the contribution. I think it is a good enhancement.
Do you have any data showing what performance improvement we can get?
Does the system boot faster with this? Which feature benefits:
UEFI Secure Boot, TCG Measured Boot, or HTTPS boot?
Comments on the code:
1) I am not sure we need separate OpensslLibIa32 and OpensslLibX64 files.
Can we just define a single INF, such as OpensslLibHw.inf?
2) Do we also need to add a new version of OpensslLibCrypto.inf?
Thank you
Yao Jiewen
> -----Original Message-----
> From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Zurcher,
> Christopher J
> Sent: Tuesday, March 17, 2020 6:27 PM
> To: devel@edk2.groups.io
> Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>;
> Eugene Cohen <eugene@hp.com>; Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Subject: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction
> support for IA32 and X64
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
[not found] ` <15FFB5A5A94CCE31.23217@groups.io>
@ 2020-03-26 1:23 ` Yao, Jiewen
2020-03-26 2:44 ` Zurcher, Christopher J
0 siblings, 1 reply; 14+ messages in thread
From: Yao, Jiewen @ 2020-03-26 1:23 UTC (permalink / raw)
To: devel@edk2.groups.io, Yao, Jiewen, Zurcher, Christopher J
Cc: Wang, Jian J, Lu, XiaoyuX, Eugene Cohen, Ard Biesheuvel
Some more comments:
3) Did you consider enabling the RNG instruction as well?
4) I saw you added some code using AVX instructions, such as the YMM registers.
Have you validated that code to make sure it works correctly in the current environment?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
2020-03-26 1:23 ` Yao, Jiewen
@ 2020-03-26 2:44 ` Zurcher, Christopher J
2020-03-26 3:05 ` Yao, Jiewen
0 siblings, 1 reply; 14+ messages in thread
From: Zurcher, Christopher J @ 2020-03-26 2:44 UTC (permalink / raw)
To: devel@edk2.groups.io, Yao, Jiewen
Cc: Wang, Jian J, Lu, XiaoyuX, Ard Biesheuvel, david.harris4@hp.com,
Kinney, Michael D
The specific performance improvement depends on the operation. The OS provisioning I mentioned in the [PATCH 0/1] thread removed hashing as a bottleneck and improved the overall operation speed by over 4x (saving 2.5 minutes of flashing time), while a direct SHA256 benchmark on the particular silicon I have available showed over a 12x improvement. I have not benchmarked the improvements to boot time. I do not know the use case targeted by BZ 2507, so I don't know what benefit will be seen there.
I will look at unifying the INF files in the next patch-set and will also add the OpensslLibCrypto.inf case.
I have not exercised the AVX code specifically, as it is coming directly from OpenSSL and includes checks against the CPUID capability flags before executing. I'm not entirely familiar with AVX requirements; is there a known environment restriction against AVX instructions in EDK2?
Regarding RNG, it looks like we already have architecture-specific variants of RdRand...?
There was some off-list feedback regarding the number of files required to be checked in here. OpenSSL does not include assembler-specific implementations of these files and instead relies on "perlasm" scripts which are parsed by a translation script at build time (in the normal OpenSSL build flow) to generate the resulting .nasm files. The implementation I have shared here generates these files as part of the OpensslLib maintainer process, similar to the existing header files which are also generated. Since process_files.pl already requires the package maintainer to have a Perl environment installed, this does not place any additional burden on them.
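To illustrate the flow described above, generating one of these files by hand looks roughly like the following (the script path and output location are examples taken from OpenSSL's source layout; the authoritative list comes from the OpenSSL build config data parsed by process_files.pl):

```shell
# Illustrative perlasm invocation: each .pl script emits assembly in the
# requested flavour ("nasm" here) for one algorithm implementation.
OPENSSL_DIR=openssl
SCRIPT="$OPENSSL_DIR/crypto/sha/asm/sha512-x86_64.pl"
if [ -f "$SCRIPT" ]; then
  # The generated file is what gets checked in, after process_files.pl
  # restores the copyright header and normalizes tabs/line endings.
  perl "$SCRIPT" nasm X64/crypto/sha/sha512-x86_64.nasm
else
  echo "OpenSSL sources not present; skipping generation"
fi
```

process_files.pl automates this invocation for every assembly target, which is why only the maintainer needs a Perl environment under the current proposal.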
An alternative implementation has been proposed in which only a listing/script of the required generator operations would be checked in; any platform build intending to use the native algorithms would then require a local Perl environment (plus underlying dependencies such as a minimum NASM version) for every developer, as well as additional pre-build steps to run the generator scripts.
Are there any strong opinions here around adding Perl as a build environment dependency vs. checking in maintainer-generated assembly "intermediate" build files?
Thanks,
Christopher Zurcher
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
2020-03-26 2:44 ` Zurcher, Christopher J
@ 2020-03-26 3:05 ` Yao, Jiewen
2020-03-26 3:29 ` Zurcher, Christopher J
0 siblings, 1 reply; 14+ messages in thread
From: Yao, Jiewen @ 2020-03-26 3:05 UTC (permalink / raw)
To: Zurcher, Christopher J, devel@edk2.groups.io
Cc: Wang, Jian J, Lu, XiaoyuX, Ard Biesheuvel, david.harris4@hp.com,
Kinney, Michael D
Thanks. Comment inline:
> -----Original Message-----
> From: Zurcher, Christopher J <christopher.j.zurcher@intel.com>
> Sent: Thursday, March 26, 2020 10:44 AM
> To: devel@edk2.groups.io; Yao, Jiewen <jiewen.yao@intel.com>
> Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>;
> Ard Biesheuvel <ard.biesheuvel@linaro.org>; david.harris4@hp.com; Kinney,
> Michael D <michael.d.kinney@intel.com>
> Subject: RE: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> instruction support for IA32 and X64
>
> The specific performance improvement depends on the operation; the OS
> provisioning I mentioned in the [Patch 0/1] thread removed the hashing as a
> bottleneck and improved the overall operation speed over 4x (saving 2.5
> minutes of flashing time), but a direct SHA256 benchmark on the particular
> silicon I have available showed over 12x improvement. I have not benchmarked
> the improvements to boot time. I do not know the use case targeted by BZ 2507
> so I don't know what benefit will be seen there.
[Jiewen] I guess there might be some improvement in HTTPS boot because of AES-NI for the TLS session.
I am just curious about the data. :-)
>
> I will look at unifying the INF files in the next patch-set and will also add the
> OpensslLibCrypto.inf case.
[Jiewen] Thanks!
>
> I have not exercised the AVX code specifically, as it is coming directly from
> OpenSSL and includes checks against the CPUID capability flags before executing.
> I'm not entirely familiar with AVX requirements; is there a known environment
> restriction against AVX instructions in EDK2?
[Jiewen] Yes. The UEFI spec only requires the environment to be set up for the XMM registers.
Using other registers such as YMM or ZMM may require special setup, and saving/restoring some FPU state.
If a function containing YMM register accesses is linked into BaseCryptLib, I highly recommend you run some tests to make sure it works correctly.
Maybe I should ask a more generic question: have you validated all the impacted crypto lib APIs to make sure they still work well with this improvement?
>
> Regarding RNG, it looks like we already have architecture-specific variants of
> RdRand...?
[Jiewen] Yes. That is in RngLib.
I ask this question because I see the OpenSSL wrapper is using the PMC/TSC as a noise source.
https://github.com/tianocore/edk2/blob/master/CryptoPkg/Library/OpensslLib/rand_pool.c
Since this patch already adds instruction dependencies, why not use the RNG instruction as well?
>
> There was some off-list feedback regarding the number of files required to be
> checked in here. OpenSSL does not include assembler-specific implementations
> of these files and instead relies on "perlasm" scripts which are parsed by a
> translation script at build time (in the normal OpenSSL build flow) to generate
> the resulting .nasm files. The implementation I have shared here generates these
> files as part of the OpensslLib maintainer process, similar to the existing header
> files which are also generated. Since process_files.pl already requires the
> package maintainer to have a Perl environment installed, this does not place any
> additional burden on them.
> An alternative implementation has been proposed which would see only a
> listing/script of the required generator operations to be checked in, and any
> platform build which intended to utilize the native algorithms would require a
> local Perl environment as well as any underlying environment dependencies
> (such as a version check against the NASM executable) for every developer, and
> additional pre-build steps to run the generator scripts.
>
> Are there any strong opinions here around adding Perl as a build environment
> dependency vs. checking in maintainer-generated assembly "intermediate" build
> files?
[Jiewen] Good question. For the tooling, maybe Mike or Liming can answer.
I have hit a similar issue myself: I have submodule code that needs CMake and a pre-processor to generate a common include file.
How do we handle that? I am looking for a recommendation as well.
>
> Thanks,
> Christopher Zurcher
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
2020-03-26 3:05 ` Yao, Jiewen
@ 2020-03-26 3:29 ` Zurcher, Christopher J
2020-03-26 3:58 ` Yao, Jiewen
0 siblings, 1 reply; 14+ messages in thread
From: Zurcher, Christopher J @ 2020-03-26 3:29 UTC (permalink / raw)
To: Yao, Jiewen, devel@edk2.groups.io
Cc: Wang, Jian J, Lu, XiaoyuX, Ard Biesheuvel, david.harris4@hp.com,
Kinney, Michael D
> -----Original Message-----
> From: Yao, Jiewen <jiewen.yao@intel.com>
> Sent: Wednesday, March 25, 2020 20:05
> To: Zurcher, Christopher J <christopher.j.zurcher@intel.com>;
> devel@edk2.groups.io
> Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>;
> Ard Biesheuvel <ard.biesheuvel@linaro.org>; david.harris4@hp.com; Kinney,
> Michael D <michael.d.kinney@intel.com>
> Subject: RE: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> instruction support for IA32 and X64
>
> Thanks. Comment inline:
>
> > -----Original Message-----
> > From: Zurcher, Christopher J <christopher.j.zurcher@intel.com>
> > Sent: Thursday, March 26, 2020 10:44 AM
> > To: devel@edk2.groups.io; Yao, Jiewen <jiewen.yao@intel.com>
> > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> <xiaoyux.lu@intel.com>;
> > Ard Biesheuvel <ard.biesheuvel@linaro.org>; david.harris4@hp.com; Kinney,
> > Michael D <michael.d.kinney@intel.com>
> > Subject: RE: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > instruction support for IA32 and X64
> >
> > The specific performance improvement depends on the operation; the OS
> > provisioning I mentioned in the [Patch 0/1] thread removed the hashing as a
> > bottleneck and improved the overall operation speed over 4x (saving 2.5
> > minutes of flashing time), but a direct SHA256 benchmark on the particular
> > silicon I have available showed over 12x improvement. I have not
> benchmarked
> > the improvements to boot time. I do not know the use case targeted by BZ
> 2507
> > so I don't know what benefit will be seen there.
> [Jiewen] I guess there might be some improvement on HTTPS boot because of
> AES-NI for TLS session.
> I am just curious on the data. :-)
>
>
> >
> > I will look at unifying the INF files in the next patch-set and will also
> add the
> > OpensslLibCrypto.inf case.
> [Jiewen] Thanks!
>
>
> >
> > I have not exercised the AVX code specifically, as it is coming directly
> from
> > OpenSSL and includes checks against the CPUID capability flags before
> executing.
> > I'm not entirely familiar with AVX requirements; is there a known
> environment
> > restriction against AVX instructions in EDK2?
> [Jiewen] Yes. UEFI spec only requires to set env for XMM register.
> Using other registers such as YMM or ZMM may requires special setup, and
> save/restore some FPU state.
> If a function containing the YMM register access and it is linked into
> BaseCryptoLib, I highly recommend you run some test to make sure it can work
> correct.
>
> Maybe I should ask more generic question: Have you validated all impacted
> crypto lib API to make sure they still work well with this improvement?
>
I have not. Is there an existing test suite that was used initially, or used with each OpenSSL version update?
Thanks,
Christopher Zurcher
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
2020-03-26 3:29 ` Zurcher, Christopher J
@ 2020-03-26 3:58 ` Yao, Jiewen
2020-03-26 18:23 ` Michael D Kinney
0 siblings, 1 reply; 14+ messages in thread
From: Yao, Jiewen @ 2020-03-26 3:58 UTC (permalink / raw)
To: Zurcher, Christopher J, devel@edk2.groups.io
Cc: Wang, Jian J, Lu, XiaoyuX, Ard Biesheuvel, david.harris4@hp.com,
Kinney, Michael D
There is an old shell-based test application. It was removed from the edk2 repo and should land in the edk2-test repo in the future.
Until then, you can find the archive at
https://github.com/jyao1/edk2/tree/edk2-cryptest/CryptoTestPkg/Cryptest
But I am not sure it covers the latest interfaces, especially the newly added ones. As far as I know, SM3 is not covered there.
I still recommend you double-check which functions use the YMM registers and make sure those functions are tested.
I have no concern about the hash algorithms, since you already provided the data, but the other algorithms (AES/RSA/HMAC, etc.) should be verified.
Thank you
Yao Jiewen
> -----Original Message-----
> From: Zurcher, Christopher J <christopher.j.zurcher@intel.com>
> Sent: Thursday, March 26, 2020 11:29 AM
> To: Yao, Jiewen <jiewen.yao@intel.com>; devel@edk2.groups.io
> Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>;
> Ard Biesheuvel <ard.biesheuvel@linaro.org>; david.harris4@hp.com; Kinney,
> Michael D <michael.d.kinney@intel.com>
> Subject: RE: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> instruction support for IA32 and X64
>
> > -----Original Message-----
> > From: Yao, Jiewen <jiewen.yao@intel.com>
> > Sent: Wednesday, March 25, 2020 20:05
> > To: Zurcher, Christopher J <christopher.j.zurcher@intel.com>;
> > devel@edk2.groups.io
> > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> <xiaoyux.lu@intel.com>;
> > Ard Biesheuvel <ard.biesheuvel@linaro.org>; david.harris4@hp.com; Kinney,
> > Michael D <michael.d.kinney@intel.com>
> > Subject: RE: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > instruction support for IA32 and X64
> >
> > Thanks. Comment inline:
> >
> > > -----Original Message-----
> > > From: Zurcher, Christopher J <christopher.j.zurcher@intel.com>
> > > Sent: Thursday, March 26, 2020 10:44 AM
> > > To: devel@edk2.groups.io; Yao, Jiewen <jiewen.yao@intel.com>
> > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > <xiaoyux.lu@intel.com>;
> > > Ard Biesheuvel <ard.biesheuvel@linaro.org>; david.harris4@hp.com; Kinney,
> > > Michael D <michael.d.kinney@intel.com>
> > > Subject: RE: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > > instruction support for IA32 and X64
> > >
> > > The specific performance improvement depends on the operation; the OS
> > > provisioning I mentioned in the [Patch 0/1] thread removed the hashing as a
> > > bottleneck and improved the overall operation speed over 4x (saving 2.5
> > > minutes of flashing time), but a direct SHA256 benchmark on the particular
> > > silicon I have available showed over 12x improvement. I have not
> > benchmarked
> > > the improvements to boot time. I do not know the use case targeted by BZ
> > 2507
> > > so I don't know what benefit will be seen there.
> > [Jiewen] I guess there might be some improvement on HTTPS boot because of
> > AES-NI for TLS session.
> > I am just curious on the data. :-)
> >
> >
> > >
> > > I will look at unifying the INF files in the next patch-set and will also
> > add the
> > > OpensslLibCrypto.inf case.
> > [Jiewen] Thanks!
> >
> >
> > >
> > > I have not exercised the AVX code specifically, as it is coming directly
> > from
> > > OpenSSL and includes checks against the CPUID capability flags before
> > executing.
> > > I'm not entirely familiar with AVX requirements; is there a known
> > environment
> > > restriction against AVX instructions in EDK2?
> > [Jiewen] Yes. UEFI spec only requires to set env for XMM register.
> > Using other registers such as YMM or ZMM may requires special setup, and
> > save/restore some FPU state.
> > If a function containing the YMM register access and it is linked into
> > BaseCryptoLib, I highly recommend you run some test to make sure it can
> work
> > correct.
> >
> > Maybe I should ask more generic question: Have you validated all impacted
> > crypto lib API to make sure they still work well with this improvement?
> >
>
> I have not. Is there an existing test suite that was used initially, or used with
> each OpenSSL version update?
>
> Thanks,
> Christopher Zurcher
>
> >
> > >
> > > Regarding RNG, it looks like we already have architecture-specific variants
> > of
> > > RdRand...?
> > [Jiewen] Yes. That is in RngLib.
> > I ask this question because I see openssl wrapper is using PMC/TSC as noisy.
> >
> > https://github.com/tianocore/edk2/blob/master/CryptoPkg/Library/OpensslLib/rand_pool.c
> > Since this patch adds instruction dependency, why no use RNG instruction as
> > well?
> >
> >
> > >
> > > There was some off-list feedback regarding the number of files required to
> > be
> > > checked in here. OpenSSL does not include assembler-specific
> > implementations
> > > of these files and instead relies on "perlasm" scripts which are parsed by
> > a
> > > translation script at build time (in the normal OpenSSL build flow) to
> > generate
> > > the resulting .nasm files. The implementation I have shared here generates
> > these
> > > files as part of the OpensslLib maintainer process, similar to the existing
> > header
> > > files which are also generated. Since process_files.pl already requires the
> > > package maintainer to have a Perl environment installed, this does not
> > place any
> > > additional burden on them.
> > > An alternative implementation has been proposed which would see only a
> > > listing/script of the required generator operations to be checked in, and
> > any
> > > platform build which intended to utilize the native algorithms would
> > require a
> > > local Perl environment as well as any underlying environment dependencies
> > > (such as a version check against the NASM executable) for every developer,
> > and
> > > additional pre-build steps to run the generator scripts.
> > >
> > > Are there any strong opinions here around adding Perl as a build
> > environment
> > > dependency vs. checking in maintainer-generated assembly "intermediate"
> > build
> > > files?
> > [Jiewen] Good question. For tool, maybe Mike or Liming can answer.
> > And I did get similar issue with you.
> > I got a submodule code need using CMake and pre-processor to generate a
> > common include file.
> > How do we handle that? I look for the recommendation as well.
> >
> >
> > >
> > > Thanks,
> > > Christopher Zurcher
> > >
> > > > -----Original Message-----
> > > > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Yao,
> > > Jiewen
> > > > Sent: Wednesday, March 25, 2020 18:23
> > > > To: devel@edk2.groups.io; Yao, Jiewen <jiewen.yao@intel.com>; Zurcher,
> > > > Christopher J <christopher.j.zurcher@intel.com>
> > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > > <xiaoyux.lu@intel.com>;
> > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel
> <ard.biesheuvel@linaro.org>
> > > > Subject: Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > > > instruction support for IA32 and X64
> > > >
> > > > Some more comment:
> > > >
> > > > 3) Do you consider to enable RNG instruction as well?
> > > >
> > > > 4) I saw you added some code for AVX instruction, such as YMM register.
> > > > Have you validated that code, to make sure it can work correctly in
> > current
> > > > environment?
> > > >
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Yao,
> > > Jiewen
> > > > > Sent: Thursday, March 26, 2020 9:15 AM
> > > > > To: devel@edk2.groups.io; Zurcher, Christopher J
> > > > > <christopher.j.zurcher@intel.com>
> > > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > > > <xiaoyux.lu@intel.com>;
> > > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel
> > > <ard.biesheuvel@linaro.org>
> > > > > Subject: Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > > > > instruction support for IA32 and X64
> > > > >
> > > > > HI Christopher
> > > > > Thanks for the contribution. I think it is good enhancement.
> > > > >
> > > > > Do you have any data show what performance improvement we can get?
> > > > > Did the system boot faster with the this? Which feature ?
> > > > > UEFI Secure Boot? TCG Measured Boot? HTTPS boot?
> > > > >
> > > > >
> > > > > Comment for the code:
> > > > > 1) I am not sure if we need separate OpensslLibIa32 and OpensslLibX64.
> > > > > Can we just define single INF, such as OpensslLibHw.inf ?
> > > > >
> > > > > 2) Do we also need add a new version for OpensslLibCrypto.inf ?
> > > > >
> > > > >
> > > > >
> > > > > Thank you
> > > > > Yao Jiewen
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of
> > > Zurcher,
> > > > > > Christopher J
> > > > > > Sent: Tuesday, March 17, 2020 6:27 PM
> > > > > > To: devel@edk2.groups.io
> > > > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > > > > <xiaoyux.lu@intel.com>;
> > > > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel
> > > <ard.biesheuvel@linaro.org>
> > > > > > Subject: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > > > > instruction
> > > > > > support for IA32 and X64
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> >
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [edk2-devel] [PATCH 0/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
2020-03-26 1:04 ` [edk2-devel] " Zurcher, Christopher J
@ 2020-03-26 7:49 ` Ard Biesheuvel
0 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2020-03-26 7:49 UTC (permalink / raw)
To: Zurcher, Christopher J
Cc: devel@edk2.groups.io, Wang, Jian J, Lu, XiaoyuX,
david.harris4@hp.com
On Thu, 26 Mar 2020 at 02:04, Zurcher, Christopher J
<christopher.j.zurcher@intel.com> wrote:
>
> > -----Original Message-----
> > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Ard Biesheuvel
> > Sent: Wednesday, March 25, 2020 11:40
> > To: Zurcher, Christopher J <christopher.j.zurcher@intel.com>
> > Cc: edk2-devel-groups-io <devel@edk2.groups.io>; Wang, Jian J
> > <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>; Eugene Cohen
> > <eugene@hp.com>
> > Subject: Re: [edk2-devel] [PATCH 0/1] CryptoPkg/OpensslLib: Add native
> > instruction support for IA32 and X64
> >
> > On Tue, 17 Mar 2020 at 11:26, Christopher J Zurcher
> > <christopher.j.zurcher@intel.com> wrote:
> > >
> > > BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507
> > >
> > > This patch adds support for building the native instruction algorithms for
> > > IA32 and X64 versions of OpensslLib. The process_files.pl script was
> > modified
> > > to parse the .asm file targets from the OpenSSL build config data struct,
> > and
> > > generate the necessary assembly files for the EDK2 build environment.
> > >
> > > For the X64 variant, OpenSSL includes calls to a Windows error handling
> > API,
> > > and that function has been stubbed out in ApiHooks.c.
> > >
> > > For all variants, a constructor was added to call the required CPUID
> > function
> > > within OpenSSL to facilitate processor capability checks in the native
> > > algorithms.
> > >
> > > Additional native architecture variants should be simple to add by
> > following
> > > the changes made for these two architectures.
> > >
> > > The OpenSSL assembly files are traditionally generated at build time using
> > a
> > > perl script. To avoid that burden on EDK2 users, these end-result assembly
> > > files are generated during the configuration steps performed by the package
> > > maintainer (through process_files.pl). The perl generator scripts inside
> > > OpenSSL do not parse file comments as they are only meant to create
> > > intermediate build files, so process_files.pl contains additional hooks to
> > > preserve the copyright headers as well as clean up tabs and line endings to
> > > comply with EDK2 coding standards. The resulting file headers align with
> > > the generated .h files which are already included in the EDK2 repository.
> > >
> > > Cc: Jian J Wang <jian.j.wang@intel.com>
> > > Cc: Xiaoyu Lu <xiaoyux.lu@intel.com>
> > > Cc: Eugene Cohen <eugene@hp.com>
> > > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > >
> > > Christopher J Zurcher (1):
> > > CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
> > >
> >
> > Hello Christopher,
> >
> > It would be helpful to understand the purpose of all this. Also, I
> > think we could be more picky about which algorithms we enable - DES
> > and MD5 don't seem highly useful, and even if they were, what do we
> > gain by using a faster (or smaller?) implementation?
> >
>
> The selection of algorithms comes from the default OpenSSL assembly targets; this combination is validated and widely used, and I don't know all the consequences of picking and choosing which ones to include. If necessary I could look into reducing the list.
>
> The primary driver for this change is enabling the Full Flash Update (FFU) OS provisioning flow for Windows as described here:
> https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/wim-vs-ffu-image-file-formats
> This item under "Reliability" is what we are speeding up: "Includes a catalog and hash table to validate a signature upfront before flashing onto a device. The hash table is generated during capture, and validated when applying the image."
> This provisioning flow can be performed within the UEFI environment, and the native algorithms allow significant time savings in a factory setting (minutes per device).
>
> We also had a BZ which requested these changes and the specific need was not provided. Maybe David @HP can provide further insight?
> https://bugzilla.tianocore.org/show_bug.cgi?id=2507
>
> There have been additional platform-specific benefits identified, for example speeding up HMAC authentication of communication with other HW/FW components.
>
OK, so in summary, there is one particular hash that you need to speed
up, and this is why you enable every single asm implementation in
OpenSSL for X86. I'm not sure I am convinced that this is justified.
OpenSSL can easily deal with having just a couple of these accelerated
implementations enabled. To me, it seems like a more sensible approach
to only enable algorithms based on special instructions (such as
AES-NI and SHA-NI), which typically give a speedup in the 10x-20x
range, and to leave the other ones disabled. E.g., accelerated
Montgomery multiplication for RSA is really not worth it, since the
speedup is around 50% and RSA is hardly a bottleneck in UEFI. Same
applies to deprecated ciphers such as DES or MD5 - they are not based
on special instructions, and shouldn't be used in the first place.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
2020-03-26 3:58 ` Yao, Jiewen
@ 2020-03-26 18:23 ` Michael D Kinney
2020-03-27 0:52 ` Zurcher, Christopher J
0 siblings, 1 reply; 14+ messages in thread
From: Michael D Kinney @ 2020-03-26 18:23 UTC (permalink / raw)
To: Yao, Jiewen, Zurcher, Christopher J, devel@edk2.groups.io,
Kinney, Michael D
Cc: Wang, Jian J, Lu, XiaoyuX, Ard Biesheuvel, david.harris4@hp.com
The main issue with use of extended registers is
that the default interrupt/exception handler for
IA32/X64 does FXSAVE/FXRSTOR.
https://github.com/tianocore/edk2/blob/5f7c91f0d72efca3b53628163861265c89306f1f/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.nasm#L244
https://github.com/tianocore/edk2/blob/5f7c91f0d72efca3b53628163861265c89306f1f/UefiCpuPkg/Library/CpuExceptionHandlerLib/Ia32/ExceptionHandlerAsm.nasm#L290
This covers FP/MMX/XMM registers, but no more.
Any code that uses more than this has 2 options:
1) Expand the state saved in interrupts/exception handlers
for the entire platform.
2) Disable interrupts when calling code that uses
additional register state.
Mike
> -----Original Message-----
> From: Yao, Jiewen <jiewen.yao@intel.com>
> Sent: Wednesday, March 25, 2020 8:58 PM
> To: Zurcher, Christopher J
> <christopher.j.zurcher@intel.com>; devel@edk2.groups.io
> Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> <xiaoyux.lu@intel.com>; Ard Biesheuvel
> <ard.biesheuvel@linaro.org>; david.harris4@hp.com;
> Kinney, Michael D <michael.d.kinney@intel.com>
> Subject: RE: [edk2-devel] [PATCH 1/1]
> CryptoPkg/OpensslLib: Add native instruction support for
> IA32 and X64
>
> There is an old shell based test application. It is
> removed from edk2-repo, and it should be in edk2-test
> repo in the future.
>
> Before the latter is ready, you can find the archive at
> https://github.com/jyao1/edk2/tree/edk2-cryptest/CryptoTestPkg/Cryptest
>
> But I am not sure if it covers the latest interface,
> especially the new added one. As far as I know, some SM3
> is not added there.
>
> I still recommend you double check which functions use
> the YMM and make sure these functions are tested.
>
> I have no concern on HASH, since you already provided the
> data.
> But other algo, such as AES/RSA/HMAC, etc.
>
>
> Thank you
> Yao Jiewen
>
> > -----Original Message-----
> > From: Zurcher, Christopher J
> <christopher.j.zurcher@intel.com>
> > Sent: Thursday, March 26, 2020 11:29 AM
> > To: Yao, Jiewen <jiewen.yao@intel.com>;
> devel@edk2.groups.io
> > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> <xiaoyux.lu@intel.com>;
> > Ard Biesheuvel <ard.biesheuvel@linaro.org>;
> david.harris4@hp.com; Kinney,
> > Michael D <michael.d.kinney@intel.com>
> > Subject: RE: [edk2-devel] [PATCH 1/1]
> CryptoPkg/OpensslLib: Add native
> > instruction support for IA32 and X64
> >
> > > -----Original Message-----
> > > From: Yao, Jiewen <jiewen.yao@intel.com>
> > > Sent: Wednesday, March 25, 2020 20:05
> > > To: Zurcher, Christopher J
> <christopher.j.zurcher@intel.com>;
> > > devel@edk2.groups.io
> > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > <xiaoyux.lu@intel.com>;
> > > Ard Biesheuvel <ard.biesheuvel@linaro.org>;
> david.harris4@hp.com; Kinney,
> > > Michael D <michael.d.kinney@intel.com>
> > > Subject: RE: [edk2-devel] [PATCH 1/1]
> CryptoPkg/OpensslLib: Add native
> > > instruction support for IA32 and X64
> > >
> > > Thanks. Comment inline:
> > >
> > > > -----Original Message-----
> > > > From: Zurcher, Christopher J
> <christopher.j.zurcher@intel.com>
> > > > Sent: Thursday, March 26, 2020 10:44 AM
> > > > To: devel@edk2.groups.io; Yao, Jiewen
> <jiewen.yao@intel.com>
> > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu,
> XiaoyuX
> > > <xiaoyux.lu@intel.com>;
> > > > Ard Biesheuvel <ard.biesheuvel@linaro.org>;
> david.harris4@hp.com; Kinney,
> > > > Michael D <michael.d.kinney@intel.com>
> > > > Subject: RE: [edk2-devel] [PATCH 1/1]
> CryptoPkg/OpensslLib: Add native
> > > > instruction support for IA32 and X64
> > > >
> > > > The specific performance improvement depends on the
> operation; the OS
> > > > provisioning I mentioned in the [Patch 0/1] thread
> removed the hashing as a
> > > > bottleneck and improved the overall operation speed
> over 4x (saving 2.5
> > > > minutes of flashing time), but a direct SHA256
> benchmark on the particular
> > > > silicon I have available showed over 12x
> improvement. I have not
> > > benchmarked
> > > > the improvements to boot time. I do not know the
> use case targeted by BZ
> > > 2507
> > > > so I don’t know what benefit will be seen there.
> > > [Jiewen] I guess there might be some improvement on
> HTTPS boot because of
> > > AES-NI for TLS session.
> > > I am just curious on the data. :-)
> > >
> > >
> > > >
> > > > I will look at unifying the INF files in the next
> patch-set and will also
> > > add the
> > > > OpensslLibCrypto.inf case.
> > > [Jiewen] Thanks!
> > >
> > >
> > > >
> > > > I have not exercised the AVX code specifically, as
> it is coming directly
> > > from
> > > > OpenSSL and includes checks against the CPUID
> capability flags before
> > > executing.
> > > > I'm not entirely familiar with AVX requirements; is
> there a known
> > > environment
> > > > restriction against AVX instructions in EDK2?
> > > [Jiewen] Yes. UEFI spec only requires to set env for
> XMM register.
> > > Using other registers such as YMM or ZMM may requires
> special setup, and
> > > save/restore some FPU state.
> > > If a function containing the YMM register access and
> it is linked into
> > > BaseCryptoLib, I highly recommend you run some test
> to make sure it can
> > work
> > > correct.
> > >
> > > Maybe I should ask more generic question: Have you
> validated all impacted
> > > crypto lib API to make sure they still work well with
> this improvement?
> > >
> >
> > I have not. Is there an existing test suite that was
> used initially, or used with
> > each OpenSSL version update?
> >
> > Thanks,
> > Christopher Zurcher
> >
> > >
> > > >
> > > > Regarding RNG, it looks like we already have
> architecture-specific variants
> > > of
> > > > RdRand...?
> > > [Jiewen] Yes. That is in RngLib.
> > > I ask this question because I see openssl wrapper is
> using PMC/TSC as noisy.
> > >
> >
> > > https://github.com/tianocore/edk2/blob/master/CryptoPkg/Library/OpensslLib/rand_pool.c
> > > Since this patch adds instruction dependency, why no
> use RNG instruction as
> > > well?
> > >
> > >
> > > >
> > > > There was some off-list feedback regarding the
> number of files required to
> > > be
> > > > checked in here. OpenSSL does not include
> assembler-specific
> > > implementations
> > > > of these files and instead relies on "perlasm"
> scripts which are parsed by
> > > a
> > > > translation script at build time (in the normal
> OpenSSL build flow) to
> > > generate
> > > > the resulting .nasm files. The implementation I
> have shared here generates
> > > these
> > > > files as part of the OpensslLib maintainer process,
> similar to the existing
> > > header
> > > > files which are also generated. Since
> process_files.pl already requires the
> > > > package maintainer to have a Perl environment
> installed, this does not
> > > place any
> > > > additional burden on them.
> > > > An alternative implementation has been proposed
> which would see only a
> > > > listing/script of the required generator operations
> to be checked in, and
> > > any
> > > > platform build which intended to utilize the native
> algorithms would
> > > require a
> > > > local Perl environment as well as any underlying
> environment dependencies
> > > > (such as a version check against the NASM
> executable) for every developer,
> > > and
> > > > additional pre-build steps to run the generator
> scripts.
> > > >
> > > > Are there any strong opinions here around adding
> Perl as a build
> > > environment
> > > > dependency vs. checking in maintainer-generated
> assembly "intermediate"
> > > build
> > > > files?
> > > [Jiewen] Good question. For tool, maybe Mike or
> Liming can answer.
> > > And I did get similar issue with you.
> > > I got a submodule code need using CMake and pre-
> processor to generate a
> > > common include file.
> > > How do we handle that? I look for the recommendation
> as well.
> > >
> > >
> > > >
> > > > Thanks,
> > > > Christopher Zurcher
> > > >
> > > > > -----Original Message-----
> > > > > From: devel@edk2.groups.io <devel@edk2.groups.io>
> On Behalf Of Yao,
> > > > Jiewen
> > > > > Sent: Wednesday, March 25, 2020 18:23
> > > > > To: devel@edk2.groups.io; Yao, Jiewen
> <jiewen.yao@intel.com>; Zurcher,
> > > > > Christopher J <christopher.j.zurcher@intel.com>
> > > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu,
> XiaoyuX
> > > > <xiaoyux.lu@intel.com>;
> > > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel
> > <ard.biesheuvel@linaro.org>
> > > > > Subject: Re: [edk2-devel] [PATCH 1/1]
> CryptoPkg/OpensslLib: Add native
> > > > > instruction support for IA32 and X64
> > > > >
> > > > > Some more comment:
> > > > >
> > > > > 3) Do you consider to enable RNG instruction as
> well?
> > > > >
> > > > > 4) I saw you added some code for AVX instruction,
> such as YMM register.
> > > > > Have you validated that code, to make sure it can
> work correctly in
> > > current
> > > > > environment?
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: devel@edk2.groups.io
> <devel@edk2.groups.io> On Behalf Of Yao,
> > > > Jiewen
> > > > > > Sent: Thursday, March 26, 2020 9:15 AM
> > > > > > To: devel@edk2.groups.io; Zurcher, Christopher
> J
> > > > > > <christopher.j.zurcher@intel.com>
> > > > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu,
> XiaoyuX
> > > > > <xiaoyux.lu@intel.com>;
> > > > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel
> > > > <ard.biesheuvel@linaro.org>
> > > > > > Subject: Re: [edk2-devel] [PATCH 1/1]
> CryptoPkg/OpensslLib: Add native
> > > > > > instruction support for IA32 and X64
> > > > > >
> > > > > > HI Christopher
> > > > > > Thanks for the contribution. I think it is good
> enhancement.
> > > > > >
> > > > > > Do you have any data show what performance
> improvement we can get?
> > > > > > Did the system boot faster with the this? Which
> feature ?
> > > > > > UEFI Secure Boot? TCG Measured Boot? HTTPS
> boot?
> > > > > >
> > > > > >
> > > > > > Comment for the code:
> > > > > > 1) I am not sure if we need separate
> OpensslLibIa32 and OpensslLibX64.
> > > > > > Can we just define single INF, such as
> OpensslLibHw.inf ?
> > > > > >
> > > > > > 2) Do we also need add a new version for
> OpensslLibCrypto.inf ?
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thank you
> > > > > > Yao Jiewen
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: devel@edk2.groups.io
> <devel@edk2.groups.io> On Behalf Of
> > > > Zurcher,
> > > > > > > Christopher J
> > > > > > > Sent: Tuesday, March 17, 2020 6:27 PM
> > > > > > > To: devel@edk2.groups.io
> > > > > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu,
> XiaoyuX
> > > > > > <xiaoyux.lu@intel.com>;
> > > > > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel
> > > > <ard.biesheuvel@linaro.org>
> > > > > > > Subject: [edk2-devel] [PATCH 1/1]
> CryptoPkg/OpensslLib: Add native
> > > > > > instruction
> > > > > > > support for IA32 and X64
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
2020-03-26 18:23 ` Michael D Kinney
@ 2020-03-27 0:52 ` Zurcher, Christopher J
0 siblings, 0 replies; 14+ messages in thread
From: Zurcher, Christopher J @ 2020-03-27 0:52 UTC (permalink / raw)
To: Kinney, Michael D, Yao, Jiewen, devel@edk2.groups.io
Cc: Wang, Jian J, Lu, XiaoyuX, Ard Biesheuvel, david.harris4@hp.com
During the perlasm generation step, the script attempts to execute NASM from the user's path to check whether the installed version is new enough to support AVX instructions:
https://github.com/openssl/openssl/blob/OpenSSL_1_1_1-stable/crypto/sha/asm/sha512-x86_64.pl#L129
For now, we could force this check to fail during generation. I'm going to reduce this patch to cover only AES and SHA; the OS-provisioning use case for SHA acceleration is I/O-bound enough that the AVX speedup is already reaching diminishing returns. I don't mind leaving AVX support for the future.
--
Christopher Zurcher
> -----Original Message-----
> From: Kinney, Michael D <michael.d.kinney@intel.com>
> Sent: Thursday, March 26, 2020 11:23
> To: Yao, Jiewen <jiewen.yao@intel.com>; Zurcher, Christopher J
> <christopher.j.zurcher@intel.com>; devel@edk2.groups.io; Kinney, Michael D
> <michael.d.kinney@intel.com>
> Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>;
> Ard Biesheuvel <ard.biesheuvel@linaro.org>; david.harris4@hp.com
> Subject: RE: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> instruction support for IA32 and X64
>
> The main issue with use of extended registers is
> that the default interrupt/exception handler for
> IA32/X64 does FXSAVE/FXRSTOR.
>
> https://github.com/tianocore/edk2/blob/5f7c91f0d72efca3b53628163861265c89306f1f/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.nasm#L244
>
> https://github.com/tianocore/edk2/blob/5f7c91f0d72efca3b53628163861265c89306f1f/UefiCpuPkg/Library/CpuExceptionHandlerLib/Ia32/ExceptionHandlerAsm.nasm#L290
>
> This covers FP/MMX/XMM registers, but no more.
>
> Any code that uses more than this has 2 options:
> 1) Expand the state saved in interrupts/exception handlers
> for the entire platform.
> 2) Disable interrupts when calling code that uses
> additional register state.
>
> Mike
>
> > -----Original Message-----
> > From: Yao, Jiewen <jiewen.yao@intel.com>
> > Sent: Wednesday, March 25, 2020 8:58 PM
> > To: Zurcher, Christopher J
> > <christopher.j.zurcher@intel.com>; devel@edk2.groups.io
> > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > <xiaoyux.lu@intel.com>; Ard Biesheuvel
> > <ard.biesheuvel@linaro.org>; david.harris4@hp.com;
> > Kinney, Michael D <michael.d.kinney@intel.com>
> > Subject: RE: [edk2-devel] [PATCH 1/1]
> > CryptoPkg/OpensslLib: Add native instruction support for
> > IA32 and X64
> >
> > There is an old shell based test application. It is
> > removed from edk2-repo, and it should be in edk2-test
> > repo in the future.
> >
> > Before the latter is ready, you can find the archive at
> > https://github.com/jyao1/edk2/tree/edk2-cryptest/CryptoTestPkg/Cryptest
> >
> > But I am not sure if it covers the latest interface,
> > especially the new added one. As far as I know, some SM3
> > is not added there.
> >
> > I still recommend you double check which functions use
> > the YMM and make sure these functions are tested.
> >
> > I have no concern on HASH, since you already provided the
> > data.
> > But other algo, such as AES/RSA/HMAC, etc.
> >
> >
> > Thank you
> > Yao Jiewen
> >
> > > -----Original Message-----
> > > From: Zurcher, Christopher J
> > <christopher.j.zurcher@intel.com>
> > > Sent: Thursday, March 26, 2020 11:29 AM
> > > To: Yao, Jiewen <jiewen.yao@intel.com>;
> > devel@edk2.groups.io
> > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > <xiaoyux.lu@intel.com>;
> > > Ard Biesheuvel <ard.biesheuvel@linaro.org>;
> > david.harris4@hp.com; Kinney,
> > > Michael D <michael.d.kinney@intel.com>
> > > Subject: RE: [edk2-devel] [PATCH 1/1]
> > CryptoPkg/OpensslLib: Add native
> > > instruction support for IA32 and X64
> > >
> > > > -----Original Message-----
> > > > From: Yao, Jiewen <jiewen.yao@intel.com>
> > > > Sent: Wednesday, March 25, 2020 20:05
> > > > To: Zurcher, Christopher J
> > <christopher.j.zurcher@intel.com>;
> > > > devel@edk2.groups.io
> > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > > <xiaoyux.lu@intel.com>;
> > > > Ard Biesheuvel <ard.biesheuvel@linaro.org>;
> > david.harris4@hp.com; Kinney,
> > > > Michael D <michael.d.kinney@intel.com>
> > > > Subject: RE: [edk2-devel] [PATCH 1/1]
> > CryptoPkg/OpensslLib: Add native
> > > > instruction support for IA32 and X64
> > > >
> > > > Thanks. Comment inline:
> > > >
> > > > > -----Original Message-----
> > > > > From: Zurcher, Christopher J
> > <christopher.j.zurcher@intel.com>
> > > > > Sent: Thursday, March 26, 2020 10:44 AM
> > > > > To: devel@edk2.groups.io; Yao, Jiewen
> > <jiewen.yao@intel.com>
> > > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu,
> > XiaoyuX
> > > > <xiaoyux.lu@intel.com>;
> > > > > Ard Biesheuvel <ard.biesheuvel@linaro.org>;
> > david.harris4@hp.com; Kinney,
> > > > > Michael D <michael.d.kinney@intel.com>
> > > > > Subject: RE: [edk2-devel] [PATCH 1/1]
> > CryptoPkg/OpensslLib: Add native
> > > > > instruction support for IA32 and X64
> > > > >
> > > > > The specific performance improvement depends on the
> > operation; the OS
> > > > > provisioning I mentioned in the [Patch 0/1] thread
> > removed the hashing as a
> > > > > bottleneck and improved the overall operation speed
> > over 4x (saving 2.5
> > > > > minutes of flashing time), but a direct SHA256
> > benchmark on the particular
> > > > > silicon I have available showed over 12x
> > improvement. I have not
> > > > benchmarked
> > > > > the improvements to boot time. I do not know the
> > use case targeted by BZ
> > > > 2507
> > > > > so I don’t know what benefit will be seen there.
> > > > [Jiewen] I guess there might be some improvement on
> > HTTPS boot because of
> > > > AES-NI for TLS session.
> > > > I am just curious on the data. :-)
> > > >
> > > >
> > > > >
> > > > > I will look at unifying the INF files in the next
> > patch-set and will also
> > > > add the
> > > > > OpensslLibCrypto.inf case.
> > > > [Jiewen] Thanks!
> > > >
> > > >
> > > > >
> > > > > I have not exercised the AVX code specifically, as it is coming directly
> > > > > from OpenSSL and includes checks against the CPUID capability flags before
> > > > > executing. I'm not entirely familiar with AVX requirements; is there a known
> > > > > environment restriction against AVX instructions in EDK2?
> > > > [Jiewen] Yes. The UEFI spec only requires setting up the environment for the
> > > > XMM registers. Using other registers such as YMM or ZMM may require special
> > > > setup, plus saving/restoring some FPU state.
> > > > If a function containing YMM register accesses is linked into BaseCryptoLib,
> > > > I highly recommend you run some tests to make sure it works correctly.
> > > >
> > > > Maybe I should ask a more generic question: Have you validated all impacted
> > > > crypto lib APIs to make sure they still work well with this improvement?
> > > >
> > >
> > > I have not. Is there an existing test suite that was used initially, or used
> > > with each OpenSSL version update?
> > >
> > > Thanks,
> > > Christopher Zurcher
> > >
> > > >
> > > > >
> > > > > Regarding RNG, it looks like we already have architecture-specific
> > > > > variants of RdRand...?
> > > > [Jiewen] Yes. That is in RngLib.
> > > > I ask this question because I see the OpenSSL wrapper is using the PMC/TSC
> > > > as a noise source:
> > > > https://github.com/tianocore/edk2/blob/master/CryptoPkg/Library/OpensslLib/rand_pool.c
> > > > Since this patch adds an instruction dependency, why not use the RNG
> > > > instruction as well?
> > > >
> > > >
> > > > >
> > > > > There was some off-list feedback regarding the number of files required
> > > > > to be checked in here. OpenSSL does not include assembler-specific
> > > > > implementations of these files and instead relies on "perlasm" scripts
> > > > > which are parsed by a translation script at build time (in the normal
> > > > > OpenSSL build flow) to generate the resulting .nasm files. The
> > > > > implementation I have shared here generates these files as part of the
> > > > > OpensslLib maintainer process, similar to the existing header files which
> > > > > are also generated. Since process_files.pl already requires the package
> > > > > maintainer to have a Perl environment installed, this does not place any
> > > > > additional burden on them.
> > > > > An alternative implementation has been proposed which would see only a
> > > > > listing/script of the required generator operations to be checked in, and
> > > > > any platform build which intended to utilize the native algorithms would
> > > > > require a local Perl environment as well as any underlying environment
> > > > > dependencies (such as a version check against the NASM executable) for
> > > > > every developer, and additional pre-build steps to run the generator
> > > > > scripts.
> > > > >
> > > > > Are there any strong opinions here around adding Perl as a build
> > > > > environment dependency vs. checking in maintainer-generated assembly
> > > > > "intermediate" build files?
> > > > [Jiewen] Good question. For tooling, maybe Mike or Liming can answer.
> > > > I have hit a similar issue: I have submodule code that needs CMake and a
> > > > pre-processor to generate a common include file.
> > > > How do we handle that? I am looking for a recommendation as well.
> > > >
> > > >
> > > > >
> > > > > Thanks,
> > > > > Christopher Zurcher
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Yao,
> > > > > > Jiewen
> > > > > > Sent: Wednesday, March 25, 2020 18:23
> > > > > > To: devel@edk2.groups.io; Yao, Jiewen <jiewen.yao@intel.com>; Zurcher,
> > > > > > Christopher J <christopher.j.zurcher@intel.com>
> > > > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>;
> > > > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > > > > > Subject: Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > > > > > instruction support for IA32 and X64
> > > > > >
> > > > > > Some more comments:
> > > > > >
> > > > > > 3) Did you consider enabling the RNG instruction as well?
> > > > > >
> > > > > > 4) I saw you added some code for AVX instructions, such as the YMM
> > > > > > registers. Have you validated that code to make sure it works correctly
> > > > > > in the current environment?
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Yao,
> > > > > > > Jiewen
> > > > > > > Sent: Thursday, March 26, 2020 9:15 AM
> > > > > > > To: devel@edk2.groups.io; Zurcher, Christopher J
> > > > > > > <christopher.j.zurcher@intel.com>
> > > > > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>;
> > > > > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > > > > > > Subject: Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > > > > > > instruction support for IA32 and X64
> > > > > > >
> > > > > > > Hi Christopher,
> > > > > > > Thanks for the contribution. I think it is a good enhancement.
> > > > > > >
> > > > > > > Do you have any data showing what performance improvement we can get?
> > > > > > > Does the system boot faster with this? Which feature?
> > > > > > > UEFI Secure Boot? TCG Measured Boot? HTTPS boot?
> > > > > > >
> > > > > > >
> > > > > > > Comments on the code:
> > > > > > > 1) I am not sure if we need separate OpensslLibIa32 and OpensslLibX64.
> > > > > > > Can we just define a single INF, such as OpensslLibHw.inf?
> > > > > > >
> > > > > > > 2) Do we also need to add a new version of OpensslLibCrypto.inf?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Thank you
> > > > > > > Yao Jiewen
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of
> > > > > > > > Zurcher, Christopher J
> > > > > > > > Sent: Tuesday, March 17, 2020 6:27 PM
> > > > > > > > To: devel@edk2.groups.io
> > > > > > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>;
> > > > > > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > > > > > > > Subject: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > > > > > > > instruction support for IA32 and X64
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
>
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2020-03-27 0:52 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-17 10:26 [PATCH 0/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64 Zurcher, Christopher J
2020-03-17 10:26 ` [PATCH 1/1] " Zurcher, Christopher J
2020-03-26 1:15 ` [edk2-devel] " Yao, Jiewen
[not found] ` <15FFB5A5A94CCE31.23217@groups.io>
2020-03-26 1:23 ` Yao, Jiewen
2020-03-26 2:44 ` Zurcher, Christopher J
2020-03-26 3:05 ` Yao, Jiewen
2020-03-26 3:29 ` Zurcher, Christopher J
2020-03-26 3:58 ` Yao, Jiewen
2020-03-26 18:23 ` Michael D Kinney
2020-03-27 0:52 ` Zurcher, Christopher J
2020-03-23 12:59 ` [edk2-devel] [PATCH 0/1] " Laszlo Ersek
2020-03-25 18:40 ` Ard Biesheuvel
2020-03-26 1:04 ` [edk2-devel] " Zurcher, Christopher J
2020-03-26 7:49 ` Ard Biesheuvel