public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [PATCH 0/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
@ 2020-03-17 10:26 Zurcher, Christopher J
  2020-03-17 10:26 ` [PATCH 1/1] " Zurcher, Christopher J
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Zurcher, Christopher J @ 2020-03-17 10:26 UTC (permalink / raw)
  To: devel; +Cc: Jian J Wang, Xiaoyu Lu, Eugene Cohen, Ard Biesheuvel

BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507

This patch adds support for building the native-instruction algorithms for
the IA32 and X64 versions of OpensslLib. The process_files.pl script was
modified to parse the .asm file targets from the OpenSSL build config data
struct and to generate the necessary assembly files for the EDK2 build
environment.

For the X64 variant, OpenSSL includes calls to a Windows error handling API,
and that function has been stubbed out in ApiHooks.c.
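As a rough illustration of the stub approach, a link-time placeholder like the
following satisfies the assembly's reference without pulling in any Windows
functionality. The specific symbol name (__imp_RtlVirtualUnwind, the SEH
helper the OpenSSL x86_64 perlasm output references on Windows targets) is an
assumption here, since the cover letter does not name the stubbed API:

```c
#include <stddef.h>

/* Hypothetical sketch of the ApiHooks.c idea: the generated X64 assembly
 * carries a reference to a Windows structured-exception-handling helper.
 * UEFI has no SEH, so a stub that reports "no handler" is sufficient to
 * resolve the link-time reference. Symbol name assumed, not confirmed. */
void *
__imp_RtlVirtualUnwind (
  void  *Args
  )
{
  (void)Args;   /* unused; there is no unwind machinery in UEFI */
  return NULL;  /* no exception handler is ever found */
}
```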

For all variants, a constructor was added to call the required CPUID function
within OpenSSL to facilitate processor capability checks in the native
algorithms.
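The constructor pattern described above can be sketched as below. This is a
minimal stand-in, not the patch's actual source: EDK2's RETURN_STATUS/EFIAPI
types are replaced with plain C, and OPENSSL_cpuid_setup() (OpenSSL's real
capability-probe entry point, implemented by the generated *cpuid.nasm files)
is modeled with a flag instead of executing CPUID:

```c
int  gCpuidSetupDone = 0;  /* stands in for OpenSSL's cached capability bits */

/* Models OPENSSL_cpuid_setup(): the real routine executes CPUID and caches
 * feature flags (AES-NI, SSSE3, AVX, ...) that the native algorithms test
 * at run time before taking their accelerated code paths. */
void
OPENSSL_cpuid_setup (
  void
  )
{
  gCpuidSetupDone = 1;
}

/* Hooked up via "CONSTRUCTOR = OpensslLibConstructor" in the .inf, so the
 * capability probe runs before any caller uses the library.
 * Returns 0 here in place of RETURN_SUCCESS. */
int
OpensslLibConstructor (
  void
  )
{
  OPENSSL_cpuid_setup ();
  return 0;
}
```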

Additional native architecture variants should be simple to add by following
the changes made for these two architectures.

The OpenSSL assembly files are traditionally generated at build time by a
Perl script. To avoid placing that burden on EDK2 users, these end-result
assembly files are generated during the configuration steps performed by the
package maintainer (through process_files.pl). The Perl generator scripts
inside OpenSSL do not carry over file comments, as they are only meant to
create intermediate build files, so process_files.pl contains additional
hooks to preserve the copyright headers as well as to clean up tabs and line
endings to comply with EDK2 coding standards. The resulting file headers
align with the generated .h files already included in the EDK2 repository.

Cc: Jian J Wang <jian.j.wang@intel.com>
Cc: Xiaoyu Lu <xiaoyux.lu@intel.com>
Cc: Eugene Cohen <eugene@hp.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Christopher J Zurcher (1):
  CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64

 CryptoPkg/Library/OpensslLib/OpensslLib.inf                          |    2 +-
 CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf                    |    2 +-
 CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf                      |  680 ++
 CryptoPkg/Library/OpensslLib/OpensslLibX64.inf                       |  691 ++
 CryptoPkg/Library/Include/openssl/opensslconf.h                      |    3 -
 CryptoPkg/Library/OpensslLib/ApiHooks.c                              |   18 +
 CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c                 |   34 +
 CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm          | 3209 ++++++++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm          |  648 ++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm              | 1522 ++++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm              | 1259 +++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm            |  352 +
 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm            |  486 ++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm           |  887 +++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm            | 1835 +++++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm            |  690 ++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm        | 1264 +++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm            |  381 +
 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm           | 3977 ++++++++++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm         | 6796 ++++++++++++++++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm         | 2842 +++++++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm               |  513 ++
 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm     | 1772 +++++
 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm   | 3271 ++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm | 4709 +++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm        | 5084 ++++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm        | 1170 +++
 CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm            | 1989 +++++
 CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm          | 2242 ++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm          |  432 +
 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm          | 1479 ++++
 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm         | 4033 ++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm          |  794 ++
 CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm  |  984 +++
 CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm      | 2077 +++++
 CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm      | 1395 ++++
 CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm          |  784 ++
 CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm   |  532 ++
 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm      | 7581 ++++++++++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm         | 5773 ++++++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm    | 8262 ++++++++++++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm       | 5712 ++++++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm       | 5668 ++++++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm             |  472 ++
 CryptoPkg/Library/OpensslLib/process_files.pl                        |  208 +-
 CryptoPkg/Library/OpensslLib/uefi-asm.conf                           |   14 +
 46 files changed, 94478 insertions(+), 50 deletions(-)
 create mode 100644 CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf
 create mode 100644 CryptoPkg/Library/OpensslLib/OpensslLibX64.inf
 create mode 100644 CryptoPkg/Library/OpensslLib/ApiHooks.c
 create mode 100644 CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c
 create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm
 create mode 100644 CryptoPkg/Library/OpensslLib/uefi-asm.conf

-- 
2.16.2.windows.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
  2020-03-17 10:26 [PATCH 0/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64 Zurcher, Christopher J
@ 2020-03-17 10:26 ` Zurcher, Christopher J
  2020-03-26  1:15   ` [edk2-devel] " Yao, Jiewen
       [not found]   ` <15FFB5A5A94CCE31.23217@groups.io>
  2020-03-23 12:59 ` [edk2-devel] [PATCH 0/1] " Laszlo Ersek
  2020-03-25 18:40 ` Ard Biesheuvel
  2 siblings, 2 replies; 14+ messages in thread
From: Zurcher, Christopher J @ 2020-03-17 10:26 UTC (permalink / raw)
  To: devel; +Cc: Jian J Wang, Xiaoyu Lu, Eugene Cohen, Ard Biesheuvel

BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507

Add IA32 and X64 versions of OpensslLib.inf and their respective assembly
files. This also introduces the required modifications to process_files.pl
for generating these files.

Cc: Jian J Wang <jian.j.wang@intel.com>
Cc: Xiaoyu Lu <xiaoyux.lu@intel.com>
Cc: Eugene Cohen <eugene@hp.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Christopher J Zurcher <christopher.j.zurcher@intel.com>
---
 CryptoPkg/Library/OpensslLib/OpensslLib.inf                          |    2 +-
 CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf                    |    2 +-
 CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf                      |  680 ++
 CryptoPkg/Library/OpensslLib/OpensslLibX64.inf                       |  691 ++
 CryptoPkg/Library/Include/openssl/opensslconf.h                      |    3 -
 CryptoPkg/Library/OpensslLib/ApiHooks.c                              |   18 +
 CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c                 |   34 +
 CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm          | 3209 ++++++++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm          |  648 ++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm              | 1522 ++++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm              | 1259 +++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm            |  352 +
 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm            |  486 ++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm           |  887 +++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm            | 1835 +++++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm            |  690 ++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm        | 1264 +++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm            |  381 +
 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm           | 3977 ++++++++++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm         | 6796 ++++++++++++++++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm         | 2842 +++++++
 CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm               |  513 ++
 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm     | 1772 +++++
 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm   | 3271 ++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm | 4709 +++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm        | 5084 ++++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm        | 1170 +++
 CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm            | 1989 +++++
 CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm          | 2242 ++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm          |  432 +
 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm          | 1479 ++++
 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm         | 4033 ++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm          |  794 ++
 CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm  |  984 +++
 CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm      | 2077 +++++
 CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm      | 1395 ++++
 CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm          |  784 ++
 CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm   |  532 ++
 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm      | 7581 ++++++++++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm         | 5773 ++++++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm    | 8262 ++++++++++++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm       | 5712 ++++++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm       | 5668 ++++++++++++++
 CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm             |  472 ++
 CryptoPkg/Library/OpensslLib/process_files.pl                        |  208 +-
 CryptoPkg/Library/OpensslLib/uefi-asm.conf                           |   14 +
 46 files changed, 94478 insertions(+), 50 deletions(-)

diff --git a/CryptoPkg/Library/OpensslLib/OpensslLib.inf b/CryptoPkg/Library/OpensslLib/OpensslLib.inf
index 3519a66885..542507a534 100644
--- a/CryptoPkg/Library/OpensslLib/OpensslLib.inf
+++ b/CryptoPkg/Library/OpensslLib/OpensslLib.inf
@@ -15,7 +15,7 @@
   VERSION_STRING                 = 1.0
   LIBRARY_CLASS                  = OpensslLib
   DEFINE OPENSSL_PATH            = openssl
-  DEFINE OPENSSL_FLAGS           = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE
+  DEFINE OPENSSL_FLAGS           = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -DOPENSSL_NO_ASM
 
 #
 #  VALID_ARCHITECTURES           = IA32 X64 ARM AARCH64
diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf b/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf
index 8a723cb8cd..f0c588284c 100644
--- a/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf
+++ b/CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf
@@ -15,7 +15,7 @@
   VERSION_STRING                 = 1.0
   LIBRARY_CLASS                  = OpensslLib
   DEFINE OPENSSL_PATH            = openssl
-  DEFINE OPENSSL_FLAGS           = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE
+  DEFINE OPENSSL_FLAGS           = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -DOPENSSL_NO_ASM
 
 #
 #  VALID_ARCHITECTURES           = IA32 X64 ARM AARCH64
diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf b/CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf
new file mode 100644
index 0000000000..14f4d4ab1a
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf
@@ -0,0 +1,680 @@
+## @file
+#  This module provides OpenSSL Library implementation.
+#
+#  Copyright (c) 2010 - 2020, Intel Corporation. All rights reserved.<BR>
+#  SPDX-License-Identifier: BSD-2-Clause-Patent
+#
+##
+
+[Defines]
+  INF_VERSION                    = 0x00010005
+  BASE_NAME                      = OpensslLibIa32
+  MODULE_UNI_FILE                = OpensslLib.uni
+  FILE_GUID                      = 5805D1D4-F8EE-4FBA-BDD8-74465F16A534
+  MODULE_TYPE                    = BASE
+  VERSION_STRING                 = 1.0
+  LIBRARY_CLASS                  = OpensslLib
+  DEFINE OPENSSL_PATH            = openssl
+  DEFINE OPENSSL_FLAGS           = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE
+  DEFINE OPENSSL_FLAGS_CONFIG    = -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM
+  CONSTRUCTOR                    = OpensslLibConstructor
+
+#
+#  VALID_ARCHITECTURES           = IA32
+#
+
+[Sources]
+  OpensslLibConstructor.c
+  $(OPENSSL_PATH)/e_os.h
+  $(OPENSSL_PATH)/ms/uplink.h
+# Autogenerated files list starts here
+  Ia32/crypto/aes/aesni-x86.nasm
+  Ia32/crypto/aes/vpaes-x86.nasm
+  Ia32/crypto/bn/bn-586.nasm
+  Ia32/crypto/bn/co-586.nasm
+  Ia32/crypto/bn/x86-gf2m.nasm
+  Ia32/crypto/bn/x86-mont.nasm
+  Ia32/crypto/des/crypt586.nasm
+  Ia32/crypto/des/des-586.nasm
+  Ia32/crypto/md5/md5-586.nasm
+  Ia32/crypto/modes/ghash-x86.nasm
+  Ia32/crypto/rc4/rc4-586.nasm
+  Ia32/crypto/sha/sha1-586.nasm
+  Ia32/crypto/sha/sha256-586.nasm
+  Ia32/crypto/sha/sha512-586.nasm
+  Ia32/crypto/x86cpuid.nasm
+  $(OPENSSL_PATH)/crypto/aes/aes_cbc.c
+  $(OPENSSL_PATH)/crypto/aes/aes_cfb.c
+  $(OPENSSL_PATH)/crypto/aes/aes_core.c
+  $(OPENSSL_PATH)/crypto/aes/aes_ecb.c
+  $(OPENSSL_PATH)/crypto/aes/aes_ige.c
+  $(OPENSSL_PATH)/crypto/aes/aes_misc.c
+  $(OPENSSL_PATH)/crypto/aes/aes_ofb.c
+  $(OPENSSL_PATH)/crypto/aes/aes_wrap.c
+  $(OPENSSL_PATH)/crypto/aria/aria.c
+  $(OPENSSL_PATH)/crypto/asn1/a_bitstr.c
+  $(OPENSSL_PATH)/crypto/asn1/a_d2i_fp.c
+  $(OPENSSL_PATH)/crypto/asn1/a_digest.c
+  $(OPENSSL_PATH)/crypto/asn1/a_dup.c
+  $(OPENSSL_PATH)/crypto/asn1/a_gentm.c
+  $(OPENSSL_PATH)/crypto/asn1/a_i2d_fp.c
+  $(OPENSSL_PATH)/crypto/asn1/a_int.c
+  $(OPENSSL_PATH)/crypto/asn1/a_mbstr.c
+  $(OPENSSL_PATH)/crypto/asn1/a_object.c
+  $(OPENSSL_PATH)/crypto/asn1/a_octet.c
+  $(OPENSSL_PATH)/crypto/asn1/a_print.c
+  $(OPENSSL_PATH)/crypto/asn1/a_sign.c
+  $(OPENSSL_PATH)/crypto/asn1/a_strex.c
+  $(OPENSSL_PATH)/crypto/asn1/a_strnid.c
+  $(OPENSSL_PATH)/crypto/asn1/a_time.c
+  $(OPENSSL_PATH)/crypto/asn1/a_type.c
+  $(OPENSSL_PATH)/crypto/asn1/a_utctm.c
+  $(OPENSSL_PATH)/crypto/asn1/a_utf8.c
+  $(OPENSSL_PATH)/crypto/asn1/a_verify.c
+  $(OPENSSL_PATH)/crypto/asn1/ameth_lib.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_err.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_gen.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_lib.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_par.c
+  $(OPENSSL_PATH)/crypto/asn1/asn_mime.c
+  $(OPENSSL_PATH)/crypto/asn1/asn_moid.c
+  $(OPENSSL_PATH)/crypto/asn1/asn_mstbl.c
+  $(OPENSSL_PATH)/crypto/asn1/asn_pack.c
+  $(OPENSSL_PATH)/crypto/asn1/bio_asn1.c
+  $(OPENSSL_PATH)/crypto/asn1/bio_ndef.c
+  $(OPENSSL_PATH)/crypto/asn1/d2i_pr.c
+  $(OPENSSL_PATH)/crypto/asn1/d2i_pu.c
+  $(OPENSSL_PATH)/crypto/asn1/evp_asn1.c
+  $(OPENSSL_PATH)/crypto/asn1/f_int.c
+  $(OPENSSL_PATH)/crypto/asn1/f_string.c
+  $(OPENSSL_PATH)/crypto/asn1/i2d_pr.c
+  $(OPENSSL_PATH)/crypto/asn1/i2d_pu.c
+  $(OPENSSL_PATH)/crypto/asn1/n_pkey.c
+  $(OPENSSL_PATH)/crypto/asn1/nsseq.c
+  $(OPENSSL_PATH)/crypto/asn1/p5_pbe.c
+  $(OPENSSL_PATH)/crypto/asn1/p5_pbev2.c
+  $(OPENSSL_PATH)/crypto/asn1/p5_scrypt.c
+  $(OPENSSL_PATH)/crypto/asn1/p8_pkey.c
+  $(OPENSSL_PATH)/crypto/asn1/t_bitst.c
+  $(OPENSSL_PATH)/crypto/asn1/t_pkey.c
+  $(OPENSSL_PATH)/crypto/asn1/t_spki.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_dec.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_enc.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_fre.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_new.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_prn.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_scn.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_typ.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_utl.c
+  $(OPENSSL_PATH)/crypto/asn1/x_algor.c
+  $(OPENSSL_PATH)/crypto/asn1/x_bignum.c
+  $(OPENSSL_PATH)/crypto/asn1/x_info.c
+  $(OPENSSL_PATH)/crypto/asn1/x_int64.c
+  $(OPENSSL_PATH)/crypto/asn1/x_long.c
+  $(OPENSSL_PATH)/crypto/asn1/x_pkey.c
+  $(OPENSSL_PATH)/crypto/asn1/x_sig.c
+  $(OPENSSL_PATH)/crypto/asn1/x_spki.c
+  $(OPENSSL_PATH)/crypto/asn1/x_val.c
+  $(OPENSSL_PATH)/crypto/async/arch/async_null.c
+  $(OPENSSL_PATH)/crypto/async/arch/async_posix.c
+  $(OPENSSL_PATH)/crypto/async/arch/async_win.c
+  $(OPENSSL_PATH)/crypto/async/async.c
+  $(OPENSSL_PATH)/crypto/async/async_err.c
+  $(OPENSSL_PATH)/crypto/async/async_wait.c
+  $(OPENSSL_PATH)/crypto/bio/b_addr.c
+  $(OPENSSL_PATH)/crypto/bio/b_dump.c
+  $(OPENSSL_PATH)/crypto/bio/b_sock.c
+  $(OPENSSL_PATH)/crypto/bio/b_sock2.c
+  $(OPENSSL_PATH)/crypto/bio/bf_buff.c
+  $(OPENSSL_PATH)/crypto/bio/bf_lbuf.c
+  $(OPENSSL_PATH)/crypto/bio/bf_nbio.c
+  $(OPENSSL_PATH)/crypto/bio/bf_null.c
+  $(OPENSSL_PATH)/crypto/bio/bio_cb.c
+  $(OPENSSL_PATH)/crypto/bio/bio_err.c
+  $(OPENSSL_PATH)/crypto/bio/bio_lib.c
+  $(OPENSSL_PATH)/crypto/bio/bio_meth.c
+  $(OPENSSL_PATH)/crypto/bio/bss_acpt.c
+  $(OPENSSL_PATH)/crypto/bio/bss_bio.c
+  $(OPENSSL_PATH)/crypto/bio/bss_conn.c
+  $(OPENSSL_PATH)/crypto/bio/bss_dgram.c
+  $(OPENSSL_PATH)/crypto/bio/bss_fd.c
+  $(OPENSSL_PATH)/crypto/bio/bss_file.c
+  $(OPENSSL_PATH)/crypto/bio/bss_log.c
+  $(OPENSSL_PATH)/crypto/bio/bss_mem.c
+  $(OPENSSL_PATH)/crypto/bio/bss_null.c
+  $(OPENSSL_PATH)/crypto/bio/bss_sock.c
+  $(OPENSSL_PATH)/crypto/bn/bn_add.c
+  $(OPENSSL_PATH)/crypto/bn/bn_blind.c
+  $(OPENSSL_PATH)/crypto/bn/bn_const.c
+  $(OPENSSL_PATH)/crypto/bn/bn_ctx.c
+  $(OPENSSL_PATH)/crypto/bn/bn_depr.c
+  $(OPENSSL_PATH)/crypto/bn/bn_dh.c
+  $(OPENSSL_PATH)/crypto/bn/bn_div.c
+  $(OPENSSL_PATH)/crypto/bn/bn_err.c
+  $(OPENSSL_PATH)/crypto/bn/bn_exp.c
+  $(OPENSSL_PATH)/crypto/bn/bn_exp2.c
+  $(OPENSSL_PATH)/crypto/bn/bn_gcd.c
+  $(OPENSSL_PATH)/crypto/bn/bn_gf2m.c
+  $(OPENSSL_PATH)/crypto/bn/bn_intern.c
+  $(OPENSSL_PATH)/crypto/bn/bn_kron.c
+  $(OPENSSL_PATH)/crypto/bn/bn_lib.c
+  $(OPENSSL_PATH)/crypto/bn/bn_mod.c
+  $(OPENSSL_PATH)/crypto/bn/bn_mont.c
+  $(OPENSSL_PATH)/crypto/bn/bn_mpi.c
+  $(OPENSSL_PATH)/crypto/bn/bn_mul.c
+  $(OPENSSL_PATH)/crypto/bn/bn_nist.c
+  $(OPENSSL_PATH)/crypto/bn/bn_prime.c
+  $(OPENSSL_PATH)/crypto/bn/bn_print.c
+  $(OPENSSL_PATH)/crypto/bn/bn_rand.c
+  $(OPENSSL_PATH)/crypto/bn/bn_recp.c
+  $(OPENSSL_PATH)/crypto/bn/bn_shift.c
+  $(OPENSSL_PATH)/crypto/bn/bn_sqr.c
+  $(OPENSSL_PATH)/crypto/bn/bn_sqrt.c
+  $(OPENSSL_PATH)/crypto/bn/bn_srp.c
+  $(OPENSSL_PATH)/crypto/bn/bn_word.c
+  $(OPENSSL_PATH)/crypto/bn/bn_x931p.c
+  $(OPENSSL_PATH)/crypto/buffer/buf_err.c
+  $(OPENSSL_PATH)/crypto/buffer/buffer.c
+  $(OPENSSL_PATH)/crypto/cmac/cm_ameth.c
+  $(OPENSSL_PATH)/crypto/cmac/cm_pmeth.c
+  $(OPENSSL_PATH)/crypto/cmac/cmac.c
+  $(OPENSSL_PATH)/crypto/comp/c_zlib.c
+  $(OPENSSL_PATH)/crypto/comp/comp_err.c
+  $(OPENSSL_PATH)/crypto/comp/comp_lib.c
+  $(OPENSSL_PATH)/crypto/conf/conf_api.c
+  $(OPENSSL_PATH)/crypto/conf/conf_def.c
+  $(OPENSSL_PATH)/crypto/conf/conf_err.c
+  $(OPENSSL_PATH)/crypto/conf/conf_lib.c
+  $(OPENSSL_PATH)/crypto/conf/conf_mall.c
+  $(OPENSSL_PATH)/crypto/conf/conf_mod.c
+  $(OPENSSL_PATH)/crypto/conf/conf_sap.c
+  $(OPENSSL_PATH)/crypto/conf/conf_ssl.c
+  $(OPENSSL_PATH)/crypto/cpt_err.c
+  $(OPENSSL_PATH)/crypto/cryptlib.c
+  $(OPENSSL_PATH)/crypto/ctype.c
+  $(OPENSSL_PATH)/crypto/cversion.c
+  $(OPENSSL_PATH)/crypto/des/cbc_cksm.c
+  $(OPENSSL_PATH)/crypto/des/cbc_enc.c
+  $(OPENSSL_PATH)/crypto/des/cfb64ede.c
+  $(OPENSSL_PATH)/crypto/des/cfb64enc.c
+  $(OPENSSL_PATH)/crypto/des/cfb_enc.c
+  $(OPENSSL_PATH)/crypto/des/ecb3_enc.c
+  $(OPENSSL_PATH)/crypto/des/ecb_enc.c
+  $(OPENSSL_PATH)/crypto/des/fcrypt.c
+  $(OPENSSL_PATH)/crypto/des/ofb64ede.c
+  $(OPENSSL_PATH)/crypto/des/ofb64enc.c
+  $(OPENSSL_PATH)/crypto/des/ofb_enc.c
+  $(OPENSSL_PATH)/crypto/des/pcbc_enc.c
+  $(OPENSSL_PATH)/crypto/des/qud_cksm.c
+  $(OPENSSL_PATH)/crypto/des/rand_key.c
+  $(OPENSSL_PATH)/crypto/des/set_key.c
+  $(OPENSSL_PATH)/crypto/des/str2key.c
+  $(OPENSSL_PATH)/crypto/des/xcbc_enc.c
+  $(OPENSSL_PATH)/crypto/dh/dh_ameth.c
+  $(OPENSSL_PATH)/crypto/dh/dh_asn1.c
+  $(OPENSSL_PATH)/crypto/dh/dh_check.c
+  $(OPENSSL_PATH)/crypto/dh/dh_depr.c
+  $(OPENSSL_PATH)/crypto/dh/dh_err.c
+  $(OPENSSL_PATH)/crypto/dh/dh_gen.c
+  $(OPENSSL_PATH)/crypto/dh/dh_kdf.c
+  $(OPENSSL_PATH)/crypto/dh/dh_key.c
+  $(OPENSSL_PATH)/crypto/dh/dh_lib.c
+  $(OPENSSL_PATH)/crypto/dh/dh_meth.c
+  $(OPENSSL_PATH)/crypto/dh/dh_pmeth.c
+  $(OPENSSL_PATH)/crypto/dh/dh_prn.c
+  $(OPENSSL_PATH)/crypto/dh/dh_rfc5114.c
+  $(OPENSSL_PATH)/crypto/dh/dh_rfc7919.c
+  $(OPENSSL_PATH)/crypto/dso/dso_dl.c
+  $(OPENSSL_PATH)/crypto/dso/dso_dlfcn.c
+  $(OPENSSL_PATH)/crypto/dso/dso_err.c
+  $(OPENSSL_PATH)/crypto/dso/dso_lib.c
+  $(OPENSSL_PATH)/crypto/dso/dso_openssl.c
+  $(OPENSSL_PATH)/crypto/dso/dso_vms.c
+  $(OPENSSL_PATH)/crypto/dso/dso_win32.c
+  $(OPENSSL_PATH)/crypto/ebcdic.c
+  $(OPENSSL_PATH)/crypto/err/err.c
+  $(OPENSSL_PATH)/crypto/err/err_prn.c
+  $(OPENSSL_PATH)/crypto/evp/bio_b64.c
+  $(OPENSSL_PATH)/crypto/evp/bio_enc.c
+  $(OPENSSL_PATH)/crypto/evp/bio_md.c
+  $(OPENSSL_PATH)/crypto/evp/bio_ok.c
+  $(OPENSSL_PATH)/crypto/evp/c_allc.c
+  $(OPENSSL_PATH)/crypto/evp/c_alld.c
+  $(OPENSSL_PATH)/crypto/evp/cmeth_lib.c
+  $(OPENSSL_PATH)/crypto/evp/digest.c
+  $(OPENSSL_PATH)/crypto/evp/e_aes.c
+  $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha1.c
+  $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha256.c
+  $(OPENSSL_PATH)/crypto/evp/e_aria.c
+  $(OPENSSL_PATH)/crypto/evp/e_bf.c
+  $(OPENSSL_PATH)/crypto/evp/e_camellia.c
+  $(OPENSSL_PATH)/crypto/evp/e_cast.c
+  $(OPENSSL_PATH)/crypto/evp/e_chacha20_poly1305.c
+  $(OPENSSL_PATH)/crypto/evp/e_des.c
+  $(OPENSSL_PATH)/crypto/evp/e_des3.c
+  $(OPENSSL_PATH)/crypto/evp/e_idea.c
+  $(OPENSSL_PATH)/crypto/evp/e_null.c
+  $(OPENSSL_PATH)/crypto/evp/e_old.c
+  $(OPENSSL_PATH)/crypto/evp/e_rc2.c
+  $(OPENSSL_PATH)/crypto/evp/e_rc4.c
+  $(OPENSSL_PATH)/crypto/evp/e_rc4_hmac_md5.c
+  $(OPENSSL_PATH)/crypto/evp/e_rc5.c
+  $(OPENSSL_PATH)/crypto/evp/e_seed.c
+  $(OPENSSL_PATH)/crypto/evp/e_sm4.c
+  $(OPENSSL_PATH)/crypto/evp/e_xcbc_d.c
+  $(OPENSSL_PATH)/crypto/evp/encode.c
+  $(OPENSSL_PATH)/crypto/evp/evp_cnf.c
+  $(OPENSSL_PATH)/crypto/evp/evp_enc.c
+  $(OPENSSL_PATH)/crypto/evp/evp_err.c
+  $(OPENSSL_PATH)/crypto/evp/evp_key.c
+  $(OPENSSL_PATH)/crypto/evp/evp_lib.c
+  $(OPENSSL_PATH)/crypto/evp/evp_pbe.c
+  $(OPENSSL_PATH)/crypto/evp/evp_pkey.c
+  $(OPENSSL_PATH)/crypto/evp/m_md2.c
+  $(OPENSSL_PATH)/crypto/evp/m_md4.c
+  $(OPENSSL_PATH)/crypto/evp/m_md5.c
+  $(OPENSSL_PATH)/crypto/evp/m_md5_sha1.c
+  $(OPENSSL_PATH)/crypto/evp/m_mdc2.c
+  $(OPENSSL_PATH)/crypto/evp/m_null.c
+  $(OPENSSL_PATH)/crypto/evp/m_ripemd.c
+  $(OPENSSL_PATH)/crypto/evp/m_sha1.c
+  $(OPENSSL_PATH)/crypto/evp/m_sha3.c
+  $(OPENSSL_PATH)/crypto/evp/m_sigver.c
+  $(OPENSSL_PATH)/crypto/evp/m_wp.c
+  $(OPENSSL_PATH)/crypto/evp/names.c
+  $(OPENSSL_PATH)/crypto/evp/p5_crpt.c
+  $(OPENSSL_PATH)/crypto/evp/p5_crpt2.c
+  $(OPENSSL_PATH)/crypto/evp/p_dec.c
+  $(OPENSSL_PATH)/crypto/evp/p_enc.c
+  $(OPENSSL_PATH)/crypto/evp/p_lib.c
+  $(OPENSSL_PATH)/crypto/evp/p_open.c
+  $(OPENSSL_PATH)/crypto/evp/p_seal.c
+  $(OPENSSL_PATH)/crypto/evp/p_sign.c
+  $(OPENSSL_PATH)/crypto/evp/p_verify.c
+  $(OPENSSL_PATH)/crypto/evp/pbe_scrypt.c
+  $(OPENSSL_PATH)/crypto/evp/pmeth_fn.c
+  $(OPENSSL_PATH)/crypto/evp/pmeth_gn.c
+  $(OPENSSL_PATH)/crypto/evp/pmeth_lib.c
+  $(OPENSSL_PATH)/crypto/ex_data.c
+  $(OPENSSL_PATH)/crypto/getenv.c
+  $(OPENSSL_PATH)/crypto/hmac/hm_ameth.c
+  $(OPENSSL_PATH)/crypto/hmac/hm_pmeth.c
+  $(OPENSSL_PATH)/crypto/hmac/hmac.c
+  $(OPENSSL_PATH)/crypto/init.c
+  $(OPENSSL_PATH)/crypto/kdf/hkdf.c
+  $(OPENSSL_PATH)/crypto/kdf/kdf_err.c
+  $(OPENSSL_PATH)/crypto/kdf/scrypt.c
+  $(OPENSSL_PATH)/crypto/kdf/tls1_prf.c
+  $(OPENSSL_PATH)/crypto/lhash/lh_stats.c
+  $(OPENSSL_PATH)/crypto/lhash/lhash.c
+  $(OPENSSL_PATH)/crypto/md4/md4_dgst.c
+  $(OPENSSL_PATH)/crypto/md4/md4_one.c
+  $(OPENSSL_PATH)/crypto/md5/md5_dgst.c
+  $(OPENSSL_PATH)/crypto/md5/md5_one.c
+  $(OPENSSL_PATH)/crypto/mem.c
+  $(OPENSSL_PATH)/crypto/mem_dbg.c
+  $(OPENSSL_PATH)/crypto/mem_sec.c
+  $(OPENSSL_PATH)/crypto/modes/cbc128.c
+  $(OPENSSL_PATH)/crypto/modes/ccm128.c
+  $(OPENSSL_PATH)/crypto/modes/cfb128.c
+  $(OPENSSL_PATH)/crypto/modes/ctr128.c
+  $(OPENSSL_PATH)/crypto/modes/cts128.c
+  $(OPENSSL_PATH)/crypto/modes/gcm128.c
+  $(OPENSSL_PATH)/crypto/modes/ocb128.c
+  $(OPENSSL_PATH)/crypto/modes/ofb128.c
+  $(OPENSSL_PATH)/crypto/modes/wrap128.c
+  $(OPENSSL_PATH)/crypto/modes/xts128.c
+  $(OPENSSL_PATH)/crypto/o_dir.c
+  $(OPENSSL_PATH)/crypto/o_fips.c
+  $(OPENSSL_PATH)/crypto/o_fopen.c
+  $(OPENSSL_PATH)/crypto/o_init.c
+  $(OPENSSL_PATH)/crypto/o_str.c
+  $(OPENSSL_PATH)/crypto/o_time.c
+  $(OPENSSL_PATH)/crypto/objects/o_names.c
+  $(OPENSSL_PATH)/crypto/objects/obj_dat.c
+  $(OPENSSL_PATH)/crypto/objects/obj_err.c
+  $(OPENSSL_PATH)/crypto/objects/obj_lib.c
+  $(OPENSSL_PATH)/crypto/objects/obj_xref.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_asn.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_cl.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_err.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_ext.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_ht.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_lib.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_prn.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_srv.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_vfy.c
+  $(OPENSSL_PATH)/crypto/ocsp/v3_ocsp.c
+  $(OPENSSL_PATH)/crypto/pem/pem_all.c
+  $(OPENSSL_PATH)/crypto/pem/pem_err.c
+  $(OPENSSL_PATH)/crypto/pem/pem_info.c
+  $(OPENSSL_PATH)/crypto/pem/pem_lib.c
+  $(OPENSSL_PATH)/crypto/pem/pem_oth.c
+  $(OPENSSL_PATH)/crypto/pem/pem_pk8.c
+  $(OPENSSL_PATH)/crypto/pem/pem_pkey.c
+  $(OPENSSL_PATH)/crypto/pem/pem_sign.c
+  $(OPENSSL_PATH)/crypto/pem/pem_x509.c
+  $(OPENSSL_PATH)/crypto/pem/pem_xaux.c
+  $(OPENSSL_PATH)/crypto/pem/pvkfmt.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_add.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_asn.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_attr.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_crpt.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_crt.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_decr.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_init.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_key.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_kiss.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_mutl.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_npas.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_p8d.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_p8e.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_sbag.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_utl.c
+  $(OPENSSL_PATH)/crypto/pkcs12/pk12err.c
+  $(OPENSSL_PATH)/crypto/pkcs7/bio_pk7.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_asn1.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_attr.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_doit.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_lib.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_mime.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_smime.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pkcs7err.c
+  $(OPENSSL_PATH)/crypto/rand/drbg_ctr.c
+  $(OPENSSL_PATH)/crypto/rand/drbg_lib.c
+  $(OPENSSL_PATH)/crypto/rand/rand_egd.c
+  $(OPENSSL_PATH)/crypto/rand/rand_err.c
+  $(OPENSSL_PATH)/crypto/rand/rand_lib.c
+  $(OPENSSL_PATH)/crypto/rand/rand_unix.c
+  $(OPENSSL_PATH)/crypto/rand/rand_vms.c
+  $(OPENSSL_PATH)/crypto/rand/rand_win.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_ameth.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_asn1.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_chk.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_crpt.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_depr.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_err.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_gen.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_lib.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_meth.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_mp.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_none.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_oaep.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_ossl.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_pk1.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_pmeth.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_prn.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_pss.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_saos.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_sign.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_ssl.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_x931.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_x931g.c
+  $(OPENSSL_PATH)/crypto/sha/keccak1600.c
+  $(OPENSSL_PATH)/crypto/sha/sha1_one.c
+  $(OPENSSL_PATH)/crypto/sha/sha1dgst.c
+  $(OPENSSL_PATH)/crypto/sha/sha256.c
+  $(OPENSSL_PATH)/crypto/sha/sha512.c
+  $(OPENSSL_PATH)/crypto/siphash/siphash.c
+  $(OPENSSL_PATH)/crypto/siphash/siphash_ameth.c
+  $(OPENSSL_PATH)/crypto/siphash/siphash_pmeth.c
+  $(OPENSSL_PATH)/crypto/sm3/m_sm3.c
+  $(OPENSSL_PATH)/crypto/sm3/sm3.c
+  $(OPENSSL_PATH)/crypto/sm4/sm4.c
+  $(OPENSSL_PATH)/crypto/stack/stack.c
+  $(OPENSSL_PATH)/crypto/threads_none.c
+  $(OPENSSL_PATH)/crypto/threads_pthread.c
+  $(OPENSSL_PATH)/crypto/threads_win.c
+  $(OPENSSL_PATH)/crypto/txt_db/txt_db.c
+  $(OPENSSL_PATH)/crypto/ui/ui_err.c
+  $(OPENSSL_PATH)/crypto/ui/ui_lib.c
+  $(OPENSSL_PATH)/crypto/ui/ui_null.c
+  $(OPENSSL_PATH)/crypto/ui/ui_openssl.c
+  $(OPENSSL_PATH)/crypto/ui/ui_util.c
+  $(OPENSSL_PATH)/crypto/uid.c
+  $(OPENSSL_PATH)/crypto/x509/by_dir.c
+  $(OPENSSL_PATH)/crypto/x509/by_file.c
+  $(OPENSSL_PATH)/crypto/x509/t_crl.c
+  $(OPENSSL_PATH)/crypto/x509/t_req.c
+  $(OPENSSL_PATH)/crypto/x509/t_x509.c
+  $(OPENSSL_PATH)/crypto/x509/x509_att.c
+  $(OPENSSL_PATH)/crypto/x509/x509_cmp.c
+  $(OPENSSL_PATH)/crypto/x509/x509_d2.c
+  $(OPENSSL_PATH)/crypto/x509/x509_def.c
+  $(OPENSSL_PATH)/crypto/x509/x509_err.c
+  $(OPENSSL_PATH)/crypto/x509/x509_ext.c
+  $(OPENSSL_PATH)/crypto/x509/x509_lu.c
+  $(OPENSSL_PATH)/crypto/x509/x509_meth.c
+  $(OPENSSL_PATH)/crypto/x509/x509_obj.c
+  $(OPENSSL_PATH)/crypto/x509/x509_r2x.c
+  $(OPENSSL_PATH)/crypto/x509/x509_req.c
+  $(OPENSSL_PATH)/crypto/x509/x509_set.c
+  $(OPENSSL_PATH)/crypto/x509/x509_trs.c
+  $(OPENSSL_PATH)/crypto/x509/x509_txt.c
+  $(OPENSSL_PATH)/crypto/x509/x509_v3.c
+  $(OPENSSL_PATH)/crypto/x509/x509_vfy.c
+  $(OPENSSL_PATH)/crypto/x509/x509_vpm.c
+  $(OPENSSL_PATH)/crypto/x509/x509cset.c
+  $(OPENSSL_PATH)/crypto/x509/x509name.c
+  $(OPENSSL_PATH)/crypto/x509/x509rset.c
+  $(OPENSSL_PATH)/crypto/x509/x509spki.c
+  $(OPENSSL_PATH)/crypto/x509/x509type.c
+  $(OPENSSL_PATH)/crypto/x509/x_all.c
+  $(OPENSSL_PATH)/crypto/x509/x_attrib.c
+  $(OPENSSL_PATH)/crypto/x509/x_crl.c
+  $(OPENSSL_PATH)/crypto/x509/x_exten.c
+  $(OPENSSL_PATH)/crypto/x509/x_name.c
+  $(OPENSSL_PATH)/crypto/x509/x_pubkey.c
+  $(OPENSSL_PATH)/crypto/x509/x_req.c
+  $(OPENSSL_PATH)/crypto/x509/x_x509.c
+  $(OPENSSL_PATH)/crypto/x509/x_x509a.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_cache.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_data.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_lib.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_map.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_node.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_tree.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_addr.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_admis.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_akey.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_akeya.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_alt.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_asid.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_bcons.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_bitst.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_conf.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_cpols.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_crld.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_enum.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_extku.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_genn.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_ia5.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_info.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_int.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_lib.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_ncons.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pci.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pcia.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pcons.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pku.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pmaps.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_prn.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_purp.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_skey.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_sxnet.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_tlsf.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_utl.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3err.c
+  $(OPENSSL_PATH)/crypto/arm_arch.h
+  $(OPENSSL_PATH)/crypto/mips_arch.h
+  $(OPENSSL_PATH)/crypto/ppc_arch.h
+  $(OPENSSL_PATH)/crypto/s390x_arch.h
+  $(OPENSSL_PATH)/crypto/sparc_arch.h
+  $(OPENSSL_PATH)/crypto/vms_rms.h
+  $(OPENSSL_PATH)/crypto/aes/aes_locl.h
+  $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.h
+  $(OPENSSL_PATH)/crypto/asn1/asn1_locl.h
+  $(OPENSSL_PATH)/crypto/asn1/charmap.h
+  $(OPENSSL_PATH)/crypto/asn1/standard_methods.h
+  $(OPENSSL_PATH)/crypto/asn1/tbl_standard.h
+  $(OPENSSL_PATH)/crypto/async/async_locl.h
+  $(OPENSSL_PATH)/crypto/async/arch/async_null.h
+  $(OPENSSL_PATH)/crypto/async/arch/async_posix.h
+  $(OPENSSL_PATH)/crypto/async/arch/async_win.h
+  $(OPENSSL_PATH)/crypto/bio/bio_lcl.h
+  $(OPENSSL_PATH)/crypto/bn/bn_lcl.h
+  $(OPENSSL_PATH)/crypto/bn/bn_prime.h
+  $(OPENSSL_PATH)/crypto/bn/rsaz_exp.h
+  $(OPENSSL_PATH)/crypto/comp/comp_lcl.h
+  $(OPENSSL_PATH)/crypto/conf/conf_def.h
+  $(OPENSSL_PATH)/crypto/conf/conf_lcl.h
+  $(OPENSSL_PATH)/crypto/des/des_locl.h
+  $(OPENSSL_PATH)/crypto/des/spr.h
+  $(OPENSSL_PATH)/crypto/dh/dh_locl.h
+  $(OPENSSL_PATH)/crypto/dso/dso_locl.h
+  $(OPENSSL_PATH)/crypto/evp/evp_locl.h
+  $(OPENSSL_PATH)/crypto/hmac/hmac_lcl.h
+  $(OPENSSL_PATH)/crypto/lhash/lhash_lcl.h
+  $(OPENSSL_PATH)/crypto/md4/md4_locl.h
+  $(OPENSSL_PATH)/crypto/md5/md5_locl.h
+  $(OPENSSL_PATH)/crypto/modes/modes_lcl.h
+  $(OPENSSL_PATH)/crypto/objects/obj_dat.h
+  $(OPENSSL_PATH)/crypto/objects/obj_lcl.h
+  $(OPENSSL_PATH)/crypto/objects/obj_xref.h
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_lcl.h
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_lcl.h
+  $(OPENSSL_PATH)/crypto/rand/rand_lcl.h
+  $(OPENSSL_PATH)/crypto/rc4/rc4_locl.h
+  $(OPENSSL_PATH)/crypto/rsa/rsa_locl.h
+  $(OPENSSL_PATH)/crypto/sha/sha_locl.h
+  $(OPENSSL_PATH)/crypto/siphash/siphash_local.h
+  $(OPENSSL_PATH)/crypto/sm3/sm3_locl.h
+  $(OPENSSL_PATH)/crypto/store/store_locl.h
+  $(OPENSSL_PATH)/crypto/ui/ui_locl.h
+  $(OPENSSL_PATH)/crypto/x509/x509_lcl.h
+  $(OPENSSL_PATH)/crypto/x509v3/ext_dat.h
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_int.h
+  $(OPENSSL_PATH)/crypto/x509v3/standard_exts.h
+  $(OPENSSL_PATH)/crypto/x509v3/v3_admis.h
+  $(OPENSSL_PATH)/ssl/bio_ssl.c
+  $(OPENSSL_PATH)/ssl/d1_lib.c
+  $(OPENSSL_PATH)/ssl/d1_msg.c
+  $(OPENSSL_PATH)/ssl/d1_srtp.c
+  $(OPENSSL_PATH)/ssl/methods.c
+  $(OPENSSL_PATH)/ssl/packet.c
+  $(OPENSSL_PATH)/ssl/pqueue.c
+  $(OPENSSL_PATH)/ssl/record/dtls1_bitmap.c
+  $(OPENSSL_PATH)/ssl/record/rec_layer_d1.c
+  $(OPENSSL_PATH)/ssl/record/rec_layer_s3.c
+  $(OPENSSL_PATH)/ssl/record/ssl3_buffer.c
+  $(OPENSSL_PATH)/ssl/record/ssl3_record.c
+  $(OPENSSL_PATH)/ssl/record/ssl3_record_tls13.c
+  $(OPENSSL_PATH)/ssl/s3_cbc.c
+  $(OPENSSL_PATH)/ssl/s3_enc.c
+  $(OPENSSL_PATH)/ssl/s3_lib.c
+  $(OPENSSL_PATH)/ssl/s3_msg.c
+  $(OPENSSL_PATH)/ssl/ssl_asn1.c
+  $(OPENSSL_PATH)/ssl/ssl_cert.c
+  $(OPENSSL_PATH)/ssl/ssl_ciph.c
+  $(OPENSSL_PATH)/ssl/ssl_conf.c
+  $(OPENSSL_PATH)/ssl/ssl_err.c
+  $(OPENSSL_PATH)/ssl/ssl_init.c
+  $(OPENSSL_PATH)/ssl/ssl_lib.c
+  $(OPENSSL_PATH)/ssl/ssl_mcnf.c
+  $(OPENSSL_PATH)/ssl/ssl_rsa.c
+  $(OPENSSL_PATH)/ssl/ssl_sess.c
+  $(OPENSSL_PATH)/ssl/ssl_stat.c
+  $(OPENSSL_PATH)/ssl/ssl_txt.c
+  $(OPENSSL_PATH)/ssl/ssl_utst.c
+  $(OPENSSL_PATH)/ssl/statem/extensions.c
+  $(OPENSSL_PATH)/ssl/statem/extensions_clnt.c
+  $(OPENSSL_PATH)/ssl/statem/extensions_cust.c
+  $(OPENSSL_PATH)/ssl/statem/extensions_srvr.c
+  $(OPENSSL_PATH)/ssl/statem/statem.c
+  $(OPENSSL_PATH)/ssl/statem/statem_clnt.c
+  $(OPENSSL_PATH)/ssl/statem/statem_dtls.c
+  $(OPENSSL_PATH)/ssl/statem/statem_lib.c
+  $(OPENSSL_PATH)/ssl/statem/statem_srvr.c
+  $(OPENSSL_PATH)/ssl/t1_enc.c
+  $(OPENSSL_PATH)/ssl/t1_lib.c
+  $(OPENSSL_PATH)/ssl/t1_trce.c
+  $(OPENSSL_PATH)/ssl/tls13_enc.c
+  $(OPENSSL_PATH)/ssl/tls_srp.c
+  $(OPENSSL_PATH)/ssl/packet_locl.h
+  $(OPENSSL_PATH)/ssl/ssl_cert_table.h
+  $(OPENSSL_PATH)/ssl/ssl_locl.h
+  $(OPENSSL_PATH)/ssl/record/record.h
+  $(OPENSSL_PATH)/ssl/record/record_locl.h
+  $(OPENSSL_PATH)/ssl/statem/statem.h
+  $(OPENSSL_PATH)/ssl/statem/statem_locl.h
+# Autogenerated files list ends here
+  buildinf.h
+  rand_pool_noise.h
+  ossl_store.c
+  rand_pool.c
+
+[Sources.Ia32]
+  rand_pool_noise_tsc.c
+
+[Packages]
+  MdePkg/MdePkg.dec
+  CryptoPkg/CryptoPkg.dec
+
+[LibraryClasses]
+  BaseLib
+  DebugLib
+  TimerLib
+  PrintLib
+
+[BuildOptions]
+  #
+  # Disable the following Visual Studio compiler warnings introduced by the OpenSSL source,
+  # so we do not break the build with the /WX option:
+  #   C4090: 'function' : different 'const' qualifiers
+  #   C4132: 'object' : const object should be initialized (tls13_enc.c)
+  #   C4210: nonstandard extension used: function given file scope
+  #   C4244: conversion from type1 to type2, possible loss of data
+  #   C4245: conversion from type1 to type2, signed/unsigned mismatch
+  #   C4267: conversion from size_t to type, possible loss of data
+  #   C4306: 'identifier' : conversion from 'type1' to 'type2' of greater size
+  #   C4310: cast truncates constant value
+  #   C4389: 'operator' : signed/unsigned mismatch (xxxx)
+  #   C4700: uninitialized local variable 'name' used. (conf_sap.c(71))
+  #   C4702: unreachable code
+  #   C4706: assignment within conditional expression
+  #   C4819: The file contains a character that cannot be represented in the current code page
+  #
+  MSFT:*_*_IA32_CC_FLAGS   = -U_WIN32 -U_WIN64 -U_MSC_VER $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /wd4090 /wd4132 /wd4210 /wd4244 /wd4245 /wd4267 /wd4310 /wd4389 /wd4700 /wd4702 /wd4706 /wd4819
+
+  INTEL:*_*_IA32_CC_FLAGS  = -U_WIN32 -U_WIN64 -U_MSC_VER -U__ICC $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /w
+
+  #
+  # Suppress the following build warnings in the OpenSSL source so we do not break the build with -Werror:
+  #   -Werror=maybe-uninitialized: there exist some other paths for which the variable is not initialized.
+  #   -Werror=format: Check calls to printf and scanf, etc., to make sure that the arguments supplied have
+  #                   types appropriate to the format string specified.
+  #   -Werror=unused-but-set-variable: Warn whenever a local variable is assigned to but otherwise unused (aside from its declaration).
+  #
+  GCC:*_*_IA32_CC_FLAGS    = -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) -Wno-error=maybe-uninitialized -Wno-error=unused-but-set-variable
+
+  # Suppress the following warnings in the OpenSSL source so we don't break the build with warnings-as-errors:
+  # 1295: Deprecated declaration <entity> - give arg types
+  #  550: <entity> was set but never used
+  # 1293: assignment in condition
+  #  111: statement is unreachable (invariably "break;" after "return X;" in a case statement)
+  #   68: integer conversion resulted in a change of sign ("if (Status == -1)")
+  #  177: <entity> was declared but never referenced
+  #  223: function <entity> declared implicitly
+  #  144: a value of type <type> cannot be used to initialize an entity of type <type>
+  #  513: a value of type <type> cannot be assigned to an entity of type <type>
+  #  188: enumerated type mixed with another type (i.e. passing an integer as an enum without a cast)
+  # 1296: Extended constant initialiser used
+  #  128: loop is not reachable - may be emitted inappropriately if code follows a conditional return
+  #       from the function that evaluates to true at compile time
+  #  546: transfer of control bypasses initialization - may be emitted inappropriately if the uninitialized
+  #       variable is never referenced after the jump
+  #    1: ignore "#1-D: last line of file ends without a newline"
+  # 3017: <entity> may be used before being set (NOTE: This was fixed in OpenSSL 1.1 HEAD with
+  #       commit d9b8b89bec4480de3a10bdaf9425db371c19145b, and can be dropped then.)
+  XCODE:*_*_IA32_CC_FLAGS   = -mmmx -msse -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) -w -std=c99 -Wno-error=uninitialized
diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibX64.inf b/CryptoPkg/Library/OpensslLib/OpensslLibX64.inf
new file mode 100644
index 0000000000..fcebc6d6de
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/OpensslLibX64.inf
@@ -0,0 +1,691 @@
+## @file
+#  This module provides OpenSSL Library implementation.
+#
+#  Copyright (c) 2010 - 2020, Intel Corporation. All rights reserved.<BR>
+#  SPDX-License-Identifier: BSD-2-Clause-Patent
+#
+##
+
+[Defines]
+  INF_VERSION                    = 0x00010005
+  BASE_NAME                      = OpensslLibX64
+  MODULE_UNI_FILE                = OpensslLib.uni
+  FILE_GUID                      = 18125E50-0117-4DD0-BE54-4784AD995FEF
+  MODULE_TYPE                    = BASE
+  VERSION_STRING                 = 1.0
+  LIBRARY_CLASS                  = OpensslLib
+  DEFINE OPENSSL_PATH            = openssl
+  DEFINE OPENSSL_FLAGS           = -DL_ENDIAN -DOPENSSL_SMALL_FOOTPRINT -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE
+  DEFINE OPENSSL_FLAGS_CONFIG    = -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM
+  CONSTRUCTOR                    = OpensslLibConstructor
+
+#
+#  VALID_ARCHITECTURES           = X64
+#
+
+[Sources]
+  OpensslLibConstructor.c
+  $(OPENSSL_PATH)/e_os.h
+  $(OPENSSL_PATH)/ms/uplink.h
+# Autogenerated files list starts here
+  X64/crypto/aes/aesni-mb-x86_64.nasm
+  X64/crypto/aes/aesni-sha1-x86_64.nasm
+  X64/crypto/aes/aesni-sha256-x86_64.nasm
+  X64/crypto/aes/aesni-x86_64.nasm
+  X64/crypto/aes/vpaes-x86_64.nasm
+  X64/crypto/bn/rsaz-avx2.nasm
+  X64/crypto/bn/rsaz-x86_64.nasm
+  X64/crypto/bn/x86_64-gf2m.nasm
+  X64/crypto/bn/x86_64-mont.nasm
+  X64/crypto/bn/x86_64-mont5.nasm
+  X64/crypto/md5/md5-x86_64.nasm
+  X64/crypto/modes/aesni-gcm-x86_64.nasm
+  X64/crypto/modes/ghash-x86_64.nasm
+  X64/crypto/rc4/rc4-md5-x86_64.nasm
+  X64/crypto/rc4/rc4-x86_64.nasm
+  X64/crypto/sha/keccak1600-x86_64.nasm
+  X64/crypto/sha/sha1-mb-x86_64.nasm
+  X64/crypto/sha/sha1-x86_64.nasm
+  X64/crypto/sha/sha256-mb-x86_64.nasm
+  X64/crypto/sha/sha256-x86_64.nasm
+  X64/crypto/sha/sha512-x86_64.nasm
+  X64/crypto/x86_64cpuid.nasm
+  $(OPENSSL_PATH)/crypto/aes/aes_cbc.c
+  $(OPENSSL_PATH)/crypto/aes/aes_cfb.c
+  $(OPENSSL_PATH)/crypto/aes/aes_core.c
+  $(OPENSSL_PATH)/crypto/aes/aes_ecb.c
+  $(OPENSSL_PATH)/crypto/aes/aes_ige.c
+  $(OPENSSL_PATH)/crypto/aes/aes_misc.c
+  $(OPENSSL_PATH)/crypto/aes/aes_ofb.c
+  $(OPENSSL_PATH)/crypto/aes/aes_wrap.c
+  $(OPENSSL_PATH)/crypto/aria/aria.c
+  $(OPENSSL_PATH)/crypto/asn1/a_bitstr.c
+  $(OPENSSL_PATH)/crypto/asn1/a_d2i_fp.c
+  $(OPENSSL_PATH)/crypto/asn1/a_digest.c
+  $(OPENSSL_PATH)/crypto/asn1/a_dup.c
+  $(OPENSSL_PATH)/crypto/asn1/a_gentm.c
+  $(OPENSSL_PATH)/crypto/asn1/a_i2d_fp.c
+  $(OPENSSL_PATH)/crypto/asn1/a_int.c
+  $(OPENSSL_PATH)/crypto/asn1/a_mbstr.c
+  $(OPENSSL_PATH)/crypto/asn1/a_object.c
+  $(OPENSSL_PATH)/crypto/asn1/a_octet.c
+  $(OPENSSL_PATH)/crypto/asn1/a_print.c
+  $(OPENSSL_PATH)/crypto/asn1/a_sign.c
+  $(OPENSSL_PATH)/crypto/asn1/a_strex.c
+  $(OPENSSL_PATH)/crypto/asn1/a_strnid.c
+  $(OPENSSL_PATH)/crypto/asn1/a_time.c
+  $(OPENSSL_PATH)/crypto/asn1/a_type.c
+  $(OPENSSL_PATH)/crypto/asn1/a_utctm.c
+  $(OPENSSL_PATH)/crypto/asn1/a_utf8.c
+  $(OPENSSL_PATH)/crypto/asn1/a_verify.c
+  $(OPENSSL_PATH)/crypto/asn1/ameth_lib.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_err.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_gen.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_lib.c
+  $(OPENSSL_PATH)/crypto/asn1/asn1_par.c
+  $(OPENSSL_PATH)/crypto/asn1/asn_mime.c
+  $(OPENSSL_PATH)/crypto/asn1/asn_moid.c
+  $(OPENSSL_PATH)/crypto/asn1/asn_mstbl.c
+  $(OPENSSL_PATH)/crypto/asn1/asn_pack.c
+  $(OPENSSL_PATH)/crypto/asn1/bio_asn1.c
+  $(OPENSSL_PATH)/crypto/asn1/bio_ndef.c
+  $(OPENSSL_PATH)/crypto/asn1/d2i_pr.c
+  $(OPENSSL_PATH)/crypto/asn1/d2i_pu.c
+  $(OPENSSL_PATH)/crypto/asn1/evp_asn1.c
+  $(OPENSSL_PATH)/crypto/asn1/f_int.c
+  $(OPENSSL_PATH)/crypto/asn1/f_string.c
+  $(OPENSSL_PATH)/crypto/asn1/i2d_pr.c
+  $(OPENSSL_PATH)/crypto/asn1/i2d_pu.c
+  $(OPENSSL_PATH)/crypto/asn1/n_pkey.c
+  $(OPENSSL_PATH)/crypto/asn1/nsseq.c
+  $(OPENSSL_PATH)/crypto/asn1/p5_pbe.c
+  $(OPENSSL_PATH)/crypto/asn1/p5_pbev2.c
+  $(OPENSSL_PATH)/crypto/asn1/p5_scrypt.c
+  $(OPENSSL_PATH)/crypto/asn1/p8_pkey.c
+  $(OPENSSL_PATH)/crypto/asn1/t_bitst.c
+  $(OPENSSL_PATH)/crypto/asn1/t_pkey.c
+  $(OPENSSL_PATH)/crypto/asn1/t_spki.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_dec.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_enc.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_fre.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_new.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_prn.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_scn.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_typ.c
+  $(OPENSSL_PATH)/crypto/asn1/tasn_utl.c
+  $(OPENSSL_PATH)/crypto/asn1/x_algor.c
+  $(OPENSSL_PATH)/crypto/asn1/x_bignum.c
+  $(OPENSSL_PATH)/crypto/asn1/x_info.c
+  $(OPENSSL_PATH)/crypto/asn1/x_int64.c
+  $(OPENSSL_PATH)/crypto/asn1/x_long.c
+  $(OPENSSL_PATH)/crypto/asn1/x_pkey.c
+  $(OPENSSL_PATH)/crypto/asn1/x_sig.c
+  $(OPENSSL_PATH)/crypto/asn1/x_spki.c
+  $(OPENSSL_PATH)/crypto/asn1/x_val.c
+  $(OPENSSL_PATH)/crypto/async/arch/async_null.c
+  $(OPENSSL_PATH)/crypto/async/arch/async_posix.c
+  $(OPENSSL_PATH)/crypto/async/arch/async_win.c
+  $(OPENSSL_PATH)/crypto/async/async.c
+  $(OPENSSL_PATH)/crypto/async/async_err.c
+  $(OPENSSL_PATH)/crypto/async/async_wait.c
+  $(OPENSSL_PATH)/crypto/bio/b_addr.c
+  $(OPENSSL_PATH)/crypto/bio/b_dump.c
+  $(OPENSSL_PATH)/crypto/bio/b_sock.c
+  $(OPENSSL_PATH)/crypto/bio/b_sock2.c
+  $(OPENSSL_PATH)/crypto/bio/bf_buff.c
+  $(OPENSSL_PATH)/crypto/bio/bf_lbuf.c
+  $(OPENSSL_PATH)/crypto/bio/bf_nbio.c
+  $(OPENSSL_PATH)/crypto/bio/bf_null.c
+  $(OPENSSL_PATH)/crypto/bio/bio_cb.c
+  $(OPENSSL_PATH)/crypto/bio/bio_err.c
+  $(OPENSSL_PATH)/crypto/bio/bio_lib.c
+  $(OPENSSL_PATH)/crypto/bio/bio_meth.c
+  $(OPENSSL_PATH)/crypto/bio/bss_acpt.c
+  $(OPENSSL_PATH)/crypto/bio/bss_bio.c
+  $(OPENSSL_PATH)/crypto/bio/bss_conn.c
+  $(OPENSSL_PATH)/crypto/bio/bss_dgram.c
+  $(OPENSSL_PATH)/crypto/bio/bss_fd.c
+  $(OPENSSL_PATH)/crypto/bio/bss_file.c
+  $(OPENSSL_PATH)/crypto/bio/bss_log.c
+  $(OPENSSL_PATH)/crypto/bio/bss_mem.c
+  $(OPENSSL_PATH)/crypto/bio/bss_null.c
+  $(OPENSSL_PATH)/crypto/bio/bss_sock.c
+  $(OPENSSL_PATH)/crypto/bn/asm/x86_64-gcc.c
+  $(OPENSSL_PATH)/crypto/bn/bn_add.c
+  $(OPENSSL_PATH)/crypto/bn/bn_blind.c
+  $(OPENSSL_PATH)/crypto/bn/bn_const.c
+  $(OPENSSL_PATH)/crypto/bn/bn_ctx.c
+  $(OPENSSL_PATH)/crypto/bn/bn_depr.c
+  $(OPENSSL_PATH)/crypto/bn/bn_dh.c
+  $(OPENSSL_PATH)/crypto/bn/bn_div.c
+  $(OPENSSL_PATH)/crypto/bn/bn_err.c
+  $(OPENSSL_PATH)/crypto/bn/bn_exp.c
+  $(OPENSSL_PATH)/crypto/bn/bn_exp2.c
+  $(OPENSSL_PATH)/crypto/bn/bn_gcd.c
+  $(OPENSSL_PATH)/crypto/bn/bn_gf2m.c
+  $(OPENSSL_PATH)/crypto/bn/bn_intern.c
+  $(OPENSSL_PATH)/crypto/bn/bn_kron.c
+  $(OPENSSL_PATH)/crypto/bn/bn_lib.c
+  $(OPENSSL_PATH)/crypto/bn/bn_mod.c
+  $(OPENSSL_PATH)/crypto/bn/bn_mont.c
+  $(OPENSSL_PATH)/crypto/bn/bn_mpi.c
+  $(OPENSSL_PATH)/crypto/bn/bn_mul.c
+  $(OPENSSL_PATH)/crypto/bn/bn_nist.c
+  $(OPENSSL_PATH)/crypto/bn/bn_prime.c
+  $(OPENSSL_PATH)/crypto/bn/bn_print.c
+  $(OPENSSL_PATH)/crypto/bn/bn_rand.c
+  $(OPENSSL_PATH)/crypto/bn/bn_recp.c
+  $(OPENSSL_PATH)/crypto/bn/bn_shift.c
+  $(OPENSSL_PATH)/crypto/bn/bn_sqr.c
+  $(OPENSSL_PATH)/crypto/bn/bn_sqrt.c
+  $(OPENSSL_PATH)/crypto/bn/bn_srp.c
+  $(OPENSSL_PATH)/crypto/bn/bn_word.c
+  $(OPENSSL_PATH)/crypto/bn/bn_x931p.c
+  $(OPENSSL_PATH)/crypto/bn/rsaz_exp.c
+  $(OPENSSL_PATH)/crypto/buffer/buf_err.c
+  $(OPENSSL_PATH)/crypto/buffer/buffer.c
+  $(OPENSSL_PATH)/crypto/cmac/cm_ameth.c
+  $(OPENSSL_PATH)/crypto/cmac/cm_pmeth.c
+  $(OPENSSL_PATH)/crypto/cmac/cmac.c
+  $(OPENSSL_PATH)/crypto/comp/c_zlib.c
+  $(OPENSSL_PATH)/crypto/comp/comp_err.c
+  $(OPENSSL_PATH)/crypto/comp/comp_lib.c
+  $(OPENSSL_PATH)/crypto/conf/conf_api.c
+  $(OPENSSL_PATH)/crypto/conf/conf_def.c
+  $(OPENSSL_PATH)/crypto/conf/conf_err.c
+  $(OPENSSL_PATH)/crypto/conf/conf_lib.c
+  $(OPENSSL_PATH)/crypto/conf/conf_mall.c
+  $(OPENSSL_PATH)/crypto/conf/conf_mod.c
+  $(OPENSSL_PATH)/crypto/conf/conf_sap.c
+  $(OPENSSL_PATH)/crypto/conf/conf_ssl.c
+  $(OPENSSL_PATH)/crypto/cpt_err.c
+  $(OPENSSL_PATH)/crypto/cryptlib.c
+  $(OPENSSL_PATH)/crypto/ctype.c
+  $(OPENSSL_PATH)/crypto/cversion.c
+  $(OPENSSL_PATH)/crypto/des/cbc_cksm.c
+  $(OPENSSL_PATH)/crypto/des/cbc_enc.c
+  $(OPENSSL_PATH)/crypto/des/cfb64ede.c
+  $(OPENSSL_PATH)/crypto/des/cfb64enc.c
+  $(OPENSSL_PATH)/crypto/des/cfb_enc.c
+  $(OPENSSL_PATH)/crypto/des/des_enc.c
+  $(OPENSSL_PATH)/crypto/des/ecb3_enc.c
+  $(OPENSSL_PATH)/crypto/des/ecb_enc.c
+  $(OPENSSL_PATH)/crypto/des/fcrypt.c
+  $(OPENSSL_PATH)/crypto/des/fcrypt_b.c
+  $(OPENSSL_PATH)/crypto/des/ofb64ede.c
+  $(OPENSSL_PATH)/crypto/des/ofb64enc.c
+  $(OPENSSL_PATH)/crypto/des/ofb_enc.c
+  $(OPENSSL_PATH)/crypto/des/pcbc_enc.c
+  $(OPENSSL_PATH)/crypto/des/qud_cksm.c
+  $(OPENSSL_PATH)/crypto/des/rand_key.c
+  $(OPENSSL_PATH)/crypto/des/set_key.c
+  $(OPENSSL_PATH)/crypto/des/str2key.c
+  $(OPENSSL_PATH)/crypto/des/xcbc_enc.c
+  $(OPENSSL_PATH)/crypto/dh/dh_ameth.c
+  $(OPENSSL_PATH)/crypto/dh/dh_asn1.c
+  $(OPENSSL_PATH)/crypto/dh/dh_check.c
+  $(OPENSSL_PATH)/crypto/dh/dh_depr.c
+  $(OPENSSL_PATH)/crypto/dh/dh_err.c
+  $(OPENSSL_PATH)/crypto/dh/dh_gen.c
+  $(OPENSSL_PATH)/crypto/dh/dh_kdf.c
+  $(OPENSSL_PATH)/crypto/dh/dh_key.c
+  $(OPENSSL_PATH)/crypto/dh/dh_lib.c
+  $(OPENSSL_PATH)/crypto/dh/dh_meth.c
+  $(OPENSSL_PATH)/crypto/dh/dh_pmeth.c
+  $(OPENSSL_PATH)/crypto/dh/dh_prn.c
+  $(OPENSSL_PATH)/crypto/dh/dh_rfc5114.c
+  $(OPENSSL_PATH)/crypto/dh/dh_rfc7919.c
+  $(OPENSSL_PATH)/crypto/dso/dso_dl.c
+  $(OPENSSL_PATH)/crypto/dso/dso_dlfcn.c
+  $(OPENSSL_PATH)/crypto/dso/dso_err.c
+  $(OPENSSL_PATH)/crypto/dso/dso_lib.c
+  $(OPENSSL_PATH)/crypto/dso/dso_openssl.c
+  $(OPENSSL_PATH)/crypto/dso/dso_vms.c
+  $(OPENSSL_PATH)/crypto/dso/dso_win32.c
+  $(OPENSSL_PATH)/crypto/ebcdic.c
+  $(OPENSSL_PATH)/crypto/err/err.c
+  $(OPENSSL_PATH)/crypto/err/err_prn.c
+  $(OPENSSL_PATH)/crypto/evp/bio_b64.c
+  $(OPENSSL_PATH)/crypto/evp/bio_enc.c
+  $(OPENSSL_PATH)/crypto/evp/bio_md.c
+  $(OPENSSL_PATH)/crypto/evp/bio_ok.c
+  $(OPENSSL_PATH)/crypto/evp/c_allc.c
+  $(OPENSSL_PATH)/crypto/evp/c_alld.c
+  $(OPENSSL_PATH)/crypto/evp/cmeth_lib.c
+  $(OPENSSL_PATH)/crypto/evp/digest.c
+  $(OPENSSL_PATH)/crypto/evp/e_aes.c
+  $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha1.c
+  $(OPENSSL_PATH)/crypto/evp/e_aes_cbc_hmac_sha256.c
+  $(OPENSSL_PATH)/crypto/evp/e_aria.c
+  $(OPENSSL_PATH)/crypto/evp/e_bf.c
+  $(OPENSSL_PATH)/crypto/evp/e_camellia.c
+  $(OPENSSL_PATH)/crypto/evp/e_cast.c
+  $(OPENSSL_PATH)/crypto/evp/e_chacha20_poly1305.c
+  $(OPENSSL_PATH)/crypto/evp/e_des.c
+  $(OPENSSL_PATH)/crypto/evp/e_des3.c
+  $(OPENSSL_PATH)/crypto/evp/e_idea.c
+  $(OPENSSL_PATH)/crypto/evp/e_null.c
+  $(OPENSSL_PATH)/crypto/evp/e_old.c
+  $(OPENSSL_PATH)/crypto/evp/e_rc2.c
+  $(OPENSSL_PATH)/crypto/evp/e_rc4.c
+  $(OPENSSL_PATH)/crypto/evp/e_rc4_hmac_md5.c
+  $(OPENSSL_PATH)/crypto/evp/e_rc5.c
+  $(OPENSSL_PATH)/crypto/evp/e_seed.c
+  $(OPENSSL_PATH)/crypto/evp/e_sm4.c
+  $(OPENSSL_PATH)/crypto/evp/e_xcbc_d.c
+  $(OPENSSL_PATH)/crypto/evp/encode.c
+  $(OPENSSL_PATH)/crypto/evp/evp_cnf.c
+  $(OPENSSL_PATH)/crypto/evp/evp_enc.c
+  $(OPENSSL_PATH)/crypto/evp/evp_err.c
+  $(OPENSSL_PATH)/crypto/evp/evp_key.c
+  $(OPENSSL_PATH)/crypto/evp/evp_lib.c
+  $(OPENSSL_PATH)/crypto/evp/evp_pbe.c
+  $(OPENSSL_PATH)/crypto/evp/evp_pkey.c
+  $(OPENSSL_PATH)/crypto/evp/m_md2.c
+  $(OPENSSL_PATH)/crypto/evp/m_md4.c
+  $(OPENSSL_PATH)/crypto/evp/m_md5.c
+  $(OPENSSL_PATH)/crypto/evp/m_md5_sha1.c
+  $(OPENSSL_PATH)/crypto/evp/m_mdc2.c
+  $(OPENSSL_PATH)/crypto/evp/m_null.c
+  $(OPENSSL_PATH)/crypto/evp/m_ripemd.c
+  $(OPENSSL_PATH)/crypto/evp/m_sha1.c
+  $(OPENSSL_PATH)/crypto/evp/m_sha3.c
+  $(OPENSSL_PATH)/crypto/evp/m_sigver.c
+  $(OPENSSL_PATH)/crypto/evp/m_wp.c
+  $(OPENSSL_PATH)/crypto/evp/names.c
+  $(OPENSSL_PATH)/crypto/evp/p5_crpt.c
+  $(OPENSSL_PATH)/crypto/evp/p5_crpt2.c
+  $(OPENSSL_PATH)/crypto/evp/p_dec.c
+  $(OPENSSL_PATH)/crypto/evp/p_enc.c
+  $(OPENSSL_PATH)/crypto/evp/p_lib.c
+  $(OPENSSL_PATH)/crypto/evp/p_open.c
+  $(OPENSSL_PATH)/crypto/evp/p_seal.c
+  $(OPENSSL_PATH)/crypto/evp/p_sign.c
+  $(OPENSSL_PATH)/crypto/evp/p_verify.c
+  $(OPENSSL_PATH)/crypto/evp/pbe_scrypt.c
+  $(OPENSSL_PATH)/crypto/evp/pmeth_fn.c
+  $(OPENSSL_PATH)/crypto/evp/pmeth_gn.c
+  $(OPENSSL_PATH)/crypto/evp/pmeth_lib.c
+  $(OPENSSL_PATH)/crypto/ex_data.c
+  $(OPENSSL_PATH)/crypto/getenv.c
+  $(OPENSSL_PATH)/crypto/hmac/hm_ameth.c
+  $(OPENSSL_PATH)/crypto/hmac/hm_pmeth.c
+  $(OPENSSL_PATH)/crypto/hmac/hmac.c
+  $(OPENSSL_PATH)/crypto/init.c
+  $(OPENSSL_PATH)/crypto/kdf/hkdf.c
+  $(OPENSSL_PATH)/crypto/kdf/kdf_err.c
+  $(OPENSSL_PATH)/crypto/kdf/scrypt.c
+  $(OPENSSL_PATH)/crypto/kdf/tls1_prf.c
+  $(OPENSSL_PATH)/crypto/lhash/lh_stats.c
+  $(OPENSSL_PATH)/crypto/lhash/lhash.c
+  $(OPENSSL_PATH)/crypto/md4/md4_dgst.c
+  $(OPENSSL_PATH)/crypto/md4/md4_one.c
+  $(OPENSSL_PATH)/crypto/md5/md5_dgst.c
+  $(OPENSSL_PATH)/crypto/md5/md5_one.c
+  $(OPENSSL_PATH)/crypto/mem.c
+  $(OPENSSL_PATH)/crypto/mem_dbg.c
+  $(OPENSSL_PATH)/crypto/mem_sec.c
+  $(OPENSSL_PATH)/crypto/modes/cbc128.c
+  $(OPENSSL_PATH)/crypto/modes/ccm128.c
+  $(OPENSSL_PATH)/crypto/modes/cfb128.c
+  $(OPENSSL_PATH)/crypto/modes/ctr128.c
+  $(OPENSSL_PATH)/crypto/modes/cts128.c
+  $(OPENSSL_PATH)/crypto/modes/gcm128.c
+  $(OPENSSL_PATH)/crypto/modes/ocb128.c
+  $(OPENSSL_PATH)/crypto/modes/ofb128.c
+  $(OPENSSL_PATH)/crypto/modes/wrap128.c
+  $(OPENSSL_PATH)/crypto/modes/xts128.c
+  $(OPENSSL_PATH)/crypto/o_dir.c
+  $(OPENSSL_PATH)/crypto/o_fips.c
+  $(OPENSSL_PATH)/crypto/o_fopen.c
+  $(OPENSSL_PATH)/crypto/o_init.c
+  $(OPENSSL_PATH)/crypto/o_str.c
+  $(OPENSSL_PATH)/crypto/o_time.c
+  $(OPENSSL_PATH)/crypto/objects/o_names.c
+  $(OPENSSL_PATH)/crypto/objects/obj_dat.c
+  $(OPENSSL_PATH)/crypto/objects/obj_err.c
+  $(OPENSSL_PATH)/crypto/objects/obj_lib.c
+  $(OPENSSL_PATH)/crypto/objects/obj_xref.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_asn.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_cl.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_err.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_ext.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_ht.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_lib.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_prn.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_srv.c
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_vfy.c
+  $(OPENSSL_PATH)/crypto/ocsp/v3_ocsp.c
+  $(OPENSSL_PATH)/crypto/pem/pem_all.c
+  $(OPENSSL_PATH)/crypto/pem/pem_err.c
+  $(OPENSSL_PATH)/crypto/pem/pem_info.c
+  $(OPENSSL_PATH)/crypto/pem/pem_lib.c
+  $(OPENSSL_PATH)/crypto/pem/pem_oth.c
+  $(OPENSSL_PATH)/crypto/pem/pem_pk8.c
+  $(OPENSSL_PATH)/crypto/pem/pem_pkey.c
+  $(OPENSSL_PATH)/crypto/pem/pem_sign.c
+  $(OPENSSL_PATH)/crypto/pem/pem_x509.c
+  $(OPENSSL_PATH)/crypto/pem/pem_xaux.c
+  $(OPENSSL_PATH)/crypto/pem/pvkfmt.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_add.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_asn.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_attr.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_crpt.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_crt.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_decr.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_init.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_key.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_kiss.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_mutl.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_npas.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_p8d.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_p8e.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_sbag.c
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_utl.c
+  $(OPENSSL_PATH)/crypto/pkcs12/pk12err.c
+  $(OPENSSL_PATH)/crypto/pkcs7/bio_pk7.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_asn1.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_attr.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_doit.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_lib.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_mime.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pk7_smime.c
+  $(OPENSSL_PATH)/crypto/pkcs7/pkcs7err.c
+  $(OPENSSL_PATH)/crypto/rand/drbg_ctr.c
+  $(OPENSSL_PATH)/crypto/rand/drbg_lib.c
+  $(OPENSSL_PATH)/crypto/rand/rand_egd.c
+  $(OPENSSL_PATH)/crypto/rand/rand_err.c
+  $(OPENSSL_PATH)/crypto/rand/rand_lib.c
+  $(OPENSSL_PATH)/crypto/rand/rand_unix.c
+  $(OPENSSL_PATH)/crypto/rand/rand_vms.c
+  $(OPENSSL_PATH)/crypto/rand/rand_win.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_ameth.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_asn1.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_chk.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_crpt.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_depr.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_err.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_gen.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_lib.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_meth.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_mp.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_none.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_oaep.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_ossl.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_pk1.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_pmeth.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_prn.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_pss.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_saos.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_sign.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_ssl.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_x931.c
+  $(OPENSSL_PATH)/crypto/rsa/rsa_x931g.c
+  $(OPENSSL_PATH)/crypto/sha/sha1_one.c
+  $(OPENSSL_PATH)/crypto/sha/sha1dgst.c
+  $(OPENSSL_PATH)/crypto/sha/sha256.c
+  $(OPENSSL_PATH)/crypto/sha/sha512.c
+  $(OPENSSL_PATH)/crypto/siphash/siphash.c
+  $(OPENSSL_PATH)/crypto/siphash/siphash_ameth.c
+  $(OPENSSL_PATH)/crypto/siphash/siphash_pmeth.c
+  $(OPENSSL_PATH)/crypto/sm3/m_sm3.c
+  $(OPENSSL_PATH)/crypto/sm3/sm3.c
+  $(OPENSSL_PATH)/crypto/sm4/sm4.c
+  $(OPENSSL_PATH)/crypto/stack/stack.c
+  $(OPENSSL_PATH)/crypto/threads_none.c
+  $(OPENSSL_PATH)/crypto/threads_pthread.c
+  $(OPENSSL_PATH)/crypto/threads_win.c
+  $(OPENSSL_PATH)/crypto/txt_db/txt_db.c
+  $(OPENSSL_PATH)/crypto/ui/ui_err.c
+  $(OPENSSL_PATH)/crypto/ui/ui_lib.c
+  $(OPENSSL_PATH)/crypto/ui/ui_null.c
+  $(OPENSSL_PATH)/crypto/ui/ui_openssl.c
+  $(OPENSSL_PATH)/crypto/ui/ui_util.c
+  $(OPENSSL_PATH)/crypto/uid.c
+  $(OPENSSL_PATH)/crypto/x509/by_dir.c
+  $(OPENSSL_PATH)/crypto/x509/by_file.c
+  $(OPENSSL_PATH)/crypto/x509/t_crl.c
+  $(OPENSSL_PATH)/crypto/x509/t_req.c
+  $(OPENSSL_PATH)/crypto/x509/t_x509.c
+  $(OPENSSL_PATH)/crypto/x509/x509_att.c
+  $(OPENSSL_PATH)/crypto/x509/x509_cmp.c
+  $(OPENSSL_PATH)/crypto/x509/x509_d2.c
+  $(OPENSSL_PATH)/crypto/x509/x509_def.c
+  $(OPENSSL_PATH)/crypto/x509/x509_err.c
+  $(OPENSSL_PATH)/crypto/x509/x509_ext.c
+  $(OPENSSL_PATH)/crypto/x509/x509_lu.c
+  $(OPENSSL_PATH)/crypto/x509/x509_meth.c
+  $(OPENSSL_PATH)/crypto/x509/x509_obj.c
+  $(OPENSSL_PATH)/crypto/x509/x509_r2x.c
+  $(OPENSSL_PATH)/crypto/x509/x509_req.c
+  $(OPENSSL_PATH)/crypto/x509/x509_set.c
+  $(OPENSSL_PATH)/crypto/x509/x509_trs.c
+  $(OPENSSL_PATH)/crypto/x509/x509_txt.c
+  $(OPENSSL_PATH)/crypto/x509/x509_v3.c
+  $(OPENSSL_PATH)/crypto/x509/x509_vfy.c
+  $(OPENSSL_PATH)/crypto/x509/x509_vpm.c
+  $(OPENSSL_PATH)/crypto/x509/x509cset.c
+  $(OPENSSL_PATH)/crypto/x509/x509name.c
+  $(OPENSSL_PATH)/crypto/x509/x509rset.c
+  $(OPENSSL_PATH)/crypto/x509/x509spki.c
+  $(OPENSSL_PATH)/crypto/x509/x509type.c
+  $(OPENSSL_PATH)/crypto/x509/x_all.c
+  $(OPENSSL_PATH)/crypto/x509/x_attrib.c
+  $(OPENSSL_PATH)/crypto/x509/x_crl.c
+  $(OPENSSL_PATH)/crypto/x509/x_exten.c
+  $(OPENSSL_PATH)/crypto/x509/x_name.c
+  $(OPENSSL_PATH)/crypto/x509/x_pubkey.c
+  $(OPENSSL_PATH)/crypto/x509/x_req.c
+  $(OPENSSL_PATH)/crypto/x509/x_x509.c
+  $(OPENSSL_PATH)/crypto/x509/x_x509a.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_cache.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_data.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_lib.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_map.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_node.c
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_tree.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_addr.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_admis.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_akey.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_akeya.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_alt.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_asid.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_bcons.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_bitst.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_conf.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_cpols.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_crld.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_enum.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_extku.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_genn.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_ia5.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_info.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_int.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_lib.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_ncons.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pci.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pcia.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pcons.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pku.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_pmaps.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_prn.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_purp.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_skey.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_sxnet.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_tlsf.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3_utl.c
+  $(OPENSSL_PATH)/crypto/x509v3/v3err.c
+  $(OPENSSL_PATH)/crypto/arm_arch.h
+  $(OPENSSL_PATH)/crypto/mips_arch.h
+  $(OPENSSL_PATH)/crypto/ppc_arch.h
+  $(OPENSSL_PATH)/crypto/s390x_arch.h
+  $(OPENSSL_PATH)/crypto/sparc_arch.h
+  $(OPENSSL_PATH)/crypto/vms_rms.h
+  $(OPENSSL_PATH)/crypto/aes/aes_locl.h
+  $(OPENSSL_PATH)/crypto/asn1/asn1_item_list.h
+  $(OPENSSL_PATH)/crypto/asn1/asn1_locl.h
+  $(OPENSSL_PATH)/crypto/asn1/charmap.h
+  $(OPENSSL_PATH)/crypto/asn1/standard_methods.h
+  $(OPENSSL_PATH)/crypto/asn1/tbl_standard.h
+  $(OPENSSL_PATH)/crypto/async/async_locl.h
+  $(OPENSSL_PATH)/crypto/async/arch/async_null.h
+  $(OPENSSL_PATH)/crypto/async/arch/async_posix.h
+  $(OPENSSL_PATH)/crypto/async/arch/async_win.h
+  $(OPENSSL_PATH)/crypto/bio/bio_lcl.h
+  $(OPENSSL_PATH)/crypto/bn/bn_lcl.h
+  $(OPENSSL_PATH)/crypto/bn/bn_prime.h
+  $(OPENSSL_PATH)/crypto/bn/rsaz_exp.h
+  $(OPENSSL_PATH)/crypto/comp/comp_lcl.h
+  $(OPENSSL_PATH)/crypto/conf/conf_def.h
+  $(OPENSSL_PATH)/crypto/conf/conf_lcl.h
+  $(OPENSSL_PATH)/crypto/des/des_locl.h
+  $(OPENSSL_PATH)/crypto/des/spr.h
+  $(OPENSSL_PATH)/crypto/dh/dh_locl.h
+  $(OPENSSL_PATH)/crypto/dso/dso_locl.h
+  $(OPENSSL_PATH)/crypto/evp/evp_locl.h
+  $(OPENSSL_PATH)/crypto/hmac/hmac_lcl.h
+  $(OPENSSL_PATH)/crypto/lhash/lhash_lcl.h
+  $(OPENSSL_PATH)/crypto/md4/md4_locl.h
+  $(OPENSSL_PATH)/crypto/md5/md5_locl.h
+  $(OPENSSL_PATH)/crypto/modes/modes_lcl.h
+  $(OPENSSL_PATH)/crypto/objects/obj_dat.h
+  $(OPENSSL_PATH)/crypto/objects/obj_lcl.h
+  $(OPENSSL_PATH)/crypto/objects/obj_xref.h
+  $(OPENSSL_PATH)/crypto/ocsp/ocsp_lcl.h
+  $(OPENSSL_PATH)/crypto/pkcs12/p12_lcl.h
+  $(OPENSSL_PATH)/crypto/rand/rand_lcl.h
+  $(OPENSSL_PATH)/crypto/rc4/rc4_locl.h
+  $(OPENSSL_PATH)/crypto/rsa/rsa_locl.h
+  $(OPENSSL_PATH)/crypto/sha/sha_locl.h
+  $(OPENSSL_PATH)/crypto/siphash/siphash_local.h
+  $(OPENSSL_PATH)/crypto/sm3/sm3_locl.h
+  $(OPENSSL_PATH)/crypto/store/store_locl.h
+  $(OPENSSL_PATH)/crypto/ui/ui_locl.h
+  $(OPENSSL_PATH)/crypto/x509/x509_lcl.h
+  $(OPENSSL_PATH)/crypto/x509v3/ext_dat.h
+  $(OPENSSL_PATH)/crypto/x509v3/pcy_int.h
+  $(OPENSSL_PATH)/crypto/x509v3/standard_exts.h
+  $(OPENSSL_PATH)/crypto/x509v3/v3_admis.h
+  $(OPENSSL_PATH)/ssl/bio_ssl.c
+  $(OPENSSL_PATH)/ssl/d1_lib.c
+  $(OPENSSL_PATH)/ssl/d1_msg.c
+  $(OPENSSL_PATH)/ssl/d1_srtp.c
+  $(OPENSSL_PATH)/ssl/methods.c
+  $(OPENSSL_PATH)/ssl/packet.c
+  $(OPENSSL_PATH)/ssl/pqueue.c
+  $(OPENSSL_PATH)/ssl/record/dtls1_bitmap.c
+  $(OPENSSL_PATH)/ssl/record/rec_layer_d1.c
+  $(OPENSSL_PATH)/ssl/record/rec_layer_s3.c
+  $(OPENSSL_PATH)/ssl/record/ssl3_buffer.c
+  $(OPENSSL_PATH)/ssl/record/ssl3_record.c
+  $(OPENSSL_PATH)/ssl/record/ssl3_record_tls13.c
+  $(OPENSSL_PATH)/ssl/s3_cbc.c
+  $(OPENSSL_PATH)/ssl/s3_enc.c
+  $(OPENSSL_PATH)/ssl/s3_lib.c
+  $(OPENSSL_PATH)/ssl/s3_msg.c
+  $(OPENSSL_PATH)/ssl/ssl_asn1.c
+  $(OPENSSL_PATH)/ssl/ssl_cert.c
+  $(OPENSSL_PATH)/ssl/ssl_ciph.c
+  $(OPENSSL_PATH)/ssl/ssl_conf.c
+  $(OPENSSL_PATH)/ssl/ssl_err.c
+  $(OPENSSL_PATH)/ssl/ssl_init.c
+  $(OPENSSL_PATH)/ssl/ssl_lib.c
+  $(OPENSSL_PATH)/ssl/ssl_mcnf.c
+  $(OPENSSL_PATH)/ssl/ssl_rsa.c
+  $(OPENSSL_PATH)/ssl/ssl_sess.c
+  $(OPENSSL_PATH)/ssl/ssl_stat.c
+  $(OPENSSL_PATH)/ssl/ssl_txt.c
+  $(OPENSSL_PATH)/ssl/ssl_utst.c
+  $(OPENSSL_PATH)/ssl/statem/extensions.c
+  $(OPENSSL_PATH)/ssl/statem/extensions_clnt.c
+  $(OPENSSL_PATH)/ssl/statem/extensions_cust.c
+  $(OPENSSL_PATH)/ssl/statem/extensions_srvr.c
+  $(OPENSSL_PATH)/ssl/statem/statem.c
+  $(OPENSSL_PATH)/ssl/statem/statem_clnt.c
+  $(OPENSSL_PATH)/ssl/statem/statem_dtls.c
+  $(OPENSSL_PATH)/ssl/statem/statem_lib.c
+  $(OPENSSL_PATH)/ssl/statem/statem_srvr.c
+  $(OPENSSL_PATH)/ssl/t1_enc.c
+  $(OPENSSL_PATH)/ssl/t1_lib.c
+  $(OPENSSL_PATH)/ssl/t1_trce.c
+  $(OPENSSL_PATH)/ssl/tls13_enc.c
+  $(OPENSSL_PATH)/ssl/tls_srp.c
+  $(OPENSSL_PATH)/ssl/packet_locl.h
+  $(OPENSSL_PATH)/ssl/ssl_cert_table.h
+  $(OPENSSL_PATH)/ssl/ssl_locl.h
+  $(OPENSSL_PATH)/ssl/record/record.h
+  $(OPENSSL_PATH)/ssl/record/record_locl.h
+  $(OPENSSL_PATH)/ssl/statem/statem.h
+  $(OPENSSL_PATH)/ssl/statem/statem_locl.h
+# Autogenerated files list ends here
+  buildinf.h
+  rand_pool_noise.h
+  ossl_store.c
+  rand_pool.c
+
+[Sources.X64]
+  ApiHooks.c
+  rand_pool_noise_tsc.c
+
+[Packages]
+  MdePkg/MdePkg.dec
+  CryptoPkg/CryptoPkg.dec
+
+[LibraryClasses]
+  BaseLib
+  DebugLib
+  TimerLib
+  PrintLib
+
+[BuildOptions]
+  #
+  # Disable the following Visual Studio compiler warnings introduced by the openssl
+  # source so we do not break the build with the /WX option:
+  #   C4090: 'function' : different 'const' qualifiers
+  #   C4132: 'object' : const object should be initialized (tls13_enc.c)
+  #   C4210: nonstandard extension used: function given file scope
+  #   C4244: conversion from type1 to type2, possible loss of data
+  #   C4245: conversion from type1 to type2, signed/unsigned mismatch
+  #   C4267: conversion from size_t to type, possible loss of data
+  #   C4306: 'identifier' : conversion from 'type1' to 'type2' of greater size
+  #   C4310: cast truncates constant value
+  #   C4389: 'operator' : signed/unsigned mismatch (xxxx)
+  #   C4700: uninitialized local variable 'name' used. (conf_sap.c(71))
+  #   C4702: unreachable code
+  #   C4706: assignment within conditional expression
+  #   C4819: The file contains a character that cannot be represented in the current code page
+  #
+  MSFT:*_*_X64_CC_FLAGS    = -U_WIN32 -U_WIN64 -U_MSC_VER $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /wd4090 /wd4132 /wd4210 /wd4244 /wd4245 /wd4267 /wd4306 /wd4310 /wd4700 /wd4389 /wd4702 /wd4706 /wd4819
+
+  INTEL:*_*_X64_CC_FLAGS   = -U_WIN32 -U_WIN64 -U_MSC_VER -U__ICC $(OPENSSL_FLAGS) $(OPENSSL_FLAGS_CONFIG) /w
+
+  #
+  # Suppress the following build warnings in openssl so we do not break the build with -Werror:
+  #   -Werror=maybe-uninitialized: the variable may be used uninitialized on some execution paths.
+  #   -Werror=format: check calls to printf, scanf, etc. to make sure that the arguments supplied
+  #                   have types appropriate to the format string specified.
+  #   -Werror=unused-but-set-variable: warn whenever a local variable is assigned to, but otherwise unused (aside from its declaration).
+  #
+  GCC:*_*_X64_CC_FLAGS     = -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) -Wno-error=maybe-uninitialized -Wno-error=format -Wno-format -Wno-error=unused-but-set-variable -DNO_MSABI_VA_FUNCS
+
+  # Suppress the following warnings in openssl so we do not break the build with warnings-as-errors:
+  # 1295: Deprecated declaration <entity> - give arg types
+  #  550: <entity> was set but never used
+  # 1293: assignment in condition
+  #  111: statement is unreachable (invariably "break;" after "return X;" in case statement)
+  #   68: integer conversion resulted in a change of sign ("if (Status == -1)")
+  #  177: <entity> was declared but never referenced
+  #  223: function <entity> declared implicitly
+  #  144: a value of type <type> cannot be used to initialize an entity of type <type>
+  #  513: a value of type <type> cannot be assigned to an entity of type <type>
+  #  188: enumerated type mixed with another type (i.e. passing an integer as an enum without a cast)
+  # 1296: Extended constant initialiser used
+  #  128: loop is not reachable - may be emitted inappropriately if code follows a conditional return
+  #       from the function that evaluates to true at compile time
+  #  546: transfer of control bypasses initialization - may be emitted inappropriately if the uninitialized
+  #       variable is never referenced after the jump
+  #    1: ignore "#1-D: last line of file ends without a newline"
+  # 3017: <entity> may be used before being set (NOTE: This was fixed in OpenSSL 1.1 HEAD with
+  #       commit d9b8b89bec4480de3a10bdaf9425db371c19145b, and can be dropped then.)
+  XCODE:*_*_X64_CC_FLAGS    = -mmmx -msse -U_WIN32 -U_WIN64 $(OPENSSL_FLAGS) -w -std=c99 -Wno-error=uninitialized
diff --git a/CryptoPkg/Library/Include/openssl/opensslconf.h b/CryptoPkg/Library/Include/openssl/opensslconf.h
index bd34e53ef2..20f32cc6fe 100644
--- a/CryptoPkg/Library/Include/openssl/opensslconf.h
+++ b/CryptoPkg/Library/Include/openssl/opensslconf.h
@@ -103,9 +103,6 @@ extern "C" {
 #ifndef OPENSSL_NO_ASAN
 # define OPENSSL_NO_ASAN
 #endif
-#ifndef OPENSSL_NO_ASM
-# define OPENSSL_NO_ASM
-#endif
 #ifndef OPENSSL_NO_ASYNC
 # define OPENSSL_NO_ASYNC
 #endif
diff --git a/CryptoPkg/Library/OpensslLib/ApiHooks.c b/CryptoPkg/Library/OpensslLib/ApiHooks.c
new file mode 100644
index 0000000000..58cff16838
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/ApiHooks.c
@@ -0,0 +1,18 @@
+/** @file
+  OpenSSL Library API hooks.
+
+Copyright (c) 2020, Intel Corporation. All rights reserved.<BR>
+SPDX-License-Identifier: BSD-2-Clause-Patent
+
+**/
+
+#include <Uefi.h>
+
+VOID *
+__imp_RtlVirtualUnwind (
+  VOID *    Args
+  )
+{
+  return NULL;
+}
+
diff --git a/CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c b/CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c
new file mode 100644
index 0000000000..ef20d2b84e
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c
@@ -0,0 +1,34 @@
+/** @file
+  Constructor to initialize CPUID data for OpenSSL assembly operations.
+
+Copyright (c) 2020, Intel Corporation. All rights reserved.<BR>
+SPDX-License-Identifier: BSD-2-Clause-Patent
+
+**/
+
+#include <Uefi.h>
+
+extern void OPENSSL_cpuid_setup (void);
+
+/**
+  Constructor routine for OpensslLib.
+
+  The constructor calls an internal OpenSSL function which fetches a local copy
+  of the hardware capability flags, used to enable native crypto instructions.
+
+  @param  None
+
+  @retval EFI_SUCCESS         The construction succeeded.
+
+**/
+EFI_STATUS
+EFIAPI
+OpensslLibConstructor (
+  VOID
+  )
+{
+  OPENSSL_cpuid_setup ();
+
+  return EFI_SUCCESS;
+}
+
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm
new file mode 100644
index 0000000000..30879d3cf5
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm
@@ -0,0 +1,3209 @@
+; Copyright 2009-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+;extern _OPENSSL_ia32cap_P
+global  _aesni_encrypt
+align   16
+_aesni_encrypt:
+L$_aesni_encrypt_begin:
+        mov     eax,DWORD [4+esp]
+        mov     edx,DWORD [12+esp]
+        movups  xmm2,[eax]
+        mov     ecx,DWORD [240+edx]
+        mov     eax,DWORD [8+esp]
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$000enc1_loop_1:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$000enc1_loop_1
+db      102,15,56,221,209
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        movups  [eax],xmm2
+        pxor    xmm2,xmm2
+        ret
+global  _aesni_decrypt
+align   16
+_aesni_decrypt:
+L$_aesni_decrypt_begin:
+        mov     eax,DWORD [4+esp]
+        mov     edx,DWORD [12+esp]
+        movups  xmm2,[eax]
+        mov     ecx,DWORD [240+edx]
+        mov     eax,DWORD [8+esp]
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$001dec1_loop_2:
+db      102,15,56,222,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$001dec1_loop_2
+db      102,15,56,223,209
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        movups  [eax],xmm2
+        pxor    xmm2,xmm2
+        ret
+align   16
+__aesni_encrypt2:
+        movups  xmm0,[edx]
+        shl     ecx,4
+        movups  xmm1,[16+edx]
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        movups  xmm0,[32+edx]
+        lea     edx,[32+ecx*1+edx]
+        neg     ecx
+        add     ecx,16
+L$002enc2_loop:
+db      102,15,56,220,209
+db      102,15,56,220,217
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,220,208
+db      102,15,56,220,216
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$002enc2_loop
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,221,208
+db      102,15,56,221,216
+        ret
+align   16
+__aesni_decrypt2:
+        movups  xmm0,[edx]
+        shl     ecx,4
+        movups  xmm1,[16+edx]
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        movups  xmm0,[32+edx]
+        lea     edx,[32+ecx*1+edx]
+        neg     ecx
+        add     ecx,16
+L$003dec2_loop:
+db      102,15,56,222,209
+db      102,15,56,222,217
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,222,208
+db      102,15,56,222,216
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$003dec2_loop
+db      102,15,56,222,209
+db      102,15,56,222,217
+db      102,15,56,223,208
+db      102,15,56,223,216
+        ret
+align   16
+__aesni_encrypt3:
+        movups  xmm0,[edx]
+        shl     ecx,4
+        movups  xmm1,[16+edx]
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+        movups  xmm0,[32+edx]
+        lea     edx,[32+ecx*1+edx]
+        neg     ecx
+        add     ecx,16
+L$004enc3_loop:
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,220,225
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,220,208
+db      102,15,56,220,216
+db      102,15,56,220,224
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$004enc3_loop
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,220,225
+db      102,15,56,221,208
+db      102,15,56,221,216
+db      102,15,56,221,224
+        ret
+align   16
+__aesni_decrypt3:
+        movups  xmm0,[edx]
+        shl     ecx,4
+        movups  xmm1,[16+edx]
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+        movups  xmm0,[32+edx]
+        lea     edx,[32+ecx*1+edx]
+        neg     ecx
+        add     ecx,16
+L$005dec3_loop:
+db      102,15,56,222,209
+db      102,15,56,222,217
+db      102,15,56,222,225
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,222,208
+db      102,15,56,222,216
+db      102,15,56,222,224
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$005dec3_loop
+db      102,15,56,222,209
+db      102,15,56,222,217
+db      102,15,56,222,225
+db      102,15,56,223,208
+db      102,15,56,223,216
+db      102,15,56,223,224
+        ret
+align   16
+__aesni_encrypt4:
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        shl     ecx,4
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+        pxor    xmm5,xmm0
+        movups  xmm0,[32+edx]
+        lea     edx,[32+ecx*1+edx]
+        neg     ecx
+db      15,31,64,0
+        add     ecx,16
+L$006enc4_loop:
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,220,225
+db      102,15,56,220,233
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,220,208
+db      102,15,56,220,216
+db      102,15,56,220,224
+db      102,15,56,220,232
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$006enc4_loop
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,220,225
+db      102,15,56,220,233
+db      102,15,56,221,208
+db      102,15,56,221,216
+db      102,15,56,221,224
+db      102,15,56,221,232
+        ret
+align   16
+__aesni_decrypt4:
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        shl     ecx,4
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+        pxor    xmm5,xmm0
+        movups  xmm0,[32+edx]
+        lea     edx,[32+ecx*1+edx]
+        neg     ecx
+db      15,31,64,0
+        add     ecx,16
+L$007dec4_loop:
+db      102,15,56,222,209
+db      102,15,56,222,217
+db      102,15,56,222,225
+db      102,15,56,222,233
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,222,208
+db      102,15,56,222,216
+db      102,15,56,222,224
+db      102,15,56,222,232
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$007dec4_loop
+db      102,15,56,222,209
+db      102,15,56,222,217
+db      102,15,56,222,225
+db      102,15,56,222,233
+db      102,15,56,223,208
+db      102,15,56,223,216
+db      102,15,56,223,224
+db      102,15,56,223,232
+        ret
+align   16
+__aesni_encrypt6:
+        movups  xmm0,[edx]
+        shl     ecx,4
+        movups  xmm1,[16+edx]
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+db      102,15,56,220,209
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm0
+db      102,15,56,220,217
+        lea     edx,[32+ecx*1+edx]
+        neg     ecx
+db      102,15,56,220,225
+        pxor    xmm7,xmm0
+        movups  xmm0,[ecx*1+edx]
+        add     ecx,16
+        jmp     NEAR L$008_aesni_encrypt6_inner
+align   16
+L$009enc6_loop:
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,220,225
+L$008_aesni_encrypt6_inner:
+db      102,15,56,220,233
+db      102,15,56,220,241
+db      102,15,56,220,249
+L$_aesni_encrypt6_enter:
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,220,208
+db      102,15,56,220,216
+db      102,15,56,220,224
+db      102,15,56,220,232
+db      102,15,56,220,240
+db      102,15,56,220,248
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$009enc6_loop
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,220,225
+db      102,15,56,220,233
+db      102,15,56,220,241
+db      102,15,56,220,249
+db      102,15,56,221,208
+db      102,15,56,221,216
+db      102,15,56,221,224
+db      102,15,56,221,232
+db      102,15,56,221,240
+db      102,15,56,221,248
+        ret
+align   16
+__aesni_decrypt6:
+        movups  xmm0,[edx]
+        shl     ecx,4
+        movups  xmm1,[16+edx]
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+db      102,15,56,222,209
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm0
+db      102,15,56,222,217
+        lea     edx,[32+ecx*1+edx]
+        neg     ecx
+db      102,15,56,222,225
+        pxor    xmm7,xmm0
+        movups  xmm0,[ecx*1+edx]
+        add     ecx,16
+        jmp     NEAR L$010_aesni_decrypt6_inner
+align   16
+L$011dec6_loop:
+db      102,15,56,222,209
+db      102,15,56,222,217
+db      102,15,56,222,225
+L$010_aesni_decrypt6_inner:
+db      102,15,56,222,233
+db      102,15,56,222,241
+db      102,15,56,222,249
+L$_aesni_decrypt6_enter:
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,222,208
+db      102,15,56,222,216
+db      102,15,56,222,224
+db      102,15,56,222,232
+db      102,15,56,222,240
+db      102,15,56,222,248
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$011dec6_loop
+db      102,15,56,222,209
+db      102,15,56,222,217
+db      102,15,56,222,225
+db      102,15,56,222,233
+db      102,15,56,222,241
+db      102,15,56,222,249
+db      102,15,56,223,208
+db      102,15,56,223,216
+db      102,15,56,223,224
+db      102,15,56,223,232
+db      102,15,56,223,240
+db      102,15,56,223,248
+        ret
+global  _aesni_ecb_encrypt
+align   16
+_aesni_ecb_encrypt:
+L$_aesni_ecb_encrypt_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        mov     ebx,DWORD [36+esp]
+        and     eax,-16
+        jz      NEAR L$012ecb_ret
+        mov     ecx,DWORD [240+edx]
+        test    ebx,ebx
+        jz      NEAR L$013ecb_decrypt
+        mov     ebp,edx
+        mov     ebx,ecx
+        cmp     eax,96
+        jb      NEAR L$014ecb_enc_tail
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        movdqu  xmm5,[48+esi]
+        movdqu  xmm6,[64+esi]
+        movdqu  xmm7,[80+esi]
+        lea     esi,[96+esi]
+        sub     eax,96
+        jmp     NEAR L$015ecb_enc_loop6_enter
+align   16
+L$016ecb_enc_loop6:
+        movups  [edi],xmm2
+        movdqu  xmm2,[esi]
+        movups  [16+edi],xmm3
+        movdqu  xmm3,[16+esi]
+        movups  [32+edi],xmm4
+        movdqu  xmm4,[32+esi]
+        movups  [48+edi],xmm5
+        movdqu  xmm5,[48+esi]
+        movups  [64+edi],xmm6
+        movdqu  xmm6,[64+esi]
+        movups  [80+edi],xmm7
+        lea     edi,[96+edi]
+        movdqu  xmm7,[80+esi]
+        lea     esi,[96+esi]
+L$015ecb_enc_loop6_enter:
+        call    __aesni_encrypt6
+        mov     edx,ebp
+        mov     ecx,ebx
+        sub     eax,96
+        jnc     NEAR L$016ecb_enc_loop6
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        movups  [64+edi],xmm6
+        movups  [80+edi],xmm7
+        lea     edi,[96+edi]
+        add     eax,96
+        jz      NEAR L$012ecb_ret
+L$014ecb_enc_tail:
+        movups  xmm2,[esi]
+        cmp     eax,32
+        jb      NEAR L$017ecb_enc_one
+        movups  xmm3,[16+esi]
+        je      NEAR L$018ecb_enc_two
+        movups  xmm4,[32+esi]
+        cmp     eax,64
+        jb      NEAR L$019ecb_enc_three
+        movups  xmm5,[48+esi]
+        je      NEAR L$020ecb_enc_four
+        movups  xmm6,[64+esi]
+        xorps   xmm7,xmm7
+        call    __aesni_encrypt6
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        movups  [64+edi],xmm6
+        jmp     NEAR L$012ecb_ret
+align   16
+L$017ecb_enc_one:
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$021enc1_loop_3:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$021enc1_loop_3
+db      102,15,56,221,209
+        movups  [edi],xmm2
+        jmp     NEAR L$012ecb_ret
+align   16
+L$018ecb_enc_two:
+        call    __aesni_encrypt2
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        jmp     NEAR L$012ecb_ret
+align   16
+L$019ecb_enc_three:
+        call    __aesni_encrypt3
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        jmp     NEAR L$012ecb_ret
+align   16
+L$020ecb_enc_four:
+        call    __aesni_encrypt4
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        jmp     NEAR L$012ecb_ret
+align   16
+L$013ecb_decrypt:
+        mov     ebp,edx
+        mov     ebx,ecx
+        cmp     eax,96
+        jb      NEAR L$022ecb_dec_tail
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        movdqu  xmm5,[48+esi]
+        movdqu  xmm6,[64+esi]
+        movdqu  xmm7,[80+esi]
+        lea     esi,[96+esi]
+        sub     eax,96
+        jmp     NEAR L$023ecb_dec_loop6_enter
+align   16
+L$024ecb_dec_loop6:
+        movups  [edi],xmm2
+        movdqu  xmm2,[esi]
+        movups  [16+edi],xmm3
+        movdqu  xmm3,[16+esi]
+        movups  [32+edi],xmm4
+        movdqu  xmm4,[32+esi]
+        movups  [48+edi],xmm5
+        movdqu  xmm5,[48+esi]
+        movups  [64+edi],xmm6
+        movdqu  xmm6,[64+esi]
+        movups  [80+edi],xmm7
+        lea     edi,[96+edi]
+        movdqu  xmm7,[80+esi]
+        lea     esi,[96+esi]
+L$023ecb_dec_loop6_enter:
+        call    __aesni_decrypt6
+        mov     edx,ebp
+        mov     ecx,ebx
+        sub     eax,96
+        jnc     NEAR L$024ecb_dec_loop6
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        movups  [64+edi],xmm6
+        movups  [80+edi],xmm7
+        lea     edi,[96+edi]
+        add     eax,96
+        jz      NEAR L$012ecb_ret
+L$022ecb_dec_tail:
+        movups  xmm2,[esi]
+        cmp     eax,32
+        jb      NEAR L$025ecb_dec_one
+        movups  xmm3,[16+esi]
+        je      NEAR L$026ecb_dec_two
+        movups  xmm4,[32+esi]
+        cmp     eax,64
+        jb      NEAR L$027ecb_dec_three
+        movups  xmm5,[48+esi]
+        je      NEAR L$028ecb_dec_four
+        movups  xmm6,[64+esi]
+        xorps   xmm7,xmm7
+        call    __aesni_decrypt6
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        movups  [64+edi],xmm6
+        jmp     NEAR L$012ecb_ret
+align   16
+L$025ecb_dec_one:
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$029dec1_loop_4:
+db      102,15,56,222,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$029dec1_loop_4
+db      102,15,56,223,209
+        movups  [edi],xmm2
+        jmp     NEAR L$012ecb_ret
+align   16
+L$026ecb_dec_two:
+        call    __aesni_decrypt2
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        jmp     NEAR L$012ecb_ret
+align   16
+L$027ecb_dec_three:
+        call    __aesni_decrypt3
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        jmp     NEAR L$012ecb_ret
+align   16
+L$028ecb_dec_four:
+        call    __aesni_decrypt4
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+L$012ecb_ret:
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        pxor    xmm6,xmm6
+        pxor    xmm7,xmm7
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _aesni_ccm64_encrypt_blocks
+align   16
+_aesni_ccm64_encrypt_blocks:
+L$_aesni_ccm64_encrypt_blocks_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        mov     ebx,DWORD [36+esp]
+        mov     ecx,DWORD [40+esp]
+        mov     ebp,esp
+        sub     esp,60
+        and     esp,-16
+        mov     DWORD [48+esp],ebp
+        movdqu  xmm7,[ebx]
+        movdqu  xmm3,[ecx]
+        mov     ecx,DWORD [240+edx]
+        mov     DWORD [esp],202182159
+        mov     DWORD [4+esp],134810123
+        mov     DWORD [8+esp],67438087
+        mov     DWORD [12+esp],66051
+        mov     ebx,1
+        xor     ebp,ebp
+        mov     DWORD [16+esp],ebx
+        mov     DWORD [20+esp],ebp
+        mov     DWORD [24+esp],ebp
+        mov     DWORD [28+esp],ebp
+        shl     ecx,4
+        mov     ebx,16
+        lea     ebp,[edx]
+        movdqa  xmm5,[esp]
+        movdqa  xmm2,xmm7
+        lea     edx,[32+ecx*1+edx]
+        sub     ebx,ecx
+db      102,15,56,0,253
+L$030ccm64_enc_outer:
+        movups  xmm0,[ebp]
+        mov     ecx,ebx
+        movups  xmm6,[esi]
+        xorps   xmm2,xmm0
+        movups  xmm1,[16+ebp]
+        xorps   xmm0,xmm6
+        xorps   xmm3,xmm0
+        movups  xmm0,[32+ebp]
+L$031ccm64_enc2_loop:
+db      102,15,56,220,209
+db      102,15,56,220,217
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,220,208
+db      102,15,56,220,216
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$031ccm64_enc2_loop
+db      102,15,56,220,209
+db      102,15,56,220,217
+        paddq   xmm7,[16+esp]
+        dec     eax
+db      102,15,56,221,208
+db      102,15,56,221,216
+        lea     esi,[16+esi]
+        xorps   xmm6,xmm2
+        movdqa  xmm2,xmm7
+        movups  [edi],xmm6
+db      102,15,56,0,213
+        lea     edi,[16+edi]
+        jnz     NEAR L$030ccm64_enc_outer
+        mov     esp,DWORD [48+esp]
+        mov     edi,DWORD [40+esp]
+        movups  [edi],xmm3
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        pxor    xmm6,xmm6
+        pxor    xmm7,xmm7
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _aesni_ccm64_decrypt_blocks
+align   16
+_aesni_ccm64_decrypt_blocks:
+L$_aesni_ccm64_decrypt_blocks_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        mov     ebx,DWORD [36+esp]
+        mov     ecx,DWORD [40+esp]
+        mov     ebp,esp
+        sub     esp,60
+        and     esp,-16
+        mov     DWORD [48+esp],ebp
+        movdqu  xmm7,[ebx]
+        movdqu  xmm3,[ecx]
+        mov     ecx,DWORD [240+edx]
+        mov     DWORD [esp],202182159
+        mov     DWORD [4+esp],134810123
+        mov     DWORD [8+esp],67438087
+        mov     DWORD [12+esp],66051
+        mov     ebx,1
+        xor     ebp,ebp
+        mov     DWORD [16+esp],ebx
+        mov     DWORD [20+esp],ebp
+        mov     DWORD [24+esp],ebp
+        mov     DWORD [28+esp],ebp
+        movdqa  xmm5,[esp]
+        movdqa  xmm2,xmm7
+        mov     ebp,edx
+        mov     ebx,ecx
+db      102,15,56,0,253
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$032enc1_loop_5:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$032enc1_loop_5
+db      102,15,56,221,209
+        shl     ebx,4
+        mov     ecx,16
+        movups  xmm6,[esi]
+        paddq   xmm7,[16+esp]
+        lea     esi,[16+esi]
+        sub     ecx,ebx
+        lea     edx,[32+ebx*1+ebp]
+        mov     ebx,ecx
+        jmp     NEAR L$033ccm64_dec_outer
+align   16
+L$033ccm64_dec_outer:
+        xorps   xmm6,xmm2
+        movdqa  xmm2,xmm7
+        movups  [edi],xmm6
+        lea     edi,[16+edi]
+db      102,15,56,0,213
+        sub     eax,1
+        jz      NEAR L$034ccm64_dec_break
+        movups  xmm0,[ebp]
+        mov     ecx,ebx
+        movups  xmm1,[16+ebp]
+        xorps   xmm6,xmm0
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm6
+        movups  xmm0,[32+ebp]
+L$035ccm64_dec2_loop:
+db      102,15,56,220,209
+db      102,15,56,220,217
+        movups  xmm1,[ecx*1+edx]
+        add     ecx,32
+db      102,15,56,220,208
+db      102,15,56,220,216
+        movups  xmm0,[ecx*1+edx-16]
+        jnz     NEAR L$035ccm64_dec2_loop
+        movups  xmm6,[esi]
+        paddq   xmm7,[16+esp]
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,221,208
+db      102,15,56,221,216
+        lea     esi,[16+esi]
+        jmp     NEAR L$033ccm64_dec_outer
+align   16
+L$034ccm64_dec_break:
+        mov     ecx,DWORD [240+ebp]
+        mov     edx,ebp
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        xorps   xmm6,xmm0
+        lea     edx,[32+edx]
+        xorps   xmm3,xmm6
+L$036enc1_loop_6:
+db      102,15,56,220,217
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$036enc1_loop_6
+db      102,15,56,221,217
+        mov     esp,DWORD [48+esp]
+        mov     edi,DWORD [40+esp]
+        movups  [edi],xmm3
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        pxor    xmm6,xmm6
+        pxor    xmm7,xmm7
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _aesni_ctr32_encrypt_blocks
+align   16
+_aesni_ctr32_encrypt_blocks:
+L$_aesni_ctr32_encrypt_blocks_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        mov     ebx,DWORD [36+esp]
+        mov     ebp,esp
+        sub     esp,88
+        and     esp,-16
+        mov     DWORD [80+esp],ebp
+        cmp     eax,1
+        je      NEAR L$037ctr32_one_shortcut
+        movdqu  xmm7,[ebx]
+        mov     DWORD [esp],202182159
+        mov     DWORD [4+esp],134810123
+        mov     DWORD [8+esp],67438087
+        mov     DWORD [12+esp],66051
+        mov     ecx,6
+        xor     ebp,ebp
+        mov     DWORD [16+esp],ecx
+        mov     DWORD [20+esp],ecx
+        mov     DWORD [24+esp],ecx
+        mov     DWORD [28+esp],ebp
+db      102,15,58,22,251,3
+db      102,15,58,34,253,3
+        mov     ecx,DWORD [240+edx]
+        bswap   ebx
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        movdqa  xmm2,[esp]
+db      102,15,58,34,195,0
+        lea     ebp,[3+ebx]
+db      102,15,58,34,205,0
+        inc     ebx
+db      102,15,58,34,195,1
+        inc     ebp
+db      102,15,58,34,205,1
+        inc     ebx
+db      102,15,58,34,195,2
+        inc     ebp
+db      102,15,58,34,205,2
+        movdqa  [48+esp],xmm0
+db      102,15,56,0,194
+        movdqu  xmm6,[edx]
+        movdqa  [64+esp],xmm1
+db      102,15,56,0,202
+        pshufd  xmm2,xmm0,192
+        pshufd  xmm3,xmm0,128
+        cmp     eax,6
+        jb      NEAR L$038ctr32_tail
+        pxor    xmm7,xmm6
+        shl     ecx,4
+        mov     ebx,16
+        movdqa  [32+esp],xmm7
+        mov     ebp,edx
+        sub     ebx,ecx
+        lea     edx,[32+ecx*1+edx]
+        sub     eax,6
+        jmp     NEAR L$039ctr32_loop6
+align   16
+L$039ctr32_loop6:
+        pshufd  xmm4,xmm0,64
+        movdqa  xmm0,[32+esp]
+        pshufd  xmm5,xmm1,192
+        pxor    xmm2,xmm0
+        pshufd  xmm6,xmm1,128
+        pxor    xmm3,xmm0
+        pshufd  xmm7,xmm1,64
+        movups  xmm1,[16+ebp]
+        pxor    xmm4,xmm0
+        pxor    xmm5,xmm0
+db      102,15,56,220,209
+        pxor    xmm6,xmm0
+        pxor    xmm7,xmm0
+db      102,15,56,220,217
+        movups  xmm0,[32+ebp]
+        mov     ecx,ebx
+db      102,15,56,220,225
+db      102,15,56,220,233
+db      102,15,56,220,241
+db      102,15,56,220,249
+        call    L$_aesni_encrypt6_enter
+        movups  xmm1,[esi]
+        movups  xmm0,[16+esi]
+        xorps   xmm2,xmm1
+        movups  xmm1,[32+esi]
+        xorps   xmm3,xmm0
+        movups  [edi],xmm2
+        movdqa  xmm0,[16+esp]
+        xorps   xmm4,xmm1
+        movdqa  xmm1,[64+esp]
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        paddd   xmm1,xmm0
+        paddd   xmm0,[48+esp]
+        movdqa  xmm2,[esp]
+        movups  xmm3,[48+esi]
+        movups  xmm4,[64+esi]
+        xorps   xmm5,xmm3
+        movups  xmm3,[80+esi]
+        lea     esi,[96+esi]
+        movdqa  [48+esp],xmm0
+db      102,15,56,0,194
+        xorps   xmm6,xmm4
+        movups  [48+edi],xmm5
+        xorps   xmm7,xmm3
+        movdqa  [64+esp],xmm1
+db      102,15,56,0,202
+        movups  [64+edi],xmm6
+        pshufd  xmm2,xmm0,192
+        movups  [80+edi],xmm7
+        lea     edi,[96+edi]
+        pshufd  xmm3,xmm0,128
+        sub     eax,6
+        jnc     NEAR L$039ctr32_loop6
+        add     eax,6
+        jz      NEAR L$040ctr32_ret
+        movdqu  xmm7,[ebp]
+        mov     edx,ebp
+        pxor    xmm7,[32+esp]
+        mov     ecx,DWORD [240+ebp]
+L$038ctr32_tail:
+        por     xmm2,xmm7
+        cmp     eax,2
+        jb      NEAR L$041ctr32_one
+        pshufd  xmm4,xmm0,64
+        por     xmm3,xmm7
+        je      NEAR L$042ctr32_two
+        pshufd  xmm5,xmm1,192
+        por     xmm4,xmm7
+        cmp     eax,4
+        jb      NEAR L$043ctr32_three
+        pshufd  xmm6,xmm1,128
+        por     xmm5,xmm7
+        je      NEAR L$044ctr32_four
+        por     xmm6,xmm7
+        call    __aesni_encrypt6
+        movups  xmm1,[esi]
+        movups  xmm0,[16+esi]
+        xorps   xmm2,xmm1
+        movups  xmm1,[32+esi]
+        xorps   xmm3,xmm0
+        movups  xmm0,[48+esi]
+        xorps   xmm4,xmm1
+        movups  xmm1,[64+esi]
+        xorps   xmm5,xmm0
+        movups  [edi],xmm2
+        xorps   xmm6,xmm1
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        movups  [64+edi],xmm6
+        jmp     NEAR L$040ctr32_ret
+align   16
+L$037ctr32_one_shortcut:
+        movups  xmm2,[ebx]
+        mov     ecx,DWORD [240+edx]
+L$041ctr32_one:
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$045enc1_loop_7:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$045enc1_loop_7
+db      102,15,56,221,209
+        movups  xmm6,[esi]
+        xorps   xmm6,xmm2
+        movups  [edi],xmm6
+        jmp     NEAR L$040ctr32_ret
+align   16
+L$042ctr32_two:
+        call    __aesni_encrypt2
+        movups  xmm5,[esi]
+        movups  xmm6,[16+esi]
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        jmp     NEAR L$040ctr32_ret
+align   16
+L$043ctr32_three:
+        call    __aesni_encrypt3
+        movups  xmm5,[esi]
+        movups  xmm6,[16+esi]
+        xorps   xmm2,xmm5
+        movups  xmm7,[32+esi]
+        xorps   xmm3,xmm6
+        movups  [edi],xmm2
+        xorps   xmm4,xmm7
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        jmp     NEAR L$040ctr32_ret
+align   16
+L$044ctr32_four:
+        call    __aesni_encrypt4
+        movups  xmm6,[esi]
+        movups  xmm7,[16+esi]
+        movups  xmm1,[32+esi]
+        xorps   xmm2,xmm6
+        movups  xmm0,[48+esi]
+        xorps   xmm3,xmm7
+        movups  [edi],xmm2
+        xorps   xmm4,xmm1
+        movups  [16+edi],xmm3
+        xorps   xmm5,xmm0
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+L$040ctr32_ret:
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        movdqa  [32+esp],xmm0
+        pxor    xmm5,xmm5
+        movdqa  [48+esp],xmm0
+        pxor    xmm6,xmm6
+        movdqa  [64+esp],xmm0
+        pxor    xmm7,xmm7
+        mov     esp,DWORD [80+esp]
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _aesni_xts_encrypt
+align   16
+_aesni_xts_encrypt:
+L$_aesni_xts_encrypt_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     edx,DWORD [36+esp]
+        mov     esi,DWORD [40+esp]
+        mov     ecx,DWORD [240+edx]
+        movups  xmm2,[esi]
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$046enc1_loop_8:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$046enc1_loop_8
+db      102,15,56,221,209
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        mov     ebp,esp
+        sub     esp,120
+        mov     ecx,DWORD [240+edx]
+        and     esp,-16
+        mov     DWORD [96+esp],135
+        mov     DWORD [100+esp],0
+        mov     DWORD [104+esp],1
+        mov     DWORD [108+esp],0
+        mov     DWORD [112+esp],eax
+        mov     DWORD [116+esp],ebp
+        movdqa  xmm1,xmm2
+        pxor    xmm0,xmm0
+        movdqa  xmm3,[96+esp]
+        pcmpgtd xmm0,xmm1
+        and     eax,-16
+        mov     ebp,edx
+        mov     ebx,ecx
+        sub     eax,96
+        jc      NEAR L$047xts_enc_short
+        shl     ecx,4
+        mov     ebx,16
+        sub     ebx,ecx
+        lea     edx,[32+ecx*1+edx]
+        jmp     NEAR L$048xts_enc_loop6
+align   16
+L$048xts_enc_loop6:
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  [esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  [16+esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  [32+esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  [48+esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        pshufd  xmm7,xmm0,19
+        movdqa  [64+esp],xmm1
+        paddq   xmm1,xmm1
+        movups  xmm0,[ebp]
+        pand    xmm7,xmm3
+        movups  xmm2,[esi]
+        pxor    xmm7,xmm1
+        mov     ecx,ebx
+        movdqu  xmm3,[16+esi]
+        xorps   xmm2,xmm0
+        movdqu  xmm4,[32+esi]
+        pxor    xmm3,xmm0
+        movdqu  xmm5,[48+esi]
+        pxor    xmm4,xmm0
+        movdqu  xmm6,[64+esi]
+        pxor    xmm5,xmm0
+        movdqu  xmm1,[80+esi]
+        pxor    xmm6,xmm0
+        lea     esi,[96+esi]
+        pxor    xmm2,[esp]
+        movdqa  [80+esp],xmm7
+        pxor    xmm7,xmm1
+        movups  xmm1,[16+ebp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+db      102,15,56,220,209
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,[64+esp]
+db      102,15,56,220,217
+        pxor    xmm7,xmm0
+        movups  xmm0,[32+ebp]
+db      102,15,56,220,225
+db      102,15,56,220,233
+db      102,15,56,220,241
+db      102,15,56,220,249
+        call    L$_aesni_encrypt6_enter
+        movdqa  xmm1,[80+esp]
+        pxor    xmm0,xmm0
+        xorps   xmm2,[esp]
+        pcmpgtd xmm0,xmm1
+        xorps   xmm3,[16+esp]
+        movups  [edi],xmm2
+        xorps   xmm4,[32+esp]
+        movups  [16+edi],xmm3
+        xorps   xmm5,[48+esp]
+        movups  [32+edi],xmm4
+        xorps   xmm6,[64+esp]
+        movups  [48+edi],xmm5
+        xorps   xmm7,xmm1
+        movups  [64+edi],xmm6
+        pshufd  xmm2,xmm0,19
+        movups  [80+edi],xmm7
+        lea     edi,[96+edi]
+        movdqa  xmm3,[96+esp]
+        pxor    xmm0,xmm0
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        sub     eax,96
+        jnc     NEAR L$048xts_enc_loop6
+        mov     ecx,DWORD [240+ebp]
+        mov     edx,ebp
+        mov     ebx,ecx
+L$047xts_enc_short:
+        add     eax,96
+        jz      NEAR L$049xts_enc_done6x
+        movdqa  xmm5,xmm1
+        cmp     eax,32
+        jb      NEAR L$050xts_enc_one
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        je      NEAR L$051xts_enc_two
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  xmm6,xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        cmp     eax,64
+        jb      NEAR L$052xts_enc_three
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  xmm7,xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        movdqa  [esp],xmm5
+        movdqa  [16+esp],xmm6
+        je      NEAR L$053xts_enc_four
+        movdqa  [32+esp],xmm7
+        pshufd  xmm7,xmm0,19
+        movdqa  [48+esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm7,xmm3
+        pxor    xmm7,xmm1
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        pxor    xmm2,[esp]
+        movdqu  xmm5,[48+esi]
+        pxor    xmm3,[16+esp]
+        movdqu  xmm6,[64+esi]
+        pxor    xmm4,[32+esp]
+        lea     esi,[80+esi]
+        pxor    xmm5,[48+esp]
+        movdqa  [64+esp],xmm7
+        pxor    xmm6,xmm7
+        call    __aesni_encrypt6
+        movaps  xmm1,[64+esp]
+        xorps   xmm2,[esp]
+        xorps   xmm3,[16+esp]
+        xorps   xmm4,[32+esp]
+        movups  [edi],xmm2
+        xorps   xmm5,[48+esp]
+        movups  [16+edi],xmm3
+        xorps   xmm6,xmm1
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        movups  [64+edi],xmm6
+        lea     edi,[80+edi]
+        jmp     NEAR L$054xts_enc_done
+align   16
+L$050xts_enc_one:
+        movups  xmm2,[esi]
+        lea     esi,[16+esi]
+        xorps   xmm2,xmm5
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$055enc1_loop_9:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$055enc1_loop_9
+db      102,15,56,221,209
+        xorps   xmm2,xmm5
+        movups  [edi],xmm2
+        lea     edi,[16+edi]
+        movdqa  xmm1,xmm5
+        jmp     NEAR L$054xts_enc_done
+align   16
+L$051xts_enc_two:
+        movaps  xmm6,xmm1
+        movups  xmm2,[esi]
+        movups  xmm3,[16+esi]
+        lea     esi,[32+esi]
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        call    __aesni_encrypt2
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        lea     edi,[32+edi]
+        movdqa  xmm1,xmm6
+        jmp     NEAR L$054xts_enc_done
+align   16
+L$052xts_enc_three:
+        movaps  xmm7,xmm1
+        movups  xmm2,[esi]
+        movups  xmm3,[16+esi]
+        movups  xmm4,[32+esi]
+        lea     esi,[48+esi]
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        xorps   xmm4,xmm7
+        call    __aesni_encrypt3
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        xorps   xmm4,xmm7
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        lea     edi,[48+edi]
+        movdqa  xmm1,xmm7
+        jmp     NEAR L$054xts_enc_done
+align   16
+L$053xts_enc_four:
+        movaps  xmm6,xmm1
+        movups  xmm2,[esi]
+        movups  xmm3,[16+esi]
+        movups  xmm4,[32+esi]
+        xorps   xmm2,[esp]
+        movups  xmm5,[48+esi]
+        lea     esi,[64+esi]
+        xorps   xmm3,[16+esp]
+        xorps   xmm4,xmm7
+        xorps   xmm5,xmm6
+        call    __aesni_encrypt4
+        xorps   xmm2,[esp]
+        xorps   xmm3,[16+esp]
+        xorps   xmm4,xmm7
+        movups  [edi],xmm2
+        xorps   xmm5,xmm6
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        lea     edi,[64+edi]
+        movdqa  xmm1,xmm6
+        jmp     NEAR L$054xts_enc_done
+align   16
+L$049xts_enc_done6x:
+        mov     eax,DWORD [112+esp]
+        and     eax,15
+        jz      NEAR L$056xts_enc_ret
+        movdqa  xmm5,xmm1
+        mov     DWORD [112+esp],eax
+        jmp     NEAR L$057xts_enc_steal
+align   16
+L$054xts_enc_done:
+        mov     eax,DWORD [112+esp]
+        pxor    xmm0,xmm0
+        and     eax,15
+        jz      NEAR L$056xts_enc_ret
+        pcmpgtd xmm0,xmm1
+        mov     DWORD [112+esp],eax
+        pshufd  xmm5,xmm0,19
+        paddq   xmm1,xmm1
+        pand    xmm5,[96+esp]
+        pxor    xmm5,xmm1
+L$057xts_enc_steal:
+        movzx   ecx,BYTE [esi]
+        movzx   edx,BYTE [edi-16]
+        lea     esi,[1+esi]
+        mov     BYTE [edi-16],cl
+        mov     BYTE [edi],dl
+        lea     edi,[1+edi]
+        sub     eax,1
+        jnz     NEAR L$057xts_enc_steal
+        sub     edi,DWORD [112+esp]
+        mov     edx,ebp
+        mov     ecx,ebx
+        movups  xmm2,[edi-16]
+        xorps   xmm2,xmm5
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$058enc1_loop_10:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$058enc1_loop_10
+db      102,15,56,221,209
+        xorps   xmm2,xmm5
+        movups  [edi-16],xmm2
+L$056xts_enc_ret:
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        movdqa  [esp],xmm0
+        pxor    xmm3,xmm3
+        movdqa  [16+esp],xmm0
+        pxor    xmm4,xmm4
+        movdqa  [32+esp],xmm0
+        pxor    xmm5,xmm5
+        movdqa  [48+esp],xmm0
+        pxor    xmm6,xmm6
+        movdqa  [64+esp],xmm0
+        pxor    xmm7,xmm7
+        movdqa  [80+esp],xmm0
+        mov     esp,DWORD [116+esp]
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _aesni_xts_decrypt
+align   16
+_aesni_xts_decrypt:
+L$_aesni_xts_decrypt_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     edx,DWORD [36+esp]
+        mov     esi,DWORD [40+esp]
+        mov     ecx,DWORD [240+edx]
+        movups  xmm2,[esi]
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$059enc1_loop_11:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$059enc1_loop_11
+db      102,15,56,221,209
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        mov     ebp,esp
+        sub     esp,120
+        and     esp,-16
+        xor     ebx,ebx
+        test    eax,15
+        setnz   bl
+        shl     ebx,4
+        sub     eax,ebx
+        mov     DWORD [96+esp],135
+        mov     DWORD [100+esp],0
+        mov     DWORD [104+esp],1
+        mov     DWORD [108+esp],0
+        mov     DWORD [112+esp],eax
+        mov     DWORD [116+esp],ebp
+        mov     ecx,DWORD [240+edx]
+        mov     ebp,edx
+        mov     ebx,ecx
+        movdqa  xmm1,xmm2
+        pxor    xmm0,xmm0
+        movdqa  xmm3,[96+esp]
+        pcmpgtd xmm0,xmm1
+        and     eax,-16
+        sub     eax,96
+        jc      NEAR L$060xts_dec_short
+        shl     ecx,4
+        mov     ebx,16
+        sub     ebx,ecx
+        lea     edx,[32+ecx*1+edx]
+        jmp     NEAR L$061xts_dec_loop6
+align   16
+L$061xts_dec_loop6:
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  [esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  [16+esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  [32+esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  [48+esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        pshufd  xmm7,xmm0,19
+        movdqa  [64+esp],xmm1
+        paddq   xmm1,xmm1
+        movups  xmm0,[ebp]
+        pand    xmm7,xmm3
+        movups  xmm2,[esi]
+        pxor    xmm7,xmm1
+        mov     ecx,ebx
+        movdqu  xmm3,[16+esi]
+        xorps   xmm2,xmm0
+        movdqu  xmm4,[32+esi]
+        pxor    xmm3,xmm0
+        movdqu  xmm5,[48+esi]
+        pxor    xmm4,xmm0
+        movdqu  xmm6,[64+esi]
+        pxor    xmm5,xmm0
+        movdqu  xmm1,[80+esi]
+        pxor    xmm6,xmm0
+        lea     esi,[96+esi]
+        pxor    xmm2,[esp]
+        movdqa  [80+esp],xmm7
+        pxor    xmm7,xmm1
+        movups  xmm1,[16+ebp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+db      102,15,56,222,209
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,[64+esp]
+db      102,15,56,222,217
+        pxor    xmm7,xmm0
+        movups  xmm0,[32+ebp]
+db      102,15,56,222,225
+db      102,15,56,222,233
+db      102,15,56,222,241
+db      102,15,56,222,249
+        call    L$_aesni_decrypt6_enter
+        movdqa  xmm1,[80+esp]
+        pxor    xmm0,xmm0
+        xorps   xmm2,[esp]
+        pcmpgtd xmm0,xmm1
+        xorps   xmm3,[16+esp]
+        movups  [edi],xmm2
+        xorps   xmm4,[32+esp]
+        movups  [16+edi],xmm3
+        xorps   xmm5,[48+esp]
+        movups  [32+edi],xmm4
+        xorps   xmm6,[64+esp]
+        movups  [48+edi],xmm5
+        xorps   xmm7,xmm1
+        movups  [64+edi],xmm6
+        pshufd  xmm2,xmm0,19
+        movups  [80+edi],xmm7
+        lea     edi,[96+edi]
+        movdqa  xmm3,[96+esp]
+        pxor    xmm0,xmm0
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        sub     eax,96
+        jnc     NEAR L$061xts_dec_loop6
+        mov     ecx,DWORD [240+ebp]
+        mov     edx,ebp
+        mov     ebx,ecx
+L$060xts_dec_short:
+        add     eax,96
+        jz      NEAR L$062xts_dec_done6x
+        movdqa  xmm5,xmm1
+        cmp     eax,32
+        jb      NEAR L$063xts_dec_one
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        je      NEAR L$064xts_dec_two
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  xmm6,xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        cmp     eax,64
+        jb      NEAR L$065xts_dec_three
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  xmm7,xmm1
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+        movdqa  [esp],xmm5
+        movdqa  [16+esp],xmm6
+        je      NEAR L$066xts_dec_four
+        movdqa  [32+esp],xmm7
+        pshufd  xmm7,xmm0,19
+        movdqa  [48+esp],xmm1
+        paddq   xmm1,xmm1
+        pand    xmm7,xmm3
+        pxor    xmm7,xmm1
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        pxor    xmm2,[esp]
+        movdqu  xmm5,[48+esi]
+        pxor    xmm3,[16+esp]
+        movdqu  xmm6,[64+esi]
+        pxor    xmm4,[32+esp]
+        lea     esi,[80+esi]
+        pxor    xmm5,[48+esp]
+        movdqa  [64+esp],xmm7
+        pxor    xmm6,xmm7
+        call    __aesni_decrypt6
+        movaps  xmm1,[64+esp]
+        xorps   xmm2,[esp]
+        xorps   xmm3,[16+esp]
+        xorps   xmm4,[32+esp]
+        movups  [edi],xmm2
+        xorps   xmm5,[48+esp]
+        movups  [16+edi],xmm3
+        xorps   xmm6,xmm1
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        movups  [64+edi],xmm6
+        lea     edi,[80+edi]
+        jmp     NEAR L$067xts_dec_done
+align   16
+L$063xts_dec_one:
+        movups  xmm2,[esi]
+        lea     esi,[16+esi]
+        xorps   xmm2,xmm5
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$068dec1_loop_12:
+db      102,15,56,222,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$068dec1_loop_12
+db      102,15,56,223,209
+        xorps   xmm2,xmm5
+        movups  [edi],xmm2
+        lea     edi,[16+edi]
+        movdqa  xmm1,xmm5
+        jmp     NEAR L$067xts_dec_done
+align   16
+L$064xts_dec_two:
+        movaps  xmm6,xmm1
+        movups  xmm2,[esi]
+        movups  xmm3,[16+esi]
+        lea     esi,[32+esi]
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        call    __aesni_decrypt2
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        lea     edi,[32+edi]
+        movdqa  xmm1,xmm6
+        jmp     NEAR L$067xts_dec_done
+align   16
+L$065xts_dec_three:
+        movaps  xmm7,xmm1
+        movups  xmm2,[esi]
+        movups  xmm3,[16+esi]
+        movups  xmm4,[32+esi]
+        lea     esi,[48+esi]
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        xorps   xmm4,xmm7
+        call    __aesni_decrypt3
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        xorps   xmm4,xmm7
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        lea     edi,[48+edi]
+        movdqa  xmm1,xmm7
+        jmp     NEAR L$067xts_dec_done
+align   16
+L$066xts_dec_four:
+        movaps  xmm6,xmm1
+        movups  xmm2,[esi]
+        movups  xmm3,[16+esi]
+        movups  xmm4,[32+esi]
+        xorps   xmm2,[esp]
+        movups  xmm5,[48+esi]
+        lea     esi,[64+esi]
+        xorps   xmm3,[16+esp]
+        xorps   xmm4,xmm7
+        xorps   xmm5,xmm6
+        call    __aesni_decrypt4
+        xorps   xmm2,[esp]
+        xorps   xmm3,[16+esp]
+        xorps   xmm4,xmm7
+        movups  [edi],xmm2
+        xorps   xmm5,xmm6
+        movups  [16+edi],xmm3
+        movups  [32+edi],xmm4
+        movups  [48+edi],xmm5
+        lea     edi,[64+edi]
+        movdqa  xmm1,xmm6
+        jmp     NEAR L$067xts_dec_done
+align   16
+L$062xts_dec_done6x:
+        mov     eax,DWORD [112+esp]
+        and     eax,15
+        jz      NEAR L$069xts_dec_ret
+        mov     DWORD [112+esp],eax
+        jmp     NEAR L$070xts_dec_only_one_more
+align   16
+L$067xts_dec_done:
+        mov     eax,DWORD [112+esp]
+        pxor    xmm0,xmm0
+        and     eax,15
+        jz      NEAR L$069xts_dec_ret
+        pcmpgtd xmm0,xmm1
+        mov     DWORD [112+esp],eax
+        pshufd  xmm2,xmm0,19
+        pxor    xmm0,xmm0
+        movdqa  xmm3,[96+esp]
+        paddq   xmm1,xmm1
+        pand    xmm2,xmm3
+        pcmpgtd xmm0,xmm1
+        pxor    xmm1,xmm2
+L$070xts_dec_only_one_more:
+        pshufd  xmm5,xmm0,19
+        movdqa  xmm6,xmm1
+        paddq   xmm1,xmm1
+        pand    xmm5,xmm3
+        pxor    xmm5,xmm1
+        mov     edx,ebp
+        mov     ecx,ebx
+        movups  xmm2,[esi]
+        xorps   xmm2,xmm5
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$071dec1_loop_13:
+db      102,15,56,222,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$071dec1_loop_13
+db      102,15,56,223,209
+        xorps   xmm2,xmm5
+        movups  [edi],xmm2
+L$072xts_dec_steal:
+        movzx   ecx,BYTE [16+esi]
+        movzx   edx,BYTE [edi]
+        lea     esi,[1+esi]
+        mov     BYTE [edi],cl
+        mov     BYTE [16+edi],dl
+        lea     edi,[1+edi]
+        sub     eax,1
+        jnz     NEAR L$072xts_dec_steal
+        sub     edi,DWORD [112+esp]
+        mov     edx,ebp
+        mov     ecx,ebx
+        movups  xmm2,[edi]
+        xorps   xmm2,xmm6
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$073dec1_loop_14:
+db      102,15,56,222,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$073dec1_loop_14
+db      102,15,56,223,209
+        xorps   xmm2,xmm6
+        movups  [edi],xmm2
+L$069xts_dec_ret:
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        movdqa  [esp],xmm0
+        pxor    xmm3,xmm3
+        movdqa  [16+esp],xmm0
+        pxor    xmm4,xmm4
+        movdqa  [32+esp],xmm0
+        pxor    xmm5,xmm5
+        movdqa  [48+esp],xmm0
+        pxor    xmm6,xmm6
+        movdqa  [64+esp],xmm0
+        pxor    xmm7,xmm7
+        movdqa  [80+esp],xmm0
+        mov     esp,DWORD [116+esp]
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _aesni_ocb_encrypt
+align   16
+_aesni_ocb_encrypt:
+L$_aesni_ocb_encrypt_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     ecx,DWORD [40+esp]
+        mov     ebx,DWORD [48+esp]
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        movdqu  xmm0,[ecx]
+        mov     ebp,DWORD [36+esp]
+        movdqu  xmm1,[ebx]
+        mov     ebx,DWORD [44+esp]
+        mov     ecx,esp
+        sub     esp,132
+        and     esp,-16
+        sub     edi,esi
+        shl     eax,4
+        lea     eax,[eax*1+esi-96]
+        mov     DWORD [120+esp],edi
+        mov     DWORD [124+esp],eax
+        mov     DWORD [128+esp],ecx
+        mov     ecx,DWORD [240+edx]
+        test    ebp,1
+        jnz     NEAR L$074odd
+        bsf     eax,ebp
+        add     ebp,1
+        shl     eax,4
+        movdqu  xmm7,[eax*1+ebx]
+        mov     eax,edx
+        movdqu  xmm2,[esi]
+        lea     esi,[16+esi]
+        pxor    xmm7,xmm0
+        pxor    xmm1,xmm2
+        pxor    xmm2,xmm7
+        movdqa  xmm6,xmm1
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$075enc1_loop_15:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$075enc1_loop_15
+db      102,15,56,221,209
+        xorps   xmm2,xmm7
+        movdqa  xmm0,xmm7
+        movdqa  xmm1,xmm6
+        movups  [esi*1+edi-16],xmm2
+        mov     ecx,DWORD [240+eax]
+        mov     edx,eax
+        mov     eax,DWORD [124+esp]
+L$074odd:
+        shl     ecx,4
+        mov     edi,16
+        sub     edi,ecx
+        mov     DWORD [112+esp],edx
+        lea     edx,[32+ecx*1+edx]
+        mov     DWORD [116+esp],edi
+        cmp     esi,eax
+        ja      NEAR L$076short
+        jmp     NEAR L$077grandloop
+align   32
+L$077grandloop:
+        lea     ecx,[1+ebp]
+        lea     eax,[3+ebp]
+        lea     edi,[5+ebp]
+        add     ebp,6
+        bsf     ecx,ecx
+        bsf     eax,eax
+        bsf     edi,edi
+        shl     ecx,4
+        shl     eax,4
+        shl     edi,4
+        movdqu  xmm2,[ebx]
+        movdqu  xmm3,[ecx*1+ebx]
+        mov     ecx,DWORD [116+esp]
+        movdqa  xmm4,xmm2
+        movdqu  xmm5,[eax*1+ebx]
+        movdqa  xmm6,xmm2
+        movdqu  xmm7,[edi*1+ebx]
+        pxor    xmm2,xmm0
+        pxor    xmm3,xmm2
+        movdqa  [esp],xmm2
+        pxor    xmm4,xmm3
+        movdqa  [16+esp],xmm3
+        pxor    xmm5,xmm4
+        movdqa  [32+esp],xmm4
+        pxor    xmm6,xmm5
+        movdqa  [48+esp],xmm5
+        pxor    xmm7,xmm6
+        movdqa  [64+esp],xmm6
+        movdqa  [80+esp],xmm7
+        movups  xmm0,[ecx*1+edx-48]
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        movdqu  xmm5,[48+esi]
+        movdqu  xmm6,[64+esi]
+        movdqu  xmm7,[80+esi]
+        lea     esi,[96+esi]
+        pxor    xmm1,xmm2
+        pxor    xmm2,xmm0
+        pxor    xmm1,xmm3
+        pxor    xmm3,xmm0
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        pxor    xmm1,xmm5
+        pxor    xmm5,xmm0
+        pxor    xmm1,xmm6
+        pxor    xmm6,xmm0
+        pxor    xmm1,xmm7
+        pxor    xmm7,xmm0
+        movdqa  [96+esp],xmm1
+        movups  xmm1,[ecx*1+edx-32]
+        pxor    xmm2,[esp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,[64+esp]
+        pxor    xmm7,[80+esp]
+        movups  xmm0,[ecx*1+edx-16]
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,220,225
+db      102,15,56,220,233
+db      102,15,56,220,241
+db      102,15,56,220,249
+        mov     edi,DWORD [120+esp]
+        mov     eax,DWORD [124+esp]
+        call    L$_aesni_encrypt6_enter
+        movdqa  xmm0,[80+esp]
+        pxor    xmm2,[esp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,[64+esp]
+        pxor    xmm7,xmm0
+        movdqa  xmm1,[96+esp]
+        movdqu  [esi*1+edi-96],xmm2
+        movdqu  [esi*1+edi-80],xmm3
+        movdqu  [esi*1+edi-64],xmm4
+        movdqu  [esi*1+edi-48],xmm5
+        movdqu  [esi*1+edi-32],xmm6
+        movdqu  [esi*1+edi-16],xmm7
+        cmp     esi,eax
+        jb      NEAR L$077grandloop
+L$076short:
+        add     eax,96
+        sub     eax,esi
+        jz      NEAR L$078done
+        cmp     eax,32
+        jb      NEAR L$079one
+        je      NEAR L$080two
+        cmp     eax,64
+        jb      NEAR L$081three
+        je      NEAR L$082four
+        lea     ecx,[1+ebp]
+        lea     eax,[3+ebp]
+        bsf     ecx,ecx
+        bsf     eax,eax
+        shl     ecx,4
+        shl     eax,4
+        movdqu  xmm2,[ebx]
+        movdqu  xmm3,[ecx*1+ebx]
+        mov     ecx,DWORD [116+esp]
+        movdqa  xmm4,xmm2
+        movdqu  xmm5,[eax*1+ebx]
+        movdqa  xmm6,xmm2
+        pxor    xmm2,xmm0
+        pxor    xmm3,xmm2
+        movdqa  [esp],xmm2
+        pxor    xmm4,xmm3
+        movdqa  [16+esp],xmm3
+        pxor    xmm5,xmm4
+        movdqa  [32+esp],xmm4
+        pxor    xmm6,xmm5
+        movdqa  [48+esp],xmm5
+        pxor    xmm7,xmm6
+        movdqa  [64+esp],xmm6
+        movups  xmm0,[ecx*1+edx-48]
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        movdqu  xmm5,[48+esi]
+        movdqu  xmm6,[64+esi]
+        pxor    xmm7,xmm7
+        pxor    xmm1,xmm2
+        pxor    xmm2,xmm0
+        pxor    xmm1,xmm3
+        pxor    xmm3,xmm0
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        pxor    xmm1,xmm5
+        pxor    xmm5,xmm0
+        pxor    xmm1,xmm6
+        pxor    xmm6,xmm0
+        movdqa  [96+esp],xmm1
+        movups  xmm1,[ecx*1+edx-32]
+        pxor    xmm2,[esp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,[64+esp]
+        movups  xmm0,[ecx*1+edx-16]
+db      102,15,56,220,209
+db      102,15,56,220,217
+db      102,15,56,220,225
+db      102,15,56,220,233
+db      102,15,56,220,241
+db      102,15,56,220,249
+        mov     edi,DWORD [120+esp]
+        call    L$_aesni_encrypt6_enter
+        movdqa  xmm0,[64+esp]
+        pxor    xmm2,[esp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,xmm0
+        movdqa  xmm1,[96+esp]
+        movdqu  [esi*1+edi],xmm2
+        movdqu  [16+esi*1+edi],xmm3
+        movdqu  [32+esi*1+edi],xmm4
+        movdqu  [48+esi*1+edi],xmm5
+        movdqu  [64+esi*1+edi],xmm6
+        jmp     NEAR L$078done
+align   16
+L$079one:
+        movdqu  xmm7,[ebx]
+        mov     edx,DWORD [112+esp]
+        movdqu  xmm2,[esi]
+        mov     ecx,DWORD [240+edx]
+        pxor    xmm7,xmm0
+        pxor    xmm1,xmm2
+        pxor    xmm2,xmm7
+        movdqa  xmm6,xmm1
+        mov     edi,DWORD [120+esp]
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$083enc1_loop_16:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$083enc1_loop_16
+db      102,15,56,221,209
+        xorps   xmm2,xmm7
+        movdqa  xmm0,xmm7
+        movdqa  xmm1,xmm6
+        movups  [esi*1+edi],xmm2
+        jmp     NEAR L$078done
+align   16
+L$080two:
+        lea     ecx,[1+ebp]
+        mov     edx,DWORD [112+esp]
+        bsf     ecx,ecx
+        shl     ecx,4
+        movdqu  xmm6,[ebx]
+        movdqu  xmm7,[ecx*1+ebx]
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        mov     ecx,DWORD [240+edx]
+        pxor    xmm6,xmm0
+        pxor    xmm7,xmm6
+        pxor    xmm1,xmm2
+        pxor    xmm2,xmm6
+        pxor    xmm1,xmm3
+        pxor    xmm3,xmm7
+        movdqa  xmm5,xmm1
+        mov     edi,DWORD [120+esp]
+        call    __aesni_encrypt2
+        xorps   xmm2,xmm6
+        xorps   xmm3,xmm7
+        movdqa  xmm0,xmm7
+        movdqa  xmm1,xmm5
+        movups  [esi*1+edi],xmm2
+        movups  [16+esi*1+edi],xmm3
+        jmp     NEAR L$078done
+align   16
+L$081three:
+        lea     ecx,[1+ebp]
+        mov     edx,DWORD [112+esp]
+        bsf     ecx,ecx
+        shl     ecx,4
+        movdqu  xmm5,[ebx]
+        movdqu  xmm6,[ecx*1+ebx]
+        movdqa  xmm7,xmm5
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        mov     ecx,DWORD [240+edx]
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm5
+        pxor    xmm7,xmm6
+        pxor    xmm1,xmm2
+        pxor    xmm2,xmm5
+        pxor    xmm1,xmm3
+        pxor    xmm3,xmm6
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm7
+        movdqa  [96+esp],xmm1
+        mov     edi,DWORD [120+esp]
+        call    __aesni_encrypt3
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        xorps   xmm4,xmm7
+        movdqa  xmm0,xmm7
+        movdqa  xmm1,[96+esp]
+        movups  [esi*1+edi],xmm2
+        movups  [16+esi*1+edi],xmm3
+        movups  [32+esi*1+edi],xmm4
+        jmp     NEAR L$078done
+align   16
+L$082four:
+        lea     ecx,[1+ebp]
+        lea     eax,[3+ebp]
+        bsf     ecx,ecx
+        bsf     eax,eax
+        mov     edx,DWORD [112+esp]
+        shl     ecx,4
+        shl     eax,4
+        movdqu  xmm4,[ebx]
+        movdqu  xmm5,[ecx*1+ebx]
+        movdqa  xmm6,xmm4
+        movdqu  xmm7,[eax*1+ebx]
+        pxor    xmm4,xmm0
+        movdqu  xmm2,[esi]
+        pxor    xmm5,xmm4
+        movdqu  xmm3,[16+esi]
+        pxor    xmm6,xmm5
+        movdqa  [esp],xmm4
+        pxor    xmm7,xmm6
+        movdqa  [16+esp],xmm5
+        movdqu  xmm4,[32+esi]
+        movdqu  xmm5,[48+esi]
+        mov     ecx,DWORD [240+edx]
+        pxor    xmm1,xmm2
+        pxor    xmm2,[esp]
+        pxor    xmm1,xmm3
+        pxor    xmm3,[16+esp]
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm6
+        pxor    xmm1,xmm5
+        pxor    xmm5,xmm7
+        movdqa  [96+esp],xmm1
+        mov     edi,DWORD [120+esp]
+        call    __aesni_encrypt4
+        xorps   xmm2,[esp]
+        xorps   xmm3,[16+esp]
+        xorps   xmm4,xmm6
+        movups  [esi*1+edi],xmm2
+        xorps   xmm5,xmm7
+        movups  [16+esi*1+edi],xmm3
+        movdqa  xmm0,xmm7
+        movups  [32+esi*1+edi],xmm4
+        movdqa  xmm1,[96+esp]
+        movups  [48+esi*1+edi],xmm5
+L$078done:
+        mov     edx,DWORD [128+esp]
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        movdqa  [esp],xmm2
+        pxor    xmm4,xmm4
+        movdqa  [16+esp],xmm2
+        pxor    xmm5,xmm5
+        movdqa  [32+esp],xmm2
+        pxor    xmm6,xmm6
+        movdqa  [48+esp],xmm2
+        pxor    xmm7,xmm7
+        movdqa  [64+esp],xmm2
+        movdqa  [80+esp],xmm2
+        movdqa  [96+esp],xmm2
+        lea     esp,[edx]
+        mov     ecx,DWORD [40+esp]
+        mov     ebx,DWORD [48+esp]
+        movdqu  [ecx],xmm0
+        pxor    xmm0,xmm0
+        movdqu  [ebx],xmm1
+        pxor    xmm1,xmm1
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _aesni_ocb_decrypt
+align   16
+_aesni_ocb_decrypt:
+L$_aesni_ocb_decrypt_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     ecx,DWORD [40+esp]
+        mov     ebx,DWORD [48+esp]
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        movdqu  xmm0,[ecx]
+        mov     ebp,DWORD [36+esp]
+        movdqu  xmm1,[ebx]
+        mov     ebx,DWORD [44+esp]
+        mov     ecx,esp
+        sub     esp,132
+        and     esp,-16
+        sub     edi,esi
+        shl     eax,4
+        lea     eax,[eax*1+esi-96]
+        mov     DWORD [120+esp],edi
+        mov     DWORD [124+esp],eax
+        mov     DWORD [128+esp],ecx
+        mov     ecx,DWORD [240+edx]
+        test    ebp,1
+        jnz     NEAR L$084odd
+        bsf     eax,ebp
+        add     ebp,1
+        shl     eax,4
+        movdqu  xmm7,[eax*1+ebx]
+        mov     eax,edx
+        movdqu  xmm2,[esi]
+        lea     esi,[16+esi]
+        pxor    xmm7,xmm0
+        pxor    xmm2,xmm7
+        movdqa  xmm6,xmm1
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$085dec1_loop_17:
+db      102,15,56,222,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$085dec1_loop_17
+db      102,15,56,223,209
+        xorps   xmm2,xmm7
+        movaps  xmm1,xmm6
+        movdqa  xmm0,xmm7
+        xorps   xmm1,xmm2
+        movups  [esi*1+edi-16],xmm2
+        mov     ecx,DWORD [240+eax]
+        mov     edx,eax
+        mov     eax,DWORD [124+esp]
+L$084odd:
+        shl     ecx,4
+        mov     edi,16
+        sub     edi,ecx
+        mov     DWORD [112+esp],edx
+        lea     edx,[32+ecx*1+edx]
+        mov     DWORD [116+esp],edi
+        cmp     esi,eax
+        ja      NEAR L$086short
+        jmp     NEAR L$087grandloop
+align   32
+L$087grandloop:
+        lea     ecx,[1+ebp]
+        lea     eax,[3+ebp]
+        lea     edi,[5+ebp]
+        add     ebp,6
+        bsf     ecx,ecx
+        bsf     eax,eax
+        bsf     edi,edi
+        shl     ecx,4
+        shl     eax,4
+        shl     edi,4
+        movdqu  xmm2,[ebx]
+        movdqu  xmm3,[ecx*1+ebx]
+        mov     ecx,DWORD [116+esp]
+        movdqa  xmm4,xmm2
+        movdqu  xmm5,[eax*1+ebx]
+        movdqa  xmm6,xmm2
+        movdqu  xmm7,[edi*1+ebx]
+        pxor    xmm2,xmm0
+        pxor    xmm3,xmm2
+        movdqa  [esp],xmm2
+        pxor    xmm4,xmm3
+        movdqa  [16+esp],xmm3
+        pxor    xmm5,xmm4
+        movdqa  [32+esp],xmm4
+        pxor    xmm6,xmm5
+        movdqa  [48+esp],xmm5
+        pxor    xmm7,xmm6
+        movdqa  [64+esp],xmm6
+        movdqa  [80+esp],xmm7
+        movups  xmm0,[ecx*1+edx-48]
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        movdqu  xmm5,[48+esi]
+        movdqu  xmm6,[64+esi]
+        movdqu  xmm7,[80+esi]
+        lea     esi,[96+esi]
+        movdqa  [96+esp],xmm1
+        pxor    xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm0
+        pxor    xmm7,xmm0
+        movups  xmm1,[ecx*1+edx-32]
+        pxor    xmm2,[esp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,[64+esp]
+        pxor    xmm7,[80+esp]
+        movups  xmm0,[ecx*1+edx-16]
+db      102,15,56,222,209
+db      102,15,56,222,217
+db      102,15,56,222,225
+db      102,15,56,222,233
+db      102,15,56,222,241
+db      102,15,56,222,249
+        mov     edi,DWORD [120+esp]
+        mov     eax,DWORD [124+esp]
+        call    L$_aesni_decrypt6_enter
+        movdqa  xmm0,[80+esp]
+        pxor    xmm2,[esp]
+        movdqa  xmm1,[96+esp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,[64+esp]
+        pxor    xmm7,xmm0
+        pxor    xmm1,xmm2
+        movdqu  [esi*1+edi-96],xmm2
+        pxor    xmm1,xmm3
+        movdqu  [esi*1+edi-80],xmm3
+        pxor    xmm1,xmm4
+        movdqu  [esi*1+edi-64],xmm4
+        pxor    xmm1,xmm5
+        movdqu  [esi*1+edi-48],xmm5
+        pxor    xmm1,xmm6
+        movdqu  [esi*1+edi-32],xmm6
+        pxor    xmm1,xmm7
+        movdqu  [esi*1+edi-16],xmm7
+        cmp     esi,eax
+        jb      NEAR L$087grandloop
+L$086short:
+        add     eax,96
+        sub     eax,esi
+        jz      NEAR L$088done
+        cmp     eax,32
+        jb      NEAR L$089one
+        je      NEAR L$090two
+        cmp     eax,64
+        jb      NEAR L$091three
+        je      NEAR L$092four
+        lea     ecx,[1+ebp]
+        lea     eax,[3+ebp]
+        bsf     ecx,ecx
+        bsf     eax,eax
+        shl     ecx,4
+        shl     eax,4
+        movdqu  xmm2,[ebx]
+        movdqu  xmm3,[ecx*1+ebx]
+        mov     ecx,DWORD [116+esp]
+        movdqa  xmm4,xmm2
+        movdqu  xmm5,[eax*1+ebx]
+        movdqa  xmm6,xmm2
+        pxor    xmm2,xmm0
+        pxor    xmm3,xmm2
+        movdqa  [esp],xmm2
+        pxor    xmm4,xmm3
+        movdqa  [16+esp],xmm3
+        pxor    xmm5,xmm4
+        movdqa  [32+esp],xmm4
+        pxor    xmm6,xmm5
+        movdqa  [48+esp],xmm5
+        pxor    xmm7,xmm6
+        movdqa  [64+esp],xmm6
+        movups  xmm0,[ecx*1+edx-48]
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        movdqu  xmm5,[48+esi]
+        movdqu  xmm6,[64+esi]
+        pxor    xmm7,xmm7
+        movdqa  [96+esp],xmm1
+        pxor    xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm0
+        movups  xmm1,[ecx*1+edx-32]
+        pxor    xmm2,[esp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,[64+esp]
+        movups  xmm0,[ecx*1+edx-16]
+db      102,15,56,222,209
+db      102,15,56,222,217
+db      102,15,56,222,225
+db      102,15,56,222,233
+db      102,15,56,222,241
+db      102,15,56,222,249
+        mov     edi,DWORD [120+esp]
+        call    L$_aesni_decrypt6_enter
+        movdqa  xmm0,[64+esp]
+        pxor    xmm2,[esp]
+        movdqa  xmm1,[96+esp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,[32+esp]
+        pxor    xmm5,[48+esp]
+        pxor    xmm6,xmm0
+        pxor    xmm1,xmm2
+        movdqu  [esi*1+edi],xmm2
+        pxor    xmm1,xmm3
+        movdqu  [16+esi*1+edi],xmm3
+        pxor    xmm1,xmm4
+        movdqu  [32+esi*1+edi],xmm4
+        pxor    xmm1,xmm5
+        movdqu  [48+esi*1+edi],xmm5
+        pxor    xmm1,xmm6
+        movdqu  [64+esi*1+edi],xmm6
+        jmp     NEAR L$088done
+align   16
+L$089one:
+        movdqu  xmm7,[ebx]
+        mov     edx,DWORD [112+esp]
+        movdqu  xmm2,[esi]
+        mov     ecx,DWORD [240+edx]
+        pxor    xmm7,xmm0
+        pxor    xmm2,xmm7
+        movdqa  xmm6,xmm1
+        mov     edi,DWORD [120+esp]
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$093dec1_loop_18:
+db      102,15,56,222,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$093dec1_loop_18
+db      102,15,56,223,209
+        xorps   xmm2,xmm7
+        movaps  xmm1,xmm6
+        movdqa  xmm0,xmm7
+        xorps   xmm1,xmm2
+        movups  [esi*1+edi],xmm2
+        jmp     NEAR L$088done
+align   16
+L$090two:
+        lea     ecx,[1+ebp]
+        mov     edx,DWORD [112+esp]
+        bsf     ecx,ecx
+        shl     ecx,4
+        movdqu  xmm6,[ebx]
+        movdqu  xmm7,[ecx*1+ebx]
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        mov     ecx,DWORD [240+edx]
+        movdqa  xmm5,xmm1
+        pxor    xmm6,xmm0
+        pxor    xmm7,xmm6
+        pxor    xmm2,xmm6
+        pxor    xmm3,xmm7
+        mov     edi,DWORD [120+esp]
+        call    __aesni_decrypt2
+        xorps   xmm2,xmm6
+        xorps   xmm3,xmm7
+        movdqa  xmm0,xmm7
+        xorps   xmm5,xmm2
+        movups  [esi*1+edi],xmm2
+        xorps   xmm5,xmm3
+        movups  [16+esi*1+edi],xmm3
+        movaps  xmm1,xmm5
+        jmp     NEAR L$088done
+align   16
+L$091three:
+        lea     ecx,[1+ebp]
+        mov     edx,DWORD [112+esp]
+        bsf     ecx,ecx
+        shl     ecx,4
+        movdqu  xmm5,[ebx]
+        movdqu  xmm6,[ecx*1+ebx]
+        movdqa  xmm7,xmm5
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        mov     ecx,DWORD [240+edx]
+        movdqa  [96+esp],xmm1
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm5
+        pxor    xmm7,xmm6
+        pxor    xmm2,xmm5
+        pxor    xmm3,xmm6
+        pxor    xmm4,xmm7
+        mov     edi,DWORD [120+esp]
+        call    __aesni_decrypt3
+        movdqa  xmm1,[96+esp]
+        xorps   xmm2,xmm5
+        xorps   xmm3,xmm6
+        xorps   xmm4,xmm7
+        movups  [esi*1+edi],xmm2
+        pxor    xmm1,xmm2
+        movdqa  xmm0,xmm7
+        movups  [16+esi*1+edi],xmm3
+        pxor    xmm1,xmm3
+        movups  [32+esi*1+edi],xmm4
+        pxor    xmm1,xmm4
+        jmp     NEAR L$088done
+align   16
+L$092four:
+        lea     ecx,[1+ebp]
+        lea     eax,[3+ebp]
+        bsf     ecx,ecx
+        bsf     eax,eax
+        mov     edx,DWORD [112+esp]
+        shl     ecx,4
+        shl     eax,4
+        movdqu  xmm4,[ebx]
+        movdqu  xmm5,[ecx*1+ebx]
+        movdqa  xmm6,xmm4
+        movdqu  xmm7,[eax*1+ebx]
+        pxor    xmm4,xmm0
+        movdqu  xmm2,[esi]
+        pxor    xmm5,xmm4
+        movdqu  xmm3,[16+esi]
+        pxor    xmm6,xmm5
+        movdqa  [esp],xmm4
+        pxor    xmm7,xmm6
+        movdqa  [16+esp],xmm5
+        movdqu  xmm4,[32+esi]
+        movdqu  xmm5,[48+esi]
+        mov     ecx,DWORD [240+edx]
+        movdqa  [96+esp],xmm1
+        pxor    xmm2,[esp]
+        pxor    xmm3,[16+esp]
+        pxor    xmm4,xmm6
+        pxor    xmm5,xmm7
+        mov     edi,DWORD [120+esp]
+        call    __aesni_decrypt4
+        movdqa  xmm1,[96+esp]
+        xorps   xmm2,[esp]
+        xorps   xmm3,[16+esp]
+        xorps   xmm4,xmm6
+        movups  [esi*1+edi],xmm2
+        pxor    xmm1,xmm2
+        xorps   xmm5,xmm7
+        movups  [16+esi*1+edi],xmm3
+        pxor    xmm1,xmm3
+        movdqa  xmm0,xmm7
+        movups  [32+esi*1+edi],xmm4
+        pxor    xmm1,xmm4
+        movups  [48+esi*1+edi],xmm5
+        pxor    xmm1,xmm5
+L$088done:
+        mov     edx,DWORD [128+esp]
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        movdqa  [esp],xmm2
+        pxor    xmm4,xmm4
+        movdqa  [16+esp],xmm2
+        pxor    xmm5,xmm5
+        movdqa  [32+esp],xmm2
+        pxor    xmm6,xmm6
+        movdqa  [48+esp],xmm2
+        pxor    xmm7,xmm7
+        movdqa  [64+esp],xmm2
+        movdqa  [80+esp],xmm2
+        movdqa  [96+esp],xmm2
+        lea     esp,[edx]
+        mov     ecx,DWORD [40+esp]
+        mov     ebx,DWORD [48+esp]
+        movdqu  [ecx],xmm0
+        pxor    xmm0,xmm0
+        movdqu  [ebx],xmm1
+        pxor    xmm1,xmm1
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _aesni_cbc_encrypt
+align   16
+_aesni_cbc_encrypt:
+L$_aesni_cbc_encrypt_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        mov     ebx,esp
+        mov     edi,DWORD [24+esp]
+        sub     ebx,24
+        mov     eax,DWORD [28+esp]
+        and     ebx,-16
+        mov     edx,DWORD [32+esp]
+        mov     ebp,DWORD [36+esp]
+        test    eax,eax
+        jz      NEAR L$094cbc_abort
+        cmp     DWORD [40+esp],0
+        xchg    ebx,esp
+        movups  xmm7,[ebp]
+        mov     ecx,DWORD [240+edx]
+        mov     ebp,edx
+        mov     DWORD [16+esp],ebx
+        mov     ebx,ecx
+        je      NEAR L$095cbc_decrypt
+        movaps  xmm2,xmm7
+        cmp     eax,16
+        jb      NEAR L$096cbc_enc_tail
+        sub     eax,16
+        jmp     NEAR L$097cbc_enc_loop
+align   16
+L$097cbc_enc_loop:
+        movups  xmm7,[esi]
+        lea     esi,[16+esi]
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        xorps   xmm7,xmm0
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm7
+L$098enc1_loop_19:
+db      102,15,56,220,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$098enc1_loop_19
+db      102,15,56,221,209
+        mov     ecx,ebx
+        mov     edx,ebp
+        movups  [edi],xmm2
+        lea     edi,[16+edi]
+        sub     eax,16
+        jnc     NEAR L$097cbc_enc_loop
+        add     eax,16
+        jnz     NEAR L$096cbc_enc_tail
+        movaps  xmm7,xmm2
+        pxor    xmm2,xmm2
+        jmp     NEAR L$099cbc_ret
+L$096cbc_enc_tail:
+        mov     ecx,eax
+dd      2767451785
+        mov     ecx,16
+        sub     ecx,eax
+        xor     eax,eax
+dd      2868115081
+        lea     edi,[edi-16]
+        mov     ecx,ebx
+        mov     esi,edi
+        mov     edx,ebp
+        jmp     NEAR L$097cbc_enc_loop
+align   16
+L$095cbc_decrypt:
+        cmp     eax,80
+        jbe     NEAR L$100cbc_dec_tail
+        movaps  [esp],xmm7
+        sub     eax,80
+        jmp     NEAR L$101cbc_dec_loop6_enter
+align   16
+L$102cbc_dec_loop6:
+        movaps  [esp],xmm0
+        movups  [edi],xmm7
+        lea     edi,[16+edi]
+L$101cbc_dec_loop6_enter:
+        movdqu  xmm2,[esi]
+        movdqu  xmm3,[16+esi]
+        movdqu  xmm4,[32+esi]
+        movdqu  xmm5,[48+esi]
+        movdqu  xmm6,[64+esi]
+        movdqu  xmm7,[80+esi]
+        call    __aesni_decrypt6
+        movups  xmm1,[esi]
+        movups  xmm0,[16+esi]
+        xorps   xmm2,[esp]
+        xorps   xmm3,xmm1
+        movups  xmm1,[32+esi]
+        xorps   xmm4,xmm0
+        movups  xmm0,[48+esi]
+        xorps   xmm5,xmm1
+        movups  xmm1,[64+esi]
+        xorps   xmm6,xmm0
+        movups  xmm0,[80+esi]
+        xorps   xmm7,xmm1
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        lea     esi,[96+esi]
+        movups  [32+edi],xmm4
+        mov     ecx,ebx
+        movups  [48+edi],xmm5
+        mov     edx,ebp
+        movups  [64+edi],xmm6
+        lea     edi,[80+edi]
+        sub     eax,96
+        ja      NEAR L$102cbc_dec_loop6
+        movaps  xmm2,xmm7
+        movaps  xmm7,xmm0
+        add     eax,80
+        jle     NEAR L$103cbc_dec_clear_tail_collected
+        movups  [edi],xmm2
+        lea     edi,[16+edi]
+L$100cbc_dec_tail:
+        movups  xmm2,[esi]
+        movaps  xmm6,xmm2
+        cmp     eax,16
+        jbe     NEAR L$104cbc_dec_one
+        movups  xmm3,[16+esi]
+        movaps  xmm5,xmm3
+        cmp     eax,32
+        jbe     NEAR L$105cbc_dec_two
+        movups  xmm4,[32+esi]
+        cmp     eax,48
+        jbe     NEAR L$106cbc_dec_three
+        movups  xmm5,[48+esi]
+        cmp     eax,64
+        jbe     NEAR L$107cbc_dec_four
+        movups  xmm6,[64+esi]
+        movaps  [esp],xmm7
+        movups  xmm2,[esi]
+        xorps   xmm7,xmm7
+        call    __aesni_decrypt6
+        movups  xmm1,[esi]
+        movups  xmm0,[16+esi]
+        xorps   xmm2,[esp]
+        xorps   xmm3,xmm1
+        movups  xmm1,[32+esi]
+        xorps   xmm4,xmm0
+        movups  xmm0,[48+esi]
+        xorps   xmm5,xmm1
+        movups  xmm7,[64+esi]
+        xorps   xmm6,xmm0
+        movups  [edi],xmm2
+        movups  [16+edi],xmm3
+        pxor    xmm3,xmm3
+        movups  [32+edi],xmm4
+        pxor    xmm4,xmm4
+        movups  [48+edi],xmm5
+        pxor    xmm5,xmm5
+        lea     edi,[64+edi]
+        movaps  xmm2,xmm6
+        pxor    xmm6,xmm6
+        sub     eax,80
+        jmp     NEAR L$108cbc_dec_tail_collected
+align   16
+L$104cbc_dec_one:
+        movups  xmm0,[edx]
+        movups  xmm1,[16+edx]
+        lea     edx,[32+edx]
+        xorps   xmm2,xmm0
+L$109dec1_loop_20:
+db      102,15,56,222,209
+        dec     ecx
+        movups  xmm1,[edx]
+        lea     edx,[16+edx]
+        jnz     NEAR L$109dec1_loop_20
+db      102,15,56,223,209
+        xorps   xmm2,xmm7
+        movaps  xmm7,xmm6
+        sub     eax,16
+        jmp     NEAR L$108cbc_dec_tail_collected
+align   16
+L$105cbc_dec_two:
+        call    __aesni_decrypt2
+        xorps   xmm2,xmm7
+        xorps   xmm3,xmm6
+        movups  [edi],xmm2
+        movaps  xmm2,xmm3
+        pxor    xmm3,xmm3
+        lea     edi,[16+edi]
+        movaps  xmm7,xmm5
+        sub     eax,32
+        jmp     NEAR L$108cbc_dec_tail_collected
+align   16
+L$106cbc_dec_three:
+        call    __aesni_decrypt3
+        xorps   xmm2,xmm7
+        xorps   xmm3,xmm6
+        xorps   xmm4,xmm5
+        movups  [edi],xmm2
+        movaps  xmm2,xmm4
+        pxor    xmm4,xmm4
+        movups  [16+edi],xmm3
+        pxor    xmm3,xmm3
+        lea     edi,[32+edi]
+        movups  xmm7,[32+esi]
+        sub     eax,48
+        jmp     NEAR L$108cbc_dec_tail_collected
+align   16
+L$107cbc_dec_four:
+        call    __aesni_decrypt4
+        movups  xmm1,[16+esi]
+        movups  xmm0,[32+esi]
+        xorps   xmm2,xmm7
+        movups  xmm7,[48+esi]
+        xorps   xmm3,xmm6
+        movups  [edi],xmm2
+        xorps   xmm4,xmm1
+        movups  [16+edi],xmm3
+        pxor    xmm3,xmm3
+        xorps   xmm5,xmm0
+        movups  [32+edi],xmm4
+        pxor    xmm4,xmm4
+        lea     edi,[48+edi]
+        movaps  xmm2,xmm5
+        pxor    xmm5,xmm5
+        sub     eax,64
+        jmp     NEAR L$108cbc_dec_tail_collected
+align   16
+L$103cbc_dec_clear_tail_collected:
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        pxor    xmm6,xmm6
+L$108cbc_dec_tail_collected:
+        and     eax,15
+        jnz     NEAR L$110cbc_dec_tail_partial
+        movups  [edi],xmm2
+        pxor    xmm0,xmm0
+        jmp     NEAR L$099cbc_ret
+align   16
+L$110cbc_dec_tail_partial:
+        movaps  [esp],xmm2
+        pxor    xmm0,xmm0
+        mov     ecx,16
+        mov     esi,esp
+        sub     ecx,eax
+dd      2767451785
+        movdqa  [esp],xmm2
+L$099cbc_ret:
+        mov     esp,DWORD [16+esp]
+        mov     ebp,DWORD [36+esp]
+        pxor    xmm2,xmm2
+        pxor    xmm1,xmm1
+        movups  [ebp],xmm7
+        pxor    xmm7,xmm7
+L$094cbc_abort:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   16
+__aesni_set_encrypt_key:
+        push    ebp
+        push    ebx
+        test    eax,eax
+        jz      NEAR L$111bad_pointer
+        test    edx,edx
+        jz      NEAR L$111bad_pointer
+        call    L$112pic
+L$112pic:
+        pop     ebx
+        lea     ebx,[(L$key_const-L$112pic)+ebx]
+        lea     ebp,[_OPENSSL_ia32cap_P]
+        movups  xmm0,[eax]
+        xorps   xmm4,xmm4
+        mov     ebp,DWORD [4+ebp]
+        lea     edx,[16+edx]
+        and     ebp,268437504
+        cmp     ecx,256
+        je      NEAR L$11314rounds
+        cmp     ecx,192
+        je      NEAR L$11412rounds
+        cmp     ecx,128
+        jne     NEAR L$115bad_keybits
+align   16
+L$11610rounds:
+        cmp     ebp,268435456
+        je      NEAR L$11710rounds_alt
+        mov     ecx,9
+        movups  [edx-16],xmm0
+db      102,15,58,223,200,1
+        call    L$118key_128_cold
+db      102,15,58,223,200,2
+        call    L$119key_128
+db      102,15,58,223,200,4
+        call    L$119key_128
+db      102,15,58,223,200,8
+        call    L$119key_128
+db      102,15,58,223,200,16
+        call    L$119key_128
+db      102,15,58,223,200,32
+        call    L$119key_128
+db      102,15,58,223,200,64
+        call    L$119key_128
+db      102,15,58,223,200,128
+        call    L$119key_128
+db      102,15,58,223,200,27
+        call    L$119key_128
+db      102,15,58,223,200,54
+        call    L$119key_128
+        movups  [edx],xmm0
+        mov     DWORD [80+edx],ecx
+        jmp     NEAR L$120good_key
+align   16
+L$119key_128:
+        movups  [edx],xmm0
+        lea     edx,[16+edx]
+L$118key_128_cold:
+        shufps  xmm4,xmm0,16
+        xorps   xmm0,xmm4
+        shufps  xmm4,xmm0,140
+        xorps   xmm0,xmm4
+        shufps  xmm1,xmm1,255
+        xorps   xmm0,xmm1
+        ret
+align   16
+L$11710rounds_alt:
+        movdqa  xmm5,[ebx]
+        mov     ecx,8
+        movdqa  xmm4,[32+ebx]
+        movdqa  xmm2,xmm0
+        movdqu  [edx-16],xmm0
+L$121loop_key128:
+db      102,15,56,0,197
+db      102,15,56,221,196
+        pslld   xmm4,1
+        lea     edx,[16+edx]
+        movdqa  xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm2,xmm3
+        pxor    xmm0,xmm2
+        movdqu  [edx-16],xmm0
+        movdqa  xmm2,xmm0
+        dec     ecx
+        jnz     NEAR L$121loop_key128
+        movdqa  xmm4,[48+ebx]
+db      102,15,56,0,197
+db      102,15,56,221,196
+        pslld   xmm4,1
+        movdqa  xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm2,xmm3
+        pxor    xmm0,xmm2
+        movdqu  [edx],xmm0
+        movdqa  xmm2,xmm0
+db      102,15,56,0,197
+db      102,15,56,221,196
+        movdqa  xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm2,xmm3
+        pxor    xmm0,xmm2
+        movdqu  [16+edx],xmm0
+        mov     ecx,9
+        mov     DWORD [96+edx],ecx
+        jmp     NEAR L$120good_key
+align   16
+L$11412rounds:
+        movq    xmm2,[16+eax]
+        cmp     ebp,268435456
+        je      NEAR L$12212rounds_alt
+        mov     ecx,11
+        movups  [edx-16],xmm0
+db      102,15,58,223,202,1
+        call    L$123key_192a_cold
+db      102,15,58,223,202,2
+        call    L$124key_192b
+db      102,15,58,223,202,4
+        call    L$125key_192a
+db      102,15,58,223,202,8
+        call    L$124key_192b
+db      102,15,58,223,202,16
+        call    L$125key_192a
+db      102,15,58,223,202,32
+        call    L$124key_192b
+db      102,15,58,223,202,64
+        call    L$125key_192a
+db      102,15,58,223,202,128
+        call    L$124key_192b
+        movups  [edx],xmm0
+        mov     DWORD [48+edx],ecx
+        jmp     NEAR L$120good_key
+align   16
+L$125key_192a:
+        movups  [edx],xmm0
+        lea     edx,[16+edx]
+align   16
+L$123key_192a_cold:
+        movaps  xmm5,xmm2
+L$126key_192b_warm:
+        shufps  xmm4,xmm0,16
+        movdqa  xmm3,xmm2
+        xorps   xmm0,xmm4
+        shufps  xmm4,xmm0,140
+        pslldq  xmm3,4
+        xorps   xmm0,xmm4
+        pshufd  xmm1,xmm1,85
+        pxor    xmm2,xmm3
+        pxor    xmm0,xmm1
+        pshufd  xmm3,xmm0,255
+        pxor    xmm2,xmm3
+        ret
+align   16
+L$124key_192b:
+        movaps  xmm3,xmm0
+        shufps  xmm5,xmm0,68
+        movups  [edx],xmm5
+        shufps  xmm3,xmm2,78
+        movups  [16+edx],xmm3
+        lea     edx,[32+edx]
+        jmp     NEAR L$126key_192b_warm
+align   16
+L$12212rounds_alt:
+        movdqa  xmm5,[16+ebx]
+        movdqa  xmm4,[32+ebx]
+        mov     ecx,8
+        movdqu  [edx-16],xmm0
+L$127loop_key192:
+        movq    [edx],xmm2
+        movdqa  xmm1,xmm2
+db      102,15,56,0,213
+db      102,15,56,221,212
+        pslld   xmm4,1
+        lea     edx,[24+edx]
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm0,xmm3
+        pshufd  xmm3,xmm0,255
+        pxor    xmm3,xmm1
+        pslldq  xmm1,4
+        pxor    xmm3,xmm1
+        pxor    xmm0,xmm2
+        pxor    xmm2,xmm3
+        movdqu  [edx-16],xmm0
+        dec     ecx
+        jnz     NEAR L$127loop_key192
+        mov     ecx,11
+        mov     DWORD [32+edx],ecx
+        jmp     NEAR L$120good_key
+align   16
+L$11314rounds:
+        movups  xmm2,[16+eax]
+        lea     edx,[16+edx]
+        cmp     ebp,268435456
+        je      NEAR L$12814rounds_alt
+        mov     ecx,13
+        movups  [edx-32],xmm0
+        movups  [edx-16],xmm2
+db      102,15,58,223,202,1
+        call    L$129key_256a_cold
+db      102,15,58,223,200,1
+        call    L$130key_256b
+db      102,15,58,223,202,2
+        call    L$131key_256a
+db      102,15,58,223,200,2
+        call    L$130key_256b
+db      102,15,58,223,202,4
+        call    L$131key_256a
+db      102,15,58,223,200,4
+        call    L$130key_256b
+db      102,15,58,223,202,8
+        call    L$131key_256a
+db      102,15,58,223,200,8
+        call    L$130key_256b
+db      102,15,58,223,202,16
+        call    L$131key_256a
+db      102,15,58,223,200,16
+        call    L$130key_256b
+db      102,15,58,223,202,32
+        call    L$131key_256a
+db      102,15,58,223,200,32
+        call    L$130key_256b
+db      102,15,58,223,202,64
+        call    L$131key_256a
+        movups  [edx],xmm0
+        mov     DWORD [16+edx],ecx
+        xor     eax,eax
+        jmp     NEAR L$120good_key
+align   16
+L$131key_256a:
+        movups  [edx],xmm2
+        lea     edx,[16+edx]
+L$129key_256a_cold:
+        shufps  xmm4,xmm0,16
+        xorps   xmm0,xmm4
+        shufps  xmm4,xmm0,140
+        xorps   xmm0,xmm4
+        shufps  xmm1,xmm1,255
+        xorps   xmm0,xmm1
+        ret
+align   16
+L$130key_256b:
+        movups  [edx],xmm0
+        lea     edx,[16+edx]
+        shufps  xmm4,xmm2,16
+        xorps   xmm2,xmm4
+        shufps  xmm4,xmm2,140
+        xorps   xmm2,xmm4
+        shufps  xmm1,xmm1,170
+        xorps   xmm2,xmm1
+        ret
+align   16
+L$12814rounds_alt:
+        movdqa  xmm5,[ebx]
+        movdqa  xmm4,[32+ebx]
+        mov     ecx,7
+        movdqu  [edx-32],xmm0
+        movdqa  xmm1,xmm2
+        movdqu  [edx-16],xmm2
+L$132loop_key256:
+db      102,15,56,0,213
+db      102,15,56,221,212
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm0,xmm3
+        pslld   xmm4,1
+        pxor    xmm0,xmm2
+        movdqu  [edx],xmm0
+        dec     ecx
+        jz      NEAR L$133done_key256
+        pshufd  xmm2,xmm0,255
+        pxor    xmm3,xmm3
+db      102,15,56,221,211
+        movdqa  xmm3,xmm1
+        pslldq  xmm1,4
+        pxor    xmm3,xmm1
+        pslldq  xmm1,4
+        pxor    xmm3,xmm1
+        pslldq  xmm1,4
+        pxor    xmm1,xmm3
+        pxor    xmm2,xmm1
+        movdqu  [16+edx],xmm2
+        lea     edx,[32+edx]
+        movdqa  xmm1,xmm2
+        jmp     NEAR L$132loop_key256
+L$133done_key256:
+        mov     ecx,13
+        mov     DWORD [16+edx],ecx
+L$120good_key:
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        xor     eax,eax
+        pop     ebx
+        pop     ebp
+        ret
+align   4
+L$111bad_pointer:
+        mov     eax,-1
+        pop     ebx
+        pop     ebp
+        ret
+align   4
+L$115bad_keybits:
+        pxor    xmm0,xmm0
+        mov     eax,-2
+        pop     ebx
+        pop     ebp
+        ret
+global  _aesni_set_encrypt_key
+align   16
+_aesni_set_encrypt_key:
+L$_aesni_set_encrypt_key_begin:
+        mov     eax,DWORD [4+esp]
+        mov     ecx,DWORD [8+esp]
+        mov     edx,DWORD [12+esp]
+        call    __aesni_set_encrypt_key
+        ret
+global  _aesni_set_decrypt_key
+align   16
+_aesni_set_decrypt_key:
+L$_aesni_set_decrypt_key_begin:
+        mov     eax,DWORD [4+esp]
+        mov     ecx,DWORD [8+esp]
+        mov     edx,DWORD [12+esp]
+        call    __aesni_set_encrypt_key
+        mov     edx,DWORD [12+esp]
+        shl     ecx,4
+        test    eax,eax
+        jnz     NEAR L$134dec_key_ret
+        lea     eax,[16+ecx*1+edx]
+        movups  xmm0,[edx]
+        movups  xmm1,[eax]
+        movups  [eax],xmm0
+        movups  [edx],xmm1
+        lea     edx,[16+edx]
+        lea     eax,[eax-16]
+L$135dec_key_inverse:
+        movups  xmm0,[edx]
+        movups  xmm1,[eax]
+db      102,15,56,219,192
+db      102,15,56,219,201
+        lea     edx,[16+edx]
+        lea     eax,[eax-16]
+        movups  [16+eax],xmm0
+        movups  [edx-16],xmm1
+        cmp     eax,edx
+        ja      NEAR L$135dec_key_inverse
+        movups  xmm0,[edx]
+db      102,15,56,219,192
+        movups  [edx],xmm0
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        xor     eax,eax
+L$134dec_key_ret:
+        ret
+align   64
+L$key_const:
+dd      202313229,202313229,202313229,202313229
+dd      67569157,67569157,67569157,67569157
+dd      1,1,1,1
+dd      27,27,27,27
+db      65,69,83,32,102,111,114,32,73,110,116,101,108,32,65,69
+db      83,45,78,73,44,32,67,82,89,80,84,79,71,65,77,83
+db      32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115
+db      115,108,46,111,114,103,62,0
+segment .bss
+common  _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm
new file mode 100644
index 0000000000..5eecfdba3d
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm
@@ -0,0 +1,648 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+align   64
+L$_vpaes_consts:
+dd      218628480,235210255,168496130,67568393
+dd      252381056,17041926,33884169,51187212
+dd      252645135,252645135,252645135,252645135
+dd      1512730624,3266504856,1377990664,3401244816
+dd      830229760,1275146365,2969422977,3447763452
+dd      3411033600,2979783055,338359620,2782886510
+dd      4209124096,907596821,221174255,1006095553
+dd      191964160,3799684038,3164090317,1589111125
+dd      182528256,1777043520,2877432650,3265356744
+dd      1874708224,3503451415,3305285752,363511674
+dd      1606117888,3487855781,1093350906,2384367825
+dd      197121,67569157,134941193,202313229
+dd      67569157,134941193,202313229,197121
+dd      134941193,202313229,197121,67569157
+dd      202313229,197121,67569157,134941193
+dd      33619971,100992007,168364043,235736079
+dd      235736079,33619971,100992007,168364043
+dd      168364043,235736079,33619971,100992007
+dd      100992007,168364043,235736079,33619971
+dd      50462976,117835012,185207048,252579084
+dd      252314880,51251460,117574920,184942860
+dd      184682752,252054788,50987272,118359308
+dd      118099200,185467140,251790600,50727180
+dd      2946363062,528716217,1300004225,1881839624
+dd      1532713819,1532713819,1532713819,1532713819
+dd      3602276352,4288629033,3737020424,4153884961
+dd      1354558464,32357713,2958822624,3775749553
+dd      1201988352,132424512,1572796698,503232858
+dd      2213177600,1597421020,4103937655,675398315
+dd      2749646592,4273543773,1511898873,121693092
+dd      3040248576,1103263732,2871565598,1608280554
+dd      2236667136,2588920351,482954393,64377734
+dd      3069987328,291237287,2117370568,3650299247
+dd      533321216,3573750986,2572112006,1401264716
+dd      1339849704,2721158661,548607111,3445553514
+dd      2128193280,3054596040,2183486460,1257083700
+dd      655635200,1165381986,3923443150,2344132524
+dd      190078720,256924420,290342170,357187870
+dd      1610966272,2263057382,4103205268,309794674
+dd      2592527872,2233205587,1335446729,3402964816
+dd      3973531904,3225098121,3002836325,1918774430
+dd      3870401024,2102906079,2284471353,4117666579
+dd      617007872,1021508343,366931923,691083277
+dd      2528395776,3491914898,2968704004,1613121270
+dd      3445188352,3247741094,844474987,4093578302
+dd      651481088,1190302358,1689581232,574775300
+dd      4289380608,206939853,2555985458,2489840491
+dd      2130264064,327674451,3566485037,3349835193
+dd      2470714624,316102159,3636825756,3393945945
+db      86,101,99,116,111,114,32,80,101,114,109,117,116,97,116,105
+db      111,110,32,65,69,83,32,102,111,114,32,120,56,54,47,83
+db      83,83,69,51,44,32,77,105,107,101,32,72,97,109,98,117
+db      114,103,32,40,83,116,97,110,102,111,114,100,32,85,110,105
+db      118,101,114,115,105,116,121,41,0
+align   64
+align   16
+__vpaes_preheat:
+        add     ebp,DWORD [esp]
+        movdqa  xmm7,[ebp-48]
+        movdqa  xmm6,[ebp-16]
+        ret
+align   16
+__vpaes_encrypt_core:
+        mov     ecx,16
+        mov     eax,DWORD [240+edx]
+        movdqa  xmm1,xmm6
+        movdqa  xmm2,[ebp]
+        pandn   xmm1,xmm0
+        pand    xmm0,xmm6
+        movdqu  xmm5,[edx]
+db      102,15,56,0,208
+        movdqa  xmm0,[16+ebp]
+        pxor    xmm2,xmm5
+        psrld   xmm1,4
+        add     edx,16
+db      102,15,56,0,193
+        lea     ebx,[192+ebp]
+        pxor    xmm0,xmm2
+        jmp     NEAR L$000enc_entry
+align   16
+L$001enc_loop:
+        movdqa  xmm4,[32+ebp]
+        movdqa  xmm0,[48+ebp]
+db      102,15,56,0,226
+db      102,15,56,0,195
+        pxor    xmm4,xmm5
+        movdqa  xmm5,[64+ebp]
+        pxor    xmm0,xmm4
+        movdqa  xmm1,[ecx*1+ebx-64]
+db      102,15,56,0,234
+        movdqa  xmm2,[80+ebp]
+        movdqa  xmm4,[ecx*1+ebx]
+db      102,15,56,0,211
+        movdqa  xmm3,xmm0
+        pxor    xmm2,xmm5
+db      102,15,56,0,193
+        add     edx,16
+        pxor    xmm0,xmm2
+db      102,15,56,0,220
+        add     ecx,16
+        pxor    xmm3,xmm0
+db      102,15,56,0,193
+        and     ecx,48
+        sub     eax,1
+        pxor    xmm0,xmm3
+L$000enc_entry:
+        movdqa  xmm1,xmm6
+        movdqa  xmm5,[ebp-32]
+        pandn   xmm1,xmm0
+        psrld   xmm1,4
+        pand    xmm0,xmm6
+db      102,15,56,0,232
+        movdqa  xmm3,xmm7
+        pxor    xmm0,xmm1
+db      102,15,56,0,217
+        movdqa  xmm4,xmm7
+        pxor    xmm3,xmm5
+db      102,15,56,0,224
+        movdqa  xmm2,xmm7
+        pxor    xmm4,xmm5
+db      102,15,56,0,211
+        movdqa  xmm3,xmm7
+        pxor    xmm2,xmm0
+db      102,15,56,0,220
+        movdqu  xmm5,[edx]
+        pxor    xmm3,xmm1
+        jnz     NEAR L$001enc_loop
+        movdqa  xmm4,[96+ebp]
+        movdqa  xmm0,[112+ebp]
+db      102,15,56,0,226
+        pxor    xmm4,xmm5
+db      102,15,56,0,195
+        movdqa  xmm1,[64+ecx*1+ebx]
+        pxor    xmm0,xmm4
+db      102,15,56,0,193
+        ret
+align   16
+__vpaes_decrypt_core:
+        lea     ebx,[608+ebp]
+        mov     eax,DWORD [240+edx]
+        movdqa  xmm1,xmm6
+        movdqa  xmm2,[ebx-64]
+        pandn   xmm1,xmm0
+        mov     ecx,eax
+        psrld   xmm1,4
+        movdqu  xmm5,[edx]
+        shl     ecx,4
+        pand    xmm0,xmm6
+db      102,15,56,0,208
+        movdqa  xmm0,[ebx-48]
+        xor     ecx,48
+db      102,15,56,0,193
+        and     ecx,48
+        pxor    xmm2,xmm5
+        movdqa  xmm5,[176+ebp]
+        pxor    xmm0,xmm2
+        add     edx,16
+        lea     ecx,[ecx*1+ebx-352]
+        jmp     NEAR L$002dec_entry
+align   16
+L$003dec_loop:
+        movdqa  xmm4,[ebx-32]
+        movdqa  xmm1,[ebx-16]
+db      102,15,56,0,226
+db      102,15,56,0,203
+        pxor    xmm0,xmm4
+        movdqa  xmm4,[ebx]
+        pxor    xmm0,xmm1
+        movdqa  xmm1,[16+ebx]
+db      102,15,56,0,226
+db      102,15,56,0,197
+db      102,15,56,0,203
+        pxor    xmm0,xmm4
+        movdqa  xmm4,[32+ebx]
+        pxor    xmm0,xmm1
+        movdqa  xmm1,[48+ebx]
+db      102,15,56,0,226
+db      102,15,56,0,197
+db      102,15,56,0,203
+        pxor    xmm0,xmm4
+        movdqa  xmm4,[64+ebx]
+        pxor    xmm0,xmm1
+        movdqa  xmm1,[80+ebx]
+db      102,15,56,0,226
+db      102,15,56,0,197
+db      102,15,56,0,203
+        pxor    xmm0,xmm4
+        add     edx,16
+db      102,15,58,15,237,12
+        pxor    xmm0,xmm1
+        sub     eax,1
+L$002dec_entry:
+        movdqa  xmm1,xmm6
+        movdqa  xmm2,[ebp-32]
+        pandn   xmm1,xmm0
+        pand    xmm0,xmm6
+        psrld   xmm1,4
+db      102,15,56,0,208
+        movdqa  xmm3,xmm7
+        pxor    xmm0,xmm1
+db      102,15,56,0,217
+        movdqa  xmm4,xmm7
+        pxor    xmm3,xmm2
+db      102,15,56,0,224
+        pxor    xmm4,xmm2
+        movdqa  xmm2,xmm7
+db      102,15,56,0,211
+        movdqa  xmm3,xmm7
+        pxor    xmm2,xmm0
+db      102,15,56,0,220
+        movdqu  xmm0,[edx]
+        pxor    xmm3,xmm1
+        jnz     NEAR L$003dec_loop
+        movdqa  xmm4,[96+ebx]
+db      102,15,56,0,226
+        pxor    xmm4,xmm0
+        movdqa  xmm0,[112+ebx]
+        movdqa  xmm2,[ecx]
+db      102,15,56,0,195
+        pxor    xmm0,xmm4
+db      102,15,56,0,194
+        ret
+align   16
+__vpaes_schedule_core:
+        add     ebp,DWORD [esp]
+        movdqu  xmm0,[esi]
+        movdqa  xmm2,[320+ebp]
+        movdqa  xmm3,xmm0
+        lea     ebx,[ebp]
+        movdqa  [4+esp],xmm2
+        call    __vpaes_schedule_transform
+        movdqa  xmm7,xmm0
+        test    edi,edi
+        jnz     NEAR L$004schedule_am_decrypting
+        movdqu  [edx],xmm0
+        jmp     NEAR L$005schedule_go
+L$004schedule_am_decrypting:
+        movdqa  xmm1,[256+ecx*1+ebp]
+db      102,15,56,0,217
+        movdqu  [edx],xmm3
+        xor     ecx,48
+L$005schedule_go:
+        cmp     eax,192
+        ja      NEAR L$006schedule_256
+        je      NEAR L$007schedule_192
+L$008schedule_128:
+        mov     eax,10
+L$009loop_schedule_128:
+        call    __vpaes_schedule_round
+        dec     eax
+        jz      NEAR L$010schedule_mangle_last
+        call    __vpaes_schedule_mangle
+        jmp     NEAR L$009loop_schedule_128
+align   16
+L$007schedule_192:
+        movdqu  xmm0,[8+esi]
+        call    __vpaes_schedule_transform
+        movdqa  xmm6,xmm0
+        pxor    xmm4,xmm4
+        movhlps xmm6,xmm4
+        mov     eax,4
+L$011loop_schedule_192:
+        call    __vpaes_schedule_round
+db      102,15,58,15,198,8
+        call    __vpaes_schedule_mangle
+        call    __vpaes_schedule_192_smear
+        call    __vpaes_schedule_mangle
+        call    __vpaes_schedule_round
+        dec     eax
+        jz      NEAR L$010schedule_mangle_last
+        call    __vpaes_schedule_mangle
+        call    __vpaes_schedule_192_smear
+        jmp     NEAR L$011loop_schedule_192
+align   16
+L$006schedule_256:
+        movdqu  xmm0,[16+esi]
+        call    __vpaes_schedule_transform
+        mov     eax,7
+L$012loop_schedule_256:
+        call    __vpaes_schedule_mangle
+        movdqa  xmm6,xmm0
+        call    __vpaes_schedule_round
+        dec     eax
+        jz      NEAR L$010schedule_mangle_last
+        call    __vpaes_schedule_mangle
+        pshufd  xmm0,xmm0,255
+        movdqa  [20+esp],xmm7
+        movdqa  xmm7,xmm6
+        call    L$_vpaes_schedule_low_round
+        movdqa  xmm7,[20+esp]
+        jmp     NEAR L$012loop_schedule_256
+align   16
+L$010schedule_mangle_last:
+        lea     ebx,[384+ebp]
+        test    edi,edi
+        jnz     NEAR L$013schedule_mangle_last_dec
+        movdqa  xmm1,[256+ecx*1+ebp]
+db      102,15,56,0,193
+        lea     ebx,[352+ebp]
+        add     edx,32
+L$013schedule_mangle_last_dec:
+        add     edx,-16
+        pxor    xmm0,[336+ebp]
+        call    __vpaes_schedule_transform
+        movdqu  [edx],xmm0
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        pxor    xmm6,xmm6
+        pxor    xmm7,xmm7
+        ret
+align   16
+__vpaes_schedule_192_smear:
+        pshufd  xmm1,xmm6,128
+        pshufd  xmm0,xmm7,254
+        pxor    xmm6,xmm1
+        pxor    xmm1,xmm1
+        pxor    xmm6,xmm0
+        movdqa  xmm0,xmm6
+        movhlps xmm6,xmm1
+        ret
+align   16
+__vpaes_schedule_round:
+        movdqa  xmm2,[8+esp]
+        pxor    xmm1,xmm1
+db      102,15,58,15,202,15
+db      102,15,58,15,210,15
+        pxor    xmm7,xmm1
+        pshufd  xmm0,xmm0,255
+db      102,15,58,15,192,1
+        movdqa  [8+esp],xmm2
+L$_vpaes_schedule_low_round:
+        movdqa  xmm1,xmm7
+        pslldq  xmm7,4
+        pxor    xmm7,xmm1
+        movdqa  xmm1,xmm7
+        pslldq  xmm7,8
+        pxor    xmm7,xmm1
+        pxor    xmm7,[336+ebp]
+        movdqa  xmm4,[ebp-16]
+        movdqa  xmm5,[ebp-48]
+        movdqa  xmm1,xmm4
+        pandn   xmm1,xmm0
+        psrld   xmm1,4
+        pand    xmm0,xmm4
+        movdqa  xmm2,[ebp-32]
+db      102,15,56,0,208
+        pxor    xmm0,xmm1
+        movdqa  xmm3,xmm5
+db      102,15,56,0,217
+        pxor    xmm3,xmm2
+        movdqa  xmm4,xmm5
+db      102,15,56,0,224
+        pxor    xmm4,xmm2
+        movdqa  xmm2,xmm5
+db      102,15,56,0,211
+        pxor    xmm2,xmm0
+        movdqa  xmm3,xmm5
+db      102,15,56,0,220
+        pxor    xmm3,xmm1
+        movdqa  xmm4,[32+ebp]
+db      102,15,56,0,226
+        movdqa  xmm0,[48+ebp]
+db      102,15,56,0,195
+        pxor    xmm0,xmm4
+        pxor    xmm0,xmm7
+        movdqa  xmm7,xmm0
+        ret
+align   16
+__vpaes_schedule_transform:
+        movdqa  xmm2,[ebp-16]
+        movdqa  xmm1,xmm2
+        pandn   xmm1,xmm0
+        psrld   xmm1,4
+        pand    xmm0,xmm2
+        movdqa  xmm2,[ebx]
+db      102,15,56,0,208
+        movdqa  xmm0,[16+ebx]
+db      102,15,56,0,193
+        pxor    xmm0,xmm2
+        ret
+align   16
+__vpaes_schedule_mangle:
+        movdqa  xmm4,xmm0
+        movdqa  xmm5,[128+ebp]
+        test    edi,edi
+        jnz     NEAR L$014schedule_mangle_dec
+        add     edx,16
+        pxor    xmm4,[336+ebp]
+db      102,15,56,0,229
+        movdqa  xmm3,xmm4
+db      102,15,56,0,229
+        pxor    xmm3,xmm4
+db      102,15,56,0,229
+        pxor    xmm3,xmm4
+        jmp     NEAR L$015schedule_mangle_both
+align   16
+L$014schedule_mangle_dec:
+        movdqa  xmm2,[ebp-16]
+        lea     esi,[416+ebp]
+        movdqa  xmm1,xmm2
+        pandn   xmm1,xmm4
+        psrld   xmm1,4
+        pand    xmm4,xmm2
+        movdqa  xmm2,[esi]
+db      102,15,56,0,212
+        movdqa  xmm3,[16+esi]
+db      102,15,56,0,217
+        pxor    xmm3,xmm2
+db      102,15,56,0,221
+        movdqa  xmm2,[32+esi]
+db      102,15,56,0,212
+        pxor    xmm2,xmm3
+        movdqa  xmm3,[48+esi]
+db      102,15,56,0,217
+        pxor    xmm3,xmm2
+db      102,15,56,0,221
+        movdqa  xmm2,[64+esi]
+db      102,15,56,0,212
+        pxor    xmm2,xmm3
+        movdqa  xmm3,[80+esi]
+db      102,15,56,0,217
+        pxor    xmm3,xmm2
+db      102,15,56,0,221
+        movdqa  xmm2,[96+esi]
+db      102,15,56,0,212
+        pxor    xmm2,xmm3
+        movdqa  xmm3,[112+esi]
+db      102,15,56,0,217
+        pxor    xmm3,xmm2
+        add     edx,-16
+L$015schedule_mangle_both:
+        movdqa  xmm1,[256+ecx*1+ebp]
+db      102,15,56,0,217
+        add     ecx,-16
+        and     ecx,48
+        movdqu  [edx],xmm3
+        ret
+global  _vpaes_set_encrypt_key
+align   16
+_vpaes_set_encrypt_key:
+L$_vpaes_set_encrypt_key_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        lea     ebx,[esp-56]
+        mov     eax,DWORD [24+esp]
+        and     ebx,-16
+        mov     edx,DWORD [28+esp]
+        xchg    ebx,esp
+        mov     DWORD [48+esp],ebx
+        mov     ebx,eax
+        shr     ebx,5
+        add     ebx,5
+        mov     DWORD [240+edx],ebx
+        mov     ecx,48
+        mov     edi,0
+        lea     ebp,[(L$_vpaes_consts+0x30-L$016pic_point)]
+        call    __vpaes_schedule_core
+L$016pic_point:
+        mov     esp,DWORD [48+esp]
+        xor     eax,eax
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _vpaes_set_decrypt_key
+align   16
+_vpaes_set_decrypt_key:
+L$_vpaes_set_decrypt_key_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        lea     ebx,[esp-56]
+        mov     eax,DWORD [24+esp]
+        and     ebx,-16
+        mov     edx,DWORD [28+esp]
+        xchg    ebx,esp
+        mov     DWORD [48+esp],ebx
+        mov     ebx,eax
+        shr     ebx,5
+        add     ebx,5
+        mov     DWORD [240+edx],ebx
+        shl     ebx,4
+        lea     edx,[16+ebx*1+edx]
+        mov     edi,1
+        mov     ecx,eax
+        shr     ecx,1
+        and     ecx,32
+        xor     ecx,32
+        lea     ebp,[(L$_vpaes_consts+0x30-L$017pic_point)]
+        call    __vpaes_schedule_core
+L$017pic_point:
+        mov     esp,DWORD [48+esp]
+        xor     eax,eax
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _vpaes_encrypt
+align   16
+_vpaes_encrypt:
+L$_vpaes_encrypt_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        lea     ebp,[(L$_vpaes_consts+0x30-L$018pic_point)]
+        call    __vpaes_preheat
+L$018pic_point:
+        mov     esi,DWORD [20+esp]
+        lea     ebx,[esp-56]
+        mov     edi,DWORD [24+esp]
+        and     ebx,-16
+        mov     edx,DWORD [28+esp]
+        xchg    ebx,esp
+        mov     DWORD [48+esp],ebx
+        movdqu  xmm0,[esi]
+        call    __vpaes_encrypt_core
+        movdqu  [edi],xmm0
+        mov     esp,DWORD [48+esp]
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _vpaes_decrypt
+align   16
+_vpaes_decrypt:
+L$_vpaes_decrypt_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        lea     ebp,[(L$_vpaes_consts+0x30-L$019pic_point)]
+        call    __vpaes_preheat
+L$019pic_point:
+        mov     esi,DWORD [20+esp]
+        lea     ebx,[esp-56]
+        mov     edi,DWORD [24+esp]
+        and     ebx,-16
+        mov     edx,DWORD [28+esp]
+        xchg    ebx,esp
+        mov     DWORD [48+esp],ebx
+        movdqu  xmm0,[esi]
+        call    __vpaes_decrypt_core
+        movdqu  [edi],xmm0
+        mov     esp,DWORD [48+esp]
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _vpaes_cbc_encrypt
+align   16
+_vpaes_cbc_encrypt:
+L$_vpaes_cbc_encrypt_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        sub     eax,16
+        jc      NEAR L$020cbc_abort
+        lea     ebx,[esp-56]
+        mov     ebp,DWORD [36+esp]
+        and     ebx,-16
+        mov     ecx,DWORD [40+esp]
+        xchg    ebx,esp
+        movdqu  xmm1,[ebp]
+        sub     edi,esi
+        mov     DWORD [48+esp],ebx
+        mov     DWORD [esp],edi
+        mov     DWORD [4+esp],edx
+        mov     DWORD [8+esp],ebp
+        mov     edi,eax
+        lea     ebp,[(L$_vpaes_consts+0x30-L$021pic_point)]
+        call    __vpaes_preheat
+L$021pic_point:
+        cmp     ecx,0
+        je      NEAR L$022cbc_dec_loop
+        jmp     NEAR L$023cbc_enc_loop
+align   16
+L$023cbc_enc_loop:
+        movdqu  xmm0,[esi]
+        pxor    xmm0,xmm1
+        call    __vpaes_encrypt_core
+        mov     ebx,DWORD [esp]
+        mov     edx,DWORD [4+esp]
+        movdqa  xmm1,xmm0
+        movdqu  [esi*1+ebx],xmm0
+        lea     esi,[16+esi]
+        sub     edi,16
+        jnc     NEAR L$023cbc_enc_loop
+        jmp     NEAR L$024cbc_done
+align   16
+L$022cbc_dec_loop:
+        movdqu  xmm0,[esi]
+        movdqa  [16+esp],xmm1
+        movdqa  [32+esp],xmm0
+        call    __vpaes_decrypt_core
+        mov     ebx,DWORD [esp]
+        mov     edx,DWORD [4+esp]
+        pxor    xmm0,[16+esp]
+        movdqa  xmm1,[32+esp]
+        movdqu  [esi*1+ebx],xmm0
+        lea     esi,[16+esi]
+        sub     edi,16
+        jnc     NEAR L$022cbc_dec_loop
+L$024cbc_done:
+        mov     ebx,DWORD [8+esp]
+        mov     esp,DWORD [48+esp]
+        movdqu  [ebx],xmm1
+L$020cbc_abort:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm
new file mode 100644
index 0000000000..75bba13387
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm
@@ -0,0 +1,1522 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+;extern _OPENSSL_ia32cap_P
+global  _bn_mul_add_words
+align   16
+_bn_mul_add_words:
+L$_bn_mul_add_words_begin:
+        lea     eax,[_OPENSSL_ia32cap_P]
+        bt      DWORD [eax],26
+        jnc     NEAR L$000maw_non_sse2
+        mov     eax,DWORD [4+esp]
+        mov     edx,DWORD [8+esp]
+        mov     ecx,DWORD [12+esp]
+        movd    mm0,DWORD [16+esp]
+        pxor    mm1,mm1
+        jmp     NEAR L$001maw_sse2_entry
+align   16
+L$002maw_sse2_unrolled:
+        movd    mm3,DWORD [eax]
+        paddq   mm1,mm3
+        movd    mm2,DWORD [edx]
+        pmuludq mm2,mm0
+        movd    mm4,DWORD [4+edx]
+        pmuludq mm4,mm0
+        movd    mm6,DWORD [8+edx]
+        pmuludq mm6,mm0
+        movd    mm7,DWORD [12+edx]
+        pmuludq mm7,mm0
+        paddq   mm1,mm2
+        movd    mm3,DWORD [4+eax]
+        paddq   mm3,mm4
+        movd    mm5,DWORD [8+eax]
+        paddq   mm5,mm6
+        movd    mm4,DWORD [12+eax]
+        paddq   mm7,mm4
+        movd    DWORD [eax],mm1
+        movd    mm2,DWORD [16+edx]
+        pmuludq mm2,mm0
+        psrlq   mm1,32
+        movd    mm4,DWORD [20+edx]
+        pmuludq mm4,mm0
+        paddq   mm1,mm3
+        movd    mm6,DWORD [24+edx]
+        pmuludq mm6,mm0
+        movd    DWORD [4+eax],mm1
+        psrlq   mm1,32
+        movd    mm3,DWORD [28+edx]
+        add     edx,32
+        pmuludq mm3,mm0
+        paddq   mm1,mm5
+        movd    mm5,DWORD [16+eax]
+        paddq   mm2,mm5
+        movd    DWORD [8+eax],mm1
+        psrlq   mm1,32
+        paddq   mm1,mm7
+        movd    mm5,DWORD [20+eax]
+        paddq   mm4,mm5
+        movd    DWORD [12+eax],mm1
+        psrlq   mm1,32
+        paddq   mm1,mm2
+        movd    mm5,DWORD [24+eax]
+        paddq   mm6,mm5
+        movd    DWORD [16+eax],mm1
+        psrlq   mm1,32
+        paddq   mm1,mm4
+        movd    mm5,DWORD [28+eax]
+        paddq   mm3,mm5
+        movd    DWORD [20+eax],mm1
+        psrlq   mm1,32
+        paddq   mm1,mm6
+        movd    DWORD [24+eax],mm1
+        psrlq   mm1,32
+        paddq   mm1,mm3
+        movd    DWORD [28+eax],mm1
+        lea     eax,[32+eax]
+        psrlq   mm1,32
+        sub     ecx,8
+        jz      NEAR L$003maw_sse2_exit
+L$001maw_sse2_entry:
+        test    ecx,4294967288
+        jnz     NEAR L$002maw_sse2_unrolled
+align   4
+L$004maw_sse2_loop:
+        movd    mm2,DWORD [edx]
+        movd    mm3,DWORD [eax]
+        pmuludq mm2,mm0
+        lea     edx,[4+edx]
+        paddq   mm1,mm3
+        paddq   mm1,mm2
+        movd    DWORD [eax],mm1
+        sub     ecx,1
+        psrlq   mm1,32
+        lea     eax,[4+eax]
+        jnz     NEAR L$004maw_sse2_loop
+L$003maw_sse2_exit:
+        movd    eax,mm1
+        emms
+        ret
+align   16
+L$000maw_non_sse2:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        ;
+        xor     esi,esi
+        mov     edi,DWORD [20+esp]
+        mov     ecx,DWORD [28+esp]
+        mov     ebx,DWORD [24+esp]
+        and     ecx,4294967288
+        mov     ebp,DWORD [32+esp]
+        push    ecx
+        jz      NEAR L$005maw_finish
+align   16
+L$006maw_loop:
+        ; Round 0
+        mov     eax,DWORD [ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [edi]
+        adc     edx,0
+        mov     DWORD [edi],eax
+        mov     esi,edx
+        ; Round 4
+        mov     eax,DWORD [4+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [4+edi]
+        adc     edx,0
+        mov     DWORD [4+edi],eax
+        mov     esi,edx
+        ; Round 8
+        mov     eax,DWORD [8+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [8+edi]
+        adc     edx,0
+        mov     DWORD [8+edi],eax
+        mov     esi,edx
+        ; Round 12
+        mov     eax,DWORD [12+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [12+edi]
+        adc     edx,0
+        mov     DWORD [12+edi],eax
+        mov     esi,edx
+        ; Round 16
+        mov     eax,DWORD [16+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [16+edi]
+        adc     edx,0
+        mov     DWORD [16+edi],eax
+        mov     esi,edx
+        ; Round 20
+        mov     eax,DWORD [20+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [20+edi]
+        adc     edx,0
+        mov     DWORD [20+edi],eax
+        mov     esi,edx
+        ; Round 24
+        mov     eax,DWORD [24+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [24+edi]
+        adc     edx,0
+        mov     DWORD [24+edi],eax
+        mov     esi,edx
+        ; Round 28
+        mov     eax,DWORD [28+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [28+edi]
+        adc     edx,0
+        mov     DWORD [28+edi],eax
+        mov     esi,edx
+        ;
+        sub     ecx,8
+        lea     ebx,[32+ebx]
+        lea     edi,[32+edi]
+        jnz     NEAR L$006maw_loop
+L$005maw_finish:
+        mov     ecx,DWORD [32+esp]
+        and     ecx,7
+        jnz     NEAR L$007maw_finish2
+        jmp     NEAR L$008maw_end
+L$007maw_finish2:
+        ; Tail Round 0
+        mov     eax,DWORD [ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [edi]
+        adc     edx,0
+        dec     ecx
+        mov     DWORD [edi],eax
+        mov     esi,edx
+        jz      NEAR L$008maw_end
+        ; Tail Round 1
+        mov     eax,DWORD [4+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [4+edi]
+        adc     edx,0
+        dec     ecx
+        mov     DWORD [4+edi],eax
+        mov     esi,edx
+        jz      NEAR L$008maw_end
+        ; Tail Round 2
+        mov     eax,DWORD [8+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [8+edi]
+        adc     edx,0
+        dec     ecx
+        mov     DWORD [8+edi],eax
+        mov     esi,edx
+        jz      NEAR L$008maw_end
+        ; Tail Round 3
+        mov     eax,DWORD [12+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [12+edi]
+        adc     edx,0
+        dec     ecx
+        mov     DWORD [12+edi],eax
+        mov     esi,edx
+        jz      NEAR L$008maw_end
+        ; Tail Round 4
+        mov     eax,DWORD [16+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [16+edi]
+        adc     edx,0
+        dec     ecx
+        mov     DWORD [16+edi],eax
+        mov     esi,edx
+        jz      NEAR L$008maw_end
+        ; Tail Round 5
+        mov     eax,DWORD [20+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [20+edi]
+        adc     edx,0
+        dec     ecx
+        mov     DWORD [20+edi],eax
+        mov     esi,edx
+        jz      NEAR L$008maw_end
+        ; Tail Round 6
+        mov     eax,DWORD [24+ebx]
+        mul     ebp
+        add     eax,esi
+        adc     edx,0
+        add     eax,DWORD [24+edi]
+        adc     edx,0
+        mov     DWORD [24+edi],eax
+        mov     esi,edx
+L$008maw_end:
+        mov     eax,esi
+        pop     ecx
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _bn_mul_words
+align   16
+_bn_mul_words:
+L$_bn_mul_words_begin:
+        lea     eax,[_OPENSSL_ia32cap_P]
+        bt      DWORD [eax],26
+        jnc     NEAR L$009mw_non_sse2
+        mov     eax,DWORD [4+esp]
+        mov     edx,DWORD [8+esp]
+        mov     ecx,DWORD [12+esp]
+        movd    mm0,DWORD [16+esp]
+        pxor    mm1,mm1
+align   16
+L$010mw_sse2_loop:
+        movd    mm2,DWORD [edx]
+        pmuludq mm2,mm0
+        lea     edx,[4+edx]
+        paddq   mm1,mm2
+        movd    DWORD [eax],mm1
+        sub     ecx,1
+        psrlq   mm1,32
+        lea     eax,[4+eax]
+        jnz     NEAR L$010mw_sse2_loop
+        movd    eax,mm1
+        emms
+        ret
+align   16
+L$009mw_non_sse2:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        ;
+        xor     esi,esi
+        mov     edi,DWORD [20+esp]
+        mov     ebx,DWORD [24+esp]
+        mov     ebp,DWORD [28+esp]
+        mov     ecx,DWORD [32+esp]
+        and     ebp,4294967288
+        jz      NEAR L$011mw_finish
+L$012mw_loop:
+        ; Round 0
+        mov     eax,DWORD [ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [edi],eax
+        mov     esi,edx
+        ; Round 4
+        mov     eax,DWORD [4+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [4+edi],eax
+        mov     esi,edx
+        ; Round 8
+        mov     eax,DWORD [8+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [8+edi],eax
+        mov     esi,edx
+        ; Round 12
+        mov     eax,DWORD [12+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [12+edi],eax
+        mov     esi,edx
+        ; Round 16
+        mov     eax,DWORD [16+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [16+edi],eax
+        mov     esi,edx
+        ; Round 20
+        mov     eax,DWORD [20+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [20+edi],eax
+        mov     esi,edx
+        ; Round 24
+        mov     eax,DWORD [24+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [24+edi],eax
+        mov     esi,edx
+        ; Round 28
+        mov     eax,DWORD [28+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [28+edi],eax
+        mov     esi,edx
+        ;
+        add     ebx,32
+        add     edi,32
+        sub     ebp,8
+        jz      NEAR L$011mw_finish
+        jmp     NEAR L$012mw_loop
+L$011mw_finish:
+        mov     ebp,DWORD [28+esp]
+        and     ebp,7
+        jnz     NEAR L$013mw_finish2
+        jmp     NEAR L$014mw_end
+L$013mw_finish2:
+        ; Tail Round 0
+        mov     eax,DWORD [ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [edi],eax
+        mov     esi,edx
+        dec     ebp
+        jz      NEAR L$014mw_end
+        ; Tail Round 1
+        mov     eax,DWORD [4+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [4+edi],eax
+        mov     esi,edx
+        dec     ebp
+        jz      NEAR L$014mw_end
+        ; Tail Round 2
+        mov     eax,DWORD [8+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [8+edi],eax
+        mov     esi,edx
+        dec     ebp
+        jz      NEAR L$014mw_end
+        ; Tail Round 3
+        mov     eax,DWORD [12+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [12+edi],eax
+        mov     esi,edx
+        dec     ebp
+        jz      NEAR L$014mw_end
+        ; Tail Round 4
+        mov     eax,DWORD [16+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [16+edi],eax
+        mov     esi,edx
+        dec     ebp
+        jz      NEAR L$014mw_end
+        ; Tail Round 5
+        mov     eax,DWORD [20+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [20+edi],eax
+        mov     esi,edx
+        dec     ebp
+        jz      NEAR L$014mw_end
+        ; Tail Round 6
+        mov     eax,DWORD [24+ebx]
+        mul     ecx
+        add     eax,esi
+        adc     edx,0
+        mov     DWORD [24+edi],eax
+        mov     esi,edx
+L$014mw_end:
+        mov     eax,esi
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _bn_sqr_words
+align   16
+_bn_sqr_words:
+L$_bn_sqr_words_begin:
+        lea     eax,[_OPENSSL_ia32cap_P]
+        bt      DWORD [eax],26
+        jnc     NEAR L$015sqr_non_sse2
+        mov     eax,DWORD [4+esp]
+        mov     edx,DWORD [8+esp]
+        mov     ecx,DWORD [12+esp]
+align   16
+L$016sqr_sse2_loop:
+        movd    mm0,DWORD [edx]
+        pmuludq mm0,mm0
+        lea     edx,[4+edx]
+        movq    [eax],mm0
+        sub     ecx,1
+        lea     eax,[8+eax]
+        jnz     NEAR L$016sqr_sse2_loop
+        emms
+        ret
+align   16
+L$015sqr_non_sse2:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        ;
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     ebx,DWORD [28+esp]
+        and     ebx,4294967288
+        jz      NEAR L$017sw_finish
+L$018sw_loop:
+        ; Round 0
+        mov     eax,DWORD [edi]
+        mul     eax
+        mov     DWORD [esi],eax
+        mov     DWORD [4+esi],edx
+        ; Round 4
+        mov     eax,DWORD [4+edi]
+        mul     eax
+        mov     DWORD [8+esi],eax
+        mov     DWORD [12+esi],edx
+        ; Round 8
+        mov     eax,DWORD [8+edi]
+        mul     eax
+        mov     DWORD [16+esi],eax
+        mov     DWORD [20+esi],edx
+        ; Round 12
+        mov     eax,DWORD [12+edi]
+        mul     eax
+        mov     DWORD [24+esi],eax
+        mov     DWORD [28+esi],edx
+        ; Round 16
+        mov     eax,DWORD [16+edi]
+        mul     eax
+        mov     DWORD [32+esi],eax
+        mov     DWORD [36+esi],edx
+        ; Round 20
+        mov     eax,DWORD [20+edi]
+        mul     eax
+        mov     DWORD [40+esi],eax
+        mov     DWORD [44+esi],edx
+        ; Round 24
+        mov     eax,DWORD [24+edi]
+        mul     eax
+        mov     DWORD [48+esi],eax
+        mov     DWORD [52+esi],edx
+        ; Round 28
+        mov     eax,DWORD [28+edi]
+        mul     eax
+        mov     DWORD [56+esi],eax
+        mov     DWORD [60+esi],edx
+        ;
+        add     edi,32
+        add     esi,64
+        sub     ebx,8
+        jnz     NEAR L$018sw_loop
+L$017sw_finish:
+        mov     ebx,DWORD [28+esp]
+        and     ebx,7
+        jz      NEAR L$019sw_end
+        ; Tail Round 0
+        mov     eax,DWORD [edi]
+        mul     eax
+        mov     DWORD [esi],eax
+        dec     ebx
+        mov     DWORD [4+esi],edx
+        jz      NEAR L$019sw_end
+        ; Tail Round 1
+        mov     eax,DWORD [4+edi]
+        mul     eax
+        mov     DWORD [8+esi],eax
+        dec     ebx
+        mov     DWORD [12+esi],edx
+        jz      NEAR L$019sw_end
+        ; Tail Round 2
+        mov     eax,DWORD [8+edi]
+        mul     eax
+        mov     DWORD [16+esi],eax
+        dec     ebx
+        mov     DWORD [20+esi],edx
+        jz      NEAR L$019sw_end
+        ; Tail Round 3
+        mov     eax,DWORD [12+edi]
+        mul     eax
+        mov     DWORD [24+esi],eax
+        dec     ebx
+        mov     DWORD [28+esi],edx
+        jz      NEAR L$019sw_end
+        ; Tail Round 4
+        mov     eax,DWORD [16+edi]
+        mul     eax
+        mov     DWORD [32+esi],eax
+        dec     ebx
+        mov     DWORD [36+esi],edx
+        jz      NEAR L$019sw_end
+        ; Tail Round 5
+        mov     eax,DWORD [20+edi]
+        mul     eax
+        mov     DWORD [40+esi],eax
+        dec     ebx
+        mov     DWORD [44+esi],edx
+        jz      NEAR L$019sw_end
+        ; Tail Round 6
+        mov     eax,DWORD [24+edi]
+        mul     eax
+        mov     DWORD [48+esi],eax
+        mov     DWORD [52+esi],edx
+L$019sw_end:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _bn_div_words
+align   16
+_bn_div_words:
+L$_bn_div_words_begin:
+        mov     edx,DWORD [4+esp]
+        mov     eax,DWORD [8+esp]
+        mov     ecx,DWORD [12+esp]
+        div     ecx
+        ret
+global  _bn_add_words
+align   16
+_bn_add_words:
+L$_bn_add_words_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        ;
+        mov     ebx,DWORD [20+esp]
+        mov     esi,DWORD [24+esp]
+        mov     edi,DWORD [28+esp]
+        mov     ebp,DWORD [32+esp]
+        xor     eax,eax
+        and     ebp,4294967288
+        jz      NEAR L$020aw_finish
+L$021aw_loop:
+        ; Round 0
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        ; Round 1
+        mov     ecx,DWORD [4+esi]
+        mov     edx,DWORD [4+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        mov     DWORD [4+ebx],ecx
+        ; Round 2
+        mov     ecx,DWORD [8+esi]
+        mov     edx,DWORD [8+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        mov     DWORD [8+ebx],ecx
+        ; Round 3
+        mov     ecx,DWORD [12+esi]
+        mov     edx,DWORD [12+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        mov     DWORD [12+ebx],ecx
+        ; Round 4
+        mov     ecx,DWORD [16+esi]
+        mov     edx,DWORD [16+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        mov     DWORD [16+ebx],ecx
+        ; Round 5
+        mov     ecx,DWORD [20+esi]
+        mov     edx,DWORD [20+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        mov     DWORD [20+ebx],ecx
+        ; Round 6
+        mov     ecx,DWORD [24+esi]
+        mov     edx,DWORD [24+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        mov     DWORD [24+ebx],ecx
+        ; Round 7
+        mov     ecx,DWORD [28+esi]
+        mov     edx,DWORD [28+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        mov     DWORD [28+ebx],ecx
+        ;
+        add     esi,32
+        add     edi,32
+        add     ebx,32
+        sub     ebp,8
+        jnz     NEAR L$021aw_loop
+L$020aw_finish:
+        mov     ebp,DWORD [32+esp]
+        and     ebp,7
+        jz      NEAR L$022aw_end
+        ; Tail Round 0
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [ebx],ecx
+        jz      NEAR L$022aw_end
+        ; Tail Round 1
+        mov     ecx,DWORD [4+esi]
+        mov     edx,DWORD [4+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [4+ebx],ecx
+        jz      NEAR L$022aw_end
+        ; Tail Round 2
+        mov     ecx,DWORD [8+esi]
+        mov     edx,DWORD [8+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [8+ebx],ecx
+        jz      NEAR L$022aw_end
+        ; Tail Round 3
+        mov     ecx,DWORD [12+esi]
+        mov     edx,DWORD [12+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [12+ebx],ecx
+        jz      NEAR L$022aw_end
+        ; Tail Round 4
+        mov     ecx,DWORD [16+esi]
+        mov     edx,DWORD [16+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [16+ebx],ecx
+        jz      NEAR L$022aw_end
+        ; Tail Round 5
+        mov     ecx,DWORD [20+esi]
+        mov     edx,DWORD [20+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [20+ebx],ecx
+        jz      NEAR L$022aw_end
+        ; Tail Round 6
+        mov     ecx,DWORD [24+esi]
+        mov     edx,DWORD [24+edi]
+        add     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        add     ecx,edx
+        adc     eax,0
+        mov     DWORD [24+ebx],ecx
+L$022aw_end:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _bn_sub_words
+align   16
+_bn_sub_words:
+L$_bn_sub_words_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        ;
+        mov     ebx,DWORD [20+esp]
+        mov     esi,DWORD [24+esp]
+        mov     edi,DWORD [28+esp]
+        mov     ebp,DWORD [32+esp]
+        xor     eax,eax
+        and     ebp,4294967288
+        jz      NEAR L$023aw_finish
+L$024aw_loop:
+        ; Round 0
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        ; Round 1
+        mov     ecx,DWORD [4+esi]
+        mov     edx,DWORD [4+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [4+ebx],ecx
+        ; Round 2
+        mov     ecx,DWORD [8+esi]
+        mov     edx,DWORD [8+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [8+ebx],ecx
+        ; Round 3
+        mov     ecx,DWORD [12+esi]
+        mov     edx,DWORD [12+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [12+ebx],ecx
+        ; Round 4
+        mov     ecx,DWORD [16+esi]
+        mov     edx,DWORD [16+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [16+ebx],ecx
+        ; Round 5
+        mov     ecx,DWORD [20+esi]
+        mov     edx,DWORD [20+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [20+ebx],ecx
+        ; Round 6
+        mov     ecx,DWORD [24+esi]
+        mov     edx,DWORD [24+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [24+ebx],ecx
+        ; Round 7
+        mov     ecx,DWORD [28+esi]
+        mov     edx,DWORD [28+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [28+ebx],ecx
+        ;
+        add     esi,32
+        add     edi,32
+        add     ebx,32
+        sub     ebp,8
+        jnz     NEAR L$024aw_loop
+L$023aw_finish:
+        mov     ebp,DWORD [32+esp]
+        and     ebp,7
+        jz      NEAR L$025aw_end
+        ; Tail Round 0
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [ebx],ecx
+        jz      NEAR L$025aw_end
+        ; Tail Round 1
+        mov     ecx,DWORD [4+esi]
+        mov     edx,DWORD [4+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [4+ebx],ecx
+        jz      NEAR L$025aw_end
+        ; Tail Round 2
+        mov     ecx,DWORD [8+esi]
+        mov     edx,DWORD [8+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [8+ebx],ecx
+        jz      NEAR L$025aw_end
+        ; Tail Round 3
+        mov     ecx,DWORD [12+esi]
+        mov     edx,DWORD [12+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [12+ebx],ecx
+        jz      NEAR L$025aw_end
+        ; Tail Round 4
+        mov     ecx,DWORD [16+esi]
+        mov     edx,DWORD [16+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [16+ebx],ecx
+        jz      NEAR L$025aw_end
+        ; Tail Round 5
+        mov     ecx,DWORD [20+esi]
+        mov     edx,DWORD [20+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [20+ebx],ecx
+        jz      NEAR L$025aw_end
+        ; Tail Round 6
+        mov     ecx,DWORD [24+esi]
+        mov     edx,DWORD [24+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [24+ebx],ecx
+L$025aw_end:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _bn_sub_part_words
+align   16
+_bn_sub_part_words:
+L$_bn_sub_part_words_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        ;
+        mov     ebx,DWORD [20+esp]
+        mov     esi,DWORD [24+esp]
+        mov     edi,DWORD [28+esp]
+        mov     ebp,DWORD [32+esp]
+        xor     eax,eax
+        and     ebp,4294967288
+        jz      NEAR L$026aw_finish
+L$027aw_loop:
+        ; Round 0
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        ; Round 1
+        mov     ecx,DWORD [4+esi]
+        mov     edx,DWORD [4+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [4+ebx],ecx
+        ; Round 2
+        mov     ecx,DWORD [8+esi]
+        mov     edx,DWORD [8+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [8+ebx],ecx
+        ; Round 3
+        mov     ecx,DWORD [12+esi]
+        mov     edx,DWORD [12+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [12+ebx],ecx
+        ; Round 4
+        mov     ecx,DWORD [16+esi]
+        mov     edx,DWORD [16+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [16+ebx],ecx
+        ; Round 5
+        mov     ecx,DWORD [20+esi]
+        mov     edx,DWORD [20+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [20+ebx],ecx
+        ; Round 6
+        mov     ecx,DWORD [24+esi]
+        mov     edx,DWORD [24+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [24+ebx],ecx
+        ; Round 7
+        mov     ecx,DWORD [28+esi]
+        mov     edx,DWORD [28+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [28+ebx],ecx
+        ;
+        add     esi,32
+        add     edi,32
+        add     ebx,32
+        sub     ebp,8
+        jnz     NEAR L$027aw_loop
+L$026aw_finish:
+        mov     ebp,DWORD [32+esp]
+        and     ebp,7
+        jz      NEAR L$028aw_end
+        ; Tail Round 0
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        add     esi,4
+        add     edi,4
+        add     ebx,4
+        dec     ebp
+        jz      NEAR L$028aw_end
+        ; Tail Round 1
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        add     esi,4
+        add     edi,4
+        add     ebx,4
+        dec     ebp
+        jz      NEAR L$028aw_end
+        ; Tail Round 2
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        add     esi,4
+        add     edi,4
+        add     ebx,4
+        dec     ebp
+        jz      NEAR L$028aw_end
+        ; Tail Round 3
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        add     esi,4
+        add     edi,4
+        add     ebx,4
+        dec     ebp
+        jz      NEAR L$028aw_end
+        ; Tail Round 4
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        add     esi,4
+        add     edi,4
+        add     ebx,4
+        dec     ebp
+        jz      NEAR L$028aw_end
+        ; Tail Round 5
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        add     esi,4
+        add     edi,4
+        add     ebx,4
+        dec     ebp
+        jz      NEAR L$028aw_end
+        ; Tail Round 6
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        add     esi,4
+        add     edi,4
+        add     ebx,4
+L$028aw_end:
+        cmp     DWORD [36+esp],0
+        je      NEAR L$029pw_end
+        mov     ebp,DWORD [36+esp]
+        cmp     ebp,0
+        je      NEAR L$029pw_end
+        jge     NEAR L$030pw_pos
+        ; pw_neg
+        mov     edx,0
+        sub     edx,ebp
+        mov     ebp,edx
+        and     ebp,4294967288
+        jz      NEAR L$031pw_neg_finish
+L$032pw_neg_loop:
+        ; dl<0 Round 0
+        mov     ecx,0
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [ebx],ecx
+        ; dl<0 Round 1
+        mov     ecx,0
+        mov     edx,DWORD [4+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [4+ebx],ecx
+        ; dl<0 Round 2
+        mov     ecx,0
+        mov     edx,DWORD [8+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [8+ebx],ecx
+        ; dl<0 Round 3
+        mov     ecx,0
+        mov     edx,DWORD [12+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [12+ebx],ecx
+        ; dl<0 Round 4
+        mov     ecx,0
+        mov     edx,DWORD [16+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [16+ebx],ecx
+        ; dl<0 Round 5
+        mov     ecx,0
+        mov     edx,DWORD [20+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [20+ebx],ecx
+        ; dl<0 Round 6
+        mov     ecx,0
+        mov     edx,DWORD [24+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [24+ebx],ecx
+        ; dl<0 Round 7
+        mov     ecx,0
+        mov     edx,DWORD [28+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [28+ebx],ecx
+        ;
+        add     edi,32
+        add     ebx,32
+        sub     ebp,8
+        jnz     NEAR L$032pw_neg_loop
+L$031pw_neg_finish:
+        mov     edx,DWORD [36+esp]
+        mov     ebp,0
+        sub     ebp,edx
+        and     ebp,7
+        jz      NEAR L$029pw_end
+        ; dl<0 Tail Round 0
+        mov     ecx,0
+        mov     edx,DWORD [edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [ebx],ecx
+        jz      NEAR L$029pw_end
+        ; dl<0 Tail Round 1
+        mov     ecx,0
+        mov     edx,DWORD [4+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [4+ebx],ecx
+        jz      NEAR L$029pw_end
+        ; dl<0 Tail Round 2
+        mov     ecx,0
+        mov     edx,DWORD [8+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [8+ebx],ecx
+        jz      NEAR L$029pw_end
+        ; dl<0 Tail Round 3
+        mov     ecx,0
+        mov     edx,DWORD [12+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [12+ebx],ecx
+        jz      NEAR L$029pw_end
+        ; dl<0 Tail Round 4
+        mov     ecx,0
+        mov     edx,DWORD [16+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [16+ebx],ecx
+        jz      NEAR L$029pw_end
+        ; dl<0 Tail Round 5
+        mov     ecx,0
+        mov     edx,DWORD [20+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        dec     ebp
+        mov     DWORD [20+ebx],ecx
+        jz      NEAR L$029pw_end
+        ; dl<0 Tail Round 6
+        mov     ecx,0
+        mov     edx,DWORD [24+edi]
+        sub     ecx,eax
+        mov     eax,0
+        adc     eax,eax
+        sub     ecx,edx
+        adc     eax,0
+        mov     DWORD [24+ebx],ecx
+        jmp     NEAR L$029pw_end
+L$030pw_pos:
+        and     ebp,4294967288
+        jz      NEAR L$033pw_pos_finish
+L$034pw_pos_loop:
+        ; dl>0 Round 0
+        mov     ecx,DWORD [esi]
+        sub     ecx,eax
+        mov     DWORD [ebx],ecx
+        jnc     NEAR L$035pw_nc0
+        ; dl>0 Round 1
+        mov     ecx,DWORD [4+esi]
+        sub     ecx,eax
+        mov     DWORD [4+ebx],ecx
+        jnc     NEAR L$036pw_nc1
+        ; dl>0 Round 2
+        mov     ecx,DWORD [8+esi]
+        sub     ecx,eax
+        mov     DWORD [8+ebx],ecx
+        jnc     NEAR L$037pw_nc2
+        ; dl>0 Round 3
+        mov     ecx,DWORD [12+esi]
+        sub     ecx,eax
+        mov     DWORD [12+ebx],ecx
+        jnc     NEAR L$038pw_nc3
+        ; dl>0 Round 4
+        mov     ecx,DWORD [16+esi]
+        sub     ecx,eax
+        mov     DWORD [16+ebx],ecx
+        jnc     NEAR L$039pw_nc4
+        ; dl>0 Round 5
+        mov     ecx,DWORD [20+esi]
+        sub     ecx,eax
+        mov     DWORD [20+ebx],ecx
+        jnc     NEAR L$040pw_nc5
+        ; dl>0 Round 6
+        mov     ecx,DWORD [24+esi]
+        sub     ecx,eax
+        mov     DWORD [24+ebx],ecx
+        jnc     NEAR L$041pw_nc6
+        ; dl>0 Round 7
+        mov     ecx,DWORD [28+esi]
+        sub     ecx,eax
+        mov     DWORD [28+ebx],ecx
+        jnc     NEAR L$042pw_nc7
+        ;
+        add     esi,32
+        add     ebx,32
+        sub     ebp,8
+        jnz     NEAR L$034pw_pos_loop
+L$033pw_pos_finish:
+        mov     ebp,DWORD [36+esp]
+        and     ebp,7
+        jz      NEAR L$029pw_end
+        ; dl>0 Tail Round 0
+        mov     ecx,DWORD [esi]
+        sub     ecx,eax
+        mov     DWORD [ebx],ecx
+        jnc     NEAR L$043pw_tail_nc0
+        dec     ebp
+        jz      NEAR L$029pw_end
+        ; dl>0 Tail Round 1
+        mov     ecx,DWORD [4+esi]
+        sub     ecx,eax
+        mov     DWORD [4+ebx],ecx
+        jnc     NEAR L$044pw_tail_nc1
+        dec     ebp
+        jz      NEAR L$029pw_end
+        ; dl>0 Tail Round 2
+        mov     ecx,DWORD [8+esi]
+        sub     ecx,eax
+        mov     DWORD [8+ebx],ecx
+        jnc     NEAR L$045pw_tail_nc2
+        dec     ebp
+        jz      NEAR L$029pw_end
+        ; dl>0 Tail Round 3
+        mov     ecx,DWORD [12+esi]
+        sub     ecx,eax
+        mov     DWORD [12+ebx],ecx
+        jnc     NEAR L$046pw_tail_nc3
+        dec     ebp
+        jz      NEAR L$029pw_end
+        ; dl>0 Tail Round 4
+        mov     ecx,DWORD [16+esi]
+        sub     ecx,eax
+        mov     DWORD [16+ebx],ecx
+        jnc     NEAR L$047pw_tail_nc4
+        dec     ebp
+        jz      NEAR L$029pw_end
+        ; dl>0 Tail Round 5
+        mov     ecx,DWORD [20+esi]
+        sub     ecx,eax
+        mov     DWORD [20+ebx],ecx
+        jnc     NEAR L$048pw_tail_nc5
+        dec     ebp
+        jz      NEAR L$029pw_end
+        ; dl>0 Tail Round 6
+        mov     ecx,DWORD [24+esi]
+        sub     ecx,eax
+        mov     DWORD [24+ebx],ecx
+        jnc     NEAR L$049pw_tail_nc6
+        mov     eax,1
+        jmp     NEAR L$029pw_end
+L$050pw_nc_loop:
+        mov     ecx,DWORD [esi]
+        mov     DWORD [ebx],ecx
+L$035pw_nc0:
+        mov     ecx,DWORD [4+esi]
+        mov     DWORD [4+ebx],ecx
+L$036pw_nc1:
+        mov     ecx,DWORD [8+esi]
+        mov     DWORD [8+ebx],ecx
+L$037pw_nc2:
+        mov     ecx,DWORD [12+esi]
+        mov     DWORD [12+ebx],ecx
+L$038pw_nc3:
+        mov     ecx,DWORD [16+esi]
+        mov     DWORD [16+ebx],ecx
+L$039pw_nc4:
+        mov     ecx,DWORD [20+esi]
+        mov     DWORD [20+ebx],ecx
+L$040pw_nc5:
+        mov     ecx,DWORD [24+esi]
+        mov     DWORD [24+ebx],ecx
+L$041pw_nc6:
+        mov     ecx,DWORD [28+esi]
+        mov     DWORD [28+ebx],ecx
+L$042pw_nc7:
+        ;
+        add     esi,32
+        add     ebx,32
+        sub     ebp,8
+        jnz     NEAR L$050pw_nc_loop
+        mov     ebp,DWORD [36+esp]
+        and     ebp,7
+        jz      NEAR L$051pw_nc_end
+        mov     ecx,DWORD [esi]
+        mov     DWORD [ebx],ecx
+L$043pw_tail_nc0:
+        dec     ebp
+        jz      NEAR L$051pw_nc_end
+        mov     ecx,DWORD [4+esi]
+        mov     DWORD [4+ebx],ecx
+L$044pw_tail_nc1:
+        dec     ebp
+        jz      NEAR L$051pw_nc_end
+        mov     ecx,DWORD [8+esi]
+        mov     DWORD [8+ebx],ecx
+L$045pw_tail_nc2:
+        dec     ebp
+        jz      NEAR L$051pw_nc_end
+        mov     ecx,DWORD [12+esi]
+        mov     DWORD [12+ebx],ecx
+L$046pw_tail_nc3:
+        dec     ebp
+        jz      NEAR L$051pw_nc_end
+        mov     ecx,DWORD [16+esi]
+        mov     DWORD [16+ebx],ecx
+L$047pw_tail_nc4:
+        dec     ebp
+        jz      NEAR L$051pw_nc_end
+        mov     ecx,DWORD [20+esi]
+        mov     DWORD [20+ebx],ecx
+L$048pw_tail_nc5:
+        dec     ebp
+        jz      NEAR L$051pw_nc_end
+        mov     ecx,DWORD [24+esi]
+        mov     DWORD [24+ebx],ecx
+L$049pw_tail_nc6:
+L$051pw_nc_end:
+        mov     eax,0
+L$029pw_end:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+segment .bss
+common  _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm
new file mode 100644
index 0000000000..08eb9fe372
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm
@@ -0,0 +1,1259 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+global  _bn_mul_comba8
+align   16
+_bn_mul_comba8:
+L$_bn_mul_comba8_begin:
+        push    esi
+        mov     esi,DWORD [12+esp]
+        push    edi
+        mov     edi,DWORD [20+esp]
+        push    ebp
+        push    ebx
+        xor     ebx,ebx
+        mov     eax,DWORD [esi]
+        xor     ecx,ecx
+        mov     edx,DWORD [edi]
+        ; ################## Calculate word 0
+        xor     ebp,ebp
+        ; mul a[0]*b[0]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ecx,edx
+        mov     edx,DWORD [edi]
+        adc     ebp,0
+        mov     DWORD [eax],ebx
+        mov     eax,DWORD [4+esi]
+        ; saved r[0]
+        ; ################## Calculate word 1
+        xor     ebx,ebx
+        ; mul a[1]*b[0]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [esi]
+        adc     ebp,edx
+        mov     edx,DWORD [4+edi]
+        adc     ebx,0
+        ; mul a[0]*b[1]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebp,edx
+        mov     edx,DWORD [edi]
+        adc     ebx,0
+        mov     DWORD [4+eax],ecx
+        mov     eax,DWORD [8+esi]
+        ; saved r[1]
+        ; ################## Calculate word 2
+        xor     ecx,ecx
+        ; mul a[2]*b[0]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [4+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [4+edi]
+        adc     ecx,0
+        ; mul a[1]*b[1]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [esi]
+        adc     ebx,edx
+        mov     edx,DWORD [8+edi]
+        adc     ecx,0
+        ; mul a[0]*b[2]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebx,edx
+        mov     edx,DWORD [edi]
+        adc     ecx,0
+        mov     DWORD [8+eax],ebp
+        mov     eax,DWORD [12+esi]
+        ; saved r[2]
+        ; ################## Calculate word 3
+        xor     ebp,ebp
+        ; mul a[3]*b[0]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [8+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [4+edi]
+        adc     ebp,0
+        ; mul a[2]*b[1]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [4+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [8+edi]
+        adc     ebp,0
+        ; mul a[1]*b[2]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [esi]
+        adc     ecx,edx
+        mov     edx,DWORD [12+edi]
+        adc     ebp,0
+        ; mul a[0]*b[3]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ecx,edx
+        mov     edx,DWORD [edi]
+        adc     ebp,0
+        mov     DWORD [12+eax],ebx
+        mov     eax,DWORD [16+esi]
+        ; saved r[3]
+        ; ################## Calculate word 4
+        xor     ebx,ebx
+        ; mul a[4]*b[0]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [12+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [4+edi]
+        adc     ebx,0
+        ; mul a[3]*b[1]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [8+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [8+edi]
+        adc     ebx,0
+        ; mul a[2]*b[2]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [4+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [12+edi]
+        adc     ebx,0
+        ; mul a[1]*b[3]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [esi]
+        adc     ebp,edx
+        mov     edx,DWORD [16+edi]
+        adc     ebx,0
+        ; mul a[0]*b[4]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebp,edx
+        mov     edx,DWORD [edi]
+        adc     ebx,0
+        mov     DWORD [16+eax],ecx
+        mov     eax,DWORD [20+esi]
+        ; saved r[4]
+        ; ################## Calculate word 5
+        xor     ecx,ecx
+        ; mul a[5]*b[0]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [16+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [4+edi]
+        adc     ecx,0
+        ; mul a[4]*b[1]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [12+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [8+edi]
+        adc     ecx,0
+        ; mul a[3]*b[2]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [8+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [12+edi]
+        adc     ecx,0
+        ; mul a[2]*b[3]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [4+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [16+edi]
+        adc     ecx,0
+        ; mul a[1]*b[4]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [esi]
+        adc     ebx,edx
+        mov     edx,DWORD [20+edi]
+        adc     ecx,0
+        ; mul a[0]*b[5]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebx,edx
+        mov     edx,DWORD [edi]
+        adc     ecx,0
+        mov     DWORD [20+eax],ebp
+        mov     eax,DWORD [24+esi]
+        ; saved r[5]
+        ; ################## Calculate word 6
+        xor     ebp,ebp
+        ; mul a[6]*b[0]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [4+edi]
+        adc     ebp,0
+        ; mul a[5]*b[1]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [16+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [8+edi]
+        adc     ebp,0
+        ; mul a[4]*b[2]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [12+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [12+edi]
+        adc     ebp,0
+        ; mul a[3]*b[3]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [8+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [16+edi]
+        adc     ebp,0
+        ; mul a[2]*b[4]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [4+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [20+edi]
+        adc     ebp,0
+        ; mul a[1]*b[5]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [esi]
+        adc     ecx,edx
+        mov     edx,DWORD [24+edi]
+        adc     ebp,0
+        ; mul a[0]*b[6]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ecx,edx
+        mov     edx,DWORD [edi]
+        adc     ebp,0
+        mov     DWORD [24+eax],ebx
+        mov     eax,DWORD [28+esi]
+        ; saved r[6]
+        ; ################## Calculate word 7
+        xor     ebx,ebx
+        ; mul a[7]*b[0]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [24+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [4+edi]
+        adc     ebx,0
+        ; mul a[6]*b[1]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [20+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [8+edi]
+        adc     ebx,0
+        ; mul a[5]*b[2]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [16+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [12+edi]
+        adc     ebx,0
+        ; mul a[4]*b[3]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [12+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [16+edi]
+        adc     ebx,0
+        ; mul a[3]*b[4]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [8+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [20+edi]
+        adc     ebx,0
+        ; mul a[2]*b[5]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [4+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [24+edi]
+        adc     ebx,0
+        ; mul a[1]*b[6]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [esi]
+        adc     ebp,edx
+        mov     edx,DWORD [28+edi]
+        adc     ebx,0
+        ; mul a[0]*b[7]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebp,edx
+        mov     edx,DWORD [4+edi]
+        adc     ebx,0
+        mov     DWORD [28+eax],ecx
+        mov     eax,DWORD [28+esi]
+        ; saved r[7]
+        ; ################## Calculate word 8
+        xor     ecx,ecx
+        ; mul a[7]*b[1]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [24+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [8+edi]
+        adc     ecx,0
+        ; mul a[6]*b[2]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [20+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [12+edi]
+        adc     ecx,0
+        ; mul a[5]*b[3]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [16+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [16+edi]
+        adc     ecx,0
+        ; mul a[4]*b[4]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [12+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [20+edi]
+        adc     ecx,0
+        ; mul a[3]*b[5]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [8+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [24+edi]
+        adc     ecx,0
+        ; mul a[2]*b[6]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [4+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [28+edi]
+        adc     ecx,0
+        ; mul a[1]*b[7]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebx,edx
+        mov     edx,DWORD [8+edi]
+        adc     ecx,0
+        mov     DWORD [32+eax],ebp
+        mov     eax,DWORD [28+esi]
+        ; saved r[8]
+        ; ################## Calculate word 9
+        xor     ebp,ebp
+        ; mul a[7]*b[2]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [24+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [12+edi]
+        adc     ebp,0
+        ; mul a[6]*b[3]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [16+edi]
+        adc     ebp,0
+        ; mul a[5]*b[4]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [16+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [20+edi]
+        adc     ebp,0
+        ; mul a[4]*b[5]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [12+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [24+edi]
+        adc     ebp,0
+        ; mul a[3]*b[6]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [8+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [28+edi]
+        adc     ebp,0
+        ; mul a[2]*b[7]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ecx,edx
+        mov     edx,DWORD [12+edi]
+        adc     ebp,0
+        mov     DWORD [36+eax],ebx
+        mov     eax,DWORD [28+esi]
+        ; saved r[9]
+        ; ################## Calculate word 10
+        xor     ebx,ebx
+        ; mul a[7]*b[3]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [24+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [16+edi]
+        adc     ebx,0
+        ; mul a[6]*b[4]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [20+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [20+edi]
+        adc     ebx,0
+        ; mul a[5]*b[5]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [16+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [24+edi]
+        adc     ebx,0
+        ; mul a[4]*b[6]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [12+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [28+edi]
+        adc     ebx,0
+        ; mul a[3]*b[7]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebp,edx
+        mov     edx,DWORD [16+edi]
+        adc     ebx,0
+        mov     DWORD [40+eax],ecx
+        mov     eax,DWORD [28+esi]
+        ; saved r[10]
+        ; ################## Calculate word 11
+        xor     ecx,ecx
+        ; mul a[7]*b[4]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [24+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [20+edi]
+        adc     ecx,0
+        ; mul a[6]*b[5]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [20+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [24+edi]
+        adc     ecx,0
+        ; mul a[5]*b[6]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [16+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [28+edi]
+        adc     ecx,0
+        ; mul a[4]*b[7]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebx,edx
+        mov     edx,DWORD [20+edi]
+        adc     ecx,0
+        mov     DWORD [44+eax],ebp
+        mov     eax,DWORD [28+esi]
+        ; saved r[11]
+        ; ################## Calculate word 12
+        xor     ebp,ebp
+        ; mul a[7]*b[5]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [24+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [24+edi]
+        adc     ebp,0
+        ; mul a[6]*b[6]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [28+edi]
+        adc     ebp,0
+        ; mul a[5]*b[7]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ecx,edx
+        mov     edx,DWORD [24+edi]
+        adc     ebp,0
+        mov     DWORD [48+eax],ebx
+        mov     eax,DWORD [28+esi]
+        ; saved r[12]
+        ; ################## Calculate word 13
+        xor     ebx,ebx
+        ; mul a[7]*b[6]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [24+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [28+edi]
+        adc     ebx,0
+        ; mul a[6]*b[7]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebp,edx
+        mov     edx,DWORD [28+edi]
+        adc     ebx,0
+        mov     DWORD [52+eax],ecx
+        mov     eax,DWORD [28+esi]
+        ; saved r[13]
+        ; ################## Calculate word 14
+        xor     ecx,ecx
+        ; mul a[7]*b[7]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebx,edx
+        adc     ecx,0
+        mov     DWORD [56+eax],ebp
+        ; saved r[14]
+        ; save r[15]
+        mov     DWORD [60+eax],ebx
+        pop     ebx
+        pop     ebp
+        pop     edi
+        pop     esi
+        ret
+global  _bn_mul_comba4
+align   16
+_bn_mul_comba4:
+L$_bn_mul_comba4_begin:
+        push    esi
+        mov     esi,DWORD [12+esp]
+        push    edi
+        mov     edi,DWORD [20+esp]
+        push    ebp
+        push    ebx
+        xor     ebx,ebx
+        mov     eax,DWORD [esi]
+        xor     ecx,ecx
+        mov     edx,DWORD [edi]
+        ; ################## Calculate word 0
+        xor     ebp,ebp
+        ; mul a[0]*b[0]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ecx,edx
+        mov     edx,DWORD [edi]
+        adc     ebp,0
+        mov     DWORD [eax],ebx
+        mov     eax,DWORD [4+esi]
+        ; saved r[0]
+        ; ################## Calculate word 1
+        xor     ebx,ebx
+        ; mul a[1]*b[0]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [esi]
+        adc     ebp,edx
+        mov     edx,DWORD [4+edi]
+        adc     ebx,0
+        ; mul a[0]*b[1]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebp,edx
+        mov     edx,DWORD [edi]
+        adc     ebx,0
+        mov     DWORD [4+eax],ecx
+        mov     eax,DWORD [8+esi]
+        ; saved r[1]
+        ; ################## Calculate word 2
+        xor     ecx,ecx
+        ; mul a[2]*b[0]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [4+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [4+edi]
+        adc     ecx,0
+        ; mul a[1]*b[1]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [esi]
+        adc     ebx,edx
+        mov     edx,DWORD [8+edi]
+        adc     ecx,0
+        ; mul a[0]*b[2]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebx,edx
+        mov     edx,DWORD [edi]
+        adc     ecx,0
+        mov     DWORD [8+eax],ebp
+        mov     eax,DWORD [12+esi]
+        ; saved r[2]
+        ; ################## Calculate word 3
+        xor     ebp,ebp
+        ; mul a[3]*b[0]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [8+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [4+edi]
+        adc     ebp,0
+        ; mul a[2]*b[1]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [4+esi]
+        adc     ecx,edx
+        mov     edx,DWORD [8+edi]
+        adc     ebp,0
+        ; mul a[1]*b[2]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [esi]
+        adc     ecx,edx
+        mov     edx,DWORD [12+edi]
+        adc     ebp,0
+        ; mul a[0]*b[3]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ecx,edx
+        mov     edx,DWORD [4+edi]
+        adc     ebp,0
+        mov     DWORD [12+eax],ebx
+        mov     eax,DWORD [12+esi]
+        ; saved r[3]
+        ; ################## Calculate word 4
+        xor     ebx,ebx
+        ; mul a[3]*b[1]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [8+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [8+edi]
+        adc     ebx,0
+        ; mul a[2]*b[2]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [4+esi]
+        adc     ebp,edx
+        mov     edx,DWORD [12+edi]
+        adc     ebx,0
+        ; mul a[1]*b[3]
+        mul     edx
+        add     ecx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebp,edx
+        mov     edx,DWORD [8+edi]
+        adc     ebx,0
+        mov     DWORD [16+eax],ecx
+        mov     eax,DWORD [12+esi]
+        ; saved r[4]
+        ; ################## Calculate word 5
+        xor     ecx,ecx
+        ; mul a[3]*b[2]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [8+esi]
+        adc     ebx,edx
+        mov     edx,DWORD [12+edi]
+        adc     ecx,0
+        ; mul a[2]*b[3]
+        mul     edx
+        add     ebp,eax
+        mov     eax,DWORD [20+esp]
+        adc     ebx,edx
+        mov     edx,DWORD [12+edi]
+        adc     ecx,0
+        mov     DWORD [20+eax],ebp
+        mov     eax,DWORD [12+esi]
+        ; saved r[5]
+        ; ################## Calculate word 6
+        xor     ebp,ebp
+        ; mul a[3]*b[3]
+        mul     edx
+        add     ebx,eax
+        mov     eax,DWORD [20+esp]
+        adc     ecx,edx
+        adc     ebp,0
+        mov     DWORD [24+eax],ebx
+        ; saved r[6]
+        ; save r[7]
+        mov     DWORD [28+eax],ecx
+        pop     ebx
+        pop     ebp
+        pop     edi
+        pop     esi
+        ret
+global  _bn_sqr_comba8
+align   16
+_bn_sqr_comba8:
+L$_bn_sqr_comba8_begin:
+        push    esi
+        push    edi
+        push    ebp
+        push    ebx
+        mov     edi,DWORD [20+esp]
+        mov     esi,DWORD [24+esp]
+        xor     ebx,ebx
+        xor     ecx,ecx
+        mov     eax,DWORD [esi]
+        ; ############### Calculate word 0
+        xor     ebp,ebp
+        ; sqr a[0]*a[0]
+        mul     eax
+        add     ebx,eax
+        adc     ecx,edx
+        mov     edx,DWORD [esi]
+        adc     ebp,0
+        mov     DWORD [edi],ebx
+        mov     eax,DWORD [4+esi]
+        ; saved r[0]
+        ; ############### Calculate word 1
+        xor     ebx,ebx
+        ; sqr a[1]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [8+esi]
+        adc     ebx,0
+        mov     DWORD [4+edi],ecx
+        mov     edx,DWORD [esi]
+        ; saved r[1]
+        ; ############### Calculate word 2
+        xor     ecx,ecx
+        ; sqr a[2]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [4+esi]
+        adc     ecx,0
+        ; sqr a[1]*a[1]
+        mul     eax
+        add     ebp,eax
+        adc     ebx,edx
+        mov     edx,DWORD [esi]
+        adc     ecx,0
+        mov     DWORD [8+edi],ebp
+        mov     eax,DWORD [12+esi]
+        ; saved r[2]
+        ; ############### Calculate word 3
+        xor     ebp,ebp
+        ; sqr a[3]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [8+esi]
+        adc     ebp,0
+        mov     edx,DWORD [4+esi]
+        ; sqr a[2]*a[1]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [16+esi]
+        adc     ebp,0
+        mov     DWORD [12+edi],ebx
+        mov     edx,DWORD [esi]
+        ; saved r[3]
+        ; ############### Calculate word 4
+        xor     ebx,ebx
+        ; sqr a[4]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [12+esi]
+        adc     ebx,0
+        mov     edx,DWORD [4+esi]
+        ; sqr a[3]*a[1]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [8+esi]
+        adc     ebx,0
+        ; sqr a[2]*a[2]
+        mul     eax
+        add     ecx,eax
+        adc     ebp,edx
+        mov     edx,DWORD [esi]
+        adc     ebx,0
+        mov     DWORD [16+edi],ecx
+        mov     eax,DWORD [20+esi]
+        ; saved r[4]
+        ; ############### Calculate word 5
+        xor     ecx,ecx
+        ; sqr a[5]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [16+esi]
+        adc     ecx,0
+        mov     edx,DWORD [4+esi]
+        ; sqr a[4]*a[1]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [12+esi]
+        adc     ecx,0
+        mov     edx,DWORD [8+esi]
+        ; sqr a[3]*a[2]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [24+esi]
+        adc     ecx,0
+        mov     DWORD [20+edi],ebp
+        mov     edx,DWORD [esi]
+        ; saved r[5]
+        ; ############### Calculate word 6
+        xor     ebp,ebp
+        ; sqr a[6]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [20+esi]
+        adc     ebp,0
+        mov     edx,DWORD [4+esi]
+        ; sqr a[5]*a[1]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [16+esi]
+        adc     ebp,0
+        mov     edx,DWORD [8+esi]
+        ; sqr a[4]*a[2]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [12+esi]
+        adc     ebp,0
+        ; sqr a[3]*a[3]
+        mul     eax
+        add     ebx,eax
+        adc     ecx,edx
+        mov     edx,DWORD [esi]
+        adc     ebp,0
+        mov     DWORD [24+edi],ebx
+        mov     eax,DWORD [28+esi]
+        ; saved r[6]
+        ; ############### Calculate word 7
+        xor     ebx,ebx
+        ; sqr a[7]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [24+esi]
+        adc     ebx,0
+        mov     edx,DWORD [4+esi]
+        ; sqr a[6]*a[1]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [20+esi]
+        adc     ebx,0
+        mov     edx,DWORD [8+esi]
+        ; sqr a[5]*a[2]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [16+esi]
+        adc     ebx,0
+        mov     edx,DWORD [12+esi]
+        ; sqr a[4]*a[3]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [28+esi]
+        adc     ebx,0
+        mov     DWORD [28+edi],ecx
+        mov     edx,DWORD [4+esi]
+        ; saved r[7]
+        ; ############### Calculate word 8
+        xor     ecx,ecx
+        ; sqr a[7]*a[1]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [24+esi]
+        adc     ecx,0
+        mov     edx,DWORD [8+esi]
+        ; sqr a[6]*a[2]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [20+esi]
+        adc     ecx,0
+        mov     edx,DWORD [12+esi]
+        ; sqr a[5]*a[3]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [16+esi]
+        adc     ecx,0
+        ; sqr a[4]*a[4]
+        mul     eax
+        add     ebp,eax
+        adc     ebx,edx
+        mov     edx,DWORD [8+esi]
+        adc     ecx,0
+        mov     DWORD [32+edi],ebp
+        mov     eax,DWORD [28+esi]
+        ; saved r[8]
+        ; ############### Calculate word 9
+        xor     ebp,ebp
+        ; sqr a[7]*a[2]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [24+esi]
+        adc     ebp,0
+        mov     edx,DWORD [12+esi]
+        ; sqr a[6]*a[3]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [20+esi]
+        adc     ebp,0
+        mov     edx,DWORD [16+esi]
+        ; sqr a[5]*a[4]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [28+esi]
+        adc     ebp,0
+        mov     DWORD [36+edi],ebx
+        mov     edx,DWORD [12+esi]
+        ; saved r[9]
+        ; ############### Calculate word 10
+        xor     ebx,ebx
+        ; sqr a[7]*a[3]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [24+esi]
+        adc     ebx,0
+        mov     edx,DWORD [16+esi]
+        ; sqr a[6]*a[4]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [20+esi]
+        adc     ebx,0
+        ; sqr a[5]*a[5]
+        mul     eax
+        add     ecx,eax
+        adc     ebp,edx
+        mov     edx,DWORD [16+esi]
+        adc     ebx,0
+        mov     DWORD [40+edi],ecx
+        mov     eax,DWORD [28+esi]
+        ; saved r[10]
+        ; ############### Calculate word 11
+        xor     ecx,ecx
+        ; sqr a[7]*a[4]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [24+esi]
+        adc     ecx,0
+        mov     edx,DWORD [20+esi]
+        ; sqr a[6]*a[5]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [28+esi]
+        adc     ecx,0
+        mov     DWORD [44+edi],ebp
+        mov     edx,DWORD [20+esi]
+        ; saved r[11]
+        ; ############### Calculate word 12
+        xor     ebp,ebp
+        ; sqr a[7]*a[5]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [24+esi]
+        adc     ebp,0
+        ; sqr a[6]*a[6]
+        mul     eax
+        add     ebx,eax
+        adc     ecx,edx
+        mov     edx,DWORD [24+esi]
+        adc     ebp,0
+        mov     DWORD [48+edi],ebx
+        mov     eax,DWORD [28+esi]
+        ; saved r[12]
+        ; ############### Calculate word 13
+        xor     ebx,ebx
+        ; sqr a[7]*a[6]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [28+esi]
+        adc     ebx,0
+        mov     DWORD [52+edi],ecx
+        ; saved r[13]
+        ; ############### Calculate word 14
+        xor     ecx,ecx
+        ; sqr a[7]*a[7]
+        mul     eax
+        add     ebp,eax
+        adc     ebx,edx
+        adc     ecx,0
+        mov     DWORD [56+edi],ebp
+        ; saved r[14]
+        mov     DWORD [60+edi],ebx
+        pop     ebx
+        pop     ebp
+        pop     edi
+        pop     esi
+        ret
+global  _bn_sqr_comba4
+align   16
+_bn_sqr_comba4:
+L$_bn_sqr_comba4_begin:
+        push    esi
+        push    edi
+        push    ebp
+        push    ebx
+        mov     edi,DWORD [20+esp]
+        mov     esi,DWORD [24+esp]
+        xor     ebx,ebx
+        xor     ecx,ecx
+        mov     eax,DWORD [esi]
+        ; ############### Calculate word 0
+        xor     ebp,ebp
+        ; sqr a[0]*a[0]
+        mul     eax
+        add     ebx,eax
+        adc     ecx,edx
+        mov     edx,DWORD [esi]
+        adc     ebp,0
+        mov     DWORD [edi],ebx
+        mov     eax,DWORD [4+esi]
+        ; saved r[0]
+        ; ############### Calculate word 1
+        xor     ebx,ebx
+        ; sqr a[1]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [8+esi]
+        adc     ebx,0
+        mov     DWORD [4+edi],ecx
+        mov     edx,DWORD [esi]
+        ; saved r[1]
+        ; ############### Calculate word 2
+        xor     ecx,ecx
+        ; sqr a[2]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [4+esi]
+        adc     ecx,0
+        ; sqr a[1]*a[1]
+        mul     eax
+        add     ebp,eax
+        adc     ebx,edx
+        mov     edx,DWORD [esi]
+        adc     ecx,0
+        mov     DWORD [8+edi],ebp
+        mov     eax,DWORD [12+esi]
+        ; saved r[2]
+        ; ############### Calculate word 3
+        xor     ebp,ebp
+        ; sqr a[3]*a[0]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [8+esi]
+        adc     ebp,0
+        mov     edx,DWORD [4+esi]
+        ; sqr a[2]*a[1]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebp,0
+        add     ebx,eax
+        adc     ecx,edx
+        mov     eax,DWORD [12+esi]
+        adc     ebp,0
+        mov     DWORD [12+edi],ebx
+        mov     edx,DWORD [4+esi]
+        ; saved r[3]
+        ; ############### Calculate word 4
+        xor     ebx,ebx
+        ; sqr a[3]*a[1]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ebx,0
+        add     ecx,eax
+        adc     ebp,edx
+        mov     eax,DWORD [8+esi]
+        adc     ebx,0
+        ; sqr a[2]*a[2]
+        mul     eax
+        add     ecx,eax
+        adc     ebp,edx
+        mov     edx,DWORD [8+esi]
+        adc     ebx,0
+        mov     DWORD [16+edi],ecx
+        mov     eax,DWORD [12+esi]
+        ; saved r[4]
+        ; ############### Calculate word 5
+        xor     ecx,ecx
+        ; sqr a[3]*a[2]
+        mul     edx
+        add     eax,eax
+        adc     edx,edx
+        adc     ecx,0
+        add     ebp,eax
+        adc     ebx,edx
+        mov     eax,DWORD [12+esi]
+        adc     ecx,0
+        mov     DWORD [20+edi],ebp
+        ; saved r[5]
+        ; ############### Calculate word 6
+        xor     ebp,ebp
+        ; sqr a[3]*a[3]
+        mul     eax
+        add     ebx,eax
+        adc     ecx,edx
+        adc     ebp,0
+        mov     DWORD [24+edi],ebx
+        ; saved r[6]
+        mov     DWORD [28+edi],ecx
+        pop     ebx
+        pop     ebp
+        pop     edi
+        pop     esi
+        ret
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm
new file mode 100644
index 0000000000..5f2f4f65de
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm
@@ -0,0 +1,352 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+;extern _OPENSSL_ia32cap_P
+align   16
+__mul_1x1_mmx:
+        sub     esp,36
+        mov     ecx,eax
+        lea     edx,[eax*1+eax]
+        and     ecx,1073741823
+        lea     ebp,[edx*1+edx]
+        mov     DWORD [esp],0
+        and     edx,2147483647
+        movd    mm2,eax
+        movd    mm3,ebx
+        mov     DWORD [4+esp],ecx
+        xor     ecx,edx
+        pxor    mm5,mm5
+        pxor    mm4,mm4
+        mov     DWORD [8+esp],edx
+        xor     edx,ebp
+        mov     DWORD [12+esp],ecx
+        pcmpgtd mm5,mm2
+        paddd   mm2,mm2
+        xor     ecx,edx
+        mov     DWORD [16+esp],ebp
+        xor     ebp,edx
+        pand    mm5,mm3
+        pcmpgtd mm4,mm2
+        mov     DWORD [20+esp],ecx
+        xor     ebp,ecx
+        psllq   mm5,31
+        pand    mm4,mm3
+        mov     DWORD [24+esp],edx
+        mov     esi,7
+        mov     DWORD [28+esp],ebp
+        mov     ebp,esi
+        and     esi,ebx
+        shr     ebx,3
+        mov     edi,ebp
+        psllq   mm4,30
+        and     edi,ebx
+        shr     ebx,3
+        movd    mm0,DWORD [esi*4+esp]
+        mov     esi,ebp
+        and     esi,ebx
+        shr     ebx,3
+        movd    mm2,DWORD [edi*4+esp]
+        mov     edi,ebp
+        psllq   mm2,3
+        and     edi,ebx
+        shr     ebx,3
+        pxor    mm0,mm2
+        movd    mm1,DWORD [esi*4+esp]
+        mov     esi,ebp
+        psllq   mm1,6
+        and     esi,ebx
+        shr     ebx,3
+        pxor    mm0,mm1
+        movd    mm2,DWORD [edi*4+esp]
+        mov     edi,ebp
+        psllq   mm2,9
+        and     edi,ebx
+        shr     ebx,3
+        pxor    mm0,mm2
+        movd    mm1,DWORD [esi*4+esp]
+        mov     esi,ebp
+        psllq   mm1,12
+        and     esi,ebx
+        shr     ebx,3
+        pxor    mm0,mm1
+        movd    mm2,DWORD [edi*4+esp]
+        mov     edi,ebp
+        psllq   mm2,15
+        and     edi,ebx
+        shr     ebx,3
+        pxor    mm0,mm2
+        movd    mm1,DWORD [esi*4+esp]
+        mov     esi,ebp
+        psllq   mm1,18
+        and     esi,ebx
+        shr     ebx,3
+        pxor    mm0,mm1
+        movd    mm2,DWORD [edi*4+esp]
+        mov     edi,ebp
+        psllq   mm2,21
+        and     edi,ebx
+        shr     ebx,3
+        pxor    mm0,mm2
+        movd    mm1,DWORD [esi*4+esp]
+        mov     esi,ebp
+        psllq   mm1,24
+        and     esi,ebx
+        shr     ebx,3
+        pxor    mm0,mm1
+        movd    mm2,DWORD [edi*4+esp]
+        pxor    mm0,mm4
+        psllq   mm2,27
+        pxor    mm0,mm2
+        movd    mm1,DWORD [esi*4+esp]
+        pxor    mm0,mm5
+        psllq   mm1,30
+        add     esp,36
+        pxor    mm0,mm1
+        ret
+align   16
+__mul_1x1_ialu:
+        sub     esp,36
+        mov     ecx,eax
+        lea     edx,[eax*1+eax]
+        lea     ebp,[eax*4]
+        and     ecx,1073741823
+        lea     edi,[eax*1+eax]
+        sar     eax,31
+        mov     DWORD [esp],0
+        and     edx,2147483647
+        mov     DWORD [4+esp],ecx
+        xor     ecx,edx
+        mov     DWORD [8+esp],edx
+        xor     edx,ebp
+        mov     DWORD [12+esp],ecx
+        xor     ecx,edx
+        mov     DWORD [16+esp],ebp
+        xor     ebp,edx
+        mov     DWORD [20+esp],ecx
+        xor     ebp,ecx
+        sar     edi,31
+        and     eax,ebx
+        mov     DWORD [24+esp],edx
+        and     edi,ebx
+        mov     DWORD [28+esp],ebp
+        mov     edx,eax
+        shl     eax,31
+        mov     ecx,edi
+        shr     edx,1
+        mov     esi,7
+        shl     edi,30
+        and     esi,ebx
+        shr     ecx,2
+        xor     eax,edi
+        shr     ebx,3
+        mov     edi,7
+        and     edi,ebx
+        shr     ebx,3
+        xor     edx,ecx
+        xor     eax,DWORD [esi*4+esp]
+        mov     esi,7
+        and     esi,ebx
+        shr     ebx,3
+        mov     ebp,DWORD [edi*4+esp]
+        mov     edi,7
+        mov     ecx,ebp
+        shl     ebp,3
+        and     edi,ebx
+        shr     ecx,29
+        xor     eax,ebp
+        shr     ebx,3
+        xor     edx,ecx
+        mov     ecx,DWORD [esi*4+esp]
+        mov     esi,7
+        mov     ebp,ecx
+        shl     ecx,6
+        and     esi,ebx
+        shr     ebp,26
+        xor     eax,ecx
+        shr     ebx,3
+        xor     edx,ebp
+        mov     ebp,DWORD [edi*4+esp]
+        mov     edi,7
+        mov     ecx,ebp
+        shl     ebp,9
+        and     edi,ebx
+        shr     ecx,23
+        xor     eax,ebp
+        shr     ebx,3
+        xor     edx,ecx
+        mov     ecx,DWORD [esi*4+esp]
+        mov     esi,7
+        mov     ebp,ecx
+        shl     ecx,12
+        and     esi,ebx
+        shr     ebp,20
+        xor     eax,ecx
+        shr     ebx,3
+        xor     edx,ebp
+        mov     ebp,DWORD [edi*4+esp]
+        mov     edi,7
+        mov     ecx,ebp
+        shl     ebp,15
+        and     edi,ebx
+        shr     ecx,17
+        xor     eax,ebp
+        shr     ebx,3
+        xor     edx,ecx
+        mov     ecx,DWORD [esi*4+esp]
+        mov     esi,7
+        mov     ebp,ecx
+        shl     ecx,18
+        and     esi,ebx
+        shr     ebp,14
+        xor     eax,ecx
+        shr     ebx,3
+        xor     edx,ebp
+        mov     ebp,DWORD [edi*4+esp]
+        mov     edi,7
+        mov     ecx,ebp
+        shl     ebp,21
+        and     edi,ebx
+        shr     ecx,11
+        xor     eax,ebp
+        shr     ebx,3
+        xor     edx,ecx
+        mov     ecx,DWORD [esi*4+esp]
+        mov     esi,7
+        mov     ebp,ecx
+        shl     ecx,24
+        and     esi,ebx
+        shr     ebp,8
+        xor     eax,ecx
+        shr     ebx,3
+        xor     edx,ebp
+        mov     ebp,DWORD [edi*4+esp]
+        mov     ecx,ebp
+        shl     ebp,27
+        mov     edi,DWORD [esi*4+esp]
+        shr     ecx,5
+        mov     esi,edi
+        xor     eax,ebp
+        shl     edi,30
+        xor     edx,ecx
+        shr     esi,2
+        xor     eax,edi
+        xor     edx,esi
+        add     esp,36
+        ret
+global  _bn_GF2m_mul_2x2
+align   16
+_bn_GF2m_mul_2x2:
+L$_bn_GF2m_mul_2x2_begin:
+        lea     edx,[_OPENSSL_ia32cap_P]
+        mov     eax,DWORD [edx]
+        mov     edx,DWORD [4+edx]
+        test    eax,8388608
+        jz      NEAR L$000ialu
+        test    eax,16777216
+        jz      NEAR L$001mmx
+        test    edx,2
+        jz      NEAR L$001mmx
+        movups  xmm0,[8+esp]
+        shufps  xmm0,xmm0,177
+db      102,15,58,68,192,1
+        mov     eax,DWORD [4+esp]
+        movups  [eax],xmm0
+        ret
+align   16
+L$001mmx:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     eax,DWORD [24+esp]
+        mov     ebx,DWORD [32+esp]
+        call    __mul_1x1_mmx
+        movq    mm7,mm0
+        mov     eax,DWORD [28+esp]
+        mov     ebx,DWORD [36+esp]
+        call    __mul_1x1_mmx
+        movq    mm6,mm0
+        mov     eax,DWORD [24+esp]
+        mov     ebx,DWORD [32+esp]
+        xor     eax,DWORD [28+esp]
+        xor     ebx,DWORD [36+esp]
+        call    __mul_1x1_mmx
+        pxor    mm0,mm7
+        mov     eax,DWORD [20+esp]
+        pxor    mm0,mm6
+        movq    mm2,mm0
+        psllq   mm0,32
+        pop     edi
+        psrlq   mm2,32
+        pop     esi
+        pxor    mm0,mm6
+        pop     ebx
+        pxor    mm2,mm7
+        movq    [eax],mm0
+        pop     ebp
+        movq    [8+eax],mm2
+        emms
+        ret
+align   16
+L$000ialu:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        sub     esp,20
+        mov     eax,DWORD [44+esp]
+        mov     ebx,DWORD [52+esp]
+        call    __mul_1x1_ialu
+        mov     DWORD [8+esp],eax
+        mov     DWORD [12+esp],edx
+        mov     eax,DWORD [48+esp]
+        mov     ebx,DWORD [56+esp]
+        call    __mul_1x1_ialu
+        mov     DWORD [esp],eax
+        mov     DWORD [4+esp],edx
+        mov     eax,DWORD [44+esp]
+        mov     ebx,DWORD [52+esp]
+        xor     eax,DWORD [48+esp]
+        xor     ebx,DWORD [56+esp]
+        call    __mul_1x1_ialu
+        mov     ebp,DWORD [40+esp]
+        mov     ebx,DWORD [esp]
+        mov     ecx,DWORD [4+esp]
+        mov     edi,DWORD [8+esp]
+        mov     esi,DWORD [12+esp]
+        xor     eax,edx
+        xor     edx,ecx
+        xor     eax,ebx
+        mov     DWORD [ebp],ebx
+        xor     edx,edi
+        mov     DWORD [12+ebp],esi
+        xor     eax,esi
+        add     esp,20
+        xor     edx,esi
+        pop     edi
+        xor     eax,edx
+        pop     esi
+        mov     DWORD [8+ebp],edx
+        pop     ebx
+        mov     DWORD [4+ebp],eax
+        pop     ebp
+        ret
+db      71,70,40,50,94,109,41,32,77,117,108,116,105,112,108,105
+db      99,97,116,105,111,110,32,102,111,114,32,120,56,54,44,32
+db      67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97
+db      112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103
+db      62,0
+segment .bss
+common  _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm
new file mode 100644
index 0000000000..904526ffbf
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm
@@ -0,0 +1,486 @@
+; Copyright 2005-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+;extern _OPENSSL_ia32cap_P
+global  _bn_mul_mont
+align   16
+_bn_mul_mont:
+L$_bn_mul_mont_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        xor     eax,eax
+        mov     edi,DWORD [40+esp]
+        cmp     edi,4
+        jl      NEAR L$000just_leave
+        lea     esi,[20+esp]
+        lea     edx,[24+esp]
+        add     edi,2
+        neg     edi
+        lea     ebp,[edi*4+esp-32]
+        neg     edi
+        mov     eax,ebp
+        sub     eax,edx
+        and     eax,2047
+        sub     ebp,eax
+        xor     edx,ebp
+        and     edx,2048
+        xor     edx,2048
+        sub     ebp,edx
+        and     ebp,-64
+        mov     eax,esp
+        sub     eax,ebp
+        and     eax,-4096
+        mov     edx,esp
+        lea     esp,[eax*1+ebp]
+        mov     eax,DWORD [esp]
+        cmp     esp,ebp
+        ja      NEAR L$001page_walk
+        jmp     NEAR L$002page_walk_done
+align   16
+L$001page_walk:
+        lea     esp,[esp-4096]
+        mov     eax,DWORD [esp]
+        cmp     esp,ebp
+        ja      NEAR L$001page_walk
+L$002page_walk_done:
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     ecx,DWORD [8+esi]
+        mov     ebp,DWORD [12+esi]
+        mov     esi,DWORD [16+esi]
+        mov     esi,DWORD [esi]
+        mov     DWORD [4+esp],eax
+        mov     DWORD [8+esp],ebx
+        mov     DWORD [12+esp],ecx
+        mov     DWORD [16+esp],ebp
+        mov     DWORD [20+esp],esi
+        lea     ebx,[edi-3]
+        mov     DWORD [24+esp],edx
+        lea     eax,[_OPENSSL_ia32cap_P]
+        bt      DWORD [eax],26
+        jnc     NEAR L$003non_sse2
+        mov     eax,-1
+        movd    mm7,eax
+        mov     esi,DWORD [8+esp]
+        mov     edi,DWORD [12+esp]
+        mov     ebp,DWORD [16+esp]
+        xor     edx,edx
+        xor     ecx,ecx
+        movd    mm4,DWORD [edi]
+        movd    mm5,DWORD [esi]
+        movd    mm3,DWORD [ebp]
+        pmuludq mm5,mm4
+        movq    mm2,mm5
+        movq    mm0,mm5
+        pand    mm0,mm7
+        pmuludq mm5,[20+esp]
+        pmuludq mm3,mm5
+        paddq   mm3,mm0
+        movd    mm1,DWORD [4+ebp]
+        movd    mm0,DWORD [4+esi]
+        psrlq   mm2,32
+        psrlq   mm3,32
+        inc     ecx
+align   16
+L$0041st:
+        pmuludq mm0,mm4
+        pmuludq mm1,mm5
+        paddq   mm2,mm0
+        paddq   mm3,mm1
+        movq    mm0,mm2
+        pand    mm0,mm7
+        movd    mm1,DWORD [4+ecx*4+ebp]
+        paddq   mm3,mm0
+        movd    mm0,DWORD [4+ecx*4+esi]
+        psrlq   mm2,32
+        movd    DWORD [28+ecx*4+esp],mm3
+        psrlq   mm3,32
+        lea     ecx,[1+ecx]
+        cmp     ecx,ebx
+        jl      NEAR L$0041st
+        pmuludq mm0,mm4
+        pmuludq mm1,mm5
+        paddq   mm2,mm0
+        paddq   mm3,mm1
+        movq    mm0,mm2
+        pand    mm0,mm7
+        paddq   mm3,mm0
+        movd    DWORD [28+ecx*4+esp],mm3
+        psrlq   mm2,32
+        psrlq   mm3,32
+        paddq   mm3,mm2
+        movq    [32+ebx*4+esp],mm3
+        inc     edx
+L$005outer:
+        xor     ecx,ecx
+        movd    mm4,DWORD [edx*4+edi]
+        movd    mm5,DWORD [esi]
+        movd    mm6,DWORD [32+esp]
+        movd    mm3,DWORD [ebp]
+        pmuludq mm5,mm4
+        paddq   mm5,mm6
+        movq    mm0,mm5
+        movq    mm2,mm5
+        pand    mm0,mm7
+        pmuludq mm5,[20+esp]
+        pmuludq mm3,mm5
+        paddq   mm3,mm0
+        movd    mm6,DWORD [36+esp]
+        movd    mm1,DWORD [4+ebp]
+        movd    mm0,DWORD [4+esi]
+        psrlq   mm2,32
+        psrlq   mm3,32
+        paddq   mm2,mm6
+        inc     ecx
+        dec     ebx
+L$006inner:
+        pmuludq mm0,mm4
+        pmuludq mm1,mm5
+        paddq   mm2,mm0
+        paddq   mm3,mm1
+        movq    mm0,mm2
+        movd    mm6,DWORD [36+ecx*4+esp]
+        pand    mm0,mm7
+        movd    mm1,DWORD [4+ecx*4+ebp]
+        paddq   mm3,mm0
+        movd    mm0,DWORD [4+ecx*4+esi]
+        psrlq   mm2,32
+        movd    DWORD [28+ecx*4+esp],mm3
+        psrlq   mm3,32
+        paddq   mm2,mm6
+        dec     ebx
+        lea     ecx,[1+ecx]
+        jnz     NEAR L$006inner
+        mov     ebx,ecx
+        pmuludq mm0,mm4
+        pmuludq mm1,mm5
+        paddq   mm2,mm0
+        paddq   mm3,mm1
+        movq    mm0,mm2
+        pand    mm0,mm7
+        paddq   mm3,mm0
+        movd    DWORD [28+ecx*4+esp],mm3
+        psrlq   mm2,32
+        psrlq   mm3,32
+        movd    mm6,DWORD [36+ebx*4+esp]
+        paddq   mm3,mm2
+        paddq   mm3,mm6
+        movq    [32+ebx*4+esp],mm3
+        lea     edx,[1+edx]
+        cmp     edx,ebx
+        jle     NEAR L$005outer
+        emms
+        jmp     NEAR L$007common_tail
+align   16
+L$003non_sse2:
+        mov     esi,DWORD [8+esp]
+        lea     ebp,[1+ebx]
+        mov     edi,DWORD [12+esp]
+        xor     ecx,ecx
+        mov     edx,esi
+        and     ebp,1
+        sub     edx,edi
+        lea     eax,[4+ebx*4+edi]
+        or      ebp,edx
+        mov     edi,DWORD [edi]
+        jz      NEAR L$008bn_sqr_mont
+        mov     DWORD [28+esp],eax
+        mov     eax,DWORD [esi]
+        xor     edx,edx
+align   16
+L$009mull:
+        mov     ebp,edx
+        mul     edi
+        add     ebp,eax
+        lea     ecx,[1+ecx]
+        adc     edx,0
+        mov     eax,DWORD [ecx*4+esi]
+        cmp     ecx,ebx
+        mov     DWORD [28+ecx*4+esp],ebp
+        jl      NEAR L$009mull
+        mov     ebp,edx
+        mul     edi
+        mov     edi,DWORD [20+esp]
+        add     eax,ebp
+        mov     esi,DWORD [16+esp]
+        adc     edx,0
+        imul    edi,DWORD [32+esp]
+        mov     DWORD [32+ebx*4+esp],eax
+        xor     ecx,ecx
+        mov     DWORD [36+ebx*4+esp],edx
+        mov     DWORD [40+ebx*4+esp],ecx
+        mov     eax,DWORD [esi]
+        mul     edi
+        add     eax,DWORD [32+esp]
+        mov     eax,DWORD [4+esi]
+        adc     edx,0
+        inc     ecx
+        jmp     NEAR L$0102ndmadd
+align   16
+L$0111stmadd:
+        mov     ebp,edx
+        mul     edi
+        add     ebp,DWORD [32+ecx*4+esp]
+        lea     ecx,[1+ecx]
+        adc     edx,0
+        add     ebp,eax
+        mov     eax,DWORD [ecx*4+esi]
+        adc     edx,0
+        cmp     ecx,ebx
+        mov     DWORD [28+ecx*4+esp],ebp
+        jl      NEAR L$0111stmadd
+        mov     ebp,edx
+        mul     edi
+        add     eax,DWORD [32+ebx*4+esp]
+        mov     edi,DWORD [20+esp]
+        adc     edx,0
+        mov     esi,DWORD [16+esp]
+        add     ebp,eax
+        adc     edx,0
+        imul    edi,DWORD [32+esp]
+        xor     ecx,ecx
+        add     edx,DWORD [36+ebx*4+esp]
+        mov     DWORD [32+ebx*4+esp],ebp
+        adc     ecx,0
+        mov     eax,DWORD [esi]
+        mov     DWORD [36+ebx*4+esp],edx
+        mov     DWORD [40+ebx*4+esp],ecx
+        mul     edi
+        add     eax,DWORD [32+esp]
+        mov     eax,DWORD [4+esi]
+        adc     edx,0
+        mov     ecx,1
+align   16
+L$0102ndmadd:
+        mov     ebp,edx
+        mul     edi
+        add     ebp,DWORD [32+ecx*4+esp]
+        lea     ecx,[1+ecx]
+        adc     edx,0
+        add     ebp,eax
+        mov     eax,DWORD [ecx*4+esi]
+        adc     edx,0
+        cmp     ecx,ebx
+        mov     DWORD [24+ecx*4+esp],ebp
+        jl      NEAR L$0102ndmadd
+        mov     ebp,edx
+        mul     edi
+        add     ebp,DWORD [32+ebx*4+esp]
+        adc     edx,0
+        add     ebp,eax
+        adc     edx,0
+        mov     DWORD [28+ebx*4+esp],ebp
+        xor     eax,eax
+        mov     ecx,DWORD [12+esp]
+        add     edx,DWORD [36+ebx*4+esp]
+        adc     eax,DWORD [40+ebx*4+esp]
+        lea     ecx,[4+ecx]
+        mov     DWORD [32+ebx*4+esp],edx
+        cmp     ecx,DWORD [28+esp]
+        mov     DWORD [36+ebx*4+esp],eax
+        je      NEAR L$007common_tail
+        mov     edi,DWORD [ecx]
+        mov     esi,DWORD [8+esp]
+        mov     DWORD [12+esp],ecx
+        xor     ecx,ecx
+        xor     edx,edx
+        mov     eax,DWORD [esi]
+        jmp     NEAR L$0111stmadd
+align   16
+L$008bn_sqr_mont:
+        mov     DWORD [esp],ebx
+        mov     DWORD [12+esp],ecx
+        mov     eax,edi
+        mul     edi
+        mov     DWORD [32+esp],eax
+        mov     ebx,edx
+        shr     edx,1
+        and     ebx,1
+        inc     ecx
+align   16
+L$012sqr:
+        mov     eax,DWORD [ecx*4+esi]
+        mov     ebp,edx
+        mul     edi
+        add     eax,ebp
+        lea     ecx,[1+ecx]
+        adc     edx,0
+        lea     ebp,[eax*2+ebx]
+        shr     eax,31
+        cmp     ecx,DWORD [esp]
+        mov     ebx,eax
+        mov     DWORD [28+ecx*4+esp],ebp
+        jl      NEAR L$012sqr
+        mov     eax,DWORD [ecx*4+esi]
+        mov     ebp,edx
+        mul     edi
+        add     eax,ebp
+        mov     edi,DWORD [20+esp]
+        adc     edx,0
+        mov     esi,DWORD [16+esp]
+        lea     ebp,[eax*2+ebx]
+        imul    edi,DWORD [32+esp]
+        shr     eax,31
+        mov     DWORD [32+ecx*4+esp],ebp
+        lea     ebp,[edx*2+eax]
+        mov     eax,DWORD [esi]
+        shr     edx,31
+        mov     DWORD [36+ecx*4+esp],ebp
+        mov     DWORD [40+ecx*4+esp],edx
+        mul     edi
+        add     eax,DWORD [32+esp]
+        mov     ebx,ecx
+        adc     edx,0
+        mov     eax,DWORD [4+esi]
+        mov     ecx,1
+align   16
+L$0133rdmadd:
+        mov     ebp,edx
+        mul     edi
+        add     ebp,DWORD [32+ecx*4+esp]
+        adc     edx,0
+        add     ebp,eax
+        mov     eax,DWORD [4+ecx*4+esi]
+        adc     edx,0
+        mov     DWORD [28+ecx*4+esp],ebp
+        mov     ebp,edx
+        mul     edi
+        add     ebp,DWORD [36+ecx*4+esp]
+        lea     ecx,[2+ecx]
+        adc     edx,0
+        add     ebp,eax
+        mov     eax,DWORD [ecx*4+esi]
+        adc     edx,0
+        cmp     ecx,ebx
+        mov     DWORD [24+ecx*4+esp],ebp
+        jl      NEAR L$0133rdmadd
+        mov     ebp,edx
+        mul     edi
+        add     ebp,DWORD [32+ebx*4+esp]
+        adc     edx,0
+        add     ebp,eax
+        adc     edx,0
+        mov     DWORD [28+ebx*4+esp],ebp
+        mov     ecx,DWORD [12+esp]
+        xor     eax,eax
+        mov     esi,DWORD [8+esp]
+        add     edx,DWORD [36+ebx*4+esp]
+        adc     eax,DWORD [40+ebx*4+esp]
+        mov     DWORD [32+ebx*4+esp],edx
+        cmp     ecx,ebx
+        mov     DWORD [36+ebx*4+esp],eax
+        je      NEAR L$007common_tail
+        mov     edi,DWORD [4+ecx*4+esi]
+        lea     ecx,[1+ecx]
+        mov     eax,edi
+        mov     DWORD [12+esp],ecx
+        mul     edi
+        add     eax,DWORD [32+ecx*4+esp]
+        adc     edx,0
+        mov     DWORD [32+ecx*4+esp],eax
+        xor     ebp,ebp
+        cmp     ecx,ebx
+        lea     ecx,[1+ecx]
+        je      NEAR L$014sqrlast
+        mov     ebx,edx
+        shr     edx,1
+        and     ebx,1
+align   16
+L$015sqradd:
+        mov     eax,DWORD [ecx*4+esi]
+        mov     ebp,edx
+        mul     edi
+        add     eax,ebp
+        lea     ebp,[eax*1+eax]
+        adc     edx,0
+        shr     eax,31
+        add     ebp,DWORD [32+ecx*4+esp]
+        lea     ecx,[1+ecx]
+        adc     eax,0
+        add     ebp,ebx
+        adc     eax,0
+        cmp     ecx,DWORD [esp]
+        mov     DWORD [28+ecx*4+esp],ebp
+        mov     ebx,eax
+        jle     NEAR L$015sqradd
+        mov     ebp,edx
+        add     edx,edx
+        shr     ebp,31
+        add     edx,ebx
+        adc     ebp,0
+L$014sqrlast:
+        mov     edi,DWORD [20+esp]
+        mov     esi,DWORD [16+esp]
+        imul    edi,DWORD [32+esp]
+        add     edx,DWORD [32+ecx*4+esp]
+        mov     eax,DWORD [esi]
+        adc     ebp,0
+        mov     DWORD [32+ecx*4+esp],edx
+        mov     DWORD [36+ecx*4+esp],ebp
+        mul     edi
+        add     eax,DWORD [32+esp]
+        lea     ebx,[ecx-1]
+        adc     edx,0
+        mov     ecx,1
+        mov     eax,DWORD [4+esi]
+        jmp     NEAR L$0133rdmadd
+align   16
+L$007common_tail:
+        mov     ebp,DWORD [16+esp]
+        mov     edi,DWORD [4+esp]
+        lea     esi,[32+esp]
+        mov     eax,DWORD [esi]
+        mov     ecx,ebx
+        xor     edx,edx
+align   16
+L$016sub:
+        sbb     eax,DWORD [edx*4+ebp]
+        mov     DWORD [edx*4+edi],eax
+        dec     ecx
+        mov     eax,DWORD [4+edx*4+esi]
+        lea     edx,[1+edx]
+        jge     NEAR L$016sub
+        sbb     eax,0
+        mov     edx,-1
+        xor     edx,eax
+        jmp     NEAR L$017copy
+align   16
+L$017copy:
+        mov     esi,DWORD [32+ebx*4+esp]
+        mov     ebp,DWORD [ebx*4+edi]
+        mov     DWORD [32+ebx*4+esp],ecx
+        and     esi,eax
+        and     ebp,edx
+        or      ebp,esi
+        mov     DWORD [ebx*4+edi],ebp
+        dec     ebx
+        jge     NEAR L$017copy
+        mov     esp,DWORD [24+esp]
+        mov     eax,1
+L$000just_leave:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+db      77,111,110,116,103,111,109,101,114,121,32,77,117,108,116,105
+db      112,108,105,99,97,116,105,111,110,32,102,111,114,32,120,56
+db      54,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121
+db      32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46
+db      111,114,103,62,0
+segment .bss
+common  _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm
new file mode 100644
index 0000000000..dd69f436c4
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm
@@ -0,0 +1,887 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+extern  _DES_SPtrans
+global  _fcrypt_body
+align   16
+_fcrypt_body:
+L$_fcrypt_body_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        ;
+        ; Load the 2 words
+        xor     edi,edi
+        xor     esi,esi
+        lea     edx,[_DES_SPtrans]
+        push    edx
+        mov     ebp,DWORD [28+esp]
+        push    DWORD 25
+L$000start:
+        ;
+        ; Round 0
+        mov     eax,DWORD [36+esp]
+        mov     edx,esi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,esi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [4+ebp]
+        xor     eax,esi
+        xor     edx,esi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     edi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 1
+        mov     eax,DWORD [36+esp]
+        mov     edx,edi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,edi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [8+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [12+ebp]
+        xor     eax,edi
+        xor     edx,edi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     esi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 2
+        mov     eax,DWORD [36+esp]
+        mov     edx,esi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,esi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [16+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [20+ebp]
+        xor     eax,esi
+        xor     edx,esi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     edi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 3
+        mov     eax,DWORD [36+esp]
+        mov     edx,edi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,edi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [24+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [28+ebp]
+        xor     eax,edi
+        xor     edx,edi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     esi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 4
+        mov     eax,DWORD [36+esp]
+        mov     edx,esi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,esi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [32+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [36+ebp]
+        xor     eax,esi
+        xor     edx,esi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     edi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 5
+        mov     eax,DWORD [36+esp]
+        mov     edx,edi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,edi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [40+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [44+ebp]
+        xor     eax,edi
+        xor     edx,edi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     esi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 6
+        mov     eax,DWORD [36+esp]
+        mov     edx,esi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,esi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [48+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [52+ebp]
+        xor     eax,esi
+        xor     edx,esi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     edi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 7
+        mov     eax,DWORD [36+esp]
+        mov     edx,edi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,edi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [56+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [60+ebp]
+        xor     eax,edi
+        xor     edx,edi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     esi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 8
+        mov     eax,DWORD [36+esp]
+        mov     edx,esi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,esi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [64+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [68+ebp]
+        xor     eax,esi
+        xor     edx,esi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     edi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 9
+        mov     eax,DWORD [36+esp]
+        mov     edx,edi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,edi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [72+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [76+ebp]
+        xor     eax,edi
+        xor     edx,edi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     esi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 10
+        mov     eax,DWORD [36+esp]
+        mov     edx,esi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,esi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [80+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [84+ebp]
+        xor     eax,esi
+        xor     edx,esi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     edi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 11
+        mov     eax,DWORD [36+esp]
+        mov     edx,edi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,edi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [88+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [92+ebp]
+        xor     eax,edi
+        xor     edx,edi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     esi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 12
+        mov     eax,DWORD [36+esp]
+        mov     edx,esi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,esi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [96+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [100+ebp]
+        xor     eax,esi
+        xor     edx,esi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     edi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 13
+        mov     eax,DWORD [36+esp]
+        mov     edx,edi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,edi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [104+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [108+ebp]
+        xor     eax,edi
+        xor     edx,edi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     esi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 14
+        mov     eax,DWORD [36+esp]
+        mov     edx,esi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,esi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [112+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [116+ebp]
+        xor     eax,esi
+        xor     edx,esi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     edi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     edi,ebx
+        mov     ebp,DWORD [32+esp]
+        ;
+        ; Round 15
+        mov     eax,DWORD [36+esp]
+        mov     edx,edi
+        shr     edx,16
+        mov     ecx,DWORD [40+esp]
+        xor     edx,edi
+        and     eax,edx
+        and     edx,ecx
+        mov     ebx,eax
+        shl     ebx,16
+        mov     ecx,edx
+        shl     ecx,16
+        xor     eax,ebx
+        xor     edx,ecx
+        mov     ebx,DWORD [120+ebp]
+        xor     eax,ebx
+        mov     ecx,DWORD [124+ebp]
+        xor     eax,edi
+        xor     edx,edi
+        xor     edx,ecx
+        and     eax,0xfcfcfcfc
+        xor     ebx,ebx
+        and     edx,0xcfcfcfcf
+        xor     ecx,ecx
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        mov     ebp,DWORD [4+esp]
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        mov     ebx,DWORD [0x600+ebx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x700+ecx*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x400+eax*1+ebp]
+        xor     esi,ebx
+        mov     ebx,DWORD [0x500+edx*1+ebp]
+        xor     esi,ebx
+        mov     ebp,DWORD [32+esp]
+        mov     ebx,DWORD [esp]
+        mov     eax,edi
+        dec     ebx
+        mov     edi,esi
+        mov     esi,eax
+        mov     DWORD [esp],ebx
+        jnz     NEAR L$000start
+        ;
+        ; FP
+        mov     edx,DWORD [28+esp]
+        ror     edi,1
+        mov     eax,esi
+        xor     esi,edi
+        and     esi,0xaaaaaaaa
+        xor     eax,esi
+        xor     edi,esi
+        ;
+        rol     eax,23
+        mov     esi,eax
+        xor     eax,edi
+        and     eax,0x03fc03fc
+        xor     esi,eax
+        xor     edi,eax
+        ;
+        rol     esi,10
+        mov     eax,esi
+        xor     esi,edi
+        and     esi,0x33333333
+        xor     eax,esi
+        xor     edi,esi
+        ;
+        rol     edi,18
+        mov     esi,edi
+        xor     edi,eax
+        and     edi,0xfff0000f
+        xor     esi,edi
+        xor     eax,edi
+        ;
+        rol     esi,12
+        mov     edi,esi
+        xor     esi,eax
+        and     esi,0xf0f0f0f0
+        xor     edi,esi
+        xor     eax,esi
+        ;
+        ror     eax,4
+        mov     DWORD [edx],eax
+        mov     DWORD [4+edx],edi
+        add     esp,8
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm
new file mode 100644
index 0000000000..980d488316
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm
@@ -0,0 +1,1835 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+global  _DES_SPtrans
+align   16
+__x86_DES_encrypt:
+        push    ecx
+        ; Round 0
+        mov     eax,DWORD [ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [4+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 1
+        mov     eax,DWORD [8+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [12+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 2
+        mov     eax,DWORD [16+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [20+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 3
+        mov     eax,DWORD [24+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [28+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 4
+        mov     eax,DWORD [32+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [36+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 5
+        mov     eax,DWORD [40+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [44+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 6
+        mov     eax,DWORD [48+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [52+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 7
+        mov     eax,DWORD [56+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [60+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 8
+        mov     eax,DWORD [64+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [68+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 9
+        mov     eax,DWORD [72+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [76+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 10
+        mov     eax,DWORD [80+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [84+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 11
+        mov     eax,DWORD [88+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [92+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 12
+        mov     eax,DWORD [96+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [100+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 13
+        mov     eax,DWORD [104+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [108+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 14
+        mov     eax,DWORD [112+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [116+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 15
+        mov     eax,DWORD [120+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [124+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        add     esp,4
+        ret
+align   16
+__x86_DES_decrypt:
+        push    ecx
+        ; Round 15
+        mov     eax,DWORD [120+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [124+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 14
+        mov     eax,DWORD [112+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [116+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 13
+        mov     eax,DWORD [104+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [108+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 12
+        mov     eax,DWORD [96+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [100+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 11
+        mov     eax,DWORD [88+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [92+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 10
+        mov     eax,DWORD [80+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [84+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 9
+        mov     eax,DWORD [72+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [76+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 8
+        mov     eax,DWORD [64+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [68+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 7
+        mov     eax,DWORD [56+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [60+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 6
+        mov     eax,DWORD [48+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [52+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 5
+        mov     eax,DWORD [40+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [44+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 4
+        mov     eax,DWORD [32+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [36+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 3
+        mov     eax,DWORD [24+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [28+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 2
+        mov     eax,DWORD [16+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [20+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        ; Round 1
+        mov     eax,DWORD [8+ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [12+ecx]
+        xor     eax,esi
+        xor     ecx,ecx
+        xor     edx,esi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     edi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     edi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     edi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     edi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     edi,DWORD [0x600+ebx*1+ebp]
+        xor     edi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     edi,DWORD [0x400+eax*1+ebp]
+        xor     edi,DWORD [0x500+edx*1+ebp]
+        ; Round 0
+        mov     eax,DWORD [ecx]
+        xor     ebx,ebx
+        mov     edx,DWORD [4+ecx]
+        xor     eax,edi
+        xor     ecx,ecx
+        xor     edx,edi
+        and     eax,0xfcfcfcfc
+        and     edx,0xcfcfcfcf
+        mov     bl,al
+        mov     cl,ah
+        ror     edx,4
+        xor     esi,DWORD [ebx*1+ebp]
+        mov     bl,dl
+        xor     esi,DWORD [0x200+ecx*1+ebp]
+        mov     cl,dh
+        shr     eax,16
+        xor     esi,DWORD [0x100+ebx*1+ebp]
+        mov     bl,ah
+        shr     edx,16
+        xor     esi,DWORD [0x300+ecx*1+ebp]
+        mov     cl,dh
+        and     eax,0xff
+        and     edx,0xff
+        xor     esi,DWORD [0x600+ebx*1+ebp]
+        xor     esi,DWORD [0x700+ecx*1+ebp]
+        mov     ecx,DWORD [esp]
+        xor     esi,DWORD [0x400+eax*1+ebp]
+        xor     esi,DWORD [0x500+edx*1+ebp]
+        add     esp,4
+        ret
+global  _DES_encrypt1
+align   16
+_DES_encrypt1:
+L$_DES_encrypt1_begin:
+        push    esi
+        push    edi
+        ;
+        ; Load the 2 words
+        mov     esi,DWORD [12+esp]
+        xor     ecx,ecx
+        push    ebx
+        push    ebp
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [28+esp]
+        mov     edi,DWORD [4+esi]
+        ;
+        ; IP
+        rol     eax,4
+        mov     esi,eax
+        xor     eax,edi
+        and     eax,0xf0f0f0f0
+        xor     esi,eax
+        xor     edi,eax
+        ;
+        rol     edi,20
+        mov     eax,edi
+        xor     edi,esi
+        and     edi,0xfff0000f
+        xor     eax,edi
+        xor     esi,edi
+        ;
+        rol     eax,14
+        mov     edi,eax
+        xor     eax,esi
+        and     eax,0x33333333
+        xor     edi,eax
+        xor     esi,eax
+        ;
+        rol     esi,22
+        mov     eax,esi
+        xor     esi,edi
+        and     esi,0x03fc03fc
+        xor     eax,esi
+        xor     edi,esi
+        ;
+        rol     eax,9
+        mov     esi,eax
+        xor     eax,edi
+        and     eax,0xaaaaaaaa
+        xor     esi,eax
+        xor     edi,eax
+        ;
+        rol     edi,1
+        call    L$000pic_point
+L$000pic_point:
+        pop     ebp
+        lea     ebp,[(L$des_sptrans-L$000pic_point)+ebp]
+        mov     ecx,DWORD [24+esp]
+        cmp     ebx,0
+        je      NEAR L$001decrypt
+        call    __x86_DES_encrypt
+        jmp     NEAR L$002done
+L$001decrypt:
+        call    __x86_DES_decrypt
+L$002done:
+        ;
+        ; FP
+        mov     edx,DWORD [20+esp]
+        ror     esi,1
+        mov     eax,edi
+        xor     edi,esi
+        and     edi,0xaaaaaaaa
+        xor     eax,edi
+        xor     esi,edi
+        ;
+        rol     eax,23
+        mov     edi,eax
+        xor     eax,esi
+        and     eax,0x03fc03fc
+        xor     edi,eax
+        xor     esi,eax
+        ;
+        rol     edi,10
+        mov     eax,edi
+        xor     edi,esi
+        and     edi,0x33333333
+        xor     eax,edi
+        xor     esi,edi
+        ;
+        rol     esi,18
+        mov     edi,esi
+        xor     esi,eax
+        and     esi,0xfff0000f
+        xor     edi,esi
+        xor     eax,esi
+        ;
+        rol     edi,12
+        mov     esi,edi
+        xor     edi,eax
+        and     edi,0xf0f0f0f0
+        xor     esi,edi
+        xor     eax,edi
+        ;
+        ror     eax,4
+        mov     DWORD [edx],eax
+        mov     DWORD [4+edx],esi
+        pop     ebp
+        pop     ebx
+        pop     edi
+        pop     esi
+        ret
+global  _DES_encrypt2
+align   16
+_DES_encrypt2:
+L$_DES_encrypt2_begin:
+        push    esi
+        push    edi
+        ;
+        ; Load the 2 words
+        mov     eax,DWORD [12+esp]
+        xor     ecx,ecx
+        push    ebx
+        push    ebp
+        mov     esi,DWORD [eax]
+        mov     ebx,DWORD [28+esp]
+        rol     esi,3
+        mov     edi,DWORD [4+eax]
+        rol     edi,3
+        call    L$003pic_point
+L$003pic_point:
+        pop     ebp
+        lea     ebp,[(L$des_sptrans-L$003pic_point)+ebp]
+        mov     ecx,DWORD [24+esp]
+        cmp     ebx,0
+        je      NEAR L$004decrypt
+        call    __x86_DES_encrypt
+        jmp     NEAR L$005done
+L$004decrypt:
+        call    __x86_DES_decrypt
+L$005done:
+        ;
+        ; Fixup
+        ror     edi,3
+        mov     eax,DWORD [20+esp]
+        ror     esi,3
+        mov     DWORD [eax],edi
+        mov     DWORD [4+eax],esi
+        pop     ebp
+        pop     ebx
+        pop     edi
+        pop     esi
+        ret
+global  _DES_encrypt3
+align   16
+_DES_encrypt3:
+L$_DES_encrypt3_begin:
+        push    ebx
+        mov     ebx,DWORD [8+esp]
+        push    ebp
+        push    esi
+        push    edi
+        ;
+        ; Load the data words
+        mov     edi,DWORD [ebx]
+        mov     esi,DWORD [4+ebx]
+        sub     esp,12
+        ;
+        ; IP
+        rol     edi,4
+        mov     edx,edi
+        xor     edi,esi
+        and     edi,0xf0f0f0f0
+        xor     edx,edi
+        xor     esi,edi
+        ;
+        rol     esi,20
+        mov     edi,esi
+        xor     esi,edx
+        and     esi,0xfff0000f
+        xor     edi,esi
+        xor     edx,esi
+        ;
+        rol     edi,14
+        mov     esi,edi
+        xor     edi,edx
+        and     edi,0x33333333
+        xor     esi,edi
+        xor     edx,edi
+        ;
+        rol     edx,22
+        mov     edi,edx
+        xor     edx,esi
+        and     edx,0x03fc03fc
+        xor     edi,edx
+        xor     esi,edx
+        ;
+        rol     edi,9
+        mov     edx,edi
+        xor     edi,esi
+        and     edi,0xaaaaaaaa
+        xor     edx,edi
+        xor     esi,edi
+        ;
+        ror     edx,3
+        ror     esi,2
+        mov     DWORD [4+ebx],esi
+        mov     eax,DWORD [36+esp]
+        mov     DWORD [ebx],edx
+        mov     edi,DWORD [40+esp]
+        mov     esi,DWORD [44+esp]
+        mov     DWORD [8+esp],DWORD 1
+        mov     DWORD [4+esp],eax
+        mov     DWORD [esp],ebx
+        call    L$_DES_encrypt2_begin
+        mov     DWORD [8+esp],DWORD 0
+        mov     DWORD [4+esp],edi
+        mov     DWORD [esp],ebx
+        call    L$_DES_encrypt2_begin
+        mov     DWORD [8+esp],DWORD 1
+        mov     DWORD [4+esp],esi
+        mov     DWORD [esp],ebx
+        call    L$_DES_encrypt2_begin
+        add     esp,12
+        mov     edi,DWORD [ebx]
+        mov     esi,DWORD [4+ebx]
+        ;
+        ; FP
+        rol     esi,2
+        rol     edi,3
+        mov     eax,edi
+        xor     edi,esi
+        and     edi,0xaaaaaaaa
+        xor     eax,edi
+        xor     esi,edi
+        ;
+        rol     eax,23
+        mov     edi,eax
+        xor     eax,esi
+        and     eax,0x03fc03fc
+        xor     edi,eax
+        xor     esi,eax
+        ;
+        rol     edi,10
+        mov     eax,edi
+        xor     edi,esi
+        and     edi,0x33333333
+        xor     eax,edi
+        xor     esi,edi
+        ;
+        rol     esi,18
+        mov     edi,esi
+        xor     esi,eax
+        and     esi,0xfff0000f
+        xor     edi,esi
+        xor     eax,esi
+        ;
+        rol     edi,12
+        mov     esi,edi
+        xor     edi,eax
+        and     edi,0xf0f0f0f0
+        xor     esi,edi
+        xor     eax,edi
+        ;
+        ror     eax,4
+        mov     DWORD [ebx],eax
+        mov     DWORD [4+ebx],esi
+        pop     edi
+        pop     esi
+        pop     ebp
+        pop     ebx
+        ret
+global  _DES_decrypt3
+align   16
+_DES_decrypt3:
+L$_DES_decrypt3_begin:
+        push    ebx
+        mov     ebx,DWORD [8+esp]
+        push    ebp
+        push    esi
+        push    edi
+        ;
+        ; Load the data words
+        mov     edi,DWORD [ebx]
+        mov     esi,DWORD [4+ebx]
+        sub     esp,12
+        ;
+        ; IP
+        rol     edi,4
+        mov     edx,edi
+        xor     edi,esi
+        and     edi,0xf0f0f0f0
+        xor     edx,edi
+        xor     esi,edi
+        ;
+        rol     esi,20
+        mov     edi,esi
+        xor     esi,edx
+        and     esi,0xfff0000f
+        xor     edi,esi
+        xor     edx,esi
+        ;
+        rol     edi,14
+        mov     esi,edi
+        xor     edi,edx
+        and     edi,0x33333333
+        xor     esi,edi
+        xor     edx,edi
+        ;
+        rol     edx,22
+        mov     edi,edx
+        xor     edx,esi
+        and     edx,0x03fc03fc
+        xor     edi,edx
+        xor     esi,edx
+        ;
+        rol     edi,9
+        mov     edx,edi
+        xor     edi,esi
+        and     edi,0xaaaaaaaa
+        xor     edx,edi
+        xor     esi,edi
+        ;
+        ror     edx,3
+        ror     esi,2
+        mov     DWORD [4+ebx],esi
+        mov     esi,DWORD [36+esp]
+        mov     DWORD [ebx],edx
+        mov     edi,DWORD [40+esp]
+        mov     eax,DWORD [44+esp]
+        mov     DWORD [8+esp],DWORD 0
+        mov     DWORD [4+esp],eax
+        mov     DWORD [esp],ebx
+        call    L$_DES_encrypt2_begin
+        mov     DWORD [8+esp],DWORD 1
+        mov     DWORD [4+esp],edi
+        mov     DWORD [esp],ebx
+        call    L$_DES_encrypt2_begin
+        mov     DWORD [8+esp],DWORD 0
+        mov     DWORD [4+esp],esi
+        mov     DWORD [esp],ebx
+        call    L$_DES_encrypt2_begin
+        add     esp,12
+        mov     edi,DWORD [ebx]
+        mov     esi,DWORD [4+ebx]
+        ;
+        ; FP
+        rol     esi,2
+        rol     edi,3
+        mov     eax,edi
+        xor     edi,esi
+        and     edi,0xaaaaaaaa
+        xor     eax,edi
+        xor     esi,edi
+        ;
+        rol     eax,23
+        mov     edi,eax
+        xor     eax,esi
+        and     eax,0x03fc03fc
+        xor     edi,eax
+        xor     esi,eax
+        ;
+        rol     edi,10
+        mov     eax,edi
+        xor     edi,esi
+        and     edi,0x33333333
+        xor     eax,edi
+        xor     esi,edi
+        ;
+        rol     esi,18
+        mov     edi,esi
+        xor     esi,eax
+        and     esi,0xfff0000f
+        xor     edi,esi
+        xor     eax,esi
+        ;
+        rol     edi,12
+        mov     esi,edi
+        xor     edi,eax
+        and     edi,0xf0f0f0f0
+        xor     esi,edi
+        xor     eax,edi
+        ;
+        ror     eax,4
+        mov     DWORD [ebx],eax
+        mov     DWORD [4+ebx],esi
+        pop     edi
+        pop     esi
+        pop     ebp
+        pop     ebx
+        ret
+global  _DES_ncbc_encrypt
+align   16
+_DES_ncbc_encrypt:
+L$_DES_ncbc_encrypt_begin:
+        ;
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     ebp,DWORD [28+esp]
+        ; getting iv ptr from parameter 4
+        mov     ebx,DWORD [36+esp]
+        mov     esi,DWORD [ebx]
+        mov     edi,DWORD [4+ebx]
+        push    edi
+        push    esi
+        push    edi
+        push    esi
+        mov     ebx,esp
+        mov     esi,DWORD [36+esp]
+        mov     edi,DWORD [40+esp]
+        ; getting encrypt flag from parameter 5
+        mov     ecx,DWORD [56+esp]
+        ; get and push parameter 5
+        push    ecx
+        ; get and push parameter 3
+        mov     eax,DWORD [52+esp]
+        push    eax
+        push    ebx
+        cmp     ecx,0
+        jz      NEAR L$006decrypt
+        and     ebp,4294967288
+        mov     eax,DWORD [12+esp]
+        mov     ebx,DWORD [16+esp]
+        jz      NEAR L$007encrypt_finish
+L$008encrypt_loop:
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [4+esi]
+        xor     eax,ecx
+        xor     ebx,edx
+        mov     DWORD [12+esp],eax
+        mov     DWORD [16+esp],ebx
+        call    L$_DES_encrypt1_begin
+        mov     eax,DWORD [12+esp]
+        mov     ebx,DWORD [16+esp]
+        mov     DWORD [edi],eax
+        mov     DWORD [4+edi],ebx
+        add     esi,8
+        add     edi,8
+        sub     ebp,8
+        jnz     NEAR L$008encrypt_loop
+L$007encrypt_finish:
+        mov     ebp,DWORD [56+esp]
+        and     ebp,7
+        jz      NEAR L$009finish
+        call    L$010PIC_point
+L$010PIC_point:
+        pop     edx
+        lea     ecx,[(L$011cbc_enc_jmp_table-L$010PIC_point)+edx]
+        mov     ebp,DWORD [ebp*4+ecx]
+        add     ebp,edx
+        xor     ecx,ecx
+        xor     edx,edx
+        jmp     ebp
+L$012ej7:
+        mov     dh,BYTE [6+esi]
+        shl     edx,8
+L$013ej6:
+        mov     dh,BYTE [5+esi]
+L$014ej5:
+        mov     dl,BYTE [4+esi]
+L$015ej4:
+        mov     ecx,DWORD [esi]
+        jmp     NEAR L$016ejend
+L$017ej3:
+        mov     ch,BYTE [2+esi]
+        shl     ecx,8
+L$018ej2:
+        mov     ch,BYTE [1+esi]
+L$019ej1:
+        mov     cl,BYTE [esi]
+L$016ejend:
+        xor     eax,ecx
+        xor     ebx,edx
+        mov     DWORD [12+esp],eax
+        mov     DWORD [16+esp],ebx
+        call    L$_DES_encrypt1_begin
+        mov     eax,DWORD [12+esp]
+        mov     ebx,DWORD [16+esp]
+        mov     DWORD [edi],eax
+        mov     DWORD [4+edi],ebx
+        jmp     NEAR L$009finish
+L$006decrypt:
+        and     ebp,4294967288
+        mov     eax,DWORD [20+esp]
+        mov     ebx,DWORD [24+esp]
+        jz      NEAR L$020decrypt_finish
+L$021decrypt_loop:
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     DWORD [12+esp],eax
+        mov     DWORD [16+esp],ebx
+        call    L$_DES_encrypt1_begin
+        mov     eax,DWORD [12+esp]
+        mov     ebx,DWORD [16+esp]
+        mov     ecx,DWORD [20+esp]
+        mov     edx,DWORD [24+esp]
+        xor     ecx,eax
+        xor     edx,ebx
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     DWORD [edi],ecx
+        mov     DWORD [4+edi],edx
+        mov     DWORD [20+esp],eax
+        mov     DWORD [24+esp],ebx
+        add     esi,8
+        add     edi,8
+        sub     ebp,8
+        jnz     NEAR L$021decrypt_loop
+L$020decrypt_finish:
+        mov     ebp,DWORD [56+esp]
+        and     ebp,7
+        jz      NEAR L$009finish
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     DWORD [12+esp],eax
+        mov     DWORD [16+esp],ebx
+        call    L$_DES_encrypt1_begin
+        mov     eax,DWORD [12+esp]
+        mov     ebx,DWORD [16+esp]
+        mov     ecx,DWORD [20+esp]
+        mov     edx,DWORD [24+esp]
+        xor     ecx,eax
+        xor     edx,ebx
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+L$022dj7:
+        ror     edx,16
+        mov     BYTE [6+edi],dl
+        shr     edx,16
+L$023dj6:
+        mov     BYTE [5+edi],dh
+L$024dj5:
+        mov     BYTE [4+edi],dl
+L$025dj4:
+        mov     DWORD [edi],ecx
+        jmp     NEAR L$026djend
+L$027dj3:
+        ror     ecx,16
+        mov     BYTE [2+edi],cl
+        shl     ecx,16
+L$028dj2:
+        mov     BYTE [1+esi],ch
+L$029dj1:
+        mov     BYTE [esi],cl
+L$026djend:
+        jmp     NEAR L$009finish
+L$009finish:
+        mov     ecx,DWORD [64+esp]
+        add     esp,28
+        mov     DWORD [ecx],eax
+        mov     DWORD [4+ecx],ebx
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   64
+L$011cbc_enc_jmp_table:
+dd      0
+dd      L$019ej1-L$010PIC_point
+dd      L$018ej2-L$010PIC_point
+dd      L$017ej3-L$010PIC_point
+dd      L$015ej4-L$010PIC_point
+dd      L$014ej5-L$010PIC_point
+dd      L$013ej6-L$010PIC_point
+dd      L$012ej7-L$010PIC_point
+align   64
+global  _DES_ede3_cbc_encrypt
+align   16
+_DES_ede3_cbc_encrypt:
+L$_DES_ede3_cbc_encrypt_begin:
+        ;
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     ebp,DWORD [28+esp]
+        ; getting iv ptr from parameter 6
+        mov     ebx,DWORD [44+esp]
+        mov     esi,DWORD [ebx]
+        mov     edi,DWORD [4+ebx]
+        push    edi
+        push    esi
+        push    edi
+        push    esi
+        mov     ebx,esp
+        mov     esi,DWORD [36+esp]
+        mov     edi,DWORD [40+esp]
+        ; getting encrypt flag from parameter 7
+        mov     ecx,DWORD [64+esp]
+        ; get and push parameter 5
+        mov     eax,DWORD [56+esp]
+        push    eax
+        ; get and push parameter 4
+        mov     eax,DWORD [56+esp]
+        push    eax
+        ; get and push parameter 3
+        mov     eax,DWORD [56+esp]
+        push    eax
+        push    ebx
+        cmp     ecx,0
+        jz      NEAR L$030decrypt
+        and     ebp,4294967288
+        mov     eax,DWORD [16+esp]
+        mov     ebx,DWORD [20+esp]
+        jz      NEAR L$031encrypt_finish
+L$032encrypt_loop:
+        mov     ecx,DWORD [esi]
+        mov     edx,DWORD [4+esi]
+        xor     eax,ecx
+        xor     ebx,edx
+        mov     DWORD [16+esp],eax
+        mov     DWORD [20+esp],ebx
+        call    L$_DES_encrypt3_begin
+        mov     eax,DWORD [16+esp]
+        mov     ebx,DWORD [20+esp]
+        mov     DWORD [edi],eax
+        mov     DWORD [4+edi],ebx
+        add     esi,8
+        add     edi,8
+        sub     ebp,8
+        jnz     NEAR L$032encrypt_loop
+L$031encrypt_finish:
+        mov     ebp,DWORD [60+esp]
+        and     ebp,7
+        jz      NEAR L$033finish
+        call    L$034PIC_point
+L$034PIC_point:
+        pop     edx
+        lea     ecx,[(L$035cbc_enc_jmp_table-L$034PIC_point)+edx]
+        mov     ebp,DWORD [ebp*4+ecx]
+        add     ebp,edx
+        xor     ecx,ecx
+        xor     edx,edx
+        jmp     ebp
+L$036ej7:
+        mov     dh,BYTE [6+esi]
+        shl     edx,8
+L$037ej6:
+        mov     dh,BYTE [5+esi]
+L$038ej5:
+        mov     dl,BYTE [4+esi]
+L$039ej4:
+        mov     ecx,DWORD [esi]
+        jmp     NEAR L$040ejend
+L$041ej3:
+        mov     ch,BYTE [2+esi]
+        shl     ecx,8
+L$042ej2:
+        mov     ch,BYTE [1+esi]
+L$043ej1:
+        mov     cl,BYTE [esi]
+L$040ejend:
+        xor     eax,ecx
+        xor     ebx,edx
+        mov     DWORD [16+esp],eax
+        mov     DWORD [20+esp],ebx
+        call    L$_DES_encrypt3_begin
+        mov     eax,DWORD [16+esp]
+        mov     ebx,DWORD [20+esp]
+        mov     DWORD [edi],eax
+        mov     DWORD [4+edi],ebx
+        jmp     NEAR L$033finish
+L$030decrypt:
+        and     ebp,4294967288
+        mov     eax,DWORD [24+esp]
+        mov     ebx,DWORD [28+esp]
+        jz      NEAR L$044decrypt_finish
+L$045decrypt_loop:
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     DWORD [16+esp],eax
+        mov     DWORD [20+esp],ebx
+        call    L$_DES_decrypt3_begin
+        mov     eax,DWORD [16+esp]
+        mov     ebx,DWORD [20+esp]
+        mov     ecx,DWORD [24+esp]
+        mov     edx,DWORD [28+esp]
+        xor     ecx,eax
+        xor     edx,ebx
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     DWORD [edi],ecx
+        mov     DWORD [4+edi],edx
+        mov     DWORD [24+esp],eax
+        mov     DWORD [28+esp],ebx
+        add     esi,8
+        add     edi,8
+        sub     ebp,8
+        jnz     NEAR L$045decrypt_loop
+L$044decrypt_finish:
+        mov     ebp,DWORD [60+esp]
+        and     ebp,7
+        jz      NEAR L$033finish
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     DWORD [16+esp],eax
+        mov     DWORD [20+esp],ebx
+        call    L$_DES_decrypt3_begin
+        mov     eax,DWORD [16+esp]
+        mov     ebx,DWORD [20+esp]
+        mov     ecx,DWORD [24+esp]
+        mov     edx,DWORD [28+esp]
+        xor     ecx,eax
+        xor     edx,ebx
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+L$046dj7:
+        ror     edx,16
+        mov     BYTE [6+edi],dl
+        shr     edx,16
+L$047dj6:
+        mov     BYTE [5+edi],dh
+L$048dj5:
+        mov     BYTE [4+edi],dl
+L$049dj4:
+        mov     DWORD [edi],ecx
+        jmp     NEAR L$050djend
+L$051dj3:
+        ror     ecx,16
+        mov     BYTE [2+edi],cl
+        shl     ecx,16
+L$052dj2:
+        mov     BYTE [1+esi],ch
+L$053dj1:
+        mov     BYTE [esi],cl
+L$050djend:
+        jmp     NEAR L$033finish
+L$033finish:
+        mov     ecx,DWORD [76+esp]
+        add     esp,32
+        mov     DWORD [ecx],eax
+        mov     DWORD [4+ecx],ebx
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   64
+L$035cbc_enc_jmp_table:
+dd      0
+dd      L$043ej1-L$034PIC_point
+dd      L$042ej2-L$034PIC_point
+dd      L$041ej3-L$034PIC_point
+dd      L$039ej4-L$034PIC_point
+dd      L$038ej5-L$034PIC_point
+dd      L$037ej6-L$034PIC_point
+dd      L$036ej7-L$034PIC_point
+align   64
+align   64
+_DES_SPtrans:
+L$des_sptrans:
+dd      34080768,524288,33554434,34080770
+dd      33554432,526338,524290,33554434
+dd      526338,34080768,34078720,2050
+dd      33556482,33554432,0,524290
+dd      524288,2,33556480,526336
+dd      34080770,34078720,2050,33556480
+dd      2,2048,526336,34078722
+dd      2048,33556482,34078722,0
+dd      0,34080770,33556480,524290
+dd      34080768,524288,2050,33556480
+dd      34078722,2048,526336,33554434
+dd      526338,2,33554434,34078720
+dd      34080770,526336,34078720,33556482
+dd      33554432,2050,524290,0
+dd      524288,33554432,33556482,34080768
+dd      2,34078722,2048,526338
+dd      1074823184,0,1081344,1074790400
+dd      1073741840,32784,1073774592,1081344
+dd      32768,1074790416,16,1073774592
+dd      1048592,1074823168,1074790400,16
+dd      1048576,1073774608,1074790416,32768
+dd      1081360,1073741824,0,1048592
+dd      1073774608,1081360,1074823168,1073741840
+dd      1073741824,1048576,32784,1074823184
+dd      1048592,1074823168,1073774592,1081360
+dd      1074823184,1048592,1073741840,0
+dd      1073741824,32784,1048576,1074790416
+dd      32768,1073741824,1081360,1073774608
+dd      1074823168,32768,0,1073741840
+dd      16,1074823184,1081344,1074790400
+dd      1074790416,1048576,32784,1073774592
+dd      1073774608,16,1074790400,1081344
+dd      67108865,67371264,256,67109121
+dd      262145,67108864,67109121,262400
+dd      67109120,262144,67371008,1
+dd      67371265,257,1,67371009
+dd      0,262145,67371264,256
+dd      257,67371265,262144,67108865
+dd      67371009,67109120,262401,67371008
+dd      262400,0,67108864,262401
+dd      67371264,256,1,262144
+dd      257,262145,67371008,67109121
+dd      0,67371264,262400,67371009
+dd      262145,67108864,67371265,1
+dd      262401,67108865,67108864,67371265
+dd      262144,67109120,67109121,262400
+dd      67109120,0,67371009,257
+dd      67108865,262401,256,67371008
+dd      4198408,268439552,8,272633864
+dd      0,272629760,268439560,4194312
+dd      272633856,268435464,268435456,4104
+dd      268435464,4198408,4194304,268435456
+dd      272629768,4198400,4096,8
+dd      4198400,268439560,272629760,4096
+dd      4104,0,4194312,272633856
+dd      268439552,272629768,272633864,4194304
+dd      272629768,4104,4194304,268435464
+dd      4198400,268439552,8,272629760
+dd      268439560,0,4096,4194312
+dd      0,272629768,272633856,4096
+dd      268435456,272633864,4198408,4194304
+dd      272633864,8,268439552,4198408
+dd      4194312,4198400,272629760,268439560
+dd      4104,268435456,268435464,272633856
+dd      134217728,65536,1024,134284320
+dd      134283296,134218752,66592,134283264
+dd      65536,32,134217760,66560
+dd      134218784,134283296,134284288,0
+dd      66560,134217728,65568,1056
+dd      134218752,66592,0,134217760
+dd      32,134218784,134284320,65568
+dd      134283264,1024,1056,134284288
+dd      134284288,134218784,65568,134283264
+dd      65536,32,134217760,134218752
+dd      134217728,66560,134284320,0
+dd      66592,134217728,1024,65568
+dd      134218784,1024,0,134284320
+dd      134283296,134284288,1056,65536
+dd      66560,134283296,134218752,1056
+dd      32,66592,134283264,134217760
+dd      2147483712,2097216,0,2149588992
+dd      2097216,8192,2147491904,2097152
+dd      8256,2149589056,2105344,2147483648
+dd      2147491840,2147483712,2149580800,2105408
+dd      2097152,2147491904,2149580864,0
+dd      8192,64,2149588992,2149580864
+dd      2149589056,2149580800,2147483648,8256
+dd      64,2105344,2105408,2147491840
+dd      8256,2147483648,2147491840,2105408
+dd      2149588992,2097216,0,2147491840
+dd      2147483648,8192,2149580864,2097152
+dd      2097216,2149589056,2105344,64
+dd      2149589056,2105344,2097152,2147491904
+dd      2147483712,2149580800,2105408,0
+dd      8192,2147483712,2147491904,2149588992
+dd      2149580800,8256,64,2149580864
+dd      16384,512,16777728,16777220
+dd      16794116,16388,16896,0
+dd      16777216,16777732,516,16793600
+dd      4,16794112,16793600,516
+dd      16777732,16384,16388,16794116
+dd      0,16777728,16777220,16896
+dd      16793604,16900,16794112,4
+dd      16900,16793604,512,16777216
+dd      16900,16793600,16793604,516
+dd      16384,512,16777216,16793604
+dd      16777732,16900,16896,0
+dd      512,16777220,4,16777728
+dd      0,16777732,16777728,16896
+dd      516,16384,16794116,16777216
+dd      16794112,4,16388,16794116
+dd      16777220,16794112,16793600,16388
+dd      545259648,545390592,131200,0
+dd      537001984,8388736,545259520,545390720
+dd      128,536870912,8519680,131200
+dd      8519808,537002112,536871040,545259520
+dd      131072,8519808,8388736,537001984
+dd      545390720,536871040,0,8519680
+dd      536870912,8388608,537002112,545259648
+dd      8388608,131072,545390592,128
+dd      8388608,131072,536871040,545390720
+dd      131200,536870912,0,8519680
+dd      545259648,537002112,537001984,8388736
+dd      545390592,128,8388736,537001984
+dd      545390720,8388608,545259520,536871040
+dd      8519680,131200,537002112,545259520
+dd      128,545390592,8519808,0
+dd      536870912,545259648,131072,8519808
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm
new file mode 100644
index 0000000000..83e4e77e6a
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm
@@ -0,0 +1,690 @@
+; Copyright 1995-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+global  _md5_block_asm_data_order
+align   16
+_md5_block_asm_data_order:
+L$_md5_block_asm_data_order_begin:
+        push    esi
+        push    edi
+        mov     edi,DWORD [12+esp]
+        mov     esi,DWORD [16+esp]
+        mov     ecx,DWORD [20+esp]
+        push    ebp
+        shl     ecx,6
+        push    ebx
+        add     ecx,esi
+        sub     ecx,64
+        mov     eax,DWORD [edi]
+        push    ecx
+        mov     ebx,DWORD [4+edi]
+        mov     ecx,DWORD [8+edi]
+        mov     edx,DWORD [12+edi]
+L$000start:
+        ;
+        ; R0 section
+        mov     edi,ecx
+        mov     ebp,DWORD [esi]
+        ; R0 0
+        xor     edi,edx
+        and     edi,ebx
+        lea     eax,[3614090360+ebp*1+eax]
+        xor     edi,edx
+        mov     ebp,DWORD [4+esi]
+        add     eax,edi
+        rol     eax,7
+        mov     edi,ebx
+        add     eax,ebx
+        ; R0 1
+        xor     edi,ecx
+        and     edi,eax
+        lea     edx,[3905402710+ebp*1+edx]
+        xor     edi,ecx
+        mov     ebp,DWORD [8+esi]
+        add     edx,edi
+        rol     edx,12
+        mov     edi,eax
+        add     edx,eax
+        ; R0 2
+        xor     edi,ebx
+        and     edi,edx
+        lea     ecx,[606105819+ebp*1+ecx]
+        xor     edi,ebx
+        mov     ebp,DWORD [12+esi]
+        add     ecx,edi
+        rol     ecx,17
+        mov     edi,edx
+        add     ecx,edx
+        ; R0 3
+        xor     edi,eax
+        and     edi,ecx
+        lea     ebx,[3250441966+ebp*1+ebx]
+        xor     edi,eax
+        mov     ebp,DWORD [16+esi]
+        add     ebx,edi
+        rol     ebx,22
+        mov     edi,ecx
+        add     ebx,ecx
+        ; R0 4
+        xor     edi,edx
+        and     edi,ebx
+        lea     eax,[4118548399+ebp*1+eax]
+        xor     edi,edx
+        mov     ebp,DWORD [20+esi]
+        add     eax,edi
+        rol     eax,7
+        mov     edi,ebx
+        add     eax,ebx
+        ; R0 5
+        xor     edi,ecx
+        and     edi,eax
+        lea     edx,[1200080426+ebp*1+edx]
+        xor     edi,ecx
+        mov     ebp,DWORD [24+esi]
+        add     edx,edi
+        rol     edx,12
+        mov     edi,eax
+        add     edx,eax
+        ; R0 6
+        xor     edi,ebx
+        and     edi,edx
+        lea     ecx,[2821735955+ebp*1+ecx]
+        xor     edi,ebx
+        mov     ebp,DWORD [28+esi]
+        add     ecx,edi
+        rol     ecx,17
+        mov     edi,edx
+        add     ecx,edx
+        ; R0 7
+        xor     edi,eax
+        and     edi,ecx
+        lea     ebx,[4249261313+ebp*1+ebx]
+        xor     edi,eax
+        mov     ebp,DWORD [32+esi]
+        add     ebx,edi
+        rol     ebx,22
+        mov     edi,ecx
+        add     ebx,ecx
+        ; R0 8
+        xor     edi,edx
+        and     edi,ebx
+        lea     eax,[1770035416+ebp*1+eax]
+        xor     edi,edx
+        mov     ebp,DWORD [36+esi]
+        add     eax,edi
+        rol     eax,7
+        mov     edi,ebx
+        add     eax,ebx
+        ; R0 9
+        xor     edi,ecx
+        and     edi,eax
+        lea     edx,[2336552879+ebp*1+edx]
+        xor     edi,ecx
+        mov     ebp,DWORD [40+esi]
+        add     edx,edi
+        rol     edx,12
+        mov     edi,eax
+        add     edx,eax
+        ; R0 10
+        xor     edi,ebx
+        and     edi,edx
+        lea     ecx,[4294925233+ebp*1+ecx]
+        xor     edi,ebx
+        mov     ebp,DWORD [44+esi]
+        add     ecx,edi
+        rol     ecx,17
+        mov     edi,edx
+        add     ecx,edx
+        ; R0 11
+        xor     edi,eax
+        and     edi,ecx
+        lea     ebx,[2304563134+ebp*1+ebx]
+        xor     edi,eax
+        mov     ebp,DWORD [48+esi]
+        add     ebx,edi
+        rol     ebx,22
+        mov     edi,ecx
+        add     ebx,ecx
+        ; R0 12
+        xor     edi,edx
+        and     edi,ebx
+        lea     eax,[1804603682+ebp*1+eax]
+        xor     edi,edx
+        mov     ebp,DWORD [52+esi]
+        add     eax,edi
+        rol     eax,7
+        mov     edi,ebx
+        add     eax,ebx
+        ; R0 13
+        xor     edi,ecx
+        and     edi,eax
+        lea     edx,[4254626195+ebp*1+edx]
+        xor     edi,ecx
+        mov     ebp,DWORD [56+esi]
+        add     edx,edi
+        rol     edx,12
+        mov     edi,eax
+        add     edx,eax
+        ; R0 14
+        xor     edi,ebx
+        and     edi,edx
+        lea     ecx,[2792965006+ebp*1+ecx]
+        xor     edi,ebx
+        mov     ebp,DWORD [60+esi]
+        add     ecx,edi
+        rol     ecx,17
+        mov     edi,edx
+        add     ecx,edx
+        ; R0 15
+        xor     edi,eax
+        and     edi,ecx
+        lea     ebx,[1236535329+ebp*1+ebx]
+        xor     edi,eax
+        mov     ebp,DWORD [4+esi]
+        add     ebx,edi
+        rol     ebx,22
+        mov     edi,ecx
+        add     ebx,ecx
+        ;
+        ; R1 section
+        ; R1 16
+        xor     edi,ebx
+        and     edi,edx
+        lea     eax,[4129170786+ebp*1+eax]
+        xor     edi,ecx
+        mov     ebp,DWORD [24+esi]
+        add     eax,edi
+        mov     edi,ebx
+        rol     eax,5
+        add     eax,ebx
+        ; R1 17
+        xor     edi,eax
+        and     edi,ecx
+        lea     edx,[3225465664+ebp*1+edx]
+        xor     edi,ebx
+        mov     ebp,DWORD [44+esi]
+        add     edx,edi
+        mov     edi,eax
+        rol     edx,9
+        add     edx,eax
+        ; R1 18
+        xor     edi,edx
+        and     edi,ebx
+        lea     ecx,[643717713+ebp*1+ecx]
+        xor     edi,eax
+        mov     ebp,DWORD [esi]
+        add     ecx,edi
+        mov     edi,edx
+        rol     ecx,14
+        add     ecx,edx
+        ; R1 19
+        xor     edi,ecx
+        and     edi,eax
+        lea     ebx,[3921069994+ebp*1+ebx]
+        xor     edi,edx
+        mov     ebp,DWORD [20+esi]
+        add     ebx,edi
+        mov     edi,ecx
+        rol     ebx,20
+        add     ebx,ecx
+        ; R1 20
+        xor     edi,ebx
+        and     edi,edx
+        lea     eax,[3593408605+ebp*1+eax]
+        xor     edi,ecx
+        mov     ebp,DWORD [40+esi]
+        add     eax,edi
+        mov     edi,ebx
+        rol     eax,5
+        add     eax,ebx
+        ; R1 21
+        xor     edi,eax
+        and     edi,ecx
+        lea     edx,[38016083+ebp*1+edx]
+        xor     edi,ebx
+        mov     ebp,DWORD [60+esi]
+        add     edx,edi
+        mov     edi,eax
+        rol     edx,9
+        add     edx,eax
+        ; R1 22
+        xor     edi,edx
+        and     edi,ebx
+        lea     ecx,[3634488961+ebp*1+ecx]
+        xor     edi,eax
+        mov     ebp,DWORD [16+esi]
+        add     ecx,edi
+        mov     edi,edx
+        rol     ecx,14
+        add     ecx,edx
+        ; R1 23
+        xor     edi,ecx
+        and     edi,eax
+        lea     ebx,[3889429448+ebp*1+ebx]
+        xor     edi,edx
+        mov     ebp,DWORD [36+esi]
+        add     ebx,edi
+        mov     edi,ecx
+        rol     ebx,20
+        add     ebx,ecx
+        ; R1 24
+        xor     edi,ebx
+        and     edi,edx
+        lea     eax,[568446438+ebp*1+eax]
+        xor     edi,ecx
+        mov     ebp,DWORD [56+esi]
+        add     eax,edi
+        mov     edi,ebx
+        rol     eax,5
+        add     eax,ebx
+        ; R1 25
+        xor     edi,eax
+        and     edi,ecx
+        lea     edx,[3275163606+ebp*1+edx]
+        xor     edi,ebx
+        mov     ebp,DWORD [12+esi]
+        add     edx,edi
+        mov     edi,eax
+        rol     edx,9
+        add     edx,eax
+        ; R1 26
+        xor     edi,edx
+        and     edi,ebx
+        lea     ecx,[4107603335+ebp*1+ecx]
+        xor     edi,eax
+        mov     ebp,DWORD [32+esi]
+        add     ecx,edi
+        mov     edi,edx
+        rol     ecx,14
+        add     ecx,edx
+        ; R1 27
+        xor     edi,ecx
+        and     edi,eax
+        lea     ebx,[1163531501+ebp*1+ebx]
+        xor     edi,edx
+        mov     ebp,DWORD [52+esi]
+        add     ebx,edi
+        mov     edi,ecx
+        rol     ebx,20
+        add     ebx,ecx
+        ; R1 28
+        xor     edi,ebx
+        and     edi,edx
+        lea     eax,[2850285829+ebp*1+eax]
+        xor     edi,ecx
+        mov     ebp,DWORD [8+esi]
+        add     eax,edi
+        mov     edi,ebx
+        rol     eax,5
+        add     eax,ebx
+        ; R1 29
+        xor     edi,eax
+        and     edi,ecx
+        lea     edx,[4243563512+ebp*1+edx]
+        xor     edi,ebx
+        mov     ebp,DWORD [28+esi]
+        add     edx,edi
+        mov     edi,eax
+        rol     edx,9
+        add     edx,eax
+        ; R1 30
+        xor     edi,edx
+        and     edi,ebx
+        lea     ecx,[1735328473+ebp*1+ecx]
+        xor     edi,eax
+        mov     ebp,DWORD [48+esi]
+        add     ecx,edi
+        mov     edi,edx
+        rol     ecx,14
+        add     ecx,edx
+        ; R1 31
+        xor     edi,ecx
+        and     edi,eax
+        lea     ebx,[2368359562+ebp*1+ebx]
+        xor     edi,edx
+        mov     ebp,DWORD [20+esi]
+        add     ebx,edi
+        mov     edi,ecx
+        rol     ebx,20
+        add     ebx,ecx
+        ;
+        ; R2 section
+        ; R2 32
+        xor     edi,edx
+        xor     edi,ebx
+        lea     eax,[4294588738+ebp*1+eax]
+        add     eax,edi
+        mov     ebp,DWORD [32+esi]
+        rol     eax,4
+        mov     edi,ebx
+        ; R2 33
+        add     eax,ebx
+        xor     edi,ecx
+        lea     edx,[2272392833+ebp*1+edx]
+        xor     edi,eax
+        mov     ebp,DWORD [44+esi]
+        add     edx,edi
+        mov     edi,eax
+        rol     edx,11
+        add     edx,eax
+        ; R2 34
+        xor     edi,ebx
+        xor     edi,edx
+        lea     ecx,[1839030562+ebp*1+ecx]
+        add     ecx,edi
+        mov     ebp,DWORD [56+esi]
+        rol     ecx,16
+        mov     edi,edx
+        ; R2 35
+        add     ecx,edx
+        xor     edi,eax
+        lea     ebx,[4259657740+ebp*1+ebx]
+        xor     edi,ecx
+        mov     ebp,DWORD [4+esi]
+        add     ebx,edi
+        mov     edi,ecx
+        rol     ebx,23
+        add     ebx,ecx
+        ; R2 36
+        xor     edi,edx
+        xor     edi,ebx
+        lea     eax,[2763975236+ebp*1+eax]
+        add     eax,edi
+        mov     ebp,DWORD [16+esi]
+        rol     eax,4
+        mov     edi,ebx
+        ; R2 37
+        add     eax,ebx
+        xor     edi,ecx
+        lea     edx,[1272893353+ebp*1+edx]
+        xor     edi,eax
+        mov     ebp,DWORD [28+esi]
+        add     edx,edi
+        mov     edi,eax
+        rol     edx,11
+        add     edx,eax
+        ; R2 38
+        xor     edi,ebx
+        xor     edi,edx
+        lea     ecx,[4139469664+ebp*1+ecx]
+        add     ecx,edi
+        mov     ebp,DWORD [40+esi]
+        rol     ecx,16
+        mov     edi,edx
+        ; R2 39
+        add     ecx,edx
+        xor     edi,eax
+        lea     ebx,[3200236656+ebp*1+ebx]
+        xor     edi,ecx
+        mov     ebp,DWORD [52+esi]
+        add     ebx,edi
+        mov     edi,ecx
+        rol     ebx,23
+        add     ebx,ecx
+        ; R2 40
+        xor     edi,edx
+        xor     edi,ebx
+        lea     eax,[681279174+ebp*1+eax]
+        add     eax,edi
+        mov     ebp,DWORD [esi]
+        rol     eax,4
+        mov     edi,ebx
+        ; R2 41
+        add     eax,ebx
+        xor     edi,ecx
+        lea     edx,[3936430074+ebp*1+edx]
+        xor     edi,eax
+        mov     ebp,DWORD [12+esi]
+        add     edx,edi
+        mov     edi,eax
+        rol     edx,11
+        add     edx,eax
+        ; R2 42
+        xor     edi,ebx
+        xor     edi,edx
+        lea     ecx,[3572445317+ebp*1+ecx]
+        add     ecx,edi
+        mov     ebp,DWORD [24+esi]
+        rol     ecx,16
+        mov     edi,edx
+        ; R2 43
+        add     ecx,edx
+        xor     edi,eax
+        lea     ebx,[76029189+ebp*1+ebx]
+        xor     edi,ecx
+        mov     ebp,DWORD [36+esi]
+        add     ebx,edi
+        mov     edi,ecx
+        rol     ebx,23
+        add     ebx,ecx
+        ; R2 44
+        xor     edi,edx
+        xor     edi,ebx
+        lea     eax,[3654602809+ebp*1+eax]
+        add     eax,edi
+        mov     ebp,DWORD [48+esi]
+        rol     eax,4
+        mov     edi,ebx
+        ; R2 45
+        add     eax,ebx
+        xor     edi,ecx
+        lea     edx,[3873151461+ebp*1+edx]
+        xor     edi,eax
+        mov     ebp,DWORD [60+esi]
+        add     edx,edi
+        mov     edi,eax
+        rol     edx,11
+        add     edx,eax
+        ; R2 46
+        xor     edi,ebx
+        xor     edi,edx
+        lea     ecx,[530742520+ebp*1+ecx]
+        add     ecx,edi
+        mov     ebp,DWORD [8+esi]
+        rol     ecx,16
+        mov     edi,edx
+        ; R2 47
+        add     ecx,edx
+        xor     edi,eax
+        lea     ebx,[3299628645+ebp*1+ebx]
+        xor     edi,ecx
+        mov     ebp,DWORD [esi]
+        add     ebx,edi
+        mov     edi,-1
+        rol     ebx,23
+        add     ebx,ecx
+        ;
+        ; R3 section
+        ; R3 48
+        xor     edi,edx
+        or      edi,ebx
+        lea     eax,[4096336452+ebp*1+eax]
+        xor     edi,ecx
+        mov     ebp,DWORD [28+esi]
+        add     eax,edi
+        mov     edi,-1
+        rol     eax,6
+        xor     edi,ecx
+        add     eax,ebx
+        ; R3 49
+        or      edi,eax
+        lea     edx,[1126891415+ebp*1+edx]
+        xor     edi,ebx
+        mov     ebp,DWORD [56+esi]
+        add     edx,edi
+        mov     edi,-1
+        rol     edx,10
+        xor     edi,ebx
+        add     edx,eax
+        ; R3 50
+        or      edi,edx
+        lea     ecx,[2878612391+ebp*1+ecx]
+        xor     edi,eax
+        mov     ebp,DWORD [20+esi]
+        add     ecx,edi
+        mov     edi,-1
+        rol     ecx,15
+        xor     edi,eax
+        add     ecx,edx
+        ; R3 51
+        or      edi,ecx
+        lea     ebx,[4237533241+ebp*1+ebx]
+        xor     edi,edx
+        mov     ebp,DWORD [48+esi]
+        add     ebx,edi
+        mov     edi,-1
+        rol     ebx,21
+        xor     edi,edx
+        add     ebx,ecx
+        ; R3 52
+        or      edi,ebx
+        lea     eax,[1700485571+ebp*1+eax]
+        xor     edi,ecx
+        mov     ebp,DWORD [12+esi]
+        add     eax,edi
+        mov     edi,-1
+        rol     eax,6
+        xor     edi,ecx
+        add     eax,ebx
+        ; R3 53
+        or      edi,eax
+        lea     edx,[2399980690+ebp*1+edx]
+        xor     edi,ebx
+        mov     ebp,DWORD [40+esi]
+        add     edx,edi
+        mov     edi,-1
+        rol     edx,10
+        xor     edi,ebx
+        add     edx,eax
+        ; R3 54
+        or      edi,edx
+        lea     ecx,[4293915773+ebp*1+ecx]
+        xor     edi,eax
+        mov     ebp,DWORD [4+esi]
+        add     ecx,edi
+        mov     edi,-1
+        rol     ecx,15
+        xor     edi,eax
+        add     ecx,edx
+        ; R3 55
+        or      edi,ecx
+        lea     ebx,[2240044497+ebp*1+ebx]
+        xor     edi,edx
+        mov     ebp,DWORD [32+esi]
+        add     ebx,edi
+        mov     edi,-1
+        rol     ebx,21
+        xor     edi,edx
+        add     ebx,ecx
+        ; R3 56
+        or      edi,ebx
+        lea     eax,[1873313359+ebp*1+eax]
+        xor     edi,ecx
+        mov     ebp,DWORD [60+esi]
+        add     eax,edi
+        mov     edi,-1
+        rol     eax,6
+        xor     edi,ecx
+        add     eax,ebx
+        ; R3 57
+        or      edi,eax
+        lea     edx,[4264355552+ebp*1+edx]
+        xor     edi,ebx
+        mov     ebp,DWORD [24+esi]
+        add     edx,edi
+        mov     edi,-1
+        rol     edx,10
+        xor     edi,ebx
+        add     edx,eax
+        ; R3 58
+        or      edi,edx
+        lea     ecx,[2734768916+ebp*1+ecx]
+        xor     edi,eax
+        mov     ebp,DWORD [52+esi]
+        add     ecx,edi
+        mov     edi,-1
+        rol     ecx,15
+        xor     edi,eax
+        add     ecx,edx
+        ; R3 59
+        or      edi,ecx
+        lea     ebx,[1309151649+ebp*1+ebx]
+        xor     edi,edx
+        mov     ebp,DWORD [16+esi]
+        add     ebx,edi
+        mov     edi,-1
+        rol     ebx,21
+        xor     edi,edx
+        add     ebx,ecx
+        ; R3 60
+        or      edi,ebx
+        lea     eax,[4149444226+ebp*1+eax]
+        xor     edi,ecx
+        mov     ebp,DWORD [44+esi]
+        add     eax,edi
+        mov     edi,-1
+        rol     eax,6
+        xor     edi,ecx
+        add     eax,ebx
+        ; R3 61
+        or      edi,eax
+        lea     edx,[3174756917+ebp*1+edx]
+        xor     edi,ebx
+        mov     ebp,DWORD [8+esi]
+        add     edx,edi
+        mov     edi,-1
+        rol     edx,10
+        xor     edi,ebx
+        add     edx,eax
+        ; R3 62
+        or      edi,edx
+        lea     ecx,[718787259+ebp*1+ecx]
+        xor     edi,eax
+        mov     ebp,DWORD [36+esi]
+        add     ecx,edi
+        mov     edi,-1
+        rol     ecx,15
+        xor     edi,eax
+        add     ecx,edx
+        ; R3 63
+        or      edi,ecx
+        lea     ebx,[3951481745+ebp*1+ebx]
+        xor     edi,edx
+        mov     ebp,DWORD [24+esp]
+        add     ebx,edi
+        add     esi,64
+        rol     ebx,21
+        mov     edi,DWORD [ebp]
+        add     ebx,ecx
+        add     eax,edi
+        mov     edi,DWORD [4+ebp]
+        add     ebx,edi
+        mov     edi,DWORD [8+ebp]
+        add     ecx,edi
+        mov     edi,DWORD [12+ebp]
+        add     edx,edi
+        mov     DWORD [ebp],eax
+        mov     DWORD [4+ebp],ebx
+        mov     edi,DWORD [esp]
+        mov     DWORD [8+ebp],ecx
+        mov     DWORD [12+ebp],edx
+        cmp     edi,esi
+        jae     NEAR L$000start
+        pop     eax
+        pop     ebx
+        pop     ebp
+        pop     edi
+        pop     esi
+        ret
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm
new file mode 100644
index 0000000000..57649ad22b
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm
@@ -0,0 +1,1264 @@
+; Copyright 2010-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+global  _gcm_gmult_4bit_x86
+align   16
+_gcm_gmult_4bit_x86:
+L$_gcm_gmult_4bit_x86_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        sub     esp,84
+        mov     edi,DWORD [104+esp]
+        mov     esi,DWORD [108+esp]
+        mov     ebp,DWORD [edi]
+        mov     edx,DWORD [4+edi]
+        mov     ecx,DWORD [8+edi]
+        mov     ebx,DWORD [12+edi]
+        mov     DWORD [16+esp],0
+        mov     DWORD [20+esp],471859200
+        mov     DWORD [24+esp],943718400
+        mov     DWORD [28+esp],610271232
+        mov     DWORD [32+esp],1887436800
+        mov     DWORD [36+esp],1822425088
+        mov     DWORD [40+esp],1220542464
+        mov     DWORD [44+esp],1423966208
+        mov     DWORD [48+esp],3774873600
+        mov     DWORD [52+esp],4246732800
+        mov     DWORD [56+esp],3644850176
+        mov     DWORD [60+esp],3311403008
+        mov     DWORD [64+esp],2441084928
+        mov     DWORD [68+esp],2376073216
+        mov     DWORD [72+esp],2847932416
+        mov     DWORD [76+esp],3051356160
+        mov     DWORD [esp],ebp
+        mov     DWORD [4+esp],edx
+        mov     DWORD [8+esp],ecx
+        mov     DWORD [12+esp],ebx
+        shr     ebx,20
+        and     ebx,240
+        mov     ebp,DWORD [4+ebx*1+esi]
+        mov     edx,DWORD [ebx*1+esi]
+        mov     ecx,DWORD [12+ebx*1+esi]
+        mov     ebx,DWORD [8+ebx*1+esi]
+        xor     eax,eax
+        mov     edi,15
+        jmp     NEAR L$000x86_loop
+align   16
+L$000x86_loop:
+        mov     al,bl
+        shrd    ebx,ecx,4
+        and     al,15
+        shrd    ecx,edx,4
+        shrd    edx,ebp,4
+        shr     ebp,4
+        xor     ebp,DWORD [16+eax*4+esp]
+        mov     al,BYTE [edi*1+esp]
+        and     al,240
+        xor     ebx,DWORD [8+eax*1+esi]
+        xor     ecx,DWORD [12+eax*1+esi]
+        xor     edx,DWORD [eax*1+esi]
+        xor     ebp,DWORD [4+eax*1+esi]
+        dec     edi
+        js      NEAR L$001x86_break
+        mov     al,bl
+        shrd    ebx,ecx,4
+        and     al,15
+        shrd    ecx,edx,4
+        shrd    edx,ebp,4
+        shr     ebp,4
+        xor     ebp,DWORD [16+eax*4+esp]
+        mov     al,BYTE [edi*1+esp]
+        shl     al,4
+        xor     ebx,DWORD [8+eax*1+esi]
+        xor     ecx,DWORD [12+eax*1+esi]
+        xor     edx,DWORD [eax*1+esi]
+        xor     ebp,DWORD [4+eax*1+esi]
+        jmp     NEAR L$000x86_loop
+align   16
+L$001x86_break:
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        bswap   ebp
+        mov     edi,DWORD [104+esp]
+        mov     DWORD [12+edi],ebx
+        mov     DWORD [8+edi],ecx
+        mov     DWORD [4+edi],edx
+        mov     DWORD [edi],ebp
+        add     esp,84
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _gcm_ghash_4bit_x86
+align   16
+_gcm_ghash_4bit_x86:
+L$_gcm_ghash_4bit_x86_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        sub     esp,84
+        mov     ebx,DWORD [104+esp]
+        mov     esi,DWORD [108+esp]
+        mov     edi,DWORD [112+esp]
+        mov     ecx,DWORD [116+esp]
+        add     ecx,edi
+        mov     DWORD [116+esp],ecx
+        mov     ebp,DWORD [ebx]
+        mov     edx,DWORD [4+ebx]
+        mov     ecx,DWORD [8+ebx]
+        mov     ebx,DWORD [12+ebx]
+        mov     DWORD [16+esp],0
+        mov     DWORD [20+esp],471859200
+        mov     DWORD [24+esp],943718400
+        mov     DWORD [28+esp],610271232
+        mov     DWORD [32+esp],1887436800
+        mov     DWORD [36+esp],1822425088
+        mov     DWORD [40+esp],1220542464
+        mov     DWORD [44+esp],1423966208
+        mov     DWORD [48+esp],3774873600
+        mov     DWORD [52+esp],4246732800
+        mov     DWORD [56+esp],3644850176
+        mov     DWORD [60+esp],3311403008
+        mov     DWORD [64+esp],2441084928
+        mov     DWORD [68+esp],2376073216
+        mov     DWORD [72+esp],2847932416
+        mov     DWORD [76+esp],3051356160
+align   16
+L$002x86_outer_loop:
+        xor     ebx,DWORD [12+edi]
+        xor     ecx,DWORD [8+edi]
+        xor     edx,DWORD [4+edi]
+        xor     ebp,DWORD [edi]
+        mov     DWORD [12+esp],ebx
+        mov     DWORD [8+esp],ecx
+        mov     DWORD [4+esp],edx
+        mov     DWORD [esp],ebp
+        shr     ebx,20
+        and     ebx,240
+        mov     ebp,DWORD [4+ebx*1+esi]
+        mov     edx,DWORD [ebx*1+esi]
+        mov     ecx,DWORD [12+ebx*1+esi]
+        mov     ebx,DWORD [8+ebx*1+esi]
+        xor     eax,eax
+        mov     edi,15
+        jmp     NEAR L$003x86_loop
+align   16
+L$003x86_loop:
+        mov     al,bl
+        shrd    ebx,ecx,4
+        and     al,15
+        shrd    ecx,edx,4
+        shrd    edx,ebp,4
+        shr     ebp,4
+        xor     ebp,DWORD [16+eax*4+esp]
+        mov     al,BYTE [edi*1+esp]
+        and     al,240
+        xor     ebx,DWORD [8+eax*1+esi]
+        xor     ecx,DWORD [12+eax*1+esi]
+        xor     edx,DWORD [eax*1+esi]
+        xor     ebp,DWORD [4+eax*1+esi]
+        dec     edi
+        js      NEAR L$004x86_break
+        mov     al,bl
+        shrd    ebx,ecx,4
+        and     al,15
+        shrd    ecx,edx,4
+        shrd    edx,ebp,4
+        shr     ebp,4
+        xor     ebp,DWORD [16+eax*4+esp]
+        mov     al,BYTE [edi*1+esp]
+        shl     al,4
+        xor     ebx,DWORD [8+eax*1+esi]
+        xor     ecx,DWORD [12+eax*1+esi]
+        xor     edx,DWORD [eax*1+esi]
+        xor     ebp,DWORD [4+eax*1+esi]
+        jmp     NEAR L$003x86_loop
+align   16
+L$004x86_break:
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        bswap   ebp
+        mov     edi,DWORD [112+esp]
+        lea     edi,[16+edi]
+        cmp     edi,DWORD [116+esp]
+        mov     DWORD [112+esp],edi
+        jb      NEAR L$002x86_outer_loop
+        mov     edi,DWORD [104+esp]
+        mov     DWORD [12+edi],ebx
+        mov     DWORD [8+edi],ecx
+        mov     DWORD [4+edi],edx
+        mov     DWORD [edi],ebp
+        add     esp,84
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _gcm_gmult_4bit_mmx
+align   16
+_gcm_gmult_4bit_mmx:
+L$_gcm_gmult_4bit_mmx_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     edi,DWORD [20+esp]
+        mov     esi,DWORD [24+esp]
+        call    L$005pic_point
+L$005pic_point:
+        pop     eax
+        lea     eax,[(L$rem_4bit-L$005pic_point)+eax]
+        movzx   ebx,BYTE [15+edi]
+        xor     ecx,ecx
+        mov     edx,ebx
+        mov     cl,dl
+        mov     ebp,14
+        shl     cl,4
+        and     edx,240
+        movq    mm0,[8+ecx*1+esi]
+        movq    mm1,[ecx*1+esi]
+        movd    ebx,mm0
+        jmp     NEAR L$006mmx_loop
+align   16
+L$006mmx_loop:
+        psrlq   mm0,4
+        and     ebx,15
+        movq    mm2,mm1
+        psrlq   mm1,4
+        pxor    mm0,[8+edx*1+esi]
+        mov     cl,BYTE [ebp*1+edi]
+        psllq   mm2,60
+        pxor    mm1,[ebx*8+eax]
+        dec     ebp
+        movd    ebx,mm0
+        pxor    mm1,[edx*1+esi]
+        mov     edx,ecx
+        pxor    mm0,mm2
+        js      NEAR L$007mmx_break
+        shl     cl,4
+        and     ebx,15
+        psrlq   mm0,4
+        and     edx,240
+        movq    mm2,mm1
+        psrlq   mm1,4
+        pxor    mm0,[8+ecx*1+esi]
+        psllq   mm2,60
+        pxor    mm1,[ebx*8+eax]
+        movd    ebx,mm0
+        pxor    mm1,[ecx*1+esi]
+        pxor    mm0,mm2
+        jmp     NEAR L$006mmx_loop
+align   16
+L$007mmx_break:
+        shl     cl,4
+        and     ebx,15
+        psrlq   mm0,4
+        and     edx,240
+        movq    mm2,mm1
+        psrlq   mm1,4
+        pxor    mm0,[8+ecx*1+esi]
+        psllq   mm2,60
+        pxor    mm1,[ebx*8+eax]
+        movd    ebx,mm0
+        pxor    mm1,[ecx*1+esi]
+        pxor    mm0,mm2
+        psrlq   mm0,4
+        and     ebx,15
+        movq    mm2,mm1
+        psrlq   mm1,4
+        pxor    mm0,[8+edx*1+esi]
+        psllq   mm2,60
+        pxor    mm1,[ebx*8+eax]
+        movd    ebx,mm0
+        pxor    mm1,[edx*1+esi]
+        pxor    mm0,mm2
+        psrlq   mm0,32
+        movd    edx,mm1
+        psrlq   mm1,32
+        movd    ecx,mm0
+        movd    ebp,mm1
+        bswap   ebx
+        bswap   edx
+        bswap   ecx
+        bswap   ebp
+        emms
+        mov     DWORD [12+edi],ebx
+        mov     DWORD [4+edi],edx
+        mov     DWORD [8+edi],ecx
+        mov     DWORD [edi],ebp
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _gcm_ghash_4bit_mmx
+align   16
+_gcm_ghash_4bit_mmx:
+L$_gcm_ghash_4bit_mmx_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     eax,DWORD [20+esp]
+        mov     ebx,DWORD [24+esp]
+        mov     ecx,DWORD [28+esp]
+        mov     edx,DWORD [32+esp]
+        mov     ebp,esp
+        call    L$008pic_point
+L$008pic_point:
+        pop     esi
+        lea     esi,[(L$rem_8bit-L$008pic_point)+esi]
+        sub     esp,544
+        and     esp,-64
+        sub     esp,16
+        add     edx,ecx
+        mov     DWORD [544+esp],eax
+        mov     DWORD [552+esp],edx
+        mov     DWORD [556+esp],ebp
+        add     ebx,128
+        lea     edi,[144+esp]
+        lea     ebp,[400+esp]
+        mov     edx,DWORD [ebx-120]
+        movq    mm0,[ebx-120]
+        movq    mm3,[ebx-128]
+        shl     edx,4
+        mov     BYTE [esp],dl
+        mov     edx,DWORD [ebx-104]
+        movq    mm2,[ebx-104]
+        movq    mm5,[ebx-112]
+        movq    [edi-128],mm0
+        psrlq   mm0,4
+        movq    [edi],mm3
+        movq    mm7,mm3
+        psrlq   mm3,4
+        shl     edx,4
+        mov     BYTE [1+esp],dl
+        mov     edx,DWORD [ebx-88]
+        movq    mm1,[ebx-88]
+        psllq   mm7,60
+        movq    mm4,[ebx-96]
+        por     mm0,mm7
+        movq    [edi-120],mm2
+        psrlq   mm2,4
+        movq    [8+edi],mm5
+        movq    mm6,mm5
+        movq    [ebp-128],mm0
+        psrlq   mm5,4
+        movq    [ebp],mm3
+        shl     edx,4
+        mov     BYTE [2+esp],dl
+        mov     edx,DWORD [ebx-72]
+        movq    mm0,[ebx-72]
+        psllq   mm6,60
+        movq    mm3,[ebx-80]
+        por     mm2,mm6
+        movq    [edi-112],mm1
+        psrlq   mm1,4
+        movq    [16+edi],mm4
+        movq    mm7,mm4
+        movq    [ebp-120],mm2
+        psrlq   mm4,4
+        movq    [8+ebp],mm5
+        shl     edx,4
+        mov     BYTE [3+esp],dl
+        mov     edx,DWORD [ebx-56]
+        movq    mm2,[ebx-56]
+        psllq   mm7,60
+        movq    mm5,[ebx-64]
+        por     mm1,mm7
+        movq    [edi-104],mm0
+        psrlq   mm0,4
+        movq    [24+edi],mm3
+        movq    mm6,mm3
+        movq    [ebp-112],mm1
+        psrlq   mm3,4
+        movq    [16+ebp],mm4
+        shl     edx,4
+        mov     BYTE [4+esp],dl
+        mov     edx,DWORD [ebx-40]
+        movq    mm1,[ebx-40]
+        psllq   mm6,60
+        movq    mm4,[ebx-48]
+        por     mm0,mm6
+        movq    [edi-96],mm2
+        psrlq   mm2,4
+        movq    [32+edi],mm5
+        movq    mm7,mm5
+        movq    [ebp-104],mm0
+        psrlq   mm5,4
+        movq    [24+ebp],mm3
+        shl     edx,4
+        mov     BYTE [5+esp],dl
+        mov     edx,DWORD [ebx-24]
+        movq    mm0,[ebx-24]
+        psllq   mm7,60
+        movq    mm3,[ebx-32]
+        por     mm2,mm7
+        movq    [edi-88],mm1
+        psrlq   mm1,4
+        movq    [40+edi],mm4
+        movq    mm6,mm4
+        movq    [ebp-96],mm2
+        psrlq   mm4,4
+        movq    [32+ebp],mm5
+        shl     edx,4
+        mov     BYTE [6+esp],dl
+        mov     edx,DWORD [ebx-8]
+        movq    mm2,[ebx-8]
+        psllq   mm6,60
+        movq    mm5,[ebx-16]
+        por     mm1,mm6
+        movq    [edi-80],mm0
+        psrlq   mm0,4
+        movq    [48+edi],mm3
+        movq    mm7,mm3
+        movq    [ebp-88],mm1
+        psrlq   mm3,4
+        movq    [40+ebp],mm4
+        shl     edx,4
+        mov     BYTE [7+esp],dl
+        mov     edx,DWORD [8+ebx]
+        movq    mm1,[8+ebx]
+        psllq   mm7,60
+        movq    mm4,[ebx]
+        por     mm0,mm7
+        movq    [edi-72],mm2
+        psrlq   mm2,4
+        movq    [56+edi],mm5
+        movq    mm6,mm5
+        movq    [ebp-80],mm0
+        psrlq   mm5,4
+        movq    [48+ebp],mm3
+        shl     edx,4
+        mov     BYTE [8+esp],dl
+        mov     edx,DWORD [24+ebx]
+        movq    mm0,[24+ebx]
+        psllq   mm6,60
+        movq    mm3,[16+ebx]
+        por     mm2,mm6
+        movq    [edi-64],mm1
+        psrlq   mm1,4
+        movq    [64+edi],mm4
+        movq    mm7,mm4
+        movq    [ebp-72],mm2
+        psrlq   mm4,4
+        movq    [56+ebp],mm5
+        shl     edx,4
+        mov     BYTE [9+esp],dl
+        mov     edx,DWORD [40+ebx]
+        movq    mm2,[40+ebx]
+        psllq   mm7,60
+        movq    mm5,[32+ebx]
+        por     mm1,mm7
+        movq    [edi-56],mm0
+        psrlq   mm0,4
+        movq    [72+edi],mm3
+        movq    mm6,mm3
+        movq    [ebp-64],mm1
+        psrlq   mm3,4
+        movq    [64+ebp],mm4
+        shl     edx,4
+        mov     BYTE [10+esp],dl
+        mov     edx,DWORD [56+ebx]
+        movq    mm1,[56+ebx]
+        psllq   mm6,60
+        movq    mm4,[48+ebx]
+        por     mm0,mm6
+        movq    [edi-48],mm2
+        psrlq   mm2,4
+        movq    [80+edi],mm5
+        movq    mm7,mm5
+        movq    [ebp-56],mm0
+        psrlq   mm5,4
+        movq    [72+ebp],mm3
+        shl     edx,4
+        mov     BYTE [11+esp],dl
+        mov     edx,DWORD [72+ebx]
+        movq    mm0,[72+ebx]
+        psllq   mm7,60
+        movq    mm3,[64+ebx]
+        por     mm2,mm7
+        movq    [edi-40],mm1
+        psrlq   mm1,4
+        movq    [88+edi],mm4
+        movq    mm6,mm4
+        movq    [ebp-48],mm2
+        psrlq   mm4,4
+        movq    [80+ebp],mm5
+        shl     edx,4
+        mov     BYTE [12+esp],dl
+        mov     edx,DWORD [88+ebx]
+        movq    mm2,[88+ebx]
+        psllq   mm6,60
+        movq    mm5,[80+ebx]
+        por     mm1,mm6
+        movq    [edi-32],mm0
+        psrlq   mm0,4
+        movq    [96+edi],mm3
+        movq    mm7,mm3
+        movq    [ebp-40],mm1
+        psrlq   mm3,4
+        movq    [88+ebp],mm4
+        shl     edx,4
+        mov     BYTE [13+esp],dl
+        mov     edx,DWORD [104+ebx]
+        movq    mm1,[104+ebx]
+        psllq   mm7,60
+        movq    mm4,[96+ebx]
+        por     mm0,mm7
+        movq    [edi-24],mm2
+        psrlq   mm2,4
+        movq    [104+edi],mm5
+        movq    mm6,mm5
+        movq    [ebp-32],mm0
+        psrlq   mm5,4
+        movq    [96+ebp],mm3
+        shl     edx,4
+        mov     BYTE [14+esp],dl
+        mov     edx,DWORD [120+ebx]
+        movq    mm0,[120+ebx]
+        psllq   mm6,60
+        movq    mm3,[112+ebx]
+        por     mm2,mm6
+        movq    [edi-16],mm1
+        psrlq   mm1,4
+        movq    [112+edi],mm4
+        movq    mm7,mm4
+        movq    [ebp-24],mm2
+        psrlq   mm4,4
+        movq    [104+ebp],mm5
+        shl     edx,4
+        mov     BYTE [15+esp],dl
+        psllq   mm7,60
+        por     mm1,mm7
+        movq    [edi-8],mm0
+        psrlq   mm0,4
+        movq    [120+edi],mm3
+        movq    mm6,mm3
+        movq    [ebp-16],mm1
+        psrlq   mm3,4
+        movq    [112+ebp],mm4
+        psllq   mm6,60
+        por     mm0,mm6
+        movq    [ebp-8],mm0
+        movq    [120+ebp],mm3
+        movq    mm6,[eax]
+        mov     ebx,DWORD [8+eax]
+        mov     edx,DWORD [12+eax]
+align   16
+L$009outer:
+        xor     edx,DWORD [12+ecx]
+        xor     ebx,DWORD [8+ecx]
+        pxor    mm6,[ecx]
+        lea     ecx,[16+ecx]
+        mov     DWORD [536+esp],ebx
+        movq    [528+esp],mm6
+        mov     DWORD [548+esp],ecx
+        xor     eax,eax
+        rol     edx,8
+        mov     al,dl
+        mov     ebp,eax
+        and     al,15
+        shr     ebp,4
+        pxor    mm0,mm0
+        rol     edx,8
+        pxor    mm1,mm1
+        pxor    mm2,mm2
+        movq    mm7,[16+eax*8+esp]
+        movq    mm6,[144+eax*8+esp]
+        mov     al,dl
+        movd    ebx,mm7
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     edi,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+ebp*8+esp]
+        and     al,15
+        psllq   mm3,56
+        shr     edi,4
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+ebp*8+esp]
+        xor     bl,BYTE [ebp*1+esp]
+        mov     al,dl
+        movd    ecx,mm7
+        movzx   ebx,bl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     ebp,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+edi*8+esp]
+        and     al,15
+        psllq   mm3,56
+        shr     ebp,4
+        pinsrw  mm2,WORD [ebx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+edi*8+esp]
+        xor     cl,BYTE [edi*1+esp]
+        mov     al,dl
+        mov     edx,DWORD [536+esp]
+        movd    ebx,mm7
+        movzx   ecx,cl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     edi,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+ebp*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm2
+        shr     edi,4
+        pinsrw  mm1,WORD [ecx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+ebp*8+esp]
+        xor     bl,BYTE [ebp*1+esp]
+        mov     al,dl
+        movd    ecx,mm7
+        movzx   ebx,bl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     ebp,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+edi*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm1
+        shr     ebp,4
+        pinsrw  mm0,WORD [ebx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+edi*8+esp]
+        xor     cl,BYTE [edi*1+esp]
+        mov     al,dl
+        movd    ebx,mm7
+        movzx   ecx,cl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     edi,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+ebp*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm0
+        shr     edi,4
+        pinsrw  mm2,WORD [ecx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+ebp*8+esp]
+        xor     bl,BYTE [ebp*1+esp]
+        mov     al,dl
+        movd    ecx,mm7
+        movzx   ebx,bl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     ebp,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+edi*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm2
+        shr     ebp,4
+        pinsrw  mm1,WORD [ebx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+edi*8+esp]
+        xor     cl,BYTE [edi*1+esp]
+        mov     al,dl
+        mov     edx,DWORD [532+esp]
+        movd    ebx,mm7
+        movzx   ecx,cl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     edi,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+ebp*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm1
+        shr     edi,4
+        pinsrw  mm0,WORD [ecx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+ebp*8+esp]
+        xor     bl,BYTE [ebp*1+esp]
+        mov     al,dl
+        movd    ecx,mm7
+        movzx   ebx,bl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     ebp,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+edi*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm0
+        shr     ebp,4
+        pinsrw  mm2,WORD [ebx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+edi*8+esp]
+        xor     cl,BYTE [edi*1+esp]
+        mov     al,dl
+        movd    ebx,mm7
+        movzx   ecx,cl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     edi,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+ebp*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm2
+        shr     edi,4
+        pinsrw  mm1,WORD [ecx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+ebp*8+esp]
+        xor     bl,BYTE [ebp*1+esp]
+        mov     al,dl
+        movd    ecx,mm7
+        movzx   ebx,bl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     ebp,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+edi*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm1
+        shr     ebp,4
+        pinsrw  mm0,WORD [ebx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+edi*8+esp]
+        xor     cl,BYTE [edi*1+esp]
+        mov     al,dl
+        mov     edx,DWORD [528+esp]
+        movd    ebx,mm7
+        movzx   ecx,cl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     edi,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+ebp*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm0
+        shr     edi,4
+        pinsrw  mm2,WORD [ecx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+ebp*8+esp]
+        xor     bl,BYTE [ebp*1+esp]
+        mov     al,dl
+        movd    ecx,mm7
+        movzx   ebx,bl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     ebp,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+edi*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm2
+        shr     ebp,4
+        pinsrw  mm1,WORD [ebx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+edi*8+esp]
+        xor     cl,BYTE [edi*1+esp]
+        mov     al,dl
+        movd    ebx,mm7
+        movzx   ecx,cl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     edi,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+ebp*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm1
+        shr     edi,4
+        pinsrw  mm0,WORD [ecx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+ebp*8+esp]
+        xor     bl,BYTE [ebp*1+esp]
+        mov     al,dl
+        movd    ecx,mm7
+        movzx   ebx,bl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     ebp,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+edi*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm0
+        shr     ebp,4
+        pinsrw  mm2,WORD [ebx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        rol     edx,8
+        pxor    mm6,[144+eax*8+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+edi*8+esp]
+        xor     cl,BYTE [edi*1+esp]
+        mov     al,dl
+        mov     edx,DWORD [524+esp]
+        movd    ebx,mm7
+        movzx   ecx,cl
+        psrlq   mm7,8
+        movq    mm3,mm6
+        mov     edi,eax
+        psrlq   mm6,8
+        pxor    mm7,[272+ebp*8+esp]
+        and     al,15
+        psllq   mm3,56
+        pxor    mm6,mm2
+        shr     edi,4
+        pinsrw  mm1,WORD [ecx*2+esi],2
+        pxor    mm7,[16+eax*8+esp]
+        pxor    mm6,[144+eax*8+esp]
+        xor     bl,BYTE [ebp*1+esp]
+        pxor    mm7,mm3
+        pxor    mm6,[400+ebp*8+esp]
+        movzx   ebx,bl
+        pxor    mm2,mm2
+        psllq   mm1,4
+        movd    ecx,mm7
+        psrlq   mm7,4
+        movq    mm3,mm6
+        psrlq   mm6,4
+        shl     ecx,4
+        pxor    mm7,[16+edi*8+esp]
+        psllq   mm3,60
+        movzx   ecx,cl
+        pxor    mm7,mm3
+        pxor    mm6,[144+edi*8+esp]
+        pinsrw  mm0,WORD [ebx*2+esi],2
+        pxor    mm6,mm1
+        movd    edx,mm7
+        pinsrw  mm2,WORD [ecx*2+esi],3
+        psllq   mm0,12
+        pxor    mm6,mm0
+        psrlq   mm7,32
+        pxor    mm6,mm2
+        mov     ecx,DWORD [548+esp]
+        movd    ebx,mm7
+        movq    mm3,mm6
+        psllw   mm6,8
+        psrlw   mm3,8
+        por     mm6,mm3
+        bswap   edx
+        pshufw  mm6,mm6,27
+        bswap   ebx
+        cmp     ecx,DWORD [552+esp]
+        jne     NEAR L$009outer
+        mov     eax,DWORD [544+esp]
+        mov     DWORD [12+eax],edx
+        mov     DWORD [8+eax],ebx
+        movq    [eax],mm6
+        mov     esp,DWORD [556+esp]
+        emms
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _gcm_init_clmul
+align   16
+_gcm_init_clmul:
+L$_gcm_init_clmul_begin:
+        mov     edx,DWORD [4+esp]
+        mov     eax,DWORD [8+esp]
+        call    L$010pic
+L$010pic:
+        pop     ecx
+        lea     ecx,[(L$bswap-L$010pic)+ecx]
+        movdqu  xmm2,[eax]
+        pshufd  xmm2,xmm2,78
+        pshufd  xmm4,xmm2,255
+        movdqa  xmm3,xmm2
+        psllq   xmm2,1
+        pxor    xmm5,xmm5
+        psrlq   xmm3,63
+        pcmpgtd xmm5,xmm4
+        pslldq  xmm3,8
+        por     xmm2,xmm3
+        pand    xmm5,[16+ecx]
+        pxor    xmm2,xmm5
+        movdqa  xmm0,xmm2
+        movdqa  xmm1,xmm0
+        pshufd  xmm3,xmm0,78
+        pshufd  xmm4,xmm2,78
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm2
+db      102,15,58,68,194,0
+db      102,15,58,68,202,17
+db      102,15,58,68,220,0
+        xorps   xmm3,xmm0
+        xorps   xmm3,xmm1
+        movdqa  xmm4,xmm3
+        psrldq  xmm3,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm3
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+        pshufd  xmm3,xmm2,78
+        pshufd  xmm4,xmm0,78
+        pxor    xmm3,xmm2
+        movdqu  [edx],xmm2
+        pxor    xmm4,xmm0
+        movdqu  [16+edx],xmm0
+db      102,15,58,15,227,8
+        movdqu  [32+edx],xmm4
+        ret
+global  _gcm_gmult_clmul
+align   16
+_gcm_gmult_clmul:
+L$_gcm_gmult_clmul_begin:
+        mov     eax,DWORD [4+esp]
+        mov     edx,DWORD [8+esp]
+        call    L$011pic
+L$011pic:
+        pop     ecx
+        lea     ecx,[(L$bswap-L$011pic)+ecx]
+        movdqu  xmm0,[eax]
+        movdqa  xmm5,[ecx]
+        movups  xmm2,[edx]
+db      102,15,56,0,197
+        movups  xmm4,[32+edx]
+        movdqa  xmm1,xmm0
+        pshufd  xmm3,xmm0,78
+        pxor    xmm3,xmm0
+db      102,15,58,68,194,0
+db      102,15,58,68,202,17
+db      102,15,58,68,220,0
+        xorps   xmm3,xmm0
+        xorps   xmm3,xmm1
+        movdqa  xmm4,xmm3
+        psrldq  xmm3,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm3
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+db      102,15,56,0,197
+        movdqu  [eax],xmm0
+        ret
+global  _gcm_ghash_clmul
+align   16
+_gcm_ghash_clmul:
+L$_gcm_ghash_clmul_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     eax,DWORD [20+esp]
+        mov     edx,DWORD [24+esp]
+        mov     esi,DWORD [28+esp]
+        mov     ebx,DWORD [32+esp]
+        call    L$012pic
+L$012pic:
+        pop     ecx
+        lea     ecx,[(L$bswap-L$012pic)+ecx]
+        movdqu  xmm0,[eax]
+        movdqa  xmm5,[ecx]
+        movdqu  xmm2,[edx]
+db      102,15,56,0,197
+        sub     ebx,16
+        jz      NEAR L$013odd_tail
+        movdqu  xmm3,[esi]
+        movdqu  xmm6,[16+esi]
+db      102,15,56,0,221
+db      102,15,56,0,245
+        movdqu  xmm5,[32+edx]
+        pxor    xmm0,xmm3
+        pshufd  xmm3,xmm6,78
+        movdqa  xmm7,xmm6
+        pxor    xmm3,xmm6
+        lea     esi,[32+esi]
+db      102,15,58,68,242,0
+db      102,15,58,68,250,17
+db      102,15,58,68,221,0
+        movups  xmm2,[16+edx]
+        nop
+        sub     ebx,32
+        jbe     NEAR L$014even_tail
+        jmp     NEAR L$015mod_loop
+align   32
+L$015mod_loop:
+        pshufd  xmm4,xmm0,78
+        movdqa  xmm1,xmm0
+        pxor    xmm4,xmm0
+        nop
+db      102,15,58,68,194,0
+db      102,15,58,68,202,17
+db      102,15,58,68,229,16
+        movups  xmm2,[edx]
+        xorps   xmm0,xmm6
+        movdqa  xmm5,[ecx]
+        xorps   xmm1,xmm7
+        movdqu  xmm7,[esi]
+        pxor    xmm3,xmm0
+        movdqu  xmm6,[16+esi]
+        pxor    xmm3,xmm1
+db      102,15,56,0,253
+        pxor    xmm4,xmm3
+        movdqa  xmm3,xmm4
+        psrldq  xmm4,8
+        pslldq  xmm3,8
+        pxor    xmm1,xmm4
+        pxor    xmm0,xmm3
+db      102,15,56,0,245
+        pxor    xmm1,xmm7
+        movdqa  xmm7,xmm6
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+db      102,15,58,68,242,0
+        movups  xmm5,[32+edx]
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+        pshufd  xmm3,xmm7,78
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm3,xmm7
+        pxor    xmm1,xmm4
+db      102,15,58,68,250,17
+        movups  xmm2,[16+edx]
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+db      102,15,58,68,221,0
+        lea     esi,[32+esi]
+        sub     ebx,32
+        ja      NEAR L$015mod_loop
+L$014even_tail:
+        pshufd  xmm4,xmm0,78
+        movdqa  xmm1,xmm0
+        pxor    xmm4,xmm0
+db      102,15,58,68,194,0
+db      102,15,58,68,202,17
+db      102,15,58,68,229,16
+        movdqa  xmm5,[ecx]
+        xorps   xmm0,xmm6
+        xorps   xmm1,xmm7
+        pxor    xmm3,xmm0
+        pxor    xmm3,xmm1
+        pxor    xmm4,xmm3
+        movdqa  xmm3,xmm4
+        psrldq  xmm4,8
+        pslldq  xmm3,8
+        pxor    xmm1,xmm4
+        pxor    xmm0,xmm3
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+        test    ebx,ebx
+        jnz     NEAR L$016done
+        movups  xmm2,[edx]
+L$013odd_tail:
+        movdqu  xmm3,[esi]
+db      102,15,56,0,221
+        pxor    xmm0,xmm3
+        movdqa  xmm1,xmm0
+        pshufd  xmm3,xmm0,78
+        pshufd  xmm4,xmm2,78
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm2
+db      102,15,58,68,194,0
+db      102,15,58,68,202,17
+db      102,15,58,68,220,0
+        xorps   xmm3,xmm0
+        xorps   xmm3,xmm1
+        movdqa  xmm4,xmm3
+        psrldq  xmm3,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm3
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+L$016done:
+db      102,15,56,0,197
+        movdqu  [eax],xmm0
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   64
+L$bswap:
+db      15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+db      1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,194
+align   64
+L$rem_8bit:
+dw      0,450,900,582,1800,1738,1164,1358
+dw      3600,4050,3476,3158,2328,2266,2716,2910
+dw      7200,7650,8100,7782,6952,6890,6316,6510
+dw      4656,5106,4532,4214,5432,5370,5820,6014
+dw      14400,14722,15300,14854,16200,16010,15564,15630
+dw      13904,14226,13780,13334,12632,12442,13020,13086
+dw      9312,9634,10212,9766,9064,8874,8428,8494
+dw      10864,11186,10740,10294,11640,11450,12028,12094
+dw      28800,28994,29444,29382,30600,30282,29708,30158
+dw      32400,32594,32020,31958,31128,30810,31260,31710
+dw      27808,28002,28452,28390,27560,27242,26668,27118
+dw      25264,25458,24884,24822,26040,25722,26172,26622
+dw      18624,18690,19268,19078,20424,19978,19532,19854
+dw      18128,18194,17748,17558,16856,16410,16988,17310
+dw      21728,21794,22372,22182,21480,21034,20588,20910
+dw      23280,23346,22900,22710,24056,23610,24188,24510
+dw      57600,57538,57988,58182,58888,59338,58764,58446
+dw      61200,61138,60564,60758,59416,59866,60316,59998
+dw      64800,64738,65188,65382,64040,64490,63916,63598
+dw      62256,62194,61620,61814,62520,62970,63420,63102
+dw      55616,55426,56004,56070,56904,57226,56780,56334
+dw      55120,54930,54484,54550,53336,53658,54236,53790
+dw      50528,50338,50916,50982,49768,50090,49644,49198
+dw      52080,51890,51444,51510,52344,52666,53244,52798
+dw      37248,36930,37380,37830,38536,38730,38156,38094
+dw      40848,40530,39956,40406,39064,39258,39708,39646
+dw      36256,35938,36388,36838,35496,35690,35116,35054
+dw      33712,33394,32820,33270,33976,34170,34620,34558
+dw      43456,43010,43588,43910,44744,44810,44364,44174
+dw      42960,42514,42068,42390,41176,41242,41820,41630
+dw      46560,46114,46692,47014,45800,45866,45420,45230
+dw      48112,47666,47220,47542,48376,48442,49020,48830
+align   64
+L$rem_4bit:
+dd      0,0,0,471859200,0,943718400,0,610271232
+dd      0,1887436800,0,1822425088,0,1220542464,0,1423966208
+dd      0,3774873600,0,4246732800,0,3644850176,0,3311403008
+dd      0,2441084928,0,2376073216,0,2847932416,0,3051356160
+db      71,72,65,83,72,32,102,111,114,32,120,56,54,44,32,67
+db      82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112
+db      112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62
+db      0
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm
new file mode 100644
index 0000000000..e78222ee9d
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm
@@ -0,0 +1,381 @@
+; Copyright 1998-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+;extern _OPENSSL_ia32cap_P
+global  _RC4
+align   16
+_RC4:
+L$_RC4_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     edi,DWORD [20+esp]
+        mov     edx,DWORD [24+esp]
+        mov     esi,DWORD [28+esp]
+        mov     ebp,DWORD [32+esp]
+        xor     eax,eax
+        xor     ebx,ebx
+        cmp     edx,0
+        je      NEAR L$000abort
+        mov     al,BYTE [edi]
+        mov     bl,BYTE [4+edi]
+        add     edi,8
+        lea     ecx,[edx*1+esi]
+        sub     ebp,esi
+        mov     DWORD [24+esp],ecx
+        inc     al
+        cmp     DWORD [256+edi],-1
+        je      NEAR L$001RC4_CHAR
+        mov     ecx,DWORD [eax*4+edi]
+        and     edx,-4
+        jz      NEAR L$002loop1
+        mov     DWORD [32+esp],ebp
+        test    edx,-8
+        jz      NEAR L$003go4loop4
+        lea     ebp,[_OPENSSL_ia32cap_P]
+        bt      DWORD [ebp],26
+        jnc     NEAR L$003go4loop4
+        mov     ebp,DWORD [32+esp]
+        and     edx,-8
+        lea     edx,[edx*1+esi-8]
+        mov     DWORD [edi-4],edx
+        add     bl,cl
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        inc     eax
+        add     edx,ecx
+        movzx   eax,al
+        movzx   edx,dl
+        movq    mm0,[esi]
+        mov     ecx,DWORD [eax*4+edi]
+        movd    mm2,DWORD [edx*4+edi]
+        jmp     NEAR L$004loop_mmx_enter
+align   16
+L$005loop_mmx:
+        add     bl,cl
+        psllq   mm1,56
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        inc     eax
+        add     edx,ecx
+        movzx   eax,al
+        movzx   edx,dl
+        pxor    mm2,mm1
+        movq    mm0,[esi]
+        movq    [esi*1+ebp-8],mm2
+        mov     ecx,DWORD [eax*4+edi]
+        movd    mm2,DWORD [edx*4+edi]
+L$004loop_mmx_enter:
+        add     bl,cl
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        inc     eax
+        add     edx,ecx
+        movzx   eax,al
+        movzx   edx,dl
+        pxor    mm2,mm0
+        mov     ecx,DWORD [eax*4+edi]
+        movd    mm1,DWORD [edx*4+edi]
+        add     bl,cl
+        psllq   mm1,8
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        inc     eax
+        add     edx,ecx
+        movzx   eax,al
+        movzx   edx,dl
+        pxor    mm2,mm1
+        mov     ecx,DWORD [eax*4+edi]
+        movd    mm1,DWORD [edx*4+edi]
+        add     bl,cl
+        psllq   mm1,16
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        inc     eax
+        add     edx,ecx
+        movzx   eax,al
+        movzx   edx,dl
+        pxor    mm2,mm1
+        mov     ecx,DWORD [eax*4+edi]
+        movd    mm1,DWORD [edx*4+edi]
+        add     bl,cl
+        psllq   mm1,24
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        inc     eax
+        add     edx,ecx
+        movzx   eax,al
+        movzx   edx,dl
+        pxor    mm2,mm1
+        mov     ecx,DWORD [eax*4+edi]
+        movd    mm1,DWORD [edx*4+edi]
+        add     bl,cl
+        psllq   mm1,32
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        inc     eax
+        add     edx,ecx
+        movzx   eax,al
+        movzx   edx,dl
+        pxor    mm2,mm1
+        mov     ecx,DWORD [eax*4+edi]
+        movd    mm1,DWORD [edx*4+edi]
+        add     bl,cl
+        psllq   mm1,40
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        inc     eax
+        add     edx,ecx
+        movzx   eax,al
+        movzx   edx,dl
+        pxor    mm2,mm1
+        mov     ecx,DWORD [eax*4+edi]
+        movd    mm1,DWORD [edx*4+edi]
+        add     bl,cl
+        psllq   mm1,48
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        inc     eax
+        add     edx,ecx
+        movzx   eax,al
+        movzx   edx,dl
+        pxor    mm2,mm1
+        mov     ecx,DWORD [eax*4+edi]
+        movd    mm1,DWORD [edx*4+edi]
+        mov     edx,ebx
+        xor     ebx,ebx
+        mov     bl,dl
+        cmp     esi,DWORD [edi-4]
+        lea     esi,[8+esi]
+        jb      NEAR L$005loop_mmx
+        psllq   mm1,56
+        pxor    mm2,mm1
+        movq    [esi*1+ebp-8],mm2
+        emms
+        cmp     esi,DWORD [24+esp]
+        je      NEAR L$006done
+        jmp     NEAR L$002loop1
+align   16
+L$003go4loop4:
+        lea     edx,[edx*1+esi-4]
+        mov     DWORD [28+esp],edx
+L$007loop4:
+        add     bl,cl
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        add     edx,ecx
+        inc     al
+        and     edx,255
+        mov     ecx,DWORD [eax*4+edi]
+        mov     ebp,DWORD [edx*4+edi]
+        add     bl,cl
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        add     edx,ecx
+        inc     al
+        and     edx,255
+        ror     ebp,8
+        mov     ecx,DWORD [eax*4+edi]
+        or      ebp,DWORD [edx*4+edi]
+        add     bl,cl
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        add     edx,ecx
+        inc     al
+        and     edx,255
+        ror     ebp,8
+        mov     ecx,DWORD [eax*4+edi]
+        or      ebp,DWORD [edx*4+edi]
+        add     bl,cl
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        add     edx,ecx
+        inc     al
+        and     edx,255
+        ror     ebp,8
+        mov     ecx,DWORD [32+esp]
+        or      ebp,DWORD [edx*4+edi]
+        ror     ebp,8
+        xor     ebp,DWORD [esi]
+        cmp     esi,DWORD [28+esp]
+        mov     DWORD [esi*1+ecx],ebp
+        lea     esi,[4+esi]
+        mov     ecx,DWORD [eax*4+edi]
+        jb      NEAR L$007loop4
+        cmp     esi,DWORD [24+esp]
+        je      NEAR L$006done
+        mov     ebp,DWORD [32+esp]
+align   16
+L$002loop1:
+        add     bl,cl
+        mov     edx,DWORD [ebx*4+edi]
+        mov     DWORD [ebx*4+edi],ecx
+        mov     DWORD [eax*4+edi],edx
+        add     edx,ecx
+        inc     al
+        and     edx,255
+        mov     edx,DWORD [edx*4+edi]
+        xor     dl,BYTE [esi]
+        lea     esi,[1+esi]
+        mov     ecx,DWORD [eax*4+edi]
+        cmp     esi,DWORD [24+esp]
+        mov     BYTE [esi*1+ebp-1],dl
+        jb      NEAR L$002loop1
+        jmp     NEAR L$006done
+align   16
+L$001RC4_CHAR:
+        movzx   ecx,BYTE [eax*1+edi]
+L$008cloop1:
+        add     bl,cl
+        movzx   edx,BYTE [ebx*1+edi]
+        mov     BYTE [ebx*1+edi],cl
+        mov     BYTE [eax*1+edi],dl
+        add     dl,cl
+        movzx   edx,BYTE [edx*1+edi]
+        add     al,1
+        xor     dl,BYTE [esi]
+        lea     esi,[1+esi]
+        movzx   ecx,BYTE [eax*1+edi]
+        cmp     esi,DWORD [24+esp]
+        mov     BYTE [esi*1+ebp-1],dl
+        jb      NEAR L$008cloop1
+L$006done:
+        dec     al
+        mov     DWORD [edi-4],ebx
+        mov     BYTE [edi-8],al
+L$000abort:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _RC4_set_key
+align   16
+_RC4_set_key:
+L$_RC4_set_key_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     edi,DWORD [20+esp]
+        mov     ebp,DWORD [24+esp]
+        mov     esi,DWORD [28+esp]
+        lea     edx,[_OPENSSL_ia32cap_P]
+        lea     edi,[8+edi]
+        lea     esi,[ebp*1+esi]
+        neg     ebp
+        xor     eax,eax
+        mov     DWORD [edi-4],ebp
+        bt      DWORD [edx],20
+        jc      NEAR L$009c1stloop
+align   16
+L$010w1stloop:
+        mov     DWORD [eax*4+edi],eax
+        add     al,1
+        jnc     NEAR L$010w1stloop
+        xor     ecx,ecx
+        xor     edx,edx
+align   16
+L$011w2ndloop:
+        mov     eax,DWORD [ecx*4+edi]
+        add     dl,BYTE [ebp*1+esi]
+        add     dl,al
+        add     ebp,1
+        mov     ebx,DWORD [edx*4+edi]
+        jnz     NEAR L$012wnowrap
+        mov     ebp,DWORD [edi-4]
+L$012wnowrap:
+        mov     DWORD [edx*4+edi],eax
+        mov     DWORD [ecx*4+edi],ebx
+        add     cl,1
+        jnc     NEAR L$011w2ndloop
+        jmp     NEAR L$013exit
+align   16
+L$009c1stloop:
+        mov     BYTE [eax*1+edi],al
+        add     al,1
+        jnc     NEAR L$009c1stloop
+        xor     ecx,ecx
+        xor     edx,edx
+        xor     ebx,ebx
+align   16
+L$014c2ndloop:
+        mov     al,BYTE [ecx*1+edi]
+        add     dl,BYTE [ebp*1+esi]
+        add     dl,al
+        add     ebp,1
+        mov     bl,BYTE [edx*1+edi]
+        jnz     NEAR L$015cnowrap
+        mov     ebp,DWORD [edi-4]
+L$015cnowrap:
+        mov     BYTE [edx*1+edi],al
+        mov     BYTE [ecx*1+edi],bl
+        add     cl,1
+        jnc     NEAR L$014c2ndloop
+        mov     DWORD [256+edi],-1
+L$013exit:
+        xor     eax,eax
+        mov     DWORD [edi-8],eax
+        mov     DWORD [edi-4],eax
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _RC4_options
+align   16
+_RC4_options:
+L$_RC4_options_begin:
+        call    L$016pic_point
+L$016pic_point:
+        pop     eax
+        lea     eax,[(L$017opts-L$016pic_point)+eax]
+        lea     edx,[_OPENSSL_ia32cap_P]
+        mov     edx,DWORD [edx]
+        bt      edx,20
+        jc      NEAR L$0181xchar
+        bt      edx,26
+        jnc     NEAR L$019ret
+        add     eax,25
+        ret
+L$0181xchar:
+        add     eax,12
+L$019ret:
+        ret
+align   64
+L$017opts:
+db      114,99,52,40,52,120,44,105,110,116,41,0
+db      114,99,52,40,49,120,44,99,104,97,114,41,0
+db      114,99,52,40,56,120,44,109,109,120,41,0
+db      82,67,52,32,102,111,114,32,120,56,54,44,32,67,82,89
+db      80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114
+db      111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
+align   64
+segment .bss
+common  _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm
new file mode 100644
index 0000000000..4a893333d8
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm
@@ -0,0 +1,3977 @@
+; Copyright 1998-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+;extern _OPENSSL_ia32cap_P
+global  _sha1_block_data_order
+align   16
+_sha1_block_data_order:
+L$_sha1_block_data_order_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        call    L$000pic_point
+L$000pic_point:
+        pop     ebp
+        lea     esi,[_OPENSSL_ia32cap_P]
+        lea     ebp,[(L$K_XX_XX-L$000pic_point)+ebp]
+        mov     eax,DWORD [esi]
+        mov     edx,DWORD [4+esi]
+        test    edx,512
+        jz      NEAR L$001x86
+        mov     ecx,DWORD [8+esi]
+        test    eax,16777216
+        jz      NEAR L$001x86
+        test    ecx,536870912
+        jnz     NEAR L$shaext_shortcut
+        and     edx,268435456
+        and     eax,1073741824
+        or      eax,edx
+        cmp     eax,1342177280
+        je      NEAR L$avx_shortcut
+        jmp     NEAR L$ssse3_shortcut
+align   16
+L$001x86:
+        mov     ebp,DWORD [20+esp]
+        mov     esi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        sub     esp,76
+        shl     eax,6
+        add     eax,esi
+        mov     DWORD [104+esp],eax
+        mov     edi,DWORD [16+ebp]
+        jmp     NEAR L$002loop
+align   16
+L$002loop:
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     ecx,DWORD [8+esi]
+        mov     edx,DWORD [12+esi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        mov     DWORD [esp],eax
+        mov     DWORD [4+esp],ebx
+        mov     DWORD [8+esp],ecx
+        mov     DWORD [12+esp],edx
+        mov     eax,DWORD [16+esi]
+        mov     ebx,DWORD [20+esi]
+        mov     ecx,DWORD [24+esi]
+        mov     edx,DWORD [28+esi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        mov     DWORD [16+esp],eax
+        mov     DWORD [20+esp],ebx
+        mov     DWORD [24+esp],ecx
+        mov     DWORD [28+esp],edx
+        mov     eax,DWORD [32+esi]
+        mov     ebx,DWORD [36+esi]
+        mov     ecx,DWORD [40+esi]
+        mov     edx,DWORD [44+esi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        mov     DWORD [32+esp],eax
+        mov     DWORD [36+esp],ebx
+        mov     DWORD [40+esp],ecx
+        mov     DWORD [44+esp],edx
+        mov     eax,DWORD [48+esi]
+        mov     ebx,DWORD [52+esi]
+        mov     ecx,DWORD [56+esi]
+        mov     edx,DWORD [60+esi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        mov     DWORD [48+esp],eax
+        mov     DWORD [52+esp],ebx
+        mov     DWORD [56+esp],ecx
+        mov     DWORD [60+esp],edx
+        mov     DWORD [100+esp],esi
+        mov     eax,DWORD [ebp]
+        mov     ebx,DWORD [4+ebp]
+        mov     ecx,DWORD [8+ebp]
+        mov     edx,DWORD [12+ebp]
+        ; 00_15 0
+        mov     esi,ecx
+        mov     ebp,eax
+        rol     ebp,5
+        xor     esi,edx
+        add     ebp,edi
+        mov     edi,DWORD [esp]
+        and     esi,ebx
+        ror     ebx,2
+        xor     esi,edx
+        lea     ebp,[1518500249+edi*1+ebp]
+        add     ebp,esi
+        ; 00_15 1
+        mov     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        xor     edi,ecx
+        add     ebp,edx
+        mov     edx,DWORD [4+esp]
+        and     edi,eax
+        ror     eax,2
+        xor     edi,ecx
+        lea     ebp,[1518500249+edx*1+ebp]
+        add     ebp,edi
+        ; 00_15 2
+        mov     edx,eax
+        mov     edi,ebp
+        rol     ebp,5
+        xor     edx,ebx
+        add     ebp,ecx
+        mov     ecx,DWORD [8+esp]
+        and     edx,esi
+        ror     esi,2
+        xor     edx,ebx
+        lea     ebp,[1518500249+ecx*1+ebp]
+        add     ebp,edx
+        ; 00_15 3
+        mov     ecx,esi
+        mov     edx,ebp
+        rol     ebp,5
+        xor     ecx,eax
+        add     ebp,ebx
+        mov     ebx,DWORD [12+esp]
+        and     ecx,edi
+        ror     edi,2
+        xor     ecx,eax
+        lea     ebp,[1518500249+ebx*1+ebp]
+        add     ebp,ecx
+        ; 00_15 4
+        mov     ebx,edi
+        mov     ecx,ebp
+        rol     ebp,5
+        xor     ebx,esi
+        add     ebp,eax
+        mov     eax,DWORD [16+esp]
+        and     ebx,edx
+        ror     edx,2
+        xor     ebx,esi
+        lea     ebp,[1518500249+eax*1+ebp]
+        add     ebp,ebx
+        ; 00_15 5
+        mov     eax,edx
+        mov     ebx,ebp
+        rol     ebp,5
+        xor     eax,edi
+        add     ebp,esi
+        mov     esi,DWORD [20+esp]
+        and     eax,ecx
+        ror     ecx,2
+        xor     eax,edi
+        lea     ebp,[1518500249+esi*1+ebp]
+        add     ebp,eax
+        ; 00_15 6
+        mov     esi,ecx
+        mov     eax,ebp
+        rol     ebp,5
+        xor     esi,edx
+        add     ebp,edi
+        mov     edi,DWORD [24+esp]
+        and     esi,ebx
+        ror     ebx,2
+        xor     esi,edx
+        lea     ebp,[1518500249+edi*1+ebp]
+        add     ebp,esi
+        ; 00_15 7
+        mov     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        xor     edi,ecx
+        add     ebp,edx
+        mov     edx,DWORD [28+esp]
+        and     edi,eax
+        ror     eax,2
+        xor     edi,ecx
+        lea     ebp,[1518500249+edx*1+ebp]
+        add     ebp,edi
+        ; 00_15 8
+        mov     edx,eax
+        mov     edi,ebp
+        rol     ebp,5
+        xor     edx,ebx
+        add     ebp,ecx
+        mov     ecx,DWORD [32+esp]
+        and     edx,esi
+        ror     esi,2
+        xor     edx,ebx
+        lea     ebp,[1518500249+ecx*1+ebp]
+        add     ebp,edx
+        ; 00_15 9
+        mov     ecx,esi
+        mov     edx,ebp
+        rol     ebp,5
+        xor     ecx,eax
+        add     ebp,ebx
+        mov     ebx,DWORD [36+esp]
+        and     ecx,edi
+        ror     edi,2
+        xor     ecx,eax
+        lea     ebp,[1518500249+ebx*1+ebp]
+        add     ebp,ecx
+        ; 00_15 10
+        mov     ebx,edi
+        mov     ecx,ebp
+        rol     ebp,5
+        xor     ebx,esi
+        add     ebp,eax
+        mov     eax,DWORD [40+esp]
+        and     ebx,edx
+        ror     edx,2
+        xor     ebx,esi
+        lea     ebp,[1518500249+eax*1+ebp]
+        add     ebp,ebx
+        ; 00_15 11
+        mov     eax,edx
+        mov     ebx,ebp
+        rol     ebp,5
+        xor     eax,edi
+        add     ebp,esi
+        mov     esi,DWORD [44+esp]
+        and     eax,ecx
+        ror     ecx,2
+        xor     eax,edi
+        lea     ebp,[1518500249+esi*1+ebp]
+        add     ebp,eax
+        ; 00_15 12
+        mov     esi,ecx
+        mov     eax,ebp
+        rol     ebp,5
+        xor     esi,edx
+        add     ebp,edi
+        mov     edi,DWORD [48+esp]
+        and     esi,ebx
+        ror     ebx,2
+        xor     esi,edx
+        lea     ebp,[1518500249+edi*1+ebp]
+        add     ebp,esi
+        ; 00_15 13
+        mov     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        xor     edi,ecx
+        add     ebp,edx
+        mov     edx,DWORD [52+esp]
+        and     edi,eax
+        ror     eax,2
+        xor     edi,ecx
+        lea     ebp,[1518500249+edx*1+ebp]
+        add     ebp,edi
+        ; 00_15 14
+        mov     edx,eax
+        mov     edi,ebp
+        rol     ebp,5
+        xor     edx,ebx
+        add     ebp,ecx
+        mov     ecx,DWORD [56+esp]
+        and     edx,esi
+        ror     esi,2
+        xor     edx,ebx
+        lea     ebp,[1518500249+ecx*1+ebp]
+        add     ebp,edx
+        ; 00_15 15
+        mov     ecx,esi
+        mov     edx,ebp
+        rol     ebp,5
+        xor     ecx,eax
+        add     ebp,ebx
+        mov     ebx,DWORD [60+esp]
+        and     ecx,edi
+        ror     edi,2
+        xor     ecx,eax
+        lea     ebp,[1518500249+ebx*1+ebp]
+        mov     ebx,DWORD [esp]
+        add     ecx,ebp
+        ; 16_19 16
+        mov     ebp,edi
+        xor     ebx,DWORD [8+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [32+esp]
+        and     ebp,edx
+        xor     ebx,DWORD [52+esp]
+        rol     ebx,1
+        xor     ebp,esi
+        add     eax,ebp
+        mov     ebp,ecx
+        ror     edx,2
+        mov     DWORD [esp],ebx
+        rol     ebp,5
+        lea     ebx,[1518500249+eax*1+ebx]
+        mov     eax,DWORD [4+esp]
+        add     ebx,ebp
+        ; 16_19 17
+        mov     ebp,edx
+        xor     eax,DWORD [12+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [36+esp]
+        and     ebp,ecx
+        xor     eax,DWORD [56+esp]
+        rol     eax,1
+        xor     ebp,edi
+        add     esi,ebp
+        mov     ebp,ebx
+        ror     ecx,2
+        mov     DWORD [4+esp],eax
+        rol     ebp,5
+        lea     eax,[1518500249+esi*1+eax]
+        mov     esi,DWORD [8+esp]
+        add     eax,ebp
+        ; 16_19 18
+        mov     ebp,ecx
+        xor     esi,DWORD [16+esp]
+        xor     ebp,edx
+        xor     esi,DWORD [40+esp]
+        and     ebp,ebx
+        xor     esi,DWORD [60+esp]
+        rol     esi,1
+        xor     ebp,edx
+        add     edi,ebp
+        mov     ebp,eax
+        ror     ebx,2
+        mov     DWORD [8+esp],esi
+        rol     ebp,5
+        lea     esi,[1518500249+edi*1+esi]
+        mov     edi,DWORD [12+esp]
+        add     esi,ebp
+        ; 16_19 19
+        mov     ebp,ebx
+        xor     edi,DWORD [20+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [44+esp]
+        and     ebp,eax
+        xor     edi,DWORD [esp]
+        rol     edi,1
+        xor     ebp,ecx
+        add     edx,ebp
+        mov     ebp,esi
+        ror     eax,2
+        mov     DWORD [12+esp],edi
+        rol     ebp,5
+        lea     edi,[1518500249+edx*1+edi]
+        mov     edx,DWORD [16+esp]
+        add     edi,ebp
+        ; 20_39 20
+        mov     ebp,esi
+        xor     edx,DWORD [24+esp]
+        xor     ebp,eax
+        xor     edx,DWORD [48+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [4+esp]
+        rol     edx,1
+        add     ecx,ebp
+        ror     esi,2
+        mov     ebp,edi
+        rol     ebp,5
+        mov     DWORD [16+esp],edx
+        lea     edx,[1859775393+ecx*1+edx]
+        mov     ecx,DWORD [20+esp]
+        add     edx,ebp
+        ; 20_39 21
+        mov     ebp,edi
+        xor     ecx,DWORD [28+esp]
+        xor     ebp,esi
+        xor     ecx,DWORD [52+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [8+esp]
+        rol     ecx,1
+        add     ebx,ebp
+        ror     edi,2
+        mov     ebp,edx
+        rol     ebp,5
+        mov     DWORD [20+esp],ecx
+        lea     ecx,[1859775393+ebx*1+ecx]
+        mov     ebx,DWORD [24+esp]
+        add     ecx,ebp
+        ; 20_39 22
+        mov     ebp,edx
+        xor     ebx,DWORD [32+esp]
+        xor     ebp,edi
+        xor     ebx,DWORD [56+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [12+esp]
+        rol     ebx,1
+        add     eax,ebp
+        ror     edx,2
+        mov     ebp,ecx
+        rol     ebp,5
+        mov     DWORD [24+esp],ebx
+        lea     ebx,[1859775393+eax*1+ebx]
+        mov     eax,DWORD [28+esp]
+        add     ebx,ebp
+        ; 20_39 23
+        mov     ebp,ecx
+        xor     eax,DWORD [36+esp]
+        xor     ebp,edx
+        xor     eax,DWORD [60+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [16+esp]
+        rol     eax,1
+        add     esi,ebp
+        ror     ecx,2
+        mov     ebp,ebx
+        rol     ebp,5
+        mov     DWORD [28+esp],eax
+        lea     eax,[1859775393+esi*1+eax]
+        mov     esi,DWORD [32+esp]
+        add     eax,ebp
+        ; 20_39 24
+        mov     ebp,ebx
+        xor     esi,DWORD [40+esp]
+        xor     ebp,ecx
+        xor     esi,DWORD [esp]
+        xor     ebp,edx
+        xor     esi,DWORD [20+esp]
+        rol     esi,1
+        add     edi,ebp
+        ror     ebx,2
+        mov     ebp,eax
+        rol     ebp,5
+        mov     DWORD [32+esp],esi
+        lea     esi,[1859775393+edi*1+esi]
+        mov     edi,DWORD [36+esp]
+        add     esi,ebp
+        ; 20_39 25
+        mov     ebp,eax
+        xor     edi,DWORD [44+esp]
+        xor     ebp,ebx
+        xor     edi,DWORD [4+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [24+esp]
+        rol     edi,1
+        add     edx,ebp
+        ror     eax,2
+        mov     ebp,esi
+        rol     ebp,5
+        mov     DWORD [36+esp],edi
+        lea     edi,[1859775393+edx*1+edi]
+        mov     edx,DWORD [40+esp]
+        add     edi,ebp
+        ; 20_39 26
+        mov     ebp,esi
+        xor     edx,DWORD [48+esp]
+        xor     ebp,eax
+        xor     edx,DWORD [8+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [28+esp]
+        rol     edx,1
+        add     ecx,ebp
+        ror     esi,2
+        mov     ebp,edi
+        rol     ebp,5
+        mov     DWORD [40+esp],edx
+        lea     edx,[1859775393+ecx*1+edx]
+        mov     ecx,DWORD [44+esp]
+        add     edx,ebp
+        ; 20_39 27
+        mov     ebp,edi
+        xor     ecx,DWORD [52+esp]
+        xor     ebp,esi
+        xor     ecx,DWORD [12+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [32+esp]
+        rol     ecx,1
+        add     ebx,ebp
+        ror     edi,2
+        mov     ebp,edx
+        rol     ebp,5
+        mov     DWORD [44+esp],ecx
+        lea     ecx,[1859775393+ebx*1+ecx]
+        mov     ebx,DWORD [48+esp]
+        add     ecx,ebp
+        ; 20_39 28
+        mov     ebp,edx
+        xor     ebx,DWORD [56+esp]
+        xor     ebp,edi
+        xor     ebx,DWORD [16+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [36+esp]
+        rol     ebx,1
+        add     eax,ebp
+        ror     edx,2
+        mov     ebp,ecx
+        rol     ebp,5
+        mov     DWORD [48+esp],ebx
+        lea     ebx,[1859775393+eax*1+ebx]
+        mov     eax,DWORD [52+esp]
+        add     ebx,ebp
+        ; 20_39 29
+        mov     ebp,ecx
+        xor     eax,DWORD [60+esp]
+        xor     ebp,edx
+        xor     eax,DWORD [20+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [40+esp]
+        rol     eax,1
+        add     esi,ebp
+        ror     ecx,2
+        mov     ebp,ebx
+        rol     ebp,5
+        mov     DWORD [52+esp],eax
+        lea     eax,[1859775393+esi*1+eax]
+        mov     esi,DWORD [56+esp]
+        add     eax,ebp
+        ; 20_39 30
+        mov     ebp,ebx
+        xor     esi,DWORD [esp]
+        xor     ebp,ecx
+        xor     esi,DWORD [24+esp]
+        xor     ebp,edx
+        xor     esi,DWORD [44+esp]
+        rol     esi,1
+        add     edi,ebp
+        ror     ebx,2
+        mov     ebp,eax
+        rol     ebp,5
+        mov     DWORD [56+esp],esi
+        lea     esi,[1859775393+edi*1+esi]
+        mov     edi,DWORD [60+esp]
+        add     esi,ebp
+        ; 20_39 31
+        mov     ebp,eax
+        xor     edi,DWORD [4+esp]
+        xor     ebp,ebx
+        xor     edi,DWORD [28+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [48+esp]
+        rol     edi,1
+        add     edx,ebp
+        ror     eax,2
+        mov     ebp,esi
+        rol     ebp,5
+        mov     DWORD [60+esp],edi
+        lea     edi,[1859775393+edx*1+edi]
+        mov     edx,DWORD [esp]
+        add     edi,ebp
+        ; 20_39 32
+        mov     ebp,esi
+        xor     edx,DWORD [8+esp]
+        xor     ebp,eax
+        xor     edx,DWORD [32+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [52+esp]
+        rol     edx,1
+        add     ecx,ebp
+        ror     esi,2
+        mov     ebp,edi
+        rol     ebp,5
+        mov     DWORD [esp],edx
+        lea     edx,[1859775393+ecx*1+edx]
+        mov     ecx,DWORD [4+esp]
+        add     edx,ebp
+        ; 20_39 33
+        mov     ebp,edi
+        xor     ecx,DWORD [12+esp]
+        xor     ebp,esi
+        xor     ecx,DWORD [36+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [56+esp]
+        rol     ecx,1
+        add     ebx,ebp
+        ror     edi,2
+        mov     ebp,edx
+        rol     ebp,5
+        mov     DWORD [4+esp],ecx
+        lea     ecx,[1859775393+ebx*1+ecx]
+        mov     ebx,DWORD [8+esp]
+        add     ecx,ebp
+        ; 20_39 34
+        mov     ebp,edx
+        xor     ebx,DWORD [16+esp]
+        xor     ebp,edi
+        xor     ebx,DWORD [40+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [60+esp]
+        rol     ebx,1
+        add     eax,ebp
+        ror     edx,2
+        mov     ebp,ecx
+        rol     ebp,5
+        mov     DWORD [8+esp],ebx
+        lea     ebx,[1859775393+eax*1+ebx]
+        mov     eax,DWORD [12+esp]
+        add     ebx,ebp
+        ; 20_39 35
+        mov     ebp,ecx
+        xor     eax,DWORD [20+esp]
+        xor     ebp,edx
+        xor     eax,DWORD [44+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [esp]
+        rol     eax,1
+        add     esi,ebp
+        ror     ecx,2
+        mov     ebp,ebx
+        rol     ebp,5
+        mov     DWORD [12+esp],eax
+        lea     eax,[1859775393+esi*1+eax]
+        mov     esi,DWORD [16+esp]
+        add     eax,ebp
+        ; 20_39 36
+        mov     ebp,ebx
+        xor     esi,DWORD [24+esp]
+        xor     ebp,ecx
+        xor     esi,DWORD [48+esp]
+        xor     ebp,edx
+        xor     esi,DWORD [4+esp]
+        rol     esi,1
+        add     edi,ebp
+        ror     ebx,2
+        mov     ebp,eax
+        rol     ebp,5
+        mov     DWORD [16+esp],esi
+        lea     esi,[1859775393+edi*1+esi]
+        mov     edi,DWORD [20+esp]
+        add     esi,ebp
+        ; 20_39 37
+        mov     ebp,eax
+        xor     edi,DWORD [28+esp]
+        xor     ebp,ebx
+        xor     edi,DWORD [52+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [8+esp]
+        rol     edi,1
+        add     edx,ebp
+        ror     eax,2
+        mov     ebp,esi
+        rol     ebp,5
+        mov     DWORD [20+esp],edi
+        lea     edi,[1859775393+edx*1+edi]
+        mov     edx,DWORD [24+esp]
+        add     edi,ebp
+        ; 20_39 38
+        mov     ebp,esi
+        xor     edx,DWORD [32+esp]
+        xor     ebp,eax
+        xor     edx,DWORD [56+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [12+esp]
+        rol     edx,1
+        add     ecx,ebp
+        ror     esi,2
+        mov     ebp,edi
+        rol     ebp,5
+        mov     DWORD [24+esp],edx
+        lea     edx,[1859775393+ecx*1+edx]
+        mov     ecx,DWORD [28+esp]
+        add     edx,ebp
+        ; 20_39 39
+        mov     ebp,edi
+        xor     ecx,DWORD [36+esp]
+        xor     ebp,esi
+        xor     ecx,DWORD [60+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [16+esp]
+        rol     ecx,1
+        add     ebx,ebp
+        ror     edi,2
+        mov     ebp,edx
+        rol     ebp,5
+        mov     DWORD [28+esp],ecx
+        lea     ecx,[1859775393+ebx*1+ecx]
+        mov     ebx,DWORD [32+esp]
+        add     ecx,ebp
+        ; 40_59 40
+        mov     ebp,edi
+        xor     ebx,DWORD [40+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [esp]
+        and     ebp,edx
+        xor     ebx,DWORD [20+esp]
+        rol     ebx,1
+        add     ebp,eax
+        ror     edx,2
+        mov     eax,ecx
+        rol     eax,5
+        mov     DWORD [32+esp],ebx
+        lea     ebx,[2400959708+ebp*1+ebx]
+        mov     ebp,edi
+        add     ebx,eax
+        and     ebp,esi
+        mov     eax,DWORD [36+esp]
+        add     ebx,ebp
+        ; 40_59 41
+        mov     ebp,edx
+        xor     eax,DWORD [44+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [4+esp]
+        and     ebp,ecx
+        xor     eax,DWORD [24+esp]
+        rol     eax,1
+        add     ebp,esi
+        ror     ecx,2
+        mov     esi,ebx
+        rol     esi,5
+        mov     DWORD [36+esp],eax
+        lea     eax,[2400959708+ebp*1+eax]
+        mov     ebp,edx
+        add     eax,esi
+        and     ebp,edi
+        mov     esi,DWORD [40+esp]
+        add     eax,ebp
+        ; 40_59 42
+        mov     ebp,ecx
+        xor     esi,DWORD [48+esp]
+        xor     ebp,edx
+        xor     esi,DWORD [8+esp]
+        and     ebp,ebx
+        xor     esi,DWORD [28+esp]
+        rol     esi,1
+        add     ebp,edi
+        ror     ebx,2
+        mov     edi,eax
+        rol     edi,5
+        mov     DWORD [40+esp],esi
+        lea     esi,[2400959708+ebp*1+esi]
+        mov     ebp,ecx
+        add     esi,edi
+        and     ebp,edx
+        mov     edi,DWORD [44+esp]
+        add     esi,ebp
+        ; 40_59 43
+        mov     ebp,ebx
+        xor     edi,DWORD [52+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [12+esp]
+        and     ebp,eax
+        xor     edi,DWORD [32+esp]
+        rol     edi,1
+        add     ebp,edx
+        ror     eax,2
+        mov     edx,esi
+        rol     edx,5
+        mov     DWORD [44+esp],edi
+        lea     edi,[2400959708+ebp*1+edi]
+        mov     ebp,ebx
+        add     edi,edx
+        and     ebp,ecx
+        mov     edx,DWORD [48+esp]
+        add     edi,ebp
+        ; 40_59 44
+        mov     ebp,eax
+        xor     edx,DWORD [56+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [16+esp]
+        and     ebp,esi
+        xor     edx,DWORD [36+esp]
+        rol     edx,1
+        add     ebp,ecx
+        ror     esi,2
+        mov     ecx,edi
+        rol     ecx,5
+        mov     DWORD [48+esp],edx
+        lea     edx,[2400959708+ebp*1+edx]
+        mov     ebp,eax
+        add     edx,ecx
+        and     ebp,ebx
+        mov     ecx,DWORD [52+esp]
+        add     edx,ebp
+        ; 40_59 45
+        mov     ebp,esi
+        xor     ecx,DWORD [60+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [20+esp]
+        and     ebp,edi
+        xor     ecx,DWORD [40+esp]
+        rol     ecx,1
+        add     ebp,ebx
+        ror     edi,2
+        mov     ebx,edx
+        rol     ebx,5
+        mov     DWORD [52+esp],ecx
+        lea     ecx,[2400959708+ebp*1+ecx]
+        mov     ebp,esi
+        add     ecx,ebx
+        and     ebp,eax
+        mov     ebx,DWORD [56+esp]
+        add     ecx,ebp
+        ; 40_59 46
+        mov     ebp,edi
+        xor     ebx,DWORD [esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [24+esp]
+        and     ebp,edx
+        xor     ebx,DWORD [44+esp]
+        rol     ebx,1
+        add     ebp,eax
+        ror     edx,2
+        mov     eax,ecx
+        rol     eax,5
+        mov     DWORD [56+esp],ebx
+        lea     ebx,[2400959708+ebp*1+ebx]
+        mov     ebp,edi
+        add     ebx,eax
+        and     ebp,esi
+        mov     eax,DWORD [60+esp]
+        add     ebx,ebp
+        ; 40_59 47
+        mov     ebp,edx
+        xor     eax,DWORD [4+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [28+esp]
+        and     ebp,ecx
+        xor     eax,DWORD [48+esp]
+        rol     eax,1
+        add     ebp,esi
+        ror     ecx,2
+        mov     esi,ebx
+        rol     esi,5
+        mov     DWORD [60+esp],eax
+        lea     eax,[2400959708+ebp*1+eax]
+        mov     ebp,edx
+        add     eax,esi
+        and     ebp,edi
+        mov     esi,DWORD [esp]
+        add     eax,ebp
+        ; 40_59 48
+        mov     ebp,ecx
+        xor     esi,DWORD [8+esp]
+        xor     ebp,edx
+        xor     esi,DWORD [32+esp]
+        and     ebp,ebx
+        xor     esi,DWORD [52+esp]
+        rol     esi,1
+        add     ebp,edi
+        ror     ebx,2
+        mov     edi,eax
+        rol     edi,5
+        mov     DWORD [esp],esi
+        lea     esi,[2400959708+ebp*1+esi]
+        mov     ebp,ecx
+        add     esi,edi
+        and     ebp,edx
+        mov     edi,DWORD [4+esp]
+        add     esi,ebp
+        ; 40_59 49
+        mov     ebp,ebx
+        xor     edi,DWORD [12+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [36+esp]
+        and     ebp,eax
+        xor     edi,DWORD [56+esp]
+        rol     edi,1
+        add     ebp,edx
+        ror     eax,2
+        mov     edx,esi
+        rol     edx,5
+        mov     DWORD [4+esp],edi
+        lea     edi,[2400959708+ebp*1+edi]
+        mov     ebp,ebx
+        add     edi,edx
+        and     ebp,ecx
+        mov     edx,DWORD [8+esp]
+        add     edi,ebp
+        ; 40_59 50
+        mov     ebp,eax
+        xor     edx,DWORD [16+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [40+esp]
+        and     ebp,esi
+        xor     edx,DWORD [60+esp]
+        rol     edx,1
+        add     ebp,ecx
+        ror     esi,2
+        mov     ecx,edi
+        rol     ecx,5
+        mov     DWORD [8+esp],edx
+        lea     edx,[2400959708+ebp*1+edx]
+        mov     ebp,eax
+        add     edx,ecx
+        and     ebp,ebx
+        mov     ecx,DWORD [12+esp]
+        add     edx,ebp
+        ; 40_59 51
+        mov     ebp,esi
+        xor     ecx,DWORD [20+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [44+esp]
+        and     ebp,edi
+        xor     ecx,DWORD [esp]
+        rol     ecx,1
+        add     ebp,ebx
+        ror     edi,2
+        mov     ebx,edx
+        rol     ebx,5
+        mov     DWORD [12+esp],ecx
+        lea     ecx,[2400959708+ebp*1+ecx]
+        mov     ebp,esi
+        add     ecx,ebx
+        and     ebp,eax
+        mov     ebx,DWORD [16+esp]
+        add     ecx,ebp
+        ; 40_59 52
+        mov     ebp,edi
+        xor     ebx,DWORD [24+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [48+esp]
+        and     ebp,edx
+        xor     ebx,DWORD [4+esp]
+        rol     ebx,1
+        add     ebp,eax
+        ror     edx,2
+        mov     eax,ecx
+        rol     eax,5
+        mov     DWORD [16+esp],ebx
+        lea     ebx,[2400959708+ebp*1+ebx]
+        mov     ebp,edi
+        add     ebx,eax
+        and     ebp,esi
+        mov     eax,DWORD [20+esp]
+        add     ebx,ebp
+        ; 40_59 53
+        mov     ebp,edx
+        xor     eax,DWORD [28+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [52+esp]
+        and     ebp,ecx
+        xor     eax,DWORD [8+esp]
+        rol     eax,1
+        add     ebp,esi
+        ror     ecx,2
+        mov     esi,ebx
+        rol     esi,5
+        mov     DWORD [20+esp],eax
+        lea     eax,[2400959708+ebp*1+eax]
+        mov     ebp,edx
+        add     eax,esi
+        and     ebp,edi
+        mov     esi,DWORD [24+esp]
+        add     eax,ebp
+        ; 40_59 54
+        mov     ebp,ecx
+        xor     esi,DWORD [32+esp]
+        xor     ebp,edx
+        xor     esi,DWORD [56+esp]
+        and     ebp,ebx
+        xor     esi,DWORD [12+esp]
+        rol     esi,1
+        add     ebp,edi
+        ror     ebx,2
+        mov     edi,eax
+        rol     edi,5
+        mov     DWORD [24+esp],esi
+        lea     esi,[2400959708+ebp*1+esi]
+        mov     ebp,ecx
+        add     esi,edi
+        and     ebp,edx
+        mov     edi,DWORD [28+esp]
+        add     esi,ebp
+        ; 40_59 55
+        mov     ebp,ebx
+        xor     edi,DWORD [36+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [60+esp]
+        and     ebp,eax
+        xor     edi,DWORD [16+esp]
+        rol     edi,1
+        add     ebp,edx
+        ror     eax,2
+        mov     edx,esi
+        rol     edx,5
+        mov     DWORD [28+esp],edi
+        lea     edi,[2400959708+ebp*1+edi]
+        mov     ebp,ebx
+        add     edi,edx
+        and     ebp,ecx
+        mov     edx,DWORD [32+esp]
+        add     edi,ebp
+        ; 40_59 56
+        mov     ebp,eax
+        xor     edx,DWORD [40+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [esp]
+        and     ebp,esi
+        xor     edx,DWORD [20+esp]
+        rol     edx,1
+        add     ebp,ecx
+        ror     esi,2
+        mov     ecx,edi
+        rol     ecx,5
+        mov     DWORD [32+esp],edx
+        lea     edx,[2400959708+ebp*1+edx]
+        mov     ebp,eax
+        add     edx,ecx
+        and     ebp,ebx
+        mov     ecx,DWORD [36+esp]
+        add     edx,ebp
+        ; 40_59 57
+        mov     ebp,esi
+        xor     ecx,DWORD [44+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [4+esp]
+        and     ebp,edi
+        xor     ecx,DWORD [24+esp]
+        rol     ecx,1
+        add     ebp,ebx
+        ror     edi,2
+        mov     ebx,edx
+        rol     ebx,5
+        mov     DWORD [36+esp],ecx
+        lea     ecx,[2400959708+ebp*1+ecx]
+        mov     ebp,esi
+        add     ecx,ebx
+        and     ebp,eax
+        mov     ebx,DWORD [40+esp]
+        add     ecx,ebp
+        ; 40_59 58
+        mov     ebp,edi
+        xor     ebx,DWORD [48+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [8+esp]
+        and     ebp,edx
+        xor     ebx,DWORD [28+esp]
+        rol     ebx,1
+        add     ebp,eax
+        ror     edx,2
+        mov     eax,ecx
+        rol     eax,5
+        mov     DWORD [40+esp],ebx
+        lea     ebx,[2400959708+ebp*1+ebx]
+        mov     ebp,edi
+        add     ebx,eax
+        and     ebp,esi
+        mov     eax,DWORD [44+esp]
+        add     ebx,ebp
+        ; 40_59 59
+        mov     ebp,edx
+        xor     eax,DWORD [52+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [12+esp]
+        and     ebp,ecx
+        xor     eax,DWORD [32+esp]
+        rol     eax,1
+        add     ebp,esi
+        ror     ecx,2
+        mov     esi,ebx
+        rol     esi,5
+        mov     DWORD [44+esp],eax
+        lea     eax,[2400959708+ebp*1+eax]
+        mov     ebp,edx
+        add     eax,esi
+        and     ebp,edi
+        mov     esi,DWORD [48+esp]
+        add     eax,ebp
+        ; 20_39 60
+        mov     ebp,ebx
+        xor     esi,DWORD [56+esp]
+        xor     ebp,ecx
+        xor     esi,DWORD [16+esp]
+        xor     ebp,edx
+        xor     esi,DWORD [36+esp]
+        rol     esi,1
+        add     edi,ebp
+        ror     ebx,2
+        mov     ebp,eax
+        rol     ebp,5
+        mov     DWORD [48+esp],esi
+        lea     esi,[3395469782+edi*1+esi]
+        mov     edi,DWORD [52+esp]
+        add     esi,ebp
+        ; 20_39 61
+        mov     ebp,eax
+        xor     edi,DWORD [60+esp]
+        xor     ebp,ebx
+        xor     edi,DWORD [20+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [40+esp]
+        rol     edi,1
+        add     edx,ebp
+        ror     eax,2
+        mov     ebp,esi
+        rol     ebp,5
+        mov     DWORD [52+esp],edi
+        lea     edi,[3395469782+edx*1+edi]
+        mov     edx,DWORD [56+esp]
+        add     edi,ebp
+        ; 20_39 62
+        mov     ebp,esi
+        xor     edx,DWORD [esp]
+        xor     ebp,eax
+        xor     edx,DWORD [24+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [44+esp]
+        rol     edx,1
+        add     ecx,ebp
+        ror     esi,2
+        mov     ebp,edi
+        rol     ebp,5
+        mov     DWORD [56+esp],edx
+        lea     edx,[3395469782+ecx*1+edx]
+        mov     ecx,DWORD [60+esp]
+        add     edx,ebp
+        ; 20_39 63
+        mov     ebp,edi
+        xor     ecx,DWORD [4+esp]
+        xor     ebp,esi
+        xor     ecx,DWORD [28+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [48+esp]
+        rol     ecx,1
+        add     ebx,ebp
+        ror     edi,2
+        mov     ebp,edx
+        rol     ebp,5
+        mov     DWORD [60+esp],ecx
+        lea     ecx,[3395469782+ebx*1+ecx]
+        mov     ebx,DWORD [esp]
+        add     ecx,ebp
+        ; 20_39 64
+        mov     ebp,edx
+        xor     ebx,DWORD [8+esp]
+        xor     ebp,edi
+        xor     ebx,DWORD [32+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [52+esp]
+        rol     ebx,1
+        add     eax,ebp
+        ror     edx,2
+        mov     ebp,ecx
+        rol     ebp,5
+        mov     DWORD [esp],ebx
+        lea     ebx,[3395469782+eax*1+ebx]
+        mov     eax,DWORD [4+esp]
+        add     ebx,ebp
+        ; 20_39 65
+        mov     ebp,ecx
+        xor     eax,DWORD [12+esp]
+        xor     ebp,edx
+        xor     eax,DWORD [36+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [56+esp]
+        rol     eax,1
+        add     esi,ebp
+        ror     ecx,2
+        mov     ebp,ebx
+        rol     ebp,5
+        mov     DWORD [4+esp],eax
+        lea     eax,[3395469782+esi*1+eax]
+        mov     esi,DWORD [8+esp]
+        add     eax,ebp
+        ; 20_39 66
+        mov     ebp,ebx
+        xor     esi,DWORD [16+esp]
+        xor     ebp,ecx
+        xor     esi,DWORD [40+esp]
+        xor     ebp,edx
+        xor     esi,DWORD [60+esp]
+        rol     esi,1
+        add     edi,ebp
+        ror     ebx,2
+        mov     ebp,eax
+        rol     ebp,5
+        mov     DWORD [8+esp],esi
+        lea     esi,[3395469782+edi*1+esi]
+        mov     edi,DWORD [12+esp]
+        add     esi,ebp
+        ; 20_39 67
+        mov     ebp,eax
+        xor     edi,DWORD [20+esp]
+        xor     ebp,ebx
+        xor     edi,DWORD [44+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [esp]
+        rol     edi,1
+        add     edx,ebp
+        ror     eax,2
+        mov     ebp,esi
+        rol     ebp,5
+        mov     DWORD [12+esp],edi
+        lea     edi,[3395469782+edx*1+edi]
+        mov     edx,DWORD [16+esp]
+        add     edi,ebp
+        ; 20_39 68
+        mov     ebp,esi
+        xor     edx,DWORD [24+esp]
+        xor     ebp,eax
+        xor     edx,DWORD [48+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [4+esp]
+        rol     edx,1
+        add     ecx,ebp
+        ror     esi,2
+        mov     ebp,edi
+        rol     ebp,5
+        mov     DWORD [16+esp],edx
+        lea     edx,[3395469782+ecx*1+edx]
+        mov     ecx,DWORD [20+esp]
+        add     edx,ebp
+        ; 20_39 69
+        mov     ebp,edi
+        xor     ecx,DWORD [28+esp]
+        xor     ebp,esi
+        xor     ecx,DWORD [52+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [8+esp]
+        rol     ecx,1
+        add     ebx,ebp
+        ror     edi,2
+        mov     ebp,edx
+        rol     ebp,5
+        mov     DWORD [20+esp],ecx
+        lea     ecx,[3395469782+ebx*1+ecx]
+        mov     ebx,DWORD [24+esp]
+        add     ecx,ebp
+        ; 20_39 70
+        mov     ebp,edx
+        xor     ebx,DWORD [32+esp]
+        xor     ebp,edi
+        xor     ebx,DWORD [56+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [12+esp]
+        rol     ebx,1
+        add     eax,ebp
+        ror     edx,2
+        mov     ebp,ecx
+        rol     ebp,5
+        mov     DWORD [24+esp],ebx
+        lea     ebx,[3395469782+eax*1+ebx]
+        mov     eax,DWORD [28+esp]
+        add     ebx,ebp
+        ; 20_39 71
+        mov     ebp,ecx
+        xor     eax,DWORD [36+esp]
+        xor     ebp,edx
+        xor     eax,DWORD [60+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [16+esp]
+        rol     eax,1
+        add     esi,ebp
+        ror     ecx,2
+        mov     ebp,ebx
+        rol     ebp,5
+        mov     DWORD [28+esp],eax
+        lea     eax,[3395469782+esi*1+eax]
+        mov     esi,DWORD [32+esp]
+        add     eax,ebp
+        ; 20_39 72
+        mov     ebp,ebx
+        xor     esi,DWORD [40+esp]
+        xor     ebp,ecx
+        xor     esi,DWORD [esp]
+        xor     ebp,edx
+        xor     esi,DWORD [20+esp]
+        rol     esi,1
+        add     edi,ebp
+        ror     ebx,2
+        mov     ebp,eax
+        rol     ebp,5
+        mov     DWORD [32+esp],esi
+        lea     esi,[3395469782+edi*1+esi]
+        mov     edi,DWORD [36+esp]
+        add     esi,ebp
+        ; 20_39 73
+        mov     ebp,eax
+        xor     edi,DWORD [44+esp]
+        xor     ebp,ebx
+        xor     edi,DWORD [4+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [24+esp]
+        rol     edi,1
+        add     edx,ebp
+        ror     eax,2
+        mov     ebp,esi
+        rol     ebp,5
+        mov     DWORD [36+esp],edi
+        lea     edi,[3395469782+edx*1+edi]
+        mov     edx,DWORD [40+esp]
+        add     edi,ebp
+        ; 20_39 74
+        mov     ebp,esi
+        xor     edx,DWORD [48+esp]
+        xor     ebp,eax
+        xor     edx,DWORD [8+esp]
+        xor     ebp,ebx
+        xor     edx,DWORD [28+esp]
+        rol     edx,1
+        add     ecx,ebp
+        ror     esi,2
+        mov     ebp,edi
+        rol     ebp,5
+        mov     DWORD [40+esp],edx
+        lea     edx,[3395469782+ecx*1+edx]
+        mov     ecx,DWORD [44+esp]
+        add     edx,ebp
+        ; 20_39 75
+        mov     ebp,edi
+        xor     ecx,DWORD [52+esp]
+        xor     ebp,esi
+        xor     ecx,DWORD [12+esp]
+        xor     ebp,eax
+        xor     ecx,DWORD [32+esp]
+        rol     ecx,1
+        add     ebx,ebp
+        ror     edi,2
+        mov     ebp,edx
+        rol     ebp,5
+        mov     DWORD [44+esp],ecx
+        lea     ecx,[3395469782+ebx*1+ecx]
+        mov     ebx,DWORD [48+esp]
+        add     ecx,ebp
+        ; 20_39 76
+        mov     ebp,edx
+        xor     ebx,DWORD [56+esp]
+        xor     ebp,edi
+        xor     ebx,DWORD [16+esp]
+        xor     ebp,esi
+        xor     ebx,DWORD [36+esp]
+        rol     ebx,1
+        add     eax,ebp
+        ror     edx,2
+        mov     ebp,ecx
+        rol     ebp,5
+        mov     DWORD [48+esp],ebx
+        lea     ebx,[3395469782+eax*1+ebx]
+        mov     eax,DWORD [52+esp]
+        add     ebx,ebp
+        ; 20_39 77
+        mov     ebp,ecx
+        xor     eax,DWORD [60+esp]
+        xor     ebp,edx
+        xor     eax,DWORD [20+esp]
+        xor     ebp,edi
+        xor     eax,DWORD [40+esp]
+        rol     eax,1
+        add     esi,ebp
+        ror     ecx,2
+        mov     ebp,ebx
+        rol     ebp,5
+        lea     eax,[3395469782+esi*1+eax]
+        mov     esi,DWORD [56+esp]
+        add     eax,ebp
+        ; 20_39 78
+        mov     ebp,ebx
+        xor     esi,DWORD [esp]
+        xor     ebp,ecx
+        xor     esi,DWORD [24+esp]
+        xor     ebp,edx
+        xor     esi,DWORD [44+esp]
+        rol     esi,1
+        add     edi,ebp
+        ror     ebx,2
+        mov     ebp,eax
+        rol     ebp,5
+        lea     esi,[3395469782+edi*1+esi]
+        mov     edi,DWORD [60+esp]
+        add     esi,ebp
+        ; 20_39 79
+        mov     ebp,eax
+        xor     edi,DWORD [4+esp]
+        xor     ebp,ebx
+        xor     edi,DWORD [28+esp]
+        xor     ebp,ecx
+        xor     edi,DWORD [48+esp]
+        rol     edi,1
+        add     edx,ebp
+        ror     eax,2
+        mov     ebp,esi
+        rol     ebp,5
+        lea     edi,[3395469782+edx*1+edi]
+        add     edi,ebp
+        mov     ebp,DWORD [96+esp]
+        mov     edx,DWORD [100+esp]
+        add     edi,DWORD [ebp]
+        add     esi,DWORD [4+ebp]
+        add     eax,DWORD [8+ebp]
+        add     ebx,DWORD [12+ebp]
+        add     ecx,DWORD [16+ebp]
+        mov     DWORD [ebp],edi
+        add     edx,64
+        mov     DWORD [4+ebp],esi
+        cmp     edx,DWORD [104+esp]
+        mov     DWORD [8+ebp],eax
+        mov     edi,ecx
+        mov     DWORD [12+ebp],ebx
+        mov     esi,edx
+        mov     DWORD [16+ebp],ecx
+        jb      NEAR L$002loop
+        add     esp,76
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   16
+__sha1_block_data_order_shaext:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        call    L$003pic_point
+L$003pic_point:
+        pop     ebp
+        lea     ebp,[(L$K_XX_XX-L$003pic_point)+ebp]
+L$shaext_shortcut:
+        mov     edi,DWORD [20+esp]
+        mov     ebx,esp
+        mov     esi,DWORD [24+esp]
+        mov     ecx,DWORD [28+esp]
+        sub     esp,32
+        movdqu  xmm0,[edi]
+        movd    xmm1,DWORD [16+edi]
+        and     esp,-32
+        movdqa  xmm3,[80+ebp]
+        movdqu  xmm4,[esi]
+        pshufd  xmm0,xmm0,27
+        movdqu  xmm5,[16+esi]
+        pshufd  xmm1,xmm1,27
+        movdqu  xmm6,[32+esi]
+db      102,15,56,0,227
+        movdqu  xmm7,[48+esi]
+db      102,15,56,0,235
+db      102,15,56,0,243
+db      102,15,56,0,251
+        jmp     NEAR L$004loop_shaext
+align   16
+L$004loop_shaext:
+        dec     ecx
+        lea     eax,[64+esi]
+        movdqa  [esp],xmm1
+        paddd   xmm1,xmm4
+        cmovne  esi,eax
+        movdqa  [16+esp],xmm0
+db      15,56,201,229
+        movdqa  xmm2,xmm0
+db      15,58,204,193,0
+db      15,56,200,213
+        pxor    xmm4,xmm6
+db      15,56,201,238
+db      15,56,202,231
+        movdqa  xmm1,xmm0
+db      15,58,204,194,0
+db      15,56,200,206
+        pxor    xmm5,xmm7
+db      15,56,202,236
+db      15,56,201,247
+        movdqa  xmm2,xmm0
+db      15,58,204,193,0
+db      15,56,200,215
+        pxor    xmm6,xmm4
+db      15,56,201,252
+db      15,56,202,245
+        movdqa  xmm1,xmm0
+db      15,58,204,194,0
+db      15,56,200,204
+        pxor    xmm7,xmm5
+db      15,56,202,254
+db      15,56,201,229
+        movdqa  xmm2,xmm0
+db      15,58,204,193,0
+db      15,56,200,213
+        pxor    xmm4,xmm6
+db      15,56,201,238
+db      15,56,202,231
+        movdqa  xmm1,xmm0
+db      15,58,204,194,1
+db      15,56,200,206
+        pxor    xmm5,xmm7
+db      15,56,202,236
+db      15,56,201,247
+        movdqa  xmm2,xmm0
+db      15,58,204,193,1
+db      15,56,200,215
+        pxor    xmm6,xmm4
+db      15,56,201,252
+db      15,56,202,245
+        movdqa  xmm1,xmm0
+db      15,58,204,194,1
+db      15,56,200,204
+        pxor    xmm7,xmm5
+db      15,56,202,254
+db      15,56,201,229
+        movdqa  xmm2,xmm0
+db      15,58,204,193,1
+db      15,56,200,213
+        pxor    xmm4,xmm6
+db      15,56,201,238
+db      15,56,202,231
+        movdqa  xmm1,xmm0
+db      15,58,204,194,1
+db      15,56,200,206
+        pxor    xmm5,xmm7
+db      15,56,202,236
+db      15,56,201,247
+        movdqa  xmm2,xmm0
+db      15,58,204,193,2
+db      15,56,200,215
+        pxor    xmm6,xmm4
+db      15,56,201,252
+db      15,56,202,245
+        movdqa  xmm1,xmm0
+db      15,58,204,194,2
+db      15,56,200,204
+        pxor    xmm7,xmm5
+db      15,56,202,254
+db      15,56,201,229
+        movdqa  xmm2,xmm0
+db      15,58,204,193,2
+db      15,56,200,213
+        pxor    xmm4,xmm6
+db      15,56,201,238
+db      15,56,202,231
+        movdqa  xmm1,xmm0
+db      15,58,204,194,2
+db      15,56,200,206
+        pxor    xmm5,xmm7
+db      15,56,202,236
+db      15,56,201,247
+        movdqa  xmm2,xmm0
+db      15,58,204,193,2
+db      15,56,200,215
+        pxor    xmm6,xmm4
+db      15,56,201,252
+db      15,56,202,245
+        movdqa  xmm1,xmm0
+db      15,58,204,194,3
+db      15,56,200,204
+        pxor    xmm7,xmm5
+db      15,56,202,254
+        movdqu  xmm4,[esi]
+        movdqa  xmm2,xmm0
+db      15,58,204,193,3
+db      15,56,200,213
+        movdqu  xmm5,[16+esi]
+db      102,15,56,0,227
+        movdqa  xmm1,xmm0
+db      15,58,204,194,3
+db      15,56,200,206
+        movdqu  xmm6,[32+esi]
+db      102,15,56,0,235
+        movdqa  xmm2,xmm0
+db      15,58,204,193,3
+db      15,56,200,215
+        movdqu  xmm7,[48+esi]
+db      102,15,56,0,243
+        movdqa  xmm1,xmm0
+db      15,58,204,194,3
+        movdqa  xmm2,[esp]
+db      102,15,56,0,251
+db      15,56,200,202
+        paddd   xmm0,[16+esp]
+        jnz     NEAR L$004loop_shaext
+        pshufd  xmm0,xmm0,27
+        pshufd  xmm1,xmm1,27
+        movdqu  [edi],xmm0
+        movd    DWORD [16+edi],xmm1
+        mov     esp,ebx
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   16
+__sha1_block_data_order_ssse3:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        call    L$005pic_point
+L$005pic_point:
+        pop     ebp
+        lea     ebp,[(L$K_XX_XX-L$005pic_point)+ebp]
+L$ssse3_shortcut:
+        movdqa  xmm7,[ebp]
+        movdqa  xmm0,[16+ebp]
+        movdqa  xmm1,[32+ebp]
+        movdqa  xmm2,[48+ebp]
+        movdqa  xmm6,[64+ebp]
+        mov     edi,DWORD [20+esp]
+        mov     ebp,DWORD [24+esp]
+        mov     edx,DWORD [28+esp]
+        mov     esi,esp
+        sub     esp,208
+        and     esp,-64
+        movdqa  [112+esp],xmm0
+        movdqa  [128+esp],xmm1
+        movdqa  [144+esp],xmm2
+        shl     edx,6
+        movdqa  [160+esp],xmm7
+        add     edx,ebp
+        movdqa  [176+esp],xmm6
+        add     ebp,64
+        mov     DWORD [192+esp],edi
+        mov     DWORD [196+esp],ebp
+        mov     DWORD [200+esp],edx
+        mov     DWORD [204+esp],esi
+        mov     eax,DWORD [edi]
+        mov     ebx,DWORD [4+edi]
+        mov     ecx,DWORD [8+edi]
+        mov     edx,DWORD [12+edi]
+        mov     edi,DWORD [16+edi]
+        mov     esi,ebx
+        movdqu  xmm0,[ebp-64]
+        movdqu  xmm1,[ebp-48]
+        movdqu  xmm2,[ebp-32]
+        movdqu  xmm3,[ebp-16]
+db      102,15,56,0,198
+db      102,15,56,0,206
+db      102,15,56,0,214
+        movdqa  [96+esp],xmm7
+db      102,15,56,0,222
+        paddd   xmm0,xmm7
+        paddd   xmm1,xmm7
+        paddd   xmm2,xmm7
+        movdqa  [esp],xmm0
+        psubd   xmm0,xmm7
+        movdqa  [16+esp],xmm1
+        psubd   xmm1,xmm7
+        movdqa  [32+esp],xmm2
+        mov     ebp,ecx
+        psubd   xmm2,xmm7
+        xor     ebp,edx
+        pshufd  xmm4,xmm0,238
+        and     esi,ebp
+        jmp     NEAR L$006loop
+align   16
+L$006loop:
+        ror     ebx,2
+        xor     esi,edx
+        mov     ebp,eax
+        punpcklqdq      xmm4,xmm1
+        movdqa  xmm6,xmm3
+        add     edi,DWORD [esp]
+        xor     ebx,ecx
+        paddd   xmm7,xmm3
+        movdqa  [64+esp],xmm0
+        rol     eax,5
+        add     edi,esi
+        psrldq  xmm6,4
+        and     ebp,ebx
+        xor     ebx,ecx
+        pxor    xmm4,xmm0
+        add     edi,eax
+        ror     eax,7
+        pxor    xmm6,xmm2
+        xor     ebp,ecx
+        mov     esi,edi
+        add     edx,DWORD [4+esp]
+        pxor    xmm4,xmm6
+        xor     eax,ebx
+        rol     edi,5
+        movdqa  [48+esp],xmm7
+        add     edx,ebp
+        and     esi,eax
+        movdqa  xmm0,xmm4
+        xor     eax,ebx
+        add     edx,edi
+        ror     edi,7
+        movdqa  xmm6,xmm4
+        xor     esi,ebx
+        pslldq  xmm0,12
+        paddd   xmm4,xmm4
+        mov     ebp,edx
+        add     ecx,DWORD [8+esp]
+        psrld   xmm6,31
+        xor     edi,eax
+        rol     edx,5
+        movdqa  xmm7,xmm0
+        add     ecx,esi
+        and     ebp,edi
+        xor     edi,eax
+        psrld   xmm0,30
+        add     ecx,edx
+        ror     edx,7
+        por     xmm4,xmm6
+        xor     ebp,eax
+        mov     esi,ecx
+        add     ebx,DWORD [12+esp]
+        pslld   xmm7,2
+        xor     edx,edi
+        rol     ecx,5
+        pxor    xmm4,xmm0
+        movdqa  xmm0,[96+esp]
+        add     ebx,ebp
+        and     esi,edx
+        pxor    xmm4,xmm7
+        pshufd  xmm5,xmm1,238
+        xor     edx,edi
+        add     ebx,ecx
+        ror     ecx,7
+        xor     esi,edi
+        mov     ebp,ebx
+        punpcklqdq      xmm5,xmm2
+        movdqa  xmm7,xmm4
+        add     eax,DWORD [16+esp]
+        xor     ecx,edx
+        paddd   xmm0,xmm4
+        movdqa  [80+esp],xmm1
+        rol     ebx,5
+        add     eax,esi
+        psrldq  xmm7,4
+        and     ebp,ecx
+        xor     ecx,edx
+        pxor    xmm5,xmm1
+        add     eax,ebx
+        ror     ebx,7
+        pxor    xmm7,xmm3
+        xor     ebp,edx
+        mov     esi,eax
+        add     edi,DWORD [20+esp]
+        pxor    xmm5,xmm7
+        xor     ebx,ecx
+        rol     eax,5
+        movdqa  [esp],xmm0
+        add     edi,ebp
+        and     esi,ebx
+        movdqa  xmm1,xmm5
+        xor     ebx,ecx
+        add     edi,eax
+        ror     eax,7
+        movdqa  xmm7,xmm5
+        xor     esi,ecx
+        pslldq  xmm1,12
+        paddd   xmm5,xmm5
+        mov     ebp,edi
+        add     edx,DWORD [24+esp]
+        psrld   xmm7,31
+        xor     eax,ebx
+        rol     edi,5
+        movdqa  xmm0,xmm1
+        add     edx,esi
+        and     ebp,eax
+        xor     eax,ebx
+        psrld   xmm1,30
+        add     edx,edi
+        ror     edi,7
+        por     xmm5,xmm7
+        xor     ebp,ebx
+        mov     esi,edx
+        add     ecx,DWORD [28+esp]
+        pslld   xmm0,2
+        xor     edi,eax
+        rol     edx,5
+        pxor    xmm5,xmm1
+        movdqa  xmm1,[112+esp]
+        add     ecx,ebp
+        and     esi,edi
+        pxor    xmm5,xmm0
+        pshufd  xmm6,xmm2,238
+        xor     edi,eax
+        add     ecx,edx
+        ror     edx,7
+        xor     esi,eax
+        mov     ebp,ecx
+        punpcklqdq      xmm6,xmm3
+        movdqa  xmm0,xmm5
+        add     ebx,DWORD [32+esp]
+        xor     edx,edi
+        paddd   xmm1,xmm5
+        movdqa  [96+esp],xmm2
+        rol     ecx,5
+        add     ebx,esi
+        psrldq  xmm0,4
+        and     ebp,edx
+        xor     edx,edi
+        pxor    xmm6,xmm2
+        add     ebx,ecx
+        ror     ecx,7
+        pxor    xmm0,xmm4
+        xor     ebp,edi
+        mov     esi,ebx
+        add     eax,DWORD [36+esp]
+        pxor    xmm6,xmm0
+        xor     ecx,edx
+        rol     ebx,5
+        movdqa  [16+esp],xmm1
+        add     eax,ebp
+        and     esi,ecx
+        movdqa  xmm2,xmm6
+        xor     ecx,edx
+        add     eax,ebx
+        ror     ebx,7
+        movdqa  xmm0,xmm6
+        xor     esi,edx
+        pslldq  xmm2,12
+        paddd   xmm6,xmm6
+        mov     ebp,eax
+        add     edi,DWORD [40+esp]
+        psrld   xmm0,31
+        xor     ebx,ecx
+        rol     eax,5
+        movdqa  xmm1,xmm2
+        add     edi,esi
+        and     ebp,ebx
+        xor     ebx,ecx
+        psrld   xmm2,30
+        add     edi,eax
+        ror     eax,7
+        por     xmm6,xmm0
+        xor     ebp,ecx
+        movdqa  xmm0,[64+esp]
+        mov     esi,edi
+        add     edx,DWORD [44+esp]
+        pslld   xmm1,2
+        xor     eax,ebx
+        rol     edi,5
+        pxor    xmm6,xmm2
+        movdqa  xmm2,[112+esp]
+        add     edx,ebp
+        and     esi,eax
+        pxor    xmm6,xmm1
+        pshufd  xmm7,xmm3,238
+        xor     eax,ebx
+        add     edx,edi
+        ror     edi,7
+        xor     esi,ebx
+        mov     ebp,edx
+        punpcklqdq      xmm7,xmm4
+        movdqa  xmm1,xmm6
+        add     ecx,DWORD [48+esp]
+        xor     edi,eax
+        paddd   xmm2,xmm6
+        movdqa  [64+esp],xmm3
+        rol     edx,5
+        add     ecx,esi
+        psrldq  xmm1,4
+        and     ebp,edi
+        xor     edi,eax
+        pxor    xmm7,xmm3
+        add     ecx,edx
+        ror     edx,7
+        pxor    xmm1,xmm5
+        xor     ebp,eax
+        mov     esi,ecx
+        add     ebx,DWORD [52+esp]
+        pxor    xmm7,xmm1
+        xor     edx,edi
+        rol     ecx,5
+        movdqa  [32+esp],xmm2
+        add     ebx,ebp
+        and     esi,edx
+        movdqa  xmm3,xmm7
+        xor     edx,edi
+        add     ebx,ecx
+        ror     ecx,7
+        movdqa  xmm1,xmm7
+        xor     esi,edi
+        pslldq  xmm3,12
+        paddd   xmm7,xmm7
+        mov     ebp,ebx
+        add     eax,DWORD [56+esp]
+        psrld   xmm1,31
+        xor     ecx,edx
+        rol     ebx,5
+        movdqa  xmm2,xmm3
+        add     eax,esi
+        and     ebp,ecx
+        xor     ecx,edx
+        psrld   xmm3,30
+        add     eax,ebx
+        ror     ebx,7
+        por     xmm7,xmm1
+        xor     ebp,edx
+        movdqa  xmm1,[80+esp]
+        mov     esi,eax
+        add     edi,DWORD [60+esp]
+        pslld   xmm2,2
+        xor     ebx,ecx
+        rol     eax,5
+        pxor    xmm7,xmm3
+        movdqa  xmm3,[112+esp]
+        add     edi,ebp
+        and     esi,ebx
+        pxor    xmm7,xmm2
+        pshufd  xmm2,xmm6,238
+        xor     ebx,ecx
+        add     edi,eax
+        ror     eax,7
+        pxor    xmm0,xmm4
+        punpcklqdq      xmm2,xmm7
+        xor     esi,ecx
+        mov     ebp,edi
+        add     edx,DWORD [esp]
+        pxor    xmm0,xmm1
+        movdqa  [80+esp],xmm4
+        xor     eax,ebx
+        rol     edi,5
+        movdqa  xmm4,xmm3
+        add     edx,esi
+        paddd   xmm3,xmm7
+        and     ebp,eax
+        pxor    xmm0,xmm2
+        xor     eax,ebx
+        add     edx,edi
+        ror     edi,7
+        xor     ebp,ebx
+        movdqa  xmm2,xmm0
+        movdqa  [48+esp],xmm3
+        mov     esi,edx
+        add     ecx,DWORD [4+esp]
+        xor     edi,eax
+        rol     edx,5
+        pslld   xmm0,2
+        add     ecx,ebp
+        and     esi,edi
+        psrld   xmm2,30
+        xor     edi,eax
+        add     ecx,edx
+        ror     edx,7
+        xor     esi,eax
+        mov     ebp,ecx
+        add     ebx,DWORD [8+esp]
+        xor     edx,edi
+        rol     ecx,5
+        por     xmm0,xmm2
+        add     ebx,esi
+        and     ebp,edx
+        movdqa  xmm2,[96+esp]
+        xor     edx,edi
+        add     ebx,ecx
+        add     eax,DWORD [12+esp]
+        xor     ebp,edi
+        mov     esi,ebx
+        pshufd  xmm3,xmm7,238
+        rol     ebx,5
+        add     eax,ebp
+        xor     esi,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     edi,DWORD [16+esp]
+        pxor    xmm1,xmm5
+        punpcklqdq      xmm3,xmm0
+        xor     esi,ecx
+        mov     ebp,eax
+        rol     eax,5
+        pxor    xmm1,xmm2
+        movdqa  [96+esp],xmm5
+        add     edi,esi
+        xor     ebp,ecx
+        movdqa  xmm5,xmm4
+        ror     ebx,7
+        paddd   xmm4,xmm0
+        add     edi,eax
+        pxor    xmm1,xmm3
+        add     edx,DWORD [20+esp]
+        xor     ebp,ebx
+        mov     esi,edi
+        rol     edi,5
+        movdqa  xmm3,xmm1
+        movdqa  [esp],xmm4
+        add     edx,ebp
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,edi
+        pslld   xmm1,2
+        add     ecx,DWORD [24+esp]
+        xor     esi,eax
+        psrld   xmm3,30
+        mov     ebp,edx
+        rol     edx,5
+        add     ecx,esi
+        xor     ebp,eax
+        ror     edi,7
+        add     ecx,edx
+        por     xmm1,xmm3
+        add     ebx,DWORD [28+esp]
+        xor     ebp,edi
+        movdqa  xmm3,[64+esp]
+        mov     esi,ecx
+        rol     ecx,5
+        add     ebx,ebp
+        xor     esi,edi
+        ror     edx,7
+        pshufd  xmm4,xmm0,238
+        add     ebx,ecx
+        add     eax,DWORD [32+esp]
+        pxor    xmm2,xmm6
+        punpcklqdq      xmm4,xmm1
+        xor     esi,edx
+        mov     ebp,ebx
+        rol     ebx,5
+        pxor    xmm2,xmm3
+        movdqa  [64+esp],xmm6
+        add     eax,esi
+        xor     ebp,edx
+        movdqa  xmm6,[128+esp]
+        ror     ecx,7
+        paddd   xmm5,xmm1
+        add     eax,ebx
+        pxor    xmm2,xmm4
+        add     edi,DWORD [36+esp]
+        xor     ebp,ecx
+        mov     esi,eax
+        rol     eax,5
+        movdqa  xmm4,xmm2
+        movdqa  [16+esp],xmm5
+        add     edi,ebp
+        xor     esi,ecx
+        ror     ebx,7
+        add     edi,eax
+        pslld   xmm2,2
+        add     edx,DWORD [40+esp]
+        xor     esi,ebx
+        psrld   xmm4,30
+        mov     ebp,edi
+        rol     edi,5
+        add     edx,esi
+        xor     ebp,ebx
+        ror     eax,7
+        add     edx,edi
+        por     xmm2,xmm4
+        add     ecx,DWORD [44+esp]
+        xor     ebp,eax
+        movdqa  xmm4,[80+esp]
+        mov     esi,edx
+        rol     edx,5
+        add     ecx,ebp
+        xor     esi,eax
+        ror     edi,7
+        pshufd  xmm5,xmm1,238
+        add     ecx,edx
+        add     ebx,DWORD [48+esp]
+        pxor    xmm3,xmm7
+        punpcklqdq      xmm5,xmm2
+        xor     esi,edi
+        mov     ebp,ecx
+        rol     ecx,5
+        pxor    xmm3,xmm4
+        movdqa  [80+esp],xmm7
+        add     ebx,esi
+        xor     ebp,edi
+        movdqa  xmm7,xmm6
+        ror     edx,7
+        paddd   xmm6,xmm2
+        add     ebx,ecx
+        pxor    xmm3,xmm5
+        add     eax,DWORD [52+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        rol     ebx,5
+        movdqa  xmm5,xmm3
+        movdqa  [32+esp],xmm6
+        add     eax,ebp
+        xor     esi,edx
+        ror     ecx,7
+        add     eax,ebx
+        pslld   xmm3,2
+        add     edi,DWORD [56+esp]
+        xor     esi,ecx
+        psrld   xmm5,30
+        mov     ebp,eax
+        rol     eax,5
+        add     edi,esi
+        xor     ebp,ecx
+        ror     ebx,7
+        add     edi,eax
+        por     xmm3,xmm5
+        add     edx,DWORD [60+esp]
+        xor     ebp,ebx
+        movdqa  xmm5,[96+esp]
+        mov     esi,edi
+        rol     edi,5
+        add     edx,ebp
+        xor     esi,ebx
+        ror     eax,7
+        pshufd  xmm6,xmm2,238
+        add     edx,edi
+        add     ecx,DWORD [esp]
+        pxor    xmm4,xmm0
+        punpcklqdq      xmm6,xmm3
+        xor     esi,eax
+        mov     ebp,edx
+        rol     edx,5
+        pxor    xmm4,xmm5
+        movdqa  [96+esp],xmm0
+        add     ecx,esi
+        xor     ebp,eax
+        movdqa  xmm0,xmm7
+        ror     edi,7
+        paddd   xmm7,xmm3
+        add     ecx,edx
+        pxor    xmm4,xmm6
+        add     ebx,DWORD [4+esp]
+        xor     ebp,edi
+        mov     esi,ecx
+        rol     ecx,5
+        movdqa  xmm6,xmm4
+        movdqa  [48+esp],xmm7
+        add     ebx,ebp
+        xor     esi,edi
+        ror     edx,7
+        add     ebx,ecx
+        pslld   xmm4,2
+        add     eax,DWORD [8+esp]
+        xor     esi,edx
+        psrld   xmm6,30
+        mov     ebp,ebx
+        rol     ebx,5
+        add     eax,esi
+        xor     ebp,edx
+        ror     ecx,7
+        add     eax,ebx
+        por     xmm4,xmm6
+        add     edi,DWORD [12+esp]
+        xor     ebp,ecx
+        movdqa  xmm6,[64+esp]
+        mov     esi,eax
+        rol     eax,5
+        add     edi,ebp
+        xor     esi,ecx
+        ror     ebx,7
+        pshufd  xmm7,xmm3,238
+        add     edi,eax
+        add     edx,DWORD [16+esp]
+        pxor    xmm5,xmm1
+        punpcklqdq      xmm7,xmm4
+        xor     esi,ebx
+        mov     ebp,edi
+        rol     edi,5
+        pxor    xmm5,xmm6
+        movdqa  [64+esp],xmm1
+        add     edx,esi
+        xor     ebp,ebx
+        movdqa  xmm1,xmm0
+        ror     eax,7
+        paddd   xmm0,xmm4
+        add     edx,edi
+        pxor    xmm5,xmm7
+        add     ecx,DWORD [20+esp]
+        xor     ebp,eax
+        mov     esi,edx
+        rol     edx,5
+        movdqa  xmm7,xmm5
+        movdqa  [esp],xmm0
+        add     ecx,ebp
+        xor     esi,eax
+        ror     edi,7
+        add     ecx,edx
+        pslld   xmm5,2
+        add     ebx,DWORD [24+esp]
+        xor     esi,edi
+        psrld   xmm7,30
+        mov     ebp,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     ebp,edi
+        ror     edx,7
+        add     ebx,ecx
+        por     xmm5,xmm7
+        add     eax,DWORD [28+esp]
+        movdqa  xmm7,[80+esp]
+        ror     ecx,7
+        mov     esi,ebx
+        xor     ebp,edx
+        rol     ebx,5
+        pshufd  xmm0,xmm4,238
+        add     eax,ebp
+        xor     esi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     edi,DWORD [32+esp]
+        pxor    xmm6,xmm2
+        punpcklqdq      xmm0,xmm5
+        and     esi,ecx
+        xor     ecx,edx
+        ror     ebx,7
+        pxor    xmm6,xmm7
+        movdqa  [80+esp],xmm2
+        mov     ebp,eax
+        xor     esi,ecx
+        rol     eax,5
+        movdqa  xmm2,xmm1
+        add     edi,esi
+        paddd   xmm1,xmm5
+        xor     ebp,ebx
+        pxor    xmm6,xmm0
+        xor     ebx,ecx
+        add     edi,eax
+        add     edx,DWORD [36+esp]
+        and     ebp,ebx
+        movdqa  xmm0,xmm6
+        movdqa  [16+esp],xmm1
+        xor     ebx,ecx
+        ror     eax,7
+        mov     esi,edi
+        xor     ebp,ebx
+        rol     edi,5
+        pslld   xmm6,2
+        add     edx,ebp
+        xor     esi,eax
+        psrld   xmm0,30
+        xor     eax,ebx
+        add     edx,edi
+        add     ecx,DWORD [40+esp]
+        and     esi,eax
+        xor     eax,ebx
+        ror     edi,7
+        por     xmm6,xmm0
+        mov     ebp,edx
+        xor     esi,eax
+        movdqa  xmm0,[96+esp]
+        rol     edx,5
+        add     ecx,esi
+        xor     ebp,edi
+        xor     edi,eax
+        add     ecx,edx
+        pshufd  xmm1,xmm5,238
+        add     ebx,DWORD [44+esp]
+        and     ebp,edi
+        xor     edi,eax
+        ror     edx,7
+        mov     esi,ecx
+        xor     ebp,edi
+        rol     ecx,5
+        add     ebx,ebp
+        xor     esi,edx
+        xor     edx,edi
+        add     ebx,ecx
+        add     eax,DWORD [48+esp]
+        pxor    xmm7,xmm3
+        punpcklqdq      xmm1,xmm6
+        and     esi,edx
+        xor     edx,edi
+        ror     ecx,7
+        pxor    xmm7,xmm0
+        movdqa  [96+esp],xmm3
+        mov     ebp,ebx
+        xor     esi,edx
+        rol     ebx,5
+        movdqa  xmm3,[144+esp]
+        add     eax,esi
+        paddd   xmm2,xmm6
+        xor     ebp,ecx
+        pxor    xmm7,xmm1
+        xor     ecx,edx
+        add     eax,ebx
+        add     edi,DWORD [52+esp]
+        and     ebp,ecx
+        movdqa  xmm1,xmm7
+        movdqa  [32+esp],xmm2
+        xor     ecx,edx
+        ror     ebx,7
+        mov     esi,eax
+        xor     ebp,ecx
+        rol     eax,5
+        pslld   xmm7,2
+        add     edi,ebp
+        xor     esi,ebx
+        psrld   xmm1,30
+        xor     ebx,ecx
+        add     edi,eax
+        add     edx,DWORD [56+esp]
+        and     esi,ebx
+        xor     ebx,ecx
+        ror     eax,7
+        por     xmm7,xmm1
+        mov     ebp,edi
+        xor     esi,ebx
+        movdqa  xmm1,[64+esp]
+        rol     edi,5
+        add     edx,esi
+        xor     ebp,eax
+        xor     eax,ebx
+        add     edx,edi
+        pshufd  xmm2,xmm6,238
+        add     ecx,DWORD [60+esp]
+        and     ebp,eax
+        xor     eax,ebx
+        ror     edi,7
+        mov     esi,edx
+        xor     ebp,eax
+        rol     edx,5
+        add     ecx,ebp
+        xor     esi,edi
+        xor     edi,eax
+        add     ecx,edx
+        add     ebx,DWORD [esp]
+        pxor    xmm0,xmm4
+        punpcklqdq      xmm2,xmm7
+        and     esi,edi
+        xor     edi,eax
+        ror     edx,7
+        pxor    xmm0,xmm1
+        movdqa  [64+esp],xmm4
+        mov     ebp,ecx
+        xor     esi,edi
+        rol     ecx,5
+        movdqa  xmm4,xmm3
+        add     ebx,esi
+        paddd   xmm3,xmm7
+        xor     ebp,edx
+        pxor    xmm0,xmm2
+        xor     edx,edi
+        add     ebx,ecx
+        add     eax,DWORD [4+esp]
+        and     ebp,edx
+        movdqa  xmm2,xmm0
+        movdqa  [48+esp],xmm3
+        xor     edx,edi
+        ror     ecx,7
+        mov     esi,ebx
+        xor     ebp,edx
+        rol     ebx,5
+        pslld   xmm0,2
+        add     eax,ebp
+        xor     esi,ecx
+        psrld   xmm2,30
+        xor     ecx,edx
+        add     eax,ebx
+        add     edi,DWORD [8+esp]
+        and     esi,ecx
+        xor     ecx,edx
+        ror     ebx,7
+        por     xmm0,xmm2
+        mov     ebp,eax
+        xor     esi,ecx
+        movdqa  xmm2,[80+esp]
+        rol     eax,5
+        add     edi,esi
+        xor     ebp,ebx
+        xor     ebx,ecx
+        add     edi,eax
+        pshufd  xmm3,xmm7,238
+        add     edx,DWORD [12+esp]
+        and     ebp,ebx
+        xor     ebx,ecx
+        ror     eax,7
+        mov     esi,edi
+        xor     ebp,ebx
+        rol     edi,5
+        add     edx,ebp
+        xor     esi,eax
+        xor     eax,ebx
+        add     edx,edi
+        add     ecx,DWORD [16+esp]
+        pxor    xmm1,xmm5
+        punpcklqdq      xmm3,xmm0
+        and     esi,eax
+        xor     eax,ebx
+        ror     edi,7
+        pxor    xmm1,xmm2
+        movdqa  [80+esp],xmm5
+        mov     ebp,edx
+        xor     esi,eax
+        rol     edx,5
+        movdqa  xmm5,xmm4
+        add     ecx,esi
+        paddd   xmm4,xmm0
+        xor     ebp,edi
+        pxor    xmm1,xmm3
+        xor     edi,eax
+        add     ecx,edx
+        add     ebx,DWORD [20+esp]
+        and     ebp,edi
+        movdqa  xmm3,xmm1
+        movdqa  [esp],xmm4
+        xor     edi,eax
+        ror     edx,7
+        mov     esi,ecx
+        xor     ebp,edi
+        rol     ecx,5
+        pslld   xmm1,2
+        add     ebx,ebp
+        xor     esi,edx
+        psrld   xmm3,30
+        xor     edx,edi
+        add     ebx,ecx
+        add     eax,DWORD [24+esp]
+        and     esi,edx
+        xor     edx,edi
+        ror     ecx,7
+        por     xmm1,xmm3
+        mov     ebp,ebx
+        xor     esi,edx
+        movdqa  xmm3,[96+esp]
+        rol     ebx,5
+        add     eax,esi
+        xor     ebp,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        pshufd  xmm4,xmm0,238
+        add     edi,DWORD [28+esp]
+        and     ebp,ecx
+        xor     ecx,edx
+        ror     ebx,7
+        mov     esi,eax
+        xor     ebp,ecx
+        rol     eax,5
+        add     edi,ebp
+        xor     esi,ebx
+        xor     ebx,ecx
+        add     edi,eax
+        add     edx,DWORD [32+esp]
+        pxor    xmm2,xmm6
+        punpcklqdq      xmm4,xmm1
+        and     esi,ebx
+        xor     ebx,ecx
+        ror     eax,7
+        pxor    xmm2,xmm3
+        movdqa  [96+esp],xmm6
+        mov     ebp,edi
+        xor     esi,ebx
+        rol     edi,5
+        movdqa  xmm6,xmm5
+        add     edx,esi
+        paddd   xmm5,xmm1
+        xor     ebp,eax
+        pxor    xmm2,xmm4
+        xor     eax,ebx
+        add     edx,edi
+        add     ecx,DWORD [36+esp]
+        and     ebp,eax
+        movdqa  xmm4,xmm2
+        movdqa  [16+esp],xmm5
+        xor     eax,ebx
+        ror     edi,7
+        mov     esi,edx
+        xor     ebp,eax
+        rol     edx,5
+        pslld   xmm2,2
+        add     ecx,ebp
+        xor     esi,edi
+        psrld   xmm4,30
+        xor     edi,eax
+        add     ecx,edx
+        add     ebx,DWORD [40+esp]
+        and     esi,edi
+        xor     edi,eax
+        ror     edx,7
+        por     xmm2,xmm4
+        mov     ebp,ecx
+        xor     esi,edi
+        movdqa  xmm4,[64+esp]
+        rol     ecx,5
+        add     ebx,esi
+        xor     ebp,edx
+        xor     edx,edi
+        add     ebx,ecx
+        pshufd  xmm5,xmm1,238
+        add     eax,DWORD [44+esp]
+        and     ebp,edx
+        xor     edx,edi
+        ror     ecx,7
+        mov     esi,ebx
+        xor     ebp,edx
+        rol     ebx,5
+        add     eax,ebp
+        xor     esi,edx
+        add     eax,ebx
+        add     edi,DWORD [48+esp]
+        pxor    xmm3,xmm7
+        punpcklqdq      xmm5,xmm2
+        xor     esi,ecx
+        mov     ebp,eax
+        rol     eax,5
+        pxor    xmm3,xmm4
+        movdqa  [64+esp],xmm7
+        add     edi,esi
+        xor     ebp,ecx
+        movdqa  xmm7,xmm6
+        ror     ebx,7
+        paddd   xmm6,xmm2
+        add     edi,eax
+        pxor    xmm3,xmm5
+        add     edx,DWORD [52+esp]
+        xor     ebp,ebx
+        mov     esi,edi
+        rol     edi,5
+        movdqa  xmm5,xmm3
+        movdqa  [32+esp],xmm6
+        add     edx,ebp
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,edi
+        pslld   xmm3,2
+        add     ecx,DWORD [56+esp]
+        xor     esi,eax
+        psrld   xmm5,30
+        mov     ebp,edx
+        rol     edx,5
+        add     ecx,esi
+        xor     ebp,eax
+        ror     edi,7
+        add     ecx,edx
+        por     xmm3,xmm5
+        add     ebx,DWORD [60+esp]
+        xor     ebp,edi
+        mov     esi,ecx
+        rol     ecx,5
+        add     ebx,ebp
+        xor     esi,edi
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD [esp]
+        xor     esi,edx
+        mov     ebp,ebx
+        rol     ebx,5
+        add     eax,esi
+        xor     ebp,edx
+        ror     ecx,7
+        paddd   xmm7,xmm3
+        add     eax,ebx
+        add     edi,DWORD [4+esp]
+        xor     ebp,ecx
+        mov     esi,eax
+        movdqa  [48+esp],xmm7
+        rol     eax,5
+        add     edi,ebp
+        xor     esi,ecx
+        ror     ebx,7
+        add     edi,eax
+        add     edx,DWORD [8+esp]
+        xor     esi,ebx
+        mov     ebp,edi
+        rol     edi,5
+        add     edx,esi
+        xor     ebp,ebx
+        ror     eax,7
+        add     edx,edi
+        add     ecx,DWORD [12+esp]
+        xor     ebp,eax
+        mov     esi,edx
+        rol     edx,5
+        add     ecx,ebp
+        xor     esi,eax
+        ror     edi,7
+        add     ecx,edx
+        mov     ebp,DWORD [196+esp]
+        cmp     ebp,DWORD [200+esp]
+        je      NEAR L$007done
+        movdqa  xmm7,[160+esp]
+        movdqa  xmm6,[176+esp]
+        movdqu  xmm0,[ebp]
+        movdqu  xmm1,[16+ebp]
+        movdqu  xmm2,[32+ebp]
+        movdqu  xmm3,[48+ebp]
+        add     ebp,64
+db      102,15,56,0,198
+        mov     DWORD [196+esp],ebp
+        movdqa  [96+esp],xmm7
+        add     ebx,DWORD [16+esp]
+        xor     esi,edi
+        mov     ebp,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     ebp,edi
+        ror     edx,7
+db      102,15,56,0,206
+        add     ebx,ecx
+        add     eax,DWORD [20+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        paddd   xmm0,xmm7
+        rol     ebx,5
+        add     eax,ebp
+        xor     esi,edx
+        ror     ecx,7
+        movdqa  [esp],xmm0
+        add     eax,ebx
+        add     edi,DWORD [24+esp]
+        xor     esi,ecx
+        mov     ebp,eax
+        psubd   xmm0,xmm7
+        rol     eax,5
+        add     edi,esi
+        xor     ebp,ecx
+        ror     ebx,7
+        add     edi,eax
+        add     edx,DWORD [28+esp]
+        xor     ebp,ebx
+        mov     esi,edi
+        rol     edi,5
+        add     edx,ebp
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,edi
+        add     ecx,DWORD [32+esp]
+        xor     esi,eax
+        mov     ebp,edx
+        rol     edx,5
+        add     ecx,esi
+        xor     ebp,eax
+        ror     edi,7
+db      102,15,56,0,214
+        add     ecx,edx
+        add     ebx,DWORD [36+esp]
+        xor     ebp,edi
+        mov     esi,ecx
+        paddd   xmm1,xmm7
+        rol     ecx,5
+        add     ebx,ebp
+        xor     esi,edi
+        ror     edx,7
+        movdqa  [16+esp],xmm1
+        add     ebx,ecx
+        add     eax,DWORD [40+esp]
+        xor     esi,edx
+        mov     ebp,ebx
+        psubd   xmm1,xmm7
+        rol     ebx,5
+        add     eax,esi
+        xor     ebp,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     edi,DWORD [44+esp]
+        xor     ebp,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     edi,ebp
+        xor     esi,ecx
+        ror     ebx,7
+        add     edi,eax
+        add     edx,DWORD [48+esp]
+        xor     esi,ebx
+        mov     ebp,edi
+        rol     edi,5
+        add     edx,esi
+        xor     ebp,ebx
+        ror     eax,7
+db      102,15,56,0,222
+        add     edx,edi
+        add     ecx,DWORD [52+esp]
+        xor     ebp,eax
+        mov     esi,edx
+        paddd   xmm2,xmm7
+        rol     edx,5
+        add     ecx,ebp
+        xor     esi,eax
+        ror     edi,7
+        movdqa  [32+esp],xmm2
+        add     ecx,edx
+        add     ebx,DWORD [56+esp]
+        xor     esi,edi
+        mov     ebp,ecx
+        psubd   xmm2,xmm7
+        rol     ecx,5
+        add     ebx,esi
+        xor     ebp,edi
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD [60+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,ebp
+        ror     ecx,7
+        add     eax,ebx
+        mov     ebp,DWORD [192+esp]
+        add     eax,DWORD [ebp]
+        add     esi,DWORD [4+ebp]
+        add     ecx,DWORD [8+ebp]
+        mov     DWORD [ebp],eax
+        add     edx,DWORD [12+ebp]
+        mov     DWORD [4+ebp],esi
+        add     edi,DWORD [16+ebp]
+        mov     DWORD [8+ebp],ecx
+        mov     ebx,ecx
+        mov     DWORD [12+ebp],edx
+        xor     ebx,edx
+        mov     DWORD [16+ebp],edi
+        mov     ebp,esi
+        pshufd  xmm4,xmm0,238
+        and     esi,ebx
+        mov     ebx,ebp
+        jmp     NEAR L$006loop
+align   16
+L$007done:
+        add     ebx,DWORD [16+esp]
+        xor     esi,edi
+        mov     ebp,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     ebp,edi
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD [20+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,ebp
+        xor     esi,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     edi,DWORD [24+esp]
+        xor     esi,ecx
+        mov     ebp,eax
+        rol     eax,5
+        add     edi,esi
+        xor     ebp,ecx
+        ror     ebx,7
+        add     edi,eax
+        add     edx,DWORD [28+esp]
+        xor     ebp,ebx
+        mov     esi,edi
+        rol     edi,5
+        add     edx,ebp
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,edi
+        add     ecx,DWORD [32+esp]
+        xor     esi,eax
+        mov     ebp,edx
+        rol     edx,5
+        add     ecx,esi
+        xor     ebp,eax
+        ror     edi,7
+        add     ecx,edx
+        add     ebx,DWORD [36+esp]
+        xor     ebp,edi
+        mov     esi,ecx
+        rol     ecx,5
+        add     ebx,ebp
+        xor     esi,edi
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD [40+esp]
+        xor     esi,edx
+        mov     ebp,ebx
+        rol     ebx,5
+        add     eax,esi
+        xor     ebp,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     edi,DWORD [44+esp]
+        xor     ebp,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     edi,ebp
+        xor     esi,ecx
+        ror     ebx,7
+        add     edi,eax
+        add     edx,DWORD [48+esp]
+        xor     esi,ebx
+        mov     ebp,edi
+        rol     edi,5
+        add     edx,esi
+        xor     ebp,ebx
+        ror     eax,7
+        add     edx,edi
+        add     ecx,DWORD [52+esp]
+        xor     ebp,eax
+        mov     esi,edx
+        rol     edx,5
+        add     ecx,ebp
+        xor     esi,eax
+        ror     edi,7
+        add     ecx,edx
+        add     ebx,DWORD [56+esp]
+        xor     esi,edi
+        mov     ebp,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     ebp,edi
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD [60+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,ebp
+        ror     ecx,7
+        add     eax,ebx
+        mov     ebp,DWORD [192+esp]
+        add     eax,DWORD [ebp]
+        mov     esp,DWORD [204+esp]
+        add     esi,DWORD [4+ebp]
+        add     ecx,DWORD [8+ebp]
+        mov     DWORD [ebp],eax
+        add     edx,DWORD [12+ebp]
+        mov     DWORD [4+ebp],esi
+        add     edi,DWORD [16+ebp]
+        mov     DWORD [8+ebp],ecx
+        mov     DWORD [12+ebp],edx
+        mov     DWORD [16+ebp],edi
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   16
+__sha1_block_data_order_avx:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        call    L$008pic_point
+L$008pic_point:
+        pop     ebp
+        lea     ebp,[(L$K_XX_XX-L$008pic_point)+ebp]
+L$avx_shortcut:
+        vzeroall
+        vmovdqa xmm7,[ebp]
+        vmovdqa xmm0,[16+ebp]
+        vmovdqa xmm1,[32+ebp]
+        vmovdqa xmm2,[48+ebp]
+        vmovdqa xmm6,[64+ebp]
+        mov     edi,DWORD [20+esp]
+        mov     ebp,DWORD [24+esp]
+        mov     edx,DWORD [28+esp]
+        mov     esi,esp
+        sub     esp,208
+        and     esp,-64
+        vmovdqa [112+esp],xmm0
+        vmovdqa [128+esp],xmm1
+        vmovdqa [144+esp],xmm2
+        shl     edx,6
+        vmovdqa [160+esp],xmm7
+        add     edx,ebp
+        vmovdqa [176+esp],xmm6
+        add     ebp,64
+        mov     DWORD [192+esp],edi
+        mov     DWORD [196+esp],ebp
+        mov     DWORD [200+esp],edx
+        mov     DWORD [204+esp],esi
+        mov     eax,DWORD [edi]
+        mov     ebx,DWORD [4+edi]
+        mov     ecx,DWORD [8+edi]
+        mov     edx,DWORD [12+edi]
+        mov     edi,DWORD [16+edi]
+        mov     esi,ebx
+        vmovdqu xmm0,[ebp-64]
+        vmovdqu xmm1,[ebp-48]
+        vmovdqu xmm2,[ebp-32]
+        vmovdqu xmm3,[ebp-16]
+        vpshufb xmm0,xmm0,xmm6
+        vpshufb xmm1,xmm1,xmm6
+        vpshufb xmm2,xmm2,xmm6
+        vmovdqa [96+esp],xmm7
+        vpshufb xmm3,xmm3,xmm6
+        vpaddd  xmm4,xmm0,xmm7
+        vpaddd  xmm5,xmm1,xmm7
+        vpaddd  xmm6,xmm2,xmm7
+        vmovdqa [esp],xmm4
+        mov     ebp,ecx
+        vmovdqa [16+esp],xmm5
+        xor     ebp,edx
+        vmovdqa [32+esp],xmm6
+        and     esi,ebp
+        jmp     NEAR L$009loop
+align   16
+L$009loop:
+        shrd    ebx,ebx,2
+        xor     esi,edx
+        vpalignr        xmm4,xmm1,xmm0,8
+        mov     ebp,eax
+        add     edi,DWORD [esp]
+        vpaddd  xmm7,xmm7,xmm3
+        vmovdqa [64+esp],xmm0
+        xor     ebx,ecx
+        shld    eax,eax,5
+        vpsrldq xmm6,xmm3,4
+        add     edi,esi
+        and     ebp,ebx
+        vpxor   xmm4,xmm4,xmm0
+        xor     ebx,ecx
+        add     edi,eax
+        vpxor   xmm6,xmm6,xmm2
+        shrd    eax,eax,7
+        xor     ebp,ecx
+        vmovdqa [48+esp],xmm7
+        mov     esi,edi
+        add     edx,DWORD [4+esp]
+        vpxor   xmm4,xmm4,xmm6
+        xor     eax,ebx
+        shld    edi,edi,5
+        add     edx,ebp
+        and     esi,eax
+        vpsrld  xmm6,xmm4,31
+        xor     eax,ebx
+        add     edx,edi
+        shrd    edi,edi,7
+        xor     esi,ebx
+        vpslldq xmm0,xmm4,12
+        vpaddd  xmm4,xmm4,xmm4
+        mov     ebp,edx
+        add     ecx,DWORD [8+esp]
+        xor     edi,eax
+        shld    edx,edx,5
+        vpsrld  xmm7,xmm0,30
+        vpor    xmm4,xmm4,xmm6
+        add     ecx,esi
+        and     ebp,edi
+        xor     edi,eax
+        add     ecx,edx
+        vpslld  xmm0,xmm0,2
+        shrd    edx,edx,7
+        xor     ebp,eax
+        vpxor   xmm4,xmm4,xmm7
+        mov     esi,ecx
+        add     ebx,DWORD [12+esp]
+        xor     edx,edi
+        shld    ecx,ecx,5
+        vpxor   xmm4,xmm4,xmm0
+        add     ebx,ebp
+        and     esi,edx
+        vmovdqa xmm0,[96+esp]
+        xor     edx,edi
+        add     ebx,ecx
+        shrd    ecx,ecx,7
+        xor     esi,edi
+        vpalignr        xmm5,xmm2,xmm1,8
+        mov     ebp,ebx
+        add     eax,DWORD [16+esp]
+        vpaddd  xmm0,xmm0,xmm4
+        vmovdqa [80+esp],xmm1
+        xor     ecx,edx
+        shld    ebx,ebx,5
+        vpsrldq xmm7,xmm4,4
+        add     eax,esi
+        and     ebp,ecx
+        vpxor   xmm5,xmm5,xmm1
+        xor     ecx,edx
+        add     eax,ebx
+        vpxor   xmm7,xmm7,xmm3
+        shrd    ebx,ebx,7
+        xor     ebp,edx
+        vmovdqa [esp],xmm0
+        mov     esi,eax
+        add     edi,DWORD [20+esp]
+        vpxor   xmm5,xmm5,xmm7
+        xor     ebx,ecx
+        shld    eax,eax,5
+        add     edi,ebp
+        and     esi,ebx
+        vpsrld  xmm7,xmm5,31
+        xor     ebx,ecx
+        add     edi,eax
+        shrd    eax,eax,7
+        xor     esi,ecx
+        vpslldq xmm1,xmm5,12
+        vpaddd  xmm5,xmm5,xmm5
+        mov     ebp,edi
+        add     edx,DWORD [24+esp]
+        xor     eax,ebx
+        shld    edi,edi,5
+        vpsrld  xmm0,xmm1,30
+        vpor    xmm5,xmm5,xmm7
+        add     edx,esi
+        and     ebp,eax
+        xor     eax,ebx
+        add     edx,edi
+        vpslld  xmm1,xmm1,2
+        shrd    edi,edi,7
+        xor     ebp,ebx
+        vpxor   xmm5,xmm5,xmm0
+        mov     esi,edx
+        add     ecx,DWORD [28+esp]
+        xor     edi,eax
+        shld    edx,edx,5
+        vpxor   xmm5,xmm5,xmm1
+        add     ecx,ebp
+        and     esi,edi
+        vmovdqa xmm1,[112+esp]
+        xor     edi,eax
+        add     ecx,edx
+        shrd    edx,edx,7
+        xor     esi,eax
+        vpalignr        xmm6,xmm3,xmm2,8
+        mov     ebp,ecx
+        add     ebx,DWORD [32+esp]
+        vpaddd  xmm1,xmm1,xmm5
+        vmovdqa [96+esp],xmm2
+        xor     edx,edi
+        shld    ecx,ecx,5
+        vpsrldq xmm0,xmm5,4
+        add     ebx,esi
+        and     ebp,edx
+        vpxor   xmm6,xmm6,xmm2
+        xor     edx,edi
+        add     ebx,ecx
+        vpxor   xmm0,xmm0,xmm4
+        shrd    ecx,ecx,7
+        xor     ebp,edi
+        vmovdqa [16+esp],xmm1
+        mov     esi,ebx
+        add     eax,DWORD [36+esp]
+        vpxor   xmm6,xmm6,xmm0
+        xor     ecx,edx
+        shld    ebx,ebx,5
+        add     eax,ebp
+        and     esi,ecx
+        vpsrld  xmm0,xmm6,31
+        xor     ecx,edx
+        add     eax,ebx
+        shrd    ebx,ebx,7
+        xor     esi,edx
+        vpslldq xmm2,xmm6,12
+        vpaddd  xmm6,xmm6,xmm6
+        mov     ebp,eax
+        add     edi,DWORD [40+esp]
+        xor     ebx,ecx
+        shld    eax,eax,5
+        vpsrld  xmm1,xmm2,30
+        vpor    xmm6,xmm6,xmm0
+        add     edi,esi
+        and     ebp,ebx
+        xor     ebx,ecx
+        add     edi,eax
+        vpslld  xmm2,xmm2,2
+        vmovdqa xmm0,[64+esp]
+        shrd    eax,eax,7
+        xor     ebp,ecx
+        vpxor   xmm6,xmm6,xmm1
+        mov     esi,edi
+        add     edx,DWORD [44+esp]
+        xor     eax,ebx
+        shld    edi,edi,5
+        vpxor   xmm6,xmm6,xmm2
+        add     edx,ebp
+        and     esi,eax
+        vmovdqa xmm2,[112+esp]
+        xor     eax,ebx
+        add     edx,edi
+        shrd    edi,edi,7
+        xor     esi,ebx
+        vpalignr        xmm7,xmm4,xmm3,8
+        mov     ebp,edx
+        add     ecx,DWORD [48+esp]
+        vpaddd  xmm2,xmm2,xmm6
+        vmovdqa [64+esp],xmm3
+        xor     edi,eax
+        shld    edx,edx,5
+        vpsrldq xmm1,xmm6,4
+        add     ecx,esi
+        and     ebp,edi
+        vpxor   xmm7,xmm7,xmm3
+        xor     edi,eax
+        add     ecx,edx
+        vpxor   xmm1,xmm1,xmm5
+        shrd    edx,edx,7
+        xor     ebp,eax
+        vmovdqa [32+esp],xmm2
+        mov     esi,ecx
+        add     ebx,DWORD [52+esp]
+        vpxor   xmm7,xmm7,xmm1
+        xor     edx,edi
+        shld    ecx,ecx,5
+        add     ebx,ebp
+        and     esi,edx
+        vpsrld  xmm1,xmm7,31
+        xor     edx,edi
+        add     ebx,ecx
+        shrd    ecx,ecx,7
+        xor     esi,edi
+        vpslldq xmm3,xmm7,12
+        vpaddd  xmm7,xmm7,xmm7
+        mov     ebp,ebx
+        add     eax,DWORD [56+esp]
+        xor     ecx,edx
+        shld    ebx,ebx,5
+        vpsrld  xmm2,xmm3,30
+        vpor    xmm7,xmm7,xmm1
+        add     eax,esi
+        and     ebp,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        vpslld  xmm3,xmm3,2
+        vmovdqa xmm1,[80+esp]
+        shrd    ebx,ebx,7
+        xor     ebp,edx
+        vpxor   xmm7,xmm7,xmm2
+        mov     esi,eax
+        add     edi,DWORD [60+esp]
+        xor     ebx,ecx
+        shld    eax,eax,5
+        vpxor   xmm7,xmm7,xmm3
+        add     edi,ebp
+        and     esi,ebx
+        vmovdqa xmm3,[112+esp]
+        xor     ebx,ecx
+        add     edi,eax
+        vpalignr        xmm2,xmm7,xmm6,8
+        vpxor   xmm0,xmm0,xmm4
+        shrd    eax,eax,7
+        xor     esi,ecx
+        mov     ebp,edi
+        add     edx,DWORD [esp]
+        vpxor   xmm0,xmm0,xmm1
+        vmovdqa [80+esp],xmm4
+        xor     eax,ebx
+        shld    edi,edi,5
+        vmovdqa xmm4,xmm3
+        vpaddd  xmm3,xmm3,xmm7
+        add     edx,esi
+        and     ebp,eax
+        vpxor   xmm0,xmm0,xmm2
+        xor     eax,ebx
+        add     edx,edi
+        shrd    edi,edi,7
+        xor     ebp,ebx
+        vpsrld  xmm2,xmm0,30
+        vmovdqa [48+esp],xmm3
+        mov     esi,edx
+        add     ecx,DWORD [4+esp]
+        xor     edi,eax
+        shld    edx,edx,5
+        vpslld  xmm0,xmm0,2
+        add     ecx,ebp
+        and     esi,edi
+        xor     edi,eax
+        add     ecx,edx
+        shrd    edx,edx,7
+        xor     esi,eax
+        mov     ebp,ecx
+        add     ebx,DWORD [8+esp]
+        vpor    xmm0,xmm0,xmm2
+        xor     edx,edi
+        shld    ecx,ecx,5
+        vmovdqa xmm2,[96+esp]
+        add     ebx,esi
+        and     ebp,edx
+        xor     edx,edi
+        add     ebx,ecx
+        add     eax,DWORD [12+esp]
+        xor     ebp,edi
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,ebp
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpalignr        xmm3,xmm0,xmm7,8
+        vpxor   xmm1,xmm1,xmm5
+        add     edi,DWORD [16+esp]
+        xor     esi,ecx
+        mov     ebp,eax
+        shld    eax,eax,5
+        vpxor   xmm1,xmm1,xmm2
+        vmovdqa [96+esp],xmm5
+        add     edi,esi
+        xor     ebp,ecx
+        vmovdqa xmm5,xmm4
+        vpaddd  xmm4,xmm4,xmm0
+        shrd    ebx,ebx,7
+        add     edi,eax
+        vpxor   xmm1,xmm1,xmm3
+        add     edx,DWORD [20+esp]
+        xor     ebp,ebx
+        mov     esi,edi
+        shld    edi,edi,5
+        vpsrld  xmm3,xmm1,30
+        vmovdqa [esp],xmm4
+        add     edx,ebp
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,edi
+        vpslld  xmm1,xmm1,2
+        add     ecx,DWORD [24+esp]
+        xor     esi,eax
+        mov     ebp,edx
+        shld    edx,edx,5
+        add     ecx,esi
+        xor     ebp,eax
+        shrd    edi,edi,7
+        add     ecx,edx
+        vpor    xmm1,xmm1,xmm3
+        add     ebx,DWORD [28+esp]
+        xor     ebp,edi
+        vmovdqa xmm3,[64+esp]
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,ebp
+        xor     esi,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpalignr        xmm4,xmm1,xmm0,8
+        vpxor   xmm2,xmm2,xmm6
+        add     eax,DWORD [32+esp]
+        xor     esi,edx
+        mov     ebp,ebx
+        shld    ebx,ebx,5
+        vpxor   xmm2,xmm2,xmm3
+        vmovdqa [64+esp],xmm6
+        add     eax,esi
+        xor     ebp,edx
+        vmovdqa xmm6,[128+esp]
+        vpaddd  xmm5,xmm5,xmm1
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpxor   xmm2,xmm2,xmm4
+        add     edi,DWORD [36+esp]
+        xor     ebp,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        vpsrld  xmm4,xmm2,30
+        vmovdqa [16+esp],xmm5
+        add     edi,ebp
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     edi,eax
+        vpslld  xmm2,xmm2,2
+        add     edx,DWORD [40+esp]
+        xor     esi,ebx
+        mov     ebp,edi
+        shld    edi,edi,5
+        add     edx,esi
+        xor     ebp,ebx
+        shrd    eax,eax,7
+        add     edx,edi
+        vpor    xmm2,xmm2,xmm4
+        add     ecx,DWORD [44+esp]
+        xor     ebp,eax
+        vmovdqa xmm4,[80+esp]
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,ebp
+        xor     esi,eax
+        shrd    edi,edi,7
+        add     ecx,edx
+        vpalignr        xmm5,xmm2,xmm1,8
+        vpxor   xmm3,xmm3,xmm7
+        add     ebx,DWORD [48+esp]
+        xor     esi,edi
+        mov     ebp,ecx
+        shld    ecx,ecx,5
+        vpxor   xmm3,xmm3,xmm4
+        vmovdqa [80+esp],xmm7
+        add     ebx,esi
+        xor     ebp,edi
+        vmovdqa xmm7,xmm6
+        vpaddd  xmm6,xmm6,xmm2
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpxor   xmm3,xmm3,xmm5
+        add     eax,DWORD [52+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        vpsrld  xmm5,xmm3,30
+        vmovdqa [32+esp],xmm6
+        add     eax,ebp
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpslld  xmm3,xmm3,2
+        add     edi,DWORD [56+esp]
+        xor     esi,ecx
+        mov     ebp,eax
+        shld    eax,eax,5
+        add     edi,esi
+        xor     ebp,ecx
+        shrd    ebx,ebx,7
+        add     edi,eax
+        vpor    xmm3,xmm3,xmm5
+        add     edx,DWORD [60+esp]
+        xor     ebp,ebx
+        vmovdqa xmm5,[96+esp]
+        mov     esi,edi
+        shld    edi,edi,5
+        add     edx,ebp
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,edi
+        vpalignr        xmm6,xmm3,xmm2,8
+        vpxor   xmm4,xmm4,xmm0
+        add     ecx,DWORD [esp]
+        xor     esi,eax
+        mov     ebp,edx
+        shld    edx,edx,5
+        vpxor   xmm4,xmm4,xmm5
+        vmovdqa [96+esp],xmm0
+        add     ecx,esi
+        xor     ebp,eax
+        vmovdqa xmm0,xmm7
+        vpaddd  xmm7,xmm7,xmm3
+        shrd    edi,edi,7
+        add     ecx,edx
+        vpxor   xmm4,xmm4,xmm6
+        add     ebx,DWORD [4+esp]
+        xor     ebp,edi
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        vpsrld  xmm6,xmm4,30
+        vmovdqa [48+esp],xmm7
+        add     ebx,ebp
+        xor     esi,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpslld  xmm4,xmm4,2
+        add     eax,DWORD [8+esp]
+        xor     esi,edx
+        mov     ebp,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     ebp,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpor    xmm4,xmm4,xmm6
+        add     edi,DWORD [12+esp]
+        xor     ebp,ecx
+        vmovdqa xmm6,[64+esp]
+        mov     esi,eax
+        shld    eax,eax,5
+        add     edi,ebp
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     edi,eax
+        vpalignr        xmm7,xmm4,xmm3,8
+        vpxor   xmm5,xmm5,xmm1
+        add     edx,DWORD [16+esp]
+        xor     esi,ebx
+        mov     ebp,edi
+        shld    edi,edi,5
+        vpxor   xmm5,xmm5,xmm6
+        vmovdqa [64+esp],xmm1
+        add     edx,esi
+        xor     ebp,ebx
+        vmovdqa xmm1,xmm0
+        vpaddd  xmm0,xmm0,xmm4
+        shrd    eax,eax,7
+        add     edx,edi
+        vpxor   xmm5,xmm5,xmm7
+        add     ecx,DWORD [20+esp]
+        xor     ebp,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        vpsrld  xmm7,xmm5,30
+        vmovdqa [esp],xmm0
+        add     ecx,ebp
+        xor     esi,eax
+        shrd    edi,edi,7
+        add     ecx,edx
+        vpslld  xmm5,xmm5,2
+        add     ebx,DWORD [24+esp]
+        xor     esi,edi
+        mov     ebp,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     ebp,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpor    xmm5,xmm5,xmm7
+        add     eax,DWORD [28+esp]
+        vmovdqa xmm7,[80+esp]
+        shrd    ecx,ecx,7
+        mov     esi,ebx
+        xor     ebp,edx
+        shld    ebx,ebx,5
+        add     eax,ebp
+        xor     esi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        vpalignr        xmm0,xmm5,xmm4,8
+        vpxor   xmm6,xmm6,xmm2
+        add     edi,DWORD [32+esp]
+        and     esi,ecx
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        vpxor   xmm6,xmm6,xmm7
+        vmovdqa [80+esp],xmm2
+        mov     ebp,eax
+        xor     esi,ecx
+        vmovdqa xmm2,xmm1
+        vpaddd  xmm1,xmm1,xmm5
+        shld    eax,eax,5
+        add     edi,esi
+        vpxor   xmm6,xmm6,xmm0
+        xor     ebp,ebx
+        xor     ebx,ecx
+        add     edi,eax
+        add     edx,DWORD [36+esp]
+        vpsrld  xmm0,xmm6,30
+        vmovdqa [16+esp],xmm1
+        and     ebp,ebx
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        mov     esi,edi
+        vpslld  xmm6,xmm6,2
+        xor     ebp,ebx
+        shld    edi,edi,5
+        add     edx,ebp
+        xor     esi,eax
+        xor     eax,ebx
+        add     edx,edi
+        add     ecx,DWORD [40+esp]
+        and     esi,eax
+        vpor    xmm6,xmm6,xmm0
+        xor     eax,ebx
+        shrd    edi,edi,7
+        vmovdqa xmm0,[96+esp]
+        mov     ebp,edx
+        xor     esi,eax
+        shld    edx,edx,5
+        add     ecx,esi
+        xor     ebp,edi
+        xor     edi,eax
+        add     ecx,edx
+        add     ebx,DWORD [44+esp]
+        and     ebp,edi
+        xor     edi,eax
+        shrd    edx,edx,7
+        mov     esi,ecx
+        xor     ebp,edi
+        shld    ecx,ecx,5
+        add     ebx,ebp
+        xor     esi,edx
+        xor     edx,edi
+        add     ebx,ecx
+        vpalignr        xmm1,xmm6,xmm5,8
+        vpxor   xmm7,xmm7,xmm3
+        add     eax,DWORD [48+esp]
+        and     esi,edx
+        xor     edx,edi
+        shrd    ecx,ecx,7
+        vpxor   xmm7,xmm7,xmm0
+        vmovdqa [96+esp],xmm3
+        mov     ebp,ebx
+        xor     esi,edx
+        vmovdqa xmm3,[144+esp]
+        vpaddd  xmm2,xmm2,xmm6
+        shld    ebx,ebx,5
+        add     eax,esi
+        vpxor   xmm7,xmm7,xmm1
+        xor     ebp,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     edi,DWORD [52+esp]
+        vpsrld  xmm1,xmm7,30
+        vmovdqa [32+esp],xmm2
+        and     ebp,ecx
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        mov     esi,eax
+        vpslld  xmm7,xmm7,2
+        xor     ebp,ecx
+        shld    eax,eax,5
+        add     edi,ebp
+        xor     esi,ebx
+        xor     ebx,ecx
+        add     edi,eax
+        add     edx,DWORD [56+esp]
+        and     esi,ebx
+        vpor    xmm7,xmm7,xmm1
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        vmovdqa xmm1,[64+esp]
+        mov     ebp,edi
+        xor     esi,ebx
+        shld    edi,edi,5
+        add     edx,esi
+        xor     ebp,eax
+        xor     eax,ebx
+        add     edx,edi
+        add     ecx,DWORD [60+esp]
+        and     ebp,eax
+        xor     eax,ebx
+        shrd    edi,edi,7
+        mov     esi,edx
+        xor     ebp,eax
+        shld    edx,edx,5
+        add     ecx,ebp
+        xor     esi,edi
+        xor     edi,eax
+        add     ecx,edx
+        vpalignr        xmm2,xmm7,xmm6,8
+        vpxor   xmm0,xmm0,xmm4
+        add     ebx,DWORD [esp]
+        and     esi,edi
+        xor     edi,eax
+        shrd    edx,edx,7
+        vpxor   xmm0,xmm0,xmm1
+        vmovdqa [64+esp],xmm4
+        mov     ebp,ecx
+        xor     esi,edi
+        vmovdqa xmm4,xmm3
+        vpaddd  xmm3,xmm3,xmm7
+        shld    ecx,ecx,5
+        add     ebx,esi
+        vpxor   xmm0,xmm0,xmm2
+        xor     ebp,edx
+        xor     edx,edi
+        add     ebx,ecx
+        add     eax,DWORD [4+esp]
+        vpsrld  xmm2,xmm0,30
+        vmovdqa [48+esp],xmm3
+        and     ebp,edx
+        xor     edx,edi
+        shrd    ecx,ecx,7
+        mov     esi,ebx
+        vpslld  xmm0,xmm0,2
+        xor     ebp,edx
+        shld    ebx,ebx,5
+        add     eax,ebp
+        xor     esi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     edi,DWORD [8+esp]
+        and     esi,ecx
+        vpor    xmm0,xmm0,xmm2
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        vmovdqa xmm2,[80+esp]
+        mov     ebp,eax
+        xor     esi,ecx
+        shld    eax,eax,5
+        add     edi,esi
+        xor     ebp,ebx
+        xor     ebx,ecx
+        add     edi,eax
+        add     edx,DWORD [12+esp]
+        and     ebp,ebx
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        mov     esi,edi
+        xor     ebp,ebx
+        shld    edi,edi,5
+        add     edx,ebp
+        xor     esi,eax
+        xor     eax,ebx
+        add     edx,edi
+        vpalignr        xmm3,xmm0,xmm7,8
+        vpxor   xmm1,xmm1,xmm5
+        add     ecx,DWORD [16+esp]
+        and     esi,eax
+        xor     eax,ebx
+        shrd    edi,edi,7
+        vpxor   xmm1,xmm1,xmm2
+        vmovdqa [80+esp],xmm5
+        mov     ebp,edx
+        xor     esi,eax
+        vmovdqa xmm5,xmm4
+        vpaddd  xmm4,xmm4,xmm0
+        shld    edx,edx,5
+        add     ecx,esi
+        vpxor   xmm1,xmm1,xmm3
+        xor     ebp,edi
+        xor     edi,eax
+        add     ecx,edx
+        add     ebx,DWORD [20+esp]
+        vpsrld  xmm3,xmm1,30
+        vmovdqa [esp],xmm4
+        and     ebp,edi
+        xor     edi,eax
+        shrd    edx,edx,7
+        mov     esi,ecx
+        vpslld  xmm1,xmm1,2
+        xor     ebp,edi
+        shld    ecx,ecx,5
+        add     ebx,ebp
+        xor     esi,edx
+        xor     edx,edi
+        add     ebx,ecx
+        add     eax,DWORD [24+esp]
+        and     esi,edx
+        vpor    xmm1,xmm1,xmm3
+        xor     edx,edi
+        shrd    ecx,ecx,7
+        vmovdqa xmm3,[96+esp]
+        mov     ebp,ebx
+        xor     esi,edx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     ebp,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     edi,DWORD [28+esp]
+        and     ebp,ecx
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        mov     esi,eax
+        xor     ebp,ecx
+        shld    eax,eax,5
+        add     edi,ebp
+        xor     esi,ebx
+        xor     ebx,ecx
+        add     edi,eax
+        vpalignr        xmm4,xmm1,xmm0,8
+        vpxor   xmm2,xmm2,xmm6
+        add     edx,DWORD [32+esp]
+        and     esi,ebx
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        vpxor   xmm2,xmm2,xmm3
+        vmovdqa [96+esp],xmm6
+        mov     ebp,edi
+        xor     esi,ebx
+        vmovdqa xmm6,xmm5
+        vpaddd  xmm5,xmm5,xmm1
+        shld    edi,edi,5
+        add     edx,esi
+        vpxor   xmm2,xmm2,xmm4
+        xor     ebp,eax
+        xor     eax,ebx
+        add     edx,edi
+        add     ecx,DWORD [36+esp]
+        vpsrld  xmm4,xmm2,30
+        vmovdqa [16+esp],xmm5
+        and     ebp,eax
+        xor     eax,ebx
+        shrd    edi,edi,7
+        mov     esi,edx
+        vpslld  xmm2,xmm2,2
+        xor     ebp,eax
+        shld    edx,edx,5
+        add     ecx,ebp
+        xor     esi,edi
+        xor     edi,eax
+        add     ecx,edx
+        add     ebx,DWORD [40+esp]
+        and     esi,edi
+        vpor    xmm2,xmm2,xmm4
+        xor     edi,eax
+        shrd    edx,edx,7
+        vmovdqa xmm4,[64+esp]
+        mov     ebp,ecx
+        xor     esi,edi
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     ebp,edx
+        xor     edx,edi
+        add     ebx,ecx
+        add     eax,DWORD [44+esp]
+        and     ebp,edx
+        xor     edx,edi
+        shrd    ecx,ecx,7
+        mov     esi,ebx
+        xor     ebp,edx
+        shld    ebx,ebx,5
+        add     eax,ebp
+        xor     esi,edx
+        add     eax,ebx
+        vpalignr        xmm5,xmm2,xmm1,8
+        vpxor   xmm3,xmm3,xmm7
+        add     edi,DWORD [48+esp]
+        xor     esi,ecx
+        mov     ebp,eax
+        shld    eax,eax,5
+        vpxor   xmm3,xmm3,xmm4
+        vmovdqa [64+esp],xmm7
+        add     edi,esi
+        xor     ebp,ecx
+        vmovdqa xmm7,xmm6
+        vpaddd  xmm6,xmm6,xmm2
+        shrd    ebx,ebx,7
+        add     edi,eax
+        vpxor   xmm3,xmm3,xmm5
+        add     edx,DWORD [52+esp]
+        xor     ebp,ebx
+        mov     esi,edi
+        shld    edi,edi,5
+        vpsrld  xmm5,xmm3,30
+        vmovdqa [32+esp],xmm6
+        add     edx,ebp
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,edi
+        vpslld  xmm3,xmm3,2
+        add     ecx,DWORD [56+esp]
+        xor     esi,eax
+        mov     ebp,edx
+        shld    edx,edx,5
+        add     ecx,esi
+        xor     ebp,eax
+        shrd    edi,edi,7
+        add     ecx,edx
+        vpor    xmm3,xmm3,xmm5
+        add     ebx,DWORD [60+esp]
+        xor     ebp,edi
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,ebp
+        xor     esi,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD [esp]
+        vpaddd  xmm7,xmm7,xmm3
+        xor     esi,edx
+        mov     ebp,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        vmovdqa [48+esp],xmm7
+        xor     ebp,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     edi,DWORD [4+esp]
+        xor     ebp,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     edi,ebp
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     edi,eax
+        add     edx,DWORD [8+esp]
+        xor     esi,ebx
+        mov     ebp,edi
+        shld    edi,edi,5
+        add     edx,esi
+        xor     ebp,ebx
+        shrd    eax,eax,7
+        add     edx,edi
+        add     ecx,DWORD [12+esp]
+        xor     ebp,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,ebp
+        xor     esi,eax
+        shrd    edi,edi,7
+        add     ecx,edx
+        mov     ebp,DWORD [196+esp]
+        cmp     ebp,DWORD [200+esp]
+        je      NEAR L$010done
+        vmovdqa xmm7,[160+esp]
+        vmovdqa xmm6,[176+esp]
+        vmovdqu xmm0,[ebp]
+        vmovdqu xmm1,[16+ebp]
+        vmovdqu xmm2,[32+ebp]
+        vmovdqu xmm3,[48+ebp]
+        add     ebp,64
+        vpshufb xmm0,xmm0,xmm6
+        mov     DWORD [196+esp],ebp
+        vmovdqa [96+esp],xmm7
+        add     ebx,DWORD [16+esp]
+        xor     esi,edi
+        vpshufb xmm1,xmm1,xmm6
+        mov     ebp,ecx
+        shld    ecx,ecx,5
+        vpaddd  xmm4,xmm0,xmm7
+        add     ebx,esi
+        xor     ebp,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vmovdqa [esp],xmm4
+        add     eax,DWORD [20+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,ebp
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     edi,DWORD [24+esp]
+        xor     esi,ecx
+        mov     ebp,eax
+        shld    eax,eax,5
+        add     edi,esi
+        xor     ebp,ecx
+        shrd    ebx,ebx,7
+        add     edi,eax
+        add     edx,DWORD [28+esp]
+        xor     ebp,ebx
+        mov     esi,edi
+        shld    edi,edi,5
+        add     edx,ebp
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,edi
+        add     ecx,DWORD [32+esp]
+        xor     esi,eax
+        vpshufb xmm2,xmm2,xmm6
+        mov     ebp,edx
+        shld    edx,edx,5
+        vpaddd  xmm5,xmm1,xmm7
+        add     ecx,esi
+        xor     ebp,eax
+        shrd    edi,edi,7
+        add     ecx,edx
+        vmovdqa [16+esp],xmm5
+        add     ebx,DWORD [36+esp]
+        xor     ebp,edi
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,ebp
+        xor     esi,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD [40+esp]
+        xor     esi,edx
+        mov     ebp,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     ebp,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     edi,DWORD [44+esp]
+        xor     ebp,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     edi,ebp
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     edi,eax
+        add     edx,DWORD [48+esp]
+        xor     esi,ebx
+        vpshufb xmm3,xmm3,xmm6
+        mov     ebp,edi
+        shld    edi,edi,5
+        vpaddd  xmm6,xmm2,xmm7
+        add     edx,esi
+        xor     ebp,ebx
+        shrd    eax,eax,7
+        add     edx,edi
+        vmovdqa [32+esp],xmm6
+        add     ecx,DWORD [52+esp]
+        xor     ebp,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,ebp
+        xor     esi,eax
+        shrd    edi,edi,7
+        add     ecx,edx
+        add     ebx,DWORD [56+esp]
+        xor     esi,edi
+        mov     ebp,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     ebp,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD [60+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,ebp
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        mov     ebp,DWORD [192+esp]
+        add     eax,DWORD [ebp]
+        add     esi,DWORD [4+ebp]
+        add     ecx,DWORD [8+ebp]
+        mov     DWORD [ebp],eax
+        add     edx,DWORD [12+ebp]
+        mov     DWORD [4+ebp],esi
+        add     edi,DWORD [16+ebp]
+        mov     ebx,ecx
+        mov     DWORD [8+ebp],ecx
+        xor     ebx,edx
+        mov     DWORD [12+ebp],edx
+        mov     DWORD [16+ebp],edi
+        mov     ebp,esi
+        and     esi,ebx
+        mov     ebx,ebp
+        jmp     NEAR L$009loop
+align   16
+L$010done:
+        add     ebx,DWORD [16+esp]
+        xor     esi,edi
+        mov     ebp,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     ebp,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD [20+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,ebp
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     edi,DWORD [24+esp]
+        xor     esi,ecx
+        mov     ebp,eax
+        shld    eax,eax,5
+        add     edi,esi
+        xor     ebp,ecx
+        shrd    ebx,ebx,7
+        add     edi,eax
+        add     edx,DWORD [28+esp]
+        xor     ebp,ebx
+        mov     esi,edi
+        shld    edi,edi,5
+        add     edx,ebp
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,edi
+        add     ecx,DWORD [32+esp]
+        xor     esi,eax
+        mov     ebp,edx
+        shld    edx,edx,5
+        add     ecx,esi
+        xor     ebp,eax
+        shrd    edi,edi,7
+        add     ecx,edx
+        add     ebx,DWORD [36+esp]
+        xor     ebp,edi
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,ebp
+        xor     esi,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD [40+esp]
+        xor     esi,edx
+        mov     ebp,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     ebp,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     edi,DWORD [44+esp]
+        xor     ebp,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     edi,ebp
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     edi,eax
+        add     edx,DWORD [48+esp]
+        xor     esi,ebx
+        mov     ebp,edi
+        shld    edi,edi,5
+        add     edx,esi
+        xor     ebp,ebx
+        shrd    eax,eax,7
+        add     edx,edi
+        add     ecx,DWORD [52+esp]
+        xor     ebp,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,ebp
+        xor     esi,eax
+        shrd    edi,edi,7
+        add     ecx,edx
+        add     ebx,DWORD [56+esp]
+        xor     esi,edi
+        mov     ebp,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     ebp,edi
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD [60+esp]
+        xor     ebp,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,ebp
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vzeroall
+        mov     ebp,DWORD [192+esp]
+        add     eax,DWORD [ebp]
+        mov     esp,DWORD [204+esp]
+        add     esi,DWORD [4+ebp]
+        add     ecx,DWORD [8+ebp]
+        mov     DWORD [ebp],eax
+        add     edx,DWORD [12+ebp]
+        mov     DWORD [4+ebp],esi
+        add     edi,DWORD [16+ebp]
+        mov     DWORD [8+ebp],ecx
+        mov     DWORD [12+ebp],edx
+        mov     DWORD [16+ebp],edi
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   64
+L$K_XX_XX:
+dd      1518500249,1518500249,1518500249,1518500249
+dd      1859775393,1859775393,1859775393,1859775393
+dd      2400959708,2400959708,2400959708,2400959708
+dd      3395469782,3395469782,3395469782,3395469782
+dd      66051,67438087,134810123,202182159
+db      15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+db      83,72,65,49,32,98,108,111,99,107,32,116,114,97,110,115
+db      102,111,114,109,32,102,111,114,32,120,56,54,44,32,67,82
+db      89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112
+db      114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
+segment .bss
+common  _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm
new file mode 100644
index 0000000000..0540b0eac7
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm
@@ -0,0 +1,6796 @@
+; Copyright 2007-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+;extern _OPENSSL_ia32cap_P
+global  _sha256_block_data_order
+align   16
+_sha256_block_data_order:
+L$_sha256_block_data_order_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     ebx,esp
+        call    L$000pic_point
+L$000pic_point:
+        pop     ebp
+        lea     ebp,[(L$001K256-L$000pic_point)+ebp]
+        sub     esp,16
+        and     esp,-64
+        shl     eax,6
+        add     eax,edi
+        mov     DWORD [esp],esi
+        mov     DWORD [4+esp],edi
+        mov     DWORD [8+esp],eax
+        mov     DWORD [12+esp],ebx
+        lea     edx,[_OPENSSL_ia32cap_P]
+        mov     ecx,DWORD [edx]
+        mov     ebx,DWORD [4+edx]
+        test    ecx,1048576
+        jnz     NEAR L$002loop
+        mov     edx,DWORD [8+edx]
+        test    ecx,16777216
+        jz      NEAR L$003no_xmm
+        and     ecx,1073741824
+        and     ebx,268435968
+        test    edx,536870912
+        jnz     NEAR L$004shaext
+        or      ecx,ebx
+        and     ecx,1342177280
+        cmp     ecx,1342177280
+        je      NEAR L$005AVX
+        test    ebx,512
+        jnz     NEAR L$006SSSE3
+L$003no_xmm:
+        sub     eax,edi
+        cmp     eax,256
+        jae     NEAR L$007unrolled
+        jmp     NEAR L$002loop
+align   16
+L$002loop:
+        mov     eax,DWORD [edi]
+        mov     ebx,DWORD [4+edi]
+        mov     ecx,DWORD [8+edi]
+        bswap   eax
+        mov     edx,DWORD [12+edi]
+        bswap   ebx
+        push    eax
+        bswap   ecx
+        push    ebx
+        bswap   edx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [16+edi]
+        mov     ebx,DWORD [20+edi]
+        mov     ecx,DWORD [24+edi]
+        bswap   eax
+        mov     edx,DWORD [28+edi]
+        bswap   ebx
+        push    eax
+        bswap   ecx
+        push    ebx
+        bswap   edx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [32+edi]
+        mov     ebx,DWORD [36+edi]
+        mov     ecx,DWORD [40+edi]
+        bswap   eax
+        mov     edx,DWORD [44+edi]
+        bswap   ebx
+        push    eax
+        bswap   ecx
+        push    ebx
+        bswap   edx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [48+edi]
+        mov     ebx,DWORD [52+edi]
+        mov     ecx,DWORD [56+edi]
+        bswap   eax
+        mov     edx,DWORD [60+edi]
+        bswap   ebx
+        push    eax
+        bswap   ecx
+        push    ebx
+        bswap   edx
+        push    ecx
+        push    edx
+        add     edi,64
+        lea     esp,[esp-36]
+        mov     DWORD [104+esp],edi
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     ecx,DWORD [8+esi]
+        mov     edi,DWORD [12+esi]
+        mov     DWORD [8+esp],ebx
+        xor     ebx,ecx
+        mov     DWORD [12+esp],ecx
+        mov     DWORD [16+esp],edi
+        mov     DWORD [esp],ebx
+        mov     edx,DWORD [16+esi]
+        mov     ebx,DWORD [20+esi]
+        mov     ecx,DWORD [24+esi]
+        mov     edi,DWORD [28+esi]
+        mov     DWORD [24+esp],ebx
+        mov     DWORD [28+esp],ecx
+        mov     DWORD [32+esp],edi
+align   16
+L$00800_15:
+        mov     ecx,edx
+        mov     esi,DWORD [24+esp]
+        ror     ecx,14
+        mov     edi,DWORD [28+esp]
+        xor     ecx,edx
+        xor     esi,edi
+        mov     ebx,DWORD [96+esp]
+        ror     ecx,5
+        and     esi,edx
+        mov     DWORD [20+esp],edx
+        xor     edx,ecx
+        add     ebx,DWORD [32+esp]
+        xor     esi,edi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,esi
+        ror     ecx,9
+        add     ebx,edx
+        mov     edi,DWORD [8+esp]
+        xor     ecx,eax
+        mov     DWORD [4+esp],eax
+        lea     esp,[esp-4]
+        ror     ecx,11
+        mov     esi,DWORD [ebp]
+        xor     ecx,eax
+        mov     edx,DWORD [20+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     ebx,esi
+        mov     DWORD [esp],eax
+        add     edx,ebx
+        and     eax,DWORD [4+esp]
+        add     ebx,ecx
+        xor     eax,edi
+        add     ebp,4
+        add     eax,ebx
+        cmp     esi,3248222580
+        jne     NEAR L$00800_15
+        mov     ecx,DWORD [156+esp]
+        jmp     NEAR L$00916_63
+align   16
+L$00916_63:
+        mov     ebx,ecx
+        mov     esi,DWORD [104+esp]
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [160+esp]
+        shr     edi,10
+        add     ebx,DWORD [124+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [24+esp]
+        ror     ecx,14
+        add     ebx,edi
+        mov     edi,DWORD [28+esp]
+        xor     ecx,edx
+        xor     esi,edi
+        mov     DWORD [96+esp],ebx
+        ror     ecx,5
+        and     esi,edx
+        mov     DWORD [20+esp],edx
+        xor     edx,ecx
+        add     ebx,DWORD [32+esp]
+        xor     esi,edi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,esi
+        ror     ecx,9
+        add     ebx,edx
+        mov     edi,DWORD [8+esp]
+        xor     ecx,eax
+        mov     DWORD [4+esp],eax
+        lea     esp,[esp-4]
+        ror     ecx,11
+        mov     esi,DWORD [ebp]
+        xor     ecx,eax
+        mov     edx,DWORD [20+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     ebx,esi
+        mov     DWORD [esp],eax
+        add     edx,ebx
+        and     eax,DWORD [4+esp]
+        add     ebx,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [156+esp]
+        add     ebp,4
+        add     eax,ebx
+        cmp     esi,3329325298
+        jne     NEAR L$00916_63
+        mov     esi,DWORD [356+esp]
+        mov     ebx,DWORD [8+esp]
+        mov     ecx,DWORD [16+esp]
+        add     eax,DWORD [esi]
+        add     ebx,DWORD [4+esi]
+        add     edi,DWORD [8+esi]
+        add     ecx,DWORD [12+esi]
+        mov     DWORD [esi],eax
+        mov     DWORD [4+esi],ebx
+        mov     DWORD [8+esi],edi
+        mov     DWORD [12+esi],ecx
+        mov     eax,DWORD [24+esp]
+        mov     ebx,DWORD [28+esp]
+        mov     ecx,DWORD [32+esp]
+        mov     edi,DWORD [360+esp]
+        add     edx,DWORD [16+esi]
+        add     eax,DWORD [20+esi]
+        add     ebx,DWORD [24+esi]
+        add     ecx,DWORD [28+esi]
+        mov     DWORD [16+esi],edx
+        mov     DWORD [20+esi],eax
+        mov     DWORD [24+esi],ebx
+        mov     DWORD [28+esi],ecx
+        lea     esp,[356+esp]
+        sub     ebp,256
+        cmp     edi,DWORD [8+esp]
+        jb      NEAR L$002loop
+        mov     esp,DWORD [12+esp]
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   64
+L$001K256:
+dd      1116352408,1899447441,3049323471,3921009573,961987163,1508970993,2453635748,2870763221,3624381080,310598401,607225278,1426881987,1925078388,2162078206,2614888103,3248222580,3835390401,4022224774,264347078,604807628,770255983,1249150122,1555081692,1996064986,2554220882,2821834349,2952996808,3210313671,3336571891,3584528711,113926993,338241895,666307205,773529912,1294757372,1396182291,1695183700,1986661051,2177026350,2456956037,2730485921,2820302411,3259730800,3345764771,3516065817,3600352804,4094571909,275423344,430227734,506948616,659060556,883997877,958139571,1322822218,1537002063,1747873779,1955562222,2024104815,2227730452,2361852424,2428436474,2756734187,3204031479,3329325298
+dd      66051,67438087,134810123,202182159
+db      83,72,65,50,53,54,32,98,108,111,99,107,32,116,114,97
+db      110,115,102,111,114,109,32,102,111,114,32,120,56,54,44,32
+db      67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97
+db      112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103
+db      62,0
+align   16
+L$007unrolled:
+        lea     esp,[esp-96]
+        mov     eax,DWORD [esi]
+        mov     ebp,DWORD [4+esi]
+        mov     ecx,DWORD [8+esi]
+        mov     ebx,DWORD [12+esi]
+        mov     DWORD [4+esp],ebp
+        xor     ebp,ecx
+        mov     DWORD [8+esp],ecx
+        mov     DWORD [12+esp],ebx
+        mov     edx,DWORD [16+esi]
+        mov     ebx,DWORD [20+esi]
+        mov     ecx,DWORD [24+esi]
+        mov     esi,DWORD [28+esi]
+        mov     DWORD [20+esp],ebx
+        mov     DWORD [24+esp],ecx
+        mov     DWORD [28+esp],esi
+        jmp     NEAR L$010grand_loop
+align   16
+L$010grand_loop:
+        mov     ebx,DWORD [edi]
+        mov     ecx,DWORD [4+edi]
+        bswap   ebx
+        mov     esi,DWORD [8+edi]
+        bswap   ecx
+        mov     DWORD [32+esp],ebx
+        bswap   esi
+        mov     DWORD [36+esp],ecx
+        mov     DWORD [40+esp],esi
+        mov     ebx,DWORD [12+edi]
+        mov     ecx,DWORD [16+edi]
+        bswap   ebx
+        mov     esi,DWORD [20+edi]
+        bswap   ecx
+        mov     DWORD [44+esp],ebx
+        bswap   esi
+        mov     DWORD [48+esp],ecx
+        mov     DWORD [52+esp],esi
+        mov     ebx,DWORD [24+edi]
+        mov     ecx,DWORD [28+edi]
+        bswap   ebx
+        mov     esi,DWORD [32+edi]
+        bswap   ecx
+        mov     DWORD [56+esp],ebx
+        bswap   esi
+        mov     DWORD [60+esp],ecx
+        mov     DWORD [64+esp],esi
+        mov     ebx,DWORD [36+edi]
+        mov     ecx,DWORD [40+edi]
+        bswap   ebx
+        mov     esi,DWORD [44+edi]
+        bswap   ecx
+        mov     DWORD [68+esp],ebx
+        bswap   esi
+        mov     DWORD [72+esp],ecx
+        mov     DWORD [76+esp],esi
+        mov     ebx,DWORD [48+edi]
+        mov     ecx,DWORD [52+edi]
+        bswap   ebx
+        mov     esi,DWORD [56+edi]
+        bswap   ecx
+        mov     DWORD [80+esp],ebx
+        bswap   esi
+        mov     DWORD [84+esp],ecx
+        mov     DWORD [88+esp],esi
+        mov     ebx,DWORD [60+edi]
+        add     edi,64
+        bswap   ebx
+        mov     DWORD [100+esp],edi
+        mov     DWORD [92+esp],ebx
+        mov     ecx,edx
+        mov     esi,DWORD [20+esp]
+        ror     edx,14
+        mov     edi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     ebx,DWORD [32+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [28+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [4+esp]
+        xor     ecx,eax
+        mov     DWORD [esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[1116352408+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [12+esp]
+        add     ebp,ecx
+        mov     esi,edx
+        mov     ecx,DWORD [16+esp]
+        ror     edx,14
+        mov     edi,DWORD [20+esp]
+        xor     edx,esi
+        mov     ebx,DWORD [36+esp]
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [12+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [24+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [esp]
+        xor     esi,ebp
+        mov     DWORD [28+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[1899447441+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,esi
+        mov     ecx,edx
+        mov     esi,DWORD [12+esp]
+        ror     edx,14
+        mov     edi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     ebx,DWORD [40+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [20+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [28+esp]
+        xor     ecx,eax
+        mov     DWORD [24+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[3049323471+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [4+esp]
+        add     ebp,ecx
+        mov     esi,edx
+        mov     ecx,DWORD [8+esp]
+        ror     edx,14
+        mov     edi,DWORD [12+esp]
+        xor     edx,esi
+        mov     ebx,DWORD [44+esp]
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [4+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [16+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [24+esp]
+        xor     esi,ebp
+        mov     DWORD [20+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[3921009573+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,esi
+        mov     ecx,edx
+        mov     esi,DWORD [4+esp]
+        ror     edx,14
+        mov     edi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     ebx,DWORD [48+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [12+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [20+esp]
+        xor     ecx,eax
+        mov     DWORD [16+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[961987163+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [28+esp]
+        add     ebp,ecx
+        mov     esi,edx
+        mov     ecx,DWORD [esp]
+        ror     edx,14
+        mov     edi,DWORD [4+esp]
+        xor     edx,esi
+        mov     ebx,DWORD [52+esp]
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [28+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [8+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [16+esp]
+        xor     esi,ebp
+        mov     DWORD [12+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[1508970993+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,esi
+        mov     ecx,edx
+        mov     esi,DWORD [28+esp]
+        ror     edx,14
+        mov     edi,DWORD [esp]
+        xor     edx,ecx
+        mov     ebx,DWORD [56+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [4+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [12+esp]
+        xor     ecx,eax
+        mov     DWORD [8+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[2453635748+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [20+esp]
+        add     ebp,ecx
+        mov     esi,edx
+        mov     ecx,DWORD [24+esp]
+        ror     edx,14
+        mov     edi,DWORD [28+esp]
+        xor     edx,esi
+        mov     ebx,DWORD [60+esp]
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [20+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [8+esp]
+        xor     esi,ebp
+        mov     DWORD [4+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[2870763221+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,esi
+        mov     ecx,edx
+        mov     esi,DWORD [20+esp]
+        ror     edx,14
+        mov     edi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     ebx,DWORD [64+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [28+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [4+esp]
+        xor     ecx,eax
+        mov     DWORD [esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[3624381080+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [12+esp]
+        add     ebp,ecx
+        mov     esi,edx
+        mov     ecx,DWORD [16+esp]
+        ror     edx,14
+        mov     edi,DWORD [20+esp]
+        xor     edx,esi
+        mov     ebx,DWORD [68+esp]
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [12+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [24+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [esp]
+        xor     esi,ebp
+        mov     DWORD [28+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[310598401+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,esi
+        mov     ecx,edx
+        mov     esi,DWORD [12+esp]
+        ror     edx,14
+        mov     edi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     ebx,DWORD [72+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [20+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [28+esp]
+        xor     ecx,eax
+        mov     DWORD [24+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[607225278+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [4+esp]
+        add     ebp,ecx
+        mov     esi,edx
+        mov     ecx,DWORD [8+esp]
+        ror     edx,14
+        mov     edi,DWORD [12+esp]
+        xor     edx,esi
+        mov     ebx,DWORD [76+esp]
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [4+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [16+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [24+esp]
+        xor     esi,ebp
+        mov     DWORD [20+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[1426881987+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,esi
+        mov     ecx,edx
+        mov     esi,DWORD [4+esp]
+        ror     edx,14
+        mov     edi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     ebx,DWORD [80+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [12+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [20+esp]
+        xor     ecx,eax
+        mov     DWORD [16+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[1925078388+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [28+esp]
+        add     ebp,ecx
+        mov     esi,edx
+        mov     ecx,DWORD [esp]
+        ror     edx,14
+        mov     edi,DWORD [4+esp]
+        xor     edx,esi
+        mov     ebx,DWORD [84+esp]
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [28+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [8+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [16+esp]
+        xor     esi,ebp
+        mov     DWORD [12+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[2162078206+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,esi
+        mov     ecx,edx
+        mov     esi,DWORD [28+esp]
+        ror     edx,14
+        mov     edi,DWORD [esp]
+        xor     edx,ecx
+        mov     ebx,DWORD [88+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [4+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [12+esp]
+        xor     ecx,eax
+        mov     DWORD [8+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[2614888103+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [20+esp]
+        add     ebp,ecx
+        mov     esi,edx
+        mov     ecx,DWORD [24+esp]
+        ror     edx,14
+        mov     edi,DWORD [28+esp]
+        xor     edx,esi
+        mov     ebx,DWORD [92+esp]
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [20+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [8+esp]
+        xor     esi,ebp
+        mov     DWORD [4+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[3248222580+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [36+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,esi
+        mov     esi,DWORD [88+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [32+esp]
+        shr     edi,10
+        add     ebx,DWORD [68+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [20+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     DWORD [32+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [28+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [4+esp]
+        xor     ecx,eax
+        mov     DWORD [esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[3835390401+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [40+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [12+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [92+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [36+esp]
+        shr     edi,10
+        add     ebx,DWORD [72+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [16+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [20+esp]
+        xor     edx,esi
+        mov     DWORD [36+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [12+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [24+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [esp]
+        xor     esi,ebp
+        mov     DWORD [28+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[4022224774+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [44+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,esi
+        mov     esi,DWORD [32+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [40+esp]
+        shr     edi,10
+        add     ebx,DWORD [76+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [12+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     DWORD [40+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [20+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [28+esp]
+        xor     ecx,eax
+        mov     DWORD [24+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[264347078+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [48+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [4+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [36+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [44+esp]
+        shr     edi,10
+        add     ebx,DWORD [80+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [8+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [12+esp]
+        xor     edx,esi
+        mov     DWORD [44+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [4+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [16+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [24+esp]
+        xor     esi,ebp
+        mov     DWORD [20+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[604807628+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [52+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,esi
+        mov     esi,DWORD [40+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [48+esp]
+        shr     edi,10
+        add     ebx,DWORD [84+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [4+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     DWORD [48+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [12+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [20+esp]
+        xor     ecx,eax
+        mov     DWORD [16+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[770255983+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [56+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [28+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [44+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [52+esp]
+        shr     edi,10
+        add     ebx,DWORD [88+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [4+esp]
+        xor     edx,esi
+        mov     DWORD [52+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [28+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [8+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [16+esp]
+        xor     esi,ebp
+        mov     DWORD [12+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[1249150122+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [60+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,esi
+        mov     esi,DWORD [48+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [56+esp]
+        shr     edi,10
+        add     ebx,DWORD [92+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [28+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [esp]
+        xor     edx,ecx
+        mov     DWORD [56+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [4+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [12+esp]
+        xor     ecx,eax
+        mov     DWORD [8+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[1555081692+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [64+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [20+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [52+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [60+esp]
+        shr     edi,10
+        add     ebx,DWORD [32+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [24+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [28+esp]
+        xor     edx,esi
+        mov     DWORD [60+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [20+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [8+esp]
+        xor     esi,ebp
+        mov     DWORD [4+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[1996064986+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [68+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,esi
+        mov     esi,DWORD [56+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [64+esp]
+        shr     edi,10
+        add     ebx,DWORD [36+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [20+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     DWORD [64+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [28+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [4+esp]
+        xor     ecx,eax
+        mov     DWORD [esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[2554220882+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [72+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [12+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [60+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [68+esp]
+        shr     edi,10
+        add     ebx,DWORD [40+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [16+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [20+esp]
+        xor     edx,esi
+        mov     DWORD [68+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [12+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [24+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [esp]
+        xor     esi,ebp
+        mov     DWORD [28+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[2821834349+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [76+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,esi
+        mov     esi,DWORD [64+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [72+esp]
+        shr     edi,10
+        add     ebx,DWORD [44+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [12+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     DWORD [72+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [20+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [28+esp]
+        xor     ecx,eax
+        mov     DWORD [24+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[2952996808+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [80+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [4+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [68+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [76+esp]
+        shr     edi,10
+        add     ebx,DWORD [48+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [8+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [12+esp]
+        xor     edx,esi
+        mov     DWORD [76+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [4+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [16+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [24+esp]
+        xor     esi,ebp
+        mov     DWORD [20+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[3210313671+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [84+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,esi
+        mov     esi,DWORD [72+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [80+esp]
+        shr     edi,10
+        add     ebx,DWORD [52+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [4+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     DWORD [80+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [12+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [20+esp]
+        xor     ecx,eax
+        mov     DWORD [16+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[3336571891+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [88+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [28+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [76+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [84+esp]
+        shr     edi,10
+        add     ebx,DWORD [56+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [4+esp]
+        xor     edx,esi
+        mov     DWORD [84+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [28+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [8+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [16+esp]
+        xor     esi,ebp
+        mov     DWORD [12+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[3584528711+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [92+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,esi
+        mov     esi,DWORD [80+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [88+esp]
+        shr     edi,10
+        add     ebx,DWORD [60+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [28+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [esp]
+        xor     edx,ecx
+        mov     DWORD [88+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [4+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [12+esp]
+        xor     ecx,eax
+        mov     DWORD [8+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[113926993+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [32+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [20+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [84+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [92+esp]
+        shr     edi,10
+        add     ebx,DWORD [64+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [24+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [28+esp]
+        xor     edx,esi
+        mov     DWORD [92+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [20+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [8+esp]
+        xor     esi,ebp
+        mov     DWORD [4+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[338241895+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [36+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,esi
+        mov     esi,DWORD [88+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [32+esp]
+        shr     edi,10
+        add     ebx,DWORD [68+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [20+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     DWORD [32+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [28+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [4+esp]
+        xor     ecx,eax
+        mov     DWORD [esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[666307205+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [40+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [12+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [92+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [36+esp]
+        shr     edi,10
+        add     ebx,DWORD [72+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [16+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [20+esp]
+        xor     edx,esi
+        mov     DWORD [36+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [12+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [24+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [esp]
+        xor     esi,ebp
+        mov     DWORD [28+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[773529912+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [44+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,esi
+        mov     esi,DWORD [32+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [40+esp]
+        shr     edi,10
+        add     ebx,DWORD [76+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [12+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     DWORD [40+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [20+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [28+esp]
+        xor     ecx,eax
+        mov     DWORD [24+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[1294757372+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [48+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [4+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [36+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [44+esp]
+        shr     edi,10
+        add     ebx,DWORD [80+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [8+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [12+esp]
+        xor     edx,esi
+        mov     DWORD [44+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [4+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [16+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [24+esp]
+        xor     esi,ebp
+        mov     DWORD [20+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[1396182291+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [52+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,esi
+        mov     esi,DWORD [40+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [48+esp]
+        shr     edi,10
+        add     ebx,DWORD [84+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [4+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     DWORD [48+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [12+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [20+esp]
+        xor     ecx,eax
+        mov     DWORD [16+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[1695183700+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [56+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [28+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [44+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [52+esp]
+        shr     edi,10
+        add     ebx,DWORD [88+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [4+esp]
+        xor     edx,esi
+        mov     DWORD [52+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [28+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [8+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [16+esp]
+        xor     esi,ebp
+        mov     DWORD [12+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[1986661051+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [60+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,esi
+        mov     esi,DWORD [48+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [56+esp]
+        shr     edi,10
+        add     ebx,DWORD [92+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [28+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [esp]
+        xor     edx,ecx
+        mov     DWORD [56+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [4+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [12+esp]
+        xor     ecx,eax
+        mov     DWORD [8+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[2177026350+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [64+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [20+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [52+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [60+esp]
+        shr     edi,10
+        add     ebx,DWORD [32+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [24+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [28+esp]
+        xor     edx,esi
+        mov     DWORD [60+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [20+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [8+esp]
+        xor     esi,ebp
+        mov     DWORD [4+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[2456956037+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [68+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,esi
+        mov     esi,DWORD [56+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [64+esp]
+        shr     edi,10
+        add     ebx,DWORD [36+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [20+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     DWORD [64+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [28+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [4+esp]
+        xor     ecx,eax
+        mov     DWORD [esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[2730485921+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [72+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [12+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [60+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [68+esp]
+        shr     edi,10
+        add     ebx,DWORD [40+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [16+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [20+esp]
+        xor     edx,esi
+        mov     DWORD [68+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [12+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [24+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [esp]
+        xor     esi,ebp
+        mov     DWORD [28+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[2820302411+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [76+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,esi
+        mov     esi,DWORD [64+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [72+esp]
+        shr     edi,10
+        add     ebx,DWORD [44+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [12+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     DWORD [72+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [20+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [28+esp]
+        xor     ecx,eax
+        mov     DWORD [24+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[3259730800+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [80+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [4+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [68+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [76+esp]
+        shr     edi,10
+        add     ebx,DWORD [48+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [8+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [12+esp]
+        xor     edx,esi
+        mov     DWORD [76+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [4+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [16+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [24+esp]
+        xor     esi,ebp
+        mov     DWORD [20+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[3345764771+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [84+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,esi
+        mov     esi,DWORD [72+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [80+esp]
+        shr     edi,10
+        add     ebx,DWORD [52+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [4+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     DWORD [80+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [12+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [20+esp]
+        xor     ecx,eax
+        mov     DWORD [16+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[3516065817+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [88+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [28+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [76+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [84+esp]
+        shr     edi,10
+        add     ebx,DWORD [56+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [4+esp]
+        xor     edx,esi
+        mov     DWORD [84+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [28+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [8+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [16+esp]
+        xor     esi,ebp
+        mov     DWORD [12+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[3600352804+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [92+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,esi
+        mov     esi,DWORD [80+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [88+esp]
+        shr     edi,10
+        add     ebx,DWORD [60+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [28+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [esp]
+        xor     edx,ecx
+        mov     DWORD [88+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [4+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [12+esp]
+        xor     ecx,eax
+        mov     DWORD [8+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[4094571909+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [32+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [20+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [84+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [92+esp]
+        shr     edi,10
+        add     ebx,DWORD [64+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [24+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [28+esp]
+        xor     edx,esi
+        mov     DWORD [92+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [20+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [8+esp]
+        xor     esi,ebp
+        mov     DWORD [4+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[275423344+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [36+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,esi
+        mov     esi,DWORD [88+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [32+esp]
+        shr     edi,10
+        add     ebx,DWORD [68+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [20+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     DWORD [32+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [28+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [4+esp]
+        xor     ecx,eax
+        mov     DWORD [esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[430227734+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [40+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [12+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [92+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [36+esp]
+        shr     edi,10
+        add     ebx,DWORD [72+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [16+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [20+esp]
+        xor     edx,esi
+        mov     DWORD [36+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [12+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [24+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [esp]
+        xor     esi,ebp
+        mov     DWORD [28+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[506948616+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [44+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,esi
+        mov     esi,DWORD [32+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [40+esp]
+        shr     edi,10
+        add     ebx,DWORD [76+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [12+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     DWORD [40+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [20+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [28+esp]
+        xor     ecx,eax
+        mov     DWORD [24+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[659060556+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [48+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [4+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [36+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [44+esp]
+        shr     edi,10
+        add     ebx,DWORD [80+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [8+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [12+esp]
+        xor     edx,esi
+        mov     DWORD [44+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [4+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [16+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [24+esp]
+        xor     esi,ebp
+        mov     DWORD [20+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[883997877+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [52+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,esi
+        mov     esi,DWORD [40+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [48+esp]
+        shr     edi,10
+        add     ebx,DWORD [84+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [4+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     DWORD [48+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [12+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [20+esp]
+        xor     ecx,eax
+        mov     DWORD [16+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[958139571+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [56+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [28+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [44+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [52+esp]
+        shr     edi,10
+        add     ebx,DWORD [88+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [4+esp]
+        xor     edx,esi
+        mov     DWORD [52+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [28+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [8+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [16+esp]
+        xor     esi,ebp
+        mov     DWORD [12+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[1322822218+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [60+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,esi
+        mov     esi,DWORD [48+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [56+esp]
+        shr     edi,10
+        add     ebx,DWORD [92+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [28+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [esp]
+        xor     edx,ecx
+        mov     DWORD [56+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [4+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [12+esp]
+        xor     ecx,eax
+        mov     DWORD [8+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[1537002063+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [64+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [20+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [52+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [60+esp]
+        shr     edi,10
+        add     ebx,DWORD [32+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [24+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [28+esp]
+        xor     edx,esi
+        mov     DWORD [60+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [20+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [8+esp]
+        xor     esi,ebp
+        mov     DWORD [4+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[1747873779+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [68+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,esi
+        mov     esi,DWORD [56+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [64+esp]
+        shr     edi,10
+        add     ebx,DWORD [36+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [20+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     DWORD [64+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [28+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [4+esp]
+        xor     ecx,eax
+        mov     DWORD [esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[1955562222+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [72+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [12+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [60+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [68+esp]
+        shr     edi,10
+        add     ebx,DWORD [40+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [16+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [20+esp]
+        xor     edx,esi
+        mov     DWORD [68+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [12+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [24+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [esp]
+        xor     esi,ebp
+        mov     DWORD [28+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[2024104815+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [76+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,esi
+        mov     esi,DWORD [64+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [72+esp]
+        shr     edi,10
+        add     ebx,DWORD [44+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [12+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     DWORD [72+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [20+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [28+esp]
+        xor     ecx,eax
+        mov     DWORD [24+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[2227730452+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [80+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [4+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [68+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [76+esp]
+        shr     edi,10
+        add     ebx,DWORD [48+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [8+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [12+esp]
+        xor     edx,esi
+        mov     DWORD [76+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [4+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [16+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [24+esp]
+        xor     esi,ebp
+        mov     DWORD [20+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[2361852424+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [84+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,esi
+        mov     esi,DWORD [72+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [80+esp]
+        shr     edi,10
+        add     ebx,DWORD [52+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [4+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     DWORD [80+esp],ebx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [12+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [20+esp]
+        xor     ecx,eax
+        mov     DWORD [16+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[2428436474+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [88+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [28+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [76+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [84+esp]
+        shr     edi,10
+        add     ebx,DWORD [56+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [4+esp]
+        xor     edx,esi
+        mov     DWORD [84+esp],ebx
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [28+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [8+esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [16+esp]
+        xor     esi,ebp
+        mov     DWORD [12+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[2756734187+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        mov     ecx,DWORD [92+esp]
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,esi
+        mov     esi,DWORD [80+esp]
+        mov     ebx,ecx
+        ror     ecx,11
+        mov     edi,esi
+        ror     esi,2
+        xor     ecx,ebx
+        shr     ebx,3
+        ror     ecx,7
+        xor     esi,edi
+        xor     ebx,ecx
+        ror     esi,17
+        add     ebx,DWORD [88+esp]
+        shr     edi,10
+        add     ebx,DWORD [60+esp]
+        mov     ecx,edx
+        xor     edi,esi
+        mov     esi,DWORD [28+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [esp]
+        xor     edx,ecx
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        add     ebx,DWORD [4+esp]
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     ebx,edi
+        ror     ecx,9
+        mov     esi,eax
+        mov     edi,DWORD [12+esp]
+        xor     ecx,eax
+        mov     DWORD [8+esp],eax
+        xor     eax,edi
+        ror     ecx,11
+        and     ebp,eax
+        lea     edx,[3204031479+edx*1+ebx]
+        xor     ecx,esi
+        xor     ebp,edi
+        mov     esi,DWORD [32+esp]
+        ror     ecx,2
+        add     ebp,edx
+        add     edx,DWORD [20+esp]
+        add     ebp,ecx
+        mov     ecx,DWORD [84+esp]
+        mov     ebx,esi
+        ror     esi,11
+        mov     edi,ecx
+        ror     ecx,2
+        xor     esi,ebx
+        shr     ebx,3
+        ror     esi,7
+        xor     ecx,edi
+        xor     ebx,esi
+        ror     ecx,17
+        add     ebx,DWORD [92+esp]
+        shr     edi,10
+        add     ebx,DWORD [64+esp]
+        mov     esi,edx
+        xor     edi,ecx
+        mov     ecx,DWORD [24+esp]
+        ror     edx,14
+        add     ebx,edi
+        mov     edi,DWORD [28+esp]
+        xor     edx,esi
+        xor     ecx,edi
+        ror     edx,5
+        and     ecx,esi
+        mov     DWORD [20+esp],esi
+        xor     edx,esi
+        add     ebx,DWORD [esp]
+        xor     edi,ecx
+        ror     edx,6
+        mov     esi,ebp
+        add     ebx,edi
+        ror     esi,9
+        mov     ecx,ebp
+        mov     edi,DWORD [8+esp]
+        xor     esi,ebp
+        mov     DWORD [4+esp],ebp
+        xor     ebp,edi
+        ror     esi,11
+        and     eax,ebp
+        lea     edx,[3329325298+edx*1+ebx]
+        xor     esi,ecx
+        xor     eax,edi
+        ror     esi,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,esi
+        mov     esi,DWORD [96+esp]
+        xor     ebp,edi
+        mov     ecx,DWORD [12+esp]
+        add     eax,DWORD [esi]
+        add     ebp,DWORD [4+esi]
+        add     edi,DWORD [8+esi]
+        add     ecx,DWORD [12+esi]
+        mov     DWORD [esi],eax
+        mov     DWORD [4+esi],ebp
+        mov     DWORD [8+esi],edi
+        mov     DWORD [12+esi],ecx
+        mov     DWORD [4+esp],ebp
+        xor     ebp,edi
+        mov     DWORD [8+esp],edi
+        mov     DWORD [12+esp],ecx
+        mov     edi,DWORD [20+esp]
+        mov     ebx,DWORD [24+esp]
+        mov     ecx,DWORD [28+esp]
+        add     edx,DWORD [16+esi]
+        add     edi,DWORD [20+esi]
+        add     ebx,DWORD [24+esi]
+        add     ecx,DWORD [28+esi]
+        mov     DWORD [16+esi],edx
+        mov     DWORD [20+esi],edi
+        mov     DWORD [24+esi],ebx
+        mov     DWORD [28+esi],ecx
+        mov     DWORD [20+esp],edi
+        mov     edi,DWORD [100+esp]
+        mov     DWORD [24+esp],ebx
+        mov     DWORD [28+esp],ecx
+        cmp     edi,DWORD [104+esp]
+        jb      NEAR L$010grand_loop
+        mov     esp,DWORD [108+esp]
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   32
+L$004shaext:
+        sub     esp,32
+        movdqu  xmm1,[esi]
+        lea     ebp,[128+ebp]
+        movdqu  xmm2,[16+esi]
+        movdqa  xmm7,[128+ebp]
+        pshufd  xmm0,xmm1,27
+        pshufd  xmm1,xmm1,177
+        pshufd  xmm2,xmm2,27
+db      102,15,58,15,202,8
+        punpcklqdq      xmm2,xmm0
+        jmp     NEAR L$011loop_shaext
+align   16
+L$011loop_shaext:
+        movdqu  xmm3,[edi]
+        movdqu  xmm4,[16+edi]
+        movdqu  xmm5,[32+edi]
+db      102,15,56,0,223
+        movdqu  xmm6,[48+edi]
+        movdqa  [16+esp],xmm2
+        movdqa  xmm0,[ebp-128]
+        paddd   xmm0,xmm3
+db      102,15,56,0,231
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        nop
+        movdqa  [esp],xmm1
+db      15,56,203,202
+        movdqa  xmm0,[ebp-112]
+        paddd   xmm0,xmm4
+db      102,15,56,0,239
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        lea     edi,[64+edi]
+db      15,56,204,220
+db      15,56,203,202
+        movdqa  xmm0,[ebp-96]
+        paddd   xmm0,xmm5
+db      102,15,56,0,247
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm6
+db      102,15,58,15,253,4
+        nop
+        paddd   xmm3,xmm7
+db      15,56,204,229
+db      15,56,203,202
+        movdqa  xmm0,[ebp-80]
+        paddd   xmm0,xmm6
+db      15,56,205,222
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm3
+db      102,15,58,15,254,4
+        nop
+        paddd   xmm4,xmm7
+db      15,56,204,238
+db      15,56,203,202
+        movdqa  xmm0,[ebp-64]
+        paddd   xmm0,xmm3
+db      15,56,205,227
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm4
+db      102,15,58,15,251,4
+        nop
+        paddd   xmm5,xmm7
+db      15,56,204,243
+db      15,56,203,202
+        movdqa  xmm0,[ebp-48]
+        paddd   xmm0,xmm4
+db      15,56,205,236
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm5
+db      102,15,58,15,252,4
+        nop
+        paddd   xmm6,xmm7
+db      15,56,204,220
+db      15,56,203,202
+        movdqa  xmm0,[ebp-32]
+        paddd   xmm0,xmm5
+db      15,56,205,245
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm6
+db      102,15,58,15,253,4
+        nop
+        paddd   xmm3,xmm7
+db      15,56,204,229
+db      15,56,203,202
+        movdqa  xmm0,[ebp-16]
+        paddd   xmm0,xmm6
+db      15,56,205,222
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm3
+db      102,15,58,15,254,4
+        nop
+        paddd   xmm4,xmm7
+db      15,56,204,238
+db      15,56,203,202
+        movdqa  xmm0,[ebp]
+        paddd   xmm0,xmm3
+db      15,56,205,227
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm4
+db      102,15,58,15,251,4
+        nop
+        paddd   xmm5,xmm7
+db      15,56,204,243
+db      15,56,203,202
+        movdqa  xmm0,[16+ebp]
+        paddd   xmm0,xmm4
+db      15,56,205,236
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm5
+db      102,15,58,15,252,4
+        nop
+        paddd   xmm6,xmm7
+db      15,56,204,220
+db      15,56,203,202
+        movdqa  xmm0,[32+ebp]
+        paddd   xmm0,xmm5
+db      15,56,205,245
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm6
+db      102,15,58,15,253,4
+        nop
+        paddd   xmm3,xmm7
+db      15,56,204,229
+db      15,56,203,202
+        movdqa  xmm0,[48+ebp]
+        paddd   xmm0,xmm6
+db      15,56,205,222
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm3
+db      102,15,58,15,254,4
+        nop
+        paddd   xmm4,xmm7
+db      15,56,204,238
+db      15,56,203,202
+        movdqa  xmm0,[64+ebp]
+        paddd   xmm0,xmm3
+db      15,56,205,227
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm4
+db      102,15,58,15,251,4
+        nop
+        paddd   xmm5,xmm7
+db      15,56,204,243
+db      15,56,203,202
+        movdqa  xmm0,[80+ebp]
+        paddd   xmm0,xmm4
+db      15,56,205,236
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        movdqa  xmm7,xmm5
+db      102,15,58,15,252,4
+db      15,56,203,202
+        paddd   xmm6,xmm7
+        movdqa  xmm0,[96+ebp]
+        paddd   xmm0,xmm5
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+db      15,56,205,245
+        movdqa  xmm7,[128+ebp]
+db      15,56,203,202
+        movdqa  xmm0,[112+ebp]
+        paddd   xmm0,xmm6
+        nop
+db      15,56,203,209
+        pshufd  xmm0,xmm0,14
+        cmp     eax,edi
+        nop
+db      15,56,203,202
+        paddd   xmm2,[16+esp]
+        paddd   xmm1,[esp]
+        jnz     NEAR L$011loop_shaext
+        pshufd  xmm2,xmm2,177
+        pshufd  xmm7,xmm1,27
+        pshufd  xmm1,xmm1,177
+        punpckhqdq      xmm1,xmm2
+db      102,15,58,15,215,8
+        mov     esp,DWORD [44+esp]
+        movdqu  [esi],xmm1
+        movdqu  [16+esi],xmm2
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   32
+L$006SSSE3:
+        lea     esp,[esp-96]
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     ecx,DWORD [8+esi]
+        mov     edi,DWORD [12+esi]
+        mov     DWORD [4+esp],ebx
+        xor     ebx,ecx
+        mov     DWORD [8+esp],ecx
+        mov     DWORD [12+esp],edi
+        mov     edx,DWORD [16+esi]
+        mov     edi,DWORD [20+esi]
+        mov     ecx,DWORD [24+esi]
+        mov     esi,DWORD [28+esi]
+        mov     DWORD [20+esp],edi
+        mov     edi,DWORD [100+esp]
+        mov     DWORD [24+esp],ecx
+        mov     DWORD [28+esp],esi
+        movdqa  xmm7,[256+ebp]
+        jmp     NEAR L$012grand_ssse3
+align   16
+L$012grand_ssse3:
+        movdqu  xmm0,[edi]
+        movdqu  xmm1,[16+edi]
+        movdqu  xmm2,[32+edi]
+        movdqu  xmm3,[48+edi]
+        add     edi,64
+db      102,15,56,0,199
+        mov     DWORD [100+esp],edi
+db      102,15,56,0,207
+        movdqa  xmm4,[ebp]
+db      102,15,56,0,215
+        movdqa  xmm5,[16+ebp]
+        paddd   xmm4,xmm0
+db      102,15,56,0,223
+        movdqa  xmm6,[32+ebp]
+        paddd   xmm5,xmm1
+        movdqa  xmm7,[48+ebp]
+        movdqa  [32+esp],xmm4
+        paddd   xmm6,xmm2
+        movdqa  [48+esp],xmm5
+        paddd   xmm7,xmm3
+        movdqa  [64+esp],xmm6
+        movdqa  [80+esp],xmm7
+        jmp     NEAR L$013ssse3_00_47
+align   16
+L$013ssse3_00_47:
+        add     ebp,64
+        mov     ecx,edx
+        movdqa  xmm4,xmm1
+        ror     edx,14
+        mov     esi,DWORD [20+esp]
+        movdqa  xmm7,xmm3
+        xor     edx,ecx
+        mov     edi,DWORD [24+esp]
+db      102,15,58,15,224,4
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+db      102,15,58,15,250,4
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        movdqa  xmm5,xmm4
+        ror     edx,6
+        mov     ecx,eax
+        movdqa  xmm6,xmm4
+        add     edx,edi
+        mov     edi,DWORD [4+esp]
+        psrld   xmm4,3
+        mov     esi,eax
+        ror     ecx,9
+        paddd   xmm0,xmm7
+        mov     DWORD [esp],eax
+        xor     ecx,eax
+        psrld   xmm6,7
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        ror     ecx,11
+        and     ebx,eax
+        pshufd  xmm7,xmm3,250
+        xor     ecx,esi
+        add     edx,DWORD [32+esp]
+        pslld   xmm5,14
+        xor     ebx,edi
+        ror     ecx,2
+        pxor    xmm4,xmm6
+        add     ebx,edx
+        add     edx,DWORD [12+esp]
+        psrld   xmm6,11
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        pxor    xmm4,xmm5
+        mov     esi,DWORD [16+esp]
+        xor     edx,ecx
+        pslld   xmm5,11
+        mov     edi,DWORD [20+esp]
+        xor     esi,edi
+        ror     edx,5
+        pxor    xmm4,xmm6
+        and     esi,ecx
+        mov     DWORD [12+esp],ecx
+        movdqa  xmm6,xmm7
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        pxor    xmm4,xmm5
+        mov     ecx,ebx
+        add     edx,edi
+        psrld   xmm7,10
+        mov     edi,DWORD [esp]
+        mov     esi,ebx
+        ror     ecx,9
+        paddd   xmm0,xmm4
+        mov     DWORD [28+esp],ebx
+        xor     ecx,ebx
+        psrlq   xmm6,17
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        ror     ecx,11
+        pxor    xmm7,xmm6
+        and     eax,ebx
+        xor     ecx,esi
+        psrlq   xmm6,2
+        add     edx,DWORD [36+esp]
+        xor     eax,edi
+        ror     ecx,2
+        pxor    xmm7,xmm6
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        pshufd  xmm7,xmm7,128
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [12+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [16+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        psrldq  xmm7,8
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        paddd   xmm0,xmm7
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [28+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [24+esp],eax
+        pshufd  xmm7,xmm0,80
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        movdqa  xmm6,xmm7
+        ror     ecx,11
+        psrld   xmm7,10
+        and     ebx,eax
+        psrlq   xmm6,17
+        xor     ecx,esi
+        add     edx,DWORD [40+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        pxor    xmm7,xmm6
+        add     ebx,edx
+        add     edx,DWORD [4+esp]
+        psrlq   xmm6,2
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        pxor    xmm7,xmm6
+        mov     esi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [12+esp]
+        pshufd  xmm7,xmm7,8
+        xor     esi,edi
+        ror     edx,5
+        movdqa  xmm6,[ebp]
+        and     esi,ecx
+        mov     DWORD [4+esp],ecx
+        pslldq  xmm7,8
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [24+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        paddd   xmm0,xmm7
+        mov     DWORD [20+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        paddd   xmm6,xmm0
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [44+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,ecx
+        movdqa  [32+esp],xmm6
+        mov     ecx,edx
+        movdqa  xmm4,xmm2
+        ror     edx,14
+        mov     esi,DWORD [4+esp]
+        movdqa  xmm7,xmm0
+        xor     edx,ecx
+        mov     edi,DWORD [8+esp]
+db      102,15,58,15,225,4
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+db      102,15,58,15,251,4
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        movdqa  xmm5,xmm4
+        ror     edx,6
+        mov     ecx,eax
+        movdqa  xmm6,xmm4
+        add     edx,edi
+        mov     edi,DWORD [20+esp]
+        psrld   xmm4,3
+        mov     esi,eax
+        ror     ecx,9
+        paddd   xmm1,xmm7
+        mov     DWORD [16+esp],eax
+        xor     ecx,eax
+        psrld   xmm6,7
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        ror     ecx,11
+        and     ebx,eax
+        pshufd  xmm7,xmm0,250
+        xor     ecx,esi
+        add     edx,DWORD [48+esp]
+        pslld   xmm5,14
+        xor     ebx,edi
+        ror     ecx,2
+        pxor    xmm4,xmm6
+        add     ebx,edx
+        add     edx,DWORD [28+esp]
+        psrld   xmm6,11
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        pxor    xmm4,xmm5
+        mov     esi,DWORD [esp]
+        xor     edx,ecx
+        pslld   xmm5,11
+        mov     edi,DWORD [4+esp]
+        xor     esi,edi
+        ror     edx,5
+        pxor    xmm4,xmm6
+        and     esi,ecx
+        mov     DWORD [28+esp],ecx
+        movdqa  xmm6,xmm7
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        pxor    xmm4,xmm5
+        mov     ecx,ebx
+        add     edx,edi
+        psrld   xmm7,10
+        mov     edi,DWORD [16+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        paddd   xmm1,xmm4
+        mov     DWORD [12+esp],ebx
+        xor     ecx,ebx
+        psrlq   xmm6,17
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        ror     ecx,11
+        pxor    xmm7,xmm6
+        and     eax,ebx
+        xor     ecx,esi
+        psrlq   xmm6,2
+        add     edx,DWORD [52+esp]
+        xor     eax,edi
+        ror     ecx,2
+        pxor    xmm7,xmm6
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        pshufd  xmm7,xmm7,128
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [28+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        psrldq  xmm7,8
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        paddd   xmm1,xmm7
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [12+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [8+esp],eax
+        pshufd  xmm7,xmm1,80
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        movdqa  xmm6,xmm7
+        ror     ecx,11
+        psrld   xmm7,10
+        and     ebx,eax
+        psrlq   xmm6,17
+        xor     ecx,esi
+        add     edx,DWORD [56+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        pxor    xmm7,xmm6
+        add     ebx,edx
+        add     edx,DWORD [20+esp]
+        psrlq   xmm6,2
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        pxor    xmm7,xmm6
+        mov     esi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [28+esp]
+        pshufd  xmm7,xmm7,8
+        xor     esi,edi
+        ror     edx,5
+        movdqa  xmm6,[16+ebp]
+        and     esi,ecx
+        mov     DWORD [20+esp],ecx
+        pslldq  xmm7,8
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [8+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        paddd   xmm1,xmm7
+        mov     DWORD [4+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        paddd   xmm6,xmm1
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [60+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,ecx
+        movdqa  [48+esp],xmm6
+        mov     ecx,edx
+        movdqa  xmm4,xmm3
+        ror     edx,14
+        mov     esi,DWORD [20+esp]
+        movdqa  xmm7,xmm1
+        xor     edx,ecx
+        mov     edi,DWORD [24+esp]
+db      102,15,58,15,226,4
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+db      102,15,58,15,248,4
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        movdqa  xmm5,xmm4
+        ror     edx,6
+        mov     ecx,eax
+        movdqa  xmm6,xmm4
+        add     edx,edi
+        mov     edi,DWORD [4+esp]
+        psrld   xmm4,3
+        mov     esi,eax
+        ror     ecx,9
+        paddd   xmm2,xmm7
+        mov     DWORD [esp],eax
+        xor     ecx,eax
+        psrld   xmm6,7
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        ror     ecx,11
+        and     ebx,eax
+        pshufd  xmm7,xmm1,250
+        xor     ecx,esi
+        add     edx,DWORD [64+esp]
+        pslld   xmm5,14
+        xor     ebx,edi
+        ror     ecx,2
+        pxor    xmm4,xmm6
+        add     ebx,edx
+        add     edx,DWORD [12+esp]
+        psrld   xmm6,11
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        pxor    xmm4,xmm5
+        mov     esi,DWORD [16+esp]
+        xor     edx,ecx
+        pslld   xmm5,11
+        mov     edi,DWORD [20+esp]
+        xor     esi,edi
+        ror     edx,5
+        pxor    xmm4,xmm6
+        and     esi,ecx
+        mov     DWORD [12+esp],ecx
+        movdqa  xmm6,xmm7
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        pxor    xmm4,xmm5
+        mov     ecx,ebx
+        add     edx,edi
+        psrld   xmm7,10
+        mov     edi,DWORD [esp]
+        mov     esi,ebx
+        ror     ecx,9
+        paddd   xmm2,xmm4
+        mov     DWORD [28+esp],ebx
+        xor     ecx,ebx
+        psrlq   xmm6,17
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        ror     ecx,11
+        pxor    xmm7,xmm6
+        and     eax,ebx
+        xor     ecx,esi
+        psrlq   xmm6,2
+        add     edx,DWORD [68+esp]
+        xor     eax,edi
+        ror     ecx,2
+        pxor    xmm7,xmm6
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        pshufd  xmm7,xmm7,128
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [12+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [16+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        psrldq  xmm7,8
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        paddd   xmm2,xmm7
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [28+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [24+esp],eax
+        pshufd  xmm7,xmm2,80
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        movdqa  xmm6,xmm7
+        ror     ecx,11
+        psrld   xmm7,10
+        and     ebx,eax
+        psrlq   xmm6,17
+        xor     ecx,esi
+        add     edx,DWORD [72+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        pxor    xmm7,xmm6
+        add     ebx,edx
+        add     edx,DWORD [4+esp]
+        psrlq   xmm6,2
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        pxor    xmm7,xmm6
+        mov     esi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [12+esp]
+        pshufd  xmm7,xmm7,8
+        xor     esi,edi
+        ror     edx,5
+        movdqa  xmm6,[32+ebp]
+        and     esi,ecx
+        mov     DWORD [4+esp],ecx
+        pslldq  xmm7,8
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [24+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        paddd   xmm2,xmm7
+        mov     DWORD [20+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        paddd   xmm6,xmm2
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [76+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,ecx
+        movdqa  [64+esp],xmm6
+        mov     ecx,edx
+        movdqa  xmm4,xmm0
+        ror     edx,14
+        mov     esi,DWORD [4+esp]
+        movdqa  xmm7,xmm2
+        xor     edx,ecx
+        mov     edi,DWORD [8+esp]
+db      102,15,58,15,227,4
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+db      102,15,58,15,249,4
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        movdqa  xmm5,xmm4
+        ror     edx,6
+        mov     ecx,eax
+        movdqa  xmm6,xmm4
+        add     edx,edi
+        mov     edi,DWORD [20+esp]
+        psrld   xmm4,3
+        mov     esi,eax
+        ror     ecx,9
+        paddd   xmm3,xmm7
+        mov     DWORD [16+esp],eax
+        xor     ecx,eax
+        psrld   xmm6,7
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        ror     ecx,11
+        and     ebx,eax
+        pshufd  xmm7,xmm2,250
+        xor     ecx,esi
+        add     edx,DWORD [80+esp]
+        pslld   xmm5,14
+        xor     ebx,edi
+        ror     ecx,2
+        pxor    xmm4,xmm6
+        add     ebx,edx
+        add     edx,DWORD [28+esp]
+        psrld   xmm6,11
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        pxor    xmm4,xmm5
+        mov     esi,DWORD [esp]
+        xor     edx,ecx
+        pslld   xmm5,11
+        mov     edi,DWORD [4+esp]
+        xor     esi,edi
+        ror     edx,5
+        pxor    xmm4,xmm6
+        and     esi,ecx
+        mov     DWORD [28+esp],ecx
+        movdqa  xmm6,xmm7
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        pxor    xmm4,xmm5
+        mov     ecx,ebx
+        add     edx,edi
+        psrld   xmm7,10
+        mov     edi,DWORD [16+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        paddd   xmm3,xmm4
+        mov     DWORD [12+esp],ebx
+        xor     ecx,ebx
+        psrlq   xmm6,17
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        ror     ecx,11
+        pxor    xmm7,xmm6
+        and     eax,ebx
+        xor     ecx,esi
+        psrlq   xmm6,2
+        add     edx,DWORD [84+esp]
+        xor     eax,edi
+        ror     ecx,2
+        pxor    xmm7,xmm6
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        pshufd  xmm7,xmm7,128
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [28+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        psrldq  xmm7,8
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        paddd   xmm3,xmm7
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [12+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [8+esp],eax
+        pshufd  xmm7,xmm3,80
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        movdqa  xmm6,xmm7
+        ror     ecx,11
+        psrld   xmm7,10
+        and     ebx,eax
+        psrlq   xmm6,17
+        xor     ecx,esi
+        add     edx,DWORD [88+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        pxor    xmm7,xmm6
+        add     ebx,edx
+        add     edx,DWORD [20+esp]
+        psrlq   xmm6,2
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        pxor    xmm7,xmm6
+        mov     esi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [28+esp]
+        pshufd  xmm7,xmm7,8
+        xor     esi,edi
+        ror     edx,5
+        movdqa  xmm6,[48+ebp]
+        and     esi,ecx
+        mov     DWORD [20+esp],ecx
+        pslldq  xmm7,8
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [8+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        paddd   xmm3,xmm7
+        mov     DWORD [4+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        paddd   xmm6,xmm3
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [92+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,ecx
+        movdqa  [80+esp],xmm6
+        cmp     DWORD [64+ebp],66051
+        jne     NEAR L$013ssse3_00_47
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [20+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [24+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [4+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        ror     ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [32+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        add     ebx,edx
+        add     edx,DWORD [12+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [20+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [12+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [esp]
+        mov     esi,ebx
+        ror     ecx,9
+        mov     DWORD [28+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [36+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [12+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [16+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [28+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [24+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        ror     ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [40+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        add     ebx,edx
+        add     edx,DWORD [4+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [12+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [4+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [24+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        mov     DWORD [20+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [44+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [4+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [8+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [20+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [16+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        ror     ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [48+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        add     ebx,edx
+        add     edx,DWORD [28+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [esp]
+        xor     edx,ecx
+        mov     edi,DWORD [4+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [28+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [16+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        mov     DWORD [12+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [52+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [28+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [12+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [8+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        ror     ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [56+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        add     ebx,edx
+        add     edx,DWORD [20+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [28+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [20+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [8+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        mov     DWORD [4+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [60+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [20+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [24+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [4+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        ror     ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [64+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        add     ebx,edx
+        add     edx,DWORD [12+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [20+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [12+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [esp]
+        mov     esi,ebx
+        ror     ecx,9
+        mov     DWORD [28+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [68+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [12+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [16+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [28+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [24+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        ror     ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [72+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        add     ebx,edx
+        add     edx,DWORD [4+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [12+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [4+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [24+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        mov     DWORD [20+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [76+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [4+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [8+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [20+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [16+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        ror     ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [80+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        add     ebx,edx
+        add     edx,DWORD [28+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [esp]
+        xor     edx,ecx
+        mov     edi,DWORD [4+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [28+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [16+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        mov     DWORD [12+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [84+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [28+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [12+esp]
+        mov     esi,eax
+        ror     ecx,9
+        mov     DWORD [8+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        ror     ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [88+esp]
+        xor     ebx,edi
+        ror     ecx,2
+        add     ebx,edx
+        add     edx,DWORD [20+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        ror     edx,14
+        mov     esi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [28+esp]
+        xor     esi,edi
+        ror     edx,5
+        and     esi,ecx
+        mov     DWORD [20+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        ror     edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [8+esp]
+        mov     esi,ebx
+        ror     ecx,9
+        mov     DWORD [4+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        ror     ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [92+esp]
+        xor     eax,edi
+        ror     ecx,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,ecx
+        mov     esi,DWORD [96+esp]
+        xor     ebx,edi
+        mov     ecx,DWORD [12+esp]
+        add     eax,DWORD [esi]
+        add     ebx,DWORD [4+esi]
+        add     edi,DWORD [8+esi]
+        add     ecx,DWORD [12+esi]
+        mov     DWORD [esi],eax
+        mov     DWORD [4+esi],ebx
+        mov     DWORD [8+esi],edi
+        mov     DWORD [12+esi],ecx
+        mov     DWORD [4+esp],ebx
+        xor     ebx,edi
+        mov     DWORD [8+esp],edi
+        mov     DWORD [12+esp],ecx
+        mov     edi,DWORD [20+esp]
+        mov     ecx,DWORD [24+esp]
+        add     edx,DWORD [16+esi]
+        add     edi,DWORD [20+esi]
+        add     ecx,DWORD [24+esi]
+        mov     DWORD [16+esi],edx
+        mov     DWORD [20+esi],edi
+        mov     DWORD [20+esp],edi
+        mov     edi,DWORD [28+esp]
+        mov     DWORD [24+esi],ecx
+        add     edi,DWORD [28+esi]
+        mov     DWORD [24+esp],ecx
+        mov     DWORD [28+esi],edi
+        mov     DWORD [28+esp],edi
+        mov     edi,DWORD [100+esp]
+        movdqa  xmm7,[64+ebp]
+        sub     ebp,192
+        cmp     edi,DWORD [104+esp]
+        jb      NEAR L$012grand_ssse3
+        mov     esp,DWORD [108+esp]
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   32
+L$005AVX:
+        and     edx,264
+        cmp     edx,264
+        je      NEAR L$014AVX_BMI
+        lea     esp,[esp-96]
+        vzeroall
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     ecx,DWORD [8+esi]
+        mov     edi,DWORD [12+esi]
+        mov     DWORD [4+esp],ebx
+        xor     ebx,ecx
+        mov     DWORD [8+esp],ecx
+        mov     DWORD [12+esp],edi
+        mov     edx,DWORD [16+esi]
+        mov     edi,DWORD [20+esi]
+        mov     ecx,DWORD [24+esi]
+        mov     esi,DWORD [28+esi]
+        mov     DWORD [20+esp],edi
+        mov     edi,DWORD [100+esp]
+        mov     DWORD [24+esp],ecx
+        mov     DWORD [28+esp],esi
+        vmovdqa xmm7,[256+ebp]
+        jmp     NEAR L$015grand_avx
+align   32
+L$015grand_avx:
+        vmovdqu xmm0,[edi]
+        vmovdqu xmm1,[16+edi]
+        vmovdqu xmm2,[32+edi]
+        vmovdqu xmm3,[48+edi]
+        add     edi,64
+        vpshufb xmm0,xmm0,xmm7
+        mov     DWORD [100+esp],edi
+        vpshufb xmm1,xmm1,xmm7
+        vpshufb xmm2,xmm2,xmm7
+        vpaddd  xmm4,xmm0,[ebp]
+        vpshufb xmm3,xmm3,xmm7
+        vpaddd  xmm5,xmm1,[16+ebp]
+        vpaddd  xmm6,xmm2,[32+ebp]
+        vpaddd  xmm7,xmm3,[48+ebp]
+        vmovdqa [32+esp],xmm4
+        vmovdqa [48+esp],xmm5
+        vmovdqa [64+esp],xmm6
+        vmovdqa [80+esp],xmm7
+        jmp     NEAR L$016avx_00_47
+align   16
+L$016avx_00_47:
+        add     ebp,64
+        vpalignr        xmm4,xmm1,xmm0,4
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [20+esp]
+        vpalignr        xmm7,xmm3,xmm2,4
+        xor     edx,ecx
+        mov     edi,DWORD [24+esp]
+        xor     esi,edi
+        vpsrld  xmm6,xmm4,7
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        vpaddd  xmm0,xmm0,xmm7
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrld  xmm7,xmm4,3
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [4+esp]
+        vpslld  xmm5,xmm4,14
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [esp],eax
+        vpxor   xmm4,xmm7,xmm6
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        vpshufd xmm7,xmm3,250
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        vpsrld  xmm6,xmm6,11
+        add     edx,DWORD [32+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        vpxor   xmm4,xmm4,xmm5
+        add     ebx,edx
+        add     edx,DWORD [12+esp]
+        add     ebx,ecx
+        vpslld  xmm5,xmm5,11
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [16+esp]
+        vpxor   xmm4,xmm4,xmm6
+        xor     edx,ecx
+        mov     edi,DWORD [20+esp]
+        xor     esi,edi
+        vpsrld  xmm6,xmm7,10
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [12+esp],ecx
+        vpxor   xmm4,xmm4,xmm5
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrlq  xmm5,xmm7,17
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [esp]
+        vpaddd  xmm0,xmm0,xmm4
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [28+esp],ebx
+        vpxor   xmm6,xmm6,xmm5
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        vpsrlq  xmm7,xmm7,19
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        vpxor   xmm6,xmm6,xmm7
+        add     edx,DWORD [36+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        vpshufd xmm7,xmm6,132
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,ecx
+        vpsrldq xmm7,xmm7,8
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [12+esp]
+        vpaddd  xmm0,xmm0,xmm7
+        xor     edx,ecx
+        mov     edi,DWORD [16+esp]
+        xor     esi,edi
+        vpshufd xmm7,xmm0,80
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        vpsrld  xmm6,xmm7,10
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrlq  xmm5,xmm7,17
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [28+esp]
+        vpxor   xmm6,xmm6,xmm5
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [24+esp],eax
+        vpsrlq  xmm7,xmm7,19
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        vpxor   xmm6,xmm6,xmm7
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        vpshufd xmm7,xmm6,232
+        add     edx,DWORD [40+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        vpslldq xmm7,xmm7,8
+        add     ebx,edx
+        add     edx,DWORD [4+esp]
+        add     ebx,ecx
+        vpaddd  xmm0,xmm0,xmm7
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [8+esp]
+        vpaddd  xmm6,xmm0,[ebp]
+        xor     edx,ecx
+        mov     edi,DWORD [12+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [4+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [24+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [20+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [44+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,ecx
+        vmovdqa [32+esp],xmm6
+        vpalignr        xmm4,xmm2,xmm1,4
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [4+esp]
+        vpalignr        xmm7,xmm0,xmm3,4
+        xor     edx,ecx
+        mov     edi,DWORD [8+esp]
+        xor     esi,edi
+        vpsrld  xmm6,xmm4,7
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        vpaddd  xmm1,xmm1,xmm7
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrld  xmm7,xmm4,3
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [20+esp]
+        vpslld  xmm5,xmm4,14
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [16+esp],eax
+        vpxor   xmm4,xmm7,xmm6
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        vpshufd xmm7,xmm0,250
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        vpsrld  xmm6,xmm6,11
+        add     edx,DWORD [48+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        vpxor   xmm4,xmm4,xmm5
+        add     ebx,edx
+        add     edx,DWORD [28+esp]
+        add     ebx,ecx
+        vpslld  xmm5,xmm5,11
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [esp]
+        vpxor   xmm4,xmm4,xmm6
+        xor     edx,ecx
+        mov     edi,DWORD [4+esp]
+        xor     esi,edi
+        vpsrld  xmm6,xmm7,10
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [28+esp],ecx
+        vpxor   xmm4,xmm4,xmm5
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrlq  xmm5,xmm7,17
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [16+esp]
+        vpaddd  xmm1,xmm1,xmm4
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [12+esp],ebx
+        vpxor   xmm6,xmm6,xmm5
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        vpsrlq  xmm7,xmm7,19
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        vpxor   xmm6,xmm6,xmm7
+        add     edx,DWORD [52+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        vpshufd xmm7,xmm6,132
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,ecx
+        vpsrldq xmm7,xmm7,8
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [28+esp]
+        vpaddd  xmm1,xmm1,xmm7
+        xor     edx,ecx
+        mov     edi,DWORD [esp]
+        xor     esi,edi
+        vpshufd xmm7,xmm1,80
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        vpsrld  xmm6,xmm7,10
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrlq  xmm5,xmm7,17
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [12+esp]
+        vpxor   xmm6,xmm6,xmm5
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [8+esp],eax
+        vpsrlq  xmm7,xmm7,19
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        vpxor   xmm6,xmm6,xmm7
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        vpshufd xmm7,xmm6,232
+        add     edx,DWORD [56+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        vpslldq xmm7,xmm7,8
+        add     ebx,edx
+        add     edx,DWORD [20+esp]
+        add     ebx,ecx
+        vpaddd  xmm1,xmm1,xmm7
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [24+esp]
+        vpaddd  xmm6,xmm1,[16+ebp]
+        xor     edx,ecx
+        mov     edi,DWORD [28+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [20+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [8+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [4+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [60+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,ecx
+        vmovdqa [48+esp],xmm6
+        vpalignr        xmm4,xmm3,xmm2,4
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [20+esp]
+        vpalignr        xmm7,xmm1,xmm0,4
+        xor     edx,ecx
+        mov     edi,DWORD [24+esp]
+        xor     esi,edi
+        vpsrld  xmm6,xmm4,7
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        vpaddd  xmm2,xmm2,xmm7
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrld  xmm7,xmm4,3
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [4+esp]
+        vpslld  xmm5,xmm4,14
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [esp],eax
+        vpxor   xmm4,xmm7,xmm6
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        vpshufd xmm7,xmm1,250
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        vpsrld  xmm6,xmm6,11
+        add     edx,DWORD [64+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        vpxor   xmm4,xmm4,xmm5
+        add     ebx,edx
+        add     edx,DWORD [12+esp]
+        add     ebx,ecx
+        vpslld  xmm5,xmm5,11
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [16+esp]
+        vpxor   xmm4,xmm4,xmm6
+        xor     edx,ecx
+        mov     edi,DWORD [20+esp]
+        xor     esi,edi
+        vpsrld  xmm6,xmm7,10
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [12+esp],ecx
+        vpxor   xmm4,xmm4,xmm5
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrlq  xmm5,xmm7,17
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [esp]
+        vpaddd  xmm2,xmm2,xmm4
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [28+esp],ebx
+        vpxor   xmm6,xmm6,xmm5
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        vpsrlq  xmm7,xmm7,19
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        vpxor   xmm6,xmm6,xmm7
+        add     edx,DWORD [68+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        vpshufd xmm7,xmm6,132
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,ecx
+        vpsrldq xmm7,xmm7,8
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [12+esp]
+        vpaddd  xmm2,xmm2,xmm7
+        xor     edx,ecx
+        mov     edi,DWORD [16+esp]
+        xor     esi,edi
+        vpshufd xmm7,xmm2,80
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        vpsrld  xmm6,xmm7,10
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrlq  xmm5,xmm7,17
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [28+esp]
+        vpxor   xmm6,xmm6,xmm5
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [24+esp],eax
+        vpsrlq  xmm7,xmm7,19
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        vpxor   xmm6,xmm6,xmm7
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        vpshufd xmm7,xmm6,232
+        add     edx,DWORD [72+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        vpslldq xmm7,xmm7,8
+        add     ebx,edx
+        add     edx,DWORD [4+esp]
+        add     ebx,ecx
+        vpaddd  xmm2,xmm2,xmm7
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [8+esp]
+        vpaddd  xmm6,xmm2,[32+ebp]
+        xor     edx,ecx
+        mov     edi,DWORD [12+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [4+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [24+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [20+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [76+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,ecx
+        vmovdqa [64+esp],xmm6
+        vpalignr        xmm4,xmm0,xmm3,4
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [4+esp]
+        vpalignr        xmm7,xmm2,xmm1,4
+        xor     edx,ecx
+        mov     edi,DWORD [8+esp]
+        xor     esi,edi
+        vpsrld  xmm6,xmm4,7
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        vpaddd  xmm3,xmm3,xmm7
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrld  xmm7,xmm4,3
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [20+esp]
+        vpslld  xmm5,xmm4,14
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [16+esp],eax
+        vpxor   xmm4,xmm7,xmm6
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        vpshufd xmm7,xmm2,250
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        vpsrld  xmm6,xmm6,11
+        add     edx,DWORD [80+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        vpxor   xmm4,xmm4,xmm5
+        add     ebx,edx
+        add     edx,DWORD [28+esp]
+        add     ebx,ecx
+        vpslld  xmm5,xmm5,11
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [esp]
+        vpxor   xmm4,xmm4,xmm6
+        xor     edx,ecx
+        mov     edi,DWORD [4+esp]
+        xor     esi,edi
+        vpsrld  xmm6,xmm7,10
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [28+esp],ecx
+        vpxor   xmm4,xmm4,xmm5
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrlq  xmm5,xmm7,17
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [16+esp]
+        vpaddd  xmm3,xmm3,xmm4
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [12+esp],ebx
+        vpxor   xmm6,xmm6,xmm5
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        vpsrlq  xmm7,xmm7,19
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        vpxor   xmm6,xmm6,xmm7
+        add     edx,DWORD [84+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        vpshufd xmm7,xmm6,132
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,ecx
+        vpsrldq xmm7,xmm7,8
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [28+esp]
+        vpaddd  xmm3,xmm3,xmm7
+        xor     edx,ecx
+        mov     edi,DWORD [esp]
+        xor     esi,edi
+        vpshufd xmm7,xmm3,80
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        vpsrld  xmm6,xmm7,10
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        vpsrlq  xmm5,xmm7,17
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [12+esp]
+        vpxor   xmm6,xmm6,xmm5
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [8+esp],eax
+        vpsrlq  xmm7,xmm7,19
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        vpxor   xmm6,xmm6,xmm7
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        vpshufd xmm7,xmm6,232
+        add     edx,DWORD [88+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        vpslldq xmm7,xmm7,8
+        add     ebx,edx
+        add     edx,DWORD [20+esp]
+        add     ebx,ecx
+        vpaddd  xmm3,xmm3,xmm7
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [24+esp]
+        vpaddd  xmm6,xmm3,[48+ebp]
+        xor     edx,ecx
+        mov     edi,DWORD [28+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [20+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [8+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [4+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [92+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,ecx
+        vmovdqa [80+esp],xmm6
+        cmp     DWORD [64+ebp],66051
+        jne     NEAR L$016avx_00_47
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [20+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [24+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [4+esp]
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [32+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        add     ebx,edx
+        add     edx,DWORD [12+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [20+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [12+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [28+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [36+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [12+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [16+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [28+esp]
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [24+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [40+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        add     ebx,edx
+        add     edx,DWORD [4+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [12+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [4+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [24+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [20+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [44+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [4+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [8+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [20+esp]
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [16+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [48+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        add     ebx,edx
+        add     edx,DWORD [28+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [esp]
+        xor     edx,ecx
+        mov     edi,DWORD [4+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [28+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [16+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [12+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [52+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [28+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [12+esp]
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [8+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [56+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        add     ebx,edx
+        add     edx,DWORD [20+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [28+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [20+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [8+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [4+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [60+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [20+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [24+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [16+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [4+esp]
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [64+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        add     ebx,edx
+        add     edx,DWORD [12+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [16+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [20+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [12+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [28+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [68+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [8+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [12+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [16+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [8+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [28+esp]
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [24+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [72+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        add     ebx,edx
+        add     edx,DWORD [4+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [8+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [12+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [4+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [24+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [20+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [76+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [esp]
+        add     eax,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [4+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [8+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [20+esp]
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [16+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [80+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        add     ebx,edx
+        add     edx,DWORD [28+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [esp]
+        xor     edx,ecx
+        mov     edi,DWORD [4+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [28+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [16+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [12+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [84+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [24+esp]
+        add     eax,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [28+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [24+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,eax
+        add     edx,edi
+        mov     edi,DWORD [12+esp]
+        mov     esi,eax
+        shrd    ecx,ecx,9
+        mov     DWORD [8+esp],eax
+        xor     ecx,eax
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        shrd    ecx,ecx,11
+        and     ebx,eax
+        xor     ecx,esi
+        add     edx,DWORD [88+esp]
+        xor     ebx,edi
+        shrd    ecx,ecx,2
+        add     ebx,edx
+        add     edx,DWORD [20+esp]
+        add     ebx,ecx
+        mov     ecx,edx
+        shrd    edx,edx,14
+        mov     esi,DWORD [24+esp]
+        xor     edx,ecx
+        mov     edi,DWORD [28+esp]
+        xor     esi,edi
+        shrd    edx,edx,5
+        and     esi,ecx
+        mov     DWORD [20+esp],ecx
+        xor     edx,ecx
+        xor     edi,esi
+        shrd    edx,edx,6
+        mov     ecx,ebx
+        add     edx,edi
+        mov     edi,DWORD [8+esp]
+        mov     esi,ebx
+        shrd    ecx,ecx,9
+        mov     DWORD [4+esp],ebx
+        xor     ecx,ebx
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        shrd    ecx,ecx,11
+        and     eax,ebx
+        xor     ecx,esi
+        add     edx,DWORD [92+esp]
+        xor     eax,edi
+        shrd    ecx,ecx,2
+        add     eax,edx
+        add     edx,DWORD [16+esp]
+        add     eax,ecx
+        mov     esi,DWORD [96+esp]
+        xor     ebx,edi
+        mov     ecx,DWORD [12+esp]
+        add     eax,DWORD [esi]
+        add     ebx,DWORD [4+esi]
+        add     edi,DWORD [8+esi]
+        add     ecx,DWORD [12+esi]
+        mov     DWORD [esi],eax
+        mov     DWORD [4+esi],ebx
+        mov     DWORD [8+esi],edi
+        mov     DWORD [12+esi],ecx
+        mov     DWORD [4+esp],ebx
+        xor     ebx,edi
+        mov     DWORD [8+esp],edi
+        mov     DWORD [12+esp],ecx
+        mov     edi,DWORD [20+esp]
+        mov     ecx,DWORD [24+esp]
+        add     edx,DWORD [16+esi]
+        add     edi,DWORD [20+esi]
+        add     ecx,DWORD [24+esi]
+        mov     DWORD [16+esi],edx
+        mov     DWORD [20+esi],edi
+        mov     DWORD [20+esp],edi
+        mov     edi,DWORD [28+esp]
+        mov     DWORD [24+esi],ecx
+        add     edi,DWORD [28+esi]
+        mov     DWORD [24+esp],ecx
+        mov     DWORD [28+esi],edi
+        mov     DWORD [28+esp],edi
+        mov     edi,DWORD [100+esp]
+        vmovdqa xmm7,[64+ebp]
+        sub     ebp,192
+        cmp     edi,DWORD [104+esp]
+        jb      NEAR L$015grand_avx
+        mov     esp,DWORD [108+esp]
+        vzeroall
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   32
+L$014AVX_BMI:
+        lea     esp,[esp-96]
+        vzeroall
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     ecx,DWORD [8+esi]
+        mov     edi,DWORD [12+esi]
+        mov     DWORD [4+esp],ebx
+        xor     ebx,ecx
+        mov     DWORD [8+esp],ecx
+        mov     DWORD [12+esp],edi
+        mov     edx,DWORD [16+esi]
+        mov     edi,DWORD [20+esi]
+        mov     ecx,DWORD [24+esi]
+        mov     esi,DWORD [28+esi]
+        mov     DWORD [20+esp],edi
+        mov     edi,DWORD [100+esp]
+        mov     DWORD [24+esp],ecx
+        mov     DWORD [28+esp],esi
+        vmovdqa xmm7,[256+ebp]
+        jmp     NEAR L$017grand_avx_bmi
+align   32
+L$017grand_avx_bmi:
+        vmovdqu xmm0,[edi]
+        vmovdqu xmm1,[16+edi]
+        vmovdqu xmm2,[32+edi]
+        vmovdqu xmm3,[48+edi]
+        add     edi,64
+        vpshufb xmm0,xmm0,xmm7
+        mov     DWORD [100+esp],edi
+        vpshufb xmm1,xmm1,xmm7
+        vpshufb xmm2,xmm2,xmm7
+        vpaddd  xmm4,xmm0,[ebp]
+        vpshufb xmm3,xmm3,xmm7
+        vpaddd  xmm5,xmm1,[16+ebp]
+        vpaddd  xmm6,xmm2,[32+ebp]
+        vpaddd  xmm7,xmm3,[48+ebp]
+        vmovdqa [32+esp],xmm4
+        vmovdqa [48+esp],xmm5
+        vmovdqa [64+esp],xmm6
+        vmovdqa [80+esp],xmm7
+        jmp     NEAR L$018avx_bmi_00_47
+align   16
+L$018avx_bmi_00_47:
+        add     ebp,64
+        vpalignr        xmm4,xmm1,xmm0,4
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [16+esp],edx
+        vpalignr        xmm7,xmm3,xmm2,4
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [24+esp]
+        vpsrld  xmm6,xmm4,7
+        xor     ecx,edi
+        and     edx,DWORD [20+esp]
+        mov     DWORD [esp],eax
+        vpaddd  xmm0,xmm0,xmm7
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        vpsrld  xmm7,xmm4,3
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        vpslld  xmm5,xmm4,14
+        mov     edi,DWORD [4+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        vpxor   xmm4,xmm7,xmm6
+        add     edx,DWORD [28+esp]
+        and     ebx,eax
+        add     edx,DWORD [32+esp]
+        vpshufd xmm7,xmm3,250
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [12+esp]
+        vpsrld  xmm6,xmm6,11
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpxor   xmm4,xmm4,xmm5
+        mov     DWORD [12+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpslld  xmm5,xmm5,11
+        andn    esi,edx,DWORD [20+esp]
+        xor     ecx,edi
+        and     edx,DWORD [16+esp]
+        vpxor   xmm4,xmm4,xmm6
+        mov     DWORD [28+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        vpsrld  xmm6,xmm7,10
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        vpxor   xmm4,xmm4,xmm5
+        mov     edi,DWORD [esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        vpsrlq  xmm5,xmm7,17
+        add     edx,DWORD [24+esp]
+        and     eax,ebx
+        add     edx,DWORD [36+esp]
+        vpaddd  xmm0,xmm0,xmm4
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [8+esp]
+        vpxor   xmm6,xmm6,xmm5
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpsrlq  xmm7,xmm7,19
+        mov     DWORD [8+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpxor   xmm6,xmm6,xmm7
+        andn    esi,edx,DWORD [16+esp]
+        xor     ecx,edi
+        and     edx,DWORD [12+esp]
+        vpshufd xmm7,xmm6,132
+        mov     DWORD [24+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        vpsrldq xmm7,xmm7,8
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        vpaddd  xmm0,xmm0,xmm7
+        mov     edi,DWORD [28+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        vpshufd xmm7,xmm0,80
+        add     edx,DWORD [20+esp]
+        and     ebx,eax
+        add     edx,DWORD [40+esp]
+        vpsrld  xmm6,xmm7,10
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [4+esp]
+        vpsrlq  xmm5,xmm7,17
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpxor   xmm6,xmm6,xmm5
+        mov     DWORD [4+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpsrlq  xmm7,xmm7,19
+        andn    esi,edx,DWORD [12+esp]
+        xor     ecx,edi
+        and     edx,DWORD [8+esp]
+        vpxor   xmm6,xmm6,xmm7
+        mov     DWORD [20+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        vpshufd xmm7,xmm6,232
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        vpslldq xmm7,xmm7,8
+        mov     edi,DWORD [24+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        vpaddd  xmm0,xmm0,xmm7
+        add     edx,DWORD [16+esp]
+        and     eax,ebx
+        add     edx,DWORD [44+esp]
+        vpaddd  xmm6,xmm0,[ebp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [esp]
+        lea     eax,[ecx*1+eax]
+        vmovdqa [32+esp],xmm6
+        vpalignr        xmm4,xmm2,xmm1,4
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [esp],edx
+        vpalignr        xmm7,xmm0,xmm3,4
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [8+esp]
+        vpsrld  xmm6,xmm4,7
+        xor     ecx,edi
+        and     edx,DWORD [4+esp]
+        mov     DWORD [16+esp],eax
+        vpaddd  xmm1,xmm1,xmm7
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        vpsrld  xmm7,xmm4,3
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        vpslld  xmm5,xmm4,14
+        mov     edi,DWORD [20+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        vpxor   xmm4,xmm7,xmm6
+        add     edx,DWORD [12+esp]
+        and     ebx,eax
+        add     edx,DWORD [48+esp]
+        vpshufd xmm7,xmm0,250
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [28+esp]
+        vpsrld  xmm6,xmm6,11
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpxor   xmm4,xmm4,xmm5
+        mov     DWORD [28+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpslld  xmm5,xmm5,11
+        andn    esi,edx,DWORD [4+esp]
+        xor     ecx,edi
+        and     edx,DWORD [esp]
+        vpxor   xmm4,xmm4,xmm6
+        mov     DWORD [12+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        vpsrld  xmm6,xmm7,10
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        vpxor   xmm4,xmm4,xmm5
+        mov     edi,DWORD [16+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        vpsrlq  xmm5,xmm7,17
+        add     edx,DWORD [8+esp]
+        and     eax,ebx
+        add     edx,DWORD [52+esp]
+        vpaddd  xmm1,xmm1,xmm4
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [24+esp]
+        vpxor   xmm6,xmm6,xmm5
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpsrlq  xmm7,xmm7,19
+        mov     DWORD [24+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpxor   xmm6,xmm6,xmm7
+        andn    esi,edx,DWORD [esp]
+        xor     ecx,edi
+        and     edx,DWORD [28+esp]
+        vpshufd xmm7,xmm6,132
+        mov     DWORD [8+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        vpsrldq xmm7,xmm7,8
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        vpaddd  xmm1,xmm1,xmm7
+        mov     edi,DWORD [12+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        vpshufd xmm7,xmm1,80
+        add     edx,DWORD [4+esp]
+        and     ebx,eax
+        add     edx,DWORD [56+esp]
+        vpsrld  xmm6,xmm7,10
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [20+esp]
+        vpsrlq  xmm5,xmm7,17
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpxor   xmm6,xmm6,xmm5
+        mov     DWORD [20+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpsrlq  xmm7,xmm7,19
+        andn    esi,edx,DWORD [28+esp]
+        xor     ecx,edi
+        and     edx,DWORD [24+esp]
+        vpxor   xmm6,xmm6,xmm7
+        mov     DWORD [4+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        vpshufd xmm7,xmm6,232
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        vpslldq xmm7,xmm7,8
+        mov     edi,DWORD [8+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        vpaddd  xmm1,xmm1,xmm7
+        add     edx,DWORD [esp]
+        and     eax,ebx
+        add     edx,DWORD [60+esp]
+        vpaddd  xmm6,xmm1,[16+ebp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [16+esp]
+        lea     eax,[ecx*1+eax]
+        vmovdqa [48+esp],xmm6
+        vpalignr        xmm4,xmm3,xmm2,4
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [16+esp],edx
+        vpalignr        xmm7,xmm1,xmm0,4
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [24+esp]
+        vpsrld  xmm6,xmm4,7
+        xor     ecx,edi
+        and     edx,DWORD [20+esp]
+        mov     DWORD [esp],eax
+        vpaddd  xmm2,xmm2,xmm7
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        vpsrld  xmm7,xmm4,3
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        vpslld  xmm5,xmm4,14
+        mov     edi,DWORD [4+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        vpxor   xmm4,xmm7,xmm6
+        add     edx,DWORD [28+esp]
+        and     ebx,eax
+        add     edx,DWORD [64+esp]
+        vpshufd xmm7,xmm1,250
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [12+esp]
+        vpsrld  xmm6,xmm6,11
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpxor   xmm4,xmm4,xmm5
+        mov     DWORD [12+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpslld  xmm5,xmm5,11
+        andn    esi,edx,DWORD [20+esp]
+        xor     ecx,edi
+        and     edx,DWORD [16+esp]
+        vpxor   xmm4,xmm4,xmm6
+        mov     DWORD [28+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        vpsrld  xmm6,xmm7,10
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        vpxor   xmm4,xmm4,xmm5
+        mov     edi,DWORD [esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        vpsrlq  xmm5,xmm7,17
+        add     edx,DWORD [24+esp]
+        and     eax,ebx
+        add     edx,DWORD [68+esp]
+        vpaddd  xmm2,xmm2,xmm4
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [8+esp]
+        vpxor   xmm6,xmm6,xmm5
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpsrlq  xmm7,xmm7,19
+        mov     DWORD [8+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpxor   xmm6,xmm6,xmm7
+        andn    esi,edx,DWORD [16+esp]
+        xor     ecx,edi
+        and     edx,DWORD [12+esp]
+        vpshufd xmm7,xmm6,132
+        mov     DWORD [24+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        vpsrldq xmm7,xmm7,8
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        vpaddd  xmm2,xmm2,xmm7
+        mov     edi,DWORD [28+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        vpshufd xmm7,xmm2,80
+        add     edx,DWORD [20+esp]
+        and     ebx,eax
+        add     edx,DWORD [72+esp]
+        vpsrld  xmm6,xmm7,10
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [4+esp]
+        vpsrlq  xmm5,xmm7,17
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpxor   xmm6,xmm6,xmm5
+        mov     DWORD [4+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpsrlq  xmm7,xmm7,19
+        andn    esi,edx,DWORD [12+esp]
+        xor     ecx,edi
+        and     edx,DWORD [8+esp]
+        vpxor   xmm6,xmm6,xmm7
+        mov     DWORD [20+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        vpshufd xmm7,xmm6,232
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        vpslldq xmm7,xmm7,8
+        mov     edi,DWORD [24+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        vpaddd  xmm2,xmm2,xmm7
+        add     edx,DWORD [16+esp]
+        and     eax,ebx
+        add     edx,DWORD [76+esp]
+        vpaddd  xmm6,xmm2,[32+ebp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [esp]
+        lea     eax,[ecx*1+eax]
+        vmovdqa [64+esp],xmm6
+        vpalignr        xmm4,xmm0,xmm3,4
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [esp],edx
+        vpalignr        xmm7,xmm2,xmm1,4
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [8+esp]
+        vpsrld  xmm6,xmm4,7
+        xor     ecx,edi
+        and     edx,DWORD [4+esp]
+        mov     DWORD [16+esp],eax
+        vpaddd  xmm3,xmm3,xmm7
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        vpsrld  xmm7,xmm4,3
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        vpslld  xmm5,xmm4,14
+        mov     edi,DWORD [20+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        vpxor   xmm4,xmm7,xmm6
+        add     edx,DWORD [12+esp]
+        and     ebx,eax
+        add     edx,DWORD [80+esp]
+        vpshufd xmm7,xmm2,250
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [28+esp]
+        vpsrld  xmm6,xmm6,11
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpxor   xmm4,xmm4,xmm5
+        mov     DWORD [28+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpslld  xmm5,xmm5,11
+        andn    esi,edx,DWORD [4+esp]
+        xor     ecx,edi
+        and     edx,DWORD [esp]
+        vpxor   xmm4,xmm4,xmm6
+        mov     DWORD [12+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        vpsrld  xmm6,xmm7,10
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        vpxor   xmm4,xmm4,xmm5
+        mov     edi,DWORD [16+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        vpsrlq  xmm5,xmm7,17
+        add     edx,DWORD [8+esp]
+        and     eax,ebx
+        add     edx,DWORD [84+esp]
+        vpaddd  xmm3,xmm3,xmm4
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [24+esp]
+        vpxor   xmm6,xmm6,xmm5
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpsrlq  xmm7,xmm7,19
+        mov     DWORD [24+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpxor   xmm6,xmm6,xmm7
+        andn    esi,edx,DWORD [esp]
+        xor     ecx,edi
+        and     edx,DWORD [28+esp]
+        vpshufd xmm7,xmm6,132
+        mov     DWORD [8+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        vpsrldq xmm7,xmm7,8
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        vpaddd  xmm3,xmm3,xmm7
+        mov     edi,DWORD [12+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        vpshufd xmm7,xmm3,80
+        add     edx,DWORD [4+esp]
+        and     ebx,eax
+        add     edx,DWORD [88+esp]
+        vpsrld  xmm6,xmm7,10
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [20+esp]
+        vpsrlq  xmm5,xmm7,17
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        vpxor   xmm6,xmm6,xmm5
+        mov     DWORD [20+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        vpsrlq  xmm7,xmm7,19
+        andn    esi,edx,DWORD [28+esp]
+        xor     ecx,edi
+        and     edx,DWORD [24+esp]
+        vpxor   xmm6,xmm6,xmm7
+        mov     DWORD [4+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        vpshufd xmm7,xmm6,232
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        vpslldq xmm7,xmm7,8
+        mov     edi,DWORD [8+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        vpaddd  xmm3,xmm3,xmm7
+        add     edx,DWORD [esp]
+        and     eax,ebx
+        add     edx,DWORD [92+esp]
+        vpaddd  xmm6,xmm3,[48+ebp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [16+esp]
+        lea     eax,[ecx*1+eax]
+        vmovdqa [80+esp],xmm6
+        cmp     DWORD [64+ebp],66051
+        jne     NEAR L$018avx_bmi_00_47
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [16+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [24+esp]
+        xor     ecx,edi
+        and     edx,DWORD [20+esp]
+        mov     DWORD [esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        mov     edi,DWORD [4+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        and     ebx,eax
+        add     edx,DWORD [32+esp]
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [12+esp]
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [12+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [20+esp]
+        xor     ecx,edi
+        and     edx,DWORD [16+esp]
+        mov     DWORD [28+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        mov     edi,DWORD [esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        and     eax,ebx
+        add     edx,DWORD [36+esp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [8+esp]
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [8+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [16+esp]
+        xor     ecx,edi
+        and     edx,DWORD [12+esp]
+        mov     DWORD [24+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        mov     edi,DWORD [28+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        and     ebx,eax
+        add     edx,DWORD [40+esp]
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [4+esp]
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [4+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [12+esp]
+        xor     ecx,edi
+        and     edx,DWORD [8+esp]
+        mov     DWORD [20+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        mov     edi,DWORD [24+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        and     eax,ebx
+        add     edx,DWORD [44+esp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [esp]
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [8+esp]
+        xor     ecx,edi
+        and     edx,DWORD [4+esp]
+        mov     DWORD [16+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        mov     edi,DWORD [20+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        and     ebx,eax
+        add     edx,DWORD [48+esp]
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [28+esp]
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [28+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [4+esp]
+        xor     ecx,edi
+        and     edx,DWORD [esp]
+        mov     DWORD [12+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        mov     edi,DWORD [16+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        and     eax,ebx
+        add     edx,DWORD [52+esp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [24+esp]
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [24+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [esp]
+        xor     ecx,edi
+        and     edx,DWORD [28+esp]
+        mov     DWORD [8+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        mov     edi,DWORD [12+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        and     ebx,eax
+        add     edx,DWORD [56+esp]
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [20+esp]
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [20+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [28+esp]
+        xor     ecx,edi
+        and     edx,DWORD [24+esp]
+        mov     DWORD [4+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        mov     edi,DWORD [8+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        and     eax,ebx
+        add     edx,DWORD [60+esp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [16+esp]
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [16+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [24+esp]
+        xor     ecx,edi
+        and     edx,DWORD [20+esp]
+        mov     DWORD [esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        mov     edi,DWORD [4+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        add     edx,DWORD [28+esp]
+        and     ebx,eax
+        add     edx,DWORD [64+esp]
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [12+esp]
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [12+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [20+esp]
+        xor     ecx,edi
+        and     edx,DWORD [16+esp]
+        mov     DWORD [28+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        mov     edi,DWORD [esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        add     edx,DWORD [24+esp]
+        and     eax,ebx
+        add     edx,DWORD [68+esp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [8+esp]
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [8+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [16+esp]
+        xor     ecx,edi
+        and     edx,DWORD [12+esp]
+        mov     DWORD [24+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        mov     edi,DWORD [28+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        add     edx,DWORD [20+esp]
+        and     ebx,eax
+        add     edx,DWORD [72+esp]
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [4+esp]
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [4+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [12+esp]
+        xor     ecx,edi
+        and     edx,DWORD [8+esp]
+        mov     DWORD [20+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        mov     edi,DWORD [24+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        add     edx,DWORD [16+esp]
+        and     eax,ebx
+        add     edx,DWORD [76+esp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [esp]
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [8+esp]
+        xor     ecx,edi
+        and     edx,DWORD [4+esp]
+        mov     DWORD [16+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        mov     edi,DWORD [20+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        add     edx,DWORD [12+esp]
+        and     ebx,eax
+        add     edx,DWORD [80+esp]
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [28+esp]
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [28+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [4+esp]
+        xor     ecx,edi
+        and     edx,DWORD [esp]
+        mov     DWORD [12+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        mov     edi,DWORD [16+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        add     edx,DWORD [8+esp]
+        and     eax,ebx
+        add     edx,DWORD [84+esp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [24+esp]
+        lea     eax,[ecx*1+eax]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [24+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [esp]
+        xor     ecx,edi
+        and     edx,DWORD [28+esp]
+        mov     DWORD [8+esp],eax
+        or      edx,esi
+        rorx    edi,eax,2
+        rorx    esi,eax,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,eax,22
+        xor     esi,edi
+        mov     edi,DWORD [12+esp]
+        xor     ecx,esi
+        xor     eax,edi
+        add     edx,DWORD [4+esp]
+        and     ebx,eax
+        add     edx,DWORD [88+esp]
+        xor     ebx,edi
+        add     ecx,edx
+        add     edx,DWORD [20+esp]
+        lea     ebx,[ecx*1+ebx]
+        rorx    ecx,edx,6
+        rorx    esi,edx,11
+        mov     DWORD [20+esp],edx
+        rorx    edi,edx,25
+        xor     ecx,esi
+        andn    esi,edx,DWORD [28+esp]
+        xor     ecx,edi
+        and     edx,DWORD [24+esp]
+        mov     DWORD [4+esp],ebx
+        or      edx,esi
+        rorx    edi,ebx,2
+        rorx    esi,ebx,13
+        lea     edx,[ecx*1+edx]
+        rorx    ecx,ebx,22
+        xor     esi,edi
+        mov     edi,DWORD [8+esp]
+        xor     ecx,esi
+        xor     ebx,edi
+        add     edx,DWORD [esp]
+        and     eax,ebx
+        add     edx,DWORD [92+esp]
+        xor     eax,edi
+        add     ecx,edx
+        add     edx,DWORD [16+esp]
+        lea     eax,[ecx*1+eax]
+        mov     esi,DWORD [96+esp]
+        xor     ebx,edi
+        mov     ecx,DWORD [12+esp]
+        add     eax,DWORD [esi]
+        add     ebx,DWORD [4+esi]
+        add     edi,DWORD [8+esi]
+        add     ecx,DWORD [12+esi]
+        mov     DWORD [esi],eax
+        mov     DWORD [4+esi],ebx
+        mov     DWORD [8+esi],edi
+        mov     DWORD [12+esi],ecx
+        mov     DWORD [4+esp],ebx
+        xor     ebx,edi
+        mov     DWORD [8+esp],edi
+        mov     DWORD [12+esp],ecx
+        mov     edi,DWORD [20+esp]
+        mov     ecx,DWORD [24+esp]
+        add     edx,DWORD [16+esi]
+        add     edi,DWORD [20+esi]
+        add     ecx,DWORD [24+esi]
+        mov     DWORD [16+esi],edx
+        mov     DWORD [20+esi],edi
+        mov     DWORD [20+esp],edi
+        mov     edi,DWORD [28+esp]
+        mov     DWORD [24+esi],ecx
+        add     edi,DWORD [28+esi]
+        mov     DWORD [24+esp],ecx
+        mov     DWORD [28+esi],edi
+        mov     DWORD [28+esp],edi
+        mov     edi,DWORD [100+esp]
+        vmovdqa xmm7,[64+ebp]
+        sub     ebp,192
+        cmp     edi,DWORD [104+esp]
+        jb      NEAR L$017grand_avx_bmi
+        mov     esp,DWORD [108+esp]
+        vzeroall
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+segment .bss
+common  _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm
new file mode 100644
index 0000000000..f80f1cca53
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm
@@ -0,0 +1,2842 @@
+; Copyright 2007-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+;extern _OPENSSL_ia32cap_P
+global  _sha512_block_data_order
+align   16
+_sha512_block_data_order:
+L$_sha512_block_data_order_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     esi,DWORD [20+esp]
+        mov     edi,DWORD [24+esp]
+        mov     eax,DWORD [28+esp]
+        mov     ebx,esp
+        call    L$000pic_point
+L$000pic_point:
+        pop     ebp
+        lea     ebp,[(L$001K512-L$000pic_point)+ebp]
+        sub     esp,16
+        and     esp,-64
+        shl     eax,7
+        add     eax,edi
+        mov     DWORD [esp],esi
+        mov     DWORD [4+esp],edi
+        mov     DWORD [8+esp],eax
+        mov     DWORD [12+esp],ebx
+        lea     edx,[_OPENSSL_ia32cap_P]
+        mov     ecx,DWORD [edx]
+        test    ecx,67108864
+        jz      NEAR L$002loop_x86
+        mov     edx,DWORD [4+edx]
+        movq    mm0,[esi]
+        and     ecx,16777216
+        movq    mm1,[8+esi]
+        and     edx,512
+        movq    mm2,[16+esi]
+        or      ecx,edx
+        movq    mm3,[24+esi]
+        movq    mm4,[32+esi]
+        movq    mm5,[40+esi]
+        movq    mm6,[48+esi]
+        movq    mm7,[56+esi]
+        cmp     ecx,16777728
+        je      NEAR L$003SSSE3
+        sub     esp,80
+        jmp     NEAR L$004loop_sse2
+align   16
+L$004loop_sse2:
+        movq    [8+esp],mm1
+        movq    [16+esp],mm2
+        movq    [24+esp],mm3
+        movq    [40+esp],mm5
+        movq    [48+esp],mm6
+        pxor    mm2,mm1
+        movq    [56+esp],mm7
+        movq    mm3,mm0
+        mov     eax,DWORD [edi]
+        mov     ebx,DWORD [4+edi]
+        add     edi,8
+        mov     edx,15
+        bswap   eax
+        bswap   ebx
+        jmp     NEAR L$00500_14_sse2
+align   16
+L$00500_14_sse2:
+        movd    mm1,eax
+        mov     eax,DWORD [edi]
+        movd    mm7,ebx
+        mov     ebx,DWORD [4+edi]
+        add     edi,8
+        bswap   eax
+        bswap   ebx
+        punpckldq       mm7,mm1
+        movq    mm1,mm4
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [32+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        movq    mm0,mm3
+        movq    [72+esp],mm7
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[56+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        paddq   mm7,[ebp]
+        pxor    mm3,mm4
+        movq    mm4,[24+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[8+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        sub     esp,8
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[40+esp]
+        paddq   mm3,mm2
+        movq    mm2,mm0
+        add     ebp,8
+        paddq   mm3,mm6
+        movq    mm6,[48+esp]
+        dec     edx
+        jnz     NEAR L$00500_14_sse2
+        movd    mm1,eax
+        movd    mm7,ebx
+        punpckldq       mm7,mm1
+        movq    mm1,mm4
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [32+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        movq    mm0,mm3
+        movq    [72+esp],mm7
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[56+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        paddq   mm7,[ebp]
+        pxor    mm3,mm4
+        movq    mm4,[24+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[8+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        sub     esp,8
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm7,[192+esp]
+        paddq   mm3,mm2
+        movq    mm2,mm0
+        add     ebp,8
+        paddq   mm3,mm6
+        pxor    mm0,mm0
+        mov     edx,32
+        jmp     NEAR L$00616_79_sse2
+align   16
+L$00616_79_sse2:
+        movq    mm5,[88+esp]
+        movq    mm1,mm7
+        psrlq   mm7,1
+        movq    mm6,mm5
+        psrlq   mm5,6
+        psllq   mm1,56
+        paddq   mm0,mm3
+        movq    mm3,mm7
+        psrlq   mm7,6
+        pxor    mm3,mm1
+        psllq   mm1,7
+        pxor    mm3,mm7
+        psrlq   mm7,1
+        pxor    mm3,mm1
+        movq    mm1,mm5
+        psrlq   mm5,13
+        pxor    mm7,mm3
+        psllq   mm6,3
+        pxor    mm1,mm5
+        paddq   mm7,[200+esp]
+        pxor    mm1,mm6
+        psrlq   mm5,42
+        paddq   mm7,[128+esp]
+        pxor    mm1,mm5
+        psllq   mm6,42
+        movq    mm5,[40+esp]
+        pxor    mm1,mm6
+        movq    mm6,[48+esp]
+        paddq   mm7,mm1
+        movq    mm1,mm4
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [32+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        movq    [72+esp],mm7
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[56+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        paddq   mm7,[ebp]
+        pxor    mm3,mm4
+        movq    mm4,[24+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[8+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        sub     esp,8
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm7,[192+esp]
+        paddq   mm2,mm6
+        add     ebp,8
+        movq    mm5,[88+esp]
+        movq    mm1,mm7
+        psrlq   mm7,1
+        movq    mm6,mm5
+        psrlq   mm5,6
+        psllq   mm1,56
+        paddq   mm2,mm3
+        movq    mm3,mm7
+        psrlq   mm7,6
+        pxor    mm3,mm1
+        psllq   mm1,7
+        pxor    mm3,mm7
+        psrlq   mm7,1
+        pxor    mm3,mm1
+        movq    mm1,mm5
+        psrlq   mm5,13
+        pxor    mm7,mm3
+        psllq   mm6,3
+        pxor    mm1,mm5
+        paddq   mm7,[200+esp]
+        pxor    mm1,mm6
+        psrlq   mm5,42
+        paddq   mm7,[128+esp]
+        pxor    mm1,mm5
+        psllq   mm6,42
+        movq    mm5,[40+esp]
+        pxor    mm1,mm6
+        movq    mm6,[48+esp]
+        paddq   mm7,mm1
+        movq    mm1,mm4
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [32+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        movq    [72+esp],mm7
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[56+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        paddq   mm7,[ebp]
+        pxor    mm3,mm4
+        movq    mm4,[24+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[8+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        sub     esp,8
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm7,[192+esp]
+        paddq   mm0,mm6
+        add     ebp,8
+        dec     edx
+        jnz     NEAR L$00616_79_sse2
+        paddq   mm0,mm3
+        movq    mm1,[8+esp]
+        movq    mm3,[24+esp]
+        movq    mm5,[40+esp]
+        movq    mm6,[48+esp]
+        movq    mm7,[56+esp]
+        pxor    mm2,mm1
+        paddq   mm0,[esi]
+        paddq   mm1,[8+esi]
+        paddq   mm2,[16+esi]
+        paddq   mm3,[24+esi]
+        paddq   mm4,[32+esi]
+        paddq   mm5,[40+esi]
+        paddq   mm6,[48+esi]
+        paddq   mm7,[56+esi]
+        mov     eax,640
+        movq    [esi],mm0
+        movq    [8+esi],mm1
+        movq    [16+esi],mm2
+        movq    [24+esi],mm3
+        movq    [32+esi],mm4
+        movq    [40+esi],mm5
+        movq    [48+esi],mm6
+        movq    [56+esi],mm7
+        lea     esp,[eax*1+esp]
+        sub     ebp,eax
+        cmp     edi,DWORD [88+esp]
+        jb      NEAR L$004loop_sse2
+        mov     esp,DWORD [92+esp]
+        emms
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   32
+L$003SSSE3:
+        lea     edx,[esp-64]
+        sub     esp,256
+        movdqa  xmm1,[640+ebp]
+        movdqu  xmm0,[edi]
+db      102,15,56,0,193
+        movdqa  xmm3,[ebp]
+        movdqa  xmm2,xmm1
+        movdqu  xmm1,[16+edi]
+        paddq   xmm3,xmm0
+db      102,15,56,0,202
+        movdqa  [edx-128],xmm3
+        movdqa  xmm4,[16+ebp]
+        movdqa  xmm3,xmm2
+        movdqu  xmm2,[32+edi]
+        paddq   xmm4,xmm1
+db      102,15,56,0,211
+        movdqa  [edx-112],xmm4
+        movdqa  xmm5,[32+ebp]
+        movdqa  xmm4,xmm3
+        movdqu  xmm3,[48+edi]
+        paddq   xmm5,xmm2
+db      102,15,56,0,220
+        movdqa  [edx-96],xmm5
+        movdqa  xmm6,[48+ebp]
+        movdqa  xmm5,xmm4
+        movdqu  xmm4,[64+edi]
+        paddq   xmm6,xmm3
+db      102,15,56,0,229
+        movdqa  [edx-80],xmm6
+        movdqa  xmm7,[64+ebp]
+        movdqa  xmm6,xmm5
+        movdqu  xmm5,[80+edi]
+        paddq   xmm7,xmm4
+db      102,15,56,0,238
+        movdqa  [edx-64],xmm7
+        movdqa  [edx],xmm0
+        movdqa  xmm0,[80+ebp]
+        movdqa  xmm7,xmm6
+        movdqu  xmm6,[96+edi]
+        paddq   xmm0,xmm5
+db      102,15,56,0,247
+        movdqa  [edx-48],xmm0
+        movdqa  [16+edx],xmm1
+        movdqa  xmm1,[96+ebp]
+        movdqa  xmm0,xmm7
+        movdqu  xmm7,[112+edi]
+        paddq   xmm1,xmm6
+db      102,15,56,0,248
+        movdqa  [edx-32],xmm1
+        movdqa  [32+edx],xmm2
+        movdqa  xmm2,[112+ebp]
+        movdqa  xmm0,[edx]
+        paddq   xmm2,xmm7
+        movdqa  [edx-16],xmm2
+        nop
+align   32
+L$007loop_ssse3:
+        movdqa  xmm2,[16+edx]
+        movdqa  [48+edx],xmm3
+        lea     ebp,[128+ebp]
+        movq    [8+esp],mm1
+        mov     ebx,edi
+        movq    [16+esp],mm2
+        lea     edi,[128+edi]
+        movq    [24+esp],mm3
+        cmp     edi,eax
+        movq    [40+esp],mm5
+        cmovb   ebx,edi
+        movq    [48+esp],mm6
+        mov     ecx,4
+        pxor    mm2,mm1
+        movq    [56+esp],mm7
+        pxor    mm3,mm3
+        jmp     NEAR L$00800_47_ssse3
+align   32
+L$00800_47_ssse3:
+        movdqa  xmm3,xmm5
+        movdqa  xmm1,xmm2
+db      102,15,58,15,208,8
+        movdqa  [edx],xmm4
+db      102,15,58,15,220,8
+        movdqa  xmm4,xmm2
+        psrlq   xmm2,7
+        paddq   xmm0,xmm3
+        movdqa  xmm3,xmm4
+        psrlq   xmm4,1
+        psllq   xmm3,56
+        pxor    xmm2,xmm4
+        psrlq   xmm4,7
+        pxor    xmm2,xmm3
+        psllq   xmm3,7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,xmm7
+        pxor    xmm2,xmm3
+        movdqa  xmm3,xmm7
+        psrlq   xmm4,6
+        paddq   xmm0,xmm2
+        movdqa  xmm2,xmm7
+        psrlq   xmm3,19
+        psllq   xmm2,3
+        pxor    xmm4,xmm3
+        psrlq   xmm3,42
+        pxor    xmm4,xmm2
+        psllq   xmm2,42
+        pxor    xmm4,xmm3
+        movdqa  xmm3,[32+edx]
+        pxor    xmm4,xmm2
+        movdqa  xmm2,[ebp]
+        movq    mm1,mm4
+        paddq   xmm0,xmm4
+        movq    mm7,[edx-128]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [32+esp],mm4
+        paddq   xmm2,xmm0
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[56+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[24+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[8+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[32+esp]
+        paddq   mm2,mm6
+        movq    mm6,[40+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-120]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [24+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [56+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[48+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[16+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[24+esp]
+        paddq   mm0,mm6
+        movq    mm6,[32+esp]
+        movdqa  [edx-128],xmm2
+        movdqa  xmm4,xmm6
+        movdqa  xmm2,xmm3
+db      102,15,58,15,217,8
+        movdqa  [16+edx],xmm5
+db      102,15,58,15,229,8
+        movdqa  xmm5,xmm3
+        psrlq   xmm3,7
+        paddq   xmm1,xmm4
+        movdqa  xmm4,xmm5
+        psrlq   xmm5,1
+        psllq   xmm4,56
+        pxor    xmm3,xmm5
+        psrlq   xmm5,7
+        pxor    xmm3,xmm4
+        psllq   xmm4,7
+        pxor    xmm3,xmm5
+        movdqa  xmm5,xmm0
+        pxor    xmm3,xmm4
+        movdqa  xmm4,xmm0
+        psrlq   xmm5,6
+        paddq   xmm1,xmm3
+        movdqa  xmm3,xmm0
+        psrlq   xmm4,19
+        psllq   xmm3,3
+        pxor    xmm5,xmm4
+        psrlq   xmm4,42
+        pxor    xmm5,xmm3
+        psllq   xmm3,42
+        pxor    xmm5,xmm4
+        movdqa  xmm4,[48+edx]
+        pxor    xmm5,xmm3
+        movdqa  xmm3,[16+ebp]
+        movq    mm1,mm4
+        paddq   xmm1,xmm5
+        movq    mm7,[edx-112]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [16+esp],mm4
+        paddq   xmm3,xmm1
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [48+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[40+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[8+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[56+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[16+esp]
+        paddq   mm2,mm6
+        movq    mm6,[24+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-104]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [8+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [40+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[32+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[48+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[8+esp]
+        paddq   mm0,mm6
+        movq    mm6,[16+esp]
+        movdqa  [edx-112],xmm3
+        movdqa  xmm5,xmm7
+        movdqa  xmm3,xmm4
+db      102,15,58,15,226,8
+        movdqa  [32+edx],xmm6
+db      102,15,58,15,238,8
+        movdqa  xmm6,xmm4
+        psrlq   xmm4,7
+        paddq   xmm2,xmm5
+        movdqa  xmm5,xmm6
+        psrlq   xmm6,1
+        psllq   xmm5,56
+        pxor    xmm4,xmm6
+        psrlq   xmm6,7
+        pxor    xmm4,xmm5
+        psllq   xmm5,7
+        pxor    xmm4,xmm6
+        movdqa  xmm6,xmm1
+        pxor    xmm4,xmm5
+        movdqa  xmm5,xmm1
+        psrlq   xmm6,6
+        paddq   xmm2,xmm4
+        movdqa  xmm4,xmm1
+        psrlq   xmm5,19
+        psllq   xmm4,3
+        pxor    xmm6,xmm5
+        psrlq   xmm5,42
+        pxor    xmm6,xmm4
+        psllq   xmm4,42
+        pxor    xmm6,xmm5
+        movdqa  xmm5,[edx]
+        pxor    xmm6,xmm4
+        movdqa  xmm4,[32+ebp]
+        movq    mm1,mm4
+        paddq   xmm2,xmm6
+        movq    mm7,[edx-96]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [esp],mm4
+        paddq   xmm4,xmm2
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [32+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[24+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[56+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[40+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[esp]
+        paddq   mm2,mm6
+        movq    mm6,[8+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-88]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [56+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [24+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[16+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[48+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[32+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[56+esp]
+        paddq   mm0,mm6
+        movq    mm6,[esp]
+        movdqa  [edx-96],xmm4
+        movdqa  xmm6,xmm0
+        movdqa  xmm4,xmm5
+db      102,15,58,15,235,8
+        movdqa  [48+edx],xmm7
+db      102,15,58,15,247,8
+        movdqa  xmm7,xmm5
+        psrlq   xmm5,7
+        paddq   xmm3,xmm6
+        movdqa  xmm6,xmm7
+        psrlq   xmm7,1
+        psllq   xmm6,56
+        pxor    xmm5,xmm7
+        psrlq   xmm7,7
+        pxor    xmm5,xmm6
+        psllq   xmm6,7
+        pxor    xmm5,xmm7
+        movdqa  xmm7,xmm2
+        pxor    xmm5,xmm6
+        movdqa  xmm6,xmm2
+        psrlq   xmm7,6
+        paddq   xmm3,xmm5
+        movdqa  xmm5,xmm2
+        psrlq   xmm6,19
+        psllq   xmm5,3
+        pxor    xmm7,xmm6
+        psrlq   xmm6,42
+        pxor    xmm7,xmm5
+        psllq   xmm5,42
+        pxor    xmm7,xmm6
+        movdqa  xmm6,[16+edx]
+        pxor    xmm7,xmm5
+        movdqa  xmm5,[48+ebp]
+        movq    mm1,mm4
+        paddq   xmm3,xmm7
+        movq    mm7,[edx-80]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [48+esp],mm4
+        paddq   xmm5,xmm3
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [16+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[8+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[40+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[24+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[48+esp]
+        paddq   mm2,mm6
+        movq    mm6,[56+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-72]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [40+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [8+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[32+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[16+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[40+esp]
+        paddq   mm0,mm6
+        movq    mm6,[48+esp]
+        movdqa  [edx-80],xmm5
+        movdqa  xmm7,xmm1
+        movdqa  xmm5,xmm6
+db      102,15,58,15,244,8
+        movdqa  [edx],xmm0
+db      102,15,58,15,248,8
+        movdqa  xmm0,xmm6
+        psrlq   xmm6,7
+        paddq   xmm4,xmm7
+        movdqa  xmm7,xmm0
+        psrlq   xmm0,1
+        psllq   xmm7,56
+        pxor    xmm6,xmm0
+        psrlq   xmm0,7
+        pxor    xmm6,xmm7
+        psllq   xmm7,7
+        pxor    xmm6,xmm0
+        movdqa  xmm0,xmm3
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm3
+        psrlq   xmm0,6
+        paddq   xmm4,xmm6
+        movdqa  xmm6,xmm3
+        psrlq   xmm7,19
+        psllq   xmm6,3
+        pxor    xmm0,xmm7
+        psrlq   xmm7,42
+        pxor    xmm0,xmm6
+        psllq   xmm6,42
+        pxor    xmm0,xmm7
+        movdqa  xmm7,[32+edx]
+        pxor    xmm0,xmm6
+        movdqa  xmm6,[64+ebp]
+        movq    mm1,mm4
+        paddq   xmm4,xmm0
+        movq    mm7,[edx-64]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [32+esp],mm4
+        paddq   xmm6,xmm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[56+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[24+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[8+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[32+esp]
+        paddq   mm2,mm6
+        movq    mm6,[40+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-56]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [24+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [56+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[48+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[16+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[24+esp]
+        paddq   mm0,mm6
+        movq    mm6,[32+esp]
+        movdqa  [edx-64],xmm6
+        movdqa  xmm0,xmm2
+        movdqa  xmm6,xmm7
+db      102,15,58,15,253,8
+        movdqa  [16+edx],xmm1
+db      102,15,58,15,193,8
+        movdqa  xmm1,xmm7
+        psrlq   xmm7,7
+        paddq   xmm5,xmm0
+        movdqa  xmm0,xmm1
+        psrlq   xmm1,1
+        psllq   xmm0,56
+        pxor    xmm7,xmm1
+        psrlq   xmm1,7
+        pxor    xmm7,xmm0
+        psllq   xmm0,7
+        pxor    xmm7,xmm1
+        movdqa  xmm1,xmm4
+        pxor    xmm7,xmm0
+        movdqa  xmm0,xmm4
+        psrlq   xmm1,6
+        paddq   xmm5,xmm7
+        movdqa  xmm7,xmm4
+        psrlq   xmm0,19
+        psllq   xmm7,3
+        pxor    xmm1,xmm0
+        psrlq   xmm0,42
+        pxor    xmm1,xmm7
+        psllq   xmm7,42
+        pxor    xmm1,xmm0
+        movdqa  xmm0,[48+edx]
+        pxor    xmm1,xmm7
+        movdqa  xmm7,[80+ebp]
+        movq    mm1,mm4
+        paddq   xmm5,xmm1
+        movq    mm7,[edx-48]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [16+esp],mm4
+        paddq   xmm7,xmm5
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [48+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[40+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[8+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[56+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[16+esp]
+        paddq   mm2,mm6
+        movq    mm6,[24+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-40]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [8+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [40+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[32+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[48+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[8+esp]
+        paddq   mm0,mm6
+        movq    mm6,[16+esp]
+        movdqa  [edx-48],xmm7
+        movdqa  xmm1,xmm3
+        movdqa  xmm7,xmm0
+db      102,15,58,15,198,8
+        movdqa  [32+edx],xmm2
+db      102,15,58,15,202,8
+        movdqa  xmm2,xmm0
+        psrlq   xmm0,7
+        paddq   xmm6,xmm1
+        movdqa  xmm1,xmm2
+        psrlq   xmm2,1
+        psllq   xmm1,56
+        pxor    xmm0,xmm2
+        psrlq   xmm2,7
+        pxor    xmm0,xmm1
+        psllq   xmm1,7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,xmm5
+        pxor    xmm0,xmm1
+        movdqa  xmm1,xmm5
+        psrlq   xmm2,6
+        paddq   xmm6,xmm0
+        movdqa  xmm0,xmm5
+        psrlq   xmm1,19
+        psllq   xmm0,3
+        pxor    xmm2,xmm1
+        psrlq   xmm1,42
+        pxor    xmm2,xmm0
+        psllq   xmm0,42
+        pxor    xmm2,xmm1
+        movdqa  xmm1,[edx]
+        pxor    xmm2,xmm0
+        movdqa  xmm0,[96+ebp]
+        movq    mm1,mm4
+        paddq   xmm6,xmm2
+        movq    mm7,[edx-32]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [esp],mm4
+        paddq   xmm0,xmm6
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [32+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[24+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[56+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[40+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[esp]
+        paddq   mm2,mm6
+        movq    mm6,[8+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-24]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [56+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [24+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[16+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[48+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[32+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[56+esp]
+        paddq   mm0,mm6
+        movq    mm6,[esp]
+        movdqa  [edx-32],xmm0
+        movdqa  xmm2,xmm4
+        movdqa  xmm0,xmm1
+db      102,15,58,15,207,8
+        movdqa  [48+edx],xmm3
+db      102,15,58,15,211,8
+        movdqa  xmm3,xmm1
+        psrlq   xmm1,7
+        paddq   xmm7,xmm2
+        movdqa  xmm2,xmm3
+        psrlq   xmm3,1
+        psllq   xmm2,56
+        pxor    xmm1,xmm3
+        psrlq   xmm3,7
+        pxor    xmm1,xmm2
+        psllq   xmm2,7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,xmm6
+        pxor    xmm1,xmm2
+        movdqa  xmm2,xmm6
+        psrlq   xmm3,6
+        paddq   xmm7,xmm1
+        movdqa  xmm1,xmm6
+        psrlq   xmm2,19
+        psllq   xmm1,3
+        pxor    xmm3,xmm2
+        psrlq   xmm2,42
+        pxor    xmm3,xmm1
+        psllq   xmm1,42
+        pxor    xmm3,xmm2
+        movdqa  xmm2,[16+edx]
+        pxor    xmm3,xmm1
+        movdqa  xmm1,[112+ebp]
+        movq    mm1,mm4
+        paddq   xmm7,xmm3
+        movq    mm7,[edx-16]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [48+esp],mm4
+        paddq   xmm1,xmm7
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [16+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[8+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[40+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[24+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[48+esp]
+        paddq   mm2,mm6
+        movq    mm6,[56+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-8]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [40+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [8+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[32+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[16+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[40+esp]
+        paddq   mm0,mm6
+        movq    mm6,[48+esp]
+        movdqa  [edx-16],xmm1
+        lea     ebp,[128+ebp]
+        dec     ecx
+        jnz     NEAR L$00800_47_ssse3
+        movdqa  xmm1,[ebp]
+        lea     ebp,[ebp-640]
+        movdqu  xmm0,[ebx]
+db      102,15,56,0,193
+        movdqa  xmm3,[ebp]
+        movdqa  xmm2,xmm1
+        movdqu  xmm1,[16+ebx]
+        paddq   xmm3,xmm0
+db      102,15,56,0,202
+        movq    mm1,mm4
+        movq    mm7,[edx-128]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [32+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[56+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[24+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[8+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[32+esp]
+        paddq   mm2,mm6
+        movq    mm6,[40+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-120]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [24+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [56+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[48+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[16+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[24+esp]
+        paddq   mm0,mm6
+        movq    mm6,[32+esp]
+        movdqa  [edx-128],xmm3
+        movdqa  xmm4,[16+ebp]
+        movdqa  xmm3,xmm2
+        movdqu  xmm2,[32+ebx]
+        paddq   xmm4,xmm1
+db      102,15,56,0,211
+        movq    mm1,mm4
+        movq    mm7,[edx-112]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [16+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [48+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[40+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[8+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[56+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[16+esp]
+        paddq   mm2,mm6
+        movq    mm6,[24+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-104]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [8+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [40+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[32+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[48+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[8+esp]
+        paddq   mm0,mm6
+        movq    mm6,[16+esp]
+        movdqa  [edx-112],xmm4
+        movdqa  xmm5,[32+ebp]
+        movdqa  xmm4,xmm3
+        movdqu  xmm3,[48+ebx]
+        paddq   xmm5,xmm2
+db      102,15,56,0,220
+        movq    mm1,mm4
+        movq    mm7,[edx-96]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [32+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[24+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[56+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[40+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[esp]
+        paddq   mm2,mm6
+        movq    mm6,[8+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-88]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [56+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [24+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[16+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[48+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[32+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[56+esp]
+        paddq   mm0,mm6
+        movq    mm6,[esp]
+        movdqa  [edx-96],xmm5
+        movdqa  xmm6,[48+ebp]
+        movdqa  xmm5,xmm4
+        movdqu  xmm4,[64+ebx]
+        paddq   xmm6,xmm3
+db      102,15,56,0,229
+        movq    mm1,mm4
+        movq    mm7,[edx-80]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [48+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [16+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[8+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[40+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[24+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[48+esp]
+        paddq   mm2,mm6
+        movq    mm6,[56+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-72]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [40+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [8+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[32+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[16+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[40+esp]
+        paddq   mm0,mm6
+        movq    mm6,[48+esp]
+        movdqa  [edx-80],xmm6
+        movdqa  xmm7,[64+ebp]
+        movdqa  xmm6,xmm5
+        movdqu  xmm5,[80+ebx]
+        paddq   xmm7,xmm4
+db      102,15,56,0,238
+        movq    mm1,mm4
+        movq    mm7,[edx-64]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [32+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[56+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[24+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[8+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[32+esp]
+        paddq   mm2,mm6
+        movq    mm6,[40+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-56]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [24+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [56+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[48+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[16+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[24+esp]
+        paddq   mm0,mm6
+        movq    mm6,[32+esp]
+        movdqa  [edx-64],xmm7
+        movdqa  [edx],xmm0
+        movdqa  xmm0,[80+ebp]
+        movdqa  xmm7,xmm6
+        movdqu  xmm6,[96+ebx]
+        paddq   xmm0,xmm5
+db      102,15,56,0,247
+        movq    mm1,mm4
+        movq    mm7,[edx-48]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [16+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [48+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[40+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[8+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[56+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[16+esp]
+        paddq   mm2,mm6
+        movq    mm6,[24+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-40]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [8+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [40+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[32+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[48+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[8+esp]
+        paddq   mm0,mm6
+        movq    mm6,[16+esp]
+        movdqa  [edx-48],xmm0
+        movdqa  [16+edx],xmm1
+        movdqa  xmm1,[96+ebp]
+        movdqa  xmm0,xmm7
+        movdqu  xmm7,[112+ebx]
+        paddq   xmm1,xmm6
+db      102,15,56,0,248
+        movq    mm1,mm4
+        movq    mm7,[edx-32]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [32+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[24+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[56+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[40+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[esp]
+        paddq   mm2,mm6
+        movq    mm6,[8+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-24]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [56+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [24+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[16+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[48+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[32+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[56+esp]
+        paddq   mm0,mm6
+        movq    mm6,[esp]
+        movdqa  [edx-32],xmm1
+        movdqa  [32+edx],xmm2
+        movdqa  xmm2,[112+ebp]
+        movdqa  xmm0,[edx]
+        paddq   xmm2,xmm7
+        movq    mm1,mm4
+        movq    mm7,[edx-16]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [48+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm0,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [16+esp],mm0
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[8+esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[40+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm0
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm0
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[24+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm2,mm0
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        pxor    mm6,mm7
+        movq    mm5,[48+esp]
+        paddq   mm2,mm6
+        movq    mm6,[56+esp]
+        movq    mm1,mm4
+        movq    mm7,[edx-8]
+        pxor    mm5,mm6
+        psrlq   mm1,14
+        movq    [40+esp],mm4
+        pand    mm5,mm4
+        psllq   mm4,23
+        paddq   mm2,mm3
+        movq    mm3,mm1
+        psrlq   mm1,4
+        pxor    mm5,mm6
+        pxor    mm3,mm4
+        psllq   mm4,23
+        pxor    mm3,mm1
+        movq    [8+esp],mm2
+        paddq   mm7,mm5
+        pxor    mm3,mm4
+        psrlq   mm1,23
+        paddq   mm7,[esp]
+        pxor    mm3,mm1
+        psllq   mm4,4
+        pxor    mm3,mm4
+        movq    mm4,[32+esp]
+        paddq   mm3,mm7
+        movq    mm5,mm2
+        psrlq   mm5,28
+        paddq   mm4,mm3
+        movq    mm6,mm2
+        movq    mm7,mm5
+        psllq   mm6,25
+        movq    mm1,[16+esp]
+        psrlq   mm5,6
+        pxor    mm7,mm6
+        psllq   mm6,5
+        pxor    mm7,mm5
+        pxor    mm2,mm1
+        psrlq   mm5,5
+        pxor    mm7,mm6
+        pand    mm0,mm2
+        psllq   mm6,6
+        pxor    mm7,mm5
+        pxor    mm0,mm1
+        pxor    mm6,mm7
+        movq    mm5,[40+esp]
+        paddq   mm0,mm6
+        movq    mm6,[48+esp]
+        movdqa  [edx-16],xmm2
+        movq    mm1,[8+esp]
+        paddq   mm0,mm3
+        movq    mm3,[24+esp]
+        movq    mm7,[56+esp]
+        pxor    mm2,mm1
+        paddq   mm0,[esi]
+        paddq   mm1,[8+esi]
+        paddq   mm2,[16+esi]
+        paddq   mm3,[24+esi]
+        paddq   mm4,[32+esi]
+        paddq   mm5,[40+esi]
+        paddq   mm6,[48+esi]
+        paddq   mm7,[56+esi]
+        movq    [esi],mm0
+        movq    [8+esi],mm1
+        movq    [16+esi],mm2
+        movq    [24+esi],mm3
+        movq    [32+esi],mm4
+        movq    [40+esi],mm5
+        movq    [48+esi],mm6
+        movq    [56+esi],mm7
+        cmp     edi,eax
+        jb      NEAR L$007loop_ssse3
+        mov     esp,DWORD [76+edx]
+        emms
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   16
+L$002loop_x86:
+        mov     eax,DWORD [edi]
+        mov     ebx,DWORD [4+edi]
+        mov     ecx,DWORD [8+edi]
+        mov     edx,DWORD [12+edi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        push    eax
+        push    ebx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [16+edi]
+        mov     ebx,DWORD [20+edi]
+        mov     ecx,DWORD [24+edi]
+        mov     edx,DWORD [28+edi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        push    eax
+        push    ebx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [32+edi]
+        mov     ebx,DWORD [36+edi]
+        mov     ecx,DWORD [40+edi]
+        mov     edx,DWORD [44+edi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        push    eax
+        push    ebx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [48+edi]
+        mov     ebx,DWORD [52+edi]
+        mov     ecx,DWORD [56+edi]
+        mov     edx,DWORD [60+edi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        push    eax
+        push    ebx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [64+edi]
+        mov     ebx,DWORD [68+edi]
+        mov     ecx,DWORD [72+edi]
+        mov     edx,DWORD [76+edi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        push    eax
+        push    ebx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [80+edi]
+        mov     ebx,DWORD [84+edi]
+        mov     ecx,DWORD [88+edi]
+        mov     edx,DWORD [92+edi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        push    eax
+        push    ebx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [96+edi]
+        mov     ebx,DWORD [100+edi]
+        mov     ecx,DWORD [104+edi]
+        mov     edx,DWORD [108+edi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        push    eax
+        push    ebx
+        push    ecx
+        push    edx
+        mov     eax,DWORD [112+edi]
+        mov     ebx,DWORD [116+edi]
+        mov     ecx,DWORD [120+edi]
+        mov     edx,DWORD [124+edi]
+        bswap   eax
+        bswap   ebx
+        bswap   ecx
+        bswap   edx
+        push    eax
+        push    ebx
+        push    ecx
+        push    edx
+        add     edi,128
+        sub     esp,72
+        mov     DWORD [204+esp],edi
+        lea     edi,[8+esp]
+        mov     ecx,16
+dd      2784229001
+align   16
+L$00900_15_x86:
+        mov     ecx,DWORD [40+esp]
+        mov     edx,DWORD [44+esp]
+        mov     esi,ecx
+        shr     ecx,9
+        mov     edi,edx
+        shr     edx,9
+        mov     ebx,ecx
+        shl     esi,14
+        mov     eax,edx
+        shl     edi,14
+        xor     ebx,esi
+        shr     ecx,5
+        xor     eax,edi
+        shr     edx,5
+        xor     eax,ecx
+        shl     esi,4
+        xor     ebx,edx
+        shl     edi,4
+        xor     ebx,esi
+        shr     ecx,4
+        xor     eax,edi
+        shr     edx,4
+        xor     eax,ecx
+        shl     esi,5
+        xor     ebx,edx
+        shl     edi,5
+        xor     eax,esi
+        xor     ebx,edi
+        mov     ecx,DWORD [48+esp]
+        mov     edx,DWORD [52+esp]
+        mov     esi,DWORD [56+esp]
+        mov     edi,DWORD [60+esp]
+        add     eax,DWORD [64+esp]
+        adc     ebx,DWORD [68+esp]
+        xor     ecx,esi
+        xor     edx,edi
+        and     ecx,DWORD [40+esp]
+        and     edx,DWORD [44+esp]
+        add     eax,DWORD [192+esp]
+        adc     ebx,DWORD [196+esp]
+        xor     ecx,esi
+        xor     edx,edi
+        mov     esi,DWORD [ebp]
+        mov     edi,DWORD [4+ebp]
+        add     eax,ecx
+        adc     ebx,edx
+        mov     ecx,DWORD [32+esp]
+        mov     edx,DWORD [36+esp]
+        add     eax,esi
+        adc     ebx,edi
+        mov     DWORD [esp],eax
+        mov     DWORD [4+esp],ebx
+        add     eax,ecx
+        adc     ebx,edx
+        mov     ecx,DWORD [8+esp]
+        mov     edx,DWORD [12+esp]
+        mov     DWORD [32+esp],eax
+        mov     DWORD [36+esp],ebx
+        mov     esi,ecx
+        shr     ecx,2
+        mov     edi,edx
+        shr     edx,2
+        mov     ebx,ecx
+        shl     esi,4
+        mov     eax,edx
+        shl     edi,4
+        xor     ebx,esi
+        shr     ecx,5
+        xor     eax,edi
+        shr     edx,5
+        xor     ebx,ecx
+        shl     esi,21
+        xor     eax,edx
+        shl     edi,21
+        xor     eax,esi
+        shr     ecx,21
+        xor     ebx,edi
+        shr     edx,21
+        xor     eax,ecx
+        shl     esi,5
+        xor     ebx,edx
+        shl     edi,5
+        xor     eax,esi
+        xor     ebx,edi
+        mov     ecx,DWORD [8+esp]
+        mov     edx,DWORD [12+esp]
+        mov     esi,DWORD [16+esp]
+        mov     edi,DWORD [20+esp]
+        add     eax,DWORD [esp]
+        adc     ebx,DWORD [4+esp]
+        or      ecx,esi
+        or      edx,edi
+        and     ecx,DWORD [24+esp]
+        and     edx,DWORD [28+esp]
+        and     esi,DWORD [8+esp]
+        and     edi,DWORD [12+esp]
+        or      ecx,esi
+        or      edx,edi
+        add     eax,ecx
+        adc     ebx,edx
+        mov     DWORD [esp],eax
+        mov     DWORD [4+esp],ebx
+        mov     dl,BYTE [ebp]
+        sub     esp,8
+        lea     ebp,[8+ebp]
+        cmp     dl,148
+        jne     NEAR L$00900_15_x86
+align   16
+L$01016_79_x86:
+        mov     ecx,DWORD [312+esp]
+        mov     edx,DWORD [316+esp]
+        mov     esi,ecx
+        shr     ecx,1
+        mov     edi,edx
+        shr     edx,1
+        mov     eax,ecx
+        shl     esi,24
+        mov     ebx,edx
+        shl     edi,24
+        xor     ebx,esi
+        shr     ecx,6
+        xor     eax,edi
+        shr     edx,6
+        xor     eax,ecx
+        shl     esi,7
+        xor     ebx,edx
+        shl     edi,1
+        xor     ebx,esi
+        shr     ecx,1
+        xor     eax,edi
+        shr     edx,1
+        xor     eax,ecx
+        shl     edi,6
+        xor     ebx,edx
+        xor     eax,edi
+        mov     DWORD [esp],eax
+        mov     DWORD [4+esp],ebx
+        mov     ecx,DWORD [208+esp]
+        mov     edx,DWORD [212+esp]
+        mov     esi,ecx
+        shr     ecx,6
+        mov     edi,edx
+        shr     edx,6
+        mov     eax,ecx
+        shl     esi,3
+        mov     ebx,edx
+        shl     edi,3
+        xor     eax,esi
+        shr     ecx,13
+        xor     ebx,edi
+        shr     edx,13
+        xor     eax,ecx
+        shl     esi,10
+        xor     ebx,edx
+        shl     edi,10
+        xor     ebx,esi
+        shr     ecx,10
+        xor     eax,edi
+        shr     edx,10
+        xor     ebx,ecx
+        shl     edi,13
+        xor     eax,edx
+        xor     eax,edi
+        mov     ecx,DWORD [320+esp]
+        mov     edx,DWORD [324+esp]
+        add     eax,DWORD [esp]
+        adc     ebx,DWORD [4+esp]
+        mov     esi,DWORD [248+esp]
+        mov     edi,DWORD [252+esp]
+        add     eax,ecx
+        adc     ebx,edx
+        add     eax,esi
+        adc     ebx,edi
+        mov     DWORD [192+esp],eax
+        mov     DWORD [196+esp],ebx
+        mov     ecx,DWORD [40+esp]
+        mov     edx,DWORD [44+esp]
+        mov     esi,ecx
+        shr     ecx,9
+        mov     edi,edx
+        shr     edx,9
+        mov     ebx,ecx
+        shl     esi,14
+        mov     eax,edx
+        shl     edi,14
+        xor     ebx,esi
+        shr     ecx,5
+        xor     eax,edi
+        shr     edx,5
+        xor     eax,ecx
+        shl     esi,4
+        xor     ebx,edx
+        shl     edi,4
+        xor     ebx,esi
+        shr     ecx,4
+        xor     eax,edi
+        shr     edx,4
+        xor     eax,ecx
+        shl     esi,5
+        xor     ebx,edx
+        shl     edi,5
+        xor     eax,esi
+        xor     ebx,edi
+        mov     ecx,DWORD [48+esp]
+        mov     edx,DWORD [52+esp]
+        mov     esi,DWORD [56+esp]
+        mov     edi,DWORD [60+esp]
+        add     eax,DWORD [64+esp]
+        adc     ebx,DWORD [68+esp]
+        xor     ecx,esi
+        xor     edx,edi
+        and     ecx,DWORD [40+esp]
+        and     edx,DWORD [44+esp]
+        add     eax,DWORD [192+esp]
+        adc     ebx,DWORD [196+esp]
+        xor     ecx,esi
+        xor     edx,edi
+        mov     esi,DWORD [ebp]
+        mov     edi,DWORD [4+ebp]
+        add     eax,ecx
+        adc     ebx,edx
+        mov     ecx,DWORD [32+esp]
+        mov     edx,DWORD [36+esp]
+        add     eax,esi
+        adc     ebx,edi
+        mov     DWORD [esp],eax
+        mov     DWORD [4+esp],ebx
+        add     eax,ecx
+        adc     ebx,edx
+        mov     ecx,DWORD [8+esp]
+        mov     edx,DWORD [12+esp]
+        mov     DWORD [32+esp],eax
+        mov     DWORD [36+esp],ebx
+        mov     esi,ecx
+        shr     ecx,2
+        mov     edi,edx
+        shr     edx,2
+        mov     ebx,ecx
+        shl     esi,4
+        mov     eax,edx
+        shl     edi,4
+        xor     ebx,esi
+        shr     ecx,5
+        xor     eax,edi
+        shr     edx,5
+        xor     ebx,ecx
+        shl     esi,21
+        xor     eax,edx
+        shl     edi,21
+        xor     eax,esi
+        shr     ecx,21
+        xor     ebx,edi
+        shr     edx,21
+        xor     eax,ecx
+        shl     esi,5
+        xor     ebx,edx
+        shl     edi,5
+        xor     eax,esi
+        xor     ebx,edi
+        mov     ecx,DWORD [8+esp]
+        mov     edx,DWORD [12+esp]
+        mov     esi,DWORD [16+esp]
+        mov     edi,DWORD [20+esp]
+        add     eax,DWORD [esp]
+        adc     ebx,DWORD [4+esp]
+        or      ecx,esi
+        or      edx,edi
+        and     ecx,DWORD [24+esp]
+        and     edx,DWORD [28+esp]
+        and     esi,DWORD [8+esp]
+        and     edi,DWORD [12+esp]
+        or      ecx,esi
+        or      edx,edi
+        add     eax,ecx
+        adc     ebx,edx
+        mov     DWORD [esp],eax
+        mov     DWORD [4+esp],ebx
+        mov     dl,BYTE [ebp]
+        sub     esp,8
+        lea     ebp,[8+ebp]
+        cmp     dl,23
+        jne     NEAR L$01016_79_x86
+        mov     esi,DWORD [840+esp]
+        mov     edi,DWORD [844+esp]
+        mov     eax,DWORD [esi]
+        mov     ebx,DWORD [4+esi]
+        mov     ecx,DWORD [8+esi]
+        mov     edx,DWORD [12+esi]
+        add     eax,DWORD [8+esp]
+        adc     ebx,DWORD [12+esp]
+        mov     DWORD [esi],eax
+        mov     DWORD [4+esi],ebx
+        add     ecx,DWORD [16+esp]
+        adc     edx,DWORD [20+esp]
+        mov     DWORD [8+esi],ecx
+        mov     DWORD [12+esi],edx
+        mov     eax,DWORD [16+esi]
+        mov     ebx,DWORD [20+esi]
+        mov     ecx,DWORD [24+esi]
+        mov     edx,DWORD [28+esi]
+        add     eax,DWORD [24+esp]
+        adc     ebx,DWORD [28+esp]
+        mov     DWORD [16+esi],eax
+        mov     DWORD [20+esi],ebx
+        add     ecx,DWORD [32+esp]
+        adc     edx,DWORD [36+esp]
+        mov     DWORD [24+esi],ecx
+        mov     DWORD [28+esi],edx
+        mov     eax,DWORD [32+esi]
+        mov     ebx,DWORD [36+esi]
+        mov     ecx,DWORD [40+esi]
+        mov     edx,DWORD [44+esi]
+        add     eax,DWORD [40+esp]
+        adc     ebx,DWORD [44+esp]
+        mov     DWORD [32+esi],eax
+        mov     DWORD [36+esi],ebx
+        add     ecx,DWORD [48+esp]
+        adc     edx,DWORD [52+esp]
+        mov     DWORD [40+esi],ecx
+        mov     DWORD [44+esi],edx
+        mov     eax,DWORD [48+esi]
+        mov     ebx,DWORD [52+esi]
+        mov     ecx,DWORD [56+esi]
+        mov     edx,DWORD [60+esi]
+        add     eax,DWORD [56+esp]
+        adc     ebx,DWORD [60+esp]
+        mov     DWORD [48+esi],eax
+        mov     DWORD [52+esi],ebx
+        add     ecx,DWORD [64+esp]
+        adc     edx,DWORD [68+esp]
+        mov     DWORD [56+esi],ecx
+        mov     DWORD [60+esi],edx
+        add     esp,840
+        sub     ebp,640
+        cmp     edi,DWORD [8+esp]
+        jb      NEAR L$002loop_x86
+        mov     esp,DWORD [12+esp]
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+align   64
+L$001K512:
+dd      3609767458,1116352408
+dd      602891725,1899447441
+dd      3964484399,3049323471
+dd      2173295548,3921009573
+dd      4081628472,961987163
+dd      3053834265,1508970993
+dd      2937671579,2453635748
+dd      3664609560,2870763221
+dd      2734883394,3624381080
+dd      1164996542,310598401
+dd      1323610764,607225278
+dd      3590304994,1426881987
+dd      4068182383,1925078388
+dd      991336113,2162078206
+dd      633803317,2614888103
+dd      3479774868,3248222580
+dd      2666613458,3835390401
+dd      944711139,4022224774
+dd      2341262773,264347078
+dd      2007800933,604807628
+dd      1495990901,770255983
+dd      1856431235,1249150122
+dd      3175218132,1555081692
+dd      2198950837,1996064986
+dd      3999719339,2554220882
+dd      766784016,2821834349
+dd      2566594879,2952996808
+dd      3203337956,3210313671
+dd      1034457026,3336571891
+dd      2466948901,3584528711
+dd      3758326383,113926993
+dd      168717936,338241895
+dd      1188179964,666307205
+dd      1546045734,773529912
+dd      1522805485,1294757372
+dd      2643833823,1396182291
+dd      2343527390,1695183700
+dd      1014477480,1986661051
+dd      1206759142,2177026350
+dd      344077627,2456956037
+dd      1290863460,2730485921
+dd      3158454273,2820302411
+dd      3505952657,3259730800
+dd      106217008,3345764771
+dd      3606008344,3516065817
+dd      1432725776,3600352804
+dd      1467031594,4094571909
+dd      851169720,275423344
+dd      3100823752,430227734
+dd      1363258195,506948616
+dd      3750685593,659060556
+dd      3785050280,883997877
+dd      3318307427,958139571
+dd      3812723403,1322822218
+dd      2003034995,1537002063
+dd      3602036899,1747873779
+dd      1575990012,1955562222
+dd      1125592928,2024104815
+dd      2716904306,2227730452
+dd      442776044,2361852424
+dd      593698344,2428436474
+dd      3733110249,2756734187
+dd      2999351573,3204031479
+dd      3815920427,3329325298
+dd      3928383900,3391569614
+dd      566280711,3515267271
+dd      3454069534,3940187606
+dd      4000239992,4118630271
+dd      1914138554,116418474
+dd      2731055270,174292421
+dd      3203993006,289380356
+dd      320620315,460393269
+dd      587496836,685471733
+dd      1086792851,852142971
+dd      365543100,1017036298
+dd      2618297676,1126000580
+dd      3409855158,1288033470
+dd      4234509866,1501505948
+dd      987167468,1607167915
+dd      1246189591,1816402316
+dd      67438087,66051
+dd      202182159,134810123
+db      83,72,65,53,49,50,32,98,108,111,99,107,32,116,114,97
+db      110,115,102,111,114,109,32,102,111,114,32,120,56,54,44,32
+db      67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97
+db      112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103
+db      62,0
+segment .bss
+common  _OPENSSL_ia32cap_P 16
diff --git a/CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm b/CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm
new file mode 100644
index 0000000000..9d61eedd34
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm
@@ -0,0 +1,513 @@
+; Copyright 2004-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+%ifidn __OUTPUT_FORMAT__,obj
+section code    use32 class=code align=64
+%elifidn __OUTPUT_FORMAT__,win32
+$@feat.00 equ 1
+section .text   code align=64
+%else
+section .text   code
+%endif
+global  _OPENSSL_ia32_cpuid
+align   16
+_OPENSSL_ia32_cpuid:
+L$_OPENSSL_ia32_cpuid_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        xor     edx,edx
+        pushfd
+        pop     eax
+        mov     ecx,eax
+        xor     eax,2097152
+        push    eax
+        popfd
+        pushfd
+        pop     eax
+        xor     ecx,eax
+        xor     eax,eax
+        mov     esi,DWORD [20+esp]
+        mov     DWORD [8+esi],eax
+        bt      ecx,21
+        jnc     NEAR L$000nocpuid
+        cpuid
+        mov     edi,eax
+        xor     eax,eax
+        cmp     ebx,1970169159
+        setne   al
+        mov     ebp,eax
+        cmp     edx,1231384169
+        setne   al
+        or      ebp,eax
+        cmp     ecx,1818588270
+        setne   al
+        or      ebp,eax
+        jz      NEAR L$001intel
+        cmp     ebx,1752462657
+        setne   al
+        mov     esi,eax
+        cmp     edx,1769238117
+        setne   al
+        or      esi,eax
+        cmp     ecx,1145913699
+        setne   al
+        or      esi,eax
+        jnz     NEAR L$001intel
+        mov     eax,2147483648
+        cpuid
+        cmp     eax,2147483649
+        jb      NEAR L$001intel
+        mov     esi,eax
+        mov     eax,2147483649
+        cpuid
+        or      ebp,ecx
+        and     ebp,2049
+        cmp     esi,2147483656
+        jb      NEAR L$001intel
+        mov     eax,2147483656
+        cpuid
+        movzx   esi,cl
+        inc     esi
+        mov     eax,1
+        xor     ecx,ecx
+        cpuid
+        bt      edx,28
+        jnc     NEAR L$002generic
+        shr     ebx,16
+        and     ebx,255
+        cmp     ebx,esi
+        ja      NEAR L$002generic
+        and     edx,4026531839
+        jmp     NEAR L$002generic
+L$001intel:
+        cmp     edi,4
+        mov     esi,-1
+        jb      NEAR L$003nocacheinfo
+        mov     eax,4
+        mov     ecx,0
+        cpuid
+        mov     esi,eax
+        shr     esi,14
+        and     esi,4095
+L$003nocacheinfo:
+        mov     eax,1
+        xor     ecx,ecx
+        cpuid
+        and     edx,3220176895
+        cmp     ebp,0
+        jne     NEAR L$004notintel
+        or      edx,1073741824
+        and     ah,15
+        cmp     ah,15
+        jne     NEAR L$004notintel
+        or      edx,1048576
+L$004notintel:
+        bt      edx,28
+        jnc     NEAR L$002generic
+        and     edx,4026531839
+        cmp     esi,0
+        je      NEAR L$002generic
+        or      edx,268435456
+        shr     ebx,16
+        cmp     bl,1
+        ja      NEAR L$002generic
+        and     edx,4026531839
+L$002generic:
+        and     ebp,2048
+        and     ecx,4294965247
+        mov     esi,edx
+        or      ebp,ecx
+        cmp     edi,7
+        mov     edi,DWORD [20+esp]
+        jb      NEAR L$005no_extended_info
+        mov     eax,7
+        xor     ecx,ecx
+        cpuid
+        mov     DWORD [8+edi],ebx
+L$005no_extended_info:
+        bt      ebp,27
+        jnc     NEAR L$006clear_avx
+        xor     ecx,ecx
+db      15,1,208
+        and     eax,6
+        cmp     eax,6
+        je      NEAR L$007done
+        cmp     eax,2
+        je      NEAR L$006clear_avx
+L$008clear_xmm:
+        and     ebp,4261412861
+        and     esi,4278190079
+L$006clear_avx:
+        and     ebp,4026525695
+        and     DWORD [8+edi],4294967263
+L$007done:
+        mov     eax,esi
+        mov     edx,ebp
+L$000nocpuid:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+;extern _OPENSSL_ia32cap_P
+global  _OPENSSL_rdtsc
+align   16
+_OPENSSL_rdtsc:
+L$_OPENSSL_rdtsc_begin:
+        xor     eax,eax
+        xor     edx,edx
+        lea     ecx,[_OPENSSL_ia32cap_P]
+        bt      DWORD [ecx],4
+        jnc     NEAR L$009notsc
+        rdtsc
+L$009notsc:
+        ret
+global  _OPENSSL_instrument_halt
+align   16
+_OPENSSL_instrument_halt:
+L$_OPENSSL_instrument_halt_begin:
+        lea     ecx,[_OPENSSL_ia32cap_P]
+        bt      DWORD [ecx],4
+        jnc     NEAR L$010nohalt
+dd      2421723150
+        and     eax,3
+        jnz     NEAR L$010nohalt
+        pushfd
+        pop     eax
+        bt      eax,9
+        jnc     NEAR L$010nohalt
+        rdtsc
+        push    edx
+        push    eax
+        hlt
+        rdtsc
+        sub     eax,DWORD [esp]
+        sbb     edx,DWORD [4+esp]
+        add     esp,8
+        ret
+L$010nohalt:
+        xor     eax,eax
+        xor     edx,edx
+        ret
+global  _OPENSSL_far_spin
+align   16
+_OPENSSL_far_spin:
+L$_OPENSSL_far_spin_begin:
+        pushfd
+        pop     eax
+        bt      eax,9
+        jnc     NEAR L$011nospin
+        mov     eax,DWORD [4+esp]
+        mov     ecx,DWORD [8+esp]
+dd      2430111262
+        xor     eax,eax
+        mov     edx,DWORD [ecx]
+        jmp     NEAR L$012spin
+align   16
+L$012spin:
+        inc     eax
+        cmp     edx,DWORD [ecx]
+        je      NEAR L$012spin
+dd      529567888
+        ret
+L$011nospin:
+        xor     eax,eax
+        xor     edx,edx
+        ret
+global  _OPENSSL_wipe_cpu
+align   16
+_OPENSSL_wipe_cpu:
+L$_OPENSSL_wipe_cpu_begin:
+        xor     eax,eax
+        xor     edx,edx
+        lea     ecx,[_OPENSSL_ia32cap_P]
+        mov     ecx,DWORD [ecx]
+        bt      DWORD [ecx],1
+        jnc     NEAR L$013no_x87
+        and     ecx,83886080
+        cmp     ecx,83886080
+        jne     NEAR L$014no_sse2
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        pxor    xmm6,xmm6
+        pxor    xmm7,xmm7
+L$014no_sse2:
+dd      4007259865,4007259865,4007259865,4007259865,2430851995
+L$013no_x87:
+        lea     eax,[4+esp]
+        ret
+global  _OPENSSL_atomic_add
+align   16
+_OPENSSL_atomic_add:
+L$_OPENSSL_atomic_add_begin:
+        mov     edx,DWORD [4+esp]
+        mov     ecx,DWORD [8+esp]
+        push    ebx
+        nop
+        mov     eax,DWORD [edx]
+L$015spin:
+        lea     ebx,[ecx*1+eax]
+        nop
+dd      447811568
+        jne     NEAR L$015spin
+        mov     eax,ebx
+        pop     ebx
+        ret
+global  _OPENSSL_cleanse
+align   16
+_OPENSSL_cleanse:
+L$_OPENSSL_cleanse_begin:
+        mov     edx,DWORD [4+esp]
+        mov     ecx,DWORD [8+esp]
+        xor     eax,eax
+        cmp     ecx,7
+        jae     NEAR L$016lot
+        cmp     ecx,0
+        je      NEAR L$017ret
+L$018little:
+        mov     BYTE [edx],al
+        sub     ecx,1
+        lea     edx,[1+edx]
+        jnz     NEAR L$018little
+L$017ret:
+        ret
+align   16
+L$016lot:
+        test    edx,3
+        jz      NEAR L$019aligned
+        mov     BYTE [edx],al
+        lea     ecx,[ecx-1]
+        lea     edx,[1+edx]
+        jmp     NEAR L$016lot
+L$019aligned:
+        mov     DWORD [edx],eax
+        lea     ecx,[ecx-4]
+        test    ecx,-4
+        lea     edx,[4+edx]
+        jnz     NEAR L$019aligned
+        cmp     ecx,0
+        jne     NEAR L$018little
+        ret
+global  _CRYPTO_memcmp
+align   16
+_CRYPTO_memcmp:
+L$_CRYPTO_memcmp_begin:
+        push    esi
+        push    edi
+        mov     esi,DWORD [12+esp]
+        mov     edi,DWORD [16+esp]
+        mov     ecx,DWORD [20+esp]
+        xor     eax,eax
+        xor     edx,edx
+        cmp     ecx,0
+        je      NEAR L$020no_data
+L$021loop:
+        mov     dl,BYTE [esi]
+        lea     esi,[1+esi]
+        xor     dl,BYTE [edi]
+        lea     edi,[1+edi]
+        or      al,dl
+        dec     ecx
+        jnz     NEAR L$021loop
+        neg     eax
+        shr     eax,31
+L$020no_data:
+        pop     edi
+        pop     esi
+        ret
+global  _OPENSSL_instrument_bus
+align   16
+_OPENSSL_instrument_bus:
+L$_OPENSSL_instrument_bus_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     eax,0
+        lea     edx,[_OPENSSL_ia32cap_P]
+        bt      DWORD [edx],4
+        jnc     NEAR L$022nogo
+        bt      DWORD [edx],19
+        jnc     NEAR L$022nogo
+        mov     edi,DWORD [20+esp]
+        mov     ecx,DWORD [24+esp]
+        rdtsc
+        mov     esi,eax
+        mov     ebx,0
+        clflush [edi]
+db      240
+        add     DWORD [edi],ebx
+        jmp     NEAR L$023loop
+align   16
+L$023loop:
+        rdtsc
+        mov     edx,eax
+        sub     eax,esi
+        mov     esi,edx
+        mov     ebx,eax
+        clflush [edi]
+db      240
+        add     DWORD [edi],eax
+        lea     edi,[4+edi]
+        sub     ecx,1
+        jnz     NEAR L$023loop
+        mov     eax,DWORD [24+esp]
+L$022nogo:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _OPENSSL_instrument_bus2
+align   16
+_OPENSSL_instrument_bus2:
+L$_OPENSSL_instrument_bus2_begin:
+        push    ebp
+        push    ebx
+        push    esi
+        push    edi
+        mov     eax,0
+        lea     edx,[_OPENSSL_ia32cap_P]
+        bt      DWORD [edx],4
+        jnc     NEAR L$024nogo
+        bt      DWORD [edx],19
+        jnc     NEAR L$024nogo
+        mov     edi,DWORD [20+esp]
+        mov     ecx,DWORD [24+esp]
+        mov     ebp,DWORD [28+esp]
+        rdtsc
+        mov     esi,eax
+        mov     ebx,0
+        clflush [edi]
+db      240
+        add     DWORD [edi],ebx
+        rdtsc
+        mov     edx,eax
+        sub     eax,esi
+        mov     esi,edx
+        mov     ebx,eax
+        jmp     NEAR L$025loop2
+align   16
+L$025loop2:
+        clflush [edi]
+db      240
+        add     DWORD [edi],eax
+        sub     ebp,1
+        jz      NEAR L$026done2
+        rdtsc
+        mov     edx,eax
+        sub     eax,esi
+        mov     esi,edx
+        cmp     eax,ebx
+        mov     ebx,eax
+        mov     edx,0
+        setne   dl
+        sub     ecx,edx
+        lea     edi,[edx*4+edi]
+        jnz     NEAR L$025loop2
+L$026done2:
+        mov     eax,DWORD [24+esp]
+        sub     eax,ecx
+L$024nogo:
+        pop     edi
+        pop     esi
+        pop     ebx
+        pop     ebp
+        ret
+global  _OPENSSL_ia32_rdrand_bytes
+align   16
+_OPENSSL_ia32_rdrand_bytes:
+L$_OPENSSL_ia32_rdrand_bytes_begin:
+        push    edi
+        push    ebx
+        xor     eax,eax
+        mov     edi,DWORD [12+esp]
+        mov     ebx,DWORD [16+esp]
+        cmp     ebx,0
+        je      NEAR L$027done
+        mov     ecx,8
+L$028loop:
+db      15,199,242
+        jc      NEAR L$029break
+        loop    L$028loop
+        jmp     NEAR L$027done
+align   16
+L$029break:
+        cmp     ebx,4
+        jb      NEAR L$030tail
+        mov     DWORD [edi],edx
+        lea     edi,[4+edi]
+        add     eax,4
+        sub     ebx,4
+        jz      NEAR L$027done
+        mov     ecx,8
+        jmp     NEAR L$028loop
+align   16
+L$030tail:
+        mov     BYTE [edi],dl
+        lea     edi,[1+edi]
+        inc     eax
+        shr     edx,8
+        dec     ebx
+        jnz     NEAR L$030tail
+L$027done:
+        xor     edx,edx
+        pop     ebx
+        pop     edi
+        ret
+global  _OPENSSL_ia32_rdseed_bytes
+align   16
+_OPENSSL_ia32_rdseed_bytes:
+L$_OPENSSL_ia32_rdseed_bytes_begin:
+        push    edi
+        push    ebx
+        xor     eax,eax
+        mov     edi,DWORD [12+esp]
+        mov     ebx,DWORD [16+esp]
+        cmp     ebx,0
+        je      NEAR L$031done
+        mov     ecx,8
+L$032loop:
+db      15,199,250
+        jc      NEAR L$033break
+        loop    L$032loop
+        jmp     NEAR L$031done
+align   16
+L$033break:
+        cmp     ebx,4
+        jb      NEAR L$034tail
+        mov     DWORD [edi],edx
+        lea     edi,[4+edi]
+        add     eax,4
+        sub     ebx,4
+        jz      NEAR L$031done
+        mov     ecx,8
+        jmp     NEAR L$032loop
+align   16
+L$034tail:
+        mov     BYTE [edi],dl
+        lea     edi,[1+edi]
+        inc     eax
+        shr     edx,8
+        dec     ebx
+        jnz     NEAR L$034tail
+L$031done:
+        xor     edx,edx
+        pop     ebx
+        pop     edi
+        ret
+segment .bss
+common  _OPENSSL_ia32cap_P 16
+segment .CRT$XCU data align=4
+extern  _OPENSSL_cpuid_setup
+dd      _OPENSSL_cpuid_setup
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm
new file mode 100644
index 0000000000..a90434b21f
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm
@@ -0,0 +1,1772 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  aesni_multi_cbc_encrypt
+
+ALIGN   32
+aesni_multi_cbc_encrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_multi_cbc_encrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        cmp     edx,2
+        jb      NEAR $L$enc_non_avx
+        mov     ecx,DWORD[((OPENSSL_ia32cap_P+4))]
+        test    ecx,268435456
+        jnz     NEAR _avx_cbc_enc_shortcut
+        jmp     NEAR $L$enc_non_avx
+ALIGN   16
+$L$enc_non_avx:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[64+rsp],xmm10
+        movaps  XMMWORD[80+rsp],xmm11
+        movaps  XMMWORD[96+rsp],xmm12
+        movaps  XMMWORD[(-104)+rax],xmm13
+        movaps  XMMWORD[(-88)+rax],xmm14
+        movaps  XMMWORD[(-72)+rax],xmm15
+
+
+
+
+
+
+        sub     rsp,48
+        and     rsp,-64
+        mov     QWORD[16+rsp],rax
+
+
+$L$enc4x_body:
+        movdqu  xmm12,XMMWORD[rsi]
+        lea     rsi,[120+rsi]
+        lea     rdi,[80+rdi]
+
+$L$enc4x_loop_grande:
+        mov     DWORD[24+rsp],edx
+        xor     edx,edx
+        mov     ecx,DWORD[((-64))+rdi]
+        mov     r8,QWORD[((-80))+rdi]
+        cmp     ecx,edx
+        mov     r12,QWORD[((-72))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        movdqu  xmm2,XMMWORD[((-56))+rdi]
+        mov     DWORD[32+rsp],ecx
+        cmovle  r8,rsp
+        mov     ecx,DWORD[((-24))+rdi]
+        mov     r9,QWORD[((-40))+rdi]
+        cmp     ecx,edx
+        mov     r13,QWORD[((-32))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        movdqu  xmm3,XMMWORD[((-16))+rdi]
+        mov     DWORD[36+rsp],ecx
+        cmovle  r9,rsp
+        mov     ecx,DWORD[16+rdi]
+        mov     r10,QWORD[rdi]
+        cmp     ecx,edx
+        mov     r14,QWORD[8+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        movdqu  xmm4,XMMWORD[24+rdi]
+        mov     DWORD[40+rsp],ecx
+        cmovle  r10,rsp
+        mov     ecx,DWORD[56+rdi]
+        mov     r11,QWORD[40+rdi]
+        cmp     ecx,edx
+        mov     r15,QWORD[48+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        movdqu  xmm5,XMMWORD[64+rdi]
+        mov     DWORD[44+rsp],ecx
+        cmovle  r11,rsp
+        test    edx,edx
+        jz      NEAR $L$enc4x_done
+
+        movups  xmm1,XMMWORD[((16-120))+rsi]
+        pxor    xmm2,xmm12
+        movups  xmm0,XMMWORD[((32-120))+rsi]
+        pxor    xmm3,xmm12
+        mov     eax,DWORD[((240-120))+rsi]
+        pxor    xmm4,xmm12
+        movdqu  xmm6,XMMWORD[r8]
+        pxor    xmm5,xmm12
+        movdqu  xmm7,XMMWORD[r9]
+        pxor    xmm2,xmm6
+        movdqu  xmm8,XMMWORD[r10]
+        pxor    xmm3,xmm7
+        movdqu  xmm9,XMMWORD[r11]
+        pxor    xmm4,xmm8
+        pxor    xmm5,xmm9
+        movdqa  xmm10,XMMWORD[32+rsp]
+        xor     rbx,rbx
+        jmp     NEAR $L$oop_enc4x
+
+ALIGN   32
+$L$oop_enc4x:
+        add     rbx,16
+        lea     rbp,[16+rsp]
+        mov     ecx,1
+        sub     rbp,rbx
+
+DB      102,15,56,220,209
+        prefetcht0      [31+rbx*1+r8]
+        prefetcht0      [31+rbx*1+r9]
+DB      102,15,56,220,217
+        prefetcht0      [31+rbx*1+r10]
+        prefetcht0      [31+rbx*1+r10]
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[((48-120))+rsi]
+        cmp     ecx,DWORD[32+rsp]
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+        cmovge  r8,rbp
+        cmovg   r12,rbp
+DB      102,15,56,220,232
+        movups  xmm0,XMMWORD[((-56))+rsi]
+        cmp     ecx,DWORD[36+rsp]
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+        cmovge  r9,rbp
+        cmovg   r13,rbp
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[((-40))+rsi]
+        cmp     ecx,DWORD[40+rsp]
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+        cmovge  r10,rbp
+        cmovg   r14,rbp
+DB      102,15,56,220,232
+        movups  xmm0,XMMWORD[((-24))+rsi]
+        cmp     ecx,DWORD[44+rsp]
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+        cmovge  r11,rbp
+        cmovg   r15,rbp
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[((-8))+rsi]
+        movdqa  xmm11,xmm10
+DB      102,15,56,220,208
+        prefetcht0      [15+rbx*1+r12]
+        prefetcht0      [15+rbx*1+r13]
+DB      102,15,56,220,216
+        prefetcht0      [15+rbx*1+r14]
+        prefetcht0      [15+rbx*1+r15]
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        movups  xmm0,XMMWORD[((128-120))+rsi]
+        pxor    xmm12,xmm12
+
+DB      102,15,56,220,209
+        pcmpgtd xmm11,xmm12
+        movdqu  xmm12,XMMWORD[((-120))+rsi]
+DB      102,15,56,220,217
+        paddd   xmm10,xmm11
+        movdqa  XMMWORD[32+rsp],xmm10
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[((144-120))+rsi]
+
+        cmp     eax,11
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        movups  xmm0,XMMWORD[((160-120))+rsi]
+
+        jb      NEAR $L$enc4x_tail
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[((176-120))+rsi]
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        movups  xmm0,XMMWORD[((192-120))+rsi]
+
+        je      NEAR $L$enc4x_tail
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[((208-120))+rsi]
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        movups  xmm0,XMMWORD[((224-120))+rsi]
+        jmp     NEAR $L$enc4x_tail
+
+ALIGN   32
+$L$enc4x_tail:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movdqu  xmm6,XMMWORD[rbx*1+r8]
+        movdqu  xmm1,XMMWORD[((16-120))+rsi]
+
+DB      102,15,56,221,208
+        movdqu  xmm7,XMMWORD[rbx*1+r9]
+        pxor    xmm6,xmm12
+DB      102,15,56,221,216
+        movdqu  xmm8,XMMWORD[rbx*1+r10]
+        pxor    xmm7,xmm12
+DB      102,15,56,221,224
+        movdqu  xmm9,XMMWORD[rbx*1+r11]
+        pxor    xmm8,xmm12
+DB      102,15,56,221,232
+        movdqu  xmm0,XMMWORD[((32-120))+rsi]
+        pxor    xmm9,xmm12
+
+        movups  XMMWORD[(-16)+rbx*1+r12],xmm2
+        pxor    xmm2,xmm6
+        movups  XMMWORD[(-16)+rbx*1+r13],xmm3
+        pxor    xmm3,xmm7
+        movups  XMMWORD[(-16)+rbx*1+r14],xmm4
+        pxor    xmm4,xmm8
+        movups  XMMWORD[(-16)+rbx*1+r15],xmm5
+        pxor    xmm5,xmm9
+
+        dec     edx
+        jnz     NEAR $L$oop_enc4x
+
+        mov     rax,QWORD[16+rsp]
+
+        mov     edx,DWORD[24+rsp]
+
+
+
+
+
+
+
+
+
+
+        lea     rdi,[160+rdi]
+        dec     edx
+        jnz     NEAR $L$enc4x_loop_grande
+
+$L$enc4x_done:
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+
+
+
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$enc4x_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_multi_cbc_encrypt:
+
+global  aesni_multi_cbc_decrypt
+
+ALIGN   32
+aesni_multi_cbc_decrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_multi_cbc_decrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        cmp     edx,2
+        jb      NEAR $L$dec_non_avx
+        mov     ecx,DWORD[((OPENSSL_ia32cap_P+4))]
+        test    ecx,268435456
+        jnz     NEAR _avx_cbc_dec_shortcut
+        jmp     NEAR $L$dec_non_avx
+ALIGN   16
+$L$dec_non_avx:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[64+rsp],xmm10
+        movaps  XMMWORD[80+rsp],xmm11
+        movaps  XMMWORD[96+rsp],xmm12
+        movaps  XMMWORD[(-104)+rax],xmm13
+        movaps  XMMWORD[(-88)+rax],xmm14
+        movaps  XMMWORD[(-72)+rax],xmm15
+
+
+
+
+
+
+        sub     rsp,48
+        and     rsp,-64
+        mov     QWORD[16+rsp],rax
+
+
+$L$dec4x_body:
+        movdqu  xmm12,XMMWORD[rsi]
+        lea     rsi,[120+rsi]
+        lea     rdi,[80+rdi]
+
+$L$dec4x_loop_grande:
+        mov     DWORD[24+rsp],edx
+        xor     edx,edx
+        mov     ecx,DWORD[((-64))+rdi]
+        mov     r8,QWORD[((-80))+rdi]
+        cmp     ecx,edx
+        mov     r12,QWORD[((-72))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        movdqu  xmm6,XMMWORD[((-56))+rdi]
+        mov     DWORD[32+rsp],ecx
+        cmovle  r8,rsp
+        mov     ecx,DWORD[((-24))+rdi]
+        mov     r9,QWORD[((-40))+rdi]
+        cmp     ecx,edx
+        mov     r13,QWORD[((-32))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        movdqu  xmm7,XMMWORD[((-16))+rdi]
+        mov     DWORD[36+rsp],ecx
+        cmovle  r9,rsp
+        mov     ecx,DWORD[16+rdi]
+        mov     r10,QWORD[rdi]
+        cmp     ecx,edx
+        mov     r14,QWORD[8+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        movdqu  xmm8,XMMWORD[24+rdi]
+        mov     DWORD[40+rsp],ecx
+        cmovle  r10,rsp
+        mov     ecx,DWORD[56+rdi]
+        mov     r11,QWORD[40+rdi]
+        cmp     ecx,edx
+        mov     r15,QWORD[48+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        movdqu  xmm9,XMMWORD[64+rdi]
+        mov     DWORD[44+rsp],ecx
+        cmovle  r11,rsp
+        test    edx,edx
+        jz      NEAR $L$dec4x_done
+
+        movups  xmm1,XMMWORD[((16-120))+rsi]
+        movups  xmm0,XMMWORD[((32-120))+rsi]
+        mov     eax,DWORD[((240-120))+rsi]
+        movdqu  xmm2,XMMWORD[r8]
+        movdqu  xmm3,XMMWORD[r9]
+        pxor    xmm2,xmm12
+        movdqu  xmm4,XMMWORD[r10]
+        pxor    xmm3,xmm12
+        movdqu  xmm5,XMMWORD[r11]
+        pxor    xmm4,xmm12
+        pxor    xmm5,xmm12
+        movdqa  xmm10,XMMWORD[32+rsp]
+        xor     rbx,rbx
+        jmp     NEAR $L$oop_dec4x
+
+ALIGN   32
+$L$oop_dec4x:
+        add     rbx,16
+        lea     rbp,[16+rsp]
+        mov     ecx,1
+        sub     rbp,rbx
+
+DB      102,15,56,222,209
+        prefetcht0      [31+rbx*1+r8]
+        prefetcht0      [31+rbx*1+r9]
+DB      102,15,56,222,217
+        prefetcht0      [31+rbx*1+r10]
+        prefetcht0      [31+rbx*1+r11]
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[((48-120))+rsi]
+        cmp     ecx,DWORD[32+rsp]
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+        cmovge  r8,rbp
+        cmovg   r12,rbp
+DB      102,15,56,222,232
+        movups  xmm0,XMMWORD[((-56))+rsi]
+        cmp     ecx,DWORD[36+rsp]
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+        cmovge  r9,rbp
+        cmovg   r13,rbp
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[((-40))+rsi]
+        cmp     ecx,DWORD[40+rsp]
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+        cmovge  r10,rbp
+        cmovg   r14,rbp
+DB      102,15,56,222,232
+        movups  xmm0,XMMWORD[((-24))+rsi]
+        cmp     ecx,DWORD[44+rsp]
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+        cmovge  r11,rbp
+        cmovg   r15,rbp
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[((-8))+rsi]
+        movdqa  xmm11,xmm10
+DB      102,15,56,222,208
+        prefetcht0      [15+rbx*1+r12]
+        prefetcht0      [15+rbx*1+r13]
+DB      102,15,56,222,216
+        prefetcht0      [15+rbx*1+r14]
+        prefetcht0      [15+rbx*1+r15]
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+        movups  xmm0,XMMWORD[((128-120))+rsi]
+        pxor    xmm12,xmm12
+
+DB      102,15,56,222,209
+        pcmpgtd xmm11,xmm12
+        movdqu  xmm12,XMMWORD[((-120))+rsi]
+DB      102,15,56,222,217
+        paddd   xmm10,xmm11
+        movdqa  XMMWORD[32+rsp],xmm10
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[((144-120))+rsi]
+
+        cmp     eax,11
+
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+        movups  xmm0,XMMWORD[((160-120))+rsi]
+
+        jb      NEAR $L$dec4x_tail
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[((176-120))+rsi]
+
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+        movups  xmm0,XMMWORD[((192-120))+rsi]
+
+        je      NEAR $L$dec4x_tail
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[((208-120))+rsi]
+
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+        movups  xmm0,XMMWORD[((224-120))+rsi]
+        jmp     NEAR $L$dec4x_tail
+
+ALIGN   32
+$L$dec4x_tail:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+        pxor    xmm6,xmm0
+        pxor    xmm7,xmm0
+DB      102,15,56,222,233
+        movdqu  xmm1,XMMWORD[((16-120))+rsi]
+        pxor    xmm8,xmm0
+        pxor    xmm9,xmm0
+        movdqu  xmm0,XMMWORD[((32-120))+rsi]
+
+DB      102,15,56,223,214
+DB      102,15,56,223,223
+        movdqu  xmm6,XMMWORD[((-16))+rbx*1+r8]
+        movdqu  xmm7,XMMWORD[((-16))+rbx*1+r9]
+DB      102,65,15,56,223,224
+DB      102,65,15,56,223,233
+        movdqu  xmm8,XMMWORD[((-16))+rbx*1+r10]
+        movdqu  xmm9,XMMWORD[((-16))+rbx*1+r11]
+
+        movups  XMMWORD[(-16)+rbx*1+r12],xmm2
+        movdqu  xmm2,XMMWORD[rbx*1+r8]
+        movups  XMMWORD[(-16)+rbx*1+r13],xmm3
+        movdqu  xmm3,XMMWORD[rbx*1+r9]
+        pxor    xmm2,xmm12
+        movups  XMMWORD[(-16)+rbx*1+r14],xmm4
+        movdqu  xmm4,XMMWORD[rbx*1+r10]
+        pxor    xmm3,xmm12
+        movups  XMMWORD[(-16)+rbx*1+r15],xmm5
+        movdqu  xmm5,XMMWORD[rbx*1+r11]
+        pxor    xmm4,xmm12
+        pxor    xmm5,xmm12
+
+        dec     edx
+        jnz     NEAR $L$oop_dec4x
+
+        mov     rax,QWORD[16+rsp]
+
+        mov     edx,DWORD[24+rsp]
+
+        lea     rdi,[160+rdi]
+        dec     edx
+        jnz     NEAR $L$dec4x_loop_grande
+
+$L$dec4x_done:
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+
+
+
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$dec4x_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_multi_cbc_decrypt:
+
+ALIGN   32
+aesni_multi_cbc_encrypt_avx:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_multi_cbc_encrypt_avx:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+_avx_cbc_enc_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[64+rsp],xmm10
+        movaps  XMMWORD[80+rsp],xmm11
+        movaps  XMMWORD[(-120)+rax],xmm12
+        movaps  XMMWORD[(-104)+rax],xmm13
+        movaps  XMMWORD[(-88)+rax],xmm14
+        movaps  XMMWORD[(-72)+rax],xmm15
+
+
+
+
+
+
+
+
+        sub     rsp,192
+        and     rsp,-128
+        mov     QWORD[16+rsp],rax
+
+
+$L$enc8x_body:
+        vzeroupper
+        vmovdqu xmm15,XMMWORD[rsi]
+        lea     rsi,[120+rsi]
+        lea     rdi,[160+rdi]
+        shr     edx,1
+
+$L$enc8x_loop_grande:
+
+        xor     edx,edx
+        mov     ecx,DWORD[((-144))+rdi]
+        mov     r8,QWORD[((-160))+rdi]
+        cmp     ecx,edx
+        mov     rbx,QWORD[((-152))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm2,XMMWORD[((-136))+rdi]
+        mov     DWORD[32+rsp],ecx
+        cmovle  r8,rsp
+        sub     rbx,r8
+        mov     QWORD[64+rsp],rbx
+        mov     ecx,DWORD[((-104))+rdi]
+        mov     r9,QWORD[((-120))+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[((-112))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm3,XMMWORD[((-96))+rdi]
+        mov     DWORD[36+rsp],ecx
+        cmovle  r9,rsp
+        sub     rbp,r9
+        mov     QWORD[72+rsp],rbp
+        mov     ecx,DWORD[((-64))+rdi]
+        mov     r10,QWORD[((-80))+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[((-72))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm4,XMMWORD[((-56))+rdi]
+        mov     DWORD[40+rsp],ecx
+        cmovle  r10,rsp
+        sub     rbp,r10
+        mov     QWORD[80+rsp],rbp
+        mov     ecx,DWORD[((-24))+rdi]
+        mov     r11,QWORD[((-40))+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[((-32))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm5,XMMWORD[((-16))+rdi]
+        mov     DWORD[44+rsp],ecx
+        cmovle  r11,rsp
+        sub     rbp,r11
+        mov     QWORD[88+rsp],rbp
+        mov     ecx,DWORD[16+rdi]
+        mov     r12,QWORD[rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[8+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm6,XMMWORD[24+rdi]
+        mov     DWORD[48+rsp],ecx
+        cmovle  r12,rsp
+        sub     rbp,r12
+        mov     QWORD[96+rsp],rbp
+        mov     ecx,DWORD[56+rdi]
+        mov     r13,QWORD[40+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[48+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm7,XMMWORD[64+rdi]
+        mov     DWORD[52+rsp],ecx
+        cmovle  r13,rsp
+        sub     rbp,r13
+        mov     QWORD[104+rsp],rbp
+        mov     ecx,DWORD[96+rdi]
+        mov     r14,QWORD[80+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[88+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm8,XMMWORD[104+rdi]
+        mov     DWORD[56+rsp],ecx
+        cmovle  r14,rsp
+        sub     rbp,r14
+        mov     QWORD[112+rsp],rbp
+        mov     ecx,DWORD[136+rdi]
+        mov     r15,QWORD[120+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[128+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm9,XMMWORD[144+rdi]
+        mov     DWORD[60+rsp],ecx
+        cmovle  r15,rsp
+        sub     rbp,r15
+        mov     QWORD[120+rsp],rbp
+        test    edx,edx
+        jz      NEAR $L$enc8x_done
+
+        vmovups xmm1,XMMWORD[((16-120))+rsi]
+        vmovups xmm0,XMMWORD[((32-120))+rsi]
+        mov     eax,DWORD[((240-120))+rsi]
+
+        vpxor   xmm10,xmm15,XMMWORD[r8]
+        lea     rbp,[128+rsp]
+        vpxor   xmm11,xmm15,XMMWORD[r9]
+        vpxor   xmm12,xmm15,XMMWORD[r10]
+        vpxor   xmm13,xmm15,XMMWORD[r11]
+        vpxor   xmm2,xmm2,xmm10
+        vpxor   xmm10,xmm15,XMMWORD[r12]
+        vpxor   xmm3,xmm3,xmm11
+        vpxor   xmm11,xmm15,XMMWORD[r13]
+        vpxor   xmm4,xmm4,xmm12
+        vpxor   xmm12,xmm15,XMMWORD[r14]
+        vpxor   xmm5,xmm5,xmm13
+        vpxor   xmm13,xmm15,XMMWORD[r15]
+        vpxor   xmm6,xmm6,xmm10
+        mov     ecx,1
+        vpxor   xmm7,xmm7,xmm11
+        vpxor   xmm8,xmm8,xmm12
+        vpxor   xmm9,xmm9,xmm13
+        jmp     NEAR $L$oop_enc8x
+
+ALIGN   32
+$L$oop_enc8x:
+        vaesenc xmm2,xmm2,xmm1
+        cmp     ecx,DWORD[((32+0))+rsp]
+        vaesenc xmm3,xmm3,xmm1
+        prefetcht0      [31+r8]
+        vaesenc xmm4,xmm4,xmm1
+        vaesenc xmm5,xmm5,xmm1
+        lea     rbx,[rbx*1+r8]
+        cmovge  r8,rsp
+        vaesenc xmm6,xmm6,xmm1
+        cmovg   rbx,rsp
+        vaesenc xmm7,xmm7,xmm1
+        sub     rbx,r8
+        vaesenc xmm8,xmm8,xmm1
+        vpxor   xmm10,xmm15,XMMWORD[16+r8]
+        mov     QWORD[((64+0))+rsp],rbx
+        vaesenc xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((-72))+rsi]
+        lea     r8,[16+rbx*1+r8]
+        vmovdqu XMMWORD[rbp],xmm10
+        vaesenc xmm2,xmm2,xmm0
+        cmp     ecx,DWORD[((32+4))+rsp]
+        mov     rbx,QWORD[((64+8))+rsp]
+        vaesenc xmm3,xmm3,xmm0
+        prefetcht0      [31+r9]
+        vaesenc xmm4,xmm4,xmm0
+        vaesenc xmm5,xmm5,xmm0
+        lea     rbx,[rbx*1+r9]
+        cmovge  r9,rsp
+        vaesenc xmm6,xmm6,xmm0
+        cmovg   rbx,rsp
+        vaesenc xmm7,xmm7,xmm0
+        sub     rbx,r9
+        vaesenc xmm8,xmm8,xmm0
+        vpxor   xmm11,xmm15,XMMWORD[16+r9]
+        mov     QWORD[((64+8))+rsp],rbx
+        vaesenc xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[((-56))+rsi]
+        lea     r9,[16+rbx*1+r9]
+        vmovdqu XMMWORD[16+rbp],xmm11
+        vaesenc xmm2,xmm2,xmm1
+        cmp     ecx,DWORD[((32+8))+rsp]
+        mov     rbx,QWORD[((64+16))+rsp]
+        vaesenc xmm3,xmm3,xmm1
+        prefetcht0      [31+r10]
+        vaesenc xmm4,xmm4,xmm1
+        prefetcht0      [15+r8]
+        vaesenc xmm5,xmm5,xmm1
+        lea     rbx,[rbx*1+r10]
+        cmovge  r10,rsp
+        vaesenc xmm6,xmm6,xmm1
+        cmovg   rbx,rsp
+        vaesenc xmm7,xmm7,xmm1
+        sub     rbx,r10
+        vaesenc xmm8,xmm8,xmm1
+        vpxor   xmm12,xmm15,XMMWORD[16+r10]
+        mov     QWORD[((64+16))+rsp],rbx
+        vaesenc xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((-40))+rsi]
+        lea     r10,[16+rbx*1+r10]
+        vmovdqu XMMWORD[32+rbp],xmm12
+        vaesenc xmm2,xmm2,xmm0
+        cmp     ecx,DWORD[((32+12))+rsp]
+        mov     rbx,QWORD[((64+24))+rsp]
+        vaesenc xmm3,xmm3,xmm0
+        prefetcht0      [31+r11]
+        vaesenc xmm4,xmm4,xmm0
+        prefetcht0      [15+r9]
+        vaesenc xmm5,xmm5,xmm0
+        lea     rbx,[rbx*1+r11]
+        cmovge  r11,rsp
+        vaesenc xmm6,xmm6,xmm0
+        cmovg   rbx,rsp
+        vaesenc xmm7,xmm7,xmm0
+        sub     rbx,r11
+        vaesenc xmm8,xmm8,xmm0
+        vpxor   xmm13,xmm15,XMMWORD[16+r11]
+        mov     QWORD[((64+24))+rsp],rbx
+        vaesenc xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[((-24))+rsi]
+        lea     r11,[16+rbx*1+r11]
+        vmovdqu XMMWORD[48+rbp],xmm13
+        vaesenc xmm2,xmm2,xmm1
+        cmp     ecx,DWORD[((32+16))+rsp]
+        mov     rbx,QWORD[((64+32))+rsp]
+        vaesenc xmm3,xmm3,xmm1
+        prefetcht0      [31+r12]
+        vaesenc xmm4,xmm4,xmm1
+        prefetcht0      [15+r10]
+        vaesenc xmm5,xmm5,xmm1
+        lea     rbx,[rbx*1+r12]
+        cmovge  r12,rsp
+        vaesenc xmm6,xmm6,xmm1
+        cmovg   rbx,rsp
+        vaesenc xmm7,xmm7,xmm1
+        sub     rbx,r12
+        vaesenc xmm8,xmm8,xmm1
+        vpxor   xmm10,xmm15,XMMWORD[16+r12]
+        mov     QWORD[((64+32))+rsp],rbx
+        vaesenc xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((-8))+rsi]
+        lea     r12,[16+rbx*1+r12]
+        vaesenc xmm2,xmm2,xmm0
+        cmp     ecx,DWORD[((32+20))+rsp]
+        mov     rbx,QWORD[((64+40))+rsp]
+        vaesenc xmm3,xmm3,xmm0
+        prefetcht0      [31+r13]
+        vaesenc xmm4,xmm4,xmm0
+        prefetcht0      [15+r11]
+        vaesenc xmm5,xmm5,xmm0
+        lea     rbx,[r13*1+rbx]
+        cmovge  r13,rsp
+        vaesenc xmm6,xmm6,xmm0
+        cmovg   rbx,rsp
+        vaesenc xmm7,xmm7,xmm0
+        sub     rbx,r13
+        vaesenc xmm8,xmm8,xmm0
+        vpxor   xmm11,xmm15,XMMWORD[16+r13]
+        mov     QWORD[((64+40))+rsp],rbx
+        vaesenc xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[8+rsi]
+        lea     r13,[16+rbx*1+r13]
+        vaesenc xmm2,xmm2,xmm1
+        cmp     ecx,DWORD[((32+24))+rsp]
+        mov     rbx,QWORD[((64+48))+rsp]
+        vaesenc xmm3,xmm3,xmm1
+        prefetcht0      [31+r14]
+        vaesenc xmm4,xmm4,xmm1
+        prefetcht0      [15+r12]
+        vaesenc xmm5,xmm5,xmm1
+        lea     rbx,[rbx*1+r14]
+        cmovge  r14,rsp
+        vaesenc xmm6,xmm6,xmm1
+        cmovg   rbx,rsp
+        vaesenc xmm7,xmm7,xmm1
+        sub     rbx,r14
+        vaesenc xmm8,xmm8,xmm1
+        vpxor   xmm12,xmm15,XMMWORD[16+r14]
+        mov     QWORD[((64+48))+rsp],rbx
+        vaesenc xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[24+rsi]
+        lea     r14,[16+rbx*1+r14]
+        vaesenc xmm2,xmm2,xmm0
+        cmp     ecx,DWORD[((32+28))+rsp]
+        mov     rbx,QWORD[((64+56))+rsp]
+        vaesenc xmm3,xmm3,xmm0
+        prefetcht0      [31+r15]
+        vaesenc xmm4,xmm4,xmm0
+        prefetcht0      [15+r13]
+        vaesenc xmm5,xmm5,xmm0
+        lea     rbx,[rbx*1+r15]
+        cmovge  r15,rsp
+        vaesenc xmm6,xmm6,xmm0
+        cmovg   rbx,rsp
+        vaesenc xmm7,xmm7,xmm0
+        sub     rbx,r15
+        vaesenc xmm8,xmm8,xmm0
+        vpxor   xmm13,xmm15,XMMWORD[16+r15]
+        mov     QWORD[((64+56))+rsp],rbx
+        vaesenc xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[40+rsi]
+        lea     r15,[16+rbx*1+r15]
+        vmovdqu xmm14,XMMWORD[32+rsp]
+        prefetcht0      [15+r14]
+        prefetcht0      [15+r15]
+        cmp     eax,11
+        jb      NEAR $L$enc8x_tail
+
+        vaesenc xmm2,xmm2,xmm1
+        vaesenc xmm3,xmm3,xmm1
+        vaesenc xmm4,xmm4,xmm1
+        vaesenc xmm5,xmm5,xmm1
+        vaesenc xmm6,xmm6,xmm1
+        vaesenc xmm7,xmm7,xmm1
+        vaesenc xmm8,xmm8,xmm1
+        vaesenc xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((176-120))+rsi]
+
+        vaesenc xmm2,xmm2,xmm0
+        vaesenc xmm3,xmm3,xmm0
+        vaesenc xmm4,xmm4,xmm0
+        vaesenc xmm5,xmm5,xmm0
+        vaesenc xmm6,xmm6,xmm0
+        vaesenc xmm7,xmm7,xmm0
+        vaesenc xmm8,xmm8,xmm0
+        vaesenc xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[((192-120))+rsi]
+        je      NEAR $L$enc8x_tail
+
+        vaesenc xmm2,xmm2,xmm1
+        vaesenc xmm3,xmm3,xmm1
+        vaesenc xmm4,xmm4,xmm1
+        vaesenc xmm5,xmm5,xmm1
+        vaesenc xmm6,xmm6,xmm1
+        vaesenc xmm7,xmm7,xmm1
+        vaesenc xmm8,xmm8,xmm1
+        vaesenc xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((208-120))+rsi]
+
+        vaesenc xmm2,xmm2,xmm0
+        vaesenc xmm3,xmm3,xmm0
+        vaesenc xmm4,xmm4,xmm0
+        vaesenc xmm5,xmm5,xmm0
+        vaesenc xmm6,xmm6,xmm0
+        vaesenc xmm7,xmm7,xmm0
+        vaesenc xmm8,xmm8,xmm0
+        vaesenc xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[((224-120))+rsi]
+
+$L$enc8x_tail:
+        vaesenc xmm2,xmm2,xmm1
+        vpxor   xmm15,xmm15,xmm15
+        vaesenc xmm3,xmm3,xmm1
+        vaesenc xmm4,xmm4,xmm1
+        vpcmpgtd        xmm15,xmm14,xmm15
+        vaesenc xmm5,xmm5,xmm1
+        vaesenc xmm6,xmm6,xmm1
+        vpaddd  xmm15,xmm15,xmm14
+        vmovdqu xmm14,XMMWORD[48+rsp]
+        vaesenc xmm7,xmm7,xmm1
+        mov     rbx,QWORD[64+rsp]
+        vaesenc xmm8,xmm8,xmm1
+        vaesenc xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((16-120))+rsi]
+
+        vaesenclast     xmm2,xmm2,xmm0
+        vmovdqa XMMWORD[32+rsp],xmm15
+        vpxor   xmm15,xmm15,xmm15
+        vaesenclast     xmm3,xmm3,xmm0
+        vaesenclast     xmm4,xmm4,xmm0
+        vpcmpgtd        xmm15,xmm14,xmm15
+        vaesenclast     xmm5,xmm5,xmm0
+        vaesenclast     xmm6,xmm6,xmm0
+        vpaddd  xmm14,xmm14,xmm15
+        vmovdqu xmm15,XMMWORD[((-120))+rsi]
+        vaesenclast     xmm7,xmm7,xmm0
+        vaesenclast     xmm8,xmm8,xmm0
+        vmovdqa XMMWORD[48+rsp],xmm14
+        vaesenclast     xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[((32-120))+rsi]
+
+        vmovups XMMWORD[(-16)+r8],xmm2
+        sub     r8,rbx
+        vpxor   xmm2,xmm2,XMMWORD[rbp]
+        vmovups XMMWORD[(-16)+r9],xmm3
+        sub     r9,QWORD[72+rsp]
+        vpxor   xmm3,xmm3,XMMWORD[16+rbp]
+        vmovups XMMWORD[(-16)+r10],xmm4
+        sub     r10,QWORD[80+rsp]
+        vpxor   xmm4,xmm4,XMMWORD[32+rbp]
+        vmovups XMMWORD[(-16)+r11],xmm5
+        sub     r11,QWORD[88+rsp]
+        vpxor   xmm5,xmm5,XMMWORD[48+rbp]
+        vmovups XMMWORD[(-16)+r12],xmm6
+        sub     r12,QWORD[96+rsp]
+        vpxor   xmm6,xmm6,xmm10
+        vmovups XMMWORD[(-16)+r13],xmm7
+        sub     r13,QWORD[104+rsp]
+        vpxor   xmm7,xmm7,xmm11
+        vmovups XMMWORD[(-16)+r14],xmm8
+        sub     r14,QWORD[112+rsp]
+        vpxor   xmm8,xmm8,xmm12
+        vmovups XMMWORD[(-16)+r15],xmm9
+        sub     r15,QWORD[120+rsp]
+        vpxor   xmm9,xmm9,xmm13
+
+        dec     edx
+        jnz     NEAR $L$oop_enc8x
+
+        mov     rax,QWORD[16+rsp]
+
+
+
+
+
+
+$L$enc8x_done:
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+        movaps  xmm13,XMMWORD[((-104))+rax]
+        movaps  xmm14,XMMWORD[((-88))+rax]
+        movaps  xmm15,XMMWORD[((-72))+rax]
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$enc8x_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_multi_cbc_encrypt_avx:
+
+
+ALIGN   32
+aesni_multi_cbc_decrypt_avx:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_multi_cbc_decrypt_avx:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+_avx_cbc_dec_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[64+rsp],xmm10
+        movaps  XMMWORD[80+rsp],xmm11
+        movaps  XMMWORD[(-120)+rax],xmm12
+        movaps  XMMWORD[(-104)+rax],xmm13
+        movaps  XMMWORD[(-88)+rax],xmm14
+        movaps  XMMWORD[(-72)+rax],xmm15
+
+
+
+
+
+
+
+
+
+        sub     rsp,256
+        and     rsp,-256
+        sub     rsp,192
+        mov     QWORD[16+rsp],rax
+
+
+$L$dec8x_body:
+        vzeroupper
+        vmovdqu xmm15,XMMWORD[rsi]
+        lea     rsi,[120+rsi]
+        lea     rdi,[160+rdi]
+        shr     edx,1
+
+$L$dec8x_loop_grande:
+
+        xor     edx,edx
+        mov     ecx,DWORD[((-144))+rdi]
+        mov     r8,QWORD[((-160))+rdi]
+        cmp     ecx,edx
+        mov     rbx,QWORD[((-152))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm2,XMMWORD[((-136))+rdi]
+        mov     DWORD[32+rsp],ecx
+        cmovle  r8,rsp
+        sub     rbx,r8
+        mov     QWORD[64+rsp],rbx
+        vmovdqu XMMWORD[192+rsp],xmm2
+        mov     ecx,DWORD[((-104))+rdi]
+        mov     r9,QWORD[((-120))+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[((-112))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm3,XMMWORD[((-96))+rdi]
+        mov     DWORD[36+rsp],ecx
+        cmovle  r9,rsp
+        sub     rbp,r9
+        mov     QWORD[72+rsp],rbp
+        vmovdqu XMMWORD[208+rsp],xmm3
+        mov     ecx,DWORD[((-64))+rdi]
+        mov     r10,QWORD[((-80))+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[((-72))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm4,XMMWORD[((-56))+rdi]
+        mov     DWORD[40+rsp],ecx
+        cmovle  r10,rsp
+        sub     rbp,r10
+        mov     QWORD[80+rsp],rbp
+        vmovdqu XMMWORD[224+rsp],xmm4
+        mov     ecx,DWORD[((-24))+rdi]
+        mov     r11,QWORD[((-40))+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[((-32))+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm5,XMMWORD[((-16))+rdi]
+        mov     DWORD[44+rsp],ecx
+        cmovle  r11,rsp
+        sub     rbp,r11
+        mov     QWORD[88+rsp],rbp
+        vmovdqu XMMWORD[240+rsp],xmm5
+        mov     ecx,DWORD[16+rdi]
+        mov     r12,QWORD[rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[8+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm6,XMMWORD[24+rdi]
+        mov     DWORD[48+rsp],ecx
+        cmovle  r12,rsp
+        sub     rbp,r12
+        mov     QWORD[96+rsp],rbp
+        vmovdqu XMMWORD[256+rsp],xmm6
+        mov     ecx,DWORD[56+rdi]
+        mov     r13,QWORD[40+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[48+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm7,XMMWORD[64+rdi]
+        mov     DWORD[52+rsp],ecx
+        cmovle  r13,rsp
+        sub     rbp,r13
+        mov     QWORD[104+rsp],rbp
+        vmovdqu XMMWORD[272+rsp],xmm7
+        mov     ecx,DWORD[96+rdi]
+        mov     r14,QWORD[80+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[88+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm8,XMMWORD[104+rdi]
+        mov     DWORD[56+rsp],ecx
+        cmovle  r14,rsp
+        sub     rbp,r14
+        mov     QWORD[112+rsp],rbp
+        vmovdqu XMMWORD[288+rsp],xmm8
+        mov     ecx,DWORD[136+rdi]
+        mov     r15,QWORD[120+rdi]
+        cmp     ecx,edx
+        mov     rbp,QWORD[128+rdi]
+        cmovg   edx,ecx
+        test    ecx,ecx
+        vmovdqu xmm9,XMMWORD[144+rdi]
+        mov     DWORD[60+rsp],ecx
+        cmovle  r15,rsp
+        sub     rbp,r15
+        mov     QWORD[120+rsp],rbp
+        vmovdqu XMMWORD[304+rsp],xmm9
+        test    edx,edx
+        jz      NEAR $L$dec8x_done
+
+        vmovups xmm1,XMMWORD[((16-120))+rsi]
+        vmovups xmm0,XMMWORD[((32-120))+rsi]
+        mov     eax,DWORD[((240-120))+rsi]
+        lea     rbp,[((192+128))+rsp]
+
+        vmovdqu xmm2,XMMWORD[r8]
+        vmovdqu xmm3,XMMWORD[r9]
+        vmovdqu xmm4,XMMWORD[r10]
+        vmovdqu xmm5,XMMWORD[r11]
+        vmovdqu xmm6,XMMWORD[r12]
+        vmovdqu xmm7,XMMWORD[r13]
+        vmovdqu xmm8,XMMWORD[r14]
+        vmovdqu xmm9,XMMWORD[r15]
+        vmovdqu XMMWORD[rbp],xmm2
+        vpxor   xmm2,xmm2,xmm15
+        vmovdqu XMMWORD[16+rbp],xmm3
+        vpxor   xmm3,xmm3,xmm15
+        vmovdqu XMMWORD[32+rbp],xmm4
+        vpxor   xmm4,xmm4,xmm15
+        vmovdqu XMMWORD[48+rbp],xmm5
+        vpxor   xmm5,xmm5,xmm15
+        vmovdqu XMMWORD[64+rbp],xmm6
+        vpxor   xmm6,xmm6,xmm15
+        vmovdqu XMMWORD[80+rbp],xmm7
+        vpxor   xmm7,xmm7,xmm15
+        vmovdqu XMMWORD[96+rbp],xmm8
+        vpxor   xmm8,xmm8,xmm15
+        vmovdqu XMMWORD[112+rbp],xmm9
+        vpxor   xmm9,xmm9,xmm15
+        xor     rbp,0x80
+        mov     ecx,1
+        jmp     NEAR $L$oop_dec8x
+
+ALIGN   32
+$L$oop_dec8x:
+        vaesdec xmm2,xmm2,xmm1
+        cmp     ecx,DWORD[((32+0))+rsp]
+        vaesdec xmm3,xmm3,xmm1
+        prefetcht0      [31+r8]
+        vaesdec xmm4,xmm4,xmm1
+        vaesdec xmm5,xmm5,xmm1
+        lea     rbx,[rbx*1+r8]
+        cmovge  r8,rsp
+        vaesdec xmm6,xmm6,xmm1
+        cmovg   rbx,rsp
+        vaesdec xmm7,xmm7,xmm1
+        sub     rbx,r8
+        vaesdec xmm8,xmm8,xmm1
+        vmovdqu xmm10,XMMWORD[16+r8]
+        mov     QWORD[((64+0))+rsp],rbx
+        vaesdec xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((-72))+rsi]
+        lea     r8,[16+rbx*1+r8]
+        vmovdqu XMMWORD[128+rsp],xmm10
+        vaesdec xmm2,xmm2,xmm0
+        cmp     ecx,DWORD[((32+4))+rsp]
+        mov     rbx,QWORD[((64+8))+rsp]
+        vaesdec xmm3,xmm3,xmm0
+        prefetcht0      [31+r9]
+        vaesdec xmm4,xmm4,xmm0
+        vaesdec xmm5,xmm5,xmm0
+        lea     rbx,[rbx*1+r9]
+        cmovge  r9,rsp
+        vaesdec xmm6,xmm6,xmm0
+        cmovg   rbx,rsp
+        vaesdec xmm7,xmm7,xmm0
+        sub     rbx,r9
+        vaesdec xmm8,xmm8,xmm0
+        vmovdqu xmm11,XMMWORD[16+r9]
+        mov     QWORD[((64+8))+rsp],rbx
+        vaesdec xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[((-56))+rsi]
+        lea     r9,[16+rbx*1+r9]
+        vmovdqu XMMWORD[144+rsp],xmm11
+        vaesdec xmm2,xmm2,xmm1
+        cmp     ecx,DWORD[((32+8))+rsp]
+        mov     rbx,QWORD[((64+16))+rsp]
+        vaesdec xmm3,xmm3,xmm1
+        prefetcht0      [31+r10]
+        vaesdec xmm4,xmm4,xmm1
+        prefetcht0      [15+r8]
+        vaesdec xmm5,xmm5,xmm1
+        lea     rbx,[rbx*1+r10]
+        cmovge  r10,rsp
+        vaesdec xmm6,xmm6,xmm1
+        cmovg   rbx,rsp
+        vaesdec xmm7,xmm7,xmm1
+        sub     rbx,r10
+        vaesdec xmm8,xmm8,xmm1
+        vmovdqu xmm12,XMMWORD[16+r10]
+        mov     QWORD[((64+16))+rsp],rbx
+        vaesdec xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((-40))+rsi]
+        lea     r10,[16+rbx*1+r10]
+        vmovdqu XMMWORD[160+rsp],xmm12
+        vaesdec xmm2,xmm2,xmm0
+        cmp     ecx,DWORD[((32+12))+rsp]
+        mov     rbx,QWORD[((64+24))+rsp]
+        vaesdec xmm3,xmm3,xmm0
+        prefetcht0      [31+r11]
+        vaesdec xmm4,xmm4,xmm0
+        prefetcht0      [15+r9]
+        vaesdec xmm5,xmm5,xmm0
+        lea     rbx,[rbx*1+r11]
+        cmovge  r11,rsp
+        vaesdec xmm6,xmm6,xmm0
+        cmovg   rbx,rsp
+        vaesdec xmm7,xmm7,xmm0
+        sub     rbx,r11
+        vaesdec xmm8,xmm8,xmm0
+        vmovdqu xmm13,XMMWORD[16+r11]
+        mov     QWORD[((64+24))+rsp],rbx
+        vaesdec xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[((-24))+rsi]
+        lea     r11,[16+rbx*1+r11]
+        vmovdqu XMMWORD[176+rsp],xmm13
+        vaesdec xmm2,xmm2,xmm1
+        cmp     ecx,DWORD[((32+16))+rsp]
+        mov     rbx,QWORD[((64+32))+rsp]
+        vaesdec xmm3,xmm3,xmm1
+        prefetcht0      [31+r12]
+        vaesdec xmm4,xmm4,xmm1
+        prefetcht0      [15+r10]
+        vaesdec xmm5,xmm5,xmm1
+        lea     rbx,[rbx*1+r12]
+        cmovge  r12,rsp
+        vaesdec xmm6,xmm6,xmm1
+        cmovg   rbx,rsp
+        vaesdec xmm7,xmm7,xmm1
+        sub     rbx,r12
+        vaesdec xmm8,xmm8,xmm1
+        vmovdqu xmm10,XMMWORD[16+r12]
+        mov     QWORD[((64+32))+rsp],rbx
+        vaesdec xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((-8))+rsi]
+        lea     r12,[16+rbx*1+r12]
+        vaesdec xmm2,xmm2,xmm0
+        cmp     ecx,DWORD[((32+20))+rsp]
+        mov     rbx,QWORD[((64+40))+rsp]
+        vaesdec xmm3,xmm3,xmm0
+        prefetcht0      [31+r13]
+        vaesdec xmm4,xmm4,xmm0
+        prefetcht0      [15+r11]
+        vaesdec xmm5,xmm5,xmm0
+        lea     rbx,[r13*1+rbx]
+        cmovge  r13,rsp
+        vaesdec xmm6,xmm6,xmm0
+        cmovg   rbx,rsp
+        vaesdec xmm7,xmm7,xmm0
+        sub     rbx,r13
+        vaesdec xmm8,xmm8,xmm0
+        vmovdqu xmm11,XMMWORD[16+r13]
+        mov     QWORD[((64+40))+rsp],rbx
+        vaesdec xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[8+rsi]
+        lea     r13,[16+rbx*1+r13]
+        vaesdec xmm2,xmm2,xmm1
+        cmp     ecx,DWORD[((32+24))+rsp]
+        mov     rbx,QWORD[((64+48))+rsp]
+        vaesdec xmm3,xmm3,xmm1
+        prefetcht0      [31+r14]
+        vaesdec xmm4,xmm4,xmm1
+        prefetcht0      [15+r12]
+        vaesdec xmm5,xmm5,xmm1
+        lea     rbx,[rbx*1+r14]
+        cmovge  r14,rsp
+        vaesdec xmm6,xmm6,xmm1
+        cmovg   rbx,rsp
+        vaesdec xmm7,xmm7,xmm1
+        sub     rbx,r14
+        vaesdec xmm8,xmm8,xmm1
+        vmovdqu xmm12,XMMWORD[16+r14]
+        mov     QWORD[((64+48))+rsp],rbx
+        vaesdec xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[24+rsi]
+        lea     r14,[16+rbx*1+r14]
+        vaesdec xmm2,xmm2,xmm0
+        cmp     ecx,DWORD[((32+28))+rsp]
+        mov     rbx,QWORD[((64+56))+rsp]
+        vaesdec xmm3,xmm3,xmm0
+        prefetcht0      [31+r15]
+        vaesdec xmm4,xmm4,xmm0
+        prefetcht0      [15+r13]
+        vaesdec xmm5,xmm5,xmm0
+        lea     rbx,[rbx*1+r15]
+        cmovge  r15,rsp
+        vaesdec xmm6,xmm6,xmm0
+        cmovg   rbx,rsp
+        vaesdec xmm7,xmm7,xmm0
+        sub     rbx,r15
+        vaesdec xmm8,xmm8,xmm0
+        vmovdqu xmm13,XMMWORD[16+r15]
+        mov     QWORD[((64+56))+rsp],rbx
+        vaesdec xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[40+rsi]
+        lea     r15,[16+rbx*1+r15]
+        vmovdqu xmm14,XMMWORD[32+rsp]
+        prefetcht0      [15+r14]
+        prefetcht0      [15+r15]
+        cmp     eax,11
+        jb      NEAR $L$dec8x_tail
+
+        vaesdec xmm2,xmm2,xmm1
+        vaesdec xmm3,xmm3,xmm1
+        vaesdec xmm4,xmm4,xmm1
+        vaesdec xmm5,xmm5,xmm1
+        vaesdec xmm6,xmm6,xmm1
+        vaesdec xmm7,xmm7,xmm1
+        vaesdec xmm8,xmm8,xmm1
+        vaesdec xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((176-120))+rsi]
+
+        vaesdec xmm2,xmm2,xmm0
+        vaesdec xmm3,xmm3,xmm0
+        vaesdec xmm4,xmm4,xmm0
+        vaesdec xmm5,xmm5,xmm0
+        vaesdec xmm6,xmm6,xmm0
+        vaesdec xmm7,xmm7,xmm0
+        vaesdec xmm8,xmm8,xmm0
+        vaesdec xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[((192-120))+rsi]
+        je      NEAR $L$dec8x_tail
+
+        vaesdec xmm2,xmm2,xmm1
+        vaesdec xmm3,xmm3,xmm1
+        vaesdec xmm4,xmm4,xmm1
+        vaesdec xmm5,xmm5,xmm1
+        vaesdec xmm6,xmm6,xmm1
+        vaesdec xmm7,xmm7,xmm1
+        vaesdec xmm8,xmm8,xmm1
+        vaesdec xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((208-120))+rsi]
+
+        vaesdec xmm2,xmm2,xmm0
+        vaesdec xmm3,xmm3,xmm0
+        vaesdec xmm4,xmm4,xmm0
+        vaesdec xmm5,xmm5,xmm0
+        vaesdec xmm6,xmm6,xmm0
+        vaesdec xmm7,xmm7,xmm0
+        vaesdec xmm8,xmm8,xmm0
+        vaesdec xmm9,xmm9,xmm0
+        vmovups xmm0,XMMWORD[((224-120))+rsi]
+
+$L$dec8x_tail:
+        vaesdec xmm2,xmm2,xmm1
+        vpxor   xmm15,xmm15,xmm15
+        vaesdec xmm3,xmm3,xmm1
+        vaesdec xmm4,xmm4,xmm1
+        vpcmpgtd        xmm15,xmm14,xmm15
+        vaesdec xmm5,xmm5,xmm1
+        vaesdec xmm6,xmm6,xmm1
+        vpaddd  xmm15,xmm15,xmm14
+        vmovdqu xmm14,XMMWORD[48+rsp]
+        vaesdec xmm7,xmm7,xmm1
+        mov     rbx,QWORD[64+rsp]
+        vaesdec xmm8,xmm8,xmm1
+        vaesdec xmm9,xmm9,xmm1
+        vmovups xmm1,XMMWORD[((16-120))+rsi]
+
+        vaesdeclast     xmm2,xmm2,xmm0
+        vmovdqa XMMWORD[32+rsp],xmm15
+        vpxor   xmm15,xmm15,xmm15
+        vaesdeclast     xmm3,xmm3,xmm0
+        vpxor   xmm2,xmm2,XMMWORD[rbp]
+        vaesdeclast     xmm4,xmm4,xmm0
+        vpxor   xmm3,xmm3,XMMWORD[16+rbp]
+        vpcmpgtd        xmm15,xmm14,xmm15
+        vaesdeclast     xmm5,xmm5,xmm0
+        vpxor   xmm4,xmm4,XMMWORD[32+rbp]
+        vaesdeclast     xmm6,xmm6,xmm0
+        vpxor   xmm5,xmm5,XMMWORD[48+rbp]
+        vpaddd  xmm14,xmm14,xmm15
+        vmovdqu xmm15,XMMWORD[((-120))+rsi]
+        vaesdeclast     xmm7,xmm7,xmm0
+        vpxor   xmm6,xmm6,XMMWORD[64+rbp]
+        vaesdeclast     xmm8,xmm8,xmm0
+        vpxor   xmm7,xmm7,XMMWORD[80+rbp]
+        vmovdqa XMMWORD[48+rsp],xmm14
+        vaesdeclast     xmm9,xmm9,xmm0
+        vpxor   xmm8,xmm8,XMMWORD[96+rbp]
+        vmovups xmm0,XMMWORD[((32-120))+rsi]
+
+        vmovups XMMWORD[(-16)+r8],xmm2
+        sub     r8,rbx
+        vmovdqu xmm2,XMMWORD[((128+0))+rsp]
+        vpxor   xmm9,xmm9,XMMWORD[112+rbp]
+        vmovups XMMWORD[(-16)+r9],xmm3
+        sub     r9,QWORD[72+rsp]
+        vmovdqu XMMWORD[rbp],xmm2
+        vpxor   xmm2,xmm2,xmm15
+        vmovdqu xmm3,XMMWORD[((128+16))+rsp]
+        vmovups XMMWORD[(-16)+r10],xmm4
+        sub     r10,QWORD[80+rsp]
+        vmovdqu XMMWORD[16+rbp],xmm3
+        vpxor   xmm3,xmm3,xmm15
+        vmovdqu xmm4,XMMWORD[((128+32))+rsp]
+        vmovups XMMWORD[(-16)+r11],xmm5
+        sub     r11,QWORD[88+rsp]
+        vmovdqu XMMWORD[32+rbp],xmm4
+        vpxor   xmm4,xmm4,xmm15
+        vmovdqu xmm5,XMMWORD[((128+48))+rsp]
+        vmovups XMMWORD[(-16)+r12],xmm6
+        sub     r12,QWORD[96+rsp]
+        vmovdqu XMMWORD[48+rbp],xmm5
+        vpxor   xmm5,xmm5,xmm15
+        vmovdqu XMMWORD[64+rbp],xmm10
+        vpxor   xmm6,xmm15,xmm10
+        vmovups XMMWORD[(-16)+r13],xmm7
+        sub     r13,QWORD[104+rsp]
+        vmovdqu XMMWORD[80+rbp],xmm11
+        vpxor   xmm7,xmm15,xmm11
+        vmovups XMMWORD[(-16)+r14],xmm8
+        sub     r14,QWORD[112+rsp]
+        vmovdqu XMMWORD[96+rbp],xmm12
+        vpxor   xmm8,xmm15,xmm12
+        vmovups XMMWORD[(-16)+r15],xmm9
+        sub     r15,QWORD[120+rsp]
+        vmovdqu XMMWORD[112+rbp],xmm13
+        vpxor   xmm9,xmm15,xmm13
+
+        xor     rbp,128
+        dec     edx
+        jnz     NEAR $L$oop_dec8x
+
+        mov     rax,QWORD[16+rsp]
+
+
+
+
+
+
+$L$dec8x_done:
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+        movaps  xmm13,XMMWORD[((-104))+rax]
+        movaps  xmm14,XMMWORD[((-88))+rax]
+        movaps  xmm15,XMMWORD[((-72))+rax]
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$dec8x_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_multi_cbc_decrypt_avx:
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        mov     rax,QWORD[16+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+        lea     rsi,[((-56-160))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_aesni_multi_cbc_encrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_multi_cbc_encrypt wrt ..imagebase
+        DD      $L$SEH_info_aesni_multi_cbc_encrypt wrt ..imagebase
+        DD      $L$SEH_begin_aesni_multi_cbc_decrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_multi_cbc_decrypt wrt ..imagebase
+        DD      $L$SEH_info_aesni_multi_cbc_decrypt wrt ..imagebase
+        DD      $L$SEH_begin_aesni_multi_cbc_encrypt_avx wrt ..imagebase
+        DD      $L$SEH_end_aesni_multi_cbc_encrypt_avx wrt ..imagebase
+        DD      $L$SEH_info_aesni_multi_cbc_encrypt_avx wrt ..imagebase
+        DD      $L$SEH_begin_aesni_multi_cbc_decrypt_avx wrt ..imagebase
+        DD      $L$SEH_end_aesni_multi_cbc_decrypt_avx wrt ..imagebase
+        DD      $L$SEH_info_aesni_multi_cbc_decrypt_avx wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_aesni_multi_cbc_encrypt:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$enc4x_body wrt ..imagebase,$L$enc4x_epilogue wrt ..imagebase
+$L$SEH_info_aesni_multi_cbc_decrypt:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$dec4x_body wrt ..imagebase,$L$dec4x_epilogue wrt ..imagebase
+$L$SEH_info_aesni_multi_cbc_encrypt_avx:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$enc8x_body wrt ..imagebase,$L$enc8x_epilogue wrt ..imagebase
+$L$SEH_info_aesni_multi_cbc_decrypt_avx:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$dec8x_body wrt ..imagebase,$L$dec8x_epilogue wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm
new file mode 100644
index 0000000000..0b706c4e77
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm
@@ -0,0 +1,3271 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  aesni_cbc_sha1_enc
+
+ALIGN   32
+aesni_cbc_sha1_enc:
+
+        mov     r10d,DWORD[((OPENSSL_ia32cap_P+0))]
+        mov     r11,QWORD[((OPENSSL_ia32cap_P+4))]
+        bt      r11,61
+        jc      NEAR aesni_cbc_sha1_enc_shaext
+        and     r11d,268435456
+        and     r10d,1073741824
+        or      r10d,r11d
+        cmp     r10d,1342177280
+        je      NEAR aesni_cbc_sha1_enc_avx
+        jmp     NEAR aesni_cbc_sha1_enc_ssse3
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   32
+aesni_cbc_sha1_enc_ssse3:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_cbc_sha1_enc_ssse3:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     r10,QWORD[56+rsp]
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-264))+rsp]
+
+
+
+        movaps  XMMWORD[(96+0)+rsp],xmm6
+        movaps  XMMWORD[(96+16)+rsp],xmm7
+        movaps  XMMWORD[(96+32)+rsp],xmm8
+        movaps  XMMWORD[(96+48)+rsp],xmm9
+        movaps  XMMWORD[(96+64)+rsp],xmm10
+        movaps  XMMWORD[(96+80)+rsp],xmm11
+        movaps  XMMWORD[(96+96)+rsp],xmm12
+        movaps  XMMWORD[(96+112)+rsp],xmm13
+        movaps  XMMWORD[(96+128)+rsp],xmm14
+        movaps  XMMWORD[(96+144)+rsp],xmm15
+$L$prologue_ssse3:
+        mov     r12,rdi
+        mov     r13,rsi
+        mov     r14,rdx
+        lea     r15,[112+rcx]
+        movdqu  xmm2,XMMWORD[r8]
+        mov     QWORD[88+rsp],r8
+        shl     r14,6
+        sub     r13,r12
+        mov     r8d,DWORD[((240-112))+r15]
+        add     r14,r10
+
+        lea     r11,[K_XX_XX]
+        mov     eax,DWORD[r9]
+        mov     ebx,DWORD[4+r9]
+        mov     ecx,DWORD[8+r9]
+        mov     edx,DWORD[12+r9]
+        mov     esi,ebx
+        mov     ebp,DWORD[16+r9]
+        mov     edi,ecx
+        xor     edi,edx
+        and     esi,edi
+
+        movdqa  xmm3,XMMWORD[64+r11]
+        movdqa  xmm13,XMMWORD[r11]
+        movdqu  xmm4,XMMWORD[r10]
+        movdqu  xmm5,XMMWORD[16+r10]
+        movdqu  xmm6,XMMWORD[32+r10]
+        movdqu  xmm7,XMMWORD[48+r10]
+DB      102,15,56,0,227
+DB      102,15,56,0,235
+DB      102,15,56,0,243
+        add     r10,64
+        paddd   xmm4,xmm13
+DB      102,15,56,0,251
+        paddd   xmm5,xmm13
+        paddd   xmm6,xmm13
+        movdqa  XMMWORD[rsp],xmm4
+        psubd   xmm4,xmm13
+        movdqa  XMMWORD[16+rsp],xmm5
+        psubd   xmm5,xmm13
+        movdqa  XMMWORD[32+rsp],xmm6
+        psubd   xmm6,xmm13
+        movups  xmm15,XMMWORD[((-112))+r15]
+        movups  xmm0,XMMWORD[((16-112))+r15]
+        jmp     NEAR $L$oop_ssse3
+ALIGN   32
+$L$oop_ssse3:
+        ror     ebx,2
+        movups  xmm14,XMMWORD[r12]
+        xorps   xmm14,xmm15
+        xorps   xmm2,xmm14
+        movups  xmm1,XMMWORD[((-80))+r15]
+DB      102,15,56,220,208
+        pshufd  xmm8,xmm4,238
+        xor     esi,edx
+        movdqa  xmm12,xmm7
+        paddd   xmm13,xmm7
+        mov     edi,eax
+        add     ebp,DWORD[rsp]
+        punpcklqdq      xmm8,xmm5
+        xor     ebx,ecx
+        rol     eax,5
+        add     ebp,esi
+        psrldq  xmm12,4
+        and     edi,ebx
+        xor     ebx,ecx
+        pxor    xmm8,xmm4
+        add     ebp,eax
+        ror     eax,7
+        pxor    xmm12,xmm6
+        xor     edi,ecx
+        mov     esi,ebp
+        add     edx,DWORD[4+rsp]
+        pxor    xmm8,xmm12
+        xor     eax,ebx
+        rol     ebp,5
+        movdqa  XMMWORD[48+rsp],xmm13
+        add     edx,edi
+        movups  xmm0,XMMWORD[((-64))+r15]
+DB      102,15,56,220,209
+        and     esi,eax
+        movdqa  xmm3,xmm8
+        xor     eax,ebx
+        add     edx,ebp
+        ror     ebp,7
+        movdqa  xmm12,xmm8
+        xor     esi,ebx
+        pslldq  xmm3,12
+        paddd   xmm8,xmm8
+        mov     edi,edx
+        add     ecx,DWORD[8+rsp]
+        psrld   xmm12,31
+        xor     ebp,eax
+        rol     edx,5
+        add     ecx,esi
+        movdqa  xmm13,xmm3
+        and     edi,ebp
+        xor     ebp,eax
+        psrld   xmm3,30
+        add     ecx,edx
+        ror     edx,7
+        por     xmm8,xmm12
+        xor     edi,eax
+        mov     esi,ecx
+        add     ebx,DWORD[12+rsp]
+        movups  xmm1,XMMWORD[((-48))+r15]
+DB      102,15,56,220,208
+        pslld   xmm13,2
+        pxor    xmm8,xmm3
+        xor     edx,ebp
+        movdqa  xmm3,XMMWORD[r11]
+        rol     ecx,5
+        add     ebx,edi
+        and     esi,edx
+        pxor    xmm8,xmm13
+        xor     edx,ebp
+        add     ebx,ecx
+        ror     ecx,7
+        pshufd  xmm9,xmm5,238
+        xor     esi,ebp
+        movdqa  xmm13,xmm8
+        paddd   xmm3,xmm8
+        mov     edi,ebx
+        add     eax,DWORD[16+rsp]
+        punpcklqdq      xmm9,xmm6
+        xor     ecx,edx
+        rol     ebx,5
+        add     eax,esi
+        psrldq  xmm13,4
+        and     edi,ecx
+        xor     ecx,edx
+        pxor    xmm9,xmm5
+        add     eax,ebx
+        ror     ebx,7
+        movups  xmm0,XMMWORD[((-32))+r15]
+DB      102,15,56,220,209
+        pxor    xmm13,xmm7
+        xor     edi,edx
+        mov     esi,eax
+        add     ebp,DWORD[20+rsp]
+        pxor    xmm9,xmm13
+        xor     ebx,ecx
+        rol     eax,5
+        movdqa  XMMWORD[rsp],xmm3
+        add     ebp,edi
+        and     esi,ebx
+        movdqa  xmm12,xmm9
+        xor     ebx,ecx
+        add     ebp,eax
+        ror     eax,7
+        movdqa  xmm13,xmm9
+        xor     esi,ecx
+        pslldq  xmm12,12
+        paddd   xmm9,xmm9
+        mov     edi,ebp
+        add     edx,DWORD[24+rsp]
+        psrld   xmm13,31
+        xor     eax,ebx
+        rol     ebp,5
+        add     edx,esi
+        movups  xmm1,XMMWORD[((-16))+r15]
+DB      102,15,56,220,208
+        movdqa  xmm3,xmm12
+        and     edi,eax
+        xor     eax,ebx
+        psrld   xmm12,30
+        add     edx,ebp
+        ror     ebp,7
+        por     xmm9,xmm13
+        xor     edi,ebx
+        mov     esi,edx
+        add     ecx,DWORD[28+rsp]
+        pslld   xmm3,2
+        pxor    xmm9,xmm12
+        xor     ebp,eax
+        movdqa  xmm12,XMMWORD[16+r11]
+        rol     edx,5
+        add     ecx,edi
+        and     esi,ebp
+        pxor    xmm9,xmm3
+        xor     ebp,eax
+        add     ecx,edx
+        ror     edx,7
+        pshufd  xmm10,xmm6,238
+        xor     esi,eax
+        movdqa  xmm3,xmm9
+        paddd   xmm12,xmm9
+        mov     edi,ecx
+        add     ebx,DWORD[32+rsp]
+        movups  xmm0,XMMWORD[r15]
+DB      102,15,56,220,209
+        punpcklqdq      xmm10,xmm7
+        xor     edx,ebp
+        rol     ecx,5
+        add     ebx,esi
+        psrldq  xmm3,4
+        and     edi,edx
+        xor     edx,ebp
+        pxor    xmm10,xmm6
+        add     ebx,ecx
+        ror     ecx,7
+        pxor    xmm3,xmm8
+        xor     edi,ebp
+        mov     esi,ebx
+        add     eax,DWORD[36+rsp]
+        pxor    xmm10,xmm3
+        xor     ecx,edx
+        rol     ebx,5
+        movdqa  XMMWORD[16+rsp],xmm12
+        add     eax,edi
+        and     esi,ecx
+        movdqa  xmm13,xmm10
+        xor     ecx,edx
+        add     eax,ebx
+        ror     ebx,7
+        movups  xmm1,XMMWORD[16+r15]
+DB      102,15,56,220,208
+        movdqa  xmm3,xmm10
+        xor     esi,edx
+        pslldq  xmm13,12
+        paddd   xmm10,xmm10
+        mov     edi,eax
+        add     ebp,DWORD[40+rsp]
+        psrld   xmm3,31
+        xor     ebx,ecx
+        rol     eax,5
+        add     ebp,esi
+        movdqa  xmm12,xmm13
+        and     edi,ebx
+        xor     ebx,ecx
+        psrld   xmm13,30
+        add     ebp,eax
+        ror     eax,7
+        por     xmm10,xmm3
+        xor     edi,ecx
+        mov     esi,ebp
+        add     edx,DWORD[44+rsp]
+        pslld   xmm12,2
+        pxor    xmm10,xmm13
+        xor     eax,ebx
+        movdqa  xmm13,XMMWORD[16+r11]
+        rol     ebp,5
+        add     edx,edi
+        movups  xmm0,XMMWORD[32+r15]
+DB      102,15,56,220,209
+        and     esi,eax
+        pxor    xmm10,xmm12
+        xor     eax,ebx
+        add     edx,ebp
+        ror     ebp,7
+        pshufd  xmm11,xmm7,238
+        xor     esi,ebx
+        movdqa  xmm12,xmm10
+        paddd   xmm13,xmm10
+        mov     edi,edx
+        add     ecx,DWORD[48+rsp]
+        punpcklqdq      xmm11,xmm8
+        xor     ebp,eax
+        rol     edx,5
+        add     ecx,esi
+        psrldq  xmm12,4
+        and     edi,ebp
+        xor     ebp,eax
+        pxor    xmm11,xmm7
+        add     ecx,edx
+        ror     edx,7
+        pxor    xmm12,xmm9
+        xor     edi,eax
+        mov     esi,ecx
+        add     ebx,DWORD[52+rsp]
+        movups  xmm1,XMMWORD[48+r15]
+DB      102,15,56,220,208
+        pxor    xmm11,xmm12
+        xor     edx,ebp
+        rol     ecx,5
+        movdqa  XMMWORD[32+rsp],xmm13
+        add     ebx,edi
+        and     esi,edx
+        movdqa  xmm3,xmm11
+        xor     edx,ebp
+        add     ebx,ecx
+        ror     ecx,7
+        movdqa  xmm12,xmm11
+        xor     esi,ebp
+        pslldq  xmm3,12
+        paddd   xmm11,xmm11
+        mov     edi,ebx
+        add     eax,DWORD[56+rsp]
+        psrld   xmm12,31
+        xor     ecx,edx
+        rol     ebx,5
+        add     eax,esi
+        movdqa  xmm13,xmm3
+        and     edi,ecx
+        xor     ecx,edx
+        psrld   xmm3,30
+        add     eax,ebx
+        ror     ebx,7
+        cmp     r8d,11
+        jb      NEAR $L$aesenclast1
+        movups  xmm0,XMMWORD[64+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[80+r15]
+DB      102,15,56,220,208
+        je      NEAR $L$aesenclast1
+        movups  xmm0,XMMWORD[96+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[112+r15]
+DB      102,15,56,220,208
+$L$aesenclast1:
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[((16-112))+r15]
+        por     xmm11,xmm12
+        xor     edi,edx
+        mov     esi,eax
+        add     ebp,DWORD[60+rsp]
+        pslld   xmm13,2
+        pxor    xmm11,xmm3
+        xor     ebx,ecx
+        movdqa  xmm3,XMMWORD[16+r11]
+        rol     eax,5
+        add     ebp,edi
+        and     esi,ebx
+        pxor    xmm11,xmm13
+        pshufd  xmm13,xmm10,238
+        xor     ebx,ecx
+        add     ebp,eax
+        ror     eax,7
+        pxor    xmm4,xmm8
+        xor     esi,ecx
+        mov     edi,ebp
+        add     edx,DWORD[rsp]
+        punpcklqdq      xmm13,xmm11
+        xor     eax,ebx
+        rol     ebp,5
+        pxor    xmm4,xmm5
+        add     edx,esi
+        movups  xmm14,XMMWORD[16+r12]
+        xorps   xmm14,xmm15
+        movups  XMMWORD[r13*1+r12],xmm2
+        xorps   xmm2,xmm14
+        movups  xmm1,XMMWORD[((-80))+r15]
+DB      102,15,56,220,208
+        and     edi,eax
+        movdqa  xmm12,xmm3
+        xor     eax,ebx
+        paddd   xmm3,xmm11
+        add     edx,ebp
+        pxor    xmm4,xmm13
+        ror     ebp,7
+        xor     edi,ebx
+        mov     esi,edx
+        add     ecx,DWORD[4+rsp]
+        movdqa  xmm13,xmm4
+        xor     ebp,eax
+        rol     edx,5
+        movdqa  XMMWORD[48+rsp],xmm3
+        add     ecx,edi
+        and     esi,ebp
+        xor     ebp,eax
+        pslld   xmm4,2
+        add     ecx,edx
+        ror     edx,7
+        psrld   xmm13,30
+        xor     esi,eax
+        mov     edi,ecx
+        add     ebx,DWORD[8+rsp]
+        movups  xmm0,XMMWORD[((-64))+r15]
+DB      102,15,56,220,209
+        por     xmm4,xmm13
+        xor     edx,ebp
+        rol     ecx,5
+        pshufd  xmm3,xmm11,238
+        add     ebx,esi
+        and     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[12+rsp]
+        xor     edi,ebp
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,edx
+        ror     ecx,7
+        add     eax,ebx
+        pxor    xmm5,xmm9
+        add     ebp,DWORD[16+rsp]
+        movups  xmm1,XMMWORD[((-48))+r15]
+DB      102,15,56,220,208
+        xor     esi,ecx
+        punpcklqdq      xmm3,xmm4
+        mov     edi,eax
+        rol     eax,5
+        pxor    xmm5,xmm6
+        add     ebp,esi
+        xor     edi,ecx
+        movdqa  xmm13,xmm12
+        ror     ebx,7
+        paddd   xmm12,xmm4
+        add     ebp,eax
+        pxor    xmm5,xmm3
+        add     edx,DWORD[20+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        movdqa  xmm3,xmm5
+        add     edx,edi
+        xor     esi,ebx
+        movdqa  XMMWORD[rsp],xmm12
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[24+rsp]
+        pslld   xmm5,2
+        xor     esi,eax
+        mov     edi,edx
+        psrld   xmm3,30
+        rol     edx,5
+        add     ecx,esi
+        movups  xmm0,XMMWORD[((-32))+r15]
+DB      102,15,56,220,209
+        xor     edi,eax
+        ror     ebp,7
+        por     xmm5,xmm3
+        add     ecx,edx
+        add     ebx,DWORD[28+rsp]
+        pshufd  xmm12,xmm4,238
+        xor     edi,ebp
+        mov     esi,ecx
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        pxor    xmm6,xmm10
+        add     eax,DWORD[32+rsp]
+        xor     esi,edx
+        punpcklqdq      xmm12,xmm5
+        mov     edi,ebx
+        rol     ebx,5
+        pxor    xmm6,xmm7
+        add     eax,esi
+        xor     edi,edx
+        movdqa  xmm3,XMMWORD[32+r11]
+        ror     ecx,7
+        paddd   xmm13,xmm5
+        add     eax,ebx
+        pxor    xmm6,xmm12
+        add     ebp,DWORD[36+rsp]
+        movups  xmm1,XMMWORD[((-16))+r15]
+DB      102,15,56,220,208
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        movdqa  xmm12,xmm6
+        add     ebp,edi
+        xor     esi,ecx
+        movdqa  XMMWORD[16+rsp],xmm13
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[40+rsp]
+        pslld   xmm6,2
+        xor     esi,ebx
+        mov     edi,ebp
+        psrld   xmm12,30
+        rol     ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        ror     eax,7
+        por     xmm6,xmm12
+        add     edx,ebp
+        add     ecx,DWORD[44+rsp]
+        pshufd  xmm13,xmm5,238
+        xor     edi,eax
+        mov     esi,edx
+        rol     edx,5
+        add     ecx,edi
+        movups  xmm0,XMMWORD[r15]
+DB      102,15,56,220,209
+        xor     esi,eax
+        ror     ebp,7
+        add     ecx,edx
+        pxor    xmm7,xmm11
+        add     ebx,DWORD[48+rsp]
+        xor     esi,ebp
+        punpcklqdq      xmm13,xmm6
+        mov     edi,ecx
+        rol     ecx,5
+        pxor    xmm7,xmm8
+        add     ebx,esi
+        xor     edi,ebp
+        movdqa  xmm12,xmm3
+        ror     edx,7
+        paddd   xmm3,xmm6
+        add     ebx,ecx
+        pxor    xmm7,xmm13
+        add     eax,DWORD[52+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        rol     ebx,5
+        movdqa  xmm13,xmm7
+        add     eax,edi
+        xor     esi,edx
+        movdqa  XMMWORD[32+rsp],xmm3
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[56+rsp]
+        movups  xmm1,XMMWORD[16+r15]
+DB      102,15,56,220,208
+        pslld   xmm7,2
+        xor     esi,ecx
+        mov     edi,eax
+        psrld   xmm13,30
+        rol     eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        ror     ebx,7
+        por     xmm7,xmm13
+        add     ebp,eax
+        add     edx,DWORD[60+rsp]
+        pshufd  xmm3,xmm6,238
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,ebp
+        pxor    xmm8,xmm4
+        add     ecx,DWORD[rsp]
+        xor     esi,eax
+        punpcklqdq      xmm3,xmm7
+        mov     edi,edx
+        rol     edx,5
+        pxor    xmm8,xmm9
+        add     ecx,esi
+        movups  xmm0,XMMWORD[32+r15]
+DB      102,15,56,220,209
+        xor     edi,eax
+        movdqa  xmm13,xmm12
+        ror     ebp,7
+        paddd   xmm12,xmm7
+        add     ecx,edx
+        pxor    xmm8,xmm3
+        add     ebx,DWORD[4+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        rol     ecx,5
+        movdqa  xmm3,xmm8
+        add     ebx,edi
+        xor     esi,ebp
+        movdqa  XMMWORD[48+rsp],xmm12
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[8+rsp]
+        pslld   xmm8,2
+        xor     esi,edx
+        mov     edi,ebx
+        psrld   xmm3,30
+        rol     ebx,5
+        add     eax,esi
+        xor     edi,edx
+        ror     ecx,7
+        por     xmm8,xmm3
+        add     eax,ebx
+        add     ebp,DWORD[12+rsp]
+        movups  xmm1,XMMWORD[48+r15]
+DB      102,15,56,220,208
+        pshufd  xmm12,xmm7,238
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        pxor    xmm9,xmm5
+        add     edx,DWORD[16+rsp]
+        xor     esi,ebx
+        punpcklqdq      xmm12,xmm8
+        mov     edi,ebp
+        rol     ebp,5
+        pxor    xmm9,xmm10
+        add     edx,esi
+        xor     edi,ebx
+        movdqa  xmm3,xmm13
+        ror     eax,7
+        paddd   xmm13,xmm8
+        add     edx,ebp
+        pxor    xmm9,xmm12
+        add     ecx,DWORD[20+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        rol     edx,5
+        movdqa  xmm12,xmm9
+        add     ecx,edi
+        cmp     r8d,11
+        jb      NEAR $L$aesenclast2
+        movups  xmm0,XMMWORD[64+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[80+r15]
+DB      102,15,56,220,208
+        je      NEAR $L$aesenclast2
+        movups  xmm0,XMMWORD[96+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[112+r15]
+DB      102,15,56,220,208
+$L$aesenclast2:
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[((16-112))+r15]
+        xor     esi,eax
+        movdqa  XMMWORD[rsp],xmm13
+        ror     ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[24+rsp]
+        pslld   xmm9,2
+        xor     esi,ebp
+        mov     edi,ecx
+        psrld   xmm12,30
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        por     xmm9,xmm12
+        add     ebx,ecx
+        add     eax,DWORD[28+rsp]
+        pshufd  xmm13,xmm8,238
+        ror     ecx,7
+        mov     esi,ebx
+        xor     edi,edx
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        pxor    xmm10,xmm6
+        add     ebp,DWORD[32+rsp]
+        movups  xmm14,XMMWORD[32+r12]
+        xorps   xmm14,xmm15
+        movups  XMMWORD[16+r12*1+r13],xmm2
+        xorps   xmm2,xmm14
+        movups  xmm1,XMMWORD[((-80))+r15]
+DB      102,15,56,220,208
+        and     esi,ecx
+        xor     ecx,edx
+        ror     ebx,7
+        punpcklqdq      xmm13,xmm9
+        mov     edi,eax
+        xor     esi,ecx
+        pxor    xmm10,xmm11
+        rol     eax,5
+        add     ebp,esi
+        movdqa  xmm12,xmm3
+        xor     edi,ebx
+        paddd   xmm3,xmm9
+        xor     ebx,ecx
+        pxor    xmm10,xmm13
+        add     ebp,eax
+        add     edx,DWORD[36+rsp]
+        and     edi,ebx
+        xor     ebx,ecx
+        ror     eax,7
+        movdqa  xmm13,xmm10
+        mov     esi,ebp
+        xor     edi,ebx
+        movdqa  XMMWORD[16+rsp],xmm3
+        rol     ebp,5
+        add     edx,edi
+        movups  xmm0,XMMWORD[((-64))+r15]
+DB      102,15,56,220,209
+        xor     esi,eax
+        pslld   xmm10,2
+        xor     eax,ebx
+        add     edx,ebp
+        psrld   xmm13,30
+        add     ecx,DWORD[40+rsp]
+        and     esi,eax
+        xor     eax,ebx
+        por     xmm10,xmm13
+        ror     ebp,7
+        mov     edi,edx
+        xor     esi,eax
+        rol     edx,5
+        pshufd  xmm3,xmm9,238
+        add     ecx,esi
+        xor     edi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        add     ebx,DWORD[44+rsp]
+        and     edi,ebp
+        xor     ebp,eax
+        ror     edx,7
+        movups  xmm1,XMMWORD[((-48))+r15]
+DB      102,15,56,220,208
+        mov     esi,ecx
+        xor     edi,ebp
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        pxor    xmm11,xmm7
+        add     eax,DWORD[48+rsp]
+        and     esi,edx
+        xor     edx,ebp
+        ror     ecx,7
+        punpcklqdq      xmm3,xmm10
+        mov     edi,ebx
+        xor     esi,edx
+        pxor    xmm11,xmm4
+        rol     ebx,5
+        add     eax,esi
+        movdqa  xmm13,XMMWORD[48+r11]
+        xor     edi,ecx
+        paddd   xmm12,xmm10
+        xor     ecx,edx
+        pxor    xmm11,xmm3
+        add     eax,ebx
+        add     ebp,DWORD[52+rsp]
+        movups  xmm0,XMMWORD[((-32))+r15]
+DB      102,15,56,220,209
+        and     edi,ecx
+        xor     ecx,edx
+        ror     ebx,7
+        movdqa  xmm3,xmm11
+        mov     esi,eax
+        xor     edi,ecx
+        movdqa  XMMWORD[32+rsp],xmm12
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ebx
+        pslld   xmm11,2
+        xor     ebx,ecx
+        add     ebp,eax
+        psrld   xmm3,30
+        add     edx,DWORD[56+rsp]
+        and     esi,ebx
+        xor     ebx,ecx
+        por     xmm11,xmm3
+        ror     eax,7
+        mov     edi,ebp
+        xor     esi,ebx
+        rol     ebp,5
+        pshufd  xmm12,xmm10,238
+        add     edx,esi
+        movups  xmm1,XMMWORD[((-16))+r15]
+DB      102,15,56,220,208
+        xor     edi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        add     ecx,DWORD[60+rsp]
+        and     edi,eax
+        xor     eax,ebx
+        ror     ebp,7
+        mov     esi,edx
+        xor     edi,eax
+        rol     edx,5
+        add     ecx,edi
+        xor     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        pxor    xmm4,xmm8
+        add     ebx,DWORD[rsp]
+        and     esi,ebp
+        xor     ebp,eax
+        ror     edx,7
+        movups  xmm0,XMMWORD[r15]
+DB      102,15,56,220,209
+        punpcklqdq      xmm12,xmm11
+        mov     edi,ecx
+        xor     esi,ebp
+        pxor    xmm4,xmm5
+        rol     ecx,5
+        add     ebx,esi
+        movdqa  xmm3,xmm13
+        xor     edi,edx
+        paddd   xmm13,xmm11
+        xor     edx,ebp
+        pxor    xmm4,xmm12
+        add     ebx,ecx
+        add     eax,DWORD[4+rsp]
+        and     edi,edx
+        xor     edx,ebp
+        ror     ecx,7
+        movdqa  xmm12,xmm4
+        mov     esi,ebx
+        xor     edi,edx
+        movdqa  XMMWORD[48+rsp],xmm13
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,ecx
+        pslld   xmm4,2
+        xor     ecx,edx
+        add     eax,ebx
+        psrld   xmm12,30
+        add     ebp,DWORD[8+rsp]
+        movups  xmm1,XMMWORD[16+r15]
+DB      102,15,56,220,208
+        and     esi,ecx
+        xor     ecx,edx
+        por     xmm4,xmm12
+        ror     ebx,7
+        mov     edi,eax
+        xor     esi,ecx
+        rol     eax,5
+        pshufd  xmm13,xmm11,238
+        add     ebp,esi
+        xor     edi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        add     edx,DWORD[12+rsp]
+        and     edi,ebx
+        xor     ebx,ecx
+        ror     eax,7
+        mov     esi,ebp
+        xor     edi,ebx
+        rol     ebp,5
+        add     edx,edi
+        movups  xmm0,XMMWORD[32+r15]
+DB      102,15,56,220,209
+        xor     esi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        pxor    xmm5,xmm9
+        add     ecx,DWORD[16+rsp]
+        and     esi,eax
+        xor     eax,ebx
+        ror     ebp,7
+        punpcklqdq      xmm13,xmm4
+        mov     edi,edx
+        xor     esi,eax
+        pxor    xmm5,xmm6
+        rol     edx,5
+        add     ecx,esi
+        movdqa  xmm12,xmm3
+        xor     edi,ebp
+        paddd   xmm3,xmm4
+        xor     ebp,eax
+        pxor    xmm5,xmm13
+        add     ecx,edx
+        add     ebx,DWORD[20+rsp]
+        and     edi,ebp
+        xor     ebp,eax
+        ror     edx,7
+        movups  xmm1,XMMWORD[48+r15]
+DB      102,15,56,220,208
+        movdqa  xmm13,xmm5
+        mov     esi,ecx
+        xor     edi,ebp
+        movdqa  XMMWORD[rsp],xmm3
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,edx
+        pslld   xmm5,2
+        xor     edx,ebp
+        add     ebx,ecx
+        psrld   xmm13,30
+        add     eax,DWORD[24+rsp]
+        and     esi,edx
+        xor     edx,ebp
+        por     xmm5,xmm13
+        ror     ecx,7
+        mov     edi,ebx
+        xor     esi,edx
+        rol     ebx,5
+        pshufd  xmm3,xmm4,238
+        add     eax,esi
+        xor     edi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     ebp,DWORD[28+rsp]
+        cmp     r8d,11
+        jb      NEAR $L$aesenclast3
+        movups  xmm0,XMMWORD[64+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[80+r15]
+DB      102,15,56,220,208
+        je      NEAR $L$aesenclast3
+        movups  xmm0,XMMWORD[96+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[112+r15]
+DB      102,15,56,220,208
+$L$aesenclast3:
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[((16-112))+r15]
+        and     edi,ecx
+        xor     ecx,edx
+        ror     ebx,7
+        mov     esi,eax
+        xor     edi,ecx
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        pxor    xmm6,xmm10
+        add     edx,DWORD[32+rsp]
+        and     esi,ebx
+        xor     ebx,ecx
+        ror     eax,7
+        punpcklqdq      xmm3,xmm5
+        mov     edi,ebp
+        xor     esi,ebx
+        pxor    xmm6,xmm7
+        rol     ebp,5
+        add     edx,esi
+        movups  xmm14,XMMWORD[48+r12]
+        xorps   xmm14,xmm15
+        movups  XMMWORD[32+r12*1+r13],xmm2
+        xorps   xmm2,xmm14
+        movups  xmm1,XMMWORD[((-80))+r15]
+DB      102,15,56,220,208
+        movdqa  xmm13,xmm12
+        xor     edi,eax
+        paddd   xmm12,xmm5
+        xor     eax,ebx
+        pxor    xmm6,xmm3
+        add     edx,ebp
+        add     ecx,DWORD[36+rsp]
+        and     edi,eax
+        xor     eax,ebx
+        ror     ebp,7
+        movdqa  xmm3,xmm6
+        mov     esi,edx
+        xor     edi,eax
+        movdqa  XMMWORD[16+rsp],xmm12
+        rol     edx,5
+        add     ecx,edi
+        xor     esi,ebp
+        pslld   xmm6,2
+        xor     ebp,eax
+        add     ecx,edx
+        psrld   xmm3,30
+        add     ebx,DWORD[40+rsp]
+        and     esi,ebp
+        xor     ebp,eax
+        por     xmm6,xmm3
+        ror     edx,7
+        movups  xmm0,XMMWORD[((-64))+r15]
+DB      102,15,56,220,209
+        mov     edi,ecx
+        xor     esi,ebp
+        rol     ecx,5
+        pshufd  xmm12,xmm5,238
+        add     ebx,esi
+        xor     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[44+rsp]
+        and     edi,edx
+        xor     edx,ebp
+        ror     ecx,7
+        mov     esi,ebx
+        xor     edi,edx
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,edx
+        add     eax,ebx
+        pxor    xmm7,xmm11
+        add     ebp,DWORD[48+rsp]
+        movups  xmm1,XMMWORD[((-48))+r15]
+DB      102,15,56,220,208
+        xor     esi,ecx
+        punpcklqdq      xmm12,xmm6
+        mov     edi,eax
+        rol     eax,5
+        pxor    xmm7,xmm8
+        add     ebp,esi
+        xor     edi,ecx
+        movdqa  xmm3,xmm13
+        ror     ebx,7
+        paddd   xmm13,xmm6
+        add     ebp,eax
+        pxor    xmm7,xmm12
+        add     edx,DWORD[52+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        movdqa  xmm12,xmm7
+        add     edx,edi
+        xor     esi,ebx
+        movdqa  XMMWORD[32+rsp],xmm13
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[56+rsp]
+        pslld   xmm7,2
+        xor     esi,eax
+        mov     edi,edx
+        psrld   xmm12,30
+        rol     edx,5
+        add     ecx,esi
+        movups  xmm0,XMMWORD[((-32))+r15]
+DB      102,15,56,220,209
+        xor     edi,eax
+        ror     ebp,7
+        por     xmm7,xmm12
+        add     ecx,edx
+        add     ebx,DWORD[60+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        rol     ebx,5
+        paddd   xmm3,xmm7
+        add     eax,esi
+        xor     edi,edx
+        movdqa  XMMWORD[48+rsp],xmm3
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[4+rsp]
+        movups  xmm1,XMMWORD[((-16))+r15]
+DB      102,15,56,220,208
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[8+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        rol     ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[12+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        rol     edx,5
+        add     ecx,edi
+        movups  xmm0,XMMWORD[r15]
+DB      102,15,56,220,209
+        xor     esi,eax
+        ror     ebp,7
+        add     ecx,edx
+        cmp     r10,r14
+        je      NEAR $L$done_ssse3
+        movdqa  xmm3,XMMWORD[64+r11]
+        movdqa  xmm13,XMMWORD[r11]
+        movdqu  xmm4,XMMWORD[r10]
+        movdqu  xmm5,XMMWORD[16+r10]
+        movdqu  xmm6,XMMWORD[32+r10]
+        movdqu  xmm7,XMMWORD[48+r10]
+DB      102,15,56,0,227
+        add     r10,64
+        add     ebx,DWORD[16+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+DB      102,15,56,0,235
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        paddd   xmm4,xmm13
+        add     ebx,ecx
+        add     eax,DWORD[20+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        movdqa  XMMWORD[rsp],xmm4
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,edx
+        ror     ecx,7
+        psubd   xmm4,xmm13
+        add     eax,ebx
+        add     ebp,DWORD[24+rsp]
+        movups  xmm1,XMMWORD[16+r15]
+DB      102,15,56,220,208
+        xor     esi,ecx
+        mov     edi,eax
+        rol     eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[28+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[32+rsp]
+        xor     esi,eax
+        mov     edi,edx
+DB      102,15,56,0,243
+        rol     edx,5
+        add     ecx,esi
+        movups  xmm0,XMMWORD[32+r15]
+DB      102,15,56,220,209
+        xor     edi,eax
+        ror     ebp,7
+        paddd   xmm5,xmm13
+        add     ecx,edx
+        add     ebx,DWORD[36+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        movdqa  XMMWORD[16+rsp],xmm5
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        ror     edx,7
+        psubd   xmm5,xmm13
+        add     ebx,ecx
+        add     eax,DWORD[40+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        rol     ebx,5
+        add     eax,esi
+        xor     edi,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[44+rsp]
+        movups  xmm1,XMMWORD[48+r15]
+DB      102,15,56,220,208
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[48+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+DB      102,15,56,0,251
+        rol     ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        ror     eax,7
+        paddd   xmm6,xmm13
+        add     edx,ebp
+        add     ecx,DWORD[52+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        movdqa  XMMWORD[32+rsp],xmm6
+        rol     edx,5
+        add     ecx,edi
+        cmp     r8d,11
+        jb      NEAR $L$aesenclast4
+        movups  xmm0,XMMWORD[64+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[80+r15]
+DB      102,15,56,220,208
+        je      NEAR $L$aesenclast4
+        movups  xmm0,XMMWORD[96+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[112+r15]
+DB      102,15,56,220,208
+$L$aesenclast4:
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[((16-112))+r15]
+        xor     esi,eax
+        ror     ebp,7
+        psubd   xmm6,xmm13
+        add     ecx,edx
+        add     ebx,DWORD[56+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[60+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,edi
+        ror     ecx,7
+        add     eax,ebx
+        movups  XMMWORD[48+r12*1+r13],xmm2
+        lea     r12,[64+r12]
+
+        add     eax,DWORD[r9]
+        add     esi,DWORD[4+r9]
+        add     ecx,DWORD[8+r9]
+        add     edx,DWORD[12+r9]
+        mov     DWORD[r9],eax
+        add     ebp,DWORD[16+r9]
+        mov     DWORD[4+r9],esi
+        mov     ebx,esi
+        mov     DWORD[8+r9],ecx
+        mov     edi,ecx
+        mov     DWORD[12+r9],edx
+        xor     edi,edx
+        mov     DWORD[16+r9],ebp
+        and     esi,edi
+        jmp     NEAR $L$oop_ssse3
+
+$L$done_ssse3:
+        add     ebx,DWORD[16+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[20+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[24+rsp]
+        movups  xmm1,XMMWORD[16+r15]
+DB      102,15,56,220,208
+        xor     esi,ecx
+        mov     edi,eax
+        rol     eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[28+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[32+rsp]
+        xor     esi,eax
+        mov     edi,edx
+        rol     edx,5
+        add     ecx,esi
+        movups  xmm0,XMMWORD[32+r15]
+DB      102,15,56,220,209
+        xor     edi,eax
+        ror     ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[36+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[40+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        rol     ebx,5
+        add     eax,esi
+        xor     edi,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[44+rsp]
+        movups  xmm1,XMMWORD[48+r15]
+DB      102,15,56,220,208
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[48+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        rol     ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[52+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        rol     edx,5
+        add     ecx,edi
+        cmp     r8d,11
+        jb      NEAR $L$aesenclast5
+        movups  xmm0,XMMWORD[64+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[80+r15]
+DB      102,15,56,220,208
+        je      NEAR $L$aesenclast5
+        movups  xmm0,XMMWORD[96+r15]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[112+r15]
+DB      102,15,56,220,208
+$L$aesenclast5:
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[((16-112))+r15]
+        xor     esi,eax
+        ror     ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[56+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[60+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,edi
+        ror     ecx,7
+        add     eax,ebx
+        movups  XMMWORD[48+r12*1+r13],xmm2
+        mov     r8,QWORD[88+rsp]
+
+        add     eax,DWORD[r9]
+        add     esi,DWORD[4+r9]
+        add     ecx,DWORD[8+r9]
+        mov     DWORD[r9],eax
+        add     edx,DWORD[12+r9]
+        mov     DWORD[4+r9],esi
+        add     ebp,DWORD[16+r9]
+        mov     DWORD[8+r9],ecx
+        mov     DWORD[12+r9],edx
+        mov     DWORD[16+r9],ebp
+        movups  XMMWORD[r8],xmm2
+        movaps  xmm6,XMMWORD[((96+0))+rsp]
+        movaps  xmm7,XMMWORD[((96+16))+rsp]
+        movaps  xmm8,XMMWORD[((96+32))+rsp]
+        movaps  xmm9,XMMWORD[((96+48))+rsp]
+        movaps  xmm10,XMMWORD[((96+64))+rsp]
+        movaps  xmm11,XMMWORD[((96+80))+rsp]
+        movaps  xmm12,XMMWORD[((96+96))+rsp]
+        movaps  xmm13,XMMWORD[((96+112))+rsp]
+        movaps  xmm14,XMMWORD[((96+128))+rsp]
+        movaps  xmm15,XMMWORD[((96+144))+rsp]
+        lea     rsi,[264+rsp]
+
+        mov     r15,QWORD[rsi]
+
+        mov     r14,QWORD[8+rsi]
+
+        mov     r13,QWORD[16+rsi]
+
+        mov     r12,QWORD[24+rsi]
+
+        mov     rbp,QWORD[32+rsi]
+
+        mov     rbx,QWORD[40+rsi]
+
+        lea     rsp,[48+rsi]
+
+$L$epilogue_ssse3:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_cbc_sha1_enc_ssse3:
+
+ALIGN   32
+aesni_cbc_sha1_enc_avx:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_cbc_sha1_enc_avx:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     r10,QWORD[56+rsp]
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-264))+rsp]
+
+
+
+        movaps  XMMWORD[(96+0)+rsp],xmm6
+        movaps  XMMWORD[(96+16)+rsp],xmm7
+        movaps  XMMWORD[(96+32)+rsp],xmm8
+        movaps  XMMWORD[(96+48)+rsp],xmm9
+        movaps  XMMWORD[(96+64)+rsp],xmm10
+        movaps  XMMWORD[(96+80)+rsp],xmm11
+        movaps  XMMWORD[(96+96)+rsp],xmm12
+        movaps  XMMWORD[(96+112)+rsp],xmm13
+        movaps  XMMWORD[(96+128)+rsp],xmm14
+        movaps  XMMWORD[(96+144)+rsp],xmm15
+$L$prologue_avx:
+        vzeroall
+        mov     r12,rdi
+        mov     r13,rsi
+        mov     r14,rdx
+        lea     r15,[112+rcx]
+        vmovdqu xmm12,XMMWORD[r8]
+        mov     QWORD[88+rsp],r8
+        shl     r14,6
+        sub     r13,r12
+        mov     r8d,DWORD[((240-112))+r15]
+        add     r14,r10
+
+        lea     r11,[K_XX_XX]
+        mov     eax,DWORD[r9]
+        mov     ebx,DWORD[4+r9]
+        mov     ecx,DWORD[8+r9]
+        mov     edx,DWORD[12+r9]
+        mov     esi,ebx
+        mov     ebp,DWORD[16+r9]
+        mov     edi,ecx
+        xor     edi,edx
+        and     esi,edi
+
+        vmovdqa xmm6,XMMWORD[64+r11]
+        vmovdqa xmm10,XMMWORD[r11]
+        vmovdqu xmm0,XMMWORD[r10]
+        vmovdqu xmm1,XMMWORD[16+r10]
+        vmovdqu xmm2,XMMWORD[32+r10]
+        vmovdqu xmm3,XMMWORD[48+r10]
+        vpshufb xmm0,xmm0,xmm6
+        add     r10,64
+        vpshufb xmm1,xmm1,xmm6
+        vpshufb xmm2,xmm2,xmm6
+        vpshufb xmm3,xmm3,xmm6
+        vpaddd  xmm4,xmm0,xmm10
+        vpaddd  xmm5,xmm1,xmm10
+        vpaddd  xmm6,xmm2,xmm10
+        vmovdqa XMMWORD[rsp],xmm4
+        vmovdqa XMMWORD[16+rsp],xmm5
+        vmovdqa XMMWORD[32+rsp],xmm6
+        vmovups xmm15,XMMWORD[((-112))+r15]
+        vmovups xmm14,XMMWORD[((16-112))+r15]
+        jmp     NEAR $L$oop_avx
+ALIGN   32
+$L$oop_avx:
+        shrd    ebx,ebx,2
+        vmovdqu xmm13,XMMWORD[r12]
+        vpxor   xmm13,xmm13,xmm15
+        vpxor   xmm12,xmm12,xmm13
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-80))+r15]
+        xor     esi,edx
+        vpalignr        xmm4,xmm1,xmm0,8
+        mov     edi,eax
+        add     ebp,DWORD[rsp]
+        vpaddd  xmm9,xmm10,xmm3
+        xor     ebx,ecx
+        shld    eax,eax,5
+        vpsrldq xmm8,xmm3,4
+        add     ebp,esi
+        and     edi,ebx
+        vpxor   xmm4,xmm4,xmm0
+        xor     ebx,ecx
+        add     ebp,eax
+        vpxor   xmm8,xmm8,xmm2
+        shrd    eax,eax,7
+        xor     edi,ecx
+        mov     esi,ebp
+        add     edx,DWORD[4+rsp]
+        vpxor   xmm4,xmm4,xmm8
+        xor     eax,ebx
+        shld    ebp,ebp,5
+        vmovdqa XMMWORD[48+rsp],xmm9
+        add     edx,edi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[((-64))+r15]
+        and     esi,eax
+        vpsrld  xmm8,xmm4,31
+        xor     eax,ebx
+        add     edx,ebp
+        shrd    ebp,ebp,7
+        xor     esi,ebx
+        vpslldq xmm9,xmm4,12
+        vpaddd  xmm4,xmm4,xmm4
+        mov     edi,edx
+        add     ecx,DWORD[8+rsp]
+        xor     ebp,eax
+        shld    edx,edx,5
+        vpor    xmm4,xmm4,xmm8
+        vpsrld  xmm8,xmm9,30
+        add     ecx,esi
+        and     edi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        vpslld  xmm9,xmm9,2
+        vpxor   xmm4,xmm4,xmm8
+        shrd    edx,edx,7
+        xor     edi,eax
+        mov     esi,ecx
+        add     ebx,DWORD[12+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-48))+r15]
+        vpxor   xmm4,xmm4,xmm9
+        xor     edx,ebp
+        shld    ecx,ecx,5
+        add     ebx,edi
+        and     esi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        shrd    ecx,ecx,7
+        xor     esi,ebp
+        vpalignr        xmm5,xmm2,xmm1,8
+        mov     edi,ebx
+        add     eax,DWORD[16+rsp]
+        vpaddd  xmm9,xmm10,xmm4
+        xor     ecx,edx
+        shld    ebx,ebx,5
+        vpsrldq xmm8,xmm4,4
+        add     eax,esi
+        and     edi,ecx
+        vpxor   xmm5,xmm5,xmm1
+        xor     ecx,edx
+        add     eax,ebx
+        vpxor   xmm8,xmm8,xmm3
+        shrd    ebx,ebx,7
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[((-32))+r15]
+        xor     edi,edx
+        mov     esi,eax
+        add     ebp,DWORD[20+rsp]
+        vpxor   xmm5,xmm5,xmm8
+        xor     ebx,ecx
+        shld    eax,eax,5
+        vmovdqa XMMWORD[rsp],xmm9
+        add     ebp,edi
+        and     esi,ebx
+        vpsrld  xmm8,xmm5,31
+        xor     ebx,ecx
+        add     ebp,eax
+        shrd    eax,eax,7
+        xor     esi,ecx
+        vpslldq xmm9,xmm5,12
+        vpaddd  xmm5,xmm5,xmm5
+        mov     edi,ebp
+        add     edx,DWORD[24+rsp]
+        xor     eax,ebx
+        shld    ebp,ebp,5
+        vpor    xmm5,xmm5,xmm8
+        vpsrld  xmm8,xmm9,30
+        add     edx,esi
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-16))+r15]
+        and     edi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        vpslld  xmm9,xmm9,2
+        vpxor   xmm5,xmm5,xmm8
+        shrd    ebp,ebp,7
+        xor     edi,ebx
+        mov     esi,edx
+        add     ecx,DWORD[28+rsp]
+        vpxor   xmm5,xmm5,xmm9
+        xor     ebp,eax
+        shld    edx,edx,5
+        vmovdqa xmm10,XMMWORD[16+r11]
+        add     ecx,edi
+        and     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        shrd    edx,edx,7
+        xor     esi,eax
+        vpalignr        xmm6,xmm3,xmm2,8
+        mov     edi,ecx
+        add     ebx,DWORD[32+rsp]
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[r15]
+        vpaddd  xmm9,xmm10,xmm5
+        xor     edx,ebp
+        shld    ecx,ecx,5
+        vpsrldq xmm8,xmm5,4
+        add     ebx,esi
+        and     edi,edx
+        vpxor   xmm6,xmm6,xmm2
+        xor     edx,ebp
+        add     ebx,ecx
+        vpxor   xmm8,xmm8,xmm4
+        shrd    ecx,ecx,7
+        xor     edi,ebp
+        mov     esi,ebx
+        add     eax,DWORD[36+rsp]
+        vpxor   xmm6,xmm6,xmm8
+        xor     ecx,edx
+        shld    ebx,ebx,5
+        vmovdqa XMMWORD[16+rsp],xmm9
+        add     eax,edi
+        and     esi,ecx
+        vpsrld  xmm8,xmm6,31
+        xor     ecx,edx
+        add     eax,ebx
+        shrd    ebx,ebx,7
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[16+r15]
+        xor     esi,edx
+        vpslldq xmm9,xmm6,12
+        vpaddd  xmm6,xmm6,xmm6
+        mov     edi,eax
+        add     ebp,DWORD[40+rsp]
+        xor     ebx,ecx
+        shld    eax,eax,5
+        vpor    xmm6,xmm6,xmm8
+        vpsrld  xmm8,xmm9,30
+        add     ebp,esi
+        and     edi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        vpslld  xmm9,xmm9,2
+        vpxor   xmm6,xmm6,xmm8
+        shrd    eax,eax,7
+        xor     edi,ecx
+        mov     esi,ebp
+        add     edx,DWORD[44+rsp]
+        vpxor   xmm6,xmm6,xmm9
+        xor     eax,ebx
+        shld    ebp,ebp,5
+        add     edx,edi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[32+r15]
+        and     esi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        shrd    ebp,ebp,7
+        xor     esi,ebx
+        vpalignr        xmm7,xmm4,xmm3,8
+        mov     edi,edx
+        add     ecx,DWORD[48+rsp]
+        vpaddd  xmm9,xmm10,xmm6
+        xor     ebp,eax
+        shld    edx,edx,5
+        vpsrldq xmm8,xmm6,4
+        add     ecx,esi
+        and     edi,ebp
+        vpxor   xmm7,xmm7,xmm3
+        xor     ebp,eax
+        add     ecx,edx
+        vpxor   xmm8,xmm8,xmm5
+        shrd    edx,edx,7
+        xor     edi,eax
+        mov     esi,ecx
+        add     ebx,DWORD[52+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[48+r15]
+        vpxor   xmm7,xmm7,xmm8
+        xor     edx,ebp
+        shld    ecx,ecx,5
+        vmovdqa XMMWORD[32+rsp],xmm9
+        add     ebx,edi
+        and     esi,edx
+        vpsrld  xmm8,xmm7,31
+        xor     edx,ebp
+        add     ebx,ecx
+        shrd    ecx,ecx,7
+        xor     esi,ebp
+        vpslldq xmm9,xmm7,12
+        vpaddd  xmm7,xmm7,xmm7
+        mov     edi,ebx
+        add     eax,DWORD[56+rsp]
+        xor     ecx,edx
+        shld    ebx,ebx,5
+        vpor    xmm7,xmm7,xmm8
+        vpsrld  xmm8,xmm9,30
+        add     eax,esi
+        and     edi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        vpslld  xmm9,xmm9,2
+        vpxor   xmm7,xmm7,xmm8
+        shrd    ebx,ebx,7
+        cmp     r8d,11
+        jb      NEAR $L$vaesenclast6
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[64+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[80+r15]
+        je      NEAR $L$vaesenclast6
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[96+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast6:
+        vaesenclast     xmm12,xmm12,xmm15
+        vmovups xmm15,XMMWORD[((-112))+r15]
+        vmovups xmm14,XMMWORD[((16-112))+r15]
+        xor     edi,edx
+        mov     esi,eax
+        add     ebp,DWORD[60+rsp]
+        vpxor   xmm7,xmm7,xmm9
+        xor     ebx,ecx
+        shld    eax,eax,5
+        add     ebp,edi
+        and     esi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        vpalignr        xmm8,xmm7,xmm6,8
+        vpxor   xmm0,xmm0,xmm4
+        shrd    eax,eax,7
+        xor     esi,ecx
+        mov     edi,ebp
+        add     edx,DWORD[rsp]
+        vpxor   xmm0,xmm0,xmm1
+        xor     eax,ebx
+        shld    ebp,ebp,5
+        vpaddd  xmm9,xmm10,xmm7
+        add     edx,esi
+        vmovdqu xmm13,XMMWORD[16+r12]
+        vpxor   xmm13,xmm13,xmm15
+        vmovups XMMWORD[r13*1+r12],xmm12
+        vpxor   xmm12,xmm12,xmm13
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-80))+r15]
+        and     edi,eax
+        vpxor   xmm0,xmm0,xmm8
+        xor     eax,ebx
+        add     edx,ebp
+        shrd    ebp,ebp,7
+        xor     edi,ebx
+        vpsrld  xmm8,xmm0,30
+        vmovdqa XMMWORD[48+rsp],xmm9
+        mov     esi,edx
+        add     ecx,DWORD[4+rsp]
+        xor     ebp,eax
+        shld    edx,edx,5
+        vpslld  xmm0,xmm0,2
+        add     ecx,edi
+        and     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        shrd    edx,edx,7
+        xor     esi,eax
+        mov     edi,ecx
+        add     ebx,DWORD[8+rsp]
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[((-64))+r15]
+        vpor    xmm0,xmm0,xmm8
+        xor     edx,ebp
+        shld    ecx,ecx,5
+        add     ebx,esi
+        and     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[12+rsp]
+        xor     edi,ebp
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpalignr        xmm8,xmm0,xmm7,8
+        vpxor   xmm1,xmm1,xmm5
+        add     ebp,DWORD[16+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-48))+r15]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        vpxor   xmm1,xmm1,xmm2
+        add     ebp,esi
+        xor     edi,ecx
+        vpaddd  xmm9,xmm10,xmm0
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpxor   xmm1,xmm1,xmm8
+        add     edx,DWORD[20+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        vpsrld  xmm8,xmm1,30
+        vmovdqa XMMWORD[rsp],xmm9
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpslld  xmm1,xmm1,2
+        add     ecx,DWORD[24+rsp]
+        xor     esi,eax
+        mov     edi,edx
+        shld    edx,edx,5
+        add     ecx,esi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[((-32))+r15]
+        xor     edi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpor    xmm1,xmm1,xmm8
+        add     ebx,DWORD[28+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpalignr        xmm8,xmm1,xmm0,8
+        vpxor   xmm2,xmm2,xmm6
+        add     eax,DWORD[32+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        vpxor   xmm2,xmm2,xmm3
+        add     eax,esi
+        xor     edi,edx
+        vpaddd  xmm9,xmm10,xmm1
+        vmovdqa xmm10,XMMWORD[32+r11]
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpxor   xmm2,xmm2,xmm8
+        add     ebp,DWORD[36+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-16))+r15]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        vpsrld  xmm8,xmm2,30
+        vmovdqa XMMWORD[16+rsp],xmm9
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpslld  xmm2,xmm2,2
+        add     edx,DWORD[40+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpor    xmm2,xmm2,xmm8
+        add     ecx,DWORD[44+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,edi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[r15]
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpalignr        xmm8,xmm2,xmm1,8
+        vpxor   xmm3,xmm3,xmm7
+        add     ebx,DWORD[48+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        vpxor   xmm3,xmm3,xmm4
+        add     ebx,esi
+        xor     edi,ebp
+        vpaddd  xmm9,xmm10,xmm2
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpxor   xmm3,xmm3,xmm8
+        add     eax,DWORD[52+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        vpsrld  xmm8,xmm3,30
+        vmovdqa XMMWORD[32+rsp],xmm9
+        add     eax,edi
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpslld  xmm3,xmm3,2
+        add     ebp,DWORD[56+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[16+r15]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpor    xmm3,xmm3,xmm8
+        add     edx,DWORD[60+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpalignr        xmm8,xmm3,xmm2,8
+        vpxor   xmm4,xmm4,xmm0
+        add     ecx,DWORD[rsp]
+        xor     esi,eax
+        mov     edi,edx
+        shld    edx,edx,5
+        vpxor   xmm4,xmm4,xmm5
+        add     ecx,esi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[32+r15]
+        xor     edi,eax
+        vpaddd  xmm9,xmm10,xmm3
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpxor   xmm4,xmm4,xmm8
+        add     ebx,DWORD[4+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        vpsrld  xmm8,xmm4,30
+        vmovdqa XMMWORD[48+rsp],xmm9
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpslld  xmm4,xmm4,2
+        add     eax,DWORD[8+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     edi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpor    xmm4,xmm4,xmm8
+        add     ebp,DWORD[12+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[48+r15]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpalignr        xmm8,xmm4,xmm3,8
+        vpxor   xmm5,xmm5,xmm1
+        add     edx,DWORD[16+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        vpxor   xmm5,xmm5,xmm6
+        add     edx,esi
+        xor     edi,ebx
+        vpaddd  xmm9,xmm10,xmm4
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpxor   xmm5,xmm5,xmm8
+        add     ecx,DWORD[20+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        vpsrld  xmm8,xmm5,30
+        vmovdqa XMMWORD[rsp],xmm9
+        add     ecx,edi
+        cmp     r8d,11
+        jb      NEAR $L$vaesenclast7
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[64+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[80+r15]
+        je      NEAR $L$vaesenclast7
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[96+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast7:
+        vaesenclast     xmm12,xmm12,xmm15
+        vmovups xmm15,XMMWORD[((-112))+r15]
+        vmovups xmm14,XMMWORD[((16-112))+r15]
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpslld  xmm5,xmm5,2
+        add     ebx,DWORD[24+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpor    xmm5,xmm5,xmm8
+        add     eax,DWORD[28+rsp]
+        shrd    ecx,ecx,7
+        mov     esi,ebx
+        xor     edi,edx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        vpalignr        xmm8,xmm5,xmm4,8
+        vpxor   xmm6,xmm6,xmm2
+        add     ebp,DWORD[32+rsp]
+        vmovdqu xmm13,XMMWORD[32+r12]
+        vpxor   xmm13,xmm13,xmm15
+        vmovups XMMWORD[16+r12*1+r13],xmm12
+        vpxor   xmm12,xmm12,xmm13
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-80))+r15]
+        and     esi,ecx
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        vpxor   xmm6,xmm6,xmm7
+        mov     edi,eax
+        xor     esi,ecx
+        vpaddd  xmm9,xmm10,xmm5
+        shld    eax,eax,5
+        add     ebp,esi
+        vpxor   xmm6,xmm6,xmm8
+        xor     edi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        add     edx,DWORD[36+rsp]
+        vpsrld  xmm8,xmm6,30
+        vmovdqa XMMWORD[16+rsp],xmm9
+        and     edi,ebx
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        mov     esi,ebp
+        vpslld  xmm6,xmm6,2
+        xor     edi,ebx
+        shld    ebp,ebp,5
+        add     edx,edi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[((-64))+r15]
+        xor     esi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        add     ecx,DWORD[40+rsp]
+        and     esi,eax
+        vpor    xmm6,xmm6,xmm8
+        xor     eax,ebx
+        shrd    ebp,ebp,7
+        mov     edi,edx
+        xor     esi,eax
+        shld    edx,edx,5
+        add     ecx,esi
+        xor     edi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        add     ebx,DWORD[44+rsp]
+        and     edi,ebp
+        xor     ebp,eax
+        shrd    edx,edx,7
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-48))+r15]
+        mov     esi,ecx
+        xor     edi,ebp
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        vpalignr        xmm8,xmm6,xmm5,8
+        vpxor   xmm7,xmm7,xmm3
+        add     eax,DWORD[48+rsp]
+        and     esi,edx
+        xor     edx,ebp
+        shrd    ecx,ecx,7
+        vpxor   xmm7,xmm7,xmm0
+        mov     edi,ebx
+        xor     esi,edx
+        vpaddd  xmm9,xmm10,xmm6
+        vmovdqa xmm10,XMMWORD[48+r11]
+        shld    ebx,ebx,5
+        add     eax,esi
+        vpxor   xmm7,xmm7,xmm8
+        xor     edi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     ebp,DWORD[52+rsp]
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[((-32))+r15]
+        vpsrld  xmm8,xmm7,30
+        vmovdqa XMMWORD[32+rsp],xmm9
+        and     edi,ecx
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        mov     esi,eax
+        vpslld  xmm7,xmm7,2
+        xor     edi,ecx
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        add     edx,DWORD[56+rsp]
+        and     esi,ebx
+        vpor    xmm7,xmm7,xmm8
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        mov     edi,ebp
+        xor     esi,ebx
+        shld    ebp,ebp,5
+        add     edx,esi
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-16))+r15]
+        xor     edi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        add     ecx,DWORD[60+rsp]
+        and     edi,eax
+        xor     eax,ebx
+        shrd    ebp,ebp,7
+        mov     esi,edx
+        xor     edi,eax
+        shld    edx,edx,5
+        add     ecx,edi
+        xor     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        vpalignr        xmm8,xmm7,xmm6,8
+        vpxor   xmm0,xmm0,xmm4
+        add     ebx,DWORD[rsp]
+        and     esi,ebp
+        xor     ebp,eax
+        shrd    edx,edx,7
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[r15]
+        vpxor   xmm0,xmm0,xmm1
+        mov     edi,ecx
+        xor     esi,ebp
+        vpaddd  xmm9,xmm10,xmm7
+        shld    ecx,ecx,5
+        add     ebx,esi
+        vpxor   xmm0,xmm0,xmm8
+        xor     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[4+rsp]
+        vpsrld  xmm8,xmm0,30
+        vmovdqa XMMWORD[48+rsp],xmm9
+        and     edi,edx
+        xor     edx,ebp
+        shrd    ecx,ecx,7
+        mov     esi,ebx
+        vpslld  xmm0,xmm0,2
+        xor     edi,edx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     ebp,DWORD[8+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[16+r15]
+        and     esi,ecx
+        vpor    xmm0,xmm0,xmm8
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        mov     edi,eax
+        xor     esi,ecx
+        shld    eax,eax,5
+        add     ebp,esi
+        xor     edi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        add     edx,DWORD[12+rsp]
+        and     edi,ebx
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        mov     esi,ebp
+        xor     edi,ebx
+        shld    ebp,ebp,5
+        add     edx,edi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[32+r15]
+        xor     esi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        vpalignr        xmm8,xmm0,xmm7,8
+        vpxor   xmm1,xmm1,xmm5
+        add     ecx,DWORD[16+rsp]
+        and     esi,eax
+        xor     eax,ebx
+        shrd    ebp,ebp,7
+        vpxor   xmm1,xmm1,xmm2
+        mov     edi,edx
+        xor     esi,eax
+        vpaddd  xmm9,xmm10,xmm0
+        shld    edx,edx,5
+        add     ecx,esi
+        vpxor   xmm1,xmm1,xmm8
+        xor     edi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        add     ebx,DWORD[20+rsp]
+        vpsrld  xmm8,xmm1,30
+        vmovdqa XMMWORD[rsp],xmm9
+        and     edi,ebp
+        xor     ebp,eax
+        shrd    edx,edx,7
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[48+r15]
+        mov     esi,ecx
+        vpslld  xmm1,xmm1,2
+        xor     edi,ebp
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[24+rsp]
+        and     esi,edx
+        vpor    xmm1,xmm1,xmm8
+        xor     edx,ebp
+        shrd    ecx,ecx,7
+        mov     edi,ebx
+        xor     esi,edx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     edi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     ebp,DWORD[28+rsp]
+        cmp     r8d,11
+        jb      NEAR $L$vaesenclast8
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[64+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[80+r15]
+        je      NEAR $L$vaesenclast8
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[96+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast8:
+        vaesenclast     xmm12,xmm12,xmm15
+        vmovups xmm15,XMMWORD[((-112))+r15]
+        vmovups xmm14,XMMWORD[((16-112))+r15]
+        and     edi,ecx
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        mov     esi,eax
+        xor     edi,ecx
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        vpalignr        xmm8,xmm1,xmm0,8
+        vpxor   xmm2,xmm2,xmm6
+        add     edx,DWORD[32+rsp]
+        and     esi,ebx
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        vpxor   xmm2,xmm2,xmm3
+        mov     edi,ebp
+        xor     esi,ebx
+        vpaddd  xmm9,xmm10,xmm1
+        shld    ebp,ebp,5
+        add     edx,esi
+        vmovdqu xmm13,XMMWORD[48+r12]
+        vpxor   xmm13,xmm13,xmm15
+        vmovups XMMWORD[32+r12*1+r13],xmm12
+        vpxor   xmm12,xmm12,xmm13
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-80))+r15]
+        vpxor   xmm2,xmm2,xmm8
+        xor     edi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        add     ecx,DWORD[36+rsp]
+        vpsrld  xmm8,xmm2,30
+        vmovdqa XMMWORD[16+rsp],xmm9
+        and     edi,eax
+        xor     eax,ebx
+        shrd    ebp,ebp,7
+        mov     esi,edx
+        vpslld  xmm2,xmm2,2
+        xor     edi,eax
+        shld    edx,edx,5
+        add     ecx,edi
+        xor     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        add     ebx,DWORD[40+rsp]
+        and     esi,ebp
+        vpor    xmm2,xmm2,xmm8
+        xor     ebp,eax
+        shrd    edx,edx,7
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[((-64))+r15]
+        mov     edi,ecx
+        xor     esi,ebp
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[44+rsp]
+        and     edi,edx
+        xor     edx,ebp
+        shrd    ecx,ecx,7
+        mov     esi,ebx
+        xor     edi,edx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,edx
+        add     eax,ebx
+        vpalignr        xmm8,xmm2,xmm1,8
+        vpxor   xmm3,xmm3,xmm7
+        add     ebp,DWORD[48+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-48))+r15]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        vpxor   xmm3,xmm3,xmm4
+        add     ebp,esi
+        xor     edi,ecx
+        vpaddd  xmm9,xmm10,xmm2
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpxor   xmm3,xmm3,xmm8
+        add     edx,DWORD[52+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        vpsrld  xmm8,xmm3,30
+        vmovdqa XMMWORD[32+rsp],xmm9
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpslld  xmm3,xmm3,2
+        add     ecx,DWORD[56+rsp]
+        xor     esi,eax
+        mov     edi,edx
+        shld    edx,edx,5
+        add     ecx,esi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[((-32))+r15]
+        xor     edi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpor    xmm3,xmm3,xmm8
+        add     ebx,DWORD[60+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[rsp]
+        vpaddd  xmm9,xmm10,xmm3
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        vmovdqa XMMWORD[48+rsp],xmm9
+        xor     edi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[4+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[((-16))+r15]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[8+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        add     ecx,DWORD[12+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,edi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[r15]
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        cmp     r10,r14
+        je      NEAR $L$done_avx
+        vmovdqa xmm9,XMMWORD[64+r11]
+        vmovdqa xmm10,XMMWORD[r11]
+        vmovdqu xmm0,XMMWORD[r10]
+        vmovdqu xmm1,XMMWORD[16+r10]
+        vmovdqu xmm2,XMMWORD[32+r10]
+        vmovdqu xmm3,XMMWORD[48+r10]
+        vpshufb xmm0,xmm0,xmm9
+        add     r10,64
+        add     ebx,DWORD[16+rsp]
+        xor     esi,ebp
+        vpshufb xmm1,xmm1,xmm9
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        vpaddd  xmm8,xmm0,xmm10
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vmovdqa XMMWORD[rsp],xmm8
+        add     eax,DWORD[20+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[24+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[16+r15]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[28+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        add     ecx,DWORD[32+rsp]
+        xor     esi,eax
+        vpshufb xmm2,xmm2,xmm9
+        mov     edi,edx
+        shld    edx,edx,5
+        vpaddd  xmm8,xmm1,xmm10
+        add     ecx,esi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[32+r15]
+        xor     edi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vmovdqa XMMWORD[16+rsp],xmm8
+        add     ebx,DWORD[36+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[40+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     edi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[44+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[48+r15]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[48+rsp]
+        xor     esi,ebx
+        vpshufb xmm3,xmm3,xmm9
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        vpaddd  xmm8,xmm2,xmm10
+        add     edx,esi
+        xor     edi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vmovdqa XMMWORD[32+rsp],xmm8
+        add     ecx,DWORD[52+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,edi
+        cmp     r8d,11
+        jb      NEAR $L$vaesenclast9
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[64+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[80+r15]
+        je      NEAR $L$vaesenclast9
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[96+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast9:
+        vaesenclast     xmm12,xmm12,xmm15
+        vmovups xmm15,XMMWORD[((-112))+r15]
+        vmovups xmm14,XMMWORD[((16-112))+r15]
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[56+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[60+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vmovups XMMWORD[48+r12*1+r13],xmm12
+        lea     r12,[64+r12]
+
+        add     eax,DWORD[r9]
+        add     esi,DWORD[4+r9]
+        add     ecx,DWORD[8+r9]
+        add     edx,DWORD[12+r9]
+        mov     DWORD[r9],eax
+        add     ebp,DWORD[16+r9]
+        mov     DWORD[4+r9],esi
+        mov     ebx,esi
+        mov     DWORD[8+r9],ecx
+        mov     edi,ecx
+        mov     DWORD[12+r9],edx
+        xor     edi,edx
+        mov     DWORD[16+r9],ebp
+        and     esi,edi
+        jmp     NEAR $L$oop_avx
+
+$L$done_avx:
+        add     ebx,DWORD[16+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[20+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[24+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[16+r15]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[28+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        add     ecx,DWORD[32+rsp]
+        xor     esi,eax
+        mov     edi,edx
+        shld    edx,edx,5
+        add     ecx,esi
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[32+r15]
+        xor     edi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[36+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[40+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     edi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[44+rsp]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[48+r15]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[48+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        add     ecx,DWORD[52+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,edi
+        cmp     r8d,11
+        jb      NEAR $L$vaesenclast10
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[64+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[80+r15]
+        je      NEAR $L$vaesenclast10
+        vaesenc xmm12,xmm12,xmm15
+        vmovups xmm14,XMMWORD[96+r15]
+        vaesenc xmm12,xmm12,xmm14
+        vmovups xmm15,XMMWORD[112+r15]
+$L$vaesenclast10:
+        vaesenclast     xmm12,xmm12,xmm15
+        vmovups xmm15,XMMWORD[((-112))+r15]
+        vmovups xmm14,XMMWORD[((16-112))+r15]
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[56+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[60+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vmovups XMMWORD[48+r12*1+r13],xmm12
+        mov     r8,QWORD[88+rsp]
+
+        add     eax,DWORD[r9]
+        add     esi,DWORD[4+r9]
+        add     ecx,DWORD[8+r9]
+        mov     DWORD[r9],eax
+        add     edx,DWORD[12+r9]
+        mov     DWORD[4+r9],esi
+        add     ebp,DWORD[16+r9]
+        mov     DWORD[8+r9],ecx
+        mov     DWORD[12+r9],edx
+        mov     DWORD[16+r9],ebp
+        vmovups XMMWORD[r8],xmm12
+        vzeroall
+        movaps  xmm6,XMMWORD[((96+0))+rsp]
+        movaps  xmm7,XMMWORD[((96+16))+rsp]
+        movaps  xmm8,XMMWORD[((96+32))+rsp]
+        movaps  xmm9,XMMWORD[((96+48))+rsp]
+        movaps  xmm10,XMMWORD[((96+64))+rsp]
+        movaps  xmm11,XMMWORD[((96+80))+rsp]
+        movaps  xmm12,XMMWORD[((96+96))+rsp]
+        movaps  xmm13,XMMWORD[((96+112))+rsp]
+        movaps  xmm14,XMMWORD[((96+128))+rsp]
+        movaps  xmm15,XMMWORD[((96+144))+rsp]
+        lea     rsi,[264+rsp]
+
+        mov     r15,QWORD[rsi]
+
+        mov     r14,QWORD[8+rsi]
+
+        mov     r13,QWORD[16+rsi]
+
+        mov     r12,QWORD[24+rsi]
+
+        mov     rbp,QWORD[32+rsi]
+
+        mov     rbx,QWORD[40+rsi]
+
+        lea     rsp,[48+rsi]
+
+$L$epilogue_avx:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_cbc_sha1_enc_avx:
+ALIGN   64
+K_XX_XX:
+        DD      0x5a827999,0x5a827999,0x5a827999,0x5a827999
+        DD      0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+        DD      0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+        DD      0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+DB      0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0
+
+DB      65,69,83,78,73,45,67,66,67,43,83,72,65,49,32,115
+DB      116,105,116,99,104,32,102,111,114,32,120,56,54,95,54,52
+DB      44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32
+DB      60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111
+DB      114,103,62,0
+ALIGN   64
+
+ALIGN   32
+aesni_cbc_sha1_enc_shaext:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_cbc_sha1_enc_shaext:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+        mov     r10,QWORD[56+rsp]
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[(-8-160)+rax],xmm6
+        movaps  XMMWORD[(-8-144)+rax],xmm7
+        movaps  XMMWORD[(-8-128)+rax],xmm8
+        movaps  XMMWORD[(-8-112)+rax],xmm9
+        movaps  XMMWORD[(-8-96)+rax],xmm10
+        movaps  XMMWORD[(-8-80)+rax],xmm11
+        movaps  XMMWORD[(-8-64)+rax],xmm12
+        movaps  XMMWORD[(-8-48)+rax],xmm13
+        movaps  XMMWORD[(-8-32)+rax],xmm14
+        movaps  XMMWORD[(-8-16)+rax],xmm15
+$L$prologue_shaext:
+        movdqu  xmm8,XMMWORD[r9]
+        movd    xmm9,DWORD[16+r9]
+        movdqa  xmm7,XMMWORD[((K_XX_XX+80))]
+
+        mov     r11d,DWORD[240+rcx]
+        sub     rsi,rdi
+        movups  xmm15,XMMWORD[rcx]
+        movups  xmm2,XMMWORD[r8]
+        movups  xmm0,XMMWORD[16+rcx]
+        lea     rcx,[112+rcx]
+
+        pshufd  xmm8,xmm8,27
+        pshufd  xmm9,xmm9,27
+        jmp     NEAR $L$oop_shaext
+
+ALIGN   16
+$L$oop_shaext:
+        movups  xmm14,XMMWORD[rdi]
+        xorps   xmm14,xmm15
+        xorps   xmm2,xmm14
+        movups  xmm1,XMMWORD[((-80))+rcx]
+DB      102,15,56,220,208
+        movdqu  xmm3,XMMWORD[r10]
+        movdqa  xmm12,xmm9
+DB      102,15,56,0,223
+        movdqu  xmm4,XMMWORD[16+r10]
+        movdqa  xmm11,xmm8
+        movups  xmm0,XMMWORD[((-64))+rcx]
+DB      102,15,56,220,209
+DB      102,15,56,0,231
+
+        paddd   xmm9,xmm3
+        movdqu  xmm5,XMMWORD[32+r10]
+        lea     r10,[64+r10]
+        pxor    xmm3,xmm12
+        movups  xmm1,XMMWORD[((-48))+rcx]
+DB      102,15,56,220,208
+        pxor    xmm3,xmm12
+        movdqa  xmm10,xmm8
+DB      102,15,56,0,239
+DB      69,15,58,204,193,0
+DB      68,15,56,200,212
+        movups  xmm0,XMMWORD[((-32))+rcx]
+DB      102,15,56,220,209
+DB      15,56,201,220
+        movdqu  xmm6,XMMWORD[((-16))+r10]
+        movdqa  xmm9,xmm8
+DB      102,15,56,0,247
+        movups  xmm1,XMMWORD[((-16))+rcx]
+DB      102,15,56,220,208
+DB      69,15,58,204,194,0
+DB      68,15,56,200,205
+        pxor    xmm3,xmm5
+DB      15,56,201,229
+        movups  xmm0,XMMWORD[rcx]
+DB      102,15,56,220,209
+        movdqa  xmm10,xmm8
+DB      69,15,58,204,193,0
+DB      68,15,56,200,214
+        movups  xmm1,XMMWORD[16+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,222
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+        movups  xmm0,XMMWORD[32+rcx]
+DB      102,15,56,220,209
+        movdqa  xmm9,xmm8
+DB      69,15,58,204,194,0
+DB      68,15,56,200,203
+        movups  xmm1,XMMWORD[48+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,227
+        pxor    xmm5,xmm3
+DB      15,56,201,243
+        cmp     r11d,11
+        jb      NEAR $L$aesenclast11
+        movups  xmm0,XMMWORD[64+rcx]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[80+rcx]
+DB      102,15,56,220,208
+        je      NEAR $L$aesenclast11
+        movups  xmm0,XMMWORD[96+rcx]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[112+rcx]
+DB      102,15,56,220,208
+$L$aesenclast11:
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[((16-112))+rcx]
+        movdqa  xmm10,xmm8
+DB      69,15,58,204,193,0
+DB      68,15,56,200,212
+        movups  xmm14,XMMWORD[16+rdi]
+        xorps   xmm14,xmm15
+        movups  XMMWORD[rdi*1+rsi],xmm2
+        xorps   xmm2,xmm14
+        movups  xmm1,XMMWORD[((-80))+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,236
+        pxor    xmm6,xmm4
+DB      15,56,201,220
+        movups  xmm0,XMMWORD[((-64))+rcx]
+DB      102,15,56,220,209
+        movdqa  xmm9,xmm8
+DB      69,15,58,204,194,1
+DB      68,15,56,200,205
+        movups  xmm1,XMMWORD[((-48))+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,245
+        pxor    xmm3,xmm5
+DB      15,56,201,229
+        movups  xmm0,XMMWORD[((-32))+rcx]
+DB      102,15,56,220,209
+        movdqa  xmm10,xmm8
+DB      69,15,58,204,193,1
+DB      68,15,56,200,214
+        movups  xmm1,XMMWORD[((-16))+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,222
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+        movups  xmm0,XMMWORD[rcx]
+DB      102,15,56,220,209
+        movdqa  xmm9,xmm8
+DB      69,15,58,204,194,1
+DB      68,15,56,200,203
+        movups  xmm1,XMMWORD[16+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,227
+        pxor    xmm5,xmm3
+DB      15,56,201,243
+        movups  xmm0,XMMWORD[32+rcx]
+DB      102,15,56,220,209
+        movdqa  xmm10,xmm8
+DB      69,15,58,204,193,1
+DB      68,15,56,200,212
+        movups  xmm1,XMMWORD[48+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,236
+        pxor    xmm6,xmm4
+DB      15,56,201,220
+        cmp     r11d,11
+        jb      NEAR $L$aesenclast12
+        movups  xmm0,XMMWORD[64+rcx]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[80+rcx]
+DB      102,15,56,220,208
+        je      NEAR $L$aesenclast12
+        movups  xmm0,XMMWORD[96+rcx]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[112+rcx]
+DB      102,15,56,220,208
+$L$aesenclast12:
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[((16-112))+rcx]
+        movdqa  xmm9,xmm8
+DB      69,15,58,204,194,1
+DB      68,15,56,200,205
+        movups  xmm14,XMMWORD[32+rdi]
+        xorps   xmm14,xmm15
+        movups  XMMWORD[16+rdi*1+rsi],xmm2
+        xorps   xmm2,xmm14
+        movups  xmm1,XMMWORD[((-80))+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,245
+        pxor    xmm3,xmm5
+DB      15,56,201,229
+        movups  xmm0,XMMWORD[((-64))+rcx]
+DB      102,15,56,220,209
+        movdqa  xmm10,xmm8
+DB      69,15,58,204,193,2
+DB      68,15,56,200,214
+        movups  xmm1,XMMWORD[((-48))+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,222
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+        movups  xmm0,XMMWORD[((-32))+rcx]
+DB      102,15,56,220,209
+        movdqa  xmm9,xmm8
+DB      69,15,58,204,194,2
+DB      68,15,56,200,203
+        movups  xmm1,XMMWORD[((-16))+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,227
+        pxor    xmm5,xmm3
+DB      15,56,201,243
+        movups  xmm0,XMMWORD[rcx]
+DB      102,15,56,220,209
+        movdqa  xmm10,xmm8
+DB      69,15,58,204,193,2
+DB      68,15,56,200,212
+        movups  xmm1,XMMWORD[16+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,236
+        pxor    xmm6,xmm4
+DB      15,56,201,220
+        movups  xmm0,XMMWORD[32+rcx]
+DB      102,15,56,220,209
+        movdqa  xmm9,xmm8
+DB      69,15,58,204,194,2
+DB      68,15,56,200,205
+        movups  xmm1,XMMWORD[48+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,245
+        pxor    xmm3,xmm5
+DB      15,56,201,229
+        cmp     r11d,11
+        jb      NEAR $L$aesenclast13
+        movups  xmm0,XMMWORD[64+rcx]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[80+rcx]
+DB      102,15,56,220,208
+        je      NEAR $L$aesenclast13
+        movups  xmm0,XMMWORD[96+rcx]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[112+rcx]
+DB      102,15,56,220,208
+$L$aesenclast13:
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[((16-112))+rcx]
+        movdqa  xmm10,xmm8
+DB      69,15,58,204,193,2
+DB      68,15,56,200,214
+        movups  xmm14,XMMWORD[48+rdi]
+        xorps   xmm14,xmm15
+        movups  XMMWORD[32+rdi*1+rsi],xmm2
+        xorps   xmm2,xmm14
+        movups  xmm1,XMMWORD[((-80))+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,222
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+        movups  xmm0,XMMWORD[((-64))+rcx]
+DB      102,15,56,220,209
+        movdqa  xmm9,xmm8
+DB      69,15,58,204,194,3
+DB      68,15,56,200,203
+        movups  xmm1,XMMWORD[((-48))+rcx]
+DB      102,15,56,220,208
+DB      15,56,202,227
+        pxor    xmm5,xmm3
+DB      15,56,201,243
+        movups  xmm0,XMMWORD[((-32))+rcx]
+DB      102,15,56,220,209
+        movdqa  xmm10,xmm8
+DB      69,15,58,204,193,3
+DB      68,15,56,200,212
+DB      15,56,202,236
+        pxor    xmm6,xmm4
+        movups  xmm1,XMMWORD[((-16))+rcx]
+DB      102,15,56,220,208
+        movdqa  xmm9,xmm8
+DB      69,15,58,204,194,3
+DB      68,15,56,200,205
+DB      15,56,202,245
+        movups  xmm0,XMMWORD[rcx]
+DB      102,15,56,220,209
+        movdqa  xmm5,xmm12
+        movdqa  xmm10,xmm8
+DB      69,15,58,204,193,3
+DB      68,15,56,200,214
+        movups  xmm1,XMMWORD[16+rcx]
+DB      102,15,56,220,208
+        movdqa  xmm9,xmm8
+DB      69,15,58,204,194,3
+DB      68,15,56,200,205
+        movups  xmm0,XMMWORD[32+rcx]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[48+rcx]
+DB      102,15,56,220,208
+        cmp     r11d,11
+        jb      NEAR $L$aesenclast14
+        movups  xmm0,XMMWORD[64+rcx]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[80+rcx]
+DB      102,15,56,220,208
+        je      NEAR $L$aesenclast14
+        movups  xmm0,XMMWORD[96+rcx]
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[112+rcx]
+DB      102,15,56,220,208
+$L$aesenclast14:
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[((16-112))+rcx]
+        dec     rdx
+
+        paddd   xmm8,xmm11
+        movups  XMMWORD[48+rdi*1+rsi],xmm2
+        lea     rdi,[64+rdi]
+        jnz     NEAR $L$oop_shaext
+
+        pshufd  xmm8,xmm8,27
+        pshufd  xmm9,xmm9,27
+        movups  XMMWORD[r8],xmm2
+        movdqu  XMMWORD[r9],xmm8
+        movd    DWORD[16+r9],xmm9
+        movaps  xmm6,XMMWORD[((-8-160))+rax]
+        movaps  xmm7,XMMWORD[((-8-144))+rax]
+        movaps  xmm8,XMMWORD[((-8-128))+rax]
+        movaps  xmm9,XMMWORD[((-8-112))+rax]
+        movaps  xmm10,XMMWORD[((-8-96))+rax]
+        movaps  xmm11,XMMWORD[((-8-80))+rax]
+        movaps  xmm12,XMMWORD[((-8-64))+rax]
+        movaps  xmm13,XMMWORD[((-8-48))+rax]
+        movaps  xmm14,XMMWORD[((-8-32))+rax]
+        movaps  xmm15,XMMWORD[((-8-16))+rax]
+        mov     rsp,rax
+$L$epilogue_shaext:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+$L$SEH_end_aesni_cbc_sha1_enc_shaext:
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+ssse3_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+        lea     r10,[aesni_cbc_sha1_enc_shaext]
+        cmp     rbx,r10
+        jb      NEAR $L$seh_no_shaext
+
+        lea     rsi,[rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+        lea     rax,[168+rax]
+        jmp     NEAR $L$common_seh_tail
+$L$seh_no_shaext:
+        lea     rsi,[96+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+        lea     rax,[264+rax]
+
+        mov     r15,QWORD[rax]
+        mov     r14,QWORD[8+rax]
+        mov     r13,QWORD[16+rax]
+        mov     r12,QWORD[24+rax]
+        mov     rbp,QWORD[32+rax]
+        mov     rbx,QWORD[40+rax]
+        lea     rax,[48+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+$L$common_seh_tail:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_aesni_cbc_sha1_enc_ssse3 wrt ..imagebase
+        DD      $L$SEH_end_aesni_cbc_sha1_enc_ssse3 wrt ..imagebase
+        DD      $L$SEH_info_aesni_cbc_sha1_enc_ssse3 wrt ..imagebase
+        DD      $L$SEH_begin_aesni_cbc_sha1_enc_avx wrt ..imagebase
+        DD      $L$SEH_end_aesni_cbc_sha1_enc_avx wrt ..imagebase
+        DD      $L$SEH_info_aesni_cbc_sha1_enc_avx wrt ..imagebase
+        DD      $L$SEH_begin_aesni_cbc_sha1_enc_shaext wrt ..imagebase
+        DD      $L$SEH_end_aesni_cbc_sha1_enc_shaext wrt ..imagebase
+        DD      $L$SEH_info_aesni_cbc_sha1_enc_shaext wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_aesni_cbc_sha1_enc_ssse3:
+DB      9,0,0,0
+        DD      ssse3_handler wrt ..imagebase
+        DD      $L$prologue_ssse3 wrt ..imagebase,$L$epilogue_ssse3 wrt ..imagebase
+$L$SEH_info_aesni_cbc_sha1_enc_avx:
+DB      9,0,0,0
+        DD      ssse3_handler wrt ..imagebase
+        DD      $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_aesni_cbc_sha1_enc_shaext:
+DB      9,0,0,0
+        DD      ssse3_handler wrt ..imagebase
+        DD      $L$prologue_shaext wrt ..imagebase,$L$epilogue_shaext wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm
new file mode 100644
index 0000000000..0dba3d7f67
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm
@@ -0,0 +1,4709 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN  OPENSSL_ia32cap_P
+global  aesni_cbc_sha256_enc
+
+ALIGN   16
+aesni_cbc_sha256_enc:
+        lea     r11,[OPENSSL_ia32cap_P]
+        mov     eax,1
+        cmp     rcx,0
+        je      NEAR $L$probe
+        mov     eax,DWORD[r11]
+        mov     r10,QWORD[4+r11]
+        bt      r10,61
+        jc      NEAR aesni_cbc_sha256_enc_shaext
+        mov     r11,r10
+        shr     r11,32
+
+        test    r10d,2048
+        jnz     NEAR aesni_cbc_sha256_enc_xop
+        and     r11d,296
+        cmp     r11d,296
+        je      NEAR aesni_cbc_sha256_enc_avx2
+        and     r10d,268435456
+        jnz     NEAR aesni_cbc_sha256_enc_avx
+        ud2
+        xor     eax,eax
+        cmp     rcx,0
+        je      NEAR $L$probe
+        ud2
+$L$probe:
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   64
+
+K256:
+        DD      0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+        DD      0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+        DD      0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+        DD      0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+        DD      0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+        DD      0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+        DD      0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+        DD      0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+        DD      0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+        DD      0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+        DD      0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+        DD      0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+        DD      0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+        DD      0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+        DD      0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+        DD      0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+        DD      0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+        DD      0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+        DD      0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+        DD      0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+        DD      0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+        DD      0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+        DD      0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+        DD      0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+        DD      0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+        DD      0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+        DD      0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+        DD      0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+        DD      0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+        DD      0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+        DD      0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+        DD      0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+        DD      0,0,0,0,0,0,0,0,-1,-1,-1,-1
+        DD      0,0,0,0,0,0,0,0
+DB      65,69,83,78,73,45,67,66,67,43,83,72,65,50,53,54
+DB      32,115,116,105,116,99,104,32,102,111,114,32,120,56,54,95
+DB      54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98
+DB      121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108
+DB      46,111,114,103,62,0
+ALIGN   64
+
+ALIGN   64
+aesni_cbc_sha256_enc_xop:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_cbc_sha256_enc_xop:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+$L$xop_shortcut:
+        mov     r10,QWORD[56+rsp]
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        sub     rsp,288
+        and     rsp,-64
+
+        shl     rdx,6
+        sub     rsi,rdi
+        sub     r10,rdi
+        add     rdx,rdi
+
+
+        mov     QWORD[((64+8))+rsp],rsi
+        mov     QWORD[((64+16))+rsp],rdx
+
+        mov     QWORD[((64+32))+rsp],r8
+        mov     QWORD[((64+40))+rsp],r9
+        mov     QWORD[((64+48))+rsp],r10
+        mov     QWORD[120+rsp],rax
+
+        movaps  XMMWORD[128+rsp],xmm6
+        movaps  XMMWORD[144+rsp],xmm7
+        movaps  XMMWORD[160+rsp],xmm8
+        movaps  XMMWORD[176+rsp],xmm9
+        movaps  XMMWORD[192+rsp],xmm10
+        movaps  XMMWORD[208+rsp],xmm11
+        movaps  XMMWORD[224+rsp],xmm12
+        movaps  XMMWORD[240+rsp],xmm13
+        movaps  XMMWORD[256+rsp],xmm14
+        movaps  XMMWORD[272+rsp],xmm15
+$L$prologue_xop:
+        vzeroall
+
+        mov     r12,rdi
+        lea     rdi,[128+rcx]
+        lea     r13,[((K256+544))]
+        mov     r14d,DWORD[((240-128))+rdi]
+        mov     r15,r9
+        mov     rsi,r10
+        vmovdqu xmm8,XMMWORD[r8]
+        sub     r14,9
+
+        mov     eax,DWORD[r15]
+        mov     ebx,DWORD[4+r15]
+        mov     ecx,DWORD[8+r15]
+        mov     edx,DWORD[12+r15]
+        mov     r8d,DWORD[16+r15]
+        mov     r9d,DWORD[20+r15]
+        mov     r10d,DWORD[24+r15]
+        mov     r11d,DWORD[28+r15]
+
+        vmovdqa xmm14,XMMWORD[r14*8+r13]
+        vmovdqa xmm13,XMMWORD[16+r14*8+r13]
+        vmovdqa xmm12,XMMWORD[32+r14*8+r13]
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        jmp     NEAR $L$loop_xop
+ALIGN   16
+$L$loop_xop:
+        vmovdqa xmm7,XMMWORD[((K256+512))]
+        vmovdqu xmm0,XMMWORD[r12*1+rsi]
+        vmovdqu xmm1,XMMWORD[16+r12*1+rsi]
+        vmovdqu xmm2,XMMWORD[32+r12*1+rsi]
+        vmovdqu xmm3,XMMWORD[48+r12*1+rsi]
+        vpshufb xmm0,xmm0,xmm7
+        lea     rbp,[K256]
+        vpshufb xmm1,xmm1,xmm7
+        vpshufb xmm2,xmm2,xmm7
+        vpaddd  xmm4,xmm0,XMMWORD[rbp]
+        vpshufb xmm3,xmm3,xmm7
+        vpaddd  xmm5,xmm1,XMMWORD[32+rbp]
+        vpaddd  xmm6,xmm2,XMMWORD[64+rbp]
+        vpaddd  xmm7,xmm3,XMMWORD[96+rbp]
+        vmovdqa XMMWORD[rsp],xmm4
+        mov     r14d,eax
+        vmovdqa XMMWORD[16+rsp],xmm5
+        mov     esi,ebx
+        vmovdqa XMMWORD[32+rsp],xmm6
+        xor     esi,ecx
+        vmovdqa XMMWORD[48+rsp],xmm7
+        mov     r13d,r8d
+        jmp     NEAR $L$xop_00_47
+
+ALIGN   16
+$L$xop_00_47:
+        sub     rbp,-16*2*4
+        vmovdqu xmm9,XMMWORD[r12]
+        mov     QWORD[((64+0))+rsp],r12
+        vpalignr        xmm4,xmm1,xmm0,4
+        ror     r13d,14
+        mov     eax,r14d
+        vpalignr        xmm7,xmm3,xmm2,4
+        mov     r12d,r9d
+        xor     r13d,r8d
+DB      143,232,120,194,236,14
+        ror     r14d,9
+        xor     r12d,r10d
+        vpsrld  xmm4,xmm4,3
+        ror     r13d,5
+        xor     r14d,eax
+        vpaddd  xmm0,xmm0,xmm7
+        and     r12d,r8d
+        vpxor   xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+        xor     r13d,r8d
+        add     r11d,DWORD[rsp]
+        mov     r15d,eax
+DB      143,232,120,194,245,11
+        ror     r14d,11
+        xor     r12d,r10d
+        vpxor   xmm4,xmm4,xmm5
+        xor     r15d,ebx
+        ror     r13d,6
+        add     r11d,r12d
+        and     esi,r15d
+DB      143,232,120,194,251,13
+        xor     r14d,eax
+        add     r11d,r13d
+        vpxor   xmm4,xmm4,xmm6
+        xor     esi,ebx
+        add     edx,r11d
+        vpsrld  xmm6,xmm3,10
+        ror     r14d,2
+        add     r11d,esi
+        vpaddd  xmm0,xmm0,xmm4
+        mov     r13d,edx
+        add     r14d,r11d
+DB      143,232,120,194,239,2
+        ror     r13d,14
+        mov     r11d,r14d
+        vpxor   xmm7,xmm7,xmm6
+        mov     r12d,r8d
+        xor     r13d,edx
+        ror     r14d,9
+        xor     r12d,r9d
+        vpxor   xmm7,xmm7,xmm5
+        ror     r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        vpxor   xmm9,xmm9,xmm8
+        xor     r13d,edx
+        vpsrldq xmm7,xmm7,8
+        add     r10d,DWORD[4+rsp]
+        mov     esi,r11d
+        ror     r14d,11
+        xor     r12d,r9d
+        vpaddd  xmm0,xmm0,xmm7
+        xor     esi,eax
+        ror     r13d,6
+        add     r10d,r12d
+        and     r15d,esi
+DB      143,232,120,194,248,13
+        xor     r14d,r11d
+        add     r10d,r13d
+        vpsrld  xmm6,xmm0,10
+        xor     r15d,eax
+        add     ecx,r10d
+DB      143,232,120,194,239,2
+        ror     r14d,2
+        add     r10d,r15d
+        vpxor   xmm7,xmm7,xmm6
+        mov     r13d,ecx
+        add     r14d,r10d
+        ror     r13d,14
+        mov     r10d,r14d
+        vpxor   xmm7,xmm7,xmm5
+        mov     r12d,edx
+        xor     r13d,ecx
+        ror     r14d,9
+        xor     r12d,r8d
+        vpslldq xmm7,xmm7,8
+        ror     r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+        xor     r13d,ecx
+        vpaddd  xmm0,xmm0,xmm7
+        add     r9d,DWORD[8+rsp]
+        mov     r15d,r10d
+        ror     r14d,11
+        xor     r12d,r8d
+        vpaddd  xmm6,xmm0,XMMWORD[rbp]
+        xor     r15d,r11d
+        ror     r13d,6
+        add     r9d,r12d
+        and     esi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     esi,r11d
+        add     ebx,r9d
+        ror     r14d,2
+        add     r9d,esi
+        mov     r13d,ebx
+        add     r14d,r9d
+        ror     r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        xor     r13d,ebx
+        ror     r14d,9
+        xor     r12d,edx
+        ror     r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+        xor     r13d,ebx
+        add     r8d,DWORD[12+rsp]
+        mov     esi,r9d
+        ror     r14d,11
+        xor     r12d,edx
+        xor     esi,r10d
+        ror     r13d,6
+        add     r8d,r12d
+        and     r15d,esi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        add     eax,r8d
+        ror     r14d,2
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        vmovdqa XMMWORD[rsp],xmm6
+        vpalignr        xmm4,xmm2,xmm1,4
+        ror     r13d,14
+        mov     r8d,r14d
+        vpalignr        xmm7,xmm0,xmm3,4
+        mov     r12d,ebx
+        xor     r13d,eax
+DB      143,232,120,194,236,14
+        ror     r14d,9
+        xor     r12d,ecx
+        vpsrld  xmm4,xmm4,3
+        ror     r13d,5
+        xor     r14d,r8d
+        vpaddd  xmm1,xmm1,xmm7
+        and     r12d,eax
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+        xor     r13d,eax
+        add     edx,DWORD[16+rsp]
+        mov     r15d,r8d
+DB      143,232,120,194,245,11
+        ror     r14d,11
+        xor     r12d,ecx
+        vpxor   xmm4,xmm4,xmm5
+        xor     r15d,r9d
+        ror     r13d,6
+        add     edx,r12d
+        and     esi,r15d
+DB      143,232,120,194,248,13
+        xor     r14d,r8d
+        add     edx,r13d
+        vpxor   xmm4,xmm4,xmm6
+        xor     esi,r9d
+        add     r11d,edx
+        vpsrld  xmm6,xmm0,10
+        ror     r14d,2
+        add     edx,esi
+        vpaddd  xmm1,xmm1,xmm4
+        mov     r13d,r11d
+        add     r14d,edx
+DB      143,232,120,194,239,2
+        ror     r13d,14
+        mov     edx,r14d
+        vpxor   xmm7,xmm7,xmm6
+        mov     r12d,eax
+        xor     r13d,r11d
+        ror     r14d,9
+        xor     r12d,ebx
+        vpxor   xmm7,xmm7,xmm5
+        ror     r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+        xor     r13d,r11d
+        vpsrldq xmm7,xmm7,8
+        add     ecx,DWORD[20+rsp]
+        mov     esi,edx
+        ror     r14d,11
+        xor     r12d,ebx
+        vpaddd  xmm1,xmm1,xmm7
+        xor     esi,r8d
+        ror     r13d,6
+        add     ecx,r12d
+        and     r15d,esi
+DB      143,232,120,194,249,13
+        xor     r14d,edx
+        add     ecx,r13d
+        vpsrld  xmm6,xmm1,10
+        xor     r15d,r8d
+        add     r10d,ecx
+DB      143,232,120,194,239,2
+        ror     r14d,2
+        add     ecx,r15d
+        vpxor   xmm7,xmm7,xmm6
+        mov     r13d,r10d
+        add     r14d,ecx
+        ror     r13d,14
+        mov     ecx,r14d
+        vpxor   xmm7,xmm7,xmm5
+        mov     r12d,r11d
+        xor     r13d,r10d
+        ror     r14d,9
+        xor     r12d,eax
+        vpslldq xmm7,xmm7,8
+        ror     r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+        xor     r13d,r10d
+        vpaddd  xmm1,xmm1,xmm7
+        add     ebx,DWORD[24+rsp]
+        mov     r15d,ecx
+        ror     r14d,11
+        xor     r12d,eax
+        vpaddd  xmm6,xmm1,XMMWORD[32+rbp]
+        xor     r15d,edx
+        ror     r13d,6
+        add     ebx,r12d
+        and     esi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     esi,edx
+        add     r9d,ebx
+        ror     r14d,2
+        add     ebx,esi
+        mov     r13d,r9d
+        add     r14d,ebx
+        ror     r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        xor     r13d,r9d
+        ror     r14d,9
+        xor     r12d,r11d
+        ror     r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+        xor     r13d,r9d
+        add     eax,DWORD[28+rsp]
+        mov     esi,ebx
+        ror     r14d,11
+        xor     r12d,r11d
+        xor     esi,ecx
+        ror     r13d,6
+        add     eax,r12d
+        and     r15d,esi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        add     r8d,eax
+        ror     r14d,2
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        vmovdqa XMMWORD[16+rsp],xmm6
+        vpalignr        xmm4,xmm3,xmm2,4
+        ror     r13d,14
+        mov     eax,r14d
+        vpalignr        xmm7,xmm1,xmm0,4
+        mov     r12d,r9d
+        xor     r13d,r8d
+DB      143,232,120,194,236,14
+        ror     r14d,9
+        xor     r12d,r10d
+        vpsrld  xmm4,xmm4,3
+        ror     r13d,5
+        xor     r14d,eax
+        vpaddd  xmm2,xmm2,xmm7
+        and     r12d,r8d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+        xor     r13d,r8d
+        add     r11d,DWORD[32+rsp]
+        mov     r15d,eax
+DB      143,232,120,194,245,11
+        ror     r14d,11
+        xor     r12d,r10d
+        vpxor   xmm4,xmm4,xmm5
+        xor     r15d,ebx
+        ror     r13d,6
+        add     r11d,r12d
+        and     esi,r15d
+DB      143,232,120,194,249,13
+        xor     r14d,eax
+        add     r11d,r13d
+        vpxor   xmm4,xmm4,xmm6
+        xor     esi,ebx
+        add     edx,r11d
+        vpsrld  xmm6,xmm1,10
+        ror     r14d,2
+        add     r11d,esi
+        vpaddd  xmm2,xmm2,xmm4
+        mov     r13d,edx
+        add     r14d,r11d
+DB      143,232,120,194,239,2
+        ror     r13d,14
+        mov     r11d,r14d
+        vpxor   xmm7,xmm7,xmm6
+        mov     r12d,r8d
+        xor     r13d,edx
+        ror     r14d,9
+        xor     r12d,r9d
+        vpxor   xmm7,xmm7,xmm5
+        ror     r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+        xor     r13d,edx
+        vpsrldq xmm7,xmm7,8
+        add     r10d,DWORD[36+rsp]
+        mov     esi,r11d
+        ror     r14d,11
+        xor     r12d,r9d
+        vpaddd  xmm2,xmm2,xmm7
+        xor     esi,eax
+        ror     r13d,6
+        add     r10d,r12d
+        and     r15d,esi
+DB      143,232,120,194,250,13
+        xor     r14d,r11d
+        add     r10d,r13d
+        vpsrld  xmm6,xmm2,10
+        xor     r15d,eax
+        add     ecx,r10d
+DB      143,232,120,194,239,2
+        ror     r14d,2
+        add     r10d,r15d
+        vpxor   xmm7,xmm7,xmm6
+        mov     r13d,ecx
+        add     r14d,r10d
+        ror     r13d,14
+        mov     r10d,r14d
+        vpxor   xmm7,xmm7,xmm5
+        mov     r12d,edx
+        xor     r13d,ecx
+        ror     r14d,9
+        xor     r12d,r8d
+        vpslldq xmm7,xmm7,8
+        ror     r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+        xor     r13d,ecx
+        vpaddd  xmm2,xmm2,xmm7
+        add     r9d,DWORD[40+rsp]
+        mov     r15d,r10d
+        ror     r14d,11
+        xor     r12d,r8d
+        vpaddd  xmm6,xmm2,XMMWORD[64+rbp]
+        xor     r15d,r11d
+        ror     r13d,6
+        add     r9d,r12d
+        and     esi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     esi,r11d
+        add     ebx,r9d
+        ror     r14d,2
+        add     r9d,esi
+        mov     r13d,ebx
+        add     r14d,r9d
+        ror     r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        xor     r13d,ebx
+        ror     r14d,9
+        xor     r12d,edx
+        ror     r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+        xor     r13d,ebx
+        add     r8d,DWORD[44+rsp]
+        mov     esi,r9d
+        ror     r14d,11
+        xor     r12d,edx
+        xor     esi,r10d
+        ror     r13d,6
+        add     r8d,r12d
+        and     r15d,esi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        add     eax,r8d
+        ror     r14d,2
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        vmovdqa XMMWORD[32+rsp],xmm6
+        vpalignr        xmm4,xmm0,xmm3,4
+        ror     r13d,14
+        mov     r8d,r14d
+        vpalignr        xmm7,xmm2,xmm1,4
+        mov     r12d,ebx
+        xor     r13d,eax
+DB      143,232,120,194,236,14
+        ror     r14d,9
+        xor     r12d,ecx
+        vpsrld  xmm4,xmm4,3
+        ror     r13d,5
+        xor     r14d,r8d
+        vpaddd  xmm3,xmm3,xmm7
+        and     r12d,eax
+        vpand   xmm8,xmm11,xmm12
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+        xor     r13d,eax
+        add     edx,DWORD[48+rsp]
+        mov     r15d,r8d
+DB      143,232,120,194,245,11
+        ror     r14d,11
+        xor     r12d,ecx
+        vpxor   xmm4,xmm4,xmm5
+        xor     r15d,r9d
+        ror     r13d,6
+        add     edx,r12d
+        and     esi,r15d
+DB      143,232,120,194,250,13
+        xor     r14d,r8d
+        add     edx,r13d
+        vpxor   xmm4,xmm4,xmm6
+        xor     esi,r9d
+        add     r11d,edx
+        vpsrld  xmm6,xmm2,10
+        ror     r14d,2
+        add     edx,esi
+        vpaddd  xmm3,xmm3,xmm4
+        mov     r13d,r11d
+        add     r14d,edx
+DB      143,232,120,194,239,2
+        ror     r13d,14
+        mov     edx,r14d
+        vpxor   xmm7,xmm7,xmm6
+        mov     r12d,eax
+        xor     r13d,r11d
+        ror     r14d,9
+        xor     r12d,ebx
+        vpxor   xmm7,xmm7,xmm5
+        ror     r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+        xor     r13d,r11d
+        vpsrldq xmm7,xmm7,8
+        add     ecx,DWORD[52+rsp]
+        mov     esi,edx
+        ror     r14d,11
+        xor     r12d,ebx
+        vpaddd  xmm3,xmm3,xmm7
+        xor     esi,r8d
+        ror     r13d,6
+        add     ecx,r12d
+        and     r15d,esi
+DB      143,232,120,194,251,13
+        xor     r14d,edx
+        add     ecx,r13d
+        vpsrld  xmm6,xmm3,10
+        xor     r15d,r8d
+        add     r10d,ecx
+DB      143,232,120,194,239,2
+        ror     r14d,2
+        add     ecx,r15d
+        vpxor   xmm7,xmm7,xmm6
+        mov     r13d,r10d
+        add     r14d,ecx
+        ror     r13d,14
+        mov     ecx,r14d
+        vpxor   xmm7,xmm7,xmm5
+        mov     r12d,r11d
+        xor     r13d,r10d
+        ror     r14d,9
+        xor     r12d,eax
+        vpslldq xmm7,xmm7,8
+        ror     r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        vpand   xmm11,xmm11,xmm13
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+        xor     r13d,r10d
+        vpaddd  xmm3,xmm3,xmm7
+        add     ebx,DWORD[56+rsp]
+        mov     r15d,ecx
+        ror     r14d,11
+        xor     r12d,eax
+        vpaddd  xmm6,xmm3,XMMWORD[96+rbp]
+        xor     r15d,edx
+        ror     r13d,6
+        add     ebx,r12d
+        and     esi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     esi,edx
+        add     r9d,ebx
+        ror     r14d,2
+        add     ebx,esi
+        mov     r13d,r9d
+        add     r14d,ebx
+        ror     r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        xor     r13d,r9d
+        ror     r14d,9
+        xor     r12d,r11d
+        ror     r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vpor    xmm8,xmm8,xmm11
+        vaesenclast     xmm11,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        xor     r13d,r9d
+        add     eax,DWORD[60+rsp]
+        mov     esi,ebx
+        ror     r14d,11
+        xor     r12d,r11d
+        xor     esi,ecx
+        ror     r13d,6
+        add     eax,r12d
+        and     r15d,esi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        add     r8d,eax
+        ror     r14d,2
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        vmovdqa XMMWORD[48+rsp],xmm6
+        mov     r12,QWORD[((64+0))+rsp]
+        vpand   xmm11,xmm11,xmm14
+        mov     r15,QWORD[((64+8))+rsp]
+        vpor    xmm8,xmm8,xmm11
+        vmovdqu XMMWORD[r12*1+r15],xmm8
+        lea     r12,[16+r12]
+        cmp     BYTE[131+rbp],0
+        jne     NEAR $L$xop_00_47
+        vmovdqu xmm9,XMMWORD[r12]
+        mov     QWORD[((64+0))+rsp],r12
+        ror     r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        xor     r13d,r8d
+        ror     r14d,9
+        xor     r12d,r10d
+        ror     r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        vpxor   xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+        xor     r13d,r8d
+        add     r11d,DWORD[rsp]
+        mov     r15d,eax
+        ror     r14d,11
+        xor     r12d,r10d
+        xor     r15d,ebx
+        ror     r13d,6
+        add     r11d,r12d
+        and     esi,r15d
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     esi,ebx
+        add     edx,r11d
+        ror     r14d,2
+        add     r11d,esi
+        mov     r13d,edx
+        add     r14d,r11d
+        ror     r13d,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        xor     r13d,edx
+        ror     r14d,9
+        xor     r12d,r9d
+        ror     r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        vpxor   xmm9,xmm9,xmm8
+        xor     r13d,edx
+        add     r10d,DWORD[4+rsp]
+        mov     esi,r11d
+        ror     r14d,11
+        xor     r12d,r9d
+        xor     esi,eax
+        ror     r13d,6
+        add     r10d,r12d
+        and     r15d,esi
+        xor     r14d,r11d
+        add     r10d,r13d
+        xor     r15d,eax
+        add     ecx,r10d
+        ror     r14d,2
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        ror     r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        xor     r13d,ecx
+        ror     r14d,9
+        xor     r12d,r8d
+        ror     r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+        xor     r13d,ecx
+        add     r9d,DWORD[8+rsp]
+        mov     r15d,r10d
+        ror     r14d,11
+        xor     r12d,r8d
+        xor     r15d,r11d
+        ror     r13d,6
+        add     r9d,r12d
+        and     esi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     esi,r11d
+        add     ebx,r9d
+        ror     r14d,2
+        add     r9d,esi
+        mov     r13d,ebx
+        add     r14d,r9d
+        ror     r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        xor     r13d,ebx
+        ror     r14d,9
+        xor     r12d,edx
+        ror     r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+        xor     r13d,ebx
+        add     r8d,DWORD[12+rsp]
+        mov     esi,r9d
+        ror     r14d,11
+        xor     r12d,edx
+        xor     esi,r10d
+        ror     r13d,6
+        add     r8d,r12d
+        and     r15d,esi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        add     eax,r8d
+        ror     r14d,2
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        ror     r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        xor     r13d,eax
+        ror     r14d,9
+        xor     r12d,ecx
+        ror     r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+        xor     r13d,eax
+        add     edx,DWORD[16+rsp]
+        mov     r15d,r8d
+        ror     r14d,11
+        xor     r12d,ecx
+        xor     r15d,r9d
+        ror     r13d,6
+        add     edx,r12d
+        and     esi,r15d
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     esi,r9d
+        add     r11d,edx
+        ror     r14d,2
+        add     edx,esi
+        mov     r13d,r11d
+        add     r14d,edx
+        ror     r13d,14
+        mov     edx,r14d
+        mov     r12d,eax
+        xor     r13d,r11d
+        ror     r14d,9
+        xor     r12d,ebx
+        ror     r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+        xor     r13d,r11d
+        add     ecx,DWORD[20+rsp]
+        mov     esi,edx
+        ror     r14d,11
+        xor     r12d,ebx
+        xor     esi,r8d
+        ror     r13d,6
+        add     ecx,r12d
+        and     r15d,esi
+        xor     r14d,edx
+        add     ecx,r13d
+        xor     r15d,r8d
+        add     r10d,ecx
+        ror     r14d,2
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        ror     r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        xor     r13d,r10d
+        ror     r14d,9
+        xor     r12d,eax
+        ror     r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+        xor     r13d,r10d
+        add     ebx,DWORD[24+rsp]
+        mov     r15d,ecx
+        ror     r14d,11
+        xor     r12d,eax
+        xor     r15d,edx
+        ror     r13d,6
+        add     ebx,r12d
+        and     esi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     esi,edx
+        add     r9d,ebx
+        ror     r14d,2
+        add     ebx,esi
+        mov     r13d,r9d
+        add     r14d,ebx
+        ror     r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        xor     r13d,r9d
+        ror     r14d,9
+        xor     r12d,r11d
+        ror     r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+        xor     r13d,r9d
+        add     eax,DWORD[28+rsp]
+        mov     esi,ebx
+        ror     r14d,11
+        xor     r12d,r11d
+        xor     esi,ecx
+        ror     r13d,6
+        add     eax,r12d
+        and     r15d,esi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        add     r8d,eax
+        ror     r14d,2
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        ror     r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        xor     r13d,r8d
+        ror     r14d,9
+        xor     r12d,r10d
+        ror     r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+        xor     r13d,r8d
+        add     r11d,DWORD[32+rsp]
+        mov     r15d,eax
+        ror     r14d,11
+        xor     r12d,r10d
+        xor     r15d,ebx
+        ror     r13d,6
+        add     r11d,r12d
+        and     esi,r15d
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     esi,ebx
+        add     edx,r11d
+        ror     r14d,2
+        add     r11d,esi
+        mov     r13d,edx
+        add     r14d,r11d
+        ror     r13d,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        xor     r13d,edx
+        ror     r14d,9
+        xor     r12d,r9d
+        ror     r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+        xor     r13d,edx
+        add     r10d,DWORD[36+rsp]
+        mov     esi,r11d
+        ror     r14d,11
+        xor     r12d,r9d
+        xor     esi,eax
+        ror     r13d,6
+        add     r10d,r12d
+        and     r15d,esi
+        xor     r14d,r11d
+        add     r10d,r13d
+        xor     r15d,eax
+        add     ecx,r10d
+        ror     r14d,2
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        ror     r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        xor     r13d,ecx
+        ror     r14d,9
+        xor     r12d,r8d
+        ror     r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+        xor     r13d,ecx
+        add     r9d,DWORD[40+rsp]
+        mov     r15d,r10d
+        ror     r14d,11
+        xor     r12d,r8d
+        xor     r15d,r11d
+        ror     r13d,6
+        add     r9d,r12d
+        and     esi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     esi,r11d
+        add     ebx,r9d
+        ror     r14d,2
+        add     r9d,esi
+        mov     r13d,ebx
+        add     r14d,r9d
+        ror     r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        xor     r13d,ebx
+        ror     r14d,9
+        xor     r12d,edx
+        ror     r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+        xor     r13d,ebx
+        add     r8d,DWORD[44+rsp]
+        mov     esi,r9d
+        ror     r14d,11
+        xor     r12d,edx
+        xor     esi,r10d
+        ror     r13d,6
+        add     r8d,r12d
+        and     r15d,esi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        add     eax,r8d
+        ror     r14d,2
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        ror     r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        xor     r13d,eax
+        ror     r14d,9
+        xor     r12d,ecx
+        ror     r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        vpand   xmm8,xmm11,xmm12
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+        xor     r13d,eax
+        add     edx,DWORD[48+rsp]
+        mov     r15d,r8d
+        ror     r14d,11
+        xor     r12d,ecx
+        xor     r15d,r9d
+        ror     r13d,6
+        add     edx,r12d
+        and     esi,r15d
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     esi,r9d
+        add     r11d,edx
+        ror     r14d,2
+        add     edx,esi
+        mov     r13d,r11d
+        add     r14d,edx
+        ror     r13d,14
+        mov     edx,r14d
+        mov     r12d,eax
+        xor     r13d,r11d
+        ror     r14d,9
+        xor     r12d,ebx
+        ror     r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+        xor     r13d,r11d
+        add     ecx,DWORD[52+rsp]
+        mov     esi,edx
+        ror     r14d,11
+        xor     r12d,ebx
+        xor     esi,r8d
+        ror     r13d,6
+        add     ecx,r12d
+        and     r15d,esi
+        xor     r14d,edx
+        add     ecx,r13d
+        xor     r15d,r8d
+        add     r10d,ecx
+        ror     r14d,2
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        ror     r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        xor     r13d,r10d
+        ror     r14d,9
+        xor     r12d,eax
+        ror     r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        vpand   xmm11,xmm11,xmm13
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+        xor     r13d,r10d
+        add     ebx,DWORD[56+rsp]
+        mov     r15d,ecx
+        ror     r14d,11
+        xor     r12d,eax
+        xor     r15d,edx
+        ror     r13d,6
+        add     ebx,r12d
+        and     esi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     esi,edx
+        add     r9d,ebx
+        ror     r14d,2
+        add     ebx,esi
+        mov     r13d,r9d
+        add     r14d,ebx
+        ror     r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        xor     r13d,r9d
+        ror     r14d,9
+        xor     r12d,r11d
+        ror     r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vpor    xmm8,xmm8,xmm11
+        vaesenclast     xmm11,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        xor     r13d,r9d
+        add     eax,DWORD[60+rsp]
+        mov     esi,ebx
+        ror     r14d,11
+        xor     r12d,r11d
+        xor     esi,ecx
+        ror     r13d,6
+        add     eax,r12d
+        and     r15d,esi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        add     r8d,eax
+        ror     r14d,2
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        mov     r12,QWORD[((64+0))+rsp]
+        mov     r13,QWORD[((64+8))+rsp]
+        mov     r15,QWORD[((64+40))+rsp]
+        mov     rsi,QWORD[((64+48))+rsp]
+
+        vpand   xmm11,xmm11,xmm14
+        mov     eax,r14d
+        vpor    xmm8,xmm8,xmm11
+        vmovdqu XMMWORD[r13*1+r12],xmm8
+        lea     r12,[16+r12]
+
+        add     eax,DWORD[r15]
+        add     ebx,DWORD[4+r15]
+        add     ecx,DWORD[8+r15]
+        add     edx,DWORD[12+r15]
+        add     r8d,DWORD[16+r15]
+        add     r9d,DWORD[20+r15]
+        add     r10d,DWORD[24+r15]
+        add     r11d,DWORD[28+r15]
+
+        cmp     r12,QWORD[((64+16))+rsp]
+
+        mov     DWORD[r15],eax
+        mov     DWORD[4+r15],ebx
+        mov     DWORD[8+r15],ecx
+        mov     DWORD[12+r15],edx
+        mov     DWORD[16+r15],r8d
+        mov     DWORD[20+r15],r9d
+        mov     DWORD[24+r15],r10d
+        mov     DWORD[28+r15],r11d
+
+        jb      NEAR $L$loop_xop
+
+        mov     r8,QWORD[((64+32))+rsp]
+        mov     rsi,QWORD[120+rsp]
+
+        vmovdqu XMMWORD[r8],xmm8
+        vzeroall
+        movaps  xmm6,XMMWORD[128+rsp]
+        movaps  xmm7,XMMWORD[144+rsp]
+        movaps  xmm8,XMMWORD[160+rsp]
+        movaps  xmm9,XMMWORD[176+rsp]
+        movaps  xmm10,XMMWORD[192+rsp]
+        movaps  xmm11,XMMWORD[208+rsp]
+        movaps  xmm12,XMMWORD[224+rsp]
+        movaps  xmm13,XMMWORD[240+rsp]
+        movaps  xmm14,XMMWORD[256+rsp]
+        movaps  xmm15,XMMWORD[272+rsp]
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue_xop:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_cbc_sha256_enc_xop:
+
+ALIGN   64
+aesni_cbc_sha256_enc_avx:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_cbc_sha256_enc_avx:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+$L$avx_shortcut:
+        mov     r10,QWORD[56+rsp]
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        sub     rsp,288
+        and     rsp,-64
+
+        shl     rdx,6
+        sub     rsi,rdi
+        sub     r10,rdi
+        add     rdx,rdi
+
+
+        mov     QWORD[((64+8))+rsp],rsi
+        mov     QWORD[((64+16))+rsp],rdx
+
+        mov     QWORD[((64+32))+rsp],r8
+        mov     QWORD[((64+40))+rsp],r9
+        mov     QWORD[((64+48))+rsp],r10
+        mov     QWORD[120+rsp],rax
+
+        movaps  XMMWORD[128+rsp],xmm6
+        movaps  XMMWORD[144+rsp],xmm7
+        movaps  XMMWORD[160+rsp],xmm8
+        movaps  XMMWORD[176+rsp],xmm9
+        movaps  XMMWORD[192+rsp],xmm10
+        movaps  XMMWORD[208+rsp],xmm11
+        movaps  XMMWORD[224+rsp],xmm12
+        movaps  XMMWORD[240+rsp],xmm13
+        movaps  XMMWORD[256+rsp],xmm14
+        movaps  XMMWORD[272+rsp],xmm15
+$L$prologue_avx:
+        vzeroall
+
+        mov     r12,rdi
+        lea     rdi,[128+rcx]
+        lea     r13,[((K256+544))]
+        mov     r14d,DWORD[((240-128))+rdi]
+        mov     r15,r9
+        mov     rsi,r10
+        vmovdqu xmm8,XMMWORD[r8]
+        sub     r14,9
+
+        mov     eax,DWORD[r15]
+        mov     ebx,DWORD[4+r15]
+        mov     ecx,DWORD[8+r15]
+        mov     edx,DWORD[12+r15]
+        mov     r8d,DWORD[16+r15]
+        mov     r9d,DWORD[20+r15]
+        mov     r10d,DWORD[24+r15]
+        mov     r11d,DWORD[28+r15]
+
+        vmovdqa xmm14,XMMWORD[r14*8+r13]
+        vmovdqa xmm13,XMMWORD[16+r14*8+r13]
+        vmovdqa xmm12,XMMWORD[32+r14*8+r13]
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        jmp     NEAR $L$loop_avx
+ALIGN   16
+$L$loop_avx:
+        vmovdqa xmm7,XMMWORD[((K256+512))]
+        vmovdqu xmm0,XMMWORD[r12*1+rsi]
+        vmovdqu xmm1,XMMWORD[16+r12*1+rsi]
+        vmovdqu xmm2,XMMWORD[32+r12*1+rsi]
+        vmovdqu xmm3,XMMWORD[48+r12*1+rsi]
+        vpshufb xmm0,xmm0,xmm7
+        lea     rbp,[K256]
+        vpshufb xmm1,xmm1,xmm7
+        vpshufb xmm2,xmm2,xmm7
+        vpaddd  xmm4,xmm0,XMMWORD[rbp]
+        vpshufb xmm3,xmm3,xmm7
+        vpaddd  xmm5,xmm1,XMMWORD[32+rbp]
+        vpaddd  xmm6,xmm2,XMMWORD[64+rbp]
+        vpaddd  xmm7,xmm3,XMMWORD[96+rbp]
+        vmovdqa XMMWORD[rsp],xmm4
+        mov     r14d,eax
+        vmovdqa XMMWORD[16+rsp],xmm5
+        mov     esi,ebx
+        vmovdqa XMMWORD[32+rsp],xmm6
+        xor     esi,ecx
+        vmovdqa XMMWORD[48+rsp],xmm7
+        mov     r13d,r8d
+        jmp     NEAR $L$avx_00_47
+
+ALIGN   16
+$L$avx_00_47:
+        sub     rbp,-16*2*4
+        vmovdqu xmm9,XMMWORD[r12]
+        mov     QWORD[((64+0))+rsp],r12
+        vpalignr        xmm4,xmm1,xmm0,4
+        shrd    r13d,r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        vpalignr        xmm7,xmm3,xmm2,4
+        xor     r13d,r8d
+        shrd    r14d,r14d,9
+        xor     r12d,r10d
+        vpsrld  xmm6,xmm4,7
+        shrd    r13d,r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        vpaddd  xmm0,xmm0,xmm7
+        vpxor   xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+        xor     r13d,r8d
+        add     r11d,DWORD[rsp]
+        mov     r15d,eax
+        vpsrld  xmm7,xmm4,3
+        shrd    r14d,r14d,11
+        xor     r12d,r10d
+        xor     r15d,ebx
+        vpslld  xmm5,xmm4,14
+        shrd    r13d,r13d,6
+        add     r11d,r12d
+        and     esi,r15d
+        vpxor   xmm4,xmm7,xmm6
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     esi,ebx
+        vpshufd xmm7,xmm3,250
+        add     edx,r11d
+        shrd    r14d,r14d,2
+        add     r11d,esi
+        vpsrld  xmm6,xmm6,11
+        mov     r13d,edx
+        add     r14d,r11d
+        shrd    r13d,r13d,14
+        vpxor   xmm4,xmm4,xmm5
+        mov     r11d,r14d
+        mov     r12d,r8d
+        xor     r13d,edx
+        vpslld  xmm5,xmm5,11
+        shrd    r14d,r14d,9
+        xor     r12d,r9d
+        shrd    r13d,r13d,5
+        vpxor   xmm4,xmm4,xmm6
+        xor     r14d,r11d
+        and     r12d,edx
+        vpxor   xmm9,xmm9,xmm8
+        xor     r13d,edx
+        vpsrld  xmm6,xmm7,10
+        add     r10d,DWORD[4+rsp]
+        mov     esi,r11d
+        shrd    r14d,r14d,11
+        vpxor   xmm4,xmm4,xmm5
+        xor     r12d,r9d
+        xor     esi,eax
+        shrd    r13d,r13d,6
+        vpsrlq  xmm7,xmm7,17
+        add     r10d,r12d
+        and     r15d,esi
+        xor     r14d,r11d
+        vpaddd  xmm0,xmm0,xmm4
+        add     r10d,r13d
+        xor     r15d,eax
+        add     ecx,r10d
+        vpxor   xmm6,xmm6,xmm7
+        shrd    r14d,r14d,2
+        add     r10d,r15d
+        mov     r13d,ecx
+        vpsrlq  xmm7,xmm7,2
+        add     r14d,r10d
+        shrd    r13d,r13d,14
+        mov     r10d,r14d
+        vpxor   xmm6,xmm6,xmm7
+        mov     r12d,edx
+        xor     r13d,ecx
+        shrd    r14d,r14d,9
+        vpshufd xmm6,xmm6,132
+        xor     r12d,r8d
+        shrd    r13d,r13d,5
+        xor     r14d,r10d
+        vpsrldq xmm6,xmm6,8
+        and     r12d,ecx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+        xor     r13d,ecx
+        add     r9d,DWORD[8+rsp]
+        vpaddd  xmm0,xmm0,xmm6
+        mov     r15d,r10d
+        shrd    r14d,r14d,11
+        xor     r12d,r8d
+        vpshufd xmm7,xmm0,80
+        xor     r15d,r11d
+        shrd    r13d,r13d,6
+        add     r9d,r12d
+        vpsrld  xmm6,xmm7,10
+        and     esi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        vpsrlq  xmm7,xmm7,17
+        xor     esi,r11d
+        add     ebx,r9d
+        shrd    r14d,r14d,2
+        vpxor   xmm6,xmm6,xmm7
+        add     r9d,esi
+        mov     r13d,ebx
+        add     r14d,r9d
+        vpsrlq  xmm7,xmm7,2
+        shrd    r13d,r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        vpxor   xmm6,xmm6,xmm7
+        xor     r13d,ebx
+        shrd    r14d,r14d,9
+        xor     r12d,edx
+        vpshufd xmm6,xmm6,232
+        shrd    r13d,r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vpslldq xmm6,xmm6,8
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+        xor     r13d,ebx
+        add     r8d,DWORD[12+rsp]
+        mov     esi,r9d
+        vpaddd  xmm0,xmm0,xmm6
+        shrd    r14d,r14d,11
+        xor     r12d,edx
+        xor     esi,r10d
+        vpaddd  xmm6,xmm0,XMMWORD[rbp]
+        shrd    r13d,r13d,6
+        add     r8d,r12d
+        and     r15d,esi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        add     eax,r8d
+        shrd    r14d,r14d,2
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        vmovdqa XMMWORD[rsp],xmm6
+        vpalignr        xmm4,xmm2,xmm1,4
+        shrd    r13d,r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        vpalignr        xmm7,xmm0,xmm3,4
+        xor     r13d,eax
+        shrd    r14d,r14d,9
+        xor     r12d,ecx
+        vpsrld  xmm6,xmm4,7
+        shrd    r13d,r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        vpaddd  xmm1,xmm1,xmm7
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+        xor     r13d,eax
+        add     edx,DWORD[16+rsp]
+        mov     r15d,r8d
+        vpsrld  xmm7,xmm4,3
+        shrd    r14d,r14d,11
+        xor     r12d,ecx
+        xor     r15d,r9d
+        vpslld  xmm5,xmm4,14
+        shrd    r13d,r13d,6
+        add     edx,r12d
+        and     esi,r15d
+        vpxor   xmm4,xmm7,xmm6
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     esi,r9d
+        vpshufd xmm7,xmm0,250
+        add     r11d,edx
+        shrd    r14d,r14d,2
+        add     edx,esi
+        vpsrld  xmm6,xmm6,11
+        mov     r13d,r11d
+        add     r14d,edx
+        shrd    r13d,r13d,14
+        vpxor   xmm4,xmm4,xmm5
+        mov     edx,r14d
+        mov     r12d,eax
+        xor     r13d,r11d
+        vpslld  xmm5,xmm5,11
+        shrd    r14d,r14d,9
+        xor     r12d,ebx
+        shrd    r13d,r13d,5
+        vpxor   xmm4,xmm4,xmm6
+        xor     r14d,edx
+        and     r12d,r11d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+        xor     r13d,r11d
+        vpsrld  xmm6,xmm7,10
+        add     ecx,DWORD[20+rsp]
+        mov     esi,edx
+        shrd    r14d,r14d,11
+        vpxor   xmm4,xmm4,xmm5
+        xor     r12d,ebx
+        xor     esi,r8d
+        shrd    r13d,r13d,6
+        vpsrlq  xmm7,xmm7,17
+        add     ecx,r12d
+        and     r15d,esi
+        xor     r14d,edx
+        vpaddd  xmm1,xmm1,xmm4
+        add     ecx,r13d
+        xor     r15d,r8d
+        add     r10d,ecx
+        vpxor   xmm6,xmm6,xmm7
+        shrd    r14d,r14d,2
+        add     ecx,r15d
+        mov     r13d,r10d
+        vpsrlq  xmm7,xmm7,2
+        add     r14d,ecx
+        shrd    r13d,r13d,14
+        mov     ecx,r14d
+        vpxor   xmm6,xmm6,xmm7
+        mov     r12d,r11d
+        xor     r13d,r10d
+        shrd    r14d,r14d,9
+        vpshufd xmm6,xmm6,132
+        xor     r12d,eax
+        shrd    r13d,r13d,5
+        xor     r14d,ecx
+        vpsrldq xmm6,xmm6,8
+        and     r12d,r10d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+        xor     r13d,r10d
+        add     ebx,DWORD[24+rsp]
+        vpaddd  xmm1,xmm1,xmm6
+        mov     r15d,ecx
+        shrd    r14d,r14d,11
+        xor     r12d,eax
+        vpshufd xmm7,xmm1,80
+        xor     r15d,edx
+        shrd    r13d,r13d,6
+        add     ebx,r12d
+        vpsrld  xmm6,xmm7,10
+        and     esi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        vpsrlq  xmm7,xmm7,17
+        xor     esi,edx
+        add     r9d,ebx
+        shrd    r14d,r14d,2
+        vpxor   xmm6,xmm6,xmm7
+        add     ebx,esi
+        mov     r13d,r9d
+        add     r14d,ebx
+        vpsrlq  xmm7,xmm7,2
+        shrd    r13d,r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        vpxor   xmm6,xmm6,xmm7
+        xor     r13d,r9d
+        shrd    r14d,r14d,9
+        xor     r12d,r11d
+        vpshufd xmm6,xmm6,232
+        shrd    r13d,r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vpslldq xmm6,xmm6,8
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+        xor     r13d,r9d
+        add     eax,DWORD[28+rsp]
+        mov     esi,ebx
+        vpaddd  xmm1,xmm1,xmm6
+        shrd    r14d,r14d,11
+        xor     r12d,r11d
+        xor     esi,ecx
+        vpaddd  xmm6,xmm1,XMMWORD[32+rbp]
+        shrd    r13d,r13d,6
+        add     eax,r12d
+        and     r15d,esi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        add     r8d,eax
+        shrd    r14d,r14d,2
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        vmovdqa XMMWORD[16+rsp],xmm6
+        vpalignr        xmm4,xmm3,xmm2,4
+        shrd    r13d,r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        vpalignr        xmm7,xmm1,xmm0,4
+        xor     r13d,r8d
+        shrd    r14d,r14d,9
+        xor     r12d,r10d
+        vpsrld  xmm6,xmm4,7
+        shrd    r13d,r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        vpaddd  xmm2,xmm2,xmm7
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+        xor     r13d,r8d
+        add     r11d,DWORD[32+rsp]
+        mov     r15d,eax
+        vpsrld  xmm7,xmm4,3
+        shrd    r14d,r14d,11
+        xor     r12d,r10d
+        xor     r15d,ebx
+        vpslld  xmm5,xmm4,14
+        shrd    r13d,r13d,6
+        add     r11d,r12d
+        and     esi,r15d
+        vpxor   xmm4,xmm7,xmm6
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     esi,ebx
+        vpshufd xmm7,xmm1,250
+        add     edx,r11d
+        shrd    r14d,r14d,2
+        add     r11d,esi
+        vpsrld  xmm6,xmm6,11
+        mov     r13d,edx
+        add     r14d,r11d
+        shrd    r13d,r13d,14
+        vpxor   xmm4,xmm4,xmm5
+        mov     r11d,r14d
+        mov     r12d,r8d
+        xor     r13d,edx
+        vpslld  xmm5,xmm5,11
+        shrd    r14d,r14d,9
+        xor     r12d,r9d
+        shrd    r13d,r13d,5
+        vpxor   xmm4,xmm4,xmm6
+        xor     r14d,r11d
+        and     r12d,edx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+        xor     r13d,edx
+        vpsrld  xmm6,xmm7,10
+        add     r10d,DWORD[36+rsp]
+        mov     esi,r11d
+        shrd    r14d,r14d,11
+        vpxor   xmm4,xmm4,xmm5
+        xor     r12d,r9d
+        xor     esi,eax
+        shrd    r13d,r13d,6
+        vpsrlq  xmm7,xmm7,17
+        add     r10d,r12d
+        and     r15d,esi
+        xor     r14d,r11d
+        vpaddd  xmm2,xmm2,xmm4
+        add     r10d,r13d
+        xor     r15d,eax
+        add     ecx,r10d
+        vpxor   xmm6,xmm6,xmm7
+        shrd    r14d,r14d,2
+        add     r10d,r15d
+        mov     r13d,ecx
+        vpsrlq  xmm7,xmm7,2
+        add     r14d,r10d
+        shrd    r13d,r13d,14
+        mov     r10d,r14d
+        vpxor   xmm6,xmm6,xmm7
+        mov     r12d,edx
+        xor     r13d,ecx
+        shrd    r14d,r14d,9
+        vpshufd xmm6,xmm6,132
+        xor     r12d,r8d
+        shrd    r13d,r13d,5
+        xor     r14d,r10d
+        vpsrldq xmm6,xmm6,8
+        and     r12d,ecx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+        xor     r13d,ecx
+        add     r9d,DWORD[40+rsp]
+        vpaddd  xmm2,xmm2,xmm6
+        mov     r15d,r10d
+        shrd    r14d,r14d,11
+        xor     r12d,r8d
+        vpshufd xmm7,xmm2,80
+        xor     r15d,r11d
+        shrd    r13d,r13d,6
+        add     r9d,r12d
+        vpsrld  xmm6,xmm7,10
+        and     esi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        vpsrlq  xmm7,xmm7,17
+        xor     esi,r11d
+        add     ebx,r9d
+        shrd    r14d,r14d,2
+        vpxor   xmm6,xmm6,xmm7
+        add     r9d,esi
+        mov     r13d,ebx
+        add     r14d,r9d
+        vpsrlq  xmm7,xmm7,2
+        shrd    r13d,r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        vpxor   xmm6,xmm6,xmm7
+        xor     r13d,ebx
+        shrd    r14d,r14d,9
+        xor     r12d,edx
+        vpshufd xmm6,xmm6,232
+        shrd    r13d,r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vpslldq xmm6,xmm6,8
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+        xor     r13d,ebx
+        add     r8d,DWORD[44+rsp]
+        mov     esi,r9d
+        vpaddd  xmm2,xmm2,xmm6
+        shrd    r14d,r14d,11
+        xor     r12d,edx
+        xor     esi,r10d
+        vpaddd  xmm6,xmm2,XMMWORD[64+rbp]
+        shrd    r13d,r13d,6
+        add     r8d,r12d
+        and     r15d,esi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        add     eax,r8d
+        shrd    r14d,r14d,2
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        vmovdqa XMMWORD[32+rsp],xmm6
+        vpalignr        xmm4,xmm0,xmm3,4
+        shrd    r13d,r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        vpalignr        xmm7,xmm2,xmm1,4
+        xor     r13d,eax
+        shrd    r14d,r14d,9
+        xor     r12d,ecx
+        vpsrld  xmm6,xmm4,7
+        shrd    r13d,r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        vpaddd  xmm3,xmm3,xmm7
+        vpand   xmm8,xmm11,xmm12
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+        xor     r13d,eax
+        add     edx,DWORD[48+rsp]
+        mov     r15d,r8d
+        vpsrld  xmm7,xmm4,3
+        shrd    r14d,r14d,11
+        xor     r12d,ecx
+        xor     r15d,r9d
+        vpslld  xmm5,xmm4,14
+        shrd    r13d,r13d,6
+        add     edx,r12d
+        and     esi,r15d
+        vpxor   xmm4,xmm7,xmm6
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     esi,r9d
+        vpshufd xmm7,xmm2,250
+        add     r11d,edx
+        shrd    r14d,r14d,2
+        add     edx,esi
+        vpsrld  xmm6,xmm6,11
+        mov     r13d,r11d
+        add     r14d,edx
+        shrd    r13d,r13d,14
+        vpxor   xmm4,xmm4,xmm5
+        mov     edx,r14d
+        mov     r12d,eax
+        xor     r13d,r11d
+        vpslld  xmm5,xmm5,11
+        shrd    r14d,r14d,9
+        xor     r12d,ebx
+        shrd    r13d,r13d,5
+        vpxor   xmm4,xmm4,xmm6
+        xor     r14d,edx
+        and     r12d,r11d
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+        xor     r13d,r11d
+        vpsrld  xmm6,xmm7,10
+        add     ecx,DWORD[52+rsp]
+        mov     esi,edx
+        shrd    r14d,r14d,11
+        vpxor   xmm4,xmm4,xmm5
+        xor     r12d,ebx
+        xor     esi,r8d
+        shrd    r13d,r13d,6
+        vpsrlq  xmm7,xmm7,17
+        add     ecx,r12d
+        and     r15d,esi
+        xor     r14d,edx
+        vpaddd  xmm3,xmm3,xmm4
+        add     ecx,r13d
+        xor     r15d,r8d
+        add     r10d,ecx
+        vpxor   xmm6,xmm6,xmm7
+        shrd    r14d,r14d,2
+        add     ecx,r15d
+        mov     r13d,r10d
+        vpsrlq  xmm7,xmm7,2
+        add     r14d,ecx
+        shrd    r13d,r13d,14
+        mov     ecx,r14d
+        vpxor   xmm6,xmm6,xmm7
+        mov     r12d,r11d
+        xor     r13d,r10d
+        shrd    r14d,r14d,9
+        vpshufd xmm6,xmm6,132
+        xor     r12d,eax
+        shrd    r13d,r13d,5
+        xor     r14d,ecx
+        vpsrldq xmm6,xmm6,8
+        and     r12d,r10d
+        vpand   xmm11,xmm11,xmm13
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+        xor     r13d,r10d
+        add     ebx,DWORD[56+rsp]
+        vpaddd  xmm3,xmm3,xmm6
+        mov     r15d,ecx
+        shrd    r14d,r14d,11
+        xor     r12d,eax
+        vpshufd xmm7,xmm3,80
+        xor     r15d,edx
+        shrd    r13d,r13d,6
+        add     ebx,r12d
+        vpsrld  xmm6,xmm7,10
+        and     esi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        vpsrlq  xmm7,xmm7,17
+        xor     esi,edx
+        add     r9d,ebx
+        shrd    r14d,r14d,2
+        vpxor   xmm6,xmm6,xmm7
+        add     ebx,esi
+        mov     r13d,r9d
+        add     r14d,ebx
+        vpsrlq  xmm7,xmm7,2
+        shrd    r13d,r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        vpxor   xmm6,xmm6,xmm7
+        xor     r13d,r9d
+        shrd    r14d,r14d,9
+        xor     r12d,r11d
+        vpshufd xmm6,xmm6,232
+        shrd    r13d,r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vpslldq xmm6,xmm6,8
+        vpor    xmm8,xmm8,xmm11
+        vaesenclast     xmm11,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        xor     r13d,r9d
+        add     eax,DWORD[60+rsp]
+        mov     esi,ebx
+        vpaddd  xmm3,xmm3,xmm6
+        shrd    r14d,r14d,11
+        xor     r12d,r11d
+        xor     esi,ecx
+        vpaddd  xmm6,xmm3,XMMWORD[96+rbp]
+        shrd    r13d,r13d,6
+        add     eax,r12d
+        and     r15d,esi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        add     r8d,eax
+        shrd    r14d,r14d,2
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        vmovdqa XMMWORD[48+rsp],xmm6
+        mov     r12,QWORD[((64+0))+rsp]
+        vpand   xmm11,xmm11,xmm14
+        mov     r15,QWORD[((64+8))+rsp]
+        vpor    xmm8,xmm8,xmm11
+        vmovdqu XMMWORD[r12*1+r15],xmm8
+        lea     r12,[16+r12]
+        cmp     BYTE[131+rbp],0
+        jne     NEAR $L$avx_00_47
+        vmovdqu xmm9,XMMWORD[r12]
+        mov     QWORD[((64+0))+rsp],r12
+        shrd    r13d,r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        xor     r13d,r8d
+        shrd    r14d,r14d,9
+        xor     r12d,r10d
+        shrd    r13d,r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        vpxor   xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+        xor     r13d,r8d
+        add     r11d,DWORD[rsp]
+        mov     r15d,eax
+        shrd    r14d,r14d,11
+        xor     r12d,r10d
+        xor     r15d,ebx
+        shrd    r13d,r13d,6
+        add     r11d,r12d
+        and     esi,r15d
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     esi,ebx
+        add     edx,r11d
+        shrd    r14d,r14d,2
+        add     r11d,esi
+        mov     r13d,edx
+        add     r14d,r11d
+        shrd    r13d,r13d,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        xor     r13d,edx
+        shrd    r14d,r14d,9
+        xor     r12d,r9d
+        shrd    r13d,r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        vpxor   xmm9,xmm9,xmm8
+        xor     r13d,edx
+        add     r10d,DWORD[4+rsp]
+        mov     esi,r11d
+        shrd    r14d,r14d,11
+        xor     r12d,r9d
+        xor     esi,eax
+        shrd    r13d,r13d,6
+        add     r10d,r12d
+        and     r15d,esi
+        xor     r14d,r11d
+        add     r10d,r13d
+        xor     r15d,eax
+        add     ecx,r10d
+        shrd    r14d,r14d,2
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        shrd    r13d,r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        xor     r13d,ecx
+        shrd    r14d,r14d,9
+        xor     r12d,r8d
+        shrd    r13d,r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+        xor     r13d,ecx
+        add     r9d,DWORD[8+rsp]
+        mov     r15d,r10d
+        shrd    r14d,r14d,11
+        xor     r12d,r8d
+        xor     r15d,r11d
+        shrd    r13d,r13d,6
+        add     r9d,r12d
+        and     esi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     esi,r11d
+        add     ebx,r9d
+        shrd    r14d,r14d,2
+        add     r9d,esi
+        mov     r13d,ebx
+        add     r14d,r9d
+        shrd    r13d,r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        xor     r13d,ebx
+        shrd    r14d,r14d,9
+        xor     r12d,edx
+        shrd    r13d,r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+        xor     r13d,ebx
+        add     r8d,DWORD[12+rsp]
+        mov     esi,r9d
+        shrd    r14d,r14d,11
+        xor     r12d,edx
+        xor     esi,r10d
+        shrd    r13d,r13d,6
+        add     r8d,r12d
+        and     r15d,esi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        add     eax,r8d
+        shrd    r14d,r14d,2
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        shrd    r13d,r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        xor     r13d,eax
+        shrd    r14d,r14d,9
+        xor     r12d,ecx
+        shrd    r13d,r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+        xor     r13d,eax
+        add     edx,DWORD[16+rsp]
+        mov     r15d,r8d
+        shrd    r14d,r14d,11
+        xor     r12d,ecx
+        xor     r15d,r9d
+        shrd    r13d,r13d,6
+        add     edx,r12d
+        and     esi,r15d
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     esi,r9d
+        add     r11d,edx
+        shrd    r14d,r14d,2
+        add     edx,esi
+        mov     r13d,r11d
+        add     r14d,edx
+        shrd    r13d,r13d,14
+        mov     edx,r14d
+        mov     r12d,eax
+        xor     r13d,r11d
+        shrd    r14d,r14d,9
+        xor     r12d,ebx
+        shrd    r13d,r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+        xor     r13d,r11d
+        add     ecx,DWORD[20+rsp]
+        mov     esi,edx
+        shrd    r14d,r14d,11
+        xor     r12d,ebx
+        xor     esi,r8d
+        shrd    r13d,r13d,6
+        add     ecx,r12d
+        and     r15d,esi
+        xor     r14d,edx
+        add     ecx,r13d
+        xor     r15d,r8d
+        add     r10d,ecx
+        shrd    r14d,r14d,2
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        shrd    r13d,r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        xor     r13d,r10d
+        shrd    r14d,r14d,9
+        xor     r12d,eax
+        shrd    r13d,r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+        xor     r13d,r10d
+        add     ebx,DWORD[24+rsp]
+        mov     r15d,ecx
+        shrd    r14d,r14d,11
+        xor     r12d,eax
+        xor     r15d,edx
+        shrd    r13d,r13d,6
+        add     ebx,r12d
+        and     esi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     esi,edx
+        add     r9d,ebx
+        shrd    r14d,r14d,2
+        add     ebx,esi
+        mov     r13d,r9d
+        add     r14d,ebx
+        shrd    r13d,r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        xor     r13d,r9d
+        shrd    r14d,r14d,9
+        xor     r12d,r11d
+        shrd    r13d,r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+        xor     r13d,r9d
+        add     eax,DWORD[28+rsp]
+        mov     esi,ebx
+        shrd    r14d,r14d,11
+        xor     r12d,r11d
+        xor     esi,ecx
+        shrd    r13d,r13d,6
+        add     eax,r12d
+        and     r15d,esi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        add     r8d,eax
+        shrd    r14d,r14d,2
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        shrd    r13d,r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        xor     r13d,r8d
+        shrd    r14d,r14d,9
+        xor     r12d,r10d
+        shrd    r13d,r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+        xor     r13d,r8d
+        add     r11d,DWORD[32+rsp]
+        mov     r15d,eax
+        shrd    r14d,r14d,11
+        xor     r12d,r10d
+        xor     r15d,ebx
+        shrd    r13d,r13d,6
+        add     r11d,r12d
+        and     esi,r15d
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     esi,ebx
+        add     edx,r11d
+        shrd    r14d,r14d,2
+        add     r11d,esi
+        mov     r13d,edx
+        add     r14d,r11d
+        shrd    r13d,r13d,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        xor     r13d,edx
+        shrd    r14d,r14d,9
+        xor     r12d,r9d
+        shrd    r13d,r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+        xor     r13d,edx
+        add     r10d,DWORD[36+rsp]
+        mov     esi,r11d
+        shrd    r14d,r14d,11
+        xor     r12d,r9d
+        xor     esi,eax
+        shrd    r13d,r13d,6
+        add     r10d,r12d
+        and     r15d,esi
+        xor     r14d,r11d
+        add     r10d,r13d
+        xor     r15d,eax
+        add     ecx,r10d
+        shrd    r14d,r14d,2
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        shrd    r13d,r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        xor     r13d,ecx
+        shrd    r14d,r14d,9
+        xor     r12d,r8d
+        shrd    r13d,r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+        xor     r13d,ecx
+        add     r9d,DWORD[40+rsp]
+        mov     r15d,r10d
+        shrd    r14d,r14d,11
+        xor     r12d,r8d
+        xor     r15d,r11d
+        shrd    r13d,r13d,6
+        add     r9d,r12d
+        and     esi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     esi,r11d
+        add     ebx,r9d
+        shrd    r14d,r14d,2
+        add     r9d,esi
+        mov     r13d,ebx
+        add     r14d,r9d
+        shrd    r13d,r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        xor     r13d,ebx
+        shrd    r14d,r14d,9
+        xor     r12d,edx
+        shrd    r13d,r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+        xor     r13d,ebx
+        add     r8d,DWORD[44+rsp]
+        mov     esi,r9d
+        shrd    r14d,r14d,11
+        xor     r12d,edx
+        xor     esi,r10d
+        shrd    r13d,r13d,6
+        add     r8d,r12d
+        and     r15d,esi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        add     eax,r8d
+        shrd    r14d,r14d,2
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        shrd    r13d,r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        xor     r13d,eax
+        shrd    r14d,r14d,9
+        xor     r12d,ecx
+        shrd    r13d,r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        vpand   xmm8,xmm11,xmm12
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+        xor     r13d,eax
+        add     edx,DWORD[48+rsp]
+        mov     r15d,r8d
+        shrd    r14d,r14d,11
+        xor     r12d,ecx
+        xor     r15d,r9d
+        shrd    r13d,r13d,6
+        add     edx,r12d
+        and     esi,r15d
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     esi,r9d
+        add     r11d,edx
+        shrd    r14d,r14d,2
+        add     edx,esi
+        mov     r13d,r11d
+        add     r14d,edx
+        shrd    r13d,r13d,14
+        mov     edx,r14d
+        mov     r12d,eax
+        xor     r13d,r11d
+        shrd    r14d,r14d,9
+        xor     r12d,ebx
+        shrd    r13d,r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+        xor     r13d,r11d
+        add     ecx,DWORD[52+rsp]
+        mov     esi,edx
+        shrd    r14d,r14d,11
+        xor     r12d,ebx
+        xor     esi,r8d
+        shrd    r13d,r13d,6
+        add     ecx,r12d
+        and     r15d,esi
+        xor     r14d,edx
+        add     ecx,r13d
+        xor     r15d,r8d
+        add     r10d,ecx
+        shrd    r14d,r14d,2
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        shrd    r13d,r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        xor     r13d,r10d
+        shrd    r14d,r14d,9
+        xor     r12d,eax
+        shrd    r13d,r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        vpand   xmm11,xmm11,xmm13
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+        xor     r13d,r10d
+        add     ebx,DWORD[56+rsp]
+        mov     r15d,ecx
+        shrd    r14d,r14d,11
+        xor     r12d,eax
+        xor     r15d,edx
+        shrd    r13d,r13d,6
+        add     ebx,r12d
+        and     esi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     esi,edx
+        add     r9d,ebx
+        shrd    r14d,r14d,2
+        add     ebx,esi
+        mov     r13d,r9d
+        add     r14d,ebx
+        shrd    r13d,r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        xor     r13d,r9d
+        shrd    r14d,r14d,9
+        xor     r12d,r11d
+        shrd    r13d,r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vpor    xmm8,xmm8,xmm11
+        vaesenclast     xmm11,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        xor     r13d,r9d
+        add     eax,DWORD[60+rsp]
+        mov     esi,ebx
+        shrd    r14d,r14d,11
+        xor     r12d,r11d
+        xor     esi,ecx
+        shrd    r13d,r13d,6
+        add     eax,r12d
+        and     r15d,esi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        add     r8d,eax
+        shrd    r14d,r14d,2
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        mov     r12,QWORD[((64+0))+rsp]
+        mov     r13,QWORD[((64+8))+rsp]
+        mov     r15,QWORD[((64+40))+rsp]
+        mov     rsi,QWORD[((64+48))+rsp]
+
+        vpand   xmm11,xmm11,xmm14
+        mov     eax,r14d
+        vpor    xmm8,xmm8,xmm11
+        vmovdqu XMMWORD[r13*1+r12],xmm8
+        lea     r12,[16+r12]
+
+        add     eax,DWORD[r15]
+        add     ebx,DWORD[4+r15]
+        add     ecx,DWORD[8+r15]
+        add     edx,DWORD[12+r15]
+        add     r8d,DWORD[16+r15]
+        add     r9d,DWORD[20+r15]
+        add     r10d,DWORD[24+r15]
+        add     r11d,DWORD[28+r15]
+
+        cmp     r12,QWORD[((64+16))+rsp]
+
+        mov     DWORD[r15],eax
+        mov     DWORD[4+r15],ebx
+        mov     DWORD[8+r15],ecx
+        mov     DWORD[12+r15],edx
+        mov     DWORD[16+r15],r8d
+        mov     DWORD[20+r15],r9d
+        mov     DWORD[24+r15],r10d
+        mov     DWORD[28+r15],r11d
+        jb      NEAR $L$loop_avx
+
+        mov     r8,QWORD[((64+32))+rsp]
+        mov     rsi,QWORD[120+rsp]
+
+        vmovdqu XMMWORD[r8],xmm8
+        vzeroall
+        movaps  xmm6,XMMWORD[128+rsp]
+        movaps  xmm7,XMMWORD[144+rsp]
+        movaps  xmm8,XMMWORD[160+rsp]
+        movaps  xmm9,XMMWORD[176+rsp]
+        movaps  xmm10,XMMWORD[192+rsp]
+        movaps  xmm11,XMMWORD[208+rsp]
+        movaps  xmm12,XMMWORD[224+rsp]
+        movaps  xmm13,XMMWORD[240+rsp]
+        movaps  xmm14,XMMWORD[256+rsp]
+        movaps  xmm15,XMMWORD[272+rsp]
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue_avx:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_cbc_sha256_enc_avx:
+
+ALIGN   64
+aesni_cbc_sha256_enc_avx2:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_cbc_sha256_enc_avx2:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+$L$avx2_shortcut:
+        mov     r10,QWORD[56+rsp]
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        sub     rsp,736
+        and     rsp,-256*4
+        add     rsp,448
+
+        shl     rdx,6
+        sub     rsi,rdi
+        sub     r10,rdi
+        add     rdx,rdi
+
+
+
+        mov     QWORD[((64+16))+rsp],rdx
+
+        mov     QWORD[((64+32))+rsp],r8
+        mov     QWORD[((64+40))+rsp],r9
+        mov     QWORD[((64+48))+rsp],r10
+        mov     QWORD[120+rsp],rax
+
+        movaps  XMMWORD[128+rsp],xmm6
+        movaps  XMMWORD[144+rsp],xmm7
+        movaps  XMMWORD[160+rsp],xmm8
+        movaps  XMMWORD[176+rsp],xmm9
+        movaps  XMMWORD[192+rsp],xmm10
+        movaps  XMMWORD[208+rsp],xmm11
+        movaps  XMMWORD[224+rsp],xmm12
+        movaps  XMMWORD[240+rsp],xmm13
+        movaps  XMMWORD[256+rsp],xmm14
+        movaps  XMMWORD[272+rsp],xmm15
+$L$prologue_avx2:
+        vzeroall
+
+        mov     r13,rdi
+        vpinsrq xmm15,xmm15,rsi,1
+        lea     rdi,[128+rcx]
+        lea     r12,[((K256+544))]
+        mov     r14d,DWORD[((240-128))+rdi]
+        mov     r15,r9
+        mov     rsi,r10
+        vmovdqu xmm8,XMMWORD[r8]
+        lea     r14,[((-9))+r14]
+
+        vmovdqa xmm14,XMMWORD[r14*8+r12]
+        vmovdqa xmm13,XMMWORD[16+r14*8+r12]
+        vmovdqa xmm12,XMMWORD[32+r14*8+r12]
+
+        sub     r13,-16*4
+        mov     eax,DWORD[r15]
+        lea     r12,[r13*1+rsi]
+        mov     ebx,DWORD[4+r15]
+        cmp     r13,rdx
+        mov     ecx,DWORD[8+r15]
+        cmove   r12,rsp
+        mov     edx,DWORD[12+r15]
+        mov     r8d,DWORD[16+r15]
+        mov     r9d,DWORD[20+r15]
+        mov     r10d,DWORD[24+r15]
+        mov     r11d,DWORD[28+r15]
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        jmp     NEAR $L$oop_avx2
+ALIGN   16
+$L$oop_avx2:
+        vmovdqa ymm7,YMMWORD[((K256+512))]
+        vmovdqu xmm0,XMMWORD[((-64+0))+r13*1+rsi]
+        vmovdqu xmm1,XMMWORD[((-64+16))+r13*1+rsi]
+        vmovdqu xmm2,XMMWORD[((-64+32))+r13*1+rsi]
+        vmovdqu xmm3,XMMWORD[((-64+48))+r13*1+rsi]
+
+        vinserti128     ymm0,ymm0,XMMWORD[r12],1
+        vinserti128     ymm1,ymm1,XMMWORD[16+r12],1
+        vpshufb ymm0,ymm0,ymm7
+        vinserti128     ymm2,ymm2,XMMWORD[32+r12],1
+        vpshufb ymm1,ymm1,ymm7
+        vinserti128     ymm3,ymm3,XMMWORD[48+r12],1
+
+        lea     rbp,[K256]
+        vpshufb ymm2,ymm2,ymm7
+        lea     r13,[((-64))+r13]
+        vpaddd  ymm4,ymm0,YMMWORD[rbp]
+        vpshufb ymm3,ymm3,ymm7
+        vpaddd  ymm5,ymm1,YMMWORD[32+rbp]
+        vpaddd  ymm6,ymm2,YMMWORD[64+rbp]
+        vpaddd  ymm7,ymm3,YMMWORD[96+rbp]
+        vmovdqa YMMWORD[rsp],ymm4
+        xor     r14d,r14d
+        vmovdqa YMMWORD[32+rsp],ymm5
+        lea     rsp,[((-64))+rsp]
+        mov     esi,ebx
+        vmovdqa YMMWORD[rsp],ymm6
+        xor     esi,ecx
+        vmovdqa YMMWORD[32+rsp],ymm7
+        mov     r12d,r9d
+        sub     rbp,-16*2*4
+        jmp     NEAR $L$avx2_00_47
+
+ALIGN   16
+$L$avx2_00_47:
+        vmovdqu xmm9,XMMWORD[r13]
+        vpinsrq xmm15,xmm15,r13,0
+        lea     rsp,[((-64))+rsp]
+        vpalignr        ymm4,ymm1,ymm0,4
+        add     r11d,DWORD[((0+128))+rsp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        vpalignr        ymm7,ymm3,ymm2,4
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        vpsrld  ymm6,ymm4,7
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        vpaddd  ymm0,ymm0,ymm7
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        vpsrld  ymm7,ymm4,3
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        vpslld  ymm5,ymm4,14
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        vpxor   ymm4,ymm7,ymm6
+        and     esi,r15d
+        vpxor   xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,ebx
+        vpshufd ymm7,ymm3,250
+        xor     r14d,r13d
+        lea     r11d,[rsi*1+r11]
+        mov     r12d,r8d
+        vpsrld  ymm6,ymm6,11
+        add     r10d,DWORD[((4+128))+rsp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        vpxor   ymm4,ymm4,ymm5
+        rorx    esi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        vpslld  ymm5,ymm5,11
+        andn    r12d,edx,r9d
+        xor     r13d,esi
+        rorx    r14d,edx,6
+        vpxor   ymm4,ymm4,ymm6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     esi,r11d
+        vpsrld  ymm6,ymm7,10
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     esi,eax
+        vpxor   ymm4,ymm4,ymm5
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        vpsrlq  ymm7,ymm7,17
+        and     r15d,esi
+        vpxor   xmm9,xmm9,xmm8
+        xor     r14d,r12d
+        xor     r15d,eax
+        vpaddd  ymm0,ymm0,ymm4
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        vpxor   ymm6,ymm6,ymm7
+        add     r9d,DWORD[((8+128))+rsp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        vpshufd ymm6,ymm6,132
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        vpsrldq ymm6,ymm6,8
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        vpaddd  ymm0,ymm0,ymm6
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        vpshufd ymm7,ymm0,80
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r11d
+        vpsrld  ymm6,ymm7,10
+        xor     r14d,r13d
+        lea     r9d,[rsi*1+r9]
+        mov     r12d,ecx
+        vpsrlq  ymm7,ymm7,17
+        add     r8d,DWORD[((12+128))+rsp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        vpxor   ymm6,ymm6,ymm7
+        rorx    esi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        vpsrlq  ymm7,ymm7,2
+        andn    r12d,ebx,edx
+        xor     r13d,esi
+        rorx    r14d,ebx,6
+        vpxor   ymm6,ymm6,ymm7
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     esi,r9d
+        vpshufd ymm6,ymm6,232
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     esi,r10d
+        vpslldq ymm6,ymm6,8
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        vpaddd  ymm0,ymm0,ymm6
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r10d
+        vpaddd  ymm6,ymm0,YMMWORD[rbp]
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        vmovdqa YMMWORD[rsp],ymm6
+        vpalignr        ymm4,ymm2,ymm1,4
+        add     edx,DWORD[((32+128))+rsp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        vpalignr        ymm7,ymm0,ymm3,4
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        vpsrld  ymm6,ymm4,7
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        vpaddd  ymm1,ymm1,ymm7
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        vpsrld  ymm7,ymm4,3
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        vpslld  ymm5,ymm4,14
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        vpxor   ymm4,ymm7,ymm6
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r9d
+        vpshufd ymm7,ymm0,250
+        xor     r14d,r13d
+        lea     edx,[rsi*1+rdx]
+        mov     r12d,eax
+        vpsrld  ymm6,ymm6,11
+        add     ecx,DWORD[((36+128))+rsp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        vpxor   ymm4,ymm4,ymm5
+        rorx    esi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        vpslld  ymm5,ymm5,11
+        andn    r12d,r11d,ebx
+        xor     r13d,esi
+        rorx    r14d,r11d,6
+        vpxor   ymm4,ymm4,ymm6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     esi,edx
+        vpsrld  ymm6,ymm7,10
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     esi,r8d
+        vpxor   ymm4,ymm4,ymm5
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        vpsrlq  ymm7,ymm7,17
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r8d
+        vpaddd  ymm1,ymm1,ymm4
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        vpxor   ymm6,ymm6,ymm7
+        add     ebx,DWORD[((40+128))+rsp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        vpshufd ymm6,ymm6,132
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        vpsrldq ymm6,ymm6,8
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        vpaddd  ymm1,ymm1,ymm6
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        vpshufd ymm7,ymm1,80
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,edx
+        vpsrld  ymm6,ymm7,10
+        xor     r14d,r13d
+        lea     ebx,[rsi*1+rbx]
+        mov     r12d,r10d
+        vpsrlq  ymm7,ymm7,17
+        add     eax,DWORD[((44+128))+rsp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        vpxor   ymm6,ymm6,ymm7
+        rorx    esi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        vpsrlq  ymm7,ymm7,2
+        andn    r12d,r9d,r11d
+        xor     r13d,esi
+        rorx    r14d,r9d,6
+        vpxor   ymm6,ymm6,ymm7
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     esi,ebx
+        vpshufd ymm6,ymm6,232
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     esi,ecx
+        vpslldq ymm6,ymm6,8
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        vpaddd  ymm1,ymm1,ymm6
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,ecx
+        vpaddd  ymm6,ymm1,YMMWORD[32+rbp]
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        vmovdqa YMMWORD[32+rsp],ymm6
+        lea     rsp,[((-64))+rsp]
+        vpalignr        ymm4,ymm3,ymm2,4
+        add     r11d,DWORD[((0+128))+rsp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        vpalignr        ymm7,ymm1,ymm0,4
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        vpsrld  ymm6,ymm4,7
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        vpaddd  ymm2,ymm2,ymm7
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        vpsrld  ymm7,ymm4,3
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        vpslld  ymm5,ymm4,14
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        vpxor   ymm4,ymm7,ymm6
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,ebx
+        vpshufd ymm7,ymm1,250
+        xor     r14d,r13d
+        lea     r11d,[rsi*1+r11]
+        mov     r12d,r8d
+        vpsrld  ymm6,ymm6,11
+        add     r10d,DWORD[((4+128))+rsp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        vpxor   ymm4,ymm4,ymm5
+        rorx    esi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        vpslld  ymm5,ymm5,11
+        andn    r12d,edx,r9d
+        xor     r13d,esi
+        rorx    r14d,edx,6
+        vpxor   ymm4,ymm4,ymm6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     esi,r11d
+        vpsrld  ymm6,ymm7,10
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     esi,eax
+        vpxor   ymm4,ymm4,ymm5
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        vpsrlq  ymm7,ymm7,17
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,eax
+        vpaddd  ymm2,ymm2,ymm4
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        vpxor   ymm6,ymm6,ymm7
+        add     r9d,DWORD[((8+128))+rsp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        vpshufd ymm6,ymm6,132
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        vpsrldq ymm6,ymm6,8
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        vpaddd  ymm2,ymm2,ymm6
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        vpshufd ymm7,ymm2,80
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r11d
+        vpsrld  ymm6,ymm7,10
+        xor     r14d,r13d
+        lea     r9d,[rsi*1+r9]
+        mov     r12d,ecx
+        vpsrlq  ymm7,ymm7,17
+        add     r8d,DWORD[((12+128))+rsp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        vpxor   ymm6,ymm6,ymm7
+        rorx    esi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        vpsrlq  ymm7,ymm7,2
+        andn    r12d,ebx,edx
+        xor     r13d,esi
+        rorx    r14d,ebx,6
+        vpxor   ymm6,ymm6,ymm7
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     esi,r9d
+        vpshufd ymm6,ymm6,232
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     esi,r10d
+        vpslldq ymm6,ymm6,8
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        vpaddd  ymm2,ymm2,ymm6
+        and     r15d,esi
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r10d
+        vpaddd  ymm6,ymm2,YMMWORD[64+rbp]
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        vmovdqa YMMWORD[rsp],ymm6
+        vpalignr        ymm4,ymm0,ymm3,4
+        add     edx,DWORD[((32+128))+rsp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        vpalignr        ymm7,ymm2,ymm1,4
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        vpsrld  ymm6,ymm4,7
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        vpaddd  ymm3,ymm3,ymm7
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        vpsrld  ymm7,ymm4,3
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        vpslld  ymm5,ymm4,14
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        vpxor   ymm4,ymm7,ymm6
+        and     esi,r15d
+        vpand   xmm8,xmm11,xmm12
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r9d
+        vpshufd ymm7,ymm2,250
+        xor     r14d,r13d
+        lea     edx,[rsi*1+rdx]
+        mov     r12d,eax
+        vpsrld  ymm6,ymm6,11
+        add     ecx,DWORD[((36+128))+rsp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        vpxor   ymm4,ymm4,ymm5
+        rorx    esi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        vpslld  ymm5,ymm5,11
+        andn    r12d,r11d,ebx
+        xor     r13d,esi
+        rorx    r14d,r11d,6
+        vpxor   ymm4,ymm4,ymm6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     esi,edx
+        vpsrld  ymm6,ymm7,10
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     esi,r8d
+        vpxor   ymm4,ymm4,ymm5
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        vpsrlq  ymm7,ymm7,17
+        and     r15d,esi
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r8d
+        vpaddd  ymm3,ymm3,ymm4
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        vpxor   ymm6,ymm6,ymm7
+        add     ebx,DWORD[((40+128))+rsp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        vpshufd ymm6,ymm6,132
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        vpsrldq ymm6,ymm6,8
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        vpaddd  ymm3,ymm3,ymm6
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        vpshufd ymm7,ymm3,80
+        and     esi,r15d
+        vpand   xmm11,xmm11,xmm13
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,edx
+        vpsrld  ymm6,ymm7,10
+        xor     r14d,r13d
+        lea     ebx,[rsi*1+rbx]
+        mov     r12d,r10d
+        vpsrlq  ymm7,ymm7,17
+        add     eax,DWORD[((44+128))+rsp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        vpxor   ymm6,ymm6,ymm7
+        rorx    esi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        vpsrlq  ymm7,ymm7,2
+        andn    r12d,r9d,r11d
+        xor     r13d,esi
+        rorx    r14d,r9d,6
+        vpxor   ymm6,ymm6,ymm7
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     esi,ebx
+        vpshufd ymm6,ymm6,232
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     esi,ecx
+        vpslldq ymm6,ymm6,8
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        vpaddd  ymm3,ymm3,ymm6
+        and     r15d,esi
+        vpor    xmm8,xmm8,xmm11
+        vaesenclast     xmm11,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,ecx
+        vpaddd  ymm6,ymm3,YMMWORD[96+rbp]
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        vmovdqa YMMWORD[32+rsp],ymm6
+        vmovq   r13,xmm15
+        vpextrq r15,xmm15,1
+        vpand   xmm11,xmm11,xmm14
+        vpor    xmm8,xmm8,xmm11
+        vmovdqu XMMWORD[r13*1+r15],xmm8
+        lea     r13,[16+r13]
+        lea     rbp,[128+rbp]
+        cmp     BYTE[3+rbp],0
+        jne     NEAR $L$avx2_00_47
+        vmovdqu xmm9,XMMWORD[r13]
+        vpinsrq xmm15,xmm15,r13,0
+        add     r11d,DWORD[((0+64))+rsp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        and     esi,r15d
+        vpxor   xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,ebx
+        xor     r14d,r13d
+        lea     r11d,[rsi*1+r11]
+        mov     r12d,r8d
+        add     r10d,DWORD[((4+64))+rsp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        rorx    esi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        andn    r12d,edx,r9d
+        xor     r13d,esi
+        rorx    r14d,edx,6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     esi,r11d
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     esi,eax
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        and     r15d,esi
+        vpxor   xmm9,xmm9,xmm8
+        xor     r14d,r12d
+        xor     r15d,eax
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        add     r9d,DWORD[((8+64))+rsp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r11d
+        xor     r14d,r13d
+        lea     r9d,[rsi*1+r9]
+        mov     r12d,ecx
+        add     r8d,DWORD[((12+64))+rsp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        rorx    esi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        andn    r12d,ebx,edx
+        xor     r13d,esi
+        rorx    r14d,ebx,6
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     esi,r9d
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     esi,r10d
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r10d
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        add     edx,DWORD[((32+64))+rsp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r9d
+        xor     r14d,r13d
+        lea     edx,[rsi*1+rdx]
+        mov     r12d,eax
+        add     ecx,DWORD[((36+64))+rsp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        rorx    esi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        andn    r12d,r11d,ebx
+        xor     r13d,esi
+        rorx    r14d,r11d,6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     esi,edx
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     esi,r8d
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r8d
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        add     ebx,DWORD[((40+64))+rsp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,edx
+        xor     r14d,r13d
+        lea     ebx,[rsi*1+rbx]
+        mov     r12d,r10d
+        add     eax,DWORD[((44+64))+rsp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        rorx    esi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        andn    r12d,r9d,r11d
+        xor     r13d,esi
+        rorx    r14d,r9d,6
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     esi,ebx
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     esi,ecx
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,ecx
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        add     r11d,DWORD[rsp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,ebx
+        xor     r14d,r13d
+        lea     r11d,[rsi*1+r11]
+        mov     r12d,r8d
+        add     r10d,DWORD[4+rsp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        rorx    esi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        andn    r12d,edx,r9d
+        xor     r13d,esi
+        rorx    r14d,edx,6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     esi,r11d
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     esi,eax
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,eax
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        add     r9d,DWORD[8+rsp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r11d
+        xor     r14d,r13d
+        lea     r9d,[rsi*1+r9]
+        mov     r12d,ecx
+        add     r8d,DWORD[12+rsp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        rorx    esi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        andn    r12d,ebx,edx
+        xor     r13d,esi
+        rorx    r14d,ebx,6
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     esi,r9d
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     esi,r10d
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        and     r15d,esi
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r10d
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        add     edx,DWORD[32+rsp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        and     esi,r15d
+        vpand   xmm8,xmm11,xmm12
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r9d
+        xor     r14d,r13d
+        lea     edx,[rsi*1+rdx]
+        mov     r12d,eax
+        add     ecx,DWORD[36+rsp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        rorx    esi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        andn    r12d,r11d,ebx
+        xor     r13d,esi
+        rorx    r14d,r11d,6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     esi,edx
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     esi,r8d
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        and     r15d,esi
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r8d
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        add     ebx,DWORD[40+rsp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        and     esi,r15d
+        vpand   xmm11,xmm11,xmm13
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,edx
+        xor     r14d,r13d
+        lea     ebx,[rsi*1+rbx]
+        mov     r12d,r10d
+        add     eax,DWORD[44+rsp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        rorx    esi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        andn    r12d,r9d,r11d
+        xor     r13d,esi
+        rorx    r14d,r9d,6
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     esi,ebx
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     esi,ecx
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        and     r15d,esi
+        vpor    xmm8,xmm8,xmm11
+        vaesenclast     xmm11,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,ecx
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        vpextrq r12,xmm15,1
+        vmovq   r13,xmm15
+        mov     r15,QWORD[552+rsp]
+        add     eax,r14d
+        lea     rbp,[448+rsp]
+
+        vpand   xmm11,xmm11,xmm14
+        vpor    xmm8,xmm8,xmm11
+        vmovdqu XMMWORD[r13*1+r12],xmm8
+        lea     r13,[16+r13]
+
+        add     eax,DWORD[r15]
+        add     ebx,DWORD[4+r15]
+        add     ecx,DWORD[8+r15]
+        add     edx,DWORD[12+r15]
+        add     r8d,DWORD[16+r15]
+        add     r9d,DWORD[20+r15]
+        add     r10d,DWORD[24+r15]
+        add     r11d,DWORD[28+r15]
+
+        mov     DWORD[r15],eax
+        mov     DWORD[4+r15],ebx
+        mov     DWORD[8+r15],ecx
+        mov     DWORD[12+r15],edx
+        mov     DWORD[16+r15],r8d
+        mov     DWORD[20+r15],r9d
+        mov     DWORD[24+r15],r10d
+        mov     DWORD[28+r15],r11d
+
+        cmp     r13,QWORD[80+rbp]
+        je      NEAR $L$done_avx2
+
+        xor     r14d,r14d
+        mov     esi,ebx
+        mov     r12d,r9d
+        xor     esi,ecx
+        jmp     NEAR $L$ower_avx2
+ALIGN   16
+$L$ower_avx2:
+        vmovdqu xmm9,XMMWORD[r13]
+        vpinsrq xmm15,xmm15,r13,0
+        add     r11d,DWORD[((0+16))+rbp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        and     esi,r15d
+        vpxor   xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((16-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,ebx
+        xor     r14d,r13d
+        lea     r11d,[rsi*1+r11]
+        mov     r12d,r8d
+        add     r10d,DWORD[((4+16))+rbp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        rorx    esi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        andn    r12d,edx,r9d
+        xor     r13d,esi
+        rorx    r14d,edx,6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     esi,r11d
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     esi,eax
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        and     r15d,esi
+        vpxor   xmm9,xmm9,xmm8
+        xor     r14d,r12d
+        xor     r15d,eax
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        add     r9d,DWORD[((8+16))+rbp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((32-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r11d
+        xor     r14d,r13d
+        lea     r9d,[rsi*1+r9]
+        mov     r12d,ecx
+        add     r8d,DWORD[((12+16))+rbp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        rorx    esi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        andn    r12d,ebx,edx
+        xor     r13d,esi
+        rorx    r14d,ebx,6
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     esi,r9d
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     esi,r10d
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((48-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r10d
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        add     edx,DWORD[((32+16))+rbp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r9d
+        xor     r14d,r13d
+        lea     edx,[rsi*1+rdx]
+        mov     r12d,eax
+        add     ecx,DWORD[((36+16))+rbp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        rorx    esi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        andn    r12d,r11d,ebx
+        xor     r13d,esi
+        rorx    r14d,r11d,6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     esi,edx
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     esi,r8d
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((80-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r8d
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        add     ebx,DWORD[((40+16))+rbp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((96-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,edx
+        xor     r14d,r13d
+        lea     ebx,[rsi*1+rbx]
+        mov     r12d,r10d
+        add     eax,DWORD[((44+16))+rbp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        rorx    esi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        andn    r12d,r9d,r11d
+        xor     r13d,esi
+        rorx    r14d,r9d,6
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     esi,ebx
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     esi,ecx
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((112-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,ecx
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        lea     rbp,[((-64))+rbp]
+        add     r11d,DWORD[((0+16))+rbp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((128-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,ebx
+        xor     r14d,r13d
+        lea     r11d,[rsi*1+r11]
+        mov     r12d,r8d
+        add     r10d,DWORD[((4+16))+rbp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        rorx    esi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        andn    r12d,edx,r9d
+        xor     r13d,esi
+        rorx    r14d,edx,6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     esi,r11d
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     esi,eax
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        and     r15d,esi
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((144-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,eax
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        add     r9d,DWORD[((8+16))+rbp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        and     esi,r15d
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((160-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r11d
+        xor     r14d,r13d
+        lea     r9d,[rsi*1+r9]
+        mov     r12d,ecx
+        add     r8d,DWORD[((12+16))+rbp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        rorx    esi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        andn    r12d,ebx,edx
+        xor     r13d,esi
+        rorx    r14d,ebx,6
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     esi,r9d
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     esi,r10d
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        and     r15d,esi
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((176-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r10d
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        add     edx,DWORD[((32+16))+rbp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        and     esi,r15d
+        vpand   xmm8,xmm11,xmm12
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((192-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,r9d
+        xor     r14d,r13d
+        lea     edx,[rsi*1+rdx]
+        mov     r12d,eax
+        add     ecx,DWORD[((36+16))+rbp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        rorx    esi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        andn    r12d,r11d,ebx
+        xor     r13d,esi
+        rorx    r14d,r11d,6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     esi,edx
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     esi,r8d
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        and     r15d,esi
+        vaesenclast     xmm11,xmm9,xmm10
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((208-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,r8d
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        add     ebx,DWORD[((40+16))+rbp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        and     esi,r15d
+        vpand   xmm11,xmm11,xmm13
+        vaesenc xmm9,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((224-128))+rdi]
+        xor     r14d,r12d
+        xor     esi,edx
+        xor     r14d,r13d
+        lea     ebx,[rsi*1+rbx]
+        mov     r12d,r10d
+        add     eax,DWORD[((44+16))+rbp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        rorx    esi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        andn    r12d,r9d,r11d
+        xor     r13d,esi
+        rorx    r14d,r9d,6
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     esi,ebx
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     esi,ecx
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        and     r15d,esi
+        vpor    xmm8,xmm8,xmm11
+        vaesenclast     xmm11,xmm9,xmm10
+        vmovdqu xmm10,XMMWORD[((0-128))+rdi]
+        xor     r14d,r12d
+        xor     r15d,ecx
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        vmovq   r13,xmm15
+        vpextrq r15,xmm15,1
+        vpand   xmm11,xmm11,xmm14
+        vpor    xmm8,xmm8,xmm11
+        lea     rbp,[((-64))+rbp]
+        vmovdqu XMMWORD[r13*1+r15],xmm8
+        lea     r13,[16+r13]
+        cmp     rbp,rsp
+        jae     NEAR $L$ower_avx2
+
+        mov     r15,QWORD[552+rsp]
+        lea     r13,[64+r13]
+        mov     rsi,QWORD[560+rsp]
+        add     eax,r14d
+        lea     rsp,[448+rsp]
+
+        add     eax,DWORD[r15]
+        add     ebx,DWORD[4+r15]
+        add     ecx,DWORD[8+r15]
+        add     edx,DWORD[12+r15]
+        add     r8d,DWORD[16+r15]
+        add     r9d,DWORD[20+r15]
+        add     r10d,DWORD[24+r15]
+        lea     r12,[r13*1+rsi]
+        add     r11d,DWORD[28+r15]
+
+        cmp     r13,QWORD[((64+16))+rsp]
+
+        mov     DWORD[r15],eax
+        cmove   r12,rsp
+        mov     DWORD[4+r15],ebx
+        mov     DWORD[8+r15],ecx
+        mov     DWORD[12+r15],edx
+        mov     DWORD[16+r15],r8d
+        mov     DWORD[20+r15],r9d
+        mov     DWORD[24+r15],r10d
+        mov     DWORD[28+r15],r11d
+
+        jbe     NEAR $L$oop_avx2
+        lea     rbp,[rsp]
+
+$L$done_avx2:
+        lea     rsp,[rbp]
+        mov     r8,QWORD[((64+32))+rsp]
+        mov     rsi,QWORD[120+rsp]
+
+        vmovdqu XMMWORD[r8],xmm8
+        vzeroall
+        movaps  xmm6,XMMWORD[128+rsp]
+        movaps  xmm7,XMMWORD[144+rsp]
+        movaps  xmm8,XMMWORD[160+rsp]
+        movaps  xmm9,XMMWORD[176+rsp]
+        movaps  xmm10,XMMWORD[192+rsp]
+        movaps  xmm11,XMMWORD[208+rsp]
+        movaps  xmm12,XMMWORD[224+rsp]
+        movaps  xmm13,XMMWORD[240+rsp]
+        movaps  xmm14,XMMWORD[256+rsp]
+        movaps  xmm15,XMMWORD[272+rsp]
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue_avx2:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_cbc_sha256_enc_avx2:
+
+ALIGN   32
+aesni_cbc_sha256_enc_shaext:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_cbc_sha256_enc_shaext:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+        mov     r10,QWORD[56+rsp]
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[(-8-160)+rax],xmm6
+        movaps  XMMWORD[(-8-144)+rax],xmm7
+        movaps  XMMWORD[(-8-128)+rax],xmm8
+        movaps  XMMWORD[(-8-112)+rax],xmm9
+        movaps  XMMWORD[(-8-96)+rax],xmm10
+        movaps  XMMWORD[(-8-80)+rax],xmm11
+        movaps  XMMWORD[(-8-64)+rax],xmm12
+        movaps  XMMWORD[(-8-48)+rax],xmm13
+        movaps  XMMWORD[(-8-32)+rax],xmm14
+        movaps  XMMWORD[(-8-16)+rax],xmm15
+$L$prologue_shaext:
+        lea     rax,[((K256+128))]
+        movdqu  xmm1,XMMWORD[r9]
+        movdqu  xmm2,XMMWORD[16+r9]
+        movdqa  xmm3,XMMWORD[((512-128))+rax]
+
+        mov     r11d,DWORD[240+rcx]
+        sub     rsi,rdi
+        movups  xmm15,XMMWORD[rcx]
+        movups  xmm6,XMMWORD[r8]
+        movups  xmm4,XMMWORD[16+rcx]
+        lea     rcx,[112+rcx]
+
+        pshufd  xmm0,xmm1,0x1b
+        pshufd  xmm1,xmm1,0xb1
+        pshufd  xmm2,xmm2,0x1b
+        movdqa  xmm7,xmm3
+DB      102,15,58,15,202,8
+        punpcklqdq      xmm2,xmm0
+
+        jmp     NEAR $L$oop_shaext
+
+ALIGN   16
+$L$oop_shaext:
+        movdqu  xmm10,XMMWORD[r10]
+        movdqu  xmm11,XMMWORD[16+r10]
+        movdqu  xmm12,XMMWORD[32+r10]
+DB      102,68,15,56,0,211
+        movdqu  xmm13,XMMWORD[48+r10]
+
+        movdqa  xmm0,XMMWORD[((0-128))+rax]
+        paddd   xmm0,xmm10
+DB      102,68,15,56,0,219
+        movdqa  xmm9,xmm2
+        movdqa  xmm8,xmm1
+        movups  xmm14,XMMWORD[rdi]
+        xorps   xmm14,xmm15
+        xorps   xmm6,xmm14
+        movups  xmm5,XMMWORD[((-80))+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movups  xmm4,XMMWORD[((-64))+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,202
+
+        movdqa  xmm0,XMMWORD[((32-128))+rax]
+        paddd   xmm0,xmm11
+DB      102,68,15,56,0,227
+        lea     r10,[64+r10]
+        movups  xmm5,XMMWORD[((-48))+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movups  xmm4,XMMWORD[((-32))+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,202
+
+        movdqa  xmm0,XMMWORD[((64-128))+rax]
+        paddd   xmm0,xmm12
+DB      102,68,15,56,0,235
+DB      69,15,56,204,211
+        movups  xmm5,XMMWORD[((-16))+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm13
+DB      102,65,15,58,15,220,4
+        paddd   xmm10,xmm3
+        movups  xmm4,XMMWORD[rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,202
+
+        movdqa  xmm0,XMMWORD[((96-128))+rax]
+        paddd   xmm0,xmm13
+DB      69,15,56,205,213
+DB      69,15,56,204,220
+        movups  xmm5,XMMWORD[16+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movups  xmm4,XMMWORD[32+rcx]
+        aesenc  xmm6,xmm5
+        movdqa  xmm3,xmm10
+DB      102,65,15,58,15,221,4
+        paddd   xmm11,xmm3
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((128-128))+rax]
+        paddd   xmm0,xmm10
+DB      69,15,56,205,218
+DB      69,15,56,204,229
+        movups  xmm5,XMMWORD[48+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm11
+DB      102,65,15,58,15,218,4
+        paddd   xmm12,xmm3
+        cmp     r11d,11
+        jb      NEAR $L$aesenclast1
+        movups  xmm4,XMMWORD[64+rcx]
+        aesenc  xmm6,xmm5
+        movups  xmm5,XMMWORD[80+rcx]
+        aesenc  xmm6,xmm4
+        je      NEAR $L$aesenclast1
+        movups  xmm4,XMMWORD[96+rcx]
+        aesenc  xmm6,xmm5
+        movups  xmm5,XMMWORD[112+rcx]
+        aesenc  xmm6,xmm4
+$L$aesenclast1:
+        aesenclast      xmm6,xmm5
+        movups  xmm4,XMMWORD[((16-112))+rcx]
+        nop
+DB      15,56,203,202
+        movups  xmm14,XMMWORD[16+rdi]
+        xorps   xmm14,xmm15
+        movups  XMMWORD[rdi*1+rsi],xmm6
+        xorps   xmm6,xmm14
+        movups  xmm5,XMMWORD[((-80))+rcx]
+        aesenc  xmm6,xmm4
+        movdqa  xmm0,XMMWORD[((160-128))+rax]
+        paddd   xmm0,xmm11
+DB      69,15,56,205,227
+DB      69,15,56,204,234
+        movups  xmm4,XMMWORD[((-64))+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm12
+DB      102,65,15,58,15,219,4
+        paddd   xmm13,xmm3
+        movups  xmm5,XMMWORD[((-48))+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((192-128))+rax]
+        paddd   xmm0,xmm12
+DB      69,15,56,205,236
+DB      69,15,56,204,211
+        movups  xmm4,XMMWORD[((-32))+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm13
+DB      102,65,15,58,15,220,4
+        paddd   xmm10,xmm3
+        movups  xmm5,XMMWORD[((-16))+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((224-128))+rax]
+        paddd   xmm0,xmm13
+DB      69,15,56,205,213
+DB      69,15,56,204,220
+        movups  xmm4,XMMWORD[rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm10
+DB      102,65,15,58,15,221,4
+        paddd   xmm11,xmm3
+        movups  xmm5,XMMWORD[16+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((256-128))+rax]
+        paddd   xmm0,xmm10
+DB      69,15,56,205,218
+DB      69,15,56,204,229
+        movups  xmm4,XMMWORD[32+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm11
+DB      102,65,15,58,15,218,4
+        paddd   xmm12,xmm3
+        movups  xmm5,XMMWORD[48+rcx]
+        aesenc  xmm6,xmm4
+        cmp     r11d,11
+        jb      NEAR $L$aesenclast2
+        movups  xmm4,XMMWORD[64+rcx]
+        aesenc  xmm6,xmm5
+        movups  xmm5,XMMWORD[80+rcx]
+        aesenc  xmm6,xmm4
+        je      NEAR $L$aesenclast2
+        movups  xmm4,XMMWORD[96+rcx]
+        aesenc  xmm6,xmm5
+        movups  xmm5,XMMWORD[112+rcx]
+        aesenc  xmm6,xmm4
+$L$aesenclast2:
+        aesenclast      xmm6,xmm5
+        movups  xmm4,XMMWORD[((16-112))+rcx]
+        nop
+DB      15,56,203,202
+        movups  xmm14,XMMWORD[32+rdi]
+        xorps   xmm14,xmm15
+        movups  XMMWORD[16+rdi*1+rsi],xmm6
+        xorps   xmm6,xmm14
+        movups  xmm5,XMMWORD[((-80))+rcx]
+        aesenc  xmm6,xmm4
+        movdqa  xmm0,XMMWORD[((288-128))+rax]
+        paddd   xmm0,xmm11
+DB      69,15,56,205,227
+DB      69,15,56,204,234
+        movups  xmm4,XMMWORD[((-64))+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm12
+DB      102,65,15,58,15,219,4
+        paddd   xmm13,xmm3
+        movups  xmm5,XMMWORD[((-48))+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((320-128))+rax]
+        paddd   xmm0,xmm12
+DB      69,15,56,205,236
+DB      69,15,56,204,211
+        movups  xmm4,XMMWORD[((-32))+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm13
+DB      102,65,15,58,15,220,4
+        paddd   xmm10,xmm3
+        movups  xmm5,XMMWORD[((-16))+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((352-128))+rax]
+        paddd   xmm0,xmm13
+DB      69,15,56,205,213
+DB      69,15,56,204,220
+        movups  xmm4,XMMWORD[rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm10
+DB      102,65,15,58,15,221,4
+        paddd   xmm11,xmm3
+        movups  xmm5,XMMWORD[16+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((384-128))+rax]
+        paddd   xmm0,xmm10
+DB      69,15,56,205,218
+DB      69,15,56,204,229
+        movups  xmm4,XMMWORD[32+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm11
+DB      102,65,15,58,15,218,4
+        paddd   xmm12,xmm3
+        movups  xmm5,XMMWORD[48+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((416-128))+rax]
+        paddd   xmm0,xmm11
+DB      69,15,56,205,227
+DB      69,15,56,204,234
+        cmp     r11d,11
+        jb      NEAR $L$aesenclast3
+        movups  xmm4,XMMWORD[64+rcx]
+        aesenc  xmm6,xmm5
+        movups  xmm5,XMMWORD[80+rcx]
+        aesenc  xmm6,xmm4
+        je      NEAR $L$aesenclast3
+        movups  xmm4,XMMWORD[96+rcx]
+        aesenc  xmm6,xmm5
+        movups  xmm5,XMMWORD[112+rcx]
+        aesenc  xmm6,xmm4
+$L$aesenclast3:
+        aesenclast      xmm6,xmm5
+        movups  xmm4,XMMWORD[((16-112))+rcx]
+        nop
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm3,xmm12
+DB      102,65,15,58,15,219,4
+        paddd   xmm13,xmm3
+        movups  xmm14,XMMWORD[48+rdi]
+        xorps   xmm14,xmm15
+        movups  XMMWORD[32+rdi*1+rsi],xmm6
+        xorps   xmm6,xmm14
+        movups  xmm5,XMMWORD[((-80))+rcx]
+        aesenc  xmm6,xmm4
+        movups  xmm4,XMMWORD[((-64))+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,202
+
+        movdqa  xmm0,XMMWORD[((448-128))+rax]
+        paddd   xmm0,xmm12
+DB      69,15,56,205,236
+        movdqa  xmm3,xmm7
+        movups  xmm5,XMMWORD[((-48))+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movups  xmm4,XMMWORD[((-32))+rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,202
+
+        movdqa  xmm0,XMMWORD[((480-128))+rax]
+        paddd   xmm0,xmm13
+        movups  xmm5,XMMWORD[((-16))+rcx]
+        aesenc  xmm6,xmm4
+        movups  xmm4,XMMWORD[rcx]
+        aesenc  xmm6,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movups  xmm5,XMMWORD[16+rcx]
+        aesenc  xmm6,xmm4
+DB      15,56,203,202
+
+        movups  xmm4,XMMWORD[32+rcx]
+        aesenc  xmm6,xmm5
+        movups  xmm5,XMMWORD[48+rcx]
+        aesenc  xmm6,xmm4
+        cmp     r11d,11
+        jb      NEAR $L$aesenclast4
+        movups  xmm4,XMMWORD[64+rcx]
+        aesenc  xmm6,xmm5
+        movups  xmm5,XMMWORD[80+rcx]
+        aesenc  xmm6,xmm4
+        je      NEAR $L$aesenclast4
+        movups  xmm4,XMMWORD[96+rcx]
+        aesenc  xmm6,xmm5
+        movups  xmm5,XMMWORD[112+rcx]
+        aesenc  xmm6,xmm4
+$L$aesenclast4:
+        aesenclast      xmm6,xmm5
+        movups  xmm4,XMMWORD[((16-112))+rcx]
+        nop
+
+        paddd   xmm2,xmm9
+        paddd   xmm1,xmm8
+
+        dec     rdx
+        movups  XMMWORD[48+rdi*1+rsi],xmm6
+        lea     rdi,[64+rdi]
+        jnz     NEAR $L$oop_shaext
+
+        pshufd  xmm2,xmm2,0xb1
+        pshufd  xmm3,xmm1,0x1b
+        pshufd  xmm1,xmm1,0xb1
+        punpckhqdq      xmm1,xmm2
+DB      102,15,58,15,211,8
+
+        movups  XMMWORD[r8],xmm6
+        movdqu  XMMWORD[r9],xmm1
+        movdqu  XMMWORD[16+r9],xmm2
+        movaps  xmm6,XMMWORD[rsp]
+        movaps  xmm7,XMMWORD[16+rsp]
+        movaps  xmm8,XMMWORD[32+rsp]
+        movaps  xmm9,XMMWORD[48+rsp]
+        movaps  xmm10,XMMWORD[64+rsp]
+        movaps  xmm11,XMMWORD[80+rsp]
+        movaps  xmm12,XMMWORD[96+rsp]
+        movaps  xmm13,XMMWORD[112+rsp]
+        movaps  xmm14,XMMWORD[128+rsp]
+        movaps  xmm15,XMMWORD[144+rsp]
+        lea     rsp,[((8+160))+rsp]
+$L$epilogue_shaext:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+$L$SEH_end_aesni_cbc_sha256_enc_shaext:
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+        lea     r10,[aesni_cbc_sha256_enc_shaext]
+        cmp     rbx,r10
+        jb      NEAR $L$not_in_shaext
+
+        lea     rsi,[rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+        lea     rax,[168+rax]
+        jmp     NEAR $L$in_prologue
+$L$not_in_shaext:
+        lea     r10,[$L$avx2_shortcut]
+        cmp     rbx,r10
+        jb      NEAR $L$not_in_avx2
+
+        and     rax,-256*4
+        add     rax,448
+$L$not_in_avx2:
+        mov     rsi,rax
+        mov     rax,QWORD[((64+56))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+        lea     rsi,[((64+64))+rsi]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+        DD      $L$SEH_begin_aesni_cbc_sha256_enc_xop wrt ..imagebase
+        DD      $L$SEH_end_aesni_cbc_sha256_enc_xop wrt ..imagebase
+        DD      $L$SEH_info_aesni_cbc_sha256_enc_xop wrt ..imagebase
+
+        DD      $L$SEH_begin_aesni_cbc_sha256_enc_avx wrt ..imagebase
+        DD      $L$SEH_end_aesni_cbc_sha256_enc_avx wrt ..imagebase
+        DD      $L$SEH_info_aesni_cbc_sha256_enc_avx wrt ..imagebase
+        DD      $L$SEH_begin_aesni_cbc_sha256_enc_avx2 wrt ..imagebase
+        DD      $L$SEH_end_aesni_cbc_sha256_enc_avx2 wrt ..imagebase
+        DD      $L$SEH_info_aesni_cbc_sha256_enc_avx2 wrt ..imagebase
+        DD      $L$SEH_begin_aesni_cbc_sha256_enc_shaext wrt ..imagebase
+        DD      $L$SEH_end_aesni_cbc_sha256_enc_shaext wrt ..imagebase
+        DD      $L$SEH_info_aesni_cbc_sha256_enc_shaext wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_aesni_cbc_sha256_enc_xop:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_xop wrt ..imagebase,$L$epilogue_xop wrt ..imagebase
+
+$L$SEH_info_aesni_cbc_sha256_enc_avx:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_aesni_cbc_sha256_enc_avx2:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
+$L$SEH_info_aesni_cbc_sha256_enc_shaext:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_shaext wrt ..imagebase,$L$epilogue_shaext wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm
new file mode 100644
index 0000000000..2705ece3e2
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm
@@ -0,0 +1,5084 @@
+; Copyright 2009-2019 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN  OPENSSL_ia32cap_P
+global  aesni_encrypt
+
+ALIGN   16
+aesni_encrypt:
+
+        movups  xmm2,XMMWORD[rcx]
+        mov     eax,DWORD[240+r8]
+        movups  xmm0,XMMWORD[r8]
+        movups  xmm1,XMMWORD[16+r8]
+        lea     r8,[32+r8]
+        xorps   xmm2,xmm0
+$L$oop_enc1_1:
+DB      102,15,56,220,209
+        dec     eax
+        movups  xmm1,XMMWORD[r8]
+        lea     r8,[16+r8]
+        jnz     NEAR $L$oop_enc1_1
+DB      102,15,56,221,209
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        movups  XMMWORD[rdx],xmm2
+        pxor    xmm2,xmm2
+        DB      0F3h,0C3h               ;repret
+
+
+
+global  aesni_decrypt
+
+ALIGN   16
+aesni_decrypt:
+
+        movups  xmm2,XMMWORD[rcx]
+        mov     eax,DWORD[240+r8]
+        movups  xmm0,XMMWORD[r8]
+        movups  xmm1,XMMWORD[16+r8]
+        lea     r8,[32+r8]
+        xorps   xmm2,xmm0
+$L$oop_dec1_2:
+DB      102,15,56,222,209
+        dec     eax
+        movups  xmm1,XMMWORD[r8]
+        lea     r8,[16+r8]
+        jnz     NEAR $L$oop_dec1_2
+DB      102,15,56,223,209
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        movups  XMMWORD[rdx],xmm2
+        pxor    xmm2,xmm2
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_encrypt2:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm0
+        movups  xmm0,XMMWORD[32+rcx]
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+        add     rax,16
+
+$L$enc_loop2:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$enc_loop2
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,221,208
+DB      102,15,56,221,216
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_decrypt2:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm0
+        movups  xmm0,XMMWORD[32+rcx]
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+        add     rax,16
+
+$L$dec_loop2:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$dec_loop2
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,223,208
+DB      102,15,56,223,216
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_encrypt3:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm0
+        xorps   xmm4,xmm0
+        movups  xmm0,XMMWORD[32+rcx]
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+        add     rax,16
+
+$L$enc_loop3:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$enc_loop3
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,221,208
+DB      102,15,56,221,216
+DB      102,15,56,221,224
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_decrypt3:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm0
+        xorps   xmm4,xmm0
+        movups  xmm0,XMMWORD[32+rcx]
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+        add     rax,16
+
+$L$dec_loop3:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$dec_loop3
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,223,208
+DB      102,15,56,223,216
+DB      102,15,56,223,224
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_encrypt4:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm0
+        xorps   xmm4,xmm0
+        xorps   xmm5,xmm0
+        movups  xmm0,XMMWORD[32+rcx]
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+DB      0x0f,0x1f,0x00
+        add     rax,16
+
+$L$enc_loop4:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$enc_loop4
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,221,208
+DB      102,15,56,221,216
+DB      102,15,56,221,224
+DB      102,15,56,221,232
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_decrypt4:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm0
+        xorps   xmm4,xmm0
+        xorps   xmm5,xmm0
+        movups  xmm0,XMMWORD[32+rcx]
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+DB      0x0f,0x1f,0x00
+        add     rax,16
+
+$L$dec_loop4:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$dec_loop4
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,223,208
+DB      102,15,56,223,216
+DB      102,15,56,223,224
+DB      102,15,56,223,232
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_encrypt6:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+DB      102,15,56,220,209
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+DB      102,15,56,220,217
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm0
+DB      102,15,56,220,225
+        pxor    xmm7,xmm0
+        movups  xmm0,XMMWORD[rax*1+rcx]
+        add     rax,16
+        jmp     NEAR $L$enc_loop6_enter
+ALIGN   16
+$L$enc_loop6:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+$L$enc_loop6_enter:
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$enc_loop6
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,15,56,221,208
+DB      102,15,56,221,216
+DB      102,15,56,221,224
+DB      102,15,56,221,232
+DB      102,15,56,221,240
+DB      102,15,56,221,248
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_decrypt6:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        pxor    xmm3,xmm0
+        pxor    xmm4,xmm0
+DB      102,15,56,222,209
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+DB      102,15,56,222,217
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm0
+DB      102,15,56,222,225
+        pxor    xmm7,xmm0
+        movups  xmm0,XMMWORD[rax*1+rcx]
+        add     rax,16
+        jmp     NEAR $L$dec_loop6_enter
+ALIGN   16
+$L$dec_loop6:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+$L$dec_loop6_enter:
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$dec_loop6
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+DB      102,15,56,223,208
+DB      102,15,56,223,216
+DB      102,15,56,223,224
+DB      102,15,56,223,232
+DB      102,15,56,223,240
+DB      102,15,56,223,248
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_encrypt8:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm0
+        pxor    xmm4,xmm0
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm0
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+DB      102,15,56,220,209
+        pxor    xmm7,xmm0
+        pxor    xmm8,xmm0
+DB      102,15,56,220,217
+        pxor    xmm9,xmm0
+        movups  xmm0,XMMWORD[rax*1+rcx]
+        add     rax,16
+        jmp     NEAR $L$enc_loop8_inner
+ALIGN   16
+$L$enc_loop8:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+$L$enc_loop8_inner:
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+DB      102,68,15,56,220,201
+$L$enc_loop8_enter:
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+DB      102,68,15,56,220,192
+DB      102,68,15,56,220,200
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$enc_loop8
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+DB      102,68,15,56,220,201
+DB      102,15,56,221,208
+DB      102,15,56,221,216
+DB      102,15,56,221,224
+DB      102,15,56,221,232
+DB      102,15,56,221,240
+DB      102,15,56,221,248
+DB      102,68,15,56,221,192
+DB      102,68,15,56,221,200
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   16
+_aesni_decrypt8:
+
+        movups  xmm0,XMMWORD[rcx]
+        shl     eax,4
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm0
+        pxor    xmm4,xmm0
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm0
+        lea     rcx,[32+rax*1+rcx]
+        neg     rax
+DB      102,15,56,222,209
+        pxor    xmm7,xmm0
+        pxor    xmm8,xmm0
+DB      102,15,56,222,217
+        pxor    xmm9,xmm0
+        movups  xmm0,XMMWORD[rax*1+rcx]
+        add     rax,16
+        jmp     NEAR $L$dec_loop8_inner
+ALIGN   16
+$L$dec_loop8:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+$L$dec_loop8_inner:
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+DB      102,68,15,56,222,193
+DB      102,68,15,56,222,201
+$L$dec_loop8_enter:
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+DB      102,68,15,56,222,192
+DB      102,68,15,56,222,200
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$dec_loop8
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+DB      102,68,15,56,222,193
+DB      102,68,15,56,222,201
+DB      102,15,56,223,208
+DB      102,15,56,223,216
+DB      102,15,56,223,224
+DB      102,15,56,223,232
+DB      102,15,56,223,240
+DB      102,15,56,223,248
+DB      102,68,15,56,223,192
+DB      102,68,15,56,223,200
+        DB      0F3h,0C3h               ;repret
+
+
+global  aesni_ecb_encrypt
+
+ALIGN   16
+aesni_ecb_encrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_ecb_encrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+
+
+
+        lea     rsp,[((-88))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+$L$ecb_enc_body:
+        and     rdx,-16
+        jz      NEAR $L$ecb_ret
+
+        mov     eax,DWORD[240+rcx]
+        movups  xmm0,XMMWORD[rcx]
+        mov     r11,rcx
+        mov     r10d,eax
+        test    r8d,r8d
+        jz      NEAR $L$ecb_decrypt
+
+        cmp     rdx,0x80
+        jb      NEAR $L$ecb_enc_tail
+
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movdqu  xmm4,XMMWORD[32+rdi]
+        movdqu  xmm5,XMMWORD[48+rdi]
+        movdqu  xmm6,XMMWORD[64+rdi]
+        movdqu  xmm7,XMMWORD[80+rdi]
+        movdqu  xmm8,XMMWORD[96+rdi]
+        movdqu  xmm9,XMMWORD[112+rdi]
+        lea     rdi,[128+rdi]
+        sub     rdx,0x80
+        jmp     NEAR $L$ecb_enc_loop8_enter
+ALIGN   16
+$L$ecb_enc_loop8:
+        movups  XMMWORD[rsi],xmm2
+        mov     rcx,r11
+        movdqu  xmm2,XMMWORD[rdi]
+        mov     eax,r10d
+        movups  XMMWORD[16+rsi],xmm3
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movups  XMMWORD[32+rsi],xmm4
+        movdqu  xmm4,XMMWORD[32+rdi]
+        movups  XMMWORD[48+rsi],xmm5
+        movdqu  xmm5,XMMWORD[48+rdi]
+        movups  XMMWORD[64+rsi],xmm6
+        movdqu  xmm6,XMMWORD[64+rdi]
+        movups  XMMWORD[80+rsi],xmm7
+        movdqu  xmm7,XMMWORD[80+rdi]
+        movups  XMMWORD[96+rsi],xmm8
+        movdqu  xmm8,XMMWORD[96+rdi]
+        movups  XMMWORD[112+rsi],xmm9
+        lea     rsi,[128+rsi]
+        movdqu  xmm9,XMMWORD[112+rdi]
+        lea     rdi,[128+rdi]
+$L$ecb_enc_loop8_enter:
+
+        call    _aesni_encrypt8
+
+        sub     rdx,0x80
+        jnc     NEAR $L$ecb_enc_loop8
+
+        movups  XMMWORD[rsi],xmm2
+        mov     rcx,r11
+        movups  XMMWORD[16+rsi],xmm3
+        mov     eax,r10d
+        movups  XMMWORD[32+rsi],xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        movups  XMMWORD[80+rsi],xmm7
+        movups  XMMWORD[96+rsi],xmm8
+        movups  XMMWORD[112+rsi],xmm9
+        lea     rsi,[128+rsi]
+        add     rdx,0x80
+        jz      NEAR $L$ecb_ret
+
+$L$ecb_enc_tail:
+        movups  xmm2,XMMWORD[rdi]
+        cmp     rdx,0x20
+        jb      NEAR $L$ecb_enc_one
+        movups  xmm3,XMMWORD[16+rdi]
+        je      NEAR $L$ecb_enc_two
+        movups  xmm4,XMMWORD[32+rdi]
+        cmp     rdx,0x40
+        jb      NEAR $L$ecb_enc_three
+        movups  xmm5,XMMWORD[48+rdi]
+        je      NEAR $L$ecb_enc_four
+        movups  xmm6,XMMWORD[64+rdi]
+        cmp     rdx,0x60
+        jb      NEAR $L$ecb_enc_five
+        movups  xmm7,XMMWORD[80+rdi]
+        je      NEAR $L$ecb_enc_six
+        movdqu  xmm8,XMMWORD[96+rdi]
+        xorps   xmm9,xmm9
+        call    _aesni_encrypt8
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        movups  XMMWORD[80+rsi],xmm7
+        movups  XMMWORD[96+rsi],xmm8
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_enc_one:
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_enc1_3:
+DB      102,15,56,220,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_enc1_3
+DB      102,15,56,221,209
+        movups  XMMWORD[rsi],xmm2
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_enc_two:
+        call    _aesni_encrypt2
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_enc_three:
+        call    _aesni_encrypt3
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_enc_four:
+        call    _aesni_encrypt4
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_enc_five:
+        xorps   xmm7,xmm7
+        call    _aesni_encrypt6
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_enc_six:
+        call    _aesni_encrypt6
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        movups  XMMWORD[80+rsi],xmm7
+        jmp     NEAR $L$ecb_ret
+
+ALIGN   16
+$L$ecb_decrypt:
+        cmp     rdx,0x80
+        jb      NEAR $L$ecb_dec_tail
+
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movdqu  xmm4,XMMWORD[32+rdi]
+        movdqu  xmm5,XMMWORD[48+rdi]
+        movdqu  xmm6,XMMWORD[64+rdi]
+        movdqu  xmm7,XMMWORD[80+rdi]
+        movdqu  xmm8,XMMWORD[96+rdi]
+        movdqu  xmm9,XMMWORD[112+rdi]
+        lea     rdi,[128+rdi]
+        sub     rdx,0x80
+        jmp     NEAR $L$ecb_dec_loop8_enter
+ALIGN   16
+$L$ecb_dec_loop8:
+        movups  XMMWORD[rsi],xmm2
+        mov     rcx,r11
+        movdqu  xmm2,XMMWORD[rdi]
+        mov     eax,r10d
+        movups  XMMWORD[16+rsi],xmm3
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movups  XMMWORD[32+rsi],xmm4
+        movdqu  xmm4,XMMWORD[32+rdi]
+        movups  XMMWORD[48+rsi],xmm5
+        movdqu  xmm5,XMMWORD[48+rdi]
+        movups  XMMWORD[64+rsi],xmm6
+        movdqu  xmm6,XMMWORD[64+rdi]
+        movups  XMMWORD[80+rsi],xmm7
+        movdqu  xmm7,XMMWORD[80+rdi]
+        movups  XMMWORD[96+rsi],xmm8
+        movdqu  xmm8,XMMWORD[96+rdi]
+        movups  XMMWORD[112+rsi],xmm9
+        lea     rsi,[128+rsi]
+        movdqu  xmm9,XMMWORD[112+rdi]
+        lea     rdi,[128+rdi]
+$L$ecb_dec_loop8_enter:
+
+        call    _aesni_decrypt8
+
+        movups  xmm0,XMMWORD[r11]
+        sub     rdx,0x80
+        jnc     NEAR $L$ecb_dec_loop8
+
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        mov     rcx,r11
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        mov     eax,r10d
+        movups  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        pxor    xmm5,xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        pxor    xmm6,xmm6
+        movups  XMMWORD[80+rsi],xmm7
+        pxor    xmm7,xmm7
+        movups  XMMWORD[96+rsi],xmm8
+        pxor    xmm8,xmm8
+        movups  XMMWORD[112+rsi],xmm9
+        pxor    xmm9,xmm9
+        lea     rsi,[128+rsi]
+        add     rdx,0x80
+        jz      NEAR $L$ecb_ret
+
+$L$ecb_dec_tail:
+        movups  xmm2,XMMWORD[rdi]
+        cmp     rdx,0x20
+        jb      NEAR $L$ecb_dec_one
+        movups  xmm3,XMMWORD[16+rdi]
+        je      NEAR $L$ecb_dec_two
+        movups  xmm4,XMMWORD[32+rdi]
+        cmp     rdx,0x40
+        jb      NEAR $L$ecb_dec_three
+        movups  xmm5,XMMWORD[48+rdi]
+        je      NEAR $L$ecb_dec_four
+        movups  xmm6,XMMWORD[64+rdi]
+        cmp     rdx,0x60
+        jb      NEAR $L$ecb_dec_five
+        movups  xmm7,XMMWORD[80+rdi]
+        je      NEAR $L$ecb_dec_six
+        movups  xmm8,XMMWORD[96+rdi]
+        movups  xmm0,XMMWORD[rcx]
+        xorps   xmm9,xmm9
+        call    _aesni_decrypt8
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        pxor    xmm5,xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        pxor    xmm6,xmm6
+        movups  XMMWORD[80+rsi],xmm7
+        pxor    xmm7,xmm7
+        movups  XMMWORD[96+rsi],xmm8
+        pxor    xmm8,xmm8
+        pxor    xmm9,xmm9
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_dec_one:
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_dec1_4:
+DB      102,15,56,222,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_dec1_4
+DB      102,15,56,223,209
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_dec_two:
+        call    _aesni_decrypt2
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_dec_three:
+        call    _aesni_decrypt3
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_dec_four:
+        call    _aesni_decrypt4
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        pxor    xmm5,xmm5
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_dec_five:
+        xorps   xmm7,xmm7
+        call    _aesni_decrypt6
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        pxor    xmm5,xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        pxor    xmm6,xmm6
+        pxor    xmm7,xmm7
+        jmp     NEAR $L$ecb_ret
+ALIGN   16
+$L$ecb_dec_six:
+        call    _aesni_decrypt6
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        pxor    xmm5,xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        pxor    xmm6,xmm6
+        movups  XMMWORD[80+rsi],xmm7
+        pxor    xmm7,xmm7
+
+$L$ecb_ret:
+        xorps   xmm0,xmm0
+        pxor    xmm1,xmm1
+        movaps  xmm6,XMMWORD[rsp]
+        movaps  XMMWORD[rsp],xmm0
+        movaps  xmm7,XMMWORD[16+rsp]
+        movaps  XMMWORD[16+rsp],xmm0
+        movaps  xmm8,XMMWORD[32+rsp]
+        movaps  XMMWORD[32+rsp],xmm0
+        movaps  xmm9,XMMWORD[48+rsp]
+        movaps  XMMWORD[48+rsp],xmm0
+        lea     rsp,[88+rsp]
+$L$ecb_enc_ret:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_ecb_encrypt:
+global  aesni_ccm64_encrypt_blocks
+
+ALIGN   16
+aesni_ccm64_encrypt_blocks:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_ccm64_encrypt_blocks:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+        lea     rsp,[((-88))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+$L$ccm64_enc_body:
+        mov     eax,DWORD[240+rcx]
+        movdqu  xmm6,XMMWORD[r8]
+        movdqa  xmm9,XMMWORD[$L$increment64]
+        movdqa  xmm7,XMMWORD[$L$bswap_mask]
+
+        shl     eax,4
+        mov     r10d,16
+        lea     r11,[rcx]
+        movdqu  xmm3,XMMWORD[r9]
+        movdqa  xmm2,xmm6
+        lea     rcx,[32+rax*1+rcx]
+DB      102,15,56,0,247
+        sub     r10,rax
+        jmp     NEAR $L$ccm64_enc_outer
+ALIGN   16
+$L$ccm64_enc_outer:
+        movups  xmm0,XMMWORD[r11]
+        mov     rax,r10
+        movups  xmm8,XMMWORD[rdi]
+
+        xorps   xmm2,xmm0
+        movups  xmm1,XMMWORD[16+r11]
+        xorps   xmm0,xmm8
+        xorps   xmm3,xmm0
+        movups  xmm0,XMMWORD[32+r11]
+
+$L$ccm64_enc2_loop:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$ccm64_enc2_loop
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+        paddq   xmm6,xmm9
+        dec     rdx
+DB      102,15,56,221,208
+DB      102,15,56,221,216
+
+        lea     rdi,[16+rdi]
+        xorps   xmm8,xmm2
+        movdqa  xmm2,xmm6
+        movups  XMMWORD[rsi],xmm8
+DB      102,15,56,0,215
+        lea     rsi,[16+rsi]
+        jnz     NEAR $L$ccm64_enc_outer
+
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        movups  XMMWORD[r9],xmm3
+        pxor    xmm3,xmm3
+        pxor    xmm8,xmm8
+        pxor    xmm6,xmm6
+        movaps  xmm6,XMMWORD[rsp]
+        movaps  XMMWORD[rsp],xmm0
+        movaps  xmm7,XMMWORD[16+rsp]
+        movaps  XMMWORD[16+rsp],xmm0
+        movaps  xmm8,XMMWORD[32+rsp]
+        movaps  XMMWORD[32+rsp],xmm0
+        movaps  xmm9,XMMWORD[48+rsp]
+        movaps  XMMWORD[48+rsp],xmm0
+        lea     rsp,[88+rsp]
+$L$ccm64_enc_ret:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+$L$SEH_end_aesni_ccm64_encrypt_blocks:
+global  aesni_ccm64_decrypt_blocks
+
+ALIGN   16
+aesni_ccm64_decrypt_blocks:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_ccm64_decrypt_blocks:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+        lea     rsp,[((-88))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+$L$ccm64_dec_body:
+        mov     eax,DWORD[240+rcx]
+        movups  xmm6,XMMWORD[r8]
+        movdqu  xmm3,XMMWORD[r9]
+        movdqa  xmm9,XMMWORD[$L$increment64]
+        movdqa  xmm7,XMMWORD[$L$bswap_mask]
+
+        movaps  xmm2,xmm6
+        mov     r10d,eax
+        mov     r11,rcx
+DB      102,15,56,0,247
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_enc1_5:
+DB      102,15,56,220,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_enc1_5
+DB      102,15,56,221,209
+        shl     r10d,4
+        mov     eax,16
+        movups  xmm8,XMMWORD[rdi]
+        paddq   xmm6,xmm9
+        lea     rdi,[16+rdi]
+        sub     rax,r10
+        lea     rcx,[32+r10*1+r11]
+        mov     r10,rax
+        jmp     NEAR $L$ccm64_dec_outer
+ALIGN   16
+$L$ccm64_dec_outer:
+        xorps   xmm8,xmm2
+        movdqa  xmm2,xmm6
+        movups  XMMWORD[rsi],xmm8
+        lea     rsi,[16+rsi]
+DB      102,15,56,0,215
+
+        sub     rdx,1
+        jz      NEAR $L$ccm64_dec_break
+
+        movups  xmm0,XMMWORD[r11]
+        mov     rax,r10
+        movups  xmm1,XMMWORD[16+r11]
+        xorps   xmm8,xmm0
+        xorps   xmm2,xmm0
+        xorps   xmm3,xmm8
+        movups  xmm0,XMMWORD[32+r11]
+        jmp     NEAR $L$ccm64_dec2_loop
+ALIGN   16
+$L$ccm64_dec2_loop:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$ccm64_dec2_loop
+        movups  xmm8,XMMWORD[rdi]
+        paddq   xmm6,xmm9
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,221,208
+DB      102,15,56,221,216
+        lea     rdi,[16+rdi]
+        jmp     NEAR $L$ccm64_dec_outer
+
+ALIGN   16
+$L$ccm64_dec_break:
+
+        mov     eax,DWORD[240+r11]
+        movups  xmm0,XMMWORD[r11]
+        movups  xmm1,XMMWORD[16+r11]
+        xorps   xmm8,xmm0
+        lea     r11,[32+r11]
+        xorps   xmm3,xmm8
+$L$oop_enc1_6:
+DB      102,15,56,220,217
+        dec     eax
+        movups  xmm1,XMMWORD[r11]
+        lea     r11,[16+r11]
+        jnz     NEAR $L$oop_enc1_6
+DB      102,15,56,221,217
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        movups  XMMWORD[r9],xmm3
+        pxor    xmm3,xmm3
+        pxor    xmm8,xmm8
+        pxor    xmm6,xmm6
+        movaps  xmm6,XMMWORD[rsp]
+        movaps  XMMWORD[rsp],xmm0
+        movaps  xmm7,XMMWORD[16+rsp]
+        movaps  XMMWORD[16+rsp],xmm0
+        movaps  xmm8,XMMWORD[32+rsp]
+        movaps  XMMWORD[32+rsp],xmm0
+        movaps  xmm9,XMMWORD[48+rsp]
+        movaps  XMMWORD[48+rsp],xmm0
+        lea     rsp,[88+rsp]
+$L$ccm64_dec_ret:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+$L$SEH_end_aesni_ccm64_decrypt_blocks:
+global  aesni_ctr32_encrypt_blocks
+
+ALIGN   16
+aesni_ctr32_encrypt_blocks:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_ctr32_encrypt_blocks:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+
+
+
+        cmp     rdx,1
+        jne     NEAR $L$ctr32_bulk
+
+
+
+        movups  xmm2,XMMWORD[r8]
+        movups  xmm3,XMMWORD[rdi]
+        mov     edx,DWORD[240+rcx]
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_enc1_7:
+DB      102,15,56,220,209
+        dec     edx
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_enc1_7
+DB      102,15,56,221,209
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        xorps   xmm2,xmm3
+        pxor    xmm3,xmm3
+        movups  XMMWORD[rsi],xmm2
+        xorps   xmm2,xmm2
+        jmp     NEAR $L$ctr32_epilogue
+
+ALIGN   16
+$L$ctr32_bulk:
+        lea     r11,[rsp]
+
+        push    rbp
+
+        sub     rsp,288
+        and     rsp,-16
+        movaps  XMMWORD[(-168)+r11],xmm6
+        movaps  XMMWORD[(-152)+r11],xmm7
+        movaps  XMMWORD[(-136)+r11],xmm8
+        movaps  XMMWORD[(-120)+r11],xmm9
+        movaps  XMMWORD[(-104)+r11],xmm10
+        movaps  XMMWORD[(-88)+r11],xmm11
+        movaps  XMMWORD[(-72)+r11],xmm12
+        movaps  XMMWORD[(-56)+r11],xmm13
+        movaps  XMMWORD[(-40)+r11],xmm14
+        movaps  XMMWORD[(-24)+r11],xmm15
+$L$ctr32_body:
+
+
+
+
+        movdqu  xmm2,XMMWORD[r8]
+        movdqu  xmm0,XMMWORD[rcx]
+        mov     r8d,DWORD[12+r8]
+        pxor    xmm2,xmm0
+        mov     ebp,DWORD[12+rcx]
+        movdqa  XMMWORD[rsp],xmm2
+        bswap   r8d
+        movdqa  xmm3,xmm2
+        movdqa  xmm4,xmm2
+        movdqa  xmm5,xmm2
+        movdqa  XMMWORD[64+rsp],xmm2
+        movdqa  XMMWORD[80+rsp],xmm2
+        movdqa  XMMWORD[96+rsp],xmm2
+        mov     r10,rdx
+        movdqa  XMMWORD[112+rsp],xmm2
+
+        lea     rax,[1+r8]
+        lea     rdx,[2+r8]
+        bswap   eax
+        bswap   edx
+        xor     eax,ebp
+        xor     edx,ebp
+DB      102,15,58,34,216,3
+        lea     rax,[3+r8]
+        movdqa  XMMWORD[16+rsp],xmm3
+DB      102,15,58,34,226,3
+        bswap   eax
+        mov     rdx,r10
+        lea     r10,[4+r8]
+        movdqa  XMMWORD[32+rsp],xmm4
+        xor     eax,ebp
+        bswap   r10d
+DB      102,15,58,34,232,3
+        xor     r10d,ebp
+        movdqa  XMMWORD[48+rsp],xmm5
+        lea     r9,[5+r8]
+        mov     DWORD[((64+12))+rsp],r10d
+        bswap   r9d
+        lea     r10,[6+r8]
+        mov     eax,DWORD[240+rcx]
+        xor     r9d,ebp
+        bswap   r10d
+        mov     DWORD[((80+12))+rsp],r9d
+        xor     r10d,ebp
+        lea     r9,[7+r8]
+        mov     DWORD[((96+12))+rsp],r10d
+        bswap   r9d
+        mov     r10d,DWORD[((OPENSSL_ia32cap_P+4))]
+        xor     r9d,ebp
+        and     r10d,71303168
+        mov     DWORD[((112+12))+rsp],r9d
+
+        movups  xmm1,XMMWORD[16+rcx]
+
+        movdqa  xmm6,XMMWORD[64+rsp]
+        movdqa  xmm7,XMMWORD[80+rsp]
+
+        cmp     rdx,8
+        jb      NEAR $L$ctr32_tail
+
+        sub     rdx,6
+        cmp     r10d,4194304
+        je      NEAR $L$ctr32_6x
+
+        lea     rcx,[128+rcx]
+        sub     rdx,2
+        jmp     NEAR $L$ctr32_loop8
+
+ALIGN   16
+$L$ctr32_6x:
+        shl     eax,4
+        mov     r10d,48
+        bswap   ebp
+        lea     rcx,[32+rax*1+rcx]
+        sub     r10,rax
+        jmp     NEAR $L$ctr32_loop6
+
+ALIGN   16
+$L$ctr32_loop6:
+        add     r8d,6
+        movups  xmm0,XMMWORD[((-48))+r10*1+rcx]
+DB      102,15,56,220,209
+        mov     eax,r8d
+        xor     eax,ebp
+DB      102,15,56,220,217
+DB      0x0f,0x38,0xf1,0x44,0x24,12
+        lea     eax,[1+r8]
+DB      102,15,56,220,225
+        xor     eax,ebp
+DB      0x0f,0x38,0xf1,0x44,0x24,28
+DB      102,15,56,220,233
+        lea     eax,[2+r8]
+        xor     eax,ebp
+DB      102,15,56,220,241
+DB      0x0f,0x38,0xf1,0x44,0x24,44
+        lea     eax,[3+r8]
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[((-32))+r10*1+rcx]
+        xor     eax,ebp
+
+DB      102,15,56,220,208
+DB      0x0f,0x38,0xf1,0x44,0x24,60
+        lea     eax,[4+r8]
+DB      102,15,56,220,216
+        xor     eax,ebp
+DB      0x0f,0x38,0xf1,0x44,0x24,76
+DB      102,15,56,220,224
+        lea     eax,[5+r8]
+        xor     eax,ebp
+DB      102,15,56,220,232
+DB      0x0f,0x38,0xf1,0x44,0x24,92
+        mov     rax,r10
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+        movups  xmm0,XMMWORD[((-16))+r10*1+rcx]
+
+        call    $L$enc_loop6
+
+        movdqu  xmm8,XMMWORD[rdi]
+        movdqu  xmm9,XMMWORD[16+rdi]
+        movdqu  xmm10,XMMWORD[32+rdi]
+        movdqu  xmm11,XMMWORD[48+rdi]
+        movdqu  xmm12,XMMWORD[64+rdi]
+        movdqu  xmm13,XMMWORD[80+rdi]
+        lea     rdi,[96+rdi]
+        movups  xmm1,XMMWORD[((-64))+r10*1+rcx]
+        pxor    xmm8,xmm2
+        movaps  xmm2,XMMWORD[rsp]
+        pxor    xmm9,xmm3
+        movaps  xmm3,XMMWORD[16+rsp]
+        pxor    xmm10,xmm4
+        movaps  xmm4,XMMWORD[32+rsp]
+        pxor    xmm11,xmm5
+        movaps  xmm5,XMMWORD[48+rsp]
+        pxor    xmm12,xmm6
+        movaps  xmm6,XMMWORD[64+rsp]
+        pxor    xmm13,xmm7
+        movaps  xmm7,XMMWORD[80+rsp]
+        movdqu  XMMWORD[rsi],xmm8
+        movdqu  XMMWORD[16+rsi],xmm9
+        movdqu  XMMWORD[32+rsi],xmm10
+        movdqu  XMMWORD[48+rsi],xmm11
+        movdqu  XMMWORD[64+rsi],xmm12
+        movdqu  XMMWORD[80+rsi],xmm13
+        lea     rsi,[96+rsi]
+
+        sub     rdx,6
+        jnc     NEAR $L$ctr32_loop6
+
+        add     rdx,6
+        jz      NEAR $L$ctr32_done
+
+        lea     eax,[((-48))+r10]
+        lea     rcx,[((-80))+r10*1+rcx]
+        neg     eax
+        shr     eax,4
+        jmp     NEAR $L$ctr32_tail
+
+ALIGN   32
+$L$ctr32_loop8:
+        add     r8d,8
+        movdqa  xmm8,XMMWORD[96+rsp]
+DB      102,15,56,220,209
+        mov     r9d,r8d
+        movdqa  xmm9,XMMWORD[112+rsp]
+DB      102,15,56,220,217
+        bswap   r9d
+        movups  xmm0,XMMWORD[((32-128))+rcx]
+DB      102,15,56,220,225
+        xor     r9d,ebp
+        nop
+DB      102,15,56,220,233
+        mov     DWORD[((0+12))+rsp],r9d
+        lea     r9,[1+r8]
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+DB      102,68,15,56,220,201
+        movups  xmm1,XMMWORD[((48-128))+rcx]
+        bswap   r9d
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+        xor     r9d,ebp
+DB      0x66,0x90
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        mov     DWORD[((16+12))+rsp],r9d
+        lea     r9,[2+r8]
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+DB      102,68,15,56,220,192
+DB      102,68,15,56,220,200
+        movups  xmm0,XMMWORD[((64-128))+rcx]
+        bswap   r9d
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+        xor     r9d,ebp
+DB      0x66,0x90
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        mov     DWORD[((32+12))+rsp],r9d
+        lea     r9,[3+r8]
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+DB      102,68,15,56,220,201
+        movups  xmm1,XMMWORD[((80-128))+rcx]
+        bswap   r9d
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+        xor     r9d,ebp
+DB      0x66,0x90
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        mov     DWORD[((48+12))+rsp],r9d
+        lea     r9,[4+r8]
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+DB      102,68,15,56,220,192
+DB      102,68,15,56,220,200
+        movups  xmm0,XMMWORD[((96-128))+rcx]
+        bswap   r9d
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+        xor     r9d,ebp
+DB      0x66,0x90
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        mov     DWORD[((64+12))+rsp],r9d
+        lea     r9,[5+r8]
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+DB      102,68,15,56,220,201
+        movups  xmm1,XMMWORD[((112-128))+rcx]
+        bswap   r9d
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+        xor     r9d,ebp
+DB      0x66,0x90
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        mov     DWORD[((80+12))+rsp],r9d
+        lea     r9,[6+r8]
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+DB      102,68,15,56,220,192
+DB      102,68,15,56,220,200
+        movups  xmm0,XMMWORD[((128-128))+rcx]
+        bswap   r9d
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+        xor     r9d,ebp
+DB      0x66,0x90
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        mov     DWORD[((96+12))+rsp],r9d
+        lea     r9,[7+r8]
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+DB      102,68,15,56,220,201
+        movups  xmm1,XMMWORD[((144-128))+rcx]
+        bswap   r9d
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+        xor     r9d,ebp
+        movdqu  xmm10,XMMWORD[rdi]
+DB      102,15,56,220,232
+        mov     DWORD[((112+12))+rsp],r9d
+        cmp     eax,11
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+DB      102,68,15,56,220,192
+DB      102,68,15,56,220,200
+        movups  xmm0,XMMWORD[((160-128))+rcx]
+
+        jb      NEAR $L$ctr32_enc_done
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+DB      102,68,15,56,220,201
+        movups  xmm1,XMMWORD[((176-128))+rcx]
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+DB      102,68,15,56,220,192
+DB      102,68,15,56,220,200
+        movups  xmm0,XMMWORD[((192-128))+rcx]
+        je      NEAR $L$ctr32_enc_done
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+DB      102,68,15,56,220,201
+        movups  xmm1,XMMWORD[((208-128))+rcx]
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+DB      102,68,15,56,220,192
+DB      102,68,15,56,220,200
+        movups  xmm0,XMMWORD[((224-128))+rcx]
+        jmp     NEAR $L$ctr32_enc_done
+
+ALIGN   16
+$L$ctr32_enc_done:
+        movdqu  xmm11,XMMWORD[16+rdi]
+        pxor    xmm10,xmm0
+        movdqu  xmm12,XMMWORD[32+rdi]
+        pxor    xmm11,xmm0
+        movdqu  xmm13,XMMWORD[48+rdi]
+        pxor    xmm12,xmm0
+        movdqu  xmm14,XMMWORD[64+rdi]
+        pxor    xmm13,xmm0
+        movdqu  xmm15,XMMWORD[80+rdi]
+        pxor    xmm14,xmm0
+        pxor    xmm15,xmm0
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+DB      102,68,15,56,220,201
+        movdqu  xmm1,XMMWORD[96+rdi]
+        lea     rdi,[128+rdi]
+
+DB      102,65,15,56,221,210
+        pxor    xmm1,xmm0
+        movdqu  xmm10,XMMWORD[((112-128))+rdi]
+DB      102,65,15,56,221,219
+        pxor    xmm10,xmm0
+        movdqa  xmm11,XMMWORD[rsp]
+DB      102,65,15,56,221,228
+DB      102,65,15,56,221,237
+        movdqa  xmm12,XMMWORD[16+rsp]
+        movdqa  xmm13,XMMWORD[32+rsp]
+DB      102,65,15,56,221,246
+DB      102,65,15,56,221,255
+        movdqa  xmm14,XMMWORD[48+rsp]
+        movdqa  xmm15,XMMWORD[64+rsp]
+DB      102,68,15,56,221,193
+        movdqa  xmm0,XMMWORD[80+rsp]
+        movups  xmm1,XMMWORD[((16-128))+rcx]
+DB      102,69,15,56,221,202
+
+        movups  XMMWORD[rsi],xmm2
+        movdqa  xmm2,xmm11
+        movups  XMMWORD[16+rsi],xmm3
+        movdqa  xmm3,xmm12
+        movups  XMMWORD[32+rsi],xmm4
+        movdqa  xmm4,xmm13
+        movups  XMMWORD[48+rsi],xmm5
+        movdqa  xmm5,xmm14
+        movups  XMMWORD[64+rsi],xmm6
+        movdqa  xmm6,xmm15
+        movups  XMMWORD[80+rsi],xmm7
+        movdqa  xmm7,xmm0
+        movups  XMMWORD[96+rsi],xmm8
+        movups  XMMWORD[112+rsi],xmm9
+        lea     rsi,[128+rsi]
+
+        sub     rdx,8
+        jnc     NEAR $L$ctr32_loop8
+
+        add     rdx,8
+        jz      NEAR $L$ctr32_done
+        lea     rcx,[((-128))+rcx]
+
+$L$ctr32_tail:
+
+
+        lea     rcx,[16+rcx]
+        cmp     rdx,4
+        jb      NEAR $L$ctr32_loop3
+        je      NEAR $L$ctr32_loop4
+
+
+        shl     eax,4
+        movdqa  xmm8,XMMWORD[96+rsp]
+        pxor    xmm9,xmm9
+
+        movups  xmm0,XMMWORD[16+rcx]
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+        lea     rcx,[((32-16))+rax*1+rcx]
+        neg     rax
+DB      102,15,56,220,225
+        add     rax,16
+        movups  xmm10,XMMWORD[rdi]
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+        movups  xmm11,XMMWORD[16+rdi]
+        movups  xmm12,XMMWORD[32+rdi]
+DB      102,15,56,220,249
+DB      102,68,15,56,220,193
+
+        call    $L$enc_loop8_enter
+
+        movdqu  xmm13,XMMWORD[48+rdi]
+        pxor    xmm2,xmm10
+        movdqu  xmm10,XMMWORD[64+rdi]
+        pxor    xmm3,xmm11
+        movdqu  XMMWORD[rsi],xmm2
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[16+rsi],xmm3
+        pxor    xmm5,xmm13
+        movdqu  XMMWORD[32+rsi],xmm4
+        pxor    xmm6,xmm10
+        movdqu  XMMWORD[48+rsi],xmm5
+        movdqu  XMMWORD[64+rsi],xmm6
+        cmp     rdx,6
+        jb      NEAR $L$ctr32_done
+
+        movups  xmm11,XMMWORD[80+rdi]
+        xorps   xmm7,xmm11
+        movups  XMMWORD[80+rsi],xmm7
+        je      NEAR $L$ctr32_done
+
+        movups  xmm12,XMMWORD[96+rdi]
+        xorps   xmm8,xmm12
+        movups  XMMWORD[96+rsi],xmm8
+        jmp     NEAR $L$ctr32_done
+
+ALIGN   32
+$L$ctr32_loop4:
+DB      102,15,56,220,209
+        lea     rcx,[16+rcx]
+        dec     eax
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[rcx]
+        jnz     NEAR $L$ctr32_loop4
+DB      102,15,56,221,209
+DB      102,15,56,221,217
+        movups  xmm10,XMMWORD[rdi]
+        movups  xmm11,XMMWORD[16+rdi]
+DB      102,15,56,221,225
+DB      102,15,56,221,233
+        movups  xmm12,XMMWORD[32+rdi]
+        movups  xmm13,XMMWORD[48+rdi]
+
+        xorps   xmm2,xmm10
+        movups  XMMWORD[rsi],xmm2
+        xorps   xmm3,xmm11
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[32+rsi],xmm4
+        pxor    xmm5,xmm13
+        movdqu  XMMWORD[48+rsi],xmm5
+        jmp     NEAR $L$ctr32_done
+
+ALIGN   32
+$L$ctr32_loop3:
+DB      102,15,56,220,209
+        lea     rcx,[16+rcx]
+        dec     eax
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+        movups  xmm1,XMMWORD[rcx]
+        jnz     NEAR $L$ctr32_loop3
+DB      102,15,56,221,209
+DB      102,15,56,221,217
+DB      102,15,56,221,225
+
+        movups  xmm10,XMMWORD[rdi]
+        xorps   xmm2,xmm10
+        movups  XMMWORD[rsi],xmm2
+        cmp     rdx,2
+        jb      NEAR $L$ctr32_done
+
+        movups  xmm11,XMMWORD[16+rdi]
+        xorps   xmm3,xmm11
+        movups  XMMWORD[16+rsi],xmm3
+        je      NEAR $L$ctr32_done
+
+        movups  xmm12,XMMWORD[32+rdi]
+        xorps   xmm4,xmm12
+        movups  XMMWORD[32+rsi],xmm4
+
+$L$ctr32_done:
+        xorps   xmm0,xmm0
+        xor     ebp,ebp
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        movaps  xmm6,XMMWORD[((-168))+r11]
+        movaps  XMMWORD[(-168)+r11],xmm0
+        movaps  xmm7,XMMWORD[((-152))+r11]
+        movaps  XMMWORD[(-152)+r11],xmm0
+        movaps  xmm8,XMMWORD[((-136))+r11]
+        movaps  XMMWORD[(-136)+r11],xmm0
+        movaps  xmm9,XMMWORD[((-120))+r11]
+        movaps  XMMWORD[(-120)+r11],xmm0
+        movaps  xmm10,XMMWORD[((-104))+r11]
+        movaps  XMMWORD[(-104)+r11],xmm0
+        movaps  xmm11,XMMWORD[((-88))+r11]
+        movaps  XMMWORD[(-88)+r11],xmm0
+        movaps  xmm12,XMMWORD[((-72))+r11]
+        movaps  XMMWORD[(-72)+r11],xmm0
+        movaps  xmm13,XMMWORD[((-56))+r11]
+        movaps  XMMWORD[(-56)+r11],xmm0
+        movaps  xmm14,XMMWORD[((-40))+r11]
+        movaps  XMMWORD[(-40)+r11],xmm0
+        movaps  xmm15,XMMWORD[((-24))+r11]
+        movaps  XMMWORD[(-24)+r11],xmm0
+        movaps  XMMWORD[rsp],xmm0
+        movaps  XMMWORD[16+rsp],xmm0
+        movaps  XMMWORD[32+rsp],xmm0
+        movaps  XMMWORD[48+rsp],xmm0
+        movaps  XMMWORD[64+rsp],xmm0
+        movaps  XMMWORD[80+rsp],xmm0
+        movaps  XMMWORD[96+rsp],xmm0
+        movaps  XMMWORD[112+rsp],xmm0
+        mov     rbp,QWORD[((-8))+r11]
+
+        lea     rsp,[r11]
+
+$L$ctr32_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_ctr32_encrypt_blocks:
+global  aesni_xts_encrypt
+
+ALIGN   16
+aesni_xts_encrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_xts_encrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        lea     r11,[rsp]
+
+        push    rbp
+
+        sub     rsp,272
+        and     rsp,-16
+        movaps  XMMWORD[(-168)+r11],xmm6
+        movaps  XMMWORD[(-152)+r11],xmm7
+        movaps  XMMWORD[(-136)+r11],xmm8
+        movaps  XMMWORD[(-120)+r11],xmm9
+        movaps  XMMWORD[(-104)+r11],xmm10
+        movaps  XMMWORD[(-88)+r11],xmm11
+        movaps  XMMWORD[(-72)+r11],xmm12
+        movaps  XMMWORD[(-56)+r11],xmm13
+        movaps  XMMWORD[(-40)+r11],xmm14
+        movaps  XMMWORD[(-24)+r11],xmm15
+$L$xts_enc_body:
+        movups  xmm2,XMMWORD[r9]
+        mov     eax,DWORD[240+r8]
+        mov     r10d,DWORD[240+rcx]
+        movups  xmm0,XMMWORD[r8]
+        movups  xmm1,XMMWORD[16+r8]
+        lea     r8,[32+r8]
+        xorps   xmm2,xmm0
+$L$oop_enc1_8:
+DB      102,15,56,220,209
+        dec     eax
+        movups  xmm1,XMMWORD[r8]
+        lea     r8,[16+r8]
+        jnz     NEAR $L$oop_enc1_8
+DB      102,15,56,221,209
+        movups  xmm0,XMMWORD[rcx]
+        mov     rbp,rcx
+        mov     eax,r10d
+        shl     r10d,4
+        mov     r9,rdx
+        and     rdx,-16
+
+        movups  xmm1,XMMWORD[16+r10*1+rcx]
+
+        movdqa  xmm8,XMMWORD[$L$xts_magic]
+        movdqa  xmm15,xmm2
+        pshufd  xmm9,xmm2,0x5f
+        pxor    xmm1,xmm0
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+        movdqa  xmm10,xmm15
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+        pxor    xmm10,xmm0
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+        movdqa  xmm11,xmm15
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+        pxor    xmm11,xmm0
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+        movdqa  xmm12,xmm15
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+        pxor    xmm12,xmm0
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+        movdqa  xmm13,xmm15
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+        pxor    xmm13,xmm0
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm15
+        psrad   xmm9,31
+        paddq   xmm15,xmm15
+        pand    xmm9,xmm8
+        pxor    xmm14,xmm0
+        pxor    xmm15,xmm9
+        movaps  XMMWORD[96+rsp],xmm1
+
+        sub     rdx,16*6
+        jc      NEAR $L$xts_enc_short
+
+        mov     eax,16+96
+        lea     rcx,[32+r10*1+rbp]
+        sub     rax,r10
+        movups  xmm1,XMMWORD[16+rbp]
+        mov     r10,rax
+        lea     r8,[$L$xts_magic]
+        jmp     NEAR $L$xts_enc_grandloop
+
+ALIGN   32
+$L$xts_enc_grandloop:
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqa  xmm8,xmm0
+        movdqu  xmm3,XMMWORD[16+rdi]
+        pxor    xmm2,xmm10
+        movdqu  xmm4,XMMWORD[32+rdi]
+        pxor    xmm3,xmm11
+DB      102,15,56,220,209
+        movdqu  xmm5,XMMWORD[48+rdi]
+        pxor    xmm4,xmm12
+DB      102,15,56,220,217
+        movdqu  xmm6,XMMWORD[64+rdi]
+        pxor    xmm5,xmm13
+DB      102,15,56,220,225
+        movdqu  xmm7,XMMWORD[80+rdi]
+        pxor    xmm8,xmm15
+        movdqa  xmm9,XMMWORD[96+rsp]
+        pxor    xmm6,xmm14
+DB      102,15,56,220,233
+        movups  xmm0,XMMWORD[32+rbp]
+        lea     rdi,[96+rdi]
+        pxor    xmm7,xmm8
+
+        pxor    xmm10,xmm9
+DB      102,15,56,220,241
+        pxor    xmm11,xmm9
+        movdqa  XMMWORD[rsp],xmm10
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[48+rbp]
+        pxor    xmm12,xmm9
+
+DB      102,15,56,220,208
+        pxor    xmm13,xmm9
+        movdqa  XMMWORD[16+rsp],xmm11
+DB      102,15,56,220,216
+        pxor    xmm14,xmm9
+        movdqa  XMMWORD[32+rsp],xmm12
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        pxor    xmm8,xmm9
+        movdqa  XMMWORD[64+rsp],xmm14
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+        movups  xmm0,XMMWORD[64+rbp]
+        movdqa  XMMWORD[80+rsp],xmm8
+        pshufd  xmm9,xmm15,0x5f
+        jmp     NEAR $L$xts_enc_loop6
+ALIGN   32
+$L$xts_enc_loop6:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[((-64))+rax*1+rcx]
+        add     rax,32
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+        movups  xmm0,XMMWORD[((-80))+rax*1+rcx]
+        jnz     NEAR $L$xts_enc_loop6
+
+        movdqa  xmm8,XMMWORD[r8]
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+DB      102,15,56,220,209
+        paddq   xmm15,xmm15
+        psrad   xmm14,31
+DB      102,15,56,220,217
+        pand    xmm14,xmm8
+        movups  xmm10,XMMWORD[rbp]
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+        pxor    xmm15,xmm14
+        movaps  xmm11,xmm10
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[((-64))+rcx]
+
+        movdqa  xmm14,xmm9
+DB      102,15,56,220,208
+        paddd   xmm9,xmm9
+        pxor    xmm10,xmm15
+DB      102,15,56,220,216
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        pand    xmm14,xmm8
+        movaps  xmm12,xmm11
+DB      102,15,56,220,240
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm9
+DB      102,15,56,220,248
+        movups  xmm0,XMMWORD[((-48))+rcx]
+
+        paddd   xmm9,xmm9
+DB      102,15,56,220,209
+        pxor    xmm11,xmm15
+        psrad   xmm14,31
+DB      102,15,56,220,217
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movdqa  XMMWORD[48+rsp],xmm13
+        pxor    xmm15,xmm14
+DB      102,15,56,220,241
+        movaps  xmm13,xmm12
+        movdqa  xmm14,xmm9
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[((-32))+rcx]
+
+        paddd   xmm9,xmm9
+DB      102,15,56,220,208
+        pxor    xmm12,xmm15
+        psrad   xmm14,31
+DB      102,15,56,220,216
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+DB      102,15,56,220,240
+        pxor    xmm15,xmm14
+        movaps  xmm14,xmm13
+DB      102,15,56,220,248
+
+        movdqa  xmm0,xmm9
+        paddd   xmm9,xmm9
+DB      102,15,56,220,209
+        pxor    xmm13,xmm15
+        psrad   xmm0,31
+DB      102,15,56,220,217
+        paddq   xmm15,xmm15
+        pand    xmm0,xmm8
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        pxor    xmm15,xmm0
+        movups  xmm0,XMMWORD[rbp]
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[16+rbp]
+
+        pxor    xmm14,xmm15
+DB      102,15,56,221,84,36,0
+        psrad   xmm9,31
+        paddq   xmm15,xmm15
+DB      102,15,56,221,92,36,16
+DB      102,15,56,221,100,36,32
+        pand    xmm9,xmm8
+        mov     rax,r10
+DB      102,15,56,221,108,36,48
+DB      102,15,56,221,116,36,64
+DB      102,15,56,221,124,36,80
+        pxor    xmm15,xmm9
+
+        lea     rsi,[96+rsi]
+        movups  XMMWORD[(-96)+rsi],xmm2
+        movups  XMMWORD[(-80)+rsi],xmm3
+        movups  XMMWORD[(-64)+rsi],xmm4
+        movups  XMMWORD[(-48)+rsi],xmm5
+        movups  XMMWORD[(-32)+rsi],xmm6
+        movups  XMMWORD[(-16)+rsi],xmm7
+        sub     rdx,16*6
+        jnc     NEAR $L$xts_enc_grandloop
+
+        mov     eax,16+96
+        sub     eax,r10d
+        mov     rcx,rbp
+        shr     eax,4
+
+$L$xts_enc_short:
+
+        mov     r10d,eax
+        pxor    xmm10,xmm0
+        add     rdx,16*6
+        jz      NEAR $L$xts_enc_done
+
+        pxor    xmm11,xmm0
+        cmp     rdx,0x20
+        jb      NEAR $L$xts_enc_one
+        pxor    xmm12,xmm0
+        je      NEAR $L$xts_enc_two
+
+        pxor    xmm13,xmm0
+        cmp     rdx,0x40
+        jb      NEAR $L$xts_enc_three
+        pxor    xmm14,xmm0
+        je      NEAR $L$xts_enc_four
+
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movdqu  xmm4,XMMWORD[32+rdi]
+        pxor    xmm2,xmm10
+        movdqu  xmm5,XMMWORD[48+rdi]
+        pxor    xmm3,xmm11
+        movdqu  xmm6,XMMWORD[64+rdi]
+        lea     rdi,[80+rdi]
+        pxor    xmm4,xmm12
+        pxor    xmm5,xmm13
+        pxor    xmm6,xmm14
+        pxor    xmm7,xmm7
+
+        call    _aesni_encrypt6
+
+        xorps   xmm2,xmm10
+        movdqa  xmm10,xmm15
+        xorps   xmm3,xmm11
+        xorps   xmm4,xmm12
+        movdqu  XMMWORD[rsi],xmm2
+        xorps   xmm5,xmm13
+        movdqu  XMMWORD[16+rsi],xmm3
+        xorps   xmm6,xmm14
+        movdqu  XMMWORD[32+rsi],xmm4
+        movdqu  XMMWORD[48+rsi],xmm5
+        movdqu  XMMWORD[64+rsi],xmm6
+        lea     rsi,[80+rsi]
+        jmp     NEAR $L$xts_enc_done
+
+ALIGN   16
+$L$xts_enc_one:
+        movups  xmm2,XMMWORD[rdi]
+        lea     rdi,[16+rdi]
+        xorps   xmm2,xmm10
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_enc1_9:
+DB      102,15,56,220,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_enc1_9
+DB      102,15,56,221,209
+        xorps   xmm2,xmm10
+        movdqa  xmm10,xmm11
+        movups  XMMWORD[rsi],xmm2
+        lea     rsi,[16+rsi]
+        jmp     NEAR $L$xts_enc_done
+
+ALIGN   16
+$L$xts_enc_two:
+        movups  xmm2,XMMWORD[rdi]
+        movups  xmm3,XMMWORD[16+rdi]
+        lea     rdi,[32+rdi]
+        xorps   xmm2,xmm10
+        xorps   xmm3,xmm11
+
+        call    _aesni_encrypt2
+
+        xorps   xmm2,xmm10
+        movdqa  xmm10,xmm12
+        xorps   xmm3,xmm11
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        lea     rsi,[32+rsi]
+        jmp     NEAR $L$xts_enc_done
+
+ALIGN   16
+$L$xts_enc_three:
+        movups  xmm2,XMMWORD[rdi]
+        movups  xmm3,XMMWORD[16+rdi]
+        movups  xmm4,XMMWORD[32+rdi]
+        lea     rdi,[48+rdi]
+        xorps   xmm2,xmm10
+        xorps   xmm3,xmm11
+        xorps   xmm4,xmm12
+
+        call    _aesni_encrypt3
+
+        xorps   xmm2,xmm10
+        movdqa  xmm10,xmm13
+        xorps   xmm3,xmm11
+        xorps   xmm4,xmm12
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        lea     rsi,[48+rsi]
+        jmp     NEAR $L$xts_enc_done
+
+ALIGN   16
+$L$xts_enc_four:
+        movups  xmm2,XMMWORD[rdi]
+        movups  xmm3,XMMWORD[16+rdi]
+        movups  xmm4,XMMWORD[32+rdi]
+        xorps   xmm2,xmm10
+        movups  xmm5,XMMWORD[48+rdi]
+        lea     rdi,[64+rdi]
+        xorps   xmm3,xmm11
+        xorps   xmm4,xmm12
+        xorps   xmm5,xmm13
+
+        call    _aesni_encrypt4
+
+        pxor    xmm2,xmm10
+        movdqa  xmm10,xmm14
+        pxor    xmm3,xmm11
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[rsi],xmm2
+        pxor    xmm5,xmm13
+        movdqu  XMMWORD[16+rsi],xmm3
+        movdqu  XMMWORD[32+rsi],xmm4
+        movdqu  XMMWORD[48+rsi],xmm5
+        lea     rsi,[64+rsi]
+        jmp     NEAR $L$xts_enc_done
+
+ALIGN   16
+$L$xts_enc_done:
+        and     r9,15
+        jz      NEAR $L$xts_enc_ret
+        mov     rdx,r9
+
+$L$xts_enc_steal:
+        movzx   eax,BYTE[rdi]
+        movzx   ecx,BYTE[((-16))+rsi]
+        lea     rdi,[1+rdi]
+        mov     BYTE[((-16))+rsi],al
+        mov     BYTE[rsi],cl
+        lea     rsi,[1+rsi]
+        sub     rdx,1
+        jnz     NEAR $L$xts_enc_steal
+
+        sub     rsi,r9
+        mov     rcx,rbp
+        mov     eax,r10d
+
+        movups  xmm2,XMMWORD[((-16))+rsi]
+        xorps   xmm2,xmm10
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_enc1_10:
+DB      102,15,56,220,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_enc1_10
+DB      102,15,56,221,209
+        xorps   xmm2,xmm10
+        movups  XMMWORD[(-16)+rsi],xmm2
+
+$L$xts_enc_ret:
+        xorps   xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        movaps  xmm6,XMMWORD[((-168))+r11]
+        movaps  XMMWORD[(-168)+r11],xmm0
+        movaps  xmm7,XMMWORD[((-152))+r11]
+        movaps  XMMWORD[(-152)+r11],xmm0
+        movaps  xmm8,XMMWORD[((-136))+r11]
+        movaps  XMMWORD[(-136)+r11],xmm0
+        movaps  xmm9,XMMWORD[((-120))+r11]
+        movaps  XMMWORD[(-120)+r11],xmm0
+        movaps  xmm10,XMMWORD[((-104))+r11]
+        movaps  XMMWORD[(-104)+r11],xmm0
+        movaps  xmm11,XMMWORD[((-88))+r11]
+        movaps  XMMWORD[(-88)+r11],xmm0
+        movaps  xmm12,XMMWORD[((-72))+r11]
+        movaps  XMMWORD[(-72)+r11],xmm0
+        movaps  xmm13,XMMWORD[((-56))+r11]
+        movaps  XMMWORD[(-56)+r11],xmm0
+        movaps  xmm14,XMMWORD[((-40))+r11]
+        movaps  XMMWORD[(-40)+r11],xmm0
+        movaps  xmm15,XMMWORD[((-24))+r11]
+        movaps  XMMWORD[(-24)+r11],xmm0
+        movaps  XMMWORD[rsp],xmm0
+        movaps  XMMWORD[16+rsp],xmm0
+        movaps  XMMWORD[32+rsp],xmm0
+        movaps  XMMWORD[48+rsp],xmm0
+        movaps  XMMWORD[64+rsp],xmm0
+        movaps  XMMWORD[80+rsp],xmm0
+        movaps  XMMWORD[96+rsp],xmm0
+        mov     rbp,QWORD[((-8))+r11]
+
+        lea     rsp,[r11]
+
+$L$xts_enc_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_xts_encrypt:
+global  aesni_xts_decrypt
+
+ALIGN   16
+aesni_xts_decrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_xts_decrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        lea     r11,[rsp]
+
+        push    rbp
+
+        sub     rsp,272
+        and     rsp,-16
+        movaps  XMMWORD[(-168)+r11],xmm6
+        movaps  XMMWORD[(-152)+r11],xmm7
+        movaps  XMMWORD[(-136)+r11],xmm8
+        movaps  XMMWORD[(-120)+r11],xmm9
+        movaps  XMMWORD[(-104)+r11],xmm10
+        movaps  XMMWORD[(-88)+r11],xmm11
+        movaps  XMMWORD[(-72)+r11],xmm12
+        movaps  XMMWORD[(-56)+r11],xmm13
+        movaps  XMMWORD[(-40)+r11],xmm14
+        movaps  XMMWORD[(-24)+r11],xmm15
+$L$xts_dec_body:
+        movups  xmm2,XMMWORD[r9]
+        mov     eax,DWORD[240+r8]
+        mov     r10d,DWORD[240+rcx]
+        movups  xmm0,XMMWORD[r8]
+        movups  xmm1,XMMWORD[16+r8]
+        lea     r8,[32+r8]
+        xorps   xmm2,xmm0
+$L$oop_enc1_11:
+DB      102,15,56,220,209
+        dec     eax
+        movups  xmm1,XMMWORD[r8]
+        lea     r8,[16+r8]
+        jnz     NEAR $L$oop_enc1_11
+DB      102,15,56,221,209
+        xor     eax,eax
+        test    rdx,15
+        setnz   al
+        shl     rax,4
+        sub     rdx,rax
+
+        movups  xmm0,XMMWORD[rcx]
+        mov     rbp,rcx
+        mov     eax,r10d
+        shl     r10d,4
+        mov     r9,rdx
+        and     rdx,-16
+
+        movups  xmm1,XMMWORD[16+r10*1+rcx]
+
+        movdqa  xmm8,XMMWORD[$L$xts_magic]
+        movdqa  xmm15,xmm2
+        pshufd  xmm9,xmm2,0x5f
+        pxor    xmm1,xmm0
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+        movdqa  xmm10,xmm15
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+        pxor    xmm10,xmm0
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+        movdqa  xmm11,xmm15
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+        pxor    xmm11,xmm0
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+        movdqa  xmm12,xmm15
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+        pxor    xmm12,xmm0
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+        movdqa  xmm13,xmm15
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+        pxor    xmm13,xmm0
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm15
+        psrad   xmm9,31
+        paddq   xmm15,xmm15
+        pand    xmm9,xmm8
+        pxor    xmm14,xmm0
+        pxor    xmm15,xmm9
+        movaps  XMMWORD[96+rsp],xmm1
+
+        sub     rdx,16*6
+        jc      NEAR $L$xts_dec_short
+
+        mov     eax,16+96
+        lea     rcx,[32+r10*1+rbp]
+        sub     rax,r10
+        movups  xmm1,XMMWORD[16+rbp]
+        mov     r10,rax
+        lea     r8,[$L$xts_magic]
+        jmp     NEAR $L$xts_dec_grandloop
+
+ALIGN   32
+$L$xts_dec_grandloop:
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqa  xmm8,xmm0
+        movdqu  xmm3,XMMWORD[16+rdi]
+        pxor    xmm2,xmm10
+        movdqu  xmm4,XMMWORD[32+rdi]
+        pxor    xmm3,xmm11
+DB      102,15,56,222,209
+        movdqu  xmm5,XMMWORD[48+rdi]
+        pxor    xmm4,xmm12
+DB      102,15,56,222,217
+        movdqu  xmm6,XMMWORD[64+rdi]
+        pxor    xmm5,xmm13
+DB      102,15,56,222,225
+        movdqu  xmm7,XMMWORD[80+rdi]
+        pxor    xmm8,xmm15
+        movdqa  xmm9,XMMWORD[96+rsp]
+        pxor    xmm6,xmm14
+DB      102,15,56,222,233
+        movups  xmm0,XMMWORD[32+rbp]
+        lea     rdi,[96+rdi]
+        pxor    xmm7,xmm8
+
+        pxor    xmm10,xmm9
+DB      102,15,56,222,241
+        pxor    xmm11,xmm9
+        movdqa  XMMWORD[rsp],xmm10
+DB      102,15,56,222,249
+        movups  xmm1,XMMWORD[48+rbp]
+        pxor    xmm12,xmm9
+
+DB      102,15,56,222,208
+        pxor    xmm13,xmm9
+        movdqa  XMMWORD[16+rsp],xmm11
+DB      102,15,56,222,216
+        pxor    xmm14,xmm9
+        movdqa  XMMWORD[32+rsp],xmm12
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+        pxor    xmm8,xmm9
+        movdqa  XMMWORD[64+rsp],xmm14
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+        movups  xmm0,XMMWORD[64+rbp]
+        movdqa  XMMWORD[80+rsp],xmm8
+        pshufd  xmm9,xmm15,0x5f
+        jmp     NEAR $L$xts_dec_loop6
+ALIGN   32
+$L$xts_dec_loop6:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+        movups  xmm1,XMMWORD[((-64))+rax*1+rcx]
+        add     rax,32
+
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+        movups  xmm0,XMMWORD[((-80))+rax*1+rcx]
+        jnz     NEAR $L$xts_dec_loop6
+
+        movdqa  xmm8,XMMWORD[r8]
+        movdqa  xmm14,xmm9
+        paddd   xmm9,xmm9
+DB      102,15,56,222,209
+        paddq   xmm15,xmm15
+        psrad   xmm14,31
+DB      102,15,56,222,217
+        pand    xmm14,xmm8
+        movups  xmm10,XMMWORD[rbp]
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+        pxor    xmm15,xmm14
+        movaps  xmm11,xmm10
+DB      102,15,56,222,249
+        movups  xmm1,XMMWORD[((-64))+rcx]
+
+        movdqa  xmm14,xmm9
+DB      102,15,56,222,208
+        paddd   xmm9,xmm9
+        pxor    xmm10,xmm15
+DB      102,15,56,222,216
+        psrad   xmm14,31
+        paddq   xmm15,xmm15
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+        pand    xmm14,xmm8
+        movaps  xmm12,xmm11
+DB      102,15,56,222,240
+        pxor    xmm15,xmm14
+        movdqa  xmm14,xmm9
+DB      102,15,56,222,248
+        movups  xmm0,XMMWORD[((-48))+rcx]
+
+        paddd   xmm9,xmm9
+DB      102,15,56,222,209
+        pxor    xmm11,xmm15
+        psrad   xmm14,31
+DB      102,15,56,222,217
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        movdqa  XMMWORD[48+rsp],xmm13
+        pxor    xmm15,xmm14
+DB      102,15,56,222,241
+        movaps  xmm13,xmm12
+        movdqa  xmm14,xmm9
+DB      102,15,56,222,249
+        movups  xmm1,XMMWORD[((-32))+rcx]
+
+        paddd   xmm9,xmm9
+DB      102,15,56,222,208
+        pxor    xmm12,xmm15
+        psrad   xmm14,31
+DB      102,15,56,222,216
+        paddq   xmm15,xmm15
+        pand    xmm14,xmm8
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+        pxor    xmm15,xmm14
+        movaps  xmm14,xmm13
+DB      102,15,56,222,248
+
+        movdqa  xmm0,xmm9
+        paddd   xmm9,xmm9
+DB      102,15,56,222,209
+        pxor    xmm13,xmm15
+        psrad   xmm0,31
+DB      102,15,56,222,217
+        paddq   xmm15,xmm15
+        pand    xmm0,xmm8
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        pxor    xmm15,xmm0
+        movups  xmm0,XMMWORD[rbp]
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+        movups  xmm1,XMMWORD[16+rbp]
+
+        pxor    xmm14,xmm15
+DB      102,15,56,223,84,36,0
+        psrad   xmm9,31
+        paddq   xmm15,xmm15
+DB      102,15,56,223,92,36,16
+DB      102,15,56,223,100,36,32
+        pand    xmm9,xmm8
+        mov     rax,r10
+DB      102,15,56,223,108,36,48
+DB      102,15,56,223,116,36,64
+DB      102,15,56,223,124,36,80
+        pxor    xmm15,xmm9
+
+        lea     rsi,[96+rsi]
+        movups  XMMWORD[(-96)+rsi],xmm2
+        movups  XMMWORD[(-80)+rsi],xmm3
+        movups  XMMWORD[(-64)+rsi],xmm4
+        movups  XMMWORD[(-48)+rsi],xmm5
+        movups  XMMWORD[(-32)+rsi],xmm6
+        movups  XMMWORD[(-16)+rsi],xmm7
+        sub     rdx,16*6
+        jnc     NEAR $L$xts_dec_grandloop
+
+        mov     eax,16+96
+        sub     eax,r10d
+        mov     rcx,rbp
+        shr     eax,4
+
+$L$xts_dec_short:
+
+        mov     r10d,eax
+        pxor    xmm10,xmm0
+        pxor    xmm11,xmm0
+        add     rdx,16*6
+        jz      NEAR $L$xts_dec_done
+
+        pxor    xmm12,xmm0
+        cmp     rdx,0x20
+        jb      NEAR $L$xts_dec_one
+        pxor    xmm13,xmm0
+        je      NEAR $L$xts_dec_two
+
+        pxor    xmm14,xmm0
+        cmp     rdx,0x40
+        jb      NEAR $L$xts_dec_three
+        je      NEAR $L$xts_dec_four
+
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movdqu  xmm4,XMMWORD[32+rdi]
+        pxor    xmm2,xmm10
+        movdqu  xmm5,XMMWORD[48+rdi]
+        pxor    xmm3,xmm11
+        movdqu  xmm6,XMMWORD[64+rdi]
+        lea     rdi,[80+rdi]
+        pxor    xmm4,xmm12
+        pxor    xmm5,xmm13
+        pxor    xmm6,xmm14
+
+        call    _aesni_decrypt6
+
+        xorps   xmm2,xmm10
+        xorps   xmm3,xmm11
+        xorps   xmm4,xmm12
+        movdqu  XMMWORD[rsi],xmm2
+        xorps   xmm5,xmm13
+        movdqu  XMMWORD[16+rsi],xmm3
+        xorps   xmm6,xmm14
+        movdqu  XMMWORD[32+rsi],xmm4
+        pxor    xmm14,xmm14
+        movdqu  XMMWORD[48+rsi],xmm5
+        pcmpgtd xmm14,xmm15
+        movdqu  XMMWORD[64+rsi],xmm6
+        lea     rsi,[80+rsi]
+        pshufd  xmm11,xmm14,0x13
+        and     r9,15
+        jz      NEAR $L$xts_dec_ret
+
+        movdqa  xmm10,xmm15
+        paddq   xmm15,xmm15
+        pand    xmm11,xmm8
+        pxor    xmm11,xmm15
+        jmp     NEAR $L$xts_dec_done2
+
+ALIGN   16
+$L$xts_dec_one:
+        movups  xmm2,XMMWORD[rdi]
+        lea     rdi,[16+rdi]
+        xorps   xmm2,xmm10
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_dec1_12:
+DB      102,15,56,222,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_dec1_12
+DB      102,15,56,223,209
+        xorps   xmm2,xmm10
+        movdqa  xmm10,xmm11
+        movups  XMMWORD[rsi],xmm2
+        movdqa  xmm11,xmm12
+        lea     rsi,[16+rsi]
+        jmp     NEAR $L$xts_dec_done
+
+ALIGN   16
+$L$xts_dec_two:
+        movups  xmm2,XMMWORD[rdi]
+        movups  xmm3,XMMWORD[16+rdi]
+        lea     rdi,[32+rdi]
+        xorps   xmm2,xmm10
+        xorps   xmm3,xmm11
+
+        call    _aesni_decrypt2
+
+        xorps   xmm2,xmm10
+        movdqa  xmm10,xmm12
+        xorps   xmm3,xmm11
+        movdqa  xmm11,xmm13
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        lea     rsi,[32+rsi]
+        jmp     NEAR $L$xts_dec_done
+
+ALIGN   16
+$L$xts_dec_three:
+        movups  xmm2,XMMWORD[rdi]
+        movups  xmm3,XMMWORD[16+rdi]
+        movups  xmm4,XMMWORD[32+rdi]
+        lea     rdi,[48+rdi]
+        xorps   xmm2,xmm10
+        xorps   xmm3,xmm11
+        xorps   xmm4,xmm12
+
+        call    _aesni_decrypt3
+
+        xorps   xmm2,xmm10
+        movdqa  xmm10,xmm13
+        xorps   xmm3,xmm11
+        movdqa  xmm11,xmm14
+        xorps   xmm4,xmm12
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        lea     rsi,[48+rsi]
+        jmp     NEAR $L$xts_dec_done
+
+ALIGN   16
+$L$xts_dec_four:
+        movups  xmm2,XMMWORD[rdi]
+        movups  xmm3,XMMWORD[16+rdi]
+        movups  xmm4,XMMWORD[32+rdi]
+        xorps   xmm2,xmm10
+        movups  xmm5,XMMWORD[48+rdi]
+        lea     rdi,[64+rdi]
+        xorps   xmm3,xmm11
+        xorps   xmm4,xmm12
+        xorps   xmm5,xmm13
+
+        call    _aesni_decrypt4
+
+        pxor    xmm2,xmm10
+        movdqa  xmm10,xmm14
+        pxor    xmm3,xmm11
+        movdqa  xmm11,xmm15
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[rsi],xmm2
+        pxor    xmm5,xmm13
+        movdqu  XMMWORD[16+rsi],xmm3
+        movdqu  XMMWORD[32+rsi],xmm4
+        movdqu  XMMWORD[48+rsi],xmm5
+        lea     rsi,[64+rsi]
+        jmp     NEAR $L$xts_dec_done
+
+ALIGN   16
+$L$xts_dec_done:
+        and     r9,15
+        jz      NEAR $L$xts_dec_ret
+$L$xts_dec_done2:
+        mov     rdx,r9
+        mov     rcx,rbp
+        mov     eax,r10d
+
+        movups  xmm2,XMMWORD[rdi]
+        xorps   xmm2,xmm11
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_dec1_13:
+DB      102,15,56,222,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_dec1_13
+DB      102,15,56,223,209
+        xorps   xmm2,xmm11
+        movups  XMMWORD[rsi],xmm2
+
+$L$xts_dec_steal:
+        movzx   eax,BYTE[16+rdi]
+        movzx   ecx,BYTE[rsi]
+        lea     rdi,[1+rdi]
+        mov     BYTE[rsi],al
+        mov     BYTE[16+rsi],cl
+        lea     rsi,[1+rsi]
+        sub     rdx,1
+        jnz     NEAR $L$xts_dec_steal
+
+        sub     rsi,r9
+        mov     rcx,rbp
+        mov     eax,r10d
+
+        movups  xmm2,XMMWORD[rsi]
+        xorps   xmm2,xmm10
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_dec1_14:
+DB      102,15,56,222,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_dec1_14
+DB      102,15,56,223,209
+        xorps   xmm2,xmm10
+        movups  XMMWORD[rsi],xmm2
+
+$L$xts_dec_ret:
+        xorps   xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        movaps  xmm6,XMMWORD[((-168))+r11]
+        movaps  XMMWORD[(-168)+r11],xmm0
+        movaps  xmm7,XMMWORD[((-152))+r11]
+        movaps  XMMWORD[(-152)+r11],xmm0
+        movaps  xmm8,XMMWORD[((-136))+r11]
+        movaps  XMMWORD[(-136)+r11],xmm0
+        movaps  xmm9,XMMWORD[((-120))+r11]
+        movaps  XMMWORD[(-120)+r11],xmm0
+        movaps  xmm10,XMMWORD[((-104))+r11]
+        movaps  XMMWORD[(-104)+r11],xmm0
+        movaps  xmm11,XMMWORD[((-88))+r11]
+        movaps  XMMWORD[(-88)+r11],xmm0
+        movaps  xmm12,XMMWORD[((-72))+r11]
+        movaps  XMMWORD[(-72)+r11],xmm0
+        movaps  xmm13,XMMWORD[((-56))+r11]
+        movaps  XMMWORD[(-56)+r11],xmm0
+        movaps  xmm14,XMMWORD[((-40))+r11]
+        movaps  XMMWORD[(-40)+r11],xmm0
+        movaps  xmm15,XMMWORD[((-24))+r11]
+        movaps  XMMWORD[(-24)+r11],xmm0
+        movaps  XMMWORD[rsp],xmm0
+        movaps  XMMWORD[16+rsp],xmm0
+        movaps  XMMWORD[32+rsp],xmm0
+        movaps  XMMWORD[48+rsp],xmm0
+        movaps  XMMWORD[64+rsp],xmm0
+        movaps  XMMWORD[80+rsp],xmm0
+        movaps  XMMWORD[96+rsp],xmm0
+        mov     rbp,QWORD[((-8))+r11]
+
+        lea     rsp,[r11]
+
+$L$xts_dec_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_xts_decrypt:
+global  aesni_ocb_encrypt
+
+ALIGN   32
+aesni_ocb_encrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_ocb_encrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        lea     rax,[rsp]
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        lea     rsp,[((-160))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[64+rsp],xmm10
+        movaps  XMMWORD[80+rsp],xmm11
+        movaps  XMMWORD[96+rsp],xmm12
+        movaps  XMMWORD[112+rsp],xmm13
+        movaps  XMMWORD[128+rsp],xmm14
+        movaps  XMMWORD[144+rsp],xmm15
+$L$ocb_enc_body:
+        mov     rbx,QWORD[56+rax]
+        mov     rbp,QWORD[((56+8))+rax]
+
+        mov     r10d,DWORD[240+rcx]
+        mov     r11,rcx
+        shl     r10d,4
+        movups  xmm9,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+r10*1+rcx]
+
+        movdqu  xmm15,XMMWORD[r9]
+        pxor    xmm9,xmm1
+        pxor    xmm15,xmm1
+
+        mov     eax,16+32
+        lea     rcx,[32+r10*1+r11]
+        movups  xmm1,XMMWORD[16+r11]
+        sub     rax,r10
+        mov     r10,rax
+
+        movdqu  xmm10,XMMWORD[rbx]
+        movdqu  xmm8,XMMWORD[rbp]
+
+        test    r8,1
+        jnz     NEAR $L$ocb_enc_odd
+
+        bsf     r12,r8
+        add     r8,1
+        shl     r12,4
+        movdqu  xmm7,XMMWORD[r12*1+rbx]
+        movdqu  xmm2,XMMWORD[rdi]
+        lea     rdi,[16+rdi]
+
+        call    __ocb_encrypt1
+
+        movdqa  xmm15,xmm7
+        movups  XMMWORD[rsi],xmm2
+        lea     rsi,[16+rsi]
+        sub     rdx,1
+        jz      NEAR $L$ocb_enc_done
+
+$L$ocb_enc_odd:
+        lea     r12,[1+r8]
+        lea     r13,[3+r8]
+        lea     r14,[5+r8]
+        lea     r8,[6+r8]
+        bsf     r12,r12
+        bsf     r13,r13
+        bsf     r14,r14
+        shl     r12,4
+        shl     r13,4
+        shl     r14,4
+
+        sub     rdx,6
+        jc      NEAR $L$ocb_enc_short
+        jmp     NEAR $L$ocb_enc_grandloop
+
+ALIGN   32
+$L$ocb_enc_grandloop:
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movdqu  xmm4,XMMWORD[32+rdi]
+        movdqu  xmm5,XMMWORD[48+rdi]
+        movdqu  xmm6,XMMWORD[64+rdi]
+        movdqu  xmm7,XMMWORD[80+rdi]
+        lea     rdi,[96+rdi]
+
+        call    __ocb_encrypt6
+
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        movups  XMMWORD[80+rsi],xmm7
+        lea     rsi,[96+rsi]
+        sub     rdx,6
+        jnc     NEAR $L$ocb_enc_grandloop
+
+$L$ocb_enc_short:
+        add     rdx,6
+        jz      NEAR $L$ocb_enc_done
+
+        movdqu  xmm2,XMMWORD[rdi]
+        cmp     rdx,2
+        jb      NEAR $L$ocb_enc_one
+        movdqu  xmm3,XMMWORD[16+rdi]
+        je      NEAR $L$ocb_enc_two
+
+        movdqu  xmm4,XMMWORD[32+rdi]
+        cmp     rdx,4
+        jb      NEAR $L$ocb_enc_three
+        movdqu  xmm5,XMMWORD[48+rdi]
+        je      NEAR $L$ocb_enc_four
+
+        movdqu  xmm6,XMMWORD[64+rdi]
+        pxor    xmm7,xmm7
+
+        call    __ocb_encrypt6
+
+        movdqa  xmm15,xmm14
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        movups  XMMWORD[64+rsi],xmm6
+
+        jmp     NEAR $L$ocb_enc_done
+
+ALIGN   16
+$L$ocb_enc_one:
+        movdqa  xmm7,xmm10
+
+        call    __ocb_encrypt1
+
+        movdqa  xmm15,xmm7
+        movups  XMMWORD[rsi],xmm2
+        jmp     NEAR $L$ocb_enc_done
+
+ALIGN   16
+$L$ocb_enc_two:
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+
+        call    __ocb_encrypt4
+
+        movdqa  xmm15,xmm11
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+
+        jmp     NEAR $L$ocb_enc_done
+
+ALIGN   16
+$L$ocb_enc_three:
+        pxor    xmm5,xmm5
+
+        call    __ocb_encrypt4
+
+        movdqa  xmm15,xmm12
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+
+        jmp     NEAR $L$ocb_enc_done
+
+ALIGN   16
+$L$ocb_enc_four:
+        call    __ocb_encrypt4
+
+        movdqa  xmm15,xmm13
+        movups  XMMWORD[rsi],xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        movups  XMMWORD[48+rsi],xmm5
+
+$L$ocb_enc_done:
+        pxor    xmm15,xmm0
+        movdqu  XMMWORD[rbp],xmm8
+        movdqu  XMMWORD[r9],xmm15
+
+        xorps   xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        movaps  xmm6,XMMWORD[rsp]
+        movaps  XMMWORD[rsp],xmm0
+        movaps  xmm7,XMMWORD[16+rsp]
+        movaps  XMMWORD[16+rsp],xmm0
+        movaps  xmm8,XMMWORD[32+rsp]
+        movaps  XMMWORD[32+rsp],xmm0
+        movaps  xmm9,XMMWORD[48+rsp]
+        movaps  XMMWORD[48+rsp],xmm0
+        movaps  xmm10,XMMWORD[64+rsp]
+        movaps  XMMWORD[64+rsp],xmm0
+        movaps  xmm11,XMMWORD[80+rsp]
+        movaps  XMMWORD[80+rsp],xmm0
+        movaps  xmm12,XMMWORD[96+rsp]
+        movaps  XMMWORD[96+rsp],xmm0
+        movaps  xmm13,XMMWORD[112+rsp]
+        movaps  XMMWORD[112+rsp],xmm0
+        movaps  xmm14,XMMWORD[128+rsp]
+        movaps  XMMWORD[128+rsp],xmm0
+        movaps  xmm15,XMMWORD[144+rsp]
+        movaps  XMMWORD[144+rsp],xmm0
+        lea     rax,[((160+40))+rsp]
+$L$ocb_enc_pop:
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$ocb_enc_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_ocb_encrypt:
+
+
+ALIGN   32
+__ocb_encrypt6:
+        pxor    xmm15,xmm9
+        movdqu  xmm11,XMMWORD[r12*1+rbx]
+        movdqa  xmm12,xmm10
+        movdqu  xmm13,XMMWORD[r13*1+rbx]
+        movdqa  xmm14,xmm10
+        pxor    xmm10,xmm15
+        movdqu  xmm15,XMMWORD[r14*1+rbx]
+        pxor    xmm11,xmm10
+        pxor    xmm8,xmm2
+        pxor    xmm2,xmm10
+        pxor    xmm12,xmm11
+        pxor    xmm8,xmm3
+        pxor    xmm3,xmm11
+        pxor    xmm13,xmm12
+        pxor    xmm8,xmm4
+        pxor    xmm4,xmm12
+        pxor    xmm14,xmm13
+        pxor    xmm8,xmm5
+        pxor    xmm5,xmm13
+        pxor    xmm15,xmm14
+        pxor    xmm8,xmm6
+        pxor    xmm6,xmm14
+        pxor    xmm8,xmm7
+        pxor    xmm7,xmm15
+        movups  xmm0,XMMWORD[32+r11]
+
+        lea     r12,[1+r8]
+        lea     r13,[3+r8]
+        lea     r14,[5+r8]
+        add     r8,6
+        pxor    xmm10,xmm9
+        bsf     r12,r12
+        bsf     r13,r13
+        bsf     r14,r14
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        pxor    xmm11,xmm9
+        pxor    xmm12,xmm9
+DB      102,15,56,220,241
+        pxor    xmm13,xmm9
+        pxor    xmm14,xmm9
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[48+r11]
+        pxor    xmm15,xmm9
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+        movups  xmm0,XMMWORD[64+r11]
+        shl     r12,4
+        shl     r13,4
+        jmp     NEAR $L$ocb_enc_loop6
+
+ALIGN   32
+$L$ocb_enc_loop6:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+DB      102,15,56,220,240
+DB      102,15,56,220,248
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$ocb_enc_loop6
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+DB      102,15,56,220,241
+DB      102,15,56,220,249
+        movups  xmm1,XMMWORD[16+r11]
+        shl     r14,4
+
+DB      102,65,15,56,221,210
+        movdqu  xmm10,XMMWORD[rbx]
+        mov     rax,r10
+DB      102,65,15,56,221,219
+DB      102,65,15,56,221,228
+DB      102,65,15,56,221,237
+DB      102,65,15,56,221,246
+DB      102,65,15,56,221,255
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   32
+__ocb_encrypt4:
+        pxor    xmm15,xmm9
+        movdqu  xmm11,XMMWORD[r12*1+rbx]
+        movdqa  xmm12,xmm10
+        movdqu  xmm13,XMMWORD[r13*1+rbx]
+        pxor    xmm10,xmm15
+        pxor    xmm11,xmm10
+        pxor    xmm8,xmm2
+        pxor    xmm2,xmm10
+        pxor    xmm12,xmm11
+        pxor    xmm8,xmm3
+        pxor    xmm3,xmm11
+        pxor    xmm13,xmm12
+        pxor    xmm8,xmm4
+        pxor    xmm4,xmm12
+        pxor    xmm8,xmm5
+        pxor    xmm5,xmm13
+        movups  xmm0,XMMWORD[32+r11]
+
+        pxor    xmm10,xmm9
+        pxor    xmm11,xmm9
+        pxor    xmm12,xmm9
+        pxor    xmm13,xmm9
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[48+r11]
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        movups  xmm0,XMMWORD[64+r11]
+        jmp     NEAR $L$ocb_enc_loop4
+
+ALIGN   32
+$L$ocb_enc_loop4:
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+
+DB      102,15,56,220,208
+DB      102,15,56,220,216
+DB      102,15,56,220,224
+DB      102,15,56,220,232
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$ocb_enc_loop4
+
+DB      102,15,56,220,209
+DB      102,15,56,220,217
+DB      102,15,56,220,225
+DB      102,15,56,220,233
+        movups  xmm1,XMMWORD[16+r11]
+        mov     rax,r10
+
+DB      102,65,15,56,221,210
+DB      102,65,15,56,221,219
+DB      102,65,15,56,221,228
+DB      102,65,15,56,221,237
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   32
+__ocb_encrypt1:
+        pxor    xmm7,xmm15
+        pxor    xmm7,xmm9
+        pxor    xmm8,xmm2
+        pxor    xmm2,xmm7
+        movups  xmm0,XMMWORD[32+r11]
+
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[48+r11]
+        pxor    xmm7,xmm9
+
+DB      102,15,56,220,208
+        movups  xmm0,XMMWORD[64+r11]
+        jmp     NEAR $L$ocb_enc_loop1
+
+ALIGN   32
+$L$ocb_enc_loop1:
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+
+DB      102,15,56,220,208
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$ocb_enc_loop1
+
+DB      102,15,56,220,209
+        movups  xmm1,XMMWORD[16+r11]
+        mov     rax,r10
+
+DB      102,15,56,221,215
+        DB      0F3h,0C3h               ;repret
+
+
+global  aesni_ocb_decrypt
+
+ALIGN   32
+aesni_ocb_decrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_ocb_decrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        lea     rax,[rsp]
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        lea     rsp,[((-160))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[64+rsp],xmm10
+        movaps  XMMWORD[80+rsp],xmm11
+        movaps  XMMWORD[96+rsp],xmm12
+        movaps  XMMWORD[112+rsp],xmm13
+        movaps  XMMWORD[128+rsp],xmm14
+        movaps  XMMWORD[144+rsp],xmm15
+$L$ocb_dec_body:
+        mov     rbx,QWORD[56+rax]
+        mov     rbp,QWORD[((56+8))+rax]
+
+        mov     r10d,DWORD[240+rcx]
+        mov     r11,rcx
+        shl     r10d,4
+        movups  xmm9,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+r10*1+rcx]
+
+        movdqu  xmm15,XMMWORD[r9]
+        pxor    xmm9,xmm1
+        pxor    xmm15,xmm1
+
+        mov     eax,16+32
+        lea     rcx,[32+r10*1+r11]
+        movups  xmm1,XMMWORD[16+r11]
+        sub     rax,r10
+        mov     r10,rax
+
+        movdqu  xmm10,XMMWORD[rbx]
+        movdqu  xmm8,XMMWORD[rbp]
+
+        test    r8,1
+        jnz     NEAR $L$ocb_dec_odd
+
+        bsf     r12,r8
+        add     r8,1
+        shl     r12,4
+        movdqu  xmm7,XMMWORD[r12*1+rbx]
+        movdqu  xmm2,XMMWORD[rdi]
+        lea     rdi,[16+rdi]
+
+        call    __ocb_decrypt1
+
+        movdqa  xmm15,xmm7
+        movups  XMMWORD[rsi],xmm2
+        xorps   xmm8,xmm2
+        lea     rsi,[16+rsi]
+        sub     rdx,1
+        jz      NEAR $L$ocb_dec_done
+
+$L$ocb_dec_odd:
+        lea     r12,[1+r8]
+        lea     r13,[3+r8]
+        lea     r14,[5+r8]
+        lea     r8,[6+r8]
+        bsf     r12,r12
+        bsf     r13,r13
+        bsf     r14,r14
+        shl     r12,4
+        shl     r13,4
+        shl     r14,4
+
+        sub     rdx,6
+        jc      NEAR $L$ocb_dec_short
+        jmp     NEAR $L$ocb_dec_grandloop
+
+ALIGN   32
+$L$ocb_dec_grandloop:
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movdqu  xmm4,XMMWORD[32+rdi]
+        movdqu  xmm5,XMMWORD[48+rdi]
+        movdqu  xmm6,XMMWORD[64+rdi]
+        movdqu  xmm7,XMMWORD[80+rdi]
+        lea     rdi,[96+rdi]
+
+        call    __ocb_decrypt6
+
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm8,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm8,xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        pxor    xmm8,xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        pxor    xmm8,xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        pxor    xmm8,xmm6
+        movups  XMMWORD[80+rsi],xmm7
+        pxor    xmm8,xmm7
+        lea     rsi,[96+rsi]
+        sub     rdx,6
+        jnc     NEAR $L$ocb_dec_grandloop
+
+$L$ocb_dec_short:
+        add     rdx,6
+        jz      NEAR $L$ocb_dec_done
+
+        movdqu  xmm2,XMMWORD[rdi]
+        cmp     rdx,2
+        jb      NEAR $L$ocb_dec_one
+        movdqu  xmm3,XMMWORD[16+rdi]
+        je      NEAR $L$ocb_dec_two
+
+        movdqu  xmm4,XMMWORD[32+rdi]
+        cmp     rdx,4
+        jb      NEAR $L$ocb_dec_three
+        movdqu  xmm5,XMMWORD[48+rdi]
+        je      NEAR $L$ocb_dec_four
+
+        movdqu  xmm6,XMMWORD[64+rdi]
+        pxor    xmm7,xmm7
+
+        call    __ocb_decrypt6
+
+        movdqa  xmm15,xmm14
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm8,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm8,xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        pxor    xmm8,xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        pxor    xmm8,xmm5
+        movups  XMMWORD[64+rsi],xmm6
+        pxor    xmm8,xmm6
+
+        jmp     NEAR $L$ocb_dec_done
+
+ALIGN   16
+$L$ocb_dec_one:
+        movdqa  xmm7,xmm10
+
+        call    __ocb_decrypt1
+
+        movdqa  xmm15,xmm7
+        movups  XMMWORD[rsi],xmm2
+        xorps   xmm8,xmm2
+        jmp     NEAR $L$ocb_dec_done
+
+ALIGN   16
+$L$ocb_dec_two:
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+
+        call    __ocb_decrypt4
+
+        movdqa  xmm15,xmm11
+        movups  XMMWORD[rsi],xmm2
+        xorps   xmm8,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        xorps   xmm8,xmm3
+
+        jmp     NEAR $L$ocb_dec_done
+
+ALIGN   16
+$L$ocb_dec_three:
+        pxor    xmm5,xmm5
+
+        call    __ocb_decrypt4
+
+        movdqa  xmm15,xmm12
+        movups  XMMWORD[rsi],xmm2
+        xorps   xmm8,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        xorps   xmm8,xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        xorps   xmm8,xmm4
+
+        jmp     NEAR $L$ocb_dec_done
+
+ALIGN   16
+$L$ocb_dec_four:
+        call    __ocb_decrypt4
+
+        movdqa  xmm15,xmm13
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm8,xmm2
+        movups  XMMWORD[16+rsi],xmm3
+        pxor    xmm8,xmm3
+        movups  XMMWORD[32+rsi],xmm4
+        pxor    xmm8,xmm4
+        movups  XMMWORD[48+rsi],xmm5
+        pxor    xmm8,xmm5
+
+$L$ocb_dec_done:
+        pxor    xmm15,xmm0
+        movdqu  XMMWORD[rbp],xmm8
+        movdqu  XMMWORD[r9],xmm15
+
+        xorps   xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        movaps  xmm6,XMMWORD[rsp]
+        movaps  XMMWORD[rsp],xmm0
+        movaps  xmm7,XMMWORD[16+rsp]
+        movaps  XMMWORD[16+rsp],xmm0
+        movaps  xmm8,XMMWORD[32+rsp]
+        movaps  XMMWORD[32+rsp],xmm0
+        movaps  xmm9,XMMWORD[48+rsp]
+        movaps  XMMWORD[48+rsp],xmm0
+        movaps  xmm10,XMMWORD[64+rsp]
+        movaps  XMMWORD[64+rsp],xmm0
+        movaps  xmm11,XMMWORD[80+rsp]
+        movaps  XMMWORD[80+rsp],xmm0
+        movaps  xmm12,XMMWORD[96+rsp]
+        movaps  XMMWORD[96+rsp],xmm0
+        movaps  xmm13,XMMWORD[112+rsp]
+        movaps  XMMWORD[112+rsp],xmm0
+        movaps  xmm14,XMMWORD[128+rsp]
+        movaps  XMMWORD[128+rsp],xmm0
+        movaps  xmm15,XMMWORD[144+rsp]
+        movaps  XMMWORD[144+rsp],xmm0
+        lea     rax,[((160+40))+rsp]
+$L$ocb_dec_pop:
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$ocb_dec_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_ocb_decrypt:
+
+
+ALIGN   32
+__ocb_decrypt6:
+        pxor    xmm15,xmm9
+        movdqu  xmm11,XMMWORD[r12*1+rbx]
+        movdqa  xmm12,xmm10
+        movdqu  xmm13,XMMWORD[r13*1+rbx]
+        movdqa  xmm14,xmm10
+        pxor    xmm10,xmm15
+        movdqu  xmm15,XMMWORD[r14*1+rbx]
+        pxor    xmm11,xmm10
+        pxor    xmm2,xmm10
+        pxor    xmm12,xmm11
+        pxor    xmm3,xmm11
+        pxor    xmm13,xmm12
+        pxor    xmm4,xmm12
+        pxor    xmm14,xmm13
+        pxor    xmm5,xmm13
+        pxor    xmm15,xmm14
+        pxor    xmm6,xmm14
+        pxor    xmm7,xmm15
+        movups  xmm0,XMMWORD[32+r11]
+
+        lea     r12,[1+r8]
+        lea     r13,[3+r8]
+        lea     r14,[5+r8]
+        add     r8,6
+        pxor    xmm10,xmm9
+        bsf     r12,r12
+        bsf     r13,r13
+        bsf     r14,r14
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        pxor    xmm11,xmm9
+        pxor    xmm12,xmm9
+DB      102,15,56,222,241
+        pxor    xmm13,xmm9
+        pxor    xmm14,xmm9
+DB      102,15,56,222,249
+        movups  xmm1,XMMWORD[48+r11]
+        pxor    xmm15,xmm9
+
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+        movups  xmm0,XMMWORD[64+r11]
+        shl     r12,4
+        shl     r13,4
+        jmp     NEAR $L$ocb_dec_loop6
+
+ALIGN   32
+$L$ocb_dec_loop6:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$ocb_dec_loop6
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+        movups  xmm1,XMMWORD[16+r11]
+        shl     r14,4
+
+DB      102,65,15,56,223,210
+        movdqu  xmm10,XMMWORD[rbx]
+        mov     rax,r10
+DB      102,65,15,56,223,219
+DB      102,65,15,56,223,228
+DB      102,65,15,56,223,237
+DB      102,65,15,56,223,246
+DB      102,65,15,56,223,255
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   32
+__ocb_decrypt4:
+        pxor    xmm15,xmm9
+        movdqu  xmm11,XMMWORD[r12*1+rbx]
+        movdqa  xmm12,xmm10
+        movdqu  xmm13,XMMWORD[r13*1+rbx]
+        pxor    xmm10,xmm15
+        pxor    xmm11,xmm10
+        pxor    xmm2,xmm10
+        pxor    xmm12,xmm11
+        pxor    xmm3,xmm11
+        pxor    xmm13,xmm12
+        pxor    xmm4,xmm12
+        pxor    xmm5,xmm13
+        movups  xmm0,XMMWORD[32+r11]
+
+        pxor    xmm10,xmm9
+        pxor    xmm11,xmm9
+        pxor    xmm12,xmm9
+        pxor    xmm13,xmm9
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[48+r11]
+
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+        movups  xmm0,XMMWORD[64+r11]
+        jmp     NEAR $L$ocb_dec_loop4
+
+ALIGN   32
+$L$ocb_dec_loop4:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$ocb_dec_loop4
+
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        movups  xmm1,XMMWORD[16+r11]
+        mov     rax,r10
+
+DB      102,65,15,56,223,210
+DB      102,65,15,56,223,219
+DB      102,65,15,56,223,228
+DB      102,65,15,56,223,237
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   32
+__ocb_decrypt1:
+        pxor    xmm7,xmm15
+        pxor    xmm7,xmm9
+        pxor    xmm2,xmm7
+        movups  xmm0,XMMWORD[32+r11]
+
+DB      102,15,56,222,209
+        movups  xmm1,XMMWORD[48+r11]
+        pxor    xmm7,xmm9
+
+DB      102,15,56,222,208
+        movups  xmm0,XMMWORD[64+r11]
+        jmp     NEAR $L$ocb_dec_loop1
+
+ALIGN   32
+$L$ocb_dec_loop1:
+DB      102,15,56,222,209
+        movups  xmm1,XMMWORD[rax*1+rcx]
+        add     rax,32
+
+DB      102,15,56,222,208
+        movups  xmm0,XMMWORD[((-16))+rax*1+rcx]
+        jnz     NEAR $L$ocb_dec_loop1
+
+DB      102,15,56,222,209
+        movups  xmm1,XMMWORD[16+r11]
+        mov     rax,r10
+
+DB      102,15,56,223,215
+        DB      0F3h,0C3h               ;repret
+
+global  aesni_cbc_encrypt
+
+ALIGN   16
+aesni_cbc_encrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_cbc_encrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        test    rdx,rdx
+        jz      NEAR $L$cbc_ret
+
+        mov     r10d,DWORD[240+rcx]
+        mov     r11,rcx
+        test    r9d,r9d
+        jz      NEAR $L$cbc_decrypt
+
+        movups  xmm2,XMMWORD[r8]
+        mov     eax,r10d
+        cmp     rdx,16
+        jb      NEAR $L$cbc_enc_tail
+        sub     rdx,16
+        jmp     NEAR $L$cbc_enc_loop
+ALIGN   16
+$L$cbc_enc_loop:
+        movups  xmm3,XMMWORD[rdi]
+        lea     rdi,[16+rdi]
+
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        xorps   xmm3,xmm0
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm3
+$L$oop_enc1_15:
+DB      102,15,56,220,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_enc1_15
+DB      102,15,56,221,209
+        mov     eax,r10d
+        mov     rcx,r11
+        movups  XMMWORD[rsi],xmm2
+        lea     rsi,[16+rsi]
+        sub     rdx,16
+        jnc     NEAR $L$cbc_enc_loop
+        add     rdx,16
+        jnz     NEAR $L$cbc_enc_tail
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        movups  XMMWORD[r8],xmm2
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        jmp     NEAR $L$cbc_ret
+
+$L$cbc_enc_tail:
+        mov     rcx,rdx
+        xchg    rsi,rdi
+        DD      0x9066A4F3
+        mov     ecx,16
+        sub     rcx,rdx
+        xor     eax,eax
+        DD      0x9066AAF3
+        lea     rdi,[((-16))+rdi]
+        mov     eax,r10d
+        mov     rsi,rdi
+        mov     rcx,r11
+        xor     rdx,rdx
+        jmp     NEAR $L$cbc_enc_loop
+
+ALIGN   16
+$L$cbc_decrypt:
+        cmp     rdx,16
+        jne     NEAR $L$cbc_decrypt_bulk
+
+
+
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqu  xmm3,XMMWORD[r8]
+        movdqa  xmm4,xmm2
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_dec1_16:
+DB      102,15,56,222,209
+        dec     r10d
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_dec1_16
+DB      102,15,56,223,209
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        movdqu  XMMWORD[r8],xmm4
+        xorps   xmm2,xmm3
+        pxor    xmm3,xmm3
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        jmp     NEAR $L$cbc_ret
+ALIGN   16
+$L$cbc_decrypt_bulk:
+        lea     r11,[rsp]
+
+        push    rbp
+
+        sub     rsp,176
+        and     rsp,-16
+        movaps  XMMWORD[16+rsp],xmm6
+        movaps  XMMWORD[32+rsp],xmm7
+        movaps  XMMWORD[48+rsp],xmm8
+        movaps  XMMWORD[64+rsp],xmm9
+        movaps  XMMWORD[80+rsp],xmm10
+        movaps  XMMWORD[96+rsp],xmm11
+        movaps  XMMWORD[112+rsp],xmm12
+        movaps  XMMWORD[128+rsp],xmm13
+        movaps  XMMWORD[144+rsp],xmm14
+        movaps  XMMWORD[160+rsp],xmm15
+$L$cbc_decrypt_body:
+        mov     rbp,rcx
+        movups  xmm10,XMMWORD[r8]
+        mov     eax,r10d
+        cmp     rdx,0x50
+        jbe     NEAR $L$cbc_dec_tail
+
+        movups  xmm0,XMMWORD[rcx]
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movdqa  xmm11,xmm2
+        movdqu  xmm4,XMMWORD[32+rdi]
+        movdqa  xmm12,xmm3
+        movdqu  xmm5,XMMWORD[48+rdi]
+        movdqa  xmm13,xmm4
+        movdqu  xmm6,XMMWORD[64+rdi]
+        movdqa  xmm14,xmm5
+        movdqu  xmm7,XMMWORD[80+rdi]
+        movdqa  xmm15,xmm6
+        mov     r9d,DWORD[((OPENSSL_ia32cap_P+4))]
+        cmp     rdx,0x70
+        jbe     NEAR $L$cbc_dec_six_or_seven
+
+        and     r9d,71303168
+        sub     rdx,0x50
+        cmp     r9d,4194304
+        je      NEAR $L$cbc_dec_loop6_enter
+        sub     rdx,0x20
+        lea     rcx,[112+rcx]
+        jmp     NEAR $L$cbc_dec_loop8_enter
+ALIGN   16
+$L$cbc_dec_loop8:
+        movups  XMMWORD[rsi],xmm9
+        lea     rsi,[16+rsi]
+$L$cbc_dec_loop8_enter:
+        movdqu  xmm8,XMMWORD[96+rdi]
+        pxor    xmm2,xmm0
+        movdqu  xmm9,XMMWORD[112+rdi]
+        pxor    xmm3,xmm0
+        movups  xmm1,XMMWORD[((16-112))+rcx]
+        pxor    xmm4,xmm0
+        mov     rbp,-1
+        cmp     rdx,0x70
+        pxor    xmm5,xmm0
+        pxor    xmm6,xmm0
+        pxor    xmm7,xmm0
+        pxor    xmm8,xmm0
+
+DB      102,15,56,222,209
+        pxor    xmm9,xmm0
+        movups  xmm0,XMMWORD[((32-112))+rcx]
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+DB      102,68,15,56,222,193
+        adc     rbp,0
+        and     rbp,128
+DB      102,68,15,56,222,201
+        add     rbp,rdi
+        movups  xmm1,XMMWORD[((48-112))+rcx]
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+DB      102,68,15,56,222,192
+DB      102,68,15,56,222,200
+        movups  xmm0,XMMWORD[((64-112))+rcx]
+        nop
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+DB      102,68,15,56,222,193
+DB      102,68,15,56,222,201
+        movups  xmm1,XMMWORD[((80-112))+rcx]
+        nop
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+DB      102,68,15,56,222,192
+DB      102,68,15,56,222,200
+        movups  xmm0,XMMWORD[((96-112))+rcx]
+        nop
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+DB      102,68,15,56,222,193
+DB      102,68,15,56,222,201
+        movups  xmm1,XMMWORD[((112-112))+rcx]
+        nop
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+DB      102,68,15,56,222,192
+DB      102,68,15,56,222,200
+        movups  xmm0,XMMWORD[((128-112))+rcx]
+        nop
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+DB      102,68,15,56,222,193
+DB      102,68,15,56,222,201
+        movups  xmm1,XMMWORD[((144-112))+rcx]
+        cmp     eax,11
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+DB      102,68,15,56,222,192
+DB      102,68,15,56,222,200
+        movups  xmm0,XMMWORD[((160-112))+rcx]
+        jb      NEAR $L$cbc_dec_done
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+DB      102,68,15,56,222,193
+DB      102,68,15,56,222,201
+        movups  xmm1,XMMWORD[((176-112))+rcx]
+        nop
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+DB      102,68,15,56,222,192
+DB      102,68,15,56,222,200
+        movups  xmm0,XMMWORD[((192-112))+rcx]
+        je      NEAR $L$cbc_dec_done
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+DB      102,68,15,56,222,193
+DB      102,68,15,56,222,201
+        movups  xmm1,XMMWORD[((208-112))+rcx]
+        nop
+DB      102,15,56,222,208
+DB      102,15,56,222,216
+DB      102,15,56,222,224
+DB      102,15,56,222,232
+DB      102,15,56,222,240
+DB      102,15,56,222,248
+DB      102,68,15,56,222,192
+DB      102,68,15,56,222,200
+        movups  xmm0,XMMWORD[((224-112))+rcx]
+        jmp     NEAR $L$cbc_dec_done
+ALIGN   16
+$L$cbc_dec_done:
+DB      102,15,56,222,209
+DB      102,15,56,222,217
+        pxor    xmm10,xmm0
+        pxor    xmm11,xmm0
+DB      102,15,56,222,225
+DB      102,15,56,222,233
+        pxor    xmm12,xmm0
+        pxor    xmm13,xmm0
+DB      102,15,56,222,241
+DB      102,15,56,222,249
+        pxor    xmm14,xmm0
+        pxor    xmm15,xmm0
+DB      102,68,15,56,222,193
+DB      102,68,15,56,222,201
+        movdqu  xmm1,XMMWORD[80+rdi]
+
+DB      102,65,15,56,223,210
+        movdqu  xmm10,XMMWORD[96+rdi]
+        pxor    xmm1,xmm0
+DB      102,65,15,56,223,219
+        pxor    xmm10,xmm0
+        movdqu  xmm0,XMMWORD[112+rdi]
+DB      102,65,15,56,223,228
+        lea     rdi,[128+rdi]
+        movdqu  xmm11,XMMWORD[rbp]
+DB      102,65,15,56,223,237
+DB      102,65,15,56,223,246
+        movdqu  xmm12,XMMWORD[16+rbp]
+        movdqu  xmm13,XMMWORD[32+rbp]
+DB      102,65,15,56,223,255
+DB      102,68,15,56,223,193
+        movdqu  xmm14,XMMWORD[48+rbp]
+        movdqu  xmm15,XMMWORD[64+rbp]
+DB      102,69,15,56,223,202
+        movdqa  xmm10,xmm0
+        movdqu  xmm1,XMMWORD[80+rbp]
+        movups  xmm0,XMMWORD[((-112))+rcx]
+
+        movups  XMMWORD[rsi],xmm2
+        movdqa  xmm2,xmm11
+        movups  XMMWORD[16+rsi],xmm3
+        movdqa  xmm3,xmm12
+        movups  XMMWORD[32+rsi],xmm4
+        movdqa  xmm4,xmm13
+        movups  XMMWORD[48+rsi],xmm5
+        movdqa  xmm5,xmm14
+        movups  XMMWORD[64+rsi],xmm6
+        movdqa  xmm6,xmm15
+        movups  XMMWORD[80+rsi],xmm7
+        movdqa  xmm7,xmm1
+        movups  XMMWORD[96+rsi],xmm8
+        lea     rsi,[112+rsi]
+
+        sub     rdx,0x80
+        ja      NEAR $L$cbc_dec_loop8
+
+        movaps  xmm2,xmm9
+        lea     rcx,[((-112))+rcx]
+        add     rdx,0x70
+        jle     NEAR $L$cbc_dec_clear_tail_collected
+        movups  XMMWORD[rsi],xmm9
+        lea     rsi,[16+rsi]
+        cmp     rdx,0x50
+        jbe     NEAR $L$cbc_dec_tail
+
+        movaps  xmm2,xmm11
+$L$cbc_dec_six_or_seven:
+        cmp     rdx,0x60
+        ja      NEAR $L$cbc_dec_seven
+
+        movaps  xmm8,xmm7
+        call    _aesni_decrypt6
+        pxor    xmm2,xmm10
+        movaps  xmm10,xmm8
+        pxor    xmm3,xmm11
+        movdqu  XMMWORD[rsi],xmm2
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        pxor    xmm5,xmm13
+        movdqu  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        pxor    xmm6,xmm14
+        movdqu  XMMWORD[48+rsi],xmm5
+        pxor    xmm5,xmm5
+        pxor    xmm7,xmm15
+        movdqu  XMMWORD[64+rsi],xmm6
+        pxor    xmm6,xmm6
+        lea     rsi,[80+rsi]
+        movdqa  xmm2,xmm7
+        pxor    xmm7,xmm7
+        jmp     NEAR $L$cbc_dec_tail_collected
+
+ALIGN   16
+$L$cbc_dec_seven:
+        movups  xmm8,XMMWORD[96+rdi]
+        xorps   xmm9,xmm9
+        call    _aesni_decrypt8
+        movups  xmm9,XMMWORD[80+rdi]
+        pxor    xmm2,xmm10
+        movups  xmm10,XMMWORD[96+rdi]
+        pxor    xmm3,xmm11
+        movdqu  XMMWORD[rsi],xmm2
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        pxor    xmm5,xmm13
+        movdqu  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        pxor    xmm6,xmm14
+        movdqu  XMMWORD[48+rsi],xmm5
+        pxor    xmm5,xmm5
+        pxor    xmm7,xmm15
+        movdqu  XMMWORD[64+rsi],xmm6
+        pxor    xmm6,xmm6
+        pxor    xmm8,xmm9
+        movdqu  XMMWORD[80+rsi],xmm7
+        pxor    xmm7,xmm7
+        lea     rsi,[96+rsi]
+        movdqa  xmm2,xmm8
+        pxor    xmm8,xmm8
+        pxor    xmm9,xmm9
+        jmp     NEAR $L$cbc_dec_tail_collected
+
+ALIGN   16
+$L$cbc_dec_loop6:
+        movups  XMMWORD[rsi],xmm7
+        lea     rsi,[16+rsi]
+        movdqu  xmm2,XMMWORD[rdi]
+        movdqu  xmm3,XMMWORD[16+rdi]
+        movdqa  xmm11,xmm2
+        movdqu  xmm4,XMMWORD[32+rdi]
+        movdqa  xmm12,xmm3
+        movdqu  xmm5,XMMWORD[48+rdi]
+        movdqa  xmm13,xmm4
+        movdqu  xmm6,XMMWORD[64+rdi]
+        movdqa  xmm14,xmm5
+        movdqu  xmm7,XMMWORD[80+rdi]
+        movdqa  xmm15,xmm6
+$L$cbc_dec_loop6_enter:
+        lea     rdi,[96+rdi]
+        movdqa  xmm8,xmm7
+
+        call    _aesni_decrypt6
+
+        pxor    xmm2,xmm10
+        movdqa  xmm10,xmm8
+        pxor    xmm3,xmm11
+        movdqu  XMMWORD[rsi],xmm2
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[16+rsi],xmm3
+        pxor    xmm5,xmm13
+        movdqu  XMMWORD[32+rsi],xmm4
+        pxor    xmm6,xmm14
+        mov     rcx,rbp
+        movdqu  XMMWORD[48+rsi],xmm5
+        pxor    xmm7,xmm15
+        mov     eax,r10d
+        movdqu  XMMWORD[64+rsi],xmm6
+        lea     rsi,[80+rsi]
+        sub     rdx,0x60
+        ja      NEAR $L$cbc_dec_loop6
+
+        movdqa  xmm2,xmm7
+        add     rdx,0x50
+        jle     NEAR $L$cbc_dec_clear_tail_collected
+        movups  XMMWORD[rsi],xmm7
+        lea     rsi,[16+rsi]
+
+$L$cbc_dec_tail:
+        movups  xmm2,XMMWORD[rdi]
+        sub     rdx,0x10
+        jbe     NEAR $L$cbc_dec_one
+
+        movups  xmm3,XMMWORD[16+rdi]
+        movaps  xmm11,xmm2
+        sub     rdx,0x10
+        jbe     NEAR $L$cbc_dec_two
+
+        movups  xmm4,XMMWORD[32+rdi]
+        movaps  xmm12,xmm3
+        sub     rdx,0x10
+        jbe     NEAR $L$cbc_dec_three
+
+        movups  xmm5,XMMWORD[48+rdi]
+        movaps  xmm13,xmm4
+        sub     rdx,0x10
+        jbe     NEAR $L$cbc_dec_four
+
+        movups  xmm6,XMMWORD[64+rdi]
+        movaps  xmm14,xmm5
+        movaps  xmm15,xmm6
+        xorps   xmm7,xmm7
+        call    _aesni_decrypt6
+        pxor    xmm2,xmm10
+        movaps  xmm10,xmm15
+        pxor    xmm3,xmm11
+        movdqu  XMMWORD[rsi],xmm2
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        pxor    xmm5,xmm13
+        movdqu  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        pxor    xmm6,xmm14
+        movdqu  XMMWORD[48+rsi],xmm5
+        pxor    xmm5,xmm5
+        lea     rsi,[64+rsi]
+        movdqa  xmm2,xmm6
+        pxor    xmm6,xmm6
+        pxor    xmm7,xmm7
+        sub     rdx,0x10
+        jmp     NEAR $L$cbc_dec_tail_collected
+
+ALIGN   16
+$L$cbc_dec_one:
+        movaps  xmm11,xmm2
+        movups  xmm0,XMMWORD[rcx]
+        movups  xmm1,XMMWORD[16+rcx]
+        lea     rcx,[32+rcx]
+        xorps   xmm2,xmm0
+$L$oop_dec1_17:
+DB      102,15,56,222,209
+        dec     eax
+        movups  xmm1,XMMWORD[rcx]
+        lea     rcx,[16+rcx]
+        jnz     NEAR $L$oop_dec1_17
+DB      102,15,56,223,209
+        xorps   xmm2,xmm10
+        movaps  xmm10,xmm11
+        jmp     NEAR $L$cbc_dec_tail_collected
+ALIGN   16
+$L$cbc_dec_two:
+        movaps  xmm12,xmm3
+        call    _aesni_decrypt2
+        pxor    xmm2,xmm10
+        movaps  xmm10,xmm12
+        pxor    xmm3,xmm11
+        movdqu  XMMWORD[rsi],xmm2
+        movdqa  xmm2,xmm3
+        pxor    xmm3,xmm3
+        lea     rsi,[16+rsi]
+        jmp     NEAR $L$cbc_dec_tail_collected
+ALIGN   16
+$L$cbc_dec_three:
+        movaps  xmm13,xmm4
+        call    _aesni_decrypt3
+        pxor    xmm2,xmm10
+        movaps  xmm10,xmm13
+        pxor    xmm3,xmm11
+        movdqu  XMMWORD[rsi],xmm2
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        movdqa  xmm2,xmm4
+        pxor    xmm4,xmm4
+        lea     rsi,[32+rsi]
+        jmp     NEAR $L$cbc_dec_tail_collected
+ALIGN   16
+$L$cbc_dec_four:
+        movaps  xmm14,xmm5
+        call    _aesni_decrypt4
+        pxor    xmm2,xmm10
+        movaps  xmm10,xmm14
+        pxor    xmm3,xmm11
+        movdqu  XMMWORD[rsi],xmm2
+        pxor    xmm4,xmm12
+        movdqu  XMMWORD[16+rsi],xmm3
+        pxor    xmm3,xmm3
+        pxor    xmm5,xmm13
+        movdqu  XMMWORD[32+rsi],xmm4
+        pxor    xmm4,xmm4
+        movdqa  xmm2,xmm5
+        pxor    xmm5,xmm5
+        lea     rsi,[48+rsi]
+        jmp     NEAR $L$cbc_dec_tail_collected
+
+ALIGN   16
+$L$cbc_dec_clear_tail_collected:
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+$L$cbc_dec_tail_collected:
+        movups  XMMWORD[r8],xmm10
+        and     rdx,15
+        jnz     NEAR $L$cbc_dec_tail_partial
+        movups  XMMWORD[rsi],xmm2
+        pxor    xmm2,xmm2
+        jmp     NEAR $L$cbc_dec_ret
+ALIGN   16
+$L$cbc_dec_tail_partial:
+        movaps  XMMWORD[rsp],xmm2
+        pxor    xmm2,xmm2
+        mov     rcx,16
+        mov     rdi,rsi
+        sub     rcx,rdx
+        lea     rsi,[rsp]
+        DD      0x9066A4F3
+        movdqa  XMMWORD[rsp],xmm2
+
+$L$cbc_dec_ret:
+        xorps   xmm0,xmm0
+        pxor    xmm1,xmm1
+        movaps  xmm6,XMMWORD[16+rsp]
+        movaps  XMMWORD[16+rsp],xmm0
+        movaps  xmm7,XMMWORD[32+rsp]
+        movaps  XMMWORD[32+rsp],xmm0
+        movaps  xmm8,XMMWORD[48+rsp]
+        movaps  XMMWORD[48+rsp],xmm0
+        movaps  xmm9,XMMWORD[64+rsp]
+        movaps  XMMWORD[64+rsp],xmm0
+        movaps  xmm10,XMMWORD[80+rsp]
+        movaps  XMMWORD[80+rsp],xmm0
+        movaps  xmm11,XMMWORD[96+rsp]
+        movaps  XMMWORD[96+rsp],xmm0
+        movaps  xmm12,XMMWORD[112+rsp]
+        movaps  XMMWORD[112+rsp],xmm0
+        movaps  xmm13,XMMWORD[128+rsp]
+        movaps  XMMWORD[128+rsp],xmm0
+        movaps  xmm14,XMMWORD[144+rsp]
+        movaps  XMMWORD[144+rsp],xmm0
+        movaps  xmm15,XMMWORD[160+rsp]
+        movaps  XMMWORD[160+rsp],xmm0
+        mov     rbp,QWORD[((-8))+r11]
+
+        lea     rsp,[r11]
+
+$L$cbc_ret:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_cbc_encrypt:
+global  aesni_set_decrypt_key
+
+ALIGN   16
+aesni_set_decrypt_key:
+
+DB      0x48,0x83,0xEC,0x08
+
+        call    __aesni_set_encrypt_key
+        shl     edx,4
+        test    eax,eax
+        jnz     NEAR $L$dec_key_ret
+        lea     rcx,[16+rdx*1+r8]
+
+        movups  xmm0,XMMWORD[r8]
+        movups  xmm1,XMMWORD[rcx]
+        movups  XMMWORD[rcx],xmm0
+        movups  XMMWORD[r8],xmm1
+        lea     r8,[16+r8]
+        lea     rcx,[((-16))+rcx]
+
+$L$dec_key_inverse:
+        movups  xmm0,XMMWORD[r8]
+        movups  xmm1,XMMWORD[rcx]
+DB      102,15,56,219,192
+DB      102,15,56,219,201
+        lea     r8,[16+r8]
+        lea     rcx,[((-16))+rcx]
+        movups  XMMWORD[16+rcx],xmm0
+        movups  XMMWORD[(-16)+r8],xmm1
+        cmp     rcx,r8
+        ja      NEAR $L$dec_key_inverse
+
+        movups  xmm0,XMMWORD[r8]
+DB      102,15,56,219,192
+        pxor    xmm1,xmm1
+        movups  XMMWORD[rcx],xmm0
+        pxor    xmm0,xmm0
+$L$dec_key_ret:
+        add     rsp,8
+
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_set_decrypt_key:
+
+global  aesni_set_encrypt_key
+
+ALIGN   16
+aesni_set_encrypt_key:
+__aesni_set_encrypt_key:
+
+DB      0x48,0x83,0xEC,0x08
+
+        mov     rax,-1
+        test    rcx,rcx
+        jz      NEAR $L$enc_key_ret
+        test    r8,r8
+        jz      NEAR $L$enc_key_ret
+
+        mov     r10d,268437504
+        movups  xmm0,XMMWORD[rcx]
+        xorps   xmm4,xmm4
+        and     r10d,DWORD[((OPENSSL_ia32cap_P+4))]
+        lea     rax,[16+r8]
+        cmp     edx,256
+        je      NEAR $L$14rounds
+        cmp     edx,192
+        je      NEAR $L$12rounds
+        cmp     edx,128
+        jne     NEAR $L$bad_keybits
+
+$L$10rounds:
+        mov     edx,9
+        cmp     r10d,268435456
+        je      NEAR $L$10rounds_alt
+
+        movups  XMMWORD[r8],xmm0
+DB      102,15,58,223,200,1
+        call    $L$key_expansion_128_cold
+DB      102,15,58,223,200,2
+        call    $L$key_expansion_128
+DB      102,15,58,223,200,4
+        call    $L$key_expansion_128
+DB      102,15,58,223,200,8
+        call    $L$key_expansion_128
+DB      102,15,58,223,200,16
+        call    $L$key_expansion_128
+DB      102,15,58,223,200,32
+        call    $L$key_expansion_128
+DB      102,15,58,223,200,64
+        call    $L$key_expansion_128
+DB      102,15,58,223,200,128
+        call    $L$key_expansion_128
+DB      102,15,58,223,200,27
+        call    $L$key_expansion_128
+DB      102,15,58,223,200,54
+        call    $L$key_expansion_128
+        movups  XMMWORD[rax],xmm0
+        mov     DWORD[80+rax],edx
+        xor     eax,eax
+        jmp     NEAR $L$enc_key_ret
+
+ALIGN   16
+$L$10rounds_alt:
+        movdqa  xmm5,XMMWORD[$L$key_rotate]
+        mov     r10d,8
+        movdqa  xmm4,XMMWORD[$L$key_rcon1]
+        movdqa  xmm2,xmm0
+        movdqu  XMMWORD[r8],xmm0
+        jmp     NEAR $L$oop_key128
+
+ALIGN   16
+$L$oop_key128:
+DB      102,15,56,0,197
+DB      102,15,56,221,196
+        pslld   xmm4,1
+        lea     rax,[16+rax]
+
+        movdqa  xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm2,xmm3
+
+        pxor    xmm0,xmm2
+        movdqu  XMMWORD[(-16)+rax],xmm0
+        movdqa  xmm2,xmm0
+
+        dec     r10d
+        jnz     NEAR $L$oop_key128
+
+        movdqa  xmm4,XMMWORD[$L$key_rcon1b]
+
+DB      102,15,56,0,197
+DB      102,15,56,221,196
+        pslld   xmm4,1
+
+        movdqa  xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm2,xmm3
+
+        pxor    xmm0,xmm2
+        movdqu  XMMWORD[rax],xmm0
+
+        movdqa  xmm2,xmm0
+DB      102,15,56,0,197
+DB      102,15,56,221,196
+
+        movdqa  xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm3,xmm2
+        pslldq  xmm2,4
+        pxor    xmm2,xmm3
+
+        pxor    xmm0,xmm2
+        movdqu  XMMWORD[16+rax],xmm0
+
+        mov     DWORD[96+rax],edx
+        xor     eax,eax
+        jmp     NEAR $L$enc_key_ret
+
+ALIGN   16
+$L$12rounds:
+        movq    xmm2,QWORD[16+rcx]
+        mov     edx,11
+        cmp     r10d,268435456
+        je      NEAR $L$12rounds_alt
+
+        movups  XMMWORD[r8],xmm0
+DB      102,15,58,223,202,1
+        call    $L$key_expansion_192a_cold
+DB      102,15,58,223,202,2
+        call    $L$key_expansion_192b
+DB      102,15,58,223,202,4
+        call    $L$key_expansion_192a
+DB      102,15,58,223,202,8
+        call    $L$key_expansion_192b
+DB      102,15,58,223,202,16
+        call    $L$key_expansion_192a
+DB      102,15,58,223,202,32
+        call    $L$key_expansion_192b
+DB      102,15,58,223,202,64
+        call    $L$key_expansion_192a
+DB      102,15,58,223,202,128
+        call    $L$key_expansion_192b
+        movups  XMMWORD[rax],xmm0
+        mov     DWORD[48+rax],edx
+        xor     rax,rax
+        jmp     NEAR $L$enc_key_ret
+
+ALIGN   16
+$L$12rounds_alt:
+        movdqa  xmm5,XMMWORD[$L$key_rotate192]
+        movdqa  xmm4,XMMWORD[$L$key_rcon1]
+        mov     r10d,8
+        movdqu  XMMWORD[r8],xmm0
+        jmp     NEAR $L$oop_key192
+
+ALIGN   16
+$L$oop_key192:
+        movq    QWORD[rax],xmm2
+        movdqa  xmm1,xmm2
+DB      102,15,56,0,213
+DB      102,15,56,221,212
+        pslld   xmm4,1
+        lea     rax,[24+rax]
+
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm0,xmm3
+
+        pshufd  xmm3,xmm0,0xff
+        pxor    xmm3,xmm1
+        pslldq  xmm1,4
+        pxor    xmm3,xmm1
+
+        pxor    xmm0,xmm2
+        pxor    xmm2,xmm3
+        movdqu  XMMWORD[(-16)+rax],xmm0
+
+        dec     r10d
+        jnz     NEAR $L$oop_key192
+
+        mov     DWORD[32+rax],edx
+        xor     eax,eax
+        jmp     NEAR $L$enc_key_ret
+
+ALIGN   16
+$L$14rounds:
+        movups  xmm2,XMMWORD[16+rcx]
+        mov     edx,13
+        lea     rax,[16+rax]
+        cmp     r10d,268435456
+        je      NEAR $L$14rounds_alt
+
+        movups  XMMWORD[r8],xmm0
+        movups  XMMWORD[16+r8],xmm2
+DB      102,15,58,223,202,1
+        call    $L$key_expansion_256a_cold
+DB      102,15,58,223,200,1
+        call    $L$key_expansion_256b
+DB      102,15,58,223,202,2
+        call    $L$key_expansion_256a
+DB      102,15,58,223,200,2
+        call    $L$key_expansion_256b
+DB      102,15,58,223,202,4
+        call    $L$key_expansion_256a
+DB      102,15,58,223,200,4
+        call    $L$key_expansion_256b
+DB      102,15,58,223,202,8
+        call    $L$key_expansion_256a
+DB      102,15,58,223,200,8
+        call    $L$key_expansion_256b
+DB      102,15,58,223,202,16
+        call    $L$key_expansion_256a
+DB      102,15,58,223,200,16
+        call    $L$key_expansion_256b
+DB      102,15,58,223,202,32
+        call    $L$key_expansion_256a
+DB      102,15,58,223,200,32
+        call    $L$key_expansion_256b
+DB      102,15,58,223,202,64
+        call    $L$key_expansion_256a
+        movups  XMMWORD[rax],xmm0
+        mov     DWORD[16+rax],edx
+        xor     rax,rax
+        jmp     NEAR $L$enc_key_ret
+
+ALIGN   16
+$L$14rounds_alt:
+        movdqa  xmm5,XMMWORD[$L$key_rotate]
+        movdqa  xmm4,XMMWORD[$L$key_rcon1]
+        mov     r10d,7
+        movdqu  XMMWORD[r8],xmm0
+        movdqa  xmm1,xmm2
+        movdqu  XMMWORD[16+r8],xmm2
+        jmp     NEAR $L$oop_key256
+
+ALIGN   16
+$L$oop_key256:
+DB      102,15,56,0,213
+DB      102,15,56,221,212
+
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm3,xmm0
+        pslldq  xmm0,4
+        pxor    xmm0,xmm3
+        pslld   xmm4,1
+
+        pxor    xmm0,xmm2
+        movdqu  XMMWORD[rax],xmm0
+
+        dec     r10d
+        jz      NEAR $L$done_key256
+
+        pshufd  xmm2,xmm0,0xff
+        pxor    xmm3,xmm3
+DB      102,15,56,221,211
+
+        movdqa  xmm3,xmm1
+        pslldq  xmm1,4
+        pxor    xmm3,xmm1
+        pslldq  xmm1,4
+        pxor    xmm3,xmm1
+        pslldq  xmm1,4
+        pxor    xmm1,xmm3
+
+        pxor    xmm2,xmm1
+        movdqu  XMMWORD[16+rax],xmm2
+        lea     rax,[32+rax]
+        movdqa  xmm1,xmm2
+
+        jmp     NEAR $L$oop_key256
+
+$L$done_key256:
+        mov     DWORD[16+rax],edx
+        xor     eax,eax
+        jmp     NEAR $L$enc_key_ret
+
+ALIGN   16
+$L$bad_keybits:
+        mov     rax,-2
+$L$enc_key_ret:
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        add     rsp,8
+
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_set_encrypt_key:
+
+ALIGN   16
+$L$key_expansion_128:
+        movups  XMMWORD[rax],xmm0
+        lea     rax,[16+rax]
+$L$key_expansion_128_cold:
+        shufps  xmm4,xmm0,16
+        xorps   xmm0,xmm4
+        shufps  xmm4,xmm0,140
+        xorps   xmm0,xmm4
+        shufps  xmm1,xmm1,255
+        xorps   xmm0,xmm1
+        DB      0F3h,0C3h               ;repret
+
+ALIGN   16
+$L$key_expansion_192a:
+        movups  XMMWORD[rax],xmm0
+        lea     rax,[16+rax]
+$L$key_expansion_192a_cold:
+        movaps  xmm5,xmm2
+$L$key_expansion_192b_warm:
+        shufps  xmm4,xmm0,16
+        movdqa  xmm3,xmm2
+        xorps   xmm0,xmm4
+        shufps  xmm4,xmm0,140
+        pslldq  xmm3,4
+        xorps   xmm0,xmm4
+        pshufd  xmm1,xmm1,85
+        pxor    xmm2,xmm3
+        pxor    xmm0,xmm1
+        pshufd  xmm3,xmm0,255
+        pxor    xmm2,xmm3
+        DB      0F3h,0C3h               ;repret
+
+ALIGN   16
+$L$key_expansion_192b:
+        movaps  xmm3,xmm0
+        shufps  xmm5,xmm0,68
+        movups  XMMWORD[rax],xmm5
+        shufps  xmm3,xmm2,78
+        movups  XMMWORD[16+rax],xmm3
+        lea     rax,[32+rax]
+        jmp     NEAR $L$key_expansion_192b_warm
+
+ALIGN   16
+$L$key_expansion_256a:
+        movups  XMMWORD[rax],xmm2
+        lea     rax,[16+rax]
+$L$key_expansion_256a_cold:
+        shufps  xmm4,xmm0,16
+        xorps   xmm0,xmm4
+        shufps  xmm4,xmm0,140
+        xorps   xmm0,xmm4
+        shufps  xmm1,xmm1,255
+        xorps   xmm0,xmm1
+        DB      0F3h,0C3h               ;repret
+
+ALIGN   16
+$L$key_expansion_256b:
+        movups  XMMWORD[rax],xmm0
+        lea     rax,[16+rax]
+
+        shufps  xmm4,xmm2,16
+        xorps   xmm2,xmm4
+        shufps  xmm4,xmm2,140
+        xorps   xmm2,xmm4
+        shufps  xmm1,xmm1,170
+        xorps   xmm2,xmm1
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   64
+$L$bswap_mask:
+DB      15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+$L$increment32:
+        DD      6,6,6,0
+$L$increment64:
+        DD      1,0,0,0
+$L$xts_magic:
+        DD      0x87,0,1,0
+$L$increment1:
+DB      0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
+$L$key_rotate:
+        DD      0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d
+$L$key_rotate192:
+        DD      0x04070605,0x04070605,0x04070605,0x04070605
+$L$key_rcon1:
+        DD      1,1,1,1
+$L$key_rcon1b:
+        DD      0x1b,0x1b,0x1b,0x1b
+
+DB      65,69,83,32,102,111,114,32,73,110,116,101,108,32,65,69
+DB      83,45,78,73,44,32,67,82,89,80,84,79,71,65,77,83
+DB      32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115
+DB      115,108,46,111,114,103,62,0
+ALIGN   64
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+ecb_ccm64_se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        lea     rsi,[rax]
+        lea     rdi,[512+r8]
+        mov     ecx,8
+        DD      0xa548f3fc
+        lea     rax,[88+rax]
+
+        jmp     NEAR $L$common_seh_tail
+
+
+
+ALIGN   16
+ctr_xts_se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[208+r8]
+
+        lea     rsi,[((-168))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+        mov     rbp,QWORD[((-8))+rax]
+        mov     QWORD[160+r8],rbp
+        jmp     NEAR $L$common_seh_tail
+
+
+
+ALIGN   16
+ocb_se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        mov     r10d,DWORD[8+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$ocb_no_xmm
+
+        mov     rax,QWORD[152+r8]
+
+        lea     rsi,[rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+        lea     rax,[((160+40))+rax]
+
+$L$ocb_no_xmm:
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+
+        jmp     NEAR $L$common_seh_tail
+
+
+ALIGN   16
+cbc_se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[152+r8]
+        mov     rbx,QWORD[248+r8]
+
+        lea     r10,[$L$cbc_decrypt_bulk]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[120+r8]
+
+        lea     r10,[$L$cbc_decrypt_body]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[152+r8]
+
+        lea     r10,[$L$cbc_ret]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        lea     rsi,[16+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+        mov     rax,QWORD[208+r8]
+
+        mov     rbp,QWORD[((-8))+rax]
+        mov     QWORD[160+r8],rbp
+
+$L$common_seh_tail:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_aesni_ecb_encrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_ecb_encrypt wrt ..imagebase
+        DD      $L$SEH_info_ecb wrt ..imagebase
+
+        DD      $L$SEH_begin_aesni_ccm64_encrypt_blocks wrt ..imagebase
+        DD      $L$SEH_end_aesni_ccm64_encrypt_blocks wrt ..imagebase
+        DD      $L$SEH_info_ccm64_enc wrt ..imagebase
+
+        DD      $L$SEH_begin_aesni_ccm64_decrypt_blocks wrt ..imagebase
+        DD      $L$SEH_end_aesni_ccm64_decrypt_blocks wrt ..imagebase
+        DD      $L$SEH_info_ccm64_dec wrt ..imagebase
+
+        DD      $L$SEH_begin_aesni_ctr32_encrypt_blocks wrt ..imagebase
+        DD      $L$SEH_end_aesni_ctr32_encrypt_blocks wrt ..imagebase
+        DD      $L$SEH_info_ctr32 wrt ..imagebase
+
+        DD      $L$SEH_begin_aesni_xts_encrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_xts_encrypt wrt ..imagebase
+        DD      $L$SEH_info_xts_enc wrt ..imagebase
+
+        DD      $L$SEH_begin_aesni_xts_decrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_xts_decrypt wrt ..imagebase
+        DD      $L$SEH_info_xts_dec wrt ..imagebase
+
+        DD      $L$SEH_begin_aesni_ocb_encrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_ocb_encrypt wrt ..imagebase
+        DD      $L$SEH_info_ocb_enc wrt ..imagebase
+
+        DD      $L$SEH_begin_aesni_ocb_decrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_ocb_decrypt wrt ..imagebase
+        DD      $L$SEH_info_ocb_dec wrt ..imagebase
+        DD      $L$SEH_begin_aesni_cbc_encrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_cbc_encrypt wrt ..imagebase
+        DD      $L$SEH_info_cbc wrt ..imagebase
+
+        DD      aesni_set_decrypt_key wrt ..imagebase
+        DD      $L$SEH_end_set_decrypt_key wrt ..imagebase
+        DD      $L$SEH_info_key wrt ..imagebase
+
+        DD      aesni_set_encrypt_key wrt ..imagebase
+        DD      $L$SEH_end_set_encrypt_key wrt ..imagebase
+        DD      $L$SEH_info_key wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_ecb:
+DB      9,0,0,0
+        DD      ecb_ccm64_se_handler wrt ..imagebase
+        DD      $L$ecb_enc_body wrt ..imagebase,$L$ecb_enc_ret wrt ..imagebase
+$L$SEH_info_ccm64_enc:
+DB      9,0,0,0
+        DD      ecb_ccm64_se_handler wrt ..imagebase
+        DD      $L$ccm64_enc_body wrt ..imagebase,$L$ccm64_enc_ret wrt ..imagebase
+$L$SEH_info_ccm64_dec:
+DB      9,0,0,0
+        DD      ecb_ccm64_se_handler wrt ..imagebase
+        DD      $L$ccm64_dec_body wrt ..imagebase,$L$ccm64_dec_ret wrt ..imagebase
+$L$SEH_info_ctr32:
+DB      9,0,0,0
+        DD      ctr_xts_se_handler wrt ..imagebase
+        DD      $L$ctr32_body wrt ..imagebase,$L$ctr32_epilogue wrt ..imagebase
+$L$SEH_info_xts_enc:
+DB      9,0,0,0
+        DD      ctr_xts_se_handler wrt ..imagebase
+        DD      $L$xts_enc_body wrt ..imagebase,$L$xts_enc_epilogue wrt ..imagebase
+$L$SEH_info_xts_dec:
+DB      9,0,0,0
+        DD      ctr_xts_se_handler wrt ..imagebase
+        DD      $L$xts_dec_body wrt ..imagebase,$L$xts_dec_epilogue wrt ..imagebase
+$L$SEH_info_ocb_enc:
+DB      9,0,0,0
+        DD      ocb_se_handler wrt ..imagebase
+        DD      $L$ocb_enc_body wrt ..imagebase,$L$ocb_enc_epilogue wrt ..imagebase
+        DD      $L$ocb_enc_pop wrt ..imagebase
+        DD      0
+$L$SEH_info_ocb_dec:
+DB      9,0,0,0
+        DD      ocb_se_handler wrt ..imagebase
+        DD      $L$ocb_dec_body wrt ..imagebase,$L$ocb_dec_epilogue wrt ..imagebase
+        DD      $L$ocb_dec_pop wrt ..imagebase
+        DD      0
+$L$SEH_info_cbc:
+DB      9,0,0,0
+        DD      cbc_se_handler wrt ..imagebase
+$L$SEH_info_key:
+DB      0x01,0x04,0x01,0x00
+DB      0x04,0x02,0x00,0x00
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm
new file mode 100644
index 0000000000..e6a5733924
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm
@@ -0,0 +1,1170 @@
+; Copyright 2011-2019 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN   16
+_vpaes_encrypt_core:
+
+        mov     r9,rdx
+        mov     r11,16
+        mov     eax,DWORD[240+rdx]
+        movdqa  xmm1,xmm9
+        movdqa  xmm2,XMMWORD[$L$k_ipt]
+        pandn   xmm1,xmm0
+        movdqu  xmm5,XMMWORD[r9]
+        psrld   xmm1,4
+        pand    xmm0,xmm9
+DB      102,15,56,0,208
+        movdqa  xmm0,XMMWORD[(($L$k_ipt+16))]
+DB      102,15,56,0,193
+        pxor    xmm2,xmm5
+        add     r9,16
+        pxor    xmm0,xmm2
+        lea     r10,[$L$k_mc_backward]
+        jmp     NEAR $L$enc_entry
+
+ALIGN   16
+$L$enc_loop:
+
+        movdqa  xmm4,xmm13
+        movdqa  xmm0,xmm12
+DB      102,15,56,0,226
+DB      102,15,56,0,195
+        pxor    xmm4,xmm5
+        movdqa  xmm5,xmm15
+        pxor    xmm0,xmm4
+        movdqa  xmm1,XMMWORD[((-64))+r10*1+r11]
+DB      102,15,56,0,234
+        movdqa  xmm4,XMMWORD[r10*1+r11]
+        movdqa  xmm2,xmm14
+DB      102,15,56,0,211
+        movdqa  xmm3,xmm0
+        pxor    xmm2,xmm5
+DB      102,15,56,0,193
+        add     r9,16
+        pxor    xmm0,xmm2
+DB      102,15,56,0,220
+        add     r11,16
+        pxor    xmm3,xmm0
+DB      102,15,56,0,193
+        and     r11,0x30
+        sub     rax,1
+        pxor    xmm0,xmm3
+
+$L$enc_entry:
+
+        movdqa  xmm1,xmm9
+        movdqa  xmm5,xmm11
+        pandn   xmm1,xmm0
+        psrld   xmm1,4
+        pand    xmm0,xmm9
+DB      102,15,56,0,232
+        movdqa  xmm3,xmm10
+        pxor    xmm0,xmm1
+DB      102,15,56,0,217
+        movdqa  xmm4,xmm10
+        pxor    xmm3,xmm5
+DB      102,15,56,0,224
+        movdqa  xmm2,xmm10
+        pxor    xmm4,xmm5
+DB      102,15,56,0,211
+        movdqa  xmm3,xmm10
+        pxor    xmm2,xmm0
+DB      102,15,56,0,220
+        movdqu  xmm5,XMMWORD[r9]
+        pxor    xmm3,xmm1
+        jnz     NEAR $L$enc_loop
+
+
+        movdqa  xmm4,XMMWORD[((-96))+r10]
+        movdqa  xmm0,XMMWORD[((-80))+r10]
+DB      102,15,56,0,226
+        pxor    xmm4,xmm5
+DB      102,15,56,0,195
+        movdqa  xmm1,XMMWORD[64+r10*1+r11]
+        pxor    xmm0,xmm4
+DB      102,15,56,0,193
+        DB      0F3h,0C3h               ;repret
+
+
+
+
+
+
+
+
+
+ALIGN   16
+_vpaes_decrypt_core:
+
+        mov     r9,rdx
+        mov     eax,DWORD[240+rdx]
+        movdqa  xmm1,xmm9
+        movdqa  xmm2,XMMWORD[$L$k_dipt]
+        pandn   xmm1,xmm0
+        mov     r11,rax
+        psrld   xmm1,4
+        movdqu  xmm5,XMMWORD[r9]
+        shl     r11,4
+        pand    xmm0,xmm9
+DB      102,15,56,0,208
+        movdqa  xmm0,XMMWORD[(($L$k_dipt+16))]
+        xor     r11,0x30
+        lea     r10,[$L$k_dsbd]
+DB      102,15,56,0,193
+        and     r11,0x30
+        pxor    xmm2,xmm5
+        movdqa  xmm5,XMMWORD[(($L$k_mc_forward+48))]
+        pxor    xmm0,xmm2
+        add     r9,16
+        add     r11,r10
+        jmp     NEAR $L$dec_entry
+
+ALIGN   16
+$L$dec_loop:
+
+
+
+        movdqa  xmm4,XMMWORD[((-32))+r10]
+        movdqa  xmm1,XMMWORD[((-16))+r10]
+DB      102,15,56,0,226
+DB      102,15,56,0,203
+        pxor    xmm0,xmm4
+        movdqa  xmm4,XMMWORD[r10]
+        pxor    xmm0,xmm1
+        movdqa  xmm1,XMMWORD[16+r10]
+
+DB      102,15,56,0,226
+DB      102,15,56,0,197
+DB      102,15,56,0,203
+        pxor    xmm0,xmm4
+        movdqa  xmm4,XMMWORD[32+r10]
+        pxor    xmm0,xmm1
+        movdqa  xmm1,XMMWORD[48+r10]
+
+DB      102,15,56,0,226
+DB      102,15,56,0,197
+DB      102,15,56,0,203
+        pxor    xmm0,xmm4
+        movdqa  xmm4,XMMWORD[64+r10]
+        pxor    xmm0,xmm1
+        movdqa  xmm1,XMMWORD[80+r10]
+
+DB      102,15,56,0,226
+DB      102,15,56,0,197
+DB      102,15,56,0,203
+        pxor    xmm0,xmm4
+        add     r9,16
+DB      102,15,58,15,237,12
+        pxor    xmm0,xmm1
+        sub     rax,1
+
+$L$dec_entry:
+
+        movdqa  xmm1,xmm9
+        pandn   xmm1,xmm0
+        movdqa  xmm2,xmm11
+        psrld   xmm1,4
+        pand    xmm0,xmm9
+DB      102,15,56,0,208
+        movdqa  xmm3,xmm10
+        pxor    xmm0,xmm1
+DB      102,15,56,0,217
+        movdqa  xmm4,xmm10
+        pxor    xmm3,xmm2
+DB      102,15,56,0,224
+        pxor    xmm4,xmm2
+        movdqa  xmm2,xmm10
+DB      102,15,56,0,211
+        movdqa  xmm3,xmm10
+        pxor    xmm2,xmm0
+DB      102,15,56,0,220
+        movdqu  xmm0,XMMWORD[r9]
+        pxor    xmm3,xmm1
+        jnz     NEAR $L$dec_loop
+
+
+        movdqa  xmm4,XMMWORD[96+r10]
+DB      102,15,56,0,226
+        pxor    xmm4,xmm0
+        movdqa  xmm0,XMMWORD[112+r10]
+        movdqa  xmm2,XMMWORD[((-352))+r11]
+DB      102,15,56,0,195
+        pxor    xmm0,xmm4
+DB      102,15,56,0,194
+        DB      0F3h,0C3h               ;repret
+
+
+
+
+
+
+
+
+
+ALIGN   16
+_vpaes_schedule_core:
+
+
+
+
+
+
+        call    _vpaes_preheat
+        movdqa  xmm8,XMMWORD[$L$k_rcon]
+        movdqu  xmm0,XMMWORD[rdi]
+
+
+        movdqa  xmm3,xmm0
+        lea     r11,[$L$k_ipt]
+        call    _vpaes_schedule_transform
+        movdqa  xmm7,xmm0
+
+        lea     r10,[$L$k_sr]
+        test    rcx,rcx
+        jnz     NEAR $L$schedule_am_decrypting
+
+
+        movdqu  XMMWORD[rdx],xmm0
+        jmp     NEAR $L$schedule_go
+
+$L$schedule_am_decrypting:
+
+        movdqa  xmm1,XMMWORD[r10*1+r8]
+DB      102,15,56,0,217
+        movdqu  XMMWORD[rdx],xmm3
+        xor     r8,0x30
+
+$L$schedule_go:
+        cmp     esi,192
+        ja      NEAR $L$schedule_256
+        je      NEAR $L$schedule_192
+
+
+
+
+
+
+
+
+
+
+$L$schedule_128:
+        mov     esi,10
+
+$L$oop_schedule_128:
+        call    _vpaes_schedule_round
+        dec     rsi
+        jz      NEAR $L$schedule_mangle_last
+        call    _vpaes_schedule_mangle
+        jmp     NEAR $L$oop_schedule_128
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN   16
+$L$schedule_192:
+        movdqu  xmm0,XMMWORD[8+rdi]
+        call    _vpaes_schedule_transform
+        movdqa  xmm6,xmm0
+        pxor    xmm4,xmm4
+        movhlps xmm6,xmm4
+        mov     esi,4
+
+$L$oop_schedule_192:
+        call    _vpaes_schedule_round
+DB      102,15,58,15,198,8
+        call    _vpaes_schedule_mangle
+        call    _vpaes_schedule_192_smear
+        call    _vpaes_schedule_mangle
+        call    _vpaes_schedule_round
+        dec     rsi
+        jz      NEAR $L$schedule_mangle_last
+        call    _vpaes_schedule_mangle
+        call    _vpaes_schedule_192_smear
+        jmp     NEAR $L$oop_schedule_192
+
+
+
+
+
+
+
+
+
+
+
+ALIGN   16
+$L$schedule_256:
+        movdqu  xmm0,XMMWORD[16+rdi]
+        call    _vpaes_schedule_transform
+        mov     esi,7
+
+$L$oop_schedule_256:
+        call    _vpaes_schedule_mangle
+        movdqa  xmm6,xmm0
+
+
+        call    _vpaes_schedule_round
+        dec     rsi
+        jz      NEAR $L$schedule_mangle_last
+        call    _vpaes_schedule_mangle
+
+
+        pshufd  xmm0,xmm0,0xFF
+        movdqa  xmm5,xmm7
+        movdqa  xmm7,xmm6
+        call    _vpaes_schedule_low_round
+        movdqa  xmm7,xmm5
+
+        jmp     NEAR $L$oop_schedule_256
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN   16
+$L$schedule_mangle_last:
+
+        lea     r11,[$L$k_deskew]
+        test    rcx,rcx
+        jnz     NEAR $L$schedule_mangle_last_dec
+
+
+        movdqa  xmm1,XMMWORD[r10*1+r8]
+DB      102,15,56,0,193
+        lea     r11,[$L$k_opt]
+        add     rdx,32
+
+$L$schedule_mangle_last_dec:
+        add     rdx,-16
+        pxor    xmm0,XMMWORD[$L$k_s63]
+        call    _vpaes_schedule_transform
+        movdqu  XMMWORD[rdx],xmm0
+
+
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        pxor    xmm6,xmm6
+        pxor    xmm7,xmm7
+        DB      0F3h,0C3h               ;repret
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN   16
+_vpaes_schedule_192_smear:
+
+        pshufd  xmm1,xmm6,0x80
+        pshufd  xmm0,xmm7,0xFE
+        pxor    xmm6,xmm1
+        pxor    xmm1,xmm1
+        pxor    xmm6,xmm0
+        movdqa  xmm0,xmm6
+        movhlps xmm6,xmm1
+        DB      0F3h,0C3h               ;repret
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN   16
+_vpaes_schedule_round:
+
+
+        pxor    xmm1,xmm1
+DB      102,65,15,58,15,200,15
+DB      102,69,15,58,15,192,15
+        pxor    xmm7,xmm1
+
+
+        pshufd  xmm0,xmm0,0xFF
+DB      102,15,58,15,192,1
+
+
+
+
+_vpaes_schedule_low_round:
+
+        movdqa  xmm1,xmm7
+        pslldq  xmm7,4
+        pxor    xmm7,xmm1
+        movdqa  xmm1,xmm7
+        pslldq  xmm7,8
+        pxor    xmm7,xmm1
+        pxor    xmm7,XMMWORD[$L$k_s63]
+
+
+        movdqa  xmm1,xmm9
+        pandn   xmm1,xmm0
+        psrld   xmm1,4
+        pand    xmm0,xmm9
+        movdqa  xmm2,xmm11
+DB      102,15,56,0,208
+        pxor    xmm0,xmm1
+        movdqa  xmm3,xmm10
+DB      102,15,56,0,217
+        pxor    xmm3,xmm2
+        movdqa  xmm4,xmm10
+DB      102,15,56,0,224
+        pxor    xmm4,xmm2
+        movdqa  xmm2,xmm10
+DB      102,15,56,0,211
+        pxor    xmm2,xmm0
+        movdqa  xmm3,xmm10
+DB      102,15,56,0,220
+        pxor    xmm3,xmm1
+        movdqa  xmm4,xmm13
+DB      102,15,56,0,226
+        movdqa  xmm0,xmm12
+DB      102,15,56,0,195
+        pxor    xmm0,xmm4
+
+
+        pxor    xmm0,xmm7
+        movdqa  xmm7,xmm0
+        DB      0F3h,0C3h               ;repret
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN   16
+_vpaes_schedule_transform:
+
+        movdqa  xmm1,xmm9
+        pandn   xmm1,xmm0
+        psrld   xmm1,4
+        pand    xmm0,xmm9
+        movdqa  xmm2,XMMWORD[r11]
+DB      102,15,56,0,208
+        movdqa  xmm0,XMMWORD[16+r11]
+DB      102,15,56,0,193
+        pxor    xmm0,xmm2
+        DB      0F3h,0C3h               ;repret
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ALIGN   16
+_vpaes_schedule_mangle:
+
+        movdqa  xmm4,xmm0
+        movdqa  xmm5,XMMWORD[$L$k_mc_forward]
+        test    rcx,rcx
+        jnz     NEAR $L$schedule_mangle_dec
+
+
+        add     rdx,16
+        pxor    xmm4,XMMWORD[$L$k_s63]
+DB      102,15,56,0,229
+        movdqa  xmm3,xmm4
+DB      102,15,56,0,229
+        pxor    xmm3,xmm4
+DB      102,15,56,0,229
+        pxor    xmm3,xmm4
+
+        jmp     NEAR $L$schedule_mangle_both
+ALIGN   16
+$L$schedule_mangle_dec:
+
+        lea     r11,[$L$k_dksd]
+        movdqa  xmm1,xmm9
+        pandn   xmm1,xmm4
+        psrld   xmm1,4
+        pand    xmm4,xmm9
+
+        movdqa  xmm2,XMMWORD[r11]
+DB      102,15,56,0,212
+        movdqa  xmm3,XMMWORD[16+r11]
+DB      102,15,56,0,217
+        pxor    xmm3,xmm2
+DB      102,15,56,0,221
+
+        movdqa  xmm2,XMMWORD[32+r11]
+DB      102,15,56,0,212
+        pxor    xmm2,xmm3
+        movdqa  xmm3,XMMWORD[48+r11]
+DB      102,15,56,0,217
+        pxor    xmm3,xmm2
+DB      102,15,56,0,221
+
+        movdqa  xmm2,XMMWORD[64+r11]
+DB      102,15,56,0,212
+        pxor    xmm2,xmm3
+        movdqa  xmm3,XMMWORD[80+r11]
+DB      102,15,56,0,217
+        pxor    xmm3,xmm2
+DB      102,15,56,0,221
+
+        movdqa  xmm2,XMMWORD[96+r11]
+DB      102,15,56,0,212
+        pxor    xmm2,xmm3
+        movdqa  xmm3,XMMWORD[112+r11]
+DB      102,15,56,0,217
+        pxor    xmm3,xmm2
+
+        add     rdx,-16
+
+$L$schedule_mangle_both:
+        movdqa  xmm1,XMMWORD[r10*1+r8]
+DB      102,15,56,0,217
+        add     r8,-16
+        and     r8,0x30
+        movdqu  XMMWORD[rdx],xmm3
+        DB      0F3h,0C3h               ;repret
+
+
+
+
+
+
+global  vpaes_set_encrypt_key
+
+ALIGN   16
+vpaes_set_encrypt_key:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_vpaes_set_encrypt_key:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        lea     rsp,[((-184))+rsp]
+        movaps  XMMWORD[16+rsp],xmm6
+        movaps  XMMWORD[32+rsp],xmm7
+        movaps  XMMWORD[48+rsp],xmm8
+        movaps  XMMWORD[64+rsp],xmm9
+        movaps  XMMWORD[80+rsp],xmm10
+        movaps  XMMWORD[96+rsp],xmm11
+        movaps  XMMWORD[112+rsp],xmm12
+        movaps  XMMWORD[128+rsp],xmm13
+        movaps  XMMWORD[144+rsp],xmm14
+        movaps  XMMWORD[160+rsp],xmm15
+$L$enc_key_body:
+        mov     eax,esi
+        shr     eax,5
+        add     eax,5
+        mov     DWORD[240+rdx],eax
+
+        mov     ecx,0
+        mov     r8d,0x30
+        call    _vpaes_schedule_core
+        movaps  xmm6,XMMWORD[16+rsp]
+        movaps  xmm7,XMMWORD[32+rsp]
+        movaps  xmm8,XMMWORD[48+rsp]
+        movaps  xmm9,XMMWORD[64+rsp]
+        movaps  xmm10,XMMWORD[80+rsp]
+        movaps  xmm11,XMMWORD[96+rsp]
+        movaps  xmm12,XMMWORD[112+rsp]
+        movaps  xmm13,XMMWORD[128+rsp]
+        movaps  xmm14,XMMWORD[144+rsp]
+        movaps  xmm15,XMMWORD[160+rsp]
+        lea     rsp,[184+rsp]
+$L$enc_key_epilogue:
+        xor     eax,eax
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_vpaes_set_encrypt_key:
+
+global  vpaes_set_decrypt_key
+
+ALIGN   16
+vpaes_set_decrypt_key:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_vpaes_set_decrypt_key:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        lea     rsp,[((-184))+rsp]
+        movaps  XMMWORD[16+rsp],xmm6
+        movaps  XMMWORD[32+rsp],xmm7
+        movaps  XMMWORD[48+rsp],xmm8
+        movaps  XMMWORD[64+rsp],xmm9
+        movaps  XMMWORD[80+rsp],xmm10
+        movaps  XMMWORD[96+rsp],xmm11
+        movaps  XMMWORD[112+rsp],xmm12
+        movaps  XMMWORD[128+rsp],xmm13
+        movaps  XMMWORD[144+rsp],xmm14
+        movaps  XMMWORD[160+rsp],xmm15
+$L$dec_key_body:
+        mov     eax,esi
+        shr     eax,5
+        add     eax,5
+        mov     DWORD[240+rdx],eax
+        shl     eax,4
+        lea     rdx,[16+rax*1+rdx]
+
+        mov     ecx,1
+        mov     r8d,esi
+        shr     r8d,1
+        and     r8d,32
+        xor     r8d,32
+        call    _vpaes_schedule_core
+        movaps  xmm6,XMMWORD[16+rsp]
+        movaps  xmm7,XMMWORD[32+rsp]
+        movaps  xmm8,XMMWORD[48+rsp]
+        movaps  xmm9,XMMWORD[64+rsp]
+        movaps  xmm10,XMMWORD[80+rsp]
+        movaps  xmm11,XMMWORD[96+rsp]
+        movaps  xmm12,XMMWORD[112+rsp]
+        movaps  xmm13,XMMWORD[128+rsp]
+        movaps  xmm14,XMMWORD[144+rsp]
+        movaps  xmm15,XMMWORD[160+rsp]
+        lea     rsp,[184+rsp]
+$L$dec_key_epilogue:
+        xor     eax,eax
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_vpaes_set_decrypt_key:
+
+global  vpaes_encrypt
+
+ALIGN   16
+vpaes_encrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_vpaes_encrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        lea     rsp,[((-184))+rsp]
+        movaps  XMMWORD[16+rsp],xmm6
+        movaps  XMMWORD[32+rsp],xmm7
+        movaps  XMMWORD[48+rsp],xmm8
+        movaps  XMMWORD[64+rsp],xmm9
+        movaps  XMMWORD[80+rsp],xmm10
+        movaps  XMMWORD[96+rsp],xmm11
+        movaps  XMMWORD[112+rsp],xmm12
+        movaps  XMMWORD[128+rsp],xmm13
+        movaps  XMMWORD[144+rsp],xmm14
+        movaps  XMMWORD[160+rsp],xmm15
+$L$enc_body:
+        movdqu  xmm0,XMMWORD[rdi]
+        call    _vpaes_preheat
+        call    _vpaes_encrypt_core
+        movdqu  XMMWORD[rsi],xmm0
+        movaps  xmm6,XMMWORD[16+rsp]
+        movaps  xmm7,XMMWORD[32+rsp]
+        movaps  xmm8,XMMWORD[48+rsp]
+        movaps  xmm9,XMMWORD[64+rsp]
+        movaps  xmm10,XMMWORD[80+rsp]
+        movaps  xmm11,XMMWORD[96+rsp]
+        movaps  xmm12,XMMWORD[112+rsp]
+        movaps  xmm13,XMMWORD[128+rsp]
+        movaps  xmm14,XMMWORD[144+rsp]
+        movaps  xmm15,XMMWORD[160+rsp]
+        lea     rsp,[184+rsp]
+$L$enc_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_vpaes_encrypt:
+
+global  vpaes_decrypt
+
+ALIGN   16
+vpaes_decrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_vpaes_decrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        lea     rsp,[((-184))+rsp]
+        movaps  XMMWORD[16+rsp],xmm6
+        movaps  XMMWORD[32+rsp],xmm7
+        movaps  XMMWORD[48+rsp],xmm8
+        movaps  XMMWORD[64+rsp],xmm9
+        movaps  XMMWORD[80+rsp],xmm10
+        movaps  XMMWORD[96+rsp],xmm11
+        movaps  XMMWORD[112+rsp],xmm12
+        movaps  XMMWORD[128+rsp],xmm13
+        movaps  XMMWORD[144+rsp],xmm14
+        movaps  XMMWORD[160+rsp],xmm15
+$L$dec_body:
+        movdqu  xmm0,XMMWORD[rdi]
+        call    _vpaes_preheat
+        call    _vpaes_decrypt_core
+        movdqu  XMMWORD[rsi],xmm0
+        movaps  xmm6,XMMWORD[16+rsp]
+        movaps  xmm7,XMMWORD[32+rsp]
+        movaps  xmm8,XMMWORD[48+rsp]
+        movaps  xmm9,XMMWORD[64+rsp]
+        movaps  xmm10,XMMWORD[80+rsp]
+        movaps  xmm11,XMMWORD[96+rsp]
+        movaps  xmm12,XMMWORD[112+rsp]
+        movaps  xmm13,XMMWORD[128+rsp]
+        movaps  xmm14,XMMWORD[144+rsp]
+        movaps  xmm15,XMMWORD[160+rsp]
+        lea     rsp,[184+rsp]
+$L$dec_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_vpaes_decrypt:
+global  vpaes_cbc_encrypt
+
+ALIGN   16
+vpaes_cbc_encrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_vpaes_cbc_encrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        xchg    rdx,rcx
+        sub     rcx,16
+        jc      NEAR $L$cbc_abort
+        lea     rsp,[((-184))+rsp]
+        movaps  XMMWORD[16+rsp],xmm6
+        movaps  XMMWORD[32+rsp],xmm7
+        movaps  XMMWORD[48+rsp],xmm8
+        movaps  XMMWORD[64+rsp],xmm9
+        movaps  XMMWORD[80+rsp],xmm10
+        movaps  XMMWORD[96+rsp],xmm11
+        movaps  XMMWORD[112+rsp],xmm12
+        movaps  XMMWORD[128+rsp],xmm13
+        movaps  XMMWORD[144+rsp],xmm14
+        movaps  XMMWORD[160+rsp],xmm15
+$L$cbc_body:
+        movdqu  xmm6,XMMWORD[r8]
+        sub     rsi,rdi
+        call    _vpaes_preheat
+        cmp     r9d,0
+        je      NEAR $L$cbc_dec_loop
+        jmp     NEAR $L$cbc_enc_loop
+ALIGN   16
+$L$cbc_enc_loop:
+        movdqu  xmm0,XMMWORD[rdi]
+        pxor    xmm0,xmm6
+        call    _vpaes_encrypt_core
+        movdqa  xmm6,xmm0
+        movdqu  XMMWORD[rdi*1+rsi],xmm0
+        lea     rdi,[16+rdi]
+        sub     rcx,16
+        jnc     NEAR $L$cbc_enc_loop
+        jmp     NEAR $L$cbc_done
+ALIGN   16
+$L$cbc_dec_loop:
+        movdqu  xmm0,XMMWORD[rdi]
+        movdqa  xmm7,xmm0
+        call    _vpaes_decrypt_core
+        pxor    xmm0,xmm6
+        movdqa  xmm6,xmm7
+        movdqu  XMMWORD[rdi*1+rsi],xmm0
+        lea     rdi,[16+rdi]
+        sub     rcx,16
+        jnc     NEAR $L$cbc_dec_loop
+$L$cbc_done:
+        movdqu  XMMWORD[r8],xmm6
+        movaps  xmm6,XMMWORD[16+rsp]
+        movaps  xmm7,XMMWORD[32+rsp]
+        movaps  xmm8,XMMWORD[48+rsp]
+        movaps  xmm9,XMMWORD[64+rsp]
+        movaps  xmm10,XMMWORD[80+rsp]
+        movaps  xmm11,XMMWORD[96+rsp]
+        movaps  xmm12,XMMWORD[112+rsp]
+        movaps  xmm13,XMMWORD[128+rsp]
+        movaps  xmm14,XMMWORD[144+rsp]
+        movaps  xmm15,XMMWORD[160+rsp]
+        lea     rsp,[184+rsp]
+$L$cbc_epilogue:
+$L$cbc_abort:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_vpaes_cbc_encrypt:
+
+
+
+
+
+
+
+ALIGN   16
+_vpaes_preheat:
+
+        lea     r10,[$L$k_s0F]
+        movdqa  xmm10,XMMWORD[((-32))+r10]
+        movdqa  xmm11,XMMWORD[((-16))+r10]
+        movdqa  xmm9,XMMWORD[r10]
+        movdqa  xmm13,XMMWORD[48+r10]
+        movdqa  xmm12,XMMWORD[64+r10]
+        movdqa  xmm15,XMMWORD[80+r10]
+        movdqa  xmm14,XMMWORD[96+r10]
+        DB      0F3h,0C3h               ;repret
+
+
+
+
+
+
+
+
+ALIGN   64
+_vpaes_consts:
+$L$k_inv:
+        DQ      0x0E05060F0D080180,0x040703090A0B0C02
+        DQ      0x01040A060F0B0780,0x030D0E0C02050809
+
+$L$k_s0F:
+        DQ      0x0F0F0F0F0F0F0F0F,0x0F0F0F0F0F0F0F0F
+
+$L$k_ipt:
+        DQ      0xC2B2E8985A2A7000,0xCABAE09052227808
+        DQ      0x4C01307D317C4D00,0xCD80B1FCB0FDCC81
+
+$L$k_sb1:
+        DQ      0xB19BE18FCB503E00,0xA5DF7A6E142AF544
+        DQ      0x3618D415FAE22300,0x3BF7CCC10D2ED9EF
+$L$k_sb2:
+        DQ      0xE27A93C60B712400,0x5EB7E955BC982FCD
+        DQ      0x69EB88400AE12900,0xC2A163C8AB82234A
+$L$k_sbo:
+        DQ      0xD0D26D176FBDC700,0x15AABF7AC502A878
+        DQ      0xCFE474A55FBB6A00,0x8E1E90D1412B35FA
+
+$L$k_mc_forward:
+        DQ      0x0407060500030201,0x0C0F0E0D080B0A09
+        DQ      0x080B0A0904070605,0x000302010C0F0E0D
+        DQ      0x0C0F0E0D080B0A09,0x0407060500030201
+        DQ      0x000302010C0F0E0D,0x080B0A0904070605
+
+$L$k_mc_backward:
+        DQ      0x0605040702010003,0x0E0D0C0F0A09080B
+        DQ      0x020100030E0D0C0F,0x0A09080B06050407
+        DQ      0x0E0D0C0F0A09080B,0x0605040702010003
+        DQ      0x0A09080B06050407,0x020100030E0D0C0F
+
+$L$k_sr:
+        DQ      0x0706050403020100,0x0F0E0D0C0B0A0908
+        DQ      0x030E09040F0A0500,0x0B06010C07020D08
+        DQ      0x0F060D040B020900,0x070E050C030A0108
+        DQ      0x0B0E0104070A0D00,0x0306090C0F020508
+
+$L$k_rcon:
+        DQ      0x1F8391B9AF9DEEB6,0x702A98084D7C7D81
+
+$L$k_s63:
+        DQ      0x5B5B5B5B5B5B5B5B,0x5B5B5B5B5B5B5B5B
+
+$L$k_opt:
+        DQ      0xFF9F4929D6B66000,0xF7974121DEBE6808
+        DQ      0x01EDBD5150BCEC00,0xE10D5DB1B05C0CE0
+
+$L$k_deskew:
+        DQ      0x07E4A34047A4E300,0x1DFEB95A5DBEF91A
+        DQ      0x5F36B5DC83EA6900,0x2841C2ABF49D1E77
+
+
+
+
+
+$L$k_dksd:
+        DQ      0xFEB91A5DA3E44700,0x0740E3A45A1DBEF9
+        DQ      0x41C277F4B5368300,0x5FDC69EAAB289D1E
+$L$k_dksb:
+        DQ      0x9A4FCA1F8550D500,0x03D653861CC94C99
+        DQ      0x115BEDA7B6FC4A00,0xD993256F7E3482C8
+$L$k_dkse:
+        DQ      0xD5031CCA1FC9D600,0x53859A4C994F5086
+        DQ      0xA23196054FDC7BE8,0xCD5EF96A20B31487
+$L$k_dks9:
+        DQ      0xB6116FC87ED9A700,0x4AED933482255BFC
+        DQ      0x4576516227143300,0x8BB89FACE9DAFDCE
+
+
+
+
+
+$L$k_dipt:
+        DQ      0x0F505B040B545F00,0x154A411E114E451A
+        DQ      0x86E383E660056500,0x12771772F491F194
+
+$L$k_dsb9:
+        DQ      0x851C03539A86D600,0xCAD51F504F994CC9
+        DQ      0xC03B1789ECD74900,0x725E2C9EB2FBA565
+$L$k_dsbd:
+        DQ      0x7D57CCDFE6B1A200,0xF56E9B13882A4439
+        DQ      0x3CE2FAF724C6CB00,0x2931180D15DEEFD3
+$L$k_dsbb:
+        DQ      0xD022649296B44200,0x602646F6B0F2D404
+        DQ      0xC19498A6CD596700,0xF3FF0C3E3255AA6B
+$L$k_dsbe:
+        DQ      0x46F2929626D4D000,0x2242600464B4F6B0
+        DQ      0x0C55A6CDFFAAC100,0x9467F36B98593E32
+$L$k_dsbo:
+        DQ      0x1387EA537EF94000,0xC7AA6DB9D4943E2D
+        DQ      0x12D7560F93441D00,0xCA4B8159D8C58E9C
+DB      86,101,99,116,111,114,32,80,101,114,109,117,116,97,116,105
+DB      111,110,32,65,69,83,32,102,111,114,32,120,56,54,95,54
+DB      52,47,83,83,83,69,51,44,32,77,105,107,101,32,72,97
+DB      109,98,117,114,103,32,40,83,116,97,110,102,111,114,100,32
+DB      85,110,105,118,101,114,115,105,116,121,41,0
+ALIGN   64
+
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        lea     rsi,[16+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+        lea     rax,[184+rax]
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_vpaes_set_encrypt_key wrt ..imagebase
+        DD      $L$SEH_end_vpaes_set_encrypt_key wrt ..imagebase
+        DD      $L$SEH_info_vpaes_set_encrypt_key wrt ..imagebase
+
+        DD      $L$SEH_begin_vpaes_set_decrypt_key wrt ..imagebase
+        DD      $L$SEH_end_vpaes_set_decrypt_key wrt ..imagebase
+        DD      $L$SEH_info_vpaes_set_decrypt_key wrt ..imagebase
+
+        DD      $L$SEH_begin_vpaes_encrypt wrt ..imagebase
+        DD      $L$SEH_end_vpaes_encrypt wrt ..imagebase
+        DD      $L$SEH_info_vpaes_encrypt wrt ..imagebase
+
+        DD      $L$SEH_begin_vpaes_decrypt wrt ..imagebase
+        DD      $L$SEH_end_vpaes_decrypt wrt ..imagebase
+        DD      $L$SEH_info_vpaes_decrypt wrt ..imagebase
+
+        DD      $L$SEH_begin_vpaes_cbc_encrypt wrt ..imagebase
+        DD      $L$SEH_end_vpaes_cbc_encrypt wrt ..imagebase
+        DD      $L$SEH_info_vpaes_cbc_encrypt wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_vpaes_set_encrypt_key:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$enc_key_body wrt ..imagebase,$L$enc_key_epilogue wrt ..imagebase
+$L$SEH_info_vpaes_set_decrypt_key:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$dec_key_body wrt ..imagebase,$L$dec_key_epilogue wrt ..imagebase
+$L$SEH_info_vpaes_encrypt:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$enc_body wrt ..imagebase,$L$enc_epilogue wrt ..imagebase
+$L$SEH_info_vpaes_decrypt:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$dec_body wrt ..imagebase,$L$dec_epilogue wrt ..imagebase
+$L$SEH_info_vpaes_cbc_encrypt:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$cbc_body wrt ..imagebase,$L$cbc_epilogue wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm
new file mode 100644
index 0000000000..69443b7261
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm
@@ -0,0 +1,1989 @@
+; Copyright 2013-2019 The OpenSSL Project Authors. All Rights Reserved.
+; Copyright (c) 2012, Intel Corporation. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+global  rsaz_1024_sqr_avx2
+
+ALIGN   64
+rsaz_1024_sqr_avx2:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_rsaz_1024_sqr_avx2:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+
+
+
+        lea     rax,[rsp]
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        vzeroupper
+        lea     rsp,[((-168))+rsp]
+        vmovaps XMMWORD[(-216)+rax],xmm6
+        vmovaps XMMWORD[(-200)+rax],xmm7
+        vmovaps XMMWORD[(-184)+rax],xmm8
+        vmovaps XMMWORD[(-168)+rax],xmm9
+        vmovaps XMMWORD[(-152)+rax],xmm10
+        vmovaps XMMWORD[(-136)+rax],xmm11
+        vmovaps XMMWORD[(-120)+rax],xmm12
+        vmovaps XMMWORD[(-104)+rax],xmm13
+        vmovaps XMMWORD[(-88)+rax],xmm14
+        vmovaps XMMWORD[(-72)+rax],xmm15
+$L$sqr_1024_body:
+        mov     rbp,rax
+
+        mov     r13,rdx
+        sub     rsp,832
+        mov     r15,r13
+        sub     rdi,-128
+        sub     rsi,-128
+        sub     r13,-128
+
+        and     r15,4095
+        add     r15,32*10
+        shr     r15,12
+        vpxor   ymm9,ymm9,ymm9
+        jz      NEAR $L$sqr_1024_no_n_copy
+
+
+
+
+
+        sub     rsp,32*10
+        vmovdqu ymm0,YMMWORD[((0-128))+r13]
+        and     rsp,-2048
+        vmovdqu ymm1,YMMWORD[((32-128))+r13]
+        vmovdqu ymm2,YMMWORD[((64-128))+r13]
+        vmovdqu ymm3,YMMWORD[((96-128))+r13]
+        vmovdqu ymm4,YMMWORD[((128-128))+r13]
+        vmovdqu ymm5,YMMWORD[((160-128))+r13]
+        vmovdqu ymm6,YMMWORD[((192-128))+r13]
+        vmovdqu ymm7,YMMWORD[((224-128))+r13]
+        vmovdqu ymm8,YMMWORD[((256-128))+r13]
+        lea     r13,[((832+128))+rsp]
+        vmovdqu YMMWORD[(0-128)+r13],ymm0
+        vmovdqu YMMWORD[(32-128)+r13],ymm1
+        vmovdqu YMMWORD[(64-128)+r13],ymm2
+        vmovdqu YMMWORD[(96-128)+r13],ymm3
+        vmovdqu YMMWORD[(128-128)+r13],ymm4
+        vmovdqu YMMWORD[(160-128)+r13],ymm5
+        vmovdqu YMMWORD[(192-128)+r13],ymm6
+        vmovdqu YMMWORD[(224-128)+r13],ymm7
+        vmovdqu YMMWORD[(256-128)+r13],ymm8
+        vmovdqu YMMWORD[(288-128)+r13],ymm9
+
+$L$sqr_1024_no_n_copy:
+        and     rsp,-1024
+
+        vmovdqu ymm1,YMMWORD[((32-128))+rsi]
+        vmovdqu ymm2,YMMWORD[((64-128))+rsi]
+        vmovdqu ymm3,YMMWORD[((96-128))+rsi]
+        vmovdqu ymm4,YMMWORD[((128-128))+rsi]
+        vmovdqu ymm5,YMMWORD[((160-128))+rsi]
+        vmovdqu ymm6,YMMWORD[((192-128))+rsi]
+        vmovdqu ymm7,YMMWORD[((224-128))+rsi]
+        vmovdqu ymm8,YMMWORD[((256-128))+rsi]
+
+        lea     rbx,[192+rsp]
+        vmovdqu ymm15,YMMWORD[$L$and_mask]
+        jmp     NEAR $L$OOP_GRANDE_SQR_1024
+
+ALIGN   32
+$L$OOP_GRANDE_SQR_1024:
+        lea     r9,[((576+128))+rsp]
+        lea     r12,[448+rsp]
+
+
+
+
+        vpaddq  ymm1,ymm1,ymm1
+        vpbroadcastq    ymm10,QWORD[((0-128))+rsi]
+        vpaddq  ymm2,ymm2,ymm2
+        vmovdqa YMMWORD[(0-128)+r9],ymm1
+        vpaddq  ymm3,ymm3,ymm3
+        vmovdqa YMMWORD[(32-128)+r9],ymm2
+        vpaddq  ymm4,ymm4,ymm4
+        vmovdqa YMMWORD[(64-128)+r9],ymm3
+        vpaddq  ymm5,ymm5,ymm5
+        vmovdqa YMMWORD[(96-128)+r9],ymm4
+        vpaddq  ymm6,ymm6,ymm6
+        vmovdqa YMMWORD[(128-128)+r9],ymm5
+        vpaddq  ymm7,ymm7,ymm7
+        vmovdqa YMMWORD[(160-128)+r9],ymm6
+        vpaddq  ymm8,ymm8,ymm8
+        vmovdqa YMMWORD[(192-128)+r9],ymm7
+        vpxor   ymm9,ymm9,ymm9
+        vmovdqa YMMWORD[(224-128)+r9],ymm8
+
+        vpmuludq        ymm0,ymm10,YMMWORD[((0-128))+rsi]
+        vpbroadcastq    ymm11,QWORD[((32-128))+rsi]
+        vmovdqu YMMWORD[(288-192)+rbx],ymm9
+        vpmuludq        ymm1,ymm1,ymm10
+        vmovdqu YMMWORD[(320-448)+r12],ymm9
+        vpmuludq        ymm2,ymm2,ymm10
+        vmovdqu YMMWORD[(352-448)+r12],ymm9
+        vpmuludq        ymm3,ymm3,ymm10
+        vmovdqu YMMWORD[(384-448)+r12],ymm9
+        vpmuludq        ymm4,ymm4,ymm10
+        vmovdqu YMMWORD[(416-448)+r12],ymm9
+        vpmuludq        ymm5,ymm5,ymm10
+        vmovdqu YMMWORD[(448-448)+r12],ymm9
+        vpmuludq        ymm6,ymm6,ymm10
+        vmovdqu YMMWORD[(480-448)+r12],ymm9
+        vpmuludq        ymm7,ymm7,ymm10
+        vmovdqu YMMWORD[(512-448)+r12],ymm9
+        vpmuludq        ymm8,ymm8,ymm10
+        vpbroadcastq    ymm10,QWORD[((64-128))+rsi]
+        vmovdqu YMMWORD[(544-448)+r12],ymm9
+
+        mov     r15,rsi
+        mov     r14d,4
+        jmp     NEAR $L$sqr_entry_1024
+ALIGN   32
+$L$OOP_SQR_1024:
+        vpbroadcastq    ymm11,QWORD[((32-128))+r15]
+        vpmuludq        ymm0,ymm10,YMMWORD[((0-128))+rsi]
+        vpaddq  ymm0,ymm0,YMMWORD[((0-192))+rbx]
+        vpmuludq        ymm1,ymm10,YMMWORD[((0-128))+r9]
+        vpaddq  ymm1,ymm1,YMMWORD[((32-192))+rbx]
+        vpmuludq        ymm2,ymm10,YMMWORD[((32-128))+r9]
+        vpaddq  ymm2,ymm2,YMMWORD[((64-192))+rbx]
+        vpmuludq        ymm3,ymm10,YMMWORD[((64-128))+r9]
+        vpaddq  ymm3,ymm3,YMMWORD[((96-192))+rbx]
+        vpmuludq        ymm4,ymm10,YMMWORD[((96-128))+r9]
+        vpaddq  ymm4,ymm4,YMMWORD[((128-192))+rbx]
+        vpmuludq        ymm5,ymm10,YMMWORD[((128-128))+r9]
+        vpaddq  ymm5,ymm5,YMMWORD[((160-192))+rbx]
+        vpmuludq        ymm6,ymm10,YMMWORD[((160-128))+r9]
+        vpaddq  ymm6,ymm6,YMMWORD[((192-192))+rbx]
+        vpmuludq        ymm7,ymm10,YMMWORD[((192-128))+r9]
+        vpaddq  ymm7,ymm7,YMMWORD[((224-192))+rbx]
+        vpmuludq        ymm8,ymm10,YMMWORD[((224-128))+r9]
+        vpbroadcastq    ymm10,QWORD[((64-128))+r15]
+        vpaddq  ymm8,ymm8,YMMWORD[((256-192))+rbx]
+$L$sqr_entry_1024:
+        vmovdqu YMMWORD[(0-192)+rbx],ymm0
+        vmovdqu YMMWORD[(32-192)+rbx],ymm1
+
+        vpmuludq        ymm12,ymm11,YMMWORD[((32-128))+rsi]
+        vpaddq  ymm2,ymm2,ymm12
+        vpmuludq        ymm14,ymm11,YMMWORD[((32-128))+r9]
+        vpaddq  ymm3,ymm3,ymm14
+        vpmuludq        ymm13,ymm11,YMMWORD[((64-128))+r9]
+        vpaddq  ymm4,ymm4,ymm13
+        vpmuludq        ymm12,ymm11,YMMWORD[((96-128))+r9]
+        vpaddq  ymm5,ymm5,ymm12
+        vpmuludq        ymm14,ymm11,YMMWORD[((128-128))+r9]
+        vpaddq  ymm6,ymm6,ymm14
+        vpmuludq        ymm13,ymm11,YMMWORD[((160-128))+r9]
+        vpaddq  ymm7,ymm7,ymm13
+        vpmuludq        ymm12,ymm11,YMMWORD[((192-128))+r9]
+        vpaddq  ymm8,ymm8,ymm12
+        vpmuludq        ymm0,ymm11,YMMWORD[((224-128))+r9]
+        vpbroadcastq    ymm11,QWORD[((96-128))+r15]
+        vpaddq  ymm0,ymm0,YMMWORD[((288-192))+rbx]
+
+        vmovdqu YMMWORD[(64-192)+rbx],ymm2
+        vmovdqu YMMWORD[(96-192)+rbx],ymm3
+
+        vpmuludq        ymm13,ymm10,YMMWORD[((64-128))+rsi]
+        vpaddq  ymm4,ymm4,ymm13
+        vpmuludq        ymm12,ymm10,YMMWORD[((64-128))+r9]
+        vpaddq  ymm5,ymm5,ymm12
+        vpmuludq        ymm14,ymm10,YMMWORD[((96-128))+r9]
+        vpaddq  ymm6,ymm6,ymm14
+        vpmuludq        ymm13,ymm10,YMMWORD[((128-128))+r9]
+        vpaddq  ymm7,ymm7,ymm13
+        vpmuludq        ymm12,ymm10,YMMWORD[((160-128))+r9]
+        vpaddq  ymm8,ymm8,ymm12
+        vpmuludq        ymm14,ymm10,YMMWORD[((192-128))+r9]
+        vpaddq  ymm0,ymm0,ymm14
+        vpmuludq        ymm1,ymm10,YMMWORD[((224-128))+r9]
+        vpbroadcastq    ymm10,QWORD[((128-128))+r15]
+        vpaddq  ymm1,ymm1,YMMWORD[((320-448))+r12]
+
+        vmovdqu YMMWORD[(128-192)+rbx],ymm4
+        vmovdqu YMMWORD[(160-192)+rbx],ymm5
+
+        vpmuludq        ymm12,ymm11,YMMWORD[((96-128))+rsi]
+        vpaddq  ymm6,ymm6,ymm12
+        vpmuludq        ymm14,ymm11,YMMWORD[((96-128))+r9]
+        vpaddq  ymm7,ymm7,ymm14
+        vpmuludq        ymm13,ymm11,YMMWORD[((128-128))+r9]
+        vpaddq  ymm8,ymm8,ymm13
+        vpmuludq        ymm12,ymm11,YMMWORD[((160-128))+r9]
+        vpaddq  ymm0,ymm0,ymm12
+        vpmuludq        ymm14,ymm11,YMMWORD[((192-128))+r9]
+        vpaddq  ymm1,ymm1,ymm14
+        vpmuludq        ymm2,ymm11,YMMWORD[((224-128))+r9]
+        vpbroadcastq    ymm11,QWORD[((160-128))+r15]
+        vpaddq  ymm2,ymm2,YMMWORD[((352-448))+r12]
+
+        vmovdqu YMMWORD[(192-192)+rbx],ymm6
+        vmovdqu YMMWORD[(224-192)+rbx],ymm7
+
+        vpmuludq        ymm12,ymm10,YMMWORD[((128-128))+rsi]
+        vpaddq  ymm8,ymm8,ymm12
+        vpmuludq        ymm14,ymm10,YMMWORD[((128-128))+r9]
+        vpaddq  ymm0,ymm0,ymm14
+        vpmuludq        ymm13,ymm10,YMMWORD[((160-128))+r9]
+        vpaddq  ymm1,ymm1,ymm13
+        vpmuludq        ymm12,ymm10,YMMWORD[((192-128))+r9]
+        vpaddq  ymm2,ymm2,ymm12
+        vpmuludq        ymm3,ymm10,YMMWORD[((224-128))+r9]
+        vpbroadcastq    ymm10,QWORD[((192-128))+r15]
+        vpaddq  ymm3,ymm3,YMMWORD[((384-448))+r12]
+
+        vmovdqu YMMWORD[(256-192)+rbx],ymm8
+        vmovdqu YMMWORD[(288-192)+rbx],ymm0
+        lea     rbx,[8+rbx]
+
+        vpmuludq        ymm13,ymm11,YMMWORD[((160-128))+rsi]
+        vpaddq  ymm1,ymm1,ymm13
+        vpmuludq        ymm12,ymm11,YMMWORD[((160-128))+r9]
+        vpaddq  ymm2,ymm2,ymm12
+        vpmuludq        ymm14,ymm11,YMMWORD[((192-128))+r9]
+        vpaddq  ymm3,ymm3,ymm14
+        vpmuludq        ymm4,ymm11,YMMWORD[((224-128))+r9]
+        vpbroadcastq    ymm11,QWORD[((224-128))+r15]
+        vpaddq  ymm4,ymm4,YMMWORD[((416-448))+r12]
+
+        vmovdqu YMMWORD[(320-448)+r12],ymm1
+        vmovdqu YMMWORD[(352-448)+r12],ymm2
+
+        vpmuludq        ymm12,ymm10,YMMWORD[((192-128))+rsi]
+        vpaddq  ymm3,ymm3,ymm12
+        vpmuludq        ymm14,ymm10,YMMWORD[((192-128))+r9]
+        vpbroadcastq    ymm0,QWORD[((256-128))+r15]
+        vpaddq  ymm4,ymm4,ymm14
+        vpmuludq        ymm5,ymm10,YMMWORD[((224-128))+r9]
+        vpbroadcastq    ymm10,QWORD[((0+8-128))+r15]
+        vpaddq  ymm5,ymm5,YMMWORD[((448-448))+r12]
+
+        vmovdqu YMMWORD[(384-448)+r12],ymm3
+        vmovdqu YMMWORD[(416-448)+r12],ymm4
+        lea     r15,[8+r15]
+
+        vpmuludq        ymm12,ymm11,YMMWORD[((224-128))+rsi]
+        vpaddq  ymm5,ymm5,ymm12
+        vpmuludq        ymm6,ymm11,YMMWORD[((224-128))+r9]
+        vpaddq  ymm6,ymm6,YMMWORD[((480-448))+r12]
+
+        vpmuludq        ymm7,ymm0,YMMWORD[((256-128))+rsi]
+        vmovdqu YMMWORD[(448-448)+r12],ymm5
+        vpaddq  ymm7,ymm7,YMMWORD[((512-448))+r12]
+        vmovdqu YMMWORD[(480-448)+r12],ymm6
+        vmovdqu YMMWORD[(512-448)+r12],ymm7
+        lea     r12,[8+r12]
+
+        dec     r14d
+        jnz     NEAR $L$OOP_SQR_1024
+
+        vmovdqu ymm8,YMMWORD[256+rsp]
+        vmovdqu ymm1,YMMWORD[288+rsp]
+        vmovdqu ymm2,YMMWORD[320+rsp]
+        lea     rbx,[192+rsp]
+
+        vpsrlq  ymm14,ymm8,29
+        vpand   ymm8,ymm8,ymm15
+        vpsrlq  ymm11,ymm1,29
+        vpand   ymm1,ymm1,ymm15
+
+        vpermq  ymm14,ymm14,0x93
+        vpxor   ymm9,ymm9,ymm9
+        vpermq  ymm11,ymm11,0x93
+
+        vpblendd        ymm10,ymm14,ymm9,3
+        vpblendd        ymm14,ymm11,ymm14,3
+        vpaddq  ymm8,ymm8,ymm10
+        vpblendd        ymm11,ymm9,ymm11,3
+        vpaddq  ymm1,ymm1,ymm14
+        vpaddq  ymm2,ymm2,ymm11
+        vmovdqu YMMWORD[(288-192)+rbx],ymm1
+        vmovdqu YMMWORD[(320-192)+rbx],ymm2
+
+        mov     rax,QWORD[rsp]
+        mov     r10,QWORD[8+rsp]
+        mov     r11,QWORD[16+rsp]
+        mov     r12,QWORD[24+rsp]
+        vmovdqu ymm1,YMMWORD[32+rsp]
+        vmovdqu ymm2,YMMWORD[((64-192))+rbx]
+        vmovdqu ymm3,YMMWORD[((96-192))+rbx]
+        vmovdqu ymm4,YMMWORD[((128-192))+rbx]
+        vmovdqu ymm5,YMMWORD[((160-192))+rbx]
+        vmovdqu ymm6,YMMWORD[((192-192))+rbx]
+        vmovdqu ymm7,YMMWORD[((224-192))+rbx]
+
+        mov     r9,rax
+        imul    eax,ecx
+        and     eax,0x1fffffff
+        vmovd   xmm12,eax
+
+        mov     rdx,rax
+        imul    rax,QWORD[((-128))+r13]
+        vpbroadcastq    ymm12,xmm12
+        add     r9,rax
+        mov     rax,rdx
+        imul    rax,QWORD[((8-128))+r13]
+        shr     r9,29
+        add     r10,rax
+        mov     rax,rdx
+        imul    rax,QWORD[((16-128))+r13]
+        add     r10,r9
+        add     r11,rax
+        imul    rdx,QWORD[((24-128))+r13]
+        add     r12,rdx
+
+        mov     rax,r10
+        imul    eax,ecx
+        and     eax,0x1fffffff
+
+        mov     r14d,9
+        jmp     NEAR $L$OOP_REDUCE_1024
+
+ALIGN   32
+$L$OOP_REDUCE_1024:
+        vmovd   xmm13,eax
+        vpbroadcastq    ymm13,xmm13
+
+        vpmuludq        ymm10,ymm12,YMMWORD[((32-128))+r13]
+        mov     rdx,rax
+        imul    rax,QWORD[((-128))+r13]
+        vpaddq  ymm1,ymm1,ymm10
+        add     r10,rax
+        vpmuludq        ymm14,ymm12,YMMWORD[((64-128))+r13]
+        mov     rax,rdx
+        imul    rax,QWORD[((8-128))+r13]
+        vpaddq  ymm2,ymm2,ymm14
+        vpmuludq        ymm11,ymm12,YMMWORD[((96-128))+r13]
+DB      0x67
+        add     r11,rax
+DB      0x67
+        mov     rax,rdx
+        imul    rax,QWORD[((16-128))+r13]
+        shr     r10,29
+        vpaddq  ymm3,ymm3,ymm11
+        vpmuludq        ymm10,ymm12,YMMWORD[((128-128))+r13]
+        add     r12,rax
+        add     r11,r10
+        vpaddq  ymm4,ymm4,ymm10
+        vpmuludq        ymm14,ymm12,YMMWORD[((160-128))+r13]
+        mov     rax,r11
+        imul    eax,ecx
+        vpaddq  ymm5,ymm5,ymm14
+        vpmuludq        ymm11,ymm12,YMMWORD[((192-128))+r13]
+        and     eax,0x1fffffff
+        vpaddq  ymm6,ymm6,ymm11
+        vpmuludq        ymm10,ymm12,YMMWORD[((224-128))+r13]
+        vpaddq  ymm7,ymm7,ymm10
+        vpmuludq        ymm14,ymm12,YMMWORD[((256-128))+r13]
+        vmovd   xmm12,eax
+
+        vpaddq  ymm8,ymm8,ymm14
+
+        vpbroadcastq    ymm12,xmm12
+
+        vpmuludq        ymm11,ymm13,YMMWORD[((32-8-128))+r13]
+        vmovdqu ymm14,YMMWORD[((96-8-128))+r13]
+        mov     rdx,rax
+        imul    rax,QWORD[((-128))+r13]
+        vpaddq  ymm1,ymm1,ymm11
+        vpmuludq        ymm10,ymm13,YMMWORD[((64-8-128))+r13]
+        vmovdqu ymm11,YMMWORD[((128-8-128))+r13]
+        add     r11,rax
+        mov     rax,rdx
+        imul    rax,QWORD[((8-128))+r13]
+        vpaddq  ymm2,ymm2,ymm10
+        add     rax,r12
+        shr     r11,29
+        vpmuludq        ymm14,ymm14,ymm13
+        vmovdqu ymm10,YMMWORD[((160-8-128))+r13]
+        add     rax,r11
+        vpaddq  ymm3,ymm3,ymm14
+        vpmuludq        ymm11,ymm11,ymm13
+        vmovdqu ymm14,YMMWORD[((192-8-128))+r13]
+DB      0x67
+        mov     r12,rax
+        imul    eax,ecx
+        vpaddq  ymm4,ymm4,ymm11
+        vpmuludq        ymm10,ymm10,ymm13
+DB      0xc4,0x41,0x7e,0x6f,0x9d,0x58,0x00,0x00,0x00
+        and     eax,0x1fffffff
+        vpaddq  ymm5,ymm5,ymm10
+        vpmuludq        ymm14,ymm14,ymm13
+        vmovdqu ymm10,YMMWORD[((256-8-128))+r13]
+        vpaddq  ymm6,ymm6,ymm14
+        vpmuludq        ymm11,ymm11,ymm13
+        vmovdqu ymm9,YMMWORD[((288-8-128))+r13]
+        vmovd   xmm0,eax
+        imul    rax,QWORD[((-128))+r13]
+        vpaddq  ymm7,ymm7,ymm11
+        vpmuludq        ymm10,ymm10,ymm13
+        vmovdqu ymm14,YMMWORD[((32-16-128))+r13]
+        vpbroadcastq    ymm0,xmm0
+        vpaddq  ymm8,ymm8,ymm10
+        vpmuludq        ymm9,ymm9,ymm13
+        vmovdqu ymm11,YMMWORD[((64-16-128))+r13]
+        add     r12,rax
+
+        vmovdqu ymm13,YMMWORD[((32-24-128))+r13]
+        vpmuludq        ymm14,ymm14,ymm12
+        vmovdqu ymm10,YMMWORD[((96-16-128))+r13]
+        vpaddq  ymm1,ymm1,ymm14
+        vpmuludq        ymm13,ymm13,ymm0
+        vpmuludq        ymm11,ymm11,ymm12
+DB      0xc4,0x41,0x7e,0x6f,0xb5,0xf0,0xff,0xff,0xff
+        vpaddq  ymm13,ymm13,ymm1
+        vpaddq  ymm2,ymm2,ymm11
+        vpmuludq        ymm10,ymm10,ymm12
+        vmovdqu ymm11,YMMWORD[((160-16-128))+r13]
+DB      0x67
+        vmovq   rax,xmm13
+        vmovdqu YMMWORD[rsp],ymm13
+        vpaddq  ymm3,ymm3,ymm10
+        vpmuludq        ymm14,ymm14,ymm12
+        vmovdqu ymm10,YMMWORD[((192-16-128))+r13]
+        vpaddq  ymm4,ymm4,ymm14
+        vpmuludq        ymm11,ymm11,ymm12
+        vmovdqu ymm14,YMMWORD[((224-16-128))+r13]
+        vpaddq  ymm5,ymm5,ymm11
+        vpmuludq        ymm10,ymm10,ymm12
+        vmovdqu ymm11,YMMWORD[((256-16-128))+r13]
+        vpaddq  ymm6,ymm6,ymm10
+        vpmuludq        ymm14,ymm14,ymm12
+        shr     r12,29
+        vmovdqu ymm10,YMMWORD[((288-16-128))+r13]
+        add     rax,r12
+        vpaddq  ymm7,ymm7,ymm14
+        vpmuludq        ymm11,ymm11,ymm12
+
+        mov     r9,rax
+        imul    eax,ecx
+        vpaddq  ymm8,ymm8,ymm11
+        vpmuludq        ymm10,ymm10,ymm12
+        and     eax,0x1fffffff
+        vmovd   xmm12,eax
+        vmovdqu ymm11,YMMWORD[((96-24-128))+r13]
+DB      0x67
+        vpaddq  ymm9,ymm9,ymm10
+        vpbroadcastq    ymm12,xmm12
+
+        vpmuludq        ymm14,ymm0,YMMWORD[((64-24-128))+r13]
+        vmovdqu ymm10,YMMWORD[((128-24-128))+r13]
+        mov     rdx,rax
+        imul    rax,QWORD[((-128))+r13]
+        mov     r10,QWORD[8+rsp]
+        vpaddq  ymm1,ymm2,ymm14
+        vpmuludq        ymm11,ymm11,ymm0
+        vmovdqu ymm14,YMMWORD[((160-24-128))+r13]
+        add     r9,rax
+        mov     rax,rdx
+        imul    rax,QWORD[((8-128))+r13]
+DB      0x67
+        shr     r9,29
+        mov     r11,QWORD[16+rsp]
+        vpaddq  ymm2,ymm3,ymm11
+        vpmuludq        ymm10,ymm10,ymm0
+        vmovdqu ymm11,YMMWORD[((192-24-128))+r13]
+        add     r10,rax
+        mov     rax,rdx
+        imul    rax,QWORD[((16-128))+r13]
+        vpaddq  ymm3,ymm4,ymm10
+        vpmuludq        ymm14,ymm14,ymm0
+        vmovdqu ymm10,YMMWORD[((224-24-128))+r13]
+        imul    rdx,QWORD[((24-128))+r13]
+        add     r11,rax
+        lea     rax,[r10*1+r9]
+        vpaddq  ymm4,ymm5,ymm14
+        vpmuludq        ymm11,ymm11,ymm0
+        vmovdqu ymm14,YMMWORD[((256-24-128))+r13]
+        mov     r10,rax
+        imul    eax,ecx
+        vpmuludq        ymm10,ymm10,ymm0
+        vpaddq  ymm5,ymm6,ymm11
+        vmovdqu ymm11,YMMWORD[((288-24-128))+r13]
+        and     eax,0x1fffffff
+        vpaddq  ymm6,ymm7,ymm10
+        vpmuludq        ymm14,ymm14,ymm0
+        add     rdx,QWORD[24+rsp]
+        vpaddq  ymm7,ymm8,ymm14
+        vpmuludq        ymm11,ymm11,ymm0
+        vpaddq  ymm8,ymm9,ymm11
+        vmovq   xmm9,r12
+        mov     r12,rdx
+
+        dec     r14d
+        jnz     NEAR $L$OOP_REDUCE_1024
+        lea     r12,[448+rsp]
+        vpaddq  ymm0,ymm13,ymm9
+        vpxor   ymm9,ymm9,ymm9
+
+        vpaddq  ymm0,ymm0,YMMWORD[((288-192))+rbx]
+        vpaddq  ymm1,ymm1,YMMWORD[((320-448))+r12]
+        vpaddq  ymm2,ymm2,YMMWORD[((352-448))+r12]
+        vpaddq  ymm3,ymm3,YMMWORD[((384-448))+r12]
+        vpaddq  ymm4,ymm4,YMMWORD[((416-448))+r12]
+        vpaddq  ymm5,ymm5,YMMWORD[((448-448))+r12]
+        vpaddq  ymm6,ymm6,YMMWORD[((480-448))+r12]
+        vpaddq  ymm7,ymm7,YMMWORD[((512-448))+r12]
+        vpaddq  ymm8,ymm8,YMMWORD[((544-448))+r12]
+
+        vpsrlq  ymm14,ymm0,29
+        vpand   ymm0,ymm0,ymm15
+        vpsrlq  ymm11,ymm1,29
+        vpand   ymm1,ymm1,ymm15
+        vpsrlq  ymm12,ymm2,29
+        vpermq  ymm14,ymm14,0x93
+        vpand   ymm2,ymm2,ymm15
+        vpsrlq  ymm13,ymm3,29
+        vpermq  ymm11,ymm11,0x93
+        vpand   ymm3,ymm3,ymm15
+        vpermq  ymm12,ymm12,0x93
+
+        vpblendd        ymm10,ymm14,ymm9,3
+        vpermq  ymm13,ymm13,0x93
+        vpblendd        ymm14,ymm11,ymm14,3
+        vpaddq  ymm0,ymm0,ymm10
+        vpblendd        ymm11,ymm12,ymm11,3
+        vpaddq  ymm1,ymm1,ymm14
+        vpblendd        ymm12,ymm13,ymm12,3
+        vpaddq  ymm2,ymm2,ymm11
+        vpblendd        ymm13,ymm9,ymm13,3
+        vpaddq  ymm3,ymm3,ymm12
+        vpaddq  ymm4,ymm4,ymm13
+
+        vpsrlq  ymm14,ymm0,29
+        vpand   ymm0,ymm0,ymm15
+        vpsrlq  ymm11,ymm1,29
+        vpand   ymm1,ymm1,ymm15
+        vpsrlq  ymm12,ymm2,29
+        vpermq  ymm14,ymm14,0x93
+        vpand   ymm2,ymm2,ymm15
+        vpsrlq  ymm13,ymm3,29
+        vpermq  ymm11,ymm11,0x93
+        vpand   ymm3,ymm3,ymm15
+        vpermq  ymm12,ymm12,0x93
+
+        vpblendd        ymm10,ymm14,ymm9,3
+        vpermq  ymm13,ymm13,0x93
+        vpblendd        ymm14,ymm11,ymm14,3
+        vpaddq  ymm0,ymm0,ymm10
+        vpblendd        ymm11,ymm12,ymm11,3
+        vpaddq  ymm1,ymm1,ymm14
+        vmovdqu YMMWORD[(0-128)+rdi],ymm0
+        vpblendd        ymm12,ymm13,ymm12,3
+        vpaddq  ymm2,ymm2,ymm11
+        vmovdqu YMMWORD[(32-128)+rdi],ymm1
+        vpblendd        ymm13,ymm9,ymm13,3
+        vpaddq  ymm3,ymm3,ymm12
+        vmovdqu YMMWORD[(64-128)+rdi],ymm2
+        vpaddq  ymm4,ymm4,ymm13
+        vmovdqu YMMWORD[(96-128)+rdi],ymm3
+        vpsrlq  ymm14,ymm4,29
+        vpand   ymm4,ymm4,ymm15
+        vpsrlq  ymm11,ymm5,29
+        vpand   ymm5,ymm5,ymm15
+        vpsrlq  ymm12,ymm6,29
+        vpermq  ymm14,ymm14,0x93
+        vpand   ymm6,ymm6,ymm15
+        vpsrlq  ymm13,ymm7,29
+        vpermq  ymm11,ymm11,0x93
+        vpand   ymm7,ymm7,ymm15
+        vpsrlq  ymm0,ymm8,29
+        vpermq  ymm12,ymm12,0x93
+        vpand   ymm8,ymm8,ymm15
+        vpermq  ymm13,ymm13,0x93
+
+        vpblendd        ymm10,ymm14,ymm9,3
+        vpermq  ymm0,ymm0,0x93
+        vpblendd        ymm14,ymm11,ymm14,3
+        vpaddq  ymm4,ymm4,ymm10
+        vpblendd        ymm11,ymm12,ymm11,3
+        vpaddq  ymm5,ymm5,ymm14
+        vpblendd        ymm12,ymm13,ymm12,3
+        vpaddq  ymm6,ymm6,ymm11
+        vpblendd        ymm13,ymm0,ymm13,3
+        vpaddq  ymm7,ymm7,ymm12
+        vpaddq  ymm8,ymm8,ymm13
+
+        vpsrlq  ymm14,ymm4,29
+        vpand   ymm4,ymm4,ymm15
+        vpsrlq  ymm11,ymm5,29
+        vpand   ymm5,ymm5,ymm15
+        vpsrlq  ymm12,ymm6,29
+        vpermq  ymm14,ymm14,0x93
+        vpand   ymm6,ymm6,ymm15
+        vpsrlq  ymm13,ymm7,29
+        vpermq  ymm11,ymm11,0x93
+        vpand   ymm7,ymm7,ymm15
+        vpsrlq  ymm0,ymm8,29
+        vpermq  ymm12,ymm12,0x93
+        vpand   ymm8,ymm8,ymm15
+        vpermq  ymm13,ymm13,0x93
+
+        vpblendd        ymm10,ymm14,ymm9,3
+        vpermq  ymm0,ymm0,0x93
+        vpblendd        ymm14,ymm11,ymm14,3
+        vpaddq  ymm4,ymm4,ymm10
+        vpblendd        ymm11,ymm12,ymm11,3
+        vpaddq  ymm5,ymm5,ymm14
+        vmovdqu YMMWORD[(128-128)+rdi],ymm4
+        vpblendd        ymm12,ymm13,ymm12,3
+        vpaddq  ymm6,ymm6,ymm11
+        vmovdqu YMMWORD[(160-128)+rdi],ymm5
+        vpblendd        ymm13,ymm0,ymm13,3
+        vpaddq  ymm7,ymm7,ymm12
+        vmovdqu YMMWORD[(192-128)+rdi],ymm6
+        vpaddq  ymm8,ymm8,ymm13
+        vmovdqu YMMWORD[(224-128)+rdi],ymm7
+        vmovdqu YMMWORD[(256-128)+rdi],ymm8
+
+        mov     rsi,rdi
+        dec     r8d
+        jne     NEAR $L$OOP_GRANDE_SQR_1024
+
+        vzeroall
+        mov     rax,rbp
+
+$L$sqr_1024_in_tail:
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+        movaps  xmm13,XMMWORD[((-104))+rax]
+        movaps  xmm14,XMMWORD[((-88))+rax]
+        movaps  xmm15,XMMWORD[((-72))+rax]
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$sqr_1024_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_rsaz_1024_sqr_avx2:
+global  rsaz_1024_mul_avx2
+
+ALIGN   64
+rsaz_1024_mul_avx2:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_rsaz_1024_mul_avx2:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+
+
+
+        lea     rax,[rsp]
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        vzeroupper
+        lea     rsp,[((-168))+rsp]
+        vmovaps XMMWORD[(-216)+rax],xmm6
+        vmovaps XMMWORD[(-200)+rax],xmm7
+        vmovaps XMMWORD[(-184)+rax],xmm8
+        vmovaps XMMWORD[(-168)+rax],xmm9
+        vmovaps XMMWORD[(-152)+rax],xmm10
+        vmovaps XMMWORD[(-136)+rax],xmm11
+        vmovaps XMMWORD[(-120)+rax],xmm12
+        vmovaps XMMWORD[(-104)+rax],xmm13
+        vmovaps XMMWORD[(-88)+rax],xmm14
+        vmovaps XMMWORD[(-72)+rax],xmm15
+$L$mul_1024_body:
+        mov     rbp,rax
+
+        vzeroall
+        mov     r13,rdx
+        sub     rsp,64
+
+
+
+
+
+
+DB      0x67,0x67
+        mov     r15,rsi
+        and     r15,4095
+        add     r15,32*10
+        shr     r15,12
+        mov     r15,rsi
+        cmovnz  rsi,r13
+        cmovnz  r13,r15
+
+        mov     r15,rcx
+        sub     rsi,-128
+        sub     rcx,-128
+        sub     rdi,-128
+
+        and     r15,4095
+        add     r15,32*10
+DB      0x67,0x67
+        shr     r15,12
+        jz      NEAR $L$mul_1024_no_n_copy
+
+
+
+
+
+        sub     rsp,32*10
+        vmovdqu ymm0,YMMWORD[((0-128))+rcx]
+        and     rsp,-512
+        vmovdqu ymm1,YMMWORD[((32-128))+rcx]
+        vmovdqu ymm2,YMMWORD[((64-128))+rcx]
+        vmovdqu ymm3,YMMWORD[((96-128))+rcx]
+        vmovdqu ymm4,YMMWORD[((128-128))+rcx]
+        vmovdqu ymm5,YMMWORD[((160-128))+rcx]
+        vmovdqu ymm6,YMMWORD[((192-128))+rcx]
+        vmovdqu ymm7,YMMWORD[((224-128))+rcx]
+        vmovdqu ymm8,YMMWORD[((256-128))+rcx]
+        lea     rcx,[((64+128))+rsp]
+        vmovdqu YMMWORD[(0-128)+rcx],ymm0
+        vpxor   ymm0,ymm0,ymm0
+        vmovdqu YMMWORD[(32-128)+rcx],ymm1
+        vpxor   ymm1,ymm1,ymm1
+        vmovdqu YMMWORD[(64-128)+rcx],ymm2
+        vpxor   ymm2,ymm2,ymm2
+        vmovdqu YMMWORD[(96-128)+rcx],ymm3
+        vpxor   ymm3,ymm3,ymm3
+        vmovdqu YMMWORD[(128-128)+rcx],ymm4
+        vpxor   ymm4,ymm4,ymm4
+        vmovdqu YMMWORD[(160-128)+rcx],ymm5
+        vpxor   ymm5,ymm5,ymm5
+        vmovdqu YMMWORD[(192-128)+rcx],ymm6
+        vpxor   ymm6,ymm6,ymm6
+        vmovdqu YMMWORD[(224-128)+rcx],ymm7
+        vpxor   ymm7,ymm7,ymm7
+        vmovdqu YMMWORD[(256-128)+rcx],ymm8
+        vmovdqa ymm8,ymm0
+        vmovdqu YMMWORD[(288-128)+rcx],ymm9
+$L$mul_1024_no_n_copy:
+        and     rsp,-64
+
+        mov     rbx,QWORD[r13]
+        vpbroadcastq    ymm10,QWORD[r13]
+        vmovdqu YMMWORD[rsp],ymm0
+        xor     r9,r9
+DB      0x67
+        xor     r10,r10
+        xor     r11,r11
+        xor     r12,r12
+
+        vmovdqu ymm15,YMMWORD[$L$and_mask]
+        mov     r14d,9
+        vmovdqu YMMWORD[(288-128)+rdi],ymm9
+        jmp     NEAR $L$oop_mul_1024
+
+ALIGN   32
+$L$oop_mul_1024:
+        vpsrlq  ymm9,ymm3,29
+        mov     rax,rbx
+        imul    rax,QWORD[((-128))+rsi]
+        add     rax,r9
+        mov     r10,rbx
+        imul    r10,QWORD[((8-128))+rsi]
+        add     r10,QWORD[8+rsp]
+
+        mov     r9,rax
+        imul    eax,r8d
+        and     eax,0x1fffffff
+
+        mov     r11,rbx
+        imul    r11,QWORD[((16-128))+rsi]
+        add     r11,QWORD[16+rsp]
+
+        mov     r12,rbx
+        imul    r12,QWORD[((24-128))+rsi]
+        add     r12,QWORD[24+rsp]
+        vpmuludq        ymm0,ymm10,YMMWORD[((32-128))+rsi]
+        vmovd   xmm11,eax
+        vpaddq  ymm1,ymm1,ymm0
+        vpmuludq        ymm12,ymm10,YMMWORD[((64-128))+rsi]
+        vpbroadcastq    ymm11,xmm11
+        vpaddq  ymm2,ymm2,ymm12
+        vpmuludq        ymm13,ymm10,YMMWORD[((96-128))+rsi]
+        vpand   ymm3,ymm3,ymm15
+        vpaddq  ymm3,ymm3,ymm13
+        vpmuludq        ymm0,ymm10,YMMWORD[((128-128))+rsi]
+        vpaddq  ymm4,ymm4,ymm0
+        vpmuludq        ymm12,ymm10,YMMWORD[((160-128))+rsi]
+        vpaddq  ymm5,ymm5,ymm12
+        vpmuludq        ymm13,ymm10,YMMWORD[((192-128))+rsi]
+        vpaddq  ymm6,ymm6,ymm13
+        vpmuludq        ymm0,ymm10,YMMWORD[((224-128))+rsi]
+        vpermq  ymm9,ymm9,0x93
+        vpaddq  ymm7,ymm7,ymm0
+        vpmuludq        ymm12,ymm10,YMMWORD[((256-128))+rsi]
+        vpbroadcastq    ymm10,QWORD[8+r13]
+        vpaddq  ymm8,ymm8,ymm12
+
+        mov     rdx,rax
+        imul    rax,QWORD[((-128))+rcx]
+        add     r9,rax
+        mov     rax,rdx
+        imul    rax,QWORD[((8-128))+rcx]
+        add     r10,rax
+        mov     rax,rdx
+        imul    rax,QWORD[((16-128))+rcx]
+        add     r11,rax
+        shr     r9,29
+        imul    rdx,QWORD[((24-128))+rcx]
+        add     r12,rdx
+        add     r10,r9
+
+        vpmuludq        ymm13,ymm11,YMMWORD[((32-128))+rcx]
+        vmovq   rbx,xmm10
+        vpaddq  ymm1,ymm1,ymm13
+        vpmuludq        ymm0,ymm11,YMMWORD[((64-128))+rcx]
+        vpaddq  ymm2,ymm2,ymm0
+        vpmuludq        ymm12,ymm11,YMMWORD[((96-128))+rcx]
+        vpaddq  ymm3,ymm3,ymm12
+        vpmuludq        ymm13,ymm11,YMMWORD[((128-128))+rcx]
+        vpaddq  ymm4,ymm4,ymm13
+        vpmuludq        ymm0,ymm11,YMMWORD[((160-128))+rcx]
+        vpaddq  ymm5,ymm5,ymm0
+        vpmuludq        ymm12,ymm11,YMMWORD[((192-128))+rcx]
+        vpaddq  ymm6,ymm6,ymm12
+        vpmuludq        ymm13,ymm11,YMMWORD[((224-128))+rcx]
+        vpblendd        ymm12,ymm9,ymm14,3
+        vpaddq  ymm7,ymm7,ymm13
+        vpmuludq        ymm0,ymm11,YMMWORD[((256-128))+rcx]
+        vpaddq  ymm3,ymm3,ymm12
+        vpaddq  ymm8,ymm8,ymm0
+
+        mov     rax,rbx
+        imul    rax,QWORD[((-128))+rsi]
+        add     r10,rax
+        vmovdqu ymm12,YMMWORD[((-8+32-128))+rsi]
+        mov     rax,rbx
+        imul    rax,QWORD[((8-128))+rsi]
+        add     r11,rax
+        vmovdqu ymm13,YMMWORD[((-8+64-128))+rsi]
+
+        mov     rax,r10
+        vpblendd        ymm9,ymm9,ymm14,0xfc
+        imul    eax,r8d
+        vpaddq  ymm4,ymm4,ymm9
+        and     eax,0x1fffffff
+
+        imul    rbx,QWORD[((16-128))+rsi]
+        add     r12,rbx
+        vpmuludq        ymm12,ymm12,ymm10
+        vmovd   xmm11,eax
+        vmovdqu ymm0,YMMWORD[((-8+96-128))+rsi]
+        vpaddq  ymm1,ymm1,ymm12
+        vpmuludq        ymm13,ymm13,ymm10
+        vpbroadcastq    ymm11,xmm11
+        vmovdqu ymm12,YMMWORD[((-8+128-128))+rsi]
+        vpaddq  ymm2,ymm2,ymm13
+        vpmuludq        ymm0,ymm0,ymm10
+        vmovdqu ymm13,YMMWORD[((-8+160-128))+rsi]
+        vpaddq  ymm3,ymm3,ymm0
+        vpmuludq        ymm12,ymm12,ymm10
+        vmovdqu ymm0,YMMWORD[((-8+192-128))+rsi]
+        vpaddq  ymm4,ymm4,ymm12
+        vpmuludq        ymm13,ymm13,ymm10
+        vmovdqu ymm12,YMMWORD[((-8+224-128))+rsi]
+        vpaddq  ymm5,ymm5,ymm13
+        vpmuludq        ymm0,ymm0,ymm10
+        vmovdqu ymm13,YMMWORD[((-8+256-128))+rsi]
+        vpaddq  ymm6,ymm6,ymm0
+        vpmuludq        ymm12,ymm12,ymm10
+        vmovdqu ymm9,YMMWORD[((-8+288-128))+rsi]
+        vpaddq  ymm7,ymm7,ymm12
+        vpmuludq        ymm13,ymm13,ymm10
+        vpaddq  ymm8,ymm8,ymm13
+        vpmuludq        ymm9,ymm9,ymm10
+        vpbroadcastq    ymm10,QWORD[16+r13]
+
+        mov     rdx,rax
+        imul    rax,QWORD[((-128))+rcx]
+        add     r10,rax
+        vmovdqu ymm0,YMMWORD[((-8+32-128))+rcx]
+        mov     rax,rdx
+        imul    rax,QWORD[((8-128))+rcx]
+        add     r11,rax
+        vmovdqu ymm12,YMMWORD[((-8+64-128))+rcx]
+        shr     r10,29
+        imul    rdx,QWORD[((16-128))+rcx]
+        add     r12,rdx
+        add     r11,r10
+
+        vpmuludq        ymm0,ymm0,ymm11
+        vmovq   rbx,xmm10
+        vmovdqu ymm13,YMMWORD[((-8+96-128))+rcx]
+        vpaddq  ymm1,ymm1,ymm0
+        vpmuludq        ymm12,ymm12,ymm11
+        vmovdqu ymm0,YMMWORD[((-8+128-128))+rcx]
+        vpaddq  ymm2,ymm2,ymm12
+        vpmuludq        ymm13,ymm13,ymm11
+        vmovdqu ymm12,YMMWORD[((-8+160-128))+rcx]
+        vpaddq  ymm3,ymm3,ymm13
+        vpmuludq        ymm0,ymm0,ymm11
+        vmovdqu ymm13,YMMWORD[((-8+192-128))+rcx]
+        vpaddq  ymm4,ymm4,ymm0
+        vpmuludq        ymm12,ymm12,ymm11
+        vmovdqu ymm0,YMMWORD[((-8+224-128))+rcx]
+        vpaddq  ymm5,ymm5,ymm12
+        vpmuludq        ymm13,ymm13,ymm11
+        vmovdqu ymm12,YMMWORD[((-8+256-128))+rcx]
+        vpaddq  ymm6,ymm6,ymm13
+        vpmuludq        ymm0,ymm0,ymm11
+        vmovdqu ymm13,YMMWORD[((-8+288-128))+rcx]
+        vpaddq  ymm7,ymm7,ymm0
+        vpmuludq        ymm12,ymm12,ymm11
+        vpaddq  ymm8,ymm8,ymm12
+        vpmuludq        ymm13,ymm13,ymm11
+        vpaddq  ymm9,ymm9,ymm13
+
+        vmovdqu ymm0,YMMWORD[((-16+32-128))+rsi]
+        mov     rax,rbx
+        imul    rax,QWORD[((-128))+rsi]
+        add     rax,r11
+
+        vmovdqu ymm12,YMMWORD[((-16+64-128))+rsi]
+        mov     r11,rax
+        imul    eax,r8d
+        and     eax,0x1fffffff
+
+        imul    rbx,QWORD[((8-128))+rsi]
+        add     r12,rbx
+        vpmuludq        ymm0,ymm0,ymm10
+        vmovd   xmm11,eax
+        vmovdqu ymm13,YMMWORD[((-16+96-128))+rsi]
+        vpaddq  ymm1,ymm1,ymm0
+        vpmuludq        ymm12,ymm12,ymm10
+        vpbroadcastq    ymm11,xmm11
+        vmovdqu ymm0,YMMWORD[((-16+128-128))+rsi]
+        vpaddq  ymm2,ymm2,ymm12
+        vpmuludq        ymm13,ymm13,ymm10
+        vmovdqu ymm12,YMMWORD[((-16+160-128))+rsi]
+        vpaddq  ymm3,ymm3,ymm13
+        vpmuludq        ymm0,ymm0,ymm10
+        vmovdqu ymm13,YMMWORD[((-16+192-128))+rsi]
+        vpaddq  ymm4,ymm4,ymm0
+        vpmuludq        ymm12,ymm12,ymm10
+        vmovdqu ymm0,YMMWORD[((-16+224-128))+rsi]
+        vpaddq  ymm5,ymm5,ymm12
+        vpmuludq        ymm13,ymm13,ymm10
+        vmovdqu ymm12,YMMWORD[((-16+256-128))+rsi]
+        vpaddq  ymm6,ymm6,ymm13
+        vpmuludq        ymm0,ymm0,ymm10
+        vmovdqu ymm13,YMMWORD[((-16+288-128))+rsi]
+        vpaddq  ymm7,ymm7,ymm0
+        vpmuludq        ymm12,ymm12,ymm10
+        vpaddq  ymm8,ymm8,ymm12
+        vpmuludq        ymm13,ymm13,ymm10
+        vpbroadcastq    ymm10,QWORD[24+r13]
+        vpaddq  ymm9,ymm9,ymm13
+
+        vmovdqu ymm0,YMMWORD[((-16+32-128))+rcx]
+        mov     rdx,rax
+        imul    rax,QWORD[((-128))+rcx]
+        add     r11,rax
+        vmovdqu ymm12,YMMWORD[((-16+64-128))+rcx]
+        imul    rdx,QWORD[((8-128))+rcx]
+        add     r12,rdx
+        shr     r11,29
+
+        vpmuludq        ymm0,ymm0,ymm11
+        vmovq   rbx,xmm10
+        vmovdqu ymm13,YMMWORD[((-16+96-128))+rcx]
+        vpaddq  ymm1,ymm1,ymm0
+        vpmuludq        ymm12,ymm12,ymm11
+        vmovdqu ymm0,YMMWORD[((-16+128-128))+rcx]
+        vpaddq  ymm2,ymm2,ymm12
+        vpmuludq        ymm13,ymm13,ymm11
+        vmovdqu ymm12,YMMWORD[((-16+160-128))+rcx]
+        vpaddq  ymm3,ymm3,ymm13
+        vpmuludq        ymm0,ymm0,ymm11
+        vmovdqu ymm13,YMMWORD[((-16+192-128))+rcx]
+        vpaddq  ymm4,ymm4,ymm0
+        vpmuludq        ymm12,ymm12,ymm11
+        vmovdqu ymm0,YMMWORD[((-16+224-128))+rcx]
+        vpaddq  ymm5,ymm5,ymm12
+        vpmuludq        ymm13,ymm13,ymm11
+        vmovdqu ymm12,YMMWORD[((-16+256-128))+rcx]
+        vpaddq  ymm6,ymm6,ymm13
+        vpmuludq        ymm0,ymm0,ymm11
+        vmovdqu ymm13,YMMWORD[((-16+288-128))+rcx]
+        vpaddq  ymm7,ymm7,ymm0
+        vpmuludq        ymm12,ymm12,ymm11
+        vmovdqu ymm0,YMMWORD[((-24+32-128))+rsi]
+        vpaddq  ymm8,ymm8,ymm12
+        vpmuludq        ymm13,ymm13,ymm11
+        vmovdqu ymm12,YMMWORD[((-24+64-128))+rsi]
+        vpaddq  ymm9,ymm9,ymm13
+
+        add     r12,r11
+        imul    rbx,QWORD[((-128))+rsi]
+        add     r12,rbx
+
+        mov     rax,r12
+        imul    eax,r8d
+        and     eax,0x1fffffff
+
+        vpmuludq        ymm0,ymm0,ymm10
+        vmovd   xmm11,eax
+        vmovdqu ymm13,YMMWORD[((-24+96-128))+rsi]
+        vpaddq  ymm1,ymm1,ymm0
+        vpmuludq        ymm12,ymm12,ymm10
+        vpbroadcastq    ymm11,xmm11
+        vmovdqu ymm0,YMMWORD[((-24+128-128))+rsi]
+        vpaddq  ymm2,ymm2,ymm12
+        vpmuludq        ymm13,ymm13,ymm10
+        vmovdqu ymm12,YMMWORD[((-24+160-128))+rsi]
+        vpaddq  ymm3,ymm3,ymm13
+        vpmuludq        ymm0,ymm0,ymm10
+        vmovdqu ymm13,YMMWORD[((-24+192-128))+rsi]
+        vpaddq  ymm4,ymm4,ymm0
+        vpmuludq        ymm12,ymm12,ymm10
+        vmovdqu ymm0,YMMWORD[((-24+224-128))+rsi]
+        vpaddq  ymm5,ymm5,ymm12
+        vpmuludq        ymm13,ymm13,ymm10
+        vmovdqu ymm12,YMMWORD[((-24+256-128))+rsi]
+        vpaddq  ymm6,ymm6,ymm13
+        vpmuludq        ymm0,ymm0,ymm10
+        vmovdqu ymm13,YMMWORD[((-24+288-128))+rsi]
+        vpaddq  ymm7,ymm7,ymm0
+        vpmuludq        ymm12,ymm12,ymm10
+        vpaddq  ymm8,ymm8,ymm12
+        vpmuludq        ymm13,ymm13,ymm10
+        vpbroadcastq    ymm10,QWORD[32+r13]
+        vpaddq  ymm9,ymm9,ymm13
+        add     r13,32
+
+        vmovdqu ymm0,YMMWORD[((-24+32-128))+rcx]
+        imul    rax,QWORD[((-128))+rcx]
+        add     r12,rax
+        shr     r12,29
+
+        vmovdqu ymm12,YMMWORD[((-24+64-128))+rcx]
+        vpmuludq        ymm0,ymm0,ymm11
+        vmovq   rbx,xmm10
+        vmovdqu ymm13,YMMWORD[((-24+96-128))+rcx]
+        vpaddq  ymm0,ymm1,ymm0
+        vpmuludq        ymm12,ymm12,ymm11
+        vmovdqu YMMWORD[rsp],ymm0
+        vpaddq  ymm1,ymm2,ymm12
+        vmovdqu ymm0,YMMWORD[((-24+128-128))+rcx]
+        vpmuludq        ymm13,ymm13,ymm11
+        vmovdqu ymm12,YMMWORD[((-24+160-128))+rcx]
+        vpaddq  ymm2,ymm3,ymm13
+        vpmuludq        ymm0,ymm0,ymm11
+        vmovdqu ymm13,YMMWORD[((-24+192-128))+rcx]
+        vpaddq  ymm3,ymm4,ymm0
+        vpmuludq        ymm12,ymm12,ymm11
+        vmovdqu ymm0,YMMWORD[((-24+224-128))+rcx]
+        vpaddq  ymm4,ymm5,ymm12
+        vpmuludq        ymm13,ymm13,ymm11
+        vmovdqu ymm12,YMMWORD[((-24+256-128))+rcx]
+        vpaddq  ymm5,ymm6,ymm13
+        vpmuludq        ymm0,ymm0,ymm11
+        vmovdqu ymm13,YMMWORD[((-24+288-128))+rcx]
+        mov     r9,r12
+        vpaddq  ymm6,ymm7,ymm0
+        vpmuludq        ymm12,ymm12,ymm11
+        add     r9,QWORD[rsp]
+        vpaddq  ymm7,ymm8,ymm12
+        vpmuludq        ymm13,ymm13,ymm11
+        vmovq   xmm12,r12
+        vpaddq  ymm8,ymm9,ymm13
+
+        dec     r14d
+        jnz     NEAR $L$oop_mul_1024
+        vpaddq  ymm0,ymm12,YMMWORD[rsp]
+
+        vpsrlq  ymm12,ymm0,29
+        vpand   ymm0,ymm0,ymm15
+        vpsrlq  ymm13,ymm1,29
+        vpand   ymm1,ymm1,ymm15
+        vpsrlq  ymm10,ymm2,29
+        vpermq  ymm12,ymm12,0x93
+        vpand   ymm2,ymm2,ymm15
+        vpsrlq  ymm11,ymm3,29
+        vpermq  ymm13,ymm13,0x93
+        vpand   ymm3,ymm3,ymm15
+
+        vpblendd        ymm9,ymm12,ymm14,3
+        vpermq  ymm10,ymm10,0x93
+        vpblendd        ymm12,ymm13,ymm12,3
+        vpermq  ymm11,ymm11,0x93
+        vpaddq  ymm0,ymm0,ymm9
+        vpblendd        ymm13,ymm10,ymm13,3
+        vpaddq  ymm1,ymm1,ymm12
+        vpblendd        ymm10,ymm11,ymm10,3
+        vpaddq  ymm2,ymm2,ymm13
+        vpblendd        ymm11,ymm14,ymm11,3
+        vpaddq  ymm3,ymm3,ymm10
+        vpaddq  ymm4,ymm4,ymm11
+
+        vpsrlq  ymm12,ymm0,29
+        vpand   ymm0,ymm0,ymm15
+        vpsrlq  ymm13,ymm1,29
+        vpand   ymm1,ymm1,ymm15
+        vpsrlq  ymm10,ymm2,29
+        vpermq  ymm12,ymm12,0x93
+        vpand   ymm2,ymm2,ymm15
+        vpsrlq  ymm11,ymm3,29
+        vpermq  ymm13,ymm13,0x93
+        vpand   ymm3,ymm3,ymm15
+        vpermq  ymm10,ymm10,0x93
+
+        vpblendd        ymm9,ymm12,ymm14,3
+        vpermq  ymm11,ymm11,0x93
+        vpblendd        ymm12,ymm13,ymm12,3
+        vpaddq  ymm0,ymm0,ymm9
+        vpblendd        ymm13,ymm10,ymm13,3
+        vpaddq  ymm1,ymm1,ymm12
+        vpblendd        ymm10,ymm11,ymm10,3
+        vpaddq  ymm2,ymm2,ymm13
+        vpblendd        ymm11,ymm14,ymm11,3
+        vpaddq  ymm3,ymm3,ymm10
+        vpaddq  ymm4,ymm4,ymm11
+
+        vmovdqu YMMWORD[(0-128)+rdi],ymm0
+        vmovdqu YMMWORD[(32-128)+rdi],ymm1
+        vmovdqu YMMWORD[(64-128)+rdi],ymm2
+        vmovdqu YMMWORD[(96-128)+rdi],ymm3
+        vpsrlq  ymm12,ymm4,29
+        vpand   ymm4,ymm4,ymm15
+        vpsrlq  ymm13,ymm5,29
+        vpand   ymm5,ymm5,ymm15
+        vpsrlq  ymm10,ymm6,29
+        vpermq  ymm12,ymm12,0x93
+        vpand   ymm6,ymm6,ymm15
+        vpsrlq  ymm11,ymm7,29
+        vpermq  ymm13,ymm13,0x93
+        vpand   ymm7,ymm7,ymm15
+        vpsrlq  ymm0,ymm8,29
+        vpermq  ymm10,ymm10,0x93
+        vpand   ymm8,ymm8,ymm15
+        vpermq  ymm11,ymm11,0x93
+
+        vpblendd        ymm9,ymm12,ymm14,3
+        vpermq  ymm0,ymm0,0x93
+        vpblendd        ymm12,ymm13,ymm12,3
+        vpaddq  ymm4,ymm4,ymm9
+        vpblendd        ymm13,ymm10,ymm13,3
+        vpaddq  ymm5,ymm5,ymm12
+        vpblendd        ymm10,ymm11,ymm10,3
+        vpaddq  ymm6,ymm6,ymm13
+        vpblendd        ymm11,ymm0,ymm11,3
+        vpaddq  ymm7,ymm7,ymm10
+        vpaddq  ymm8,ymm8,ymm11
+
+        vpsrlq  ymm12,ymm4,29
+        vpand   ymm4,ymm4,ymm15
+        vpsrlq  ymm13,ymm5,29
+        vpand   ymm5,ymm5,ymm15
+        vpsrlq  ymm10,ymm6,29
+        vpermq  ymm12,ymm12,0x93
+        vpand   ymm6,ymm6,ymm15
+        vpsrlq  ymm11,ymm7,29
+        vpermq  ymm13,ymm13,0x93
+        vpand   ymm7,ymm7,ymm15
+        vpsrlq  ymm0,ymm8,29
+        vpermq  ymm10,ymm10,0x93
+        vpand   ymm8,ymm8,ymm15
+        vpermq  ymm11,ymm11,0x93
+
+        vpblendd        ymm9,ymm12,ymm14,3
+        vpermq  ymm0,ymm0,0x93
+        vpblendd        ymm12,ymm13,ymm12,3
+        vpaddq  ymm4,ymm4,ymm9
+        vpblendd        ymm13,ymm10,ymm13,3
+        vpaddq  ymm5,ymm5,ymm12
+        vpblendd        ymm10,ymm11,ymm10,3
+        vpaddq  ymm6,ymm6,ymm13
+        vpblendd        ymm11,ymm0,ymm11,3
+        vpaddq  ymm7,ymm7,ymm10
+        vpaddq  ymm8,ymm8,ymm11
+
+        vmovdqu YMMWORD[(128-128)+rdi],ymm4
+        vmovdqu YMMWORD[(160-128)+rdi],ymm5
+        vmovdqu YMMWORD[(192-128)+rdi],ymm6
+        vmovdqu YMMWORD[(224-128)+rdi],ymm7
+        vmovdqu YMMWORD[(256-128)+rdi],ymm8
+        vzeroupper
+
+        mov     rax,rbp
+
+$L$mul_1024_in_tail:
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+        movaps  xmm13,XMMWORD[((-104))+rax]
+        movaps  xmm14,XMMWORD[((-88))+rax]
+        movaps  xmm15,XMMWORD[((-72))+rax]
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$mul_1024_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_rsaz_1024_mul_avx2:
+global  rsaz_1024_red2norm_avx2
+
+ALIGN   32
+rsaz_1024_red2norm_avx2:
+
+        sub     rdx,-128
+        xor     rax,rax
+        mov     r8,QWORD[((-128))+rdx]
+        mov     r9,QWORD[((-120))+rdx]
+        mov     r10,QWORD[((-112))+rdx]
+        shl     r8,0
+        shl     r9,29
+        mov     r11,r10
+        shl     r10,58
+        shr     r11,6
+        add     rax,r8
+        add     rax,r9
+        add     rax,r10
+        adc     r11,0
+        mov     QWORD[rcx],rax
+        mov     rax,r11
+        mov     r8,QWORD[((-104))+rdx]
+        mov     r9,QWORD[((-96))+rdx]
+        shl     r8,23
+        mov     r10,r9
+        shl     r9,52
+        shr     r10,12
+        add     rax,r8
+        add     rax,r9
+        adc     r10,0
+        mov     QWORD[8+rcx],rax
+        mov     rax,r10
+        mov     r11,QWORD[((-88))+rdx]
+        mov     r8,QWORD[((-80))+rdx]
+        shl     r11,17
+        mov     r9,r8
+        shl     r8,46
+        shr     r9,18
+        add     rax,r11
+        add     rax,r8
+        adc     r9,0
+        mov     QWORD[16+rcx],rax
+        mov     rax,r9
+        mov     r10,QWORD[((-72))+rdx]
+        mov     r11,QWORD[((-64))+rdx]
+        shl     r10,11
+        mov     r8,r11
+        shl     r11,40
+        shr     r8,24
+        add     rax,r10
+        add     rax,r11
+        adc     r8,0
+        mov     QWORD[24+rcx],rax
+        mov     rax,r8
+        mov     r9,QWORD[((-56))+rdx]
+        mov     r10,QWORD[((-48))+rdx]
+        mov     r11,QWORD[((-40))+rdx]
+        shl     r9,5
+        shl     r10,34
+        mov     r8,r11
+        shl     r11,63
+        shr     r8,1
+        add     rax,r9
+        add     rax,r10
+        add     rax,r11
+        adc     r8,0
+        mov     QWORD[32+rcx],rax
+        mov     rax,r8
+        mov     r9,QWORD[((-32))+rdx]
+        mov     r10,QWORD[((-24))+rdx]
+        shl     r9,28
+        mov     r11,r10
+        shl     r10,57
+        shr     r11,7
+        add     rax,r9
+        add     rax,r10
+        adc     r11,0
+        mov     QWORD[40+rcx],rax
+        mov     rax,r11
+        mov     r8,QWORD[((-16))+rdx]
+        mov     r9,QWORD[((-8))+rdx]
+        shl     r8,22
+        mov     r10,r9
+        shl     r9,51
+        shr     r10,13
+        add     rax,r8
+        add     rax,r9
+        adc     r10,0
+        mov     QWORD[48+rcx],rax
+        mov     rax,r10
+        mov     r11,QWORD[rdx]
+        mov     r8,QWORD[8+rdx]
+        shl     r11,16
+        mov     r9,r8
+        shl     r8,45
+        shr     r9,19
+        add     rax,r11
+        add     rax,r8
+        adc     r9,0
+        mov     QWORD[56+rcx],rax
+        mov     rax,r9
+        mov     r10,QWORD[16+rdx]
+        mov     r11,QWORD[24+rdx]
+        shl     r10,10
+        mov     r8,r11
+        shl     r11,39
+        shr     r8,25
+        add     rax,r10
+        add     rax,r11
+        adc     r8,0
+        mov     QWORD[64+rcx],rax
+        mov     rax,r8
+        mov     r9,QWORD[32+rdx]
+        mov     r10,QWORD[40+rdx]
+        mov     r11,QWORD[48+rdx]
+        shl     r9,4
+        shl     r10,33
+        mov     r8,r11
+        shl     r11,62
+        shr     r8,2
+        add     rax,r9
+        add     rax,r10
+        add     rax,r11
+        adc     r8,0
+        mov     QWORD[72+rcx],rax
+        mov     rax,r8
+        mov     r9,QWORD[56+rdx]
+        mov     r10,QWORD[64+rdx]
+        shl     r9,27
+        mov     r11,r10
+        shl     r10,56
+        shr     r11,8
+        add     rax,r9
+        add     rax,r10
+        adc     r11,0
+        mov     QWORD[80+rcx],rax
+        mov     rax,r11
+        mov     r8,QWORD[72+rdx]
+        mov     r9,QWORD[80+rdx]
+        shl     r8,21
+        mov     r10,r9
+        shl     r9,50
+        shr     r10,14
+        add     rax,r8
+        add     rax,r9
+        adc     r10,0
+        mov     QWORD[88+rcx],rax
+        mov     rax,r10
+        mov     r11,QWORD[88+rdx]
+        mov     r8,QWORD[96+rdx]
+        shl     r11,15
+        mov     r9,r8
+        shl     r8,44
+        shr     r9,20
+        add     rax,r11
+        add     rax,r8
+        adc     r9,0
+        mov     QWORD[96+rcx],rax
+        mov     rax,r9
+        mov     r10,QWORD[104+rdx]
+        mov     r11,QWORD[112+rdx]
+        shl     r10,9
+        mov     r8,r11
+        shl     r11,38
+        shr     r8,26
+        add     rax,r10
+        add     rax,r11
+        adc     r8,0
+        mov     QWORD[104+rcx],rax
+        mov     rax,r8
+        mov     r9,QWORD[120+rdx]
+        mov     r10,QWORD[128+rdx]
+        mov     r11,QWORD[136+rdx]
+        shl     r9,3
+        shl     r10,32
+        mov     r8,r11
+        shl     r11,61
+        shr     r8,3
+        add     rax,r9
+        add     rax,r10
+        add     rax,r11
+        adc     r8,0
+        mov     QWORD[112+rcx],rax
+        mov     rax,r8
+        mov     r9,QWORD[144+rdx]
+        mov     r10,QWORD[152+rdx]
+        shl     r9,26
+        mov     r11,r10
+        shl     r10,55
+        shr     r11,9
+        add     rax,r9
+        add     rax,r10
+        adc     r11,0
+        mov     QWORD[120+rcx],rax
+        mov     rax,r11
+        DB      0F3h,0C3h               ;repret
+
+
+
+global  rsaz_1024_norm2red_avx2
+
+ALIGN   32
+rsaz_1024_norm2red_avx2:
+
+        sub     rcx,-128
+        mov     r8,QWORD[rdx]
+        mov     eax,0x1fffffff
+        mov     r9,QWORD[8+rdx]
+        mov     r11,r8
+        shr     r11,0
+        and     r11,rax
+        mov     QWORD[((-128))+rcx],r11
+        mov     r10,r8
+        shr     r10,29
+        and     r10,rax
+        mov     QWORD[((-120))+rcx],r10
+        shrd    r8,r9,58
+        and     r8,rax
+        mov     QWORD[((-112))+rcx],r8
+        mov     r10,QWORD[16+rdx]
+        mov     r8,r9
+        shr     r8,23
+        and     r8,rax
+        mov     QWORD[((-104))+rcx],r8
+        shrd    r9,r10,52
+        and     r9,rax
+        mov     QWORD[((-96))+rcx],r9
+        mov     r11,QWORD[24+rdx]
+        mov     r9,r10
+        shr     r9,17
+        and     r9,rax
+        mov     QWORD[((-88))+rcx],r9
+        shrd    r10,r11,46
+        and     r10,rax
+        mov     QWORD[((-80))+rcx],r10
+        mov     r8,QWORD[32+rdx]
+        mov     r10,r11
+        shr     r10,11
+        and     r10,rax
+        mov     QWORD[((-72))+rcx],r10
+        shrd    r11,r8,40
+        and     r11,rax
+        mov     QWORD[((-64))+rcx],r11
+        mov     r9,QWORD[40+rdx]
+        mov     r11,r8
+        shr     r11,5
+        and     r11,rax
+        mov     QWORD[((-56))+rcx],r11
+        mov     r10,r8
+        shr     r10,34
+        and     r10,rax
+        mov     QWORD[((-48))+rcx],r10
+        shrd    r8,r9,63
+        and     r8,rax
+        mov     QWORD[((-40))+rcx],r8
+        mov     r10,QWORD[48+rdx]
+        mov     r8,r9
+        shr     r8,28
+        and     r8,rax
+        mov     QWORD[((-32))+rcx],r8
+        shrd    r9,r10,57
+        and     r9,rax
+        mov     QWORD[((-24))+rcx],r9
+        mov     r11,QWORD[56+rdx]
+        mov     r9,r10
+        shr     r9,22
+        and     r9,rax
+        mov     QWORD[((-16))+rcx],r9
+        shrd    r10,r11,51
+        and     r10,rax
+        mov     QWORD[((-8))+rcx],r10
+        mov     r8,QWORD[64+rdx]
+        mov     r10,r11
+        shr     r10,16
+        and     r10,rax
+        mov     QWORD[rcx],r10
+        shrd    r11,r8,45
+        and     r11,rax
+        mov     QWORD[8+rcx],r11
+        mov     r9,QWORD[72+rdx]
+        mov     r11,r8
+        shr     r11,10
+        and     r11,rax
+        mov     QWORD[16+rcx],r11
+        shrd    r8,r9,39
+        and     r8,rax
+        mov     QWORD[24+rcx],r8
+        mov     r10,QWORD[80+rdx]
+        mov     r8,r9
+        shr     r8,4
+        and     r8,rax
+        mov     QWORD[32+rcx],r8
+        mov     r11,r9
+        shr     r11,33
+        and     r11,rax
+        mov     QWORD[40+rcx],r11
+        shrd    r9,r10,62
+        and     r9,rax
+        mov     QWORD[48+rcx],r9
+        mov     r11,QWORD[88+rdx]
+        mov     r9,r10
+        shr     r9,27
+        and     r9,rax
+        mov     QWORD[56+rcx],r9
+        shrd    r10,r11,56
+        and     r10,rax
+        mov     QWORD[64+rcx],r10
+        mov     r8,QWORD[96+rdx]
+        mov     r10,r11
+        shr     r10,21
+        and     r10,rax
+        mov     QWORD[72+rcx],r10
+        shrd    r11,r8,50
+        and     r11,rax
+        mov     QWORD[80+rcx],r11
+        mov     r9,QWORD[104+rdx]
+        mov     r11,r8
+        shr     r11,15
+        and     r11,rax
+        mov     QWORD[88+rcx],r11
+        shrd    r8,r9,44
+        and     r8,rax
+        mov     QWORD[96+rcx],r8
+        mov     r10,QWORD[112+rdx]
+        mov     r8,r9
+        shr     r8,9
+        and     r8,rax
+        mov     QWORD[104+rcx],r8
+        shrd    r9,r10,38
+        and     r9,rax
+        mov     QWORD[112+rcx],r9
+        mov     r11,QWORD[120+rdx]
+        mov     r9,r10
+        shr     r9,3
+        and     r9,rax
+        mov     QWORD[120+rcx],r9
+        mov     r8,r10
+        shr     r8,32
+        and     r8,rax
+        mov     QWORD[128+rcx],r8
+        shrd    r10,r11,61
+        and     r10,rax
+        mov     QWORD[136+rcx],r10
+        xor     r8,r8
+        mov     r10,r11
+        shr     r10,26
+        and     r10,rax
+        mov     QWORD[144+rcx],r10
+        shrd    r11,r8,55
+        and     r11,rax
+        mov     QWORD[152+rcx],r11
+        mov     QWORD[160+rcx],r8
+        mov     QWORD[168+rcx],r8
+        mov     QWORD[176+rcx],r8
+        mov     QWORD[184+rcx],r8
+        DB      0F3h,0C3h               ;repret
+
+
+global  rsaz_1024_scatter5_avx2
+
+ALIGN   32
+rsaz_1024_scatter5_avx2:
+
+        vzeroupper
+        vmovdqu ymm5,YMMWORD[$L$scatter_permd]
+        shl     r8d,4
+        lea     rcx,[r8*1+rcx]
+        mov     eax,9
+        jmp     NEAR $L$oop_scatter_1024
+
+ALIGN   32
+$L$oop_scatter_1024:
+        vmovdqu ymm0,YMMWORD[rdx]
+        lea     rdx,[32+rdx]
+        vpermd  ymm0,ymm5,ymm0
+        vmovdqu XMMWORD[rcx],xmm0
+        lea     rcx,[512+rcx]
+        dec     eax
+        jnz     NEAR $L$oop_scatter_1024
+
+        vzeroupper
+        DB      0F3h,0C3h               ;repret
+
+
+
+global  rsaz_1024_gather5_avx2
+
+ALIGN   32
+rsaz_1024_gather5_avx2:
+
+        vzeroupper
+        mov     r11,rsp
+
+        lea     rax,[((-136))+rsp]
+$L$SEH_begin_rsaz_1024_gather5:
+
+DB      0x48,0x8d,0x60,0xe0
+DB      0xc5,0xf8,0x29,0x70,0xe0
+DB      0xc5,0xf8,0x29,0x78,0xf0
+DB      0xc5,0x78,0x29,0x40,0x00
+DB      0xc5,0x78,0x29,0x48,0x10
+DB      0xc5,0x78,0x29,0x50,0x20
+DB      0xc5,0x78,0x29,0x58,0x30
+DB      0xc5,0x78,0x29,0x60,0x40
+DB      0xc5,0x78,0x29,0x68,0x50
+DB      0xc5,0x78,0x29,0x70,0x60
+DB      0xc5,0x78,0x29,0x78,0x70
+        lea     rsp,[((-256))+rsp]
+        and     rsp,-32
+        lea     r10,[$L$inc]
+        lea     rax,[((-128))+rsp]
+
+        vmovd   xmm4,r8d
+        vmovdqa ymm0,YMMWORD[r10]
+        vmovdqa ymm1,YMMWORD[32+r10]
+        vmovdqa ymm5,YMMWORD[64+r10]
+        vpbroadcastd    ymm4,xmm4
+
+        vpaddd  ymm2,ymm0,ymm5
+        vpcmpeqd        ymm0,ymm0,ymm4
+        vpaddd  ymm3,ymm1,ymm5
+        vpcmpeqd        ymm1,ymm1,ymm4
+        vmovdqa YMMWORD[(0+128)+rax],ymm0
+        vpaddd  ymm0,ymm2,ymm5
+        vpcmpeqd        ymm2,ymm2,ymm4
+        vmovdqa YMMWORD[(32+128)+rax],ymm1
+        vpaddd  ymm1,ymm3,ymm5
+        vpcmpeqd        ymm3,ymm3,ymm4
+        vmovdqa YMMWORD[(64+128)+rax],ymm2
+        vpaddd  ymm2,ymm0,ymm5
+        vpcmpeqd        ymm0,ymm0,ymm4
+        vmovdqa YMMWORD[(96+128)+rax],ymm3
+        vpaddd  ymm3,ymm1,ymm5
+        vpcmpeqd        ymm1,ymm1,ymm4
+        vmovdqa YMMWORD[(128+128)+rax],ymm0
+        vpaddd  ymm8,ymm2,ymm5
+        vpcmpeqd        ymm2,ymm2,ymm4
+        vmovdqa YMMWORD[(160+128)+rax],ymm1
+        vpaddd  ymm9,ymm3,ymm5
+        vpcmpeqd        ymm3,ymm3,ymm4
+        vmovdqa YMMWORD[(192+128)+rax],ymm2
+        vpaddd  ymm10,ymm8,ymm5
+        vpcmpeqd        ymm8,ymm8,ymm4
+        vmovdqa YMMWORD[(224+128)+rax],ymm3
+        vpaddd  ymm11,ymm9,ymm5
+        vpcmpeqd        ymm9,ymm9,ymm4
+        vpaddd  ymm12,ymm10,ymm5
+        vpcmpeqd        ymm10,ymm10,ymm4
+        vpaddd  ymm13,ymm11,ymm5
+        vpcmpeqd        ymm11,ymm11,ymm4
+        vpaddd  ymm14,ymm12,ymm5
+        vpcmpeqd        ymm12,ymm12,ymm4
+        vpaddd  ymm15,ymm13,ymm5
+        vpcmpeqd        ymm13,ymm13,ymm4
+        vpcmpeqd        ymm14,ymm14,ymm4
+        vpcmpeqd        ymm15,ymm15,ymm4
+
+        vmovdqa ymm7,YMMWORD[((-32))+r10]
+        lea     rdx,[128+rdx]
+        mov     r8d,9
+
+$L$oop_gather_1024:
+        vmovdqa ymm0,YMMWORD[((0-128))+rdx]
+        vmovdqa ymm1,YMMWORD[((32-128))+rdx]
+        vmovdqa ymm2,YMMWORD[((64-128))+rdx]
+        vmovdqa ymm3,YMMWORD[((96-128))+rdx]
+        vpand   ymm0,ymm0,YMMWORD[((0+128))+rax]
+        vpand   ymm1,ymm1,YMMWORD[((32+128))+rax]
+        vpand   ymm2,ymm2,YMMWORD[((64+128))+rax]
+        vpor    ymm4,ymm1,ymm0
+        vpand   ymm3,ymm3,YMMWORD[((96+128))+rax]
+        vmovdqa ymm0,YMMWORD[((128-128))+rdx]
+        vmovdqa ymm1,YMMWORD[((160-128))+rdx]
+        vpor    ymm5,ymm3,ymm2
+        vmovdqa ymm2,YMMWORD[((192-128))+rdx]
+        vmovdqa ymm3,YMMWORD[((224-128))+rdx]
+        vpand   ymm0,ymm0,YMMWORD[((128+128))+rax]
+        vpand   ymm1,ymm1,YMMWORD[((160+128))+rax]
+        vpand   ymm2,ymm2,YMMWORD[((192+128))+rax]
+        vpor    ymm4,ymm4,ymm0
+        vpand   ymm3,ymm3,YMMWORD[((224+128))+rax]
+        vpand   ymm0,ymm8,YMMWORD[((256-128))+rdx]
+        vpor    ymm5,ymm5,ymm1
+        vpand   ymm1,ymm9,YMMWORD[((288-128))+rdx]
+        vpor    ymm4,ymm4,ymm2
+        vpand   ymm2,ymm10,YMMWORD[((320-128))+rdx]
+        vpor    ymm5,ymm5,ymm3
+        vpand   ymm3,ymm11,YMMWORD[((352-128))+rdx]
+        vpor    ymm4,ymm4,ymm0
+        vpand   ymm0,ymm12,YMMWORD[((384-128))+rdx]
+        vpor    ymm5,ymm5,ymm1
+        vpand   ymm1,ymm13,YMMWORD[((416-128))+rdx]
+        vpor    ymm4,ymm4,ymm2
+        vpand   ymm2,ymm14,YMMWORD[((448-128))+rdx]
+        vpor    ymm5,ymm5,ymm3
+        vpand   ymm3,ymm15,YMMWORD[((480-128))+rdx]
+        lea     rdx,[512+rdx]
+        vpor    ymm4,ymm4,ymm0
+        vpor    ymm5,ymm5,ymm1
+        vpor    ymm4,ymm4,ymm2
+        vpor    ymm5,ymm5,ymm3
+
+        vpor    ymm4,ymm4,ymm5
+        vextracti128    xmm5,ymm4,1
+        vpor    xmm5,xmm5,xmm4
+        vpermd  ymm5,ymm7,ymm5
+        vmovdqu YMMWORD[rcx],ymm5
+        lea     rcx,[32+rcx]
+        dec     r8d
+        jnz     NEAR $L$oop_gather_1024
+
+        vpxor   ymm0,ymm0,ymm0
+        vmovdqu YMMWORD[rcx],ymm0
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-168))+r11]
+        movaps  xmm7,XMMWORD[((-152))+r11]
+        movaps  xmm8,XMMWORD[((-136))+r11]
+        movaps  xmm9,XMMWORD[((-120))+r11]
+        movaps  xmm10,XMMWORD[((-104))+r11]
+        movaps  xmm11,XMMWORD[((-88))+r11]
+        movaps  xmm12,XMMWORD[((-72))+r11]
+        movaps  xmm13,XMMWORD[((-56))+r11]
+        movaps  xmm14,XMMWORD[((-40))+r11]
+        movaps  xmm15,XMMWORD[((-24))+r11]
+        lea     rsp,[r11]
+
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_rsaz_1024_gather5:
+
+EXTERN  OPENSSL_ia32cap_P
+global  rsaz_avx2_eligible
+
+ALIGN   32
+rsaz_avx2_eligible:
+        mov     eax,DWORD[((OPENSSL_ia32cap_P+8))]
+        mov     ecx,524544
+        mov     edx,0
+        and     ecx,eax
+        cmp     ecx,524544
+        cmove   eax,edx
+        and     eax,32
+        shr     eax,5
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   64
+$L$and_mask:
+        DQ      0x1fffffff,0x1fffffff,0x1fffffff,0x1fffffff
+$L$scatter_permd:
+        DD      0,2,4,6,7,7,7,7
+$L$gather_permd:
+        DD      0,7,1,7,2,7,3,7
+$L$inc:
+        DD      0,0,0,0,1,1,1,1
+        DD      2,2,2,2,3,3,3,3
+        DD      4,4,4,4,4,4,4,4
+ALIGN   64
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+rsaz_se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        mov     rbp,QWORD[160+r8]
+
+        mov     r10d,DWORD[8+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        cmovc   rax,rbp
+
+        mov     r15,QWORD[((-48))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     rbx,QWORD[((-8))+rax]
+        mov     QWORD[240+r8],r15
+        mov     QWORD[232+r8],r14
+        mov     QWORD[224+r8],r13
+        mov     QWORD[216+r8],r12
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[144+r8],rbx
+
+        lea     rsi,[((-216))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+$L$common_seh_tail:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_rsaz_1024_sqr_avx2 wrt ..imagebase
+        DD      $L$SEH_end_rsaz_1024_sqr_avx2 wrt ..imagebase
+        DD      $L$SEH_info_rsaz_1024_sqr_avx2 wrt ..imagebase
+
+        DD      $L$SEH_begin_rsaz_1024_mul_avx2 wrt ..imagebase
+        DD      $L$SEH_end_rsaz_1024_mul_avx2 wrt ..imagebase
+        DD      $L$SEH_info_rsaz_1024_mul_avx2 wrt ..imagebase
+
+        DD      $L$SEH_begin_rsaz_1024_gather5 wrt ..imagebase
+        DD      $L$SEH_end_rsaz_1024_gather5 wrt ..imagebase
+        DD      $L$SEH_info_rsaz_1024_gather5 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_rsaz_1024_sqr_avx2:
+DB      9,0,0,0
+        DD      rsaz_se_handler wrt ..imagebase
+        DD      $L$sqr_1024_body wrt ..imagebase,$L$sqr_1024_epilogue wrt ..imagebase,$L$sqr_1024_in_tail wrt ..imagebase
+        DD      0
+$L$SEH_info_rsaz_1024_mul_avx2:
+DB      9,0,0,0
+        DD      rsaz_se_handler wrt ..imagebase
+        DD      $L$mul_1024_body wrt ..imagebase,$L$mul_1024_epilogue wrt ..imagebase,$L$mul_1024_in_tail wrt ..imagebase
+        DD      0
+$L$SEH_info_rsaz_1024_gather5:
+DB      0x01,0x36,0x17,0x0b
+DB      0x36,0xf8,0x09,0x00
+DB      0x31,0xe8,0x08,0x00
+DB      0x2c,0xd8,0x07,0x00
+DB      0x27,0xc8,0x06,0x00
+DB      0x22,0xb8,0x05,0x00
+DB      0x1d,0xa8,0x04,0x00
+DB      0x18,0x98,0x03,0x00
+DB      0x13,0x88,0x02,0x00
+DB      0x0e,0x78,0x01,0x00
+DB      0x09,0x68,0x00,0x00
+DB      0x04,0x01,0x15,0x00
+DB      0x00,0xb3,0x00,0x00
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm
new file mode 100644
index 0000000000..eb4958e903
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm
@@ -0,0 +1,2242 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+; Copyright (c) 2012, Intel Corporation. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  rsaz_512_sqr
+
+ALIGN   32
+rsaz_512_sqr:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_rsaz_512_sqr:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        sub     rsp,128+24
+
+$L$sqr_body:
+        mov     rbp,rdx
+        mov     rdx,QWORD[rsi]
+        mov     rax,QWORD[8+rsi]
+        mov     QWORD[128+rsp],rcx
+        mov     r11d,0x80100
+        and     r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+        cmp     r11d,0x80100
+        je      NEAR $L$oop_sqrx
+        jmp     NEAR $L$oop_sqr
+
+ALIGN   32
+$L$oop_sqr:
+        mov     DWORD[((128+8))+rsp],r8d
+
+        mov     rbx,rdx
+        mul     rdx
+        mov     r8,rax
+        mov     rax,QWORD[16+rsi]
+        mov     r9,rdx
+
+        mul     rbx
+        add     r9,rax
+        mov     rax,QWORD[24+rsi]
+        mov     r10,rdx
+        adc     r10,0
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[32+rsi]
+        mov     r11,rdx
+        adc     r11,0
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[40+rsi]
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     rbx
+        add     r12,rax
+        mov     rax,QWORD[48+rsi]
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     rbx
+        add     r13,rax
+        mov     rax,QWORD[56+rsi]
+        mov     r14,rdx
+        adc     r14,0
+
+        mul     rbx
+        add     r14,rax
+        mov     rax,rbx
+        mov     r15,rdx
+        adc     r15,0
+
+        add     r8,r8
+        mov     rcx,r9
+        adc     r9,r9
+
+        mul     rax
+        mov     QWORD[rsp],rax
+        add     r8,rdx
+        adc     r9,0
+
+        mov     QWORD[8+rsp],r8
+        shr     rcx,63
+
+
+        mov     r8,QWORD[8+rsi]
+        mov     rax,QWORD[16+rsi]
+        mul     r8
+        add     r10,rax
+        mov     rax,QWORD[24+rsi]
+        mov     rbx,rdx
+        adc     rbx,0
+
+        mul     r8
+        add     r11,rax
+        mov     rax,QWORD[32+rsi]
+        adc     rdx,0
+        add     r11,rbx
+        mov     rbx,rdx
+        adc     rbx,0
+
+        mul     r8
+        add     r12,rax
+        mov     rax,QWORD[40+rsi]
+        adc     rdx,0
+        add     r12,rbx
+        mov     rbx,rdx
+        adc     rbx,0
+
+        mul     r8
+        add     r13,rax
+        mov     rax,QWORD[48+rsi]
+        adc     rdx,0
+        add     r13,rbx
+        mov     rbx,rdx
+        adc     rbx,0
+
+        mul     r8
+        add     r14,rax
+        mov     rax,QWORD[56+rsi]
+        adc     rdx,0
+        add     r14,rbx
+        mov     rbx,rdx
+        adc     rbx,0
+
+        mul     r8
+        add     r15,rax
+        mov     rax,r8
+        adc     rdx,0
+        add     r15,rbx
+        mov     r8,rdx
+        mov     rdx,r10
+        adc     r8,0
+
+        add     rdx,rdx
+        lea     r10,[r10*2+rcx]
+        mov     rbx,r11
+        adc     r11,r11
+
+        mul     rax
+        add     r9,rax
+        adc     r10,rdx
+        adc     r11,0
+
+        mov     QWORD[16+rsp],r9
+        mov     QWORD[24+rsp],r10
+        shr     rbx,63
+
+
+        mov     r9,QWORD[16+rsi]
+        mov     rax,QWORD[24+rsi]
+        mul     r9
+        add     r12,rax
+        mov     rax,QWORD[32+rsi]
+        mov     rcx,rdx
+        adc     rcx,0
+
+        mul     r9
+        add     r13,rax
+        mov     rax,QWORD[40+rsi]
+        adc     rdx,0
+        add     r13,rcx
+        mov     rcx,rdx
+        adc     rcx,0
+
+        mul     r9
+        add     r14,rax
+        mov     rax,QWORD[48+rsi]
+        adc     rdx,0
+        add     r14,rcx
+        mov     rcx,rdx
+        adc     rcx,0
+
+        mul     r9
+        mov     r10,r12
+        lea     r12,[r12*2+rbx]
+        add     r15,rax
+        mov     rax,QWORD[56+rsi]
+        adc     rdx,0
+        add     r15,rcx
+        mov     rcx,rdx
+        adc     rcx,0
+
+        mul     r9
+        shr     r10,63
+        add     r8,rax
+        mov     rax,r9
+        adc     rdx,0
+        add     r8,rcx
+        mov     r9,rdx
+        adc     r9,0
+
+        mov     rcx,r13
+        lea     r13,[r13*2+r10]
+
+        mul     rax
+        add     r11,rax
+        adc     r12,rdx
+        adc     r13,0
+
+        mov     QWORD[32+rsp],r11
+        mov     QWORD[40+rsp],r12
+        shr     rcx,63
+
+
+        mov     r10,QWORD[24+rsi]
+        mov     rax,QWORD[32+rsi]
+        mul     r10
+        add     r14,rax
+        mov     rax,QWORD[40+rsi]
+        mov     rbx,rdx
+        adc     rbx,0
+
+        mul     r10
+        add     r15,rax
+        mov     rax,QWORD[48+rsi]
+        adc     rdx,0
+        add     r15,rbx
+        mov     rbx,rdx
+        adc     rbx,0
+
+        mul     r10
+        mov     r12,r14
+        lea     r14,[r14*2+rcx]
+        add     r8,rax
+        mov     rax,QWORD[56+rsi]
+        adc     rdx,0
+        add     r8,rbx
+        mov     rbx,rdx
+        adc     rbx,0
+
+        mul     r10
+        shr     r12,63
+        add     r9,rax
+        mov     rax,r10
+        adc     rdx,0
+        add     r9,rbx
+        mov     r10,rdx
+        adc     r10,0
+
+        mov     rbx,r15
+        lea     r15,[r15*2+r12]
+
+        mul     rax
+        add     r13,rax
+        adc     r14,rdx
+        adc     r15,0
+
+        mov     QWORD[48+rsp],r13
+        mov     QWORD[56+rsp],r14
+        shr     rbx,63
+
+
+        mov     r11,QWORD[32+rsi]
+        mov     rax,QWORD[40+rsi]
+        mul     r11
+        add     r8,rax
+        mov     rax,QWORD[48+rsi]
+        mov     rcx,rdx
+        adc     rcx,0
+
+        mul     r11
+        add     r9,rax
+        mov     rax,QWORD[56+rsi]
+        adc     rdx,0
+        mov     r12,r8
+        lea     r8,[r8*2+rbx]
+        add     r9,rcx
+        mov     rcx,rdx
+        adc     rcx,0
+
+        mul     r11
+        shr     r12,63
+        add     r10,rax
+        mov     rax,r11
+        adc     rdx,0
+        add     r10,rcx
+        mov     r11,rdx
+        adc     r11,0
+
+        mov     rcx,r9
+        lea     r9,[r9*2+r12]
+
+        mul     rax
+        add     r15,rax
+        adc     r8,rdx
+        adc     r9,0
+
+        mov     QWORD[64+rsp],r15
+        mov     QWORD[72+rsp],r8
+        shr     rcx,63
+
+
+        mov     r12,QWORD[40+rsi]
+        mov     rax,QWORD[48+rsi]
+        mul     r12
+        add     r10,rax
+        mov     rax,QWORD[56+rsi]
+        mov     rbx,rdx
+        adc     rbx,0
+
+        mul     r12
+        add     r11,rax
+        mov     rax,r12
+        mov     r15,r10
+        lea     r10,[r10*2+rcx]
+        adc     rdx,0
+        shr     r15,63
+        add     r11,rbx
+        mov     r12,rdx
+        adc     r12,0
+
+        mov     rbx,r11
+        lea     r11,[r11*2+r15]
+
+        mul     rax
+        add     r9,rax
+        adc     r10,rdx
+        adc     r11,0
+
+        mov     QWORD[80+rsp],r9
+        mov     QWORD[88+rsp],r10
+
+
+        mov     r13,QWORD[48+rsi]
+        mov     rax,QWORD[56+rsi]
+        mul     r13
+        add     r12,rax
+        mov     rax,r13
+        mov     r13,rdx
+        adc     r13,0
+
+        xor     r14,r14
+        shl     rbx,1
+        adc     r12,r12
+        adc     r13,r13
+        adc     r14,r14
+
+        mul     rax
+        add     r11,rax
+        adc     r12,rdx
+        adc     r13,0
+
+        mov     QWORD[96+rsp],r11
+        mov     QWORD[104+rsp],r12
+
+
+        mov     rax,QWORD[56+rsi]
+        mul     rax
+        add     r13,rax
+        adc     rdx,0
+
+        add     r14,rdx
+
+        mov     QWORD[112+rsp],r13
+        mov     QWORD[120+rsp],r14
+
+        mov     r8,QWORD[rsp]
+        mov     r9,QWORD[8+rsp]
+        mov     r10,QWORD[16+rsp]
+        mov     r11,QWORD[24+rsp]
+        mov     r12,QWORD[32+rsp]
+        mov     r13,QWORD[40+rsp]
+        mov     r14,QWORD[48+rsp]
+        mov     r15,QWORD[56+rsp]
+
+        call    __rsaz_512_reduce
+
+        add     r8,QWORD[64+rsp]
+        adc     r9,QWORD[72+rsp]
+        adc     r10,QWORD[80+rsp]
+        adc     r11,QWORD[88+rsp]
+        adc     r12,QWORD[96+rsp]
+        adc     r13,QWORD[104+rsp]
+        adc     r14,QWORD[112+rsp]
+        adc     r15,QWORD[120+rsp]
+        sbb     rcx,rcx
+
+        call    __rsaz_512_subtract
+
+        mov     rdx,r8
+        mov     rax,r9
+        mov     r8d,DWORD[((128+8))+rsp]
+        mov     rsi,rdi
+
+        dec     r8d
+        jnz     NEAR $L$oop_sqr
+        jmp     NEAR $L$sqr_tail
+
+ALIGN   32
+$L$oop_sqrx:
+        mov     DWORD[((128+8))+rsp],r8d
+DB      102,72,15,110,199
+DB      102,72,15,110,205
+
+        mulx    r9,r8,rax
+
+        mulx    r10,rcx,QWORD[16+rsi]
+        xor     rbp,rbp
+
+        mulx    r11,rax,QWORD[24+rsi]
+        adcx    r9,rcx
+
+        mulx    r12,rcx,QWORD[32+rsi]
+        adcx    r10,rax
+
+        mulx    r13,rax,QWORD[40+rsi]
+        adcx    r11,rcx
+
+DB      0xc4,0x62,0xf3,0xf6,0xb6,0x30,0x00,0x00,0x00
+        adcx    r12,rax
+        adcx    r13,rcx
+
+DB      0xc4,0x62,0xfb,0xf6,0xbe,0x38,0x00,0x00,0x00
+        adcx    r14,rax
+        adcx    r15,rbp
+
+        mov     rcx,r9
+        shld    r9,r8,1
+        shl     r8,1
+
+        xor     ebp,ebp
+        mulx    rdx,rax,rdx
+        adcx    r8,rdx
+        mov     rdx,QWORD[8+rsi]
+        adcx    r9,rbp
+
+        mov     QWORD[rsp],rax
+        mov     QWORD[8+rsp],r8
+
+
+        mulx    rbx,rax,QWORD[16+rsi]
+        adox    r10,rax
+        adcx    r11,rbx
+
+DB      0xc4,0x62,0xc3,0xf6,0x86,0x18,0x00,0x00,0x00
+        adox    r11,rdi
+        adcx    r12,r8
+
+        mulx    rbx,rax,QWORD[32+rsi]
+        adox    r12,rax
+        adcx    r13,rbx
+
+        mulx    r8,rdi,QWORD[40+rsi]
+        adox    r13,rdi
+        adcx    r14,r8
+
+DB      0xc4,0xe2,0xfb,0xf6,0x9e,0x30,0x00,0x00,0x00
+        adox    r14,rax
+        adcx    r15,rbx
+
+DB      0xc4,0x62,0xc3,0xf6,0x86,0x38,0x00,0x00,0x00
+        adox    r15,rdi
+        adcx    r8,rbp
+        adox    r8,rbp
+
+        mov     rbx,r11
+        shld    r11,r10,1
+        shld    r10,rcx,1
+
+        xor     ebp,ebp
+        mulx    rcx,rax,rdx
+        mov     rdx,QWORD[16+rsi]
+        adcx    r9,rax
+        adcx    r10,rcx
+        adcx    r11,rbp
+
+        mov     QWORD[16+rsp],r9
+DB      0x4c,0x89,0x94,0x24,0x18,0x00,0x00,0x00
+
+
+DB      0xc4,0x62,0xc3,0xf6,0x8e,0x18,0x00,0x00,0x00
+        adox    r12,rdi
+        adcx    r13,r9
+
+        mulx    rcx,rax,QWORD[32+rsi]
+        adox    r13,rax
+        adcx    r14,rcx
+
+        mulx    r9,rdi,QWORD[40+rsi]
+        adox    r14,rdi
+        adcx    r15,r9
+
+DB      0xc4,0xe2,0xfb,0xf6,0x8e,0x30,0x00,0x00,0x00
+        adox    r15,rax
+        adcx    r8,rcx
+
+DB      0xc4,0x62,0xc3,0xf6,0x8e,0x38,0x00,0x00,0x00
+        adox    r8,rdi
+        adcx    r9,rbp
+        adox    r9,rbp
+
+        mov     rcx,r13
+        shld    r13,r12,1
+        shld    r12,rbx,1
+
+        xor     ebp,ebp
+        mulx    rdx,rax,rdx
+        adcx    r11,rax
+        adcx    r12,rdx
+        mov     rdx,QWORD[24+rsi]
+        adcx    r13,rbp
+
+        mov     QWORD[32+rsp],r11
+DB      0x4c,0x89,0xa4,0x24,0x28,0x00,0x00,0x00
+
+
+DB      0xc4,0xe2,0xfb,0xf6,0x9e,0x20,0x00,0x00,0x00
+        adox    r14,rax
+        adcx    r15,rbx
+
+        mulx    r10,rdi,QWORD[40+rsi]
+        adox    r15,rdi
+        adcx    r8,r10
+
+        mulx    rbx,rax,QWORD[48+rsi]
+        adox    r8,rax
+        adcx    r9,rbx
+
+        mulx    r10,rdi,QWORD[56+rsi]
+        adox    r9,rdi
+        adcx    r10,rbp
+        adox    r10,rbp
+
+DB      0x66
+        mov     rbx,r15
+        shld    r15,r14,1
+        shld    r14,rcx,1
+
+        xor     ebp,ebp
+        mulx    rdx,rax,rdx
+        adcx    r13,rax
+        adcx    r14,rdx
+        mov     rdx,QWORD[32+rsi]
+        adcx    r15,rbp
+
+        mov     QWORD[48+rsp],r13
+        mov     QWORD[56+rsp],r14
+
+
+DB      0xc4,0x62,0xc3,0xf6,0x9e,0x28,0x00,0x00,0x00
+        adox    r8,rdi
+        adcx    r9,r11
+
+        mulx    rcx,rax,QWORD[48+rsi]
+        adox    r9,rax
+        adcx    r10,rcx
+
+        mulx    r11,rdi,QWORD[56+rsi]
+        adox    r10,rdi
+        adcx    r11,rbp
+        adox    r11,rbp
+
+        mov     rcx,r9
+        shld    r9,r8,1
+        shld    r8,rbx,1
+
+        xor     ebp,ebp
+        mulx    rdx,rax,rdx
+        adcx    r15,rax
+        adcx    r8,rdx
+        mov     rdx,QWORD[40+rsi]
+        adcx    r9,rbp
+
+        mov     QWORD[64+rsp],r15
+        mov     QWORD[72+rsp],r8
+
+
+DB      0xc4,0xe2,0xfb,0xf6,0x9e,0x30,0x00,0x00,0x00
+        adox    r10,rax
+        adcx    r11,rbx
+
+DB      0xc4,0x62,0xc3,0xf6,0xa6,0x38,0x00,0x00,0x00
+        adox    r11,rdi
+        adcx    r12,rbp
+        adox    r12,rbp
+
+        mov     rbx,r11
+        shld    r11,r10,1
+        shld    r10,rcx,1
+
+        xor     ebp,ebp
+        mulx    rdx,rax,rdx
+        adcx    r9,rax
+        adcx    r10,rdx
+        mov     rdx,QWORD[48+rsi]
+        adcx    r11,rbp
+
+        mov     QWORD[80+rsp],r9
+        mov     QWORD[88+rsp],r10
+
+
+DB      0xc4,0x62,0xfb,0xf6,0xae,0x38,0x00,0x00,0x00
+        adox    r12,rax
+        adox    r13,rbp
+
+        xor     r14,r14
+        shld    r14,r13,1
+        shld    r13,r12,1
+        shld    r12,rbx,1
+
+        xor     ebp,ebp
+        mulx    rdx,rax,rdx
+        adcx    r11,rax
+        adcx    r12,rdx
+        mov     rdx,QWORD[56+rsi]
+        adcx    r13,rbp
+
+DB      0x4c,0x89,0x9c,0x24,0x60,0x00,0x00,0x00
+DB      0x4c,0x89,0xa4,0x24,0x68,0x00,0x00,0x00
+
+
+        mulx    rdx,rax,rdx
+        adox    r13,rax
+        adox    rdx,rbp
+
+DB      0x66
+        add     r14,rdx
+
+        mov     QWORD[112+rsp],r13
+        mov     QWORD[120+rsp],r14
+DB      102,72,15,126,199
+DB      102,72,15,126,205
+
+        mov     rdx,QWORD[128+rsp]
+        mov     r8,QWORD[rsp]
+        mov     r9,QWORD[8+rsp]
+        mov     r10,QWORD[16+rsp]
+        mov     r11,QWORD[24+rsp]
+        mov     r12,QWORD[32+rsp]
+        mov     r13,QWORD[40+rsp]
+        mov     r14,QWORD[48+rsp]
+        mov     r15,QWORD[56+rsp]
+
+        call    __rsaz_512_reducex
+
+        add     r8,QWORD[64+rsp]
+        adc     r9,QWORD[72+rsp]
+        adc     r10,QWORD[80+rsp]
+        adc     r11,QWORD[88+rsp]
+        adc     r12,QWORD[96+rsp]
+        adc     r13,QWORD[104+rsp]
+        adc     r14,QWORD[112+rsp]
+        adc     r15,QWORD[120+rsp]
+        sbb     rcx,rcx
+
+        call    __rsaz_512_subtract
+
+        mov     rdx,r8
+        mov     rax,r9
+        mov     r8d,DWORD[((128+8))+rsp]
+        mov     rsi,rdi
+
+        dec     r8d
+        jnz     NEAR $L$oop_sqrx
+
+$L$sqr_tail:
+
+        lea     rax,[((128+24+48))+rsp]
+
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$sqr_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_rsaz_512_sqr:
+global  rsaz_512_mul
+
+ALIGN   32
+rsaz_512_mul:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_rsaz_512_mul:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        sub     rsp,128+24
+
+$L$mul_body:
+DB      102,72,15,110,199
+DB      102,72,15,110,201
+        mov     QWORD[128+rsp],r8
+        mov     r11d,0x80100
+        and     r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+        cmp     r11d,0x80100
+        je      NEAR $L$mulx
+        mov     rbx,QWORD[rdx]
+        mov     rbp,rdx
+        call    __rsaz_512_mul
+
+DB      102,72,15,126,199
+DB      102,72,15,126,205
+
+        mov     r8,QWORD[rsp]
+        mov     r9,QWORD[8+rsp]
+        mov     r10,QWORD[16+rsp]
+        mov     r11,QWORD[24+rsp]
+        mov     r12,QWORD[32+rsp]
+        mov     r13,QWORD[40+rsp]
+        mov     r14,QWORD[48+rsp]
+        mov     r15,QWORD[56+rsp]
+
+        call    __rsaz_512_reduce
+        jmp     NEAR $L$mul_tail
+
+ALIGN   32
+$L$mulx:
+        mov     rbp,rdx
+        mov     rdx,QWORD[rdx]
+        call    __rsaz_512_mulx
+
+DB      102,72,15,126,199
+DB      102,72,15,126,205
+
+        mov     rdx,QWORD[128+rsp]
+        mov     r8,QWORD[rsp]
+        mov     r9,QWORD[8+rsp]
+        mov     r10,QWORD[16+rsp]
+        mov     r11,QWORD[24+rsp]
+        mov     r12,QWORD[32+rsp]
+        mov     r13,QWORD[40+rsp]
+        mov     r14,QWORD[48+rsp]
+        mov     r15,QWORD[56+rsp]
+
+        call    __rsaz_512_reducex
+$L$mul_tail:
+        add     r8,QWORD[64+rsp]
+        adc     r9,QWORD[72+rsp]
+        adc     r10,QWORD[80+rsp]
+        adc     r11,QWORD[88+rsp]
+        adc     r12,QWORD[96+rsp]
+        adc     r13,QWORD[104+rsp]
+        adc     r14,QWORD[112+rsp]
+        adc     r15,QWORD[120+rsp]
+        sbb     rcx,rcx
+
+        call    __rsaz_512_subtract
+
+        lea     rax,[((128+24+48))+rsp]
+
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$mul_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_rsaz_512_mul:
+global  rsaz_512_mul_gather4
+
+ALIGN   32
+rsaz_512_mul_gather4:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_rsaz_512_mul_gather4:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        sub     rsp,328
+
+        movaps  XMMWORD[160+rsp],xmm6
+        movaps  XMMWORD[176+rsp],xmm7
+        movaps  XMMWORD[192+rsp],xmm8
+        movaps  XMMWORD[208+rsp],xmm9
+        movaps  XMMWORD[224+rsp],xmm10
+        movaps  XMMWORD[240+rsp],xmm11
+        movaps  XMMWORD[256+rsp],xmm12
+        movaps  XMMWORD[272+rsp],xmm13
+        movaps  XMMWORD[288+rsp],xmm14
+        movaps  XMMWORD[304+rsp],xmm15
+$L$mul_gather4_body:
+        movd    xmm8,r9d
+        movdqa  xmm1,XMMWORD[(($L$inc+16))]
+        movdqa  xmm0,XMMWORD[$L$inc]
+
+        pshufd  xmm8,xmm8,0
+        movdqa  xmm7,xmm1
+        movdqa  xmm2,xmm1
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm8
+        movdqa  xmm3,xmm7
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm8
+        movdqa  xmm4,xmm7
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm8
+        movdqa  xmm5,xmm7
+        paddd   xmm4,xmm3
+        pcmpeqd xmm3,xmm8
+        movdqa  xmm6,xmm7
+        paddd   xmm5,xmm4
+        pcmpeqd xmm4,xmm8
+        paddd   xmm6,xmm5
+        pcmpeqd xmm5,xmm8
+        paddd   xmm7,xmm6
+        pcmpeqd xmm6,xmm8
+        pcmpeqd xmm7,xmm8
+
+        movdqa  xmm8,XMMWORD[rdx]
+        movdqa  xmm9,XMMWORD[16+rdx]
+        movdqa  xmm10,XMMWORD[32+rdx]
+        movdqa  xmm11,XMMWORD[48+rdx]
+        pand    xmm8,xmm0
+        movdqa  xmm12,XMMWORD[64+rdx]
+        pand    xmm9,xmm1
+        movdqa  xmm13,XMMWORD[80+rdx]
+        pand    xmm10,xmm2
+        movdqa  xmm14,XMMWORD[96+rdx]
+        pand    xmm11,xmm3
+        movdqa  xmm15,XMMWORD[112+rdx]
+        lea     rbp,[128+rdx]
+        pand    xmm12,xmm4
+        pand    xmm13,xmm5
+        pand    xmm14,xmm6
+        pand    xmm15,xmm7
+        por     xmm8,xmm10
+        por     xmm9,xmm11
+        por     xmm8,xmm12
+        por     xmm9,xmm13
+        por     xmm8,xmm14
+        por     xmm9,xmm15
+
+        por     xmm8,xmm9
+        pshufd  xmm9,xmm8,0x4e
+        por     xmm8,xmm9
+        mov     r11d,0x80100
+        and     r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+        cmp     r11d,0x80100
+        je      NEAR $L$mulx_gather
+DB      102,76,15,126,195
+
+        mov     QWORD[128+rsp],r8
+        mov     QWORD[((128+8))+rsp],rdi
+        mov     QWORD[((128+16))+rsp],rcx
+
+        mov     rax,QWORD[rsi]
+        mov     rcx,QWORD[8+rsi]
+        mul     rbx
+        mov     QWORD[rsp],rax
+        mov     rax,rcx
+        mov     r8,rdx
+
+        mul     rbx
+        add     r8,rax
+        mov     rax,QWORD[16+rsi]
+        mov     r9,rdx
+        adc     r9,0
+
+        mul     rbx
+        add     r9,rax
+        mov     rax,QWORD[24+rsi]
+        mov     r10,rdx
+        adc     r10,0
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[32+rsi]
+        mov     r11,rdx
+        adc     r11,0
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[40+rsi]
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     rbx
+        add     r12,rax
+        mov     rax,QWORD[48+rsi]
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     rbx
+        add     r13,rax
+        mov     rax,QWORD[56+rsi]
+        mov     r14,rdx
+        adc     r14,0
+
+        mul     rbx
+        add     r14,rax
+        mov     rax,QWORD[rsi]
+        mov     r15,rdx
+        adc     r15,0
+
+        lea     rdi,[8+rsp]
+        mov     ecx,7
+        jmp     NEAR $L$oop_mul_gather
+
+ALIGN   32
+$L$oop_mul_gather:
+        movdqa  xmm8,XMMWORD[rbp]
+        movdqa  xmm9,XMMWORD[16+rbp]
+        movdqa  xmm10,XMMWORD[32+rbp]
+        movdqa  xmm11,XMMWORD[48+rbp]
+        pand    xmm8,xmm0
+        movdqa  xmm12,XMMWORD[64+rbp]
+        pand    xmm9,xmm1
+        movdqa  xmm13,XMMWORD[80+rbp]
+        pand    xmm10,xmm2
+        movdqa  xmm14,XMMWORD[96+rbp]
+        pand    xmm11,xmm3
+        movdqa  xmm15,XMMWORD[112+rbp]
+        lea     rbp,[128+rbp]
+        pand    xmm12,xmm4
+        pand    xmm13,xmm5
+        pand    xmm14,xmm6
+        pand    xmm15,xmm7
+        por     xmm8,xmm10
+        por     xmm9,xmm11
+        por     xmm8,xmm12
+        por     xmm9,xmm13
+        por     xmm8,xmm14
+        por     xmm9,xmm15
+
+        por     xmm8,xmm9
+        pshufd  xmm9,xmm8,0x4e
+        por     xmm8,xmm9
+DB      102,76,15,126,195
+
+        mul     rbx
+        add     r8,rax
+        mov     rax,QWORD[8+rsi]
+        mov     QWORD[rdi],r8
+        mov     r8,rdx
+        adc     r8,0
+
+        mul     rbx
+        add     r9,rax
+        mov     rax,QWORD[16+rsi]
+        adc     rdx,0
+        add     r8,r9
+        mov     r9,rdx
+        adc     r9,0
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[24+rsi]
+        adc     rdx,0
+        add     r9,r10
+        mov     r10,rdx
+        adc     r10,0
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[32+rsi]
+        adc     rdx,0
+        add     r10,r11
+        mov     r11,rdx
+        adc     r11,0
+
+        mul     rbx
+        add     r12,rax
+        mov     rax,QWORD[40+rsi]
+        adc     rdx,0
+        add     r11,r12
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     rbx
+        add     r13,rax
+        mov     rax,QWORD[48+rsi]
+        adc     rdx,0
+        add     r12,r13
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     rbx
+        add     r14,rax
+        mov     rax,QWORD[56+rsi]
+        adc     rdx,0
+        add     r13,r14
+        mov     r14,rdx
+        adc     r14,0
+
+        mul     rbx
+        add     r15,rax
+        mov     rax,QWORD[rsi]
+        adc     rdx,0
+        add     r14,r15
+        mov     r15,rdx
+        adc     r15,0
+
+        lea     rdi,[8+rdi]
+
+        dec     ecx
+        jnz     NEAR $L$oop_mul_gather
+
+        mov     QWORD[rdi],r8
+        mov     QWORD[8+rdi],r9
+        mov     QWORD[16+rdi],r10
+        mov     QWORD[24+rdi],r11
+        mov     QWORD[32+rdi],r12
+        mov     QWORD[40+rdi],r13
+        mov     QWORD[48+rdi],r14
+        mov     QWORD[56+rdi],r15
+
+        mov     rdi,QWORD[((128+8))+rsp]
+        mov     rbp,QWORD[((128+16))+rsp]
+
+        mov     r8,QWORD[rsp]
+        mov     r9,QWORD[8+rsp]
+        mov     r10,QWORD[16+rsp]
+        mov     r11,QWORD[24+rsp]
+        mov     r12,QWORD[32+rsp]
+        mov     r13,QWORD[40+rsp]
+        mov     r14,QWORD[48+rsp]
+        mov     r15,QWORD[56+rsp]
+
+        call    __rsaz_512_reduce
+        jmp     NEAR $L$mul_gather_tail
+
+ALIGN   32
+$L$mulx_gather:
+DB      102,76,15,126,194
+
+        mov     QWORD[128+rsp],r8
+        mov     QWORD[((128+8))+rsp],rdi
+        mov     QWORD[((128+16))+rsp],rcx
+
+        mulx    r8,rbx,QWORD[rsi]
+        mov     QWORD[rsp],rbx
+        xor     edi,edi
+
+        mulx    r9,rax,QWORD[8+rsi]
+
+        mulx    r10,rbx,QWORD[16+rsi]
+        adcx    r8,rax
+
+        mulx    r11,rax,QWORD[24+rsi]
+        adcx    r9,rbx
+
+        mulx    r12,rbx,QWORD[32+rsi]
+        adcx    r10,rax
+
+        mulx    r13,rax,QWORD[40+rsi]
+        adcx    r11,rbx
+
+        mulx    r14,rbx,QWORD[48+rsi]
+        adcx    r12,rax
+
+        mulx    r15,rax,QWORD[56+rsi]
+        adcx    r13,rbx
+        adcx    r14,rax
+DB      0x67
+        mov     rbx,r8
+        adcx    r15,rdi
+
+        mov     rcx,-7
+        jmp     NEAR $L$oop_mulx_gather
+
+ALIGN   32
+$L$oop_mulx_gather:
+        movdqa  xmm8,XMMWORD[rbp]
+        movdqa  xmm9,XMMWORD[16+rbp]
+        movdqa  xmm10,XMMWORD[32+rbp]
+        movdqa  xmm11,XMMWORD[48+rbp]
+        pand    xmm8,xmm0
+        movdqa  xmm12,XMMWORD[64+rbp]
+        pand    xmm9,xmm1
+        movdqa  xmm13,XMMWORD[80+rbp]
+        pand    xmm10,xmm2
+        movdqa  xmm14,XMMWORD[96+rbp]
+        pand    xmm11,xmm3
+        movdqa  xmm15,XMMWORD[112+rbp]
+        lea     rbp,[128+rbp]
+        pand    xmm12,xmm4
+        pand    xmm13,xmm5
+        pand    xmm14,xmm6
+        pand    xmm15,xmm7
+        por     xmm8,xmm10
+        por     xmm9,xmm11
+        por     xmm8,xmm12
+        por     xmm9,xmm13
+        por     xmm8,xmm14
+        por     xmm9,xmm15
+
+        por     xmm8,xmm9
+        pshufd  xmm9,xmm8,0x4e
+        por     xmm8,xmm9
+DB      102,76,15,126,194
+
+DB      0xc4,0x62,0xfb,0xf6,0x86,0x00,0x00,0x00,0x00
+        adcx    rbx,rax
+        adox    r8,r9
+
+        mulx    r9,rax,QWORD[8+rsi]
+        adcx    r8,rax
+        adox    r9,r10
+
+        mulx    r10,rax,QWORD[16+rsi]
+        adcx    r9,rax
+        adox    r10,r11
+
+DB      0xc4,0x62,0xfb,0xf6,0x9e,0x18,0x00,0x00,0x00
+        adcx    r10,rax
+        adox    r11,r12
+
+        mulx    r12,rax,QWORD[32+rsi]
+        adcx    r11,rax
+        adox    r12,r13
+
+        mulx    r13,rax,QWORD[40+rsi]
+        adcx    r12,rax
+        adox    r13,r14
+
+DB      0xc4,0x62,0xfb,0xf6,0xb6,0x30,0x00,0x00,0x00
+        adcx    r13,rax
+DB      0x67
+        adox    r14,r15
+
+        mulx    r15,rax,QWORD[56+rsi]
+        mov     QWORD[64+rcx*8+rsp],rbx
+        adcx    r14,rax
+        adox    r15,rdi
+        mov     rbx,r8
+        adcx    r15,rdi
+
+        inc     rcx
+        jnz     NEAR $L$oop_mulx_gather
+
+        mov     QWORD[64+rsp],r8
+        mov     QWORD[((64+8))+rsp],r9
+        mov     QWORD[((64+16))+rsp],r10
+        mov     QWORD[((64+24))+rsp],r11
+        mov     QWORD[((64+32))+rsp],r12
+        mov     QWORD[((64+40))+rsp],r13
+        mov     QWORD[((64+48))+rsp],r14
+        mov     QWORD[((64+56))+rsp],r15
+
+        mov     rdx,QWORD[128+rsp]
+        mov     rdi,QWORD[((128+8))+rsp]
+        mov     rbp,QWORD[((128+16))+rsp]
+
+        mov     r8,QWORD[rsp]
+        mov     r9,QWORD[8+rsp]
+        mov     r10,QWORD[16+rsp]
+        mov     r11,QWORD[24+rsp]
+        mov     r12,QWORD[32+rsp]
+        mov     r13,QWORD[40+rsp]
+        mov     r14,QWORD[48+rsp]
+        mov     r15,QWORD[56+rsp]
+
+        call    __rsaz_512_reducex
+
+$L$mul_gather_tail:
+        add     r8,QWORD[64+rsp]
+        adc     r9,QWORD[72+rsp]
+        adc     r10,QWORD[80+rsp]
+        adc     r11,QWORD[88+rsp]
+        adc     r12,QWORD[96+rsp]
+        adc     r13,QWORD[104+rsp]
+        adc     r14,QWORD[112+rsp]
+        adc     r15,QWORD[120+rsp]
+        sbb     rcx,rcx
+
+        call    __rsaz_512_subtract
+
+        lea     rax,[((128+24+48))+rsp]
+        movaps  xmm6,XMMWORD[((160-200))+rax]
+        movaps  xmm7,XMMWORD[((176-200))+rax]
+        movaps  xmm8,XMMWORD[((192-200))+rax]
+        movaps  xmm9,XMMWORD[((208-200))+rax]
+        movaps  xmm10,XMMWORD[((224-200))+rax]
+        movaps  xmm11,XMMWORD[((240-200))+rax]
+        movaps  xmm12,XMMWORD[((256-200))+rax]
+        movaps  xmm13,XMMWORD[((272-200))+rax]
+        movaps  xmm14,XMMWORD[((288-200))+rax]
+        movaps  xmm15,XMMWORD[((304-200))+rax]
+        lea     rax,[176+rax]
+
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$mul_gather4_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_rsaz_512_mul_gather4:
+global  rsaz_512_mul_scatter4
+
+ALIGN   32
+rsaz_512_mul_scatter4:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_rsaz_512_mul_scatter4:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        mov     r9d,r9d
+        sub     rsp,128+24
+
+$L$mul_scatter4_body:
+        lea     r8,[r9*8+r8]
+DB      102,72,15,110,199
+DB      102,72,15,110,202
+DB      102,73,15,110,208
+        mov     QWORD[128+rsp],rcx
+
+        mov     rbp,rdi
+        mov     r11d,0x80100
+        and     r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+        cmp     r11d,0x80100
+        je      NEAR $L$mulx_scatter
+        mov     rbx,QWORD[rdi]
+        call    __rsaz_512_mul
+
+DB      102,72,15,126,199
+DB      102,72,15,126,205
+
+        mov     r8,QWORD[rsp]
+        mov     r9,QWORD[8+rsp]
+        mov     r10,QWORD[16+rsp]
+        mov     r11,QWORD[24+rsp]
+        mov     r12,QWORD[32+rsp]
+        mov     r13,QWORD[40+rsp]
+        mov     r14,QWORD[48+rsp]
+        mov     r15,QWORD[56+rsp]
+
+        call    __rsaz_512_reduce
+        jmp     NEAR $L$mul_scatter_tail
+
+ALIGN   32
+$L$mulx_scatter:
+        mov     rdx,QWORD[rdi]
+        call    __rsaz_512_mulx
+
+DB      102,72,15,126,199
+DB      102,72,15,126,205
+
+        mov     rdx,QWORD[128+rsp]
+        mov     r8,QWORD[rsp]
+        mov     r9,QWORD[8+rsp]
+        mov     r10,QWORD[16+rsp]
+        mov     r11,QWORD[24+rsp]
+        mov     r12,QWORD[32+rsp]
+        mov     r13,QWORD[40+rsp]
+        mov     r14,QWORD[48+rsp]
+        mov     r15,QWORD[56+rsp]
+
+        call    __rsaz_512_reducex
+
+$L$mul_scatter_tail:
+        add     r8,QWORD[64+rsp]
+        adc     r9,QWORD[72+rsp]
+        adc     r10,QWORD[80+rsp]
+        adc     r11,QWORD[88+rsp]
+        adc     r12,QWORD[96+rsp]
+        adc     r13,QWORD[104+rsp]
+        adc     r14,QWORD[112+rsp]
+        adc     r15,QWORD[120+rsp]
+DB      102,72,15,126,214
+        sbb     rcx,rcx
+
+        call    __rsaz_512_subtract
+
+        mov     QWORD[rsi],r8
+        mov     QWORD[128+rsi],r9
+        mov     QWORD[256+rsi],r10
+        mov     QWORD[384+rsi],r11
+        mov     QWORD[512+rsi],r12
+        mov     QWORD[640+rsi],r13
+        mov     QWORD[768+rsi],r14
+        mov     QWORD[896+rsi],r15
+
+        lea     rax,[((128+24+48))+rsp]
+
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$mul_scatter4_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_rsaz_512_mul_scatter4:
+global  rsaz_512_mul_by_one
+
+ALIGN   32
+rsaz_512_mul_by_one:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_rsaz_512_mul_by_one:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        sub     rsp,128+24
+
+$L$mul_by_one_body:
+        mov     eax,DWORD[((OPENSSL_ia32cap_P+8))]
+        mov     rbp,rdx
+        mov     QWORD[128+rsp],rcx
+
+        mov     r8,QWORD[rsi]
+        pxor    xmm0,xmm0
+        mov     r9,QWORD[8+rsi]
+        mov     r10,QWORD[16+rsi]
+        mov     r11,QWORD[24+rsi]
+        mov     r12,QWORD[32+rsi]
+        mov     r13,QWORD[40+rsi]
+        mov     r14,QWORD[48+rsi]
+        mov     r15,QWORD[56+rsi]
+
+        movdqa  XMMWORD[rsp],xmm0
+        movdqa  XMMWORD[16+rsp],xmm0
+        movdqa  XMMWORD[32+rsp],xmm0
+        movdqa  XMMWORD[48+rsp],xmm0
+        movdqa  XMMWORD[64+rsp],xmm0
+        movdqa  XMMWORD[80+rsp],xmm0
+        movdqa  XMMWORD[96+rsp],xmm0
+        and     eax,0x80100
+        cmp     eax,0x80100
+        je      NEAR $L$by_one_callx
+        call    __rsaz_512_reduce
+        jmp     NEAR $L$by_one_tail
+ALIGN   32
+$L$by_one_callx:
+        mov     rdx,QWORD[128+rsp]
+        call    __rsaz_512_reducex
+$L$by_one_tail:
+        mov     QWORD[rdi],r8
+        mov     QWORD[8+rdi],r9
+        mov     QWORD[16+rdi],r10
+        mov     QWORD[24+rdi],r11
+        mov     QWORD[32+rdi],r12
+        mov     QWORD[40+rdi],r13
+        mov     QWORD[48+rdi],r14
+        mov     QWORD[56+rdi],r15
+
+        lea     rax,[((128+24+48))+rsp]
+
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$mul_by_one_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_rsaz_512_mul_by_one:
+
+ALIGN   32
+__rsaz_512_reduce:
+        mov     rbx,r8
+        imul    rbx,QWORD[((128+8))+rsp]
+        mov     rax,QWORD[rbp]
+        mov     ecx,8
+        jmp     NEAR $L$reduction_loop
+
+ALIGN   32
+$L$reduction_loop:
+        mul     rbx
+        mov     rax,QWORD[8+rbp]
+        neg     r8
+        mov     r8,rdx
+        adc     r8,0
+
+        mul     rbx
+        add     r9,rax
+        mov     rax,QWORD[16+rbp]
+        adc     rdx,0
+        add     r8,r9
+        mov     r9,rdx
+        adc     r9,0
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[24+rbp]
+        adc     rdx,0
+        add     r9,r10
+        mov     r10,rdx
+        adc     r10,0
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[32+rbp]
+        adc     rdx,0
+        add     r10,r11
+        mov     rsi,QWORD[((128+8))+rsp]
+
+
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbx
+        add     r12,rax
+        mov     rax,QWORD[40+rbp]
+        adc     rdx,0
+        imul    rsi,r8
+        add     r11,r12
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     rbx
+        add     r13,rax
+        mov     rax,QWORD[48+rbp]
+        adc     rdx,0
+        add     r12,r13
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     rbx
+        add     r14,rax
+        mov     rax,QWORD[56+rbp]
+        adc     rdx,0
+        add     r13,r14
+        mov     r14,rdx
+        adc     r14,0
+
+        mul     rbx
+        mov     rbx,rsi
+        add     r15,rax
+        mov     rax,QWORD[rbp]
+        adc     rdx,0
+        add     r14,r15
+        mov     r15,rdx
+        adc     r15,0
+
+        dec     ecx
+        jne     NEAR $L$reduction_loop
+
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   32
+__rsaz_512_reducex:
+
+        imul    rdx,r8
+        xor     rsi,rsi
+        mov     ecx,8
+        jmp     NEAR $L$reduction_loopx
+
+ALIGN   32
+$L$reduction_loopx:
+        mov     rbx,r8
+        mulx    r8,rax,QWORD[rbp]
+        adcx    rax,rbx
+        adox    r8,r9
+
+        mulx    r9,rax,QWORD[8+rbp]
+        adcx    r8,rax
+        adox    r9,r10
+
+        mulx    r10,rbx,QWORD[16+rbp]
+        adcx    r9,rbx
+        adox    r10,r11
+
+        mulx    r11,rbx,QWORD[24+rbp]
+        adcx    r10,rbx
+        adox    r11,r12
+
+DB      0xc4,0x62,0xe3,0xf6,0xa5,0x20,0x00,0x00,0x00
+        mov     rax,rdx
+        mov     rdx,r8
+        adcx    r11,rbx
+        adox    r12,r13
+
+        mulx    rdx,rbx,QWORD[((128+8))+rsp]
+        mov     rdx,rax
+
+        mulx    r13,rax,QWORD[40+rbp]
+        adcx    r12,rax
+        adox    r13,r14
+
+DB      0xc4,0x62,0xfb,0xf6,0xb5,0x30,0x00,0x00,0x00
+        adcx    r13,rax
+        adox    r14,r15
+
+        mulx    r15,rax,QWORD[56+rbp]
+        mov     rdx,rbx
+        adcx    r14,rax
+        adox    r15,rsi
+        adcx    r15,rsi
+
+        dec     ecx
+        jne     NEAR $L$reduction_loopx
+
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   32
+__rsaz_512_subtract:
+        mov     QWORD[rdi],r8
+        mov     QWORD[8+rdi],r9
+        mov     QWORD[16+rdi],r10
+        mov     QWORD[24+rdi],r11
+        mov     QWORD[32+rdi],r12
+        mov     QWORD[40+rdi],r13
+        mov     QWORD[48+rdi],r14
+        mov     QWORD[56+rdi],r15
+
+        mov     r8,QWORD[rbp]
+        mov     r9,QWORD[8+rbp]
+        neg     r8
+        not     r9
+        and     r8,rcx
+        mov     r10,QWORD[16+rbp]
+        and     r9,rcx
+        not     r10
+        mov     r11,QWORD[24+rbp]
+        and     r10,rcx
+        not     r11
+        mov     r12,QWORD[32+rbp]
+        and     r11,rcx
+        not     r12
+        mov     r13,QWORD[40+rbp]
+        and     r12,rcx
+        not     r13
+        mov     r14,QWORD[48+rbp]
+        and     r13,rcx
+        not     r14
+        mov     r15,QWORD[56+rbp]
+        and     r14,rcx
+        not     r15
+        and     r15,rcx
+
+        add     r8,QWORD[rdi]
+        adc     r9,QWORD[8+rdi]
+        adc     r10,QWORD[16+rdi]
+        adc     r11,QWORD[24+rdi]
+        adc     r12,QWORD[32+rdi]
+        adc     r13,QWORD[40+rdi]
+        adc     r14,QWORD[48+rdi]
+        adc     r15,QWORD[56+rdi]
+
+        mov     QWORD[rdi],r8
+        mov     QWORD[8+rdi],r9
+        mov     QWORD[16+rdi],r10
+        mov     QWORD[24+rdi],r11
+        mov     QWORD[32+rdi],r12
+        mov     QWORD[40+rdi],r13
+        mov     QWORD[48+rdi],r14
+        mov     QWORD[56+rdi],r15
+
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   32
+__rsaz_512_mul:
+        lea     rdi,[8+rsp]
+
+        mov     rax,QWORD[rsi]
+        mul     rbx
+        mov     QWORD[rdi],rax
+        mov     rax,QWORD[8+rsi]
+        mov     r8,rdx
+
+        mul     rbx
+        add     r8,rax
+        mov     rax,QWORD[16+rsi]
+        mov     r9,rdx
+        adc     r9,0
+
+        mul     rbx
+        add     r9,rax
+        mov     rax,QWORD[24+rsi]
+        mov     r10,rdx
+        adc     r10,0
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[32+rsi]
+        mov     r11,rdx
+        adc     r11,0
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[40+rsi]
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     rbx
+        add     r12,rax
+        mov     rax,QWORD[48+rsi]
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     rbx
+        add     r13,rax
+        mov     rax,QWORD[56+rsi]
+        mov     r14,rdx
+        adc     r14,0
+
+        mul     rbx
+        add     r14,rax
+        mov     rax,QWORD[rsi]
+        mov     r15,rdx
+        adc     r15,0
+
+        lea     rbp,[8+rbp]
+        lea     rdi,[8+rdi]
+
+        mov     ecx,7
+        jmp     NEAR $L$oop_mul
+
+ALIGN   32
+$L$oop_mul:
+        mov     rbx,QWORD[rbp]
+        mul     rbx
+        add     r8,rax
+        mov     rax,QWORD[8+rsi]
+        mov     QWORD[rdi],r8
+        mov     r8,rdx
+        adc     r8,0
+
+        mul     rbx
+        add     r9,rax
+        mov     rax,QWORD[16+rsi]
+        adc     rdx,0
+        add     r8,r9
+        mov     r9,rdx
+        adc     r9,0
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[24+rsi]
+        adc     rdx,0
+        add     r9,r10
+        mov     r10,rdx
+        adc     r10,0
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[32+rsi]
+        adc     rdx,0
+        add     r10,r11
+        mov     r11,rdx
+        adc     r11,0
+
+        mul     rbx
+        add     r12,rax
+        mov     rax,QWORD[40+rsi]
+        adc     rdx,0
+        add     r11,r12
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     rbx
+        add     r13,rax
+        mov     rax,QWORD[48+rsi]
+        adc     rdx,0
+        add     r12,r13
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     rbx
+        add     r14,rax
+        mov     rax,QWORD[56+rsi]
+        adc     rdx,0
+        add     r13,r14
+        mov     r14,rdx
+        lea     rbp,[8+rbp]
+        adc     r14,0
+
+        mul     rbx
+        add     r15,rax
+        mov     rax,QWORD[rsi]
+        adc     rdx,0
+        add     r14,r15
+        mov     r15,rdx
+        adc     r15,0
+
+        lea     rdi,[8+rdi]
+
+        dec     ecx
+        jnz     NEAR $L$oop_mul
+
+        mov     QWORD[rdi],r8
+        mov     QWORD[8+rdi],r9
+        mov     QWORD[16+rdi],r10
+        mov     QWORD[24+rdi],r11
+        mov     QWORD[32+rdi],r12
+        mov     QWORD[40+rdi],r13
+        mov     QWORD[48+rdi],r14
+        mov     QWORD[56+rdi],r15
+
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   32
+__rsaz_512_mulx:
+        mulx    r8,rbx,QWORD[rsi]
+        mov     rcx,-6
+
+        mulx    r9,rax,QWORD[8+rsi]
+        mov     QWORD[8+rsp],rbx
+
+        mulx    r10,rbx,QWORD[16+rsi]
+        adc     r8,rax
+
+        mulx    r11,rax,QWORD[24+rsi]
+        adc     r9,rbx
+
+        mulx    r12,rbx,QWORD[32+rsi]
+        adc     r10,rax
+
+        mulx    r13,rax,QWORD[40+rsi]
+        adc     r11,rbx
+
+        mulx    r14,rbx,QWORD[48+rsi]
+        adc     r12,rax
+
+        mulx    r15,rax,QWORD[56+rsi]
+        mov     rdx,QWORD[8+rbp]
+        adc     r13,rbx
+        adc     r14,rax
+        adc     r15,0
+
+        xor     rdi,rdi
+        jmp     NEAR $L$oop_mulx
+
+ALIGN   32
+$L$oop_mulx:
+        mov     rbx,r8
+        mulx    r8,rax,QWORD[rsi]
+        adcx    rbx,rax
+        adox    r8,r9
+
+        mulx    r9,rax,QWORD[8+rsi]
+        adcx    r8,rax
+        adox    r9,r10
+
+        mulx    r10,rax,QWORD[16+rsi]
+        adcx    r9,rax
+        adox    r10,r11
+
+        mulx    r11,rax,QWORD[24+rsi]
+        adcx    r10,rax
+        adox    r11,r12
+
+DB      0x3e,0xc4,0x62,0xfb,0xf6,0xa6,0x20,0x00,0x00,0x00
+        adcx    r11,rax
+        adox    r12,r13
+
+        mulx    r13,rax,QWORD[40+rsi]
+        adcx    r12,rax
+        adox    r13,r14
+
+        mulx    r14,rax,QWORD[48+rsi]
+        adcx    r13,rax
+        adox    r14,r15
+
+        mulx    r15,rax,QWORD[56+rsi]
+        mov     rdx,QWORD[64+rcx*8+rbp]
+        mov     QWORD[((8+64-8))+rcx*8+rsp],rbx
+        adcx    r14,rax
+        adox    r15,rdi
+        adcx    r15,rdi
+
+        inc     rcx
+        jnz     NEAR $L$oop_mulx
+
+        mov     rbx,r8
+        mulx    r8,rax,QWORD[rsi]
+        adcx    rbx,rax
+        adox    r8,r9
+
+DB      0xc4,0x62,0xfb,0xf6,0x8e,0x08,0x00,0x00,0x00
+        adcx    r8,rax
+        adox    r9,r10
+
+DB      0xc4,0x62,0xfb,0xf6,0x96,0x10,0x00,0x00,0x00
+        adcx    r9,rax
+        adox    r10,r11
+
+        mulx    r11,rax,QWORD[24+rsi]
+        adcx    r10,rax
+        adox    r11,r12
+
+        mulx    r12,rax,QWORD[32+rsi]
+        adcx    r11,rax
+        adox    r12,r13
+
+        mulx    r13,rax,QWORD[40+rsi]
+        adcx    r12,rax
+        adox    r13,r14
+
+DB      0xc4,0x62,0xfb,0xf6,0xb6,0x30,0x00,0x00,0x00
+        adcx    r13,rax
+        adox    r14,r15
+
+DB      0xc4,0x62,0xfb,0xf6,0xbe,0x38,0x00,0x00,0x00
+        adcx    r14,rax
+        adox    r15,rdi
+        adcx    r15,rdi
+
+        mov     QWORD[((8+64-8))+rsp],rbx
+        mov     QWORD[((8+64))+rsp],r8
+        mov     QWORD[((8+64+8))+rsp],r9
+        mov     QWORD[((8+64+16))+rsp],r10
+        mov     QWORD[((8+64+24))+rsp],r11
+        mov     QWORD[((8+64+32))+rsp],r12
+        mov     QWORD[((8+64+40))+rsp],r13
+        mov     QWORD[((8+64+48))+rsp],r14
+        mov     QWORD[((8+64+56))+rsp],r15
+
+        DB      0F3h,0C3h               ;repret
+
+global  rsaz_512_scatter4
+
+ALIGN   16
+rsaz_512_scatter4:
+        lea     rcx,[r8*8+rcx]
+        mov     r9d,8
+        jmp     NEAR $L$oop_scatter
+ALIGN   16
+$L$oop_scatter:
+        mov     rax,QWORD[rdx]
+        lea     rdx,[8+rdx]
+        mov     QWORD[rcx],rax
+        lea     rcx,[128+rcx]
+        dec     r9d
+        jnz     NEAR $L$oop_scatter
+        DB      0F3h,0C3h               ;repret
+
+
+global  rsaz_512_gather4
+
+ALIGN   16
+rsaz_512_gather4:
+$L$SEH_begin_rsaz_512_gather4:
+DB      0x48,0x81,0xec,0xa8,0x00,0x00,0x00
+DB      0x0f,0x29,0x34,0x24
+DB      0x0f,0x29,0x7c,0x24,0x10
+DB      0x44,0x0f,0x29,0x44,0x24,0x20
+DB      0x44,0x0f,0x29,0x4c,0x24,0x30
+DB      0x44,0x0f,0x29,0x54,0x24,0x40
+DB      0x44,0x0f,0x29,0x5c,0x24,0x50
+DB      0x44,0x0f,0x29,0x64,0x24,0x60
+DB      0x44,0x0f,0x29,0x6c,0x24,0x70
+DB      0x44,0x0f,0x29,0xb4,0x24,0x80,0,0,0
+DB      0x44,0x0f,0x29,0xbc,0x24,0x90,0,0,0
+        movd    xmm8,r8d
+        movdqa  xmm1,XMMWORD[(($L$inc+16))]
+        movdqa  xmm0,XMMWORD[$L$inc]
+
+        pshufd  xmm8,xmm8,0
+        movdqa  xmm7,xmm1
+        movdqa  xmm2,xmm1
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm8
+        movdqa  xmm3,xmm7
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm8
+        movdqa  xmm4,xmm7
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm8
+        movdqa  xmm5,xmm7
+        paddd   xmm4,xmm3
+        pcmpeqd xmm3,xmm8
+        movdqa  xmm6,xmm7
+        paddd   xmm5,xmm4
+        pcmpeqd xmm4,xmm8
+        paddd   xmm6,xmm5
+        pcmpeqd xmm5,xmm8
+        paddd   xmm7,xmm6
+        pcmpeqd xmm6,xmm8
+        pcmpeqd xmm7,xmm8
+        mov     r9d,8
+        jmp     NEAR $L$oop_gather
+ALIGN   16
+$L$oop_gather:
+        movdqa  xmm8,XMMWORD[rdx]
+        movdqa  xmm9,XMMWORD[16+rdx]
+        movdqa  xmm10,XMMWORD[32+rdx]
+        movdqa  xmm11,XMMWORD[48+rdx]
+        pand    xmm8,xmm0
+        movdqa  xmm12,XMMWORD[64+rdx]
+        pand    xmm9,xmm1
+        movdqa  xmm13,XMMWORD[80+rdx]
+        pand    xmm10,xmm2
+        movdqa  xmm14,XMMWORD[96+rdx]
+        pand    xmm11,xmm3
+        movdqa  xmm15,XMMWORD[112+rdx]
+        lea     rdx,[128+rdx]
+        pand    xmm12,xmm4
+        pand    xmm13,xmm5
+        pand    xmm14,xmm6
+        pand    xmm15,xmm7
+        por     xmm8,xmm10
+        por     xmm9,xmm11
+        por     xmm8,xmm12
+        por     xmm9,xmm13
+        por     xmm8,xmm14
+        por     xmm9,xmm15
+
+        por     xmm8,xmm9
+        pshufd  xmm9,xmm8,0x4e
+        por     xmm8,xmm9
+        movq    QWORD[rcx],xmm8
+        lea     rcx,[8+rcx]
+        dec     r9d
+        jnz     NEAR $L$oop_gather
+        movaps  xmm6,XMMWORD[rsp]
+        movaps  xmm7,XMMWORD[16+rsp]
+        movaps  xmm8,XMMWORD[32+rsp]
+        movaps  xmm9,XMMWORD[48+rsp]
+        movaps  xmm10,XMMWORD[64+rsp]
+        movaps  xmm11,XMMWORD[80+rsp]
+        movaps  xmm12,XMMWORD[96+rsp]
+        movaps  xmm13,XMMWORD[112+rsp]
+        movaps  xmm14,XMMWORD[128+rsp]
+        movaps  xmm15,XMMWORD[144+rsp]
+        add     rsp,0xa8
+        DB      0F3h,0C3h               ;repret
+$L$SEH_end_rsaz_512_gather4:
+
+
+ALIGN   64
+$L$inc:
+        DD      0,0,1,1
+        DD      2,2,2,2
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        lea     rax,[((128+24+48))+rax]
+
+        lea     rbx,[$L$mul_gather4_epilogue]
+        cmp     rbx,r10
+        jne     NEAR $L$se_not_in_mul_gather4
+
+        lea     rax,[176+rax]
+
+        lea     rsi,[((-48-168))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+$L$se_not_in_mul_gather4:
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+$L$common_seh_tail:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_rsaz_512_sqr wrt ..imagebase
+        DD      $L$SEH_end_rsaz_512_sqr wrt ..imagebase
+        DD      $L$SEH_info_rsaz_512_sqr wrt ..imagebase
+
+        DD      $L$SEH_begin_rsaz_512_mul wrt ..imagebase
+        DD      $L$SEH_end_rsaz_512_mul wrt ..imagebase
+        DD      $L$SEH_info_rsaz_512_mul wrt ..imagebase
+
+        DD      $L$SEH_begin_rsaz_512_mul_gather4 wrt ..imagebase
+        DD      $L$SEH_end_rsaz_512_mul_gather4 wrt ..imagebase
+        DD      $L$SEH_info_rsaz_512_mul_gather4 wrt ..imagebase
+
+        DD      $L$SEH_begin_rsaz_512_mul_scatter4 wrt ..imagebase
+        DD      $L$SEH_end_rsaz_512_mul_scatter4 wrt ..imagebase
+        DD      $L$SEH_info_rsaz_512_mul_scatter4 wrt ..imagebase
+
+        DD      $L$SEH_begin_rsaz_512_mul_by_one wrt ..imagebase
+        DD      $L$SEH_end_rsaz_512_mul_by_one wrt ..imagebase
+        DD      $L$SEH_info_rsaz_512_mul_by_one wrt ..imagebase
+
+        DD      $L$SEH_begin_rsaz_512_gather4 wrt ..imagebase
+        DD      $L$SEH_end_rsaz_512_gather4 wrt ..imagebase
+        DD      $L$SEH_info_rsaz_512_gather4 wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_rsaz_512_sqr:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$sqr_body wrt ..imagebase,$L$sqr_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_mul:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$mul_body wrt ..imagebase,$L$mul_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_mul_gather4:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$mul_gather4_body wrt ..imagebase,$L$mul_gather4_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_mul_scatter4:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$mul_scatter4_body wrt ..imagebase,$L$mul_scatter4_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_mul_by_one:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$mul_by_one_body wrt ..imagebase,$L$mul_by_one_epilogue wrt ..imagebase
+$L$SEH_info_rsaz_512_gather4:
+DB      0x01,0x46,0x16,0x00
+DB      0x46,0xf8,0x09,0x00
+DB      0x3d,0xe8,0x08,0x00
+DB      0x34,0xd8,0x07,0x00
+DB      0x2e,0xc8,0x06,0x00
+DB      0x28,0xb8,0x05,0x00
+DB      0x22,0xa8,0x04,0x00
+DB      0x1c,0x98,0x03,0x00
+DB      0x16,0x88,0x02,0x00
+DB      0x10,0x78,0x01,0x00
+DB      0x0b,0x68,0x00,0x00
+DB      0x07,0x01,0x15,0x00
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm
new file mode 100644
index 0000000000..b96e85a35a
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm
@@ -0,0 +1,432 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+
+ALIGN   16
+_mul_1x1:
+
+        sub     rsp,128+8
+
+        mov     r9,-1
+        lea     rsi,[rax*1+rax]
+        shr     r9,3
+        lea     rdi,[rax*4]
+        and     r9,rax
+        lea     r12,[rax*8]
+        sar     rax,63
+        lea     r10,[r9*1+r9]
+        sar     rsi,63
+        lea     r11,[r9*4]
+        and     rax,rbp
+        sar     rdi,63
+        mov     rdx,rax
+        shl     rax,63
+        and     rsi,rbp
+        shr     rdx,1
+        mov     rcx,rsi
+        shl     rsi,62
+        and     rdi,rbp
+        shr     rcx,2
+        xor     rax,rsi
+        mov     rbx,rdi
+        shl     rdi,61
+        xor     rdx,rcx
+        shr     rbx,3
+        xor     rax,rdi
+        xor     rdx,rbx
+
+        mov     r13,r9
+        mov     QWORD[rsp],0
+        xor     r13,r10
+        mov     QWORD[8+rsp],r9
+        mov     r14,r11
+        mov     QWORD[16+rsp],r10
+        xor     r14,r12
+        mov     QWORD[24+rsp],r13
+
+        xor     r9,r11
+        mov     QWORD[32+rsp],r11
+        xor     r10,r11
+        mov     QWORD[40+rsp],r9
+        xor     r13,r11
+        mov     QWORD[48+rsp],r10
+        xor     r9,r14
+        mov     QWORD[56+rsp],r13
+        xor     r10,r14
+
+        mov     QWORD[64+rsp],r12
+        xor     r13,r14
+        mov     QWORD[72+rsp],r9
+        xor     r9,r11
+        mov     QWORD[80+rsp],r10
+        xor     r10,r11
+        mov     QWORD[88+rsp],r13
+
+        xor     r13,r11
+        mov     QWORD[96+rsp],r14
+        mov     rsi,r8
+        mov     QWORD[104+rsp],r9
+        and     rsi,rbp
+        mov     QWORD[112+rsp],r10
+        shr     rbp,4
+        mov     QWORD[120+rsp],r13
+        mov     rdi,r8
+        and     rdi,rbp
+        shr     rbp,4
+
+        movq    xmm0,QWORD[rsi*8+rsp]
+        mov     rsi,r8
+        and     rsi,rbp
+        shr     rbp,4
+        mov     rcx,QWORD[rdi*8+rsp]
+        mov     rdi,r8
+        mov     rbx,rcx
+        shl     rcx,4
+        and     rdi,rbp
+        movq    xmm1,QWORD[rsi*8+rsp]
+        shr     rbx,60
+        xor     rax,rcx
+        pslldq  xmm1,1
+        mov     rsi,r8
+        shr     rbp,4
+        xor     rdx,rbx
+        and     rsi,rbp
+        shr     rbp,4
+        pxor    xmm0,xmm1
+        mov     rcx,QWORD[rdi*8+rsp]
+        mov     rdi,r8
+        mov     rbx,rcx
+        shl     rcx,12
+        and     rdi,rbp
+        movq    xmm1,QWORD[rsi*8+rsp]
+        shr     rbx,52
+        xor     rax,rcx
+        pslldq  xmm1,2
+        mov     rsi,r8
+        shr     rbp,4
+        xor     rdx,rbx
+        and     rsi,rbp
+        shr     rbp,4
+        pxor    xmm0,xmm1
+        mov     rcx,QWORD[rdi*8+rsp]
+        mov     rdi,r8
+        mov     rbx,rcx
+        shl     rcx,20
+        and     rdi,rbp
+        movq    xmm1,QWORD[rsi*8+rsp]
+        shr     rbx,44
+        xor     rax,rcx
+        pslldq  xmm1,3
+        mov     rsi,r8
+        shr     rbp,4
+        xor     rdx,rbx
+        and     rsi,rbp
+        shr     rbp,4
+        pxor    xmm0,xmm1
+        mov     rcx,QWORD[rdi*8+rsp]
+        mov     rdi,r8
+        mov     rbx,rcx
+        shl     rcx,28
+        and     rdi,rbp
+        movq    xmm1,QWORD[rsi*8+rsp]
+        shr     rbx,36
+        xor     rax,rcx
+        pslldq  xmm1,4
+        mov     rsi,r8
+        shr     rbp,4
+        xor     rdx,rbx
+        and     rsi,rbp
+        shr     rbp,4
+        pxor    xmm0,xmm1
+        mov     rcx,QWORD[rdi*8+rsp]
+        mov     rdi,r8
+        mov     rbx,rcx
+        shl     rcx,36
+        and     rdi,rbp
+        movq    xmm1,QWORD[rsi*8+rsp]
+        shr     rbx,28
+        xor     rax,rcx
+        pslldq  xmm1,5
+        mov     rsi,r8
+        shr     rbp,4
+        xor     rdx,rbx
+        and     rsi,rbp
+        shr     rbp,4
+        pxor    xmm0,xmm1
+        mov     rcx,QWORD[rdi*8+rsp]
+        mov     rdi,r8
+        mov     rbx,rcx
+        shl     rcx,44
+        and     rdi,rbp
+        movq    xmm1,QWORD[rsi*8+rsp]
+        shr     rbx,20
+        xor     rax,rcx
+        pslldq  xmm1,6
+        mov     rsi,r8
+        shr     rbp,4
+        xor     rdx,rbx
+        and     rsi,rbp
+        shr     rbp,4
+        pxor    xmm0,xmm1
+        mov     rcx,QWORD[rdi*8+rsp]
+        mov     rdi,r8
+        mov     rbx,rcx
+        shl     rcx,52
+        and     rdi,rbp
+        movq    xmm1,QWORD[rsi*8+rsp]
+        shr     rbx,12
+        xor     rax,rcx
+        pslldq  xmm1,7
+        mov     rsi,r8
+        shr     rbp,4
+        xor     rdx,rbx
+        and     rsi,rbp
+        shr     rbp,4
+        pxor    xmm0,xmm1
+        mov     rcx,QWORD[rdi*8+rsp]
+        mov     rbx,rcx
+        shl     rcx,60
+DB      102,72,15,126,198
+        shr     rbx,4
+        xor     rax,rcx
+        psrldq  xmm0,8
+        xor     rdx,rbx
+DB      102,72,15,126,199
+        xor     rax,rsi
+        xor     rdx,rdi
+
+        add     rsp,128+8
+
+        DB      0F3h,0C3h               ;repret
+$L$end_mul_1x1:
+
+
+EXTERN  OPENSSL_ia32cap_P
+global  bn_GF2m_mul_2x2
+
+ALIGN   16
+bn_GF2m_mul_2x2:
+
+        mov     rax,rsp
+        mov     r10,QWORD[OPENSSL_ia32cap_P]
+        bt      r10,33
+        jnc     NEAR $L$vanilla_mul_2x2
+
+DB      102,72,15,110,194
+DB      102,73,15,110,201
+DB      102,73,15,110,208
+        movq    xmm3,QWORD[40+rsp]
+        movdqa  xmm4,xmm0
+        movdqa  xmm5,xmm1
+DB      102,15,58,68,193,0
+        pxor    xmm4,xmm2
+        pxor    xmm5,xmm3
+DB      102,15,58,68,211,0
+DB      102,15,58,68,229,0
+        xorps   xmm4,xmm0
+        xorps   xmm4,xmm2
+        movdqa  xmm5,xmm4
+        pslldq  xmm4,8
+        psrldq  xmm5,8
+        pxor    xmm2,xmm4
+        pxor    xmm0,xmm5
+        movdqu  XMMWORD[rcx],xmm2
+        movdqu  XMMWORD[16+rcx],xmm0
+        DB      0F3h,0C3h               ;repret
+
+ALIGN   16
+$L$vanilla_mul_2x2:
+        lea     rsp,[((-136))+rsp]
+
+        mov     r10,QWORD[176+rsp]
+        mov     QWORD[120+rsp],rdi
+        mov     QWORD[128+rsp],rsi
+        mov     QWORD[80+rsp],r14
+
+        mov     QWORD[88+rsp],r13
+
+        mov     QWORD[96+rsp],r12
+
+        mov     QWORD[104+rsp],rbp
+
+        mov     QWORD[112+rsp],rbx
+
+$L$body_mul_2x2:
+        mov     QWORD[32+rsp],rcx
+        mov     QWORD[40+rsp],rdx
+        mov     QWORD[48+rsp],r8
+        mov     QWORD[56+rsp],r9
+        mov     QWORD[64+rsp],r10
+
+        mov     r8,0xf
+        mov     rax,rdx
+        mov     rbp,r9
+        call    _mul_1x1
+        mov     QWORD[16+rsp],rax
+        mov     QWORD[24+rsp],rdx
+
+        mov     rax,QWORD[48+rsp]
+        mov     rbp,QWORD[64+rsp]
+        call    _mul_1x1
+        mov     QWORD[rsp],rax
+        mov     QWORD[8+rsp],rdx
+
+        mov     rax,QWORD[40+rsp]
+        mov     rbp,QWORD[56+rsp]
+        xor     rax,QWORD[48+rsp]
+        xor     rbp,QWORD[64+rsp]
+        call    _mul_1x1
+        mov     rbx,QWORD[rsp]
+        mov     rcx,QWORD[8+rsp]
+        mov     rdi,QWORD[16+rsp]
+        mov     rsi,QWORD[24+rsp]
+        mov     rbp,QWORD[32+rsp]
+
+        xor     rax,rdx
+        xor     rdx,rcx
+        xor     rax,rbx
+        mov     QWORD[rbp],rbx
+        xor     rdx,rdi
+        mov     QWORD[24+rbp],rsi
+        xor     rax,rsi
+        xor     rdx,rsi
+        xor     rax,rdx
+        mov     QWORD[16+rbp],rdx
+        mov     QWORD[8+rbp],rax
+
+        mov     r14,QWORD[80+rsp]
+
+        mov     r13,QWORD[88+rsp]
+
+        mov     r12,QWORD[96+rsp]
+
+        mov     rbp,QWORD[104+rsp]
+
+        mov     rbx,QWORD[112+rsp]
+
+        mov     rdi,QWORD[120+rsp]
+        mov     rsi,QWORD[128+rsp]
+        lea     rsp,[136+rsp]
+
+$L$epilogue_mul_2x2:
+        DB      0F3h,0C3h               ;repret
+$L$end_mul_2x2:
+
+
+DB      71,70,40,50,94,109,41,32,77,117,108,116,105,112,108,105
+DB      99,97,116,105,111,110,32,102,111,114,32,120,56,54,95,54
+DB      52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121
+DB      32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46
+DB      111,114,103,62,0
+ALIGN   16
+EXTERN  __imp_RtlVirtualUnwind
+
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        lea     r10,[$L$body_mul_2x2]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        lea     r10,[$L$epilogue_mul_2x2]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        mov     r14,QWORD[80+rax]
+        mov     r13,QWORD[88+rax]
+        mov     r12,QWORD[96+rax]
+        mov     rbp,QWORD[104+rax]
+        mov     rbx,QWORD[112+rax]
+        mov     rdi,QWORD[120+rax]
+        mov     rsi,QWORD[128+rax]
+
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+
+        lea     rax,[136+rax]
+
+$L$in_prologue:
+        mov     QWORD[152+r8],rax
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      _mul_1x1 wrt ..imagebase
+        DD      $L$end_mul_1x1 wrt ..imagebase
+        DD      $L$SEH_info_1x1 wrt ..imagebase
+
+        DD      $L$vanilla_mul_2x2 wrt ..imagebase
+        DD      $L$end_mul_2x2 wrt ..imagebase
+        DD      $L$SEH_info_2x2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_1x1:
+DB      0x01,0x07,0x02,0x00
+DB      0x07,0x01,0x11,0x00
+$L$SEH_info_2x2:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm
new file mode 100644
index 0000000000..9ff8ec428f
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm
@@ -0,0 +1,1479 @@
+; Copyright 2005-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  bn_mul_mont
+
+ALIGN   16
+bn_mul_mont:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_mul_mont:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     r9d,r9d
+        mov     rax,rsp
+
+        test    r9d,3
+        jnz     NEAR $L$mul_enter
+        cmp     r9d,8
+        jb      NEAR $L$mul_enter
+        mov     r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+        cmp     rdx,rsi
+        jne     NEAR $L$mul4x_enter
+        test    r9d,7
+        jz      NEAR $L$sqr8x_enter
+        jmp     NEAR $L$mul4x_enter
+
+ALIGN   16
+$L$mul_enter:
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        neg     r9
+        mov     r11,rsp
+        lea     r10,[((-16))+r9*8+rsp]
+        neg     r9
+        and     r10,-1024
+
+
+
+
+
+
+
+
+
+        sub     r11,r10
+        and     r11,-4096
+        lea     rsp,[r11*1+r10]
+        mov     r11,QWORD[rsp]
+        cmp     rsp,r10
+        ja      NEAR $L$mul_page_walk
+        jmp     NEAR $L$mul_page_walk_done
+
+ALIGN   16
+$L$mul_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r11,QWORD[rsp]
+        cmp     rsp,r10
+        ja      NEAR $L$mul_page_walk
+$L$mul_page_walk_done:
+
+        mov     QWORD[8+r9*8+rsp],rax
+
+$L$mul_body:
+        mov     r12,rdx
+        mov     r8,QWORD[r8]
+        mov     rbx,QWORD[r12]
+        mov     rax,QWORD[rsi]
+
+        xor     r14,r14
+        xor     r15,r15
+
+        mov     rbp,r8
+        mul     rbx
+        mov     r10,rax
+        mov     rax,QWORD[rcx]
+
+        imul    rbp,r10
+        mov     r11,rdx
+
+        mul     rbp
+        add     r10,rax
+        mov     rax,QWORD[8+rsi]
+        adc     rdx,0
+        mov     r13,rdx
+
+        lea     r15,[1+r15]
+        jmp     NEAR $L$1st_enter
+
+ALIGN   16
+$L$1st:
+        add     r13,rax
+        mov     rax,QWORD[r15*8+rsi]
+        adc     rdx,0
+        add     r13,r11
+        mov     r11,r10
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],r13
+        mov     r13,rdx
+
+$L$1st_enter:
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[r15*8+rcx]
+        adc     rdx,0
+        lea     r15,[1+r15]
+        mov     r10,rdx
+
+        mul     rbp
+        cmp     r15,r9
+        jne     NEAR $L$1st
+
+        add     r13,rax
+        mov     rax,QWORD[rsi]
+        adc     rdx,0
+        add     r13,r11
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],r13
+        mov     r13,rdx
+        mov     r11,r10
+
+        xor     rdx,rdx
+        add     r13,r11
+        adc     rdx,0
+        mov     QWORD[((-8))+r9*8+rsp],r13
+        mov     QWORD[r9*8+rsp],rdx
+
+        lea     r14,[1+r14]
+        jmp     NEAR $L$outer
+ALIGN   16
+$L$outer:
+        mov     rbx,QWORD[r14*8+r12]
+        xor     r15,r15
+        mov     rbp,r8
+        mov     r10,QWORD[rsp]
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[rcx]
+        adc     rdx,0
+
+        imul    rbp,r10
+        mov     r11,rdx
+
+        mul     rbp
+        add     r10,rax
+        mov     rax,QWORD[8+rsi]
+        adc     rdx,0
+        mov     r10,QWORD[8+rsp]
+        mov     r13,rdx
+
+        lea     r15,[1+r15]
+        jmp     NEAR $L$inner_enter
+
+ALIGN   16
+$L$inner:
+        add     r13,rax
+        mov     rax,QWORD[r15*8+rsi]
+        adc     rdx,0
+        add     r13,r10
+        mov     r10,QWORD[r15*8+rsp]
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],r13
+        mov     r13,rdx
+
+$L$inner_enter:
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[r15*8+rcx]
+        adc     rdx,0
+        add     r10,r11
+        mov     r11,rdx
+        adc     r11,0
+        lea     r15,[1+r15]
+
+        mul     rbp
+        cmp     r15,r9
+        jne     NEAR $L$inner
+
+        add     r13,rax
+        mov     rax,QWORD[rsi]
+        adc     rdx,0
+        add     r13,r10
+        mov     r10,QWORD[r15*8+rsp]
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],r13
+        mov     r13,rdx
+
+        xor     rdx,rdx
+        add     r13,r11
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-8))+r9*8+rsp],r13
+        mov     QWORD[r9*8+rsp],rdx
+
+        lea     r14,[1+r14]
+        cmp     r14,r9
+        jb      NEAR $L$outer
+
+        xor     r14,r14
+        mov     rax,QWORD[rsp]
+        mov     r15,r9
+
+ALIGN   16
+$L$sub: sbb     rax,QWORD[r14*8+rcx]
+        mov     QWORD[r14*8+rdi],rax
+        mov     rax,QWORD[8+r14*8+rsp]
+        lea     r14,[1+r14]
+        dec     r15
+        jnz     NEAR $L$sub
+
+        sbb     rax,0
+        mov     rbx,-1
+        xor     rbx,rax
+        xor     r14,r14
+        mov     r15,r9
+
+$L$copy:
+        mov     rcx,QWORD[r14*8+rdi]
+        mov     rdx,QWORD[r14*8+rsp]
+        and     rcx,rbx
+        and     rdx,rax
+        mov     QWORD[r14*8+rsp],r9
+        or      rdx,rcx
+        mov     QWORD[r14*8+rdi],rdx
+        lea     r14,[1+r14]
+        sub     r15,1
+        jnz     NEAR $L$copy
+
+        mov     rsi,QWORD[8+r9*8+rsp]
+
+        mov     rax,1
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$mul_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_mul_mont:
+
+ALIGN   16
+bn_mul4x_mont:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_mul4x_mont:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     r9d,r9d
+        mov     rax,rsp
+
+$L$mul4x_enter:
+        and     r11d,0x80100
+        cmp     r11d,0x80100
+        je      NEAR $L$mulx4x_enter
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        neg     r9
+        mov     r11,rsp
+        lea     r10,[((-32))+r9*8+rsp]
+        neg     r9
+        and     r10,-1024
+
+        sub     r11,r10
+        and     r11,-4096
+        lea     rsp,[r11*1+r10]
+        mov     r11,QWORD[rsp]
+        cmp     rsp,r10
+        ja      NEAR $L$mul4x_page_walk
+        jmp     NEAR $L$mul4x_page_walk_done
+
+$L$mul4x_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r11,QWORD[rsp]
+        cmp     rsp,r10
+        ja      NEAR $L$mul4x_page_walk
+$L$mul4x_page_walk_done:
+
+        mov     QWORD[8+r9*8+rsp],rax
+
+$L$mul4x_body:
+        mov     QWORD[16+r9*8+rsp],rdi
+        mov     r12,rdx
+        mov     r8,QWORD[r8]
+        mov     rbx,QWORD[r12]
+        mov     rax,QWORD[rsi]
+
+        xor     r14,r14
+        xor     r15,r15
+
+        mov     rbp,r8
+        mul     rbx
+        mov     r10,rax
+        mov     rax,QWORD[rcx]
+
+        imul    rbp,r10
+        mov     r11,rdx
+
+        mul     rbp
+        add     r10,rax
+        mov     rax,QWORD[8+rsi]
+        adc     rdx,0
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[8+rcx]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[16+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        lea     r15,[4+r15]
+        adc     rdx,0
+        mov     QWORD[rsp],rdi
+        mov     r13,rdx
+        jmp     NEAR $L$1st4x
+ALIGN   16
+$L$1st4x:
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[((-16))+r15*8+rcx]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[((-8))+r15*8+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-24))+r15*8+rsp],r13
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[((-8))+r15*8+rcx]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[r15*8+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],rdi
+        mov     r13,rdx
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[r15*8+rcx]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[8+r15*8+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-8))+r15*8+rsp],r13
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[8+r15*8+rcx]
+        adc     rdx,0
+        lea     r15,[4+r15]
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[((-16))+r15*8+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-32))+r15*8+rsp],rdi
+        mov     r13,rdx
+        cmp     r15,r9
+        jb      NEAR $L$1st4x
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[((-16))+r15*8+rcx]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[((-8))+r15*8+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-24))+r15*8+rsp],r13
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[((-8))+r15*8+rcx]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],rdi
+        mov     r13,rdx
+
+        xor     rdi,rdi
+        add     r13,r10
+        adc     rdi,0
+        mov     QWORD[((-8))+r15*8+rsp],r13
+        mov     QWORD[r15*8+rsp],rdi
+
+        lea     r14,[1+r14]
+ALIGN   4
+$L$outer4x:
+        mov     rbx,QWORD[r14*8+r12]
+        xor     r15,r15
+        mov     r10,QWORD[rsp]
+        mov     rbp,r8
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[rcx]
+        adc     rdx,0
+
+        imul    rbp,r10
+        mov     r11,rdx
+
+        mul     rbp
+        add     r10,rax
+        mov     rax,QWORD[8+rsi]
+        adc     rdx,0
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[8+rcx]
+        adc     rdx,0
+        add     r11,QWORD[8+rsp]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[16+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        lea     r15,[4+r15]
+        adc     rdx,0
+        mov     QWORD[rsp],rdi
+        mov     r13,rdx
+        jmp     NEAR $L$inner4x
+ALIGN   16
+$L$inner4x:
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[((-16))+r15*8+rcx]
+        adc     rdx,0
+        add     r10,QWORD[((-16))+r15*8+rsp]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[((-8))+r15*8+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-24))+r15*8+rsp],r13
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[((-8))+r15*8+rcx]
+        adc     rdx,0
+        add     r11,QWORD[((-8))+r15*8+rsp]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[r15*8+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],rdi
+        mov     r13,rdx
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[r15*8+rcx]
+        adc     rdx,0
+        add     r10,QWORD[r15*8+rsp]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[8+r15*8+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-8))+r15*8+rsp],r13
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[8+r15*8+rcx]
+        adc     rdx,0
+        add     r11,QWORD[8+r15*8+rsp]
+        adc     rdx,0
+        lea     r15,[4+r15]
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[((-16))+r15*8+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-32))+r15*8+rsp],rdi
+        mov     r13,rdx
+        cmp     r15,r9
+        jb      NEAR $L$inner4x
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[((-16))+r15*8+rcx]
+        adc     rdx,0
+        add     r10,QWORD[((-16))+r15*8+rsp]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[((-8))+r15*8+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-24))+r15*8+rsp],r13
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[((-8))+r15*8+rcx]
+        adc     rdx,0
+        add     r11,QWORD[((-8))+r15*8+rsp]
+        adc     rdx,0
+        lea     r14,[1+r14]
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],rdi
+        mov     r13,rdx
+
+        xor     rdi,rdi
+        add     r13,r10
+        adc     rdi,0
+        add     r13,QWORD[r9*8+rsp]
+        adc     rdi,0
+        mov     QWORD[((-8))+r15*8+rsp],r13
+        mov     QWORD[r15*8+rsp],rdi
+
+        cmp     r14,r9
+        jb      NEAR $L$outer4x
+        mov     rdi,QWORD[16+r9*8+rsp]
+        lea     r15,[((-4))+r9]
+        mov     rax,QWORD[rsp]
+        mov     rdx,QWORD[8+rsp]
+        shr     r15,2
+        lea     rsi,[rsp]
+        xor     r14,r14
+
+        sub     rax,QWORD[rcx]
+        mov     rbx,QWORD[16+rsi]
+        mov     rbp,QWORD[24+rsi]
+        sbb     rdx,QWORD[8+rcx]
+
+$L$sub4x:
+        mov     QWORD[r14*8+rdi],rax
+        mov     QWORD[8+r14*8+rdi],rdx
+        sbb     rbx,QWORD[16+r14*8+rcx]
+        mov     rax,QWORD[32+r14*8+rsi]
+        mov     rdx,QWORD[40+r14*8+rsi]
+        sbb     rbp,QWORD[24+r14*8+rcx]
+        mov     QWORD[16+r14*8+rdi],rbx
+        mov     QWORD[24+r14*8+rdi],rbp
+        sbb     rax,QWORD[32+r14*8+rcx]
+        mov     rbx,QWORD[48+r14*8+rsi]
+        mov     rbp,QWORD[56+r14*8+rsi]
+        sbb     rdx,QWORD[40+r14*8+rcx]
+        lea     r14,[4+r14]
+        dec     r15
+        jnz     NEAR $L$sub4x
+
+        mov     QWORD[r14*8+rdi],rax
+        mov     rax,QWORD[32+r14*8+rsi]
+        sbb     rbx,QWORD[16+r14*8+rcx]
+        mov     QWORD[8+r14*8+rdi],rdx
+        sbb     rbp,QWORD[24+r14*8+rcx]
+        mov     QWORD[16+r14*8+rdi],rbx
+
+        sbb     rax,0
+        mov     QWORD[24+r14*8+rdi],rbp
+        pxor    xmm0,xmm0
+DB      102,72,15,110,224
+        pcmpeqd xmm5,xmm5
+        pshufd  xmm4,xmm4,0
+        mov     r15,r9
+        pxor    xmm5,xmm4
+        shr     r15,2
+        xor     eax,eax
+
+        jmp     NEAR $L$copy4x
+ALIGN   16
+$L$copy4x:
+        movdqa  xmm1,XMMWORD[rax*1+rsp]
+        movdqu  xmm2,XMMWORD[rax*1+rdi]
+        pand    xmm1,xmm4
+        pand    xmm2,xmm5
+        movdqa  xmm3,XMMWORD[16+rax*1+rsp]
+        movdqa  XMMWORD[rax*1+rsp],xmm0
+        por     xmm1,xmm2
+        movdqu  xmm2,XMMWORD[16+rax*1+rdi]
+        movdqu  XMMWORD[rax*1+rdi],xmm1
+        pand    xmm3,xmm4
+        pand    xmm2,xmm5
+        movdqa  XMMWORD[16+rax*1+rsp],xmm0
+        por     xmm3,xmm2
+        movdqu  XMMWORD[16+rax*1+rdi],xmm3
+        lea     rax,[32+rax]
+        dec     r15
+        jnz     NEAR $L$copy4x
+        mov     rsi,QWORD[8+r9*8+rsp]
+
+        mov     rax,1
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$mul4x_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_mul4x_mont:
+EXTERN  bn_sqrx8x_internal
+EXTERN  bn_sqr8x_internal
+
+
+ALIGN   32
+bn_sqr8x_mont:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_sqr8x_mont:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     rax,rsp
+
+$L$sqr8x_enter:
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+$L$sqr8x_prologue:
+
+        mov     r10d,r9d
+        shl     r9d,3
+        shl     r10,3+2
+        neg     r9
+
+
+
+
+
+
+        lea     r11,[((-64))+r9*2+rsp]
+        mov     rbp,rsp
+        mov     r8,QWORD[r8]
+        sub     r11,rsi
+        and     r11,4095
+        cmp     r10,r11
+        jb      NEAR $L$sqr8x_sp_alt
+        sub     rbp,r11
+        lea     rbp,[((-64))+r9*2+rbp]
+        jmp     NEAR $L$sqr8x_sp_done
+
+ALIGN   32
+$L$sqr8x_sp_alt:
+        lea     r10,[((4096-64))+r9*2]
+        lea     rbp,[((-64))+r9*2+rbp]
+        sub     r11,r10
+        mov     r10,0
+        cmovc   r11,r10
+        sub     rbp,r11
+$L$sqr8x_sp_done:
+        and     rbp,-64
+        mov     r11,rsp
+        sub     r11,rbp
+        and     r11,-4096
+        lea     rsp,[rbp*1+r11]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$sqr8x_page_walk
+        jmp     NEAR $L$sqr8x_page_walk_done
+
+ALIGN   16
+$L$sqr8x_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$sqr8x_page_walk
+$L$sqr8x_page_walk_done:
+
+        mov     r10,r9
+        neg     r9
+
+        mov     QWORD[32+rsp],r8
+        mov     QWORD[40+rsp],rax
+
+$L$sqr8x_body:
+
+DB      102,72,15,110,209
+        pxor    xmm0,xmm0
+DB      102,72,15,110,207
+DB      102,73,15,110,218
+        mov     eax,DWORD[((OPENSSL_ia32cap_P+8))]
+        and     eax,0x80100
+        cmp     eax,0x80100
+        jne     NEAR $L$sqr8x_nox
+
+        call    bn_sqrx8x_internal
+
+
+
+
+        lea     rbx,[rcx*1+r8]
+        mov     r9,rcx
+        mov     rdx,rcx
+DB      102,72,15,126,207
+        sar     rcx,3+2
+        jmp     NEAR $L$sqr8x_sub
+
+ALIGN   32
+$L$sqr8x_nox:
+        call    bn_sqr8x_internal
+
+
+
+
+        lea     rbx,[r9*1+rdi]
+        mov     rcx,r9
+        mov     rdx,r9
+DB      102,72,15,126,207
+        sar     rcx,3+2
+        jmp     NEAR $L$sqr8x_sub
+
+ALIGN   32
+$L$sqr8x_sub:
+        mov     r12,QWORD[rbx]
+        mov     r13,QWORD[8+rbx]
+        mov     r14,QWORD[16+rbx]
+        mov     r15,QWORD[24+rbx]
+        lea     rbx,[32+rbx]
+        sbb     r12,QWORD[rbp]
+        sbb     r13,QWORD[8+rbp]
+        sbb     r14,QWORD[16+rbp]
+        sbb     r15,QWORD[24+rbp]
+        lea     rbp,[32+rbp]
+        mov     QWORD[rdi],r12
+        mov     QWORD[8+rdi],r13
+        mov     QWORD[16+rdi],r14
+        mov     QWORD[24+rdi],r15
+        lea     rdi,[32+rdi]
+        inc     rcx
+        jnz     NEAR $L$sqr8x_sub
+
+        sbb     rax,0
+        lea     rbx,[r9*1+rbx]
+        lea     rdi,[r9*1+rdi]
+
+DB      102,72,15,110,200
+        pxor    xmm0,xmm0
+        pshufd  xmm1,xmm1,0
+        mov     rsi,QWORD[40+rsp]
+
+        jmp     NEAR $L$sqr8x_cond_copy
+
+ALIGN   32
+$L$sqr8x_cond_copy:
+        movdqa  xmm2,XMMWORD[rbx]
+        movdqa  xmm3,XMMWORD[16+rbx]
+        lea     rbx,[32+rbx]
+        movdqu  xmm4,XMMWORD[rdi]
+        movdqu  xmm5,XMMWORD[16+rdi]
+        lea     rdi,[32+rdi]
+        movdqa  XMMWORD[(-32)+rbx],xmm0
+        movdqa  XMMWORD[(-16)+rbx],xmm0
+        movdqa  XMMWORD[(-32)+rdx*1+rbx],xmm0
+        movdqa  XMMWORD[(-16)+rdx*1+rbx],xmm0
+        pcmpeqd xmm0,xmm1
+        pand    xmm2,xmm1
+        pand    xmm3,xmm1
+        pand    xmm4,xmm0
+        pand    xmm5,xmm0
+        pxor    xmm0,xmm0
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqu  XMMWORD[(-32)+rdi],xmm4
+        movdqu  XMMWORD[(-16)+rdi],xmm5
+        add     r9,32
+        jnz     NEAR $L$sqr8x_cond_copy
+
+        mov     rax,1
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$sqr8x_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_sqr8x_mont:
+
+ALIGN   32
+bn_mulx4x_mont:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_mulx4x_mont:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     rax,rsp
+
+$L$mulx4x_enter:
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+$L$mulx4x_prologue:
+
+        shl     r9d,3
+        xor     r10,r10
+        sub     r10,r9
+        mov     r8,QWORD[r8]
+        lea     rbp,[((-72))+r10*1+rsp]
+        and     rbp,-128
+        mov     r11,rsp
+        sub     r11,rbp
+        and     r11,-4096
+        lea     rsp,[rbp*1+r11]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$mulx4x_page_walk
+        jmp     NEAR $L$mulx4x_page_walk_done
+
+ALIGN   16
+$L$mulx4x_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$mulx4x_page_walk
+$L$mulx4x_page_walk_done:
+
+        lea     r10,[r9*1+rdx]
+
+
+
+
+
+
+
+
+
+
+
+
+        mov     QWORD[rsp],r9
+        shr     r9,5
+        mov     QWORD[16+rsp],r10
+        sub     r9,1
+        mov     QWORD[24+rsp],r8
+        mov     QWORD[32+rsp],rdi
+        mov     QWORD[40+rsp],rax
+
+        mov     QWORD[48+rsp],r9
+        jmp     NEAR $L$mulx4x_body
+
+ALIGN   32
+$L$mulx4x_body:
+        lea     rdi,[8+rdx]
+        mov     rdx,QWORD[rdx]
+        lea     rbx,[((64+32))+rsp]
+        mov     r9,rdx
+
+        mulx    rax,r8,QWORD[rsi]
+        mulx    r14,r11,QWORD[8+rsi]
+        add     r11,rax
+        mov     QWORD[8+rsp],rdi
+        mulx    r13,r12,QWORD[16+rsi]
+        adc     r12,r14
+        adc     r13,0
+
+        mov     rdi,r8
+        imul    r8,QWORD[24+rsp]
+        xor     rbp,rbp
+
+        mulx    r14,rax,QWORD[24+rsi]
+        mov     rdx,r8
+        lea     rsi,[32+rsi]
+        adcx    r13,rax
+        adcx    r14,rbp
+
+        mulx    r10,rax,QWORD[rcx]
+        adcx    rdi,rax
+        adox    r10,r11
+        mulx    r11,rax,QWORD[8+rcx]
+        adcx    r10,rax
+        adox    r11,r12
+DB      0xc4,0x62,0xfb,0xf6,0xa1,0x10,0x00,0x00,0x00
+        mov     rdi,QWORD[48+rsp]
+        mov     QWORD[((-32))+rbx],r10
+        adcx    r11,rax
+        adox    r12,r13
+        mulx    r15,rax,QWORD[24+rcx]
+        mov     rdx,r9
+        mov     QWORD[((-24))+rbx],r11
+        adcx    r12,rax
+        adox    r15,rbp
+        lea     rcx,[32+rcx]
+        mov     QWORD[((-16))+rbx],r12
+
+        jmp     NEAR $L$mulx4x_1st
+
+ALIGN   32
+$L$mulx4x_1st:
+        adcx    r15,rbp
+        mulx    rax,r10,QWORD[rsi]
+        adcx    r10,r14
+        mulx    r14,r11,QWORD[8+rsi]
+        adcx    r11,rax
+        mulx    rax,r12,QWORD[16+rsi]
+        adcx    r12,r14
+        mulx    r14,r13,QWORD[24+rsi]
+DB      0x67,0x67
+        mov     rdx,r8
+        adcx    r13,rax
+        adcx    r14,rbp
+        lea     rsi,[32+rsi]
+        lea     rbx,[32+rbx]
+
+        adox    r10,r15
+        mulx    r15,rax,QWORD[rcx]
+        adcx    r10,rax
+        adox    r11,r15
+        mulx    r15,rax,QWORD[8+rcx]
+        adcx    r11,rax
+        adox    r12,r15
+        mulx    r15,rax,QWORD[16+rcx]
+        mov     QWORD[((-40))+rbx],r10
+        adcx    r12,rax
+        mov     QWORD[((-32))+rbx],r11
+        adox    r13,r15
+        mulx    r15,rax,QWORD[24+rcx]
+        mov     rdx,r9
+        mov     QWORD[((-24))+rbx],r12
+        adcx    r13,rax
+        adox    r15,rbp
+        lea     rcx,[32+rcx]
+        mov     QWORD[((-16))+rbx],r13
+
+        dec     rdi
+        jnz     NEAR $L$mulx4x_1st
+
+        mov     rax,QWORD[rsp]
+        mov     rdi,QWORD[8+rsp]
+        adc     r15,rbp
+        add     r14,r15
+        sbb     r15,r15
+        mov     QWORD[((-8))+rbx],r14
+        jmp     NEAR $L$mulx4x_outer
+
+ALIGN   32
+$L$mulx4x_outer:
+        mov     rdx,QWORD[rdi]
+        lea     rdi,[8+rdi]
+        sub     rsi,rax
+        mov     QWORD[rbx],r15
+        lea     rbx,[((64+32))+rsp]
+        sub     rcx,rax
+
+        mulx    r11,r8,QWORD[rsi]
+        xor     ebp,ebp
+        mov     r9,rdx
+        mulx    r12,r14,QWORD[8+rsi]
+        adox    r8,QWORD[((-32))+rbx]
+        adcx    r11,r14
+        mulx    r13,r15,QWORD[16+rsi]
+        adox    r11,QWORD[((-24))+rbx]
+        adcx    r12,r15
+        adox    r12,QWORD[((-16))+rbx]
+        adcx    r13,rbp
+        adox    r13,rbp
+
+        mov     QWORD[8+rsp],rdi
+        mov     r15,r8
+        imul    r8,QWORD[24+rsp]
+        xor     ebp,ebp
+
+        mulx    r14,rax,QWORD[24+rsi]
+        mov     rdx,r8
+        adcx    r13,rax
+        adox    r13,QWORD[((-8))+rbx]
+        adcx    r14,rbp
+        lea     rsi,[32+rsi]
+        adox    r14,rbp
+
+        mulx    r10,rax,QWORD[rcx]
+        adcx    r15,rax
+        adox    r10,r11
+        mulx    r11,rax,QWORD[8+rcx]
+        adcx    r10,rax
+        adox    r11,r12
+        mulx    r12,rax,QWORD[16+rcx]
+        mov     QWORD[((-32))+rbx],r10
+        adcx    r11,rax
+        adox    r12,r13
+        mulx    r15,rax,QWORD[24+rcx]
+        mov     rdx,r9
+        mov     QWORD[((-24))+rbx],r11
+        lea     rcx,[32+rcx]
+        adcx    r12,rax
+        adox    r15,rbp
+        mov     rdi,QWORD[48+rsp]
+        mov     QWORD[((-16))+rbx],r12
+
+        jmp     NEAR $L$mulx4x_inner
+
+ALIGN   32
+$L$mulx4x_inner:
+        mulx    rax,r10,QWORD[rsi]
+        adcx    r15,rbp
+        adox    r10,r14
+        mulx    r14,r11,QWORD[8+rsi]
+        adcx    r10,QWORD[rbx]
+        adox    r11,rax
+        mulx    rax,r12,QWORD[16+rsi]
+        adcx    r11,QWORD[8+rbx]
+        adox    r12,r14
+        mulx    r14,r13,QWORD[24+rsi]
+        mov     rdx,r8
+        adcx    r12,QWORD[16+rbx]
+        adox    r13,rax
+        adcx    r13,QWORD[24+rbx]
+        adox    r14,rbp
+        lea     rsi,[32+rsi]
+        lea     rbx,[32+rbx]
+        adcx    r14,rbp
+
+        adox    r10,r15
+        mulx    r15,rax,QWORD[rcx]
+        adcx    r10,rax
+        adox    r11,r15
+        mulx    r15,rax,QWORD[8+rcx]
+        adcx    r11,rax
+        adox    r12,r15
+        mulx    r15,rax,QWORD[16+rcx]
+        mov     QWORD[((-40))+rbx],r10
+        adcx    r12,rax
+        adox    r13,r15
+        mulx    r15,rax,QWORD[24+rcx]
+        mov     rdx,r9
+        mov     QWORD[((-32))+rbx],r11
+        mov     QWORD[((-24))+rbx],r12
+        adcx    r13,rax
+        adox    r15,rbp
+        lea     rcx,[32+rcx]
+        mov     QWORD[((-16))+rbx],r13
+
+        dec     rdi
+        jnz     NEAR $L$mulx4x_inner
+
+        mov     rax,QWORD[rsp]
+        mov     rdi,QWORD[8+rsp]
+        adc     r15,rbp
+        sub     rbp,QWORD[rbx]
+        adc     r14,r15
+        sbb     r15,r15
+        mov     QWORD[((-8))+rbx],r14
+
+        cmp     rdi,QWORD[16+rsp]
+        jne     NEAR $L$mulx4x_outer
+
+        lea     rbx,[64+rsp]
+        sub     rcx,rax
+        neg     r15
+        mov     rdx,rax
+        shr     rax,3+2
+        mov     rdi,QWORD[32+rsp]
+        jmp     NEAR $L$mulx4x_sub
+
+ALIGN   32
+$L$mulx4x_sub:
+        mov     r11,QWORD[rbx]
+        mov     r12,QWORD[8+rbx]
+        mov     r13,QWORD[16+rbx]
+        mov     r14,QWORD[24+rbx]
+        lea     rbx,[32+rbx]
+        sbb     r11,QWORD[rcx]
+        sbb     r12,QWORD[8+rcx]
+        sbb     r13,QWORD[16+rcx]
+        sbb     r14,QWORD[24+rcx]
+        lea     rcx,[32+rcx]
+        mov     QWORD[rdi],r11
+        mov     QWORD[8+rdi],r12
+        mov     QWORD[16+rdi],r13
+        mov     QWORD[24+rdi],r14
+        lea     rdi,[32+rdi]
+        dec     rax
+        jnz     NEAR $L$mulx4x_sub
+
+        sbb     r15,0
+        lea     rbx,[64+rsp]
+        sub     rdi,rdx
+
+DB      102,73,15,110,207
+        pxor    xmm0,xmm0
+        pshufd  xmm1,xmm1,0
+        mov     rsi,QWORD[40+rsp]
+
+        jmp     NEAR $L$mulx4x_cond_copy
+
+ALIGN   32
+$L$mulx4x_cond_copy:
+        movdqa  xmm2,XMMWORD[rbx]
+        movdqa  xmm3,XMMWORD[16+rbx]
+        lea     rbx,[32+rbx]
+        movdqu  xmm4,XMMWORD[rdi]
+        movdqu  xmm5,XMMWORD[16+rdi]
+        lea     rdi,[32+rdi]
+        movdqa  XMMWORD[(-32)+rbx],xmm0
+        movdqa  XMMWORD[(-16)+rbx],xmm0
+        pcmpeqd xmm0,xmm1
+        pand    xmm2,xmm1
+        pand    xmm3,xmm1
+        pand    xmm4,xmm0
+        pand    xmm5,xmm0
+        pxor    xmm0,xmm0
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqu  XMMWORD[(-32)+rdi],xmm4
+        movdqu  XMMWORD[(-16)+rdi],xmm5
+        sub     rdx,32
+        jnz     NEAR $L$mulx4x_cond_copy
+
+        mov     QWORD[rbx],rdx
+
+        mov     rax,1
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$mulx4x_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_mulx4x_mont:
+DB      77,111,110,116,103,111,109,101,114,121,32,77,117,108,116,105
+DB      112,108,105,99,97,116,105,111,110,32,102,111,114,32,120,56
+DB      54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83
+DB      32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115
+DB      115,108,46,111,114,103,62,0
+ALIGN   16
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+mul_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        mov     r10,QWORD[192+r8]
+        mov     rax,QWORD[8+r10*8+rax]
+
+        jmp     NEAR $L$common_pop_regs
+
+
+
+ALIGN   16
+sqr_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_pop_regs
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[8+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[40+rax]
+
+$L$common_pop_regs:
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+$L$common_seh_tail:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_bn_mul_mont wrt ..imagebase
+        DD      $L$SEH_end_bn_mul_mont wrt ..imagebase
+        DD      $L$SEH_info_bn_mul_mont wrt ..imagebase
+
+        DD      $L$SEH_begin_bn_mul4x_mont wrt ..imagebase
+        DD      $L$SEH_end_bn_mul4x_mont wrt ..imagebase
+        DD      $L$SEH_info_bn_mul4x_mont wrt ..imagebase
+
+        DD      $L$SEH_begin_bn_sqr8x_mont wrt ..imagebase
+        DD      $L$SEH_end_bn_sqr8x_mont wrt ..imagebase
+        DD      $L$SEH_info_bn_sqr8x_mont wrt ..imagebase
+        DD      $L$SEH_begin_bn_mulx4x_mont wrt ..imagebase
+        DD      $L$SEH_end_bn_mulx4x_mont wrt ..imagebase
+        DD      $L$SEH_info_bn_mulx4x_mont wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_bn_mul_mont:
+DB      9,0,0,0
+        DD      mul_handler wrt ..imagebase
+        DD      $L$mul_body wrt ..imagebase,$L$mul_epilogue wrt ..imagebase
+$L$SEH_info_bn_mul4x_mont:
+DB      9,0,0,0
+        DD      mul_handler wrt ..imagebase
+        DD      $L$mul4x_body wrt ..imagebase,$L$mul4x_epilogue wrt ..imagebase
+$L$SEH_info_bn_sqr8x_mont:
+DB      9,0,0,0
+        DD      sqr_handler wrt ..imagebase
+        DD      $L$sqr8x_prologue wrt ..imagebase,$L$sqr8x_body wrt ..imagebase,$L$sqr8x_epilogue wrt ..imagebase
+ALIGN   8
+$L$SEH_info_bn_mulx4x_mont:
+DB      9,0,0,0
+        DD      sqr_handler wrt ..imagebase
+        DD      $L$mulx4x_prologue wrt ..imagebase,$L$mulx4x_body wrt ..imagebase,$L$mulx4x_epilogue wrt ..imagebase
+ALIGN   8
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm
new file mode 100644
index 0000000000..f256a94476
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm
@@ -0,0 +1,4033 @@
+; Copyright 2011-2019 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  bn_mul_mont_gather5
+
+ALIGN   64
+bn_mul_mont_gather5:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_mul_mont_gather5:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     r9d,r9d
+        mov     rax,rsp
+
+        test    r9d,7
+        jnz     NEAR $L$mul_enter
+        mov     r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+        jmp     NEAR $L$mul4x_enter
+
+ALIGN   16
+$L$mul_enter:
+        movd    xmm5,DWORD[56+rsp]
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        neg     r9
+        mov     r11,rsp
+        lea     r10,[((-280))+r9*8+rsp]
+        neg     r9
+        and     r10,-1024
+
+
+
+
+
+
+
+
+
+        sub     r11,r10
+        and     r11,-4096
+        lea     rsp,[r11*1+r10]
+        mov     r11,QWORD[rsp]
+        cmp     rsp,r10
+        ja      NEAR $L$mul_page_walk
+        jmp     NEAR $L$mul_page_walk_done
+
+$L$mul_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r11,QWORD[rsp]
+        cmp     rsp,r10
+        ja      NEAR $L$mul_page_walk
+$L$mul_page_walk_done:
+
+        lea     r10,[$L$inc]
+        mov     QWORD[8+r9*8+rsp],rax
+
+$L$mul_body:
+
+        lea     r12,[128+rdx]
+        movdqa  xmm0,XMMWORD[r10]
+        movdqa  xmm1,XMMWORD[16+r10]
+        lea     r10,[((24-112))+r9*8+rsp]
+        and     r10,-16
+
+        pshufd  xmm5,xmm5,0
+        movdqa  xmm4,xmm1
+        movdqa  xmm2,xmm1
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+DB      0x67
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[112+r10],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[128+r10],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[144+r10],xmm2
+        movdqa  xmm2,xmm4
+
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[160+r10],xmm3
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[176+r10],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[192+r10],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[208+r10],xmm2
+        movdqa  xmm2,xmm4
+
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[224+r10],xmm3
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[240+r10],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[256+r10],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[272+r10],xmm2
+        movdqa  xmm2,xmm4
+
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[288+r10],xmm3
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[304+r10],xmm0
+
+        paddd   xmm3,xmm2
+DB      0x67
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[320+r10],xmm1
+
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[336+r10],xmm2
+        pand    xmm0,XMMWORD[64+r12]
+
+        pand    xmm1,XMMWORD[80+r12]
+        pand    xmm2,XMMWORD[96+r12]
+        movdqa  XMMWORD[352+r10],xmm3
+        pand    xmm3,XMMWORD[112+r12]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        movdqa  xmm4,XMMWORD[((-128))+r12]
+        movdqa  xmm5,XMMWORD[((-112))+r12]
+        movdqa  xmm2,XMMWORD[((-96))+r12]
+        pand    xmm4,XMMWORD[112+r10]
+        movdqa  xmm3,XMMWORD[((-80))+r12]
+        pand    xmm5,XMMWORD[128+r10]
+        por     xmm0,xmm4
+        pand    xmm2,XMMWORD[144+r10]
+        por     xmm1,xmm5
+        pand    xmm3,XMMWORD[160+r10]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        movdqa  xmm4,XMMWORD[((-64))+r12]
+        movdqa  xmm5,XMMWORD[((-48))+r12]
+        movdqa  xmm2,XMMWORD[((-32))+r12]
+        pand    xmm4,XMMWORD[176+r10]
+        movdqa  xmm3,XMMWORD[((-16))+r12]
+        pand    xmm5,XMMWORD[192+r10]
+        por     xmm0,xmm4
+        pand    xmm2,XMMWORD[208+r10]
+        por     xmm1,xmm5
+        pand    xmm3,XMMWORD[224+r10]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        movdqa  xmm4,XMMWORD[r12]
+        movdqa  xmm5,XMMWORD[16+r12]
+        movdqa  xmm2,XMMWORD[32+r12]
+        pand    xmm4,XMMWORD[240+r10]
+        movdqa  xmm3,XMMWORD[48+r12]
+        pand    xmm5,XMMWORD[256+r10]
+        por     xmm0,xmm4
+        pand    xmm2,XMMWORD[272+r10]
+        por     xmm1,xmm5
+        pand    xmm3,XMMWORD[288+r10]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        por     xmm0,xmm1
+        pshufd  xmm1,xmm0,0x4e
+        por     xmm0,xmm1
+        lea     r12,[256+r12]
+DB      102,72,15,126,195
+
+        mov     r8,QWORD[r8]
+        mov     rax,QWORD[rsi]
+
+        xor     r14,r14
+        xor     r15,r15
+
+        mov     rbp,r8
+        mul     rbx
+        mov     r10,rax
+        mov     rax,QWORD[rcx]
+
+        imul    rbp,r10
+        mov     r11,rdx
+
+        mul     rbp
+        add     r10,rax
+        mov     rax,QWORD[8+rsi]
+        adc     rdx,0
+        mov     r13,rdx
+
+        lea     r15,[1+r15]
+        jmp     NEAR $L$1st_enter
+
+ALIGN   16
+$L$1st:
+        add     r13,rax
+        mov     rax,QWORD[r15*8+rsi]
+        adc     rdx,0
+        add     r13,r11
+        mov     r11,r10
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],r13
+        mov     r13,rdx
+
+$L$1st_enter:
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[r15*8+rcx]
+        adc     rdx,0
+        lea     r15,[1+r15]
+        mov     r10,rdx
+
+        mul     rbp
+        cmp     r15,r9
+        jne     NEAR $L$1st
+
+
+        add     r13,rax
+        adc     rdx,0
+        add     r13,r11
+        adc     rdx,0
+        mov     QWORD[((-16))+r9*8+rsp],r13
+        mov     r13,rdx
+        mov     r11,r10
+
+        xor     rdx,rdx
+        add     r13,r11
+        adc     rdx,0
+        mov     QWORD[((-8))+r9*8+rsp],r13
+        mov     QWORD[r9*8+rsp],rdx
+
+        lea     r14,[1+r14]
+        jmp     NEAR $L$outer
+ALIGN   16
+$L$outer:
+        lea     rdx,[((24+128))+r9*8+rsp]
+        and     rdx,-16
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        movdqa  xmm0,XMMWORD[((-128))+r12]
+        movdqa  xmm1,XMMWORD[((-112))+r12]
+        movdqa  xmm2,XMMWORD[((-96))+r12]
+        movdqa  xmm3,XMMWORD[((-80))+r12]
+        pand    xmm0,XMMWORD[((-128))+rdx]
+        pand    xmm1,XMMWORD[((-112))+rdx]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[((-96))+rdx]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[((-80))+rdx]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[((-64))+r12]
+        movdqa  xmm1,XMMWORD[((-48))+r12]
+        movdqa  xmm2,XMMWORD[((-32))+r12]
+        movdqa  xmm3,XMMWORD[((-16))+r12]
+        pand    xmm0,XMMWORD[((-64))+rdx]
+        pand    xmm1,XMMWORD[((-48))+rdx]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[((-32))+rdx]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[((-16))+rdx]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[r12]
+        movdqa  xmm1,XMMWORD[16+r12]
+        movdqa  xmm2,XMMWORD[32+r12]
+        movdqa  xmm3,XMMWORD[48+r12]
+        pand    xmm0,XMMWORD[rdx]
+        pand    xmm1,XMMWORD[16+rdx]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[32+rdx]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[48+rdx]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[64+r12]
+        movdqa  xmm1,XMMWORD[80+r12]
+        movdqa  xmm2,XMMWORD[96+r12]
+        movdqa  xmm3,XMMWORD[112+r12]
+        pand    xmm0,XMMWORD[64+rdx]
+        pand    xmm1,XMMWORD[80+rdx]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[96+rdx]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[112+rdx]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        por     xmm4,xmm5
+        pshufd  xmm0,xmm4,0x4e
+        por     xmm0,xmm4
+        lea     r12,[256+r12]
+
+        mov     rax,QWORD[rsi]
+DB      102,72,15,126,195
+
+        xor     r15,r15
+        mov     rbp,r8
+        mov     r10,QWORD[rsp]
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[rcx]
+        adc     rdx,0
+
+        imul    rbp,r10
+        mov     r11,rdx
+
+        mul     rbp
+        add     r10,rax
+        mov     rax,QWORD[8+rsi]
+        adc     rdx,0
+        mov     r10,QWORD[8+rsp]
+        mov     r13,rdx
+
+        lea     r15,[1+r15]
+        jmp     NEAR $L$inner_enter
+
+ALIGN   16
+$L$inner:
+        add     r13,rax
+        mov     rax,QWORD[r15*8+rsi]
+        adc     rdx,0
+        add     r13,r10
+        mov     r10,QWORD[r15*8+rsp]
+        adc     rdx,0
+        mov     QWORD[((-16))+r15*8+rsp],r13
+        mov     r13,rdx
+
+$L$inner_enter:
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[r15*8+rcx]
+        adc     rdx,0
+        add     r10,r11
+        mov     r11,rdx
+        adc     r11,0
+        lea     r15,[1+r15]
+
+        mul     rbp
+        cmp     r15,r9
+        jne     NEAR $L$inner
+
+        add     r13,rax
+        adc     rdx,0
+        add     r13,r10
+        mov     r10,QWORD[r9*8+rsp]
+        adc     rdx,0
+        mov     QWORD[((-16))+r9*8+rsp],r13
+        mov     r13,rdx
+
+        xor     rdx,rdx
+        add     r13,r11
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-8))+r9*8+rsp],r13
+        mov     QWORD[r9*8+rsp],rdx
+
+        lea     r14,[1+r14]
+        cmp     r14,r9
+        jb      NEAR $L$outer
+
+        xor     r14,r14
+        mov     rax,QWORD[rsp]
+        lea     rsi,[rsp]
+        mov     r15,r9
+        jmp     NEAR $L$sub
+ALIGN   16
+$L$sub: sbb     rax,QWORD[r14*8+rcx]
+        mov     QWORD[r14*8+rdi],rax
+        mov     rax,QWORD[8+r14*8+rsi]
+        lea     r14,[1+r14]
+        dec     r15
+        jnz     NEAR $L$sub
+
+        sbb     rax,0
+        mov     rbx,-1
+        xor     rbx,rax
+        xor     r14,r14
+        mov     r15,r9
+
+$L$copy:
+        mov     rcx,QWORD[r14*8+rdi]
+        mov     rdx,QWORD[r14*8+rsp]
+        and     rcx,rbx
+        and     rdx,rax
+        mov     QWORD[r14*8+rsp],r14
+        or      rdx,rcx
+        mov     QWORD[r14*8+rdi],rdx
+        lea     r14,[1+r14]
+        sub     r15,1
+        jnz     NEAR $L$copy
+
+        mov     rsi,QWORD[8+r9*8+rsp]
+
+        mov     rax,1
+
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$mul_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_mul_mont_gather5:
+
+ALIGN   32
+bn_mul4x_mont_gather5:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_mul4x_mont_gather5:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+DB      0x67
+        mov     rax,rsp
+
+$L$mul4x_enter:
+        and     r11d,0x80108
+        cmp     r11d,0x80108
+        je      NEAR $L$mulx4x_enter
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+$L$mul4x_prologue:
+
+DB      0x67
+        shl     r9d,3
+        lea     r10,[r9*2+r9]
+        neg     r9
+
+
+
+
+
+
+
+
+
+
+        lea     r11,[((-320))+r9*2+rsp]
+        mov     rbp,rsp
+        sub     r11,rdi
+        and     r11,4095
+        cmp     r10,r11
+        jb      NEAR $L$mul4xsp_alt
+        sub     rbp,r11
+        lea     rbp,[((-320))+r9*2+rbp]
+        jmp     NEAR $L$mul4xsp_done
+
+ALIGN   32
+$L$mul4xsp_alt:
+        lea     r10,[((4096-320))+r9*2]
+        lea     rbp,[((-320))+r9*2+rbp]
+        sub     r11,r10
+        mov     r10,0
+        cmovc   r11,r10
+        sub     rbp,r11
+$L$mul4xsp_done:
+        and     rbp,-64
+        mov     r11,rsp
+        sub     r11,rbp
+        and     r11,-4096
+        lea     rsp,[rbp*1+r11]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$mul4x_page_walk
+        jmp     NEAR $L$mul4x_page_walk_done
+
+$L$mul4x_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$mul4x_page_walk
+$L$mul4x_page_walk_done:
+
+        neg     r9
+
+        mov     QWORD[40+rsp],rax
+
+$L$mul4x_body:
+
+        call    mul4x_internal
+
+        mov     rsi,QWORD[40+rsp]
+
+        mov     rax,1
+
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$mul4x_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_mul4x_mont_gather5:
+
+
+ALIGN   32
+mul4x_internal:
+        shl     r9,5
+        movd    xmm5,DWORD[56+rax]
+        lea     rax,[$L$inc]
+        lea     r13,[128+r9*1+rdx]
+        shr     r9,5
+        movdqa  xmm0,XMMWORD[rax]
+        movdqa  xmm1,XMMWORD[16+rax]
+        lea     r10,[((88-112))+r9*1+rsp]
+        lea     r12,[128+rdx]
+
+        pshufd  xmm5,xmm5,0
+        movdqa  xmm4,xmm1
+DB      0x67,0x67
+        movdqa  xmm2,xmm1
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+DB      0x67
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[112+r10],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[128+r10],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[144+r10],xmm2
+        movdqa  xmm2,xmm4
+
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[160+r10],xmm3
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[176+r10],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[192+r10],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[208+r10],xmm2
+        movdqa  xmm2,xmm4
+
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[224+r10],xmm3
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[240+r10],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[256+r10],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[272+r10],xmm2
+        movdqa  xmm2,xmm4
+
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[288+r10],xmm3
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[304+r10],xmm0
+
+        paddd   xmm3,xmm2
+DB      0x67
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[320+r10],xmm1
+
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[336+r10],xmm2
+        pand    xmm0,XMMWORD[64+r12]
+
+        pand    xmm1,XMMWORD[80+r12]
+        pand    xmm2,XMMWORD[96+r12]
+        movdqa  XMMWORD[352+r10],xmm3
+        pand    xmm3,XMMWORD[112+r12]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        movdqa  xmm4,XMMWORD[((-128))+r12]
+        movdqa  xmm5,XMMWORD[((-112))+r12]
+        movdqa  xmm2,XMMWORD[((-96))+r12]
+        pand    xmm4,XMMWORD[112+r10]
+        movdqa  xmm3,XMMWORD[((-80))+r12]
+        pand    xmm5,XMMWORD[128+r10]
+        por     xmm0,xmm4
+        pand    xmm2,XMMWORD[144+r10]
+        por     xmm1,xmm5
+        pand    xmm3,XMMWORD[160+r10]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        movdqa  xmm4,XMMWORD[((-64))+r12]
+        movdqa  xmm5,XMMWORD[((-48))+r12]
+        movdqa  xmm2,XMMWORD[((-32))+r12]
+        pand    xmm4,XMMWORD[176+r10]
+        movdqa  xmm3,XMMWORD[((-16))+r12]
+        pand    xmm5,XMMWORD[192+r10]
+        por     xmm0,xmm4
+        pand    xmm2,XMMWORD[208+r10]
+        por     xmm1,xmm5
+        pand    xmm3,XMMWORD[224+r10]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        movdqa  xmm4,XMMWORD[r12]
+        movdqa  xmm5,XMMWORD[16+r12]
+        movdqa  xmm2,XMMWORD[32+r12]
+        pand    xmm4,XMMWORD[240+r10]
+        movdqa  xmm3,XMMWORD[48+r12]
+        pand    xmm5,XMMWORD[256+r10]
+        por     xmm0,xmm4
+        pand    xmm2,XMMWORD[272+r10]
+        por     xmm1,xmm5
+        pand    xmm3,XMMWORD[288+r10]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        por     xmm0,xmm1
+        pshufd  xmm1,xmm0,0x4e
+        por     xmm0,xmm1
+        lea     r12,[256+r12]
+DB      102,72,15,126,195
+
+        mov     QWORD[((16+8))+rsp],r13
+        mov     QWORD[((56+8))+rsp],rdi
+
+        mov     r8,QWORD[r8]
+        mov     rax,QWORD[rsi]
+        lea     rsi,[r9*1+rsi]
+        neg     r9
+
+        mov     rbp,r8
+        mul     rbx
+        mov     r10,rax
+        mov     rax,QWORD[rcx]
+
+        imul    rbp,r10
+        lea     r14,[((64+8))+rsp]
+        mov     r11,rdx
+
+        mul     rbp
+        add     r10,rax
+        mov     rax,QWORD[8+r9*1+rsi]
+        adc     rdx,0
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[8+rcx]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[16+r9*1+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        lea     r15,[32+r9]
+        lea     rcx,[32+rcx]
+        adc     rdx,0
+        mov     QWORD[r14],rdi
+        mov     r13,rdx
+        jmp     NEAR $L$1st4x
+
+ALIGN   32
+$L$1st4x:
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[((-16))+rcx]
+        lea     r14,[32+r14]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[((-8))+r15*1+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-24))+r14],r13
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[((-8))+rcx]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[r15*1+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-16))+r14],rdi
+        mov     r13,rdx
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[rcx]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[8+r15*1+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-8))+r14],r13
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[8+rcx]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[16+r15*1+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        lea     rcx,[32+rcx]
+        adc     rdx,0
+        mov     QWORD[r14],rdi
+        mov     r13,rdx
+
+        add     r15,32
+        jnz     NEAR $L$1st4x
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[((-16))+rcx]
+        lea     r14,[32+r14]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[((-8))+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-24))+r14],r13
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[((-8))+rcx]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[r9*1+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-16))+r14],rdi
+        mov     r13,rdx
+
+        lea     rcx,[r9*1+rcx]
+
+        xor     rdi,rdi
+        add     r13,r10
+        adc     rdi,0
+        mov     QWORD[((-8))+r14],r13
+
+        jmp     NEAR $L$outer4x
+
+ALIGN   32
+$L$outer4x:
+        lea     rdx,[((16+128))+r14]
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        movdqa  xmm0,XMMWORD[((-128))+r12]
+        movdqa  xmm1,XMMWORD[((-112))+r12]
+        movdqa  xmm2,XMMWORD[((-96))+r12]
+        movdqa  xmm3,XMMWORD[((-80))+r12]
+        pand    xmm0,XMMWORD[((-128))+rdx]
+        pand    xmm1,XMMWORD[((-112))+rdx]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[((-96))+rdx]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[((-80))+rdx]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[((-64))+r12]
+        movdqa  xmm1,XMMWORD[((-48))+r12]
+        movdqa  xmm2,XMMWORD[((-32))+r12]
+        movdqa  xmm3,XMMWORD[((-16))+r12]
+        pand    xmm0,XMMWORD[((-64))+rdx]
+        pand    xmm1,XMMWORD[((-48))+rdx]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[((-32))+rdx]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[((-16))+rdx]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[r12]
+        movdqa  xmm1,XMMWORD[16+r12]
+        movdqa  xmm2,XMMWORD[32+r12]
+        movdqa  xmm3,XMMWORD[48+r12]
+        pand    xmm0,XMMWORD[rdx]
+        pand    xmm1,XMMWORD[16+rdx]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[32+rdx]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[48+rdx]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[64+r12]
+        movdqa  xmm1,XMMWORD[80+r12]
+        movdqa  xmm2,XMMWORD[96+r12]
+        movdqa  xmm3,XMMWORD[112+r12]
+        pand    xmm0,XMMWORD[64+rdx]
+        pand    xmm1,XMMWORD[80+rdx]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[96+rdx]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[112+rdx]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        por     xmm4,xmm5
+        pshufd  xmm0,xmm4,0x4e
+        por     xmm0,xmm4
+        lea     r12,[256+r12]
+DB      102,72,15,126,195
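The `pand`/`por` cascade above reads all 16 table entries and combines them under per-entry masks, so the selected power leaves no cache-timing footprint. A minimal C sketch of the same select-by-mask idea — `ct_gather` and `demo_gather` are illustrative names, not OpenSSL's API, and the comparison-built mask is a brevity shortcut rather than a guaranteed constant-time construct on every compiler:

```c
#include <assert.h>
#include <stdint.h>

/* Constant-time table gather, as in the pand/por sequence above: every
 * entry is read unconditionally, and a mask (all-ones for the wanted
 * index, zero elsewhere) selects exactly one of them. */
static uint64_t ct_gather(const uint64_t tbl[16], unsigned idx)
{
    uint64_t r = 0;
    for (unsigned i = 0; i < 16; i++) {
        uint64_t mask = (uint64_t)0 - (uint64_t)(i == idx);
        r |= tbl[i] & mask;          /* pand + por */
    }
    return r;
}

/* Demo wrapper over a synthetic table: entry i holds 3*i + 1. */
static uint64_t demo_gather(unsigned idx)
{
    uint64_t tbl[16];
    for (unsigned i = 0; i < 16; i++)
        tbl[i] = 3u * i + 1;
    return ct_gather(tbl, idx);
}
```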
+
+        mov     r10,QWORD[r9*1+r14]
+        mov     rbp,r8
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[rcx]
+        adc     rdx,0
+
+        imul    rbp,r10
+        mov     r11,rdx
+        mov     QWORD[r14],rdi
+
+        lea     r14,[r9*1+r14]
+
+        mul     rbp
+        add     r10,rax
+        mov     rax,QWORD[8+r9*1+rsi]
+        adc     rdx,0
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[8+rcx]
+        adc     rdx,0
+        add     r11,QWORD[8+r14]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[16+r9*1+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        lea     r15,[32+r9]
+        lea     rcx,[32+rcx]
+        adc     rdx,0
+        mov     r13,rdx
+        jmp     NEAR $L$inner4x
+
+ALIGN   32
+$L$inner4x:
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[((-16))+rcx]
+        adc     rdx,0
+        add     r10,QWORD[16+r14]
+        lea     r14,[32+r14]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[((-8))+r15*1+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-32))+r14],rdi
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[((-8))+rcx]
+        adc     rdx,0
+        add     r11,QWORD[((-8))+r14]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[r15*1+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-24))+r14],r13
+        mov     r13,rdx
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[rcx]
+        adc     rdx,0
+        add     r10,QWORD[r14]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[8+r15*1+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-16))+r14],rdi
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[8+rcx]
+        adc     rdx,0
+        add     r11,QWORD[8+r14]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[16+r15*1+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        lea     rcx,[32+rcx]
+        adc     rdx,0
+        mov     QWORD[((-8))+r14],r13
+        mov     r13,rdx
+
+        add     r15,32
+        jnz     NEAR $L$inner4x
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[((-16))+rcx]
+        adc     rdx,0
+        add     r10,QWORD[16+r14]
+        lea     r14,[32+r14]
+        adc     rdx,0
+        mov     r11,rdx
+
+        mul     rbp
+        add     r13,rax
+        mov     rax,QWORD[((-8))+rsi]
+        adc     rdx,0
+        add     r13,r10
+        adc     rdx,0
+        mov     QWORD[((-32))+r14],rdi
+        mov     rdi,rdx
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,rbp
+        mov     rbp,QWORD[((-8))+rcx]
+        adc     rdx,0
+        add     r11,QWORD[((-8))+r14]
+        adc     rdx,0
+        mov     r10,rdx
+
+        mul     rbp
+        add     rdi,rax
+        mov     rax,QWORD[r9*1+rsi]
+        adc     rdx,0
+        add     rdi,r11
+        adc     rdx,0
+        mov     QWORD[((-24))+r14],r13
+        mov     r13,rdx
+
+        mov     QWORD[((-16))+r14],rdi
+        lea     rcx,[r9*1+rcx]
+
+        xor     rdi,rdi
+        add     r13,r10
+        adc     rdi,0
+        add     r13,QWORD[r14]
+        adc     rdi,0
+        mov     QWORD[((-8))+r14],r13
+
+        cmp     r12,QWORD[((16+8))+rsp]
+        jb      NEAR $L$outer4x
+        xor     rax,rax
+        sub     rbp,r13
+        adc     r15,r15
+        or      rdi,r15
+        sub     rax,rdi
+        lea     rbx,[r9*1+r14]
+        mov     r12,QWORD[rcx]
+        lea     rbp,[rcx]
+        mov     rcx,r9
+        sar     rcx,3+2
+        mov     rdi,QWORD[((56+8))+rsp]
+        dec     r12
+        xor     r10,r10
+        mov     r13,QWORD[8+rbp]
+        mov     r14,QWORD[16+rbp]
+        mov     r15,QWORD[24+rbp]
+        jmp     NEAR $L$sqr4x_sub_entry
+
+global  bn_power5
+
+ALIGN   32
+bn_power5:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_power5:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     rax,rsp
+
+        mov     r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+        and     r11d,0x80108
+        cmp     r11d,0x80108
+        je      NEAR $L$powerx5_enter
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+$L$power5_prologue:
+
+        shl     r9d,3
+        lea     r10d,[r9*2+r9]
+        neg     r9
+        mov     r8,QWORD[r8]
+
+
+
+
+
+
+
+
+        lea     r11,[((-320))+r9*2+rsp]
+        mov     rbp,rsp
+        sub     r11,rdi
+        and     r11,4095
+        cmp     r10,r11
+        jb      NEAR $L$pwr_sp_alt
+        sub     rbp,r11
+        lea     rbp,[((-320))+r9*2+rbp]
+        jmp     NEAR $L$pwr_sp_done
+
+ALIGN   32
+$L$pwr_sp_alt:
+        lea     r10,[((4096-320))+r9*2]
+        lea     rbp,[((-320))+r9*2+rbp]
+        sub     r11,r10
+        mov     r10,0
+        cmovc   r11,r10
+        sub     rbp,r11
+$L$pwr_sp_done:
+        and     rbp,-64
+        mov     r11,rsp
+        sub     r11,rbp
+        and     r11,-4096
+        lea     rsp,[rbp*1+r11]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$pwr_page_walk
+        jmp     NEAR $L$pwr_page_walk_done
+
+$L$pwr_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$pwr_page_walk
+$L$pwr_page_walk_done:
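The `$L$pwr_page_walk` loop above moves `rsp` downward one 4096-byte page at a time, loading from each page as it goes, so that Windows guard-page stack growth can commit the newly reserved region. A hedged sketch of the probe arithmetic — `count_probe_steps` is a made-up helper mirroring the loop, not anything in the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Count how many pages the page-walk loop touches: starting from the
 * current stack pointer, step down by 4096 and perform one dummy load
 * ("mov r10, [rsp]") per step until the target frame is reached.  The
 * asm guarantees the distance is page-aligned before entering the loop. */
static size_t count_probe_steps(size_t rsp, size_t target)
{
    size_t steps = 0;
    while (rsp > target) {
        rsp -= 4096;
        steps++;          /* one probe load per 4 KiB page */
    }
    return steps;
}
```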
+
+        mov     r10,r9
+        neg     r9
+
+
+
+
+
+
+
+
+
+
+        mov     QWORD[32+rsp],r8
+        mov     QWORD[40+rsp],rax
+
+$L$power5_body:
+DB      102,72,15,110,207
+DB      102,72,15,110,209
+DB      102,73,15,110,218
+DB      102,72,15,110,226
+
+        call    __bn_sqr8x_internal
+        call    __bn_post4x_internal
+        call    __bn_sqr8x_internal
+        call    __bn_post4x_internal
+        call    __bn_sqr8x_internal
+        call    __bn_post4x_internal
+        call    __bn_sqr8x_internal
+        call    __bn_post4x_internal
+        call    __bn_sqr8x_internal
+        call    __bn_post4x_internal
+
+DB      102,72,15,126,209
+DB      102,72,15,126,226
+        mov     rdi,rsi
+        mov     rax,QWORD[40+rsp]
+        lea     r8,[32+rsp]
+
+        call    mul4x_internal
+
+        mov     rsi,QWORD[40+rsp]
+
+        mov     rax,1
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$power5_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_power5:
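`bn_power5` above performs five Montgomery squarings (`__bn_sqr8x_internal` + `__bn_post4x_internal`) followed by one Montgomery multiplication (`mul4x_internal`) against a gathered table entry — one window step of 5-bit fixed-window exponentiation. A toy sketch of that step with plain integers (small values, no Montgomery form; `power5_step` is an illustrative name):

```c
#include <assert.h>
#include <stdint.h>

/* One 5-bit window step: square the accumulator five times, then
 * multiply in the table entry selected by the next five exponent bits,
 * i.e. acc^(2^5) * tbl_entry mod n.  n must stay below 2^32 so the
 * intermediate products fit in uint64_t. */
static uint64_t power5_step(uint64_t acc, uint64_t tbl_entry, uint64_t n)
{
    for (int i = 0; i < 5; i++)        /* five modular squarings */
        acc = (acc * acc) % n;
    return (acc * tbl_entry) % n;      /* one modular multiplication */
}
```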
+
+global  bn_sqr8x_internal
+
+
+ALIGN   32
+bn_sqr8x_internal:
+__bn_sqr8x_internal:
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+        lea     rbp,[32+r10]
+        lea     rsi,[r9*1+rsi]
+
+        mov     rcx,r9
+
+
+        mov     r14,QWORD[((-32))+rbp*1+rsi]
+        lea     rdi,[((48+8))+r9*2+rsp]
+        mov     rax,QWORD[((-24))+rbp*1+rsi]
+        lea     rdi,[((-32))+rbp*1+rdi]
+        mov     rbx,QWORD[((-16))+rbp*1+rsi]
+        mov     r15,rax
+
+        mul     r14
+        mov     r10,rax
+        mov     rax,rbx
+        mov     r11,rdx
+        mov     QWORD[((-24))+rbp*1+rdi],r10
+
+        mul     r14
+        add     r11,rax
+        mov     rax,rbx
+        adc     rdx,0
+        mov     QWORD[((-16))+rbp*1+rdi],r11
+        mov     r10,rdx
+
+
+        mov     rbx,QWORD[((-8))+rbp*1+rsi]
+        mul     r15
+        mov     r12,rax
+        mov     rax,rbx
+        mov     r13,rdx
+
+        lea     rcx,[rbp]
+        mul     r14
+        add     r10,rax
+        mov     rax,rbx
+        mov     r11,rdx
+        adc     r11,0
+        add     r10,r12
+        adc     r11,0
+        mov     QWORD[((-8))+rcx*1+rdi],r10
+        jmp     NEAR $L$sqr4x_1st
+
+ALIGN   32
+$L$sqr4x_1st:
+        mov     rbx,QWORD[rcx*1+rsi]
+        mul     r15
+        add     r13,rax
+        mov     rax,rbx
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     r14
+        add     r11,rax
+        mov     rax,rbx
+        mov     rbx,QWORD[8+rcx*1+rsi]
+        mov     r10,rdx
+        adc     r10,0
+        add     r11,r13
+        adc     r10,0
+
+
+        mul     r15
+        add     r12,rax
+        mov     rax,rbx
+        mov     QWORD[rcx*1+rdi],r11
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     r14
+        add     r10,rax
+        mov     rax,rbx
+        mov     rbx,QWORD[16+rcx*1+rsi]
+        mov     r11,rdx
+        adc     r11,0
+        add     r10,r12
+        adc     r11,0
+
+        mul     r15
+        add     r13,rax
+        mov     rax,rbx
+        mov     QWORD[8+rcx*1+rdi],r10
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     r14
+        add     r11,rax
+        mov     rax,rbx
+        mov     rbx,QWORD[24+rcx*1+rsi]
+        mov     r10,rdx
+        adc     r10,0
+        add     r11,r13
+        adc     r10,0
+
+
+        mul     r15
+        add     r12,rax
+        mov     rax,rbx
+        mov     QWORD[16+rcx*1+rdi],r11
+        mov     r13,rdx
+        adc     r13,0
+        lea     rcx,[32+rcx]
+
+        mul     r14
+        add     r10,rax
+        mov     rax,rbx
+        mov     r11,rdx
+        adc     r11,0
+        add     r10,r12
+        adc     r11,0
+        mov     QWORD[((-8))+rcx*1+rdi],r10
+
+        cmp     rcx,0
+        jne     NEAR $L$sqr4x_1st
+
+        mul     r15
+        add     r13,rax
+        lea     rbp,[16+rbp]
+        adc     rdx,0
+        add     r13,r11
+        adc     rdx,0
+
+        mov     QWORD[rdi],r13
+        mov     r12,rdx
+        mov     QWORD[8+rdi],rdx
+        jmp     NEAR $L$sqr4x_outer
+
+ALIGN   32
+$L$sqr4x_outer:
+        mov     r14,QWORD[((-32))+rbp*1+rsi]
+        lea     rdi,[((48+8))+r9*2+rsp]
+        mov     rax,QWORD[((-24))+rbp*1+rsi]
+        lea     rdi,[((-32))+rbp*1+rdi]
+        mov     rbx,QWORD[((-16))+rbp*1+rsi]
+        mov     r15,rax
+
+        mul     r14
+        mov     r10,QWORD[((-24))+rbp*1+rdi]
+        add     r10,rax
+        mov     rax,rbx
+        adc     rdx,0
+        mov     QWORD[((-24))+rbp*1+rdi],r10
+        mov     r11,rdx
+
+        mul     r14
+        add     r11,rax
+        mov     rax,rbx
+        adc     rdx,0
+        add     r11,QWORD[((-16))+rbp*1+rdi]
+        mov     r10,rdx
+        adc     r10,0
+        mov     QWORD[((-16))+rbp*1+rdi],r11
+
+        xor     r12,r12
+
+        mov     rbx,QWORD[((-8))+rbp*1+rsi]
+        mul     r15
+        add     r12,rax
+        mov     rax,rbx
+        adc     rdx,0
+        add     r12,QWORD[((-8))+rbp*1+rdi]
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     r14
+        add     r10,rax
+        mov     rax,rbx
+        adc     rdx,0
+        add     r10,r12
+        mov     r11,rdx
+        adc     r11,0
+        mov     QWORD[((-8))+rbp*1+rdi],r10
+
+        lea     rcx,[rbp]
+        jmp     NEAR $L$sqr4x_inner
+
+ALIGN   32
+$L$sqr4x_inner:
+        mov     rbx,QWORD[rcx*1+rsi]
+        mul     r15
+        add     r13,rax
+        mov     rax,rbx
+        mov     r12,rdx
+        adc     r12,0
+        add     r13,QWORD[rcx*1+rdi]
+        adc     r12,0
+
+DB      0x67
+        mul     r14
+        add     r11,rax
+        mov     rax,rbx
+        mov     rbx,QWORD[8+rcx*1+rsi]
+        mov     r10,rdx
+        adc     r10,0
+        add     r11,r13
+        adc     r10,0
+
+        mul     r15
+        add     r12,rax
+        mov     QWORD[rcx*1+rdi],r11
+        mov     rax,rbx
+        mov     r13,rdx
+        adc     r13,0
+        add     r12,QWORD[8+rcx*1+rdi]
+        lea     rcx,[16+rcx]
+        adc     r13,0
+
+        mul     r14
+        add     r10,rax
+        mov     rax,rbx
+        adc     rdx,0
+        add     r10,r12
+        mov     r11,rdx
+        adc     r11,0
+        mov     QWORD[((-8))+rcx*1+rdi],r10
+
+        cmp     rcx,0
+        jne     NEAR $L$sqr4x_inner
+
+DB      0x67
+        mul     r15
+        add     r13,rax
+        adc     rdx,0
+        add     r13,r11
+        adc     rdx,0
+
+        mov     QWORD[rdi],r13
+        mov     r12,rdx
+        mov     QWORD[8+rdi],rdx
+
+        add     rbp,16
+        jnz     NEAR $L$sqr4x_outer
+
+
+        mov     r14,QWORD[((-32))+rsi]
+        lea     rdi,[((48+8))+r9*2+rsp]
+        mov     rax,QWORD[((-24))+rsi]
+        lea     rdi,[((-32))+rbp*1+rdi]
+        mov     rbx,QWORD[((-16))+rsi]
+        mov     r15,rax
+
+        mul     r14
+        add     r10,rax
+        mov     rax,rbx
+        mov     r11,rdx
+        adc     r11,0
+
+        mul     r14
+        add     r11,rax
+        mov     rax,rbx
+        mov     QWORD[((-24))+rdi],r10
+        mov     r10,rdx
+        adc     r10,0
+        add     r11,r13
+        mov     rbx,QWORD[((-8))+rsi]
+        adc     r10,0
+
+        mul     r15
+        add     r12,rax
+        mov     rax,rbx
+        mov     QWORD[((-16))+rdi],r11
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     r14
+        add     r10,rax
+        mov     rax,rbx
+        mov     r11,rdx
+        adc     r11,0
+        add     r10,r12
+        adc     r11,0
+        mov     QWORD[((-8))+rdi],r10
+
+        mul     r15
+        add     r13,rax
+        mov     rax,QWORD[((-16))+rsi]
+        adc     rdx,0
+        add     r13,r11
+        adc     rdx,0
+
+        mov     QWORD[rdi],r13
+        mov     r12,rdx
+        mov     QWORD[8+rdi],rdx
+
+        mul     rbx
+        add     rbp,16
+        xor     r14,r14
+        sub     rbp,r9
+        xor     r15,r15
+
+        add     rax,r12
+        adc     rdx,0
+        mov     QWORD[8+rdi],rax
+        mov     QWORD[16+rdi],rdx
+        mov     QWORD[24+rdi],r15
+
+        mov     rax,QWORD[((-16))+rbp*1+rsi]
+        lea     rdi,[((48+8))+rsp]
+        xor     r10,r10
+        mov     r11,QWORD[8+rdi]
+
+        lea     r12,[r10*2+r14]
+        shr     r10,63
+        lea     r13,[r11*2+rcx]
+        shr     r11,63
+        or      r13,r10
+        mov     r10,QWORD[16+rdi]
+        mov     r14,r11
+        mul     rax
+        neg     r15
+        mov     r11,QWORD[24+rdi]
+        adc     r12,rax
+        mov     rax,QWORD[((-8))+rbp*1+rsi]
+        mov     QWORD[rdi],r12
+        adc     r13,rdx
+
+        lea     rbx,[r10*2+r14]
+        mov     QWORD[8+rdi],r13
+        sbb     r15,r15
+        shr     r10,63
+        lea     r8,[r11*2+rcx]
+        shr     r11,63
+        or      r8,r10
+        mov     r10,QWORD[32+rdi]
+        mov     r14,r11
+        mul     rax
+        neg     r15
+        mov     r11,QWORD[40+rdi]
+        adc     rbx,rax
+        mov     rax,QWORD[rbp*1+rsi]
+        mov     QWORD[16+rdi],rbx
+        adc     r8,rdx
+        lea     rbp,[16+rbp]
+        mov     QWORD[24+rdi],r8
+        sbb     r15,r15
+        lea     rdi,[64+rdi]
+        jmp     NEAR $L$sqr4x_shift_n_add
+
+ALIGN   32
+$L$sqr4x_shift_n_add:
+        lea     r12,[r10*2+r14]
+        shr     r10,63
+        lea     r13,[r11*2+rcx]
+        shr     r11,63
+        or      r13,r10
+        mov     r10,QWORD[((-16))+rdi]
+        mov     r14,r11
+        mul     rax
+        neg     r15
+        mov     r11,QWORD[((-8))+rdi]
+        adc     r12,rax
+        mov     rax,QWORD[((-8))+rbp*1+rsi]
+        mov     QWORD[((-32))+rdi],r12
+        adc     r13,rdx
+
+        lea     rbx,[r10*2+r14]
+        mov     QWORD[((-24))+rdi],r13
+        sbb     r15,r15
+        shr     r10,63
+        lea     r8,[r11*2+rcx]
+        shr     r11,63
+        or      r8,r10
+        mov     r10,QWORD[rdi]
+        mov     r14,r11
+        mul     rax
+        neg     r15
+        mov     r11,QWORD[8+rdi]
+        adc     rbx,rax
+        mov     rax,QWORD[rbp*1+rsi]
+        mov     QWORD[((-16))+rdi],rbx
+        adc     r8,rdx
+
+        lea     r12,[r10*2+r14]
+        mov     QWORD[((-8))+rdi],r8
+        sbb     r15,r15
+        shr     r10,63
+        lea     r13,[r11*2+rcx]
+        shr     r11,63
+        or      r13,r10
+        mov     r10,QWORD[16+rdi]
+        mov     r14,r11
+        mul     rax
+        neg     r15
+        mov     r11,QWORD[24+rdi]
+        adc     r12,rax
+        mov     rax,QWORD[8+rbp*1+rsi]
+        mov     QWORD[rdi],r12
+        adc     r13,rdx
+
+        lea     rbx,[r10*2+r14]
+        mov     QWORD[8+rdi],r13
+        sbb     r15,r15
+        shr     r10,63
+        lea     r8,[r11*2+rcx]
+        shr     r11,63
+        or      r8,r10
+        mov     r10,QWORD[32+rdi]
+        mov     r14,r11
+        mul     rax
+        neg     r15
+        mov     r11,QWORD[40+rdi]
+        adc     rbx,rax
+        mov     rax,QWORD[16+rbp*1+rsi]
+        mov     QWORD[16+rdi],rbx
+        adc     r8,rdx
+        mov     QWORD[24+rdi],r8
+        sbb     r15,r15
+        lea     rdi,[64+rdi]
+        add     rbp,32
+        jnz     NEAR $L$sqr4x_shift_n_add
+
+        lea     r12,[r10*2+r14]
+DB      0x67
+        shr     r10,63
+        lea     r13,[r11*2+rcx]
+        shr     r11,63
+        or      r13,r10
+        mov     r10,QWORD[((-16))+rdi]
+        mov     r14,r11
+        mul     rax
+        neg     r15
+        mov     r11,QWORD[((-8))+rdi]
+        adc     r12,rax
+        mov     rax,QWORD[((-8))+rsi]
+        mov     QWORD[((-32))+rdi],r12
+        adc     r13,rdx
+
+        lea     rbx,[r10*2+r14]
+        mov     QWORD[((-24))+rdi],r13
+        sbb     r15,r15
+        shr     r10,63
+        lea     r8,[r11*2+rcx]
+        shr     r11,63
+        or      r8,r10
+        mul     rax
+        neg     r15
+        adc     rbx,rax
+        adc     r8,rdx
+        mov     QWORD[((-16))+rdi],rbx
+        mov     QWORD[((-8))+rdi],r8
+DB      102,72,15,126,213
+__bn_sqr8x_reduction:
+        xor     rax,rax
+        lea     rcx,[rbp*1+r9]
+        lea     rdx,[((48+8))+r9*2+rsp]
+        mov     QWORD[((0+8))+rsp],rcx
+        lea     rdi,[((48+8))+r9*1+rsp]
+        mov     QWORD[((8+8))+rsp],rdx
+        neg     r9
+        jmp     NEAR $L$8x_reduction_loop
+
+ALIGN   32
+$L$8x_reduction_loop:
+        lea     rdi,[r9*1+rdi]
+DB      0x66
+        mov     rbx,QWORD[rdi]
+        mov     r9,QWORD[8+rdi]
+        mov     r10,QWORD[16+rdi]
+        mov     r11,QWORD[24+rdi]
+        mov     r12,QWORD[32+rdi]
+        mov     r13,QWORD[40+rdi]
+        mov     r14,QWORD[48+rdi]
+        mov     r15,QWORD[56+rdi]
+        mov     QWORD[rdx],rax
+        lea     rdi,[64+rdi]
+
+DB      0x67
+        mov     r8,rbx
+        imul    rbx,QWORD[((32+8))+rsp]
+        mov     rax,QWORD[rbp]
+        mov     ecx,8
+        jmp     NEAR $L$8x_reduce
+
+ALIGN   32
+$L$8x_reduce:
+        mul     rbx
+        mov     rax,QWORD[8+rbp]
+        neg     r8
+        mov     r8,rdx
+        adc     r8,0
+
+        mul     rbx
+        add     r9,rax
+        mov     rax,QWORD[16+rbp]
+        adc     rdx,0
+        add     r8,r9
+        mov     QWORD[((48-8+8))+rcx*8+rsp],rbx
+        mov     r9,rdx
+        adc     r9,0
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[24+rbp]
+        adc     rdx,0
+        add     r9,r10
+        mov     rsi,QWORD[((32+8))+rsp]
+        mov     r10,rdx
+        adc     r10,0
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[32+rbp]
+        adc     rdx,0
+        imul    rsi,r8
+        add     r10,r11
+        mov     r11,rdx
+        adc     r11,0
+
+        mul     rbx
+        add     r12,rax
+        mov     rax,QWORD[40+rbp]
+        adc     rdx,0
+        add     r11,r12
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     rbx
+        add     r13,rax
+        mov     rax,QWORD[48+rbp]
+        adc     rdx,0
+        add     r12,r13
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     rbx
+        add     r14,rax
+        mov     rax,QWORD[56+rbp]
+        adc     rdx,0
+        add     r13,r14
+        mov     r14,rdx
+        adc     r14,0
+
+        mul     rbx
+        mov     rbx,rsi
+        add     r15,rax
+        mov     rax,QWORD[rbp]
+        adc     rdx,0
+        add     r14,r15
+        mov     r15,rdx
+        adc     r15,0
+
+        dec     ecx
+        jnz     NEAR $L$8x_reduce
+
+        lea     rbp,[64+rbp]
+        xor     rax,rax
+        mov     rdx,QWORD[((8+8))+rsp]
+        cmp     rbp,QWORD[((0+8))+rsp]
+        jae     NEAR $L$8x_no_tail
+
+DB      0x66
+        add     r8,QWORD[rdi]
+        adc     r9,QWORD[8+rdi]
+        adc     r10,QWORD[16+rdi]
+        adc     r11,QWORD[24+rdi]
+        adc     r12,QWORD[32+rdi]
+        adc     r13,QWORD[40+rdi]
+        adc     r14,QWORD[48+rdi]
+        adc     r15,QWORD[56+rdi]
+        sbb     rsi,rsi
+
+        mov     rbx,QWORD[((48+56+8))+rsp]
+        mov     ecx,8
+        mov     rax,QWORD[rbp]
+        jmp     NEAR $L$8x_tail
+
+ALIGN   32
+$L$8x_tail:
+        mul     rbx
+        add     r8,rax
+        mov     rax,QWORD[8+rbp]
+        mov     QWORD[rdi],r8
+        mov     r8,rdx
+        adc     r8,0
+
+        mul     rbx
+        add     r9,rax
+        mov     rax,QWORD[16+rbp]
+        adc     rdx,0
+        add     r8,r9
+        lea     rdi,[8+rdi]
+        mov     r9,rdx
+        adc     r9,0
+
+        mul     rbx
+        add     r10,rax
+        mov     rax,QWORD[24+rbp]
+        adc     rdx,0
+        add     r9,r10
+        mov     r10,rdx
+        adc     r10,0
+
+        mul     rbx
+        add     r11,rax
+        mov     rax,QWORD[32+rbp]
+        adc     rdx,0
+        add     r10,r11
+        mov     r11,rdx
+        adc     r11,0
+
+        mul     rbx
+        add     r12,rax
+        mov     rax,QWORD[40+rbp]
+        adc     rdx,0
+        add     r11,r12
+        mov     r12,rdx
+        adc     r12,0
+
+        mul     rbx
+        add     r13,rax
+        mov     rax,QWORD[48+rbp]
+        adc     rdx,0
+        add     r12,r13
+        mov     r13,rdx
+        adc     r13,0
+
+        mul     rbx
+        add     r14,rax
+        mov     rax,QWORD[56+rbp]
+        adc     rdx,0
+        add     r13,r14
+        mov     r14,rdx
+        adc     r14,0
+
+        mul     rbx
+        mov     rbx,QWORD[((48-16+8))+rcx*8+rsp]
+        add     r15,rax
+        adc     rdx,0
+        add     r14,r15
+        mov     rax,QWORD[rbp]
+        mov     r15,rdx
+        adc     r15,0
+
+        dec     ecx
+        jnz     NEAR $L$8x_tail
+
+        lea     rbp,[64+rbp]
+        mov     rdx,QWORD[((8+8))+rsp]
+        cmp     rbp,QWORD[((0+8))+rsp]
+        jae     NEAR $L$8x_tail_done
+
+        mov     rbx,QWORD[((48+56+8))+rsp]
+        neg     rsi
+        mov     rax,QWORD[rbp]
+        adc     r8,QWORD[rdi]
+        adc     r9,QWORD[8+rdi]
+        adc     r10,QWORD[16+rdi]
+        adc     r11,QWORD[24+rdi]
+        adc     r12,QWORD[32+rdi]
+        adc     r13,QWORD[40+rdi]
+        adc     r14,QWORD[48+rdi]
+        adc     r15,QWORD[56+rdi]
+        sbb     rsi,rsi
+
+        mov     ecx,8
+        jmp     NEAR $L$8x_tail
+
+ALIGN   32
+$L$8x_tail_done:
+        xor     rax,rax
+        add     r8,QWORD[rdx]
+        adc     r9,0
+        adc     r10,0
+        adc     r11,0
+        adc     r12,0
+        adc     r13,0
+        adc     r14,0
+        adc     r15,0
+        adc     rax,0
+
+        neg     rsi
+$L$8x_no_tail:
+        adc     r8,QWORD[rdi]
+        adc     r9,QWORD[8+rdi]
+        adc     r10,QWORD[16+rdi]
+        adc     r11,QWORD[24+rdi]
+        adc     r12,QWORD[32+rdi]
+        adc     r13,QWORD[40+rdi]
+        adc     r14,QWORD[48+rdi]
+        adc     r15,QWORD[56+rdi]
+        adc     rax,0
+        mov     rcx,QWORD[((-8))+rbp]
+        xor     rsi,rsi
+
+DB      102,72,15,126,213
+
+        mov     QWORD[rdi],r8
+        mov     QWORD[8+rdi],r9
+DB      102,73,15,126,217
+        mov     QWORD[16+rdi],r10
+        mov     QWORD[24+rdi],r11
+        mov     QWORD[32+rdi],r12
+        mov     QWORD[40+rdi],r13
+        mov     QWORD[48+rdi],r14
+        mov     QWORD[56+rdi],r15
+        lea     rdi,[64+rdi]
+
+        cmp     rdi,rdx
+        jb      NEAR $L$8x_reduction_loop
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   32
+__bn_post4x_internal:
+        mov     r12,QWORD[rbp]
+        lea     rbx,[r9*1+rdi]
+        mov     rcx,r9
+DB      102,72,15,126,207
+        neg     rax
+DB      102,72,15,126,206
+        sar     rcx,3+2
+        dec     r12
+        xor     r10,r10
+        mov     r13,QWORD[8+rbp]
+        mov     r14,QWORD[16+rbp]
+        mov     r15,QWORD[24+rbp]
+        jmp     NEAR $L$sqr4x_sub_entry
+
+ALIGN   16
+$L$sqr4x_sub:
+        mov     r12,QWORD[rbp]
+        mov     r13,QWORD[8+rbp]
+        mov     r14,QWORD[16+rbp]
+        mov     r15,QWORD[24+rbp]
+$L$sqr4x_sub_entry:
+        lea     rbp,[32+rbp]
+        not     r12
+        not     r13
+        not     r14
+        not     r15
+        and     r12,rax
+        and     r13,rax
+        and     r14,rax
+        and     r15,rax
+
+        neg     r10
+        adc     r12,QWORD[rbx]
+        adc     r13,QWORD[8+rbx]
+        adc     r14,QWORD[16+rbx]
+        adc     r15,QWORD[24+rbx]
+        mov     QWORD[rdi],r12
+        lea     rbx,[32+rbx]
+        mov     QWORD[8+rdi],r13
+        sbb     r10,r10
+        mov     QWORD[16+rdi],r14
+        mov     QWORD[24+rdi],r15
+        lea     rdi,[32+rdi]
+
+        inc     rcx
+        jnz     NEAR $L$sqr4x_sub
+
+        mov     r10,r9
+        neg     r9
+        DB      0F3h,0C3h               ;repret
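The `$L$sqr4x_sub` loop above performs the final reduction's conditional subtraction of the modulus without a data-dependent branch: `rax` holds an all-ones or all-zero mask, each modulus word is complemented (`not`) and masked (`and`), and an add-with-carry chain completes the two's-complement subtraction. A single-word C sketch of the idea — `cond_sub_word` is a made-up helper, and the real code runs this across a multi-word carry chain:

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free conditional subtract of one word: when need_sub is set,
 * mask is all-ones and the incoming carry is 1, so t + ~n + 1 == t - n;
 * when clear, mask and carry are both 0 and t passes through unchanged.
 * This mirrors the asm's neg/not/and/adc pattern. */
static uint64_t cond_sub_word(uint64_t t, uint64_t n, int need_sub)
{
    uint64_t mask = (uint64_t)0 - (uint64_t)(need_sub != 0); /* neg rax */
    uint64_t m = ~n & mask;                                  /* not; and */
    return t + m + (mask & 1);                               /* adc */
}
```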
+
+global  bn_from_montgomery
+
+ALIGN   32
+bn_from_montgomery:
+        test    DWORD[48+rsp],7
+        jz      NEAR bn_from_mont8x
+        xor     eax,eax
+        DB      0F3h,0C3h               ;repret
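`bn_from_montgomery` converts a value out of Montgomery form — it computes x * R^-1 mod n via one Montgomery reduction, dispatching to `bn_from_mont8x` only when the word count is a multiple of 8 (otherwise it returns 0 and the caller falls back). A toy single-word REDC with a 2^16 radix; the modulus 65521 and its precomputed `n0inv = -n^-1 mod 2^16` are illustrative values, not OpenSSL's interface:

```c
#include <assert.h>
#include <stdint.h>

#define R 65536u                   /* radix R = 2^16, one toy "word" */

/* Montgomery reduction: returns x * R^-1 mod n, for x < n*R.
 * m = x * (-n^-1) mod R makes x + m*n divisible by R, so the shift
 * by 16 is exact; one final conditional subtract brings t below n. */
static uint32_t redc(uint32_t x, uint32_t n, uint32_t n0inv)
{
    uint32_t m = (x * n0inv) & (R - 1);
    uint64_t t = ((uint64_t)x + (uint64_t)m * n) >> 16;
    return (uint32_t)(t >= n ? t - n : t);
}
```

For n = 65521, `n0inv` is 61167; e.g. 5 in Montgomery form is 5*R mod n = 75, and reducing 75 recovers 5.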
+
+
+
+ALIGN   32
+bn_from_mont8x:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_from_mont8x:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+DB      0x67
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+$L$from_prologue:
+
+        shl     r9d,3
+        lea     r10,[r9*2+r9]
+        neg     r9
+        mov     r8,QWORD[r8]
+
+
+
+
+
+
+
+
+        lea     r11,[((-320))+r9*2+rsp]
+        mov     rbp,rsp
+        sub     r11,rdi
+        and     r11,4095
+        cmp     r10,r11
+        jb      NEAR $L$from_sp_alt
+        sub     rbp,r11
+        lea     rbp,[((-320))+r9*2+rbp]
+        jmp     NEAR $L$from_sp_done
+
+ALIGN   32
+$L$from_sp_alt:
+        lea     r10,[((4096-320))+r9*2]
+        lea     rbp,[((-320))+r9*2+rbp]
+        sub     r11,r10
+        mov     r10,0
+        cmovc   r11,r10
+        sub     rbp,r11
+$L$from_sp_done:
+        and     rbp,-64
+        mov     r11,rsp
+        sub     r11,rbp
+        and     r11,-4096
+        lea     rsp,[rbp*1+r11]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$from_page_walk
+        jmp     NEAR $L$from_page_walk_done
+
+$L$from_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$from_page_walk
+$L$from_page_walk_done:
+
+        mov     r10,r9
+        neg     r9
+
+
+
+
+
+
+
+
+
+
+        mov     QWORD[32+rsp],r8
+        mov     QWORD[40+rsp],rax
+
+$L$from_body:
+        mov     r11,r9
+        lea     rax,[48+rsp]
+        pxor    xmm0,xmm0
+        jmp     NEAR $L$mul_by_1
+
+ALIGN   32
+$L$mul_by_1:
+        movdqu  xmm1,XMMWORD[rsi]
+        movdqu  xmm2,XMMWORD[16+rsi]
+        movdqu  xmm3,XMMWORD[32+rsi]
+        movdqa  XMMWORD[r9*1+rax],xmm0
+        movdqu  xmm4,XMMWORD[48+rsi]
+        movdqa  XMMWORD[16+r9*1+rax],xmm0
+DB      0x48,0x8d,0xb6,0x40,0x00,0x00,0x00
+        movdqa  XMMWORD[rax],xmm1
+        movdqa  XMMWORD[32+r9*1+rax],xmm0
+        movdqa  XMMWORD[16+rax],xmm2
+        movdqa  XMMWORD[48+r9*1+rax],xmm0
+        movdqa  XMMWORD[32+rax],xmm3
+        movdqa  XMMWORD[48+rax],xmm4
+        lea     rax,[64+rax]
+        sub     r11,64
+        jnz     NEAR $L$mul_by_1
+
+DB      102,72,15,110,207
+DB      102,72,15,110,209
+DB      0x67
+        mov     rbp,rcx
+DB      102,73,15,110,218
+        mov     r11d,DWORD[((OPENSSL_ia32cap_P+8))]
+        and     r11d,0x80108
+        cmp     r11d,0x80108
+        jne     NEAR $L$from_mont_nox
+
+        lea     rdi,[r9*1+rax]
+        call    __bn_sqrx8x_reduction
+        call    __bn_postx4x_internal
+
+        pxor    xmm0,xmm0
+        lea     rax,[48+rsp]
+        jmp     NEAR $L$from_mont_zero
+
+ALIGN   32
+$L$from_mont_nox:
+        call    __bn_sqr8x_reduction
+        call    __bn_post4x_internal
+
+        pxor    xmm0,xmm0
+        lea     rax,[48+rsp]
+        jmp     NEAR $L$from_mont_zero
+
+ALIGN   32
+$L$from_mont_zero:
+        mov     rsi,QWORD[40+rsp]
+
+        movdqa  XMMWORD[rax],xmm0
+        movdqa  XMMWORD[16+rax],xmm0
+        movdqa  XMMWORD[32+rax],xmm0
+        movdqa  XMMWORD[48+rax],xmm0
+        lea     rax,[64+rax]
+        sub     r9,32
+        jnz     NEAR $L$from_mont_zero
+
+        mov     rax,1
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$from_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_from_mont8x:
+
+ALIGN   32
+bn_mulx4x_mont_gather5:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_mulx4x_mont_gather5:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     rax,rsp
+
+$L$mulx4x_enter:
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+$L$mulx4x_prologue:
+
+        shl     r9d,3
+        lea     r10,[r9*2+r9]
+        neg     r9
+        mov     r8,QWORD[r8]
+
+
+
+
+
+
+
+
+
+
+        lea     r11,[((-320))+r9*2+rsp]
+        mov     rbp,rsp
+        sub     r11,rdi
+        and     r11,4095
+        cmp     r10,r11
+        jb      NEAR $L$mulx4xsp_alt
+        sub     rbp,r11
+        lea     rbp,[((-320))+r9*2+rbp]
+        jmp     NEAR $L$mulx4xsp_done
+
+$L$mulx4xsp_alt:
+        lea     r10,[((4096-320))+r9*2]
+        lea     rbp,[((-320))+r9*2+rbp]
+        sub     r11,r10
+        mov     r10,0
+        cmovc   r11,r10
+        sub     rbp,r11
+$L$mulx4xsp_done:
+        and     rbp,-64
+        mov     r11,rsp
+        sub     r11,rbp
+        and     r11,-4096
+        lea     rsp,[rbp*1+r11]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$mulx4x_page_walk
+        jmp     NEAR $L$mulx4x_page_walk_done
+
+$L$mulx4x_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$mulx4x_page_walk
+$L$mulx4x_page_walk_done:
+
+
+
+
+
+
+
+
+
+
+
+
+
+        mov     QWORD[32+rsp],r8
+        mov     QWORD[40+rsp],rax
+
+$L$mulx4x_body:
+        call    mulx4x_internal
+
+        mov     rsi,QWORD[40+rsp]
+
+        mov     rax,1
+
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$mulx4x_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_mulx4x_mont_gather5:
+
+
+ALIGN   32
+mulx4x_internal:
+        mov     QWORD[8+rsp],r9
+        mov     r10,r9
+        neg     r9
+        shl     r9,5
+        neg     r10
+        lea     r13,[128+r9*1+rdx]
+        shr     r9,5+5
+        movd    xmm5,DWORD[56+rax]
+        sub     r9,1
+        lea     rax,[$L$inc]
+        mov     QWORD[((16+8))+rsp],r13
+        mov     QWORD[((24+8))+rsp],r9
+        mov     QWORD[((56+8))+rsp],rdi
+        movdqa  xmm0,XMMWORD[rax]
+        movdqa  xmm1,XMMWORD[16+rax]
+        lea     r10,[((88-112))+r10*1+rsp]
+        lea     rdi,[128+rdx]
+
+        pshufd  xmm5,xmm5,0
+        movdqa  xmm4,xmm1
+DB      0x67
+        movdqa  xmm2,xmm1
+DB      0x67
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[112+r10],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[128+r10],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[144+r10],xmm2
+        movdqa  xmm2,xmm4
+
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[160+r10],xmm3
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[176+r10],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[192+r10],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[208+r10],xmm2
+        movdqa  xmm2,xmm4
+
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[224+r10],xmm3
+        movdqa  xmm3,xmm4
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[240+r10],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[256+r10],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[272+r10],xmm2
+        movdqa  xmm2,xmm4
+
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[288+r10],xmm3
+        movdqa  xmm3,xmm4
+DB      0x67
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[304+r10],xmm0
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[320+r10],xmm1
+
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[336+r10],xmm2
+
+        pand    xmm0,XMMWORD[64+rdi]
+        pand    xmm1,XMMWORD[80+rdi]
+        pand    xmm2,XMMWORD[96+rdi]
+        movdqa  XMMWORD[352+r10],xmm3
+        pand    xmm3,XMMWORD[112+rdi]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        movdqa  xmm4,XMMWORD[((-128))+rdi]
+        movdqa  xmm5,XMMWORD[((-112))+rdi]
+        movdqa  xmm2,XMMWORD[((-96))+rdi]
+        pand    xmm4,XMMWORD[112+r10]
+        movdqa  xmm3,XMMWORD[((-80))+rdi]
+        pand    xmm5,XMMWORD[128+r10]
+        por     xmm0,xmm4
+        pand    xmm2,XMMWORD[144+r10]
+        por     xmm1,xmm5
+        pand    xmm3,XMMWORD[160+r10]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        movdqa  xmm4,XMMWORD[((-64))+rdi]
+        movdqa  xmm5,XMMWORD[((-48))+rdi]
+        movdqa  xmm2,XMMWORD[((-32))+rdi]
+        pand    xmm4,XMMWORD[176+r10]
+        movdqa  xmm3,XMMWORD[((-16))+rdi]
+        pand    xmm5,XMMWORD[192+r10]
+        por     xmm0,xmm4
+        pand    xmm2,XMMWORD[208+r10]
+        por     xmm1,xmm5
+        pand    xmm3,XMMWORD[224+r10]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        movdqa  xmm4,XMMWORD[rdi]
+        movdqa  xmm5,XMMWORD[16+rdi]
+        movdqa  xmm2,XMMWORD[32+rdi]
+        pand    xmm4,XMMWORD[240+r10]
+        movdqa  xmm3,XMMWORD[48+rdi]
+        pand    xmm5,XMMWORD[256+r10]
+        por     xmm0,xmm4
+        pand    xmm2,XMMWORD[272+r10]
+        por     xmm1,xmm5
+        pand    xmm3,XMMWORD[288+r10]
+        por     xmm0,xmm2
+        por     xmm1,xmm3
+        pxor    xmm0,xmm1
+        pshufd  xmm1,xmm0,0x4e
+        por     xmm0,xmm1
+        lea     rdi,[256+rdi]
+DB      102,72,15,126,194
+        lea     rbx,[((64+32+8))+rsp]
+
+        mov     r9,rdx
+        mulx    rax,r8,QWORD[rsi]
+        mulx    r12,r11,QWORD[8+rsi]
+        add     r11,rax
+        mulx    r13,rax,QWORD[16+rsi]
+        adc     r12,rax
+        adc     r13,0
+        mulx    r14,rax,QWORD[24+rsi]
+
+        mov     r15,r8
+        imul    r8,QWORD[((32+8))+rsp]
+        xor     rbp,rbp
+        mov     rdx,r8
+
+        mov     QWORD[((8+8))+rsp],rdi
+
+        lea     rsi,[32+rsi]
+        adcx    r13,rax
+        adcx    r14,rbp
+
+        mulx    r10,rax,QWORD[rcx]
+        adcx    r15,rax
+        adox    r10,r11
+        mulx    r11,rax,QWORD[8+rcx]
+        adcx    r10,rax
+        adox    r11,r12
+        mulx    r12,rax,QWORD[16+rcx]
+        mov     rdi,QWORD[((24+8))+rsp]
+        mov     QWORD[((-32))+rbx],r10
+        adcx    r11,rax
+        adox    r12,r13
+        mulx    r15,rax,QWORD[24+rcx]
+        mov     rdx,r9
+        mov     QWORD[((-24))+rbx],r11
+        adcx    r12,rax
+        adox    r15,rbp
+        lea     rcx,[32+rcx]
+        mov     QWORD[((-16))+rbx],r12
+        jmp     NEAR $L$mulx4x_1st
+
+ALIGN   32
+$L$mulx4x_1st:
+        adcx    r15,rbp
+        mulx    rax,r10,QWORD[rsi]
+        adcx    r10,r14
+        mulx    r14,r11,QWORD[8+rsi]
+        adcx    r11,rax
+        mulx    rax,r12,QWORD[16+rsi]
+        adcx    r12,r14
+        mulx    r14,r13,QWORD[24+rsi]
+DB      0x67,0x67
+        mov     rdx,r8
+        adcx    r13,rax
+        adcx    r14,rbp
+        lea     rsi,[32+rsi]
+        lea     rbx,[32+rbx]
+
+        adox    r10,r15
+        mulx    r15,rax,QWORD[rcx]
+        adcx    r10,rax
+        adox    r11,r15
+        mulx    r15,rax,QWORD[8+rcx]
+        adcx    r11,rax
+        adox    r12,r15
+        mulx    r15,rax,QWORD[16+rcx]
+        mov     QWORD[((-40))+rbx],r10
+        adcx    r12,rax
+        mov     QWORD[((-32))+rbx],r11
+        adox    r13,r15
+        mulx    r15,rax,QWORD[24+rcx]
+        mov     rdx,r9
+        mov     QWORD[((-24))+rbx],r12
+        adcx    r13,rax
+        adox    r15,rbp
+        lea     rcx,[32+rcx]
+        mov     QWORD[((-16))+rbx],r13
+
+        dec     rdi
+        jnz     NEAR $L$mulx4x_1st
+
+        mov     rax,QWORD[8+rsp]
+        adc     r15,rbp
+        lea     rsi,[rax*1+rsi]
+        add     r14,r15
+        mov     rdi,QWORD[((8+8))+rsp]
+        adc     rbp,rbp
+        mov     QWORD[((-8))+rbx],r14
+        jmp     NEAR $L$mulx4x_outer
+
+ALIGN   32
+$L$mulx4x_outer:
+        lea     r10,[((16-256))+rbx]
+        pxor    xmm4,xmm4
+DB      0x67,0x67
+        pxor    xmm5,xmm5
+        movdqa  xmm0,XMMWORD[((-128))+rdi]
+        movdqa  xmm1,XMMWORD[((-112))+rdi]
+        movdqa  xmm2,XMMWORD[((-96))+rdi]
+        pand    xmm0,XMMWORD[256+r10]
+        movdqa  xmm3,XMMWORD[((-80))+rdi]
+        pand    xmm1,XMMWORD[272+r10]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[288+r10]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[304+r10]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[((-64))+rdi]
+        movdqa  xmm1,XMMWORD[((-48))+rdi]
+        movdqa  xmm2,XMMWORD[((-32))+rdi]
+        pand    xmm0,XMMWORD[320+r10]
+        movdqa  xmm3,XMMWORD[((-16))+rdi]
+        pand    xmm1,XMMWORD[336+r10]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[352+r10]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[368+r10]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[rdi]
+        movdqa  xmm1,XMMWORD[16+rdi]
+        movdqa  xmm2,XMMWORD[32+rdi]
+        pand    xmm0,XMMWORD[384+r10]
+        movdqa  xmm3,XMMWORD[48+rdi]
+        pand    xmm1,XMMWORD[400+r10]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[416+r10]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[432+r10]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[64+rdi]
+        movdqa  xmm1,XMMWORD[80+rdi]
+        movdqa  xmm2,XMMWORD[96+rdi]
+        pand    xmm0,XMMWORD[448+r10]
+        movdqa  xmm3,XMMWORD[112+rdi]
+        pand    xmm1,XMMWORD[464+r10]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[480+r10]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[496+r10]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        por     xmm4,xmm5
+        pshufd  xmm0,xmm4,0x4e
+        por     xmm0,xmm4
+        lea     rdi,[256+rdi]
+DB      102,72,15,126,194
+
+        mov     QWORD[rbx],rbp
+        lea     rbx,[32+rax*1+rbx]
+        mulx    r11,r8,QWORD[rsi]
+        xor     rbp,rbp
+        mov     r9,rdx
+        mulx    r12,r14,QWORD[8+rsi]
+        adox    r8,QWORD[((-32))+rbx]
+        adcx    r11,r14
+        mulx    r13,r15,QWORD[16+rsi]
+        adox    r11,QWORD[((-24))+rbx]
+        adcx    r12,r15
+        mulx    r14,rdx,QWORD[24+rsi]
+        adox    r12,QWORD[((-16))+rbx]
+        adcx    r13,rdx
+        lea     rcx,[rax*1+rcx]
+        lea     rsi,[32+rsi]
+        adox    r13,QWORD[((-8))+rbx]
+        adcx    r14,rbp
+        adox    r14,rbp
+
+        mov     r15,r8
+        imul    r8,QWORD[((32+8))+rsp]
+
+        mov     rdx,r8
+        xor     rbp,rbp
+        mov     QWORD[((8+8))+rsp],rdi
+
+        mulx    r10,rax,QWORD[rcx]
+        adcx    r15,rax
+        adox    r10,r11
+        mulx    r11,rax,QWORD[8+rcx]
+        adcx    r10,rax
+        adox    r11,r12
+        mulx    r12,rax,QWORD[16+rcx]
+        adcx    r11,rax
+        adox    r12,r13
+        mulx    r15,rax,QWORD[24+rcx]
+        mov     rdx,r9
+        mov     rdi,QWORD[((24+8))+rsp]
+        mov     QWORD[((-32))+rbx],r10
+        adcx    r12,rax
+        mov     QWORD[((-24))+rbx],r11
+        adox    r15,rbp
+        mov     QWORD[((-16))+rbx],r12
+        lea     rcx,[32+rcx]
+        jmp     NEAR $L$mulx4x_inner
+
+ALIGN   32
+$L$mulx4x_inner:
+        mulx    rax,r10,QWORD[rsi]
+        adcx    r15,rbp
+        adox    r10,r14
+        mulx    r14,r11,QWORD[8+rsi]
+        adcx    r10,QWORD[rbx]
+        adox    r11,rax
+        mulx    rax,r12,QWORD[16+rsi]
+        adcx    r11,QWORD[8+rbx]
+        adox    r12,r14
+        mulx    r14,r13,QWORD[24+rsi]
+        mov     rdx,r8
+        adcx    r12,QWORD[16+rbx]
+        adox    r13,rax
+        adcx    r13,QWORD[24+rbx]
+        adox    r14,rbp
+        lea     rsi,[32+rsi]
+        lea     rbx,[32+rbx]
+        adcx    r14,rbp
+
+        adox    r10,r15
+        mulx    r15,rax,QWORD[rcx]
+        adcx    r10,rax
+        adox    r11,r15
+        mulx    r15,rax,QWORD[8+rcx]
+        adcx    r11,rax
+        adox    r12,r15
+        mulx    r15,rax,QWORD[16+rcx]
+        mov     QWORD[((-40))+rbx],r10
+        adcx    r12,rax
+        adox    r13,r15
+        mov     QWORD[((-32))+rbx],r11
+        mulx    r15,rax,QWORD[24+rcx]
+        mov     rdx,r9
+        lea     rcx,[32+rcx]
+        mov     QWORD[((-24))+rbx],r12
+        adcx    r13,rax
+        adox    r15,rbp
+        mov     QWORD[((-16))+rbx],r13
+
+        dec     rdi
+        jnz     NEAR $L$mulx4x_inner
+
+        mov     rax,QWORD[((0+8))+rsp]
+        adc     r15,rbp
+        sub     rdi,QWORD[rbx]
+        mov     rdi,QWORD[((8+8))+rsp]
+        mov     r10,QWORD[((16+8))+rsp]
+        adc     r14,r15
+        lea     rsi,[rax*1+rsi]
+        adc     rbp,rbp
+        mov     QWORD[((-8))+rbx],r14
+
+        cmp     rdi,r10
+        jb      NEAR $L$mulx4x_outer
+
+        mov     r10,QWORD[((-8))+rcx]
+        mov     r8,rbp
+        mov     r12,QWORD[rax*1+rcx]
+        lea     rbp,[rax*1+rcx]
+        mov     rcx,rax
+        lea     rdi,[rax*1+rbx]
+        xor     eax,eax
+        xor     r15,r15
+        sub     r10,r14
+        adc     r15,r15
+        or      r8,r15
+        sar     rcx,3+2
+        sub     rax,r8
+        mov     rdx,QWORD[((56+8))+rsp]
+        dec     r12
+        mov     r13,QWORD[8+rbp]
+        xor     r8,r8
+        mov     r14,QWORD[16+rbp]
+        mov     r15,QWORD[24+rbp]
+        jmp     NEAR $L$sqrx4x_sub_entry
+
+
+ALIGN   32
+bn_powerx5:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_bn_powerx5:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        mov     rax,rsp
+
+$L$powerx5_enter:
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+$L$powerx5_prologue:
+
+        shl     r9d,3
+        lea     r10,[r9*2+r9]
+        neg     r9
+        mov     r8,QWORD[r8]
+
+
+
+
+
+
+
+
+        lea     r11,[((-320))+r9*2+rsp]
+        mov     rbp,rsp
+        sub     r11,rdi
+        and     r11,4095
+        cmp     r10,r11
+        jb      NEAR $L$pwrx_sp_alt
+        sub     rbp,r11
+        lea     rbp,[((-320))+r9*2+rbp]
+        jmp     NEAR $L$pwrx_sp_done
+
+ALIGN   32
+$L$pwrx_sp_alt:
+        lea     r10,[((4096-320))+r9*2]
+        lea     rbp,[((-320))+r9*2+rbp]
+        sub     r11,r10
+        mov     r10,0
+        cmovc   r11,r10
+        sub     rbp,r11
+$L$pwrx_sp_done:
+        and     rbp,-64
+        mov     r11,rsp
+        sub     r11,rbp
+        and     r11,-4096
+        lea     rsp,[rbp*1+r11]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$pwrx_page_walk
+        jmp     NEAR $L$pwrx_page_walk_done
+
+$L$pwrx_page_walk:
+        lea     rsp,[((-4096))+rsp]
+        mov     r10,QWORD[rsp]
+        cmp     rsp,rbp
+        ja      NEAR $L$pwrx_page_walk
+$L$pwrx_page_walk_done:
+
+        mov     r10,r9
+        neg     r9
+
+
+
+
+
+
+
+
+
+
+
+
+        pxor    xmm0,xmm0
+DB      102,72,15,110,207
+DB      102,72,15,110,209
+DB      102,73,15,110,218
+DB      102,72,15,110,226
+        mov     QWORD[32+rsp],r8
+        mov     QWORD[40+rsp],rax
+
+$L$powerx5_body:
+
+        call    __bn_sqrx8x_internal
+        call    __bn_postx4x_internal
+        call    __bn_sqrx8x_internal
+        call    __bn_postx4x_internal
+        call    __bn_sqrx8x_internal
+        call    __bn_postx4x_internal
+        call    __bn_sqrx8x_internal
+        call    __bn_postx4x_internal
+        call    __bn_sqrx8x_internal
+        call    __bn_postx4x_internal
+
+        mov     r9,r10
+        mov     rdi,rsi
+DB      102,72,15,126,209
+DB      102,72,15,126,226
+        mov     rax,QWORD[40+rsp]
+
+        call    mulx4x_internal
+
+        mov     rsi,QWORD[40+rsp]
+
+        mov     rax,1
+
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$powerx5_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_bn_powerx5:
+
+global  bn_sqrx8x_internal
+
+
+ALIGN   32
+bn_sqrx8x_internal:
+__bn_sqrx8x_internal:
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+        lea     rdi,[((48+8))+rsp]
+        lea     rbp,[r9*1+rsi]
+        mov     QWORD[((0+8))+rsp],r9
+        mov     QWORD[((8+8))+rsp],rbp
+        jmp     NEAR $L$sqr8x_zero_start
+
+ALIGN   32
+DB      0x66,0x66,0x66,0x2e,0x0f,0x1f,0x84,0x00,0x00,0x00,0x00,0x00
+$L$sqrx8x_zero:
+DB      0x3e
+        movdqa  XMMWORD[rdi],xmm0
+        movdqa  XMMWORD[16+rdi],xmm0
+        movdqa  XMMWORD[32+rdi],xmm0
+        movdqa  XMMWORD[48+rdi],xmm0
+$L$sqr8x_zero_start:
+        movdqa  XMMWORD[64+rdi],xmm0
+        movdqa  XMMWORD[80+rdi],xmm0
+        movdqa  XMMWORD[96+rdi],xmm0
+        movdqa  XMMWORD[112+rdi],xmm0
+        lea     rdi,[128+rdi]
+        sub     r9,64
+        jnz     NEAR $L$sqrx8x_zero
+
+        mov     rdx,QWORD[rsi]
+
+        xor     r10,r10
+        xor     r11,r11
+        xor     r12,r12
+        xor     r13,r13
+        xor     r14,r14
+        xor     r15,r15
+        lea     rdi,[((48+8))+rsp]
+        xor     rbp,rbp
+        jmp     NEAR $L$sqrx8x_outer_loop
+
+ALIGN   32
+$L$sqrx8x_outer_loop:
+        mulx    rax,r8,QWORD[8+rsi]
+        adcx    r8,r9
+        adox    r10,rax
+        mulx    rax,r9,QWORD[16+rsi]
+        adcx    r9,r10
+        adox    r11,rax
+DB      0xc4,0xe2,0xab,0xf6,0x86,0x18,0x00,0x00,0x00
+        adcx    r10,r11
+        adox    r12,rax
+DB      0xc4,0xe2,0xa3,0xf6,0x86,0x20,0x00,0x00,0x00
+        adcx    r11,r12
+        adox    r13,rax
+        mulx    rax,r12,QWORD[40+rsi]
+        adcx    r12,r13
+        adox    r14,rax
+        mulx    rax,r13,QWORD[48+rsi]
+        adcx    r13,r14
+        adox    rax,r15
+        mulx    r15,r14,QWORD[56+rsi]
+        mov     rdx,QWORD[8+rsi]
+        adcx    r14,rax
+        adox    r15,rbp
+        adc     r15,QWORD[64+rdi]
+        mov     QWORD[8+rdi],r8
+        mov     QWORD[16+rdi],r9
+        sbb     rcx,rcx
+        xor     rbp,rbp
+
+
+        mulx    rbx,r8,QWORD[16+rsi]
+        mulx    rax,r9,QWORD[24+rsi]
+        adcx    r8,r10
+        adox    r9,rbx
+        mulx    rbx,r10,QWORD[32+rsi]
+        adcx    r9,r11
+        adox    r10,rax
+DB      0xc4,0xe2,0xa3,0xf6,0x86,0x28,0x00,0x00,0x00
+        adcx    r10,r12
+        adox    r11,rbx
+DB      0xc4,0xe2,0x9b,0xf6,0x9e,0x30,0x00,0x00,0x00
+        adcx    r11,r13
+        adox    r12,r14
+DB      0xc4,0x62,0x93,0xf6,0xb6,0x38,0x00,0x00,0x00
+        mov     rdx,QWORD[16+rsi]
+        adcx    r12,rax
+        adox    r13,rbx
+        adcx    r13,r15
+        adox    r14,rbp
+        adcx    r14,rbp
+
+        mov     QWORD[24+rdi],r8
+        mov     QWORD[32+rdi],r9
+
+        mulx    rbx,r8,QWORD[24+rsi]
+        mulx    rax,r9,QWORD[32+rsi]
+        adcx    r8,r10
+        adox    r9,rbx
+        mulx    rbx,r10,QWORD[40+rsi]
+        adcx    r9,r11
+        adox    r10,rax
+DB      0xc4,0xe2,0xa3,0xf6,0x86,0x30,0x00,0x00,0x00
+        adcx    r10,r12
+        adox    r11,r13
+DB      0xc4,0x62,0x9b,0xf6,0xae,0x38,0x00,0x00,0x00
+DB      0x3e
+        mov     rdx,QWORD[24+rsi]
+        adcx    r11,rbx
+        adox    r12,rax
+        adcx    r12,r14
+        mov     QWORD[40+rdi],r8
+        mov     QWORD[48+rdi],r9
+        mulx    rax,r8,QWORD[32+rsi]
+        adox    r13,rbp
+        adcx    r13,rbp
+
+        mulx    rbx,r9,QWORD[40+rsi]
+        adcx    r8,r10
+        adox    r9,rax
+        mulx    rax,r10,QWORD[48+rsi]
+        adcx    r9,r11
+        adox    r10,r12
+        mulx    r12,r11,QWORD[56+rsi]
+        mov     rdx,QWORD[32+rsi]
+        mov     r14,QWORD[40+rsi]
+        adcx    r10,rbx
+        adox    r11,rax
+        mov     r15,QWORD[48+rsi]
+        adcx    r11,r13
+        adox    r12,rbp
+        adcx    r12,rbp
+
+        mov     QWORD[56+rdi],r8
+        mov     QWORD[64+rdi],r9
+
+        mulx    rax,r9,r14
+        mov     r8,QWORD[56+rsi]
+        adcx    r9,r10
+        mulx    rbx,r10,r15
+        adox    r10,rax
+        adcx    r10,r11
+        mulx    rax,r11,r8
+        mov     rdx,r14
+        adox    r11,rbx
+        adcx    r11,r12
+
+        adcx    rax,rbp
+
+        mulx    rbx,r14,r15
+        mulx    r13,r12,r8
+        mov     rdx,r15
+        lea     rsi,[64+rsi]
+        adcx    r11,r14
+        adox    r12,rbx
+        adcx    r12,rax
+        adox    r13,rbp
+
+DB      0x67,0x67
+        mulx    r14,r8,r8
+        adcx    r13,r8
+        adcx    r14,rbp
+
+        cmp     rsi,QWORD[((8+8))+rsp]
+        je      NEAR $L$sqrx8x_outer_break
+
+        neg     rcx
+        mov     rcx,-8
+        mov     r15,rbp
+        mov     r8,QWORD[64+rdi]
+        adcx    r9,QWORD[72+rdi]
+        adcx    r10,QWORD[80+rdi]
+        adcx    r11,QWORD[88+rdi]
+        adc     r12,QWORD[96+rdi]
+        adc     r13,QWORD[104+rdi]
+        adc     r14,QWORD[112+rdi]
+        adc     r15,QWORD[120+rdi]
+        lea     rbp,[rsi]
+        lea     rdi,[128+rdi]
+        sbb     rax,rax
+
+        mov     rdx,QWORD[((-64))+rsi]
+        mov     QWORD[((16+8))+rsp],rax
+        mov     QWORD[((24+8))+rsp],rdi
+
+
+        xor     eax,eax
+        jmp     NEAR $L$sqrx8x_loop
+
+ALIGN   32
+$L$sqrx8x_loop:
+        mov     rbx,r8
+        mulx    r8,rax,QWORD[rbp]
+        adcx    rbx,rax
+        adox    r8,r9
+
+        mulx    r9,rax,QWORD[8+rbp]
+        adcx    r8,rax
+        adox    r9,r10
+
+        mulx    r10,rax,QWORD[16+rbp]
+        adcx    r9,rax
+        adox    r10,r11
+
+        mulx    r11,rax,QWORD[24+rbp]
+        adcx    r10,rax
+        adox    r11,r12
+
+DB      0xc4,0x62,0xfb,0xf6,0xa5,0x20,0x00,0x00,0x00
+        adcx    r11,rax
+        adox    r12,r13
+
+        mulx    r13,rax,QWORD[40+rbp]
+        adcx    r12,rax
+        adox    r13,r14
+
+        mulx    r14,rax,QWORD[48+rbp]
+        mov     QWORD[rcx*8+rdi],rbx
+        mov     ebx,0
+        adcx    r13,rax
+        adox    r14,r15
+
+DB      0xc4,0x62,0xfb,0xf6,0xbd,0x38,0x00,0x00,0x00
+        mov     rdx,QWORD[8+rcx*8+rsi]
+        adcx    r14,rax
+        adox    r15,rbx
+        adcx    r15,rbx
+
+DB      0x67
+        inc     rcx
+        jnz     NEAR $L$sqrx8x_loop
+
+        lea     rbp,[64+rbp]
+        mov     rcx,-8
+        cmp     rbp,QWORD[((8+8))+rsp]
+        je      NEAR $L$sqrx8x_break
+
+        sub     rbx,QWORD[((16+8))+rsp]
+DB      0x66
+        mov     rdx,QWORD[((-64))+rsi]
+        adcx    r8,QWORD[rdi]
+        adcx    r9,QWORD[8+rdi]
+        adc     r10,QWORD[16+rdi]
+        adc     r11,QWORD[24+rdi]
+        adc     r12,QWORD[32+rdi]
+        adc     r13,QWORD[40+rdi]
+        adc     r14,QWORD[48+rdi]
+        adc     r15,QWORD[56+rdi]
+        lea     rdi,[64+rdi]
+DB      0x67
+        sbb     rax,rax
+        xor     ebx,ebx
+        mov     QWORD[((16+8))+rsp],rax
+        jmp     NEAR $L$sqrx8x_loop
+
+ALIGN   32
+$L$sqrx8x_break:
+        xor     rbp,rbp
+        sub     rbx,QWORD[((16+8))+rsp]
+        adcx    r8,rbp
+        mov     rcx,QWORD[((24+8))+rsp]
+        adcx    r9,rbp
+        mov     rdx,QWORD[rsi]
+        adc     r10,0
+        mov     QWORD[rdi],r8
+        adc     r11,0
+        adc     r12,0
+        adc     r13,0
+        adc     r14,0
+        adc     r15,0
+        cmp     rdi,rcx
+        je      NEAR $L$sqrx8x_outer_loop
+
+        mov     QWORD[8+rdi],r9
+        mov     r9,QWORD[8+rcx]
+        mov     QWORD[16+rdi],r10
+        mov     r10,QWORD[16+rcx]
+        mov     QWORD[24+rdi],r11
+        mov     r11,QWORD[24+rcx]
+        mov     QWORD[32+rdi],r12
+        mov     r12,QWORD[32+rcx]
+        mov     QWORD[40+rdi],r13
+        mov     r13,QWORD[40+rcx]
+        mov     QWORD[48+rdi],r14
+        mov     r14,QWORD[48+rcx]
+        mov     QWORD[56+rdi],r15
+        mov     r15,QWORD[56+rcx]
+        mov     rdi,rcx
+        jmp     NEAR $L$sqrx8x_outer_loop
+
+ALIGN   32
+$L$sqrx8x_outer_break:
+        mov     QWORD[72+rdi],r9
+DB      102,72,15,126,217
+        mov     QWORD[80+rdi],r10
+        mov     QWORD[88+rdi],r11
+        mov     QWORD[96+rdi],r12
+        mov     QWORD[104+rdi],r13
+        mov     QWORD[112+rdi],r14
+        lea     rdi,[((48+8))+rsp]
+        mov     rdx,QWORD[rcx*1+rsi]
+
+        mov     r11,QWORD[8+rdi]
+        xor     r10,r10
+        mov     r9,QWORD[((0+8))+rsp]
+        adox    r11,r11
+        mov     r12,QWORD[16+rdi]
+        mov     r13,QWORD[24+rdi]
+
+
+ALIGN   32
+$L$sqrx4x_shift_n_add:
+        mulx    rbx,rax,rdx
+        adox    r12,r12
+        adcx    rax,r10
+DB      0x48,0x8b,0x94,0x0e,0x08,0x00,0x00,0x00
+DB      0x4c,0x8b,0x97,0x20,0x00,0x00,0x00
+        adox    r13,r13
+        adcx    rbx,r11
+        mov     r11,QWORD[40+rdi]
+        mov     QWORD[rdi],rax
+        mov     QWORD[8+rdi],rbx
+
+        mulx    rbx,rax,rdx
+        adox    r10,r10
+        adcx    rax,r12
+        mov     rdx,QWORD[16+rcx*1+rsi]
+        mov     r12,QWORD[48+rdi]
+        adox    r11,r11
+        adcx    rbx,r13
+        mov     r13,QWORD[56+rdi]
+        mov     QWORD[16+rdi],rax
+        mov     QWORD[24+rdi],rbx
+
+        mulx    rbx,rax,rdx
+        adox    r12,r12
+        adcx    rax,r10
+        mov     rdx,QWORD[24+rcx*1+rsi]
+        lea     rcx,[32+rcx]
+        mov     r10,QWORD[64+rdi]
+        adox    r13,r13
+        adcx    rbx,r11
+        mov     r11,QWORD[72+rdi]
+        mov     QWORD[32+rdi],rax
+        mov     QWORD[40+rdi],rbx
+
+        mulx    rbx,rax,rdx
+        adox    r10,r10
+        adcx    rax,r12
+        jrcxz   $L$sqrx4x_shift_n_add_break
+DB      0x48,0x8b,0x94,0x0e,0x00,0x00,0x00,0x00
+        adox    r11,r11
+        adcx    rbx,r13
+        mov     r12,QWORD[80+rdi]
+        mov     r13,QWORD[88+rdi]
+        mov     QWORD[48+rdi],rax
+        mov     QWORD[56+rdi],rbx
+        lea     rdi,[64+rdi]
+        nop
+        jmp     NEAR $L$sqrx4x_shift_n_add
+
+ALIGN   32
+$L$sqrx4x_shift_n_add_break:
+        adcx    rbx,r13
+        mov     QWORD[48+rdi],rax
+        mov     QWORD[56+rdi],rbx
+        lea     rdi,[64+rdi]
+DB      102,72,15,126,213
+__bn_sqrx8x_reduction:
+        xor     eax,eax
+        mov     rbx,QWORD[((32+8))+rsp]
+        mov     rdx,QWORD[((48+8))+rsp]
+        lea     rcx,[((-64))+r9*1+rbp]
+
+        mov     QWORD[((0+8))+rsp],rcx
+        mov     QWORD[((8+8))+rsp],rdi
+
+        lea     rdi,[((48+8))+rsp]
+        jmp     NEAR $L$sqrx8x_reduction_loop
+
+ALIGN   32
+$L$sqrx8x_reduction_loop:
+        mov     r9,QWORD[8+rdi]
+        mov     r10,QWORD[16+rdi]
+        mov     r11,QWORD[24+rdi]
+        mov     r12,QWORD[32+rdi]
+        mov     r8,rdx
+        imul    rdx,rbx
+        mov     r13,QWORD[40+rdi]
+        mov     r14,QWORD[48+rdi]
+        mov     r15,QWORD[56+rdi]
+        mov     QWORD[((24+8))+rsp],rax
+
+        lea     rdi,[64+rdi]
+        xor     rsi,rsi
+        mov     rcx,-8
+        jmp     NEAR $L$sqrx8x_reduce
+
+ALIGN   32
+$L$sqrx8x_reduce:
+        mov     rbx,r8
+        mulx    r8,rax,QWORD[rbp]
+        adcx    rax,rbx
+        adox    r8,r9
+
+        mulx    r9,rbx,QWORD[8+rbp]
+        adcx    r8,rbx
+        adox    r9,r10
+
+        mulx    r10,rbx,QWORD[16+rbp]
+        adcx    r9,rbx
+        adox    r10,r11
+
+        mulx    r11,rbx,QWORD[24+rbp]
+        adcx    r10,rbx
+        adox    r11,r12
+
+DB      0xc4,0x62,0xe3,0xf6,0xa5,0x20,0x00,0x00,0x00
+        mov     rax,rdx
+        mov     rdx,r8
+        adcx    r11,rbx
+        adox    r12,r13
+
+        mulx    rdx,rbx,QWORD[((32+8))+rsp]
+        mov     rdx,rax
+        mov     QWORD[((64+48+8))+rcx*8+rsp],rax
+
+        mulx    r13,rax,QWORD[40+rbp]
+        adcx    r12,rax
+        adox    r13,r14
+
+        mulx    r14,rax,QWORD[48+rbp]
+        adcx    r13,rax
+        adox    r14,r15
+
+        mulx    r15,rax,QWORD[56+rbp]
+        mov     rdx,rbx
+        adcx    r14,rax
+        adox    r15,rsi
+        adcx    r15,rsi
+
+DB      0x67,0x67,0x67
+        inc     rcx
+        jnz     NEAR $L$sqrx8x_reduce
+
+        mov     rax,rsi
+        cmp     rbp,QWORD[((0+8))+rsp]
+        jae     NEAR $L$sqrx8x_no_tail
+
+        mov     rdx,QWORD[((48+8))+rsp]
+        add     r8,QWORD[rdi]
+        lea     rbp,[64+rbp]
+        mov     rcx,-8
+        adcx    r9,QWORD[8+rdi]
+        adcx    r10,QWORD[16+rdi]
+        adc     r11,QWORD[24+rdi]
+        adc     r12,QWORD[32+rdi]
+        adc     r13,QWORD[40+rdi]
+        adc     r14,QWORD[48+rdi]
+        adc     r15,QWORD[56+rdi]
+        lea     rdi,[64+rdi]
+        sbb     rax,rax
+
+        xor     rsi,rsi
+        mov     QWORD[((16+8))+rsp],rax
+        jmp     NEAR $L$sqrx8x_tail
+
+ALIGN   32
+$L$sqrx8x_tail:
+        mov     rbx,r8
+        mulx    r8,rax,QWORD[rbp]
+        adcx    rbx,rax
+        adox    r8,r9
+
+        mulx    r9,rax,QWORD[8+rbp]
+        adcx    r8,rax
+        adox    r9,r10
+
+        mulx    r10,rax,QWORD[16+rbp]
+        adcx    r9,rax
+        adox    r10,r11
+
+        mulx    r11,rax,QWORD[24+rbp]
+        adcx    r10,rax
+        adox    r11,r12
+
+DB      0xc4,0x62,0xfb,0xf6,0xa5,0x20,0x00,0x00,0x00
+        adcx    r11,rax
+        adox    r12,r13
+
+        mulx    r13,rax,QWORD[40+rbp]
+        adcx    r12,rax
+        adox    r13,r14
+
+        mulx    r14,rax,QWORD[48+rbp]
+        adcx    r13,rax
+        adox    r14,r15
+
+        mulx    r15,rax,QWORD[56+rbp]
+        mov     rdx,QWORD[((72+48+8))+rcx*8+rsp]
+        adcx    r14,rax
+        adox    r15,rsi
+        mov     QWORD[rcx*8+rdi],rbx
+        mov     rbx,r8
+        adcx    r15,rsi
+
+        inc     rcx
+        jnz     NEAR $L$sqrx8x_tail
+
+        cmp     rbp,QWORD[((0+8))+rsp]
+        jae     NEAR $L$sqrx8x_tail_done
+
+        sub     rsi,QWORD[((16+8))+rsp]
+        mov     rdx,QWORD[((48+8))+rsp]
+        lea     rbp,[64+rbp]
+        adc     r8,QWORD[rdi]
+        adc     r9,QWORD[8+rdi]
+        adc     r10,QWORD[16+rdi]
+        adc     r11,QWORD[24+rdi]
+        adc     r12,QWORD[32+rdi]
+        adc     r13,QWORD[40+rdi]
+        adc     r14,QWORD[48+rdi]
+        adc     r15,QWORD[56+rdi]
+        lea     rdi,[64+rdi]
+        sbb     rax,rax
+        sub     rcx,8
+
+        xor     rsi,rsi
+        mov     QWORD[((16+8))+rsp],rax
+        jmp     NEAR $L$sqrx8x_tail
+
+ALIGN   32
+$L$sqrx8x_tail_done:
+        xor     rax,rax
+        add     r8,QWORD[((24+8))+rsp]
+        adc     r9,0
+        adc     r10,0
+        adc     r11,0
+        adc     r12,0
+        adc     r13,0
+        adc     r14,0
+        adc     r15,0
+        adc     rax,0
+
+        sub     rsi,QWORD[((16+8))+rsp]
+$L$sqrx8x_no_tail:
+        adc     r8,QWORD[rdi]
+DB      102,72,15,126,217
+        adc     r9,QWORD[8+rdi]
+        mov     rsi,QWORD[56+rbp]
+DB      102,72,15,126,213
+        adc     r10,QWORD[16+rdi]
+        adc     r11,QWORD[24+rdi]
+        adc     r12,QWORD[32+rdi]
+        adc     r13,QWORD[40+rdi]
+        adc     r14,QWORD[48+rdi]
+        adc     r15,QWORD[56+rdi]
+        adc     rax,0
+
+        mov     rbx,QWORD[((32+8))+rsp]
+        mov     rdx,QWORD[64+rcx*1+rdi]
+
+        mov     QWORD[rdi],r8
+        lea     r8,[64+rdi]
+        mov     QWORD[8+rdi],r9
+        mov     QWORD[16+rdi],r10
+        mov     QWORD[24+rdi],r11
+        mov     QWORD[32+rdi],r12
+        mov     QWORD[40+rdi],r13
+        mov     QWORD[48+rdi],r14
+        mov     QWORD[56+rdi],r15
+
+        lea     rdi,[64+rcx*1+rdi]
+        cmp     r8,QWORD[((8+8))+rsp]
+        jb      NEAR $L$sqrx8x_reduction_loop
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   32
+__bn_postx4x_internal:
+        mov     r12,QWORD[rbp]
+        mov     r10,rcx
+        mov     r9,rcx
+        neg     rax
+        sar     rcx,3+2
+
+DB      102,72,15,126,202
+DB      102,72,15,126,206
+        dec     r12
+        mov     r13,QWORD[8+rbp]
+        xor     r8,r8
+        mov     r14,QWORD[16+rbp]
+        mov     r15,QWORD[24+rbp]
+        jmp     NEAR $L$sqrx4x_sub_entry
+
+ALIGN   16
+$L$sqrx4x_sub:
+        mov     r12,QWORD[rbp]
+        mov     r13,QWORD[8+rbp]
+        mov     r14,QWORD[16+rbp]
+        mov     r15,QWORD[24+rbp]
+$L$sqrx4x_sub_entry:
+        andn    r12,r12,rax
+        lea     rbp,[32+rbp]
+        andn    r13,r13,rax
+        andn    r14,r14,rax
+        andn    r15,r15,rax
+
+        neg     r8
+        adc     r12,QWORD[rdi]
+        adc     r13,QWORD[8+rdi]
+        adc     r14,QWORD[16+rdi]
+        adc     r15,QWORD[24+rdi]
+        mov     QWORD[rdx],r12
+        lea     rdi,[32+rdi]
+        mov     QWORD[8+rdx],r13
+        sbb     r8,r8
+        mov     QWORD[16+rdx],r14
+        mov     QWORD[24+rdx],r15
+        lea     rdx,[32+rdx]
+
+        inc     rcx
+        jnz     NEAR $L$sqrx4x_sub
+
+        neg     r9
+
+        DB      0F3h,0C3h               ;repret
+
+global  bn_get_bits5
+
+ALIGN   16
+bn_get_bits5:
+        lea     r10,[rcx]
+        lea     r11,[1+rcx]
+        mov     ecx,edx
+        shr     edx,4
+        and     ecx,15
+        lea     eax,[((-8))+rcx]
+        cmp     ecx,11
+        cmova   r10,r11
+        cmova   ecx,eax
+        movzx   eax,WORD[rdx*2+r10]
+        shr     eax,cl
+        and     eax,31
+        DB      0F3h,0C3h               ;repret
+
+
+global  bn_scatter5
+
+ALIGN   16
+bn_scatter5:
+        cmp     edx,0
+        jz      NEAR $L$scatter_epilogue
+        lea     r8,[r9*8+r8]
+$L$scatter:
+        mov     rax,QWORD[rcx]
+        lea     rcx,[8+rcx]
+        mov     QWORD[r8],rax
+        lea     r8,[256+r8]
+        sub     edx,1
+        jnz     NEAR $L$scatter
+$L$scatter_epilogue:
+        DB      0F3h,0C3h               ;repret
+
+
+global  bn_gather5
+
+ALIGN   32
+bn_gather5:
+$L$SEH_begin_bn_gather5:
+
+DB      0x4c,0x8d,0x14,0x24
+DB      0x48,0x81,0xec,0x08,0x01,0x00,0x00
+        lea     rax,[$L$inc]
+        and     rsp,-16
+
+        movd    xmm5,r9d
+        movdqa  xmm0,XMMWORD[rax]
+        movdqa  xmm1,XMMWORD[16+rax]
+        lea     r11,[128+r8]
+        lea     rax,[128+rsp]
+
+        pshufd  xmm5,xmm5,0
+        movdqa  xmm4,xmm1
+        movdqa  xmm2,xmm1
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  xmm3,xmm4
+
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[(-128)+rax],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[(-112)+rax],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[(-96)+rax],xmm2
+        movdqa  xmm2,xmm4
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[(-80)+rax],xmm3
+        movdqa  xmm3,xmm4
+
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[(-64)+rax],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[(-48)+rax],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[(-32)+rax],xmm2
+        movdqa  xmm2,xmm4
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[(-16)+rax],xmm3
+        movdqa  xmm3,xmm4
+
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[rax],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[16+rax],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[32+rax],xmm2
+        movdqa  xmm2,xmm4
+        paddd   xmm1,xmm0
+        pcmpeqd xmm0,xmm5
+        movdqa  XMMWORD[48+rax],xmm3
+        movdqa  xmm3,xmm4
+
+        paddd   xmm2,xmm1
+        pcmpeqd xmm1,xmm5
+        movdqa  XMMWORD[64+rax],xmm0
+        movdqa  xmm0,xmm4
+
+        paddd   xmm3,xmm2
+        pcmpeqd xmm2,xmm5
+        movdqa  XMMWORD[80+rax],xmm1
+        movdqa  xmm1,xmm4
+
+        paddd   xmm0,xmm3
+        pcmpeqd xmm3,xmm5
+        movdqa  XMMWORD[96+rax],xmm2
+        movdqa  xmm2,xmm4
+        movdqa  XMMWORD[112+rax],xmm3
+        jmp     NEAR $L$gather
+
+ALIGN   32
+$L$gather:
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        movdqa  xmm0,XMMWORD[((-128))+r11]
+        movdqa  xmm1,XMMWORD[((-112))+r11]
+        movdqa  xmm2,XMMWORD[((-96))+r11]
+        pand    xmm0,XMMWORD[((-128))+rax]
+        movdqa  xmm3,XMMWORD[((-80))+r11]
+        pand    xmm1,XMMWORD[((-112))+rax]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[((-96))+rax]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[((-80))+rax]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[((-64))+r11]
+        movdqa  xmm1,XMMWORD[((-48))+r11]
+        movdqa  xmm2,XMMWORD[((-32))+r11]
+        pand    xmm0,XMMWORD[((-64))+rax]
+        movdqa  xmm3,XMMWORD[((-16))+r11]
+        pand    xmm1,XMMWORD[((-48))+rax]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[((-32))+rax]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[((-16))+rax]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[r11]
+        movdqa  xmm1,XMMWORD[16+r11]
+        movdqa  xmm2,XMMWORD[32+r11]
+        pand    xmm0,XMMWORD[rax]
+        movdqa  xmm3,XMMWORD[48+r11]
+        pand    xmm1,XMMWORD[16+rax]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[32+rax]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[48+rax]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        movdqa  xmm0,XMMWORD[64+r11]
+        movdqa  xmm1,XMMWORD[80+r11]
+        movdqa  xmm2,XMMWORD[96+r11]
+        pand    xmm0,XMMWORD[64+rax]
+        movdqa  xmm3,XMMWORD[112+r11]
+        pand    xmm1,XMMWORD[80+rax]
+        por     xmm4,xmm0
+        pand    xmm2,XMMWORD[96+rax]
+        por     xmm5,xmm1
+        pand    xmm3,XMMWORD[112+rax]
+        por     xmm4,xmm2
+        por     xmm5,xmm3
+        por     xmm4,xmm5
+        lea     r11,[256+r11]
+        pshufd  xmm0,xmm4,0x4e
+        por     xmm0,xmm4
+        movq    QWORD[rcx],xmm0
+        lea     rcx,[8+rcx]
+        sub     edx,1
+        jnz     NEAR $L$gather
+
+        lea     rsp,[r10]
+        DB      0F3h,0C3h               ;repret
+$L$SEH_end_bn_gather5:
+
+ALIGN   64
+$L$inc:
+        DD      0,0,1,1
+        DD      2,2,2,2
+DB      77,111,110,116,103,111,109,101,114,121,32,77,117,108,116,105
+DB      112,108,105,99,97,116,105,111,110,32,119,105,116,104,32,115
+DB      99,97,116,116,101,114,47,103,97,116,104,101,114,32,102,111
+DB      114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79
+DB      71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111
+DB      112,101,110,115,115,108,46,111,114,103,62,0
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+mul_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_pop_regs
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[8+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        lea     r10,[$L$mul_epilogue]
+        cmp     rbx,r10
+        ja      NEAR $L$body_40
+
+        mov     r10,QWORD[192+r8]
+        mov     rax,QWORD[8+r10*8+rax]
+
+        jmp     NEAR $L$common_pop_regs
+
+$L$body_40:
+        mov     rax,QWORD[40+rax]
+$L$common_pop_regs:
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+$L$common_seh_tail:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_bn_mul_mont_gather5 wrt ..imagebase
+        DD      $L$SEH_end_bn_mul_mont_gather5 wrt ..imagebase
+        DD      $L$SEH_info_bn_mul_mont_gather5 wrt ..imagebase
+
+        DD      $L$SEH_begin_bn_mul4x_mont_gather5 wrt ..imagebase
+        DD      $L$SEH_end_bn_mul4x_mont_gather5 wrt ..imagebase
+        DD      $L$SEH_info_bn_mul4x_mont_gather5 wrt ..imagebase
+
+        DD      $L$SEH_begin_bn_power5 wrt ..imagebase
+        DD      $L$SEH_end_bn_power5 wrt ..imagebase
+        DD      $L$SEH_info_bn_power5 wrt ..imagebase
+
+        DD      $L$SEH_begin_bn_from_mont8x wrt ..imagebase
+        DD      $L$SEH_end_bn_from_mont8x wrt ..imagebase
+        DD      $L$SEH_info_bn_from_mont8x wrt ..imagebase
+        DD      $L$SEH_begin_bn_mulx4x_mont_gather5 wrt ..imagebase
+        DD      $L$SEH_end_bn_mulx4x_mont_gather5 wrt ..imagebase
+        DD      $L$SEH_info_bn_mulx4x_mont_gather5 wrt ..imagebase
+
+        DD      $L$SEH_begin_bn_powerx5 wrt ..imagebase
+        DD      $L$SEH_end_bn_powerx5 wrt ..imagebase
+        DD      $L$SEH_info_bn_powerx5 wrt ..imagebase
+        DD      $L$SEH_begin_bn_gather5 wrt ..imagebase
+        DD      $L$SEH_end_bn_gather5 wrt ..imagebase
+        DD      $L$SEH_info_bn_gather5 wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_bn_mul_mont_gather5:
+DB      9,0,0,0
+        DD      mul_handler wrt ..imagebase
+        DD      $L$mul_body wrt ..imagebase,$L$mul_body wrt ..imagebase,$L$mul_epilogue wrt ..imagebase
+ALIGN   8
+$L$SEH_info_bn_mul4x_mont_gather5:
+DB      9,0,0,0
+        DD      mul_handler wrt ..imagebase
+        DD      $L$mul4x_prologue wrt ..imagebase,$L$mul4x_body wrt ..imagebase,$L$mul4x_epilogue wrt ..imagebase
+ALIGN   8
+$L$SEH_info_bn_power5:
+DB      9,0,0,0
+        DD      mul_handler wrt ..imagebase
+        DD      $L$power5_prologue wrt ..imagebase,$L$power5_body wrt ..imagebase,$L$power5_epilogue wrt ..imagebase
+ALIGN   8
+$L$SEH_info_bn_from_mont8x:
+DB      9,0,0,0
+        DD      mul_handler wrt ..imagebase
+        DD      $L$from_prologue wrt ..imagebase,$L$from_body wrt ..imagebase,$L$from_epilogue wrt ..imagebase
+ALIGN   8
+$L$SEH_info_bn_mulx4x_mont_gather5:
+DB      9,0,0,0
+        DD      mul_handler wrt ..imagebase
+        DD      $L$mulx4x_prologue wrt ..imagebase,$L$mulx4x_body wrt ..imagebase,$L$mulx4x_epilogue wrt ..imagebase
+ALIGN   8
+$L$SEH_info_bn_powerx5:
+DB      9,0,0,0
+        DD      mul_handler wrt ..imagebase
+        DD      $L$powerx5_prologue wrt ..imagebase,$L$powerx5_body wrt ..imagebase,$L$powerx5_epilogue wrt ..imagebase
+ALIGN   8
+$L$SEH_info_bn_gather5:
+DB      0x01,0x0b,0x03,0x0a
+DB      0x0b,0x01,0x21,0x00
+DB      0x04,0xa3,0x00,0x00
+ALIGN   8
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm
new file mode 100644
index 0000000000..ff688eeb06
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm
@@ -0,0 +1,794 @@
+; Author: Marc Bevand <bevand_m (at) epita.fr>
+; Copyright 2005-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+ALIGN   16
+
+global  md5_block_asm_data_order
+
+md5_block_asm_data_order:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_md5_block_asm_data_order:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        push    rbp
+
+        push    rbx
+
+        push    r12
+
+        push    r14
+
+        push    r15
+
+$L$prologue:
+
+
+
+
+        mov     rbp,rdi
+        shl     rdx,6
+        lea     rdi,[rdx*1+rsi]
+        mov     eax,DWORD[rbp]
+        mov     ebx,DWORD[4+rbp]
+        mov     ecx,DWORD[8+rbp]
+        mov     edx,DWORD[12+rbp]
+
+
+
+
+
+
+
+        cmp     rsi,rdi
+        je      NEAR $L$end
+
+
+$L$loop:
+        mov     r8d,eax
+        mov     r9d,ebx
+        mov     r14d,ecx
+        mov     r15d,edx
+        mov     r10d,DWORD[rsi]
+        mov     r11d,edx
+        xor     r11d,ecx
+        lea     eax,[((-680876936))+r10*1+rax]
+        and     r11d,ebx
+        mov     r10d,DWORD[4+rsi]
+        xor     r11d,edx
+        add     eax,r11d
+        rol     eax,7
+        mov     r11d,ecx
+        add     eax,ebx
+        xor     r11d,ebx
+        lea     edx,[((-389564586))+r10*1+rdx]
+        and     r11d,eax
+        mov     r10d,DWORD[8+rsi]
+        xor     r11d,ecx
+        add     edx,r11d
+        rol     edx,12
+        mov     r11d,ebx
+        add     edx,eax
+        xor     r11d,eax
+        lea     ecx,[606105819+r10*1+rcx]
+        and     r11d,edx
+        mov     r10d,DWORD[12+rsi]
+        xor     r11d,ebx
+        add     ecx,r11d
+        rol     ecx,17
+        mov     r11d,eax
+        add     ecx,edx
+        xor     r11d,edx
+        lea     ebx,[((-1044525330))+r10*1+rbx]
+        and     r11d,ecx
+        mov     r10d,DWORD[16+rsi]
+        xor     r11d,eax
+        add     ebx,r11d
+        rol     ebx,22
+        mov     r11d,edx
+        add     ebx,ecx
+        xor     r11d,ecx
+        lea     eax,[((-176418897))+r10*1+rax]
+        and     r11d,ebx
+        mov     r10d,DWORD[20+rsi]
+        xor     r11d,edx
+        add     eax,r11d
+        rol     eax,7
+        mov     r11d,ecx
+        add     eax,ebx
+        xor     r11d,ebx
+        lea     edx,[1200080426+r10*1+rdx]
+        and     r11d,eax
+        mov     r10d,DWORD[24+rsi]
+        xor     r11d,ecx
+        add     edx,r11d
+        rol     edx,12
+        mov     r11d,ebx
+        add     edx,eax
+        xor     r11d,eax
+        lea     ecx,[((-1473231341))+r10*1+rcx]
+        and     r11d,edx
+        mov     r10d,DWORD[28+rsi]
+        xor     r11d,ebx
+        add     ecx,r11d
+        rol     ecx,17
+        mov     r11d,eax
+        add     ecx,edx
+        xor     r11d,edx
+        lea     ebx,[((-45705983))+r10*1+rbx]
+        and     r11d,ecx
+        mov     r10d,DWORD[32+rsi]
+        xor     r11d,eax
+        add     ebx,r11d
+        rol     ebx,22
+        mov     r11d,edx
+        add     ebx,ecx
+        xor     r11d,ecx
+        lea     eax,[1770035416+r10*1+rax]
+        and     r11d,ebx
+        mov     r10d,DWORD[36+rsi]
+        xor     r11d,edx
+        add     eax,r11d
+        rol     eax,7
+        mov     r11d,ecx
+        add     eax,ebx
+        xor     r11d,ebx
+        lea     edx,[((-1958414417))+r10*1+rdx]
+        and     r11d,eax
+        mov     r10d,DWORD[40+rsi]
+        xor     r11d,ecx
+        add     edx,r11d
+        rol     edx,12
+        mov     r11d,ebx
+        add     edx,eax
+        xor     r11d,eax
+        lea     ecx,[((-42063))+r10*1+rcx]
+        and     r11d,edx
+        mov     r10d,DWORD[44+rsi]
+        xor     r11d,ebx
+        add     ecx,r11d
+        rol     ecx,17
+        mov     r11d,eax
+        add     ecx,edx
+        xor     r11d,edx
+        lea     ebx,[((-1990404162))+r10*1+rbx]
+        and     r11d,ecx
+        mov     r10d,DWORD[48+rsi]
+        xor     r11d,eax
+        add     ebx,r11d
+        rol     ebx,22
+        mov     r11d,edx
+        add     ebx,ecx
+        xor     r11d,ecx
+        lea     eax,[1804603682+r10*1+rax]
+        and     r11d,ebx
+        mov     r10d,DWORD[52+rsi]
+        xor     r11d,edx
+        add     eax,r11d
+        rol     eax,7
+        mov     r11d,ecx
+        add     eax,ebx
+        xor     r11d,ebx
+        lea     edx,[((-40341101))+r10*1+rdx]
+        and     r11d,eax
+        mov     r10d,DWORD[56+rsi]
+        xor     r11d,ecx
+        add     edx,r11d
+        rol     edx,12
+        mov     r11d,ebx
+        add     edx,eax
+        xor     r11d,eax
+        lea     ecx,[((-1502002290))+r10*1+rcx]
+        and     r11d,edx
+        mov     r10d,DWORD[60+rsi]
+        xor     r11d,ebx
+        add     ecx,r11d
+        rol     ecx,17
+        mov     r11d,eax
+        add     ecx,edx
+        xor     r11d,edx
+        lea     ebx,[1236535329+r10*1+rbx]
+        and     r11d,ecx
+        mov     r10d,DWORD[4+rsi]
+        xor     r11d,eax
+        add     ebx,r11d
+        rol     ebx,22
+        mov     r11d,edx
+        add     ebx,ecx
+        mov     r11d,edx
+        mov     r12d,edx
+        not     r11d
+        and     r12d,ebx
+        lea     eax,[((-165796510))+r10*1+rax]
+        and     r11d,ecx
+        mov     r10d,DWORD[24+rsi]
+        or      r12d,r11d
+        mov     r11d,ecx
+        add     eax,r12d
+        mov     r12d,ecx
+        rol     eax,5
+        add     eax,ebx
+        not     r11d
+        and     r12d,eax
+        lea     edx,[((-1069501632))+r10*1+rdx]
+        and     r11d,ebx
+        mov     r10d,DWORD[44+rsi]
+        or      r12d,r11d
+        mov     r11d,ebx
+        add     edx,r12d
+        mov     r12d,ebx
+        rol     edx,9
+        add     edx,eax
+        not     r11d
+        and     r12d,edx
+        lea     ecx,[643717713+r10*1+rcx]
+        and     r11d,eax
+        mov     r10d,DWORD[rsi]
+        or      r12d,r11d
+        mov     r11d,eax
+        add     ecx,r12d
+        mov     r12d,eax
+        rol     ecx,14
+        add     ecx,edx
+        not     r11d
+        and     r12d,ecx
+        lea     ebx,[((-373897302))+r10*1+rbx]
+        and     r11d,edx
+        mov     r10d,DWORD[20+rsi]
+        or      r12d,r11d
+        mov     r11d,edx
+        add     ebx,r12d
+        mov     r12d,edx
+        rol     ebx,20
+        add     ebx,ecx
+        not     r11d
+        and     r12d,ebx
+        lea     eax,[((-701558691))+r10*1+rax]
+        and     r11d,ecx
+        mov     r10d,DWORD[40+rsi]
+        or      r12d,r11d
+        mov     r11d,ecx
+        add     eax,r12d
+        mov     r12d,ecx
+        rol     eax,5
+        add     eax,ebx
+        not     r11d
+        and     r12d,eax
+        lea     edx,[38016083+r10*1+rdx]
+        and     r11d,ebx
+        mov     r10d,DWORD[60+rsi]
+        or      r12d,r11d
+        mov     r11d,ebx
+        add     edx,r12d
+        mov     r12d,ebx
+        rol     edx,9
+        add     edx,eax
+        not     r11d
+        and     r12d,edx
+        lea     ecx,[((-660478335))+r10*1+rcx]
+        and     r11d,eax
+        mov     r10d,DWORD[16+rsi]
+        or      r12d,r11d
+        mov     r11d,eax
+        add     ecx,r12d
+        mov     r12d,eax
+        rol     ecx,14
+        add     ecx,edx
+        not     r11d
+        and     r12d,ecx
+        lea     ebx,[((-405537848))+r10*1+rbx]
+        and     r11d,edx
+        mov     r10d,DWORD[36+rsi]
+        or      r12d,r11d
+        mov     r11d,edx
+        add     ebx,r12d
+        mov     r12d,edx
+        rol     ebx,20
+        add     ebx,ecx
+        not     r11d
+        and     r12d,ebx
+        lea     eax,[568446438+r10*1+rax]
+        and     r11d,ecx
+        mov     r10d,DWORD[56+rsi]
+        or      r12d,r11d
+        mov     r11d,ecx
+        add     eax,r12d
+        mov     r12d,ecx
+        rol     eax,5
+        add     eax,ebx
+        not     r11d
+        and     r12d,eax
+        lea     edx,[((-1019803690))+r10*1+rdx]
+        and     r11d,ebx
+        mov     r10d,DWORD[12+rsi]
+        or      r12d,r11d
+        mov     r11d,ebx
+        add     edx,r12d
+        mov     r12d,ebx
+        rol     edx,9
+        add     edx,eax
+        not     r11d
+        and     r12d,edx
+        lea     ecx,[((-187363961))+r10*1+rcx]
+        and     r11d,eax
+        mov     r10d,DWORD[32+rsi]
+        or      r12d,r11d
+        mov     r11d,eax
+        add     ecx,r12d
+        mov     r12d,eax
+        rol     ecx,14
+        add     ecx,edx
+        not     r11d
+        and     r12d,ecx
+        lea     ebx,[1163531501+r10*1+rbx]
+        and     r11d,edx
+        mov     r10d,DWORD[52+rsi]
+        or      r12d,r11d
+        mov     r11d,edx
+        add     ebx,r12d
+        mov     r12d,edx
+        rol     ebx,20
+        add     ebx,ecx
+        not     r11d
+        and     r12d,ebx
+        lea     eax,[((-1444681467))+r10*1+rax]
+        and     r11d,ecx
+        mov     r10d,DWORD[8+rsi]
+        or      r12d,r11d
+        mov     r11d,ecx
+        add     eax,r12d
+        mov     r12d,ecx
+        rol     eax,5
+        add     eax,ebx
+        not     r11d
+        and     r12d,eax
+        lea     edx,[((-51403784))+r10*1+rdx]
+        and     r11d,ebx
+        mov     r10d,DWORD[28+rsi]
+        or      r12d,r11d
+        mov     r11d,ebx
+        add     edx,r12d
+        mov     r12d,ebx
+        rol     edx,9
+        add     edx,eax
+        not     r11d
+        and     r12d,edx
+        lea     ecx,[1735328473+r10*1+rcx]
+        and     r11d,eax
+        mov     r10d,DWORD[48+rsi]
+        or      r12d,r11d
+        mov     r11d,eax
+        add     ecx,r12d
+        mov     r12d,eax
+        rol     ecx,14
+        add     ecx,edx
+        not     r11d
+        and     r12d,ecx
+        lea     ebx,[((-1926607734))+r10*1+rbx]
+        and     r11d,edx
+        mov     r10d,DWORD[20+rsi]
+        or      r12d,r11d
+        mov     r11d,edx
+        add     ebx,r12d
+        mov     r12d,edx
+        rol     ebx,20
+        add     ebx,ecx
+        mov     r11d,ecx
+        lea     eax,[((-378558))+r10*1+rax]
+        xor     r11d,edx
+        mov     r10d,DWORD[32+rsi]
+        xor     r11d,ebx
+        add     eax,r11d
+        mov     r11d,ebx
+        rol     eax,4
+        add     eax,ebx
+        lea     edx,[((-2022574463))+r10*1+rdx]
+        xor     r11d,ecx
+        mov     r10d,DWORD[44+rsi]
+        xor     r11d,eax
+        add     edx,r11d
+        rol     edx,11
+        mov     r11d,eax
+        add     edx,eax
+        lea     ecx,[1839030562+r10*1+rcx]
+        xor     r11d,ebx
+        mov     r10d,DWORD[56+rsi]
+        xor     r11d,edx
+        add     ecx,r11d
+        mov     r11d,edx
+        rol     ecx,16
+        add     ecx,edx
+        lea     ebx,[((-35309556))+r10*1+rbx]
+        xor     r11d,eax
+        mov     r10d,DWORD[4+rsi]
+        xor     r11d,ecx
+        add     ebx,r11d
+        rol     ebx,23
+        mov     r11d,ecx
+        add     ebx,ecx
+        lea     eax,[((-1530992060))+r10*1+rax]
+        xor     r11d,edx
+        mov     r10d,DWORD[16+rsi]
+        xor     r11d,ebx
+        add     eax,r11d
+        mov     r11d,ebx
+        rol     eax,4
+        add     eax,ebx
+        lea     edx,[1272893353+r10*1+rdx]
+        xor     r11d,ecx
+        mov     r10d,DWORD[28+rsi]
+        xor     r11d,eax
+        add     edx,r11d
+        rol     edx,11
+        mov     r11d,eax
+        add     edx,eax
+        lea     ecx,[((-155497632))+r10*1+rcx]
+        xor     r11d,ebx
+        mov     r10d,DWORD[40+rsi]
+        xor     r11d,edx
+        add     ecx,r11d
+        mov     r11d,edx
+        rol     ecx,16
+        add     ecx,edx
+        lea     ebx,[((-1094730640))+r10*1+rbx]
+        xor     r11d,eax
+        mov     r10d,DWORD[52+rsi]
+        xor     r11d,ecx
+        add     ebx,r11d
+        rol     ebx,23
+        mov     r11d,ecx
+        add     ebx,ecx
+        lea     eax,[681279174+r10*1+rax]
+        xor     r11d,edx
+        mov     r10d,DWORD[rsi]
+        xor     r11d,ebx
+        add     eax,r11d
+        mov     r11d,ebx
+        rol     eax,4
+        add     eax,ebx
+        lea     edx,[((-358537222))+r10*1+rdx]
+        xor     r11d,ecx
+        mov     r10d,DWORD[12+rsi]
+        xor     r11d,eax
+        add     edx,r11d
+        rol     edx,11
+        mov     r11d,eax
+        add     edx,eax
+        lea     ecx,[((-722521979))+r10*1+rcx]
+        xor     r11d,ebx
+        mov     r10d,DWORD[24+rsi]
+        xor     r11d,edx
+        add     ecx,r11d
+        mov     r11d,edx
+        rol     ecx,16
+        add     ecx,edx
+        lea     ebx,[76029189+r10*1+rbx]
+        xor     r11d,eax
+        mov     r10d,DWORD[36+rsi]
+        xor     r11d,ecx
+        add     ebx,r11d
+        rol     ebx,23
+        mov     r11d,ecx
+        add     ebx,ecx
+        lea     eax,[((-640364487))+r10*1+rax]
+        xor     r11d,edx
+        mov     r10d,DWORD[48+rsi]
+        xor     r11d,ebx
+        add     eax,r11d
+        mov     r11d,ebx
+        rol     eax,4
+        add     eax,ebx
+        lea     edx,[((-421815835))+r10*1+rdx]
+        xor     r11d,ecx
+        mov     r10d,DWORD[60+rsi]
+        xor     r11d,eax
+        add     edx,r11d
+        rol     edx,11
+        mov     r11d,eax
+        add     edx,eax
+        lea     ecx,[530742520+r10*1+rcx]
+        xor     r11d,ebx
+        mov     r10d,DWORD[8+rsi]
+        xor     r11d,edx
+        add     ecx,r11d
+        mov     r11d,edx
+        rol     ecx,16
+        add     ecx,edx
+        lea     ebx,[((-995338651))+r10*1+rbx]
+        xor     r11d,eax
+        mov     r10d,DWORD[rsi]
+        xor     r11d,ecx
+        add     ebx,r11d
+        rol     ebx,23
+        mov     r11d,ecx
+        add     ebx,ecx
+        mov     r11d,0xffffffff
+        xor     r11d,edx
+        lea     eax,[((-198630844))+r10*1+rax]
+        or      r11d,ebx
+        mov     r10d,DWORD[28+rsi]
+        xor     r11d,ecx
+        add     eax,r11d
+        mov     r11d,0xffffffff
+        rol     eax,6
+        xor     r11d,ecx
+        add     eax,ebx
+        lea     edx,[1126891415+r10*1+rdx]
+        or      r11d,eax
+        mov     r10d,DWORD[56+rsi]
+        xor     r11d,ebx
+        add     edx,r11d
+        mov     r11d,0xffffffff
+        rol     edx,10
+        xor     r11d,ebx
+        add     edx,eax
+        lea     ecx,[((-1416354905))+r10*1+rcx]
+        or      r11d,edx
+        mov     r10d,DWORD[20+rsi]
+        xor     r11d,eax
+        add     ecx,r11d
+        mov     r11d,0xffffffff
+        rol     ecx,15
+        xor     r11d,eax
+        add     ecx,edx
+        lea     ebx,[((-57434055))+r10*1+rbx]
+        or      r11d,ecx
+        mov     r10d,DWORD[48+rsi]
+        xor     r11d,edx
+        add     ebx,r11d
+        mov     r11d,0xffffffff
+        rol     ebx,21
+        xor     r11d,edx
+        add     ebx,ecx
+        lea     eax,[1700485571+r10*1+rax]
+        or      r11d,ebx
+        mov     r10d,DWORD[12+rsi]
+        xor     r11d,ecx
+        add     eax,r11d
+        mov     r11d,0xffffffff
+        rol     eax,6
+        xor     r11d,ecx
+        add     eax,ebx
+        lea     edx,[((-1894986606))+r10*1+rdx]
+        or      r11d,eax
+        mov     r10d,DWORD[40+rsi]
+        xor     r11d,ebx
+        add     edx,r11d
+        mov     r11d,0xffffffff
+        rol     edx,10
+        xor     r11d,ebx
+        add     edx,eax
+        lea     ecx,[((-1051523))+r10*1+rcx]
+        or      r11d,edx
+        mov     r10d,DWORD[4+rsi]
+        xor     r11d,eax
+        add     ecx,r11d
+        mov     r11d,0xffffffff
+        rol     ecx,15
+        xor     r11d,eax
+        add     ecx,edx
+        lea     ebx,[((-2054922799))+r10*1+rbx]
+        or      r11d,ecx
+        mov     r10d,DWORD[32+rsi]
+        xor     r11d,edx
+        add     ebx,r11d
+        mov     r11d,0xffffffff
+        rol     ebx,21
+        xor     r11d,edx
+        add     ebx,ecx
+        lea     eax,[1873313359+r10*1+rax]
+        or      r11d,ebx
+        mov     r10d,DWORD[60+rsi]
+        xor     r11d,ecx
+        add     eax,r11d
+        mov     r11d,0xffffffff
+        rol     eax,6
+        xor     r11d,ecx
+        add     eax,ebx
+        lea     edx,[((-30611744))+r10*1+rdx]
+        or      r11d,eax
+        mov     r10d,DWORD[24+rsi]
+        xor     r11d,ebx
+        add     edx,r11d
+        mov     r11d,0xffffffff
+        rol     edx,10
+        xor     r11d,ebx
+        add     edx,eax
+        lea     ecx,[((-1560198380))+r10*1+rcx]
+        or      r11d,edx
+        mov     r10d,DWORD[52+rsi]
+        xor     r11d,eax
+        add     ecx,r11d
+        mov     r11d,0xffffffff
+        rol     ecx,15
+        xor     r11d,eax
+        add     ecx,edx
+        lea     ebx,[1309151649+r10*1+rbx]
+        or      r11d,ecx
+        mov     r10d,DWORD[16+rsi]
+        xor     r11d,edx
+        add     ebx,r11d
+        mov     r11d,0xffffffff
+        rol     ebx,21
+        xor     r11d,edx
+        add     ebx,ecx
+        lea     eax,[((-145523070))+r10*1+rax]
+        or      r11d,ebx
+        mov     r10d,DWORD[44+rsi]
+        xor     r11d,ecx
+        add     eax,r11d
+        mov     r11d,0xffffffff
+        rol     eax,6
+        xor     r11d,ecx
+        add     eax,ebx
+        lea     edx,[((-1120210379))+r10*1+rdx]
+        or      r11d,eax
+        mov     r10d,DWORD[8+rsi]
+        xor     r11d,ebx
+        add     edx,r11d
+        mov     r11d,0xffffffff
+        rol     edx,10
+        xor     r11d,ebx
+        add     edx,eax
+        lea     ecx,[718787259+r10*1+rcx]
+        or      r11d,edx
+        mov     r10d,DWORD[36+rsi]
+        xor     r11d,eax
+        add     ecx,r11d
+        mov     r11d,0xffffffff
+        rol     ecx,15
+        xor     r11d,eax
+        add     ecx,edx
+        lea     ebx,[((-343485551))+r10*1+rbx]
+        or      r11d,ecx
+        mov     r10d,DWORD[rsi]
+        xor     r11d,edx
+        add     ebx,r11d
+        mov     r11d,0xffffffff
+        rol     ebx,21
+        xor     r11d,edx
+        add     ebx,ecx
+
+        add     eax,r8d
+        add     ebx,r9d
+        add     ecx,r14d
+        add     edx,r15d
+
+
+        add     rsi,64
+        cmp     rsi,rdi
+        jb      NEAR $L$loop
+
+
+$L$end:
+        mov     DWORD[rbp],eax
+        mov     DWORD[4+rbp],ebx
+        mov     DWORD[8+rbp],ecx
+        mov     DWORD[12+rbp],edx
+
+        mov     r15,QWORD[rsp]
+
+        mov     r14,QWORD[8+rsp]
+
+        mov     r12,QWORD[16+rsp]
+
+        mov     rbx,QWORD[24+rsp]
+
+        mov     rbp,QWORD[32+rsp]
+
+        add     rsp,40
+
+$L$epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_md5_block_asm_data_order:
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        lea     r10,[$L$prologue]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        lea     r10,[$L$epilogue]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        lea     rax,[40+rax]
+
+        mov     rbp,QWORD[((-8))+rax]
+        mov     rbx,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r14,QWORD[((-32))+rax]
+        mov     r15,QWORD[((-40))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_md5_block_asm_data_order wrt ..imagebase
+        DD      $L$SEH_end_md5_block_asm_data_order wrt ..imagebase
+        DD      $L$SEH_info_md5_block_asm_data_order wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_md5_block_asm_data_order:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm
new file mode 100644
index 0000000000..3951121452
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm
@@ -0,0 +1,984 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+
+ALIGN   32
+_aesni_ctr32_ghash_6x:
+        vmovdqu xmm2,XMMWORD[32+r11]
+        sub     rdx,6
+        vpxor   xmm4,xmm4,xmm4
+        vmovdqu xmm15,XMMWORD[((0-128))+rcx]
+        vpaddb  xmm10,xmm1,xmm2
+        vpaddb  xmm11,xmm10,xmm2
+        vpaddb  xmm12,xmm11,xmm2
+        vpaddb  xmm13,xmm12,xmm2
+        vpaddb  xmm14,xmm13,xmm2
+        vpxor   xmm9,xmm1,xmm15
+        vmovdqu XMMWORD[(16+8)+rsp],xmm4
+        jmp     NEAR $L$oop6x
+
+ALIGN   32
+$L$oop6x:
+        add     ebx,100663296
+        jc      NEAR $L$handle_ctr32
+        vmovdqu xmm3,XMMWORD[((0-32))+r9]
+        vpaddb  xmm1,xmm14,xmm2
+        vpxor   xmm10,xmm10,xmm15
+        vpxor   xmm11,xmm11,xmm15
+
+$L$resume_ctr32:
+        vmovdqu XMMWORD[r8],xmm1
+        vpclmulqdq      xmm5,xmm7,xmm3,0x10
+        vpxor   xmm12,xmm12,xmm15
+        vmovups xmm2,XMMWORD[((16-128))+rcx]
+        vpclmulqdq      xmm6,xmm7,xmm3,0x01
+        xor     r12,r12
+        cmp     r15,r14
+
+        vaesenc xmm9,xmm9,xmm2
+        vmovdqu xmm0,XMMWORD[((48+8))+rsp]
+        vpxor   xmm13,xmm13,xmm15
+        vpclmulqdq      xmm1,xmm7,xmm3,0x00
+        vaesenc xmm10,xmm10,xmm2
+        vpxor   xmm14,xmm14,xmm15
+        setnc   r12b
+        vpclmulqdq      xmm7,xmm7,xmm3,0x11
+        vaesenc xmm11,xmm11,xmm2
+        vmovdqu xmm3,XMMWORD[((16-32))+r9]
+        neg     r12
+        vaesenc xmm12,xmm12,xmm2
+        vpxor   xmm6,xmm6,xmm5
+        vpclmulqdq      xmm5,xmm0,xmm3,0x00
+        vpxor   xmm8,xmm8,xmm4
+        vaesenc xmm13,xmm13,xmm2
+        vpxor   xmm4,xmm1,xmm5
+        and     r12,0x60
+        vmovups xmm15,XMMWORD[((32-128))+rcx]
+        vpclmulqdq      xmm1,xmm0,xmm3,0x10
+        vaesenc xmm14,xmm14,xmm2
+
+        vpclmulqdq      xmm2,xmm0,xmm3,0x01
+        lea     r14,[r12*1+r14]
+        vaesenc xmm9,xmm9,xmm15
+        vpxor   xmm8,xmm8,XMMWORD[((16+8))+rsp]
+        vpclmulqdq      xmm3,xmm0,xmm3,0x11
+        vmovdqu xmm0,XMMWORD[((64+8))+rsp]
+        vaesenc xmm10,xmm10,xmm15
+        movbe   r13,QWORD[88+r14]
+        vaesenc xmm11,xmm11,xmm15
+        movbe   r12,QWORD[80+r14]
+        vaesenc xmm12,xmm12,xmm15
+        mov     QWORD[((32+8))+rsp],r13
+        vaesenc xmm13,xmm13,xmm15
+        mov     QWORD[((40+8))+rsp],r12
+        vmovdqu xmm5,XMMWORD[((48-32))+r9]
+        vaesenc xmm14,xmm14,xmm15
+
+        vmovups xmm15,XMMWORD[((48-128))+rcx]
+        vpxor   xmm6,xmm6,xmm1
+        vpclmulqdq      xmm1,xmm0,xmm5,0x00
+        vaesenc xmm9,xmm9,xmm15
+        vpxor   xmm6,xmm6,xmm2
+        vpclmulqdq      xmm2,xmm0,xmm5,0x10
+        vaesenc xmm10,xmm10,xmm15
+        vpxor   xmm7,xmm7,xmm3
+        vpclmulqdq      xmm3,xmm0,xmm5,0x01
+        vaesenc xmm11,xmm11,xmm15
+        vpclmulqdq      xmm5,xmm0,xmm5,0x11
+        vmovdqu xmm0,XMMWORD[((80+8))+rsp]
+        vaesenc xmm12,xmm12,xmm15
+        vaesenc xmm13,xmm13,xmm15
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqu xmm1,XMMWORD[((64-32))+r9]
+        vaesenc xmm14,xmm14,xmm15
+
+        vmovups xmm15,XMMWORD[((64-128))+rcx]
+        vpxor   xmm6,xmm6,xmm2
+        vpclmulqdq      xmm2,xmm0,xmm1,0x00
+        vaesenc xmm9,xmm9,xmm15
+        vpxor   xmm6,xmm6,xmm3
+        vpclmulqdq      xmm3,xmm0,xmm1,0x10
+        vaesenc xmm10,xmm10,xmm15
+        movbe   r13,QWORD[72+r14]
+        vpxor   xmm7,xmm7,xmm5
+        vpclmulqdq      xmm5,xmm0,xmm1,0x01
+        vaesenc xmm11,xmm11,xmm15
+        movbe   r12,QWORD[64+r14]
+        vpclmulqdq      xmm1,xmm0,xmm1,0x11
+        vmovdqu xmm0,XMMWORD[((96+8))+rsp]
+        vaesenc xmm12,xmm12,xmm15
+        mov     QWORD[((48+8))+rsp],r13
+        vaesenc xmm13,xmm13,xmm15
+        mov     QWORD[((56+8))+rsp],r12
+        vpxor   xmm4,xmm4,xmm2
+        vmovdqu xmm2,XMMWORD[((96-32))+r9]
+        vaesenc xmm14,xmm14,xmm15
+
+        vmovups xmm15,XMMWORD[((80-128))+rcx]
+        vpxor   xmm6,xmm6,xmm3
+        vpclmulqdq      xmm3,xmm0,xmm2,0x00
+        vaesenc xmm9,xmm9,xmm15
+        vpxor   xmm6,xmm6,xmm5
+        vpclmulqdq      xmm5,xmm0,xmm2,0x10
+        vaesenc xmm10,xmm10,xmm15
+        movbe   r13,QWORD[56+r14]
+        vpxor   xmm7,xmm7,xmm1
+        vpclmulqdq      xmm1,xmm0,xmm2,0x01
+        vpxor   xmm8,xmm8,XMMWORD[((112+8))+rsp]
+        vaesenc xmm11,xmm11,xmm15
+        movbe   r12,QWORD[48+r14]
+        vpclmulqdq      xmm2,xmm0,xmm2,0x11
+        vaesenc xmm12,xmm12,xmm15
+        mov     QWORD[((64+8))+rsp],r13
+        vaesenc xmm13,xmm13,xmm15
+        mov     QWORD[((72+8))+rsp],r12
+        vpxor   xmm4,xmm4,xmm3
+        vmovdqu xmm3,XMMWORD[((112-32))+r9]
+        vaesenc xmm14,xmm14,xmm15
+
+        vmovups xmm15,XMMWORD[((96-128))+rcx]
+        vpxor   xmm6,xmm6,xmm5
+        vpclmulqdq      xmm5,xmm8,xmm3,0x10
+        vaesenc xmm9,xmm9,xmm15
+        vpxor   xmm6,xmm6,xmm1
+        vpclmulqdq      xmm1,xmm8,xmm3,0x01
+        vaesenc xmm10,xmm10,xmm15
+        movbe   r13,QWORD[40+r14]
+        vpxor   xmm7,xmm7,xmm2
+        vpclmulqdq      xmm2,xmm8,xmm3,0x00
+        vaesenc xmm11,xmm11,xmm15
+        movbe   r12,QWORD[32+r14]
+        vpclmulqdq      xmm8,xmm8,xmm3,0x11
+        vaesenc xmm12,xmm12,xmm15
+        mov     QWORD[((80+8))+rsp],r13
+        vaesenc xmm13,xmm13,xmm15
+        mov     QWORD[((88+8))+rsp],r12
+        vpxor   xmm6,xmm6,xmm5
+        vaesenc xmm14,xmm14,xmm15
+        vpxor   xmm6,xmm6,xmm1
+
+        vmovups xmm15,XMMWORD[((112-128))+rcx]
+        vpslldq xmm5,xmm6,8
+        vpxor   xmm4,xmm4,xmm2
+        vmovdqu xmm3,XMMWORD[16+r11]
+
+        vaesenc xmm9,xmm9,xmm15
+        vpxor   xmm7,xmm7,xmm8
+        vaesenc xmm10,xmm10,xmm15
+        vpxor   xmm4,xmm4,xmm5
+        movbe   r13,QWORD[24+r14]
+        vaesenc xmm11,xmm11,xmm15
+        movbe   r12,QWORD[16+r14]
+        vpalignr        xmm0,xmm4,xmm4,8
+        vpclmulqdq      xmm4,xmm4,xmm3,0x10
+        mov     QWORD[((96+8))+rsp],r13
+        vaesenc xmm12,xmm12,xmm15
+        mov     QWORD[((104+8))+rsp],r12
+        vaesenc xmm13,xmm13,xmm15
+        vmovups xmm1,XMMWORD[((128-128))+rcx]
+        vaesenc xmm14,xmm14,xmm15
+
+        vaesenc xmm9,xmm9,xmm1
+        vmovups xmm15,XMMWORD[((144-128))+rcx]
+        vaesenc xmm10,xmm10,xmm1
+        vpsrldq xmm6,xmm6,8
+        vaesenc xmm11,xmm11,xmm1
+        vpxor   xmm7,xmm7,xmm6
+        vaesenc xmm12,xmm12,xmm1
+        vpxor   xmm4,xmm4,xmm0
+        movbe   r13,QWORD[8+r14]
+        vaesenc xmm13,xmm13,xmm1
+        movbe   r12,QWORD[r14]
+        vaesenc xmm14,xmm14,xmm1
+        vmovups xmm1,XMMWORD[((160-128))+rcx]
+        cmp     ebp,11
+        jb      NEAR $L$enc_tail
+
+        vaesenc xmm9,xmm9,xmm15
+        vaesenc xmm10,xmm10,xmm15
+        vaesenc xmm11,xmm11,xmm15
+        vaesenc xmm12,xmm12,xmm15
+        vaesenc xmm13,xmm13,xmm15
+        vaesenc xmm14,xmm14,xmm15
+
+        vaesenc xmm9,xmm9,xmm1
+        vaesenc xmm10,xmm10,xmm1
+        vaesenc xmm11,xmm11,xmm1
+        vaesenc xmm12,xmm12,xmm1
+        vaesenc xmm13,xmm13,xmm1
+        vmovups xmm15,XMMWORD[((176-128))+rcx]
+        vaesenc xmm14,xmm14,xmm1
+        vmovups xmm1,XMMWORD[((192-128))+rcx]
+        je      NEAR $L$enc_tail
+
+        vaesenc xmm9,xmm9,xmm15
+        vaesenc xmm10,xmm10,xmm15
+        vaesenc xmm11,xmm11,xmm15
+        vaesenc xmm12,xmm12,xmm15
+        vaesenc xmm13,xmm13,xmm15
+        vaesenc xmm14,xmm14,xmm15
+
+        vaesenc xmm9,xmm9,xmm1
+        vaesenc xmm10,xmm10,xmm1
+        vaesenc xmm11,xmm11,xmm1
+        vaesenc xmm12,xmm12,xmm1
+        vaesenc xmm13,xmm13,xmm1
+        vmovups xmm15,XMMWORD[((208-128))+rcx]
+        vaesenc xmm14,xmm14,xmm1
+        vmovups xmm1,XMMWORD[((224-128))+rcx]
+        jmp     NEAR $L$enc_tail
+
+ALIGN   32
+$L$handle_ctr32:
+        vmovdqu xmm0,XMMWORD[r11]
+        vpshufb xmm6,xmm1,xmm0
+        vmovdqu xmm5,XMMWORD[48+r11]
+        vpaddd  xmm10,xmm6,XMMWORD[64+r11]
+        vpaddd  xmm11,xmm6,xmm5
+        vmovdqu xmm3,XMMWORD[((0-32))+r9]
+        vpaddd  xmm12,xmm10,xmm5
+        vpshufb xmm10,xmm10,xmm0
+        vpaddd  xmm13,xmm11,xmm5
+        vpshufb xmm11,xmm11,xmm0
+        vpxor   xmm10,xmm10,xmm15
+        vpaddd  xmm14,xmm12,xmm5
+        vpshufb xmm12,xmm12,xmm0
+        vpxor   xmm11,xmm11,xmm15
+        vpaddd  xmm1,xmm13,xmm5
+        vpshufb xmm13,xmm13,xmm0
+        vpshufb xmm14,xmm14,xmm0
+        vpshufb xmm1,xmm1,xmm0
+        jmp     NEAR $L$resume_ctr32
+
+ALIGN   32
+$L$enc_tail:
+        vaesenc xmm9,xmm9,xmm15
+        vmovdqu XMMWORD[(16+8)+rsp],xmm7
+        vpalignr        xmm8,xmm4,xmm4,8
+        vaesenc xmm10,xmm10,xmm15
+        vpclmulqdq      xmm4,xmm4,xmm3,0x10
+        vpxor   xmm2,xmm1,XMMWORD[rdi]
+        vaesenc xmm11,xmm11,xmm15
+        vpxor   xmm0,xmm1,XMMWORD[16+rdi]
+        vaesenc xmm12,xmm12,xmm15
+        vpxor   xmm5,xmm1,XMMWORD[32+rdi]
+        vaesenc xmm13,xmm13,xmm15
+        vpxor   xmm6,xmm1,XMMWORD[48+rdi]
+        vaesenc xmm14,xmm14,xmm15
+        vpxor   xmm7,xmm1,XMMWORD[64+rdi]
+        vpxor   xmm3,xmm1,XMMWORD[80+rdi]
+        vmovdqu xmm1,XMMWORD[r8]
+
+        vaesenclast     xmm9,xmm9,xmm2
+        vmovdqu xmm2,XMMWORD[32+r11]
+        vaesenclast     xmm10,xmm10,xmm0
+        vpaddb  xmm0,xmm1,xmm2
+        mov     QWORD[((112+8))+rsp],r13
+        lea     rdi,[96+rdi]
+        vaesenclast     xmm11,xmm11,xmm5
+        vpaddb  xmm5,xmm0,xmm2
+        mov     QWORD[((120+8))+rsp],r12
+        lea     rsi,[96+rsi]
+        vmovdqu xmm15,XMMWORD[((0-128))+rcx]
+        vaesenclast     xmm12,xmm12,xmm6
+        vpaddb  xmm6,xmm5,xmm2
+        vaesenclast     xmm13,xmm13,xmm7
+        vpaddb  xmm7,xmm6,xmm2
+        vaesenclast     xmm14,xmm14,xmm3
+        vpaddb  xmm3,xmm7,xmm2
+
+        add     r10,0x60
+        sub     rdx,0x6
+        jc      NEAR $L$6x_done
+
+        vmovups XMMWORD[(-96)+rsi],xmm9
+        vpxor   xmm9,xmm1,xmm15
+        vmovups XMMWORD[(-80)+rsi],xmm10
+        vmovdqa xmm10,xmm0
+        vmovups XMMWORD[(-64)+rsi],xmm11
+        vmovdqa xmm11,xmm5
+        vmovups XMMWORD[(-48)+rsi],xmm12
+        vmovdqa xmm12,xmm6
+        vmovups XMMWORD[(-32)+rsi],xmm13
+        vmovdqa xmm13,xmm7
+        vmovups XMMWORD[(-16)+rsi],xmm14
+        vmovdqa xmm14,xmm3
+        vmovdqu xmm7,XMMWORD[((32+8))+rsp]
+        jmp     NEAR $L$oop6x
+
+$L$6x_done:
+        vpxor   xmm8,xmm8,XMMWORD[((16+8))+rsp]
+        vpxor   xmm8,xmm8,xmm4
+
+        DB      0F3h,0C3h               ;repret
+
+global  aesni_gcm_decrypt
+
+ALIGN   32
+aesni_gcm_decrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_gcm_decrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        xor     r10,r10
+        cmp     rdx,0x60
+        jb      NEAR $L$gcm_dec_abort
+
+        lea     rax,[rsp]
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[(-216)+rax],xmm6
+        movaps  XMMWORD[(-200)+rax],xmm7
+        movaps  XMMWORD[(-184)+rax],xmm8
+        movaps  XMMWORD[(-168)+rax],xmm9
+        movaps  XMMWORD[(-152)+rax],xmm10
+        movaps  XMMWORD[(-136)+rax],xmm11
+        movaps  XMMWORD[(-120)+rax],xmm12
+        movaps  XMMWORD[(-104)+rax],xmm13
+        movaps  XMMWORD[(-88)+rax],xmm14
+        movaps  XMMWORD[(-72)+rax],xmm15
+$L$gcm_dec_body:
+        vzeroupper
+
+        vmovdqu xmm1,XMMWORD[r8]
+        add     rsp,-128
+        mov     ebx,DWORD[12+r8]
+        lea     r11,[$L$bswap_mask]
+        lea     r14,[((-128))+rcx]
+        mov     r15,0xf80
+        vmovdqu xmm8,XMMWORD[r9]
+        and     rsp,-128
+        vmovdqu xmm0,XMMWORD[r11]
+        lea     rcx,[128+rcx]
+        lea     r9,[((32+32))+r9]
+        mov     ebp,DWORD[((240-128))+rcx]
+        vpshufb xmm8,xmm8,xmm0
+
+        and     r14,r15
+        and     r15,rsp
+        sub     r15,r14
+        jc      NEAR $L$dec_no_key_aliasing
+        cmp     r15,768
+        jnc     NEAR $L$dec_no_key_aliasing
+        sub     rsp,r15
+$L$dec_no_key_aliasing:
+
+        vmovdqu xmm7,XMMWORD[80+rdi]
+        lea     r14,[rdi]
+        vmovdqu xmm4,XMMWORD[64+rdi]
+        lea     r15,[((-192))+rdx*1+rdi]
+        vmovdqu xmm5,XMMWORD[48+rdi]
+        shr     rdx,4
+        xor     r10,r10
+        vmovdqu xmm6,XMMWORD[32+rdi]
+        vpshufb xmm7,xmm7,xmm0
+        vmovdqu xmm2,XMMWORD[16+rdi]
+        vpshufb xmm4,xmm4,xmm0
+        vmovdqu xmm3,XMMWORD[rdi]
+        vpshufb xmm5,xmm5,xmm0
+        vmovdqu XMMWORD[48+rsp],xmm4
+        vpshufb xmm6,xmm6,xmm0
+        vmovdqu XMMWORD[64+rsp],xmm5
+        vpshufb xmm2,xmm2,xmm0
+        vmovdqu XMMWORD[80+rsp],xmm6
+        vpshufb xmm3,xmm3,xmm0
+        vmovdqu XMMWORD[96+rsp],xmm2
+        vmovdqu XMMWORD[112+rsp],xmm3
+
+        call    _aesni_ctr32_ghash_6x
+
+        vmovups XMMWORD[(-96)+rsi],xmm9
+        vmovups XMMWORD[(-80)+rsi],xmm10
+        vmovups XMMWORD[(-64)+rsi],xmm11
+        vmovups XMMWORD[(-48)+rsi],xmm12
+        vmovups XMMWORD[(-32)+rsi],xmm13
+        vmovups XMMWORD[(-16)+rsi],xmm14
+
+        vpshufb xmm8,xmm8,XMMWORD[r11]
+        vmovdqu XMMWORD[(-64)+r9],xmm8
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+        movaps  xmm13,XMMWORD[((-104))+rax]
+        movaps  xmm14,XMMWORD[((-88))+rax]
+        movaps  xmm15,XMMWORD[((-72))+rax]
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$gcm_dec_abort:
+        mov     rax,r10
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_gcm_decrypt:
+
+ALIGN   32
+_aesni_ctr32_6x:
+        vmovdqu xmm4,XMMWORD[((0-128))+rcx]
+        vmovdqu xmm2,XMMWORD[32+r11]
+        lea     r13,[((-1))+rbp]
+        vmovups xmm15,XMMWORD[((16-128))+rcx]
+        lea     r12,[((32-128))+rcx]
+        vpxor   xmm9,xmm1,xmm4
+        add     ebx,100663296
+        jc      NEAR $L$handle_ctr32_2
+        vpaddb  xmm10,xmm1,xmm2
+        vpaddb  xmm11,xmm10,xmm2
+        vpxor   xmm10,xmm10,xmm4
+        vpaddb  xmm12,xmm11,xmm2
+        vpxor   xmm11,xmm11,xmm4
+        vpaddb  xmm13,xmm12,xmm2
+        vpxor   xmm12,xmm12,xmm4
+        vpaddb  xmm14,xmm13,xmm2
+        vpxor   xmm13,xmm13,xmm4
+        vpaddb  xmm1,xmm14,xmm2
+        vpxor   xmm14,xmm14,xmm4
+        jmp     NEAR $L$oop_ctr32
+
+ALIGN   16
+$L$oop_ctr32:
+        vaesenc xmm9,xmm9,xmm15
+        vaesenc xmm10,xmm10,xmm15
+        vaesenc xmm11,xmm11,xmm15
+        vaesenc xmm12,xmm12,xmm15
+        vaesenc xmm13,xmm13,xmm15
+        vaesenc xmm14,xmm14,xmm15
+        vmovups xmm15,XMMWORD[r12]
+        lea     r12,[16+r12]
+        dec     r13d
+        jnz     NEAR $L$oop_ctr32
+
+        vmovdqu xmm3,XMMWORD[r12]
+        vaesenc xmm9,xmm9,xmm15
+        vpxor   xmm4,xmm3,XMMWORD[rdi]
+        vaesenc xmm10,xmm10,xmm15
+        vpxor   xmm5,xmm3,XMMWORD[16+rdi]
+        vaesenc xmm11,xmm11,xmm15
+        vpxor   xmm6,xmm3,XMMWORD[32+rdi]
+        vaesenc xmm12,xmm12,xmm15
+        vpxor   xmm8,xmm3,XMMWORD[48+rdi]
+        vaesenc xmm13,xmm13,xmm15
+        vpxor   xmm2,xmm3,XMMWORD[64+rdi]
+        vaesenc xmm14,xmm14,xmm15
+        vpxor   xmm3,xmm3,XMMWORD[80+rdi]
+        lea     rdi,[96+rdi]
+
+        vaesenclast     xmm9,xmm9,xmm4
+        vaesenclast     xmm10,xmm10,xmm5
+        vaesenclast     xmm11,xmm11,xmm6
+        vaesenclast     xmm12,xmm12,xmm8
+        vaesenclast     xmm13,xmm13,xmm2
+        vaesenclast     xmm14,xmm14,xmm3
+        vmovups XMMWORD[rsi],xmm9
+        vmovups XMMWORD[16+rsi],xmm10
+        vmovups XMMWORD[32+rsi],xmm11
+        vmovups XMMWORD[48+rsi],xmm12
+        vmovups XMMWORD[64+rsi],xmm13
+        vmovups XMMWORD[80+rsi],xmm14
+        lea     rsi,[96+rsi]
+
+        DB      0F3h,0C3h               ;repret
+ALIGN   32
+$L$handle_ctr32_2:
+        vpshufb xmm6,xmm1,xmm0
+        vmovdqu xmm5,XMMWORD[48+r11]
+        vpaddd  xmm10,xmm6,XMMWORD[64+r11]
+        vpaddd  xmm11,xmm6,xmm5
+        vpaddd  xmm12,xmm10,xmm5
+        vpshufb xmm10,xmm10,xmm0
+        vpaddd  xmm13,xmm11,xmm5
+        vpshufb xmm11,xmm11,xmm0
+        vpxor   xmm10,xmm10,xmm4
+        vpaddd  xmm14,xmm12,xmm5
+        vpshufb xmm12,xmm12,xmm0
+        vpxor   xmm11,xmm11,xmm4
+        vpaddd  xmm1,xmm13,xmm5
+        vpshufb xmm13,xmm13,xmm0
+        vpxor   xmm12,xmm12,xmm4
+        vpshufb xmm14,xmm14,xmm0
+        vpxor   xmm13,xmm13,xmm4
+        vpshufb xmm1,xmm1,xmm0
+        vpxor   xmm14,xmm14,xmm4
+        jmp     NEAR $L$oop_ctr32
+
+
+global  aesni_gcm_encrypt
+
+ALIGN   32
+aesni_gcm_encrypt:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_aesni_gcm_encrypt:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        xor     r10,r10
+        cmp     rdx,0x60*3
+        jb      NEAR $L$gcm_enc_abort
+
+        lea     rax,[rsp]
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[(-216)+rax],xmm6
+        movaps  XMMWORD[(-200)+rax],xmm7
+        movaps  XMMWORD[(-184)+rax],xmm8
+        movaps  XMMWORD[(-168)+rax],xmm9
+        movaps  XMMWORD[(-152)+rax],xmm10
+        movaps  XMMWORD[(-136)+rax],xmm11
+        movaps  XMMWORD[(-120)+rax],xmm12
+        movaps  XMMWORD[(-104)+rax],xmm13
+        movaps  XMMWORD[(-88)+rax],xmm14
+        movaps  XMMWORD[(-72)+rax],xmm15
+$L$gcm_enc_body:
+        vzeroupper
+
+        vmovdqu xmm1,XMMWORD[r8]
+        add     rsp,-128
+        mov     ebx,DWORD[12+r8]
+        lea     r11,[$L$bswap_mask]
+        lea     r14,[((-128))+rcx]
+        mov     r15,0xf80
+        lea     rcx,[128+rcx]
+        vmovdqu xmm0,XMMWORD[r11]
+        and     rsp,-128
+        mov     ebp,DWORD[((240-128))+rcx]
+
+        and     r14,r15
+        and     r15,rsp
+        sub     r15,r14
+        jc      NEAR $L$enc_no_key_aliasing
+        cmp     r15,768
+        jnc     NEAR $L$enc_no_key_aliasing
+        sub     rsp,r15
+$L$enc_no_key_aliasing:
+
+        lea     r14,[rsi]
+        lea     r15,[((-192))+rdx*1+rsi]
+        shr     rdx,4
+
+        call    _aesni_ctr32_6x
+        vpshufb xmm8,xmm9,xmm0
+        vpshufb xmm2,xmm10,xmm0
+        vmovdqu XMMWORD[112+rsp],xmm8
+        vpshufb xmm4,xmm11,xmm0
+        vmovdqu XMMWORD[96+rsp],xmm2
+        vpshufb xmm5,xmm12,xmm0
+        vmovdqu XMMWORD[80+rsp],xmm4
+        vpshufb xmm6,xmm13,xmm0
+        vmovdqu XMMWORD[64+rsp],xmm5
+        vpshufb xmm7,xmm14,xmm0
+        vmovdqu XMMWORD[48+rsp],xmm6
+
+        call    _aesni_ctr32_6x
+
+        vmovdqu xmm8,XMMWORD[r9]
+        lea     r9,[((32+32))+r9]
+        sub     rdx,12
+        mov     r10,0x60*2
+        vpshufb xmm8,xmm8,xmm0
+
+        call    _aesni_ctr32_ghash_6x
+        vmovdqu xmm7,XMMWORD[32+rsp]
+        vmovdqu xmm0,XMMWORD[r11]
+        vmovdqu xmm3,XMMWORD[((0-32))+r9]
+        vpunpckhqdq     xmm1,xmm7,xmm7
+        vmovdqu xmm15,XMMWORD[((32-32))+r9]
+        vmovups XMMWORD[(-96)+rsi],xmm9
+        vpshufb xmm9,xmm9,xmm0
+        vpxor   xmm1,xmm1,xmm7
+        vmovups XMMWORD[(-80)+rsi],xmm10
+        vpshufb xmm10,xmm10,xmm0
+        vmovups XMMWORD[(-64)+rsi],xmm11
+        vpshufb xmm11,xmm11,xmm0
+        vmovups XMMWORD[(-48)+rsi],xmm12
+        vpshufb xmm12,xmm12,xmm0
+        vmovups XMMWORD[(-32)+rsi],xmm13
+        vpshufb xmm13,xmm13,xmm0
+        vmovups XMMWORD[(-16)+rsi],xmm14
+        vpshufb xmm14,xmm14,xmm0
+        vmovdqu XMMWORD[16+rsp],xmm9
+        vmovdqu xmm6,XMMWORD[48+rsp]
+        vmovdqu xmm0,XMMWORD[((16-32))+r9]
+        vpunpckhqdq     xmm2,xmm6,xmm6
+        vpclmulqdq      xmm5,xmm7,xmm3,0x00
+        vpxor   xmm2,xmm2,xmm6
+        vpclmulqdq      xmm7,xmm7,xmm3,0x11
+        vpclmulqdq      xmm1,xmm1,xmm15,0x00
+
+        vmovdqu xmm9,XMMWORD[64+rsp]
+        vpclmulqdq      xmm4,xmm6,xmm0,0x00
+        vmovdqu xmm3,XMMWORD[((48-32))+r9]
+        vpxor   xmm4,xmm4,xmm5
+        vpunpckhqdq     xmm5,xmm9,xmm9
+        vpclmulqdq      xmm6,xmm6,xmm0,0x11
+        vpxor   xmm5,xmm5,xmm9
+        vpxor   xmm6,xmm6,xmm7
+        vpclmulqdq      xmm2,xmm2,xmm15,0x10
+        vmovdqu xmm15,XMMWORD[((80-32))+r9]
+        vpxor   xmm2,xmm2,xmm1
+
+        vmovdqu xmm1,XMMWORD[80+rsp]
+        vpclmulqdq      xmm7,xmm9,xmm3,0x00
+        vmovdqu xmm0,XMMWORD[((64-32))+r9]
+        vpxor   xmm7,xmm7,xmm4
+        vpunpckhqdq     xmm4,xmm1,xmm1
+        vpclmulqdq      xmm9,xmm9,xmm3,0x11
+        vpxor   xmm4,xmm4,xmm1
+        vpxor   xmm9,xmm9,xmm6
+        vpclmulqdq      xmm5,xmm5,xmm15,0x00
+        vpxor   xmm5,xmm5,xmm2
+
+        vmovdqu xmm2,XMMWORD[96+rsp]
+        vpclmulqdq      xmm6,xmm1,xmm0,0x00
+        vmovdqu xmm3,XMMWORD[((96-32))+r9]
+        vpxor   xmm6,xmm6,xmm7
+        vpunpckhqdq     xmm7,xmm2,xmm2
+        vpclmulqdq      xmm1,xmm1,xmm0,0x11
+        vpxor   xmm7,xmm7,xmm2
+        vpxor   xmm1,xmm1,xmm9
+        vpclmulqdq      xmm4,xmm4,xmm15,0x10
+        vmovdqu xmm15,XMMWORD[((128-32))+r9]
+        vpxor   xmm4,xmm4,xmm5
+
+        vpxor   xmm8,xmm8,XMMWORD[112+rsp]
+        vpclmulqdq      xmm5,xmm2,xmm3,0x00
+        vmovdqu xmm0,XMMWORD[((112-32))+r9]
+        vpunpckhqdq     xmm9,xmm8,xmm8
+        vpxor   xmm5,xmm5,xmm6
+        vpclmulqdq      xmm2,xmm2,xmm3,0x11
+        vpxor   xmm9,xmm9,xmm8
+        vpxor   xmm2,xmm2,xmm1
+        vpclmulqdq      xmm7,xmm7,xmm15,0x00
+        vpxor   xmm4,xmm7,xmm4
+
+        vpclmulqdq      xmm6,xmm8,xmm0,0x00
+        vmovdqu xmm3,XMMWORD[((0-32))+r9]
+        vpunpckhqdq     xmm1,xmm14,xmm14
+        vpclmulqdq      xmm8,xmm8,xmm0,0x11
+        vpxor   xmm1,xmm1,xmm14
+        vpxor   xmm5,xmm6,xmm5
+        vpclmulqdq      xmm9,xmm9,xmm15,0x10
+        vmovdqu xmm15,XMMWORD[((32-32))+r9]
+        vpxor   xmm7,xmm8,xmm2
+        vpxor   xmm6,xmm9,xmm4
+
+        vmovdqu xmm0,XMMWORD[((16-32))+r9]
+        vpxor   xmm9,xmm7,xmm5
+        vpclmulqdq      xmm4,xmm14,xmm3,0x00
+        vpxor   xmm6,xmm6,xmm9
+        vpunpckhqdq     xmm2,xmm13,xmm13
+        vpclmulqdq      xmm14,xmm14,xmm3,0x11
+        vpxor   xmm2,xmm2,xmm13
+        vpslldq xmm9,xmm6,8
+        vpclmulqdq      xmm1,xmm1,xmm15,0x00
+        vpxor   xmm8,xmm5,xmm9
+        vpsrldq xmm6,xmm6,8
+        vpxor   xmm7,xmm7,xmm6
+
+        vpclmulqdq      xmm5,xmm13,xmm0,0x00
+        vmovdqu xmm3,XMMWORD[((48-32))+r9]
+        vpxor   xmm5,xmm5,xmm4
+        vpunpckhqdq     xmm9,xmm12,xmm12
+        vpclmulqdq      xmm13,xmm13,xmm0,0x11
+        vpxor   xmm9,xmm9,xmm12
+        vpxor   xmm13,xmm13,xmm14
+        vpalignr        xmm14,xmm8,xmm8,8
+        vpclmulqdq      xmm2,xmm2,xmm15,0x10
+        vmovdqu xmm15,XMMWORD[((80-32))+r9]
+        vpxor   xmm2,xmm2,xmm1
+
+        vpclmulqdq      xmm4,xmm12,xmm3,0x00
+        vmovdqu xmm0,XMMWORD[((64-32))+r9]
+        vpxor   xmm4,xmm4,xmm5
+        vpunpckhqdq     xmm1,xmm11,xmm11
+        vpclmulqdq      xmm12,xmm12,xmm3,0x11
+        vpxor   xmm1,xmm1,xmm11
+        vpxor   xmm12,xmm12,xmm13
+        vxorps  xmm7,xmm7,XMMWORD[16+rsp]
+        vpclmulqdq      xmm9,xmm9,xmm15,0x00
+        vpxor   xmm9,xmm9,xmm2
+
+        vpclmulqdq      xmm8,xmm8,XMMWORD[16+r11],0x10
+        vxorps  xmm8,xmm8,xmm14
+
+        vpclmulqdq      xmm5,xmm11,xmm0,0x00
+        vmovdqu xmm3,XMMWORD[((96-32))+r9]
+        vpxor   xmm5,xmm5,xmm4
+        vpunpckhqdq     xmm2,xmm10,xmm10
+        vpclmulqdq      xmm11,xmm11,xmm0,0x11
+        vpxor   xmm2,xmm2,xmm10
+        vpalignr        xmm14,xmm8,xmm8,8
+        vpxor   xmm11,xmm11,xmm12
+        vpclmulqdq      xmm1,xmm1,xmm15,0x10
+        vmovdqu xmm15,XMMWORD[((128-32))+r9]
+        vpxor   xmm1,xmm1,xmm9
+
+        vxorps  xmm14,xmm14,xmm7
+        vpclmulqdq      xmm8,xmm8,XMMWORD[16+r11],0x10
+        vxorps  xmm8,xmm8,xmm14
+
+        vpclmulqdq      xmm4,xmm10,xmm3,0x00
+        vmovdqu xmm0,XMMWORD[((112-32))+r9]
+        vpxor   xmm4,xmm4,xmm5
+        vpunpckhqdq     xmm9,xmm8,xmm8
+        vpclmulqdq      xmm10,xmm10,xmm3,0x11
+        vpxor   xmm9,xmm9,xmm8
+        vpxor   xmm10,xmm10,xmm11
+        vpclmulqdq      xmm2,xmm2,xmm15,0x00
+        vpxor   xmm2,xmm2,xmm1
+
+        vpclmulqdq      xmm5,xmm8,xmm0,0x00
+        vpclmulqdq      xmm7,xmm8,xmm0,0x11
+        vpxor   xmm5,xmm5,xmm4
+        vpclmulqdq      xmm6,xmm9,xmm15,0x10
+        vpxor   xmm7,xmm7,xmm10
+        vpxor   xmm6,xmm6,xmm2
+
+        vpxor   xmm4,xmm7,xmm5
+        vpxor   xmm6,xmm6,xmm4
+        vpslldq xmm1,xmm6,8
+        vmovdqu xmm3,XMMWORD[16+r11]
+        vpsrldq xmm6,xmm6,8
+        vpxor   xmm8,xmm5,xmm1
+        vpxor   xmm7,xmm7,xmm6
+
+        vpalignr        xmm2,xmm8,xmm8,8
+        vpclmulqdq      xmm8,xmm8,xmm3,0x10
+        vpxor   xmm8,xmm8,xmm2
+
+        vpalignr        xmm2,xmm8,xmm8,8
+        vpclmulqdq      xmm8,xmm8,xmm3,0x10
+        vpxor   xmm2,xmm2,xmm7
+        vpxor   xmm8,xmm8,xmm2
+        vpshufb xmm8,xmm8,XMMWORD[r11]
+        vmovdqu XMMWORD[(-64)+r9],xmm8
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+        movaps  xmm13,XMMWORD[((-104))+rax]
+        movaps  xmm14,XMMWORD[((-88))+rax]
+        movaps  xmm15,XMMWORD[((-72))+rax]
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$gcm_enc_abort:
+        mov     rax,r10
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_aesni_gcm_encrypt:
+ALIGN   64
+$L$bswap_mask:
+DB      15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+$L$poly:
+DB      0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0xc2
+$L$one_msb:
+DB      0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
+$L$two_lsb:
+DB      2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
+$L$one_lsb:
+DB      1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
+DB      65,69,83,45,78,73,32,71,67,77,32,109,111,100,117,108
+DB      101,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82
+DB      89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112
+DB      114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
+ALIGN   64
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+gcm_se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[120+r8]
+
+        mov     r15,QWORD[((-48))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     rbx,QWORD[((-8))+rax]
+        mov     QWORD[240+r8],r15
+        mov     QWORD[232+r8],r14
+        mov     QWORD[224+r8],r13
+        mov     QWORD[216+r8],r12
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[144+r8],rbx
+
+        lea     rsi,[((-216))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+$L$common_seh_tail:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_aesni_gcm_decrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_gcm_decrypt wrt ..imagebase
+        DD      $L$SEH_gcm_dec_info wrt ..imagebase
+
+        DD      $L$SEH_begin_aesni_gcm_encrypt wrt ..imagebase
+        DD      $L$SEH_end_aesni_gcm_encrypt wrt ..imagebase
+        DD      $L$SEH_gcm_enc_info wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_gcm_dec_info:
+DB      9,0,0,0
+        DD      gcm_se_handler wrt ..imagebase
+        DD      $L$gcm_dec_body wrt ..imagebase,$L$gcm_dec_abort wrt ..imagebase
+$L$SEH_gcm_enc_info:
+DB      9,0,0,0
+        DD      gcm_se_handler wrt ..imagebase
+        DD      $L$gcm_enc_body wrt ..imagebase,$L$gcm_enc_abort wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm
new file mode 100644
index 0000000000..3d67e12775
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm
@@ -0,0 +1,2077 @@
+; Copyright 2010-2019 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  gcm_gmult_4bit
+
+ALIGN   16
+gcm_gmult_4bit:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_gcm_gmult_4bit:
+        mov     rdi,rcx
+        mov     rsi,rdx
+
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        sub     rsp,280
+
+$L$gmult_prologue:
+
+        movzx   r8,BYTE[15+rdi]
+        lea     r11,[$L$rem_4bit]
+        xor     rax,rax
+        xor     rbx,rbx
+        mov     al,r8b
+        mov     bl,r8b
+        shl     al,4
+        mov     rcx,14
+        mov     r8,QWORD[8+rax*1+rsi]
+        mov     r9,QWORD[rax*1+rsi]
+        and     bl,0xf0
+        mov     rdx,r8
+        jmp     NEAR $L$oop1
+
+ALIGN   16
+$L$oop1:
+        shr     r8,4
+        and     rdx,0xf
+        mov     r10,r9
+        mov     al,BYTE[rcx*1+rdi]
+        shr     r9,4
+        xor     r8,QWORD[8+rbx*1+rsi]
+        shl     r10,60
+        xor     r9,QWORD[rbx*1+rsi]
+        mov     bl,al
+        xor     r9,QWORD[rdx*8+r11]
+        mov     rdx,r8
+        shl     al,4
+        xor     r8,r10
+        dec     rcx
+        js      NEAR $L$break1
+
+        shr     r8,4
+        and     rdx,0xf
+        mov     r10,r9
+        shr     r9,4
+        xor     r8,QWORD[8+rax*1+rsi]
+        shl     r10,60
+        xor     r9,QWORD[rax*1+rsi]
+        and     bl,0xf0
+        xor     r9,QWORD[rdx*8+r11]
+        mov     rdx,r8
+        xor     r8,r10
+        jmp     NEAR $L$oop1
+
+ALIGN   16
+$L$break1:
+        shr     r8,4
+        and     rdx,0xf
+        mov     r10,r9
+        shr     r9,4
+        xor     r8,QWORD[8+rax*1+rsi]
+        shl     r10,60
+        xor     r9,QWORD[rax*1+rsi]
+        and     bl,0xf0
+        xor     r9,QWORD[rdx*8+r11]
+        mov     rdx,r8
+        xor     r8,r10
+
+        shr     r8,4
+        and     rdx,0xf
+        mov     r10,r9
+        shr     r9,4
+        xor     r8,QWORD[8+rbx*1+rsi]
+        shl     r10,60
+        xor     r9,QWORD[rbx*1+rsi]
+        xor     r8,r10
+        xor     r9,QWORD[rdx*8+r11]
+
+        bswap   r8
+        bswap   r9
+        mov     QWORD[8+rdi],r8
+        mov     QWORD[rdi],r9
+
+        lea     rsi,[((280+48))+rsp]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$gmult_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_gcm_gmult_4bit:
+global  gcm_ghash_4bit
+
+ALIGN   16
+gcm_ghash_4bit:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_gcm_ghash_4bit:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        sub     rsp,280
+
+$L$ghash_prologue:
+        mov     r14,rdx
+        mov     r15,rcx
+        sub     rsi,-128
+        lea     rbp,[((16+128))+rsp]
+        xor     edx,edx
+        mov     r8,QWORD[((0+0-128))+rsi]
+        mov     rax,QWORD[((0+8-128))+rsi]
+        mov     dl,al
+        shr     rax,4
+        mov     r10,r8
+        shr     r8,4
+        mov     r9,QWORD[((16+0-128))+rsi]
+        shl     dl,4
+        mov     rbx,QWORD[((16+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[rsp],dl
+        or      rax,r10
+        mov     dl,bl
+        shr     rbx,4
+        mov     r10,r9
+        shr     r9,4
+        mov     QWORD[rbp],r8
+        mov     r8,QWORD[((32+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((0-128))+rbp],rax
+        mov     rax,QWORD[((32+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[1+rsp],dl
+        or      rbx,r10
+        mov     dl,al
+        shr     rax,4
+        mov     r10,r8
+        shr     r8,4
+        mov     QWORD[8+rbp],r9
+        mov     r9,QWORD[((48+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((8-128))+rbp],rbx
+        mov     rbx,QWORD[((48+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[2+rsp],dl
+        or      rax,r10
+        mov     dl,bl
+        shr     rbx,4
+        mov     r10,r9
+        shr     r9,4
+        mov     QWORD[16+rbp],r8
+        mov     r8,QWORD[((64+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((16-128))+rbp],rax
+        mov     rax,QWORD[((64+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[3+rsp],dl
+        or      rbx,r10
+        mov     dl,al
+        shr     rax,4
+        mov     r10,r8
+        shr     r8,4
+        mov     QWORD[24+rbp],r9
+        mov     r9,QWORD[((80+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((24-128))+rbp],rbx
+        mov     rbx,QWORD[((80+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[4+rsp],dl
+        or      rax,r10
+        mov     dl,bl
+        shr     rbx,4
+        mov     r10,r9
+        shr     r9,4
+        mov     QWORD[32+rbp],r8
+        mov     r8,QWORD[((96+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((32-128))+rbp],rax
+        mov     rax,QWORD[((96+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[5+rsp],dl
+        or      rbx,r10
+        mov     dl,al
+        shr     rax,4
+        mov     r10,r8
+        shr     r8,4
+        mov     QWORD[40+rbp],r9
+        mov     r9,QWORD[((112+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((40-128))+rbp],rbx
+        mov     rbx,QWORD[((112+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[6+rsp],dl
+        or      rax,r10
+        mov     dl,bl
+        shr     rbx,4
+        mov     r10,r9
+        shr     r9,4
+        mov     QWORD[48+rbp],r8
+        mov     r8,QWORD[((128+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((48-128))+rbp],rax
+        mov     rax,QWORD[((128+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[7+rsp],dl
+        or      rbx,r10
+        mov     dl,al
+        shr     rax,4
+        mov     r10,r8
+        shr     r8,4
+        mov     QWORD[56+rbp],r9
+        mov     r9,QWORD[((144+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((56-128))+rbp],rbx
+        mov     rbx,QWORD[((144+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[8+rsp],dl
+        or      rax,r10
+        mov     dl,bl
+        shr     rbx,4
+        mov     r10,r9
+        shr     r9,4
+        mov     QWORD[64+rbp],r8
+        mov     r8,QWORD[((160+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((64-128))+rbp],rax
+        mov     rax,QWORD[((160+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[9+rsp],dl
+        or      rbx,r10
+        mov     dl,al
+        shr     rax,4
+        mov     r10,r8
+        shr     r8,4
+        mov     QWORD[72+rbp],r9
+        mov     r9,QWORD[((176+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((72-128))+rbp],rbx
+        mov     rbx,QWORD[((176+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[10+rsp],dl
+        or      rax,r10
+        mov     dl,bl
+        shr     rbx,4
+        mov     r10,r9
+        shr     r9,4
+        mov     QWORD[80+rbp],r8
+        mov     r8,QWORD[((192+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((80-128))+rbp],rax
+        mov     rax,QWORD[((192+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[11+rsp],dl
+        or      rbx,r10
+        mov     dl,al
+        shr     rax,4
+        mov     r10,r8
+        shr     r8,4
+        mov     QWORD[88+rbp],r9
+        mov     r9,QWORD[((208+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((88-128))+rbp],rbx
+        mov     rbx,QWORD[((208+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[12+rsp],dl
+        or      rax,r10
+        mov     dl,bl
+        shr     rbx,4
+        mov     r10,r9
+        shr     r9,4
+        mov     QWORD[96+rbp],r8
+        mov     r8,QWORD[((224+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((96-128))+rbp],rax
+        mov     rax,QWORD[((224+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[13+rsp],dl
+        or      rbx,r10
+        mov     dl,al
+        shr     rax,4
+        mov     r10,r8
+        shr     r8,4
+        mov     QWORD[104+rbp],r9
+        mov     r9,QWORD[((240+0-128))+rsi]
+        shl     dl,4
+        mov     QWORD[((104-128))+rbp],rbx
+        mov     rbx,QWORD[((240+8-128))+rsi]
+        shl     r10,60
+        mov     BYTE[14+rsp],dl
+        or      rax,r10
+        mov     dl,bl
+        shr     rbx,4
+        mov     r10,r9
+        shr     r9,4
+        mov     QWORD[112+rbp],r8
+        shl     dl,4
+        mov     QWORD[((112-128))+rbp],rax
+        shl     r10,60
+        mov     BYTE[15+rsp],dl
+        or      rbx,r10
+        mov     QWORD[120+rbp],r9
+        mov     QWORD[((120-128))+rbp],rbx
+        add     rsi,-128
+        mov     r8,QWORD[8+rdi]
+        mov     r9,QWORD[rdi]
+        add     r15,r14
+        lea     r11,[$L$rem_8bit]
+        jmp     NEAR $L$outer_loop
+ALIGN   16
+$L$outer_loop:
+        xor     r9,QWORD[r14]
+        mov     rdx,QWORD[8+r14]
+        lea     r14,[16+r14]
+        xor     rdx,r8
+        mov     QWORD[rdi],r9
+        mov     QWORD[8+rdi],rdx
+        shr     rdx,32
+        xor     rax,rax
+        rol     edx,8
+        mov     al,dl
+        movzx   ebx,dl
+        shl     al,4
+        shr     ebx,4
+        rol     edx,8
+        mov     r8,QWORD[8+rax*1+rsi]
+        mov     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        movzx   ecx,dl
+        shl     al,4
+        movzx   r12,BYTE[rbx*1+rsp]
+        shr     ecx,4
+        xor     r12,r8
+        mov     r10,r9
+        shr     r8,8
+        movzx   r12,r12b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rbx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rbx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r12,WORD[r12*2+r11]
+        movzx   ebx,dl
+        shl     al,4
+        movzx   r13,BYTE[rcx*1+rsp]
+        shr     ebx,4
+        shl     r12,48
+        xor     r13,r8
+        mov     r10,r9
+        xor     r9,r12
+        shr     r8,8
+        movzx   r13,r13b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rcx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rcx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r13,WORD[r13*2+r11]
+        movzx   ecx,dl
+        shl     al,4
+        movzx   r12,BYTE[rbx*1+rsp]
+        shr     ecx,4
+        shl     r13,48
+        xor     r12,r8
+        mov     r10,r9
+        xor     r9,r13
+        shr     r8,8
+        movzx   r12,r12b
+        mov     edx,DWORD[8+rdi]
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rbx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rbx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r12,WORD[r12*2+r11]
+        movzx   ebx,dl
+        shl     al,4
+        movzx   r13,BYTE[rcx*1+rsp]
+        shr     ebx,4
+        shl     r12,48
+        xor     r13,r8
+        mov     r10,r9
+        xor     r9,r12
+        shr     r8,8
+        movzx   r13,r13b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rcx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rcx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r13,WORD[r13*2+r11]
+        movzx   ecx,dl
+        shl     al,4
+        movzx   r12,BYTE[rbx*1+rsp]
+        shr     ecx,4
+        shl     r13,48
+        xor     r12,r8
+        mov     r10,r9
+        xor     r9,r13
+        shr     r8,8
+        movzx   r12,r12b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rbx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rbx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r12,WORD[r12*2+r11]
+        movzx   ebx,dl
+        shl     al,4
+        movzx   r13,BYTE[rcx*1+rsp]
+        shr     ebx,4
+        shl     r12,48
+        xor     r13,r8
+        mov     r10,r9
+        xor     r9,r12
+        shr     r8,8
+        movzx   r13,r13b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rcx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rcx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r13,WORD[r13*2+r11]
+        movzx   ecx,dl
+        shl     al,4
+        movzx   r12,BYTE[rbx*1+rsp]
+        shr     ecx,4
+        shl     r13,48
+        xor     r12,r8
+        mov     r10,r9
+        xor     r9,r13
+        shr     r8,8
+        movzx   r12,r12b
+        mov     edx,DWORD[4+rdi]
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rbx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rbx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r12,WORD[r12*2+r11]
+        movzx   ebx,dl
+        shl     al,4
+        movzx   r13,BYTE[rcx*1+rsp]
+        shr     ebx,4
+        shl     r12,48
+        xor     r13,r8
+        mov     r10,r9
+        xor     r9,r12
+        shr     r8,8
+        movzx   r13,r13b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rcx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rcx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r13,WORD[r13*2+r11]
+        movzx   ecx,dl
+        shl     al,4
+        movzx   r12,BYTE[rbx*1+rsp]
+        shr     ecx,4
+        shl     r13,48
+        xor     r12,r8
+        mov     r10,r9
+        xor     r9,r13
+        shr     r8,8
+        movzx   r12,r12b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rbx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rbx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r12,WORD[r12*2+r11]
+        movzx   ebx,dl
+        shl     al,4
+        movzx   r13,BYTE[rcx*1+rsp]
+        shr     ebx,4
+        shl     r12,48
+        xor     r13,r8
+        mov     r10,r9
+        xor     r9,r12
+        shr     r8,8
+        movzx   r13,r13b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rcx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rcx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r13,WORD[r13*2+r11]
+        movzx   ecx,dl
+        shl     al,4
+        movzx   r12,BYTE[rbx*1+rsp]
+        shr     ecx,4
+        shl     r13,48
+        xor     r12,r8
+        mov     r10,r9
+        xor     r9,r13
+        shr     r8,8
+        movzx   r12,r12b
+        mov     edx,DWORD[rdi]
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rbx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rbx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r12,WORD[r12*2+r11]
+        movzx   ebx,dl
+        shl     al,4
+        movzx   r13,BYTE[rcx*1+rsp]
+        shr     ebx,4
+        shl     r12,48
+        xor     r13,r8
+        mov     r10,r9
+        xor     r9,r12
+        shr     r8,8
+        movzx   r13,r13b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rcx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rcx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r13,WORD[r13*2+r11]
+        movzx   ecx,dl
+        shl     al,4
+        movzx   r12,BYTE[rbx*1+rsp]
+        shr     ecx,4
+        shl     r13,48
+        xor     r12,r8
+        mov     r10,r9
+        xor     r9,r13
+        shr     r8,8
+        movzx   r12,r12b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rbx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rbx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r12,WORD[r12*2+r11]
+        movzx   ebx,dl
+        shl     al,4
+        movzx   r13,BYTE[rcx*1+rsp]
+        shr     ebx,4
+        shl     r12,48
+        xor     r13,r8
+        mov     r10,r9
+        xor     r9,r12
+        shr     r8,8
+        movzx   r13,r13b
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rcx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rcx*8+rbp]
+        rol     edx,8
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        mov     al,dl
+        xor     r8,r10
+        movzx   r13,WORD[r13*2+r11]
+        movzx   ecx,dl
+        shl     al,4
+        movzx   r12,BYTE[rbx*1+rsp]
+        and     ecx,240
+        shl     r13,48
+        xor     r12,r8
+        mov     r10,r9
+        xor     r9,r13
+        shr     r8,8
+        movzx   r12,r12b
+        mov     edx,DWORD[((-4))+rdi]
+        shr     r9,8
+        xor     r8,QWORD[((-128))+rbx*8+rbp]
+        shl     r10,56
+        xor     r9,QWORD[rbx*8+rbp]
+        movzx   r12,WORD[r12*2+r11]
+        xor     r8,QWORD[8+rax*1+rsi]
+        xor     r9,QWORD[rax*1+rsi]
+        shl     r12,48
+        xor     r8,r10
+        xor     r9,r12
+        movzx   r13,r8b
+        shr     r8,4
+        mov     r10,r9
+        shl     r13b,4
+        shr     r9,4
+        xor     r8,QWORD[8+rcx*1+rsi]
+        movzx   r13,WORD[r13*2+r11]
+        shl     r10,60
+        xor     r9,QWORD[rcx*1+rsi]
+        xor     r8,r10
+        shl     r13,48
+        bswap   r8
+        xor     r9,r13
+        bswap   r9
+        cmp     r14,r15
+        jb      NEAR $L$outer_loop
+        mov     QWORD[8+rdi],r8
+        mov     QWORD[rdi],r9
+
+        lea     rsi,[((280+48))+rsp]
+
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$ghash_epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_gcm_ghash_4bit:
+global  gcm_init_clmul
+
+ALIGN   16
+gcm_init_clmul:
+
+$L$_init_clmul:
+$L$SEH_begin_gcm_init_clmul:
+
+DB      0x48,0x83,0xec,0x18
+DB      0x0f,0x29,0x34,0x24
+        movdqu  xmm2,XMMWORD[rdx]
+        pshufd  xmm2,xmm2,78
+
+
+        pshufd  xmm4,xmm2,255
+        movdqa  xmm3,xmm2
+        psllq   xmm2,1
+        pxor    xmm5,xmm5
+        psrlq   xmm3,63
+        pcmpgtd xmm5,xmm4
+        pslldq  xmm3,8
+        por     xmm2,xmm3
+
+
+        pand    xmm5,XMMWORD[$L$0x1c2_polynomial]
+        pxor    xmm2,xmm5
+
+
+        pshufd  xmm6,xmm2,78
+        movdqa  xmm0,xmm2
+        pxor    xmm6,xmm2
+        movdqa  xmm1,xmm0
+        pshufd  xmm3,xmm0,78
+        pxor    xmm3,xmm0
+DB      102,15,58,68,194,0
+DB      102,15,58,68,202,17
+DB      102,15,58,68,222,0
+        pxor    xmm3,xmm0
+        pxor    xmm3,xmm1
+
+        movdqa  xmm4,xmm3
+        psrldq  xmm3,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm3
+        pxor    xmm0,xmm4
+
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+
+
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+        pshufd  xmm3,xmm2,78
+        pshufd  xmm4,xmm0,78
+        pxor    xmm3,xmm2
+        movdqu  XMMWORD[rcx],xmm2
+        pxor    xmm4,xmm0
+        movdqu  XMMWORD[16+rcx],xmm0
+DB      102,15,58,15,227,8
+        movdqu  XMMWORD[32+rcx],xmm4
+        movdqa  xmm1,xmm0
+        pshufd  xmm3,xmm0,78
+        pxor    xmm3,xmm0
+DB      102,15,58,68,194,0
+DB      102,15,58,68,202,17
+DB      102,15,58,68,222,0
+        pxor    xmm3,xmm0
+        pxor    xmm3,xmm1
+
+        movdqa  xmm4,xmm3
+        psrldq  xmm3,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm3
+        pxor    xmm0,xmm4
+
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+
+
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+        movdqa  xmm5,xmm0
+        movdqa  xmm1,xmm0
+        pshufd  xmm3,xmm0,78
+        pxor    xmm3,xmm0
+DB      102,15,58,68,194,0
+DB      102,15,58,68,202,17
+DB      102,15,58,68,222,0
+        pxor    xmm3,xmm0
+        pxor    xmm3,xmm1
+
+        movdqa  xmm4,xmm3
+        psrldq  xmm3,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm3
+        pxor    xmm0,xmm4
+
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+
+
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+        pshufd  xmm3,xmm5,78
+        pshufd  xmm4,xmm0,78
+        pxor    xmm3,xmm5
+        movdqu  XMMWORD[48+rcx],xmm5
+        pxor    xmm4,xmm0
+        movdqu  XMMWORD[64+rcx],xmm0
+DB      102,15,58,15,227,8
+        movdqu  XMMWORD[80+rcx],xmm4
+        movaps  xmm6,XMMWORD[rsp]
+        lea     rsp,[24+rsp]
+$L$SEH_end_gcm_init_clmul:
+        DB      0F3h,0C3h               ;repret
+
+
+global  gcm_gmult_clmul
+
+ALIGN   16
+gcm_gmult_clmul:
+
+$L$_gmult_clmul:
+        movdqu  xmm0,XMMWORD[rcx]
+        movdqa  xmm5,XMMWORD[$L$bswap_mask]
+        movdqu  xmm2,XMMWORD[rdx]
+        movdqu  xmm4,XMMWORD[32+rdx]
+DB      102,15,56,0,197
+        movdqa  xmm1,xmm0
+        pshufd  xmm3,xmm0,78
+        pxor    xmm3,xmm0
+DB      102,15,58,68,194,0
+DB      102,15,58,68,202,17
+DB      102,15,58,68,220,0
+        pxor    xmm3,xmm0
+        pxor    xmm3,xmm1
+
+        movdqa  xmm4,xmm3
+        psrldq  xmm3,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm3
+        pxor    xmm0,xmm4
+
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+
+
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+DB      102,15,56,0,197
+        movdqu  XMMWORD[rcx],xmm0
+        DB      0F3h,0C3h               ;repret
+
+
+global  gcm_ghash_clmul
+
+ALIGN   32
+gcm_ghash_clmul:
+
+$L$_ghash_clmul:
+        lea     rax,[((-136))+rsp]
+$L$SEH_begin_gcm_ghash_clmul:
+
+DB      0x48,0x8d,0x60,0xe0
+DB      0x0f,0x29,0x70,0xe0
+DB      0x0f,0x29,0x78,0xf0
+DB      0x44,0x0f,0x29,0x00
+DB      0x44,0x0f,0x29,0x48,0x10
+DB      0x44,0x0f,0x29,0x50,0x20
+DB      0x44,0x0f,0x29,0x58,0x30
+DB      0x44,0x0f,0x29,0x60,0x40
+DB      0x44,0x0f,0x29,0x68,0x50
+DB      0x44,0x0f,0x29,0x70,0x60
+DB      0x44,0x0f,0x29,0x78,0x70
+        movdqa  xmm10,XMMWORD[$L$bswap_mask]
+
+        movdqu  xmm0,XMMWORD[rcx]
+        movdqu  xmm2,XMMWORD[rdx]
+        movdqu  xmm7,XMMWORD[32+rdx]
+DB      102,65,15,56,0,194
+
+        sub     r9,0x10
+        jz      NEAR $L$odd_tail
+
+        movdqu  xmm6,XMMWORD[16+rdx]
+        mov     eax,DWORD[((OPENSSL_ia32cap_P+4))]
+        cmp     r9,0x30
+        jb      NEAR $L$skip4x
+
+        and     eax,71303168
+        cmp     eax,4194304
+        je      NEAR $L$skip4x
+
+        sub     r9,0x30
+        mov     rax,0xA040608020C0E000
+        movdqu  xmm14,XMMWORD[48+rdx]
+        movdqu  xmm15,XMMWORD[64+rdx]
+
+
+
+
+        movdqu  xmm3,XMMWORD[48+r8]
+        movdqu  xmm11,XMMWORD[32+r8]
+DB      102,65,15,56,0,218
+DB      102,69,15,56,0,218
+        movdqa  xmm5,xmm3
+        pshufd  xmm4,xmm3,78
+        pxor    xmm4,xmm3
+DB      102,15,58,68,218,0
+DB      102,15,58,68,234,17
+DB      102,15,58,68,231,0
+
+        movdqa  xmm13,xmm11
+        pshufd  xmm12,xmm11,78
+        pxor    xmm12,xmm11
+DB      102,68,15,58,68,222,0
+DB      102,68,15,58,68,238,17
+DB      102,68,15,58,68,231,16
+        xorps   xmm3,xmm11
+        xorps   xmm5,xmm13
+        movups  xmm7,XMMWORD[80+rdx]
+        xorps   xmm4,xmm12
+
+        movdqu  xmm11,XMMWORD[16+r8]
+        movdqu  xmm8,XMMWORD[r8]
+DB      102,69,15,56,0,218
+DB      102,69,15,56,0,194
+        movdqa  xmm13,xmm11
+        pshufd  xmm12,xmm11,78
+        pxor    xmm0,xmm8
+        pxor    xmm12,xmm11
+DB      102,69,15,58,68,222,0
+        movdqa  xmm1,xmm0
+        pshufd  xmm8,xmm0,78
+        pxor    xmm8,xmm0
+DB      102,69,15,58,68,238,17
+DB      102,68,15,58,68,231,0
+        xorps   xmm3,xmm11
+        xorps   xmm5,xmm13
+
+        lea     r8,[64+r8]
+        sub     r9,0x40
+        jc      NEAR $L$tail4x
+
+        jmp     NEAR $L$mod4_loop
+ALIGN   32
+$L$mod4_loop:
+DB      102,65,15,58,68,199,0
+        xorps   xmm4,xmm12
+        movdqu  xmm11,XMMWORD[48+r8]
+DB      102,69,15,56,0,218
+DB      102,65,15,58,68,207,17
+        xorps   xmm0,xmm3
+        movdqu  xmm3,XMMWORD[32+r8]
+        movdqa  xmm13,xmm11
+DB      102,68,15,58,68,199,16
+        pshufd  xmm12,xmm11,78
+        xorps   xmm1,xmm5
+        pxor    xmm12,xmm11
+DB      102,65,15,56,0,218
+        movups  xmm7,XMMWORD[32+rdx]
+        xorps   xmm8,xmm4
+DB      102,68,15,58,68,218,0
+        pshufd  xmm4,xmm3,78
+
+        pxor    xmm8,xmm0
+        movdqa  xmm5,xmm3
+        pxor    xmm8,xmm1
+        pxor    xmm4,xmm3
+        movdqa  xmm9,xmm8
+DB      102,68,15,58,68,234,17
+        pslldq  xmm8,8
+        psrldq  xmm9,8
+        pxor    xmm0,xmm8
+        movdqa  xmm8,XMMWORD[$L$7_mask]
+        pxor    xmm1,xmm9
+DB      102,76,15,110,200
+
+        pand    xmm8,xmm0
+DB      102,69,15,56,0,200
+        pxor    xmm9,xmm0
+DB      102,68,15,58,68,231,0
+        psllq   xmm9,57
+        movdqa  xmm8,xmm9
+        pslldq  xmm9,8
+DB      102,15,58,68,222,0
+        psrldq  xmm8,8
+        pxor    xmm0,xmm9
+        pxor    xmm1,xmm8
+        movdqu  xmm8,XMMWORD[r8]
+
+        movdqa  xmm9,xmm0
+        psrlq   xmm0,1
+DB      102,15,58,68,238,17
+        xorps   xmm3,xmm11
+        movdqu  xmm11,XMMWORD[16+r8]
+DB      102,69,15,56,0,218
+DB      102,15,58,68,231,16
+        xorps   xmm5,xmm13
+        movups  xmm7,XMMWORD[80+rdx]
+DB      102,69,15,56,0,194
+        pxor    xmm1,xmm9
+        pxor    xmm9,xmm0
+        psrlq   xmm0,5
+
+        movdqa  xmm13,xmm11
+        pxor    xmm4,xmm12
+        pshufd  xmm12,xmm11,78
+        pxor    xmm0,xmm9
+        pxor    xmm1,xmm8
+        pxor    xmm12,xmm11
+DB      102,69,15,58,68,222,0
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+        movdqa  xmm1,xmm0
+DB      102,69,15,58,68,238,17
+        xorps   xmm3,xmm11
+        pshufd  xmm8,xmm0,78
+        pxor    xmm8,xmm0
+
+DB      102,68,15,58,68,231,0
+        xorps   xmm5,xmm13
+
+        lea     r8,[64+r8]
+        sub     r9,0x40
+        jnc     NEAR $L$mod4_loop
+
+$L$tail4x:
+DB      102,65,15,58,68,199,0
+DB      102,65,15,58,68,207,17
+DB      102,68,15,58,68,199,16
+        xorps   xmm4,xmm12
+        xorps   xmm0,xmm3
+        xorps   xmm1,xmm5
+        pxor    xmm1,xmm0
+        pxor    xmm8,xmm4
+
+        pxor    xmm8,xmm1
+        pxor    xmm1,xmm0
+
+        movdqa  xmm9,xmm8
+        psrldq  xmm8,8
+        pslldq  xmm9,8
+        pxor    xmm1,xmm8
+        pxor    xmm0,xmm9
+
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+
+
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+        add     r9,0x40
+        jz      NEAR $L$done
+        movdqu  xmm7,XMMWORD[32+rdx]
+        sub     r9,0x10
+        jz      NEAR $L$odd_tail
+$L$skip4x:
+
+
+
+
+
+        movdqu  xmm8,XMMWORD[r8]
+        movdqu  xmm3,XMMWORD[16+r8]
+DB      102,69,15,56,0,194
+DB      102,65,15,56,0,218
+        pxor    xmm0,xmm8
+
+        movdqa  xmm5,xmm3
+        pshufd  xmm4,xmm3,78
+        pxor    xmm4,xmm3
+DB      102,15,58,68,218,0
+DB      102,15,58,68,234,17
+DB      102,15,58,68,231,0
+
+        lea     r8,[32+r8]
+        nop
+        sub     r9,0x20
+        jbe     NEAR $L$even_tail
+        nop
+        jmp     NEAR $L$mod_loop
+
+ALIGN   32
+$L$mod_loop:
+        movdqa  xmm1,xmm0
+        movdqa  xmm8,xmm4
+        pshufd  xmm4,xmm0,78
+        pxor    xmm4,xmm0
+
+DB      102,15,58,68,198,0
+DB      102,15,58,68,206,17
+DB      102,15,58,68,231,16
+
+        pxor    xmm0,xmm3
+        pxor    xmm1,xmm5
+        movdqu  xmm9,XMMWORD[r8]
+        pxor    xmm8,xmm0
+DB      102,69,15,56,0,202
+        movdqu  xmm3,XMMWORD[16+r8]
+
+        pxor    xmm8,xmm1
+        pxor    xmm1,xmm9
+        pxor    xmm4,xmm8
+DB      102,65,15,56,0,218
+        movdqa  xmm8,xmm4
+        psrldq  xmm8,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm8
+        pxor    xmm0,xmm4
+
+        movdqa  xmm5,xmm3
+
+        movdqa  xmm9,xmm0
+        movdqa  xmm8,xmm0
+        psllq   xmm0,5
+        pxor    xmm8,xmm0
+DB      102,15,58,68,218,0
+        psllq   xmm0,1
+        pxor    xmm0,xmm8
+        psllq   xmm0,57
+        movdqa  xmm8,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm8,8
+        pxor    xmm0,xmm9
+        pshufd  xmm4,xmm5,78
+        pxor    xmm1,xmm8
+        pxor    xmm4,xmm5
+
+        movdqa  xmm9,xmm0
+        psrlq   xmm0,1
+DB      102,15,58,68,234,17
+        pxor    xmm1,xmm9
+        pxor    xmm9,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm9
+        lea     r8,[32+r8]
+        psrlq   xmm0,1
+DB      102,15,58,68,231,0
+        pxor    xmm0,xmm1
+
+        sub     r9,0x20
+        ja      NEAR $L$mod_loop
+
+$L$even_tail:
+        movdqa  xmm1,xmm0
+        movdqa  xmm8,xmm4
+        pshufd  xmm4,xmm0,78
+        pxor    xmm4,xmm0
+
+DB      102,15,58,68,198,0
+DB      102,15,58,68,206,17
+DB      102,15,58,68,231,16
+
+        pxor    xmm0,xmm3
+        pxor    xmm1,xmm5
+        pxor    xmm8,xmm0
+        pxor    xmm8,xmm1
+        pxor    xmm4,xmm8
+        movdqa  xmm8,xmm4
+        psrldq  xmm8,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm8
+        pxor    xmm0,xmm4
+
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+
+
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+        test    r9,r9
+        jnz     NEAR $L$done
+
+$L$odd_tail:
+        movdqu  xmm8,XMMWORD[r8]
+DB      102,69,15,56,0,194
+        pxor    xmm0,xmm8
+        movdqa  xmm1,xmm0
+        pshufd  xmm3,xmm0,78
+        pxor    xmm3,xmm0
+DB      102,15,58,68,194,0
+DB      102,15,58,68,202,17
+DB      102,15,58,68,223,0
+        pxor    xmm3,xmm0
+        pxor    xmm3,xmm1
+
+        movdqa  xmm4,xmm3
+        psrldq  xmm3,8
+        pslldq  xmm4,8
+        pxor    xmm1,xmm3
+        pxor    xmm0,xmm4
+
+        movdqa  xmm4,xmm0
+        movdqa  xmm3,xmm0
+        psllq   xmm0,5
+        pxor    xmm3,xmm0
+        psllq   xmm0,1
+        pxor    xmm0,xmm3
+        psllq   xmm0,57
+        movdqa  xmm3,xmm0
+        pslldq  xmm0,8
+        psrldq  xmm3,8
+        pxor    xmm0,xmm4
+        pxor    xmm1,xmm3
+
+
+        movdqa  xmm4,xmm0
+        psrlq   xmm0,1
+        pxor    xmm1,xmm4
+        pxor    xmm4,xmm0
+        psrlq   xmm0,5
+        pxor    xmm0,xmm4
+        psrlq   xmm0,1
+        pxor    xmm0,xmm1
+$L$done:
+DB      102,65,15,56,0,194
+        movdqu  XMMWORD[rcx],xmm0
+        movaps  xmm6,XMMWORD[rsp]
+        movaps  xmm7,XMMWORD[16+rsp]
+        movaps  xmm8,XMMWORD[32+rsp]
+        movaps  xmm9,XMMWORD[48+rsp]
+        movaps  xmm10,XMMWORD[64+rsp]
+        movaps  xmm11,XMMWORD[80+rsp]
+        movaps  xmm12,XMMWORD[96+rsp]
+        movaps  xmm13,XMMWORD[112+rsp]
+        movaps  xmm14,XMMWORD[128+rsp]
+        movaps  xmm15,XMMWORD[144+rsp]
+        lea     rsp,[168+rsp]
+$L$SEH_end_gcm_ghash_clmul:
+        DB      0F3h,0C3h               ;repret
+
+
+global  gcm_init_avx
+
+ALIGN   32
+gcm_init_avx:
+
+$L$SEH_begin_gcm_init_avx:
+
+DB      0x48,0x83,0xec,0x18
+DB      0x0f,0x29,0x34,0x24
+        vzeroupper
+
+        vmovdqu xmm2,XMMWORD[rdx]
+        vpshufd xmm2,xmm2,78
+
+
+        vpshufd xmm4,xmm2,255
+        vpsrlq  xmm3,xmm2,63
+        vpsllq  xmm2,xmm2,1
+        vpxor   xmm5,xmm5,xmm5
+        vpcmpgtd        xmm5,xmm5,xmm4
+        vpslldq xmm3,xmm3,8
+        vpor    xmm2,xmm2,xmm3
+
+
+        vpand   xmm5,xmm5,XMMWORD[$L$0x1c2_polynomial]
+        vpxor   xmm2,xmm2,xmm5
+
+        vpunpckhqdq     xmm6,xmm2,xmm2
+        vmovdqa xmm0,xmm2
+        vpxor   xmm6,xmm6,xmm2
+        mov     r10,4
+        jmp     NEAR $L$init_start_avx
+ALIGN   32
+$L$init_loop_avx:
+        vpalignr        xmm5,xmm4,xmm3,8
+        vmovdqu XMMWORD[(-16)+rcx],xmm5
+        vpunpckhqdq     xmm3,xmm0,xmm0
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm1,xmm0,xmm2,0x11
+        vpclmulqdq      xmm0,xmm0,xmm2,0x00
+        vpclmulqdq      xmm3,xmm3,xmm6,0x00
+        vpxor   xmm4,xmm1,xmm0
+        vpxor   xmm3,xmm3,xmm4
+
+        vpslldq xmm4,xmm3,8
+        vpsrldq xmm3,xmm3,8
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm1,xmm1,xmm3
+        vpsllq  xmm3,xmm0,57
+        vpsllq  xmm4,xmm0,62
+        vpxor   xmm4,xmm4,xmm3
+        vpsllq  xmm3,xmm0,63
+        vpxor   xmm4,xmm4,xmm3
+        vpslldq xmm3,xmm4,8
+        vpsrldq xmm4,xmm4,8
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm1,xmm1,xmm4
+
+        vpsrlq  xmm4,xmm0,1
+        vpxor   xmm1,xmm1,xmm0
+        vpxor   xmm0,xmm0,xmm4
+        vpsrlq  xmm4,xmm4,5
+        vpxor   xmm0,xmm0,xmm4
+        vpsrlq  xmm0,xmm0,1
+        vpxor   xmm0,xmm0,xmm1
+$L$init_start_avx:
+        vmovdqa xmm5,xmm0
+        vpunpckhqdq     xmm3,xmm0,xmm0
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm1,xmm0,xmm2,0x11
+        vpclmulqdq      xmm0,xmm0,xmm2,0x00
+        vpclmulqdq      xmm3,xmm3,xmm6,0x00
+        vpxor   xmm4,xmm1,xmm0
+        vpxor   xmm3,xmm3,xmm4
+
+        vpslldq xmm4,xmm3,8
+        vpsrldq xmm3,xmm3,8
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm1,xmm1,xmm3
+        vpsllq  xmm3,xmm0,57
+        vpsllq  xmm4,xmm0,62
+        vpxor   xmm4,xmm4,xmm3
+        vpsllq  xmm3,xmm0,63
+        vpxor   xmm4,xmm4,xmm3
+        vpslldq xmm3,xmm4,8
+        vpsrldq xmm4,xmm4,8
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm1,xmm1,xmm4
+
+        vpsrlq  xmm4,xmm0,1
+        vpxor   xmm1,xmm1,xmm0
+        vpxor   xmm0,xmm0,xmm4
+        vpsrlq  xmm4,xmm4,5
+        vpxor   xmm0,xmm0,xmm4
+        vpsrlq  xmm0,xmm0,1
+        vpxor   xmm0,xmm0,xmm1
+        vpshufd xmm3,xmm5,78
+        vpshufd xmm4,xmm0,78
+        vpxor   xmm3,xmm3,xmm5
+        vmovdqu XMMWORD[rcx],xmm5
+        vpxor   xmm4,xmm4,xmm0
+        vmovdqu XMMWORD[16+rcx],xmm0
+        lea     rcx,[48+rcx]
+        sub     r10,1
+        jnz     NEAR $L$init_loop_avx
+
+        vpalignr        xmm5,xmm3,xmm4,8
+        vmovdqu XMMWORD[(-16)+rcx],xmm5
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[rsp]
+        lea     rsp,[24+rsp]
+$L$SEH_end_gcm_init_avx:
+        DB      0F3h,0C3h               ;repret
+
+
+global  gcm_gmult_avx
+
+ALIGN   32
+gcm_gmult_avx:
+
+        jmp     NEAR $L$_gmult_clmul
+
+
+global  gcm_ghash_avx
+
+ALIGN   32
+gcm_ghash_avx:
+
+        lea     rax,[((-136))+rsp]
+$L$SEH_begin_gcm_ghash_avx:
+
+DB      0x48,0x8d,0x60,0xe0
+DB      0x0f,0x29,0x70,0xe0
+DB      0x0f,0x29,0x78,0xf0
+DB      0x44,0x0f,0x29,0x00
+DB      0x44,0x0f,0x29,0x48,0x10
+DB      0x44,0x0f,0x29,0x50,0x20
+DB      0x44,0x0f,0x29,0x58,0x30
+DB      0x44,0x0f,0x29,0x60,0x40
+DB      0x44,0x0f,0x29,0x68,0x50
+DB      0x44,0x0f,0x29,0x70,0x60
+DB      0x44,0x0f,0x29,0x78,0x70
+        vzeroupper
+
+        vmovdqu xmm10,XMMWORD[rcx]
+        lea     r10,[$L$0x1c2_polynomial]
+        lea     rdx,[64+rdx]
+        vmovdqu xmm13,XMMWORD[$L$bswap_mask]
+        vpshufb xmm10,xmm10,xmm13
+        cmp     r9,0x80
+        jb      NEAR $L$short_avx
+        sub     r9,0x80
+
+        vmovdqu xmm14,XMMWORD[112+r8]
+        vmovdqu xmm6,XMMWORD[((0-64))+rdx]
+        vpshufb xmm14,xmm14,xmm13
+        vmovdqu xmm7,XMMWORD[((32-64))+rdx]
+
+        vpunpckhqdq     xmm9,xmm14,xmm14
+        vmovdqu xmm15,XMMWORD[96+r8]
+        vpclmulqdq      xmm0,xmm14,xmm6,0x00
+        vpxor   xmm9,xmm9,xmm14
+        vpshufb xmm15,xmm15,xmm13
+        vpclmulqdq      xmm1,xmm14,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((16-64))+rdx]
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vmovdqu xmm14,XMMWORD[80+r8]
+        vpclmulqdq      xmm2,xmm9,xmm7,0x00
+        vpxor   xmm8,xmm8,xmm15
+
+        vpshufb xmm14,xmm14,xmm13
+        vpclmulqdq      xmm3,xmm15,xmm6,0x00
+        vpunpckhqdq     xmm9,xmm14,xmm14
+        vpclmulqdq      xmm4,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((48-64))+rdx]
+        vpxor   xmm9,xmm9,xmm14
+        vmovdqu xmm15,XMMWORD[64+r8]
+        vpclmulqdq      xmm5,xmm8,xmm7,0x10
+        vmovdqu xmm7,XMMWORD[((80-64))+rdx]
+
+        vpshufb xmm15,xmm15,xmm13
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm14,xmm6,0x00
+        vpxor   xmm4,xmm4,xmm1
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpclmulqdq      xmm1,xmm14,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((64-64))+rdx]
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm2,xmm9,xmm7,0x00
+        vpxor   xmm8,xmm8,xmm15
+
+        vmovdqu xmm14,XMMWORD[48+r8]
+        vpxor   xmm0,xmm0,xmm3
+        vpclmulqdq      xmm3,xmm15,xmm6,0x00
+        vpxor   xmm1,xmm1,xmm4
+        vpshufb xmm14,xmm14,xmm13
+        vpclmulqdq      xmm4,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((96-64))+rdx]
+        vpxor   xmm2,xmm2,xmm5
+        vpunpckhqdq     xmm9,xmm14,xmm14
+        vpclmulqdq      xmm5,xmm8,xmm7,0x10
+        vmovdqu xmm7,XMMWORD[((128-64))+rdx]
+        vpxor   xmm9,xmm9,xmm14
+
+        vmovdqu xmm15,XMMWORD[32+r8]
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm14,xmm6,0x00
+        vpxor   xmm4,xmm4,xmm1
+        vpshufb xmm15,xmm15,xmm13
+        vpclmulqdq      xmm1,xmm14,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((112-64))+rdx]
+        vpxor   xmm5,xmm5,xmm2
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpclmulqdq      xmm2,xmm9,xmm7,0x00
+        vpxor   xmm8,xmm8,xmm15
+
+        vmovdqu xmm14,XMMWORD[16+r8]
+        vpxor   xmm0,xmm0,xmm3
+        vpclmulqdq      xmm3,xmm15,xmm6,0x00
+        vpxor   xmm1,xmm1,xmm4
+        vpshufb xmm14,xmm14,xmm13
+        vpclmulqdq      xmm4,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((144-64))+rdx]
+        vpxor   xmm2,xmm2,xmm5
+        vpunpckhqdq     xmm9,xmm14,xmm14
+        vpclmulqdq      xmm5,xmm8,xmm7,0x10
+        vmovdqu xmm7,XMMWORD[((176-64))+rdx]
+        vpxor   xmm9,xmm9,xmm14
+
+        vmovdqu xmm15,XMMWORD[r8]
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm14,xmm6,0x00
+        vpxor   xmm4,xmm4,xmm1
+        vpshufb xmm15,xmm15,xmm13
+        vpclmulqdq      xmm1,xmm14,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((160-64))+rdx]
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm2,xmm9,xmm7,0x10
+
+        lea     r8,[128+r8]
+        cmp     r9,0x80
+        jb      NEAR $L$tail_avx
+
+        vpxor   xmm15,xmm15,xmm10
+        sub     r9,0x80
+        jmp     NEAR $L$oop8x_avx
+
+ALIGN   32
+$L$oop8x_avx:
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vmovdqu xmm14,XMMWORD[112+r8]
+        vpxor   xmm3,xmm3,xmm0
+        vpxor   xmm8,xmm8,xmm15
+        vpclmulqdq      xmm10,xmm15,xmm6,0x00
+        vpshufb xmm14,xmm14,xmm13
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm11,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((0-64))+rdx]
+        vpunpckhqdq     xmm9,xmm14,xmm14
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm12,xmm8,xmm7,0x00
+        vmovdqu xmm7,XMMWORD[((32-64))+rdx]
+        vpxor   xmm9,xmm9,xmm14
+
+        vmovdqu xmm15,XMMWORD[96+r8]
+        vpclmulqdq      xmm0,xmm14,xmm6,0x00
+        vpxor   xmm10,xmm10,xmm3
+        vpshufb xmm15,xmm15,xmm13
+        vpclmulqdq      xmm1,xmm14,xmm6,0x11
+        vxorps  xmm11,xmm11,xmm4
+        vmovdqu xmm6,XMMWORD[((16-64))+rdx]
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpclmulqdq      xmm2,xmm9,xmm7,0x00
+        vpxor   xmm12,xmm12,xmm5
+        vxorps  xmm8,xmm8,xmm15
+
+        vmovdqu xmm14,XMMWORD[80+r8]
+        vpxor   xmm12,xmm12,xmm10
+        vpclmulqdq      xmm3,xmm15,xmm6,0x00
+        vpxor   xmm12,xmm12,xmm11
+        vpslldq xmm9,xmm12,8
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm4,xmm15,xmm6,0x11
+        vpsrldq xmm12,xmm12,8
+        vpxor   xmm10,xmm10,xmm9
+        vmovdqu xmm6,XMMWORD[((48-64))+rdx]
+        vpshufb xmm14,xmm14,xmm13
+        vxorps  xmm11,xmm11,xmm12
+        vpxor   xmm4,xmm4,xmm1
+        vpunpckhqdq     xmm9,xmm14,xmm14
+        vpclmulqdq      xmm5,xmm8,xmm7,0x10
+        vmovdqu xmm7,XMMWORD[((80-64))+rdx]
+        vpxor   xmm9,xmm9,xmm14
+        vpxor   xmm5,xmm5,xmm2
+
+        vmovdqu xmm15,XMMWORD[64+r8]
+        vpalignr        xmm12,xmm10,xmm10,8
+        vpclmulqdq      xmm0,xmm14,xmm6,0x00
+        vpshufb xmm15,xmm15,xmm13
+        vpxor   xmm0,xmm0,xmm3
+        vpclmulqdq      xmm1,xmm14,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((64-64))+rdx]
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpxor   xmm1,xmm1,xmm4
+        vpclmulqdq      xmm2,xmm9,xmm7,0x00
+        vxorps  xmm8,xmm8,xmm15
+        vpxor   xmm2,xmm2,xmm5
+
+        vmovdqu xmm14,XMMWORD[48+r8]
+        vpclmulqdq      xmm10,xmm10,XMMWORD[r10],0x10
+        vpclmulqdq      xmm3,xmm15,xmm6,0x00
+        vpshufb xmm14,xmm14,xmm13
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm4,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((96-64))+rdx]
+        vpunpckhqdq     xmm9,xmm14,xmm14
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm5,xmm8,xmm7,0x10
+        vmovdqu xmm7,XMMWORD[((128-64))+rdx]
+        vpxor   xmm9,xmm9,xmm14
+        vpxor   xmm5,xmm5,xmm2
+
+        vmovdqu xmm15,XMMWORD[32+r8]
+        vpclmulqdq      xmm0,xmm14,xmm6,0x00
+        vpshufb xmm15,xmm15,xmm13
+        vpxor   xmm0,xmm0,xmm3
+        vpclmulqdq      xmm1,xmm14,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((112-64))+rdx]
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpxor   xmm1,xmm1,xmm4
+        vpclmulqdq      xmm2,xmm9,xmm7,0x00
+        vpxor   xmm8,xmm8,xmm15
+        vpxor   xmm2,xmm2,xmm5
+        vxorps  xmm10,xmm10,xmm12
+
+        vmovdqu xmm14,XMMWORD[16+r8]
+        vpalignr        xmm12,xmm10,xmm10,8
+        vpclmulqdq      xmm3,xmm15,xmm6,0x00
+        vpshufb xmm14,xmm14,xmm13
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm4,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((144-64))+rdx]
+        vpclmulqdq      xmm10,xmm10,XMMWORD[r10],0x10
+        vxorps  xmm12,xmm12,xmm11
+        vpunpckhqdq     xmm9,xmm14,xmm14
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm5,xmm8,xmm7,0x10
+        vmovdqu xmm7,XMMWORD[((176-64))+rdx]
+        vpxor   xmm9,xmm9,xmm14
+        vpxor   xmm5,xmm5,xmm2
+
+        vmovdqu xmm15,XMMWORD[r8]
+        vpclmulqdq      xmm0,xmm14,xmm6,0x00
+        vpshufb xmm15,xmm15,xmm13
+        vpclmulqdq      xmm1,xmm14,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((160-64))+rdx]
+        vpxor   xmm15,xmm15,xmm12
+        vpclmulqdq      xmm2,xmm9,xmm7,0x10
+        vpxor   xmm15,xmm15,xmm10
+
+        lea     r8,[128+r8]
+        sub     r9,0x80
+        jnc     NEAR $L$oop8x_avx
+
+        add     r9,0x80
+        jmp     NEAR $L$tail_no_xor_avx
+
+ALIGN   32
+$L$short_avx:
+        vmovdqu xmm14,XMMWORD[((-16))+r9*1+r8]
+        lea     r8,[r9*1+r8]
+        vmovdqu xmm6,XMMWORD[((0-64))+rdx]
+        vmovdqu xmm7,XMMWORD[((32-64))+rdx]
+        vpshufb xmm15,xmm14,xmm13
+
+        vmovdqa xmm3,xmm0
+        vmovdqa xmm4,xmm1
+        vmovdqa xmm5,xmm2
+        sub     r9,0x10
+        jz      NEAR $L$tail_avx
+
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm15,xmm6,0x00
+        vpxor   xmm8,xmm8,xmm15
+        vmovdqu xmm14,XMMWORD[((-32))+r8]
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm1,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((16-64))+rdx]
+        vpshufb xmm15,xmm14,xmm13
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm2,xmm8,xmm7,0x00
+        vpsrldq xmm7,xmm7,8
+        sub     r9,0x10
+        jz      NEAR $L$tail_avx
+
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm15,xmm6,0x00
+        vpxor   xmm8,xmm8,xmm15
+        vmovdqu xmm14,XMMWORD[((-48))+r8]
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm1,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((48-64))+rdx]
+        vpshufb xmm15,xmm14,xmm13
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm2,xmm8,xmm7,0x00
+        vmovdqu xmm7,XMMWORD[((80-64))+rdx]
+        sub     r9,0x10
+        jz      NEAR $L$tail_avx
+
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm15,xmm6,0x00
+        vpxor   xmm8,xmm8,xmm15
+        vmovdqu xmm14,XMMWORD[((-64))+r8]
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm1,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((64-64))+rdx]
+        vpshufb xmm15,xmm14,xmm13
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm2,xmm8,xmm7,0x00
+        vpsrldq xmm7,xmm7,8
+        sub     r9,0x10
+        jz      NEAR $L$tail_avx
+
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm15,xmm6,0x00
+        vpxor   xmm8,xmm8,xmm15
+        vmovdqu xmm14,XMMWORD[((-80))+r8]
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm1,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((96-64))+rdx]
+        vpshufb xmm15,xmm14,xmm13
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm2,xmm8,xmm7,0x00
+        vmovdqu xmm7,XMMWORD[((128-64))+rdx]
+        sub     r9,0x10
+        jz      NEAR $L$tail_avx
+
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm15,xmm6,0x00
+        vpxor   xmm8,xmm8,xmm15
+        vmovdqu xmm14,XMMWORD[((-96))+r8]
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm1,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((112-64))+rdx]
+        vpshufb xmm15,xmm14,xmm13
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm2,xmm8,xmm7,0x00
+        vpsrldq xmm7,xmm7,8
+        sub     r9,0x10
+        jz      NEAR $L$tail_avx
+
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm15,xmm6,0x00
+        vpxor   xmm8,xmm8,xmm15
+        vmovdqu xmm14,XMMWORD[((-112))+r8]
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm1,xmm15,xmm6,0x11
+        vmovdqu xmm6,XMMWORD[((144-64))+rdx]
+        vpshufb xmm15,xmm14,xmm13
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm2,xmm8,xmm7,0x00
+        vmovq   xmm7,QWORD[((184-64))+rdx]
+        sub     r9,0x10
+        jmp     NEAR $L$tail_avx
+
+ALIGN   32
+$L$tail_avx:
+        vpxor   xmm15,xmm15,xmm10
+$L$tail_no_xor_avx:
+        vpunpckhqdq     xmm8,xmm15,xmm15
+        vpxor   xmm3,xmm3,xmm0
+        vpclmulqdq      xmm0,xmm15,xmm6,0x00
+        vpxor   xmm8,xmm8,xmm15
+        vpxor   xmm4,xmm4,xmm1
+        vpclmulqdq      xmm1,xmm15,xmm6,0x11
+        vpxor   xmm5,xmm5,xmm2
+        vpclmulqdq      xmm2,xmm8,xmm7,0x00
+
+        vmovdqu xmm12,XMMWORD[r10]
+
+        vpxor   xmm10,xmm3,xmm0
+        vpxor   xmm11,xmm4,xmm1
+        vpxor   xmm5,xmm5,xmm2
+
+        vpxor   xmm5,xmm5,xmm10
+        vpxor   xmm5,xmm5,xmm11
+        vpslldq xmm9,xmm5,8
+        vpsrldq xmm5,xmm5,8
+        vpxor   xmm10,xmm10,xmm9
+        vpxor   xmm11,xmm11,xmm5
+
+        vpclmulqdq      xmm9,xmm10,xmm12,0x10
+        vpalignr        xmm10,xmm10,xmm10,8
+        vpxor   xmm10,xmm10,xmm9
+
+        vpclmulqdq      xmm9,xmm10,xmm12,0x10
+        vpalignr        xmm10,xmm10,xmm10,8
+        vpxor   xmm10,xmm10,xmm11
+        vpxor   xmm10,xmm10,xmm9
+
+        cmp     r9,0
+        jne     NEAR $L$short_avx
+
+        vpshufb xmm10,xmm10,xmm13
+        vmovdqu XMMWORD[rcx],xmm10
+        vzeroupper
+        movaps  xmm6,XMMWORD[rsp]
+        movaps  xmm7,XMMWORD[16+rsp]
+        movaps  xmm8,XMMWORD[32+rsp]
+        movaps  xmm9,XMMWORD[48+rsp]
+        movaps  xmm10,XMMWORD[64+rsp]
+        movaps  xmm11,XMMWORD[80+rsp]
+        movaps  xmm12,XMMWORD[96+rsp]
+        movaps  xmm13,XMMWORD[112+rsp]
+        movaps  xmm14,XMMWORD[128+rsp]
+        movaps  xmm15,XMMWORD[144+rsp]
+        lea     rsp,[168+rsp]
+$L$SEH_end_gcm_ghash_avx:
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   64
+$L$bswap_mask:
+DB      15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+$L$0x1c2_polynomial:
+DB      1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0xc2
+$L$7_mask:
+        DD      7,0,7,0
+$L$7_mask_poly:
+        DD      7,0,450,0
+ALIGN   64
+
+$L$rem_4bit:
+        DD      0,0,0,471859200,0,943718400,0,610271232
+        DD      0,1887436800,0,1822425088,0,1220542464,0,1423966208
+        DD      0,3774873600,0,4246732800,0,3644850176,0,3311403008
+        DD      0,2441084928,0,2376073216,0,2847932416,0,3051356160
+
+$L$rem_8bit:
+        DW      0x0000,0x01C2,0x0384,0x0246,0x0708,0x06CA,0x048C,0x054E
+        DW      0x0E10,0x0FD2,0x0D94,0x0C56,0x0918,0x08DA,0x0A9C,0x0B5E
+        DW      0x1C20,0x1DE2,0x1FA4,0x1E66,0x1B28,0x1AEA,0x18AC,0x196E
+        DW      0x1230,0x13F2,0x11B4,0x1076,0x1538,0x14FA,0x16BC,0x177E
+        DW      0x3840,0x3982,0x3BC4,0x3A06,0x3F48,0x3E8A,0x3CCC,0x3D0E
+        DW      0x3650,0x3792,0x35D4,0x3416,0x3158,0x309A,0x32DC,0x331E
+        DW      0x2460,0x25A2,0x27E4,0x2626,0x2368,0x22AA,0x20EC,0x212E
+        DW      0x2A70,0x2BB2,0x29F4,0x2836,0x2D78,0x2CBA,0x2EFC,0x2F3E
+        DW      0x7080,0x7142,0x7304,0x72C6,0x7788,0x764A,0x740C,0x75CE
+        DW      0x7E90,0x7F52,0x7D14,0x7CD6,0x7998,0x785A,0x7A1C,0x7BDE
+        DW      0x6CA0,0x6D62,0x6F24,0x6EE6,0x6BA8,0x6A6A,0x682C,0x69EE
+        DW      0x62B0,0x6372,0x6134,0x60F6,0x65B8,0x647A,0x663C,0x67FE
+        DW      0x48C0,0x4902,0x4B44,0x4A86,0x4FC8,0x4E0A,0x4C4C,0x4D8E
+        DW      0x46D0,0x4712,0x4554,0x4496,0x41D8,0x401A,0x425C,0x439E
+        DW      0x54E0,0x5522,0x5764,0x56A6,0x53E8,0x522A,0x506C,0x51AE
+        DW      0x5AF0,0x5B32,0x5974,0x58B6,0x5DF8,0x5C3A,0x5E7C,0x5FBE
+        DW      0xE100,0xE0C2,0xE284,0xE346,0xE608,0xE7CA,0xE58C,0xE44E
+        DW      0xEF10,0xEED2,0xEC94,0xED56,0xE818,0xE9DA,0xEB9C,0xEA5E
+        DW      0xFD20,0xFCE2,0xFEA4,0xFF66,0xFA28,0xFBEA,0xF9AC,0xF86E
+        DW      0xF330,0xF2F2,0xF0B4,0xF176,0xF438,0xF5FA,0xF7BC,0xF67E
+        DW      0xD940,0xD882,0xDAC4,0xDB06,0xDE48,0xDF8A,0xDDCC,0xDC0E
+        DW      0xD750,0xD692,0xD4D4,0xD516,0xD058,0xD19A,0xD3DC,0xD21E
+        DW      0xC560,0xC4A2,0xC6E4,0xC726,0xC268,0xC3AA,0xC1EC,0xC02E
+        DW      0xCB70,0xCAB2,0xC8F4,0xC936,0xCC78,0xCDBA,0xCFFC,0xCE3E
+        DW      0x9180,0x9042,0x9204,0x93C6,0x9688,0x974A,0x950C,0x94CE
+        DW      0x9F90,0x9E52,0x9C14,0x9DD6,0x9898,0x995A,0x9B1C,0x9ADE
+        DW      0x8DA0,0x8C62,0x8E24,0x8FE6,0x8AA8,0x8B6A,0x892C,0x88EE
+        DW      0x83B0,0x8272,0x8034,0x81F6,0x84B8,0x857A,0x873C,0x86FE
+        DW      0xA9C0,0xA802,0xAA44,0xAB86,0xAEC8,0xAF0A,0xAD4C,0xAC8E
+        DW      0xA7D0,0xA612,0xA454,0xA596,0xA0D8,0xA11A,0xA35C,0xA29E
+        DW      0xB5E0,0xB422,0xB664,0xB7A6,0xB2E8,0xB32A,0xB16C,0xB0AE
+        DW      0xBBF0,0xBA32,0xB874,0xB9B6,0xBCF8,0xBD3A,0xBF7C,0xBEBE
+
+DB      71,72,65,83,72,32,102,111,114,32,120,56,54,95,54,52
+DB      44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32
+DB      60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111
+DB      114,103,62,0
+ALIGN   64
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        lea     rax,[((48+280))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_gcm_gmult_4bit wrt ..imagebase
+        DD      $L$SEH_end_gcm_gmult_4bit wrt ..imagebase
+        DD      $L$SEH_info_gcm_gmult_4bit wrt ..imagebase
+
+        DD      $L$SEH_begin_gcm_ghash_4bit wrt ..imagebase
+        DD      $L$SEH_end_gcm_ghash_4bit wrt ..imagebase
+        DD      $L$SEH_info_gcm_ghash_4bit wrt ..imagebase
+
+        DD      $L$SEH_begin_gcm_init_clmul wrt ..imagebase
+        DD      $L$SEH_end_gcm_init_clmul wrt ..imagebase
+        DD      $L$SEH_info_gcm_init_clmul wrt ..imagebase
+
+        DD      $L$SEH_begin_gcm_ghash_clmul wrt ..imagebase
+        DD      $L$SEH_end_gcm_ghash_clmul wrt ..imagebase
+        DD      $L$SEH_info_gcm_ghash_clmul wrt ..imagebase
+        DD      $L$SEH_begin_gcm_init_avx wrt ..imagebase
+        DD      $L$SEH_end_gcm_init_avx wrt ..imagebase
+        DD      $L$SEH_info_gcm_init_clmul wrt ..imagebase
+
+        DD      $L$SEH_begin_gcm_ghash_avx wrt ..imagebase
+        DD      $L$SEH_end_gcm_ghash_avx wrt ..imagebase
+        DD      $L$SEH_info_gcm_ghash_clmul wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_gcm_gmult_4bit:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$gmult_prologue wrt ..imagebase,$L$gmult_epilogue wrt ..imagebase
+$L$SEH_info_gcm_ghash_4bit:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$ghash_prologue wrt ..imagebase,$L$ghash_epilogue wrt ..imagebase
+$L$SEH_info_gcm_init_clmul:
+DB      0x01,0x08,0x03,0x00
+DB      0x08,0x68,0x00,0x00
+DB      0x04,0x22,0x00,0x00
+$L$SEH_info_gcm_ghash_clmul:
+DB      0x01,0x33,0x16,0x00
+DB      0x33,0xf8,0x09,0x00
+DB      0x2e,0xe8,0x08,0x00
+DB      0x29,0xd8,0x07,0x00
+DB      0x24,0xc8,0x06,0x00
+DB      0x1f,0xb8,0x05,0x00
+DB      0x1a,0xa8,0x04,0x00
+DB      0x15,0x98,0x03,0x00
+DB      0x10,0x88,0x02,0x00
+DB      0x0c,0x78,0x01,0x00
+DB      0x08,0x68,0x00,0x00
+DB      0x04,0x01,0x15,0x00
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm
new file mode 100644
index 0000000000..c9a37a47c9
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm
@@ -0,0 +1,1395 @@
+; Copyright 2011-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+ALIGN   16
+
+global  rc4_md5_enc
+
+rc4_md5_enc:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_rc4_md5_enc:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+        mov     r8,QWORD[40+rsp]
+        mov     r9,QWORD[48+rsp]
+
+
+
+        cmp     r9,0
+        je      NEAR $L$abort
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        sub     rsp,40
+
+$L$body:
+        mov     r11,rcx
+        mov     r12,r9
+        mov     r13,rsi
+        mov     r14,rdx
+        mov     r15,r8
+        xor     rbp,rbp
+        xor     rcx,rcx
+
+        lea     rdi,[8+rdi]
+        mov     bpl,BYTE[((-8))+rdi]
+        mov     cl,BYTE[((-4))+rdi]
+
+        inc     bpl
+        sub     r14,r13
+        mov     eax,DWORD[rbp*4+rdi]
+        add     cl,al
+        lea     rsi,[rbp*4+rdi]
+        shl     r12,6
+        add     r12,r15
+        mov     QWORD[16+rsp],r12
+
+        mov     QWORD[24+rsp],r11
+        mov     r8d,DWORD[r11]
+        mov     r9d,DWORD[4+r11]
+        mov     r10d,DWORD[8+r11]
+        mov     r11d,DWORD[12+r11]
+        jmp     NEAR $L$oop
+
+ALIGN   16
+$L$oop:
+        mov     DWORD[rsp],r8d
+        mov     DWORD[4+rsp],r9d
+        mov     DWORD[8+rsp],r10d
+        mov     r12d,r11d
+        mov     DWORD[12+rsp],r11d
+        pxor    xmm0,xmm0
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r9d
+        add     r8d,DWORD[r15]
+        add     al,dl
+        mov     ebx,DWORD[4+rsi]
+        add     r8d,3614090360
+        xor     r12d,r11d
+        movzx   eax,al
+        mov     DWORD[rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,7
+        mov     r12d,r10d
+        movd    xmm0,DWORD[rax*4+rdi]
+
+        add     r8d,r9d
+        pxor    xmm1,xmm1
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r8d
+        add     r11d,DWORD[4+r15]
+        add     bl,dl
+        mov     eax,DWORD[8+rsi]
+        add     r11d,3905402710
+        xor     r12d,r10d
+        movzx   ebx,bl
+        mov     DWORD[4+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,12
+        mov     r12d,r9d
+        movd    xmm1,DWORD[rbx*4+rdi]
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r11d
+        add     r10d,DWORD[8+r15]
+        add     al,dl
+        mov     ebx,DWORD[12+rsi]
+        add     r10d,606105819
+        xor     r12d,r9d
+        movzx   eax,al
+        mov     DWORD[8+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,17
+        mov     r12d,r8d
+        pinsrw  xmm0,WORD[rax*4+rdi],1
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r10d
+        add     r9d,DWORD[12+r15]
+        add     bl,dl
+        mov     eax,DWORD[16+rsi]
+        add     r9d,3250441966
+        xor     r12d,r8d
+        movzx   ebx,bl
+        mov     DWORD[12+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,22
+        mov     r12d,r11d
+        pinsrw  xmm1,WORD[rbx*4+rdi],1
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r9d
+        add     r8d,DWORD[16+r15]
+        add     al,dl
+        mov     ebx,DWORD[20+rsi]
+        add     r8d,4118548399
+        xor     r12d,r11d
+        movzx   eax,al
+        mov     DWORD[16+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,7
+        mov     r12d,r10d
+        pinsrw  xmm0,WORD[rax*4+rdi],2
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r8d
+        add     r11d,DWORD[20+r15]
+        add     bl,dl
+        mov     eax,DWORD[24+rsi]
+        add     r11d,1200080426
+        xor     r12d,r10d
+        movzx   ebx,bl
+        mov     DWORD[20+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,12
+        mov     r12d,r9d
+        pinsrw  xmm1,WORD[rbx*4+rdi],2
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r11d
+        add     r10d,DWORD[24+r15]
+        add     al,dl
+        mov     ebx,DWORD[28+rsi]
+        add     r10d,2821735955
+        xor     r12d,r9d
+        movzx   eax,al
+        mov     DWORD[24+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,17
+        mov     r12d,r8d
+        pinsrw  xmm0,WORD[rax*4+rdi],3
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r10d
+        add     r9d,DWORD[28+r15]
+        add     bl,dl
+        mov     eax,DWORD[32+rsi]
+        add     r9d,4249261313
+        xor     r12d,r8d
+        movzx   ebx,bl
+        mov     DWORD[28+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,22
+        mov     r12d,r11d
+        pinsrw  xmm1,WORD[rbx*4+rdi],3
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r9d
+        add     r8d,DWORD[32+r15]
+        add     al,dl
+        mov     ebx,DWORD[36+rsi]
+        add     r8d,1770035416
+        xor     r12d,r11d
+        movzx   eax,al
+        mov     DWORD[32+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,7
+        mov     r12d,r10d
+        pinsrw  xmm0,WORD[rax*4+rdi],4
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r8d
+        add     r11d,DWORD[36+r15]
+        add     bl,dl
+        mov     eax,DWORD[40+rsi]
+        add     r11d,2336552879
+        xor     r12d,r10d
+        movzx   ebx,bl
+        mov     DWORD[36+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,12
+        mov     r12d,r9d
+        pinsrw  xmm1,WORD[rbx*4+rdi],4
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r11d
+        add     r10d,DWORD[40+r15]
+        add     al,dl
+        mov     ebx,DWORD[44+rsi]
+        add     r10d,4294925233
+        xor     r12d,r9d
+        movzx   eax,al
+        mov     DWORD[40+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,17
+        mov     r12d,r8d
+        pinsrw  xmm0,WORD[rax*4+rdi],5
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r10d
+        add     r9d,DWORD[44+r15]
+        add     bl,dl
+        mov     eax,DWORD[48+rsi]
+        add     r9d,2304563134
+        xor     r12d,r8d
+        movzx   ebx,bl
+        mov     DWORD[44+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,22
+        mov     r12d,r11d
+        pinsrw  xmm1,WORD[rbx*4+rdi],5
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r9d
+        add     r8d,DWORD[48+r15]
+        add     al,dl
+        mov     ebx,DWORD[52+rsi]
+        add     r8d,1804603682
+        xor     r12d,r11d
+        movzx   eax,al
+        mov     DWORD[48+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,7
+        mov     r12d,r10d
+        pinsrw  xmm0,WORD[rax*4+rdi],6
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r8d
+        add     r11d,DWORD[52+r15]
+        add     bl,dl
+        mov     eax,DWORD[56+rsi]
+        add     r11d,4254626195
+        xor     r12d,r10d
+        movzx   ebx,bl
+        mov     DWORD[52+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,12
+        mov     r12d,r9d
+        pinsrw  xmm1,WORD[rbx*4+rdi],6
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r11d
+        add     r10d,DWORD[56+r15]
+        add     al,dl
+        mov     ebx,DWORD[60+rsi]
+        add     r10d,2792965006
+        xor     r12d,r9d
+        movzx   eax,al
+        mov     DWORD[56+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,17
+        mov     r12d,r8d
+        pinsrw  xmm0,WORD[rax*4+rdi],7
+
+        add     r10d,r11d
+        movdqu  xmm2,XMMWORD[r13]
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r10d
+        add     r9d,DWORD[60+r15]
+        add     bl,dl
+        mov     eax,DWORD[64+rsi]
+        add     r9d,1236535329
+        xor     r12d,r8d
+        movzx   ebx,bl
+        mov     DWORD[60+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,22
+        mov     r12d,r10d
+        pinsrw  xmm1,WORD[rbx*4+rdi],7
+
+        add     r9d,r10d
+        psllq   xmm1,8
+        pxor    xmm2,xmm0
+        pxor    xmm2,xmm1
+        pxor    xmm0,xmm0
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r11d
+        add     r8d,DWORD[4+r15]
+        add     al,dl
+        mov     ebx,DWORD[68+rsi]
+        add     r8d,4129170786
+        xor     r12d,r10d
+        movzx   eax,al
+        mov     DWORD[64+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,5
+        mov     r12d,r9d
+        movd    xmm0,DWORD[rax*4+rdi]
+
+        add     r8d,r9d
+        pxor    xmm1,xmm1
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r10d
+        add     r11d,DWORD[24+r15]
+        add     bl,dl
+        mov     eax,DWORD[72+rsi]
+        add     r11d,3225465664
+        xor     r12d,r9d
+        movzx   ebx,bl
+        mov     DWORD[68+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,9
+        mov     r12d,r8d
+        movd    xmm1,DWORD[rbx*4+rdi]
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r9d
+        add     r10d,DWORD[44+r15]
+        add     al,dl
+        mov     ebx,DWORD[76+rsi]
+        add     r10d,643717713
+        xor     r12d,r8d
+        movzx   eax,al
+        mov     DWORD[72+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,14
+        mov     r12d,r11d
+        pinsrw  xmm0,WORD[rax*4+rdi],1
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r8d
+        add     r9d,DWORD[r15]
+        add     bl,dl
+        mov     eax,DWORD[80+rsi]
+        add     r9d,3921069994
+        xor     r12d,r11d
+        movzx   ebx,bl
+        mov     DWORD[76+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,20
+        mov     r12d,r10d
+        pinsrw  xmm1,WORD[rbx*4+rdi],1
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r11d
+        add     r8d,DWORD[20+r15]
+        add     al,dl
+        mov     ebx,DWORD[84+rsi]
+        add     r8d,3593408605
+        xor     r12d,r10d
+        movzx   eax,al
+        mov     DWORD[80+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,5
+        mov     r12d,r9d
+        pinsrw  xmm0,WORD[rax*4+rdi],2
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r10d
+        add     r11d,DWORD[40+r15]
+        add     bl,dl
+        mov     eax,DWORD[88+rsi]
+        add     r11d,38016083
+        xor     r12d,r9d
+        movzx   ebx,bl
+        mov     DWORD[84+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,9
+        mov     r12d,r8d
+        pinsrw  xmm1,WORD[rbx*4+rdi],2
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r9d
+        add     r10d,DWORD[60+r15]
+        add     al,dl
+        mov     ebx,DWORD[92+rsi]
+        add     r10d,3634488961
+        xor     r12d,r8d
+        movzx   eax,al
+        mov     DWORD[88+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,14
+        mov     r12d,r11d
+        pinsrw  xmm0,WORD[rax*4+rdi],3
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r8d
+        add     r9d,DWORD[16+r15]
+        add     bl,dl
+        mov     eax,DWORD[96+rsi]
+        add     r9d,3889429448
+        xor     r12d,r11d
+        movzx   ebx,bl
+        mov     DWORD[92+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,20
+        mov     r12d,r10d
+        pinsrw  xmm1,WORD[rbx*4+rdi],3
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r11d
+        add     r8d,DWORD[36+r15]
+        add     al,dl
+        mov     ebx,DWORD[100+rsi]
+        add     r8d,568446438
+        xor     r12d,r10d
+        movzx   eax,al
+        mov     DWORD[96+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,5
+        mov     r12d,r9d
+        pinsrw  xmm0,WORD[rax*4+rdi],4
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r10d
+        add     r11d,DWORD[56+r15]
+        add     bl,dl
+        mov     eax,DWORD[104+rsi]
+        add     r11d,3275163606
+        xor     r12d,r9d
+        movzx   ebx,bl
+        mov     DWORD[100+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,9
+        mov     r12d,r8d
+        pinsrw  xmm1,WORD[rbx*4+rdi],4
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r9d
+        add     r10d,DWORD[12+r15]
+        add     al,dl
+        mov     ebx,DWORD[108+rsi]
+        add     r10d,4107603335
+        xor     r12d,r8d
+        movzx   eax,al
+        mov     DWORD[104+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,14
+        mov     r12d,r11d
+        pinsrw  xmm0,WORD[rax*4+rdi],5
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r8d
+        add     r9d,DWORD[32+r15]
+        add     bl,dl
+        mov     eax,DWORD[112+rsi]
+        add     r9d,1163531501
+        xor     r12d,r11d
+        movzx   ebx,bl
+        mov     DWORD[108+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,20
+        mov     r12d,r10d
+        pinsrw  xmm1,WORD[rbx*4+rdi],5
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r11d
+        add     r8d,DWORD[52+r15]
+        add     al,dl
+        mov     ebx,DWORD[116+rsi]
+        add     r8d,2850285829
+        xor     r12d,r10d
+        movzx   eax,al
+        mov     DWORD[112+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,5
+        mov     r12d,r9d
+        pinsrw  xmm0,WORD[rax*4+rdi],6
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r10d
+        add     r11d,DWORD[8+r15]
+        add     bl,dl
+        mov     eax,DWORD[120+rsi]
+        add     r11d,4243563512
+        xor     r12d,r9d
+        movzx   ebx,bl
+        mov     DWORD[116+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,9
+        mov     r12d,r8d
+        pinsrw  xmm1,WORD[rbx*4+rdi],6
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],eax
+        and     r12d,r9d
+        add     r10d,DWORD[28+r15]
+        add     al,dl
+        mov     ebx,DWORD[124+rsi]
+        add     r10d,1735328473
+        xor     r12d,r8d
+        movzx   eax,al
+        mov     DWORD[120+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,14
+        mov     r12d,r11d
+        pinsrw  xmm0,WORD[rax*4+rdi],7
+
+        add     r10d,r11d
+        movdqu  xmm3,XMMWORD[16+r13]
+        add     bpl,32
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],ebx
+        and     r12d,r8d
+        add     r9d,DWORD[48+r15]
+        add     bl,dl
+        mov     eax,DWORD[rbp*4+rdi]
+        add     r9d,2368359562
+        xor     r12d,r11d
+        movzx   ebx,bl
+        mov     DWORD[124+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,20
+        mov     r12d,r11d
+        pinsrw  xmm1,WORD[rbx*4+rdi],7
+
+        add     r9d,r10d
+        mov     rsi,rcx
+        xor     rcx,rcx
+        mov     cl,sil
+        lea     rsi,[rbp*4+rdi]
+        psllq   xmm1,8
+        pxor    xmm3,xmm0
+        pxor    xmm3,xmm1
+        pxor    xmm0,xmm0
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],eax
+        xor     r12d,r9d
+        add     r8d,DWORD[20+r15]
+        add     al,dl
+        mov     ebx,DWORD[4+rsi]
+        add     r8d,4294588738
+        movzx   eax,al
+        add     r8d,r12d
+        mov     DWORD[rsi],edx
+        add     cl,bl
+        rol     r8d,4
+        mov     r12d,r10d
+        movd    xmm0,DWORD[rax*4+rdi]
+
+        add     r8d,r9d
+        pxor    xmm1,xmm1
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],ebx
+        xor     r12d,r8d
+        add     r11d,DWORD[32+r15]
+        add     bl,dl
+        mov     eax,DWORD[8+rsi]
+        add     r11d,2272392833
+        movzx   ebx,bl
+        add     r11d,r12d
+        mov     DWORD[4+rsi],edx
+        add     cl,al
+        rol     r11d,11
+        mov     r12d,r9d
+        movd    xmm1,DWORD[rbx*4+rdi]
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],eax
+        xor     r12d,r11d
+        add     r10d,DWORD[44+r15]
+        add     al,dl
+        mov     ebx,DWORD[12+rsi]
+        add     r10d,1839030562
+        movzx   eax,al
+        add     r10d,r12d
+        mov     DWORD[8+rsi],edx
+        add     cl,bl
+        rol     r10d,16
+        mov     r12d,r8d
+        pinsrw  xmm0,WORD[rax*4+rdi],1
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],ebx
+        xor     r12d,r10d
+        add     r9d,DWORD[56+r15]
+        add     bl,dl
+        mov     eax,DWORD[16+rsi]
+        add     r9d,4259657740
+        movzx   ebx,bl
+        add     r9d,r12d
+        mov     DWORD[12+rsi],edx
+        add     cl,al
+        rol     r9d,23
+        mov     r12d,r11d
+        pinsrw  xmm1,WORD[rbx*4+rdi],1
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],eax
+        xor     r12d,r9d
+        add     r8d,DWORD[4+r15]
+        add     al,dl
+        mov     ebx,DWORD[20+rsi]
+        add     r8d,2763975236
+        movzx   eax,al
+        add     r8d,r12d
+        mov     DWORD[16+rsi],edx
+        add     cl,bl
+        rol     r8d,4
+        mov     r12d,r10d
+        pinsrw  xmm0,WORD[rax*4+rdi],2
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],ebx
+        xor     r12d,r8d
+        add     r11d,DWORD[16+r15]
+        add     bl,dl
+        mov     eax,DWORD[24+rsi]
+        add     r11d,1272893353
+        movzx   ebx,bl
+        add     r11d,r12d
+        mov     DWORD[20+rsi],edx
+        add     cl,al
+        rol     r11d,11
+        mov     r12d,r9d
+        pinsrw  xmm1,WORD[rbx*4+rdi],2
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],eax
+        xor     r12d,r11d
+        add     r10d,DWORD[28+r15]
+        add     al,dl
+        mov     ebx,DWORD[28+rsi]
+        add     r10d,4139469664
+        movzx   eax,al
+        add     r10d,r12d
+        mov     DWORD[24+rsi],edx
+        add     cl,bl
+        rol     r10d,16
+        mov     r12d,r8d
+        pinsrw  xmm0,WORD[rax*4+rdi],3
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],ebx
+        xor     r12d,r10d
+        add     r9d,DWORD[40+r15]
+        add     bl,dl
+        mov     eax,DWORD[32+rsi]
+        add     r9d,3200236656
+        movzx   ebx,bl
+        add     r9d,r12d
+        mov     DWORD[28+rsi],edx
+        add     cl,al
+        rol     r9d,23
+        mov     r12d,r11d
+        pinsrw  xmm1,WORD[rbx*4+rdi],3
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],eax
+        xor     r12d,r9d
+        add     r8d,DWORD[52+r15]
+        add     al,dl
+        mov     ebx,DWORD[36+rsi]
+        add     r8d,681279174
+        movzx   eax,al
+        add     r8d,r12d
+        mov     DWORD[32+rsi],edx
+        add     cl,bl
+        rol     r8d,4
+        mov     r12d,r10d
+        pinsrw  xmm0,WORD[rax*4+rdi],4
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],ebx
+        xor     r12d,r8d
+        add     r11d,DWORD[r15]
+        add     bl,dl
+        mov     eax,DWORD[40+rsi]
+        add     r11d,3936430074
+        movzx   ebx,bl
+        add     r11d,r12d
+        mov     DWORD[36+rsi],edx
+        add     cl,al
+        rol     r11d,11
+        mov     r12d,r9d
+        pinsrw  xmm1,WORD[rbx*4+rdi],4
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],eax
+        xor     r12d,r11d
+        add     r10d,DWORD[12+r15]
+        add     al,dl
+        mov     ebx,DWORD[44+rsi]
+        add     r10d,3572445317
+        movzx   eax,al
+        add     r10d,r12d
+        mov     DWORD[40+rsi],edx
+        add     cl,bl
+        rol     r10d,16
+        mov     r12d,r8d
+        pinsrw  xmm0,WORD[rax*4+rdi],5
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],ebx
+        xor     r12d,r10d
+        add     r9d,DWORD[24+r15]
+        add     bl,dl
+        mov     eax,DWORD[48+rsi]
+        add     r9d,76029189
+        movzx   ebx,bl
+        add     r9d,r12d
+        mov     DWORD[44+rsi],edx
+        add     cl,al
+        rol     r9d,23
+        mov     r12d,r11d
+        pinsrw  xmm1,WORD[rbx*4+rdi],5
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],eax
+        xor     r12d,r9d
+        add     r8d,DWORD[36+r15]
+        add     al,dl
+        mov     ebx,DWORD[52+rsi]
+        add     r8d,3654602809
+        movzx   eax,al
+        add     r8d,r12d
+        mov     DWORD[48+rsi],edx
+        add     cl,bl
+        rol     r8d,4
+        mov     r12d,r10d
+        pinsrw  xmm0,WORD[rax*4+rdi],6
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],ebx
+        xor     r12d,r8d
+        add     r11d,DWORD[48+r15]
+        add     bl,dl
+        mov     eax,DWORD[56+rsi]
+        add     r11d,3873151461
+        movzx   ebx,bl
+        add     r11d,r12d
+        mov     DWORD[52+rsi],edx
+        add     cl,al
+        rol     r11d,11
+        mov     r12d,r9d
+        pinsrw  xmm1,WORD[rbx*4+rdi],6
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],eax
+        xor     r12d,r11d
+        add     r10d,DWORD[60+r15]
+        add     al,dl
+        mov     ebx,DWORD[60+rsi]
+        add     r10d,530742520
+        movzx   eax,al
+        add     r10d,r12d
+        mov     DWORD[56+rsi],edx
+        add     cl,bl
+        rol     r10d,16
+        mov     r12d,r8d
+        pinsrw  xmm0,WORD[rax*4+rdi],7
+
+        add     r10d,r11d
+        movdqu  xmm4,XMMWORD[32+r13]
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],ebx
+        xor     r12d,r10d
+        add     r9d,DWORD[8+r15]
+        add     bl,dl
+        mov     eax,DWORD[64+rsi]
+        add     r9d,3299628645
+        movzx   ebx,bl
+        add     r9d,r12d
+        mov     DWORD[60+rsi],edx
+        add     cl,al
+        rol     r9d,23
+        mov     r12d,-1
+        pinsrw  xmm1,WORD[rbx*4+rdi],7
+
+        add     r9d,r10d
+        psllq   xmm1,8
+        pxor    xmm4,xmm0
+        pxor    xmm4,xmm1
+        pxor    xmm0,xmm0
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],eax
+        or      r12d,r9d
+        add     r8d,DWORD[r15]
+        add     al,dl
+        mov     ebx,DWORD[68+rsi]
+        add     r8d,4096336452
+        movzx   eax,al
+        xor     r12d,r10d
+        mov     DWORD[64+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,6
+        mov     r12d,-1
+        movd    xmm0,DWORD[rax*4+rdi]
+
+        add     r8d,r9d
+        pxor    xmm1,xmm1
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],ebx
+        or      r12d,r8d
+        add     r11d,DWORD[28+r15]
+        add     bl,dl
+        mov     eax,DWORD[72+rsi]
+        add     r11d,1126891415
+        movzx   ebx,bl
+        xor     r12d,r9d
+        mov     DWORD[68+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,10
+        mov     r12d,-1
+        movd    xmm1,DWORD[rbx*4+rdi]
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],eax
+        or      r12d,r11d
+        add     r10d,DWORD[56+r15]
+        add     al,dl
+        mov     ebx,DWORD[76+rsi]
+        add     r10d,2878612391
+        movzx   eax,al
+        xor     r12d,r8d
+        mov     DWORD[72+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,15
+        mov     r12d,-1
+        pinsrw  xmm0,WORD[rax*4+rdi],1
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],ebx
+        or      r12d,r10d
+        add     r9d,DWORD[20+r15]
+        add     bl,dl
+        mov     eax,DWORD[80+rsi]
+        add     r9d,4237533241
+        movzx   ebx,bl
+        xor     r12d,r11d
+        mov     DWORD[76+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,21
+        mov     r12d,-1
+        pinsrw  xmm1,WORD[rbx*4+rdi],1
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],eax
+        or      r12d,r9d
+        add     r8d,DWORD[48+r15]
+        add     al,dl
+        mov     ebx,DWORD[84+rsi]
+        add     r8d,1700485571
+        movzx   eax,al
+        xor     r12d,r10d
+        mov     DWORD[80+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,6
+        mov     r12d,-1
+        pinsrw  xmm0,WORD[rax*4+rdi],2
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],ebx
+        or      r12d,r8d
+        add     r11d,DWORD[12+r15]
+        add     bl,dl
+        mov     eax,DWORD[88+rsi]
+        add     r11d,2399980690
+        movzx   ebx,bl
+        xor     r12d,r9d
+        mov     DWORD[84+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,10
+        mov     r12d,-1
+        pinsrw  xmm1,WORD[rbx*4+rdi],2
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],eax
+        or      r12d,r11d
+        add     r10d,DWORD[40+r15]
+        add     al,dl
+        mov     ebx,DWORD[92+rsi]
+        add     r10d,4293915773
+        movzx   eax,al
+        xor     r12d,r8d
+        mov     DWORD[88+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,15
+        mov     r12d,-1
+        pinsrw  xmm0,WORD[rax*4+rdi],3
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],ebx
+        or      r12d,r10d
+        add     r9d,DWORD[4+r15]
+        add     bl,dl
+        mov     eax,DWORD[96+rsi]
+        add     r9d,2240044497
+        movzx   ebx,bl
+        xor     r12d,r11d
+        mov     DWORD[92+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,21
+        mov     r12d,-1
+        pinsrw  xmm1,WORD[rbx*4+rdi],3
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],eax
+        or      r12d,r9d
+        add     r8d,DWORD[32+r15]
+        add     al,dl
+        mov     ebx,DWORD[100+rsi]
+        add     r8d,1873313359
+        movzx   eax,al
+        xor     r12d,r10d
+        mov     DWORD[96+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,6
+        mov     r12d,-1
+        pinsrw  xmm0,WORD[rax*4+rdi],4
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],ebx
+        or      r12d,r8d
+        add     r11d,DWORD[60+r15]
+        add     bl,dl
+        mov     eax,DWORD[104+rsi]
+        add     r11d,4264355552
+        movzx   ebx,bl
+        xor     r12d,r9d
+        mov     DWORD[100+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,10
+        mov     r12d,-1
+        pinsrw  xmm1,WORD[rbx*4+rdi],4
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],eax
+        or      r12d,r11d
+        add     r10d,DWORD[24+r15]
+        add     al,dl
+        mov     ebx,DWORD[108+rsi]
+        add     r10d,2734768916
+        movzx   eax,al
+        xor     r12d,r8d
+        mov     DWORD[104+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,15
+        mov     r12d,-1
+        pinsrw  xmm0,WORD[rax*4+rdi],5
+
+        add     r10d,r11d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],ebx
+        or      r12d,r10d
+        add     r9d,DWORD[52+r15]
+        add     bl,dl
+        mov     eax,DWORD[112+rsi]
+        add     r9d,1309151649
+        movzx   ebx,bl
+        xor     r12d,r11d
+        mov     DWORD[108+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,21
+        mov     r12d,-1
+        pinsrw  xmm1,WORD[rbx*4+rdi],5
+
+        add     r9d,r10d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r11d
+        mov     DWORD[rcx*4+rdi],eax
+        or      r12d,r9d
+        add     r8d,DWORD[16+r15]
+        add     al,dl
+        mov     ebx,DWORD[116+rsi]
+        add     r8d,4149444226
+        movzx   eax,al
+        xor     r12d,r10d
+        mov     DWORD[112+rsi],edx
+        add     r8d,r12d
+        add     cl,bl
+        rol     r8d,6
+        mov     r12d,-1
+        pinsrw  xmm0,WORD[rax*4+rdi],6
+
+        add     r8d,r9d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r10d
+        mov     DWORD[rcx*4+rdi],ebx
+        or      r12d,r8d
+        add     r11d,DWORD[44+r15]
+        add     bl,dl
+        mov     eax,DWORD[120+rsi]
+        add     r11d,3174756917
+        movzx   ebx,bl
+        xor     r12d,r9d
+        mov     DWORD[116+rsi],edx
+        add     r11d,r12d
+        add     cl,al
+        rol     r11d,10
+        mov     r12d,-1
+        pinsrw  xmm1,WORD[rbx*4+rdi],6
+
+        add     r11d,r8d
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r9d
+        mov     DWORD[rcx*4+rdi],eax
+        or      r12d,r11d
+        add     r10d,DWORD[8+r15]
+        add     al,dl
+        mov     ebx,DWORD[124+rsi]
+        add     r10d,718787259
+        movzx   eax,al
+        xor     r12d,r8d
+        mov     DWORD[120+rsi],edx
+        add     r10d,r12d
+        add     cl,bl
+        rol     r10d,15
+        mov     r12d,-1
+        pinsrw  xmm0,WORD[rax*4+rdi],7
+
+        add     r10d,r11d
+        movdqu  xmm5,XMMWORD[48+r13]
+        add     bpl,32
+        mov     edx,DWORD[rcx*4+rdi]
+        xor     r12d,r8d
+        mov     DWORD[rcx*4+rdi],ebx
+        or      r12d,r10d
+        add     r9d,DWORD[36+r15]
+        add     bl,dl
+        mov     eax,DWORD[rbp*4+rdi]
+        add     r9d,3951481745
+        movzx   ebx,bl
+        xor     r12d,r11d
+        mov     DWORD[124+rsi],edx
+        add     r9d,r12d
+        add     cl,al
+        rol     r9d,21
+        mov     r12d,-1
+        pinsrw  xmm1,WORD[rbx*4+rdi],7
+
+        add     r9d,r10d
+        mov     rsi,rbp
+        xor     rbp,rbp
+        mov     bpl,sil
+        mov     rsi,rcx
+        xor     rcx,rcx
+        mov     cl,sil
+        lea     rsi,[rbp*4+rdi]
+        psllq   xmm1,8
+        pxor    xmm5,xmm0
+        pxor    xmm5,xmm1
+        add     r8d,DWORD[rsp]
+        add     r9d,DWORD[4+rsp]
+        add     r10d,DWORD[8+rsp]
+        add     r11d,DWORD[12+rsp]
+
+        movdqu  XMMWORD[r13*1+r14],xmm2
+        movdqu  XMMWORD[16+r13*1+r14],xmm3
+        movdqu  XMMWORD[32+r13*1+r14],xmm4
+        movdqu  XMMWORD[48+r13*1+r14],xmm5
+        lea     r15,[64+r15]
+        lea     r13,[64+r13]
+        cmp     r15,QWORD[16+rsp]
+        jb      NEAR $L$oop
+
+        mov     r12,QWORD[24+rsp]
+        sub     cl,al
+        mov     DWORD[r12],r8d
+        mov     DWORD[4+r12],r9d
+        mov     DWORD[8+r12],r10d
+        mov     DWORD[12+r12],r11d
+        sub     bpl,1
+        mov     DWORD[((-8))+rdi],ebp
+        mov     DWORD[((-4))+rdi],ecx
+
+        mov     r15,QWORD[40+rsp]
+
+        mov     r14,QWORD[48+rsp]
+
+        mov     r13,QWORD[56+rsp]
+
+        mov     r12,QWORD[64+rsp]
+
+        mov     rbp,QWORD[72+rsp]
+
+        mov     rbx,QWORD[80+rsp]
+
+        lea     rsp,[88+rsp]
+
+$L$epilogue:
+$L$abort:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_rc4_md5_enc:
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        lea     r10,[$L$body]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        lea     r10,[$L$epilogue]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        mov     r15,QWORD[40+rax]
+        mov     r14,QWORD[48+rax]
+        mov     r13,QWORD[56+rax]
+        mov     r12,QWORD[64+rax]
+        mov     rbp,QWORD[72+rax]
+        mov     rbx,QWORD[80+rax]
+        lea     rax,[88+rax]
+
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_rc4_md5_enc wrt ..imagebase
+        DD      $L$SEH_end_rc4_md5_enc wrt ..imagebase
+        DD      $L$SEH_info_rc4_md5_enc wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_rc4_md5_enc:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm
new file mode 100644
index 0000000000..72e3641649
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm
@@ -0,0 +1,784 @@
+; Copyright 2005-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  RC4
+
+ALIGN   16
+RC4:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_RC4:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+
+        or      rsi,rsi
+        jne     NEAR $L$entry
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+$L$entry:
+
+        push    rbx
+
+        push    r12
+
+        push    r13
+
+$L$prologue:
+        mov     r11,rsi
+        mov     r12,rdx
+        mov     r13,rcx
+        xor     r10,r10
+        xor     rcx,rcx
+
+        lea     rdi,[8+rdi]
+        mov     r10b,BYTE[((-8))+rdi]
+        mov     cl,BYTE[((-4))+rdi]
+        cmp     DWORD[256+rdi],-1
+        je      NEAR $L$RC4_CHAR
+        mov     r8d,DWORD[OPENSSL_ia32cap_P]
+        xor     rbx,rbx
+        inc     r10b
+        sub     rbx,r10
+        sub     r13,r12
+        mov     eax,DWORD[r10*4+rdi]
+        test    r11,-16
+        jz      NEAR $L$loop1
+        bt      r8d,30
+        jc      NEAR $L$intel
+        and     rbx,7
+        lea     rsi,[1+r10]
+        jz      NEAR $L$oop8
+        sub     r11,rbx
+$L$oop8_warmup:
+        add     cl,al
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        mov     DWORD[r10*4+rdi],edx
+        add     al,dl
+        inc     r10b
+        mov     edx,DWORD[rax*4+rdi]
+        mov     eax,DWORD[r10*4+rdi]
+        xor     dl,BYTE[r12]
+        mov     BYTE[r13*1+r12],dl
+        lea     r12,[1+r12]
+        dec     rbx
+        jnz     NEAR $L$oop8_warmup
+
+        lea     rsi,[1+r10]
+        jmp     NEAR $L$oop8
+ALIGN   16
+$L$oop8:
+        add     cl,al
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        mov     ebx,DWORD[rsi*4+rdi]
+        ror     r8,8
+        mov     DWORD[r10*4+rdi],edx
+        add     dl,al
+        mov     r8b,BYTE[rdx*4+rdi]
+        add     cl,bl
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        mov     eax,DWORD[4+rsi*4+rdi]
+        ror     r8,8
+        mov     DWORD[4+r10*4+rdi],edx
+        add     dl,bl
+        mov     r8b,BYTE[rdx*4+rdi]
+        add     cl,al
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        mov     ebx,DWORD[8+rsi*4+rdi]
+        ror     r8,8
+        mov     DWORD[8+r10*4+rdi],edx
+        add     dl,al
+        mov     r8b,BYTE[rdx*4+rdi]
+        add     cl,bl
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        mov     eax,DWORD[12+rsi*4+rdi]
+        ror     r8,8
+        mov     DWORD[12+r10*4+rdi],edx
+        add     dl,bl
+        mov     r8b,BYTE[rdx*4+rdi]
+        add     cl,al
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        mov     ebx,DWORD[16+rsi*4+rdi]
+        ror     r8,8
+        mov     DWORD[16+r10*4+rdi],edx
+        add     dl,al
+        mov     r8b,BYTE[rdx*4+rdi]
+        add     cl,bl
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        mov     eax,DWORD[20+rsi*4+rdi]
+        ror     r8,8
+        mov     DWORD[20+r10*4+rdi],edx
+        add     dl,bl
+        mov     r8b,BYTE[rdx*4+rdi]
+        add     cl,al
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        mov     ebx,DWORD[24+rsi*4+rdi]
+        ror     r8,8
+        mov     DWORD[24+r10*4+rdi],edx
+        add     dl,al
+        mov     r8b,BYTE[rdx*4+rdi]
+        add     sil,8
+        add     cl,bl
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        mov     eax,DWORD[((-4))+rsi*4+rdi]
+        ror     r8,8
+        mov     DWORD[28+r10*4+rdi],edx
+        add     dl,bl
+        mov     r8b,BYTE[rdx*4+rdi]
+        add     r10b,8
+        ror     r8,8
+        sub     r11,8
+
+        xor     r8,QWORD[r12]
+        mov     QWORD[r13*1+r12],r8
+        lea     r12,[8+r12]
+
+        test    r11,-8
+        jnz     NEAR $L$oop8
+        cmp     r11,0
+        jne     NEAR $L$loop1
+        jmp     NEAR $L$exit
+
+ALIGN   16
+$L$intel:
+        test    r11,-32
+        jz      NEAR $L$loop1
+        and     rbx,15
+        jz      NEAR $L$oop16_is_hot
+        sub     r11,rbx
+$L$oop16_warmup:
+        add     cl,al
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        mov     DWORD[r10*4+rdi],edx
+        add     al,dl
+        inc     r10b
+        mov     edx,DWORD[rax*4+rdi]
+        mov     eax,DWORD[r10*4+rdi]
+        xor     dl,BYTE[r12]
+        mov     BYTE[r13*1+r12],dl
+        lea     r12,[1+r12]
+        dec     rbx
+        jnz     NEAR $L$oop16_warmup
+
+        mov     rbx,rcx
+        xor     rcx,rcx
+        mov     cl,bl
+
+$L$oop16_is_hot:
+        lea     rsi,[r10*4+rdi]
+        add     cl,al
+        mov     edx,DWORD[rcx*4+rdi]
+        pxor    xmm0,xmm0
+        mov     DWORD[rcx*4+rdi],eax
+        add     al,dl
+        mov     ebx,DWORD[4+rsi]
+        movzx   eax,al
+        mov     DWORD[rsi],edx
+        add     cl,bl
+        pinsrw  xmm0,WORD[rax*4+rdi],0
+        jmp     NEAR $L$oop16_enter
+ALIGN   16
+$L$oop16:
+        add     cl,al
+        mov     edx,DWORD[rcx*4+rdi]
+        pxor    xmm2,xmm0
+        psllq   xmm1,8
+        pxor    xmm0,xmm0
+        mov     DWORD[rcx*4+rdi],eax
+        add     al,dl
+        mov     ebx,DWORD[4+rsi]
+        movzx   eax,al
+        mov     DWORD[rsi],edx
+        pxor    xmm2,xmm1
+        add     cl,bl
+        pinsrw  xmm0,WORD[rax*4+rdi],0
+        movdqu  XMMWORD[r13*1+r12],xmm2
+        lea     r12,[16+r12]
+$L$oop16_enter:
+        mov     edx,DWORD[rcx*4+rdi]
+        pxor    xmm1,xmm1
+        mov     DWORD[rcx*4+rdi],ebx
+        add     bl,dl
+        mov     eax,DWORD[8+rsi]
+        movzx   ebx,bl
+        mov     DWORD[4+rsi],edx
+        add     cl,al
+        pinsrw  xmm1,WORD[rbx*4+rdi],0
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        add     al,dl
+        mov     ebx,DWORD[12+rsi]
+        movzx   eax,al
+        mov     DWORD[8+rsi],edx
+        add     cl,bl
+        pinsrw  xmm0,WORD[rax*4+rdi],1
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        add     bl,dl
+        mov     eax,DWORD[16+rsi]
+        movzx   ebx,bl
+        mov     DWORD[12+rsi],edx
+        add     cl,al
+        pinsrw  xmm1,WORD[rbx*4+rdi],1
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        add     al,dl
+        mov     ebx,DWORD[20+rsi]
+        movzx   eax,al
+        mov     DWORD[16+rsi],edx
+        add     cl,bl
+        pinsrw  xmm0,WORD[rax*4+rdi],2
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        add     bl,dl
+        mov     eax,DWORD[24+rsi]
+        movzx   ebx,bl
+        mov     DWORD[20+rsi],edx
+        add     cl,al
+        pinsrw  xmm1,WORD[rbx*4+rdi],2
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        add     al,dl
+        mov     ebx,DWORD[28+rsi]
+        movzx   eax,al
+        mov     DWORD[24+rsi],edx
+        add     cl,bl
+        pinsrw  xmm0,WORD[rax*4+rdi],3
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        add     bl,dl
+        mov     eax,DWORD[32+rsi]
+        movzx   ebx,bl
+        mov     DWORD[28+rsi],edx
+        add     cl,al
+        pinsrw  xmm1,WORD[rbx*4+rdi],3
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        add     al,dl
+        mov     ebx,DWORD[36+rsi]
+        movzx   eax,al
+        mov     DWORD[32+rsi],edx
+        add     cl,bl
+        pinsrw  xmm0,WORD[rax*4+rdi],4
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        add     bl,dl
+        mov     eax,DWORD[40+rsi]
+        movzx   ebx,bl
+        mov     DWORD[36+rsi],edx
+        add     cl,al
+        pinsrw  xmm1,WORD[rbx*4+rdi],4
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        add     al,dl
+        mov     ebx,DWORD[44+rsi]
+        movzx   eax,al
+        mov     DWORD[40+rsi],edx
+        add     cl,bl
+        pinsrw  xmm0,WORD[rax*4+rdi],5
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        add     bl,dl
+        mov     eax,DWORD[48+rsi]
+        movzx   ebx,bl
+        mov     DWORD[44+rsi],edx
+        add     cl,al
+        pinsrw  xmm1,WORD[rbx*4+rdi],5
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        add     al,dl
+        mov     ebx,DWORD[52+rsi]
+        movzx   eax,al
+        mov     DWORD[48+rsi],edx
+        add     cl,bl
+        pinsrw  xmm0,WORD[rax*4+rdi],6
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        add     bl,dl
+        mov     eax,DWORD[56+rsi]
+        movzx   ebx,bl
+        mov     DWORD[52+rsi],edx
+        add     cl,al
+        pinsrw  xmm1,WORD[rbx*4+rdi],6
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        add     al,dl
+        mov     ebx,DWORD[60+rsi]
+        movzx   eax,al
+        mov     DWORD[56+rsi],edx
+        add     cl,bl
+        pinsrw  xmm0,WORD[rax*4+rdi],7
+        add     r10b,16
+        movdqu  xmm2,XMMWORD[r12]
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],ebx
+        add     bl,dl
+        movzx   ebx,bl
+        mov     DWORD[60+rsi],edx
+        lea     rsi,[r10*4+rdi]
+        pinsrw  xmm1,WORD[rbx*4+rdi],7
+        mov     eax,DWORD[rsi]
+        mov     rbx,rcx
+        xor     rcx,rcx
+        sub     r11,16
+        mov     cl,bl
+        test    r11,-16
+        jnz     NEAR $L$oop16
+
+        psllq   xmm1,8
+        pxor    xmm2,xmm0
+        pxor    xmm2,xmm1
+        movdqu  XMMWORD[r13*1+r12],xmm2
+        lea     r12,[16+r12]
+
+        cmp     r11,0
+        jne     NEAR $L$loop1
+        jmp     NEAR $L$exit
+
+ALIGN   16
+$L$loop1:
+        add     cl,al
+        mov     edx,DWORD[rcx*4+rdi]
+        mov     DWORD[rcx*4+rdi],eax
+        mov     DWORD[r10*4+rdi],edx
+        add     al,dl
+        inc     r10b
+        mov     edx,DWORD[rax*4+rdi]
+        mov     eax,DWORD[r10*4+rdi]
+        xor     dl,BYTE[r12]
+        mov     BYTE[r13*1+r12],dl
+        lea     r12,[1+r12]
+        dec     r11
+        jnz     NEAR $L$loop1
+        jmp     NEAR $L$exit
+
+ALIGN   16
+$L$RC4_CHAR:
+        add     r10b,1
+        movzx   eax,BYTE[r10*1+rdi]
+        test    r11,-8
+        jz      NEAR $L$cloop1
+        jmp     NEAR $L$cloop8
+ALIGN   16
+$L$cloop8:
+        mov     r8d,DWORD[r12]
+        mov     r9d,DWORD[4+r12]
+        add     cl,al
+        lea     rsi,[1+r10]
+        movzx   edx,BYTE[rcx*1+rdi]
+        movzx   esi,sil
+        movzx   ebx,BYTE[rsi*1+rdi]
+        mov     BYTE[rcx*1+rdi],al
+        cmp     rcx,rsi
+        mov     BYTE[r10*1+rdi],dl
+        jne     NEAR $L$cmov0
+        mov     rbx,rax
+$L$cmov0:
+        add     dl,al
+        xor     r8b,BYTE[rdx*1+rdi]
+        ror     r8d,8
+        add     cl,bl
+        lea     r10,[1+rsi]
+        movzx   edx,BYTE[rcx*1+rdi]
+        movzx   r10d,r10b
+        movzx   eax,BYTE[r10*1+rdi]
+        mov     BYTE[rcx*1+rdi],bl
+        cmp     rcx,r10
+        mov     BYTE[rsi*1+rdi],dl
+        jne     NEAR $L$cmov1
+        mov     rax,rbx
+$L$cmov1:
+        add     dl,bl
+        xor     r8b,BYTE[rdx*1+rdi]
+        ror     r8d,8
+        add     cl,al
+        lea     rsi,[1+r10]
+        movzx   edx,BYTE[rcx*1+rdi]
+        movzx   esi,sil
+        movzx   ebx,BYTE[rsi*1+rdi]
+        mov     BYTE[rcx*1+rdi],al
+        cmp     rcx,rsi
+        mov     BYTE[r10*1+rdi],dl
+        jne     NEAR $L$cmov2
+        mov     rbx,rax
+$L$cmov2:
+        add     dl,al
+        xor     r8b,BYTE[rdx*1+rdi]
+        ror     r8d,8
+        add     cl,bl
+        lea     r10,[1+rsi]
+        movzx   edx,BYTE[rcx*1+rdi]
+        movzx   r10d,r10b
+        movzx   eax,BYTE[r10*1+rdi]
+        mov     BYTE[rcx*1+rdi],bl
+        cmp     rcx,r10
+        mov     BYTE[rsi*1+rdi],dl
+        jne     NEAR $L$cmov3
+        mov     rax,rbx
+$L$cmov3:
+        add     dl,bl
+        xor     r8b,BYTE[rdx*1+rdi]
+        ror     r8d,8
+        add     cl,al
+        lea     rsi,[1+r10]
+        movzx   edx,BYTE[rcx*1+rdi]
+        movzx   esi,sil
+        movzx   ebx,BYTE[rsi*1+rdi]
+        mov     BYTE[rcx*1+rdi],al
+        cmp     rcx,rsi
+        mov     BYTE[r10*1+rdi],dl
+        jne     NEAR $L$cmov4
+        mov     rbx,rax
+$L$cmov4:
+        add     dl,al
+        xor     r9b,BYTE[rdx*1+rdi]
+        ror     r9d,8
+        add     cl,bl
+        lea     r10,[1+rsi]
+        movzx   edx,BYTE[rcx*1+rdi]
+        movzx   r10d,r10b
+        movzx   eax,BYTE[r10*1+rdi]
+        mov     BYTE[rcx*1+rdi],bl
+        cmp     rcx,r10
+        mov     BYTE[rsi*1+rdi],dl
+        jne     NEAR $L$cmov5
+        mov     rax,rbx
+$L$cmov5:
+        add     dl,bl
+        xor     r9b,BYTE[rdx*1+rdi]
+        ror     r9d,8
+        add     cl,al
+        lea     rsi,[1+r10]
+        movzx   edx,BYTE[rcx*1+rdi]
+        movzx   esi,sil
+        movzx   ebx,BYTE[rsi*1+rdi]
+        mov     BYTE[rcx*1+rdi],al
+        cmp     rcx,rsi
+        mov     BYTE[r10*1+rdi],dl
+        jne     NEAR $L$cmov6
+        mov     rbx,rax
+$L$cmov6:
+        add     dl,al
+        xor     r9b,BYTE[rdx*1+rdi]
+        ror     r9d,8
+        add     cl,bl
+        lea     r10,[1+rsi]
+        movzx   edx,BYTE[rcx*1+rdi]
+        movzx   r10d,r10b
+        movzx   eax,BYTE[r10*1+rdi]
+        mov     BYTE[rcx*1+rdi],bl
+        cmp     rcx,r10
+        mov     BYTE[rsi*1+rdi],dl
+        jne     NEAR $L$cmov7
+        mov     rax,rbx
+$L$cmov7:
+        add     dl,bl
+        xor     r9b,BYTE[rdx*1+rdi]
+        ror     r9d,8
+        lea     r11,[((-8))+r11]
+        mov     DWORD[r13],r8d
+        lea     r12,[8+r12]
+        mov     DWORD[4+r13],r9d
+        lea     r13,[8+r13]
+
+        test    r11,-8
+        jnz     NEAR $L$cloop8
+        cmp     r11,0
+        jne     NEAR $L$cloop1
+        jmp     NEAR $L$exit
+ALIGN   16
+$L$cloop1:
+        add     cl,al
+        movzx   ecx,cl
+        movzx   edx,BYTE[rcx*1+rdi]
+        mov     BYTE[rcx*1+rdi],al
+        mov     BYTE[r10*1+rdi],dl
+        add     dl,al
+        add     r10b,1
+        movzx   edx,dl
+        movzx   r10d,r10b
+        movzx   edx,BYTE[rdx*1+rdi]
+        movzx   eax,BYTE[r10*1+rdi]
+        xor     dl,BYTE[r12]
+        lea     r12,[1+r12]
+        mov     BYTE[r13],dl
+        lea     r13,[1+r13]
+        sub     r11,1
+        jnz     NEAR $L$cloop1
+        jmp     NEAR $L$exit
+
+ALIGN   16
+$L$exit:
+        sub     r10b,1
+        mov     DWORD[((-8))+rdi],r10d
+        mov     DWORD[((-4))+rdi],ecx
+
+        mov     r13,QWORD[rsp]
+
+        mov     r12,QWORD[8+rsp]
+
+        mov     rbx,QWORD[16+rsp]
+
+        add     rsp,24
+
+$L$epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_RC4:
+global  RC4_set_key
+
+ALIGN   16
+RC4_set_key:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_RC4_set_key:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+        lea     rdi,[8+rdi]
+        lea     rdx,[rsi*1+rdx]
+        neg     rsi
+        mov     rcx,rsi
+        xor     eax,eax
+        xor     r9,r9
+        xor     r10,r10
+        xor     r11,r11
+
+        mov     r8d,DWORD[OPENSSL_ia32cap_P]
+        bt      r8d,20
+        jc      NEAR $L$c1stloop
+        jmp     NEAR $L$w1stloop
+
+ALIGN   16
+$L$w1stloop:
+        mov     DWORD[rax*4+rdi],eax
+        add     al,1
+        jnc     NEAR $L$w1stloop
+
+        xor     r9,r9
+        xor     r8,r8
+ALIGN   16
+$L$w2ndloop:
+        mov     r10d,DWORD[r9*4+rdi]
+        add     r8b,BYTE[rsi*1+rdx]
+        add     r8b,r10b
+        add     rsi,1
+        mov     r11d,DWORD[r8*4+rdi]
+        cmovz   rsi,rcx
+        mov     DWORD[r8*4+rdi],r10d
+        mov     DWORD[r9*4+rdi],r11d
+        add     r9b,1
+        jnc     NEAR $L$w2ndloop
+        jmp     NEAR $L$exit_key
+
+ALIGN   16
+$L$c1stloop:
+        mov     BYTE[rax*1+rdi],al
+        add     al,1
+        jnc     NEAR $L$c1stloop
+
+        xor     r9,r9
+        xor     r8,r8
+ALIGN   16
+$L$c2ndloop:
+        mov     r10b,BYTE[r9*1+rdi]
+        add     r8b,BYTE[rsi*1+rdx]
+        add     r8b,r10b
+        add     rsi,1
+        mov     r11b,BYTE[r8*1+rdi]
+        jnz     NEAR $L$cnowrap
+        mov     rsi,rcx
+$L$cnowrap:
+        mov     BYTE[r8*1+rdi],r10b
+        mov     BYTE[r9*1+rdi],r11b
+        add     r9b,1
+        jnc     NEAR $L$c2ndloop
+        mov     DWORD[256+rdi],-1
+
+ALIGN   16
+$L$exit_key:
+        xor     eax,eax
+        mov     DWORD[((-8))+rdi],eax
+        mov     DWORD[((-4))+rdi],eax
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+$L$SEH_end_RC4_set_key:
+
+global  RC4_options
+
+ALIGN   16
+RC4_options:
+        lea     rax,[$L$opts]
+        mov     edx,DWORD[OPENSSL_ia32cap_P]
+        bt      edx,20
+        jc      NEAR $L$8xchar
+        bt      edx,30
+        jnc     NEAR $L$done
+        add     rax,25
+        DB      0F3h,0C3h               ;repret
+$L$8xchar:
+        add     rax,12
+$L$done:
+        DB      0F3h,0C3h               ;repret
+ALIGN   64
+$L$opts:
+DB      114,99,52,40,56,120,44,105,110,116,41,0
+DB      114,99,52,40,56,120,44,99,104,97,114,41,0
+DB      114,99,52,40,49,54,120,44,105,110,116,41,0
+DB      82,67,52,32,102,111,114,32,120,56,54,95,54,52,44,32
+DB      67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97
+DB      112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103
+DB      62,0
+ALIGN   64
+
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+stream_se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        lea     r10,[$L$prologue]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        lea     r10,[$L$epilogue]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        lea     rax,[24+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     r12,QWORD[((-16))+rax]
+        mov     r13,QWORD[((-24))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        jmp     NEAR $L$common_seh_exit
+
+
+
+ALIGN   16
+key_se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[152+r8]
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+$L$common_seh_exit:
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_RC4 wrt ..imagebase
+        DD      $L$SEH_end_RC4 wrt ..imagebase
+        DD      $L$SEH_info_RC4 wrt ..imagebase
+
+        DD      $L$SEH_begin_RC4_set_key wrt ..imagebase
+        DD      $L$SEH_end_RC4_set_key wrt ..imagebase
+        DD      $L$SEH_info_RC4_set_key wrt ..imagebase
+
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_RC4:
+DB      9,0,0,0
+        DD      stream_se_handler wrt ..imagebase
+$L$SEH_info_RC4_set_key:
+DB      9,0,0,0
+        DD      key_se_handler wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm
new file mode 100644
index 0000000000..00eadebf68
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm
@@ -0,0 +1,532 @@
+; Copyright 2017-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+
+ALIGN   32
+__KeccakF1600:
+        mov     rax,QWORD[60+rdi]
+        mov     rbx,QWORD[68+rdi]
+        mov     rcx,QWORD[76+rdi]
+        mov     rdx,QWORD[84+rdi]
+        mov     rbp,QWORD[92+rdi]
+        jmp     NEAR $L$oop
+
+ALIGN   32
+$L$oop:
+        mov     r8,QWORD[((-100))+rdi]
+        mov     r9,QWORD[((-52))+rdi]
+        mov     r10,QWORD[((-4))+rdi]
+        mov     r11,QWORD[44+rdi]
+
+        xor     rcx,QWORD[((-84))+rdi]
+        xor     rdx,QWORD[((-76))+rdi]
+        xor     rax,r8
+        xor     rbx,QWORD[((-92))+rdi]
+        xor     rcx,QWORD[((-44))+rdi]
+        xor     rax,QWORD[((-60))+rdi]
+        mov     r12,rbp
+        xor     rbp,QWORD[((-68))+rdi]
+
+        xor     rcx,r10
+        xor     rax,QWORD[((-20))+rdi]
+        xor     rdx,QWORD[((-36))+rdi]
+        xor     rbx,r9
+        xor     rbp,QWORD[((-28))+rdi]
+
+        xor     rcx,QWORD[36+rdi]
+        xor     rax,QWORD[20+rdi]
+        xor     rdx,QWORD[4+rdi]
+        xor     rbx,QWORD[((-12))+rdi]
+        xor     rbp,QWORD[12+rdi]
+
+        mov     r13,rcx
+        rol     rcx,1
+        xor     rcx,rax
+        xor     rdx,r11
+
+        rol     rax,1
+        xor     rax,rdx
+        xor     rbx,QWORD[28+rdi]
+
+        rol     rdx,1
+        xor     rdx,rbx
+        xor     rbp,QWORD[52+rdi]
+
+        rol     rbx,1
+        xor     rbx,rbp
+
+        rol     rbp,1
+        xor     rbp,r13
+        xor     r9,rcx
+        xor     r10,rdx
+        rol     r9,44
+        xor     r11,rbp
+        xor     r12,rax
+        rol     r10,43
+        xor     r8,rbx
+        mov     r13,r9
+        rol     r11,21
+        or      r9,r10
+        xor     r9,r8
+        rol     r12,14
+
+        xor     r9,QWORD[r15]
+        lea     r15,[8+r15]
+
+        mov     r14,r12
+        and     r12,r11
+        mov     QWORD[((-100))+rsi],r9
+        xor     r12,r10
+        not     r10
+        mov     QWORD[((-84))+rsi],r12
+
+        or      r10,r11
+        mov     r12,QWORD[76+rdi]
+        xor     r10,r13
+        mov     QWORD[((-92))+rsi],r10
+
+        and     r13,r8
+        mov     r9,QWORD[((-28))+rdi]
+        xor     r13,r14
+        mov     r10,QWORD[((-20))+rdi]
+        mov     QWORD[((-68))+rsi],r13
+
+        or      r14,r8
+        mov     r8,QWORD[((-76))+rdi]
+        xor     r14,r11
+        mov     r11,QWORD[28+rdi]
+        mov     QWORD[((-76))+rsi],r14
+
+
+        xor     r8,rbp
+        xor     r12,rdx
+        rol     r8,28
+        xor     r11,rcx
+        xor     r9,rax
+        rol     r12,61
+        rol     r11,45
+        xor     r10,rbx
+        rol     r9,20
+        mov     r13,r8
+        or      r8,r12
+        rol     r10,3
+
+        xor     r8,r11
+        mov     QWORD[((-36))+rsi],r8
+
+        mov     r14,r9
+        and     r9,r13
+        mov     r8,QWORD[((-92))+rdi]
+        xor     r9,r12
+        not     r12
+        mov     QWORD[((-28))+rsi],r9
+
+        or      r12,r11
+        mov     r9,QWORD[((-44))+rdi]
+        xor     r12,r10
+        mov     QWORD[((-44))+rsi],r12
+
+        and     r11,r10
+        mov     r12,QWORD[60+rdi]
+        xor     r11,r14
+        mov     QWORD[((-52))+rsi],r11
+
+        or      r14,r10
+        mov     r10,QWORD[4+rdi]
+        xor     r14,r13
+        mov     r11,QWORD[52+rdi]
+        mov     QWORD[((-60))+rsi],r14
+
+
+        xor     r10,rbp
+        xor     r11,rax
+        rol     r10,25
+        xor     r9,rdx
+        rol     r11,8
+        xor     r12,rbx
+        rol     r9,6
+        xor     r8,rcx
+        rol     r12,18
+        mov     r13,r10
+        and     r10,r11
+        rol     r8,1
+
+        not     r11
+        xor     r10,r9
+        mov     QWORD[((-12))+rsi],r10
+
+        mov     r14,r12
+        and     r12,r11
+        mov     r10,QWORD[((-12))+rdi]
+        xor     r12,r13
+        mov     QWORD[((-4))+rsi],r12
+
+        or      r13,r9
+        mov     r12,QWORD[84+rdi]
+        xor     r13,r8
+        mov     QWORD[((-20))+rsi],r13
+
+        and     r9,r8
+        xor     r9,r14
+        mov     QWORD[12+rsi],r9
+
+        or      r14,r8
+        mov     r9,QWORD[((-60))+rdi]
+        xor     r14,r11
+        mov     r11,QWORD[36+rdi]
+        mov     QWORD[4+rsi],r14
+
+
+        mov     r8,QWORD[((-68))+rdi]
+
+        xor     r10,rcx
+        xor     r11,rdx
+        rol     r10,10
+        xor     r9,rbx
+        rol     r11,15
+        xor     r12,rbp
+        rol     r9,36
+        xor     r8,rax
+        rol     r12,56
+        mov     r13,r10
+        or      r10,r11
+        rol     r8,27
+
+        not     r11
+        xor     r10,r9
+        mov     QWORD[28+rsi],r10
+
+        mov     r14,r12
+        or      r12,r11
+        xor     r12,r13
+        mov     QWORD[36+rsi],r12
+
+        and     r13,r9
+        xor     r13,r8
+        mov     QWORD[20+rsi],r13
+
+        or      r9,r8
+        xor     r9,r14
+        mov     QWORD[52+rsi],r9
+
+        and     r8,r14
+        xor     r8,r11
+        mov     QWORD[44+rsi],r8
+
+
+        xor     rdx,QWORD[((-84))+rdi]
+        xor     rbp,QWORD[((-36))+rdi]
+        rol     rdx,62
+        xor     rcx,QWORD[68+rdi]
+        rol     rbp,55
+        xor     rax,QWORD[12+rdi]
+        rol     rcx,2
+        xor     rbx,QWORD[20+rdi]
+        xchg    rdi,rsi
+        rol     rax,39
+        rol     rbx,41
+        mov     r13,rdx
+        and     rdx,rbp
+        not     rbp
+        xor     rdx,rcx
+        mov     QWORD[92+rdi],rdx
+
+        mov     r14,rax
+        and     rax,rbp
+        xor     rax,r13
+        mov     QWORD[60+rdi],rax
+
+        or      r13,rcx
+        xor     r13,rbx
+        mov     QWORD[84+rdi],r13
+
+        and     rcx,rbx
+        xor     rcx,r14
+        mov     QWORD[76+rdi],rcx
+
+        or      rbx,r14
+        xor     rbx,rbp
+        mov     QWORD[68+rdi],rbx
+
+        mov     rbp,rdx
+        mov     rdx,r13
+
+        test    r15,255
+        jnz     NEAR $L$oop
+
+        lea     r15,[((-192))+r15]
+        DB      0F3h,0C3h               ;repret
+
+
+
+ALIGN   32
+KeccakF1600:
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        lea     rdi,[100+rdi]
+        sub     rsp,200
+
+
+        not     QWORD[((-92))+rdi]
+        not     QWORD[((-84))+rdi]
+        not     QWORD[((-36))+rdi]
+        not     QWORD[((-4))+rdi]
+        not     QWORD[36+rdi]
+        not     QWORD[60+rdi]
+
+        lea     r15,[iotas]
+        lea     rsi,[100+rsp]
+
+        call    __KeccakF1600
+
+        not     QWORD[((-92))+rdi]
+        not     QWORD[((-84))+rdi]
+        not     QWORD[((-36))+rdi]
+        not     QWORD[((-4))+rdi]
+        not     QWORD[36+rdi]
+        not     QWORD[60+rdi]
+        lea     rdi,[((-100))+rdi]
+
+        add     rsp,200
+
+
+        pop     r15
+
+        pop     r14
+
+        pop     r13
+
+        pop     r12
+
+        pop     rbp
+
+        pop     rbx
+
+        DB      0F3h,0C3h               ;repret
+
+
+global  SHA3_absorb
+
+ALIGN   32
+SHA3_absorb:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_SHA3_absorb:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+
+
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+
+        lea     rdi,[100+rdi]
+        sub     rsp,232
+
+
+        mov     r9,rsi
+        lea     rsi,[100+rsp]
+
+        not     QWORD[((-92))+rdi]
+        not     QWORD[((-84))+rdi]
+        not     QWORD[((-36))+rdi]
+        not     QWORD[((-4))+rdi]
+        not     QWORD[36+rdi]
+        not     QWORD[60+rdi]
+        lea     r15,[iotas]
+
+        mov     QWORD[((216-100))+rsi],rcx
+
+$L$oop_absorb:
+        cmp     rdx,rcx
+        jc      NEAR $L$done_absorb
+
+        shr     rcx,3
+        lea     r8,[((-100))+rdi]
+
+$L$block_absorb:
+        mov     rax,QWORD[r9]
+        lea     r9,[8+r9]
+        xor     rax,QWORD[r8]
+        lea     r8,[8+r8]
+        sub     rdx,8
+        mov     QWORD[((-8))+r8],rax
+        sub     rcx,1
+        jnz     NEAR $L$block_absorb
+
+        mov     QWORD[((200-100))+rsi],r9
+        mov     QWORD[((208-100))+rsi],rdx
+        call    __KeccakF1600
+        mov     r9,QWORD[((200-100))+rsi]
+        mov     rdx,QWORD[((208-100))+rsi]
+        mov     rcx,QWORD[((216-100))+rsi]
+        jmp     NEAR $L$oop_absorb
+
+ALIGN   32
+$L$done_absorb:
+        mov     rax,rdx
+
+        not     QWORD[((-92))+rdi]
+        not     QWORD[((-84))+rdi]
+        not     QWORD[((-36))+rdi]
+        not     QWORD[((-4))+rdi]
+        not     QWORD[36+rdi]
+        not     QWORD[60+rdi]
+
+        add     rsp,232
+
+
+        pop     r15
+
+        pop     r14
+
+        pop     r13
+
+        pop     r12
+
+        pop     rbp
+
+        pop     rbx
+
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_SHA3_absorb:
+global  SHA3_squeeze
+
+ALIGN   32
+SHA3_squeeze:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_SHA3_squeeze:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+        mov     rcx,r9
+
+
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+
+        shr     rcx,3
+        mov     r8,rdi
+        mov     r12,rsi
+        mov     r13,rdx
+        mov     r14,rcx
+        jmp     NEAR $L$oop_squeeze
+
+ALIGN   32
+$L$oop_squeeze:
+        cmp     r13,8
+        jb      NEAR $L$tail_squeeze
+
+        mov     rax,QWORD[r8]
+        lea     r8,[8+r8]
+        mov     QWORD[r12],rax
+        lea     r12,[8+r12]
+        sub     r13,8
+        jz      NEAR $L$done_squeeze
+
+        sub     rcx,1
+        jnz     NEAR $L$oop_squeeze
+
+        call    KeccakF1600
+        mov     r8,rdi
+        mov     rcx,r14
+        jmp     NEAR $L$oop_squeeze
+
+$L$tail_squeeze:
+        mov     rsi,r8
+        mov     rdi,r12
+        mov     rcx,r13
+DB      0xf3,0xa4
+
+$L$done_squeeze:
+        pop     r14
+
+        pop     r13
+
+        pop     r12
+
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_SHA3_squeeze:
+ALIGN   256
+        DQ      0,0,0,0,0,0,0,0
+
+iotas:
+        DQ      0x0000000000000001
+        DQ      0x0000000000008082
+        DQ      0x800000000000808a
+        DQ      0x8000000080008000
+        DQ      0x000000000000808b
+        DQ      0x0000000080000001
+        DQ      0x8000000080008081
+        DQ      0x8000000000008009
+        DQ      0x000000000000008a
+        DQ      0x0000000000000088
+        DQ      0x0000000080008009
+        DQ      0x000000008000000a
+        DQ      0x000000008000808b
+        DQ      0x800000000000008b
+        DQ      0x8000000000008089
+        DQ      0x8000000000008003
+        DQ      0x8000000000008002
+        DQ      0x8000000000000080
+        DQ      0x000000000000800a
+        DQ      0x800000008000000a
+        DQ      0x8000000080008081
+        DQ      0x8000000000008080
+        DQ      0x0000000080000001
+        DQ      0x8000000080008008
+
+DB      75,101,99,99,97,107,45,49,54,48,48,32,97,98,115,111
+DB      114,98,32,97,110,100,32,115,113,117,101,101,122,101,32,102
+DB      111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84
+DB      79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64
+DB      111,112,101,110,115,115,108,46,111,114,103,62,0
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm
new file mode 100644
index 0000000000..ea394daa3b
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm
@@ -0,0 +1,7581 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  sha1_multi_block
+
+ALIGN   32
+sha1_multi_block:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha1_multi_block:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        mov     rcx,QWORD[((OPENSSL_ia32cap_P+4))]
+        bt      rcx,61
+        jc      NEAR _shaext_shortcut
+        test    ecx,268435456
+        jnz     NEAR _avx_shortcut
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[(-120)+rax],xmm10
+        movaps  XMMWORD[(-104)+rax],xmm11
+        movaps  XMMWORD[(-88)+rax],xmm12
+        movaps  XMMWORD[(-72)+rax],xmm13
+        movaps  XMMWORD[(-56)+rax],xmm14
+        movaps  XMMWORD[(-40)+rax],xmm15
+        sub     rsp,288
+        and     rsp,-256
+        mov     QWORD[272+rsp],rax
+
+$L$body:
+        lea     rbp,[K_XX_XX]
+        lea     rbx,[256+rsp]
+
+$L$oop_grande:
+        mov     DWORD[280+rsp],edx
+        xor     edx,edx
+        mov     r8,QWORD[rsi]
+        mov     ecx,DWORD[8+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[rbx],ecx
+        cmovle  r8,rbp
+        mov     r9,QWORD[16+rsi]
+        mov     ecx,DWORD[24+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[4+rbx],ecx
+        cmovle  r9,rbp
+        mov     r10,QWORD[32+rsi]
+        mov     ecx,DWORD[40+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[8+rbx],ecx
+        cmovle  r10,rbp
+        mov     r11,QWORD[48+rsi]
+        mov     ecx,DWORD[56+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[12+rbx],ecx
+        cmovle  r11,rbp
+        test    edx,edx
+        jz      NEAR $L$done
+
+        movdqu  xmm10,XMMWORD[rdi]
+        lea     rax,[128+rsp]
+        movdqu  xmm11,XMMWORD[32+rdi]
+        movdqu  xmm12,XMMWORD[64+rdi]
+        movdqu  xmm13,XMMWORD[96+rdi]
+        movdqu  xmm14,XMMWORD[128+rdi]
+        movdqa  xmm5,XMMWORD[96+rbp]
+        movdqa  xmm15,XMMWORD[((-32))+rbp]
+        jmp     NEAR $L$oop
+
+ALIGN   32
+$L$oop:
+        movd    xmm0,DWORD[r8]
+        lea     r8,[64+r8]
+        movd    xmm2,DWORD[r9]
+        lea     r9,[64+r9]
+        movd    xmm3,DWORD[r10]
+        lea     r10,[64+r10]
+        movd    xmm4,DWORD[r11]
+        lea     r11,[64+r11]
+        punpckldq       xmm0,xmm3
+        movd    xmm1,DWORD[((-60))+r8]
+        punpckldq       xmm2,xmm4
+        movd    xmm9,DWORD[((-60))+r9]
+        punpckldq       xmm0,xmm2
+        movd    xmm8,DWORD[((-60))+r10]
+DB      102,15,56,0,197
+        movd    xmm7,DWORD[((-60))+r11]
+        punpckldq       xmm1,xmm8
+        movdqa  xmm8,xmm10
+        paddd   xmm14,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm11
+        movdqa  xmm6,xmm11
+        pslld   xmm8,5
+        pandn   xmm7,xmm13
+        pand    xmm6,xmm12
+        punpckldq       xmm1,xmm9
+        movdqa  xmm9,xmm10
+
+        movdqa  XMMWORD[(0-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        movd    xmm2,DWORD[((-56))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm11
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-56))+r9]
+        pslld   xmm7,30
+        paddd   xmm14,xmm6
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+DB      102,15,56,0,205
+        movd    xmm8,DWORD[((-56))+r10]
+        por     xmm11,xmm7
+        movd    xmm7,DWORD[((-56))+r11]
+        punpckldq       xmm2,xmm8
+        movdqa  xmm8,xmm14
+        paddd   xmm13,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm10
+        movdqa  xmm6,xmm10
+        pslld   xmm8,5
+        pandn   xmm7,xmm12
+        pand    xmm6,xmm11
+        punpckldq       xmm2,xmm9
+        movdqa  xmm9,xmm14
+
+        movdqa  XMMWORD[(16-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        movd    xmm3,DWORD[((-52))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm10
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-52))+r9]
+        pslld   xmm7,30
+        paddd   xmm13,xmm6
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+DB      102,15,56,0,213
+        movd    xmm8,DWORD[((-52))+r10]
+        por     xmm10,xmm7
+        movd    xmm7,DWORD[((-52))+r11]
+        punpckldq       xmm3,xmm8
+        movdqa  xmm8,xmm13
+        paddd   xmm12,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm14
+        movdqa  xmm6,xmm14
+        pslld   xmm8,5
+        pandn   xmm7,xmm11
+        pand    xmm6,xmm10
+        punpckldq       xmm3,xmm9
+        movdqa  xmm9,xmm13
+
+        movdqa  XMMWORD[(32-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        movd    xmm4,DWORD[((-48))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm14
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-48))+r9]
+        pslld   xmm7,30
+        paddd   xmm12,xmm6
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+DB      102,15,56,0,221
+        movd    xmm8,DWORD[((-48))+r10]
+        por     xmm14,xmm7
+        movd    xmm7,DWORD[((-48))+r11]
+        punpckldq       xmm4,xmm8
+        movdqa  xmm8,xmm12
+        paddd   xmm11,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm13
+        movdqa  xmm6,xmm13
+        pslld   xmm8,5
+        pandn   xmm7,xmm10
+        pand    xmm6,xmm14
+        punpckldq       xmm4,xmm9
+        movdqa  xmm9,xmm12
+
+        movdqa  XMMWORD[(48-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        movd    xmm0,DWORD[((-44))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm13
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-44))+r9]
+        pslld   xmm7,30
+        paddd   xmm11,xmm6
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+DB      102,15,56,0,229
+        movd    xmm8,DWORD[((-44))+r10]
+        por     xmm13,xmm7
+        movd    xmm7,DWORD[((-44))+r11]
+        punpckldq       xmm0,xmm8
+        movdqa  xmm8,xmm11
+        paddd   xmm10,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm12
+        movdqa  xmm6,xmm12
+        pslld   xmm8,5
+        pandn   xmm7,xmm14
+        pand    xmm6,xmm13
+        punpckldq       xmm0,xmm9
+        movdqa  xmm9,xmm11
+
+        movdqa  XMMWORD[(64-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        movd    xmm1,DWORD[((-40))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm12
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-40))+r9]
+        pslld   xmm7,30
+        paddd   xmm10,xmm6
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+DB      102,15,56,0,197
+        movd    xmm8,DWORD[((-40))+r10]
+        por     xmm12,xmm7
+        movd    xmm7,DWORD[((-40))+r11]
+        punpckldq       xmm1,xmm8
+        movdqa  xmm8,xmm10
+        paddd   xmm14,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm11
+        movdqa  xmm6,xmm11
+        pslld   xmm8,5
+        pandn   xmm7,xmm13
+        pand    xmm6,xmm12
+        punpckldq       xmm1,xmm9
+        movdqa  xmm9,xmm10
+
+        movdqa  XMMWORD[(80-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        movd    xmm2,DWORD[((-36))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm11
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-36))+r9]
+        pslld   xmm7,30
+        paddd   xmm14,xmm6
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+DB      102,15,56,0,205
+        movd    xmm8,DWORD[((-36))+r10]
+        por     xmm11,xmm7
+        movd    xmm7,DWORD[((-36))+r11]
+        punpckldq       xmm2,xmm8
+        movdqa  xmm8,xmm14
+        paddd   xmm13,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm10
+        movdqa  xmm6,xmm10
+        pslld   xmm8,5
+        pandn   xmm7,xmm12
+        pand    xmm6,xmm11
+        punpckldq       xmm2,xmm9
+        movdqa  xmm9,xmm14
+
+        movdqa  XMMWORD[(96-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        movd    xmm3,DWORD[((-32))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm10
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-32))+r9]
+        pslld   xmm7,30
+        paddd   xmm13,xmm6
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+DB      102,15,56,0,213
+        movd    xmm8,DWORD[((-32))+r10]
+        por     xmm10,xmm7
+        movd    xmm7,DWORD[((-32))+r11]
+        punpckldq       xmm3,xmm8
+        movdqa  xmm8,xmm13
+        paddd   xmm12,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm14
+        movdqa  xmm6,xmm14
+        pslld   xmm8,5
+        pandn   xmm7,xmm11
+        pand    xmm6,xmm10
+        punpckldq       xmm3,xmm9
+        movdqa  xmm9,xmm13
+
+        movdqa  XMMWORD[(112-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        movd    xmm4,DWORD[((-28))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm14
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-28))+r9]
+        pslld   xmm7,30
+        paddd   xmm12,xmm6
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+DB      102,15,56,0,221
+        movd    xmm8,DWORD[((-28))+r10]
+        por     xmm14,xmm7
+        movd    xmm7,DWORD[((-28))+r11]
+        punpckldq       xmm4,xmm8
+        movdqa  xmm8,xmm12
+        paddd   xmm11,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm13
+        movdqa  xmm6,xmm13
+        pslld   xmm8,5
+        pandn   xmm7,xmm10
+        pand    xmm6,xmm14
+        punpckldq       xmm4,xmm9
+        movdqa  xmm9,xmm12
+
+        movdqa  XMMWORD[(128-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        movd    xmm0,DWORD[((-24))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm13
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-24))+r9]
+        pslld   xmm7,30
+        paddd   xmm11,xmm6
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+DB      102,15,56,0,229
+        movd    xmm8,DWORD[((-24))+r10]
+        por     xmm13,xmm7
+        movd    xmm7,DWORD[((-24))+r11]
+        punpckldq       xmm0,xmm8
+        movdqa  xmm8,xmm11
+        paddd   xmm10,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm12
+        movdqa  xmm6,xmm12
+        pslld   xmm8,5
+        pandn   xmm7,xmm14
+        pand    xmm6,xmm13
+        punpckldq       xmm0,xmm9
+        movdqa  xmm9,xmm11
+
+        movdqa  XMMWORD[(144-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        movd    xmm1,DWORD[((-20))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm12
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-20))+r9]
+        pslld   xmm7,30
+        paddd   xmm10,xmm6
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+DB      102,15,56,0,197
+        movd    xmm8,DWORD[((-20))+r10]
+        por     xmm12,xmm7
+        movd    xmm7,DWORD[((-20))+r11]
+        punpckldq       xmm1,xmm8
+        movdqa  xmm8,xmm10
+        paddd   xmm14,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm11
+        movdqa  xmm6,xmm11
+        pslld   xmm8,5
+        pandn   xmm7,xmm13
+        pand    xmm6,xmm12
+        punpckldq       xmm1,xmm9
+        movdqa  xmm9,xmm10
+
+        movdqa  XMMWORD[(160-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        movd    xmm2,DWORD[((-16))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm11
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-16))+r9]
+        pslld   xmm7,30
+        paddd   xmm14,xmm6
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+DB      102,15,56,0,205
+        movd    xmm8,DWORD[((-16))+r10]
+        por     xmm11,xmm7
+        movd    xmm7,DWORD[((-16))+r11]
+        punpckldq       xmm2,xmm8
+        movdqa  xmm8,xmm14
+        paddd   xmm13,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm10
+        movdqa  xmm6,xmm10
+        pslld   xmm8,5
+        pandn   xmm7,xmm12
+        pand    xmm6,xmm11
+        punpckldq       xmm2,xmm9
+        movdqa  xmm9,xmm14
+
+        movdqa  XMMWORD[(176-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        movd    xmm3,DWORD[((-12))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm10
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-12))+r9]
+        pslld   xmm7,30
+        paddd   xmm13,xmm6
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+DB      102,15,56,0,213
+        movd    xmm8,DWORD[((-12))+r10]
+        por     xmm10,xmm7
+        movd    xmm7,DWORD[((-12))+r11]
+        punpckldq       xmm3,xmm8
+        movdqa  xmm8,xmm13
+        paddd   xmm12,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm14
+        movdqa  xmm6,xmm14
+        pslld   xmm8,5
+        pandn   xmm7,xmm11
+        pand    xmm6,xmm10
+        punpckldq       xmm3,xmm9
+        movdqa  xmm9,xmm13
+
+        movdqa  XMMWORD[(192-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        movd    xmm4,DWORD[((-8))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm14
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-8))+r9]
+        pslld   xmm7,30
+        paddd   xmm12,xmm6
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+DB      102,15,56,0,221
+        movd    xmm8,DWORD[((-8))+r10]
+        por     xmm14,xmm7
+        movd    xmm7,DWORD[((-8))+r11]
+        punpckldq       xmm4,xmm8
+        movdqa  xmm8,xmm12
+        paddd   xmm11,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm13
+        movdqa  xmm6,xmm13
+        pslld   xmm8,5
+        pandn   xmm7,xmm10
+        pand    xmm6,xmm14
+        punpckldq       xmm4,xmm9
+        movdqa  xmm9,xmm12
+
+        movdqa  XMMWORD[(208-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        movd    xmm0,DWORD[((-4))+r8]
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm13
+
+        por     xmm8,xmm9
+        movd    xmm9,DWORD[((-4))+r9]
+        pslld   xmm7,30
+        paddd   xmm11,xmm6
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+DB      102,15,56,0,229
+        movd    xmm8,DWORD[((-4))+r10]
+        por     xmm13,xmm7
+        movdqa  xmm1,XMMWORD[((0-128))+rax]
+        movd    xmm7,DWORD[((-4))+r11]
+        punpckldq       xmm0,xmm8
+        movdqa  xmm8,xmm11
+        paddd   xmm10,xmm15
+        punpckldq       xmm9,xmm7
+        movdqa  xmm7,xmm12
+        movdqa  xmm6,xmm12
+        pslld   xmm8,5
+        prefetcht0      [63+r8]
+        pandn   xmm7,xmm14
+        pand    xmm6,xmm13
+        punpckldq       xmm0,xmm9
+        movdqa  xmm9,xmm11
+
+        movdqa  XMMWORD[(224-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+        movdqa  xmm7,xmm12
+        prefetcht0      [63+r9]
+
+        por     xmm8,xmm9
+        pslld   xmm7,30
+        paddd   xmm10,xmm6
+        prefetcht0      [63+r10]
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+DB      102,15,56,0,197
+        prefetcht0      [63+r11]
+        por     xmm12,xmm7
+        movdqa  xmm2,XMMWORD[((16-128))+rax]
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((32-128))+rax]
+
+        movdqa  xmm8,xmm10
+        pxor    xmm1,XMMWORD[((128-128))+rax]
+        paddd   xmm14,xmm15
+        movdqa  xmm7,xmm11
+        pslld   xmm8,5
+        pxor    xmm1,xmm3
+        movdqa  xmm6,xmm11
+        pandn   xmm7,xmm13
+        movdqa  xmm5,xmm1
+        pand    xmm6,xmm12
+        movdqa  xmm9,xmm10
+        psrld   xmm5,31
+        paddd   xmm1,xmm1
+
+        movdqa  XMMWORD[(240-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+
+        movdqa  xmm7,xmm11
+        por     xmm8,xmm9
+        pslld   xmm7,30
+        paddd   xmm14,xmm6
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((48-128))+rax]
+
+        movdqa  xmm8,xmm14
+        pxor    xmm2,XMMWORD[((144-128))+rax]
+        paddd   xmm13,xmm15
+        movdqa  xmm7,xmm10
+        pslld   xmm8,5
+        pxor    xmm2,xmm4
+        movdqa  xmm6,xmm10
+        pandn   xmm7,xmm12
+        movdqa  xmm5,xmm2
+        pand    xmm6,xmm11
+        movdqa  xmm9,xmm14
+        psrld   xmm5,31
+        paddd   xmm2,xmm2
+
+        movdqa  XMMWORD[(0-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+
+        movdqa  xmm7,xmm10
+        por     xmm8,xmm9
+        pslld   xmm7,30
+        paddd   xmm13,xmm6
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((64-128))+rax]
+
+        movdqa  xmm8,xmm13
+        pxor    xmm3,XMMWORD[((160-128))+rax]
+        paddd   xmm12,xmm15
+        movdqa  xmm7,xmm14
+        pslld   xmm8,5
+        pxor    xmm3,xmm0
+        movdqa  xmm6,xmm14
+        pandn   xmm7,xmm11
+        movdqa  xmm5,xmm3
+        pand    xmm6,xmm10
+        movdqa  xmm9,xmm13
+        psrld   xmm5,31
+        paddd   xmm3,xmm3
+
+        movdqa  XMMWORD[(16-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+
+        movdqa  xmm7,xmm14
+        por     xmm8,xmm9
+        pslld   xmm7,30
+        paddd   xmm12,xmm6
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((80-128))+rax]
+
+        movdqa  xmm8,xmm12
+        pxor    xmm4,XMMWORD[((176-128))+rax]
+        paddd   xmm11,xmm15
+        movdqa  xmm7,xmm13
+        pslld   xmm8,5
+        pxor    xmm4,xmm1
+        movdqa  xmm6,xmm13
+        pandn   xmm7,xmm10
+        movdqa  xmm5,xmm4
+        pand    xmm6,xmm14
+        movdqa  xmm9,xmm12
+        psrld   xmm5,31
+        paddd   xmm4,xmm4
+
+        movdqa  XMMWORD[(32-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+
+        movdqa  xmm7,xmm13
+        por     xmm8,xmm9
+        pslld   xmm7,30
+        paddd   xmm11,xmm6
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((96-128))+rax]
+
+        movdqa  xmm8,xmm11
+        pxor    xmm0,XMMWORD[((192-128))+rax]
+        paddd   xmm10,xmm15
+        movdqa  xmm7,xmm12
+        pslld   xmm8,5
+        pxor    xmm0,xmm2
+        movdqa  xmm6,xmm12
+        pandn   xmm7,xmm14
+        movdqa  xmm5,xmm0
+        pand    xmm6,xmm13
+        movdqa  xmm9,xmm11
+        psrld   xmm5,31
+        paddd   xmm0,xmm0
+
+        movdqa  XMMWORD[(48-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm7
+
+        movdqa  xmm7,xmm12
+        por     xmm8,xmm9
+        pslld   xmm7,30
+        paddd   xmm10,xmm6
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        movdqa  xmm15,XMMWORD[rbp]
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((112-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm6,xmm13
+        pxor    xmm1,XMMWORD[((208-128))+rax]
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm11
+
+        movdqa  xmm9,xmm10
+        movdqa  XMMWORD[(64-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        pxor    xmm1,xmm3
+        psrld   xmm9,27
+        pxor    xmm6,xmm12
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm14,xmm6
+        paddd   xmm1,xmm1
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((128-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm6,xmm12
+        pxor    xmm2,XMMWORD[((224-128))+rax]
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm10
+
+        movdqa  xmm9,xmm14
+        movdqa  XMMWORD[(80-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        pxor    xmm2,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm11
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm13,xmm6
+        paddd   xmm2,xmm2
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((144-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm6,xmm11
+        pxor    xmm3,XMMWORD[((240-128))+rax]
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm14
+
+        movdqa  xmm9,xmm13
+        movdqa  XMMWORD[(96-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        pxor    xmm3,xmm0
+        psrld   xmm9,27
+        pxor    xmm6,xmm10
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm12,xmm6
+        paddd   xmm3,xmm3
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((160-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm6,xmm10
+        pxor    xmm4,XMMWORD[((0-128))+rax]
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm13
+
+        movdqa  xmm9,xmm12
+        movdqa  XMMWORD[(112-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        pxor    xmm4,xmm1
+        psrld   xmm9,27
+        pxor    xmm6,xmm14
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm11,xmm6
+        paddd   xmm4,xmm4
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((176-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm6,xmm14
+        pxor    xmm0,XMMWORD[((16-128))+rax]
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm12
+
+        movdqa  xmm9,xmm11
+        movdqa  XMMWORD[(128-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        pxor    xmm0,xmm2
+        psrld   xmm9,27
+        pxor    xmm6,xmm13
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm10,xmm6
+        paddd   xmm0,xmm0
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((192-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm6,xmm13
+        pxor    xmm1,XMMWORD[((32-128))+rax]
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm11
+
+        movdqa  xmm9,xmm10
+        movdqa  XMMWORD[(144-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        pxor    xmm1,xmm3
+        psrld   xmm9,27
+        pxor    xmm6,xmm12
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm14,xmm6
+        paddd   xmm1,xmm1
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((208-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm6,xmm12
+        pxor    xmm2,XMMWORD[((48-128))+rax]
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm10
+
+        movdqa  xmm9,xmm14
+        movdqa  XMMWORD[(160-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        pxor    xmm2,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm11
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm13,xmm6
+        paddd   xmm2,xmm2
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((224-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm6,xmm11
+        pxor    xmm3,XMMWORD[((64-128))+rax]
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm14
+
+        movdqa  xmm9,xmm13
+        movdqa  XMMWORD[(176-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        pxor    xmm3,xmm0
+        psrld   xmm9,27
+        pxor    xmm6,xmm10
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm12,xmm6
+        paddd   xmm3,xmm3
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((240-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm6,xmm10
+        pxor    xmm4,XMMWORD[((80-128))+rax]
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm13
+
+        movdqa  xmm9,xmm12
+        movdqa  XMMWORD[(192-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        pxor    xmm4,xmm1
+        psrld   xmm9,27
+        pxor    xmm6,xmm14
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm11,xmm6
+        paddd   xmm4,xmm4
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((0-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm6,xmm14
+        pxor    xmm0,XMMWORD[((96-128))+rax]
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm12
+
+        movdqa  xmm9,xmm11
+        movdqa  XMMWORD[(208-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        pxor    xmm0,xmm2
+        psrld   xmm9,27
+        pxor    xmm6,xmm13
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm10,xmm6
+        paddd   xmm0,xmm0
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((16-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm6,xmm13
+        pxor    xmm1,XMMWORD[((112-128))+rax]
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm11
+
+        movdqa  xmm9,xmm10
+        movdqa  XMMWORD[(224-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        pxor    xmm1,xmm3
+        psrld   xmm9,27
+        pxor    xmm6,xmm12
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm14,xmm6
+        paddd   xmm1,xmm1
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((32-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm6,xmm12
+        pxor    xmm2,XMMWORD[((128-128))+rax]
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm10
+
+        movdqa  xmm9,xmm14
+        movdqa  XMMWORD[(240-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        pxor    xmm2,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm11
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm13,xmm6
+        paddd   xmm2,xmm2
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((48-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm6,xmm11
+        pxor    xmm3,XMMWORD[((144-128))+rax]
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm14
+
+        movdqa  xmm9,xmm13
+        movdqa  XMMWORD[(0-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        pxor    xmm3,xmm0
+        psrld   xmm9,27
+        pxor    xmm6,xmm10
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm12,xmm6
+        paddd   xmm3,xmm3
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((64-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm6,xmm10
+        pxor    xmm4,XMMWORD[((160-128))+rax]
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm13
+
+        movdqa  xmm9,xmm12
+        movdqa  XMMWORD[(16-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        pxor    xmm4,xmm1
+        psrld   xmm9,27
+        pxor    xmm6,xmm14
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm11,xmm6
+        paddd   xmm4,xmm4
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((80-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm6,xmm14
+        pxor    xmm0,XMMWORD[((176-128))+rax]
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm12
+
+        movdqa  xmm9,xmm11
+        movdqa  XMMWORD[(32-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        pxor    xmm0,xmm2
+        psrld   xmm9,27
+        pxor    xmm6,xmm13
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm10,xmm6
+        paddd   xmm0,xmm0
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((96-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm6,xmm13
+        pxor    xmm1,XMMWORD[((192-128))+rax]
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm11
+
+        movdqa  xmm9,xmm10
+        movdqa  XMMWORD[(48-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        pxor    xmm1,xmm3
+        psrld   xmm9,27
+        pxor    xmm6,xmm12
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm14,xmm6
+        paddd   xmm1,xmm1
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((112-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm6,xmm12
+        pxor    xmm2,XMMWORD[((208-128))+rax]
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm10
+
+        movdqa  xmm9,xmm14
+        movdqa  XMMWORD[(64-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        pxor    xmm2,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm11
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm13,xmm6
+        paddd   xmm2,xmm2
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((128-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm6,xmm11
+        pxor    xmm3,XMMWORD[((224-128))+rax]
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm14
+
+        movdqa  xmm9,xmm13
+        movdqa  XMMWORD[(80-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        pxor    xmm3,xmm0
+        psrld   xmm9,27
+        pxor    xmm6,xmm10
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm12,xmm6
+        paddd   xmm3,xmm3
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((144-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm6,xmm10
+        pxor    xmm4,XMMWORD[((240-128))+rax]
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm13
+
+        movdqa  xmm9,xmm12
+        movdqa  XMMWORD[(96-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        pxor    xmm4,xmm1
+        psrld   xmm9,27
+        pxor    xmm6,xmm14
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm11,xmm6
+        paddd   xmm4,xmm4
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((160-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm6,xmm14
+        pxor    xmm0,XMMWORD[((0-128))+rax]
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm12
+
+        movdqa  xmm9,xmm11
+        movdqa  XMMWORD[(112-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        pxor    xmm0,xmm2
+        psrld   xmm9,27
+        pxor    xmm6,xmm13
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm10,xmm6
+        paddd   xmm0,xmm0
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        movdqa  xmm15,XMMWORD[32+rbp]
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((176-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm7,xmm13
+        pxor    xmm1,XMMWORD[((16-128))+rax]
+        pxor    xmm1,xmm3
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm10
+        pand    xmm7,xmm12
+
+        movdqa  xmm6,xmm13
+        movdqa  xmm5,xmm1
+        psrld   xmm9,27
+        paddd   xmm14,xmm7
+        pxor    xmm6,xmm12
+
+        movdqa  XMMWORD[(128-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm11
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        paddd   xmm1,xmm1
+        paddd   xmm14,xmm6
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((192-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm7,xmm12
+        pxor    xmm2,XMMWORD[((32-128))+rax]
+        pxor    xmm2,xmm4
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm14
+        pand    xmm7,xmm11
+
+        movdqa  xmm6,xmm12
+        movdqa  xmm5,xmm2
+        psrld   xmm9,27
+        paddd   xmm13,xmm7
+        pxor    xmm6,xmm11
+
+        movdqa  XMMWORD[(144-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm10
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        paddd   xmm2,xmm2
+        paddd   xmm13,xmm6
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((208-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm7,xmm11
+        pxor    xmm3,XMMWORD[((48-128))+rax]
+        pxor    xmm3,xmm0
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm13
+        pand    xmm7,xmm10
+
+        movdqa  xmm6,xmm11
+        movdqa  xmm5,xmm3
+        psrld   xmm9,27
+        paddd   xmm12,xmm7
+        pxor    xmm6,xmm10
+
+        movdqa  XMMWORD[(160-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm14
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        paddd   xmm3,xmm3
+        paddd   xmm12,xmm6
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((224-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm7,xmm10
+        pxor    xmm4,XMMWORD[((64-128))+rax]
+        pxor    xmm4,xmm1
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm12
+        pand    xmm7,xmm14
+
+        movdqa  xmm6,xmm10
+        movdqa  xmm5,xmm4
+        psrld   xmm9,27
+        paddd   xmm11,xmm7
+        pxor    xmm6,xmm14
+
+        movdqa  XMMWORD[(176-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm13
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        paddd   xmm4,xmm4
+        paddd   xmm11,xmm6
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((240-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm7,xmm14
+        pxor    xmm0,XMMWORD[((80-128))+rax]
+        pxor    xmm0,xmm2
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm11
+        pand    xmm7,xmm13
+
+        movdqa  xmm6,xmm14
+        movdqa  xmm5,xmm0
+        psrld   xmm9,27
+        paddd   xmm10,xmm7
+        pxor    xmm6,xmm13
+
+        movdqa  XMMWORD[(192-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm12
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        paddd   xmm0,xmm0
+        paddd   xmm10,xmm6
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((0-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm7,xmm13
+        pxor    xmm1,XMMWORD[((96-128))+rax]
+        pxor    xmm1,xmm3
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm10
+        pand    xmm7,xmm12
+
+        movdqa  xmm6,xmm13
+        movdqa  xmm5,xmm1
+        psrld   xmm9,27
+        paddd   xmm14,xmm7
+        pxor    xmm6,xmm12
+
+        movdqa  XMMWORD[(208-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm11
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        paddd   xmm1,xmm1
+        paddd   xmm14,xmm6
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((16-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm7,xmm12
+        pxor    xmm2,XMMWORD[((112-128))+rax]
+        pxor    xmm2,xmm4
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm14
+        pand    xmm7,xmm11
+
+        movdqa  xmm6,xmm12
+        movdqa  xmm5,xmm2
+        psrld   xmm9,27
+        paddd   xmm13,xmm7
+        pxor    xmm6,xmm11
+
+        movdqa  XMMWORD[(224-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm10
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        paddd   xmm2,xmm2
+        paddd   xmm13,xmm6
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((32-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm7,xmm11
+        pxor    xmm3,XMMWORD[((128-128))+rax]
+        pxor    xmm3,xmm0
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm13
+        pand    xmm7,xmm10
+
+        movdqa  xmm6,xmm11
+        movdqa  xmm5,xmm3
+        psrld   xmm9,27
+        paddd   xmm12,xmm7
+        pxor    xmm6,xmm10
+
+        movdqa  XMMWORD[(240-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm14
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        paddd   xmm3,xmm3
+        paddd   xmm12,xmm6
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((48-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm7,xmm10
+        pxor    xmm4,XMMWORD[((144-128))+rax]
+        pxor    xmm4,xmm1
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm12
+        pand    xmm7,xmm14
+
+        movdqa  xmm6,xmm10
+        movdqa  xmm5,xmm4
+        psrld   xmm9,27
+        paddd   xmm11,xmm7
+        pxor    xmm6,xmm14
+
+        movdqa  XMMWORD[(0-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm13
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        paddd   xmm4,xmm4
+        paddd   xmm11,xmm6
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((64-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm7,xmm14
+        pxor    xmm0,XMMWORD[((160-128))+rax]
+        pxor    xmm0,xmm2
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm11
+        pand    xmm7,xmm13
+
+        movdqa  xmm6,xmm14
+        movdqa  xmm5,xmm0
+        psrld   xmm9,27
+        paddd   xmm10,xmm7
+        pxor    xmm6,xmm13
+
+        movdqa  XMMWORD[(16-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm12
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        paddd   xmm0,xmm0
+        paddd   xmm10,xmm6
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((80-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm7,xmm13
+        pxor    xmm1,XMMWORD[((176-128))+rax]
+        pxor    xmm1,xmm3
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm10
+        pand    xmm7,xmm12
+
+        movdqa  xmm6,xmm13
+        movdqa  xmm5,xmm1
+        psrld   xmm9,27
+        paddd   xmm14,xmm7
+        pxor    xmm6,xmm12
+
+        movdqa  XMMWORD[(32-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm11
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        paddd   xmm1,xmm1
+        paddd   xmm14,xmm6
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((96-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm7,xmm12
+        pxor    xmm2,XMMWORD[((192-128))+rax]
+        pxor    xmm2,xmm4
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm14
+        pand    xmm7,xmm11
+
+        movdqa  xmm6,xmm12
+        movdqa  xmm5,xmm2
+        psrld   xmm9,27
+        paddd   xmm13,xmm7
+        pxor    xmm6,xmm11
+
+        movdqa  XMMWORD[(48-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm10
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        paddd   xmm2,xmm2
+        paddd   xmm13,xmm6
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((112-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm7,xmm11
+        pxor    xmm3,XMMWORD[((208-128))+rax]
+        pxor    xmm3,xmm0
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm13
+        pand    xmm7,xmm10
+
+        movdqa  xmm6,xmm11
+        movdqa  xmm5,xmm3
+        psrld   xmm9,27
+        paddd   xmm12,xmm7
+        pxor    xmm6,xmm10
+
+        movdqa  XMMWORD[(64-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm14
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        paddd   xmm3,xmm3
+        paddd   xmm12,xmm6
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((128-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm7,xmm10
+        pxor    xmm4,XMMWORD[((224-128))+rax]
+        pxor    xmm4,xmm1
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm12
+        pand    xmm7,xmm14
+
+        movdqa  xmm6,xmm10
+        movdqa  xmm5,xmm4
+        psrld   xmm9,27
+        paddd   xmm11,xmm7
+        pxor    xmm6,xmm14
+
+        movdqa  XMMWORD[(80-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm13
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        paddd   xmm4,xmm4
+        paddd   xmm11,xmm6
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((144-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm7,xmm14
+        pxor    xmm0,XMMWORD[((240-128))+rax]
+        pxor    xmm0,xmm2
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm11
+        pand    xmm7,xmm13
+
+        movdqa  xmm6,xmm14
+        movdqa  xmm5,xmm0
+        psrld   xmm9,27
+        paddd   xmm10,xmm7
+        pxor    xmm6,xmm13
+
+        movdqa  XMMWORD[(96-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm12
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        paddd   xmm0,xmm0
+        paddd   xmm10,xmm6
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((160-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm7,xmm13
+        pxor    xmm1,XMMWORD[((0-128))+rax]
+        pxor    xmm1,xmm3
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm10
+        pand    xmm7,xmm12
+
+        movdqa  xmm6,xmm13
+        movdqa  xmm5,xmm1
+        psrld   xmm9,27
+        paddd   xmm14,xmm7
+        pxor    xmm6,xmm12
+
+        movdqa  XMMWORD[(112-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm11
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        paddd   xmm1,xmm1
+        paddd   xmm14,xmm6
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((176-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm7,xmm12
+        pxor    xmm2,XMMWORD[((16-128))+rax]
+        pxor    xmm2,xmm4
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm14
+        pand    xmm7,xmm11
+
+        movdqa  xmm6,xmm12
+        movdqa  xmm5,xmm2
+        psrld   xmm9,27
+        paddd   xmm13,xmm7
+        pxor    xmm6,xmm11
+
+        movdqa  XMMWORD[(128-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm10
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        paddd   xmm2,xmm2
+        paddd   xmm13,xmm6
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((192-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm7,xmm11
+        pxor    xmm3,XMMWORD[((32-128))+rax]
+        pxor    xmm3,xmm0
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm13
+        pand    xmm7,xmm10
+
+        movdqa  xmm6,xmm11
+        movdqa  xmm5,xmm3
+        psrld   xmm9,27
+        paddd   xmm12,xmm7
+        pxor    xmm6,xmm10
+
+        movdqa  XMMWORD[(144-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm14
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        paddd   xmm3,xmm3
+        paddd   xmm12,xmm6
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((208-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm7,xmm10
+        pxor    xmm4,XMMWORD[((48-128))+rax]
+        pxor    xmm4,xmm1
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm12
+        pand    xmm7,xmm14
+
+        movdqa  xmm6,xmm10
+        movdqa  xmm5,xmm4
+        psrld   xmm9,27
+        paddd   xmm11,xmm7
+        pxor    xmm6,xmm14
+
+        movdqa  XMMWORD[(160-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm13
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        paddd   xmm4,xmm4
+        paddd   xmm11,xmm6
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((224-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm7,xmm14
+        pxor    xmm0,XMMWORD[((64-128))+rax]
+        pxor    xmm0,xmm2
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        movdqa  xmm9,xmm11
+        pand    xmm7,xmm13
+
+        movdqa  xmm6,xmm14
+        movdqa  xmm5,xmm0
+        psrld   xmm9,27
+        paddd   xmm10,xmm7
+        pxor    xmm6,xmm13
+
+        movdqa  XMMWORD[(176-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        pand    xmm6,xmm12
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        paddd   xmm0,xmm0
+        paddd   xmm10,xmm6
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        movdqa  xmm15,XMMWORD[64+rbp]
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((240-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm6,xmm13
+        pxor    xmm1,XMMWORD[((80-128))+rax]
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm11
+
+        movdqa  xmm9,xmm10
+        movdqa  XMMWORD[(192-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        pxor    xmm1,xmm3
+        psrld   xmm9,27
+        pxor    xmm6,xmm12
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm14,xmm6
+        paddd   xmm1,xmm1
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((0-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm6,xmm12
+        pxor    xmm2,XMMWORD[((96-128))+rax]
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm10
+
+        movdqa  xmm9,xmm14
+        movdqa  XMMWORD[(208-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        pxor    xmm2,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm11
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm13,xmm6
+        paddd   xmm2,xmm2
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((16-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm6,xmm11
+        pxor    xmm3,XMMWORD[((112-128))+rax]
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm14
+
+        movdqa  xmm9,xmm13
+        movdqa  XMMWORD[(224-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        pxor    xmm3,xmm0
+        psrld   xmm9,27
+        pxor    xmm6,xmm10
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm12,xmm6
+        paddd   xmm3,xmm3
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((32-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm6,xmm10
+        pxor    xmm4,XMMWORD[((128-128))+rax]
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm13
+
+        movdqa  xmm9,xmm12
+        movdqa  XMMWORD[(240-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        pxor    xmm4,xmm1
+        psrld   xmm9,27
+        pxor    xmm6,xmm14
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm11,xmm6
+        paddd   xmm4,xmm4
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((48-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm6,xmm14
+        pxor    xmm0,XMMWORD[((144-128))+rax]
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm12
+
+        movdqa  xmm9,xmm11
+        movdqa  XMMWORD[(0-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        pxor    xmm0,xmm2
+        psrld   xmm9,27
+        pxor    xmm6,xmm13
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm10,xmm6
+        paddd   xmm0,xmm0
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((64-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm6,xmm13
+        pxor    xmm1,XMMWORD[((160-128))+rax]
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm11
+
+        movdqa  xmm9,xmm10
+        movdqa  XMMWORD[(16-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        pxor    xmm1,xmm3
+        psrld   xmm9,27
+        pxor    xmm6,xmm12
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm14,xmm6
+        paddd   xmm1,xmm1
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((80-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm6,xmm12
+        pxor    xmm2,XMMWORD[((176-128))+rax]
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm10
+
+        movdqa  xmm9,xmm14
+        movdqa  XMMWORD[(32-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        pxor    xmm2,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm11
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm13,xmm6
+        paddd   xmm2,xmm2
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((96-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm6,xmm11
+        pxor    xmm3,XMMWORD[((192-128))+rax]
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm14
+
+        movdqa  xmm9,xmm13
+        movdqa  XMMWORD[(48-128)+rax],xmm2
+        paddd   xmm12,xmm2
+        pxor    xmm3,xmm0
+        psrld   xmm9,27
+        pxor    xmm6,xmm10
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm12,xmm6
+        paddd   xmm3,xmm3
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((112-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm6,xmm10
+        pxor    xmm4,XMMWORD[((208-128))+rax]
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm13
+
+        movdqa  xmm9,xmm12
+        movdqa  XMMWORD[(64-128)+rax],xmm3
+        paddd   xmm11,xmm3
+        pxor    xmm4,xmm1
+        psrld   xmm9,27
+        pxor    xmm6,xmm14
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm11,xmm6
+        paddd   xmm4,xmm4
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((128-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm6,xmm14
+        pxor    xmm0,XMMWORD[((224-128))+rax]
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm12
+
+        movdqa  xmm9,xmm11
+        movdqa  XMMWORD[(80-128)+rax],xmm4
+        paddd   xmm10,xmm4
+        pxor    xmm0,xmm2
+        psrld   xmm9,27
+        pxor    xmm6,xmm13
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm10,xmm6
+        paddd   xmm0,xmm0
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((144-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm6,xmm13
+        pxor    xmm1,XMMWORD[((240-128))+rax]
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm11
+
+        movdqa  xmm9,xmm10
+        movdqa  XMMWORD[(96-128)+rax],xmm0
+        paddd   xmm14,xmm0
+        pxor    xmm1,xmm3
+        psrld   xmm9,27
+        pxor    xmm6,xmm12
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm14,xmm6
+        paddd   xmm1,xmm1
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((160-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm6,xmm12
+        pxor    xmm2,XMMWORD[((0-128))+rax]
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm10
+
+        movdqa  xmm9,xmm14
+        movdqa  XMMWORD[(112-128)+rax],xmm1
+        paddd   xmm13,xmm1
+        pxor    xmm2,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm11
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm13,xmm6
+        paddd   xmm2,xmm2
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((176-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm6,xmm11
+        pxor    xmm3,XMMWORD[((16-128))+rax]
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm14
+
+        movdqa  xmm9,xmm13
+        paddd   xmm12,xmm2
+        pxor    xmm3,xmm0
+        psrld   xmm9,27
+        pxor    xmm6,xmm10
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm12,xmm6
+        paddd   xmm3,xmm3
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((192-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm6,xmm10
+        pxor    xmm4,XMMWORD[((32-128))+rax]
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm13
+
+        movdqa  xmm9,xmm12
+        paddd   xmm11,xmm3
+        pxor    xmm4,xmm1
+        psrld   xmm9,27
+        pxor    xmm6,xmm14
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm11,xmm6
+        paddd   xmm4,xmm4
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        pxor    xmm0,xmm2
+        movdqa  xmm2,XMMWORD[((208-128))+rax]
+
+        movdqa  xmm8,xmm11
+        movdqa  xmm6,xmm14
+        pxor    xmm0,XMMWORD[((48-128))+rax]
+        paddd   xmm10,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm12
+
+        movdqa  xmm9,xmm11
+        paddd   xmm10,xmm4
+        pxor    xmm0,xmm2
+        psrld   xmm9,27
+        pxor    xmm6,xmm13
+        movdqa  xmm7,xmm12
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm0
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm10,xmm6
+        paddd   xmm0,xmm0
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm0,xmm5
+        por     xmm12,xmm7
+        pxor    xmm1,xmm3
+        movdqa  xmm3,XMMWORD[((224-128))+rax]
+
+        movdqa  xmm8,xmm10
+        movdqa  xmm6,xmm13
+        pxor    xmm1,XMMWORD[((64-128))+rax]
+        paddd   xmm14,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm11
+
+        movdqa  xmm9,xmm10
+        paddd   xmm14,xmm0
+        pxor    xmm1,xmm3
+        psrld   xmm9,27
+        pxor    xmm6,xmm12
+        movdqa  xmm7,xmm11
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm1
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm14,xmm6
+        paddd   xmm1,xmm1
+
+        psrld   xmm11,2
+        paddd   xmm14,xmm8
+        por     xmm1,xmm5
+        por     xmm11,xmm7
+        pxor    xmm2,xmm4
+        movdqa  xmm4,XMMWORD[((240-128))+rax]
+
+        movdqa  xmm8,xmm14
+        movdqa  xmm6,xmm12
+        pxor    xmm2,XMMWORD[((80-128))+rax]
+        paddd   xmm13,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm10
+
+        movdqa  xmm9,xmm14
+        paddd   xmm13,xmm1
+        pxor    xmm2,xmm4
+        psrld   xmm9,27
+        pxor    xmm6,xmm11
+        movdqa  xmm7,xmm10
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm2
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm13,xmm6
+        paddd   xmm2,xmm2
+
+        psrld   xmm10,2
+        paddd   xmm13,xmm8
+        por     xmm2,xmm5
+        por     xmm10,xmm7
+        pxor    xmm3,xmm0
+        movdqa  xmm0,XMMWORD[((0-128))+rax]
+
+        movdqa  xmm8,xmm13
+        movdqa  xmm6,xmm11
+        pxor    xmm3,XMMWORD[((96-128))+rax]
+        paddd   xmm12,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm14
+
+        movdqa  xmm9,xmm13
+        paddd   xmm12,xmm2
+        pxor    xmm3,xmm0
+        psrld   xmm9,27
+        pxor    xmm6,xmm10
+        movdqa  xmm7,xmm14
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm3
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm12,xmm6
+        paddd   xmm3,xmm3
+
+        psrld   xmm14,2
+        paddd   xmm12,xmm8
+        por     xmm3,xmm5
+        por     xmm14,xmm7
+        pxor    xmm4,xmm1
+        movdqa  xmm1,XMMWORD[((16-128))+rax]
+
+        movdqa  xmm8,xmm12
+        movdqa  xmm6,xmm10
+        pxor    xmm4,XMMWORD[((112-128))+rax]
+        paddd   xmm11,xmm15
+        pslld   xmm8,5
+        pxor    xmm6,xmm13
+
+        movdqa  xmm9,xmm12
+        paddd   xmm11,xmm3
+        pxor    xmm4,xmm1
+        psrld   xmm9,27
+        pxor    xmm6,xmm14
+        movdqa  xmm7,xmm13
+
+        pslld   xmm7,30
+        movdqa  xmm5,xmm4
+        por     xmm8,xmm9
+        psrld   xmm5,31
+        paddd   xmm11,xmm6
+        paddd   xmm4,xmm4
+
+        psrld   xmm13,2
+        paddd   xmm11,xmm8
+        por     xmm4,xmm5
+        por     xmm13,xmm7
+        movdqa  xmm8,xmm11
+        paddd   xmm10,xmm15
+        movdqa  xmm6,xmm14
+        pslld   xmm8,5
+        pxor    xmm6,xmm12
+
+        movdqa  xmm9,xmm11
+        paddd   xmm10,xmm4
+        psrld   xmm9,27
+        movdqa  xmm7,xmm12
+        pxor    xmm6,xmm13
+
+        pslld   xmm7,30
+        por     xmm8,xmm9
+        paddd   xmm10,xmm6
+
+        psrld   xmm12,2
+        paddd   xmm10,xmm8
+        por     xmm12,xmm7
+        movdqa  xmm0,XMMWORD[rbx]
+        mov     ecx,1
+        cmp     ecx,DWORD[rbx]
+        pxor    xmm8,xmm8
+        cmovge  r8,rbp
+        cmp     ecx,DWORD[4+rbx]
+        movdqa  xmm1,xmm0
+        cmovge  r9,rbp
+        cmp     ecx,DWORD[8+rbx]
+        pcmpgtd xmm1,xmm8
+        cmovge  r10,rbp
+        cmp     ecx,DWORD[12+rbx]
+        paddd   xmm0,xmm1
+        cmovge  r11,rbp
+
+        movdqu  xmm6,XMMWORD[rdi]
+        pand    xmm10,xmm1
+        movdqu  xmm7,XMMWORD[32+rdi]
+        pand    xmm11,xmm1
+        paddd   xmm10,xmm6
+        movdqu  xmm8,XMMWORD[64+rdi]
+        pand    xmm12,xmm1
+        paddd   xmm11,xmm7
+        movdqu  xmm9,XMMWORD[96+rdi]
+        pand    xmm13,xmm1
+        paddd   xmm12,xmm8
+        movdqu  xmm5,XMMWORD[128+rdi]
+        pand    xmm14,xmm1
+        movdqu  XMMWORD[rdi],xmm10
+        paddd   xmm13,xmm9
+        movdqu  XMMWORD[32+rdi],xmm11
+        paddd   xmm14,xmm5
+        movdqu  XMMWORD[64+rdi],xmm12
+        movdqu  XMMWORD[96+rdi],xmm13
+        movdqu  XMMWORD[128+rdi],xmm14
+
+        movdqa  XMMWORD[rbx],xmm0
+        movdqa  xmm5,XMMWORD[96+rbp]
+        movdqa  xmm15,XMMWORD[((-32))+rbp]
+        dec     edx
+        jnz     NEAR $L$oop
+
+        mov     edx,DWORD[280+rsp]
+        lea     rdi,[16+rdi]
+        lea     rsi,[64+rsi]
+        dec     edx
+        jnz     NEAR $L$oop_grande
+
+$L$done:
+        mov     rax,QWORD[272+rsp]
+
+        movaps  xmm6,XMMWORD[((-184))+rax]
+        movaps  xmm7,XMMWORD[((-168))+rax]
+        movaps  xmm8,XMMWORD[((-152))+rax]
+        movaps  xmm9,XMMWORD[((-136))+rax]
+        movaps  xmm10,XMMWORD[((-120))+rax]
+        movaps  xmm11,XMMWORD[((-104))+rax]
+        movaps  xmm12,XMMWORD[((-88))+rax]
+        movaps  xmm13,XMMWORD[((-72))+rax]
+        movaps  xmm14,XMMWORD[((-56))+rax]
+        movaps  xmm15,XMMWORD[((-40))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha1_multi_block:
+
+ALIGN   32
+sha1_multi_block_shaext:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha1_multi_block_shaext:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+_shaext_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[(-120)+rax],xmm10
+        movaps  XMMWORD[(-104)+rax],xmm11
+        movaps  XMMWORD[(-88)+rax],xmm12
+        movaps  XMMWORD[(-72)+rax],xmm13
+        movaps  XMMWORD[(-56)+rax],xmm14
+        movaps  XMMWORD[(-40)+rax],xmm15
+        sub     rsp,288
+        shl     edx,1
+        and     rsp,-256
+        lea     rdi,[64+rdi]
+        mov     QWORD[272+rsp],rax
+$L$body_shaext:
+        lea     rbx,[256+rsp]
+        movdqa  xmm3,XMMWORD[((K_XX_XX+128))]
+
+$L$oop_grande_shaext:
+        mov     DWORD[280+rsp],edx
+        xor     edx,edx
+        mov     r8,QWORD[rsi]
+        mov     ecx,DWORD[8+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[rbx],ecx
+        cmovle  r8,rsp
+        mov     r9,QWORD[16+rsi]
+        mov     ecx,DWORD[24+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[4+rbx],ecx
+        cmovle  r9,rsp
+        test    edx,edx
+        jz      NEAR $L$done_shaext
+
+        movq    xmm0,QWORD[((0-64))+rdi]
+        movq    xmm4,QWORD[((32-64))+rdi]
+        movq    xmm5,QWORD[((64-64))+rdi]
+        movq    xmm6,QWORD[((96-64))+rdi]
+        movq    xmm7,QWORD[((128-64))+rdi]
+
+        punpckldq       xmm0,xmm4
+        punpckldq       xmm5,xmm6
+
+        movdqa  xmm8,xmm0
+        punpcklqdq      xmm0,xmm5
+        punpckhqdq      xmm8,xmm5
+
+        pshufd  xmm1,xmm7,63
+        pshufd  xmm9,xmm7,127
+        pshufd  xmm0,xmm0,27
+        pshufd  xmm8,xmm8,27
+        jmp     NEAR $L$oop_shaext
+
+ALIGN   32
+$L$oop_shaext:
+        movdqu  xmm4,XMMWORD[r8]
+        movdqu  xmm11,XMMWORD[r9]
+        movdqu  xmm5,XMMWORD[16+r8]
+        movdqu  xmm12,XMMWORD[16+r9]
+        movdqu  xmm6,XMMWORD[32+r8]
+DB      102,15,56,0,227
+        movdqu  xmm13,XMMWORD[32+r9]
+DB      102,68,15,56,0,219
+        movdqu  xmm7,XMMWORD[48+r8]
+        lea     r8,[64+r8]
+DB      102,15,56,0,235
+        movdqu  xmm14,XMMWORD[48+r9]
+        lea     r9,[64+r9]
+DB      102,68,15,56,0,227
+
+        movdqa  XMMWORD[80+rsp],xmm1
+        paddd   xmm1,xmm4
+        movdqa  XMMWORD[112+rsp],xmm9
+        paddd   xmm9,xmm11
+        movdqa  XMMWORD[64+rsp],xmm0
+        movdqa  xmm2,xmm0
+        movdqa  XMMWORD[96+rsp],xmm8
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,0
+DB      15,56,200,213
+DB      69,15,58,204,193,0
+DB      69,15,56,200,212
+DB      102,15,56,0,243
+        prefetcht0      [127+r8]
+DB      15,56,201,229
+DB      102,68,15,56,0,235
+        prefetcht0      [127+r9]
+DB      69,15,56,201,220
+
+DB      102,15,56,0,251
+        movdqa  xmm1,xmm0
+DB      102,68,15,56,0,243
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,0
+DB      15,56,200,206
+DB      69,15,58,204,194,0
+DB      69,15,56,200,205
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+        pxor    xmm11,xmm13
+DB      69,15,56,201,229
+        movdqa  xmm2,xmm0
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,0
+DB      15,56,200,215
+DB      69,15,58,204,193,0
+DB      69,15,56,200,214
+DB      15,56,202,231
+DB      69,15,56,202,222
+        pxor    xmm5,xmm7
+DB      15,56,201,247
+        pxor    xmm12,xmm14
+DB      69,15,56,201,238
+        movdqa  xmm1,xmm0
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,0
+DB      15,56,200,204
+DB      69,15,58,204,194,0
+DB      69,15,56,200,203
+DB      15,56,202,236
+DB      69,15,56,202,227
+        pxor    xmm6,xmm4
+DB      15,56,201,252
+        pxor    xmm13,xmm11
+DB      69,15,56,201,243
+        movdqa  xmm2,xmm0
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,0
+DB      15,56,200,213
+DB      69,15,58,204,193,0
+DB      69,15,56,200,212
+DB      15,56,202,245
+DB      69,15,56,202,236
+        pxor    xmm7,xmm5
+DB      15,56,201,229
+        pxor    xmm14,xmm12
+DB      69,15,56,201,220
+        movdqa  xmm1,xmm0
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,1
+DB      15,56,200,206
+DB      69,15,58,204,194,1
+DB      69,15,56,200,205
+DB      15,56,202,254
+DB      69,15,56,202,245
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+        pxor    xmm11,xmm13
+DB      69,15,56,201,229
+        movdqa  xmm2,xmm0
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,1
+DB      15,56,200,215
+DB      69,15,58,204,193,1
+DB      69,15,56,200,214
+DB      15,56,202,231
+DB      69,15,56,202,222
+        pxor    xmm5,xmm7
+DB      15,56,201,247
+        pxor    xmm12,xmm14
+DB      69,15,56,201,238
+        movdqa  xmm1,xmm0
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,1
+DB      15,56,200,204
+DB      69,15,58,204,194,1
+DB      69,15,56,200,203
+DB      15,56,202,236
+DB      69,15,56,202,227
+        pxor    xmm6,xmm4
+DB      15,56,201,252
+        pxor    xmm13,xmm11
+DB      69,15,56,201,243
+        movdqa  xmm2,xmm0
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,1
+DB      15,56,200,213
+DB      69,15,58,204,193,1
+DB      69,15,56,200,212
+DB      15,56,202,245
+DB      69,15,56,202,236
+        pxor    xmm7,xmm5
+DB      15,56,201,229
+        pxor    xmm14,xmm12
+DB      69,15,56,201,220
+        movdqa  xmm1,xmm0
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,1
+DB      15,56,200,206
+DB      69,15,58,204,194,1
+DB      69,15,56,200,205
+DB      15,56,202,254
+DB      69,15,56,202,245
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+        pxor    xmm11,xmm13
+DB      69,15,56,201,229
+        movdqa  xmm2,xmm0
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,2
+DB      15,56,200,215
+DB      69,15,58,204,193,2
+DB      69,15,56,200,214
+DB      15,56,202,231
+DB      69,15,56,202,222
+        pxor    xmm5,xmm7
+DB      15,56,201,247
+        pxor    xmm12,xmm14
+DB      69,15,56,201,238
+        movdqa  xmm1,xmm0
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,2
+DB      15,56,200,204
+DB      69,15,58,204,194,2
+DB      69,15,56,200,203
+DB      15,56,202,236
+DB      69,15,56,202,227
+        pxor    xmm6,xmm4
+DB      15,56,201,252
+        pxor    xmm13,xmm11
+DB      69,15,56,201,243
+        movdqa  xmm2,xmm0
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,2
+DB      15,56,200,213
+DB      69,15,58,204,193,2
+DB      69,15,56,200,212
+DB      15,56,202,245
+DB      69,15,56,202,236
+        pxor    xmm7,xmm5
+DB      15,56,201,229
+        pxor    xmm14,xmm12
+DB      69,15,56,201,220
+        movdqa  xmm1,xmm0
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,2
+DB      15,56,200,206
+DB      69,15,58,204,194,2
+DB      69,15,56,200,205
+DB      15,56,202,254
+DB      69,15,56,202,245
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+        pxor    xmm11,xmm13
+DB      69,15,56,201,229
+        movdqa  xmm2,xmm0
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,2
+DB      15,56,200,215
+DB      69,15,58,204,193,2
+DB      69,15,56,200,214
+DB      15,56,202,231
+DB      69,15,56,202,222
+        pxor    xmm5,xmm7
+DB      15,56,201,247
+        pxor    xmm12,xmm14
+DB      69,15,56,201,238
+        movdqa  xmm1,xmm0
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,3
+DB      15,56,200,204
+DB      69,15,58,204,194,3
+DB      69,15,56,200,203
+DB      15,56,202,236
+DB      69,15,56,202,227
+        pxor    xmm6,xmm4
+DB      15,56,201,252
+        pxor    xmm13,xmm11
+DB      69,15,56,201,243
+        movdqa  xmm2,xmm0
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,3
+DB      15,56,200,213
+DB      69,15,58,204,193,3
+DB      69,15,56,200,212
+DB      15,56,202,245
+DB      69,15,56,202,236
+        pxor    xmm7,xmm5
+        pxor    xmm14,xmm12
+
+        mov     ecx,1
+        pxor    xmm4,xmm4
+        cmp     ecx,DWORD[rbx]
+        cmovge  r8,rsp
+
+        movdqa  xmm1,xmm0
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,3
+DB      15,56,200,206
+DB      69,15,58,204,194,3
+DB      69,15,56,200,205
+DB      15,56,202,254
+DB      69,15,56,202,245
+
+        cmp     ecx,DWORD[4+rbx]
+        cmovge  r9,rsp
+        movq    xmm6,QWORD[rbx]
+
+        movdqa  xmm2,xmm0
+        movdqa  xmm10,xmm8
+DB      15,58,204,193,3
+DB      15,56,200,215
+DB      69,15,58,204,193,3
+DB      69,15,56,200,214
+
+        pshufd  xmm11,xmm6,0x00
+        pshufd  xmm12,xmm6,0x55
+        movdqa  xmm7,xmm6
+        pcmpgtd xmm11,xmm4
+        pcmpgtd xmm12,xmm4
+
+        movdqa  xmm1,xmm0
+        movdqa  xmm9,xmm8
+DB      15,58,204,194,3
+DB      15,56,200,204
+DB      69,15,58,204,194,3
+DB      68,15,56,200,204
+
+        pcmpgtd xmm7,xmm4
+        pand    xmm0,xmm11
+        pand    xmm1,xmm11
+        pand    xmm8,xmm12
+        pand    xmm9,xmm12
+        paddd   xmm6,xmm7
+
+        paddd   xmm0,XMMWORD[64+rsp]
+        paddd   xmm1,XMMWORD[80+rsp]
+        paddd   xmm8,XMMWORD[96+rsp]
+        paddd   xmm9,XMMWORD[112+rsp]
+
+        movq    QWORD[rbx],xmm6
+        dec     edx
+        jnz     NEAR $L$oop_shaext
+
+        mov     edx,DWORD[280+rsp]
+
+        pshufd  xmm0,xmm0,27
+        pshufd  xmm8,xmm8,27
+
+        movdqa  xmm6,xmm0
+        punpckldq       xmm0,xmm8
+        punpckhdq       xmm6,xmm8
+        punpckhdq       xmm1,xmm9
+        movq    QWORD[(0-64)+rdi],xmm0
+        psrldq  xmm0,8
+        movq    QWORD[(64-64)+rdi],xmm6
+        psrldq  xmm6,8
+        movq    QWORD[(32-64)+rdi],xmm0
+        psrldq  xmm1,8
+        movq    QWORD[(96-64)+rdi],xmm6
+        movq    QWORD[(128-64)+rdi],xmm1
+
+        lea     rdi,[8+rdi]
+        lea     rsi,[32+rsi]
+        dec     edx
+        jnz     NEAR $L$oop_grande_shaext
+
+$L$done_shaext:
+
+        movaps  xmm6,XMMWORD[((-184))+rax]
+        movaps  xmm7,XMMWORD[((-168))+rax]
+        movaps  xmm8,XMMWORD[((-152))+rax]
+        movaps  xmm9,XMMWORD[((-136))+rax]
+        movaps  xmm10,XMMWORD[((-120))+rax]
+        movaps  xmm11,XMMWORD[((-104))+rax]
+        movaps  xmm12,XMMWORD[((-88))+rax]
+        movaps  xmm13,XMMWORD[((-72))+rax]
+        movaps  xmm14,XMMWORD[((-56))+rax]
+        movaps  xmm15,XMMWORD[((-40))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$epilogue_shaext:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha1_multi_block_shaext:
+
+ALIGN   32
+sha1_multi_block_avx:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha1_multi_block_avx:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+_avx_shortcut:
+        shr     rcx,32
+        cmp     edx,2
+        jb      NEAR $L$avx
+        test    ecx,32
+        jnz     NEAR _avx2_shortcut
+        jmp     NEAR $L$avx
+ALIGN   32
+$L$avx:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[(-120)+rax],xmm10
+        movaps  XMMWORD[(-104)+rax],xmm11
+        movaps  XMMWORD[(-88)+rax],xmm12
+        movaps  XMMWORD[(-72)+rax],xmm13
+        movaps  XMMWORD[(-56)+rax],xmm14
+        movaps  XMMWORD[(-40)+rax],xmm15
+        sub     rsp,288
+        and     rsp,-256
+        mov     QWORD[272+rsp],rax
+
+$L$body_avx:
+        lea     rbp,[K_XX_XX]
+        lea     rbx,[256+rsp]
+
+        vzeroupper
+$L$oop_grande_avx:
+        mov     DWORD[280+rsp],edx
+        xor     edx,edx
+        mov     r8,QWORD[rsi]
+        mov     ecx,DWORD[8+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[rbx],ecx
+        cmovle  r8,rbp
+        mov     r9,QWORD[16+rsi]
+        mov     ecx,DWORD[24+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[4+rbx],ecx
+        cmovle  r9,rbp
+        mov     r10,QWORD[32+rsi]
+        mov     ecx,DWORD[40+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[8+rbx],ecx
+        cmovle  r10,rbp
+        mov     r11,QWORD[48+rsi]
+        mov     ecx,DWORD[56+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[12+rbx],ecx
+        cmovle  r11,rbp
+        test    edx,edx
+        jz      NEAR $L$done_avx
+
+        vmovdqu xmm10,XMMWORD[rdi]
+        lea     rax,[128+rsp]
+        vmovdqu xmm11,XMMWORD[32+rdi]
+        vmovdqu xmm12,XMMWORD[64+rdi]
+        vmovdqu xmm13,XMMWORD[96+rdi]
+        vmovdqu xmm14,XMMWORD[128+rdi]
+        vmovdqu xmm5,XMMWORD[96+rbp]
+        jmp     NEAR $L$oop_avx
+
+ALIGN   32
+$L$oop_avx:
+        vmovdqa xmm15,XMMWORD[((-32))+rbp]
+        vmovd   xmm0,DWORD[r8]
+        lea     r8,[64+r8]
+        vmovd   xmm2,DWORD[r9]
+        lea     r9,[64+r9]
+        vpinsrd xmm0,xmm0,DWORD[r10],1
+        lea     r10,[64+r10]
+        vpinsrd xmm2,xmm2,DWORD[r11],1
+        lea     r11,[64+r11]
+        vmovd   xmm1,DWORD[((-60))+r8]
+        vpunpckldq      xmm0,xmm0,xmm2
+        vmovd   xmm9,DWORD[((-60))+r9]
+        vpshufb xmm0,xmm0,xmm5
+        vpinsrd xmm1,xmm1,DWORD[((-60))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-60))+r11],1
+        vpaddd  xmm14,xmm14,xmm15
+        vpslld  xmm8,xmm10,5
+        vpandn  xmm7,xmm11,xmm13
+        vpand   xmm6,xmm11,xmm12
+
+        vmovdqa XMMWORD[(0-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpunpckldq      xmm1,xmm1,xmm9
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm2,DWORD[((-56))+r8]
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-56))+r9]
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpshufb xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpinsrd xmm2,xmm2,DWORD[((-56))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-56))+r11],1
+        vpaddd  xmm13,xmm13,xmm15
+        vpslld  xmm8,xmm14,5
+        vpandn  xmm7,xmm10,xmm12
+        vpand   xmm6,xmm10,xmm11
+
+        vmovdqa XMMWORD[(16-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpunpckldq      xmm2,xmm2,xmm9
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm3,DWORD[((-52))+r8]
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-52))+r9]
+        vpaddd  xmm13,xmm13,xmm6
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpshufb xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpinsrd xmm3,xmm3,DWORD[((-52))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-52))+r11],1
+        vpaddd  xmm12,xmm12,xmm15
+        vpslld  xmm8,xmm13,5
+        vpandn  xmm7,xmm14,xmm11
+        vpand   xmm6,xmm14,xmm10
+
+        vmovdqa XMMWORD[(32-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpunpckldq      xmm3,xmm3,xmm9
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm4,DWORD[((-48))+r8]
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-48))+r9]
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpshufb xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpinsrd xmm4,xmm4,DWORD[((-48))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-48))+r11],1
+        vpaddd  xmm11,xmm11,xmm15
+        vpslld  xmm8,xmm12,5
+        vpandn  xmm7,xmm13,xmm10
+        vpand   xmm6,xmm13,xmm14
+
+        vmovdqa XMMWORD[(48-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpunpckldq      xmm4,xmm4,xmm9
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm0,DWORD[((-44))+r8]
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-44))+r9]
+        vpaddd  xmm11,xmm11,xmm6
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpshufb xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpinsrd xmm0,xmm0,DWORD[((-44))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-44))+r11],1
+        vpaddd  xmm10,xmm10,xmm15
+        vpslld  xmm8,xmm11,5
+        vpandn  xmm7,xmm12,xmm14
+        vpand   xmm6,xmm12,xmm13
+
+        vmovdqa XMMWORD[(64-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpunpckldq      xmm0,xmm0,xmm9
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm1,DWORD[((-40))+r8]
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-40))+r9]
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpshufb xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpinsrd xmm1,xmm1,DWORD[((-40))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-40))+r11],1
+        vpaddd  xmm14,xmm14,xmm15
+        vpslld  xmm8,xmm10,5
+        vpandn  xmm7,xmm11,xmm13
+        vpand   xmm6,xmm11,xmm12
+
+        vmovdqa XMMWORD[(80-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpunpckldq      xmm1,xmm1,xmm9
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm2,DWORD[((-36))+r8]
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-36))+r9]
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpshufb xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpinsrd xmm2,xmm2,DWORD[((-36))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-36))+r11],1
+        vpaddd  xmm13,xmm13,xmm15
+        vpslld  xmm8,xmm14,5
+        vpandn  xmm7,xmm10,xmm12
+        vpand   xmm6,xmm10,xmm11
+
+        vmovdqa XMMWORD[(96-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpunpckldq      xmm2,xmm2,xmm9
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm3,DWORD[((-32))+r8]
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-32))+r9]
+        vpaddd  xmm13,xmm13,xmm6
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpshufb xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpinsrd xmm3,xmm3,DWORD[((-32))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-32))+r11],1
+        vpaddd  xmm12,xmm12,xmm15
+        vpslld  xmm8,xmm13,5
+        vpandn  xmm7,xmm14,xmm11
+        vpand   xmm6,xmm14,xmm10
+
+        vmovdqa XMMWORD[(112-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpunpckldq      xmm3,xmm3,xmm9
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm4,DWORD[((-28))+r8]
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-28))+r9]
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpshufb xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpinsrd xmm4,xmm4,DWORD[((-28))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-28))+r11],1
+        vpaddd  xmm11,xmm11,xmm15
+        vpslld  xmm8,xmm12,5
+        vpandn  xmm7,xmm13,xmm10
+        vpand   xmm6,xmm13,xmm14
+
+        vmovdqa XMMWORD[(128-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpunpckldq      xmm4,xmm4,xmm9
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm0,DWORD[((-24))+r8]
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-24))+r9]
+        vpaddd  xmm11,xmm11,xmm6
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpshufb xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpinsrd xmm0,xmm0,DWORD[((-24))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-24))+r11],1
+        vpaddd  xmm10,xmm10,xmm15
+        vpslld  xmm8,xmm11,5
+        vpandn  xmm7,xmm12,xmm14
+        vpand   xmm6,xmm12,xmm13
+
+        vmovdqa XMMWORD[(144-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpunpckldq      xmm0,xmm0,xmm9
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm1,DWORD[((-20))+r8]
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-20))+r9]
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpshufb xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpinsrd xmm1,xmm1,DWORD[((-20))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-20))+r11],1
+        vpaddd  xmm14,xmm14,xmm15
+        vpslld  xmm8,xmm10,5
+        vpandn  xmm7,xmm11,xmm13
+        vpand   xmm6,xmm11,xmm12
+
+        vmovdqa XMMWORD[(160-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpunpckldq      xmm1,xmm1,xmm9
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm2,DWORD[((-16))+r8]
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-16))+r9]
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpshufb xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpinsrd xmm2,xmm2,DWORD[((-16))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-16))+r11],1
+        vpaddd  xmm13,xmm13,xmm15
+        vpslld  xmm8,xmm14,5
+        vpandn  xmm7,xmm10,xmm12
+        vpand   xmm6,xmm10,xmm11
+
+        vmovdqa XMMWORD[(176-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpunpckldq      xmm2,xmm2,xmm9
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm3,DWORD[((-12))+r8]
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-12))+r9]
+        vpaddd  xmm13,xmm13,xmm6
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpshufb xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpinsrd xmm3,xmm3,DWORD[((-12))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-12))+r11],1
+        vpaddd  xmm12,xmm12,xmm15
+        vpslld  xmm8,xmm13,5
+        vpandn  xmm7,xmm14,xmm11
+        vpand   xmm6,xmm14,xmm10
+
+        vmovdqa XMMWORD[(192-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpunpckldq      xmm3,xmm3,xmm9
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm4,DWORD[((-8))+r8]
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-8))+r9]
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpshufb xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpinsrd xmm4,xmm4,DWORD[((-8))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-8))+r11],1
+        vpaddd  xmm11,xmm11,xmm15
+        vpslld  xmm8,xmm12,5
+        vpandn  xmm7,xmm13,xmm10
+        vpand   xmm6,xmm13,xmm14
+
+        vmovdqa XMMWORD[(208-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpunpckldq      xmm4,xmm4,xmm9
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm7
+        vmovd   xmm0,DWORD[((-4))+r8]
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vmovd   xmm9,DWORD[((-4))+r9]
+        vpaddd  xmm11,xmm11,xmm6
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpshufb xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vmovdqa xmm1,XMMWORD[((0-128))+rax]
+        vpinsrd xmm0,xmm0,DWORD[((-4))+r10],1
+        vpinsrd xmm9,xmm9,DWORD[((-4))+r11],1
+        vpaddd  xmm10,xmm10,xmm15
+        prefetcht0      [63+r8]
+        vpslld  xmm8,xmm11,5
+        vpandn  xmm7,xmm12,xmm14
+        vpand   xmm6,xmm12,xmm13
+
+        vmovdqa XMMWORD[(224-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpunpckldq      xmm0,xmm0,xmm9
+        vpsrld  xmm9,xmm11,27
+        prefetcht0      [63+r9]
+        vpxor   xmm6,xmm6,xmm7
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        prefetcht0      [63+r10]
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        prefetcht0      [63+r11]
+        vpshufb xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vmovdqa xmm2,XMMWORD[((16-128))+rax]
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((32-128))+rax]
+
+        vpaddd  xmm14,xmm14,xmm15
+        vpslld  xmm8,xmm10,5
+        vpandn  xmm7,xmm11,xmm13
+
+        vpand   xmm6,xmm11,xmm12
+
+        vmovdqa XMMWORD[(240-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpxor   xmm1,xmm1,XMMWORD[((128-128))+rax]
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm7
+        vpxor   xmm1,xmm1,xmm3
+
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpsrld  xmm5,xmm1,31
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpsrld  xmm11,xmm11,2
+
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((48-128))+rax]
+
+        vpaddd  xmm13,xmm13,xmm15
+        vpslld  xmm8,xmm14,5
+        vpandn  xmm7,xmm10,xmm12
+
+        vpand   xmm6,xmm10,xmm11
+
+        vmovdqa XMMWORD[(0-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpxor   xmm2,xmm2,XMMWORD[((144-128))+rax]
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm7
+        vpxor   xmm2,xmm2,xmm4
+
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm13,xmm13,xmm6
+
+        vpsrld  xmm5,xmm2,31
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpsrld  xmm10,xmm10,2
+
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((64-128))+rax]
+
+        vpaddd  xmm12,xmm12,xmm15
+        vpslld  xmm8,xmm13,5
+        vpandn  xmm7,xmm14,xmm11
+
+        vpand   xmm6,xmm14,xmm10
+
+        vmovdqa XMMWORD[(16-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpxor   xmm3,xmm3,XMMWORD[((160-128))+rax]
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm7
+        vpxor   xmm3,xmm3,xmm0
+
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpsrld  xmm5,xmm3,31
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpsrld  xmm14,xmm14,2
+
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((80-128))+rax]
+
+        vpaddd  xmm11,xmm11,xmm15
+        vpslld  xmm8,xmm12,5
+        vpandn  xmm7,xmm13,xmm10
+
+        vpand   xmm6,xmm13,xmm14
+
+        vmovdqa XMMWORD[(32-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpxor   xmm4,xmm4,XMMWORD[((176-128))+rax]
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm7
+        vpxor   xmm4,xmm4,xmm1
+
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm11,xmm11,xmm6
+
+        vpsrld  xmm5,xmm4,31
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpsrld  xmm13,xmm13,2
+
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((96-128))+rax]
+
+        vpaddd  xmm10,xmm10,xmm15
+        vpslld  xmm8,xmm11,5
+        vpandn  xmm7,xmm12,xmm14
+
+        vpand   xmm6,xmm12,xmm13
+
+        vmovdqa XMMWORD[(48-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpxor   xmm0,xmm0,XMMWORD[((192-128))+rax]
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm7
+        vpxor   xmm0,xmm0,xmm2
+
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpsrld  xmm5,xmm0,31
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpsrld  xmm12,xmm12,2
+
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vmovdqa xmm15,XMMWORD[rbp]
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((112-128))+rax]
+
+        vpslld  xmm8,xmm10,5
+        vpaddd  xmm14,xmm14,xmm15
+        vpxor   xmm6,xmm13,xmm11
+        vmovdqa XMMWORD[(64-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpxor   xmm1,xmm1,XMMWORD[((208-128))+rax]
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm14,xmm14,xmm6
+        vpsrld  xmm5,xmm1,31
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((128-128))+rax]
+
+        vpslld  xmm8,xmm14,5
+        vpaddd  xmm13,xmm13,xmm15
+        vpxor   xmm6,xmm12,xmm10
+        vmovdqa XMMWORD[(80-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpxor   xmm2,xmm2,XMMWORD[((224-128))+rax]
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm13,xmm13,xmm6
+        vpsrld  xmm5,xmm2,31
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((144-128))+rax]
+
+        vpslld  xmm8,xmm13,5
+        vpaddd  xmm12,xmm12,xmm15
+        vpxor   xmm6,xmm11,xmm14
+        vmovdqa XMMWORD[(96-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpxor   xmm3,xmm3,XMMWORD[((240-128))+rax]
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm12,xmm12,xmm6
+        vpsrld  xmm5,xmm3,31
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((160-128))+rax]
+
+        vpslld  xmm8,xmm12,5
+        vpaddd  xmm11,xmm11,xmm15
+        vpxor   xmm6,xmm10,xmm13
+        vmovdqa XMMWORD[(112-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpxor   xmm4,xmm4,XMMWORD[((0-128))+rax]
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm11,xmm11,xmm6
+        vpsrld  xmm5,xmm4,31
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((176-128))+rax]
+
+        vpslld  xmm8,xmm11,5
+        vpaddd  xmm10,xmm10,xmm15
+        vpxor   xmm6,xmm14,xmm12
+        vmovdqa XMMWORD[(128-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpxor   xmm0,xmm0,XMMWORD[((16-128))+rax]
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm10,xmm10,xmm6
+        vpsrld  xmm5,xmm0,31
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((192-128))+rax]
+
+        vpslld  xmm8,xmm10,5
+        vpaddd  xmm14,xmm14,xmm15
+        vpxor   xmm6,xmm13,xmm11
+        vmovdqa XMMWORD[(144-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpxor   xmm1,xmm1,XMMWORD[((32-128))+rax]
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm14,xmm14,xmm6
+        vpsrld  xmm5,xmm1,31
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((208-128))+rax]
+
+        vpslld  xmm8,xmm14,5
+        vpaddd  xmm13,xmm13,xmm15
+        vpxor   xmm6,xmm12,xmm10
+        vmovdqa XMMWORD[(160-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpxor   xmm2,xmm2,XMMWORD[((48-128))+rax]
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm13,xmm13,xmm6
+        vpsrld  xmm5,xmm2,31
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((224-128))+rax]
+
+        vpslld  xmm8,xmm13,5
+        vpaddd  xmm12,xmm12,xmm15
+        vpxor   xmm6,xmm11,xmm14
+        vmovdqa XMMWORD[(176-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpxor   xmm3,xmm3,XMMWORD[((64-128))+rax]
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm12,xmm12,xmm6
+        vpsrld  xmm5,xmm3,31
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((240-128))+rax]
+
+        vpslld  xmm8,xmm12,5
+        vpaddd  xmm11,xmm11,xmm15
+        vpxor   xmm6,xmm10,xmm13
+        vmovdqa XMMWORD[(192-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpxor   xmm4,xmm4,XMMWORD[((80-128))+rax]
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm11,xmm11,xmm6
+        vpsrld  xmm5,xmm4,31
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((0-128))+rax]
+
+        vpslld  xmm8,xmm11,5
+        vpaddd  xmm10,xmm10,xmm15
+        vpxor   xmm6,xmm14,xmm12
+        vmovdqa XMMWORD[(208-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpxor   xmm0,xmm0,XMMWORD[((96-128))+rax]
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm10,xmm10,xmm6
+        vpsrld  xmm5,xmm0,31
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((16-128))+rax]
+
+        vpslld  xmm8,xmm10,5
+        vpaddd  xmm14,xmm14,xmm15
+        vpxor   xmm6,xmm13,xmm11
+        vmovdqa XMMWORD[(224-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpxor   xmm1,xmm1,XMMWORD[((112-128))+rax]
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm14,xmm14,xmm6
+        vpsrld  xmm5,xmm1,31
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((32-128))+rax]
+
+        vpslld  xmm8,xmm14,5
+        vpaddd  xmm13,xmm13,xmm15
+        vpxor   xmm6,xmm12,xmm10
+        vmovdqa XMMWORD[(240-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpxor   xmm2,xmm2,XMMWORD[((128-128))+rax]
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm13,xmm13,xmm6
+        vpsrld  xmm5,xmm2,31
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((48-128))+rax]
+
+        vpslld  xmm8,xmm13,5
+        vpaddd  xmm12,xmm12,xmm15
+        vpxor   xmm6,xmm11,xmm14
+        vmovdqa XMMWORD[(0-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpxor   xmm3,xmm3,XMMWORD[((144-128))+rax]
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm12,xmm12,xmm6
+        vpsrld  xmm5,xmm3,31
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((64-128))+rax]
+
+        vpslld  xmm8,xmm12,5
+        vpaddd  xmm11,xmm11,xmm15
+        vpxor   xmm6,xmm10,xmm13
+        vmovdqa XMMWORD[(16-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpxor   xmm4,xmm4,XMMWORD[((160-128))+rax]
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm11,xmm11,xmm6
+        vpsrld  xmm5,xmm4,31
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((80-128))+rax]
+
+        vpslld  xmm8,xmm11,5
+        vpaddd  xmm10,xmm10,xmm15
+        vpxor   xmm6,xmm14,xmm12
+        vmovdqa XMMWORD[(32-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpxor   xmm0,xmm0,XMMWORD[((176-128))+rax]
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm10,xmm10,xmm6
+        vpsrld  xmm5,xmm0,31
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((96-128))+rax]
+
+        vpslld  xmm8,xmm10,5
+        vpaddd  xmm14,xmm14,xmm15
+        vpxor   xmm6,xmm13,xmm11
+        vmovdqa XMMWORD[(48-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpxor   xmm1,xmm1,XMMWORD[((192-128))+rax]
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm14,xmm14,xmm6
+        vpsrld  xmm5,xmm1,31
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((112-128))+rax]
+
+        vpslld  xmm8,xmm14,5
+        vpaddd  xmm13,xmm13,xmm15
+        vpxor   xmm6,xmm12,xmm10
+        vmovdqa XMMWORD[(64-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpxor   xmm2,xmm2,XMMWORD[((208-128))+rax]
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm13,xmm13,xmm6
+        vpsrld  xmm5,xmm2,31
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((128-128))+rax]
+
+        vpslld  xmm8,xmm13,5
+        vpaddd  xmm12,xmm12,xmm15
+        vpxor   xmm6,xmm11,xmm14
+        vmovdqa XMMWORD[(80-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpxor   xmm3,xmm3,XMMWORD[((224-128))+rax]
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm12,xmm12,xmm6
+        vpsrld  xmm5,xmm3,31
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((144-128))+rax]
+
+        vpslld  xmm8,xmm12,5
+        vpaddd  xmm11,xmm11,xmm15
+        vpxor   xmm6,xmm10,xmm13
+        vmovdqa XMMWORD[(96-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpxor   xmm4,xmm4,XMMWORD[((240-128))+rax]
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm11,xmm11,xmm6
+        vpsrld  xmm5,xmm4,31
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((160-128))+rax]
+
+        vpslld  xmm8,xmm11,5
+        vpaddd  xmm10,xmm10,xmm15
+        vpxor   xmm6,xmm14,xmm12
+        vmovdqa XMMWORD[(112-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpxor   xmm0,xmm0,XMMWORD[((0-128))+rax]
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm10,xmm10,xmm6
+        vpsrld  xmm5,xmm0,31
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vmovdqa xmm15,XMMWORD[32+rbp]
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((176-128))+rax]
+
+        vpaddd  xmm14,xmm14,xmm15
+        vpslld  xmm8,xmm10,5
+        vpand   xmm7,xmm13,xmm12
+        vpxor   xmm1,xmm1,XMMWORD[((16-128))+rax]
+
+        vpaddd  xmm14,xmm14,xmm7
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm13,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vmovdqu XMMWORD[(128-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm1,31
+        vpand   xmm6,xmm6,xmm11
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpslld  xmm7,xmm11,30
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((192-128))+rax]
+
+        vpaddd  xmm13,xmm13,xmm15
+        vpslld  xmm8,xmm14,5
+        vpand   xmm7,xmm12,xmm11
+        vpxor   xmm2,xmm2,XMMWORD[((32-128))+rax]
+
+        vpaddd  xmm13,xmm13,xmm7
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm12,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vmovdqu XMMWORD[(144-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm2,31
+        vpand   xmm6,xmm6,xmm10
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpslld  xmm7,xmm10,30
+        vpaddd  xmm13,xmm13,xmm6
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((208-128))+rax]
+
+        vpaddd  xmm12,xmm12,xmm15
+        vpslld  xmm8,xmm13,5
+        vpand   xmm7,xmm11,xmm10
+        vpxor   xmm3,xmm3,XMMWORD[((48-128))+rax]
+
+        vpaddd  xmm12,xmm12,xmm7
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm11,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vmovdqu XMMWORD[(160-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm3,31
+        vpand   xmm6,xmm6,xmm14
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpslld  xmm7,xmm14,30
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((224-128))+rax]
+
+        vpaddd  xmm11,xmm11,xmm15
+        vpslld  xmm8,xmm12,5
+        vpand   xmm7,xmm10,xmm14
+        vpxor   xmm4,xmm4,XMMWORD[((64-128))+rax]
+
+        vpaddd  xmm11,xmm11,xmm7
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm10,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vmovdqu XMMWORD[(176-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm4,31
+        vpand   xmm6,xmm6,xmm13
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpslld  xmm7,xmm13,30
+        vpaddd  xmm11,xmm11,xmm6
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((240-128))+rax]
+
+        vpaddd  xmm10,xmm10,xmm15
+        vpslld  xmm8,xmm11,5
+        vpand   xmm7,xmm14,xmm13
+        vpxor   xmm0,xmm0,XMMWORD[((80-128))+rax]
+
+        vpaddd  xmm10,xmm10,xmm7
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm14,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vmovdqu XMMWORD[(192-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm0,31
+        vpand   xmm6,xmm6,xmm12
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpslld  xmm7,xmm12,30
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((0-128))+rax]
+
+        vpaddd  xmm14,xmm14,xmm15
+        vpslld  xmm8,xmm10,5
+        vpand   xmm7,xmm13,xmm12
+        vpxor   xmm1,xmm1,XMMWORD[((96-128))+rax]
+
+        vpaddd  xmm14,xmm14,xmm7
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm13,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vmovdqu XMMWORD[(208-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm1,31
+        vpand   xmm6,xmm6,xmm11
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpslld  xmm7,xmm11,30
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((16-128))+rax]
+
+        vpaddd  xmm13,xmm13,xmm15
+        vpslld  xmm8,xmm14,5
+        vpand   xmm7,xmm12,xmm11
+        vpxor   xmm2,xmm2,XMMWORD[((112-128))+rax]
+
+        vpaddd  xmm13,xmm13,xmm7
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm12,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vmovdqu XMMWORD[(224-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm2,31
+        vpand   xmm6,xmm6,xmm10
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpslld  xmm7,xmm10,30
+        vpaddd  xmm13,xmm13,xmm6
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((32-128))+rax]
+
+        vpaddd  xmm12,xmm12,xmm15
+        vpslld  xmm8,xmm13,5
+        vpand   xmm7,xmm11,xmm10
+        vpxor   xmm3,xmm3,XMMWORD[((128-128))+rax]
+
+        vpaddd  xmm12,xmm12,xmm7
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm11,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vmovdqu XMMWORD[(240-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm3,31
+        vpand   xmm6,xmm6,xmm14
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpslld  xmm7,xmm14,30
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((48-128))+rax]
+
+        vpaddd  xmm11,xmm11,xmm15
+        vpslld  xmm8,xmm12,5
+        vpand   xmm7,xmm10,xmm14
+        vpxor   xmm4,xmm4,XMMWORD[((144-128))+rax]
+
+        vpaddd  xmm11,xmm11,xmm7
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm10,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vmovdqu XMMWORD[(0-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm4,31
+        vpand   xmm6,xmm6,xmm13
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpslld  xmm7,xmm13,30
+        vpaddd  xmm11,xmm11,xmm6
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((64-128))+rax]
+
+        vpaddd  xmm10,xmm10,xmm15
+        vpslld  xmm8,xmm11,5
+        vpand   xmm7,xmm14,xmm13
+        vpxor   xmm0,xmm0,XMMWORD[((160-128))+rax]
+
+        vpaddd  xmm10,xmm10,xmm7
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm14,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vmovdqu XMMWORD[(16-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm0,31
+        vpand   xmm6,xmm6,xmm12
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpslld  xmm7,xmm12,30
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((80-128))+rax]
+
+        vpaddd  xmm14,xmm14,xmm15
+        vpslld  xmm8,xmm10,5
+        vpand   xmm7,xmm13,xmm12
+        vpxor   xmm1,xmm1,XMMWORD[((176-128))+rax]
+
+        vpaddd  xmm14,xmm14,xmm7
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm13,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vmovdqu XMMWORD[(32-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm1,31
+        vpand   xmm6,xmm6,xmm11
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpslld  xmm7,xmm11,30
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((96-128))+rax]
+
+        vpaddd  xmm13,xmm13,xmm15
+        vpslld  xmm8,xmm14,5
+        vpand   xmm7,xmm12,xmm11
+        vpxor   xmm2,xmm2,XMMWORD[((192-128))+rax]
+
+        vpaddd  xmm13,xmm13,xmm7
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm12,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vmovdqu XMMWORD[(48-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm2,31
+        vpand   xmm6,xmm6,xmm10
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpslld  xmm7,xmm10,30
+        vpaddd  xmm13,xmm13,xmm6
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((112-128))+rax]
+
+        vpaddd  xmm12,xmm12,xmm15
+        vpslld  xmm8,xmm13,5
+        vpand   xmm7,xmm11,xmm10
+        vpxor   xmm3,xmm3,XMMWORD[((208-128))+rax]
+
+        vpaddd  xmm12,xmm12,xmm7
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm11,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vmovdqu XMMWORD[(64-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm3,31
+        vpand   xmm6,xmm6,xmm14
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpslld  xmm7,xmm14,30
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((128-128))+rax]
+
+        vpaddd  xmm11,xmm11,xmm15
+        vpslld  xmm8,xmm12,5
+        vpand   xmm7,xmm10,xmm14
+        vpxor   xmm4,xmm4,XMMWORD[((224-128))+rax]
+
+        vpaddd  xmm11,xmm11,xmm7
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm10,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vmovdqu XMMWORD[(80-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm4,31
+        vpand   xmm6,xmm6,xmm13
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpslld  xmm7,xmm13,30
+        vpaddd  xmm11,xmm11,xmm6
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((144-128))+rax]
+
+        vpaddd  xmm10,xmm10,xmm15
+        vpslld  xmm8,xmm11,5
+        vpand   xmm7,xmm14,xmm13
+        vpxor   xmm0,xmm0,XMMWORD[((240-128))+rax]
+
+        vpaddd  xmm10,xmm10,xmm7
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm14,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vmovdqu XMMWORD[(96-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm0,31
+        vpand   xmm6,xmm6,xmm12
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpslld  xmm7,xmm12,30
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((160-128))+rax]
+
+        vpaddd  xmm14,xmm14,xmm15
+        vpslld  xmm8,xmm10,5
+        vpand   xmm7,xmm13,xmm12
+        vpxor   xmm1,xmm1,XMMWORD[((0-128))+rax]
+
+        vpaddd  xmm14,xmm14,xmm7
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm13,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vmovdqu XMMWORD[(112-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm1,31
+        vpand   xmm6,xmm6,xmm11
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpslld  xmm7,xmm11,30
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((176-128))+rax]
+
+        vpaddd  xmm13,xmm13,xmm15
+        vpslld  xmm8,xmm14,5
+        vpand   xmm7,xmm12,xmm11
+        vpxor   xmm2,xmm2,XMMWORD[((16-128))+rax]
+
+        vpaddd  xmm13,xmm13,xmm7
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm12,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vmovdqu XMMWORD[(128-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm2,31
+        vpand   xmm6,xmm6,xmm10
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpslld  xmm7,xmm10,30
+        vpaddd  xmm13,xmm13,xmm6
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((192-128))+rax]
+
+        vpaddd  xmm12,xmm12,xmm15
+        vpslld  xmm8,xmm13,5
+        vpand   xmm7,xmm11,xmm10
+        vpxor   xmm3,xmm3,XMMWORD[((32-128))+rax]
+
+        vpaddd  xmm12,xmm12,xmm7
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm11,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vmovdqu XMMWORD[(144-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm3,31
+        vpand   xmm6,xmm6,xmm14
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpslld  xmm7,xmm14,30
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((208-128))+rax]
+
+        vpaddd  xmm11,xmm11,xmm15
+        vpslld  xmm8,xmm12,5
+        vpand   xmm7,xmm10,xmm14
+        vpxor   xmm4,xmm4,XMMWORD[((48-128))+rax]
+
+        vpaddd  xmm11,xmm11,xmm7
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm10,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vmovdqu XMMWORD[(160-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm4,31
+        vpand   xmm6,xmm6,xmm13
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpslld  xmm7,xmm13,30
+        vpaddd  xmm11,xmm11,xmm6
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((224-128))+rax]
+
+        vpaddd  xmm10,xmm10,xmm15
+        vpslld  xmm8,xmm11,5
+        vpand   xmm7,xmm14,xmm13
+        vpxor   xmm0,xmm0,XMMWORD[((64-128))+rax]
+
+        vpaddd  xmm10,xmm10,xmm7
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm14,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vmovdqu XMMWORD[(176-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpor    xmm8,xmm8,xmm9
+        vpsrld  xmm5,xmm0,31
+        vpand   xmm6,xmm6,xmm12
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpslld  xmm7,xmm12,30
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vmovdqa xmm15,XMMWORD[64+rbp]
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((240-128))+rax]
+
+        vpslld  xmm8,xmm10,5
+        vpaddd  xmm14,xmm14,xmm15
+        vpxor   xmm6,xmm13,xmm11
+        vmovdqa XMMWORD[(192-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpxor   xmm1,xmm1,XMMWORD[((80-128))+rax]
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm14,xmm14,xmm6
+        vpsrld  xmm5,xmm1,31
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((0-128))+rax]
+
+        vpslld  xmm8,xmm14,5
+        vpaddd  xmm13,xmm13,xmm15
+        vpxor   xmm6,xmm12,xmm10
+        vmovdqa XMMWORD[(208-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpxor   xmm2,xmm2,XMMWORD[((96-128))+rax]
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm13,xmm13,xmm6
+        vpsrld  xmm5,xmm2,31
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((16-128))+rax]
+
+        vpslld  xmm8,xmm13,5
+        vpaddd  xmm12,xmm12,xmm15
+        vpxor   xmm6,xmm11,xmm14
+        vmovdqa XMMWORD[(224-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpxor   xmm3,xmm3,XMMWORD[((112-128))+rax]
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm12,xmm12,xmm6
+        vpsrld  xmm5,xmm3,31
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((32-128))+rax]
+
+        vpslld  xmm8,xmm12,5
+        vpaddd  xmm11,xmm11,xmm15
+        vpxor   xmm6,xmm10,xmm13
+        vmovdqa XMMWORD[(240-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpxor   xmm4,xmm4,XMMWORD[((128-128))+rax]
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm11,xmm11,xmm6
+        vpsrld  xmm5,xmm4,31
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((48-128))+rax]
+
+        vpslld  xmm8,xmm11,5
+        vpaddd  xmm10,xmm10,xmm15
+        vpxor   xmm6,xmm14,xmm12
+        vmovdqa XMMWORD[(0-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpxor   xmm0,xmm0,XMMWORD[((144-128))+rax]
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm10,xmm10,xmm6
+        vpsrld  xmm5,xmm0,31
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((64-128))+rax]
+
+        vpslld  xmm8,xmm10,5
+        vpaddd  xmm14,xmm14,xmm15
+        vpxor   xmm6,xmm13,xmm11
+        vmovdqa XMMWORD[(16-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpxor   xmm1,xmm1,XMMWORD[((160-128))+rax]
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm14,xmm14,xmm6
+        vpsrld  xmm5,xmm1,31
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((80-128))+rax]
+
+        vpslld  xmm8,xmm14,5
+        vpaddd  xmm13,xmm13,xmm15
+        vpxor   xmm6,xmm12,xmm10
+        vmovdqa XMMWORD[(32-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpxor   xmm2,xmm2,XMMWORD[((176-128))+rax]
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm13,xmm13,xmm6
+        vpsrld  xmm5,xmm2,31
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((96-128))+rax]
+
+        vpslld  xmm8,xmm13,5
+        vpaddd  xmm12,xmm12,xmm15
+        vpxor   xmm6,xmm11,xmm14
+        vmovdqa XMMWORD[(48-128)+rax],xmm2
+        vpaddd  xmm12,xmm12,xmm2
+        vpxor   xmm3,xmm3,XMMWORD[((192-128))+rax]
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm12,xmm12,xmm6
+        vpsrld  xmm5,xmm3,31
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((112-128))+rax]
+
+        vpslld  xmm8,xmm12,5
+        vpaddd  xmm11,xmm11,xmm15
+        vpxor   xmm6,xmm10,xmm13
+        vmovdqa XMMWORD[(64-128)+rax],xmm3
+        vpaddd  xmm11,xmm11,xmm3
+        vpxor   xmm4,xmm4,XMMWORD[((208-128))+rax]
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm11,xmm11,xmm6
+        vpsrld  xmm5,xmm4,31
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((128-128))+rax]
+
+        vpslld  xmm8,xmm11,5
+        vpaddd  xmm10,xmm10,xmm15
+        vpxor   xmm6,xmm14,xmm12
+        vmovdqa XMMWORD[(80-128)+rax],xmm4
+        vpaddd  xmm10,xmm10,xmm4
+        vpxor   xmm0,xmm0,XMMWORD[((224-128))+rax]
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm10,xmm10,xmm6
+        vpsrld  xmm5,xmm0,31
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((144-128))+rax]
+
+        vpslld  xmm8,xmm10,5
+        vpaddd  xmm14,xmm14,xmm15
+        vpxor   xmm6,xmm13,xmm11
+        vmovdqa XMMWORD[(96-128)+rax],xmm0
+        vpaddd  xmm14,xmm14,xmm0
+        vpxor   xmm1,xmm1,XMMWORD[((240-128))+rax]
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm14,xmm14,xmm6
+        vpsrld  xmm5,xmm1,31
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((160-128))+rax]
+
+        vpslld  xmm8,xmm14,5
+        vpaddd  xmm13,xmm13,xmm15
+        vpxor   xmm6,xmm12,xmm10
+        vmovdqa XMMWORD[(112-128)+rax],xmm1
+        vpaddd  xmm13,xmm13,xmm1
+        vpxor   xmm2,xmm2,XMMWORD[((0-128))+rax]
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm13,xmm13,xmm6
+        vpsrld  xmm5,xmm2,31
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((176-128))+rax]
+
+        vpslld  xmm8,xmm13,5
+        vpaddd  xmm12,xmm12,xmm15
+        vpxor   xmm6,xmm11,xmm14
+        vpaddd  xmm12,xmm12,xmm2
+        vpxor   xmm3,xmm3,XMMWORD[((16-128))+rax]
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm12,xmm12,xmm6
+        vpsrld  xmm5,xmm3,31
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((192-128))+rax]
+
+        vpslld  xmm8,xmm12,5
+        vpaddd  xmm11,xmm11,xmm15
+        vpxor   xmm6,xmm10,xmm13
+        vpaddd  xmm11,xmm11,xmm3
+        vpxor   xmm4,xmm4,XMMWORD[((32-128))+rax]
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm11,xmm11,xmm6
+        vpsrld  xmm5,xmm4,31
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpxor   xmm0,xmm0,xmm2
+        vmovdqa xmm2,XMMWORD[((208-128))+rax]
+
+        vpslld  xmm8,xmm11,5
+        vpaddd  xmm10,xmm10,xmm15
+        vpxor   xmm6,xmm14,xmm12
+        vpaddd  xmm10,xmm10,xmm4
+        vpxor   xmm0,xmm0,XMMWORD[((48-128))+rax]
+        vpsrld  xmm9,xmm11,27
+        vpxor   xmm6,xmm6,xmm13
+        vpxor   xmm0,xmm0,xmm2
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm10,xmm10,xmm6
+        vpsrld  xmm5,xmm0,31
+        vpaddd  xmm0,xmm0,xmm0
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm0,xmm0,xmm5
+        vpor    xmm12,xmm12,xmm7
+        vpxor   xmm1,xmm1,xmm3
+        vmovdqa xmm3,XMMWORD[((224-128))+rax]
+
+        vpslld  xmm8,xmm10,5
+        vpaddd  xmm14,xmm14,xmm15
+        vpxor   xmm6,xmm13,xmm11
+        vpaddd  xmm14,xmm14,xmm0
+        vpxor   xmm1,xmm1,XMMWORD[((64-128))+rax]
+        vpsrld  xmm9,xmm10,27
+        vpxor   xmm6,xmm6,xmm12
+        vpxor   xmm1,xmm1,xmm3
+
+        vpslld  xmm7,xmm11,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm14,xmm14,xmm6
+        vpsrld  xmm5,xmm1,31
+        vpaddd  xmm1,xmm1,xmm1
+
+        vpsrld  xmm11,xmm11,2
+        vpaddd  xmm14,xmm14,xmm8
+        vpor    xmm1,xmm1,xmm5
+        vpor    xmm11,xmm11,xmm7
+        vpxor   xmm2,xmm2,xmm4
+        vmovdqa xmm4,XMMWORD[((240-128))+rax]
+
+        vpslld  xmm8,xmm14,5
+        vpaddd  xmm13,xmm13,xmm15
+        vpxor   xmm6,xmm12,xmm10
+        vpaddd  xmm13,xmm13,xmm1
+        vpxor   xmm2,xmm2,XMMWORD[((80-128))+rax]
+        vpsrld  xmm9,xmm14,27
+        vpxor   xmm6,xmm6,xmm11
+        vpxor   xmm2,xmm2,xmm4
+
+        vpslld  xmm7,xmm10,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm13,xmm13,xmm6
+        vpsrld  xmm5,xmm2,31
+        vpaddd  xmm2,xmm2,xmm2
+
+        vpsrld  xmm10,xmm10,2
+        vpaddd  xmm13,xmm13,xmm8
+        vpor    xmm2,xmm2,xmm5
+        vpor    xmm10,xmm10,xmm7
+        vpxor   xmm3,xmm3,xmm0
+        vmovdqa xmm0,XMMWORD[((0-128))+rax]
+
+        vpslld  xmm8,xmm13,5
+        vpaddd  xmm12,xmm12,xmm15
+        vpxor   xmm6,xmm11,xmm14
+        vpaddd  xmm12,xmm12,xmm2
+        vpxor   xmm3,xmm3,XMMWORD[((96-128))+rax]
+        vpsrld  xmm9,xmm13,27
+        vpxor   xmm6,xmm6,xmm10
+        vpxor   xmm3,xmm3,xmm0
+
+        vpslld  xmm7,xmm14,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm12,xmm12,xmm6
+        vpsrld  xmm5,xmm3,31
+        vpaddd  xmm3,xmm3,xmm3
+
+        vpsrld  xmm14,xmm14,2
+        vpaddd  xmm12,xmm12,xmm8
+        vpor    xmm3,xmm3,xmm5
+        vpor    xmm14,xmm14,xmm7
+        vpxor   xmm4,xmm4,xmm1
+        vmovdqa xmm1,XMMWORD[((16-128))+rax]
+
+        vpslld  xmm8,xmm12,5
+        vpaddd  xmm11,xmm11,xmm15
+        vpxor   xmm6,xmm10,xmm13
+        vpaddd  xmm11,xmm11,xmm3
+        vpxor   xmm4,xmm4,XMMWORD[((112-128))+rax]
+        vpsrld  xmm9,xmm12,27
+        vpxor   xmm6,xmm6,xmm14
+        vpxor   xmm4,xmm4,xmm1
+
+        vpslld  xmm7,xmm13,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm11,xmm11,xmm6
+        vpsrld  xmm5,xmm4,31
+        vpaddd  xmm4,xmm4,xmm4
+
+        vpsrld  xmm13,xmm13,2
+        vpaddd  xmm11,xmm11,xmm8
+        vpor    xmm4,xmm4,xmm5
+        vpor    xmm13,xmm13,xmm7
+        vpslld  xmm8,xmm11,5
+        vpaddd  xmm10,xmm10,xmm15
+        vpxor   xmm6,xmm14,xmm12
+
+        vpsrld  xmm9,xmm11,27
+        vpaddd  xmm10,xmm10,xmm4
+        vpxor   xmm6,xmm6,xmm13
+
+        vpslld  xmm7,xmm12,30
+        vpor    xmm8,xmm8,xmm9
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpsrld  xmm12,xmm12,2
+        vpaddd  xmm10,xmm10,xmm8
+        vpor    xmm12,xmm12,xmm7
+        mov     ecx,1
+        cmp     ecx,DWORD[rbx]
+        cmovge  r8,rbp
+        cmp     ecx,DWORD[4+rbx]
+        cmovge  r9,rbp
+        cmp     ecx,DWORD[8+rbx]
+        cmovge  r10,rbp
+        cmp     ecx,DWORD[12+rbx]
+        cmovge  r11,rbp
+        vmovdqu xmm6,XMMWORD[rbx]
+        vpxor   xmm8,xmm8,xmm8
+        vmovdqa xmm7,xmm6
+        vpcmpgtd        xmm7,xmm7,xmm8
+        vpaddd  xmm6,xmm6,xmm7
+
+        vpand   xmm10,xmm10,xmm7
+        vpand   xmm11,xmm11,xmm7
+        vpaddd  xmm10,xmm10,XMMWORD[rdi]
+        vpand   xmm12,xmm12,xmm7
+        vpaddd  xmm11,xmm11,XMMWORD[32+rdi]
+        vpand   xmm13,xmm13,xmm7
+        vpaddd  xmm12,xmm12,XMMWORD[64+rdi]
+        vpand   xmm14,xmm14,xmm7
+        vpaddd  xmm13,xmm13,XMMWORD[96+rdi]
+        vpaddd  xmm14,xmm14,XMMWORD[128+rdi]
+        vmovdqu XMMWORD[rdi],xmm10
+        vmovdqu XMMWORD[32+rdi],xmm11
+        vmovdqu XMMWORD[64+rdi],xmm12
+        vmovdqu XMMWORD[96+rdi],xmm13
+        vmovdqu XMMWORD[128+rdi],xmm14
+
+        vmovdqu XMMWORD[rbx],xmm6
+        vmovdqu xmm5,XMMWORD[96+rbp]
+        dec     edx
+        jnz     NEAR $L$oop_avx
+
+        mov     edx,DWORD[280+rsp]
+        lea     rdi,[16+rdi]
+        lea     rsi,[64+rsi]
+        dec     edx
+        jnz     NEAR $L$oop_grande_avx
+
+$L$done_avx:
+        mov     rax,QWORD[272+rsp]
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-184))+rax]
+        movaps  xmm7,XMMWORD[((-168))+rax]
+        movaps  xmm8,XMMWORD[((-152))+rax]
+        movaps  xmm9,XMMWORD[((-136))+rax]
+        movaps  xmm10,XMMWORD[((-120))+rax]
+        movaps  xmm11,XMMWORD[((-104))+rax]
+        movaps  xmm12,XMMWORD[((-88))+rax]
+        movaps  xmm13,XMMWORD[((-72))+rax]
+        movaps  xmm14,XMMWORD[((-56))+rax]
+        movaps  xmm15,XMMWORD[((-40))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$epilogue_avx:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha1_multi_block_avx:
+
+ALIGN   32
+sha1_multi_block_avx2:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha1_multi_block_avx2:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+_avx2_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[64+rsp],xmm10
+        movaps  XMMWORD[80+rsp],xmm11
+        movaps  XMMWORD[(-120)+rax],xmm12
+        movaps  XMMWORD[(-104)+rax],xmm13
+        movaps  XMMWORD[(-88)+rax],xmm14
+        movaps  XMMWORD[(-72)+rax],xmm15
+        sub     rsp,576
+        and     rsp,-256
+        mov     QWORD[544+rsp],rax
+
+$L$body_avx2:
+        lea     rbp,[K_XX_XX]
+        shr     edx,1
+
+        vzeroupper
+$L$oop_grande_avx2:
+        mov     DWORD[552+rsp],edx
+        xor     edx,edx
+        lea     rbx,[512+rsp]
+        mov     r12,QWORD[rsi]
+        mov     ecx,DWORD[8+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[rbx],ecx
+        cmovle  r12,rbp
+        mov     r13,QWORD[16+rsi]
+        mov     ecx,DWORD[24+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[4+rbx],ecx
+        cmovle  r13,rbp
+        mov     r14,QWORD[32+rsi]
+        mov     ecx,DWORD[40+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[8+rbx],ecx
+        cmovle  r14,rbp
+        mov     r15,QWORD[48+rsi]
+        mov     ecx,DWORD[56+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[12+rbx],ecx
+        cmovle  r15,rbp
+        mov     r8,QWORD[64+rsi]
+        mov     ecx,DWORD[72+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[16+rbx],ecx
+        cmovle  r8,rbp
+        mov     r9,QWORD[80+rsi]
+        mov     ecx,DWORD[88+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[20+rbx],ecx
+        cmovle  r9,rbp
+        mov     r10,QWORD[96+rsi]
+        mov     ecx,DWORD[104+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[24+rbx],ecx
+        cmovle  r10,rbp
+        mov     r11,QWORD[112+rsi]
+        mov     ecx,DWORD[120+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[28+rbx],ecx
+        cmovle  r11,rbp
+        vmovdqu ymm0,YMMWORD[rdi]
+        lea     rax,[128+rsp]
+        vmovdqu ymm1,YMMWORD[32+rdi]
+        lea     rbx,[((256+128))+rsp]
+        vmovdqu ymm2,YMMWORD[64+rdi]
+        vmovdqu ymm3,YMMWORD[96+rdi]
+        vmovdqu ymm4,YMMWORD[128+rdi]
+        vmovdqu ymm9,YMMWORD[96+rbp]
+        jmp     NEAR $L$oop_avx2
+
+ALIGN   32
+$L$oop_avx2:
+        vmovdqa ymm15,YMMWORD[((-32))+rbp]
+        vmovd   xmm10,DWORD[r12]
+        lea     r12,[64+r12]
+        vmovd   xmm12,DWORD[r8]
+        lea     r8,[64+r8]
+        vmovd   xmm7,DWORD[r13]
+        lea     r13,[64+r13]
+        vmovd   xmm6,DWORD[r9]
+        lea     r9,[64+r9]
+        vpinsrd xmm10,xmm10,DWORD[r14],1
+        lea     r14,[64+r14]
+        vpinsrd xmm12,xmm12,DWORD[r10],1
+        lea     r10,[64+r10]
+        vpinsrd xmm7,xmm7,DWORD[r15],1
+        lea     r15,[64+r15]
+        vpunpckldq      ymm10,ymm10,ymm7
+        vpinsrd xmm6,xmm6,DWORD[r11],1
+        lea     r11,[64+r11]
+        vpunpckldq      ymm12,ymm12,ymm6
+        vmovd   xmm11,DWORD[((-60))+r12]
+        vinserti128     ymm10,ymm10,xmm12,1
+        vmovd   xmm8,DWORD[((-60))+r8]
+        vpshufb ymm10,ymm10,ymm9
+        vmovd   xmm7,DWORD[((-60))+r13]
+        vmovd   xmm6,DWORD[((-60))+r9]
+        vpinsrd xmm11,xmm11,DWORD[((-60))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-60))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-60))+r15],1
+        vpunpckldq      ymm11,ymm11,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-60))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm4,ymm4,ymm15
+        vpslld  ymm7,ymm0,5
+        vpandn  ymm6,ymm1,ymm3
+        vpand   ymm5,ymm1,ymm2
+
+        vmovdqa YMMWORD[(0-128)+rax],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vinserti128     ymm11,ymm11,xmm8,1
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm12,DWORD[((-56))+r12]
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-56))+r8]
+        vpaddd  ymm4,ymm4,ymm5
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpshufb ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vmovd   xmm7,DWORD[((-56))+r13]
+        vmovd   xmm6,DWORD[((-56))+r9]
+        vpinsrd xmm12,xmm12,DWORD[((-56))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-56))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-56))+r15],1
+        vpunpckldq      ymm12,ymm12,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-56))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm3,ymm3,ymm15
+        vpslld  ymm7,ymm4,5
+        vpandn  ymm6,ymm0,ymm2
+        vpand   ymm5,ymm0,ymm1
+
+        vmovdqa YMMWORD[(32-128)+rax],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vinserti128     ymm12,ymm12,xmm8,1
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm13,DWORD[((-52))+r12]
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-52))+r8]
+        vpaddd  ymm3,ymm3,ymm5
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpshufb ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vmovd   xmm7,DWORD[((-52))+r13]
+        vmovd   xmm6,DWORD[((-52))+r9]
+        vpinsrd xmm13,xmm13,DWORD[((-52))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-52))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-52))+r15],1
+        vpunpckldq      ymm13,ymm13,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-52))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm2,ymm2,ymm15
+        vpslld  ymm7,ymm3,5
+        vpandn  ymm6,ymm4,ymm1
+        vpand   ymm5,ymm4,ymm0
+
+        vmovdqa YMMWORD[(64-128)+rax],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vinserti128     ymm13,ymm13,xmm8,1
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm14,DWORD[((-48))+r12]
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-48))+r8]
+        vpaddd  ymm2,ymm2,ymm5
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpshufb ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vmovd   xmm7,DWORD[((-48))+r13]
+        vmovd   xmm6,DWORD[((-48))+r9]
+        vpinsrd xmm14,xmm14,DWORD[((-48))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-48))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-48))+r15],1
+        vpunpckldq      ymm14,ymm14,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-48))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm1,ymm1,ymm15
+        vpslld  ymm7,ymm2,5
+        vpandn  ymm6,ymm3,ymm0
+        vpand   ymm5,ymm3,ymm4
+
+        vmovdqa YMMWORD[(96-128)+rax],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vinserti128     ymm14,ymm14,xmm8,1
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm10,DWORD[((-44))+r12]
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-44))+r8]
+        vpaddd  ymm1,ymm1,ymm5
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpshufb ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vmovd   xmm7,DWORD[((-44))+r13]
+        vmovd   xmm6,DWORD[((-44))+r9]
+        vpinsrd xmm10,xmm10,DWORD[((-44))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-44))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-44))+r15],1
+        vpunpckldq      ymm10,ymm10,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-44))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm0,ymm0,ymm15
+        vpslld  ymm7,ymm1,5
+        vpandn  ymm6,ymm2,ymm4
+        vpand   ymm5,ymm2,ymm3
+
+        vmovdqa YMMWORD[(128-128)+rax],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vinserti128     ymm10,ymm10,xmm8,1
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm11,DWORD[((-40))+r12]
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-40))+r8]
+        vpaddd  ymm0,ymm0,ymm5
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpshufb ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vmovd   xmm7,DWORD[((-40))+r13]
+        vmovd   xmm6,DWORD[((-40))+r9]
+        vpinsrd xmm11,xmm11,DWORD[((-40))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-40))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-40))+r15],1
+        vpunpckldq      ymm11,ymm11,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-40))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm4,ymm4,ymm15
+        vpslld  ymm7,ymm0,5
+        vpandn  ymm6,ymm1,ymm3
+        vpand   ymm5,ymm1,ymm2
+
+        vmovdqa YMMWORD[(160-128)+rax],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vinserti128     ymm11,ymm11,xmm8,1
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm12,DWORD[((-36))+r12]
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-36))+r8]
+        vpaddd  ymm4,ymm4,ymm5
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpshufb ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vmovd   xmm7,DWORD[((-36))+r13]
+        vmovd   xmm6,DWORD[((-36))+r9]
+        vpinsrd xmm12,xmm12,DWORD[((-36))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-36))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-36))+r15],1
+        vpunpckldq      ymm12,ymm12,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-36))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm3,ymm3,ymm15
+        vpslld  ymm7,ymm4,5
+        vpandn  ymm6,ymm0,ymm2
+        vpand   ymm5,ymm0,ymm1
+
+        vmovdqa YMMWORD[(192-128)+rax],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vinserti128     ymm12,ymm12,xmm8,1
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm13,DWORD[((-32))+r12]
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-32))+r8]
+        vpaddd  ymm3,ymm3,ymm5
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpshufb ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vmovd   xmm7,DWORD[((-32))+r13]
+        vmovd   xmm6,DWORD[((-32))+r9]
+        vpinsrd xmm13,xmm13,DWORD[((-32))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-32))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-32))+r15],1
+        vpunpckldq      ymm13,ymm13,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-32))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm2,ymm2,ymm15
+        vpslld  ymm7,ymm3,5
+        vpandn  ymm6,ymm4,ymm1
+        vpand   ymm5,ymm4,ymm0
+
+        vmovdqa YMMWORD[(224-128)+rax],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vinserti128     ymm13,ymm13,xmm8,1
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm14,DWORD[((-28))+r12]
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-28))+r8]
+        vpaddd  ymm2,ymm2,ymm5
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpshufb ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vmovd   xmm7,DWORD[((-28))+r13]
+        vmovd   xmm6,DWORD[((-28))+r9]
+        vpinsrd xmm14,xmm14,DWORD[((-28))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-28))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-28))+r15],1
+        vpunpckldq      ymm14,ymm14,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-28))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm1,ymm1,ymm15
+        vpslld  ymm7,ymm2,5
+        vpandn  ymm6,ymm3,ymm0
+        vpand   ymm5,ymm3,ymm4
+
+        vmovdqa YMMWORD[(256-256-128)+rbx],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vinserti128     ymm14,ymm14,xmm8,1
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm10,DWORD[((-24))+r12]
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-24))+r8]
+        vpaddd  ymm1,ymm1,ymm5
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpshufb ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vmovd   xmm7,DWORD[((-24))+r13]
+        vmovd   xmm6,DWORD[((-24))+r9]
+        vpinsrd xmm10,xmm10,DWORD[((-24))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-24))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-24))+r15],1
+        vpunpckldq      ymm10,ymm10,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-24))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm0,ymm0,ymm15
+        vpslld  ymm7,ymm1,5
+        vpandn  ymm6,ymm2,ymm4
+        vpand   ymm5,ymm2,ymm3
+
+        vmovdqa YMMWORD[(288-256-128)+rbx],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vinserti128     ymm10,ymm10,xmm8,1
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm11,DWORD[((-20))+r12]
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-20))+r8]
+        vpaddd  ymm0,ymm0,ymm5
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpshufb ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vmovd   xmm7,DWORD[((-20))+r13]
+        vmovd   xmm6,DWORD[((-20))+r9]
+        vpinsrd xmm11,xmm11,DWORD[((-20))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-20))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-20))+r15],1
+        vpunpckldq      ymm11,ymm11,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-20))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm4,ymm4,ymm15
+        vpslld  ymm7,ymm0,5
+        vpandn  ymm6,ymm1,ymm3
+        vpand   ymm5,ymm1,ymm2
+
+        vmovdqa YMMWORD[(320-256-128)+rbx],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vinserti128     ymm11,ymm11,xmm8,1
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm12,DWORD[((-16))+r12]
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-16))+r8]
+        vpaddd  ymm4,ymm4,ymm5
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpshufb ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vmovd   xmm7,DWORD[((-16))+r13]
+        vmovd   xmm6,DWORD[((-16))+r9]
+        vpinsrd xmm12,xmm12,DWORD[((-16))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-16))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-16))+r15],1
+        vpunpckldq      ymm12,ymm12,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-16))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm3,ymm3,ymm15
+        vpslld  ymm7,ymm4,5
+        vpandn  ymm6,ymm0,ymm2
+        vpand   ymm5,ymm0,ymm1
+
+        vmovdqa YMMWORD[(352-256-128)+rbx],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vinserti128     ymm12,ymm12,xmm8,1
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm13,DWORD[((-12))+r12]
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-12))+r8]
+        vpaddd  ymm3,ymm3,ymm5
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpshufb ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vmovd   xmm7,DWORD[((-12))+r13]
+        vmovd   xmm6,DWORD[((-12))+r9]
+        vpinsrd xmm13,xmm13,DWORD[((-12))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-12))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-12))+r15],1
+        vpunpckldq      ymm13,ymm13,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-12))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm2,ymm2,ymm15
+        vpslld  ymm7,ymm3,5
+        vpandn  ymm6,ymm4,ymm1
+        vpand   ymm5,ymm4,ymm0
+
+        vmovdqa YMMWORD[(384-256-128)+rbx],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vinserti128     ymm13,ymm13,xmm8,1
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm14,DWORD[((-8))+r12]
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-8))+r8]
+        vpaddd  ymm2,ymm2,ymm5
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpshufb ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vmovd   xmm7,DWORD[((-8))+r13]
+        vmovd   xmm6,DWORD[((-8))+r9]
+        vpinsrd xmm14,xmm14,DWORD[((-8))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-8))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-8))+r15],1
+        vpunpckldq      ymm14,ymm14,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-8))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm1,ymm1,ymm15
+        vpslld  ymm7,ymm2,5
+        vpandn  ymm6,ymm3,ymm0
+        vpand   ymm5,ymm3,ymm4
+
+        vmovdqa YMMWORD[(416-256-128)+rbx],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vinserti128     ymm14,ymm14,xmm8,1
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm6
+        vmovd   xmm10,DWORD[((-4))+r12]
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vmovd   xmm8,DWORD[((-4))+r8]
+        vpaddd  ymm1,ymm1,ymm5
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpshufb ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vmovdqa ymm11,YMMWORD[((0-128))+rax]
+        vmovd   xmm7,DWORD[((-4))+r13]
+        vmovd   xmm6,DWORD[((-4))+r9]
+        vpinsrd xmm10,xmm10,DWORD[((-4))+r14],1
+        vpinsrd xmm8,xmm8,DWORD[((-4))+r10],1
+        vpinsrd xmm7,xmm7,DWORD[((-4))+r15],1
+        vpunpckldq      ymm10,ymm10,ymm7
+        vpinsrd xmm6,xmm6,DWORD[((-4))+r11],1
+        vpunpckldq      ymm8,ymm8,ymm6
+        vpaddd  ymm0,ymm0,ymm15
+        prefetcht0      [63+r12]
+        vpslld  ymm7,ymm1,5
+        vpandn  ymm6,ymm2,ymm4
+        vpand   ymm5,ymm2,ymm3
+
+        vmovdqa YMMWORD[(448-256-128)+rbx],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vinserti128     ymm10,ymm10,xmm8,1
+        vpsrld  ymm8,ymm1,27
+        prefetcht0      [63+r13]
+        vpxor   ymm5,ymm5,ymm6
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        prefetcht0      [63+r14]
+        vpaddd  ymm0,ymm0,ymm5
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        prefetcht0      [63+r15]
+        vpshufb ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vmovdqa ymm12,YMMWORD[((32-128))+rax]
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((64-128))+rax]
+
+        vpaddd  ymm4,ymm4,ymm15
+        vpslld  ymm7,ymm0,5
+        vpandn  ymm6,ymm1,ymm3
+        prefetcht0      [63+r8]
+        vpand   ymm5,ymm1,ymm2
+
+        vmovdqa YMMWORD[(480-256-128)+rbx],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpxor   ymm11,ymm11,YMMWORD[((256-256-128))+rbx]
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        prefetcht0      [63+r9]
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm4,ymm4,ymm5
+        prefetcht0      [63+r10]
+        vpsrld  ymm9,ymm11,31
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpsrld  ymm1,ymm1,2
+        prefetcht0      [63+r11]
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((96-128))+rax]
+
+        vpaddd  ymm3,ymm3,ymm15
+        vpslld  ymm7,ymm4,5
+        vpandn  ymm6,ymm0,ymm2
+
+        vpand   ymm5,ymm0,ymm1
+
+        vmovdqa YMMWORD[(0-128)+rax],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpxor   ymm12,ymm12,YMMWORD[((288-256-128))+rbx]
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm6
+        vpxor   ymm12,ymm12,ymm14
+
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm3,ymm3,ymm5
+
+        vpsrld  ymm9,ymm12,31
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpsrld  ymm0,ymm0,2
+
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((128-128))+rax]
+
+        vpaddd  ymm2,ymm2,ymm15
+        vpslld  ymm7,ymm3,5
+        vpandn  ymm6,ymm4,ymm1
+
+        vpand   ymm5,ymm4,ymm0
+
+        vmovdqa YMMWORD[(32-128)+rax],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpxor   ymm13,ymm13,YMMWORD[((320-256-128))+rbx]
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm6
+        vpxor   ymm13,ymm13,ymm10
+
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm2,ymm2,ymm5
+
+        vpsrld  ymm9,ymm13,31
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpsrld  ymm4,ymm4,2
+
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((160-128))+rax]
+
+        vpaddd  ymm1,ymm1,ymm15
+        vpslld  ymm7,ymm2,5
+        vpandn  ymm6,ymm3,ymm0
+
+        vpand   ymm5,ymm3,ymm4
+
+        vmovdqa YMMWORD[(64-128)+rax],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpxor   ymm14,ymm14,YMMWORD[((352-256-128))+rbx]
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm6
+        vpxor   ymm14,ymm14,ymm11
+
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm1,ymm1,ymm5
+
+        vpsrld  ymm9,ymm14,31
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpsrld  ymm3,ymm3,2
+
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((192-128))+rax]
+
+        vpaddd  ymm0,ymm0,ymm15
+        vpslld  ymm7,ymm1,5
+        vpandn  ymm6,ymm2,ymm4
+
+        vpand   ymm5,ymm2,ymm3
+
+        vmovdqa YMMWORD[(96-128)+rax],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpxor   ymm10,ymm10,YMMWORD[((384-256-128))+rbx]
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm6
+        vpxor   ymm10,ymm10,ymm12
+
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm0,ymm0,ymm5
+
+        vpsrld  ymm9,ymm10,31
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpsrld  ymm2,ymm2,2
+
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vmovdqa ymm15,YMMWORD[rbp]
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((224-128))+rax]
+
+        vpslld  ymm7,ymm0,5
+        vpaddd  ymm4,ymm4,ymm15
+        vpxor   ymm5,ymm3,ymm1
+        vmovdqa YMMWORD[(128-128)+rax],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpxor   ymm11,ymm11,YMMWORD[((416-256-128))+rbx]
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm4,ymm4,ymm5
+        vpsrld  ymm9,ymm11,31
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((256-256-128))+rbx]
+
+        vpslld  ymm7,ymm4,5
+        vpaddd  ymm3,ymm3,ymm15
+        vpxor   ymm5,ymm2,ymm0
+        vmovdqa YMMWORD[(160-128)+rax],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpxor   ymm12,ymm12,YMMWORD[((448-256-128))+rbx]
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm3,ymm3,ymm5
+        vpsrld  ymm9,ymm12,31
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((288-256-128))+rbx]
+
+        vpslld  ymm7,ymm3,5
+        vpaddd  ymm2,ymm2,ymm15
+        vpxor   ymm5,ymm1,ymm4
+        vmovdqa YMMWORD[(192-128)+rax],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpxor   ymm13,ymm13,YMMWORD[((480-256-128))+rbx]
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm2,ymm2,ymm5
+        vpsrld  ymm9,ymm13,31
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((320-256-128))+rbx]
+
+        vpslld  ymm7,ymm2,5
+        vpaddd  ymm1,ymm1,ymm15
+        vpxor   ymm5,ymm0,ymm3
+        vmovdqa YMMWORD[(224-128)+rax],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpxor   ymm14,ymm14,YMMWORD[((0-128))+rax]
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm1,ymm1,ymm5
+        vpsrld  ymm9,ymm14,31
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((352-256-128))+rbx]
+
+        vpslld  ymm7,ymm1,5
+        vpaddd  ymm0,ymm0,ymm15
+        vpxor   ymm5,ymm4,ymm2
+        vmovdqa YMMWORD[(256-256-128)+rbx],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpxor   ymm10,ymm10,YMMWORD[((32-128))+rax]
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm0,ymm0,ymm5
+        vpsrld  ymm9,ymm10,31
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((384-256-128))+rbx]
+
+        vpslld  ymm7,ymm0,5
+        vpaddd  ymm4,ymm4,ymm15
+        vpxor   ymm5,ymm3,ymm1
+        vmovdqa YMMWORD[(288-256-128)+rbx],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpxor   ymm11,ymm11,YMMWORD[((64-128))+rax]
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm4,ymm4,ymm5
+        vpsrld  ymm9,ymm11,31
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((416-256-128))+rbx]
+
+        vpslld  ymm7,ymm4,5
+        vpaddd  ymm3,ymm3,ymm15
+        vpxor   ymm5,ymm2,ymm0
+        vmovdqa YMMWORD[(320-256-128)+rbx],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpxor   ymm12,ymm12,YMMWORD[((96-128))+rax]
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm3,ymm3,ymm5
+        vpsrld  ymm9,ymm12,31
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((448-256-128))+rbx]
+
+        vpslld  ymm7,ymm3,5
+        vpaddd  ymm2,ymm2,ymm15
+        vpxor   ymm5,ymm1,ymm4
+        vmovdqa YMMWORD[(352-256-128)+rbx],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpxor   ymm13,ymm13,YMMWORD[((128-128))+rax]
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm2,ymm2,ymm5
+        vpsrld  ymm9,ymm13,31
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((480-256-128))+rbx]
+
+        vpslld  ymm7,ymm2,5
+        vpaddd  ymm1,ymm1,ymm15
+        vpxor   ymm5,ymm0,ymm3
+        vmovdqa YMMWORD[(384-256-128)+rbx],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpxor   ymm14,ymm14,YMMWORD[((160-128))+rax]
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm1,ymm1,ymm5
+        vpsrld  ymm9,ymm14,31
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((0-128))+rax]
+
+        vpslld  ymm7,ymm1,5
+        vpaddd  ymm0,ymm0,ymm15
+        vpxor   ymm5,ymm4,ymm2
+        vmovdqa YMMWORD[(416-256-128)+rbx],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpxor   ymm10,ymm10,YMMWORD[((192-128))+rax]
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm0,ymm0,ymm5
+        vpsrld  ymm9,ymm10,31
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((32-128))+rax]
+
+        vpslld  ymm7,ymm0,5
+        vpaddd  ymm4,ymm4,ymm15
+        vpxor   ymm5,ymm3,ymm1
+        vmovdqa YMMWORD[(448-256-128)+rbx],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpxor   ymm11,ymm11,YMMWORD[((224-128))+rax]
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm4,ymm4,ymm5
+        vpsrld  ymm9,ymm11,31
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((64-128))+rax]
+
+        vpslld  ymm7,ymm4,5
+        vpaddd  ymm3,ymm3,ymm15
+        vpxor   ymm5,ymm2,ymm0
+        vmovdqa YMMWORD[(480-256-128)+rbx],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpxor   ymm12,ymm12,YMMWORD[((256-256-128))+rbx]
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm3,ymm3,ymm5
+        vpsrld  ymm9,ymm12,31
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((96-128))+rax]
+
+        vpslld  ymm7,ymm3,5
+        vpaddd  ymm2,ymm2,ymm15
+        vpxor   ymm5,ymm1,ymm4
+        vmovdqa YMMWORD[(0-128)+rax],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpxor   ymm13,ymm13,YMMWORD[((288-256-128))+rbx]
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm2,ymm2,ymm5
+        vpsrld  ymm9,ymm13,31
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((128-128))+rax]
+
+        vpslld  ymm7,ymm2,5
+        vpaddd  ymm1,ymm1,ymm15
+        vpxor   ymm5,ymm0,ymm3
+        vmovdqa YMMWORD[(32-128)+rax],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpxor   ymm14,ymm14,YMMWORD[((320-256-128))+rbx]
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm1,ymm1,ymm5
+        vpsrld  ymm9,ymm14,31
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((160-128))+rax]
+
+        vpslld  ymm7,ymm1,5
+        vpaddd  ymm0,ymm0,ymm15
+        vpxor   ymm5,ymm4,ymm2
+        vmovdqa YMMWORD[(64-128)+rax],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpxor   ymm10,ymm10,YMMWORD[((352-256-128))+rbx]
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm0,ymm0,ymm5
+        vpsrld  ymm9,ymm10,31
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((192-128))+rax]
+
+        vpslld  ymm7,ymm0,5
+        vpaddd  ymm4,ymm4,ymm15
+        vpxor   ymm5,ymm3,ymm1
+        vmovdqa YMMWORD[(96-128)+rax],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpxor   ymm11,ymm11,YMMWORD[((384-256-128))+rbx]
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm4,ymm4,ymm5
+        vpsrld  ymm9,ymm11,31
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((224-128))+rax]
+
+        vpslld  ymm7,ymm4,5
+        vpaddd  ymm3,ymm3,ymm15
+        vpxor   ymm5,ymm2,ymm0
+        vmovdqa YMMWORD[(128-128)+rax],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpxor   ymm12,ymm12,YMMWORD[((416-256-128))+rbx]
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm3,ymm3,ymm5
+        vpsrld  ymm9,ymm12,31
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((256-256-128))+rbx]
+
+        vpslld  ymm7,ymm3,5
+        vpaddd  ymm2,ymm2,ymm15
+        vpxor   ymm5,ymm1,ymm4
+        vmovdqa YMMWORD[(160-128)+rax],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpxor   ymm13,ymm13,YMMWORD[((448-256-128))+rbx]
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm2,ymm2,ymm5
+        vpsrld  ymm9,ymm13,31
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((288-256-128))+rbx]
+
+        vpslld  ymm7,ymm2,5
+        vpaddd  ymm1,ymm1,ymm15
+        vpxor   ymm5,ymm0,ymm3
+        vmovdqa YMMWORD[(192-128)+rax],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpxor   ymm14,ymm14,YMMWORD[((480-256-128))+rbx]
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm1,ymm1,ymm5
+        vpsrld  ymm9,ymm14,31
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((320-256-128))+rbx]
+
+        vpslld  ymm7,ymm1,5
+        vpaddd  ymm0,ymm0,ymm15
+        vpxor   ymm5,ymm4,ymm2
+        vmovdqa YMMWORD[(224-128)+rax],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpxor   ymm10,ymm10,YMMWORD[((0-128))+rax]
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm0,ymm0,ymm5
+        vpsrld  ymm9,ymm10,31
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vmovdqa ymm15,YMMWORD[32+rbp]
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((352-256-128))+rbx]
+
+        vpaddd  ymm4,ymm4,ymm15
+        vpslld  ymm7,ymm0,5
+        vpand   ymm6,ymm3,ymm2
+        vpxor   ymm11,ymm11,YMMWORD[((32-128))+rax]
+
+        vpaddd  ymm4,ymm4,ymm6
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm3,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vmovdqu YMMWORD[(256-256-128)+rbx],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm11,31
+        vpand   ymm5,ymm5,ymm1
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpslld  ymm6,ymm1,30
+        vpaddd  ymm4,ymm4,ymm5
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((384-256-128))+rbx]
+
+        vpaddd  ymm3,ymm3,ymm15
+        vpslld  ymm7,ymm4,5
+        vpand   ymm6,ymm2,ymm1
+        vpxor   ymm12,ymm12,YMMWORD[((64-128))+rax]
+
+        vpaddd  ymm3,ymm3,ymm6
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm2,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vmovdqu YMMWORD[(288-256-128)+rbx],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm12,31
+        vpand   ymm5,ymm5,ymm0
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpslld  ymm6,ymm0,30
+        vpaddd  ymm3,ymm3,ymm5
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((416-256-128))+rbx]
+
+        vpaddd  ymm2,ymm2,ymm15
+        vpslld  ymm7,ymm3,5
+        vpand   ymm6,ymm1,ymm0
+        vpxor   ymm13,ymm13,YMMWORD[((96-128))+rax]
+
+        vpaddd  ymm2,ymm2,ymm6
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm1,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vmovdqu YMMWORD[(320-256-128)+rbx],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm13,31
+        vpand   ymm5,ymm5,ymm4
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpslld  ymm6,ymm4,30
+        vpaddd  ymm2,ymm2,ymm5
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((448-256-128))+rbx]
+
+        vpaddd  ymm1,ymm1,ymm15
+        vpslld  ymm7,ymm2,5
+        vpand   ymm6,ymm0,ymm4
+        vpxor   ymm14,ymm14,YMMWORD[((128-128))+rax]
+
+        vpaddd  ymm1,ymm1,ymm6
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm0,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vmovdqu YMMWORD[(352-256-128)+rbx],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm14,31
+        vpand   ymm5,ymm5,ymm3
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpslld  ymm6,ymm3,30
+        vpaddd  ymm1,ymm1,ymm5
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((480-256-128))+rbx]
+
+        vpaddd  ymm0,ymm0,ymm15
+        vpslld  ymm7,ymm1,5
+        vpand   ymm6,ymm4,ymm3
+        vpxor   ymm10,ymm10,YMMWORD[((160-128))+rax]
+
+        vpaddd  ymm0,ymm0,ymm6
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm4,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vmovdqu YMMWORD[(384-256-128)+rbx],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm10,31
+        vpand   ymm5,ymm5,ymm2
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpslld  ymm6,ymm2,30
+        vpaddd  ymm0,ymm0,ymm5
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((0-128))+rax]
+
+        vpaddd  ymm4,ymm4,ymm15
+        vpslld  ymm7,ymm0,5
+        vpand   ymm6,ymm3,ymm2
+        vpxor   ymm11,ymm11,YMMWORD[((192-128))+rax]
+
+        vpaddd  ymm4,ymm4,ymm6
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm3,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vmovdqu YMMWORD[(416-256-128)+rbx],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm11,31
+        vpand   ymm5,ymm5,ymm1
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpslld  ymm6,ymm1,30
+        vpaddd  ymm4,ymm4,ymm5
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((32-128))+rax]
+
+        vpaddd  ymm3,ymm3,ymm15
+        vpslld  ymm7,ymm4,5
+        vpand   ymm6,ymm2,ymm1
+        vpxor   ymm12,ymm12,YMMWORD[((224-128))+rax]
+
+        vpaddd  ymm3,ymm3,ymm6
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm2,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vmovdqu YMMWORD[(448-256-128)+rbx],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm12,31
+        vpand   ymm5,ymm5,ymm0
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpslld  ymm6,ymm0,30
+        vpaddd  ymm3,ymm3,ymm5
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((64-128))+rax]
+
+        vpaddd  ymm2,ymm2,ymm15
+        vpslld  ymm7,ymm3,5
+        vpand   ymm6,ymm1,ymm0
+        vpxor   ymm13,ymm13,YMMWORD[((256-256-128))+rbx]
+
+        vpaddd  ymm2,ymm2,ymm6
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm1,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vmovdqu YMMWORD[(480-256-128)+rbx],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm13,31
+        vpand   ymm5,ymm5,ymm4
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpslld  ymm6,ymm4,30
+        vpaddd  ymm2,ymm2,ymm5
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((96-128))+rax]
+
+        vpaddd  ymm1,ymm1,ymm15
+        vpslld  ymm7,ymm2,5
+        vpand   ymm6,ymm0,ymm4
+        vpxor   ymm14,ymm14,YMMWORD[((288-256-128))+rbx]
+
+        vpaddd  ymm1,ymm1,ymm6
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm0,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vmovdqu YMMWORD[(0-128)+rax],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm14,31
+        vpand   ymm5,ymm5,ymm3
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpslld  ymm6,ymm3,30
+        vpaddd  ymm1,ymm1,ymm5
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((128-128))+rax]
+
+        vpaddd  ymm0,ymm0,ymm15
+        vpslld  ymm7,ymm1,5
+        vpand   ymm6,ymm4,ymm3
+        vpxor   ymm10,ymm10,YMMWORD[((320-256-128))+rbx]
+
+        vpaddd  ymm0,ymm0,ymm6
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm4,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vmovdqu YMMWORD[(32-128)+rax],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm10,31
+        vpand   ymm5,ymm5,ymm2
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpslld  ymm6,ymm2,30
+        vpaddd  ymm0,ymm0,ymm5
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((160-128))+rax]
+
+        vpaddd  ymm4,ymm4,ymm15
+        vpslld  ymm7,ymm0,5
+        vpand   ymm6,ymm3,ymm2
+        vpxor   ymm11,ymm11,YMMWORD[((352-256-128))+rbx]
+
+        vpaddd  ymm4,ymm4,ymm6
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm3,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vmovdqu YMMWORD[(64-128)+rax],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm11,31
+        vpand   ymm5,ymm5,ymm1
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpslld  ymm6,ymm1,30
+        vpaddd  ymm4,ymm4,ymm5
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((192-128))+rax]
+
+        vpaddd  ymm3,ymm3,ymm15
+        vpslld  ymm7,ymm4,5
+        vpand   ymm6,ymm2,ymm1
+        vpxor   ymm12,ymm12,YMMWORD[((384-256-128))+rbx]
+
+        vpaddd  ymm3,ymm3,ymm6
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm2,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vmovdqu YMMWORD[(96-128)+rax],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm12,31
+        vpand   ymm5,ymm5,ymm0
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpslld  ymm6,ymm0,30
+        vpaddd  ymm3,ymm3,ymm5
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((224-128))+rax]
+
+        vpaddd  ymm2,ymm2,ymm15
+        vpslld  ymm7,ymm3,5
+        vpand   ymm6,ymm1,ymm0
+        vpxor   ymm13,ymm13,YMMWORD[((416-256-128))+rbx]
+
+        vpaddd  ymm2,ymm2,ymm6
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm1,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vmovdqu YMMWORD[(128-128)+rax],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm13,31
+        vpand   ymm5,ymm5,ymm4
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpslld  ymm6,ymm4,30
+        vpaddd  ymm2,ymm2,ymm5
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((256-256-128))+rbx]
+
+        vpaddd  ymm1,ymm1,ymm15
+        vpslld  ymm7,ymm2,5
+        vpand   ymm6,ymm0,ymm4
+        vpxor   ymm14,ymm14,YMMWORD[((448-256-128))+rbx]
+
+        vpaddd  ymm1,ymm1,ymm6
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm0,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vmovdqu YMMWORD[(160-128)+rax],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm14,31
+        vpand   ymm5,ymm5,ymm3
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpslld  ymm6,ymm3,30
+        vpaddd  ymm1,ymm1,ymm5
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((288-256-128))+rbx]
+
+        vpaddd  ymm0,ymm0,ymm15
+        vpslld  ymm7,ymm1,5
+        vpand   ymm6,ymm4,ymm3
+        vpxor   ymm10,ymm10,YMMWORD[((480-256-128))+rbx]
+
+        vpaddd  ymm0,ymm0,ymm6
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm4,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vmovdqu YMMWORD[(192-128)+rax],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm10,31
+        vpand   ymm5,ymm5,ymm2
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpslld  ymm6,ymm2,30
+        vpaddd  ymm0,ymm0,ymm5
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((320-256-128))+rbx]
+
+        vpaddd  ymm4,ymm4,ymm15
+        vpslld  ymm7,ymm0,5
+        vpand   ymm6,ymm3,ymm2
+        vpxor   ymm11,ymm11,YMMWORD[((0-128))+rax]
+
+        vpaddd  ymm4,ymm4,ymm6
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm3,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vmovdqu YMMWORD[(224-128)+rax],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm11,31
+        vpand   ymm5,ymm5,ymm1
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpslld  ymm6,ymm1,30
+        vpaddd  ymm4,ymm4,ymm5
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((352-256-128))+rbx]
+
+        vpaddd  ymm3,ymm3,ymm15
+        vpslld  ymm7,ymm4,5
+        vpand   ymm6,ymm2,ymm1
+        vpxor   ymm12,ymm12,YMMWORD[((32-128))+rax]
+
+        vpaddd  ymm3,ymm3,ymm6
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm2,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vmovdqu YMMWORD[(256-256-128)+rbx],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm12,31
+        vpand   ymm5,ymm5,ymm0
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpslld  ymm6,ymm0,30
+        vpaddd  ymm3,ymm3,ymm5
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((384-256-128))+rbx]
+
+        vpaddd  ymm2,ymm2,ymm15
+        vpslld  ymm7,ymm3,5
+        vpand   ymm6,ymm1,ymm0
+        vpxor   ymm13,ymm13,YMMWORD[((64-128))+rax]
+
+        vpaddd  ymm2,ymm2,ymm6
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm1,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vmovdqu YMMWORD[(288-256-128)+rbx],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm13,31
+        vpand   ymm5,ymm5,ymm4
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpslld  ymm6,ymm4,30
+        vpaddd  ymm2,ymm2,ymm5
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((416-256-128))+rbx]
+
+        vpaddd  ymm1,ymm1,ymm15
+        vpslld  ymm7,ymm2,5
+        vpand   ymm6,ymm0,ymm4
+        vpxor   ymm14,ymm14,YMMWORD[((96-128))+rax]
+
+        vpaddd  ymm1,ymm1,ymm6
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm0,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vmovdqu YMMWORD[(320-256-128)+rbx],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm14,31
+        vpand   ymm5,ymm5,ymm3
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpslld  ymm6,ymm3,30
+        vpaddd  ymm1,ymm1,ymm5
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((448-256-128))+rbx]
+
+        vpaddd  ymm0,ymm0,ymm15
+        vpslld  ymm7,ymm1,5
+        vpand   ymm6,ymm4,ymm3
+        vpxor   ymm10,ymm10,YMMWORD[((128-128))+rax]
+
+        vpaddd  ymm0,ymm0,ymm6
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm4,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vmovdqu YMMWORD[(352-256-128)+rbx],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpor    ymm7,ymm7,ymm8
+        vpsrld  ymm9,ymm10,31
+        vpand   ymm5,ymm5,ymm2
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpslld  ymm6,ymm2,30
+        vpaddd  ymm0,ymm0,ymm5
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vmovdqa ymm15,YMMWORD[64+rbp]
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((480-256-128))+rbx]
+
+        vpslld  ymm7,ymm0,5
+        vpaddd  ymm4,ymm4,ymm15
+        vpxor   ymm5,ymm3,ymm1
+        vmovdqa YMMWORD[(384-256-128)+rbx],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpxor   ymm11,ymm11,YMMWORD[((160-128))+rax]
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm4,ymm4,ymm5
+        vpsrld  ymm9,ymm11,31
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((0-128))+rax]
+
+        vpslld  ymm7,ymm4,5
+        vpaddd  ymm3,ymm3,ymm15
+        vpxor   ymm5,ymm2,ymm0
+        vmovdqa YMMWORD[(416-256-128)+rbx],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpxor   ymm12,ymm12,YMMWORD[((192-128))+rax]
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm3,ymm3,ymm5
+        vpsrld  ymm9,ymm12,31
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((32-128))+rax]
+
+        vpslld  ymm7,ymm3,5
+        vpaddd  ymm2,ymm2,ymm15
+        vpxor   ymm5,ymm1,ymm4
+        vmovdqa YMMWORD[(448-256-128)+rbx],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpxor   ymm13,ymm13,YMMWORD[((224-128))+rax]
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm2,ymm2,ymm5
+        vpsrld  ymm9,ymm13,31
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((64-128))+rax]
+
+        vpslld  ymm7,ymm2,5
+        vpaddd  ymm1,ymm1,ymm15
+        vpxor   ymm5,ymm0,ymm3
+        vmovdqa YMMWORD[(480-256-128)+rbx],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpxor   ymm14,ymm14,YMMWORD[((256-256-128))+rbx]
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm1,ymm1,ymm5
+        vpsrld  ymm9,ymm14,31
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((96-128))+rax]
+
+        vpslld  ymm7,ymm1,5
+        vpaddd  ymm0,ymm0,ymm15
+        vpxor   ymm5,ymm4,ymm2
+        vmovdqa YMMWORD[(0-128)+rax],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpxor   ymm10,ymm10,YMMWORD[((288-256-128))+rbx]
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm0,ymm0,ymm5
+        vpsrld  ymm9,ymm10,31
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((128-128))+rax]
+
+        vpslld  ymm7,ymm0,5
+        vpaddd  ymm4,ymm4,ymm15
+        vpxor   ymm5,ymm3,ymm1
+        vmovdqa YMMWORD[(32-128)+rax],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpxor   ymm11,ymm11,YMMWORD[((320-256-128))+rbx]
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm4,ymm4,ymm5
+        vpsrld  ymm9,ymm11,31
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((160-128))+rax]
+
+        vpslld  ymm7,ymm4,5
+        vpaddd  ymm3,ymm3,ymm15
+        vpxor   ymm5,ymm2,ymm0
+        vmovdqa YMMWORD[(64-128)+rax],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpxor   ymm12,ymm12,YMMWORD[((352-256-128))+rbx]
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm3,ymm3,ymm5
+        vpsrld  ymm9,ymm12,31
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((192-128))+rax]
+
+        vpslld  ymm7,ymm3,5
+        vpaddd  ymm2,ymm2,ymm15
+        vpxor   ymm5,ymm1,ymm4
+        vmovdqa YMMWORD[(96-128)+rax],ymm12
+        vpaddd  ymm2,ymm2,ymm12
+        vpxor   ymm13,ymm13,YMMWORD[((384-256-128))+rbx]
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm2,ymm2,ymm5
+        vpsrld  ymm9,ymm13,31
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((224-128))+rax]
+
+        vpslld  ymm7,ymm2,5
+        vpaddd  ymm1,ymm1,ymm15
+        vpxor   ymm5,ymm0,ymm3
+        vmovdqa YMMWORD[(128-128)+rax],ymm13
+        vpaddd  ymm1,ymm1,ymm13
+        vpxor   ymm14,ymm14,YMMWORD[((416-256-128))+rbx]
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm1,ymm1,ymm5
+        vpsrld  ymm9,ymm14,31
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((256-256-128))+rbx]
+
+        vpslld  ymm7,ymm1,5
+        vpaddd  ymm0,ymm0,ymm15
+        vpxor   ymm5,ymm4,ymm2
+        vmovdqa YMMWORD[(160-128)+rax],ymm14
+        vpaddd  ymm0,ymm0,ymm14
+        vpxor   ymm10,ymm10,YMMWORD[((448-256-128))+rbx]
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm0,ymm0,ymm5
+        vpsrld  ymm9,ymm10,31
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((288-256-128))+rbx]
+
+        vpslld  ymm7,ymm0,5
+        vpaddd  ymm4,ymm4,ymm15
+        vpxor   ymm5,ymm3,ymm1
+        vmovdqa YMMWORD[(192-128)+rax],ymm10
+        vpaddd  ymm4,ymm4,ymm10
+        vpxor   ymm11,ymm11,YMMWORD[((480-256-128))+rbx]
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm4,ymm4,ymm5
+        vpsrld  ymm9,ymm11,31
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((320-256-128))+rbx]
+
+        vpslld  ymm7,ymm4,5
+        vpaddd  ymm3,ymm3,ymm15
+        vpxor   ymm5,ymm2,ymm0
+        vmovdqa YMMWORD[(224-128)+rax],ymm11
+        vpaddd  ymm3,ymm3,ymm11
+        vpxor   ymm12,ymm12,YMMWORD[((0-128))+rax]
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm3,ymm3,ymm5
+        vpsrld  ymm9,ymm12,31
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((352-256-128))+rbx]
+
+        vpslld  ymm7,ymm3,5
+        vpaddd  ymm2,ymm2,ymm15
+        vpxor   ymm5,ymm1,ymm4
+        vpaddd  ymm2,ymm2,ymm12
+        vpxor   ymm13,ymm13,YMMWORD[((32-128))+rax]
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm2,ymm2,ymm5
+        vpsrld  ymm9,ymm13,31
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((384-256-128))+rbx]
+
+        vpslld  ymm7,ymm2,5
+        vpaddd  ymm1,ymm1,ymm15
+        vpxor   ymm5,ymm0,ymm3
+        vpaddd  ymm1,ymm1,ymm13
+        vpxor   ymm14,ymm14,YMMWORD[((64-128))+rax]
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm1,ymm1,ymm5
+        vpsrld  ymm9,ymm14,31
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpxor   ymm10,ymm10,ymm12
+        vmovdqa ymm12,YMMWORD[((416-256-128))+rbx]
+
+        vpslld  ymm7,ymm1,5
+        vpaddd  ymm0,ymm0,ymm15
+        vpxor   ymm5,ymm4,ymm2
+        vpaddd  ymm0,ymm0,ymm14
+        vpxor   ymm10,ymm10,YMMWORD[((96-128))+rax]
+        vpsrld  ymm8,ymm1,27
+        vpxor   ymm5,ymm5,ymm3
+        vpxor   ymm10,ymm10,ymm12
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm0,ymm0,ymm5
+        vpsrld  ymm9,ymm10,31
+        vpaddd  ymm10,ymm10,ymm10
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm10,ymm10,ymm9
+        vpor    ymm2,ymm2,ymm6
+        vpxor   ymm11,ymm11,ymm13
+        vmovdqa ymm13,YMMWORD[((448-256-128))+rbx]
+
+        vpslld  ymm7,ymm0,5
+        vpaddd  ymm4,ymm4,ymm15
+        vpxor   ymm5,ymm3,ymm1
+        vpaddd  ymm4,ymm4,ymm10
+        vpxor   ymm11,ymm11,YMMWORD[((128-128))+rax]
+        vpsrld  ymm8,ymm0,27
+        vpxor   ymm5,ymm5,ymm2
+        vpxor   ymm11,ymm11,ymm13
+
+        vpslld  ymm6,ymm1,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm4,ymm4,ymm5
+        vpsrld  ymm9,ymm11,31
+        vpaddd  ymm11,ymm11,ymm11
+
+        vpsrld  ymm1,ymm1,2
+        vpaddd  ymm4,ymm4,ymm7
+        vpor    ymm11,ymm11,ymm9
+        vpor    ymm1,ymm1,ymm6
+        vpxor   ymm12,ymm12,ymm14
+        vmovdqa ymm14,YMMWORD[((480-256-128))+rbx]
+
+        vpslld  ymm7,ymm4,5
+        vpaddd  ymm3,ymm3,ymm15
+        vpxor   ymm5,ymm2,ymm0
+        vpaddd  ymm3,ymm3,ymm11
+        vpxor   ymm12,ymm12,YMMWORD[((160-128))+rax]
+        vpsrld  ymm8,ymm4,27
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm12,ymm12,ymm14
+
+        vpslld  ymm6,ymm0,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm3,ymm3,ymm5
+        vpsrld  ymm9,ymm12,31
+        vpaddd  ymm12,ymm12,ymm12
+
+        vpsrld  ymm0,ymm0,2
+        vpaddd  ymm3,ymm3,ymm7
+        vpor    ymm12,ymm12,ymm9
+        vpor    ymm0,ymm0,ymm6
+        vpxor   ymm13,ymm13,ymm10
+        vmovdqa ymm10,YMMWORD[((0-128))+rax]
+
+        vpslld  ymm7,ymm3,5
+        vpaddd  ymm2,ymm2,ymm15
+        vpxor   ymm5,ymm1,ymm4
+        vpaddd  ymm2,ymm2,ymm12
+        vpxor   ymm13,ymm13,YMMWORD[((192-128))+rax]
+        vpsrld  ymm8,ymm3,27
+        vpxor   ymm5,ymm5,ymm0
+        vpxor   ymm13,ymm13,ymm10
+
+        vpslld  ymm6,ymm4,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm2,ymm2,ymm5
+        vpsrld  ymm9,ymm13,31
+        vpaddd  ymm13,ymm13,ymm13
+
+        vpsrld  ymm4,ymm4,2
+        vpaddd  ymm2,ymm2,ymm7
+        vpor    ymm13,ymm13,ymm9
+        vpor    ymm4,ymm4,ymm6
+        vpxor   ymm14,ymm14,ymm11
+        vmovdqa ymm11,YMMWORD[((32-128))+rax]
+
+        vpslld  ymm7,ymm2,5
+        vpaddd  ymm1,ymm1,ymm15
+        vpxor   ymm5,ymm0,ymm3
+        vpaddd  ymm1,ymm1,ymm13
+        vpxor   ymm14,ymm14,YMMWORD[((224-128))+rax]
+        vpsrld  ymm8,ymm2,27
+        vpxor   ymm5,ymm5,ymm4
+        vpxor   ymm14,ymm14,ymm11
+
+        vpslld  ymm6,ymm3,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm1,ymm1,ymm5
+        vpsrld  ymm9,ymm14,31
+        vpaddd  ymm14,ymm14,ymm14
+
+        vpsrld  ymm3,ymm3,2
+        vpaddd  ymm1,ymm1,ymm7
+        vpor    ymm14,ymm14,ymm9
+        vpor    ymm3,ymm3,ymm6
+        vpslld  ymm7,ymm1,5
+        vpaddd  ymm0,ymm0,ymm15
+        vpxor   ymm5,ymm4,ymm2
+
+        vpsrld  ymm8,ymm1,27
+        vpaddd  ymm0,ymm0,ymm14
+        vpxor   ymm5,ymm5,ymm3
+
+        vpslld  ymm6,ymm2,30
+        vpor    ymm7,ymm7,ymm8
+        vpaddd  ymm0,ymm0,ymm5
+
+        vpsrld  ymm2,ymm2,2
+        vpaddd  ymm0,ymm0,ymm7
+        vpor    ymm2,ymm2,ymm6
+        mov     ecx,1
+        lea     rbx,[512+rsp]
+        cmp     ecx,DWORD[rbx]
+        cmovge  r12,rbp
+        cmp     ecx,DWORD[4+rbx]
+        cmovge  r13,rbp
+        cmp     ecx,DWORD[8+rbx]
+        cmovge  r14,rbp
+        cmp     ecx,DWORD[12+rbx]
+        cmovge  r15,rbp
+        cmp     ecx,DWORD[16+rbx]
+        cmovge  r8,rbp
+        cmp     ecx,DWORD[20+rbx]
+        cmovge  r9,rbp
+        cmp     ecx,DWORD[24+rbx]
+        cmovge  r10,rbp
+        cmp     ecx,DWORD[28+rbx]
+        cmovge  r11,rbp
+        vmovdqu ymm5,YMMWORD[rbx]
+        vpxor   ymm7,ymm7,ymm7
+        vmovdqa ymm6,ymm5
+        vpcmpgtd        ymm6,ymm6,ymm7
+        vpaddd  ymm5,ymm5,ymm6
+
+        vpand   ymm0,ymm0,ymm6
+        vpand   ymm1,ymm1,ymm6
+        vpaddd  ymm0,ymm0,YMMWORD[rdi]
+        vpand   ymm2,ymm2,ymm6
+        vpaddd  ymm1,ymm1,YMMWORD[32+rdi]
+        vpand   ymm3,ymm3,ymm6
+        vpaddd  ymm2,ymm2,YMMWORD[64+rdi]
+        vpand   ymm4,ymm4,ymm6
+        vpaddd  ymm3,ymm3,YMMWORD[96+rdi]
+        vpaddd  ymm4,ymm4,YMMWORD[128+rdi]
+        vmovdqu YMMWORD[rdi],ymm0
+        vmovdqu YMMWORD[32+rdi],ymm1
+        vmovdqu YMMWORD[64+rdi],ymm2
+        vmovdqu YMMWORD[96+rdi],ymm3
+        vmovdqu YMMWORD[128+rdi],ymm4
+
+        vmovdqu YMMWORD[rbx],ymm5
+        lea     rbx,[((256+128))+rsp]
+        vmovdqu ymm9,YMMWORD[96+rbp]
+        dec     edx
+        jnz     NEAR $L$oop_avx2
+
+
+
+
+
+
+
+$L$done_avx2:
+        mov     rax,QWORD[544+rsp]
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+        movaps  xmm13,XMMWORD[((-104))+rax]
+        movaps  xmm14,XMMWORD[((-88))+rax]
+        movaps  xmm15,XMMWORD[((-72))+rax]
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$epilogue_avx2:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha1_multi_block_avx2:
+
+ALIGN   256
+        DD      0x5a827999,0x5a827999,0x5a827999,0x5a827999
+        DD      0x5a827999,0x5a827999,0x5a827999,0x5a827999
+K_XX_XX:
+        DD      0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+        DD      0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+        DD      0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+        DD      0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+        DD      0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+        DD      0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+DB      0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0
+DB      83,72,65,49,32,109,117,108,116,105,45,98,108,111,99,107
+DB      32,116,114,97,110,115,102,111,114,109,32,102,111,114,32,120
+DB      56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77
+DB      83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110
+DB      115,115,108,46,111,114,103,62,0
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        mov     rax,QWORD[272+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+
+        lea     rsi,[((-24-160))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   16
+avx2_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        mov     rax,QWORD[544+r8]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+        lea     rsi,[((-56-160))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+        jmp     NEAR $L$in_prologue
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_sha1_multi_block wrt ..imagebase
+        DD      $L$SEH_end_sha1_multi_block wrt ..imagebase
+        DD      $L$SEH_info_sha1_multi_block wrt ..imagebase
+        DD      $L$SEH_begin_sha1_multi_block_shaext wrt ..imagebase
+        DD      $L$SEH_end_sha1_multi_block_shaext wrt ..imagebase
+        DD      $L$SEH_info_sha1_multi_block_shaext wrt ..imagebase
+        DD      $L$SEH_begin_sha1_multi_block_avx wrt ..imagebase
+        DD      $L$SEH_end_sha1_multi_block_avx wrt ..imagebase
+        DD      $L$SEH_info_sha1_multi_block_avx wrt ..imagebase
+        DD      $L$SEH_begin_sha1_multi_block_avx2 wrt ..imagebase
+        DD      $L$SEH_end_sha1_multi_block_avx2 wrt ..imagebase
+        DD      $L$SEH_info_sha1_multi_block_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_sha1_multi_block:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$body wrt ..imagebase,$L$epilogue wrt ..imagebase
+$L$SEH_info_sha1_multi_block_shaext:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$body_shaext wrt ..imagebase,$L$epilogue_shaext wrt ..imagebase
+$L$SEH_info_sha1_multi_block_avx:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$body_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha1_multi_block_avx2:
+DB      9,0,0,0
+        DD      avx2_handler wrt ..imagebase
+        DD      $L$body_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm
new file mode 100644
index 0000000000..3a7655b27f
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm
@@ -0,0 +1,5773 @@
+; Copyright 2006-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  sha1_block_data_order
+
+ALIGN   16
+sha1_block_data_order:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha1_block_data_order:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        mov     r9d,DWORD[((OPENSSL_ia32cap_P+0))]
+        mov     r8d,DWORD[((OPENSSL_ia32cap_P+4))]
+        mov     r10d,DWORD[((OPENSSL_ia32cap_P+8))]
+        test    r8d,512
+        jz      NEAR $L$ialu
+        test    r10d,536870912
+        jnz     NEAR _shaext_shortcut
+        and     r10d,296
+        cmp     r10d,296
+        je      NEAR _avx2_shortcut
+        and     r8d,268435456
+        and     r9d,1073741824
+        or      r8d,r9d
+        cmp     r8d,1342177280
+        je      NEAR _avx_shortcut
+        jmp     NEAR _ssse3_shortcut
+
+ALIGN   16
+$L$ialu:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        mov     r8,rdi
+        sub     rsp,72
+        mov     r9,rsi
+        and     rsp,-64
+        mov     r10,rdx
+        mov     QWORD[64+rsp],rax
+
+$L$prologue:
+
+        mov     esi,DWORD[r8]
+        mov     edi,DWORD[4+r8]
+        mov     r11d,DWORD[8+r8]
+        mov     r12d,DWORD[12+r8]
+        mov     r13d,DWORD[16+r8]
+        jmp     NEAR $L$loop
+
+ALIGN   16
+$L$loop:
+        mov     edx,DWORD[r9]
+        bswap   edx
+        mov     ebp,DWORD[4+r9]
+        mov     eax,r12d
+        mov     DWORD[rsp],edx
+        mov     ecx,esi
+        bswap   ebp
+        xor     eax,r11d
+        rol     ecx,5
+        and     eax,edi
+        lea     r13d,[1518500249+r13*1+rdx]
+        add     r13d,ecx
+        xor     eax,r12d
+        rol     edi,30
+        add     r13d,eax
+        mov     r14d,DWORD[8+r9]
+        mov     eax,r11d
+        mov     DWORD[4+rsp],ebp
+        mov     ecx,r13d
+        bswap   r14d
+        xor     eax,edi
+        rol     ecx,5
+        and     eax,esi
+        lea     r12d,[1518500249+r12*1+rbp]
+        add     r12d,ecx
+        xor     eax,r11d
+        rol     esi,30
+        add     r12d,eax
+        mov     edx,DWORD[12+r9]
+        mov     eax,edi
+        mov     DWORD[8+rsp],r14d
+        mov     ecx,r12d
+        bswap   edx
+        xor     eax,esi
+        rol     ecx,5
+        and     eax,r13d
+        lea     r11d,[1518500249+r11*1+r14]
+        add     r11d,ecx
+        xor     eax,edi
+        rol     r13d,30
+        add     r11d,eax
+        mov     ebp,DWORD[16+r9]
+        mov     eax,esi
+        mov     DWORD[12+rsp],edx
+        mov     ecx,r11d
+        bswap   ebp
+        xor     eax,r13d
+        rol     ecx,5
+        and     eax,r12d
+        lea     edi,[1518500249+rdi*1+rdx]
+        add     edi,ecx
+        xor     eax,esi
+        rol     r12d,30
+        add     edi,eax
+        mov     r14d,DWORD[20+r9]
+        mov     eax,r13d
+        mov     DWORD[16+rsp],ebp
+        mov     ecx,edi
+        bswap   r14d
+        xor     eax,r12d
+        rol     ecx,5
+        and     eax,r11d
+        lea     esi,[1518500249+rsi*1+rbp]
+        add     esi,ecx
+        xor     eax,r13d
+        rol     r11d,30
+        add     esi,eax
+        mov     edx,DWORD[24+r9]
+        mov     eax,r12d
+        mov     DWORD[20+rsp],r14d
+        mov     ecx,esi
+        bswap   edx
+        xor     eax,r11d
+        rol     ecx,5
+        and     eax,edi
+        lea     r13d,[1518500249+r13*1+r14]
+        add     r13d,ecx
+        xor     eax,r12d
+        rol     edi,30
+        add     r13d,eax
+        mov     ebp,DWORD[28+r9]
+        mov     eax,r11d
+        mov     DWORD[24+rsp],edx
+        mov     ecx,r13d
+        bswap   ebp
+        xor     eax,edi
+        rol     ecx,5
+        and     eax,esi
+        lea     r12d,[1518500249+r12*1+rdx]
+        add     r12d,ecx
+        xor     eax,r11d
+        rol     esi,30
+        add     r12d,eax
+        mov     r14d,DWORD[32+r9]
+        mov     eax,edi
+        mov     DWORD[28+rsp],ebp
+        mov     ecx,r12d
+        bswap   r14d
+        xor     eax,esi
+        rol     ecx,5
+        and     eax,r13d
+        lea     r11d,[1518500249+r11*1+rbp]
+        add     r11d,ecx
+        xor     eax,edi
+        rol     r13d,30
+        add     r11d,eax
+        mov     edx,DWORD[36+r9]
+        mov     eax,esi
+        mov     DWORD[32+rsp],r14d
+        mov     ecx,r11d
+        bswap   edx
+        xor     eax,r13d
+        rol     ecx,5
+        and     eax,r12d
+        lea     edi,[1518500249+rdi*1+r14]
+        add     edi,ecx
+        xor     eax,esi
+        rol     r12d,30
+        add     edi,eax
+        mov     ebp,DWORD[40+r9]
+        mov     eax,r13d
+        mov     DWORD[36+rsp],edx
+        mov     ecx,edi
+        bswap   ebp
+        xor     eax,r12d
+        rol     ecx,5
+        and     eax,r11d
+        lea     esi,[1518500249+rsi*1+rdx]
+        add     esi,ecx
+        xor     eax,r13d
+        rol     r11d,30
+        add     esi,eax
+        mov     r14d,DWORD[44+r9]
+        mov     eax,r12d
+        mov     DWORD[40+rsp],ebp
+        mov     ecx,esi
+        bswap   r14d
+        xor     eax,r11d
+        rol     ecx,5
+        and     eax,edi
+        lea     r13d,[1518500249+r13*1+rbp]
+        add     r13d,ecx
+        xor     eax,r12d
+        rol     edi,30
+        add     r13d,eax
+        mov     edx,DWORD[48+r9]
+        mov     eax,r11d
+        mov     DWORD[44+rsp],r14d
+        mov     ecx,r13d
+        bswap   edx
+        xor     eax,edi
+        rol     ecx,5
+        and     eax,esi
+        lea     r12d,[1518500249+r12*1+r14]
+        add     r12d,ecx
+        xor     eax,r11d
+        rol     esi,30
+        add     r12d,eax
+        mov     ebp,DWORD[52+r9]
+        mov     eax,edi
+        mov     DWORD[48+rsp],edx
+        mov     ecx,r12d
+        bswap   ebp
+        xor     eax,esi
+        rol     ecx,5
+        and     eax,r13d
+        lea     r11d,[1518500249+r11*1+rdx]
+        add     r11d,ecx
+        xor     eax,edi
+        rol     r13d,30
+        add     r11d,eax
+        mov     r14d,DWORD[56+r9]
+        mov     eax,esi
+        mov     DWORD[52+rsp],ebp
+        mov     ecx,r11d
+        bswap   r14d
+        xor     eax,r13d
+        rol     ecx,5
+        and     eax,r12d
+        lea     edi,[1518500249+rdi*1+rbp]
+        add     edi,ecx
+        xor     eax,esi
+        rol     r12d,30
+        add     edi,eax
+        mov     edx,DWORD[60+r9]
+        mov     eax,r13d
+        mov     DWORD[56+rsp],r14d
+        mov     ecx,edi
+        bswap   edx
+        xor     eax,r12d
+        rol     ecx,5
+        and     eax,r11d
+        lea     esi,[1518500249+rsi*1+r14]
+        add     esi,ecx
+        xor     eax,r13d
+        rol     r11d,30
+        add     esi,eax
+        xor     ebp,DWORD[rsp]
+        mov     eax,r12d
+        mov     DWORD[60+rsp],edx
+        mov     ecx,esi
+        xor     ebp,DWORD[8+rsp]
+        xor     eax,r11d
+        rol     ecx,5
+        xor     ebp,DWORD[32+rsp]
+        and     eax,edi
+        lea     r13d,[1518500249+r13*1+rdx]
+        rol     edi,30
+        xor     eax,r12d
+        add     r13d,ecx
+        rol     ebp,1
+        add     r13d,eax
+        xor     r14d,DWORD[4+rsp]
+        mov     eax,r11d
+        mov     DWORD[rsp],ebp
+        mov     ecx,r13d
+        xor     r14d,DWORD[12+rsp]
+        xor     eax,edi
+        rol     ecx,5
+        xor     r14d,DWORD[36+rsp]
+        and     eax,esi
+        lea     r12d,[1518500249+r12*1+rbp]
+        rol     esi,30
+        xor     eax,r11d
+        add     r12d,ecx
+        rol     r14d,1
+        add     r12d,eax
+        xor     edx,DWORD[8+rsp]
+        mov     eax,edi
+        mov     DWORD[4+rsp],r14d
+        mov     ecx,r12d
+        xor     edx,DWORD[16+rsp]
+        xor     eax,esi
+        rol     ecx,5
+        xor     edx,DWORD[40+rsp]
+        and     eax,r13d
+        lea     r11d,[1518500249+r11*1+r14]
+        rol     r13d,30
+        xor     eax,edi
+        add     r11d,ecx
+        rol     edx,1
+        add     r11d,eax
+        xor     ebp,DWORD[12+rsp]
+        mov     eax,esi
+        mov     DWORD[8+rsp],edx
+        mov     ecx,r11d
+        xor     ebp,DWORD[20+rsp]
+        xor     eax,r13d
+        rol     ecx,5
+        xor     ebp,DWORD[44+rsp]
+        and     eax,r12d
+        lea     edi,[1518500249+rdi*1+rdx]
+        rol     r12d,30
+        xor     eax,esi
+        add     edi,ecx
+        rol     ebp,1
+        add     edi,eax
+        xor     r14d,DWORD[16+rsp]
+        mov     eax,r13d
+        mov     DWORD[12+rsp],ebp
+        mov     ecx,edi
+        xor     r14d,DWORD[24+rsp]
+        xor     eax,r12d
+        rol     ecx,5
+        xor     r14d,DWORD[48+rsp]
+        and     eax,r11d
+        lea     esi,[1518500249+rsi*1+rbp]
+        rol     r11d,30
+        xor     eax,r13d
+        add     esi,ecx
+        rol     r14d,1
+        add     esi,eax
+        xor     edx,DWORD[20+rsp]
+        mov     eax,edi
+        mov     DWORD[16+rsp],r14d
+        mov     ecx,esi
+        xor     edx,DWORD[28+rsp]
+        xor     eax,r12d
+        rol     ecx,5
+        xor     edx,DWORD[52+rsp]
+        lea     r13d,[1859775393+r13*1+r14]
+        xor     eax,r11d
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,eax
+        rol     edx,1
+        xor     ebp,DWORD[24+rsp]
+        mov     eax,esi
+        mov     DWORD[20+rsp],edx
+        mov     ecx,r13d
+        xor     ebp,DWORD[32+rsp]
+        xor     eax,r11d
+        rol     ecx,5
+        xor     ebp,DWORD[56+rsp]
+        lea     r12d,[1859775393+r12*1+rdx]
+        xor     eax,edi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,eax
+        rol     ebp,1
+        xor     r14d,DWORD[28+rsp]
+        mov     eax,r13d
+        mov     DWORD[24+rsp],ebp
+        mov     ecx,r12d
+        xor     r14d,DWORD[36+rsp]
+        xor     eax,edi
+        rol     ecx,5
+        xor     r14d,DWORD[60+rsp]
+        lea     r11d,[1859775393+r11*1+rbp]
+        xor     eax,esi
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,eax
+        rol     r14d,1
+        xor     edx,DWORD[32+rsp]
+        mov     eax,r12d
+        mov     DWORD[28+rsp],r14d
+        mov     ecx,r11d
+        xor     edx,DWORD[40+rsp]
+        xor     eax,esi
+        rol     ecx,5
+        xor     edx,DWORD[rsp]
+        lea     edi,[1859775393+rdi*1+r14]
+        xor     eax,r13d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,eax
+        rol     edx,1
+        xor     ebp,DWORD[36+rsp]
+        mov     eax,r11d
+        mov     DWORD[32+rsp],edx
+        mov     ecx,edi
+        xor     ebp,DWORD[44+rsp]
+        xor     eax,r13d
+        rol     ecx,5
+        xor     ebp,DWORD[4+rsp]
+        lea     esi,[1859775393+rsi*1+rdx]
+        xor     eax,r12d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,eax
+        rol     ebp,1
+        xor     r14d,DWORD[40+rsp]
+        mov     eax,edi
+        mov     DWORD[36+rsp],ebp
+        mov     ecx,esi
+        xor     r14d,DWORD[48+rsp]
+        xor     eax,r12d
+        rol     ecx,5
+        xor     r14d,DWORD[8+rsp]
+        lea     r13d,[1859775393+r13*1+rbp]
+        xor     eax,r11d
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,eax
+        rol     r14d,1
+        xor     edx,DWORD[44+rsp]
+        mov     eax,esi
+        mov     DWORD[40+rsp],r14d
+        mov     ecx,r13d
+        xor     edx,DWORD[52+rsp]
+        xor     eax,r11d
+        rol     ecx,5
+        xor     edx,DWORD[12+rsp]
+        lea     r12d,[1859775393+r12*1+r14]
+        xor     eax,edi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,eax
+        rol     edx,1
+        xor     ebp,DWORD[48+rsp]
+        mov     eax,r13d
+        mov     DWORD[44+rsp],edx
+        mov     ecx,r12d
+        xor     ebp,DWORD[56+rsp]
+        xor     eax,edi
+        rol     ecx,5
+        xor     ebp,DWORD[16+rsp]
+        lea     r11d,[1859775393+r11*1+rdx]
+        xor     eax,esi
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,eax
+        rol     ebp,1
+        xor     r14d,DWORD[52+rsp]
+        mov     eax,r12d
+        mov     DWORD[48+rsp],ebp
+        mov     ecx,r11d
+        xor     r14d,DWORD[60+rsp]
+        xor     eax,esi
+        rol     ecx,5
+        xor     r14d,DWORD[20+rsp]
+        lea     edi,[1859775393+rdi*1+rbp]
+        xor     eax,r13d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,eax
+        rol     r14d,1
+        xor     edx,DWORD[56+rsp]
+        mov     eax,r11d
+        mov     DWORD[52+rsp],r14d
+        mov     ecx,edi
+        xor     edx,DWORD[rsp]
+        xor     eax,r13d
+        rol     ecx,5
+        xor     edx,DWORD[24+rsp]
+        lea     esi,[1859775393+rsi*1+r14]
+        xor     eax,r12d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,eax
+        rol     edx,1
+        xor     ebp,DWORD[60+rsp]
+        mov     eax,edi
+        mov     DWORD[56+rsp],edx
+        mov     ecx,esi
+        xor     ebp,DWORD[4+rsp]
+        xor     eax,r12d
+        rol     ecx,5
+        xor     ebp,DWORD[28+rsp]
+        lea     r13d,[1859775393+r13*1+rdx]
+        xor     eax,r11d
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,eax
+        rol     ebp,1
+        xor     r14d,DWORD[rsp]
+        mov     eax,esi
+        mov     DWORD[60+rsp],ebp
+        mov     ecx,r13d
+        xor     r14d,DWORD[8+rsp]
+        xor     eax,r11d
+        rol     ecx,5
+        xor     r14d,DWORD[32+rsp]
+        lea     r12d,[1859775393+r12*1+rbp]
+        xor     eax,edi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,eax
+        rol     r14d,1
+        xor     edx,DWORD[4+rsp]
+        mov     eax,r13d
+        mov     DWORD[rsp],r14d
+        mov     ecx,r12d
+        xor     edx,DWORD[12+rsp]
+        xor     eax,edi
+        rol     ecx,5
+        xor     edx,DWORD[36+rsp]
+        lea     r11d,[1859775393+r11*1+r14]
+        xor     eax,esi
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,eax
+        rol     edx,1
+        xor     ebp,DWORD[8+rsp]
+        mov     eax,r12d
+        mov     DWORD[4+rsp],edx
+        mov     ecx,r11d
+        xor     ebp,DWORD[16+rsp]
+        xor     eax,esi
+        rol     ecx,5
+        xor     ebp,DWORD[40+rsp]
+        lea     edi,[1859775393+rdi*1+rdx]
+        xor     eax,r13d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,eax
+        rol     ebp,1
+        xor     r14d,DWORD[12+rsp]
+        mov     eax,r11d
+        mov     DWORD[8+rsp],ebp
+        mov     ecx,edi
+        xor     r14d,DWORD[20+rsp]
+        xor     eax,r13d
+        rol     ecx,5
+        xor     r14d,DWORD[44+rsp]
+        lea     esi,[1859775393+rsi*1+rbp]
+        xor     eax,r12d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,eax
+        rol     r14d,1
+        xor     edx,DWORD[16+rsp]
+        mov     eax,edi
+        mov     DWORD[12+rsp],r14d
+        mov     ecx,esi
+        xor     edx,DWORD[24+rsp]
+        xor     eax,r12d
+        rol     ecx,5
+        xor     edx,DWORD[48+rsp]
+        lea     r13d,[1859775393+r13*1+r14]
+        xor     eax,r11d
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,eax
+        rol     edx,1
+        xor     ebp,DWORD[20+rsp]
+        mov     eax,esi
+        mov     DWORD[16+rsp],edx
+        mov     ecx,r13d
+        xor     ebp,DWORD[28+rsp]
+        xor     eax,r11d
+        rol     ecx,5
+        xor     ebp,DWORD[52+rsp]
+        lea     r12d,[1859775393+r12*1+rdx]
+        xor     eax,edi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,eax
+        rol     ebp,1
+        xor     r14d,DWORD[24+rsp]
+        mov     eax,r13d
+        mov     DWORD[20+rsp],ebp
+        mov     ecx,r12d
+        xor     r14d,DWORD[32+rsp]
+        xor     eax,edi
+        rol     ecx,5
+        xor     r14d,DWORD[56+rsp]
+        lea     r11d,[1859775393+r11*1+rbp]
+        xor     eax,esi
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,eax
+        rol     r14d,1
+        xor     edx,DWORD[28+rsp]
+        mov     eax,r12d
+        mov     DWORD[24+rsp],r14d
+        mov     ecx,r11d
+        xor     edx,DWORD[36+rsp]
+        xor     eax,esi
+        rol     ecx,5
+        xor     edx,DWORD[60+rsp]
+        lea     edi,[1859775393+rdi*1+r14]
+        xor     eax,r13d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,eax
+        rol     edx,1
+        xor     ebp,DWORD[32+rsp]
+        mov     eax,r11d
+        mov     DWORD[28+rsp],edx
+        mov     ecx,edi
+        xor     ebp,DWORD[40+rsp]
+        xor     eax,r13d
+        rol     ecx,5
+        xor     ebp,DWORD[rsp]
+        lea     esi,[1859775393+rsi*1+rdx]
+        xor     eax,r12d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,eax
+        rol     ebp,1
+        xor     r14d,DWORD[36+rsp]
+        mov     eax,r12d
+        mov     DWORD[32+rsp],ebp
+        mov     ebx,r12d
+        xor     r14d,DWORD[44+rsp]
+        and     eax,r11d
+        mov     ecx,esi
+        xor     r14d,DWORD[4+rsp]
+        lea     r13d,[((-1894007588))+r13*1+rbp]
+        xor     ebx,r11d
+        rol     ecx,5
+        add     r13d,eax
+        rol     r14d,1
+        and     ebx,edi
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,ebx
+        xor     edx,DWORD[40+rsp]
+        mov     eax,r11d
+        mov     DWORD[36+rsp],r14d
+        mov     ebx,r11d
+        xor     edx,DWORD[48+rsp]
+        and     eax,edi
+        mov     ecx,r13d
+        xor     edx,DWORD[8+rsp]
+        lea     r12d,[((-1894007588))+r12*1+r14]
+        xor     ebx,edi
+        rol     ecx,5
+        add     r12d,eax
+        rol     edx,1
+        and     ebx,esi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,ebx
+        xor     ebp,DWORD[44+rsp]
+        mov     eax,edi
+        mov     DWORD[40+rsp],edx
+        mov     ebx,edi
+        xor     ebp,DWORD[52+rsp]
+        and     eax,esi
+        mov     ecx,r12d
+        xor     ebp,DWORD[12+rsp]
+        lea     r11d,[((-1894007588))+r11*1+rdx]
+        xor     ebx,esi
+        rol     ecx,5
+        add     r11d,eax
+        rol     ebp,1
+        and     ebx,r13d
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,ebx
+        xor     r14d,DWORD[48+rsp]
+        mov     eax,esi
+        mov     DWORD[44+rsp],ebp
+        mov     ebx,esi
+        xor     r14d,DWORD[56+rsp]
+        and     eax,r13d
+        mov     ecx,r11d
+        xor     r14d,DWORD[16+rsp]
+        lea     edi,[((-1894007588))+rdi*1+rbp]
+        xor     ebx,r13d
+        rol     ecx,5
+        add     edi,eax
+        rol     r14d,1
+        and     ebx,r12d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,ebx
+        xor     edx,DWORD[52+rsp]
+        mov     eax,r13d
+        mov     DWORD[48+rsp],r14d
+        mov     ebx,r13d
+        xor     edx,DWORD[60+rsp]
+        and     eax,r12d
+        mov     ecx,edi
+        xor     edx,DWORD[20+rsp]
+        lea     esi,[((-1894007588))+rsi*1+r14]
+        xor     ebx,r12d
+        rol     ecx,5
+        add     esi,eax
+        rol     edx,1
+        and     ebx,r11d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,ebx
+        xor     ebp,DWORD[56+rsp]
+        mov     eax,r12d
+        mov     DWORD[52+rsp],edx
+        mov     ebx,r12d
+        xor     ebp,DWORD[rsp]
+        and     eax,r11d
+        mov     ecx,esi
+        xor     ebp,DWORD[24+rsp]
+        lea     r13d,[((-1894007588))+r13*1+rdx]
+        xor     ebx,r11d
+        rol     ecx,5
+        add     r13d,eax
+        rol     ebp,1
+        and     ebx,edi
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,ebx
+        xor     r14d,DWORD[60+rsp]
+        mov     eax,r11d
+        mov     DWORD[56+rsp],ebp
+        mov     ebx,r11d
+        xor     r14d,DWORD[4+rsp]
+        and     eax,edi
+        mov     ecx,r13d
+        xor     r14d,DWORD[28+rsp]
+        lea     r12d,[((-1894007588))+r12*1+rbp]
+        xor     ebx,edi
+        rol     ecx,5
+        add     r12d,eax
+        rol     r14d,1
+        and     ebx,esi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,ebx
+        xor     edx,DWORD[rsp]
+        mov     eax,edi
+        mov     DWORD[60+rsp],r14d
+        mov     ebx,edi
+        xor     edx,DWORD[8+rsp]
+        and     eax,esi
+        mov     ecx,r12d
+        xor     edx,DWORD[32+rsp]
+        lea     r11d,[((-1894007588))+r11*1+r14]
+        xor     ebx,esi
+        rol     ecx,5
+        add     r11d,eax
+        rol     edx,1
+        and     ebx,r13d
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,ebx
+        xor     ebp,DWORD[4+rsp]
+        mov     eax,esi
+        mov     DWORD[rsp],edx
+        mov     ebx,esi
+        xor     ebp,DWORD[12+rsp]
+        and     eax,r13d
+        mov     ecx,r11d
+        xor     ebp,DWORD[36+rsp]
+        lea     edi,[((-1894007588))+rdi*1+rdx]
+        xor     ebx,r13d
+        rol     ecx,5
+        add     edi,eax
+        rol     ebp,1
+        and     ebx,r12d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,ebx
+        xor     r14d,DWORD[8+rsp]
+        mov     eax,r13d
+        mov     DWORD[4+rsp],ebp
+        mov     ebx,r13d
+        xor     r14d,DWORD[16+rsp]
+        and     eax,r12d
+        mov     ecx,edi
+        xor     r14d,DWORD[40+rsp]
+        lea     esi,[((-1894007588))+rsi*1+rbp]
+        xor     ebx,r12d
+        rol     ecx,5
+        add     esi,eax
+        rol     r14d,1
+        and     ebx,r11d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,ebx
+        xor     edx,DWORD[12+rsp]
+        mov     eax,r12d
+        mov     DWORD[8+rsp],r14d
+        mov     ebx,r12d
+        xor     edx,DWORD[20+rsp]
+        and     eax,r11d
+        mov     ecx,esi
+        xor     edx,DWORD[44+rsp]
+        lea     r13d,[((-1894007588))+r13*1+r14]
+        xor     ebx,r11d
+        rol     ecx,5
+        add     r13d,eax
+        rol     edx,1
+        and     ebx,edi
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,ebx
+        xor     ebp,DWORD[16+rsp]
+        mov     eax,r11d
+        mov     DWORD[12+rsp],edx
+        mov     ebx,r11d
+        xor     ebp,DWORD[24+rsp]
+        and     eax,edi
+        mov     ecx,r13d
+        xor     ebp,DWORD[48+rsp]
+        lea     r12d,[((-1894007588))+r12*1+rdx]
+        xor     ebx,edi
+        rol     ecx,5
+        add     r12d,eax
+        rol     ebp,1
+        and     ebx,esi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,ebx
+        xor     r14d,DWORD[20+rsp]
+        mov     eax,edi
+        mov     DWORD[16+rsp],ebp
+        mov     ebx,edi
+        xor     r14d,DWORD[28+rsp]
+        and     eax,esi
+        mov     ecx,r12d
+        xor     r14d,DWORD[52+rsp]
+        lea     r11d,[((-1894007588))+r11*1+rbp]
+        xor     ebx,esi
+        rol     ecx,5
+        add     r11d,eax
+        rol     r14d,1
+        and     ebx,r13d
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,ebx
+        xor     edx,DWORD[24+rsp]
+        mov     eax,esi
+        mov     DWORD[20+rsp],r14d
+        mov     ebx,esi
+        xor     edx,DWORD[32+rsp]
+        and     eax,r13d
+        mov     ecx,r11d
+        xor     edx,DWORD[56+rsp]
+        lea     edi,[((-1894007588))+rdi*1+r14]
+        xor     ebx,r13d
+        rol     ecx,5
+        add     edi,eax
+        rol     edx,1
+        and     ebx,r12d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,ebx
+        xor     ebp,DWORD[28+rsp]
+        mov     eax,r13d
+        mov     DWORD[24+rsp],edx
+        mov     ebx,r13d
+        xor     ebp,DWORD[36+rsp]
+        and     eax,r12d
+        mov     ecx,edi
+        xor     ebp,DWORD[60+rsp]
+        lea     esi,[((-1894007588))+rsi*1+rdx]
+        xor     ebx,r12d
+        rol     ecx,5
+        add     esi,eax
+        rol     ebp,1
+        and     ebx,r11d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,ebx
+        xor     r14d,DWORD[32+rsp]
+        mov     eax,r12d
+        mov     DWORD[28+rsp],ebp
+        mov     ebx,r12d
+        xor     r14d,DWORD[40+rsp]
+        and     eax,r11d
+        mov     ecx,esi
+        xor     r14d,DWORD[rsp]
+        lea     r13d,[((-1894007588))+r13*1+rbp]
+        xor     ebx,r11d
+        rol     ecx,5
+        add     r13d,eax
+        rol     r14d,1
+        and     ebx,edi
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,ebx
+        xor     edx,DWORD[36+rsp]
+        mov     eax,r11d
+        mov     DWORD[32+rsp],r14d
+        mov     ebx,r11d
+        xor     edx,DWORD[44+rsp]
+        and     eax,edi
+        mov     ecx,r13d
+        xor     edx,DWORD[4+rsp]
+        lea     r12d,[((-1894007588))+r12*1+r14]
+        xor     ebx,edi
+        rol     ecx,5
+        add     r12d,eax
+        rol     edx,1
+        and     ebx,esi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,ebx
+        xor     ebp,DWORD[40+rsp]
+        mov     eax,edi
+        mov     DWORD[36+rsp],edx
+        mov     ebx,edi
+        xor     ebp,DWORD[48+rsp]
+        and     eax,esi
+        mov     ecx,r12d
+        xor     ebp,DWORD[8+rsp]
+        lea     r11d,[((-1894007588))+r11*1+rdx]
+        xor     ebx,esi
+        rol     ecx,5
+        add     r11d,eax
+        rol     ebp,1
+        and     ebx,r13d
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,ebx
+        xor     r14d,DWORD[44+rsp]
+        mov     eax,esi
+        mov     DWORD[40+rsp],ebp
+        mov     ebx,esi
+        xor     r14d,DWORD[52+rsp]
+        and     eax,r13d
+        mov     ecx,r11d
+        xor     r14d,DWORD[12+rsp]
+        lea     edi,[((-1894007588))+rdi*1+rbp]
+        xor     ebx,r13d
+        rol     ecx,5
+        add     edi,eax
+        rol     r14d,1
+        and     ebx,r12d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,ebx
+        xor     edx,DWORD[48+rsp]
+        mov     eax,r13d
+        mov     DWORD[44+rsp],r14d
+        mov     ebx,r13d
+        xor     edx,DWORD[56+rsp]
+        and     eax,r12d
+        mov     ecx,edi
+        xor     edx,DWORD[16+rsp]
+        lea     esi,[((-1894007588))+rsi*1+r14]
+        xor     ebx,r12d
+        rol     ecx,5
+        add     esi,eax
+        rol     edx,1
+        and     ebx,r11d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,ebx
+        xor     ebp,DWORD[52+rsp]
+        mov     eax,edi
+        mov     DWORD[48+rsp],edx
+        mov     ecx,esi
+        xor     ebp,DWORD[60+rsp]
+        xor     eax,r12d
+        rol     ecx,5
+        xor     ebp,DWORD[20+rsp]
+        lea     r13d,[((-899497514))+r13*1+rdx]
+        xor     eax,r11d
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,eax
+        rol     ebp,1
+        xor     r14d,DWORD[56+rsp]
+        mov     eax,esi
+        mov     DWORD[52+rsp],ebp
+        mov     ecx,r13d
+        xor     r14d,DWORD[rsp]
+        xor     eax,r11d
+        rol     ecx,5
+        xor     r14d,DWORD[24+rsp]
+        lea     r12d,[((-899497514))+r12*1+rbp]
+        xor     eax,edi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,eax
+        rol     r14d,1
+        xor     edx,DWORD[60+rsp]
+        mov     eax,r13d
+        mov     DWORD[56+rsp],r14d
+        mov     ecx,r12d
+        xor     edx,DWORD[4+rsp]
+        xor     eax,edi
+        rol     ecx,5
+        xor     edx,DWORD[28+rsp]
+        lea     r11d,[((-899497514))+r11*1+r14]
+        xor     eax,esi
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,eax
+        rol     edx,1
+        xor     ebp,DWORD[rsp]
+        mov     eax,r12d
+        mov     DWORD[60+rsp],edx
+        mov     ecx,r11d
+        xor     ebp,DWORD[8+rsp]
+        xor     eax,esi
+        rol     ecx,5
+        xor     ebp,DWORD[32+rsp]
+        lea     edi,[((-899497514))+rdi*1+rdx]
+        xor     eax,r13d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,eax
+        rol     ebp,1
+        xor     r14d,DWORD[4+rsp]
+        mov     eax,r11d
+        mov     DWORD[rsp],ebp
+        mov     ecx,edi
+        xor     r14d,DWORD[12+rsp]
+        xor     eax,r13d
+        rol     ecx,5
+        xor     r14d,DWORD[36+rsp]
+        lea     esi,[((-899497514))+rsi*1+rbp]
+        xor     eax,r12d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,eax
+        rol     r14d,1
+        xor     edx,DWORD[8+rsp]
+        mov     eax,edi
+        mov     DWORD[4+rsp],r14d
+        mov     ecx,esi
+        xor     edx,DWORD[16+rsp]
+        xor     eax,r12d
+        rol     ecx,5
+        xor     edx,DWORD[40+rsp]
+        lea     r13d,[((-899497514))+r13*1+r14]
+        xor     eax,r11d
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,eax
+        rol     edx,1
+        xor     ebp,DWORD[12+rsp]
+        mov     eax,esi
+        mov     DWORD[8+rsp],edx
+        mov     ecx,r13d
+        xor     ebp,DWORD[20+rsp]
+        xor     eax,r11d
+        rol     ecx,5
+        xor     ebp,DWORD[44+rsp]
+        lea     r12d,[((-899497514))+r12*1+rdx]
+        xor     eax,edi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,eax
+        rol     ebp,1
+        xor     r14d,DWORD[16+rsp]
+        mov     eax,r13d
+        mov     DWORD[12+rsp],ebp
+        mov     ecx,r12d
+        xor     r14d,DWORD[24+rsp]
+        xor     eax,edi
+        rol     ecx,5
+        xor     r14d,DWORD[48+rsp]
+        lea     r11d,[((-899497514))+r11*1+rbp]
+        xor     eax,esi
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,eax
+        rol     r14d,1
+        xor     edx,DWORD[20+rsp]
+        mov     eax,r12d
+        mov     DWORD[16+rsp],r14d
+        mov     ecx,r11d
+        xor     edx,DWORD[28+rsp]
+        xor     eax,esi
+        rol     ecx,5
+        xor     edx,DWORD[52+rsp]
+        lea     edi,[((-899497514))+rdi*1+r14]
+        xor     eax,r13d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,eax
+        rol     edx,1
+        xor     ebp,DWORD[24+rsp]
+        mov     eax,r11d
+        mov     DWORD[20+rsp],edx
+        mov     ecx,edi
+        xor     ebp,DWORD[32+rsp]
+        xor     eax,r13d
+        rol     ecx,5
+        xor     ebp,DWORD[56+rsp]
+        lea     esi,[((-899497514))+rsi*1+rdx]
+        xor     eax,r12d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,eax
+        rol     ebp,1
+        xor     r14d,DWORD[28+rsp]
+        mov     eax,edi
+        mov     DWORD[24+rsp],ebp
+        mov     ecx,esi
+        xor     r14d,DWORD[36+rsp]
+        xor     eax,r12d
+        rol     ecx,5
+        xor     r14d,DWORD[60+rsp]
+        lea     r13d,[((-899497514))+r13*1+rbp]
+        xor     eax,r11d
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,eax
+        rol     r14d,1
+        xor     edx,DWORD[32+rsp]
+        mov     eax,esi
+        mov     DWORD[28+rsp],r14d
+        mov     ecx,r13d
+        xor     edx,DWORD[40+rsp]
+        xor     eax,r11d
+        rol     ecx,5
+        xor     edx,DWORD[rsp]
+        lea     r12d,[((-899497514))+r12*1+r14]
+        xor     eax,edi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,eax
+        rol     edx,1
+        xor     ebp,DWORD[36+rsp]
+        mov     eax,r13d
+
+        mov     ecx,r12d
+        xor     ebp,DWORD[44+rsp]
+        xor     eax,edi
+        rol     ecx,5
+        xor     ebp,DWORD[4+rsp]
+        lea     r11d,[((-899497514))+r11*1+rdx]
+        xor     eax,esi
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,eax
+        rol     ebp,1
+        xor     r14d,DWORD[40+rsp]
+        mov     eax,r12d
+
+        mov     ecx,r11d
+        xor     r14d,DWORD[48+rsp]
+        xor     eax,esi
+        rol     ecx,5
+        xor     r14d,DWORD[8+rsp]
+        lea     edi,[((-899497514))+rdi*1+rbp]
+        xor     eax,r13d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,eax
+        rol     r14d,1
+        xor     edx,DWORD[44+rsp]
+        mov     eax,r11d
+
+        mov     ecx,edi
+        xor     edx,DWORD[52+rsp]
+        xor     eax,r13d
+        rol     ecx,5
+        xor     edx,DWORD[12+rsp]
+        lea     esi,[((-899497514))+rsi*1+r14]
+        xor     eax,r12d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,eax
+        rol     edx,1
+        xor     ebp,DWORD[48+rsp]
+        mov     eax,edi
+
+        mov     ecx,esi
+        xor     ebp,DWORD[56+rsp]
+        xor     eax,r12d
+        rol     ecx,5
+        xor     ebp,DWORD[16+rsp]
+        lea     r13d,[((-899497514))+r13*1+rdx]
+        xor     eax,r11d
+        add     r13d,ecx
+        rol     edi,30
+        add     r13d,eax
+        rol     ebp,1
+        xor     r14d,DWORD[52+rsp]
+        mov     eax,esi
+
+        mov     ecx,r13d
+        xor     r14d,DWORD[60+rsp]
+        xor     eax,r11d
+        rol     ecx,5
+        xor     r14d,DWORD[20+rsp]
+        lea     r12d,[((-899497514))+r12*1+rbp]
+        xor     eax,edi
+        add     r12d,ecx
+        rol     esi,30
+        add     r12d,eax
+        rol     r14d,1
+        xor     edx,DWORD[56+rsp]
+        mov     eax,r13d
+
+        mov     ecx,r12d
+        xor     edx,DWORD[rsp]
+        xor     eax,edi
+        rol     ecx,5
+        xor     edx,DWORD[24+rsp]
+        lea     r11d,[((-899497514))+r11*1+r14]
+        xor     eax,esi
+        add     r11d,ecx
+        rol     r13d,30
+        add     r11d,eax
+        rol     edx,1
+        xor     ebp,DWORD[60+rsp]
+        mov     eax,r12d
+
+        mov     ecx,r11d
+        xor     ebp,DWORD[4+rsp]
+        xor     eax,esi
+        rol     ecx,5
+        xor     ebp,DWORD[28+rsp]
+        lea     edi,[((-899497514))+rdi*1+rdx]
+        xor     eax,r13d
+        add     edi,ecx
+        rol     r12d,30
+        add     edi,eax
+        rol     ebp,1
+        mov     eax,r11d
+        mov     ecx,edi
+        xor     eax,r13d
+        lea     esi,[((-899497514))+rsi*1+rbp]
+        rol     ecx,5
+        xor     eax,r12d
+        add     esi,ecx
+        rol     r11d,30
+        add     esi,eax
+        add     esi,DWORD[r8]
+        add     edi,DWORD[4+r8]
+        add     r11d,DWORD[8+r8]
+        add     r12d,DWORD[12+r8]
+        add     r13d,DWORD[16+r8]
+        mov     DWORD[r8],esi
+        mov     DWORD[4+r8],edi
+        mov     DWORD[8+r8],r11d
+        mov     DWORD[12+r8],r12d
+        mov     DWORD[16+r8],r13d
+
+        sub     r10,1
+        lea     r9,[64+r9]
+        jnz     NEAR $L$loop
+
+        mov     rsi,QWORD[64+rsp]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha1_block_data_order:
+
+ALIGN   32
+sha1_block_data_order_shaext:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha1_block_data_order_shaext:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+_shaext_shortcut:
+
+        lea     rsp,[((-72))+rsp]
+        movaps  XMMWORD[(-8-64)+rax],xmm6
+        movaps  XMMWORD[(-8-48)+rax],xmm7
+        movaps  XMMWORD[(-8-32)+rax],xmm8
+        movaps  XMMWORD[(-8-16)+rax],xmm9
+$L$prologue_shaext:
+        movdqu  xmm0,XMMWORD[rdi]
+        movd    xmm1,DWORD[16+rdi]
+        movdqa  xmm3,XMMWORD[((K_XX_XX+160))]
+
+        movdqu  xmm4,XMMWORD[rsi]
+        pshufd  xmm0,xmm0,27
+        movdqu  xmm5,XMMWORD[16+rsi]
+        pshufd  xmm1,xmm1,27
+        movdqu  xmm6,XMMWORD[32+rsi]
+DB      102,15,56,0,227
+        movdqu  xmm7,XMMWORD[48+rsi]
+DB      102,15,56,0,235
+DB      102,15,56,0,243
+        movdqa  xmm9,xmm1
+DB      102,15,56,0,251
+        jmp     NEAR $L$oop_shaext
+
+ALIGN   16
+$L$oop_shaext:
+        dec     rdx
+        lea     r8,[64+rsi]
+        paddd   xmm1,xmm4
+        cmovne  rsi,r8
+        movdqa  xmm8,xmm0
+DB      15,56,201,229
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,0
+DB      15,56,200,213
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+DB      15,56,202,231
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,0
+DB      15,56,200,206
+        pxor    xmm5,xmm7
+DB      15,56,202,236
+DB      15,56,201,247
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,0
+DB      15,56,200,215
+        pxor    xmm6,xmm4
+DB      15,56,201,252
+DB      15,56,202,245
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,0
+DB      15,56,200,204
+        pxor    xmm7,xmm5
+DB      15,56,202,254
+DB      15,56,201,229
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,0
+DB      15,56,200,213
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+DB      15,56,202,231
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,1
+DB      15,56,200,206
+        pxor    xmm5,xmm7
+DB      15,56,202,236
+DB      15,56,201,247
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,1
+DB      15,56,200,215
+        pxor    xmm6,xmm4
+DB      15,56,201,252
+DB      15,56,202,245
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,1
+DB      15,56,200,204
+        pxor    xmm7,xmm5
+DB      15,56,202,254
+DB      15,56,201,229
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,1
+DB      15,56,200,213
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+DB      15,56,202,231
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,1
+DB      15,56,200,206
+        pxor    xmm5,xmm7
+DB      15,56,202,236
+DB      15,56,201,247
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,2
+DB      15,56,200,215
+        pxor    xmm6,xmm4
+DB      15,56,201,252
+DB      15,56,202,245
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,2
+DB      15,56,200,204
+        pxor    xmm7,xmm5
+DB      15,56,202,254
+DB      15,56,201,229
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,2
+DB      15,56,200,213
+        pxor    xmm4,xmm6
+DB      15,56,201,238
+DB      15,56,202,231
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,2
+DB      15,56,200,206
+        pxor    xmm5,xmm7
+DB      15,56,202,236
+DB      15,56,201,247
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,2
+DB      15,56,200,215
+        pxor    xmm6,xmm4
+DB      15,56,201,252
+DB      15,56,202,245
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,3
+DB      15,56,200,204
+        pxor    xmm7,xmm5
+DB      15,56,202,254
+        movdqu  xmm4,XMMWORD[rsi]
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,3
+DB      15,56,200,213
+        movdqu  xmm5,XMMWORD[16+rsi]
+DB      102,15,56,0,227
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,3
+DB      15,56,200,206
+        movdqu  xmm6,XMMWORD[32+rsi]
+DB      102,15,56,0,235
+
+        movdqa  xmm2,xmm0
+DB      15,58,204,193,3
+DB      15,56,200,215
+        movdqu  xmm7,XMMWORD[48+rsi]
+DB      102,15,56,0,243
+
+        movdqa  xmm1,xmm0
+DB      15,58,204,194,3
+DB      65,15,56,200,201
+DB      102,15,56,0,251
+
+        paddd   xmm0,xmm8
+        movdqa  xmm9,xmm1
+
+        jnz     NEAR $L$oop_shaext
+
+        pshufd  xmm0,xmm0,27
+        pshufd  xmm1,xmm1,27
+        movdqu  XMMWORD[rdi],xmm0
+        movd    DWORD[16+rdi],xmm1
+        movaps  xmm6,XMMWORD[((-8-64))+rax]
+        movaps  xmm7,XMMWORD[((-8-48))+rax]
+        movaps  xmm8,XMMWORD[((-8-32))+rax]
+        movaps  xmm9,XMMWORD[((-8-16))+rax]
+        mov     rsp,rax
+$L$epilogue_shaext:
+
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+$L$SEH_end_sha1_block_data_order_shaext:
+
+ALIGN   16
+sha1_block_data_order_ssse3:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha1_block_data_order_ssse3:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+_ssse3_shortcut:
+
+        mov     r11,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        lea     rsp,[((-160))+rsp]
+        movaps  XMMWORD[(-40-96)+r11],xmm6
+        movaps  XMMWORD[(-40-80)+r11],xmm7
+        movaps  XMMWORD[(-40-64)+r11],xmm8
+        movaps  XMMWORD[(-40-48)+r11],xmm9
+        movaps  XMMWORD[(-40-32)+r11],xmm10
+        movaps  XMMWORD[(-40-16)+r11],xmm11
+$L$prologue_ssse3:
+        and     rsp,-64
+        mov     r8,rdi
+        mov     r9,rsi
+        mov     r10,rdx
+
+        shl     r10,6
+        add     r10,r9
+        lea     r14,[((K_XX_XX+64))]
+
+        mov     eax,DWORD[r8]
+        mov     ebx,DWORD[4+r8]
+        mov     ecx,DWORD[8+r8]
+        mov     edx,DWORD[12+r8]
+        mov     esi,ebx
+        mov     ebp,DWORD[16+r8]
+        mov     edi,ecx
+        xor     edi,edx
+        and     esi,edi
+
+        movdqa  xmm6,XMMWORD[64+r14]
+        movdqa  xmm9,XMMWORD[((-64))+r14]
+        movdqu  xmm0,XMMWORD[r9]
+        movdqu  xmm1,XMMWORD[16+r9]
+        movdqu  xmm2,XMMWORD[32+r9]
+        movdqu  xmm3,XMMWORD[48+r9]
+DB      102,15,56,0,198
+DB      102,15,56,0,206
+DB      102,15,56,0,214
+        add     r9,64
+        paddd   xmm0,xmm9
+DB      102,15,56,0,222
+        paddd   xmm1,xmm9
+        paddd   xmm2,xmm9
+        movdqa  XMMWORD[rsp],xmm0
+        psubd   xmm0,xmm9
+        movdqa  XMMWORD[16+rsp],xmm1
+        psubd   xmm1,xmm9
+        movdqa  XMMWORD[32+rsp],xmm2
+        psubd   xmm2,xmm9
+        jmp     NEAR $L$oop_ssse3
+ALIGN   16
+$L$oop_ssse3:
+        ror     ebx,2
+        pshufd  xmm4,xmm0,238
+        xor     esi,edx
+        movdqa  xmm8,xmm3
+        paddd   xmm9,xmm3
+        mov     edi,eax
+        add     ebp,DWORD[rsp]
+        punpcklqdq      xmm4,xmm1
+        xor     ebx,ecx
+        rol     eax,5
+        add     ebp,esi
+        psrldq  xmm8,4
+        and     edi,ebx
+        xor     ebx,ecx
+        pxor    xmm4,xmm0
+        add     ebp,eax
+        ror     eax,7
+        pxor    xmm8,xmm2
+        xor     edi,ecx
+        mov     esi,ebp
+        add     edx,DWORD[4+rsp]
+        pxor    xmm4,xmm8
+        xor     eax,ebx
+        rol     ebp,5
+        movdqa  XMMWORD[48+rsp],xmm9
+        add     edx,edi
+        and     esi,eax
+        movdqa  xmm10,xmm4
+        xor     eax,ebx
+        add     edx,ebp
+        ror     ebp,7
+        movdqa  xmm8,xmm4
+        xor     esi,ebx
+        pslldq  xmm10,12
+        paddd   xmm4,xmm4
+        mov     edi,edx
+        add     ecx,DWORD[8+rsp]
+        psrld   xmm8,31
+        xor     ebp,eax
+        rol     edx,5
+        add     ecx,esi
+        movdqa  xmm9,xmm10
+        and     edi,ebp
+        xor     ebp,eax
+        psrld   xmm10,30
+        add     ecx,edx
+        ror     edx,7
+        por     xmm4,xmm8
+        xor     edi,eax
+        mov     esi,ecx
+        add     ebx,DWORD[12+rsp]
+        pslld   xmm9,2
+        pxor    xmm4,xmm10
+        xor     edx,ebp
+        movdqa  xmm10,XMMWORD[((-64))+r14]
+        rol     ecx,5
+        add     ebx,edi
+        and     esi,edx
+        pxor    xmm4,xmm9
+        xor     edx,ebp
+        add     ebx,ecx
+        ror     ecx,7
+        pshufd  xmm5,xmm1,238
+        xor     esi,ebp
+        movdqa  xmm9,xmm4
+        paddd   xmm10,xmm4
+        mov     edi,ebx
+        add     eax,DWORD[16+rsp]
+        punpcklqdq      xmm5,xmm2
+        xor     ecx,edx
+        rol     ebx,5
+        add     eax,esi
+        psrldq  xmm9,4
+        and     edi,ecx
+        xor     ecx,edx
+        pxor    xmm5,xmm1
+        add     eax,ebx
+        ror     ebx,7
+        pxor    xmm9,xmm3
+        xor     edi,edx
+        mov     esi,eax
+        add     ebp,DWORD[20+rsp]
+        pxor    xmm5,xmm9
+        xor     ebx,ecx
+        rol     eax,5
+        movdqa  XMMWORD[rsp],xmm10
+        add     ebp,edi
+        and     esi,ebx
+        movdqa  xmm8,xmm5
+        xor     ebx,ecx
+        add     ebp,eax
+        ror     eax,7
+        movdqa  xmm9,xmm5
+        xor     esi,ecx
+        pslldq  xmm8,12
+        paddd   xmm5,xmm5
+        mov     edi,ebp
+        add     edx,DWORD[24+rsp]
+        psrld   xmm9,31
+        xor     eax,ebx
+        rol     ebp,5
+        add     edx,esi
+        movdqa  xmm10,xmm8
+        and     edi,eax
+        xor     eax,ebx
+        psrld   xmm8,30
+        add     edx,ebp
+        ror     ebp,7
+        por     xmm5,xmm9
+        xor     edi,ebx
+        mov     esi,edx
+        add     ecx,DWORD[28+rsp]
+        pslld   xmm10,2
+        pxor    xmm5,xmm8
+        xor     ebp,eax
+        movdqa  xmm8,XMMWORD[((-32))+r14]
+        rol     edx,5
+        add     ecx,edi
+        and     esi,ebp
+        pxor    xmm5,xmm10
+        xor     ebp,eax
+        add     ecx,edx
+        ror     edx,7
+        pshufd  xmm6,xmm2,238
+        xor     esi,eax
+        movdqa  xmm10,xmm5
+        paddd   xmm8,xmm5
+        mov     edi,ecx
+        add     ebx,DWORD[32+rsp]
+        punpcklqdq      xmm6,xmm3
+        xor     edx,ebp
+        rol     ecx,5
+        add     ebx,esi
+        psrldq  xmm10,4
+        and     edi,edx
+        xor     edx,ebp
+        pxor    xmm6,xmm2
+        add     ebx,ecx
+        ror     ecx,7
+        pxor    xmm10,xmm4
+        xor     edi,ebp
+        mov     esi,ebx
+        add     eax,DWORD[36+rsp]
+        pxor    xmm6,xmm10
+        xor     ecx,edx
+        rol     ebx,5
+        movdqa  XMMWORD[16+rsp],xmm8
+        add     eax,edi
+        and     esi,ecx
+        movdqa  xmm9,xmm6
+        xor     ecx,edx
+        add     eax,ebx
+        ror     ebx,7
+        movdqa  xmm10,xmm6
+        xor     esi,edx
+        pslldq  xmm9,12
+        paddd   xmm6,xmm6
+        mov     edi,eax
+        add     ebp,DWORD[40+rsp]
+        psrld   xmm10,31
+        xor     ebx,ecx
+        rol     eax,5
+        add     ebp,esi
+        movdqa  xmm8,xmm9
+        and     edi,ebx
+        xor     ebx,ecx
+        psrld   xmm9,30
+        add     ebp,eax
+        ror     eax,7
+        por     xmm6,xmm10
+        xor     edi,ecx
+        mov     esi,ebp
+        add     edx,DWORD[44+rsp]
+        pslld   xmm8,2
+        pxor    xmm6,xmm9
+        xor     eax,ebx
+        movdqa  xmm9,XMMWORD[((-32))+r14]
+        rol     ebp,5
+        add     edx,edi
+        and     esi,eax
+        pxor    xmm6,xmm8
+        xor     eax,ebx
+        add     edx,ebp
+        ror     ebp,7
+        pshufd  xmm7,xmm3,238
+        xor     esi,ebx
+        movdqa  xmm8,xmm6
+        paddd   xmm9,xmm6
+        mov     edi,edx
+        add     ecx,DWORD[48+rsp]
+        punpcklqdq      xmm7,xmm4
+        xor     ebp,eax
+        rol     edx,5
+        add     ecx,esi
+        psrldq  xmm8,4
+        and     edi,ebp
+        xor     ebp,eax
+        pxor    xmm7,xmm3
+        add     ecx,edx
+        ror     edx,7
+        pxor    xmm8,xmm5
+        xor     edi,eax
+        mov     esi,ecx
+        add     ebx,DWORD[52+rsp]
+        pxor    xmm7,xmm8
+        xor     edx,ebp
+        rol     ecx,5
+        movdqa  XMMWORD[32+rsp],xmm9
+        add     ebx,edi
+        and     esi,edx
+        movdqa  xmm10,xmm7
+        xor     edx,ebp
+        add     ebx,ecx
+        ror     ecx,7
+        movdqa  xmm8,xmm7
+        xor     esi,ebp
+        pslldq  xmm10,12
+        paddd   xmm7,xmm7
+        mov     edi,ebx
+        add     eax,DWORD[56+rsp]
+        psrld   xmm8,31
+        xor     ecx,edx
+        rol     ebx,5
+        add     eax,esi
+        movdqa  xmm9,xmm10
+        and     edi,ecx
+        xor     ecx,edx
+        psrld   xmm10,30
+        add     eax,ebx
+        ror     ebx,7
+        por     xmm7,xmm8
+        xor     edi,edx
+        mov     esi,eax
+        add     ebp,DWORD[60+rsp]
+        pslld   xmm9,2
+        pxor    xmm7,xmm10
+        xor     ebx,ecx
+        movdqa  xmm10,XMMWORD[((-32))+r14]
+        rol     eax,5
+        add     ebp,edi
+        and     esi,ebx
+        pxor    xmm7,xmm9
+        pshufd  xmm9,xmm6,238
+        xor     ebx,ecx
+        add     ebp,eax
+        ror     eax,7
+        pxor    xmm0,xmm4
+        xor     esi,ecx
+        mov     edi,ebp
+        add     edx,DWORD[rsp]
+        punpcklqdq      xmm9,xmm7
+        xor     eax,ebx
+        rol     ebp,5
+        pxor    xmm0,xmm1
+        add     edx,esi
+        and     edi,eax
+        movdqa  xmm8,xmm10
+        xor     eax,ebx
+        paddd   xmm10,xmm7
+        add     edx,ebp
+        pxor    xmm0,xmm9
+        ror     ebp,7
+        xor     edi,ebx
+        mov     esi,edx
+        add     ecx,DWORD[4+rsp]
+        movdqa  xmm9,xmm0
+        xor     ebp,eax
+        rol     edx,5
+        movdqa  XMMWORD[48+rsp],xmm10
+        add     ecx,edi
+        and     esi,ebp
+        xor     ebp,eax
+        pslld   xmm0,2
+        add     ecx,edx
+        ror     edx,7
+        psrld   xmm9,30
+        xor     esi,eax
+        mov     edi,ecx
+        add     ebx,DWORD[8+rsp]
+        por     xmm0,xmm9
+        xor     edx,ebp
+        rol     ecx,5
+        pshufd  xmm10,xmm7,238
+        add     ebx,esi
+        and     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[12+rsp]
+        xor     edi,ebp
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,edx
+        ror     ecx,7
+        add     eax,ebx
+        pxor    xmm1,xmm5
+        add     ebp,DWORD[16+rsp]
+        xor     esi,ecx
+        punpcklqdq      xmm10,xmm0
+        mov     edi,eax
+        rol     eax,5
+        pxor    xmm1,xmm2
+        add     ebp,esi
+        xor     edi,ecx
+        movdqa  xmm9,xmm8
+        ror     ebx,7
+        paddd   xmm8,xmm0
+        add     ebp,eax
+        pxor    xmm1,xmm10
+        add     edx,DWORD[20+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        movdqa  xmm10,xmm1
+        add     edx,edi
+        xor     esi,ebx
+        movdqa  XMMWORD[rsp],xmm8
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[24+rsp]
+        pslld   xmm1,2
+        xor     esi,eax
+        mov     edi,edx
+        psrld   xmm10,30
+        rol     edx,5
+        add     ecx,esi
+        xor     edi,eax
+        ror     ebp,7
+        por     xmm1,xmm10
+        add     ecx,edx
+        add     ebx,DWORD[28+rsp]
+        pshufd  xmm8,xmm0,238
+        xor     edi,ebp
+        mov     esi,ecx
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        pxor    xmm2,xmm6
+        add     eax,DWORD[32+rsp]
+        xor     esi,edx
+        punpcklqdq      xmm8,xmm1
+        mov     edi,ebx
+        rol     ebx,5
+        pxor    xmm2,xmm3
+        add     eax,esi
+        xor     edi,edx
+        movdqa  xmm10,XMMWORD[r14]
+        ror     ecx,7
+        paddd   xmm9,xmm1
+        add     eax,ebx
+        pxor    xmm2,xmm8
+        add     ebp,DWORD[36+rsp]
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        movdqa  xmm8,xmm2
+        add     ebp,edi
+        xor     esi,ecx
+        movdqa  XMMWORD[16+rsp],xmm9
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[40+rsp]
+        pslld   xmm2,2
+        xor     esi,ebx
+        mov     edi,ebp
+        psrld   xmm8,30
+        rol     ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        ror     eax,7
+        por     xmm2,xmm8
+        add     edx,ebp
+        add     ecx,DWORD[44+rsp]
+        pshufd  xmm9,xmm1,238
+        xor     edi,eax
+        mov     esi,edx
+        rol     edx,5
+        add     ecx,edi
+        xor     esi,eax
+        ror     ebp,7
+        add     ecx,edx
+        pxor    xmm3,xmm7
+        add     ebx,DWORD[48+rsp]
+        xor     esi,ebp
+        punpcklqdq      xmm9,xmm2
+        mov     edi,ecx
+        rol     ecx,5
+        pxor    xmm3,xmm4
+        add     ebx,esi
+        xor     edi,ebp
+        movdqa  xmm8,xmm10
+        ror     edx,7
+        paddd   xmm10,xmm2
+        add     ebx,ecx
+        pxor    xmm3,xmm9
+        add     eax,DWORD[52+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        rol     ebx,5
+        movdqa  xmm9,xmm3
+        add     eax,edi
+        xor     esi,edx
+        movdqa  XMMWORD[32+rsp],xmm10
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[56+rsp]
+        pslld   xmm3,2
+        xor     esi,ecx
+        mov     edi,eax
+        psrld   xmm9,30
+        rol     eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        ror     ebx,7
+        por     xmm3,xmm9
+        add     ebp,eax
+        add     edx,DWORD[60+rsp]
+        pshufd  xmm10,xmm2,238
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,ebp
+        pxor    xmm4,xmm0
+        add     ecx,DWORD[rsp]
+        xor     esi,eax
+        punpcklqdq      xmm10,xmm3
+        mov     edi,edx
+        rol     edx,5
+        pxor    xmm4,xmm5
+        add     ecx,esi
+        xor     edi,eax
+        movdqa  xmm9,xmm8
+        ror     ebp,7
+        paddd   xmm8,xmm3
+        add     ecx,edx
+        pxor    xmm4,xmm10
+        add     ebx,DWORD[4+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        rol     ecx,5
+        movdqa  xmm10,xmm4
+        add     ebx,edi
+        xor     esi,ebp
+        movdqa  XMMWORD[48+rsp],xmm8
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[8+rsp]
+        pslld   xmm4,2
+        xor     esi,edx
+        mov     edi,ebx
+        psrld   xmm10,30
+        rol     ebx,5
+        add     eax,esi
+        xor     edi,edx
+        ror     ecx,7
+        por     xmm4,xmm10
+        add     eax,ebx
+        add     ebp,DWORD[12+rsp]
+        pshufd  xmm8,xmm3,238
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        pxor    xmm5,xmm1
+        add     edx,DWORD[16+rsp]
+        xor     esi,ebx
+        punpcklqdq      xmm8,xmm4
+        mov     edi,ebp
+        rol     ebp,5
+        pxor    xmm5,xmm6
+        add     edx,esi
+        xor     edi,ebx
+        movdqa  xmm10,xmm9
+        ror     eax,7
+        paddd   xmm9,xmm4
+        add     edx,ebp
+        pxor    xmm5,xmm8
+        add     ecx,DWORD[20+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        rol     edx,5
+        movdqa  xmm8,xmm5
+        add     ecx,edi
+        xor     esi,eax
+        movdqa  XMMWORD[rsp],xmm9
+        ror     ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[24+rsp]
+        pslld   xmm5,2
+        xor     esi,ebp
+        mov     edi,ecx
+        psrld   xmm8,30
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        por     xmm5,xmm8
+        add     ebx,ecx
+        add     eax,DWORD[28+rsp]
+        pshufd  xmm9,xmm4,238
+        ror     ecx,7
+        mov     esi,ebx
+        xor     edi,edx
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        pxor    xmm6,xmm2
+        add     ebp,DWORD[32+rsp]
+        and     esi,ecx
+        xor     ecx,edx
+        ror     ebx,7
+        punpcklqdq      xmm9,xmm5
+        mov     edi,eax
+        xor     esi,ecx
+        pxor    xmm6,xmm7
+        rol     eax,5
+        add     ebp,esi
+        movdqa  xmm8,xmm10
+        xor     edi,ebx
+        paddd   xmm10,xmm5
+        xor     ebx,ecx
+        pxor    xmm6,xmm9
+        add     ebp,eax
+        add     edx,DWORD[36+rsp]
+        and     edi,ebx
+        xor     ebx,ecx
+        ror     eax,7
+        movdqa  xmm9,xmm6
+        mov     esi,ebp
+        xor     edi,ebx
+        movdqa  XMMWORD[16+rsp],xmm10
+        rol     ebp,5
+        add     edx,edi
+        xor     esi,eax
+        pslld   xmm6,2
+        xor     eax,ebx
+        add     edx,ebp
+        psrld   xmm9,30
+        add     ecx,DWORD[40+rsp]
+        and     esi,eax
+        xor     eax,ebx
+        por     xmm6,xmm9
+        ror     ebp,7
+        mov     edi,edx
+        xor     esi,eax
+        rol     edx,5
+        pshufd  xmm10,xmm5,238
+        add     ecx,esi
+        xor     edi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        add     ebx,DWORD[44+rsp]
+        and     edi,ebp
+        xor     ebp,eax
+        ror     edx,7
+        mov     esi,ecx
+        xor     edi,ebp
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        pxor    xmm7,xmm3
+        add     eax,DWORD[48+rsp]
+        and     esi,edx
+        xor     edx,ebp
+        ror     ecx,7
+        punpcklqdq      xmm10,xmm6
+        mov     edi,ebx
+        xor     esi,edx
+        pxor    xmm7,xmm0
+        rol     ebx,5
+        add     eax,esi
+        movdqa  xmm9,XMMWORD[32+r14]
+        xor     edi,ecx
+        paddd   xmm8,xmm6
+        xor     ecx,edx
+        pxor    xmm7,xmm10
+        add     eax,ebx
+        add     ebp,DWORD[52+rsp]
+        and     edi,ecx
+        xor     ecx,edx
+        ror     ebx,7
+        movdqa  xmm10,xmm7
+        mov     esi,eax
+        xor     edi,ecx
+        movdqa  XMMWORD[32+rsp],xmm8
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ebx
+        pslld   xmm7,2
+        xor     ebx,ecx
+        add     ebp,eax
+        psrld   xmm10,30
+        add     edx,DWORD[56+rsp]
+        and     esi,ebx
+        xor     ebx,ecx
+        por     xmm7,xmm10
+        ror     eax,7
+        mov     edi,ebp
+        xor     esi,ebx
+        rol     ebp,5
+        pshufd  xmm8,xmm6,238
+        add     edx,esi
+        xor     edi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        add     ecx,DWORD[60+rsp]
+        and     edi,eax
+        xor     eax,ebx
+        ror     ebp,7
+        mov     esi,edx
+        xor     edi,eax
+        rol     edx,5
+        add     ecx,edi
+        xor     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        pxor    xmm0,xmm4
+        add     ebx,DWORD[rsp]
+        and     esi,ebp
+        xor     ebp,eax
+        ror     edx,7
+        punpcklqdq      xmm8,xmm7
+        mov     edi,ecx
+        xor     esi,ebp
+        pxor    xmm0,xmm1
+        rol     ecx,5
+        add     ebx,esi
+        movdqa  xmm10,xmm9
+        xor     edi,edx
+        paddd   xmm9,xmm7
+        xor     edx,ebp
+        pxor    xmm0,xmm8
+        add     ebx,ecx
+        add     eax,DWORD[4+rsp]
+        and     edi,edx
+        xor     edx,ebp
+        ror     ecx,7
+        movdqa  xmm8,xmm0
+        mov     esi,ebx
+        xor     edi,edx
+        movdqa  XMMWORD[48+rsp],xmm9
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,ecx
+        pslld   xmm0,2
+        xor     ecx,edx
+        add     eax,ebx
+        psrld   xmm8,30
+        add     ebp,DWORD[8+rsp]
+        and     esi,ecx
+        xor     ecx,edx
+        por     xmm0,xmm8
+        ror     ebx,7
+        mov     edi,eax
+        xor     esi,ecx
+        rol     eax,5
+        pshufd  xmm9,xmm7,238
+        add     ebp,esi
+        xor     edi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        add     edx,DWORD[12+rsp]
+        and     edi,ebx
+        xor     ebx,ecx
+        ror     eax,7
+        mov     esi,ebp
+        xor     edi,ebx
+        rol     ebp,5
+        add     edx,edi
+        xor     esi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        pxor    xmm1,xmm5
+        add     ecx,DWORD[16+rsp]
+        and     esi,eax
+        xor     eax,ebx
+        ror     ebp,7
+        punpcklqdq      xmm9,xmm0
+        mov     edi,edx
+        xor     esi,eax
+        pxor    xmm1,xmm2
+        rol     edx,5
+        add     ecx,esi
+        movdqa  xmm8,xmm10
+        xor     edi,ebp
+        paddd   xmm10,xmm0
+        xor     ebp,eax
+        pxor    xmm1,xmm9
+        add     ecx,edx
+        add     ebx,DWORD[20+rsp]
+        and     edi,ebp
+        xor     ebp,eax
+        ror     edx,7
+        movdqa  xmm9,xmm1
+        mov     esi,ecx
+        xor     edi,ebp
+        movdqa  XMMWORD[rsp],xmm10
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,edx
+        pslld   xmm1,2
+        xor     edx,ebp
+        add     ebx,ecx
+        psrld   xmm9,30
+        add     eax,DWORD[24+rsp]
+        and     esi,edx
+        xor     edx,ebp
+        por     xmm1,xmm9
+        ror     ecx,7
+        mov     edi,ebx
+        xor     esi,edx
+        rol     ebx,5
+        pshufd  xmm10,xmm0,238
+        add     eax,esi
+        xor     edi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     ebp,DWORD[28+rsp]
+        and     edi,ecx
+        xor     ecx,edx
+        ror     ebx,7
+        mov     esi,eax
+        xor     edi,ecx
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        pxor    xmm2,xmm6
+        add     edx,DWORD[32+rsp]
+        and     esi,ebx
+        xor     ebx,ecx
+        ror     eax,7
+        punpcklqdq      xmm10,xmm1
+        mov     edi,ebp
+        xor     esi,ebx
+        pxor    xmm2,xmm3
+        rol     ebp,5
+        add     edx,esi
+        movdqa  xmm9,xmm8
+        xor     edi,eax
+        paddd   xmm8,xmm1
+        xor     eax,ebx
+        pxor    xmm2,xmm10
+        add     edx,ebp
+        add     ecx,DWORD[36+rsp]
+        and     edi,eax
+        xor     eax,ebx
+        ror     ebp,7
+        movdqa  xmm10,xmm2
+        mov     esi,edx
+        xor     edi,eax
+        movdqa  XMMWORD[16+rsp],xmm8
+        rol     edx,5
+        add     ecx,edi
+        xor     esi,ebp
+        pslld   xmm2,2
+        xor     ebp,eax
+        add     ecx,edx
+        psrld   xmm10,30
+        add     ebx,DWORD[40+rsp]
+        and     esi,ebp
+        xor     ebp,eax
+        por     xmm2,xmm10
+        ror     edx,7
+        mov     edi,ecx
+        xor     esi,ebp
+        rol     ecx,5
+        pshufd  xmm8,xmm1,238
+        add     ebx,esi
+        xor     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[44+rsp]
+        and     edi,edx
+        xor     edx,ebp
+        ror     ecx,7
+        mov     esi,ebx
+        xor     edi,edx
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,edx
+        add     eax,ebx
+        pxor    xmm3,xmm7
+        add     ebp,DWORD[48+rsp]
+        xor     esi,ecx
+        punpcklqdq      xmm8,xmm2
+        mov     edi,eax
+        rol     eax,5
+        pxor    xmm3,xmm4
+        add     ebp,esi
+        xor     edi,ecx
+        movdqa  xmm10,xmm9
+        ror     ebx,7
+        paddd   xmm9,xmm2
+        add     ebp,eax
+        pxor    xmm3,xmm8
+        add     edx,DWORD[52+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        movdqa  xmm8,xmm3
+        add     edx,edi
+        xor     esi,ebx
+        movdqa  XMMWORD[32+rsp],xmm9
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[56+rsp]
+        pslld   xmm3,2
+        xor     esi,eax
+        mov     edi,edx
+        psrld   xmm8,30
+        rol     edx,5
+        add     ecx,esi
+        xor     edi,eax
+        ror     ebp,7
+        por     xmm3,xmm8
+        add     ecx,edx
+        add     ebx,DWORD[60+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        rol     ebx,5
+        paddd   xmm10,xmm3
+        add     eax,esi
+        xor     edi,edx
+        movdqa  XMMWORD[48+rsp],xmm10
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[4+rsp]
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[8+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        rol     ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[12+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        rol     edx,5
+        add     ecx,edi
+        xor     esi,eax
+        ror     ebp,7
+        add     ecx,edx
+        cmp     r9,r10
+        je      NEAR $L$done_ssse3
+        movdqa  xmm6,XMMWORD[64+r14]
+        movdqa  xmm9,XMMWORD[((-64))+r14]
+        movdqu  xmm0,XMMWORD[r9]
+        movdqu  xmm1,XMMWORD[16+r9]
+        movdqu  xmm2,XMMWORD[32+r9]
+        movdqu  xmm3,XMMWORD[48+r9]
+DB      102,15,56,0,198
+        add     r9,64
+        add     ebx,DWORD[16+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+DB      102,15,56,0,206
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        paddd   xmm0,xmm9
+        add     ebx,ecx
+        add     eax,DWORD[20+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        movdqa  XMMWORD[rsp],xmm0
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,edx
+        ror     ecx,7
+        psubd   xmm0,xmm9
+        add     eax,ebx
+        add     ebp,DWORD[24+rsp]
+        xor     esi,ecx
+        mov     edi,eax
+        rol     eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[28+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[32+rsp]
+        xor     esi,eax
+        mov     edi,edx
+DB      102,15,56,0,214
+        rol     edx,5
+        add     ecx,esi
+        xor     edi,eax
+        ror     ebp,7
+        paddd   xmm1,xmm9
+        add     ecx,edx
+        add     ebx,DWORD[36+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        movdqa  XMMWORD[16+rsp],xmm1
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        ror     edx,7
+        psubd   xmm1,xmm9
+        add     ebx,ecx
+        add     eax,DWORD[40+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        rol     ebx,5
+        add     eax,esi
+        xor     edi,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[44+rsp]
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[48+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+DB      102,15,56,0,222
+        rol     ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        ror     eax,7
+        paddd   xmm2,xmm9
+        add     edx,ebp
+        add     ecx,DWORD[52+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        movdqa  XMMWORD[32+rsp],xmm2
+        rol     edx,5
+        add     ecx,edi
+        xor     esi,eax
+        ror     ebp,7
+        psubd   xmm2,xmm9
+        add     ecx,edx
+        add     ebx,DWORD[56+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[60+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,edi
+        ror     ecx,7
+        add     eax,ebx
+        add     eax,DWORD[r8]
+        add     esi,DWORD[4+r8]
+        add     ecx,DWORD[8+r8]
+        add     edx,DWORD[12+r8]
+        mov     DWORD[r8],eax
+        add     ebp,DWORD[16+r8]
+        mov     DWORD[4+r8],esi
+        mov     ebx,esi
+        mov     DWORD[8+r8],ecx
+        mov     edi,ecx
+        mov     DWORD[12+r8],edx
+        xor     edi,edx
+        mov     DWORD[16+r8],ebp
+        and     esi,edi
+        jmp     NEAR $L$oop_ssse3
+
+ALIGN   16
+$L$done_ssse3:
+        add     ebx,DWORD[16+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[20+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,edi
+        xor     esi,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[24+rsp]
+        xor     esi,ecx
+        mov     edi,eax
+        rol     eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[28+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        rol     ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[32+rsp]
+        xor     esi,eax
+        mov     edi,edx
+        rol     edx,5
+        add     ecx,esi
+        xor     edi,eax
+        ror     ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[36+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        rol     ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[40+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        rol     ebx,5
+        add     eax,esi
+        xor     edi,edx
+        ror     ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[44+rsp]
+        xor     edi,ecx
+        mov     esi,eax
+        rol     eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        ror     ebx,7
+        add     ebp,eax
+        add     edx,DWORD[48+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        rol     ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        ror     eax,7
+        add     edx,ebp
+        add     ecx,DWORD[52+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        rol     edx,5
+        add     ecx,edi
+        xor     esi,eax
+        ror     ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[56+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        rol     ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        ror     edx,7
+        add     ebx,ecx
+        add     eax,DWORD[60+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        rol     ebx,5
+        add     eax,edi
+        ror     ecx,7
+        add     eax,ebx
+        add     eax,DWORD[r8]
+        add     esi,DWORD[4+r8]
+        add     ecx,DWORD[8+r8]
+        mov     DWORD[r8],eax
+        add     edx,DWORD[12+r8]
+        mov     DWORD[4+r8],esi
+        add     ebp,DWORD[16+r8]
+        mov     DWORD[8+r8],ecx
+        mov     DWORD[12+r8],edx
+        mov     DWORD[16+r8],ebp
+        movaps  xmm6,XMMWORD[((-40-96))+r11]
+        movaps  xmm7,XMMWORD[((-40-80))+r11]
+        movaps  xmm8,XMMWORD[((-40-64))+r11]
+        movaps  xmm9,XMMWORD[((-40-48))+r11]
+        movaps  xmm10,XMMWORD[((-40-32))+r11]
+        movaps  xmm11,XMMWORD[((-40-16))+r11]
+        mov     r14,QWORD[((-40))+r11]
+
+        mov     r13,QWORD[((-32))+r11]
+
+        mov     r12,QWORD[((-24))+r11]
+
+        mov     rbp,QWORD[((-16))+r11]
+
+        mov     rbx,QWORD[((-8))+r11]
+
+        lea     rsp,[r11]
+
+$L$epilogue_ssse3:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha1_block_data_order_ssse3:
+
+ALIGN   16
+sha1_block_data_order_avx:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha1_block_data_order_avx:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+_avx_shortcut:
+
+        mov     r11,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        lea     rsp,[((-160))+rsp]
+        vzeroupper
+        vmovaps XMMWORD[(-40-96)+r11],xmm6
+        vmovaps XMMWORD[(-40-80)+r11],xmm7
+        vmovaps XMMWORD[(-40-64)+r11],xmm8
+        vmovaps XMMWORD[(-40-48)+r11],xmm9
+        vmovaps XMMWORD[(-40-32)+r11],xmm10
+        vmovaps XMMWORD[(-40-16)+r11],xmm11
+$L$prologue_avx:
+        and     rsp,-64
+        mov     r8,rdi
+        mov     r9,rsi
+        mov     r10,rdx
+
+        shl     r10,6
+        add     r10,r9
+        lea     r14,[((K_XX_XX+64))]
+
+        mov     eax,DWORD[r8]
+        mov     ebx,DWORD[4+r8]
+        mov     ecx,DWORD[8+r8]
+        mov     edx,DWORD[12+r8]
+        mov     esi,ebx
+        mov     ebp,DWORD[16+r8]
+        mov     edi,ecx
+        xor     edi,edx
+        and     esi,edi
+
+        vmovdqa xmm6,XMMWORD[64+r14]
+        vmovdqa xmm11,XMMWORD[((-64))+r14]
+        vmovdqu xmm0,XMMWORD[r9]
+        vmovdqu xmm1,XMMWORD[16+r9]
+        vmovdqu xmm2,XMMWORD[32+r9]
+        vmovdqu xmm3,XMMWORD[48+r9]
+        vpshufb xmm0,xmm0,xmm6
+        add     r9,64
+        vpshufb xmm1,xmm1,xmm6
+        vpshufb xmm2,xmm2,xmm6
+        vpshufb xmm3,xmm3,xmm6
+        vpaddd  xmm4,xmm0,xmm11
+        vpaddd  xmm5,xmm1,xmm11
+        vpaddd  xmm6,xmm2,xmm11
+        vmovdqa XMMWORD[rsp],xmm4
+        vmovdqa XMMWORD[16+rsp],xmm5
+        vmovdqa XMMWORD[32+rsp],xmm6
+        jmp     NEAR $L$oop_avx
+ALIGN   16
+$L$oop_avx:
+        shrd    ebx,ebx,2
+        xor     esi,edx
+        vpalignr        xmm4,xmm1,xmm0,8
+        mov     edi,eax
+        add     ebp,DWORD[rsp]
+        vpaddd  xmm9,xmm11,xmm3
+        xor     ebx,ecx
+        shld    eax,eax,5
+        vpsrldq xmm8,xmm3,4
+        add     ebp,esi
+        and     edi,ebx
+        vpxor   xmm4,xmm4,xmm0
+        xor     ebx,ecx
+        add     ebp,eax
+        vpxor   xmm8,xmm8,xmm2
+        shrd    eax,eax,7
+        xor     edi,ecx
+        mov     esi,ebp
+        add     edx,DWORD[4+rsp]
+        vpxor   xmm4,xmm4,xmm8
+        xor     eax,ebx
+        shld    ebp,ebp,5
+        vmovdqa XMMWORD[48+rsp],xmm9
+        add     edx,edi
+        and     esi,eax
+        vpsrld  xmm8,xmm4,31
+        xor     eax,ebx
+        add     edx,ebp
+        shrd    ebp,ebp,7
+        xor     esi,ebx
+        vpslldq xmm10,xmm4,12
+        vpaddd  xmm4,xmm4,xmm4
+        mov     edi,edx
+        add     ecx,DWORD[8+rsp]
+        xor     ebp,eax
+        shld    edx,edx,5
+        vpsrld  xmm9,xmm10,30
+        vpor    xmm4,xmm4,xmm8
+        add     ecx,esi
+        and     edi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        vpslld  xmm10,xmm10,2
+        vpxor   xmm4,xmm4,xmm9
+        shrd    edx,edx,7
+        xor     edi,eax
+        mov     esi,ecx
+        add     ebx,DWORD[12+rsp]
+        vpxor   xmm4,xmm4,xmm10
+        xor     edx,ebp
+        shld    ecx,ecx,5
+        add     ebx,edi
+        and     esi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        shrd    ecx,ecx,7
+        xor     esi,ebp
+        vpalignr        xmm5,xmm2,xmm1,8
+        mov     edi,ebx
+        add     eax,DWORD[16+rsp]
+        vpaddd  xmm9,xmm11,xmm4
+        xor     ecx,edx
+        shld    ebx,ebx,5
+        vpsrldq xmm8,xmm4,4
+        add     eax,esi
+        and     edi,ecx
+        vpxor   xmm5,xmm5,xmm1
+        xor     ecx,edx
+        add     eax,ebx
+        vpxor   xmm8,xmm8,xmm3
+        shrd    ebx,ebx,7
+        xor     edi,edx
+        mov     esi,eax
+        add     ebp,DWORD[20+rsp]
+        vpxor   xmm5,xmm5,xmm8
+        xor     ebx,ecx
+        shld    eax,eax,5
+        vmovdqa XMMWORD[rsp],xmm9
+        add     ebp,edi
+        and     esi,ebx
+        vpsrld  xmm8,xmm5,31
+        xor     ebx,ecx
+        add     ebp,eax
+        shrd    eax,eax,7
+        xor     esi,ecx
+        vpslldq xmm10,xmm5,12
+        vpaddd  xmm5,xmm5,xmm5
+        mov     edi,ebp
+        add     edx,DWORD[24+rsp]
+        xor     eax,ebx
+        shld    ebp,ebp,5
+        vpsrld  xmm9,xmm10,30
+        vpor    xmm5,xmm5,xmm8
+        add     edx,esi
+        and     edi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        vpslld  xmm10,xmm10,2
+        vpxor   xmm5,xmm5,xmm9
+        shrd    ebp,ebp,7
+        xor     edi,ebx
+        mov     esi,edx
+        add     ecx,DWORD[28+rsp]
+        vpxor   xmm5,xmm5,xmm10
+        xor     ebp,eax
+        shld    edx,edx,5
+        vmovdqa xmm11,XMMWORD[((-32))+r14]
+        add     ecx,edi
+        and     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        shrd    edx,edx,7
+        xor     esi,eax
+        vpalignr        xmm6,xmm3,xmm2,8
+        mov     edi,ecx
+        add     ebx,DWORD[32+rsp]
+        vpaddd  xmm9,xmm11,xmm5
+        xor     edx,ebp
+        shld    ecx,ecx,5
+        vpsrldq xmm8,xmm5,4
+        add     ebx,esi
+        and     edi,edx
+        vpxor   xmm6,xmm6,xmm2
+        xor     edx,ebp
+        add     ebx,ecx
+        vpxor   xmm8,xmm8,xmm4
+        shrd    ecx,ecx,7
+        xor     edi,ebp
+        mov     esi,ebx
+        add     eax,DWORD[36+rsp]
+        vpxor   xmm6,xmm6,xmm8
+        xor     ecx,edx
+        shld    ebx,ebx,5
+        vmovdqa XMMWORD[16+rsp],xmm9
+        add     eax,edi
+        and     esi,ecx
+        vpsrld  xmm8,xmm6,31
+        xor     ecx,edx
+        add     eax,ebx
+        shrd    ebx,ebx,7
+        xor     esi,edx
+        vpslldq xmm10,xmm6,12
+        vpaddd  xmm6,xmm6,xmm6
+        mov     edi,eax
+        add     ebp,DWORD[40+rsp]
+        xor     ebx,ecx
+        shld    eax,eax,5
+        vpsrld  xmm9,xmm10,30
+        vpor    xmm6,xmm6,xmm8
+        add     ebp,esi
+        and     edi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        vpslld  xmm10,xmm10,2
+        vpxor   xmm6,xmm6,xmm9
+        shrd    eax,eax,7
+        xor     edi,ecx
+        mov     esi,ebp
+        add     edx,DWORD[44+rsp]
+        vpxor   xmm6,xmm6,xmm10
+        xor     eax,ebx
+        shld    ebp,ebp,5
+        add     edx,edi
+        and     esi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        shrd    ebp,ebp,7
+        xor     esi,ebx
+        vpalignr        xmm7,xmm4,xmm3,8
+        mov     edi,edx
+        add     ecx,DWORD[48+rsp]
+        vpaddd  xmm9,xmm11,xmm6
+        xor     ebp,eax
+        shld    edx,edx,5
+        vpsrldq xmm8,xmm6,4
+        add     ecx,esi
+        and     edi,ebp
+        vpxor   xmm7,xmm7,xmm3
+        xor     ebp,eax
+        add     ecx,edx
+        vpxor   xmm8,xmm8,xmm5
+        shrd    edx,edx,7
+        xor     edi,eax
+        mov     esi,ecx
+        add     ebx,DWORD[52+rsp]
+        vpxor   xmm7,xmm7,xmm8
+        xor     edx,ebp
+        shld    ecx,ecx,5
+        vmovdqa XMMWORD[32+rsp],xmm9
+        add     ebx,edi
+        and     esi,edx
+        vpsrld  xmm8,xmm7,31
+        xor     edx,ebp
+        add     ebx,ecx
+        shrd    ecx,ecx,7
+        xor     esi,ebp
+        vpslldq xmm10,xmm7,12
+        vpaddd  xmm7,xmm7,xmm7
+        mov     edi,ebx
+        add     eax,DWORD[56+rsp]
+        xor     ecx,edx
+        shld    ebx,ebx,5
+        vpsrld  xmm9,xmm10,30
+        vpor    xmm7,xmm7,xmm8
+        add     eax,esi
+        and     edi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        vpslld  xmm10,xmm10,2
+        vpxor   xmm7,xmm7,xmm9
+        shrd    ebx,ebx,7
+        xor     edi,edx
+        mov     esi,eax
+        add     ebp,DWORD[60+rsp]
+        vpxor   xmm7,xmm7,xmm10
+        xor     ebx,ecx
+        shld    eax,eax,5
+        add     ebp,edi
+        and     esi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        vpalignr        xmm8,xmm7,xmm6,8
+        vpxor   xmm0,xmm0,xmm4
+        shrd    eax,eax,7
+        xor     esi,ecx
+        mov     edi,ebp
+        add     edx,DWORD[rsp]
+        vpxor   xmm0,xmm0,xmm1
+        xor     eax,ebx
+        shld    ebp,ebp,5
+        vpaddd  xmm9,xmm11,xmm7
+        add     edx,esi
+        and     edi,eax
+        vpxor   xmm0,xmm0,xmm8
+        xor     eax,ebx
+        add     edx,ebp
+        shrd    ebp,ebp,7
+        xor     edi,ebx
+        vpsrld  xmm8,xmm0,30
+        vmovdqa XMMWORD[48+rsp],xmm9
+        mov     esi,edx
+        add     ecx,DWORD[4+rsp]
+        xor     ebp,eax
+        shld    edx,edx,5
+        vpslld  xmm0,xmm0,2
+        add     ecx,edi
+        and     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        shrd    edx,edx,7
+        xor     esi,eax
+        mov     edi,ecx
+        add     ebx,DWORD[8+rsp]
+        vpor    xmm0,xmm0,xmm8
+        xor     edx,ebp
+        shld    ecx,ecx,5
+        add     ebx,esi
+        and     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[12+rsp]
+        xor     edi,ebp
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpalignr        xmm8,xmm0,xmm7,8
+        vpxor   xmm1,xmm1,xmm5
+        add     ebp,DWORD[16+rsp]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        vpxor   xmm1,xmm1,xmm2
+        add     ebp,esi
+        xor     edi,ecx
+        vpaddd  xmm9,xmm11,xmm0
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpxor   xmm1,xmm1,xmm8
+        add     edx,DWORD[20+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        vpsrld  xmm8,xmm1,30
+        vmovdqa XMMWORD[rsp],xmm9
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpslld  xmm1,xmm1,2
+        add     ecx,DWORD[24+rsp]
+        xor     esi,eax
+        mov     edi,edx
+        shld    edx,edx,5
+        add     ecx,esi
+        xor     edi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpor    xmm1,xmm1,xmm8
+        add     ebx,DWORD[28+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpalignr        xmm8,xmm1,xmm0,8
+        vpxor   xmm2,xmm2,xmm6
+        add     eax,DWORD[32+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        vpxor   xmm2,xmm2,xmm3
+        add     eax,esi
+        xor     edi,edx
+        vpaddd  xmm9,xmm11,xmm1
+        vmovdqa xmm11,XMMWORD[r14]
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpxor   xmm2,xmm2,xmm8
+        add     ebp,DWORD[36+rsp]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        vpsrld  xmm8,xmm2,30
+        vmovdqa XMMWORD[16+rsp],xmm9
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpslld  xmm2,xmm2,2
+        add     edx,DWORD[40+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpor    xmm2,xmm2,xmm8
+        add     ecx,DWORD[44+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,edi
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpalignr        xmm8,xmm2,xmm1,8
+        vpxor   xmm3,xmm3,xmm7
+        add     ebx,DWORD[48+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        vpxor   xmm3,xmm3,xmm4
+        add     ebx,esi
+        xor     edi,ebp
+        vpaddd  xmm9,xmm11,xmm2
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpxor   xmm3,xmm3,xmm8
+        add     eax,DWORD[52+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        vpsrld  xmm8,xmm3,30
+        vmovdqa XMMWORD[32+rsp],xmm9
+        add     eax,edi
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpslld  xmm3,xmm3,2
+        add     ebp,DWORD[56+rsp]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpor    xmm3,xmm3,xmm8
+        add     edx,DWORD[60+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpalignr        xmm8,xmm3,xmm2,8
+        vpxor   xmm4,xmm4,xmm0
+        add     ecx,DWORD[rsp]
+        xor     esi,eax
+        mov     edi,edx
+        shld    edx,edx,5
+        vpxor   xmm4,xmm4,xmm5
+        add     ecx,esi
+        xor     edi,eax
+        vpaddd  xmm9,xmm11,xmm3
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpxor   xmm4,xmm4,xmm8
+        add     ebx,DWORD[4+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        vpsrld  xmm8,xmm4,30
+        vmovdqa XMMWORD[48+rsp],xmm9
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpslld  xmm4,xmm4,2
+        add     eax,DWORD[8+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     edi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vpor    xmm4,xmm4,xmm8
+        add     ebp,DWORD[12+rsp]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpalignr        xmm8,xmm4,xmm3,8
+        vpxor   xmm5,xmm5,xmm1
+        add     edx,DWORD[16+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        vpxor   xmm5,xmm5,xmm6
+        add     edx,esi
+        xor     edi,ebx
+        vpaddd  xmm9,xmm11,xmm4
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpxor   xmm5,xmm5,xmm8
+        add     ecx,DWORD[20+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        vpsrld  xmm8,xmm5,30
+        vmovdqa XMMWORD[rsp],xmm9
+        add     ecx,edi
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpslld  xmm5,xmm5,2
+        add     ebx,DWORD[24+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vpor    xmm5,xmm5,xmm8
+        add     eax,DWORD[28+rsp]
+        shrd    ecx,ecx,7
+        mov     esi,ebx
+        xor     edi,edx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        vpalignr        xmm8,xmm5,xmm4,8
+        vpxor   xmm6,xmm6,xmm2
+        add     ebp,DWORD[32+rsp]
+        and     esi,ecx
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        vpxor   xmm6,xmm6,xmm7
+        mov     edi,eax
+        xor     esi,ecx
+        vpaddd  xmm9,xmm11,xmm5
+        shld    eax,eax,5
+        add     ebp,esi
+        vpxor   xmm6,xmm6,xmm8
+        xor     edi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        add     edx,DWORD[36+rsp]
+        vpsrld  xmm8,xmm6,30
+        vmovdqa XMMWORD[16+rsp],xmm9
+        and     edi,ebx
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        mov     esi,ebp
+        vpslld  xmm6,xmm6,2
+        xor     edi,ebx
+        shld    ebp,ebp,5
+        add     edx,edi
+        xor     esi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        add     ecx,DWORD[40+rsp]
+        and     esi,eax
+        vpor    xmm6,xmm6,xmm8
+        xor     eax,ebx
+        shrd    ebp,ebp,7
+        mov     edi,edx
+        xor     esi,eax
+        shld    edx,edx,5
+        add     ecx,esi
+        xor     edi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        add     ebx,DWORD[44+rsp]
+        and     edi,ebp
+        xor     ebp,eax
+        shrd    edx,edx,7
+        mov     esi,ecx
+        xor     edi,ebp
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        vpalignr        xmm8,xmm6,xmm5,8
+        vpxor   xmm7,xmm7,xmm3
+        add     eax,DWORD[48+rsp]
+        and     esi,edx
+        xor     edx,ebp
+        shrd    ecx,ecx,7
+        vpxor   xmm7,xmm7,xmm0
+        mov     edi,ebx
+        xor     esi,edx
+        vpaddd  xmm9,xmm11,xmm6
+        vmovdqa xmm11,XMMWORD[32+r14]
+        shld    ebx,ebx,5
+        add     eax,esi
+        vpxor   xmm7,xmm7,xmm8
+        xor     edi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     ebp,DWORD[52+rsp]
+        vpsrld  xmm8,xmm7,30
+        vmovdqa XMMWORD[32+rsp],xmm9
+        and     edi,ecx
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        mov     esi,eax
+        vpslld  xmm7,xmm7,2
+        xor     edi,ecx
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        add     edx,DWORD[56+rsp]
+        and     esi,ebx
+        vpor    xmm7,xmm7,xmm8
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        mov     edi,ebp
+        xor     esi,ebx
+        shld    ebp,ebp,5
+        add     edx,esi
+        xor     edi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        add     ecx,DWORD[60+rsp]
+        and     edi,eax
+        xor     eax,ebx
+        shrd    ebp,ebp,7
+        mov     esi,edx
+        xor     edi,eax
+        shld    edx,edx,5
+        add     ecx,edi
+        xor     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        vpalignr        xmm8,xmm7,xmm6,8
+        vpxor   xmm0,xmm0,xmm4
+        add     ebx,DWORD[rsp]
+        and     esi,ebp
+        xor     ebp,eax
+        shrd    edx,edx,7
+        vpxor   xmm0,xmm0,xmm1
+        mov     edi,ecx
+        xor     esi,ebp
+        vpaddd  xmm9,xmm11,xmm7
+        shld    ecx,ecx,5
+        add     ebx,esi
+        vpxor   xmm0,xmm0,xmm8
+        xor     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[4+rsp]
+        vpsrld  xmm8,xmm0,30
+        vmovdqa XMMWORD[48+rsp],xmm9
+        and     edi,edx
+        xor     edx,ebp
+        shrd    ecx,ecx,7
+        mov     esi,ebx
+        vpslld  xmm0,xmm0,2
+        xor     edi,edx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     ebp,DWORD[8+rsp]
+        and     esi,ecx
+        vpor    xmm0,xmm0,xmm8
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        mov     edi,eax
+        xor     esi,ecx
+        shld    eax,eax,5
+        add     ebp,esi
+        xor     edi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        add     edx,DWORD[12+rsp]
+        and     edi,ebx
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        mov     esi,ebp
+        xor     edi,ebx
+        shld    ebp,ebp,5
+        add     edx,edi
+        xor     esi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        vpalignr        xmm8,xmm0,xmm7,8
+        vpxor   xmm1,xmm1,xmm5
+        add     ecx,DWORD[16+rsp]
+        and     esi,eax
+        xor     eax,ebx
+        shrd    ebp,ebp,7
+        vpxor   xmm1,xmm1,xmm2
+        mov     edi,edx
+        xor     esi,eax
+        vpaddd  xmm9,xmm11,xmm0
+        shld    edx,edx,5
+        add     ecx,esi
+        vpxor   xmm1,xmm1,xmm8
+        xor     edi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        add     ebx,DWORD[20+rsp]
+        vpsrld  xmm8,xmm1,30
+        vmovdqa XMMWORD[rsp],xmm9
+        and     edi,ebp
+        xor     ebp,eax
+        shrd    edx,edx,7
+        mov     esi,ecx
+        vpslld  xmm1,xmm1,2
+        xor     edi,ebp
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[24+rsp]
+        and     esi,edx
+        vpor    xmm1,xmm1,xmm8
+        xor     edx,ebp
+        shrd    ecx,ecx,7
+        mov     edi,ebx
+        xor     esi,edx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     edi,ecx
+        xor     ecx,edx
+        add     eax,ebx
+        add     ebp,DWORD[28+rsp]
+        and     edi,ecx
+        xor     ecx,edx
+        shrd    ebx,ebx,7
+        mov     esi,eax
+        xor     edi,ecx
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ebx
+        xor     ebx,ecx
+        add     ebp,eax
+        vpalignr        xmm8,xmm1,xmm0,8
+        vpxor   xmm2,xmm2,xmm6
+        add     edx,DWORD[32+rsp]
+        and     esi,ebx
+        xor     ebx,ecx
+        shrd    eax,eax,7
+        vpxor   xmm2,xmm2,xmm3
+        mov     edi,ebp
+        xor     esi,ebx
+        vpaddd  xmm9,xmm11,xmm1
+        shld    ebp,ebp,5
+        add     edx,esi
+        vpxor   xmm2,xmm2,xmm8
+        xor     edi,eax
+        xor     eax,ebx
+        add     edx,ebp
+        add     ecx,DWORD[36+rsp]
+        vpsrld  xmm8,xmm2,30
+        vmovdqa XMMWORD[16+rsp],xmm9
+        and     edi,eax
+        xor     eax,ebx
+        shrd    ebp,ebp,7
+        mov     esi,edx
+        vpslld  xmm2,xmm2,2
+        xor     edi,eax
+        shld    edx,edx,5
+        add     ecx,edi
+        xor     esi,ebp
+        xor     ebp,eax
+        add     ecx,edx
+        add     ebx,DWORD[40+rsp]
+        and     esi,ebp
+        vpor    xmm2,xmm2,xmm8
+        xor     ebp,eax
+        shrd    edx,edx,7
+        mov     edi,ecx
+        xor     esi,ebp
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,edx
+        xor     edx,ebp
+        add     ebx,ecx
+        add     eax,DWORD[44+rsp]
+        and     edi,edx
+        xor     edx,ebp
+        shrd    ecx,ecx,7
+        mov     esi,ebx
+        xor     edi,edx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,edx
+        add     eax,ebx
+        vpalignr        xmm8,xmm2,xmm1,8
+        vpxor   xmm3,xmm3,xmm7
+        add     ebp,DWORD[48+rsp]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        vpxor   xmm3,xmm3,xmm4
+        add     ebp,esi
+        xor     edi,ecx
+        vpaddd  xmm9,xmm11,xmm2
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        vpxor   xmm3,xmm3,xmm8
+        add     edx,DWORD[52+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        vpsrld  xmm8,xmm3,30
+        vmovdqa XMMWORD[32+rsp],xmm9
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vpslld  xmm3,xmm3,2
+        add     ecx,DWORD[56+rsp]
+        xor     esi,eax
+        mov     edi,edx
+        shld    edx,edx,5
+        add     ecx,esi
+        xor     edi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vpor    xmm3,xmm3,xmm8
+        add     ebx,DWORD[60+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[rsp]
+        vpaddd  xmm9,xmm11,xmm3
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        vmovdqa XMMWORD[48+rsp],xmm9
+        xor     edi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[4+rsp]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[8+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        add     ecx,DWORD[12+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,edi
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        cmp     r9,r10
+        je      NEAR $L$done_avx
+        vmovdqa xmm6,XMMWORD[64+r14]
+        vmovdqa xmm11,XMMWORD[((-64))+r14]
+        vmovdqu xmm0,XMMWORD[r9]
+        vmovdqu xmm1,XMMWORD[16+r9]
+        vmovdqu xmm2,XMMWORD[32+r9]
+        vmovdqu xmm3,XMMWORD[48+r9]
+        vpshufb xmm0,xmm0,xmm6
+        add     r9,64
+        add     ebx,DWORD[16+rsp]
+        xor     esi,ebp
+        vpshufb xmm1,xmm1,xmm6
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        vpaddd  xmm4,xmm0,xmm11
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        vmovdqa XMMWORD[rsp],xmm4
+        add     eax,DWORD[20+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[24+rsp]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[28+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        add     ecx,DWORD[32+rsp]
+        xor     esi,eax
+        vpshufb xmm2,xmm2,xmm6
+        mov     edi,edx
+        shld    edx,edx,5
+        vpaddd  xmm5,xmm1,xmm11
+        add     ecx,esi
+        xor     edi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        vmovdqa XMMWORD[16+rsp],xmm5
+        add     ebx,DWORD[36+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[40+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     edi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[44+rsp]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[48+rsp]
+        xor     esi,ebx
+        vpshufb xmm3,xmm3,xmm6
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        vpaddd  xmm6,xmm2,xmm11
+        add     edx,esi
+        xor     edi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        vmovdqa XMMWORD[32+rsp],xmm6
+        add     ecx,DWORD[52+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,edi
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[56+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[60+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     eax,DWORD[r8]
+        add     esi,DWORD[4+r8]
+        add     ecx,DWORD[8+r8]
+        add     edx,DWORD[12+r8]
+        mov     DWORD[r8],eax
+        add     ebp,DWORD[16+r8]
+        mov     DWORD[4+r8],esi
+        mov     ebx,esi
+        mov     DWORD[8+r8],ecx
+        mov     edi,ecx
+        mov     DWORD[12+r8],edx
+        xor     edi,edx
+        mov     DWORD[16+r8],ebp
+        and     esi,edi
+        jmp     NEAR $L$oop_avx
+
+ALIGN   16
+$L$done_avx:
+        add     ebx,DWORD[16+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[20+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        xor     esi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[24+rsp]
+        xor     esi,ecx
+        mov     edi,eax
+        shld    eax,eax,5
+        add     ebp,esi
+        xor     edi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[28+rsp]
+        xor     edi,ebx
+        mov     esi,ebp
+        shld    ebp,ebp,5
+        add     edx,edi
+        xor     esi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        add     ecx,DWORD[32+rsp]
+        xor     esi,eax
+        mov     edi,edx
+        shld    edx,edx,5
+        add     ecx,esi
+        xor     edi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[36+rsp]
+        xor     edi,ebp
+        mov     esi,ecx
+        shld    ecx,ecx,5
+        add     ebx,edi
+        xor     esi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[40+rsp]
+        xor     esi,edx
+        mov     edi,ebx
+        shld    ebx,ebx,5
+        add     eax,esi
+        xor     edi,edx
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        add     ebp,DWORD[44+rsp]
+        xor     edi,ecx
+        mov     esi,eax
+        shld    eax,eax,5
+        add     ebp,edi
+        xor     esi,ecx
+        shrd    ebx,ebx,7
+        add     ebp,eax
+        add     edx,DWORD[48+rsp]
+        xor     esi,ebx
+        mov     edi,ebp
+        shld    ebp,ebp,5
+        add     edx,esi
+        xor     edi,ebx
+        shrd    eax,eax,7
+        add     edx,ebp
+        add     ecx,DWORD[52+rsp]
+        xor     edi,eax
+        mov     esi,edx
+        shld    edx,edx,5
+        add     ecx,edi
+        xor     esi,eax
+        shrd    ebp,ebp,7
+        add     ecx,edx
+        add     ebx,DWORD[56+rsp]
+        xor     esi,ebp
+        mov     edi,ecx
+        shld    ecx,ecx,5
+        add     ebx,esi
+        xor     edi,ebp
+        shrd    edx,edx,7
+        add     ebx,ecx
+        add     eax,DWORD[60+rsp]
+        xor     edi,edx
+        mov     esi,ebx
+        shld    ebx,ebx,5
+        add     eax,edi
+        shrd    ecx,ecx,7
+        add     eax,ebx
+        vzeroupper
+
+        add     eax,DWORD[r8]
+        add     esi,DWORD[4+r8]
+        add     ecx,DWORD[8+r8]
+        mov     DWORD[r8],eax
+        add     edx,DWORD[12+r8]
+        mov     DWORD[4+r8],esi
+        add     ebp,DWORD[16+r8]
+        mov     DWORD[8+r8],ecx
+        mov     DWORD[12+r8],edx
+        mov     DWORD[16+r8],ebp
+        movaps  xmm6,XMMWORD[((-40-96))+r11]
+        movaps  xmm7,XMMWORD[((-40-80))+r11]
+        movaps  xmm8,XMMWORD[((-40-64))+r11]
+        movaps  xmm9,XMMWORD[((-40-48))+r11]
+        movaps  xmm10,XMMWORD[((-40-32))+r11]
+        movaps  xmm11,XMMWORD[((-40-16))+r11]
+        mov     r14,QWORD[((-40))+r11]
+
+        mov     r13,QWORD[((-32))+r11]
+
+        mov     r12,QWORD[((-24))+r11]
+
+        mov     rbp,QWORD[((-16))+r11]
+
+        mov     rbx,QWORD[((-8))+r11]
+
+        lea     rsp,[r11]
+
+$L$epilogue_avx:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha1_block_data_order_avx:
+
+ALIGN   16
+sha1_block_data_order_avx2:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha1_block_data_order_avx2:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+_avx2_shortcut:
+
+        mov     r11,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        vzeroupper
+        lea     rsp,[((-96))+rsp]
+        vmovaps XMMWORD[(-40-96)+r11],xmm6
+        vmovaps XMMWORD[(-40-80)+r11],xmm7
+        vmovaps XMMWORD[(-40-64)+r11],xmm8
+        vmovaps XMMWORD[(-40-48)+r11],xmm9
+        vmovaps XMMWORD[(-40-32)+r11],xmm10
+        vmovaps XMMWORD[(-40-16)+r11],xmm11
+$L$prologue_avx2:
+        mov     r8,rdi
+        mov     r9,rsi
+        mov     r10,rdx
+
+        lea     rsp,[((-640))+rsp]
+        shl     r10,6
+        lea     r13,[64+r9]
+        and     rsp,-128
+        add     r10,r9
+        lea     r14,[((K_XX_XX+64))]
+
+        mov     eax,DWORD[r8]
+        cmp     r13,r10
+        cmovae  r13,r9
+        mov     ebp,DWORD[4+r8]
+        mov     ecx,DWORD[8+r8]
+        mov     edx,DWORD[12+r8]
+        mov     esi,DWORD[16+r8]
+        vmovdqu ymm6,YMMWORD[64+r14]
+
+        vmovdqu xmm0,XMMWORD[r9]
+        vmovdqu xmm1,XMMWORD[16+r9]
+        vmovdqu xmm2,XMMWORD[32+r9]
+        vmovdqu xmm3,XMMWORD[48+r9]
+        lea     r9,[64+r9]
+        vinserti128     ymm0,ymm0,XMMWORD[r13],1
+        vinserti128     ymm1,ymm1,XMMWORD[16+r13],1
+        vpshufb ymm0,ymm0,ymm6
+        vinserti128     ymm2,ymm2,XMMWORD[32+r13],1
+        vpshufb ymm1,ymm1,ymm6
+        vinserti128     ymm3,ymm3,XMMWORD[48+r13],1
+        vpshufb ymm2,ymm2,ymm6
+        vmovdqu ymm11,YMMWORD[((-64))+r14]
+        vpshufb ymm3,ymm3,ymm6
+
+        vpaddd  ymm4,ymm0,ymm11
+        vpaddd  ymm5,ymm1,ymm11
+        vmovdqu YMMWORD[rsp],ymm4
+        vpaddd  ymm6,ymm2,ymm11
+        vmovdqu YMMWORD[32+rsp],ymm5
+        vpaddd  ymm7,ymm3,ymm11
+        vmovdqu YMMWORD[64+rsp],ymm6
+        vmovdqu YMMWORD[96+rsp],ymm7
+        vpalignr        ymm4,ymm1,ymm0,8
+        vpsrldq ymm8,ymm3,4
+        vpxor   ymm4,ymm4,ymm0
+        vpxor   ymm8,ymm8,ymm2
+        vpxor   ymm4,ymm4,ymm8
+        vpsrld  ymm8,ymm4,31
+        vpslldq ymm10,ymm4,12
+        vpaddd  ymm4,ymm4,ymm4
+        vpsrld  ymm9,ymm10,30
+        vpor    ymm4,ymm4,ymm8
+        vpslld  ymm10,ymm10,2
+        vpxor   ymm4,ymm4,ymm9
+        vpxor   ymm4,ymm4,ymm10
+        vpaddd  ymm9,ymm4,ymm11
+        vmovdqu YMMWORD[128+rsp],ymm9
+        vpalignr        ymm5,ymm2,ymm1,8
+        vpsrldq ymm8,ymm4,4
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm8,ymm8,ymm3
+        vpxor   ymm5,ymm5,ymm8
+        vpsrld  ymm8,ymm5,31
+        vmovdqu ymm11,YMMWORD[((-32))+r14]
+        vpslldq ymm10,ymm5,12
+        vpaddd  ymm5,ymm5,ymm5
+        vpsrld  ymm9,ymm10,30
+        vpor    ymm5,ymm5,ymm8
+        vpslld  ymm10,ymm10,2
+        vpxor   ymm5,ymm5,ymm9
+        vpxor   ymm5,ymm5,ymm10
+        vpaddd  ymm9,ymm5,ymm11
+        vmovdqu YMMWORD[160+rsp],ymm9
+        vpalignr        ymm6,ymm3,ymm2,8
+        vpsrldq ymm8,ymm5,4
+        vpxor   ymm6,ymm6,ymm2
+        vpxor   ymm8,ymm8,ymm4
+        vpxor   ymm6,ymm6,ymm8
+        vpsrld  ymm8,ymm6,31
+        vpslldq ymm10,ymm6,12
+        vpaddd  ymm6,ymm6,ymm6
+        vpsrld  ymm9,ymm10,30
+        vpor    ymm6,ymm6,ymm8
+        vpslld  ymm10,ymm10,2
+        vpxor   ymm6,ymm6,ymm9
+        vpxor   ymm6,ymm6,ymm10
+        vpaddd  ymm9,ymm6,ymm11
+        vmovdqu YMMWORD[192+rsp],ymm9
+        vpalignr        ymm7,ymm4,ymm3,8
+        vpsrldq ymm8,ymm6,4
+        vpxor   ymm7,ymm7,ymm3
+        vpxor   ymm8,ymm8,ymm5
+        vpxor   ymm7,ymm7,ymm8
+        vpsrld  ymm8,ymm7,31
+        vpslldq ymm10,ymm7,12
+        vpaddd  ymm7,ymm7,ymm7
+        vpsrld  ymm9,ymm10,30
+        vpor    ymm7,ymm7,ymm8
+        vpslld  ymm10,ymm10,2
+        vpxor   ymm7,ymm7,ymm9
+        vpxor   ymm7,ymm7,ymm10
+        vpaddd  ymm9,ymm7,ymm11
+        vmovdqu YMMWORD[224+rsp],ymm9
+        lea     r13,[128+rsp]
+        jmp     NEAR $L$oop_avx2
+ALIGN   32
+$L$oop_avx2:
+        rorx    ebx,ebp,2
+        andn    edi,ebp,edx
+        and     ebp,ecx
+        xor     ebp,edi
+        jmp     NEAR $L$align32_1
+ALIGN   32
+$L$align32_1:
+        vpalignr        ymm8,ymm7,ymm6,8
+        vpxor   ymm0,ymm0,ymm4
+        add     esi,DWORD[((-128))+r13]
+        andn    edi,eax,ecx
+        vpxor   ymm0,ymm0,ymm1
+        add     esi,ebp
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        vpxor   ymm0,ymm0,ymm8
+        and     eax,ebx
+        add     esi,r12d
+        xor     eax,edi
+        vpsrld  ymm8,ymm0,30
+        vpslld  ymm0,ymm0,2
+        add     edx,DWORD[((-124))+r13]
+        andn    edi,esi,ebx
+        add     edx,eax
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        and     esi,ebp
+        vpor    ymm0,ymm0,ymm8
+        add     edx,r12d
+        xor     esi,edi
+        add     ecx,DWORD[((-120))+r13]
+        andn    edi,edx,ebp
+        vpaddd  ymm9,ymm0,ymm11
+        add     ecx,esi
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        and     edx,eax
+        vmovdqu YMMWORD[256+rsp],ymm9
+        add     ecx,r12d
+        xor     edx,edi
+        add     ebx,DWORD[((-116))+r13]
+        andn    edi,ecx,eax
+        add     ebx,edx
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        and     ecx,esi
+        add     ebx,r12d
+        xor     ecx,edi
+        add     ebp,DWORD[((-96))+r13]
+        andn    edi,ebx,esi
+        add     ebp,ecx
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        and     ebx,edx
+        add     ebp,r12d
+        xor     ebx,edi
+        vpalignr        ymm8,ymm0,ymm7,8
+        vpxor   ymm1,ymm1,ymm5
+        add     eax,DWORD[((-92))+r13]
+        andn    edi,ebp,edx
+        vpxor   ymm1,ymm1,ymm2
+        add     eax,ebx
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        vpxor   ymm1,ymm1,ymm8
+        and     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edi
+        vpsrld  ymm8,ymm1,30
+        vpslld  ymm1,ymm1,2
+        add     esi,DWORD[((-88))+r13]
+        andn    edi,eax,ecx
+        add     esi,ebp
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        and     eax,ebx
+        vpor    ymm1,ymm1,ymm8
+        add     esi,r12d
+        xor     eax,edi
+        add     edx,DWORD[((-84))+r13]
+        andn    edi,esi,ebx
+        vpaddd  ymm9,ymm1,ymm11
+        add     edx,eax
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        and     esi,ebp
+        vmovdqu YMMWORD[288+rsp],ymm9
+        add     edx,r12d
+        xor     esi,edi
+        add     ecx,DWORD[((-64))+r13]
+        andn    edi,edx,ebp
+        add     ecx,esi
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        and     edx,eax
+        add     ecx,r12d
+        xor     edx,edi
+        add     ebx,DWORD[((-60))+r13]
+        andn    edi,ecx,eax
+        add     ebx,edx
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        and     ecx,esi
+        add     ebx,r12d
+        xor     ecx,edi
+        vpalignr        ymm8,ymm1,ymm0,8
+        vpxor   ymm2,ymm2,ymm6
+        add     ebp,DWORD[((-56))+r13]
+        andn    edi,ebx,esi
+        vpxor   ymm2,ymm2,ymm3
+        vmovdqu ymm11,YMMWORD[r14]
+        add     ebp,ecx
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        vpxor   ymm2,ymm2,ymm8
+        and     ebx,edx
+        add     ebp,r12d
+        xor     ebx,edi
+        vpsrld  ymm8,ymm2,30
+        vpslld  ymm2,ymm2,2
+        add     eax,DWORD[((-52))+r13]
+        andn    edi,ebp,edx
+        add     eax,ebx
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        and     ebp,ecx
+        vpor    ymm2,ymm2,ymm8
+        add     eax,r12d
+        xor     ebp,edi
+        add     esi,DWORD[((-32))+r13]
+        andn    edi,eax,ecx
+        vpaddd  ymm9,ymm2,ymm11
+        add     esi,ebp
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        and     eax,ebx
+        vmovdqu YMMWORD[320+rsp],ymm9
+        add     esi,r12d
+        xor     eax,edi
+        add     edx,DWORD[((-28))+r13]
+        andn    edi,esi,ebx
+        add     edx,eax
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        and     esi,ebp
+        add     edx,r12d
+        xor     esi,edi
+        add     ecx,DWORD[((-24))+r13]
+        andn    edi,edx,ebp
+        add     ecx,esi
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        and     edx,eax
+        add     ecx,r12d
+        xor     edx,edi
+        vpalignr        ymm8,ymm2,ymm1,8
+        vpxor   ymm3,ymm3,ymm7
+        add     ebx,DWORD[((-20))+r13]
+        andn    edi,ecx,eax
+        vpxor   ymm3,ymm3,ymm4
+        add     ebx,edx
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        vpxor   ymm3,ymm3,ymm8
+        and     ecx,esi
+        add     ebx,r12d
+        xor     ecx,edi
+        vpsrld  ymm8,ymm3,30
+        vpslld  ymm3,ymm3,2
+        add     ebp,DWORD[r13]
+        andn    edi,ebx,esi
+        add     ebp,ecx
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        and     ebx,edx
+        vpor    ymm3,ymm3,ymm8
+        add     ebp,r12d
+        xor     ebx,edi
+        add     eax,DWORD[4+r13]
+        andn    edi,ebp,edx
+        vpaddd  ymm9,ymm3,ymm11
+        add     eax,ebx
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        and     ebp,ecx
+        vmovdqu YMMWORD[352+rsp],ymm9
+        add     eax,r12d
+        xor     ebp,edi
+        add     esi,DWORD[8+r13]
+        andn    edi,eax,ecx
+        add     esi,ebp
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        and     eax,ebx
+        add     esi,r12d
+        xor     eax,edi
+        add     edx,DWORD[12+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        vpalignr        ymm8,ymm3,ymm2,8
+        vpxor   ymm4,ymm4,ymm0
+        add     ecx,DWORD[32+r13]
+        lea     ecx,[rsi*1+rcx]
+        vpxor   ymm4,ymm4,ymm5
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        vpxor   ymm4,ymm4,ymm8
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[36+r13]
+        vpsrld  ymm8,ymm4,30
+        vpslld  ymm4,ymm4,2
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        vpor    ymm4,ymm4,ymm8
+        add     ebp,DWORD[40+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        vpaddd  ymm9,ymm4,ymm11
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        add     eax,DWORD[44+r13]
+        vmovdqu YMMWORD[384+rsp],ymm9
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        add     esi,DWORD[64+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        vpalignr        ymm8,ymm4,ymm3,8
+        vpxor   ymm5,ymm5,ymm1
+        add     edx,DWORD[68+r13]
+        lea     edx,[rax*1+rdx]
+        vpxor   ymm5,ymm5,ymm6
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        vpxor   ymm5,ymm5,ymm8
+        add     edx,r12d
+        xor     esi,ebx
+        add     ecx,DWORD[72+r13]
+        vpsrld  ymm8,ymm5,30
+        vpslld  ymm5,ymm5,2
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        vpor    ymm5,ymm5,ymm8
+        add     ebx,DWORD[76+r13]
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        vpaddd  ymm9,ymm5,ymm11
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        add     ebp,DWORD[96+r13]
+        vmovdqu YMMWORD[416+rsp],ymm9
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        add     eax,DWORD[100+r13]
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        vpalignr        ymm8,ymm5,ymm4,8
+        vpxor   ymm6,ymm6,ymm2
+        add     esi,DWORD[104+r13]
+        lea     esi,[rbp*1+rsi]
+        vpxor   ymm6,ymm6,ymm7
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        vpxor   ymm6,ymm6,ymm8
+        add     esi,r12d
+        xor     eax,ecx
+        add     edx,DWORD[108+r13]
+        lea     r13,[256+r13]
+        vpsrld  ymm8,ymm6,30
+        vpslld  ymm6,ymm6,2
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        vpor    ymm6,ymm6,ymm8
+        add     ecx,DWORD[((-128))+r13]
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        vpaddd  ymm9,ymm6,ymm11
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[((-124))+r13]
+        vmovdqu YMMWORD[448+rsp],ymm9
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        add     ebp,DWORD[((-120))+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        vpalignr        ymm8,ymm6,ymm5,8
+        vpxor   ymm7,ymm7,ymm3
+        add     eax,DWORD[((-116))+r13]
+        lea     eax,[rbx*1+rax]
+        vpxor   ymm7,ymm7,ymm0
+        vmovdqu ymm11,YMMWORD[32+r14]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        vpxor   ymm7,ymm7,ymm8
+        add     eax,r12d
+        xor     ebp,edx
+        add     esi,DWORD[((-96))+r13]
+        vpsrld  ymm8,ymm7,30
+        vpslld  ymm7,ymm7,2
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        vpor    ymm7,ymm7,ymm8
+        add     edx,DWORD[((-92))+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        vpaddd  ymm9,ymm7,ymm11
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        add     ecx,DWORD[((-88))+r13]
+        vmovdqu YMMWORD[480+rsp],ymm9
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[((-84))+r13]
+        mov     edi,esi
+        xor     edi,eax
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        and     ecx,edi
+        jmp     NEAR $L$align32_2
+ALIGN   32
+$L$align32_2:
+        vpalignr        ymm8,ymm7,ymm6,8
+        vpxor   ymm0,ymm0,ymm4
+        add     ebp,DWORD[((-64))+r13]
+        xor     ecx,esi
+        vpxor   ymm0,ymm0,ymm1
+        mov     edi,edx
+        xor     edi,esi
+        lea     ebp,[rbp*1+rcx]
+        vpxor   ymm0,ymm0,ymm8
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        vpsrld  ymm8,ymm0,30
+        vpslld  ymm0,ymm0,2
+        add     ebp,r12d
+        and     ebx,edi
+        add     eax,DWORD[((-60))+r13]
+        xor     ebx,edx
+        mov     edi,ecx
+        xor     edi,edx
+        vpor    ymm0,ymm0,ymm8
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        vpaddd  ymm9,ymm0,ymm11
+        add     eax,r12d
+        and     ebp,edi
+        add     esi,DWORD[((-56))+r13]
+        xor     ebp,ecx
+        vmovdqu YMMWORD[512+rsp],ymm9
+        mov     edi,ebx
+        xor     edi,ecx
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        and     eax,edi
+        add     edx,DWORD[((-52))+r13]
+        xor     eax,ebx
+        mov     edi,ebp
+        xor     edi,ebx
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        and     esi,edi
+        add     ecx,DWORD[((-32))+r13]
+        xor     esi,ebp
+        mov     edi,eax
+        xor     edi,ebp
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        and     edx,edi
+        vpalignr        ymm8,ymm0,ymm7,8
+        vpxor   ymm1,ymm1,ymm5
+        add     ebx,DWORD[((-28))+r13]
+        xor     edx,eax
+        vpxor   ymm1,ymm1,ymm2
+        mov     edi,esi
+        xor     edi,eax
+        lea     ebx,[rdx*1+rbx]
+        vpxor   ymm1,ymm1,ymm8
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        vpsrld  ymm8,ymm1,30
+        vpslld  ymm1,ymm1,2
+        add     ebx,r12d
+        and     ecx,edi
+        add     ebp,DWORD[((-24))+r13]
+        xor     ecx,esi
+        mov     edi,edx
+        xor     edi,esi
+        vpor    ymm1,ymm1,ymm8
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        vpaddd  ymm9,ymm1,ymm11
+        add     ebp,r12d
+        and     ebx,edi
+        add     eax,DWORD[((-20))+r13]
+        xor     ebx,edx
+        vmovdqu YMMWORD[544+rsp],ymm9
+        mov     edi,ecx
+        xor     edi,edx
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        and     ebp,edi
+        add     esi,DWORD[r13]
+        xor     ebp,ecx
+        mov     edi,ebx
+        xor     edi,ecx
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        and     eax,edi
+        add     edx,DWORD[4+r13]
+        xor     eax,ebx
+        mov     edi,ebp
+        xor     edi,ebx
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        and     esi,edi
+        vpalignr        ymm8,ymm1,ymm0,8
+        vpxor   ymm2,ymm2,ymm6
+        add     ecx,DWORD[8+r13]
+        xor     esi,ebp
+        vpxor   ymm2,ymm2,ymm3
+        mov     edi,eax
+        xor     edi,ebp
+        lea     ecx,[rsi*1+rcx]
+        vpxor   ymm2,ymm2,ymm8
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        vpsrld  ymm8,ymm2,30
+        vpslld  ymm2,ymm2,2
+        add     ecx,r12d
+        and     edx,edi
+        add     ebx,DWORD[12+r13]
+        xor     edx,eax
+        mov     edi,esi
+        xor     edi,eax
+        vpor    ymm2,ymm2,ymm8
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        vpaddd  ymm9,ymm2,ymm11
+        add     ebx,r12d
+        and     ecx,edi
+        add     ebp,DWORD[32+r13]
+        xor     ecx,esi
+        vmovdqu YMMWORD[576+rsp],ymm9
+        mov     edi,edx
+        xor     edi,esi
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        and     ebx,edi
+        add     eax,DWORD[36+r13]
+        xor     ebx,edx
+        mov     edi,ecx
+        xor     edi,edx
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        and     ebp,edi
+        add     esi,DWORD[40+r13]
+        xor     ebp,ecx
+        mov     edi,ebx
+        xor     edi,ecx
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        and     eax,edi
+        vpalignr        ymm8,ymm2,ymm1,8
+        vpxor   ymm3,ymm3,ymm7
+        add     edx,DWORD[44+r13]
+        xor     eax,ebx
+        vpxor   ymm3,ymm3,ymm4
+        mov     edi,ebp
+        xor     edi,ebx
+        lea     edx,[rax*1+rdx]
+        vpxor   ymm3,ymm3,ymm8
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        vpsrld  ymm8,ymm3,30
+        vpslld  ymm3,ymm3,2
+        add     edx,r12d
+        and     esi,edi
+        add     ecx,DWORD[64+r13]
+        xor     esi,ebp
+        mov     edi,eax
+        xor     edi,ebp
+        vpor    ymm3,ymm3,ymm8
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        vpaddd  ymm9,ymm3,ymm11
+        add     ecx,r12d
+        and     edx,edi
+        add     ebx,DWORD[68+r13]
+        xor     edx,eax
+        vmovdqu YMMWORD[608+rsp],ymm9
+        mov     edi,esi
+        xor     edi,eax
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        and     ecx,edi
+        add     ebp,DWORD[72+r13]
+        xor     ecx,esi
+        mov     edi,edx
+        xor     edi,esi
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        and     ebx,edi
+        add     eax,DWORD[76+r13]
+        xor     ebx,edx
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        add     esi,DWORD[96+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        add     edx,DWORD[100+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        add     ecx,DWORD[104+r13]
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[108+r13]
+        lea     r13,[256+r13]
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        add     ebp,DWORD[((-128))+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        add     eax,DWORD[((-124))+r13]
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        add     esi,DWORD[((-120))+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        add     edx,DWORD[((-116))+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        add     ecx,DWORD[((-96))+r13]
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[((-92))+r13]
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        add     ebp,DWORD[((-88))+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        add     eax,DWORD[((-84))+r13]
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        add     esi,DWORD[((-64))+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        add     edx,DWORD[((-60))+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        add     ecx,DWORD[((-56))+r13]
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[((-52))+r13]
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        add     ebp,DWORD[((-32))+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        add     eax,DWORD[((-28))+r13]
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        add     esi,DWORD[((-24))+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        add     edx,DWORD[((-20))+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        add     edx,r12d
+        lea     r13,[128+r9]
+        lea     rdi,[128+r9]
+        cmp     r13,r10
+        cmovae  r13,r9
+
+
+        add     edx,DWORD[r8]
+        add     esi,DWORD[4+r8]
+        add     ebp,DWORD[8+r8]
+        mov     DWORD[r8],edx
+        add     ebx,DWORD[12+r8]
+        mov     DWORD[4+r8],esi
+        mov     eax,edx
+        add     ecx,DWORD[16+r8]
+        mov     r12d,ebp
+        mov     DWORD[8+r8],ebp
+        mov     edx,ebx
+
+        mov     DWORD[12+r8],ebx
+        mov     ebp,esi
+        mov     DWORD[16+r8],ecx
+
+        mov     esi,ecx
+        mov     ecx,r12d
+
+
+        cmp     r9,r10
+        je      NEAR $L$done_avx2
+        vmovdqu ymm6,YMMWORD[64+r14]
+        cmp     rdi,r10
+        ja      NEAR $L$ast_avx2
+
+        vmovdqu xmm0,XMMWORD[((-64))+rdi]
+        vmovdqu xmm1,XMMWORD[((-48))+rdi]
+        vmovdqu xmm2,XMMWORD[((-32))+rdi]
+        vmovdqu xmm3,XMMWORD[((-16))+rdi]
+        vinserti128     ymm0,ymm0,XMMWORD[r13],1
+        vinserti128     ymm1,ymm1,XMMWORD[16+r13],1
+        vinserti128     ymm2,ymm2,XMMWORD[32+r13],1
+        vinserti128     ymm3,ymm3,XMMWORD[48+r13],1
+        jmp     NEAR $L$ast_avx2
+
+ALIGN   32
+$L$ast_avx2:
+        lea     r13,[((128+16))+rsp]
+        rorx    ebx,ebp,2
+        andn    edi,ebp,edx
+        and     ebp,ecx
+        xor     ebp,edi
+        sub     r9,-128
+        add     esi,DWORD[((-128))+r13]
+        andn    edi,eax,ecx
+        add     esi,ebp
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        and     eax,ebx
+        add     esi,r12d
+        xor     eax,edi
+        add     edx,DWORD[((-124))+r13]
+        andn    edi,esi,ebx
+        add     edx,eax
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        and     esi,ebp
+        add     edx,r12d
+        xor     esi,edi
+        add     ecx,DWORD[((-120))+r13]
+        andn    edi,edx,ebp
+        add     ecx,esi
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        and     edx,eax
+        add     ecx,r12d
+        xor     edx,edi
+        add     ebx,DWORD[((-116))+r13]
+        andn    edi,ecx,eax
+        add     ebx,edx
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        and     ecx,esi
+        add     ebx,r12d
+        xor     ecx,edi
+        add     ebp,DWORD[((-96))+r13]
+        andn    edi,ebx,esi
+        add     ebp,ecx
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        and     ebx,edx
+        add     ebp,r12d
+        xor     ebx,edi
+        add     eax,DWORD[((-92))+r13]
+        andn    edi,ebp,edx
+        add     eax,ebx
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        and     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edi
+        add     esi,DWORD[((-88))+r13]
+        andn    edi,eax,ecx
+        add     esi,ebp
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        and     eax,ebx
+        add     esi,r12d
+        xor     eax,edi
+        add     edx,DWORD[((-84))+r13]
+        andn    edi,esi,ebx
+        add     edx,eax
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        and     esi,ebp
+        add     edx,r12d
+        xor     esi,edi
+        add     ecx,DWORD[((-64))+r13]
+        andn    edi,edx,ebp
+        add     ecx,esi
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        and     edx,eax
+        add     ecx,r12d
+        xor     edx,edi
+        add     ebx,DWORD[((-60))+r13]
+        andn    edi,ecx,eax
+        add     ebx,edx
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        and     ecx,esi
+        add     ebx,r12d
+        xor     ecx,edi
+        add     ebp,DWORD[((-56))+r13]
+        andn    edi,ebx,esi
+        add     ebp,ecx
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        and     ebx,edx
+        add     ebp,r12d
+        xor     ebx,edi
+        add     eax,DWORD[((-52))+r13]
+        andn    edi,ebp,edx
+        add     eax,ebx
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        and     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edi
+        add     esi,DWORD[((-32))+r13]
+        andn    edi,eax,ecx
+        add     esi,ebp
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        and     eax,ebx
+        add     esi,r12d
+        xor     eax,edi
+        add     edx,DWORD[((-28))+r13]
+        andn    edi,esi,ebx
+        add     edx,eax
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        and     esi,ebp
+        add     edx,r12d
+        xor     esi,edi
+        add     ecx,DWORD[((-24))+r13]
+        andn    edi,edx,ebp
+        add     ecx,esi
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        and     edx,eax
+        add     ecx,r12d
+        xor     edx,edi
+        add     ebx,DWORD[((-20))+r13]
+        andn    edi,ecx,eax
+        add     ebx,edx
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        and     ecx,esi
+        add     ebx,r12d
+        xor     ecx,edi
+        add     ebp,DWORD[r13]
+        andn    edi,ebx,esi
+        add     ebp,ecx
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        and     ebx,edx
+        add     ebp,r12d
+        xor     ebx,edi
+        add     eax,DWORD[4+r13]
+        andn    edi,ebp,edx
+        add     eax,ebx
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        and     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edi
+        add     esi,DWORD[8+r13]
+        andn    edi,eax,ecx
+        add     esi,ebp
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        and     eax,ebx
+        add     esi,r12d
+        xor     eax,edi
+        add     edx,DWORD[12+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        add     ecx,DWORD[32+r13]
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[36+r13]
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        add     ebp,DWORD[40+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        add     eax,DWORD[44+r13]
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        add     esi,DWORD[64+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        vmovdqu ymm11,YMMWORD[((-64))+r14]
+        vpshufb ymm0,ymm0,ymm6
+        add     edx,DWORD[68+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        add     ecx,DWORD[72+r13]
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[76+r13]
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        add     ebp,DWORD[96+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        add     eax,DWORD[100+r13]
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        vpshufb ymm1,ymm1,ymm6
+        vpaddd  ymm8,ymm0,ymm11
+        add     esi,DWORD[104+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        add     edx,DWORD[108+r13]
+        lea     r13,[256+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        add     ecx,DWORD[((-128))+r13]
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[((-124))+r13]
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        add     ebp,DWORD[((-120))+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        vmovdqu YMMWORD[rsp],ymm8
+        vpshufb ymm2,ymm2,ymm6
+        vpaddd  ymm9,ymm1,ymm11
+        add     eax,DWORD[((-116))+r13]
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        add     esi,DWORD[((-96))+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        add     edx,DWORD[((-92))+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        add     ecx,DWORD[((-88))+r13]
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[((-84))+r13]
+        mov     edi,esi
+        xor     edi,eax
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        and     ecx,edi
+        vmovdqu YMMWORD[32+rsp],ymm9
+        vpshufb ymm3,ymm3,ymm6
+        vpaddd  ymm6,ymm2,ymm11
+        add     ebp,DWORD[((-64))+r13]
+        xor     ecx,esi
+        mov     edi,edx
+        xor     edi,esi
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        and     ebx,edi
+        add     eax,DWORD[((-60))+r13]
+        xor     ebx,edx
+        mov     edi,ecx
+        xor     edi,edx
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        and     ebp,edi
+        add     esi,DWORD[((-56))+r13]
+        xor     ebp,ecx
+        mov     edi,ebx
+        xor     edi,ecx
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        and     eax,edi
+        add     edx,DWORD[((-52))+r13]
+        xor     eax,ebx
+        mov     edi,ebp
+        xor     edi,ebx
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        and     esi,edi
+        add     ecx,DWORD[((-32))+r13]
+        xor     esi,ebp
+        mov     edi,eax
+        xor     edi,ebp
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        and     edx,edi
+        jmp     NEAR $L$align32_3
+ALIGN   32
+$L$align32_3:
+        vmovdqu YMMWORD[64+rsp],ymm6
+        vpaddd  ymm7,ymm3,ymm11
+        add     ebx,DWORD[((-28))+r13]
+        xor     edx,eax
+        mov     edi,esi
+        xor     edi,eax
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        and     ecx,edi
+        add     ebp,DWORD[((-24))+r13]
+        xor     ecx,esi
+        mov     edi,edx
+        xor     edi,esi
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        and     ebx,edi
+        add     eax,DWORD[((-20))+r13]
+        xor     ebx,edx
+        mov     edi,ecx
+        xor     edi,edx
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        and     ebp,edi
+        add     esi,DWORD[r13]
+        xor     ebp,ecx
+        mov     edi,ebx
+        xor     edi,ecx
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        and     eax,edi
+        add     edx,DWORD[4+r13]
+        xor     eax,ebx
+        mov     edi,ebp
+        xor     edi,ebx
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        and     esi,edi
+        vmovdqu YMMWORD[96+rsp],ymm7
+        add     ecx,DWORD[8+r13]
+        xor     esi,ebp
+        mov     edi,eax
+        xor     edi,ebp
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        and     edx,edi
+        add     ebx,DWORD[12+r13]
+        xor     edx,eax
+        mov     edi,esi
+        xor     edi,eax
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        and     ecx,edi
+        add     ebp,DWORD[32+r13]
+        xor     ecx,esi
+        mov     edi,edx
+        xor     edi,esi
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        and     ebx,edi
+        add     eax,DWORD[36+r13]
+        xor     ebx,edx
+        mov     edi,ecx
+        xor     edi,edx
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        and     ebp,edi
+        add     esi,DWORD[40+r13]
+        xor     ebp,ecx
+        mov     edi,ebx
+        xor     edi,ecx
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        and     eax,edi
+        vpalignr        ymm4,ymm1,ymm0,8
+        add     edx,DWORD[44+r13]
+        xor     eax,ebx
+        mov     edi,ebp
+        xor     edi,ebx
+        vpsrldq ymm8,ymm3,4
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        vpxor   ymm4,ymm4,ymm0
+        vpxor   ymm8,ymm8,ymm2
+        xor     esi,ebp
+        add     edx,r12d
+        vpxor   ymm4,ymm4,ymm8
+        and     esi,edi
+        add     ecx,DWORD[64+r13]
+        xor     esi,ebp
+        mov     edi,eax
+        vpsrld  ymm8,ymm4,31
+        xor     edi,ebp
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        vpslldq ymm10,ymm4,12
+        vpaddd  ymm4,ymm4,ymm4
+        rorx    esi,edx,2
+        xor     edx,eax
+        vpsrld  ymm9,ymm10,30
+        vpor    ymm4,ymm4,ymm8
+        add     ecx,r12d
+        and     edx,edi
+        vpslld  ymm10,ymm10,2
+        vpxor   ymm4,ymm4,ymm9
+        add     ebx,DWORD[68+r13]
+        xor     edx,eax
+        vpxor   ymm4,ymm4,ymm10
+        mov     edi,esi
+        xor     edi,eax
+        lea     ebx,[rdx*1+rbx]
+        vpaddd  ymm9,ymm4,ymm11
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        vmovdqu YMMWORD[128+rsp],ymm9
+        add     ebx,r12d
+        and     ecx,edi
+        add     ebp,DWORD[72+r13]
+        xor     ecx,esi
+        mov     edi,edx
+        xor     edi,esi
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        and     ebx,edi
+        add     eax,DWORD[76+r13]
+        xor     ebx,edx
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        vpalignr        ymm5,ymm2,ymm1,8
+        add     esi,DWORD[96+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        vpsrldq ymm8,ymm4,4
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        vpxor   ymm5,ymm5,ymm1
+        vpxor   ymm8,ymm8,ymm3
+        add     edx,DWORD[100+r13]
+        lea     edx,[rax*1+rdx]
+        vpxor   ymm5,ymm5,ymm8
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        xor     esi,ebp
+        add     edx,r12d
+        vpsrld  ymm8,ymm5,31
+        vmovdqu ymm11,YMMWORD[((-32))+r14]
+        xor     esi,ebx
+        add     ecx,DWORD[104+r13]
+        lea     ecx,[rsi*1+rcx]
+        vpslldq ymm10,ymm5,12
+        vpaddd  ymm5,ymm5,ymm5
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        vpsrld  ymm9,ymm10,30
+        vpor    ymm5,ymm5,ymm8
+        xor     edx,eax
+        add     ecx,r12d
+        vpslld  ymm10,ymm10,2
+        vpxor   ymm5,ymm5,ymm9
+        xor     edx,ebp
+        add     ebx,DWORD[108+r13]
+        lea     r13,[256+r13]
+        vpxor   ymm5,ymm5,ymm10
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        vpaddd  ymm9,ymm5,ymm11
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        vmovdqu YMMWORD[160+rsp],ymm9
+        add     ebp,DWORD[((-128))+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        vpalignr        ymm6,ymm3,ymm2,8
+        add     eax,DWORD[((-124))+r13]
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        vpsrldq ymm8,ymm5,4
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        vpxor   ymm6,ymm6,ymm2
+        vpxor   ymm8,ymm8,ymm4
+        add     esi,DWORD[((-120))+r13]
+        lea     esi,[rbp*1+rsi]
+        vpxor   ymm6,ymm6,ymm8
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        vpsrld  ymm8,ymm6,31
+        xor     eax,ecx
+        add     edx,DWORD[((-116))+r13]
+        lea     edx,[rax*1+rdx]
+        vpslldq ymm10,ymm6,12
+        vpaddd  ymm6,ymm6,ymm6
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        vpsrld  ymm9,ymm10,30
+        vpor    ymm6,ymm6,ymm8
+        xor     esi,ebp
+        add     edx,r12d
+        vpslld  ymm10,ymm10,2
+        vpxor   ymm6,ymm6,ymm9
+        xor     esi,ebx
+        add     ecx,DWORD[((-96))+r13]
+        vpxor   ymm6,ymm6,ymm10
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        vpaddd  ymm9,ymm6,ymm11
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        vmovdqu YMMWORD[192+rsp],ymm9
+        add     ebx,DWORD[((-92))+r13]
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        vpalignr        ymm7,ymm4,ymm3,8
+        add     ebp,DWORD[((-88))+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        vpsrldq ymm8,ymm6,4
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        vpxor   ymm7,ymm7,ymm3
+        vpxor   ymm8,ymm8,ymm5
+        add     eax,DWORD[((-84))+r13]
+        lea     eax,[rbx*1+rax]
+        vpxor   ymm7,ymm7,ymm8
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        vpsrld  ymm8,ymm7,31
+        xor     ebp,edx
+        add     esi,DWORD[((-64))+r13]
+        lea     esi,[rbp*1+rsi]
+        vpslldq ymm10,ymm7,12
+        vpaddd  ymm7,ymm7,ymm7
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        vpsrld  ymm9,ymm10,30
+        vpor    ymm7,ymm7,ymm8
+        xor     eax,ebx
+        add     esi,r12d
+        vpslld  ymm10,ymm10,2
+        vpxor   ymm7,ymm7,ymm9
+        xor     eax,ecx
+        add     edx,DWORD[((-60))+r13]
+        vpxor   ymm7,ymm7,ymm10
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        rorx    eax,esi,2
+        vpaddd  ymm9,ymm7,ymm11
+        xor     esi,ebp
+        add     edx,r12d
+        xor     esi,ebx
+        vmovdqu YMMWORD[224+rsp],ymm9
+        add     ecx,DWORD[((-56))+r13]
+        lea     ecx,[rsi*1+rcx]
+        rorx    r12d,edx,27
+        rorx    esi,edx,2
+        xor     edx,eax
+        add     ecx,r12d
+        xor     edx,ebp
+        add     ebx,DWORD[((-52))+r13]
+        lea     ebx,[rdx*1+rbx]
+        rorx    r12d,ecx,27
+        rorx    edx,ecx,2
+        xor     ecx,esi
+        add     ebx,r12d
+        xor     ecx,eax
+        add     ebp,DWORD[((-32))+r13]
+        lea     ebp,[rbp*1+rcx]
+        rorx    r12d,ebx,27
+        rorx    ecx,ebx,2
+        xor     ebx,edx
+        add     ebp,r12d
+        xor     ebx,esi
+        add     eax,DWORD[((-28))+r13]
+        lea     eax,[rbx*1+rax]
+        rorx    r12d,ebp,27
+        rorx    ebx,ebp,2
+        xor     ebp,ecx
+        add     eax,r12d
+        xor     ebp,edx
+        add     esi,DWORD[((-24))+r13]
+        lea     esi,[rbp*1+rsi]
+        rorx    r12d,eax,27
+        rorx    ebp,eax,2
+        xor     eax,ebx
+        add     esi,r12d
+        xor     eax,ecx
+        add     edx,DWORD[((-20))+r13]
+        lea     edx,[rax*1+rdx]
+        rorx    r12d,esi,27
+        add     edx,r12d
+        lea     r13,[128+rsp]
+
+
+        add     edx,DWORD[r8]
+        add     esi,DWORD[4+r8]
+        add     ebp,DWORD[8+r8]
+        mov     DWORD[r8],edx
+        add     ebx,DWORD[12+r8]
+        mov     DWORD[4+r8],esi
+        mov     eax,edx
+        add     ecx,DWORD[16+r8]
+        mov     r12d,ebp
+        mov     DWORD[8+r8],ebp
+        mov     edx,ebx
+
+        mov     DWORD[12+r8],ebx
+        mov     ebp,esi
+        mov     DWORD[16+r8],ecx
+
+        mov     esi,ecx
+        mov     ecx,r12d
+
+
+        cmp     r9,r10
+        jbe     NEAR $L$oop_avx2
+
+$L$done_avx2:
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-40-96))+r11]
+        movaps  xmm7,XMMWORD[((-40-80))+r11]
+        movaps  xmm8,XMMWORD[((-40-64))+r11]
+        movaps  xmm9,XMMWORD[((-40-48))+r11]
+        movaps  xmm10,XMMWORD[((-40-32))+r11]
+        movaps  xmm11,XMMWORD[((-40-16))+r11]
+        mov     r14,QWORD[((-40))+r11]
+
+        mov     r13,QWORD[((-32))+r11]
+
+        mov     r12,QWORD[((-24))+r11]
+
+        mov     rbp,QWORD[((-16))+r11]
+
+        mov     rbx,QWORD[((-8))+r11]
+
+        lea     rsp,[r11]
+
+$L$epilogue_avx2:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha1_block_data_order_avx2:
+ALIGN   64
+K_XX_XX:
+        DD      0x5a827999,0x5a827999,0x5a827999,0x5a827999
+        DD      0x5a827999,0x5a827999,0x5a827999,0x5a827999
+        DD      0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+        DD      0x6ed9eba1,0x6ed9eba1,0x6ed9eba1,0x6ed9eba1
+        DD      0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+        DD      0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc,0x8f1bbcdc
+        DD      0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+        DD      0xca62c1d6,0xca62c1d6,0xca62c1d6,0xca62c1d6
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+DB      0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0
+DB      83,72,65,49,32,98,108,111,99,107,32,116,114,97,110,115
+DB      102,111,114,109,32,102,111,114,32,120,56,54,95,54,52,44
+DB      32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60
+DB      97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114
+DB      103,62,0
+ALIGN   64
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        lea     r10,[$L$prologue]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[152+r8]
+
+        lea     r10,[$L$epilogue]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[64+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+
+        jmp     NEAR $L$common_seh_tail
+
+
+ALIGN   16
+shaext_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        lea     r10,[$L$prologue_shaext]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        lea     r10,[$L$epilogue_shaext]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        lea     rsi,[((-8-64))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,8
+        DD      0xa548f3fc
+
+        jmp     NEAR $L$common_seh_tail
+
+
+ALIGN   16
+ssse3_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$common_seh_tail
+
+        mov     rax,QWORD[208+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$common_seh_tail
+
+        lea     rsi,[((-40-96))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,12
+        DD      0xa548f3fc
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+
+$L$common_seh_tail:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_sha1_block_data_order wrt ..imagebase
+        DD      $L$SEH_end_sha1_block_data_order wrt ..imagebase
+        DD      $L$SEH_info_sha1_block_data_order wrt ..imagebase
+        DD      $L$SEH_begin_sha1_block_data_order_shaext wrt ..imagebase
+        DD      $L$SEH_end_sha1_block_data_order_shaext wrt ..imagebase
+        DD      $L$SEH_info_sha1_block_data_order_shaext wrt ..imagebase
+        DD      $L$SEH_begin_sha1_block_data_order_ssse3 wrt ..imagebase
+        DD      $L$SEH_end_sha1_block_data_order_ssse3 wrt ..imagebase
+        DD      $L$SEH_info_sha1_block_data_order_ssse3 wrt ..imagebase
+        DD      $L$SEH_begin_sha1_block_data_order_avx wrt ..imagebase
+        DD      $L$SEH_end_sha1_block_data_order_avx wrt ..imagebase
+        DD      $L$SEH_info_sha1_block_data_order_avx wrt ..imagebase
+        DD      $L$SEH_begin_sha1_block_data_order_avx2 wrt ..imagebase
+        DD      $L$SEH_end_sha1_block_data_order_avx2 wrt ..imagebase
+        DD      $L$SEH_info_sha1_block_data_order_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_sha1_block_data_order:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+$L$SEH_info_sha1_block_data_order_shaext:
+DB      9,0,0,0
+        DD      shaext_handler wrt ..imagebase
+$L$SEH_info_sha1_block_data_order_ssse3:
+DB      9,0,0,0
+        DD      ssse3_handler wrt ..imagebase
+        DD      $L$prologue_ssse3 wrt ..imagebase,$L$epilogue_ssse3 wrt ..imagebase
+$L$SEH_info_sha1_block_data_order_avx:
+DB      9,0,0,0
+        DD      ssse3_handler wrt ..imagebase
+        DD      $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha1_block_data_order_avx2:
+DB      9,0,0,0
+        DD      ssse3_handler wrt ..imagebase
+        DD      $L$prologue_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm
new file mode 100644
index 0000000000..5940112c1f
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm
@@ -0,0 +1,8262 @@
+; Copyright 2013-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN  OPENSSL_ia32cap_P
+
+global  sha256_multi_block
+
+ALIGN   32
+sha256_multi_block:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha256_multi_block:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        mov     rcx,QWORD[((OPENSSL_ia32cap_P+4))]
+        bt      rcx,61
+        jc      NEAR _shaext_shortcut
+        test    ecx,268435456
+        jnz     NEAR _avx_shortcut
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[(-120)+rax],xmm10
+        movaps  XMMWORD[(-104)+rax],xmm11
+        movaps  XMMWORD[(-88)+rax],xmm12
+        movaps  XMMWORD[(-72)+rax],xmm13
+        movaps  XMMWORD[(-56)+rax],xmm14
+        movaps  XMMWORD[(-40)+rax],xmm15
+        sub     rsp,288
+        and     rsp,-256
+        mov     QWORD[272+rsp],rax
+
+$L$body:
+        lea     rbp,[((K256+128))]
+        lea     rbx,[256+rsp]
+        lea     rdi,[128+rdi]
+
+$L$oop_grande:
+        mov     DWORD[280+rsp],edx
+        xor     edx,edx
+        mov     r8,QWORD[rsi]
+        mov     ecx,DWORD[8+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[rbx],ecx
+        cmovle  r8,rbp
+        mov     r9,QWORD[16+rsi]
+        mov     ecx,DWORD[24+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[4+rbx],ecx
+        cmovle  r9,rbp
+        mov     r10,QWORD[32+rsi]
+        mov     ecx,DWORD[40+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[8+rbx],ecx
+        cmovle  r10,rbp
+        mov     r11,QWORD[48+rsi]
+        mov     ecx,DWORD[56+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[12+rbx],ecx
+        cmovle  r11,rbp
+        test    edx,edx
+        jz      NEAR $L$done
+
+        movdqu  xmm8,XMMWORD[((0-128))+rdi]
+        lea     rax,[128+rsp]
+        movdqu  xmm9,XMMWORD[((32-128))+rdi]
+        movdqu  xmm10,XMMWORD[((64-128))+rdi]
+        movdqu  xmm11,XMMWORD[((96-128))+rdi]
+        movdqu  xmm12,XMMWORD[((128-128))+rdi]
+        movdqu  xmm13,XMMWORD[((160-128))+rdi]
+        movdqu  xmm14,XMMWORD[((192-128))+rdi]
+        movdqu  xmm15,XMMWORD[((224-128))+rdi]
+        movdqu  xmm6,XMMWORD[$L$pbswap]
+        jmp     NEAR $L$oop
+
+ALIGN   32
+$L$oop:
+        movdqa  xmm4,xmm10
+        pxor    xmm4,xmm9
+        movd    xmm5,DWORD[r8]
+        movd    xmm0,DWORD[r9]
+        movd    xmm1,DWORD[r10]
+        movd    xmm2,DWORD[r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm12
+DB      102,15,56,0,238
+        movdqa  xmm2,xmm12
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm12
+        pslld   xmm2,7
+        movdqa  XMMWORD[(0-128)+rax],xmm5
+        paddd   xmm5,xmm15
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-128))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm12
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm12
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm14
+        pand    xmm3,xmm13
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm8
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm8
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm9
+        movdqa  xmm7,xmm8
+        pslld   xmm2,10
+        pxor    xmm3,xmm8
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm15,xmm9
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm15,xmm4
+        paddd   xmm11,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm15,xmm5
+        paddd   xmm15,xmm7
+        movd    xmm5,DWORD[4+r8]
+        movd    xmm0,DWORD[4+r9]
+        movd    xmm1,DWORD[4+r10]
+        movd    xmm2,DWORD[4+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm11
+
+        movdqa  xmm2,xmm11
+DB      102,15,56,0,238
+        psrld   xmm7,6
+        movdqa  xmm1,xmm11
+        pslld   xmm2,7
+        movdqa  XMMWORD[(16-128)+rax],xmm5
+        paddd   xmm5,xmm14
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-96))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm11
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm11
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm13
+        pand    xmm4,xmm12
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm15
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm15
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm8
+        movdqa  xmm7,xmm15
+        pslld   xmm2,10
+        pxor    xmm4,xmm15
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm14,xmm8
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm14,xmm3
+        paddd   xmm10,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm14,xmm5
+        paddd   xmm14,xmm7
+        movd    xmm5,DWORD[8+r8]
+        movd    xmm0,DWORD[8+r9]
+        movd    xmm1,DWORD[8+r10]
+        movd    xmm2,DWORD[8+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm10
+DB      102,15,56,0,238
+        movdqa  xmm2,xmm10
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm10
+        pslld   xmm2,7
+        movdqa  XMMWORD[(32-128)+rax],xmm5
+        paddd   xmm5,xmm13
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-64))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm10
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm10
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm12
+        pand    xmm3,xmm11
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm14
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm14
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm15
+        movdqa  xmm7,xmm14
+        pslld   xmm2,10
+        pxor    xmm3,xmm14
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm13,xmm15
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm13,xmm4
+        paddd   xmm9,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm13,xmm5
+        paddd   xmm13,xmm7
+        movd    xmm5,DWORD[12+r8]
+        movd    xmm0,DWORD[12+r9]
+        movd    xmm1,DWORD[12+r10]
+        movd    xmm2,DWORD[12+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm9
+
+        movdqa  xmm2,xmm9
+DB      102,15,56,0,238
+        psrld   xmm7,6
+        movdqa  xmm1,xmm9
+        pslld   xmm2,7
+        movdqa  XMMWORD[(48-128)+rax],xmm5
+        paddd   xmm5,xmm12
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-32))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm9
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm9
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm11
+        pand    xmm4,xmm10
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm13
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm13
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm14
+        movdqa  xmm7,xmm13
+        pslld   xmm2,10
+        pxor    xmm4,xmm13
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm12,xmm14
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm12,xmm3
+        paddd   xmm8,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm12,xmm5
+        paddd   xmm12,xmm7
+        movd    xmm5,DWORD[16+r8]
+        movd    xmm0,DWORD[16+r9]
+        movd    xmm1,DWORD[16+r10]
+        movd    xmm2,DWORD[16+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm8
+DB      102,15,56,0,238
+        movdqa  xmm2,xmm8
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm8
+        pslld   xmm2,7
+        movdqa  XMMWORD[(64-128)+rax],xmm5
+        paddd   xmm5,xmm11
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm8
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm8
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm10
+        pand    xmm3,xmm9
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm12
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm12
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm13
+        movdqa  xmm7,xmm12
+        pslld   xmm2,10
+        pxor    xmm3,xmm12
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm11,xmm13
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm11,xmm4
+        paddd   xmm15,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm11,xmm5
+        paddd   xmm11,xmm7
+        movd    xmm5,DWORD[20+r8]
+        movd    xmm0,DWORD[20+r9]
+        movd    xmm1,DWORD[20+r10]
+        movd    xmm2,DWORD[20+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm15
+
+        movdqa  xmm2,xmm15
+DB      102,15,56,0,238
+        psrld   xmm7,6
+        movdqa  xmm1,xmm15
+        pslld   xmm2,7
+        movdqa  XMMWORD[(80-128)+rax],xmm5
+        paddd   xmm5,xmm10
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[32+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm15
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm15
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm9
+        pand    xmm4,xmm8
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm11
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm11
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm12
+        movdqa  xmm7,xmm11
+        pslld   xmm2,10
+        pxor    xmm4,xmm11
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm10,xmm12
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm10,xmm3
+        paddd   xmm14,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm10,xmm5
+        paddd   xmm10,xmm7
+        movd    xmm5,DWORD[24+r8]
+        movd    xmm0,DWORD[24+r9]
+        movd    xmm1,DWORD[24+r10]
+        movd    xmm2,DWORD[24+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm14
+DB      102,15,56,0,238
+        movdqa  xmm2,xmm14
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm14
+        pslld   xmm2,7
+        movdqa  XMMWORD[(96-128)+rax],xmm5
+        paddd   xmm5,xmm9
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[64+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm14
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm14
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm8
+        pand    xmm3,xmm15
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm10
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm10
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm11
+        movdqa  xmm7,xmm10
+        pslld   xmm2,10
+        pxor    xmm3,xmm10
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm9,xmm11
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm9,xmm4
+        paddd   xmm13,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm9,xmm5
+        paddd   xmm9,xmm7
+        movd    xmm5,DWORD[28+r8]
+        movd    xmm0,DWORD[28+r9]
+        movd    xmm1,DWORD[28+r10]
+        movd    xmm2,DWORD[28+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm13
+
+        movdqa  xmm2,xmm13
+DB      102,15,56,0,238
+        psrld   xmm7,6
+        movdqa  xmm1,xmm13
+        pslld   xmm2,7
+        movdqa  XMMWORD[(112-128)+rax],xmm5
+        paddd   xmm5,xmm8
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[96+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm13
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm13
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm15
+        pand    xmm4,xmm14
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm9
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm9
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm10
+        movdqa  xmm7,xmm9
+        pslld   xmm2,10
+        pxor    xmm4,xmm9
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm8,xmm10
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm8,xmm3
+        paddd   xmm12,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm8,xmm5
+        paddd   xmm8,xmm7
+        lea     rbp,[256+rbp]
+        movd    xmm5,DWORD[32+r8]
+        movd    xmm0,DWORD[32+r9]
+        movd    xmm1,DWORD[32+r10]
+        movd    xmm2,DWORD[32+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm12
+DB      102,15,56,0,238
+        movdqa  xmm2,xmm12
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm12
+        pslld   xmm2,7
+        movdqa  XMMWORD[(128-128)+rax],xmm5
+        paddd   xmm5,xmm15
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-128))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm12
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm12
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm14
+        pand    xmm3,xmm13
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm8
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm8
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm9
+        movdqa  xmm7,xmm8
+        pslld   xmm2,10
+        pxor    xmm3,xmm8
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm15,xmm9
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm15,xmm4
+        paddd   xmm11,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm15,xmm5
+        paddd   xmm15,xmm7
+        movd    xmm5,DWORD[36+r8]
+        movd    xmm0,DWORD[36+r9]
+        movd    xmm1,DWORD[36+r10]
+        movd    xmm2,DWORD[36+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm11
+
+        movdqa  xmm2,xmm11
+DB      102,15,56,0,238
+        psrld   xmm7,6
+        movdqa  xmm1,xmm11
+        pslld   xmm2,7
+        movdqa  XMMWORD[(144-128)+rax],xmm5
+        paddd   xmm5,xmm14
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-96))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm11
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm11
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm13
+        pand    xmm4,xmm12
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm15
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm15
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm8
+        movdqa  xmm7,xmm15
+        pslld   xmm2,10
+        pxor    xmm4,xmm15
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm14,xmm8
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm14,xmm3
+        paddd   xmm10,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm14,xmm5
+        paddd   xmm14,xmm7
+        movd    xmm5,DWORD[40+r8]
+        movd    xmm0,DWORD[40+r9]
+        movd    xmm1,DWORD[40+r10]
+        movd    xmm2,DWORD[40+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm10
+DB      102,15,56,0,238
+        movdqa  xmm2,xmm10
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm10
+        pslld   xmm2,7
+        movdqa  XMMWORD[(160-128)+rax],xmm5
+        paddd   xmm5,xmm13
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-64))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm10
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm10
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm12
+        pand    xmm3,xmm11
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm14
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm14
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm15
+        movdqa  xmm7,xmm14
+        pslld   xmm2,10
+        pxor    xmm3,xmm14
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm13,xmm15
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm13,xmm4
+        paddd   xmm9,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm13,xmm5
+        paddd   xmm13,xmm7
+        movd    xmm5,DWORD[44+r8]
+        movd    xmm0,DWORD[44+r9]
+        movd    xmm1,DWORD[44+r10]
+        movd    xmm2,DWORD[44+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm9
+
+        movdqa  xmm2,xmm9
+DB      102,15,56,0,238
+        psrld   xmm7,6
+        movdqa  xmm1,xmm9
+        pslld   xmm2,7
+        movdqa  XMMWORD[(176-128)+rax],xmm5
+        paddd   xmm5,xmm12
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-32))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm9
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm9
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm11
+        pand    xmm4,xmm10
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm13
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm13
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm14
+        movdqa  xmm7,xmm13
+        pslld   xmm2,10
+        pxor    xmm4,xmm13
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm12,xmm14
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm12,xmm3
+        paddd   xmm8,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm12,xmm5
+        paddd   xmm12,xmm7
+        movd    xmm5,DWORD[48+r8]
+        movd    xmm0,DWORD[48+r9]
+        movd    xmm1,DWORD[48+r10]
+        movd    xmm2,DWORD[48+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm8
+DB      102,15,56,0,238
+        movdqa  xmm2,xmm8
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm8
+        pslld   xmm2,7
+        movdqa  XMMWORD[(192-128)+rax],xmm5
+        paddd   xmm5,xmm11
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm8
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm8
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm10
+        pand    xmm3,xmm9
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm12
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm12
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm13
+        movdqa  xmm7,xmm12
+        pslld   xmm2,10
+        pxor    xmm3,xmm12
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm11,xmm13
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm11,xmm4
+        paddd   xmm15,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm11,xmm5
+        paddd   xmm11,xmm7
+        movd    xmm5,DWORD[52+r8]
+        movd    xmm0,DWORD[52+r9]
+        movd    xmm1,DWORD[52+r10]
+        movd    xmm2,DWORD[52+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm15
+
+        movdqa  xmm2,xmm15
+DB      102,15,56,0,238
+        psrld   xmm7,6
+        movdqa  xmm1,xmm15
+        pslld   xmm2,7
+        movdqa  XMMWORD[(208-128)+rax],xmm5
+        paddd   xmm5,xmm10
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[32+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm15
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm15
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm9
+        pand    xmm4,xmm8
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm11
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm11
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm12
+        movdqa  xmm7,xmm11
+        pslld   xmm2,10
+        pxor    xmm4,xmm11
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm10,xmm12
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm10,xmm3
+        paddd   xmm14,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm10,xmm5
+        paddd   xmm10,xmm7
+        movd    xmm5,DWORD[56+r8]
+        movd    xmm0,DWORD[56+r9]
+        movd    xmm1,DWORD[56+r10]
+        movd    xmm2,DWORD[56+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm14
+DB      102,15,56,0,238
+        movdqa  xmm2,xmm14
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm14
+        pslld   xmm2,7
+        movdqa  XMMWORD[(224-128)+rax],xmm5
+        paddd   xmm5,xmm9
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[64+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm14
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm14
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm8
+        pand    xmm3,xmm15
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm10
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm10
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm11
+        movdqa  xmm7,xmm10
+        pslld   xmm2,10
+        pxor    xmm3,xmm10
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm9,xmm11
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm9,xmm4
+        paddd   xmm13,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm9,xmm5
+        paddd   xmm9,xmm7
+        movd    xmm5,DWORD[60+r8]
+        lea     r8,[64+r8]
+        movd    xmm0,DWORD[60+r9]
+        lea     r9,[64+r9]
+        movd    xmm1,DWORD[60+r10]
+        lea     r10,[64+r10]
+        movd    xmm2,DWORD[60+r11]
+        lea     r11,[64+r11]
+        punpckldq       xmm5,xmm1
+        punpckldq       xmm0,xmm2
+        punpckldq       xmm5,xmm0
+        movdqa  xmm7,xmm13
+
+        movdqa  xmm2,xmm13
+DB      102,15,56,0,238
+        psrld   xmm7,6
+        movdqa  xmm1,xmm13
+        pslld   xmm2,7
+        movdqa  XMMWORD[(240-128)+rax],xmm5
+        paddd   xmm5,xmm8
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[96+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm13
+        prefetcht0      [63+r8]
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm13
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm15
+        pand    xmm4,xmm14
+        pxor    xmm7,xmm1
+
+        prefetcht0      [63+r9]
+        movdqa  xmm1,xmm9
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm9
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm10
+        movdqa  xmm7,xmm9
+        pslld   xmm2,10
+        pxor    xmm4,xmm9
+
+        prefetcht0      [63+r10]
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+        prefetcht0      [63+r11]
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm8,xmm10
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm8,xmm3
+        paddd   xmm12,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm8,xmm5
+        paddd   xmm8,xmm7
+        lea     rbp,[256+rbp]
+        movdqu  xmm5,XMMWORD[((0-128))+rax]
+        mov     ecx,3
+        jmp     NEAR $L$oop_16_xx
+ALIGN   32
+$L$oop_16_xx:
+        movdqa  xmm6,XMMWORD[((16-128))+rax]
+        paddd   xmm5,XMMWORD[((144-128))+rax]
+
+        movdqa  xmm7,xmm6
+        movdqa  xmm1,xmm6
+        psrld   xmm7,3
+        movdqa  xmm2,xmm6
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((224-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm3,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm3
+
+        psrld   xmm3,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        psrld   xmm3,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm3
+        pxor    xmm0,xmm1
+        paddd   xmm5,xmm0
+        movdqa  xmm7,xmm12
+
+        movdqa  xmm2,xmm12
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm12
+        pslld   xmm2,7
+        movdqa  XMMWORD[(0-128)+rax],xmm5
+        paddd   xmm5,xmm15
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-128))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm12
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm12
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm14
+        pand    xmm3,xmm13
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm8
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm8
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm9
+        movdqa  xmm7,xmm8
+        pslld   xmm2,10
+        pxor    xmm3,xmm8
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm15,xmm9
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm15,xmm4
+        paddd   xmm11,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm15,xmm5
+        paddd   xmm15,xmm7
+        movdqa  xmm5,XMMWORD[((32-128))+rax]
+        paddd   xmm6,XMMWORD[((160-128))+rax]
+
+        movdqa  xmm7,xmm5
+        movdqa  xmm1,xmm5
+        psrld   xmm7,3
+        movdqa  xmm2,xmm5
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((240-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm4,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm4
+
+        psrld   xmm4,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        psrld   xmm4,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm4
+        pxor    xmm0,xmm1
+        paddd   xmm6,xmm0
+        movdqa  xmm7,xmm11
+
+        movdqa  xmm2,xmm11
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm11
+        pslld   xmm2,7
+        movdqa  XMMWORD[(16-128)+rax],xmm6
+        paddd   xmm6,xmm14
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm6,XMMWORD[((-96))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm11
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm11
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm13
+        pand    xmm4,xmm12
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm15
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm15
+        psrld   xmm1,2
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm8
+        movdqa  xmm7,xmm15
+        pslld   xmm2,10
+        pxor    xmm4,xmm15
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm6,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm14,xmm8
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm14,xmm3
+        paddd   xmm10,xmm6
+        pxor    xmm7,xmm2
+
+        paddd   xmm14,xmm6
+        paddd   xmm14,xmm7
+        movdqa  xmm6,XMMWORD[((48-128))+rax]
+        paddd   xmm5,XMMWORD[((176-128))+rax]
+
+        movdqa  xmm7,xmm6
+        movdqa  xmm1,xmm6
+        psrld   xmm7,3
+        movdqa  xmm2,xmm6
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((0-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm3,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm3
+
+        psrld   xmm3,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        psrld   xmm3,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm3
+        pxor    xmm0,xmm1
+        paddd   xmm5,xmm0
+        movdqa  xmm7,xmm10
+
+        movdqa  xmm2,xmm10
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm10
+        pslld   xmm2,7
+        movdqa  XMMWORD[(32-128)+rax],xmm5
+        paddd   xmm5,xmm13
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-64))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm10
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm10
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm12
+        pand    xmm3,xmm11
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm14
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm14
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm15
+        movdqa  xmm7,xmm14
+        pslld   xmm2,10
+        pxor    xmm3,xmm14
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm13,xmm15
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm13,xmm4
+        paddd   xmm9,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm13,xmm5
+        paddd   xmm13,xmm7
+        movdqa  xmm5,XMMWORD[((64-128))+rax]
+        paddd   xmm6,XMMWORD[((192-128))+rax]
+
+        movdqa  xmm7,xmm5
+        movdqa  xmm1,xmm5
+        psrld   xmm7,3
+        movdqa  xmm2,xmm5
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((16-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm4,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm4
+
+        psrld   xmm4,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        psrld   xmm4,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm4
+        pxor    xmm0,xmm1
+        paddd   xmm6,xmm0
+        movdqa  xmm7,xmm9
+
+        movdqa  xmm2,xmm9
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm9
+        pslld   xmm2,7
+        movdqa  XMMWORD[(48-128)+rax],xmm6
+        paddd   xmm6,xmm12
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm6,XMMWORD[((-32))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm9
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm9
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm11
+        pand    xmm4,xmm10
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm13
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm13
+        psrld   xmm1,2
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm14
+        movdqa  xmm7,xmm13
+        pslld   xmm2,10
+        pxor    xmm4,xmm13
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm6,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm12,xmm14
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm12,xmm3
+        paddd   xmm8,xmm6
+        pxor    xmm7,xmm2
+
+        paddd   xmm12,xmm6
+        paddd   xmm12,xmm7
+        movdqa  xmm6,XMMWORD[((80-128))+rax]
+        paddd   xmm5,XMMWORD[((208-128))+rax]
+
+        movdqa  xmm7,xmm6
+        movdqa  xmm1,xmm6
+        psrld   xmm7,3
+        movdqa  xmm2,xmm6
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((32-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm3,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm3
+
+        psrld   xmm3,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        psrld   xmm3,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm3
+        pxor    xmm0,xmm1
+        paddd   xmm5,xmm0
+        movdqa  xmm7,xmm8
+
+        movdqa  xmm2,xmm8
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm8
+        pslld   xmm2,7
+        movdqa  XMMWORD[(64-128)+rax],xmm5
+        paddd   xmm5,xmm11
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm8
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm8
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm10
+        pand    xmm3,xmm9
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm12
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm12
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm13
+        movdqa  xmm7,xmm12
+        pslld   xmm2,10
+        pxor    xmm3,xmm12
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm11,xmm13
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm11,xmm4
+        paddd   xmm15,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm11,xmm5
+        paddd   xmm11,xmm7
+        movdqa  xmm5,XMMWORD[((96-128))+rax]
+        paddd   xmm6,XMMWORD[((224-128))+rax]
+
+        movdqa  xmm7,xmm5
+        movdqa  xmm1,xmm5
+        psrld   xmm7,3
+        movdqa  xmm2,xmm5
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((48-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm4,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm4
+
+        psrld   xmm4,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        psrld   xmm4,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm4
+        pxor    xmm0,xmm1
+        paddd   xmm6,xmm0
+        movdqa  xmm7,xmm15
+
+        movdqa  xmm2,xmm15
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm15
+        pslld   xmm2,7
+        movdqa  XMMWORD[(80-128)+rax],xmm6
+        paddd   xmm6,xmm10
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm6,XMMWORD[32+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm15
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm15
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm9
+        pand    xmm4,xmm8
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm11
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm11
+        psrld   xmm1,2
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm12
+        movdqa  xmm7,xmm11
+        pslld   xmm2,10
+        pxor    xmm4,xmm11
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm6,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm10,xmm12
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm10,xmm3
+        paddd   xmm14,xmm6
+        pxor    xmm7,xmm2
+
+        paddd   xmm10,xmm6
+        paddd   xmm10,xmm7
+        movdqa  xmm6,XMMWORD[((112-128))+rax]
+        paddd   xmm5,XMMWORD[((240-128))+rax]
+
+        movdqa  xmm7,xmm6
+        movdqa  xmm1,xmm6
+        psrld   xmm7,3
+        movdqa  xmm2,xmm6
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((64-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm3,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm3
+
+        psrld   xmm3,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        psrld   xmm3,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm3
+        pxor    xmm0,xmm1
+        paddd   xmm5,xmm0
+        movdqa  xmm7,xmm14
+
+        movdqa  xmm2,xmm14
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm14
+        pslld   xmm2,7
+        movdqa  XMMWORD[(96-128)+rax],xmm5
+        paddd   xmm5,xmm9
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[64+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm14
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm14
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm8
+        pand    xmm3,xmm15
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm10
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm10
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm11
+        movdqa  xmm7,xmm10
+        pslld   xmm2,10
+        pxor    xmm3,xmm10
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm9,xmm11
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm9,xmm4
+        paddd   xmm13,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm9,xmm5
+        paddd   xmm9,xmm7
+        movdqa  xmm5,XMMWORD[((128-128))+rax]
+        paddd   xmm6,XMMWORD[((0-128))+rax]
+
+        movdqa  xmm7,xmm5
+        movdqa  xmm1,xmm5
+        psrld   xmm7,3
+        movdqa  xmm2,xmm5
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((80-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm4,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm4
+
+        psrld   xmm4,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        psrld   xmm4,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm4
+        pxor    xmm0,xmm1
+        paddd   xmm6,xmm0
+        movdqa  xmm7,xmm13
+
+        movdqa  xmm2,xmm13
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm13
+        pslld   xmm2,7
+        movdqa  XMMWORD[(112-128)+rax],xmm6
+        paddd   xmm6,xmm8
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm6,XMMWORD[96+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm13
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm13
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm15
+        pand    xmm4,xmm14
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm9
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm9
+        psrld   xmm1,2
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm10
+        movdqa  xmm7,xmm9
+        pslld   xmm2,10
+        pxor    xmm4,xmm9
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm6,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm8,xmm10
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm8,xmm3
+        paddd   xmm12,xmm6
+        pxor    xmm7,xmm2
+
+        paddd   xmm8,xmm6
+        paddd   xmm8,xmm7
+        lea     rbp,[256+rbp]
+        movdqa  xmm6,XMMWORD[((144-128))+rax]
+        paddd   xmm5,XMMWORD[((16-128))+rax]
+
+        movdqa  xmm7,xmm6
+        movdqa  xmm1,xmm6
+        psrld   xmm7,3
+        movdqa  xmm2,xmm6
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((96-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm3,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm3
+
+        psrld   xmm3,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        psrld   xmm3,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm3
+        pxor    xmm0,xmm1
+        paddd   xmm5,xmm0
+        movdqa  xmm7,xmm12
+
+        movdqa  xmm2,xmm12
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm12
+        pslld   xmm2,7
+        movdqa  XMMWORD[(128-128)+rax],xmm5
+        paddd   xmm5,xmm15
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-128))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm12
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm12
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm14
+        pand    xmm3,xmm13
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm8
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm8
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm9
+        movdqa  xmm7,xmm8
+        pslld   xmm2,10
+        pxor    xmm3,xmm8
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm15,xmm9
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm15,xmm4
+        paddd   xmm11,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm15,xmm5
+        paddd   xmm15,xmm7
+        movdqa  xmm5,XMMWORD[((160-128))+rax]
+        paddd   xmm6,XMMWORD[((32-128))+rax]
+
+        movdqa  xmm7,xmm5
+        movdqa  xmm1,xmm5
+        psrld   xmm7,3
+        movdqa  xmm2,xmm5
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((112-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm4,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm4
+
+        psrld   xmm4,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        psrld   xmm4,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm4
+        pxor    xmm0,xmm1
+        paddd   xmm6,xmm0
+        movdqa  xmm7,xmm11
+
+        movdqa  xmm2,xmm11
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm11
+        pslld   xmm2,7
+        movdqa  XMMWORD[(144-128)+rax],xmm6
+        paddd   xmm6,xmm14
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm6,XMMWORD[((-96))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm11
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm11
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm13
+        pand    xmm4,xmm12
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm15
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm15
+        psrld   xmm1,2
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm8
+        movdqa  xmm7,xmm15
+        pslld   xmm2,10
+        pxor    xmm4,xmm15
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm6,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm14,xmm8
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm14,xmm3
+        paddd   xmm10,xmm6
+        pxor    xmm7,xmm2
+
+        paddd   xmm14,xmm6
+        paddd   xmm14,xmm7
+        movdqa  xmm6,XMMWORD[((176-128))+rax]
+        paddd   xmm5,XMMWORD[((48-128))+rax]
+
+        movdqa  xmm7,xmm6
+        movdqa  xmm1,xmm6
+        psrld   xmm7,3
+        movdqa  xmm2,xmm6
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((128-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm3,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm3
+
+        psrld   xmm3,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        psrld   xmm3,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm3
+        pxor    xmm0,xmm1
+        paddd   xmm5,xmm0
+        movdqa  xmm7,xmm10
+
+        movdqa  xmm2,xmm10
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm10
+        pslld   xmm2,7
+        movdqa  XMMWORD[(160-128)+rax],xmm5
+        paddd   xmm5,xmm13
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[((-64))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm10
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm10
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm12
+        pand    xmm3,xmm11
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm14
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm14
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm15
+        movdqa  xmm7,xmm14
+        pslld   xmm2,10
+        pxor    xmm3,xmm14
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm13,xmm15
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm13,xmm4
+        paddd   xmm9,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm13,xmm5
+        paddd   xmm13,xmm7
+        movdqa  xmm5,XMMWORD[((192-128))+rax]
+        paddd   xmm6,XMMWORD[((64-128))+rax]
+
+        movdqa  xmm7,xmm5
+        movdqa  xmm1,xmm5
+        psrld   xmm7,3
+        movdqa  xmm2,xmm5
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((144-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm4,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm4
+
+        psrld   xmm4,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        psrld   xmm4,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm4
+        pxor    xmm0,xmm1
+        paddd   xmm6,xmm0
+        movdqa  xmm7,xmm9
+
+        movdqa  xmm2,xmm9
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm9
+        pslld   xmm2,7
+        movdqa  XMMWORD[(176-128)+rax],xmm6
+        paddd   xmm6,xmm12
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm6,XMMWORD[((-32))+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm9
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm9
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm11
+        pand    xmm4,xmm10
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm13
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm13
+        psrld   xmm1,2
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm14
+        movdqa  xmm7,xmm13
+        pslld   xmm2,10
+        pxor    xmm4,xmm13
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm6,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm12,xmm14
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm12,xmm3
+        paddd   xmm8,xmm6
+        pxor    xmm7,xmm2
+
+        paddd   xmm12,xmm6
+        paddd   xmm12,xmm7
+        movdqa  xmm6,XMMWORD[((208-128))+rax]
+        paddd   xmm5,XMMWORD[((80-128))+rax]
+
+        movdqa  xmm7,xmm6
+        movdqa  xmm1,xmm6
+        psrld   xmm7,3
+        movdqa  xmm2,xmm6
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((160-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm3,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm3
+
+        psrld   xmm3,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        psrld   xmm3,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm3
+        pxor    xmm0,xmm1
+        paddd   xmm5,xmm0
+        movdqa  xmm7,xmm8
+
+        movdqa  xmm2,xmm8
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm8
+        pslld   xmm2,7
+        movdqa  XMMWORD[(192-128)+rax],xmm5
+        paddd   xmm5,xmm11
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm8
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm8
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm10
+        pand    xmm3,xmm9
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm12
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm12
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm13
+        movdqa  xmm7,xmm12
+        pslld   xmm2,10
+        pxor    xmm3,xmm12
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm11,xmm13
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm11,xmm4
+        paddd   xmm15,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm11,xmm5
+        paddd   xmm11,xmm7
+        movdqa  xmm5,XMMWORD[((224-128))+rax]
+        paddd   xmm6,XMMWORD[((96-128))+rax]
+
+        movdqa  xmm7,xmm5
+        movdqa  xmm1,xmm5
+        psrld   xmm7,3
+        movdqa  xmm2,xmm5
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((176-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm4,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm4
+
+        psrld   xmm4,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        psrld   xmm4,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm4
+        pxor    xmm0,xmm1
+        paddd   xmm6,xmm0
+        movdqa  xmm7,xmm15
+
+        movdqa  xmm2,xmm15
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm15
+        pslld   xmm2,7
+        movdqa  XMMWORD[(208-128)+rax],xmm6
+        paddd   xmm6,xmm10
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm6,XMMWORD[32+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm15
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm15
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm9
+        pand    xmm4,xmm8
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm11
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm11
+        psrld   xmm1,2
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm12
+        movdqa  xmm7,xmm11
+        pslld   xmm2,10
+        pxor    xmm4,xmm11
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm6,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm10,xmm12
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm10,xmm3
+        paddd   xmm14,xmm6
+        pxor    xmm7,xmm2
+
+        paddd   xmm10,xmm6
+        paddd   xmm10,xmm7
+        movdqa  xmm6,XMMWORD[((240-128))+rax]
+        paddd   xmm5,XMMWORD[((112-128))+rax]
+
+        movdqa  xmm7,xmm6
+        movdqa  xmm1,xmm6
+        psrld   xmm7,3
+        movdqa  xmm2,xmm6
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((192-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm3,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm3
+
+        psrld   xmm3,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        psrld   xmm3,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm3
+        pxor    xmm0,xmm1
+        paddd   xmm5,xmm0
+        movdqa  xmm7,xmm14
+
+        movdqa  xmm2,xmm14
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm14
+        pslld   xmm2,7
+        movdqa  XMMWORD[(224-128)+rax],xmm5
+        paddd   xmm5,xmm9
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm5,XMMWORD[64+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm14
+
+        pxor    xmm7,xmm2
+        movdqa  xmm3,xmm14
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm8
+        pand    xmm3,xmm15
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm10
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm10
+        psrld   xmm1,2
+        paddd   xmm5,xmm7
+        pxor    xmm0,xmm3
+        movdqa  xmm3,xmm11
+        movdqa  xmm7,xmm10
+        pslld   xmm2,10
+        pxor    xmm3,xmm10
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm5,xmm0
+        pslld   xmm2,19-10
+        pand    xmm4,xmm3
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm9,xmm11
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm9,xmm4
+        paddd   xmm13,xmm5
+        pxor    xmm7,xmm2
+
+        paddd   xmm9,xmm5
+        paddd   xmm9,xmm7
+        movdqa  xmm5,XMMWORD[((0-128))+rax]
+        paddd   xmm6,XMMWORD[((128-128))+rax]
+
+        movdqa  xmm7,xmm5
+        movdqa  xmm1,xmm5
+        psrld   xmm7,3
+        movdqa  xmm2,xmm5
+
+        psrld   xmm1,7
+        movdqa  xmm0,XMMWORD[((208-128))+rax]
+        pslld   xmm2,14
+        pxor    xmm7,xmm1
+        psrld   xmm1,18-7
+        movdqa  xmm4,xmm0
+        pxor    xmm7,xmm2
+        pslld   xmm2,25-14
+        pxor    xmm7,xmm1
+        psrld   xmm0,10
+        movdqa  xmm1,xmm4
+
+        psrld   xmm4,17
+        pxor    xmm7,xmm2
+        pslld   xmm1,13
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        psrld   xmm4,19-17
+        pxor    xmm0,xmm1
+        pslld   xmm1,15-13
+        pxor    xmm0,xmm4
+        pxor    xmm0,xmm1
+        paddd   xmm6,xmm0
+        movdqa  xmm7,xmm13
+
+        movdqa  xmm2,xmm13
+
+        psrld   xmm7,6
+        movdqa  xmm1,xmm13
+        pslld   xmm2,7
+        movdqa  XMMWORD[(240-128)+rax],xmm6
+        paddd   xmm6,xmm8
+
+        psrld   xmm1,11
+        pxor    xmm7,xmm2
+        pslld   xmm2,21-7
+        paddd   xmm6,XMMWORD[96+rbp]
+        pxor    xmm7,xmm1
+
+        psrld   xmm1,25-11
+        movdqa  xmm0,xmm13
+
+        pxor    xmm7,xmm2
+        movdqa  xmm4,xmm13
+        pslld   xmm2,26-21
+        pandn   xmm0,xmm15
+        pand    xmm4,xmm14
+        pxor    xmm7,xmm1
+
+
+        movdqa  xmm1,xmm9
+        pxor    xmm7,xmm2
+        movdqa  xmm2,xmm9
+        psrld   xmm1,2
+        paddd   xmm6,xmm7
+        pxor    xmm0,xmm4
+        movdqa  xmm4,xmm10
+        movdqa  xmm7,xmm9
+        pslld   xmm2,10
+        pxor    xmm4,xmm9
+
+
+        psrld   xmm7,13
+        pxor    xmm1,xmm2
+        paddd   xmm6,xmm0
+        pslld   xmm2,19-10
+        pand    xmm3,xmm4
+        pxor    xmm1,xmm7
+
+
+        psrld   xmm7,22-13
+        pxor    xmm1,xmm2
+        movdqa  xmm8,xmm10
+        pslld   xmm2,30-19
+        pxor    xmm7,xmm1
+        pxor    xmm8,xmm3
+        paddd   xmm12,xmm6
+        pxor    xmm7,xmm2
+
+        paddd   xmm8,xmm6
+        paddd   xmm8,xmm7
+        lea     rbp,[256+rbp]
+        dec     ecx
+        jnz     NEAR $L$oop_16_xx
+
+        mov     ecx,1
+        lea     rbp,[((K256+128))]
+
+        movdqa  xmm7,XMMWORD[rbx]
+        cmp     ecx,DWORD[rbx]
+        pxor    xmm0,xmm0
+        cmovge  r8,rbp
+        cmp     ecx,DWORD[4+rbx]
+        movdqa  xmm6,xmm7
+        cmovge  r9,rbp
+        cmp     ecx,DWORD[8+rbx]
+        pcmpgtd xmm6,xmm0
+        cmovge  r10,rbp
+        cmp     ecx,DWORD[12+rbx]
+        paddd   xmm7,xmm6
+        cmovge  r11,rbp
+
+        movdqu  xmm0,XMMWORD[((0-128))+rdi]
+        pand    xmm8,xmm6
+        movdqu  xmm1,XMMWORD[((32-128))+rdi]
+        pand    xmm9,xmm6
+        movdqu  xmm2,XMMWORD[((64-128))+rdi]
+        pand    xmm10,xmm6
+        movdqu  xmm5,XMMWORD[((96-128))+rdi]
+        pand    xmm11,xmm6
+        paddd   xmm8,xmm0
+        movdqu  xmm0,XMMWORD[((128-128))+rdi]
+        pand    xmm12,xmm6
+        paddd   xmm9,xmm1
+        movdqu  xmm1,XMMWORD[((160-128))+rdi]
+        pand    xmm13,xmm6
+        paddd   xmm10,xmm2
+        movdqu  xmm2,XMMWORD[((192-128))+rdi]
+        pand    xmm14,xmm6
+        paddd   xmm11,xmm5
+        movdqu  xmm5,XMMWORD[((224-128))+rdi]
+        pand    xmm15,xmm6
+        paddd   xmm12,xmm0
+        paddd   xmm13,xmm1
+        movdqu  XMMWORD[(0-128)+rdi],xmm8
+        paddd   xmm14,xmm2
+        movdqu  XMMWORD[(32-128)+rdi],xmm9
+        paddd   xmm15,xmm5
+        movdqu  XMMWORD[(64-128)+rdi],xmm10
+        movdqu  XMMWORD[(96-128)+rdi],xmm11
+        movdqu  XMMWORD[(128-128)+rdi],xmm12
+        movdqu  XMMWORD[(160-128)+rdi],xmm13
+        movdqu  XMMWORD[(192-128)+rdi],xmm14
+        movdqu  XMMWORD[(224-128)+rdi],xmm15
+
+        movdqa  XMMWORD[rbx],xmm7
+        movdqa  xmm6,XMMWORD[$L$pbswap]
+        dec     edx
+        jnz     NEAR $L$oop
+
+        mov     edx,DWORD[280+rsp]
+        lea     rdi,[16+rdi]
+        lea     rsi,[64+rsi]
+        dec     edx
+        jnz     NEAR $L$oop_grande
+
+$L$done:
+        mov     rax,QWORD[272+rsp]
+
+        movaps  xmm6,XMMWORD[((-184))+rax]
+        movaps  xmm7,XMMWORD[((-168))+rax]
+        movaps  xmm8,XMMWORD[((-152))+rax]
+        movaps  xmm9,XMMWORD[((-136))+rax]
+        movaps  xmm10,XMMWORD[((-120))+rax]
+        movaps  xmm11,XMMWORD[((-104))+rax]
+        movaps  xmm12,XMMWORD[((-88))+rax]
+        movaps  xmm13,XMMWORD[((-72))+rax]
+        movaps  xmm14,XMMWORD[((-56))+rax]
+        movaps  xmm15,XMMWORD[((-40))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha256_multi_block:
+
+ALIGN   32
+sha256_multi_block_shaext:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha256_multi_block_shaext:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+_shaext_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[(-120)+rax],xmm10
+        movaps  XMMWORD[(-104)+rax],xmm11
+        movaps  XMMWORD[(-88)+rax],xmm12
+        movaps  XMMWORD[(-72)+rax],xmm13
+        movaps  XMMWORD[(-56)+rax],xmm14
+        movaps  XMMWORD[(-40)+rax],xmm15
+        sub     rsp,288
+        shl     edx,1
+        and     rsp,-256
+        lea     rdi,[128+rdi]
+        mov     QWORD[272+rsp],rax
+$L$body_shaext:
+        lea     rbx,[256+rsp]
+        lea     rbp,[((K256_shaext+128))]
+
+$L$oop_grande_shaext:
+        mov     DWORD[280+rsp],edx
+        xor     edx,edx
+        mov     r8,QWORD[rsi]
+        mov     ecx,DWORD[8+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[rbx],ecx
+        cmovle  r8,rsp
+        mov     r9,QWORD[16+rsi]
+        mov     ecx,DWORD[24+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[4+rbx],ecx
+        cmovle  r9,rsp
+        test    edx,edx
+        jz      NEAR $L$done_shaext
+
+        movq    xmm12,QWORD[((0-128))+rdi]
+        movq    xmm4,QWORD[((32-128))+rdi]
+        movq    xmm13,QWORD[((64-128))+rdi]
+        movq    xmm5,QWORD[((96-128))+rdi]
+        movq    xmm8,QWORD[((128-128))+rdi]
+        movq    xmm9,QWORD[((160-128))+rdi]
+        movq    xmm10,QWORD[((192-128))+rdi]
+        movq    xmm11,QWORD[((224-128))+rdi]
+
+        punpckldq       xmm12,xmm4
+        punpckldq       xmm13,xmm5
+        punpckldq       xmm8,xmm9
+        punpckldq       xmm10,xmm11
+        movdqa  xmm3,XMMWORD[((K256_shaext-16))]
+
+        movdqa  xmm14,xmm12
+        movdqa  xmm15,xmm13
+        punpcklqdq      xmm12,xmm8
+        punpcklqdq      xmm13,xmm10
+        punpckhqdq      xmm14,xmm8
+        punpckhqdq      xmm15,xmm10
+
+        pshufd  xmm12,xmm12,27
+        pshufd  xmm13,xmm13,27
+        pshufd  xmm14,xmm14,27
+        pshufd  xmm15,xmm15,27
+        jmp     NEAR $L$oop_shaext
+
+ALIGN   32
+$L$oop_shaext:
+        movdqu  xmm4,XMMWORD[r8]
+        movdqu  xmm8,XMMWORD[r9]
+        movdqu  xmm5,XMMWORD[16+r8]
+        movdqu  xmm9,XMMWORD[16+r9]
+        movdqu  xmm6,XMMWORD[32+r8]
+DB      102,15,56,0,227
+        movdqu  xmm10,XMMWORD[32+r9]
+DB      102,68,15,56,0,195
+        movdqu  xmm7,XMMWORD[48+r8]
+        lea     r8,[64+r8]
+        movdqu  xmm11,XMMWORD[48+r9]
+        lea     r9,[64+r9]
+
+        movdqa  xmm0,XMMWORD[((0-128))+rbp]
+DB      102,15,56,0,235
+        paddd   xmm0,xmm4
+        pxor    xmm4,xmm12
+        movdqa  xmm1,xmm0
+        movdqa  xmm2,XMMWORD[((0-128))+rbp]
+DB      102,68,15,56,0,203
+        paddd   xmm2,xmm8
+        movdqa  XMMWORD[80+rsp],xmm13
+DB      69,15,56,203,236
+        pxor    xmm8,xmm14
+        movdqa  xmm0,xmm2
+        movdqa  XMMWORD[112+rsp],xmm15
+DB      69,15,56,203,254
+        pshufd  xmm0,xmm1,0x0e
+        pxor    xmm4,xmm12
+        movdqa  XMMWORD[64+rsp],xmm12
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        pxor    xmm8,xmm14
+        movdqa  XMMWORD[96+rsp],xmm14
+        movdqa  xmm1,XMMWORD[((16-128))+rbp]
+        paddd   xmm1,xmm5
+DB      102,15,56,0,243
+DB      69,15,56,203,247
+
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((16-128))+rbp]
+        paddd   xmm2,xmm9
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        prefetcht0      [127+r8]
+DB      102,15,56,0,251
+DB      102,68,15,56,0,211
+        prefetcht0      [127+r9]
+DB      69,15,56,203,254
+        pshufd  xmm0,xmm1,0x0e
+DB      102,68,15,56,0,219
+DB      15,56,204,229
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((32-128))+rbp]
+        paddd   xmm1,xmm6
+DB      69,15,56,203,247
+
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((32-128))+rbp]
+        paddd   xmm2,xmm10
+DB      69,15,56,203,236
+DB      69,15,56,204,193
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm7
+DB      69,15,56,203,254
+        pshufd  xmm0,xmm1,0x0e
+DB      102,15,58,15,222,4
+        paddd   xmm4,xmm3
+        movdqa  xmm3,xmm11
+DB      102,65,15,58,15,218,4
+DB      15,56,204,238
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((48-128))+rbp]
+        paddd   xmm1,xmm7
+DB      69,15,56,203,247
+DB      69,15,56,204,202
+
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((48-128))+rbp]
+        paddd   xmm8,xmm3
+        paddd   xmm2,xmm11
+DB      15,56,205,231
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm4
+DB      102,15,58,15,223,4
+DB      69,15,56,203,254
+DB      69,15,56,205,195
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm5,xmm3
+        movdqa  xmm3,xmm8
+DB      102,65,15,58,15,219,4
+DB      15,56,204,247
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((64-128))+rbp]
+        paddd   xmm1,xmm4
+DB      69,15,56,203,247
+DB      69,15,56,204,211
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((64-128))+rbp]
+        paddd   xmm9,xmm3
+        paddd   xmm2,xmm8
+DB      15,56,205,236
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm5
+DB      102,15,58,15,220,4
+DB      69,15,56,203,254
+DB      69,15,56,205,200
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm6,xmm3
+        movdqa  xmm3,xmm9
+DB      102,65,15,58,15,216,4
+DB      15,56,204,252
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((80-128))+rbp]
+        paddd   xmm1,xmm5
+DB      69,15,56,203,247
+DB      69,15,56,204,216
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((80-128))+rbp]
+        paddd   xmm10,xmm3
+        paddd   xmm2,xmm9
+DB      15,56,205,245
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm6
+DB      102,15,58,15,221,4
+DB      69,15,56,203,254
+DB      69,15,56,205,209
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm7,xmm3
+        movdqa  xmm3,xmm10
+DB      102,65,15,58,15,217,4
+DB      15,56,204,229
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((96-128))+rbp]
+        paddd   xmm1,xmm6
+DB      69,15,56,203,247
+DB      69,15,56,204,193
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((96-128))+rbp]
+        paddd   xmm11,xmm3
+        paddd   xmm2,xmm10
+DB      15,56,205,254
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm7
+DB      102,15,58,15,222,4
+DB      69,15,56,203,254
+DB      69,15,56,205,218
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm4,xmm3
+        movdqa  xmm3,xmm11
+DB      102,65,15,58,15,218,4
+DB      15,56,204,238
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((112-128))+rbp]
+        paddd   xmm1,xmm7
+DB      69,15,56,203,247
+DB      69,15,56,204,202
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((112-128))+rbp]
+        paddd   xmm8,xmm3
+        paddd   xmm2,xmm11
+DB      15,56,205,231
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm4
+DB      102,15,58,15,223,4
+DB      69,15,56,203,254
+DB      69,15,56,205,195
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm5,xmm3
+        movdqa  xmm3,xmm8
+DB      102,65,15,58,15,219,4
+DB      15,56,204,247
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((128-128))+rbp]
+        paddd   xmm1,xmm4
+DB      69,15,56,203,247
+DB      69,15,56,204,211
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((128-128))+rbp]
+        paddd   xmm9,xmm3
+        paddd   xmm2,xmm8
+DB      15,56,205,236
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm5
+DB      102,15,58,15,220,4
+DB      69,15,56,203,254
+DB      69,15,56,205,200
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm6,xmm3
+        movdqa  xmm3,xmm9
+DB      102,65,15,58,15,216,4
+DB      15,56,204,252
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((144-128))+rbp]
+        paddd   xmm1,xmm5
+DB      69,15,56,203,247
+DB      69,15,56,204,216
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((144-128))+rbp]
+        paddd   xmm10,xmm3
+        paddd   xmm2,xmm9
+DB      15,56,205,245
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm6
+DB      102,15,58,15,221,4
+DB      69,15,56,203,254
+DB      69,15,56,205,209
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm7,xmm3
+        movdqa  xmm3,xmm10
+DB      102,65,15,58,15,217,4
+DB      15,56,204,229
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((160-128))+rbp]
+        paddd   xmm1,xmm6
+DB      69,15,56,203,247
+DB      69,15,56,204,193
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((160-128))+rbp]
+        paddd   xmm11,xmm3
+        paddd   xmm2,xmm10
+DB      15,56,205,254
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm7
+DB      102,15,58,15,222,4
+DB      69,15,56,203,254
+DB      69,15,56,205,218
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm4,xmm3
+        movdqa  xmm3,xmm11
+DB      102,65,15,58,15,218,4
+DB      15,56,204,238
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((176-128))+rbp]
+        paddd   xmm1,xmm7
+DB      69,15,56,203,247
+DB      69,15,56,204,202
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((176-128))+rbp]
+        paddd   xmm8,xmm3
+        paddd   xmm2,xmm11
+DB      15,56,205,231
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm4
+DB      102,15,58,15,223,4
+DB      69,15,56,203,254
+DB      69,15,56,205,195
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm5,xmm3
+        movdqa  xmm3,xmm8
+DB      102,65,15,58,15,219,4
+DB      15,56,204,247
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((192-128))+rbp]
+        paddd   xmm1,xmm4
+DB      69,15,56,203,247
+DB      69,15,56,204,211
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((192-128))+rbp]
+        paddd   xmm9,xmm3
+        paddd   xmm2,xmm8
+DB      15,56,205,236
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm5
+DB      102,15,58,15,220,4
+DB      69,15,56,203,254
+DB      69,15,56,205,200
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm6,xmm3
+        movdqa  xmm3,xmm9
+DB      102,65,15,58,15,216,4
+DB      15,56,204,252
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((208-128))+rbp]
+        paddd   xmm1,xmm5
+DB      69,15,56,203,247
+DB      69,15,56,204,216
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((208-128))+rbp]
+        paddd   xmm10,xmm3
+        paddd   xmm2,xmm9
+DB      15,56,205,245
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        movdqa  xmm3,xmm6
+DB      102,15,58,15,221,4
+DB      69,15,56,203,254
+DB      69,15,56,205,209
+        pshufd  xmm0,xmm1,0x0e
+        paddd   xmm7,xmm3
+        movdqa  xmm3,xmm10
+DB      102,65,15,58,15,217,4
+        nop
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm1,XMMWORD[((224-128))+rbp]
+        paddd   xmm1,xmm6
+DB      69,15,56,203,247
+
+        movdqa  xmm0,xmm1
+        movdqa  xmm2,XMMWORD[((224-128))+rbp]
+        paddd   xmm11,xmm3
+        paddd   xmm2,xmm10
+DB      15,56,205,254
+        nop
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        mov     ecx,1
+        pxor    xmm6,xmm6
+DB      69,15,56,203,254
+DB      69,15,56,205,218
+        pshufd  xmm0,xmm1,0x0e
+        movdqa  xmm1,XMMWORD[((240-128))+rbp]
+        paddd   xmm1,xmm7
+        movq    xmm7,QWORD[rbx]
+        nop
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        movdqa  xmm2,XMMWORD[((240-128))+rbp]
+        paddd   xmm2,xmm11
+DB      69,15,56,203,247
+
+        movdqa  xmm0,xmm1
+        cmp     ecx,DWORD[rbx]
+        cmovge  r8,rsp
+        cmp     ecx,DWORD[4+rbx]
+        cmovge  r9,rsp
+        pshufd  xmm9,xmm7,0x00
+DB      69,15,56,203,236
+        movdqa  xmm0,xmm2
+        pshufd  xmm10,xmm7,0x55
+        movdqa  xmm11,xmm7
+DB      69,15,56,203,254
+        pshufd  xmm0,xmm1,0x0e
+        pcmpgtd xmm9,xmm6
+        pcmpgtd xmm10,xmm6
+DB      69,15,56,203,229
+        pshufd  xmm0,xmm2,0x0e
+        pcmpgtd xmm11,xmm6
+        movdqa  xmm3,XMMWORD[((K256_shaext-16))]
+DB      69,15,56,203,247
+
+        pand    xmm13,xmm9
+        pand    xmm15,xmm10
+        pand    xmm12,xmm9
+        pand    xmm14,xmm10
+        paddd   xmm11,xmm7
+
+        paddd   xmm13,XMMWORD[80+rsp]
+        paddd   xmm15,XMMWORD[112+rsp]
+        paddd   xmm12,XMMWORD[64+rsp]
+        paddd   xmm14,XMMWORD[96+rsp]
+
+        movq    QWORD[rbx],xmm11
+        dec     edx
+        jnz     NEAR $L$oop_shaext
+
+        mov     edx,DWORD[280+rsp]
+
+        pshufd  xmm12,xmm12,27
+        pshufd  xmm13,xmm13,27
+        pshufd  xmm14,xmm14,27
+        pshufd  xmm15,xmm15,27
+
+        movdqa  xmm5,xmm12
+        movdqa  xmm6,xmm13
+        punpckldq       xmm12,xmm14
+        punpckhdq       xmm5,xmm14
+        punpckldq       xmm13,xmm15
+        punpckhdq       xmm6,xmm15
+
+        movq    QWORD[(0-128)+rdi],xmm12
+        psrldq  xmm12,8
+        movq    QWORD[(128-128)+rdi],xmm5
+        psrldq  xmm5,8
+        movq    QWORD[(32-128)+rdi],xmm12
+        movq    QWORD[(160-128)+rdi],xmm5
+
+        movq    QWORD[(64-128)+rdi],xmm13
+        psrldq  xmm13,8
+        movq    QWORD[(192-128)+rdi],xmm6
+        psrldq  xmm6,8
+        movq    QWORD[(96-128)+rdi],xmm13
+        movq    QWORD[(224-128)+rdi],xmm6
+
+        lea     rdi,[8+rdi]
+        lea     rsi,[32+rsi]
+        dec     edx
+        jnz     NEAR $L$oop_grande_shaext
+
+$L$done_shaext:
+
+        movaps  xmm6,XMMWORD[((-184))+rax]
+        movaps  xmm7,XMMWORD[((-168))+rax]
+        movaps  xmm8,XMMWORD[((-152))+rax]
+        movaps  xmm9,XMMWORD[((-136))+rax]
+        movaps  xmm10,XMMWORD[((-120))+rax]
+        movaps  xmm11,XMMWORD[((-104))+rax]
+        movaps  xmm12,XMMWORD[((-88))+rax]
+        movaps  xmm13,XMMWORD[((-72))+rax]
+        movaps  xmm14,XMMWORD[((-56))+rax]
+        movaps  xmm15,XMMWORD[((-40))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$epilogue_shaext:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha256_multi_block_shaext:
+
+ALIGN   32
+sha256_multi_block_avx:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha256_multi_block_avx:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+_avx_shortcut:
+        shr     rcx,32
+        cmp     edx,2
+        jb      NEAR $L$avx
+        test    ecx,32
+        jnz     NEAR _avx2_shortcut
+        jmp     NEAR $L$avx
+ALIGN   32
+$L$avx:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[(-120)+rax],xmm10
+        movaps  XMMWORD[(-104)+rax],xmm11
+        movaps  XMMWORD[(-88)+rax],xmm12
+        movaps  XMMWORD[(-72)+rax],xmm13
+        movaps  XMMWORD[(-56)+rax],xmm14
+        movaps  XMMWORD[(-40)+rax],xmm15
+        sub     rsp,288
+        and     rsp,-256
+        mov     QWORD[272+rsp],rax
+
+$L$body_avx:
+        lea     rbp,[((K256+128))]
+        lea     rbx,[256+rsp]
+        lea     rdi,[128+rdi]
+
+$L$oop_grande_avx:
+        mov     DWORD[280+rsp],edx
+        xor     edx,edx
+        mov     r8,QWORD[rsi]
+        mov     ecx,DWORD[8+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[rbx],ecx
+        cmovle  r8,rbp
+        mov     r9,QWORD[16+rsi]
+        mov     ecx,DWORD[24+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[4+rbx],ecx
+        cmovle  r9,rbp
+        mov     r10,QWORD[32+rsi]
+        mov     ecx,DWORD[40+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[8+rbx],ecx
+        cmovle  r10,rbp
+        mov     r11,QWORD[48+rsi]
+        mov     ecx,DWORD[56+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[12+rbx],ecx
+        cmovle  r11,rbp
+        test    edx,edx
+        jz      NEAR $L$done_avx
+
+        vmovdqu xmm8,XMMWORD[((0-128))+rdi]
+        lea     rax,[128+rsp]
+        vmovdqu xmm9,XMMWORD[((32-128))+rdi]
+        vmovdqu xmm10,XMMWORD[((64-128))+rdi]
+        vmovdqu xmm11,XMMWORD[((96-128))+rdi]
+        vmovdqu xmm12,XMMWORD[((128-128))+rdi]
+        vmovdqu xmm13,XMMWORD[((160-128))+rdi]
+        vmovdqu xmm14,XMMWORD[((192-128))+rdi]
+        vmovdqu xmm15,XMMWORD[((224-128))+rdi]
+        vmovdqu xmm6,XMMWORD[$L$pbswap]
+        jmp     NEAR $L$oop_avx
+
+ALIGN   32
+$L$oop_avx:
+        vpxor   xmm4,xmm10,xmm9
+        vmovd   xmm5,DWORD[r8]
+        vmovd   xmm0,DWORD[r9]
+        vpinsrd xmm5,xmm5,DWORD[r10],1
+        vpinsrd xmm0,xmm0,DWORD[r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm12,6
+        vpslld  xmm2,xmm12,26
+        vmovdqu XMMWORD[(0-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm15
+
+        vpsrld  xmm1,xmm12,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm12,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-128))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm12,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm12,7
+        vpandn  xmm0,xmm12,xmm14
+        vpand   xmm3,xmm12,xmm13
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm15,xmm8,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm8,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm9,xmm8
+
+        vpxor   xmm15,xmm15,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm8,13
+
+        vpslld  xmm2,xmm8,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm15,xmm1
+
+        vpsrld  xmm1,xmm8,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm8,10
+        vpxor   xmm15,xmm9,xmm4
+        vpaddd  xmm11,xmm11,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm15,xmm15,xmm5
+        vpaddd  xmm15,xmm15,xmm7
+        vmovd   xmm5,DWORD[4+r8]
+        vmovd   xmm0,DWORD[4+r9]
+        vpinsrd xmm5,xmm5,DWORD[4+r10],1
+        vpinsrd xmm0,xmm0,DWORD[4+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm11,6
+        vpslld  xmm2,xmm11,26
+        vmovdqu XMMWORD[(16-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm14
+
+        vpsrld  xmm1,xmm11,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm11,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-96))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm11,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm11,7
+        vpandn  xmm0,xmm11,xmm13
+        vpand   xmm4,xmm11,xmm12
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm14,xmm15,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm15,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm8,xmm15
+
+        vpxor   xmm14,xmm14,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm15,13
+
+        vpslld  xmm2,xmm15,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm14,xmm1
+
+        vpsrld  xmm1,xmm15,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm15,10
+        vpxor   xmm14,xmm8,xmm3
+        vpaddd  xmm10,xmm10,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm14,xmm14,xmm5
+        vpaddd  xmm14,xmm14,xmm7
+        vmovd   xmm5,DWORD[8+r8]
+        vmovd   xmm0,DWORD[8+r9]
+        vpinsrd xmm5,xmm5,DWORD[8+r10],1
+        vpinsrd xmm0,xmm0,DWORD[8+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm10,6
+        vpslld  xmm2,xmm10,26
+        vmovdqu XMMWORD[(32-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm13
+
+        vpsrld  xmm1,xmm10,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm10,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-64))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm10,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm10,7
+        vpandn  xmm0,xmm10,xmm12
+        vpand   xmm3,xmm10,xmm11
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm13,xmm14,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm14,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm15,xmm14
+
+        vpxor   xmm13,xmm13,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm14,13
+
+        vpslld  xmm2,xmm14,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm13,xmm1
+
+        vpsrld  xmm1,xmm14,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm14,10
+        vpxor   xmm13,xmm15,xmm4
+        vpaddd  xmm9,xmm9,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm13,xmm13,xmm5
+        vpaddd  xmm13,xmm13,xmm7
+        vmovd   xmm5,DWORD[12+r8]
+        vmovd   xmm0,DWORD[12+r9]
+        vpinsrd xmm5,xmm5,DWORD[12+r10],1
+        vpinsrd xmm0,xmm0,DWORD[12+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm9,6
+        vpslld  xmm2,xmm9,26
+        vmovdqu XMMWORD[(48-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm12
+
+        vpsrld  xmm1,xmm9,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm9,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-32))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm9,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm9,7
+        vpandn  xmm0,xmm9,xmm11
+        vpand   xmm4,xmm9,xmm10
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm12,xmm13,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm13,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm14,xmm13
+
+        vpxor   xmm12,xmm12,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm13,13
+
+        vpslld  xmm2,xmm13,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm12,xmm1
+
+        vpsrld  xmm1,xmm13,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm13,10
+        vpxor   xmm12,xmm14,xmm3
+        vpaddd  xmm8,xmm8,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm12,xmm12,xmm5
+        vpaddd  xmm12,xmm12,xmm7
+        vmovd   xmm5,DWORD[16+r8]
+        vmovd   xmm0,DWORD[16+r9]
+        vpinsrd xmm5,xmm5,DWORD[16+r10],1
+        vpinsrd xmm0,xmm0,DWORD[16+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm8,6
+        vpslld  xmm2,xmm8,26
+        vmovdqu XMMWORD[(64-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm11
+
+        vpsrld  xmm1,xmm8,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm8,21
+        vpaddd  xmm5,xmm5,XMMWORD[rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm8,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm8,7
+        vpandn  xmm0,xmm8,xmm10
+        vpand   xmm3,xmm8,xmm9
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm11,xmm12,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm12,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm13,xmm12
+
+        vpxor   xmm11,xmm11,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm12,13
+
+        vpslld  xmm2,xmm12,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm11,xmm1
+
+        vpsrld  xmm1,xmm12,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm12,10
+        vpxor   xmm11,xmm13,xmm4
+        vpaddd  xmm15,xmm15,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm11,xmm11,xmm5
+        vpaddd  xmm11,xmm11,xmm7
+        vmovd   xmm5,DWORD[20+r8]
+        vmovd   xmm0,DWORD[20+r9]
+        vpinsrd xmm5,xmm5,DWORD[20+r10],1
+        vpinsrd xmm0,xmm0,DWORD[20+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm15,6
+        vpslld  xmm2,xmm15,26
+        vmovdqu XMMWORD[(80-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm10
+
+        vpsrld  xmm1,xmm15,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm15,21
+        vpaddd  xmm5,xmm5,XMMWORD[32+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm15,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm15,7
+        vpandn  xmm0,xmm15,xmm9
+        vpand   xmm4,xmm15,xmm8
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm10,xmm11,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm11,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm12,xmm11
+
+        vpxor   xmm10,xmm10,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm11,13
+
+        vpslld  xmm2,xmm11,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm10,xmm1
+
+        vpsrld  xmm1,xmm11,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm11,10
+        vpxor   xmm10,xmm12,xmm3
+        vpaddd  xmm14,xmm14,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm10,xmm10,xmm5
+        vpaddd  xmm10,xmm10,xmm7
+        vmovd   xmm5,DWORD[24+r8]
+        vmovd   xmm0,DWORD[24+r9]
+        vpinsrd xmm5,xmm5,DWORD[24+r10],1
+        vpinsrd xmm0,xmm0,DWORD[24+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm14,6
+        vpslld  xmm2,xmm14,26
+        vmovdqu XMMWORD[(96-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm9
+
+        vpsrld  xmm1,xmm14,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm14,21
+        vpaddd  xmm5,xmm5,XMMWORD[64+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm14,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm14,7
+        vpandn  xmm0,xmm14,xmm8
+        vpand   xmm3,xmm14,xmm15
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm9,xmm10,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm10,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm11,xmm10
+
+        vpxor   xmm9,xmm9,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm10,13
+
+        vpslld  xmm2,xmm10,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm9,xmm1
+
+        vpsrld  xmm1,xmm10,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm10,10
+        vpxor   xmm9,xmm11,xmm4
+        vpaddd  xmm13,xmm13,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm9,xmm9,xmm5
+        vpaddd  xmm9,xmm9,xmm7
+        vmovd   xmm5,DWORD[28+r8]
+        vmovd   xmm0,DWORD[28+r9]
+        vpinsrd xmm5,xmm5,DWORD[28+r10],1
+        vpinsrd xmm0,xmm0,DWORD[28+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm13,6
+        vpslld  xmm2,xmm13,26
+        vmovdqu XMMWORD[(112-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm8
+
+        vpsrld  xmm1,xmm13,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm13,21
+        vpaddd  xmm5,xmm5,XMMWORD[96+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm13,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm13,7
+        vpandn  xmm0,xmm13,xmm15
+        vpand   xmm4,xmm13,xmm14
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm8,xmm9,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm9,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm10,xmm9
+
+        vpxor   xmm8,xmm8,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm9,13
+
+        vpslld  xmm2,xmm9,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm8,xmm1
+
+        vpsrld  xmm1,xmm9,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm9,10
+        vpxor   xmm8,xmm10,xmm3
+        vpaddd  xmm12,xmm12,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm8,xmm8,xmm5
+        vpaddd  xmm8,xmm8,xmm7
+        add     rbp,256
+        vmovd   xmm5,DWORD[32+r8]
+        vmovd   xmm0,DWORD[32+r9]
+        vpinsrd xmm5,xmm5,DWORD[32+r10],1
+        vpinsrd xmm0,xmm0,DWORD[32+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm12,6
+        vpslld  xmm2,xmm12,26
+        vmovdqu XMMWORD[(128-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm15
+
+        vpsrld  xmm1,xmm12,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm12,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-128))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm12,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm12,7
+        vpandn  xmm0,xmm12,xmm14
+        vpand   xmm3,xmm12,xmm13
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm15,xmm8,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm8,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm9,xmm8
+
+        vpxor   xmm15,xmm15,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm8,13
+
+        vpslld  xmm2,xmm8,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm15,xmm1
+
+        vpsrld  xmm1,xmm8,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm8,10
+        vpxor   xmm15,xmm9,xmm4
+        vpaddd  xmm11,xmm11,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm15,xmm15,xmm5
+        vpaddd  xmm15,xmm15,xmm7
+        vmovd   xmm5,DWORD[36+r8]
+        vmovd   xmm0,DWORD[36+r9]
+        vpinsrd xmm5,xmm5,DWORD[36+r10],1
+        vpinsrd xmm0,xmm0,DWORD[36+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm11,6
+        vpslld  xmm2,xmm11,26
+        vmovdqu XMMWORD[(144-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm14
+
+        vpsrld  xmm1,xmm11,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm11,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-96))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm11,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm11,7
+        vpandn  xmm0,xmm11,xmm13
+        vpand   xmm4,xmm11,xmm12
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm14,xmm15,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm15,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm8,xmm15
+
+        vpxor   xmm14,xmm14,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm15,13
+
+        vpslld  xmm2,xmm15,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm14,xmm1
+
+        vpsrld  xmm1,xmm15,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm15,10
+        vpxor   xmm14,xmm8,xmm3
+        vpaddd  xmm10,xmm10,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm14,xmm14,xmm5
+        vpaddd  xmm14,xmm14,xmm7
+        vmovd   xmm5,DWORD[40+r8]
+        vmovd   xmm0,DWORD[40+r9]
+        vpinsrd xmm5,xmm5,DWORD[40+r10],1
+        vpinsrd xmm0,xmm0,DWORD[40+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm10,6
+        vpslld  xmm2,xmm10,26
+        vmovdqu XMMWORD[(160-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm13
+
+        vpsrld  xmm1,xmm10,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm10,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-64))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm10,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm10,7
+        vpandn  xmm0,xmm10,xmm12
+        vpand   xmm3,xmm10,xmm11
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm13,xmm14,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm14,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm15,xmm14
+
+        vpxor   xmm13,xmm13,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm14,13
+
+        vpslld  xmm2,xmm14,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm13,xmm1
+
+        vpsrld  xmm1,xmm14,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm14,10
+        vpxor   xmm13,xmm15,xmm4
+        vpaddd  xmm9,xmm9,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm13,xmm13,xmm5
+        vpaddd  xmm13,xmm13,xmm7
+        vmovd   xmm5,DWORD[44+r8]
+        vmovd   xmm0,DWORD[44+r9]
+        vpinsrd xmm5,xmm5,DWORD[44+r10],1
+        vpinsrd xmm0,xmm0,DWORD[44+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm9,6
+        vpslld  xmm2,xmm9,26
+        vmovdqu XMMWORD[(176-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm12
+
+        vpsrld  xmm1,xmm9,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm9,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-32))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm9,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm9,7
+        vpandn  xmm0,xmm9,xmm11
+        vpand   xmm4,xmm9,xmm10
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm12,xmm13,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm13,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm14,xmm13
+
+        vpxor   xmm12,xmm12,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm13,13
+
+        vpslld  xmm2,xmm13,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm12,xmm1
+
+        vpsrld  xmm1,xmm13,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm13,10
+        vpxor   xmm12,xmm14,xmm3
+        vpaddd  xmm8,xmm8,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm12,xmm12,xmm5
+        vpaddd  xmm12,xmm12,xmm7
+        vmovd   xmm5,DWORD[48+r8]
+        vmovd   xmm0,DWORD[48+r9]
+        vpinsrd xmm5,xmm5,DWORD[48+r10],1
+        vpinsrd xmm0,xmm0,DWORD[48+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm8,6
+        vpslld  xmm2,xmm8,26
+        vmovdqu XMMWORD[(192-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm11
+
+        vpsrld  xmm1,xmm8,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm8,21
+        vpaddd  xmm5,xmm5,XMMWORD[rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm8,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm8,7
+        vpandn  xmm0,xmm8,xmm10
+        vpand   xmm3,xmm8,xmm9
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm11,xmm12,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm12,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm13,xmm12
+
+        vpxor   xmm11,xmm11,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm12,13
+
+        vpslld  xmm2,xmm12,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm11,xmm1
+
+        vpsrld  xmm1,xmm12,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm12,10
+        vpxor   xmm11,xmm13,xmm4
+        vpaddd  xmm15,xmm15,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm11,xmm11,xmm5
+        vpaddd  xmm11,xmm11,xmm7
+        vmovd   xmm5,DWORD[52+r8]
+        vmovd   xmm0,DWORD[52+r9]
+        vpinsrd xmm5,xmm5,DWORD[52+r10],1
+        vpinsrd xmm0,xmm0,DWORD[52+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm15,6
+        vpslld  xmm2,xmm15,26
+        vmovdqu XMMWORD[(208-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm10
+
+        vpsrld  xmm1,xmm15,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm15,21
+        vpaddd  xmm5,xmm5,XMMWORD[32+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm15,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm15,7
+        vpandn  xmm0,xmm15,xmm9
+        vpand   xmm4,xmm15,xmm8
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm10,xmm11,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm11,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm12,xmm11
+
+        vpxor   xmm10,xmm10,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm11,13
+
+        vpslld  xmm2,xmm11,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm10,xmm1
+
+        vpsrld  xmm1,xmm11,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm11,10
+        vpxor   xmm10,xmm12,xmm3
+        vpaddd  xmm14,xmm14,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm10,xmm10,xmm5
+        vpaddd  xmm10,xmm10,xmm7
+        vmovd   xmm5,DWORD[56+r8]
+        vmovd   xmm0,DWORD[56+r9]
+        vpinsrd xmm5,xmm5,DWORD[56+r10],1
+        vpinsrd xmm0,xmm0,DWORD[56+r11],1
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm14,6
+        vpslld  xmm2,xmm14,26
+        vmovdqu XMMWORD[(224-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm9
+
+        vpsrld  xmm1,xmm14,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm14,21
+        vpaddd  xmm5,xmm5,XMMWORD[64+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm14,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm14,7
+        vpandn  xmm0,xmm14,xmm8
+        vpand   xmm3,xmm14,xmm15
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm9,xmm10,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm10,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm11,xmm10
+
+        vpxor   xmm9,xmm9,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm10,13
+
+        vpslld  xmm2,xmm10,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm9,xmm1
+
+        vpsrld  xmm1,xmm10,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm10,10
+        vpxor   xmm9,xmm11,xmm4
+        vpaddd  xmm13,xmm13,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm9,xmm9,xmm5
+        vpaddd  xmm9,xmm9,xmm7
+        vmovd   xmm5,DWORD[60+r8]
+        lea     r8,[64+r8]
+        vmovd   xmm0,DWORD[60+r9]
+        lea     r9,[64+r9]
+        vpinsrd xmm5,xmm5,DWORD[60+r10],1
+        lea     r10,[64+r10]
+        vpinsrd xmm0,xmm0,DWORD[60+r11],1
+        lea     r11,[64+r11]
+        vpunpckldq      xmm5,xmm5,xmm0
+        vpshufb xmm5,xmm5,xmm6
+        vpsrld  xmm7,xmm13,6
+        vpslld  xmm2,xmm13,26
+        vmovdqu XMMWORD[(240-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm8
+
+        vpsrld  xmm1,xmm13,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm13,21
+        vpaddd  xmm5,xmm5,XMMWORD[96+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm13,25
+        vpxor   xmm7,xmm7,xmm2
+        prefetcht0      [63+r8]
+        vpslld  xmm2,xmm13,7
+        vpandn  xmm0,xmm13,xmm15
+        vpand   xmm4,xmm13,xmm14
+        prefetcht0      [63+r9]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm8,xmm9,2
+        vpxor   xmm7,xmm7,xmm2
+        prefetcht0      [63+r10]
+        vpslld  xmm1,xmm9,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm10,xmm9
+        prefetcht0      [63+r11]
+        vpxor   xmm8,xmm8,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm9,13
+
+        vpslld  xmm2,xmm9,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm8,xmm1
+
+        vpsrld  xmm1,xmm9,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm9,10
+        vpxor   xmm8,xmm10,xmm3
+        vpaddd  xmm12,xmm12,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm8,xmm8,xmm5
+        vpaddd  xmm8,xmm8,xmm7
+        add     rbp,256
+        vmovdqu xmm5,XMMWORD[((0-128))+rax]
+        mov     ecx,3
+        jmp     NEAR $L$oop_16_xx_avx
+ALIGN   32
+$L$oop_16_xx_avx:
+        vmovdqu xmm6,XMMWORD[((16-128))+rax]
+        vpaddd  xmm5,xmm5,XMMWORD[((144-128))+rax]
+
+        vpsrld  xmm7,xmm6,3
+        vpsrld  xmm1,xmm6,7
+        vpslld  xmm2,xmm6,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm6,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm6,14
+        vmovdqu xmm0,XMMWORD[((224-128))+rax]
+        vpsrld  xmm3,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm5,xmm5,xmm7
+        vpxor   xmm7,xmm3,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm5,xmm5,xmm7
+        vpsrld  xmm7,xmm12,6
+        vpslld  xmm2,xmm12,26
+        vmovdqu XMMWORD[(0-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm15
+
+        vpsrld  xmm1,xmm12,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm12,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-128))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm12,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm12,7
+        vpandn  xmm0,xmm12,xmm14
+        vpand   xmm3,xmm12,xmm13
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm15,xmm8,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm8,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm9,xmm8
+
+        vpxor   xmm15,xmm15,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm8,13
+
+        vpslld  xmm2,xmm8,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm15,xmm1
+
+        vpsrld  xmm1,xmm8,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm8,10
+        vpxor   xmm15,xmm9,xmm4
+        vpaddd  xmm11,xmm11,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm15,xmm15,xmm5
+        vpaddd  xmm15,xmm15,xmm7
+        vmovdqu xmm5,XMMWORD[((32-128))+rax]
+        vpaddd  xmm6,xmm6,XMMWORD[((160-128))+rax]
+
+        vpsrld  xmm7,xmm5,3
+        vpsrld  xmm1,xmm5,7
+        vpslld  xmm2,xmm5,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm5,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm5,14
+        vmovdqu xmm0,XMMWORD[((240-128))+rax]
+        vpsrld  xmm4,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm6,xmm6,xmm7
+        vpxor   xmm7,xmm4,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm6,xmm6,xmm7
+        vpsrld  xmm7,xmm11,6
+        vpslld  xmm2,xmm11,26
+        vmovdqu XMMWORD[(16-128)+rax],xmm6
+        vpaddd  xmm6,xmm6,xmm14
+
+        vpsrld  xmm1,xmm11,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm11,21
+        vpaddd  xmm6,xmm6,XMMWORD[((-96))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm11,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm11,7
+        vpandn  xmm0,xmm11,xmm13
+        vpand   xmm4,xmm11,xmm12
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm14,xmm15,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm15,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm8,xmm15
+
+        vpxor   xmm14,xmm14,xmm1
+        vpaddd  xmm6,xmm6,xmm7
+
+        vpsrld  xmm1,xmm15,13
+
+        vpslld  xmm2,xmm15,19
+        vpaddd  xmm6,xmm6,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm14,xmm1
+
+        vpsrld  xmm1,xmm15,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm15,10
+        vpxor   xmm14,xmm8,xmm3
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm14,xmm14,xmm6
+        vpaddd  xmm14,xmm14,xmm7
+        vmovdqu xmm6,XMMWORD[((48-128))+rax]
+        vpaddd  xmm5,xmm5,XMMWORD[((176-128))+rax]
+
+        vpsrld  xmm7,xmm6,3
+        vpsrld  xmm1,xmm6,7
+        vpslld  xmm2,xmm6,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm6,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm6,14
+        vmovdqu xmm0,XMMWORD[((0-128))+rax]
+        vpsrld  xmm3,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm5,xmm5,xmm7
+        vpxor   xmm7,xmm3,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm5,xmm5,xmm7
+        vpsrld  xmm7,xmm10,6
+        vpslld  xmm2,xmm10,26
+        vmovdqu XMMWORD[(32-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm13
+
+        vpsrld  xmm1,xmm10,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm10,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-64))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm10,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm10,7
+        vpandn  xmm0,xmm10,xmm12
+        vpand   xmm3,xmm10,xmm11
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm13,xmm14,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm14,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm15,xmm14
+
+        vpxor   xmm13,xmm13,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm14,13
+
+        vpslld  xmm2,xmm14,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm13,xmm1
+
+        vpsrld  xmm1,xmm14,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm14,10
+        vpxor   xmm13,xmm15,xmm4
+        vpaddd  xmm9,xmm9,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm13,xmm13,xmm5
+        vpaddd  xmm13,xmm13,xmm7
+        vmovdqu xmm5,XMMWORD[((64-128))+rax]
+        vpaddd  xmm6,xmm6,XMMWORD[((192-128))+rax]
+
+        vpsrld  xmm7,xmm5,3
+        vpsrld  xmm1,xmm5,7
+        vpslld  xmm2,xmm5,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm5,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm5,14
+        vmovdqu xmm0,XMMWORD[((16-128))+rax]
+        vpsrld  xmm4,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm6,xmm6,xmm7
+        vpxor   xmm7,xmm4,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm6,xmm6,xmm7
+        vpsrld  xmm7,xmm9,6
+        vpslld  xmm2,xmm9,26
+        vmovdqu XMMWORD[(48-128)+rax],xmm6
+        vpaddd  xmm6,xmm6,xmm12
+
+        vpsrld  xmm1,xmm9,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm9,21
+        vpaddd  xmm6,xmm6,XMMWORD[((-32))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm9,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm9,7
+        vpandn  xmm0,xmm9,xmm11
+        vpand   xmm4,xmm9,xmm10
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm12,xmm13,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm13,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm14,xmm13
+
+        vpxor   xmm12,xmm12,xmm1
+        vpaddd  xmm6,xmm6,xmm7
+
+        vpsrld  xmm1,xmm13,13
+
+        vpslld  xmm2,xmm13,19
+        vpaddd  xmm6,xmm6,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm12,xmm1
+
+        vpsrld  xmm1,xmm13,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm13,10
+        vpxor   xmm12,xmm14,xmm3
+        vpaddd  xmm8,xmm8,xmm6
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm12,xmm12,xmm6
+        vpaddd  xmm12,xmm12,xmm7
+        vmovdqu xmm6,XMMWORD[((80-128))+rax]
+        vpaddd  xmm5,xmm5,XMMWORD[((208-128))+rax]
+
+        vpsrld  xmm7,xmm6,3
+        vpsrld  xmm1,xmm6,7
+        vpslld  xmm2,xmm6,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm6,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm6,14
+        vmovdqu xmm0,XMMWORD[((32-128))+rax]
+        vpsrld  xmm3,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm5,xmm5,xmm7
+        vpxor   xmm7,xmm3,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm5,xmm5,xmm7
+        vpsrld  xmm7,xmm8,6
+        vpslld  xmm2,xmm8,26
+        vmovdqu XMMWORD[(64-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm11
+
+        vpsrld  xmm1,xmm8,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm8,21
+        vpaddd  xmm5,xmm5,XMMWORD[rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm8,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm8,7
+        vpandn  xmm0,xmm8,xmm10
+        vpand   xmm3,xmm8,xmm9
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm11,xmm12,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm12,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm13,xmm12
+
+        vpxor   xmm11,xmm11,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm12,13
+
+        vpslld  xmm2,xmm12,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm11,xmm1
+
+        vpsrld  xmm1,xmm12,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm12,10
+        vpxor   xmm11,xmm13,xmm4
+        vpaddd  xmm15,xmm15,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm11,xmm11,xmm5
+        vpaddd  xmm11,xmm11,xmm7
+        vmovdqu xmm5,XMMWORD[((96-128))+rax]
+        vpaddd  xmm6,xmm6,XMMWORD[((224-128))+rax]
+
+        vpsrld  xmm7,xmm5,3
+        vpsrld  xmm1,xmm5,7
+        vpslld  xmm2,xmm5,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm5,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm5,14
+        vmovdqu xmm0,XMMWORD[((48-128))+rax]
+        vpsrld  xmm4,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm6,xmm6,xmm7
+        vpxor   xmm7,xmm4,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm6,xmm6,xmm7
+        vpsrld  xmm7,xmm15,6
+        vpslld  xmm2,xmm15,26
+        vmovdqu XMMWORD[(80-128)+rax],xmm6
+        vpaddd  xmm6,xmm6,xmm10
+
+        vpsrld  xmm1,xmm15,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm15,21
+        vpaddd  xmm6,xmm6,XMMWORD[32+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm15,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm15,7
+        vpandn  xmm0,xmm15,xmm9
+        vpand   xmm4,xmm15,xmm8
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm10,xmm11,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm11,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm12,xmm11
+
+        vpxor   xmm10,xmm10,xmm1
+        vpaddd  xmm6,xmm6,xmm7
+
+        vpsrld  xmm1,xmm11,13
+
+        vpslld  xmm2,xmm11,19
+        vpaddd  xmm6,xmm6,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm10,xmm1
+
+        vpsrld  xmm1,xmm11,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm11,10
+        vpxor   xmm10,xmm12,xmm3
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm10,xmm10,xmm6
+        vpaddd  xmm10,xmm10,xmm7
+        vmovdqu xmm6,XMMWORD[((112-128))+rax]
+        vpaddd  xmm5,xmm5,XMMWORD[((240-128))+rax]
+
+        vpsrld  xmm7,xmm6,3
+        vpsrld  xmm1,xmm6,7
+        vpslld  xmm2,xmm6,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm6,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm6,14
+        vmovdqu xmm0,XMMWORD[((64-128))+rax]
+        vpsrld  xmm3,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm5,xmm5,xmm7
+        vpxor   xmm7,xmm3,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm5,xmm5,xmm7
+        vpsrld  xmm7,xmm14,6
+        vpslld  xmm2,xmm14,26
+        vmovdqu XMMWORD[(96-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm9
+
+        vpsrld  xmm1,xmm14,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm14,21
+        vpaddd  xmm5,xmm5,XMMWORD[64+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm14,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm14,7
+        vpandn  xmm0,xmm14,xmm8
+        vpand   xmm3,xmm14,xmm15
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm9,xmm10,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm10,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm11,xmm10
+
+        vpxor   xmm9,xmm9,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm10,13
+
+        vpslld  xmm2,xmm10,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm9,xmm1
+
+        vpsrld  xmm1,xmm10,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm10,10
+        vpxor   xmm9,xmm11,xmm4
+        vpaddd  xmm13,xmm13,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm9,xmm9,xmm5
+        vpaddd  xmm9,xmm9,xmm7
+        vmovdqu xmm5,XMMWORD[((128-128))+rax]
+        vpaddd  xmm6,xmm6,XMMWORD[((0-128))+rax]
+
+        vpsrld  xmm7,xmm5,3
+        vpsrld  xmm1,xmm5,7
+        vpslld  xmm2,xmm5,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm5,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm5,14
+        vmovdqu xmm0,XMMWORD[((80-128))+rax]
+        vpsrld  xmm4,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm6,xmm6,xmm7
+        vpxor   xmm7,xmm4,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm6,xmm6,xmm7
+        vpsrld  xmm7,xmm13,6
+        vpslld  xmm2,xmm13,26
+        vmovdqu XMMWORD[(112-128)+rax],xmm6
+        vpaddd  xmm6,xmm6,xmm8
+
+        vpsrld  xmm1,xmm13,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm13,21
+        vpaddd  xmm6,xmm6,XMMWORD[96+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm13,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm13,7
+        vpandn  xmm0,xmm13,xmm15
+        vpand   xmm4,xmm13,xmm14
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm8,xmm9,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm9,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm10,xmm9
+
+        vpxor   xmm8,xmm8,xmm1
+        vpaddd  xmm6,xmm6,xmm7
+
+        vpsrld  xmm1,xmm9,13
+
+        vpslld  xmm2,xmm9,19
+        vpaddd  xmm6,xmm6,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm8,xmm1
+
+        vpsrld  xmm1,xmm9,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm9,10
+        vpxor   xmm8,xmm10,xmm3
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm8,xmm8,xmm6
+        vpaddd  xmm8,xmm8,xmm7
+        add     rbp,256
+        vmovdqu xmm6,XMMWORD[((144-128))+rax]
+        vpaddd  xmm5,xmm5,XMMWORD[((16-128))+rax]
+
+        vpsrld  xmm7,xmm6,3
+        vpsrld  xmm1,xmm6,7
+        vpslld  xmm2,xmm6,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm6,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm6,14
+        vmovdqu xmm0,XMMWORD[((96-128))+rax]
+        vpsrld  xmm3,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm5,xmm5,xmm7
+        vpxor   xmm7,xmm3,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm5,xmm5,xmm7
+        vpsrld  xmm7,xmm12,6
+        vpslld  xmm2,xmm12,26
+        vmovdqu XMMWORD[(128-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm15
+
+        vpsrld  xmm1,xmm12,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm12,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-128))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm12,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm12,7
+        vpandn  xmm0,xmm12,xmm14
+        vpand   xmm3,xmm12,xmm13
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm15,xmm8,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm8,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm9,xmm8
+
+        vpxor   xmm15,xmm15,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm8,13
+
+        vpslld  xmm2,xmm8,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm15,xmm1
+
+        vpsrld  xmm1,xmm8,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm8,10
+        vpxor   xmm15,xmm9,xmm4
+        vpaddd  xmm11,xmm11,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm15,xmm15,xmm5
+        vpaddd  xmm15,xmm15,xmm7
+        vmovdqu xmm5,XMMWORD[((160-128))+rax]
+        vpaddd  xmm6,xmm6,XMMWORD[((32-128))+rax]
+
+        vpsrld  xmm7,xmm5,3
+        vpsrld  xmm1,xmm5,7
+        vpslld  xmm2,xmm5,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm5,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm5,14
+        vmovdqu xmm0,XMMWORD[((112-128))+rax]
+        vpsrld  xmm4,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm6,xmm6,xmm7
+        vpxor   xmm7,xmm4,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm6,xmm6,xmm7
+        vpsrld  xmm7,xmm11,6
+        vpslld  xmm2,xmm11,26
+        vmovdqu XMMWORD[(144-128)+rax],xmm6
+        vpaddd  xmm6,xmm6,xmm14
+
+        vpsrld  xmm1,xmm11,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm11,21
+        vpaddd  xmm6,xmm6,XMMWORD[((-96))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm11,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm11,7
+        vpandn  xmm0,xmm11,xmm13
+        vpand   xmm4,xmm11,xmm12
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm14,xmm15,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm15,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm8,xmm15
+
+        vpxor   xmm14,xmm14,xmm1
+        vpaddd  xmm6,xmm6,xmm7
+
+        vpsrld  xmm1,xmm15,13
+
+        vpslld  xmm2,xmm15,19
+        vpaddd  xmm6,xmm6,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm14,xmm1
+
+        vpsrld  xmm1,xmm15,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm15,10
+        vpxor   xmm14,xmm8,xmm3
+        vpaddd  xmm10,xmm10,xmm6
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm14,xmm14,xmm6
+        vpaddd  xmm14,xmm14,xmm7
+        vmovdqu xmm6,XMMWORD[((176-128))+rax]
+        vpaddd  xmm5,xmm5,XMMWORD[((48-128))+rax]
+
+        vpsrld  xmm7,xmm6,3
+        vpsrld  xmm1,xmm6,7
+        vpslld  xmm2,xmm6,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm6,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm6,14
+        vmovdqu xmm0,XMMWORD[((128-128))+rax]
+        vpsrld  xmm3,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm5,xmm5,xmm7
+        vpxor   xmm7,xmm3,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm5,xmm5,xmm7
+        vpsrld  xmm7,xmm10,6
+        vpslld  xmm2,xmm10,26
+        vmovdqu XMMWORD[(160-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm13
+
+        vpsrld  xmm1,xmm10,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm10,21
+        vpaddd  xmm5,xmm5,XMMWORD[((-64))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm10,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm10,7
+        vpandn  xmm0,xmm10,xmm12
+        vpand   xmm3,xmm10,xmm11
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm13,xmm14,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm14,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm15,xmm14
+
+        vpxor   xmm13,xmm13,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm14,13
+
+        vpslld  xmm2,xmm14,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm13,xmm1
+
+        vpsrld  xmm1,xmm14,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm14,10
+        vpxor   xmm13,xmm15,xmm4
+        vpaddd  xmm9,xmm9,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm13,xmm13,xmm5
+        vpaddd  xmm13,xmm13,xmm7
+        vmovdqu xmm5,XMMWORD[((192-128))+rax]
+        vpaddd  xmm6,xmm6,XMMWORD[((64-128))+rax]
+
+        vpsrld  xmm7,xmm5,3
+        vpsrld  xmm1,xmm5,7
+        vpslld  xmm2,xmm5,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm5,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm5,14
+        vmovdqu xmm0,XMMWORD[((144-128))+rax]
+        vpsrld  xmm4,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm6,xmm6,xmm7
+        vpxor   xmm7,xmm4,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm6,xmm6,xmm7
+        vpsrld  xmm7,xmm9,6
+        vpslld  xmm2,xmm9,26
+        vmovdqu XMMWORD[(176-128)+rax],xmm6
+        vpaddd  xmm6,xmm6,xmm12
+
+        vpsrld  xmm1,xmm9,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm9,21
+        vpaddd  xmm6,xmm6,XMMWORD[((-32))+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm9,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm9,7
+        vpandn  xmm0,xmm9,xmm11
+        vpand   xmm4,xmm9,xmm10
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm12,xmm13,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm13,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm14,xmm13
+
+        vpxor   xmm12,xmm12,xmm1
+        vpaddd  xmm6,xmm6,xmm7
+
+        vpsrld  xmm1,xmm13,13
+
+        vpslld  xmm2,xmm13,19
+        vpaddd  xmm6,xmm6,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm12,xmm1
+
+        vpsrld  xmm1,xmm13,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm13,10
+        vpxor   xmm12,xmm14,xmm3
+        vpaddd  xmm8,xmm8,xmm6
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm12,xmm12,xmm6
+        vpaddd  xmm12,xmm12,xmm7
+        vmovdqu xmm6,XMMWORD[((208-128))+rax]
+        vpaddd  xmm5,xmm5,XMMWORD[((80-128))+rax]
+
+        vpsrld  xmm7,xmm6,3
+        vpsrld  xmm1,xmm6,7
+        vpslld  xmm2,xmm6,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm6,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm6,14
+        vmovdqu xmm0,XMMWORD[((160-128))+rax]
+        vpsrld  xmm3,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm5,xmm5,xmm7
+        vpxor   xmm7,xmm3,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm5,xmm5,xmm7
+        vpsrld  xmm7,xmm8,6
+        vpslld  xmm2,xmm8,26
+        vmovdqu XMMWORD[(192-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm11
+
+        vpsrld  xmm1,xmm8,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm8,21
+        vpaddd  xmm5,xmm5,XMMWORD[rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm8,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm8,7
+        vpandn  xmm0,xmm8,xmm10
+        vpand   xmm3,xmm8,xmm9
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm11,xmm12,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm12,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm13,xmm12
+
+        vpxor   xmm11,xmm11,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm12,13
+
+        vpslld  xmm2,xmm12,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm11,xmm1
+
+        vpsrld  xmm1,xmm12,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm12,10
+        vpxor   xmm11,xmm13,xmm4
+        vpaddd  xmm15,xmm15,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm11,xmm11,xmm5
+        vpaddd  xmm11,xmm11,xmm7
+        vmovdqu xmm5,XMMWORD[((224-128))+rax]
+        vpaddd  xmm6,xmm6,XMMWORD[((96-128))+rax]
+
+        vpsrld  xmm7,xmm5,3
+        vpsrld  xmm1,xmm5,7
+        vpslld  xmm2,xmm5,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm5,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm5,14
+        vmovdqu xmm0,XMMWORD[((176-128))+rax]
+        vpsrld  xmm4,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm6,xmm6,xmm7
+        vpxor   xmm7,xmm4,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm6,xmm6,xmm7
+        vpsrld  xmm7,xmm15,6
+        vpslld  xmm2,xmm15,26
+        vmovdqu XMMWORD[(208-128)+rax],xmm6
+        vpaddd  xmm6,xmm6,xmm10
+
+        vpsrld  xmm1,xmm15,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm15,21
+        vpaddd  xmm6,xmm6,XMMWORD[32+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm15,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm15,7
+        vpandn  xmm0,xmm15,xmm9
+        vpand   xmm4,xmm15,xmm8
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm10,xmm11,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm11,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm12,xmm11
+
+        vpxor   xmm10,xmm10,xmm1
+        vpaddd  xmm6,xmm6,xmm7
+
+        vpsrld  xmm1,xmm11,13
+
+        vpslld  xmm2,xmm11,19
+        vpaddd  xmm6,xmm6,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm10,xmm1
+
+        vpsrld  xmm1,xmm11,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm11,10
+        vpxor   xmm10,xmm12,xmm3
+        vpaddd  xmm14,xmm14,xmm6
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm10,xmm10,xmm6
+        vpaddd  xmm10,xmm10,xmm7
+        vmovdqu xmm6,XMMWORD[((240-128))+rax]
+        vpaddd  xmm5,xmm5,XMMWORD[((112-128))+rax]
+
+        vpsrld  xmm7,xmm6,3
+        vpsrld  xmm1,xmm6,7
+        vpslld  xmm2,xmm6,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm6,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm6,14
+        vmovdqu xmm0,XMMWORD[((192-128))+rax]
+        vpsrld  xmm3,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm5,xmm5,xmm7
+        vpxor   xmm7,xmm3,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm5,xmm5,xmm7
+        vpsrld  xmm7,xmm14,6
+        vpslld  xmm2,xmm14,26
+        vmovdqu XMMWORD[(224-128)+rax],xmm5
+        vpaddd  xmm5,xmm5,xmm9
+
+        vpsrld  xmm1,xmm14,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm14,21
+        vpaddd  xmm5,xmm5,XMMWORD[64+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm14,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm14,7
+        vpandn  xmm0,xmm14,xmm8
+        vpand   xmm3,xmm14,xmm15
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm9,xmm10,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm10,30
+        vpxor   xmm0,xmm0,xmm3
+        vpxor   xmm3,xmm11,xmm10
+
+        vpxor   xmm9,xmm9,xmm1
+        vpaddd  xmm5,xmm5,xmm7
+
+        vpsrld  xmm1,xmm10,13
+
+        vpslld  xmm2,xmm10,19
+        vpaddd  xmm5,xmm5,xmm0
+        vpand   xmm4,xmm4,xmm3
+
+        vpxor   xmm7,xmm9,xmm1
+
+        vpsrld  xmm1,xmm10,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm10,10
+        vpxor   xmm9,xmm11,xmm4
+        vpaddd  xmm13,xmm13,xmm5
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm9,xmm9,xmm5
+        vpaddd  xmm9,xmm9,xmm7
+        vmovdqu xmm5,XMMWORD[((0-128))+rax]
+        vpaddd  xmm6,xmm6,XMMWORD[((128-128))+rax]
+
+        vpsrld  xmm7,xmm5,3
+        vpsrld  xmm1,xmm5,7
+        vpslld  xmm2,xmm5,25
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm5,18
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm5,14
+        vmovdqu xmm0,XMMWORD[((208-128))+rax]
+        vpsrld  xmm4,xmm0,10
+
+        vpxor   xmm7,xmm7,xmm1
+        vpsrld  xmm1,xmm0,17
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,15
+        vpaddd  xmm6,xmm6,xmm7
+        vpxor   xmm7,xmm4,xmm1
+        vpsrld  xmm1,xmm0,19
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm0,13
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+        vpaddd  xmm6,xmm6,xmm7
+        vpsrld  xmm7,xmm13,6
+        vpslld  xmm2,xmm13,26
+        vmovdqu XMMWORD[(240-128)+rax],xmm6
+        vpaddd  xmm6,xmm6,xmm8
+
+        vpsrld  xmm1,xmm13,11
+        vpxor   xmm7,xmm7,xmm2
+        vpslld  xmm2,xmm13,21
+        vpaddd  xmm6,xmm6,XMMWORD[96+rbp]
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm1,xmm13,25
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm13,7
+        vpandn  xmm0,xmm13,xmm15
+        vpand   xmm4,xmm13,xmm14
+
+        vpxor   xmm7,xmm7,xmm1
+
+        vpsrld  xmm8,xmm9,2
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm1,xmm9,30
+        vpxor   xmm0,xmm0,xmm4
+        vpxor   xmm4,xmm10,xmm9
+
+        vpxor   xmm8,xmm8,xmm1
+        vpaddd  xmm6,xmm6,xmm7
+
+        vpsrld  xmm1,xmm9,13
+
+        vpslld  xmm2,xmm9,19
+        vpaddd  xmm6,xmm6,xmm0
+        vpand   xmm3,xmm3,xmm4
+
+        vpxor   xmm7,xmm8,xmm1
+
+        vpsrld  xmm1,xmm9,22
+        vpxor   xmm7,xmm7,xmm2
+
+        vpslld  xmm2,xmm9,10
+        vpxor   xmm8,xmm10,xmm3
+        vpaddd  xmm12,xmm12,xmm6
+
+        vpxor   xmm7,xmm7,xmm1
+        vpxor   xmm7,xmm7,xmm2
+
+        vpaddd  xmm8,xmm8,xmm6
+        vpaddd  xmm8,xmm8,xmm7
+        add     rbp,256
+        dec     ecx
+        jnz     NEAR $L$oop_16_xx_avx
+
+        mov     ecx,1
+        lea     rbp,[((K256+128))]
+        cmp     ecx,DWORD[rbx]
+        cmovge  r8,rbp
+        cmp     ecx,DWORD[4+rbx]
+        cmovge  r9,rbp
+        cmp     ecx,DWORD[8+rbx]
+        cmovge  r10,rbp
+        cmp     ecx,DWORD[12+rbx]
+        cmovge  r11,rbp
+        vmovdqa xmm7,XMMWORD[rbx]
+        vpxor   xmm0,xmm0,xmm0
+        vmovdqa xmm6,xmm7
+        vpcmpgtd        xmm6,xmm6,xmm0
+        vpaddd  xmm7,xmm7,xmm6
+
+        vmovdqu xmm0,XMMWORD[((0-128))+rdi]
+        vpand   xmm8,xmm8,xmm6
+        vmovdqu xmm1,XMMWORD[((32-128))+rdi]
+        vpand   xmm9,xmm9,xmm6
+        vmovdqu xmm2,XMMWORD[((64-128))+rdi]
+        vpand   xmm10,xmm10,xmm6
+        vmovdqu xmm5,XMMWORD[((96-128))+rdi]
+        vpand   xmm11,xmm11,xmm6
+        vpaddd  xmm8,xmm8,xmm0
+        vmovdqu xmm0,XMMWORD[((128-128))+rdi]
+        vpand   xmm12,xmm12,xmm6
+        vpaddd  xmm9,xmm9,xmm1
+        vmovdqu xmm1,XMMWORD[((160-128))+rdi]
+        vpand   xmm13,xmm13,xmm6
+        vpaddd  xmm10,xmm10,xmm2
+        vmovdqu xmm2,XMMWORD[((192-128))+rdi]
+        vpand   xmm14,xmm14,xmm6
+        vpaddd  xmm11,xmm11,xmm5
+        vmovdqu xmm5,XMMWORD[((224-128))+rdi]
+        vpand   xmm15,xmm15,xmm6
+        vpaddd  xmm12,xmm12,xmm0
+        vpaddd  xmm13,xmm13,xmm1
+        vmovdqu XMMWORD[(0-128)+rdi],xmm8
+        vpaddd  xmm14,xmm14,xmm2
+        vmovdqu XMMWORD[(32-128)+rdi],xmm9
+        vpaddd  xmm15,xmm15,xmm5
+        vmovdqu XMMWORD[(64-128)+rdi],xmm10
+        vmovdqu XMMWORD[(96-128)+rdi],xmm11
+        vmovdqu XMMWORD[(128-128)+rdi],xmm12
+        vmovdqu XMMWORD[(160-128)+rdi],xmm13
+        vmovdqu XMMWORD[(192-128)+rdi],xmm14
+        vmovdqu XMMWORD[(224-128)+rdi],xmm15
+
+        vmovdqu XMMWORD[rbx],xmm7
+        vmovdqu xmm6,XMMWORD[$L$pbswap]
+        dec     edx
+        jnz     NEAR $L$oop_avx
+
+        mov     edx,DWORD[280+rsp]
+        lea     rdi,[16+rdi]
+        lea     rsi,[64+rsi]
+        dec     edx
+        jnz     NEAR $L$oop_grande_avx
+
+$L$done_avx:
+        mov     rax,QWORD[272+rsp]
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-184))+rax]
+        movaps  xmm7,XMMWORD[((-168))+rax]
+        movaps  xmm8,XMMWORD[((-152))+rax]
+        movaps  xmm9,XMMWORD[((-136))+rax]
+        movaps  xmm10,XMMWORD[((-120))+rax]
+        movaps  xmm11,XMMWORD[((-104))+rax]
+        movaps  xmm12,XMMWORD[((-88))+rax]
+        movaps  xmm13,XMMWORD[((-72))+rax]
+        movaps  xmm14,XMMWORD[((-56))+rax]
+        movaps  xmm15,XMMWORD[((-40))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$epilogue_avx:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha256_multi_block_avx:
+
+ALIGN   32
+sha256_multi_block_avx2:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha256_multi_block_avx2:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+_avx2_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        lea     rsp,[((-168))+rsp]
+        movaps  XMMWORD[rsp],xmm6
+        movaps  XMMWORD[16+rsp],xmm7
+        movaps  XMMWORD[32+rsp],xmm8
+        movaps  XMMWORD[48+rsp],xmm9
+        movaps  XMMWORD[64+rsp],xmm10
+        movaps  XMMWORD[80+rsp],xmm11
+        movaps  XMMWORD[(-120)+rax],xmm12
+        movaps  XMMWORD[(-104)+rax],xmm13
+        movaps  XMMWORD[(-88)+rax],xmm14
+        movaps  XMMWORD[(-72)+rax],xmm15
+        sub     rsp,576
+        and     rsp,-256
+        mov     QWORD[544+rsp],rax
+
+$L$body_avx2:
+        lea     rbp,[((K256+128))]
+        lea     rdi,[128+rdi]
+
+$L$oop_grande_avx2:
+        mov     DWORD[552+rsp],edx
+        xor     edx,edx
+        lea     rbx,[512+rsp]
+        mov     r12,QWORD[rsi]
+        mov     ecx,DWORD[8+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[rbx],ecx
+        cmovle  r12,rbp
+        mov     r13,QWORD[16+rsi]
+        mov     ecx,DWORD[24+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[4+rbx],ecx
+        cmovle  r13,rbp
+        mov     r14,QWORD[32+rsi]
+        mov     ecx,DWORD[40+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[8+rbx],ecx
+        cmovle  r14,rbp
+        mov     r15,QWORD[48+rsi]
+        mov     ecx,DWORD[56+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[12+rbx],ecx
+        cmovle  r15,rbp
+        mov     r8,QWORD[64+rsi]
+        mov     ecx,DWORD[72+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[16+rbx],ecx
+        cmovle  r8,rbp
+        mov     r9,QWORD[80+rsi]
+        mov     ecx,DWORD[88+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[20+rbx],ecx
+        cmovle  r9,rbp
+        mov     r10,QWORD[96+rsi]
+        mov     ecx,DWORD[104+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[24+rbx],ecx
+        cmovle  r10,rbp
+        mov     r11,QWORD[112+rsi]
+        mov     ecx,DWORD[120+rsi]
+        cmp     ecx,edx
+        cmovg   edx,ecx
+        test    ecx,ecx
+        mov     DWORD[28+rbx],ecx
+        cmovle  r11,rbp
+        vmovdqu ymm8,YMMWORD[((0-128))+rdi]
+        lea     rax,[128+rsp]
+        vmovdqu ymm9,YMMWORD[((32-128))+rdi]
+        lea     rbx,[((256+128))+rsp]
+        vmovdqu ymm10,YMMWORD[((64-128))+rdi]
+        vmovdqu ymm11,YMMWORD[((96-128))+rdi]
+        vmovdqu ymm12,YMMWORD[((128-128))+rdi]
+        vmovdqu ymm13,YMMWORD[((160-128))+rdi]
+        vmovdqu ymm14,YMMWORD[((192-128))+rdi]
+        vmovdqu ymm15,YMMWORD[((224-128))+rdi]
+        vmovdqu ymm6,YMMWORD[$L$pbswap]
+        jmp     NEAR $L$oop_avx2
+
+ALIGN   32
+$L$oop_avx2:
+        vpxor   ymm4,ymm10,ymm9
+        vmovd   xmm5,DWORD[r12]
+        vmovd   xmm0,DWORD[r8]
+        vmovd   xmm1,DWORD[r13]
+        vmovd   xmm2,DWORD[r9]
+        vpinsrd xmm5,xmm5,DWORD[r14],1
+        vpinsrd xmm0,xmm0,DWORD[r10],1
+        vpinsrd xmm1,xmm1,DWORD[r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm12,6
+        vpslld  ymm2,ymm12,26
+        vmovdqu YMMWORD[(0-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm15
+
+        vpsrld  ymm1,ymm12,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm12,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-128))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm12,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm12,7
+        vpandn  ymm0,ymm12,ymm14
+        vpand   ymm3,ymm12,ymm13
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm15,ymm8,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm8,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm9,ymm8
+
+        vpxor   ymm15,ymm15,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm8,13
+
+        vpslld  ymm2,ymm8,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm15,ymm1
+
+        vpsrld  ymm1,ymm8,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm8,10
+        vpxor   ymm15,ymm9,ymm4
+        vpaddd  ymm11,ymm11,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm15,ymm15,ymm5
+        vpaddd  ymm15,ymm15,ymm7
+        vmovd   xmm5,DWORD[4+r12]
+        vmovd   xmm0,DWORD[4+r8]
+        vmovd   xmm1,DWORD[4+r13]
+        vmovd   xmm2,DWORD[4+r9]
+        vpinsrd xmm5,xmm5,DWORD[4+r14],1
+        vpinsrd xmm0,xmm0,DWORD[4+r10],1
+        vpinsrd xmm1,xmm1,DWORD[4+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[4+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm11,6
+        vpslld  ymm2,ymm11,26
+        vmovdqu YMMWORD[(32-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm14
+
+        vpsrld  ymm1,ymm11,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm11,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-96))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm11,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm11,7
+        vpandn  ymm0,ymm11,ymm13
+        vpand   ymm4,ymm11,ymm12
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm14,ymm15,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm15,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm8,ymm15
+
+        vpxor   ymm14,ymm14,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm15,13
+
+        vpslld  ymm2,ymm15,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm14,ymm1
+
+        vpsrld  ymm1,ymm15,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm15,10
+        vpxor   ymm14,ymm8,ymm3
+        vpaddd  ymm10,ymm10,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm14,ymm14,ymm5
+        vpaddd  ymm14,ymm14,ymm7
+        vmovd   xmm5,DWORD[8+r12]
+        vmovd   xmm0,DWORD[8+r8]
+        vmovd   xmm1,DWORD[8+r13]
+        vmovd   xmm2,DWORD[8+r9]
+        vpinsrd xmm5,xmm5,DWORD[8+r14],1
+        vpinsrd xmm0,xmm0,DWORD[8+r10],1
+        vpinsrd xmm1,xmm1,DWORD[8+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[8+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm10,6
+        vpslld  ymm2,ymm10,26
+        vmovdqu YMMWORD[(64-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm13
+
+        vpsrld  ymm1,ymm10,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm10,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-64))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm10,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm10,7
+        vpandn  ymm0,ymm10,ymm12
+        vpand   ymm3,ymm10,ymm11
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm13,ymm14,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm14,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm15,ymm14
+
+        vpxor   ymm13,ymm13,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm14,13
+
+        vpslld  ymm2,ymm14,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm13,ymm1
+
+        vpsrld  ymm1,ymm14,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm14,10
+        vpxor   ymm13,ymm15,ymm4
+        vpaddd  ymm9,ymm9,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm13,ymm13,ymm5
+        vpaddd  ymm13,ymm13,ymm7
+        vmovd   xmm5,DWORD[12+r12]
+        vmovd   xmm0,DWORD[12+r8]
+        vmovd   xmm1,DWORD[12+r13]
+        vmovd   xmm2,DWORD[12+r9]
+        vpinsrd xmm5,xmm5,DWORD[12+r14],1
+        vpinsrd xmm0,xmm0,DWORD[12+r10],1
+        vpinsrd xmm1,xmm1,DWORD[12+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[12+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm9,6
+        vpslld  ymm2,ymm9,26
+        vmovdqu YMMWORD[(96-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm12
+
+        vpsrld  ymm1,ymm9,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm9,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-32))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm9,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm9,7
+        vpandn  ymm0,ymm9,ymm11
+        vpand   ymm4,ymm9,ymm10
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm12,ymm13,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm13,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm14,ymm13
+
+        vpxor   ymm12,ymm12,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm13,13
+
+        vpslld  ymm2,ymm13,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm12,ymm1
+
+        vpsrld  ymm1,ymm13,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm13,10
+        vpxor   ymm12,ymm14,ymm3
+        vpaddd  ymm8,ymm8,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm12,ymm12,ymm5
+        vpaddd  ymm12,ymm12,ymm7
+        vmovd   xmm5,DWORD[16+r12]
+        vmovd   xmm0,DWORD[16+r8]
+        vmovd   xmm1,DWORD[16+r13]
+        vmovd   xmm2,DWORD[16+r9]
+        vpinsrd xmm5,xmm5,DWORD[16+r14],1
+        vpinsrd xmm0,xmm0,DWORD[16+r10],1
+        vpinsrd xmm1,xmm1,DWORD[16+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[16+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm8,6
+        vpslld  ymm2,ymm8,26
+        vmovdqu YMMWORD[(128-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm11
+
+        vpsrld  ymm1,ymm8,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm8,21
+        vpaddd  ymm5,ymm5,YMMWORD[rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm8,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm8,7
+        vpandn  ymm0,ymm8,ymm10
+        vpand   ymm3,ymm8,ymm9
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm11,ymm12,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm12,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm13,ymm12
+
+        vpxor   ymm11,ymm11,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm12,13
+
+        vpslld  ymm2,ymm12,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm11,ymm1
+
+        vpsrld  ymm1,ymm12,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm12,10
+        vpxor   ymm11,ymm13,ymm4
+        vpaddd  ymm15,ymm15,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm11,ymm11,ymm5
+        vpaddd  ymm11,ymm11,ymm7
+        vmovd   xmm5,DWORD[20+r12]
+        vmovd   xmm0,DWORD[20+r8]
+        vmovd   xmm1,DWORD[20+r13]
+        vmovd   xmm2,DWORD[20+r9]
+        vpinsrd xmm5,xmm5,DWORD[20+r14],1
+        vpinsrd xmm0,xmm0,DWORD[20+r10],1
+        vpinsrd xmm1,xmm1,DWORD[20+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[20+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm15,6
+        vpslld  ymm2,ymm15,26
+        vmovdqu YMMWORD[(160-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm10
+
+        vpsrld  ymm1,ymm15,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm15,21
+        vpaddd  ymm5,ymm5,YMMWORD[32+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm15,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm15,7
+        vpandn  ymm0,ymm15,ymm9
+        vpand   ymm4,ymm15,ymm8
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm10,ymm11,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm11,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm12,ymm11
+
+        vpxor   ymm10,ymm10,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm11,13
+
+        vpslld  ymm2,ymm11,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm10,ymm1
+
+        vpsrld  ymm1,ymm11,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm11,10
+        vpxor   ymm10,ymm12,ymm3
+        vpaddd  ymm14,ymm14,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm10,ymm10,ymm5
+        vpaddd  ymm10,ymm10,ymm7
+        vmovd   xmm5,DWORD[24+r12]
+        vmovd   xmm0,DWORD[24+r8]
+        vmovd   xmm1,DWORD[24+r13]
+        vmovd   xmm2,DWORD[24+r9]
+        vpinsrd xmm5,xmm5,DWORD[24+r14],1
+        vpinsrd xmm0,xmm0,DWORD[24+r10],1
+        vpinsrd xmm1,xmm1,DWORD[24+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[24+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm14,6
+        vpslld  ymm2,ymm14,26
+        vmovdqu YMMWORD[(192-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm9
+
+        vpsrld  ymm1,ymm14,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm14,21
+        vpaddd  ymm5,ymm5,YMMWORD[64+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm14,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm14,7
+        vpandn  ymm0,ymm14,ymm8
+        vpand   ymm3,ymm14,ymm15
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm9,ymm10,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm10,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm11,ymm10
+
+        vpxor   ymm9,ymm9,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm10,13
+
+        vpslld  ymm2,ymm10,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm9,ymm1
+
+        vpsrld  ymm1,ymm10,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm10,10
+        vpxor   ymm9,ymm11,ymm4
+        vpaddd  ymm13,ymm13,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm9,ymm9,ymm5
+        vpaddd  ymm9,ymm9,ymm7
+        vmovd   xmm5,DWORD[28+r12]
+        vmovd   xmm0,DWORD[28+r8]
+        vmovd   xmm1,DWORD[28+r13]
+        vmovd   xmm2,DWORD[28+r9]
+        vpinsrd xmm5,xmm5,DWORD[28+r14],1
+        vpinsrd xmm0,xmm0,DWORD[28+r10],1
+        vpinsrd xmm1,xmm1,DWORD[28+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[28+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm13,6
+        vpslld  ymm2,ymm13,26
+        vmovdqu YMMWORD[(224-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm8
+
+        vpsrld  ymm1,ymm13,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm13,21
+        vpaddd  ymm5,ymm5,YMMWORD[96+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm13,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm13,7
+        vpandn  ymm0,ymm13,ymm15
+        vpand   ymm4,ymm13,ymm14
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm8,ymm9,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm9,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm10,ymm9
+
+        vpxor   ymm8,ymm8,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm9,13
+
+        vpslld  ymm2,ymm9,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm8,ymm1
+
+        vpsrld  ymm1,ymm9,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm9,10
+        vpxor   ymm8,ymm10,ymm3
+        vpaddd  ymm12,ymm12,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm8,ymm8,ymm5
+        vpaddd  ymm8,ymm8,ymm7
+        add     rbp,256
+        vmovd   xmm5,DWORD[32+r12]
+        vmovd   xmm0,DWORD[32+r8]
+        vmovd   xmm1,DWORD[32+r13]
+        vmovd   xmm2,DWORD[32+r9]
+        vpinsrd xmm5,xmm5,DWORD[32+r14],1
+        vpinsrd xmm0,xmm0,DWORD[32+r10],1
+        vpinsrd xmm1,xmm1,DWORD[32+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[32+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm12,6
+        vpslld  ymm2,ymm12,26
+        vmovdqu YMMWORD[(256-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm15
+
+        vpsrld  ymm1,ymm12,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm12,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-128))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm12,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm12,7
+        vpandn  ymm0,ymm12,ymm14
+        vpand   ymm3,ymm12,ymm13
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm15,ymm8,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm8,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm9,ymm8
+
+        vpxor   ymm15,ymm15,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm8,13
+
+        vpslld  ymm2,ymm8,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm15,ymm1
+
+        vpsrld  ymm1,ymm8,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm8,10
+        vpxor   ymm15,ymm9,ymm4
+        vpaddd  ymm11,ymm11,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm15,ymm15,ymm5
+        vpaddd  ymm15,ymm15,ymm7
+        vmovd   xmm5,DWORD[36+r12]
+        vmovd   xmm0,DWORD[36+r8]
+        vmovd   xmm1,DWORD[36+r13]
+        vmovd   xmm2,DWORD[36+r9]
+        vpinsrd xmm5,xmm5,DWORD[36+r14],1
+        vpinsrd xmm0,xmm0,DWORD[36+r10],1
+        vpinsrd xmm1,xmm1,DWORD[36+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[36+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm11,6
+        vpslld  ymm2,ymm11,26
+        vmovdqu YMMWORD[(288-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm14
+
+        vpsrld  ymm1,ymm11,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm11,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-96))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm11,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm11,7
+        vpandn  ymm0,ymm11,ymm13
+        vpand   ymm4,ymm11,ymm12
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm14,ymm15,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm15,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm8,ymm15
+
+        vpxor   ymm14,ymm14,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm15,13
+
+        vpslld  ymm2,ymm15,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm14,ymm1
+
+        vpsrld  ymm1,ymm15,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm15,10
+        vpxor   ymm14,ymm8,ymm3
+        vpaddd  ymm10,ymm10,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm14,ymm14,ymm5
+        vpaddd  ymm14,ymm14,ymm7
+        vmovd   xmm5,DWORD[40+r12]
+        vmovd   xmm0,DWORD[40+r8]
+        vmovd   xmm1,DWORD[40+r13]
+        vmovd   xmm2,DWORD[40+r9]
+        vpinsrd xmm5,xmm5,DWORD[40+r14],1
+        vpinsrd xmm0,xmm0,DWORD[40+r10],1
+        vpinsrd xmm1,xmm1,DWORD[40+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[40+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm10,6
+        vpslld  ymm2,ymm10,26
+        vmovdqu YMMWORD[(320-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm13
+
+        vpsrld  ymm1,ymm10,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm10,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-64))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm10,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm10,7
+        vpandn  ymm0,ymm10,ymm12
+        vpand   ymm3,ymm10,ymm11
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm13,ymm14,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm14,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm15,ymm14
+
+        vpxor   ymm13,ymm13,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm14,13
+
+        vpslld  ymm2,ymm14,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm13,ymm1
+
+        vpsrld  ymm1,ymm14,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm14,10
+        vpxor   ymm13,ymm15,ymm4
+        vpaddd  ymm9,ymm9,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm13,ymm13,ymm5
+        vpaddd  ymm13,ymm13,ymm7
+        vmovd   xmm5,DWORD[44+r12]
+        vmovd   xmm0,DWORD[44+r8]
+        vmovd   xmm1,DWORD[44+r13]
+        vmovd   xmm2,DWORD[44+r9]
+        vpinsrd xmm5,xmm5,DWORD[44+r14],1
+        vpinsrd xmm0,xmm0,DWORD[44+r10],1
+        vpinsrd xmm1,xmm1,DWORD[44+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[44+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm9,6
+        vpslld  ymm2,ymm9,26
+        vmovdqu YMMWORD[(352-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm12
+
+        vpsrld  ymm1,ymm9,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm9,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-32))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm9,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm9,7
+        vpandn  ymm0,ymm9,ymm11
+        vpand   ymm4,ymm9,ymm10
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm12,ymm13,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm13,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm14,ymm13
+
+        vpxor   ymm12,ymm12,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm13,13
+
+        vpslld  ymm2,ymm13,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm12,ymm1
+
+        vpsrld  ymm1,ymm13,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm13,10
+        vpxor   ymm12,ymm14,ymm3
+        vpaddd  ymm8,ymm8,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm12,ymm12,ymm5
+        vpaddd  ymm12,ymm12,ymm7
+        vmovd   xmm5,DWORD[48+r12]
+        vmovd   xmm0,DWORD[48+r8]
+        vmovd   xmm1,DWORD[48+r13]
+        vmovd   xmm2,DWORD[48+r9]
+        vpinsrd xmm5,xmm5,DWORD[48+r14],1
+        vpinsrd xmm0,xmm0,DWORD[48+r10],1
+        vpinsrd xmm1,xmm1,DWORD[48+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[48+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm8,6
+        vpslld  ymm2,ymm8,26
+        vmovdqu YMMWORD[(384-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm11
+
+        vpsrld  ymm1,ymm8,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm8,21
+        vpaddd  ymm5,ymm5,YMMWORD[rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm8,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm8,7
+        vpandn  ymm0,ymm8,ymm10
+        vpand   ymm3,ymm8,ymm9
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm11,ymm12,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm12,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm13,ymm12
+
+        vpxor   ymm11,ymm11,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm12,13
+
+        vpslld  ymm2,ymm12,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm11,ymm1
+
+        vpsrld  ymm1,ymm12,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm12,10
+        vpxor   ymm11,ymm13,ymm4
+        vpaddd  ymm15,ymm15,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm11,ymm11,ymm5
+        vpaddd  ymm11,ymm11,ymm7
+        vmovd   xmm5,DWORD[52+r12]
+        vmovd   xmm0,DWORD[52+r8]
+        vmovd   xmm1,DWORD[52+r13]
+        vmovd   xmm2,DWORD[52+r9]
+        vpinsrd xmm5,xmm5,DWORD[52+r14],1
+        vpinsrd xmm0,xmm0,DWORD[52+r10],1
+        vpinsrd xmm1,xmm1,DWORD[52+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[52+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm15,6
+        vpslld  ymm2,ymm15,26
+        vmovdqu YMMWORD[(416-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm10
+
+        vpsrld  ymm1,ymm15,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm15,21
+        vpaddd  ymm5,ymm5,YMMWORD[32+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm15,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm15,7
+        vpandn  ymm0,ymm15,ymm9
+        vpand   ymm4,ymm15,ymm8
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm10,ymm11,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm11,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm12,ymm11
+
+        vpxor   ymm10,ymm10,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm11,13
+
+        vpslld  ymm2,ymm11,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm10,ymm1
+
+        vpsrld  ymm1,ymm11,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm11,10
+        vpxor   ymm10,ymm12,ymm3
+        vpaddd  ymm14,ymm14,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm10,ymm10,ymm5
+        vpaddd  ymm10,ymm10,ymm7
+        vmovd   xmm5,DWORD[56+r12]
+        vmovd   xmm0,DWORD[56+r8]
+        vmovd   xmm1,DWORD[56+r13]
+        vmovd   xmm2,DWORD[56+r9]
+        vpinsrd xmm5,xmm5,DWORD[56+r14],1
+        vpinsrd xmm0,xmm0,DWORD[56+r10],1
+        vpinsrd xmm1,xmm1,DWORD[56+r15],1
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[56+r11],1
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm14,6
+        vpslld  ymm2,ymm14,26
+        vmovdqu YMMWORD[(448-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm9
+
+        vpsrld  ymm1,ymm14,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm14,21
+        vpaddd  ymm5,ymm5,YMMWORD[64+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm14,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm14,7
+        vpandn  ymm0,ymm14,ymm8
+        vpand   ymm3,ymm14,ymm15
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm9,ymm10,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm10,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm11,ymm10
+
+        vpxor   ymm9,ymm9,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm10,13
+
+        vpslld  ymm2,ymm10,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm9,ymm1
+
+        vpsrld  ymm1,ymm10,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm10,10
+        vpxor   ymm9,ymm11,ymm4
+        vpaddd  ymm13,ymm13,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm9,ymm9,ymm5
+        vpaddd  ymm9,ymm9,ymm7
+        vmovd   xmm5,DWORD[60+r12]
+        lea     r12,[64+r12]
+        vmovd   xmm0,DWORD[60+r8]
+        lea     r8,[64+r8]
+        vmovd   xmm1,DWORD[60+r13]
+        lea     r13,[64+r13]
+        vmovd   xmm2,DWORD[60+r9]
+        lea     r9,[64+r9]
+        vpinsrd xmm5,xmm5,DWORD[60+r14],1
+        lea     r14,[64+r14]
+        vpinsrd xmm0,xmm0,DWORD[60+r10],1
+        lea     r10,[64+r10]
+        vpinsrd xmm1,xmm1,DWORD[60+r15],1
+        lea     r15,[64+r15]
+        vpunpckldq      ymm5,ymm5,ymm1
+        vpinsrd xmm2,xmm2,DWORD[60+r11],1
+        lea     r11,[64+r11]
+        vpunpckldq      ymm0,ymm0,ymm2
+        vinserti128     ymm5,ymm5,xmm0,1
+        vpshufb ymm5,ymm5,ymm6
+        vpsrld  ymm7,ymm13,6
+        vpslld  ymm2,ymm13,26
+        vmovdqu YMMWORD[(480-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm8
+
+        vpsrld  ymm1,ymm13,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm13,21
+        vpaddd  ymm5,ymm5,YMMWORD[96+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm13,25
+        vpxor   ymm7,ymm7,ymm2
+        prefetcht0      [63+r12]
+        vpslld  ymm2,ymm13,7
+        vpandn  ymm0,ymm13,ymm15
+        vpand   ymm4,ymm13,ymm14
+        prefetcht0      [63+r13]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm8,ymm9,2
+        vpxor   ymm7,ymm7,ymm2
+        prefetcht0      [63+r14]
+        vpslld  ymm1,ymm9,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm10,ymm9
+        prefetcht0      [63+r15]
+        vpxor   ymm8,ymm8,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm9,13
+        prefetcht0      [63+r8]
+        vpslld  ymm2,ymm9,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm3,ymm3,ymm4
+        prefetcht0      [63+r9]
+        vpxor   ymm7,ymm8,ymm1
+
+        vpsrld  ymm1,ymm9,22
+        vpxor   ymm7,ymm7,ymm2
+        prefetcht0      [63+r10]
+        vpslld  ymm2,ymm9,10
+        vpxor   ymm8,ymm10,ymm3
+        vpaddd  ymm12,ymm12,ymm5
+        prefetcht0      [63+r11]
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm8,ymm8,ymm5
+        vpaddd  ymm8,ymm8,ymm7
+        add     rbp,256
+        vmovdqu ymm5,YMMWORD[((0-128))+rax]
+        mov     ecx,3
+        jmp     NEAR $L$oop_16_xx_avx2
+ALIGN   32
+$L$oop_16_xx_avx2:
+        vmovdqu ymm6,YMMWORD[((32-128))+rax]
+        vpaddd  ymm5,ymm5,YMMWORD[((288-256-128))+rbx]
+
+        vpsrld  ymm7,ymm6,3
+        vpsrld  ymm1,ymm6,7
+        vpslld  ymm2,ymm6,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm6,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm6,14
+        vmovdqu ymm0,YMMWORD[((448-256-128))+rbx]
+        vpsrld  ymm3,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm5,ymm5,ymm7
+        vpxor   ymm7,ymm3,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm5,ymm5,ymm7
+        vpsrld  ymm7,ymm12,6
+        vpslld  ymm2,ymm12,26
+        vmovdqu YMMWORD[(0-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm15
+
+        vpsrld  ymm1,ymm12,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm12,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-128))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm12,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm12,7
+        vpandn  ymm0,ymm12,ymm14
+        vpand   ymm3,ymm12,ymm13
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm15,ymm8,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm8,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm9,ymm8
+
+        vpxor   ymm15,ymm15,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm8,13
+
+        vpslld  ymm2,ymm8,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm15,ymm1
+
+        vpsrld  ymm1,ymm8,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm8,10
+        vpxor   ymm15,ymm9,ymm4
+        vpaddd  ymm11,ymm11,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm15,ymm15,ymm5
+        vpaddd  ymm15,ymm15,ymm7
+        vmovdqu ymm5,YMMWORD[((64-128))+rax]
+        vpaddd  ymm6,ymm6,YMMWORD[((320-256-128))+rbx]
+
+        vpsrld  ymm7,ymm5,3
+        vpsrld  ymm1,ymm5,7
+        vpslld  ymm2,ymm5,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm5,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm5,14
+        vmovdqu ymm0,YMMWORD[((480-256-128))+rbx]
+        vpsrld  ymm4,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm6,ymm6,ymm7
+        vpxor   ymm7,ymm4,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm6,ymm6,ymm7
+        vpsrld  ymm7,ymm11,6
+        vpslld  ymm2,ymm11,26
+        vmovdqu YMMWORD[(32-128)+rax],ymm6
+        vpaddd  ymm6,ymm6,ymm14
+
+        vpsrld  ymm1,ymm11,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm11,21
+        vpaddd  ymm6,ymm6,YMMWORD[((-96))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm11,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm11,7
+        vpandn  ymm0,ymm11,ymm13
+        vpand   ymm4,ymm11,ymm12
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm14,ymm15,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm15,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm8,ymm15
+
+        vpxor   ymm14,ymm14,ymm1
+        vpaddd  ymm6,ymm6,ymm7
+
+        vpsrld  ymm1,ymm15,13
+
+        vpslld  ymm2,ymm15,19
+        vpaddd  ymm6,ymm6,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm14,ymm1
+
+        vpsrld  ymm1,ymm15,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm15,10
+        vpxor   ymm14,ymm8,ymm3
+        vpaddd  ymm10,ymm10,ymm6
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm14,ymm14,ymm6
+        vpaddd  ymm14,ymm14,ymm7
+        vmovdqu ymm6,YMMWORD[((96-128))+rax]
+        vpaddd  ymm5,ymm5,YMMWORD[((352-256-128))+rbx]
+
+        vpsrld  ymm7,ymm6,3
+        vpsrld  ymm1,ymm6,7
+        vpslld  ymm2,ymm6,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm6,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm6,14
+        vmovdqu ymm0,YMMWORD[((0-128))+rax]
+        vpsrld  ymm3,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm5,ymm5,ymm7
+        vpxor   ymm7,ymm3,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm5,ymm5,ymm7
+        vpsrld  ymm7,ymm10,6
+        vpslld  ymm2,ymm10,26
+        vmovdqu YMMWORD[(64-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm13
+
+        vpsrld  ymm1,ymm10,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm10,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-64))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm10,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm10,7
+        vpandn  ymm0,ymm10,ymm12
+        vpand   ymm3,ymm10,ymm11
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm13,ymm14,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm14,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm15,ymm14
+
+        vpxor   ymm13,ymm13,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm14,13
+
+        vpslld  ymm2,ymm14,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm13,ymm1
+
+        vpsrld  ymm1,ymm14,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm14,10
+        vpxor   ymm13,ymm15,ymm4
+        vpaddd  ymm9,ymm9,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm13,ymm13,ymm5
+        vpaddd  ymm13,ymm13,ymm7
+        vmovdqu ymm5,YMMWORD[((128-128))+rax]
+        vpaddd  ymm6,ymm6,YMMWORD[((384-256-128))+rbx]
+
+        vpsrld  ymm7,ymm5,3
+        vpsrld  ymm1,ymm5,7
+        vpslld  ymm2,ymm5,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm5,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm5,14
+        vmovdqu ymm0,YMMWORD[((32-128))+rax]
+        vpsrld  ymm4,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm6,ymm6,ymm7
+        vpxor   ymm7,ymm4,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm6,ymm6,ymm7
+        vpsrld  ymm7,ymm9,6
+        vpslld  ymm2,ymm9,26
+        vmovdqu YMMWORD[(96-128)+rax],ymm6
+        vpaddd  ymm6,ymm6,ymm12
+
+        vpsrld  ymm1,ymm9,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm9,21
+        vpaddd  ymm6,ymm6,YMMWORD[((-32))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm9,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm9,7
+        vpandn  ymm0,ymm9,ymm11
+        vpand   ymm4,ymm9,ymm10
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm12,ymm13,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm13,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm14,ymm13
+
+        vpxor   ymm12,ymm12,ymm1
+        vpaddd  ymm6,ymm6,ymm7
+
+        vpsrld  ymm1,ymm13,13
+
+        vpslld  ymm2,ymm13,19
+        vpaddd  ymm6,ymm6,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm12,ymm1
+
+        vpsrld  ymm1,ymm13,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm13,10
+        vpxor   ymm12,ymm14,ymm3
+        vpaddd  ymm8,ymm8,ymm6
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm12,ymm12,ymm6
+        vpaddd  ymm12,ymm12,ymm7
+        vmovdqu ymm6,YMMWORD[((160-128))+rax]
+        vpaddd  ymm5,ymm5,YMMWORD[((416-256-128))+rbx]
+
+        vpsrld  ymm7,ymm6,3
+        vpsrld  ymm1,ymm6,7
+        vpslld  ymm2,ymm6,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm6,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm6,14
+        vmovdqu ymm0,YMMWORD[((64-128))+rax]
+        vpsrld  ymm3,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm5,ymm5,ymm7
+        vpxor   ymm7,ymm3,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm5,ymm5,ymm7
+        vpsrld  ymm7,ymm8,6
+        vpslld  ymm2,ymm8,26
+        vmovdqu YMMWORD[(128-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm11
+
+        vpsrld  ymm1,ymm8,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm8,21
+        vpaddd  ymm5,ymm5,YMMWORD[rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm8,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm8,7
+        vpandn  ymm0,ymm8,ymm10
+        vpand   ymm3,ymm8,ymm9
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm11,ymm12,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm12,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm13,ymm12
+
+        vpxor   ymm11,ymm11,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm12,13
+
+        vpslld  ymm2,ymm12,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm11,ymm1
+
+        vpsrld  ymm1,ymm12,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm12,10
+        vpxor   ymm11,ymm13,ymm4
+        vpaddd  ymm15,ymm15,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm11,ymm11,ymm5
+        vpaddd  ymm11,ymm11,ymm7
+        vmovdqu ymm5,YMMWORD[((192-128))+rax]
+        vpaddd  ymm6,ymm6,YMMWORD[((448-256-128))+rbx]
+
+        vpsrld  ymm7,ymm5,3
+        vpsrld  ymm1,ymm5,7
+        vpslld  ymm2,ymm5,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm5,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm5,14
+        vmovdqu ymm0,YMMWORD[((96-128))+rax]
+        vpsrld  ymm4,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm6,ymm6,ymm7
+        vpxor   ymm7,ymm4,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm6,ymm6,ymm7
+        vpsrld  ymm7,ymm15,6
+        vpslld  ymm2,ymm15,26
+        vmovdqu YMMWORD[(160-128)+rax],ymm6
+        vpaddd  ymm6,ymm6,ymm10
+
+        vpsrld  ymm1,ymm15,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm15,21
+        vpaddd  ymm6,ymm6,YMMWORD[32+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm15,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm15,7
+        vpandn  ymm0,ymm15,ymm9
+        vpand   ymm4,ymm15,ymm8
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm10,ymm11,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm11,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm12,ymm11
+
+        vpxor   ymm10,ymm10,ymm1
+        vpaddd  ymm6,ymm6,ymm7
+
+        vpsrld  ymm1,ymm11,13
+
+        vpslld  ymm2,ymm11,19
+        vpaddd  ymm6,ymm6,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm10,ymm1
+
+        vpsrld  ymm1,ymm11,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm11,10
+        vpxor   ymm10,ymm12,ymm3
+        vpaddd  ymm14,ymm14,ymm6
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm10,ymm10,ymm6
+        vpaddd  ymm10,ymm10,ymm7
+        vmovdqu ymm6,YMMWORD[((224-128))+rax]
+        vpaddd  ymm5,ymm5,YMMWORD[((480-256-128))+rbx]
+
+        vpsrld  ymm7,ymm6,3
+        vpsrld  ymm1,ymm6,7
+        vpslld  ymm2,ymm6,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm6,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm6,14
+        vmovdqu ymm0,YMMWORD[((128-128))+rax]
+        vpsrld  ymm3,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm5,ymm5,ymm7
+        vpxor   ymm7,ymm3,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm5,ymm5,ymm7
+        vpsrld  ymm7,ymm14,6
+        vpslld  ymm2,ymm14,26
+        vmovdqu YMMWORD[(192-128)+rax],ymm5
+        vpaddd  ymm5,ymm5,ymm9
+
+        vpsrld  ymm1,ymm14,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm14,21
+        vpaddd  ymm5,ymm5,YMMWORD[64+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm14,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm14,7
+        vpandn  ymm0,ymm14,ymm8
+        vpand   ymm3,ymm14,ymm15
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm9,ymm10,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm10,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm11,ymm10
+
+        vpxor   ymm9,ymm9,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm10,13
+
+        vpslld  ymm2,ymm10,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm9,ymm1
+
+        vpsrld  ymm1,ymm10,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm10,10
+        vpxor   ymm9,ymm11,ymm4
+        vpaddd  ymm13,ymm13,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm9,ymm9,ymm5
+        vpaddd  ymm9,ymm9,ymm7
+        vmovdqu ymm5,YMMWORD[((256-256-128))+rbx]
+        vpaddd  ymm6,ymm6,YMMWORD[((0-128))+rax]
+
+        vpsrld  ymm7,ymm5,3
+        vpsrld  ymm1,ymm5,7
+        vpslld  ymm2,ymm5,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm5,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm5,14
+        vmovdqu ymm0,YMMWORD[((160-128))+rax]
+        vpsrld  ymm4,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm6,ymm6,ymm7
+        vpxor   ymm7,ymm4,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm6,ymm6,ymm7
+        vpsrld  ymm7,ymm13,6
+        vpslld  ymm2,ymm13,26
+        vmovdqu YMMWORD[(224-128)+rax],ymm6
+        vpaddd  ymm6,ymm6,ymm8
+
+        vpsrld  ymm1,ymm13,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm13,21
+        vpaddd  ymm6,ymm6,YMMWORD[96+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm13,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm13,7
+        vpandn  ymm0,ymm13,ymm15
+        vpand   ymm4,ymm13,ymm14
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm8,ymm9,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm9,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm10,ymm9
+
+        vpxor   ymm8,ymm8,ymm1
+        vpaddd  ymm6,ymm6,ymm7
+
+        vpsrld  ymm1,ymm9,13
+
+        vpslld  ymm2,ymm9,19
+        vpaddd  ymm6,ymm6,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm8,ymm1
+
+        vpsrld  ymm1,ymm9,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm9,10
+        vpxor   ymm8,ymm10,ymm3
+        vpaddd  ymm12,ymm12,ymm6
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm8,ymm8,ymm6
+        vpaddd  ymm8,ymm8,ymm7
+        add     rbp,256
+        vmovdqu ymm6,YMMWORD[((288-256-128))+rbx]
+        vpaddd  ymm5,ymm5,YMMWORD[((32-128))+rax]
+
+        vpsrld  ymm7,ymm6,3
+        vpsrld  ymm1,ymm6,7
+        vpslld  ymm2,ymm6,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm6,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm6,14
+        vmovdqu ymm0,YMMWORD[((192-128))+rax]
+        vpsrld  ymm3,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm5,ymm5,ymm7
+        vpxor   ymm7,ymm3,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm5,ymm5,ymm7
+        vpsrld  ymm7,ymm12,6
+        vpslld  ymm2,ymm12,26
+        vmovdqu YMMWORD[(256-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm15
+
+        vpsrld  ymm1,ymm12,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm12,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-128))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm12,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm12,7
+        vpandn  ymm0,ymm12,ymm14
+        vpand   ymm3,ymm12,ymm13
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm15,ymm8,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm8,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm9,ymm8
+
+        vpxor   ymm15,ymm15,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm8,13
+
+        vpslld  ymm2,ymm8,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm15,ymm1
+
+        vpsrld  ymm1,ymm8,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm8,10
+        vpxor   ymm15,ymm9,ymm4
+        vpaddd  ymm11,ymm11,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm15,ymm15,ymm5
+        vpaddd  ymm15,ymm15,ymm7
+        vmovdqu ymm5,YMMWORD[((320-256-128))+rbx]
+        vpaddd  ymm6,ymm6,YMMWORD[((64-128))+rax]
+
+        vpsrld  ymm7,ymm5,3
+        vpsrld  ymm1,ymm5,7
+        vpslld  ymm2,ymm5,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm5,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm5,14
+        vmovdqu ymm0,YMMWORD[((224-128))+rax]
+        vpsrld  ymm4,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm6,ymm6,ymm7
+        vpxor   ymm7,ymm4,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm6,ymm6,ymm7
+        vpsrld  ymm7,ymm11,6
+        vpslld  ymm2,ymm11,26
+        vmovdqu YMMWORD[(288-256-128)+rbx],ymm6
+        vpaddd  ymm6,ymm6,ymm14
+
+        vpsrld  ymm1,ymm11,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm11,21
+        vpaddd  ymm6,ymm6,YMMWORD[((-96))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm11,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm11,7
+        vpandn  ymm0,ymm11,ymm13
+        vpand   ymm4,ymm11,ymm12
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm14,ymm15,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm15,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm8,ymm15
+
+        vpxor   ymm14,ymm14,ymm1
+        vpaddd  ymm6,ymm6,ymm7
+
+        vpsrld  ymm1,ymm15,13
+
+        vpslld  ymm2,ymm15,19
+        vpaddd  ymm6,ymm6,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm14,ymm1
+
+        vpsrld  ymm1,ymm15,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm15,10
+        vpxor   ymm14,ymm8,ymm3
+        vpaddd  ymm10,ymm10,ymm6
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm14,ymm14,ymm6
+        vpaddd  ymm14,ymm14,ymm7
+        vmovdqu ymm6,YMMWORD[((352-256-128))+rbx]
+        vpaddd  ymm5,ymm5,YMMWORD[((96-128))+rax]
+
+        vpsrld  ymm7,ymm6,3
+        vpsrld  ymm1,ymm6,7
+        vpslld  ymm2,ymm6,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm6,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm6,14
+        vmovdqu ymm0,YMMWORD[((256-256-128))+rbx]
+        vpsrld  ymm3,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm5,ymm5,ymm7
+        vpxor   ymm7,ymm3,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm5,ymm5,ymm7
+        vpsrld  ymm7,ymm10,6
+        vpslld  ymm2,ymm10,26
+        vmovdqu YMMWORD[(320-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm13
+
+        vpsrld  ymm1,ymm10,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm10,21
+        vpaddd  ymm5,ymm5,YMMWORD[((-64))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm10,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm10,7
+        vpandn  ymm0,ymm10,ymm12
+        vpand   ymm3,ymm10,ymm11
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm13,ymm14,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm14,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm15,ymm14
+
+        vpxor   ymm13,ymm13,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm14,13
+
+        vpslld  ymm2,ymm14,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm13,ymm1
+
+        vpsrld  ymm1,ymm14,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm14,10
+        vpxor   ymm13,ymm15,ymm4
+        vpaddd  ymm9,ymm9,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm13,ymm13,ymm5
+        vpaddd  ymm13,ymm13,ymm7
+        vmovdqu ymm5,YMMWORD[((384-256-128))+rbx]
+        vpaddd  ymm6,ymm6,YMMWORD[((128-128))+rax]
+
+        vpsrld  ymm7,ymm5,3
+        vpsrld  ymm1,ymm5,7
+        vpslld  ymm2,ymm5,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm5,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm5,14
+        vmovdqu ymm0,YMMWORD[((288-256-128))+rbx]
+        vpsrld  ymm4,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm6,ymm6,ymm7
+        vpxor   ymm7,ymm4,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm6,ymm6,ymm7
+        vpsrld  ymm7,ymm9,6
+        vpslld  ymm2,ymm9,26
+        vmovdqu YMMWORD[(352-256-128)+rbx],ymm6
+        vpaddd  ymm6,ymm6,ymm12
+
+        vpsrld  ymm1,ymm9,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm9,21
+        vpaddd  ymm6,ymm6,YMMWORD[((-32))+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm9,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm9,7
+        vpandn  ymm0,ymm9,ymm11
+        vpand   ymm4,ymm9,ymm10
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm12,ymm13,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm13,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm14,ymm13
+
+        vpxor   ymm12,ymm12,ymm1
+        vpaddd  ymm6,ymm6,ymm7
+
+        vpsrld  ymm1,ymm13,13
+
+        vpslld  ymm2,ymm13,19
+        vpaddd  ymm6,ymm6,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm12,ymm1
+
+        vpsrld  ymm1,ymm13,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm13,10
+        vpxor   ymm12,ymm14,ymm3
+        vpaddd  ymm8,ymm8,ymm6
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm12,ymm12,ymm6
+        vpaddd  ymm12,ymm12,ymm7
+        vmovdqu ymm6,YMMWORD[((416-256-128))+rbx]
+        vpaddd  ymm5,ymm5,YMMWORD[((160-128))+rax]
+
+        vpsrld  ymm7,ymm6,3
+        vpsrld  ymm1,ymm6,7
+        vpslld  ymm2,ymm6,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm6,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm6,14
+        vmovdqu ymm0,YMMWORD[((320-256-128))+rbx]
+        vpsrld  ymm3,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm5,ymm5,ymm7
+        vpxor   ymm7,ymm3,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm5,ymm5,ymm7
+        vpsrld  ymm7,ymm8,6
+        vpslld  ymm2,ymm8,26
+        vmovdqu YMMWORD[(384-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm11
+
+        vpsrld  ymm1,ymm8,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm8,21
+        vpaddd  ymm5,ymm5,YMMWORD[rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm8,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm8,7
+        vpandn  ymm0,ymm8,ymm10
+        vpand   ymm3,ymm8,ymm9
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm11,ymm12,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm12,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm13,ymm12
+
+        vpxor   ymm11,ymm11,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm12,13
+
+        vpslld  ymm2,ymm12,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm11,ymm1
+
+        vpsrld  ymm1,ymm12,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm12,10
+        vpxor   ymm11,ymm13,ymm4
+        vpaddd  ymm15,ymm15,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm11,ymm11,ymm5
+        vpaddd  ymm11,ymm11,ymm7
+        vmovdqu ymm5,YMMWORD[((448-256-128))+rbx]
+        vpaddd  ymm6,ymm6,YMMWORD[((192-128))+rax]
+
+        vpsrld  ymm7,ymm5,3
+        vpsrld  ymm1,ymm5,7
+        vpslld  ymm2,ymm5,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm5,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm5,14
+        vmovdqu ymm0,YMMWORD[((352-256-128))+rbx]
+        vpsrld  ymm4,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm6,ymm6,ymm7
+        vpxor   ymm7,ymm4,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm6,ymm6,ymm7
+        vpsrld  ymm7,ymm15,6
+        vpslld  ymm2,ymm15,26
+        vmovdqu YMMWORD[(416-256-128)+rbx],ymm6
+        vpaddd  ymm6,ymm6,ymm10
+
+        vpsrld  ymm1,ymm15,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm15,21
+        vpaddd  ymm6,ymm6,YMMWORD[32+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm15,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm15,7
+        vpandn  ymm0,ymm15,ymm9
+        vpand   ymm4,ymm15,ymm8
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm10,ymm11,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm11,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm12,ymm11
+
+        vpxor   ymm10,ymm10,ymm1
+        vpaddd  ymm6,ymm6,ymm7
+
+        vpsrld  ymm1,ymm11,13
+
+        vpslld  ymm2,ymm11,19
+        vpaddd  ymm6,ymm6,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm10,ymm1
+
+        vpsrld  ymm1,ymm11,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm11,10
+        vpxor   ymm10,ymm12,ymm3
+        vpaddd  ymm14,ymm14,ymm6
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm10,ymm10,ymm6
+        vpaddd  ymm10,ymm10,ymm7
+        vmovdqu ymm6,YMMWORD[((480-256-128))+rbx]
+        vpaddd  ymm5,ymm5,YMMWORD[((224-128))+rax]
+
+        vpsrld  ymm7,ymm6,3
+        vpsrld  ymm1,ymm6,7
+        vpslld  ymm2,ymm6,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm6,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm6,14
+        vmovdqu ymm0,YMMWORD[((384-256-128))+rbx]
+        vpsrld  ymm3,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm5,ymm5,ymm7
+        vpxor   ymm7,ymm3,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm5,ymm5,ymm7
+        vpsrld  ymm7,ymm14,6
+        vpslld  ymm2,ymm14,26
+        vmovdqu YMMWORD[(448-256-128)+rbx],ymm5
+        vpaddd  ymm5,ymm5,ymm9
+
+        vpsrld  ymm1,ymm14,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm14,21
+        vpaddd  ymm5,ymm5,YMMWORD[64+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm14,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm14,7
+        vpandn  ymm0,ymm14,ymm8
+        vpand   ymm3,ymm14,ymm15
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm9,ymm10,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm10,30
+        vpxor   ymm0,ymm0,ymm3
+        vpxor   ymm3,ymm11,ymm10
+
+        vpxor   ymm9,ymm9,ymm1
+        vpaddd  ymm5,ymm5,ymm7
+
+        vpsrld  ymm1,ymm10,13
+
+        vpslld  ymm2,ymm10,19
+        vpaddd  ymm5,ymm5,ymm0
+        vpand   ymm4,ymm4,ymm3
+
+        vpxor   ymm7,ymm9,ymm1
+
+        vpsrld  ymm1,ymm10,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm10,10
+        vpxor   ymm9,ymm11,ymm4
+        vpaddd  ymm13,ymm13,ymm5
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm9,ymm9,ymm5
+        vpaddd  ymm9,ymm9,ymm7
+        vmovdqu ymm5,YMMWORD[((0-128))+rax]
+        vpaddd  ymm6,ymm6,YMMWORD[((256-256-128))+rbx]
+
+        vpsrld  ymm7,ymm5,3
+        vpsrld  ymm1,ymm5,7
+        vpslld  ymm2,ymm5,25
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm5,18
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm5,14
+        vmovdqu ymm0,YMMWORD[((416-256-128))+rbx]
+        vpsrld  ymm4,ymm0,10
+
+        vpxor   ymm7,ymm7,ymm1
+        vpsrld  ymm1,ymm0,17
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,15
+        vpaddd  ymm6,ymm6,ymm7
+        vpxor   ymm7,ymm4,ymm1
+        vpsrld  ymm1,ymm0,19
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm0,13
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+        vpaddd  ymm6,ymm6,ymm7
+        vpsrld  ymm7,ymm13,6
+        vpslld  ymm2,ymm13,26
+        vmovdqu YMMWORD[(480-256-128)+rbx],ymm6
+        vpaddd  ymm6,ymm6,ymm8
+
+        vpsrld  ymm1,ymm13,11
+        vpxor   ymm7,ymm7,ymm2
+        vpslld  ymm2,ymm13,21
+        vpaddd  ymm6,ymm6,YMMWORD[96+rbp]
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm1,ymm13,25
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm13,7
+        vpandn  ymm0,ymm13,ymm15
+        vpand   ymm4,ymm13,ymm14
+
+        vpxor   ymm7,ymm7,ymm1
+
+        vpsrld  ymm8,ymm9,2
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm1,ymm9,30
+        vpxor   ymm0,ymm0,ymm4
+        vpxor   ymm4,ymm10,ymm9
+
+        vpxor   ymm8,ymm8,ymm1
+        vpaddd  ymm6,ymm6,ymm7
+
+        vpsrld  ymm1,ymm9,13
+
+        vpslld  ymm2,ymm9,19
+        vpaddd  ymm6,ymm6,ymm0
+        vpand   ymm3,ymm3,ymm4
+
+        vpxor   ymm7,ymm8,ymm1
+
+        vpsrld  ymm1,ymm9,22
+        vpxor   ymm7,ymm7,ymm2
+
+        vpslld  ymm2,ymm9,10
+        vpxor   ymm8,ymm10,ymm3
+        vpaddd  ymm12,ymm12,ymm6
+
+        vpxor   ymm7,ymm7,ymm1
+        vpxor   ymm7,ymm7,ymm2
+
+        vpaddd  ymm8,ymm8,ymm6
+        vpaddd  ymm8,ymm8,ymm7
+        add     rbp,256
+        dec     ecx
+        jnz     NEAR $L$oop_16_xx_avx2
+
+        mov     ecx,1
+        lea     rbx,[512+rsp]
+        lea     rbp,[((K256+128))]
+        cmp     ecx,DWORD[rbx]
+        cmovge  r12,rbp
+        cmp     ecx,DWORD[4+rbx]
+        cmovge  r13,rbp
+        cmp     ecx,DWORD[8+rbx]
+        cmovge  r14,rbp
+        cmp     ecx,DWORD[12+rbx]
+        cmovge  r15,rbp
+        cmp     ecx,DWORD[16+rbx]
+        cmovge  r8,rbp
+        cmp     ecx,DWORD[20+rbx]
+        cmovge  r9,rbp
+        cmp     ecx,DWORD[24+rbx]
+        cmovge  r10,rbp
+        cmp     ecx,DWORD[28+rbx]
+        cmovge  r11,rbp
+        vmovdqa ymm7,YMMWORD[rbx]
+        vpxor   ymm0,ymm0,ymm0
+        vmovdqa ymm6,ymm7
+        vpcmpgtd        ymm6,ymm6,ymm0
+        vpaddd  ymm7,ymm7,ymm6
+
+        vmovdqu ymm0,YMMWORD[((0-128))+rdi]
+        vpand   ymm8,ymm8,ymm6
+        vmovdqu ymm1,YMMWORD[((32-128))+rdi]
+        vpand   ymm9,ymm9,ymm6
+        vmovdqu ymm2,YMMWORD[((64-128))+rdi]
+        vpand   ymm10,ymm10,ymm6
+        vmovdqu ymm5,YMMWORD[((96-128))+rdi]
+        vpand   ymm11,ymm11,ymm6
+        vpaddd  ymm8,ymm8,ymm0
+        vmovdqu ymm0,YMMWORD[((128-128))+rdi]
+        vpand   ymm12,ymm12,ymm6
+        vpaddd  ymm9,ymm9,ymm1
+        vmovdqu ymm1,YMMWORD[((160-128))+rdi]
+        vpand   ymm13,ymm13,ymm6
+        vpaddd  ymm10,ymm10,ymm2
+        vmovdqu ymm2,YMMWORD[((192-128))+rdi]
+        vpand   ymm14,ymm14,ymm6
+        vpaddd  ymm11,ymm11,ymm5
+        vmovdqu ymm5,YMMWORD[((224-128))+rdi]
+        vpand   ymm15,ymm15,ymm6
+        vpaddd  ymm12,ymm12,ymm0
+        vpaddd  ymm13,ymm13,ymm1
+        vmovdqu YMMWORD[(0-128)+rdi],ymm8
+        vpaddd  ymm14,ymm14,ymm2
+        vmovdqu YMMWORD[(32-128)+rdi],ymm9
+        vpaddd  ymm15,ymm15,ymm5
+        vmovdqu YMMWORD[(64-128)+rdi],ymm10
+        vmovdqu YMMWORD[(96-128)+rdi],ymm11
+        vmovdqu YMMWORD[(128-128)+rdi],ymm12
+        vmovdqu YMMWORD[(160-128)+rdi],ymm13
+        vmovdqu YMMWORD[(192-128)+rdi],ymm14
+        vmovdqu YMMWORD[(224-128)+rdi],ymm15
+
+        vmovdqu YMMWORD[rbx],ymm7
+        lea     rbx,[((256+128))+rsp]
+        vmovdqu ymm6,YMMWORD[$L$pbswap]
+        dec     edx
+        jnz     NEAR $L$oop_avx2
+
+
+
+
+
+
+
+$L$done_avx2:
+        mov     rax,QWORD[544+rsp]
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((-216))+rax]
+        movaps  xmm7,XMMWORD[((-200))+rax]
+        movaps  xmm8,XMMWORD[((-184))+rax]
+        movaps  xmm9,XMMWORD[((-168))+rax]
+        movaps  xmm10,XMMWORD[((-152))+rax]
+        movaps  xmm11,XMMWORD[((-136))+rax]
+        movaps  xmm12,XMMWORD[((-120))+rax]
+        movaps  xmm13,XMMWORD[((-104))+rax]
+        movaps  xmm14,XMMWORD[((-88))+rax]
+        movaps  xmm15,XMMWORD[((-72))+rax]
+        mov     r15,QWORD[((-48))+rax]
+
+        mov     r14,QWORD[((-40))+rax]
+
+        mov     r13,QWORD[((-32))+rax]
+
+        mov     r12,QWORD[((-24))+rax]
+
+        mov     rbp,QWORD[((-16))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+
+        lea     rsp,[rax]
+
+$L$epilogue_avx2:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha256_multi_block_avx2:
+ALIGN   256
+K256:
+        DD      1116352408,1116352408,1116352408,1116352408
+        DD      1116352408,1116352408,1116352408,1116352408
+        DD      1899447441,1899447441,1899447441,1899447441
+        DD      1899447441,1899447441,1899447441,1899447441
+        DD      3049323471,3049323471,3049323471,3049323471
+        DD      3049323471,3049323471,3049323471,3049323471
+        DD      3921009573,3921009573,3921009573,3921009573
+        DD      3921009573,3921009573,3921009573,3921009573
+        DD      961987163,961987163,961987163,961987163
+        DD      961987163,961987163,961987163,961987163
+        DD      1508970993,1508970993,1508970993,1508970993
+        DD      1508970993,1508970993,1508970993,1508970993
+        DD      2453635748,2453635748,2453635748,2453635748
+        DD      2453635748,2453635748,2453635748,2453635748
+        DD      2870763221,2870763221,2870763221,2870763221
+        DD      2870763221,2870763221,2870763221,2870763221
+        DD      3624381080,3624381080,3624381080,3624381080
+        DD      3624381080,3624381080,3624381080,3624381080
+        DD      310598401,310598401,310598401,310598401
+        DD      310598401,310598401,310598401,310598401
+        DD      607225278,607225278,607225278,607225278
+        DD      607225278,607225278,607225278,607225278
+        DD      1426881987,1426881987,1426881987,1426881987
+        DD      1426881987,1426881987,1426881987,1426881987
+        DD      1925078388,1925078388,1925078388,1925078388
+        DD      1925078388,1925078388,1925078388,1925078388
+        DD      2162078206,2162078206,2162078206,2162078206
+        DD      2162078206,2162078206,2162078206,2162078206
+        DD      2614888103,2614888103,2614888103,2614888103
+        DD      2614888103,2614888103,2614888103,2614888103
+        DD      3248222580,3248222580,3248222580,3248222580
+        DD      3248222580,3248222580,3248222580,3248222580
+        DD      3835390401,3835390401,3835390401,3835390401
+        DD      3835390401,3835390401,3835390401,3835390401
+        DD      4022224774,4022224774,4022224774,4022224774
+        DD      4022224774,4022224774,4022224774,4022224774
+        DD      264347078,264347078,264347078,264347078
+        DD      264347078,264347078,264347078,264347078
+        DD      604807628,604807628,604807628,604807628
+        DD      604807628,604807628,604807628,604807628
+        DD      770255983,770255983,770255983,770255983
+        DD      770255983,770255983,770255983,770255983
+        DD      1249150122,1249150122,1249150122,1249150122
+        DD      1249150122,1249150122,1249150122,1249150122
+        DD      1555081692,1555081692,1555081692,1555081692
+        DD      1555081692,1555081692,1555081692,1555081692
+        DD      1996064986,1996064986,1996064986,1996064986
+        DD      1996064986,1996064986,1996064986,1996064986
+        DD      2554220882,2554220882,2554220882,2554220882
+        DD      2554220882,2554220882,2554220882,2554220882
+        DD      2821834349,2821834349,2821834349,2821834349
+        DD      2821834349,2821834349,2821834349,2821834349
+        DD      2952996808,2952996808,2952996808,2952996808
+        DD      2952996808,2952996808,2952996808,2952996808
+        DD      3210313671,3210313671,3210313671,3210313671
+        DD      3210313671,3210313671,3210313671,3210313671
+        DD      3336571891,3336571891,3336571891,3336571891
+        DD      3336571891,3336571891,3336571891,3336571891
+        DD      3584528711,3584528711,3584528711,3584528711
+        DD      3584528711,3584528711,3584528711,3584528711
+        DD      113926993,113926993,113926993,113926993
+        DD      113926993,113926993,113926993,113926993
+        DD      338241895,338241895,338241895,338241895
+        DD      338241895,338241895,338241895,338241895
+        DD      666307205,666307205,666307205,666307205
+        DD      666307205,666307205,666307205,666307205
+        DD      773529912,773529912,773529912,773529912
+        DD      773529912,773529912,773529912,773529912
+        DD      1294757372,1294757372,1294757372,1294757372
+        DD      1294757372,1294757372,1294757372,1294757372
+        DD      1396182291,1396182291,1396182291,1396182291
+        DD      1396182291,1396182291,1396182291,1396182291
+        DD      1695183700,1695183700,1695183700,1695183700
+        DD      1695183700,1695183700,1695183700,1695183700
+        DD      1986661051,1986661051,1986661051,1986661051
+        DD      1986661051,1986661051,1986661051,1986661051
+        DD      2177026350,2177026350,2177026350,2177026350
+        DD      2177026350,2177026350,2177026350,2177026350
+        DD      2456956037,2456956037,2456956037,2456956037
+        DD      2456956037,2456956037,2456956037,2456956037
+        DD      2730485921,2730485921,2730485921,2730485921
+        DD      2730485921,2730485921,2730485921,2730485921
+        DD      2820302411,2820302411,2820302411,2820302411
+        DD      2820302411,2820302411,2820302411,2820302411
+        DD      3259730800,3259730800,3259730800,3259730800
+        DD      3259730800,3259730800,3259730800,3259730800
+        DD      3345764771,3345764771,3345764771,3345764771
+        DD      3345764771,3345764771,3345764771,3345764771
+        DD      3516065817,3516065817,3516065817,3516065817
+        DD      3516065817,3516065817,3516065817,3516065817
+        DD      3600352804,3600352804,3600352804,3600352804
+        DD      3600352804,3600352804,3600352804,3600352804
+        DD      4094571909,4094571909,4094571909,4094571909
+        DD      4094571909,4094571909,4094571909,4094571909
+        DD      275423344,275423344,275423344,275423344
+        DD      275423344,275423344,275423344,275423344
+        DD      430227734,430227734,430227734,430227734
+        DD      430227734,430227734,430227734,430227734
+        DD      506948616,506948616,506948616,506948616
+        DD      506948616,506948616,506948616,506948616
+        DD      659060556,659060556,659060556,659060556
+        DD      659060556,659060556,659060556,659060556
+        DD      883997877,883997877,883997877,883997877
+        DD      883997877,883997877,883997877,883997877
+        DD      958139571,958139571,958139571,958139571
+        DD      958139571,958139571,958139571,958139571
+        DD      1322822218,1322822218,1322822218,1322822218
+        DD      1322822218,1322822218,1322822218,1322822218
+        DD      1537002063,1537002063,1537002063,1537002063
+        DD      1537002063,1537002063,1537002063,1537002063
+        DD      1747873779,1747873779,1747873779,1747873779
+        DD      1747873779,1747873779,1747873779,1747873779
+        DD      1955562222,1955562222,1955562222,1955562222
+        DD      1955562222,1955562222,1955562222,1955562222
+        DD      2024104815,2024104815,2024104815,2024104815
+        DD      2024104815,2024104815,2024104815,2024104815
+        DD      2227730452,2227730452,2227730452,2227730452
+        DD      2227730452,2227730452,2227730452,2227730452
+        DD      2361852424,2361852424,2361852424,2361852424
+        DD      2361852424,2361852424,2361852424,2361852424
+        DD      2428436474,2428436474,2428436474,2428436474
+        DD      2428436474,2428436474,2428436474,2428436474
+        DD      2756734187,2756734187,2756734187,2756734187
+        DD      2756734187,2756734187,2756734187,2756734187
+        DD      3204031479,3204031479,3204031479,3204031479
+        DD      3204031479,3204031479,3204031479,3204031479
+        DD      3329325298,3329325298,3329325298,3329325298
+        DD      3329325298,3329325298,3329325298,3329325298
+$L$pbswap:
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+K256_shaext:
+        DD      0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+        DD      0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+        DD      0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+        DD      0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+        DD      0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+        DD      0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+        DD      0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+        DD      0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+        DD      0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+        DD      0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+        DD      0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+        DD      0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+        DD      0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+        DD      0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+        DD      0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+        DD      0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+DB      83,72,65,50,53,54,32,109,117,108,116,105,45,98,108,111
+DB      99,107,32,116,114,97,110,115,102,111,114,109,32,102,111,114
+DB      32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71
+DB      65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112
+DB      101,110,115,115,108,46,111,114,103,62,0
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        mov     rax,QWORD[272+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+
+        lea     rsi,[((-24-160))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   16
+avx2_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        mov     rax,QWORD[544+r8]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+        lea     rsi,[((-56-160))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,20
+        DD      0xa548f3fc
+
+        jmp     NEAR $L$in_prologue
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_sha256_multi_block wrt ..imagebase
+        DD      $L$SEH_end_sha256_multi_block wrt ..imagebase
+        DD      $L$SEH_info_sha256_multi_block wrt ..imagebase
+        DD      $L$SEH_begin_sha256_multi_block_shaext wrt ..imagebase
+        DD      $L$SEH_end_sha256_multi_block_shaext wrt ..imagebase
+        DD      $L$SEH_info_sha256_multi_block_shaext wrt ..imagebase
+        DD      $L$SEH_begin_sha256_multi_block_avx wrt ..imagebase
+        DD      $L$SEH_end_sha256_multi_block_avx wrt ..imagebase
+        DD      $L$SEH_info_sha256_multi_block_avx wrt ..imagebase
+        DD      $L$SEH_begin_sha256_multi_block_avx2 wrt ..imagebase
+        DD      $L$SEH_end_sha256_multi_block_avx2 wrt ..imagebase
+        DD      $L$SEH_info_sha256_multi_block_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_sha256_multi_block:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$body wrt ..imagebase,$L$epilogue wrt ..imagebase
+$L$SEH_info_sha256_multi_block_shaext:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$body_shaext wrt ..imagebase,$L$epilogue_shaext wrt ..imagebase
+$L$SEH_info_sha256_multi_block_avx:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$body_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha256_multi_block_avx2:
+DB      9,0,0,0
+        DD      avx2_handler wrt ..imagebase
+        DD      $L$body_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm
new file mode 100644
index 0000000000..e8abeaa668
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm
@@ -0,0 +1,5712 @@
+; Copyright 2005-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN  OPENSSL_ia32cap_P
+global  sha256_block_data_order
+
+ALIGN   16
+sha256_block_data_order:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha256_block_data_order:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        lea     r11,[OPENSSL_ia32cap_P]
+        mov     r9d,DWORD[r11]
+        mov     r10d,DWORD[4+r11]
+        mov     r11d,DWORD[8+r11]
+        test    r11d,536870912
+        jnz     NEAR _shaext_shortcut
+        and     r11d,296
+        cmp     r11d,296
+        je      NEAR $L$avx2_shortcut
+        and     r9d,1073741824
+        and     r10d,268435968
+        or      r10d,r9d
+        cmp     r10d,1342177792
+        je      NEAR $L$avx_shortcut
+        test    r10d,512
+        jnz     NEAR $L$ssse3_shortcut
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        shl     rdx,4
+        sub     rsp,16*4+4*8
+        lea     rdx,[rdx*4+rsi]
+        and     rsp,-64
+        mov     QWORD[((64+0))+rsp],rdi
+        mov     QWORD[((64+8))+rsp],rsi
+        mov     QWORD[((64+16))+rsp],rdx
+        mov     QWORD[88+rsp],rax
+
+$L$prologue:
+
+        mov     eax,DWORD[rdi]
+        mov     ebx,DWORD[4+rdi]
+        mov     ecx,DWORD[8+rdi]
+        mov     edx,DWORD[12+rdi]
+        mov     r8d,DWORD[16+rdi]
+        mov     r9d,DWORD[20+rdi]
+        mov     r10d,DWORD[24+rdi]
+        mov     r11d,DWORD[28+rdi]
+        jmp     NEAR $L$loop
+
+ALIGN   16
+$L$loop:
+        mov     edi,ebx
+        lea     rbp,[K256]
+        xor     edi,ecx
+        mov     r12d,DWORD[rsi]
+        mov     r13d,r8d
+        mov     r14d,eax
+        bswap   r12d
+        ror     r13d,14
+        mov     r15d,r9d
+
+        xor     r13d,r8d
+        ror     r14d,9
+        xor     r15d,r10d
+
+        mov     DWORD[rsp],r12d
+        xor     r14d,eax
+        and     r15d,r8d
+
+        ror     r13d,5
+        add     r12d,r11d
+        xor     r15d,r10d
+
+        ror     r14d,11
+        xor     r13d,r8d
+        add     r12d,r15d
+
+        mov     r15d,eax
+        add     r12d,DWORD[rbp]
+        xor     r14d,eax
+
+        xor     r15d,ebx
+        ror     r13d,6
+        mov     r11d,ebx
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r11d,edi
+        add     edx,r12d
+        add     r11d,r12d
+
+        lea     rbp,[4+rbp]
+        add     r11d,r14d
+        mov     r12d,DWORD[4+rsi]
+        mov     r13d,edx
+        mov     r14d,r11d
+        bswap   r12d
+        ror     r13d,14
+        mov     edi,r8d
+
+        xor     r13d,edx
+        ror     r14d,9
+        xor     edi,r9d
+
+        mov     DWORD[4+rsp],r12d
+        xor     r14d,r11d
+        and     edi,edx
+
+        ror     r13d,5
+        add     r12d,r10d
+        xor     edi,r9d
+
+        ror     r14d,11
+        xor     r13d,edx
+        add     r12d,edi
+
+        mov     edi,r11d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r11d
+
+        xor     edi,eax
+        ror     r13d,6
+        mov     r10d,eax
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r10d,r15d
+        add     ecx,r12d
+        add     r10d,r12d
+
+        lea     rbp,[4+rbp]
+        add     r10d,r14d
+        mov     r12d,DWORD[8+rsi]
+        mov     r13d,ecx
+        mov     r14d,r10d
+        bswap   r12d
+        ror     r13d,14
+        mov     r15d,edx
+
+        xor     r13d,ecx
+        ror     r14d,9
+        xor     r15d,r8d
+
+        mov     DWORD[8+rsp],r12d
+        xor     r14d,r10d
+        and     r15d,ecx
+
+        ror     r13d,5
+        add     r12d,r9d
+        xor     r15d,r8d
+
+        ror     r14d,11
+        xor     r13d,ecx
+        add     r12d,r15d
+
+        mov     r15d,r10d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r10d
+
+        xor     r15d,r11d
+        ror     r13d,6
+        mov     r9d,r11d
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r9d,edi
+        add     ebx,r12d
+        add     r9d,r12d
+
+        lea     rbp,[4+rbp]
+        add     r9d,r14d
+        mov     r12d,DWORD[12+rsi]
+        mov     r13d,ebx
+        mov     r14d,r9d
+        bswap   r12d
+        ror     r13d,14
+        mov     edi,ecx
+
+        xor     r13d,ebx
+        ror     r14d,9
+        xor     edi,edx
+
+        mov     DWORD[12+rsp],r12d
+        xor     r14d,r9d
+        and     edi,ebx
+
+        ror     r13d,5
+        add     r12d,r8d
+        xor     edi,edx
+
+        ror     r14d,11
+        xor     r13d,ebx
+        add     r12d,edi
+
+        mov     edi,r9d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r9d
+
+        xor     edi,r10d
+        ror     r13d,6
+        mov     r8d,r10d
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r8d,r15d
+        add     eax,r12d
+        add     r8d,r12d
+
+        lea     rbp,[20+rbp]
+        add     r8d,r14d
+        mov     r12d,DWORD[16+rsi]
+        mov     r13d,eax
+        mov     r14d,r8d
+        bswap   r12d
+        ror     r13d,14
+        mov     r15d,ebx
+
+        xor     r13d,eax
+        ror     r14d,9
+        xor     r15d,ecx
+
+        mov     DWORD[16+rsp],r12d
+        xor     r14d,r8d
+        and     r15d,eax
+
+        ror     r13d,5
+        add     r12d,edx
+        xor     r15d,ecx
+
+        ror     r14d,11
+        xor     r13d,eax
+        add     r12d,r15d
+
+        mov     r15d,r8d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r8d
+
+        xor     r15d,r9d
+        ror     r13d,6
+        mov     edx,r9d
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     edx,edi
+        add     r11d,r12d
+        add     edx,r12d
+
+        lea     rbp,[4+rbp]
+        add     edx,r14d
+        mov     r12d,DWORD[20+rsi]
+        mov     r13d,r11d
+        mov     r14d,edx
+        bswap   r12d
+        ror     r13d,14
+        mov     edi,eax
+
+        xor     r13d,r11d
+        ror     r14d,9
+        xor     edi,ebx
+
+        mov     DWORD[20+rsp],r12d
+        xor     r14d,edx
+        and     edi,r11d
+
+        ror     r13d,5
+        add     r12d,ecx
+        xor     edi,ebx
+
+        ror     r14d,11
+        xor     r13d,r11d
+        add     r12d,edi
+
+        mov     edi,edx
+        add     r12d,DWORD[rbp]
+        xor     r14d,edx
+
+        xor     edi,r8d
+        ror     r13d,6
+        mov     ecx,r8d
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     ecx,r15d
+        add     r10d,r12d
+        add     ecx,r12d
+
+        lea     rbp,[4+rbp]
+        add     ecx,r14d
+        mov     r12d,DWORD[24+rsi]
+        mov     r13d,r10d
+        mov     r14d,ecx
+        bswap   r12d
+        ror     r13d,14
+        mov     r15d,r11d
+
+        xor     r13d,r10d
+        ror     r14d,9
+        xor     r15d,eax
+
+        mov     DWORD[24+rsp],r12d
+        xor     r14d,ecx
+        and     r15d,r10d
+
+        ror     r13d,5
+        add     r12d,ebx
+        xor     r15d,eax
+
+        ror     r14d,11
+        xor     r13d,r10d
+        add     r12d,r15d
+
+        mov     r15d,ecx
+        add     r12d,DWORD[rbp]
+        xor     r14d,ecx
+
+        xor     r15d,edx
+        ror     r13d,6
+        mov     ebx,edx
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     ebx,edi
+        add     r9d,r12d
+        add     ebx,r12d
+
+        lea     rbp,[4+rbp]
+        add     ebx,r14d
+        mov     r12d,DWORD[28+rsi]
+        mov     r13d,r9d
+        mov     r14d,ebx
+        bswap   r12d
+        ror     r13d,14
+        mov     edi,r10d
+
+        xor     r13d,r9d
+        ror     r14d,9
+        xor     edi,r11d
+
+        mov     DWORD[28+rsp],r12d
+        xor     r14d,ebx
+        and     edi,r9d
+
+        ror     r13d,5
+        add     r12d,eax
+        xor     edi,r11d
+
+        ror     r14d,11
+        xor     r13d,r9d
+        add     r12d,edi
+
+        mov     edi,ebx
+        add     r12d,DWORD[rbp]
+        xor     r14d,ebx
+
+        xor     edi,ecx
+        ror     r13d,6
+        mov     eax,ecx
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     eax,r15d
+        add     r8d,r12d
+        add     eax,r12d
+
+        lea     rbp,[20+rbp]
+        add     eax,r14d
+        mov     r12d,DWORD[32+rsi]
+        mov     r13d,r8d
+        mov     r14d,eax
+        bswap   r12d
+        ror     r13d,14
+        mov     r15d,r9d
+
+        xor     r13d,r8d
+        ror     r14d,9
+        xor     r15d,r10d
+
+        mov     DWORD[32+rsp],r12d
+        xor     r14d,eax
+        and     r15d,r8d
+
+        ror     r13d,5
+        add     r12d,r11d
+        xor     r15d,r10d
+
+        ror     r14d,11
+        xor     r13d,r8d
+        add     r12d,r15d
+
+        mov     r15d,eax
+        add     r12d,DWORD[rbp]
+        xor     r14d,eax
+
+        xor     r15d,ebx
+        ror     r13d,6
+        mov     r11d,ebx
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r11d,edi
+        add     edx,r12d
+        add     r11d,r12d
+
+        lea     rbp,[4+rbp]
+        add     r11d,r14d
+        mov     r12d,DWORD[36+rsi]
+        mov     r13d,edx
+        mov     r14d,r11d
+        bswap   r12d
+        ror     r13d,14
+        mov     edi,r8d
+
+        xor     r13d,edx
+        ror     r14d,9
+        xor     edi,r9d
+
+        mov     DWORD[36+rsp],r12d
+        xor     r14d,r11d
+        and     edi,edx
+
+        ror     r13d,5
+        add     r12d,r10d
+        xor     edi,r9d
+
+        ror     r14d,11
+        xor     r13d,edx
+        add     r12d,edi
+
+        mov     edi,r11d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r11d
+
+        xor     edi,eax
+        ror     r13d,6
+        mov     r10d,eax
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r10d,r15d
+        add     ecx,r12d
+        add     r10d,r12d
+
+        lea     rbp,[4+rbp]
+        add     r10d,r14d
+        mov     r12d,DWORD[40+rsi]
+        mov     r13d,ecx
+        mov     r14d,r10d
+        bswap   r12d
+        ror     r13d,14
+        mov     r15d,edx
+
+        xor     r13d,ecx
+        ror     r14d,9
+        xor     r15d,r8d
+
+        mov     DWORD[40+rsp],r12d
+        xor     r14d,r10d
+        and     r15d,ecx
+
+        ror     r13d,5
+        add     r12d,r9d
+        xor     r15d,r8d
+
+        ror     r14d,11
+        xor     r13d,ecx
+        add     r12d,r15d
+
+        mov     r15d,r10d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r10d
+
+        xor     r15d,r11d
+        ror     r13d,6
+        mov     r9d,r11d
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r9d,edi
+        add     ebx,r12d
+        add     r9d,r12d
+
+        lea     rbp,[4+rbp]
+        add     r9d,r14d
+        mov     r12d,DWORD[44+rsi]
+        mov     r13d,ebx
+        mov     r14d,r9d
+        bswap   r12d
+        ror     r13d,14
+        mov     edi,ecx
+
+        xor     r13d,ebx
+        ror     r14d,9
+        xor     edi,edx
+
+        mov     DWORD[44+rsp],r12d
+        xor     r14d,r9d
+        and     edi,ebx
+
+        ror     r13d,5
+        add     r12d,r8d
+        xor     edi,edx
+
+        ror     r14d,11
+        xor     r13d,ebx
+        add     r12d,edi
+
+        mov     edi,r9d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r9d
+
+        xor     edi,r10d
+        ror     r13d,6
+        mov     r8d,r10d
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r8d,r15d
+        add     eax,r12d
+        add     r8d,r12d
+
+        lea     rbp,[20+rbp]
+        add     r8d,r14d
+        mov     r12d,DWORD[48+rsi]
+        mov     r13d,eax
+        mov     r14d,r8d
+        bswap   r12d
+        ror     r13d,14
+        mov     r15d,ebx
+
+        xor     r13d,eax
+        ror     r14d,9
+        xor     r15d,ecx
+
+        mov     DWORD[48+rsp],r12d
+        xor     r14d,r8d
+        and     r15d,eax
+
+        ror     r13d,5
+        add     r12d,edx
+        xor     r15d,ecx
+
+        ror     r14d,11
+        xor     r13d,eax
+        add     r12d,r15d
+
+        mov     r15d,r8d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r8d
+
+        xor     r15d,r9d
+        ror     r13d,6
+        mov     edx,r9d
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     edx,edi
+        add     r11d,r12d
+        add     edx,r12d
+
+        lea     rbp,[4+rbp]
+        add     edx,r14d
+        mov     r12d,DWORD[52+rsi]
+        mov     r13d,r11d
+        mov     r14d,edx
+        bswap   r12d
+        ror     r13d,14
+        mov     edi,eax
+
+        xor     r13d,r11d
+        ror     r14d,9
+        xor     edi,ebx
+
+        mov     DWORD[52+rsp],r12d
+        xor     r14d,edx
+        and     edi,r11d
+
+        ror     r13d,5
+        add     r12d,ecx
+        xor     edi,ebx
+
+        ror     r14d,11
+        xor     r13d,r11d
+        add     r12d,edi
+
+        mov     edi,edx
+        add     r12d,DWORD[rbp]
+        xor     r14d,edx
+
+        xor     edi,r8d
+        ror     r13d,6
+        mov     ecx,r8d
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     ecx,r15d
+        add     r10d,r12d
+        add     ecx,r12d
+
+        lea     rbp,[4+rbp]
+        add     ecx,r14d
+        mov     r12d,DWORD[56+rsi]
+        mov     r13d,r10d
+        mov     r14d,ecx
+        bswap   r12d
+        ror     r13d,14
+        mov     r15d,r11d
+
+        xor     r13d,r10d
+        ror     r14d,9
+        xor     r15d,eax
+
+        mov     DWORD[56+rsp],r12d
+        xor     r14d,ecx
+        and     r15d,r10d
+
+        ror     r13d,5
+        add     r12d,ebx
+        xor     r15d,eax
+
+        ror     r14d,11
+        xor     r13d,r10d
+        add     r12d,r15d
+
+        mov     r15d,ecx
+        add     r12d,DWORD[rbp]
+        xor     r14d,ecx
+
+        xor     r15d,edx
+        ror     r13d,6
+        mov     ebx,edx
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     ebx,edi
+        add     r9d,r12d
+        add     ebx,r12d
+
+        lea     rbp,[4+rbp]
+        add     ebx,r14d
+        mov     r12d,DWORD[60+rsi]
+        mov     r13d,r9d
+        mov     r14d,ebx
+        bswap   r12d
+        ror     r13d,14
+        mov     edi,r10d
+
+        xor     r13d,r9d
+        ror     r14d,9
+        xor     edi,r11d
+
+        mov     DWORD[60+rsp],r12d
+        xor     r14d,ebx
+        and     edi,r9d
+
+        ror     r13d,5
+        add     r12d,eax
+        xor     edi,r11d
+
+        ror     r14d,11
+        xor     r13d,r9d
+        add     r12d,edi
+
+        mov     edi,ebx
+        add     r12d,DWORD[rbp]
+        xor     r14d,ebx
+
+        xor     edi,ecx
+        ror     r13d,6
+        mov     eax,ecx
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     eax,r15d
+        add     r8d,r12d
+        add     eax,r12d
+
+        lea     rbp,[20+rbp]
+        jmp     NEAR $L$rounds_16_xx
+ALIGN   16
+$L$rounds_16_xx:
+        mov     r13d,DWORD[4+rsp]
+        mov     r15d,DWORD[56+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     eax,r14d
+        mov     r14d,r15d
+        ror     r15d,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     r15d,r14d
+        shr     r14d,10
+
+        ror     r15d,17
+        xor     r12d,r13d
+        xor     r15d,r14d
+        add     r12d,DWORD[36+rsp]
+
+        add     r12d,DWORD[rsp]
+        mov     r13d,r8d
+        add     r12d,r15d
+        mov     r14d,eax
+        ror     r13d,14
+        mov     r15d,r9d
+
+        xor     r13d,r8d
+        ror     r14d,9
+        xor     r15d,r10d
+
+        mov     DWORD[rsp],r12d
+        xor     r14d,eax
+        and     r15d,r8d
+
+        ror     r13d,5
+        add     r12d,r11d
+        xor     r15d,r10d
+
+        ror     r14d,11
+        xor     r13d,r8d
+        add     r12d,r15d
+
+        mov     r15d,eax
+        add     r12d,DWORD[rbp]
+        xor     r14d,eax
+
+        xor     r15d,ebx
+        ror     r13d,6
+        mov     r11d,ebx
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r11d,edi
+        add     edx,r12d
+        add     r11d,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[8+rsp]
+        mov     edi,DWORD[60+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     r11d,r14d
+        mov     r14d,edi
+        ror     edi,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     edi,r14d
+        shr     r14d,10
+
+        ror     edi,17
+        xor     r12d,r13d
+        xor     edi,r14d
+        add     r12d,DWORD[40+rsp]
+
+        add     r12d,DWORD[4+rsp]
+        mov     r13d,edx
+        add     r12d,edi
+        mov     r14d,r11d
+        ror     r13d,14
+        mov     edi,r8d
+
+        xor     r13d,edx
+        ror     r14d,9
+        xor     edi,r9d
+
+        mov     DWORD[4+rsp],r12d
+        xor     r14d,r11d
+        and     edi,edx
+
+        ror     r13d,5
+        add     r12d,r10d
+        xor     edi,r9d
+
+        ror     r14d,11
+        xor     r13d,edx
+        add     r12d,edi
+
+        mov     edi,r11d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r11d
+
+        xor     edi,eax
+        ror     r13d,6
+        mov     r10d,eax
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r10d,r15d
+        add     ecx,r12d
+        add     r10d,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[12+rsp]
+        mov     r15d,DWORD[rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     r10d,r14d
+        mov     r14d,r15d
+        ror     r15d,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     r15d,r14d
+        shr     r14d,10
+
+        ror     r15d,17
+        xor     r12d,r13d
+        xor     r15d,r14d
+        add     r12d,DWORD[44+rsp]
+
+        add     r12d,DWORD[8+rsp]
+        mov     r13d,ecx
+        add     r12d,r15d
+        mov     r14d,r10d
+        ror     r13d,14
+        mov     r15d,edx
+
+        xor     r13d,ecx
+        ror     r14d,9
+        xor     r15d,r8d
+
+        mov     DWORD[8+rsp],r12d
+        xor     r14d,r10d
+        and     r15d,ecx
+
+        ror     r13d,5
+        add     r12d,r9d
+        xor     r15d,r8d
+
+        ror     r14d,11
+        xor     r13d,ecx
+        add     r12d,r15d
+
+        mov     r15d,r10d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r10d
+
+        xor     r15d,r11d
+        ror     r13d,6
+        mov     r9d,r11d
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r9d,edi
+        add     ebx,r12d
+        add     r9d,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[16+rsp]
+        mov     edi,DWORD[4+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     r9d,r14d
+        mov     r14d,edi
+        ror     edi,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     edi,r14d
+        shr     r14d,10
+
+        ror     edi,17
+        xor     r12d,r13d
+        xor     edi,r14d
+        add     r12d,DWORD[48+rsp]
+
+        add     r12d,DWORD[12+rsp]
+        mov     r13d,ebx
+        add     r12d,edi
+        mov     r14d,r9d
+        ror     r13d,14
+        mov     edi,ecx
+
+        xor     r13d,ebx
+        ror     r14d,9
+        xor     edi,edx
+
+        mov     DWORD[12+rsp],r12d
+        xor     r14d,r9d
+        and     edi,ebx
+
+        ror     r13d,5
+        add     r12d,r8d
+        xor     edi,edx
+
+        ror     r14d,11
+        xor     r13d,ebx
+        add     r12d,edi
+
+        mov     edi,r9d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r9d
+
+        xor     edi,r10d
+        ror     r13d,6
+        mov     r8d,r10d
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r8d,r15d
+        add     eax,r12d
+        add     r8d,r12d
+
+        lea     rbp,[20+rbp]
+        mov     r13d,DWORD[20+rsp]
+        mov     r15d,DWORD[8+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     r8d,r14d
+        mov     r14d,r15d
+        ror     r15d,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     r15d,r14d
+        shr     r14d,10
+
+        ror     r15d,17
+        xor     r12d,r13d
+        xor     r15d,r14d
+        add     r12d,DWORD[52+rsp]
+
+        add     r12d,DWORD[16+rsp]
+        mov     r13d,eax
+        add     r12d,r15d
+        mov     r14d,r8d
+        ror     r13d,14
+        mov     r15d,ebx
+
+        xor     r13d,eax
+        ror     r14d,9
+        xor     r15d,ecx
+
+        mov     DWORD[16+rsp],r12d
+        xor     r14d,r8d
+        and     r15d,eax
+
+        ror     r13d,5
+        add     r12d,edx
+        xor     r15d,ecx
+
+        ror     r14d,11
+        xor     r13d,eax
+        add     r12d,r15d
+
+        mov     r15d,r8d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r8d
+
+        xor     r15d,r9d
+        ror     r13d,6
+        mov     edx,r9d
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     edx,edi
+        add     r11d,r12d
+        add     edx,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[24+rsp]
+        mov     edi,DWORD[12+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     edx,r14d
+        mov     r14d,edi
+        ror     edi,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     edi,r14d
+        shr     r14d,10
+
+        ror     edi,17
+        xor     r12d,r13d
+        xor     edi,r14d
+        add     r12d,DWORD[56+rsp]
+
+        add     r12d,DWORD[20+rsp]
+        mov     r13d,r11d
+        add     r12d,edi
+        mov     r14d,edx
+        ror     r13d,14
+        mov     edi,eax
+
+        xor     r13d,r11d
+        ror     r14d,9
+        xor     edi,ebx
+
+        mov     DWORD[20+rsp],r12d
+        xor     r14d,edx
+        and     edi,r11d
+
+        ror     r13d,5
+        add     r12d,ecx
+        xor     edi,ebx
+
+        ror     r14d,11
+        xor     r13d,r11d
+        add     r12d,edi
+
+        mov     edi,edx
+        add     r12d,DWORD[rbp]
+        xor     r14d,edx
+
+        xor     edi,r8d
+        ror     r13d,6
+        mov     ecx,r8d
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     ecx,r15d
+        add     r10d,r12d
+        add     ecx,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[28+rsp]
+        mov     r15d,DWORD[16+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     ecx,r14d
+        mov     r14d,r15d
+        ror     r15d,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     r15d,r14d
+        shr     r14d,10
+
+        ror     r15d,17
+        xor     r12d,r13d
+        xor     r15d,r14d
+        add     r12d,DWORD[60+rsp]
+
+        add     r12d,DWORD[24+rsp]
+        mov     r13d,r10d
+        add     r12d,r15d
+        mov     r14d,ecx
+        ror     r13d,14
+        mov     r15d,r11d
+
+        xor     r13d,r10d
+        ror     r14d,9
+        xor     r15d,eax
+
+        mov     DWORD[24+rsp],r12d
+        xor     r14d,ecx
+        and     r15d,r10d
+
+        ror     r13d,5
+        add     r12d,ebx
+        xor     r15d,eax
+
+        ror     r14d,11
+        xor     r13d,r10d
+        add     r12d,r15d
+
+        mov     r15d,ecx
+        add     r12d,DWORD[rbp]
+        xor     r14d,ecx
+
+        xor     r15d,edx
+        ror     r13d,6
+        mov     ebx,edx
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     ebx,edi
+        add     r9d,r12d
+        add     ebx,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[32+rsp]
+        mov     edi,DWORD[20+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     ebx,r14d
+        mov     r14d,edi
+        ror     edi,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     edi,r14d
+        shr     r14d,10
+
+        ror     edi,17
+        xor     r12d,r13d
+        xor     edi,r14d
+        add     r12d,DWORD[rsp]
+
+        add     r12d,DWORD[28+rsp]
+        mov     r13d,r9d
+        add     r12d,edi
+        mov     r14d,ebx
+        ror     r13d,14
+        mov     edi,r10d
+
+        xor     r13d,r9d
+        ror     r14d,9
+        xor     edi,r11d
+
+        mov     DWORD[28+rsp],r12d
+        xor     r14d,ebx
+        and     edi,r9d
+
+        ror     r13d,5
+        add     r12d,eax
+        xor     edi,r11d
+
+        ror     r14d,11
+        xor     r13d,r9d
+        add     r12d,edi
+
+        mov     edi,ebx
+        add     r12d,DWORD[rbp]
+        xor     r14d,ebx
+
+        xor     edi,ecx
+        ror     r13d,6
+        mov     eax,ecx
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     eax,r15d
+        add     r8d,r12d
+        add     eax,r12d
+
+        lea     rbp,[20+rbp]
+        mov     r13d,DWORD[36+rsp]
+        mov     r15d,DWORD[24+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     eax,r14d
+        mov     r14d,r15d
+        ror     r15d,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     r15d,r14d
+        shr     r14d,10
+
+        ror     r15d,17
+        xor     r12d,r13d
+        xor     r15d,r14d
+        add     r12d,DWORD[4+rsp]
+
+        add     r12d,DWORD[32+rsp]
+        mov     r13d,r8d
+        add     r12d,r15d
+        mov     r14d,eax
+        ror     r13d,14
+        mov     r15d,r9d
+
+        xor     r13d,r8d
+        ror     r14d,9
+        xor     r15d,r10d
+
+        mov     DWORD[32+rsp],r12d
+        xor     r14d,eax
+        and     r15d,r8d
+
+        ror     r13d,5
+        add     r12d,r11d
+        xor     r15d,r10d
+
+        ror     r14d,11
+        xor     r13d,r8d
+        add     r12d,r15d
+
+        mov     r15d,eax
+        add     r12d,DWORD[rbp]
+        xor     r14d,eax
+
+        xor     r15d,ebx
+        ror     r13d,6
+        mov     r11d,ebx
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r11d,edi
+        add     edx,r12d
+        add     r11d,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[40+rsp]
+        mov     edi,DWORD[28+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     r11d,r14d
+        mov     r14d,edi
+        ror     edi,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     edi,r14d
+        shr     r14d,10
+
+        ror     edi,17
+        xor     r12d,r13d
+        xor     edi,r14d
+        add     r12d,DWORD[8+rsp]
+
+        add     r12d,DWORD[36+rsp]
+        mov     r13d,edx
+        add     r12d,edi
+        mov     r14d,r11d
+        ror     r13d,14
+        mov     edi,r8d
+
+        xor     r13d,edx
+        ror     r14d,9
+        xor     edi,r9d
+
+        mov     DWORD[36+rsp],r12d
+        xor     r14d,r11d
+        and     edi,edx
+
+        ror     r13d,5
+        add     r12d,r10d
+        xor     edi,r9d
+
+        ror     r14d,11
+        xor     r13d,edx
+        add     r12d,edi
+
+        mov     edi,r11d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r11d
+
+        xor     edi,eax
+        ror     r13d,6
+        mov     r10d,eax
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r10d,r15d
+        add     ecx,r12d
+        add     r10d,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[44+rsp]
+        mov     r15d,DWORD[32+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     r10d,r14d
+        mov     r14d,r15d
+        ror     r15d,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     r15d,r14d
+        shr     r14d,10
+
+        ror     r15d,17
+        xor     r12d,r13d
+        xor     r15d,r14d
+        add     r12d,DWORD[12+rsp]
+
+        add     r12d,DWORD[40+rsp]
+        mov     r13d,ecx
+        add     r12d,r15d
+        mov     r14d,r10d
+        ror     r13d,14
+        mov     r15d,edx
+
+        xor     r13d,ecx
+        ror     r14d,9
+        xor     r15d,r8d
+
+        mov     DWORD[40+rsp],r12d
+        xor     r14d,r10d
+        and     r15d,ecx
+
+        ror     r13d,5
+        add     r12d,r9d
+        xor     r15d,r8d
+
+        ror     r14d,11
+        xor     r13d,ecx
+        add     r12d,r15d
+
+        mov     r15d,r10d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r10d
+
+        xor     r15d,r11d
+        ror     r13d,6
+        mov     r9d,r11d
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r9d,edi
+        add     ebx,r12d
+        add     r9d,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[48+rsp]
+        mov     edi,DWORD[36+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     r9d,r14d
+        mov     r14d,edi
+        ror     edi,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     edi,r14d
+        shr     r14d,10
+
+        ror     edi,17
+        xor     r12d,r13d
+        xor     edi,r14d
+        add     r12d,DWORD[16+rsp]
+
+        add     r12d,DWORD[44+rsp]
+        mov     r13d,ebx
+        add     r12d,edi
+        mov     r14d,r9d
+        ror     r13d,14
+        mov     edi,ecx
+
+        xor     r13d,ebx
+        ror     r14d,9
+        xor     edi,edx
+
+        mov     DWORD[44+rsp],r12d
+        xor     r14d,r9d
+        and     edi,ebx
+
+        ror     r13d,5
+        add     r12d,r8d
+        xor     edi,edx
+
+        ror     r14d,11
+        xor     r13d,ebx
+        add     r12d,edi
+
+        mov     edi,r9d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r9d
+
+        xor     edi,r10d
+        ror     r13d,6
+        mov     r8d,r10d
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     r8d,r15d
+        add     eax,r12d
+        add     r8d,r12d
+
+        lea     rbp,[20+rbp]
+        mov     r13d,DWORD[52+rsp]
+        mov     r15d,DWORD[40+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     r8d,r14d
+        mov     r14d,r15d
+        ror     r15d,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     r15d,r14d
+        shr     r14d,10
+
+        ror     r15d,17
+        xor     r12d,r13d
+        xor     r15d,r14d
+        add     r12d,DWORD[20+rsp]
+
+        add     r12d,DWORD[48+rsp]
+        mov     r13d,eax
+        add     r12d,r15d
+        mov     r14d,r8d
+        ror     r13d,14
+        mov     r15d,ebx
+
+        xor     r13d,eax
+        ror     r14d,9
+        xor     r15d,ecx
+
+        mov     DWORD[48+rsp],r12d
+        xor     r14d,r8d
+        and     r15d,eax
+
+        ror     r13d,5
+        add     r12d,edx
+        xor     r15d,ecx
+
+        ror     r14d,11
+        xor     r13d,eax
+        add     r12d,r15d
+
+        mov     r15d,r8d
+        add     r12d,DWORD[rbp]
+        xor     r14d,r8d
+
+        xor     r15d,r9d
+        ror     r13d,6
+        mov     edx,r9d
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     edx,edi
+        add     r11d,r12d
+        add     edx,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[56+rsp]
+        mov     edi,DWORD[44+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     edx,r14d
+        mov     r14d,edi
+        ror     edi,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     edi,r14d
+        shr     r14d,10
+
+        ror     edi,17
+        xor     r12d,r13d
+        xor     edi,r14d
+        add     r12d,DWORD[24+rsp]
+
+        add     r12d,DWORD[52+rsp]
+        mov     r13d,r11d
+        add     r12d,edi
+        mov     r14d,edx
+        ror     r13d,14
+        mov     edi,eax
+
+        xor     r13d,r11d
+        ror     r14d,9
+        xor     edi,ebx
+
+        mov     DWORD[52+rsp],r12d
+        xor     r14d,edx
+        and     edi,r11d
+
+        ror     r13d,5
+        add     r12d,ecx
+        xor     edi,ebx
+
+        ror     r14d,11
+        xor     r13d,r11d
+        add     r12d,edi
+
+        mov     edi,edx
+        add     r12d,DWORD[rbp]
+        xor     r14d,edx
+
+        xor     edi,r8d
+        ror     r13d,6
+        mov     ecx,r8d
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     ecx,r15d
+        add     r10d,r12d
+        add     ecx,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[60+rsp]
+        mov     r15d,DWORD[48+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     ecx,r14d
+        mov     r14d,r15d
+        ror     r15d,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     r15d,r14d
+        shr     r14d,10
+
+        ror     r15d,17
+        xor     r12d,r13d
+        xor     r15d,r14d
+        add     r12d,DWORD[28+rsp]
+
+        add     r12d,DWORD[56+rsp]
+        mov     r13d,r10d
+        add     r12d,r15d
+        mov     r14d,ecx
+        ror     r13d,14
+        mov     r15d,r11d
+
+        xor     r13d,r10d
+        ror     r14d,9
+        xor     r15d,eax
+
+        mov     DWORD[56+rsp],r12d
+        xor     r14d,ecx
+        and     r15d,r10d
+
+        ror     r13d,5
+        add     r12d,ebx
+        xor     r15d,eax
+
+        ror     r14d,11
+        xor     r13d,r10d
+        add     r12d,r15d
+
+        mov     r15d,ecx
+        add     r12d,DWORD[rbp]
+        xor     r14d,ecx
+
+        xor     r15d,edx
+        ror     r13d,6
+        mov     ebx,edx
+
+        and     edi,r15d
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     ebx,edi
+        add     r9d,r12d
+        add     ebx,r12d
+
+        lea     rbp,[4+rbp]
+        mov     r13d,DWORD[rsp]
+        mov     edi,DWORD[52+rsp]
+
+        mov     r12d,r13d
+        ror     r13d,11
+        add     ebx,r14d
+        mov     r14d,edi
+        ror     edi,2
+
+        xor     r13d,r12d
+        shr     r12d,3
+        ror     r13d,7
+        xor     edi,r14d
+        shr     r14d,10
+
+        ror     edi,17
+        xor     r12d,r13d
+        xor     edi,r14d
+        add     r12d,DWORD[32+rsp]
+
+        add     r12d,DWORD[60+rsp]
+        mov     r13d,r9d
+        add     r12d,edi
+        mov     r14d,ebx
+        ror     r13d,14
+        mov     edi,r10d
+
+        xor     r13d,r9d
+        ror     r14d,9
+        xor     edi,r11d
+
+        mov     DWORD[60+rsp],r12d
+        xor     r14d,ebx
+        and     edi,r9d
+
+        ror     r13d,5
+        add     r12d,eax
+        xor     edi,r11d
+
+        ror     r14d,11
+        xor     r13d,r9d
+        add     r12d,edi
+
+        mov     edi,ebx
+        add     r12d,DWORD[rbp]
+        xor     r14d,ebx
+
+        xor     edi,ecx
+        ror     r13d,6
+        mov     eax,ecx
+
+        and     r15d,edi
+        ror     r14d,2
+        add     r12d,r13d
+
+        xor     eax,r15d
+        add     r8d,r12d
+        add     eax,r12d
+
+        lea     rbp,[20+rbp]
+        cmp     BYTE[3+rbp],0
+        jnz     NEAR $L$rounds_16_xx
+
+        mov     rdi,QWORD[((64+0))+rsp]
+        add     eax,r14d
+        lea     rsi,[64+rsi]
+
+        add     eax,DWORD[rdi]
+        add     ebx,DWORD[4+rdi]
+        add     ecx,DWORD[8+rdi]
+        add     edx,DWORD[12+rdi]
+        add     r8d,DWORD[16+rdi]
+        add     r9d,DWORD[20+rdi]
+        add     r10d,DWORD[24+rdi]
+        add     r11d,DWORD[28+rdi]
+
+        cmp     rsi,QWORD[((64+16))+rsp]
+
+        mov     DWORD[rdi],eax
+        mov     DWORD[4+rdi],ebx
+        mov     DWORD[8+rdi],ecx
+        mov     DWORD[12+rdi],edx
+        mov     DWORD[16+rdi],r8d
+        mov     DWORD[20+rdi],r9d
+        mov     DWORD[24+rdi],r10d
+        mov     DWORD[28+rdi],r11d
+        jb      NEAR $L$loop
+
+        mov     rsi,QWORD[88+rsp]
+
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha256_block_data_order:
+ALIGN   64
+
+K256:
+        DD      0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+        DD      0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+        DD      0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+        DD      0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+        DD      0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+        DD      0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+        DD      0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+        DD      0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+        DD      0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+        DD      0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+        DD      0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+        DD      0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+        DD      0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+        DD      0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+        DD      0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+        DD      0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+        DD      0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+        DD      0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+        DD      0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+        DD      0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+        DD      0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+        DD      0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+        DD      0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+        DD      0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+        DD      0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+        DD      0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+        DD      0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+        DD      0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+        DD      0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+        DD      0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+        DD      0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+        DD      0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+        DD      0x00010203,0x04050607,0x08090a0b,0x0c0d0e0f
+        DD      0x03020100,0x0b0a0908,0xffffffff,0xffffffff
+        DD      0x03020100,0x0b0a0908,0xffffffff,0xffffffff
+        DD      0xffffffff,0xffffffff,0x03020100,0x0b0a0908
+        DD      0xffffffff,0xffffffff,0x03020100,0x0b0a0908
+DB      83,72,65,50,53,54,32,98,108,111,99,107,32,116,114,97
+DB      110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54
+DB      52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121
+DB      32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46
+DB      111,114,103,62,0
+
+ALIGN   64
+sha256_block_data_order_shaext:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha256_block_data_order_shaext:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+_shaext_shortcut:
+        lea     rsp,[((-88))+rsp]
+        movaps  XMMWORD[(-8-80)+rax],xmm6
+        movaps  XMMWORD[(-8-64)+rax],xmm7
+        movaps  XMMWORD[(-8-48)+rax],xmm8
+        movaps  XMMWORD[(-8-32)+rax],xmm9
+        movaps  XMMWORD[(-8-16)+rax],xmm10
+$L$prologue_shaext:
+        lea     rcx,[((K256+128))]
+        movdqu  xmm1,XMMWORD[rdi]
+        movdqu  xmm2,XMMWORD[16+rdi]
+        movdqa  xmm7,XMMWORD[((512-128))+rcx]
+
+        pshufd  xmm0,xmm1,0x1b
+        pshufd  xmm1,xmm1,0xb1
+        pshufd  xmm2,xmm2,0x1b
+        movdqa  xmm8,xmm7
+DB      102,15,58,15,202,8
+        punpcklqdq      xmm2,xmm0
+        jmp     NEAR $L$oop_shaext
+
+ALIGN   16
+$L$oop_shaext:
+        movdqu  xmm3,XMMWORD[rsi]
+        movdqu  xmm4,XMMWORD[16+rsi]
+        movdqu  xmm5,XMMWORD[32+rsi]
+DB      102,15,56,0,223
+        movdqu  xmm6,XMMWORD[48+rsi]
+
+        movdqa  xmm0,XMMWORD[((0-128))+rcx]
+        paddd   xmm0,xmm3
+DB      102,15,56,0,231
+        movdqa  xmm10,xmm2
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        nop
+        movdqa  xmm9,xmm1
+DB      15,56,203,202
+
+        movdqa  xmm0,XMMWORD[((32-128))+rcx]
+        paddd   xmm0,xmm4
+DB      102,15,56,0,239
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        lea     rsi,[64+rsi]
+DB      15,56,204,220
+DB      15,56,203,202
+
+        movdqa  xmm0,XMMWORD[((64-128))+rcx]
+        paddd   xmm0,xmm5
+DB      102,15,56,0,247
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm6
+DB      102,15,58,15,253,4
+        nop
+        paddd   xmm3,xmm7
+DB      15,56,204,229
+DB      15,56,203,202
+
+        movdqa  xmm0,XMMWORD[((96-128))+rcx]
+        paddd   xmm0,xmm6
+DB      15,56,205,222
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm3
+DB      102,15,58,15,254,4
+        nop
+        paddd   xmm4,xmm7
+DB      15,56,204,238
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((128-128))+rcx]
+        paddd   xmm0,xmm3
+DB      15,56,205,227
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm4
+DB      102,15,58,15,251,4
+        nop
+        paddd   xmm5,xmm7
+DB      15,56,204,243
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((160-128))+rcx]
+        paddd   xmm0,xmm4
+DB      15,56,205,236
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm5
+DB      102,15,58,15,252,4
+        nop
+        paddd   xmm6,xmm7
+DB      15,56,204,220
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((192-128))+rcx]
+        paddd   xmm0,xmm5
+DB      15,56,205,245
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm6
+DB      102,15,58,15,253,4
+        nop
+        paddd   xmm3,xmm7
+DB      15,56,204,229
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((224-128))+rcx]
+        paddd   xmm0,xmm6
+DB      15,56,205,222
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm3
+DB      102,15,58,15,254,4
+        nop
+        paddd   xmm4,xmm7
+DB      15,56,204,238
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((256-128))+rcx]
+        paddd   xmm0,xmm3
+DB      15,56,205,227
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm4
+DB      102,15,58,15,251,4
+        nop
+        paddd   xmm5,xmm7
+DB      15,56,204,243
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((288-128))+rcx]
+        paddd   xmm0,xmm4
+DB      15,56,205,236
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm5
+DB      102,15,58,15,252,4
+        nop
+        paddd   xmm6,xmm7
+DB      15,56,204,220
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((320-128))+rcx]
+        paddd   xmm0,xmm5
+DB      15,56,205,245
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm6
+DB      102,15,58,15,253,4
+        nop
+        paddd   xmm3,xmm7
+DB      15,56,204,229
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((352-128))+rcx]
+        paddd   xmm0,xmm6
+DB      15,56,205,222
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm3
+DB      102,15,58,15,254,4
+        nop
+        paddd   xmm4,xmm7
+DB      15,56,204,238
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((384-128))+rcx]
+        paddd   xmm0,xmm3
+DB      15,56,205,227
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm4
+DB      102,15,58,15,251,4
+        nop
+        paddd   xmm5,xmm7
+DB      15,56,204,243
+DB      15,56,203,202
+        movdqa  xmm0,XMMWORD[((416-128))+rcx]
+        paddd   xmm0,xmm4
+DB      15,56,205,236
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        movdqa  xmm7,xmm5
+DB      102,15,58,15,252,4
+DB      15,56,203,202
+        paddd   xmm6,xmm7
+
+        movdqa  xmm0,XMMWORD[((448-128))+rcx]
+        paddd   xmm0,xmm5
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+DB      15,56,205,245
+        movdqa  xmm7,xmm8
+DB      15,56,203,202
+
+        movdqa  xmm0,XMMWORD[((480-128))+rcx]
+        paddd   xmm0,xmm6
+        nop
+DB      15,56,203,209
+        pshufd  xmm0,xmm0,0x0e
+        dec     rdx
+        nop
+DB      15,56,203,202
+
+        paddd   xmm2,xmm10
+        paddd   xmm1,xmm9
+        jnz     NEAR $L$oop_shaext
+
+        pshufd  xmm2,xmm2,0xb1
+        pshufd  xmm7,xmm1,0x1b
+        pshufd  xmm1,xmm1,0xb1
+        punpckhqdq      xmm1,xmm2
+DB      102,15,58,15,215,8
+
+        movdqu  XMMWORD[rdi],xmm1
+        movdqu  XMMWORD[16+rdi],xmm2
+        movaps  xmm6,XMMWORD[((-8-80))+rax]
+        movaps  xmm7,XMMWORD[((-8-64))+rax]
+        movaps  xmm8,XMMWORD[((-8-48))+rax]
+        movaps  xmm9,XMMWORD[((-8-32))+rax]
+        movaps  xmm10,XMMWORD[((-8-16))+rax]
+        mov     rsp,rax
+$L$epilogue_shaext:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+$L$SEH_end_sha256_block_data_order_shaext:
+
+ALIGN   64
+sha256_block_data_order_ssse3:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha256_block_data_order_ssse3:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+$L$ssse3_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        shl     rdx,4
+        sub     rsp,160
+        lea     rdx,[rdx*4+rsi]
+        and     rsp,-64
+        mov     QWORD[((64+0))+rsp],rdi
+        mov     QWORD[((64+8))+rsp],rsi
+        mov     QWORD[((64+16))+rsp],rdx
+        mov     QWORD[88+rsp],rax
+
+        movaps  XMMWORD[(64+32)+rsp],xmm6
+        movaps  XMMWORD[(64+48)+rsp],xmm7
+        movaps  XMMWORD[(64+64)+rsp],xmm8
+        movaps  XMMWORD[(64+80)+rsp],xmm9
+$L$prologue_ssse3:
+
+        mov     eax,DWORD[rdi]
+        mov     ebx,DWORD[4+rdi]
+        mov     ecx,DWORD[8+rdi]
+        mov     edx,DWORD[12+rdi]
+        mov     r8d,DWORD[16+rdi]
+        mov     r9d,DWORD[20+rdi]
+        mov     r10d,DWORD[24+rdi]
+        mov     r11d,DWORD[28+rdi]
+
+
+        jmp     NEAR $L$loop_ssse3
+ALIGN   16
+$L$loop_ssse3:
+        movdqa  xmm7,XMMWORD[((K256+512))]
+        movdqu  xmm0,XMMWORD[rsi]
+        movdqu  xmm1,XMMWORD[16+rsi]
+        movdqu  xmm2,XMMWORD[32+rsi]
+DB      102,15,56,0,199
+        movdqu  xmm3,XMMWORD[48+rsi]
+        lea     rbp,[K256]
+DB      102,15,56,0,207
+        movdqa  xmm4,XMMWORD[rbp]
+        movdqa  xmm5,XMMWORD[32+rbp]
+DB      102,15,56,0,215
+        paddd   xmm4,xmm0
+        movdqa  xmm6,XMMWORD[64+rbp]
+DB      102,15,56,0,223
+        movdqa  xmm7,XMMWORD[96+rbp]
+        paddd   xmm5,xmm1
+        paddd   xmm6,xmm2
+        paddd   xmm7,xmm3
+        movdqa  XMMWORD[rsp],xmm4
+        mov     r14d,eax
+        movdqa  XMMWORD[16+rsp],xmm5
+        mov     edi,ebx
+        movdqa  XMMWORD[32+rsp],xmm6
+        xor     edi,ecx
+        movdqa  XMMWORD[48+rsp],xmm7
+        mov     r13d,r8d
+        jmp     NEAR $L$ssse3_00_47
+
+ALIGN   16
+$L$ssse3_00_47:
+        sub     rbp,-128
+        ror     r13d,14
+        movdqa  xmm4,xmm1
+        mov     eax,r14d
+        mov     r12d,r9d
+        movdqa  xmm7,xmm3
+        ror     r14d,9
+        xor     r13d,r8d
+        xor     r12d,r10d
+        ror     r13d,5
+        xor     r14d,eax
+DB      102,15,58,15,224,4
+        and     r12d,r8d
+        xor     r13d,r8d
+DB      102,15,58,15,250,4
+        add     r11d,DWORD[rsp]
+        mov     r15d,eax
+        xor     r12d,r10d
+        ror     r14d,11
+        movdqa  xmm5,xmm4
+        xor     r15d,ebx
+        add     r11d,r12d
+        movdqa  xmm6,xmm4
+        ror     r13d,6
+        and     edi,r15d
+        psrld   xmm4,3
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     edi,ebx
+        paddd   xmm0,xmm7
+        ror     r14d,2
+        add     edx,r11d
+        psrld   xmm6,7
+        add     r11d,edi
+        mov     r13d,edx
+        pshufd  xmm7,xmm3,250
+        add     r14d,r11d
+        ror     r13d,14
+        pslld   xmm5,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        pxor    xmm4,xmm6
+        ror     r14d,9
+        xor     r13d,edx
+        xor     r12d,r9d
+        ror     r13d,5
+        psrld   xmm6,11
+        xor     r14d,r11d
+        pxor    xmm4,xmm5
+        and     r12d,edx
+        xor     r13d,edx
+        pslld   xmm5,11
+        add     r10d,DWORD[4+rsp]
+        mov     edi,r11d
+        pxor    xmm4,xmm6
+        xor     r12d,r9d
+        ror     r14d,11
+        movdqa  xmm6,xmm7
+        xor     edi,eax
+        add     r10d,r12d
+        pxor    xmm4,xmm5
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,r11d
+        psrld   xmm7,10
+        add     r10d,r13d
+        xor     r15d,eax
+        paddd   xmm0,xmm4
+        ror     r14d,2
+        add     ecx,r10d
+        psrlq   xmm6,17
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        pxor    xmm7,xmm6
+        ror     r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        ror     r14d,9
+        psrlq   xmm6,2
+        xor     r13d,ecx
+        xor     r12d,r8d
+        pxor    xmm7,xmm6
+        ror     r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        pshufd  xmm7,xmm7,128
+        xor     r13d,ecx
+        add     r9d,DWORD[8+rsp]
+        mov     r15d,r10d
+        psrldq  xmm7,8
+        xor     r12d,r8d
+        ror     r14d,11
+        xor     r15d,r11d
+        add     r9d,r12d
+        ror     r13d,6
+        paddd   xmm0,xmm7
+        and     edi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        pshufd  xmm7,xmm0,80
+        xor     edi,r11d
+        ror     r14d,2
+        add     ebx,r9d
+        movdqa  xmm6,xmm7
+        add     r9d,edi
+        mov     r13d,ebx
+        psrld   xmm7,10
+        add     r14d,r9d
+        ror     r13d,14
+        psrlq   xmm6,17
+        mov     r9d,r14d
+        mov     r12d,ecx
+        pxor    xmm7,xmm6
+        ror     r14d,9
+        xor     r13d,ebx
+        xor     r12d,edx
+        ror     r13d,5
+        xor     r14d,r9d
+        psrlq   xmm6,2
+        and     r12d,ebx
+        xor     r13d,ebx
+        add     r8d,DWORD[12+rsp]
+        pxor    xmm7,xmm6
+        mov     edi,r9d
+        xor     r12d,edx
+        ror     r14d,11
+        pshufd  xmm7,xmm7,8
+        xor     edi,r10d
+        add     r8d,r12d
+        movdqa  xmm6,XMMWORD[rbp]
+        ror     r13d,6
+        and     r15d,edi
+        pslldq  xmm7,8
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        paddd   xmm0,xmm7
+        ror     r14d,2
+        add     eax,r8d
+        add     r8d,r15d
+        paddd   xmm6,xmm0
+        mov     r13d,eax
+        add     r14d,r8d
+        movdqa  XMMWORD[rsp],xmm6
+        ror     r13d,14
+        movdqa  xmm4,xmm2
+        mov     r8d,r14d
+        mov     r12d,ebx
+        movdqa  xmm7,xmm0
+        ror     r14d,9
+        xor     r13d,eax
+        xor     r12d,ecx
+        ror     r13d,5
+        xor     r14d,r8d
+DB      102,15,58,15,225,4
+        and     r12d,eax
+        xor     r13d,eax
+DB      102,15,58,15,251,4
+        add     edx,DWORD[16+rsp]
+        mov     r15d,r8d
+        xor     r12d,ecx
+        ror     r14d,11
+        movdqa  xmm5,xmm4
+        xor     r15d,r9d
+        add     edx,r12d
+        movdqa  xmm6,xmm4
+        ror     r13d,6
+        and     edi,r15d
+        psrld   xmm4,3
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     edi,r9d
+        paddd   xmm1,xmm7
+        ror     r14d,2
+        add     r11d,edx
+        psrld   xmm6,7
+        add     edx,edi
+        mov     r13d,r11d
+        pshufd  xmm7,xmm0,250
+        add     r14d,edx
+        ror     r13d,14
+        pslld   xmm5,14
+        mov     edx,r14d
+        mov     r12d,eax
+        pxor    xmm4,xmm6
+        ror     r14d,9
+        xor     r13d,r11d
+        xor     r12d,ebx
+        ror     r13d,5
+        psrld   xmm6,11
+        xor     r14d,edx
+        pxor    xmm4,xmm5
+        and     r12d,r11d
+        xor     r13d,r11d
+        pslld   xmm5,11
+        add     ecx,DWORD[20+rsp]
+        mov     edi,edx
+        pxor    xmm4,xmm6
+        xor     r12d,ebx
+        ror     r14d,11
+        movdqa  xmm6,xmm7
+        xor     edi,r8d
+        add     ecx,r12d
+        pxor    xmm4,xmm5
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,edx
+        psrld   xmm7,10
+        add     ecx,r13d
+        xor     r15d,r8d
+        paddd   xmm1,xmm4
+        ror     r14d,2
+        add     r10d,ecx
+        psrlq   xmm6,17
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        pxor    xmm7,xmm6
+        ror     r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        ror     r14d,9
+        psrlq   xmm6,2
+        xor     r13d,r10d
+        xor     r12d,eax
+        pxor    xmm7,xmm6
+        ror     r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        pshufd  xmm7,xmm7,128
+        xor     r13d,r10d
+        add     ebx,DWORD[24+rsp]
+        mov     r15d,ecx
+        psrldq  xmm7,8
+        xor     r12d,eax
+        ror     r14d,11
+        xor     r15d,edx
+        add     ebx,r12d
+        ror     r13d,6
+        paddd   xmm1,xmm7
+        and     edi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        pshufd  xmm7,xmm1,80
+        xor     edi,edx
+        ror     r14d,2
+        add     r9d,ebx
+        movdqa  xmm6,xmm7
+        add     ebx,edi
+        mov     r13d,r9d
+        psrld   xmm7,10
+        add     r14d,ebx
+        ror     r13d,14
+        psrlq   xmm6,17
+        mov     ebx,r14d
+        mov     r12d,r10d
+        pxor    xmm7,xmm6
+        ror     r14d,9
+        xor     r13d,r9d
+        xor     r12d,r11d
+        ror     r13d,5
+        xor     r14d,ebx
+        psrlq   xmm6,2
+        and     r12d,r9d
+        xor     r13d,r9d
+        add     eax,DWORD[28+rsp]
+        pxor    xmm7,xmm6
+        mov     edi,ebx
+        xor     r12d,r11d
+        ror     r14d,11
+        pshufd  xmm7,xmm7,8
+        xor     edi,ecx
+        add     eax,r12d
+        movdqa  xmm6,XMMWORD[32+rbp]
+        ror     r13d,6
+        and     r15d,edi
+        pslldq  xmm7,8
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        paddd   xmm1,xmm7
+        ror     r14d,2
+        add     r8d,eax
+        add     eax,r15d
+        paddd   xmm6,xmm1
+        mov     r13d,r8d
+        add     r14d,eax
+        movdqa  XMMWORD[16+rsp],xmm6
+        ror     r13d,14
+        movdqa  xmm4,xmm3
+        mov     eax,r14d
+        mov     r12d,r9d
+        movdqa  xmm7,xmm1
+        ror     r14d,9
+        xor     r13d,r8d
+        xor     r12d,r10d
+        ror     r13d,5
+        xor     r14d,eax
+DB      102,15,58,15,226,4
+        and     r12d,r8d
+        xor     r13d,r8d
+DB      102,15,58,15,248,4
+        add     r11d,DWORD[32+rsp]
+        mov     r15d,eax
+        xor     r12d,r10d
+        ror     r14d,11
+        movdqa  xmm5,xmm4
+        xor     r15d,ebx
+        add     r11d,r12d
+        movdqa  xmm6,xmm4
+        ror     r13d,6
+        and     edi,r15d
+        psrld   xmm4,3
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     edi,ebx
+        paddd   xmm2,xmm7
+        ror     r14d,2
+        add     edx,r11d
+        psrld   xmm6,7
+        add     r11d,edi
+        mov     r13d,edx
+        pshufd  xmm7,xmm1,250
+        add     r14d,r11d
+        ror     r13d,14
+        pslld   xmm5,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        pxor    xmm4,xmm6
+        ror     r14d,9
+        xor     r13d,edx
+        xor     r12d,r9d
+        ror     r13d,5
+        psrld   xmm6,11
+        xor     r14d,r11d
+        pxor    xmm4,xmm5
+        and     r12d,edx
+        xor     r13d,edx
+        pslld   xmm5,11
+        add     r10d,DWORD[36+rsp]
+        mov     edi,r11d
+        pxor    xmm4,xmm6
+        xor     r12d,r9d
+        ror     r14d,11
+        movdqa  xmm6,xmm7
+        xor     edi,eax
+        add     r10d,r12d
+        pxor    xmm4,xmm5
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,r11d
+        psrld   xmm7,10
+        add     r10d,r13d
+        xor     r15d,eax
+        paddd   xmm2,xmm4
+        ror     r14d,2
+        add     ecx,r10d
+        psrlq   xmm6,17
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        pxor    xmm7,xmm6
+        ror     r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        ror     r14d,9
+        psrlq   xmm6,2
+        xor     r13d,ecx
+        xor     r12d,r8d
+        pxor    xmm7,xmm6
+        ror     r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        pshufd  xmm7,xmm7,128
+        xor     r13d,ecx
+        add     r9d,DWORD[40+rsp]
+        mov     r15d,r10d
+        psrldq  xmm7,8
+        xor     r12d,r8d
+        ror     r14d,11
+        xor     r15d,r11d
+        add     r9d,r12d
+        ror     r13d,6
+        paddd   xmm2,xmm7
+        and     edi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        pshufd  xmm7,xmm2,80
+        xor     edi,r11d
+        ror     r14d,2
+        add     ebx,r9d
+        movdqa  xmm6,xmm7
+        add     r9d,edi
+        mov     r13d,ebx
+        psrld   xmm7,10
+        add     r14d,r9d
+        ror     r13d,14
+        psrlq   xmm6,17
+        mov     r9d,r14d
+        mov     r12d,ecx
+        pxor    xmm7,xmm6
+        ror     r14d,9
+        xor     r13d,ebx
+        xor     r12d,edx
+        ror     r13d,5
+        xor     r14d,r9d
+        psrlq   xmm6,2
+        and     r12d,ebx
+        xor     r13d,ebx
+        add     r8d,DWORD[44+rsp]
+        pxor    xmm7,xmm6
+        mov     edi,r9d
+        xor     r12d,edx
+        ror     r14d,11
+        pshufd  xmm7,xmm7,8
+        xor     edi,r10d
+        add     r8d,r12d
+        movdqa  xmm6,XMMWORD[64+rbp]
+        ror     r13d,6
+        and     r15d,edi
+        pslldq  xmm7,8
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        paddd   xmm2,xmm7
+        ror     r14d,2
+        add     eax,r8d
+        add     r8d,r15d
+        paddd   xmm6,xmm2
+        mov     r13d,eax
+        add     r14d,r8d
+        movdqa  XMMWORD[32+rsp],xmm6
+        ror     r13d,14
+        movdqa  xmm4,xmm0
+        mov     r8d,r14d
+        mov     r12d,ebx
+        movdqa  xmm7,xmm2
+        ror     r14d,9
+        xor     r13d,eax
+        xor     r12d,ecx
+        ror     r13d,5
+        xor     r14d,r8d
+DB      102,15,58,15,227,4
+        and     r12d,eax
+        xor     r13d,eax
+DB      102,15,58,15,249,4
+        add     edx,DWORD[48+rsp]
+        mov     r15d,r8d
+        xor     r12d,ecx
+        ror     r14d,11
+        movdqa  xmm5,xmm4
+        xor     r15d,r9d
+        add     edx,r12d
+        movdqa  xmm6,xmm4
+        ror     r13d,6
+        and     edi,r15d
+        psrld   xmm4,3
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     edi,r9d
+        paddd   xmm3,xmm7
+        ror     r14d,2
+        add     r11d,edx
+        psrld   xmm6,7
+        add     edx,edi
+        mov     r13d,r11d
+        pshufd  xmm7,xmm2,250
+        add     r14d,edx
+        ror     r13d,14
+        pslld   xmm5,14
+        mov     edx,r14d
+        mov     r12d,eax
+        pxor    xmm4,xmm6
+        ror     r14d,9
+        xor     r13d,r11d
+        xor     r12d,ebx
+        ror     r13d,5
+        psrld   xmm6,11
+        xor     r14d,edx
+        pxor    xmm4,xmm5
+        and     r12d,r11d
+        xor     r13d,r11d
+        pslld   xmm5,11
+        add     ecx,DWORD[52+rsp]
+        mov     edi,edx
+        pxor    xmm4,xmm6
+        xor     r12d,ebx
+        ror     r14d,11
+        movdqa  xmm6,xmm7
+        xor     edi,r8d
+        add     ecx,r12d
+        pxor    xmm4,xmm5
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,edx
+        psrld   xmm7,10
+        add     ecx,r13d
+        xor     r15d,r8d
+        paddd   xmm3,xmm4
+        ror     r14d,2
+        add     r10d,ecx
+        psrlq   xmm6,17
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        pxor    xmm7,xmm6
+        ror     r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        ror     r14d,9
+        psrlq   xmm6,2
+        xor     r13d,r10d
+        xor     r12d,eax
+        pxor    xmm7,xmm6
+        ror     r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        pshufd  xmm7,xmm7,128
+        xor     r13d,r10d
+        add     ebx,DWORD[56+rsp]
+        mov     r15d,ecx
+        psrldq  xmm7,8
+        xor     r12d,eax
+        ror     r14d,11
+        xor     r15d,edx
+        add     ebx,r12d
+        ror     r13d,6
+        paddd   xmm3,xmm7
+        and     edi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        pshufd  xmm7,xmm3,80
+        xor     edi,edx
+        ror     r14d,2
+        add     r9d,ebx
+        movdqa  xmm6,xmm7
+        add     ebx,edi
+        mov     r13d,r9d
+        psrld   xmm7,10
+        add     r14d,ebx
+        ror     r13d,14
+        psrlq   xmm6,17
+        mov     ebx,r14d
+        mov     r12d,r10d
+        pxor    xmm7,xmm6
+        ror     r14d,9
+        xor     r13d,r9d
+        xor     r12d,r11d
+        ror     r13d,5
+        xor     r14d,ebx
+        psrlq   xmm6,2
+        and     r12d,r9d
+        xor     r13d,r9d
+        add     eax,DWORD[60+rsp]
+        pxor    xmm7,xmm6
+        mov     edi,ebx
+        xor     r12d,r11d
+        ror     r14d,11
+        pshufd  xmm7,xmm7,8
+        xor     edi,ecx
+        add     eax,r12d
+        movdqa  xmm6,XMMWORD[96+rbp]
+        ror     r13d,6
+        and     r15d,edi
+        pslldq  xmm7,8
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        paddd   xmm3,xmm7
+        ror     r14d,2
+        add     r8d,eax
+        add     eax,r15d
+        paddd   xmm6,xmm3
+        mov     r13d,r8d
+        add     r14d,eax
+        movdqa  XMMWORD[48+rsp],xmm6
+        cmp     BYTE[131+rbp],0
+        jne     NEAR $L$ssse3_00_47
+        ror     r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        ror     r14d,9
+        xor     r13d,r8d
+        xor     r12d,r10d
+        ror     r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        xor     r13d,r8d
+        add     r11d,DWORD[rsp]
+        mov     r15d,eax
+        xor     r12d,r10d
+        ror     r14d,11
+        xor     r15d,ebx
+        add     r11d,r12d
+        ror     r13d,6
+        and     edi,r15d
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     edi,ebx
+        ror     r14d,2
+        add     edx,r11d
+        add     r11d,edi
+        mov     r13d,edx
+        add     r14d,r11d
+        ror     r13d,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        ror     r14d,9
+        xor     r13d,edx
+        xor     r12d,r9d
+        ror     r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        xor     r13d,edx
+        add     r10d,DWORD[4+rsp]
+        mov     edi,r11d
+        xor     r12d,r9d
+        ror     r14d,11
+        xor     edi,eax
+        add     r10d,r12d
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,r11d
+        add     r10d,r13d
+        xor     r15d,eax
+        ror     r14d,2
+        add     ecx,r10d
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        ror     r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        ror     r14d,9
+        xor     r13d,ecx
+        xor     r12d,r8d
+        ror     r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        xor     r13d,ecx
+        add     r9d,DWORD[8+rsp]
+        mov     r15d,r10d
+        xor     r12d,r8d
+        ror     r14d,11
+        xor     r15d,r11d
+        add     r9d,r12d
+        ror     r13d,6
+        and     edi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     edi,r11d
+        ror     r14d,2
+        add     ebx,r9d
+        add     r9d,edi
+        mov     r13d,ebx
+        add     r14d,r9d
+        ror     r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        ror     r14d,9
+        xor     r13d,ebx
+        xor     r12d,edx
+        ror     r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        xor     r13d,ebx
+        add     r8d,DWORD[12+rsp]
+        mov     edi,r9d
+        xor     r12d,edx
+        ror     r14d,11
+        xor     edi,r10d
+        add     r8d,r12d
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        ror     r14d,2
+        add     eax,r8d
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        ror     r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        ror     r14d,9
+        xor     r13d,eax
+        xor     r12d,ecx
+        ror     r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        xor     r13d,eax
+        add     edx,DWORD[16+rsp]
+        mov     r15d,r8d
+        xor     r12d,ecx
+        ror     r14d,11
+        xor     r15d,r9d
+        add     edx,r12d
+        ror     r13d,6
+        and     edi,r15d
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     edi,r9d
+        ror     r14d,2
+        add     r11d,edx
+        add     edx,edi
+        mov     r13d,r11d
+        add     r14d,edx
+        ror     r13d,14
+        mov     edx,r14d
+        mov     r12d,eax
+        ror     r14d,9
+        xor     r13d,r11d
+        xor     r12d,ebx
+        ror     r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        xor     r13d,r11d
+        add     ecx,DWORD[20+rsp]
+        mov     edi,edx
+        xor     r12d,ebx
+        ror     r14d,11
+        xor     edi,r8d
+        add     ecx,r12d
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,edx
+        add     ecx,r13d
+        xor     r15d,r8d
+        ror     r14d,2
+        add     r10d,ecx
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        ror     r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        ror     r14d,9
+        xor     r13d,r10d
+        xor     r12d,eax
+        ror     r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        xor     r13d,r10d
+        add     ebx,DWORD[24+rsp]
+        mov     r15d,ecx
+        xor     r12d,eax
+        ror     r14d,11
+        xor     r15d,edx
+        add     ebx,r12d
+        ror     r13d,6
+        and     edi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     edi,edx
+        ror     r14d,2
+        add     r9d,ebx
+        add     ebx,edi
+        mov     r13d,r9d
+        add     r14d,ebx
+        ror     r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        ror     r14d,9
+        xor     r13d,r9d
+        xor     r12d,r11d
+        ror     r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        xor     r13d,r9d
+        add     eax,DWORD[28+rsp]
+        mov     edi,ebx
+        xor     r12d,r11d
+        ror     r14d,11
+        xor     edi,ecx
+        add     eax,r12d
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        ror     r14d,2
+        add     r8d,eax
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        ror     r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        ror     r14d,9
+        xor     r13d,r8d
+        xor     r12d,r10d
+        ror     r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        xor     r13d,r8d
+        add     r11d,DWORD[32+rsp]
+        mov     r15d,eax
+        xor     r12d,r10d
+        ror     r14d,11
+        xor     r15d,ebx
+        add     r11d,r12d
+        ror     r13d,6
+        and     edi,r15d
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     edi,ebx
+        ror     r14d,2
+        add     edx,r11d
+        add     r11d,edi
+        mov     r13d,edx
+        add     r14d,r11d
+        ror     r13d,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        ror     r14d,9
+        xor     r13d,edx
+        xor     r12d,r9d
+        ror     r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        xor     r13d,edx
+        add     r10d,DWORD[36+rsp]
+        mov     edi,r11d
+        xor     r12d,r9d
+        ror     r14d,11
+        xor     edi,eax
+        add     r10d,r12d
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,r11d
+        add     r10d,r13d
+        xor     r15d,eax
+        ror     r14d,2
+        add     ecx,r10d
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        ror     r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        ror     r14d,9
+        xor     r13d,ecx
+        xor     r12d,r8d
+        ror     r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        xor     r13d,ecx
+        add     r9d,DWORD[40+rsp]
+        mov     r15d,r10d
+        xor     r12d,r8d
+        ror     r14d,11
+        xor     r15d,r11d
+        add     r9d,r12d
+        ror     r13d,6
+        and     edi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     edi,r11d
+        ror     r14d,2
+        add     ebx,r9d
+        add     r9d,edi
+        mov     r13d,ebx
+        add     r14d,r9d
+        ror     r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        ror     r14d,9
+        xor     r13d,ebx
+        xor     r12d,edx
+        ror     r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        xor     r13d,ebx
+        add     r8d,DWORD[44+rsp]
+        mov     edi,r9d
+        xor     r12d,edx
+        ror     r14d,11
+        xor     edi,r10d
+        add     r8d,r12d
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        ror     r14d,2
+        add     eax,r8d
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        ror     r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        ror     r14d,9
+        xor     r13d,eax
+        xor     r12d,ecx
+        ror     r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        xor     r13d,eax
+        add     edx,DWORD[48+rsp]
+        mov     r15d,r8d
+        xor     r12d,ecx
+        ror     r14d,11
+        xor     r15d,r9d
+        add     edx,r12d
+        ror     r13d,6
+        and     edi,r15d
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     edi,r9d
+        ror     r14d,2
+        add     r11d,edx
+        add     edx,edi
+        mov     r13d,r11d
+        add     r14d,edx
+        ror     r13d,14
+        mov     edx,r14d
+        mov     r12d,eax
+        ror     r14d,9
+        xor     r13d,r11d
+        xor     r12d,ebx
+        ror     r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        xor     r13d,r11d
+        add     ecx,DWORD[52+rsp]
+        mov     edi,edx
+        xor     r12d,ebx
+        ror     r14d,11
+        xor     edi,r8d
+        add     ecx,r12d
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,edx
+        add     ecx,r13d
+        xor     r15d,r8d
+        ror     r14d,2
+        add     r10d,ecx
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        ror     r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        ror     r14d,9
+        xor     r13d,r10d
+        xor     r12d,eax
+        ror     r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        xor     r13d,r10d
+        add     ebx,DWORD[56+rsp]
+        mov     r15d,ecx
+        xor     r12d,eax
+        ror     r14d,11
+        xor     r15d,edx
+        add     ebx,r12d
+        ror     r13d,6
+        and     edi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     edi,edx
+        ror     r14d,2
+        add     r9d,ebx
+        add     ebx,edi
+        mov     r13d,r9d
+        add     r14d,ebx
+        ror     r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        ror     r14d,9
+        xor     r13d,r9d
+        xor     r12d,r11d
+        ror     r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        xor     r13d,r9d
+        add     eax,DWORD[60+rsp]
+        mov     edi,ebx
+        xor     r12d,r11d
+        ror     r14d,11
+        xor     edi,ecx
+        add     eax,r12d
+        ror     r13d,6
+        and     r15d,edi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        ror     r14d,2
+        add     r8d,eax
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        mov     rdi,QWORD[((64+0))+rsp]
+        mov     eax,r14d
+
+        add     eax,DWORD[rdi]
+        lea     rsi,[64+rsi]
+        add     ebx,DWORD[4+rdi]
+        add     ecx,DWORD[8+rdi]
+        add     edx,DWORD[12+rdi]
+        add     r8d,DWORD[16+rdi]
+        add     r9d,DWORD[20+rdi]
+        add     r10d,DWORD[24+rdi]
+        add     r11d,DWORD[28+rdi]
+
+        cmp     rsi,QWORD[((64+16))+rsp]
+
+        mov     DWORD[rdi],eax
+        mov     DWORD[4+rdi],ebx
+        mov     DWORD[8+rdi],ecx
+        mov     DWORD[12+rdi],edx
+        mov     DWORD[16+rdi],r8d
+        mov     DWORD[20+rdi],r9d
+        mov     DWORD[24+rdi],r10d
+        mov     DWORD[28+rdi],r11d
+        jb      NEAR $L$loop_ssse3
+
+        mov     rsi,QWORD[88+rsp]
+
+        movaps  xmm6,XMMWORD[((64+32))+rsp]
+        movaps  xmm7,XMMWORD[((64+48))+rsp]
+        movaps  xmm8,XMMWORD[((64+64))+rsp]
+        movaps  xmm9,XMMWORD[((64+80))+rsp]
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue_ssse3:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha256_block_data_order_ssse3:
+
+ALIGN   64
+sha256_block_data_order_avx:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha256_block_data_order_avx:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+$L$avx_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        shl     rdx,4
+        sub     rsp,160
+        lea     rdx,[rdx*4+rsi]
+        and     rsp,-64
+        mov     QWORD[((64+0))+rsp],rdi
+        mov     QWORD[((64+8))+rsp],rsi
+        mov     QWORD[((64+16))+rsp],rdx
+        mov     QWORD[88+rsp],rax
+
+        movaps  XMMWORD[(64+32)+rsp],xmm6
+        movaps  XMMWORD[(64+48)+rsp],xmm7
+        movaps  XMMWORD[(64+64)+rsp],xmm8
+        movaps  XMMWORD[(64+80)+rsp],xmm9
+$L$prologue_avx:
+
+        vzeroupper
+        mov     eax,DWORD[rdi]
+        mov     ebx,DWORD[4+rdi]
+        mov     ecx,DWORD[8+rdi]
+        mov     edx,DWORD[12+rdi]
+        mov     r8d,DWORD[16+rdi]
+        mov     r9d,DWORD[20+rdi]
+        mov     r10d,DWORD[24+rdi]
+        mov     r11d,DWORD[28+rdi]
+        vmovdqa xmm8,XMMWORD[((K256+512+32))]
+        vmovdqa xmm9,XMMWORD[((K256+512+64))]
+        jmp     NEAR $L$loop_avx
+ALIGN   16
+$L$loop_avx:
+        vmovdqa xmm7,XMMWORD[((K256+512))]
+        vmovdqu xmm0,XMMWORD[rsi]
+        vmovdqu xmm1,XMMWORD[16+rsi]
+        vmovdqu xmm2,XMMWORD[32+rsi]
+        vmovdqu xmm3,XMMWORD[48+rsi]
+        vpshufb xmm0,xmm0,xmm7
+        lea     rbp,[K256]
+        vpshufb xmm1,xmm1,xmm7
+        vpshufb xmm2,xmm2,xmm7
+        vpaddd  xmm4,xmm0,XMMWORD[rbp]
+        vpshufb xmm3,xmm3,xmm7
+        vpaddd  xmm5,xmm1,XMMWORD[32+rbp]
+        vpaddd  xmm6,xmm2,XMMWORD[64+rbp]
+        vpaddd  xmm7,xmm3,XMMWORD[96+rbp]
+        vmovdqa XMMWORD[rsp],xmm4
+        mov     r14d,eax
+        vmovdqa XMMWORD[16+rsp],xmm5
+        mov     edi,ebx
+        vmovdqa XMMWORD[32+rsp],xmm6
+        xor     edi,ecx
+        vmovdqa XMMWORD[48+rsp],xmm7
+        mov     r13d,r8d
+        jmp     NEAR $L$avx_00_47
+
+ALIGN   16
+$L$avx_00_47:
+        sub     rbp,-128
+        vpalignr        xmm4,xmm1,xmm0,4
+        shrd    r13d,r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        vpalignr        xmm7,xmm3,xmm2,4
+        shrd    r14d,r14d,9
+        xor     r13d,r8d
+        xor     r12d,r10d
+        vpsrld  xmm6,xmm4,7
+        shrd    r13d,r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        vpaddd  xmm0,xmm0,xmm7
+        xor     r13d,r8d
+        add     r11d,DWORD[rsp]
+        mov     r15d,eax
+        vpsrld  xmm7,xmm4,3
+        xor     r12d,r10d
+        shrd    r14d,r14d,11
+        xor     r15d,ebx
+        vpslld  xmm5,xmm4,14
+        add     r11d,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        vpxor   xmm4,xmm7,xmm6
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     edi,ebx
+        vpshufd xmm7,xmm3,250
+        shrd    r14d,r14d,2
+        add     edx,r11d
+        add     r11d,edi
+        vpsrld  xmm6,xmm6,11
+        mov     r13d,edx
+        add     r14d,r11d
+        shrd    r13d,r13d,14
+        vpxor   xmm4,xmm4,xmm5
+        mov     r11d,r14d
+        mov     r12d,r8d
+        shrd    r14d,r14d,9
+        vpslld  xmm5,xmm5,11
+        xor     r13d,edx
+        xor     r12d,r9d
+        shrd    r13d,r13d,5
+        vpxor   xmm4,xmm4,xmm6
+        xor     r14d,r11d
+        and     r12d,edx
+        xor     r13d,edx
+        vpsrld  xmm6,xmm7,10
+        add     r10d,DWORD[4+rsp]
+        mov     edi,r11d
+        xor     r12d,r9d
+        vpxor   xmm4,xmm4,xmm5
+        shrd    r14d,r14d,11
+        xor     edi,eax
+        add     r10d,r12d
+        vpsrlq  xmm7,xmm7,17
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,r11d
+        vpaddd  xmm0,xmm0,xmm4
+        add     r10d,r13d
+        xor     r15d,eax
+        shrd    r14d,r14d,2
+        vpxor   xmm6,xmm6,xmm7
+        add     ecx,r10d
+        add     r10d,r15d
+        mov     r13d,ecx
+        vpsrlq  xmm7,xmm7,2
+        add     r14d,r10d
+        shrd    r13d,r13d,14
+        mov     r10d,r14d
+        vpxor   xmm6,xmm6,xmm7
+        mov     r12d,edx
+        shrd    r14d,r14d,9
+        xor     r13d,ecx
+        vpshufb xmm6,xmm6,xmm8
+        xor     r12d,r8d
+        shrd    r13d,r13d,5
+        xor     r14d,r10d
+        vpaddd  xmm0,xmm0,xmm6
+        and     r12d,ecx
+        xor     r13d,ecx
+        add     r9d,DWORD[8+rsp]
+        vpshufd xmm7,xmm0,80
+        mov     r15d,r10d
+        xor     r12d,r8d
+        shrd    r14d,r14d,11
+        vpsrld  xmm6,xmm7,10
+        xor     r15d,r11d
+        add     r9d,r12d
+        shrd    r13d,r13d,6
+        vpsrlq  xmm7,xmm7,17
+        and     edi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        vpxor   xmm6,xmm6,xmm7
+        xor     edi,r11d
+        shrd    r14d,r14d,2
+        add     ebx,r9d
+        vpsrlq  xmm7,xmm7,2
+        add     r9d,edi
+        mov     r13d,ebx
+        add     r14d,r9d
+        vpxor   xmm6,xmm6,xmm7
+        shrd    r13d,r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        vpshufb xmm6,xmm6,xmm9
+        shrd    r14d,r14d,9
+        xor     r13d,ebx
+        xor     r12d,edx
+        vpaddd  xmm0,xmm0,xmm6
+        shrd    r13d,r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vpaddd  xmm6,xmm0,XMMWORD[rbp]
+        xor     r13d,ebx
+        add     r8d,DWORD[12+rsp]
+        mov     edi,r9d
+        xor     r12d,edx
+        shrd    r14d,r14d,11
+        xor     edi,r10d
+        add     r8d,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        shrd    r14d,r14d,2
+        add     eax,r8d
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        vmovdqa XMMWORD[rsp],xmm6
+        vpalignr        xmm4,xmm2,xmm1,4
+        shrd    r13d,r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        vpalignr        xmm7,xmm0,xmm3,4
+        shrd    r14d,r14d,9
+        xor     r13d,eax
+        xor     r12d,ecx
+        vpsrld  xmm6,xmm4,7
+        shrd    r13d,r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        vpaddd  xmm1,xmm1,xmm7
+        xor     r13d,eax
+        add     edx,DWORD[16+rsp]
+        mov     r15d,r8d
+        vpsrld  xmm7,xmm4,3
+        xor     r12d,ecx
+        shrd    r14d,r14d,11
+        xor     r15d,r9d
+        vpslld  xmm5,xmm4,14
+        add     edx,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        vpxor   xmm4,xmm7,xmm6
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     edi,r9d
+        vpshufd xmm7,xmm0,250
+        shrd    r14d,r14d,2
+        add     r11d,edx
+        add     edx,edi
+        vpsrld  xmm6,xmm6,11
+        mov     r13d,r11d
+        add     r14d,edx
+        shrd    r13d,r13d,14
+        vpxor   xmm4,xmm4,xmm5
+        mov     edx,r14d
+        mov     r12d,eax
+        shrd    r14d,r14d,9
+        vpslld  xmm5,xmm5,11
+        xor     r13d,r11d
+        xor     r12d,ebx
+        shrd    r13d,r13d,5
+        vpxor   xmm4,xmm4,xmm6
+        xor     r14d,edx
+        and     r12d,r11d
+        xor     r13d,r11d
+        vpsrld  xmm6,xmm7,10
+        add     ecx,DWORD[20+rsp]
+        mov     edi,edx
+        xor     r12d,ebx
+        vpxor   xmm4,xmm4,xmm5
+        shrd    r14d,r14d,11
+        xor     edi,r8d
+        add     ecx,r12d
+        vpsrlq  xmm7,xmm7,17
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,edx
+        vpaddd  xmm1,xmm1,xmm4
+        add     ecx,r13d
+        xor     r15d,r8d
+        shrd    r14d,r14d,2
+        vpxor   xmm6,xmm6,xmm7
+        add     r10d,ecx
+        add     ecx,r15d
+        mov     r13d,r10d
+        vpsrlq  xmm7,xmm7,2
+        add     r14d,ecx
+        shrd    r13d,r13d,14
+        mov     ecx,r14d
+        vpxor   xmm6,xmm6,xmm7
+        mov     r12d,r11d
+        shrd    r14d,r14d,9
+        xor     r13d,r10d
+        vpshufb xmm6,xmm6,xmm8
+        xor     r12d,eax
+        shrd    r13d,r13d,5
+        xor     r14d,ecx
+        vpaddd  xmm1,xmm1,xmm6
+        and     r12d,r10d
+        xor     r13d,r10d
+        add     ebx,DWORD[24+rsp]
+        vpshufd xmm7,xmm1,80
+        mov     r15d,ecx
+        xor     r12d,eax
+        shrd    r14d,r14d,11
+        vpsrld  xmm6,xmm7,10
+        xor     r15d,edx
+        add     ebx,r12d
+        shrd    r13d,r13d,6
+        vpsrlq  xmm7,xmm7,17
+        and     edi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        vpxor   xmm6,xmm6,xmm7
+        xor     edi,edx
+        shrd    r14d,r14d,2
+        add     r9d,ebx
+        vpsrlq  xmm7,xmm7,2
+        add     ebx,edi
+        mov     r13d,r9d
+        add     r14d,ebx
+        vpxor   xmm6,xmm6,xmm7
+        shrd    r13d,r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        vpshufb xmm6,xmm6,xmm9
+        shrd    r14d,r14d,9
+        xor     r13d,r9d
+        xor     r12d,r11d
+        vpaddd  xmm1,xmm1,xmm6
+        shrd    r13d,r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vpaddd  xmm6,xmm1,XMMWORD[32+rbp]
+        xor     r13d,r9d
+        add     eax,DWORD[28+rsp]
+        mov     edi,ebx
+        xor     r12d,r11d
+        shrd    r14d,r14d,11
+        xor     edi,ecx
+        add     eax,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        shrd    r14d,r14d,2
+        add     r8d,eax
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        vmovdqa XMMWORD[16+rsp],xmm6
+        vpalignr        xmm4,xmm3,xmm2,4
+        shrd    r13d,r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        vpalignr        xmm7,xmm1,xmm0,4
+        shrd    r14d,r14d,9
+        xor     r13d,r8d
+        xor     r12d,r10d
+        vpsrld  xmm6,xmm4,7
+        shrd    r13d,r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        vpaddd  xmm2,xmm2,xmm7
+        xor     r13d,r8d
+        add     r11d,DWORD[32+rsp]
+        mov     r15d,eax
+        vpsrld  xmm7,xmm4,3
+        xor     r12d,r10d
+        shrd    r14d,r14d,11
+        xor     r15d,ebx
+        vpslld  xmm5,xmm4,14
+        add     r11d,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        vpxor   xmm4,xmm7,xmm6
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     edi,ebx
+        vpshufd xmm7,xmm1,250
+        shrd    r14d,r14d,2
+        add     edx,r11d
+        add     r11d,edi
+        vpsrld  xmm6,xmm6,11
+        mov     r13d,edx
+        add     r14d,r11d
+        shrd    r13d,r13d,14
+        vpxor   xmm4,xmm4,xmm5
+        mov     r11d,r14d
+        mov     r12d,r8d
+        shrd    r14d,r14d,9
+        vpslld  xmm5,xmm5,11
+        xor     r13d,edx
+        xor     r12d,r9d
+        shrd    r13d,r13d,5
+        vpxor   xmm4,xmm4,xmm6
+        xor     r14d,r11d
+        and     r12d,edx
+        xor     r13d,edx
+        vpsrld  xmm6,xmm7,10
+        add     r10d,DWORD[36+rsp]
+        mov     edi,r11d
+        xor     r12d,r9d
+        vpxor   xmm4,xmm4,xmm5
+        shrd    r14d,r14d,11
+        xor     edi,eax
+        add     r10d,r12d
+        vpsrlq  xmm7,xmm7,17
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,r11d
+        vpaddd  xmm2,xmm2,xmm4
+        add     r10d,r13d
+        xor     r15d,eax
+        shrd    r14d,r14d,2
+        vpxor   xmm6,xmm6,xmm7
+        add     ecx,r10d
+        add     r10d,r15d
+        mov     r13d,ecx
+        vpsrlq  xmm7,xmm7,2
+        add     r14d,r10d
+        shrd    r13d,r13d,14
+        mov     r10d,r14d
+        vpxor   xmm6,xmm6,xmm7
+        mov     r12d,edx
+        shrd    r14d,r14d,9
+        xor     r13d,ecx
+        vpshufb xmm6,xmm6,xmm8
+        xor     r12d,r8d
+        shrd    r13d,r13d,5
+        xor     r14d,r10d
+        vpaddd  xmm2,xmm2,xmm6
+        and     r12d,ecx
+        xor     r13d,ecx
+        add     r9d,DWORD[40+rsp]
+        vpshufd xmm7,xmm2,80
+        mov     r15d,r10d
+        xor     r12d,r8d
+        shrd    r14d,r14d,11
+        vpsrld  xmm6,xmm7,10
+        xor     r15d,r11d
+        add     r9d,r12d
+        shrd    r13d,r13d,6
+        vpsrlq  xmm7,xmm7,17
+        and     edi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        vpxor   xmm6,xmm6,xmm7
+        xor     edi,r11d
+        shrd    r14d,r14d,2
+        add     ebx,r9d
+        vpsrlq  xmm7,xmm7,2
+        add     r9d,edi
+        mov     r13d,ebx
+        add     r14d,r9d
+        vpxor   xmm6,xmm6,xmm7
+        shrd    r13d,r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        vpshufb xmm6,xmm6,xmm9
+        shrd    r14d,r14d,9
+        xor     r13d,ebx
+        xor     r12d,edx
+        vpaddd  xmm2,xmm2,xmm6
+        shrd    r13d,r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        vpaddd  xmm6,xmm2,XMMWORD[64+rbp]
+        xor     r13d,ebx
+        add     r8d,DWORD[44+rsp]
+        mov     edi,r9d
+        xor     r12d,edx
+        shrd    r14d,r14d,11
+        xor     edi,r10d
+        add     r8d,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        shrd    r14d,r14d,2
+        add     eax,r8d
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        vmovdqa XMMWORD[32+rsp],xmm6
+        vpalignr        xmm4,xmm0,xmm3,4
+        shrd    r13d,r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        vpalignr        xmm7,xmm2,xmm1,4
+        shrd    r14d,r14d,9
+        xor     r13d,eax
+        xor     r12d,ecx
+        vpsrld  xmm6,xmm4,7
+        shrd    r13d,r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        vpaddd  xmm3,xmm3,xmm7
+        xor     r13d,eax
+        add     edx,DWORD[48+rsp]
+        mov     r15d,r8d
+        vpsrld  xmm7,xmm4,3
+        xor     r12d,ecx
+        shrd    r14d,r14d,11
+        xor     r15d,r9d
+        vpslld  xmm5,xmm4,14
+        add     edx,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        vpxor   xmm4,xmm7,xmm6
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     edi,r9d
+        vpshufd xmm7,xmm2,250
+        shrd    r14d,r14d,2
+        add     r11d,edx
+        add     edx,edi
+        vpsrld  xmm6,xmm6,11
+        mov     r13d,r11d
+        add     r14d,edx
+        shrd    r13d,r13d,14
+        vpxor   xmm4,xmm4,xmm5
+        mov     edx,r14d
+        mov     r12d,eax
+        shrd    r14d,r14d,9
+        vpslld  xmm5,xmm5,11
+        xor     r13d,r11d
+        xor     r12d,ebx
+        shrd    r13d,r13d,5
+        vpxor   xmm4,xmm4,xmm6
+        xor     r14d,edx
+        and     r12d,r11d
+        xor     r13d,r11d
+        vpsrld  xmm6,xmm7,10
+        add     ecx,DWORD[52+rsp]
+        mov     edi,edx
+        xor     r12d,ebx
+        vpxor   xmm4,xmm4,xmm5
+        shrd    r14d,r14d,11
+        xor     edi,r8d
+        add     ecx,r12d
+        vpsrlq  xmm7,xmm7,17
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,edx
+        vpaddd  xmm3,xmm3,xmm4
+        add     ecx,r13d
+        xor     r15d,r8d
+        shrd    r14d,r14d,2
+        vpxor   xmm6,xmm6,xmm7
+        add     r10d,ecx
+        add     ecx,r15d
+        mov     r13d,r10d
+        vpsrlq  xmm7,xmm7,2
+        add     r14d,ecx
+        shrd    r13d,r13d,14
+        mov     ecx,r14d
+        vpxor   xmm6,xmm6,xmm7
+        mov     r12d,r11d
+        shrd    r14d,r14d,9
+        xor     r13d,r10d
+        vpshufb xmm6,xmm6,xmm8
+        xor     r12d,eax
+        shrd    r13d,r13d,5
+        xor     r14d,ecx
+        vpaddd  xmm3,xmm3,xmm6
+        and     r12d,r10d
+        xor     r13d,r10d
+        add     ebx,DWORD[56+rsp]
+        vpshufd xmm7,xmm3,80
+        mov     r15d,ecx
+        xor     r12d,eax
+        shrd    r14d,r14d,11
+        vpsrld  xmm6,xmm7,10
+        xor     r15d,edx
+        add     ebx,r12d
+        shrd    r13d,r13d,6
+        vpsrlq  xmm7,xmm7,17
+        and     edi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        vpxor   xmm6,xmm6,xmm7
+        xor     edi,edx
+        shrd    r14d,r14d,2
+        add     r9d,ebx
+        vpsrlq  xmm7,xmm7,2
+        add     ebx,edi
+        mov     r13d,r9d
+        add     r14d,ebx
+        vpxor   xmm6,xmm6,xmm7
+        shrd    r13d,r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        vpshufb xmm6,xmm6,xmm9
+        shrd    r14d,r14d,9
+        xor     r13d,r9d
+        xor     r12d,r11d
+        vpaddd  xmm3,xmm3,xmm6
+        shrd    r13d,r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        vpaddd  xmm6,xmm3,XMMWORD[96+rbp]
+        xor     r13d,r9d
+        add     eax,DWORD[60+rsp]
+        mov     edi,ebx
+        xor     r12d,r11d
+        shrd    r14d,r14d,11
+        xor     edi,ecx
+        add     eax,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        shrd    r14d,r14d,2
+        add     r8d,eax
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        vmovdqa XMMWORD[48+rsp],xmm6
+        cmp     BYTE[131+rbp],0
+        jne     NEAR $L$avx_00_47
+        shrd    r13d,r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        shrd    r14d,r14d,9
+        xor     r13d,r8d
+        xor     r12d,r10d
+        shrd    r13d,r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        xor     r13d,r8d
+        add     r11d,DWORD[rsp]
+        mov     r15d,eax
+        xor     r12d,r10d
+        shrd    r14d,r14d,11
+        xor     r15d,ebx
+        add     r11d,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     edi,ebx
+        shrd    r14d,r14d,2
+        add     edx,r11d
+        add     r11d,edi
+        mov     r13d,edx
+        add     r14d,r11d
+        shrd    r13d,r13d,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        shrd    r14d,r14d,9
+        xor     r13d,edx
+        xor     r12d,r9d
+        shrd    r13d,r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        xor     r13d,edx
+        add     r10d,DWORD[4+rsp]
+        mov     edi,r11d
+        xor     r12d,r9d
+        shrd    r14d,r14d,11
+        xor     edi,eax
+        add     r10d,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,r11d
+        add     r10d,r13d
+        xor     r15d,eax
+        shrd    r14d,r14d,2
+        add     ecx,r10d
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        shrd    r13d,r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        shrd    r14d,r14d,9
+        xor     r13d,ecx
+        xor     r12d,r8d
+        shrd    r13d,r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        xor     r13d,ecx
+        add     r9d,DWORD[8+rsp]
+        mov     r15d,r10d
+        xor     r12d,r8d
+        shrd    r14d,r14d,11
+        xor     r15d,r11d
+        add     r9d,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     edi,r11d
+        shrd    r14d,r14d,2
+        add     ebx,r9d
+        add     r9d,edi
+        mov     r13d,ebx
+        add     r14d,r9d
+        shrd    r13d,r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        shrd    r14d,r14d,9
+        xor     r13d,ebx
+        xor     r12d,edx
+        shrd    r13d,r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        xor     r13d,ebx
+        add     r8d,DWORD[12+rsp]
+        mov     edi,r9d
+        xor     r12d,edx
+        shrd    r14d,r14d,11
+        xor     edi,r10d
+        add     r8d,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        shrd    r14d,r14d,2
+        add     eax,r8d
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        shrd    r13d,r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        shrd    r14d,r14d,9
+        xor     r13d,eax
+        xor     r12d,ecx
+        shrd    r13d,r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        xor     r13d,eax
+        add     edx,DWORD[16+rsp]
+        mov     r15d,r8d
+        xor     r12d,ecx
+        shrd    r14d,r14d,11
+        xor     r15d,r9d
+        add     edx,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     edi,r9d
+        shrd    r14d,r14d,2
+        add     r11d,edx
+        add     edx,edi
+        mov     r13d,r11d
+        add     r14d,edx
+        shrd    r13d,r13d,14
+        mov     edx,r14d
+        mov     r12d,eax
+        shrd    r14d,r14d,9
+        xor     r13d,r11d
+        xor     r12d,ebx
+        shrd    r13d,r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        xor     r13d,r11d
+        add     ecx,DWORD[20+rsp]
+        mov     edi,edx
+        xor     r12d,ebx
+        shrd    r14d,r14d,11
+        xor     edi,r8d
+        add     ecx,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,edx
+        add     ecx,r13d
+        xor     r15d,r8d
+        shrd    r14d,r14d,2
+        add     r10d,ecx
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        shrd    r13d,r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        shrd    r14d,r14d,9
+        xor     r13d,r10d
+        xor     r12d,eax
+        shrd    r13d,r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        xor     r13d,r10d
+        add     ebx,DWORD[24+rsp]
+        mov     r15d,ecx
+        xor     r12d,eax
+        shrd    r14d,r14d,11
+        xor     r15d,edx
+        add     ebx,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     edi,edx
+        shrd    r14d,r14d,2
+        add     r9d,ebx
+        add     ebx,edi
+        mov     r13d,r9d
+        add     r14d,ebx
+        shrd    r13d,r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        shrd    r14d,r14d,9
+        xor     r13d,r9d
+        xor     r12d,r11d
+        shrd    r13d,r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        xor     r13d,r9d
+        add     eax,DWORD[28+rsp]
+        mov     edi,ebx
+        xor     r12d,r11d
+        shrd    r14d,r14d,11
+        xor     edi,ecx
+        add     eax,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        shrd    r14d,r14d,2
+        add     r8d,eax
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        shrd    r13d,r13d,14
+        mov     eax,r14d
+        mov     r12d,r9d
+        shrd    r14d,r14d,9
+        xor     r13d,r8d
+        xor     r12d,r10d
+        shrd    r13d,r13d,5
+        xor     r14d,eax
+        and     r12d,r8d
+        xor     r13d,r8d
+        add     r11d,DWORD[32+rsp]
+        mov     r15d,eax
+        xor     r12d,r10d
+        shrd    r14d,r14d,11
+        xor     r15d,ebx
+        add     r11d,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        xor     r14d,eax
+        add     r11d,r13d
+        xor     edi,ebx
+        shrd    r14d,r14d,2
+        add     edx,r11d
+        add     r11d,edi
+        mov     r13d,edx
+        add     r14d,r11d
+        shrd    r13d,r13d,14
+        mov     r11d,r14d
+        mov     r12d,r8d
+        shrd    r14d,r14d,9
+        xor     r13d,edx
+        xor     r12d,r9d
+        shrd    r13d,r13d,5
+        xor     r14d,r11d
+        and     r12d,edx
+        xor     r13d,edx
+        add     r10d,DWORD[36+rsp]
+        mov     edi,r11d
+        xor     r12d,r9d
+        shrd    r14d,r14d,11
+        xor     edi,eax
+        add     r10d,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,r11d
+        add     r10d,r13d
+        xor     r15d,eax
+        shrd    r14d,r14d,2
+        add     ecx,r10d
+        add     r10d,r15d
+        mov     r13d,ecx
+        add     r14d,r10d
+        shrd    r13d,r13d,14
+        mov     r10d,r14d
+        mov     r12d,edx
+        shrd    r14d,r14d,9
+        xor     r13d,ecx
+        xor     r12d,r8d
+        shrd    r13d,r13d,5
+        xor     r14d,r10d
+        and     r12d,ecx
+        xor     r13d,ecx
+        add     r9d,DWORD[40+rsp]
+        mov     r15d,r10d
+        xor     r12d,r8d
+        shrd    r14d,r14d,11
+        xor     r15d,r11d
+        add     r9d,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        xor     r14d,r10d
+        add     r9d,r13d
+        xor     edi,r11d
+        shrd    r14d,r14d,2
+        add     ebx,r9d
+        add     r9d,edi
+        mov     r13d,ebx
+        add     r14d,r9d
+        shrd    r13d,r13d,14
+        mov     r9d,r14d
+        mov     r12d,ecx
+        shrd    r14d,r14d,9
+        xor     r13d,ebx
+        xor     r12d,edx
+        shrd    r13d,r13d,5
+        xor     r14d,r9d
+        and     r12d,ebx
+        xor     r13d,ebx
+        add     r8d,DWORD[44+rsp]
+        mov     edi,r9d
+        xor     r12d,edx
+        shrd    r14d,r14d,11
+        xor     edi,r10d
+        add     r8d,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,r9d
+        add     r8d,r13d
+        xor     r15d,r10d
+        shrd    r14d,r14d,2
+        add     eax,r8d
+        add     r8d,r15d
+        mov     r13d,eax
+        add     r14d,r8d
+        shrd    r13d,r13d,14
+        mov     r8d,r14d
+        mov     r12d,ebx
+        shrd    r14d,r14d,9
+        xor     r13d,eax
+        xor     r12d,ecx
+        shrd    r13d,r13d,5
+        xor     r14d,r8d
+        and     r12d,eax
+        xor     r13d,eax
+        add     edx,DWORD[48+rsp]
+        mov     r15d,r8d
+        xor     r12d,ecx
+        shrd    r14d,r14d,11
+        xor     r15d,r9d
+        add     edx,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        xor     r14d,r8d
+        add     edx,r13d
+        xor     edi,r9d
+        shrd    r14d,r14d,2
+        add     r11d,edx
+        add     edx,edi
+        mov     r13d,r11d
+        add     r14d,edx
+        shrd    r13d,r13d,14
+        mov     edx,r14d
+        mov     r12d,eax
+        shrd    r14d,r14d,9
+        xor     r13d,r11d
+        xor     r12d,ebx
+        shrd    r13d,r13d,5
+        xor     r14d,edx
+        and     r12d,r11d
+        xor     r13d,r11d
+        add     ecx,DWORD[52+rsp]
+        mov     edi,edx
+        xor     r12d,ebx
+        shrd    r14d,r14d,11
+        xor     edi,r8d
+        add     ecx,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,edx
+        add     ecx,r13d
+        xor     r15d,r8d
+        shrd    r14d,r14d,2
+        add     r10d,ecx
+        add     ecx,r15d
+        mov     r13d,r10d
+        add     r14d,ecx
+        shrd    r13d,r13d,14
+        mov     ecx,r14d
+        mov     r12d,r11d
+        shrd    r14d,r14d,9
+        xor     r13d,r10d
+        xor     r12d,eax
+        shrd    r13d,r13d,5
+        xor     r14d,ecx
+        and     r12d,r10d
+        xor     r13d,r10d
+        add     ebx,DWORD[56+rsp]
+        mov     r15d,ecx
+        xor     r12d,eax
+        shrd    r14d,r14d,11
+        xor     r15d,edx
+        add     ebx,r12d
+        shrd    r13d,r13d,6
+        and     edi,r15d
+        xor     r14d,ecx
+        add     ebx,r13d
+        xor     edi,edx
+        shrd    r14d,r14d,2
+        add     r9d,ebx
+        add     ebx,edi
+        mov     r13d,r9d
+        add     r14d,ebx
+        shrd    r13d,r13d,14
+        mov     ebx,r14d
+        mov     r12d,r10d
+        shrd    r14d,r14d,9
+        xor     r13d,r9d
+        xor     r12d,r11d
+        shrd    r13d,r13d,5
+        xor     r14d,ebx
+        and     r12d,r9d
+        xor     r13d,r9d
+        add     eax,DWORD[60+rsp]
+        mov     edi,ebx
+        xor     r12d,r11d
+        shrd    r14d,r14d,11
+        xor     edi,ecx
+        add     eax,r12d
+        shrd    r13d,r13d,6
+        and     r15d,edi
+        xor     r14d,ebx
+        add     eax,r13d
+        xor     r15d,ecx
+        shrd    r14d,r14d,2
+        add     r8d,eax
+        add     eax,r15d
+        mov     r13d,r8d
+        add     r14d,eax
+        mov     rdi,QWORD[((64+0))+rsp]
+        mov     eax,r14d
+
+        add     eax,DWORD[rdi]
+        lea     rsi,[64+rsi]
+        add     ebx,DWORD[4+rdi]
+        add     ecx,DWORD[8+rdi]
+        add     edx,DWORD[12+rdi]
+        add     r8d,DWORD[16+rdi]
+        add     r9d,DWORD[20+rdi]
+        add     r10d,DWORD[24+rdi]
+        add     r11d,DWORD[28+rdi]
+
+        cmp     rsi,QWORD[((64+16))+rsp]
+
+        mov     DWORD[rdi],eax
+        mov     DWORD[4+rdi],ebx
+        mov     DWORD[8+rdi],ecx
+        mov     DWORD[12+rdi],edx
+        mov     DWORD[16+rdi],r8d
+        mov     DWORD[20+rdi],r9d
+        mov     DWORD[24+rdi],r10d
+        mov     DWORD[28+rdi],r11d
+        jb      NEAR $L$loop_avx
+
+        mov     rsi,QWORD[88+rsp]
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((64+32))+rsp]
+        movaps  xmm7,XMMWORD[((64+48))+rsp]
+        movaps  xmm8,XMMWORD[((64+64))+rsp]
+        movaps  xmm9,XMMWORD[((64+80))+rsp]
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue_avx:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha256_block_data_order_avx:
+
+ALIGN   64
+sha256_block_data_order_avx2:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha256_block_data_order_avx2:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+$L$avx2_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        sub     rsp,608
+        shl     rdx,4
+        and     rsp,-256*4
+        lea     rdx,[rdx*4+rsi]
+        add     rsp,448
+        mov     QWORD[((64+0))+rsp],rdi
+        mov     QWORD[((64+8))+rsp],rsi
+        mov     QWORD[((64+16))+rsp],rdx
+        mov     QWORD[88+rsp],rax
+
+        movaps  XMMWORD[(64+32)+rsp],xmm6
+        movaps  XMMWORD[(64+48)+rsp],xmm7
+        movaps  XMMWORD[(64+64)+rsp],xmm8
+        movaps  XMMWORD[(64+80)+rsp],xmm9
+$L$prologue_avx2:
+
+        vzeroupper
+        sub     rsi,-16*4
+        mov     eax,DWORD[rdi]
+        mov     r12,rsi
+        mov     ebx,DWORD[4+rdi]
+        cmp     rsi,rdx
+        mov     ecx,DWORD[8+rdi]
+        cmove   r12,rsp
+        mov     edx,DWORD[12+rdi]
+        mov     r8d,DWORD[16+rdi]
+        mov     r9d,DWORD[20+rdi]
+        mov     r10d,DWORD[24+rdi]
+        mov     r11d,DWORD[28+rdi]
+        vmovdqa ymm8,YMMWORD[((K256+512+32))]
+        vmovdqa ymm9,YMMWORD[((K256+512+64))]
+        jmp     NEAR $L$oop_avx2
+ALIGN   16
+$L$oop_avx2:
+        vmovdqa ymm7,YMMWORD[((K256+512))]
+        vmovdqu xmm0,XMMWORD[((-64+0))+rsi]
+        vmovdqu xmm1,XMMWORD[((-64+16))+rsi]
+        vmovdqu xmm2,XMMWORD[((-64+32))+rsi]
+        vmovdqu xmm3,XMMWORD[((-64+48))+rsi]
+
+        vinserti128     ymm0,ymm0,XMMWORD[r12],1
+        vinserti128     ymm1,ymm1,XMMWORD[16+r12],1
+        vpshufb ymm0,ymm0,ymm7
+        vinserti128     ymm2,ymm2,XMMWORD[32+r12],1
+        vpshufb ymm1,ymm1,ymm7
+        vinserti128     ymm3,ymm3,XMMWORD[48+r12],1
+
+        lea     rbp,[K256]
+        vpshufb ymm2,ymm2,ymm7
+        vpaddd  ymm4,ymm0,YMMWORD[rbp]
+        vpshufb ymm3,ymm3,ymm7
+        vpaddd  ymm5,ymm1,YMMWORD[32+rbp]
+        vpaddd  ymm6,ymm2,YMMWORD[64+rbp]
+        vpaddd  ymm7,ymm3,YMMWORD[96+rbp]
+        vmovdqa YMMWORD[rsp],ymm4
+        xor     r14d,r14d
+        vmovdqa YMMWORD[32+rsp],ymm5
+        lea     rsp,[((-64))+rsp]
+        mov     edi,ebx
+        vmovdqa YMMWORD[rsp],ymm6
+        xor     edi,ecx
+        vmovdqa YMMWORD[32+rsp],ymm7
+        mov     r12d,r9d
+        sub     rbp,-16*2*4
+        jmp     NEAR $L$avx2_00_47
+
+ALIGN   16
+$L$avx2_00_47:
+        lea     rsp,[((-64))+rsp]
+        vpalignr        ymm4,ymm1,ymm0,4
+        add     r11d,DWORD[((0+128))+rsp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        vpalignr        ymm7,ymm3,ymm2,4
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        vpsrld  ymm6,ymm4,7
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        vpaddd  ymm0,ymm0,ymm7
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        vpsrld  ymm7,ymm4,3
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        vpslld  ymm5,ymm4,14
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        vpxor   ymm4,ymm7,ymm6
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,ebx
+        vpshufd ymm7,ymm3,250
+        xor     r14d,r13d
+        lea     r11d,[rdi*1+r11]
+        mov     r12d,r8d
+        vpsrld  ymm6,ymm6,11
+        add     r10d,DWORD[((4+128))+rsp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        vpxor   ymm4,ymm4,ymm5
+        rorx    edi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        vpslld  ymm5,ymm5,11
+        andn    r12d,edx,r9d
+        xor     r13d,edi
+        rorx    r14d,edx,6
+        vpxor   ymm4,ymm4,ymm6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     edi,r11d
+        vpsrld  ymm6,ymm7,10
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     edi,eax
+        vpxor   ymm4,ymm4,ymm5
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        vpsrlq  ymm7,ymm7,17
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,eax
+        vpaddd  ymm0,ymm0,ymm4
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        vpxor   ymm6,ymm6,ymm7
+        add     r9d,DWORD[((8+128))+rsp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        vpshufb ymm6,ymm6,ymm8
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        vpaddd  ymm0,ymm0,ymm6
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        vpshufd ymm7,ymm0,80
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        vpsrld  ymm6,ymm7,10
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r11d
+        vpsrlq  ymm7,ymm7,17
+        xor     r14d,r13d
+        lea     r9d,[rdi*1+r9]
+        mov     r12d,ecx
+        vpxor   ymm6,ymm6,ymm7
+        add     r8d,DWORD[((12+128))+rsp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    edi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,ebx,edx
+        xor     r13d,edi
+        rorx    r14d,ebx,6
+        vpshufb ymm6,ymm6,ymm9
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     edi,r9d
+        vpaddd  ymm0,ymm0,ymm6
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     edi,r10d
+        vpaddd  ymm6,ymm0,YMMWORD[rbp]
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r10d
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        vmovdqa YMMWORD[rsp],ymm6
+        vpalignr        ymm4,ymm2,ymm1,4
+        add     edx,DWORD[((32+128))+rsp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        vpalignr        ymm7,ymm0,ymm3,4
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        vpsrld  ymm6,ymm4,7
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        vpaddd  ymm1,ymm1,ymm7
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        vpsrld  ymm7,ymm4,3
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        vpslld  ymm5,ymm4,14
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        vpxor   ymm4,ymm7,ymm6
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r9d
+        vpshufd ymm7,ymm0,250
+        xor     r14d,r13d
+        lea     edx,[rdi*1+rdx]
+        mov     r12d,eax
+        vpsrld  ymm6,ymm6,11
+        add     ecx,DWORD[((36+128))+rsp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        vpxor   ymm4,ymm4,ymm5
+        rorx    edi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        vpslld  ymm5,ymm5,11
+        andn    r12d,r11d,ebx
+        xor     r13d,edi
+        rorx    r14d,r11d,6
+        vpxor   ymm4,ymm4,ymm6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     edi,edx
+        vpsrld  ymm6,ymm7,10
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     edi,r8d
+        vpxor   ymm4,ymm4,ymm5
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        vpsrlq  ymm7,ymm7,17
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r8d
+        vpaddd  ymm1,ymm1,ymm4
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        vpxor   ymm6,ymm6,ymm7
+        add     ebx,DWORD[((40+128))+rsp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        vpshufb ymm6,ymm6,ymm8
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        vpaddd  ymm1,ymm1,ymm6
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        vpshufd ymm7,ymm1,80
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        vpsrld  ymm6,ymm7,10
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,edx
+        vpsrlq  ymm7,ymm7,17
+        xor     r14d,r13d
+        lea     ebx,[rdi*1+rbx]
+        mov     r12d,r10d
+        vpxor   ymm6,ymm6,ymm7
+        add     eax,DWORD[((44+128))+rsp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    edi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,r9d,r11d
+        xor     r13d,edi
+        rorx    r14d,r9d,6
+        vpshufb ymm6,ymm6,ymm9
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     edi,ebx
+        vpaddd  ymm1,ymm1,ymm6
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     edi,ecx
+        vpaddd  ymm6,ymm1,YMMWORD[32+rbp]
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,ecx
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        vmovdqa YMMWORD[32+rsp],ymm6
+        lea     rsp,[((-64))+rsp]
+        vpalignr        ymm4,ymm3,ymm2,4
+        add     r11d,DWORD[((0+128))+rsp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        vpalignr        ymm7,ymm1,ymm0,4
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        vpsrld  ymm6,ymm4,7
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        vpaddd  ymm2,ymm2,ymm7
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        vpsrld  ymm7,ymm4,3
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        vpslld  ymm5,ymm4,14
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        vpxor   ymm4,ymm7,ymm6
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,ebx
+        vpshufd ymm7,ymm1,250
+        xor     r14d,r13d
+        lea     r11d,[rdi*1+r11]
+        mov     r12d,r8d
+        vpsrld  ymm6,ymm6,11
+        add     r10d,DWORD[((4+128))+rsp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        vpxor   ymm4,ymm4,ymm5
+        rorx    edi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        vpslld  ymm5,ymm5,11
+        andn    r12d,edx,r9d
+        xor     r13d,edi
+        rorx    r14d,edx,6
+        vpxor   ymm4,ymm4,ymm6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     edi,r11d
+        vpsrld  ymm6,ymm7,10
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     edi,eax
+        vpxor   ymm4,ymm4,ymm5
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        vpsrlq  ymm7,ymm7,17
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,eax
+        vpaddd  ymm2,ymm2,ymm4
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        vpxor   ymm6,ymm6,ymm7
+        add     r9d,DWORD[((8+128))+rsp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        vpshufb ymm6,ymm6,ymm8
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        vpaddd  ymm2,ymm2,ymm6
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        vpshufd ymm7,ymm2,80
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        vpsrld  ymm6,ymm7,10
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r11d
+        vpsrlq  ymm7,ymm7,17
+        xor     r14d,r13d
+        lea     r9d,[rdi*1+r9]
+        mov     r12d,ecx
+        vpxor   ymm6,ymm6,ymm7
+        add     r8d,DWORD[((12+128))+rsp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    edi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,ebx,edx
+        xor     r13d,edi
+        rorx    r14d,ebx,6
+        vpshufb ymm6,ymm6,ymm9
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     edi,r9d
+        vpaddd  ymm2,ymm2,ymm6
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     edi,r10d
+        vpaddd  ymm6,ymm2,YMMWORD[64+rbp]
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r10d
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        vmovdqa YMMWORD[rsp],ymm6
+        vpalignr        ymm4,ymm0,ymm3,4
+        add     edx,DWORD[((32+128))+rsp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        vpalignr        ymm7,ymm2,ymm1,4
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        vpsrld  ymm6,ymm4,7
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        vpaddd  ymm3,ymm3,ymm7
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        vpsrld  ymm7,ymm4,3
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        vpslld  ymm5,ymm4,14
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        vpxor   ymm4,ymm7,ymm6
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r9d
+        vpshufd ymm7,ymm2,250
+        xor     r14d,r13d
+        lea     edx,[rdi*1+rdx]
+        mov     r12d,eax
+        vpsrld  ymm6,ymm6,11
+        add     ecx,DWORD[((36+128))+rsp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        vpxor   ymm4,ymm4,ymm5
+        rorx    edi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        vpslld  ymm5,ymm5,11
+        andn    r12d,r11d,ebx
+        xor     r13d,edi
+        rorx    r14d,r11d,6
+        vpxor   ymm4,ymm4,ymm6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     edi,edx
+        vpsrld  ymm6,ymm7,10
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     edi,r8d
+        vpxor   ymm4,ymm4,ymm5
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        vpsrlq  ymm7,ymm7,17
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r8d
+        vpaddd  ymm3,ymm3,ymm4
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        vpxor   ymm6,ymm6,ymm7
+        add     ebx,DWORD[((40+128))+rsp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        vpshufb ymm6,ymm6,ymm8
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        vpaddd  ymm3,ymm3,ymm6
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        vpshufd ymm7,ymm3,80
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        vpsrld  ymm6,ymm7,10
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,edx
+        vpsrlq  ymm7,ymm7,17
+        xor     r14d,r13d
+        lea     ebx,[rdi*1+rbx]
+        mov     r12d,r10d
+        vpxor   ymm6,ymm6,ymm7
+        add     eax,DWORD[((44+128))+rsp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        vpsrlq  ymm7,ymm7,2
+        rorx    edi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        vpxor   ymm6,ymm6,ymm7
+        andn    r12d,r9d,r11d
+        xor     r13d,edi
+        rorx    r14d,r9d,6
+        vpshufb ymm6,ymm6,ymm9
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     edi,ebx
+        vpaddd  ymm3,ymm3,ymm6
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     edi,ecx
+        vpaddd  ymm6,ymm3,YMMWORD[96+rbp]
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,ecx
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        vmovdqa YMMWORD[32+rsp],ymm6
+        lea     rbp,[128+rbp]
+        cmp     BYTE[3+rbp],0
+        jne     NEAR $L$avx2_00_47
+        add     r11d,DWORD[((0+64))+rsp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,ebx
+        xor     r14d,r13d
+        lea     r11d,[rdi*1+r11]
+        mov     r12d,r8d
+        add     r10d,DWORD[((4+64))+rsp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        rorx    edi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        andn    r12d,edx,r9d
+        xor     r13d,edi
+        rorx    r14d,edx,6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     edi,r11d
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     edi,eax
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,eax
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        add     r9d,DWORD[((8+64))+rsp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r11d
+        xor     r14d,r13d
+        lea     r9d,[rdi*1+r9]
+        mov     r12d,ecx
+        add     r8d,DWORD[((12+64))+rsp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        rorx    edi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        andn    r12d,ebx,edx
+        xor     r13d,edi
+        rorx    r14d,ebx,6
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     edi,r9d
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     edi,r10d
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r10d
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        add     edx,DWORD[((32+64))+rsp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r9d
+        xor     r14d,r13d
+        lea     edx,[rdi*1+rdx]
+        mov     r12d,eax
+        add     ecx,DWORD[((36+64))+rsp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        rorx    edi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        andn    r12d,r11d,ebx
+        xor     r13d,edi
+        rorx    r14d,r11d,6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     edi,edx
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     edi,r8d
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r8d
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        add     ebx,DWORD[((40+64))+rsp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,edx
+        xor     r14d,r13d
+        lea     ebx,[rdi*1+rbx]
+        mov     r12d,r10d
+        add     eax,DWORD[((44+64))+rsp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        rorx    edi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        andn    r12d,r9d,r11d
+        xor     r13d,edi
+        rorx    r14d,r9d,6
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     edi,ebx
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     edi,ecx
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,ecx
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        add     r11d,DWORD[rsp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,ebx
+        xor     r14d,r13d
+        lea     r11d,[rdi*1+r11]
+        mov     r12d,r8d
+        add     r10d,DWORD[4+rsp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        rorx    edi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        andn    r12d,edx,r9d
+        xor     r13d,edi
+        rorx    r14d,edx,6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     edi,r11d
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     edi,eax
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,eax
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        add     r9d,DWORD[8+rsp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r11d
+        xor     r14d,r13d
+        lea     r9d,[rdi*1+r9]
+        mov     r12d,ecx
+        add     r8d,DWORD[12+rsp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        rorx    edi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        andn    r12d,ebx,edx
+        xor     r13d,edi
+        rorx    r14d,ebx,6
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     edi,r9d
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     edi,r10d
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r10d
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        add     edx,DWORD[32+rsp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r9d
+        xor     r14d,r13d
+        lea     edx,[rdi*1+rdx]
+        mov     r12d,eax
+        add     ecx,DWORD[36+rsp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        rorx    edi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        andn    r12d,r11d,ebx
+        xor     r13d,edi
+        rorx    r14d,r11d,6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     edi,edx
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     edi,r8d
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r8d
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        add     ebx,DWORD[40+rsp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,edx
+        xor     r14d,r13d
+        lea     ebx,[rdi*1+rbx]
+        mov     r12d,r10d
+        add     eax,DWORD[44+rsp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        rorx    edi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        andn    r12d,r9d,r11d
+        xor     r13d,edi
+        rorx    r14d,r9d,6
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     edi,ebx
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     edi,ecx
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,ecx
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        mov     rdi,QWORD[512+rsp]
+        add     eax,r14d
+
+        lea     rbp,[448+rsp]
+
+        add     eax,DWORD[rdi]
+        add     ebx,DWORD[4+rdi]
+        add     ecx,DWORD[8+rdi]
+        add     edx,DWORD[12+rdi]
+        add     r8d,DWORD[16+rdi]
+        add     r9d,DWORD[20+rdi]
+        add     r10d,DWORD[24+rdi]
+        add     r11d,DWORD[28+rdi]
+
+        mov     DWORD[rdi],eax
+        mov     DWORD[4+rdi],ebx
+        mov     DWORD[8+rdi],ecx
+        mov     DWORD[12+rdi],edx
+        mov     DWORD[16+rdi],r8d
+        mov     DWORD[20+rdi],r9d
+        mov     DWORD[24+rdi],r10d
+        mov     DWORD[28+rdi],r11d
+
+        cmp     rsi,QWORD[80+rbp]
+        je      NEAR $L$done_avx2
+
+        xor     r14d,r14d
+        mov     edi,ebx
+        xor     edi,ecx
+        mov     r12d,r9d
+        jmp     NEAR $L$ower_avx2
+ALIGN   16
+$L$ower_avx2:
+        add     r11d,DWORD[((0+16))+rbp]
+        and     r12d,r8d
+        rorx    r13d,r8d,25
+        rorx    r15d,r8d,11
+        lea     eax,[r14*1+rax]
+        lea     r11d,[r12*1+r11]
+        andn    r12d,r8d,r10d
+        xor     r13d,r15d
+        rorx    r14d,r8d,6
+        lea     r11d,[r12*1+r11]
+        xor     r13d,r14d
+        mov     r15d,eax
+        rorx    r12d,eax,22
+        lea     r11d,[r13*1+r11]
+        xor     r15d,ebx
+        rorx    r14d,eax,13
+        rorx    r13d,eax,2
+        lea     edx,[r11*1+rdx]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,ebx
+        xor     r14d,r13d
+        lea     r11d,[rdi*1+r11]
+        mov     r12d,r8d
+        add     r10d,DWORD[((4+16))+rbp]
+        and     r12d,edx
+        rorx    r13d,edx,25
+        rorx    edi,edx,11
+        lea     r11d,[r14*1+r11]
+        lea     r10d,[r12*1+r10]
+        andn    r12d,edx,r9d
+        xor     r13d,edi
+        rorx    r14d,edx,6
+        lea     r10d,[r12*1+r10]
+        xor     r13d,r14d
+        mov     edi,r11d
+        rorx    r12d,r11d,22
+        lea     r10d,[r13*1+r10]
+        xor     edi,eax
+        rorx    r14d,r11d,13
+        rorx    r13d,r11d,2
+        lea     ecx,[r10*1+rcx]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,eax
+        xor     r14d,r13d
+        lea     r10d,[r15*1+r10]
+        mov     r12d,edx
+        add     r9d,DWORD[((8+16))+rbp]
+        and     r12d,ecx
+        rorx    r13d,ecx,25
+        rorx    r15d,ecx,11
+        lea     r10d,[r14*1+r10]
+        lea     r9d,[r12*1+r9]
+        andn    r12d,ecx,r8d
+        xor     r13d,r15d
+        rorx    r14d,ecx,6
+        lea     r9d,[r12*1+r9]
+        xor     r13d,r14d
+        mov     r15d,r10d
+        rorx    r12d,r10d,22
+        lea     r9d,[r13*1+r9]
+        xor     r15d,r11d
+        rorx    r14d,r10d,13
+        rorx    r13d,r10d,2
+        lea     ebx,[r9*1+rbx]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r11d
+        xor     r14d,r13d
+        lea     r9d,[rdi*1+r9]
+        mov     r12d,ecx
+        add     r8d,DWORD[((12+16))+rbp]
+        and     r12d,ebx
+        rorx    r13d,ebx,25
+        rorx    edi,ebx,11
+        lea     r9d,[r14*1+r9]
+        lea     r8d,[r12*1+r8]
+        andn    r12d,ebx,edx
+        xor     r13d,edi
+        rorx    r14d,ebx,6
+        lea     r8d,[r12*1+r8]
+        xor     r13d,r14d
+        mov     edi,r9d
+        rorx    r12d,r9d,22
+        lea     r8d,[r13*1+r8]
+        xor     edi,r10d
+        rorx    r14d,r9d,13
+        rorx    r13d,r9d,2
+        lea     eax,[r8*1+rax]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r10d
+        xor     r14d,r13d
+        lea     r8d,[r15*1+r8]
+        mov     r12d,ebx
+        add     edx,DWORD[((32+16))+rbp]
+        and     r12d,eax
+        rorx    r13d,eax,25
+        rorx    r15d,eax,11
+        lea     r8d,[r14*1+r8]
+        lea     edx,[r12*1+rdx]
+        andn    r12d,eax,ecx
+        xor     r13d,r15d
+        rorx    r14d,eax,6
+        lea     edx,[r12*1+rdx]
+        xor     r13d,r14d
+        mov     r15d,r8d
+        rorx    r12d,r8d,22
+        lea     edx,[r13*1+rdx]
+        xor     r15d,r9d
+        rorx    r14d,r8d,13
+        rorx    r13d,r8d,2
+        lea     r11d,[rdx*1+r11]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,r9d
+        xor     r14d,r13d
+        lea     edx,[rdi*1+rdx]
+        mov     r12d,eax
+        add     ecx,DWORD[((36+16))+rbp]
+        and     r12d,r11d
+        rorx    r13d,r11d,25
+        rorx    edi,r11d,11
+        lea     edx,[r14*1+rdx]
+        lea     ecx,[r12*1+rcx]
+        andn    r12d,r11d,ebx
+        xor     r13d,edi
+        rorx    r14d,r11d,6
+        lea     ecx,[r12*1+rcx]
+        xor     r13d,r14d
+        mov     edi,edx
+        rorx    r12d,edx,22
+        lea     ecx,[r13*1+rcx]
+        xor     edi,r8d
+        rorx    r14d,edx,13
+        rorx    r13d,edx,2
+        lea     r10d,[rcx*1+r10]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,r8d
+        xor     r14d,r13d
+        lea     ecx,[r15*1+rcx]
+        mov     r12d,r11d
+        add     ebx,DWORD[((40+16))+rbp]
+        and     r12d,r10d
+        rorx    r13d,r10d,25
+        rorx    r15d,r10d,11
+        lea     ecx,[r14*1+rcx]
+        lea     ebx,[r12*1+rbx]
+        andn    r12d,r10d,eax
+        xor     r13d,r15d
+        rorx    r14d,r10d,6
+        lea     ebx,[r12*1+rbx]
+        xor     r13d,r14d
+        mov     r15d,ecx
+        rorx    r12d,ecx,22
+        lea     ebx,[r13*1+rbx]
+        xor     r15d,edx
+        rorx    r14d,ecx,13
+        rorx    r13d,ecx,2
+        lea     r9d,[rbx*1+r9]
+        and     edi,r15d
+        xor     r14d,r12d
+        xor     edi,edx
+        xor     r14d,r13d
+        lea     ebx,[rdi*1+rbx]
+        mov     r12d,r10d
+        add     eax,DWORD[((44+16))+rbp]
+        and     r12d,r9d
+        rorx    r13d,r9d,25
+        rorx    edi,r9d,11
+        lea     ebx,[r14*1+rbx]
+        lea     eax,[r12*1+rax]
+        andn    r12d,r9d,r11d
+        xor     r13d,edi
+        rorx    r14d,r9d,6
+        lea     eax,[r12*1+rax]
+        xor     r13d,r14d
+        mov     edi,ebx
+        rorx    r12d,ebx,22
+        lea     eax,[r13*1+rax]
+        xor     edi,ecx
+        rorx    r14d,ebx,13
+        rorx    r13d,ebx,2
+        lea     r8d,[rax*1+r8]
+        and     r15d,edi
+        xor     r14d,r12d
+        xor     r15d,ecx
+        xor     r14d,r13d
+        lea     eax,[r15*1+rax]
+        mov     r12d,r9d
+        lea     rbp,[((-64))+rbp]
+        cmp     rbp,rsp
+        jae     NEAR $L$ower_avx2
+
+        mov     rdi,QWORD[512+rsp]
+        add     eax,r14d
+
+        lea     rsp,[448+rsp]
+
+        add     eax,DWORD[rdi]
+        add     ebx,DWORD[4+rdi]
+        add     ecx,DWORD[8+rdi]
+        add     edx,DWORD[12+rdi]
+        add     r8d,DWORD[16+rdi]
+        add     r9d,DWORD[20+rdi]
+        lea     rsi,[128+rsi]
+        add     r10d,DWORD[24+rdi]
+        mov     r12,rsi
+        add     r11d,DWORD[28+rdi]
+        cmp     rsi,QWORD[((64+16))+rsp]
+
+        mov     DWORD[rdi],eax
+        cmove   r12,rsp
+        mov     DWORD[4+rdi],ebx
+        mov     DWORD[8+rdi],ecx
+        mov     DWORD[12+rdi],edx
+        mov     DWORD[16+rdi],r8d
+        mov     DWORD[20+rdi],r9d
+        mov     DWORD[24+rdi],r10d
+        mov     DWORD[28+rdi],r11d
+
+        jbe     NEAR $L$oop_avx2
+        lea     rbp,[rsp]
+
+$L$done_avx2:
+        lea     rsp,[rbp]
+        mov     rsi,QWORD[88+rsp]
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((64+32))+rsp]
+        movaps  xmm7,XMMWORD[((64+48))+rsp]
+        movaps  xmm8,XMMWORD[((64+64))+rsp]
+        movaps  xmm9,XMMWORD[((64+80))+rsp]
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue_avx2:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha256_block_data_order_avx2:
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+        lea     r10,[$L$avx2_shortcut]
+        cmp     rbx,r10
+        jb      NEAR $L$not_in_avx2
+
+        and     rax,-256*4
+        add     rax,448
+$L$not_in_avx2:
+        mov     rsi,rax
+        mov     rax,QWORD[((64+24))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+        lea     r10,[$L$epilogue]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        lea     rsi,[((64+32))+rsi]
+        lea     rdi,[512+r8]
+        mov     ecx,8
+        DD      0xa548f3fc
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+
+ALIGN   16
+shaext_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        lea     r10,[$L$prologue_shaext]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        lea     r10,[$L$epilogue_shaext]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+
+        lea     rsi,[((-8-80))+rax]
+        lea     rdi,[512+r8]
+        mov     ecx,10
+        DD      0xa548f3fc
+
+        jmp     NEAR $L$in_prologue
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_sha256_block_data_order wrt ..imagebase
+        DD      $L$SEH_end_sha256_block_data_order wrt ..imagebase
+        DD      $L$SEH_info_sha256_block_data_order wrt ..imagebase
+        DD      $L$SEH_begin_sha256_block_data_order_shaext wrt ..imagebase
+        DD      $L$SEH_end_sha256_block_data_order_shaext wrt ..imagebase
+        DD      $L$SEH_info_sha256_block_data_order_shaext wrt ..imagebase
+        DD      $L$SEH_begin_sha256_block_data_order_ssse3 wrt ..imagebase
+        DD      $L$SEH_end_sha256_block_data_order_ssse3 wrt ..imagebase
+        DD      $L$SEH_info_sha256_block_data_order_ssse3 wrt ..imagebase
+        DD      $L$SEH_begin_sha256_block_data_order_avx wrt ..imagebase
+        DD      $L$SEH_end_sha256_block_data_order_avx wrt ..imagebase
+        DD      $L$SEH_info_sha256_block_data_order_avx wrt ..imagebase
+        DD      $L$SEH_begin_sha256_block_data_order_avx2 wrt ..imagebase
+        DD      $L$SEH_end_sha256_block_data_order_avx2 wrt ..imagebase
+        DD      $L$SEH_info_sha256_block_data_order_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_sha256_block_data_order:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue wrt ..imagebase,$L$epilogue wrt ..imagebase
+$L$SEH_info_sha256_block_data_order_shaext:
+DB      9,0,0,0
+        DD      shaext_handler wrt ..imagebase
+$L$SEH_info_sha256_block_data_order_ssse3:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_ssse3 wrt ..imagebase,$L$epilogue_ssse3 wrt ..imagebase
+$L$SEH_info_sha256_block_data_order_avx:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha256_block_data_order_avx2:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm
new file mode 100644
index 0000000000..6d48b93b84
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm
@@ -0,0 +1,5668 @@
+; Copyright 2005-2016 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+section .text code align=64
+
+
+EXTERN  OPENSSL_ia32cap_P
+global  sha512_block_data_order
+
+ALIGN   16
+sha512_block_data_order:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha512_block_data_order:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+        lea     r11,[OPENSSL_ia32cap_P]
+        mov     r9d,DWORD[r11]
+        mov     r10d,DWORD[4+r11]
+        mov     r11d,DWORD[8+r11]
+        test    r10d,2048
+        jnz     NEAR $L$xop_shortcut
+        and     r11d,296
+        cmp     r11d,296
+        je      NEAR $L$avx2_shortcut
+        and     r9d,1073741824
+        and     r10d,268435968
+        or      r10d,r9d
+        cmp     r10d,1342177792
+        je      NEAR $L$avx_shortcut
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        shl     rdx,4
+        sub     rsp,16*8+4*8
+        lea     rdx,[rdx*8+rsi]
+        and     rsp,-64
+        mov     QWORD[((128+0))+rsp],rdi
+        mov     QWORD[((128+8))+rsp],rsi
+        mov     QWORD[((128+16))+rsp],rdx
+        mov     QWORD[152+rsp],rax
+
+$L$prologue:
+
+        mov     rax,QWORD[rdi]
+        mov     rbx,QWORD[8+rdi]
+        mov     rcx,QWORD[16+rdi]
+        mov     rdx,QWORD[24+rdi]
+        mov     r8,QWORD[32+rdi]
+        mov     r9,QWORD[40+rdi]
+        mov     r10,QWORD[48+rdi]
+        mov     r11,QWORD[56+rdi]
+        jmp     NEAR $L$loop
+
+ALIGN   16
+$L$loop:
+        mov     rdi,rbx
+        lea     rbp,[K512]
+        xor     rdi,rcx
+        mov     r12,QWORD[rsi]
+        mov     r13,r8
+        mov     r14,rax
+        bswap   r12
+        ror     r13,23
+        mov     r15,r9
+
+        xor     r13,r8
+        ror     r14,5
+        xor     r15,r10
+
+        mov     QWORD[rsp],r12
+        xor     r14,rax
+        and     r15,r8
+
+        ror     r13,4
+        add     r12,r11
+        xor     r15,r10
+
+        ror     r14,6
+        xor     r13,r8
+        add     r12,r15
+
+        mov     r15,rax
+        add     r12,QWORD[rbp]
+        xor     r14,rax
+
+        xor     r15,rbx
+        ror     r13,14
+        mov     r11,rbx
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     r11,rdi
+        add     rdx,r12
+        add     r11,r12
+
+        lea     rbp,[8+rbp]
+        add     r11,r14
+        mov     r12,QWORD[8+rsi]
+        mov     r13,rdx
+        mov     r14,r11
+        bswap   r12
+        ror     r13,23
+        mov     rdi,r8
+
+        xor     r13,rdx
+        ror     r14,5
+        xor     rdi,r9
+
+        mov     QWORD[8+rsp],r12
+        xor     r14,r11
+        and     rdi,rdx
+
+        ror     r13,4
+        add     r12,r10
+        xor     rdi,r9
+
+        ror     r14,6
+        xor     r13,rdx
+        add     r12,rdi
+
+        mov     rdi,r11
+        add     r12,QWORD[rbp]
+        xor     r14,r11
+
+        xor     rdi,rax
+        ror     r13,14
+        mov     r10,rax
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     r10,r15
+        add     rcx,r12
+        add     r10,r12
+
+        lea     rbp,[24+rbp]
+        add     r10,r14
+        mov     r12,QWORD[16+rsi]
+        mov     r13,rcx
+        mov     r14,r10
+        bswap   r12
+        ror     r13,23
+        mov     r15,rdx
+
+        xor     r13,rcx
+        ror     r14,5
+        xor     r15,r8
+
+        mov     QWORD[16+rsp],r12
+        xor     r14,r10
+        and     r15,rcx
+
+        ror     r13,4
+        add     r12,r9
+        xor     r15,r8
+
+        ror     r14,6
+        xor     r13,rcx
+        add     r12,r15
+
+        mov     r15,r10
+        add     r12,QWORD[rbp]
+        xor     r14,r10
+
+        xor     r15,r11
+        ror     r13,14
+        mov     r9,r11
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     r9,rdi
+        add     rbx,r12
+        add     r9,r12
+
+        lea     rbp,[8+rbp]
+        add     r9,r14
+        mov     r12,QWORD[24+rsi]
+        mov     r13,rbx
+        mov     r14,r9
+        bswap   r12
+        ror     r13,23
+        mov     rdi,rcx
+
+        xor     r13,rbx
+        ror     r14,5
+        xor     rdi,rdx
+
+        mov     QWORD[24+rsp],r12
+        xor     r14,r9
+        and     rdi,rbx
+
+        ror     r13,4
+        add     r12,r8
+        xor     rdi,rdx
+
+        ror     r14,6
+        xor     r13,rbx
+        add     r12,rdi
+
+        mov     rdi,r9
+        add     r12,QWORD[rbp]
+        xor     r14,r9
+
+        xor     rdi,r10
+        ror     r13,14
+        mov     r8,r10
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     r8,r15
+        add     rax,r12
+        add     r8,r12
+
+        lea     rbp,[24+rbp]
+        add     r8,r14
+        mov     r12,QWORD[32+rsi]
+        mov     r13,rax
+        mov     r14,r8
+        bswap   r12
+        ror     r13,23
+        mov     r15,rbx
+
+        xor     r13,rax
+        ror     r14,5
+        xor     r15,rcx
+
+        mov     QWORD[32+rsp],r12
+        xor     r14,r8
+        and     r15,rax
+
+        ror     r13,4
+        add     r12,rdx
+        xor     r15,rcx
+
+        ror     r14,6
+        xor     r13,rax
+        add     r12,r15
+
+        mov     r15,r8
+        add     r12,QWORD[rbp]
+        xor     r14,r8
+
+        xor     r15,r9
+        ror     r13,14
+        mov     rdx,r9
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     rdx,rdi
+        add     r11,r12
+        add     rdx,r12
+
+        lea     rbp,[8+rbp]
+        add     rdx,r14
+        mov     r12,QWORD[40+rsi]
+        mov     r13,r11
+        mov     r14,rdx
+        bswap   r12
+        ror     r13,23
+        mov     rdi,rax
+
+        xor     r13,r11
+        ror     r14,5
+        xor     rdi,rbx
+
+        mov     QWORD[40+rsp],r12
+        xor     r14,rdx
+        and     rdi,r11
+
+        ror     r13,4
+        add     r12,rcx
+        xor     rdi,rbx
+
+        ror     r14,6
+        xor     r13,r11
+        add     r12,rdi
+
+        mov     rdi,rdx
+        add     r12,QWORD[rbp]
+        xor     r14,rdx
+
+        xor     rdi,r8
+        ror     r13,14
+        mov     rcx,r8
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     rcx,r15
+        add     r10,r12
+        add     rcx,r12
+
+        lea     rbp,[24+rbp]
+        add     rcx,r14
+        mov     r12,QWORD[48+rsi]
+        mov     r13,r10
+        mov     r14,rcx
+        bswap   r12
+        ror     r13,23
+        mov     r15,r11
+
+        xor     r13,r10
+        ror     r14,5
+        xor     r15,rax
+
+        mov     QWORD[48+rsp],r12
+        xor     r14,rcx
+        and     r15,r10
+
+        ror     r13,4
+        add     r12,rbx
+        xor     r15,rax
+
+        ror     r14,6
+        xor     r13,r10
+        add     r12,r15
+
+        mov     r15,rcx
+        add     r12,QWORD[rbp]
+        xor     r14,rcx
+
+        xor     r15,rdx
+        ror     r13,14
+        mov     rbx,rdx
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     rbx,rdi
+        add     r9,r12
+        add     rbx,r12
+
+        lea     rbp,[8+rbp]
+        add     rbx,r14
+        mov     r12,QWORD[56+rsi]
+        mov     r13,r9
+        mov     r14,rbx
+        bswap   r12
+        ror     r13,23
+        mov     rdi,r10
+
+        xor     r13,r9
+        ror     r14,5
+        xor     rdi,r11
+
+        mov     QWORD[56+rsp],r12
+        xor     r14,rbx
+        and     rdi,r9
+
+        ror     r13,4
+        add     r12,rax
+        xor     rdi,r11
+
+        ror     r14,6
+        xor     r13,r9
+        add     r12,rdi
+
+        mov     rdi,rbx
+        add     r12,QWORD[rbp]
+        xor     r14,rbx
+
+        xor     rdi,rcx
+        ror     r13,14
+        mov     rax,rcx
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     rax,r15
+        add     r8,r12
+        add     rax,r12
+
+        lea     rbp,[24+rbp]
+        add     rax,r14
+        mov     r12,QWORD[64+rsi]
+        mov     r13,r8
+        mov     r14,rax
+        bswap   r12
+        ror     r13,23
+        mov     r15,r9
+
+        xor     r13,r8
+        ror     r14,5
+        xor     r15,r10
+
+        mov     QWORD[64+rsp],r12
+        xor     r14,rax
+        and     r15,r8
+
+        ror     r13,4
+        add     r12,r11
+        xor     r15,r10
+
+        ror     r14,6
+        xor     r13,r8
+        add     r12,r15
+
+        mov     r15,rax
+        add     r12,QWORD[rbp]
+        xor     r14,rax
+
+        xor     r15,rbx
+        ror     r13,14
+        mov     r11,rbx
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     r11,rdi
+        add     rdx,r12
+        add     r11,r12
+
+        lea     rbp,[8+rbp]
+        add     r11,r14
+        mov     r12,QWORD[72+rsi]
+        mov     r13,rdx
+        mov     r14,r11
+        bswap   r12
+        ror     r13,23
+        mov     rdi,r8
+
+        xor     r13,rdx
+        ror     r14,5
+        xor     rdi,r9
+
+        mov     QWORD[72+rsp],r12
+        xor     r14,r11
+        and     rdi,rdx
+
+        ror     r13,4
+        add     r12,r10
+        xor     rdi,r9
+
+        ror     r14,6
+        xor     r13,rdx
+        add     r12,rdi
+
+        mov     rdi,r11
+        add     r12,QWORD[rbp]
+        xor     r14,r11
+
+        xor     rdi,rax
+        ror     r13,14
+        mov     r10,rax
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     r10,r15
+        add     rcx,r12
+        add     r10,r12
+
+        lea     rbp,[24+rbp]
+        add     r10,r14
+        mov     r12,QWORD[80+rsi]
+        mov     r13,rcx
+        mov     r14,r10
+        bswap   r12
+        ror     r13,23
+        mov     r15,rdx
+
+        xor     r13,rcx
+        ror     r14,5
+        xor     r15,r8
+
+        mov     QWORD[80+rsp],r12
+        xor     r14,r10
+        and     r15,rcx
+
+        ror     r13,4
+        add     r12,r9
+        xor     r15,r8
+
+        ror     r14,6
+        xor     r13,rcx
+        add     r12,r15
+
+        mov     r15,r10
+        add     r12,QWORD[rbp]
+        xor     r14,r10
+
+        xor     r15,r11
+        ror     r13,14
+        mov     r9,r11
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     r9,rdi
+        add     rbx,r12
+        add     r9,r12
+
+        lea     rbp,[8+rbp]
+        add     r9,r14
+        mov     r12,QWORD[88+rsi]
+        mov     r13,rbx
+        mov     r14,r9
+        bswap   r12
+        ror     r13,23
+        mov     rdi,rcx
+
+        xor     r13,rbx
+        ror     r14,5
+        xor     rdi,rdx
+
+        mov     QWORD[88+rsp],r12
+        xor     r14,r9
+        and     rdi,rbx
+
+        ror     r13,4
+        add     r12,r8
+        xor     rdi,rdx
+
+        ror     r14,6
+        xor     r13,rbx
+        add     r12,rdi
+
+        mov     rdi,r9
+        add     r12,QWORD[rbp]
+        xor     r14,r9
+
+        xor     rdi,r10
+        ror     r13,14
+        mov     r8,r10
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     r8,r15
+        add     rax,r12
+        add     r8,r12
+
+        lea     rbp,[24+rbp]
+        add     r8,r14
+        mov     r12,QWORD[96+rsi]
+        mov     r13,rax
+        mov     r14,r8
+        bswap   r12
+        ror     r13,23
+        mov     r15,rbx
+
+        xor     r13,rax
+        ror     r14,5
+        xor     r15,rcx
+
+        mov     QWORD[96+rsp],r12
+        xor     r14,r8
+        and     r15,rax
+
+        ror     r13,4
+        add     r12,rdx
+        xor     r15,rcx
+
+        ror     r14,6
+        xor     r13,rax
+        add     r12,r15
+
+        mov     r15,r8
+        add     r12,QWORD[rbp]
+        xor     r14,r8
+
+        xor     r15,r9
+        ror     r13,14
+        mov     rdx,r9
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     rdx,rdi
+        add     r11,r12
+        add     rdx,r12
+
+        lea     rbp,[8+rbp]
+        add     rdx,r14
+        mov     r12,QWORD[104+rsi]
+        mov     r13,r11
+        mov     r14,rdx
+        bswap   r12
+        ror     r13,23
+        mov     rdi,rax
+
+        xor     r13,r11
+        ror     r14,5
+        xor     rdi,rbx
+
+        mov     QWORD[104+rsp],r12
+        xor     r14,rdx
+        and     rdi,r11
+
+        ror     r13,4
+        add     r12,rcx
+        xor     rdi,rbx
+
+        ror     r14,6
+        xor     r13,r11
+        add     r12,rdi
+
+        mov     rdi,rdx
+        add     r12,QWORD[rbp]
+        xor     r14,rdx
+
+        xor     rdi,r8
+        ror     r13,14
+        mov     rcx,r8
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     rcx,r15
+        add     r10,r12
+        add     rcx,r12
+
+        lea     rbp,[24+rbp]
+        add     rcx,r14
+        mov     r12,QWORD[112+rsi]
+        mov     r13,r10
+        mov     r14,rcx
+        bswap   r12
+        ror     r13,23
+        mov     r15,r11
+
+        xor     r13,r10
+        ror     r14,5
+        xor     r15,rax
+
+        mov     QWORD[112+rsp],r12
+        xor     r14,rcx
+        and     r15,r10
+
+        ror     r13,4
+        add     r12,rbx
+        xor     r15,rax
+
+        ror     r14,6
+        xor     r13,r10
+        add     r12,r15
+
+        mov     r15,rcx
+        add     r12,QWORD[rbp]
+        xor     r14,rcx
+
+        xor     r15,rdx
+        ror     r13,14
+        mov     rbx,rdx
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     rbx,rdi
+        add     r9,r12
+        add     rbx,r12
+
+        lea     rbp,[8+rbp]
+        add     rbx,r14
+        mov     r12,QWORD[120+rsi]
+        mov     r13,r9
+        mov     r14,rbx
+        bswap   r12
+        ror     r13,23
+        mov     rdi,r10
+
+        xor     r13,r9
+        ror     r14,5
+        xor     rdi,r11
+
+        mov     QWORD[120+rsp],r12
+        xor     r14,rbx
+        and     rdi,r9
+
+        ror     r13,4
+        add     r12,rax
+        xor     rdi,r11
+
+        ror     r14,6
+        xor     r13,r9
+        add     r12,rdi
+
+        mov     rdi,rbx
+        add     r12,QWORD[rbp]
+        xor     r14,rbx
+
+        xor     rdi,rcx
+        ror     r13,14
+        mov     rax,rcx
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     rax,r15
+        add     r8,r12
+        add     rax,r12
+
+        lea     rbp,[24+rbp]
+        jmp     NEAR $L$rounds_16_xx
+ALIGN   16
+$L$rounds_16_xx:
+        mov     r13,QWORD[8+rsp]
+        mov     r15,QWORD[112+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     rax,r14
+        mov     r14,r15
+        ror     r15,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     r15,r14
+        shr     r14,6
+
+        ror     r15,19
+        xor     r12,r13
+        xor     r15,r14
+        add     r12,QWORD[72+rsp]
+
+        add     r12,QWORD[rsp]
+        mov     r13,r8
+        add     r12,r15
+        mov     r14,rax
+        ror     r13,23
+        mov     r15,r9
+
+        xor     r13,r8
+        ror     r14,5
+        xor     r15,r10
+
+        mov     QWORD[rsp],r12
+        xor     r14,rax
+        and     r15,r8
+
+        ror     r13,4
+        add     r12,r11
+        xor     r15,r10
+
+        ror     r14,6
+        xor     r13,r8
+        add     r12,r15
+
+        mov     r15,rax
+        add     r12,QWORD[rbp]
+        xor     r14,rax
+
+        xor     r15,rbx
+        ror     r13,14
+        mov     r11,rbx
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     r11,rdi
+        add     rdx,r12
+        add     r11,r12
+
+        lea     rbp,[8+rbp]
+        mov     r13,QWORD[16+rsp]
+        mov     rdi,QWORD[120+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     r11,r14
+        mov     r14,rdi
+        ror     rdi,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     rdi,r14
+        shr     r14,6
+
+        ror     rdi,19
+        xor     r12,r13
+        xor     rdi,r14
+        add     r12,QWORD[80+rsp]
+
+        add     r12,QWORD[8+rsp]
+        mov     r13,rdx
+        add     r12,rdi
+        mov     r14,r11
+        ror     r13,23
+        mov     rdi,r8
+
+        xor     r13,rdx
+        ror     r14,5
+        xor     rdi,r9
+
+        mov     QWORD[8+rsp],r12
+        xor     r14,r11
+        and     rdi,rdx
+
+        ror     r13,4
+        add     r12,r10
+        xor     rdi,r9
+
+        ror     r14,6
+        xor     r13,rdx
+        add     r12,rdi
+
+        mov     rdi,r11
+        add     r12,QWORD[rbp]
+        xor     r14,r11
+
+        xor     rdi,rax
+        ror     r13,14
+        mov     r10,rax
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     r10,r15
+        add     rcx,r12
+        add     r10,r12
+
+        lea     rbp,[24+rbp]
+        mov     r13,QWORD[24+rsp]
+        mov     r15,QWORD[rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     r10,r14
+        mov     r14,r15
+        ror     r15,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     r15,r14
+        shr     r14,6
+
+        ror     r15,19
+        xor     r12,r13
+        xor     r15,r14
+        add     r12,QWORD[88+rsp]
+
+        add     r12,QWORD[16+rsp]
+        mov     r13,rcx
+        add     r12,r15
+        mov     r14,r10
+        ror     r13,23
+        mov     r15,rdx
+
+        xor     r13,rcx
+        ror     r14,5
+        xor     r15,r8
+
+        mov     QWORD[16+rsp],r12
+        xor     r14,r10
+        and     r15,rcx
+
+        ror     r13,4
+        add     r12,r9
+        xor     r15,r8
+
+        ror     r14,6
+        xor     r13,rcx
+        add     r12,r15
+
+        mov     r15,r10
+        add     r12,QWORD[rbp]
+        xor     r14,r10
+
+        xor     r15,r11
+        ror     r13,14
+        mov     r9,r11
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     r9,rdi
+        add     rbx,r12
+        add     r9,r12
+
+        lea     rbp,[8+rbp]
+        mov     r13,QWORD[32+rsp]
+        mov     rdi,QWORD[8+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     r9,r14
+        mov     r14,rdi
+        ror     rdi,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     rdi,r14
+        shr     r14,6
+
+        ror     rdi,19
+        xor     r12,r13
+        xor     rdi,r14
+        add     r12,QWORD[96+rsp]
+
+        add     r12,QWORD[24+rsp]
+        mov     r13,rbx
+        add     r12,rdi
+        mov     r14,r9
+        ror     r13,23
+        mov     rdi,rcx
+
+        xor     r13,rbx
+        ror     r14,5
+        xor     rdi,rdx
+
+        mov     QWORD[24+rsp],r12
+        xor     r14,r9
+        and     rdi,rbx
+
+        ror     r13,4
+        add     r12,r8
+        xor     rdi,rdx
+
+        ror     r14,6
+        xor     r13,rbx
+        add     r12,rdi
+
+        mov     rdi,r9
+        add     r12,QWORD[rbp]
+        xor     r14,r9
+
+        xor     rdi,r10
+        ror     r13,14
+        mov     r8,r10
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     r8,r15
+        add     rax,r12
+        add     r8,r12
+
+        lea     rbp,[24+rbp]
+        mov     r13,QWORD[40+rsp]
+        mov     r15,QWORD[16+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     r8,r14
+        mov     r14,r15
+        ror     r15,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     r15,r14
+        shr     r14,6
+
+        ror     r15,19
+        xor     r12,r13
+        xor     r15,r14
+        add     r12,QWORD[104+rsp]
+
+        add     r12,QWORD[32+rsp]
+        mov     r13,rax
+        add     r12,r15
+        mov     r14,r8
+        ror     r13,23
+        mov     r15,rbx
+
+        xor     r13,rax
+        ror     r14,5
+        xor     r15,rcx
+
+        mov     QWORD[32+rsp],r12
+        xor     r14,r8
+        and     r15,rax
+
+        ror     r13,4
+        add     r12,rdx
+        xor     r15,rcx
+
+        ror     r14,6
+        xor     r13,rax
+        add     r12,r15
+
+        mov     r15,r8
+        add     r12,QWORD[rbp]
+        xor     r14,r8
+
+        xor     r15,r9
+        ror     r13,14
+        mov     rdx,r9
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     rdx,rdi
+        add     r11,r12
+        add     rdx,r12
+
+        lea     rbp,[8+rbp]
+        mov     r13,QWORD[48+rsp]
+        mov     rdi,QWORD[24+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     rdx,r14
+        mov     r14,rdi
+        ror     rdi,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     rdi,r14
+        shr     r14,6
+
+        ror     rdi,19
+        xor     r12,r13
+        xor     rdi,r14
+        add     r12,QWORD[112+rsp]
+
+        add     r12,QWORD[40+rsp]
+        mov     r13,r11
+        add     r12,rdi
+        mov     r14,rdx
+        ror     r13,23
+        mov     rdi,rax
+
+        xor     r13,r11
+        ror     r14,5
+        xor     rdi,rbx
+
+        mov     QWORD[40+rsp],r12
+        xor     r14,rdx
+        and     rdi,r11
+
+        ror     r13,4
+        add     r12,rcx
+        xor     rdi,rbx
+
+        ror     r14,6
+        xor     r13,r11
+        add     r12,rdi
+
+        mov     rdi,rdx
+        add     r12,QWORD[rbp]
+        xor     r14,rdx
+
+        xor     rdi,r8
+        ror     r13,14
+        mov     rcx,r8
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     rcx,r15
+        add     r10,r12
+        add     rcx,r12
+
+        lea     rbp,[24+rbp]
+        mov     r13,QWORD[56+rsp]
+        mov     r15,QWORD[32+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     rcx,r14
+        mov     r14,r15
+        ror     r15,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     r15,r14
+        shr     r14,6
+
+        ror     r15,19
+        xor     r12,r13
+        xor     r15,r14
+        add     r12,QWORD[120+rsp]
+
+        add     r12,QWORD[48+rsp]
+        mov     r13,r10
+        add     r12,r15
+        mov     r14,rcx
+        ror     r13,23
+        mov     r15,r11
+
+        xor     r13,r10
+        ror     r14,5
+        xor     r15,rax
+
+        mov     QWORD[48+rsp],r12
+        xor     r14,rcx
+        and     r15,r10
+
+        ror     r13,4
+        add     r12,rbx
+        xor     r15,rax
+
+        ror     r14,6
+        xor     r13,r10
+        add     r12,r15
+
+        mov     r15,rcx
+        add     r12,QWORD[rbp]
+        xor     r14,rcx
+
+        xor     r15,rdx
+        ror     r13,14
+        mov     rbx,rdx
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     rbx,rdi
+        add     r9,r12
+        add     rbx,r12
+
+        lea     rbp,[8+rbp]
+        mov     r13,QWORD[64+rsp]
+        mov     rdi,QWORD[40+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     rbx,r14
+        mov     r14,rdi
+        ror     rdi,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     rdi,r14
+        shr     r14,6
+
+        ror     rdi,19
+        xor     r12,r13
+        xor     rdi,r14
+        add     r12,QWORD[rsp]
+
+        add     r12,QWORD[56+rsp]
+        mov     r13,r9
+        add     r12,rdi
+        mov     r14,rbx
+        ror     r13,23
+        mov     rdi,r10
+
+        xor     r13,r9
+        ror     r14,5
+        xor     rdi,r11
+
+        mov     QWORD[56+rsp],r12
+        xor     r14,rbx
+        and     rdi,r9
+
+        ror     r13,4
+        add     r12,rax
+        xor     rdi,r11
+
+        ror     r14,6
+        xor     r13,r9
+        add     r12,rdi
+
+        mov     rdi,rbx
+        add     r12,QWORD[rbp]
+        xor     r14,rbx
+
+        xor     rdi,rcx
+        ror     r13,14
+        mov     rax,rcx
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     rax,r15
+        add     r8,r12
+        add     rax,r12
+
+        lea     rbp,[24+rbp]
+        mov     r13,QWORD[72+rsp]
+        mov     r15,QWORD[48+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     rax,r14
+        mov     r14,r15
+        ror     r15,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     r15,r14
+        shr     r14,6
+
+        ror     r15,19
+        xor     r12,r13
+        xor     r15,r14
+        add     r12,QWORD[8+rsp]
+
+        add     r12,QWORD[64+rsp]
+        mov     r13,r8
+        add     r12,r15
+        mov     r14,rax
+        ror     r13,23
+        mov     r15,r9
+
+        xor     r13,r8
+        ror     r14,5
+        xor     r15,r10
+
+        mov     QWORD[64+rsp],r12
+        xor     r14,rax
+        and     r15,r8
+
+        ror     r13,4
+        add     r12,r11
+        xor     r15,r10
+
+        ror     r14,6
+        xor     r13,r8
+        add     r12,r15
+
+        mov     r15,rax
+        add     r12,QWORD[rbp]
+        xor     r14,rax
+
+        xor     r15,rbx
+        ror     r13,14
+        mov     r11,rbx
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     r11,rdi
+        add     rdx,r12
+        add     r11,r12
+
+        lea     rbp,[8+rbp]
+        mov     r13,QWORD[80+rsp]
+        mov     rdi,QWORD[56+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     r11,r14
+        mov     r14,rdi
+        ror     rdi,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     rdi,r14
+        shr     r14,6
+
+        ror     rdi,19
+        xor     r12,r13
+        xor     rdi,r14
+        add     r12,QWORD[16+rsp]
+
+        add     r12,QWORD[72+rsp]
+        mov     r13,rdx
+        add     r12,rdi
+        mov     r14,r11
+        ror     r13,23
+        mov     rdi,r8
+
+        xor     r13,rdx
+        ror     r14,5
+        xor     rdi,r9
+
+        mov     QWORD[72+rsp],r12
+        xor     r14,r11
+        and     rdi,rdx
+
+        ror     r13,4
+        add     r12,r10
+        xor     rdi,r9
+
+        ror     r14,6
+        xor     r13,rdx
+        add     r12,rdi
+
+        mov     rdi,r11
+        add     r12,QWORD[rbp]
+        xor     r14,r11
+
+        xor     rdi,rax
+        ror     r13,14
+        mov     r10,rax
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     r10,r15
+        add     rcx,r12
+        add     r10,r12
+
+        lea     rbp,[24+rbp]
+        mov     r13,QWORD[88+rsp]
+        mov     r15,QWORD[64+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     r10,r14
+        mov     r14,r15
+        ror     r15,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     r15,r14
+        shr     r14,6
+
+        ror     r15,19
+        xor     r12,r13
+        xor     r15,r14
+        add     r12,QWORD[24+rsp]
+
+        add     r12,QWORD[80+rsp]
+        mov     r13,rcx
+        add     r12,r15
+        mov     r14,r10
+        ror     r13,23
+        mov     r15,rdx
+
+        xor     r13,rcx
+        ror     r14,5
+        xor     r15,r8
+
+        mov     QWORD[80+rsp],r12
+        xor     r14,r10
+        and     r15,rcx
+
+        ror     r13,4
+        add     r12,r9
+        xor     r15,r8
+
+        ror     r14,6
+        xor     r13,rcx
+        add     r12,r15
+
+        mov     r15,r10
+        add     r12,QWORD[rbp]
+        xor     r14,r10
+
+        xor     r15,r11
+        ror     r13,14
+        mov     r9,r11
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     r9,rdi
+        add     rbx,r12
+        add     r9,r12
+
+        lea     rbp,[8+rbp]
+        mov     r13,QWORD[96+rsp]
+        mov     rdi,QWORD[72+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     r9,r14
+        mov     r14,rdi
+        ror     rdi,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     rdi,r14
+        shr     r14,6
+
+        ror     rdi,19
+        xor     r12,r13
+        xor     rdi,r14
+        add     r12,QWORD[32+rsp]
+
+        add     r12,QWORD[88+rsp]
+        mov     r13,rbx
+        add     r12,rdi
+        mov     r14,r9
+        ror     r13,23
+        mov     rdi,rcx
+
+        xor     r13,rbx
+        ror     r14,5
+        xor     rdi,rdx
+
+        mov     QWORD[88+rsp],r12
+        xor     r14,r9
+        and     rdi,rbx
+
+        ror     r13,4
+        add     r12,r8
+        xor     rdi,rdx
+
+        ror     r14,6
+        xor     r13,rbx
+        add     r12,rdi
+
+        mov     rdi,r9
+        add     r12,QWORD[rbp]
+        xor     r14,r9
+
+        xor     rdi,r10
+        ror     r13,14
+        mov     r8,r10
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     r8,r15
+        add     rax,r12
+        add     r8,r12
+
+        lea     rbp,[24+rbp]
+        mov     r13,QWORD[104+rsp]
+        mov     r15,QWORD[80+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     r8,r14
+        mov     r14,r15
+        ror     r15,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     r15,r14
+        shr     r14,6
+
+        ror     r15,19
+        xor     r12,r13
+        xor     r15,r14
+        add     r12,QWORD[40+rsp]
+
+        add     r12,QWORD[96+rsp]
+        mov     r13,rax
+        add     r12,r15
+        mov     r14,r8
+        ror     r13,23
+        mov     r15,rbx
+
+        xor     r13,rax
+        ror     r14,5
+        xor     r15,rcx
+
+        mov     QWORD[96+rsp],r12
+        xor     r14,r8
+        and     r15,rax
+
+        ror     r13,4
+        add     r12,rdx
+        xor     r15,rcx
+
+        ror     r14,6
+        xor     r13,rax
+        add     r12,r15
+
+        mov     r15,r8
+        add     r12,QWORD[rbp]
+        xor     r14,r8
+
+        xor     r15,r9
+        ror     r13,14
+        mov     rdx,r9
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     rdx,rdi
+        add     r11,r12
+        add     rdx,r12
+
+        lea     rbp,[8+rbp]
+        mov     r13,QWORD[112+rsp]
+        mov     rdi,QWORD[88+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     rdx,r14
+        mov     r14,rdi
+        ror     rdi,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     rdi,r14
+        shr     r14,6
+
+        ror     rdi,19
+        xor     r12,r13
+        xor     rdi,r14
+        add     r12,QWORD[48+rsp]
+
+        add     r12,QWORD[104+rsp]
+        mov     r13,r11
+        add     r12,rdi
+        mov     r14,rdx
+        ror     r13,23
+        mov     rdi,rax
+
+        xor     r13,r11
+        ror     r14,5
+        xor     rdi,rbx
+
+        mov     QWORD[104+rsp],r12
+        xor     r14,rdx
+        and     rdi,r11
+
+        ror     r13,4
+        add     r12,rcx
+        xor     rdi,rbx
+
+        ror     r14,6
+        xor     r13,r11
+        add     r12,rdi
+
+        mov     rdi,rdx
+        add     r12,QWORD[rbp]
+        xor     r14,rdx
+
+        xor     rdi,r8
+        ror     r13,14
+        mov     rcx,r8
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     rcx,r15
+        add     r10,r12
+        add     rcx,r12
+
+        lea     rbp,[24+rbp]
+        mov     r13,QWORD[120+rsp]
+        mov     r15,QWORD[96+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     rcx,r14
+        mov     r14,r15
+        ror     r15,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     r15,r14
+        shr     r14,6
+
+        ror     r15,19
+        xor     r12,r13
+        xor     r15,r14
+        add     r12,QWORD[56+rsp]
+
+        add     r12,QWORD[112+rsp]
+        mov     r13,r10
+        add     r12,r15
+        mov     r14,rcx
+        ror     r13,23
+        mov     r15,r11
+
+        xor     r13,r10
+        ror     r14,5
+        xor     r15,rax
+
+        mov     QWORD[112+rsp],r12
+        xor     r14,rcx
+        and     r15,r10
+
+        ror     r13,4
+        add     r12,rbx
+        xor     r15,rax
+
+        ror     r14,6
+        xor     r13,r10
+        add     r12,r15
+
+        mov     r15,rcx
+        add     r12,QWORD[rbp]
+        xor     r14,rcx
+
+        xor     r15,rdx
+        ror     r13,14
+        mov     rbx,rdx
+
+        and     rdi,r15
+        ror     r14,28
+        add     r12,r13
+
+        xor     rbx,rdi
+        add     r9,r12
+        add     rbx,r12
+
+        lea     rbp,[8+rbp]
+        mov     r13,QWORD[rsp]
+        mov     rdi,QWORD[104+rsp]
+
+        mov     r12,r13
+        ror     r13,7
+        add     rbx,r14
+        mov     r14,rdi
+        ror     rdi,42
+
+        xor     r13,r12
+        shr     r12,7
+        ror     r13,1
+        xor     rdi,r14
+        shr     r14,6
+
+        ror     rdi,19
+        xor     r12,r13
+        xor     rdi,r14
+        add     r12,QWORD[64+rsp]
+
+        add     r12,QWORD[120+rsp]
+        mov     r13,r9
+        add     r12,rdi
+        mov     r14,rbx
+        ror     r13,23
+        mov     rdi,r10
+
+        xor     r13,r9
+        ror     r14,5
+        xor     rdi,r11
+
+        mov     QWORD[120+rsp],r12
+        xor     r14,rbx
+        and     rdi,r9
+
+        ror     r13,4
+        add     r12,rax
+        xor     rdi,r11
+
+        ror     r14,6
+        xor     r13,r9
+        add     r12,rdi
+
+        mov     rdi,rbx
+        add     r12,QWORD[rbp]
+        xor     r14,rbx
+
+        xor     rdi,rcx
+        ror     r13,14
+        mov     rax,rcx
+
+        and     r15,rdi
+        ror     r14,28
+        add     r12,r13
+
+        xor     rax,r15
+        add     r8,r12
+        add     rax,r12
+
+        lea     rbp,[24+rbp]
+        cmp     BYTE[7+rbp],0
+        jnz     NEAR $L$rounds_16_xx
+
+        mov     rdi,QWORD[((128+0))+rsp]
+        add     rax,r14
+        lea     rsi,[128+rsi]
+
+        add     rax,QWORD[rdi]
+        add     rbx,QWORD[8+rdi]
+        add     rcx,QWORD[16+rdi]
+        add     rdx,QWORD[24+rdi]
+        add     r8,QWORD[32+rdi]
+        add     r9,QWORD[40+rdi]
+        add     r10,QWORD[48+rdi]
+        add     r11,QWORD[56+rdi]
+
+        cmp     rsi,QWORD[((128+16))+rsp]
+
+        mov     QWORD[rdi],rax
+        mov     QWORD[8+rdi],rbx
+        mov     QWORD[16+rdi],rcx
+        mov     QWORD[24+rdi],rdx
+        mov     QWORD[32+rdi],r8
+        mov     QWORD[40+rdi],r9
+        mov     QWORD[48+rdi],r10
+        mov     QWORD[56+rdi],r11
+        jb      NEAR $L$loop
+
+        mov     rsi,QWORD[152+rsp]
+
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha512_block_data_order:
+ALIGN   64
+
+K512:
+        DQ      0x428a2f98d728ae22,0x7137449123ef65cd
+        DQ      0x428a2f98d728ae22,0x7137449123ef65cd
+        DQ      0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc
+        DQ      0xb5c0fbcfec4d3b2f,0xe9b5dba58189dbbc
+        DQ      0x3956c25bf348b538,0x59f111f1b605d019
+        DQ      0x3956c25bf348b538,0x59f111f1b605d019
+        DQ      0x923f82a4af194f9b,0xab1c5ed5da6d8118
+        DQ      0x923f82a4af194f9b,0xab1c5ed5da6d8118
+        DQ      0xd807aa98a3030242,0x12835b0145706fbe
+        DQ      0xd807aa98a3030242,0x12835b0145706fbe
+        DQ      0x243185be4ee4b28c,0x550c7dc3d5ffb4e2
+        DQ      0x243185be4ee4b28c,0x550c7dc3d5ffb4e2
+        DQ      0x72be5d74f27b896f,0x80deb1fe3b1696b1
+        DQ      0x72be5d74f27b896f,0x80deb1fe3b1696b1
+        DQ      0x9bdc06a725c71235,0xc19bf174cf692694
+        DQ      0x9bdc06a725c71235,0xc19bf174cf692694
+        DQ      0xe49b69c19ef14ad2,0xefbe4786384f25e3
+        DQ      0xe49b69c19ef14ad2,0xefbe4786384f25e3
+        DQ      0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65
+        DQ      0x0fc19dc68b8cd5b5,0x240ca1cc77ac9c65
+        DQ      0x2de92c6f592b0275,0x4a7484aa6ea6e483
+        DQ      0x2de92c6f592b0275,0x4a7484aa6ea6e483
+        DQ      0x5cb0a9dcbd41fbd4,0x76f988da831153b5
+        DQ      0x5cb0a9dcbd41fbd4,0x76f988da831153b5
+        DQ      0x983e5152ee66dfab,0xa831c66d2db43210
+        DQ      0x983e5152ee66dfab,0xa831c66d2db43210
+        DQ      0xb00327c898fb213f,0xbf597fc7beef0ee4
+        DQ      0xb00327c898fb213f,0xbf597fc7beef0ee4
+        DQ      0xc6e00bf33da88fc2,0xd5a79147930aa725
+        DQ      0xc6e00bf33da88fc2,0xd5a79147930aa725
+        DQ      0x06ca6351e003826f,0x142929670a0e6e70
+        DQ      0x06ca6351e003826f,0x142929670a0e6e70
+        DQ      0x27b70a8546d22ffc,0x2e1b21385c26c926
+        DQ      0x27b70a8546d22ffc,0x2e1b21385c26c926
+        DQ      0x4d2c6dfc5ac42aed,0x53380d139d95b3df
+        DQ      0x4d2c6dfc5ac42aed,0x53380d139d95b3df
+        DQ      0x650a73548baf63de,0x766a0abb3c77b2a8
+        DQ      0x650a73548baf63de,0x766a0abb3c77b2a8
+        DQ      0x81c2c92e47edaee6,0x92722c851482353b
+        DQ      0x81c2c92e47edaee6,0x92722c851482353b
+        DQ      0xa2bfe8a14cf10364,0xa81a664bbc423001
+        DQ      0xa2bfe8a14cf10364,0xa81a664bbc423001
+        DQ      0xc24b8b70d0f89791,0xc76c51a30654be30
+        DQ      0xc24b8b70d0f89791,0xc76c51a30654be30
+        DQ      0xd192e819d6ef5218,0xd69906245565a910
+        DQ      0xd192e819d6ef5218,0xd69906245565a910
+        DQ      0xf40e35855771202a,0x106aa07032bbd1b8
+        DQ      0xf40e35855771202a,0x106aa07032bbd1b8
+        DQ      0x19a4c116b8d2d0c8,0x1e376c085141ab53
+        DQ      0x19a4c116b8d2d0c8,0x1e376c085141ab53
+        DQ      0x2748774cdf8eeb99,0x34b0bcb5e19b48a8
+        DQ      0x2748774cdf8eeb99,0x34b0bcb5e19b48a8
+        DQ      0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb
+        DQ      0x391c0cb3c5c95a63,0x4ed8aa4ae3418acb
+        DQ      0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3
+        DQ      0x5b9cca4f7763e373,0x682e6ff3d6b2b8a3
+        DQ      0x748f82ee5defb2fc,0x78a5636f43172f60
+        DQ      0x748f82ee5defb2fc,0x78a5636f43172f60
+        DQ      0x84c87814a1f0ab72,0x8cc702081a6439ec
+        DQ      0x84c87814a1f0ab72,0x8cc702081a6439ec
+        DQ      0x90befffa23631e28,0xa4506cebde82bde9
+        DQ      0x90befffa23631e28,0xa4506cebde82bde9
+        DQ      0xbef9a3f7b2c67915,0xc67178f2e372532b
+        DQ      0xbef9a3f7b2c67915,0xc67178f2e372532b
+        DQ      0xca273eceea26619c,0xd186b8c721c0c207
+        DQ      0xca273eceea26619c,0xd186b8c721c0c207
+        DQ      0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178
+        DQ      0xeada7dd6cde0eb1e,0xf57d4f7fee6ed178
+        DQ      0x06f067aa72176fba,0x0a637dc5a2c898a6
+        DQ      0x06f067aa72176fba,0x0a637dc5a2c898a6
+        DQ      0x113f9804bef90dae,0x1b710b35131c471b
+        DQ      0x113f9804bef90dae,0x1b710b35131c471b
+        DQ      0x28db77f523047d84,0x32caab7b40c72493
+        DQ      0x28db77f523047d84,0x32caab7b40c72493
+        DQ      0x3c9ebe0a15c9bebc,0x431d67c49c100d4c
+        DQ      0x3c9ebe0a15c9bebc,0x431d67c49c100d4c
+        DQ      0x4cc5d4becb3e42b6,0x597f299cfc657e2a
+        DQ      0x4cc5d4becb3e42b6,0x597f299cfc657e2a
+        DQ      0x5fcb6fab3ad6faec,0x6c44198c4a475817
+        DQ      0x5fcb6fab3ad6faec,0x6c44198c4a475817
+
+        DQ      0x0001020304050607,0x08090a0b0c0d0e0f
+        DQ      0x0001020304050607,0x08090a0b0c0d0e0f
+DB      83,72,65,53,49,50,32,98,108,111,99,107,32,116,114,97
+DB      110,115,102,111,114,109,32,102,111,114,32,120,56,54,95,54
+DB      52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121
+DB      32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46
+DB      111,114,103,62,0
+
+ALIGN   64
+sha512_block_data_order_xop:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha512_block_data_order_xop:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+$L$xop_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        shl     rdx,4
+        sub     rsp,256
+        lea     rdx,[rdx*8+rsi]
+        and     rsp,-64
+        mov     QWORD[((128+0))+rsp],rdi
+        mov     QWORD[((128+8))+rsp],rsi
+        mov     QWORD[((128+16))+rsp],rdx
+        mov     QWORD[152+rsp],rax
+
+        movaps  XMMWORD[(128+32)+rsp],xmm6
+        movaps  XMMWORD[(128+48)+rsp],xmm7
+        movaps  XMMWORD[(128+64)+rsp],xmm8
+        movaps  XMMWORD[(128+80)+rsp],xmm9
+        movaps  XMMWORD[(128+96)+rsp],xmm10
+        movaps  XMMWORD[(128+112)+rsp],xmm11
+$L$prologue_xop:
+
+        vzeroupper
+        mov     rax,QWORD[rdi]
+        mov     rbx,QWORD[8+rdi]
+        mov     rcx,QWORD[16+rdi]
+        mov     rdx,QWORD[24+rdi]
+        mov     r8,QWORD[32+rdi]
+        mov     r9,QWORD[40+rdi]
+        mov     r10,QWORD[48+rdi]
+        mov     r11,QWORD[56+rdi]
+        jmp     NEAR $L$loop_xop
+ALIGN   16
+$L$loop_xop:
+        vmovdqa xmm11,XMMWORD[((K512+1280))]
+        vmovdqu xmm0,XMMWORD[rsi]
+        lea     rbp,[((K512+128))]
+        vmovdqu xmm1,XMMWORD[16+rsi]
+        vmovdqu xmm2,XMMWORD[32+rsi]
+        vpshufb xmm0,xmm0,xmm11
+        vmovdqu xmm3,XMMWORD[48+rsi]
+        vpshufb xmm1,xmm1,xmm11
+        vmovdqu xmm4,XMMWORD[64+rsi]
+        vpshufb xmm2,xmm2,xmm11
+        vmovdqu xmm5,XMMWORD[80+rsi]
+        vpshufb xmm3,xmm3,xmm11
+        vmovdqu xmm6,XMMWORD[96+rsi]
+        vpshufb xmm4,xmm4,xmm11
+        vmovdqu xmm7,XMMWORD[112+rsi]
+        vpshufb xmm5,xmm5,xmm11
+        vpaddq  xmm8,xmm0,XMMWORD[((-128))+rbp]
+        vpshufb xmm6,xmm6,xmm11
+        vpaddq  xmm9,xmm1,XMMWORD[((-96))+rbp]
+        vpshufb xmm7,xmm7,xmm11
+        vpaddq  xmm10,xmm2,XMMWORD[((-64))+rbp]
+        vpaddq  xmm11,xmm3,XMMWORD[((-32))+rbp]
+        vmovdqa XMMWORD[rsp],xmm8
+        vpaddq  xmm8,xmm4,XMMWORD[rbp]
+        vmovdqa XMMWORD[16+rsp],xmm9
+        vpaddq  xmm9,xmm5,XMMWORD[32+rbp]
+        vmovdqa XMMWORD[32+rsp],xmm10
+        vpaddq  xmm10,xmm6,XMMWORD[64+rbp]
+        vmovdqa XMMWORD[48+rsp],xmm11
+        vpaddq  xmm11,xmm7,XMMWORD[96+rbp]
+        vmovdqa XMMWORD[64+rsp],xmm8
+        mov     r14,rax
+        vmovdqa XMMWORD[80+rsp],xmm9
+        mov     rdi,rbx
+        vmovdqa XMMWORD[96+rsp],xmm10
+        xor     rdi,rcx
+        vmovdqa XMMWORD[112+rsp],xmm11
+        mov     r13,r8
+        jmp     NEAR $L$xop_00_47
+
+ALIGN   16
+$L$xop_00_47:
+        add     rbp,256
+        vpalignr        xmm8,xmm1,xmm0,8
+        ror     r13,23
+        mov     rax,r14
+        vpalignr        xmm11,xmm5,xmm4,8
+        mov     r12,r9
+        ror     r14,5
+DB      143,72,120,195,200,56
+        xor     r13,r8
+        xor     r12,r10
+        vpsrlq  xmm8,xmm8,7
+        ror     r13,4
+        xor     r14,rax
+        vpaddq  xmm0,xmm0,xmm11
+        and     r12,r8
+        xor     r13,r8
+        add     r11,QWORD[rsp]
+        mov     r15,rax
+DB      143,72,120,195,209,7
+        xor     r12,r10
+        ror     r14,6
+        vpxor   xmm8,xmm8,xmm9
+        xor     r15,rbx
+        add     r11,r12
+        ror     r13,14
+        and     rdi,r15
+DB      143,104,120,195,223,3
+        xor     r14,rax
+        add     r11,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,rbx
+        ror     r14,28
+        vpsrlq  xmm10,xmm7,6
+        add     rdx,r11
+        add     r11,rdi
+        vpaddq  xmm0,xmm0,xmm8
+        mov     r13,rdx
+        add     r14,r11
+DB      143,72,120,195,203,42
+        ror     r13,23
+        mov     r11,r14
+        vpxor   xmm11,xmm11,xmm10
+        mov     r12,r8
+        ror     r14,5
+        xor     r13,rdx
+        xor     r12,r9
+        vpxor   xmm11,xmm11,xmm9
+        ror     r13,4
+        xor     r14,r11
+        and     r12,rdx
+        xor     r13,rdx
+        vpaddq  xmm0,xmm0,xmm11
+        add     r10,QWORD[8+rsp]
+        mov     rdi,r11
+        xor     r12,r9
+        ror     r14,6
+        vpaddq  xmm10,xmm0,XMMWORD[((-128))+rbp]
+        xor     rdi,rax
+        add     r10,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,r11
+        add     r10,r13
+        xor     r15,rax
+        ror     r14,28
+        add     rcx,r10
+        add     r10,r15
+        mov     r13,rcx
+        add     r14,r10
+        vmovdqa XMMWORD[rsp],xmm10
+        vpalignr        xmm8,xmm2,xmm1,8
+        ror     r13,23
+        mov     r10,r14
+        vpalignr        xmm11,xmm6,xmm5,8
+        mov     r12,rdx
+        ror     r14,5
+DB      143,72,120,195,200,56
+        xor     r13,rcx
+        xor     r12,r8
+        vpsrlq  xmm8,xmm8,7
+        ror     r13,4
+        xor     r14,r10
+        vpaddq  xmm1,xmm1,xmm11
+        and     r12,rcx
+        xor     r13,rcx
+        add     r9,QWORD[16+rsp]
+        mov     r15,r10
+DB      143,72,120,195,209,7
+        xor     r12,r8
+        ror     r14,6
+        vpxor   xmm8,xmm8,xmm9
+        xor     r15,r11
+        add     r9,r12
+        ror     r13,14
+        and     rdi,r15
+DB      143,104,120,195,216,3
+        xor     r14,r10
+        add     r9,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,r11
+        ror     r14,28
+        vpsrlq  xmm10,xmm0,6
+        add     rbx,r9
+        add     r9,rdi
+        vpaddq  xmm1,xmm1,xmm8
+        mov     r13,rbx
+        add     r14,r9
+DB      143,72,120,195,203,42
+        ror     r13,23
+        mov     r9,r14
+        vpxor   xmm11,xmm11,xmm10
+        mov     r12,rcx
+        ror     r14,5
+        xor     r13,rbx
+        xor     r12,rdx
+        vpxor   xmm11,xmm11,xmm9
+        ror     r13,4
+        xor     r14,r9
+        and     r12,rbx
+        xor     r13,rbx
+        vpaddq  xmm1,xmm1,xmm11
+        add     r8,QWORD[24+rsp]
+        mov     rdi,r9
+        xor     r12,rdx
+        ror     r14,6
+        vpaddq  xmm10,xmm1,XMMWORD[((-96))+rbp]
+        xor     rdi,r10
+        add     r8,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,r9
+        add     r8,r13
+        xor     r15,r10
+        ror     r14,28
+        add     rax,r8
+        add     r8,r15
+        mov     r13,rax
+        add     r14,r8
+        vmovdqa XMMWORD[16+rsp],xmm10
+        vpalignr        xmm8,xmm3,xmm2,8
+        ror     r13,23
+        mov     r8,r14
+        vpalignr        xmm11,xmm7,xmm6,8
+        mov     r12,rbx
+        ror     r14,5
+DB      143,72,120,195,200,56
+        xor     r13,rax
+        xor     r12,rcx
+        vpsrlq  xmm8,xmm8,7
+        ror     r13,4
+        xor     r14,r8
+        vpaddq  xmm2,xmm2,xmm11
+        and     r12,rax
+        xor     r13,rax
+        add     rdx,QWORD[32+rsp]
+        mov     r15,r8
+DB      143,72,120,195,209,7
+        xor     r12,rcx
+        ror     r14,6
+        vpxor   xmm8,xmm8,xmm9
+        xor     r15,r9
+        add     rdx,r12
+        ror     r13,14
+        and     rdi,r15
+DB      143,104,120,195,217,3
+        xor     r14,r8
+        add     rdx,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,r9
+        ror     r14,28
+        vpsrlq  xmm10,xmm1,6
+        add     r11,rdx
+        add     rdx,rdi
+        vpaddq  xmm2,xmm2,xmm8
+        mov     r13,r11
+        add     r14,rdx
+DB      143,72,120,195,203,42
+        ror     r13,23
+        mov     rdx,r14
+        vpxor   xmm11,xmm11,xmm10
+        mov     r12,rax
+        ror     r14,5
+        xor     r13,r11
+        xor     r12,rbx
+        vpxor   xmm11,xmm11,xmm9
+        ror     r13,4
+        xor     r14,rdx
+        and     r12,r11
+        xor     r13,r11
+        vpaddq  xmm2,xmm2,xmm11
+        add     rcx,QWORD[40+rsp]
+        mov     rdi,rdx
+        xor     r12,rbx
+        ror     r14,6
+        vpaddq  xmm10,xmm2,XMMWORD[((-64))+rbp]
+        xor     rdi,r8
+        add     rcx,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,rdx
+        add     rcx,r13
+        xor     r15,r8
+        ror     r14,28
+        add     r10,rcx
+        add     rcx,r15
+        mov     r13,r10
+        add     r14,rcx
+        vmovdqa XMMWORD[32+rsp],xmm10
+        vpalignr        xmm8,xmm4,xmm3,8
+        ror     r13,23
+        mov     rcx,r14
+        vpalignr        xmm11,xmm0,xmm7,8
+        mov     r12,r11
+        ror     r14,5
+DB      143,72,120,195,200,56
+        xor     r13,r10
+        xor     r12,rax
+        vpsrlq  xmm8,xmm8,7
+        ror     r13,4
+        xor     r14,rcx
+        vpaddq  xmm3,xmm3,xmm11
+        and     r12,r10
+        xor     r13,r10
+        add     rbx,QWORD[48+rsp]
+        mov     r15,rcx
+DB      143,72,120,195,209,7
+        xor     r12,rax
+        ror     r14,6
+        vpxor   xmm8,xmm8,xmm9
+        xor     r15,rdx
+        add     rbx,r12
+        ror     r13,14
+        and     rdi,r15
+DB      143,104,120,195,218,3
+        xor     r14,rcx
+        add     rbx,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,rdx
+        ror     r14,28
+        vpsrlq  xmm10,xmm2,6
+        add     r9,rbx
+        add     rbx,rdi
+        vpaddq  xmm3,xmm3,xmm8
+        mov     r13,r9
+        add     r14,rbx
+DB      143,72,120,195,203,42
+        ror     r13,23
+        mov     rbx,r14
+        vpxor   xmm11,xmm11,xmm10
+        mov     r12,r10
+        ror     r14,5
+        xor     r13,r9
+        xor     r12,r11
+        vpxor   xmm11,xmm11,xmm9
+        ror     r13,4
+        xor     r14,rbx
+        and     r12,r9
+        xor     r13,r9
+        vpaddq  xmm3,xmm3,xmm11
+        add     rax,QWORD[56+rsp]
+        mov     rdi,rbx
+        xor     r12,r11
+        ror     r14,6
+        vpaddq  xmm10,xmm3,XMMWORD[((-32))+rbp]
+        xor     rdi,rcx
+        add     rax,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,rbx
+        add     rax,r13
+        xor     r15,rcx
+        ror     r14,28
+        add     r8,rax
+        add     rax,r15
+        mov     r13,r8
+        add     r14,rax
+        vmovdqa XMMWORD[48+rsp],xmm10
+        vpalignr        xmm8,xmm5,xmm4,8
+        ror     r13,23
+        mov     rax,r14
+        vpalignr        xmm11,xmm1,xmm0,8
+        mov     r12,r9
+        ror     r14,5
+DB      143,72,120,195,200,56
+        xor     r13,r8
+        xor     r12,r10
+        vpsrlq  xmm8,xmm8,7
+        ror     r13,4
+        xor     r14,rax
+        vpaddq  xmm4,xmm4,xmm11
+        and     r12,r8
+        xor     r13,r8
+        add     r11,QWORD[64+rsp]
+        mov     r15,rax
+DB      143,72,120,195,209,7
+        xor     r12,r10
+        ror     r14,6
+        vpxor   xmm8,xmm8,xmm9
+        xor     r15,rbx
+        add     r11,r12
+        ror     r13,14
+        and     rdi,r15
+DB      143,104,120,195,219,3
+        xor     r14,rax
+        add     r11,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,rbx
+        ror     r14,28
+        vpsrlq  xmm10,xmm3,6
+        add     rdx,r11
+        add     r11,rdi
+        vpaddq  xmm4,xmm4,xmm8
+        mov     r13,rdx
+        add     r14,r11
+DB      143,72,120,195,203,42
+        ror     r13,23
+        mov     r11,r14
+        vpxor   xmm11,xmm11,xmm10
+        mov     r12,r8
+        ror     r14,5
+        xor     r13,rdx
+        xor     r12,r9
+        vpxor   xmm11,xmm11,xmm9
+        ror     r13,4
+        xor     r14,r11
+        and     r12,rdx
+        xor     r13,rdx
+        vpaddq  xmm4,xmm4,xmm11
+        add     r10,QWORD[72+rsp]
+        mov     rdi,r11
+        xor     r12,r9
+        ror     r14,6
+        vpaddq  xmm10,xmm4,XMMWORD[rbp]
+        xor     rdi,rax
+        add     r10,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,r11
+        add     r10,r13
+        xor     r15,rax
+        ror     r14,28
+        add     rcx,r10
+        add     r10,r15
+        mov     r13,rcx
+        add     r14,r10
+        vmovdqa XMMWORD[64+rsp],xmm10
+        vpalignr        xmm8,xmm6,xmm5,8
+        ror     r13,23
+        mov     r10,r14
+        vpalignr        xmm11,xmm2,xmm1,8
+        mov     r12,rdx
+        ror     r14,5
+DB      143,72,120,195,200,56
+        xor     r13,rcx
+        xor     r12,r8
+        vpsrlq  xmm8,xmm8,7
+        ror     r13,4
+        xor     r14,r10
+        vpaddq  xmm5,xmm5,xmm11
+        and     r12,rcx
+        xor     r13,rcx
+        add     r9,QWORD[80+rsp]
+        mov     r15,r10
+DB      143,72,120,195,209,7
+        xor     r12,r8
+        ror     r14,6
+        vpxor   xmm8,xmm8,xmm9
+        xor     r15,r11
+        add     r9,r12
+        ror     r13,14
+        and     rdi,r15
+DB      143,104,120,195,220,3
+        xor     r14,r10
+        add     r9,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,r11
+        ror     r14,28
+        vpsrlq  xmm10,xmm4,6
+        add     rbx,r9
+        add     r9,rdi
+        vpaddq  xmm5,xmm5,xmm8
+        mov     r13,rbx
+        add     r14,r9
+DB      143,72,120,195,203,42
+        ror     r13,23
+        mov     r9,r14
+        vpxor   xmm11,xmm11,xmm10
+        mov     r12,rcx
+        ror     r14,5
+        xor     r13,rbx
+        xor     r12,rdx
+        vpxor   xmm11,xmm11,xmm9
+        ror     r13,4
+        xor     r14,r9
+        and     r12,rbx
+        xor     r13,rbx
+        vpaddq  xmm5,xmm5,xmm11
+        add     r8,QWORD[88+rsp]
+        mov     rdi,r9
+        xor     r12,rdx
+        ror     r14,6
+        vpaddq  xmm10,xmm5,XMMWORD[32+rbp]
+        xor     rdi,r10
+        add     r8,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,r9
+        add     r8,r13
+        xor     r15,r10
+        ror     r14,28
+        add     rax,r8
+        add     r8,r15
+        mov     r13,rax
+        add     r14,r8
+        vmovdqa XMMWORD[80+rsp],xmm10
+        vpalignr        xmm8,xmm7,xmm6,8
+        ror     r13,23
+        mov     r8,r14
+        vpalignr        xmm11,xmm3,xmm2,8
+        mov     r12,rbx
+        ror     r14,5
+DB      143,72,120,195,200,56
+        xor     r13,rax
+        xor     r12,rcx
+        vpsrlq  xmm8,xmm8,7
+        ror     r13,4
+        xor     r14,r8
+        vpaddq  xmm6,xmm6,xmm11
+        and     r12,rax
+        xor     r13,rax
+        add     rdx,QWORD[96+rsp]
+        mov     r15,r8
+DB      143,72,120,195,209,7
+        xor     r12,rcx
+        ror     r14,6
+        vpxor   xmm8,xmm8,xmm9
+        xor     r15,r9
+        add     rdx,r12
+        ror     r13,14
+        and     rdi,r15
+DB      143,104,120,195,221,3
+        xor     r14,r8
+        add     rdx,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,r9
+        ror     r14,28
+        vpsrlq  xmm10,xmm5,6
+        add     r11,rdx
+        add     rdx,rdi
+        vpaddq  xmm6,xmm6,xmm8
+        mov     r13,r11
+        add     r14,rdx
+DB      143,72,120,195,203,42
+        ror     r13,23
+        mov     rdx,r14
+        vpxor   xmm11,xmm11,xmm10
+        mov     r12,rax
+        ror     r14,5
+        xor     r13,r11
+        xor     r12,rbx
+        vpxor   xmm11,xmm11,xmm9
+        ror     r13,4
+        xor     r14,rdx
+        and     r12,r11
+        xor     r13,r11
+        vpaddq  xmm6,xmm6,xmm11
+        add     rcx,QWORD[104+rsp]
+        mov     rdi,rdx
+        xor     r12,rbx
+        ror     r14,6
+        vpaddq  xmm10,xmm6,XMMWORD[64+rbp]
+        xor     rdi,r8
+        add     rcx,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,rdx
+        add     rcx,r13
+        xor     r15,r8
+        ror     r14,28
+        add     r10,rcx
+        add     rcx,r15
+        mov     r13,r10
+        add     r14,rcx
+        vmovdqa XMMWORD[96+rsp],xmm10
+        vpalignr        xmm8,xmm0,xmm7,8
+        ror     r13,23
+        mov     rcx,r14
+        vpalignr        xmm11,xmm4,xmm3,8
+        mov     r12,r11
+        ror     r14,5
+DB      143,72,120,195,200,56
+        xor     r13,r10
+        xor     r12,rax
+        vpsrlq  xmm8,xmm8,7
+        ror     r13,4
+        xor     r14,rcx
+        vpaddq  xmm7,xmm7,xmm11
+        and     r12,r10
+        xor     r13,r10
+        add     rbx,QWORD[112+rsp]
+        mov     r15,rcx
+DB      143,72,120,195,209,7
+        xor     r12,rax
+        ror     r14,6
+        vpxor   xmm8,xmm8,xmm9
+        xor     r15,rdx
+        add     rbx,r12
+        ror     r13,14
+        and     rdi,r15
+DB      143,104,120,195,222,3
+        xor     r14,rcx
+        add     rbx,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,rdx
+        ror     r14,28
+        vpsrlq  xmm10,xmm6,6
+        add     r9,rbx
+        add     rbx,rdi
+        vpaddq  xmm7,xmm7,xmm8
+        mov     r13,r9
+        add     r14,rbx
+DB      143,72,120,195,203,42
+        ror     r13,23
+        mov     rbx,r14
+        vpxor   xmm11,xmm11,xmm10
+        mov     r12,r10
+        ror     r14,5
+        xor     r13,r9
+        xor     r12,r11
+        vpxor   xmm11,xmm11,xmm9
+        ror     r13,4
+        xor     r14,rbx
+        and     r12,r9
+        xor     r13,r9
+        vpaddq  xmm7,xmm7,xmm11
+        add     rax,QWORD[120+rsp]
+        mov     rdi,rbx
+        xor     r12,r11
+        ror     r14,6
+        vpaddq  xmm10,xmm7,XMMWORD[96+rbp]
+        xor     rdi,rcx
+        add     rax,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,rbx
+        add     rax,r13
+        xor     r15,rcx
+        ror     r14,28
+        add     r8,rax
+        add     rax,r15
+        mov     r13,r8
+        add     r14,rax
+        vmovdqa XMMWORD[112+rsp],xmm10
+        cmp     BYTE[135+rbp],0
+        jne     NEAR $L$xop_00_47
+        ror     r13,23
+        mov     rax,r14
+        mov     r12,r9
+        ror     r14,5
+        xor     r13,r8
+        xor     r12,r10
+        ror     r13,4
+        xor     r14,rax
+        and     r12,r8
+        xor     r13,r8
+        add     r11,QWORD[rsp]
+        mov     r15,rax
+        xor     r12,r10
+        ror     r14,6
+        xor     r15,rbx
+        add     r11,r12
+        ror     r13,14
+        and     rdi,r15
+        xor     r14,rax
+        add     r11,r13
+        xor     rdi,rbx
+        ror     r14,28
+        add     rdx,r11
+        add     r11,rdi
+        mov     r13,rdx
+        add     r14,r11
+        ror     r13,23
+        mov     r11,r14
+        mov     r12,r8
+        ror     r14,5
+        xor     r13,rdx
+        xor     r12,r9
+        ror     r13,4
+        xor     r14,r11
+        and     r12,rdx
+        xor     r13,rdx
+        add     r10,QWORD[8+rsp]
+        mov     rdi,r11
+        xor     r12,r9
+        ror     r14,6
+        xor     rdi,rax
+        add     r10,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,r11
+        add     r10,r13
+        xor     r15,rax
+        ror     r14,28
+        add     rcx,r10
+        add     r10,r15
+        mov     r13,rcx
+        add     r14,r10
+        ror     r13,23
+        mov     r10,r14
+        mov     r12,rdx
+        ror     r14,5
+        xor     r13,rcx
+        xor     r12,r8
+        ror     r13,4
+        xor     r14,r10
+        and     r12,rcx
+        xor     r13,rcx
+        add     r9,QWORD[16+rsp]
+        mov     r15,r10
+        xor     r12,r8
+        ror     r14,6
+        xor     r15,r11
+        add     r9,r12
+        ror     r13,14
+        and     rdi,r15
+        xor     r14,r10
+        add     r9,r13
+        xor     rdi,r11
+        ror     r14,28
+        add     rbx,r9
+        add     r9,rdi
+        mov     r13,rbx
+        add     r14,r9
+        ror     r13,23
+        mov     r9,r14
+        mov     r12,rcx
+        ror     r14,5
+        xor     r13,rbx
+        xor     r12,rdx
+        ror     r13,4
+        xor     r14,r9
+        and     r12,rbx
+        xor     r13,rbx
+        add     r8,QWORD[24+rsp]
+        mov     rdi,r9
+        xor     r12,rdx
+        ror     r14,6
+        xor     rdi,r10
+        add     r8,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,r9
+        add     r8,r13
+        xor     r15,r10
+        ror     r14,28
+        add     rax,r8
+        add     r8,r15
+        mov     r13,rax
+        add     r14,r8
+        ror     r13,23
+        mov     r8,r14
+        mov     r12,rbx
+        ror     r14,5
+        xor     r13,rax
+        xor     r12,rcx
+        ror     r13,4
+        xor     r14,r8
+        and     r12,rax
+        xor     r13,rax
+        add     rdx,QWORD[32+rsp]
+        mov     r15,r8
+        xor     r12,rcx
+        ror     r14,6
+        xor     r15,r9
+        add     rdx,r12
+        ror     r13,14
+        and     rdi,r15
+        xor     r14,r8
+        add     rdx,r13
+        xor     rdi,r9
+        ror     r14,28
+        add     r11,rdx
+        add     rdx,rdi
+        mov     r13,r11
+        add     r14,rdx
+        ror     r13,23
+        mov     rdx,r14
+        mov     r12,rax
+        ror     r14,5
+        xor     r13,r11
+        xor     r12,rbx
+        ror     r13,4
+        xor     r14,rdx
+        and     r12,r11
+        xor     r13,r11
+        add     rcx,QWORD[40+rsp]
+        mov     rdi,rdx
+        xor     r12,rbx
+        ror     r14,6
+        xor     rdi,r8
+        add     rcx,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,rdx
+        add     rcx,r13
+        xor     r15,r8
+        ror     r14,28
+        add     r10,rcx
+        add     rcx,r15
+        mov     r13,r10
+        add     r14,rcx
+        ror     r13,23
+        mov     rcx,r14
+        mov     r12,r11
+        ror     r14,5
+        xor     r13,r10
+        xor     r12,rax
+        ror     r13,4
+        xor     r14,rcx
+        and     r12,r10
+        xor     r13,r10
+        add     rbx,QWORD[48+rsp]
+        mov     r15,rcx
+        xor     r12,rax
+        ror     r14,6
+        xor     r15,rdx
+        add     rbx,r12
+        ror     r13,14
+        and     rdi,r15
+        xor     r14,rcx
+        add     rbx,r13
+        xor     rdi,rdx
+        ror     r14,28
+        add     r9,rbx
+        add     rbx,rdi
+        mov     r13,r9
+        add     r14,rbx
+        ror     r13,23
+        mov     rbx,r14
+        mov     r12,r10
+        ror     r14,5
+        xor     r13,r9
+        xor     r12,r11
+        ror     r13,4
+        xor     r14,rbx
+        and     r12,r9
+        xor     r13,r9
+        add     rax,QWORD[56+rsp]
+        mov     rdi,rbx
+        xor     r12,r11
+        ror     r14,6
+        xor     rdi,rcx
+        add     rax,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,rbx
+        add     rax,r13
+        xor     r15,rcx
+        ror     r14,28
+        add     r8,rax
+        add     rax,r15
+        mov     r13,r8
+        add     r14,rax
+        ror     r13,23
+        mov     rax,r14
+        mov     r12,r9
+        ror     r14,5
+        xor     r13,r8
+        xor     r12,r10
+        ror     r13,4
+        xor     r14,rax
+        and     r12,r8
+        xor     r13,r8
+        add     r11,QWORD[64+rsp]
+        mov     r15,rax
+        xor     r12,r10
+        ror     r14,6
+        xor     r15,rbx
+        add     r11,r12
+        ror     r13,14
+        and     rdi,r15
+        xor     r14,rax
+        add     r11,r13
+        xor     rdi,rbx
+        ror     r14,28
+        add     rdx,r11
+        add     r11,rdi
+        mov     r13,rdx
+        add     r14,r11
+        ror     r13,23
+        mov     r11,r14
+        mov     r12,r8
+        ror     r14,5
+        xor     r13,rdx
+        xor     r12,r9
+        ror     r13,4
+        xor     r14,r11
+        and     r12,rdx
+        xor     r13,rdx
+        add     r10,QWORD[72+rsp]
+        mov     rdi,r11
+        xor     r12,r9
+        ror     r14,6
+        xor     rdi,rax
+        add     r10,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,r11
+        add     r10,r13
+        xor     r15,rax
+        ror     r14,28
+        add     rcx,r10
+        add     r10,r15
+        mov     r13,rcx
+        add     r14,r10
+        ror     r13,23
+        mov     r10,r14
+        mov     r12,rdx
+        ror     r14,5
+        xor     r13,rcx
+        xor     r12,r8
+        ror     r13,4
+        xor     r14,r10
+        and     r12,rcx
+        xor     r13,rcx
+        add     r9,QWORD[80+rsp]
+        mov     r15,r10
+        xor     r12,r8
+        ror     r14,6
+        xor     r15,r11
+        add     r9,r12
+        ror     r13,14
+        and     rdi,r15
+        xor     r14,r10
+        add     r9,r13
+        xor     rdi,r11
+        ror     r14,28
+        add     rbx,r9
+        add     r9,rdi
+        mov     r13,rbx
+        add     r14,r9
+        ror     r13,23
+        mov     r9,r14
+        mov     r12,rcx
+        ror     r14,5
+        xor     r13,rbx
+        xor     r12,rdx
+        ror     r13,4
+        xor     r14,r9
+        and     r12,rbx
+        xor     r13,rbx
+        add     r8,QWORD[88+rsp]
+        mov     rdi,r9
+        xor     r12,rdx
+        ror     r14,6
+        xor     rdi,r10
+        add     r8,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,r9
+        add     r8,r13
+        xor     r15,r10
+        ror     r14,28
+        add     rax,r8
+        add     r8,r15
+        mov     r13,rax
+        add     r14,r8
+        ror     r13,23
+        mov     r8,r14
+        mov     r12,rbx
+        ror     r14,5
+        xor     r13,rax
+        xor     r12,rcx
+        ror     r13,4
+        xor     r14,r8
+        and     r12,rax
+        xor     r13,rax
+        add     rdx,QWORD[96+rsp]
+        mov     r15,r8
+        xor     r12,rcx
+        ror     r14,6
+        xor     r15,r9
+        add     rdx,r12
+        ror     r13,14
+        and     rdi,r15
+        xor     r14,r8
+        add     rdx,r13
+        xor     rdi,r9
+        ror     r14,28
+        add     r11,rdx
+        add     rdx,rdi
+        mov     r13,r11
+        add     r14,rdx
+        ror     r13,23
+        mov     rdx,r14
+        mov     r12,rax
+        ror     r14,5
+        xor     r13,r11
+        xor     r12,rbx
+        ror     r13,4
+        xor     r14,rdx
+        and     r12,r11
+        xor     r13,r11
+        add     rcx,QWORD[104+rsp]
+        mov     rdi,rdx
+        xor     r12,rbx
+        ror     r14,6
+        xor     rdi,r8
+        add     rcx,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,rdx
+        add     rcx,r13
+        xor     r15,r8
+        ror     r14,28
+        add     r10,rcx
+        add     rcx,r15
+        mov     r13,r10
+        add     r14,rcx
+        ror     r13,23
+        mov     rcx,r14
+        mov     r12,r11
+        ror     r14,5
+        xor     r13,r10
+        xor     r12,rax
+        ror     r13,4
+        xor     r14,rcx
+        and     r12,r10
+        xor     r13,r10
+        add     rbx,QWORD[112+rsp]
+        mov     r15,rcx
+        xor     r12,rax
+        ror     r14,6
+        xor     r15,rdx
+        add     rbx,r12
+        ror     r13,14
+        and     rdi,r15
+        xor     r14,rcx
+        add     rbx,r13
+        xor     rdi,rdx
+        ror     r14,28
+        add     r9,rbx
+        add     rbx,rdi
+        mov     r13,r9
+        add     r14,rbx
+        ror     r13,23
+        mov     rbx,r14
+        mov     r12,r10
+        ror     r14,5
+        xor     r13,r9
+        xor     r12,r11
+        ror     r13,4
+        xor     r14,rbx
+        and     r12,r9
+        xor     r13,r9
+        add     rax,QWORD[120+rsp]
+        mov     rdi,rbx
+        xor     r12,r11
+        ror     r14,6
+        xor     rdi,rcx
+        add     rax,r12
+        ror     r13,14
+        and     r15,rdi
+        xor     r14,rbx
+        add     rax,r13
+        xor     r15,rcx
+        ror     r14,28
+        add     r8,rax
+        add     rax,r15
+        mov     r13,r8
+        add     r14,rax
+        mov     rdi,QWORD[((128+0))+rsp]
+        mov     rax,r14
+
+        add     rax,QWORD[rdi]
+        lea     rsi,[128+rsi]
+        add     rbx,QWORD[8+rdi]
+        add     rcx,QWORD[16+rdi]
+        add     rdx,QWORD[24+rdi]
+        add     r8,QWORD[32+rdi]
+        add     r9,QWORD[40+rdi]
+        add     r10,QWORD[48+rdi]
+        add     r11,QWORD[56+rdi]
+
+        cmp     rsi,QWORD[((128+16))+rsp]
+
+        mov     QWORD[rdi],rax
+        mov     QWORD[8+rdi],rbx
+        mov     QWORD[16+rdi],rcx
+        mov     QWORD[24+rdi],rdx
+        mov     QWORD[32+rdi],r8
+        mov     QWORD[40+rdi],r9
+        mov     QWORD[48+rdi],r10
+        mov     QWORD[56+rdi],r11
+        jb      NEAR $L$loop_xop
+
+        mov     rsi,QWORD[152+rsp]
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((128+32))+rsp]
+        movaps  xmm7,XMMWORD[((128+48))+rsp]
+        movaps  xmm8,XMMWORD[((128+64))+rsp]
+        movaps  xmm9,XMMWORD[((128+80))+rsp]
+        movaps  xmm10,XMMWORD[((128+96))+rsp]
+        movaps  xmm11,XMMWORD[((128+112))+rsp]
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue_xop:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha512_block_data_order_xop:
+
+ALIGN   64
+sha512_block_data_order_avx:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha512_block_data_order_avx:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+$L$avx_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        shl     rdx,4
+        sub     rsp,256
+        lea     rdx,[rdx*8+rsi]
+        and     rsp,-64
+        mov     QWORD[((128+0))+rsp],rdi
+        mov     QWORD[((128+8))+rsp],rsi
+        mov     QWORD[((128+16))+rsp],rdx
+        mov     QWORD[152+rsp],rax
+
+        movaps  XMMWORD[(128+32)+rsp],xmm6
+        movaps  XMMWORD[(128+48)+rsp],xmm7
+        movaps  XMMWORD[(128+64)+rsp],xmm8
+        movaps  XMMWORD[(128+80)+rsp],xmm9
+        movaps  XMMWORD[(128+96)+rsp],xmm10
+        movaps  XMMWORD[(128+112)+rsp],xmm11
+$L$prologue_avx:
+
+        vzeroupper
+        mov     rax,QWORD[rdi]
+        mov     rbx,QWORD[8+rdi]
+        mov     rcx,QWORD[16+rdi]
+        mov     rdx,QWORD[24+rdi]
+        mov     r8,QWORD[32+rdi]
+        mov     r9,QWORD[40+rdi]
+        mov     r10,QWORD[48+rdi]
+        mov     r11,QWORD[56+rdi]
+        jmp     NEAR $L$loop_avx
+ALIGN   16
+$L$loop_avx:
+        vmovdqa xmm11,XMMWORD[((K512+1280))]
+        vmovdqu xmm0,XMMWORD[rsi]
+        lea     rbp,[((K512+128))]
+        vmovdqu xmm1,XMMWORD[16+rsi]
+        vmovdqu xmm2,XMMWORD[32+rsi]
+        vpshufb xmm0,xmm0,xmm11
+        vmovdqu xmm3,XMMWORD[48+rsi]
+        vpshufb xmm1,xmm1,xmm11
+        vmovdqu xmm4,XMMWORD[64+rsi]
+        vpshufb xmm2,xmm2,xmm11
+        vmovdqu xmm5,XMMWORD[80+rsi]
+        vpshufb xmm3,xmm3,xmm11
+        vmovdqu xmm6,XMMWORD[96+rsi]
+        vpshufb xmm4,xmm4,xmm11
+        vmovdqu xmm7,XMMWORD[112+rsi]
+        vpshufb xmm5,xmm5,xmm11
+        vpaddq  xmm8,xmm0,XMMWORD[((-128))+rbp]
+        vpshufb xmm6,xmm6,xmm11
+        vpaddq  xmm9,xmm1,XMMWORD[((-96))+rbp]
+        vpshufb xmm7,xmm7,xmm11
+        vpaddq  xmm10,xmm2,XMMWORD[((-64))+rbp]
+        vpaddq  xmm11,xmm3,XMMWORD[((-32))+rbp]
+        vmovdqa XMMWORD[rsp],xmm8
+        vpaddq  xmm8,xmm4,XMMWORD[rbp]
+        vmovdqa XMMWORD[16+rsp],xmm9
+        vpaddq  xmm9,xmm5,XMMWORD[32+rbp]
+        vmovdqa XMMWORD[32+rsp],xmm10
+        vpaddq  xmm10,xmm6,XMMWORD[64+rbp]
+        vmovdqa XMMWORD[48+rsp],xmm11
+        vpaddq  xmm11,xmm7,XMMWORD[96+rbp]
+        vmovdqa XMMWORD[64+rsp],xmm8
+        mov     r14,rax
+        vmovdqa XMMWORD[80+rsp],xmm9
+        mov     rdi,rbx
+        vmovdqa XMMWORD[96+rsp],xmm10
+        xor     rdi,rcx
+        vmovdqa XMMWORD[112+rsp],xmm11
+        mov     r13,r8
+        jmp     NEAR $L$avx_00_47
+
+ALIGN   16
+$L$avx_00_47:
+        add     rbp,256
+        vpalignr        xmm8,xmm1,xmm0,8
+        shrd    r13,r13,23
+        mov     rax,r14
+        vpalignr        xmm11,xmm5,xmm4,8
+        mov     r12,r9
+        shrd    r14,r14,5
+        vpsrlq  xmm10,xmm8,1
+        xor     r13,r8
+        xor     r12,r10
+        vpaddq  xmm0,xmm0,xmm11
+        shrd    r13,r13,4
+        xor     r14,rax
+        vpsrlq  xmm11,xmm8,7
+        and     r12,r8
+        xor     r13,r8
+        vpsllq  xmm9,xmm8,56
+        add     r11,QWORD[rsp]
+        mov     r15,rax
+        vpxor   xmm8,xmm11,xmm10
+        xor     r12,r10
+        shrd    r14,r14,6
+        vpsrlq  xmm10,xmm10,7
+        xor     r15,rbx
+        add     r11,r12
+        vpxor   xmm8,xmm8,xmm9
+        shrd    r13,r13,14
+        and     rdi,r15
+        vpsllq  xmm9,xmm9,7
+        xor     r14,rax
+        add     r11,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,rbx
+        shrd    r14,r14,28
+        vpsrlq  xmm11,xmm7,6
+        add     rdx,r11
+        add     r11,rdi
+        vpxor   xmm8,xmm8,xmm9
+        mov     r13,rdx
+        add     r14,r11
+        vpsllq  xmm10,xmm7,3
+        shrd    r13,r13,23
+        mov     r11,r14
+        vpaddq  xmm0,xmm0,xmm8
+        mov     r12,r8
+        shrd    r14,r14,5
+        vpsrlq  xmm9,xmm7,19
+        xor     r13,rdx
+        xor     r12,r9
+        vpxor   xmm11,xmm11,xmm10
+        shrd    r13,r13,4
+        xor     r14,r11
+        vpsllq  xmm10,xmm10,42
+        and     r12,rdx
+        xor     r13,rdx
+        vpxor   xmm11,xmm11,xmm9
+        add     r10,QWORD[8+rsp]
+        mov     rdi,r11
+        vpsrlq  xmm9,xmm9,42
+        xor     r12,r9
+        shrd    r14,r14,6
+        vpxor   xmm11,xmm11,xmm10
+        xor     rdi,rax
+        add     r10,r12
+        vpxor   xmm11,xmm11,xmm9
+        shrd    r13,r13,14
+        and     r15,rdi
+        vpaddq  xmm0,xmm0,xmm11
+        xor     r14,r11
+        add     r10,r13
+        vpaddq  xmm10,xmm0,XMMWORD[((-128))+rbp]
+        xor     r15,rax
+        shrd    r14,r14,28
+        add     rcx,r10
+        add     r10,r15
+        mov     r13,rcx
+        add     r14,r10
+        vmovdqa XMMWORD[rsp],xmm10
+        vpalignr        xmm8,xmm2,xmm1,8
+        shrd    r13,r13,23
+        mov     r10,r14
+        vpalignr        xmm11,xmm6,xmm5,8
+        mov     r12,rdx
+        shrd    r14,r14,5
+        vpsrlq  xmm10,xmm8,1
+        xor     r13,rcx
+        xor     r12,r8
+        vpaddq  xmm1,xmm1,xmm11
+        shrd    r13,r13,4
+        xor     r14,r10
+        vpsrlq  xmm11,xmm8,7
+        and     r12,rcx
+        xor     r13,rcx
+        vpsllq  xmm9,xmm8,56
+        add     r9,QWORD[16+rsp]
+        mov     r15,r10
+        vpxor   xmm8,xmm11,xmm10
+        xor     r12,r8
+        shrd    r14,r14,6
+        vpsrlq  xmm10,xmm10,7
+        xor     r15,r11
+        add     r9,r12
+        vpxor   xmm8,xmm8,xmm9
+        shrd    r13,r13,14
+        and     rdi,r15
+        vpsllq  xmm9,xmm9,7
+        xor     r14,r10
+        add     r9,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,r11
+        shrd    r14,r14,28
+        vpsrlq  xmm11,xmm0,6
+        add     rbx,r9
+        add     r9,rdi
+        vpxor   xmm8,xmm8,xmm9
+        mov     r13,rbx
+        add     r14,r9
+        vpsllq  xmm10,xmm0,3
+        shrd    r13,r13,23
+        mov     r9,r14
+        vpaddq  xmm1,xmm1,xmm8
+        mov     r12,rcx
+        shrd    r14,r14,5
+        vpsrlq  xmm9,xmm0,19
+        xor     r13,rbx
+        xor     r12,rdx
+        vpxor   xmm11,xmm11,xmm10
+        shrd    r13,r13,4
+        xor     r14,r9
+        vpsllq  xmm10,xmm10,42
+        and     r12,rbx
+        xor     r13,rbx
+        vpxor   xmm11,xmm11,xmm9
+        add     r8,QWORD[24+rsp]
+        mov     rdi,r9
+        vpsrlq  xmm9,xmm9,42
+        xor     r12,rdx
+        shrd    r14,r14,6
+        vpxor   xmm11,xmm11,xmm10
+        xor     rdi,r10
+        add     r8,r12
+        vpxor   xmm11,xmm11,xmm9
+        shrd    r13,r13,14
+        and     r15,rdi
+        vpaddq  xmm1,xmm1,xmm11
+        xor     r14,r9
+        add     r8,r13
+        vpaddq  xmm10,xmm1,XMMWORD[((-96))+rbp]
+        xor     r15,r10
+        shrd    r14,r14,28
+        add     rax,r8
+        add     r8,r15
+        mov     r13,rax
+        add     r14,r8
+        vmovdqa XMMWORD[16+rsp],xmm10
+        vpalignr        xmm8,xmm3,xmm2,8
+        shrd    r13,r13,23
+        mov     r8,r14
+        vpalignr        xmm11,xmm7,xmm6,8
+        mov     r12,rbx
+        shrd    r14,r14,5
+        vpsrlq  xmm10,xmm8,1
+        xor     r13,rax
+        xor     r12,rcx
+        vpaddq  xmm2,xmm2,xmm11
+        shrd    r13,r13,4
+        xor     r14,r8
+        vpsrlq  xmm11,xmm8,7
+        and     r12,rax
+        xor     r13,rax
+        vpsllq  xmm9,xmm8,56
+        add     rdx,QWORD[32+rsp]
+        mov     r15,r8
+        vpxor   xmm8,xmm11,xmm10
+        xor     r12,rcx
+        shrd    r14,r14,6
+        vpsrlq  xmm10,xmm10,7
+        xor     r15,r9
+        add     rdx,r12
+        vpxor   xmm8,xmm8,xmm9
+        shrd    r13,r13,14
+        and     rdi,r15
+        vpsllq  xmm9,xmm9,7
+        xor     r14,r8
+        add     rdx,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,r9
+        shrd    r14,r14,28
+        vpsrlq  xmm11,xmm1,6
+        add     r11,rdx
+        add     rdx,rdi
+        vpxor   xmm8,xmm8,xmm9
+        mov     r13,r11
+        add     r14,rdx
+        vpsllq  xmm10,xmm1,3
+        shrd    r13,r13,23
+        mov     rdx,r14
+        vpaddq  xmm2,xmm2,xmm8
+        mov     r12,rax
+        shrd    r14,r14,5
+        vpsrlq  xmm9,xmm1,19
+        xor     r13,r11
+        xor     r12,rbx
+        vpxor   xmm11,xmm11,xmm10
+        shrd    r13,r13,4
+        xor     r14,rdx
+        vpsllq  xmm10,xmm10,42
+        and     r12,r11
+        xor     r13,r11
+        vpxor   xmm11,xmm11,xmm9
+        add     rcx,QWORD[40+rsp]
+        mov     rdi,rdx
+        vpsrlq  xmm9,xmm9,42
+        xor     r12,rbx
+        shrd    r14,r14,6
+        vpxor   xmm11,xmm11,xmm10
+        xor     rdi,r8
+        add     rcx,r12
+        vpxor   xmm11,xmm11,xmm9
+        shrd    r13,r13,14
+        and     r15,rdi
+        vpaddq  xmm2,xmm2,xmm11
+        xor     r14,rdx
+        add     rcx,r13
+        vpaddq  xmm10,xmm2,XMMWORD[((-64))+rbp]
+        xor     r15,r8
+        shrd    r14,r14,28
+        add     r10,rcx
+        add     rcx,r15
+        mov     r13,r10
+        add     r14,rcx
+        vmovdqa XMMWORD[32+rsp],xmm10
+        vpalignr        xmm8,xmm4,xmm3,8
+        shrd    r13,r13,23
+        mov     rcx,r14
+        vpalignr        xmm11,xmm0,xmm7,8
+        mov     r12,r11
+        shrd    r14,r14,5
+        vpsrlq  xmm10,xmm8,1
+        xor     r13,r10
+        xor     r12,rax
+        vpaddq  xmm3,xmm3,xmm11
+        shrd    r13,r13,4
+        xor     r14,rcx
+        vpsrlq  xmm11,xmm8,7
+        and     r12,r10
+        xor     r13,r10
+        vpsllq  xmm9,xmm8,56
+        add     rbx,QWORD[48+rsp]
+        mov     r15,rcx
+        vpxor   xmm8,xmm11,xmm10
+        xor     r12,rax
+        shrd    r14,r14,6
+        vpsrlq  xmm10,xmm10,7
+        xor     r15,rdx
+        add     rbx,r12
+        vpxor   xmm8,xmm8,xmm9
+        shrd    r13,r13,14
+        and     rdi,r15
+        vpsllq  xmm9,xmm9,7
+        xor     r14,rcx
+        add     rbx,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,rdx
+        shrd    r14,r14,28
+        vpsrlq  xmm11,xmm2,6
+        add     r9,rbx
+        add     rbx,rdi
+        vpxor   xmm8,xmm8,xmm9
+        mov     r13,r9
+        add     r14,rbx
+        vpsllq  xmm10,xmm2,3
+        shrd    r13,r13,23
+        mov     rbx,r14
+        vpaddq  xmm3,xmm3,xmm8
+        mov     r12,r10
+        shrd    r14,r14,5
+        vpsrlq  xmm9,xmm2,19
+        xor     r13,r9
+        xor     r12,r11
+        vpxor   xmm11,xmm11,xmm10
+        shrd    r13,r13,4
+        xor     r14,rbx
+        vpsllq  xmm10,xmm10,42
+        and     r12,r9
+        xor     r13,r9
+        vpxor   xmm11,xmm11,xmm9
+        add     rax,QWORD[56+rsp]
+        mov     rdi,rbx
+        vpsrlq  xmm9,xmm9,42
+        xor     r12,r11
+        shrd    r14,r14,6
+        vpxor   xmm11,xmm11,xmm10
+        xor     rdi,rcx
+        add     rax,r12
+        vpxor   xmm11,xmm11,xmm9
+        shrd    r13,r13,14
+        and     r15,rdi
+        vpaddq  xmm3,xmm3,xmm11
+        xor     r14,rbx
+        add     rax,r13
+        vpaddq  xmm10,xmm3,XMMWORD[((-32))+rbp]
+        xor     r15,rcx
+        shrd    r14,r14,28
+        add     r8,rax
+        add     rax,r15
+        mov     r13,r8
+        add     r14,rax
+        vmovdqa XMMWORD[48+rsp],xmm10
+        vpalignr        xmm8,xmm5,xmm4,8
+        shrd    r13,r13,23
+        mov     rax,r14
+        vpalignr        xmm11,xmm1,xmm0,8
+        mov     r12,r9
+        shrd    r14,r14,5
+        vpsrlq  xmm10,xmm8,1
+        xor     r13,r8
+        xor     r12,r10
+        vpaddq  xmm4,xmm4,xmm11
+        shrd    r13,r13,4
+        xor     r14,rax
+        vpsrlq  xmm11,xmm8,7
+        and     r12,r8
+        xor     r13,r8
+        vpsllq  xmm9,xmm8,56
+        add     r11,QWORD[64+rsp]
+        mov     r15,rax
+        vpxor   xmm8,xmm11,xmm10
+        xor     r12,r10
+        shrd    r14,r14,6
+        vpsrlq  xmm10,xmm10,7
+        xor     r15,rbx
+        add     r11,r12
+        vpxor   xmm8,xmm8,xmm9
+        shrd    r13,r13,14
+        and     rdi,r15
+        vpsllq  xmm9,xmm9,7
+        xor     r14,rax
+        add     r11,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,rbx
+        shrd    r14,r14,28
+        vpsrlq  xmm11,xmm3,6
+        add     rdx,r11
+        add     r11,rdi
+        vpxor   xmm8,xmm8,xmm9
+        mov     r13,rdx
+        add     r14,r11
+        vpsllq  xmm10,xmm3,3
+        shrd    r13,r13,23
+        mov     r11,r14
+        vpaddq  xmm4,xmm4,xmm8
+        mov     r12,r8
+        shrd    r14,r14,5
+        vpsrlq  xmm9,xmm3,19
+        xor     r13,rdx
+        xor     r12,r9
+        vpxor   xmm11,xmm11,xmm10
+        shrd    r13,r13,4
+        xor     r14,r11
+        vpsllq  xmm10,xmm10,42
+        and     r12,rdx
+        xor     r13,rdx
+        vpxor   xmm11,xmm11,xmm9
+        add     r10,QWORD[72+rsp]
+        mov     rdi,r11
+        vpsrlq  xmm9,xmm9,42
+        xor     r12,r9
+        shrd    r14,r14,6
+        vpxor   xmm11,xmm11,xmm10
+        xor     rdi,rax
+        add     r10,r12
+        vpxor   xmm11,xmm11,xmm9
+        shrd    r13,r13,14
+        and     r15,rdi
+        vpaddq  xmm4,xmm4,xmm11
+        xor     r14,r11
+        add     r10,r13
+        vpaddq  xmm10,xmm4,XMMWORD[rbp]
+        xor     r15,rax
+        shrd    r14,r14,28
+        add     rcx,r10
+        add     r10,r15
+        mov     r13,rcx
+        add     r14,r10
+        vmovdqa XMMWORD[64+rsp],xmm10
+        vpalignr        xmm8,xmm6,xmm5,8
+        shrd    r13,r13,23
+        mov     r10,r14
+        vpalignr        xmm11,xmm2,xmm1,8
+        mov     r12,rdx
+        shrd    r14,r14,5
+        vpsrlq  xmm10,xmm8,1
+        xor     r13,rcx
+        xor     r12,r8
+        vpaddq  xmm5,xmm5,xmm11
+        shrd    r13,r13,4
+        xor     r14,r10
+        vpsrlq  xmm11,xmm8,7
+        and     r12,rcx
+        xor     r13,rcx
+        vpsllq  xmm9,xmm8,56
+        add     r9,QWORD[80+rsp]
+        mov     r15,r10
+        vpxor   xmm8,xmm11,xmm10
+        xor     r12,r8
+        shrd    r14,r14,6
+        vpsrlq  xmm10,xmm10,7
+        xor     r15,r11
+        add     r9,r12
+        vpxor   xmm8,xmm8,xmm9
+        shrd    r13,r13,14
+        and     rdi,r15
+        vpsllq  xmm9,xmm9,7
+        xor     r14,r10
+        add     r9,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,r11
+        shrd    r14,r14,28
+        vpsrlq  xmm11,xmm4,6
+        add     rbx,r9
+        add     r9,rdi
+        vpxor   xmm8,xmm8,xmm9
+        mov     r13,rbx
+        add     r14,r9
+        vpsllq  xmm10,xmm4,3
+        shrd    r13,r13,23
+        mov     r9,r14
+        vpaddq  xmm5,xmm5,xmm8
+        mov     r12,rcx
+        shrd    r14,r14,5
+        vpsrlq  xmm9,xmm4,19
+        xor     r13,rbx
+        xor     r12,rdx
+        vpxor   xmm11,xmm11,xmm10
+        shrd    r13,r13,4
+        xor     r14,r9
+        vpsllq  xmm10,xmm10,42
+        and     r12,rbx
+        xor     r13,rbx
+        vpxor   xmm11,xmm11,xmm9
+        add     r8,QWORD[88+rsp]
+        mov     rdi,r9
+        vpsrlq  xmm9,xmm9,42
+        xor     r12,rdx
+        shrd    r14,r14,6
+        vpxor   xmm11,xmm11,xmm10
+        xor     rdi,r10
+        add     r8,r12
+        vpxor   xmm11,xmm11,xmm9
+        shrd    r13,r13,14
+        and     r15,rdi
+        vpaddq  xmm5,xmm5,xmm11
+        xor     r14,r9
+        add     r8,r13
+        vpaddq  xmm10,xmm5,XMMWORD[32+rbp]
+        xor     r15,r10
+        shrd    r14,r14,28
+        add     rax,r8
+        add     r8,r15
+        mov     r13,rax
+        add     r14,r8
+        vmovdqa XMMWORD[80+rsp],xmm10
+        vpalignr        xmm8,xmm7,xmm6,8
+        shrd    r13,r13,23
+        mov     r8,r14
+        vpalignr        xmm11,xmm3,xmm2,8
+        mov     r12,rbx
+        shrd    r14,r14,5
+        vpsrlq  xmm10,xmm8,1
+        xor     r13,rax
+        xor     r12,rcx
+        vpaddq  xmm6,xmm6,xmm11
+        shrd    r13,r13,4
+        xor     r14,r8
+        vpsrlq  xmm11,xmm8,7
+        and     r12,rax
+        xor     r13,rax
+        vpsllq  xmm9,xmm8,56
+        add     rdx,QWORD[96+rsp]
+        mov     r15,r8
+        vpxor   xmm8,xmm11,xmm10
+        xor     r12,rcx
+        shrd    r14,r14,6
+        vpsrlq  xmm10,xmm10,7
+        xor     r15,r9
+        add     rdx,r12
+        vpxor   xmm8,xmm8,xmm9
+        shrd    r13,r13,14
+        and     rdi,r15
+        vpsllq  xmm9,xmm9,7
+        xor     r14,r8
+        add     rdx,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,r9
+        shrd    r14,r14,28
+        vpsrlq  xmm11,xmm5,6
+        add     r11,rdx
+        add     rdx,rdi
+        vpxor   xmm8,xmm8,xmm9
+        mov     r13,r11
+        add     r14,rdx
+        vpsllq  xmm10,xmm5,3
+        shrd    r13,r13,23
+        mov     rdx,r14
+        vpaddq  xmm6,xmm6,xmm8
+        mov     r12,rax
+        shrd    r14,r14,5
+        vpsrlq  xmm9,xmm5,19
+        xor     r13,r11
+        xor     r12,rbx
+        vpxor   xmm11,xmm11,xmm10
+        shrd    r13,r13,4
+        xor     r14,rdx
+        vpsllq  xmm10,xmm10,42
+        and     r12,r11
+        xor     r13,r11
+        vpxor   xmm11,xmm11,xmm9
+        add     rcx,QWORD[104+rsp]
+        mov     rdi,rdx
+        vpsrlq  xmm9,xmm9,42
+        xor     r12,rbx
+        shrd    r14,r14,6
+        vpxor   xmm11,xmm11,xmm10
+        xor     rdi,r8
+        add     rcx,r12
+        vpxor   xmm11,xmm11,xmm9
+        shrd    r13,r13,14
+        and     r15,rdi
+        vpaddq  xmm6,xmm6,xmm11
+        xor     r14,rdx
+        add     rcx,r13
+        vpaddq  xmm10,xmm6,XMMWORD[64+rbp]
+        xor     r15,r8
+        shrd    r14,r14,28
+        add     r10,rcx
+        add     rcx,r15
+        mov     r13,r10
+        add     r14,rcx
+        vmovdqa XMMWORD[96+rsp],xmm10
+        vpalignr        xmm8,xmm0,xmm7,8
+        shrd    r13,r13,23
+        mov     rcx,r14
+        vpalignr        xmm11,xmm4,xmm3,8
+        mov     r12,r11
+        shrd    r14,r14,5
+        vpsrlq  xmm10,xmm8,1
+        xor     r13,r10
+        xor     r12,rax
+        vpaddq  xmm7,xmm7,xmm11
+        shrd    r13,r13,4
+        xor     r14,rcx
+        vpsrlq  xmm11,xmm8,7
+        and     r12,r10
+        xor     r13,r10
+        vpsllq  xmm9,xmm8,56
+        add     rbx,QWORD[112+rsp]
+        mov     r15,rcx
+        vpxor   xmm8,xmm11,xmm10
+        xor     r12,rax
+        shrd    r14,r14,6
+        vpsrlq  xmm10,xmm10,7
+        xor     r15,rdx
+        add     rbx,r12
+        vpxor   xmm8,xmm8,xmm9
+        shrd    r13,r13,14
+        and     rdi,r15
+        vpsllq  xmm9,xmm9,7
+        xor     r14,rcx
+        add     rbx,r13
+        vpxor   xmm8,xmm8,xmm10
+        xor     rdi,rdx
+        shrd    r14,r14,28
+        vpsrlq  xmm11,xmm6,6
+        add     r9,rbx
+        add     rbx,rdi
+        vpxor   xmm8,xmm8,xmm9
+        mov     r13,r9
+        add     r14,rbx
+        vpsllq  xmm10,xmm6,3
+        shrd    r13,r13,23
+        mov     rbx,r14
+        vpaddq  xmm7,xmm7,xmm8
+        mov     r12,r10
+        shrd    r14,r14,5
+        vpsrlq  xmm9,xmm6,19
+        xor     r13,r9
+        xor     r12,r11
+        vpxor   xmm11,xmm11,xmm10
+        shrd    r13,r13,4
+        xor     r14,rbx
+        vpsllq  xmm10,xmm10,42
+        and     r12,r9
+        xor     r13,r9
+        vpxor   xmm11,xmm11,xmm9
+        add     rax,QWORD[120+rsp]
+        mov     rdi,rbx
+        vpsrlq  xmm9,xmm9,42
+        xor     r12,r11
+        shrd    r14,r14,6
+        vpxor   xmm11,xmm11,xmm10
+        xor     rdi,rcx
+        add     rax,r12
+        vpxor   xmm11,xmm11,xmm9
+        shrd    r13,r13,14
+        and     r15,rdi
+        vpaddq  xmm7,xmm7,xmm11
+        xor     r14,rbx
+        add     rax,r13
+        vpaddq  xmm10,xmm7,XMMWORD[96+rbp]
+        xor     r15,rcx
+        shrd    r14,r14,28
+        add     r8,rax
+        add     rax,r15
+        mov     r13,r8
+        add     r14,rax
+        vmovdqa XMMWORD[112+rsp],xmm10
+        cmp     BYTE[135+rbp],0
+        jne     NEAR $L$avx_00_47
+        shrd    r13,r13,23
+        mov     rax,r14
+        mov     r12,r9
+        shrd    r14,r14,5
+        xor     r13,r8
+        xor     r12,r10
+        shrd    r13,r13,4
+        xor     r14,rax
+        and     r12,r8
+        xor     r13,r8
+        add     r11,QWORD[rsp]
+        mov     r15,rax
+        xor     r12,r10
+        shrd    r14,r14,6
+        xor     r15,rbx
+        add     r11,r12
+        shrd    r13,r13,14
+        and     rdi,r15
+        xor     r14,rax
+        add     r11,r13
+        xor     rdi,rbx
+        shrd    r14,r14,28
+        add     rdx,r11
+        add     r11,rdi
+        mov     r13,rdx
+        add     r14,r11
+        shrd    r13,r13,23
+        mov     r11,r14
+        mov     r12,r8
+        shrd    r14,r14,5
+        xor     r13,rdx
+        xor     r12,r9
+        shrd    r13,r13,4
+        xor     r14,r11
+        and     r12,rdx
+        xor     r13,rdx
+        add     r10,QWORD[8+rsp]
+        mov     rdi,r11
+        xor     r12,r9
+        shrd    r14,r14,6
+        xor     rdi,rax
+        add     r10,r12
+        shrd    r13,r13,14
+        and     r15,rdi
+        xor     r14,r11
+        add     r10,r13
+        xor     r15,rax
+        shrd    r14,r14,28
+        add     rcx,r10
+        add     r10,r15
+        mov     r13,rcx
+        add     r14,r10
+        shrd    r13,r13,23
+        mov     r10,r14
+        mov     r12,rdx
+        shrd    r14,r14,5
+        xor     r13,rcx
+        xor     r12,r8
+        shrd    r13,r13,4
+        xor     r14,r10
+        and     r12,rcx
+        xor     r13,rcx
+        add     r9,QWORD[16+rsp]
+        mov     r15,r10
+        xor     r12,r8
+        shrd    r14,r14,6
+        xor     r15,r11
+        add     r9,r12
+        shrd    r13,r13,14
+        and     rdi,r15
+        xor     r14,r10
+        add     r9,r13
+        xor     rdi,r11
+        shrd    r14,r14,28
+        add     rbx,r9
+        add     r9,rdi
+        mov     r13,rbx
+        add     r14,r9
+        shrd    r13,r13,23
+        mov     r9,r14
+        mov     r12,rcx
+        shrd    r14,r14,5
+        xor     r13,rbx
+        xor     r12,rdx
+        shrd    r13,r13,4
+        xor     r14,r9
+        and     r12,rbx
+        xor     r13,rbx
+        add     r8,QWORD[24+rsp]
+        mov     rdi,r9
+        xor     r12,rdx
+        shrd    r14,r14,6
+        xor     rdi,r10
+        add     r8,r12
+        shrd    r13,r13,14
+        and     r15,rdi
+        xor     r14,r9
+        add     r8,r13
+        xor     r15,r10
+        shrd    r14,r14,28
+        add     rax,r8
+        add     r8,r15
+        mov     r13,rax
+        add     r14,r8
+        shrd    r13,r13,23
+        mov     r8,r14
+        mov     r12,rbx
+        shrd    r14,r14,5
+        xor     r13,rax
+        xor     r12,rcx
+        shrd    r13,r13,4
+        xor     r14,r8
+        and     r12,rax
+        xor     r13,rax
+        add     rdx,QWORD[32+rsp]
+        mov     r15,r8
+        xor     r12,rcx
+        shrd    r14,r14,6
+        xor     r15,r9
+        add     rdx,r12
+        shrd    r13,r13,14
+        and     rdi,r15
+        xor     r14,r8
+        add     rdx,r13
+        xor     rdi,r9
+        shrd    r14,r14,28
+        add     r11,rdx
+        add     rdx,rdi
+        mov     r13,r11
+        add     r14,rdx
+        shrd    r13,r13,23
+        mov     rdx,r14
+        mov     r12,rax
+        shrd    r14,r14,5
+        xor     r13,r11
+        xor     r12,rbx
+        shrd    r13,r13,4
+        xor     r14,rdx
+        and     r12,r11
+        xor     r13,r11
+        add     rcx,QWORD[40+rsp]
+        mov     rdi,rdx
+        xor     r12,rbx
+        shrd    r14,r14,6
+        xor     rdi,r8
+        add     rcx,r12
+        shrd    r13,r13,14
+        and     r15,rdi
+        xor     r14,rdx
+        add     rcx,r13
+        xor     r15,r8
+        shrd    r14,r14,28
+        add     r10,rcx
+        add     rcx,r15
+        mov     r13,r10
+        add     r14,rcx
+        shrd    r13,r13,23
+        mov     rcx,r14
+        mov     r12,r11
+        shrd    r14,r14,5
+        xor     r13,r10
+        xor     r12,rax
+        shrd    r13,r13,4
+        xor     r14,rcx
+        and     r12,r10
+        xor     r13,r10
+        add     rbx,QWORD[48+rsp]
+        mov     r15,rcx
+        xor     r12,rax
+        shrd    r14,r14,6
+        xor     r15,rdx
+        add     rbx,r12
+        shrd    r13,r13,14
+        and     rdi,r15
+        xor     r14,rcx
+        add     rbx,r13
+        xor     rdi,rdx
+        shrd    r14,r14,28
+        add     r9,rbx
+        add     rbx,rdi
+        mov     r13,r9
+        add     r14,rbx
+        shrd    r13,r13,23
+        mov     rbx,r14
+        mov     r12,r10
+        shrd    r14,r14,5
+        xor     r13,r9
+        xor     r12,r11
+        shrd    r13,r13,4
+        xor     r14,rbx
+        and     r12,r9
+        xor     r13,r9
+        add     rax,QWORD[56+rsp]
+        mov     rdi,rbx
+        xor     r12,r11
+        shrd    r14,r14,6
+        xor     rdi,rcx
+        add     rax,r12
+        shrd    r13,r13,14
+        and     r15,rdi
+        xor     r14,rbx
+        add     rax,r13
+        xor     r15,rcx
+        shrd    r14,r14,28
+        add     r8,rax
+        add     rax,r15
+        mov     r13,r8
+        add     r14,rax
+        shrd    r13,r13,23
+        mov     rax,r14
+        mov     r12,r9
+        shrd    r14,r14,5
+        xor     r13,r8
+        xor     r12,r10
+        shrd    r13,r13,4
+        xor     r14,rax
+        and     r12,r8
+        xor     r13,r8
+        add     r11,QWORD[64+rsp]
+        mov     r15,rax
+        xor     r12,r10
+        shrd    r14,r14,6
+        xor     r15,rbx
+        add     r11,r12
+        shrd    r13,r13,14
+        and     rdi,r15
+        xor     r14,rax
+        add     r11,r13
+        xor     rdi,rbx
+        shrd    r14,r14,28
+        add     rdx,r11
+        add     r11,rdi
+        mov     r13,rdx
+        add     r14,r11
+        shrd    r13,r13,23
+        mov     r11,r14
+        mov     r12,r8
+        shrd    r14,r14,5
+        xor     r13,rdx
+        xor     r12,r9
+        shrd    r13,r13,4
+        xor     r14,r11
+        and     r12,rdx
+        xor     r13,rdx
+        add     r10,QWORD[72+rsp]
+        mov     rdi,r11
+        xor     r12,r9
+        shrd    r14,r14,6
+        xor     rdi,rax
+        add     r10,r12
+        shrd    r13,r13,14
+        and     r15,rdi
+        xor     r14,r11
+        add     r10,r13
+        xor     r15,rax
+        shrd    r14,r14,28
+        add     rcx,r10
+        add     r10,r15
+        mov     r13,rcx
+        add     r14,r10
+        shrd    r13,r13,23
+        mov     r10,r14
+        mov     r12,rdx
+        shrd    r14,r14,5
+        xor     r13,rcx
+        xor     r12,r8
+        shrd    r13,r13,4
+        xor     r14,r10
+        and     r12,rcx
+        xor     r13,rcx
+        add     r9,QWORD[80+rsp]
+        mov     r15,r10
+        xor     r12,r8
+        shrd    r14,r14,6
+        xor     r15,r11
+        add     r9,r12
+        shrd    r13,r13,14
+        and     rdi,r15
+        xor     r14,r10
+        add     r9,r13
+        xor     rdi,r11
+        shrd    r14,r14,28
+        add     rbx,r9
+        add     r9,rdi
+        mov     r13,rbx
+        add     r14,r9
+        shrd    r13,r13,23
+        mov     r9,r14
+        mov     r12,rcx
+        shrd    r14,r14,5
+        xor     r13,rbx
+        xor     r12,rdx
+        shrd    r13,r13,4
+        xor     r14,r9
+        and     r12,rbx
+        xor     r13,rbx
+        add     r8,QWORD[88+rsp]
+        mov     rdi,r9
+        xor     r12,rdx
+        shrd    r14,r14,6
+        xor     rdi,r10
+        add     r8,r12
+        shrd    r13,r13,14
+        and     r15,rdi
+        xor     r14,r9
+        add     r8,r13
+        xor     r15,r10
+        shrd    r14,r14,28
+        add     rax,r8
+        add     r8,r15
+        mov     r13,rax
+        add     r14,r8
+        shrd    r13,r13,23
+        mov     r8,r14
+        mov     r12,rbx
+        shrd    r14,r14,5
+        xor     r13,rax
+        xor     r12,rcx
+        shrd    r13,r13,4
+        xor     r14,r8
+        and     r12,rax
+        xor     r13,rax
+        add     rdx,QWORD[96+rsp]
+        mov     r15,r8
+        xor     r12,rcx
+        shrd    r14,r14,6
+        xor     r15,r9
+        add     rdx,r12
+        shrd    r13,r13,14
+        and     rdi,r15
+        xor     r14,r8
+        add     rdx,r13
+        xor     rdi,r9
+        shrd    r14,r14,28
+        add     r11,rdx
+        add     rdx,rdi
+        mov     r13,r11
+        add     r14,rdx
+        shrd    r13,r13,23
+        mov     rdx,r14
+        mov     r12,rax
+        shrd    r14,r14,5
+        xor     r13,r11
+        xor     r12,rbx
+        shrd    r13,r13,4
+        xor     r14,rdx
+        and     r12,r11
+        xor     r13,r11
+        add     rcx,QWORD[104+rsp]
+        mov     rdi,rdx
+        xor     r12,rbx
+        shrd    r14,r14,6
+        xor     rdi,r8
+        add     rcx,r12
+        shrd    r13,r13,14
+        and     r15,rdi
+        xor     r14,rdx
+        add     rcx,r13
+        xor     r15,r8
+        shrd    r14,r14,28
+        add     r10,rcx
+        add     rcx,r15
+        mov     r13,r10
+        add     r14,rcx
+        shrd    r13,r13,23
+        mov     rcx,r14
+        mov     r12,r11
+        shrd    r14,r14,5
+        xor     r13,r10
+        xor     r12,rax
+        shrd    r13,r13,4
+        xor     r14,rcx
+        and     r12,r10
+        xor     r13,r10
+        add     rbx,QWORD[112+rsp]
+        mov     r15,rcx
+        xor     r12,rax
+        shrd    r14,r14,6
+        xor     r15,rdx
+        add     rbx,r12
+        shrd    r13,r13,14
+        and     rdi,r15
+        xor     r14,rcx
+        add     rbx,r13
+        xor     rdi,rdx
+        shrd    r14,r14,28
+        add     r9,rbx
+        add     rbx,rdi
+        mov     r13,r9
+        add     r14,rbx
+        shrd    r13,r13,23
+        mov     rbx,r14
+        mov     r12,r10
+        shrd    r14,r14,5
+        xor     r13,r9
+        xor     r12,r11
+        shrd    r13,r13,4
+        xor     r14,rbx
+        and     r12,r9
+        xor     r13,r9
+        add     rax,QWORD[120+rsp]
+        mov     rdi,rbx
+        xor     r12,r11
+        shrd    r14,r14,6
+        xor     rdi,rcx
+        add     rax,r12
+        shrd    r13,r13,14
+        and     r15,rdi
+        xor     r14,rbx
+        add     rax,r13
+        xor     r15,rcx
+        shrd    r14,r14,28
+        add     r8,rax
+        add     rax,r15
+        mov     r13,r8
+        add     r14,rax
+        mov     rdi,QWORD[((128+0))+rsp]
+        mov     rax,r14
+
+        add     rax,QWORD[rdi]
+        lea     rsi,[128+rsi]
+        add     rbx,QWORD[8+rdi]
+        add     rcx,QWORD[16+rdi]
+        add     rdx,QWORD[24+rdi]
+        add     r8,QWORD[32+rdi]
+        add     r9,QWORD[40+rdi]
+        add     r10,QWORD[48+rdi]
+        add     r11,QWORD[56+rdi]
+
+        cmp     rsi,QWORD[((128+16))+rsp]
+
+        mov     QWORD[rdi],rax
+        mov     QWORD[8+rdi],rbx
+        mov     QWORD[16+rdi],rcx
+        mov     QWORD[24+rdi],rdx
+        mov     QWORD[32+rdi],r8
+        mov     QWORD[40+rdi],r9
+        mov     QWORD[48+rdi],r10
+        mov     QWORD[56+rdi],r11
+        jb      NEAR $L$loop_avx
+
+        mov     rsi,QWORD[152+rsp]
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((128+32))+rsp]
+        movaps  xmm7,XMMWORD[((128+48))+rsp]
+        movaps  xmm8,XMMWORD[((128+64))+rsp]
+        movaps  xmm9,XMMWORD[((128+80))+rsp]
+        movaps  xmm10,XMMWORD[((128+96))+rsp]
+        movaps  xmm11,XMMWORD[((128+112))+rsp]
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue_avx:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha512_block_data_order_avx:
+
+ALIGN   64
+sha512_block_data_order_avx2:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_sha512_block_data_order_avx2:
+        mov     rdi,rcx
+        mov     rsi,rdx
+        mov     rdx,r8
+
+
+
+$L$avx2_shortcut:
+        mov     rax,rsp
+
+        push    rbx
+
+        push    rbp
+
+        push    r12
+
+        push    r13
+
+        push    r14
+
+        push    r15
+
+        sub     rsp,1408
+        shl     rdx,4
+        and     rsp,-256*8
+        lea     rdx,[rdx*8+rsi]
+        add     rsp,1152
+        mov     QWORD[((128+0))+rsp],rdi
+        mov     QWORD[((128+8))+rsp],rsi
+        mov     QWORD[((128+16))+rsp],rdx
+        mov     QWORD[152+rsp],rax
+
+        movaps  XMMWORD[(128+32)+rsp],xmm6
+        movaps  XMMWORD[(128+48)+rsp],xmm7
+        movaps  XMMWORD[(128+64)+rsp],xmm8
+        movaps  XMMWORD[(128+80)+rsp],xmm9
+        movaps  XMMWORD[(128+96)+rsp],xmm10
+        movaps  XMMWORD[(128+112)+rsp],xmm11
+$L$prologue_avx2:
+
+        vzeroupper
+        sub     rsi,-16*8
+        mov     rax,QWORD[rdi]
+        mov     r12,rsi
+        mov     rbx,QWORD[8+rdi]
+        cmp     rsi,rdx
+        mov     rcx,QWORD[16+rdi]
+        cmove   r12,rsp
+        mov     rdx,QWORD[24+rdi]
+        mov     r8,QWORD[32+rdi]
+        mov     r9,QWORD[40+rdi]
+        mov     r10,QWORD[48+rdi]
+        mov     r11,QWORD[56+rdi]
+        jmp     NEAR $L$oop_avx2
+ALIGN   16
+$L$oop_avx2:
+        vmovdqu xmm0,XMMWORD[((-128))+rsi]
+        vmovdqu xmm1,XMMWORD[((-128+16))+rsi]
+        vmovdqu xmm2,XMMWORD[((-128+32))+rsi]
+        lea     rbp,[((K512+128))]
+        vmovdqu xmm3,XMMWORD[((-128+48))+rsi]
+        vmovdqu xmm4,XMMWORD[((-128+64))+rsi]
+        vmovdqu xmm5,XMMWORD[((-128+80))+rsi]
+        vmovdqu xmm6,XMMWORD[((-128+96))+rsi]
+        vmovdqu xmm7,XMMWORD[((-128+112))+rsi]
+
+        vmovdqa ymm10,YMMWORD[1152+rbp]
+        vinserti128     ymm0,ymm0,XMMWORD[r12],1
+        vinserti128     ymm1,ymm1,XMMWORD[16+r12],1
+        vpshufb ymm0,ymm0,ymm10
+        vinserti128     ymm2,ymm2,XMMWORD[32+r12],1
+        vpshufb ymm1,ymm1,ymm10
+        vinserti128     ymm3,ymm3,XMMWORD[48+r12],1
+        vpshufb ymm2,ymm2,ymm10
+        vinserti128     ymm4,ymm4,XMMWORD[64+r12],1
+        vpshufb ymm3,ymm3,ymm10
+        vinserti128     ymm5,ymm5,XMMWORD[80+r12],1
+        vpshufb ymm4,ymm4,ymm10
+        vinserti128     ymm6,ymm6,XMMWORD[96+r12],1
+        vpshufb ymm5,ymm5,ymm10
+        vinserti128     ymm7,ymm7,XMMWORD[112+r12],1
+
+        vpaddq  ymm8,ymm0,YMMWORD[((-128))+rbp]
+        vpshufb ymm6,ymm6,ymm10
+        vpaddq  ymm9,ymm1,YMMWORD[((-96))+rbp]
+        vpshufb ymm7,ymm7,ymm10
+        vpaddq  ymm10,ymm2,YMMWORD[((-64))+rbp]
+        vpaddq  ymm11,ymm3,YMMWORD[((-32))+rbp]
+        vmovdqa YMMWORD[rsp],ymm8
+        vpaddq  ymm8,ymm4,YMMWORD[rbp]
+        vmovdqa YMMWORD[32+rsp],ymm9
+        vpaddq  ymm9,ymm5,YMMWORD[32+rbp]
+        vmovdqa YMMWORD[64+rsp],ymm10
+        vpaddq  ymm10,ymm6,YMMWORD[64+rbp]
+        vmovdqa YMMWORD[96+rsp],ymm11
+        lea     rsp,[((-128))+rsp]
+        vpaddq  ymm11,ymm7,YMMWORD[96+rbp]
+        vmovdqa YMMWORD[rsp],ymm8
+        xor     r14,r14
+        vmovdqa YMMWORD[32+rsp],ymm9
+        mov     rdi,rbx
+        vmovdqa YMMWORD[64+rsp],ymm10
+        xor     rdi,rcx
+        vmovdqa YMMWORD[96+rsp],ymm11
+        mov     r12,r9
+        add     rbp,16*2*8
+        jmp     NEAR $L$avx2_00_47
+
+ALIGN   16
+$L$avx2_00_47:
+        lea     rsp,[((-128))+rsp]
+        vpalignr        ymm8,ymm1,ymm0,8
+        add     r11,QWORD[((0+256))+rsp]
+        and     r12,r8
+        rorx    r13,r8,41
+        vpalignr        ymm11,ymm5,ymm4,8
+        rorx    r15,r8,18
+        lea     rax,[r14*1+rax]
+        lea     r11,[r12*1+r11]
+        vpsrlq  ymm10,ymm8,1
+        andn    r12,r8,r10
+        xor     r13,r15
+        rorx    r14,r8,14
+        vpaddq  ymm0,ymm0,ymm11
+        vpsrlq  ymm11,ymm8,7
+        lea     r11,[r12*1+r11]
+        xor     r13,r14
+        mov     r15,rax
+        vpsllq  ymm9,ymm8,56
+        vpxor   ymm8,ymm11,ymm10
+        rorx    r12,rax,39
+        lea     r11,[r13*1+r11]
+        xor     r15,rbx
+        vpsrlq  ymm10,ymm10,7
+        vpxor   ymm8,ymm8,ymm9
+        rorx    r14,rax,34
+        rorx    r13,rax,28
+        lea     rdx,[r11*1+rdx]
+        vpsllq  ymm9,ymm9,7
+        vpxor   ymm8,ymm8,ymm10
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rbx
+        vpsrlq  ymm11,ymm7,6
+        vpxor   ymm8,ymm8,ymm9
+        xor     r14,r13
+        lea     r11,[rdi*1+r11]
+        mov     r12,r8
+        vpsllq  ymm10,ymm7,3
+        vpaddq  ymm0,ymm0,ymm8
+        add     r10,QWORD[((8+256))+rsp]
+        and     r12,rdx
+        rorx    r13,rdx,41
+        vpsrlq  ymm9,ymm7,19
+        vpxor   ymm11,ymm11,ymm10
+        rorx    rdi,rdx,18
+        lea     r11,[r14*1+r11]
+        lea     r10,[r12*1+r10]
+        vpsllq  ymm10,ymm10,42
+        vpxor   ymm11,ymm11,ymm9
+        andn    r12,rdx,r9
+        xor     r13,rdi
+        rorx    r14,rdx,14
+        vpsrlq  ymm9,ymm9,42
+        vpxor   ymm11,ymm11,ymm10
+        lea     r10,[r12*1+r10]
+        xor     r13,r14
+        mov     rdi,r11
+        vpxor   ymm11,ymm11,ymm9
+        rorx    r12,r11,39
+        lea     r10,[r13*1+r10]
+        xor     rdi,rax
+        vpaddq  ymm0,ymm0,ymm11
+        rorx    r14,r11,34
+        rorx    r13,r11,28
+        lea     rcx,[r10*1+rcx]
+        vpaddq  ymm10,ymm0,YMMWORD[((-128))+rbp]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rax
+        xor     r14,r13
+        lea     r10,[r15*1+r10]
+        mov     r12,rdx
+        vmovdqa YMMWORD[rsp],ymm10
+        vpalignr        ymm8,ymm2,ymm1,8
+        add     r9,QWORD[((32+256))+rsp]
+        and     r12,rcx
+        rorx    r13,rcx,41
+        vpalignr        ymm11,ymm6,ymm5,8
+        rorx    r15,rcx,18
+        lea     r10,[r14*1+r10]
+        lea     r9,[r12*1+r9]
+        vpsrlq  ymm10,ymm8,1
+        andn    r12,rcx,r8
+        xor     r13,r15
+        rorx    r14,rcx,14
+        vpaddq  ymm1,ymm1,ymm11
+        vpsrlq  ymm11,ymm8,7
+        lea     r9,[r12*1+r9]
+        xor     r13,r14
+        mov     r15,r10
+        vpsllq  ymm9,ymm8,56
+        vpxor   ymm8,ymm11,ymm10
+        rorx    r12,r10,39
+        lea     r9,[r13*1+r9]
+        xor     r15,r11
+        vpsrlq  ymm10,ymm10,7
+        vpxor   ymm8,ymm8,ymm9
+        rorx    r14,r10,34
+        rorx    r13,r10,28
+        lea     rbx,[r9*1+rbx]
+        vpsllq  ymm9,ymm9,7
+        vpxor   ymm8,ymm8,ymm10
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r11
+        vpsrlq  ymm11,ymm0,6
+        vpxor   ymm8,ymm8,ymm9
+        xor     r14,r13
+        lea     r9,[rdi*1+r9]
+        mov     r12,rcx
+        vpsllq  ymm10,ymm0,3
+        vpaddq  ymm1,ymm1,ymm8
+        add     r8,QWORD[((40+256))+rsp]
+        and     r12,rbx
+        rorx    r13,rbx,41
+        vpsrlq  ymm9,ymm0,19
+        vpxor   ymm11,ymm11,ymm10
+        rorx    rdi,rbx,18
+        lea     r9,[r14*1+r9]
+        lea     r8,[r12*1+r8]
+        vpsllq  ymm10,ymm10,42
+        vpxor   ymm11,ymm11,ymm9
+        andn    r12,rbx,rdx
+        xor     r13,rdi
+        rorx    r14,rbx,14
+        vpsrlq  ymm9,ymm9,42
+        vpxor   ymm11,ymm11,ymm10
+        lea     r8,[r12*1+r8]
+        xor     r13,r14
+        mov     rdi,r9
+        vpxor   ymm11,ymm11,ymm9
+        rorx    r12,r9,39
+        lea     r8,[r13*1+r8]
+        xor     rdi,r10
+        vpaddq  ymm1,ymm1,ymm11
+        rorx    r14,r9,34
+        rorx    r13,r9,28
+        lea     rax,[r8*1+rax]
+        vpaddq  ymm10,ymm1,YMMWORD[((-96))+rbp]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r10
+        xor     r14,r13
+        lea     r8,[r15*1+r8]
+        mov     r12,rbx
+        vmovdqa YMMWORD[32+rsp],ymm10
+        vpalignr        ymm8,ymm3,ymm2,8
+        add     rdx,QWORD[((64+256))+rsp]
+        and     r12,rax
+        rorx    r13,rax,41
+        vpalignr        ymm11,ymm7,ymm6,8
+        rorx    r15,rax,18
+        lea     r8,[r14*1+r8]
+        lea     rdx,[r12*1+rdx]
+        vpsrlq  ymm10,ymm8,1
+        andn    r12,rax,rcx
+        xor     r13,r15
+        rorx    r14,rax,14
+        vpaddq  ymm2,ymm2,ymm11
+        vpsrlq  ymm11,ymm8,7
+        lea     rdx,[r12*1+rdx]
+        xor     r13,r14
+        mov     r15,r8
+        vpsllq  ymm9,ymm8,56
+        vpxor   ymm8,ymm11,ymm10
+        rorx    r12,r8,39
+        lea     rdx,[r13*1+rdx]
+        xor     r15,r9
+        vpsrlq  ymm10,ymm10,7
+        vpxor   ymm8,ymm8,ymm9
+        rorx    r14,r8,34
+        rorx    r13,r8,28
+        lea     r11,[rdx*1+r11]
+        vpsllq  ymm9,ymm9,7
+        vpxor   ymm8,ymm8,ymm10
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r9
+        vpsrlq  ymm11,ymm1,6
+        vpxor   ymm8,ymm8,ymm9
+        xor     r14,r13
+        lea     rdx,[rdi*1+rdx]
+        mov     r12,rax
+        vpsllq  ymm10,ymm1,3
+        vpaddq  ymm2,ymm2,ymm8
+        add     rcx,QWORD[((72+256))+rsp]
+        and     r12,r11
+        rorx    r13,r11,41
+        vpsrlq  ymm9,ymm1,19
+        vpxor   ymm11,ymm11,ymm10
+        rorx    rdi,r11,18
+        lea     rdx,[r14*1+rdx]
+        lea     rcx,[r12*1+rcx]
+        vpsllq  ymm10,ymm10,42
+        vpxor   ymm11,ymm11,ymm9
+        andn    r12,r11,rbx
+        xor     r13,rdi
+        rorx    r14,r11,14
+        vpsrlq  ymm9,ymm9,42
+        vpxor   ymm11,ymm11,ymm10
+        lea     rcx,[r12*1+rcx]
+        xor     r13,r14
+        mov     rdi,rdx
+        vpxor   ymm11,ymm11,ymm9
+        rorx    r12,rdx,39
+        lea     rcx,[r13*1+rcx]
+        xor     rdi,r8
+        vpaddq  ymm2,ymm2,ymm11
+        rorx    r14,rdx,34
+        rorx    r13,rdx,28
+        lea     r10,[rcx*1+r10]
+        vpaddq  ymm10,ymm2,YMMWORD[((-64))+rbp]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r8
+        xor     r14,r13
+        lea     rcx,[r15*1+rcx]
+        mov     r12,r11
+        vmovdqa YMMWORD[64+rsp],ymm10
+        vpalignr        ymm8,ymm4,ymm3,8
+        add     rbx,QWORD[((96+256))+rsp]
+        and     r12,r10
+        rorx    r13,r10,41
+        vpalignr        ymm11,ymm0,ymm7,8
+        rorx    r15,r10,18
+        lea     rcx,[r14*1+rcx]
+        lea     rbx,[r12*1+rbx]
+        vpsrlq  ymm10,ymm8,1
+        andn    r12,r10,rax
+        xor     r13,r15
+        rorx    r14,r10,14
+        vpaddq  ymm3,ymm3,ymm11
+        vpsrlq  ymm11,ymm8,7
+        lea     rbx,[r12*1+rbx]
+        xor     r13,r14
+        mov     r15,rcx
+        vpsllq  ymm9,ymm8,56
+        vpxor   ymm8,ymm11,ymm10
+        rorx    r12,rcx,39
+        lea     rbx,[r13*1+rbx]
+        xor     r15,rdx
+        vpsrlq  ymm10,ymm10,7
+        vpxor   ymm8,ymm8,ymm9
+        rorx    r14,rcx,34
+        rorx    r13,rcx,28
+        lea     r9,[rbx*1+r9]
+        vpsllq  ymm9,ymm9,7
+        vpxor   ymm8,ymm8,ymm10
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rdx
+        vpsrlq  ymm11,ymm2,6
+        vpxor   ymm8,ymm8,ymm9
+        xor     r14,r13
+        lea     rbx,[rdi*1+rbx]
+        mov     r12,r10
+        vpsllq  ymm10,ymm2,3
+        vpaddq  ymm3,ymm3,ymm8
+        add     rax,QWORD[((104+256))+rsp]
+        and     r12,r9
+        rorx    r13,r9,41
+        vpsrlq  ymm9,ymm2,19
+        vpxor   ymm11,ymm11,ymm10
+        rorx    rdi,r9,18
+        lea     rbx,[r14*1+rbx]
+        lea     rax,[r12*1+rax]
+        vpsllq  ymm10,ymm10,42
+        vpxor   ymm11,ymm11,ymm9
+        andn    r12,r9,r11
+        xor     r13,rdi
+        rorx    r14,r9,14
+        vpsrlq  ymm9,ymm9,42
+        vpxor   ymm11,ymm11,ymm10
+        lea     rax,[r12*1+rax]
+        xor     r13,r14
+        mov     rdi,rbx
+        vpxor   ymm11,ymm11,ymm9
+        rorx    r12,rbx,39
+        lea     rax,[r13*1+rax]
+        xor     rdi,rcx
+        vpaddq  ymm3,ymm3,ymm11
+        rorx    r14,rbx,34
+        rorx    r13,rbx,28
+        lea     r8,[rax*1+r8]
+        vpaddq  ymm10,ymm3,YMMWORD[((-32))+rbp]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rcx
+        xor     r14,r13
+        lea     rax,[r15*1+rax]
+        mov     r12,r9
+        vmovdqa YMMWORD[96+rsp],ymm10
+        lea     rsp,[((-128))+rsp]
+        vpalignr        ymm8,ymm5,ymm4,8
+        add     r11,QWORD[((0+256))+rsp]
+        and     r12,r8
+        rorx    r13,r8,41
+        vpalignr        ymm11,ymm1,ymm0,8
+        rorx    r15,r8,18
+        lea     rax,[r14*1+rax]
+        lea     r11,[r12*1+r11]
+        vpsrlq  ymm10,ymm8,1
+        andn    r12,r8,r10
+        xor     r13,r15
+        rorx    r14,r8,14
+        vpaddq  ymm4,ymm4,ymm11
+        vpsrlq  ymm11,ymm8,7
+        lea     r11,[r12*1+r11]
+        xor     r13,r14
+        mov     r15,rax
+        vpsllq  ymm9,ymm8,56
+        vpxor   ymm8,ymm11,ymm10
+        rorx    r12,rax,39
+        lea     r11,[r13*1+r11]
+        xor     r15,rbx
+        vpsrlq  ymm10,ymm10,7
+        vpxor   ymm8,ymm8,ymm9
+        rorx    r14,rax,34
+        rorx    r13,rax,28
+        lea     rdx,[r11*1+rdx]
+        vpsllq  ymm9,ymm9,7
+        vpxor   ymm8,ymm8,ymm10
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rbx
+        vpsrlq  ymm11,ymm3,6
+        vpxor   ymm8,ymm8,ymm9
+        xor     r14,r13
+        lea     r11,[rdi*1+r11]
+        mov     r12,r8
+        vpsllq  ymm10,ymm3,3
+        vpaddq  ymm4,ymm4,ymm8
+        add     r10,QWORD[((8+256))+rsp]
+        and     r12,rdx
+        rorx    r13,rdx,41
+        vpsrlq  ymm9,ymm3,19
+        vpxor   ymm11,ymm11,ymm10
+        rorx    rdi,rdx,18
+        lea     r11,[r14*1+r11]
+        lea     r10,[r12*1+r10]
+        vpsllq  ymm10,ymm10,42
+        vpxor   ymm11,ymm11,ymm9
+        andn    r12,rdx,r9
+        xor     r13,rdi
+        rorx    r14,rdx,14
+        vpsrlq  ymm9,ymm9,42
+        vpxor   ymm11,ymm11,ymm10
+        lea     r10,[r12*1+r10]
+        xor     r13,r14
+        mov     rdi,r11
+        vpxor   ymm11,ymm11,ymm9
+        rorx    r12,r11,39
+        lea     r10,[r13*1+r10]
+        xor     rdi,rax
+        vpaddq  ymm4,ymm4,ymm11
+        rorx    r14,r11,34
+        rorx    r13,r11,28
+        lea     rcx,[r10*1+rcx]
+        vpaddq  ymm10,ymm4,YMMWORD[rbp]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rax
+        xor     r14,r13
+        lea     r10,[r15*1+r10]
+        mov     r12,rdx
+        vmovdqa YMMWORD[rsp],ymm10
+        vpalignr        ymm8,ymm6,ymm5,8
+        add     r9,QWORD[((32+256))+rsp]
+        and     r12,rcx
+        rorx    r13,rcx,41
+        vpalignr        ymm11,ymm2,ymm1,8
+        rorx    r15,rcx,18
+        lea     r10,[r14*1+r10]
+        lea     r9,[r12*1+r9]
+        vpsrlq  ymm10,ymm8,1
+        andn    r12,rcx,r8
+        xor     r13,r15
+        rorx    r14,rcx,14
+        vpaddq  ymm5,ymm5,ymm11
+        vpsrlq  ymm11,ymm8,7
+        lea     r9,[r12*1+r9]
+        xor     r13,r14
+        mov     r15,r10
+        vpsllq  ymm9,ymm8,56
+        vpxor   ymm8,ymm11,ymm10
+        rorx    r12,r10,39
+        lea     r9,[r13*1+r9]
+        xor     r15,r11
+        vpsrlq  ymm10,ymm10,7
+        vpxor   ymm8,ymm8,ymm9
+        rorx    r14,r10,34
+        rorx    r13,r10,28
+        lea     rbx,[r9*1+rbx]
+        vpsllq  ymm9,ymm9,7
+        vpxor   ymm8,ymm8,ymm10
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r11
+        vpsrlq  ymm11,ymm4,6
+        vpxor   ymm8,ymm8,ymm9
+        xor     r14,r13
+        lea     r9,[rdi*1+r9]
+        mov     r12,rcx
+        vpsllq  ymm10,ymm4,3
+        vpaddq  ymm5,ymm5,ymm8
+        add     r8,QWORD[((40+256))+rsp]
+        and     r12,rbx
+        rorx    r13,rbx,41
+        vpsrlq  ymm9,ymm4,19
+        vpxor   ymm11,ymm11,ymm10
+        rorx    rdi,rbx,18
+        lea     r9,[r14*1+r9]
+        lea     r8,[r12*1+r8]
+        vpsllq  ymm10,ymm10,42
+        vpxor   ymm11,ymm11,ymm9
+        andn    r12,rbx,rdx
+        xor     r13,rdi
+        rorx    r14,rbx,14
+        vpsrlq  ymm9,ymm9,42
+        vpxor   ymm11,ymm11,ymm10
+        lea     r8,[r12*1+r8]
+        xor     r13,r14
+        mov     rdi,r9
+        vpxor   ymm11,ymm11,ymm9
+        rorx    r12,r9,39
+        lea     r8,[r13*1+r8]
+        xor     rdi,r10
+        vpaddq  ymm5,ymm5,ymm11
+        rorx    r14,r9,34
+        rorx    r13,r9,28
+        lea     rax,[r8*1+rax]
+        vpaddq  ymm10,ymm5,YMMWORD[32+rbp]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r10
+        xor     r14,r13
+        lea     r8,[r15*1+r8]
+        mov     r12,rbx
+        vmovdqa YMMWORD[32+rsp],ymm10
+        vpalignr        ymm8,ymm7,ymm6,8
+        add     rdx,QWORD[((64+256))+rsp]
+        and     r12,rax
+        rorx    r13,rax,41
+        vpalignr        ymm11,ymm3,ymm2,8
+        rorx    r15,rax,18
+        lea     r8,[r14*1+r8]
+        lea     rdx,[r12*1+rdx]
+        vpsrlq  ymm10,ymm8,1
+        andn    r12,rax,rcx
+        xor     r13,r15
+        rorx    r14,rax,14
+        vpaddq  ymm6,ymm6,ymm11
+        vpsrlq  ymm11,ymm8,7
+        lea     rdx,[r12*1+rdx]
+        xor     r13,r14
+        mov     r15,r8
+        vpsllq  ymm9,ymm8,56
+        vpxor   ymm8,ymm11,ymm10
+        rorx    r12,r8,39
+        lea     rdx,[r13*1+rdx]
+        xor     r15,r9
+        vpsrlq  ymm10,ymm10,7
+        vpxor   ymm8,ymm8,ymm9
+        rorx    r14,r8,34
+        rorx    r13,r8,28
+        lea     r11,[rdx*1+r11]
+        vpsllq  ymm9,ymm9,7
+        vpxor   ymm8,ymm8,ymm10
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r9
+        vpsrlq  ymm11,ymm5,6
+        vpxor   ymm8,ymm8,ymm9
+        xor     r14,r13
+        lea     rdx,[rdi*1+rdx]
+        mov     r12,rax
+        vpsllq  ymm10,ymm5,3
+        vpaddq  ymm6,ymm6,ymm8
+        add     rcx,QWORD[((72+256))+rsp]
+        and     r12,r11
+        rorx    r13,r11,41
+        vpsrlq  ymm9,ymm5,19
+        vpxor   ymm11,ymm11,ymm10
+        rorx    rdi,r11,18
+        lea     rdx,[r14*1+rdx]
+        lea     rcx,[r12*1+rcx]
+        vpsllq  ymm10,ymm10,42
+        vpxor   ymm11,ymm11,ymm9
+        andn    r12,r11,rbx
+        xor     r13,rdi
+        rorx    r14,r11,14
+        vpsrlq  ymm9,ymm9,42
+        vpxor   ymm11,ymm11,ymm10
+        lea     rcx,[r12*1+rcx]
+        xor     r13,r14
+        mov     rdi,rdx
+        vpxor   ymm11,ymm11,ymm9
+        rorx    r12,rdx,39
+        lea     rcx,[r13*1+rcx]
+        xor     rdi,r8
+        vpaddq  ymm6,ymm6,ymm11
+        rorx    r14,rdx,34
+        rorx    r13,rdx,28
+        lea     r10,[rcx*1+r10]
+        vpaddq  ymm10,ymm6,YMMWORD[64+rbp]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r8
+        xor     r14,r13
+        lea     rcx,[r15*1+rcx]
+        mov     r12,r11
+        vmovdqa YMMWORD[64+rsp],ymm10
+        vpalignr        ymm8,ymm0,ymm7,8
+        add     rbx,QWORD[((96+256))+rsp]
+        and     r12,r10
+        rorx    r13,r10,41
+        vpalignr        ymm11,ymm4,ymm3,8
+        rorx    r15,r10,18
+        lea     rcx,[r14*1+rcx]
+        lea     rbx,[r12*1+rbx]
+        vpsrlq  ymm10,ymm8,1
+        andn    r12,r10,rax
+        xor     r13,r15
+        rorx    r14,r10,14
+        vpaddq  ymm7,ymm7,ymm11
+        vpsrlq  ymm11,ymm8,7
+        lea     rbx,[r12*1+rbx]
+        xor     r13,r14
+        mov     r15,rcx
+        vpsllq  ymm9,ymm8,56
+        vpxor   ymm8,ymm11,ymm10
+        rorx    r12,rcx,39
+        lea     rbx,[r13*1+rbx]
+        xor     r15,rdx
+        vpsrlq  ymm10,ymm10,7
+        vpxor   ymm8,ymm8,ymm9
+        rorx    r14,rcx,34
+        rorx    r13,rcx,28
+        lea     r9,[rbx*1+r9]
+        vpsllq  ymm9,ymm9,7
+        vpxor   ymm8,ymm8,ymm10
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rdx
+        vpsrlq  ymm11,ymm6,6
+        vpxor   ymm8,ymm8,ymm9
+        xor     r14,r13
+        lea     rbx,[rdi*1+rbx]
+        mov     r12,r10
+        vpsllq  ymm10,ymm6,3
+        vpaddq  ymm7,ymm7,ymm8
+        add     rax,QWORD[((104+256))+rsp]
+        and     r12,r9
+        rorx    r13,r9,41
+        vpsrlq  ymm9,ymm6,19
+        vpxor   ymm11,ymm11,ymm10
+        rorx    rdi,r9,18
+        lea     rbx,[r14*1+rbx]
+        lea     rax,[r12*1+rax]
+        vpsllq  ymm10,ymm10,42
+        vpxor   ymm11,ymm11,ymm9
+        andn    r12,r9,r11
+        xor     r13,rdi
+        rorx    r14,r9,14
+        vpsrlq  ymm9,ymm9,42
+        vpxor   ymm11,ymm11,ymm10
+        lea     rax,[r12*1+rax]
+        xor     r13,r14
+        mov     rdi,rbx
+        vpxor   ymm11,ymm11,ymm9
+        rorx    r12,rbx,39
+        lea     rax,[r13*1+rax]
+        xor     rdi,rcx
+        vpaddq  ymm7,ymm7,ymm11
+        rorx    r14,rbx,34
+        rorx    r13,rbx,28
+        lea     r8,[rax*1+r8]
+        vpaddq  ymm10,ymm7,YMMWORD[96+rbp]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rcx
+        xor     r14,r13
+        lea     rax,[r15*1+rax]
+        mov     r12,r9
+        vmovdqa YMMWORD[96+rsp],ymm10
+        lea     rbp,[256+rbp]
+        cmp     BYTE[((-121))+rbp],0
+        jne     NEAR $L$avx2_00_47
+        add     r11,QWORD[((0+128))+rsp]
+        and     r12,r8
+        rorx    r13,r8,41
+        rorx    r15,r8,18
+        lea     rax,[r14*1+rax]
+        lea     r11,[r12*1+r11]
+        andn    r12,r8,r10
+        xor     r13,r15
+        rorx    r14,r8,14
+        lea     r11,[r12*1+r11]
+        xor     r13,r14
+        mov     r15,rax
+        rorx    r12,rax,39
+        lea     r11,[r13*1+r11]
+        xor     r15,rbx
+        rorx    r14,rax,34
+        rorx    r13,rax,28
+        lea     rdx,[r11*1+rdx]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rbx
+        xor     r14,r13
+        lea     r11,[rdi*1+r11]
+        mov     r12,r8
+        add     r10,QWORD[((8+128))+rsp]
+        and     r12,rdx
+        rorx    r13,rdx,41
+        rorx    rdi,rdx,18
+        lea     r11,[r14*1+r11]
+        lea     r10,[r12*1+r10]
+        andn    r12,rdx,r9
+        xor     r13,rdi
+        rorx    r14,rdx,14
+        lea     r10,[r12*1+r10]
+        xor     r13,r14
+        mov     rdi,r11
+        rorx    r12,r11,39
+        lea     r10,[r13*1+r10]
+        xor     rdi,rax
+        rorx    r14,r11,34
+        rorx    r13,r11,28
+        lea     rcx,[r10*1+rcx]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rax
+        xor     r14,r13
+        lea     r10,[r15*1+r10]
+        mov     r12,rdx
+        add     r9,QWORD[((32+128))+rsp]
+        and     r12,rcx
+        rorx    r13,rcx,41
+        rorx    r15,rcx,18
+        lea     r10,[r14*1+r10]
+        lea     r9,[r12*1+r9]
+        andn    r12,rcx,r8
+        xor     r13,r15
+        rorx    r14,rcx,14
+        lea     r9,[r12*1+r9]
+        xor     r13,r14
+        mov     r15,r10
+        rorx    r12,r10,39
+        lea     r9,[r13*1+r9]
+        xor     r15,r11
+        rorx    r14,r10,34
+        rorx    r13,r10,28
+        lea     rbx,[r9*1+rbx]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r11
+        xor     r14,r13
+        lea     r9,[rdi*1+r9]
+        mov     r12,rcx
+        add     r8,QWORD[((40+128))+rsp]
+        and     r12,rbx
+        rorx    r13,rbx,41
+        rorx    rdi,rbx,18
+        lea     r9,[r14*1+r9]
+        lea     r8,[r12*1+r8]
+        andn    r12,rbx,rdx
+        xor     r13,rdi
+        rorx    r14,rbx,14
+        lea     r8,[r12*1+r8]
+        xor     r13,r14
+        mov     rdi,r9
+        rorx    r12,r9,39
+        lea     r8,[r13*1+r8]
+        xor     rdi,r10
+        rorx    r14,r9,34
+        rorx    r13,r9,28
+        lea     rax,[r8*1+rax]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r10
+        xor     r14,r13
+        lea     r8,[r15*1+r8]
+        mov     r12,rbx
+        add     rdx,QWORD[((64+128))+rsp]
+        and     r12,rax
+        rorx    r13,rax,41
+        rorx    r15,rax,18
+        lea     r8,[r14*1+r8]
+        lea     rdx,[r12*1+rdx]
+        andn    r12,rax,rcx
+        xor     r13,r15
+        rorx    r14,rax,14
+        lea     rdx,[r12*1+rdx]
+        xor     r13,r14
+        mov     r15,r8
+        rorx    r12,r8,39
+        lea     rdx,[r13*1+rdx]
+        xor     r15,r9
+        rorx    r14,r8,34
+        rorx    r13,r8,28
+        lea     r11,[rdx*1+r11]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r9
+        xor     r14,r13
+        lea     rdx,[rdi*1+rdx]
+        mov     r12,rax
+        add     rcx,QWORD[((72+128))+rsp]
+        and     r12,r11
+        rorx    r13,r11,41
+        rorx    rdi,r11,18
+        lea     rdx,[r14*1+rdx]
+        lea     rcx,[r12*1+rcx]
+        andn    r12,r11,rbx
+        xor     r13,rdi
+        rorx    r14,r11,14
+        lea     rcx,[r12*1+rcx]
+        xor     r13,r14
+        mov     rdi,rdx
+        rorx    r12,rdx,39
+        lea     rcx,[r13*1+rcx]
+        xor     rdi,r8
+        rorx    r14,rdx,34
+        rorx    r13,rdx,28
+        lea     r10,[rcx*1+r10]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r8
+        xor     r14,r13
+        lea     rcx,[r15*1+rcx]
+        mov     r12,r11
+        add     rbx,QWORD[((96+128))+rsp]
+        and     r12,r10
+        rorx    r13,r10,41
+        rorx    r15,r10,18
+        lea     rcx,[r14*1+rcx]
+        lea     rbx,[r12*1+rbx]
+        andn    r12,r10,rax
+        xor     r13,r15
+        rorx    r14,r10,14
+        lea     rbx,[r12*1+rbx]
+        xor     r13,r14
+        mov     r15,rcx
+        rorx    r12,rcx,39
+        lea     rbx,[r13*1+rbx]
+        xor     r15,rdx
+        rorx    r14,rcx,34
+        rorx    r13,rcx,28
+        lea     r9,[rbx*1+r9]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rdx
+        xor     r14,r13
+        lea     rbx,[rdi*1+rbx]
+        mov     r12,r10
+        add     rax,QWORD[((104+128))+rsp]
+        and     r12,r9
+        rorx    r13,r9,41
+        rorx    rdi,r9,18
+        lea     rbx,[r14*1+rbx]
+        lea     rax,[r12*1+rax]
+        andn    r12,r9,r11
+        xor     r13,rdi
+        rorx    r14,r9,14
+        lea     rax,[r12*1+rax]
+        xor     r13,r14
+        mov     rdi,rbx
+        rorx    r12,rbx,39
+        lea     rax,[r13*1+rax]
+        xor     rdi,rcx
+        rorx    r14,rbx,34
+        rorx    r13,rbx,28
+        lea     r8,[rax*1+r8]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rcx
+        xor     r14,r13
+        lea     rax,[r15*1+rax]
+        mov     r12,r9
+        add     r11,QWORD[rsp]
+        and     r12,r8
+        rorx    r13,r8,41
+        rorx    r15,r8,18
+        lea     rax,[r14*1+rax]
+        lea     r11,[r12*1+r11]
+        andn    r12,r8,r10
+        xor     r13,r15
+        rorx    r14,r8,14
+        lea     r11,[r12*1+r11]
+        xor     r13,r14
+        mov     r15,rax
+        rorx    r12,rax,39
+        lea     r11,[r13*1+r11]
+        xor     r15,rbx
+        rorx    r14,rax,34
+        rorx    r13,rax,28
+        lea     rdx,[r11*1+rdx]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rbx
+        xor     r14,r13
+        lea     r11,[rdi*1+r11]
+        mov     r12,r8
+        add     r10,QWORD[8+rsp]
+        and     r12,rdx
+        rorx    r13,rdx,41
+        rorx    rdi,rdx,18
+        lea     r11,[r14*1+r11]
+        lea     r10,[r12*1+r10]
+        andn    r12,rdx,r9
+        xor     r13,rdi
+        rorx    r14,rdx,14
+        lea     r10,[r12*1+r10]
+        xor     r13,r14
+        mov     rdi,r11
+        rorx    r12,r11,39
+        lea     r10,[r13*1+r10]
+        xor     rdi,rax
+        rorx    r14,r11,34
+        rorx    r13,r11,28
+        lea     rcx,[r10*1+rcx]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rax
+        xor     r14,r13
+        lea     r10,[r15*1+r10]
+        mov     r12,rdx
+        add     r9,QWORD[32+rsp]
+        and     r12,rcx
+        rorx    r13,rcx,41
+        rorx    r15,rcx,18
+        lea     r10,[r14*1+r10]
+        lea     r9,[r12*1+r9]
+        andn    r12,rcx,r8
+        xor     r13,r15
+        rorx    r14,rcx,14
+        lea     r9,[r12*1+r9]
+        xor     r13,r14
+        mov     r15,r10
+        rorx    r12,r10,39
+        lea     r9,[r13*1+r9]
+        xor     r15,r11
+        rorx    r14,r10,34
+        rorx    r13,r10,28
+        lea     rbx,[r9*1+rbx]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r11
+        xor     r14,r13
+        lea     r9,[rdi*1+r9]
+        mov     r12,rcx
+        add     r8,QWORD[40+rsp]
+        and     r12,rbx
+        rorx    r13,rbx,41
+        rorx    rdi,rbx,18
+        lea     r9,[r14*1+r9]
+        lea     r8,[r12*1+r8]
+        andn    r12,rbx,rdx
+        xor     r13,rdi
+        rorx    r14,rbx,14
+        lea     r8,[r12*1+r8]
+        xor     r13,r14
+        mov     rdi,r9
+        rorx    r12,r9,39
+        lea     r8,[r13*1+r8]
+        xor     rdi,r10
+        rorx    r14,r9,34
+        rorx    r13,r9,28
+        lea     rax,[r8*1+rax]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r10
+        xor     r14,r13
+        lea     r8,[r15*1+r8]
+        mov     r12,rbx
+        add     rdx,QWORD[64+rsp]
+        and     r12,rax
+        rorx    r13,rax,41
+        rorx    r15,rax,18
+        lea     r8,[r14*1+r8]
+        lea     rdx,[r12*1+rdx]
+        andn    r12,rax,rcx
+        xor     r13,r15
+        rorx    r14,rax,14
+        lea     rdx,[r12*1+rdx]
+        xor     r13,r14
+        mov     r15,r8
+        rorx    r12,r8,39
+        lea     rdx,[r13*1+rdx]
+        xor     r15,r9
+        rorx    r14,r8,34
+        rorx    r13,r8,28
+        lea     r11,[rdx*1+r11]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r9
+        xor     r14,r13
+        lea     rdx,[rdi*1+rdx]
+        mov     r12,rax
+        add     rcx,QWORD[72+rsp]
+        and     r12,r11
+        rorx    r13,r11,41
+        rorx    rdi,r11,18
+        lea     rdx,[r14*1+rdx]
+        lea     rcx,[r12*1+rcx]
+        andn    r12,r11,rbx
+        xor     r13,rdi
+        rorx    r14,r11,14
+        lea     rcx,[r12*1+rcx]
+        xor     r13,r14
+        mov     rdi,rdx
+        rorx    r12,rdx,39
+        lea     rcx,[r13*1+rcx]
+        xor     rdi,r8
+        rorx    r14,rdx,34
+        rorx    r13,rdx,28
+        lea     r10,[rcx*1+r10]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r8
+        xor     r14,r13
+        lea     rcx,[r15*1+rcx]
+        mov     r12,r11
+        add     rbx,QWORD[96+rsp]
+        and     r12,r10
+        rorx    r13,r10,41
+        rorx    r15,r10,18
+        lea     rcx,[r14*1+rcx]
+        lea     rbx,[r12*1+rbx]
+        andn    r12,r10,rax
+        xor     r13,r15
+        rorx    r14,r10,14
+        lea     rbx,[r12*1+rbx]
+        xor     r13,r14
+        mov     r15,rcx
+        rorx    r12,rcx,39
+        lea     rbx,[r13*1+rbx]
+        xor     r15,rdx
+        rorx    r14,rcx,34
+        rorx    r13,rcx,28
+        lea     r9,[rbx*1+r9]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rdx
+        xor     r14,r13
+        lea     rbx,[rdi*1+rbx]
+        mov     r12,r10
+        add     rax,QWORD[104+rsp]
+        and     r12,r9
+        rorx    r13,r9,41
+        rorx    rdi,r9,18
+        lea     rbx,[r14*1+rbx]
+        lea     rax,[r12*1+rax]
+        andn    r12,r9,r11
+        xor     r13,rdi
+        rorx    r14,r9,14
+        lea     rax,[r12*1+rax]
+        xor     r13,r14
+        mov     rdi,rbx
+        rorx    r12,rbx,39
+        lea     rax,[r13*1+rax]
+        xor     rdi,rcx
+        rorx    r14,rbx,34
+        rorx    r13,rbx,28
+        lea     r8,[rax*1+r8]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rcx
+        xor     r14,r13
+        lea     rax,[r15*1+rax]
+        mov     r12,r9
+        mov     rdi,QWORD[1280+rsp]
+        add     rax,r14
+
+        lea     rbp,[1152+rsp]
+
+        add     rax,QWORD[rdi]
+        add     rbx,QWORD[8+rdi]
+        add     rcx,QWORD[16+rdi]
+        add     rdx,QWORD[24+rdi]
+        add     r8,QWORD[32+rdi]
+        add     r9,QWORD[40+rdi]
+        add     r10,QWORD[48+rdi]
+        add     r11,QWORD[56+rdi]
+
+        mov     QWORD[rdi],rax
+        mov     QWORD[8+rdi],rbx
+        mov     QWORD[16+rdi],rcx
+        mov     QWORD[24+rdi],rdx
+        mov     QWORD[32+rdi],r8
+        mov     QWORD[40+rdi],r9
+        mov     QWORD[48+rdi],r10
+        mov     QWORD[56+rdi],r11
+
+        cmp     rsi,QWORD[144+rbp]
+        je      NEAR $L$done_avx2
+
+        xor     r14,r14
+        mov     rdi,rbx
+        xor     rdi,rcx
+        mov     r12,r9
+        jmp     NEAR $L$ower_avx2
+ALIGN   16
+$L$ower_avx2:
+        add     r11,QWORD[((0+16))+rbp]
+        and     r12,r8
+        rorx    r13,r8,41
+        rorx    r15,r8,18
+        lea     rax,[r14*1+rax]
+        lea     r11,[r12*1+r11]
+        andn    r12,r8,r10
+        xor     r13,r15
+        rorx    r14,r8,14
+        lea     r11,[r12*1+r11]
+        xor     r13,r14
+        mov     r15,rax
+        rorx    r12,rax,39
+        lea     r11,[r13*1+r11]
+        xor     r15,rbx
+        rorx    r14,rax,34
+        rorx    r13,rax,28
+        lea     rdx,[r11*1+rdx]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rbx
+        xor     r14,r13
+        lea     r11,[rdi*1+r11]
+        mov     r12,r8
+        add     r10,QWORD[((8+16))+rbp]
+        and     r12,rdx
+        rorx    r13,rdx,41
+        rorx    rdi,rdx,18
+        lea     r11,[r14*1+r11]
+        lea     r10,[r12*1+r10]
+        andn    r12,rdx,r9
+        xor     r13,rdi
+        rorx    r14,rdx,14
+        lea     r10,[r12*1+r10]
+        xor     r13,r14
+        mov     rdi,r11
+        rorx    r12,r11,39
+        lea     r10,[r13*1+r10]
+        xor     rdi,rax
+        rorx    r14,r11,34
+        rorx    r13,r11,28
+        lea     rcx,[r10*1+rcx]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rax
+        xor     r14,r13
+        lea     r10,[r15*1+r10]
+        mov     r12,rdx
+        add     r9,QWORD[((32+16))+rbp]
+        and     r12,rcx
+        rorx    r13,rcx,41
+        rorx    r15,rcx,18
+        lea     r10,[r14*1+r10]
+        lea     r9,[r12*1+r9]
+        andn    r12,rcx,r8
+        xor     r13,r15
+        rorx    r14,rcx,14
+        lea     r9,[r12*1+r9]
+        xor     r13,r14
+        mov     r15,r10
+        rorx    r12,r10,39
+        lea     r9,[r13*1+r9]
+        xor     r15,r11
+        rorx    r14,r10,34
+        rorx    r13,r10,28
+        lea     rbx,[r9*1+rbx]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r11
+        xor     r14,r13
+        lea     r9,[rdi*1+r9]
+        mov     r12,rcx
+        add     r8,QWORD[((40+16))+rbp]
+        and     r12,rbx
+        rorx    r13,rbx,41
+        rorx    rdi,rbx,18
+        lea     r9,[r14*1+r9]
+        lea     r8,[r12*1+r8]
+        andn    r12,rbx,rdx
+        xor     r13,rdi
+        rorx    r14,rbx,14
+        lea     r8,[r12*1+r8]
+        xor     r13,r14
+        mov     rdi,r9
+        rorx    r12,r9,39
+        lea     r8,[r13*1+r8]
+        xor     rdi,r10
+        rorx    r14,r9,34
+        rorx    r13,r9,28
+        lea     rax,[r8*1+rax]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r10
+        xor     r14,r13
+        lea     r8,[r15*1+r8]
+        mov     r12,rbx
+        add     rdx,QWORD[((64+16))+rbp]
+        and     r12,rax
+        rorx    r13,rax,41
+        rorx    r15,rax,18
+        lea     r8,[r14*1+r8]
+        lea     rdx,[r12*1+rdx]
+        andn    r12,rax,rcx
+        xor     r13,r15
+        rorx    r14,rax,14
+        lea     rdx,[r12*1+rdx]
+        xor     r13,r14
+        mov     r15,r8
+        rorx    r12,r8,39
+        lea     rdx,[r13*1+rdx]
+        xor     r15,r9
+        rorx    r14,r8,34
+        rorx    r13,r8,28
+        lea     r11,[rdx*1+r11]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,r9
+        xor     r14,r13
+        lea     rdx,[rdi*1+rdx]
+        mov     r12,rax
+        add     rcx,QWORD[((72+16))+rbp]
+        and     r12,r11
+        rorx    r13,r11,41
+        rorx    rdi,r11,18
+        lea     rdx,[r14*1+rdx]
+        lea     rcx,[r12*1+rcx]
+        andn    r12,r11,rbx
+        xor     r13,rdi
+        rorx    r14,r11,14
+        lea     rcx,[r12*1+rcx]
+        xor     r13,r14
+        mov     rdi,rdx
+        rorx    r12,rdx,39
+        lea     rcx,[r13*1+rcx]
+        xor     rdi,r8
+        rorx    r14,rdx,34
+        rorx    r13,rdx,28
+        lea     r10,[rcx*1+r10]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,r8
+        xor     r14,r13
+        lea     rcx,[r15*1+rcx]
+        mov     r12,r11
+        add     rbx,QWORD[((96+16))+rbp]
+        and     r12,r10
+        rorx    r13,r10,41
+        rorx    r15,r10,18
+        lea     rcx,[r14*1+rcx]
+        lea     rbx,[r12*1+rbx]
+        andn    r12,r10,rax
+        xor     r13,r15
+        rorx    r14,r10,14
+        lea     rbx,[r12*1+rbx]
+        xor     r13,r14
+        mov     r15,rcx
+        rorx    r12,rcx,39
+        lea     rbx,[r13*1+rbx]
+        xor     r15,rdx
+        rorx    r14,rcx,34
+        rorx    r13,rcx,28
+        lea     r9,[rbx*1+r9]
+        and     rdi,r15
+        xor     r14,r12
+        xor     rdi,rdx
+        xor     r14,r13
+        lea     rbx,[rdi*1+rbx]
+        mov     r12,r10
+        add     rax,QWORD[((104+16))+rbp]
+        and     r12,r9
+        rorx    r13,r9,41
+        rorx    rdi,r9,18
+        lea     rbx,[r14*1+rbx]
+        lea     rax,[r12*1+rax]
+        andn    r12,r9,r11
+        xor     r13,rdi
+        rorx    r14,r9,14
+        lea     rax,[r12*1+rax]
+        xor     r13,r14
+        mov     rdi,rbx
+        rorx    r12,rbx,39
+        lea     rax,[r13*1+rax]
+        xor     rdi,rcx
+        rorx    r14,rbx,34
+        rorx    r13,rbx,28
+        lea     r8,[rax*1+r8]
+        and     r15,rdi
+        xor     r14,r12
+        xor     r15,rcx
+        xor     r14,r13
+        lea     rax,[r15*1+rax]
+        mov     r12,r9
+        lea     rbp,[((-128))+rbp]
+        cmp     rbp,rsp
+        jae     NEAR $L$ower_avx2
+
+        mov     rdi,QWORD[1280+rsp]
+        add     rax,r14
+
+        lea     rsp,[1152+rsp]
+
+        add     rax,QWORD[rdi]
+        add     rbx,QWORD[8+rdi]
+        add     rcx,QWORD[16+rdi]
+        add     rdx,QWORD[24+rdi]
+        add     r8,QWORD[32+rdi]
+        add     r9,QWORD[40+rdi]
+        lea     rsi,[256+rsi]
+        add     r10,QWORD[48+rdi]
+        mov     r12,rsi
+        add     r11,QWORD[56+rdi]
+        cmp     rsi,QWORD[((128+16))+rsp]
+
+        mov     QWORD[rdi],rax
+        cmove   r12,rsp
+        mov     QWORD[8+rdi],rbx
+        mov     QWORD[16+rdi],rcx
+        mov     QWORD[24+rdi],rdx
+        mov     QWORD[32+rdi],r8
+        mov     QWORD[40+rdi],r9
+        mov     QWORD[48+rdi],r10
+        mov     QWORD[56+rdi],r11
+
+        jbe     NEAR $L$oop_avx2
+        lea     rbp,[rsp]
+
+$L$done_avx2:
+        lea     rsp,[rbp]
+        mov     rsi,QWORD[152+rsp]
+
+        vzeroupper
+        movaps  xmm6,XMMWORD[((128+32))+rsp]
+        movaps  xmm7,XMMWORD[((128+48))+rsp]
+        movaps  xmm8,XMMWORD[((128+64))+rsp]
+        movaps  xmm9,XMMWORD[((128+80))+rsp]
+        movaps  xmm10,XMMWORD[((128+96))+rsp]
+        movaps  xmm11,XMMWORD[((128+112))+rsp]
+        mov     r15,QWORD[((-48))+rsi]
+
+        mov     r14,QWORD[((-40))+rsi]
+
+        mov     r13,QWORD[((-32))+rsi]
+
+        mov     r12,QWORD[((-24))+rsi]
+
+        mov     rbp,QWORD[((-16))+rsi]
+
+        mov     rbx,QWORD[((-8))+rsi]
+
+        lea     rsp,[rsi]
+
+$L$epilogue_avx2:
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_sha512_block_data_order_avx2:
+EXTERN  __imp_RtlVirtualUnwind
+
+ALIGN   16
+se_handler:
+        push    rsi
+        push    rdi
+        push    rbx
+        push    rbp
+        push    r12
+        push    r13
+        push    r14
+        push    r15
+        pushfq
+        sub     rsp,64
+
+        mov     rax,QWORD[120+r8]
+        mov     rbx,QWORD[248+r8]
+
+        mov     rsi,QWORD[8+r9]
+        mov     r11,QWORD[56+r9]
+
+        mov     r10d,DWORD[r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        mov     rax,QWORD[152+r8]
+
+        mov     r10d,DWORD[4+r11]
+        lea     r10,[r10*1+rsi]
+        cmp     rbx,r10
+        jae     NEAR $L$in_prologue
+        lea     r10,[$L$avx2_shortcut]
+        cmp     rbx,r10
+        jb      NEAR $L$not_in_avx2
+
+        and     rax,-256*8
+        add     rax,1152
+$L$not_in_avx2:
+        mov     rsi,rax
+        mov     rax,QWORD[((128+24))+rax]
+
+        mov     rbx,QWORD[((-8))+rax]
+        mov     rbp,QWORD[((-16))+rax]
+        mov     r12,QWORD[((-24))+rax]
+        mov     r13,QWORD[((-32))+rax]
+        mov     r14,QWORD[((-40))+rax]
+        mov     r15,QWORD[((-48))+rax]
+        mov     QWORD[144+r8],rbx
+        mov     QWORD[160+r8],rbp
+        mov     QWORD[216+r8],r12
+        mov     QWORD[224+r8],r13
+        mov     QWORD[232+r8],r14
+        mov     QWORD[240+r8],r15
+
+        lea     r10,[$L$epilogue]
+        cmp     rbx,r10
+        jb      NEAR $L$in_prologue
+
+        lea     rsi,[((128+32))+rsi]
+        lea     rdi,[512+r8]
+        mov     ecx,12
+        DD      0xa548f3fc
+
+$L$in_prologue:
+        mov     rdi,QWORD[8+rax]
+        mov     rsi,QWORD[16+rax]
+        mov     QWORD[152+r8],rax
+        mov     QWORD[168+r8],rsi
+        mov     QWORD[176+r8],rdi
+
+        mov     rdi,QWORD[40+r9]
+        mov     rsi,r8
+        mov     ecx,154
+        DD      0xa548f3fc
+
+        mov     rsi,r9
+        xor     rcx,rcx
+        mov     rdx,QWORD[8+rsi]
+        mov     r8,QWORD[rsi]
+        mov     r9,QWORD[16+rsi]
+        mov     r10,QWORD[40+rsi]
+        lea     r11,[56+rsi]
+        lea     r12,[24+rsi]
+        mov     QWORD[32+rsp],r10
+        mov     QWORD[40+rsp],r11
+        mov     QWORD[48+rsp],r12
+        mov     QWORD[56+rsp],rcx
+        call    QWORD[__imp_RtlVirtualUnwind]
+
+        mov     eax,1
+        add     rsp,64
+        popfq
+        pop     r15
+        pop     r14
+        pop     r13
+        pop     r12
+        pop     rbp
+        pop     rbx
+        pop     rdi
+        pop     rsi
+        DB      0F3h,0C3h               ;repret
+
+section .pdata rdata align=4
+ALIGN   4
+        DD      $L$SEH_begin_sha512_block_data_order wrt ..imagebase
+        DD      $L$SEH_end_sha512_block_data_order wrt ..imagebase
+        DD      $L$SEH_info_sha512_block_data_order wrt ..imagebase
+        DD      $L$SEH_begin_sha512_block_data_order_xop wrt ..imagebase
+        DD      $L$SEH_end_sha512_block_data_order_xop wrt ..imagebase
+        DD      $L$SEH_info_sha512_block_data_order_xop wrt ..imagebase
+        DD      $L$SEH_begin_sha512_block_data_order_avx wrt ..imagebase
+        DD      $L$SEH_end_sha512_block_data_order_avx wrt ..imagebase
+        DD      $L$SEH_info_sha512_block_data_order_avx wrt ..imagebase
+        DD      $L$SEH_begin_sha512_block_data_order_avx2 wrt ..imagebase
+        DD      $L$SEH_end_sha512_block_data_order_avx2 wrt ..imagebase
+        DD      $L$SEH_info_sha512_block_data_order_avx2 wrt ..imagebase
+section .xdata rdata align=8
+ALIGN   8
+$L$SEH_info_sha512_block_data_order:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue wrt ..imagebase,$L$epilogue wrt ..imagebase
+$L$SEH_info_sha512_block_data_order_xop:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_xop wrt ..imagebase,$L$epilogue_xop wrt ..imagebase
+$L$SEH_info_sha512_block_data_order_avx:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_avx wrt ..imagebase,$L$epilogue_avx wrt ..imagebase
+$L$SEH_info_sha512_block_data_order_avx2:
+DB      9,0,0,0
+        DD      se_handler wrt ..imagebase
+        DD      $L$prologue_avx2 wrt ..imagebase,$L$epilogue_avx2 wrt ..imagebase
diff --git a/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm b/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm
new file mode 100644
index 0000000000..2b64a074c3
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm
@@ -0,0 +1,472 @@
+; Copyright 2005-2018 The OpenSSL Project Authors. All Rights Reserved.
+;
+; Licensed under the OpenSSL license (the "License").  You may not use
+; this file except in compliance with the License.  You can obtain a copy
+; in the file LICENSE in the source distribution or at
+; https://www.openssl.org/source/license.html
+
+default rel
+%define XMMWORD
+%define YMMWORD
+%define ZMMWORD
+EXTERN  OPENSSL_cpuid_setup
+
+section .CRT$XCU rdata align=8
+                DQ      OPENSSL_cpuid_setup
+
+
+common  OPENSSL_ia32cap_P 16
+
+section .text code align=64
+
+
+global  OPENSSL_atomic_add
+
+ALIGN   16
+OPENSSL_atomic_add:
+        mov     eax,DWORD[rcx]
+$L$spin:        lea     r8,[rax*1+rdx]
+DB      0xf0
+        cmpxchg DWORD[rcx],r8d
+        jne     NEAR $L$spin
+        mov     eax,r8d
+DB      0x48,0x98
+        DB      0F3h,0C3h               ;repret
+
+
+global  OPENSSL_rdtsc
+
+ALIGN   16
+OPENSSL_rdtsc:
+        rdtsc
+        shl     rdx,32
+        or      rax,rdx
+        DB      0F3h,0C3h               ;repret
+
+
+global  OPENSSL_ia32_cpuid
+
+ALIGN   16
+OPENSSL_ia32_cpuid:
+        mov     QWORD[8+rsp],rdi        ;WIN64 prologue
+        mov     QWORD[16+rsp],rsi
+        mov     rax,rsp
+$L$SEH_begin_OPENSSL_ia32_cpuid:
+        mov     rdi,rcx
+
+
+
+        mov     r8,rbx
+
+
+        xor     eax,eax
+        mov     QWORD[8+rdi],rax
+        cpuid
+        mov     r11d,eax
+
+        xor     eax,eax
+        cmp     ebx,0x756e6547
+        setne   al
+        mov     r9d,eax
+        cmp     edx,0x49656e69
+        setne   al
+        or      r9d,eax
+        cmp     ecx,0x6c65746e
+        setne   al
+        or      r9d,eax
+        jz      NEAR $L$intel
+
+        cmp     ebx,0x68747541
+        setne   al
+        mov     r10d,eax
+        cmp     edx,0x69746E65
+        setne   al
+        or      r10d,eax
+        cmp     ecx,0x444D4163
+        setne   al
+        or      r10d,eax
+        jnz     NEAR $L$intel
+
+
+        mov     eax,0x80000000
+        cpuid
+        cmp     eax,0x80000001
+        jb      NEAR $L$intel
+        mov     r10d,eax
+        mov     eax,0x80000001
+        cpuid
+        or      r9d,ecx
+        and     r9d,0x00000801
+
+        cmp     r10d,0x80000008
+        jb      NEAR $L$intel
+
+        mov     eax,0x80000008
+        cpuid
+        movzx   r10,cl
+        inc     r10
+
+        mov     eax,1
+        cpuid
+        bt      edx,28
+        jnc     NEAR $L$generic
+        shr     ebx,16
+        cmp     bl,r10b
+        ja      NEAR $L$generic
+        and     edx,0xefffffff
+        jmp     NEAR $L$generic
+
+$L$intel:
+        cmp     r11d,4
+        mov     r10d,-1
+        jb      NEAR $L$nocacheinfo
+
+        mov     eax,4
+        mov     ecx,0
+        cpuid
+        mov     r10d,eax
+        shr     r10d,14
+        and     r10d,0xfff
+
+$L$nocacheinfo:
+        mov     eax,1
+        cpuid
+        movd    xmm0,eax
+        and     edx,0xbfefffff
+        cmp     r9d,0
+        jne     NEAR $L$notintel
+        or      edx,0x40000000
+        and     ah,15
+        cmp     ah,15
+        jne     NEAR $L$notP4
+        or      edx,0x00100000
+$L$notP4:
+        cmp     ah,6
+        jne     NEAR $L$notintel
+        and     eax,0x0fff0ff0
+        cmp     eax,0x00050670
+        je      NEAR $L$knights
+        cmp     eax,0x00080650
+        jne     NEAR $L$notintel
+$L$knights:
+        and     ecx,0xfbffffff
+
+$L$notintel:
+        bt      edx,28
+        jnc     NEAR $L$generic
+        and     edx,0xefffffff
+        cmp     r10d,0
+        je      NEAR $L$generic
+
+        or      edx,0x10000000
+        shr     ebx,16
+        cmp     bl,1
+        ja      NEAR $L$generic
+        and     edx,0xefffffff
+$L$generic:
+        and     r9d,0x00000800
+        and     ecx,0xfffff7ff
+        or      r9d,ecx
+
+        mov     r10d,edx
+
+        cmp     r11d,7
+        jb      NEAR $L$no_extended_info
+        mov     eax,7
+        xor     ecx,ecx
+        cpuid
+        bt      r9d,26
+        jc      NEAR $L$notknights
+        and     ebx,0xfff7ffff
+$L$notknights:
+        movd    eax,xmm0
+        and     eax,0x0fff0ff0
+        cmp     eax,0x00050650
+        jne     NEAR $L$notskylakex
+        and     ebx,0xfffeffff
+
+$L$notskylakex:
+        mov     DWORD[8+rdi],ebx
+        mov     DWORD[12+rdi],ecx
+$L$no_extended_info:
+
+        bt      r9d,27
+        jnc     NEAR $L$clear_avx
+        xor     ecx,ecx
+DB      0x0f,0x01,0xd0
+        and     eax,0xe6
+        cmp     eax,0xe6
+        je      NEAR $L$done
+        and     DWORD[8+rdi],0x3fdeffff
+
+
+
+
+        and     eax,6
+        cmp     eax,6
+        je      NEAR $L$done
+$L$clear_avx:
+        mov     eax,0xefffe7ff
+        and     r9d,eax
+        mov     eax,0x3fdeffdf
+        and     DWORD[8+rdi],eax
+$L$done:
+        shl     r9,32
+        mov     eax,r10d
+        mov     rbx,r8
+
+        or      rax,r9
+        mov     rdi,QWORD[8+rsp]        ;WIN64 epilogue
+        mov     rsi,QWORD[16+rsp]
+        DB      0F3h,0C3h               ;repret
+
+$L$SEH_end_OPENSSL_ia32_cpuid:
+
+global  OPENSSL_cleanse
+
+ALIGN   16
+OPENSSL_cleanse:
+        xor     rax,rax
+        cmp     rdx,15
+        jae     NEAR $L$ot
+        cmp     rdx,0
+        je      NEAR $L$ret
+$L$ittle:
+        mov     BYTE[rcx],al
+        sub     rdx,1
+        lea     rcx,[1+rcx]
+        jnz     NEAR $L$ittle
+$L$ret:
+        DB      0F3h,0C3h               ;repret
+ALIGN   16
+$L$ot:
+        test    rcx,7
+        jz      NEAR $L$aligned
+        mov     BYTE[rcx],al
+        lea     rdx,[((-1))+rdx]
+        lea     rcx,[1+rcx]
+        jmp     NEAR $L$ot
+$L$aligned:
+        mov     QWORD[rcx],rax
+        lea     rdx,[((-8))+rdx]
+        test    rdx,-8
+        lea     rcx,[8+rcx]
+        jnz     NEAR $L$aligned
+        cmp     rdx,0
+        jne     NEAR $L$ittle
+        DB      0F3h,0C3h               ;repret
+
+
+global  CRYPTO_memcmp
+
+ALIGN   16
+CRYPTO_memcmp:
+        xor     rax,rax
+        xor     r10,r10
+        cmp     r8,0
+        je      NEAR $L$no_data
+        cmp     r8,16
+        jne     NEAR $L$oop_cmp
+        mov     r10,QWORD[rcx]
+        mov     r11,QWORD[8+rcx]
+        mov     r8,1
+        xor     r10,QWORD[rdx]
+        xor     r11,QWORD[8+rdx]
+        or      r10,r11
+        cmovnz  rax,r8
+        DB      0F3h,0C3h               ;repret
+
+ALIGN   16
+$L$oop_cmp:
+        mov     r10b,BYTE[rcx]
+        lea     rcx,[1+rcx]
+        xor     r10b,BYTE[rdx]
+        lea     rdx,[1+rdx]
+        or      al,r10b
+        dec     r8
+        jnz     NEAR $L$oop_cmp
+        neg     rax
+        shr     rax,63
+$L$no_data:
+        DB      0F3h,0C3h               ;repret
+
+global  OPENSSL_wipe_cpu
+
+ALIGN   16
+OPENSSL_wipe_cpu:
+        pxor    xmm0,xmm0
+        pxor    xmm1,xmm1
+        pxor    xmm2,xmm2
+        pxor    xmm3,xmm3
+        pxor    xmm4,xmm4
+        pxor    xmm5,xmm5
+        xor     rcx,rcx
+        xor     rdx,rdx
+        xor     r8,r8
+        xor     r9,r9
+        xor     r10,r10
+        xor     r11,r11
+        lea     rax,[8+rsp]
+        DB      0F3h,0C3h               ;repret
+
+global  OPENSSL_instrument_bus
+
+ALIGN   16
+OPENSSL_instrument_bus:
+        mov     r10,rcx
+        mov     rcx,rdx
+        mov     r11,rdx
+
+        rdtsc
+        mov     r8d,eax
+        mov     r9d,0
+        clflush [r10]
+DB      0xf0
+        add     DWORD[r10],r9d
+        jmp     NEAR $L$oop
+ALIGN   16
+$L$oop: rdtsc
+        mov     edx,eax
+        sub     eax,r8d
+        mov     r8d,edx
+        mov     r9d,eax
+        clflush [r10]
+DB      0xf0
+        add     DWORD[r10],eax
+        lea     r10,[4+r10]
+        sub     rcx,1
+        jnz     NEAR $L$oop
+
+        mov     rax,r11
+        DB      0F3h,0C3h               ;repret
+
+
+global  OPENSSL_instrument_bus2
+
+ALIGN   16
+OPENSSL_instrument_bus2:
+        mov     r10,rcx
+        mov     rcx,rdx
+        mov     r11,r8
+        mov     QWORD[8+rsp],rcx
+
+        rdtsc
+        mov     r8d,eax
+        mov     r9d,0
+
+        clflush [r10]
+DB      0xf0
+        add     DWORD[r10],r9d
+
+        rdtsc
+        mov     edx,eax
+        sub     eax,r8d
+        mov     r8d,edx
+        mov     r9d,eax
+$L$oop2:
+        clflush [r10]
+DB      0xf0
+        add     DWORD[r10],eax
+
+        sub     r11,1
+        jz      NEAR $L$done2
+
+        rdtsc
+        mov     edx,eax
+        sub     eax,r8d
+        mov     r8d,edx
+        cmp     eax,r9d
+        mov     r9d,eax
+        mov     edx,0
+        setne   dl
+        sub     rcx,rdx
+        lea     r10,[rdx*4+r10]
+        jnz     NEAR $L$oop2
+
+$L$done2:
+        mov     rax,QWORD[8+rsp]
+        sub     rax,rcx
+        DB      0F3h,0C3h               ;repret
+
+global  OPENSSL_ia32_rdrand_bytes
+
+ALIGN   16
+OPENSSL_ia32_rdrand_bytes:
+        xor     rax,rax
+        cmp     rdx,0
+        je      NEAR $L$done_rdrand_bytes
+
+        mov     r11,8
+$L$oop_rdrand_bytes:
+DB      73,15,199,242
+        jc      NEAR $L$break_rdrand_bytes
+        dec     r11
+        jnz     NEAR $L$oop_rdrand_bytes
+        jmp     NEAR $L$done_rdrand_bytes
+
+ALIGN   16
+$L$break_rdrand_bytes:
+        cmp     rdx,8
+        jb      NEAR $L$tail_rdrand_bytes
+        mov     QWORD[rcx],r10
+        lea     rcx,[8+rcx]
+        add     rax,8
+        sub     rdx,8
+        jz      NEAR $L$done_rdrand_bytes
+        mov     r11,8
+        jmp     NEAR $L$oop_rdrand_bytes
+
+ALIGN   16
+$L$tail_rdrand_bytes:
+        mov     BYTE[rcx],r10b
+        lea     rcx,[1+rcx]
+        inc     rax
+        shr     r10,8
+        dec     rdx
+        jnz     NEAR $L$tail_rdrand_bytes
+
+$L$done_rdrand_bytes:
+        xor     r10,r10
+        DB      0F3h,0C3h               ;repret
+
+global  OPENSSL_ia32_rdseed_bytes
+
+ALIGN   16
+OPENSSL_ia32_rdseed_bytes:
+        xor     rax,rax
+        cmp     rdx,0
+        je      NEAR $L$done_rdseed_bytes
+
+        mov     r11,8
+$L$oop_rdseed_bytes:
+DB      73,15,199,250
+        jc      NEAR $L$break_rdseed_bytes
+        dec     r11
+        jnz     NEAR $L$oop_rdseed_bytes
+        jmp     NEAR $L$done_rdseed_bytes
+
+ALIGN   16
+$L$break_rdseed_bytes:
+        cmp     rdx,8
+        jb      NEAR $L$tail_rdseed_bytes
+        mov     QWORD[rcx],r10
+        lea     rcx,[8+rcx]
+        add     rax,8
+        sub     rdx,8
+        jz      NEAR $L$done_rdseed_bytes
+        mov     r11,8
+        jmp     NEAR $L$oop_rdseed_bytes
+
+ALIGN   16
+$L$tail_rdseed_bytes:
+        mov     BYTE[rcx],r10b
+        lea     rcx,[1+rcx]
+        inc     rax
+        shr     r10,8
+        dec     rdx
+        jnz     NEAR $L$tail_rdseed_bytes
+
+$L$done_rdseed_bytes:
+        xor     r10,r10
+        DB      0F3h,0C3h               ;repret
+
diff --git a/CryptoPkg/Library/OpensslLib/process_files.pl b/CryptoPkg/Library/OpensslLib/process_files.pl
index 4ba25da407..c0a19b99b6 100755
--- a/CryptoPkg/Library/OpensslLib/process_files.pl
+++ b/CryptoPkg/Library/OpensslLib/process_files.pl
@@ -12,6 +12,47 @@
 use strict;
 use Cwd;
 use File::Copy;
+use File::Basename;
+use File::Path qw(make_path remove_tree);
+use Text::Tabs;
+
+#
+# OpenSSL perlasm generator script does not transfer the copyright header
+#
+sub copy_license_header
+{
+    my @args = split / /, shift;    #Separate args by spaces
+    my $source = $args[1];          #Source file is second (after "perl")
+    my $target = pop @args;         #Target file is always last
+    chop ($target);                 #Remove newline char
+
+    my $temp_file_name = "license.tmp";
+    open (my $source_file, "<" . $source) || die $source;
+    open (my $target_file, "<" . $target) || die $target;
+    open (my $temp_file, ">" . $temp_file_name) || die $temp_file_name;
+
+    #Copy source file header to temp file
+    while (my $line = <$source_file>) {
+        next if ($line =~ /#!/);    #Ignore shebang line
+        $line =~ s/#/;/;            #Fix comment character for assembly
+        $line =~ s/\s+$/\r\n/;      #Trim trailing whitespace, fixup line endings
+        print ($temp_file $line);
+        last if ($line =~ /http/);  #Last line of copyright header contains a web link
+    }
+    print ($temp_file "\r\n");      #Add an empty line after the header
+    #Retrieve generated assembly contents
+    while (my $line = <$target_file>) {
+        $line =~ s/\s+$/\r\n/;      #Trim trailing whitespace, fixup line endings
+        print ($temp_file expand ($line));  #expand() replaces tabs with spaces
+    }
+
+    close ($source_file);
+    close ($target_file);
+    close ($temp_file);
+
+    move ($temp_file_name, $target) ||
+        die "Cannot replace \"" . $target . "\"!";
+}
 
 #
 # Find the openssl directory name for use lib. We have to do this
@@ -21,10 +62,39 @@ use File::Copy;
 #
 my $inf_file;
 my $OPENSSL_PATH;
+my $uefi_config;
+my $extension;
+my $arch;
 my @inf;
 
 BEGIN {
     $inf_file = "OpensslLib.inf";
+    $uefi_config = "UEFI";
+    $arch = shift;
+
+    if (defined $arch) {
+        if (lc ($arch) eq lc ("X64")) {
+            $inf_file = "OpensslLibX64.inf";
+            $uefi_config = "UEFI-x86_64";
+            $extension = "nasm";
+        } elsif (lc ($arch) eq lc ("IA32")) {
+            $arch = "Ia32";
+            $inf_file = "OpensslLibIa32.inf";
+            $uefi_config = "UEFI-x86";
+            $extension = "nasm";
+        } else {
+            die "Unsupported architecture \"" . $arch . "\"!";
+        }
+
+        # Prepare assembly folder
+        if (-d $arch) {
+            remove_tree ($arch, {safe => 1}) ||
+                die "Cannot clean assembly folder \"" . $arch . "\"!";
+        } else {
+            mkdir $arch ||
+                die "Cannot create assembly folder \"" . $arch . "\"!";
+        }
+    }
 
     # Read the contents of the inf file
     open( FD, "<" . $inf_file ) ||
@@ -47,9 +117,9 @@ BEGIN {
             # Configure UEFI
             system(
                 "./Configure",
-                "UEFI",
+                "--config=../uefi-asm.conf",
+                "$uefi_config",
                 "no-afalgeng",
-                "no-asm",
                 "no-async",
                 "no-autoerrinit",
                 "no-autoload-config",
@@ -126,22 +196,52 @@ BEGIN {
 # Retrieve file lists from OpenSSL configdata
 #
 use configdata qw/%unified_info/;
+use configdata qw/%config/;
+use configdata qw/%target/;
+
+#
+# Collect build flags from configdata
+#
+my $flags = "";
+foreach my $f (@{$config{lib_defines}}) {
+    $flags .= " -D$f";
+}
 
 my @cryptofilelist = ();
 my @sslfilelist = ();
+my @asmfilelist = ();
+my @asmbuild = ();
 foreach my $product ((@{$unified_info{libraries}},
                       @{$unified_info{engines}})) {
     foreach my $o (@{$unified_info{sources}->{$product}}) {
         foreach my $s (@{$unified_info{sources}->{$o}}) {
-            next if ($unified_info{generate}->{$s});
-            next if $s =~ "crypto/bio/b_print.c";
-
             # No need to add unused files in UEFI.
             # So it can reduce porting time, compile time, library size.
+            next if $s =~ "crypto/bio/b_print.c";
             next if $s =~ "crypto/rand/randfile.c";
             next if $s =~ "crypto/store/";
             next if $s =~ "crypto/err/err_all.c";
 
+            if ($unified_info{generate}->{$s}) {
+                if (defined $arch) {
+                    my $buildstring = "perl";
+                    foreach my $arg (@{$unified_info{generate}->{$s}}) {
+                        if ($arg =~ ".pl") {
+                            $buildstring .= " ./openssl/$arg";
+                        } elsif ($arg =~ "PERLASM_SCHEME") {
+                            $buildstring .= " $target{perlasm_scheme}";
+                        } elsif ($arg =~ "LIB_CFLAGS") {
+                            $buildstring .= "$flags";
+                        }
+                    }
+                    ($s, my $path, undef) = fileparse($s, qr/\.[^.]*/);
+                    $buildstring .= " ./$arch/$path$s.$extension";
+                    make_path ("./$arch/$path");
+                    push @asmbuild, "$buildstring\n";
+                    push @asmfilelist, "  $arch/$path$s.$extension\r\n";
+                }
+                next;
+            }
             if ($product =~ "libssl") {
                 push @sslfilelist, '  $(OPENSSL_PATH)/' . $s . "\r\n";
                 next;
@@ -179,15 +279,31 @@ foreach (@headers){
 }
 
 
+#
+# Generate assembly files
+#
+if (@asmbuild) {
+    print "\n--> Generating assembly files ... ";
+    foreach my $buildstring (@asmbuild) {
+        system ("$buildstring");
+        copy_license_header ($buildstring);
+    }
+    print "Done!";
+}
+
 #
 # Update OpensslLib.inf with autogenerated file list
 #
 my @new_inf = ();
 my $subbing = 0;
-print "\n--> Updating OpensslLib.inf ... ";
+print "\n--> Updating $inf_file ... ";
 foreach (@inf) {
+    if ($_ =~ "DEFINE OPENSSL_FLAGS_CONFIG") {
+        push @new_inf, "  DEFINE OPENSSL_FLAGS_CONFIG    =" . $flags . "\r\n";
+        next;
+    }
     if ( $_ =~ "# Autogenerated files list starts here" ) {
-        push @new_inf, $_, @cryptofilelist, @sslfilelist;
+        push @new_inf, $_, @asmfilelist, @cryptofilelist, @sslfilelist;
         $subbing = 1;
         next;
     }
@@ -212,49 +328,51 @@ rename( $new_inf_file, $inf_file ) ||
     die "rename $inf_file";
 print "Done!";
 
-#
-# Update OpensslLibCrypto.inf with auto-generated file list (no libssl)
-#
-$inf_file = "OpensslLibCrypto.inf";
-
-# Read the contents of the inf file
-@inf = ();
-@new_inf = ();
-open( FD, "<" . $inf_file ) ||
-    die "Cannot open \"" . $inf_file . "\"!";
-@inf = (<FD>);
-close(FD) ||
-    die "Cannot close \"" . $inf_file . "\"!";
+if (!defined $arch) {
+    #
+    # Update OpensslLibCrypto.inf with auto-generated file list (no libssl)
+    #
+    $inf_file = "OpensslLibCrypto.inf";
 
-$subbing = 0;
-print "\n--> Updating OpensslLibCrypto.inf ... ";
-foreach (@inf) {
-    if ( $_ =~ "# Autogenerated files list starts here" ) {
-        push @new_inf, $_, @cryptofilelist;
-        $subbing = 1;
-        next;
-    }
-    if ( $_ =~ "# Autogenerated files list ends here" ) {
-        push @new_inf, $_;
-        $subbing = 0;
-        next;
+    # Read the contents of the inf file
+    @inf = ();
+    @new_inf = ();
+    open( FD, "<" . $inf_file ) ||
+        die "Cannot open \"" . $inf_file . "\"!";
+    @inf = (<FD>);
+    close(FD) ||
+        die "Cannot close \"" . $inf_file . "\"!";
+
+    $subbing = 0;
+    print "\n--> Updating OpensslLibCrypto.inf ... ";
+    foreach (@inf) {
+        if ( $_ =~ "# Autogenerated files list starts here" ) {
+            push @new_inf, $_, @cryptofilelist;
+            $subbing = 1;
+            next;
+        }
+        if ( $_ =~ "# Autogenerated files list ends here" ) {
+            push @new_inf, $_;
+            $subbing = 0;
+            next;
+        }
+
+        push @new_inf, $_
+            unless ($subbing);
     }
 
-    push @new_inf, $_
-        unless ($subbing);
+    $new_inf_file = $inf_file . ".new";
+    open( FD, ">" . $new_inf_file ) ||
+        die $new_inf_file;
+    print( FD @new_inf ) ||
+        die $new_inf_file;
+    close(FD) ||
+        die $new_inf_file;
+    rename( $new_inf_file, $inf_file ) ||
+        die "rename $inf_file";
+    print "Done!";
 }
 
-$new_inf_file = $inf_file . ".new";
-open( FD, ">" . $new_inf_file ) ||
-    die $new_inf_file;
-print( FD @new_inf ) ||
-    die $new_inf_file;
-close(FD) ||
-    die $new_inf_file;
-rename( $new_inf_file, $inf_file ) ||
-    die "rename $inf_file";
-print "Done!";
-
 #
 # Copy opensslconf.h and dso_conf.h generated from OpenSSL Configuration
 #
diff --git a/CryptoPkg/Library/OpensslLib/uefi-asm.conf b/CryptoPkg/Library/OpensslLib/uefi-asm.conf
new file mode 100644
index 0000000000..4fd52c9cf2
--- /dev/null
+++ b/CryptoPkg/Library/OpensslLib/uefi-asm.conf
@@ -0,0 +1,14 @@
+## -*- mode: perl; -*-
+## UEFI assembly openssl configuration targets.
+
+my %targets = (
+#### UEFI
+    "UEFI-x86" => {
+        inherit_from     => [ "UEFI",  asm("x86_asm") ],
+        perlasm_scheme   => "win32n",
+    },
+    "UEFI-x86_64" => {
+        inherit_from     => [ "UEFI",  asm("x86_64_asm") ],
+        perlasm_scheme   => "nasm",
+    },
+);
-- 
2.16.2.windows.1


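[Editorial illustration, not part of the patch] The copy_license_header step in the process_files.pl diff above does four things to each generated .nasm file: skip the perlasm source's shebang, switch `#` comments to `;`, trim trailing whitespace and force CRLF line endings, and expand tabs in the generated assembly body. The transform can be sketched as follows; this is a Python re-implementation for clarity only, and `convert_header` plus its argument names are hypothetical, not names from the patch.

```python
def convert_header(perlasm_lines, generated_asm_lines):
    """Sketch: prepend the perlasm source's license header to the
    generated assembly, normalized for the EDK2 coding standard."""
    out = []
    for line in perlasm_lines:
        if line.startswith("#!"):           # ignore the shebang line
            continue
        line = line.replace("#", ";", 1)    # fix comment character for assembly
        out.append(line.rstrip() + "\r\n")  # trim trailing whitespace, use CRLF
        if "http" in line:                  # header's last line holds the license URL
            break
    out.append("\r\n")                      # blank line after the header
    for line in generated_asm_lines:
        # expandtabs() mirrors Text::Tabs expand(): tabs become spaces
        out.append(line.rstrip().expandtabs() + "\r\n")
    return out
```

In the real script the same work is done via a temporary file that then replaces the generated .nasm file in place.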
^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [edk2-devel] [PATCH 0/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
  2020-03-17 10:26 [PATCH 0/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64 Zurcher, Christopher J
  2020-03-17 10:26 ` [PATCH 1/1] " Zurcher, Christopher J
@ 2020-03-23 12:59 ` Laszlo Ersek
  2020-03-25 18:40 ` Ard Biesheuvel
  2 siblings, 0 replies; 14+ messages in thread
From: Laszlo Ersek @ 2020-03-23 12:59 UTC (permalink / raw)
  To: devel, christopher.j.zurcher
  Cc: Jian J Wang, Xiaoyu Lu, Eugene Cohen, Ard Biesheuvel

On 03/17/20 11:26, Zurcher, Christopher J wrote:
> BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507
> 
> This patch adds support for building the native instruction algorithms for
> IA32 and X64 versions of OpensslLib. The process_files.pl script was modified
> to parse the .asm file targets from the OpenSSL build config data struct, and
> generate the necessary assembly files for the EDK2 build environment.
> 
> For the X64 variant, OpenSSL includes calls to a Windows error handling API,
> and that function has been stubbed out in ApiHooks.c.
> 
> For all variants, a constructor was added to call the required CPUID function
> within OpenSSL to facilitate processor capability checks in the native
> algorithms.
> 
> Additional native architecture variants should be simple to add by following
> the changes made for these two architectures.
> 
> The OpenSSL assembly files are traditionally generated at build time using a
> perl script. To avoid that burden on EDK2 users, these end-result assembly
> files are generated during the configuration steps performed by the package
> maintainer (through process_files.pl). The perl generator scripts inside
> OpenSSL do not parse file comments as they are only meant to create
> intermediate build files, so process_files.pl contains additional hooks to
> preserve the copyright headers as well as clean up tabs and line endings to
> comply with EDK2 coding standards. The resulting file headers align with
> the generated .h files which are already included in the EDK2 repository.
> 
> Cc: Jian J Wang <jian.j.wang@intel.com>
> Cc: Xiaoyu Lu <xiaoyux.lu@intel.com>
> Cc: Eugene Cohen <eugene@hp.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> 
> Christopher J Zurcher (1):
>   CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
> 
>  CryptoPkg/Library/OpensslLib/OpensslLib.inf                          |    2 +-
>  CryptoPkg/Library/OpensslLib/OpensslLibCrypto.inf                    |    2 +-
>  CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf                      |  680 ++
>  CryptoPkg/Library/OpensslLib/OpensslLibX64.inf                       |  691 ++
>  CryptoPkg/Library/Include/openssl/opensslconf.h                      |    3 -
>  CryptoPkg/Library/OpensslLib/ApiHooks.c                              |   18 +
>  CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c                 |   34 +
>  CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm          | 3209 ++++++++
>  CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm          |  648 ++
>  CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm              | 1522 ++++
>  CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm              | 1259 +++
>  CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm            |  352 +
>  CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm            |  486 ++
>  CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm           |  887 +++
>  CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm            | 1835 +++++
>  CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm            |  690 ++
>  CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm        | 1264 +++
>  CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm            |  381 +
>  CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm           | 3977 ++++++++++
>  CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm         | 6796 ++++++++++++++++
>  CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm         | 2842 +++++++
>  CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm               |  513 ++
>  CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm     | 1772 +++++
>  CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm   | 3271 ++++++++
>  CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm | 4709 +++++++++++
>  CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm        | 5084 ++++++++++++
>  CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm        | 1170 +++
>  CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm            | 1989 +++++
>  CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm          | 2242 ++++++
>  CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm          |  432 +
>  CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm          | 1479 ++++
>  CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm         | 4033 ++++++++++
>  CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm          |  794 ++
>  CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm  |  984 +++
>  CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm      | 2077 +++++
>  CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm      | 1395 ++++
>  CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm          |  784 ++
>  CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm   |  532 ++
>  CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm      | 7581 ++++++++++++++++++
>  CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm         | 5773 ++++++++++++++
>  CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm    | 8262 ++++++++++++++++++++
>  CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm       | 5712 ++++++++++++++
>  CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm       | 5668 ++++++++++++++
>  CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm             |  472 ++
>  CryptoPkg/Library/OpensslLib/process_files.pl                        |  208 +-
>  CryptoPkg/Library/OpensslLib/uefi-asm.conf                           |   14 +
>  46 files changed, 94478 insertions(+), 50 deletions(-)
>  create mode 100644 CryptoPkg/Library/OpensslLib/OpensslLibIa32.inf
>  create mode 100644 CryptoPkg/Library/OpensslLib/OpensslLibX64.inf
>  create mode 100644 CryptoPkg/Library/OpensslLib/ApiHooks.c
>  create mode 100644 CryptoPkg/Library/OpensslLib/OpensslLibConstructor.c
>  create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/aesni-x86.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/aes/vpaes-x86.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/bn-586.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/co-586.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-gf2m.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/bn/x86-mont.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/des/crypt586.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/des/des-586.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/md5/md5-586.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/modes/ghash-x86.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/rc4/rc4-586.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha1-586.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha256-586.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/sha/sha512-586.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/Ia32/crypto/x86cpuid.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-mb-x86_64.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha1-x86_64.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-sha256-x86_64.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/aesni-x86_64.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/aes/vpaes-x86_64.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-avx2.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/rsaz-x86_64.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-gf2m.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/bn/x86_64-mont5.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/md5/md5-x86_64.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/modes/aesni-gcm-x86_64.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/modes/ghash-x86_64.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-md5-x86_64.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/rc4/rc4-x86_64.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/keccak1600-x86_64.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-mb-x86_64.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha1-x86_64.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-mb-x86_64.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha256-x86_64.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/sha/sha512-x86_64.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/X64/crypto/x86_64cpuid.nasm
>  create mode 100644 CryptoPkg/Library/OpensslLib/uefi-asm.conf
> 

(1) Please break this patch into at least two patches. The generated
files add more than ninety thousand lines, and (as I understand it) no
one is expected to review them in detail, so they should be moved to a
dedicated patch.

The rest of the code changes should be reviewed with more care, so they
deserve at least one stand-alone patch (several patches if necessary).

(2) Furthermore, I would suggest including a comment near the top of
each generated NASM file that said file was generated, and should not be
modified manually. (Of course this would mean updating the perl script
-- I'm not asking for manual comments!)

Thanks!
Laszlo


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
  2020-03-17 10:26 [PATCH 0/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64 Zurcher, Christopher J
  2020-03-17 10:26 ` [PATCH 1/1] " Zurcher, Christopher J
  2020-03-23 12:59 ` [edk2-devel] [PATCH 0/1] " Laszlo Ersek
@ 2020-03-25 18:40 ` Ard Biesheuvel
  2020-03-26  1:04   ` [edk2-devel] " Zurcher, Christopher J
  2 siblings, 1 reply; 14+ messages in thread
From: Ard Biesheuvel @ 2020-03-25 18:40 UTC (permalink / raw)
  To: Christopher J Zurcher
  Cc: edk2-devel-groups-io, Jian J Wang, Xiaoyu Lu, Eugene Cohen

On Tue, 17 Mar 2020 at 11:26, Christopher J Zurcher
<christopher.j.zurcher@intel.com> wrote:
>
> BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507
>
> This patch adds support for building the native instruction algorithms for
> IA32 and X64 versions of OpensslLib. The process_files.pl script was modified
> to parse the .asm file targets from the OpenSSL build config data struct, and
> generate the necessary assembly files for the EDK2 build environment.
>
> For the X64 variant, OpenSSL includes calls to a Windows error handling API,
> and that function has been stubbed out in ApiHooks.c.
>
> For all variants, a constructor was added to call the required CPUID function
> within OpenSSL to facilitate processor capability checks in the native
> algorithms.
>
> Additional native architecture variants should be simple to add by following
> the changes made for these two architectures.
>
> The OpenSSL assembly files are traditionally generated at build time using a
> perl script. To avoid that burden on EDK2 users, these end-result assembly
> files are generated during the configuration steps performed by the package
> maintainer (through process_files.pl). The perl generator scripts inside
> OpenSSL do not parse file comments as they are only meant to create
> intermediate build files, so process_files.pl contains additional hooks to
> preserve the copyright headers as well as clean up tabs and line endings to
> comply with EDK2 coding standards. The resulting file headers align with
> the generated .h files which are already included in the EDK2 repository.
>
> Cc: Jian J Wang <jian.j.wang@intel.com>
> Cc: Xiaoyu Lu <xiaoyux.lu@intel.com>
> Cc: Eugene Cohen <eugene@hp.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> Christopher J Zurcher (1):
>   CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
>

Hello Christopher,

It would be helpful to understand the purpose of all this. Also, I
think we could be more picky about which algorithms we enable - DES
and MD5 don't seem highly useful, and even if they were, what do we
gain by using a faster (or smaller?) implementation?




>  [diffstat snipped]
>
> --
> 2.16.2.windows.1
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [edk2-devel] [PATCH 0/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
  2020-03-25 18:40 ` Ard Biesheuvel
@ 2020-03-26  1:04   ` Zurcher, Christopher J
  2020-03-26  7:49     ` Ard Biesheuvel
  0 siblings, 1 reply; 14+ messages in thread
From: Zurcher, Christopher J @ 2020-03-26  1:04 UTC (permalink / raw)
  To: devel@edk2.groups.io, ard.biesheuvel@linaro.org
  Cc: Wang, Jian J, Lu, XiaoyuX, david.harris4@hp.com

> -----Original Message-----
> From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Ard Biesheuvel
> Sent: Wednesday, March 25, 2020 11:40
> To: Zurcher, Christopher J <christopher.j.zurcher@intel.com>
> Cc: edk2-devel-groups-io <devel@edk2.groups.io>; Wang, Jian J
> <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>; Eugene Cohen
> <eugene@hp.com>
> Subject: Re: [edk2-devel] [PATCH 0/1] CryptoPkg/OpensslLib: Add native
> instruction support for IA32 and X64
> 
> On Tue, 17 Mar 2020 at 11:26, Christopher J Zurcher
> <christopher.j.zurcher@intel.com> wrote:
> >
> > [original cover letter and diffstat snipped]
> 
> Hello Christopher,
> 
> It would be helpful to understand the purpose of all this. Also, I
> think we could be more picky about which algorithms we enable - DES
> and MD5 don't seem highly useful, and even if they were, what do we
> gain by using a faster (or smaller?) implementation?
> 

The selection of algorithms comes from the default OpenSSL assembly targets; this combination is validated and widely used, and I don't know all the consequences of picking and choosing which ones to include. If necessary, I could look into reducing the list.

The primary driver for this change is enabling the Full Flash Update (FFU) OS provisioning flow for Windows as described here:
https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/wim-vs-ffu-image-file-formats
This item under "Reliability" is what we are speeding up: "Includes a catalog and hash table to validate a signature upfront before flashing onto a device. The hash table is generated during capture, and validated when applying the image."
This provisioning flow can be performed within the UEFI environment, and the native algorithms allow significant time savings in a factory setting (minutes per device).

We also had a BZ requesting these changes, but the specific need was not provided. Maybe David @HP can provide further insight?
https://bugzilla.tianocore.org/show_bug.cgi?id=2507

There have been additional platform-specific benefits identified, for example speeding up HMAC authentication of communication with other HW/FW components.

Thanks,
Christopher Zurcher

> 
> 
> 
> >  [diffstat snipped]
> >
> > --
> > 2.16.2.windows.1
> >
> 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
  2020-03-17 10:26 ` [PATCH 1/1] " Zurcher, Christopher J
@ 2020-03-26  1:15   ` Yao, Jiewen
       [not found]   ` <15FFB5A5A94CCE31.23217@groups.io>
  1 sibling, 0 replies; 14+ messages in thread
From: Yao, Jiewen @ 2020-03-26  1:15 UTC (permalink / raw)
  To: devel@edk2.groups.io, Zurcher, Christopher J
  Cc: Wang, Jian J, Lu, XiaoyuX, Eugene Cohen, Ard Biesheuvel

Hi Christopher,
Thanks for the contribution. I think it is a good enhancement.

Do you have any data showing what performance improvement we can get?
Does the system boot faster with this? Which feature benefits:
UEFI Secure Boot, TCG Measured Boot, or HTTPS boot?


Comments for the code:
1) I am not sure we need separate OpensslLibIa32 and OpensslLibX64 INFs.
Can we just define a single INF, such as OpensslLibHw.inf?

2) Do we also need to add a new version of OpensslLibCrypto.inf?



Thank you
Yao Jiewen

> -----Original Message-----
> From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Zurcher,
> Christopher J
> Sent: Tuesday, March 17, 2020 6:27 PM
> To: devel@edk2.groups.io
> Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>;
> Eugene Cohen <eugene@hp.com>; Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Subject: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction
> support for IA32 and X64
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
       [not found]   ` <15FFB5A5A94CCE31.23217@groups.io>
@ 2020-03-26  1:23     ` Yao, Jiewen
  2020-03-26  2:44       ` Zurcher, Christopher J
  0 siblings, 1 reply; 14+ messages in thread
From: Yao, Jiewen @ 2020-03-26  1:23 UTC (permalink / raw)
  To: devel@edk2.groups.io, Yao, Jiewen, Zurcher, Christopher J
  Cc: Wang, Jian J, Lu, XiaoyuX, Eugene Cohen, Ard Biesheuvel

Some more comments:

3) Did you consider enabling the RNG instruction as well?

4) I saw you added some code using AVX instructions, such as the YMM
registers. Have you validated that code to make sure it works correctly
in the current environment?




> -----Original Message-----
> From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Yao, Jiewen
> Sent: Thursday, March 26, 2020 9:15 AM
> To: devel@edk2.groups.io; Zurcher, Christopher J
> <christopher.j.zurcher@intel.com>
> Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>;
> Eugene Cohen <eugene@hp.com>; Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Subject: Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> instruction support for IA32 and X64
> 
> HI Christopher
> Thanks for the contribution. I think it is good enhancement.
> 
> Do you have any data show what performance improvement we can get?
> Did the system boot faster with the this? Which feature ?
> UEFI Secure Boot? TCG Measured Boot? HTTPS boot?
> 
> 
> Comment for the code:
> 1) I am not sure if we need separate OpensslLibIa32 and OpensslLibX64.
> Can we just define single INF, such as OpensslLibHw.inf ?
> 
> 2) Do we also need add a new version for OpensslLibCrypto.inf ?
> 
> 
> 
> Thank you
> Yao Jiewen
> 
> > -----Original Message-----
> > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Zurcher,
> > Christopher J
> > Sent: Tuesday, March 17, 2020 6:27 PM
> > To: devel@edk2.groups.io
> > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> <xiaoyux.lu@intel.com>;
> > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > Subject: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> instruction
> > support for IA32 and X64
> >
> 
> 
> 



* Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
  2020-03-26  1:23     ` Yao, Jiewen
@ 2020-03-26  2:44       ` Zurcher, Christopher J
  2020-03-26  3:05         ` Yao, Jiewen
  0 siblings, 1 reply; 14+ messages in thread
From: Zurcher, Christopher J @ 2020-03-26  2:44 UTC (permalink / raw)
  To: devel@edk2.groups.io, Yao, Jiewen
  Cc: Wang, Jian J, Lu, XiaoyuX, Ard Biesheuvel, david.harris4@hp.com,
	Kinney, Michael D

The specific performance improvement depends on the operation; the OS provisioning flow I mentioned in the [Patch 0/1] thread removed hashing as a bottleneck and improved the overall operation speed by more than 4x (saving 2.5 minutes of flashing time), while a direct SHA256 benchmark on the particular silicon I have available showed a more than 12x improvement. I have not benchmarked the improvements to boot time. I do not know the use case targeted by BZ 2507, so I don't know what benefit would be seen there.

I will look at unifying the INF files in the next patch-set and will also add the OpensslLibCrypto.inf case.

I have not exercised the AVX code specifically, as it comes directly from OpenSSL and includes checks against the CPUID capability flags before executing. I'm not entirely familiar with AVX requirements; is there a known environment restriction against AVX instructions in EDK2?

Regarding RNG, it looks like we already have architecture-specific variants of RdRand...?

There was some off-list feedback regarding the number of files that must be checked in here. OpenSSL does not ship assembler-specific implementations of these files; instead it relies on "perlasm" scripts, which a translation script parses at build time (in the normal OpenSSL build flow) to generate the resulting .nasm files. The implementation I have shared here generates these files as part of the OpensslLib maintainer process, similar to the existing header files, which are also generated. Since process_files.pl already requires the package maintainer to have a Perl environment installed, this places no additional burden on them.
An alternative implementation has been proposed in which only a listing/script of the required generator operations would be checked in. Any platform build intending to use the native algorithms would then require a local Perl environment for every developer, along with the underlying environment dependencies (such as a version check against the NASM executable) and additional pre-build steps to run the generator scripts.

Are there any strong opinions here around adding Perl as a build environment dependency vs. checking in maintainer-generated assembly "intermediate" build files?

Thanks,
Christopher Zurcher

> -----Original Message-----
> From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Yao, Jiewen
> Sent: Wednesday, March 25, 2020 18:23
> To: devel@edk2.groups.io; Yao, Jiewen <jiewen.yao@intel.com>; Zurcher,
> Christopher J <christopher.j.zurcher@intel.com>
> Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>;
> Eugene Cohen <eugene@hp.com>; Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Subject: Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> instruction support for IA32 and X64
> 
> Some more comment:
> 
> 3) Do you consider to enable RNG instruction as well?
> 
> 4) I saw you added some code for AVX instruction, such as YMM register.
> Have you validated that code, to make sure it can work correctly in current
> environment?
> 
> 
> 
> 
> > -----Original Message-----
> > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Yao, Jiewen
> > Sent: Thursday, March 26, 2020 9:15 AM
> > To: devel@edk2.groups.io; Zurcher, Christopher J
> > <christopher.j.zurcher@intel.com>
> > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> <xiaoyux.lu@intel.com>;
> > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > Subject: Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > instruction support for IA32 and X64
> >
> > HI Christopher
> > Thanks for the contribution. I think it is good enhancement.
> >
> > Do you have any data show what performance improvement we can get?
> > Did the system boot faster with the this? Which feature ?
> > UEFI Secure Boot? TCG Measured Boot? HTTPS boot?
> >
> >
> > Comment for the code:
> > 1) I am not sure if we need separate OpensslLibIa32 and OpensslLibX64.
> > Can we just define single INF, such as OpensslLibHw.inf ?
> >
> > 2) Do we also need add a new version for OpensslLibCrypto.inf ?
> >
> >
> >
> > Thank you
> > Yao Jiewen
> >
> > > -----Original Message-----
> > > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Zurcher,
> > > Christopher J
> > > Sent: Tuesday, March 17, 2020 6:27 PM
> > > To: devel@edk2.groups.io
> > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > <xiaoyux.lu@intel.com>;
> > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > > Subject: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > instruction
> > > support for IA32 and X64
> > >
> >
> >
> >
> 
> 
> 



* Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
  2020-03-26  2:44       ` Zurcher, Christopher J
@ 2020-03-26  3:05         ` Yao, Jiewen
  2020-03-26  3:29           ` Zurcher, Christopher J
  0 siblings, 1 reply; 14+ messages in thread
From: Yao, Jiewen @ 2020-03-26  3:05 UTC (permalink / raw)
  To: Zurcher, Christopher J, devel@edk2.groups.io
  Cc: Wang, Jian J, Lu, XiaoyuX, Ard Biesheuvel, david.harris4@hp.com,
	Kinney, Michael D

Thanks. Comment inline:

> -----Original Message-----
> From: Zurcher, Christopher J <christopher.j.zurcher@intel.com>
> Sent: Thursday, March 26, 2020 10:44 AM
> To: devel@edk2.groups.io; Yao, Jiewen <jiewen.yao@intel.com>
> Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>;
> Ard Biesheuvel <ard.biesheuvel@linaro.org>; david.harris4@hp.com; Kinney,
> Michael D <michael.d.kinney@intel.com>
> Subject: RE: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> instruction support for IA32 and X64
> 
> The specific performance improvement depends on the operation; the OS
> provisioning I mentioned in the [Patch 0/1] thread removed the hashing as a
> bottleneck and improved the overall operation speed over 4x (saving 2.5
> minutes of flashing time), but a direct SHA256 benchmark on the particular
> silicon I have available showed over 12x improvement. I have not benchmarked
> the improvements to boot time. I do not know the use case targeted by BZ 2507
> so I don't know what benefit will be seen there.
[Jiewen] I guess there might be some improvement in HTTPS boot because of AES-NI for the TLS session.
I am just curious about the data. :-)


> 
> I will look at unifying the INF files in the next patch-set and will also add the
> OpensslLibCrypto.inf case.
[Jiewen] Thanks!


> 
> I have not exercised the AVX code specifically, as it is coming directly from
> OpenSSL and includes checks against the CPUID capability flags before executing.
> I'm not entirely familiar with AVX requirements; is there a known environment
> restriction against AVX instructions in EDK2?
[Jiewen] Yes. The UEFI spec only requires the environment to be set up for the XMM registers.
Using other registers such as YMM or ZMM may require special setup, and saving/restoring some FPU state.
If a function contains YMM register accesses and is linked into BaseCryptLib, I highly recommend you run some tests to make sure it works correctly.

Maybe I should ask a more generic question: have you validated all impacted crypto lib APIs to make sure they still work well with this improvement?


> 
> Regarding RNG, it looks like we already have architecture-specific variants of
> RdRand...?
[Jiewen] Yes. That is in RngLib.
I asked this question because I see the OpenSSL wrapper is using the PMC/TSC as a noise source.
https://github.com/tianocore/edk2/blob/master/CryptoPkg/Library/OpensslLib/rand_pool.c
Since this patch already adds an instruction dependency, why not use the RNG instruction as well?


> 
> There was some off-list feedback regarding the number of files required to be
> checked in here. OpenSSL does not include assembler-specific implementations
> of these files and instead relies on "perlasm" scripts which are parsed by a
> translation script at build time (in the normal OpenSSL build flow) to generate
> the resulting .nasm files. The implementation I have shared here generates these
> files as part of the OpensslLib maintainer process, similar to the existing header
> files which are also generated. Since process_files.pl already requires the
> package maintainer to have a Perl environment installed, this does not place any
> additional burden on them.
> An alternative implementation has been proposed which would see only a
> listing/script of the required generator operations to be checked in, and any
> platform build which intended to utilize the native algorithms would require a
> local Perl environment as well as any underlying environment dependencies
> (such as a version check against the NASM executable) for every developer, and
> additional pre-build steps to run the generator scripts.
> 
> Are there any strong opinions here around adding Perl as a build environment
> dependency vs. checking in maintainer-generated assembly "intermediate" build
> files?
[Jiewen] Good question. On the tooling side, maybe Mike or Liming can answer.
I ran into a similar issue myself:
I have submodule code that needs CMake and a pre-processor to generate a common include file.
How do we handle that? I am looking for a recommendation as well.


> 
> Thanks,
> Christopher Zurcher
> 
> > -----Original Message-----
> > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Yao,
> Jiewen
> > Sent: Wednesday, March 25, 2020 18:23
> > To: devel@edk2.groups.io; Yao, Jiewen <jiewen.yao@intel.com>; Zurcher,
> > Christopher J <christopher.j.zurcher@intel.com>
> > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> <xiaoyux.lu@intel.com>;
> > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > Subject: Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > instruction support for IA32 and X64
> >
> > Some more comment:
> >
> > 3) Do you consider to enable RNG instruction as well?
> >
> > 4) I saw you added some code for AVX instruction, such as YMM register.
> > Have you validated that code, to make sure it can work correctly in current
> > environment?
> >
> >
> >
> >
> > > -----Original Message-----
> > > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Yao,
> Jiewen
> > > Sent: Thursday, March 26, 2020 9:15 AM
> > > To: devel@edk2.groups.io; Zurcher, Christopher J
> > > <christopher.j.zurcher@intel.com>
> > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > <xiaoyux.lu@intel.com>;
> > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel
> <ard.biesheuvel@linaro.org>
> > > Subject: Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > > instruction support for IA32 and X64
> > >
> > > HI Christopher
> > > Thanks for the contribution. I think it is good enhancement.
> > >
> > > Do you have any data show what performance improvement we can get?
> > > Did the system boot faster with the this? Which feature ?
> > > UEFI Secure Boot? TCG Measured Boot? HTTPS boot?
> > >
> > >
> > > Comment for the code:
> > > 1) I am not sure if we need separate OpensslLibIa32 and OpensslLibX64.
> > > Can we just define single INF, such as OpensslLibHw.inf ?
> > >
> > > 2) Do we also need add a new version for OpensslLibCrypto.inf ?
> > >
> > >
> > >
> > > Thank you
> > > Yao Jiewen
> > >
> > > > -----Original Message-----
> > > > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of
> Zurcher,
> > > > Christopher J
> > > > Sent: Tuesday, March 17, 2020 6:27 PM
> > > > To: devel@edk2.groups.io
> > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > > <xiaoyux.lu@intel.com>;
> > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel
> <ard.biesheuvel@linaro.org>
> > > > Subject: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > > instruction
> > > > support for IA32 and X64
> > > >
> > >
> > >
> > >
> >
> >
> > 
> 



* Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
  2020-03-26  3:05         ` Yao, Jiewen
@ 2020-03-26  3:29           ` Zurcher, Christopher J
  2020-03-26  3:58             ` Yao, Jiewen
  0 siblings, 1 reply; 14+ messages in thread
From: Zurcher, Christopher J @ 2020-03-26  3:29 UTC (permalink / raw)
  To: Yao, Jiewen, devel@edk2.groups.io
  Cc: Wang, Jian J, Lu, XiaoyuX, Ard Biesheuvel, david.harris4@hp.com,
	Kinney, Michael D

> -----Original Message-----
> From: Yao, Jiewen <jiewen.yao@intel.com>
> Sent: Wednesday, March 25, 2020 20:05
> To: Zurcher, Christopher J <christopher.j.zurcher@intel.com>;
> devel@edk2.groups.io
> Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>;
> Ard Biesheuvel <ard.biesheuvel@linaro.org>; david.harris4@hp.com; Kinney,
> Michael D <michael.d.kinney@intel.com>
> Subject: RE: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> instruction support for IA32 and X64
> 
> Thanks. Comment inline:
> 
> > -----Original Message-----
> > From: Zurcher, Christopher J <christopher.j.zurcher@intel.com>
> > Sent: Thursday, March 26, 2020 10:44 AM
> > To: devel@edk2.groups.io; Yao, Jiewen <jiewen.yao@intel.com>
> > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> <xiaoyux.lu@intel.com>;
> > Ard Biesheuvel <ard.biesheuvel@linaro.org>; david.harris4@hp.com; Kinney,
> > Michael D <michael.d.kinney@intel.com>
> > Subject: RE: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > instruction support for IA32 and X64
> >
> > The specific performance improvement depends on the operation; the OS
> > provisioning I mentioned in the [Patch 0/1] thread removed the hashing as a
> > bottleneck and improved the overall operation speed over 4x (saving 2.5
> > minutes of flashing time), but a direct SHA256 benchmark on the particular
> > silicon I have available showed over 12x improvement. I have not
> benchmarked
> > the improvements to boot time. I do not know the use case targeted by BZ
> 2507
> > so I don't know what benefit will be seen there.
> [Jiewen] I guess there might be some improvement on HTTPS boot because of
> AES-NI for TLS session.
> I am just curious on the data. :-)
> 
> 
> >
> > I will look at unifying the INF files in the next patch-set and will also
> add the
> > OpensslLibCrypto.inf case.
> [Jiewen] Thanks!
> 
> 
> >
> > I have not exercised the AVX code specifically, as it is coming directly
> from
> > OpenSSL and includes checks against the CPUID capability flags before
> executing.
> > I'm not entirely familiar with AVX requirements; is there a known
> environment
> > restriction against AVX instructions in EDK2?
> [Jiewen] Yes. UEFI spec only requires to set env for XMM register.
> Using other registers such as YMM or ZMM may requires special setup, and
> save/restore some FPU state.
> If a function containing the YMM register access and it is linked into
> BaseCryptoLib, I highly recommend you run some test to make sure it can work
> correct.
> 
> Maybe I should ask more generic question: Have you validated all impacted
> crypto lib API to make sure they still work well with this improvement?
> 

I have not. Is there an existing test suite that was used initially, or used with each OpenSSL version update?

Thanks,
Christopher Zurcher

> 
> >
> > Regarding RNG, it looks like we already have architecture-specific variants
> of
> > RdRand...?
> [Jiewen] Yes. That is in RngLib.
> I ask this question because I see openssl wrapper is using PMC/TSC as noisy.
> https://github.com/tianocore/edk2/blob/master/CryptoPkg/Library/OpensslLib/ra
> nd_pool.c
> Since this patch adds instruction dependency, why no use RNG instruction as
> well?
> 
> 
> >
> > There was some off-list feedback regarding the number of files required to
> be
> > checked in here. OpenSSL does not include assembler-specific
> implementations
> > of these files and instead relies on "perlasm" scripts which are parsed by
> a
> > translation script at build time (in the normal OpenSSL build flow) to
> generate
> > the resulting .nasm files. The implementation I have shared here generates
> these
> > files as part of the OpensslLib maintainer process, similar to the existing
> header
> > files which are also generated. Since process_files.pl already requires the
> > package maintainer to have a Perl environment installed, this does not
> place any
> > additional burden on them.
> > An alternative implementation has been proposed which would see only a
> > listing/script of the required generator operations to be checked in, and
> any
> > platform build which intended to utilize the native algorithms would
> require a
> > local Perl environment as well as any underlying environment dependencies
> > (such as a version check against the NASM executable) for every developer,
> and
> > additional pre-build steps to run the generator scripts.
> >
> > Are there any strong opinions here around adding Perl as a build
> environment
> > dependency vs. checking in maintainer-generated assembly "intermediate"
> build
> > files?
> [Jiewen] Good question. For tool, maybe Mike or Liming can answer.
> And I did get similar issue with you.
> I got a submodule code need using CMake and pre-processor to generate a
> common include file.
> How do we handle that? I look for the recommendation as well.
> 
> 
> >
> > Thanks,
> > Christopher Zurcher
> >
> > > -----Original Message-----
> > > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Yao,
> > Jiewen
> > > Sent: Wednesday, March 25, 2020 18:23
> > > To: devel@edk2.groups.io; Yao, Jiewen <jiewen.yao@intel.com>; Zurcher,
> > > Christopher J <christopher.j.zurcher@intel.com>
> > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > <xiaoyux.lu@intel.com>;
> > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > > Subject: Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > > instruction support for IA32 and X64
> > >
> > > Some more comment:
> > >
> > > 3) Do you consider to enable RNG instruction as well?
> > >
> > > 4) I saw you added some code for AVX instruction, such as YMM register.
> > > Have you validated that code, to make sure it can work correctly in
> current
> > > environment?
> > >
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Yao,
> > Jiewen
> > > > Sent: Thursday, March 26, 2020 9:15 AM
> > > > To: devel@edk2.groups.io; Zurcher, Christopher J
> > > > <christopher.j.zurcher@intel.com>
> > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > > <xiaoyux.lu@intel.com>;
> > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel
> > <ard.biesheuvel@linaro.org>
> > > > Subject: Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > > > instruction support for IA32 and X64
> > > >
> > > > HI Christopher
> > > > Thanks for the contribution. I think it is good enhancement.
> > > >
> > > > Do you have any data show what performance improvement we can get?
> > > > Did the system boot faster with the this? Which feature ?
> > > > UEFI Secure Boot? TCG Measured Boot? HTTPS boot?
> > > >
> > > >
> > > > Comment for the code:
> > > > 1) I am not sure if we need separate OpensslLibIa32 and OpensslLibX64.
> > > > Can we just define single INF, such as OpensslLibHw.inf ?
> > > >
> > > > 2) Do we also need add a new version for OpensslLibCrypto.inf ?
> > > >
> > > >
> > > >
> > > > Thank you
> > > > Yao Jiewen
> > > >
> > > > > -----Original Message-----
> > > > > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of
> > Zurcher,
> > > > > Christopher J
> > > > > Sent: Tuesday, March 17, 2020 6:27 PM
> > > > > To: devel@edk2.groups.io
> > > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > > > <xiaoyux.lu@intel.com>;
> > > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel
> > <ard.biesheuvel@linaro.org>
> > > > > Subject: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > > > instruction
> > > > > support for IA32 and X64
> > > > >
> > > >
> > > >
> > > >
> > >
> > >
> > > 
> >
> 



* Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
  2020-03-26  3:29           ` Zurcher, Christopher J
@ 2020-03-26  3:58             ` Yao, Jiewen
  2020-03-26 18:23               ` Michael D Kinney
  0 siblings, 1 reply; 14+ messages in thread
From: Yao, Jiewen @ 2020-03-26  3:58 UTC (permalink / raw)
  To: Zurcher, Christopher J, devel@edk2.groups.io
  Cc: Wang, Jian J, Lu, XiaoyuX, Ard Biesheuvel, david.harris4@hp.com,
	Kinney, Michael D

There is an old shell-based test application. It was removed from the edk2 repo, and it should be in the edk2-test repo in the future.

Until the latter is ready, you can find the archive at
https://github.com/jyao1/edk2/tree/edk2-cryptest/CryptoTestPkg/Cryptest

But I am not sure it covers the latest interfaces, especially the newly added ones. As far as I know, SM3 is not covered there.

I still recommend you double-check which functions use the YMM registers and make sure those functions are tested.

I have no concern about the hash functions, since you already provided the data,
but please also verify the other algorithms, such as AES/RSA/HMAC, etc.


Thank you
Yao Jiewen

> -----Original Message-----
> From: Zurcher, Christopher J <christopher.j.zurcher@intel.com>
> Sent: Thursday, March 26, 2020 11:29 AM
> To: Yao, Jiewen <jiewen.yao@intel.com>; devel@edk2.groups.io
> Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>;
> Ard Biesheuvel <ard.biesheuvel@linaro.org>; david.harris4@hp.com; Kinney,
> Michael D <michael.d.kinney@intel.com>
> Subject: RE: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> instruction support for IA32 and X64
> 
> > -----Original Message-----
> > From: Yao, Jiewen <jiewen.yao@intel.com>
> > Sent: Wednesday, March 25, 2020 20:05
> > To: Zurcher, Christopher J <christopher.j.zurcher@intel.com>;
> > devel@edk2.groups.io
> > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> <xiaoyux.lu@intel.com>;
> > Ard Biesheuvel <ard.biesheuvel@linaro.org>; david.harris4@hp.com; Kinney,
> > Michael D <michael.d.kinney@intel.com>
> > Subject: RE: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > instruction support for IA32 and X64
> >
> > Thanks. Comment inline:
> >
> > > -----Original Message-----
> > > From: Zurcher, Christopher J <christopher.j.zurcher@intel.com>
> > > Sent: Thursday, March 26, 2020 10:44 AM
> > > To: devel@edk2.groups.io; Yao, Jiewen <jiewen.yao@intel.com>
> > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > <xiaoyux.lu@intel.com>;
> > > Ard Biesheuvel <ard.biesheuvel@linaro.org>; david.harris4@hp.com; Kinney,
> > > Michael D <michael.d.kinney@intel.com>
> > > Subject: RE: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > > instruction support for IA32 and X64
> > >
> > > The specific performance improvement depends on the operation; the OS
> > > provisioning I mentioned in the [Patch 0/1] thread removed the hashing as a
> > > bottleneck and improved the overall operation speed over 4x (saving 2.5
> > > minutes of flashing time), but a direct SHA256 benchmark on the particular
> > > silicon I have available showed over 12x improvement. I have not
> > benchmarked
> > > the improvements to boot time. I do not know the use case targeted by BZ
> > 2507
> > > so I don't know what benefit will be seen there.
> > [Jiewen] I guess there might be some improvement on HTTPS boot because of
> > AES-NI for TLS session.
> > I am just curious on the data. :-)
> >
> >
> > >
> > > I will look at unifying the INF files in the next patch-set and will also
> > add the
> > > OpensslLibCrypto.inf case.
> > [Jiewen] Thanks!
> >
> >
> > >
> > > I have not exercised the AVX code specifically, as it is coming directly
> > from
> > > OpenSSL and includes checks against the CPUID capability flags before
> > executing.
> > > I'm not entirely familiar with AVX requirements; is there a known
> > environment
> > > restriction against AVX instructions in EDK2?
> > [Jiewen] Yes. UEFI spec only requires to set env for XMM register.
> > Using other registers such as YMM or ZMM may requires special setup, and
> > save/restore some FPU state.
> > If a function containing the YMM register access and it is linked into
> > BaseCryptoLib, I highly recommend you run some test to make sure it can
> work
> > correct.
> >
> > Maybe I should ask more generic question: Have you validated all impacted
> > crypto lib API to make sure they still work well with this improvement?
> >
> 
> I have not. Is there an existing test suite that was used initially, or used with
> each OpenSSL version update?
> 
> Thanks,
> Christopher Zurcher
> 
> >
> > >
> > > Regarding RNG, it looks like we already have architecture-specific variants
> > of
> > > RdRand...?
> > [Jiewen] Yes. That is in RngLib.
> > I ask this question because I see openssl wrapper is using PMC/TSC as noisy.
> >
> https://github.com/tianocore/edk2/blob/master/CryptoPkg/Library/OpensslLib/
> ra
> > nd_pool.c
> > Since this patch adds instruction dependency, why no use RNG instruction as
> > well?
> >
> >
> > >
> > > There was some off-list feedback regarding the number of files required to
> > be
> > > checked in here. OpenSSL does not include assembler-specific
> > implementations
> > > of these files and instead relies on "perlasm" scripts which are parsed by
> > a
> > > translation script at build time (in the normal OpenSSL build flow) to
> > generate
> > > the resulting .nasm files. The implementation I have shared here generates
> > these
> > > files as part of the OpensslLib maintainer process, similar to the existing
> > header
> > > files which are also generated. Since process_files.pl already requires the
> > > package maintainer to have a Perl environment installed, this does not
> > place any
> > > additional burden on them.
> > > An alternative implementation has been proposed which would see only a
> > > listing/script of the required generator operations to be checked in, and
> > any
> > > platform build which intended to utilize the native algorithms would
> > require a
> > > local Perl environment as well as any underlying environment dependencies
> > > (such as a version check against the NASM executable) for every developer,
> > and
> > > additional pre-build steps to run the generator scripts.
> > >
> > > Are there any strong opinions here around adding Perl as a build
> > environment
> > > dependency vs. checking in maintainer-generated assembly "intermediate"
> > build
> > > files?
> > [Jiewen] Good question. For tool, maybe Mike or Liming can answer.
> > And I did get similar issue with you.
> > I got a submodule code need using CMake and pre-processor to generate a
> > common include file.
> > How do we handle that? I look for the recommendation as well.
> >
> >
> > >
> > > Thanks,
> > > Christopher Zurcher
> > >
> > > > -----Original Message-----
> > > > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Yao,
> > > Jiewen
> > > > Sent: Wednesday, March 25, 2020 18:23
> > > > To: devel@edk2.groups.io; Yao, Jiewen <jiewen.yao@intel.com>; Zurcher,
> > > > Christopher J <christopher.j.zurcher@intel.com>
> > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > > <xiaoyux.lu@intel.com>;
> > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel
> <ard.biesheuvel@linaro.org>
> > > > Subject: Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > > > instruction support for IA32 and X64
> > > >
> > > > Some more comment:
> > > >
> > > > 3) Do you consider to enable RNG instruction as well?
> > > >
> > > > 4) I saw you added some code for AVX instruction, such as YMM register.
> > > > Have you validated that code, to make sure it can work correctly in
> > current
> > > > environment?
> > > >
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Yao,
> > > Jiewen
> > > > > Sent: Thursday, March 26, 2020 9:15 AM
> > > > > To: devel@edk2.groups.io; Zurcher, Christopher J
> > > > > <christopher.j.zurcher@intel.com>
> > > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > > > <xiaoyux.lu@intel.com>;
> > > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel
> > > <ard.biesheuvel@linaro.org>
> > > > > Subject: Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > > > > instruction support for IA32 and X64
> > > > >
> > > > > HI Christopher
> > > > > Thanks for the contribution. I think it is good enhancement.
> > > > >
> > > > > Do you have any data show what performance improvement we can get?
> > > > > Did the system boot faster with the this? Which feature ?
> > > > > UEFI Secure Boot? TCG Measured Boot? HTTPS boot?
> > > > >
> > > > >
> > > > > Comment for the code:
> > > > > 1) I am not sure if we need separate OpensslLibIa32 and OpensslLibX64.
> > > > > Can we just define single INF, such as OpensslLibHw.inf ?
> > > > >
> > > > > 2) Do we also need add a new version for OpensslLibCrypto.inf ?
> > > > >
> > > > >
> > > > >
> > > > > Thank you
> > > > > Yao Jiewen
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of
> > > Zurcher,
> > > > > > Christopher J
> > > > > > Sent: Tuesday, March 17, 2020 6:27 PM
> > > > > > To: devel@edk2.groups.io
> > > > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > > > > <xiaoyux.lu@intel.com>;
> > > > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel
> > > <ard.biesheuvel@linaro.org>
> > > > > > Subject: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> > > > > instruction
> > > > > > support for IA32 and X64
> > > > > >


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [edk2-devel] [PATCH 0/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
  2020-03-26  1:04   ` [edk2-devel] " Zurcher, Christopher J
@ 2020-03-26  7:49     ` Ard Biesheuvel
  0 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2020-03-26  7:49 UTC (permalink / raw)
  To: Zurcher, Christopher J
  Cc: devel@edk2.groups.io, Wang, Jian J, Lu, XiaoyuX,
	david.harris4@hp.com

On Thu, 26 Mar 2020 at 02:04, Zurcher, Christopher J
<christopher.j.zurcher@intel.com> wrote:
>
> > -----Original Message-----
> > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Ard Biesheuvel
> > Sent: Wednesday, March 25, 2020 11:40
> > To: Zurcher, Christopher J <christopher.j.zurcher@intel.com>
> > Cc: edk2-devel-groups-io <devel@edk2.groups.io>; Wang, Jian J
> > <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>; Eugene Cohen
> > <eugene@hp.com>
> > Subject: Re: [edk2-devel] [PATCH 0/1] CryptoPkg/OpensslLib: Add native
> > instruction support for IA32 and X64
> >
> > On Tue, 17 Mar 2020 at 11:26, Christopher J Zurcher
> > <christopher.j.zurcher@intel.com> wrote:
> > >
> > > BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=2507
> > >
> > > This patch adds support for building the native instruction algorithms for
> > > IA32 and X64 versions of OpensslLib. The process_files.pl script was
> > modified
> > > to parse the .asm file targets from the OpenSSL build config data struct,
> > and
> > > generate the necessary assembly files for the EDK2 build environment.
> > >
> > > For the X64 variant, OpenSSL includes calls to a Windows error handling
> > API,
> > > and that function has been stubbed out in ApiHooks.c.
> > >
> > > For all variants, a constructor was added to call the required CPUID
> > function
> > > within OpenSSL to facilitate processor capability checks in the native
> > > algorithms.
> > >
> > > Additional native architecture variants should be simple to add by
> > following
> > > the changes made for these two architectures.
> > >
> > > The OpenSSL assembly files are traditionally generated at build time using
> > a
> > > perl script. To avoid that burden on EDK2 users, these end-result assembly
> > > files are generated during the configuration steps performed by the package
> > > maintainer (through process_files.pl). The perl generator scripts inside
> > > OpenSSL do not parse file comments as they are only meant to create
> > > intermediate build files, so process_files.pl contains additional hooks to
> > > preserve the copyright headers as well as clean up tabs and line endings to
> > > comply with EDK2 coding standards. The resulting file headers align with
> > > the generated .h files which are already included in the EDK2 repository.
> > >
> > > Cc: Jian J Wang <jian.j.wang@intel.com>
> > > Cc: Xiaoyu Lu <xiaoyux.lu@intel.com>
> > > Cc: Eugene Cohen <eugene@hp.com>
> > > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > >
> > > Christopher J Zurcher (1):
> > >   CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
> > >
> >
> > Hello Christopher,
> >
> > It would be helpful to understand the purpose of all this. Also, I
> > think we could be more picky about which algorithms we enable - DES
> > and MD5 don't seem highly useful, and even if they were, what do we
> > gain by using a faster (or smaller?) implementation?
> >
>
> The selection of algorithms comes from the default OpenSSL assembly targets; this combination is validated and widely used, and I don't know all the consequences of picking and choosing which ones to include. If necessary I could look into reducing the list.
>
> The primary driver for this change is enabling the Full Flash Update (FFU) OS provisioning flow for Windows as described here:
> https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/wim-vs-ffu-image-file-formats
> This item under "Reliability" is what we are speeding up: "Includes a catalog and hash table to validate a signature upfront before flashing onto a device. The hash table is generated during capture, and validated when applying the image."
> This provisioning flow can be performed within the UEFI environment, and the native algorithms allow significant time savings in a factory setting (minutes per device).
>
> We also had a BZ which requested these changes and the specific need was not provided. Maybe David @HP can provide further insight?
> https://bugzilla.tianocore.org/show_bug.cgi?id=2507
>
> There have been additional platform-specific benefits identified, for example speeding up HMAC authentication of communication with other HW/FW components.
>

OK, so in summary, there is one particular hash that you need to speed
up, and this is why you enable every single asm implementation in
OpenSSL for X86. I'm not convinced this is justified.

OpenSSL can easily deal with having just a couple of these accelerated
implementations enabled. To me, it seems like a more sensible approach
to only enable algorithms based on special instructions (such as
AES-NI and SHA-NI), which typically give a speedup in the 10x-20x
range, and to leave the other ones disabled. E.g., accelerated
Montgomery multiplication for RSA is really not worth it, since the
speedup is around 50% and RSA is hardly a bottleneck in UEFI. The same
applies to deprecated algorithms such as DES or MD5 - they are not
based on special instructions, and shouldn't be used in the first
place.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
  2020-03-26  3:58             ` Yao, Jiewen
@ 2020-03-26 18:23               ` Michael D Kinney
  2020-03-27  0:52                 ` Zurcher, Christopher J
  0 siblings, 1 reply; 14+ messages in thread
From: Michael D Kinney @ 2020-03-26 18:23 UTC (permalink / raw)
  To: Yao, Jiewen, Zurcher, Christopher J, devel@edk2.groups.io,
	Kinney, Michael D
  Cc: Wang, Jian J, Lu, XiaoyuX, Ard Biesheuvel, david.harris4@hp.com

The main issue with the use of extended registers
is that the default interrupt/exception handlers
for IA32/X64 do FXSAVE/FXRSTOR.

https://github.com/tianocore/edk2/blob/5f7c91f0d72efca3b53628163861265c89306f1f/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.nasm#L244

https://github.com/tianocore/edk2/blob/5f7c91f0d72efca3b53628163861265c89306f1f/UefiCpuPkg/Library/CpuExceptionHandlerLib/Ia32/ExceptionHandlerAsm.nasm#L290

This covers the FP/MMX/XMM registers, but nothing more.

Any code that uses more than this has two options:
1) Expand the state saved in the interrupt/exception handlers
   for the entire platform.
2) Disable interrupts when calling code that uses
   additional register state.

Mike

> -----Original Message-----
> From: Yao, Jiewen <jiewen.yao@intel.com>
> Sent: Wednesday, March 25, 2020 8:58 PM
> To: Zurcher, Christopher J
> <christopher.j.zurcher@intel.com>; devel@edk2.groups.io
> Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> <xiaoyux.lu@intel.com>; Ard Biesheuvel
> <ard.biesheuvel@linaro.org>; david.harris4@hp.com;
> Kinney, Michael D <michael.d.kinney@intel.com>
> Subject: RE: [edk2-devel] [PATCH 1/1]
> CryptoPkg/OpensslLib: Add native instruction support for
> IA32 and X64
> 
> There is an old shell based test application. It is
> removed from edk2-repo, and it should be in edk2-test
> repo in the future.
> 
> Before the latter is ready, you can find the archive at
> https://github.com/jyao1/edk2/tree/edk2-
> cryptest/CryptoTestPkg/Cryptest
> 
> But I am not sure if it covers the latest interface,
> especially the new added one. As far as I know, some SM3
> is not added there.
> 
> I still recommend you double check which functions use
> the YMM and make sure these functions are tested.
> 
> I have no concern on HASH, since you already provided the
> data.
> But other algo, such as AES/RSA/HMAC, etc.
> 
> 
> Thank you
> Yao Jiewen
> 
> > -----Original Message-----
> > From: Zurcher, Christopher J
> <christopher.j.zurcher@intel.com>
> > Sent: Thursday, March 26, 2020 11:29 AM
> > To: Yao, Jiewen <jiewen.yao@intel.com>;
> devel@edk2.groups.io
> > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> <xiaoyux.lu@intel.com>;
> > Ard Biesheuvel <ard.biesheuvel@linaro.org>;
> david.harris4@hp.com; Kinney,
> > Michael D <michael.d.kinney@intel.com>
> > Subject: RE: [edk2-devel] [PATCH 1/1]
> CryptoPkg/OpensslLib: Add native
> > instruction support for IA32 and X64
> >
> > > -----Original Message-----
> > > From: Yao, Jiewen <jiewen.yao@intel.com>
> > > Sent: Wednesday, March 25, 2020 20:05
> > > To: Zurcher, Christopher J
> <christopher.j.zurcher@intel.com>;
> > > devel@edk2.groups.io
> > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > <xiaoyux.lu@intel.com>;
> > > Ard Biesheuvel <ard.biesheuvel@linaro.org>;
> david.harris4@hp.com; Kinney,
> > > Michael D <michael.d.kinney@intel.com>
> > > Subject: RE: [edk2-devel] [PATCH 1/1]
> CryptoPkg/OpensslLib: Add native
> > > instruction support for IA32 and X64
> > >
> > > Thanks. Comment inline:
> > >
> > > > -----Original Message-----
> > > > From: Zurcher, Christopher J
> <christopher.j.zurcher@intel.com>
> > > > Sent: Thursday, March 26, 2020 10:44 AM
> > > > To: devel@edk2.groups.io; Yao, Jiewen
> <jiewen.yao@intel.com>
> > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu,
> XiaoyuX
> > > <xiaoyux.lu@intel.com>;
> > > > Ard Biesheuvel <ard.biesheuvel@linaro.org>;
> david.harris4@hp.com; Kinney,
> > > > Michael D <michael.d.kinney@intel.com>
> > > > Subject: RE: [edk2-devel] [PATCH 1/1]
> CryptoPkg/OpensslLib: Add native
> > > > instruction support for IA32 and X64
> > > >
> > > > The specific performance improvement depends on the
> operation; the OS
> > > > provisioning I mentioned in the [Patch 0/1] thread
> removed the hashing as a
> > > > bottleneck and improved the overall operation speed
> over 4x (saving 2.5
> > > > minutes of flashing time), but a direct SHA256
> benchmark on the particular
> > > > silicon I have available showed over 12x
> improvement. I have not
> > > benchmarked
> > > > the improvements to boot time. I do not know the
> use case targeted by BZ
> > > 2507
> > > > so I don’t know what benefit will be seen there.
> > > [Jiewen] I guess there might be some improvement on
> HTTPS boot because of
> > > AES-NI for TLS session.
> > > I am just curious on the data. :-)
> > >
> > >
> > > >
> > > > I will look at unifying the INF files in the next
> patch-set and will also
> > > add the
> > > > OpensslLibCrypto.inf case.
> > > [Jiewen] Thanks!
> > >
> > >
> > > >
> > > > I have not exercised the AVX code specifically, as
> it is coming directly
> > > from
> > > > OpenSSL and includes checks against the CPUID
> capability flags before
> > > executing.
> > > > I'm not entirely familiar with AVX requirements; is
> there a known
> > > environment
> > > > restriction against AVX instructions in EDK2?
> > > [Jiewen] Yes. UEFI spec only requires to set env for
> XMM register.
> > > Using other registers such as YMM or ZMM may requires
> special setup, and
> > > save/restore some FPU state.
> > > If a function containing the YMM register access and
> it is linked into
> > > BaseCryptoLib, I highly recommend you run some test
> to make sure it can
> > work
> > > correct.
> > >
> > > Maybe I should ask more generic question: Have you
> validated all impacted
> > > crypto lib API to make sure they still work well with
> this improvement?
> > >
> >
> > I have not. Is there an existing test suite that was
> used initially, or used with
> > each OpenSSL version update?
> >
> > Thanks,
> > Christopher Zurcher
> >
> > >
> > > >
> > > > Regarding RNG, it looks like we already have
> architecture-specific variants
> > > of
> > > > RdRand...?
> > > [Jiewen] Yes. That is in RngLib.
> > > I ask this question because I see openssl wrapper is
> using PMC/TSC as noisy.
> > >
> >
> > > https://github.com/tianocore/edk2/blob/master/CryptoPkg/Library/OpensslLib/rand_pool.c
> > > Since this patch adds instruction dependency, why no
> use RNG instruction as
> > > well?
> > >
> > >
> > > >
> > > > There was some off-list feedback regarding the
> number of files required to
> > > be
> > > > checked in here. OpenSSL does not include
> assembler-specific
> > > implementations
> > > > of these files and instead relies on "perlasm"
> scripts which are parsed by
> > > a
> > > > translation script at build time (in the normal
> OpenSSL build flow) to
> > > generate
> > > > the resulting .nasm files. The implementation I
> have shared here generates
> > > these
> > > > files as part of the OpensslLib maintainer process,
> similar to the existing
> > > header
> > > > files which are also generated. Since
> process_files.pl already requires the
> > > > package maintainer to have a Perl environment
> installed, this does not
> > > place any
> > > > additional burden on them.
> > > > An alternative implementation has been proposed
> which would see only a
> > > > listing/script of the required generator operations
> to be checked in, and
> > > any
> > > > platform build which intended to utilize the native
> algorithms would
> > > require a
> > > > local Perl environment as well as any underlying
> environment dependencies
> > > > (such as a version check against the NASM
> executable) for every developer,
> > > and
> > > > additional pre-build steps to run the generator
> scripts.
> > > >
> > > > Are there any strong opinions here around adding
> Perl as a build
> > > environment
> > > > dependency vs. checking in maintainer-generated
> assembly "intermediate"
> > > build
> > > > files?
> > > [Jiewen] Good question. For tool, maybe Mike or
> Liming can answer.
> > > And I did get similar issue with you.
> > > I got a submodule code need using CMake and pre-
> processor to generate a
> > > common include file.
> > > How do we handle that? I look for the recommendation
> as well.
> > >
> > >
> > > >
> > > > Thanks,
> > > > Christopher Zurcher
> > > >
> > > > > -----Original Message-----
> > > > > From: devel@edk2.groups.io <devel@edk2.groups.io>
> On Behalf Of Yao,
> > > > Jiewen
> > > > > Sent: Wednesday, March 25, 2020 18:23
> > > > > To: devel@edk2.groups.io; Yao, Jiewen
> <jiewen.yao@intel.com>; Zurcher,
> > > > > Christopher J <christopher.j.zurcher@intel.com>
> > > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu,
> XiaoyuX
> > > > <xiaoyux.lu@intel.com>;
> > > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel
> > <ard.biesheuvel@linaro.org>
> > > > > Subject: Re: [edk2-devel] [PATCH 1/1]
> CryptoPkg/OpensslLib: Add native
> > > > > instruction support for IA32 and X64
> > > > >
> > > > > Some more comment:
> > > > >
> > > > > 3) Do you consider to enable RNG instruction as
> well?
> > > > >
> > > > > 4) I saw you added some code for AVX instruction,
> such as YMM register.
> > > > > Have you validated that code, to make sure it can
> work correctly in
> > > current
> > > > > environment?
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: devel@edk2.groups.io
> <devel@edk2.groups.io> On Behalf Of Yao,
> > > > Jiewen
> > > > > > Sent: Thursday, March 26, 2020 9:15 AM
> > > > > > To: devel@edk2.groups.io; Zurcher, Christopher
> J
> > > > > > <christopher.j.zurcher@intel.com>
> > > > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu,
> XiaoyuX
> > > > > <xiaoyux.lu@intel.com>;
> > > > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel
> > > > <ard.biesheuvel@linaro.org>
> > > > > > Subject: Re: [edk2-devel] [PATCH 1/1]
> CryptoPkg/OpensslLib: Add native
> > > > > > instruction support for IA32 and X64
> > > > > >
> > > > > > HI Christopher
> > > > > > Thanks for the contribution. I think it is good
> enhancement.
> > > > > >
> > > > > > Do you have any data show what performance
> improvement we can get?
> > > > > > Did the system boot faster with the this? Which
> feature ?
> > > > > > UEFI Secure Boot? TCG Measured Boot? HTTPS
> boot?
> > > > > >
> > > > > >
> > > > > > Comment for the code:
> > > > > > 1) I am not sure if we need separate
> OpensslLibIa32 and OpensslLibX64.
> > > > > > Can we just define single INF, such as
> OpensslLibHw.inf ?
> > > > > >
> > > > > > 2) Do we also need add a new version for
> OpensslLibCrypto.inf ?
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thank you
> > > > > > Yao Jiewen
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: devel@edk2.groups.io
> <devel@edk2.groups.io> On Behalf Of
> > > > Zurcher,
> > > > > > > Christopher J
> > > > > > > Sent: Tuesday, March 17, 2020 6:27 PM
> > > > > > > To: devel@edk2.groups.io
> > > > > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu,
> XiaoyuX
> > > > > > <xiaoyux.lu@intel.com>;
> > > > > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel
> > > > <ard.biesheuvel@linaro.org>
> > > > > > > Subject: [edk2-devel] [PATCH 1/1]
> CryptoPkg/OpensslLib: Add native
> > > > > > instruction
> > > > > > > support for IA32 and X64
> > > > > > >


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64
  2020-03-26 18:23               ` Michael D Kinney
@ 2020-03-27  0:52                 ` Zurcher, Christopher J
  0 siblings, 0 replies; 14+ messages in thread
From: Zurcher, Christopher J @ 2020-03-27  0:52 UTC (permalink / raw)
  To: Kinney, Michael D, Yao, Jiewen, devel@edk2.groups.io
  Cc: Wang, Jian J, Lu, XiaoyuX, Ard Biesheuvel, david.harris4@hp.com

During the Perlasm generation step, the script attempts to execute NASM from the user's path to check whether the assembler version is recent enough to support AVX instructions:
https://github.com/openssl/openssl/blob/OpenSSL_1_1_1-stable/crypto/sha/asm/sha512-x86_64.pl#L129
For now, we could force this check to fail during generation. I'm going to reduce this patch to cover only AES and SHA, and the OS-provisioning use case for the SHA acceleration is dependent enough on I/O operations that the AVX speed increase already reaches diminishing returns. I don't mind leaving AVX support for the future.

--
Christopher Zurcher

> -----Original Message-----
> From: Kinney, Michael D <michael.d.kinney@intel.com>
> Sent: Thursday, March 26, 2020 11:23
> To: Yao, Jiewen <jiewen.yao@intel.com>; Zurcher, Christopher J
> <christopher.j.zurcher@intel.com>; devel@edk2.groups.io; Kinney, Michael D
> <michael.d.kinney@intel.com>
> Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX <xiaoyux.lu@intel.com>;
> Ard Biesheuvel <ard.biesheuvel@linaro.org>; david.harris4@hp.com
> Subject: RE: [edk2-devel] [PATCH 1/1] CryptoPkg/OpensslLib: Add native
> instruction support for IA32 and X64
> 
> The main issue with use of extended registers is
> that the default interrupt/exception handler for
> IA32/X64 does FXSAVE/FXRSTOR.
> 
> https://github.com/tianocore/edk2/blob/5f7c91f0d72efca3b53628163861265c89306f1f/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.nasm#L244
> 
> https://github.com/tianocore/edk2/blob/5f7c91f0d72efca3b53628163861265c89306f1f/UefiCpuPkg/Library/CpuExceptionHandlerLib/Ia32/ExceptionHandlerAsm.nasm#L290
> 
> This covers FP/MMX/XMM registers, but no more.
> 
> Any code that uses more than this has 2 options:
> 1) Expand the state saved in interrupts/exception handlers
>    for the entire platform.
> 2) Disable interrupts when calling code that uses
>    additional register state.
> 
> Mike
> 
> > -----Original Message-----
> > From: Yao, Jiewen <jiewen.yao@intel.com>
> > Sent: Wednesday, March 25, 2020 8:58 PM
> > To: Zurcher, Christopher J
> > <christopher.j.zurcher@intel.com>; devel@edk2.groups.io
> > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > <xiaoyux.lu@intel.com>; Ard Biesheuvel
> > <ard.biesheuvel@linaro.org>; david.harris4@hp.com;
> > Kinney, Michael D <michael.d.kinney@intel.com>
> > Subject: RE: [edk2-devel] [PATCH 1/1]
> > CryptoPkg/OpensslLib: Add native instruction support for
> > IA32 and X64
> >
> > There is an old shell based test application. It is
> > removed from edk2-repo, and it should be in edk2-test
> > repo in the future.
> >
> > Before the latter is ready, you can find the archive at
> > https://github.com/jyao1/edk2/tree/edk2-
> > cryptest/CryptoTestPkg/Cryptest
> >
> > But I am not sure if it covers the latest interface,
> > especially the new added one. As far as I know, some SM3
> > is not added there.
> >
> > I still recommend you double check which functions use
> > the YMM and make sure these functions are tested.
> >
> > I have no concern on HASH, since you already provided the
> > data.
> > But other algo, such as AES/RSA/HMAC, etc.
> >
> >
> > Thank you
> > Yao Jiewen
> >
> > > -----Original Message-----
> > > From: Zurcher, Christopher J
> > <christopher.j.zurcher@intel.com>
> > > Sent: Thursday, March 26, 2020 11:29 AM
> > > To: Yao, Jiewen <jiewen.yao@intel.com>;
> > devel@edk2.groups.io
> > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > <xiaoyux.lu@intel.com>;
> > > Ard Biesheuvel <ard.biesheuvel@linaro.org>;
> > david.harris4@hp.com; Kinney,
> > > Michael D <michael.d.kinney@intel.com>
> > > Subject: RE: [edk2-devel] [PATCH 1/1]
> > CryptoPkg/OpensslLib: Add native
> > > instruction support for IA32 and X64
> > >
> > > > -----Original Message-----
> > > > From: Yao, Jiewen <jiewen.yao@intel.com>
> > > > Sent: Wednesday, March 25, 2020 20:05
> > > > To: Zurcher, Christopher J
> > <christopher.j.zurcher@intel.com>;
> > > > devel@edk2.groups.io
> > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu, XiaoyuX
> > > <xiaoyux.lu@intel.com>;
> > > > Ard Biesheuvel <ard.biesheuvel@linaro.org>;
> > david.harris4@hp.com; Kinney,
> > > > Michael D <michael.d.kinney@intel.com>
> > > > Subject: RE: [edk2-devel] [PATCH 1/1]
> > CryptoPkg/OpensslLib: Add native
> > > > instruction support for IA32 and X64
> > > >
> > > > Thanks. Comment inline:
> > > >
> > > > > -----Original Message-----
> > > > > From: Zurcher, Christopher J
> > <christopher.j.zurcher@intel.com>
> > > > > Sent: Thursday, March 26, 2020 10:44 AM
> > > > > To: devel@edk2.groups.io; Yao, Jiewen
> > <jiewen.yao@intel.com>
> > > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu,
> > XiaoyuX
> > > > <xiaoyux.lu@intel.com>;
> > > > > Ard Biesheuvel <ard.biesheuvel@linaro.org>;
> > david.harris4@hp.com; Kinney,
> > > > > Michael D <michael.d.kinney@intel.com>
> > > > > Subject: RE: [edk2-devel] [PATCH 1/1]
> > CryptoPkg/OpensslLib: Add native
> > > > > instruction support for IA32 and X64
> > > > >
> > > > > The specific performance improvement depends on the
> > operation; the OS
> > > > > provisioning I mentioned in the [Patch 0/1] thread
> > removed the hashing as a
> > > > > bottleneck and improved the overall operation speed
> > over 4x (saving 2.5
> > > > > minutes of flashing time), but a direct SHA256
> > benchmark on the particular
> > > > > silicon I have available showed over 12x
> > improvement. I have not
> > > > benchmarked
> > > > > the improvements to boot time. I do not know the
> > use case targeted by BZ
> > > > 2507
> > > > > so I don’t know what benefit will be seen there.
> > > > [Jiewen] I guess there might be some improvement on
> > HTTPS boot because of
> > > > AES-NI for TLS session.
> > > > I am just curious on the data. :-)
> > > >
> > > >
> > > > >
> > > > > I will look at unifying the INF files in the next
> > patch-set and will also
> > > > add the
> > > > > OpensslLibCrypto.inf case.
> > > > [Jiewen] Thanks!
> > > >
> > > >
> > > > >
> > > > > I have not exercised the AVX code specifically, as
> > it is coming directly
> > > > from
> > > > > OpenSSL and includes checks against the CPUID
> > capability flags before
> > > > executing.
> > > > > I'm not entirely familiar with AVX requirements; is
> > there a known
> > > > environment
> > > > > restriction against AVX instructions in EDK2?
> > > > [Jiewen] Yes. UEFI spec only requires to set env for
> > XMM register.
> > > > Using other registers such as YMM or ZMM may requires
> > special setup, and
> > > > save/restore some FPU state.
> > > > If a function containing the YMM register access and
> > it is linked into
> > > > BaseCryptoLib, I highly recommend you run some test
> > to make sure it can
> > > work
> > > > correct.
> > > >
> > > > Maybe I should ask more generic question: Have you
> > validated all impacted
> > > > crypto lib API to make sure they still work well with
> > this improvement?
> > > >
> > >
> > > I have not. Is there an existing test suite that was
> > used initially, or used with
> > > each OpenSSL version update?
> > >
> > > Thanks,
> > > Christopher Zurcher
> > >
> > > >
> > > > >
> > > > > Regarding RNG, it looks like we already have
> > architecture-specific variants
> > > > of
> > > > > RdRand...?
> > > > [Jiewen] Yes. That is in RngLib.
> > > > I ask this question because I see openssl wrapper is
> > using PMC/TSC as noisy.
> > > >
> > >
> > > > https://github.com/tianocore/edk2/blob/master/CryptoPkg/Library/OpensslLib/rand_pool.c
> > > > Since this patch adds instruction dependency, why no
> > use RNG instruction as
> > > > well?
> > > >
> > > >
> > > > >
> > > > > There was some off-list feedback regarding the
> > number of files required to
> > > > be
> > > > > checked in here. OpenSSL does not include
> > assembler-specific
> > > > implementations
> > > > > of these files and instead relies on "perlasm"
> > scripts which are parsed by
> > > > a
> > > > > translation script at build time (in the normal
> > OpenSSL build flow) to
> > > > generate
> > > > > the resulting .nasm files. The implementation I
> > have shared here generates
> > > > these
> > > > > files as part of the OpensslLib maintainer process,
> > similar to the existing
> > > > header
> > > > > files which are also generated. Since
> > process_files.pl already requires the
> > > > > package maintainer to have a Perl environment
> > installed, this does not
> > > > place any
> > > > > additional burden on them.
> > > > > An alternative implementation has been proposed
> > which would see only a
> > > > > listing/script of the required generator operations
> > to be checked in, and
> > > > any
> > > > > platform build which intended to utilize the native
> > algorithms would
> > > > require a
> > > > > local Perl environment as well as any underlying
> > environment dependencies
> > > > > (such as a version check against the NASM
> > executable) for every developer,
> > > > and
> > > > > additional pre-build steps to run the generator
> > scripts.
> > > > >
> > > > > Are there any strong opinions here around adding
> > Perl as a build
> > > > environment
> > > > > dependency vs. checking in maintainer-generated
> > assembly "intermediate"
> > > > build
> > > > > files?
> > > > [Jiewen] Good question. For tool, maybe Mike or
> > Liming can answer.
> > > > And I did get similar issue with you.
> > > > I got a submodule code need using CMake and pre-
> > processor to generate a
> > > > common include file.
> > > > How do we handle that? I look for the recommendation
> > as well.
> > > >
> > > >
> > > > >
> > > > > Thanks,
> > > > > Christopher Zurcher
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: devel@edk2.groups.io <devel@edk2.groups.io>
> > On Behalf Of Yao,
> > > > > Jiewen
> > > > > > Sent: Wednesday, March 25, 2020 18:23
> > > > > > To: devel@edk2.groups.io; Yao, Jiewen
> > <jiewen.yao@intel.com>; Zurcher,
> > > > > > Christopher J <christopher.j.zurcher@intel.com>
> > > > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu,
> > XiaoyuX
> > > > > <xiaoyux.lu@intel.com>;
> > > > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel
> > > <ard.biesheuvel@linaro.org>
> > > > > > Subject: Re: [edk2-devel] [PATCH 1/1]
> > CryptoPkg/OpensslLib: Add native
> > > > > > instruction support for IA32 and X64
> > > > > >
> > > > > > Some more comment:
> > > > > >
> > > > > > 3) Do you consider to enable RNG instruction as
> > well?
> > > > > >
> > > > > > 4) I saw you added some code for AVX instruction,
> > such as YMM register.
> > > > > > Have you validated that code, to make sure it can
> > work correctly in
> > > > current
> > > > > > environment?
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: devel@edk2.groups.io
> > <devel@edk2.groups.io> On Behalf Of Yao,
> > > > > Jiewen
> > > > > > > Sent: Thursday, March 26, 2020 9:15 AM
> > > > > > > To: devel@edk2.groups.io; Zurcher, Christopher
> > J
> > > > > > > <christopher.j.zurcher@intel.com>
> > > > > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu,
> > XiaoyuX
> > > > > > <xiaoyux.lu@intel.com>;
> > > > > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel
> > > > > <ard.biesheuvel@linaro.org>
> > > > > > > Subject: Re: [edk2-devel] [PATCH 1/1]
> > CryptoPkg/OpensslLib: Add native
> > > > > > > instruction support for IA32 and X64
> > > > > > >
> > > > > > > HI Christopher
> > > > > > > Thanks for the contribution. I think it is good
> > enhancement.
> > > > > > >
> > > > > > > Do you have any data show what performance
> > improvement we can get?
> > > > > > > Did the system boot faster with the this? Which
> > feature ?
> > > > > > > UEFI Secure Boot? TCG Measured Boot? HTTPS
> > boot?
> > > > > > >
> > > > > > >
> > > > > > > Comment for the code:
> > > > > > > 1) I am not sure if we need separate
> > OpensslLibIa32 and OpensslLibX64.
> > > > > > > Can we just define single INF, such as
> > OpensslLibHw.inf ?
> > > > > > >
> > > > > > > 2) Do we also need add a new version for
> > OpensslLibCrypto.inf ?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Thank you
> > > > > > > Yao Jiewen
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: devel@edk2.groups.io
> > <devel@edk2.groups.io> On Behalf Of
> > > > > Zurcher,
> > > > > > > > Christopher J
> > > > > > > > Sent: Tuesday, March 17, 2020 6:27 PM
> > > > > > > > To: devel@edk2.groups.io
> > > > > > > > Cc: Wang, Jian J <jian.j.wang@intel.com>; Lu,
> > XiaoyuX
> > > > > > > <xiaoyux.lu@intel.com>;
> > > > > > > > Eugene Cohen <eugene@hp.com>; Ard Biesheuvel
> > > > > <ard.biesheuvel@linaro.org>
> > > > > > > > Subject: [edk2-devel] [PATCH 1/1]
> > CryptoPkg/OpensslLib: Add native
> > > > > > > instruction
> > > > > > > > support for IA32 and X64
> > > > > > > >


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-03-27  0:52 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-17 10:26 [PATCH 0/1] CryptoPkg/OpensslLib: Add native instruction support for IA32 and X64 Zurcher, Christopher J
2020-03-17 10:26 ` [PATCH 1/1] " Zurcher, Christopher J
2020-03-26  1:15   ` [edk2-devel] " Yao, Jiewen
     [not found]   ` <15FFB5A5A94CCE31.23217@groups.io>
2020-03-26  1:23     ` Yao, Jiewen
2020-03-26  2:44       ` Zurcher, Christopher J
2020-03-26  3:05         ` Yao, Jiewen
2020-03-26  3:29           ` Zurcher, Christopher J
2020-03-26  3:58             ` Yao, Jiewen
2020-03-26 18:23               ` Michael D Kinney
2020-03-27  0:52                 ` Zurcher, Christopher J
2020-03-23 12:59 ` [edk2-devel] [PATCH 0/1] " Laszlo Ersek
2020-03-25 18:40 ` Ard Biesheuvel
2020-03-26  1:04   ` [edk2-devel] " Zurcher, Christopher J
2020-03-26  7:49     ` Ard Biesheuvel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox